Appendix A. GLOSSARY


ACE

See [ASCII-compatible encoding.]

alignment

The process of connecting the source and target segments into pairs for translation memory.


ASCII

Acronym for American Standard Code for Information Interchange. ASCII is a 7-bit coded character set and has since been widely replaced by ISO 8859-1.

ASCII-compatible encoding (ACE)

The result of converting internationalized domain names (IDNs) into an ASCII string that can be resolved by DNS servers.
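
As a rough illustration, Python's built-in `idna` codec can perform this kind of conversion. The domain `bücher.example` is a made-up name, and the codec follows the IDNA scheme standardized by the IETF, which is an assumption about what a given registrar or resolver actually uses.

```python
# Convert a hypothetical internationalized domain name to its ACE form.
idn = "bücher.example"
ace = idn.encode("idna")      # ASCII-only bytes that DNS servers can resolve
print(ace)                    # b'xn--bcher-kva.example'

# The conversion is reversible, so the original name can be shown to users.
restored = ace.decode("idna")
print(restored == idn)        # True
```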


back translation

Hiring another translation agency or freelance translator (after your site has been translated) to tell you what your site is saying to him or her; functions as part of a quality audit.

bi-directional language (bidi)

Includes text that flows from right to left and left to right. Arabic and Hebrew text, for example, flows from right to left, but also includes text that flows from left to right, such as numbers and text from other languages.
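
A minimal sketch of how mixed directionality shows up in practice: Python's standard `unicodedata` module reports the Unicode bidirectional class of each character. The sample string (a Hebrew word followed by digits and Latin letters) is an illustrative assumption, written with escapes so the source stays left to right.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each non-space character."""
    return [unicodedata.bidirectional(ch) for ch in text if not ch.isspace()]

# Hebrew word (right to left), then digits and Latin letters (left to right).
sample = "\u05e9\u05dc\u05d5\u05dd 123 abc"
classes = bidi_classes(sample)
print(classes)
# 'R' marks right-to-left letters, 'EN' European digits, 'L' left-to-right letters.
```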


bit

The smallest unit a computer can process is a bit, represented by a 1 or a 0. Computers think in numbers, represented in bits, which means every character and every web page is represented by a combination of 1s and 0s.


broadband

An Internet connection that is generally faster than a dial-up modem. Typical broadband connections include DSL (digital subscriber line), cable, or digital satellite.


byte

One byte is equal to eight bits.
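
To make the relationship between characters, bytes, and bits concrete, the sketch below prints the eight bits of each byte in a short ASCII string. The helper name `to_bits` is hypothetical.

```python
def to_bits(text, encoding="ascii"):
    """Return each byte of the encoded text as a string of eight bits."""
    return [format(byte, "08b") for byte in text.encode(encoding)]

bits = to_bits("Hi")
print(bits)  # ['01001000', '01101001']
```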



ccTLD

Country code top-level domain. See also [domain name.]

change order

When a translation or localization project expands beyond the quoted scope, the vendor typically creates a change order detailing the added charges and timetable, which must be approved by the client before work begins.


character

The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than the specific shape (that is, its rendering in a specific typeface), which is referred to as a glyph (source: Unicode Consortium; www.unicode.org).

character entity (named and numeric)

HTML includes character entities as an alternative method of representing characters. An entity can take the form of a named entity, such as &euro;, or a numeric entity, such as &#175;.
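
As a small illustration of the two entity forms, Python's standard `html` module resolves both named and numeric entities, and the `xmlcharrefreplace` error handler produces a numeric entity for any character that does not fit in ASCII.

```python
import html

named = html.unescape("&euro;")     # named entity
numeric = html.unescape("&#175;")   # numeric entity (decimal code point 175)
print(named, numeric)               # € ¯

# Going the other way: replace non-ASCII characters with numeric entities.
escaped = "€".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)                      # &#8364;
```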

character set

A collection of characters typically grouped by script.

character set conflict

HTML allows for the use of only one character set per web page. Sometimes the web page specifies characters from more than one character set, thereby creating a conflict. The long-term solution to such conflicts is the use of the “super character set” Unicode.


CL

See [controlled language.]

client

Can refer to an organization that hires translators or a translation/localization agency. A client can also refer to a web browser or email application.

CMS

See [content management system.]
coded character set

When the characters of a character set are mapped to numbers, or code points, the character set is called a coded character set. The distinction between character set and coded character set is important as computers work only in numbers.


Codepage

Microsoft has developed its own coded character sets, called Codepages. Each Codepage is typically a modification of an existing ISO character set. For example, Codepage 1252 is a modification of ISO 8859-1 in which extra characters were added.
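
The following sketch shows how a Codepage can differ from the ISO character set it is based on. It decodes the same two bytes with Python's `cp1252` codec (Codepage 1252) and its `latin-1` codec (ISO 8859-1); the byte values are chosen purely for illustration.

```python
raw = bytes([0x80, 0xE9])

as_codepage = raw.decode("cp1252")   # 0x80 is the euro sign in Codepage 1252
as_iso = raw.decode("latin-1")       # 0x80 is an invisible control character here

print(as_codepage)                   # €é
print(repr(as_iso))                  # '\x80é'
```

Both decodings agree on 0xE9 (é); they disagree only in the range where the Codepage added printable characters.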

computer-aided translation (CAT)

A broad term that may include a wide range of software tools designed to help translators work more quickly and/or improve the quality of their work. CAT tools range from electronic bilingual dictionaries to translation memory software.


concatenation

Linking two or more objects. Concatenation is often used in software and web development to create text strings that are assembled dynamically in pieces. Concatenation can cause problems when text is translated, as the pieces might no longer fit together so neatly.
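
A minimal sketch of the problem and one common workaround. The function names and the `TEMPLATES` table are hypothetical, and the Japanese string is only an illustrative translation. Concatenation hard-codes the English word order, while a whole-sentence template with named placeholders lets the translator reorder the pieces.

```python
def concat_message(page, total):
    # Word order is fixed in code, so a translator cannot rearrange it.
    return "Page " + str(page) + " of " + str(total)

# One complete sentence per language; placeholders can appear in any order.
TEMPLATES = {
    "en": "Page {page} of {total}",
    "ja": "{total}ページ中{page}ページ",
}

def template_message(lang, page, total):
    return TEMPLATES[lang].format(page=page, total=total)

print(concat_message(2, 10))          # Page 2 of 10
print(template_message("en", 2, 10))  # Page 2 of 10
print(template_message("ja", 2, 10))  # 10ページ中2ページ
```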

content audit

A comprehensive review of all web site content, typically conducted before redesigns, but also important before web globalization projects. A content audit helps an organization decide as early as possible which components of a site need to be localized and which will actually be localized.

content management system (CMS)

A CMS refers to software designed to help organizations manage the creation and dissemination of web content. CMS applications become increasingly important as web sites grow larger, more decentralized, and more sophisticated.

content negotiation

A still-evolving system of delivering language-specific content to web browsers based on the language preference of the web browser.
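
One way this can work is through the HTTP `Accept-Language` request header, which lists the browser's languages with optional `q` weights. The sketch below is a simplified parser and matcher under that assumption; the function names and the fallback rule (try the primary language subtag, then a default) are illustrative choices, not a complete implementation of the HTTP specification.

```python
def parse_accept_language(header):
    """Return (language tag, weight) pairs, highest weight first."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        weight = 1.0                      # default weight when no q value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0              # treat malformed weights as "not wanted"
        prefs.append((tag.strip().lower(), weight))
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)

def choose_language(header, supported, default="en"):
    """Pick the best supported language for the browser's stated preferences."""
    for tag, weight in parse_accept_language(header):
        if weight <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]       # fall back from "fr-ca" to "fr"
        if primary in supported:
            return primary
    return default

print(choose_language("fr-CA,fr;q=0.8,en;q=0.5", {"en", "fr", "de"}))  # fr
```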

controlled language (CL)

In the field of translation, a controlled language imposes strict rules designed to result in text that is more easily, clearly, and consistently translated. Rules often apply to terminology, grammar, and length of sentences. Also referred to as simplified language.


cookies

Think of cookies as nametags for computers, stored on a web user’s computer after visiting a web site. Many web sites rely on cookies to maintain a more personal relationship with visitors. Language and country preferences are increasingly being added to the list of data stored in cookies.
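
As a sketch of storing a language preference in a cookie, the example below uses Python's standard `http.cookies` module. The cookie name `lang`, the one-year lifetime, and the `en-US` default are assumptions made for illustration.

```python
from http.cookies import SimpleCookie

def build_language_cookie(locale_tag):
    """Build a Set-Cookie header line that remembers the visitor's locale."""
    cookie = SimpleCookie()
    cookie["lang"] = locale_tag
    cookie["lang"]["path"] = "/"
    cookie["lang"]["max-age"] = 60 * 60 * 24 * 365   # roughly one year, in seconds
    return cookie.output(header="Set-Cookie:")

def read_language_cookie(cookie_header, default="en-US"):
    """Read the locale back from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie["lang"].value if "lang" in cookie else default

print(build_language_cookie("fr-CA"))
print(read_language_cookie("lang=fr-CA; theme=dark"))
```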

customer relationship management (CRM)

The process of developing a better understanding of and a closer relationship with an organization’s customers. CRM software packages facilitate this process.


DBCS

See [double-byte character set.]

DHTML

See [Dynamic Hypertext Markup Language.]

DNS

See [Domain Name System or Domain Name Service.]
domain name

A unique alphanumeric string that identifies a particular computer or domain, such as “amazon.com.” The domain name consists of two parts: the top-level domain and the second-level domain. The top level is a generic top-level domain (gTLD) or country code top-level domain (ccTLD).

Domain Name System or Domain Name Service (DNS)

A distributed system of translating domain names into their matching numeric Internet Protocol (IP) addresses. The DNS allows web users to locate a web site by its easy-to-remember domain name rather than the more cumbersome IP address, which is a string of numbers.

double-byte character set (DBCS)

Some character sets contain more than 256 characters—the most that can be represented in a single-byte character set. Adding a second byte enables you to represent more than 65,000 characters. Double-byte character sets have been widely used for Chinese and Japanese.
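
The arithmetic behind those limits, plus a quick check with two legacy encodings that ship with Python (`shift_jis` for Japanese and `gb2312` for Chinese). The sample characters are chosen only for illustration.

```python
single_byte_limit = 2 ** 8     # 256 values fit in one byte
double_byte_limit = 2 ** 16    # 65,536 values fit in two bytes
print(single_byte_limit, double_byte_limit)

# Each of these characters occupies two bytes in its legacy encoding.
byte_lengths = {
    codec: len(ch.encode(codec))
    for ch, codec in [("語", "shift_jis"), ("中", "gb2312")]
}
print(byte_lengths)            # {'shift_jis': 2, 'gb2312': 2}
```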

Dynamic Hypertext Markup Language (DHTML)

A combination of scripts and standards that have evolved to make web pages more animated and interactive. Put simply, DHTML allows a web page to interact with the user long after it has been fully downloaded.

encoding, encoding scheme

A system of mapping characters to numbers so that computers can manipulate them. After a character set is encoded, it is called a coded character set. Encoding can be confusing because it also means the mapping of coded characters to actual byte values; for example, Unicode is a coded character set. Each character is assigned one number only, but each of these numbers can be assigned different byte values to accommodate computing systems, so there are several encodings: UTF-8, UTF-16, and UTF-32.
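
To illustrate the difference between a code point and its byte values, the sketch below takes one character, shows its single Unicode number, and then encodes it three ways. The big-endian variants are used so that no byte order mark is added.

```python
ch = "€"
code_point = ord(ch)            # one number per character: 8364 (hex 20AC)

# The same code point becomes different byte sequences in each encoding.
encoded = {name: ch.encode(name).hex()
           for name in ("utf-8", "utf-16-be", "utf-32-be")}

print(code_point)
print(encoded)
# {'utf-8': 'e282ac', 'utf-16-be': '20ac', 'utf-32-be': '000020ac'}
```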


euro

The official currency of 12 countries within the European Union.

exact match

When translation memory software scans new text, it looks for sentences that match previously translated sentences. If it finds a perfect match, ignoring any formatting information, the two are considered an exact match.

eXtensible Markup Language (XML)

XML, a markup language like HTML, acts as a flexible “content wrapper.”


FIGS

A common abbreviation for the European languages French, Italian, German, and Spanish.


font

A collection of glyphs typically grouped by language or script.

fuzzy match

When translation memory software scans new text, it looks for sentences that match previously translated sentences. When it encounters two sentences that are similar but not an exact match, it is called a fuzzy match. The degree of fuzziness is expressed as a percentage, such as “80% fuzzy match.”
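
A rough sketch of how exact and fuzzy matches might be scored. Commercial translation memory tools use their own matching algorithms; here Python's `difflib.SequenceMatcher` stands in for them, and the function names and the 75% threshold are assumptions.

```python
from difflib import SequenceMatcher

def match_percent(new_sentence, stored_sentence):
    """Similarity of two sentences as a whole-number percentage."""
    ratio = SequenceMatcher(None, new_sentence, stored_sentence).ratio()
    return round(ratio * 100)

def classify(new_sentence, stored_sentence, threshold=75):
    """Label a candidate as an exact match, a fuzzy match, or no match."""
    if new_sentence == stored_sentence:
        return "exact match"
    percent = match_percent(new_sentence, stored_sentence)
    if percent >= threshold:
        return f"{percent}% fuzzy match"
    return "no match"

stored = "Click the Save button."
print(classify("Click the Save button.", stored))
print(classify("Click the Cancel button.", stored))
print(classify("Restart your computer.", stored))
```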

G11N, g11n

The abbreviation for globalization; see globalization. The number 11 refers to the number of letters between g and n.

gateway (international gateway, global gateway)

The web interface, and underlying functionality, used to direct users to their localized web pages.

generational strategy

A gradual approach to web globalization in which a company could, for example, begin with one localized site and limited functionality and then build from there, gradually adding languages and layers of sophistication.


gisting

A very rough form of translation in which only the “gist” of the text is translated. Machine translation tools are generally only capable of gisting.

globalization (g11n)

Globalization means vastly different things to different people, but for the purposes of this book, it is the process of expanding an organization beyond its native market. When applied to a web site, globalization encompasses the full range of actions required to adapt that web site for new markets, such as business strategy, internationalization, localization, translation, testing, support, and promotion. Although globalization generally applies to expanding an organization’s geographic reach, it can apply to expanding an organization’s linguistic or cultural reach, such as an American company that translates its web site for Americans who do not speak English.

globalization management system (GMS)

Software designed to help an organization manage ongoing web localization, particularly text translation. Globalization software can include many of the features of content management systems or may work in tandem with such a system. Features might include workflow management, translation memory integration, and vendor management tools.

globalization workflow

Basically a to-do list of all the tasks needed to internationalize and localize a web site.

global resource file

In web globalization, a resource file contains all translatable text strings. Centralizing all text strings saves time and often reduces room for error.
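
A minimal sketch of the idea, assuming the strings are kept in a dictionary keyed by language and string ID. The IDs, the translations, and the fallback-to-English rule are all hypothetical; real projects typically store these tables in separate files.

```python
# All translatable strings live in one place, keyed by language and string ID.
RESOURCES = {
    "en": {"nav.home": "Home", "nav.contact": "Contact us"},
    "fr": {"nav.home": "Accueil"},
}

def get_string(key, lang, default_lang="en"):
    """Look up a string, falling back to the default language if it is missing."""
    fallback = RESOURCES[default_lang][key]
    return RESOURCES.get(lang, {}).get(key, fallback)

print(get_string("nav.home", "fr"))      # Accueil
print(get_string("nav.contact", "fr"))   # Contact us (not yet translated)
```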

global template

A web design template that standardizes such elements as navigation, colors, and typefaces, but allows room for localized content and promotions.


glyph

An image used to visually represent a character. In other words, it’s how a character is rendered in a particular typeface or font.


GMS

See [globalization management system.]
graceful degradation

The practice of building web pages so that they adapt themselves to the user’s browser. This practice can also be applied to global gateways.


gTLD

Generic top-level domain. See also [domain name.]


harmonization

Similar to standardization. The term has been traditionally used in the regulatory fields to describe the effort to standardize rules and regulations between countries. Harmonization seeks to minimize barriers to cross-border trade.


hexadecimal

A numbering system that uses 16 digits. The decimal system uses 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Hexadecimal uses these digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Unicode code points are conventionally written in hexadecimal notation, such as U+0041 for the letter A.
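
A short illustration of moving between decimal and hexadecimal, using the conventional `U+` notation for Unicode code points. The helper name `code_point_label` is hypothetical.

```python
def code_point_label(ch):
    """Format a character's code point in the conventional U+XXXX notation."""
    return f"U+{ord(ch):04X}"

print(code_point_label("A"))   # U+0041
print(code_point_label("€"))   # U+20AC

# Converting between the two numbering systems.
print(int("20AC", 16))         # 8364
print(format(255, "X"))        # FF
```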

Hypertext Markup Language (HTML)

A standard for publishing and viewing content on the Internet.

Hypertext Transfer Protocol (HTTP)

Defines how browsers and servers communicate with one another to effectively transmit and display HTML documents. A key component of this protocol with regard to the multilingual Internet is the communication of document encodings.
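
One place the document encoding is communicated is the `charset` parameter of the HTTP `Content-Type` header. The sketch below reads that parameter with Python's standard `email.message` module and uses it to decode a response body. The function name and the ISO 8859-1 fallback are assumptions made for illustration.

```python
from email.message import Message

def charset_from_content_type(header_value, default="iso-8859-1"):
    """Extract the charset parameter from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset(default)

# A server announces Shift_JIS; the client must decode the bytes accordingly.
charset = charset_from_content_type("text/html; charset=Shift_JIS")
body = "日本語".encode("shift_jis")
print(charset)                 # shift_jis
print(body.decode(charset))    # 日本語
```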

I18N, i18n

Abbreviation for internationalization. The number 18 refers to the number of letters between i and n.


ICANN

See [Internet Corporation for Assigned Names and Numbers.]
ideographic writing system

A writing system that relies on symbols that represent ideas, rather than sounds, to communicate meaning. Chinese began as an ideographic writing system.


IDN

See [internationalized domain name.]

IME

See [input method editor.]
input method editor (IME)

A software tool that enables a Latin keyboard to input non-Latin characters. IMEs are commonly used for Japanese, Chinese, and Korean characters.

International Organization for Standardization (ISO)

ISO is a voluntary, worldwide federation of national standards bodies founded in 1947. It promotes the development of standardization to facilitate the international exchange of goods and services and cooperation in the spheres of intellectual, scientific, technological, and economic activity. ISO includes one national standards organization from each of about 100 member countries (source: ISO; www.iso.ch).

internationalization (i18n)

The process of designing and building a web site (or software application) so that it can be more easily localized. Internationalization entails isolating the elements of a site that will need to be changed during localization and making the necessary allowances for their modification. Internationalization can be more extensive for an application that requires Asian localization versus one that might need only Western European localization.

internationalized domain name (IDN)

A system for allowing non-ASCII characters into the domain name. Currently, many registrars offer multilingual registration, but there is no official standard. For more information, visit the IETF IDN Working Group at www.ietf.org.

Internet Corporation for Assigned Names and Numbers (ICANN)

ICANN (www.icann.org) manages IP address space allocation, protocol parameter assignment, domain name system management, and root server system management functions.

Internet Engineering Task Force (IETF)

A group that works to solve the technical challenges of the Internet, such as the internationalization of domain names (www.ietf.org).

L10N, L10n

Abbreviation for localization. The number 10 refers to the number of letters between L and n.
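
The counting rule behind g11n, i18n, and L10n can be checked with a few lines of code. The helper name `numeronym` is hypothetical.

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) <= 2:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"

for term in ("globalization", "internationalization", "localization"):
    print(term, "->", numeronym(term))
# globalization -> g11n
# internationalization -> i18n
# localization -> l10n
```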

language pair

The combination of source language and target language, such as English → French. The arrow indicates the relationship between the two languages. Translators generally specialize in just one language pair—their native language and one other language.


locale

A combination of language and region or country, such as en-US or en-GB. A number of attributes are typically associated with each locale, such as language, number format, time and date formats, currency, and so forth.

localization (L10n)

The adaptation of a web site to a locale. The process can include a wide range of linguistic, cultural, and technical modifications, such as text translation; conversion of date, time, and measurement formats; and customization of the interface.

localization kit

Before web localization can begin, a localization kit is developed. It includes the necessary files, glossaries, guidelines, checklists, tracking sheets, and style guides to enable efficient and consistent translation. Also called a translation kit.

localization vendor

A localization vendor differentiates itself from a translation vendor by providing the technical expertise and tools necessary to localize HTML files, graphics, and databases, often in addition to translation.

localized façade

A site that translates only its first few pages.


MBCS

See [multibyte character set.]

m-commerce

Short for mobile commerce, the ability to access the Internet and conduct transactions by using a cellular phone or other mobile device.

machine translation (MT)

The process of translating from one human language to another using software. The term dates back to a time when computers were called machines. One of the most popular (and free) MT applications is Babel Fish (http://world.altavista.com).

masculinity (MAS) index

One of Geert Hofstede’s five dimensions of culture; measures how traditional roles are assigned to gender in different cultures. A high MAS index emphasizes division of labor and roles between genders.


metadata

Data that defines or describes the data being managed. It may include context descriptions, technical details, and instructions.


MT

See [machine translation.]
multibyte character set (MBCS)

A Latin-based language uses a relatively small set of characters, but some languages, such as Japanese or Chinese, rely on several thousand characters. To represent all these characters numerically so that computers can manage them, more than one byte is required. Unicode is a multibyte character set because it relies on encodings that vary in byte length, depending on the code point of the character being represented. For example, in the UTF-8 encoding, a basic Latin character remains a single byte, but a Chinese character typically requires three bytes.
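
The varying byte lengths are easy to observe with the UTF-8 encoding of Unicode. The sample characters below are chosen only for illustration.

```python
def utf8_length(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

lengths = {ch: utf8_length(ch) for ch in ("A", "é", "中")}
print(lengths)   # {'A': 1, 'é': 2, '中': 3}
```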

named character entity

See [character entity.]
numeric character entity

See [character entity.]
overweight web page

In broad terms, an overweight web page uses so many elements—text, graphics, functionality—that the page size negatively affects the download time for the end user. The weight of a page can be calculated in kilobytes (KB); the average web page weight is 89KB. If a web page is larger than 80KB, web users who rely on slower connections might not be patient enough to wait for it to display.

phonetic writing system

A writing system relying on symbols that represent sounds. The Latin alphabet is a phonetic system. There are two major phonetic systems: alphabetic (each character represents a vowel or consonant) and syllabic (each character represents a combination of consonants and/or vowels).

power-distance (PD)

One of Geert Hofstede’s five dimensions of culture; reflects the extent to which people within a culture accept (or expect) unequal power distribution. High PD countries exhibit centralized power structures and clearly defined class distinctions.


RFP

Request for Proposal.

rich text format (RTF)

An encoding system for formatted text that allows the transfer of documents between operating systems and applications without loss of formatting. Translators and translation agencies rely heavily on this file format because it preserves files in their native encoding and is platform neutral.

rollover image

An effect created by using an “on” and “off” graphic image, one on top of the other. When the web user’s mouse rolls over the off graphic, the on graphic appears. Also known as a mouseover. Rollover graphics are most frequently used on buttons.


RTF

See [rich text format.]

segment

In translation memory, the fundamental unit of text that can be stored into memory, typically a sentence.

source language

The language one translates from.

target language

The language one translates into.

terminology glossary

A collection of terminology, slogans, and navigational wording that must be consistently translated (or not translated) throughout the web site.

terminology manager

A software tool that aids in the translation process by storing source and target terms. Terminology managers are often included as part of a larger translation memory software product.

text contraction/text expansion

When text is translated into another language, the resulting target text might end up longer or shorter than the source text. For example, when English is translated into German, text can expand by as much as 40%; conversely, when English is translated into Chinese, the resulting text may be 20% shorter.


TM

See [translation memory.]

TMX

See [Translation Memory eXchange.]

translation

The process of transferring the meaning of the text from one language to another.

translation memory (TM)

The process of saving previous translations as source sentence/target sentence pairs so that they can be reused if a similar source sentence appears again. The larger a translation memory grows, the more valuable it generally becomes because it reduces the number of source sentences that require manual translation. Translation memory also aids in overall translation consistency.

Translation Memory eXchange (TMX)

A standard for enabling the exchange of translation memories between different software tools. It is managed by the Localization Industry Standards Association (LISA) and is designed to work with XML.

translation vendor

Translation vendors manage translators—in-house, freelance, or both. They might specialize in a few language pairs or could manage as many as 60 language pairs. A translation vendor manages, at a minimum, text translation. Some vendors also manage graphics localization and other technical aspects of web localization.


transliteration

Transferring the sound of the text in one language into the text of another language. Translation results in new words that convey the same meaning, but transliteration results in new words that sound like the old words, regardless of meaning. Transliteration is frequently used for creating “romanized” versions of text in Asian, Arabic, or Cyrillic languages.


Unicode

A universal coded character set, designed to include the characters from all the world’s major languages. Unicode Version 3.1 contains 94,140 encoded characters. The Unicode Consortium (www.unicode.org) developed Unicode.


An orderly process of checking files throughout the localization process to ensure that errors haven’t been introduced by translators, editors, or web developers.


Anything that changes from market to market, or even within a market, such as measurements and sizes or prices and currencies.


vendor

Can include individual translators, translation agencies (who manage multiple translators), localization agencies, or software vendors.

visual keyboard

A software tool available with the Microsoft Office suite that makes it easier for users to type languages not represented on the physical keyboard.


XML

See [eXtensible Markup Language.]


