Generic Internationalization

(Published in the LISA Globalization Insider, Number 2.3, May 22, 2002)

In the March 15th (2002) issue of the LISA newsletter (the Globalization Insider), in collaboration with Bert Esselink, we introduced the following formula and definition:

Formula

Globalization = Internationalization + N * Localization (where N is the number of targeted locales)

Definition

Internationalization of a thing consists in any and all preparatory tasks that will facilitate subsequent localization of said thing.

Together, these proposed internationalization as the first step in an efficient globalization process.

In this article, the intent is to further refine the definition of internationalization while remaining as general as possible, by posing the question: "What can we say about internationalization that remains generally true, independent of the thing being internationalized?"

This is basically an exercise in abstract modeling (the passion of a software architect) which introduces some rough new ideas like "multimedia glossary" and "localization memory". The intent, as usual, is to stimulate further thought.

Throughout the following discussion we will consider that globalization is a two-step process where internationalization is the first step whose purpose is to make the second step (localization) easier. "N" is always a reference to the number of target locales and "i18n" will occasionally be used as the abbreviation for internationalization.

Internationalization is just common sense

We made it clear in the March 15th article that internationalization was not required to define globalization; it was simply required to make the globalization process more efficient by facilitating localization. Thus, internationalization is not some fancy, complicated, new technical concept with uncertain benefits; it is a well-established management approach to get the job done faster and cheaper, simply by thinking ahead and doing things once rather than N times. Common sense, no? Yet, still today, the internationalization process is relatively unknown; many organizations are trying to reduce the costs of translation (notably for Web sites) but have failed to apply the enabling principles of internationalization.
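
The "once rather than N times" argument can be made concrete with a little arithmetic. The sketch below is only an illustration of the formula Globalization = Internationalization + N * Localization; the function names, the idea of a per-locale "rework" cost that internationalization removes, and all the numbers are assumptions made for the example, not figures from any real project.

```python
import math


def cost_without_i18n(n_locales, localize_cost, rework_cost):
    # Assumption: with no preparation, every locale pays for localization
    # plus its own ad-hoc rework of the source.
    return n_locales * (localize_cost + rework_cost)


def cost_with_i18n(n_locales, localize_cost, i18n_cost):
    # Globalization = Internationalization + N * Localization:
    # preparation is paid once, then each locale pays only for localization.
    return i18n_cost + n_locales * localize_cost


def break_even_locales(i18n_cost, rework_cost):
    # Smallest N for which internationalizing first is no more expensive.
    # From I + N*L <= N*(L + R) we get N >= I / R.
    return math.ceil(i18n_cost / rework_cost)


if __name__ == "__main__":
    # Hypothetical costs in arbitrary units.
    print(cost_without_i18n(5, localize_cost=10, rework_cost=4))  # 70
    print(cost_with_i18n(5, localize_cost=10, i18n_cost=12))      # 62
    print(break_even_locales(i18n_cost=12, rework_cost=4))        # 3
```

Under these made-up numbers, the one-time preparation pays for itself from the third locale onward; the real break-even point depends entirely on the project.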

Internationalization is pervasive

In fact, globalization as a whole is pervasive, since it is about the roles of language and culture in the communication process. The first rule of communication is to know your audience; anytime there is an interaction between your business (your products, your services) and your customers, globalization will come into play.

From the ads placed in magazines, the product brochures, and the user's guides and reference manuals, via the user interface in your programs and the pages on your web sites, to your reports and databases, any and all of these may need to be globalized.

Furthermore, globalization also involves the product specifications, the packaging, the deployment, the after-sales support, etc. In a phrase: globalization involves the whole business. Since globalization affects the whole business, and since every globalization effort will start with internationalization as the first step, it follows that internationalization can also involve the whole business; after all, common sense can always play a useful role at any link in the chain.

Things and not just content

As we have just seen, internationalization can apply to software programs, web sites, documentation, brochures, test scripts, install programs, etc. All of these examples are types of information or content (the popular but somewhat vague designation these days). However, in the above definition of internationalization we use thing because it is a more general term that encompasses not only content but also physical devices.

Consider, for example, 110V versus 220V power supplies. Many years ago, at ALIS, we bought English terminals and printers, arabized them and sold them to the Arabic world. Some Arabic countries had 220V electrical systems, others 110V. In the early 80s, we had to order distinct 110V or 220V device models from our suppliers. This increased inventory-carrying costs and opened the door for the occasional embarrassing mistake, e.g. a terminal going up in smoke when plugged in by the customer. A few years later, our suppliers had started to internationalize power supplies and all models now supported 110V and 220V, with a switch to set on the back of the device. A perfect solution as long as you did not forget to set the switch. Indeed, over the years, just about everybody who manned the stand at trade shows in Europe or the Middle East had the pleasure of frying a 110V device in a 220V outlet. Fortunately, a few years later, power supplies were more completely internationalized and automatically adapted to 110V or 220V; no more mistakes were possible, no more terminals toasted.

Locale Adaptability

The preceding example about power supplies also illustrates another generic aspect of internationalization, namely how flexibly the globalized thing can adapt to the locale. In the example, we saw that adapting at installation time by means of a 110V-220V switch was not as convenient as adapting to the voltage dynamically at run-time. Of course, developing a power supply capable of switching dynamically from 110V to 220V was more trouble.

We characterize locale adaptability by the time at which the globalized thing can adapt to the locale; here are some examples:

creation time
this corresponds to no internationalization whatsoever. Each localized thing is created separately. Rarely a good idea for software, but a common case for documentation.
installation time
some software allows locale specification as part of the installation process; or consider setting the 110V-220V switch when installing a printer.
start-up time
imagine software detecting the system locale at start-up time or better yet asking the user what locale he wants; this would be useful in an airport or hotel business center.
run-time
run-time adaptability is sometimes referred to as multilingualization, the capacity to change locale dynamically at any time. This is becoming more common; examples include Web sites and DVDs (I like to switch the movie back to English when my French-speaking wife goes to bed). Someday, when globalization technology is less primitive, this will be the most common approach.

Selecting the right level of locale adaptability will keep your customers satisfied and keep your costs down.
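
To make the difference between start-up time and run-time adaptability more tangible, here is a minimal sketch in Python. The class names, the `MESSAGES` table and its two locales are invented for the illustration; a real program would load its strings from resource files rather than from a dictionary in the source.

```python
# Hypothetical message table, keyed by locale code.
MESSAGES = {
    "en": {"greeting": "Welcome"},
    "fr": {"greeting": "Bienvenue"},
}


class StartupLocalizedApp:
    """Locale is fixed once, when the application starts."""

    def __init__(self, locale_code):
        # The message table is bound here and never changes afterwards.
        self._messages = MESSAGES[locale_code]

    def greet(self):
        return self._messages["greeting"]


class RuntimeLocalizedApp:
    """Locale can be switched at any time while the application runs."""

    def __init__(self, locale_code):
        self.set_locale(locale_code)

    def set_locale(self, locale_code):
        if locale_code not in MESSAGES:
            raise ValueError(f"unsupported locale: {locale_code}")
        self._locale_code = locale_code

    def greet(self):
        # The lookup happens on every call, so a locale switch takes
        # effect immediately.
        return MESSAGES[self._locale_code]["greeting"]


if __name__ == "__main__":
    app = RuntimeLocalizedApp("fr")
    print(app.greet())      # Bienvenue
    app.set_locale("en")
    print(app.greet())      # Welcome
```

The run-time version costs a little more design effort (every lookup must go through the current locale), which mirrors the power supply story: the more dynamic the adaptation, the more work up front.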

Myths of Internationalization

Before we go more deeply into specific internationalization tasks, it is wise to consider the following common misconceptions about internationalization which can easily lead to cost overruns, missed deadlines and generally unsatisfied expectations.

Myth #1: i18n is making things locale-independent

There is a simple mental model of internationalization in which text strings are removed from a program, leaving two well-defined pieces: one set of locale-dependent text strings and a program that is now locale-independent. This mental model is simple, satisfying and wrong 1! Internationalization is not about removing items to make things locale-independent; it is more about adding items and generalizing things to make them locale-aware so they can correctly support the N targeted locales. For example, supporting the right-to-left presentation direction for Arabic or Hebrew in a program requires a substantial amount of work. It will not be achieved by removing code, but rather by adding extra code. There is in fact no such thing as locale independence; if there were, our software could easily work in Martian! There is no free lunch.
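
As a small illustration of "adding code", consider just one of the many questions a right-to-left capable program must answer: in which direction should a given piece of text be laid out? The sketch below uses Python's standard `unicodedata` module and a simplified "first strong character" heuristic. It is not the full Unicode Bidirectional Algorithm, and the function name `base_direction` is ours.

```python
import unicodedata


def base_direction(text, default="ltr"):
    """Guess the base direction of text from its first strongly
    directional character (a simplification of real bidi handling)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew, Arabic and similar letters
            return "rtl"
        if bidi_class == "L":           # Latin and other left-to-right letters
            return "ltr"
    # Digits, spaces and punctuation carry no strong direction.
    return default


if __name__ == "__main__":
    print(base_direction("Hello"))   # ltr
    print(base_direction("مرحبا"))   # rtl
    print(base_direction("שלום"))    # rtl
    print(base_direction("2002"))    # ltr (falls back to the default)
```

None of this logic exists in a program that only ever handled English; it has to be written, tested and maintained, which is the point of the myth.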

Myth #2: the i18n effort is the same for all languages/locales

This second myth is a consequence of the first; if we believe that internationalization consists in "removing language dependencies" then all languages are surely the same!

However, once the first myth is dispelled, once we realize that we may actually have to do some real work to support any given language, the question then becomes: "How much work?". This is where we discover the language complexity scale for internationalization: Asian double-byte languages are more trouble than single-byte ones, and bi-directional languages are even more complex (not to mention Thai and Devanagari).
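
The first rung of that complexity scale is easy to demonstrate. The following sketch (the helper name and sample strings are ours) simply counts the bytes needed to store three characters in different encodings; any code that quietly assumes "one character = one byte" works for the first case and breaks for the others.

```python
def encoded_length(text, encoding):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # Three characters each, but very different storage needs.
    print(encoded_length("abc", "ascii"))         # 3
    print(encoded_length("日本語", "shift_jis"))   # 6
    print(encoded_length("日本語", "utf-8"))       # 9
```

Buffer sizes, field widths, truncation and cursor movement all have to be revisited once this assumption falls, and that is before any bi-directional layout is considered.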

Myth #3: the name of a locale is a sufficient specification

The basic requirement of an Arabic version, say, is to support the Arabic language. No other functional changes are required, right? Wrong! Beyond the stated requirement, one must look for derived requirements and side effects. Will the Arabic version need a phonetic search capability? Will the size of the Chinese font create problems for an embedded system? Will your Japanese version need to support ruby 2?

The Project Management Triangle

Consider the golden triangle of project management, whose three corners are time, cost and quality.

This triangle defines the aspects of a project that a manager must control. It also serves to illustrate the inter-dependence of these aspects, e.g. to increase quality, one must increase time or cost or both.

If internationalization is to facilitate localization in any real sense, then it must either reduce time, reduce cost or increase quality of the localization process. In what follows, we discuss various ways to achieve this.

Challenge the N locales

The best way to reduce the cost of a project is to reduce its scope, i.e. do less work. The simplest way to achieve this is to reduce the number of target locales.

It can be argued that establishing the list of target locales for any globalization project is a business decision (made by marketing or product management, etc.) and that internationalization happens only after such a decision has been made. Yet globalization projects are often very loosely defined, which usually translates into cost overruns and missed deadlines.

Consider a case where one salesperson has a good lead in Japan, another has a decent lead in Germany and a third has an unlikely lead in Kuwait, and all of a sudden some product manager, who knows just enough about internationalization to fall victim to its myths, has a bright idea: "Let's make our product international; let's make it language independent!" Sometimes this kind of suggestion may go directly to the development team, but this is the wrong way to make such a decision and it must be clearly challenged.

Imagine going back to the product manager and telling him he can have European languages in 3 months, but Asian languages will take at least 6 months and translation costs will be double, and Arabic would be a lot more work still. Will the N target locales change as a result?

The challenge, then, is all about providing the decision makers with accurate and realistic information about the cost and complexity of internationalization and subsequent localization.

Get specs for each locale

As we saw in the discussion of myth #3, each locale may have derived requirements that must be made explicit. Each locale may also have different translation needs, providing an opportunity to reduce the localization work even further. You can decide, case by case, to translate or not to translate certain things, depending on their intended audience. You can, for example, decide to translate documents, programs or web pages destined for the general user, but not those destined for programmers or administrators. This is a delicate decision for each locale: it may depend on the level of English (or other source language) literacy of the targeted audience in that locale, on the law in that country or simply on political sensitivities, or on what your competitors are doing. These decisions are difficult but they can save hundreds of thousands of dollars.

Ideally the specifications should be reviewed by representatives of each locale (e.g. sales offices, distributors, and so on). Once the specs for each locale have been validated, you can go back yet again to the product manager to produce a realistic ROI for each locale.
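
One way to make per-locale specifications explicit is to write them down as data rather than leave them in people's heads. The sketch below is a hypothetical shape for such a record; the field names, audience labels and sample locales are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LocaleSpec:
    """Explicit specification for one target locale (hypothetical shape)."""
    locale: str
    # Audiences whose material will be translated for this locale.
    translated_audiences: set = field(default_factory=lambda: {"end_user"})
    # Requirements that follow from the locale but were never stated.
    derived_requirements: list = field(default_factory=list)


def items_to_translate(spec, items):
    """Keep only the items whose audience is translated for this locale.

    items is a list of (name, audience) pairs.
    """
    return [name for name, audience in items
            if audience in spec.translated_audiences]


if __name__ == "__main__":
    items = [("user_guide", "end_user"), ("admin_manual", "administrator")]
    japan = LocaleSpec("ja-JP", derived_requirements=["ruby annotations"])
    print(items_to_translate(japan, items))   # ['user_guide']
```

A record like this gives the local representatives something concrete to review, and gives the product manager a scope from which a per-locale ROI can be estimated.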

Clean up the source

In today's fast-paced business environment, it is not uncommon for projects (be they software, web sites, documents, on-line help, etc.) to advance at a hectic pace. There is a lot of "let's get it working first, we'll clean up later", and later rarely comes. An internationalization project offers the chance to finally perform such long overdue clean-ups. Indeed, if something is about to be localized for N locales, it can be very cost effective to get rid of repetitions, throw away useless stuff, add some documentation and generally make the program cleaner, simpler and more maintainable 3.

For example, in large software systems it is common to have pieces that are still in source control, still being built as part of the product, but simply not used anymore. This happens because systems are always being built up, not down; if something is no longer needed, it just stays there and does not cause any problems. Removing useless components rather than spending time localizing them is very cost efficient.

The same can be done with documentation, removing unnecessary repetitions and verboseness. In fact, simply reviewing the text with the idea that each page costs US$1000 (say for localization into 10 languages) may yield substantial benefits.

Organize by locale-dependency class

A fundamental aspect of any internationalization effort is to determine, for any given thing, which parts need to be changed and which do not, i.e. which parts are locale-dependent and which parts are suitable for all target locales. If localization is to proceed smoothly, those parts that need to be localized should be easily recognizable and easily manipulable 4.

To this end, locale-dependent items must be separated from the rest, for example by keeping text and graphics on separate layers in a Photoshop file. Where possible, locale-dependent items are grouped together, providing a more easily manipulable package (e.g. Windows resource files, Java resource bundles).

In general, however, there are more than two locale-dependency classes. Consider a multilingual Web site with many pages, images and templates, organized as follows:

We see that pages and images have been grouped by language, that some images apply to all locales (e.g. the company logo) but that templates are divided by presentation direction. Templates do not contain any text; they simply specify how the text will be presented (fonts and layout). We see with this example that templates are not really "locale-independent", strongly suggesting once again that "locale-independence" is a misleading concept. We can also imagine other locale-dependency classes, for example, languages requiring phonetic search capabilities or locales where anything even remotely sexy is inappropriate.

We can now state in general terms that one of the fundamental tasks of internationalizing a thing is to organize its static structure to separate parts of said thing according to their locale-dependency class, to make the parts needing change more recognizable, accessible and manipulable.
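
For the multilingual Web site example, this organization can be sketched as a lookup in which each kind of resource is resolved according to its own locale-dependency class. The directory names, the `LANGUAGE_DIRECTION` table and the `resource_path` function below are hypothetical; they illustrate the principle rather than describe any particular site.

```python
# Presentation direction for each supported language (assumed list).
LANGUAGE_DIRECTION = {"en": "ltr", "fr": "ltr", "ar": "rtl", "he": "rtl"}


def resource_path(kind, name, language, localized_images=frozenset()):
    """Resolve a resource according to its locale-dependency class.

    localized_images is a set of (language, name) pairs for the images
    that exist in a language-specific version.
    """
    if kind == "page":
        # Pages depend on the language.
        return f"pages/{language}/{name}"
    if kind == "image":
        # Images depend on the language only if a localized version exists;
        # otherwise one common version serves all locales.
        if (language, name) in localized_images:
            return f"images/{language}/{name}"
        return f"images/common/{name}"
    if kind == "template":
        # Templates depend on presentation direction, not on language.
        return f"templates/{LANGUAGE_DIRECTION[language]}/{name}"
    raise ValueError(f"unknown resource kind: {kind}")


if __name__ == "__main__":
    localized = frozenset({("ar", "banner.png")})
    print(resource_path("page", "index.html", "fr"))
    print(resource_path("image", "logo.png", "ar", localized))
    print(resource_path("image", "banner.png", "ar", localized))
    print(resource_path("template", "article.tpl", "he"))
```

Adding Hebrew to such a site means new pages and perhaps a few images, but no new templates, because the right-to-left class already exists. That saving is exactly what the classification buys.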

Note that the above remains true even for such a thing as a power supply!

Clarify the source

Another way to reduce localization effort (time and cost) is to make the source material less ambiguous, easier to understand, easier to translate. The use of controlled language or applying specific grammar rules to textual data is one of the most cost-effective examples of this approach.

Consider also error messages within programs, which are notoriously ambiguous and are often provided to the translator with no context whatsoever (i.e. the translator has no idea under what conditions the message appears). It may be cost effective to clarify the error messages before torturing another N translators.
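
A simple safeguard is to store a note for the translator alongside every message and to check, before hand-off, that none is missing. The sketch below shows one possible form; the `Message` record, its fields and the sample messages are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    key: str
    text: str
    context: str  # note for translators: when and where the message appears


def missing_context(messages):
    """Return the keys of messages that carry no translator note."""
    return [m.key for m in messages if not m.context.strip()]


if __name__ == "__main__":
    messages = [
        Message("err.save", "Cannot save %s",
                "Shown in a dialog when writing the file fails; "
                "%s is the file name."),
        Message("err.open", "Open failed", ""),
    ]
    print(missing_context(messages))   # ['err.open']
```

Writing the note once, at the source, is far cheaper than having each of N translators guess or ask.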

The multimedia glossary

Consider the following simplified diagram illustrating the steps in creating something global (note that internationalization could indeed totally overlap creation):

The people involved in creation are 'creators' - programmers, technical writers, graphic artists, etc. A common problem is that the creators and their future translators are usually separated from each other by professional, departmental and often geographical boundaries. Creators and translators rarely talk to each other and creators are often disappointed by the translations they receive. Why is this not surprising?

Creators precisely create new things, new concepts and ideas, new words or new twists on existing words to describe new inventions. There is no magic here; after all, businesses are constantly creating new language (brand names are an obvious example). Unless translators are informed of the meaning and rules of this new language, they simply cannot translate it correctly.

The internationalization step should bridge the communication gap between creators and translators. In particular, creating a glossary that defines the business's new language will not only reduce translation time but also improve translation quality. Moreover, businesses not only invent new terms, they also invent new sounds, images, etc. If some of these multimedia items need to be localized, then it will also be necessary to describe the exact intent of each sound, image, etc. For this reason, we talk more generally of a multimedia glossary.
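
To give the idea of a multimedia glossary a slightly more concrete shape, here is a rough sketch of what its entries might record. The class names, fields and sample entries (including the product name "QuickSync") are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    name: str
    media_type: str        # "term", "image" or "sound"
    intent: str            # what the creator means to convey
    localize: bool = True  # False for items kept as-is, e.g. brand names


class MultimediaGlossary:
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.name] = entry

    def lookup(self, name):
        """Return the entry for name, or None if it is not defined."""
        return self._entries.get(name)

    def to_localize(self, media_type=None):
        """Names of entries that need localization, optionally by type."""
        return sorted(
            e.name for e in self._entries.values()
            if e.localize and (media_type is None or e.media_type == media_type)
        )


if __name__ == "__main__":
    glossary = MultimediaGlossary()
    glossary.add(GlossaryEntry("QuickSync", "term",
                               "Brand name of the sync feature", localize=False))
    glossary.add(GlossaryEntry("thumbs_up.png", "image",
                               "Signals that the operation succeeded"))
    glossary.add(GlossaryEntry("chime.wav", "sound",
                               "Gentle notification, not an alarm"))
    print(glossary.to_localize())          # ['chime.wav', 'thumbs_up.png']
    print(glossary.to_localize("image"))   # ['thumbs_up.png']
```

The important field is the intent: a localizer who knows that an image is meant to signal success can judge whether it sends the same signal in the target locale.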

An interesting approach to this communication problem is the TermGlobal service offered by Lionbridge; it is a web-based application that allows creators and translators to collaborate online to establish and maintain a glossary (see http://termglobal.lionbridge.com/).

The localization memory

If previous translation work has been done (say on a previous version of the same thing), then there may be a translation memory that will help reduce the cost of translating this new version. Organizations should consider translation memories as valuable knowledge assets. Once again though, we can generalize from text to multimedia. If a certain image has, in the past, been localized for use in Japan or Egypt, we would like this fact to be remembered and we would like to be able to retrieve the previously localized images for re-use, to avoid needless work duplication.

We propose the term localization memory to represent a tool or set of tools that will complement translation memories with the capacity to remember localization efforts on sounds, images, etc. One can even imagine forms of fuzzy matching on images; for example, discovering that two images have the same background and differ only in their text. Note also that the localization memory for 110V and 220V power supplies could be their schematic diagrams, be they in paper or in electronic format.
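
As a thought experiment, a very small localization memory might look like the sketch below. It is an assumption-laden toy: text is matched fuzzily with Python's standard `difflib`, while images and sounds are matched only exactly, by a hash of their bytes. The fuzzy matching on images imagined above is not attempted, and the class and method names are ours.

```python
import difflib
import hashlib


class LocalizationMemory:
    """Toy memory for previously localized text and media assets."""

    def __init__(self):
        self._text = {}     # (source text, locale) -> translation
        self._assets = {}   # (sha256 of source bytes, locale) -> localized path

    def remember_text(self, source, locale, translation):
        self._text[(source, locale)] = translation

    def lookup_text(self, source, locale, cutoff=0.8):
        """Return (matched source, translation) for the closest remembered
        text in this locale, or None if nothing is similar enough."""
        candidates = [s for (s, loc) in self._text if loc == locale]
        matches = difflib.get_close_matches(source, candidates, n=1, cutoff=cutoff)
        if not matches:
            return None
        return matches[0], self._text[(matches[0], locale)]

    def remember_asset(self, source_bytes, locale, localized_path):
        digest = hashlib.sha256(source_bytes).hexdigest()
        self._assets[(digest, locale)] = localized_path

    def lookup_asset(self, source_bytes, locale):
        """Exact match only: the same source bytes localized before."""
        digest = hashlib.sha256(source_bytes).hexdigest()
        return self._assets.get((digest, locale))


if __name__ == "__main__":
    memory = LocalizationMemory()
    memory.remember_text("Save the file", "fr", "Enregistrer le fichier")
    print(memory.lookup_text("Save the files", "fr"))
    memory.remember_asset(b"fake image bytes", "ja", "images/ja/banner.png")
    print(memory.lookup_asset(b"fake image bytes", "ja"))
```

Even the exact-match half is useful: it answers the question "have we already localized this very image for Japan?" before anyone opens a graphics editor.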

Deliver a localization kit

Finally, any well-defined project should have a clear deliverable. The deliverable for any internationalization project is the localization kit which should contain, at a minimum:

  • The source things to be localized: reduced, rationalized, cleaned-up, organized and clarified as discussed above
  • The multimedia glossary that provides semantics for all invented items
  • Previous localization memories, if any exist, to re-use past knowledge
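
The minimum contents listed above lend themselves to a simple completeness check before the kit is handed over. The following sketch assumes a hypothetical `LocalizationKit` record; the field names and file names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class LocalizationKit:
    sources: list                                  # cleaned-up things to localize
    glossary: list                                 # multimedia glossary entries
    memories: list = field(default_factory=list)   # optional: may not exist yet


def missing_parts(kit):
    """Return the names of mandatory parts that are empty."""
    missing = []
    if not kit.sources:
        missing.append("sources")
    if not kit.glossary:
        missing.append("glossary")
    # Previous localization memories are included only "if any exist",
    # so an empty list is not an error.
    return missing


if __name__ == "__main__":
    kit = LocalizationKit(sources=["ui_strings.txt", "help/"], glossary=[])
    print(missing_parts(kit))   # ['glossary']
```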

The participants in an internationalization project should remember that the translator is a knowledge worker in a production environment and that it is their job to make the translator's job easier.

Footnotes

  • 1 Or, at least, woefully incomplete!
  • 2 Ruby are small phonetic symbols usually written on top of the ideographic Japanese characters to instruct the reader on how to correctly pronounce the characters below.
  • 3 For better or for worse, internationalization often happens as a "maintenance" activity.
  • 4 While not in the MS-Word spell-checker, manipulable is in the Merriam-Webster online dictionary.