internationalization guidelines 2

8/9/2019 Internationalization Guidelines 2 http://slidepdf.com/reader/full/internationalization-guidelines-2 1/44

Upload: ivaylo

Post on 30-May-2018


Table of Contents

Overview
Top Five Reasons Why Product Localization Fails
Gettext in a nutshell
Pango
The Locale Model
Separation of Code and Resources
Calendar Differences
    Date Formats
    Time Formats
Numbers and Currency
Weights and Measures
Capitalization, Uppercasing, and Lowercasing
Comparing and Sorting
Keyboard
Composite Messages
    Using the Same Variable for Different Purposes
    Plural Constructions
    Punctuation and Spacing
    Dynamic Text Insertion
Variable Order
Character Sets
    Hard-Coded Characters
English is Compact
    Screen Layout
    String Buffers
Internationalization: A Summary
Appendix A: Reference
    Localization and Internationalization Terms
    Character Set and Encoding Terms
    Examples of Supported Languages
    Components to be Localized
    User Interface Design Guidelines
Appendix B: Coding Requirements
    BIOS, Firmware, and UEFI Requirements
Appendix C: Gettext Tutorial
    Compiling and testing source code
    Internationalization: program mdtest 1.0
    Internationalization: library mdtestlib 1.1.0
    Translators: creating a new language version


Overview

Many software projects have encountered preventable internationalization failures:

•  English version fails to install on localized OS.
•  English version crashes on localized OS.
•  English version is unable to accept Unicode characters.
•  One defect in the English product causes 24 defects in 24 languages.
•  And so on.

To prevent localization defects:

Requirement: Internationalization Prior To Localization

Description: The software shall be internationalized prior to localization. Proper internationalization is a basic requirement for software architecture, regardless of whether there are plans to localize the software or not.

Software itself is perhaps the most important component to internationalize correctly. If the user interface is not segregated from application code, localization can be tedious and time-consuming. The goal of software internationalization is to permit localization that does not require changes to application code.

Top Five Reasons Why Product Localization Fails

Improper or incomplete internationalization of the product

Many internationalization efforts fail because they are inaccurate or simply incomplete. Are you covering all of the following?

1.  Follow established internationalization standards to prepare code for localization. Adopt Unicode and externalize user strings.
2.  Perform pseudo-translations, and carry out quality assurance steps.
3.  Create a complete localization kit that includes the resource bundles, install script, help manuals, and any other files that end users see when they’re using your product.
4.  Double-check your localization kit for completeness and accuracy before the localization effort starts.

Lack of process

Not having a localization process (or using an outdated, unproven, or incomplete process) can have long-term consequences for your product's future updates and success. Before you begin localization, design a plan for carrying out these key steps:

•  Preparing the files
•  Building the translation database
•  Leveraging the translation
•  Reusing the translation

Every company should establish a localization process that permits easy file processing and translation reuse. A collection of project reference materials – style guides, translation databases, glossaries for each language in your target market – is also essential.

Inadequate budgets

There are very inexpensive ways to produce translations. Machine translation is one way that can be effective when all that’s needed is the essence of a document. But the essence is seldom enough, and it’s never acceptable when international releases are the goal. Professional translations and localization are costly, and require a financial commitment, first for the initial effort, and then for ongoing maintenance.

Unrealistic schedules

Having the right strategy, a strong process, and a large team can help expedite localized releases, but there is a minimum time investment for a quality result that a rush job simply can’t satisfy. Give localization projects the time they deserve, even if that comes at the expense of time-to-market. A short delay in a successful product release is always preferred over a timely release of a potentially failed product.

Inexperienced staff

Any localization project calls for good project managers, translators, and engineers. Hire experienced translators armed with an excellent command of the source and target languages as well as a good knowledge of your product's subject area. Complement them with competent engineering staff. Then, delegate authority to a capable project manager who's tasked with delivering results on time, on budget, and within pre-established quality standards.

In short, a policy to have the best technology, people, and processes is still the best recipe for localization success.


Gettext in a nutshell

gettext is the GNU i18n library and one of the most important i18n standards in Free Software. Translators understand it, and there are a variety of tools available that work with gettext data.

Add this to the headers of the C module in which you have your main() function:

#ifdef USE_GETTEXT
#include <libintl.h>
#include <locale.h>
#define _(String) gettext (String)
#else
#define _(String) String
#endif

Add this at the beginning of your main() function:

#ifdef USE_GETTEXT
setlocale (LC_MESSAGES, "");
setlocale (LC_CTYPE, "");
setlocale (LC_COLLATE, "");
textdomain ("my-program");
/* A NULL directory leaves the default message catalog location unchanged */
bindtextdomain ("my-program", NULL);
#endif

That’s all for the initialization of gettext. How about the rest of your code? In each module, you must add this to the headers:

#ifdef USE_GETTEXT
#include <libintl.h>
#define _(String) gettext (String)
#else
#define _(String) String
#endif

Now you can internationalize whatever text you want: just write _("anything") instead of just "anything":

printf(_("anything"));

Compile your program with -DUSE_GETTEXT, or at least #define USE_GETTEXT somewhere.

To generate the .pot templates, and install the .mo translation modules in your system:

xgettext -k_ -o my-program.pot *.c *.h --from-code=iso-8859-1

Note: To help translators, you can create separate .pot files for the different parts of your program instead of one very large file, and then join all the .po files into a single one with msgcat before converting it to .mo:

msgcat -o my-program.es.po my-program.part1.es.po my-program.part2.es.po my-program.part3.es.po

To compile the .po files into .mo:

msgfmt my-program.es.po -o my-program.es.mo

Then install it in the proper directory of your system:

cp my-program.es.mo /usr/share/locale/es/LC_MESSAGES/my-program.mo

Pango

Pango is a library for laying out and rendering text, with an emphasis on internationalization. Pango can be used anywhere that text layout is needed, though most of the work on Pango so far has been done in the context of the GTK+ widget toolkit. Pango forms the core of text and font handling for GTK+-2.x.

Pango is designed to be modular; the core Pango layout engine can be used with different font backends. There are three basic backends, with multiple options for rendering with each.

•  Client side fonts using the FreeType and fontconfig libraries. Rendering can be with Cairo or Xft libraries, or directly to an in-memory buffer with no additional libraries.
•  Native fonts on Microsoft Windows using Uniscribe for complex-text handling. Rendering can be done via Cairo or directly using the native Win32 API.
•  Native fonts on Mac OS X using ATSUI for complex-text handling, rendering via Cairo.

The integration of Pango with Cairo (http://cairographics.org/) provides a complete solution with high quality text handling and graphics rendering.

Dynamically loaded modules then handle text layout for particular combinations of script and font backend. Pango ships with a wide selection of modules, including modules for Hebrew, Arabic, Hangul, Thai, and a number of Indic scripts. Virtually all of the world's major scripts are supported.

As well as the low level layout rendering routines, Pango includes PangoLayout, a high level driver for laying out entire blocks of text, and routines to assist in editing internationalized text.

Pango depends on the 2.x series of the GLib library; more information about GLib can be found at http://www.gtk.org/.

The Locale Model

The ISO C-language standard defines the concept of locale, and includes several functions that use locales. A locale is a collection of rules and data specific to a language and a geographic area. Locales include information on sorting rules, date and time formatting, numeric and monetary conventions, and character classification.

Most of the popular platforms and programming standards support the locale model; many of them add enhancements.

Summary: Employ the ISO-C locale model in the design of your application.
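As a rough illustration of the locale model, the sketch below uses only the standard setlocale() and localeconv() functions to read one locale-dependent convention, the decimal point. The helper name decimal_point_in() is hypothetical and not part of any library.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: returns the decimal-point string of the named
   locale, or NULL if that locale is not available on this system.
   Note: it switches the numeric locale as a side effect, and the
   returned pointer is only valid until the next setlocale() or
   localeconv() call. */
const char *decimal_point_in(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

Passing "C" selects the minimal default locale, whose decimal point is "."; passing "" selects whatever the user's environment specifies. Other locale names are platform-specific, so the NULL check matters.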

Separation of Code and Resources

Separating source code from user interface text is the first and most basic requirement for internationalization. You should create and maintain one internationalized code base that is used for all language versions.

Summary: Separate your source code from locale-dependent resources. Never hard-code text strings.

Calendar Differences

Calendars and calendar layout vary country to country. Some applications, like spreadsheets or databases, require date calculations. It is necessary to support Julian dates in these applications.

Summary: Your application should support the correct calendar conventions for each of your target markets. Whenever possible, make use of the international support included in the operating system or platform for which you are developing.

Date Formats

Even though the Gregorian (seven-day week, 12 month) calendar is standard in most European countries, date presentations vary. Often, the day comes first, followed by the month, and then the year. As in the United States, however, there are a variety of ways to represent dates in each country. Some examples of common date formats are given below:

January 14, 2009     US
14 janvier 2009      France
14. Januar 2009      Germany
14.1.09              Italy

Try to avoid day-of-week abbreviations. In some languages, such as Hebrew, all the days of the week begin with the same letter. Instead, use functions to perform necessary date abbreviations.

Summary: Avoid hard-coded date formats. When developing C programs, use functions such as strftime() to format dates.
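The summary above might be applied as in the following sketch, which lets strftime() choose the date layout through the "%x" conversion instead of hard-coding one. The helper name format_local_date() is an assumption made for illustration.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: formats a calendar date using the current
   locale's preferred date representation ("%x").
   Returns the number of characters written, or 0 on failure. */
size_t format_local_date(char *buf, size_t size, int year, int month, int day)
{
    struct tm tm = {0};

    tm.tm_year  = year - 1900;  /* struct tm counts years from 1900 */
    tm.tm_mon   = month - 1;    /* and months from 0 */
    tm.tm_mday  = day;
    tm.tm_hour  = 12;           /* midday avoids daylight-saving edge cases */
    tm.tm_isdst = -1;           /* let mktime() work out daylight saving */

    if (mktime(&tm) == (time_t) -1)  /* also fills in tm_wday and tm_yday */
        return 0;
    return strftime(buf, size, "%x", &tm);
}
```

In the default "C" locale, 14 January 2009 comes out as 01/14/09. After a call such as setlocale(LC_TIME, "") the same code follows the user's conventions, which is the point of leaving the format to the library.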


Time Formats

Time formats and notations also vary around the world: some countries use the twelve-hour clock, others the twenty-four-hour clock. Here are some examples:

9:35 PM      US
21:35        France
21 h 35      French Canada
kl 21.35     Sweden

Summary: Avoid hard-coded time formats. When developing C programs, use functions such as strftime() to format time.

Numbers and Currency

The conventions for displaying numbers and currency also vary from country to country. Your software must support decimal and thousand separators, grouping, currency symbols, and negative signs according to each locale's requirements. For example:

123,456      US
123.456      Germany
123 456      Sweden

Summary: Use functions that separate the internal from the external representation of numbers.
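One way to keep the internal value separate from its displayed form is sketched below. The function group_digits() is hypothetical; it assumes groups of three digits, whereas a fuller implementation would also consult the grouping field returned by localeconv().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `value` into `out` with `sep` inserted
   between groups of three digits. Assumes a fixed group size of three.
   Returns 0 on success, -1 if `out` is too small. */
int group_digits(unsigned long value, const char *sep, char *out, size_t size)
{
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", value);
    size_t sep_len = strlen(sep);
    size_t pos = 0;

    for (int i = 0; i < len; i++) {
        /* Insert a separator when the digits still to come form whole groups */
        if (i > 0 && (len - i) % 3 == 0) {
            if (pos + sep_len >= size)
                return -1;
            memcpy(out + pos, sep, sep_len);
            pos += sep_len;
        }
        if (pos + 1 >= size)
            return -1;
        out[pos++] = digits[i];
    }
    out[pos] = '\0';
    return 0;
}
```

The separator would normally come from localeconv()->thousands_sep rather than from a literal. With "," the value 123456 is rendered as 123,456, with "." as 123.456, and with " " as 123 456, matching the three examples above.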

Weights and Measures

The metric system is used in most of the world; the United States is the main exception. If your application specifies units of measure in inches and feet, make sure to include the capability to process metric units of measure as well.

Summary: Design your application to allow the user to choose between different measurement systems.
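A simple way to support this choice is to store values in one unit internally and convert only for display. The sketch below assumes millimetres as the internal unit; the type and function names are hypothetical, and the unit labels are hard-coded only for brevity (in a product they would live in the resource file).

```c
#include <stdio.h>

/* Hypothetical sketch: lengths are stored internally in millimetres and
   converted only when they are shown to the user. */
typedef enum { UNITS_METRIC, UNITS_US_CUSTOMARY } unit_system;

/* Formats a length for display in the user's chosen unit system.
   Returns the value snprintf() returns. */
int format_length(char *buf, size_t size, double millimetres, unit_system units)
{
    if (units == UNITS_US_CUSTOMARY)
        return snprintf(buf, size, "%.2f in", millimetres / 25.4);
    return snprintf(buf, size, "%.1f cm", millimetres / 10.0);
}
```

Because the stored value never changes, switching the user's preference only changes which branch formats it.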

Capitalization, Uppercasing, and Lowercasing

People in different countries, using the same language and alphabet, do use different capitalization rules. Some languages, such as Arabic, Hebrew, and Thai, don’t use capitalization at all.

/* Do not use a formula, as is done in this example. Extended
   characters fall outside the specified range */
if ( (c >= 'a') && (c <= 'z') ) bLower = TRUE;

/* Do use appropriate functions which change behavior according to the
   selected locale; islower() is declared in <ctype.h> */
bLower = islower((unsigned char) c);

Summary: Never use hard-coded character ranges or arithmetic to test or change case. Instead, use functions sensitive to the selected locale.

Comparing and Sorting

String comparisons and sorts consume a significant share of processing time in most applications, and may consume even more in other languages because of their extended character sets. For this reason, your sorting routines should be as efficient as possible.

Two languages using the same alphabet do not necessarily use the same sort order. For instance, the sort order for Russian is slightly different from that for Serbo-Croatian or Bulgarian.

In many languages, accented characters are sorted with their non-accented equivalents. However, most Scandinavian languages sort the 'ö', 'ü', and other accented vowels after the 'z' character.

The sorting of double letters varies according to language. In Danish and Norwegian, 'aa' comes at the very last position in the alphabet, but this is not the case in Swedish. In Spanish-speaking Latin America the 'ch' is considered a single letter and is sorted between 'c' and 'd', while in Spain this is no longer the case. Spain has redefined 'ch' to be sorted as two separate letters, as had been the practice worldwide prior to 1840.

The additional work required to properly sort a variety of languages will often increase code size.

Summary: Use locale-specific functions such as strcoll() and strxfrm() to compare and sort strings.
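A minimal sketch of locale-aware sorting, using the standard qsort() with strcoll() as the comparison. The names compare_locale() and sort_strings() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparison callback: each element is a `const char *`, so the
   callback receives pointers to those pointers. */
static int compare_locale(const void *a, const void *b)
{
    const char *left  = *(const char *const *) a;
    const char *right = *(const char *const *) b;
    return strcoll(left, right);
}

/* Hypothetical helper: sorts an array of strings in place using the
   collation order of the current LC_COLLATE locale. */
void sort_strings(const char **items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_locale);
}
```

In the default "C" locale strcoll() orders strings the same way strcmp() does; the language-specific orders described above only take effect once the program has selected a locale with setlocale(LC_COLLATE, ...). For large lists, transforming each key once with strxfrm() and comparing the results with strcmp() may be faster.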

Keyboard

If your application uses specific keyboard entries, be aware that keyboard layouts differ from one country to another. For example, German keyboards reverse the positions of the Y and Z keys relative to American keyboards, among other things.

Summary: Never assume that a particular key location or a particular key sequence generates a particular character.

Composite Messages

One of the most difficult problems to overcome in software translation is the generation of correct composite messages. For various (and usually very good) reasons, developers compose messages by concatenating strings. This causes problems during localization, as explained below. The most "localizable" applications are those that minimize, or even avoid, composite messages.


Using the Same Variable for Different Purposes

Many developers assume that the simplicity of English is also found in other languages. For instance:

#define STR_FILE_MSG "File: %s\n"
#define STR_DIR_MSG  "Directory: %s\n"
#define STR_NONE     "None"

sprintf( OutBuf, STR_FILE_MSG,
         (is_empty(szFileName) ?
          STR_NONE :
          szFileName) );

sprintf( OutBuf, STR_DIR_MSG,
         (is_empty(szDirName) ?
          STR_NONE :
          szDirName) );

This code works fine in English, but in German where "the file" is "die Datei" and "the directory" is "das Verzeichnis," the translation of "None" changes according to the gender of the noun to which it refers. In this situation, you would need "keine" for "Datei" and "kein" for "Verzeichnis." Since the code example has only one STR_NONE defined, a correct translation for both cases cannot be made. During localization, this code would require modification.

Summary: Never use the same text constant for two different purposes.
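A hedged sketch of the fix: give each context its own constant, even though both happen to read "None" in English. The constant names STR_FILE_NONE and STR_DIR_NONE and the helper functions are hypothetical.

```c
#include <stdio.h>

/* One "none" string per context, so each can be translated to agree with
   its noun (for German: "keine" for the file, "kein" for the directory). */
#define STR_FILE_MSG  "File: %s\n"
#define STR_DIR_MSG   "Directory: %s\n"
#define STR_FILE_NONE "None"
#define STR_DIR_NONE  "None"

/* Hypothetical helper: treats NULL or "" as "no name given". */
static int is_blank(const char *s) { return s == NULL || s[0] == '\0'; }

int format_file_line(char *buf, size_t size, const char *file_name)
{
    return snprintf(buf, size, STR_FILE_MSG,
                    is_blank(file_name) ? STR_FILE_NONE : file_name);
}

int format_dir_line(char *buf, size_t size, const char *dir_name)
{
    return snprintf(buf, size, STR_DIR_MSG,
                    is_blank(dir_name) ? STR_DIR_NONE : dir_name);
}
```

The English output is unchanged, but a German resource file can now define the two constants differently without any change to the code.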

Plural Constructions

The rules for constructing plural nouns vary according to language. The best solution to this problem is to spell out each possible pluralization in your resources, rather than constructing them at runtime.

/* Do not create plurals by adding an 's', as shown here, since
   this won't work reliably */
#define STR_FILE_MSG  "The total is %d file%s."
#define STR_HORSE_MSG "The total is %d horse%s."
...
#define PLURAL(x) ( ((x) == 1) ? ("") : ("s") )
...
sprintf(OutBuf, STR_FILE_MSG, nFiles,
        PLURAL(nFiles));
sprintf(OutBuf, STR_HORSE_MSG, nHors,
        PLURAL(nHors));

Sometimes adding an "s" to make a plural works in a foreign language, but never reliably. For example, in French, the plural for "fichier" (file) is "fichiers." However, the plural for "cheval" (horse) is "chevaux." In this case, the above example would have converted "cheval" to "chevals," which is unacceptable.

It is very difficult to translate messages that are structured like the ones given in this (negative) example. If the macro generating the "s" is located in a different part of the code than the message, or perhaps even in a different file, the correct translation will be difficult to determine.

Summary: "Tricky" code that works in English may not work in other languages. If possible, spell out all plurals rather than constructing them using concatenation.
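Spelling out the plurals could look like the sketch below, where every variant is a complete sentence. The constant names and format_count() are hypothetical, and the (n == 1) test is the English rule, assumed here only for illustration. GNU gettext offers ngettext() for the same purpose, with the selection rule supplied per language by the translation.

```c
#include <stdio.h>

/* Every variant is a complete sentence that a translator can rewrite. */
#define STR_ONE_FILE    "The total is %d file."
#define STR_MANY_FILES  "The total is %d files."
#define STR_ONE_HORSE   "The total is %d horse."
#define STR_MANY_HORSES "The total is %d horses."

/* Hypothetical helper: picks the singular or plural sentence.
   The (n == 1) test is the English rule and is an assumption of this
   sketch; languages with more plural forms need more variants and a
   per-language selection rule. */
int format_count(char *buf, size_t size, int n,
                 const char *one_msg, const char *many_msg)
{
    return snprintf(buf, size, (n == 1) ? one_msg : many_msg, n);
}
```

A French resource file can then hold "chevaux" directly, with no string arithmetic in the code.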

Punctuation and Spacing

Punctuation and spacing rules vary according to language. For instance, if you develop a text wrapping function, do not assume that all languages will use two spaces after a period, or no space before a colon.

Example:

/* Do not hard code punctuation, as is done here */
#define STR_POPULATION "Population"
#define STR_CAPITAL    "Capital"
...
sprintf(OutBuf, "%s: ", STR_POPULATION);
sprintf(OutBuf, "%s: ", STR_CAPITAL);
/* This code displays:
   Population:
   Capital:
*/

In this case, the colon ':' following each word is part of the code. Some languages, such as French, require a space before a colon. A translator who sees the words "Population" and "Capital," but not the code, has no way of knowing that each word is displayed with a colon. The translator will not add the required spaces. A small mistake like this might be difficult to catch.

Summary: Always treat punctuation and spacing as text; include them as part of their associated string resource.
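A corrected version might look like the following sketch, in which the colon and the space travel with the string. The constant names and format_label() are hypothetical.

```c
#include <stdio.h>

/* The colon and the trailing space are part of each translatable string,
   so a French translator can write "Population : " with the space that
   French typography requires. */
#define STR_POPULATION_LABEL "Population: "
#define STR_CAPITAL_LABEL    "Capital: "

/* Hypothetical helper: the code adds no punctuation of its own. */
int format_label(char *buf, size_t size, const char *label)
{
    return snprintf(buf, size, "%s", label);
}
```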

Dynamic Text Insertion

A program that inserts user-entered text into a stored string may produce syntactical errors in any language, including English. For example, most words that start with a vowel should be preceded by the article "an" instead of "a," but not all. "A user" is correct, but so is "an uploaded file." Many languages pose additional problems for text insertion with possessives and adjectives that can change according to the gender and number of the subject.

Consider this simple English example of an article problem:

#define STR_FOUND_MSG  "Found a %s character"
#define STR_WORD_COMMA "comma"
#define STR_WORD_ASTER "asterisk"
...
char *item;
...
item = (c == '*') ?
       STR_WORD_ASTER :
       STR_WORD_COMMA;
sprintf (OutBuf, STR_FOUND_MSG, item);

If a comma is found, this code correctly displays "Found a comma character," but when an asterisk is found, the code awkwardly displays "Found a asterisk character."

Summary: Avoid dynamic text insertion whenever possible. Instead, define complete messages in the resource file.
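Defining complete messages could be sketched as follows. The constants STR_FOUND_COMMA and STR_FOUND_ASTER and the function found_message() are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

/* Each message is stored whole, so the article can be correct in English
   and a translator can restructure the sentence freely. */
#define STR_FOUND_COMMA "Found a comma character"
#define STR_FOUND_ASTER "Found an asterisk character"

/* Hypothetical helper: returns the complete message for character c,
   or NULL if this sketch has no message for it. */
const char *found_message(char c)
{
    switch (c) {
    case ',': return STR_FOUND_COMMA;
    case '*': return STR_FOUND_ASTER;
    default:  return NULL;
    }
}
```

The cost is one string per case, which is usually small compared with the cost of fixing a grammatically wrong message in every language.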

Variable Order

One of the most difficult problems in message translation is the localization of strings with multiple text variables.

#define STR_THIS_MSG "This is my %s %s"
#define STR_RED "red"
#define STR_CAR "car"
...
strcpy (color, STR_RED);
strcpy (object, STR_CAR);
...
sprintf (OutBuf, STR_THIS_MSG, color, object);
/* Prints:
   This is my red car
*/

The appropriate ordering of noun and adjective varies among languages. In the previous example, the "color, object" variable order would require language-by-language re-coding during localization.

A better approach is to use a function to order the parameters, which moves the reordering work from the code to a header file.

Here's an example that uses an order-independent version of sprintf() called StrParam():

#define STR_THIS_MSG "This is my $1s $2s"
...
sprintf(OutBuf, StrParam(STR_THIS_MSG, color,
                         object));

Here, with a small change to the header file, the French translation displays correctly.

#define STR_THIS_MSG "Voici ma $2s $1s"
...
sprintf(OutBuf, StrParam(STR_THIS_MSG, color,
                         object));

However, note that the possessive "my" must still agree with the gender of the variable noun.

Here are two strategies to internationalize this sort of text by circumventing the problems caused by concatenation. The first is simply to include a unique string for each possibility, which leaves nothing to the translator's imagination and inherently supports accurate localization. Memory requirements, however, will be higher. For example:

#define STR_REDCAR_MSG   "This is my red car"
#define STR_BLUECAR_MSG  "This is my blue car"
#define STR_REDPHONE_MSG "This is my red telephone"

The second strategy is to employ a non-sentence-based construction, such as "Car color: red," thereby avoiding gender and order issues in most cases.

Summary: If possible, avoid concatenation for constructing UI messages. If you must concatenate, employ order-independent functions such as StrParam().
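The text does not show how StrParam() is implemented, so the following is only a guess at what such a function might do. The stand-in str_param() is hypothetical: it takes an explicit output buffer and an argument array, and expands the placeholders "$1s" through "$9s". On POSIX systems the printf family offers a comparable facility through numbered conversions such as %2$s.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for StrParam(): expands "$1s" .. "$9s" in `fmt`
   with args[0] .. args[8]. Any other text is copied unchanged.
   Returns 0 on success, -1 if `out` is too small or a placeholder
   refers to an argument that was not supplied. */
int str_param(char *out, size_t size, const char *fmt,
              const char *const args[], size_t nargs)
{
    size_t pos = 0;

    while (*fmt != '\0') {
        if (fmt[0] == '$' && fmt[1] >= '1' && fmt[1] <= '9' && fmt[2] == 's') {
            size_t index = (size_t) (fmt[1] - '1');
            size_t len;

            if (index >= nargs)
                return -1;
            len = strlen(args[index]);
            if (pos + len >= size)
                return -1;
            memcpy(out + pos, args[index], len);
            pos += len;
            fmt += 3;               /* skip the three-character placeholder */
        } else {
            if (pos + 1 >= size)
                return -1;
            out[pos++] = *fmt++;    /* ordinary character: copy as-is */
        }
    }
    if (pos >= size)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

With the arguments { "red", "car" }, the English format "This is my $1s $2s" yields "This is my red car"; with { "rouge", "voiture" }, the French format "Voici ma $2s $1s" yields "Voici ma voiture rouge". The calling code is identical in both cases.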

Character Sets

The variety of character sets found in use today presents a huge challenge to the developer wanting to internationalize a software application. Even variants of the same language may require different character sets.

Hard-Coded Characters

The code page selected by a user may differ from the one used during code compilation. Let's see how this can cause problems, using the following code example that looks for the currency symbol '¤'.

/* This example uses a hard-coded currency symbol */
if (c == '¤')
    ProcessCurrency();

We'll assume that the code is compiled using code page 850 (Multilingual) but the product is run on a Canadian machine using the French-Canadian code page 863.

Using code page 850, the '¤' symbol becomes represented internally in the compiled software as the value '207'. However, a Canadian machine using code page 863 understands '207' to be the graphic character '╧'. It is the value '152' that code page 863 uses for '¤'.

When this example code encounters the currency symbol on a Canadian machine, it sees the value '152' rather than the expected '207'; the ProcessCurrency() function will not be called, and the code fails.
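One possible remedy, sketched under the assumption that the program knows which code page is active at run time, is to look the byte value up instead of embedding a character literal. The enum and function names are hypothetical; the byte values are the ones quoted in the example above.

```c
/* Byte value of the currency sign in each code page, taken from the
   example above. The enum and function names are hypothetical. */
typedef enum { CODE_PAGE_850, CODE_PAGE_863 } code_page;

unsigned char currency_byte(code_page cp)
{
    switch (cp) {
    case CODE_PAGE_850: return 207;
    case CODE_PAGE_863: return 152;
    }
    return 0;  /* unknown code page */
}

/* Compares against the value for the code page in use at run time,
   not the one that happened to be active at compile time. */
int is_currency_symbol(unsigned char c, code_page active)
{
    return c == currency_byte(active);
}
```

In practice the table would normally come from the platform's conversion facilities or from Unicode rather than being maintained by hand.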


Here is another example:

#define STR_VERSION_MSG "XYZ ® - Version %d"
...
nVersion = 4;
sprintf(OutBuf, STR_VERSION_MSG, nVersion);

Compiled in and running under Windows, this code displays the expected character after 'XYZ'; i.e., the Registered symbol '®'. If this same code is used in a Macintosh version, the program will incorrectly display the Æ character.

Summary: Sharing resource files across platforms requires conversion to the relevant character set prior to inclusion in the program.

English is Compact

Words and sentences tend to be shorter in English than in other languages. Translated text typically contains 20-30% more characters than its English equivalent.

Paragraphs can be rephrased in most cases to fit length restrictions. On the other hand, single words and short expressions, such as column headers, usually cannot be rephrased; moreover, they may double in length, making text expansion a greater problem for short expressions than for paragraphs.

Screen Layout

Leave room for text expansion in dialog boxes, menu bars, tool palettes, and other on-screen elements. Limit each static text control to about 180 characters.

String Buffers

Text expansion can also cause problems if your code employs string buffers. A buffer large enough for the longest English string in your program may fail in other languages, causing a nasty and expensive-to-solve memory-overwrite bug. To avoid memory overwrites, you may want to check the lengths of strings before copying them into buffers.

Summary: Design your string buffers to hold twice as many characters as the longest string expected in your English product. You may want to check string length before copying strings to a buffer to avoid memory overwrites.
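The length check mentioned in the summary can be as small as the following sketch. The helper name copy_checked() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst` only if it fits,
   including the terminating '\0'.
   Returns 0 on success, -1 if the string is too long (dst is then
   left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

An 8-byte buffer accepts "Datei" but rejects "Verzeichnis", so the caller gets an error to handle instead of a silent overwrite.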


Internationalization: A Summary

•  Take text expansion into account to avoid the costly redesign of your screens and documentation. You also avoid potential memory problems that can cause software failure.

•  Take character-set differences and multi-byte functionality into account so your code works on all your target platforms, in all your target markets.

•  Employ locale-sensitive functions that call international resources so your code adapts itself to present your customers with appropriate date, time, monetary, and text formatting, no matter what country they're in.

•  Keep user-interface resources separate from code and localization will become fast and efficient.

•  Avoid jargon, humor, and cultural references and no market will be unintentionally offended or confused.


Appendix A: Reference

Localization and Internationalization Terms

Acronym or Term
    Definition

Globalization (G11N)
    Designing software for the input, display, and output of a defined set of Unicode supported language, scripts and data relating to specific locales and cultures. This term is often synonymous with internationalization.

Internationalization (I18N)
    The process of designing and developing a product to enable its adaptation for a specific international market so that the localization team does not require the same technical skills as those of the original developers. This term is often synonymous with globalization.

Locale
    A locale is the set of cultural conventions that are dependent on language and country that can affect the user interface of a computer system. Example: Date format.

Localizability
    Designing software code and resources such that resources can be localized with no changes to the source code.

Localization (L10N)
    The process of customizing or translating the separated data and resources needed for a specific language-locale pair.

PO files
    A PO file is just a structured list of messages that contains, for each message, the English original message text (msgid) and the translation to the target language (or an empty string that will be filled by somebody with the translation of the message). If the file still does not have any translations (all the translated strings are empty), it is called a PO template file, and usually given the .POT extension, instead of the traditional .PO extension of PO files. Once a translator starts to translate a .POT file, he will save it with the .PO extension. PO files are text files in UTF-8 format, so they can also be seen and modified with a normal text editor that can handle UTF-8 format.

Translation
    The linguistic component of the localization process.

World-Readiness
    A product that is properly globalized, simple to customize and easily localizable.

XLIFF
    XML Localization Interchange File Format


Character Set and Encoding Terms

Acronym or Term Definition

Encoding: A pairing of a sequence of characters from a given character set with something else, such as a sequence of numbers, octets, or electrical signals. Similar to a code page.

Code Page: A term used for a specific character encoding table, usually corresponding to a character set belonging to a specific language, in which each character is mapped to a specific number. Similar to an encoding.

ASCII: American Standard Code for Information Interchange. This is a 7-bit code that is the US national variant of ISO 646. Each character is stored in 1 byte. The lower 128 code points (0 to 127) of most character sets and code pages contain the ASCII set.

SBCS: Single Byte Character Sets. Characters are always one byte.

DBCS: Double Byte Character Sets. Characters are usually 2 bytes each. Many people refer to variable-length encodings, such as UTF-16 or GB2312, as “double-byte character sets”, although technically a UTF-16 character can be either 2 or 4 bytes, and a GB2312 character can be 1 or 2 bytes.

MBCS: Multi Byte Character Sets.

Unicode: An industry standard that allows computers to represent and manipulate text from all of the world’s writing systems. Unicode is not an encoding.

UTF: Unicode Transformation Format.

UTF-8: A variable-width encoding for Unicode. Each character requires 1 to 4 bytes (the original design allowed sequences of up to 6 bytes, but RFC 3629 restricts UTF-8 to 4). UTF-8 is used as the default encoding in many other standards, such as HTML and XML. It is now the predominant encoding on the Internet and in Linux.

UTF-16: An encoding scheme for Unicode where characters in the Basic Multilingual Plane are 2 bytes long and characters in the supplementary planes are 4 bytes long. This encoding is used in many operating systems and development environments because of its convenience to the programmer.

UTF-32: A 32-bit encoding scheme for Unicode, where all characters are 4 bytes.
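The byte lengths described in the definitions above can be checked directly. The following is a minimal Python sketch, not part of the original guidelines; the sample characters and the helper name encoded_lengths are chosen only for illustration.

```python
# Byte lengths of single characters in the three Unicode encodings.
# The "-le" codec variants are used so that no byte-order mark is added.
SAMPLES = {
    "A": "Latin capital A (ASCII range)",
    "é": "Latin small e with acute",
    "日": "CJK ideograph (Basic Multilingual Plane)",
    "😀": "emoji (supplementary plane)",
}


def encoded_lengths(ch):
    """Return the number of bytes ch occupies in UTF-8, UTF-16 and UTF-32."""
    return tuple(
        len(ch.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")
    )


for ch, description in SAMPLES.items():
    utf8, utf16, utf32 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X} {description}: "
          f"UTF-8={utf8}, UTF-16={utf16}, UTF-32={utf32} bytes")
```

Only UTF-32 has a fixed width; code that assumes one or two bytes per character will miscount text outside the ASCII range or outside the Basic Multilingual Plane.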


Examples of Supported Languages

There are many ways of listing languages for a given product.

Requirement Description Explanation

Prefer Specific Culture Names for Locales
When specifying a locale name (e.g. for location, formatting, and standards), a specific locale shall be used.
For example:

•  Use “en_US” instead of “en”
•  Use “zh_CN” instead of “zh-Hans”

The following table should assist you in choosing abbreviations for the languages with odd exceptions: Chinese, Norwegian, and Portuguese.

Language Name            RFC 4646 tag   Fallback / Neutral Culture Name   Specific Culture Name
Chinese                  zh             N/A                               N/A
Chinese Simplified       zh-Hans        zh-Hans                           N/A
Chinese (PRC)            zh_CN          zh-Hans                           zh-CN
Chinese Traditional      zh-Hant        zh-Hant                           N/A
Chinese (Taiwan)         zh_TW          zh-Hant                           zh-TW
Norwegian                no             no                                N/A
Norwegian (Bokmål)       nb-NO          no                                nb-NO
Norwegian (Nynorsk)      nn-NO          no                                nn-NO
Portuguese               pt             pt                                N/A
Portuguese (Brazil)      pt-BR          pt-BR                             pt-BR
Portuguese (Portugal)    pt-PT          pt-PT                             pt-PT

Notes for the table above:

•  There are more exceptions, but this is the short list of languages common across many products.

•  Portuguese – for Brazil and Portugal:
   o  If localizing into just one Portuguese flavor, it might be preferable to place resources into the “pt” neutral code so that the user selecting the other flavor sees Portuguese instead of the fallback language, which is likely to be English.
   o  If localizing into both languages, use both codes, one for each flavor.


Components to be Localized

Requirement Description Explanation

All Visible Product Elements

•  Installation
•  UI strings
•  Dialog layout
•  Device drivers
•  Hotkeys and accelerators
•  Font names
•  Sometimes log files
•  Sometimes readme and release notes
•  Web marketing for pre-sales
•  Web support content
•  Help files and documentation
•  OEM/ODM support collateral
•  Regulatory inserts
•  Packaging and CD labels
•  Licenses and legal text
•  Graphics
•  Multimedia
•  Sample files

Explanation: This is an all-inclusive general requirement. The other requirements below individually list the components to be localized.

Components Localized
Software localization is supported by the following components, for the OSs and languages indicated:

•  All visible text found in drivers, INFs, software, plug-ins, device manager UI, wizards, ASF, auto run, install utility, online help, and embedded help.

User Interface
Software must ensure that all end-user visible text is localized unless marked otherwise.
Start Menu, Desktop, Control Panel, Services, Device Manager, etc.
All product strings must be localized, even when displayed by the operating system.

Help Localization
All text present in the Help and User Guide for Product software must be localized.

Installation Localization
All install programs must be localized.
Explanation: See the section related to Installation for more detailed requirements.

License Agreement
Software that includes license agreements must provide them in localized languages that are approved by legal.
Software shall also include the English license agreement, as the binding legal agreement.


README.TXT File
The README.TXT file shall be localized.
Explanation: Recommended, but optional.

Release Notes
Release Notes shall be localized.
Explanation: Recommended, but optional.

Localized Graphics and Screenshots
All text in images and screenshots shall be in the same language as the language of the surrounding text.

Drivers
All driver messages displayed to the user or written to the system event log shall be translated.


User Interface Design Guidelines

Requirement Description

Dynamic UI Layout
The user interface layout shall use dynamic layout for all windows, dialogs and controls, such that all containing elements can adapt their size to the contents, as much as is possible (within max and min limits).
As text expands, the controls shall either grow in size or show scroll bars.

Right-to-Left Languages
All product software components shall operate correctly with languages that read from right to left (e.g., Arabic, Hebrew), as specified by their functional requirements.
Aside from ensuring that the text can be set to read from right to left, it should also be possible to flip all dialogs (and their controls) as well as menus so that they appear to be a mirror image of the English dialogs and menus.

User Interface Design Guidelines for Text
To allow for ease of localization of the text strings in the dialogs, the user interface design should incorporate the following guidelines:

•  Space must be allowed for string expansion up to 1.5 times the length of the English text. Labels must be sized so that the text can expand without resizing the label.
•  For checkboxes and other cases that are short on horizontal space, the Multi-line and Top-align properties shall be enabled and room allowed for text to wrap.
•  In list boxes, horizontal scrolling shall be enabled.
•  For labels that include both static text and variable text, separate labels shall be used for each.
•  Clear, concise text shall be written using short and simple sentences.
•  Terms and special characters shall be used consistently.

General Usability Guidelines

•  Dialog boxes shall be properly resized and dialog text shall be hyphenated according to the rules of the user interface language.
•  Translated dialog boxes, status bars, toolbars, and menus shall fit on the screen at different resolutions. No text or controls shall be cut off.
•  Each menu and dialog accelerator shall be unique.
•  Visual layout shall be consistent with the source language edition's layout. For example, dialog elements shall be in the same tab order.
•  User can type accented characters and long strings in documents and dialog boxes.
•  User can enter text, accelerators, and shortcut-key combinations using international keyboard layouts.
•  User can save files using filenames that contain accented characters.
•  User can successfully print documents that contain accented characters.
•  User can successfully cut and paste text that contains accented characters to other applications.
•  Software shall respond to changes in Control Panel's international/locale settings.
•  Software shall operate correctly on different types of hardware, particularly on hardware that is sold in the target market.

Fonts
Font and font style information shall not be hard coded.
Place font and font style information in resource files separated from the code files.


Non-Standard Fonts
If non-standard fonts are used, there shall be equivalent font libraries or substitutions specified for all target languages.
Standard fonts are defined as the set of fonts that are bundled with the operating system. When non-standard or special-use fonts are used, these fonts shall be supplied to the localization team for the purpose of creating localized materials.

Data Entry Forms
Data entry form elements shall be enabled to accept localized input including:

•  Input through IME unless requirements specify ASCII only
•  UI items with localizable content (text boxes, buttons, etc.) shall be sized to allow larger translated strings and word-wrap due to sizing (e.g., German resource strings generally have 30% more characters than their English equivalent).
•  Locale-specific data fields (address, currency, measurements, honorifics and personal titles, date/time, etc.) shall allow for international regional data (phone numbers, bank sort codes, state/region information, etc.)
•  Length of data fields shall be sufficient to enter local user data. Examples:
   o  User Name: Wolfeschlegelsteinhausenbergerdorff
   o  User account: [email protected]
   o  Domain Name: http://thelongestlistofthelongeststuffatthelongestdomainnameatlonglast.com/
•  Fields restricted to ASCII characters (legacy account names and passwords, script return values, etc.) shall present warnings or error messages to customers when customers enter non-ASCII or other restricted characters.
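The last requirement above, warning the user when a field restricted to ASCII receives other characters, can be sketched as follows. This is an illustrative Python sketch only: the function name check_ascii_field and the warning template are hypothetical, and in a real product the template would be loaded from a localized resource rather than written in the code.

```python
# Hypothetical warning template. It is inlined here only to keep the sketch
# self-contained; a product would load it from a localized resource file.
WARNING_TEMPLATE = "{0} may contain only ASCII characters. Not allowed: {1}"


def check_ascii_field(field_name, value):
    """Return None if value is pure ASCII, otherwise a warning for the user."""
    rejected = sorted({ch for ch in value if not ch.isascii()})
    if not rejected:
        return None
    return WARNING_TEMPLATE.format(field_name, " ".join(rejected))


print(check_ascii_field("Account name", "jsmith"))
print(check_ascii_field("Account name", "jürgen"))
```

Reporting the offending characters, rather than silently dropping or replacing them, lets the user see why the input was refused.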

Avoid Text in Graphics
Avoid placing text in bitmaps, icons, graphics or images in software, since it unnecessarily complicates the task of localization and significantly increases the cost.

Culture Neutral Graphics
All graphics, icons and colors shall be culturally neutral, and icons shall be accompanied by text (text cannot be a part of the graphic file) that clarifies the icon’s meaning.
Country flags shall not be used anywhere.

Colors
For a change in status, in addition to a change in color, also change the shape of objects, change the words, or change an image to indicate changing status. Do not use only colors to indicate a change in status.

Sorting
All lists shall be sorted in a culturally appropriate manner.

Units of Measurement
Every measurement shall include the unit of measure whenever displayed, including web pages, reports and graphs.

Example: For the US Culture:

•  Weight shall be measured in pounds and shall be a numeral
•  Temperature shall be measured in Fahrenheit and shall be a numeral

Example: For the UK Culture:

•  Weight shall be measured in kilograms and shall be a floating point number
•  Temperature shall be measured in Celsius (Centigrade) and shall be a floating point number


Appendix B: Coding Requirements

This section contains guidelines and requirements for the software architecture and code itself.

Requirement Description Explanation

Unicode Character Processing
Product software shall use Unicode character processing throughout the entire software system.
Explanation: Choose UTF-8, UTF-16, or UTF-32 depending on the situation.

Unicode Security Considerations
Software shall be reviewed for character processing security defects, as per Unicode Technical Report #36: http://www.unicode.org/reports/tr36/

Unicode Combining Characters
Software shall process and display Unicode combining characters properly.
Examples:

•  “Å” U+00C5 (A-ring pre-composed)
•  “A+°” U+0041 + U+030A (A + combining ring above)
•  “Å” U+212B (Angstrom)
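The three examples above look identical on screen but are different code point sequences. One common way to treat them as the same text is Unicode normalization. The following is a minimal Python sketch using the standard unicodedata module; the helper name same_text is an assumption made for illustration, and the guidelines themselves do not prescribe a particular normalization form.

```python
import unicodedata

precomposed = "\u00C5"   # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
angstrom = "\u212B"      # ANGSTROM SIGN


def same_text(a, b):
    """Compare two strings after converting both to normalization form NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison sees three different strings ...
print(precomposed == decomposed, precomposed == angstrom)
# ... while the normalized comparison treats them as the same text.
print(same_text(precomposed, decomposed), same_text(precomposed, angstrom))
```

Normalizing at the boundaries of the system (input, storage, comparison) avoids duplicate records and failed searches caused by visually identical strings.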

Externalize All User-Visible Text Strings
User-visible text strings shall be externalized from code to allow language resources to be loaded as appropriate from a secondary source (resource files, external localized string files, etc.)

Separate UI from Code
The UI isolation mechanism shall use run-time binding for the translated elements. Run-time binding of translatable elements shall always be favored (e.g. in Linux, use PO files).

One Resource File Per Language
A single resource file per language shall be used. Multilingual files would create unnecessary release, packaging and support dependencies.
The only exception to this is for driver installs, where the certification process would impose undue burden to support this requirement. For driver localization, a single resource file shall be used for all languages, requiring a single certification process.

Avoid String Concatenation
Translatable text shall not be concatenated. Combining pre-defined strings to create complete sentences prohibits translation.
Use the “Subject: Predicate” pattern wherever possible:

Don’t do this: “Your balance is $100.00.”
When you can do this: “Balance: $100.00”

Use numbered parameters for all the rest: “Page {0} of {1}”

Formatting Strings Using Numbered Parameters
Message format mechanisms shall use numbered variables instead of positional ones because the syntax of the target language might be reversed or otherwise rearranged during translation.
Example:

•  English:
   o  “Page {0} of {1}”
   o  “Page 1 of 10”
•  Japanese:
   o  “ページ{1}の{0}”
   o  “ページ10の1”
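The example above can be reproduced with any formatting mechanism that supports numbered parameters. The following is a minimal Python sketch using str.format; the MESSAGES dictionary stands in for externalized resource files and, like the function name page_label, is an assumption made for illustration.

```python
# Stand-in for per-language resource files; the two patterns are the ones
# shown in the example above.
MESSAGES = {
    "en": "Page {0} of {1}",
    "ja": "ページ{1}の{0}",
}


def page_label(language, page, total):
    """Format the page indicator; the code always passes (page, total)."""
    return MESSAGES[language].format(page, total)


print(page_label("en", 1, 10))
print(page_label("ja", 1, 10))
```

The calling code never changes: the translator alone decides the order in which the parameters appear.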

Avoid Creating User Interface in Code
Dynamic construction or modification of the user interface shall be avoided. The position and size of elements may be altered during translation.
Example to avoid:

Window w = new Window("Title", 300, 300);


Buffer Lengths
Use of fixed string buffer lengths and pointer arithmetic for string manipulation shall be avoided.
Dynamic memory allocation and Unicode-aware language routines for string length determination shall be used in order to avoid memory buffer overruns.

East Asian Character Enablement (e.g., Chinese, Japanese, Korean)

•  Rely on the operating system development libraries for locale-sensitive functions and character categorization.
•  Text and strings shall not be parsed on a byte-by-byte basis. Parsers shall scan strings on a character-by-character basis.
•  Characters shall not be used as integers or vice versa.
•  Low-level system calls shall not be used for the input of character data. Japanese, Chinese and Korean systems use a special front-end processor (IME) to allow users to enter thousands of different characters with a regular sized keyboard. The IME is installed with the operating system and is normally transparent to the developer. However, if you call low-level system calls, you incur the risk of bypassing the IME, in which case the user won't be able to correctly enter data.

Transferring Data
Where a component requires the transfer of data between components, the following transfer methods shall be used:

•  Transport of all data shall use Unicode, whether encoded as UTF-8, UTF-16, or UTF-32.
•  Any gateway/data conduit shall be verified to function without converting/modifying the data.
•  Communication mechanisms shall provide code page conversion features when transferring text string or character data across machine boundaries.
•  Language-dependent data shall be transferred as internal language-neutral codes and converted into the local language by the receiving process if the application is transferring text or strings that are to be displayed to the end-user.
•  Where the application transfers sorted string data across machine boundaries, either 1) the sorting shall be done by the receiving process, or 2) the sending process shall apply the sorting rules associated with the locale of the receiving process.
•  Where the application transfers date, time or numeric values formatted as strings across machine boundaries, it shall be transferred in a locale-neutral representation, and the receiving process shall perform the formatting according to its local rules.
•  Where the application transfers monetary amounts across machine boundaries, either 1) the receiving process shall know which currency the sending process is using, or 2) the data transferred shall provide both the amount and the currency (along with exchange rates, date of exchange, etc.)
•  Code shall not make assumptions about formats and no processing shall be directly based on such formats. Values shall be kept in the internal representation and formatted or converted at the latest possible moment. Most operating systems provide locale-sensitive functions to perform these conversions.
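Several of the transfer rules above (language-neutral codes, amount plus currency, locale-neutral numbers) can be sketched together. The following Python sketch is illustrative only: the JSON field names, the status code "PAID", and the STATUS_TEXT table are hypothetical and not part of the guidelines.

```python
import json
from decimal import Decimal


def encode_payment(amount, currency, status_code):
    """Serialize a payment as locale-neutral data, with no display text."""
    return json.dumps({
        "amount": str(amount),    # Decimal to text, always with "." separator
        "currency": currency,     # currency code travels with the amount
        "status": status_code,    # language-neutral code, not a message
    })


# Hypothetical display strings owned by the receiving process.
STATUS_TEXT = {
    "en": {"PAID": "Paid"},
    "de": {"PAID": "Bezahlt"},
}


def decode_payment(payload, ui_language):
    """Rebuild the values and translate the status code on the receiving side."""
    data = json.loads(payload)
    return (Decimal(data["amount"]),
            data["currency"],
            STATUS_TEXT[ui_language][data["status"]])


payload = encode_payment(Decimal("1234.50"), "EUR", "PAID")
print(payload)
print(decode_payment(payload, "de"))
```

The sender never formats or translates anything; the receiver converts the neutral values to local conventions at the latest possible moment.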


Floating Point and Date Values as Strings
Whenever a floating point value or a date value is encoded as a string, special care shall be taken to encode it in a format that is guaranteed to be parseable by the receiver of the string, regardless of the locale.

For this discussion, the “encoder” refers to the code that converts a value to a string, and the “decoder” refers to the code that converts the string back to a value.
There are two general situations where string encoding of values occurs in a web application:

•  The sending of form data from a browser to a web server, in which all data needs to be sent as a string
•  The transmission of XML, e.g. <mytag myfloatvalue="1278.67">

Dates:
Dates shall be encoded using the string representation of the millisecond value (the value returned by the Java Date.getTime() method). This completely avoids the problem of the decoder needing to handle various date formats. The millisecond value shall always be parseable and convertible back into a Java Date object regardless of the decoder’s locale.

Floating point numbers:
Since the decoder may be using a different locale than the encoder, the decimal separator character may be different from what the decoder expects. For example, the English number “234.56” might be encoded as “234,56” by an encoder running in a German locale. The Java function Double.parseDouble() accepts only “.” as the decimal separator and takes no account of the encoder’s locale. Therefore, an all-purpose, locale-independent floating point parser function shall be written in order to convert strings to doubles. It could possibly use some fuzzy logic in order to determine how to parse the string, by examining its contents. This shall be used in all places where floating point strings are decoded.

Cultural Dependencies

•  Word boundaries shall not be used for processing reports. Not all writing systems use blanks or spaces to separate words.
•  Monetary amounts shall have both currency and amount captured/tracked.
•  Rounding rules for monetary amounts shall be examined for legal implications, depending on the country.
•  Monetary input/output fields shall be large enough for “weak” currencies.
•  If written representations of monetary amounts are generated, a mechanism to plug in localized versions of the text generation algorithm shall be provided.
•  The operating system’s calendar processing capabilities shall be used for any calendar or day-of-week reference.
•  Units of measure shall be handled internally in the metric system. A mechanism shall be employed to set the user’s preference.
•  Postal addresses and telephone numbers shall use very generic field structures (i.e. address line 1, address line 2 instead of city, state, etc.) Employ user-definable formats for telephone numbers also.
•  Icons shall not be based on US-centric metaphors or symbols derived from the English language.
•  Page sizes shall vary according to locale.
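The string encoding of dates and floating point values required earlier in this section (“Floating Point and Date Values as Strings”) can be sketched as follows. The source describes the technique in terms of Java; this Python sketch shows the same idea under the assumption that both sides agree on milliseconds since the Unix epoch for dates and on “.” as the decimal separator for numbers. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_date(moment):
    """Encode an aware datetime as the string of milliseconds since the epoch."""
    return str((moment - EPOCH) // timedelta(milliseconds=1))


def decode_date(text):
    """Rebuild the UTC datetime from the millisecond string."""
    return EPOCH + timedelta(milliseconds=int(text))


def encode_float(value):
    """repr() always uses '.' as the separator, whatever the current locale."""
    return repr(value)


def decode_float(text):
    """float() parses the '.' form and ignores the current locale."""
    return float(text)


moment = datetime(2019, 8, 9, 12, 0, tzinfo=timezone.utc)
print(encode_date(moment), decode_date(encode_date(moment)))
print(encode_float(234.56), decode_float("1278.67"))
```

When the encoder is under your control, emitting a fixed, locale-independent format is simpler and safer than asking the decoder to guess the separator from the contents of the string.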


Localized Operating Systems
All versions of the product software shall run correctly on all localized operating system languages.
Examples of localized text in operating systems:

•  Account Names
   o  English: Administrator
   o  French: Administrateur
   o  Spanish: Administrador
   o  English: Users
   o  French: Utilisateurs
   o  Spanish: Usuarios
•  System Paths
   o  “My Documents”
   o  “Program Files”
•  System Services
•  Etc.

International Domain Names (IDN)
All software that processes internet domain names shall handle Unicode characters in URLs:

•  http://日立.com/ (Hitachi Japan)
•  http://ايكيا.com (Arabic transliteration of IKEA)
•  http://икеа.com (Russian transliteration of IKEA)
•  http://宜家.com/ (IKEA China)
•  http://例子.测试
•  http://пример.испытание

Explanation: More examples and details: http://idn.icann.org/

String Comparison
For user-visible strings, use locale-sensitive string comparisons.
For internal or file name string comparisons, use case-insensitive, locale-insensitive string comparisons.

XML File Format
Use unique attributes as IDs in XML files.

Bad:
<Msg Text="Cannot open file {0}.">

Bad:
<Text>Cannot open file {0}.</Text>
<Text>Invalid parameter.</Text>

Bad:
<msg001>Cannot open file {0}.</msg001>
<msg002>Invalid parameter.</msg002>

Good:
<msg myId="1">
    <text>Cannot open file {0}.</text>
</msg>
<msg myId="2">
    <text>Invalid parameter.</text>
</msg>
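With the “Good” layout above, each message can be looked up by its ID regardless of how the text is translated or reordered. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; the enclosing messages root element and the function name load_messages are assumptions added so that the sample is a well-formed document.

```python
import xml.etree.ElementTree as ET

# The <messages> root element is hypothetical; the <msg> entries are the
# ones from the "Good" example above.
CATALOG = """<messages>
  <msg myId="1">
    <text>Cannot open file {0}.</text>
  </msg>
  <msg myId="2">
    <text>Invalid parameter.</text>
  </msg>
</messages>"""


def load_messages(xml_text):
    """Return a dictionary that maps each unique message ID to its text."""
    root = ET.fromstring(xml_text)
    return {msg.get("myId"): msg.findtext("text") for msg in root.iter("msg")}


messages = load_messages(CATALOG)
print(messages["1"].format("report.txt"))
print(messages["2"])
```

Because the ID is an attribute rather than part of the element name or the text, translation tools can replace the text freely without breaking the lookup.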

Word Boundaries and Line Breaks
Use appropriate locale-sensitive APIs for determining word boundaries and line breaks in applications that parse text or display formatted text.

Western languages use white space, punctuation, and hyphenation rules to determine word and line boundaries. The rules for Asian languages are considerably different. Spaces do not necessarily separate words. Punctuation rules also may be different (or punctuation may not be used, as is the case with Thai).

Explanation: Spaces are not an indicator of a line break in some languages.


BIOS, Firmware, and UEFI Requirements

Requirement Description Explanation

Multilingual Character Set and Font Support
Firmware shall support the processing and rendering of all character sets and fonts used in all target market languages.
Avoid the following I18N defect:

Multilingual Font Support Workarounds

Most Western European languages are covered by the Latin 1 character set (ISO-8859-1): English, German, French (incomplete, as the Latin 1 charset is missing the œ letter), Italian, Spanish, Danish, Dutch, etc. When seldom-used characters or borrowed characters are not available, there is usually an accepted workaround. For example, "oe" instead of "œ" in French.
Although some 80,000 characters would be needed to ensure complete coverage for Chinese, it might be possible to offer a good user experience while supporting only a limited subset of characters. For example, here are the coverage rates of the most frequently used Chinese characters:

•  Most frequently used 1,000 characters: ~90% (Coverage rate)
•  Most frequently used 2,500 characters: 98.0% (Coverage rate)
•  Most frequently used 3,500 characters: 99.5% (Coverage rate)

Detection of OS Locale Settings for Firmware Text Input Overlays
When firmware text input screens (secure sprites) are overlaid over the operating system screen, the firmware shall detect the current operating system settings:

•  Language
•  Formatting
•  Keyboard layout or Input Method Editor

If the user had been typing using the French keyboard layout in their OS, and a firmware screen suddenly pops up over the OS with the U.S. English keyboard layout, the user would be unable to type the correct characters. Worse, if the user is expected to type a password, they would be unable to type the password characters because they would have no indication of which character is being typed.
For example, if a user types “abc123”, he/she expects the application to process the string as “abc123”, not “qbc!@#” or “qbc&é"”.
Example:

Explanation: Note that there are also many users of the Dvorak keyboard layout in the United States.


Firmware Keyboard Layout and IME Selection
The firmware shall have keyboard layouts and IMEs for each target market language.
The keyboard layout shall be set by the OEM at the factory.
The keyboard layouts shall be selectable by the user, for cases where the user changes their physical keyboard.
French keyboard example:

Explanation: Implementation detail: scan codes for all keyboards world-wide generally don’t move on the keyboard; instead, what is printed on the keys moves, and the BIOS or OS shall perform the scan code to character translations.


Appendix C: Gettext Tutorial

The GNU gettext package offers a well-integrated set of tools and documentation that provides a framework to help GNU packages display multi-lingual messages. These tools include a set of guidelines about how programs should be written to support message catalogs, a directory and file naming organization for the message catalogs themselves, and a runtime library supporting the retrieval of translated messages.

The GNU gettext Online Manual is available at www.gnu.org/software/gettext/manual/gettext.html

The document “A tutorial on Native Language Support using GNU gettext” provides an excellent detailed explanation for creating simple “Hello World” C code.

We created a simple but verbose program that uses a shared library to calculate x*y and x/y. The shared library also outputs one line of text when it is called. This software initially worked only in English, but this exercise details how to internationalize it, and then localize it for one additional language; in this case, Russian. Feel free to add an additional language as you follow along and create PO files for translation.

This program/shared library scenario covers multiple aspects of i18n: initializing the application and shared libraries, and working with different text domains and function sets using GNU Automake.

The code for the following tutorial is here. 

The main program located in the directory mdtest-1.0 initially has eleven files:

1. AUTHORS

2. ChangeLog

3. configure.ac

4. Makefile.am

5. NEWS

6. README

7. src/Makefile.am

8. src/mdtest.c

9. src/mdtest.h

10. src/test1.c

11. src/test2.c

The shared library is located in mdtestlib-1.0.0 directory and initially has nine files:

1. AUTHORS

2. ChangeLog

3. configure.ac

4. Makefile.am

5. NEWS

6. README

7. src/Makefile.am

8. src/mdtestlib.c

9. src/mdtestlib.h


The program and the library use GNU Automake, a program that creates GNU standards-compliant Makefiles from template files. See http://www.gnu.org/software/automake/manual/automake.html for more information about GNU Automake. For more information about creating shared libraries with GNU Automake, see http://www.openismus.com/documents/linux/building_libraries/building_libraries.shtml

We will internationalize package mdtest-1.0 to produce mdtest-1.1, and package mdtestlib-1.0.0 to produce mdtestlib-1.1.0. An extra five files will be added to the initial source code in each package:

1. po/LINGUAS

2. po/Makevars

3. po/POTFILES.in

4. po/ru.po

5. m4/ChangeLog

Automake creates numerous other files as well; the five listed above are the most important ones to know about.

Compiling and testing source code

sudo apt-get install automake gettext cvs libtool make
sudo apt-get install language-pack-en language-pack-ru
autoreconf --install
./configure
make
sudo make install
sudo ldconfig
cd ../mdtest-1.0
autoreconf --install
./configure
make
sudo make install
mdtest

The program is mdtest 1.0.

The library is mdtestlib 1.0.0.

Performing the tests with x = 6.000000 and y = 5.000000

1. Testing the mathematical operation of multiplication ...

Calling the multiplication function ...

The number 6.000000 multiplied by the number 5.000000 is equal to the number 30.000000

Comparing the result of multiplication ...

The operation of multiplication was performed successfully.

2. Testing the mathematical operation of division ...

Calling the division function ...

The number 6.000000 divided by the number 5.000000 is equal to the number 1.200000


Comparing the result of division ...

The operation of division was performed successfully.

The program mdtest 1.0 with the library mdtestlib 1.0.0 successfully performed all tests.

sudo make uninstall

Internationalization: program mdtest 1.0

We will internationalize the main program, the mdtest-1.0 package, in the next twelve steps (the command lines to enter and the changes are highlighted in bold):

1.  Enter the command gettextize. Press Enter when the suggestions appear. (We will follow them in steps 2 to 7.)

2.  Edit the configure.ac file. You can use gedit configure.ac (or any other text editor):

After the line AC_PROG_CC add two lines:

 AM_GNU_GETTEXT([external])

 AM_GNU_GETTEXT_VERSION(0.17)

Change the package version to 1.1

AC_INIT([mdtest], [1.1], [[email protected]])

3.  Edit po/Makevars.template and save it as a file named po/Makevars.

COPYRIGHT_HOLDER = Your Corporation 

MSGID_BUGS_ADDRESS = [email protected] 

4.  Enter the command aclocal -I m4 

5.  Get the latest config.guess and config.sub files

wget http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess

wget http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub

6.  Copy gettext.h file

cp /usr/share/gettext/gettext.h src/gettext.h

7.  Add three source file references to file po/POTFILES.in

src/mdtest.c

src/test1.c

src/test2.c

8.  Modify the source file src/mdtest.c

The initialization of locale data and the gettext library should be done with the same code in every program, as demonstrated below.

char *textdomain (const char *domain_name);

The textdomain function changes or queries the current global domain of the LC_MESSAGES category. The argument is a null-terminated string whose characters must be legal in file names. If the domain_name argument is NULL, the function returns the current value. If no value has been set before, the name of the default domain, messages, is returned. Note that although the return value of textdomain is of type char *, the string must not be modified. No check is made that a catalog for the domain is available; if it is not, the only symptom is that no translations are provided.

To use the domain set by textdomain, call:

char *gettext (const char *msgid);

This is the simplest reasonable form. The translation of the string msgid is returned if it is available in the current domain. If it is not available, the argument itself is returned. If the argument is NULL, the result is undefined.

gettext not only looks up a translation in a message catalog, but also converts the translation on the fly to the desired output character set. This is useful if you are working in a different character set than the translator who created the message catalog, because it avoids distributing variants of message catalogs that differ only in the character set.

The output character set is, by default, the value of nl_langinfo (CODESET), which depends on the LC_CTYPE part of the current locale. Programs that store strings in a locale-independent way (e.g. UTF-8) can request that gettext and related functions return the translations in that encoding by using the bind_textdomain_codeset function.

Note that the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged, independently of the current output character set. It is therefore recommended that all msgids be US-ASCII strings.

- Function: char * bind_textdomain_codeset (const char *domainname, const char *codeset)

The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer.

If the codeset parameter is the null pointer, bind_textdomain_codeset returns the currently selected codeset for the domain with the name domainname. It returns NULL if no codeset has yet been selected.

The bind_textdomain_codeset function can be used several times. If used multiple times with the same domainname argument, the later call overrides the settings made by the earlier one.

The bind_textdomain_codeset function returns a pointer to a string containing the name of the selected codeset. The string is allocated internally in the function and must not be changed by the user. If the system runs out of memory during the execution of bind_textdomain_codeset, the return value is NULL and the global variable errno is set accordingly.

Trigger gettext operations by calling gettext (or the _() macro), or gettext_noop in the case of string initializers.

Here is the file src/mdtest.c with the changes we made from version 1.0 to 1.1, highlighted in bold:

#include <stdio.h>      // printf
#include <libgen.h>     // basename
#include <locale.h>     // setlocale LC_ALL
#include "config.h"     // PACKAGE_STRING ENABLE_NLS
#include "mdtestlib.h"  // mdtestlib_version
#include "mdtest.h"     // test1 test2
#include "gettext.h"    // gettext textdomain bindtextdomain

#define _(string) gettext(string)

int main(int argc, char *argv[])
{
    setlocale(LC_ALL, "");
    static char *domain = PACKAGE;
    bindtextdomain(domain, LOCALEDIR);
    textdomain(domain);

    static char *operation[] = {gettext_noop("multiplication"),
                                gettext_noop("division")};
#define total (sizeof(operation) / sizeof(operation[0]))
    typedef int TEST(double, double);
    static TEST *test[total] = {test1, test2};

    printf(_("\nThe program is %s.\n"), PACKAGE_STRING);
    printf(_("The library is %s.\n"), mdtestlib_version());

    double x = 6.0;
    double y = 5.0;
    if (argc >= 4)
    {
        char *prog = basename(argv[0]);
        fprintf(stderr, _("Usage: %s [x] [y]\n"), prog);
        fprintf(stderr, _(" the default value of double number x is %f\n"), x);
        fprintf(stderr, _(" the default value of double number y is %f\n"), y);
        return 1;

    }
    if (argc >= 2)
    {
        sscanf(argv[1], "%lf", &x);
    }
    if (argc >= 3)
    {
        sscanf(argv[2], "%lf", &y);
    }
    printf(_("Performing the tests with x = %lf and y = %lf\n"), x, y);

    int n;
    for (n = 0; n < total; n++)
    {
        char *s = gettext(operation[n]);
        puts("");
        printf(_("%d. Testing the mathematical operation of %s...\n"), n + 1, s);
        int i = test[n](x, y);
        if (i == 0)
        {
            printf(_(" The operation of %s was performed successfully.\n"), s);
        }
        else
        {
            printf(_(" The operation of %s failed.\n"), s);
            return 1;
        }
    }
    printf(_("\nThe program %s with the library %s successfully performed all tests."
             "\n\n"),
           PACKAGE_STRING, mdtestlib_version());
    return 0;
}

9.  Modify the source file src/test1.c in a similar way:

#include <stdio.h>      // puts
#include <math.h>       // fabs
#include "config.h"     // ENABLE_NLS
#include "mdtestlib.h"  // mdtestlib_multiplication
#include "mdtest.h"     // test1
#include "gettext.h"    // gettext

#define _(string) gettext(string)

int test1(double x0, double x1)
{
    puts(_(" Calling the multiplication function ..."));
    double x = mdtestlib_multiplication(x0, x1);
    puts(_(" Comparing the result of multiplication ..."));
    return ((fabs(x - x0 * x1) < 0.000001) ? 0 : 1);

}

10. Make similar changes to the file src/test2.c

#include <stdio.h>      // puts
#include <math.h>       // fabs
#include "config.h"     // ENABLE_NLS
#include "mdtestlib.h"  // mdtestlib_division
#include "mdtest.h"     // test2
#include "gettext.h"    // gettext

#define _(string) gettext(string)

int test2(double x0, double x1)
{
    puts(_(" Calling the division function ..."));
    double x = mdtestlib_division(x0, x1);
    puts(_(" Comparing the result of division ..."));
    return ((fabs(x - x0 / x1) < 0.000001) ? 0 : 1);
}

11. Edit the file src/Makefile.am. Add the following line:

DEFS = -DLOCALEDIR=\"$(localedir)\" @DEFS@

Modify the following line so that it lists gettext.h:

mdtest_SOURCES = mdtest.c mdtest.h test1.c test2.c gettext.h

12. Compile and test the package, then create a distribution tarball.

make

sudo make install

 mdtest

The program is mdtest 1.1.

The library is mdtestlib 1.0.0.

...

sudo make uninstall

make distcheck

...

mdtest-1.1 archives ready for distribution:

mdtest-1.1.tar.gz

Internationalization: library mdtestlib 1.1.0

We will internationalize the shared library, the mdtestlib-1.0.0 package, in the next twelve steps, which are mostly identical to the steps before (the differences from the previous steps are highlighted in yellow):

1.  cd ../mdtestlib-1.0.0

Enter the command gettextize. Press Return to acknowledge the paragraphs that appear. We will follow them in step 2 to step 7.

2.  Edit the configure.ac file. You can use gedit configure.ac (or any other text editor).

After the line AC_PROG_LIBTOOL add two lines:

 AM_GNU_GETTEXT([external])

 AM_GNU_GETTEXT_VERSION(0.17)

Change the package version to 1.1.0

AC_INIT([mdtestlib], [1.1.0], [[email protected]])

3.  Edit po/Makevars.template and save it as a file named po/Makevars

COPYRIGHT_HOLDER = Your Corporation

MSGID_BUGS_ADDRESS = [email protected]

4.  Enter the command aclocal -I m4 

5.  Get the latest config.guess and config.sub files

wget http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess

wget http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub

6.  Copy gettext.h file

cp /usr/share/gettext/gettext.h src/gettext.h

7.  Add one source file reference to the file po/POTFILES.in

src/mdtestlib.c

8.  Modify the source file src/mdtestlib.c

While a single domain name works well for most applications, there might be a need to get translations from more than one domain. One could switch between domains with calls to textdomain, but this is neither convenient nor fast. One possible case is that all error messages from a set of commonly used functions go into a separate domain, error, so that they need to be translated only once. Another case is messages from a library, as these have to be independent of the current domain set by the application.

For these reasons there are two more functions to retrieve strings:

char *dgettext (const char *domain_name, const char *msgid);

char *dcgettext (const char *domain_name, const char *msgid, int category);

Both take an additional first argument, which corresponds to the argument of textdomain. The third argument of dcgettext allows you to use a locale category other than LC_MESSAGES. If domain_name is NULL or category has a value other than the known ones, the result is undefined. It should also be noted that this function is not part of the second known implementation of this function family (the one found in Solaris).

A second ambiguity can arise because more than one domain may have the same name. This can be solved by specifying where the needed message catalog files can be found.

char *bindtextdomain (const char *domain_name, const char *dir_name);

Calling this function binds the given domain to a file in the specified directory (how this file is determined follows below). A file in the system's default place is then no longer preferred over the specified file (as it would be when using textdomain alone). A NULL pointer for the dir_name parameter returns the binding associated with domain_name. If domain_name itself is NULL, nothing happens and a NULL pointer is returned.

Relative path names for the dir_name parameter can cause trouble. Since the path is always computed relative to the current directory, results differ once the program executes a chdir call. Relative paths should therefore be avoided. In a shared library, the initialization of the gettext library should always be done with the same code.

Message Catalog Files: Because many different languages for many different packages have to be stored, we need a way to attach this information to the message catalog files. The way usually used in Unix environments is to encode it in the file name, and this is also done here. The directory name given in bindtextdomain's second argument (or the default directory), the name of the locale, the locale category, and the domain name are concatenated:

dir_name/locale/LC_category/domain_name.mo

The default value for dir_name is system specific. For the GNU library, and for packages adhering to its conventions, it is:

/usr/local/share/locale

Trigger gettext operations with dgettext (or the _() macro), and with gettext_noop in the case of string initializers.

Here is the file src/mdtestlib.c with the changes we made from version 1.0.0 to 1.1.0, bolded:

#include <stdio.h> // printf

#include "config.h" // PACKAGE_STRING ENABLE_NLS 

#include "mdtestlib.h"  // mdtestlib_multiplication mdtestlib_division
#include "gettext.h"    // gettext textdomain bindtextdomain

#define _(String) dgettext (PACKAGE, String)

extern void __attribute__ ((constructor)) mdtestlib_init(void)
{
    bindtextdomain(PACKAGE, LOCALEDIR);
}

extern char *mdtestlib_version(void)
{
    return PACKAGE_STRING;
}

static void print_message(const char *operation, double x, double y,
                          double z)
{
    printf(_(" The number %f %s by the number %f is equal to the number %f\n"),
           x, operation, y, z);
}

extern double mdtestlib_multiplication(double x, double y)
{
    double z = x * y;
    print_message(_("multiplied"), x, y, z);
    return z;
}

extern double mdtestlib_division(double x, double y)
{
    double z = x / y;
    print_message(_("divided"), x, y, z);
    return z;
}

9.  There are no more source files to modify

10. There are no more source files to modify

11. Edit the file src/Makefile.am. Add the following line:

DEFS = -DLOCALEDIR=\"$(localedir)\" @DEFS@

Modify the following line so that it lists gettext.h:

libmdtestlib_la_SOURCES = mdtestlib.c gettext.h

12. Compile and test the package.

make

sudo make install

sudo ldconfig

../mdtest-1.0/src/mdtest

The program is mdtest 1.1.

The library is mdtestlib 1.1.0.

...

sudo make uninstall

make distcheck

...

mdtestlib-1.1.0 archives ready for distribution:

mdtestlib-1.1.0.tar.gz

...

cd ..

If you did all the steps, then you have everything ready for translation. You can rename the directories and skip steps 1, 2, and 3 in the next section.

mv mdtest-1.0 mdtest-1.1

mv mdtestlib-1.0.0 mdtestlib-1.1.0

cd mdtestlib-1.1.0

Go to step 4 in the next section.

Translators: creating a new language version

1.  Prepare your system and install all required packages:

sudo apt-get install automake gettext cvs libtool make

sudo apt-get install language-pack-en language-pack-ru

2.  Extract the packages to be translated:

gzip -d mdtest-1.1.tar.gz

tar -xf mdtest-1.1.tar

gzip -d mdtestlib-1.1.0.tar.gz

tar -xf mdtestlib-1.1.0.tar

3.  Change the locale, configure the package mdtestlib-1.1.0, and build it.

A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639 two-letter language code (http://www.gnu.org/software/gettext/manual/gettext.html#Language-Codes), and ‘CC’ is an ISO 3166 two-letter country code (http://www.gnu.org/software/gettext/manual/gettext.html#Country-Codes).

Many locale names have an extended syntax ‘ll_CC.encoding’ that also specifies the character encoding. These are in use because, between 2000 and 2005, most users switched to locales in UTF-8 encoding.

cd mdtestlib-1.1.0

export -n LANGUAGE

export LANG=ru_RU.UTF-8

./configure

4.  Add the new language code (a two-letter ISO 639-1 code, see http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) to the file po/LINGUAS. Create the file LINGUAS in the po directory if it doesn’t exist:

# Set of available languages.

ru 

5.  Create the file (ru.po) for translation:

make

cd po

make -B

msginit

Which is your email address?

1 [email protected]

2 first@first-uq

Please choose the number, or enter your email address.

1

A translation team for your language (ru) does not exist yet.

If you want to create a new translation team for ru, please visit

http://www.iro.umontreal.ca/contrib/po/HTML/teams.html

http://www.iro.umontreal.ca/contrib/po/HTML/leaders.html

http://www.iro.umontreal.ca/contrib/po/HTML/index.html

Создано ru.po. (Created ru.po.) 

6.  Translate the PO file (ru.po).

A PO file is made up of many entries, each entry holding the relation between an original string and its corresponding translation. All entries in a given PO file usually relate to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:

white-space

# translator-comments

#. extracted-comments

#: reference...

#, flag...

#| msgid previous-untranslated-string

msgid untranslated-string

msgstr translated-string

The general structure of a PO file should be well understood by the translator. See more information about PO files at http://www.gnu.org/software/autoconf/manual/gettext/PO-Files.html#PO-Files

Any UTF-8 capable editor can be used to translate the messages. We did the translation with gedit ru.po, but there are also specialized PO editors such as gtranslator or Emacs's PO File Editor.

Also, it is possible to convert the PO file to DOS file format:

sed -e 's/$/\r/' ru.po > ru.txt

Then send just the DOS file to a translator for translation on a Windows system with the Notepad editor or any UTF-8 capable Windows editor. In this case the maintainer (or programmer) is responsible for executing the previous and the next steps.

Here are the translated messages in the ru.po file for the mdtestlib-1.1.0 example (translations are highlighted in bold):

#: src/mdtestlib.c:20
#, c-format
msgid " The number %f %s by the number %f is equal to the number %f\n"
msgstr " Число %f %s на число %f равно числу %f\n"

#: src/mdtestlib.c:27
msgid "multiplied"
msgstr "умноженное"

#: src/mdtestlib.c:34
msgid "divided"
msgstr "разделённое"

Because the PO files must be portable to operating systems with less advanced internationalization facilities, the character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-14, ISO-8859-15, KOI8-R, KOI8-U, KOI8-T, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, GEORGIAN-PS, UTF-8.

In the GNU system, the following encodings are frequently used for the corresponding languages.

•  ISO-8859-1 Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx, Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek, Walloon
•  ISO-8859-2 Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian
•  ISO-8859-3 Maltese
•  ISO-8859-5 Macedonian, Serbian
•  ISO-8859-6 Arabic
•  ISO-8859-7 Greek
•  ISO-8859-8 Hebrew
•  ISO-8859-9 Turkish
•  ISO-8859-13 Latvian, Lithuanian, Maori

•  ISO-8859-14 Welsh
•  ISO-8859-15 Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish, Italian, Portuguese, Spanish, Swedish, Walloon
•  KOI8-R Russian
•  KOI8-U Ukrainian
•  KOI8-T Tajik
•  CP1251 Bulgarian, Byelorussian
•  GB2312, GBK, GB18030 simplified Chinese
•  BIG5, BIG5-HKSCS traditional Chinese
•  EUC-JP Japanese
•  EUC-KR Korean
•  TIS-620 Thai
•  GEORGIAN-PS Georgian
•  UTF-8 for any language, including those listed above.

We recommend using UTF-8 encoding for new translations.

7.  Force build

 make -B

8.  Make corrections if there are any errors.

In our case it was suggested to change line 18:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

9.  Force build again:

 make -B

10. Change to the package directory and install the package:

cd ..

sudo make install

11. Repeat above steps 2-10 to translate the package mdtest-1.1.

cd ../mdtest-1.1

./configure

...

sudo make install

12. Test translations, create the distribution packages, and return them to the package maintainer.

 mdtest

Запущенна программа mdtest 1.1. 

Используется библиотека mdtestlib 1.1.0. 

Выполняются тесты для x = 6,000000 и y = 5,000000 

1. Проверяется математическая операция умножения ... 

Calling the multiplication function ...

Число 6,000000 умноженное на число 5,000000 равно числу 30,000000 

Comparing the result of multiplication ...

Операция умножения была успешно выполнена. 

2. Проверяется математическая операция деления ... 

Calling the division function ...

Число 6,000000 разделённое на число 5,000000 равно числу 1,200000 

Comparing the result of division ...

Операция деления была успешно выполнена. 

Программа mdtest 1.1 с библиотекой mdtestlib 1.1.0 успешно выполнила все тесты.

Looks like everything works as expected: mdtest-1.1 calls mdtestlib-1.1.0 and all messages are translated.

 make distcheck

...

============================================

mdtest-1.1 archives ready for distribution:

mdtest-1.1.tar.gz

============================================

...

cd ../mdtestlib-1.1.0

 make distcheck

...

=================================================

mdtestlib-1.1.0 archives ready for distribution:

mdtestlib-1.1.0.tar.gz

=================================================

...

The packages mdtest-1.1.tar.gz and mdtestlib-1.1.0.tar.gz are now localized for US English and Russian, and are ready for distribution.