internationalisation and globalisation
DESCRIPTION
This is a very old presentation, but if you gloss over the usage of VB6 there is plenty of value. I presented this to the VBUG Annual Conference in 2003.
TRANSCRIPT
Internationalisation and Globalisation
Visual Basic 6
Alan Dean
©2003
Credit to Kaplan
• “Internationalization with Visual Basic”, Michael S. Kaplan, ISBN 0672319772
Credit to Appleman
• “Visual Basic Programmer’s Guide to the Win32 API”, Dan Appleman, ISBN 0672315904
Outline
• “In a connected world, it is increasingly important to be able to implement solutions for users across the world. Unfortunately, the ability to do this with VB6 is not well documented, requires a lot of effort to understand and is not available 'out of the box'.”
• http://www.unitoolbox.com
Contents
• The following subjects are covered:
  ◦ Characters
  ◦ Keyboards
  ◦ Fonts (very briefly…)
  ◦ Languages
  ◦ Strings
  ◦ Techniques to code an internationalised application
Terminology
Terminology – Contents
• Globalisation
• Internationalisation (i18N)
• Multinationalisation (M18N)
• Translation
• Localisation (L10N)
Internationalisation (i18N)
• The process of converting an application to be capable of multinationalisation and localisation
• Culture-specific issues are addressed, e.g. conventions, preferences, data formatting
• Depends upon default system or user preferences
• Does not require the translation of the text of an application
Globalisation
• The process of designing and developing an application that supports localized user interfaces and regional data for users in multiple cultures
Source: .NET Framework Developer’s Guide
Multinationalisation (M18N)
• The process of converting an application to support multiple cultures
• A significant enhancement of i18N
• Multiple language availability, including crossing the code page barrier, e.g. Office 2000 multilanguage packs (langpacks) and the Windows 2000 Multilingual User Interface (MUI)
Translation
• The process of representing the text of an application in another language, e.g. dialogs, menus, alerts, documentation, etc.
• For example, the ‘File|Open’ menu item is translated to ‘Fichier|Ouvrir’ in French (source: Microsoft International Word List)
• Converts the meaning and sense of the text, not just the words
Beware Babelfish!
• “Insert the boot disk into Drive A”, translated from English to German using Babelfish, gives “Setzen Sie die Aufladung Scheibe in Antrieb A ein”, which means “Insert the charge disk in Propulsion A”
• “Legen Sie die Boot Diskette in Laufwerk A ein” is the correct translation
Localisation (L10N)
• The process of converting an application to adhere to the local culture of a user
Terminology - Summary
• Explained some of the general terms used around internationalisation
• Discussed the scope of the terms used
About Characters
About Characters - Contents
• Character Repertoires
• Character Codes & Encoding
• Character Sets: ASCII, ANSI, DBCS, Unicode
• Windows Character Set Usage
Character (definition)
• character (noun) … 7. letter or symbol: any written or printed letter, number, or other symbol … Source: Encarta World English Dictionary
Character (alternate definition)
• A character is the atomic unit of textual communication
Character Repertoire
• An abstract set of distinct characters
  ◦ Usually defined by specifying a name and sample presentation of each character
  ◦ The ordering of characters for sorting purposes is not defined
  ◦ Either fixed (e.g. English) or open (e.g. Unicode, Chinese)
Character Repertoire (English)
• The character repertoire of English contains:
  ◦ Alphabet: upper case A ‘A’ … lower case z ‘z’
  ◦ Punctuation: period . ellipsis … comma , semicolon ; colon : question mark ? exclamation point ! quotation marks “” parentheses () apostrophe ’ hyphen -
Character Repertoires
Character Code
• A mapping between an unsigned integer and a character, e.g. 65 = ‘A’
• The VB functions Chr$(…) and Asc(…) address this mapping, e.g. Chr$(65) returns "A" and Asc("A") returns 65
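To make the mapping concrete, here is a minimal Python sketch. Python's built-in chr/ord work on Unicode code points, so they correspond to VB's ChrW$/AscW; VB's Chr$/Asc instead use the current ANSI code page. The helpers ansi_code and ansi_char are hypothetical names that imitate Chr$/Asc for a single-byte code page, assumed here to be Windows code page 1252.

```python
# A character code is a mapping between an unsigned integer and a character.
# chr/ord use Unicode code points (like VB's ChrW$/AscW).

def ansi_code(ch: str, codepage: str = "cp1252") -> int:
    """Hypothetical helper: code of one character in a single-byte ANSI code page (like VB's Asc)."""
    return ch.encode(codepage)[0]

def ansi_char(code: int, codepage: str = "cp1252") -> str:
    """Hypothetical helper: character for a single-byte ANSI code (like VB's Chr$)."""
    return bytes([code]).decode(codepage)

print(chr(65), ord("A"))         # A 65  (identical in ASCII, ANSI and Unicode)
print(ansi_code("€"), ord("€"))  # 128 8364  (cp1252 code vs Unicode code point)
print(ansi_char(128))            # €
```

For the first 128 codes every variant agrees; above that, the answer depends on which character set is doing the mapping.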
Character Encoding
• The process of assigning an unsigned integer (a code point) to each character in a repertoire
• The output of encoding is a character set
• The values assigned imply ordering of the character set, but the ordering may not be meaningful
Character Set
• An encoded character repertoire
• There are a large number of character sets
• Character sets are not language-specific, e.g. Latin Alphabet No. 1 (ISO 8859-1) serves many Western European languages
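A short Python sketch of the last two points, using Python's standard codecs ("latin-1" is Python's name for ISO 8859-1, "cp437" is the original IBM PC OEM set). It shows that one character set serves several languages, that the same character gets a different integer in a different set, and that a character outside the repertoire cannot be encoded at all. The sample words are illustrative only.

```python
# One character set (ISO 8859-1) covers several languages.
samples = {"French": "garçon", "German": "Größe", "Spanish": "señor"}
for language, word in samples.items():
    print(language, word.encode("latin-1"))

# The same character is assigned a different integer in another character set.
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("cp437"))    # b'\x82'

# A character outside the repertoire has no code in this set.
try:
    "α".encode("latin-1")
except UnicodeEncodeError:
    print("Greek alpha is not in ISO 8859-1")
```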
ASCII Character Set
ANSI Character Sets
Double-byte Character Sets (DBCS)
• Also known as MBCS (multi-byte character set), because:
  ◦ the first 128 characters are single-byte encoded, as in ANSI
  ◦ additional characters are double-byte encoded
• Double-byte encoding: the first (or ‘lead’) byte signals that both itself and the next byte are to be interpreted as a single character
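The lead-byte rule can be sketched in Python for Windows code page 932 (Japanese Shift-JIS), whose lead bytes fall in the ranges 0x81 to 0x9F and 0xE0 to 0xFC. The range table and split_dbcs are hypothetical illustrations. Win32 offers IsDBCSLeadByteEx for the lead-byte test itself.

```python
# Lead-byte ranges for code page 932 (Shift-JIS); other DBCS code pages differ.
CP932_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))

def is_lead_byte(b: int) -> bool:
    return any(lo <= b <= hi for lo, hi in CP932_LEAD_RANGES)

def split_dbcs(data: bytes) -> list[bytes]:
    """Split a cp932 byte string into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following (trail) byte as well.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars

encoded = "A日本".encode("cp932")
print(encoded)              # b'A\x93\xfa\x96{'
print(split_dbcs(encoded))  # [b'A', b'\x93\xfa', b'\x96{']
```

The trail byte of 本 is 0x7B, which is ‘{’ in ASCII. Code that scans a DBCS string byte by byte without honouring lead bytes would "find" a brace that is not there.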
Double-byte character
DBCS Example
Unicode Character Set
• All characters are double-byte encoded (as far as Windows is concerned, anyway: UCS-2/UTF-16; in UTF-16, characters beyond U+FFFF take two 16-bit units, a surrogate pair)
• Although DBCS and Unicode both use double-byte encoding, the mapping differs
• All characters in the Unicode character set are given a unique value
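The difference between the DBCS and Unicode mappings can be checked in Python. The same character takes two bytes in both, but with unrelated values. A character outside the Basic Multilingual Plane (the emoji here is just an example) needs a UTF-16 surrogate pair.

```python
ch = "日"
print(hex(ord(ch)))            # 0x65e5  (Unicode code point)
print(ch.encode("utf-16-le"))  # b'\xe5e'  (UTF-16: two bytes, little-endian)
print(ch.encode("cp932"))      # b'\x93\xfa'  (DBCS: two bytes, different value)

emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))  # 4  (a surrogate pair of 16-bit units)
```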
Character Set Comparison
Character Repertoires Revisited
Windows Character Set Usage
• 16-bit Windows uses ANSI character sets, known as code pages
• 32-bit Windows NT uses Unicode internally (Windows 9x still uses ANSI code pages)
Windows Code Page
• A table of 256(+) code points for a language
  ◦ The first 128 code points are the same in every code page (the ASCII table of non-printing and English characters)
  ◦ The next 128(+) are used for non-English characters needed by the language
• Based on ANSI character sets
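A quick Python check of the structure of a code page, comparing code page 1252 (Western European) with code page 1251 (Cyrillic). The lower 128 bytes decode identically to ASCII. An upper byte such as 0xE0 means a different character in each code page.

```python
ascii_half = bytes(range(128))

# The first 128 code points are shared with ASCII.
for cp in ("cp1252", "cp1251"):
    assert ascii_half.decode(cp) == ascii_half.decode("ascii")

# The upper half is language-specific.
upper = bytes([0xE0])
print(upper.decode("cp1252"))  # à  (Latin small a with grave)
print(upper.decode("cp1251"))  # а  (Cyrillic small a)
```

This is why text saved under one code page and read under another keeps its English letters but garbles everything else.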
Windows Code Page 1252, etc.
• http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
About Characters - Summary
• Explained how characters are gathered into repertoires, and are then encoded into character sets
• Described the main character sets supported by Windows
About Keyboards
About Keyboards - Contents
• Scan Codes
• Keyboard Layouts
• Virtual Keys
Scan Code
• A hardware-dependent code sent by a keyboard to indicate a keyboard operation
• Scan codes can vary between different keyboards
Keyboard Layout
• A definition of the scan codes supported by a keyboard
  ◦ Win3.x has a system-wide layout
  ◦ Win9x and WinNT support multiple layouts, on a system-wide and per-thread basis
Virtual Key
• An abstraction of scan codes, so that interpretation of input need not be hardware-specific
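• API constants exist with a VK_ prefix, e.g. VK_RETURN; letter and digit keys have no named constant, because their virtual-key codes equal the ASCII codes of ‘A’ to ‘Z’ and ‘0’ to ‘9’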
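The path from key to character can be sketched with two tiny, hypothetical layout tables. Each maps a handful of IBM PC set 1 scan codes to letter virtual keys for a US (QWERTY) and a French (AZERTY) layout. The same physical key (scan code 0x10) yields ‘q’ under one layout and ‘a’ under the other. Real Windows code would use MapVirtualKeyEx and ToUnicodeEx with the thread's keyboard layout. This sketch ignores dead keys, AltGr and Caps Lock.

```python
# Hypothetical, heavily simplified layouts: scan code -> virtual key.
# Letter virtual keys equal the ASCII code of the upper-case letter.
US_LAYOUT = {0x10: ord("Q"), 0x11: ord("W"), 0x1E: ord("A"), 0x2C: ord("Z")}
FR_LAYOUT = {0x10: ord("A"), 0x11: ord("Z"), 0x1E: ord("Q"), 0x2C: ord("W")}

def to_character(scan_code: int, layout: dict, shift: bool = False) -> str:
    vk = layout[scan_code]  # hardware-specific scan code -> virtual key
    ch = chr(vk)            # letter virtual key -> 'A'..'Z'
    return ch if shift else ch.lower()

print(to_character(0x10, US_LAYOUT))  # q
print(to_character(0x10, FR_LAYOUT))  # a
```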
From Key to Character
Keyboard limitations
• Keyboards are an effective data entry method for most languages
• However, keyboards are impractical for character-based languages, because no keyboard has the thousands of keys they would need, i.e. Far East languages (Chinese, Japanese and Korean, also known as CJK languages)
Input Method Editor (IME)
• Software to allow the input of CJK characters
  ◦ A group that approximates a character is selected
  ◦ An actual character can then be selected from the group
• Run by the Input Method Manager (IMM)
Japanese IME
About Keyboards - Summary
• Explained how keystrokes become characters
• Briefly discussed non-keyboard input
About Fonts
About Fonts - Contents
• Character-based systems
• Graphic-based systems
• Glyphs & Fonts
Character-based Systems
• Such systems display characters only
Graphic-based Systems
• Such systems display glyphs, not characters
Glyph
• A glyph is a graphical representation of a character
Font
• A collection of glyphs
About Fonts - Summary
• Discussed the difference between character-based and graphic-based systems
• Briefly discussed the representation of characters by glyphs and fonts
About Languages
About Languages - Contents
• Languages
• Locales
Language (definition)
• language (noun) 1. speech of group: the speech of a country, region, or group of people, including its diction, syntax, and grammar … Source: Encarta World English Dictionary
Locale
• A specific international market where a target user is working
• Encompasses localisation issues, e.g. conventions, culture, language, preferences
  ◦ including formatting of numbers, currencies, etc.
  ◦ phraseology can also vary
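A small Python sketch of one such convention, number formatting. The two-entry table of separators is a hypothetical stand-in for real locale data. On Windows these values come from the locale, for example via GetLocaleInfo with LOCALE_STHOUSAND and LOCALE_SDECIMAL. Only grouping and the decimal point are handled here.

```python
# Hypothetical locale table: (thousands separator, decimal separator).
NUMBER_FORMATS = {
    "en-GB": (",", "."),
    "de-DE": (".", ","),
}

def format_number(value: float, locale_name: str, decimals: int = 2) -> str:
    group_sep, decimal_sep = NUMBER_FORMATS[locale_name]
    # Format with neutral ',' grouping and '.' decimal point, then swap both
    # separators in a single pass so they cannot overwrite each other.
    neutral = f"{value:,.{decimals}f}"
    return neutral.translate(str.maketrans({",": group_sep, ".": decimal_sep}))

print(format_number(1234567.891, "en-GB"))  # 1,234,567.89
print(format_number(1234567.891, "de-DE"))  # 1.234.567,89
```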
Locale Identifier (LCID)
• A 32-bit unsigned integer that identifies the locale for the system or thread
• Commonly pronounced el-sid
LCID Structure
LCID Language
• Language Identifier: a combination of the primary and secondary language identifiers
• Primary Language Identifier: represents the language itself (e.g. ‘English’)
• Secondary Language Identifier: represents the country or region where the language is spoken (e.g. ‘English as spoken in the United Kingdom’)
LCID Sorting
• Sort Identifier: represents the order in which characters are to be sorted (usually the default)
• Sort Version: currently unused (it is reserved and must be set to 0)
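The LCID fields can be packed and unpacked with a few bit operations. This Python sketch mirrors the Win32 MAKELANGID and MAKELCID macros. Bits 0 to 9 hold the primary language, bits 10 to 15 the sublanguage, bits 16 to 19 the sort ID, and bits 20 to 23 the reserved sort version. The constant values are those from the Windows headers. The function names are Python renderings, not a real API.

```python
LANG_ENGLISH, SUBLANG_ENGLISH_US, SUBLANG_ENGLISH_UK = 0x09, 0x01, 0x02
LANG_GERMAN, SUBLANG_GERMAN = 0x07, 0x01
SORT_DEFAULT, SORT_GERMAN_PHONE_BOOK = 0x0, 0x1

def make_langid(primary: int, sub: int) -> int:
    return (sub << 10) | primary  # like MAKELANGID

def make_lcid(langid: int, sort_id: int = SORT_DEFAULT) -> int:
    return (sort_id << 16) | langid  # like MAKELCID; sort version stays 0

def split_lcid(lcid: int) -> dict:
    langid = lcid & 0xFFFF
    return {
        "primary": langid & 0x3FF,
        "sub": langid >> 10,
        "sort_id": (lcid >> 16) & 0xF,
        "sort_version": (lcid >> 20) & 0xF,
    }

en_gb = make_lcid(make_langid(LANG_ENGLISH, SUBLANG_ENGLISH_UK))
print(hex(en_gb), en_gb)  # 0x809 2057
de_phone = make_lcid(make_langid(LANG_GERMAN, SUBLANG_GERMAN), SORT_GERMAN_PHONE_BOOK)
print(hex(de_phone))      # 0x10407  (German, phone book sort order)
```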
Locale Coverage
• Windows does not have locales for all possible language / region combinations
  ◦ In fact, almost without exception, a locale is only supported if there is a country or region that speaks the language
  ◦ For example, there is no locale for Esperanto, Coptic or Latin, and certainly not for Klingon!
Locale Usage
• Settings associated with locales are heavily used by Windows, COM and VB
  ◦ So, the current locale fundamentally affects the processing of information on a system
• Settings are accessed by the Regional Options control panel
About Languages - Summary
• Discussed the relationship between languages and locales
• Explained the structure of the locale identifier
About Strings
About Strings - Contents
• C Strings
• VB Strings
• VB String calls to COM and Win32 API functions
String
• An array of characters
• Not a primitive datatype
• A number of string datatypes exist, e.g. LPSTR, BSTR, etc.
Pointer to String (LPSTR)
• C datatype
• Null-terminated
• Used extensively throughout the Windows API
Basic String (BSTR)
• COM datatype, used by VB internally
• A pointer to a block of Unicode characters, prefixed by a length field giving the size of the string in bytes, with:
  ◦ a contract for creation (allocation)
  ◦ a contract for destruction (deallocation)
  ◦ an API (e.g. SysAllocString, SysFreeString)
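The two layouts can be compared by building their bytes by hand. This Python sketch assumes the documented BSTR layout: a 4-byte length prefix holding the byte count (excluding the terminator), then UTF-16LE data, then a 2-byte null terminator. The BSTR pointer itself points at the first character, just after the prefix. The LPSTR side assumes code page 1252. All four functions are hypothetical illustrations. Real code allocates BSTRs with SysAllocString and friends.

```python
import struct

def lpstr_bytes(text: str, codepage: str = "cp1252") -> bytes:
    return text.encode(codepage) + b"\x00"  # ANSI data, null-terminated

def bstr_bytes(text: str) -> bytes:
    payload = text.encode("utf-16-le")
    # length prefix (bytes, excluding terminator) + Unicode data + 2-byte null
    return struct.pack("<I", len(payload)) + payload + b"\x00\x00"

def read_lpstr(buf: bytes, codepage: str = "cp1252") -> str:
    return buf[:buf.index(b"\x00")].decode(codepage)  # stops at the first null

def read_bstr(buf: bytes) -> str:
    (byte_len,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + byte_len].decode("utf-16-le")  # length-driven, nulls allowed

print(bstr_bytes("Hi"))                          # b'\x04\x00\x00\x00H\x00i\x00\x00\x00'
print(repr(read_lpstr(lpstr_bytes("AB\0CD"))))   # 'AB'
print(repr(read_bstr(bstr_bytes("AB\0CD"))))     # 'AB\x00CD'
```

The length prefix is why a BSTR can carry embedded nulls, while an LPSTR is silently truncated at the first one.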
VB COM Calls
• Both VB and COM use Unicode, so strings are not converted into other character sets
VB Win32 API Calls
• Character encoding: VB and WinNT use Unicode encoding, but Win9x uses ANSI encoding
• Unfortunately, VB does not know the encoding expected by the target API call
  ◦ Strings are therefore converted to ANSI
  ◦ Thus the call succeeds on both Win9x and WinNT, but this is wasteful on WinNT, which converts the strings back to Unicode internally…
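Besides being wasteful, the round trip through ANSI can lose data. This Python sketch models it with a hypothetical vb_declare_roundtrip function, assuming code page 1252 as the system code page. Characters the code page cannot represent come back as ‘?’. Windows' own conversion may also apply "best fit" substitutions, which this sketch does not model.

```python
def vb_declare_roundtrip(text: str, codepage: str = "cp1252") -> str:
    """Hypothetical model of a Declare call: Unicode -> ANSI -> Unicode."""
    ansi = text.encode(codepage, errors="replace")  # unmappable characters become b'?'
    return ansi.decode(codepage)

print(vb_declare_roundtrip("café"))  # café  (representable in cp1252)
print(vb_declare_roundtrip("日本"))  # ??    (lost in the conversion)
```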
VB Win9x API Call
VB WinNT API Call
VB WinNT API Call (Unicode)
About Strings - Summary
• Discussed C and VB strings
• Explained how COM and Win32 API string function calls are transacted
An Internationalised App
1.0.1
• ‘Plain vanilla’ VB Standard EXE
2.0.2
• 1st attempt to internationalise: addition of resource file
2.1.2
• 2nd attempt to internationalise: isolate persistent strings
2.2.2
• 3rd attempt to internationalise: parameterise resource strings
2.2.3
• 4th attempt to internationalise: loading with current LCID, by setting thread locale
3.0.4
• 5th attempt to internationalise: loading with current LCID (again…), by loading resources directly
3.1.5
• 6th attempt to internationalise: loading with current LCID (yet again!), by employing satellite resources
3.1.6
• 7th attempt to internationalise: loading all strings from satellite resources
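The end state of this sequence, with parameterised strings loaded per LCID from satellite resources, can be approximated in Python. This is not the presentation's VB code. The string tables, IDs and the load_string function are hypothetical, as is the fallback order: exact LCID, then the language-neutral LCID for the primary language, then neutral English. The "%1" placeholder style is also an assumption.

```python
import re

NEUTRAL_ENGLISH = 0x0009
# Hypothetical satellite string tables, keyed by LCID.
STRING_TABLES = {
    0x0009: {101: "Open", 102: "Cannot open %1"},            # English (neutral)
    0x000C: {101: "Ouvrir", 102: "Impossible d'ouvrir %1"},  # French (neutral)
}

def load_string(lcid: int, string_id: int, *args: str) -> str:
    primary = lcid & 0x3FF  # primary language ID = neutral LCID for that language
    for candidate in (lcid, primary, NEUTRAL_ENGLISH):
        table = STRING_TABLES.get(candidate)
        if table and string_id in table:
            template = table[string_id]
            break
    else:
        raise KeyError(f"string {string_id} not found")
    # Substitute %1..%9 with the supplied arguments.
    return re.sub(r"%([1-9])", lambda m: args[int(m.group(1)) - 1], template)

print(load_string(0x0C0C, 101))                # Ouvrir  (fr-CA falls back to French)
print(load_string(0x0407, 101))                # Open    (no German table: default)
print(load_string(0x0809, 102, "report.txt"))  # Cannot open report.txt
```

Keeping placeholders in the resource string, rather than concatenating fragments in code, lets each translation put the parameter wherever its grammar requires.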
Conclusion
• Covered Characters, Keyboards, Fonts, and Languages
• Explained Strings and the usage of Strings
• Coded a simple internationalised application