Internationalizing JavaScript Applications
Norbert Lindenberg
© Norbert Lindenberg 2013. All rights reserved.
Agenda
• Unicode support
• Collation
• Number and date/time formatting
• Localizable resources
• Message construction
2
JavaScript is…
• ECMAScript Language
• ECMAScript Internationalization API
• Browser: DOM, Navigator, XMLHttpRequest
• Server: Node.js
• Platforms: Firefox OS, Windows 8, Phonegap
• Libraries: jQuery, Dojo, YUI, GWT, Node modules, etc.
3
ECMAScript
• Language Specification
• Developed by Ecma TC 39
• Language syntax and semantics
• Core API: Object, String, Array, RegExp, ...
• Edition 5.1 current
• Edition 6 expected December 2014
4
ECMAScript
• Internationalization API Specification
• Developed by Ecma TC 39 + experts
• API: Collator, NumberFormat, DateTimeFormat
• Edition 1 approved December 2012
• Chrome, Opera, Explorer, Windows shipped; Firefox, Node.js coming
• Edition 2 expected December 2014
5
Unicode
Unicode support
• All text in UTF-16 internally
• UTF-8 well supported for transport
• Need to identify charset in <script> tags, Content-Type headers
• Need to use encodeURIComponent for path and query string components
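A minimal sketch of the last point, using the standard encodeURIComponent function. The host example.com and the query parameter q are made up for illustration.

```javascript
// Encode each path segment and query value separately, so that reserved
// characters and non-ASCII text are percent-encoded as UTF-8.
const city = "\u5317\u4EAC";        // 北京
const query = "caf\u00E9 & bar";    // café & bar
const url = "https://example.com/cities/" + encodeURIComponent(city) +
    "?q=" + encodeURIComponent(query);
console.log(url);
// → https://example.com/cities/%E5%8C%97%E4%BA%AC?q=caf%C3%A9%20%26%20bar
```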
7
Occupy Wall Street. By @tanlines.
Supplementary characters
• Characters above U+FFFF
• Emoji, rare CJK, ancient scripts, musical symbols, ...
• 2 code units in UTF-16
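To make the "2 code units" point concrete, here is a small sketch using only ES5-era string methods. The helper name codePointFromSurrogates is hypothetical.

```javascript
// U+1F600 GRINNING FACE, written as its UTF-16 surrogate pair.
const emoji = "\uD83D\uDE00";
console.log(emoji.length);                      // 2 (code units, not characters)
console.log(emoji.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(emoji.charCodeAt(1).toString(16));  // de00 (low surrogate)

// Combine a high and a low surrogate into the code point they encode.
function codePointFromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}
console.log(codePointFromSurrogates(0xD83D, 0xDE00).toString(16)); // 1f600
```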
9
Today: UCS-2 or UTF-16?
UCS-2:
• Regular expressions
• String comparison
• Case conversion
UTF-16:
• Source text conversion
• URI handling
• DOM, text input, text rendering, XMLHttpRequest
11
ECMAScript 6: UTF-16
• Case conversion for full Unicode
• Full Unicode in identifiers
• String accessors for code points
• But: no change to low-level string comparison
• Planned: New Unicode mode in regular expressions
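A short sketch of these points as they look in an ES6-capable engine. It assumes the planned Unicode mode for regular expressions is available as the u flag, which is how it eventually shipped.

```javascript
const text = "a\u{1F600}b";   // "a", U+1F600, "b"

// String accessors for code points.
console.log(text.length);                          // 4 (code units)
console.log(text.codePointAt(1).toString(16));     // 1f600
console.log(String.fromCodePoint(0x1F600) === "\u{1F600}"); // true
console.log(Array.from(text).length);              // 3 (iteration is by code point)

// Regular expressions: "." matches one code unit unless the u flag is set.
console.log(/^.$/.test("\u{1F600}"));              // false
console.log(/^.$/u.test("\u{1F600}"));             // true

// Low-level comparison is unchanged: it still compares code units,
// so U+1F600 (0xD83D 0xDE00) sorts before U+FFFF.
console.log("\u{1F600}" < "\uFFFF");               // true
```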
12
Regular expressions
• RegExp in ES5 doesn’t have much Unicode support
• No support for Unicode character properties
• No support for supplementary characters
13
Regular expressions
• CSet (inimino): Character classes with supplementary characters
• XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters
14
Unicode normalization
• Makes strings that users perceive as equal compare as equal (more or less)
• ä = a + ¨: U+00E4 = U+0061 U+0308
• 김 = ᄀ + ᅵ + ᆷ: U+AE40 = U+1100 U+1175 U+11B7
15
Unicode normalization
• ECMAScript 5 “assumes” normalization happens where needed
• Reality: applications have to do it
• ECMAScript 6: String.prototype.normalize
• "김".normalize("NFD") → "\u1100\u1175\u11B7"
• Libraries available, but not up to date:
• unorm (Matsuza)
• Richard Ishida’s normalizer
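As a sketch of why applications have to normalize, the following compares the composed and decomposed forms from the earlier slide, assuming an engine that implements String.prototype.normalize.

```javascript
const composed = "\u00E4";      // ä as a single code point
const decomposed = "a\u0308";   // a followed by combining diaeresis

// Without normalization the two strings are different.
console.log(composed === decomposed);                   // false

// After normalizing both to the same form they compare equal.
console.log(composed.normalize("NFD") === decomposed);  // true
console.log(decomposed.normalize("NFC") === composed);  // true

// The Hangul example: U+AE40 decomposes into three jamo.
console.log("\uAE40".normalize("NFD") === "\u1100\u1175\u11B7"); // true
```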
16
北京大学.中国
Internationalized domain names
• Unicode at user interface
• ASCII under the hood
• 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s
• Main steps:
• normalization (as discussed)
• punycode (Mathias Bynens has latest)
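One way to see the ASCII form without a separate library is the WHATWG URL API, which is not mentioned in the talk. This sketch assumes a runtime that provides a global URL constructor with IDNA support (current browsers and Node.js do).

```javascript
// Parsing a URL converts an internationalized host name to its ASCII form.
const host = new URL("http://北京大学.中国/").hostname;
console.log(host);  // xn--1lq90ic7fzpc.xn--fiqs8s
```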
19
Collation
Collation (sorting)
• Old: String.prototype.localeCompare
• Only string argument
• New: Intl.Collator
• locales
• options
• Fixed: String.prototype.localeCompare
• With locales and options arguments
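A minimal sketch of the new API, assuming the engine ships collation data for German and Swedish. The word list is made up for illustration.

```javascript
const words = ["z", "ä", "a"];

// German sorts ä next to a; Swedish treats ä as a separate letter after z.
const german = [...words].sort(new Intl.Collator("de").compare);
const swedish = [...words].sort(new Intl.Collator("sv").compare);
console.log(german);   // [ 'a', 'ä', 'z' ]
console.log(swedish);  // [ 'a', 'z', 'ä' ] with Swedish collation data available

// The fixed localeCompare accepts the same locales and options arguments.
console.log("ä".localeCompare("z", "de") < 0);  // true
```

Creating one Intl.Collator and reusing its compare function is usually preferable to calling localeCompare with locales inside a sort, since the collator is set up only once.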
21
Locales
• BCP 47 language tags
• Language, script, country codes
• “es”, “en-AU”, “zh-Hans-CN”
• Unicode locale extension
• “de-u-co-phonebk”
• Preference lists
• [“mr”, “hi”, “en-IN”]
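Language tags can be checked and brought into canonical form with Intl.getCanonicalLocales, a function added to the Internationalization API after this talk. A small sketch, assuming an engine that provides it:

```javascript
// Canonicalization fixes the case of subtags; it does not check whether
// the engine actually has data for the locale.
const tags = Intl.getCanonicalLocales(["EN-au", "zh-hans-cn", "DE-u-co-phonebk"]);
console.log(tags);  // [ 'en-AU', 'zh-Hans-CN', 'de-u-co-phonebk' ]

// Structurally invalid tags, such as the underscore form, are rejected.
let rejected = false;
try {
  Intl.getCanonicalLocales("en_US");
} catch (e) {
  rejected = e instanceof RangeError;
}
console.log(rejected);  // true

// A preference list is passed as an array; the result depends on the engine's data.
console.log(Intl.Collator.supportedLocalesOf(["mr", "hi", "en-IN"]));
```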
22
Locale negotiation
• BCP 47 Lookup
• [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”
• Best fit
• implementation-defined
• [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”
• Unicode extension handled separately
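The lookup order above can be sketched in a few lines. The function names lookupCandidates and lookupLocale are hypothetical, the sketch assumes tags without Unicode extensions (those are handled separately, as noted), and it compares tags case-sensitively for brevity.

```javascript
// Build the list of tags that BCP 47 Lookup tries, in order:
// each requested tag, then progressively truncated versions of it.
function lookupCandidates(requested) {
  const candidates = [];
  for (const tag of requested) {
    let subtags = tag.split("-");
    while (subtags.length > 0) {
      const candidate = subtags.join("-");
      if (!candidates.includes(candidate)) candidates.push(candidate);
      subtags = subtags.slice(0, -1);
      // RFC 4647: a trailing single-character subtag is removed as well.
      if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
        subtags = subtags.slice(0, -1);
      }
    }
  }
  return candidates;
}

// Return the first candidate that is available, or the fallback.
function lookupLocale(requested, available, fallback) {
  const match = lookupCandidates(requested).find((tag) => available.includes(tag));
  return match !== undefined ? match : fallback;
}

console.log(lookupCandidates(["es-GT", "es-MX"]));  // [ 'es-GT', 'es', 'es-MX' ]
console.log(lookupLocale(["es-GT", "es-MX"], ["en", "es", "es-MX"], "en"));  // es
```

Note that lookup picks "es" here even though "es-MX" is available and was requested; a best-fit matcher is free to choose differently.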
23
Collator extensions
• co: collation – phonebook, pinyin, ...
• kf: case first – upper, lower
• kn: numeric sorting
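A sketch of two of these extension keys embedded in the language tag. The file names are made up for illustration.

```javascript
const files = ["file10", "file2", "file1"];

// Default collation compares digit by digit.
const plain = [...files].sort(new Intl.Collator("en").compare);
console.log(plain);    // [ 'file1', 'file10', 'file2' ]

// kn-true asks for numeric sorting: digit sequences compare by value.
const numeric = [...files].sort(new Intl.Collator("en-u-kn-true").compare);
console.log(numeric);  // [ 'file1', 'file2', 'file10' ]

// kf-upper sorts upper case before lower case where strings otherwise tie.
const upperFirst = ["a", "A"].sort(new Intl.Collator("en-u-kf-upper").compare);
console.log(upperFirst);  // [ 'A', 'a' ]
```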
24
Collator options
• localeMatcher: lookup, best fit
• usage: sort, search
• sensitivity: base, accent, case, variant
• ignorePunctuation
• numeric, caseFirst
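The following sketch shows how some of these options change comparison results. The sample strings are made up for illustration.

```javascript
// sensitivity "base": only base letters matter; accents and case are ignored.
const base = new Intl.Collator("en", { sensitivity: "base" });
console.log(base.compare("a", "A"));        // 0
console.log(base.compare("a", "á"));        // 0
console.log(base.compare("a", "b") < 0);    // true

// sensitivity "accent": accents matter, case does not.
const accent = new Intl.Collator("en", { sensitivity: "accent" });
console.log(accent.compare("a", "A"));       // 0
console.log(accent.compare("a", "á") !== 0); // true

// usage "search" with base sensitivity gives a simple loose match.
const searcher = new Intl.Collator("en", { usage: "search", sensitivity: "base" });
const names = ["Résumé", "resume", "Rose"];
const hits = names.filter((name) => searcher.compare(name, "resume") === 0);
console.log(hits);  // [ 'Résumé', 'resume' ]

// ignorePunctuation asks the collator to skip punctuation when comparing.
const noPunct = new Intl.Collator("en", { ignorePunctuation: true });
console.log(noPunct.compare("co-op", "coop"));
```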
25
Non-ECMAScript
• Nothing good found (some for Latin only)
• Collation is hard
• Knowledge of full Unicode character set
• Big tables
• Send lists that need alphabetic sorting to server
26
Number formatting
27
Number formatting
• Old: Number.prototype.toLocaleString
• No arguments
• New: Intl.NumberFormat
• locales
• options
• Fixed: Number.prototype.toLocaleString
• With locales and options arguments
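A minimal sketch of locale-sensitive number formatting, assuming the engine has data for the locales used. The amount is arbitrary.

```javascript
const amount = 1234567.891;

// The same number with different decimal and grouping conventions.
const english = new Intl.NumberFormat("en-US").format(amount);
const german = new Intl.NumberFormat("de-DE").format(amount);
console.log(english);  // 1,234,567.891
console.log(german);   // 1.234.567,891

// The fixed toLocaleString takes the same locales and options arguments.
console.log(amount.toLocaleString("en-IN"));  // 12,34,567.891
```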
28
NumberFormat extensions
• nu: numbering system
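As an illustration, the nu key can request Thai digits through the language tag. This is a sketch that assumes the engine supports the thai numbering system.

```javascript
// Grouping is switched off so that only the digits themselves differ.
const thaiDigits = new Intl.NumberFormat("th-TH-u-nu-thai", { useGrouping: false });
console.log(thaiDigits.format(2013));                      // ๒๐๑๓
console.log(thaiDigits.resolvedOptions().numberingSystem); // thai
```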
29
NumberFormat options
• localeMatcher: lookup, best fit
• style: decimal, currency, percent
• currency: ISO 4217 currency code
• currencyDisplay: symbol, code, name
• minimum/maximum digits
• useGrouping
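A sketch of the main options in use, with en-US as an arbitrary example locale:

```javascript
// style "currency" requires a currency code.
const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });
console.log(usd.format(1234.5));       // $1,234.50

// style "percent" multiplies by 100 and rounds to whole percent by default.
const percent = new Intl.NumberFormat("en-US", { style: "percent" });
console.log(percent.format(0.256));    // 26%

// Digit settings: always show exactly two fraction digits.
const fixed = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2
});
console.log(fixed.format(1.5));        // 1.50

// useGrouping: false suppresses grouping separators.
const ungrouped = new Intl.NumberFormat("en-US", { useGrouping: false });
console.log(ungrouped.format(1234567)); // 1234567

// currencyDisplay selects symbol, code, or name for the currency.
const byName = new Intl.NumberFormat("en-US", {
  style: "currency", currency: "USD", currencyDisplay: "name"
});
console.log(byName.format(1234.5));
```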
30
Non-ECMAScript
            ¤    %    ๙    #    ,    ⚑
Globalize   +    +    -    +    -    250+
Dojo        +    +    -    +    -    30+
Closure     +    +    +    +    +    300+
Windows 8   +    +    +    +    +    100s
iLib        +    +    -    +    -    10+
¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.
31
Date and time formatting
Date and time formatting
• Old: Date.prototype.toLocale[|Date|Time]String
• No arguments
• New: Intl.DateTimeFormat
• locales
• options
• Fixed: Date.prototype.toLocale[|Date|Time]String
• With locales and options arguments
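A minimal sketch of date formatting with the new API. The date is arbitrary, and the time zone is pinned to UTC so the output does not depend on where the code runs.

```javascript
const date = new Date(Date.UTC(2013, 9, 28, 12, 0, 0));  // 28 October 2013, noon UTC

const us = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" }).format(date);
const de = new Intl.DateTimeFormat("de-DE", { timeZone: "UTC" }).format(date);
console.log(us);  // 10/28/2013
console.log(de);  // 28.10.2013

// The fixed toLocale*String methods take the same arguments.
console.log(date.toLocaleDateString("en-US", { timeZone: "UTC" }));  // 10/28/2013
```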
33
DateTimeFormat extensions
• ca: calendar
• nu: numbering system
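A sketch of both extension keys, assuming the engine supports the Buddhist calendar and Thai digits:

```javascript
const when = new Date(Date.UTC(2013, 9, 28, 12, 0, 0));

// ca-buddhist: the Buddhist calendar counts years from 543 BCE, so 2013 is 2556.
const buddhist = new Intl.DateTimeFormat("en-US-u-ca-buddhist",
    { year: "numeric", timeZone: "UTC" });
console.log(buddhist.resolvedOptions().calendar);  // buddhist
console.log(buddhist.format(when));                // contains 2556

// nu-thai: the same Gregorian year written with Thai digits.
const thaiYear = new Intl.DateTimeFormat("en-u-nu-thai",
    { year: "numeric", timeZone: "UTC" });
console.log(thaiYear.format(when));                // ๒๐๑๓
```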
34
DateTimeFormat options
• localeMatcher: lookup, best fit
• timeZone: UTC
• hour12
• weekday, era, year, month, day, hour, minute, second, timeZoneName: components
• formatMatcher: basic, best fit
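The component options describe which fields to show and how; the locale decides their order and punctuation. A sketch with en-US as an arbitrary locale:

```javascript
const moment = new Date(Date.UTC(2013, 9, 28, 14, 30));

// Request components; the formatter picks a matching locale pattern.
const longDate = new Intl.DateTimeFormat("en-US", {
  weekday: "long", year: "numeric", month: "long", day: "numeric",
  timeZone: "UTC"
});
console.log(longDate.format(moment));  // Monday, October 28, 2013

// hour12: false forces a 24-hour clock even where 12-hour is customary.
const clock24 = new Intl.DateTimeFormat("en-US", {
  hour: "2-digit", minute: "2-digit", hour12: false, timeZone: "UTC"
});
console.log(clock24.format(moment));   // 14:30
```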
35
Non-ECMAScript
            ca   tz   ๙    ⚑
Globalize   5+   +    -    250+
Dojo        4    -    -    30+
Closure     +    +    +    300+
Windows 8   ?    -    ?    ?
Moment      -    -    -    50
YUI         -    -    -    50+
ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.
36
Resources
Localizable resources
• ECMAScript doesn’t have an I/O system, therefore no standard resource loading
• Developers have invented many different mechanisms
38
Representation
• At runtime, use object as key-value map
• JSON good for transfer; not as source
• Localizers need comments
• Existing source formats have problems
• Java properties files are not UTF-8
• gettext .po files encoding is unspecified
• YUI uses its own YRB format
39
How dynamic would you like it?
• Making resources available to application
• Inject strings/objects into JavaScript code
• Bundle resources with JavaScript code
• Load resource bundles at runtime
40
Injecting strings into JavaScript
• Server knows locale when generating JavaScript and HTML
• Inject strings or objects directly where needed
• E.g., JavaServer Pages Standard Tag Library:
<fmt:bundle basename="Messages">
  <script>
    alert("<fmt:message key='HELLO'/>");
  </script>
</fmt:bundle>
41
Injecting strings into JavaScript
• Problems:
• Mixes multiple programming languages
• Can introduce syntax errors through localization
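The second problem can be sketched as follows. The functions injectNaive, injectEscaped, and isValidScript are hypothetical stand-ins for a server-side template, and the translated string is made up.

```javascript
// Naive injection: paste the translated text between quotes.
function injectNaive(message) {
  return 'alert("' + message + '");';
}

// Safer injection: let JSON.stringify produce a properly escaped string literal.
function injectEscaped(message) {
  return "alert(" + JSON.stringify(message) + ");";
}

// Check whether generated source parses, without running it.
function isValidScript(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// A translation that happens to contain double quotes.
const translated = 'Dites "bonjour"';
console.log(isValidScript(injectNaive(translated)));    // false
console.log(isValidScript(injectEscaped(translated)));  // true
```

JSON.stringify only handles JavaScript string escaping; text placed inside an HTML script element may need further escaping, for example of a closing script tag.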
42
Bundling resources with JavaScript
• Server knows locale when serving JS
• Bundles resources with JavaScript
• Convert to JavaScript/JSON
• Concatenate with other JavaScript
• Resources:
var MyResources = {HELLO: "안녕하세요"};
• Code:
alert(MyResources.HELLO);
43
Loading resources at runtime
• Locale not known until runtime
• Request resources at runtime
• Using XMLHttpRequest
• By creating script tag
• Resources in JSON or module format
44
Loading resources at runtime
• Cross-domain support?
• Not with XMLHttpRequest
• Possible with script tag
• Synchronous access?
• More convenient programming model
• Can lock up browser
• BCP 47 support?
• Many loaders assume aa-AA format
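A sketch of an asynchronous loader that does not assume the aa-AA format. The names bundleFallbackChain, loadBundle, and fakeServer are hypothetical, as is the "root" bundle name; fetchBundle stands in for whatever transport is used (XMLHttpRequest, script tag, or other).

```javascript
// Build the list of bundle names to try, from most to least specific.
// Works for tags with script subtags and ignores extensions.
function bundleFallbackChain(tag) {
  let subtags = Intl.getCanonicalLocales(tag)[0].split("-");
  // Cut off extensions and private use, which start with a one-character subtag.
  const extensionStart = subtags.findIndex((subtag) => subtag.length === 1);
  if (extensionStart !== -1) subtags = subtags.slice(0, extensionStart);
  const chain = [];
  while (subtags.length > 0) {
    chain.push(subtags.join("-"));
    subtags.pop();
  }
  chain.push("root");
  return chain;
}

// Try each bundle name in turn without blocking the caller.
async function loadBundle(tag, fetchBundle) {
  for (const name of bundleFallbackChain(tag)) {
    const bundle = await fetchBundle(name);
    if (bundle) return bundle;
  }
  throw new Error("No resource bundle found for " + tag);
}

console.log(bundleFallbackChain("zh-Hans-CN"));  // [ 'zh-Hans-CN', 'zh-Hans', 'zh', 'root' ]

// Usage with an in-memory stand-in for the server.
const fakeServer = { es: { HELLO: "Hola" }, root: { HELLO: "Hello" } };
loadBundle("es-GT", async (name) => fakeServer[name])
    .then((bundle) => console.log(bundle.HELLO));  // Hola
```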
45
Access to resources in libraries
• Dojo
• Loading at runtime, synchronous
• GWT
• Injecting resources (Constant)
• Bundling resources (with HTML, Dictionary)
• YUI
• Bundling resources via module loader
46
Message construction
Photo © Den Widhana
Message construction
• Substitution
• {user} went to {city}.
• {user}さんは{city}へ行きました。
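A minimal sketch of named substitution. The function formatMessage is hypothetical and handles only simple {name} placeholders; the names and cities are sample data.

```javascript
// Replace each {name} placeholder with the matching argument.
// Unknown placeholders are left in place so that mistakes stay visible.
function formatMessage(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, (placeholder, name) =>
      Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : placeholder);
}

console.log(formatMessage("{user} went to {city}.", { user: "Maya", city: "Tokyo" }));
// → Maya went to Tokyo.
console.log(formatMessage("{user}さんは{city}へ行きました。", { user: "Maya", city: "東京" }));
// → Mayaさんは東京へ行きました。
```

Because placeholders are named rather than positional, translators can reorder them freely to fit the target language.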
48
Message construction
• Plurals
• {user} est allé à {city}.
• {user1} et {user2} sont allés à {city}.
• 1-6 forms depending on language
• {number, plural {one {...} few {...} many {...}}}
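The plural category for a number can be determined with Intl.PluralRules, which was added to the Internationalization API after this talk. A sketch, assuming an engine that provides it with data for the locales shown; the message table and pluralMessage are hypothetical.

```javascript
// English has two forms, Russian has more, Japanese has one.
const en = new Intl.PluralRules("en");
const ru = new Intl.PluralRules("ru");
const ja = new Intl.PluralRules("ja");
console.log(en.select(1), en.select(2));                // one other
console.log(ru.select(1), ru.select(2), ru.select(5));  // one few many
console.log(ja.select(1), ja.select(2));                // other other

// Pick a message by plural category, falling back to "other".
const fileMessages = { one: "{n} file", other: "{n} files" };
function pluralMessage(rules, messages, n) {
  const pattern = messages[rules.select(n)] || messages.other;
  return pattern.replace("{n}", String(n));
}
console.log(pluralMessage(en, fileMessages, 1));  // 1 file
console.log(pluralMessage(en, fileMessages, 3));  // 3 files
```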
49
Message construction
• Gender
• {user} est allé à {city}.
• {user} est allée à {city}.
• 1-4 forms depending on language
• {gender, select {female {...} male {...} unknown {...}}}
50
Message construction
{gender, select {
female {num, plural {
one {{user1} est allée à {city}.}
other {{user1} et {user2} sont allées à {city}.}}}
male {num, plural {
one {{user1} est allé à {city}.}
other {{user1} et {user2} sont allés à {city}.}}}
}}
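The nested selection above can be sketched with plain objects. This is not the MessageFormat library's API: the table layout, the function formatVisit, and the fallback to the masculine forms for other gender values are assumptions made for illustration.

```javascript
// Messages keyed first by gender, then by plural category.
const visitMessages = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}."
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}."
  }
};

const frenchPlurals = new Intl.PluralRules("fr");

function formatVisit(gender, num, args) {
  // Assumption: fall back to the masculine forms for unlisted genders.
  const byGender = visitMessages[gender] || visitMessages.male;
  const pattern = byGender[frenchPlurals.select(num)] || byGender.other;
  return pattern.replace(/\{(\w+)\}/g, (placeholder, name) =>
      Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : placeholder);
}

console.log(formatVisit("female", 1, { user1: "Marie", city: "Paris" }));
// → Marie est allée à Paris.
console.log(formatVisit("male", 2, { user1: "Jean", user2: "Paul", city: "Lyon" }));
// → Jean et Paul sont allés à Lyon.
```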
51
Message construction
• Google has MessageFormat for Closure environment
• Alex Sexton provided standalone version
• Mozilla has even more ambitious L20n library
52
Summary
• ECMAScript Internationalization API provides core functionality
• http://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/
• Libraries provide more internationalization support than you may think
• http://norbertlindenberg.com/2013/10/javascript-internationalization/
53