Post on 12-Mar-2020

Internationalizing JavaScript Applications

Norbert Lindenberg

© Norbert Lindenberg 2013. All rights reserved.

Agenda

• Unicode support

• Collation

• Number and date/time formatting

• Localizable resources

• Message construction

JavaScript is…

• ECMAScript Language

• ECMAScript Internationalization API

• Browser: DOM, Navigator, XMLHttpRequest

• Server: Node.js

• Platforms: Firefox OS, Windows 8, Phonegap

• Libraries: jQuery, Dojo, YUI, GWT, Node modules, etc.

ECMAScript

• Language Specification

• Developed by Ecma TC 39

• Language syntax and semantics

• Core API: Object, String, Array, RegExp, ...

• Edition 5.1 current

• Edition 6 expected December 2014

ECMAScript

• Internationalization API Specification

• Developed by Ecma TC 39 + experts

• API: Collator, NumberFormat, DateTimeFormat

• Edition 1 approved December 2012

• Chrome, Opera, Explorer, Windows shipped; Firefox, Node.js coming

• Edition 2 expected December 2014

Unicode

Unicode support

• All text in UTF-16 internally

• UTF-8 well supported for transport

• Need to identify charset in <script> tags, Content-Type headers

• Need to use encodeURIComponent for path and query string components
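
As a small illustration of the last bullet, the sketch below builds a query string with encodeURIComponent. The host name and parameter names are made up for the example; only encodeURIComponent and decodeURIComponent are standard functions.

```javascript
// Hypothetical search URL; the host and parameter names are illustrative only.
var city = "北京";
var query = "q=" + encodeURIComponent(city) +
            "&note=" + encodeURIComponent("a&b=c");
var searchUrl = "https://example.com/search?" + query;

// Non-ASCII text becomes percent-encoded UTF-8 bytes, and the
// delimiters "&" and "=" inside values are escaped as well.
console.log(searchUrl);

// Decoding restores the original text.
console.log(decodeURIComponent("%E5%8C%97%E4%BA%AC")); // "北京"
```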

Occupy Wall Street. By @tanlines.

Supplementary characters

• Characters above U+FFFF

• Emoji, rare CJK, ancient scripts, musical symbols, ...

• 2 code units in UTF-16
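
To make the "2 code units" point concrete, here is a minimal sketch using one emoji (U+1F600), written with escapes so the surrogate pair is visible.

```javascript
// U+1F600 is stored as the surrogate pair D83D DE00.
var grin = "\uD83D\uDE00";

console.log(grin.length);                     // 2
console.log(grin.charCodeAt(0).toString(16)); // "d83d" (high surrogate)
console.log(grin.charCodeAt(1).toString(16)); // "de00" (low surrogate)

// Code-unit operations can cut the pair in half,
// leaving a lone surrogate that is not a valid character.
var half = grin.substring(0, 1);
console.log(half.length);                     // 1
```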

Today: UCS-2 or UTF-16?

UCS-2:

• Regular expressions

• String comparison

• Case conversion

UTF-16:

• Source text conversion

• URI handling

• DOM, text input, text rendering, XMLHttpRequest

ECMAScript 6: UTF-16

• Case conversion for full Unicode

• Full Unicode in identifiers

• String accessors for code points

• But: no change to low-level string comparison

• Planned: New Unicode mode in regular expressions
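
The sketch below shows what the code point accessors and full-Unicode case conversion look like in an engine that implements ECMAScript 6. It assumes such an engine; the sample characters are chosen only for illustration.

```javascript
// "a" followed by U+1F600 (stored as a surrogate pair).
var sample = "a\uD83D\uDE00";

// codePointAt reads a whole code point starting at a code unit index.
var codePoint = sample.codePointAt(1);          // 0x1F600

// String.fromCodePoint builds the surrogate pair from a code point.
var built = String.fromCodePoint(0x1F600);      // "\uD83D\uDE00"

// String iteration walks code points, not code units.
var characters = Array.from(sample);            // 2 elements; sample.length is 3

// Case conversion covers supplementary characters:
// U+10428 (Deseret small letter) maps to U+10400 (capital).
var upper = "\uD801\uDC28".toUpperCase();       // "\uD801\uDC00"

console.log(codePoint.toString(16), characters.length, upper === "\uD801\uDC00");
```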

Regular expressions

• RegExp in ES5 doesn’t have much Unicode support

• No support for Unicode character properties

• No support for supplementary characters
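
A short sketch of the supplementary-character limitation. The second half uses the `u` flag, the Unicode mode for regular expressions that was standardized in ECMAScript 2015 (ES6); it is not available in ES5 engines.

```javascript
var smiley = "\uD83D\uDE00"; // U+1F600

// Without the u flag, "." matches a single UTF-16 code unit,
// so one supplementary character does not match /^.$/.
var es5Match = /^.$/.test(smiley);        // false
var es5TwoUnits = /^..$/.test(smiley);    // true

// With the u flag, patterns operate on code points.
var unicodeMatch = /^.$/u.test(smiley);   // true
var inRange = /^[\u{1F600}-\u{1F64F}]$/u.test(smiley); // true

console.log(es5Match, es5TwoUnits, unicodeMatch, inRange);
```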

Regular expressions

• CSet (inimino): Character classes with supplementary characters

• XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters

Unicode normalization

• Makes strings be equal that users perceive as equal (more or less)

• ä = a + ¨ (U+00E4 = U+0061 U+0308)

• 김 = ᄀ + ᅵ + ᆷ (U+AE40 = U+1100 U+1175 U+11B7)

Unicode normalization

• ECMAScript 5 “assumes” normalization happens where needed

• Reality: applications have to do it

• ECMAScript 6: String.prototype.normalize

• "김".normalize("NFD") → "\u1100\u1175\u11B7"

• Libraries available, but not up to date:

• unorm (Matsuza)

• Richard Ishida’s normalizer
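
A minimal sketch of why applications need normalization, assuming an engine that implements String.prototype.normalize. It reuses the two example characters from the slides.

```javascript
var composed = "\u00E4";     // ä as a single code point
var decomposed = "a\u0308";  // a followed by combining diaeresis

// The two strings look identical but compare as different.
console.log(composed === decomposed);                  // false

// Normalizing both sides to the same form makes them equal.
console.log(composed.normalize("NFD") === decomposed); // true
console.log(decomposed.normalize("NFC") === composed); // true

// Hangul syllables decompose into their jamo.
console.log("\uAE40".normalize("NFD") === "\u1100\u1175\u11B7"); // true
```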

北京大学.中国

Internationalized domain names

• Unicode at user interface

• ASCII under the hood

• 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s

• Main steps:

• normalization (as discussed)

• punycode (Mathias Bynens has latest)
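
One way to observe the "ASCII under the hood" form without a separate library is the WHATWG URL API, which is not mentioned in the slides and is assumed here to be available (current browsers and Node.js provide it). It applies its own mapping and Punycode steps when parsing a host name.

```javascript
// Parsing a URL converts an internationalized host name to its ASCII form.
var asciiHost = new URL("http://北京大学.中国/").hostname;

// Every label is now ASCII and carries the "xn--" prefix.
var labels = asciiHost.split(".");
console.log(asciiHost);
console.log(labels.every(function (label) {
  return label.indexOf("xn--") === 0;
}));
```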

Collation

Collation (sorting)

• Old: String.prototype.localeCompare

• Only string argument

• New: Intl.Collator

• locales

• options

• Fixed: String.prototype.localeCompare

• With locales and options arguments
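
A brief sketch of the new API, assuming an implementation with German and Swedish collation data. The same three strings sort differently because ä follows a in German but comes after z in Swedish.

```javascript
var letters = ["z", "a", "ä"];

// Intl.Collator: create once, reuse the bound compare function for sorting.
var germanOrder = letters.slice().sort(new Intl.Collator("de").compare);
var swedishOrder = letters.slice().sort(new Intl.Collator("sv").compare);

console.log(germanOrder.join(" "));  // a ä z
console.log(swedishOrder.join(" ")); // a z ä

// The fixed localeCompare accepts the same locales (and options) arguments.
console.log("ä".localeCompare("z", "de") < 0); // true
console.log("ä".localeCompare("z", "sv") > 0); // true
```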

Locales

• BCP 47 language tags

• Language, script, country codes

• “es”, “en-AU”, “zh-Hans-CN”

• Unicode locale extension

• “de-u-co-phonebk”

• Preference lists

• [“mr”, “hi”, “en-IN”]

Locale negotiation

• BCP 47 Lookup

• [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”

• Best fit

• implementation defined

• [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”

• Unicode extension handled separately
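
The candidate order shown for Lookup can be sketched as follows. The function names are hypothetical, and the truncation rule follows the Lookup scheme of RFC 4647 (the matching part of BCP 47): drop the last subtag, and also drop a single-character subtag left at the end. This is an illustration, not how any particular engine implements it.

```javascript
// Fallback chain for a single language tag.
function fallbackChain(tag) {
  var chain = [];
  var subtags = tag.split("-");
  while (subtags.length > 0) {
    chain.push(subtags.join("-"));
    subtags.pop();
    // A trailing singleton such as "u" or "x" is removed together with its content.
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags.pop();
    }
  }
  return chain;
}

// Candidate order for a whole preference list, without duplicates.
function lookupCandidates(preferences) {
  var result = [];
  preferences.forEach(function (tag) {
    fallbackChain(tag).forEach(function (candidate) {
      if (result.indexOf(candidate) === -1) {
        result.push(candidate);
      }
    });
  });
  return result;
}

console.log(lookupCandidates(["es-GT", "es-MX"])); // [ 'es-GT', 'es', 'es-MX' ]
console.log(fallbackChain("zh-Hans-CN"));          // [ 'zh-Hans-CN', 'zh-Hans', 'zh' ]
```

In application code, `Intl.Collator.supportedLocalesOf(list, {localeMatcher: "lookup"})` and `resolvedOptions().locale` report what an implementation actually selected; the results depend on the locale data it ships.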

Collator extensions

• co: collation – phonebook, pinyin, ...

• kf: case first – upper, lower

• kn: numeric sorting
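
A hedged sketch of the extension keys embedded in the locale tag. Which values are honored depends on the implementation and its data, so the collation value in particular is only reported, not assumed.

```javascript
// kn: compare digit sequences by numeric value.
var numericCollator = new Intl.Collator("en-u-kn-true");
var byNumber = ["10", "2", "1"].sort(numericCollator.compare);
console.log(byNumber.join(" ")); // 1 2 10

// kf: put uppercase before lowercase when strings otherwise tie.
var upperFirst = new Intl.Collator("en-u-kf-upper");
var byCase = ["a", "A"].sort(upperFirst.compare);
console.log(byCase.join(" ")); // A a

// co: request a collation variant; resolvedOptions shows what was granted.
var phonebook = new Intl.Collator("de-u-co-phonebk");
console.log(phonebook.resolvedOptions().collation);
```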

Collator options

• localeMatcher: lookup, best fit

• usage: sort, search

• sensitivity: base, accent, case, variant

• ignorePunctuation

• numeric, caseFirst
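
The options argument covers the same ground as the extension keys and adds more. A minimal sketch, assuming English collation data is available; the sample words are arbitrary.

```javascript
// sensitivity "base": ignore both accents and case.
var baseOnly = new Intl.Collator("en", { sensitivity: "base" });
console.log(baseOnly.compare("resume", "Résumé")); // 0

// sensitivity "accent": accents matter, case does not.
var accentOnly = new Intl.Collator("en", { sensitivity: "accent" });
console.log(accentOnly.compare("a", "A"));         // 0
console.log(accentOnly.compare("a", "á") !== 0);   // true

// ignorePunctuation: punctuation does not affect the comparison.
var noPunctuation = new Intl.Collator("en", { ignorePunctuation: true });
console.log(noPunctuation.compare("co-op", "coop")); // 0

// numeric as an option instead of the -u-kn extension.
var numericOption = new Intl.Collator("en", { numeric: true });
console.log(numericOption.compare("2", "10") < 0);   // true
```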

Non-ECMAScript

• Nothing good found (some for Latin only)

• Collation is hard

• Knowledge of full Unicode character set

• Big tables

• Send lists that need alphabetic sorting to server

Number formatting

Number formatting

• Old: Number.prototype.toLocaleString

• No arguments

• New: Intl.NumberFormat

• locales

• options

• Fixed: Number.prototype.toLocaleString

• With locales and options arguments
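
A short sketch of the difference the locales argument makes, assuming English and German data are available.

```javascript
var amount = 1234567.891;

// New: an explicit formatter per locale.
var usText = new Intl.NumberFormat("en-US").format(amount);
var germanText = new Intl.NumberFormat("de-DE").format(amount);
console.log(usText);     // 1,234,567.891
console.log(germanText); // 1.234.567,891

// Fixed: toLocaleString accepts the same arguments.
console.log(amount.toLocaleString("de-DE") === germanText); // true
```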

NumberFormat extensions

• nu: numbering system
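
As an illustration of the nu key, the sketch below requests Thai digits, assuming the implementation supports the "thai" numbering system.

```javascript
var thaiFormat = new Intl.NumberFormat("th-u-nu-thai");

console.log(thaiFormat.format(123));                      // ๑๒๓
console.log(thaiFormat.resolvedOptions().numberingSystem); // thai
```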

NumberFormat options

• localeMatcher: lookup, best fit

• style: decimal, currency, percent

• currency: ISO 4217 currency code

• currencyDisplay: symbol, code, name

• minimum/maximum digits

• useGrouping
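
A sketch of several of these options with the en-US locale. The "minimum/maximum digits" bullet is shown with the fraction-digit properties; the values are arbitrary.

```javascript
// style + currency: the ISO 4217 code selects symbol and decimal places.
var price = new Intl.NumberFormat("en-US", {
  style: "currency", currency: "USD"
}).format(1234.5);
console.log(price); // $1,234.50

// currencyDisplay "name" spells the currency out.
var named = new Intl.NumberFormat("en-US", {
  style: "currency", currency: "USD", currencyDisplay: "name"
}).format(1234.5);
console.log(named);

// style "percent" multiplies by 100.
var share = new Intl.NumberFormat("en-US", { style: "percent" }).format(0.25);
console.log(share); // 25%

// Digit settings: exactly two fraction digits.
var fixed = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2, maximumFractionDigits: 2
}).format(3.14159);
console.log(fixed); // 3.14

// useGrouping false removes grouping separators.
var ungrouped = new Intl.NumberFormat("en-US", { useGrouping: false }).format(1234567);
console.log(ungrouped); // 1234567
```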

Non-ECMAScript

Library      ¤    %    ๙    #    ,    ⚑
Globalize    +    +    -    +    -    250+
Dojo         +    +    -    +    -    30+
Closure      +    +    +    +    +    300+
Windows 8    +    +    +    +    +    100s
iLib         +    +    -    +    -    10+

¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.

Date and time formatting

Date and time formatting

• Old: Date.prototype.toLocale[|Date|Time]String

• No arguments

• New: Intl.DateTimeFormat

• locales

• options

• Fixed: Date.prototype.toLocale[|Date|Time]String

• With locales and options arguments
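
A minimal sketch of locale-dependent date formatting. The time zone is fixed to UTC so the result does not depend on where the code runs; the date is arbitrary.

```javascript
var october1 = new Date(Date.UTC(2013, 9, 1)); // 1 October 2013, 00:00 UTC
var utc = { timeZone: "UTC" };

// New: an explicit formatter per locale.
var usDate = new Intl.DateTimeFormat("en-US", utc).format(october1);
var germanDate = new Intl.DateTimeFormat("de-DE", utc).format(october1);
console.log(usDate);     // 10/1/2013
console.log(germanDate); // 1.10.2013

// Fixed: toLocaleDateString accepts the same arguments.
var britishDate = october1.toLocaleDateString("en-GB", utc);
console.log(britishDate); // 01/10/2013
```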

DateTimeFormat extensions

• ca: calendar

• nu: numbering system
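
Both keys can be sketched with the year alone, assuming the implementation supports the Buddhist calendar and Thai digits.

```javascript
var day = new Date(Date.UTC(2013, 9, 1));

// ca: the Buddhist calendar counts years 543 ahead of the Gregorian calendar.
var buddhist = new Intl.DateTimeFormat("en-u-ca-buddhist", {
  year: "numeric", timeZone: "UTC"
});
console.log(buddhist.resolvedOptions().calendar); // buddhist
console.log(buddhist.format(day));                // contains 2556

// nu: the same Gregorian year rendered with Thai digits.
var thaiYear = new Intl.DateTimeFormat("en-u-nu-thai", {
  year: "numeric", timeZone: "UTC"
}).format(day);
console.log(thaiYear); // ๒๐๑๓
```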

DateTimeFormat options

• localeMatcher: lookup, best fit

• timeZone: UTC

• hour12

• weekday, era, year, month, day, hour, minute, second, timeZoneName: components

• formatMatcher: basic, best fit
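
The component options describe which fields to show; the locale decides their order and punctuation. A sketch with en-US and an arbitrary date, pinned to UTC:

```javascript
var moment = new Date(Date.UTC(2013, 9, 1, 15, 30)); // 1 October 2013, 15:30 UTC

// Request components; the formatter chooses a matching locale pattern.
var fullDate = new Intl.DateTimeFormat("en-US", {
  weekday: "long", year: "numeric", month: "long", day: "numeric",
  timeZone: "UTC"
}).format(moment);
console.log(fullDate); // Tuesday, October 1, 2013

// hour12 false forces a 24-hour clock.
var clock = new Intl.DateTimeFormat("en-US", {
  hour: "2-digit", minute: "2-digit", hour12: false, timeZone: "UTC"
}).format(moment);
console.log(clock); // 15:30
```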

Non-ECMAScript

Library      ca    tz    ๙    ⚑
Globalize    5+    +     -    250+
Dojo         4     -     -    30+
Closure      +     +     +    300+
Windows 8    ?     -     ?    ?
Moment       -     -     -    50
YUI          -     -     -    50+

ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.

Resources

Localizable resources

• ECMAScript doesn’t have an I/O system, therefore no standard resource loading

• Developers have invented many different mechanisms

Representation

• At runtime, use object as key-value map

• JSON good for transfer; not as source

• Localizers need comments

• Existing source formats have problems

• Java properties files are not UTF-8

• gettext .po file encoding is unspecified

• YUI uses its own YRB format

How dynamic would you like it?

• Making resources available to application

• Inject strings/objects into JavaScript code

• Bundle resources with JavaScript code

• Load resource bundles at runtime

Injecting strings into JavaScript

• Server knows locale when generating JavaScript and HTML

• Inject strings or objects directly where needed

• E.g., JavaServer Pages Standard Tag Library:

<fmt:bundle basename="Messages">
  <script>
    alert("<fmt:message key='HELLO'/>");
  </script>
</fmt:bundle>

Injecting strings into JavaScript

• Problems:

• Mixes multiple programming languages

• Can introduce syntax errors through localization
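
The syntax-error risk can be reproduced in a few lines. The translated string and the helper name are hypothetical; `new Function` is used only to check whether the generated source parses, and nothing is executed. Serializing the string with JSON.stringify is one possible mitigation, not something the slides prescribe.

```javascript
// A hypothetical translation that happens to contain double quotes.
var translated = 'Dites "bonjour"';

// Returns true if the source text parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiles only; the body is never called
    return true;
  } catch (e) {
    return false;
  }
}

// Naive injection: the quotes in the translation end the string literal early.
var naiveSource = 'alert("' + translated + '");';
console.log(parses(naiveSource)); // false

// Escaping through JSON.stringify yields a valid string literal.
var escapedSource = "alert(" + JSON.stringify(translated) + ");";
console.log(parses(escapedSource)); // true
```

When the generated code is embedded in an HTML page, sequences such as `</script>` inside a translation need additional escaping that this sketch does not cover.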

Bundling resources with JavaScript

• Server knows locale when serving JS

• Bundles resources with JavaScript

• Convert to JavaScript/JSON

• Concatenate with other JavaScript

• Resources:

var MyResources = {HELLO: "안녕하세요"};

• Code:

alert(MyResources.HELLO);
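
A conversion step might also fill in keys that a translation does not yet cover. The sketch below merges a localized bundle over a default one before it is concatenated with the application code; the bundle contents and the function name are assumptions for illustration.

```javascript
// Hypothetical source bundles; the Korean one is missing BYE.
var defaultBundle = { HELLO: "Hello", BYE: "Goodbye" };
var koreanBundle = { HELLO: "안녕하세요" };

// Copy the default strings first, then overwrite with localized ones.
function mergeBundles(base, localized) {
  var merged = {};
  [base, localized].forEach(function (bundle) {
    Object.keys(bundle).forEach(function (key) {
      merged[key] = bundle[key];
    });
  });
  return merged;
}

// What the server would emit for a Korean user.
var MyResources = mergeBundles(defaultBundle, koreanBundle);
console.log(MyResources.HELLO); // 안녕하세요
console.log(MyResources.BYE);   // Goodbye
```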

Loading resources at runtime

• Locale not known until runtime

• Request resources at runtime

• Using XMLHttpRequest

• By creating script tag

• Resources in JSON or module format

Loading resources at runtime

• Cross-domain support?

• Not with XMLHttpRequest

• Possible with script tag

• Synchronous access?

• More convenient programming model

• Can lock up browser

• BCP 47 support?

• Many loaders assume aa-AA format

Access to resources in libraries

• Dojo

• Loading at runtime, synchronous

• GWT

• Injecting resources (Constant)

• Bundling resources (with HTML, Dictionary)

• YUI

• Bundling resources via module loader

Message construction

Photo © Den Widhana

Message construction

• Substitution

• {user} went to {city}.

• {user}さんは{city}へ行きました。
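
Named placeholders let each translation place the arguments where its grammar needs them. A minimal sketch of such substitution follows; the function name and the sample values are made up.

```javascript
// Replace {name} placeholders with values from args; unknown names are left as-is.
function formatMessage(pattern, args) {
  return pattern.replace(/\{(\w+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(args, name)
      ? String(args[name])
      : match;
  });
}

var trip = { user: "Aiko", city: "Kyoto" };
console.log(formatMessage("{user} went to {city}.", trip));
// Aiko went to Kyoto.
console.log(formatMessage("{user}さんは{city}へ行きました。", trip));
// AikoさんはKyotoへ行きました。
```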

Message construction

• Plurals

• {user} est allé à {city}.

• {user1} et {user2} sont allés à {city}.

• 1-6 forms depending on language

• {number, plural {one {...} few {...} many {...}}}
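
Choosing among the plural forms requires per-language rules. The sketch below uses Intl.PluralRules, an API added to the ECMAScript Internationalization API after this talk, and assumes an engine that provides it along with data for the locales shown. The message table and function name are hypothetical.

```javascript
// Plural categories differ by language.
console.log(new Intl.PluralRules("en").select(1));  // one
console.log(new Intl.PluralRules("en").select(2));  // other
console.log(new Intl.PluralRules("ru").select(2));  // few
console.log(new Intl.PluralRules("ru").select(5));  // many
console.log(new Intl.PluralRules("ar").select(0));  // zero
console.log(new Intl.PluralRules("ja").select(1));  // other

// Pick a pattern by category, falling back to "other".
function pluralize(locale, count, forms) {
  var category = new Intl.PluralRules(locale).select(count);
  var pattern = forms[category] || forms.other;
  return pattern.replace("{count}", String(count));
}

var fileForms = { one: "{count} file", other: "{count} files" };
console.log(pluralize("en", 1, fileForms)); // 1 file
console.log(pluralize("en", 3, fileForms)); // 3 files
```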

Message construction

• Gender

• {user} est allé à {city}.

• {user} est allée à {city}.

• 1-4 forms depending on language

• {gender, select {female {...} male {...} unknown {...}}}

Message construction

{gender, select {
  female {num, plural {
    one {{user1} est allée à {city}.}
    other {{user1} et {user2} sont allées à {city}.}}}
  male {num, plural {
    one {{user1} est allé à {city}.}
    other {{user1} et {user2} sont allés à {city}.}}}
}}
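
The nested selection can be approximated with a plain lookup table: first by gender, then by plural category. This is a hand-rolled sketch, not the MessageFormat syntax itself; the function name, the sample names, and the choice of masculine forms as the fallback are assumptions, and Intl.PluralRules (a later addition to the Internationalization API) is assumed to be available with French data.

```javascript
// The French patterns from the message above, keyed by gender and plural category.
var patterns = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}."
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}."
  }
};

function buildMessage(gender, num, args) {
  // Assumption: unknown gender falls back to the masculine forms.
  var byGender = patterns[gender] || patterns.male;
  var category = new Intl.PluralRules("fr").select(num);
  var pattern = byGender[category] || byGender.other;
  return pattern.replace(/\{(\w+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(args, name)
      ? String(args[name])
      : match;
  });
}

console.log(buildMessage("female", 2, { user1: "Anne", user2: "Marie", city: "Paris" }));
// Anne et Marie sont allées à Paris.
console.log(buildMessage("male", 1, { user1: "Luc", city: "Lyon" }));
// Luc est allé à Lyon.
```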

Message construction

• Google has MessageFormat for Closure environment

• Alex Sexton provided standalone version

• Mozilla has even more ambitious L20n library

Summary

• ECMAScript Internationalization API provides core functionality

• http://norbertlindenberg.com/2012/12/ecmascript-internationalization-api/

• Libraries provide more internationalization support than you may think

• http://norbertlindenberg.com/2013/10/javascript-internationalization/
