Language / Locale IDs
M. Davis, IBM; A. Phillips, webMethods

Upload: kory-wiggins

Post on 29-Dec-2015


TRANSCRIPT

Page 1: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Language / Locale IDs

M. Davis, IBM

A. Phillips, webMethods

Page 2: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Language

"A shprakh iz a diyalekt mit an armey un a flot" ("A language is a dialect with an army and a navy")
Max Weinreich (Joshua Fishman), 1945.

the written form is the most important for computers

does include “culturally-specific” formatting (as we’ll see later)

does not include currency, time-zone, seat-assignment, etc.
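As a rough illustration of the “culturally-specific” formatting mentioned above, the sketch below renders one date in the month-first and day-first short forms that appear in the later "Which are English?" slide. The tags en-US and en-GB, the DATE_ORDER table, and the format_short_date helper are assumptions made for this example and are not part of the slides; a real system would presumably take such conventions from locale data rather than a hand-written table.

```python
from datetime import date

# Hypothetical table: field order of the short date form per language tag.
# The two entries are illustrative assumptions, not data from the slides.
DATE_ORDER = {
    "en-US": ("month", "day", "year"),
    "en-GB": ("day", "month", "year"),
}


def format_short_date(d, tag):
    """Format a date as slash-separated numbers in the order used by `tag`."""
    fields = {"day": d.day, "month": d.month, "year": d.year}
    return "/".join(str(fields[name]) for name in DATE_ORDER[tag])


if __name__ == "__main__":
    d = date(2003, 3, 20)
    print(format_short_date(d, "en-US"))  # 3/20/2003
    print(format_short_date(d, "en-GB"))  # 20/3/2003
```

The text is English in both cases; only the formatting convention selected by the tag differs.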

Page 3: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Language Tags: Two Needs

Identification
  Announce that this text is American, Northern Californian, Casual, PG-13 English

Filtering/Matching
  Accept Any English, Any French, Swiss German,…
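The filtering/matching need can be sketched with the prefix rule that RFC 3066 describes for language ranges: a range matches a tag if the two are equal, or if the range is a prefix of the tag and the next character in the tag is "-". The function names range_matches and accepts, and the range list ["en", "fr", "de-CH"] standing in for "Any English, Any French, Swiss German", are assumptions for illustration only.

```python
def range_matches(language_range, tag):
    """Return True if `language_range` matches `tag` by prefix matching.

    Comparison is case-insensitive. "*" is treated as matching any tag.
    """
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def accepts(ranges, tag):
    """Return True if any range in `ranges` matches `tag`."""
    return any(range_matches(r, tag) for r in ranges)


if __name__ == "__main__":
    wanted = ["en", "fr", "de-CH"]  # any English, any French, Swiss German
    for candidate in ["en-US", "fr", "de-CH", "de-DE", "eng"]:
        print(candidate, accepts(wanted, candidate))
```

Requiring the "-" after the prefix is what keeps the range "en" from matching an unrelated tag such as "eng".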

Page 4: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Background

RFC 1766
RFC 3066
Used in XML, HTML,…
Used both as language ID and locale ID (narrow sense)

Page 5: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

RFC 3066bis

Successor to 3066
For use in XML, HTML, Java, …
Addresses limitations of 3066
First Draft: 2003/10
Latest Draft: 2004/2
  http://www.ietf.org/internet-drafts/draft-phillips-langtags-01.txt
Final Draft: 2004/5??

Page 6: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Main Goals

Maintain backward compatibility (so that all previous codes would remain valid)

Reduce the need for large numbers of registrations

Provide a more formal structure to allow parsing into subtags even where software does not have the latest registrations

Provide stability in the face of potential instability in ISO 639, 3166, and 15924 codes (demonstrated instability in the case of ISO 3166)

Allow for external extension mechanisms.

Page 7: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Expressiveness

Allows ISO15924 script code subtags and allows them to be used generatively.

Adds the concept of a variant subtag and allows variants to be used generatively.

Allows use of UN M49 codes:
  es-419 = "Spanish, Latin America"

Changes the IANA language tag registry to a language subtag registry

Page 8: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Stability

Allows backward/forward compatible parsing

Defines a process for handling reuse of values by ISO639, ISO15924, and ISO3166 in the event that they register a previously used value for a new purpose.

Page 9: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Private Use & Extensions

Adds an extension mechanism which does not require registration to use.

Defines the private use tags in ISO639, ISO15924, and ISO3166 as the mechanism for creating private use language, script, and region subtags respectively

Defines a syntax for private use variant subtags which can be used without registration.
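A minimal sketch of recognizing private use language, script, and region subtags follows. The slides only show single examples (qaa, Qaaa, QM); the full ranges used here are assumptions based on the private use and user-assigned ranges of the ISO standards (qaa to qtz in ISO 639-2, Qaaa to Qabx in ISO 15924, and AA, QM to QZ, XA to XZ, ZZ in ISO 3166-1). The function names are hypothetical.

```python
def _is_ascii_alpha(s, length):
    """True if `s` is exactly `length` ASCII letters."""
    return len(s) == length and s.isascii() and s.isalpha()


def is_private_language(subtag):
    """ISO 639-2 local-use range, assumed here to be qaa..qtz."""
    s = subtag.lower()
    return _is_ascii_alpha(s, 3) and "qaa" <= s <= "qtz"


def is_private_script(subtag):
    """ISO 15924 private-use range, assumed here to be Qaaa..Qabx."""
    s = subtag.lower()
    return _is_ascii_alpha(s, 4) and "qaaa" <= s <= "qabx"


def is_private_region(subtag):
    """ISO 3166-1 user-assigned codes, assumed: AA, QM..QZ, XA..XZ, ZZ."""
    s = subtag.upper()
    if not _is_ascii_alpha(s, 2):
        return False
    return s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ"


if __name__ == "__main__":
    print(is_private_language("qaa"), is_private_script("Qaaa"), is_private_region("QM"))
    print(is_private_language("de"), is_private_script("Latn"), is_private_region("DE"))
```

Because the subtags within each check have a fixed length and a single case after normalization, plain string comparison is enough to test the ranges.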

Page 10: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Structure (Bizarro BNF)

tag             = lang *["-s-" extlang] ["-" script] ["-" region]
                  *["-" variant] ["-x" extensions]
                =/ "x" extensions              ; private use
                =/ grandfathered-registrations

lang            = 2*3 ALPHA                    ; shortest ISO 639
                =/ registered-lang

registered-lang = 5*15 alphanum

Page 11: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Structure II

script     = 4 ALPHA          ; ISO 15924

region     = 2 ALPHA          ; ISO 3166
           =/ 3 DIGIT         ; UN country #

variant    = 5*15 alphanum

extensions = 1* ("-" value)

value      = 1*31 alphanum
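The grammar on the two Structure slides can be sketched as a regular expression, since each subtag type has a distinct length or character class. This is an illustration under stated assumptions, not the draft's reference parser: the slides do not define extlang, so it is assumed here to be 3 ALPHA (as in the examples "min" and "nan"); grandfathered registrations are not handled; and the name parse_tag and the shape of the returned dictionary are hypothetical.

```python
import re

# One alternative of `tag`: lang, extlangs, script, region, variants, extensions.
_TAG = re.compile(r"""
    (?P<lang>[A-Za-z]{2,3}|[A-Za-z0-9]{5,15})        # lang or registered-lang
    (?P<extlang>(?:-s-[A-Za-z]{3})*)                 # extlang shape is an assumption
    (?:-(?P<script>[A-Za-z]{4}))?                    # 4 ALPHA
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?           # 2 ALPHA or 3 DIGIT
    (?P<variants>(?:-[A-Za-z0-9]{5,15})*)            # zero or more variants
    (?:-x(?P<extensions>(?:-[A-Za-z0-9]{1,31})+))?   # "-x" then one or more values
""", re.VERBOSE)

# The private use alternative: "x" followed by one or more values.
_PRIVATE = re.compile(r"x(?:-[A-Za-z0-9]{1,31})+")


def parse_tag(tag):
    """Split `tag` into subtags, or return None if it does not fit the grammar."""
    if _PRIVATE.fullmatch(tag):
        return {"private_use": tag.split("-")[1:]}
    m = _TAG.fullmatch(tag)
    if m is None:
        return None
    return {
        "lang": m["lang"],
        "extlang": m["extlang"].split("-s-")[1:],
        "script": m["script"],
        "region": m["region"],
        "variants": m["variants"].split("-")[1:],
        "extensions": (m["extensions"] or "").split("-")[1:],
    }


if __name__ == "__main__":
    for t in ["zh-s-min-s-nan-Hant-CN", "en-Latn-US-boont", "x-valley-girl", "de-891-DE"]:
        print(t, parse_tag(t))
```

Under these assumptions the parser needs no registry lookup: it accepts the valid examples on the Examples slides and returns None for the three tags listed as invalid (de-891-DE, a-DE, zh-xsouthern-DE).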

Page 12: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Examples I

Simple language code:
  de (German)
  fr (French)
  ja (Japanese)

Language code plus Script code:
  zh-Hant (Traditional Chinese)
  en-Latn (English written in Latin script)
  sr-Cyrl (Serbian written with Cyrillic script)

Language-Region:
  de-DE (German for Germany)
  zh-SG (Chinese for Singapore)
  cs-CS (Czech for Czechoslovakia)
  sr-891 (Serbian for Serbia and Montenegro)

Page 13: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Examples II

Language-Script-Region:
  zh-Hans-CN (Simplified Chinese for the PRC)
  sr-Latn-891 (Serbian, Latin script, Serbia & Monte.)

Language-Script-Region-Variant:
  en-Latn-US-boont (Boontling dialect of English)

Other Mixtures:
  zh-CN (Chinese for the PRC)
  en-boont (Boontling dialect of English)

Page 14: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Examples III

Extension mechanism:
  x-valley-girl
  de-CH-x-phonebook
  az-Arab-x-AZE-derbend

Extended language subtags:
  zh-s-min
  zh-s-min-s-nan-Hant-CN

Private Use tags:
  qaa-Qaaa-QM-xsouthern (all private tags)
  de-Qaaa (German, with a private script)
  de-Latn-QM (German, Latin-script, private region)
  de-Qaaa-DE (German, private script, for Germany)

Page 15: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Examples IV

Some Invalid Tags:
  de-891-DE (two region tags)
  a-DE (use of a single character tag)
  zh-xsouthern-DE (private use variant followed by another tag)

Page 16: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Locale

different interpretations
  narrow = language
  broad = any user-preferences

[Diagram labels: user preferences, language]

Page 17: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Language vs Locale

Page 18: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Which are English?

"Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."

"Theater Center News: The date of the last version of this document was 3/20/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."

"Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."

Page 19: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Summary

Improved version of 3066
Used for language and locale (in narrow sense)
Addresses Issues
  Script Distinctions
  Parseability
  Extensions
  …

Page 20: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

References

Latest Public Draft
  http://www.ietf.org/internet-drafts/draft-phillips-langtags-01.txt

Working Draft
  http://www.inter-locale.com/ID/draft-phillips-langtags-02.html (HTML version)

Language Code Issues (+ Locales)
  http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html

Page 21: Language / Locale IDs M. Davis, IBM A. Phillips, webMethods

Q&A