Language / Locale IDs
  M. Davis, IBM
  A. Phillips, webMethods
Language
  "A shprakh iz a diyalekt mit an armey un a flot" ("A language is a dialect with an army and a navy")
    Max Weinreich (Joshua Fishman), 1945.
  The written form is the most important for computers
  Does include “culturally-specific” formatting (as we’ll see later)
  Does not include currency, time-zone, seat-assignment, etc.
Language Tags: Two Needs
  Identification
    Announce that this text is American, Northern Californian, Casual, PG-13 English
  Filtering/Matching
    Accept any English, any French, Swiss German, …
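The filtering need above can be sketched as a prefix match over whole subtags. This is a minimal Python sketch under stated assumptions, not the matching procedure of any RFC or draft; the names `matches`, `accept`, and `ACCEPTED_RANGES` are hypothetical and chosen only for illustration.

```python
def matches(language_range, tag):
    """Return True if `tag` starts with all subtags of `language_range`.

    Comparison is case-insensitive and works on whole subtags, so the
    range "en" matches "en-US" but not a hypothetical tag "eng".
    """
    wanted = language_range.lower().split("-")
    actual = tag.lower().split("-")
    return actual[:len(wanted)] == wanted


def accept(ranges, tag):
    """Return True if `tag` matches at least one of the accepted ranges."""
    return any(matches(r, tag) for r in ranges)


# Hypothetical preference list: any English, any French, Swiss German.
ACCEPTED_RANGES = ["en", "fr", "de-CH"]

if __name__ == "__main__":
    for tag in ["en-US", "fr", "de-CH-x-phonebook", "de-DE"]:
        print(tag, accept(ACCEPTED_RANGES, tag))
```

One caveat of plain prefix matching: a range such as de-CH does not match de-Latn-CH, because the script subtag sits between the language and the region.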
Background
  RFC 1766
  RFC 3066
  Used in XML, HTML, …
  Used both as language ID and locale ID (narrow sense)
RFC 3066bis
  Successor to 3066
  For use in XML, HTML, Java, …
  Addresses limitations of 3066
  First Draft: 2003/10
  Latest Draft: 2004/2
    http://www.ietf.org/internet-drafts/draft-phillips-langtags-01.txt
  Final Draft: 2004/5??
Main Goals
  Maintain backward compatibility (so that all previous codes would remain valid)
  Reduce the need for large numbers of registrations
  Provide a more formal structure to allow parsing into subtags even where software does not have the latest registrations
  Provide stability in the face of potential instability in ISO 639, 3166, and 15924 codes (demonstrated instability in the case of ISO 3166)
  Allow for external extension mechanisms.
Expressiveness
  Allows ISO 15924 script code subtags and allows them to be used generatively.
  Adds the concept of a variant subtag and allows variants to be used generatively.
  Allows use of UN M49 codes:
    es-419 = “Spanish, Latin America”
  Changes the IANA language tag registry to a language subtag registry
Stability
  Allows backward/forward compatible parsing
  Defines a process for handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in the event that they register a previously used value for a new purpose.
Private Use & Extensions
  Adds an extension mechanism which does not require registration to use.
  Defines the private use tags in ISO 639, ISO 15924, and ISO 3166 as the mechanism for creating private use language, script, and region subtags respectively
  Defines a syntax for private use variant subtags which can be used without registration.
Structure (Bizarro BNF)
  tag             = lang *["-s-" extlang] ["-" script] ["-" region]
                    *["-" variant] ["-x" extensions]
                  =/ "x" extensions             ; private use
                  =/ grandfathered-registrations
  lang            = 2*3 ALPHA                   ; shortest ISO 639
                  =/ registered-lang
  registered-lang = 5*15 alphanum
Structure II
  script          = 4 ALPHA                     ; ISO 15924
  region          = 2 ALPHA                     ; ISO 3166
                  =/ 3 DIGIT                    ; UN country #
  variant         = 5*15 alphanum
  extensions      = 1*("-" value)
  value           = 1*31 alphanum
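Because each subtag type has a distinct length and shape, the grammar on the two Structure slides can be approximated with a single regular expression. The following Python sketch is an illustration only, not a reference implementation of the draft. It assumes that an extlang is exactly three letters (the slides show only "min" and "nan" and do not define the rule), it does not handle grandfathered registrations, and the names `TAG_RE`, `PRIVATE_RE`, and `parse_tag` are hypothetical.

```python
import re

# One pattern per production on the slides, in the order the grammar requires.
TAG_RE = re.compile(
    r"""
    (?P<lang>[a-z]{2,3}|[a-z0-9]{5,15})        # ISO 639 code or registered-lang
    (?P<extlangs>(?:-s-[a-z]{3})*)             # assumed: extlang is 3 letters
    (?:-(?P<script>[a-z]{4}))?                 # ISO 15924 script
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?        # ISO 3166 or UN numeric region
    (?P<variants>(?:-[a-z0-9]{5,15})*)         # zero or more variants
    (?:-x(?P<extensions>(?:-[a-z0-9]{1,31})+))?  # "-x" plus extension values
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Whole-tag private use: "x" followed by one or more values.
PRIVATE_RE = re.compile(r"x((?:-[a-z0-9]{1,31})+)", re.IGNORECASE)


def parse_tag(tag):
    """Split a tag into subtags, or return None if it does not fit the grammar."""
    m = PRIVATE_RE.fullmatch(tag)
    if m:
        return {"private_use": m.group(1).split("-")[1:]}
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    return {
        "lang": m.group("lang"),
        "extlangs": m.group("extlangs").split("-s-")[1:],
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": m.group("variants").split("-")[1:],
        "extensions": (m.group("extensions") or "").split("-")[1:],
    }


if __name__ == "__main__":
    for tag in ["zh-s-min-s-nan-Hant-CN", "en-Latn-US-boont", "x-valley-girl"]:
        print(tag, parse_tag(tag))
```

Note that the parser needs no registry data: it separates script, region, and variant purely by shape, which is the property the Main Goals slide asks for.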
Examples I
  Simple language code:
    de (German)
    fr (French)
    ja (Japanese)
  Language code plus Script code:
    zh-Hant (Traditional Chinese)
    en-Latn (English written in Latin script)
    sr-Cyrl (Serbian written with Cyrillic script)
  Language-Region:
    de-DE (German for Germany)
    zh-SG (Chinese for Singapore)
    cs-CS (Czech for Czechoslovakia)
    sr-891 (Serbian for Serbia and Montenegro)
Examples II
  Language-Script-Region:
    zh-Hans-CN (Simplified Chinese for the PRC)
    sr-Latn-891 (Serbian, Latin script, Serbia & Monte.)
  Language-Script-Region-Variant:
    en-Latn-US-boont (Boontling dialect of English)
  Other Mixtures:
    zh-CN (Chinese for the PRC)
    en-boont (Boontling dialect of English)
Examples III
  Extension mechanism:
    x-valley-girl
    de-CH-x-phonebook
    az-Arab-x-AZE-derbend
  Extended language subtags:
    zh-s-min
    zh-s-min-s-nan-Hant-CN
  Private Use tags:
    qaa-Qaaa-QM-xsouthern (all private tags)
    de-Qaaa (German, with a private script)
    de-Latn-QM (German, Latin-script, private region)
    de-Qaaa-DE (German, private script, for Germany)
Examples IV
  Some Invalid Tags:
    de-891-DE (two region tags)
    a-DE (use of a single character tag)
    zh-xsouthern-DE (private use variant followed by another tag)
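The three invalid tags above can each be detected from subtag shape alone, without registry data. The Python sketch below is a hedged illustration rather than a full validator: it checks only these three error classes, it assumes that a private use variant is a subtag of five or more characters starting with "x" (inferred from the "xsouthern" example, not stated as a rule on the slides), and the function name `diagnose` is hypothetical.

```python
def diagnose(tag):
    """Return a short reason if `tag` shows one of three known problems, else None."""
    subtags = tag.split("-")

    # A single-character first subtag is only allowed for private use ("x").
    if len(subtags[0]) == 1 and subtags[0].lower() != "x":
        return "single-character primary subtag"

    regions = 0
    seen_private_variant = False
    for sub in subtags[1:]:
        if sub.lower() == "x":
            break  # everything after "x" is an extension value; stop checking
        if seen_private_variant:
            return "private use variant followed by another subtag"
        is_region = (len(sub) == 2 and sub.isalpha()) or (
            len(sub) == 3 and sub.isdigit()
        )
        if is_region:
            regions += 1
            if regions > 1:
                return "two region subtags"
        elif len(sub) >= 5 and sub.lower().startswith("x"):
            # Assumed convention for private use variants (see lead-in).
            seen_private_variant = True
    return None


if __name__ == "__main__":
    for tag in ["de-891-DE", "a-DE", "zh-xsouthern-DE", "de-CH-x-phonebook"]:
        print(tag, "->", diagnose(tag))
```

Returning a reason instead of a plain boolean makes it easier to report why a tag was rejected, which mirrors the parenthetical explanations on the slide.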
Locale
  Different interpretations
    narrow = language
    broad = any user-preferences
  [Figure labels: user preferences, language]
Language vs Locale
  Which are English?
  "Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."
  "Theater Center News: The date of the last version of this document was 3/20/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
  "Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
Summary
  Improved version of 3066
  Used for language and locale (in narrow sense)
  Addresses Issues
    Script Distinctions
    Parseability
    Extensions
    …
References
  Latest Public Draft
    http://www.ietf.org/internet-drafts/draft-phillips-langtags-01.txt
  Working Draft
    http://www.inter-locale.com/ID/draft-phillips-langtags-02.html (HTML version)
  Language Code Issues (+ Locales)
    http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html
Q&A