
International Summit on Localisation (MAIT/TDIL), New Delhi, 2004-12-08 (R2): Localization Data. Mark Davis, PhD, Chief SW Globalization Arch., IBM; President, Unicode Consortium


TRANSCRIPT

  • Slide 1

International Summit on Localisation (MAIT/TDIL)
New Delhi, 2004-12-08 (R2)
Localization Data
Mark Davis, PhD
Chief SW Globalization Arch., IBM
President, Unicode Consortium

  • Slide 2

Importance of Standards
- Products developed in each country interoperate with other products: inside and outside that country
- Mechanism for countries / industries to promulgate best practices
- SW Localization
  - Unicode: Universal character encoding
  - CLDR: Common Locale Data Repository

  • Slide 3

Universal Character Encoding
- Unicode: Unique character codes for all languages

  • Slide 4

Common Locale Data Repository
- Relatively new project: 2004
- Hosted by Unicode Consortium
  - http://www.unicode.org/cldr/
- Goals:
  - Common, required SW locale data for world languages
  - XML format for effective interchange
  - Freely available

  • Slide 5

What is Locale Data
- Locale = identifier string referring to linguistic and cultural preferences
- Typical data
  - Date/time formats
  - Number/Currency formats
  - Measurement System
  - Collation Specification (Collation)
    - Used for sorting, searching, matching
  - Translated names for language, territory, script, timezones, currencies, ...

  • Slide 6

Latest Release: CLDR 1.2
- Released: November, 2004

               locales   languages   territories
  Approved:        232          72           108
  Draft:            63          27            28

- Data
  - Unique XPaths: 2,540
  - Actual Values: 56,290
  - Fully Resolved: 358,860 (not including collation, aliased data)

  • Slide 7

Next Release: CLDR 1.3
- Jan 2005: Freeze date
  - For new enhancement requests & bug reports
- Apr 2005: Target release date
- Planned features
  - New data / corrections / tests (ongoing)
  - Survey tool
  - POSIX conversion tool
  - Additional Mechanisms
    - lenient date/time/number parsing;
    - different combinations of date fields;
    - names for dialects, measurement systems;
    - narrative reference information

  • Slide 8

Usage (direct or indirect)
- Caveats
  - Not a complete list: usage is not tracked, so this is an estimate
  - CLDR first available in 2004, so may use precursor data
- Companies / Organizations
  - Adobe, Apple (Mac OS X), abas Software, Argonne National Laboratory, Ascential Software, Avaya, BEA, BroadJump, BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, CERN, Cognos, Debian Linux, Gentoo Linux, HP, Hyperion, IBM, Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS, JD Edwards, Jikes, Macromedia, Mathworks, Mozilla, OpenOffice, Language Analysis Systems, Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, Parrot, PayPal, Progress Software, Python, QNX, Rogue Wave, SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase, Teradata (NCR), Trend Micro, Virage, webMethods, Wine, WMS Gaming, ...
- Optional use:
  - Apache, Perl, Xalan, Xerces, ...

  • Slide 9

Sample: Languages, Scripts, Territories
- Afar
- Abkhasisk
- Arabisk
- Andorra
- Forenede Arabiske Emirater

  • Slide 10

Sample: Characters / Dates
- [a-z ]
- søn
- man

  • Slide 11

Sample: Timezones / Currencies
- Pacific-normaltid
- Pacific-sommertid
- Gabonesisk CFA-franc
- GAF

  • Slide 12

Sample: Collation
- 0

  • Slide 13

Committee Process
- For most effective participation from people around the world
  - Meetings
    - By phone, never F2F
    - Short, often
    - Allows preparation between meetings
  - Written
    - Email
    - Database submissions

  • Slide 14

Vetting Process for Data
- Collect from different platforms, experts, submissions: new or revised
  - References to external sources strongly encouraged
  - Must be before freeze date for release
  - Will use Survey Tool
- Enter in the repository
  - Mark with draft attribute
  - Add references, standards
- Verify by CLDR committee members
  - Consulting with country contacts
  - If disagreement, decide in committee
- Accept
  - As main form: draft attribute removed
  - As alternate form: marked with different attributes

  • Slide 15

Challenges
- Aggressive, 6-month release schedule
- Complex Formats
  - Collation, Date Formats, Exemplar characters, etc.
  - Require close interaction of CLDR experts with language experts
- Choosing most customary, acceptable forms
  - Regional differences, individual preferences
  - Context (months in formats vs. calendars)
  - Uncommon cases (Interlingua)
  - Standards vs. common modern usage
  - Obtaining references for data
  - But can have multiple, alternate versions

  • Slide 16

Getting Involved
- Simplest
  - Bug report / feature request: anyone!
- More Involved
  - Vetting, Assessment, Tools, Policies, Decisions, ...
  - Any Unicode member eligible to name representatives
    - Full members: IBM, Apple, Sun, Oracle, India, ...
    - Liaison members: Ireland, Finland, ...
    - Associate members: Tamil Nadu, ...

  • Slide 17

Example Country Process (Finland)
- Finnish Ministry of Education made CLDR data a major goal, 2004-06
  - Research Institute for the Languages of Finland ("RILF" aka "Kotus") designated agency
  - Documenting the national preferences in the open even more important than implementations
  - Results expected to lead to new/revised national standards

  • Slide 18

Example Country Process (II)
- RILF a Unicode Liaison member, 2004-07
  - Set up fully open national group on language and cultural requirements on ICT, 2004-09
  - Two official languages (Finnish and Swedish) & four regional / minority languages (three Sámi & Romani as spoken in Finland) to be covered
  - Over 30 different parties represented: commercial, non-commercial, individuals
  - Public comments to be allowed: http://kotoistus.fi
  - Documentation for all controversial issues and deviations from any national standards

  • Slide 19

For more information
- Unicode
  - http://www.unicode.org/
- CLDR
  - http://www.unicode.org/cldr/
- This presentation
  - http://www.macchiato.com/slides/Localization.ppt
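
The sample slides show only Danish values because the XML markup around them did not survive in this transcript. As a rough illustration of how the XML locale data and the draft attribute from the vetting process might be consumed, here is a minimal Python sketch. It assumes an LDML-style fragment; the fragment itself, the helper name split_by_draft, and the draft="true" flag on the "ab" entry are hypothetical and chosen only for illustration, not taken from the presentation or from any CLDR release.

```python
import xml.etree.ElementTree as ET

# Hypothetical LDML-style fragment. The tags are an assumption: the transcript
# kept only the Danish values. The draft flag on "ab" is invented purely to
# illustrate the "mark with draft attribute" step of the vetting process.
LDML_FRAGMENT = """
<ldml>
  <localeDisplayNames>
    <languages>
      <language type="aa">Afar</language>
      <language type="ab" draft="true">Abkhasisk</language>
      <language type="ar">Arabisk</language>
    </languages>
    <territories>
      <territory type="AD">Andorra</territory>
      <territory type="AE">Forenede Arabiske Emirater</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def split_by_draft(xml_text, path):
    """Return (approved, draft) dicts mapping each type code to its display name."""
    root = ET.fromstring(xml_text)
    approved, draft = {}, {}
    for element in root.findall(path):
        # Items still carrying the draft flag are kept apart from accepted data.
        target = draft if element.get("draft") == "true" else approved
        target[element.get("type")] = element.text
    return approved, draft


approved_languages, draft_languages = split_by_draft(
    LDML_FRAGMENT, "./localeDisplayNames/languages/language"
)
approved_territories, draft_territories = split_by_draft(
    LDML_FRAGMENT, "./localeDisplayNames/territories/territory"
)

print("approved languages:", approved_languages)
print("draft languages:", draft_languages)
print("approved territories:", approved_territories)
```

Keeping draft items in a separate mapping mirrors the accept step in the vetting slide: once the committee accepts a value as the main form, the draft attribute is removed and the entry would move into the approved set.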