XLIFF - the XML based Open Standard for Localisable Content
Tony Jewtushenko
Oracle Corporation - Principal Product Manager
Chair – OASIS XLIFF TC
The XML Localisation Interchange File Format
Slide 2
Agenda
• Open Standards
– Definition and process
• Overview of XLIFF
– Definition, goals, and benefits of XLIFF
– Architecture and Main Features of XLIFF
– Use cases
• Open Source Localisation
– Technical Overview
– Process Overview
– Use case
• Where does XLIFF fit?
– Tools Support for XLIFF
– XLIFF Adoption by Open Source community
Slide 3
What is an Open Standard?
Open standards are:
• Publicly available in stable, persistent versions
• Developed and approved under a published process
• Open to public input: public comments, public archives, no NDAs
• Subject to explicit, disclosed IPR terms
• See the US, EU, WTO governmental & treaty definitions of “standards”
Anything else is proprietary
Source: “Relationship Between Open Standards and Open Source Software”, Patrick Gannon – CEO OASIS, Open Source in Government, Washington, DC, 15-17 March 2004
Slide 4
OASIS: Standards Body Home of XLIFF
• OASIS: Organization for the Advancement of Structured Information Standards
• World’s largest independent, non-profit organization dedicated to the standardisation of eBusiness specifications.
• More than 150 member companies plus individuals
• Operates XML.ORG Registry, the open community clearinghouse of XML application schemas
• Technical work on XML interoperability includes XML conformance and XML Registries/Repositories
• General XML and eBusiness technical resource
Slide 5
OASIS Standards Process
• Specifications are created under an open, democratic, vendor-neutral process
– Anyone may participate
– No single organisation can dictate the specification - specifications must meet everyone’s needs
– All discussions are open to public view and comment
• Two-tiered specification approval process
– Committee Draft approved by Technical Committee
– OASIS members approve specification as OASIS Standard
• Process guarantees that specifications are created by a broad range of industry participants, not just a single vendor
Slide 6
XLIFF Overview
A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format.
Slide 7
What is XLIFF?
A specification for the lossless interchange of localizable data and its related information, which is tool-neutral, has been formalized as an XML vocabulary, and features an extensibility mechanism.
Slide 8
Why is XLIFF Needed?
Localization poses the following challenges:
• Insufficient interoperability between tools.
• Lack of support for overall localization workflow.
• Localization tool developers have to deal with many formats.
• Large number of proprietary intermediate formats.
Slide 9
Advantages – Technology (1/2)
• For a given utility, only one implementation is necessary (e.g. not one spell checker for PO Files, and another one for HTML).
• Increases usability of utilities (i.e. all formats with XLIFF filters can be used with XLIFF-enabled utilities).
• Can contain either UI or Document content
• Metadata provides integration with automated workflow.
Slide 10
Advantages – Technology (2/2)
• All advantages of XML-based processing:
– Content validation (XSD)
– Use of its internationalization features
– Better interoperability and cross-platform support
– Powerful rendering options (XSL-FO, CSS)
– Powerful transformation options (XSLT)
– Greater integration with Web services
• Access to existing, and often open-source, XML implementations
Slide 11
XLIFF Timeline
• September 2000 - DataDefinition Kickoff
• December 2000 - first face to face
• March 2001 - second face to face
• End March 2001 - draft 1.0 spec and DTD published
• June 2001 - White Paper published
• December 2001 - OASIS XLIFF Technical Committee Proposal submitted
• April 2002 – XLIFF 1.0 Specification approved by formal vote as an OASIS Committee Specification
• May 2003 – XLIFF 1.1 Specification approved by formal vote as an OASIS Committee Specification
• August/Sept 2003 – XLIFF 1.1 Peer Review
• November 2003 – Revised XLIFF 1.1 Specification approved as OASIS Committee Specification
• November 2003 – XLIFF 1.1 Specification submitted for public review
Slide 12
Drivers Behind XLIFF
• Alchemy Software
• Bowne Global Solutions
• Convey Software
• Ektron, Inc
• ENLASO Corp (RWS)
• Globalsight
• HP
• Lotus/IBM
• Lionbridge
• LRC
• Moravia IT
• Novell
• Oracle
• PASS Engineering
• Microsoft
• SAP
• SDL International
• Sun Microsystems
• Tektronix
• TRADOS
• XML-Intl
Slide 13
XLIFF TC in the Standards Community
• Shared interests with OASIS Translation Web Services Technical Committee
– XLIFF may be used as data container for WS
• Shared interests with the OSCAR SIG at LISA
– Segmentation and word-count
– Content markup (inline codes)
• Shared interests with the W3C i18n WG
– Localization directives
– Best practices
– Localization aspects of the W3C recommendations
– Web services
Slide 15
Extract-Localize-Merge Paradigm
• Separate data related to localization from parts not related to localization.
• Merge translated data with codes at the end of the process to create the final document.
• Skeleton file is optional, so this paradigm is also optional
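The extract and merge steps can be illustrated with a small Python sketch. It assumes a simplified key=value properties file and a made-up `%%key%%` placeholder syntax for the skeleton; neither the placeholder format nor the function names come from XLIFF itself.

```python
import re

def extract(properties_text):
    """Split key=value lines into a skeleton and a dict of translatable units."""
    skeleton_lines, units = [], {}
    for line in properties_text.splitlines():
        match = re.match(r"^([^#!=\s][^=]*)=(.*)$", line)
        if match:
            key, value = match.group(1).strip(), match.group(2)
            units[key] = value
            # Hypothetical placeholder marking where the translation goes back in.
            skeleton_lines.append(f"{key}=%%{key}%%")
        else:
            # Comments and blank lines are not localizable; keep them in the skeleton.
            skeleton_lines.append(line)
    return "\n".join(skeleton_lines), units

def merge(skeleton, translations):
    """Re-insert translated text into the skeleton to rebuild the final file."""
    return re.sub(r"%%(.+?)%%", lambda m: translations[m.group(1)], skeleton)

source = "# Menu\nfile.open=Open\nfile.close=Close"
skeleton, units = extract(source)
translated = merge(skeleton, {"file.open": "Ouvrir", "file.close": "Fermer"})
```

Only the `units` dictionary would need to travel to the translator; the skeleton stays with whoever performs the merge.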
Slide 16
A Bird's-Eye View
An XLIFF document can capture anything needed for a localization project:
1. Localizable objects (e.g. text strings) in source and target languages.
2. Supplementary information (e.g. glossaries, or material to recreate the original format).
3. Administrative information (e.g. workflow data).
4. Custom data (e.g. initialization information for tools).
Slide 17
The XLIFF Document
• An XLIFF document is designed to store the extracted data related to localization.
• Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF.
• Each XLIFF document can include several <file> elements.
• A whole localization project can possibly be stored in a single XLIFF document.
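As a rough illustration of the one-container-per-<file> layout, the following Python sketch builds a small XLIFF 1.1 document with two <file> elements. It is a simplified sketch and has not been validated against the XLIFF schema; the file names, ids and datatype values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
ET.register_namespace("", XLIFF_NS)

def build_xliff(containers, source_lang, target_lang):
    """containers maps an original name to (datatype, {unit id: source text})."""
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.1"})
    for original, (datatype, units) in containers.items():
        # One <file> element per source container (file, database table, ...).
        file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
            "original": original,
            "datatype": datatype,
            "source-language": source_lang,
            "target-language": target_lang,
        })
        body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
        for unit_id, text in units.items():
            unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": unit_id})
            ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return root

doc = build_xliff(
    {
        "index.html": ("html", {"title": "Welcome"}),
        "menu.properties": ("javapropertyresourcebundle", {"file.open": "Open"}),
    },
    "en",
    "fr",
)
print(ET.tostring(doc, encoding="unicode"))
```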
Slide 18
Bilingual Model
• Each <file> element is designed to store one source language and one target language.
• The rationale is that translation into different target languages is usually done by different people.
• However, the languages in <alt-trans> elements can differ from the target language. For example, proposed matches in European Portuguese when translating into Brazilian Portuguese.
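The Portuguese example can be sketched in Python as follows. The sample document, its ids and its strings are invented for illustration; the code only shows how a proposal in <alt-trans> may carry an `xml:lang` different from the target language of the enclosing <file>.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: a pt-BR translation job with a pt-PT proposal.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="menu.properties" datatype="javapropertyresourcebundle"
        source-language="en" target-language="pt-BR">
    <body>
      <trans-unit id="file.save">
        <source>Save file</source>
        <alt-trans match-quality="100%">
          <target xml:lang="pt-PT">Guardar ficheiro</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def proposals(xliff_text):
    """Return (unit id, file target language, proposal language, proposal text)."""
    root = ET.fromstring(xliff_text)
    result = []
    for file_el in root.findall("x:file", NS):
        target_lang = file_el.get("target-language")
        for unit in file_el.findall(".//x:trans-unit", NS):
            for alt in unit.findall("x:alt-trans/x:target", NS):
                result.append((unit.get("id"), target_lang, alt.get(XML_LANG), alt.text))
    return result

print(proposals(SAMPLE))
```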
Slide 19
Localizable Objects
• XLIFF allows not only text strings as localizable objects but also other object types, such as graphics.
• Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text).
• Relationships between objects can be captured (e.g. all items in a menu).
Slide 21
Supplementary Info
• XLIFF provides “hooks” for storing supplementary information (for example, references to glossaries or translation memories that should be used).
• The supplementary information can be referenced (i.e. reside outside of the document), or embedded within the document.
Slide 22
Administrative Info
XLIFF provides mechanisms for capturing administrative information:
• For relating source material to XLIFF documents.
• For storing workflow data.
• For providing pre-translation entries generated by TM, MT, or a translation repository.
• For keeping track of changes.
Slide 23
XLIFF 1.1 Custom Data
In XLIFF 1.1, XLIFF can be customised by extending it via a private namespace:
– Elements
– Attributes
– Attribute Values
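A minimal Python sketch of a private-namespace extension is shown here. The namespace URI, the `tool` prefix, the `max-pixels` attribute and the `review-note` element are all hypothetical; which extension points are actually permitted, and where, is defined by the XLIFF 1.1 specification and not by this example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
TOOL_NS = "http://example.com/xliff-extensions"  # hypothetical private namespace

ET.register_namespace("", XLIFF_NS)
ET.register_namespace("tool", TOOL_NS)

unit = ET.Element(f"{{{XLIFF_NS}}}trans-unit", {"id": "1"})
ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = "Open"

# Custom attribute from the private namespace (hypothetical name).
unit.set(f"{{{TOOL_NS}}}max-pixels", "120")

# Custom element from the private namespace (hypothetical name).
note = ET.SubElement(unit, f"{{{TOOL_NS}}}review-note")
note.text = "Button label, keep short"

xml_text = ET.tostring(unit, encoding="unicode")
print(xml_text)
```

A tool that does not know the private namespace can ignore the prefixed attribute and element and still process the standard parts of the unit.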
Slide 24
Embedding XLIFF 1.1
• All or part of an XLIFF document can be embedded in another XML document
• The host XML must be defined by an XML Schema (XSD) that includes an <any> element in the definition of the element where the XLIFF data can be inserted
Slide 26
Basic Use Case – without XLIFF
[Diagram labels: Publisher/Customer Domain; Localisation Domain; Developer Applications; Translator; Customer; Specific Tool(s); Tool Resource Filters; Native File 1 (e.g., HTML); Native File 2 (e.g., Java Files); Native File 3 (e.g., Java Properties); Native File n]
Slide 27
Basic Use Case – with XLIFF
[Diagram labels: Publisher/Customer Domain; Localisation Domain; XLIFF compliant Developer Applications (direct to XLIFF authoring) - OR - Non XLIFF compliant Developer Applications (HTML, Java Properties, RC Data; pre-processing); XLIFF file(s) containing HTML, Java, Properties, etc. translatable resources; Translator; XLIFF Compliant Editor]
Slide 28
Automated Localisation with CAT Use Case
[Diagram labels: Developer; Localization Engineer; Translator; Generate XLIFF; Pseudo Translate / Test; Defect Report; XLIFF Translation Kit; Translation Repository; 100% match; Translation Memory; Fuzzy match; Machine Translation; Machine Translate; XLIFF Editor; Translate; Update; 0% Translated; Requires Translation; 100% Translated]
Slide 30
Open Source Resource Formats
• User Assistance (Help):
– DocBook as intermediate container
• UI Resources:
– Many different format types, but converge on:
• PO / POT
• Java Resource Bundles (.properties & .java)
Slide 31
DocBook
• Formed in 1991
• SGML and XML versions
• Many commercial XML editors optimised for DocBook
• No good Open Source XML editors available.
• GNU converts DocBook to (XML ->) PO files, translates, then converts back.
• DocBook converted to HTML dynamically by Yelp Help Browser.
• To optimise performance can pre-convert to HTML
Slide 32
UI Resource Format – Java Resources
• ListResourceBundle
– .java file
– Can contain binary data
– Compiled into class file
• PropertyResourceBundle
– .properties file
– Contains strings only
– Values acquired at runtime
– Requires ISO 8859-1 encoding
– Non-8859-1 characters represented as Unicode escape sequences (i.e., \uxxxx)
– native2ascii to convert non-8859-1 content
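The escaping that a .properties file needs can be sketched in Python. This is not the native2ascii tool itself, only an approximation: it escapes every character outside ASCII (a conservative choice, since ASCII is a subset of ISO 8859-1) and writes characters beyond U+FFFF as a surrogate pair. The function name is made up for this example.

```python
def to_properties_escapes(text):
    """Replace non-ASCII characters with \\uXXXX escape sequences."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")
        else:
            # Characters outside the Basic Multilingual Plane need two
            # escapes, one per UTF-16 surrogate.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)

print(to_properties_escapes("file.count=%s plików"))
```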
Slide 33
UI Resource Format – Java Resources
• Localization challenges:
– Each file contains 1 language / locale pair
– Key / Value Pairs
– No normalized metadata – comments often used for ad hoc metadata.
Slide 34
UI Resource Format - PO
PO (Portable Object) Files, and POT (templates)
– A “Catalog”
– Bi-lingual model
– Resource bundle accessed by “gettext()”
– Text files
– Utilities available to convert from many resource types to PO (e.g., C, Delphi, Java, Python, etc.)
– Compiled into “MO” files
– Support for Plurals
– Limited metadata
– Used by most GNU, GNOME, KDE and other Open Source projects
Slide 35
PO File Syntax
Header:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
msgid ""
msgstr ""
"Project-Id-Version: Project Version \n"
"PO-Revision-Date: YYYY-MM-DD HH:MM+ZZZZ\n"
"Last-Translator: TranslatorName <email>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=code\n"
"Content-Transfer-Encoding: 8bit\n"
"POT-Creation-Date: \n"
"Language-Team: \n"
Separator:
white-space (usually a single new line)
Comments (segment metadata):
# translator-comments
#. automatic-comments
#: reference...
#, flag...
Resource(s):
msgid untranslated-string
msgstr translated-string
Slide 36
PO File Plural Form
Plural form of a message in the PO file looks like this:
white-space
# translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-plural-form
msgstr[1] translated-string-plural-form
msgstr[n] translated-string-plural-form
“n” is language specific
Slide 37
PO File Plural Forms Syntax / Examples
Syntax:
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-plural-form
msgstr[1] translated-string-plural-form
msgstr[n] translated-string-plural-form
French:
msgid "%s file"
msgid_plural "%s files"
msgstr[0] "%s fichier"
msgstr[1] "%s fichiers"
Polish:
msgid "%s file"
msgid_plural "%s files"
msgstr[0] "%s plik"
msgstr[1] "%s pliki"
msgstr[2] "%s plików"
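How a count selects one of the msgstr[n] entries can be sketched in Python. The two index functions follow the plural rules commonly listed for French and Polish in the GNU gettext documentation (in a real catalog they are declared in the `Plural-Forms` header); the function names and the `render` helper are made up for this illustration.

```python
def plural_index_fr(n):
    # French: index 0 for 0 and 1, index 1 for everything else.
    return int(n > 1)

def plural_index_pl(n):
    # Polish: 1 is singular; counts ending in 2-4 (except 12-14) take the
    # second form; everything else takes the third form.
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2

FRENCH = ["%s fichier", "%s fichiers"]
POLISH = ["%s plik", "%s pliki", "%s plików"]

def render(forms, index_fn, n):
    """Pick the msgstr[n] entry for the count and substitute the count."""
    return forms[index_fn(n)] % n

for count in (1, 3, 5, 12, 22):
    print(render(FRENCH, plural_index_fr, count), "|", render(POLISH, plural_index_pl, count))
```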
Slide 38
PO File Localization Challenges
• Plural Forms Challenges
– Rules differ across languages, and implementations differ across platforms.
– PO editing tools don’t support plural forms well (poedit, KBabel), and recommend using text editors.
• Limited normalized metadata
• Little or no context information for translators
• DocBook represented as PO files loses metadata
• Limited support for segmentation, alignment
Slide 39
Simplified GNU/KDE Style Use Case
[Diagram labels: Developer Domain; Localisation Domain; Documentation Author; Docbook; Docbook/PO converter; UI Developer; Generate PO Files; PO; CVS; i18n Coordinator; Preparation & Project Management; Translator; Text Editor; PO Editor; Translation; TM; CVSUP; PO/Docbook converter]
Slide 40
Open Source Localisation Process
• Localization in Open Source community is very technical, and almost entirely manual – primary interface is CVS, even for translators (e.g. http://i18n.kde.org/translation-howto/index.html)
• Process and tools differ from project to project, even language to language.
• Little or no formal linguistic review: quality, style consistency vary widely.
• Project Management and translation are performed by volunteers.
Slide 42
XML-Enabled Translation Tools
• Any XML-enabled translation tool can work with an XLIFF document, as long as the text to translate is initially copied into the <target> elements. This does not mean the tool supports all XLIFF features; it only permits translation of <target> content.
• Many tools cannot handle conditional translation (for example: <trans-unit translate="no">). In that case, extra elements need to be added temporarily.
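The preparation step described here, copying source text into <target> before handing the file to a generic XML tool, might look like the following Python sketch. The sample document is invented, and skipping units marked `translate="no"` is one possible way to handle them, not a prescribed procedure.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
NS = {"x": XLIFF_NS}

# Invented sample with one translatable and one protected unit.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="menu.properties" datatype="javapropertyresourcebundle"
        source-language="en" target-language="fr">
    <body>
      <trans-unit id="file.open"><source>Open</source></trans-unit>
      <trans-unit id="product.name" translate="no"><source>Acme Editor</source></trans-unit>
    </body>
  </file>
</xliff>"""

def prefill_targets(root):
    """Copy <source> into a new <target> for each unit that may be translated."""
    count = 0
    for unit in root.findall(".//x:trans-unit", NS):
        if unit.get("translate") == "no":
            continue  # leave protected units untouched
        source = unit.find("x:source", NS)
        if source is None or unit.find("x:target", NS) is not None:
            continue
        # Deep copy keeps any inline codes nested inside <source>.
        target = copy.deepcopy(source)
        target.tag = f"{{{XLIFF_NS}}}target"
        target.attrib.clear()
        unit.insert(list(unit).index(source) + 1, target)
        count += 1
    return count

root = ET.fromstring(SAMPLE)
prefilled = prefill_targets(root)
```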
Slide 43
XLIFF Enabled Commercial Tools
• Alchemy Software - Catalyst 5.0 – Visual XLIFF 1.1 Editor http://www.alchemysoftware.ie
• Heartsome XLIFF Editor, support for PO files, Docbook: http://www.heartsome.net
• PASS: Passolo: Visual XLIFF Editor: http://www.passolo.com
• Trados: No direct XLIFF support yet, but can edit XLIFF files using modified INI
• XML-Intl : XLIFF Editor http://www.xml-intl.com
Slide 44
XLIFF Enabled Shareware/Freeware
• ENLASO Corp (formerly “RWS Group”): Extraction Utility for RC Data and Java Properties to XLIFF 1.1 http://dotnet.goglobalnow.net/
Various Freeware Utilities, including converters for PO files: http://www.translate.com/shared/tools
Slide 45
XLIFF Enabled Open Source
• International Components for Unicode (ICU):
– Open Source set of C/C++ and Java libraries for Unicode support, software internationalization and globalization, extends JDK i18n
– genrb, and XLIFF2ICUConverter class to convert between common formats and XLIFF
– Includes RBManager, a Java based resource bundle editor with XLIFF support
http://oss.software.ibm.com/icu/
Slide 46
XLIFF Enabled Open Source
• Okapi Framework XSL Template Collection:
– Sample utilities for transforming XLIFF to PO, RC, Java Properties
http://sourceforge.net/project/showfiles.php?group_id=42949&release_id=67485
• xliffRoundTrip tool
– Transforms any XML file to/from XLIFF using XSLT
http://sourceforge.net/projects/xliffroundtrip/
• Lionbridge ForeignDesk
– Incomplete XLIFF support
http://sourceforge.net/projects/foreigndesk/
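To give a feel for what an XLIFF-to-PO transformation involves, here is a rough Python sketch. It is not the Okapi XSL templates or any of the tools listed above, which use XSLT; it handles plain-text units only, and writing the unit id into the `#:` reference comment is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}

def po_quote(text):
    """Quote a string for a PO file, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def xliff_to_po(xliff_text):
    """Turn each <trans-unit> into a msgid/msgstr pair (plain text only)."""
    root = ET.fromstring(xliff_text)
    entries = []
    for unit in root.findall(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        target = unit.findtext("x:target", default="", namespaces=NS)
        entries.append(
            f"#: {unit.get('id')}\n"
            f"msgid {po_quote(source)}\n"
            f"msgstr {po_quote(target)}"
        )
    return "\n\n".join(entries) + "\n"

# Invented sample document for illustration.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="menu.properties" datatype="javapropertyresourcebundle"
        source-language="en" target-language="fr">
    <body>
      <trans-unit id="file.open"><source>Open</source><target>Ouvrir</target></trans-unit>
      <trans-unit id="file.quote"><source>Say "hi"</source></trans-unit>
    </body>
  </file>
</xliff>"""

print(xliff_to_po(SAMPLE))
```

Units without a <target> come out with an empty msgstr, which is how PO files mark untranslated entries.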
Slide 47
Future Support for XLIFF Announced:
• Apple Corp: Apple’s resource editor AppleGlot
• Idiom: Worldserver V.6.0
• SDL International: SDLX support for XLIFF currently in development. See http://www.sdlx.com for more information.
• uPortal: Open Source Web portal infrastructure for Universities – XLIFF support announced for Version 3.0, to be released in 2005
Slide 48
Where does XLIFF fit?
• Good choice for projects with multiple resource formats, especially good for XML.
• XLIFF addresses the process and metadata related problems of Open Source projects:
– Supports workflow metadata.
– Supports multiple resource formats
– Normalised translation memory / repository data.
– Simplifies translator usability experience.
Slide 49
Where does XLIFF fit?
• Issues Blocking Adoption by Open Source:
– Adoption requires retooling - lack of existing open source XLIFF tools for PO and DocBook.
– PO tools deemed adequate for current requirements
– “Volunteer” model reduces urgency to reduce costs
Slide 50
Where does XLIFF fit?
• Issues Encouraging Adoption by Open Source:
– Increase in commercial product development for Open Source platforms
• Translation not volunteer effort - cost control important.
• Integration with existing automation required.
• Increased availability of commercial tools that support XLIFF
– Increase in Java Open Source projects
• Java projects are well supported by XLIFF.
• Well documented L10n best practices include XLIFF
• Available commercial and Open Source tools
Slide 51
More Information
• The XLIFF TC Web Site: http://www.xliff.org
• A “best practice” from Sun Developer Network: http://developers.sun.com/dev/gadc/technicalpublications/whitepapers/translation_technology_sun.html
• Presenter:
– XLIFF TC Chair: Tony Jewtushenko (Oracle)