Coping with Babel
How to Localize XML
Designing for Localization
• Document design can seriously impact the costs of translation and localization.
• Remember that you are designing for all languages, not just English.
• There are clear do’s and don’ts.
• Overriding principle is good XML practice.
• Always consider the target language implications.
Entity references
Do not use entity references for word substitution:
<para>Use a &tool; to release the catch.</para>
• Cause problems for inflected languages
• Cause problems for parsing/translation tools
• Use boilerplate text instead
Translatable attributes
Avoid using translatable attributes:
<para>Use a <tool id="a1098" name="claw hammer"/> to release the CPU retention catch.</para>
• Cause problems for inflected languages
• Cause extra burden for translators
• More to go wrong
CDATA sections
Avoid using CDATA sections that may contain translatable text:
<tmpl><![CDATA[<p>Please refer to the <em>index page</em> for further information</p>]]></tmpl>
• Lose syntactical control
• How are translation tools to cope?
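The loss of syntactical control can be seen with any XML parser. The following is a minimal sketch using only Python's standard library: it parses the template once with the CDATA section and once with the same content as real elements (the second variant is an illustration, not part of the original slide).

```python
import xml.etree.ElementTree as ET

# The slide's example: inline markup hidden inside a CDATA section.
with_cdata = (
    "<tmpl><![CDATA[<p>Please refer to the <em>index page</em> "
    "for further information</p>]]></tmpl>"
)
# The same content written as real elements (illustrative alternative).
with_elements = (
    "<tmpl><p>Please refer to the <em>index page</em> "
    "for further information</p></tmpl>"
)

cdata_root = ET.fromstring(with_cdata)
element_root = ET.fromstring(with_elements)

# Inside CDATA the tags are only characters, so the parser reports no children.
cdata_children = [el.tag for el in cdata_root.iter() if el is not cdata_root]
# As real markup, <p> and <em> are visible to any XML-aware tool.
element_children = [el.tag for el in element_root.iter() if el is not element_root]

print(cdata_children)    # []
print(element_children)  # ['p', 'em']
print(cdata_root.text)   # the raw string, angle brackets included
```

In the CDATA version a translation tool receives the angle brackets as ordinary text, so nothing prevents a translator from damaging the embedded tags.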
Processing instructions
Avoid Processing Instructions in translatable text:
<para>Use a <?tool name="claw hammer"?> to release the CPU retention catch.</para>
• Syntactically weak
• Confuse translation memory operations
Infinite Naming Schemes
Avoid the use of infinite naming schemes:
<resources xml:lang="en">
<err001>Cannot open file $1.</err001>
<hint001>Hint: does file $1 exist?</hint001>
<err002>Incorrect value.</err002>
<hint002>Hint: Must be between $1 and $2.</hint002>
<err003>Connection timeout.</err003>
</resources>
• No clear element definitions
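One possible alternative is a small, fixed vocabulary in which the kind and number of each message move into attributes. The sketch below converts the numbered elements into a single `message` element; the element name `message` and the attributes `type` and `id` are assumptions made for illustration and do not come from the slides.

```python
import re
import xml.etree.ElementTree as ET

source = """<resources xml:lang="en">
<err001>Cannot open file $1.</err001>
<hint001>Hint: does file $1 exist?</hint001>
<err002>Incorrect value.</err002>
</resources>"""

def to_fixed_vocabulary(xml_text):
    """Rewrite <err001>, <hint001>, ... as <message type="..." id="...">."""
    old_root = ET.fromstring(xml_text)
    new_root = ET.Element("resources", old_root.attrib)  # keeps xml:lang
    for child in old_root:
        match = re.fullmatch(r"([a-z]+)(\d+)", child.tag)
        if match is None:
            raise ValueError(f"unexpected element name: {child.tag}")
        kind, number = match.groups()
        # Hypothetical target element: one name, described by attributes.
        message = ET.SubElement(new_root, "message", {"type": kind, "id": number})
        message.text = child.text
    return new_root

fixed = to_fixed_vocabulary(source)
print(ET.tostring(fixed, encoding="unicode"))
```

With a single element name, a DTD or schema can define the element once and translation tools need only one extraction rule.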
Typographical elements
Avoid the use of "typographical" elements:
<para><b>Do not use</b> <br/> type elements.</para>
• Bad XML practice.
• Causes problems for translators.
• Target language text may be in the opposite order.
Do not break sentences
Never break a linguistically complete text unit over more than one non-inline element:
<para>
<line>This text should not be</line>
<line>broken this way – the translated text may well be in a different order.</line>
</para>
XML Translation Standards
• LISA - Localization Industry Standards Association: http://www.lisa.org
• OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org
• W3C - World Wide Web Consortium: http://www.w3c.org
• OLIF Consortium: http://www.olif.net
LISA Standards
• TMX - Translation Memory Exchange format: http://www.lisa.org/tmx
• TBX - Termbase Exchange format: http://www.lisa.org/tbx
• SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx
• GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx
OASIS L10n Standards
• XLIFF - XML Localization Interchange File Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
• TransWS - Translation Web Services: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws
W3C and OLIF
• W3C to start on Localization Directives standard.
• OLIF - Open Lexicon Interchange Format: http://www.olif.net
xml:tm
XML Text Memory
A radical new approach to translating XML documents
Computational Linguistic Methodologies
• Machine Translation
• Translation Memory
• Hybrid Linguistic Inferencing Engines
• Terminology
Translation memory
• Advent in the early 1980s
• Intermediate format
• Alignment
• Storage
• Leveraged memory
• Fuzzy matching – statistical
• Advantages: cost reduction, consistency
• Drawbacks: proofreading, managing memories
• No significant advances in technology
XML namespace
• Major new feature of XML compared to SGML
• Allows the mapping of different ontological entities onto the same representation
• Allows different ways to look at the same data
• Namespaces can be made transparent
xml:tm namespace
• Text Memory namespace
• Can be mapped onto any XML document
• Vertical view of document in terms of ‘text segments’
• Can be totally transparent
xml:tm namespace
Example of the use of namespace in an XML document:
<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
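As a rough illustration of the two ways of looking at the same data, the sketch below reads the example with Python's standard library. It assumes the example is closed with the matching end tags; the namespace URI is the one shown on the slide.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the namespace URI used in the slide's example.
TM_NS = {"tm": "urn:xml-Intl-tm"}

document = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>"""

root = ET.fromstring(document)

# Text memory view: every text unit, in document order.
text_units = [tu.text.strip() for tu in root.iterfind(".//tm:tu", TM_NS)]

# Original document view: the same text with the tm elements ignored.
plain_text = " ".join(" ".join(root.itertext()).split())

print(text_units)
print(plain_text)
```

The first query addresses the document by text unit, while the second recovers the paragraph text as if the tm elements were not there.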
xml:tm namespace
[Diagram: two views of one document. The original document view is the tree doc, title, section, para, text. The tm namespace view overlays it with a tm root whose te (text element) nodes each hold the sentences of one block of text as tu (text unit) nodes.]
xml:tm namespace
[Diagram: one block of text in the original document view corresponds, in the tm namespace view, to a single te element containing one tu per sentence.]
xml:tm namespace
Original document view:
<para>Namespace is very simple. It is easy to use.</para>
tm namespace view:
<para>
  <tm:te id="e1">
    <tm:tu id="u1.1">Namespace is very simple.</tm:tu>
    <tm:tu id="u1.2">It is easy to use.</tm:tu>
  </tm:te>
</para>
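The step from the original view to the tm namespace view can be sketched as follows. This is a simplified illustration, not the xml:tm implementation: it assumes the paragraph contains only text (no inline elements) and uses a naive rule that a sentence ends at ".", "!" or "?" followed by whitespace. The function name `add_text_memory` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TM_URI = "urn:xml-Intl-tm"
ET.register_namespace("tm", TM_URI)  # serialize with the tm: prefix

def add_text_memory(para, element_number):
    """Wrap the text of one text-only <para> in tm:te / tm:tu elements."""
    # Naive segmentation: split after sentence-final punctuation.
    text = (para.text or "").strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    para.text = None
    te = ET.SubElement(para, f"{{{TM_URI}}}te", {"id": f"e{element_number}"})
    for index, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(
            te, f"{{{TM_URI}}}tu", {"id": f"u{element_number}.{index}"}
        )
        tu.text = sentence
    return te

para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
te = add_text_memory(para, 1)
print(ET.tostring(para, encoding="unicode"))
```

Real segmentation has to cope with abbreviations, inline markup and language-specific rules, which is the kind of problem SRX segmentation rules address.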
xml:tm Text Memory
• Author memory
– Maintain memory of source text
– Authoring statistics
– Authoring tool input
• Translation memory
– Automatic alignment
– Maintain perfect link of source and target text
– Reduce translation costs
xml:tm DOM differencing
Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
Updated Source Document: tu id="1", tu id="3", tu id="4", tu id="7" (modified, origid="5"), tu id="6", tu id="8" (new); tu id="2" deleted
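The slides do not describe the differencing algorithm itself. The following is a minimal sketch of one way to obtain the same outcome, comparing text units by content: identical text keeps its identifier, text similar to a vanished unit becomes a modified unit with an `origid`, and anything else is new. The use of `difflib`, the 0.8 similarity threshold, the sample sentences and the function name are all assumptions.

```python
from difflib import SequenceMatcher

def diff_text_units(old_units, new_texts, next_id, threshold=0.8):
    """old_units: {id: text}; new_texts: updated texts in document order.

    Returns (units, deleted_ids), where units describes each updated text unit.
    """
    remaining = dict(old_units)
    placed, pending = [], []
    # Pass 1: identical text keeps its original identifier.
    for position, text in enumerate(new_texts):
        same = next((uid for uid, old in remaining.items() if old == text), None)
        if same is not None:
            del remaining[same]
            placed.append((position, {"id": same, "status": "unchanged"}))
        else:
            pending.append((position, text))
    # Pass 2: the rest is either a modified old unit or a new unit.
    for position, text in pending:
        best_id, best_ratio = None, 0.0
        for uid, old in remaining.items():
            ratio = SequenceMatcher(None, old, text).ratio()
            if ratio > best_ratio:
                best_id, best_ratio = uid, ratio
        if best_id is not None and best_ratio >= threshold:
            del remaining[best_id]
            entry = {"id": next_id, "status": "modified", "origid": best_id}
        else:
            entry = {"id": next_id, "status": "new"}
        next_id += 1
        placed.append((position, entry))
    units = [entry for _, entry in sorted(placed, key=lambda item: item[0])]
    return units, sorted(remaining)  # whatever is left over was deleted

old_units = {
    1: "Open the cover.",
    2: "Remove the old battery.",
    3: "Insert the new battery.",
    4: "Close the cover.",
    5: "Press the power button for two seconds.",
    6: "Wait for the green light.",
}
new_texts = [
    "Open the cover.",
    "Insert the new battery.",
    "Close the cover.",
    "Press the power button for three seconds.",
    "Wait for the green light.",
    "Charge the unit overnight.",
]
units, deleted = diff_text_units(old_units, new_texts, next_id=7)
for unit in units:
    print(unit)
print("deleted:", deleted)
```

With these sample sentences the result follows the pattern on the slide: units 1, 3, 4 and 6 are unchanged, unit 7 is a modification of unit 5, unit 8 is new and unit 2 is deleted.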
xml:tm Author Memory
• Namespace aware differencing
• Identify changes from the previous version
• Unique text unit identifiers are maintained
• Modification history
• Text units can be loaded into a database
• Authoring environment integration
xml:tm Translation Memory
• The tm namespace can be used to create XLIFF files
• Automatic alignment of source and target languages
• Allows for more focused translation matching
– Perfect matching
– Leveraged matching from document - identical text
– Leveraged matching from database
– Modified text unit matching
– Linguistically enhanced fuzzy matching
– Non translatable text unit identification
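The first five matching types above can be read as a cascade that is tried in order for each text unit. The sketch below is one hedged interpretation, not the xml:tm implementation: the data structures and the function name are hypothetical, the sample "translations" are placeholders, and the final step uses plain character similarity from `difflib` rather than linguistically enhanced matching.

```python
from difflib import SequenceMatcher

def match_text_unit(unit, previous, document_memory, database_memory, threshold=0.75):
    """Return (match_type, suggested_translation) for one text unit.

    unit: {"id": ..., "text": ..., "origid": ... (optional)}
    previous: {id: translation} from the previous document version
    document_memory / database_memory: {source_text: translation}
    """
    text = unit["text"]
    if unit["id"] in previous:              # unchanged unit, same identifier
        return "perfect", previous[unit["id"]]
    if text in document_memory:             # identical text elsewhere in the document
        return "document leveraged", document_memory[text]
    if text in database_memory:             # identical text held in the database
        return "database leveraged", database_memory[text]
    if unit.get("origid") in previous:      # edited version of a known unit
        return "modified", previous[unit["origid"]]
    # Plain statistical fuzzy match against the database (assumed threshold).
    best = max(
        database_memory,
        key=lambda source: SequenceMatcher(None, source, text).ratio(),
        default=None,
    )
    if best is not None and SequenceMatcher(None, best, text).ratio() >= threshold:
        return "fuzzy", database_memory[best]
    return "requires translation", None

previous = {1: "[pl] Open the cover."}
document_memory = {"Close the cover.": "[pl] Close the cover."}
database_memory = {"Wait for the green light.": "[pl] Wait for the green light."}

for unit in (
    {"id": 1, "text": "Open the cover."},
    {"id": 8, "text": "Close the cover."},
    {"id": 9, "text": "Wait for the green light."},
    {"id": 7, "text": "Open the front cover.", "origid": 1},
    {"id": 10, "text": "Wait for the red light."},
    {"id": 11, "text": "Charge the unit overnight."},
):
    print(unit["id"], match_text_unit(unit, previous, document_memory, database_memory))
```

Ordering the checks from most to least reliable means each unit is labelled with the strongest match available, which determines whether it needs no work, proofing or full translation.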
xml:tm translation
[Diagram: Source Document (tu id="1" to tu id="6"), XLIFF Document (trans-unit id="1" to trans-unit id="6") and Translated Document (tu id="1" to tu id="6"). Each trans-unit and each translated tu carries the same id as its source tu.]
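The id-for-id correspondence between text units and XLIFF trans-units can be sketched as below. The slides only state that the tm namespace can be used to create XLIFF files; the choice of XLIFF 1.2, the attribute values and the function name `to_xliff` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

TM_NS = {"tm": "urn:xml-Intl-tm"}
XLIFF_URI = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_URI)  # serialize XLIFF as the default namespace

def to_xliff(root, original, source_lang, target_lang):
    """Build a bare XLIFF 1.2 tree with one trans-unit per tm:tu, same id."""
    xliff = ET.Element(f"{{{XLIFF_URI}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_URI}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_URI}}}body")
    for tu in root.iterfind(".//tm:tu", TM_NS):
        unit = ET.SubElement(body, f"{{{XLIFF_URI}}}trans-unit", {"id": tu.get("id")})
        source = ET.SubElement(unit, f"{{{XLIFF_URI}}}source")
        source.text = "".join(tu.itertext()).strip()
    return xliff

document = (
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Namespace is very simple.</tm:tu> '
    '<tm:tu id="u1.2">It is easy to use.</tm:tu>'
    "</tm:te></para>"
)
xliff = to_xliff(ET.fromstring(document), "example.xml", "en", "pl")
print(ET.tostring(xliff, encoding="unicode"))
```

Because the trans-unit ids are the tu ids, a translated XLIFF file can be merged back by identifier, which is what keeps source and target aligned.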
xml:tm translated document
[Diagram: the translated document view and the translated tm namespace view have the same structure as the source (doc, title, section, para, tm, te, tu). The sample text labels are in Polish: tekst (text), zdanie (sentence).]
xml:tm perfect alignment
[Diagram: Source Document (tu id="1" to tu id="6") and Translated Document (tu id="1" to tu id="6"), linked unit by unit through identical ids. Label: Perfect alignment.]
xml:tm perfect matching
[Diagram: Updated Source Document with tu id="1", tu id="2" (deleted), tu id="3", tu id="4", tu id="7" (modified), tu id="6" and tu id="8" (new), beside the Matched Target Document with tu id="1", "3", "4", "7", "6" and "8". Unchanged units are perfect matches; two units are labelled "requires translation".]
xml:tm contextual memory
[Diagram: Source Document (tu id="1" to tu id="6") and Translated Document (tu id="1" to tu id="6") in perfect alignment.]
xml:tm leveraged DB memory
[Diagram: Source Document (tu id="1" to tu id="6") and Translated Document (tu id="1" to tu id="6") in perfect alignment, connected to a database (DB).]
xml:tm in-document leveraged matching
[Diagram: Updated Source Document with tu id="1", tu id="2" (deleted), tu id="3", tu id="4", tu id="7" (modified), tu id="6" and tu id="8" (new, same text as tu id="3"), beside the Matched Target Document with tu id="1", "3", "4", "7", "6" and "8". Labels: Perfect Matching; requires translation; requires proofing; leveraged match.]
xml:tm in-document fuzzy matching
[Diagram: Updated Source Document with tu id="1", tu id="2" (deleted), tu id="3", tu id="4", tu id="7" (modified, origid="5"), tu id="6" and tu id="8" (new, same text as an existing unit), beside the Matched Target Document with tu id="1", "3", "4", "7", "6" and "8". Labels: Perfect Matching; requires translation; requires proofing; fuzzy match; leveraged match.]
xml:tm db leveraged matching
[Diagram: Updated Source Document with tu id="1", tu id="2" (deleted), tu id="3", tu id="4", tu id="7" (modified, origid="5"), tu id="6", tu id="8" (new, same text as an existing unit) and tu id="9", beside the Matched Target Document with tu id="1", "3", "4", "7", "6", "8" and "9", and a database (DB). Labels: Perfect Matching; requires translation; requires proofing; fuzzy match; doc leveraged match; tu id="9" is a DB leveraged match that requires proofing.]
xml:tm non translatable text
[Diagram: Updated Source Document with tu id="1", tu id="2" (non translatable), tu id="3", tu id="4", tu id="7", tu id="6", tu id="8" (new, same text as an existing unit) and tu id="9", beside the Matched Target Document and a database (DB). Labels: Perfect Matching; requires translation; requires proofing; fuzzy match; doc leveraged match; tu id="9" is a DB leveraged match that requires proofing; tu id="2" is non translatable and requires no translation.]
Traditional Translation Scenario
[Diagram: workflow between Publishing and Translation. Steps shown: source text, extract, extracted text, tm process, prepared text, translate, translated text, merge, target text, QA.]
xml:tm Translation Scenario
[Diagram: workflow between Publishing and Translator. Steps shown: xml source text, extract, extracted text, tm process (perfect matching, leveraged matching), prepared text, translate, web interface, QA, merge, xml target text. Two stages are labelled "Automatic Process".]
xml:tm matching
• Perfect Matching driven by Author Memory
• Leveraged Matching: 100% same text
– In document Leveraged Matching
– Database Leveraged Matching
• Fuzzy Matching
– Modified Matching
– Linguistically aware Fuzzy Matching
• Non translatable element identification
– Alphanumeric
– Numeric
– Measurements
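Non translatable identification can be approximated with pattern matching on the whole text unit. The sketch below is an assumption about how such a check might look, not the rules used by xml:tm: the regular expressions, the short list of units and the function name are illustrative only.

```python
import re

# Digits with common separators and signs, e.g. "1,024.5" or "10:30".
NUMERIC = re.compile(r"[\d\s.,+\-/:%]+")
# A number followed by a unit from a small, illustrative list.
MEASUREMENT = re.compile(
    r"\d+(?:[.,]\d+)?\s?(?:mm|cm|m|km|g|kg|ml|l|V|A|W|Hz|kHz|MHz|°C)"
)
# Codes made of capitals, digits and hyphens with at least one of each kind.
ALPHANUMERIC = re.compile(r"(?=[A-Z0-9\-]*\d)(?=[A-Z0-9\-]*[A-Z])[A-Z0-9\-]+")

def non_translatable_kind(text):
    """Return 'numeric', 'measurement', 'alphanumeric', or None if translatable."""
    candidate = text.strip()
    if not candidate:
        return None
    if NUMERIC.fullmatch(candidate) and any(ch.isdigit() for ch in candidate):
        return "numeric"
    if MEASUREMENT.fullmatch(candidate):
        return "measurement"
    if ALPHANUMERIC.fullmatch(candidate):
        return "alphanumeric"
    return None

for sample in ("1,024.5", "12 mm", "5V", "A1098", "Use a claw hammer."):
    print(repr(sample), non_translatable_kind(sample))
```

Units flagged this way can be copied to the target unchanged and excluded from the word count, although numbers and measurements may still need locale-specific formatting.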
xml:tm benefits
• Enterprise level scalability
• Totally integrated within the XML framework
• Source text is automatically extracted and matched
• Word counts are controlled by the customer
• Text can be presented for translation via the web
• Online composition
• The most up to date translation is held by the customer
• Data is merged automatically at end of translation cycle
• All memory operations are totally automated
• Can be used transparently for relay translations
• Much cheaper to implement and run
• More accurate – better matching
xml:tm summary
• Can be used to build consistent authoring systems
• Can be used to produce automatic authoring statistics
• Translation Memory generation and alignment is totally automatic
• Memory is held within the documents themselves
• Extraction and merging for translation are automatic
• The system provides much more efficient matching mechanisms
• Structure of the XML document is protected during translation
xml:tm
• Fully specified XML based standard
• http://www.xml-intl.com/docs/specification/xml-tm.html
• Maintained by xml-intl.com
• http://www.xml-intl.com/dtd/tm.dtd
• http://www.xml-intl.com/dtd/tm.xsd
• Detailed article on www.xml.com
• Offered for consideration as a LISA standard
Any questions?