xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005



xml:tm

XML Based Text Memory

Using XML technology to reduce the cost of translating XML documents

27 June 2005


Automating Translation

• Machine Translation

• Translation Memory

• Hybrid Linguistic Inference Engines

• Terminology


Automating Translation

• Machine translation

• 40 year history

• Rigorous control of grammar and terminology can produce good results

• Lots of interesting new developments with hybrid statistical/transfer based systems

• Translation of free format text is theoretically impossible with current technology.


Translation Memory

• Align source and target text

• Look up new text against memory

• Relatively primitive technology

• Not much innovation over the past 30 years

• Need for proofing

• Proprietary translation memory formats
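
The "look up new text against memory" step can be sketched in Python. This is only an illustration, not how any particular translation memory product works: the `memory` dictionary, the `lookup` function, the sample Polish targets and the 0.75 threshold are all assumptions made for the example, and similarity is measured with the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical memory of aligned source/target sentence pairs (illustrative data).
memory = {
    "Namespace is very flexible.": "Przestrzeń nazw jest bardzo elastyczna.",
    "It is very easy to use.": "Jest bardzo łatwa w użyciu.",
}


def lookup(text, memory, threshold=0.75):
    """Return (kind, matched_source, target) for the best match, or None."""
    if text in memory:
        return ("exact", text, memory[text])
    best, best_score = None, 0.0
    for source in memory:
        # ratio() is a character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, text, source).ratio()
        if score > best_score:
            best, best_score = source, score
    if best is not None and best_score >= threshold:
        return ("fuzzy", best, memory[best])
    return None
```

A fuzzy result is a suggestion for the translator rather than a finished translation, which is why the slide lists proofing as a need.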


Translating XML Documents

• XML inherently easier to translate

• Separation of form and content

• Support for Unicode and other international encoding formats.

• Allows multiple output formats - PDF, XHTML, WAP


XML Translation Standards

• LISA - Localization Industry Standards Association: http://www.lisa.org

• OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org

• W3C - World Wide Web Consortium: http://www.w3c.org

• OLIF Consortium: http://www.olif.net


LISA Standards

• TMX - Translation Memory Exchange format: http://www.lisa.org/tmx

• TBX - Termbase Exchange format: http://www.lisa.org/tbx

• SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx

• GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx


W3C and OLIF

• W3C ITS - Internationalization Tag Set: http://www.w3.org/International/ and http://www.w3.org/International/its

• OLIF - Open Lexicon Interchange Format: http://www.olif.net


XML namespace

• Major feature of XML

• Allows the mapping of different ontological entities onto the same representation

• Allows different ways to look at the same data

• Namespaces can be made transparent


xml:tm

• XML based text memory

• Revolutionary approach to translating XML documents

• First significant advance in translation memory technology

• Uses XML namespace to transparently embed contextual information


xml:tm namespace

• Text Memory namespace

• Can be mapped onto any XML document

• Vertical view of document in terms of ‘text segments’

• Can be totally transparent


xml:tm namespace

Example of the use of tm namespace in an XML document:

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
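
A short Python sketch of reading the text units out of such a document through the tm namespace. It uses only the standard library's `xml.etree.ElementTree`; the namespace URI is the one from the slide, while the closing tags in the sample string and the variable names are assumptions added so the fragment parses.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for ElementTree path queries; the URI is the one on the slide.
NS = {"tm": "urn:xml-Intl-tm"}

# The slide's example, completed with closing tags so that it is well-formed.
doc = ET.fromstring(
    '<document xmlns:tm="urn:xml-Intl-tm"><tm:tm><section><para><tm:te>'
    "<tm:tu>Namespace is very flexible.</tm:tu>"
    "<tm:tu>It is very easy to use.</tm:tu>"
    "</tm:te></para></section></tm:tm></document>"
)

# Collect every text unit in document order, wherever it sits in the host markup.
units = [tu.text.strip() for tu in doc.iterfind(".//tm:tu", NS)]
print(units)
```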


xml:tm namespace

[Figure: one document shown in two views. Original document view: doc, title, section, para, text. tm namespace view: tm, te, tu, with each tu holding one sentence of the para text.]
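
One way to picture the transparency of the tm namespace is to remove it again and recover the original document view. The Python sketch below does this with the standard library's `xml.etree.ElementTree`; `strip_tm` and its helpers are hypothetical names, and the approach (rebuilding the tree and splicing the content of tm elements into their parents) is an assumption for illustration, not the xml:tm reference implementation.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI used in the deck's own example


def _add_text(parent, text):
    """Append text after the last child of parent, or to parent.text if it has none."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _append(parent, child):
    if child.tag.startswith("{" + TM_NS + "}"):
        # tm element: keep its content, drop the wrapper itself.
        _add_text(parent, child.text)
        for grandchild in child:
            _append(parent, grandchild)
    else:
        parent.append(strip_tm(child))
    _add_text(parent, child.tail)


def strip_tm(elem):
    """Return a copy of elem with every tm-namespace element unwrapped."""
    out = ET.Element(elem.tag, elem.attrib)
    out.text = elem.text
    for child in elem:
        _append(out, child)
    return out
```

Applied to a document carrying tm:tm, tm:te and tm:tu elements, the result contains only the host vocabulary (document, section, para) with the sentence text back in place.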


xml:tm namespace

original document view

<para> Namespace is very simple. It is easy to use. </para>

tm namespace view

<para>
  <tm:te id="e1">
    <tm:tu id="u1.1"> Namespace is very simple. </tm:tu>
    <tm:tu id="u1.2"> It is easy to use. </tm:tu>
  </tm:te>
</para>
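
The step from the original view to the tm namespace view can be sketched in Python. The id pattern (e1, u1.1, u1.2) follows the slide; everything else is an assumption: `add_text_memory` is a hypothetical helper, the sentence splitter is a naive regular expression where a real system would apply segmentation rules such as SRX, and only a para containing plain text is handled.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"
ET.register_namespace("tm", TM_NS)  # serialise with the tm: prefix


def add_text_memory(para, te_id):
    """Wrap the text of a text-only para in one tm:te holding one tm:tu per sentence."""
    text = (para.text or "").strip()
    # Naive segmentation: split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    para.text = None
    te = ET.SubElement(para, f"{{{TM_NS}}}te", id=f"e{te_id}")
    for n, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_NS}}}tu", id=f"u{te_id}.{n}")
        tu.text = sentence
    return para


para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
print(ET.tostring(add_text_memory(para, 1), encoding="unicode"))
```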


xml:tm Text Memory

• Author memory
  – Maintain memory of source text
  – Authoring statistics
  – Authoring tool input

• Translation memory
  – Automatic alignment
  – Maintain perfect link of source and target text
  – Reduce translation costs


xml:tm DOM differencing

Source Document    Updated Source Document
tu id="1"          tu id="1"
tu id="2"          (deleted)
tu id="3"          tu id="3"
tu id="4"          tu id="4"
tu id="5"          tu id="7" (modified, origid="5")
tu id="6"          tu id="6"
                   tu id="8" (new)


xml:tm Author Memory

• Namespace aware DOM differencing

• Identify changes from the previous version

• Unique text unit identifiers are maintained

• Modification history

• Text units can be loaded into a database

• Authoring environment integration
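
The identifier bookkeeping can be sketched in Python. This is a simplification, not namespace aware DOM differencing itself: it compares flat lists of text unit strings with the standard library's `difflib.SequenceMatcher` instead of walking two DOM trees, and `diff_units`, its record fields and its pairing rule for modified units are assumptions made for the example.

```python
from difflib import SequenceMatcher


def diff_units(previous, updated_texts, next_id):
    """Carry ids from the previous version over to the updated text units.

    previous: list of (id, text) in document order.
    updated_texts: list of text unit strings in the new version.
    next_id: first unused id, given to modified and new units.
    Returns (units, deleted_ids).
    """
    old_texts = [text for _, text in previous]
    units, deleted = [], []
    matcher = SequenceMatcher(None, old_texts, updated_texts, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged units keep their identifier.
            for k in range(i2 - i1):
                units.append({"id": previous[i1 + k][0], "text": updated_texts[j1 + k],
                              "status": "unchanged", "origid": None})
            continue
        # replace / insert / delete: pair old and new units by position.
        paired = min(i2 - i1, j2 - j1)  # 0 for a pure insert or delete
        for k in range(j2 - j1):
            origid = previous[i1 + k][0] if k < paired else None
            units.append({"id": next_id, "text": updated_texts[j1 + k],
                          "status": "modified" if origid is not None else "new",
                          "origid": origid})
            next_id += 1
        deleted.extend(uid for uid, _ in previous[i1 + paired:i2])
    return units, deleted
```

With six previous units (ids 1 to 6), the second one removed, the fifth reworded and one sentence appended, this yields ids 1, 3, 4, 7 (origid 5), 6, 8 and reports id 2 as deleted.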


xml:tm Translation Memory

• The tm namespace can be used to create XLIFF files

• Automatic alignment of source and target languages

• Allows for more focused translation matching
  – Exact matching
  – Leveraged matching from document - identical text
  – Leveraged matching from database
  – Modified text unit matching
  – Non translatable text unit identification


DITA Strengths

• Topic-centric level of granularity

• Very well thought out and flexible architecture for content creation and publishing

• Substantial reuse of existing assets

• Specialization at the topic and domain levels

• Automated processing based on meta data property

• Translate topic only once, reuse many times


DITA and xml:tm

• Both complement each other

• xml:tm encourages text reuse at the sentence level

• Automates translation matching and extraction

• Automatic alignment of source and target documents at the text unit (sentence) level

• Introduces the concept of exact matching for translation as well as focused matching

• Fully integrated with existing standards such as SRX, GMX, TMX and XLIFF


xml:tm translation via XLIFF

Source Document    XLIFF Document       Translated Document
tu id="1"          trans-unit id="1"    tu id="1"
tu id="2"          trans-unit id="2"    tu id="2"
tu id="3"          trans-unit id="3"    tu id="3"
tu id="4"          trans-unit id="4"    tu id="4"
tu id="5"          trans-unit id="5"    tu id="5"
tu id="6"          trans-unit id="6"    tu id="6"
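
The one-to-one mapping from tu to trans-unit can be sketched in Python with `xml.etree.ElementTree`. The element names (xliff, file, body, trans-unit, source) come from the XLIFF 1.x vocabulary; the `to_xliff` helper, the version value and the attribute values are assumptions, and the XLIFF namespace declaration is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"


def to_xliff(doc, original, source_lang, target_lang):
    """Build a minimal XLIFF-style tree with one trans-unit per tm:tu, reusing the tu id."""
    xliff = ET.Element("xliff", version="1.2")
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "datatype": "xml",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, "body")
    for tu in doc.iter(f"{{{TM_NS}}}tu"):
        unit = ET.SubElement(body, "trans-unit", id=tu.get("id"))
        # Flatten the unit to plain text; inline markup is ignored in this sketch.
        ET.SubElement(unit, "source").text = "".join(tu.itertext()).strip()
    return xliff
```

Because each trans-unit carries the id of its tu, the translated text can later be put back in exactly the right place.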


xml:tm translated document

[Figure: the translated document shown in two views. Translated document view: doc, title, section, para, tekst. Translated tm namespace view: tm, te, tu, with each tu holding one translated sentence (zdanie) of the para text.]


xml:tm perfect alignment

Source Document    Translated Document
tu id="1"          tu id="1"
tu id="2"          tu id="2"
tu id="3"          tu id="3"
tu id="4"          tu id="4"
tu id="5"          tu id="5"
tu id="6"          tu id="6"

Exact alignment
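
Because source and translated documents share text unit ids, alignment reduces to a lookup by id. A minimal Python sketch, assuming both documents carry tm:tu elements with an id attribute; `units_by_id` and `align` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"


def units_by_id(doc):
    """Map each tm:tu id to its flattened text, in document order."""
    return {tu.get("id"): "".join(tu.itertext()).strip()
            for tu in doc.iter(f"{{{TM_NS}}}tu")}


def align(source_doc, target_doc):
    """Return (id, source_text, target_text) for every id present in both documents."""
    source = units_by_id(source_doc)
    target = units_by_id(target_doc)
    return [(uid, text, target[uid]) for uid, text in source.items() if uid in target]
```

No statistical or length-based sentence alignment is involved, which is the point of the "exact alignment" label.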


xml:tm perfect matching

Updated Source Document             Matched Target Document    Match
tu id="1"                           tu id="1"                  perfect matching
tu id="2" (deleted)
tu id="3"                           tu id="3"                  perfect matching
tu id="4"                           tu id="4"                  perfect matching
tu id="7" (modified)                tu id="7"                  requires translation
tu id="6"                           tu id="6"                  perfect matching
tu id="8" (new)                     tu id="8"                  requires translation


xml:tm leveraged DB memory

Source Document    Translated Document
tu id="1"          tu id="1"
tu id="2"          tu id="2"
tu id="3"          tu id="3"
tu id="4"          tu id="4"
tu id="5"          tu id="5"
tu id="6"          tu id="6"

Perfect alignment

[Figure labels: DB, TMX]
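
Aligned pairs can be written out as TMX for exchange with other tools. The Python sketch below builds a minimal TMX-style tree with `xml.etree.ElementTree`; the element and header attribute names follow the TMX 1.4 vocabulary, while `to_tmx` and every attribute value (tool name, version, `o-tmf`) are placeholders assumed for the example.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute as ElementTree expects it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs, source_lang, target_lang):
    """Build a minimal TMX-style tree from (source_text, target_text) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "xmltm-sketch",   # placeholder tool name
        "creationtoolversion": "0.1",     # placeholder version
        "segtype": "sentence",
        "o-tmf": "xml:tm",                # placeholder original format label
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx
```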


xml:tm in-document leveraged matching

Updated Source Document             Matched Target Document    Match
tu id="1"                           tu id="1"                  perfect matching
tu id="2" (deleted)
tu id="3"                           tu id="3"                  perfect matching
tu id="4"                           tu id="4"                  perfect matching
tu id="7" (modified)                tu id="7"                  requires translation
tu id="6"                           tu id="6"                  perfect matching
tu id="8" (new: same as id="3")     tu id="8"                  leveraged match, requires proofing


xml:tm in-document fuzzy matching

Updated Source Document             Matched Target Document    Match
tu id="1"                           tu id="1"                  perfect matching
tu id="2" (deleted)
tu id="3"                           tu id="3"                  perfect matching
tu id="4"                           tu id="4"                  perfect matching
tu id="7" (mod: origid="5")         tu id="7"                  fuzzy match, requires translation
tu id="6"                           tu id="6"                  perfect matching
tu id="8" (new: same)               tu id="8"                  leveraged match, requires proofing


xml:tm db leveraged matching

Updated Source Document             Matched Target Document    Match
tu id="1"                           tu id="1"                  perfect matching
tu id="2" (deleted)
tu id="3"                           tu id="3"                  perfect matching
tu id="4"                           tu id="4"                  perfect matching
tu id="7" (mod: origid="5")         tu id="7"                  fuzzy match, requires translation
tu id="6"                           tu id="6"                  perfect matching
tu id="8" (new: same)               tu id="8"                  doc leveraged match, requires proofing
tu id="9"                           tu id="9"                  DB leveraged match (from DB), requires proofing


xml:tm non-translatable text

Updated Source Document             Matched Target Document    Match
tu id="1"                           tu id="1"                  exact matching
tu id="2" (non trans)               tu id="2"                  non translatable, requires no translation
tu id="3"                           tu id="3"                  exact matching
tu id="4"                           tu id="4"                  exact matching
tu id="7"                           tu id="7"                  fuzzy match, requires translation
tu id="6"                           tu id="6"                  exact matching
tu id="8" (new: same)               tu id="8"                  doc leveraged match, requires proofing
tu id="9"                           tu id="9"                  DB leveraged match (from DB), requires proofing


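Traditional Translation Scenario

[Figure: workflow split between Publishing and Translation. Source text is extracted, the extracted text goes through the tm process to give prepared text, the prepared text is translated, and the translated text is merged to give the target text, followed by QA.]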


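xml:tm Translation Scenario

[Figure: workflow split between Publishing and Translator. The xml:tm source text is extracted and put through the tm process (perfect matching, leveraged matching) to produce an XLIFF file; the translator works on it through a web service/interface; the result is merged into the xml:tm target text, followed by QA. Two stages are labelled "Automatic Process".]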
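
The automatic merge at the end of this workflow can be sketched in Python. The sketch assumes an XLIFF-style tree without a namespace declaration, trans-unit ids equal to the tu ids, and text units that hold plain text only; `merge_targets` is a hypothetical helper, not part of xml:tm.

```python
import copy
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"


def merge_targets(source_doc, xliff):
    """Return a copy of source_doc with each tm:tu text replaced by the matching XLIFF target."""
    # id -> translated text; None where a trans-unit has no <target> yet.
    targets = {unit.get("id"): unit.findtext("target")
               for unit in xliff.iter("trans-unit")}
    translated = copy.deepcopy(source_doc)
    for tu in translated.iter(f"{{{TM_NS}}}tu"):
        target = targets.get(tu.get("id"))
        if target is not None:
            tu.text = target
    return translated
```

Units without a target are left in the source language, so an incomplete translation is easy to spot in the merged document.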


xml:tm benefits

• Open Standard donated by XML INTL to LISA

• Complements DITA

• Enterprise level scalability

• Totally integrated within the XML framework

• Source text is automatically extracted and matched

• Word counts are controlled by the customer

• Text can be presented for translation via the web

• Data is merged automatically at end of translation cycle

• All memory operations are totally automated

• Can be used transparently for relay translations

• More accurate – better matching


xml:tm

• Full specification:
  – http://www.xml-intl.com/docs/specification/xml-tm.html

• Maintained by xml-intl.com
  – http://www.xml-intl.com/dtd/tm.dtd
  – http://www.xml-intl.com/dtd/tm.xsd

• Detailed article on xml:tm in www.xml.com

• Donated by XML INTL to LISA OSCAR


Any questions?


XML INTL Contact Details

• Postal address: PO Box 2167, Gerrards Cross, Bucks SL9 8XF, United Kingdom

• Phone: +44 1753 480 467

• Fax: +44 1753 480 465

• Bob Willans - [email protected]

• Andrzej Zydroń – [email protected]

• Bartek Bogacki – [email protected]