What’s New in XLIFF 1.2? Tony Jewtushenko Director Research & Development Product Innovator Ltd. Co-Chair – OASIS XLIFF TC The XML Localisation Interchange File Format


What’s New in XLIFF 1.2?

Tony Jewtushenko

Director, Research & Development, Product Innovator Ltd.

Co-Chair, OASIS XLIFF TC

The XML Localisation Interchange File Format

Agenda

Overview of XLIFF: definition, goals, benefits, architecture and basic XLIFF concepts

What’s new in XLIFF 1.2: new and changed features of the XLIFF 1.2 normative specification

Non-Normative Representation Guides: a brief introduction to the representation guides provided with XLIFF 1.2

XLIFF Overview

A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format.

What is XLIFF?

A specification

for the lossless interchange of localizable data and its related information,

which is tool-neutral,

has been formalized as an XML vocabulary,

and features an extensibility mechanism.

Why XLIFF was created…

Localisation is difficult:

Insufficient interoperability between tools

Lack of support for the overall localisation workflow

Localisation tool developers have to deal with many formats

Large number of proprietary intermediate formats

XLIFF Timeline

Sep 2000: Data definition kickoff

Mar 2001: Draft 1.0 spec and DTD published

Jun 2001: Whitepaper published

Dec 2001: OASIS XLIFF TC proposal submitted

Apr 2002: XLIFF 1.0 Committee Spec approved

May 2003: XLIFF 1.1 Committee Spec approved

Aug 2003 - Sep 2003: XLIFF 1.1 public peer review

Nov 2003: Revised XLIFF 1.1 Committee Spec approved

Dec 2003 - May 2006: XLIFF 1.2 segmentation; Representation Guides for (X)HTML, Java, PO/POT

May 2006: XLIFF 1.2 Committee Spec and Representation Guides approved

14 Jul 2006 - 12 Sep 2006: XLIFF 1.2 public peer review

Contributors to XLIFF - Past and Present

Alchemy Software; Bowne Global Solutions; Convey Software; Ektron, Inc; ENLASO Corp (RWS); Globalsight; Heartsome; HP; Idiom Technologies, Inc; Lionbridge; LRC; Lotus/IBM; Microsoft; Moravia IT; Novell; Oracle; Red Hat; PASS Engineering; SAP; SDL International; Sun Microsystems; Tektronix; TRADOS; XML Intl

OASIS XLIFF TC Members as of 1 Sept 06

TC Officers:

Chairs: Tony Jewtushenko, Product Innovator Ltd; Bryan Schnabel, Tektronix

Secretary: Peter Reynolds, Idiom Technologies, Inc.

Current Members of TC:

• Mat Lovatt, Oracle
• Doug Domeny, Ektron
• Rodolfo Raya, Heartsome
• Eiju Akahane, IBM
• Steven Harris, Idiom Technologies, Inc.
• Fredrik Corneliusson, Lionbridge
• Joachim Schurig, Lionbridge
• Milan Karasek, Moravia IT
• Florian Sachse, Pass Engineering
• Christian Lieske, SAP
• Magnus Martikainen, SDL International
• David Pooley, SDL International
• Kevin Bargary, University of Limerick Localisation Research Centre
• Reinhard Schaler, University of Limerick Localisation Research Centre
• Andrzej Zydron, XML-Intl

OASIS: Standards Body Home of XLIFF

OASIS: Organization for the Advancement of Structured Information Standards

World’s largest independent, non-profit organization dedicated to the standardisation of XML applications and Web Services

More than 150 member companies plus individuals

Operates the XML.ORG Registry, the open community clearinghouse of XML application schemas

Technical work on XML interoperability includes XML conformance and XML Registries/Repositories

General XML technical resource

XLIFF Benefits:

Benefit areas: cost and time; automation; open standards; interoperability; flexibility; scalability

Reduces cost and turnaround time

Reduces effort in deploying integrated best-of-breed solutions

Reduces vendor lock-in, re-use

Reduces defects introduced by manual processing and handling

Leverages services, technologies, vendors

Easy to scale and future-proof

High Level XLIFF Architecture

An XLIFF document is a container for all data needed for a localisation project:

1. Localizable objects (e.g. text strings, graphics) in source and target languages.

2. Supplementary information (e.g. glossaries, or material to recreate the original format).

3. Administrative information (e.g. workflow data).

4. Custom data (e.g. initialization information for tools).

The XLIFF Document

An XLIFF document is designed to store the extracted data related to localisation.

Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF.

Each XLIFF document can include several <file> elements.

An entire localisation project could be stored in a single XLIFF document.
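To make the container idea concrete, here is a minimal Python sketch that builds one XLIFF 1.2 document holding two <file> elements, one per source container. The helper name build_xliff, the file names and the strings are hypothetical, and the sketch only emits the handful of elements and attributes it needs; it is not a complete or validated XLIFF writer.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Serialise XLIFF elements without a prefix (default namespace).
ET.register_namespace("", XLIFF_NS)


def q(name):
    """Qualify a local element name with the XLIFF namespace."""
    return f"{{{XLIFF_NS}}}{name}"


def build_xliff(sources, source_lang="en", target_lang="fr"):
    """Build an <xliff> element with one <file> per source container.

    `sources` maps an original file name to a list of (id, text) pairs.
    The names and values used here are illustrative assumptions.
    """
    root = ET.Element(q("xliff"), {"version": "1.2"})
    for original, units in sources.items():
        file_el = ET.SubElement(root, q("file"), {
            "original": original,
            "source-language": source_lang,
            "target-language": target_lang,
            "datatype": "plaintext",
        })
        body = ET.SubElement(file_el, q("body"))
        for unit_id, text in units:
            unit = ET.SubElement(body, q("trans-unit"), {"id": unit_id})
            ET.SubElement(unit, q("source")).text = text
    return root


doc = build_xliff({
    "menu.txt": [("1", "Open"), ("2", "Close")],
    "help.txt": [("1", "Press F1 for help")],
})
xliff_text = ET.tostring(doc, encoding="unicode")
print(xliff_text)
```

Each <file> carries its own source and target language, which is why the ids only need to be unique within a <file> in this sketch.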

Bilingual Model

Each <file> element is designed to store one source language and one target language

The rationale is that translation into different target languages is usually done by different people.

However, the languages in an <alt-trans> element can be different. For example, proposed matches in national Portuguese when translating into Brazilian Portuguese.

Localisable Objects

Besides localisable text, XLIFF can also contain other localisable object types such as binary graphics

Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text)

Relationships between objects can be captured (e.g. a hierarchical menu or text related to a web graphic)

Supplementary Info

XLIFF provides “hooks” for storing supplementary information in reference elements:

Glossaries

Translation memories

Segmentation rules (via an SRX file)

The supplementary information can be referenced (i.e. reside outside of the document), or embedded within the document

Administrative Info

XLIFF provides mechanisms for capturing administrative information:

For relating source material to XLIFF documents.

For storing workflow data.

For providing pre-translation entries.

For keeping track of changes.

Administrative Info – Pre-Translation

A set of proposed translations can be included for each <trans-unit> element, using the <alt-trans> element.

<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>
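A tool consuming pre-translations might collect the proposals roughly as follows. This is a sketch only: the function name list_proposals is hypothetical, the fragment is parsed without the XLIFF namespace (as it appears on the slide, whereas elements in a complete document are namespace-qualified), and match-quality is read as the attribute that carries the match rating.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>
"""


def list_proposals(trans_unit):
    """Return one dict per <alt-trans> that contains a <target>."""
    proposals = []
    for alt in trans_unit.findall("alt-trans"):
        target = alt.find("target")
        if target is None:
            continue  # nothing to propose
        proposals.append({
            "origin": alt.get("origin"),
            "match_quality": alt.get("match-quality"),
            "lang": target.get(XML_LANG),
            "text": "".join(target.itertext()),
        })
    return proposals


unit = ET.fromstring(SAMPLE)
proposals = list_proposals(unit)
for proposal in proposals:
    print(proposal)
```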

Customising XLIFF

Customise XLIFF by extending (adding) user-defined:

Elements

Attributes

Attribute values

Extending Elements

Extension points in the following elements: <alt-trans>, <bin-unit>, <group>, <header>, <tool>, <trans-unit>, and new in 1.2: <xliff> and <seg-source>.

The content of each custom element can be any valid XML content: empty content, PCDATA, mixed content, and so forth

Custom elements are defined in a private namespace schema

Example of Extending Elements

<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
    <group>
      <sup:SourceInfo>
        <sup:Book>Piers Plowman, Passus 1</sup:Book>
        <sup:Author>William Langland</sup:Author>
      </sup:SourceInfo>
      <sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
      <trans-unit id='1'>
        <source xml:lang='enm'>What this mountaigne bymeneth</source>
        <target xml:lang='en'>What this mountain means</target>
        <sup:Reference Type='strophe'>1-a</sup:Reference>
      </trans-unit>
    </group>
  </file>
</xliff>

The non-XLIFF elements (shown in bold on the original slide) are the ones carrying the sup: prefix.
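Since the bold highlighting does not survive in plain text, one way to see which elements are extensions is to list everything outside the XLIFF namespace. The sketch below does that with Python's standard library; the helper names split_tag and extension_elements are hypothetical, and the embedded document is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

DOC = """<xliff version='1.2'
  xmlns='urn:oasis:names:tc:xliff:document:1.2'
  xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
    <group>
      <sup:SourceInfo>
        <sup:Book>Piers Plowman, Passus 1</sup:Book>
        <sup:Author>William Langland</sup:Author>
      </sup:SourceInfo>
      <sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
      <trans-unit id='1'>
        <source xml:lang='enm'>What this mountaigne bymeneth</source>
        <target xml:lang='en'>What this mountain means</target>
        <sup:Reference Type='strophe'>1-a</sup:Reference>
      </trans-unit>
    </group>
  </file>
</xliff>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def extension_elements(root):
    """Return (namespace, local name) for each element outside XLIFF."""
    found = []
    for element in root.iter():  # document order
        namespace, local = split_tag(element.tag)
        if namespace != XLIFF_NS:
            found.append((namespace, local))
    return found


root = ET.fromstring(DOC)
extensions = extension_elements(root)
for namespace, local in extensions:
    print(local, "from", namespace)
```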

Non-XLIFF elements Defined in XSD:

<xsd:schema targetNamespace="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sup="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <xsd:element name="SourceInfo">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded">
        <xsd:element name="Book" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="WorkInfo">
    <xsd:complexType>
      <xsd:attribute name="Task" type="xsd:string"/>
      <xsd:attribute name="Context" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Reference">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <xsd:attribute name="Type" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Extending Attributes

Attributes of a namespace different than XLIFF can be included in these XLIFF elements: <alt-trans>, <bin-source>, <bin-target>, <bin-unit>, <bpt>, <bx/>, <ept>, <ex/>, <file>, <g>, <group>, <it>, <mrk>, <ph>, <source>, <target>, <tool>, <trans-unit>, <x/>, and new in 1.2: <xliff>, <seg-source>.

There is no specific location where the non-XLIFF attributes must be inserted

There is no limit to the number of non-XLIFF attributes that can be used in an XLIFF document

Extending Attributes

Attributes from HTML extend <group> and <trans-unit>:

<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:htm='http://www.w3.org/1999/xhtml'>
  <file original='table.htm' source-language='en' datatype='html'>
    <group restype='table' htm:border='1' htm:cellpadding='5' htm:cellspacing='0' htm:width='100%'>
      <group restype='row'>
        <trans-unit id='1' htm:valign='top' htm:width='30%'>
          <source>Text of row 1 column 1</source>
        </trans-unit>
        <trans-unit id='2' htm:valign='top' htm:width='30%'>
          <source>Text of row 1 column 2</source>
        </trans-unit>
      </group>
      <group restype='row'>
        <trans-unit id='3' htm:valign='top' htm:width='30%'>
          <source>Text of row 2 column 1</source>
        </trans-unit>
        <trans-unit id='4' htm:valign='top' htm:width='30%'>
          <source>Text of row 2 column 2</source>
        </trans-unit>
      </group>
    </group>
  </file>
</xliff>

Extending Attribute Values

Attributes where the list of values can be extended are the following: context-type, count-type, ctype, datatype, mtype, priority, purpose, restype, size-unit, state, state-qualifier, unit; new in 1.2: alttranstype, reformat

User-defined values must start with an “x-” prefix

There is no specified mechanism to validate individual user-defined values, beyond starting with “x-”

Example of Extending Attribute Values

The following excerpt shows how the user-defined value “x-for-engineers” can be utilized in a document:

...
<group>
  <context-group name='EngineersData'>
    <context context-type='x-for-engineers'>Data...</context>
  </context-group>
</group>
...
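Because the only rule for user-defined values is the “x-” prefix, a tool can classify an attribute value with a very small check. The sketch below is an illustration under assumptions: the function name classify_value is hypothetical, and PREDEFINED_CONTEXT_TYPES is a deliberately partial list of predefined context-type values, not the full set from the specification.

```python
# Partial, illustrative subset of predefined context-type values;
# consult the specification for the complete list.
PREDEFINED_CONTEXT_TYPES = {"sourcefile", "linenumber", "record"}


def classify_value(value, predefined):
    """Classify an extensible attribute value.

    Returns "predefined" for a known value, "user-defined" for a value
    that starts with "x-" followed by at least one character, and
    "invalid" for anything else.
    """
    if value in predefined:
        return "predefined"
    if value.startswith("x-") and len(value) > 2:
        return "user-defined"
    return "invalid"


for candidate in ("sourcefile", "x-for-engineers", "for-engineers"):
    print(candidate, "->", classify_value(candidate, PREDEFINED_CONTEXT_TYPES))
```

Nothing beyond the prefix can be checked generically, so two tools must still agree out of band on what a value such as x-for-engineers means.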

Embedding XLIFF

An entire XLIFF document, or part of one, can be embedded in another XML document

This is valid where the host XML vocabulary is defined by an XML Schema (XSD) that includes an <any> element in the definition of the element where the XLIFF data is to be inserted

What’s new in XLIFF 1.2

New and changed features of XLIFF 1.2 normative specification

New, Deprecated or Changed from 1.1 to 1.2

Validation via Transitional and Strict models

Segmentation support added

Added mid as an optional attribute for the <alt-trans> element

Changed the name attribute for <context-group> from required to optional, and modified its description

Added an extension point at <xliff>

Tracking/accepting suggested translations added:

  Added an alttranstype attribute for the <alt-trans> element.
  Deprecated the use of multiple <target> elements in a single <alt-trans>.
  Deprecated the restype attribute for the <target> element.
  Introduced the phase-name attribute for the <alt-trans> element.
  Introduced a convention: more recent <alt-trans> elements should appear before older ones.

Validation in 1.2

Validation via two “flavours” of XSD (schema):

Transitional: deprecated (obsolete) elements and attributes are permitted. Use it to validate when reading documents from older versions (XLIFF 1.1).

xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd'

Strict: deprecated items are not permitted. Use it to validate when creating XLIFF 1.2 documents.

xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd'
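A reader can tell which flavour a document claims by inspecting its xsi:schemaLocation attribute, which holds whitespace-separated pairs of namespace and schema location. The Python sketch below only reads that declaration; it does not validate anything (that needs a schema-aware validator, which the standard library does not provide). The function names are hypothetical, and the schema file name assumes the naming used for the published OASIS schema files.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XSI_SCHEMA_LOCATION = (
    "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"
)

DOC = """<xliff version='1.2'
  xmlns='urn:oasis:names:tc:xliff:document:1.2'
  xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
  xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd'>
  <file original='hello.txt' source-language='en' datatype='plaintext'>
    <body/>
  </file>
</xliff>"""


def declared_schema(root, namespace=XLIFF_NS):
    """Return the schema location declared for `namespace`, or None."""
    tokens = root.get(XSI_SCHEMA_LOCATION, "").split()
    # schemaLocation is a flat list: namespace, location, namespace, ...
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    return locations.get(namespace)


def flavour(location):
    """Guess the schema flavour from the schema file name (heuristic)."""
    if location is None:
        return "undeclared"
    if "strict" in location:
        return "strict"
    if "transitional" in location:
        return "transitional"
    return "unknown"


root = ET.fromstring(DOC)
location = declared_schema(root)
print(location, "->", flavour(location))
```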

XLIFF 1.2 Segmentation: seg-source

How corresponding segments are referenced between <seg-source> and <target>

<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>
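The mid attribute is what ties a source segment to its translation. A minimal Python sketch of that lookup follows; the function name pair_segments is hypothetical, and the fragment is parsed without the XLIFF namespace for brevity.

```python
import xml.etree.ElementTree as ET

UNIT = """<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>"""


def segments(parent):
    """Map mid -> text for the segment markers directly under `parent`."""
    if parent is None:
        return {}
    return {
        mrk.get("mid"): "".join(mrk.itertext())
        for mrk in parent.findall("mrk")
        if mrk.get("mtype") == "seg"
    }


def pair_segments(trans_unit):
    """Return (mid, source text, target text or None) in source order."""
    source = segments(trans_unit.find("seg-source"))
    target = segments(trans_unit.find("target"))
    return [(mid, text, target.get(mid)) for mid, text in source.items()]


pairs = pair_segments(ET.fromstring(UNIT))
for mid, source_text, target_text in pairs:
    print(mid, source_text, "=>", target_text)
```

A segment that has no translated counterpart yet comes back with None as its target text.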

XLIFF 1.2 Segmentation: seg-source

Alt-trans may also be segmented:

<trans-unit id="3">
  <source>First sentence. Second sentence.</source>
  <alt-trans match-quality="100%">
    <source>The second sentence.</source>
    <seg-source>
      <mrk mtype="seg" mid="1">First sentence.</mrk>
      <mrk mtype="seg" mid="2">Second sentence.</mrk>
    </seg-source>
    <target>
      <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
      <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
    </target>
  </alt-trans>
</trans-unit>

XLIFF 1.2 Segmentation: merged-trans

Aggregating translations across multiple trans-units:

<group merged-trans="yes">
  <trans-unit id="t1">
    <source>The German acronym v.</source>
    <target equiv-trans="no">Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
  </trans-unit>
  <trans-unit id="t2">
    <source>OT signifies the top dead center position for an engine.</source>
    <target equiv-trans="no"/>
  </trans-unit>
</group>

XLIFF 1.2 Segmentation: equiv-trans

To denote when a translation is not a direct equivalent of the source:

<trans-unit id="t1">
  <source>Constrained text for limited</source>
  <target equiv-trans="no">Tekst angielski dla</target>
</trans-unit>
<trans-unit id="t2">
  <source>display for English</source>
  <target equiv-trans="no">ograniczonego pola</target>
</trans-unit>

XLIFF 1.2: Add an alttranstype attribute for the <alt-trans> element

The alttranstype attribute is optional and has the following values and meanings:

Value               Meaning
proposal (default)  The <alt-trans> represents a translation proposal from a translation memory or other resource.
previous-version    The <alt-trans> represents a previous version of the <target> element.
rejected            The <alt-trans> represents a rejected version of the <target> element.
reference           The <alt-trans> represents a translation to be used for reference purposes only, for example from a related product or a different language.
accepted            The <alt-trans> represents a proposed translation that was used for the translation of the trans-unit, possibly modified.
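Since the attribute is optional and defaults to proposal, a consumer has to apply that default itself when sorting <alt-trans> elements. The following Python sketch groups target texts by their effective type; the function name group_by_type and the sample translations are hypothetical, and the fragment omits the XLIFF namespace for brevity.

```python
import xml.etree.ElementTree as ET

UNIT = """<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <target xml:lang='fr'>Le texte</target>
  <alt-trans origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
  <alt-trans alttranstype='previous-version'>
    <target xml:lang='fr'>Texte</target>
  </alt-trans>
  <alt-trans alttranstype='rejected'>
    <target xml:lang='fr'>Le text</target>
  </alt-trans>
</trans-unit>"""


def group_by_type(trans_unit):
    """Group <alt-trans> target texts by their effective alttranstype."""
    groups = {}
    for alt in trans_unit.findall("alt-trans"):
        # A missing attribute means the default value, "proposal".
        kind = alt.get("alttranstype", "proposal")
        groups.setdefault(kind, []).append(alt.findtext("target", default=""))
    return groups


groups = group_by_type(ET.fromstring(UNIT))
for kind, texts in groups.items():
    print(kind, texts)
```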

XLIFF 1.2: Additional revisions to alt-trans

Introduce the phase-name attribute for <alt-trans>: makes it possible to find out who made the change, when, and in which process the change was introduced

Deprecate the restype attribute for the <target> element: no longer needed, as the <target> is always of the same restype as the <trans-unit> or <alt-trans> it appears in

Convention: more recent <alt-trans> elements should appear before older ones, which makes it possible to determine the order of changes if multiple previous versions have been introduced

Non-Normative Representation Guides

A brief walk-through of the Representation Guides provided with XLIFF 1.2

Purpose of the Guides

Synonymous with “profile” specifications

Non-normative: not a requirement for “legal” XLIFF 1.2

Guidance for consistently representing native formats as XLIFF across implementations

Kickstart new implementations

Better interoperability between tools

Guide Contents

Recommended Extraction Techniques and Considerations

Recommended mappings from native structures to XLIFF

Strategies for implementing Translation Memory support (using inline tags)

Detailed examples and supplementary sample files

Extract-Localize-Merge Minimalist Approach

Process:

1. Identify localisable content (resources) and non-localisable content (code)

2. Populate the XLIFF document’s trans-unit and bin-unit elements with localisable content

3. Create a “skeleton file” with localisable content stripped out and replaced with tokens that map to XLIFF trans-unit or bin-unit IDs

4. Translate the XLIFF document

5. Merge the translated data in XLIFF with the skeleton to generate the localised material

The skeleton file is optional and not recommended in certain circumstances (e.g., HTML, or if tool interoperability is required)

In <skl>, embed the entire skeleton file within the XLIFF file or specify the file’s location

XLIFF doesn’t define the skeleton file or token format

Convert/Transform Paradigm (maximalist approach)

Process:

1. Convert original material by mapping the entire original document to XLIFF (using representation guides)

2. Structural information (code) is stored in the XLIFF container as non-translatable trans-units / bin-units

3. Translate XLIFF content

4. Generate the native translated material directly from the XLIFF content

Best suited for textual resource formats (RCDATA, Java, PO/POT) and mark-up languages like (X)HTML and XML

Difficult and impractical for binary resource formats (e.g., EXEs and DLLs)

[Diagram: Original Material, Filter, XLIFF, Translated Material]

Minimalist Example –Source Content & Skeleton

A very simple HTML file (original content):

<html>
  <head>
    <title>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>

Skeleton, with the localisable content replaced by tokens:

<html>
  <head>
    <title>%%%1%%%</title>
  </head>
  <body>
    <p>%%%2%%%</p>
  </body>
</html>

XLIFF produced by the filter, referencing the skeleton:

…
<header>
  <skl>
    <external-file href='sample.skl'/>
  </skl>
</header>
<body>
  <trans-unit id='%%%1%%%'>
    <source xml:lang='en'>Almost the Smallest HTML File</source>
  </trans-unit>
  <trans-unit id='%%%2%%%' restype='x-html-p'>
    <source xml:lang='en'>Just some stuff here to fill up space</source>
  </trans-unit>
</body>
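The merge step of the minimalist approach can be sketched in a few lines of Python. This is an illustration under assumptions rather than a prescribed algorithm: XLIFF does not define the token format, so the %%%n%%% pattern is simply the one used in this example; the function name merge is hypothetical; the French target text is a placeholder; and units without a <target> fall back to their source text.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

SKELETON = (
    "<html><head><title>%%%1%%%</title></head>"
    "<body><p>%%%2%%%</p></body></html>"
)

XLIFF_BODY = """<body>
  <trans-unit id='%%%1%%%'>
    <source xml:lang='en'>Almost the Smallest HTML File</source>
    <target xml:lang='fr'>Presque le plus petit fichier HTML</target>
  </trans-unit>
  <trans-unit id='%%%2%%%' restype='x-html-p'>
    <source xml:lang='en'>Just some stuff here to fill up space</source>
  </trans-unit>
</body>"""

# Token pattern assumed for this example only.
TOKEN = re.compile(r"%%%\d+%%%")


def merge(skeleton, body):
    """Replace each skeleton token with the matching unit's text."""
    texts = {}
    for unit in body.iter("trans-unit"):
        target = unit.find("target")
        chosen = target if target is not None else unit.find("source")
        texts[unit.get("id")] = "".join(chosen.itertext())

    def replace(match):
        token = match.group(0)
        # Unknown tokens are left in place; text is escaped for HTML.
        return escape(texts.get(token, token), quote=False)

    return TOKEN.sub(replace, skeleton)


merged = merge(SKELETON, ET.fromstring(XLIFF_BODY))
print(merged)
```

Because the token doubles as the trans-unit id here, the merge needs no separate mapping table; a different tool could just as well keep the mapping inside the skeleton.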

Maximalist Example – Transform content to XLIFF

Full transformation of the original content:

<html>
  <head>
    <title class='title'>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>

XLIFF:

…
<body>
  <group restype='x-html-html'>
    <group restype='x-html-head'>
      <trans-unit id='1' restype='x-html-p-title' html:class='title'>
        <source xml:lang='en'>Almost the Smallest HTML File</source>
      </trans-unit>
    </group>
    <group restype='x-html-body'>
      <trans-unit id='2' restype='x-html-p'>
        <source xml:lang='en'>Just some stuff here to fill up space</source>
      </trans-unit>
    </group>
  </group>
</body>

Guides provided with XLIFF 1.2

(X)HTML: many flavours of HTML; the guide focuses on HTML 4.01 and XHTML 1.0

Java Resource Bundles: support for the two subclasses of the java.util.ResourceBundle abstract class, PropertyResourceBundle and ListResourceBundle

Gettext PO/POT files: Linux resource format

To Get the Most from the Guides

Review the document in full before commencing design or development of an XLIFF solution:
  Considerations for recommended source document structure and content
  Identify exceptions (e.g., dynamically generated HTML via server-side processing)

Consider the guide’s recommended extraction approach when designing the overall architecture:
  HTML recommends “maximalist”, but provides examples for “minimalist” as well.
  Both PO/POT and Java make no specific recommendation, but the examples are “maximalist”.
  Order of extraction recommendations: typically in the order of the data in the source document

Refer to the Mappings Reference in each guide when designing and building filters:
  Recommendations are comprehensive, with many examples
  Non-standard structures and conventions are dealt with (especially for (X)HTML)

Use the sample files:
  Valuable reference for learning
  Provide validation during the development effort
  Verify compliance by feeding sample files into the filter, either native source or XLIFF

More Representation Guides

Late draft of Windows 32 / .NET guide: not approved, but posted on the XLIFF website; requires more expert input

More to follow upon request

More Information

The XLIFF TC Web Site: http://www.xliff.org

Presenter: XLIFF TC Co-Chair Tony Jewtushenko (Product Innovator Ltd) ([email protected])

Thank You...

Questions?

Product Innovator Ltd

provides product management and software process improvement training and mentoring services to technology companies seeking to maximize their productivity and revenue potential

Contact: [email protected], +353 1 8875183 / +353.87.2479057