MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information
DESCRIPTION
LREC 2010, 19 May 2010. MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information. ISO TC37 SC4 WG3 24616. Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary, Nasredine Semmar. Outline: Scope of MLIF, Purpose and Justification of MLIF, Description of MLIF, Use Cases.
TRANSCRIPT
MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information
ISO TC37 SC4 WG3 24616
Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary, Nasredine Semmar
LREC 2010, 19 May 2010
Outline
• Scope of MLIF
• Purpose and Justification of MLIF
• Description of MLIF
• Use Cases
• Current Status
Scope of MLIF
Scope
• MLIF aims to provide a specification platform for representing multilingual data in a wide variety of applications, such as translation memories, localization, computer-aided translation, multimedia, and electronic document management.
• MLIF introduces a metamodel, combined with selected data categories, that allows any specific domain to be described.
• MLIF provides a way to validate any instance of this metamodel, as well as interoperability principles with numerous translation and localization standards.
Purpose and Justification
• The evolution of information and communication technologies, and of natural language processing in particular, makes the question of standardization acute.
• The issues related to standardization are industrial, economic, and cultural in nature.
• Controlling interoperability between the existing industrial standards for localization (XLIFF), translation memory (TMX), … is a major objective for coherent, global management of multilingual data.
• MLIF could be associated with multimedia standards such as MPEG-4 [ISO/IEC 14496], MPEG-7 [ISO/IEC 15938], and W3C SMIL in order to handle multilingual data within multimedia applications such as interactive TV, video conferencing, and subtitling.
• All these formats work well in the specific field they were designed for, but they lack the synergy that would make them interoperable when one type of information is used in a slightly different context.
MLIF should be considered as a unified conceptual representation of multilingual and multimedia content
Description of MLIF
• As with the “Terminological Markup Framework” (TMF) [ISO 16642] in terminology, MLIF introduces a metamodel in combination with chosen data categories [ISO 12620].
• These data categories will be derived as a subset of a Data Category Registry (DCR) in order to ensure interoperability between multilingual applications and corpora.
• A Data Category Specification (DCS) will define, in combination with the metamodel, the various constraints that apply to a given domain-specific information structure or interchange format.
• MLIF describes elementary linguistic segments (e.g. sentence, syntactic component, word, …).
MLIF Metamodel
• MLDC (Multilingual Data Collection)
• GI (Global Information)
• HistoC (History Component)
• GroupC (Grouping Component)
• MultiC (Multilingual Component)
• MonoC (Monolingual Component)
• SegC (Segmentation Component)
MLIF Metamodel
Multilingual Data Collection (MLDC)
Represents a collection of data containing global information and several multilingual units.
Global Information (GI)
Represents technical and administrative information applying to the entire data collection. Example: title of the data collection, revision history, …
MLIF Metamodel
History Component (HistoC)
This generic component makes it possible to trace modifications to the component it is anchored to (i.e. versioning).
Grouping Component (GroupC)
Represents a sub-collection of multilingual data having a common origin or purpose within a given project.
MLIF Metamodel
Multilingual Component (MultiC)
This component represents a unique multilingual entry.
Monolingual Component (MonoC)
Part of a multilingual component containing information related to one language.
Segmentation Component (SegC)
A recursive component allowing any level of segmentation for textual information.
To provide a richer description of the linguistic content, the MLIF metamodel allows other metamodels to be anchored to it, such as MAF (morphological description), SynAF (syntactic annotation), TMF (terminological description), or any other metamodel based on ISO 12620.
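The containment hierarchy described on these slides can be sketched as plain data structures. This is only an illustrative sketch, not the normative model: the class names reuse the component abbreviations from the slides, but every field name (`text`, `children`, `lang`, `segments`, and so on) and the helper `depth` are assumptions made for the example, and HistoC is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SegC:
    """Segmentation component: recursive, so any level of segmentation is possible."""
    text: str = ""
    children: list[SegC] = field(default_factory=list)  # hypothetical field name


@dataclass
class MonoC:
    """Monolingual component: the part of an entry related to one language."""
    lang: str
    segments: list[SegC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual component: one multilingual entry."""
    monolinguals: list[MonoC] = field(default_factory=list)


@dataclass
class GroupC:
    """Grouping component: a sub-collection with a common origin or purpose."""
    entries: list[MultiC] = field(default_factory=list)


@dataclass
class GI:
    """Global information applying to the entire collection."""
    title: str = ""


@dataclass
class MLDC:
    """Multilingual data collection: global information plus multilingual units."""
    info: GI
    groups: list[GroupC] = field(default_factory=list)


def depth(seg: SegC) -> int:
    """Nesting depth of a segmentation component (1 for a leaf)."""
    return 1 + max((depth(child) for child in seg.children), default=0)


# A sentence segmented into two words: one SegC containing two leaf SegC items.
sentence = SegC(children=[SegC("Thank"), SegC("you")])
collection = MLDC(GI(title="demo"), [GroupC([MultiC([MonoC("en", [sentence])])])])
print(depth(sentence))  # prints 2
```

The recursion in `SegC` is the point of the sketch: the same type models a sentence, a syntactic component, or a word, which is how the metamodel supports any level of segmentation.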
MLIF Metamodel
Data Categories
• Domain
• Project
• Source
• sourceType
• sourceLanguage
• class
• duration
• begin
• next
• xml:id
• xml:lang
• xlink…
![Page 14: MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information](https://reader036.vdocuments.us/reader036/viewer/2022062816/56814a78550346895db78ea7/html5/thumbnails/14.jpg)
MLIF: a simple example
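As a rough illustration of how the metamodel components and data categories might combine in an XML instance, the following sketch builds a small document with Python's standard `xml.etree.ElementTree`. It is a hypothetical instance, not the example from the slide: the element names simply mirror the component abbreviations, and the `title` child, the identifier `entry1`, and the sample strings are assumptions made for illustration. Only `xml:id` and `xml:lang` are taken from the data category list.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:id and xml:lang


def build_example() -> ET.Element:
    """Build a tiny MLIF-like tree (element names are assumed, not normative)."""
    mldc = ET.Element("MLDC")
    gi = ET.SubElement(mldc, "GI")
    ET.SubElement(gi, "title").text = "Sample collection"  # hypothetical child element
    group = ET.SubElement(mldc, "GroupC")
    # One multilingual entry, identified with the xml:id data category.
    multi = ET.SubElement(group, "MultiC", {f"{{{XML_NS}}}id": "entry1"})
    for lang, text in (("en", "Thank you"), ("fr", "Merci")):
        # One monolingual component per language, tagged with xml:lang.
        mono = ET.SubElement(multi, "MonoC", {f"{{{XML_NS}}}lang": lang})
        ET.SubElement(mono, "SegC").text = text
    return mldc


root = build_example()
print(ET.tostring(root, encoding="unicode"))
```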
Use Cases
• Interoperability
• Linguistic Properties
• Related Standards
• Multimedia
• Interactive TV
Interoperability
TMX
“the sentence contains different formatting information”
![Page 18: MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information](https://reader036.vdocuments.us/reader036/viewer/2022062816/56814a78550346895db78ea7/html5/thumbnails/18.jpg)
Interoperability
TMX file
MLIF file
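One way to picture the TMX-to-MLIF correspondence is a small converter. The sketch below is a minimal illustration under stated assumptions: it maps each TMX `<tu>` to a MultiC, each `<tuv>` to a MonoC, and each `<seg>` to a SegC, which is a plausible reading of the metamodel rather than the mapping defined by the standard. The MLIF element names, the function name `tmx_to_mlif`, and the trimmed TMX sample (its header omits attributes a real TMX file would carry) are all hypothetical, and inline formatting elements are not mapped.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed TMX sample for illustration only.
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_mlif(tmx_text: str) -> ET.Element:
    """Convert TMX translation units into an MLIF-like tree (assumed mapping)."""
    tmx = ET.fromstring(tmx_text)
    mldc = ET.Element("MLDC")
    group = ET.SubElement(mldc, "GroupC")
    for tu in tmx.iter("tu"):
        multi = ET.SubElement(group, "MultiC")  # one entry per translation unit
        for tuv in tu.findall("tuv"):
            mono = ET.SubElement(multi, "MonoC", {XML_LANG: tuv.get(XML_LANG, "")})
            seg = tuv.find("seg")
            # Concatenate all text inside <seg>; inline formatting elements
            # are not mapped in this sketch.
            text = "".join(seg.itertext()) if seg is not None else ""
            ET.SubElement(mono, "SegC").text = text
    return mldc


mlif = tmx_to_mlif(TMX_SAMPLE)
print(ET.tostring(mlif, encoding="unicode"))
```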
Linguistic Properties
Sentences, words, …
Time related issues
![Page 20: MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information](https://reader036.vdocuments.us/reader036/viewer/2022062816/56814a78550346895db78ea7/html5/thumbnails/20.jpg)
Linguistic Properties
TMX file produced by TRADOS
MLIF file produced by CEA LIST Sentence Aligner
Related Standards
TEI (Text Encoding Initiative)
All XML elements have been described using RELAX NG [ISO 19757-2] with the help of ODD.
W3C ITS (Internationalization Tag Set)
ITS is a set of rules, expressed in elements, that provide information on how parts of a given DTD or XML Schema relate to specific internationalization and localization properties.
W3C SMIL
SMILtext
MLIF may be used to include pre-existing non-MLIF data, such as data produced by NLP tools.
![Page 22: MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information](https://reader036.vdocuments.us/reader036/viewer/2022062816/56814a78550346895db78ea7/html5/thumbnails/22.jpg)
Multimedia
Interactive TV
Timed, Multilingual Textual Descriptions
W3C SMIL Standardization
- Development of Interactive TV Profile
- Integration of Annotation Support
- Definition of Temporal Text Processing
ISO MLIF Standardization
- Development of MLIF format
- Development of a multilingual processing pipeline
- Interaction with SMIL and MPEG standards
multilingual DB
multilingual component
Monolingual component
linguistic segment: l’histoire du courage d’une femme pour démasquer un mystère
Monolingual component
linguistic segment: la historia de la valentía de una mujer para desenmascarar un misterio
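The idea of timed, multilingual textual descriptions can be sketched as a lookup over segments that carry a language and a time span. This is a hypothetical illustration: `begin` and `duration` borrow their names from the data category list, but the class `TimedSegment`, the function `active_text`, the unit (seconds), and the timing values are assumptions, not part of MLIF or SMIL.

```python
from dataclasses import dataclass


@dataclass
class TimedSegment:
    lang: str        # language of the segment (value of xml:lang)
    begin: float     # start time in seconds (assumed unit)
    duration: float  # display duration in seconds (assumed unit)
    text: str


def active_text(segments, lang, t):
    """Return the text shown at time t for the given language, or None."""
    for seg in segments:
        if seg.lang == lang and seg.begin <= t < seg.begin + seg.duration:
            return seg.text
    return None


# Timing values are invented for the example.
segments = [
    TimedSegment("fr", 0.0, 4.0,
                 "l’histoire du courage d’une femme pour démasquer un mystère"),
    TimedSegment("es", 0.0, 4.0,
                 "la historia de la valentía de una mujer para desenmascarar un misterio"),
]
print(active_text(segments, "es", 1.5))
```

A player could call `active_text` with the viewer's preferred language at each time step, which is roughly the role a multilingual component would play behind an interactive TV front end.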
Current Status
• AWI (August 2006)
• CD (May 2009)
• DIS (February 2010)
Ongoing ballot process
Thank you!
Thank you for your attention
Any questions?
Mailing list: [email protected]
Web site: http://mlif.loria.fr