using olif, the open lexicon interchange format susan mccormick olif2 consortium october 1, 2004
TRANSCRIPT
Using OLIF, the Open Lexicon Interchange Format
Susan McCormick
OLIF2 Consortium
October 1, 2004
The OLIF Format
The Open Lexicon Interchange Format
  XML-compliant standard
  Supports exchange of lexical and terminological data for language technology applications
  Handles basic exchange as well as more complex applications such as MT lexicons
The OLIF2 Consortium
OLIF v.2 was developed by the OLIF2 Consortium, a group of language technology companies and organizations interested in issues of MT data/term data exchange
  Led by SAP
  Members include Xerox, Microsoft, Trados, IBM, Systran, IAI, DFKI and Comprendium
Developing OLIF v.2
Based on OLIF prototype
  Developed in EC-funded OTELO project, proposing standards for users of disparate language tools
  Original purpose of OLIF was to facilitate terminology exchange for industrial users of MT
Developing OLIF v.2
Version 2 adapted from OLIF prototype using input from
  Developers/users of 3+ MT systems
  Developers/users of terminology management systems
  Other language standards projects: EAGLES, SALT, ISLE, MARTIF, TBX
OLIF Version 2
Released as open standard in 2002
XML-compliant
Covers 6 European languages
  English, German, French, Spanish, Danish, Portuguese
Includes options for modeling administrative, morphological, syntactic and semantic data
Available to Users
XML implementation of OLIF specification in a DTD
Available from OLIF2 Consortium web site:
  www.olif.net
The OLIF File
Follows Terminology Markup Framework (TMF) structure:
  Header
  Body
  Shared resources
The OLIF Entry
Collection of monolingual data on a specified sense of a word or phrase
Optional links for cross-reference and transfer
  Transfer is bilingual and unidirectional
  Multiple transfers in multiple languages possible for single word sense
Key Data Categories
The OLIF entry is uniquely identified by 5 key data categories:
  Canonical form
  Language
  Part of speech
  Subject field
  Semantic reading
Basic Well-Formed OLIF Entry
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>[gencomp-opt]</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
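Because the five key data categories identify an entry, a consumer of OLIF data can treat them as a composite key. The following is a minimal Python sketch of that idea, not part of OLIF or its API: it reads the sample entry with the standard library XML parser and returns the five values as a tuple. The helper name `entry_key` is hypothetical; the element names are the ones shown in the sample entry.

```python
import xml.etree.ElementTree as ET

# The five key data categories, using the element names from the sample entry.
KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

SAMPLE = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
"""


def entry_key(entry):
    """Return the 5-tuple identifying an entry (hypothetical helper)."""
    key_dc = entry.find("mono/keyDC")
    if key_dc is None:
        raise ValueError("entry has no mono/keyDC element")
    values = []
    for name in KEY_FIELDS:
        text = key_dc.findtext(name)
        if text is None or not text.strip():
            raise ValueError(f"missing key data category: {name}")
        values.append(text.strip())
    return tuple(values)


entry = ET.fromstring(SAMPLE)
key = entry_key(entry)
print(key)
```

A tuple is hashable, so it can serve directly as a dictionary key or set member when checking a lexicon for duplicate entries.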
OLIF Entry with Cross-Reference
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <crossRefer>
    <keyDC>
      <canForm>row</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC>
    <crLinkType>has-meronym</crLinkType>
  </crossRefer>
</entry>
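A cross-reference names its target by that target's own key data categories and labels the relation with a link type. As a rough Python sketch (not the OLIF API), the links of an entry can be read out as (source key, link type, target key) triples. It assumes that `crossRefer` is a child of `entry` alongside `mono`; the helper names `key_of` and `cross_references` are hypothetical.

```python
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Assumption: crossRefer sits inside entry, next to mono.
SAMPLE = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <crossRefer>
    <keyDC>
      <canForm>row</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC>
    <crLinkType>has-meronym</crLinkType>
  </crossRefer>
</entry>
"""


def key_of(key_dc):
    """Read the five key data categories from a keyDC element."""
    return tuple(key_dc.findtext(name) for name in KEY_FIELDS)


def cross_references(entry):
    """List (source key, link type, target key) for each crossRefer."""
    source = key_of(entry.find("mono/keyDC"))
    links = []
    for cross_ref in entry.findall("crossRefer"):
        link_type = cross_ref.findtext("crLinkType")
        target = key_of(cross_ref.find("keyDC"))
        links.append((source, link_type, target))
    return links


links = cross_references(ET.fromstring(SAMPLE))
for source, link_type, target in links:
    print(source[0], link_type, target[0])
```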
OLIF Entry with Transfer
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
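Since a transfer is unidirectional, an index built from transfers maps a source word sense to its targets and not the other way round. The Python sketch below illustrates this under the assumption that `transfer` is a child of `entry`; it is not the OLIF API, and the helper names `key_of` and `transfer_index` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Assumption: transfer sits inside entry, next to mono.
SAMPLE = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
"""


def key_of(key_dc):
    """Read the five key data categories from a keyDC element."""
    return tuple(key_dc.findtext(name) for name in KEY_FIELDS)


def transfer_index(entries):
    """Map each source key to the list of its transfer target keys."""
    index = defaultdict(list)
    for entry in entries:
        source = key_of(entry.find("mono/keyDC"))
        for transfer in entry.findall("transfer"):
            index[source].append(key_of(transfer.find("keyDC")))
    return dict(index)


index = transfer_index([ET.fromstring(SAMPLE)])
english = ("table", "en", "noun", "general", "86")
german = ("Tabelle", "de", "noun", "general", "86")
print(index.get(english, []))  # targets recorded for the English sense
print(index.get(german, []))   # nothing: the transfer only runs en -> de
```

A German-to-English transfer would have to be stated separately, on the German entry.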
Data Category Values
Allowed values specified by OLIF
Administrative, terminological, linguistic values based on
  General industry standards
    E.g., allowed values for date derived from recommendations from ISO 8601:1988
  MT/Terminology standards
    E.g., suggested values for subject field adapted from EC
  Widely-recognized linguistic standards
    E.g., allowed values for gender based on longstanding gender description for European languages
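As an illustration of checking a value against a standard-derived format, here is a small Python sketch that validates a date string. The slide only says that OLIF date values derive from ISO 8601:1988; the specific YYYY-MM-DD calendar-date form checked here is an assumption, and the function name `is_iso_calendar_date` is hypothetical.

```python
import re
from datetime import date

# Assumption: the extended ISO 8601 calendar-date form, YYYY-MM-DD.
_CALENDAR_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_iso_calendar_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = _CALENDAR_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as Feb 30
    except ValueError:
        return False
    return True


for candidate in ("2004-10-01", "2004-13-01", "2004-02-30", "10/01/2004"):
    print(candidate, is_iso_calendar_date(candidate))
```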
User Extensions: The OLIF Data Category Registry
Users may declare and use their own values for certain data categories:
  Subject field
  Semantic reading
  Morphological structure
  Part of speech
  Inflection
  Aspect
  Syntactic type
  Syntactic frame
  Semantic type
  Concept hierarchy
Organizing Based on Concept
Users may link monolingual entries via a concept identifier
These IDs can be used to organize entries as equivalent word senses associated with the same concepts rather than source word senses associated with transfers.
Entries Linked by Concept
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
<entry ConceptUserId="0731F16CCCD2D3119B4D">
  <mono>
    <keyDC>
      <canForm>Tabelle</canForm>
      <language>de</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
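Concept-based organization amounts to grouping entries that share a concept identifier. The Python sketch below does that for the two sample entries; it is an illustration only, not the OLIF API. The `entries` wrapper element and the function name `group_by_concept` are hypothetical, and the attribute name `ConceptUserId` is taken as it appears on the slide.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# "entries" is a hypothetical wrapper so both samples parse as one document.
SAMPLE = """
<entries>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>table</canForm>
        <language>en</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono>
      <keyDC>
        <canForm>Tabelle</canForm>
        <language>de</language>
        <ptOfSpeech>noun</ptOfSpeech>
        <subjField>general</subjField>
        <semReading>86</semReading>
      </keyDC>
    </mono>
  </entry>
</entries>
"""


def group_by_concept(root):
    """Map each concept ID to the (language, canonical form) pairs sharing it."""
    groups = defaultdict(list)
    for entry in root.findall("entry"):
        concept_id = entry.get("ConceptUserId")
        if concept_id is None:
            continue  # entry is not linked to any concept
        language = entry.findtext("mono/keyDC/language")
        can_form = entry.findtext("mono/keyDC/canForm")
        groups[concept_id].append((language, can_form))
    return dict(groups)


by_concept = group_by_concept(ET.fromstring(SAMPLE))
for concept_id, senses in by_concept.items():
    print(concept_id, senses)
```

Unlike a transfer, this grouping has no direction: "table" and "Tabelle" are simply members of the same concept.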
What’s Available to the OLIF User?
On www.olif.net
  Complete XML DTD for download
  Hyperlinked DTD for viewing
  Graphical view of structure of DTD
  Current specification for OLIF v.2
  Formalization of OLIF data categories
  Alphabetic list of XML elements and attributes
  Fixed and recommended values for elements and attributes
  Guidelines for formulating canonical forms
  Sample OLIF entries
Using OLIF
Some applications:
  SAP has implemented an OLIF converter to exchange terminological data from its central termbase SAPterm
  MT developers in OLIF2 Consortium currently developing OLIF converters (Comprendium, Systran)
  OLIF User Forum = 60+ members
What’s New: XML Schema
OLIF XSD offers
  40+ built-in data types
  Allows creation of user-defined data types
  Supports inheritance
What’s New: The OLIF API
Based on OLIF XSD, Java classes created
Supports:
  Converting .csv files to OLIF
  Converting from XML format to OLIF
  Creating OLIF documents from scratch
  Modifying OLIF documents
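To give a feel for what a .csv-to-OLIF conversion involves, here is a minimal Python sketch built only on the standard library. It does not use or reproduce the Java-based OLIF API, whose classes are not shown in these slides. The CSV column layout, the `entries` wrapper element and the function name `csv_to_entries` are all assumptions made for illustration; a real OLIF file would also need the header and shared resources sections.

```python
import csv
import io
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Hypothetical input: one row per entry, one column per key data category.
CSV_TEXT = """canForm,language,ptOfSpeech,subjField,semReading
table,en,noun,general,86
Tabelle,de,noun,general,86
"""


def csv_to_entries(csv_text):
    """Build one entry/mono/keyDC subtree per CSV row."""
    root = ET.Element("entries")  # hypothetical wrapper element
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")
        mono = ET.SubElement(entry, "mono")
        key_dc = ET.SubElement(mono, "keyDC")
        for name in KEY_FIELDS:
            ET.SubElement(key_dc, name).text = row[name]
    return root


root = csv_to_entries(CSV_TEXT)
print(ET.tostring(root, encoding="unicode"))
```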
What to Expect this Year from OLIF
OLIF XSD and API are available to the user from www.olif.net
OLIF web site upgraded, updated
Requirements for modeling Japanese entries integrated
OLIF User Forum
Users of OLIF can access and post questions, messages and sample data from the OLIF group site:
  http://groups.yahoo.com/group/olifConsortium/