TMF - a tutorial
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
Three parts
Part 1: Basic concepts
Part 2: Representing data categories
Part 3: Designing (schemas and) filters
Part 1: Basic concepts
• Background - ISO etc.
• The need for abstraction
• Structure and content of terminological data - picture virtual-actual
• The meta-model (structural skeleton)
• Describing data categories
• Styles and vocabularies
• XTMF as a mapping tool - examples
• Further work: extending the model to a wider scope (language engineering)
Overview
General principles
Expressing constraints on the representation of computerized terminologies
• What is the underlying structure of computerized terminologies?
• Which data-category is used and under which conditions?
Maintaining interoperability between representations
• Providing a conceptual tool to compare two given formats
Definitions
TMF: Terminological Markup Framework
• Definition of the underlying structures and mechanisms needed for the computer representation of terminological data
• Independence with regard to any specific format
GMT: Generic Mapping Tool
• Abstract XML format equivalent to the underlying model of TMF
Definitions - cont.
TML: Terminological Markup Language
• One specific representation format generated within TMF
• E.g.: DXLT is a possible TML
A family of formats
[Diagram: TMF; a family of formats TML1, TML2, TML3, …, TMLi (among them DXLT and Geneter); GMT]
Meta-model
Representing the underlying structure of terminological data
[Diagram: meta-model with cardinalities (*, 1, 0:1) linking the following components: Terminological Data Collection, Global Information, Complementary Information, Terminological Entry, Terminology-related Information, Language Section, Term Section, Term Component Section]
Meta-model description
Terminological Data Collection (TDC)
• A collection of data containing information on concepts of specific concept fields.
Terminological Entry (TE)
• An entry containing information on terminological units (i.e., subject-specific concepts, terms, etc.).
» Example: domain description, conceptual relations, etc.
Meta-model description - cont.
Language Section (LS)
• The part of a terminological entry containing information related to one language.
» Note: One terminological entry may contain information on one, two or more languages.
Term Section (TS)
• The part of a language section giving information about a term.
» Example: term status (e.g. abbreviation), usage information (temporal, geographical, etc.)
Meta-model description - cont.
Term Component Section (TCS)
• The part of a term section giving information about components of a term.
» Example: component grammatical information (part of speech)
Meta-model description - cont.
Global Information (GI)
• Technical and administrative information applying to the entire data collection.
» Example: title of the data collection, revision history
Complementary Information (CI)
• Information supplementary to terminology-related information.
» Example: bibliographical source, documentary language or description thereof.
The structural skeleton
Terminological Data Collection (TDC)
  Global Information (GI), Complementary Information (CI)
  Terminological Entry (TE) *
    Language Section (LS) *
      Term Level (TL) *
        Term Component Level (TCL) *
How does this work?
Walking through an example…
DXLT example
<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'>
    <tig>
      <term>Alfa ...</term>
    </tig>
  </langSet>
</termEntry>
Identifying the structural skeleton
termEntry (TE)
  id='ID67' [attribute]
  subjectField='manufacturing' [typedElement]
  definition='A value…' [typedElement]
  langSet (LS)
    lang='en' [attribute]
    tig (TS)
      term='alpha smoothing factor' [element]
      termType='fullForm' [typedElement]
  langSet (LS)
    lang='hu' [attribute]
    tig (TS)
      term='…' [element]
TE: Terminological Entry
LS: Language Section
TS: Term Section
TMF information model
TE
  id='ID67'
  subjectField='manufacturing'
  definition='A value…'
  LS
    lang='en'
    TS
      term='alpha smoothing factor'
      termType='fullForm'
  LS
    lang='hu'
    TS
      term='…'
GMT representation
<struct type="TE">
  <feat type="id">ID67</feat>
  <feat type="subjectField">manufacturing</feat>
  <feat type="definition">A value between 0 and 1 used in ...</feat>
  <struct type="LS">
    <feat type="lang">en</feat>
    <struct type="TS">
      <feat type="term">alpha smoothing factor</feat>
      <feat type="termType">fullForm</feat>
    </struct>
  </struct>
  <struct type="LS">
    <feat type="lang">hu</feat>
    <struct type="TS">
      <feat type="term">Alfa ...</feat>
    </struct>
  </struct>
</struct>
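The step from the DXLT entry to its GMT representation can be sketched in a few lines of Python using the standard-library xml.etree.ElementTree module. This is only an illustration, not the tutorial's own tooling: the LEVELS table and the to_gmt function are hypothetical names, and the rule "attributes and non-level children become feat elements" is an assumption that happens to fit this particular example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DXLT element names to meta-model levels.
LEVELS = {"termEntry": "TE", "langSet": "LS", "tig": "TS"}

DXLT = """
<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'>
    <tig><term>Alfa ...</term></tig>
  </langSet>
</termEntry>
"""

def to_gmt(node):
    """Turn one DXLT level element into a GMT struct element."""
    struct = ET.Element("struct", type=LEVELS[node.tag])
    # Style "attribute": each XML attribute becomes a feature.
    for name, value in node.attrib.items():
        ET.SubElement(struct, "feat", type=name).text = value
    children = list(node)
    # Features are emitted before nested levels: ((feat|brack)*, struct*).
    for child in children:
        if child.tag not in LEVELS:
            # Style "typedElement" if a type attribute exists, else "element".
            feat_type = child.get("type", child.tag)
            ET.SubElement(struct, "feat", type=feat_type).text = child.text
    for child in children:
        if child.tag in LEVELS:
            struct.append(to_gmt(child))
    return struct

gmt = to_gmt(ET.fromstring(DXLT))
print(ET.tostring(gmt, encoding="unicode"))
```

A real filter would read the style and vocabulary of each data category from the TML description rather than guess them from the shape of the XML.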
Structural Skeleton
DCRref (ISO 12620)
DCRi
- DCRref subset
- Application dependent DCR
Interoperability conditions
GMT
Dialecti
- Expansion structures
- DatCat structural styles
- DatCat vocabulary styles
Terminological Markup Language (TML)
TML à la mode ISO
– Ingredients
  – A structural skeleton
    » (take the TMF meta-model)
  – A reference Data Category Registry
    » ISO 12620 is a good place to find one
– Recipe
  – Choose some data categories from the registry
    » You can even constrain the values of your datcats
  – Associate a style and a vocabulary with each datcat
    » You can take inspiration from others (DXLT)
  – Serve it hot to your software guy with a piece of SALT software
GMT
Generic Mapping Tool
Background
Interoperability principle
– If any two TMLs have exactly the same DCS, even though they differ radically in style and vocabulary, they are equivalent.
Consequence
– It is always possible to define a filter from one TML to another when they are interoperable
  • GMT is the intermediate representation used to do so
From one TML to another
GMT - Generic Mapping Tool
– an abstract XML representation
  • identification of levels
    – <struct type="LS">…</struct>
      » a recursive element
  • representation of data categories
    – <feat type="definition">…</feat>
The tmf element
• Description:
– The tmf element is the root element of any valid XTMF document. It contains the global information that corresponds to a terminological data collection, the collection itself, and the complementary information (external resources in particular) needed for describing the various terminological entries.
• Content model:
<!ELEMENT tmf (struct*)>
The struct element
• Description:
– The struct element should be used to represent a locus in a given structural skeleton. The struct element is recursive and may also contain feat and/or brack elements to express attributes belonging to the corresponding level of the meta-model.
• Attributes:
– type: level in the meta-model (TDC, TE, LS, TS or TCS)
• Content model:
<!ELEMENT struct ((feat|brack)*, struct*)>
<!ATTLIST struct type (TDC|TE|LS|TS|TCS) #REQUIRED>
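As a rough illustration of what this content model requires, the following Python sketch walks a struct tree and reports violations. It is not a substitute for DTD validation: check_struct is a hypothetical helper, and it only checks the type attribute and the "features before nested structs" ordering.

```python
import xml.etree.ElementTree as ET

# Allowed values of the type attribute, as listed in the ATTLIST above.
LEVELS = {"TDC", "TE", "LS", "TS", "TCS"}

def check_struct(struct, path="struct"):
    """Return a list of content-model violations found under one struct."""
    errors = []
    if struct.get("type") not in LEVELS:
        errors.append(f"{path}: bad or missing type {struct.get('type')!r}")
    seen_struct = False
    for i, child in enumerate(struct):
        where = f"{path}/{child.tag}[{i}]"
        if child.tag == "struct":
            seen_struct = True
            errors.extend(check_struct(child, where))
        elif child.tag in ("feat", "brack"):
            # ((feat|brack)*, struct*): features must precede nested structs.
            if seen_struct:
                errors.append(f"{where}: feature after a nested struct")
        else:
            errors.append(f"{where}: unexpected element")
    return errors

good = ET.fromstring(
    '<struct type="TE"><feat type="id">ID67</feat>'
    '<struct type="LS"><feat type="lang">en</feat></struct></struct>'
)
bad = ET.fromstring(
    '<struct type="TE"><struct type="XX"/><feat type="id">ID67</feat></struct>'
)
print(check_struct(good))
print(check_struct(bad))
```

Note that neither the DTD nor this sketch constrains which level may be nested inside which; that ordering comes from the meta-model, not from the content model.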
The feat element
• Description:
– The feat element represents any feature that is directly attached to a locus in the structural skeleton (represented by a struct element).
• The feat element accepts the following attributes:
– type: categorises the feat element through the reference to the name of the corresponding data category.
• Content model (DTD):
<!ELEMENT feat (#PCDATA | annot)*>
<!ATTLIST feat type CDATA #REQUIRED>
Bracketing information
Rationale
Describing the context of use of a given data category
– Example 1:
» Classification Code: AG1
» Classification System: Lenoc
– Example 2:
» Transaction type: modification
» Responsible person: Mr. X
» Date: 23 April 1988
Formal model
Hierarchical feature structure
– Constraint: type given by the 'main' (first) data category
ClassificationGrp
  ClassificationCode: AG1
  ClassificationSystem: Lenoc
GMT description
• Bracketing features
<brack>
  <feat type="classificationCode">xxx</feat>
  <feat type="classificationSystem">Lenoc</feat>
</brack>
Rem.: no type for 'brack'
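One way to read bracketed features back into a hierarchical structure is sketched below in Python. The read_features helper is hypothetical, and the choice to name each group after its first member is only one reading of the constraint that the type is given by the 'main' (first) data category. The value AG1 is taken from Example 1 above.

```python
import xml.etree.ElementTree as ET

GMT = """
<struct type="TE">
  <feat type="subjectField">manufacturing</feat>
  <brack>
    <feat type="classificationCode">AG1</feat>
    <feat type="classificationSystem">Lenoc</feat>
  </brack>
</struct>
"""

def read_features(struct):
    """Collect the features of one struct; a brack becomes a nested group."""
    features = []
    for child in struct:
        if child.tag == "feat":
            features.append((child.get("type"), child.text))
        elif child.tag == "brack":
            members = [(f.get("type"), f.text) for f in child.findall("feat")]
            if members:
                # brack has no type: the first ("main") member names the group.
                features.append((members[0][0], dict(members)))
    return features

features = read_features(ET.fromstring(GMT))
print(features)
```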
Annotating content
Rationale
Why should we annotate specific content?
– To identify components which are not explicitly expressed as a specific part of a terminological entry
  • E.g.: characteristics of a concept
– To relate a component to another entry or an external resource
  • E.g.: bibliographical reference
Formal model
?
XML model
Mixed content
– <!ELEMENT feat (#PCDATA|annot)*>
• Attributes
– type: categorises the annot element through the reference to the name of the corresponding data category.
• Rem.: Problems with mixed content in XML schemas
GMT description
• Annotating information
<feat type="definition">pencil whose
  <annot type="characteristic"> casing </annot>
  is fixed around a central graphite medium which is used for writing or making marks
</feat>
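Mixed content needs a little care when processed. The Python sketch below (standard-library xml.etree.ElementTree; split_mixed is a hypothetical helper, and the definition text is shortened) recovers both the running text of a feat and the list of annotated spans inside it.

```python
import xml.etree.ElementTree as ET

FEAT = (
    '<feat type="definition">pencil whose '
    '<annot type="characteristic">casing</annot>'
    ' is fixed around a central graphite medium</feat>'
)

def split_mixed(feat):
    """Return (plain_text, annotations) for a feat with mixed content."""
    parts = [feat.text or ""]
    annotations = []
    for annot in feat.findall("annot"):
        annotations.append((annot.get("type"), (annot.text or "").strip()))
        parts.append(annot.text or "")
        # In ElementTree, the text following a child element is its "tail".
        parts.append(annot.tail or "")
    return "".join(parts), annotations

text, annotations = split_mixed(ET.fromstring(FEAT))
print(text)
print(annotations)
```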
Representation of relations
XML links
Transparency as to the actual location of a resource (internal vs. external)
Maybe useful to identify ontologies
– External links between concepts
[Diagram: links between entry i and entry j]
Representation in GMT
Two attributes
• target: a pointer to a 'struct' element, used when the feature expresses a relation between the current locus and another locus in the structural skeleton;
• source: a pointer to a 'struct' element, used when the feature is described externally to the locus to which it is supposed to be attached.
Some examples
• Simple atomic feature attached directly to a locus:
<feat type="conceptIdentifier">ID67</feat>
• Basic feature whose value is a reference to a locus in the structural skeleton:
<feat type="partWhole" target="TE24"/>
• Basic feature anchored at the locus in the structural skeleton whose id attribute value is "TE24":
<feat type="conceptIdentifier" source="TE24">ID67</feat>
• Compound feature anchored at "TE23" and which makes reference to "TE24":
<feat type="partWhole" source="TE23" target="TE24"/>
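A minimal Python sketch of how source and target pointers might be resolved is given below. It assumes that struct elements carry an id attribute (the examples refer to one, although the ATTLIST shown earlier does not declare it) and that an externally described feature can sit on an enclosing struct. The resolve_links helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

GMT = """
<tmf>
  <struct type="TDC">
    <feat type="conceptIdentifier" source="TE24">ID67</feat>
    <struct type="TE" id="TE23">
      <feat type="partWhole" target="TE24"/>
    </struct>
    <struct type="TE" id="TE24"/>
  </struct>
</tmf>
"""

def resolve_links(root):
    """Return (feature type, anchor struct, target struct or None) triples."""
    by_id = {s.get("id"): s for s in root.iter("struct") if s.get("id")}
    parent_of = {child: parent for parent in root.iter() for child in parent}
    links = []
    for feat in root.iter("feat"):
        if feat.get("source"):
            # Described elsewhere: anchor is the struct that source points to.
            anchor = by_id.get(feat.get("source"))
        else:
            # Otherwise the anchor is simply the enclosing element.
            anchor = parent_of[feat]
        target = by_id.get(feat.get("target"))
        links.append((feat.get("type"), anchor, target))
    return links

links = resolve_links(ET.fromstring(GMT))
for feat_type, anchor, target in links:
    target_id = target.get("id") if target is not None else None
    print(feat_type, anchor.get("id"), target_id)
```

A feat nested inside a brack would need one more step up to reach its struct; the sketch does not handle that case.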
Styles and vocabularies
Structural Skeleton
DCRref (ISO 12620)
DCRi
- DCRref subset
- Application dependent DCR
Interoperability conditions
CML
Dialecti
- Expansion structures
- DatCat structural styles
- DatCat vocabulary styles
Terminological Markup Language (TML)
Implementing a DatCat
– Definitions:
  • 'style': the way a given DatCat is implemented as an XML object…
  • 'vocabulary': the symbols needed to express the implementation of a given DatCat in its associated style
– E.g.:
  » DatCat: /definition/
  » Vocabulary = [def]
  » Style = Element
  » <def>pencil whose casing …</def>
[Callouts on the example: DatCat, value]
Implementing a DatCat (cont.)
– Definition:
  • 'anchor': the XML element(s) to which the implementation of a given DatCat can be attached
– E.g.:
<tig>
  <term>alpha smoothing factor</term>
</tig>
Styles - element
Element
• Def.: The DatCat is implemented as an element, child of its anchor
• Vocabulary: the name of the corresponding element
• E.g.:
<def>pencil whose casing …</def>
<term>alpha smoothing factor</term>
[Callouts on the example: DatCat, value]
Styles - typedElement
typedElement
• Def.: The DatCat is implemented as a generic XML element, which is a child of the anchor, and which is further specified by means of a type attribute. Its content is the value of the feature in the structural skeleton.
• Vocabulary: the element name and the value of the type attribute
• E.g.:
<termNote type='definition'>Bla, bla, bla…</termNote>
[Callouts on the example: DatCat, value]
Styles - attribute
Attribute
• Def.: The DatCat is implemented as an attribute of its anchor
• Vocabulary: the name of the corresponding attribute
• E.g.:
<termEntry id='ID67'> … </termEntry>
<ldl language='en'> … </ldl>
[Callouts on the example: DatCat, value]
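The three styles seen so far (element, typedElement, attribute) can be summarised in a short Python sketch that attaches a data category value to its anchor according to a small dialect table. The DIALECT table and the implement function are hypothetical; the vocabulary entries merely mirror the DXLT example used earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialect table: data category -> (style, vocabulary).
DIALECT = {
    "id": ("attribute", "id"),
    "term": ("element", "term"),
    "termType": ("typedElement", ("termNote", "termType")),
}

def implement(anchor, datcat, value):
    """Attach one data category value to its anchor using the dialect table."""
    style, vocab = DIALECT[datcat]
    if style == "attribute":
        # The DatCat becomes an attribute of the anchor.
        anchor.set(vocab, value)
    elif style == "element":
        # The DatCat becomes a child element named by the vocabulary.
        ET.SubElement(anchor, vocab).text = value
    elif style == "typedElement":
        # A generic child element, specified further by its type attribute.
        name, type_value = vocab
        ET.SubElement(anchor, name, type=type_value).text = value
    else:
        raise ValueError(f"unknown style: {style}")

entry = ET.Element("termEntry")
implement(entry, "id", "ID67")
tig = ET.SubElement(entry, "tig")
implement(tig, "term", "alpha smoothing factor")
implement(tig, "termType", "fullForm")
print(ET.tostring(entry, encoding="unicode"))
```

Swapping the table for another dialect changes the generated markup while leaving the data categories and their values untouched, which is the point of separating style and vocabulary from the structural skeleton.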
ValuedElement TypedValuedElement