LREC 2000 Athens; Gerhard Budin and Alan Melby
Accessibility of Multilingual Terminological Resources
Current Problems and Prospects for the Future
Gerhard Budin, University of Vienna
Alan Melby, Brigham Young University
Diversity Problems of MTRs
• Incompatible ontologies
• Diverse categorizations of terminological information
• Varieties of data models
• Multitude of formats and 'standards'
-> lack of interoperability, portability across applications, domains, platforms, etc.
Terminology Interchange
• Pre-requisite for
– knowledge sharing
– co-operative work flows
– marketing, distribution
– maintenance
– interoperability (data management across MT, TM, CL, TA, IM, KM, etc.)
• R&D since 1980s (EU, ISO, TEI)
Barriers to terminological knowledge sharing
• Legal barriers (copyright, IPR)
• economic barriers (pricing, billing)
• information barriers (lack of information)
• technical barriers (lack of cross-platform/-system/-format (im-/ex-)portability, etc.)
• methodological barriers (data modelling, diversity in work principles, methods)
Multitude of Formats
• Document formats
• Database formats
• Mark-up formats
• Formats for lexical/terminological data (MATER, TEI-lex/term, NTRF, OLIF, MARTIF, TBX, IIF, TRANSTERM, GENETER, EURAMIS, etc.)
SALT-XLT
• Standards-based Access to Multilingual Lexicons and Terminologies - a broad-based initiative
• aiming at CONVERGENCE, INTEROPERABILITY
• International Consortium of industry partners, universities, NGOs/IOs/IGOs, professional associations
– European group: shared-cost RTD project called SALT in the 5th Framework Programme (IST-HLT), started in January 2000 (funding for 2 years)
– US group (funding expected)
Features of the SALT Initiative
• User-oriented (industry, administration, multiple user-groups)
• Oriented towards integrating applications
• Ontology mapping component
• Web-based
• free-ware approach
• XML, XSLT, Java
• Standards-based (integrating HLT standards, concurrent development with ISO/TC 37)
XLT
• XML-based Lexical/Terminological framework format
• A FAMILY of (interoperable) formats
– includes or is based on or overlaps with
• TEI
• MARTIF
• MSC
• OLIF
• Geneter
• TBX, etc.
XLT
[Diagram: XLT language server / toolkit. Title: Enhanced Access to Multilingual Resources for Language Technology. Labels in the diagram:
– Lex/term resources, diverse formats: TRANSTERM, OLIF, MARTIF, INTERVAL, GENETER, proprietary formats
– Server / toolkit: export tools, import tools, viewers, merge/query functions
– Functions: facilitation, access, tagging, conversion, info brokerage, markup, ontologies
– Applications: authoring, MT, TM, IM, TMS, translation, L10N, I18N, integration, access
– Industry sectors, information technology developers, consulting services, broader social impact]
Workflow in SALT
Analysis of existing formats (sample data sets, data elements/structures, ontologies)
PM Mapping Clustering
QM
Utilities, tools, website
external assessment, evaluation
dissemination, implementation
Features of XLT
• XML-based (since this is the dominant data exchange transport mechanism today)
• standards-based
• corresponding relational data model for integrated database to facilitate loading
• flexible in order to support maintenance of the format as needs evolve
• language industry support
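The pairing of an XML format with a corresponding relational data model can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the element names (`termbase`, `entry`, `langSet`, `term`), the table layout, and the function names are hypothetical and are not taken from the XLT DTD or from the SALT relational model.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are
# illustrative only and are NOT the XLT format.
SAMPLE = """
<termbase>
  <entry id="c1">
    <langSet lang="en"><term>terminology interchange</term></langSet>
    <langSet lang="de"><term>Terminologieaustausch</term></langSet>
  </entry>
  <entry id="c2">
    <langSet lang="en"><term>data category</term></langSet>
  </entry>
</termbase>
"""


def load_terms(xml_text, conn):
    """Load concept entries and their terms into two relational tables."""
    conn.execute("CREATE TABLE concept (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE term ("
        "concept_id TEXT REFERENCES concept(id), lang TEXT, term_text TEXT)"
    )
    root = ET.fromstring(xml_text)
    for entry in root.findall("entry"):
        concept_id = entry.get("id")
        conn.execute("INSERT INTO concept VALUES (?)", (concept_id,))
        # One row per term, keyed by concept and language.
        for lang_set in entry.findall("langSet"):
            for term in lang_set.findall("term"):
                conn.execute(
                    "INSERT INTO term VALUES (?, ?, ?)",
                    (concept_id, lang_set.get("lang"), term.text),
                )
    conn.commit()


def terms_for(conn, concept_id):
    """Return (lang, term_text) pairs for one concept, sorted by language."""
    rows = conn.execute(
        "SELECT lang, term_text FROM term WHERE concept_id = ? ORDER BY lang",
        (concept_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_terms(SAMPLE, connection)
    print(terms_for(connection, "c1"))
```

The concept-oriented layout (one concept row, many term rows) mirrors the nesting of the XML, which is what makes loading a straightforward traversal in this sketch.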
Levels of Modelling in the SALT Initiative
• Level 1: meta-model consisting of a
– structural meta-model (ORM, UML) and a
– content meta-model:
• metadata registry based on ISO 12620, following the methods of ISO 11179
• co-operation with the SCHEMAS project (registry of XML schemas), JTC 1/SC 32, etc.
Levels of Modelling in the SALT Initiative
• Level 2: conceptual data model (user-group needs analysis level)
– implementation modality (e.g. XML intermediate format or relational database) is selected for user group
– a core structure compatible with the meta-model but going into more detail is defined for each modality
– particular set of data categories and constraints on them is selected according to user needs
• e.g. Reltef (E-R diagram), XLT (DTD, XML schema, data-category specifications)
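Selecting a set of data categories and constraints for a user group can be sketched roughly as follows. The category names, the permitted values, and the `validate` function are hypothetical; they are not quoted from ISO 12620 or from any SALT specification.

```python
# Hypothetical data-category specification for one user group.
# A value of None means free text; a set lists the permitted values.
SPEC = {
    "term": None,
    "partOfSpeech": {"noun", "verb", "adjective"},
    "definition": None,
}

# Categories that every entry must carry (an assumption for this sketch).
REQUIRED = {"term"}


def validate(entry, spec=SPEC, required=REQUIRED):
    """Return a list of problems; an empty list means the entry conforms."""
    problems = []
    for name in sorted(required - set(entry)):
        problems.append(f"missing required category: {name}")
    for name, value in entry.items():
        if name not in spec:
            problems.append(f"category not in specification: {name}")
        elif spec[name] is not None and value not in spec[name]:
            problems.append(f"value {value!r} not allowed for {name}")
    return problems


if __name__ == "__main__":
    print(validate({"term": "interoperability", "partOfSpeech": "noun"}))
    print(validate({"partOfSpeech": "adverb", "note": "draft"}))
```

A different user group would supply a different `SPEC` and `REQUIRED` while reusing the same checking logic, which is the sense in which the selection is made "according to user needs".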
Levels of Modelling in the SALT Initiative
• Level 3: Specific data model / format
– core structure, a data category specification, and a representation style are combined to define a member of the SALT family
– each member is fully interoperable with other members that use the same data category specification
• e.g. concrete relational database implementations, specific XLT implementations, subsets for industrial user groups such as TBX
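The rule that a family member is defined by three components, and that members sharing a data category specification interoperate, can be modelled in a few lines. This is only a sketch under stated assumptions: the class, the field values, and the member names are hypothetical and do not describe actual SALT family members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyMember:
    """One hypothetical member of a format family."""
    name: str
    core_structure: str            # e.g. "xml" or "relational"
    data_categories: frozenset     # the data category specification
    representation_style: str


def interoperable(a, b):
    """Members interoperate when their data category specifications match,
    regardless of core structure or representation style."""
    return a.data_categories == b.data_categories


# Illustrative members; names and category sets are invented.
SHARED_SPEC = frozenset({"term", "partOfSpeech", "definition"})
xml_member = FamilyMember("xml-variant", "xml", SHARED_SPEC, "element-style")
db_member = FamilyMember("db-variant", "relational", SHARED_SPEC, "tables")
other_member = FamilyMember(
    "reduced-variant", "xml", frozenset({"term"}), "element-style"
)

if __name__ == "__main__":
    print(interoperable(xml_member, db_member))
    print(interoperable(xml_member, other_member))
```

In this sketch the XML and relational variants count as interoperable because only the data category specification is compared, which follows the wording of the slide.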
Cooperation and Concertation
The SALT consortium (U Vienna, U AS Cologne, U Surrey, LORIA Nancy, Termisti Brussels, EA Bozen/Bolzano, BYU Provo) cooperates with
• other HLT or IST projects (TQPro, Schemas, etc.)
• other EU-projects (MLIS) (TDCNet, GEMA, DINT, etc.)
• ELRA, EAFT
• EU Commission, UN-Jiamcatt group
• TEI, ISO, JTC 1, W3C
• LISA (OSCAR) including companies other than IT from other industries (telecom, automotive eng.)
• FIT, etc.
Conclusions
• The SALT project contributes to a convergence process that is badly needed in the area of multilingual lex/term resources
• technical/methodological convergence resulting in interoperability and accessibility of MTRs supports language industry markets