LREC 2000 Athens; Gerhard Budin and Alan Melby Accessibility of Multilingual Terminological Resources Current Problems and Prospects for the Future Gerhard Budin University of Vienna <[email protected]> Alan Melby Brigham Young University <[email protected]>

LREC 2000 Athens; Gerhard Budin and Alan Melby

Accessibility of Multilingual Terminological Resources

Current Problems and Prospects for the Future

Gerhard Budin, University of Vienna

<[email protected]>

Alan Melby, Brigham Young University

<[email protected]>

Diversity Problems of Multilingual Terminological Resources (MTRs)

• Incompatible ontologies

• Diverse categorizations of terminological information

• Varieties of data models

• Multitude of formats and 'standards'

-> lack of interoperability and portability across applications, domains, platforms, etc.

Terminology Interchange

• Pre-requisite for

– knowledge sharing

– co-operative workflows

– marketing, distribution

– maintenance

– interoperability (data management across MT, TM, CL, TA, IM, KM, etc.)

• R&D since 1980s (EU, ISO, TEI)

Barriers to terminological knowledge sharing

• Legal barriers (copyright, IPR)

• economic barriers (pricing, billing)

• information barriers (lack of information)

• technical barriers (lack of cross-platform/-system/-format (im-/ex-)portability, etc.)

• methodological barriers (data modelling, diversity in work principles, methods)

Multitude of Formats

• Document formats

• Database formats

• Mark-up formats

• for lexical/terminological data (MATER, TEI-lex/term, NTRF, OLIF, MARTIF, TBX, IIF, TRANSTERM, GENETER, EURAMIS, etc.)

SALT-XLT

• Standards-based Access to Multilingual Lexicons and Terminologies - a broad-based initiative

• aiming at CONVERGENCE, INTEROPERABILITY

• International Consortium of industry partners, universities, NGOs/IOs/IGOs, professional associations

– European group: shared-cost RTD project called SALT in the 5th Framework Programme (IST-HLT), started in January 2000 (funding for 2 years)

– US group (funding expected)

Features of the SALT Initiative

• User-oriented (industry, administration, multiple user groups)

• Oriented towards integrating applications

• Ontology mapping component

• Web-based

• Freeware approach

• XML, XSLT, Java

• Standards-based (integrating HLT standards, concurrent development with ISO/TC 37)

XLT

• XML-based Lexical/Terminological framework format

• A FAMILY of (interoperable) formats

– includes, is based on, or overlaps with

• TEI

• MARTIF

• MSC

• OLIF

• Geneter

• TBX, etc.

XLT

[Diagram: Enhanced Access to Multilingual Resources for Language Technology. Labels recoverable from the slide, grouped by type:

Formats: TRANSTERM, OLIF, MARTIF, INTERVAL, GENETER, proprietary formats

Tools and functions: export tools, import tools, viewers, merge/query functions

Services: facilitation, access, tagging, conversion, info brokerage, markup, ontologies, integration

Applications: authoring, MT, TM, IM, TMS, translation, L10N, I18N

Other labels: Lex/term Resources, Diverse Formats; Industry Sectors; Language; Server / Toolkit; Information Technology Developers; Consulting Services; Broader Social Impact]

Workflow in SALT

Analysis of existing formats (sample data sets, data elements/structures, ontologies)

PM Mapping Clustering

QM

Utilities, tools, website

external assessment, evaluation

dissemination, implementation

Features of XLT

• XML-based (since this is the dominant data exchange transport mechanism today)

• standards-based

• corresponding relational data model for integrated database to facilitate loading

• flexible in order to support maintenance of the format as needs evolve

• language industry support
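
The pairing of an XML format with a corresponding relational data model can be illustrated with a minimal sketch. The element names (`termEntry`, `langSet`, `term`), the `lang` attribute, the table layout and the sample entry below are assumptions made for illustration only; they are not taken from the XLT specification.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical concept entry; element and attribute names are illustrative
# assumptions, not the actual XLT markup.
SAMPLE = """
<termEntry id="c1">
  <langSet lang="en"><term>terminology interchange</term></langSet>
  <langSet lang="de"><term>Terminologieaustausch</term></langSet>
</termEntry>
"""


def entry_to_rows(xml_text):
    """Flatten one concept entry into (concept_id, language, term) rows."""
    entry = ET.fromstring(xml_text)
    concept_id = entry.get("id")
    rows = []
    for lang_set in entry.findall("langSet"):
        for term in lang_set.findall("term"):
            rows.append((concept_id, lang_set.get("lang"), term.text))
    return rows


def load(conn, rows):
    """Load flattened rows into a single illustrative 'term' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term (concept_id TEXT, lang TEXT, term TEXT)"
    )
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, entry_to_rows(SAMPLE))
german = conn.execute(
    "SELECT term FROM term WHERE concept_id = ? AND lang = ?", ("c1", "de")
).fetchone()[0]
print(german)
```

Because each XML element maps onto one column of one table, loading is a simple traversal; a real data model would likely need further tables for descriptive and administrative information.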

Levels of Modelling in the SALT Initiative

• Level 1: meta-model consisting of

– a structural meta-model (ORM, UML) and

– a content meta-model:

• metadata registry based on ISO 12620, following the methods of ISO 11179

• co-operation with the SCHEMAS project (registry of XML schemas), JTC 1/SC 32, etc.

Levels of Modelling in the SALT Initiative

• Level 2: conceptual data model (user-group needs analysis level)

– implementation modality (e.g. XML intermediate format or relational database) is selected for user group

– a core structure compatible with the meta-model but going into more detail is defined for each modality

– particular set of data categories and constraints on them is selected according to user needs

• e.g. Reltef (E-R diagram), XLT (DTD, XML schema, data-category specifications)
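
Selecting a set of data categories and constraints for a user group might be sketched as follows. This is an illustration under stated assumptions: the class `DataCategory`, the function `validate`, and the category names and permitted values are hypothetical and do not come from ISO 12620 or the SALT deliverables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    """One data category plus the constraints a user group places on it."""
    name: str
    required: bool = False
    allowed_values: frozenset = frozenset()  # empty set: any value accepted


# Hypothetical selection for one user group.
USER_GROUP_SPEC = (
    DataCategory("term", required=True),
    DataCategory("partOfSpeech",
                 allowed_values=frozenset({"noun", "verb", "adjective"})),
    DataCategory("definition"),
)


def validate(record, spec):
    """Return a list of constraint violations for one term record (a dict)."""
    by_name = {cat.name: cat for cat in spec}
    problems = []
    # Every required category must be present.
    for cat in spec:
        if cat.required and cat.name not in record:
            problems.append(f"missing required category: {cat.name}")
    # Every category used must be selected, with a permitted value.
    for key, value in record.items():
        cat = by_name.get(key)
        if cat is None:
            problems.append(f"category not in specification: {key}")
        elif cat.allowed_values and value not in cat.allowed_values:
            problems.append(f"value {value!r} not allowed for {key}")
    return problems


print(validate({"term": "interchange", "partOfSpeech": "noun"}, USER_GROUP_SPEC))
print(validate({"partOfSpeech": "adverb"}, USER_GROUP_SPEC))
```

Keeping the selection as data rather than code means a different user group can be served by swapping the specification, which is one plausible reading of "selected according to user needs".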

Levels of Modelling in the SALT Initiative

• Level 3: Specific data model / format

– core structure, a data category specification, and a representation style are combined to define a member of the SALT family

– each member is fully interoperable with other members that use the same data category specification

• e.g. concrete relational database implementations, specific XLT implementations, subsets for industrial user groups such as TBX
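
The Level 3 definition can be read as a triple, with interoperability depending on the data category specification alone. The sketch below makes that reading concrete. It assumes that "the same data category specification" can be modelled as equality of category sets, which simplifies the slide's statement; all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyMember:
    """A format defined by core structure, data categories and style."""
    name: str
    core_structure: str
    data_categories: frozenset
    representation_style: str


def interoperable(a, b):
    """Assumed rule: members interoperate when their data category
    specifications are identical, whatever their representation style."""
    return a.data_categories == b.data_categories


# Hypothetical members of the family.
SHARED_SPEC = frozenset({"term", "definition", "partOfSpeech"})
xml_member = FamilyMember("example-xml", "concept entry", SHARED_SPEC, "XML")
db_member = FamilyMember("example-db", "concept entry", SHARED_SPEC, "relational")
reduced = FamilyMember("example-reduced", "concept entry",
                       frozenset({"term"}), "XML")

print(interoperable(xml_member, db_member))  # same specification
print(interoperable(xml_member, reduced))    # different specification
```

Under this reading, an XML member and a relational member can exchange data without loss, while two XML members with different category sets cannot.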

Cooperation and Concertation

The SALT consortium (U Vienna, U AS Cologne, U Surrey, LORIA Nancy, Termisti Brussels, EA Bozen/Bolzano, BYU Provo) cooperates with

• other HLT or IST projects (TQPro, Schemas, etc.)

• other EU-projects (MLIS) (TDCNet, GEMA, DINT, etc.)

• ELRA, EAFT

• EU Commission, UN JIAMCATT group

• TEI, ISO, JTC 1, W3C

• LISA (OSCAR), including companies from industries other than IT (telecom, automotive engineering)

• FIT, etc.

Conclusions

• The SALT project contributes to a convergence process that is badly needed in the area of multilingual lex/term resources

• technical/methodological convergence resulting in interoperability and accessibility of MTRs supports language industry markets