

LREC 2000 Athens; Gerhard Budin and Alan Melby

Accessibility of Multilingual Terminological Resources

Current Problems and Prospects for the Future

Gerhard Budin, University of Vienna

<gerhard.budin@univie.ac.at>

Alan Melby, Brigham Young University

<akm@byu.edu>


Diversity Problems of MTRs

• Incompatible ontologies

• Diverse categorizations of terminological information

• Varieties of data models

• Multitude of formats and "standards"

-> lack of interoperability and portability across applications, domains, platforms, etc.

Terminology Interchange

• Pre-requisite for

– knowledge sharing

– co-operative work flows

– marketing, distribution

– maintenance

– interoperability (data management across MT, TM, CL, TA, IM, KM, etc.)

• R&D since the 1980s (EU, ISO, TEI)


Barriers to terminological knowledge sharing

• Legal barriers (copyright, IPR)

• Economic barriers (pricing, billing)

• Information barriers (lack of information)

• Technical barriers (lack of cross-platform, cross-system and cross-format import/export portability, etc.)

• Methodological barriers (data modelling, diversity in work principles and methods)

Multitude of Formats

• Document formats

• Database formats

• Mark-up formats for lexical/terminological data (MATER, TEI-lex/term, NTRF, OLIF, MARTIF, TBX, IIF, TRANSTERM, GENETER, EURAMIS, etc.)

SALT-XLT

• Standards-based Access to Multilingual Lexicons and Terminologies: a broad-based initiative

• aiming at CONVERGENCE and INTEROPERABILITY

• International consortium of industry partners, universities, NGOs/IOs/IGOs and professional associations

– European group: shared-cost RTD project called SALT in the 5th Framework Programme (IST-HLT), started in January 2000 (funding for 2 years)

– US group (funding expected)

Features of the SALT Initiative

• User-oriented (industry, administration, multiple user groups)

• Oriented towards integrating applications

• Ontology mapping component

• Web-based

• Freeware approach

• XML, XSLT, Java

• Standards-based (integrating HLT standards, concurrent development with ISO/TC 37)


XLT

• XML-based Lexical/Terminological framework format

• A FAMILY of (interoperable) formats

– includes, is based on, or overlaps with TEI, MARTIF, MSC, OLIF, Geneter, TBX, etc.

[Diagram: XLT. Enhanced Access to Multilingual Resources for Language Technology. Labels: lex/term resources in diverse formats (TRANSTERM, OLIF, MARTIF, INTERVAL, GENETER, proprietary formats); Language Server / Toolkit (export tools, import tools, viewers, merge/query functions); facilitation, access, tagging, conversion, info brokerage, markup, ontologies; authoring, MT, TM, IM, TMS, translation, L10N, I18N; industry sectors, information technology developers, consulting services, broader social impact; integration, access.]

Workflow in SALT

• Analysis of existing formats (sample data sets, data elements/structures, ontologies)

• Mapping, clustering

• Utilities, tools, website

• External assessment, evaluation

• Dissemination, implementation

• PM, QM
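The mapping and clustering steps can be pictured with a small sketch. It assumes, purely for illustration, two source formats with hypothetical field names (`formatA`, `formatB`, `POS`, `gram`, and so on) and renames their fields to shared data-category names; fields with no known equivalent are reported for manual analysis. Treating "clustering" as grouping equivalent source fields under one data category is an interpretation of the slide, not a documented SALT procedure.

```python
# Hypothetical field names for two source formats; illustrative only,
# not taken from any real format analysed in SALT.
FIELD_MAP = {
    "formatA": {"Term": "term", "POS": "partOfSpeech", "Def": "definition"},
    "formatB": {"entry": "term", "gram": "partOfSpeech", "subj": "subjectField"},
}


def map_record(fmt, record):
    """Rename a record's fields to shared data-category names.

    Returns (mapped, unmapped); unmapped lists source fields with no
    known equivalent, which would need manual analysis.
    """
    table = FIELD_MAP[fmt]
    mapped, unmapped = {}, []
    for field, value in record.items():
        if field in table:
            mapped[table[field]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


def cluster_by_category(formats):
    """Group the source fields of several formats under the data category they map to."""
    clusters = {}
    for fmt in formats:
        for field, category in FIELD_MAP[fmt].items():
            clusters.setdefault(category, []).append((fmt, field))
    return clusters
```

For example, `map_record("formatB", {"entry": "server", "gram": "noun", "note": "x"})` returns `({"term": "server", "partOfSpeech": "noun"}, ["note"])`, flagging `note` as a field still to be mapped.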


Features of XLT

• XML-based (since this is the dominant data exchange transport mechanism today)

• Standards-based

• Corresponding relational data model for an integrated database, to facilitate loading

• Flexible, in order to support maintenance of the format as needs evolve

• Language industry support
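To illustrate what a corresponding relational data model might look like, the sketch below loads one concept-oriented entry into two SQLite tables (one row per concept, one row per term) and then retrieves target-language equivalents through the shared concept ID. The table layout, column names and the nested `entry` dictionary are assumptions made for this example; they are not the SALT or XLT relational model.

```python
import sqlite3

# Hypothetical two-table layout: one row per concept, one row per term.
SCHEMA = """
CREATE TABLE concept (
    id INTEGER PRIMARY KEY,
    subject_field TEXT
);
CREATE TABLE term (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    lang TEXT NOT NULL,
    form TEXT NOT NULL,
    part_of_speech TEXT
);
"""


def load_entry(conn, entry):
    """Insert one concept-oriented entry (a nested dict) into the tables."""
    conn.execute(
        "INSERT INTO concept (id, subject_field) VALUES (?, ?)",
        (entry["id"], entry.get("subjectField")),
    )
    for lang, terms in entry["langSets"].items():
        for t in terms:
            conn.execute(
                "INSERT INTO term (concept_id, lang, form, part_of_speech) "
                "VALUES (?, ?, ?, ?)",
                (entry["id"], lang, t["term"], t.get("partOfSpeech")),
            )


def equivalents(conn, form, source_lang, target_lang):
    """Return target-language terms that share a concept with `form`."""
    rows = conn.execute(
        """SELECT t2.form FROM term AS t1
           JOIN term AS t2 ON t1.concept_id = t2.concept_id
           WHERE t1.form = ? AND t1.lang = ? AND t2.lang = ?
           ORDER BY t2.id""",
        (form, source_lang, target_lang),
    ).fetchall()
    return [row[0] for row in rows]


# Usage: an in-memory database holding one illustrative entry.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_entry(conn, {
    "id": 1,
    "subjectField": "computing",
    "langSets": {
        "en": [{"term": "server", "partOfSpeech": "noun"}],
        "fr": [{"term": "serveur", "partOfSpeech": "noun"}],
        "de": [{"term": "Server", "partOfSpeech": "noun"}],
    },
})
```

Keeping terms in their own table, keyed by concept, mirrors the concept orientation of terminological entries: any number of languages and synonyms can be attached to one concept without changing the schema.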


Levels of Modelling in the SALT Initiative

• Level 1: meta-model consisting of

– a structural meta-model (ORM, UML) and

– a content meta-model:

• metadata registry based on ISO 12620, following the methods of ISO 11179

• co-operation with the SCHEMAS project (registry of XML schemas), JTC 1/SC 32, etc.
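A metadata registry of data categories can be sketched as a keyed collection of descriptions, each with a name, a definition and a value domain (open, or a closed picklist). The `DataCategory` and `Registry` classes below are hypothetical and only loosely inspired by the ISO 11179 idea of registering data element descriptions; the category names, definitions and picklist values in the usage lines are illustrative, not quoted from ISO 12620.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataCategory:
    """Hypothetical registry record for one data category."""
    name: str
    definition: str
    picklist: Optional[Tuple[str, ...]] = None  # None means an open value domain

    def accepts(self, value: str) -> bool:
        """Check a value against the category's value domain."""
        if self.picklist is None:
            return isinstance(value, str) and value != ""
        return value in self.picklist


class Registry:
    """Keyed collection of data categories; names must be unique."""

    def __init__(self) -> None:
        self._items: Dict[str, DataCategory] = {}

    def register(self, dc: DataCategory) -> None:
        if dc.name in self._items:
            raise ValueError(f"data category already registered: {dc.name}")
        self._items[dc.name] = dc

    def lookup(self, name: str) -> DataCategory:
        return self._items[name]

    def names(self) -> List[str]:
        return sorted(self._items)


# Usage with illustrative entries.
reg = Registry()
reg.register(DataCategory("partOfSpeech", "Grammatical category of a term.",
                          ("noun", "verb", "adjective")))
reg.register(DataCategory("definition", "Statement describing a concept."))
```

Centralizing names and value domains in one registry is what lets different formats agree on what a data category means, even when they encode it differently.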


Levels of Modelling in the SALT Initiative

• Level 2: conceptual data model (user-group needs analysis level)

– an implementation modality (e.g. XML intermediate format or relational database) is selected for the user group

– a core structure compatible with the meta-model but going into more detail is defined for each modality

– a particular set of data categories, and constraints on them, is selected according to user needs

• e.g. Reltef (E-R diagram), XLT (DTD, XML schema, data-category specifications)
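Selecting data categories and constraints for a user group amounts to a profile that entries can be checked against. This minimal sketch assumes a hypothetical selection (`term` and `partOfSpeech` required, `subjectField` optional, with a small picklist for part of speech) and reports missing, disallowed or unselected data categories.

```python
# Hypothetical selection of data categories for one user group.
SELECTION = {
    "term": {"required": True, "values": None},
    "partOfSpeech": {"required": True, "values": {"noun", "verb", "adjective"}},
    "subjectField": {"required": False, "values": None},
}


def validate(entry, selection=SELECTION):
    """Return a list of problems; an empty list means the entry conforms."""
    errors = []
    for name, rule in selection.items():
        if name not in entry:
            if rule["required"]:
                errors.append(f"missing required data category: {name}")
            continue
        allowed = rule["values"]  # None = any value is permitted
        if allowed is not None and entry[name] not in allowed:
            errors.append(f"value {entry[name]!r} not allowed for {name}")
    # Data categories outside the selection are flagged, not silently kept.
    for name in entry:
        if name not in selection:
            errors.append(f"data category not in selection: {name}")
    return errors
```

For instance, `validate({"term": "server", "partOfSpeech": "noun"})` returns an empty list, while an entry using an unselected category such as `usageNote` is reported.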


Levels of Modelling in the SALT Initiative

• Level 3: Specific data model / format

– a core structure, a data category specification, and a representation style are combined to define a member of the SALT family

– each member is fully interoperable with other members that use the same data category specification

• e.g. concrete relational database implementations, specific XLT implementations, subsets for industrial user groups such as TBX
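The claim that members sharing a data category specification are fully interoperable can be illustrated with two representation styles for the same data. In the sketch below, style A carries data categories as generic `<termNote type="...">` elements (loosely modelled on MARTIF/TBX-style markup) and style B uses one element per data category; because both read into the same neutral dictionary, conversion in either direction is lossless for this simplified case. The element layout is an assumption for illustration and omits namespaces, language sections and other structure found in real formats.

```python
import xml.etree.ElementTree as ET


def read_generic(xml_text):
    """Style A: data categories carried in <termNote type="..."> elements."""
    tig = ET.fromstring(xml_text)
    data = {"term": tig.findtext("term")}
    for note in tig.findall("termNote"):
        data[note.get("type")] = note.text
    return data


def read_dedicated(xml_text):
    """Style B: one child element per data category, named after it."""
    tig = ET.fromstring(xml_text)
    return {child.tag: child.text for child in tig}


def write_generic(data):
    """Serialize the neutral dict in style A."""
    tig = ET.Element("tig")
    ET.SubElement(tig, "term").text = data["term"]
    for name, value in data.items():
        if name != "term":
            ET.SubElement(tig, "termNote", type=name).text = value
    return ET.tostring(tig, encoding="unicode")


def write_dedicated(data):
    """Serialize the neutral dict in style B."""
    tig = ET.Element("tig")
    for name, value in data.items():
        ET.SubElement(tig, name).text = value
    return ET.tostring(tig, encoding="unicode")
```

Converting `'<tig><term>server</term><termNote type="partOfSpeech">noun</termNote></tig>'` with `write_dedicated(read_generic(...))` yields `'<tig><term>server</term><partOfSpeech>noun</partOfSpeech></tig>'`, and converting back reproduces the original string. The shared data category names are what make the round trip possible; the representation style alone is interchangeable.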


Cooperation and Concertation

The SALT consortium (U Vienna, U AS Cologne, U Surrey, LORIA Nancy, Termisti Brussels, EA Bozen/Bolzano, BYU Provo) cooperates with

• other HLT or IST projects (TQPro, Schemas, etc.)

• other EU projects under MLIS (TDCNet, GEMA, DINT, etc.)

• ELRA, EAFT

• EU Commission, UN-Jiamcatt group

• TEI, ISO, JTC 1, W3C

• LISA (OSCAR), including companies from industries other than IT (telecom, automotive engineering)

• FIT, etc.


Conclusions

• The SALT project contributes to a convergence process that is badly needed in the area of multilingual lex/term resources

• Technical/methodological convergence, resulting in interoperability and accessibility of MTRs, supports language industry markets
