olif v2 gr. thurmair april 2000. 2 olif april 2000 olif: overview rationale principles entries...

25
OLIF V2 Gr. Thurmair April 2000

Upload: katherine-pruitt

Post on 27-Mar-2015

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

OLIF V2

Gr. ThurmairApril 2000

Page 2: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

2 OLIF April 2000

OLIF: Overview

Rationale Principles Entries Descriptions Header Examples Status

Page 3: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

3 Rationale

The Exchange Problemtext

text

memory lookup

MT

term lookuptext

• the same text is processed with different tools• how do these tools communicate?

Page 4: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

4 Rationale

The Resource ProblemTranslation

Project

technical docuMT system 1

user docuMT system 2

marketing docu 1human / term bank 3

marketing docu 2human / term bank 4

• how often will the same term be stored?• who pays for redundant maintenance?

Page 5: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

5 Rationale

The Migration Problem

use of translation tool 1 problems

use of translation tool 2

• do you have to re-build your language resources?• who pays for rebuilding lexicons and memories?

Page 6: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

6 Principles

Problems in Exchange

Different purposes of exchange data import / export data validation

Different content of exchange terminological data, different for each system lexicographical data, different for each system

Different structures of exchange files trend towards markup structures (SGML/XML)

Page 7: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

7 Principles

OLIF: Principles

Keep it simple! flat feature value structures standard software environment

Keep it pragmatic! worry only about what’s there bottom-up: compare what systems have

Page 8: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

8 Principles

OLIF V2 formalism

Define a representation formalism: XML

general format; processing tools available but: overhead in markups

=> not suitable for very large files Coding standard: UTF-8

File structure header: globals, defaults, definitions body: sequence of entries

Page 9: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

9 Principles

OLIF principles

Entries are concept-based i.e. we describe word senses:

different readings -> different entries

Entries have (monolingual) descriptions both for MT and terminology

Entries have links to other entries inner-language: crossreference links intra-language: transfers (multilingual, directed)

Page 10: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

10 Entries

Entry Structure

“central” information definition features administrative features

monolingual / linguistic feature set terminological feature set transfer features cross-references / links

Page 11: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

11 Entries

Definition of the entry

Obligatory features: language

(only language or also locale?)

canonical form do we need guidelines (multiwords)

part of speech only open word classes: N, V, A, Adv, Prep

domain information (semantics) => a common top level classification?

reading no.

Page 12: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

12 Descriptions

Scope of the descriptions

minimal linguistic descriptions features which everybody has / needs

minimal terminology descriptions feature set of e.g. Interval

minimal transfer descriptions equitype, tests, transfers

minimal thesaurus / ontology relations ISO standards

additional fields for “personal” use

Page 13: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

13 Descriptions

Linguistic Descriptions

Morphological Features entry type

abbreviation, single word, compound, multiword)

inflection class enumeration of inflection patterns, per category

gender (special) number

singulare / plurale tantum

degree / comparative

Page 14: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

14 Descriptions

Linguistic descriptions

Syntactic Features Syntactic Type

(subcategorisation of part-of-speech)

Syntactic Frame Argument structures (DObj, PObj-for, ...)

(Transitivity) intransitive, transitive

Page 15: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

15 Descriptions

Linguistic Descriptions

Semantic Features Semantic type

for subclasses only? who has / needs it?

Page 16: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

16 Descriptions

Terminological Descriptions

Minimum needed to validate an entry(Interval) Definition Context Scope Comment / Note (validation status)

(a three-level hierarchy)

Page 17: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

17 Descriptions

Transfer Descriptions

Equivalence type full - partial (subset / superset) - none for reversible entries

Tests and Actions (to be worked out)

Comment (Definition of the “target” link)

Page 18: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

18 Descriptions

Cross-Reference Descriptions

Linktype thesaurus relations

broader / narrower / synonym / related

additional customisable relations abbreviation_for, forbidden, outdated

Definition of the “target” link

Page 19: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

19 Descriptions

Administrative Information

Source of Entry string

Author creation author last modofication

Date creation date last modification date

Page 20: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

20 Header

Header Information

Definition of Encoding given in the XML statement (UTF-8)

Definition of features / values used Definition of default values

Page 21: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

21 Example

Example (1)

<ENTRY> <MONO>

<LG> de </LG> <CAN> Brot </CAN>

<CAT> noun </CAT> <SA> gv </SA>

.... </MONO>

</ENTRY>

Page 22: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

22 Example

Example 2a

<Entry><MONO>.... <CAN> offshore account </CAN> <CAT> noun </CAT> <SA> Money-Laundering </SA> ...</MONO><XFR> <CAN> compte en banque à l‘étranger </CAN>

<LG> fr </LG> <CAT> noun </CAT> <SA> Money-Laundering </SA> </XFR><XFR> <CAN> Auslandskonto </CAN> <LG> de </LG> ...</Entry>

Page 23: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

23 Example

Example (2b)

<Entry><MONO> <CAN> compte en banque à l‘étranger </CAN> <CAT>noun </CAT> <SA>Money-Laundering </SA> ...</MONO><XFR> <CAN>offshore account </CAN> <LG>en </LG> <CAT>noun <SA>Money-Laundering </SA> </XFR><XFR><CAN>Auslandskonto </CAN> <LG>de </LG> ...</Entry>

Page 24: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

24 Status

OLIF: Status

Implementation of a central DB Implementation of OLIF parser & generator

BUT: different “flavours” of OLIF

dependent on projects (OTELO - Aventinus) different formalisms (own, SGML, XML)

Page 25: OLIF V2 Gr. Thurmair April 2000. 2 OLIF April 2000 OLIF: Overview Rationale Principles Entries Descriptions Header Examples Status

25 Status

Status

Verification of OLIF MT lexicons (Logos, T1) Term Bases (SAPterm, DanTerm)

Converter prototypes T1 <-> OLIF, Logos <-> OLIF

Term Lookup systems based on OLIF-type database

Comparisons with other formats