

Machine Translation – Classical and Statistical Approaches

Session 4: Interlingua-based MT
Jonas Kuhn

Universität des Saarlandes, Saarbrücken
The University of Texas at Austin

[email protected]

DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

Jonas Kuhn: MT 2

Session 4: Interlingua-based MT

Dorr (1992, 1994): UNITRAN system
Classification of divergences
Lexical Conceptual Structure
Translation mappings between syntactic structure and LCS representations
Language-specific exceptions to translation mappings

UNITRAN
Translation between Spanish, English and German (bidirectionally)

Translation divergences

(1) Thematic divergence:
E: I like Mary
S: María me gusta a mí 'Mary pleases me'

(2) Promotional divergence:
E: John usually goes home
S: Juan suele ir a casa 'John tends to go home'

(3) Demotional divergence:
E: I like eating
G: Ich esse gern 'I eat likingly'

(4) Structural divergence:
E: John entered the house
S: Juan entró en la casa 'John entered in the house'

(5) Conflational divergence:
E: I stabbed John
S: Yo le di puñaladas a Juan 'I gave knife-wounds to John'

(6) Categorial divergence:
E: I am hungry
G: Ich habe Hunger 'I have hunger'

(7) Lexical divergence:
E: John broke into the room
S: Juan forzó la entrada al cuarto 'John forced (the) entry to the room'

Lexical Conceptual Structure

Following Jackendoff (1983, 1990)

Example:
English: Bill went into the house
LCS: GO(BILL,TO(IN(HOUSE)))
Spanish: Bill entró a la casa.

LCS – Definitions
Definition 1 (Dorr 1994)

A lexical conceptual structure (LCS) is a modified version of the representation proposed by Jackendoff (1983, 1990) that conforms to the following structural form:

[T(X') X' ([T(W') W'], [T(Z'1) Z'1] ... [T(Z'n) Z'n], [T(Q'1) Q'1] ... [T(Q'n) Q'n])]

This corresponds to the tree-like representation shown in Figure 2, in which (1) X' is the logical head; (2) W' is the logical subject; (3) Z'1 ... Z'n are the logical arguments; and (4) Q'1 ... Q'n are the logical modifiers.

[Figure 2: tree-like representation of the LCS]

In addition, T(φ) is the logical type (Event, State, Path, Position, etc.) corresponding to the primitive φ (CAUSE, LET, GO, STAY, BE, etc.).

Primitives are further categorized into fields (e.g., Possessional, Identificational, Temporal, Locational, etc.).

LCS – Definitions

Example 1
John went happily to school

[Event GO_Loc
  ([Thing JOHN],
   [Path TO_Loc ([Position AT_Loc ([Thing JOHN], [Location SCHOOL])])],
   [Manner HAPPILY])]

Logical Head: GO_Loc
Logical Subject: [Thing JOHN]
Logical Argument: [Path TO_Loc ...]
Logical Modifier: [Manner HAPPILY]
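The structure in Example 1 can be written down as a small recursive data type. The following is only a sketch for illustration, not UNITRAN's own representation: the class name `LCS`, its field names, and the comma-separated rendering are assumptions made here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LCS:
    """One LCS node (hypothetical layout): a typed primitive or constant plus its dependents."""
    type: str                                              # logical type, e.g. "Event", "Path", "Thing"
    head: str                                              # logical head X', e.g. "GO_Loc" or "JOHN"
    subject: Optional["LCS"] = None                        # logical subject W'
    arguments: List["LCS"] = field(default_factory=list)   # logical arguments Z'1 ... Z'n
    modifiers: List["LCS"] = field(default_factory=list)   # logical modifiers Q'1 ... Q'n

    def render(self) -> str:
        """Return a bracketed string: subject first, then arguments, then modifiers."""
        parts = []
        if self.subject is not None:
            parts.append(self.subject.render())
        parts.extend(arg.render() for arg in self.arguments)
        parts.extend(mod.render() for mod in self.modifiers)
        inner = f" ({', '.join(parts)})" if parts else ""
        return f"[{self.type} {self.head}{inner}]"


# "John went happily to school"
john = LCS("Thing", "JOHN")
at = LCS("Position", "AT_Loc", subject=john, arguments=[LCS("Location", "SCHOOL")])
to = LCS("Path", "TO_Loc", arguments=[at])
went = LCS("Event", "GO_Loc", subject=john, arguments=[to],
           modifiers=[LCS("Manner", "HAPPILY")])

print(went.render())
```

Keeping subject, arguments, and modifiers in separate fields mirrors the four roles named in Definition 1, so each callout label corresponds to exactly one field.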


LCS – Definitions

Types and primitives:

LCS – Definitions
Primitives must adhere to constraints on argument structure

Spatial dimension

Causal dimension

LCS – Definitions
Field dimension (specialization of a primitive stating under which domain it is interpreted, e.g., GO_Loc vs. GO_Temp)

Footnote 14: Technically the second argument for each of these fields is a Path or a Position. For the purposes of the current description the column under “Argument 2” refers to the lowest leaf node embedded inside of the second argument.

LCS – Definitions
LCS representation in the lexicon and as the interlingua representation

Definition 2 (Dorr 1994)
An RLCS (i.e., a root LCS) is an uninstantiated LCS that is associated with a word definition in the lexicon (i.e., an LCS with unfilled variable positions).

Definition 3 (Dorr 1994)
A CLCS (i.e., a composed LCS) is an instantiated LCS that is the result of combining two or more RLCSs by means of unification (roughly). This is the interlingua, or language-independent, form that serves as the pivot between the source and target languages.

LCS – Definitions
Examples of RLCSs and CLCSs:

RLCS associated with the word go:
[Event GO_Loc ([Thing X], [Path TO_Loc ([Position AT_Loc ([Thing X], [Location Z])])])]

CLCS: composing the RLCSs for go, John, school, and happily yields the LCS seen previously (using a concept of “unification”)
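The step from RLCS to CLCS can be pictured as filling variable positions. The sketch below uses plain type-checked substitution as a stand-in for the system's "unification", which it does not reproduce. The tuple encoding and the helper names `instantiate` and `add_modifier` are hypothetical.

```python
# An LCS node is encoded as a tuple: (type, primitive_or_variable, child, child, ...).
GO_RLCS = ("Event", "GO_Loc",
           ("Thing", "X"),
           ("Path", "TO_Loc",
            ("Position", "AT_Loc", ("Thing", "X"), ("Location", "Z"))))

VARIABLES = {"X", "Z"}  # unfilled positions in the RLCS for "go"

# RLCSs of the other words are constants here (no open positions).
JOHN = ("Thing", "JOHN")
SCHOOL = ("Location", "SCHOOL")
HAPPILY = ("Manner", "HAPPILY")


def instantiate(node, bindings):
    """Replace each variable leaf by its bound filler, checking that the types agree."""
    node_type, label, *children = node
    if label in VARIABLES:
        if label not in bindings:
            return node  # position stays unfilled
        filler = bindings[label]
        if filler[0] != node_type:
            raise ValueError(f"type clash at {label}: {filler[0]} vs {node_type}")
        return filler
    return (node_type, label, *[instantiate(child, bindings) for child in children])


def add_modifier(node, modifier):
    """Attach a logical modifier as the last dependent of the node."""
    return node + (modifier,)


clcs = add_modifier(instantiate(GO_RLCS, {"X": JOHN, "Z": SCHOOL}), HAPPILY)
print(clcs)
```

Both occurrences of X receive the same filler, which is what makes JOHN appear as the logical subject of GO_Loc and again inside the AT_Loc position.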

Composition of LCSs

Notion of “unification” differs from standard unification

Not directly invertible
More “relaxed” notion (for words associated with special parameters like :INT, :EXT, :PROMOTE etc.)

Composition of LCSs
Composition based on syntactic parse (following the GB framework, i.e., Government-and-Binding theory)

Definition 4 (Dorr 1994)
A syntactic phrase is a maximal projection that conforms to the following structural form:

[Figure callouts: Syntactic Head, External Argument, Internal Arguments, Syntactic Adjuncts]

Composition of LCSs

Example
John went happily to school

Syntactic Head: went
External Argument: John
Internal Argument: to school
Syntactic Adjunct: happily

The translation mappings

Generalized linking routine (GLR)

Canonical syntactic realization (CSR)

The translation mappings

Generalized linking routine (GLR)

Simplified schema:

X: Syntactic Head ↔ X': Logical Head
W: External Argument ↔ W': Logical Subject
Z: Internal Argument ↔ Z': Logical Argument
Q: Syntactic Adjunct ↔ Q': Logical Modifier
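Read as a lookup table, the simplified schema pairs each syntactic position with one logical position and can be applied in either direction. The following minimal sketch covers only this default pairing. The dictionary keys and the function names `analyze` and `generate` are made up for illustration.

```python
# Default correspondences of the simplified schema (X-X', W-W', Z-Z', Q-Q').
GLR = {
    "syntactic_head": "logical_head",
    "external_argument": "logical_subject",
    "internal_argument": "logical_argument",
    "syntactic_adjunct": "logical_modifier",
}
# The same table read in the generation direction.
INVERSE_GLR = {logical: syntactic for syntactic, logical in GLR.items()}


def analyze(parse):
    """Map syntactic positions to logical positions (source-language side)."""
    return {GLR[position]: phrase for position, phrase in parse.items()}


def generate(roles):
    """Map logical positions back to syntactic positions (target-language side)."""
    return {INVERSE_GLR[role]: phrase for role, phrase in roles.items()}


parse = {
    "syntactic_head": "went",
    "external_argument": "John",
    "internal_argument": "to school",
    "syntactic_adjunct": "happily",
}
roles = analyze(parse)
print(roles)
```

With no language-specific exceptions, `generate(analyze(parse))` returns the original assignment unchanged.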

The translation mappings

Generalized linking routine (GLR)

Example

The translation mappings

Canonical syntactic realization (CSR)

The Divergence Problem

There can be (language-specific) exceptions to the GLR and/or the CSR.
Translation divergences occur when such exceptions apply in one language but not in the other.

Formal classification of lexical-semantic divergences

Addressing the Divergence Problem

Parameters for encoding language-specific information

GLR, CSR: language-independent
Parameters: language-specific information about lexical items

Seven parameters:
:INT
:EXT
:PROMOTE
:DEMOTE
*
:CAT
:CONFLATED

Thematic Divergence

E: I like Mary
S: María me gusta a mí 'Mary pleases me'

Arises only where there is a logical subject

Thematic Divergence

Encoded with the :INT and :EXT parameters
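One way to picture the effect of :INT and :EXT is as per-word overrides of the default linking. This toy sketch assumes that :INT sends the logical subject to internal argument position and :EXT sends the logical argument to external argument position. The lexicon layout and the function `realize` are hypothetical.

```python
# Default linking for the two positions involved in thematic divergence.
DEFAULT_LINKS = {
    "logical_subject": "external_argument",
    "logical_argument": "internal_argument",
}

# Toy lexicon: "like" has no markers, "gustar" carries both overrides.
LEXICON = {
    "like": {},
    "gustar": {
        "logical_subject": "internal_argument",   # effect assumed for :INT
        "logical_argument": "external_argument",  # effect assumed for :EXT
    },
}


def realize(verb, roles):
    """Assign logical roles to syntactic positions, applying the verb's overrides."""
    links = {**DEFAULT_LINKS, **LEXICON[verb]}
    return {links[role]: filler for role, filler in roles.items()}


roles = {"logical_subject": "I", "logical_argument": "MARY"}  # shared interlingua roles
english = realize("like", roles)
spanish = realize("gustar", roles)
print(english)
print(spanish)
```

The roles passed in are identical for both languages; only the lexical entry decides which participant surfaces as subject.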

Thematic Divergence

Translation mapping for English relies on GLR defaults

Parameter markings

Parameter markers such as :INT and :EXT show up only in the RLCS (for lexicon entries).
The CLCS does not include such markers; it is a language-independent representation.

Promotional Divergence

E: John usually goes home
S: Juan suele ir a casa 'John tends to go home'

Logical Modifier, Logical Head

Logical Argument, Logical Head

Demotional Divergence

E: I like eating
G: Ich esse gern 'I eat likingly'

:DEMOTE parameter: logical head and logical argument swap places

Divergence Types

The difference between promotional and demotional divergences

In promotional divergences (e.g., soler-usually), the verb (soler) triggers the head switching, no matter what event is substituted as its argument.
In demotional divergences (e.g., like-gern), the adverbial satellite (gern) is the trigger.
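The two kinds of head switching can be contrasted in a toy realization function. This is a sketch of one reading of the slides, not the system's algorithm: the function `realize`, the dictionary layout, and the exact positions chosen for each marker are assumptions.

```python
def realize(clcs, marker=None):
    """Assign the words of a toy CLCS to syntactic positions.

    clcs maps "head", and optionally "argument" and "modifier", to the
    target-language words chosen for those logical positions.
    """
    if marker == ":PROMOTE":
        # The modifier's word becomes the main verb; the logical head is
        # realized beneath it as an internal argument.
        return {"syntactic_head": clcs["modifier"], "internal_argument": clcs["head"]}
    if marker == ":DEMOTE":
        # The logical head's word becomes an adjunct; the logical argument
        # takes over the main verb position.
        return {"syntactic_head": clcs["argument"], "syntactic_adjunct": clcs["head"]}
    # No marker: default linking.
    positions = {"syntactic_head": clcs["head"]}
    if "argument" in clcs:
        positions["internal_argument"] = clcs["argument"]
    if "modifier" in clcs:
        positions["syntactic_adjunct"] = clcs["modifier"]
    return positions


# Promotional: usually / soler
english_habit = realize({"head": "goes", "modifier": "usually"})
spanish_habit = realize({"head": "ir", "modifier": "suele"}, marker=":PROMOTE")

# Demotional: like / gern
english_like = realize({"head": "like", "argument": "eating"})
german_like = realize({"head": "gern", "argument": "esse"}, marker=":DEMOTE")

print(spanish_habit)
print(german_like)
```

In the promotional case the marker would sit on the entry that ends up as the main verb (soler), in the demotional case on the entry that ends up as the adverbial (gern), which matches the difference in triggers described above.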

Structural Divergence

E: John entered the house
S: Juan entró en la casa 'John entered in the house'

In structural divergence it is not the positions in the GLR mapping that are altered, but the nature of the relation between the different positions

Conflational Divergence

E: I stabbed John
S: Yo le di puñaladas a Juan 'I gave knife-wounds to John'

The logical argument (puñaladas, 'knife-wounds') is suppressed in English: it is not realized syntactically.
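Suppression of a conflated argument can be sketched as a filter at realization time. The concept labels, the word lists, and the function `realize_arguments` below are hypothetical and serve only to illustrate the idea.

```python
def realize_arguments(arguments, conflated=frozenset()):
    """Return the words for all logical arguments except the conflated ones.

    arguments: list of (concept, word) pairs in surface order.
    conflated: concepts already expressed by the verb itself.
    """
    return [word for concept, word in arguments if concept not in conflated]


# English "stab" is assumed to conflate the KNIFE-WOUND argument.
english = realize_arguments(
    [("KNIFE-WOUND", "knife-wounds"), ("JOHN", "John")],
    conflated={"KNIFE-WOUND"},
)

# Spanish "dar" conflates nothing, so both arguments surface.
spanish = realize_arguments(
    [("KNIFE-WOUND", "puñaladas"), ("JOHN", "a Juan")],
)

print(english)
print(spanish)
```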

Divergence Types

(1) Thematic divergence
(2) Promotional divergence
(3) Demotional divergence
(4) Structural divergence
(5) Conflational divergence
(6) Categorial divergence
(7) Lexical divergence

Default operation of GLR is changed: (1)-(5)
Default operation of CSR is changed: (6)

Categorial Divergence

E: I am hungry
G: Ich habe Hunger 'I have hunger'
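A categorial divergence can be pictured as a lexical override of a default type-to-category table. In the sketch below the table contents, the treatment of the hunger concept as a Property, and the function `syntactic_category` are all assumptions. The CSR table from the slides is not part of this transcript.

```python
# Assumed default realization of logical types as syntactic categories
# (illustrative only; not taken from the slides).
CSR = {
    "Event": "V",
    "State": "V",
    "Thing": "N",
    "Property": "A",
    "Path": "P",
    "Manner": "ADV",
}


def syntactic_category(lcs_type, cat_override=None):
    """Return the category for a logical type, unless a :CAT-style override is given."""
    return cat_override if cat_override is not None else CSR[lcs_type]


english = syntactic_category("Property")                   # "hungry" as an adjective
german = syntactic_category("Property", cat_override="N")  # "Hunger" as a noun
print(english, german)
```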

Lexical Divergence

Arises only in the context of other divergence types
Choice of lexical items in any language relies on the realization and composition properties of those items
Since the various other divergences alter these properties, lexical divergence is viewed as a side effect of other divergences

No specific override markers used

Lexical Divergence

E: John broke into the room
S: Juan forzó la entrada al cuarto 'John forced (the) entry to the room'

Conflational divergence forces the occurrence of a lexical divergence

“break into” subsumes two concepts

Discussion

Full coverage constraint

Generation-based view of GB parsing

Bias in “interlingua” representation?