machine translation challenges and language divergences

35
Machine Translation Challenges and Language Divergences Alon Lavie Language Technologies Institute Carnegie Mellon University 11-731: Machine Translation January 12, 2011

Upload: axl

Post on 05-Feb-2016

49 views

Category:

Documents


0 download

DESCRIPTION

Machine Translation Challenges and Language Divergences. Alon Lavie Language Technologies Institute Carnegie Mellon University 11-731: Machine Translation January 12, 2011. Major Sources of Translation Problems. Lexical Differences: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Machine Translation Challenges and Language Divergences

Machine TranslationChallenges and Language Divergences

Alon LavieLanguage Technologies Institute

Carnegie Mellon University

11-731: Machine TranslationJanuary 12, 2011

Page 2: Machine Translation Challenges and Language Divergences

January 12, 2011

Major Sources of Translation Problems

• Lexical Differences:– Multiple possible translations for SL word, or

difficulties expressing SL word meaning in a single TL word

• Structural Differences:– Syntax of SL is different than syntax of the TL: word

order, sentence and constituent structure

• Differences in Mappings of Syntax to Semantics:– Meaning in TL is conveyed using a different syntactic

structure than in the SL

• Idioms and Constructions

211-731: Machine Translation

Page 3: Machine Translation Challenges and Language Divergences

January 12, 2011

Lexical Differences

• SL word has several different meanings, that translate differently into TL– Ex: financial bank vs. river bank

• Lexical Gaps: SL word reflects a unique meaning that cannot be expressed by a single word in TL– Ex: English snub doesn’t have a corresponding verb

in French or German• TL has finer distinctions than SL SL word

should be translated differently in different contexts– Ex: English wall can be German wand (internal),

mauer (external)

311-731: Machine Translation

Page 4: Machine Translation Challenges and Language Divergences

January 12, 2011

Google at Work…

411-731: Machine Translation

Page 5: Machine Translation Challenges and Language Divergences

January 12, 2011 511-731: Machine Translation

Page 6: Machine Translation Challenges and Language Divergences

January 12, 2011 611-731: Machine Translation

Page 7: Machine Translation Challenges and Language Divergences

January 12, 2011

Lexical Differences

• Lexical gaps: – Examples: these have no direct equivalent in

English:

gratiner(v., French, “to cook with a cheese coating”)

ōtosanrin(n., Japanese, “three-wheeled truck or van”)

711-731: Machine Translation

Page 8: Machine Translation Challenges and Language Divergences

January 12, 2011

[From Hutchins & Somers]

Lexical Differences

811-731: Machine Translation

Page 9: Machine Translation Challenges and Language Divergences

January 12, 2011

MT Handling of Lexical Differences

• Direct MT and Syntactic Transfer:– Lexical Transfer stage uses bilingual lexicon– SL word can have multiple translation entries,

possibly augmented with disambiguation features or probabilities

– Lexical Transfer can involve use of limited context (on SL side, TL side, or both)

– Lexical Gaps can partly be addressed via phrasal lexicons

• Semantic Transfer:– Ambiguity of SL word must be resolved during

analysis correct symbolic representation at semantic level

– TL Generation must select appropriate word or structure for correctly conveying the concept in TL

911-731: Machine Translation

Page 10: Machine Translation Challenges and Language Divergences

January 12, 2011

Structural Differences

• Syntax of SL is different than syntax of the TL: – Word order within constituents:

• English NPs: art adj n the big boy• Hebrew NPs: art n art adj ha yeled ha gadol

– Constituent structure:• English is SVO: Subj Verb Obj I saw the man• Modern Arabic is VSO: Verb Subj Obj

– Different verb syntax:• Verb complexes in English vs. in German I can eat the apple Ich kann den apfel essen

– Case marking and free constituent order• German and other languages that mark case: den apfel esse Ich the(acc) apple eat I(nom)

1011-731: Machine Translation

Page 11: Machine Translation Challenges and Language Divergences

January 12, 2011 1111-731: Machine Translation

Page 12: Machine Translation Challenges and Language Divergences

January 12, 2011 1211-731: Machine Translation

Page 13: Machine Translation Challenges and Language Divergences

January 12, 2011 1311-731: Machine Translation

Page 14: Machine Translation Challenges and Language Divergences

January 12, 2011

MT Handling of Structural Differences

• Direct MT Approaches:– No explicit treatment: Phrasal Lexicons and sentence

level matches or templates• Syntactic Transfer:

– Structural Transfer Grammars• Trigger rule by matching against syntactic structure on

SL side• Rule specifies how to reorder and re-structure the

syntactic constituents to reflect syntax of TL side

• Semantic Transfer:– SL Semantic Representation abstracts away from SL

syntax to functional roles done during analysis– TL Generation maps semantic structures to correct

TL syntax

1411-731: Machine Translation

Page 15: Machine Translation Challenges and Language Divergences

January 12, 2011

Syntax-to-Semantics Differences

• Meaning in TL is conveyed using a different syntactic structure than in the SL– Changes in verb and its arguments– Passive constructions– Motion verbs and state verbs– Case creation and case absorption

• Main Distinction from Structural Differences:– Structural differences are mostly independent of

lexical choices and their semantic meaning addressed by transfer rules that are syntactic in nature

– Syntax-to-semantic mapping differences are meaning-specific: require the presence of specific words (and meanings) in the SL

1511-731: Machine Translation

Page 16: Machine Translation Challenges and Language Divergences

January 12, 2011

Syntax-to-Semantics Differences

• Structure-change example:I like swimming“Ich scwhimme gern”I swim gladly

1611-731: Machine Translation

Page 17: Machine Translation Challenges and Language Divergences

January 12, 2011 1711-731: Machine Translation

Page 18: Machine Translation Challenges and Language Divergences

January 12, 2011

Syntax-to-Semantics Differences

• Verb-argument example:Jones likes the film.“Le film plait à Jones.”(lit: “the film pleases to Jones”)

• Use of case roles can eliminate the need for this type of transfer – Jones = Experiencer– film = Theme

1811-731: Machine Translation

Page 19: Machine Translation Challenges and Language Divergences

January 12, 2011 1911-731: Machine Translation

Page 20: Machine Translation Challenges and Language Divergences

January 12, 2011 2011-731: Machine Translation

Page 21: Machine Translation Challenges and Language Divergences

January 12, 2011

Syntax-to-Semantics Differences

• Passive Constructions• Example: French reflexive passives:

Ces livres se lisent facilement*”These books read themselves easily”These books are easily read

2111-731: Machine Translation

Page 22: Machine Translation Challenges and Language Divergences

January 12, 2011 2211-731: Machine Translation

Page 23: Machine Translation Challenges and Language Divergences

January 12, 2011 2311-731: Machine Translation

Page 24: Machine Translation Challenges and Language Divergences

January 12, 2011

Same intention, different syntax

• rigly bitiwgacny my leg hurts• candy wagac fE rigly I have pain in my leg• rigly bitiClimny my leg hurts• fE wagac fE rigly there is pain in my leg• rigly bitinqaH calya my leg bothers on me

Romanization of Arabic from CallHome Egypt.

2411-731: Machine Translation

Page 25: Machine Translation Challenges and Language Divergences

January 12, 2011

MT Handling of Syntax-to-Semantics Differences

• Direct MT Approaches:– No Explicit treatment: Phrasal Lexicons and sentence

level matches or templates• Syntactic Transfer:

– “Lexicalized” Structural Transfer Grammars• Trigger rule by matching against “lexicalized” syntactic

structure on SL side: lexical and functional features• Rule specifies how to reorder and re-structure the

syntactic constituents to reflect syntax of TL side

• Semantic Transfer:– SL Semantic Representation abstracts away from SL

syntax to functional roles done during analysis– TL Generation maps semantic structures to correct

TL syntax

2511-731: Machine Translation

Page 26: Machine Translation Challenges and Language Divergences

January 12, 2011

Idioms and Constructions

• Main Distinction: meaning of whole is not directly compositional from meaning of its sub-parts no compositional translation

• Examples:– George is a bull in a china shop– He kicked the bucket– Can you please open the window?

2611-731: Machine Translation

Page 27: Machine Translation Challenges and Language Divergences

January 12, 2011 2711-731: Machine Translation

Page 28: Machine Translation Challenges and Language Divergences

January 12, 2011

Formulaic Utterances

• Good night.• tisbaH cala xEr waking up on good

• Romanization of Arabic from CallHome Egypt

2811-731: Machine Translation

Page 29: Machine Translation Challenges and Language Divergences

January 12, 2011

Constructions

• Identifying speaker intention rather than literal meaning for formulaic and task-oriented sentences.

How about … suggestionWhy don’t you… suggestionCould you tell me… request

info. I was wondering… request info.

2911-731: Machine Translation

Page 30: Machine Translation Challenges and Language Divergences

January 12, 2011 3011-731: Machine Translation

Page 31: Machine Translation Challenges and Language Divergences

January 12, 2011

MT Handling of Constructions and Idioms

• Direct MT Approaches:– No Explicit treatment: Phrasal Lexicons and sentence level

matches or templates• Syntactic Transfer:

– No effective treatment– “Highly Lexicalized” Structural Transfer rules can handle

some constructions• Trigger rule by matching against entire construction, including

structure on SL side• Rule specifies how to generate the correct construction on the

TL side• Semantic Transfer:

– Analysis must capture non-compositional representation of the idiom or construction specialized rules

– TL Generation maps construction semantic structures to correct TL syntax and lexical words

3111-731: Machine Translation

Page 32: Machine Translation Challenges and Language Divergences

January 12, 2011

Take Home Messages• Remember these types of language divergences as you

learn about and apply the various steps in the MT system pipelines of different approaches!

• Ask yourself how capable these various steps and approaches are in addressing these types of divergences!– Can the step/approach handle these divergences?– If so, does it model the divergence at the appropriate level

of abstraction?• Keep these language divergences in mind when you

analyze the errors of the MT system that you have put together and trained!– Are the errors attributable to a particular divergence?– What would be required for the system to address this type

of error?

3211-731: Machine Translation

Page 33: Machine Translation Challenges and Language Divergences

January 12, 2011

Summary

• Main challenges for current state-of-the-art MT approaches - Coverage and Accuracy:– Acquiring broad-coverage high-accuracy translation

lexicons (for words and phrases)– learning syntactic mappings between languages

from parallel word-aligned data– overcoming syntax-to-semantics differences and

dealing with constructions– Effective Target Language Modeling

3311-731: Machine Translation

Page 34: Machine Translation Challenges and Language Divergences

January 12, 2011

Homework Assignment #1

3411-731: Machine Translation

Page 35: Machine Translation Challenges and Language Divergences

January 12, 2011

Questions…

3511-731: Machine Translation