Outline: Overview, System Overview, Language Modeling, Results

Multi-Engine Machine Translation

Kenneth Heafield

Language Technologies InstituteCarnegie Mellon University

May 26, 2011

Kenneth Heafield Multi-Engine Machine Translation

Pipeline

Input → Translate (three individual systems, run in parallel) → Align (METEOR) → Decode (this work) → Output

Arabic-English Example Combination

System 1: So even if that was meaningful , it is because you were late

System 2: Even if feasible , it is because you have been delayed

Combine

Combined: Even if feasible , it is because you were late

≈ Compare

Reference: And even if that was useful , it was because you were late

Search Space

Sentence Pair Alignment

Match surface forms, stems, WordNet synsets, and learned correspondences

Minimize crossing alignments

Twice that produced by nuclear plants

Double that that produce nuclear power stations
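
The matching criteria above can be illustrated with a small Python sketch. It is not the METEOR aligner: it covers only surface and stem matches, `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, and `greedy_align` is a simple greedy heuristic that picks, for each word, the matching partner that adds the fewest crossings. WordNet synsets and learned correspondences are left out.

```python
from typing import List, Optional, Tuple

Link = Tuple[int, int]  # (index in first sentence, index in second sentence)


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_type(a: str, b: str) -> Optional[str]:
    """Return 'surface', 'stem', or None if the two words do not match."""
    if a.lower() == b.lower():
        return "surface"
    if crude_stem(a) == crude_stem(b):
        return "stem"
    return None


def crossings(links: List[Link]) -> int:
    """Count pairs of links that cross each other."""
    return sum(
        1
        for x, (i, j) in enumerate(links)
        for (k, l) in links[x + 1:]
        if (i - k) * (j - l) < 0
    )


def greedy_align(first: List[str], second: List[str]) -> List[Link]:
    """One-to-one alignment; each word takes the partner adding fewest crossings."""
    links: List[Link] = []
    taken = set()
    for i, a in enumerate(first):
        options = [j for j, b in enumerate(second)
                   if j not in taken and match_type(a, b)]
        if not options:
            continue  # word stays unaligned
        best = min(options, key=lambda j: (crossings(links + [(i, j)]), j))
        links.append((i, best))
        taken.add(best)
    return links


first = "Twice that produced by nuclear plants".split()
second = "Double that that produce nuclear power stations".split()
for i, j in greedy_align(first, second):
    print(first[i], "<->", second[j], match_type(first[i], second[j]))
```

On the slide's sentence pair this links "that", "produced"/"produce", and "nuclear". Pairs such as "Twice"/"Double" or "plants"/"power stations" would need the synonym and learned-correspondence matchers, which this sketch omits.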

Search Space

Algorithm

Start at the beginning of each sentence

Branch by appending the first unused word from a system

Use the appended word and those aligned with it

Loop until all hypotheses reach end of sentence

Example

System 1: Now can know why .

System 2: Now we can now know why .

Partial hypothesis: (empty)

Next-word candidates: Now (System 1), Now (System 2)

Partial hypothesis: Now

Next-word candidates: can (System 1), we (System 2)

Partial hypothesis: Now we

Next-word candidates: can (System 1), can (System 2)

Partial hypothesis: Now we can

Next-word candidates: know (System 1), now (System 2)
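
The search-space walk-through can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's decoder: the alignment table is written by hand for this toy pair, the names `first_unused`, `candidates`, and `extend` are hypothetical, and scoring and pruning are left out.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Position = Tuple[int, int]  # (system index, word index)

SYSTEMS: List[List[str]] = [
    "Now can know why .".split(),
    "Now we can now know why .".split(),
]

# Hand-written alignment for the toy pair (an assumption, not aligner output):
# word i of system 0 is aligned with word j of system 1.
ALIGNED_PAIRS = [(0, 0), (1, 2), (2, 4), (3, 5), (4, 6)]

ALIGNMENT: Dict[Position, List[Position]] = {}
for i, j in ALIGNED_PAIRS:
    ALIGNMENT.setdefault((0, i), []).append((1, j))
    ALIGNMENT.setdefault((1, j), []).append((0, i))


def first_unused(system: int, used: FrozenSet[Position]) -> Optional[int]:
    """Index of the first unused word of a system, or None at end of sentence."""
    for index in range(len(SYSTEMS[system])):
        if (system, index) not in used:
            return index
    return None


def candidates(used: FrozenSet[Position]) -> List[Tuple[int, str]]:
    """Branching options: the first unused word from each system."""
    options = []
    for system in range(len(SYSTEMS)):
        index = first_unused(system, used)
        if index is not None:
            options.append((system, SYSTEMS[system][index]))
    return options


def extend(hypothesis: List[str], used: FrozenSet[Position], system: int):
    """Append a system's first unused word; use it and the words aligned with it."""
    index = first_unused(system, used)
    if index is None:
        raise ValueError("system has reached end of sentence")
    position = (system, index)
    newly_used = {position, *ALIGNMENT.get(position, [])}
    return hypothesis + [SYSTEMS[system][index]], used | newly_used


hypothesis: List[str] = []
used: FrozenSet[Position] = frozenset()
print("(empty)", candidates(used))
for system in (0, 1, 0):  # follow the path from the slides: Now, we, can
    hypothesis, used = extend(hypothesis, used, system)
    print(" ".join(hypothesis), candidates(used))
```

Each printed line shows a partial hypothesis followed by the next-word candidates as (system index, word) pairs, with system index 0 standing for System 1. A full decoder would keep many such hypotheses at once and rank them with the features.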

Features

Length

Length of hypothesis

Match

How well each system matches the combined output

Language Model

Model: log probability from a language model

OOV: count of words unknown to the model
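
One way to picture how these features could rank a hypothesis is a weighted sum, sketched below in Python. Everything specific here is an assumption for illustration: the slide does not say how the match feature is computed (unigram overlap is used as a stand-in), `TOY_LM` is a made-up unigram table rather than a real language model, and the weights are arbitrary.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical unigram probabilities standing in for a real language model.
TOY_LM: Dict[str, float] = {
    "now": 0.2, "we": 0.2, "can": 0.2, "know": 0.15, "why": 0.15, ".": 0.1,
}


def features(hypothesis: List[str], systems: List[List[str]]) -> Dict[str, float]:
    """Compute length, per-system match, LM log probability, and OOV count."""
    feats = {"length": float(len(hypothesis))}
    hyp_counts = Counter(w.lower() for w in hypothesis)
    for n, system in enumerate(systems):
        sys_counts = Counter(w.lower() for w in system)
        # Assumed match measure: number of words shared with this system.
        feats[f"match_{n}"] = float(sum((hyp_counts & sys_counts).values()))
    known = [w.lower() for w in hypothesis if w.lower() in TOY_LM]
    feats["lm"] = sum(math.log10(TOY_LM[w]) for w in known)
    feats["oov"] = float(len(hypothesis) - len(known))
    return feats


def score(feats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear model: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in feats.items())


systems = ["Now can know why .".split(), "Now we can now know why .".split()]
feats = features("Now we can know why .".split(), systems)
# Arbitrary illustrative weights; real weights would be tuned on held-out data.
weights = {"length": 0.1, "match_0": 0.5, "match_1": 0.5, "lm": 1.0, "oov": -1.0}
print(feats)
print(score(feats, weights))
```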

Language Modeling

Our implementations compared to SRILM, the de facto standard:

Probing: 2.4x the speed, 57% of the memory

Trie: 1.3x the speed, 30% of the memory
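
The slide does not describe the two data structures. As an illustration only, and assuming "Probing" refers to a hash table with linear probing keyed by n-gram, the lookup idea might be sketched in Python like this. The class and method names are hypothetical, and the sketch says nothing about how the reported speed and memory figures were obtained.

```python
from typing import List, Optional, Tuple

NGram = Tuple[str, ...]


class ProbingTable:
    """Toy linear-probing hash table mapping n-grams to log probabilities."""

    def __init__(self, capacity: int) -> None:
        self.keys: List[Optional[NGram]] = [None] * capacity
        self.values: List[float] = [0.0] * capacity

    def _slots(self, ngram: NGram):
        """Yield slots starting at the hashed position, wrapping around once."""
        start = hash(ngram) % len(self.keys)
        for offset in range(len(self.keys)):
            yield (start + offset) % len(self.keys)

    def insert(self, ngram: NGram, log_prob: float) -> None:
        for slot in self._slots(ngram):
            if self.keys[slot] is None or self.keys[slot] == ngram:
                self.keys[slot] = ngram
                self.values[slot] = log_prob
                return
        raise MemoryError("table is full")

    def lookup(self, ngram: NGram) -> Optional[float]:
        for slot in self._slots(ngram):
            if self.keys[slot] is None:
                return None  # an empty slot ends the probe: n-gram is absent
            if self.keys[slot] == ngram:
                return self.values[slot]
        return None


table = ProbingTable(capacity=8)
table.insert(("we", "can"), -0.5)
table.insert(("can", "know"), -0.7)
print(table.lookup(("we", "can")))
print(table.lookup(("can", "fly")))
```

A lookup touches a short run of adjacent slots, which is the usual argument for why linear probing is fast; a trie trades some of that speed for a more compact layout.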

Impact on Applications

Substantial reduction in CPU and RAM costs

Replacing SRILM: Moses, cdec, and Joshua use my code

Also applies to speech recognition and information retrieval

Recent Results

Track                  ∆BLEU   ∆-TER   ∆METEOR
GALE Arabic-English    1.02, 0.99 (only two values appear for this row; their columns are unclear)
NIST Arabic-English     6.67    3.02    3.68
Urdu-English            1.84    0.87    0.74
Czech-English           0.26   -1.05    0.73
German-English          1.91    1.76    0.93
Spanish-English         3.64    2.68    1.89
French-English          2.05    1.81    1.40
English-Czech           0.43    0.21    0.28
English-German          0.53   -0.14   -0.04
English-Spanish         2.62    3.01    1.00
English-French          1.08    0.40    0.76

Table: Performance gains over the best individual system in recent external evaluations: DARPA GALE, NIST-2009, and WMT-2011.
