Statistical Machine Translation Alona Fyshe Based on slides from Colin Cherry and Dekang Lin

Upload: brian-rees

Post on 28-Mar-2015

Statistical Machine Translation

Alona Fyshe

Based on slides from Colin Cherry and Dekang Lin

Basic statistics

• 0 <= P(x) <= 1

• P(A): probability that A happens

• P(A,B): probability that A and B happen

• P(A|B): probability that A happens given that we know B happened

Basic statistics

• Conditional probability

$$P(A \mid B) = \frac{P(A,B)}{P(B)}$$

Basic Statistics

• Use definition of conditional probability to derive the chain rule

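$$P(A \mid B) = \frac{P(A,B)}{P(B)}$$

$$P(A,B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$$

$$\begin{aligned}
P(A_1, A_2, \ldots, A_n) &= P(A_n \mid A_{n-1}, \ldots, A_1)\,P(A_{n-1}, \ldots, A_1) \\
&= \cdots \\
&= P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2) \cdots P(A_n \mid A_1, \ldots, A_{n-1})
\end{aligned}$$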

Basic Statistics

• Bayes Rule

P(A,B) = P(A | B)P(B)

P(A,B) = P(B | A)P(A)

P(A | B)P(B) = P(B | A)P(A)

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Basic Statistics

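• Just remember

Definition of cond. prob.

$$P(A \mid B) = \frac{P(A,B)}{P(B)}$$

Bayes rule

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Chain rule

$$P(A_1, \ldots, A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2) \cdots P(A_n \mid A_1, \ldots, A_{n-1})$$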
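The three identities can be checked numerically on a tiny joint distribution. This is only a sketch: the probabilities in `joint` are made up for illustration, and the helper names are not from the slides.

```python
from fractions import Fraction

# Hypothetical joint distribution P(A, B) over two binary events (made-up numbers).
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}

def p_a(a):
    # Marginal P(A = a): sum the joint over all values of B.
    return sum(p for (x, _), p in joint.items() if x == a)

def p_b(b):
    # Marginal P(B = b): sum the joint over all values of A.
    return sum(p for (_, y), p in joint.items() if y == b)

def p_a_given_b(a, b):
    # Definition of conditional probability: P(A | B) = P(A, B) / P(B).
    return joint[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    # Same definition with the roles swapped: P(B | A) = P(A, B) / P(A).
    return joint[(a, b)] / p_a(a)

# Bayes rule: P(A | B) = P(B | A) P(A) / P(B).
bayes = p_b_given_a(True, True) * p_a(True) / p_b(True)

# Chain rule for two events: P(A, B) = P(A) P(B | A).
chain = p_a(True) * p_b_given_a(True, True)
```

Exact fractions are used so the identities hold with equality rather than up to rounding.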

Goal

• Translate.

• I’ll use French (F) into English (E) as the running example.

Oh, Canada

• I’m Canadian
  Mandatory French class in school until grade 6
  I speak “Cereal Box French”

Gratuit
Gagner
Chocolat
Glaçage
Sans gras
Sans cholestérol
Élevé dans la fibre

Machine Translation

• Translation is easy for (bilingual) people

• Process:
  Read the text in French
  Understand it
  Write it down in English

Machine Translation

Understanding language
Writing well-formed text

• Hard tasks for computers
  The human process is invisible, intangible

One approach: Babelfish

• A rule-based approach to machine translation

• A 30-year-old feat in Software Eng.

• Programming knowledge in by hand is difficult and expensive

Alternate Approach: Statistics

• We are trying to model P(E|F)
  I give you a French sentence
  You give me back English

• How are we going to model this?
  We could use Bayes rule:

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)} \propto P(F \mid E)\,P(E)$$

Alternate Approach: Statistics

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)} \propto P(F \mid E)\,P(E)$$

Given a French sentence F, we could do a search for an E that maximizes P(E | F)
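As a rough sketch of that search, the snippet below scores a handful of candidate English sentences and keeps the best one. The candidate list and both probabilities per candidate are invented purely for illustration; a real system would get them from a language model and a translation model, and would search a far larger space.

```python
import math

# Hypothetical (P(E), P(F | E)) values per candidate; the numbers are made up.
candidates = {
    "Jon appeared in TV.": (1e-9, 1e-3),
    "Jon appeared on TV.": (1e-7, 1e-3),
    "Jon is happy today.": (1e-6, 1e-9),
}

def noisy_channel_score(p_e, p_f_given_e):
    # log P(E) + log P(F | E). P(F) is the same for every candidate E,
    # so dropping it does not change which E wins.
    return math.log(p_e) + math.log(p_f_given_e)

# argmax over the candidate list
best = max(candidates, key=lambda e: noisy_channel_score(*candidates[e]))
```

Working in log space avoids underflow once the probabilities become products of many small factors.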

Why Bayes rule at all?

• Why not model P(E|F) directly?

• P(F|E)P(E) decomposition allows us to be sloppy
  P(E) worries about good English
  P(F|E) worries about French that matches English
  The two can be trained independently

Crime Scene Analogy

• F is a crime scene. E is a person who may have committed the crime
  P(E|F) - look at the scene - who did it?
  P(E) - who had a motive? (Profiler)
  P(F|E) - could they have done it? (CSI - transportation, access to weapons, alibi)

• Some people might have great motives, but no means - you need both!

On voit Jon à la télévision

Candidate English sentences, each judged on two questions: good English? P(E) / good match to French? P(F|E)

Jon appeared in TV.

It back twelve saw.

In Jon appeared TV.

Jon is happy today.

Jon appeared on TV.

TV appeared on Jon.

Jon was not happy.

Table borrowed from Jason Eisner

I speak English good.

• How are we going to model good English?

• How do we know these sentences are not good English?

Jon appeared in TV.

It back twelve saw.

In Jon appeared TV.

TV appeared on Jon.

Je ne parle pas l'anglais.

I speak English good.

• Je ne parle pas l'anglais.
  These aren’t English words.

• It back twelve saw.
  These are English words, but it’s gibberish.

• Jon appeared in TV.
  “appeared in TV” isn’t proper English

I speak English good.

• Let’s say we have a huge collection of documents written in English
  Like, say, the Internet.

• It would be a pretty comprehensive list of English words
  Save for “named entities”: people, places, things
  Might include some non-English words
  Speling mitsakes! lol!

• Could also tell if a phrase is good English

Google, is this good English?

• Jon appeared in TV.
  “Jon appeared”      1,800,000 Google results
  “appeared in TV”    45,000 Google results
  “appeared on TV”    210,000 Google results

• It back twelve saw.
  “twelve saw”        1,100 Google results
  “It back twelve”    586 Google results
  “back twelve saw”   0 Google results

• Imperfect counting… why?

Google, is this good English?

• Language is often modeled this way
  Collect statistics about the frequency of words and phrases
  N-gram statistics
    1-gram = unigram
    2-gram = bigram
    3-gram = trigram
    4-gram = four-gram
    5-gram = five-gram
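For concreteness, here is a minimal sketch of pulling n-grams out of a tokenised sentence. Whitespace tokenisation and the function name `ngrams` are assumptions made for the example.

```python
def ngrams(tokens, n):
    # Slide a window of width n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "It back twelve saw".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Counting how often each tuple occurs across a large corpus gives the n-gram statistics used in the rest of the lecture.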

Google, is this good English?

• Seriously, you want to query Google for every phrase in the translation?

• Google created and released a 5-gram data set.
  Now you can query Google locally (kind of)

Language Modeling

• What’s P(e)?
  P(English sentence)
  P(e1, e2, e3 … ei)
  Using the chain rule

$$P(e_1)\,P(e_2 \mid e_1)\,P(e_3 \mid e_1, e_2)\,P(e_4 \mid e_1, e_2, e_3) \cdots P(e_i \mid e_1, e_2, \ldots, e_{i-1})$$

Language Modeling

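$$P(e_1)\,P(e_2 \mid e_1)\,P(e_3 \mid e_1, e_2)\,P(e_4 \mid e_1, e_2, e_3) \cdots P(e_i \mid e_1, e_2, \ldots, e_{i-1})$$

• Markov assumption
  The choice of word e_i depends only on the n words before e_i

$$P(e_i \mid e_1, e_2, \ldots, e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1}) = P(e_i \mid e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1})$$

• Definition of conditional probability

$$P(e_i \mid e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1}) = \frac{P(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1}, e_i)}{P(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1})}$$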
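Putting the chain rule and the Markov assumption together gives the approximation that the next slides estimate from counts. This is a sketch assuming n = 4 words of history (a 5-gram model), as in the equations above; the sentence length k is a symbol introduced here, and the first few factors simply condition on whatever shorter history exists.

$$P(e_1, \ldots, e_k) \approx \prod_{i=1}^{k} P(e_i \mid e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1})$$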

Language Modeling

$$P(\text{pie} \mid \text{I, love, to, eat}) = \frac{P(\text{I, love, to, eat, pie})}{P(\text{I, love, to, eat})}$$

Language Modeling

• Approximate probability using counts

• Use the n-gram corpus!

$$\frac{P(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1}, e_i)}{P(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1})} = \frac{C(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1}, e_i)}{C(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1})}$$

Language Modeling

• Use the n-gram corpus!

$$P(\text{pie} \mid \text{I, love, to, eat}) = \frac{P(\text{I, love, to, eat, pie})}{P(\text{I, love, to, eat})} = \frac{C(\text{I, love, to, eat, pie})}{C(\text{I, love, to, eat})} = \frac{2{,}760}{409{,}000} = 0.0067$$

Not surprisingly, given that you love to eat, loving to eat chocolate is more probable (0.177)
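The same estimate as a small sketch in code. The two counts are the ones quoted on the slide; the dictionary layout and the function name `mle` are assumptions for illustration.

```python
# Counts quoted on the slide, keyed by n-gram tuple.
counts = {
    ("I", "love", "to", "eat", "pie"): 2760,
    ("I", "love", "to", "eat"): 409000,
}

def mle(word, history, counts):
    # Relative-frequency estimate: P(word | history) ~= C(history + word) / C(history).
    history = tuple(history)
    return counts[history + (word,)] / counts[history]

p_pie = mle("pie", ["I", "love", "to", "eat"], counts)
```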

Language Modeling

• But what if $C(e_{i-4}, e_{i-3}, e_{i-2}, e_{i-1}, e_i) = 0$?

• Then P(e) = 0

• Happens even if the sentence is grammatically correct
  “Al Gore’s pink Hummer was stolen.”

Language Modeling

• Smoothing
  Many techniques

• Add one smoothing
  Add one to every count
  No more zeros, no problems

• Backoff
  If P(e1, e2, e3, e4, e5) = 0 use something related to P(e1, e2, e3, e4)
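Both ideas can be sketched in a few lines. The counts below are invented, and the backoff shown is one simple variant (shorten the history until the n-gram has been seen); it is not necessarily the scheme the slide has in mind, and it does not renormalise, so it is a score rather than a true probability.

```python
def add_one(count_ngram, count_history, vocab_size):
    # Add-one (Laplace) smoothing: pretend every possible next word was seen once more.
    return (count_ngram + 1) / (count_history + vocab_size)

def backoff_estimate(word, history, counts, total_tokens):
    # Unnormalised backoff sketch: drop the oldest history word until the n-gram is seen.
    history = tuple(history)
    while history:
        seen = counts.get(history + (word,), 0)
        if seen > 0:
            return seen / counts[history]
        history = history[1:]
    # Nothing longer was seen: fall back to the unigram relative frequency.
    return counts.get((word,), 0) / total_tokens

# Hypothetical counts, made up for illustration.
counts = {
    ("Gore's",): 3,
    ("pink",): 10,
    ("Hummer",): 5,
    ("pink", "Hummer"): 2,
}

smoothed = add_one(0, counts[("Gore's",)], vocab_size=1000)
backed_off = backoff_estimate("Hummer", ["Gore's", "pink"], counts, total_tokens=1000)
```

Here "Gore's pink Hummer" was never seen, so the backoff falls through to the bigram "pink Hummer".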

Language Modeling

• Wait… Is this how people “generate” English sentences?
  Do you choose your fifth word based on B

• Admittedly, this is an approximation to a process which is both intangible and hard for humans themselves to explain

• If you disagree, and care to defend yourself, consider a PhD in NLP

Back to Translation

• Anyway, where were we?

Oh right…

So, we’ve got P(e), let’s talk P(f|e)

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)} \propto P(F \mid E)\,P(E)$$

Where will we get P(F|E)?

[Diagram: cereal boxes in English + same cereal boxes, in French → Machine Learning Magic → P(F|E) model]

Where will we get P(F|E)?

[Diagram: books in English + same books, in French → Machine Learning Magic → P(F|E) model]

We call collections stored in two languages parallel corpora or parallel texts

Want to update your system? Just add more text!

Translated Corpora

• The Canadian Parliamentary Debates
  Available in both French and English

• UN documents
  Available in Arabic, Chinese, English, French, Russian and Spanish

Problem:

• How are we going to generalize from examples of translations?

• I’ll spend the rest of this lecture telling you:
  What makes a useful P(F|E)
  How to obtain the statistics needed for P(F|E) from parallel texts

Strategy: Generative Story

• When modeling P(X|Y):
  Assume you start with Y
  Decompose the creation of X from Y into some number of operations
  Track statistics of individual operations
  For a new example X,Y: P(X|Y) can be calculated based on the probability of the operations needed to get X from Y

What if…?

The quick fox jumps over the lazy dog

Le renard rapide saute par - dessus le chien paresseux

New Information

• Call this new info a word alignment (A)

• With A, we can make a good story

The quick fox jumps over the lazy dog

Le renard rapide saute par - dessus le chien paresseux

P(F,A|E) Story

null The quick fox jumps over the lazy dog

P(F, A | E) = ?

P(F,A|E) Story

null The quick fox jumps over the lazy dog

f1 f2 f3 … f10

P(F, A | E) = ε

Simplifying assumption: choose the length m of the French sentence F. All lengths have equal probability ε

P(F,A|E) Story

null The quick fox jumps over the lazy dog

f1 f2 f3 … f10

$$P(F, A \mid E) = \frac{\varepsilon}{(8+1)^{10}}$$

There are (l+1)^m = (8+1)^10 possible alignments
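To get a feel for the size of that number, here is a quick sketch. It assumes whitespace tokenisation, so "par - dessus" counts as three French tokens, which is what gives m = 10.

```python
def num_alignments(l, m):
    # Each of the m French words picks one of l English words or null: (l + 1) ** m.
    return (l + 1) ** m

english = "The quick fox jumps over the lazy dog".split()
french = "Le renard rapide saute par - dessus le chien paresseux".split()

total = num_alignments(len(english), len(french))
```

Even for this one short sentence pair the count runs into the billions, which is why enumerating alignments naively does not scale.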

P(F,A|E) Story

null The quick fox jumps over the lazy dog

Le renard rapide saute par - dessus le chien paresseux

$$P(F, A \mid E) = \frac{\varepsilon}{9^{10}} \left[ p_t(\text{Le} \mid \text{The}) \cdot p_t(\text{renard} \mid \text{fox}) \cdots p_t(\text{paresseux} \mid \text{lazy}) \right]$$

P(F,A|E) Story

null The quick fox jumps over the lazy dog

Le renard rapide saute par - dessus le chien paresseux

$$P(F, A \mid E) = \frac{\varepsilon}{(l+1)^m} \prod_{j=1}^{m} p_t(f_j \mid e_{a_j})$$
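The formula translates almost directly into code. This is a minimal sketch: the translation table `t`, its values, and the default `epsilon=1.0` are assumptions for illustration, and position 0 of `e_words` is reserved for null.

```python
from math import prod

def p_f_a_given_e(f_words, e_words, alignment, t, epsilon=1.0):
    # e_words[0] is the null word; alignment[j] is the index in e_words
    # of the English word that generated f_words[j].
    l = len(e_words) - 1
    m = len(f_words)
    lexical = prod(t[(f, e_words[a])] for f, a in zip(f_words, alignment))
    return epsilon / (l + 1) ** m * lexical

# Hypothetical translation probabilities p_t(f | e), made up for the example.
t = {("voiture", "car"): 0.5, ("rapide", "fast"): 0.5}
e_words = ["null", "fast", "car"]
f_words = ["voiture", "rapide"]

# voiture <- car (index 2), rapide <- fast (index 1)
p = p_f_a_given_e(f_words, e_words, [2, 1], t)
```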

Getting Pt(f|e)

• We need numbers for Pt(f|e)

• Example: Pt(le|the)
  Count lines in a large collection of aligned text

[Figure: the sentence pair "null The quick fox jumps over the lazy dog" / "Le renard rapide saute par - dessus le chien paresseux", repeated six times with alignment lines]

$$P_t(f \mid e) = \frac{\#(e \text{ linked to } f)}{\#(e \text{ linked to anything})}$$

$$P_t(\text{le} \mid \text{the}) = \frac{\#(\text{le}, \text{the})}{\#(\text{le}, \text{the}) + \#(\text{la}, \text{the}) + \#(\text{les}, \text{the})}$$
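Counting lines is just counting (f, e) pairs and normalising per English word. In the sketch below, the list of links is invented to stand in for hand-aligned text, and `estimate_t` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical alignment links (french_word, english_word), made up for illustration.
links = [
    ("le", "the"), ("le", "the"), ("la", "the"), ("les", "the"),
    ("renard", "fox"),
]

def estimate_t(links):
    pair_counts = Counter(links)             # #(e linked to f)
    e_counts = Counter(e for _, e in links)  # #(e linked to anything)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

t = estimate_t(links)
```

With these made-up links, "the" is linked four times, two of them to "le", so the estimate for Pt(le|the) is 2/4.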

Where do we get the lines?

• That sure looked like a lot of monkeys…

• Remember: sometimes the information hidden in the text just jumps out at you
  We’ll get alignments out of unaligned text by treating the alignment as a hidden variable
  We infer an A that maximizes the probability of our corpus
  Generalization of ideas in HMM training: called EM

Where’s “heaven” in Vietnamese?

Example borrowed from Jason Eisner

English: In the beginning God created the heavens and the earth.

Vietnamese: Ban dâu Dúc Chúa Tròi dung nên tròi dât.

English: God called the expanse heaven.

Vietnamese: Dúc Chúa Tròi dat tên khoang không la tròi.

English: … you are this day like the stars of heaven in number.

Vietnamese: … các nguoi dông nhu sao trên tròi.

EM: Expectation Maximization

• Assume a probability distribution (weights) over hidden events
  Take counts of events based on this distribution
  Use counts to estimate new parameters
  Use parameters to re-weight examples.

• Rinse and repeat

Alignment Hypotheses

[Figure: eight alternative word alignments between "null I like milk" and "Je aime le lait", with weights 0.65, 0.25, 0.05, 0.01, 0.01, 0.01, 0.01, 0.001]

Weighted Alignments

• What we’ll do is:
  Consider every possible alignment
  Give each alignment a weight - indicating how good it is
  Count weighted alignments as normal

$$P(A \mid E, F) = \frac{P(F, A \mid E)}{P(F \mid E)}$$

Good grief! We forgot about P(F|E)!

• No worries, a little more stats gets us what we need:

$$P(F \mid E) = \sum_{A \in \mathcal{A}} P(F, A \mid E)$$

$$\therefore\; P(A \mid E, F) = \frac{P(F, A \mid E)}{\sum_{A' \in \mathcal{A}} P(F, A' \mid E)}$$
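In code, turning joint probabilities into alignment weights is a single normalisation. The alignment labels and the two joint values below are placeholders chosen for the example.

```python
from fractions import Fraction

def alignment_posteriors(joint):
    # joint maps each alignment A to P(F, A | E).
    total = sum(joint.values())  # P(F | E): sum over all alignments
    return {a: p / total for a, p in joint.items()}

# Hypothetical joint probabilities for two competing alignments.
weights = alignment_posteriors({"A1": Fraction(1, 8), "A2": Fraction(3, 8)})
```

The weights always sum to 1 for a given sentence pair, whatever the scale of the joint values.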

Big Example: Corpus

Sentence pair 1: fast car / voiture rapide

Sentence pair 2: fast / rapide

Possible Alignments

1a: fast - voiture, car - rapide

1b: fast - rapide, car - voiture

2:  fast - rapide

Parameters

Alignments 1a, 1b (fast car / voiture rapide) and 2 (fast / rapide)

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
1/2              1/2             1/2             1/2

Weight Calculations

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
1/2              1/2             1/2             1/2

Alignment                        P(A,F|E)           P(A|F,E)
1a (fast-voiture, car-rapide)    1/2 * 1/2 = 1/4    (1/4) / (2/4) = 1/2
1b (fast-rapide, car-voiture)    1/2 * 1/2 = 1/4    (1/4) / (2/4) = 1/2
2  (fast-rapide)                 1/2                (1/2) / (1/2) = 1

Count Lines

Alignment weights: 1a = 1/2, 1b = 1/2, 2 = 1

#(voiture,fast)  #(rapide,fast)   #(voiture,car)  #(rapide,car)
1/2              1/2 + 1 = 3/2    1/2             1/2

Normalize

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
1/4              3/4             1/2             1/2

Parameters

Alignments 1a, 1b (fast car / voiture rapide) and 2 (fast / rapide)

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
1/4              3/4             1/2             1/2

Weight Calculations

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
1/4              3/4             1/2             1/2

Alignment                        P(A,F|E)           P(A|F,E)
1a (fast-voiture, car-rapide)    1/4 * 1/2 = 1/8    (1/8) / (4/8) = 1/4
1b (fast-rapide, car-voiture)    1/2 * 3/4 = 3/8    (3/8) / (4/8) = 3/4
2  (fast-rapide)                 3/4                (3/4) / (3/4) = 1

Count Lines

Alignment weights: 1a = 1/4, 1b = 3/4, 2 = 1

#(voiture,fast)  #(rapide,fast)   #(voiture,car)  #(rapide,car)
1/4              3/4 + 1 = 7/4    3/4             1/4

Normalize

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
1/8              7/8             3/4             1/4

After many iterations:

Alignment weights: 1a = ~0, 1b = ~1, 2 = 1

P(voiture|fast)  P(rapide|fast)  P(voiture|car)  P(rapide|car)
0.001            0.999           0.999           0.001
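The whole worked example fits in a short script. This is a sketch under the same simplification the example uses: only one-to-one alignments are enumerated, with no null word, so each sentence pair must have equal length on both sides. Function names and the uniform initialisation helper are assumptions; exact fractions are used so the numbers match the tables digit for digit.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

# The two-sentence toy corpus: (English words, French words).
corpus = [
    (["fast", "car"], ["voiture", "rapide"]),
    (["fast"], ["rapide"]),
]

def init_uniform(corpus):
    # Start with every P(f | e) equal to 1 / (number of French word types).
    f_vocab = {f for _, fs in corpus for f in fs}
    e_vocab = {e for es, _ in corpus for e in es}
    return {(f, e): Fraction(1, len(f_vocab)) for f in f_vocab for e in e_vocab}

def em_step(t, corpus):
    counts = defaultdict(Fraction)
    for es, fs in corpus:
        # One-to-one alignments only: a[j] is the English position linked to fs[j].
        aligns = list(permutations(range(len(es))))
        joint = []
        for a in aligns:
            p = Fraction(1)
            for j, i in enumerate(a):
                p *= t[(fs[j], es[i])]
            joint.append(p)                    # P(F, A | E) up to a constant
        total = sum(joint)                     # proportional to P(F | E)
        for a, p in zip(aligns, joint):
            weight = p / total                 # P(A | F, E)
            for j, i in enumerate(a):
                counts[(fs[j], es[i])] += weight  # fractional "count lines"
    totals = defaultdict(Fraction)
    for (f, e), c in counts.items():
        totals[e] += c
    # Normalize per English word to get the new P(f | e).
    return {(f, e): counts[(f, e)] / totals[e] for (f, e) in t}

t0 = init_uniform(corpus)
t1 = em_step(t0, corpus)
t2 = em_step(t1, corpus)
```

Calling `em_step` repeatedly pushes P(rapide|fast) and P(voiture|car) toward 1. A standard IBM Model 1 implementation would instead let every French word choose its English word independently (including null), which gives different intermediate numbers.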

Seems too easy?

• What if you have no 1-word sentence?
  Words in shorter sentences will get more weight - fewer possible alignments
  Weight is additive throughout the corpus: if a word e shows up frequently with some other word f, P(f|e) will go up
  Like our heaven example

The Final Product

• Now we have a model for P(F|E)

• Test it by aligning a corpus!
  i.e., find argmax_A P(A|F,E)

• Use it for translation:
  Combine with our n-gram model for P(E)
  Search the space of English sentences for one that maximizes P(E)P(F|E) for a given F

Model could be a lot better:

• Word positions
• Multiple f’s generated by the same e
• Could take into account who generated your neighbors
• Could use syntax, parsing
• Could align phrases

Sure, but is it any better?

• We’ve got some good ideas for improving translation

• How can we quantify the change in translation quality?

Sure, but is it any better?

• How to (automatically) measure translation?

Original French:
Dès qu'il fut dehors, Pierre se dirigea vers la rue de Paris, la principale rue du Havre, éclairée, animée, bruyante.

Human translation to English:
As soon as he got out, Pierre made his way to the Rue de Paris, the high-street of Havre, brightly lighted up, lively and noisy.

Two machine translations back to French:
Dès qu'il est sorti, Pierre a fait sa manière à la rue De Paris, la haut-rue de Le Havre, brillamment allumée, animée et bruyante.
Dès qu'il en est sorti, Pierre s'est rendu à la Rue de Paris, de la grande rue du Havre, brillamment éclairés, animés et bruyants.

Example from http://www.readwriteweb.com/archives/google_translation_systran.php

Bleu Score

• Bleu
  Bilingual Evaluation Understudy
  A metric for comparing translations

• Considers
  n-grams in common with the target translation
  Length of target translation

• Score of 1 is identical, 0 shares no words in common

• Even human translations don’t score 1
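To make the two ingredients concrete, here is a deliberately simplified, sentence-level sketch in the spirit of Bleu: clipped n-gram precision combined with a brevity penalty. It is not the official metric (which is computed over a whole corpus, typically with up to 4-grams and possibly several references); `simple_bleu` and its `max_n=2` default are assumptions for illustration.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=2):
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "Jon appeared on TV".split()
score = simple_bleu("Jon appeared in TV".split(), reference)
```

An identical candidate scores 1 and a candidate sharing no words scores 0, matching the two endpoints described above.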

Google Translate

• http://translate.google.com/translate_t
  25 language pairs

• In the news (digg.com)
  http://www.readwriteweb.com/archives/google_translation_systran.php

• In competition
  http://www.nist.gov/speech/tests/mt/doc/mt06eval_official_results.html

Questions?


References (Inspiration, Sources of borrowed material)

• Colin Cherry, MT for NLP, 2005 http://www.cs.ualberta.ca/~colinc/ta/MT650.pdf

• Knight, K., Automating Knowledge Acquisition for Machine Translation , AI Magazine 18(4), 1997.

• Knight, K., A Statistical Machine Translation Tutorial Workbook, 1999, http://www.clsp.jhu.edu/ws99/projects/mt/mt-workbook.htm

• Eisner, J., JHU NLP Course notes: Machine Translation, 2001, http://www.cs.jhu.edu/~jason/465/PDFSlides/lect32-translation.pdf

• Olga Kubassova, Probability for NLP, http://www.comp.leeds.ac.uk/olga/ProbabilityTutorial.ppt

Enumerating all alignments

$$P(F \mid E) = \frac{\varepsilon}{(l+1)^m} \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} p_t(f_j \mid e_{a_j})$$

There are (l+1)^m possible alignments!

Gah!

Null (0) Fast (1) car (2)

Voiture (1) rapide (2)

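$$\begin{aligned}
& p_t(f_1 \mid e_0)\,p_t(f_2 \mid e_0) + p_t(f_1 \mid e_0)\,p_t(f_2 \mid e_1) + p_t(f_1 \mid e_0)\,p_t(f_2 \mid e_2) \\
+\; & p_t(f_1 \mid e_1)\,p_t(f_2 \mid e_0) + p_t(f_1 \mid e_1)\,p_t(f_2 \mid e_1) + p_t(f_1 \mid e_1)\,p_t(f_2 \mid e_2) \\
+\; & p_t(f_1 \mid e_2)\,p_t(f_2 \mid e_0) + p_t(f_1 \mid e_2)\,p_t(f_2 \mid e_1) + p_t(f_1 \mid e_2)\,p_t(f_2 \mid e_2)
\end{aligned}$$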

Let’s move these over here…

Null (0) Fast (1) car (2)

Voiture (1) rapide (2)

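$$\begin{aligned}
& p_t(f_1 \mid e_0)\left[ p_t(f_2 \mid e_0) + p_t(f_2 \mid e_1) + p_t(f_2 \mid e_2) \right] \\
+\; & p_t(f_1 \mid e_1)\left[ p_t(f_2 \mid e_0) + p_t(f_2 \mid e_1) + p_t(f_2 \mid e_2) \right] \\
+\; & p_t(f_1 \mid e_2)\left[ p_t(f_2 \mid e_0) + p_t(f_2 \mid e_1) + p_t(f_2 \mid e_2) \right]
\end{aligned}$$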

And now we can do this…

Null (0) Fast (1) car (2)

Voiture (1) rapide (2)

$$\left[ p_t(f_1 \mid e_0) + p_t(f_1 \mid e_1) + p_t(f_1 \mid e_2) \right] \cdot \left[ p_t(f_2 \mid e_0) + p_t(f_2 \mid e_1) + p_t(f_2 \mid e_2) \right]$$

So, it turns out:

$$\sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} p_t(f_j \mid e_{a_j}) = \prod_{j=1}^{m} \sum_{i=0}^{l} p_t(f_j \mid e_i)$$

Requires only m(l+1) operations.

Can be used whenever your alignment choice for one word does not affect the probability of the rest of the alignment
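The identity is easy to sanity-check numerically. In this sketch the table `t[j][i]` stands for p_t(f_j | e_i) with made-up values, for m = 2 French words and l + 1 = 3 English positions (null included).

```python
from itertools import product
from math import isclose, prod

# Hypothetical values of p_t(f_j | e_i): one row per French word, one column per English position.
t = [
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.1],
]

def brute_force(t):
    # Sum over all (l + 1) ** m alignments of the product of the chosen entries.
    n_e = len(t[0])
    return sum(
        prod(t[j][a_j] for j, a_j in enumerate(a))
        for a in product(range(n_e), repeat=len(t))
    )

def factored(t):
    # Product over French words of the sum over English positions: m * (l + 1) terms.
    return prod(sum(row) for row in t)

slow = brute_force(t)
fast = factored(t)
```

Both functions return the same value; only the amount of work differs, and that gap grows exponentially with sentence length.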