jhu mt class: feature-based models

Feature-Based Models

Upload: alopezfoo

Post on 26-Jun-2015

TRANSCRIPT

Page 1: JHU MT class: Feature-based models

Feature-Based Models

Page 2: JHU MT class: Feature-based models

•Some (not all) key ingredients in Google Translate:

•Phrase-based translation models

•... Learned heuristically from word alignments

•... Coupled with a huge language model

•... And very tight pruning heuristics

•Today: more flexible parameterizations.

Page 3: JHU MT class: Feature-based models

$$p(\text{English} \mid \text{Chinese}) \propto p(\text{English}) \times p(\text{Chinese} \mid \text{English})$$

Bayes' Rule

language model: p(English); translation model: p(Chinese|English)
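As a minimal sketch of how this decomposition is used to rank translations, the snippet below scores a few candidate English sentences for one fixed Chinese input. The candidate strings and every probability value are invented for illustration; the dictionary names `language_model` and `translation_model` are hypothetical stand-ins for real models.

```python
# Toy noisy-channel ranking. All numbers are invented for illustration.
# language_model[e]    stands in for p(English)
# translation_model[e] stands in for p(Chinese | English) for one fixed Chinese input
language_model = {
    "the north wind": 0.02,
    "north the wind": 0.0001,
    "the northern wind": 0.01,
}
translation_model = {
    "the north wind": 0.3,
    "north the wind": 0.3,
    "the northern wind": 0.2,
}

def noisy_channel_score(english):
    # p(English) * p(Chinese | English), proportional to p(English | Chinese)
    return language_model[english] * translation_model[english]

# The best translation is the candidate with the highest product.
best = max(language_model, key=noisy_channel_score)
```

The language model is what separates "the north wind" from the scrambled "north the wind" here, since both get the same translation-model probability.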


Page 6: JHU MT class: Feature-based models

English

$$p(\text{Chinese} \mid \text{English}) \times p(\text{English}) \sim p(\text{English} \mid \text{Chinese})$$

Page 7: JHU MT class: Feature-based models

English

$$p(\text{Chinese} \mid \text{English})^{1} \times p(\text{English})^{1} \sim p(\text{English} \mid \text{Chinese})$$

Page 8: JHU MT class: Feature-based models

English

$$p(\text{Chinese} \mid \text{English})^{2} \times p(\text{English})^{1} \sim p(\text{English} \mid \text{Chinese})$$

Page 9: JHU MT class: Feature-based models

English

$$p(\text{Chinese} \mid \text{English})^{1/2} \times p(\text{English})^{1} \sim p(\text{English} \mid \text{Chinese})$$

Page 10: JHU MT class: Feature-based models

English

$$p(\text{Chinese} \mid \text{English})^{0} \times p(\text{English})^{1} \sim p(\text{English} \mid \text{Chinese})$$


Page 12: JHU MT class: Feature-based models

English

$$0 \cdot \log p(\text{Chinese} \mid \text{English}) + 1 \cdot \log p(\text{English}) \sim \log p(\text{English} \mid \text{Chinese})$$

log(x) is monotonic for positive x: log(x) > log(y) iff x > y
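The effect of the exponents, and the fact that moving to log space does not change the ranking, can be checked numerically. The sketch below is only an illustration: the three candidates and their probabilities are invented, and `weighted_product`, `weighted_log_sum`, and `argmax` are hypothetical helper names.

```python
import math

# Hypothetical candidates: name -> (p(Chinese | English), p(English)).
# All numbers are invented for illustration.
candidates = {"a": (0.30, 0.020), "b": (0.60, 0.008), "c": (0.10, 0.030)}

def weighted_product(tm, lm, w_tm, w_lm):
    # p(Chinese | English)^w_tm * p(English)^w_lm
    return (tm ** w_tm) * (lm ** w_lm)

def weighted_log_sum(tm, lm, w_tm, w_lm):
    # w_tm * log p(Chinese | English) + w_lm * log p(English)
    return w_tm * math.log(tm) + w_lm * math.log(lm)

def argmax(score_fn, w_tm, w_lm):
    # Candidate with the highest score under the given weights.
    return max(candidates, key=lambda e: score_fn(*candidates[e], w_tm, w_lm))

# Changing the exponent on the translation model changes the winner,
# but the product form and the log form always agree on it.
for w_tm in (0, 0.5, 1, 2):
    assert argmax(weighted_product, w_tm, 1) == argmax(weighted_log_sum, w_tm, 1)
```

With these toy numbers, a weight of 0 on the translation model lets the language model alone decide, while a weight of 2 makes the candidate with the strongest translation probability win.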

Page 13: JHU MT class: Feature-based models

English

$$0 \cdot \log p(\text{Chinese} \mid \text{English}) + 1 \cdot \log p(\text{English}) = \text{score}(\text{English} \mid \text{Chinese})$$

Page 14: JHU MT class: Feature-based models

$$\text{score}(\text{English} \mid \text{Chinese}) = \lambda_1 \log p(\text{Chinese} \mid \text{English}) + \lambda_2 \log p(\text{English})$$

Page 15: JHU MT class: Feature-based models

$$\text{score}(\text{English} \mid \text{Chinese}) = \exp\left(\lambda_1 \log p(\text{Chinese} \mid \text{English}) + \lambda_2 \log p(\text{English})\right)$$


Page 17: JHU MT class: Feature-based models

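$$p(\text{English} \mid \text{Chinese}) = \frac{\exp\left(\lambda_1 \log p(\text{Chinese} \mid \text{English}) + \lambda_2 \log p(\text{English})\right)}{\sum_{\text{English}'} \exp\left(\lambda_1 \log p(\text{Chinese} \mid \text{English}') + \lambda_2 \log p(\text{English}')\right)}$$

log-linear model, maximum entropy model, conditional model, undirected model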
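A small sketch of this normalization follows. It makes one loud assumption: the sum in the denominator runs over a short, explicit candidate list rather than over all English sentences. The function name `log_linear_posterior` and the toy probabilities are hypothetical.

```python
import math

def log_linear_posterior(candidates, lam_tm, lam_lm):
    """Normalize exponentiated weighted log-probabilities over a candidate list.

    candidates maps each English string to (p(Chinese | English), p(English)).
    Assumption: the denominator sums over these candidates only.
    """
    unnormalized = {
        english: math.exp(lam_tm * math.log(tm) + lam_lm * math.log(lm))
        for english, (tm, lm) in candidates.items()
    }
    z = sum(unnormalized.values())  # the denominator of the model
    return {english: value / z for english, value in unnormalized.items()}

# Invented numbers for two candidate translations of one Chinese sentence.
toy_candidates = {"x": (0.3, 0.02), "y": (0.6, 0.008)}
posterior = log_linear_posterior(toy_candidates, lam_tm=1.0, lam_lm=1.0)
```

Because of the division by the sum, the returned values form a proper probability distribution over the candidates whatever weights are chosen.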

Page 18: JHU MT class: Feature-based models

log-linear model, maximum entropy model, conditional model, undirected model

Note: the original model, $p(\text{English}) \times p(\text{Chinese} \mid \text{English})$, is a special case of this model (take $\lambda_1 = \lambda_2 = 1$).
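The special-case claim can be verified in two lines. This is a worked check rather than part of the original slides; it sets both weights to 1 and uses English′ for the summation variable.

```latex
\begin{align*}
p(\text{English} \mid \text{Chinese})
  &= \frac{\exp\left(1 \cdot \log p(\text{Chinese} \mid \text{English}) + 1 \cdot \log p(\text{English})\right)}
          {\sum_{\text{English}'} \exp\left(1 \cdot \log p(\text{Chinese} \mid \text{English}') + 1 \cdot \log p(\text{English}')\right)} \\
  &= \frac{p(\text{Chinese} \mid \text{English}) \, p(\text{English})}
          {\sum_{\text{English}'} p(\text{Chinese} \mid \text{English}') \, p(\text{English}')}
   = \frac{p(\text{Chinese} \mid \text{English}) \, p(\text{English})}{p(\text{Chinese})}
\end{align*}
```

The last expression is exactly Bayes' rule, so the noisy-channel model is recovered when both weights equal 1.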


Page 20: JHU MT class: Feature-based models

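$$p(\text{English} \mid \text{Chinese}) = \frac{\exp \sum_k \lambda_k h_k(\text{English}, \text{Chinese})}{\sum_{\text{English}'} \exp \sum_k \lambda_k h_k(\text{English}', \text{Chinese})}$$

log-linear model, maximum entropy model, conditional model, undirected model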


Page 23: JHU MT class: Feature-based models

$$p(\text{English} \mid \text{Chinese}) = \frac{1}{Z} \exp \sum_k \lambda_k h_k(\text{English}, \text{Chinese})$$

Z is the normalization term or partition function.

The functions $h_k$ are features or feature functions. They are deterministic (fixed) functions of the input/output pair.

The parameters of the model are the $\lambda_k$ terms.
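To make the roles of the features, the weights, and Z concrete, here is a minimal sketch of the general model. It assumes Z is computed over a short explicit list of candidates, which the real model does not do (the true Z sums over every English sentence). The two feature functions and all names are hypothetical.

```python
import math

def log_linear_distribution(chinese, english_candidates, features, weights):
    """Return p(English | Chinese) for each candidate, in candidate order.

    Assumption: Z sums over english_candidates only.
    """
    scores = [
        sum(w * h(english, chinese) for h, w in zip(features, weights))
        for english in english_candidates
    ]
    shift = max(scores)  # subtracting a constant leaves the ratios unchanged
    unnormalized = [math.exp(s - shift) for s in scores]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Two hypothetical feature functions h_k(English, Chinese).
def length_feature(english, chinese):
    return float(len(english.split()))

def mentions_north(english, chinese):
    return 1.0 if "north" in english.split() and "北" in chinese else 0.0

candidates = ["the north wind", "the wind", "a very cold wind today"]
probs = log_linear_distribution(
    "北风", candidates, [length_feature, mentions_north], [0.1, 2.0]
)
```

The features are fixed functions of the sentence pair; only the weight list changes when the model is trained.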


Page 31: JHU MT class: Feature-based models

What’s a Feature?

•Language model: p(English)

•Translation model: p(Chinese|English)

•Reverse translation model: p(English|Chinese)

•The number of words in the English sentence.

•The number of verbs in the English sentence.

•1 if the English sentence has a verb, 0 otherwise.

A feature can be any function in the form: $h_k : \text{English} \times \text{Chinese} \to \mathbb{R}^{+}$


Page 37: JHU MT class: Feature-based models

What’s a Feature?

•A word-based translation model: p(Chinese|English)

•Agreement features in the English sentence.

•Features over part-of-speech sequences in the English sentence.

•How many times the sentence pair includes the English word north and Chinese word 北.

•Do words north and 北 appear in a dictionary?

A feature can be any function in the form: $h_k : \text{English} \times \text{Chinese} \to \mathbb{R}^{+}$
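Several of the features listed on these slides are simple enough to write directly as functions of the sentence pair. The sketch below is illustrative only: the verb list and the dictionary are tiny invented stand-ins for a real tagger and a real bilingual dictionary, each Chinese character is treated as a word, and the co-occurrence count is just one possible reading of that feature.

```python
# Toy resources; both are invented for this sketch.
TOY_VERBS = {"blows", "is", "runs"}
TOY_DICTIONARY = {("north", "北"), ("wind", "风")}

def word_count(english, chinese):
    """The number of words in the English sentence."""
    return float(len(english.split()))

def verb_count(english, chinese):
    """The number of verbs, using a toy verb list instead of a real tagger."""
    return float(sum(1 for word in english.split() if word in TOY_VERBS))

def has_verb(english, chinese):
    """1 if the English sentence has a verb, 0 otherwise."""
    return 1.0 if verb_count(english, chinese) > 0 else 0.0

def north_bei_count(english, chinese):
    """One reading of the co-occurrence feature: the number of (north, 北) pairs."""
    return float(english.split().count("north") * chinese.count("北"))

def dictionary_hits(english, chinese):
    """Number of (English word, Chinese character) pairs in the toy dictionary."""
    return float(
        sum(1 for e in english.split() for c in chinese if (e, c) in TOY_DICTIONARY)
    )

english, chinese = "the north wind blows", "北风吹"
all_features = (word_count, verb_count, has_verb, north_bei_count, dictionary_hits)
feature_vector = [h(english, chinese) for h in all_features]
```

Every function has the same signature and returns a non-negative real number, which is all the model requires of a feature.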


Page 40: JHU MT class: Feature-based models

Learning

$$\arg\max_{\theta} \frac{1}{Z} \exp \sum_k \lambda_k h_k(\text{English}, \text{Chinese})$$

where: $\theta = \langle \lambda_1, \ldots, \lambda_K \rangle$

Techniques: SGD, L-BFGS

Require computing derivatives (expectations!), iterating.
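The "expectations" remark refers to the gradient of the log-likelihood with respect to each weight: the observed feature value minus its expected value under the current model. The sketch below illustrates this for a single training example, assuming the candidate set is a short explicit list and each candidate is already represented by its feature vector. The helper names and toy vectors are hypothetical, and the update is plain gradient ascent with a fixed step size.

```python
import math

def posterior(feature_vectors, weights):
    """Model probability of each candidate, normalized over the list."""
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in feature_vectors]
    shift = max(scores)
    unnormalized = [math.exp(s - shift) for s in scores]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

def log_likelihood_gradient(feature_vectors, observed_index, weights):
    """d/d lambda_k of log p(observed) = h_k(observed) - E_model[h_k]."""
    probs = posterior(feature_vectors, weights)
    gradient = []
    for k in range(len(weights)):
        expected = sum(p * fv[k] for p, fv in zip(probs, feature_vectors))
        gradient.append(feature_vectors[observed_index][k] - expected)
    return gradient

def sgd_step(weights, gradient, learning_rate=0.1):
    """One ascent step on the log-likelihood."""
    return [w + learning_rate * g for w, g in zip(weights, gradient)]

# Three candidates with two invented features each; candidate 0 is observed.
feature_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.0, 0.0]
for _ in range(100):
    weights = sgd_step(weights, log_likelihood_gradient(feature_vectors, 0, weights))
```

Each iteration needs the full posterior over candidates, which is why exact training depends on being able to compute the normalizer.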


Page 47: JHU MT class: Feature-based models

Problems

•Inference is intractable!

  •Compute over n-best lists of outputs.

  •Compute over pruned search graphs.

•Reachability: what if data likelihood is zero?

  •Throw away data.

  •Pretend the sentence with the highest BLEU score is observed.
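The last workaround can be sketched as picking an "oracle" candidate from an n-best list when the reference itself cannot be produced. The code below is a rough illustration only: it uses unigram F1 as a crude stand-in for sentence-level BLEU (it is not BLEU), and the function names, n-best list, and reference are invented.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    """Unigram F1 overlap; a crude stand-in for BLEU, not BLEU itself."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    matched = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    if matched == 0:
        return 0.0
    precision = matched / sum(cand_counts.values())
    recall = matched / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

def oracle_index(n_best, reference, metric=unigram_f1):
    """Index of the candidate that scores highest against the reference."""
    return max(range(len(n_best)), key=lambda i: metric(n_best[i], reference))

# The reference is not in the n-best list, so it is unreachable for the model.
n_best = ["the north wind is blowing", "a cold breeze", "wind"]
reference = "the north wind blows"
pseudo_observed = n_best[oracle_index(n_best, reference)]
```

The selected candidate is then treated as if it were the observed translation during training, so its likelihood under the model is never zero.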


Page 49: JHU MT class: Feature-based models

Problems

•Why maximize likelihood if we care about BLEU or some other metric?

Page 50: JHU MT class: Feature-based models

BLEU(MT output)

Page 51: JHU MT class: Feature-based models

$$\text{BLEU}\left(\arg\max_{\text{English}} \text{score}(\text{English} \mid \text{Chinese})\right)$$

Page 52: JHU MT class: Feature-based models

$$\sum_{\text{Chinese} \in \text{Test}} \text{BLEU}\left(\arg\max_{\text{English}} \text{score}(\text{English} \mid \text{Chinese})\right)$$
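One way to read this objective is: choose the weights so that the top-scoring candidate for each input has the best metric value, rather than the highest likelihood. The sketch below illustrates the idea with an exhaustive grid search over two weights. It is a simplification under stated assumptions: each input comes with a small n-best list, each candidate carries a precomputed quality value standing in for BLEU, and quality is averaged per sentence. All names and numbers are invented, and grid search is used only because it is easy to follow.

```python
import itertools

def one_best_quality(dev_set, weights):
    """Average quality of the top-scoring candidate in each n-best list.

    Each candidate is (feature_vector, quality); quality is a hypothetical
    per-sentence stand-in for BLEU.
    """
    total = 0.0
    for n_best in dev_set:
        best = max(
            n_best,
            key=lambda cand: sum(w * f for w, f in zip(weights, cand[0])),
        )
        total += best[1]
    return total / len(dev_set)

def grid_search(dev_set, grid, num_weights=2):
    """Try every weight combination on the grid and keep the best one."""
    return max(
        itertools.product(grid, repeat=num_weights),
        key=lambda weights: one_best_quality(dev_set, weights),
    )

# Features are (log translation model, log language model); numbers are invented.
dev_set = [
    [([-1.0, -5.0], 0.2), ([-3.0, -2.0], 0.9)],
    [([-2.0, -4.0], 0.3), ([-2.5, -1.0], 0.8)],
]
best_weights = grid_search(dev_set, grid=[0.0, 0.5, 1.0])
```

The objective is piecewise constant in the weights, since it only changes when the argmax changes, which is why this sketch searches instead of following a gradient.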

Page 53: JHU MT class: Feature-based models

•Optimization