Why Generative Models Underperform Surface Heuristics
UC Berkeley Natural Language Processing
John DeNero, Dan Gillick, James Zhang, and Dan Klein
Overview: Learning Phrases
Sentence-aligned corpus → Directional word alignments → Intersected and grown word alignments → Phrase table (translation model)

Example phrase-table entries:
cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
…
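The heuristic path scores each extracted phrase pair by its relative frequency. A minimal sketch of that estimation step (the toy pair list, function name, and conditioning direction are illustrative choices, not from the slides):

```python
from collections import Counter

def relative_frequency_table(extracted_pairs):
    """Heuristic phrase-table estimation: score each extracted
    (english, french) phrase pair by relative frequency,
    phi(f | e) = count(e, f) / count(e)."""
    pair_counts = Counter(extracted_pairs)
    e_counts = Counter(e for e, _ in extracted_pairs)
    return {(e, f): c / e_counts[e] for (e, f), c in pair_counts.items()}

# Toy extraction counts: "cat" aligned to "chat" twice, "minou" once.
table = relative_frequency_table([
    ("cat", "chat"), ("cat", "chat"), ("cat", "minou"),
    ("dog", "chien"),
])
print(table[("cat", "chat")])  # 2/3
```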
Overview: Learning Phrases
Sentence-aligned corpus → Phrase-level generative model → Phrase table (translation model)
• Early successful phrase-based SMT system [Marcu & Wong ‘02]
• Challenging to train
• Underperforms heuristic approach
Outline
I) Generative phrase-based alignment: motivation; model structure and training; performance results
II) Error analysis: properties of the learned phrase table; contributions to increased error rate
III) Proposed improvements
Motivation for Learning Phrases
Translate!
Input sentence: J ’ ai un chat .
Output sentence: I have a spade .
Motivation for Learning Phrases
French: appelle un chat un chat
English: call a spade a spade

Learned phrase pairs:
appelle ||| call
chat un chat ||| spade a spade
Motivation for Learning Phrases
French: appelle un chat un chat
English: call a spade a spade

All extracted sub-phrase pairs:
appelle ||| call
appelle un ||| call a
appelle un chat ||| call a spade
un ||| a (×2)
un chat ||| a spade (×2)
un chat un ||| a spade a
chat ||| spade (×2)
chat un ||| spade a
chat un chat ||| spade a spade

… appelle un chat un chat …
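The enumeration of sub-phrase pairs can be sketched as follows. This is a toy version that assumes a monotone, word-for-word alignment; a real extractor walks arbitrary word alignments:

```python
from collections import Counter

def extract_subphrases(french, english, max_len=3):
    """Enumerate aligned sub-phrase pairs of a word-for-word aligned
    phrase pair, counting repeats (toy monotone-alignment version)."""
    f, e = french.split(), english.split()
    assert len(f) == len(e)  # one-to-one, monotone alignment assumed
    pairs = Counter()
    for i in range(len(f)):
        for j in range(i + 1, min(i + max_len, len(f)) + 1):
            pairs[(" ".join(f[i:j]), " ".join(e[i:j]))] += 1
    return pairs

pairs = extract_subphrases("appelle un chat un chat",
                           "call a spade a spade")
print(pairs[("un chat", "a spade")])  # 2, matching the x2 counts
```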
A Phrase Alignment Model Compatible with Pharaoh
French: les chats aiment le poisson frais .
English: cats like fresh fish .
Training Regimen That Respects Word Alignment
[Figure: two phrase alignments of “les chats aiment le poisson frais .” with “cats like fresh fish .”; the second conflicts with the word alignment and is rejected (marked X).]
Training Regimen That Respects Word Alignment
[Figure: a phrase alignment of “les chats aiment le poisson frais .” with “cats like fresh fish .” that respects the word alignment.]
Only 46% of training sentences contributed to training.
Performance Results

[Chart: BLEU (36–40) vs. EM iterations (0–4) for 25k- and 100k-sentence corpora, compared against heuristically generated parameters.]
Performance Results

[Bar chart: BLEU for Heuristic (100k), Heuristic (50k), Heuristic (25k), and Learned (100k); values shown: 39.0, 38.5, 38.3, 38.8.]

Lost training data is not the whole story: learned parameters trained on 4× the data still underperform the heuristic.
Outline
I) Generative phrase-based alignment: model structure and training; performance results
II) Error analysis: properties of the learned phrase table; contributions to increased error rate
III) Proposed improvements
Example: Maximizing Likelihood with Competing Segmentations

Training corpus:
French: carte sur la table    English: map on the table
French: carte sur la table    English: notice on the chart

Phrase table (relative frequencies over all extracted pairs):
carte ||| map ||| 0.5
carte ||| notice ||| 0.5
carte sur ||| map on ||| 0.5
carte sur ||| notice on ||| 0.5
carte sur la ||| map on the ||| 0.5
carte sur la ||| notice on the ||| 0.5
sur ||| on ||| 1.0
la ||| the ||| 1.0
sur la ||| on the ||| 1.0
sur la table ||| on the table ||| 0.5
sur la table ||| on the chart ||| 0.5
la table ||| the table ||| 0.5
la table ||| the chart ||| 0.5
table ||| table ||| 0.5
table ||| chart ||| 0.5

Likelihood computation for “carte sur la table”: every segmentation contributes 0.25, so each training pair has likelihood 0.25 × 7 / 7 = 0.25.
Example: Maximizing Likelihood with Competing Segmentations

Training corpus:
French: carte sur la table    English: map on the table
French: carte sur la table    English: notice on the chart

A degenerate solution makes every pair deterministic:
carte ||| map ||| 1.0
carte sur ||| notice on ||| 1.0
carte sur la ||| notice on the ||| 1.0
sur ||| on ||| 1.0
sur la ||| on the ||| 1.0
sur la table ||| on the table ||| 1.0
la ||| the ||| 1.0
la table ||| the table ||| 1.0
table ||| chart ||| 1.0

Likelihood of the “notice on the chart” pair: 1.0 × 2 / 7 ≈ 0.28 > 0.25
Likelihood of the “map on the table” pair: 1.0 × 2 / 7 ≈ 0.28 > 0.25
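Both likelihood computations can be checked with a short script. This is a sketch of the joint phrase model with a uniform prior over segmentations (mirroring the 1/7 factor; the phrase length cap of 3 excludes the whole-sentence segmentation), with table contents taken from the two examples above:

```python
from itertools import combinations

def likelihood(french, english, table, max_len=3):
    """Sum the probability of producing `english` over all segmentations
    of `french`, under a uniform prior over segmentations."""
    words = french.split()
    n = len(words)
    segs = []
    for k in range(n):  # k = number of phrase-boundary cuts
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            spans = [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]
            if all(len(s.split()) <= max_len for s in spans):
                segs.append(spans)
    total = 0.0
    for seg in segs:
        def expand(i, out, p):  # sum over English renderings of this seg
            nonlocal total
            if i == len(seg):
                total += p if " ".join(out) == english else 0.0
                return
            for e, pe in table.get(seg[i], {}).items():
                expand(i + 1, out + [e], p * pe)
        expand(0, [], 1.0)
    return total / len(segs)

# Relative-frequency table extracted from the two-sentence corpus.
uniform = {
    "carte": {"map": 0.5, "notice": 0.5},
    "carte sur": {"map on": 0.5, "notice on": 0.5},
    "carte sur la": {"map on the": 0.5, "notice on the": 0.5},
    "sur": {"on": 1.0}, "la": {"the": 1.0}, "sur la": {"on the": 1.0},
    "sur la table": {"on the table": 0.5, "on the chart": 0.5},
    "la table": {"the table": 0.5, "the chart": 0.5},
    "table": {"table": 0.5, "chart": 0.5},
}
# Degenerate deterministic table that EM prefers.
degenerate = {
    "carte": {"map": 1.0}, "carte sur": {"notice on": 1.0},
    "carte sur la": {"notice on the": 1.0},
    "sur": {"on": 1.0}, "sur la": {"on the": 1.0},
    "sur la table": {"on the table": 1.0},
    "la": {"the": 1.0}, "la table": {"the table": 1.0},
    "table": {"chart": 1.0},
}

f = "carte sur la table"
print(likelihood(f, "map on the table", uniform))     # 0.25
print(likelihood(f, "map on the table", degenerate))  # 2/7 ≈ 0.2857
```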
EM Training Significantly Decreases Entropy of the Phrase Table
French phrase entropy:
[Histogram: percent of French phrases in entropy bins 0–.01, .01–.5, .5–1, 1–1.5, 1.5–2, and >2, for the learned vs. heuristic tables.]
10% of French phrases have deterministic distributions.
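The quantity binned in the histogram is the entropy of each French phrase's translation distribution; a determinized phrase lands in the lowest bin. A quick sketch (both example distributions are illustrative, shaped roughly like a flat heuristic entry and an EM-determinized one):

```python
import math

def entropy(dist):
    """Entropy in bits of one French phrase's translation distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Illustrative distributions, not figures from the slides.
heuristic = {"degree": 0.49, "level": 0.38, "extent": 0.02, "amount": 0.02}
learned = {"degree": 0.998, "characterizes": 0.001, "characterized": 0.001}

print(entropy(heuristic))  # ≈ 1.26 bits: several live candidates
print(entropy(learned))    # ≈ 0.023 bits: effectively deterministic
```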
Effect 1: Useful Phrase Pairs Are Lost Due to Critically Small Probabilities
In 10k translated sentences, no phrase pair with weight less than 10^-5 was used by the decoder.

[Chart: effective table size (thousands of phrase pairs, 0–400) for the heuristic vs. learned tables.]
Effect 2: Determinized Phrases Override Better Candidates During Decoding
Input: the situation varies to an enormous degree
Heuristic output: the situation varie d ' une immense degré
Learned output: the situation varie d ' une immense caractérise

Translations of degré:        φH     φEM
amount                        0.02   ~0
extent                        0.02   0.01
level                         0.38   0.26
degree                        0.49   0.64

Translations of caractérise:  φH     φEM
features                      0.05   ~0
characterized                 0.21   0.001
characterizes                 0.49   0.001
degree                        ~0     0.998
Effect 3: Ambiguous Foreign Phrases Become Active During Decoding
Deterministic phrases (probability 1.0) can be used by the decoder at no model cost.

[Figure: translations for the French apostrophe.]
Outline
I) Generative phrase-based alignment: model structure and training; performance results
II) Error analysis: properties of the learned phrase table; contributions to increased error rate
III) Proposed improvements
Motivation for Reintroducing Entropy to the Phrase Table
1. Useful phrase pairs are lost due to critically small probabilities.
2. Determinized phrases override better candidates.
3. Ambiguous foreign phrases become active during decoding.
Reintroducing Lost Phrases
[Chart: BLEU (36.5–39, 25k sentences) for the learned, heuristic, and interpolated phrase tables.]

Interpolation yields up to a 1.0 BLEU improvement.
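Reintroducing lost phrases amounts to mixing the two tables entry by entry. A minimal sketch of linear interpolation (the 0.5 weight and the example entries are illustrative; the tuned weight behind the BLEU gain is not given here):

```python
def interpolate(heuristic, learned, lam=0.5):
    """Mix two phrase tables: lam * heuristic + (1 - lam) * learned.
    Pairs that EM drove to (near) zero get heuristic mass back, so
    they survive pruning and can be used at decoding time."""
    out = {}
    for f in heuristic.keys() | learned.keys():
        h = heuristic.get(f, {})
        l = learned.get(f, {})
        out[f] = {e: lam * h.get(e, 0.0) + (1 - lam) * l.get(e, 0.0)
                  for e in h.keys() | l.keys()}
    return out

mixed = interpolate({"degré": {"degree": 0.49, "level": 0.38}},
                    {"degré": {"degree": 0.64, "level": 0.26}})
print(mixed["degré"]["degree"])  # 0.5 * 0.49 + 0.5 * 0.64 ≈ 0.565
```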
Smoothing Phrase Probabilities
Reserves probability mass for unseen translations, based on the length of the French phrase.

[Chart: BLEU (36.5–39, 25k sentences) for the learned, heuristic, and smoothed phrase tables.]
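A sketch of such length-based smoothing. The linear alpha-per-word schedule below is an assumption for illustration; the slide does not give the exact discount formula:

```python
def smooth(dist, french, alpha=0.1):
    """Discount a French phrase's translation distribution, reserving
    mass for unseen translations. Longer French phrases are observed
    fewer times, so more mass is held out for them. The linear
    schedule alpha * length is an illustrative assumption.
    Returns (discounted distribution, reserved mass)."""
    reserved = min(0.9, alpha * len(french.split()))
    return {e: p * (1 - reserved) for e, p in dist.items()}, reserved

dist, held_out = smooth({"degree": 0.998, "characterizes": 0.002},
                        "d ' une immense")  # 4 tokens -> 0.4 reserved
print(held_out)        # 0.4
print(dist["degree"])  # 0.998 * 0.6 ≈ 0.5988
```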
Conclusion

Generative phrase models determinize the phrase table via the latent segmentation variable.
A determinized phrase table introduces errors at decoding time.
Modest improvement can be realized by reintroducing phrase table entropy.
Questions?