![Page 1: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/1.jpg)
Training a Parser for Machine Translation Reordering
Jason Katz-Brown Slav Petrov Ryan McDonald Franz Och
David Talbot Hiroshi Ichikawa Masakazu Seno Hideto Kazawa
![Page 2: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/2.jpg)
Two juicy bits
1. Method for improving performance of any parser on any extrinsic metric
2. Experimental results in training parsers for machine-translation reordering
![Page 3: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/3.jpg)
Our reordering system
• Syntactic reordering during preprocessing at training and decoding time (Collins et al., 2005)
• Translation quality sensitive to reordering quality
• Reordering quality sensitive to parse quality
nsubj ROOT dobj
I saw herPRP VBD PRP
I her saw
Reorder verb after its arguments, to
produce Japanese word order:
Input English:
![Page 4: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/4.jpg)
Evaluating reordering quality
Source Wear sunscreen that has an SPF of 15 or above
Japanese-y Reference
15 or above of an SPF has that sunscreen Wear
References made by bilingual non-linguist annotators→Cheap→Fast
![Page 5: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/5.jpg)
Evaluating reordering quality
Source Wear sunscreen that has an SPF of 15 or above
Japanese-y Reference
15 or above of an SPF has that sunscreen Wear
Experiment 15 or above of an SPF has that Wear sunscreen
Experiment reordering score: 1 - (3-1)/(11-1) = 0.8
More detail: Talbot et al., EMNLP-WMT 2011
![Page 6: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/6.jpg)
Let’s make a magical treebank!
![Page 7: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/7.jpg)
Let’s make a magical treebank!
• Reordering quality is predictive of parse quality
![Page 8: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/8.jpg)
Let’s make a magical treebank!
• Reordering quality is predictive of parse quality
• So, what if we make a treebank that contains only parse trees that get a good reordering score?
![Page 9: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/9.jpg)
Let’s make a magical treebank!
• Reordering quality is predictive of parse quality
• So, what if we make a treebank that contains only parse trees that get a good reordering score?
• We can search through a large n-best list of parses from a baseline parser for the one with the best reordering score.
![Page 10: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/10.jpg)
Targeting good reordering
nn ROOT ref rcmod
Wear sunscreen that has ...NN NN WDT VBZ
Reordered: ... has that Wear sunscreenReordered: ... has that Wear sunscreenReordered: ... has that Wear sunscreenReordered: ... has that Wear sunscreenReordered: ... has that Wear sunscreenScore: 0.8 (“Wear” is out of place) Score: 0.8 (“Wear” is out of place) Score: 0.8 (“Wear” is out of place) Score: 0.8 (“Wear” is out of place) Score: 0.8 (“Wear” is out of place)
ROOT dobj ref rcmod
Wear sunscreen that has ...VB NN WDT VBZ
Reordered: ... has that sunscreen WearReordered: ... has that sunscreen WearReordered: ... has that sunscreen WearReordered: ... has that sunscreen WearReordered: ... has that sunscreen WearScore: 1.0 (matches reference)Score: 1.0 (matches reference)Score: 1.0 (matches reference)Score: 1.0 (matches reference)Score: 1.0 (matches reference)
Parse #1:
Parse #23:
![Page 11: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/11.jpg)
Sweet
• This could work for any extrinsic metric!
![Page 12: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/12.jpg)
Self-Training
• McClosky et al. (2006) demonstrated self-training to improve implicit parse accuracies e.g. labeled attachment score (LAS)
• For every sentence in some new corpus:
1. Parse sentence with a baseline parser.2. Add parse to baseline parser’s training data.
![Page 13: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/13.jpg)
Targeted Self-Training
• We target the self-training to incorporate an extrinsic metric
• For every sentence in some new corpus:1. Parse sentence with a baseline parser and
produce a ranked n-best list.2. Select parse with highest extrinsic metric.3. Add parse to baseline parser’s training data.
![Page 14: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/14.jpg)
Experimental Setup
• Two parser models• Deterministic Shift-Reduce Parser (Nivre et al.) + TnT Tagger
(Brants)• Latent Variable Parser (BerkeleyParser; Petrov et al. '06)
• English→SOV reordering rules of Xu et al. (2009) • Precedence rules for how to reorder the children of a
dependency tree node• Suitable for English to Japanese, Korean, Turkish, Hindi, ...
• Stanford converter to convert constituency trees to dependency trees (Marnaffe et al., 2006)
![Page 15: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/15.jpg)
Training on questions
• QuestionBank (Judge et al. '06)
• 1000-sentence train set, 1000-sentence test set
• Compare training on:1. WSJ2. WSJ + 1000 QuestionBank trees3. WSJ + 1000 reference reorderings of same sentences
![Page 16: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/16.jpg)
Question reordering scores
Shift Reduce
Berkeley
0.6 0.663 0.725 0.788 0.85
0.775
0.779
0.8
0.768
0.733
0.663
WSJWSJ+TreebankWSJ+Targeted Self-Training
Reordering score
![Page 17: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/17.jpg)
Question attachment scores
0.6
0.675
0.75
0.825
0.9
0.6 0.675 0.75 0.825 0.9
Shift ReduceBerkeley
Reordering score
LASTargeted
self-training
Treebanktraining
![Page 18: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/18.jpg)
Question translation evaluations
English to
BLEUBLEU Human eval (scores 0-6)Human eval (scores 0-6)Human eval (scores 0-6)
English to
WSJ-only Targeted WSJ-only Targeted Sig. diff?
Japanese 0.2379 0.2615 2.12 2.94 yes
![Page 19: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/19.jpg)
Training on web text
• 13595 English sentences sampled from web
• 6268-sentence train set, 7327-sentence test set
• Compare training on:• WSJ + self-training on 6268 sentences• WSJ + targeted self-training on 6268 reference reorderings
![Page 20: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/20.jpg)
Web text reordering scores
Shift Reduce
Berkeley
0.6 0.663 0.725 0.788 0.85
0.79
0.777
0.785
0.756
0.78
0.757
WSJWSJ+Self-TrainingWSJ+Targeted Self-Training
Reordering score
![Page 21: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/21.jpg)
Web text attachment scores
0.8
0.825
0.85
0.875
0.9
0.75 0.763 0.775 0.788 0.8
Shift ReduceBerkeley
Web text reordering score
WSJLAS
![Page 22: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/22.jpg)
Web text translation evaluations
English to
BLEUBLEU Human eval (scores 0-6)Human eval (scores 0-6)Human eval (scores 0-6)
English to
WSJ-only Targeted WSJ-only Targeted Sig. diff?
Japanese 0.1777 0.1802 2.56 2.69yes
(95%)
Turkish 0.3229 0.3259 2.61 2.70yes
(90%)
Korean 0.1344 0.1370 2.10 2.20yes
(95%)
![Page 23: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/23.jpg)
Related work
• Perceptron model for end-to-end MT system, updating alignment parameters from n-best list based on which leads to best BLEU score (Liang et al., 2006)
• Weakly or distantly supervised structured prediction (e.g. Chang et al., 2007)
• Re-training with bilingual data jointly parsed with multiview learning objective (Burkett et al., 2010)
• Up-training (Petrov et al., 2010)
• “Training dependency parsers by jointly optimizing multiple objectives” (Hall et al., EMNLP 2011)
![Page 24: Training a Parser for Machine Translation Reordering · Let’s make a magical treebank! • Reordering quality is predictive of parse quality • So, what if we make a treebank that](https://reader033.vdocuments.us/reader033/viewer/2022042214/5ebac7ab1d3aac255d706054/html5/thumbnails/24.jpg)
Contributions
• Introduced targeted self-training which• Adapts any parser to any extrinsic metric• Improves translation quality significantly when applied to
word reordering