phrase reordering for statistical machine translation based on predicate-argument structure mamoru...
TRANSCRIPT
![Page 1: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/1.jpg)
Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument
Structure
Mamoru Komachi, Yuji MatsumotoNara Institute of Science and Technology
Masaaki NagataNTT Communication Science Laboratories
![Page 2: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/2.jpg)
2
Overview of NAIST-NTT System
please write down your address here
住所 を ここ に 書い 下さいて
住所 を ここ に書い 下さいて
• Improve translation model by phrase reordering
address-ACC here -LOC write please
![Page 3: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/3.jpg)
3
Motivation
• Translation model using syntactic and semantic information has not yet succeeded
• Improve distortion model between language pairs with different word orders
Improve statistical machine translation by using predicate-argument structure
Improve word alignment by phrase reordering
![Page 4: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/4.jpg)
4
Outline
• Overview• Phrase Reordering by Predicate-
argument Structure• Experiments and Results• Discussions• Conclusions• Future Work
![Page 5: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/5.jpg)
5
Phrase Reordering by Predicate-argument Structure
住所 を ここ に 書い 下さいて
住所 を ここ に書い 下さいて
• Phrase reordering by morphological analysis (Niessen and Ney, 2001)
• Phrase reordering by parsing (Collins et al., 2005)
Predicate-argument structure analysisaddress-ACC here -LOC write please
![Page 6: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/6.jpg)
6
Predicate-argument Structure Analyzer: SynCha
• Predicate-argument structure analyzer based on (Iida et al., 2006) and (Komachi et al., 2006)• Identify predicates (verb/adjective/event-de
noting noun) and their arguments• Trained on NAIST Text Corpus
http://cl.naist.jp/nldata/corpus/• Can cope with zero-anaphora and ellipsis
• Achieves F-score 0.8 for arguments within a sentence
![Page 7: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/7.jpg)
7
住所 を ここ に 書い 下さいて
住所 を ここ に 書い 下さいて
住所 を ここ に 書い 下さいて
WO-ACC NI-LOC predicate
Predicate-argument Structure Analysis Steps
address-ACC here -LOC write please
![Page 8: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/8.jpg)
8
Phrase Reordering Steps
住所 を ここ に 書い 下さいて
住所 を ここ に書い 下さいて
• Find predicates(verb/adjective/event-denoting noun)
• Use heuristics to match English word order
address-ACC here -LOC write please
![Page 9: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/9.jpg)
9
Preprocessing
• Japanese side• Morphological analyzer/Tokenizer: ChaSen• Dependency parser: CaboCha• Predicate-argument structure: SynCha
• English side• Tokenizer: tokenizer.sed (LDC)• Morphological analyzer: MXPOST• All English words were lowercased for train
ing
![Page 10: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/10.jpg)
10
Aligning Training Corpus
• Manually aligned 45,909 sentence pairs out of 39,953 conversations
かしこまり まし た 。 この 用紙 に 記入 し て 下さい 。
sure . please fill out this form .
かしこまり まし た 。この 用紙 に 記入 し て 下さい 。
sure . please fill out this form .
![Page 11: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/11.jpg)
11
Training Corpus Statistics
この 用紙 に 記入 し て 下さい
please fill out this form
記入 し て 下さい この 用紙 に
Add each pair to training corpus
Learn word alignment by
GIZA++
# of sent.
Improve alignment 33,874Degrade alignment 7,959
No change 4,076Total 45,909
# of sent.
Reordered 18,539Contain crossing 39,979
this form-LOC pleasewrite
![Page 12: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/12.jpg)
12
Experiments
• WMT 2006 shared task baseline system trained on normal order corpus with default parameters
• Baseline system trained on pre-processed corpus with default parameters
• Baseline system trained on pre-processed corpus with parameter optimization by a minimum error rate training tool (Venugopal, 2005)
![Page 13: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/13.jpg)
13
Translation Model and Language Model
• Translation model• GIZA++ (Och and Ney, 2003)
• Language model• Back-off word trigram model trained by Pal
mkit (Ito, 2002)• Decoder
• WMT 2006 shared task baseline system (Pharaoh)
![Page 14: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/14.jpg)
14
Minimum Error Rate Training (MERT)
• Optimize translation parameters for Pharaoh decoder• Phrase translation probability (JE/EJ)• Lexical translation probability (JE/EJ)• Phrase penalty• Phrase distortion probability
• Trained with 500 normal order sentences
![Page 15: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/15.jpg)
15
Results
System BLEU NIST
ASR1-BEST
Baseline 0.1081 4.3555
Proposed (w/o MERT) 0.1366 4.8438
Proposed (w/ MERT) 0.1311 4.8372
Correct recognition
Baseline 0.1170 4.7078
Proposed (w/o MERT) 0.1459 5.3649
Proposed (w/ MERT) 0.1431 5.2105
![Page 16: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/16.jpg)
16
Results for the Evaluation Campaign
• While it had high accuracy on translation of content words, it had poor results on individual word translation• ASR: BLEU 12/14, NIST 11/14, METEOR 6/14• Correct Recognition: BLEU 12/14, NIST 10/14,
METEOR 7/14• Pretty high WER
![Page 17: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/17.jpg)
17
Discussion
• Better accuracy over the baseline system• Improve translation model by phase reordering
• Degrade accuracy by MERT• Could not find a reason yet• Could be explained by the fact that we did not put
any constraints on reordered sentences (They may be ungrammatical on Japanese side)
• Predicate-argument structure accuracy• SynCha is trained on newswire sources (not optimi
zed for travel conversation)
![Page 18: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/18.jpg)
18
Discussion (Cont.)• Phrase alignment got worse by splitting a
case marker from its dependent verb
住所 を ここ に 書い 下さいて
住所 を ここ に書い 下さいて
please write down your address here
address-ACC here -LOC write please
![Page 19: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/19.jpg)
19
Conclusions
• Present phrase reordering model based on predicate-argument structure
• The phrase reordering model improved translation accuracy over the baseline method
![Page 20: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and](https://reader035.vdocuments.us/reader035/viewer/2022072016/56649ee55503460f94bf49e0/html5/thumbnails/20.jpg)
20
Future work
• Investigate the reason why MERT does not work• Make reordered corpus more grammatical
(reorder only arguments)• Use newswire sources to see the effect
of correct predicate-argument structure• Reorder sentences which have crossing
alignments only• Use verb clustering and map arguments
automatically