phrase reordering for statistical machine translation based on predicate-argument structure mamoru...

20
Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and Te chnology Masaaki Nagata NTT Communication Science Labora tories

Upload: moses-barrett

Post on 02-Jan-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument

Structure

Mamoru Komachi, Yuji MatsumotoNara Institute of Science and Technology

Masaaki NagataNTT Communication Science Laboratories

Page 2: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

2

Overview of NAIST-NTT System

please write down your address here

住所 を ここ に 書い 下さいて

住所 を ここ に書い 下さいて

• Improve translation model by phrase reordering

address-ACC here -LOC write please

Page 3: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

3

Motivation

• Translation model using syntactic and semantic information has not yet succeeded

• Improve distortion model between language pairs with different word orders

Improve statistical machine translation by using predicate-argument structure

Improve word alignment by phrase reordering

Page 4: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

4

Outline

• Overview• Phrase Reordering by Predicate-

argument Structure• Experiments and Results• Discussions• Conclusions• Future Work

Page 5: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

5

Phrase Reordering by Predicate-argument Structure

住所 を ここ に 書い 下さいて

住所 を ここ に書い 下さいて

• Phrase reordering by morphological analysis (Niessen and Ney, 2001)

• Phrase reordering by parsing (Collins et al., 2005)

Predicate-argument structure analysisaddress-ACC here -LOC write please

Page 6: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

6

Predicate-argument Structure Analyzer: SynCha

• Predicate-argument structure analyzer based on (Iida et al., 2006) and (Komachi et al., 2006)• Identify predicates (verb/adjective/event-de

noting noun) and their arguments• Trained on NAIST Text Corpus

http://cl.naist.jp/nldata/corpus/• Can cope with zero-anaphora and ellipsis

• Achieves F-score 0.8 for arguments within a sentence

Page 7: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

7

住所 を ここ に 書い 下さいて

住所 を ここ に 書い 下さいて

住所 を ここ に 書い 下さいて

WO-ACC NI-LOC predicate

Predicate-argument Structure Analysis Steps

address-ACC here -LOC write please

Page 8: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

8

Phrase Reordering Steps

住所 を ここ に 書い 下さいて

住所 を ここ に書い 下さいて

• Find predicates(verb/adjective/event-denoting noun)

• Use heuristics to match English word order

address-ACC here -LOC write please

Page 9: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

9

Preprocessing

• Japanese side• Morphological analyzer/Tokenizer: ChaSen• Dependency parser: CaboCha• Predicate-argument structure: SynCha

• English side• Tokenizer: tokenizer.sed (LDC)• Morphological analyzer: MXPOST• All English words were lowercased for train

ing

Page 10: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

10

Aligning Training Corpus

• Manually aligned 45,909 sentence pairs out of 39,953 conversations

かしこまり まし た 。 この 用紙 に 記入 し て 下さい 。

sure . please fill out this form .

かしこまり まし た 。この 用紙 に 記入 し て 下さい 。

sure . please fill out this form .

Page 11: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

11

Training Corpus Statistics

この 用紙 に 記入 し て 下さい

please fill out this form

記入 し て 下さい この 用紙 に

Add each pair to training corpus

Learn word alignment by

GIZA++

# of sent.

Improve alignment 33,874Degrade alignment 7,959

No change 4,076Total 45,909

# of sent.

Reordered 18,539Contain crossing 39,979

this form-LOC pleasewrite

Page 12: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

12

Experiments

• WMT 2006 shared task baseline system trained on normal order corpus with default parameters

• Baseline system trained on pre-processed corpus with default parameters

• Baseline system trained on pre-processed corpus with parameter optimization by a minimum error rate training tool (Venugopal, 2005)

Page 13: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

13

Translation Model and Language Model

• Translation model• GIZA++ (Och and Ney, 2003)

• Language model• Back-off word trigram model trained by Pal

mkit (Ito, 2002)• Decoder

• WMT 2006 shared task baseline system (Pharaoh)

Page 14: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

14

Minimum Error Rate Training (MERT)

• Optimize translation parameters for Pharaoh decoder• Phrase translation probability (JE/EJ)• Lexical translation probability (JE/EJ)• Phrase penalty• Phrase distortion probability

• Trained with 500 normal order sentences

Page 15: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

15

Results

System BLEU NIST

ASR1-BEST

Baseline 0.1081 4.3555

Proposed (w/o MERT) 0.1366 4.8438

Proposed (w/ MERT) 0.1311 4.8372

Correct recognition

Baseline 0.1170 4.7078

Proposed (w/o MERT) 0.1459 5.3649

Proposed (w/ MERT) 0.1431 5.2105

Page 16: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

16

Results for the Evaluation Campaign

• While it had high accuracy on translation of content words, it had poor results on individual word translation• ASR: BLEU 12/14, NIST 11/14, METEOR 6/14• Correct Recognition: BLEU 12/14, NIST 10/14,

METEOR 7/14• Pretty high WER

Page 17: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

17

Discussion

• Better accuracy over the baseline system• Improve translation model by phase reordering

• Degrade accuracy by MERT• Could not find a reason yet• Could be explained by the fact that we did not put

any constraints on reordered sentences (They may be ungrammatical on Japanese side)

• Predicate-argument structure accuracy• SynCha is trained on newswire sources (not optimi

zed for travel conversation)

Page 18: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

18

Discussion (Cont.)• Phrase alignment got worse by splitting a

case marker from its dependent verb

住所 を ここ に 書い 下さいて

住所 を ここ に書い 下さいて

please write down your address here

address-ACC here -LOC write please

Page 19: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

19

Conclusions

• Present phrase reordering model based on predicate-argument structure

• The phrase reordering model improved translation accuracy over the baseline method

Page 20: Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and

20

Future work

• Investigate the reason why MERT does not work• Make reordered corpus more grammatical

(reorder only arguments)• Use newswire sources to see the effect

of correct predicate-argument structure• Reorder sentences which have crossing

alignments only• Use verb clustering and map arguments

automatically