Advanced Signal Processing 05/06, Reinisch Bernhard

Statistical Machine Translation

Phrase Based Model

ASP 06/07, Reinisch Bernhard

Translation Model – Phrase-based

Overview

● The quality of MT systems has improved with the use of phrase translation
– Phrases from word-based alignments
– Syntactic phrases
– Phrases from phrase alignments
– IBM word-based statistical MT systems enhanced with phrase translation
● What is the best way to extract phrase translation pairs?
– Evaluation framework / outcome

Word based approaches

● Try to model word-to-word correspondences
● Models are often restricted
– Each source word maps to exactly one target word
– Similar to Hidden Markov models in speech recognition
● Enhanced to a “one-to-many” alignment model
– Solves lexical problems like “Zahnarzttermin” -> “dentist’s appointment”
● The order of the words may change between source and target

Statistical machine translation (1)

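Source sentence $f_1^J = f_1 \dots f_j \dots f_J$; target sentence $e_1^I = e_1 \dots e_i \dots e_I$

$$\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J) = \arg\max_{e_1^I} \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)$$

● argmax … search/decoding problem (generation of the output sentence)
● $\Pr(e_1^I)$ … language model
● $\Pr(f_1^J \mid e_1^I)$ … translation model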
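
The decision rule on this slide can be illustrated with a minimal sketch. It assumes a tiny, explicit candidate list instead of a real search, and the two probability tables (`LANGUAGE_MODEL`, `TRANSLATION_MODEL`) hold made-up numbers that are not from the slides.

```python
import math

# Toy log-probabilities; every number here is invented for illustration.
LANGUAGE_MODEL = {            # log Pr(e)
    "there is a house": math.log(0.6),
    "it gives a house": math.log(0.1),
}
TRANSLATION_MODEL = {         # log Pr(f | e)
    ("es gibt ein haus", "there is a house"): math.log(0.3),
    ("es gibt ein haus", "it gives a house"): math.log(0.5),
}

def decode(source, candidates):
    """Return the candidate e that maximizes Pr(e) * Pr(f | e), in log space."""
    def score(e):
        return LANGUAGE_MODEL[e] + TRANSLATION_MODEL[(source, e)]
    return max(candidates, key=score)

best = decode("es gibt ein haus", list(LANGUAGE_MODEL))
print(best)
```

In this toy setting the language model outweighs the more literal translation, so the fluent candidate wins. A real decoder has to search the space of output sentences rather than enumerate it.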

Statistical machine translation (2)

Taken from [2]

Learning translation lexica

● The following slides describe methods for learning single-word and phrase-based translation lexica
– Statistical alignment models
  ● Used for learning word alignments
  ● Symmetrization
– Bilingual phrases
– Alignment templates

Statistical alignment models (1)

● In the alignment model
– A “hidden” alignment variable a is introduced
– a describes the mapping from source position j to target position a_j
● “a” is represented as a matrix with binary values
– Entry 1 … words are aligned
– Entry 0 … words are not aligned
– A source word with no target word is aligned to the empty word e_0
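
A small sketch of this representation, assuming the alignment is given as a list `a` where `a[j]` is the target position of source word j and 0 stands for the empty word e_0. The function name and the example sentence pair are hypothetical.

```python
def alignment_matrix(a, target_len):
    """Build a binary matrix from an alignment vector.

    a[j] is the target position aligned to source word j; target positions
    are 1-based and 0 denotes the empty word e_0.
    Row i is a target position (row 0 is e_0), column j is a source position.
    """
    matrix = [[0] * len(a) for _ in range(target_len + 1)]
    for j, i in enumerate(a):
        matrix[i][j] = 1
    return matrix

# Hypothetical pair: "das Haus ist ja klein" -> "the house is small".
# The particle "ja" has no target word and is aligned to e_0.
a = [1, 2, 3, 0, 4]
m = alignment_matrix(a, target_len=4)
for row in m:
    print(row)
```

Because every source word has exactly one entry in `a`, each column of the matrix contains exactly one 1. This is the restriction that symmetrization later relaxes.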

Statistical alignment models (2)

● In general the model depends on a set of unknown parameters θ
● Several specific statistical alignment models exist
– First compute word alignments, e.g. with Model 4
– Train the unknown parameters θ
● The alignment with the highest probability is called the Viterbi alignment
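
Written out, the steps on this slide take roughly the following form. The notation follows [2]; the symbols $p_\theta$ (the parametrized model) and $S$ (the number of training sentence pairs) are assumptions added here and do not appear on the slide.

```latex
\begin{align}
\Pr(f_1^J \mid e_1^I) &= \sum_{a_1^J} p_\theta(f_1^J, a_1^J \mid e_1^I) \\
\hat{\theta} &= \arg\max_{\theta} \prod_{s=1}^{S} \sum_{a} p_\theta(f_s, a \mid e_s) \\
\hat{a}_1^J &= \arg\max_{a_1^J} p_{\hat{\theta}}(f_1^J, a_1^J \mid e_1^I)
\end{align}
```

The first line sums over the hidden alignment, the second trains θ by maximum likelihood on the parallel corpus, and the third defines the Viterbi alignment under the trained parameters.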

Symmetrization (1)

● The baseline alignment model (e.g. Model 4) does not allow a source word to be aligned with multiple target words
– “Zahnarzttermin” -> “dentist’s appointment”
● The outcome should be an alignment matrix like the one in the figure

Taken from [2]

Symmetrization (2)

● To solve this problem
– Train in both translation directions
– For each sentence pair this yields two Viterbi alignments
– The two alignment tables A1 and A2 then have to be combined (symmetrized)
  ● Simple union of both tables (more refined methods exist)
– The result is then used to train single-word based translation lexica
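
A minimal sketch of the combination step, assuming each alignment table is stored as a set of (source position, target position) pairs. Only union and intersection are shown; the refined methods mentioned on the slide are not implemented, and the example alignments are made up.

```python
def symmetrize(a1, a2, method="union"):
    """Combine two directional alignments given as sets of (source_pos, target_pos)."""
    if method == "union":
        return a1 | a2
    if method == "intersection":
        return a1 & a2
    raise ValueError(f"unknown method: {method}")

# Hypothetical pair: "einen Zahnarzttermin" -> "a dentist's appointment".
# Source-to-target direction: each source word gets exactly one target word.
a1 = {(0, 0), (1, 1)}
# Target-to-source direction: each target word gets exactly one source word.
a2 = {(0, 0), (1, 1), (1, 2)}

print(sorted(symmetrize(a1, a2)))
print(sorted(symmetrize(a1, a2, "intersection")))
```

Only the union contains both links of “Zahnarzttermin”, which is the one-to-many alignment a single direction cannot express.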

Symmetrization (3)

– The lexicon probabilities are estimated as relative frequencies:

$$p(e \mid f) = \frac{N(e, f)}{N(f)}$$

● N(e, f) … how many times e and f are aligned
● N(f) … how many times the word f occurs
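
The relative-frequency estimate can be sketched as follows, assuming the symmetrized alignments are available as sets of (source position, target position) pairs. The function name and the three-sentence corpus are hypothetical.

```python
from collections import Counter

def estimate_lexicon(corpus):
    """Estimate p(e | f) = N(e, f) / N(f) from a word-aligned corpus.

    corpus: iterable of (source_words, target_words, alignment), where
    alignment is a set of (source_pos, target_pos) pairs.
    Returns a dict mapping (e, f) to the relative frequency.
    """
    pair_counts = Counter()      # N(e, f): how often e and f are aligned
    source_counts = Counter()    # N(f): how often f occurs
    for source, target, alignment in corpus:
        source_counts.update(source)
        for j, i in alignment:
            pair_counts[(target[i], source[j])] += 1
    return {(e, f): n / source_counts[f] for (e, f), n in pair_counts.items()}

corpus = [
    (["das", "haus"], ["the", "house"], {(0, 0), (1, 1)}),
    (["das", "buch"], ["the", "book"], {(0, 0), (1, 1)}),
    (["das", "haus"], ["that", "house"], {(0, 0), (1, 1)}),
]
p = estimate_lexicon(corpus)
print(p[("the", "das")], p[("that", "das")], p[("house", "haus")])
```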

Bilingual phrases

● Now we need an algorithm that learns relationships between whole phrases of m source words and n target words
– The “phrase-extract” algorithm takes the alignment matrix A as input

Taken from [2]
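
The idea behind phrase extraction can be sketched as below. This is a simplified variant, not the exact algorithm from [2]: it stores the alignment as a set of (source position, target position) pairs rather than a matrix, it does not extend phrases over unaligned boundary words, and the `max_len` limit is an assumption.

```python
def extract_phrases(source, target, alignment, max_len=3):
    """Return all phrase pairs consistent with the word alignment.

    A source span and a target span are consistent if they share at least
    one alignment point and no point links a word inside one span to a word
    outside the other span.
    """
    pairs = set()
    for j1 in range(len(source)):
        for j2 in range(j1, min(j1 + max_len, len(source))):
            # Target positions linked to the source span [j1, j2].
            linked = [i for (j, i) in alignment if j1 <= j <= j2]
            if not linked:
                continue
            i1, i2 = min(linked), max(linked)
            if i2 - i1 + 1 > max_len:
                continue
            # Consistency: nothing in [i1, i2] may link outside [j1, j2].
            if all(j1 <= j <= j2 for (j, i) in alignment if i1 <= i <= i2):
                pairs.add((" ".join(source[j1:j2 + 1]),
                           " ".join(target[i1:i2 + 1])))
    return pairs

source = ["einen", "Zahnarzttermin"]
target = ["a", "dentist's", "appointment"]
alignment = {(0, 0), (1, 1), (1, 2)}
phrases = extract_phrases(source, target, alignment)
for pair in sorted(phrases):
    print(pair)
```

On this hypothetical sentence pair the sketch yields three phrase pairs, including “Zahnarzttermin” with “dentist's appointment”.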

Alignment templates (1)

● A more systematic approach
– Considers whole phrases
  ● A whole group of adjacent words in the source maps to a whole group of words in the target
– The context of words has a greater influence
– Changes of word order can be learned
● The idea is to model two different alignment levels
– Word-level alignments
– Phrase-level alignments

Alignment templates (2)

• Alignment template $z = (\tilde{F}, \tilde{E}, \tilde{A})$
– $\tilde{F}$ … source class sequence
– $\tilde{E}$ … target class sequence
– $\tilde{A}$ … describes the alignment between source and target
• $\tilde{F}$ and $\tilde{E}$ consist of word classes, not words
– The advantage is better generalization
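
As a data structure, an alignment template could look like the sketch below. The class name, the `matches` helper, and the hand-written class labels (`PREP`, `DAY`) are all hypothetical; in [2] the word classes are learned automatically rather than assigned by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignmentTemplate:
    source_classes: tuple   # F~: sequence of source word classes
    target_classes: tuple   # E~: sequence of target word classes
    alignment: frozenset    # A~: set of (source_pos, target_pos) pairs

    def matches(self, source_phrase, word_class):
        """True if the class sequence of the phrase equals the template's F~."""
        return tuple(word_class[w] for w in source_phrase) == self.source_classes

# Hypothetical word classes: all weekday names share one class.
SOURCE_CLASS = {"am": "PREP", "montag": "DAY", "dienstag": "DAY"}

z = AlignmentTemplate(
    source_classes=("PREP", "DAY"),
    target_classes=("PREP", "DAY"),
    alignment=frozenset({(0, 0), (1, 1)}),
)
print(z.matches(["am", "montag"], SOURCE_CLASS))
print(z.matches(["am", "dienstag"], SOURCE_CLASS))
```

The generalization shows up in the second call: the template applies to a phrase whose words share the classes of a phrase seen in training, even if that exact phrase never occurred.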

Alignment templates (3)

Taken from [2]

Alignment templates (4)

● For training we need the probability of applying an alignment template
● The “phrase-extract” algorithm has to be modified
● The probability can be estimated by relative frequencies
● This completes the “Learning translation lexica” task

Translation model (1)

• For notation we decompose the sentences into a sequence of phrases (k = 1, …, K)
– $f_1^J$ … source sentence: $f_1^J = \tilde{f}_1^K$, $\tilde{f}_k = f_{j_{k-1}+1}, \dots, f_{j_k}$
– $e_1^I$ … target sentence: $e_1^I = \tilde{e}_1^K$, $\tilde{e}_k = e_{i_{k-1}+1}, \dots, e_{i_k}$
• Further considerations assume only one segmentation

Translation model (2)

● The model has to allow reordering of the phrases
– $\tilde{f}_1^K$ … source phrases
– $\tilde{e}_1^K$ … target phrases
– $\pi_1^K$ … permutation of the phrases
– $z_1^K$ … alignment templates for translation
● $\pi_1^K$ and $z_1^K$ are the hidden variables

Translation model (3)

Taken from [2]

Translation model (4)

Taken from [2]

Alignment template approach results

● The approach is evaluated on a translation task (“Verbmobil task”)
● Additional preprocessing
– Word joining
– Word splitting

Taken from [2]

Taken from [2]

Alignment template approach conclusions

● Overall we see better performance
● So it is important to model word groups in the source and target language
● By using two abstraction levels
– Phrase-level alignments
– Word-level alignments
– -> context has a greater influence and can be learned explicitly

Syntactic phrases (1)

● A collection of all phrase pairs will also include non-intuitive phrases
– “Okay, the”, “house the”, etc.
– Intuitively, such phrases do not help
– Idea: restrict the collection to syntactically motivated phrases
● This builds on syntactic trees, with phrases defined as subtrees

Syntactic phrases (2)

● The input sentence is preprocessed by a syntactic parser

● Different operations are performed on each node
– Reordering child nodes
– Inserting extra words at each node
– Translating leaf words

Syntactic phrases (3)

Taken from [4]

Syntactic phrases (4)

Taken from [6]

Syntactic phrases (5)

● Reordering
– Every child sequence has a reordering probability (N child nodes -> N! possible reorderings)
– The reordering probabilities are given by the model (as a table)
● Inserting
– An extra word can be inserted to the left or right of a node
– A separate table holds the insertion probabilities
● Translating
– The operation is applied to every leaf
– Assumption: this operation depends only on the word itself
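
The three operations can be sketched on a toy parse tree. This is a deterministic illustration only: the model in [4] assigns probabilities to each choice, whereas the sketch simply applies one fixed choice per node label. The tree encoding, the tables `order`, `insert`, and `translate`, and the English-to-German example are all assumptions.

```python
from itertools import permutations

def reorderings(children):
    """All N! possible orderings of a node's N children."""
    return list(permutations(children))

def transform(node, order, insert, translate):
    """Apply reordering, insertion, and translation to a parse tree.

    node: a leaf word (str) or a pair (label, [children]).
    order: label -> tuple of child indices (missing label = keep order).
    insert: label -> (side, word), side being "left" or "right".
    translate: source word -> target word (depends on the word only).
    Returns the list of output words.
    """
    if isinstance(node, str):
        return [translate[node]]
    label, children = node
    words = []
    for idx in order.get(label, range(len(children))):
        words.extend(transform(children[idx], order, insert, translate))
    if label in insert:
        side, word = insert[label]
        words = [word] + words if side == "left" else words + [word]
    return words

tree = ("S", ["he", "said",
              ("SBAR", ["he",
                        ("VP", ["has",
                                ("VP2", ["read", ("NP", ["the", "book"])])])])])
order = {"VP": (1, 0), "VP2": (1, 0)}      # move the verbs to the end
insert = {"SBAR": ("left", "dass")}        # insert a conjunction
translate = {"he": "er", "said": "sagte", "has": "hat",
             "read": "gelesen", "the": "das", "book": "buch"}

output = transform(tree, order, insert, translate)
print(" ".join(output))
print(len(reorderings(["a", "b", "c"])))
```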

Experiments

● Now we have three models
● [1] built a system to compare them and measure performance under different aspects
– Weighting syntactic phrases
– Maximum phrase length
● Setup
– Freely available Europarl corpus
– German to English
– Performance measured using the BLEU score
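
For orientation, the BLEU score can be sketched as follows. This is a simplified sentence-level version with a single reference and no smoothing; the metric is normally computed over a whole test corpus, and the function names are assumptions.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Count all n-grams of a token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference, sentence-level BLEU without smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # The brevity penalty punishes candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the house is small".split()
candidate = "the house is tiny".split()
print(bleu(reference, reference))
print(bleu(candidate, reference, max_n=2))
```

Without smoothing, a single missing 4-gram drives the sentence-level score to zero, which is one reason the score is usually reported at corpus level.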

Comparison of core methods

● AP … alignment templates (phrases from word-based alignments)
● M4 … IBM Model 4 for word-based translation
● Syn … syntactic phrases

● Training corpus size [sentences]

Taken from [1] Taken from [1]

Weighting syntactic phrases (1)

● The restriction to syntactic phrases is harmful, because too many phrases are eliminated
● Intuitively this should not be the case
– Attempted improvements: weighting in data collection, weighting during translation, penalizing
● Results suggest
– Collecting only syntactic phrases does not improve performance
– But it leads to smaller phrase table sizes

Weighting syntactic phrases (2)

● Example:
– “es gibt” literally translates to “it gives” but really means “there is”
– There is no syntactic relationship between the two phrases
– Also “with regard to” and “note that”: syntactically complex but easy to translate

Maximum phrase length

● How long do phrases have to be to achieve high performance?

● All experiments use the “phrases from word-based alignments” approach

Taken from [1] Taken from [1]

Simpler Underlying word-based models (1)

● The core of this framework is IBM Model 4, which provides the word alignments for collecting phrase pairs
● Model 4 is computationally expensive and its parameters can only be estimated approximately
● What about IBM Models 1-3?
– Faster and easier to implement
– Models 1 and 2 compute word alignments efficiently
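
To show why the simplest model is easy to implement, here is a sketch of EM training for IBM Model 1 only (not Models 2-4). The function names, the `NULL` token standing in for the empty word, the iteration count, and the three-sentence corpus are assumptions for illustration.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(f | e).

    corpus: list of (source_words, target_words). A NULL token is added to
    every target sentence to play the role of the empty word.
    """
    corpus = [(f, ["NULL"] + e) for f, e in corpus]
    source_vocab = {w for f, _ in corpus for w in f}
    t = defaultdict(lambda: 1.0 / len(source_vocab))   # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in corpus:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    frac = t[(f, e)] / norm    # expected alignment count
                    count[(f, e)] += frac
                    total[e] += frac
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

def viterbi_alignment(f_sent, e_sent, t):
    """Under Model 1 each source word independently picks its best target word."""
    e_sent = ["NULL"] + e_sent
    return [max(range(len(e_sent)), key=lambda i: t[(f, e_sent[i])])
            for f in f_sent]

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_model1(corpus)
print(viterbi_alignment(["das", "haus"], ["the", "house"], t))
```

The returned list holds, for each source word, the position of its target word, with 0 reserved for the empty word.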

Simpler Underlying word-based models (2)

● How much is performance affected if the word alignment is based on these simpler models?
● M1 gives the worst performance
● But M2 and M3 provide similar performance to the M4 model

Taken from [1]

Conclusions

● Intuitively, phrase-based approaches give better performance than word-based approaches
● Experiments also show that
– “Straightforward” syntax-based models have disadvantages
● The “best” outcome is achieved with short phrases
● Phrase extraction and the alignment heuristic have a great influence

References

● [1] Philipp Koehn, Franz Josef Och, Daniel Marcu; Statistical Phrase-Based Translation

● [2] Franz Josef Och, Hermann Ney; The Alignment Template Approach to Statistical Machine Translation

● [3] Franz Josef Och, Christoph Tillmann, Hermann Ney; Improved Alignment Models for Statistical Machine Translation

● [4] Kenji Yamada, Kevin Knight; A Syntax-based Translation Model

● [5] Daniel Marcu, William Wong; A Phrase-Based, Joint Probability Model for Statistical Machine Translation

● [6] Amitabha Mukerjee, Ankit Soni and Achla M. Raina; Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora

● [7] www.sbox.tugraz.at/home/b/brein/061120_TranslationModelPhraseBased.zip
