Machine Translation
12: (Non-neural) Statistical Machine Translation
Rico Sennrich
University of Edinburgh
R. Sennrich MT – 2018 – 12 1 / 27
Today’s Lecture
So far, the main focus of the lecture was on neural machine translation, and research since ≈ 2013.
Today, we look at (non-neural) statistical machine translation, and research since ≈ 1990.
1 Statistical Machine Translation
  Basics
  Phrase-based SMT
  Hierarchical SMT
  Syntax-based SMT
Refresher: A probabilistic model of translation
Suppose that we have:
a source sentence S of length m (x1, . . . , xm)
a target sentence T of length n (y1, . . . , yn)
We can express translation as a probabilistic model:
T∗ = argmax_T P(T|S)
   = argmax_T P(S|T) P(T)    (Bayes' theorem)
We can model translation via two models:
language model to estimate P(T)
translation model to estimate P(S|T)
Without continuous space representations, how do we estimate P(S|T)?
→ break it up into smaller units
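As a toy illustration of the noisy-channel decision rule (all probabilities here are invented, not taken from the lecture), the decoder picks the candidate T maximizing P(S|T) · P(T):

```python
# Toy noisy-channel scoring: choose the target sentence T that maximizes
# P(S|T) * P(T). All probabilities are invented for illustration.
candidates = {
    "the house": {"tm": 0.56, "lm": 0.1},    # translation model P(S|T), language model P(T)
    "house the": {"tm": 0.56, "lm": 0.001},  # same words, bad order: the LM penalizes it
}
best = max(candidates, key=lambda t: candidates[t]["tm"] * candidates[t]["lm"])
print(best)  # -> the house
```

Note how the translation model alone cannot distinguish the two candidates; the language model supplies the word-order preference.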
Word Alignment
chicken-and-egg problem
let's break up P(S|T) into small units (words):
we can estimate an alignment given a translation model (expectation step)
we can estimate a translation model given an alignment, using relative frequencies (maximization step)
what can we do if we have neither?
solution: Expectation Maximization Algorithm
initialize model
iterate between estimating alignment and translation model
simplest model based on lexical translation; more complex models consider position and fertility
Word Alignment: IBM Models [Brown et al., 1993]
EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
• Initial step: all alignments equally likely
• Model learns that, e.g., la is often aligned with the
Chapter 4: Word-Based Models 14
Word Alignment: IBM Models [Brown et al., 1993]
EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
• After one iteration
• Alignments, e.g., between la and the are more likely
Word Alignment: IBM Models [Brown et al., 1993]
EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
• After another iteration
• It becomes apparent that alignments, e.g., between fleur and flower, are more likely (pigeonhole principle)
Word Alignment: IBM Models [Brown et al., 1993]
EM Algorithm
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
• Convergence
• Inherent hidden structure revealed by EM
Word Alignment: IBM Models [Brown et al., 1993]
IBM Model 1 and EM
• Probabilities
p(the|la) = 0.7    p(house|la) = 0.05
p(the|maison) = 0.1    p(house|maison) = 0.8

• Alignments
[Figure: the four possible word alignments of "la maison" ↔ "the house"]
p(e, a|f) = 0.56    p(e, a|f) = 0.035    p(e, a|f) = 0.08    p(e, a|f) = 0.005
p(a|e, f) = 0.824   p(a|e, f) = 0.052    p(a|e, f) = 0.118   p(a|e, f) = 0.007

• Counts
c(the|la) = 0.824 + 0.052    c(house|la) = 0.052 + 0.007
c(the|maison) = 0.118 + 0.007    c(house|maison) = 0.824 + 0.118
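The numbers on this slide can be reproduced with a short IBM Model 1 E-step sketch (a minimal sketch: one sentence pair, each target word aligned to exactly one source word):

```python
# One E-step of IBM Model 1 on the toy pair ("la maison", "the house"),
# using the lexical probabilities from the slide. With 2 source and 2
# target words there are 4 possible alignments.
from itertools import product

t = {("the", "la"): 0.7, ("house", "la"): 0.05,
     ("the", "maison"): 0.1, ("house", "maison"): 0.8}

src = ["la", "maison"]
tgt = ["the", "house"]

# score every alignment a = (a_1, ..., a_n), a_j indexing a source word
scores = {}
for a in product(range(len(src)), repeat=len(tgt)):
    p = 1.0
    for j, i in enumerate(a):
        p *= t[(tgt[j], src[i])]
    scores[a] = p
total = sum(scores.values())
posteriors = {a: p / total for a, p in scores.items()}

# expected counts c(e|f), the input to the M-step
counts = {}
for a, p in posteriors.items():
    for j, i in enumerate(a):
        pair = (tgt[j], src[i])
        counts[pair] = counts.get(pair, 0.0) + p

print(round(posteriors[(0, 1)], 3))    # alignment the-la, house-maison: 0.824
print(round(counts[("the", "la")], 3)) # 0.875 (the slide's 0.824 + 0.052, up to rounding)
```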
Linear Models
T∗ = argmax_T P(S|T) P(T)    (Bayes' theorem)
T∗ ≈ argmax_T Σ_{m=1}^{M} λ_m h_m(S, T)    [Och, 2003]
linear combination of arbitrary features
Minimum Error Rate Training to optimize feature weights
big trend in SMT research: engineering new/better features
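The linear model is just a weighted sum of feature values. A minimal sketch (feature values and weights below are invented for illustration; in practice the λ's are tuned with MERT on held-out data):

```python
# Log-linear scoring of one translation hypothesis: a weighted sum of
# arbitrary feature functions h_m(S, T). Values are illustrative only.
def score(features, weights):
    return sum(weights[name] * value for name, value in features.items())

features = {"tm": -4.2, "lm": -6.1, "word_penalty": -5.0}  # log-domain feature values
weights = {"tm": 1.0, "lm": 0.8, "word_penalty": -0.1}     # lambda_m
print(score(features, weights))  # ≈ -8.58
```

Because features are arbitrary functions of (S, T), adding a new feature only requires adding one entry and one weight, which is why feature engineering became such a large part of SMT research.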
Word-based SMT
core idea
combine word-based translation model and n-gram language model to compute the score of a translation

consequences
+ models are easy to compute
- word translations are assumed to be independent of each other: only the LM takes context into account
- poor at modelling long-distance phenomena: n-gram context is limited
Phrase-based SMT
core idea
the basic translation unit in the translation model is not the word, but a word sequence (phrase)

consequences
+ much better memorization of frequent phrase translations
- large (and noisy) phrase table
- large search space; requires sophisticated pruning
- still poor at modelling long-distance phenomena
[Figure: phrase-based segmentation of "leider ist Herr Steiger nach Köln gefahren" ↔ "unfortunately, Mr Steiger has gone to Cologne"]
Phrase Extraction
extraction rules based on word-aligned sentence pair
phrase pair must be compatible with alignment...
...but unaligned words are ok
phrases are contiguous sequences
Extracting Phrase Translation Rules
[Figure: word alignment matrix for "I shall be passing on to you some comments" ↔ "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen"; extracted phrase pair:]
shall be = werde
Syntax-based Statistical Machine Translation 36
Phrase Extraction
Extracting Phrase Translation Rules
[Figure: the same word alignment matrix; extracted phrase pair:]
some comments = die entsprechenden Anmerkungen
Phrase Extraction
Extracting Phrase Translation Rules
[Figure: the same word alignment matrix; extracted phrase pair:]
werde Ihnen die entsprechenden Anmerkungen aushändigen = shall be passing on to you some comments
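The compatibility condition on phrase pairs can be sketched as a consistency check (toy alignment below, not the lecture's example): a source span and target span form a valid phrase pair if no alignment link crosses the box boundary and at least one link lies inside it.

```python
# Sketch of the phrase-pair consistency check used in phrase extraction.
def consistent(alignment, src_span, tgt_span):
    (s1, s2), (t1, t2) = src_span, tgt_span
    inside = False
    for i, j in alignment:
        in_src = s1 <= i <= s2
        in_tgt = t1 <= j <= t2
        if in_src != in_tgt:        # link crosses the box boundary
            return False
        inside = inside or (in_src and in_tgt)
    return inside

# toy alignment between 3 source and 3 target words: 0-0, 1-2, 2-1
links = [(0, 0), (1, 2), (2, 1)]
print(consistent(links, (0, 0), (0, 0)))  # True: isolated link
print(consistent(links, (1, 1), (1, 1)))  # False: link 1-2 leaves the box
print(consistent(links, (1, 2), (1, 2)))  # True: reordered words covered jointly
```

Unaligned words never trigger the boundary check, which is why they are allowed inside extracted phrases.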
Common Features in Phrase-based SMT
phrase translation probabilities (in both directions)
word translation probabilities (in both directions)
language model
reordering model
constant penalty for each phrase used
sparse features with learned cost for some (classes of) phrase pairs
multiple models of each type possible
Decoding
Translation Options
[Figure: translation options for the source sentence "er geht ja nicht nach hause"; each word and span has multiple options, e.g. er → he / it, geht → goes / go / is, ja → yes / of course, nicht → not / do not / does not, nach → after / to / according to, hause → house / home, plus options for multi-word spans]
• The machine translation decoder does not know the right answer
  - picking the right translation options
  - arranging them in the right order
→ Search problem solved by heuristic beam search
Chapter 6: Decoding 9
Decoding
Decoding: Hypothesis Expansion
[Figure: hypothesis expansion over "er geht ja nicht nach hause"; the empty hypothesis is expanded with the option "are"]
pick any translation option, create new hypothesis
Decoding
Decoding: Hypothesis Expansion
[Figure: hypothesis expansion over "er geht ja nicht nach hause"; hypotheses "are", "it", "he" have been created]
create hypotheses for all other translation options
Decoding
Decoding: Hypothesis Expansion
[Figure: hypothesis expansion over "er geht ja nicht nach hause"; partial hypotheses are expanded further, e.g. he → goes → does not → home]
also create hypotheses from previously created partial hypotheses
Decoding
Decoding: Find Best Path
[Figure: completed hypothesis lattice over "er geht ja nicht nach hause" with the best path marked]
backtrack from highest scoring complete hypothesis
Decoding
large search space (exponential number of hypotheses)
reduction of search space:
recombination of identical hypotheses
pruning of hypotheses
efficient decoding is a lot more complex in SMT than in neural MT
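The stack-based expansion and pruning described above can be sketched in a few lines. This is a deliberately stripped-down version (toy phrase table, monotone translation only, no language model, no reordering, no recombination), so it illustrates only the stack/beam mechanics, not a real decoder:

```python
# Minimal sketch of stack ("beam") decoding for phrase-based SMT.
import heapq

phrase_table = {  # toy entries: source phrase -> [(translation, log-prob)]
    ("er",): [("he", -0.4), ("it", -1.1)],
    ("geht",): [("goes", -0.5)],
    ("ja", "nicht"): [("does not", -0.7)],
    ("nach", "hause"): [("home", -0.6)],
}
source = ["er", "geht", "ja", "nicht", "nach", "hause"]
beam_size = 3

# one stack per number of covered source words; hypothesis = (score, output)
stacks = [[] for _ in range(len(source) + 1)]
stacks[0] = [(0.0, ())]
for covered in range(len(source)):
    for score, output in stacks[covered]:
        for length in range(1, len(source) - covered + 1):
            src = tuple(source[covered:covered + length])
            for translation, logp in phrase_table.get(src, []):
                stacks[covered + length].append((score + logp, output + (translation,)))
    # prune every stack to the beam
    for i, stack in enumerate(stacks):
        stacks[i] = heapq.nlargest(beam_size, stack)

best_score, best_output = max(stacks[-1])
print(" ".join(best_output))  # -> he goes does not home
```

A real decoder additionally tracks a coverage vector (so phrases can be picked out of order), scores each expansion with the language model, and recombines hypotheses that are indistinguishable for future scoring.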
Hierarchical SMT
core idea
use context-free grammar (CFG) rules as basic translation units
→ allows gaps

consequences
+ better modeling of some reordering patterns
[Figure: hierarchical rule translating "leider ist Herr Steiger nach Köln gefahren" as "unfortunately, Mr Steiger has gone to Cologne", with linked gaps]
- overgeneralisation is still possible
[Figure: the same rule applied to "leider ist Herr Steiger nicht nach Köln gefahren", yielding the ungrammatical "unfortunately, Herr Steiger does not has gone to Cologne"]
Hierarchical Phrase Extraction
Extracting Hierarchical Phrase Translation Rules
[Figure: the word alignment matrix for "I shall be passing on to you some comments" ↔ "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen"; subtracting a subphrase yields the hierarchical rule:]
werde X aushändigen = shall be passing on X
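Subtracting a subphrase can be sketched as a string operation (a toy sketch over the slide's example strings; real extraction works on alignment spans, not raw strings):

```python
# Toy sketch of hierarchical rule extraction: replace an embedded phrase
# pair inside a larger phrase pair with the non-terminal X on both sides.
def subtract(big_src, big_tgt, sub_src, sub_tgt):
    return (big_src.replace(sub_src, "X", 1),
            big_tgt.replace(sub_tgt, "X", 1))

big = ("werde Ihnen die entsprechenden Anmerkungen aushändigen",
       "shall be passing on to you some comments")
sub = ("Ihnen die entsprechenden Anmerkungen",
       "to you some comments")

rule = subtract(big[0], big[1], sub[0], sub[1])
print(rule)  # -> ('werde X aushändigen', 'shall be passing on X')
```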
Decoding

Decoding via (S)CFG derivation
• Derivation starts with a pair of linked s symbols; each step rewrites linked non-terminals on both sides with a synchronous rule:

s1 | s1
⇒ s2 x3 | s2 x3    (glue rule: s → s1 x2 | s1 x2)
⇒ s2 x4 und x5 | s2 x4 and x5    (x → x1 und x2 | x1 and x2)
⇒ s2 unzutreffend und x5 | s2 unfounded and x5    (x → unzutreffend | unfounded)
⇒ s2 unzutreffend und irreführend | s2 unfounded and misleading    (x → irreführend | misleading)
⇒ x6 unzutreffend und irreführend | x6 unfounded and misleading    (glue rule: s → x1 | x1)
⇒ deshalb x7 die x8 unzutreffend und irreführend | therefore the x8 x7 unfounded and misleading    (x → deshalb x1 die x2 | therefore the x2 x1, non-terminal reordering)
⇒ deshalb sei die x8 unzutreffend und irreführend | therefore the x8 was unfounded and misleading    (x → sei | was)
⇒ deshalb sei die Werbung unzutreffend und irreführend | therefore the advertisement was unfounded and misleading    (x → Werbung | advertisement)
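The synchronous rewriting above can be sketched as follows (a toy sketch: the rules mirror the tail of the lecture's derivation, and indexed non-terminal names like X1/X2 stand in for the co-indexing links of a real SCFG):

```python
# Minimal sketch of applying synchronous CFG rules: each rule rewrites the
# same linked non-terminal on the source and target side simultaneously.
def apply_rule(pair, rule):
    """Rewrite the first occurrence of the rule's LHS on both sides."""
    (src, tgt), (lhs, src_rhs, tgt_rhs) = pair, rule
    i, j = src.index(lhs), tgt.index(lhs)
    return (src[:i] + src_rhs + src[i + 1:], tgt[:j] + tgt_rhs + tgt[j + 1:])

# non-terminals in UPPERCASE, terminals lowercase
derivation = (["S"], ["S"])
rules = [
    ("S", ["X"], ["X"]),                                  # glue rule
    ("X", ["deshalb", "X1", "die", "X2"],
          ["therefore", "the", "X2", "X1"]),              # non-terminal reordering
    ("X1", ["sei"], ["was"]),
    ("X2", ["Werbung"], ["advertisement"]),
]
for rule in rules:
    derivation = apply_rule(derivation, rule)

print(" ".join(derivation[0]))  # -> deshalb sei die Werbung
print(" ".join(derivation[1]))  # -> therefore the advertisement was
```

Note how the reordering rule places X1 and X2 in different orders on the two sides; this is what lets hierarchical models capture reorderings that phrase-based models must memorize whole.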
Syntax-based SMT
core idea
use syntax on source, target, or both
rule extraction constrained by syntax
potentially use syntactic structures for scoring (syntax-based LMs)

consequences
depend on exact flavor of syntax used; here: string-to-tree SMT
+ less overgeneralisation
- sparsity in grammar requires relaxation of extraction constraints
- label matching constraints increase search space during decoding
[Figure: parse trees for "leider ist Herr Steiger nach Köln gefahren" (German side: S, NP, VP, PP; tags NN, VAFIN, ADV, APPR, NE, VVPP) and "unfortunately, Mr Steiger has gone to Cologne" (English side: S, NP, VP, PP; tags NNP, ADV, VBZ, VBN, TO)]
Syntax-based Phrase Extraction
Learning Syntactic Translation Rules
[Figure: POS-tagged word alignment for "I shall be passing some comments on to you" ↔ "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen", with parse trees on both sides; an extracted syntactic rule aligns "Ihnen" with "to you"]
Decoding
Example
Input: jemand mußte Josef K. verleumdet haben
(gloss: someone must Josef K. slandered have)
Grammar
r1: np → Josef K. | Josef K.  0.90
r2: vbn → verleumdet | slandered  0.40
r3: vbn → verleumdet | defamed  0.20
r4: vp → mußte x1 x2 haben | must have vbn2 np1  0.10
r5: s → jemand x1 | someone vp1  0.60
r6: s → jemand mußte x1 x2 haben | someone must have vbn2 np1  0.80
r7: s → jemand mußte x1 x2 haben | np1 must have been vbn1 by someone  0.05
Derivation 1
[Figure: paired source and target trees deriving "jemand mußte Josef K. verleumdet haben" | "someone must have slandered Josef K." (S, VP, NP, VBN nodes)]
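Assuming the standard model in which rule probabilities multiply independently, and reading Derivation 1 as applying r5, r4, r1, r2 (an assumption about the slide's tree), derivations can be scored and compared:

```python
# Scoring derivations as a product of rule probabilities; rule indices
# and values follow the slide's grammar.
probs = {"r1": 0.90, "r2": 0.40, "r3": 0.20, "r4": 0.10,
         "r5": 0.60, "r6": 0.80, "r7": 0.05}

def derivation_prob(rules):
    p = 1.0
    for r in rules:
        p *= probs[r]
    return p

# Derivation 1 (assumed reading of the slide's tree): r5, then r4, r1, r2
p1 = derivation_prob(["r5", "r4", "r1", "r2"])
# An alternative derivation using the larger rule r6 with r1 and r2
p2 = derivation_prob(["r6", "r1", "r2"])
print(p1)  # ≈ 0.0216
print(p2)  # ≈ 0.288 -- the decoder would prefer this derivation
```

Different derivations can yield the same target string; the decoder must search over all of them, which is why SCFG decoding uses chart parsing with pruning.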
Why Syntax-based SMT?
many variants (syntax on source/target/both...)
syntactic constraints for rule extraction and application prevent some over-generalizations
syntactic structure can be exploited by feature functions:
unification constraints [Williams, 2009]
“eine” → [cat: ART; infl: case nom, declension mixed; agr: gender f, num sg]
“Welt” → [cat: NN; infl: case nom; agr: gender f, num sg]
syntax-based neural language model [Sennrich, 2015]
P_SYNTAX(T, D) ≈ ∏_{i=1}^{n} P_l(i) × P_w(i)
P_l(i) = P(l_i | w_{a(i)}, l_{a(i)})
P_w(i) = P(w_i | l_i, w_{a(i)}, l_{a(i)})
[Figure: dependency tree for "Laura hat einen kleinen Garten" / "Laura has a small garden", with relations root, subj, obja, det, attr]
Edinburgh’s* WMT Results over the Years
[Figure: BLEU on newstest2013 (EN→DE), 2013–2017:
phrase-based SMT: 20.3, 20.9, 20.8, 21.5
syntax-based SMT: 19.4, 20.2, 22.0, 22.1
neural MT (2015–2017): 18.9, 24.7, 26.0]
*NMT 2015 from U. Montréal: https://sites.google.com/site/acl16nmt/
What Phrase-based SMT (Still) Does Better than NMT
better performance in low-data conditions [Koehn and Knowles, 2017]
clear stopping criterion at decoding time: when all source words have been covered by a phrase pair
good ecosystem of methods for specialized requirements (e.g. inclusion of terminology)
ability to inspect translation decisions and models:
alignment between source and output
add/remove phrase table entries
Software
Moses SMT Toolkit
developed in Edinburgh
many features and extensive documentation:http://www.statmt.org/moses
documentation of baseline phrase-based systems:http://www.statmt.org/moses/?n=moses.baseline
http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2017/baseline/baselineSystemPhrase_kj.html
config files for SOTA (in 2014/5) syntax-based systems:https://github.com/rsennrich/wmt2014-scripts
Further Reading
text books
Philipp Koehn (2009). Statistical Machine Translation.
Philip Williams; Rico Sennrich; Matt Post; Philipp Koehn (2016).Syntax-based Statistical Machine Translation.
online resources
syntax-based tutorial by Philip Williams and Philipp Koehn (slide credit to them for some slides shown here):
http://homepages.inf.ed.ac.uk/s0898777/syntax-tutorial.pdf
slides on word- and phrase-based SMT by Philipp Koehn:http://www.statmt.org/book/slides/04-word-based-models.pdf
http://www.statmt.org/book/slides/05-phrase-based-models.pdf
http://www.statmt.org/book/slides/06-decoding.pdf
Bibliography I
Brown, P. F., Della Pietra, V. J., Della Pietra, S. A., and Mercer, R. L. (1993). The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263–311.

Koehn, P. and Knowles, R. (2017). Six Challenges for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation, pages 28–39, Vancouver. Association for Computational Linguistics.

Och, F. J. (2003). Minimum Error Rate Training in Statistical Machine Translation. In ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 160–167, Sapporo, Japan. Association for Computational Linguistics.

Sennrich, R. (2015). Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation. Transactions of the Association for Computational Linguistics, 3:169–182.

Williams, P. (2009). Towards Statistical Machine Translation with Unification Grammars. Master's thesis, University of Edinburgh, Edinburgh, UK.