Surafel M. Lakew1,2, Quintino F. Lotito2, Matteo Negri1, Marco Turchi1, Marcello Federico1
1Fondazione Bruno Kessler | 2University of Trento, Trento, Italy
Improving Zero-shot Translation of Low-Resource Languages
IWSLT-2017 | Tokyo, Japan | 14-15/12/2017
Machine Translation: why low-resource & zero-shot
* Washington Post article, citing “Ethnologue: Languages of the World”, 18th ed.
There are at least 7,102 living languages in the world:
Americas: 1,064 | Asia: 2,301 | Africa: 2,138 | Pacific: 1,313 | Europe: 286
A very short story of mine
* Languages of Ethiopia: “Ethnologue: Languages of the World”, https://www.ethnologue.com/country/ET
From a place with 88 living languages:
5 Dying | 8 In Trouble | 41 Institutional | 15 Developing | 19 Vigorous
Machine Translation
[Diagram: an MT Engine. Training: Italian (source data) and English (target data). Inference: “È una storia incredibile.” -> “It's an incredible story.”]
Neural Machine Translation: working mechanism
[Diagram, built up across slides: source words feed an Encoder; an Attention mechanism bridges to a Decoder, which produces the target words.]
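As a concrete companion to the diagram, here is a minimal sketch of an encoder-attention-decoder model in PyTorch. The architecture, dimensions, and names are illustrative assumptions for this walkthrough, not the system trained for the paper.

```python
# Minimal encoder-attention-decoder sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder: source words -> a sequence of hidden states.
        enc_states, h = self.encoder(self.src_emb(src))
        logits = []
        for t in range(tgt.size(1)):
            step = self.tgt_emb(tgt[:, t:t + 1])  # current target word
            # Attention: score each encoder state against the decoder state,
            # then mix the encoder states into one context vector.
            scores = torch.bmm(enc_states, h[-1].unsqueeze(2))  # (B, S, 1)
            context = (F.softmax(scores, dim=1) * enc_states).sum(1, keepdim=True)
            # Decoder: consume the embedded word plus the attention context.
            out, h = self.decoder(torch.cat([step, context], dim=2), h)
            logits.append(self.out(out))
        return torch.cat(logits, dim=1)  # (B, T, tgt_vocab)

model = TinyNMT(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))  # a batch of source word ids
tgt = torch.randint(0, 1000, (2, 5))  # shifted target word ids
print(model(src, tgt).shape)          # torch.Size([2, 5, 1000])
```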
Neural Machine Translation: a walk to Multilingual NMT
2014: NMT enters the mainstream.
2015: Multi-NMT is introduced, using multiple encoders & decoders, and multi-tasking approaches.
2016: Multi-source NMT; Multi-NMT with a shared attention mechanism; Multi-NMT with a single encoder-decoder.
2017: Multimodal-multilingual approaches in WMT17; an increasing number of languages (IWSLT17, with 20 directions); new training approaches, including this paper.
[Diagram: from single-pair NMT to Multi-NMT]
Multilingual NMT (Multi-NMT) permits Zero-Shot Translation (ZST).
Multilingual NMT & ZST: challenge
Small training data:
“NMT systems have a steeper learning curve with respect to the amount of training data, resulting in worse quality in low-resource settings...” Koehn and Knowles [2017].
Multilingual NMT & ZST: scenario & our hypothesis
[Diagram: English at the center, connected to Italian, Romanian, German, and Dutch; Italian<-->Romanian is the zero-shot (ZST) direction.]
Hypothesis: ZST results in worse translation performance in such a low-resource setting.
Pivoting (multi-step translation through an intermediate language) is an alternative to direct ZST.
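Pivoting amounts to chaining two supervised systems through English; a minimal sketch, where the bilingual models and the translate stub are hypothetical placeholders (stubbed out so the example runs):

```python
# Pivoting: Italian -> Romanian in two steps through English.
def translate(model, sentence):
    # Stand-in for real NMT inference; a trained model would return a hypothesis.
    return f"[{model}]({sentence})"

def pivot_translate(it_sentence, it_en_model="it-en", en_ro_model="en-ro"):
    en_hyp = translate(it_en_model, it_sentence)  # step 1: source -> pivot
    # Step 2: pivot -> target. Any error from step 1 propagates into step 2,
    # and inference cost doubles: the main drawbacks versus direct ZST.
    return translate(en_ro_model, en_hyp)

print(pivot_translate("È una storia incredibile."))
```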
Multilingual NMT & ZST: related work
[Diagram: multi-way architecture with per-language encoders and decoders tied together by a shared attention mechanism.]
Firat et al. [2016a]: Multi-way, multilingual neural machine translation with a shared attention mechanism.
Firat et al. [2016b]: Zero-resource translation with multilingual neural machine translation.
[Diagram: universal architecture with a single shared encoder, decoder, and attention.]
Ha et al. [2016]: Toward multilingual neural machine translation with universal encoder and decoder.
Johnson et al. [2016]: Google's multilingual neural machine translation system: Enabling zero-shot translation.
Multilingual NMT & ZST: current limitations
- Assumes that parallel data is already available
- Inefficient creation and usage of synthetic data
- Weaker target-language ID in low-resource scenarios
Multilingual NMT & ZST: our setup
Illustration: Google's Multilingual Neural Machine Translation System - https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
[Diagram: a single Multi-NMT model shared across English, Dutch, German, Romanian, and Italian on both the source and the target side.]
No parallel data is available for the Italian<-->Romanian pair.
Training covers 8 directions; inference covers those 8 directions plus 2 zero-shot directions.
Notice how the Italian<-->Romanian ZST directions create a dual-translation loop.
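In this Google-style setup, one shared model is steered by a target-language flag prepended to every source sentence (the <2IT>/<2RO> tokens that appear on the following slides). A minimal sketch of that preprocessing, with illustrative sentences:

```python
# Prepend a target-language flag so one shared encoder-decoder can serve
# all directions (Google-style Multi-NMT).
def tag_source(src_sentence, tgt_lang):
    return f"<2{tgt_lang}> {src_sentence}"

# A supervised direction seen in training...
print(tag_source("Hello world", "DE"))  # <2DE> Hello world
# ...and a zero-shot direction at inference: the model never saw an
# Italian-Romanian pair, but the <2RO> flag still requests Romanian output.
print(tag_source("È una storia incredibile.", "RO"))
```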
Iterative Learning
[Diagram: Training Data -> TRAINING -> Multi-NMT (multi-source, multi-target) -> INFERENCE on the zero-shot sources [zst-source], yielding pairs [MT output, zst-source] that flow back into the training data.]
This creates a train-infer-train cycle for the dual translation directions.
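As a control-flow sketch, the cycle might look like the loop below; train_model and translate_corpus are hypothetical stand-ins for the real NMT toolkit, stubbed out so the example runs:

```python
# Train-infer-train cycle for the two zero-shot directions (sketch).
def train_model(pairs, init=None):
    # Stand-in for real training (or continued training from `init`).
    return {"trained_on": len(pairs)}

def translate_corpus(model, sentences, tgt_lang):
    # Stand-in for batch inference with a target-language flag.
    return [f"<{tgt_lang}-hyp> {s}" for s in sentences]

def train_infer_train(parallel_data, it_side, ro_side, rounds=3):
    model = train_model(parallel_data)                    # round ZERO
    for _ in range(rounds):
        # INFER: translate both zero-shot sides with the current model.
        ro_star = translate_corpus(model, it_side, "RO")  # IT -> RO*
        it_star = translate_corpus(model, ro_side, "IT")  # RO -> IT*
        # The MT output becomes the source and the genuine text the target,
        # giving RO* --> IT and IT* --> RO synthetic pairs.
        synthetic = list(zip(ro_star, it_side)) + list(zip(it_star, ro_side))
        # TRAIN: continue training on the original + self-generated data.
        model = train_model(parallel_data + synthetic, init=model)
    return model

print(train_infer_train([("hallo", "hello")], ["ciao"], ["salut"]))
```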
Iterative Learning: training with self-generated data
[Diagram: the training data covers EN, DE, NL, RO, and IT; we use the RO & IT data for the train-infer-train loop.]
<2IT> / <2RO> inference runs at round ZERO (before applying train-infer-train), at round ONE, and at round N (convergence).
How does translation duality help to improve the ZST directions?
[Diagram: the RO and IT inference data pass through the Multi-NMT model, producing MT outputs IT* and RO*; pairing each MT output with the genuine text it was translated from yields new parallel data for the IT* --> RO and RO* --> IT directions.]
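Concretely, each round writes the MT output back as the noisy source side of a new tagged training example; a minimal sketch, reusing abridged sentences from the example slide below (the helper name is illustrative):

```python
# Turn one round of MT output into tagged synthetic training pairs:
# the noisy hypothesis is the source, the genuine text is the target,
# so the model always learns to *produce* clean text.
def make_synthetic(mt_output, originals, tgt_lang):
    return [(f"<2{tgt_lang}> {hyp}", ref)
            for hyp, ref in zip(mt_output, originals)]

ro_star = ["... restrânge corrupția ..."]         # RO*: MT output from Italian
it_orig = ["... che rafforza la corruzione ..."]  # the genuine Italian source
for src, tgt in make_synthetic(ro_star, it_orig, "IT"):
    print(src, "|||", tgt)  # <2IT> ... restrânge corrupția ... ||| ...
```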
Experiments: dataset
IWSLT-2017 Multilingual Dataset

Language Direction    Training Size
EN<-->DE              197,489
EN<-->IT              221,688
EN<-->NL              231,669
EN<-->RO              211,508
IT<-->RO              209,668

The IT<-->RO data is excluded from multilingual training; it is used only to train the single language-pair comparison models.
Experiments: improvements on each round
[Chart: results of the Italian <--> Romanian zero-shot directions on test2017, round by round.]
Experiments: zero-shot comparison
Our proposed “train-infer-train” approach outperformed the baseline Multi-NMT and the pivoting mechanism on test2017.
Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, and Marcello Federico. 2017. Improving Zero-Shot Translation of Low-Resource Languages. Proc. of IWSLT, Tokyo.
Experiments: non-zero-shot comparison
Our proposed “train-infer-train” approach also slightly improves over the baseline Multi-NMT on test2017.
Translation Examples
Zero-shot: Italian --> Romanian
Source:      ... che rafforza la corruzione, l'evasione fiscale, la povertà, l'instabilità.
Pivot:       ... poarta de bază, evazia fiscală, sărăcia, instabilitatea.
Multi-NMT:   ... restrânge corrupția, fiscale de evasion, poverty, instabilitate.
Multi-NMT*:  ... care rafinează corupția, evasarea fiscală, sărăcia, instabilitatea.
Reference:   ... care protejează corupţia, evaziunea fiscală, sărăcia şi instabilitatea.
Translation Examples
Non-zero-shot: English --> Italian
Source:      We can't use them to make simple images of things out in the Universe.
Multi-NMT:   Non possiamo usarli per creare immagini semplici di cose nell'universo.
Multi-NMT*:  Non possiamo usarle per fare semplici immagini di cose nell'universo.
Reference:   Non possiamo usarle per fare semplici immagini di cose nell'universo.
Conclusion
We introduced “train-infer-train”, an approach for improving ZST that:
- Efficiently leverages the dual translation directions
- Achieves significant improvements over a Multi-NMT baseline
- Outperforms a pivoting-based approach for ZST
Future work:
- More efficient training and inference steps
- Including additional monolingual data for the ZST directions
Surafel M. Lakew | [email protected] | Fondazione Bruno Kessler & University of Trento
ありがとう
Thank You