Surafel M. Lakew1,2, Quintino F. Lotito2, Matteo Negri1, Marco Turchi1, Marcello Federico1
1Fondazione Bruno Kessler | 2University of Trento, Trento, Italy
Improving Zero-shot Translation of Low-Resource Languages
IWSLT-2017 | Tokyo, Japan | 14-15/12/2017
Machine Translation: why low-resource & zero-shot
* Washington Post article, citing “Ethnologue: Languages of the World”, 18th ed.
There are at least 7,102 living languages in the world:
Americas: 1,064 | Asia: 2,301 | Africa: 2,138 | Pacific: 1,313 | Europe: 286
A very short story of mine
* Languages of Ethiopia: “Ethnologue: Languages of the World”, https://www.ethnologue.com/country/ET
From a place with 88 living languages:
5 Dying | 8 In Trouble | 41 Institutional | 15 Developing | 19 Vigorous
Machine Translation
[Diagram: an MT Engine. Training: Italian (source data) and English (target data). Inference: “È una storia incredibile.” -> “It's an incredible story.”]
Neural Machine Translation: working mechanism
[Diagram, built up across slides: source words feed an Encoder; an Attention mechanism bridges to a Decoder, which produces the target words.]
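As a concrete companion to the diagram, here is a minimal sketch of an encoder-attention-decoder model in PyTorch. The architecture, dimensions, and names are illustrative assumptions for this walkthrough, not the system trained for the paper.

```python
# Minimal encoder-attention-decoder sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder: source words -> a sequence of hidden states.
        enc_states, h = self.encoder(self.src_emb(src))
        logits = []
        for t in range(tgt.size(1)):
            step = self.tgt_emb(tgt[:, t:t + 1])  # current target word
            # Attention: score each encoder state against the decoder state,
            # then mix the encoder states into one context vector.
            scores = torch.bmm(enc_states, h[-1].unsqueeze(2))  # (B, S, 1)
            context = (F.softmax(scores, dim=1) * enc_states).sum(1, keepdim=True)
            # Decoder: consume the embedded word plus the attention context.
            out, h = self.decoder(torch.cat([step, context], dim=2), h)
            logits.append(self.out(out))
        return torch.cat(logits, dim=1)  # (B, T, tgt_vocab)

model = TinyNMT(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))  # a batch of source word ids
tgt = torch.randint(0, 1000, (2, 5))  # shifted target word ids
print(model(src, tgt).shape)          # torch.Size([2, 5, 1000])
```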
Neural Machine Translation: a walk to Multilingual NMT
2014: NMT enters the mainstream.
2015: Multi-NMT is introduced, using multiple encoders & decoders, and multi-tasking approaches.
2016: Multi-source NMT; Multi-NMT with a shared attention mechanism; Multi-NMT with a single encoder-decoder.
2017: Multimodal-multilingual approaches in WMT17; an increasing number of languages (IWSLT17, with 20 directions); new training approaches, including this paper.
[Diagram: from single-pair NMT to Multi-NMT]
Multilingual NMT (Multi-NMT) permits Zero-Shot Translation (ZST).
Multilingual NMT & ZST: challenge
Small training data:
“NMT systems have a steeper learning curve with respect to the amount of training data, resulting in worse quality in low-resource settings...” Koehn and Knowles [2017].
Multilingual NMT & ZST: scenario & our hypothesis
[Diagram: English at the center, connected to Italian, Romanian, German, and Dutch; Italian<-->Romanian is the zero-shot (ZST) direction.]
Hypothesis: ZST results in worse translation performance in such a low-resource setting.
Pivoting (multi-step translation through an intermediate language) is an alternative to direct ZST.
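Pivoting amounts to chaining two supervised systems through English; a minimal sketch, where the bilingual models and the translate stub are hypothetical placeholders (stubbed out so the example runs):

```python
# Pivoting: Italian -> Romanian in two steps through English.
def translate(model, sentence):
    # Stand-in for real NMT inference; a trained model would return a hypothesis.
    return f"[{model}]({sentence})"

def pivot_translate(it_sentence, it_en_model="it-en", en_ro_model="en-ro"):
    en_hyp = translate(it_en_model, it_sentence)  # step 1: source -> pivot
    # Step 2: pivot -> target. Any error from step 1 propagates into step 2,
    # and inference cost doubles: the main drawbacks versus direct ZST.
    return translate(en_ro_model, en_hyp)

print(pivot_translate("È una storia incredibile."))
```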
Multilingual NMT & ZST: related work
[Diagram: multi-way architecture with per-language encoders and decoders tied together by a shared attention mechanism.]
Firat et al. [2016a]: Multi-way, multilingual neural machine translation with a shared attention mechanism.
Firat et al. [2016b]: Zero-resource translation with multilingual neural machine translation.
[Diagram: universal architecture with a single shared encoder, decoder, and attention.]
Ha et al. [2016]: Toward multilingual neural machine translation with universal encoder and decoder.
Johnson et al. [2016]: Google's multilingual neural machine translation system: Enabling zero-shot translation.
Multilingual NMT & ZST: current limitations
- Assumes that parallel data is already available
- Inefficient creation and usage of synthetic data
- Weaker target-language ID in low-resource scenarios
Multilingual NMT & ZST: our setup
Illustration: Google's Multilingual Neural Machine Translation System - https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
[Diagram: a single Multi-NMT model shared across English, Dutch, German, Romanian, and Italian on both the source and the target side.]
No parallel data is available for the Italian<-->Romanian pair.
Training covers 8 directions; inference covers those 8 directions plus 2 zero-shot directions.
Notice how the Italian<-->Romanian ZST directions create a dual-translation loop.
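In this Google-style setup, one shared model is steered by a target-language flag prepended to every source sentence (the <2IT>/<2RO> tokens that appear on the following slides). A minimal sketch of that preprocessing, with illustrative sentences:

```python
# Prepend a target-language flag so one shared encoder-decoder can serve
# all directions (Google-style Multi-NMT).
def tag_source(src_sentence, tgt_lang):
    return f"<2{tgt_lang}> {src_sentence}"

# A supervised direction seen in training...
print(tag_source("Hello world", "DE"))  # <2DE> Hello world
# ...and a zero-shot direction at inference: the model never saw an
# Italian-Romanian pair, but the <2RO> flag still requests Romanian output.
print(tag_source("È una storia incredibile.", "RO"))
```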
Iterative Learning
[Diagram: Training Data -> TRAINING -> Multi-NMT (multi-source, multi-target) -> INFERENCE on the zero-shot sources [zst-source], yielding pairs [MT output, zst-source] that flow back into the training data.]
This creates a train-infer-train cycle for the dual translation directions.
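As a control-flow sketch, the cycle might look like the loop below; train_model and translate_corpus are hypothetical stand-ins for the real NMT toolkit, stubbed out so the example runs:

```python
# Train-infer-train cycle for the two zero-shot directions (sketch).
def train_model(pairs, init=None):
    # Stand-in for real training (or continued training from `init`).
    return {"trained_on": len(pairs)}

def translate_corpus(model, sentences, tgt_lang):
    # Stand-in for batch inference with a target-language flag.
    return [f"<{tgt_lang}-hyp> {s}" for s in sentences]

def train_infer_train(parallel_data, it_side, ro_side, rounds=3):
    model = train_model(parallel_data)                    # round ZERO
    for _ in range(rounds):
        # INFER: translate both zero-shot sides with the current model.
        ro_star = translate_corpus(model, it_side, "RO")  # IT -> RO*
        it_star = translate_corpus(model, ro_side, "IT")  # RO -> IT*
        # The MT output becomes the source and the genuine text the target,
        # giving RO* --> IT and IT* --> RO synthetic pairs.
        synthetic = list(zip(ro_star, it_side)) + list(zip(it_star, ro_side))
        # TRAIN: continue training on the original + self-generated data.
        model = train_model(parallel_data + synthetic, init=model)
    return model

print(train_infer_train([("hallo", "hello")], ["ciao"], ["salut"]))
```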
Iterative Learning: training with self-generated data
[Diagram: the training data covers EN, DE, NL, RO, and IT; we use the RO & IT data for the train-infer-train loop.]
<2IT> / <2RO> inference runs at round ZERO (before applying train-infer-train), at round ONE, and at round N (convergence).
How does translation duality help to improve the ZST directions?
[Diagram: the RO and IT inference data pass through the Multi-NMT model, producing MT outputs IT* and RO*; pairing each MT output with the genuine text it was translated from yields new parallel data for the IT* --> RO and RO* --> IT directions.]
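Concretely, each round writes the MT output back as the noisy source side of a new tagged training example; a minimal sketch, reusing abridged sentences from the example slide below (the helper name is illustrative):

```python
# Turn one round of MT output into tagged synthetic training pairs:
# the noisy hypothesis is the source, the genuine text is the target,
# so the model always learns to *produce* clean text.
def make_synthetic(mt_output, originals, tgt_lang):
    return [(f"<2{tgt_lang}> {hyp}", ref)
            for hyp, ref in zip(mt_output, originals)]

ro_star = ["... restrânge corrupția ..."]         # RO*: MT output from Italian
it_orig = ["... che rafforza la corruzione ..."]  # the genuine Italian source
for src, tgt in make_synthetic(ro_star, it_orig, "IT"):
    print(src, "|||", tgt)  # <2IT> ... restrânge corrupția ... ||| ...
```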
Experiments: dataset
IWSLT-2017 Multilingual Dataset

Language Direction    Training Size
EN<-->DE              197,489
EN<-->IT              221,688
EN<-->NL              231,669
EN<-->RO              211,508
IT<-->RO              209,668

The IT<-->RO data is excluded from multilingual training; it is used only to train the single language-pair comparison models.
Experiments: improvements on each round
[Chart: results of the Italian <--> Romanian zero-shot directions on test2017, round by round.]
Experiments: zero-shot comparison
Our proposed “train-infer-train” approach outperformed the baseline Multi-NMT and the pivoting mechanism on test2017.
Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, and Marcello Federico. 2017. Improving Zero-Shot Translation of Low-Resource Languages. Proc. of IWSLT, Tokyo.
Experiments: non-zero-shot comparison
Our proposed “train-infer-train” approach also slightly improves over the baseline Multi-NMT on test2017.
Translation Examples
Zero-shot: Italian --> Romanian
Source:      ... che rafforza la corruzione, l'evasione fiscale, la povertà, l'instabilità.
Pivot:       ... poarta de bază, evazia fiscală, sărăcia, instabilitatea.
Multi-NMT:   ... restrânge corrupția, fiscale de evasion, poverty, instabilitate.
Multi-NMT*:  ... care rafinează corupția, evasarea fiscală, sărăcia, instabilitatea.
Reference:   ... care protejează corupţia, evaziunea fiscală, sărăcia şi instabilitatea.
Translation Examples
Non-zero-shot: English --> Italian
Source:      We can't use them to make simple images of things out in the Universe.
Multi-NMT:   Non possiamo usarli per creare immagini semplici di cose nell'universo.
Multi-NMT*:  Non possiamo usarle per fare semplici immagini di cose nell'universo.
Reference:   Non possiamo usarle per fare semplici immagini di cose nell'universo.
Conclusion
We introduced “train-infer-train”, an approach for improving ZST that:
- Efficiently leverages the dual translation directions
- Achieves significant improvements over a Multi-NMT baseline
- Outperforms a pivoting-based approach for ZST
Future work:
- More efficient training and inference steps
- Including additional monolingual data for the ZST directions
Surafel M. Lakew | [email protected] | Fondazione Bruno Kessler & University of Trento
ありがとう
Thank You