
Page 1:

Surafel M. Lakew (1,2), Quintino F. Lotito (2), Matteo Negri (1), Marco Turchi (1), Marcello Federico (1)

(1) Fondazione Bruno Kessler | (2) University of Trento, Trento, Italy

Improving Zero-shot Translation of Low-Resource Languages

IWSLT-2017 | Tokyo, Japan | 14-15/12/2017

Page 2:

Machine Translation: why low-resource & zero-shot

* Washington Post article citing “Ethnologue: Languages of the World”, 18th ed.

Americas: 1,064 | Asia: 2,301 | Africa: 2,138 | Pacific: 1,313 | Europe: 286

There are at least 7,102 living languages in the world.

Page 3:

Machine Translation: why low-resource & zero-shot

A very short story of mine

* Languages of Ethiopia: “Ethnologue: Languages of the World”, https://www.ethnologue.com/country/ET

Page 4:

Machine Translation: why low-resource & zero-shot

A very short story of mine

From a place with 88 living languages

* Languages of Ethiopia: “Ethnologue: Languages of the World”, https://www.ethnologue.com/country/ET

Page 5:

Machine Translation: why low-resource & zero-shot

A very short story of mine

From a place with 88 living languages:

41 Institutional | 15 Developing | 19 Vigorous | 8 In Trouble | 5 Dying

* Languages of Ethiopia: “Ethnologue: Languages of the World”, https://www.ethnologue.com/country/ET

Page 6:

Machine Translation

MT Engine

Page 7:

Machine Translation

MT Engine

Training: Italian (source data) --> English (target data)

Page 8:

Machine Translation

MT Engine

Training: Italian (source data) --> English (target data)

Inference: "È una storia incredibile." --> "It's an incredible story."

Page 9:

Neural Machine Translation: working mechanism

Page 10:

Neural Machine Translation: working mechanism

Source words

Page 11:

Neural Machine Translation: working mechanism

Source words --> Encoder

Page 12:

Neural Machine Translation: working mechanism

Source words --> Encoder --> Decoder

Page 13:

Neural Machine Translation: working mechanism

Source words --> Encoder --> Decoder --> Target words

Page 14:

Neural Machine Translation: working mechanism

Source words --> Encoder --> Attention --> Decoder --> Target words
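
To make the encoder-attention-decoder picture above concrete, here is a minimal Python/PyTorch sketch of a recurrent sequence-to-sequence model with attention. It is not the authors' system: the class name TinyNMT, the layer sizes, and the bilinear attention are illustrative assumptions, and real NMT models of the time differ in many details.

```python
# Minimal sketch of the mechanism built up on the slides above (hypothetical
# sizes and names; not the authors' actual model).
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb + hid, hid, batch_first=True)
        self.attn = nn.Linear(hid, hid, bias=False)   # bilinear attention scores
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder: turn the source words into a sequence of hidden states.
        enc_states, dec_hidden = self.encoder(self.src_emb(src_ids))
        logits = []
        for t in range(tgt_ids.size(1)):
            # Attention: compare the current decoder state with every encoder state
            # and build a weighted summary (context) of the source sentence.
            query = dec_hidden[-1].unsqueeze(1)                               # (B, 1, H)
            scores = torch.bmm(self.attn(query), enc_states.transpose(1, 2))  # (B, 1, S)
            context = torch.bmm(torch.softmax(scores, dim=-1), enc_states)    # (B, 1, H)
            # Decoder: consume the previous target word plus the attention context.
            step_in = torch.cat([self.tgt_emb(tgt_ids[:, t:t + 1]), context], dim=-1)
            dec_out, dec_hidden = self.decoder(step_in, dec_hidden)
            logits.append(self.out(dec_out))
        return torch.cat(logits, dim=1)  # (B, T, tgt_vocab): next-target-word scores

# Toy usage: 2 sentences, 7 source tokens, 5 previous target tokens (teacher forcing).
model = TinyNMT(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1000, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 1000])
```

The per-step loop is the point of the diagram: at every target position the decoder attends over all encoder states of the source words before predicting the next target word.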

Page 15:

Neural Machine Translation: a walk to Multilingual NMT

2014: NMT goes mainstream

2015: Multi-NMT introduced, using multiple encoders & decoders; multi-tasking approaches

2016: Multi-source NMT; Multi-NMT with a shared attention mechanism; Multi-NMT with a single encoder-decoder

2017: Multimodal-multilingual approaches in WMT17; an increasing number of languages (IWSLT17, 20 directions); new training approaches, including this paper

Multilingual NMT (Multi-NMT) permits Zero-Shot Translation (ZST)

Page 16:

Multilingual-NMT & ZST: challenge

Small training data:

“NMT systems have a steeper learning curve with respect to the amount of training data, resulting in worse quality in low-resource settings...” - P. Koehn et al. [2017]

Page 17:

Multilingual NMT & ZST: scenario & our hypothesis

Languages: English, Italian, Romanian, German, Dutch

Page 18:

Multilingual NMT & ZST: scenario & our hypothesis

Languages: English, Italian, Romanian, German, Dutch | ZST pair: Italian <--> Romanian

Hypothesis: ZST results in worse translation performance in such a low-resource setting.

Page 19:

Multilingual NMT & ZST: scenario & our hypothesis

Languages: English, Italian, Romanian, German, Dutch | ZST pair: Italian <--> Romanian

Hypothesis: ZST results in worse translation performance in such a low-resource setting.

Pivoting (translating in two steps through a pivot language) is an alternative approach to direct ZST.
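
For clarity, a small Python sketch contrasting the two strategies on this slide. The translate callable is hypothetical; it stands in for whatever decoding interface the underlying multilingual model exposes, which the slides do not specify.

```python
from typing import Callable

# Hypothetical decoding interface: (text, source language, target language) -> text.
Translate = Callable[[str, str, str], str]

def zero_shot(translate: Translate, text: str) -> str:
    # Direct ZST: a single pass through the multilingual model for the unseen pair.
    return translate(text, "it", "ro")

def pivot(translate: Translate, text: str) -> str:
    # Pivoting: two supervised steps through English.
    english = translate(text, "it", "en")
    return translate(english, "en", "ro")
```

Pivoting doubles the decoding cost and can compound errors across the two steps, which is part of the motivation for improving direct ZST instead.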

Page 20:

Multilingual NMT & ZST: related work

[Architecture figure: Encoder-1 and Encoder-2 feeding a shared Attention, which feeds Decoder-1 and Decoder-2]

Firat et al., [2016a]; Multi-way, multilingual neural machine translation with a shared attention mechanism.

Firat et al., [2016b]; Zero-resource translation with multilingual neural machine translation.

Page 21:

Multilingual NMT & ZST: related work

[Architecture figures: (a) multiple encoders and decoders with a shared Attention; (b) a single Encoder and Decoder with Attention]

Firat et al., [2016a]; Multi-way, multilingual neural machine translation with a shared attention mechanism.

Firat et al., [2016b]; Zero-resource translation with multilingual neural machine translation.

Ha et al., [2016]; Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder.

Johnson et al., [2016]; Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation.

Page 22:

Multilingual-NMT & ZST: current limitations

- The assumption that parallel data is already available

- Inefficient creation and use of synthetic data

- A weaker target-language ID signal in low-resource scenarios

Page 23:

Multilingual-NMT & ZST: our setup

{English, Dutch, German, Romanian, Italian} --> Multi-NMT --> {English, Dutch, German, Romanian, Italian}

No parallel data is available for the Italian <--> Romanian pair.

Illustration: Google's Multilingual Neural Machine Translation System - https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
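
A sketch of how the training examples for this setup can be assembled, assuming the Google-style approach shown in the illustration credited above: a single shared model steered by a target-language token prepended to each source sentence. The helper name and the sample sentence are illustrative; the tag format follows the <2IT>/<2RO> tags used later in these slides.

```python
LANGS = ["en", "de", "it", "nl", "ro"]

# The 8 supervised directions: every non-English language paired with English,
# in both ways. Italian <--> Romanian is deliberately absent: the zero-shot pair.
TRAIN_DIRECTIONS = [("en", l) for l in LANGS if l != "en"] + \
                   [(l, "en") for l in LANGS if l != "en"]

def tag_example(src_sentence: str, tgt_lang: str) -> str:
    """Prepend the target-language token so one encoder-decoder serves all pairs."""
    return f"<2{tgt_lang}> {src_sentence}"

print(len(TRAIN_DIRECTIONS))                           # 8
print(tag_example("È una storia incredibile.", "en"))  # <2en> È una storia incredibile.
```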

Page 24:

Multilingual-NMT & ZST: our setup

{English, Dutch, German, Romanian, Italian} --> Multi-NMT --> {English, Dutch, German, Romanian, Italian}

Training: 8 directions | Inference: 8 directions + 2 zero-shot

Illustration: Google's Multilingual Neural Machine Translation System - https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html

Page 25:

Multilingual-NMT & ZST: our setup

{English, Dutch, German, Romanian, Italian} --> Multi-NMT --> {English, Dutch, German, Romanian, Italian}

Notice how the Italian <--> Romanian ZST directions create a dual-translation loop.

Illustration: Google's Multilingual Neural Machine Translation System - https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html

Page 26:

Iterative Learning

TRAINING: Training Data [multi-source, multi-target] --> Train --> Multi-NMT

Page 27:

Iterative Learning

TRAINING: Training Data [multi-source, multi-target] --> Train --> Multi-NMT

INFERENCE: [zst-source] --> Infer --> [MT output, zst-source]

Page 28:

Iterative Learning

TRAINING: Training Data [multi-source, multi-target] --> Train --> Multi-NMT

INFERENCE: [zst-source] --> Infer --> [MT output, zst-source]

This creates a train-infer-train cycle for the dual translation directions.

Page 29:

Iterative Learning: training with self-generated data

Training Data: EN, DE, NL, RO, IT

We use the RO & IT dataset for the train-infer-train loop.
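
The loop on the preceding slides can be written down compactly. This is a hedged sketch: train() and infer() are placeholder callables for whatever NMT toolkit sits underneath (the slides do not name one), and the fixed number of rounds stands in for the convergence check shown a few slides later.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def train_infer_train(
    train: Callable[[List[Pair]], None],           # (re)trains the Multi-NMT model
    infer: Callable[[List[str], str], List[str]],  # translates sentences into `tgt`
    parallel_data: List[Pair],                     # the 8 supervised directions
    ro_mono: List[str],                            # Romanian side of the ZST pair
    it_mono: List[str],                            # Italian side of the ZST pair
    rounds: int = 3,                               # placeholder for "until convergence"
) -> None:
    # Round zero: train on the supervised directions only.
    train(parallel_data)
    for _ in range(rounds):
        # INFERENCE: translate the zero-shot sources in both directions.
        it_star = infer(ro_mono, "it")  # RO -> IT*
        ro_star = infer(it_mono, "ro")  # IT -> RO*
        # Pair each MT output (source side) with its original sentence (target side):
        # self-generated parallel data for both zero-shot directions.
        synthetic = list(zip(it_star, ro_mono)) + list(zip(ro_star, it_mono))
        # TRAINING: continue training on the original plus the self-generated data.
        train(parallel_data + synthetic)
```

Only the two zero-shot directions receive new (synthetic) data in each round; the supervised directions keep their original training material.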

Page 30:

Iterative Learning: training with self-generated data

<2IT>

<2RO>

Inference at round ZERO - before applying “train-infer-train”

Page 31:

Iterative Learning: training with self-generated data

<2IT>

<2RO>

Inference at round ONE

Page 32:

Iterative Learning: training with self-generated data

<2IT>

<2RO>

Inference at round N - convergence

Page 33:

Iterative Learning: training with self-generated data

How does translation duality help to improve the ZST directions?

Inference data: RO, IT

Page 34:

Iterative Learning: training with self-generated data

How does translation duality help to improve the ZST directions?

Inference: the RO and IT inference data are translated by the Multi-NMT model, producing the outputs for IT* --> RO and RO* --> IT.

Page 35:

Iterative Learning: training with self-generated data

How does translation duality help to improve the ZST directions?

Inference data (RO, IT) --> Inference with Multi-NMT --> MT output (RO*, IT*) --> New parallel data: IT* --> RO and RO* --> IT
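
A toy illustration of the pairing drawn on this slide, with invented sentences: the MT output (starred) goes on the source side and the original monolingual sentence on the target side, so each zero-shot direction is trained with human-written text on the decoder side, in the spirit of back-translation.

```python
# Invented toy sentences; only the pairing convention matters here.
ro_original = "Este o poveste incredibilă."  # original Romanian sentence
it_star     = "È una storia incredibile."    # Multi-NMT output for it (RO -> IT*)

it_original = "Grazie mille."                # original Italian sentence
ro_star     = "Mulțumesc mult."              # Multi-NMT output for it (IT -> RO*)

new_parallel_data = [
    (it_star, ro_original),  # feeds the IT --> RO zero-shot direction
    (ro_star, it_original),  # feeds the RO --> IT zero-shot direction
]
```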

Page 36:

Experiments: dataset

IWSLT-2017 Multilingual Dataset

Language Direction | Training Size (sentence pairs)
EN <--> DE | 197,489
EN <--> IT | 221,688
EN <--> NL | 231,669
EN <--> RO | 211,508
IT <--> RO | 209,668

The IT <--> RO data is used only to train the single language-pair (comparison) models; the multilingual system sees no IT <--> RO parallel data.

Page 37:

Experiments: improvements on each round

Results of the [Italian <--> Romanian] zero-shot directions on test2017

Page 38:

Experiments: zero-shot comparison

Our proposed “train-infer-train” approach outperformed the baseline Multi-NMT and the pivoting mechanism on test2017.

Surafel M. Lakew, Quintino F. Lotito, Marco Turchi, Matteo Negri, and Marcello Federico. 2017. Improving Zero-Shot Translation of Low-Resource Languages. Proc. of IWSLT, Tokyo.

Page 39:

Experiments: non-zero-shot comparison

Our proposed “train-infer-train” approach slightly improves over the baseline Multi-NMT on test2017.

Page 40:

Translation Examples

Zero-shot: Italian --> Romanian

Source: ... che rafforza la corruzione, l'evasione fiscale, la povertà, l'instabilità. (English gloss: "... that reinforces corruption, tax evasion, poverty, instability.")

Pivot: ... poarta de bază, evazia fiscală, sărăcia, instabilitatea.

Multi-NMT: ... restrânge corrupția, fiscale de evasion, poverty, instabilitate.

Multi-NMT*: ... care rafinează corupția, evasarea fiscală, sărăcia, instabilitatea.

Reference: ... care protejează corupţia, evaziunea fiscală, sărăcia şi instabilitatea.

Page 41:

Translation Examples

Non-zero-shot: English --> Italian

Source: We can't use them to make simple images of things out in the Universe.

Multi-NMT: Non possiamo usarli per creare immagini semplici di cose nell'universo.

Multi-NMT*: Non possiamo usarle per fare semplici immagini di cose nell'universo.

Reference: Non possiamo usarle per fare semplici immagini di cose nell'universo.

Page 42:

Conclusion

We introduced “train-infer-train”, an approach for improving ZST:

- Efficiently leverages the dual translation directions

- Achieved significant improvements over a Multi-NMT baseline

- Outperformed a pivoting-based approach for ZST

Future work:

- More efficient training and inference steps

- Including additional monolingual data for the ZST directions

Page 43:

Surafel M. Lakew | [email protected] | Fondazione Bruno Kessler & University of Trento

ありがとう - Thank You