Multi-system machine translation using online APIs for English-Latvian Matīss Rikters University of Latvia ACL 2015 Fourth Workshop on Hybrid Approaches to Translation Beijing, 31.07.2015



Multi-system machine translation using online APIs for English-Latvian

Matīss Rikters

University of Latvia

ACL 2015 Fourth Workshop on Hybrid Approaches to Translation

Beijing, 31.07.2015


Introduction

Motivation: Doctoral studies at the University of Latvia

A hybrid machine translation method, combining results of various machine translation systems

Literature review: recent trends in multi-system machine translation

Nothing similar was found to be publicly available


Introduction

Goals:
Combine output from multiple online MT APIs
Keep it simple
Make it work fast


Related work

"Coupling Statistical Machine Translation with Rule-based Transfer and Generation", A. Ahsan, and P. Kolachina.

"Using language and translation models to select the best among outputs from multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.

"MANY: Open source machine translation system combination", L. Barrault.

"A program for automatically selecting the best output from multiple machine translation engines", C. Callison-Burch and R. S. Flournoy.


Initial plan

Use systems that support English – Latvian translation
Found five such systems:


What worked

Couldn't get the APIs of two of them to work
Used the remaining three: Google Translate, Bing Translator, and LetsMT


System description

Sentence tokenization

Translation with APIs: Google Translate, Bing Translator, LetsMT

Selection of the best translation

Output
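The four stages above can be sketched in a few lines of Python. This is only an illustration, not the author's PHP implementation: the function names are hypothetical, the sentence splitter is a naive regular expression, the translators are passed in as plain callables standing in for the online APIs, and it assumes the candidate with the lowest language-model perplexity is the one selected.

```python
import re
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]   # stand-in for one online MT API
Scorer = Callable[[str], float]     # returns a perplexity; lower is better


def split_sentences(text: str) -> List[str]:
    """Naive sentence tokenization: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_best(candidates: Dict[str, str], perplexity: Scorer) -> Tuple[str, str]:
    """Return (system name, translation) for the lowest-perplexity candidate.

    Ties go to the first system in dictionary order (an assumption of this sketch).
    """
    name = min(candidates, key=lambda k: perplexity(candidates[k]))
    return name, candidates[name]


def translate_text(
    text: str, translators: Dict[str, Translator], perplexity: Scorer
) -> List[Tuple[str, str]]:
    """Tokenize, translate each sentence with every system, keep the best one."""
    results = []
    for sentence in split_sentences(text):
        candidates = {name: fn(sentence) for name, fn in translators.items()}
        results.append(select_best(candidates, perplexity))
    return results
```

Keeping the translators and the scorer as parameters means a new MT API or a different selection criterion can be swapped in without touching the loop.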


Selection of the best translation

Probabilities are calculated based on the observed entry with longest matching history:

where the probability and backoff penalties are given by an already-estimated language model. Perplexity is then calculated using this probability:

where, given an unknown probability distribution p and a proposed probability model q, the model is evaluated by determining how well it predicts a separate test sample x1, x2, ..., xN drawn from p.
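The two formulas referred to on this slide are not reproduced in the transcript. A reconstruction consistent with the description, assuming the standard backoff formulation used by KenLM and the usual definition of perplexity, would be as follows. The symbols $w_f^n$ (the longest matching entry), $b(\cdot)$ (the backoff penalty) and the choice of base 10 are assumptions of this reconstruction; any base gives the same perplexity as long as the exponent and the logarithm agree.

```latex
% Backoff probability: w_f^n is the longest n-gram suffix found in the model,
% and b(.) are the backoff penalties stored with the shorter contexts.
p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})

% Perplexity of the model q on the test sample x_1, ..., x_N.
\mathrm{PP}(q) = 10^{-\frac{1}{N} \sum_{i=1}^{N} \log_{10} q(x_i)}
```

A lower perplexity means the language model finds the candidate translation less surprising.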


System usage

Get the code - https://github.com/M4t1ss/Multi-System-Hybrid-Translator

Get API access
Google - https://cloud.google.com/translate/
Bing - http://www.bing.com/dev/en-us/translator
LetsMT - https://www.letsmt.eu/Integration.aspx

Add API keys to the configuration

Prepare a language model
You can use KenLM – https://kheafield.com/code/kenlm/

Prepare input data

Run

php MSHT.php languageModel.binary inputSentances.txt
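To check how a single candidate sentence scores against the prepared model outside the PHP tool, a minimal Python sketch follows. It assumes the `kenlm` Python bindings are installed and that the model file is the `languageModel.binary` from the command above; the helper names are hypothetical and this is not part of the author's code.

```python
def perplexity_from_log10(total_log10_prob: float, token_count: int) -> float:
    """Perplexity from a summed log10 probability, the unit KenLM reports."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return 10.0 ** (-total_log10_prob / token_count)


def kenlm_perplexity(model_path: str, sentence: str) -> float:
    """Score one sentence with a KenLM binary model (requires the kenlm package)."""
    import kenlm  # third-party; imported lazily so the helper above has no dependency

    model = kenlm.Model(model_path)
    # score() returns the total log10 probability, including the </s> token.
    total = model.score(sentence, bos=True, eos=True)
    # One extra token is counted for the end-of-sentence marker.
    return perplexity_from_log10(total, len(sentence.split()) + 1)
```

For example, `kenlm_perplexity("languageModel.binary", "Tas ir tests .")` would return the perplexity of that Latvian sentence under the model; the bindings also expose `model.perplexity(sentence)`, which performs the same calculation.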


Experiments

MT system APIs: Google Translate, Bing Translator, TB2013 EN-LV v03 from LetsMT

Language model: JRC Acquis corpus version 2.2

Input sentences: JRC Acquis corpus version 2.2; ACCURAT balanced test corpus for under-resourced languages


Experiment results – JRC Acquis

| System | BLEU | TER | WER | Selected: Google | Selected: Bing | Selected: LetsMT | Selected: Equal |
|---|---|---|---|---|---|---|---|
| Google Translate | 16.92 | 47.68 | 58.55 | 100 % | - | - | - |
| Bing Translator | 17.16 | 49.66 | 58.40 | - | 100 % | - | - |
| LetsMT | 28.27 | 36.19 | 42.89 | - | - | 100 % | - |
| Hybrid Google + Bing | 17.28 | 48.30 | 58.15 | 50.09 % | 45.03 % | - | 4.88 % |
| Hybrid Google + LetsMT | 22.89 | 41.38 | 50.31 | 46.17 % | - | 48.39 % | 5.44 % |
| Hybrid LetsMT + Bing | 22.83 | 42.92 | 50.62 | - | 45.35 % | 49.84 % | 4.81 % |
| Hybrid Google + Bing + LetsMT | 21.08 | 44.12 | 52.99 | 28.93 % | 34.31 % | 33.98 % | 2.78 % |
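For readers unfamiliar with the WER column, the sketch below shows the usual definition: word-level edit distance between hypothesis and reference, divided by the reference length and expressed as a percentage. The slides do not say which evaluation tool produced the scores, so this is an illustration of the metric under that standard definition, not a reproduction of the author's setup; the function names are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate in percent, relative to the reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * word_edit_distance(hypothesis.split(), ref) / len(ref)
```

Lower is better for WER and TER, higher is better for BLEU, which is why LetsMT leads on all three columns in this table.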


Experiment results – ACCURAT balanced

| System | BLEU |
|---|---|
| Google Translate | 24.73 |
| Bing Translator | 22.07 |
| LetsMT | 32.01 |
| Hybrid Google + Bing | 23.75 |
| Hybrid Google + LetsMT | 28.94 |
| Hybrid LetsMT + Bing | 27.44 |
| Hybrid Google + Bing + LetsMT | 26.74 |


Human evaluation

Five native Latvian speakers were each given a random 2% sample (32 sentences).

They were asked to mark which of the three MT outputs was the best, which was the worst, and which was OK.

They could select multiple outputs as best, worst, or OK.


Human results

| System | User 1 | User 2 | User 3 | User 4 | User 5 | AVG user | Hybrid | BLEU |
|---|---|---|---|---|---|---|---|---|
| Bing | 21.88% | 53.13% | 28.13% | 25.00% | 31.25% | 31.88% | 28.93% | 16.92 |
| Google | 28.13% | 25.00% | 25.00% | 28.13% | 46.88% | 30.63% | 34.31% | 17.16 |
| LetsMT | 50.00% | 21.88% | 46.88% | 46.88% | 21.88% | 37.50% | 33.98% | 28.27 |


Conclusion

Simple to build, use, and extend with new MT APIs

Works well when the combined systems are of similar quality, but poorly when one system is much better than the others

Needs improvements to translation selection and more configuration options


Future work

Use a bigger and better language model? Tried it; the results were about the same.

Confusion networks? Too confusing for now.

Use MT quality estimation to select the best candidates: QuEst, QuEst++, or another quality estimation tool.

Chunk sentences into smaller parts, translate, and recombine.


Thank you!

http://ej.uz/MSHT-GITHUB

http://ej.uz/MSMT-EN-LV