Multi-system machine translation using online APIs for English-Latvian
Matīss Rikters
University of Latvia
ACL 2015 Fourth Workshop on Hybrid Approaches to Translation
Beijing, 31.07.2015
Introduction
Motivation: Doctoral studies at the University of Latvia
A hybrid machine translation method, combining results of various machine translation systems
Literature review: recent trends in multi-system machine translation
Nothing similar was found publicly available
Goals:
Combine output from multiple online MT APIs
Keep it simple
Make it work fast
Related work
"Coupling Statistical Machine Translation with Rule-based Transfer and Generation", A. Ahsan, and P. Kolachina.
"Using language and translation models to select the best among outputs from multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.
"MANY: Open source machine translation system combination", L. Barrault.
"A program for automatically selecting the best output from multiple machine translation engines", C. Callison-Burch and R. S. Flournoy.
Initial plan
Use systems that support English–Latvian translation
Found five such systems
What worked
Couldn't get the APIs of two of them to work
Used the remaining three: Google Translate, Bing Translator and LetsMT
System description
1. Sentence tokenization
2. Translation with APIs: Google Translate, Bing Translator, LetsMT
3. Selection of the best translation
4. Output
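The four pipeline steps above can be sketched in Python as follows. This is not the MSHT code itself (which is run with PHP); the translator callables, the naive regex sentence splitter and the scoring function are hypothetical stand-ins. Real clients for Google Translate, Bing Translator and LetsMT would be plugged in as translators, and a language-model perplexity would serve as the score.

```python
import re
from typing import Callable, Dict, List

Translator = Callable[[str], str]  # hypothetical: English sentence -> Latvian candidate
Scorer = Callable[[str], float]    # hypothetical: candidate -> score, lower is better


def split_sentences(text: str) -> List[str]:
    """Naive sentence tokenization: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def hybrid_translate(text: str, translators: Dict[str, Translator], score: Scorer) -> List[str]:
    """Translate every sentence with every system and keep the best-scoring candidate."""
    output = []
    for sentence in split_sentences(text):
        # Translation with APIs: one candidate per system.
        candidates = {name: translate(sentence) for name, translate in translators.items()}
        # Selection of the best translation: lowest score wins (ties go to the first system).
        best = min(candidates.values(), key=score)
        output.append(best)
    return output


if __name__ == "__main__":
    # Toy stand-ins for the online APIs; they do not translate anything.
    toy_translators = {
        "system_a": lambda s: s,
        "system_b": lambda s: s + " ...",
    }
    print(hybrid_translate("Hello world. How are you?", toy_translators, score=len))
```

Keeping translators as plain callables in a dictionary mirrors the goal of making it easy to add new MT APIs: a new system is one more entry.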
Selection of the best translation
Probabilities are calculated based on the observed entry with the longest matching history $w_f^n$:
$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:
$$PP(q) = 2^{-\frac{1}{N}\sum_{i=1}^{N} \log_2 q(x_i)}$$
where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
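A minimal Python sketch of backoff scoring and sentence perplexity follows. It uses a hypothetical toy trigram model stored as a dictionary of log10 probabilities and log10 backoff weights (the convention of ARPA files and KenLM), so the product of backoff penalties becomes a sum. It is not KenLM's implementation: out-of-vocabulary handling via an `<unk>` entry is simplified, and choosing the candidate with the lowest perplexity is assumed here as the selection rule. Computing perplexity in base 10 gives the same value as the base-2 definition.

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]
# Hypothetical toy trigram model: n-gram -> (log10 probability, log10 backoff weight).
TOY_LM: Dict[NGram, Tuple[float, float]] = {
    ("<s>",): (-99.0, -0.30),
    ("</s>",): (-1.0, 0.0),
    ("<unk>",): (-3.0, 0.0),
    ("labs",): (-1.2, -0.25),
    ("rīts",): (-1.5, 0.0),
    ("<s>", "labs"): (-0.5, -0.10),
    ("labs", "rīts"): (-0.4, -0.20),
    ("rīts", "</s>"): (-0.3, 0.0),
}


def log10_prob(lm: Dict[NGram, Tuple[float, float]], context: Sequence[str], word: str) -> float:
    """log10 p(word | context), backing off to the longest matching n-gram."""
    ctx = tuple(context)
    backoff = 0.0
    while True:
        entry = lm.get(ctx + (word,))
        if entry is not None:
            # Longest matching entry found: its probability times the
            # backoff penalties collected so far (a sum in log space).
            return entry[0] + backoff
        if not ctx:
            # Simplified OOV handling: score the word as <unk>.
            return lm[("<unk>",)][0] + backoff
        # No match: add b(ctx) (0 in log space if ctx is unseen),
        # then drop the oldest word of the history.
        backoff += lm.get(ctx, (0.0, 0.0))[1]
        ctx = ctx[1:]


def perplexity(lm: Dict[NGram, Tuple[float, float]], words: Sequence[str], order: int = 3) -> float:
    """Sentence perplexity 10 ** (-(1/N) * sum of log10 q), counting </s> as a token."""
    history = ["<s>"]
    tokens = list(words) + ["</s>"]
    total = 0.0
    for word in tokens:
        total += log10_prob(lm, history[-(order - 1):], word)
        history.append(word)
    return 10 ** (-total / len(tokens))


if __name__ == "__main__":
    candidates = {"A": ["labs", "rīts"], "B": ["labs", "vakars"]}
    best = min(candidates, key=lambda name: perplexity(TOY_LM, candidates[name]))
    print(best, {k: round(perplexity(TOY_LM, v), 2) for k, v in candidates.items()})
```

Candidate "B" contains a word the toy model has never seen, so its perplexity is much higher and "A" would be selected.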
System usage
Get the code: https://github.com/M4t1ss/Multi-System-Hybrid-Translator
Get API access:
Google: https://cloud.google.com/translate/
Bing: http://www.bing.com/dev/en-us/translator
LetsMT: https://www.letsmt.eu/Integration.aspx
Add the API keys to the configuration
Prepare a language model (you can use KenLM: https://kheafield.com/code/kenlm/)
Prepare the input data
Run:
php MSHT.php languageModel.binary inputSentences.txt
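For the "prepare a language model" step, KenLM's documented workflow is to estimate an ARPA model with `lmplz` and convert it to the binary format with `build_binary`. The helper below is a sketch that wraps those two commands. The function names, the default order of 5 and the file names are illustrative assumptions, not settings taken from the experiments, and both KenLM binaries are assumed to be on PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def kenlm_commands(arpa: Path, binary: Path, order: int = 5) -> List[List[str]]:
    """Return the two KenLM command lines: estimate an ARPA model, then binarize it."""
    return [
        ["lmplz", "-o", str(order)],               # reads text on stdin, writes ARPA on stdout
        ["build_binary", str(arpa), str(binary)],  # converts ARPA to KenLM's binary format
    ]


def build_language_model(corpus: Path, arpa: Path, binary: Path, order: int = 5) -> None:
    """Run lmplz and build_binary (hypothetical wrapper; needs KenLM installed)."""
    estimate, binarize = kenlm_commands(arpa, binary, order)
    with corpus.open("rb") as src, arpa.open("wb") as dst:
        subprocess.run(estimate, stdin=src, stdout=dst, check=True)
    subprocess.run(binarize, check=True)
```

For example, `build_language_model(Path("corpus.lv.txt"), Path("languageModel.arpa"), Path("languageModel.binary"))` would produce the binary file passed as the first argument to MSHT.php; the corpus should be tokenized Latvian text with one sentence per line.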
Experiments
MT system APIs: Google Translate, Bing Translator, TB2013 EN-LV v03 from LetsMT
Language model: JRC Acquis corpus version 2.2
Input sentences: JRC Acquis corpus version 2.2; ACCURAT balanced test corpus for under-resourced languages
Experiment results – JRC Acquis
System | BLEU | TER | WER | Selected: Google | Selected: Bing | Selected: LetsMT | Selected: Equal
Google Translate | 16.92 | 47.68 | 58.55 | 100% | - | - | -
Bing Translator | 17.16 | 49.66 | 58.40 | - | 100% | - | -
LetsMT | 28.27 | 36.19 | 42.89 | - | - | 100% | -
Hybrid Google + Bing | 17.28 | 48.30 | 58.15 | 50.09% | 45.03% | - | 4.88%
Hybrid Google + LetsMT | 22.89 | 41.38 | 50.31 | 46.17% | - | 48.39% | 5.44%
Hybrid LetsMT + Bing | 22.83 | 42.92 | 50.62 | - | 45.35% | 49.84% | 4.81%
Hybrid Google + Bing + LetsMT | 21.08 | 44.12 | 52.99 | 28.93% | 34.31% | 33.98% | 2.78%
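In each hybrid row of the JRC Acquis table, the "Translations selected" shares sum to 100%. The sketch below shows one way such shares can be computed from per-sentence selection results. It assumes, without confirmation from the slides, that "Equal" counts sentences where several systems tied for the best score (for example, identical outputs); the input format is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def selection_shares(winners_per_sentence: List[List[str]]) -> Dict[str, float]:
    """Percentage of sentences credited to each system.

    Each item lists the systems whose candidates tied for the best score on one
    sentence; a sentence with more than one winner is counted as "Equal".
    """
    counts = Counter(w[0] if len(w) == 1 else "Equal" for w in winners_per_sentence)
    total = len(winners_per_sentence)
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(selection_shares([["Google"], ["Bing"], ["Google", "Bing"], ["Google"]]))
```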
Experiment results – ACCURAT balanced
System | BLEU
Google Translate | 24.73
Bing Translator | 22.07
LetsMT | 32.01
Hybrid Google + Bing | 23.75
Hybrid Google + LetsMT | 28.94
Hybrid LetsMT + Bing | 27.44
Hybrid Google + Bing + LetsMT | 26.74
Human evaluation
5 native Latvian speakers were given a random 2% sample (32 sentences)
They were asked to mark which of the three MT outputs was the best, which was the worst and which were OK, with the option to select multiple answers for best, worst or OK
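Human results
System | User 1 | User 2 | User 3 | User 4 | User 5 | AVG user | Hybrid | BLEU
Google | 21.88% | 53.13% | 28.13% | 25.00% | 31.25% | 31.88% | 28.93% | 16.92
Bing | 28.13% | 25.00% | 25.00% | 28.13% | 46.88% | 30.63% | 34.31% | 17.16
LetsMT | 50.00% | 21.88% | 46.88% | 46.88% | 21.88% | 37.50% | 33.98% | 28.27
Hybrid: share of translations selected by the Google + Bing + LetsMT hybrid; BLEU: score on JRC Acquis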
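The "AVG user" column of the human results is the arithmetic mean of the five user columns. The rows are identified by their BLEU and hybrid-selection figures from the JRC Acquis experiment (16.92 and 28.93% belong to Google, 17.16 and 34.31% to Bing, 28.27 and 33.98% to LetsMT). A quick check in Python:

```python
from statistics import mean

# Per-user percentages from the human evaluation, keyed by system.
USER_SCORES = {
    "Google": [21.88, 53.13, 28.13, 25.00, 31.25],
    "Bing": [28.13, 25.00, 25.00, 28.13, 46.88],
    "LetsMT": [50.00, 21.88, 46.88, 46.88, 21.88],
}

averages = {system: round(mean(scores), 2) for system, scores in USER_SCORES.items()}
print(averages)  # {'Google': 31.88, 'Bing': 30.63, 'LetsMT': 37.5}
```

The averages reproduce the AVG user column (31.88%, 30.63%, 37.50%).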
Conclusion
Simple to build, use and extend with new MT APIs
Works well when combining systems of similar quality, but poorly when one system is much superior
Needs improvements in translation selection and more configuration options
Future work
Use a bigger and better language model? Tried it: about the same results
Confusion networks? Too confusing for now
Use MT quality estimation (QuEst, QuEst++ or other quality estimation tools) to select the best candidates
Split sentences into smaller chunks, translate them and recombine
Thank you!
http://ej.uz/MSHT-GITHUB
http://ej.uz/MSMT-EN-LV