Statistical Machine Translation with Moses
Hieu Hoang
Localization World 2013
Moses by Hieu Hoang, University of Edinburgh
Agenda
• What is Statistical Machine Translation?
• What is Moses?
– Common misconceptions
• Coming up
• What can we do for you?
What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Warren Weaver, 1949
What is Statistical Machine Translation?
• NLP Application
– search engines, text mining etc.
• Big-data
– bi-text from the Internet
  • e.g. multilingual websites, documents
– large monolingual data
• Learn to translate
– from previous translations
– models of language
What is Statistical Machine Translation?
[Diagram]
• Training: Training Data and Linguistic Tools (bi-text, monolingual data, dictionary) → SMT System (translation model, language model, lots of numbers…)
• Using: Source Text → SMT System (translation model, language model, lots of numbers…) → translated text
What is a model?
thanks to Precision Translation Tools
• Translation Model
• Language Model
– (of the target language)
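One common way to see how the two models work together is the classic noisy-channel formulation of SMT, sketched below. The symbols f (source sentence) and e (target sentence) are notation introduced for this note and are not on the slide. Moses itself combines these and other scores in a weighted log-linear model rather than this plain product.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e}\;
              \underbrace{P(f \mid e)}_{\text{translation model}}
              \;\underbrace{P(e)}_{\text{language model}}
```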
What is a model?
• Translation model
– source translation
– probability

source         target          probability
den Vorschlag  the proposal    0.6227
               's proposal     0.1068
               a proposal      0.0341
               the idea        0.0250
               this proposal   0.0227
               proposal        0.0205
               ….              ….
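As a rough illustration of what such a table holds, the sketch below stores the rows from the slide in a plain Python dictionary and ranks the candidate translations of one source phrase. The dictionary layout and the function name `best_translations` are assumptions made for this example; they are not how Moses stores or queries its phrase table.

```python
# Toy phrase table built from the rows shown on the slide.
# The layout and names are illustrative assumptions, not Moses internals.
PHRASE_TABLE = {
    "den Vorschlag": [
        ("the proposal", 0.6227),
        ("'s proposal", 0.1068),
        ("a proposal", 0.0341),
        ("the idea", 0.0250),
        ("this proposal", 0.0227),
        ("proposal", 0.0205),
    ],
}


def best_translations(source_phrase, table=PHRASE_TABLE, n=3):
    """Return the n most probable (target, probability) pairs for a source phrase."""
    candidates = table.get(source_phrase, [])
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


if __name__ == "__main__":
    for target, prob in best_translations("den Vorschlag"):
        print(f"{target}\t{prob:.4f}")
```

A source phrase that is missing from the table simply yields an empty list here; a real decoder has to handle unknown words more carefully.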
What is a model?
• Language model
– Likelihood of sentence
– in target language

text                     probability
I would like             0.489
would like to            0.905
like to commend          0.002
to commend the           0.472
commend the rapporteur   0.147
….                       ….
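To show how entries like these can turn into a score for a whole sentence, here is a minimal sketch that adds up log probabilities over the trigrams of a word sequence. It assumes each number on the slide is the conditional probability of the third word given the first two, which the slide does not state. The `unseen` floor is a hypothetical stand-in for the smoothing and back-off that real language models use.

```python
import math

# Trigram entries copied from the slide. Reading each number as
# P(third word | first two words) is an assumption of this sketch.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}


def trigram_log_prob(words, table=TRIGRAM_PROB, unseen=1e-6):
    """Sum log10 probabilities of every trigram in the word sequence.

    `unseen` is a hypothetical floor for trigrams missing from the table.
    """
    total = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        total += math.log10(table.get(trigram, unseen))
    return total


if __name__ == "__main__":
    sentence = "I would like to commend the rapporteur".split()
    log_p = trigram_log_prob(sentence)
    print(f"log10 P = {log_p:.3f}, P = {10 ** log_p:.3e}")
```

Working in log space avoids multiplying many small numbers together, which is why decoders usually add log scores instead of multiplying probabilities.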
What is Moses?
• Replacement for Pharaoh
– Academic software
– Closed-source
• Open source
• Re-written, clean code
– More features
• Large developer community
– Initiated by Hieu Hoang
– Developed at NLP Workshop
What is Moses?
Common Misconceptions
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow
Only works on Linux
• Tested on
– Windows 7 (32-bit) with Cygwin 6.1
– Mac OSX 10.7 with MacPorts
– Ubuntu 12.10, 32 and 64-bit
– Debian 6.0, 32 and 64-bit
– Fedora 17, 32 and 64-bit
– openSUSE 12.2, 32 and 64-bit
• Project files for
– Visual Studio
– Eclipse on Linux and Mac OSX
Difficult to use
• Easier compile and install
– Boost bjam
– No installation required
• Binaries available for
– Linux
– Mac
– Windows/Cygwin
– Moses + Friends
  • IRSTLM
  • GIZA++ and MGIZA
• Ready-made models trained on Europarl
Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
– Run end-to-end training
– http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
– Phrase-based, hierarchical, factored
– 8 language-pairs
– http://www.statmt.org/moses/RELEASE-1.0/models/
Only phrase-based model
– replacement for Pharaoh
– extension of Pharaoh
• From the beginning
– Factored models
– Lattice and confusion network input
– Multiple LMs, multiple phrase-tables
• Since 2009
– Hierarchical model
– Syntactic models
Developed by one person
• ANYONE can contribute
– 50 contributors
[Bar chart: ‘git blame’ of Moses repository, by contributor, axis from 0% to 40%. Contributors shown: Kenneth Heafield, Hieu Hoang, phkoehn, Ondrej Bojar, Barry Haddow, sanmarf, Tetsuo Kiso, Eva Hasler, Rico Sennrich, wlin12, nicolabertoldi, eherbst, Ales Tamchyna, Colin Cherry, Matous Machacek, Phil Williams]
Slow
Decoding
thanks to Ken!!
Slow
Training
• Multithreaded
• Reduced disk IO
– compress intermediate files
• Reduce disk space requirement

Time (mins)    1-core   2-cores     4-cores     8-cores     Size (MB)
Phrase-based   60       47 (79%)    37 (63%)    33 (56%)    893
Hierarchical   1030     677 (65%)   473 (45%)   375 (36%)   8300
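The percentages in the training table appear to be each run's time as a share of the single-core time. The sketch below recomputes that share, plus the corresponding speedup factor, from the minute counts on the slide. The function name `speedup_table` is made up for this example. Because the slide's minutes are rounded, the recomputed percentages can differ from the printed ones by about a point.

```python
# Training times in minutes, copied from the slide's table.
TRAINING_MINUTES = {
    "Phrase-based": {1: 60, 2: 47, 4: 37, 8: 33},
    "Hierarchical": {1: 1030, 2: 677, 4: 473, 8: 375},
}


def speedup_table(minutes_by_cores):
    """Map core count -> (speedup over 1 core, fraction of 1-core time)."""
    baseline = minutes_by_cores[1]
    return {
        cores: (baseline / minutes, minutes / baseline)
        for cores, minutes in sorted(minutes_by_cores.items())
    }


if __name__ == "__main__":
    for model, timings in TRAINING_MINUTES.items():
        for cores, (speedup, fraction) in speedup_table(timings).items():
            print(f"{model:13s} {cores} cores: {speedup:.2f}x faster, "
                  f"{fraction:.0%} of single-core time")
```

Going from 1 to 8 cores gives well under an 8x speedup for both model types, which fits the slide's point that disk IO matters for training time alongside multithreading.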
What is Moses?
Common Misconceptions
• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical, syntax model
• Developed by one person → everyone
• Slow → Fastest decoder, multithreaded training, less IO
Coming up…
• Code cleanup
• Incremental Training
• Better translation
– smaller model
– bigger data
– faster training and decoding
• Applications
– CAT tools
– Speech translation
Applications
Computer-Aided Translation
• EU Project
– CASMACAT
– MATECAT
What can we do for you?
– simpler Moses
– graphical interface
– Windows compatibility
– terminology and glossary
– incremental training
• What can you do for us?
– code
– data
– funding