Statistical Machine Translation with Moses

Hieu Hoang
Localization World 2013

0.6227

Moses by Hieu Hoang, University of Edinburgh

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?

What is Statistical Machine Translation?

It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?

Warren Weaver, 1949

What is Statistical Machine Translation?

• NLP Application
  – search engines, text mining etc.
• Big-data
  – bi-text from the Internet
    • e.g. multilingual websites, documents
  – large monolingual data
• Learn to translate
  – from previous translations
  – models of language

What is Statistical Machine Translation?

Training

• Training Data, Linguistic Tools
  – bi-text
  – monolingual data
  – dictionary
• SMT System
  – translation model
  – language model
  – lots of numbers…

Using

• Source Text
• SMT System
  – translation model
  – language model
  – lots of numbers…
• Source Text
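
To make "learn to translate from previous translations" concrete, here is a minimal Python sketch of how a translation model could be estimated by simple counting (relative frequency). The phrase pairs are toy data invented for illustration, and `estimate_translation_model` is a hypothetical helper, not part of Moses. A real system extracts phrase pairs from word-aligned bi-text and combines several scores.

```python
from collections import Counter, defaultdict

def estimate_translation_model(phrase_pairs):
    """Relative-frequency estimate of P(target | source) from observed pairs."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    model = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        # Share of this source phrase's occurrences translated as tgt.
        model[src][tgt] = n / source_counts[src]
    return dict(model)

# Toy "previous translations" (invented for illustration only).
toy_pairs = [
    ("den Vorschlag", "the proposal"),
    ("den Vorschlag", "the proposal"),
    ("den Vorschlag", "the proposal"),
    ("den Vorschlag", "the idea"),
]

toy_model = estimate_translation_model(toy_pairs)
print(toy_model["den Vorschlag"])
```

The more often a target phrase was used for a source phrase in the training data, the higher its probability; these are the "lots of numbers" stored in the system.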

What is a model?

thanks to Precision Translation Tools

• Translation Model
• Language Model
  – (of the target language)

What is a model?

• Translation model
  – source translation
  – probability

source          target          probability
den Vorschlag   the proposal    0.6227
                's proposal     0.1068
                a proposal      0.0341
                the idea        0.0250
                this proposal   0.0227
                proposal        0.0205
                ….              ….
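
As a rough illustration of how such a table can be used, the Python sketch below stores the entries for "den Vorschlag" shown on the slide and ranks the candidate translations by probability. The dictionary layout and the `top_translations` helper are assumptions made for this example; they are not the Moses phrase-table format or API.

```python
# Phrase-table entries for "den Vorschlag", copied from the slide.
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
    }
}

def top_translations(source, k=3):
    """Return the k most probable (target, probability) pairs for a source phrase."""
    options = PHRASE_TABLE.get(source, {})
    return sorted(options.items(), key=lambda item: item[1], reverse=True)[:k]

best = top_translations("den Vorschlag")
print(best)
```

An unknown source phrase simply returns an empty list here; the translation model alone does not decide the final output, since the language model also scores each candidate.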

What is a model?

• Language model
  – Likelihood of sentence
  – in target language

text                      probability
I would like              0.489
would like to             0.905
like to commend           0.002
to commend the            0.472
commend the rapporteur    0.147
….                        ….
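
The sketch below shows one way the numbers in this table could be combined into a score for a whole sentence. It assumes each entry is the conditional probability of the third word given the previous two, which the slide does not state explicitly. The `unseen_prob` floor is a hypothetical stand-in for the smoothing and back-off that real language models use.

```python
import math

# Trigram scores copied from the slide; reading each as
# P(third word | first two words) is an assumption.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}

def sentence_log_prob(words, unseen_prob=1e-6):
    """Sum the log-probabilities of every trigram in the sentence."""
    total = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        # Unseen trigrams get a small floor value (hypothetical, no real smoothing).
        total += math.log(TRIGRAM_PROB.get(trigram, unseen_prob))
    return total

sentence = "I would like to commend the rapporteur".split()
score = sentence_log_prob(sentence)
print(f"log-probability: {score:.3f}")
```

Working in log space avoids multiplying many small numbers together. A scrambled word order hits unseen trigrams and scores much lower, which is how the language model favours fluent output in the target language.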

What is Moses?

• Replacement for Pharaoh
  – Academic software
  – Closed-source
• Open source
• Re-written, clean code
  – More features
• Large developer community
  – Initiated by Hieu Hoang
  – Developed at NLP Workshop

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Timeline
  – Common misconceptions
• Coming up
• What can we do for you?

What is Moses?

Common Misconceptions

• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow

Only works on Linux

• Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OSX 10.7 with MacPorts
  – Ubuntu 12.10, 32 and 64-bit
  – Debian 6.0, 32 and 64-bit
  – Fedora 17, 32 and 64-bit
  – openSUSE 12.2, 32 and 64-bit
• Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OSX

Difficult to use

• Easier compile and install
  – Boost bjam
  – No installation required
• Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends
    • IRSTLM
    • GIZA++ and MGIZA
• Ready-made models trained on Europarl

Unreliable

• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
  – Run end-to-end training
  – http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language-pairs
  – http://www.statmt.org/moses/RELEASE-1.0/models/

Only phrase-based model

– replacement for Pharaoh
– extension of Pharaoh

• From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
• Since 2009
  – Hierarchical model
  – Syntactic models

Developed by one person

• ANYONE can contribute
  – 50 contributors

[Bar chart: ‘git blame’ of Moses repository, by contributor, axis 0% to 40%. Contributors shown: Kenneth Heafield, Hieu Hoang, phkoehn, Ondrej Bojar, Barry Haddow, sanmarf, Tetsuo Kiso, Eva Hasler, Rico Sennrich, wlin12, nicolabertoldi, eherbst, Ales Tamchyna, Colin Cherry, Matous Machacek, Phil Williams]

Slow

Decoding

thanks to Ken!!

Slow

Training

• Multithreaded
• Reduced disk IO
  – compress intermediate files
• Reduce disk space requirement

Time (mins)    1-core   2-cores     4-cores     8-cores     Size (MB)
Phrase-based   60       47 (79%)    37 (63%)    33 (56%)    893
Hierarchical   1030     677 (65%)   473 (45%)   375 (36%)   8300
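
The percentages in the timing table appear to be the training time relative to the single-core run. The short Python sketch below recomputes that ratio, and the corresponding speedup, from the minutes given on the slide. Because the minutes are rounded, the recomputed percentages can differ from the slide's figures by a point or two. The function names are made up for this example.

```python
# Training times in minutes, copied from the slide's table.
TRAINING_MINUTES = {
    "Phrase-based": {1: 60, 2: 47, 4: 37, 8: 33},
    "Hierarchical": {1: 1030, 2: 677, 4: 473, 8: 375},
}

def relative_time(model, cores):
    """Training time on `cores` as a fraction of the single-core time."""
    times = TRAINING_MINUTES[model]
    return times[cores] / times[1]

def speedup(model, cores):
    """How many times faster than single-core training."""
    return 1.0 / relative_time(model, cores)

for model in TRAINING_MINUTES:
    for cores in (2, 4, 8):
        print(f"{model:12s} {cores} cores: "
              f"{relative_time(model, cores):.0%} of 1-core time, "
              f"{speedup(model, cores):.2f}x speedup")
```

On these figures, eight cores give roughly a 1.8x speedup for phrase-based training and about 2.7x for hierarchical training, so the gain from extra cores is real but well short of linear.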

What is Moses?

Common Misconceptions

• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical, syntax model
• Developed by one person → everyone
• Slow → Fastest decoder, multithreaded training, less IO

Coming up…

• Code cleanup
• Incremental Training
• Better translation
  – smaller model
  – bigger data
  – faster training and decoding
• Applications
  – CAT tools
  – Speech translation

Applications

Computer-Aided Translation

• EU Project
  – CASMACAT
  – MATECAT

What can we do for you?

– simpler Moses
– graphical interface
– Windows compatibility
– terminology and glossary
– incremental training

• What can you do for us?
  – code
  – data
  – funding
