Machine Translation course program (in English)


Upload: dmitry-kan

Post on 26-Jun-2015


Category:

Technology



DESCRIPTION

This is the English version of my Machine Translation course program for the following course slides (in Russian): http://www.slideshare.net/dmitrykan/introduction-to-machine-translation-2911038 and http://www.slideshare.net/dmitrykan/introduction-to-machine-translation-1

TRANSCRIPT


Machine Translation course program

Brief description of the course

There are two fundamental approaches to machine translation: the rule-based approach (based on formal models of natural languages, e.g. dependency grammars) and the statistical approach (based on parallel data). Each has its advantages: the rule-based approach is formal and structured, while the statistical approach makes it possible to build and scale a system without studying the properties of a natural language in depth. Each also has its weak points: a rule-based system is tied to a given language or family of languages, while the statistical approach offers little control over subtle structures and properties of a natural language, such as the generation of prepositions.

Combining these two approaches has recently attracted particular interest among researchers. The entire machine translation pipeline, from formalization of the source language to word reordering on the target language side, can serve as a testing ground for combining rule-based and statistical methods.

This course introduces students to all sub-tasks of building a machine translation system with both approaches: formalization of natural language, translation dictionaries, phrase translation, machine translation models, decoding and word reordering. The course also presents formal semantic models of natural languages and their place in machine translation. Machine learning methods such as structured prediction are a further focus of the course.

The course material assumes knowledge of general higher mathematics and knowledge of, or interest in, natural language processing. Some hands-on and take-away sessions assume familiarity with formats, NLP algorithms and libraries.

Course topics

1. Introduction to MT. Motivation for its existence
2. Short history of MT, main phases. ALPAC report
3. MT systems triangle. Direct and indirect MT. Examples of MT systems
4. Current MT systems in the industry, main players
5. Existing software packages for natural language processing and for building an MT system
6. Two fundamental approaches to MT: statistical and rule-based (classical)
7. Methods of MT
8. Direct MT system, its features, pros and cons
9. Transfer MT system, types of transfer methods, features
10. Notion of interlingua. Features of interlingua-based MT, its comparison with transfer
11. Statistical MT and its components
12. Example-based MT systems
13. Theory of statistical MT systems. Fundamental equation (Bayes' theorem). Notion of a statistical language model. MT model
14. Model of machine translation in statistical MT
15. Task of word alignment
16. Features of MT systems
17. Existing software components of statistical MT systems
18. Evaluation of MT systems: human evaluation and automatic metrics
19. BLEU score
20. METEOR score
21. NIST score
22. Round-trip evaluation method
23. Hybrid MT systems
24. Task of word reordering in a sentence on the target side. Rule-based and statistical approaches
25. Computational semantics of a natural language. MT systems based on it
26. Pragmatics and context analysis at the cross-sentence level
27. Practical details of software packages: GIZA++, SRILM, Moses
28. Method of structured prediction for learning machine translation models
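For orientation, the fundamental equation named in topic 13 is commonly written in the noisy-channel form used in paper [1]. The symbols below are a notation choice for this note, not taken from the course slides: f is the source sentence, e a candidate target sentence, P(e) the language model and P(f | e) the translation model.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator P(f) can be dropped because it does not depend on e. Searching for the maximizing e is the decoding task listed in the course description.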

Seminar topics

1. Mathematics of statistical MT, paper [1]
2. Hierarchical model of statistical MT, paper [2]
3. Phrase-based statistical MT, paper [3]
4. Rule-based MT systems, papers [4, 5]
5. Hybrid MT systems, based on examples, paper [6]
6. BLEU score in detail, paper [8]
7. Robust large-scale MT systems, based on examples, paper [9]
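As preparation for the BLEU seminar, the following is a minimal sketch of the idea in paper [8]: clipped (modified) n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It is not the course's reference implementation. The function names, whitespace tokenization, single reference, sentence-level scoring and absence of smoothing are all assumptions of this sketch; paper [8] defines the metric at corpus level.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing (sketch)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            # Any zero precision makes the geometric mean zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which also equals 1 when the lengths match.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `bleu("the cat sat on the", "the cat sat on the mat")` has perfect n-gram precisions but is one word short, so the score is determined entirely by the brevity penalty. Calling `bleu("the the the the", "the cat", max_n=1)` shows the effect of clipping: only one of the four occurrences of "the" is credited.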

Bibliography

[1] Brown P., Della Pietra S., Della Pietra V., Mercer R.: The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993

[2] Chiang D.: A Hierarchical Phrase-Based Model for Statistical Machine Translation, 2005

[3] Koehn P., Och F., Marcu D.: Statistical Phrase-Based Translation, 2003

[4] Kaplan R., Netter K., Wedekind J., Zaenen A.: Translation by Structural Correspondences, 1989

[5] Landsbergen J.: The Rosetta Project, 1989

[6] Groves D., Way A.: Hybrid Example-Based SMT: the Best of Both Worlds?

[7] Athanaselis T., Bakamidis S., Dologlou I.: Words Reordering based on Statistical Language Model, 2006

[8] Papineni K., Roukos S., Ward T., Zhu W.-J.: BLEU: a Method for Automatic Evaluation of Machine Translation, 2002

[9] Gough N., Way A.: Robust Large-Scale EBMT with Marker-Based Segmentation, 2004