statistical machine translation for indian language copy

Statistical Machine Translation (SMT) for Indian Language Presented By: Nakul Sharma, Parteek Bhatia. Thapar University, Patiala.


Posted on 15-Jul-2015




Main Agenda

• Introduction to SMT.
• Tools for SMT.
• Popular Machine Translation Systems.
• Machine Translation Projects in India.
• Machine Translation Tools and Punjabi Language.
• Conclusion and future work.
• References.


Introduction

• SMT is a form of corpus-based machine translation.
• An SMT system consists of three components:
  – Language Model (LM).
  – Translation Model (TM).
  – Decoder.


System Architecture

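[Figure: SMT system architecture, showing the source sentence S, the target sentence T, the Language Model P(T), the Translation Model P(S|T), and the Decoder.]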


Language Model (LM)

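• Assigns a probability to a target-language word sequence, conditioning each word on the words that precede it.

• N-gram model. By the chain rule,

$$P(T) = P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2)\,P(w_4 \mid w_1 w_2 w_3) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1})$$

  and an N-gram model approximates each factor using only the previous N-1 words:

$$P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-N+1} \ldots w_{i-1})$$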
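A minimal sketch of a bigram (N = 2) model, assuming maximum-likelihood estimates from a tiny made-up English corpus and `<s>`/`</s>` sentence-boundary markers; the function names are hypothetical. It shows why real LM toolkits add smoothing: any unseen bigram drives the whole sentence probability to zero.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    return histories, bigrams

def bigram_prob(histories, bigrams, prev, word):
    """Maximum-likelihood estimate P(word | prev) = c(prev word) / c(prev)."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]

def sentence_prob(histories, bigrams, sentence):
    """Chain rule with the bigram approximation: product of P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= bigram_prob(histories, bigrams, prev, word)
    return prob

# Hypothetical toy training corpus.
corpus = ["she slept in garden", "she slept", "he slept in garden"]
histories, bigrams = train_bigram_lm(corpus)
print(sentence_prob(histories, bigrams, "she slept in garden"))  # 2/3 * 1 * 2/3 * 1 * 1
print(sentence_prob(histories, bigrams, "she slept in house"))   # 0.0: unseen bigram
```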


Translation Model (TM)

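• Computes the conditional probability P(S|T), where S is the source-language sentence and T is the target-language sentence.
• Breaks the translation process into smaller units (words, phrases, etc.).
• For example, the Punjabi sentence ਉਹ ਬਾਗ ਵਿੱਚ ਸੌਂ ਗਈ। (uh baag vich saun gayi) translates to "she slept in garden".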
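To make P(S|T) concrete, here is a sketch of the word-based IBM Model 1 of Brown et al. (1993), one of the models GIZA++ trains. The romanized Punjabi tokens, the extra `NULL` target word, the length constant `epsilon`, and every value in `toy_t` are illustrative assumptions, not values from a trained system.

```python
def ibm1_prob(source, target, t_table, epsilon=1.0):
    """IBM Model 1 estimate of P(S|T).

    P(S|T) = epsilon / (l+1)^m * prod_j sum_i t(s_j | t_i), where m is the source
    length, l the target length, and t_0 is an extra NULL target word.
    t_table maps (source word, target word) -> t(source word | target word).
    """
    src = source.split()
    tgt = ["NULL"] + target.split()          # l + 1 target positions
    prob = epsilon / (len(tgt) ** len(src))  # uniform alignment term
    for s in src:
        prob *= sum(t_table.get((s, t), 0.0) for t in tgt)
    return prob

# Hypothetical lexical translation probabilities t(Punjabi word | English word).
toy_t = {
    ("uh", "she"): 0.7, ("baag", "garden"): 0.8, ("vich", "in"): 0.9,
    ("saun", "slept"): 0.5, ("gayi", "slept"): 0.4, ("gayi", "NULL"): 0.1,
}
print(ibm1_prob("uh baag vich saun gayi", "she slept in garden", toy_t))
print(ibm1_prob("uh baag vich saun gayi", "she garden in slept", toy_t))  # same value
```

Because Model 1 ignores word order, both English orderings get the same score; preferring the fluent order is left to the language model P(T).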


Decoder

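• Searches for the target sentence T that maximizes P(T|S). By Bayes' rule, P(T|S) = P(T) P(S|T) / P(S), and P(S) is fixed for a given input, so:

$$\hat{T} = \arg\max_T P(T)\,P(S \mid T)$$

• The search starts from the null (empty) hypothesis, which has translated no source words, and extends it step by step until the whole source sentence is covered.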
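A minimal sketch of the decision rule, assuming the candidate translations and their log-probabilities are already given (the numbers below are made up). It works in log space so that products of many small probabilities do not underflow. A real decoder such as Moses or Pharaoh does not enumerate whole sentences like this; it grows partial hypotheses from the null hypothesis and prunes them with beam search.

```python
def decode(candidates, lm_logprob, tm_logprob):
    """Return T-hat = argmax_T [log P(T) + log P(S|T)] over a fixed candidate list."""
    return max(candidates, key=lambda t: lm_logprob[t] + tm_logprob[t])

# Hypothetical scores for the source sentence "uh baag vich saun gayi".
lm_logprob = {                      # log P(T): fluency of the English sentence
    "she slept in garden": -4.0,
    "she garden in slept": -9.0,
    "she slept in the garden": -3.5,
}
tm_logprob = {                      # log P(S|T): adequacy of the translation
    "she slept in garden": -2.0,
    "she garden in slept": -2.0,
    "she slept in the garden": -3.0,
}
print(decode(list(lm_logprob), lm_logprob, tm_logprob))  # she slept in garden
```

The word-salad candidate has the same translation-model score as the winner but loses on the language model, which is the division of labour the noisy-channel formula sets up.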



Tools for SMT

• LM Tools
  – CMU Statistical Language Modeling (SLM) Toolkit
  – SRILM
• TM Tools
  – GIZA++
  – MGIZA++
• Decoders
  – Moses
  – ISI ReWrite Decoder
  – Pharaoh


LM Tools

• CMU Statistical Language Modeling (SLM) Toolkit
  – A set of Unix software tools for language modeling.
  – Written by Roni Rosenfeld.
• SRILM
  – Developed by the SRI Speech Technology and Research Laboratory.
  – A toolkit for building and applying statistical language models.
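Both toolkits can write their models in the text-based ARPA back-off format. The sketch below reads only the unigram and bigram sections of such a file and scores a bigram with back-off. The tiny model and its log10 values are invented for illustration, and the code assumes every queried word is in the vocabulary (no `<unk>` handling).

```python
def read_arpa(text):
    """Read the 1-gram and 2-gram sections of an ARPA-format language model.

    Returns {1: {(w,): (log10 prob, log10 backoff)}, 2: {(w1, w2): (log10 prob, 0.0)}}.
    Higher-order sections, if present, are ignored.
    """
    ngrams = {1: {}, 2: {}}
    order = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("\\data\\", "\\end\\") or line.startswith("ngram "):
            continue                                   # header, counts, or blank line
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])       # e.g. "\2-grams:" -> 2
            continue
        if order in ngrams:
            fields = line.split()
            words = tuple(fields[1:1 + order])
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
            ngrams[order][words] = (float(fields[0]), backoff)
    return ngrams

def bigram_log10(ngrams, prev, word):
    """log10 P(word | prev): use the bigram if listed, else back off to the unigram."""
    if (prev, word) in ngrams[2]:
        return ngrams[2][(prev, word)][0]
    backoff = ngrams[1].get((prev,), (0.0, 0.0))[1]   # missing back-off weight = 0
    return backoff + ngrams[1][(word,)][0]             # KeyError if word is unknown

# Hypothetical toy model (all values are log10).
toy_arpa = """\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-1.0 <s> -0.5
-0.5 she -0.25
-0.8 slept

\\2-grams:
-0.2 <s> she

\\end\\
"""
lm = read_arpa(toy_arpa)
print(bigram_log10(lm, "<s>", "she"))    # listed bigram
print(bigram_log10(lm, "she", "slept"))  # backoff(she) + log10 P(slept)
```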


Architecture for LM

[Figure: Architecture of LM.]


TM Tools

• GIZA++
  – Implements statistical alignment models, including IBM Models 1 to 5 and the HMM alignment model.
  – Performs word alignment.
• MGIZA++
  – Multi-threaded word alignment.
  – Optimized memory usage.
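GIZA++ begins its training with IBM Model 1, whose parameters are estimated with expectation-maximization (EM). The sketch below implements only that first stage on a made-up, romanized Punjabi-English corpus; it is a simplified illustration, not GIZA++'s actual code, and it omits the HMM and higher IBM models that follow in a real run. The `NULL` target word and the function name `train_ibm1` are assumptions of this sketch.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate IBM Model 1 lexical probabilities t(s | t) with EM.

    pairs: list of (source sentence, target sentence) strings.
    Returns a dict mapping (source word, target word) -> t(source word | target word).
    """
    corpus = [(s.split(), ["NULL"] + t.split()) for s, t in pairs]
    src_vocab = {w for src, _ in corpus for w in src}
    t_prob = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (s, t) links
        total = defaultdict(float)  # expected count of links from t
        # E-step: share each source word among all target words of its sentence.
        for src, tgt in corpus:
            for s in src:
                norm = sum(t_prob[(s, t)] for t in tgt)
                for t in tgt:
                    frac = t_prob[(s, t)] / norm
                    count[(s, t)] += frac
                    total[t] += frac
        # M-step: renormalise the expected counts for each target word.
        t_prob = defaultdict(float, {(s, t): c / total[t] for (s, t), c in count.items()})
    return dict(t_prob)

# Hypothetical romanized Punjabi-English sentence pairs.
pairs = [("baag vich", "in garden"), ("ghar vich", "in house"), ("baag", "garden")]
learned = train_ibm1(pairs)
for target_word in ("in", "garden", "house"):
    prob, word = max((p, s) for (s, t), p in learned.items() if t == target_word)
    print(target_word, "->", word, round(prob, 3))
```

Words that keep co-occurring, such as vich and in, pull probability mass towards each other over the iterations; the resulting table is what "performs word alignment" amounts to at this stage.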


[Figure: sample of the GIZA++ t3.final file.]

Columns of the t3.final file:
– First column: IDs of source words.
– Second column: IDs of target words.
– Third column: probability of aligning the two words.
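A small sketch for reading a file laid out as described on this slide, one `source-id target-id probability` triple per line, and picking the most probable target ID for each source ID. The sample lines and the function names are invented; turning the IDs back into words would additionally need the ID-to-word vocabulary used during training, which this sketch leaves out.

```python
def read_t_table(lines):
    """Parse '<source id> <target id> <probability>' lines into a dict."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip blank or malformed lines
        table[(int(fields[0]), int(fields[1]))] = float(fields[2])
    return table

def best_targets(table):
    """Map each source id to the target id with the highest probability."""
    best = {}
    for (src_id, tgt_id), prob in table.items():
        if src_id not in best or prob > table[(src_id, best[src_id])]:
            best[src_id] = tgt_id
    return best

# Invented sample lines in the three-column layout.
sample = ["0 3 0.012", "2 3 0.85", "2 4 0.10", "5 4 0.7", ""]
table = read_t_table(sample)
print(best_targets(table))  # {0: 3, 2: 3, 5: 4}
```

With a real file, the lines can be passed straight from an opened file object, since iterating a file yields its lines.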


Decoder Tools

• Moses
  – Automatically trains translation models for any language pair.
  – Works with SRILM and GIZA++.
• ISI ReWrite Decoder
  – Performs the search (decoding) step of an SMT system.
  – Works with the CMU Statistical Language Modeling Toolkit and GIZA++.


Popular Machine Translation Systems

• Google Translate.
• Bing Translator.
• Systran.
• Hindi to Punjabi Machine Translation System.
• METAL.



Machine Translation Projects in India

• Anglabharati and Anubharati
• Anusaaraka
• MaTra
• Mantra
• UCSG-based English-Kannada MT
• UNL-based MT between English, Hindi and Marathi
• Tamil-Hindi Anusaaraka and English-Tamil MT
• English-Hindi SMT


Machine Translation Tools and Punjabi Language

• Punjabi University
  – Online Hindi-Punjabi and Punjabi-Hindi machine translation.
• Thapar University
  – Punjabi language server, which includes the Punjabi-UNL EnConverter and the UNL-Punjabi DeConverter.


Conclusion and Future Work

• Applications already exist that support translation for regional Indian languages.
• Future research directions include tree-to-string alignment templates and clause-based restructuring.
• Combining various MT techniques can lead to more efficient translation.


References

[1] Adam Lopez, "Statistical Machine Translation", ACM Computing Surveys, Vol. 40, No. 3, Article 8, Aug 2008.

[2] Durgesh Rao, "Machine Translation in India: A Brief Survey".

[3] Franz Josef Och, "GIZA++: Training of statistical translation models", available at: http://fjoch.com/GIZA++.html, accessed on 26/03/2010.

[4] Hindi to Punjabi Translation system, available at: http://h2p.learnpunjabi.org, accessed on 03/04/2010.

[5] Gurpreet Singh Lehal, "A Survey of the State of the Art in Punjabi Language Processing", Language in India, Oct 2009.

[6] ISI ReWrite Decoder User's Manual, Version 0.2, available at: http://www.isi.edu/~germann/software/ReWrite-Decoder/isi-decoder-manual.html, accessed on 12/03/2010.

[7] Jaime G. Carbonell, Teruko Mitamura, Eric H. Nyberg, "The KANT Perspective: A Critique of Pure Transfer (and Pure Interlingua, Pure Statistics, …)".

[8] Jayprasad J Hegde, Ananthakrishnan R, Kavitha M, Chandra Shekhar, Ritesh Shah, Sawani Bade, Sasikumar M, "MaTra: A Practical Approach to Fully-Automatic Indicative English-Hindi Machine Translation".

[9] Jean Senellart, Péter Dienes, Tamás Váradi, "New Generation Systran Translation System", MT Summit VIII, Sept 2001.

[10] Online translation system, available at: www.translate.google.com, accessed on 03/04/2010.

[11] Online manual of the CMU Statistical Language Modeling Toolkit, available at: http://mi.eng.cam.ac.uk/~prc14/toolkit_documentation.html, accessed on 15/03/2010.

[12] P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer, "The mathematics of statistical machine translation: parameter estimation", Computational Linguistics, 19(2), 263-311, 1993.

[13] Parteek Bhatia, Sandeep Singh, "Punjabi Deconverter Architecture", National Seminar on Creation of Lexical Resources for Indian Language Computing and Processing, CDAC Mumbai, March 26-28, 2007.


Contact Us

[email protected]

950303762