Statistical Machine Translation Raghav Bashyal

Upload: oswin-marshall

Post on 17-Jan-2016

Statistical Machine Translation

• Uses pre-translated text (corpora)

• Compares translated text to the original

• Notices patterns and associates words

SMT Process

• Knight – A Statistical MT Tutorial Workbook

• Basic probabilities

– P(word)

• Conditional probabilities

– P(word | word)

• …

• Pick the most probable translation
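
The steps above can be sketched as a toy noisy-channel chooser: score each candidate English sentence e by P(e) * P(f | e) and keep the highest-scoring one. This is only an illustration under assumptions; the probability tables, the candidate sentences, and the name `best_translation` are invented here and are not data or code from the project.

```python
# Toy values, invented for illustration only.
language_model = {            # P(e): how plausible the English sentence is
    "the house is small": 0.6,
    "small the is house": 0.01,
}
translation_model = {         # P(f | e): how likely e is to produce the Spanish f
    ("la casa es pequeña", "the house is small"): 0.3,
    ("la casa es pequeña", "small the is house"): 0.3,
}

def best_translation(f, candidates):
    """Return the candidate e that maximizes P(e) * P(f | e)."""
    def score(e):
        return language_model.get(e, 0.0) * translation_model.get((f, e), 0.0)
    return max(candidates, key=score)

choice = best_translation("la casa es pequeña", list(language_model))
```

Both candidates explain the Spanish words equally well here, so the language model P(e) is what separates the fluent word order from the scrambled one.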

SMT process

http://isoft.postech.ac.kr/research/SMT/images/math.jpg

Project

• Translate basic text from Spanish to English

• Test effectiveness

– With/without hard-coded components (syntax)

– Specific procedures/algorithms that add speed

Literature

• Guides on statistical machine translation

• Most research projects follow the same procedure as outlined by Knight

• “state of the art” implementation

– Google

Literature

• NLTK

– Christina Wallin

• UC Berkeley

– Modifications

– Larger corpora more useful

• Syntax based

– hard-code

– Higher translation quality when used with SMT

Procedure

• NLTK – Natural Language ToolKit

– Python

– Made from natural language processing projects

• Current procedure – read the SMT worksheet

– Code along with worksheet

Development

• Create corpora

• Tokenization

– Clean string

• Probability

– P(word) in corpora
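
A minimal sketch of the tokenization and P(word) steps, assuming a corpus held as a list of sentence strings. It uses only the Python standard library rather than NLTK, and the two-sentence corpus and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Clean the string: lowercase it, drop punctuation, split into words."""
    return re.findall(r"\w+", text.lower())

def unigram_probabilities(sentences):
    """Estimate P(word) as count(word) / total number of tokens in the corpus."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

corpus = ["La casa es pequeña.", "La casa es verde."]   # toy corpus
p_word = unigram_probabilities(corpus)
```

With this toy corpus, `p_word["la"]` is 2 occurrences out of 8 tokens.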

Smoothing

• Coefficients used to modify probability

– Large coefficients for trigrams

– Small for bigrams and single words

• Normalizes the weight of all the words/phrases

– Trigrams are more valuable
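
One way to read the smoothing bullets above is as linear interpolation: P(z | x y) is a weighted sum of the trigram, bigram, and single-word estimates, plus a small constant so unseen words never get probability zero. The sketch below assumes that reading; the coefficient values and the function names are illustrative choices, not figures from the project.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all consecutive n-token tuples in the token list."""
    return list(zip(*(tokens[i:] for i in range(n))))

def make_smoothed_trigram(tokens, weights=(0.95, 0.04, 0.008, 0.002)):
    """Build P(z | x y) as a weighted mix of trigram, bigram, and unigram estimates."""
    uni = Counter(tokens)
    bi = Counter(ngrams(tokens, 2))
    tri = Counter(ngrams(tokens, 3))
    total = len(tokens)
    w3, w2, w1, w0 = weights   # large weight for trigrams, small for the rest

    def prob(z, x, y):
        p3 = tri[(x, y, z)] / bi[(x, y)] if bi[(x, y)] else 0.0
        p2 = bi[(y, z)] / uni[y] if uni[y] else 0.0
        p1 = uni[z] / total
        return w3 * p3 + w2 * p2 + w1 * p1 + w0   # w0 keeps unseen words above zero
    return prob

tokens = "the house is small the house is green".split()
p = make_smoothed_trigram(tokens)
seen = p("is", "the", "house")      # trigram observed in the toy text
unseen = p("blue", "the", "house")  # word never observed
```

The unseen word still receives the constant floor, while the observed trigram is dominated by its trigram estimate.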

Algorithm

For translation, IBM Model 3 is used:

1. For each English word ei indexed by i = 1, 2, ..., l, choose fertility phi-i with probability n(phi-i | ei)

2. Choose the number phi-0 of "spurious" French words to be generated from e0 = NULL, using probability p1 and the sum of fertilities from step 1

3. Let m be the sum of fertilities for all words, including NULL

4. For each i = 0, 1, 2, ..., l, and each k = 1, 2, ..., phi-i, choose a French word tau-ik with probability t(tau-ik | ei)

5. For each i = 1, 2, ..., l, and each k = 1, 2, ..., phi-i, choose target French position pi-ik with probability d(pi-ik | i, l, m)

6. For each k = 1, 2, ..., phi-0, choose a position pi-0k from the phi-0 - k + 1 remaining vacant positions in 1, 2, ..., m, for a total probability of 1/phi-0!

7. Output the French sentence with words tau-ik in positions pi-ik (0 <= i <= l, 1 <= k <= phi-i)
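
The generative steps above assign a probability to any French sentence together with a word alignment. The sketch below scores one alignment, assuming the standard closed form for IBM Model 3, P(a, f | e), with p0 = 1 - p1. The tables n, t, and d are plain dictionaries with invented toy values, and the function name and table layout are hypothetical.

```python
from math import comb, factorial

def model3_probability(e_words, f_words, alignment, n, t, d, p1):
    """P(a, f | e) for one alignment under the generative story above.

    alignment[j - 1] is the English position (1..l) that generated French
    position j, or 0 if that French word is spurious (generated by NULL).
    """
    l, m = len(e_words), len(f_words)
    e = ["NULL"] + list(e_words)                        # e[0] is the NULL word
    phi = [alignment.count(i) for i in range(l + 1)]    # fertilities phi-0..phi-l
    phi0, p0 = phi[0], 1.0 - p1
    if m - 2 * phi0 < 0:
        return 0.0   # more spurious words than real ones cannot be generated

    # Steps 2 and 6: how many spurious words there are and where they land.
    prob = comb(m - phi0, phi0) * p0 ** (m - 2 * phi0) * p1 ** phi0
    # Step 1: fertility of each real English word; the factorial counts the
    # orderings of a word's translations that give the same alignment.
    for i in range(1, l + 1):
        prob *= factorial(phi[i]) * n.get((phi[i], e[i]), 0.0)
    # Steps 4 and 5: word translation and position choice.
    for j, i in enumerate(alignment, start=1):
        prob *= t.get((f_words[j - 1], e[i]), 0.0)
        if i != 0:
            prob *= d.get((j, i, l, m), 0.0)
    return prob

# Toy tables, invented for illustration only.
n = {(1, "the"): 0.9, (1, "house"): 0.8}          # n(phi | e)
t = {("la", "the"): 0.5, ("casa", "house"): 0.7}  # t(f | e)
d = {(1, 1, 2, 2): 0.6, (2, 2, 2, 2): 0.6}        # d(j | i, l, m)
score = model3_probability(["the", "house"], ["la", "casa"], [1, 2], n, t, d, p1=0.1)
```

A decoder would compare such scores across many candidate alignments and sentences; this sketch evaluates only a single one.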

Expected Results

• Will probably be very basic translation

• Usually performs better with “sample” text than “real” text

• Highlighted errors

– Program should use reference data to find some errors

• Error frequency plots for certain words

• Test the effectiveness of adjustments

– Hard coding, other algorithms
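
A rough sketch of how reference data could be used to find errors and tally them per word, assuming each system output is paired with one reference translation. It is a deliberately naive bag-of-words comparison that ignores word order; the sentences and the function name are hypothetical.

```python
from collections import Counter

def error_counts(outputs, references):
    """Count reference words missing from the system output, sentence by sentence."""
    errors = Counter()
    for out, ref in zip(outputs, references):
        # Counter subtraction keeps only words the output has too few of.
        missing = Counter(ref.lower().split()) - Counter(out.lower().split())
        errors.update(missing)
    return errors

outputs = ["the house is little", "the house green"]       # toy system output
references = ["the house is small", "the house is green"]  # toy reference data
errors = error_counts(outputs, references)
```

The resulting counts per word are the kind of data an error frequency plot could be drawn from.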

GUI
