machine translator introduction

Machine Translator

Upload: hamid-shahrivari

Post on 21-Jan-2018


Category:

Education



TRANSCRIPT

Page 1: Machine translator Introduction

Machine Translator

Page 2: Machine translator Introduction

What Is a Machine Translator?

Automatic translation from one language to another

Koehn: "Translating between languages is a task for which even humans require special training."

Page 3: Machine translator Introduction

Why Even Humans Require Special Training

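Original (Persian): "By showing the Security Council's 'bogeyman', the people of Iran will not turn to face the qibla." (The idiom "to turn to face the qibla" refers to the way the dying are laid out, so the sense is roughly "will not give up and die".)

Newsweek's translation: "He said that if the Security Council appears like the creatures that frighten children, the people of Iran will not lie down toward the qibla of the world's Muslims."

Translation by the Spanish paper El País: "He said that even if the Security Council shows the Iranians something frightening, the people of Iran will still not sleep facing Saudi Arabia."

Translation by the French paper L'Humanité: "He said that whether Iranians lie down toward the center of Muslim belief depends on whether they fear mythical creatures; this is an Iranian tale."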

Page 4: Machine translator Introduction

Other Similar Concepts

• computer-aided translation

• machine-aided human translation (MAHT)

Page 5: Machine translator Introduction

History

The idea of mechanical translation was discussed as early as the 17th century by the philosophers René Descartes and Gottfried Wilhelm Leibniz

Page 6: Machine translator Introduction

MT in Computer

Page 7: Machine translator Introduction

Applications

Dissemination

Publication in other languages

Communication

Emails, chats

Assimilation

Understand the Content

Page 8: Machine translator Introduction
Page 9: Machine translator Introduction

Challenges

Input

Typology

Lexical

Other

Page 10: Machine translator Introduction

Input: Ambiguity

I saw a man with a telescope (who has the telescope: the speaker or the man?)

Page 11: Machine translator Introduction

Input: Complexity

General relativity includes a dynamical spacetime so it is difficult to see how to identify the conserved energy and momentum Noether's theorem allows these quantities to be determined from a Lagrangian with translation invariance but general covariance makes translation invariance into something of a gauge symmetry

Page 12: Machine translator Introduction

Input: Wrong Sentence

I try for getting best grades but I did not can achive it

Page 13: Machine translator Introduction

Typology: Morphology

Page 14: Machine translator Introduction

Typology: Syntax

order of verbs (V), subjects (S) and objects (O)

Page 15: Machine translator Introduction

Typology: Argument structure and linking

Page 16: Machine translator Introduction

Typology: Pronouns omission

Page 17: Machine translator Introduction

Lexical: Ambiguity

Page 18: Machine translator Introduction

Lexical: Grammar

Page 19: Machine translator Introduction

Lexical: Lexical gap

Page 20: Machine translator Introduction

Lexical: Idiom

Page 21: Machine translator Introduction

Other Challenges

Page 22: Machine translator Introduction

Other Challenges

Page 23: Machine translator Introduction

Human Translation Process

• Decoding the meaning of the source text

• Re-encoding this meaning in the target language.

Page 24: Machine translator Introduction

Simplest Machine Translator

Apple → سیب (the Persian word for "apple")
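The one-word lookup on this slide can be sketched as a plain dictionary translator. This is only an illustration: the function name and every lexicon entry other than "apple" are assumptions added here, not part of the slides.

```python
# Toy English -> Persian lexicon; only "apple" comes from the slide,
# the other entries are illustrative assumptions.
LEXICON = {
    "apple": "سیب",
    "water": "آب",
    "book": "کتاب",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each known word by its dictionary entry; keep unknown words as-is."""
    tokens = sentence.lower().split()
    return " ".join(lexicon.get(token, token) for token in tokens)


print(translate_word_for_word("Apple", LEXICON))
```

Such a translator ignores word order, inflection and ambiguity, which is what the more elaborate approaches in the following slides try to address.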

Page 25: Machine translator Introduction

MT

• Human Translation With Machine Aid
• Machine Translation With Human Aid
• Fully Automated Translation

• Rule Based MT
    • Direct MT
    • Transfer MT
    • Interlingua
• Knowledge Based MT
• Principle Based MT
• Empirical Based MT
    • Statistical MT
        • Word Based Translation
        • Phrase Based Translation
        • Hierarchical Phrase Based Translation
    • Example Based MT
• Online Interactive MT
• Hybrid MT
• Neural MT

Page 26: Machine translator Introduction

Rule-Based MT

Page 27: Machine translator Introduction

Direct Translation

• dictionary has to cover all cross-lingual phenomena

• need to include contextual information in dictionary (long phrases)

• inflectional agreement, shifts in word order & structure

+ direct translation systems include simplistic rules

Page 28: Machine translator Introduction

Direct Translation Approach

• simplistic: only low-level pre/post-processing (tokenization, etc)

• advanced: handle some specific phenomena

identification & handling of syntactic ambiguity

morphological processing/synthesis

word re-ordering rules

rules for prepositions

handling of compounds and idioms, ...

Page 29: Machine translator Introduction

Is Direct Translation Feasible?

Page 30: Machine translator Introduction

Transfer Based Translation

Motivation:

• complete analysis of source language sentences

• handle lexical & structural ambiguity in one formalism

Page 31: Machine translator Introduction

Transfer Based Needed Information/Tools

• source language parser (morpho-syntactic analysis)

• transfer engine (e.g. unification based grammar)

• target language generator

Page 32: Machine translator Introduction
Page 33: Machine translator Introduction

• Morphological analysis. Surface forms of the input text are classified as to part-of-speech (e.g. noun, verb, etc.) and sub-category (number, gender, tense, etc.). All of the possible "analyses" for each surface form are typically made output at this stage, along with the lemma of the word.

• Lexical categorisation. In any given text some of the words may have more than one meaning, causing ambiguity in analysis. Lexical categorisation looks at the context of a word to try to determine the correct meaning in the context of the input. This can involve part-of-speech tagging and word sense disambiguation.

• Lexical transfer. This is basically dictionary translation; the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.

• Structural transfer. While the previous stages deal with words, this stage deals with larger constituents, for example phrases and chunks. Typical features of this stage include concordance of gender and number, and re-ordering of words or phrases.

• Morphological generation. From the output of the structural transfer stage, the target language surface forms are generated.
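The five stages above can be sketched as a toy English-to-Spanish pipeline for a single noun phrase. Everything here is a hypothetical miniature: the table names, the tiny lexicons and the assumption that the input contains exactly one noun are illustrative choices, not a description of any real transfer system.

```python
# Stage 1 table: surface form -> (lemma, part of speech, number)
ANALYSES = {
    "the": ("the", "DET", None),
    "red": ("red", "ADJ", None),
    "apple": ("apple", "NOUN", "sg"),
    "apples": ("apple", "NOUN", "pl"),
}

# Stage 3 table: source lemma -> (target lemma, grammatical gender for nouns)
BILINGUAL = {
    "the": ("el", None),
    "red": ("rojo", None),
    "apple": ("manzana", "f"),
}

# Stage 5 table: (target lemma, gender, number) -> surface form
# (only the feminine forms needed for this example are listed)
FORMS = {
    ("el", "f", "sg"): "la",
    ("el", "f", "pl"): "las",
    ("rojo", "f", "sg"): "roja",
    ("rojo", "f", "pl"): "rojas",
    ("manzana", "f", "sg"): "manzana",
    ("manzana", "f", "pl"): "manzanas",
}


def translate(sentence):
    # Stages 1-2: morphological analysis. The toy lexicon is unambiguous,
    # so lexical categorisation has nothing to decide here.
    analysed = [ANALYSES[token] for token in sentence.lower().split()]

    # Stage 3: lexical transfer via the bilingual dictionary.
    words = []
    for lemma, pos, number in analysed:
        target, gender = BILINGUAL[lemma]
        words.append({"lemma": target, "pos": pos,
                      "gender": gender, "number": number})

    # Stage 4a: structural transfer, re-ordering ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(words) - 1:
        if words[i]["pos"] == "ADJ" and words[i + 1]["pos"] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1

    # Stage 4b: concordance, copying gender and number from the noun
    # (assumes exactly one noun in the phrase).
    noun = next(w for w in words if w["pos"] == "NOUN")
    for w in words:
        w["gender"], w["number"] = noun["gender"], noun["number"]

    # Stage 5: morphological generation of target surface forms.
    return " ".join(FORMS[(w["lemma"], w["gender"], w["number"])] for w in words)


print(translate("the red apples"))
```

Even in this miniature, the re-ordering and agreement rules are specific to the English-Spanish pair, which hints at why rule writing scales poorly.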

Page 34: Machine translator Introduction

Transfer Based: Syntactic Transfer

Page 35: Machine translator Introduction

What are the problems?

• lots of grammar engineering (writing rules ...)

• language-pair specific rules

• exponential ambiguity

• variation & preference

Page 36: Machine translator Introduction

Interlingua-based Translation

Page 37: Machine translator Introduction

Persian

English

Page 38: Machine translator Introduction

Persian

Spanish

English

Page 39: Machine translator Introduction

Persian

Spanish

Japanese

English

Page 40: Machine translator Introduction

interlingua

Persian

Spanish

New

English

Page 41: Machine translator Introduction

Advantages & Disadvantages

Advantages:

• no language-pair specific transfer

• simple to add new languages (add new analysis/generation component)

Disadvantages:

• need to design interlingua that covers all language phenomena

• need semantic representation (and that’s hard!)
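The scaling argument behind the interlingua idea can be made concrete with a little arithmetic. The sketch below assumes one translation system per ordered language pair for direct or transfer approaches, and one analysis plus one generation component per language for an interlingua; the function names are illustrative.

```python
def pairwise_systems(n_languages):
    """One system per ordered (source, target) language pair."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    """One analyser into the interlingua and one generator out of it per language."""
    return 2 * n_languages


for n in (2, 3, 4, 10):
    print(n, pairwise_systems(n), interlingua_components(n))
```

Under these assumptions the interlingua only pays off from four languages upward, and adding a language costs two new components rather than a new system for every existing language.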

Page 42: Machine translator Introduction
Page 43: Machine translator Introduction

Statistical MT

Page 44: Machine translator Introduction

Statistical MT

Page 45: Machine translator Introduction

Statistical MT

Page 46: Machine translator Introduction
Page 47: Machine translator Introduction

Statistical MT

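(1) build a language model which allows us to estimate P(e), the probability of a target-language sentence e

(2) build a translation model which allows us to estimate P(f|e), the probability of the source sentence f given e

(3) search for the e that maximizes the product P(f|e) · P(e)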
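The three steps can be sketched with hand-made probability tables standing in for trained models. All numbers, the example sentences and the `decode` helper are assumptions chosen for illustration; a real system estimates both models from corpora and searches a far larger candidate space.

```python
import math

SOURCE = "das Haus ist klein"

# (1) Toy language model: P(e) for a few candidate target sentences.
LANGUAGE_MODEL = {
    "the house is small": 0.02,
    "small the house is": 0.0001,
    "the home is little": 0.005,
}

# (2) Toy translation model: P(f | e) for the single source sentence above.
TRANSLATION_MODEL = {
    (SOURCE, "the house is small"): 0.3,
    (SOURCE, "small the house is"): 0.3,
    (SOURCE, "the home is little"): 0.2,
}


def decode(f, candidates):
    """(3) Return the candidate e maximizing P(f|e) * P(e), computed in log space."""
    def score(e):
        return math.log(TRANSLATION_MODEL[(f, e)]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=score)


print(decode(SOURCE, list(LANGUAGE_MODEL)))
```

The first two candidates have the same translation-model score here, so it is the language model that prefers the fluent word order.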

Page 48: Machine translator Introduction

Language Modeling

Page 49: Machine translator Introduction

Which N-Gram?

• A 1-gram (unigram) model is not very realistic

• A bigram model is more realistic; more realistic still is the trigram model

Problem

A vocabulary of 50,000 English words

gives 2.5 billion possible bigrams

Many bigrams have zero count in the corpus but may still be needed in translations

Page 50: Machine translator Introduction

linear interpolation
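A minimal sketch of linear interpolation, assuming a bigram model mixed with a unigram model under a single fixed weight. The toy corpus, the function names and the weight 0.7 are illustrative assumptions; in practice the weights are tuned on held-out data.

```python
from collections import Counter


def train(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def p_unigram(word, unigrams):
    return unigrams[word] / sum(unigrams.values())


def p_bigram_mle(word, prev, unigrams, bigrams):
    """Maximum-likelihood bigram estimate; zero for any unseen bigram."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def p_interpolated(word, prev, unigrams, bigrams, lam=0.7):
    """lam * P(word | prev) + (1 - lam) * P(word)."""
    return (lam * p_bigram_mle(word, prev, unigrams, bigrams)
            + (1 - lam) * p_unigram(word, unigrams))


corpus = ["the house is small", "the house is big", "a small house"]
unigrams, bigrams = train(corpus)

# "the small" never occurs in the corpus, so the MLE estimate is zero,
# while the interpolated estimate stays positive.
print(p_bigram_mle("small", "the", unigrams, bigrams))
print(p_interpolated("small", "the", unigrams, bigrams))
```

Falling back on the unigram estimate keeps a sentence from receiving probability zero just because one of its bigrams was never observed.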

Page 51: Machine translator Introduction

Translation Model

(i) a model of the sentence-aligned source–target training corpus

(ii) a method for computing the probability that S and T are equivalent using that model

Page 52: Machine translator Introduction

Translation Model Example

Page 53: Machine translator Introduction

Example

Page 54: Machine translator Introduction
Page 55: Machine translator Introduction
Page 56: Machine translator Introduction
Page 57: Machine translator Introduction

Word Alignment

Page 58: Machine translator Introduction

Simple Word Alignment

Page 59: Machine translator Introduction

Expectation-Maximisation (EM) algorithm

Page 60: Machine translator Introduction

Expectation-Maximisation (EM) algorithm
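As one concrete instance of EM for word alignment, the sketch below follows the well-known IBM Model 1 training loop. Whether the slides illustrated exactly this model is an assumption; the NULL word is omitted for brevity, and the toy German-English corpus and the function name are illustrative.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate t(f|e) from (foreign_tokens, english_tokens) sentence pairs."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation of t(f|e)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything

        # E-step: distribute each foreign word over the English words
        # of its sentence in proportion to the current t(f|e).
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction

        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    return t


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
for pair in [("das", "the"), ("Haus", "house"), ("Buch", "book"), ("ein", "a")]:
    print(pair, round(t[pair], 3))
```

No alignments are given in the training data; the co-occurrence pattern across sentences is enough for the probabilities to concentrate on the intuitive word pairs.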

Page 61: Machine translator Introduction
Page 62: Machine translator Introduction
Page 63: Machine translator Introduction
Page 64: Machine translator Introduction

MT Evaluation

• How can we measure MT quality?

• How can we compare MT engines?

• How can we measure progress in MT development?

Page 65: Machine translator Introduction

• Adequacy: Does the output convey the same meaning as the input sentence?

Is part of the message lost, added, or distorted?

• Fluency: Is the output good fluent English?

This involves both grammatical correctness and idiomatic word choices.

Page 66: Machine translator Introduction

What do We Expect from MT?

• adequacy & informativeness (preserve meaning)

• fluency & grammaticality (translation needs to be natural)

• acceptance (for its task)

Page 67: Machine translator Introduction

Task-specific evaluation

• browsing quality: Is the translation understandable in its context?

• post-editing quality: How many edit operations are required to turn it into a good translation?

• publishing quality: How many human interventions are necessary to make the entire document ready for printing?
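Post-editing effort can be approximated by counting word-level edit operations between the MT output and a corrected translation. The sketch below uses plain Levenshtein distance over words (insertions, deletions, substitutions); the function name is illustrative, and measures such as TER additionally count block shifts, which this sketch omits.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions."""
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j] = distance between the processed prefix of hyp and ref[:j]
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]
        for j, r in enumerate(ref, start=1):
            substitution = previous[j - 1] + (0 if h == r else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(word_edit_distance("the house small is", "the house is small"))
```

Dividing the count by the reference length gives a rate that can be compared across sentences of different lengths.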

Page 68: Machine translator Introduction

Evaluation is Difficult!

• What is the best translation? (language variation!)

• Subjective aspects (What is “fluent”? Clarity? Style?)

• What is “grammatical”?

• What is “adequate”? (Is it possible to be adequate?)

Page 69: Machine translator Introduction

MT evaluation

Manual Evaluation

• ask actual users to rate translations

• statistics over user responses

• separate evaluations of adequacy & fluency

• requires guidelines

• task-specific evaluation

Automatic Evaluation

• compare to reference translations

• approximations by measuring overlaps

• strong bias but useful for rapid development

Page 70: Machine translator Introduction

Fluency and Adequacy: Scales

Page 71: Machine translator Introduction
Page 72: Machine translator Introduction

Manual MT evaluation: What are the problems?

• need volunteers (every time we want to evaluate)

• expensive evaluation!

• subjective measures & disagreement between annotators

Page 73: Machine translator Introduction
Page 74: Machine translator Introduction
Page 75: Machine translator Introduction
Page 76: Machine translator Introduction
Page 77: Machine translator Introduction

Automatic Evaluation: BLEU-score

• introduced in 2002 by Papineni et al.

• desperately needed for rapid MT development

• quickly adopted by the statistical MT community

• created a boom in MT research/experiments

• many MT papers report only BLEU scores and don’t even look at the translations

Page 78: Machine translator Introduction

BLEU-score

The closer a machine translation is to a professional human translation, the better it is

Page 79: Machine translator Introduction

Definition

• Pn: the n-gram precision, computed for each pair of candidate and reference sentences.

• This score represents the proportion of n-word sequences (n-grams) in the candidate translation which also occur in the reference translation.
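The n-gram precision described above can be sketched as follows, assuming a single reference translation and whitespace tokenisation. Counts are clipped so that a candidate n-gram is credited at most as often as it occurs in the reference, as in the original BLEU proposal; the function names are illustrative. Full BLEU additionally combines several values of n with a geometric mean and applies a brevity penalty, which this sketch leaves out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all n-word sequences in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference (clipped)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "the cat is on the mat"
print(modified_precision("the cat sat on the mat", reference, 1))
print(modified_precision("the cat sat on the mat", reference, 2))
print(modified_precision("the the the the", reference, 1))
```

The last call shows why clipping matters: without it, a candidate that merely repeats a frequent reference word would reach a precision of 1.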

Page 80: Machine translator Introduction
Page 81: Machine translator Introduction

• Koehn, Philipp. Statistical machine translation. Cambridge University Press, 2009.

• Arnold, D., et al. Machine Translation: An Introductory Guide. NCC Blackwell, 1994.

• https://www.slideshare.net/rushdishams/types-of-machine-translation

• https://en.wikipedia.org/wiki/Machine_translation

• Brown, Peter F., et al. "A statistical approach to machine translation." Computational linguistics 16.2 (1990): 79-85.

• Hearne, Mary, and Andy Way. "Statistical machine translation: a guide for linguists and translators." Language and Linguistics Compass 5.5 (2011): 205-226.

Page 82: Machine translator Introduction

Questions?