machine translation dr. radhika mamidi. what is machine translation? a sub-field of computational...

24
Machine Translation Dr. Radhika Mamidi

Upload: percival-barton

Post on 27-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Machine Translation

Dr. Radhika Mamidi

What is Machine Translation?

• A sub-field of computational linguistics

• It investigates the use of computer software to translate text or speech in between natural languages

Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another.

Translation process

The translation process can be stated as:

• Decoding the meaning of the source text, and

• Re-encoding this meaning in the target language

=Analyze the source text for meaning and generate the meaning in target language.

Five types of knowledge used in the translation process:

• Knowledge of the source language, which allows us to understand the original text.

• Knowledge of the target language, which makes it possible to produce a coherent text in that language.

• Knowledge of equivalents between the source and target languages.

• Knowledge of the subject field as well as general knowledge, both of which aid comprehension of the source language text.

• Knowledge of socio-cultural aspects, that is, of the customs and conventions of the source and target cultures.

In other words, we need…

• in-depth knowledge of both the grammar, semantics, syntax, idioms and the like of the source language, as well as the culture of its speakers

• the same in-depth knowledge of target language is needed to re-encode the meaning in the target language

The challenge

• How to program a computer to "understand" a text as a human being does

• Also to "create" a new text in the source language that "sounds" as if it has been written by a human

Some issues• Languages have different word

orders/structures.• Some languages are morphologically

complex.• Some languages do not have determiners.• Identifying and finding equivalents of

idioms, collocations, phrasal verbs etc.• Lexical gaps• Lexical ambiguity• Structural ambiguity

History

• The Georgetown experiment in 1954 [after the second world war] involved fully automatic translation of more than sixty Russian sentences into English.

• The experiment was a great success and ushered in an era of significant funding for machine translation research.

• After the ALPAC report in 1966, which found that the ten years long research had failed to fulfill the expectations, the funding was dramatically reduced.

• ALPAC = Automatic Language Processing Advisory Committee

History (contd.)

• Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation.

• Today there are many software programs for translating natural language, several of them online, such as the SYSTRAN system which powers both Google translate and the AltaVista's Babelfish.

Compromise

• It would be absurd to claim that a machine could produce a target text of the same quality as that of a human being

• A machine can do the first stage of translation automatically and then human beings can revise and edit the translation.– Machine/Computer Aided Translation systems– Human Aided Machine Translation systems

• Currently, the product of machine translation is sometimes called a "gisting translation”.

• MT will often produce only a rough translation that will at best allow the reader to "get the gist" of the source text.

• It may not convey a complete understanding of it.

• The user may find the raw translation sufficiently useful as it is.

• Despite their inherent limitations, MT programs are currently used by various organisations around the world.

Very useful for…

• Translating user manuals

• Translating UN, EU documents

• Translating instructions

• Translating web pages

• Limited domain

Approaches

Source Text

Target Text

Direct Translation

Transfer

Interlingua

Analysis Generation

Methods1. Rule based:

• Words will be translated in a linguistic way — the most suitable words of the target language will replace the ones in the source language

• This method requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Methods

2. Corpus based:• Usually statistics is used to find the

equivalents in two languages from the parallel corpus for a given word, phrase or sentence.

• Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker.

Types of MT systems

1.Dictionary based MT

Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary does — word by word, usually without much correlation of meaning between them.

Types of MT systems

2. Statistical machine translation The machine translation tries to generate

translations using statistical methods based on bilingual text corpora

3. Example-based machine translation The EBMT approach is often characterised by its

use of a bilingual corpus as its main knowledge base, at run-time. It is essentially a translation by analogy

Evaluation

• Evaluating the MT system is essential.

• There are various methods for evaluating the performance of machine translation systems:– The oldest method is by using human judges

to tell the quality of a translation, – Automated methods include BLEU, NIST and

METEOR.

Online available MT software

• Alpha Works Emerging Technologies

http://www.alphaworks.ibm.com/aw.nsf/html/mt• Systran Language Translation Technologies

http://www.systransoft.com/index.html• Free Professional Translation

http://www.freetranslation.com/• 100 Links to Online Translators and Machine

Translation Software

http://www.bultra.com/mtlinks.htm

Problem -1

• Word Sense Disambiguation: Identifying the correct meaning of a polysemous word.

• Translating with the correct meaning of the word.

• Example: Telugu-English equivalents

perugu = yogurt [N], grow [V]

cheyyi = hand [N], do [V]

pallu = fruits [N], teeth [N]

Problem 2• Anaphora Resolution: Identifying the NP

the pronoun refers to.• Translating correctly the pronoun.

Examples:

Sam went to market. He met Harry. He gave him his telephone number.

John and Mary saw some cars. They were new models.

John and Mary saw some cars. They were excited.

Problem 3

Structural ambiguity: A sentence having at least two structures, each structure having a different meaning.

Examples: They are cooking apples. Flying planes can be dangerous. Susan wrote a letter to her daughter

from London.

Problem 4• Translating idioms, collocations, multi-word

expressions, frozen expressions, phrasal verbs etc.

Problem 5

• Translating verbalized nouns as in English

He nailed the picture on the wall = He hung the picture on the wall with a nail.

He axed him to death = He killed him with an axe.

Assignment 3Due date: 3rd June, 2009

Step 1: Choose an English text of five sentences[This text should contain at least some challenging problems that we

have discussed]

Step 2: Use Systrans/Google translate for translating this text into Arabic

Step 3:Use the translated output in Arabic as the input now.

Step 4: Translate this back into English

Is the original English text and the translated English text the same? Comment where the translation went wrong.

[Important: Your comments are very important. So don’t copy from each other.]