
  • 8/7/2019 JSeva-ODEP-PhD_PristupniRad_Automatic language translation

    1/13

    University of Zagreb

Faculty of Organization and Informatics Varaždin

    Automatic language translation

    Seminar paper

    Ph.D. studies

    Class: Selected Chapters in e-Commerce

Mentor: Neven Vrček, Ph.D.

Jurica Ševa

    Ph.D. candidate

    February 2011


    Table of Contents

1. Introduction (brief history) ........................................................ 3

2. Approaches .......................................................................... 4

Rule-based methods (aka knowledge-based machine translation) ............ 4

a) Direct translation ................................................................ 5

b) Transfer translation ............................................................. 5

c) Interlingua translation .......................................................... 6

Example-based methods ............................................................ 6

Statistical methods ................................................................. 7

a) Translational equivalence model .............................................. 8

b) Parameterization ................................................................ 9

c) Parameter estimation .......................................................... 10

d) Decoding ........................................................................ 10

3. Practical examples: Moses ..................................................... 11

4. Literature ........................................................................ 12


1. Introduction (brief history)

The problem of automatic language translation has been considered since the early days of modern science, starting with the 17th-century philosophers Leibniz and Descartes, who developed codes which related words between languages. These first steps were purely hypothetical and philosophical and never resulted in a machine that implemented them. The first practical progress (actual translating machines) was made in the early 1930s, but the truly important development came after the invention of computers and computing machines and was proposed by Warren Weaver at the Rockefeller Foundation, as presented in [1]. World War II and the progress made in the field of code breaking significantly helped the progress of automated translation, although the field itself was still at an early stage of development, with the first theoretical and practical implementations being put in place. These first efforts made it possible for machine translation to spread to universities across the US, and the first public demonstration followed in 1954 via IBM and Georgetown University [1]. During this period machine translation research spread from the US to the rest of the world, with Russian, Chinese and British academia making valuable contributions to the field. A significant step forward was made when the argument of semantic ambiguity (double meaning) was presented. As stated in the example:

    Little John was looking for his toy box. Finally he found it. The box was in the pen,

the word pen has a double meaning (either an object to write with or a container). This led to the founding of ALPAC (the Automatic Language Processing Advisory Committee), which concluded that machine translation was slower, less accurate and twice as expensive as human translation and that there was no immediate or predictable prospect of useful machine translation [1]. The conclusion of this report can be viewed as the death of machine translation as it was conducted until then, and it stopped machine translation research in the US for more than a decade, although there were successful systems implemented in this period around the world (e.g. Systran (USAF), Meteo (Montreal University)). Following these successful implementations, the 80s showed a significant number of machine translation systems from a wide variety of countries (mostly owing to the wide availability of microcomputers and text processing software and the drop in the price of computation; the conclusions of ALPAC were without merit by that point). The most notable systems were GETA-Ariane (Grenoble), SUSY (Saarbrücken), Mu (Kyoto), DLT (Utrecht), Rosetta (Eindhoven), the knowledge-based project at Carnegie Mellon University (Pittsburgh), and two international multilingual projects: Eurotra, supported by the European Communities, and the Japanese CICC project with participants in China, Indonesia and Thailand [2]. A major turning point in the field of machine translation were the 90s, with IBM's Candide system (based on statistics; more about the methods of machine translation follows in the next chapter); this period was also notable for the first corpus-based translations. Still, both approaches ignored semantic and syntactic rules in the analysis of text and instead proposed alternatives to the available rule-based methods for large-scale text exploitation. The major projects of this era, as mentioned in [2], are ATR (Nara, Japan), JANUS (ATR, Carnegie Mellon University, University of Karlsruhe) and the Verbmobil project (Germany). Another important shift during this period was the transition from pure research projects to practical applications. The 00s positioned machine translation as a mass-market product, especially with the development of the Web, which made online translation services available (AltaVista, Google Translate, etc.). For the reader interested in the history, development, different approaches and current state of the field, a more detailed history of machine translation is presented in [2].

It is now widely accepted that global communications must be accessible and transferable, in a timely manner, in as many languages as feasible. Given that any field in which human beings are actively involved requires knowledge of other fields, machine translation, having a history almost as old as the modern digital computer, emerged as an attempt to overcome the intricacy of


'being informed' in a group of efforts to sustain communication. In doing this, machine translation, much advanced since then, is a key aid for the human translator, although not without its problems. For some, machine translation applications have long been challenging human translators; for others, despite machine translation researchers' arguments, it cannot aim at replacing the human mind. Moreover, machine translation designers, taking a simplistic view of language translation, have also long been searching for an idealistic key to a universal foundation for all natural languages. For instance, [3] offers a comprehensive assessment of the issues behind machine translation and of popular misconceptions. For its authors, the types of knowledge an automated translation system should have are: a) linguistic knowledge independent of context (semantics), b) linguistic knowledge that relates to context (pragmatics), and c) common sense / real-world knowledge (non-linguistic knowledge).

2. Approaches

There are several ways of approaching the problem of automatic translation between different languages. They are listed and briefly presented in this chapter; a brief history of the development of machine translation techniques was presented in the introductory chapter. To achieve a complete and full language translation, one needs to understand and use methods based on linguistic rules, where the words in the source language are replaced by the most suitable words in the target language; it is argued that the problem of natural language understanding has to be solved first. In general, the translation process can be summarized as follows:

1. Decoding the source text
2. Recoding the meaning of the source text in the target language

Picture 1. Machine translation triangle [4]

Although it looks like a simple process, accurate translation is a complicated cognitive operation.

A more detailed explanation of each model follows in the subsequent subchapters. A general graphical overview is presented in Picture 1.

Rule-based methods (aka knowledge-based machine translation)

When dealing with rule-based translation systems, they can be categorized into 3 main groups:

• Direct translation, the first machine translation approach, trying to simulate the work of real-world translators


• Transfer translation, utilizing an intermediate language in translation
• Interlingua translation, utilizing a proper second-level translation language

a) Direct translation

Direct translation systems have been developed since 1965 and were, like many machine translation efforts of that period, focused on English-Russian translation. In a practical sense, these systems are the oldest and least popular approach to automatic language translation. They are primarily focused on one target language, are bilingual and uni-directional, and need only a little syntactic and/or semantic analysis. Examples of systems based on direct translation are the SYSTRAN system and the PAHO system. The main characteristic of direct translation systems is that they rely on a large set of language-pair-dependent rules to carry out the translation of a text [5]. They are based on rules that take separate grammatical and lexical phenomena of the source language (SL) and their realizations in the target language (TL) and put the two in correspondence [5], as presented in Picture 2. In general, they try to directly map the source language to the target language, which makes them highly dependent on both the source and target languages. In practice this means that we need to develop a new system for every new language pair.

Picture 2. Direct translation system [5]
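As a sketch of how such language-pair-dependent rules operate, the following toy applies lexical substitution plus one local rule, with no deeper analysis; the three-word lexicon and the reordering rule are invented for illustration, not taken from any real system:

```python
# Toy sketch of a direct translation system: a bilingual dictionary plus a few
# language-pair-specific rules. The lexicon and the adjective-noun reordering
# rule are illustrative only.

LEXICON = {"the": "la", "blue": "bleue", "car": "voiture"}

def direct_translate(sentence):
    words = sentence.lower().split()
    # Rule 1: lexical substitution via the language-pair dictionary.
    out = [LEXICON.get(w, w) for w in words]
    # Rule 2: a local reordering rule (English adjective-noun -> noun-adjective).
    i = 0
    while i < len(words) - 1:
        if words[i] == "blue" and words[i + 1] == "car":
            out[i], out[i + 1] = out[i + 1], out[i]
        i += 1
    return " ".join(out)

print(direct_translate("the blue car"))  # -> la voiture bleue
```

Note how both the lexicon and the rule are tied to one language pair and one direction, which is exactly why a new system is needed for every new pair.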

b) Transfer translation

Unlike direct translation systems, which ignore the syntactic/semantic structure of the source and target language, transfer translation systems use a syntactic, target-language-independent analysis of the source language, which allows substituting source-language lexical units with target-language lexical units in context [5], providing a more accurate translation than direct translation systems. In practice this allows taking into account the syntactic structure of a sentence and of the parts in which lexical units appear. It relies on the idea that direct translation between languages is not feasible and that a successful translation requires the existence of an intermediate language representation that captures the (syntactic/semantic) meaning of the sentence in the original language and allows for a successful translation into the target language. In transfer translation systems the intermediate representation depends on the language pair involved in the actual translation. The general scheme of the translation follows the same steps in all transfer translation systems (although the actual translation varies from implementation to implementation):

• Analysis of the input text for morphology and syntax
• Creation of an internal representation
• Generation of the translation from the internal representation (based on bilingual dictionaries and grammatical rules)

If during the translation only the syntactic structures are transferred, then we are dealing with a superficial transfer; if we also take the semantics into account, then we are dealing with a deep transfer. This approach, including an intermediate language or interlingua, is the most widely used method of dealing with machine translation.
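The analysis-transfer-generation scheme can be sketched as a small pipeline; the flat "parse", the transfer dictionary and the generation rule below are invented toys, not any real system's design:

```python
# Minimal sketch of a transfer translation pipeline: analyse the source
# sentence into an internal representation, transfer lexical units within it,
# then generate the target sentence. All structures here are illustrative.

def analyse(sentence):
    # Source-language analysis: a flat "parse" into subject/verb/object roles.
    subj, verb, obj = sentence.split()
    return {"subj": subj, "verb": verb, "obj": obj}

def transfer(tree):
    # Transfer step: substitute lexical units inside the structure.
    en_de = {"John": "Johann", "reads": "liest", "books": "Buecher"}
    return {role: en_de[word] for role, word in tree.items()}

def generate(tree):
    # Target-language generation from the internal representation.
    return " ".join([tree["subj"], tree["verb"], tree["obj"]])

print(generate(transfer(analyse("John reads books"))))  # -> Johann liest Buecher
```

The transfer dictionary is still language-pair-dependent, but the analysis and generation steps are reusable, which distinguishes this design from direct translation.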


    Picture 3. Transfer translation system

Examples of such systems are TAUM, developed by the University of Montreal; METEO (a practical demonstration by the University of Montreal); AVIATION; SUSY, developed by the University of the Saarland; and GETA, developed by the University of Grenoble, among others.

c) Interlingua translation

Just as with transfer translation systems, interlingua translation systems employ an intermediate translation language as a bond between the source language and the target language. With this approach the interlingua is an abstract, language-independent representation of the source language. This intermediate abstract language allows for a more precise translation between any two languages and needs fewer components to do it. Besides that, it makes it easier to add a new language and allows mono-language system developers to participate in active translation efforts, as it serves as an abstract representation of a language. The mappings between languages are more accurate and easier to maintain this way, and the interlingua provides a universal language that would potentially allow a one-to-one translation between any two language pairs. There are a few disadvantages to this approach, the main one being the difficulty of creating an interlingua that is adequate. The difficulty comes from the requirement that the interlingua be both abstract and independent from both the source and target languages. The translation process follows the same steps:

• Analysis of the source language text into the interlingua
• Generation of the target language text from the interlingua

    Picture 4. Interlingua MT system architecture [4]

The generated interlingua is called a pivot or bridge language, and it can be either an artificial language or a natural language that serves as an intermediary for translation between two or more languages. A pivot language is an independent language and, if constructed correctly, it allows translation between any pair of languages that share the same intermediary representation. Using this way of translation we escape the explosion of the number of possible translation combinations.
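The escape from the combinatorial explosion is easy to quantify: with n languages, pairwise direct translation needs a separate system for every ordered pair, while a pivot needs only one analyser and one generator per language. A minimal sketch:

```python
# With n languages, pairwise direct translation needs a system for every
# ordered language pair, n*(n-1), while a pivot (interlingua) needs only one
# analyser and one generator per language, 2*n.

def direct_systems(n):
    return n * (n - 1)

def pivot_systems(n):
    return 2 * n

for n in (3, 10, 23):
    print(n, direct_systems(n), pivot_systems(n))
# For n = 23 languages: 506 direct systems vs. 46 components with a pivot.
```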

    Example-based methods

The example-based method was first presented in [6] and is based on the concept of analogy. As mentioned in [6], the main proposed ideas about translation are:

1. Man does not translate a simple sentence by doing deep linguistic analysis; rather,


2. Man does the translation, first, by properly decomposing an input sentence into certain fragmental phrases (very often, into case frame units), then, by translating these fragmental phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase is done by the analogy translation principle, with proper examples as its reference.

This approach, without modifications, is very time consuming. By dividing the entire process into sub-stages and providing the needed information at each stage, the entire process becomes less complicated. The original paper [6] identified the following sub-stages:

1. Reduction of redundant expressions and supplementing of eliminated expressions in an input sentence, to obtain an essential sentential structure.
2. Analysis of the sentential structure by case grammar.
3. Retrieval of target language words and of example phrases stored in the word entries of the dictionary.
4. Recognition of the similarity between the input sentential phrases and the example phrases in the dictionary. A word thesaurus is used for the similarity finding.
5. Choice of a global sentential form for the translation.
6. Choice of a local phrase structure, determined by the requirements of the global sentential structure.

Although this provided a novel approach at the time, [7] states that example-based machine translation is much less clearly defined. A proper definition of this method can be given by contrasting it with rule-based and statistical machine translation methods, as they have much in common. As mentioned in [7], the closest parallels between EBMT and RBMT are found when systems use structural transformations (in analysis, matching, extraction, recombination/synthesis), but they are present also whenever individual SL lexical items are substituted by individual TL lexical items.
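The similarity-based retrieval at the heart of EBMT (sub-stage 4 above) can be sketched as follows; a real system would measure similarity with a word thesaurus, so plain string similarity stands in for it here, and the tiny example base is invented:

```python
# Sketch of the EBMT matching step: given an input phrase, retrieve the most
# similar example phrase from a (tiny, invented) bilingual example base.
from difflib import SequenceMatcher

EXAMPLES = {
    "he buys a book": "er kauft ein Buch",
    "she reads a letter": "sie liest einen Brief",
}

def most_similar_example(phrase):
    def score(example):
        return SequenceMatcher(None, phrase, example).ratio()
    best = max(EXAMPLES, key=score)
    return best, EXAMPLES[best]

src, tgt = most_similar_example("he buys a notebook")
print(src, "->", tgt)  # -> he buys a book -> er kauft ein Buch
```

The retrieved pair would then serve as the analogy reference when composing the final translation.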

Statistical methods

Statistical machine translation as a research area started in the late 1980s with the Candide project at IBM. IBM's original approach maps individual words to words and allows for the deletion and insertion of words. The overall approach generates translations based on statistical models whose parameters are derived from the analysis of bilingual text corpora. Statistical machine translation treats translation as a machine learning problem, meaning that a learning algorithm is applied to a large body of previously translated text, called a parallel corpus. According to [8], the interest in statistical machine translation can be attributed to 1) the growth of the Internet, which provided the same resources in several languages, 2) parties interested in the assimilation of non-native-language information, 3) fast and cheap computing hardware, making it cheaper and easier to do large-scale computing, 4) the development of automatic translation metrics, and 5) freely available statistical machine translation toolkits. When using the statistical approach to machine translation, the task can be defined as transforming a sequence of tokens in the source language with vocabulary V_F into a sequence of tokens in the target language with vocabulary V_E [8], under the assumption that all data are preprocessed consistently. The goal of the translation system is to find a target sentence e that is translationally equivalent to the source sentence f (f ∈ V_F, e ∈ V_E). To solve this task there are 4 main issues that need to be taken care of (as defined in [8]):

1. Describing the steps that transform a source sentence into a target sentence: the translational equivalence model


2. Enabling the model to make good choices when dealing with ambiguity: parameterization
3. Associating values with the parameters defined in step 2: parameter estimation
4. Searching for the highest-scoring translation: decoding

a) Translational equivalence model

In the first step, the objective is to create a model (a set of rules) that the translation system will use in translating the source sentence into the target sentence; these rules are usually extracted from the corpus. Translational equivalence models allow us to enumerate possible structural relationships between pairs of strings. There are various types of these models, the most notable being finite-state transducer models and synchronous context-free grammars [8].

Finite-state transducer models are an extension of finite-state automata, defined by a set of states S, labels L and transitions D, where a transition is a pair of states plus the label that must be output when moving from one state to the next. These transducers can be applied to word-based as well as phrase-based models. Word-based models were the first statistical translation models; the interested reader can read more about them in [9], [10] and [11]. They consist of 3 steps, as presented in [8]:

a. Each target word chooses the number of source words that it will generate (its fertility number)
b. Each copy of each target word produces a single source word
c. The translated words are permuted into their final order

In this model each word is considered a separate entity and is approached in translation in that fashion. The other approach, phrase-based models, looks at a continuous sequence of words and translates the sequence as an entity; in this respect it is more similar to human translation. Just like with word-based models, there are three steps in applying these models, also found in [8]:

a. Splitting sentences into phrases
b. Translating each phrase
c. Permuting the translated phrases into their final order

    The interested reader is directed to [12].
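The three phrase-based steps can be sketched with an invented phrase table; the final permutation, which a real model would choose by score, is supplied by hand here:

```python
# Toy sketch of the three phrase-based steps: split the sentence into phrases,
# translate each phrase, and permute the translations into their final order.
# The phrase table and the reordering are invented for illustration.

PHRASE_TABLE = {("of", "course"): ("natuerlich",),
                ("he",): ("er",),
                ("comes",): ("kommt",)}

def split_into_phrases(words):
    # Greedy split: prefer the longest phrase known to the phrase table.
    phrases, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and tuple(words[i:i + 2]) in PHRASE_TABLE:
            phrases.append(tuple(words[i:i + 2])); i += 2
        else:
            phrases.append(tuple(words[i:i + 1])); i += 1
    return phrases

def translate_phrases(phrases):
    return [PHRASE_TABLE[p] for p in phrases]

def permute(translated, order):
    # A real model would search for this permutation; here it is given.
    return [translated[i] for i in order]

phrases = split_into_phrases("of course he comes".split())
out = permute(translate_phrases(phrases), [0, 2, 1])  # German verb-second order
print(" ".join(w for p in out for w in p))  # -> natuerlich kommt er
```

Treating "of course" as one phrase is exactly what the word-based model cannot do, which is why phrase-based models come closer to human translation.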

    Picture 5. CFG vs SCFG


Synchronous context-free grammars, or syntax-directed translation, as the name implies, extend the FST approach with some level of incorporation of a linguistic representation of syntax. They are based on context-free grammars (developed by Noam Chomsky in the 1950s), which naturally generate a formal language that has a non-overlapping grammar and can have nested clauses arbitrarily deep. An example of a structured sentence is:

    John, whose blue car was in the garage, walked to the green store. ->

    (John, ((whose blue car) (was (in the garage)))), (walked (to (the green store))).

A synchronous context-free grammar is a generalization of context-free grammars to the case of two output strings, where it defines a correspondence between strings via the use of co-indexed non-terminals. An example is provided in Picture 5. One reason for using SCFGs is the efficient expression of reordering. The reviewed literature mentions three applications of SCFGs in statistical machine translation:

• Bracketing grammars
• Syntax-based translation
• Hierarchical phrase-based translation
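A minimal sketch of how co-indexed non-terminals express reordering, using an invented two-level toy grammar modeled on English adjective-noun versus French noun-adjective order:

```python
# Sketch of a synchronous context-free grammar rule with co-indexed
# non-terminals: one derivation yields a pair of strings, and the indices say
# how the target side reorders the source side. The grammar is an invented toy.

# Rule: NP -> <ADJ[1] N[2], N[2] ADJ[1]>  (source order, target order)
RULES = {
    "NP": (["ADJ", "N"], [1, 0]),   # target side reorders the source children
    "ADJ": ("green", "verte"),      # lexical rule: (source word, target word)
    "N": ("house", "maison"),
}

def derive(symbol):
    rhs = RULES[symbol]
    if isinstance(rhs[0], str):          # lexical rule
        return rhs
    children = [derive(c) for c in rhs[0]]
    src = " ".join(c[0] for c in children)
    tgt = " ".join(children[i][1] for i in rhs[1])  # co-indexed reordering
    return src, tgt

print(derive("NP"))  # -> ('green house', 'maison verte')
```

A single rule thus produces both strings at once, with the reordering expressed declaratively in the rule rather than by a separate permutation step.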

b) Parameterization

This is the second step, along with creating a translational equivalence model, in creating a fully functional model for statistical translation. It addresses the problem of the number of possible reordering combinations in the translation process and presents a method of choosing among all possible translations of a single source sentence. Because the number of possible translations grows exponentially, one way of providing the most accurate translation (and hopefully the best possible one) is to assign a score to every pair of source and target sentences (similar to what is done elsewhere in machine learning; more about the subject and the underlying mathematics can be found in [13]). This step, as mentioned, deals with assigning the best output y ∈ Y to an input x ∈ X by using a function f: X × Y → R that maps input and output pairs to a real-valued score used to rank possible outputs [8]. These mappings are founded on probability theory, and they can be either joint models, where the assigned value is a joint probability, or conditional models, where one of the variables (y) is fixed and the probability is the probability of the assignment of the other variable (x). There are a number of possible models for dealing with the value assignment, among others:

• Generative models

Generative models are based on Bayes' rule and can be defined as

P(e, d | f) = P(e) · P(f, d | e) / P(f)

[8], where P(f) is a constant (it depends only on the input sentence), P(e) is the language model and P(f, d | e) is the translation model. This approach was first proposed with IBM Model 4, and it looks at how well a possible translation fits the source sentence. Because of this, these models are useful in decoding, as they correspond closely to the translational equivalence models.

• Discriminative models


Discriminative models are used to bring additional context into modeling. These models, too, are influenced by developments in the field of machine learning. One example is log-linear modeling (the form that most statistical translation models use), which defines a relationship between a set of K fixed features h_k(e, d, f) of the data and the function P(e, d | f) that we are interested in [8], with a feature defined as any function that maps every pair of input and output strings to a non-negative value.
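A sketch of a log-linear score with K = 2 features, normalized over a small candidate set; both features, their weights and the candidates are invented for illustration:

```python
# Sketch of a log-linear model over K fixed features h_k(e, f): the score is
# exp(sum of weighted features), normalized over the candidate set.
import math

def h_length_ratio(e, f):   # feature 1: penalize length mismatch
    return -abs(len(e.split()) - len(f.split()))

def h_tm(e, f):             # feature 2: stand-in for a translation-model score
    return 1.0 if (f, e) == ("das Haus", "the house") else 0.0

WEIGHTS = [0.5, 2.0]

def loglinear_prob(e, f, candidates):
    def score(cand):
        return math.exp(WEIGHTS[0] * h_length_ratio(cand, f)
                        + WEIGHTS[1] * h_tm(cand, f))
    return score(e) / sum(score(c) for c in candidates)

cands = ["the house", "house", "the house is"]
print(round(loglinear_prob("the house", "das Haus", cands), 3))  # -> 0.859
```

The appeal of this form is that arbitrary features can be added without changing the machinery; only the weights need to be re-tuned.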

c) Parameter estimation

After acquiring all possible parameters, this step assigns values to them based on a given sample of input/output pairs. A detailed description of this process can be found in [14]. Generative models and discriminative models approach this stage in different manners.
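For generative models, one common estimator is relative frequency over aligned data, e.g. t(f | e) = count(e, f) / count(e); a sketch with an invented word-aligned sample:

```python
# Sketch of relative-frequency parameter estimation for a generative model:
# translation probabilities are estimated as t(f|e) = count(e, f) / count(e)
# over word-aligned data. The tiny aligned sample is invented.
from collections import Counter

ALIGNED_PAIRS = [("house", "Haus"), ("house", "Haus"),
                 ("house", "Gebaeude"), ("the", "das")]

pair_count = Counter(ALIGNED_PAIRS)
e_count = Counter(e for e, _ in ALIGNED_PAIRS)

def t(f, e):
    return pair_count[(e, f)] / e_count[e]

print(t("Haus", "house"))      # -> 0.6666666666666666
print(t("Gebaeude", "house"))  # -> 0.3333333333333333
```

In practice the alignments themselves are hidden and must be estimated jointly with the probabilities (via EM, as described in [14]); this sketch assumes the alignments are already given.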

d) Decoding

The last step in creating a statistical machine translation system is the actual decoding, where, based on the model and parameters defined in the previous steps, a new input sentence is translated into the target language. This step can be explained as a maximization of the probability function P(e, d | f) mentioned above. Since there are two types of translational equivalence models, we can deduce two types of decoding: finite-state transducer (FST) decoding and synchronous context-free grammar decoding; they are presented next to conclude the chapter on statistical machine translation.

In FST decoding, search proceeds through a directed acyclic graph of states representing partial or completed translation hypotheses [8], as presented in Picture 6.

    Picture 6. FST search

The search space of possible translations is big, and the number of potential combinations that would translate the source sentence increases with every single cycle. As mentioned in [8], each state consists of 4 elements:

• A coverage set C ⊆ {1, 2, ..., J} enumerating the positions of the source string f that have already been translated

• If an n-gram language model is used, the n−1 most recently generated target words, kept for computing the n-gram language-model component of the probability

• The cost h of the partial hypothesis, computed as the combination of the model costs associated with the hypothesis

• The estimated cost g of completing the partial hypothesis, computed heuristically (usually the single best word-to-word (phrase-to-phrase) cost is used).
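The four state elements can be written down directly; a minimal sketch with illustrative values:

```python
# Sketch of the four elements of a partial-hypothesis state in FST decoding:
# coverage set, language-model context, accumulated cost h, and heuristic
# completion estimate g. The concrete values are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    coverage: frozenset        # positions of f already translated, C subset of {1..J}
    lm_context: tuple          # n-1 most recent target words for the n-gram LM
    h: float                   # cost of the partial hypothesis so far
    g: float                   # heuristic estimate of the completion cost

    def total_cost(self):
        # Hypotheses are compared by h + g, as in best-first search.
        return self.h + self.g

hyp = Hypothesis(coverage=frozenset({1, 2}), lm_context=("the",), h=2.5, g=1.0)
print(hyp.total_cost())  # -> 3.5
```

Making the state hashable (frozen, with a frozenset coverage) matters in a real decoder, where hypotheses sharing the same coverage and LM context can be recombined.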


The models are similar to the theoretical models presented in the previous section: each sentence is segmented into sequences, called phrases, each phrase is translated into the target language, and the phrases are reordered after the actual translation to achieve the proper word sequence in the target language. Moses offers the use of both phrase-based and tree-based translation models. To create the phrase translation table, Moses uses a technique called word alignment, presented in Picture 8, where two language comparison tables are generated (word mappings from language A to language B and vice versa). Using these two tables, a joint intersection table is generated to get a high-precision alignment of high-confidence alignment points. There are several ways of achieving the actual word alignment, the most common being the toolkit GIZA++, which is an implementation of the original IBM Models that started the entire field of statistical translation. Decoding in Moses implements a beam search with its components (pruning, future cost estimation). An example of possible translations is given in Picture 9.
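The intersection step described above (keeping only the alignment points on which both directional alignments agree) can be sketched directly; the two alignment sets are invented, and both are normalized to (index in A, index in B) pairs:

```python
# Sketch of the alignment-intersection step: run word alignment in both
# directions (A->B and B->A), normalize both to (A-index, B-index) pairs, and
# keep only the points on which both agree, yielding a high-precision set.
# The alignment sets are invented toys.

a_to_b = {(0, 0), (1, 2), (2, 1), (3, 3)}
b_to_a = {(0, 0), (1, 2), (3, 3), (3, 4)}   # already normalized to (A, B)

def intersect(forward, backward):
    return forward & backward

print(sorted(intersect(a_to_b, b_to_a)))  # -> [(0, 0), (1, 2), (3, 3)]
```

The intersection trades recall for precision; alignment heuristics then typically grow this seed set back toward the union.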

Picture 9. Translation options

These translation options are collected before any decoding takes place. This allows a quicker lookup than consulting the whole phrase translation table during decoding. The translation options are stored with information about the first foreign word covered, the last foreign word covered, the English phrase translation and the phrase translation probability. The decoder is based on a beam search algorithm, a heuristic search algorithm that focuses on the most promising nodes of a limited set; it is an optimization of best-first search (a graph search) that predicts how close the current partial solution is to an overall solution. More about beam search can be found in [17] and [18]. Moses also uses an addition to the general model that allows the use of linguistic information (morphological, syntactic or semantic) through factored translation models. The use of this kind of additional information is desirable for two reasons, as stated in [15]:

• Translation models that operate on more general representations, such as lemmas instead of surface forms of words, can draw on richer statistics and overcome the data sparseness problems caused by limited training data.

• Many aspects of translation can best be explained on a morphological, syntactic, or semantic level. Having such information available to the translation model allows the direct modeling of these aspects. For instance, reordering at the sentence level is mostly driven by general syntactic principles, local agreement constraints show up in morphology, etc.

4. Literature

    [1] J. Hutchins, The history of machine translation in a nutshell, 2005.

[2] W.J. Hutchins, Machine translation: past, present, future, John Wiley & Sons, Inc., 1986.

[3] D. Arnold, L. Balkan, S. Meijer, R.L. Humphreys, and L. Sadler, Machine Translation: An Introductory Guide, NCC Blackwell, London, 1994.


[4] B. Dorr and E. Hovy, Machine Translation: Interlingual Methods, in: Encyclopedia of Language and Linguistics (ELL2), Citeseer, pp. 1-20.

[5] S. Nirenburg, Machine translation: a knowledge-based approach, Machine Translation, vol. 4, 1992, pp. 5-24.

[6] M. Nagao, A framework of a mechanical translation between Japanese and English by analogy principle, Artificial and Human Intelligence, 1984.

[7] J. Hutchins, Example-based machine translation: A review and commentary, Machine Translation, vol. 19, 2005, pp. 197-211.

[8] A. Lopez, Statistical machine translation, ACM Computing Surveys, vol. 40, Aug. 2008, pp. 1-49.

[9] P.F. Brown, J. Cocke, S.A. Della Pietra, V.J. Della Pietra, F. Jelinek, J.D. Lafferty, R.L. Mercer, and P.S. Roossin, A statistical approach to machine translation, Computational Linguistics, vol. 16, no. 2, 1990.

[10] P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer, The mathematics of statistical machine translation: Parameter estimation, Computational Linguistics, vol. 19, no. 2, 1993, pp. 263-311.

[11] A.L. Berger, P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, J.R. Gillett, J.D. Lafferty, R.L. Mercer, H. Printz, and L. Ures, The Candide system for machine translation, Proceedings of the ARPA Workshop on Human Language Technology, 1994, pp. 157-162.

[12] S. Kumar, Y. Deng, and W. Byrne, A weighted finite state transducer translation template model for statistical machine translation, Natural Language Engineering, vol. 12, no. 1, 2006, pp. 35-75.

    [13] T.M. MITCHELL, Machine Learning, McGraw-Hill, 1997.

[14] P.F. Brown, V.J. Della Pietra, S.A. Della Pietra, and R.L. Mercer, The mathematics of statistical machine translation: Parameter estimation, Computational Linguistics, vol. 19, 1993, pp. 263-311.

    [15] Moses - statistical machine translation system.

[16] P. Koehn, F.J. Och, and D. Marcu, Statistical phrase-based translation, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, Morristown, NJ, USA: Association for Computational Linguistics, 2003, pp. 48-54.

[17] P. Koehn, Pharaoh: a beam search decoder for phrase-based statistical machine translation models, Machine Translation: From Real Users to Research, 2004, pp. 115-124.

[18] V. Steinbiss, B.H. Tran, and H. Ney, Improvements in beam search, Third International Conference on Spoken Language Processing, 1994.