
Page 1: Shiva and Shakti English-Hindi Machine Translation Systemsweb.iiit.ac.in/~papi_reddy/test.pdf · Shiva and Shakti English-Hindi Machine Translation Systems T.Papi Reddy papireddy@students.iiit.net

Shiva and Shakti English-Hindi Machine Translation Systems

T. Papi Reddy

papireddy@students.iiit.net

International Institute of Information Technology, Hyderabad, AP

conference name – p.1/30


Introduction

A Brief History

Different kinds of approaches

Shiva

Shakti

Problems in MT

Lexical Resources needed for MT

Conclusion


A Brief History

Beginning (1950s)

Dark period (1964 - 1979)

Re-Birth (1979)

A new paradigm (1990)

MT in India


Beginning (1950s)

Very optimistic attitude towards MT

Very high expectations

Mainly from English to Russian and vice versa

Successful demonstration of an MT system with restricted vocabulary and grammar by Georgetown University and IBM in 1954

Attracted huge funding and inspired many other MT projects

Scaling up the restricted vocabulary and grammar did not produce satisfactory results (in fact, it degraded the system's performance)


Dark period (1964 - 1979)

Report of the Automatic Language Processing Advisory Committee (ALPAC), 1966

MT neither possible nor practical in the near future! :(

Funding for MT stopped in the US

However, research continued in Canada and a few other places.


Re-Birth (1979)

Installation of the TAUM-Meteo system for translating weather bulletins from English to French in Canada in 1979

Limited subject domain produced good translations :)

Japanese MT projects, e.g. the Mu project at Kyoto University (1982)

Eurotra MT project in Europe, covering many European languages

Demand came from different sources


New Paradigm (1990)

Candide: A purely statistical MT system from IBM

Use of corpus-based approaches in Japanese MT systems

Focus shifted from pure research to practical applications

By the end of the 1990s, huge growth in the use of MT systems for different purposes


MT in India

MT was taken seriously only in the 1990s

Anusaaraka (language accessor) among Indian languages, built at IITK and University of Hyderabad

English to Hindi MT systems:
NCST
CDAC
IITK
SHIVA - CMU, IISc-B, LTRC IIIT
SHAKTI - LTRC IIIT


Different approaches

Rule-based : A linguist or language expert supplies the rules needed to translate the source language into the target language

Corpus-based : The machine tries to learn the rules needed for translation from a parallel corpus. Two variants:
Statistical
Example-based (Ex: Shiva)

Hybrid approach : A combination of both of the above approaches (Ex: Shakti)


Shiva

Example-based

Requires huge amounts of parallel corpora

Learns by associating word sequences in English with word sequences in Hindi

English-Hindi parallel corpora not readily available! :(

Controlled corpora: phrasal and sentence-level translations
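The idea of associating English word sequences with Hindi word sequences can be sketched as a phrase table with greedy longest-match lookup. This is only an illustration and not Shiva's actual algorithm; the phrase pairs, the romanized Hindi strings, and the function names are all assumed for the example.

```python
def build_phrase_table(pairs):
    """Map each English word sequence (tuple of lowercased words) to a Hindi phrase."""
    return {tuple(en.lower().split()): hi for en, hi in pairs}


def translate(sentence, table):
    """Cover the sentence left to right with the longest known English phrases."""
    words = sentence.lower().split()
    longest = max((len(key) for key in table), default=1)
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate window first, then shrink it.
        for size in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in table:
                output.append(table[key])
                i += size
                break
        else:
            # No known phrase starts here: pass the word through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


# Hypothetical phrase-level pairs, as a small controlled corpus might provide.
PAIRS = [
    ("the girl", "ladkii"),
    ("in the garden", "bagiiche mein"),
]
TABLE = build_phrase_table(PAIRS)
print(translate("the girl in the garden", TABLE))  # ladkii bagiiche mein
```

With so few pairs, most input falls through to the pass-through branch, which is one way to see why the approach needs large amounts of parallel text.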


Shakti

Hybrid system

Uses both the rule-based and corpus-based approaches

In addition to the rules provided by the language expert, it uses corpus-based techniques in translation


Architecture

Source language (English) dependent modules

English sentence -> English sentence analysis -> parsed output -> WSD -> senses marked


Architecture

Bilingual modules

parsed output -> Transfer grammar -> reordered output -> English-Hindi dictionary lookup -> Hindi words substituted -> TAM lookup -> Hindi TAM substituted


Architecture

Target language (Hindi) sentence synthesis

representation in Hindi -> Vibhakti insertion -> vibhaktis inserted -> Agreement -> Word form generation -> appropriate word forms generated -> Hindi sentence
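The three Architecture slides describe one chain of modules, each consuming the previous module's output. A minimal sketch of that chaining follows; the stage names are taken from the slides, but every stage here is a placeholder that only records its name, and the state dictionary is an assumption of the example.

```python
def make_stage(name):
    """Return a placeholder stage that only appends its name to the trace."""
    def stage(state):
        return {**state, "trace": state["trace"] + [name]}
    return stage


# Module order as read from the Architecture slides.
STAGE_NAMES = [
    "english_sentence_analysis",  # source-language dependent
    "wsd",
    "transfer_grammar",           # bilingual
    "dictionary_lookup",
    "tam_lookup",
    "vibhakti_insertion",         # Hindi sentence synthesis
    "agreement",
    "word_form_generation",
]
STAGES = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(sentence, stages=STAGES):
    """Thread a state dictionary through every stage in order."""
    state = {"sentence": sentence, "trace": []}
    for stage in stages:
        state = stage(state)
    return state


print(run_pipeline("Ram is going home")["trace"])
```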


English Sentence Analysis

Morphological analyzer:
root and other features of each word
more than one analysis possible
went (root : go, lex_cat : v)
saw (root : see, lex_cat : v) or saw (root : saw, lex_cat : v)

Part-of-speech (POS) tagger:
single part-of-speech tag for each word
uses a statistical language model
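A morphological analyzer that may return more than one analysis per word can be pictured as a lookup returning a list. The sketch below uses only the two words from the slide; the lexicon structure, the function name, and the "unk" category for unknown words are assumptions.

```python
# Toy lexicon holding only the analyses shown on the slide.
LEXICON = {
    "went": [{"root": "go", "lex_cat": "v"}],
    "saw": [
        {"root": "see", "lex_cat": "v"},
        {"root": "saw", "lex_cat": "v"},
    ],
}


def analyze(word):
    """Return every known analysis; unknown words are their own root."""
    key = word.lower()
    return LEXICON.get(key, [{"root": key, "lex_cat": "unk"}])


for word in ["went", "saw"]:
    print(word, analyze(word))
```

Later modules then have to pick one analysis, which is where the POS tag and the wider sentence context come in.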


English Sentence Analysis

Chunker:
verb groups
noun phrases

Ex: Godhra_FW (( is_VBZ simmering_VBG )) with_IN [[ anger_NN ]]

Parser:
sentence analysis
intermediate representation
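The chunker example marks a verb group with (( )) and a noun phrase with [[ ]], with each word carrying a word_TAG suffix. A small reader for that notation might look like the following; it assumes brackets are separated from words by spaces, and the labels "VG", "NP", and "O" (outside any chunk) are names chosen for this sketch.

```python
def read_chunks(line):
    """Split chunker output into (label, [(word, tag), ...]) groups."""
    chunks = []
    current = None  # the bracketed chunk being filled, if any
    for token in line.split():
        if token == "((":
            current = ("VG", [])
        elif token == "[[":
            current = ("NP", [])
        elif token in ("))", "]]"):
            chunks.append(current)
            current = None
        else:
            # Split on the last underscore: word on the left, POS tag on the right.
            word, _, tag = token.rpartition("_")
            if current is None:
                chunks.append(("O", [(word, tag)]))
            else:
                current[1].append((word, tag))
    return chunks


EXAMPLE = "Godhra_FW (( is_VBZ simmering_VBG )) with_IN [[ anger_NN ]]"
for label, words in read_chunks(EXAMPLE):
    print(label, words)
```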


Bilingual Analysis

Transfer rules: the result is a reordered intermediate representation for Hindi.

Word substitution from the bilingual dictionary, with the word sense taken into account.

TAM substitution from the bilingual TAM dictionary.


Transfer grammar

Reordering rules:
(Ram)1 (saw)2 (the girl)3 (in the garden)4
(Ram)1 (the girl)3 (the garden in)4 (saw)2
Mirror principle

Verb frames (TransLexGram):
verb : meet
verb frame
English : A meets B
Hindi : A B se milaa (A B से मिला)

Machine-learned rules:
very specific rules
less coverage
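The reordering example on this slide can be reproduced with two moves: the verb goes to the end, and the preposition of a prepositional chunk moves behind its noun phrase. The sketch below hard-codes just those two moves; the role labels and the function name are assumptions, and it is not a statement of the full mirror principle.

```python
def reorder_for_hindi(chunks):
    """Reorder English chunks: verb last, prepositions turned into postpositions."""
    reordered = []
    verb = None
    for role, words in chunks:
        if role == "verb":
            verb = (role, words)  # hold the verb back until the end
        elif role == "pp":
            # "in the garden" -> "the garden in"
            reordered.append((role, words[1:] + words[:1]))
        else:
            reordered.append((role, words))
    if verb is not None:
        reordered.append(verb)
    return reordered


ENGLISH = [
    ("subj", ["Ram"]),
    ("verb", ["saw"]),
    ("obj", ["the", "girl"]),
    ("pp", ["in", "the", "garden"]),
]
print(" | ".join(" ".join(words) for _, words in reorder_for_hindi(ENGLISH)))
# Ram | the girl | the garden in | saw
```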


Tense Aspect Modality (TAM)

Ram is going home

TAM : is_ing

Hindi TAM : 0_रहा है (0_rahaa hai)
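TAM substitution can be sketched as two steps: derive the English TAM label from the verb group, then look up its Hindi counterpart and attach it to the Hindi root. The dictionary below holds only the slide's entry, romanized here as "0_rahaa hai". The function names are hypothetical, and reading "0" as "no suffix on the root" is an interpretation of the notation.

```python
# Bilingual TAM dictionary with the single entry from the slide (romanized).
TAM_DICT = {"is_ing": "0_rahaa hai"}


def english_tam(aux, verb_form, root):
    """Build a label such as 'is_ing' from auxiliary + verb suffix.

    Assumes the inflected form starts with the root, e.g. going = go + ing.
    """
    suffix = verb_form[len(root):]
    return f"{aux}_{suffix}"


def substitute_tam(hindi_root, tam_label):
    """Attach the Hindi TAM to the Hindi root; '0' means a null suffix."""
    suffix, _, auxiliaries = TAM_DICT[tam_label].partition("_")
    stem = hindi_root if suffix == "0" else hindi_root + suffix
    return f"{stem} {auxiliaries}".strip()


label = english_tam("is", "going", "go")
print(label)                         # is_ing
print(substitute_tam("jaa", label))  # jaa rahaa hai
```

The result is still uninflected for gender and number; that adjustment belongs to the agreement and word form generation steps.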


Engineering Principle

Non-lexical grammar : 6 rules (mirror principle)

Category-based grammar : 60 verb classes

Lexicalized grammar : 6000 verbs (TransLexGram)

Fine-grained grammar : 60,000 rules (learned from a corpus)


Hindi Sentence Synthesis

Vibhakti insertion:
rule-based, e.g. if verb = transitive and tense = past then subj + ne (ने)
corpus-based

Agreement:
within a chunk
across chunks
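The rule-based vibhakti example translates almost directly into code. This sketch covers only that single rule, with the vibhakti romanized as "ne"; the function and parameter names are assumptions.

```python
def insert_subject_vibhakti(subject, transitive, tense):
    """Add 'ne' to the subject of a transitive verb in the past tense."""
    if transitive and tense == "past":
        return f"{subject} ne"
    return subject


print(insert_subject_vibhakti("Ram", transitive=True, tense="past"))      # Ram ne
print(insert_subject_vibhakti("Ram", transitive=False, tense="past"))     # Ram
print(insert_subject_vibhakti("Ram", transitive=True, tense="present"))   # Ram
```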


Hindi Sentence Synthesis

Agreement of verb:
with the subject if it has no vibhakti,
else with the object if it has no vibhakti,
else neutral form (masculine, singular, third person)

Word form generator:
takes the root word and its features
gives the appropriate word form

root : jaa (जा)
features : v f s 3 0_rahaa hai (0_रहा है)
word form : jaa rahii hai (जा रही है)
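The agreement rule and the word form generator fit together: the first decides which features the verb takes, the second inflects the verb group accordingly. Below is a toy version restricted to the one TAM from the slide (romanized as "0_rahaa hai") and to third person; the data layout and all names are assumptions.

```python
NEUTRAL = ("m", "s", "3")  # masculine, singular, third person


def agreement_features(subject, obj=None):
    """Pick the features the verb agrees with.

    Each argument is a dict with 'features' (gender, number, person)
    and 'vibhakti' (None when no vibhakti is attached).
    """
    if subject["vibhakti"] is None:
        return subject["features"]
    if obj is not None and obj["vibhakti"] is None:
        return obj["features"]
    return NEUTRAL


# Inflected forms of the two auxiliaries in "0_rahaa hai".
RAHAA_FORMS = {
    ("m", "s"): "rahaa",
    ("m", "p"): "rahe",
    ("f", "s"): "rahii",
    ("f", "p"): "rahii",
}
HAI_FORMS = {"s": "hai", "p": "hain"}


def generate_progressive(root, features):
    """Generate root + inflected 'rahaa' + inflected 'hai' (third person only)."""
    gender, number, person = features
    if person != "3":
        raise ValueError("this sketch only covers third person")
    return f"{root} {RAHAA_FORMS[(gender, number)]} {HAI_FORMS[number]}"


print(generate_progressive("jaa", ("f", "s", "3")))  # jaa rahii hai
```

When the subject carries a vibhakti such as "ne", `agreement_features` falls through to the object, and to the neutral form if the object is marked as well.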


Problems in MT

Multiple Word Senses

Syntactic and Semantic ambiguities

Multiple analyses of a single sentence

Preposition disambiguation

Different sentence structures

Idioms


Multiple Word Senses

bank: money bank? river bank?

Ex: I deposited hundred rupees in the bank

मैंने बैंक में सौ रुपये जमा किये (maine baink mein sau rupaye jamaa kiye)


Multiple Word Senses

Ex: She ate lunch in the bank

उसने बैंक में दोपहर का खाना खाया (usne baink mein dopahar kaa khaanaa khaayaa)

Ex: She ate lunch on the bank

उसने किनारा पर दोपहर का खाना खाया (usne kinaaraa par dopahar kaa khaanaa khaayaa)

Uses WASP for word sense disambiguation


Multiple Word Senses

had

Ex: She had lunch in the restaurant

उसने जलपान-गृह में दोपहर का खाना खाया (usne jalapaan-grih mein dopahar kaa khaanaa khaayaa)

Ex: She had tea in the restaurant

उसने जलपान-गृह में चाय पिया (usne jalapaan-grih mein chaay piyaa)
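In the "bank" and "had" examples, a neighbouring word decides the Hindi translation: the preposition for "bank", the object for "had". The lookup below captures just those four cases with romanized Hindi. It is a hand-written illustration and not how WASP works; the rule table and function name are assumptions.

```python
# (ambiguous word) -> (context word) -> romanized Hindi choice, from the slides.
SENSE_RULES = {
    "bank": {"in": "baink", "on": "kinaaraa"},
    "had": {"lunch": "khaayaa", "tea": "piyaa"},
}


def choose_translation(word, context_word, default=None):
    """Pick a translation for an ambiguous word from one context word."""
    return SENSE_RULES.get(word, {}).get(context_word, default)


print(choose_translation("bank", "in"))    # baink
print(choose_translation("bank", "on"))    # kinaaraa
print(choose_translation("had", "lunch"))  # khaayaa
print(choose_translation("had", "tea"))    # piyaa
```

A table like this breaks down quickly (a house "on the bank" road, for instance), which is why the slides list word sense disambiguation among the open problems.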


Different Sentence Structures

Questions:
What, Where, When . . .
Yes/No

Relative clauses


Lexical Resources

Bilingual dictionary

TAM dictionary

TransLexGram:
Transfer lexicon (bilingual dictionary)
Transfer grammar
Verb frames
Example parallel sentences

Phrasal dictionary


Lexical Resources

Idiom dictionary

WordNet (Indian languages)

Word form generator

Annotated corpora:
POS tagging
tree banks

Transfer rules


Conclusion

It is necessary for linguists and language experts to work hand in hand. Only then can we build practical systems.
