Machine Translation Introduction Jan Odijk LOT Winterschool Amsterdam January 2011 1

Page 1: Machine Translation Introduction

Machine Translation Introduction

Jan Odijk

LOT Winterschool

Amsterdam, January 2011

Page 2: Machine Translation Introduction

Overview

• MT: What is it?
• MT: What is not possible (yet?)
• MT: Why is it so difficult?
• MT: Can we make it possible?
• MT: Evaluation
• MT: What is (perhaps) possible
• Conclusions

Page 3: Machine Translation Introduction

MT: What is it?

• Input: text in source language
• Output: text in target language that is a translation of the input text

Page 4: Machine Translation Introduction

MT: What is it?

[Diagram: translation pyramid. Bottom level: input → direct translation → output. Middle level: analyzed input → transfer → analyzed output. Top: interlingua.]

Page 5: Machine Translation Introduction

MT: System Types

• Direct
  – Earliest systems (1950s)
    • Direct word-to-word translation
  – Recent statistical MT systems
• Transfer
  – Almost all research and commercial systems up to 1990
• Interlingual

Page 6: Machine Translation Introduction

MT: System Types

• Interlingual
  – A few research systems in the 1980s
    • Rosetta (Philips), based on Montague Grammar
      – Semantic derivation trees of attuned grammars
    • Distributed Language Translation (DLT, BSO)
      – (enriched) Esperanto
    • Sometimes logical representations
• Hybrid Interlingual/Transfer
  – Transfer for lexicons; interlingua (IL) for rules

Page 7: Machine Translation Introduction

Rule-Based Systems

• Most systems
  – explicit source language grammar
  – parser yields analysis of source language input
  – transfer component turns it into target language structure
  – no explicit grammar of target language (except morphology)

Page 8: Machine Translation Introduction

Rule-Based Systems

• Some systems (Eurotra)
  – explicit source and target language grammar
    • sometimes reversible
  – parser yields analysis of source language input
  – transfer component turns it into target language structure
  – generation of translation by target language grammar

Page 9: Machine Translation Introduction

Rule-Based Systems

• Some systems (Rosetta, DLT)
  – explicit source and target language grammar
    • in some cases reversible
  – parser yields interlingual representation
  – generation of translation by target language grammar from interlingual representation

Page 10: Machine Translation Introduction

MT: Is it difficult?

• FAHQT: Fully Automatic High Quality Translation
  – Fully Automatic: no human intervention
  – High Quality: close or equal to human translation
• Even acceptable quality is difficult to achieve

Page 11: Machine Translation Introduction

MT: Why is it so difficult?

• Ambiguity
  – Real
  – Temporary
• Computational Complexity
• Complexity of language
• Divergences
• Language Competence v. Language Use
• Large and rich lexicons required

Page 12: Machine Translation Introduction

MT: Why is it so difficult?

• De jongen sloeg het meisje met de gitaar (The boy hit the girl with the guitar)
• Hij heeft boeken gelezen
  – He has been reading books
  – *He has been reading for books
• Hij heeft uren gelezen
  – *He has been reading hours
  – He has been reading for hours

Page 13: Machine Translation Introduction

MT: Why is it so difficult?

• Not only uren, but also:
  – dagen, de hele dag, weken, …
  – (words expressing units of time)
• But also:
  – de hele vergadering, meeting, bijeenkomst, les, …
  – (words expressing events)

Page 14: Machine Translation Introduction

MT: Why is it so difficult?

• Hij draagt een bruin pak
  – Dragen: wear or carry
  – Pak: suit or package
• Hij draagt een bruin pak en zwarte schoenen
• Hij draagt een bruin pak onder zijn arm

Page 15: Machine Translation Introduction

MT: Why is it so difficult?

• Voert uw bedrijf sloten uit?
  – Uitvoeren: execute, or export?
  – Bedrijf: act, or company?
  – Sloten: ditches, or locks?
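The three two-way lexical choices in this sentence multiply: a system that cannot rule any of them out has 2 × 2 × 2 = 8 candidate readings for one short question. The following is a minimal sketch of that enumeration; the `SENSES` table and the function name are illustrative assumptions that hold only the senses listed on the slide.

```python
from itertools import product

# Senses as listed on the slide; the table itself is a hypothetical toy lexicon.
SENSES = {
    "uitvoeren": ["execute", "export"],
    "bedrijf": ["act", "company"],
    "sloten": ["ditches", "locks"],
}

def lexical_readings(words):
    """Enumerate every combination of word senses for the given words."""
    options = [SENSES[word] for word in words]
    return [dict(zip(words, combo)) for combo in product(*options)]

readings = lexical_readings(["uitvoeren", "bedrijf", "sloten"])
print(len(readings))  # number of candidate readings
```

Only one of these combinations (export, company, locks) is the intended one, which is why the later slides turn to world knowledge and statistics for disambiguation.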

Page 16: Machine Translation Introduction

MT: Why is it so difficult?

• Temporary Ambiguity
  – Hij heeft boeken gelezen
    • Heeft: main or auxiliary verb?
    • Boeken: noun or verb?
  – Voert uw bedrijf sloten uit?
    • Voert: form of voeren or of uitvoeren?
    • Bedrijf: noun or verb form?
    • Sloten uit: noun+particle or PP (out of ditches/locks)?

Page 17: Machine Translation Introduction

Why is MT difficult?

• Summary: ambiguity of natural language
  – requires modeling of knowledge of the world/situation
    • by rule systems, and/or
    • by statistics

Page 18: Machine Translation Introduction

MT: Why is it so difficult?

• Computational Complexity
  – High demands on processing capacity
  – High demands on memory
• Complexity of language
  – Many different construction types
  – All interacting with each other

Page 19: Machine Translation Introduction

Why is MT difficult?

• Divergences between languages
  – require deep syntactic analysis
  – or very sophisticated statistical techniques

Page 20: Machine Translation Introduction

Divergences: Category mismatches

• Simple category mismatches
  – woonachtig (zijn) v. reside (Adj vs. Verb)
  – zich ergeren v. (be) annoyed (Verb vs. Adj)
  – verliefd v. in love (Adj vs. Prep+Noun)
  – kunnen v. (be) able
  – kunnen v. (be) possible
  – door- v. continue (to)

Page 21: Machine Translation Introduction

Divergences: Category mismatches

• More complex category mismatches
  – graag vs. like (Adv vs. Verb)
    • hij zwemt graag vs. he likes to swim
  – toevallig vs. happen
    • hij viel toevallig vs. he happened to fall

Page 22: Machine Translation Introduction

Divergences: Category mismatches

• Phrasal category mismatches
  – de zieke vrouw
    • the woman who is ill (*the ill woman)
  – I expect her to leave
    • ik verwacht dat zij vertrekt
  – She is likely to come
    • het is waarschijnlijk dat zij komt

Page 23: Machine Translation Introduction

Conflational Divergences:

• prepositional complements
  – houden van vs. love
• existential er vs. Ø
  – er passeerde een auto vs. a car passed
• verbal particles
  – blow (something) up vs. volar

Page 24: Machine Translation Introduction

Conflational Divergences:

• reflexive verbs
  – zich scheren vs. shave
• composed vs. simple tense forms
  – he will do it vs. lo hará
• split negatives vs. composed negatives
  – he does not see anyone vs. hij ziet niemand

Page 25: Machine Translation Introduction

Functional Divergences:

• I like these apples
  – me gustan estas manzanas
• se venden manzanas aquí
  – hier verkoopt men appels
• er werd door de toeschouwers gejuicht
  – the spectators were cheering

Page 26: Machine Translation Introduction

Divergences: MWEs

• semi-fixed MWEs
  – nuclear power plant vs. kerncentrale
• flexible idioms
  – de plaat poetsen vs. bolt
  – de pijp uit gaan v. to kick the bucket

Page 27: Machine Translation Introduction

Divergences: MWEs

• semi-idioms (collocations)
  – zware shag vs. strong tobacco
• semi-idioms (support verbs)
  – aandacht besteden aan vs. pay attention to

Page 28: Machine Translation Introduction

MT: Why is it so difficult?

• Language Competence v. Language Use
  – Earlier systems implemented idealized reality
  – But not the really occurring language use
  – In some cases
    • focus on theoretically interesting difficult constructions
    • that do occur in reality
    • but other constructions are more important to deal with in practical systems

Page 29: Machine Translation Introduction

MT: Why is it so difficult?

• Large and rich lexicons
  – Existing human-oriented dictionaries are not suited as such
  – All information must be available in a formalized way
  – Much more information is needed than in a traditional dictionary

Page 30: Machine Translation Introduction

MT: Why is it so difficult?

• Multi-word Expressions (MWEs)
  – Are in current dictionaries only in a very informal way
  – No standards on how to represent them lexically
  – Many different types requiring different treatment in the grammar
  – Huge numbers!!
  – Domain and company-specific terminology are often MWEs

Page 31: Machine Translation Introduction

MT: Can we make it possible?

• Probably not,
• but we can still improve significantly
  – Lexicons
  – Selection restrictions
  – Approximating analyses
• Statistical MT

Page 32: Machine Translation Introduction

MT: Can we make it possible?

• Large and rich lexicons
  – widely accepted and used (de facto) standards
  – Methods and tools to quickly adapt to domain or company specific vocabulary
  – Better treatment of MWEs and standards for lexical representation of MWEs

Page 33: Machine Translation Introduction

MT: Can we make it possible?

• Selection restrictions with a type system to approximate modeling of world knowledge
  – Requires sophisticated syntactic analysis
    • Boek: info (legible)
    • Uur: time unit duration
    • Vergadering: event duration
    • Lezen: subject=human; object=info (legible)
    • Durational adjunct must be a duration phrase

Page 34: Machine Translation Introduction

MT: Can we make it possible?

• Selection restrictions
  – Pak (1) (suit): clothes
  – Pak (2) (package): entity
  – Dragen (1) (wear): subj=animate; object=clothes
  – Dragen (2) (carry): subj=animate; object=entity
  – Schoen: clothes
  – Entity > clothes
  – Identity preferred over subsumption
  – Homogeneous object preferred over heterogeneous one

Page 35: Machine Translation Introduction

MT: Can we make it possible?

• Selection restrictions
  – Hij draagt een bruin pak
    • He wears a brown suit (1: clothes=clothes)
    • He carries a brown package (1: entity=entity)
    • He carries a brown suit (2: entity > clothes)
    • *He wears a brown package (clothes ¬> entity)
  – Hij draagt een bruin pak en zwarte schoenen
    • He wears a brown suit and black shoes (1: homogeneous and clothes=clothes)
    • He carries a brown suit and black shoes (2: homogeneous but entity > clothes)
    • He carries a brown package and black shoes (2: inhomogeneous but entity=entity)
    • *He wears a brown package and black shoes (clothes ¬> entity)
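The ranking on these two slides can be sketched with a tiny type hierarchy. This is an illustrative toy under stated assumptions, not the mechanism of any actual system: the lexicon tables, the function names, and the simplified rule "rank 1 if every object has exactly the required type, rank 2 if it is merely subsumed, rejected otherwise" are all assumptions chosen to reproduce the rankings shown above.

```python
from itertools import product

# Toy type hierarchy: each type maps to its supertype (entity > clothes).
SUPERTYPE = {"clothes": "entity", "entity": None}

# Hypothetical mini-lexicon: (English translation, semantic type).
NOUNS = {
    "pak": [("suit", "clothes"), ("package", "entity")],
    "schoenen": [("shoes", "clothes")],
}
# (English translation, type required of the object).
VERBS = {"dragen": [("wear", "clothes"), ("carry", "entity")]}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its supertypes."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUPERTYPE[specific]
    return False

def readings(verb, objects):
    """Return admissible readings as (rank, verb, objects), best rank first."""
    results = []
    for verb_en, required in VERBS[verb]:
        for combo in product(*(NOUNS[o] for o in objects)):
            types = [sem_type for _, sem_type in combo]
            if not all(subsumes(required, t) for t in types):
                continue  # violates the selection restriction
            rank = 1 if all(t == required for t in types) else 2
            results.append((rank, verb_en, tuple(en for en, _ in combo)))
    return sorted(results)

print(readings("dragen", ["pak"]))
print(readings("dragen", ["pak", "schoenen"]))
```

For the coordinated object, only "wear suit and shoes" gets rank 1, which matches the preference for a homogeneous object whose type is identical to the required one.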

Page 36: Machine Translation Introduction

MT: Can we make it possible?

• Approximating analyses
  – Ignore certain ambiguities to begin with
  – Use only limited amount of relevant information
  – Cut off analysis when there are too many alternatives
  – This is currently actually done in all practical systems
  – Need new ways of doing this without affecting quality too seriously

Page 37: Machine Translation Introduction

MT: Can we make it possible?

• Statistical MT
• Derives MT system automatically
  – From statistics taken from
    • Aligned parallel corpora (→ translation model)
    • Monolingual target language corpora (→ language model)
• Being worked on since the early 1990s
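One common way to combine the two models is the noisy-channel formulation. The slide does not state it, so the equation below is background, and the symbols are assumptions: f is the source sentence, e a candidate target sentence, P(f | e) the translation model estimated from the aligned parallel corpora, and P(e) the language model estimated from the monolingual target corpora.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The second equality follows from Bayes' rule, since P(f) is constant for a given input sentence.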

Page 38: Machine Translation Introduction

MT: Can we make it possible?

• Plus:
  – No or very limited grammar development
  – Includes language and world knowledge automatically (but implicitly)
  – Based on actually occurring data
  – Currently many experimental and commercial systems
• Minus:
  – Requires large aligned parallel corpora
  – Unclear how much linguistics will be needed anyway
  – Probably restricted to very limited domains only

Page 39: Machine Translation Introduction

MT: Can we make it possible?

• Google Translate (statistical MT)
  • Hij draagt een pak. → √ He wears a suit.
  • Hij draagt schoenen. → √ He wears shoes.
  • Hij draagt bruine schoenen en een pak. → √ He wears a suit and brown shoes. (!!)
  • Hij draagt het pakket. → √ He carries the package.
  • Hij heeft een pak aan. → *He has a suit.
  • Voert uw bedrijf sloten uit? → *Does your company locks out?


Page 41: Machine Translation Introduction

MT: Can we make it possible?

• Euromatrix, esp. “the Euromatrix”
  – Lists data and tools for European language pairs
  – Goals
    • Translation systems for all pairs of EU languages
    • Organization, analysis and interpretation of a competitive annual international evaluation of machine translation
    • The provision of open source machine translation technology including research tools, software and data
    • A systematically compiled and constantly updated detailed survey of the state of MT technology for all EU language pairs
    • Efficient inclusion of linguistic knowledge into statistical machine translation
    • The development and testing of hybrid architectures for the integration of rule-based and statistical approaches
• Successor project EuromatrixPlus

Page 42: Machine Translation Introduction

MT: Can we make it possible?

• META-NET 2010-2013 (EU funding)
  – Building a community with shared vision and strategic research agenda
  – Building META-SHARE, an open resource exchange facility
  – Building bridges to neighbouring technology fields
    • Bringing more Semantics into Translation
    • Optimising the Division of Labour in Hybrid MT
    • Exploiting the Context for Translation
    • Empirical Base for Machine Translation

Page 43: Machine Translation Introduction

MT: Can we make it possible?

• PACO-MT 2008-2011
• Investigates hybrid approach to MT
  – Rule-based and statistical
  – Uses existing parser for source language analysis
  – Uses statistical n-gram language models for generation
  – Uses statistical approach to transfer

Page 44: Machine Translation Introduction

MT Evaluation

• Evaluation depends on purpose of MT and how it is used
  – application, domain, controlled language
• Many aspects can be evaluated
  – functionality, efficiency, usability, reliability, maintainability, portability
  – translation quality
  – embedding in work flow
    • post-editing options/tools

Page 45: Machine Translation Introduction

MT Evaluation

• Focus here:
  – does the system yield good translations according to human judgement
  – in the context of developing a system
• Again, many aspects:
  – fidelity (how close), correctness, adequacy, informativeness, intelligibility, fluency
  – and many ways to measure these aspects

Page 46: Machine Translation Introduction

MT Evaluation

• Test suite
  – Reference =
    • list of (carefully selected) sentences
    • with their translations (ordered by score)
  – translations judged correct by human (usually developer)
  – upon every update of the system, output of the new system is compared to the reference
    • if different: system has to be adapted, or reference has to be adapted
• Advantages
  – focus on specific translation problems possible
  – excellent for regression testing
  – manual judgement needed only once for each new output
    • other comparisons are automatic
• Disadvantages
  – not really independent
  – particularly suited for pure rule-based systems
  – human judgement needed if output differs from reference
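The regression-testing use of a test suite can be sketched in a few lines. This is a minimal sketch under assumptions: `regression_check` is a hypothetical helper, the system under test is assumed to be any function from source sentence to translation, and the reference is reduced to one approved translation per sentence (the slide allows an ordered list).

```python
def regression_check(translate, reference):
    """Compare current system output against the human-approved reference.

    Returns (source, approved, output) triples for every sentence whose
    output changed; each needs a human decision: fix the system or
    update the reference.
    """
    differences = []
    for source, approved in reference.items():
        output = translate(source)
        if output != approved:
            differences.append((source, approved, output))
    return differences

# Toy reference and a stand-in "system" (both hypothetical).
reference = {
    "Hij draagt een pak.": "He wears a suit.",
    "Hij draagt het pakket.": "He carries the package.",
}
toy_system = {
    "Hij draagt een pak.": "He wears a suit.",
    "Hij draagt het pakket.": "He wears the package.",
}.get

print(regression_check(toy_system, reference))
```

An empty result means the update left the approved translations unchanged, so no manual judgement is needed for this run.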

Page 47: Machine Translation Introduction

MT Evaluation

• Comparison against
  – translation corpus
  – independently created by human translators
  – possibly multiple equivalently correct translations of a sentence
• Advantages
  – truly independent
  – also suited for data-driven systems
• Disadvantage
  – requires human judgement (every time there is a system update)
    • high effort by highly skilled people, high costs, requires a lot of time
  – human judgement is not easy (unless there is a perfect match)
• Useful
  – for a one-time evaluation of a stable system
  – not for evaluation during development

Page 48: Machine Translation Introduction

MT Evaluation

• Edit-Distance (Word Accuracy)
  – metric to determine closeness of translations automatically
  – the least number of edit operations to turn the translated sentence into the reference sentence
  – Alshawi et al. 1998

Page 49: Machine Translation Introduction

MT Evaluation

• WA = 1 - ((d + s + i) / max(r, c))
  • d = number of deletions
  • s = number of substitutions
  • i = number of insertions
  • r = reference sentence length
  • c = candidate sentence length
• easy to calculate using the Levenshtein distance algorithm (dynamic programming)
• various extensions have been proposed
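A minimal sketch of the word-accuracy computation follows. It assumes whitespace tokenization and unit cost for every edit operation, so d + s + i is simply the word-level Levenshtein distance; the function names are illustrative.

```python
def edit_distance(cand, ref):
    """Word-level Levenshtein distance via dynamic programming (two rows)."""
    prev = list(range(len(ref) + 1))
    for i, c_word in enumerate(cand, 1):
        curr = [i]
        for j, r_word in enumerate(ref, 1):
            curr.append(min(
                prev[j] + 1,                         # delete c_word
                curr[j - 1] + 1,                     # insert r_word
                prev[j - 1] + (c_word != r_word),    # substitute or match
            ))
        prev = curr
    return prev[-1]

def word_accuracy(candidate, reference):
    """WA = 1 - (d + s + i) / max(r, c), on whitespace-separated words."""
    cand, ref = candidate.split(), reference.split()
    if not cand and not ref:
        return 1.0
    return 1 - edit_distance(cand, ref) / max(len(ref), len(cand))

print(word_accuracy("he carries a brown suit", "he wears a brown suit"))
```

One substituted word out of five lowers the score by 0.2, regardless of whether the substitute is a synonym or an error.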

Page 50: Machine Translation Introduction

MT Evaluation

• Advantages
  – fully automatic given a reference set
• Disadvantages
  – penalizes candidates if a synonym is used
  – penalizes swaps of words and blocks of words too much

Page 51: Machine Translation Introduction

MT Evaluation

• BLEU (method to automate MT Evaluation)
  – the closer a machine translation is to a professional human translation, the better it is
  – BiLingual Evaluation Understudy
• Required:
  – corpus of good quality human reference translations
  – a “closeness” metric

Page 52: Machine Translation Introduction

MT Evaluation

• Two candidate translations from Chinese source
  – C1: It is a guide to action which ensures that the military always obeys the commands of the party
  – C2: It is to insure the troops forever hearing the activity guidebook that party direct
• Intuitively: C1 is better than C2

Page 53: Machine Translation Introduction

MT Evaluation

• Three reference translations
  – R1: It is a guide to action that ensures that the military will forever heed Party commands
  – R2: It is the guiding principle which guarantees the military forces always being under the command of the Party
  – R3: It is the practical guide for the army always to heed the directions of the party

Page 54: Machine Translation Introduction

MT Evaluation

• Basic idea:
  – a good candidate translation shares many words and phrases with reference translations
  – → comparing n-gram matches can be used to rank candidate translations
    • n-gram: a sequence of n word occurrences
  – in BLEU n = 1, 2, 3, 4
    • 1-grams give a measure of adequacy
    • longer n-grams give a measure of fluency

Page 55: Machine Translation Introduction

MT Evaluation

• For unigrams:
  – count the number of matching unigrams
    • in all references
  – divide by the total number of unigrams (in the candidate sentence)

Page 56: Machine Translation Introduction

MT Evaluation

• Problem
  – C1: the the the the the the the (= 7/7 = 1)
  – R1: the cat is on the mat
• Solution:
  – clip matching count (7) by maximum reference count (2) → 2 (Count_clip)
  – modified unigram precision = 2/7 = 0.29
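The clipping step can be sketched as follows. This assumes lowercased whitespace tokenization, and the function name is illustrative; each candidate word counts at most as often as it occurs in the single reference that contains it most often.

```python
from collections import Counter

def modified_unigram_precision(candidate, references):
    """Return (clipped match count, candidate length) for unigrams."""
    cand_counts = Counter(candidate.lower().split())
    # Maximum count of each word over the individual references.
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref[word] = max(max_ref[word], count)
    clipped = sum(min(count, max_ref[word]) for word, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

clipped, total = modified_unigram_precision(
    "the the the the the the the", ["the cat is on the mat"]
)
print(clipped, total, round(clipped / total, 2))
```

Without clipping all seven occurrences of "the" would match; with clipping only two do, because the reference contains "the" twice.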

Page 57: Machine Translation Introduction

MT Evaluation

• Example (unigrams)
  – C1: It is a guide to action which ensures that the military always obeys the commands of the party (17/18 = 0.94)
  – compared with R1-3 as before

Page 58: Machine Translation Introduction

MT Evaluation

• Example (unigrams)
  – C2: It is to insure the troops forever hearing the activity guidebook that party direct (8/14 = 0.57)
  – compared with R1-3 as before

Page 59: Machine Translation Introduction

MT Evaluation

• Example (bigrams)
  – C1: It is a guide to action which ensures that the military always obeys the commands of the party (10/17 = 0.59)
  – compared with R1-3 as before

Page 60: Machine Translation Introduction

MT Evaluation

• Example (bigrams)
  – C2: It is to insure the troops forever hearing the activity guidebook that party direct (1/13 = 0.08)
  – compared with R1-3 as before
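The four scores above can be reproduced with a small generalization of clipped counting to n-grams. This is a sketch assuming lowercased whitespace tokenization (so "Party" and "party" match); the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Return (clipped matches, number of candidate n-grams)."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.lower().split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())

REFS = [
    "It is a guide to action that ensures that the military will forever heed Party commands",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party",
    "It is the practical guide for the army always to heed the directions of the party",
]
C1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
C2 = "It is to insure the troops forever hearing the activity guidebook that party direct"

for name, cand in (("C1", C1), ("C2", C2)):
    print(name, modified_precision(cand, REFS, 1), modified_precision(cand, REFS, 2))
```

The bigram scores separate the two candidates much more sharply than the unigram scores do, which is the sense in which longer n-grams measure fluency.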

Page 61: Machine Translation Introduction

MT Evaluation

• Extend to a full multi-sentence corpus
  • compute n-gram matches sentence by sentence
  • sum the clipped n-gram counts for all candidates
  • divide by the number of n-grams in the text corpus

$$p_n = \frac{\sum_{C \in \{\text{Candidates}\}} \sum_{\text{n-gram} \in C} \mathrm{Count}_{\mathrm{clip}}(\text{n-gram})}{\sum_{C' \in \{\text{Candidates}\}} \sum_{\text{n-gram}' \in C'} \mathrm{Count}(\text{n-gram}')}$$

Page 62: Machine Translation Introduction

MT Evaluation

• Combining n-gram precision scores
• a weighted linear average works reasonably: $\sum_{n=1}^{N} w_n p_n$
• but n-gram precision decays roughly exponentially with n, so average the logarithms to compensate: $\exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$
• weights in BLEU: $w_n = 1/N$

Page 63: Machine Translation Introduction

MT Evaluation

• BLEU is a precision measure
  – #(C ∩ R) / #C
• Recall is difficult to define because of multiple reference translations
  – e.g. #(C ∩ Rs) / #Rs
    • where $R_s = \bigcup_i R_i$
  – will not work

Page 64: Machine Translation Introduction

MT Evaluation

• C1: I always invariably perpetually do
• C2: I always do
• R1: I always do
• R2: I invariably do
• R3: I perpetually do
• Recall of C1 over R1-3 is better than C2
• but C2 is a better translation
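The naive recall #(C ∩ Rs) / #Rs can be checked on this example. The sketch below treats sentences as sets of whitespace-separated words and assumes at least one non-empty reference; the function name is illustrative.

```python
def union_recall(candidate, references):
    """Share of the words in the union of all references that the candidate covers."""
    ref_words = set().union(*(ref.split() for ref in references))
    cand_words = set(candidate.split())
    return len(cand_words & ref_words) / len(ref_words)

REFERENCES = ["I always do", "I invariably do", "I perpetually do"]

print(union_recall("I always invariably perpetually do", REFERENCES))
print(union_recall("I always do", REFERENCES))
```

The candidate that stacks all three adverbs covers every word in the union and scores highest, although it is clearly the worse translation.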

Page 65: Machine Translation Introduction

MT Evaluation

• But without Recall:
  – C1: of the
  – compared with R1-3 as before
  – modified unigram precision = 2/2
  – modified bigram precision = 1/1
  – which is the wrong result

Page 66: Machine Translation Introduction

MT Evaluation

• Length
  – n-gram precision penalizes translations longer than the reference
  – but not translations shorter than the reference
  – → Add Brevity Penalty (BP)

Page 67: Machine Translation Introduction

MT Evaluation

• $b_i$ = best match length = reference sentence length closest to candidate sentence i's length (e.g. reference lengths 12, 15, 17 and candidate length 12 → $b_i$ = 12)
• r = test corpus effective reference length = $\sum_i b_i$
• c = total length of candidate translation corpus

Page 68: Machine Translation Introduction

MT Evaluation

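• BP
  – computed over the corpus
  – not sentence by sentence and averaged

$$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}$$

• BLEU:

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$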
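Putting the pieces together, a corpus-level BLEU score can be sketched as below. This is a minimal sketch under assumptions, not a reference implementation: sentences are pre-tokenized lists of words, weights are uniform (w_n = 1/N), ties in the best match length go to the shorter reference, and the score is defined as 0 when any n-gram order has no match. All function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def best_match_length(cand_len, ref_lens):
    """Reference length closest to the candidate length (shorter one on ties)."""
    return min(ref_lens, key=lambda length: (abs(length - cand_len), length))

def corpus_bleu(candidates, references_list, max_n=4):
    """BLEU = BP * exp(sum_n (1/N) log p_n), accumulated over the whole corpus."""
    clipped = [0] * max_n
    total = [0] * max_n
    c = r = 0
    for cand, refs in zip(candidates, references_list):
        c += len(cand)
        r += best_match_length(len(cand), [len(ref) for ref in refs])
        for n in range(1, max_n + 1):
            cand_counts = Counter(ngrams(cand, n))
            max_ref = Counter()
            for ref in refs:
                for gram, count in Counter(ngrams(ref, n)).items():
                    max_ref[gram] = max(max_ref[gram], count)
            clipped[n - 1] += sum(min(k, max_ref[g]) for g, k in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if min(clipped) == 0:
        return 0.0  # log p_n undefined; treated as zero score in this sketch
    log_avg = sum(math.log(m / t) for m, t in zip(clipped, total)) / max_n
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)

refs = [["the", "cat", "is", "on", "the", "mat"]]
print(corpus_bleu([["the", "cat", "is", "on", "the", "mat"]], [refs]))
print(corpus_bleu([["the", "cat", "is", "on"]], [refs]))
```

The second call shows the brevity penalty at work: every n-gram of the short candidate matches, so all p_n are 1, yet the score drops below 1 because c < r.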

Page 69: Machine Translation Introduction

MT Evaluation

• BLEU:
  – claim: BLEU closely matches human judgement
    • when averaged over a test corpus
    • not necessarily on individual sentences
    • shown extensively in Papineni et al. 2001
  – → multiple reference translations are desirable
    • to cancel out translation styles of individual translators
    • (e.g. East Asian economy v. economy of East Asia)

Page 70: Machine Translation Introduction

MT Evaluation

• Variants on BLEU
  – NIST
    • http://www.nist.gov/speech/tests/mt/doc/ngram-study.pdf
    • different weights
    • different BP
  – ROUGE (Lin and Hovy 2003)
    • for text summarization
    • Recall-Oriented Understudy for Gisting Evaluation

Page 71: Machine Translation Introduction

MT Evaluation

• Main Advantage of BLEU
  – automatic evaluation
    • good for use during development
    • particularly useful for data-based systems
• Disadvantage
  – defined for a whole test corpus
  – not for individual sentences
  – just measures difference with reference

Page 72: Machine Translation Introduction

MT: What is (perhaps) possible

• Cross-Language Information Retrieval
• Low Quality MT for Gist extraction
• MT and Speech Technology
• Controlled Language
• Limited Domain
• Interaction with author
• Combinations of the above
• Computer-aided translation

Page 73: Machine Translation Introduction

MT: What is (perhaps) possible

• Cross-Language Information Retrieval (CLIR)
  – Input query: in own language
  – Input query translated into target languages
  – Search in target language documents
  – Results in target language
• Translation of individual words only
• Growing need (growing multilingual Web)
• No perfect translation required
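The CLIR steps can be sketched with word-by-word dictionary lookup. Everything here is a toy assumption for illustration: the bilingual `DICTIONARY`, the `DOCUMENTS` collection, and the ranking by number of matched query terms are hypothetical and far simpler than a real retrieval system.

```python
# Hypothetical Dutch-to-English word dictionary; ambiguous words keep all translations.
DICTIONARY = {
    "sloten": ["locks", "ditches"],
    "uitvoeren": ["export", "execute"],
}

# Hypothetical target-language document collection.
DOCUMENTS = {
    "d1": "we export locks worldwide",
    "d2": "ditches drain the field",
    "d3": "weather report for tomorrow",
}

def translate_query(query, dictionary):
    """Translate a query word by word; unknown words are passed through unchanged."""
    terms = []
    for word in query.lower().split():
        terms.extend(dictionary.get(word, [word]))
    return terms

def search(terms, documents):
    """Rank documents by how many distinct query terms they contain."""
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        score = sum(1 for term in set(terms) if term in words)
        if score:
            scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

terms = translate_query("sloten uitvoeren", DICTIONARY)
print(terms)
print(search(terms, DOCUMENTS))
```

Keeping every translation of an ambiguous word costs some precision, but documents that match several query terms still rank first, which is why no perfect translation is required for retrieval.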

Page 74: Machine Translation Introduction

MT: What is (perhaps) possible


Page 75: Machine Translation Introduction

MT: What is (perhaps) possible

• Low quality MT for Gist extraction
• Low quality but still useful
• If the document turns out to be interesting, a high quality human translation can be requested (has to be paid for)

Page 76: Machine Translation Introduction

MT: What is (perhaps) possible


Page 77: Machine Translation Introduction

MT: What is (perhaps) possible


Page 78: Machine Translation Introduction

MT: What is (perhaps) possible

• CLIR
  – Fills a growing need in the market
  – Is technically feasible
  – Creates need for translation of found documents
    • Solved partially by low quality MT
    • Potentially creates need for more human translation
    • Stimulates (funds) research into more sophisticated MT

Page 79: Machine Translation Introduction

MT: What is (perhaps) possible

• Combine MT (statistical or rule-based) with OCR technology
  – Take a picture of a text with your phone
  – Text is OCR-ed
  – Text is translated
  – (usually a short and simple text)
• Linguatec Shoot & Translate
• Word Lens

Page 80: Machine Translation Introduction

MT: What is (perhaps) possible

• Combine MT (statistical or rule-based) with Speech technology
  – Complicates the problem on the one hand, but
  – Speech technology (ASR) is currently limited to very limited domains (makes MT simpler)
  – Many useful applications for speech technology currently in the market
    • Directory assistance
    • Tourist information
    • Tourist communication
    • Call centers
    • Navigation
    • Hotel reservations
  – Some will profit from in-built automatic translation

Page 81: Machine Translation Introduction

MT: What is (perhaps) possible

• Large EC FP6 project TC-STAR (2004-)
  – (http://www.tc-star.org/)
  – Research into improved speech technology (ASR and TTS)
  – Research into statistical MT
  – Research in combining both (speech-to-speech translation)
  – In a few selected limited domains

Page 82: Machine Translation Introduction

MT: What is (perhaps) possible

• Commercial Speech2Speech Translation
• Jibbigo
  – http://www.jibbigo.com
    • Speech-to-speech translation (iPhone, Android)
    • http://www.phonedog.com/2009/10/30/iphone-app-jibbigo-speech-translator
• Talk to Me (Android phones)

Page 83: Machine Translation Introduction

MT: What is (perhaps) possible

• Controlled Language
  – Authoring system limits vocabulary and syntax of document authors
  – Often desirable in companies to get consistent documentation (e.g. aircraft maintenance manuals)
    • AECMA Simplified English
    • GIFAS Rationalized French
  – Makes MT easier (language well-defined)

Page 84: Machine Translation Introduction

MT: What is (perhaps) possible

• Limited Domain
  – Translation of
    • Weather reports (TAUM-Meteo, Canada)
    • Avalanche warnings (Switzerland)
  – Fast adaptation to domain/company-specific vocabulary and terminology

Page 85: Machine Translation Introduction

MT: What is (perhaps) possible

• Interaction with author
  – No fully automatic translation
  – Document author resolves
    • Ambiguities unresolved by the system
    • In a dialogue between the author and the system in the source language
    • Approach taken in Rosetta project (Philips)
    • Will only work if
      – the number of unresolved ambiguities is low
      – the questions to resolve ambiguity are clear

Page 86: Machine Translation Introduction

MT: What is (perhaps) possible

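• Hij droeg een bruin pak
  – Wat bedoelt u met “pak”? (What do you mean by “pak”?)
    • (1) kostuum (suit)
    • (2) pakket (package)
• Hij droeg een bruin pak
  – Wat bedoelt u met “dragen (droeg)”? (What do you mean by “dragen (droeg)”?)
    • (1) aan of op hebben (kleding) (to have on: clothing)
    • (2) bij zich hebben (bijv. in de hand) (to have with you, e.g. in your hand)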

Page 87: Machine Translation Introduction

MT: What is (perhaps) possible

• Combinations of the above


Page 88: Machine Translation Introduction

MT: What is (perhaps) possible

• Computer-aided translation
  – For end-users
  – For professional translators/localization industry
• Limited functionality
  – Specific terminology
• Bootstrap translation automatically
  – Human revision and correction (post-editing)
• Only if
  – MT quality is such that it reduces effort
  – The system is fully integrated in the workflow system

Page 89: Machine Translation Introduction

Conclusions

• FAHQT not possible (yet?)
• MT is really very difficult!
• Several constrained versions do yield usable technology with state-of-the-art MT
• In some cases: even potentially creates additional needs for MT and human translation

Page 90: Machine Translation Introduction

Conclusions

• Statistical MT yields practical systems that are relatively quick to produce (but low-quality)

• More research and lots of hard work are needed to get better systems

• Better systems will probably require hybrid approaches (mixed statistically based/knowledge based); the focus of research is here (PACO-MT, META-NET, …)

• This needs to be financed by niches where current state-of-the-art MT yields usable technology and there is a market.