
January 26, 2000 1

Stochastic Transductions for

Machine Translation*

Giuseppe Riccardi, AT&T Labs-Research

*Joint work with Srinivas Bangalore and Enrico Bocchieri

2

Overview

• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study

– MT for Human-Machine Spoken Dialog

• Experiments and Results

3

Speech Understanding Case

• ATIS 1994 DARPA Evaluation (G. Riccardi et al., ICASSP 1995; E. Bocchieri et al., SLT Workshop 1995; Levin et al., SLT Workshop 1995)

[Figure: speech understanding as a finite-state decoding chain. The speech recognizer combines an acoustic model P(A|W), a syntactic model P(W|C), and a semantic model P(C), searching for max P(A,W,C) = max P(A|W) P(W|C) P(C) over word sequences W and concepts C — in machine terms, the best path of the composition A ∘ L ∘ C.]

4

Motivation
• Finite State Transducers (FST)
– Unified formalism to represent symbolic transductions
• Learnability
– Automatically train transductions from (parallel) corpora
• Speech-to-Speech Machine Translation chain
– Combining speech and language sciences

[Figure: speech-to-speech translation as a cascade of transducers (labeled C, L, X, B, T) mapping the source spoken language to the target spoken language.]

5

Overview

• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study

– MT for Human-Machine Spoken Dialog

• Experiments and Results

6

Finite State Transducers
• Weighted Finite State Transducers (FST)
– Algebraic operations (τ1 + τ2, τ1 ∘ τ2, …)
• Composable: τE-S = τE-J ∘ τJ-S
• Minimization: min(τ1)
– Stochastic transductions (τE-S : E* × S* → [0,1])
• Joint probability decomposition: P(X1, X2, …, XN) = P(X1) P(X2 | X1) … P(XN | XN-1)
– … and its computation as a chain of machines τ1 ∘ τ2 ∘ … ∘ τN (see the sketch below)
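To make the composition and joint-decomposition ideas concrete, here is a minimal dictionary-encoded sketch (a toy encoding of my own, not the toolkit the talk used); composing an English-Japanese transducer with a Japanese-Spanish one yields an English-Spanish transducer, with arc weights multiplying as in the decomposition above:

```python
# Toy weighted-transducer composition (illustrative only; a real system
# would use a weighted FST toolkit with epsilon handling, minimization, etc.).
# A transducer is a dict: (state, in_sym) -> list of (out_sym, next_state, prob).

def compose(t1, t2):
    """Compose t1 (X -> Y) with t2 (Y -> Z) into a transducer X -> Z.
    Result states are pairs (q1, q2); arc weights multiply."""
    result = {}
    for (q1, x), arcs1 in t1.items():
        for (y, r1, p1) in arcs1:
            for (q2, y2), arcs2 in t2.items():
                if y2 != y:
                    continue
                for (z, r2, p2) in arcs2:
                    result.setdefault(((q1, q2), x), []).append((z, (r1, r2), p1 * p2))
    return result

# Hypothetical one-word E-J and J-S transducers:
e_j = {(0, "card"): [("カード", 1, 1.0)]}
j_s = {(0, "カード"): [("tarjeta", 1, 1.0)]}
e_s = compose(e_j, j_s)  # E-S = E-J ∘ J-S
print(e_s)               # {((0, 0), 'card'): [('tarjeta', (1, 1), 1.0)]}
```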

7

Stochastic Transducers

[Figure: example stochastic transducers. The main example is a two-state transducer for the ambiguous word "cool", with weighted arcs cool:<adv>/0.3, cool:<adj>/0.3, cool:<noun>/0.2, cool:<verb>/0.2.]
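In the same toy encoding used above, the ambiguous part-of-speech transducer from the figure can be written down directly (probabilities exactly as on the arcs):

```python
# The two-state "cool" transducer from the figure above.
pos_fst = {
    (1, "cool"): [
        ("<adv>", 2, 0.3),
        ("<adj>", 2, 0.3),
        ("<noun>", 2, 0.2),
        ("<verb>", 2, 0.2),
    ]
}

# Picking the most likely tag is a max over the outgoing arcs:
best_tag, _, best_p = max(pos_fst[(1, "cool")], key=lambda arc: arc[2])
print(best_tag, best_p)  # "<adv>" 0.3 (tied with "<adj>")
```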

8

Learning FSTs
• Data-driven: learning FSTs from large corpora.
• Learnability
– Finite history context (N-gram).
– Generalization
• Unseen event modeling (back-off)
• Class-based n-grams.
• Phrase grammar (long-distance dependencies)
• Context-free grammar approximation.
• Large-scale transductions
– Efficient state and transition function model
– Variable Ngram Stochastic Machine (VNSM)

9

VNSM: the state and transition space
• Bottom-up approach (see the sketch below)
– Each state is associated with an n-tuple observed in the corpus
– Each transition is associated with adjacent strings
– Parametrization:
#states ∝ #n-tuples in the corpus
#transitions ∝ #n-tuples in the corpus
#ε-transitions ∝ #(n-1)-tuples in the corpus
The VNSM recognizes any W ∈ V* (V is the dictionary)
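A rough sketch of that bottom-up construction (for a fixed n; the real VNSM selects variable-length histories, which this toy version omits):

```python
from collections import defaultdict

def build_vnsm_counts(corpus, n=2):
    """Collect states (word histories) and transition counts from a corpus.
    Each state corresponds to an n-tuple seen in the corpus; transitions
    link adjacent histories, so both counts stay linear in the corpus."""
    states, transitions = set(), defaultdict(int)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words)):
            hist = tuple(words[max(0, i - n + 1):i])     # current history state
            nxt = tuple(words[max(0, i - n + 2):i + 1])  # successor state
            states.update([hist, nxt])
            transitions[(hist, words[i], nxt)] += 1
    return states, transitions

states, trans = build_vnsm_counts(["i want to make a call", "i want a card"])
print(len(states), len(trans))
```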

10

VNSM: Unseen Event Modeling
(the power of amnesia/reminiscence)

[Figure: back-off for an unseen word w4. If w4 was never observed after the history (w2, w3), the machine takes an ε-transition to the shorter history (w3), and if necessary backs off again to the empty history "".]

11

VNSM: probability distributions

• Probability distribution over W ∈ V*:

P(W = w_1 … w_N) = ∏_i P(w_i | w_{i-1}, …, w_{i-k_i})

where k_i ∈ (1, n) is the "variable" context length.

• Parameter tying
• Probability training (a back-off evaluation sketch follows)
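A minimal sketch of evaluating that product with back-off ε-transitions; the probability and back-off tables are assumed pretrained, and the names here are mine, not the paper's:

```python
import math

def sentence_logprob(words, ngram_prob, backoff_weight, max_n=3):
    """log P(W) = sum_i log P(w_i | longest available history).
    ngram_prob: (history_tuple, word) -> prob; backoff_weight:
    history_tuple -> weight of the epsilon (back-off) transition."""
    logp = 0.0
    for i, w in enumerate(words):
        hist = tuple(words[max(0, i - max_n + 1):i])
        weight = 1.0
        while hist and (hist, w) not in ngram_prob:
            weight *= backoff_weight.get(hist, 1.0)  # follow the epsilon arc
            hist = hist[1:]                          # forget the oldest word
        logp += math.log(weight * ngram_prob.get((hist, w), 1e-9))
    return logp
```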

12

Overview

• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study

– MT for Human-Machine Spoken Dialog

• Experiments and Results

14

Stochastic FST Machine Translation
• Decompose language translation into two independent processes:
– Lexical choice: searching for the target language words
– Word reordering: searching for the correct word order
• Model the two processes as stochastic finite-state transductions
• Learn the transductions from bilingual corpora.

Speech and language finite-state transduction chain
[Figure: cascade of transducers (C, L, X, B, T) from source spoken language to target spoken language, as before.]

15

Stochastic Machine Translation

• Noisy-channel paradigm (IBM):

Ŵ_T = argmax_{W_T} P(W_S | W_T) P(W_T)

• Stochastic Finite State Transducer model:

Ŵ_T = argmax_{W_T} P(W_S, W_T)   (Lexical Choice problem)
Ŵ_R = argmax_{W_R} P(W_R | Ŵ_T)   (Word Reordering problem)

[Figure: MT as a transduction from W_S to W_T.]

17

Learning Stochastic Transducers

• Given the input-output pair training set

• Align the input and output language sequences:

• Estimate the joint probability via VNSMs
• Local reordering
• Sentence-level reordering

(W_S, W_T) = (x_1 x_2 … x_N, y_1 y_2 … y_M)
F(W_S, W_T) = (x_1, y_1)(x_2, y_2) … (x_N, y_N),  y_i ∈ {y_1 … y_M} ∪ {ε}

18

Pairing and Aligning (1)

• Source-target language pairs
• Sentence alignment

– Automatic algorithm (Alshawi, Bangalore and Douglas, 1998)

Spanish : ajá quiero usar mi tarjeta de crédito

English : yeah I wanna use my credit card

Alignment : 1 3 4 5 7 0 6

19

Learning SFST from Bi-language

• Bi-language: each token pairs a source language word with its target language word.
• Ordering of tokens: source language order or target language order

W_S: ajá quiero usar mi tarjeta de crédito
W_T: yeah I wanna use my credit card
F(W_S, W_T): (ajá,yeah) (ε,I) (quiero,wanna) (usar,use) (mi,my) (tarjeta,card) (de,ε) (crédito,credit)

(See the construction sketch below.)
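Reading the bi-language off the alignment is mechanical; this sketch follows the convention in the example above (the i-th source word maps to target position a_i, with 0 meaning unaligned), and reproduces the token sequence shown:

```python
def bilanguage(source, target, alignment):
    """Build bi-language tokens in source order. a_i = 0 pairs the source
    word with epsilon; target words never aligned are inserted as
    (epsilon, word) at the point where their position is passed over."""
    EPS = "ε"
    used = {a for a in alignment if a > 0}
    pending = [j for j in range(1, len(target) + 1) if j not in used]
    tokens = []
    for word, a in zip(source, alignment):
        # flush unaligned target words that precede this alignment point
        while pending and a > 0 and pending[0] < a:
            tokens.append((EPS, target[pending.pop(0) - 1]))
        tokens.append((word, target[a - 1] if a > 0 else EPS))
    tokens += [(EPS, target[j - 1]) for j in pending]
    return tokens

src = "ajá quiero usar mi tarjeta de crédito".split()
tgt = "yeah I wanna use my credit card".split()
print(bilanguage(src, tgt, [1, 3, 4, 5, 7, 0, 6]))
# [('ajá','yeah'), ('ε','I'), ('quiero','wanna'), ('usar','use'),
#  ('mi','my'), ('tarjeta','card'), ('de','ε'), ('crédito','credit')]
```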

20

Learning Bilingual Phrases

• Effective translation of text chunks (e.g. collocations)
• Learn bilingual phrases
– Joint entropy minimization on the bi-language corpus
– Weighted Mutual Information to rank bilingual phrases (sketch below)
• Phrase-based VNST
• Local reordering of phrases

h((x_1, y_1)(x_2, y_2) … (x_N, y_N)) = ((x_1 x_2 x_3, y_1 y_2 y_3), (x_4 x_5, y_3 y_4), …)

Example: "una llamada de larga distancia" → VNST → "a call long distance" → local reordering → "a long distance call"
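One way to realize the ranking step; this sketch scores adjacent bi-language tokens by frequency-weighted pointwise mutual information, a stand-in for the joint-entropy-minimization criterion on the slide:

```python
import math
from collections import Counter

def rank_bigram_phrases(bilang_corpus):
    """Score adjacent bi-language tokens by weighted mutual information:
    freq(a,b) * log(p(a,b) / (p(a) p(b))). Each sentence is a list of
    (src, tgt) tokens; top-scoring pairs become candidate phrases."""
    uni, bi, total = Counter(), Counter(), 0
    for sent in bilang_corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
        total += len(sent)
    def score(pair):
        a, b = pair
        p_ab = bi[pair] / max(total - 1, 1)
        return bi[pair] * math.log(p_ab / ((uni[a] / total) * (uni[b] / total)))
    return sorted(bi, key=score, reverse=True)
```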

21

Local Reordering
• Spanish reordered phrase = min(λS ∘ λTLM)
– A full word-permutation machine is too expensive; λS is the "sausage" machine over the phrase words (y_1 y_2 … y_m), and λTLM is the target language model (sketch below).
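A brute-force rendering of the same idea: instead of composing the sausage machine with the target language model, enumerate permutations of the (short) phrase and score them with a bigram LM. Here bigram_logprob is an assumed pretrained function; this is only feasible because local reordering applies to short phrases:

```python
from itertools import permutations

def local_reorder(phrase_words, bigram_logprob):
    """Return the permutation of a short phrase that the target LM scores
    best; enumerating permutations stands in for min(λS ∘ λTLM)."""
    def score(seq):
        return sum(bigram_logprob(prev, w) for prev, w in zip(("<s>",) + seq, seq))
    return max(permutations(phrase_words), key=score)

# e.g. local_reorder(("a", "call", "long", "distance"), lm)
# -> ("a", "long", "distance", "call") under an English bigram LM
```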

22

Lexical Reordering
• Output of the lexical choice transducer: a sequence of target language phrases.
– like to make I'd call a calling card please
• Words within each phrase are in target language word order.
• However, the phrases themselves still need to be reordered into target language word order.
• Reordered:
– I'd like to make a calling card call please

23

Lexical Reordering Models
• Tree-based model
– Impose a tree structure on the sentence (Alshawi et al., ACL 1998)
– English: I'd like to charge this to my home phone

24

Lexical Reordering Models
• Reordering using tree-local reordering rules.

Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に

[Figure: the dependency tree over 私は (I), したいのです (like), チャージ (charge), これを (this), 私の (my), 家の (home), 電話に (phone), before and after applying the tree-local reordering rules.]

Japanese: 私は これを 私の 家の 電話に チャージ したいのです

25

Lexical Reordering Models (contd.)
• Dependency tree represented as a bracketed string (bounded depth) with reordering instructions:
:[ したいのです :したいのです :-1 :[ チャージ :チャージ :] :]
• Train VNSTs on the bracketed corpus.
• Output of the lexical reordering VNST: strings with reordering instructions.
• The instructions are composed with an "interpreter" FST to form the target language sentence (see the sketch below).
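For illustration, a tiny interpreter under one plausible reading of the instructions (the token format and the meaning of -1, "move this word after the subtree that follows it", are assumptions inferred from the example, not the paper's definition):

```python
def interpret(tokens):
    """Apply reordering instructions in a bracketed string.
    Assumed format: "[" opens a subtree, "]" closes it, and a (word, k)
    token is a word with offset k (None = keep position, -1 = move the
    word after the next bracketed subtree)."""
    def parse(i):
        out, deferred = [], []
        while i < len(tokens):
            t = tokens[i]
            if t == "[":
                sub, i = parse(i + 1)
                out.extend(sub)
                out.extend(deferred)  # flush words that waited for this subtree
                deferred = []
            elif t == "]":
                return out + deferred, i + 1
            else:
                word, k = t
                (deferred if k == -1 else out).append(word)
                i += 1
        return out + deferred, i
    return parse(0)[0]

toks = ["[", ("したいのです", -1), "[", ("チャージ", None), "]", "]"]
print(" ".join(interpret(toks)))  # チャージ したいのです
```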

26

Tree Reordering
• Sentence-level reordering
– Mapping sentence tree structures

English: my card credit (Spanish order) → English: my credit card (English order)

[Figure: a tree headed by "card" with children "my" and "credit"; three transductions rewrite the children's position indices from (-1, +1) to (-2, -1), with Transduction 2 driven by alignment statistics.]

28

ASR-based Speech Translation

[Figure: training and decoding pipeline. Training: sentence pairs (W_S, W_T) are aligned into F(W_S, W_T); VNST learning and bi-phrase learning yield the lexical choice transducers (λa, λb), and tree reordering yields λc; acoustic model training on (W_S, Speech) gives the acoustic model A, and the lexicon gives the lexicon FSM L. Decoding: the speech translator is the composition A ∘ L ∘ λa ∘ λb ∘ λc.]

29

Overview

• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study

– MT for Human-Machine Spoken Dialog

• Experiments and Results

30

MT Evaluation
• Lexical Accuracy (LA)
– Bag of words.
• Translation Accuracy (TA)
– Based on string alignment
• Application-driven evaluation
– "How May I Help You?"
– Spoken dialog for call routing
– Classification based on salient phrase detection

31

Automated Services and Customer Care

via Natural Spoken Dialog

• Prompt: "AT&T. How may I help you?"
• User responds with unconstrained, fluent speech
• Spoken dialog system for call routing

[Figure: call-routing classifier mapping a request to call types such as area code, billing credit, DA (directory assistance), rate, …, or HELP.]

32

Examples

• Yes I like to make this long distance call area code x x x x x x x x x x

• Yeah I need the area code for rockmart georgia

• Yeah I’m wondering if you could place this call for me I can’t seem to dial it it don’t seem to want to go through for me

33

Call-Classification Performance

• False Rejection Rate: the probability of rejecting a call, given that the call-type is in the 14 call-type set.

• Probability Correct: the probability of correctly classifying a call, given that the call is not rejected.

34

MT evaluation on HMIHY

35

DEMO

36

Conclusion
• The stochastic finite-state approach is viable and effective for limited-domain MT.
• A finite-state model chain captures complex speech and language constraints.
• Multilingual speech applications are enabled by MT.
• Coupling of ASR and MT.

http://www.research.att.com/~srini/Projects/Anuvaad/home.html

37

Bibliography
- J. Berstel, "Transductions and Context-Free Languages", Teubner Studienbücher.
- G. Riccardi, R. Pieraccini and E. Bocchieri, "Stochastic Automata for Language Modeling", Computer Speech and Language, 10, pp. 265-293, 1996.
- F. C. N. Pereira and M. Riley, "Speech Recognition by Composition of Weighted Finite Automata", in Finite-State Language Processing, MIT Press, Cambridge, Massachusetts, 1997.
- S. Bangalore and G. Riccardi, "Stochastic Finite-State Models for Spoken Language Machine Translation", Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000.

More references at http://www.research.att.com/info/dsp3

39

Stochastic Finite State Models: from concepts to speech (1993-1994)
• Variable Ngram Stochastic Automata (VNSA)
– Concept modeling for NLU
– Word sequence modeling for ASR
• Phonotactic transducers (context-to-phone)
• Tree-structured transducers (phone-to-word)
• Stochastic-FSM based ASR (context-to-concept)
• ATIS Evaluation: it actually worked!

40

Why did it work?
• Symbolic representation (SFSM) for probabilistic sequence modeling (words, concepts, …).
• Learning algorithms
– Cascade (phrase grammar -> {phrases, word classes} -> words)
• Machine combination
– Context-to-Phone, Phone-to-Word, Context-to-Grammar (CLG)
• Decoding is very simple and fast (Viterbi and beam-width search); a toy sketch follows.
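For flavor, a minimal beam search over the dictionary-encoded transducers from the earlier sketches (the beam width and probability flooring are arbitrary illustrative choices):

```python
import heapq
import math

def beam_decode(fst, words, start=0, beam=4):
    """Keep the `beam` best (cost, state, outputs) hypotheses per input word;
    fst is the dict encoding used above: (state, in_sym) -> [(out, next, p)]."""
    hyps = [(0.0, start, [])]
    for w in words:
        expanded = []
        for cost, q, out in hyps:
            for (sym, r, p) in fst.get((q, w), []):
                heapq.heappush(expanded, (cost - math.log(p), r, out + [sym]))
        hyps = heapq.nsmallest(beam, expanded)
        if not hyps:
            return None   # the input is not covered by the machine
    return min(hyps)[2]   # output labels of the best surviving hypothesis
```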

48

Multilingual Speech Processing

[Figure: an integrated ASR-MT engine hears the Spanish speech "tarjeta de credito" and outputs the English text "credit card".]

• The finite-state chain allows for:
– Speech and language coupling (e.g. prosody, recognition errors)
– Integrated multilingual processing

49

Speech Translation
• Previous approaches to speech translation
– Source language ASR
– Translation model
• Finite-state model based speech translation
– Source language acoustic model
– Lexical choice model
– Lexical reordering model

51

Learning the state space and state transition function (revised)
• For each suffix in the corpus, we create two states (one for string recognition and the other for back-off, via an ε-transition).
• The size of the automaton is still linear in the corpus size.
• The stochastic automaton can compute a word probability for all strings in X*!

52

Word Prediction

… the President of United ??? …

[Figure: state transition probabilities in a stochastic finite-state automaton/transducer. From the history "the president of United", arcs predict States/p1, Airlines/p2, ε/p3. Conditioning features shown: History(4) = "the president of United", PrevClass = Adj, PrevPrevClass = Function Word, Trigger(10) = "elections".]

53

Learning Lexical Choice Models

• English utterances recorded from customer calls.
• Manually translated into Japanese/Spanish.
• "Bunsetsu"-like tokenization for Japanese.
• Alignment
English: I'd like to charge this to my home phone
Japanese: 私は これを 私の 家の 電話に チャージ したいのです
Alignment: 1 7 0 6 2 0 3 4 5
• Bilanguage
I'd_私は like_したいのです to_ charge_チャージ this_これを to_ my_私の home_家の phone_電話に
(where "to_" pairs with ε)

54

Learning Bilingual Stochastic Transducers

• Learn stochastic transducers from the bilanguage (Embedded MT Workshop, 2000)
• Learn bilingual phrases automatically
– Reordering within phrases.

エイ ティー アンド ティー ↔ A T and T
私の 家の 電話に ↔ to my home phone
私は コレクト コールを かける 必要があります ↔ I need to make a collect call
tarjeta de credito ↔ credit card
una llamada de larga distancia ↔ a long distance call

55

Lexical Choice Transducer
• Language model: an N-gram model built on the phrase-chunked bilanguage.
• The combination of phrases and words maximizes predictive power and minimizes the number of parameters.
• The resulting finite-state automaton over the bilanguage vocabulary is converted into a finite-state transducer (sketch below).
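The conversion itself is mechanical: each bilanguage arc label (source, target) splits into an input and an output symbol. In the toy encoding used earlier:

```python
def automaton_to_transducer(automaton):
    """Turn a bilanguage automaton into a lexical choice transducer by
    splitting each arc label (src, tgt) into input and output symbols.
    automaton: (state, (src, tgt)) -> [(next_state, prob)]
    transducer: (state, src) -> [(tgt, next_state, prob)]"""
    fst = {}
    for (state, (src, tgt)), arcs in automaton.items():
        for (nxt, p) in arcs:
            fst.setdefault((state, src), []).append((tgt, nxt, p))
    return fst
```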


60

Translation using stochastic FSTs
• A sequence of finite-state transductions:

English: I'd like to charge this to my home phone
Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に
Japanese: 私は これを 私の 家の 電話に チャージ したいのです

[Figure: the dependency tree over 私は (I), したいのです (like), チャージ (charge), これを (this), 私の (my), 家の (home), 電話に (phone), before and after lexical reordering.]

61

Spoken Language Corpora

• Prompt: "How may I help you?"
• Examples
– Yeah I need the area code for rockmart georgia
– Yes I'd like to make this long distance call area code x x x x x x x x x x
– Yeah I'm wondering if you could place this call for me I can't seem to dial it it don't seem to want to go through for me
• Parallel corpora: English, Japanese, Spanish.

62

Evaluation Metric

• Evaluating MT output is a complex issue.
• String edit distance between the reference string and the result string (reference length in words: R)
– Insertions (I)
– Deletions (D)
– Substitutions (S)
– Moves = pairs of deletions and insertions (M)
– Remaining insertions (I') and deletions (D')
• Simple String Accuracy = 1 – (I + D + S) / R
• Generation String Accuracy = 1 – (M + I' + D' + S) / R
(A sketch of the simple metric follows.)
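The simple metric is one minus a normalized word-level edit distance; a direct implementation (the generation metric, which additionally pairs matching deletions and insertions into moves, is omitted here):

```python
def simple_string_accuracy(reference, result):
    """Simple String Accuracy = 1 - (I + D + S) / R, computed via
    word-level Levenshtein distance; R = reference length in words."""
    ref, hyp = reference.split(), result.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return 1 - d[len(ref)][len(hyp)] / len(ref)

print(simple_string_accuracy("I'd like to make a calling card call please",
                             "like to make I'd call a calling card please"))
```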

63

Experiments and Evaluation

• Data collection:
– The customer side of operator-customer conversations was transcribed
– Transcriptions were then manually translated into Japanese
• Training set: 12226 English-Japanese sentence pairs
• Test set: 3253 English sentences.
• Different translation models
– Local Reordering (LR), Sentence Reordering (SR)
– Word bigram and phrase bigram

64

Results
• English-Japanese translation accuracy (%):

                No LR    LR     SR
Word n-gram     57.0     57.0   67.5
Phrase n-gram   58.0     60.4   70.9

• Phrase-based models outperform word-based models.
• Accuracy improves with local reordering (LR) (from 58.0% to 60.4% for the phrase n-gram).
• Adding sentence-level reordering (SR) improves accuracy dramatically (from 58.0% to 70.9% for the phrase n-gram).

65

Summary
• FSTs provide a framework for the integration of modules.
• FST representation for lexical choice (Embedded MT Workshop, 2000).
• Here, lexical reordering as an FST.
• An integrated finite-state model for speech translation.

66

Application: How May I Help You?

67

Machine Translation

"When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"
(Warren Weaver, March 1947)

Is the process a finite-state decoding chain?

69

Bilingual Phrases (reordered)

A T and T ↔ A T y T
area code ↔ codigo de area
to my home phone ↔ al telefono de mi casa
charge it to my home phone ↔ cargarla al telefono de mi casa

70

Learning SFST from Bi-language

• The VNSM learning algorithm is applied to the bi-language training set
• Joint probability:

P(W_S, W_T) = ∏_i P((x_i, y_i) | (x_{i-1}, y_{i-1}) … (x_{i-n+1}, y_{i-n+1}))

– Unseen translation pairs (bi-language token back-off)
– Out-of-vocabulary source language words
• String-to-string mapping
– Ambiguous machine ("multiple translations")


72

Finite State versus Non-Finite State MT
• Head-Transducer MT (Alshawi, Bangalore and Douglas, 1998)

Spanish-English (text):     LA     TA
Finite-State                77.4   60.4
Head-Transducer             77.3   62.3

English-Japanese (speech):  LA     TA
Finite-State                77.6   58.1
Head-Transducer             78.7   58.7

73

Lexical Choice Accuracy (Spanish-to-English)

VNST order       Recall R   Precision P   F-Measure (2*P*R/(P+R))
Unigram          30.7       86.4          45.4
Bigram           69.0       84.4          75.8
Trigram          66.8       79.0          72.1
Phrase Unigram   45.6       86.6          59.7
Phrase Bigram    72.2       83.4          77.4
Phrase Trigram   71.9       83.1          77.1

74

Understanding Spontaneous Speech

1. Transcr: yes I wanted to charge a call to my business phone

ASR+parse: yes I_WANNA_CHARGE call to distance phone

Transd+decision: {THIRD NUMBER 0.75}

2. Transcr: hi I like to stick it on my calling card please

ASR+parse: hi I'D_LIKE_TO_SPEAK ON_MY_CALLING_CARD please

Transd+decision: {PERSON_PERSON 0.78} {CALLING CARD 0.96}

3. Transcr: I'm trying to find out a particular staple which fits one of your guns or a your desk staplers

ASR+parse: I'm trying TO_FIND_OUT they've CHECKED this is still call which six one of your thompson area state for ask stay with the

Transd+decision: {RATE 0.25} {OTHER 0.86}