Overcoming the Lack of Parallel Data in Machine Translation Kevin Knight & David Chiang USC/ISI MURI Review November 14, 2014 Two Talks: Exploiting monolingual data Exploiting deeper representations


Page 1: Overcoming the Lack of Parallel Data in Machine TranslationMURI/Presentations/year4-2014-11-14/...2014/11/14  · Overcoming the Lack of Parallel Data in Machine Translation Kevin

Overcoming the Lack of Parallel Data in Machine Translation

Kevin Knight & David Chiang USC/ISI

MURI Review

November 14, 2014

Two Talks:
• Exploiting monolingual data
• Exploiting deeper representations


Talk #1: Exploiting Monolingual Data

cross-site collaboration (ISI/UT/CMU/MIT): using dependency parsers and word aligners to extract translation patterns from non-parallel text


Exploiting Monolingual Data

Malagasy text

Deciphering Engine

Malagasy/English translation dictionary, models for use in MT

Decipherment into English

treat Malagasy as a code for English... and decode


Step by Step

Letter Substitution Ciphers [Ravi/Knight 08, Ravi/Knight 09a]

Phoneme Substitution [Ravi/Knight 09b]

Word Substitution Ciphers [Ravi/Knight 11a, Dou/Knight 12]

Foreign Language as a Cipher for English [Ravi/Knight 11a, Dou/Knight 12, Dou/Knight 13, Dou/Vaswani/Knight 14]

Historical Ciphers [Snyder/Barzilay/Knight 10, Knight/Megyesi/Schaefer 11, Ravi/Knight 11b, Reddy/Knight 11]

Timeline: 2008 2009 2010 2011 2012 2013 2014


Letter Substitution Cipher


Letter Frequencies

2-grams:        3-grams:
? -    99       ? - ^    47
C :    66       C : G    23
- ^    49       Y ? -    22
: G    48       y ? -    18
z )    44       H C |    17

Tendencies:

A, E, I, O, U followed by 3 and j
A, E, I, O, U preceded by z and >

[Figure: bar chart of cipher-symbol frequencies, y-axis 0 to 450. Symbols along the x-axis: ^ | z G- C Z j ! 3 Y ) U y + O F H = : I > b g RM E X c ? 6 K N n < / Q ~ A D p B P " S l Lkm1 & e 5f v h rJ 7 i T s o ] a t d u89[ 0w_ W 4 q @x2#, ` \*%]


Letter Distributions

?]8R j 3 |^+C~DgBF/[4TM15-: 7Q >z6X9s qxJmvknwtrfhoai Lc bp uKei W”=Gd&<)OAZUEI y!Y PHN

unaccented Roman letters

circumflexed vowels

underlined letters

letters grouped if they have similar contexts (L/R neighbors)

thanks Jon Graehl


Statistical Modeling

Generative story: a language model (LM) P(p), trained on plaintext samples unrelated to the ciphertext, produces plaintext p; the “key” P(c | p) then turns plaintext p into ciphertext c.

EM: Find substitution-table values that maximize $P(c) = \sum_p P(p, c) = \sum_p P(p)\, P(c \mid p)$

Viterbi: Find plaintext p that maximizes $P(p \mid c) \propto P(p)\, P(c \mid p)$ (best guess plaintext p)
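The EM step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: it assumes a letter-bigram language model with add-one smoothing, a randomly initialized substitution table, and a toy plaintext sample and key. All names here (`train_bigram_lm`, `forward_backward`, `em_decipher`, `SAMPLE`, `KEY`) are hypothetical. The language model stays fixed and only the table P(c | p) is re-estimated.

```python
import math
import random


def train_bigram_lm(sample, alphabet):
    """Add-one smoothed letter-bigram model P(next | prev) from plaintext samples."""
    counts = {a: {b: 1.0 for b in alphabet} for a in alphabet}
    for prev, nxt in zip(sample, sample[1:]):
        counts[prev][nxt] += 1.0
    trans = {a: {b: v / sum(row.values()) for b, v in row.items()}
             for a, row in counts.items()}
    start = {a: 1.0 / len(alphabet) for a in alphabet}
    return start, trans


def forward_backward(cipher, states, start, trans, emit):
    """Scaled forward-backward; returns per-position posteriors and log P(c)."""
    n = len(cipher)
    alpha, scale = [], []
    for i, c in enumerate(cipher):
        row = {}
        for p in states:
            prior = start[p] if i == 0 else sum(
                alpha[i - 1][q] * trans[q][p] for q in states)
            row[p] = prior * emit[p][c]
        z = sum(row.values())
        alpha.append({p: v / z for p, v in row.items()})
        scale.append(z)
    beta = [None] * n
    beta[n - 1] = {p: 1.0 for p in states}
    for i in range(n - 2, -1, -1):
        nxt = cipher[i + 1]
        beta[i] = {p: sum(trans[p][q] * emit[q][nxt] * beta[i + 1][q]
                          for q in states) / scale[i + 1] for p in states}
    gamma = [{p: alpha[i][p] * beta[i][p] for p in states} for i in range(n)]
    return gamma, sum(math.log(z) for z in scale)


def em_decipher(cipher, alphabet, start, trans, iterations=15, seed=0):
    """Re-estimate the substitution table P(c | p); the LM is held fixed."""
    rng = random.Random(seed)
    symbols = sorted(set(cipher))
    emit = {}
    for p in alphabet:
        weights = [rng.random() + 0.1 for _ in symbols]
        emit[p] = {c: w / sum(weights) for c, w in zip(symbols, weights)}
    history = []
    for _ in range(iterations):
        gamma, loglik = forward_backward(cipher, alphabet, start, trans, emit)
        history.append(loglik)
        counts = {p: {c: 0.0 for c in symbols} for p in alphabet}
        for i, c in enumerate(cipher):
            for p in alphabet:
                counts[p][c] += gamma[i][p]          # expected counts (E-step)
        for p in alphabet:
            total = sum(counts[p].values())
            if total > 0:                            # normalize (M-step)
                emit[p] = {c: v / total for c, v in counts[p].items()}
    return emit, history


# Toy data: a plaintext sample and a hypothetical secret key.
SAMPLE = "the cat sat on the mat and the rat ate the oat "
ALPHABET = sorted(set(SAMPLE))
KEY = {p: chr(ord("A") + i) for i, p in enumerate(ALPHABET)}
CIPHER = "".join(KEY[ch] for ch in "the rat sat on the mat")

start, trans = train_bigram_lm(SAMPLE, ALPHABET)
emit, history = em_decipher(CIPHER, ALPHABET, start, trans)
```

`history` holds log P(c) before each update, so it should never decrease. On such a tiny ciphertext the learned table is not expected to recover the key; the Viterbi step would then search for the best plaintext under the learned table.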


Letter Substitution: Results

[Ravi & Knight 08]

Plus other methods, such as ones based on integer linear programming.


Word Substitution

Each code number represents a plaintext word, not letter


Encipherment Key

Decipherment Key

Word Substitution Keys


Word Substitution

[Dou & Knight 2012]


Foreign Language as a Code for English

!l@!m !lywm !lth!ny& !l@!m !lm!Dy Sfr @!m th!ny& @!m 1992 @!m 1993 ywm !l!sbw@ !lm!Dy fy !ldqyq& !lsn& !lj!ry& !lsn& !lsh=hr !lm!Dy !lsh=hr !lj!ry snw!t sn& =hdh! !l@!m s!@& !l@Sr @!m 1991

!l@Swr =hdh! !lsh=hr fy ywm nys!n !sbw@ =hdh=h !l!'y!m qbl !'y!m fy !l@Sr mn !lsn& !lsnw!t b@d ywm !l!y!m 13 nys!n 1994 !lth!ny& @shr& thl!th& !y!m qbl !sbw@yn fy !lywm !lt!ly sh@b!n tmwz 3 dhw !lHj& 1414 fy shb!T !lm!Dy qbl ywmyn

@!m 1990 w!lth!ny& fy !lywm mn !lsh=hr !lj!ry !lqrn !'y!m @!m!aN !ls!@& 17 shb!T 1994 thl!th snw!t dqyq& =hdh=h !lsn& ywmyn mn !l@!m !lm!Dy !lsn& !lmqbl& fy !lsn& kl ywm fy !l@!m !lm!Dy


Count  Phrase
13     4 Hzyr!n 1967
12     fy 12 Hzyr!n 1993
7      5 Hzyr!n 1967
6      fy 30 Hzyr!n 1989
6      30 Hzyr!n 1989
4      fy 30 Hzyr!n 1994
4      fy 30 Hzyr!n 1993
3      fy 19 Hzyr!n 1967
2      ywm 30 Hzyr!n 1989
2      w 6 Hzyr!n 1994
2      qbl 5 Hzyr!n 1967
2      fy 9 Hzyr!n 1967
2      fy 7 Hzyr!n 1981
2      fy 6 Hzyr!n 1994
2      fy 5 Hzyr!n 1967
2      fy 30 Hzyr!n 1995
2      fy 18 Hzyr!n 1994
2      fy 14 Hzyr!n 1993
2      fy 14 Hzyr!n 1991
2      fy 12 Hzyr!n 1990
2      7 Hzyr!n 1994
2      6 Hzyr!n 1941
2      26 Hzyr!n 1994
2      21 Hzyr!n 1994
2      1 Hzyr!n 1994
2      19 Hzyr!n 1965
2      18 Hzyr!n 1994
2      18 Hzyr!n 1940
2      12 Hzyr!n 1993
2      11 Hzyr!n 1994

<n> Hzyr!n <n>

Foreign Language as a Code for English


Search query          Documents
January 4, 1967       8040
February 4, 1967      9270
March 4, 1967         10700
April 4, 1967         21800
May 4, 1967           14000
June 4, 1967          39300
July 4, 1967          12600
August 4, 1967        7970
September 4, 1967     7390
October 4, 1967       8800
November 4, 1967      6560
December 4, 1967      9770
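One way to read the table is as a frequency argument. If `<n> Hzyr!n <n>` is a date pattern and "4 Hzyr!n 1967" is its most frequent instance, then the English month that gets the most hits for "<month> 4, 1967" is a natural candidate translation of Hzyr!n. A minimal Python sketch follows. The counts are copied from the table, and the names `DOCUMENT_COUNTS` and `most_likely_month` are illustrative only.

```python
# Document counts for the query "<month> 4, 1967", copied from the table above.
DOCUMENT_COUNTS = {
    "January": 8040, "February": 9270, "March": 10700, "April": 21800,
    "May": 14000, "June": 39300, "July": 12600, "August": 7970,
    "September": 7390, "October": 8800, "November": 6560, "December": 9770,
}


def most_likely_month(counts):
    """Return the month whose date query matches the most documents."""
    return max(counts, key=counts.get)


best = most_likely_month(DOCUMENT_COUNTS)
print(best, DOCUMENT_COUNTS[best])
```

The same comparison could be repeated for other frequent dates in the foreign text to check that the candidate is stable.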


Parsing Helps Decipherment

How much foreign text (running words)

Accuracy, learned bilingual dictionary

Decipherment with parsing (Dou/Knight 2013)

Spanish/English

* of most freq 5000 word types, 1-best translation in parallel dict

*

Decipherment without parsing (Dou/Knight 2012)

Adjacent bigrams
  Spanish            English
  Naciones Unidas    United Nations
  perros corren      dogs run
  piedra azul        blue rock
  no necesito        need not

Dependency bigrams
  Spanish            English
  Naciones Unidas    Nations United
  corren perros      run dogs
  piedra azul        rock blue
  necesito no        need not

?

Exploit parsers in both languages
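The contrast between the two bigram types can be sketched in Python. This is an illustration under stated assumptions: the head indices below are hand-assigned for two-word toy phrases rather than produced by a real parser, and the function names are hypothetical. A dependency bigram is written as (head, dependent), so word-order differences between the languages disappear.

```python
def adjacent_bigrams(tokens):
    """Pairs of neighboring words, in surface order."""
    return list(zip(tokens, tokens[1:]))


def dependency_bigrams(tokens, heads):
    """(head, dependent) pairs; heads[i] is the index of token i's head, or None for the root."""
    return [(tokens[h], tokens[i]) for i, h in enumerate(heads) if h is not None]


# Hand-assigned toy parses: the adjective depends on the noun in both languages.
english = (["blue", "rock"], [1, None])
spanish = (["piedra", "azul"], [None, 0])

print(adjacent_bigrams(english[0]))    # adjective first
print(adjacent_bigrams(spanish[0]))    # noun first
print(dependency_bigrams(*english))    # noun (head) first
print(dependency_bigrams(*spanish))    # noun (head) first
```

With adjacent bigrams the English and Spanish pairs are in opposite order, while the dependency bigrams line up as (noun, adjective) on both sides, which is the property the slide relies on.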


Accurate Parsing Is Important

[Figure: Malagasy/English decipherment accuracy** (y-axis, 0.5 to 8.5) against number of tokens (100k, 1m, 10m). Curves: Adjacent, Dep1, Dep2, Dep3. Callouts: Malagasy parser #1, Malagasy parser #2, Malagasy parser #3]

** of most freq 5000 word types, any of 5-best in parallel dict

Parser notes:
• Increased parsing data by manual projection through parallel data using online dictionary
• Improve POS tags with parallel data. UT tagger + CMU Turboparser.
• CMU Turbotagger and Turboparser

Malagasy      English
maro          many
monisipaly    municipal
ratsy         bad
midadasika    large
vavy          female
lalina        fundamental
manokana      special
taitra        surprised


Combining Parallel and Non-Parallel Data

small word-aligned parallel corpus

decipherment of large monolingual corpus

improve decipherment by seeding with dictionary derived from parallel data

improve alignment of parallel data using large monolingual resources

improved machine translation

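Joint objective function:

$$\Big[\prod_{(e,f)} P(f \mid e)\Big]^{\alpha} \cdot \prod_{f} \sum_{e} P(e)\, P(f \mid e)$$

The first factor is computed on the parallel data, the second on the non-parallel data.

find bilingual dictionary that makes both model components happy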
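A small Python sketch can make the joint objective concrete. It assumes that α acts as an exponent weighting the parallel-data term, so in log space the objective is α times the parallel log-likelihood plus the monolingual log-likelihood. The dictionaries, probabilities, and names below (`joint_log_objective`, `good`, `swapped`, `p_e`) are toy values chosen for illustration, not results from the talk.

```python
import math


def joint_log_objective(t, aligned_pairs, mono_tokens, p_e, alpha):
    """alpha * sum log P(f|e) over aligned pairs + sum log sum_e P(e) P(f|e) over monolingual tokens."""
    parallel = sum(math.log(t[e][f]) for e, f in aligned_pairs)
    mono = sum(math.log(sum(p_e[e] * t[e].get(f, 0.0) for e in p_e))
               for f in mono_tokens)
    return alpha * parallel + mono


# Toy bilingual dictionaries P(f | e); all values are hypothetical.
good = {"dog": {"alika": 0.9, "saka": 0.1}, "cat": {"alika": 0.2, "saka": 0.8}}
swapped = {"dog": {"alika": 0.1, "saka": 0.9}, "cat": {"alika": 0.8, "saka": 0.2}}
p_e = {"dog": 0.5, "cat": 0.5}          # toy English unigram model

pairs = [("dog", "alika")]              # word-aligned parallel data
mono = ["saka", "alika"]                # monolingual foreign tokens

score_good = joint_log_objective(good, pairs, mono, p_e, alpha=2.0)
score_swapped = joint_log_objective(swapped, pairs, mono, p_e, alpha=2.0)
```

In this toy setting both dictionaries explain the monolingual tokens equally well, so the parallel term is what separates them. That is one way to see why a small amount of aligned data can anchor a decipherment model.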


Combining Parallel and Non-Parallel Data

Small bilingual Malagasy/English text (need to align words [Brown et al 93])

Large Malagasy monolingual text (need to decipher [Dou & Knight 13])

Decipherment helps Word Alignment

Decipherment helps Machine Translation

joint

Bleu


Decipherment: Next Steps

• Integrate decipherment and foreign language parsing
  – “project English syntax through non-parallel data”
  – learn better Malagasy parser automatically
• Fully integrated processing:
  – decipherment + alignment + parsing
• More accurate decipherment
  – word classes/embeddings
• Open-source tool for decipherment

Need to keep going!


Work by Others on Decipherment

• "Simple Effective Decipherment via Combinatorial Optimization," (T. Berg-Kirkpatrick and D. Klein), Proc. EMNLP, 2011.
• "Deciphering Foreign Language by Combining Language Models and Context Vectors," (M. Nuhn, A. Mauser, and H. Ney), Proc. ACL, 2012.
• "Decipherment Complexity in 1:1 Substitution Ciphers," (M. Nuhn and H. Ney), Proc. ACL, 2013.
• "Beam Search for Solving Substitution Ciphers," (M. Nuhn, J. Schamper, and H. Ney), Proc. ACL, 2013.
• "Unsupervised Consonant-Vowel Prediction over Hundreds of Languages," (Y. Kim and B. Snyder), Proc. ACL, 2013.
• "EM Decipherment for Large Vocabularies," (M. Nuhn and H. Ney), Proc. ACL, 2014.
• "Scalable Decipherment for Machine Translation via Hash Sampling," (S. Ravi), Proc. ACL, 2013.
• "Decipherment with a Million Random Restarts," (T. Berg-Kirkpatrick and D. Klein), Proc. EMNLP, 2013.
• "Combining Bilingual and Comparable Corpora for Low Resource Machine Translation," (A. Irvine and C. Callison-Burch), Proc. WMT, 2013.
• "Hallucinating Phrase Translations for Low Resource MT," (A. Irvine and C. Callison-Burch), Proc. CoNLL, 2014.
• "Solving Substitution Ciphers with Combined Language Models," (B. Hauer, R. Hayward, and G. Kondrak), Proc. COLING, 2014.


Talk #2: Exploiting Deeper Representations

cross-site collaboration (ISI/CMU):
• mapping language onto meaning
• mapping meaning onto language
• combining these for meaning-based MT


Why Meaning-Based MT?

• That’s what translation is:
  – build grammatical target text…
  – that preserves the meaning of the source

Oh, we got the meaning wrong…
- or -
We got the right meaning, but rendered it disfluently…


Meaning-Based MT

• What content goes into the meaning representation?
  Abstract Meaning Representation (AMR)
• How are meaning representations probabilistically generated, transformed, scored, ranked? How to represent knowledge that drives these processes?
  Automata theory, efficient algorithms
• How can a full MT system be built?
  Engineering, modeling, features, training

MURI


Machine Translation Automata

Phrase-based MT

Syntax-based MT

source string

target string

source string

source tree

target tree

target string


Finite-State (String) Transducer (FST)

Rules (state, input symbol → next state, output; *e* is the empty output):

q  k → q2      *e*
q2 n → q       N
q  i → q       AY
q  g → q3      *e*
q3 h → q4      *e*
q4 t → qfinal  T

Original input: k n i g h t

Transformation:

State     Remaining input    Output so far
q         k n i g h t
q2        n i g h t
q         i g h t            N
q         g h t              N AY
q3        h t                N AY
q4        t                  N AY
qfinal                       N AY T

[Figure: FST state diagram with states q, q2, q3, q4, qfinal and arcs k : *e*, n : N, i : AY, g : *e*, h : *e*, t : T]
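The walk through "knight" can be reproduced with a short Python sketch. The rule table is copied from the slides; the dictionary layout and the function name `transduce` are illustrative choices, and the empty output *e* is represented by an empty string.

```python
# (state, input symbol) -> (next state, output); "" stands for the empty output *e*.
RULES = {
    ("q", "k"): ("q2", ""),
    ("q2", "n"): ("q", "N"),
    ("q", "i"): ("q", "AY"),
    ("q", "g"): ("q3", ""),
    ("q3", "h"): ("q4", ""),
    ("q4", "t"): ("qfinal", "T"),
}


def transduce(letters, start="q", final="qfinal"):
    """Run the deterministic FST; return the output symbols, or None if the input is rejected."""
    state, output = start, []
    for letter in letters:
        if (state, letter) not in RULES:
            return None                      # no applicable rule
        state, emitted = RULES[(state, letter)]
        if emitted:
            output.append(emitted)
    return output if state == final else None


print(transduce("knight"))
```

This sketch is deterministic and unweighted. A weighted or nondeterministic transducer would keep a set of (state, output, weight) hypotheses instead of a single state.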


Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)

Original input:

S(NP(PRO(he)), VP(VBZ(enjoys), NP(SBAR(VP(VBG(listening), P(to), NP(music))))))

Rules shown (with weights):

q S(x0:NP, VP(x1:VBZ, x2:NP)) → s x0, wa, r x2, ga, q x1      0.2
s NP(PRO(he)) → kare                                          0.7

Transformation (NP(...) abbreviates NP(SBAR(VP(VBG(listening), P(to), NP(music))))):

1. q S(NP(PRO(he)), VP(VBZ(enjoys), NP(...)))
2. s NP(PRO(he)) , wa , r NP(...) , ga , q VBZ(enjoys)
3. kare , wa , r NP(...) , ga , q VBZ(enjoys)

Final output:

kare , wa , ongaku , o , kiku , no , ga , daisuki , desu
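The first two rewriting steps can be sketched in Python. Only the two rules visible on the slides are encoded: the S rule with weight 0.2 and the "he" rule with weight 0.7. Any (state, subtree) pair without a matching rule is left pending instead of being guessed. The tree encoding, the bracketing of the lower part of the tree, and the names `TREE`, `labels`, and `transduce` are assumptions made for this illustration.

```python
# A tree is a (label, children) pair; a word at a leaf is a plain string.
TREE = ("S", [
    ("NP", [("PRO", ["he"])]),
    ("VP", [
        ("VBZ", ["enjoys"]),
        ("NP", [("SBAR", [("VP", [("VBG", ["listening"]),
                                   ("P", ["to"]),
                                   ("NP", ["music"])])])]),
    ]),
])


def labels(children):
    """Root labels of a list of children (leaf words count as their own label)."""
    return [c if isinstance(c, str) else c[0] for c in children]


def transduce(state, tree):
    """Apply the slide's rules top-down; return (output items, derivation weight)."""
    label, kids = tree
    # q S(x0:NP, VP(x1:VBZ, x2:NP)) -> s x0, wa, r x2, ga, q x1   [0.2]
    if state == "q" and label == "S" and labels(kids) == ["NP", "VP"]:
        x0, vp = kids
        if labels(vp[1]) == ["VBZ", "NP"]:
            x1, x2 = vp[1]
            output, weight = [], 0.2
            for item in [("s", x0), "wa", ("r", x2), "ga", ("q", x1)]:
                if isinstance(item, str):
                    output.append(item)
                else:
                    sub_output, sub_weight = transduce(*item)
                    output.extend(sub_output)
                    weight *= sub_weight
            return output, weight
    # s NP(PRO(he)) -> kare   [0.7]
    if state == "s" and tree == ("NP", [("PRO", ["he"])]):
        return ["kare"], 0.7
    # No rule on the slides for this configuration: keep it pending.
    return [(state, tree)], 1.0


output, weight = transduce("q", TREE)
```

The result corresponds to the intermediate configuration "kare , wa , r NP(...) , ga , q VBZ(enjoys)". Completing the translation would need further rules for states r and q that the slides do not show.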


General-Purpose Algorithms for Tree Automata

N-best …
  String automata: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998)
  Tree automata: … trees in a weighted forest (Jiménez & Marzal, 2000; Huang & Chiang, 2005)

EM training
  String automata: Forward-backward EM (Baum/Welch, 1971; Eisner 2003)
  Tree automata: Tree transducer EM training (Graehl & Knight, 2004)

Determinization …
  String automata: … of weighted string acceptors (Mohri, 1997)
  Tree automata: … of weighted tree acceptors (Borchardt & Vogler, 2003; May & Knight, 2005)

Intersection
  String automata: WFSA intersection
  Tree automata: Tree acceptor intersection

Applying transducers
  String automata: string → WFST → WFSA
  Tree automata: tree → TT → weighted tree acceptor

Transducer composition
  String automata: WFST composition (Pereira & Riley, 1996)
  Tree automata: Many tree transducers not closed under composition (Maletti et al 09)

General-purpose tools
  String automata: Carmel, OpenFST
  Tree automata: Tiburon (May & Knight 10)


Machine Translation

Phrase-based MT

Syntax-based MT

Meaning-based MT source string

meaning graphs

target string

source string

target string

source string

source tree

target tree

target string

source tree

target tree


Semantic Graphs

“Pascale was charged with public intoxication and resisting arrest.”

15,000 sentences have been annotated with Abstract Meaning Representation (AMR) in [Banarescu et al 13].


Abstract Meaning Representation (AMR)

Pascale was charged with public intoxication and resisting arrest.

(c / charge-05
   :ARG1 (p / person
            :name (n / name :op1 "Pascale"))
   :ARG2 (a / and
            :op1 (i / intoxicate-01
                    :ARG1 p
                    :location (p2 / public))
            :op2 (r / resist-01
                    :ARG0 p
                    :ARG1 (a2 / arrest-01 :ARG1 p))))

PropBank frames

Named entities of 80 types

Entities play multiple roles (coreference)

100 semantic roles

Implicit roles

Modality

Negation

Questions

Full exploitation of predicates

Bond investors might not react.

(p / possible
   :domain (r / react-01
              :polarity -
              :ARG0 (p2 / person
                       :ARG0-of (i / invest-01
                                   :ARG1 (b / bond)))))

Abstraction from POS Light Verbs Cause Sub-events etc.
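The claim that entities play multiple roles can be checked mechanically by reading an AMR string into triples and counting incoming edges per variable. The parser below is a minimal sketch for the bracketed notation on this slide, not a full AMR reader: it ignores comments, inverse-role normalization, and error recovery, and the names `TOKEN`, `parse_amr`, and `AMR` are hypothetical. The second `a` variable is written `a2` so that variable names are unique.

```python
import re

# Tokens: parentheses, slash, :roles, quoted strings, and bare atoms.
TOKEN = re.compile(r'\(|\)|/|:[^\s()]+|"[^"]*"|[^\s()/]+')


def parse_amr(text):
    """Return (root variable, list of (source, role, target) triples)."""
    tokens = TOKEN.findall(text)
    pos = 0
    triples = []

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        var, concept = tokens[pos + 1], tokens[pos + 3]
        assert tokens[pos + 2] == "/"
        pos += 4
        triples.append((var, ":instance", concept))
        while tokens[pos] != ")":
            role = tokens[pos]
            pos += 1
            if tokens[pos] == "(":
                target = node()              # nested node: recurse
            else:
                target = tokens[pos]         # constant or re-used variable
                pos += 1
            triples.append((var, role, target))
        pos += 1                             # consume ")"
        return var

    return node(), triples


AMR = '''(c / charge-05
   :ARG1 (p / person :name (n / name :op1 "Pascale"))
   :ARG2 (a / and
      :op1 (i / intoxicate-01 :ARG1 p :location (p2 / public))
      :op2 (r / resist-01 :ARG0 p :ARG1 (a2 / arrest-01 :ARG1 p))))'''

root, triples = parse_amr(AMR)
incoming_p = [(s, r) for s, r, t in triples if t == "p" and r != ":instance"]
```

Here `incoming_p` lists every role in which the person `p` participates. A variable with more than one incoming edge is what makes the structure a graph and not a tree.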


Graph Automata for NLU and NLG

N-best answer extraction
  String automata: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998)
  Tree automata: … trees in a weighted forest (Jiménez & Marzal, 2000; Huang & Chiang, 2005)

Unsupervised EM training
  String automata: Forward-backward EM (Baum/Welch, 1971; Eisner 2003)
  Tree automata: Tree transducer EM training (Graehl & Knight, 2004)

Determinization, minimization
  String automata: … of weighted string acceptors (Mohri, 1997)
  Tree automata: … of weighted tree acceptors (Borchardt & Vogler, 2003; May & Knight, 2005)

Intersection
  String automata: WFSA intersection
  Tree automata: Tree acceptor intersection

Application of transducers
  String automata: string → WFST → WFSA
  Tree automata: tree → TT → weighted tree acceptor

Composition of transducers
  String automata: WFST composition (Pereira & Riley, 1996)
  Tree automata: Many tree transducers not closed under composition (Maletti et al 09)

Software tools
  String automata: Carmel, OpenFST
  Tree automata: Tiburon (May & Knight 10)
  Graph automata: ISI jointly with CMU & ND

Graph automata algorithms

Investigating:
• Linguistically adequate representations
• Efficient algorithms

Using them in:
• Text → Meaning (NLU)
• Meaning → Text (NLG)
• Meaning-based MT


Mapping Between Meaning and Text

Text: the boy wants to see
Graph: WANT instance with agent = BOY instance and patient = SEE instance; SEE agent = the same BOY
Text: Umuhungu arashaka kubona.

Text: the boy wants to be seen
Graph: WANT instance with agent = BOY instance and patient = SEE instance; SEE patient = the same BOY
Text: Umuhungu arashaka kubonwa.

Text: the boy wants to see the girl
Graph: WANT instance with agent = BOY instance and patient = SEE instance; SEE agent = the same BOY, SEE patient = GIRL instance
Text: Umuhungu arashaka kubona umukobwa

Text: the boy wants to see himself
Graph: WANT instance with agent = BOY instance and patient = SEE instance; SEE agent and SEE patient = the same BOY
Text: Umuhungu arashaka kwibona.


DAG-to-Tree Transducer [Kamimura & Slutski 82; Quernheim & Knight 2012ab]

• Bottom-up transformation of graph to tree


Hyperedge Replacement Grammar [Drewes et al 97; Chiang et al 2013]

probabilistic rules

initial graph

final graph


HRG Derivation

LET’S DERIVE THIS:

WANT (ARG0: B, ARG1: BELIEVE (ARG0: G, ARG1: WANT (ARG1: B)))
= boy wants girl to believe that he is wanted

1. Start with a WANT instance whose ARG0 is B and whose ARG1 is a nonterminal X that is also attached to B.
   “the boy wants something involving himself”
2. Rewrite X as a BELIEVE instance whose ARG0 is G and whose ARG1 is a new nonterminal X, still attached to B.
   “the boy wants the girl to believe something involving him”
   The remaining X stands for “something involving B”.
3. Rewrite X as a WANT instance whose ARG1 is B.

FINISHED!


Synchronous Hyperedge Replacement Grammar (SHRG)

[Chiang et al 13]

• Each SHRG rule outputs a graph fragment and a tree fragment simultaneously

• Used for transducing meaning to language, and vice-versa


SHRG Derivation

1. Graph: WANT instance with ARG0 B and a nonterminal X
   Tree: S (B wants INF)
   “the boy wants something involving himself”
2. Graph: X rewritten as a BELIEVE instance (the ARG1 of WANT) with ARG0 G and a new nonterminal X
   Tree: INF rewritten as (to believe G S)
   “something involving B”
3. Graph: X rewritten as a WANT instance with ARG1 B
   Tree: S rewritten as (he is wanted)

FINISHED!


Formal Properties of HRG Acceptors

novel algorithm

very pleasant!

mildly unpleasant

d=input graph outdegree T=treewidth complexity


Formal Properties of SHRG transducers

New device


New Work

• DAG-to-tree transducer formalism
  – Prague JHU workshop, summer 2014
    • David Chiang, Dan Gildea (US)
    • Frank Drewes, Giorgio Satta (Europe)
• Linguistic suitability of formalisms
  – Explain meaning/string corpora


                               Strings           Graphs
                               FSA     CFG       DAG acceptor      HRG
probabilistic                  yes     yes       yes               yes
intersects with finite-state   yes     yes       yes               yes
EM training                    yes     yes       yes               yes
transduction                   O(n)    O(n^3)    O(|Q|^(T+1) n)    O((3^d n)^(T+1))
implemented                    yes     yes       yes               yes

Results for DAG automata

d = graph degree for AMR, high in practice
T = treewidth complexity for AMR, low in practice (2-3)


Tree transducers (over 30 years):
• invented tree transducers
• worked out basic theoretical properties of tree transducers
• efficient algorithms for k-best, EM, etc + tools + high impact on practical machine translation

Graph grammars:
• invented graph grammars & basic recognition
• invented synchronous graph grammars for transduction. also: probabilities, training algorithms, theorems, toolkits.
• improved algorithms (Prague 2014 summer workshop)
• aiming at high impact on MT


Linguistic Suitability of Formalisms

• Concisely capture all of the graph/string pairs in a corpus

• By manually building linguistic mapping knowledge

then, measure coverage & conciseness


“Stress Test” Data

• 10,000 smallest semantic graphs composed of:
  – Predicates BELIEVE and WANT
  – Entities BOY and GIRL
• Plus 10 English string realizations of each graph

He wants her to believe he wants her.


Solutions We Designed and Tested

All transducer cascades are bidirectional: we run forwards for NL generation task, and backwards for NL understanding task.

graph SHRG

string tree graph

string

graph DAG2Tree string tree xLNTs (take yield)

tree DAG2Tree (tree-ify)

xLNT (introduce verbs)

xLNTs (take yield) tree

xLNT (introduce pronouns)


Empirical Results

Data is available -- amr.isi.edu/download/boygirl.tgz

We hope that others will continue to design more elegant, efficient formalisms to capture the meaning/text relation!


Novel Contributions

• Hyperedge Replacement Grammar (HRG)
  – synchronous version for transduction [ACL 2013]
  – proof-of-concept MT system [COLING 2012]
• Novel algorithm for graph parsing [ACL 2013]
• Empirical fitness to linguistic data [LREC 2014]
• Map of theoretical and computational properties
  – closure, complexity [FSMNLP 2015, subm.]


Bolinas Graph Processing Toolkit


Graph Formalisms: Next Steps

• Graph transduction workshop
  – One week in Dagstuhl, Germany (March 2015)
  – 35 attendees from Theory and NLP
  – Organizers: F. Drewes, K. Knight, M. Kuhlmann
• Automatic extraction/use of graph grammars
  – Hook up with manually-created AMR bank of 15,000 sentences (fiction, news, blog)
    • AMR to English generation
    • English to AMR parsing
