joint parsing and alignment with weakly synchronized grammars

25
Joint Parsing and Alignment with Weakly Synchronized Grammars David Burkett, John Blitzer, & Dan Klein

Upload: eyal

Post on 23-Feb-2016

47 views

Category:

Documents


0 download

DESCRIPTION

Joint Parsing and Alignment with Weakly Synchronized Grammars. David Burkett, John Blitzer, & Dan Klein. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A. Statistical MT Training Pipeline. }. 1) Align sentence pairs (GIZA++) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Joint Parsing and Alignment with Weakly Synchronized Grammars

Joint Parsing and Alignment with Weakly Synchronized Grammars

David Burkett, John Blitzer, & Dan Klein

John
PhD, visiting researcher, PostdocAll on correspondencesSay: Correspondences for Domain Adaptation Correspondences for Machine Translation
Page 2: Joint Parsing and Alignment with Weakly Synchronized Grammars

Statistical MT Training Pipeline

1) Align sentence pairs (GIZA++)2) Parse English sentences (Berkeley parser) Parse Foreign sentences

3) Extract rules (Galley et al. 2006)

4) Tune discriminative parameters

在at

办公室office

里in

读了read

书book

read

the

book

in

the

office

} Joint model for (1) & (2)

Page 3: Joint Parsing and Alignment with Weakly Synchronized Grammars

Data Setting for Joint Models

( 中文 ; )English WSJ

.

.

.

(EN; )(EN; )

(EN; )

( 中文 ; )...

( 中文 ; )

Chinese CTBParallel, Aligned CTB

.

.

.

(EN, 中文 ; )(EN, 中文 ; )

(EN, 中文 ; )

Unlabeled parallel text

.

.

.

(EN; 中文 )(EN; 中文 )

(EN; 中文 )

Page 4: Joint Parsing and Alignment with Weakly Synchronized Grammars

Word alignment grids

在at

办公室office

里in

读了read

书book

read

the

book

in

the

office

Page 5: Joint Parsing and Alignment with Weakly Synchronized Grammars

Syntactic Correspondences

EN中文Build a model

Page 6: Joint Parsing and Alignment with Weakly Synchronized Grammars

Correspondence via Synchronous Grammars

Page 7: Joint Parsing and Alignment with Weakly Synchronized Grammars

Synchronous derivation

Page 8: Joint Parsing and Alignment with Weakly Synchronized Grammars

Synchronous Derivation

Page 9: Joint Parsing and Alignment with Weakly Synchronized Grammars

Weakly Synchronized Example

Page 10: Joint Parsing and Alignment with Weakly Synchronized Grammars

Weakly Synchronized Example

Separate PCFGs

Page 11: Joint Parsing and Alignment with Weakly Synchronized Grammars

Weakly Synchronized Example

ITG alignment

Page 12: Joint Parsing and Alignment with Weakly Synchronized Grammars

Weakly Synchronized Example

Points for synchronization, but not required

Page 13: Joint Parsing and Alignment with Weakly Synchronized Grammars

Correspondence Model & Feature Types

办公室office

Feature type 1: Word Alignment

EN 中文

PPPP

Feature type 3: Correspondence

Feature type 2: Monolingual Parser

ENPP

in the office

EN 中文EN 中文 EN 中文EN 中文EN 中文

[HBDK09]

Page 14: Joint Parsing and Alignment with Weakly Synchronized Grammars

Estimating

EN 中文 EN 中文

• Set to maximize the log-likelihood of the correct parses & alignments

EN EN 中文中文 EN 中文

EN 中文• normalizes to sum to 1

Page 15: Joint Parsing and Alignment with Weakly Synchronized Grammars

Computing

PP PP Correspondence features tie pieces together

EN 中文

EN 中文

Computing exactly is intractable

EN 中文 EN 中文

Individual , , have polynomial-time dynamic programming algorithms

Page 16: Joint Parsing and Alignment with Weakly Synchronized Grammars

Approximating : Mean Field

• Exploit tractability in individual models:

• Factored approximation: EN 中文

PPPP

1) Initialize separately

2) Iterate:

• Set to minimize EN 中文

EN 中文

Algorithm

Page 17: Joint Parsing and Alignment with Weakly Synchronized Grammars

Large scale inference

We can approximate in polynomial time, but . . .EN 中文

Sum over possible alignments is an algorithm.

But computers are fast, right?

• Medium-length sentences are 50 words long• Small translation data sets are 250,000 sentences• ~4 quadrillion operations (See for speedup details)[BBK10, HBDK09]

Page 18: Joint Parsing and Alignment with Weakly Synchronized Grammars

Quantitative Results: Parsing

Series178

81

84

87

90 Monolingual Joint

Page 19: Joint Parsing and Alignment with Weakly Synchronized Grammars

Quantitative Results: Parsing

Chinese parser78

81

84

87

90 Monolingual Joint

85.7%

83.6%

Page 20: Joint Parsing and Alignment with Weakly Synchronized Grammars

Quantitative Results: Parsing

Chinese parser English parser78

81

84

87

90 Monolingual Joint

81.2%

84.5%

Page 21: Joint Parsing and Alignment with Weakly Synchronized Grammars

Incorrect English PP Attachment

Page 22: Joint Parsing and Alignment with Weakly Synchronized Grammars

Corrected English PP Attachment

Page 23: Joint Parsing and Alignment with Weakly Synchronized Grammars

Quantitative Results: Translation

Word alignment65

69

73

77

81

85

89 HMM Discriminative ITG Joint

69.5%

85.0%

BLEU improvement from 29.4 to 30.6

79.5%

Page 24: Joint Parsing and Alignment with Weakly Synchronized Grammars

Better Translations with Bilingual Adaptation

ReferenceAt this point the cause of the plane collision is still unclear. The local caa will launch an investigation into this .

Baseline (GIZA++)The cause of planes is still not clear yet, local civil aviation department will investigate this .

目前 导致 飞机 相撞 的 原因 尚 不 清楚 , 当地 民航 部门 将 对此 展开 调查Cur-

rently cause plane crash DE reason yet not clear, localcivilaero-

nauticsbureau will toward open investi-

gations

Bilingual Adaptation ModelThe cause of plane collision remained unclear, local civil aviation departments will launch an investigation .

Page 25: Joint Parsing and Alignment with Weakly Synchronized Grammars

Thanks