Joint Parsing and Alignment with Weakly Synchronized Grammars

Post on 23-Feb-2016

David Burkett, John Blitzer, & Dan Klein

John: PhD, visiting researcher, postdoc, all on correspondences. Say: Correspondences for Domain Adaptation; Correspondences for Machine Translation.

Statistical MT Training Pipeline

1) Align sentence pairs (GIZA++)

2) Parse English sentences (Berkeley parser); parse foreign sentences

3) Extract rules (Galley et al. 2006)

4) Tune discriminative parameters

[Figure: example sentence pair with word glosses: 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book); English: "read the book in the office"]

Joint model for steps (1) and (2)

Data Setting for Joint Models

• English WSJ: (EN; [parse]) examples

• Chinese CTB: (中文; [parse]) examples

• Parallel, Aligned CTB: (EN, 中文; [parses and alignment]) examples

• Unlabeled parallel text: (EN; 中文) sentence pairs

Word alignment grids

[Figure: alignment grid for 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book) against "read the book in the office"]
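A word alignment grid can be sketched as a boolean matrix with one row per source word and one column per target word. The example below is only an illustration: the link set is a hypothetical alignment guessed from the glosses, not one taken from the slides, and the helper names are made up for this sketch.

```python
# Source (Chinese) and target (English) words of the running example.
source = ["在", "办公室", "里", "读了", "书"]
target = ["read", "the", "book", "in", "the", "office"]

# Hypothetical links (source index, target index), guessed from the glosses.
# Both 在 ("at") and 里 ("in") are linked to "in"; "the" is left unaligned.
links = {(0, 3), (1, 5), (2, 3), (3, 0), (4, 2)}


def alignment_grid(n_source, n_target, links):
    """Return an n_source x n_target boolean grid with True at each link."""
    grid = [[False] * n_target for _ in range(n_source)]
    for i, j in links:
        grid[i][j] = True
    return grid


def unaligned_targets(grid):
    """Return the target (column) indices that no source word links to."""
    n_target = len(grid[0])
    return [j for j in range(n_target) if not any(row[j] for row in grid)]


def render(grid, source):
    """Render the grid as text: 'X' for a link, '.' for no link."""
    lines = []
    for word, row in zip(source, grid):
        cells = " ".join("X" if cell else "." for cell in row)
        lines.append(f"{cells}  {word}")
    return "\n".join(lines)


grid = alignment_grid(len(source), len(target), links)
print(" ".join(w[0] for w in target))  # column header: first letter of each word
print(render(grid, source))
```

Under this assumed alignment, the two occurrences of "the" are the only unaligned English words, which `unaligned_targets(grid)` reports as column indices.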

Syntactic Correspondences

Build a model relating EN and 中文

Correspondence via Synchronous Grammars

Synchronous derivation

Weakly Synchronized Example

• Separate PCFGs

• ITG alignment

• Points for synchronization, but not required

Correspondence Model & Feature Types

Feature type 1: Word Alignment (e.g. 办公室 ↔ office)

Feature type 2: Monolingual Parser (e.g. EN PP "in the office")

Feature type 3: Correspondence (e.g. EN PP ↔ 中文 PP)

[HBDK09]

Estimating the parameters

• Set the parameters to maximize the log-likelihood of the correct parses & alignments

• A normalization term makes the model sum to 1
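For orientation, a feature-based model with a normalizer and a log-likelihood objective is commonly written in log-linear form. The block below is a sketch under that assumption; all symbols are assumed notation rather than notation recovered from the slides: $t$ and $t'$ for the English and Chinese parses, $a$ for the alignment, $s$ and $s'$ for the two sentences, $\phi$ for the feature vector, $\theta$ for the weights, and $Z_\theta$ for the normalizer.

```latex
% Assumed log-linear form; notation is illustrative, not taken from the slides.
P_\theta(t, a, t' \mid s, s')
  = \frac{\exp\!\big(\theta^\top \phi(t, a, t', s, s')\big)}{Z_\theta(s, s')},
\qquad
Z_\theta(s, s')
  = \sum_{\bar t,\, \bar a,\, \bar t'} \exp\!\big(\theta^\top \phi(\bar t, \bar a, \bar t', s, s')\big)

% Training objective: log-likelihood of the annotated parses and alignments.
\hat\theta
  = \arg\max_\theta \sum_i \log P_\theta\big(t_i, a_i, t'_i \mid s_i, s'_i\big)
```

In this form the sum defining $Z_\theta$ ranges over every combination of two parses and an alignment, which is where the computational difficulty comes from.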

Computing

Correspondence features (e.g. PP ↔ PP) tie the pieces together

Exact computation is intractable

The individual models (EN parser, alignment, 中文 parser) have polynomial-time dynamic programming algorithms

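Approximating: Mean Field

• Exploit tractability in individual models

• Factored approximation: one factor each for the EN parse, the alignment, and the 中文 parse

Algorithm

1) Initialize the factors separately

2) Iterate:

• Set each factor in turn to minimize the KL divergence to the full model, holding the other factors fixed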
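The initialize-then-iterate pattern of mean field can be illustrated on a toy problem. The sketch below is not the parsing and alignment model: it assumes a hypothetical two-variable binary model p(x, y) ∝ exp(a·x + b·y + c·x·y), where the coupling term c plays the role that correspondence features play in tying otherwise separate models together. All function and parameter names are made up for this example.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def exact_joint(a, b, c):
    """Exact p(x, y) for x, y in {0, 1}; feasible only because the toy is tiny."""
    scores = {(x, y): math.exp(a * x + b * y + c * x * y)
              for x, y in product((0, 1), repeat=2)}
    z = sum(scores.values())
    return {xy: s / z for xy, s in scores.items()}


def kl_to_joint(qx, qy, p):
    """KL(q || p) for the factored q(x, y) = q(x) q(y), with qx = q(x = 1)."""
    kl = 0.0
    for (x, y), pxy in p.items():
        q = (qx if x else 1.0 - qx) * (qy if y else 1.0 - qy)
        if q > 0.0:
            kl += q * math.log(q / pxy)
    return kl


def mean_field(a, b, c, iters=20):
    p = exact_joint(a, b, c)
    # 1) Initialize each factor separately, ignoring the coupling term.
    qx, qy = sigmoid(a), sigmoid(b)
    history = [kl_to_joint(qx, qy, p)]
    # 2) Iterate: update one factor at a time with the other held fixed.
    for _ in range(iters):
        qx = sigmoid(a + c * qy)  # minimizes KL over q(x) given q(y)
        history.append(kl_to_joint(qx, qy, p))
        qy = sigmoid(b + c * qx)  # minimizes KL over q(y) given q(x)
        history.append(kl_to_joint(qx, qy, p))
    return qx, qy, history


qx, qy, history = mean_field(a=0.5, b=-0.3, c=1.2)
print(f"q(x=1)={qx:.4f}  q(y=1)={qy:.4f}  KL: {history[0]:.4f} -> {history[-1]:.4f}")
```

Each coordinate update exactly minimizes the KL divergence with respect to one factor, so the recorded KL values never increase. In a structured setting the closed-form sigmoid updates would be replaced by running each individual model's dynamic program.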

Large scale inference

We can approximate in polynomial time, but . . .

The sum over possible alignments is an O(n^6) algorithm.

But computers are fast, right?

• Medium-length sentences are 50 words long

• Small translation data sets are 250,000 sentences

• ~4 quadrillion operations (see [BBK10, HBDK09] for speedup details)
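The ~4 quadrillion figure is consistent with a per-sentence cost on the order of n^6 operations for the alignment sum. A quick check of the arithmetic, assuming exactly n^6 operations per sentence and ignoring constant factors:

```python
# Back-of-the-envelope operation count, assuming n**6 operations per sentence
# (constant factors ignored; variable names are illustrative).
words_per_sentence = 50
sentences = 250_000

ops_per_sentence = words_per_sentence ** 6   # 15,625,000,000
total_ops = ops_per_sentence * sentences     # 3,906,250,000,000,000

print(f"{ops_per_sentence:,} operations per sentence")
print(f"{total_ops:,} operations in total (~{total_ops / 1e15:.1f} quadrillion)")
```

The total is about 3.9 × 10^15, which rounds to the 4 quadrillion quoted on the slide.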

Quantitative Results: Parsing

[Bar chart: Monolingual vs. Joint, vertical axis from 78 to 90]

Chinese parser: 83.6% (Monolingual), 85.7% (Joint)

English parser: 81.2% (Monolingual), 84.5% (Joint)

Incorrect English PP Attachment

Corrected English PP Attachment

Quantitative Results: Translation

[Bar chart: word alignment, vertical axis from 65 to 89]

HMM: 69.5%

Discriminative ITG: 79.5%

Joint: 85.0%

BLEU improvement from 29.4 to 30.6

Better Translations with Bilingual Adaptation

Reference: At this point the cause of the plane collision is still unclear. The local caa will launch an investigation into this .

Baseline (GIZA++): The cause of planes is still not clear yet, local civil aviation department will investigate this .

Source: 目前 导致 飞机 相撞 的 原因 尚 不 清楚 , 当地 民航 部门 将 对此 展开 调查

Gloss: Currently cause plane crash DE reason yet not clear , local civil aeronautics bureau will toward open investigations

Bilingual Adaptation Model: The cause of plane collision remained unclear, local civil aviation departments will launch an investigation .

Thanks
