A Constrained Viterbi Relaxation for Bidirectional Word Alignment Yin-Wen Chang, Alexander Rush, John DeNero and Michael Collins ACL 2014 June 25, 2014


Source slides: people.seas.harvard.edu/~srush/slides_alignment.pdf

A Constrained Viterbi Relaxation for Bidirectional Word Alignment

Yin-Wen Chang, Alexander Rush, John DeNero and Michael Collins

ACL 2014

June 25, 2014


HMM Word Alignment Model

i: 1 2 3 4 5

e: let us see the documents

j: 1 2 3 4 5

f: montrez - nous les documents

[Figure: alignment grid, French words f (ε, montrez, -, nous, les, documents) against English words e (ε, let, us, see, the, documents).]


This Work: Bidirectional Alignment

- Most bidirectional formulations are NP-hard to solve.

- A previous attempt used dual decomposition and found exact solutions for 6% of sentences (DeNero and Macherey, 2011).

- Goal: increase the number of exact solutions.


Contributions

- A new relaxation for decoding the bidirectional model, solvable with a variant of the Viterbi algorithm.

- Lagrangian relaxation to enforce the relaxed constraints.

- General techniques for adding constraints and applying pruning.


Outline

Bidirectional Alignment
  HMM Word Alignment
  Lagrangian Relaxation

Tightening
  Adding Constraints
  Pruning

Results


Word Alignment: f→e

$$f(x; \theta) = \sum_{i, j, j'} \theta(j', i, j)\, x(j', i, j)$$

- boundary: $x(0, 0) = 1$

- backward consistency:

$$x(i, j) = \sum_{j'=0}^{J} x(j', i, j)$$

- forward consistency:

$$x(i, j) = \sum_{j'=0}^{J} x(j, i + 1, j')$$

[Figure: three alignment grids, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]


Word Alignment Example: f→e

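$$x(1, 2, 3) = 1$$

$$\theta(1, 2, 3) = \log P(\text{us} \mid \text{nous}) + \log P(3 \mid 1)$$

[Figure: alignment grid with French positions 0 to 5 (ε, montrez, -, nous, les, documents) against English positions 0 to 5 (ε, let, us, see, the, documents).]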
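As a rough illustration of how scores of this form are decoded in one direction, the sketch below runs a Viterbi pass over $\theta(j', i, j)$ for the example sentence pair. The numbers in `LEX` and the jump score in `toy_theta` are invented stand-ins for $\log P(e_i \mid f_j)$ and $\log P(j \mid j')$, not values from the trained model, and `viterbi_align` is a hypothetical helper name.

```python
import math

E = ["let", "us", "see", "the", "documents"]        # e_1 .. e_I
F = ["montrez", "-", "nous", "les", "documents"]    # f_1 .. f_J

# Invented lexical scores; any pair not listed gets a small default.
LEX = {("let", "montrez"): 0.4, ("see", "montrez"): 0.5, ("us", "nous"): 0.8,
       ("the", "les"): 0.9, ("documents", "documents"): 0.9}

def toy_theta(j_prev, i, j):
    """Stand-in for log P(e_i | f_j) + log P(j | j_prev); j = 0 is the null word."""
    lex = 0.05 if j == 0 else LEX.get((E[i - 1], F[j - 1]), 0.01)
    # Unnormalised log jump score that prefers moving one position to the right.
    return math.log(lex) - abs(j - j_prev - 1)

def viterbi_align(theta, I, J):
    """Return (best score, a) where a[i - 1] is the position j aligned to e_i."""
    neg_inf = float("-inf")
    # pi[i][j]: best score of a prefix e_1..e_i whose last link is (i, j).
    pi = [[neg_inf] * (J + 1) for _ in range(I + 1)]
    back = [[0] * (J + 1) for _ in range(I + 1)]
    pi[0][0] = 0.0                      # boundary: x(0, 0) = 1
    for i in range(1, I + 1):
        for j in range(J + 1):
            for j_prev in range(J + 1):
                if pi[i - 1][j_prev] == neg_inf:
                    continue
                score = pi[i - 1][j_prev] + theta(j_prev, i, j)
                if score > pi[i][j]:
                    pi[i][j], back[i][j] = score, j_prev
    j = max(range(J + 1), key=lambda k: pi[I][k])
    best = pi[I][j]
    alignment = []
    for i in range(I, 0, -1):           # follow back-pointers from the last word
        alignment.append(j)
        j = back[i][j]
    alignment.reverse()
    return best, alignment

score, alignment = viterbi_align(toy_theta, len(E), len(F))
for word, j in zip(E, alignment):
    print(word, "->", "NULL" if j == 0 else F[j - 1])
```

The recurrence is cubic in sentence length in this naive form, since every link $(i, j)$ scans all predecessors $j'$.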


Word Alignment: e→f

$$g(y; \omega) = \sum_{j, i, i'} \omega(i', i, j)\, y(i', i, j)$$

- boundary: $y(0, 0) = 1$

- backward consistency:

$$y(i, j) = \sum_{i'=0}^{I} y(i', i, j)$$

- forward consistency:

$$y(i, j) = \sum_{i'=0}^{I} y(i, i', j + 1)$$

[Figure: three alignment grids, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]


Bidirectional Alignment with Full Agreement

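Goal:

$$x^*, y^* = \arg\max_{x, y} f(x) + g(y) \quad \text{s.t.} \quad x(i, j) = y(i, j) \;\; \forall\, i, j \neq 0$$

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]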


The Relaxed Problem

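- $\mathcal{Y}$: the set of valid e→f alignments

- $\mathcal{Y}'$: the set of e→f alignments without the forward constraints

- $\mathcal{Y} \subset \mathcal{Y}'$

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]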


Lagrangian Relaxation

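Goal:

$$x^*, y^* = \arg\max_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x) + g(y) \quad \text{s.t.} \quad x(i, j) = y(i, j) \;\; \forall\, i, j \neq 0$$

Lagrangian dual:

$$L(\lambda) = \max_{x \in \mathcal{X},\, y \in \mathcal{Y}',\, x(i, j) = y(i, j)} f(x) + g'(y; \omega, \lambda)$$

where

$$g'(y; \omega, \lambda) = g(y; \omega) - \sum_{i, j} \lambda(i, j) \Big( y(i, j) - \sum_{i'} y(i, i', j + 1) \Big)$$

Here $\lambda(i, j)$ are the Lagrange multipliers and the bracketed term is the forward constraint for $y$.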


Viterbi-style Algorithm for computing L(λ)

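$$\max_{x \in \mathcal{X},\, y \in \mathcal{Y}',\, x(i, j) = y(i, j)} f(x) + g'(y; \omega, \lambda) = \max_{x \in \mathcal{X},\, y \in \mathcal{Y}',\, x(i, j) = y(i, j)} f(x) + \sum_{i, j} y(i, j) \max_{i'} \omega'(i', i, j)$$

where $\omega'(i', i, j) = \omega(i', i, j) - \lambda(i, j) + \lambda(i', j - 1)$

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]

- Compute the score for each $y(i, j)$.

- Use the standard Viterbi update to compute $x(i, j)$, adding in the score of $y(i, j)$.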
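The step from $g'$ to the modified scores $\omega'$ can be spelled out as follows. This is a reconstruction from the definitions on the slides rather than a quotation, and it ignores boundary terms at the first and last positions. The middle term uses backward consistency, $y(i, j) = \sum_{i'} y(i', i, j)$, and the last term renames the indices $(i, i', j + 1)$ to $(i', i, j)$.

```latex
\begin{align*}
g'(y;\omega,\lambda)
 &= \sum_{j,i,i'} \omega(i',i,j)\,y(i',i,j)
    - \sum_{i,j} \lambda(i,j)\,y(i,j)
    + \sum_{i,j} \lambda(i,j) \sum_{i'} y(i,i',j+1) \\
 &= \sum_{j,i,i'} \omega(i',i,j)\,y(i',i,j)
    - \sum_{j,i,i'} \lambda(i,j)\,y(i',i,j)
    + \sum_{j,i,i'} \lambda(i',j-1)\,y(i',i,j) \\
 &= \sum_{j,i,i'} \omega'(i',i,j)\,y(i',i,j),
 \qquad \omega'(i',i,j) = \omega(i',i,j) - \lambda(i,j) + \lambda(i',j-1).
\end{align*}
```

With the forward constraints dropped, each active link $y(i, j)$ is free to choose its predecessor $i'$ independently of every other link, which is why the inner sum over $i'$ collapses to $y(i, j) \max_{i'} \omega'(i', i, j)$.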


The Lagrangian Relaxation Algorithm

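- The Lagrangian dual is an upper bound:

$$L(\lambda) \geq f(x^*) + g(y^*)$$

- Find the tightest upper bound:

$$\min_{\lambda} L(\lambda)$$

- Minimize by subgradient descent:

1. Set $(x, y)$ to the arg max of $L(\lambda)$. If $(x, y)$ satisfies the forward constraints, return $(x, y)$.

2. Else, update $\lambda(i, j)$ for all $i, j$:

$$\lambda(i, j) \leftarrow \lambda(i, j) + \eta_t \Big( y(i, j) - \sum_{i'=0}^{I} y(i, i', j + 1) \Big)$$

The step moves against the subgradient of $L$, which is $-\big( y(i, j) - \sum_{i'} y(i, i', j + 1) \big)$ for the dual defined with $g'$.

- Certificate of optimality upon convergence.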
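The multiplier update can be sketched as below, assuming the relaxed solution $y$ is handed over as a set of active transitions. The representation `y_trans`, the helper names, and the choice to skip the last position $j = J$ are assumptions made for illustration; the decoder that produces $y$ is not shown. The step is written as a descent step on $L(\lambda)$ for a dual whose penalty is $-\lambda(i, j)\big(y(i, j) - \sum_{i'} y(i, i', j + 1)\big)$, so a link with no outgoing transition has its multiplier raised.

```python
from collections import defaultdict

def forward_violations(y_trans, J):
    """Map (i, j) -> y(i, j) - sum_i' y(i, i', j + 1), keeping nonzero entries only.

    y_trans holds the triples (i_prev, i, j) with y(i_prev, i, j) = 1.
    """
    node = defaultdict(int)      # y(i, j), recovered through backward consistency
    outgoing = defaultdict(int)  # sum over i' of y(i, i', j + 1)
    node[(0, 0)] = 1             # boundary: y(0, 0) = 1
    for i_prev, i, j in y_trans:
        node[(i, j)] += 1
        outgoing[(i_prev, j - 1)] += 1
    violations = {}
    for key in set(node) | set(outgoing):
        gap = node[key] - outgoing[key]
        # Assumption: there is no forward constraint at the last position j = J.
        if gap != 0 and key[1] < J:
            violations[key] = gap
    return violations

def subgradient_step(lam, y_trans, J, eta):
    """Update lam in place; return True when no forward constraint is violated."""
    violations = forward_violations(y_trans, J)
    for key, gap in violations.items():
        # The penalty is -lam * gap, so its subgradient is -gap; descend along +gap.
        lam[key] += eta * gap
    return not violations

lam = defaultdict(float)
# Relaxed solution for J = 2: link (1, 1) has no successor,
# while link (2, 2) claims the inactive link (3, 1) as its predecessor.
done = subgradient_step(lam, {(0, 1, 1), (3, 2, 2)}, J=2, eta=0.5)
print(done, dict(lam))
```

In a full loop, `subgradient_step` would be called once per iteration with the step size $\eta_t$, stopping as soon as it returns `True`.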


Extension: Adjacent Agreement

- A model that allows adjacent matches

- $x(4, 3) = 1$ because $y(3, 3) = 1$

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]


- A modified Viterbi algorithm with the same complexity

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]


Preview Results: Lagrangian Relaxation

- Lagrangian relaxation is only guaranteed to solve the linear programming relaxation.

- 54.7% exact solutions

[Figure: bar chart of the percentage of sentences solved with a certificate, for D&M and LR.]


Strategy: Adding Constraints

Upper bound:

L(λ) ≥ f (x∗) + g(y∗)

Current gap:

L(λ)− (f (x∗) + g(y∗))

Tightening:

Find a different dual with a smaller gap, that is, an $L'(\lambda)$ with

$$L'(\lambda) \leq L(\lambda)$$


- Re-introduce a relaxed forward constraint on link $y(i, j)$.

- Add the most often violated constraints.

- For example, adding the constraint for link $y(2, 3)$:

[Figure: two alignment grids. One alignment is feasible under $L(\lambda)$ but not feasible under $L'(\lambda)$; the other is feasible under both $L(\lambda)$ and $L'(\lambda)$.]
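One way to pick "the most often violated constraints" is to tally, over the subgradient iterations, how often each link's forward constraint fails. The sketch below assumes each iteration yields a dictionary from links $(i, j)$ to their violation; the function name, the history format, and the cutoff `k` are hypothetical, since the slides do not say how the counting is done.

```python
from collections import Counter

def most_violated(violation_history, k):
    """Return the k links (i, j) whose forward constraint was violated most often."""
    counts = Counter()
    for violations in violation_history:
        counts.update(violations.keys())   # count each violated link once per iteration
    return [link for link, _ in counts.most_common(k)]

# Invented history of four iterations.
history = [{(1, 1): 1, (3, 1): -1}, {(1, 1): 1}, {(2, 3): 1, (1, 1): 1}, {(2, 3): 1}]
print(most_violated(history, k=2))
```

The selected links are the ones whose forward constraints would be re-introduced into the tightened dual $L'(\lambda)$.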


Relaxed Max-marginal

- Improve efficiency while keeping the optimality guarantee.

- $M$: relaxed max-marginal values, the highest dual value of all alignments using the link $x(i, j)$.

- Example: $M(2, 3; \lambda)$

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]


Exact Coarse-to-Fine Pruning

- We can safely remove an alignment link $x(i, j)$ if

$$M(i, j; \lambda) < lb$$

- Lower bound $lb$: the score of some valid pair $(x, y)$,

$$f(x) + g(y) \leq f(x^*) + g(y^*)$$

[Figure: alignment grid, French words (montrez, -, nous, les, documents) against English words (Let, us, see, the, documents).]
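The pruning test can be illustrated on a single directional chain. The sketch below computes max-marginals of $f(x; \theta)$ alone with a forward and a backward max pass and then drops links that fall below a bound. This is a simplification: the relaxed max-marginals $M(i, j; \lambda)$ on the slides are taken under the dual $L(\lambda)$, which also includes the $y$ scores. The function names, the scores in `toy_theta`, and the bound used in the demo are invented for illustration.

```python
def max_marginals(theta, I, J):
    """M[i][j]: best total score over all paths that use link (i, j)."""
    neg_inf = float("-inf")
    states = range(J + 1)
    # alpha[i][j]: best score of a prefix ending at (i, j); alpha[0][0] is the boundary.
    alpha = [[neg_inf] * (J + 1) for _ in range(I + 1)]
    alpha[0][0] = 0.0
    for i in range(1, I + 1):
        for j in states:
            alpha[i][j] = max(alpha[i - 1][jp] + theta(jp, i, j) for jp in states)
    # beta[i][j]: best score of a suffix that starts from (i, j).
    beta = [[0.0] * (J + 1) for _ in range(I + 1)]
    for i in range(I - 1, -1, -1):
        for j in states:
            beta[i][j] = max(theta(j, i + 1, jn) + beta[i + 1][jn] for jn in states)
    return [[alpha[i][j] + beta[i][j] for j in states] for i in range(I + 1)]

def prune_links(M, I, J, lower_bound):
    """Keep the links (i, j) whose max-marginal is not below the lower bound."""
    return {(i, j) for i in range(1, I + 1) for j in range(J + 1)
            if M[i][j] >= lower_bound}

def toy_theta(j_prev, i, j):
    """Invented scores: reward monotone steps, penalise null links."""
    return -abs(j - j_prev - 1) - (2.0 if j == 0 else 0.0)

I, J = 4, 4
M = max_marginals(toy_theta, I, J)
best = max(M[I])                      # optimum of this toy chain
kept = prune_links(M, I, J, lower_bound=best - 1.5)
print(len(kept), "of", I * (J + 1), "links kept")
```

Because a max-marginal upper-bounds the score of every solution through its link, any link with a value below a valid lower bound cannot appear in an optimal solution, so removing it keeps the search exact.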


Preview Results: Pruning

[Figure, two panels over 0 to 400 iterations. First panel: score relative to optimal for the best dual and the intersection. Second panel: relative search space size, from 0.0 to 1.0.]


Finding Lower Bounds

A greedy heuristic algorithm:

- Repeat until
  - there exists no null-aligned word, or
  - there is no score increase.

[Figure: alignment grid, French words (ε, montrez, -, nous, les, documents) against English words (ε, let, us, see, the, documents).]


Results: Coarse-to-Fine Pruning with Good Lower Bound

[Figure, two panels over 0 to 400 iterations. First panel: score relative to optimal for the best dual, the best primal and the intersection. Second panel: relative search space size, from 0.0 to 1.0.]


Experiments

- Trained on 6.2 million words of Chinese-English FBIS data

- Evaluated on 150 sentence pairs of NIST 2002 data

- Setup identical to DeNero and Macherey (2011)


Results: with Adding Constraints and Pruning

[Figure: bar chart of the percentage of sentences solved with a certificate, for D&M, LR and CONS.]


Results: Speed and Optimal Solutions

Method   Time     Certificate (%)
ILP      924.24   100.0
LR         6.33    54.7
CONS      21.08    86.0
D&M           -     6.2

- ILP: Integer linear programming

- LR: Our Lagrangian relaxation algorithm

- CONS: LR with adding constraints

- D&M: Dual decomposition algorithm by DeNero and Macherey (2011)


Results: Accuracy

Model   Combination       Alignment Error Rate (AER)   Phrase Pair F1
DIR     union             33.4                         46.3
DIR     grow-diag-final   32.1                         48.4
DIR     intersection      27.0                         51.9
D&M     union             29.1                         52.5
D&M     grow-diag-final   28.0                         53.0
D&M     intersection      23.6                         55.3
CONS    -                 26.4                         52.7

- DIR: directional alignments

- When the algorithm does not converge:
  - D&M uses combination procedures
  - CONS uses the highest-scoring feasible solution
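For readers who want to recompute the first metric, the sketch below assumes the standard definition of alignment error rate from Och and Ney, $\mathrm{AER} = 1 - (|A \cap S| + |A \cap P|) / (|A| + |S|)$, with predicted links $A$, sure gold links $S$ and possible gold links $P \supseteq S$. The slides do not describe the gold annotation, so the link sets here are purely illustrative and the function name is hypothetical.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER of a predicted link set against sure and possible gold links."""
    possible = possible | sure            # sure links always count as possible
    if not predicted and not sure:
        return 0.0                        # nothing to predict and nothing predicted
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))

# Invented links (i, j) for a single sentence pair.
gold_sure = {(1, 1), (2, 3)}
gold_possible = {(3, 2)}
print(alignment_error_rate({(1, 1), (4, 4)}, gold_sure, gold_possible))
```

The tables on the slides report the value as a percentage, so the fraction returned here would be multiplied by 100 to compare.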


Conclusion

- A Lagrangian relaxation algorithm for bidirectional alignment

- Adding constraints incrementally

- Coarse-to-fine pruning

- Convergence on 86% of sentences, compared to 6% reported by DeNero and Macherey (2011)

- Future work: apply these techniques to a more flexible model with a wider range of directional matches

Thank you!
