A Constrained Viterbi Relaxation for Bidirectional Word Alignment
Yin-Wen Chang, Alexander Rush, John DeNero and Michael Collins
ACL 2014
June 25, 2014
HMM Word Alignment Model
i: 1 2 3 4 5
e: let us see the documents
j: 1 2 3 4 5
f: montrez - nous les documents
[Figure: alignment grid between f (ε, montrez, -, nous, les, documents) and e (ε, let, us, see, the, documents)]
This Work: Bidirectional Alignment
- Most bidirectional formulations are NP-hard to solve.
- A previous attempt used dual decomposition and achieved 6% exact solutions (DeNero and Macherey, 2011).
- Goal: increase the number of exact solutions.
Contributions
- A new relaxation for decoding the bidirectional model, solvable with a variant of the Viterbi algorithm.
- Lagrangian relaxation to enforce the relaxed constraints.
- General techniques for adding constraints and applying pruning.
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
- Results
Word Alignment: f→e
f(x; θ) = Σ_{i,j,j′} θ(j′, i, j) x(j′, i, j)

- boundary: x(0, 0) = 1
- backward consistency: x(i, j) = Σ_{j′=0..J} x(j′, i, j)
- forward consistency: x(i, j) = Σ_{j′=0..J} x(j, i+1, j′)
[Figure: alignment grid between f and e, built up over several animation steps]
Word Alignment Example: f→e
x(1, 2, 3) = 1
θ(1, 2, 3) = log(P(us|nous)) + log(P(3|1))
[Figure: alignment grid with positions 0-5 on both axes]
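The link score above can be reproduced mechanically: θ(j′, i, j) is an emission log-probability plus a transition log-probability, and the directional score f(x; θ) sums over the chosen links. A minimal sketch (the probability values below are made-up toy numbers, not the model's actual parameters):

```python
import math

def score_alignment(links, emit_lp, trans_lp):
    """Sum theta(j', i, j) = log P(e_i | f_j) + log P(j | j')
    over the chosen links (j', i, j) of a directional alignment."""
    return sum(emit_lp[(i, j)] + trans_lp[(j_prev, j)]
               for (j_prev, i, j) in links)

# The single link x(1, 2, 3) = 1 from the example:
# theta(1, 2, 3) = log P(us | nous) + log P(3 | 1)
emit_lp = {(2, 3): math.log(0.5)}   # toy value for log P(us | nous)
trans_lp = {(1, 3): math.log(0.2)}  # toy value for log P(3 | 1)
score = score_alignment([(1, 2, 3)], emit_lp, trans_lp)
```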
Word Alignment: e→f
g(y; ω) = Σ_{j,i,i′} ω(i′, i, j) y(i′, i, j)

- boundary: y(0, 0) = 1
- backward consistency: y(i, j) = Σ_{i′=0..I} y(i′, i, j)
- forward consistency: y(i, j) = Σ_{i′=0..I} y(i, i′, j+1)
[Figure: alignment grid between f and e, built up over several animation steps]
Bidirectional Alignment with Full Agreement
Goal:
x*, y* = arg max_{x,y} f(x) + g(y)  s.t.  x(i, j) = y(i, j) ∀ i, j ≠ 0

[Figure: alignment grid between f and e]
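The full-agreement condition is easy to state in code: away from the null word, the two directional alignments must use exactly the same set of links. A minimal sketch (the set-of-pairs representation is my own choice, not the paper's):

```python
def full_agreement(x_links, y_links):
    """Full agreement: the two directional alignments must use exactly
    the same links (i, j), ignoring links to the null word (i or j = 0)."""
    strip = lambda links: {(i, j) for (i, j) in links if i != 0 and j != 0}
    return strip(x_links) == strip(y_links)
```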
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
Results
The Relaxed Problem
- Y: the set of e→f alignments
- Y′: the set of e→f alignments without the forward constraints
- Y ⊂ Y′

[Figure: alignment grid between f and e]
Lagrangian Relaxation
Goal:

x*, y* = arg max_{x∈X, y∈Y} f(x) + g(y)  s.t.  x(i, j) = y(i, j) ∀ i, j ≠ 0

Lagrangian dual:

L(λ) = max_{x∈X, y∈Y′, x(i,j)=y(i,j)} f(x) + g′(y; ω, λ)

where

g′(y; ω, λ) = g(y; ω) − Σ_{i,j} λ(i, j) [ y(i, j) − Σ_{i′} y(i, i′, j+1) ]

with λ(i, j) the Lagrange multipliers and the bracketed terms the forward constraints for y.
Viterbi-style Algorithm for computing L(λ)
max_{x∈X, y∈Y′, x(i,j)=y(i,j)} f(x) + g′(y; ω, λ)
  = max_{x∈X, y∈Y′, x(i,j)=y(i,j)} f(x) + Σ_{i,j} y(i, j) max_{i′} ω′(i′, i, j)

where ω′(i′, i, j) = ω(i′, i, j) − λ(i, j) + λ(i′, j − 1)

[Figure: alignment grid between f and e]

- Compute the score for each y(i, j).
- Standard Viterbi update for computing x(i, j), adding in the score of y(i, j).
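The reduction above can be sketched in a few lines: precompute, for every link (i, j), the best adjusted score max_{i′} ω′(i′, i, j); the Viterbi pass over x then simply adds this bonus whenever it uses the link. A hypothetical sketch with dictionary-based tables, not the paper's implementation:

```python
def relaxed_link_scores(omega, lam):
    """For each link (i, j), compute b(i, j) = max over i' of
    omega'(i', i, j) = omega(i', i, j) - lam(i, j) + lam(i', j - 1).
    omega maps (i_prev, i, j) -> score; lam maps (i, j) -> multiplier.
    The Viterbi pass over x adds b(i, j) when it uses link (i, j)."""
    b = {}
    for (i_prev, i, j), w in omega.items():
        cand = w - lam.get((i, j), 0.0) + lam.get((i_prev, j - 1), 0.0)
        if (i, j) not in b or cand > b[(i, j)]:
            b[(i, j)] = cand
    return b
```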
The Lagrangian Relaxation Algorithm
- The Lagrangian dual is an upper bound: L(λ) ≥ f(x*) + g(y*)
- Find the tightest upper bound: min_λ L(λ)
- Minimize by subgradient:
  1. Set (x, y) to the arg max of L(λ). If (x, y) satisfies the forward constraints, return (x, y).
  2. Else, update λ(i, j) for all i, j:
     λ(i, j) ← λ(i, j) − η_t ( y(i, j) − Σ_{i′=0..I} y(i, i′, j+1) )
- Certificate of optimality upon convergence.
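Step 2 of the update can be sketched as follows. This is an illustrative fragment with my own data layout (the chosen transition variables y(i′, i, j) as a list of tuples), not the paper's code; boundary handling at j = 0 and the final position is simplified:

```python
from collections import defaultdict

def subgradient_step(lam, transitions, eta):
    """One pass of step 2.  `transitions` holds the chosen variables
    y(i', i, j) as tuples (i_prev, i, j).  The forward-constraint
    violation at (i, j) is g(i, j) = y(i, j) - sum_{i'} y(i, i', j + 1),
    and the update is lam(i, j) <- lam(i, j) - eta * g(i, j)."""
    used = defaultdict(int)  # y(i, j): link entered by some transition
    out = defaultdict(int)   # sum_{i'} y(i, i', j + 1): transitions out
    for (i_prev, i, j) in transitions:
        used[(i, j)] += 1
        out[(i_prev, j - 1)] += 1
    new_lam = dict(lam)
    for key in set(used) | set(out):
        if key[1] == 0:
            continue  # j = 0 is the boundary; no constraint there
        g = used[key] - out[key]
        if g != 0:  # a zero violation leaves lam(i, j) unchanged
            new_lam[key] = new_lam.get(key, 0.0) - eta * g
    return new_lam
```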
Extension: Adjacent Agreement
- A model that allows adjacent matches.
- x(4, 3) = 1 is allowed because the adjacent link y(3, 3) = 1.

[Figure: alignment grid between f and e]
Extension: Adjacent Agreement
- A modified Viterbi algorithm with the same complexity.

[Figure: alignment grid between f and e]
Preview Results: Lagrangian Relaxation
- Lagrangian relaxation is only guaranteed to solve the linear programming relaxation.
- 54.7% exact solutions.

[Bar chart: % of sentences with a certificate, comparing D&M and LR]
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
Results
Strategy: Adding Constraints
Upper bound:
L(λ) ≥ f(x*) + g(y*)

Current gap:
L(λ) − (f(x*) + g(y*))

Tightening: find a different dual with a smaller gap, i.e. find L′(λ) ≤ L(λ).
Strategy: Adding Constraints
- Re-introduce a relaxed forward constraint on link y(i, j).
- Add the most often violated constraints.
- For example, adding the constraint for link y(2, 3):

[Figure: two alignment grids contrasting an alignment feasible under L(λ) but not under L′(λ) with alignments feasible under both L(λ) and L′(λ)]
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
Results
Relaxed Max-marginal
- Improve efficiency while keeping the optimality guarantee.
- M(i, j; λ): relaxed max-marginal value, the highest dual value over all alignments using the link x(i, j).
- Example: M(2, 3; λ)

[Figure: alignment grid between f and e]
Exact Coarse-to-Fine Pruning
- We can safely remove an alignment link x(i, j) if M(i, j; λ) < lb.
- Lower bound: f(x) + g(y) ≤ f(x*) + g(y*) for any feasible x, y.

[Figure: alignment grid with pruned links removed]
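The pruning rule itself is a one-liner. A hedged sketch, assuming a precomputed table of relaxed max-marginals M(i, j; λ) and a primal lower bound lb:

```python
def prune_links(max_marginal, lb):
    """Exact coarse-to-fine pruning: a link (i, j) can be removed
    safely whenever M(i, j; lam) < lb, because M upper-bounds every
    alignment using the link, so none can beat a known feasible one."""
    return {link for link, m in max_marginal.items() if m >= lb}
```

Every surviving link stays in the search space, which is why the pruning is exact rather than heuristic.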
Preview Results: Pruning
[Plots: score relative to optimal (best dual, intersection) vs. iteration; relative search space size vs. number of iterations]
Finding Lower Bounds
A greedy heuristic algorithm:
- Repeat until there is no null-aligned word, or there is no score increase.

[Figure: alignment grid between f and e, built up over several animation steps]
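The greedy loop described above might look like this. The per-word gains are assumed precomputed, which glosses over rescoring after each addition; everything here is an illustrative assumption, not the paper's procedure:

```python
def greedy_lower_bound(null_words, gain):
    """Greedily align the null-aligned word with the largest score
    gain, stopping when none remain or no link increases the score.
    `gain[w]` is the (assumed precomputed) best increase for word w."""
    total, aligned = 0.0, []
    remaining = set(null_words)
    while remaining:
        w = max(remaining, key=lambda u: gain[u])
        if gain[w] <= 0:
            break  # no score increase: stop
        total += gain[w]
        aligned.append(w)
        remaining.remove(w)
    return aligned, total
```

The resulting feasible score serves as the lower bound lb used by the pruning rule.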
Results: Coarse-to-Fine Pruning with Good Lower Bound
[Plots: score relative to optimal (best dual, best primal, intersection) vs. iteration; relative search space size vs. number of iterations]
Experiments
- Trained on 6.2 million words of Chinese-English FBIS data.
- Evaluated on 150 sentence pairs of NIST 2002 data.
- Setup identical to DeNero and Macherey (2011).
Results: with Adding Constraints and Pruning
[Bar chart: % of sentences with a certificate, comparing D&M, LR, and CONS]
Results: Speed and Optimal Solutions
        time     certificate (%)
ILP     924.24   100.0
LR      6.33     54.7
CONS    21.08    86.0
D&M     -        6.2

- ILP: integer linear programming
- LR: our Lagrangian relaxation algorithm
- CONS: LR with added constraints
- D&M: dual decomposition algorithm of DeNero and Macherey (2011)
Results: Accuracy
Alignment Error Rate (AER)
        union   grow-diag-final   intersection
DIR     33.4    32.1              27.0
D&M     29.1    28.0              23.6
CONS    26.4    (no combination)

Phrase Pair F1
        union   grow-diag-final   intersection
DIR     46.3    48.4              51.9
D&M     52.5    53.0              55.3
CONS    52.7    (no combination)

- DIR: directional alignments.
- When the algorithm does not converge, D&M uses combination procedures, while CONS uses the highest-scoring feasible solution.
Conclusion
- A Lagrangian relaxation algorithm for bidirectional alignment.
- Adding constraints incrementally.
- Coarse-to-fine pruning.
- Convergence on 86% of sentences, compared to the 6% reported by DeNero and Macherey (2011).
- Future work: apply these techniques to a more flexible model with a wider range of directional matches.
Thank you!