A Constrained Viterbi Relaxation for Bidirectional Word Alignment
Yin-Wen Chang, Alexander Rush, John DeNero and Michael Collins
ACL 2014
June 25, 2014
HMM Word Alignment Model
i: 1 2 3 4 5
e: let us see the documents
j: 1 2 3 4 5
f: montrez - nous les documents
[Figure: alignment grid between f (ε, montrez, -, nous, les, documents) and e (ε, let, us, see, the, documents)]
This Work: Bidirectional Alignment
- Most bidirectional formulations are NP-hard to solve.
- A previous attempt used dual decomposition and achieved 6% exact solutions (DeNero and Macherey, 2011).
- Goal: increase the number of exact solutions.
Contributions
- A new relaxation for decoding the bidirectional model, solvable with a variant of the Viterbi algorithm.
- Lagrangian relaxation to enforce the relaxed constraints.
- General techniques for adding constraints and applying pruning.
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
- Results
Word Alignment: f→e
f(x; θ) = Σ_{i,j,j′} θ(j′, i, j) x(j′, i, j)

- boundary: x(0, 0) = 1
- backward consistency: x(i, j) = Σ_{j′=0..J} x(j′, i, j)
- forward consistency: x(i, j) = Σ_{j′=0..J} x(j, i+1, j′)
[Figure: alignment grid between f and e, built up over several animation steps]
Word Alignment Example: f→e
x(1, 2, 3) = 1
θ(1, 2, 3) = log(P(us|nous)) + log(P(3|1))
[Figure: alignment grid with positions 0-5 on both axes]
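The link score above can be reproduced mechanically: θ(j′, i, j) is an emission log-probability plus a transition log-probability, and the directional score f(x; θ) sums over the chosen links. A minimal sketch (the probability values below are made-up toy numbers, not the model's actual parameters):

```python
import math

def score_alignment(links, emit_lp, trans_lp):
    """Sum theta(j', i, j) = log P(e_i | f_j) + log P(j | j')
    over the chosen links (j', i, j) of a directional alignment."""
    return sum(emit_lp[(i, j)] + trans_lp[(j_prev, j)]
               for (j_prev, i, j) in links)

# The single link x(1, 2, 3) = 1 from the example:
# theta(1, 2, 3) = log P(us | nous) + log P(3 | 1)
emit_lp = {(2, 3): math.log(0.5)}   # toy value for log P(us | nous)
trans_lp = {(1, 3): math.log(0.2)}  # toy value for log P(3 | 1)
score = score_alignment([(1, 2, 3)], emit_lp, trans_lp)
```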
Word Alignment: e→f
g(y; ω) = Σ_{j,i,i′} ω(i′, i, j) y(i′, i, j)

- boundary: y(0, 0) = 1
- backward consistency: y(i, j) = Σ_{i′=0..I} y(i′, i, j)
- forward consistency: y(i, j) = Σ_{i′=0..I} y(i, i′, j+1)
[Figure: alignment grid between f and e, built up over several animation steps]
Bidirectional Alignment with Full Agreement
Goal:
x*, y* = arg max_{x,y} f(x) + g(y)  s.t.  x(i, j) = y(i, j) ∀ i, j ≠ 0

[Figure: alignment grid between f and e]
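The full-agreement condition is easy to state in code: away from the null word, the two directional alignments must use exactly the same set of links. A minimal sketch (the set-of-pairs representation is my own choice, not the paper's):

```python
def full_agreement(x_links, y_links):
    """Full agreement: the two directional alignments must use exactly
    the same links (i, j), ignoring links to the null word (i or j = 0)."""
    strip = lambda links: {(i, j) for (i, j) in links if i != 0 and j != 0}
    return strip(x_links) == strip(y_links)
```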
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
Results
The Relaxed Problem
- Y: the set of e→f alignments
- Y′: the set of e→f alignments without the forward constraints
- Y ⊂ Y′

[Figure: alignment grid between f and e]
Lagrangian Relaxation
Goal:

x*, y* = arg max_{x∈X, y∈Y} f(x) + g(y)  s.t.  x(i, j) = y(i, j) ∀ i, j ≠ 0

Lagrangian dual:

L(λ) = max_{x∈X, y∈Y′, x(i,j)=y(i,j)} f(x) + g′(y; ω, λ)

where

g′(y; ω, λ) = g(y; ω) − Σ_{i,j} λ(i, j) [ y(i, j) − Σ_{i′} y(i, i′, j+1) ]

with λ(i, j) the Lagrange multipliers and the bracketed terms the forward constraints for y.
Viterbi-style Algorithm for computing L(λ)
max_{x∈X, y∈Y′, x(i,j)=y(i,j)} f(x) + g′(y; ω, λ)
  = max_{x∈X, y∈Y′, x(i,j)=y(i,j)} f(x) + Σ_{i,j} y(i, j) max_{i′} ω′(i′, i, j)

where ω′(i′, i, j) = ω(i′, i, j) − λ(i, j) + λ(i′, j − 1)

[Figure: alignment grid between f and e]

- Compute the score for each y(i, j).
- Standard Viterbi update for computing x(i, j), adding in the score of y(i, j).
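The reduction above can be sketched in a few lines: precompute, for every link (i, j), the best adjusted score max_{i′} ω′(i′, i, j); the Viterbi pass over x then simply adds this bonus whenever it uses the link. A hypothetical sketch with dictionary-based tables, not the paper's implementation:

```python
def relaxed_link_scores(omega, lam):
    """For each link (i, j), compute b(i, j) = max over i' of
    omega'(i', i, j) = omega(i', i, j) - lam(i, j) + lam(i', j - 1).
    omega maps (i_prev, i, j) -> score; lam maps (i, j) -> multiplier.
    The Viterbi pass over x adds b(i, j) when it uses link (i, j)."""
    b = {}
    for (i_prev, i, j), w in omega.items():
        cand = w - lam.get((i, j), 0.0) + lam.get((i_prev, j - 1), 0.0)
        if (i, j) not in b or cand > b[(i, j)]:
            b[(i, j)] = cand
    return b
```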
The Lagrangian Relaxation Algorithm
- The Lagrangian dual is an upper bound: L(λ) ≥ f(x*) + g(y*)
- Find the tightest upper bound: min_λ L(λ)
- Minimize by subgradient:
  1. Set (x, y) to the arg max of L(λ). If (x, y) satisfies the forward constraints, return (x, y).
  2. Else, update λ(i, j) for all i, j:
     λ(i, j) ← λ(i, j) − η_t ( y(i, j) − Σ_{i′=0..I} y(i, i′, j+1) )
- Certificate of optimality upon convergence.
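Step 2 of the update can be sketched as follows. This is an illustrative fragment with my own data layout (the chosen transition variables y(i′, i, j) as a list of tuples), not the paper's code; boundary handling at j = 0 and the final position is simplified:

```python
from collections import defaultdict

def subgradient_step(lam, transitions, eta):
    """One pass of step 2.  `transitions` holds the chosen variables
    y(i', i, j) as tuples (i_prev, i, j).  The forward-constraint
    violation at (i, j) is g(i, j) = y(i, j) - sum_{i'} y(i, i', j + 1),
    and the update is lam(i, j) <- lam(i, j) - eta * g(i, j)."""
    used = defaultdict(int)  # y(i, j): link entered by some transition
    out = defaultdict(int)   # sum_{i'} y(i, i', j + 1): transitions out
    for (i_prev, i, j) in transitions:
        used[(i, j)] += 1
        out[(i_prev, j - 1)] += 1
    new_lam = dict(lam)
    for key in set(used) | set(out):
        if key[1] == 0:
            continue  # j = 0 is the boundary; no constraint there
        g = used[key] - out[key]
        if g != 0:  # a zero violation leaves lam(i, j) unchanged
            new_lam[key] = new_lam.get(key, 0.0) - eta * g
    return new_lam
```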
Extension: Adjacent Agreement
- A model that allows adjacent matches.
- x(4, 3) = 1 is allowed because the adjacent link y(3, 3) = 1.

[Figure: alignment grid between f and e]
Extension: Adjacent Agreement
- A modified Viterbi algorithm with the same complexity.

[Figure: alignment grid between f and e]
Preview Results: Lagrangian Relaxation
- Lagrangian relaxation is only guaranteed to solve the linear programming relaxation.
- 54.7% exact solutions.

[Bar chart: % of sentences with a certificate, comparing D&M and LR]
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
Results
Strategy: Adding Constraints
Upper bound:
L(λ) ≥ f(x*) + g(y*)

Current gap:
L(λ) − (f(x*) + g(y*))

Tightening: find a different dual with a smaller gap, i.e. find L′(λ) ≤ L(λ).
Strategy: Adding Constraints
- Re-introduce a relaxed forward constraint on link y(i, j).
- Add the most often violated constraints.
- For example, adding the constraint for link y(2, 3):

[Figure: two alignment grids contrasting an alignment feasible under L(λ) but not under L′(λ) with alignments feasible under both L(λ) and L′(λ)]
Outline
- Bidirectional Alignment: HMM Word Alignment; Lagrangian Relaxation
- Tightening: Adding Constraints; Pruning
Results
Relaxed Max-marginal
- Improve efficiency while keeping the optimality guarantee.
- M(i, j; λ): relaxed max-marginal value, the highest dual value over all alignments using the link x(i, j).
- Example: M(2, 3; λ)

[Figure: alignment grid between f and e]
Exact Coarse-to-Fine Pruning
- We can safely remove an alignment link x(i, j) if M(i, j; λ) < lb.
- Lower bound: f(x) + g(y) ≤ f(x*) + g(y*) for any feasible x, y.

[Figure: alignment grid with pruned links removed]
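The pruning rule itself is a one-liner. A hedged sketch, assuming a precomputed table of relaxed max-marginals M(i, j; λ) and a primal lower bound lb:

```python
def prune_links(max_marginal, lb):
    """Exact coarse-to-fine pruning: a link (i, j) can be removed
    safely whenever M(i, j; lam) < lb, because M upper-bounds every
    alignment using the link, so none can beat a known feasible one."""
    return {link for link, m in max_marginal.items() if m >= lb}
```

Every surviving link stays in the search space, which is why the pruning is exact rather than heuristic.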
Preview Results: Pruning
[Plots: score relative to optimal (best dual, intersection) vs. iteration; relative search space size vs. number of iterations]
Finding Lower Bounds
A greedy heuristic algorithm:
- Repeat until there is no null-aligned word, or there is no score increase.

[Figure: alignment grid between f and e, built up over several animation steps]
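The greedy loop described above might look like this. The per-word gains are assumed precomputed, which glosses over rescoring after each addition; everything here is an illustrative assumption, not the paper's procedure:

```python
def greedy_lower_bound(null_words, gain):
    """Greedily align the null-aligned word with the largest score
    gain, stopping when none remain or no link increases the score.
    `gain[w]` is the (assumed precomputed) best increase for word w."""
    total, aligned = 0.0, []
    remaining = set(null_words)
    while remaining:
        w = max(remaining, key=lambda u: gain[u])
        if gain[w] <= 0:
            break  # no score increase: stop
        total += gain[w]
        aligned.append(w)
        remaining.remove(w)
    return aligned, total
```

The resulting feasible score serves as the lower bound lb used by the pruning rule.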
Results: Coarse-to-Fine Pruning with Good Lower Bound
[Plots: score relative to optimal (best dual, best primal, intersection) vs. iteration; relative search space size vs. number of iterations]
Experiments
- Trained on 6.2 million words of Chinese-English FBIS data.
- Evaluated on 150 sentence pairs of NIST 2002 data.
- Setup identical to DeNero and Macherey (2011).
Results: with Adding Constraints and Pruning
[Bar chart: % of sentences with a certificate, comparing D&M, LR, and CONS]
Results: Speed and Optimal Solutions
        time     certificate (%)
ILP     924.24   100.0
LR      6.33     54.7
CONS    21.08    86.0
D&M     -        6.2

- ILP: integer linear programming
- LR: our Lagrangian relaxation algorithm
- CONS: LR with added constraints
- D&M: dual decomposition algorithm of DeNero and Macherey (2011)
Results: Accuracy
Alignment Error Rate (AER)
        union   grow-diag-final   intersection
DIR     33.4    32.1              27.0
D&M     29.1    28.0              23.6
CONS    26.4    (no combination)

Phrase Pair F1
        union   grow-diag-final   intersection
DIR     46.3    48.4              51.9
D&M     52.5    53.0              55.3
CONS    52.7    (no combination)

- DIR: directional alignments.
- When the algorithm does not converge, D&M uses combination procedures, while CONS uses the highest-scoring feasible solution.
Conclusion
- A Lagrangian relaxation algorithm for bidirectional alignment.
- Adding constraints incrementally.
- Coarse-to-fine pruning.
- Convergence on 86% of sentences, compared to the 6% reported by DeNero and Macherey (2011).
- Future work: apply these techniques to a more flexible model with a wider range of directional matches.
Thank you!