![Page 1: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/1.jpg)
Pair HMMs and edit distance
Ristad & Yianilos
![Page 2: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/2.jpg)
Special meeting Wed 4/14
• What: Evolving and Self-Managing Data Integration Systems
• Who: AnHai Doan, Univ. of Illinois at Urbana-Champaign
• When: Wednesday, April 14, 2004 @ 11am (food at 10:30am)
• Where: Sennott Square Building, room 5317
![Page 3: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/3.jpg)
Special meeting 4/28 (last class)
• First International Joint Conference on Information Extraction, Information Integration, and Sequential Learning
• 10:30-11:50 am, Wean Hall 4601
• All project proposals have been accepted as paper abstracts, and you’re all invited to present for 10min (including questions)
![Page 4: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/4.jpg)
Pair HMMs – Ristad & Yianilos
• HMM review
– notation
– inference (forward algorithm)
– learning (forward-backward & EM)
• Pair HMMs
– notation
– generating edit strings
– distance metrics (stochastic, Viterbi)
– inference (forward)
– learning (forward-backward & EM)
• Results from R&Y paper
– K-NN with trained distance, hidden prototypes
– problem: phoneme strings => words
• Advanced Pair HMMs
– adding state (e.g. for affine gap models)
– Smith-Waterman?
– CRF training?
last week
today
![Page 5: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/5.jpg)
HMM Notation
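• parameters of the HMM:
– $\Pr(l' \to l)$: transition probability, $\Pr(s_{t+1} = l \mid s_t = l')$
– $\Pr(l \to x)$: emission probability, $\Pr(x_t = x \mid s_t = l)$
• $x_1 x_2 \ldots x_T$: input string, aka $x^T$
• $x_1 \ldots x_t$: a prefix of $x^T$, aka $x^t$
• $x_t \ldots x_u$: a substring of $x^T$, aka $x_t^u$
• $s_t \in \{l_1, \ldots, l_K\}$: state the HMM is in after emitting $x_t$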
![Page 6: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/6.jpg)
HMM Example
[State diagram: two states, 1 and 2, with transitions Pr(1->1), Pr(1->2), Pr(2->1), Pr(2->2)]
Emission probabilities Pr(1->x):
d 0.3
h 0.5
b 0.2
Emission probabilities Pr(2->x):
a 0.3
e 0.5
o 0.2
Sample output: $x^T$ = heehahaha, $s^T$ = 122121212
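The example can be simulated with a short sampler. This is a minimal sketch: the emission tables are the ones on the slide, but the slide gives no numeric transition probabilities, so the values in `TRANS`, the choice of start state, and all function names are assumptions made for illustration.

```python
import random

# Emission tables copied from the slide.
EMIT = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}
# Hypothetical transition probabilities (not given on the slide).
TRANS = {1: {1: 0.5, 2: 0.5},
         2: {1: 0.5, 2: 0.5}}


def draw(dist, rng):
    """Draw one key from a {outcome: probability} table."""
    outcomes = list(dist)
    weights = [dist[k] for k in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_hmm(length, start_state=1, seed=0):
    """Emit `length` letters; return (letters, states) as two strings."""
    rng = random.Random(seed)
    state = start_state  # assumed start state
    letters, states = [], []
    for _ in range(length):
        letters.append(draw(EMIT[state], rng))
        states.append(str(state))
        state = draw(TRANS[state], rng)
    return "".join(letters), "".join(states)


def emission_prob(x, s):
    """Product of emission probabilities for letters x along state path s."""
    p = 1.0
    for ch, st in zip(x, s):
        p *= EMIT[int(st)].get(ch, 0.0)
    return p


if __name__ == "__main__":
    print(sample_hmm(9))
    print(emission_prob("heehahaha", "122121212"))
```

`emission_prob` only multiplies the emission terms for the slide's sample run; the full joint probability would also include the transition terms.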
![Page 7: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/7.jpg)
HMM Inference
[Trellis: one row per state l=1,...,K and one column per time step t=1,...,T, with the letters x1, x2, x3, ..., xT below the columns]
Key point: Pr(s_i=l) depends only on Pr(l'->l) and s_{i-1}
![Page 8: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/8.jpg)
HMM Inference
Because Pr(s_i=l) depends only on Pr(l'->l) and s_{i-1}, you can propagate probabilities forward, one trellis column at a time.
![Page 9: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/9.jpg)
HMM Inference – Forward Algorithm
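[Trellis: states l=1,...,K by letters x1, x2, x3, ..., xT; the cell for state l at time t holds α(l,t)]

$$
\begin{aligned}
\alpha(l,t) &= \Pr(x^t, s_t = l) \\
&= \sum_{l'} \Pr(x^{t-1}, s_{t-1} = l') \, \Pr(l' \to l) \, \Pr(l \to x_t) \\
&= \sum_{l'} \alpha(l', t-1) \, \Pr(l' \to l) \, \Pr(l \to x_t)
\end{aligned}
$$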
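The forward recurrence can be checked against a brute-force sum over all state paths. The sketch below reuses the emission tables from the HMM example; the transition probabilities `TRANS` and the start distribution `START` are assumed values, since the slides do not give them, and the function names are illustrative.

```python
from itertools import product

STATES = [1, 2]
EMIT = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}
TRANS = {1: {1: 0.3, 2: 0.7}, 2: {1: 0.6, 2: 0.4}}  # hypothetical
START = {1: 0.5, 2: 0.5}                            # hypothetical Pr(s_1 = l)


def forward(x):
    """alpha[t-1][l] = Pr(x^t, s_t = l) for a non-empty string x."""
    alpha = [{l: START[l] * EMIT[l].get(x[0], 0.0) for l in STATES}]
    for ch in x[1:]:
        prev = alpha[-1]
        alpha.append({
            l: sum(prev[lp] * TRANS[lp][l] for lp in STATES) * EMIT[l].get(ch, 0.0)
            for l in STATES
        })
    return alpha


def string_prob(x):
    """Pr(x^T): sum the last trellis column over states."""
    return sum(forward(x)[-1].values())


def brute_force_prob(x):
    """Same quantity by enumerating all K^T state paths (tiny inputs only)."""
    total = 0.0
    for path in product(STATES, repeat=len(x)):
        p = START[path[0]] * EMIT[path[0]].get(x[0], 0.0)
        for t in range(1, len(x)):
            p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]].get(x[t], 0.0)
        total += p
    return total


if __name__ == "__main__":
    print(string_prob("heeha"), brute_force_prob("heeha"))
```

The forward pass costs O(T·K²) operations, while the enumeration costs O(T·K^T); the two agree because each trellis cell already sums over every path that reaches it.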
![Page 10: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/10.jpg)
HMM Learning - EM
Expectation maximization:
– Find expectations, i.e. Pr(s_i=l) for i=1,...,T
• forward algorithm + epsilon
• hidden variables are states s at times t=1,...,t=T
– Maximize probability of parameters given expectations:
• replace #(l’->l)/#(l’) with weighted version of counts
• replace #(l’->x)/#(l’) with weighted version
![Page 11: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/11.jpg)
HMM Inference
[Trellis: states l=1,...,K by time steps t=1,...,T, with the letters x1, x2, x3, ..., xT below the columns]
Forward algorithm: computes probabilities α(l,t) based on information in the first t letters of the string; it ignores “downstream” information.
![Page 12: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/12.jpg)
HMM Inference
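[Trellis: states l=1,...,K by letters x1, x2, x3, ..., xT]

$$
\begin{aligned}
\Pr(x^T, s_i = l) &= \Pr(x^i, s_i = l) \, \Pr(x_{i+1}^T \mid s_i = l) \\
\beta(l,i) &= \Pr(x_{i+1}^T \mid s_i = l) \\
&= \sum_{l'} \Pr(l \to l') \, \Pr(l' \to x_{i+1}) \, \Pr(x_{i+2}^T \mid s_{i+1} = l') \\
&= \sum_{l'} \Pr(l \to l') \, \Pr(l' \to x_{i+1}) \, \beta(l', i+1)
\end{aligned}
$$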
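A quick way to sanity-check the backward recurrence is to verify that the sum over states of α(l,i)·β(l,i) equals Pr(x^T) at every position i. The sketch below does that; `TRANS` and `START` are assumed values, and the boundary condition β(l,T)=1 assumes the HMM has no explicit end state, which the slides do not specify.

```python
STATES = [1, 2]
EMIT = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}
TRANS = {1: {1: 0.3, 2: 0.7}, 2: {1: 0.6, 2: 0.4}}  # hypothetical
START = {1: 0.5, 2: 0.5}                            # hypothetical


def forward(x):
    """alpha[i][l] = Pr(x_1..x_{i+1}, state l after the (i+1)-th letter)."""
    alpha = [{l: START[l] * EMIT[l].get(x[0], 0.0) for l in STATES}]
    for ch in x[1:]:
        prev = alpha[-1]
        alpha.append({
            l: sum(prev[lp] * TRANS[lp][l] for lp in STATES) * EMIT[l].get(ch, 0.0)
            for l in STATES
        })
    return alpha


def backward(x):
    """beta[i][l] = Pr(remaining letters | state l after the (i+1)-th letter)."""
    T = len(x)
    beta = [None] * T
    beta[T - 1] = {l: 1.0 for l in STATES}  # assumed boundary: no end state
    for i in range(T - 2, -1, -1):
        beta[i] = {
            l: sum(TRANS[l][lp] * EMIT[lp].get(x[i + 1], 0.0) * beta[i + 1][lp]
                   for lp in STATES)
            for l in STATES
        }
    return beta


def per_position_totals(x):
    """sum_l alpha(l,i) * beta(l,i) for each i; every entry should equal Pr(x^T)."""
    alpha, beta = forward(x), backward(x)
    return [sum(alpha[i][l] * beta[i][l] for l in STATES) for i in range(len(x))]


if __name__ == "__main__":
    print(per_position_totals("heeha"))
```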
![Page 13: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/13.jpg)
HMM Learning - EM
Expectation maximization:
– Find expectations, i.e. Pr(s_i=l) for i=1,...,T
• forward backward algorithm
• hidden variables are states s at times t=1,...,t=T
– Maximize probability of parameters given expectations:
• replace #(l’->l)/#(l’) with weighted version of counts
• replace #(l’->x)/#(l’) with weighted version
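One EM iteration along these lines can be sketched as follows: the E-step uses forward-backward to get state posteriors, and the M-step replaces the count ratios with their posterior-weighted versions. The starting parameters `START0`, `TRANS0`, and `EMIT0` are made-up initial guesses (not from the slides), and the code handles a single training string only.

```python
STATES = [1, 2]
ALPHABET = "dhbaeo"
# Hypothetical initial guesses; both states can emit every letter.
START0 = {1: 0.5, 2: 0.5}
TRANS0 = {1: {1: 0.3, 2: 0.7}, 2: {1: 0.6, 2: 0.4}}
EMIT0 = {1: {"h": 0.4, "e": 0.2, "a": 0.2, "d": 0.1, "b": 0.05, "o": 0.05},
         2: {"h": 0.1, "e": 0.4, "a": 0.3, "d": 0.05, "b": 0.05, "o": 0.1}}


def forward(x, start, trans, emit):
    alpha = [{l: start[l] * emit[l][x[0]] for l in STATES}]
    for ch in x[1:]:
        prev = alpha[-1]
        alpha.append({l: sum(prev[lp] * trans[lp][l] for lp in STATES) * emit[l][ch]
                      for l in STATES})
    return alpha


def backward(x, trans, emit):
    beta = [{l: 1.0 for l in STATES}]  # assumed boundary: no end state
    for ch in reversed(x[1:]):
        nxt = beta[0]
        beta.insert(0, {l: sum(trans[l][lp] * emit[lp][ch] * nxt[lp] for lp in STATES)
                        for l in STATES})
    return beta


def em_step(x, start, trans, emit):
    """One EM iteration on string x; returns new parameters and the old Pr(x)."""
    T = len(x)
    alpha, beta = forward(x, start, trans, emit), backward(x, trans, emit)
    px = sum(alpha[T - 1].values())
    # E-step: gamma[i][l] = Pr(s_i = l | x)
    gamma = [{l: alpha[i][l] * beta[i][l] / px for l in STATES} for i in range(T)]
    # M-step: weighted version of #(l'->l) / #(l')
    new_trans = {}
    for lp in STATES:
        counts = {l: sum(alpha[i][lp] * trans[lp][l] * emit[l][x[i + 1]]
                         * beta[i + 1][l] / px for i in range(T - 1))
                  for l in STATES}
        total = sum(counts.values())
        new_trans[lp] = {l: counts[l] / total for l in STATES}
    # M-step: weighted version of #(l->x) / #(l)
    new_emit = {}
    for l in STATES:
        total = sum(g[l] for g in gamma)
        new_emit[l] = {c: sum(gamma[i][l] for i in range(T) if x[i] == c) / total
                       for c in ALPHABET}
    return dict(gamma[0]), new_trans, new_emit, px


if __name__ == "__main__":
    s1, t1, e1, px0 = em_step("heehahaha", START0, TRANS0, EMIT0)
    print(px0, em_step("heehahaha", s1, t1, e1)[3])
```

Running the step twice shows the likelihood of the training string before and after one update; EM guarantees it does not decrease.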
![Page 14: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/14.jpg)
Pair HMM Notation
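• parameters of the pair HMM:
– E = {<a,b>} ∪ {<a,->} ∪ {<-,b>} ∪ {END}: emissions/edits
– Pr(e): emission probability of an edit e in E, also written Pr(<a,b>), Pr(<a,->), Pr(<-,b>)
• $x, y$: input strings, aka $x^T, y^V$
• $z^n \in E^*$: string of edits
• $r_k \in \{1..T\} \times \{1..V\}$: hidden property associated with $z_k$, i.e. the position in x and y reached after the k-th edit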
![Page 15: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/15.jpg)
Pair HMM Example
[State diagram: a single state, 1, that emits each edit e with probability Pr(e)]
e Pr(e)
<a,a> 0.10
<e,e> 0.10
<h,h> 0.10
<a,-> 0.05
<h,t> 0.05
<-,h> 0.01
... ..
![Page 16: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/16.jpg)
Pair HMM Example
[State diagram: a single state, 1, that emits each edit e with probability Pr(e)]
e Pr(e)
<a,a> 0.10
<e,e> 0.10
<h,h> 0.10
<e,-> 0.05
<h,t> 0.05
<-,h> 0.01
... ..
Sample run: $z^n$ = <h,t>,<e,e>,<e,e>,<h,h>,<e,->,<e,e>
Strings x,y produced by $z^n$: x=heehee, y=teehe
Notice that x,y is also produced by $z^4$ + <e,e>,<e,-> (the first four edits followed by the last two in the opposite order) and by many other edit strings
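The mapping from an edit string to the pair (x,y) is easy to make concrete. The sketch below uses the emission table from the slide, with `-` standing for the empty symbol; the function names are illustrative, and the END emission is left out because the slide gives no probability for it.

```python
EPS = "-"  # the empty symbol, written "-" in the slide's table

# Emission probabilities listed on the slide (the table is truncated there).
PROB = {("a", "a"): 0.10, ("e", "e"): 0.10, ("h", "h"): 0.10,
        ("e", EPS): 0.05, ("h", "t"): 0.05, (EPS, "h"): 0.01}


def yield_strings(edits):
    """Concatenate the left and right sides of the edits, skipping gaps."""
    x = "".join(a for a, _ in edits if a != EPS)
    y = "".join(b for _, b in edits if b != EPS)
    return x, y


def edit_string_prob(edits):
    """Product of emission probabilities (END term omitted)."""
    p = 1.0
    for e in edits:
        p *= PROB[e]
    return p


# The sample run from the slide, and the alternative that swaps the last two edits.
z = [("h", "t"), ("e", "e"), ("e", "e"), ("h", "h"), ("e", EPS), ("e", "e")]
z_alt = z[:4] + [("e", "e"), ("e", EPS)]

if __name__ == "__main__":
    print(yield_strings(z), yield_strings(z_alt))
    print(edit_string_prob(z), edit_string_prob(z_alt))
```

Both edit strings yield the same pair and, because they use the same multiset of edits, the same probability.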
![Page 17: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/17.jpg)
Distances based on pair HMMs
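$$
\Pr(x^T, y^V \mid \theta) = \sum_{z^n \,:\, \mathrm{EDIT}(z^n, x^T, y^V)} \Pr(z^n \mid \theta) = \sum_{z^n \,:\, \mathrm{EDIT}(z^n, x^T, y^V)} \; \prod_i \Pr(z_i)
$$

$$
d_{\text{stochastic}}(x^T, y^V) = -\log \Pr(x^T, y^V \mid \theta)
$$

$$
d_{\text{viterbi}}(x^T, y^V) = -\log \max_{z^n \,:\, \mathrm{EDIT}(z^n, x^T, y^V)} \Pr(z^n \mid \theta)
$$

Here θ denotes the parameters of the pair HMM, and EDIT(z^n, x^T, y^V) holds when the edit string z^n produces the pair (x^T, y^V). The stochastic distance sums over all such edit strings; the Viterbi distance keeps only the most probable one.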
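For very short strings both distances can be computed directly from the definitions by enumerating every edit string that produces the pair. The sketch below does this by brute force; the probabilities in `edit_prob` and `P_END` are invented for illustration (chosen to sum to 1 over a four-letter alphabet), not taken from the paper.

```python
import math

EPS = "-"
P_END = 0.32  # hypothetical probability of the END emission


def edit_prob(a, b):
    """Hypothetical emission probabilities over the alphabet {a, e, h, t}."""
    if a == EPS or b == EPS:
        return 0.02                     # insertion or deletion
    return 0.10 if a == b else 0.01     # match or substitution


def edit_strings(x, y):
    """Yield every edit string (list of pairs) that produces (x, y)."""
    if not x and not y:
        yield []
        return
    if x and y:
        for rest in edit_strings(x[1:], y[1:]):
            yield [(x[0], y[0])] + rest
    if x:
        for rest in edit_strings(x[1:], y):
            yield [(x[0], EPS)] + rest
    if y:
        for rest in edit_strings(x, y[1:]):
            yield [(EPS, y[0])] + rest


def distances(x, y):
    """Return (d_stochastic, d_viterbi) by exhaustive enumeration."""
    probs = []
    for z in edit_strings(x, y):
        p = P_END
        for a, b in z:
            p *= edit_prob(a, b)
        probs.append(p)
    return -math.log(sum(probs)), -math.log(max(probs))


if __name__ == "__main__":
    print(len(list(edit_strings("he", "te"))))
    print(distances("he", "te"))
```

Since the maximum term can never exceed the sum, the Viterbi distance is always at least the stochastic distance. The number of edit strings grows exponentially with string length, which is why the dynamic program is needed in practice.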
![Page 18: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/18.jpg)
Pair HMM Inference
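$$
\begin{aligned}
\alpha(t,v) &= \Pr(x^t, y^v) \\
&= \Pr(x^{t-1}, y^{v-1}) \Pr(\langle x_t, y_v \rangle) + \Pr(x^{t-1}, y^v) \Pr(\langle x_t, - \rangle) + \Pr(x^t, y^{v-1}) \Pr(\langle -, y_v \rangle) \\
&= \alpha(t-1, v-1) \Pr(\langle x_t, y_v \rangle) + \alpha(t-1, v) \Pr(\langle x_t, - \rangle) + \alpha(t, v-1) \Pr(\langle -, y_v \rangle)
\end{aligned}
$$

[Matrix fragment: the cell α(t,v) is computed from its three neighbors α(t-1,v-1), α(t-1,v), and α(t,v-1)]
Dynamic programming is possible: fill out the matrix left-to-right, top-down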
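The recurrence translates directly into a table-filling routine. In the sketch below, swapping the sum for a max gives the Viterbi variant; the emission probabilities and `P_END` are hypothetical values, and the base case α(0,0)=1 is an assumption consistent with the recurrence rather than something stated on the slide.

```python
import math

EPS = "-"
P_END = 0.32  # hypothetical probability of the END emission


def edit_prob(a, b):
    """Hypothetical emission probabilities over the alphabet {a, e, h, t}."""
    if a == EPS or b == EPS:
        return 0.02
    return 0.10 if a == b else 0.01


def pair_forward(x, y, combine=sum):
    """Fill alpha[t][v]; combine=sum gives the forward table, combine=max Viterbi."""
    T, V = len(x), len(y)
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0  # assumed base case: nothing emitted yet
    for t in range(T + 1):
        for v in range(V + 1):
            if t == 0 and v == 0:
                continue
            terms = []
            if t > 0 and v > 0:   # substitution or match <x_t, y_v>
                terms.append(alpha[t - 1][v - 1] * edit_prob(x[t - 1], y[v - 1]))
            if t > 0:             # deletion <x_t, ->
                terms.append(alpha[t - 1][v] * edit_prob(x[t - 1], EPS))
            if v > 0:             # insertion <-, y_v>
                terms.append(alpha[t][v - 1] * edit_prob(EPS, y[v - 1]))
            alpha[t][v] = combine(terms)
    return alpha


def stochastic_distance(x, y):
    return -math.log(pair_forward(x, y, sum)[len(x)][len(y)] * P_END)


def viterbi_distance(x, y):
    return -math.log(pair_forward(x, y, max)[len(x)][len(y)] * P_END)


if __name__ == "__main__":
    print(stochastic_distance("heehee", "teehe"), viterbi_distance("heehee", "teehe"))
```

The table has (T+1)·(V+1) cells and each cell needs at most three lookups, so both distances cost O(T·V).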
![Page 19: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/19.jpg)
Pair HMM Inference
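$$
\alpha(t,v) = \Pr(x^t, y^v) = \alpha(t-1, v-1) \Pr(\langle x_t, y_v \rangle) + \alpha(t-1, v) \Pr(\langle x_t, - \rangle) + \alpha(t, v-1) \Pr(\langle -, y_v \rangle)
$$

[Matrix: one row per position v=1,...,V in y and one column per position t=1,...,T in x, with the letters x1, x2, x3, ..., xT below the columns]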
![Page 21: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/21.jpg)
Pair HMM Inference
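One difference: after i emissions of the pair HMM, we do not know the column position
[Matrix: rows v=1,...,V by columns t=1,...,T, with cells along example paths labeled by emission count (i=1, i=2, i=3); paths with the same number of emissions can end in different columns]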
![Page 22: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/22.jpg)
Pair HMM Inference: Forward-Backward
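$$
\begin{aligned}
\beta(i,j) &= \Pr(x_i^T, y_j^V) \\
&= \Pr(\langle x_i, y_j \rangle) \Pr(x_{i+1}^T, y_{j+1}^V) + \Pr(\langle x_i, - \rangle) \Pr(x_{i+1}^T, y_j^V) + \Pr(\langle -, y_j \rangle) \Pr(x_i^T, y_{j+1}^V) \\
&= \Pr(\langle x_i, y_j \rangle) \, \beta(i+1, j+1) + \Pr(\langle x_i, - \rangle) \, \beta(i+1, j) + \Pr(\langle -, y_j \rangle) \, \beta(i, j+1)
\end{aligned}
$$

Combining the forward and backward tables gives the probability that the edit string steps from position (i,j) directly to position (i',j') while producing the whole pair:

$$
\sum_k \Pr(r_k = (i,j),\; r_{k+1} = (i',j'),\; x^T, y^V) = \alpha(i,j) \, \Pr(e) \, \beta(i'+1, j'+1)
$$

where e is the edit that moves from (i,j) to (i',j'), and β(i,j) covers the suffixes that start at positions i and j.

[Matrix: rows v=1,...,V by columns t=1,...,T]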
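The expected edit counts needed for the E-step can be sketched as below. The code uses 0-based tables: `alpha[i][j]` covers the first i letters of x and the first j letters of y, and `beta[i][j]` covers everything after them, so the product of `alpha`, one edit probability, and `beta` at the destination cell is the mass of all alignments using that edit at that cell. The emission probabilities, `P_END`, and the boundary value `beta[T][V] = P_END` are assumptions for illustration.

```python
from collections import defaultdict

EPS = "-"
P_END = 0.32  # hypothetical probability of the END emission


def edit_prob(a, b):
    """Hypothetical emission probabilities over the alphabet {a, e, h, t}."""
    if a == EPS or b == EPS:
        return 0.02
    return 0.10 if a == b else 0.01


def moves(x, y, i, j):
    """Edits available from cell (i, j), with the cell each one leads to."""
    if i < len(x) and j < len(y):
        yield (x[i], y[j]), i + 1, j + 1
    if i < len(x):
        yield (x[i], EPS), i + 1, j
    if j < len(y):
        yield (EPS, y[j]), i, j + 1


def forward(x, y):
    T, V = len(x), len(y)
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for i in range(T + 1):
        for j in range(V + 1):
            # Push probability mass from (i, j) to its successors.
            for e, i2, j2 in moves(x, y, i, j):
                alpha[i2][j2] += alpha[i][j] * edit_prob(*e)
    return alpha


def backward(x, y):
    T, V = len(x), len(y)
    beta = [[0.0] * (V + 1) for _ in range(T + 1)]
    beta[T][V] = P_END  # assumed boundary: only END remains
    for i in range(T, -1, -1):
        for j in range(V, -1, -1):
            for e, i2, j2 in moves(x, y, i, j):
                beta[i][j] += edit_prob(*e) * beta[i2][j2]
    return beta


def expected_edit_counts(x, y):
    """Posterior expected number of times each edit is used to produce (x, y)."""
    alpha, beta = forward(x, y), backward(x, y)
    total = beta[0][0]  # Pr(x, y), including END
    counts = defaultdict(float)
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            for e, i2, j2 in moves(x, y, i, j):
                counts[e] += alpha[i][j] * edit_prob(*e) * beta[i2][j2] / total
    return dict(counts)


if __name__ == "__main__":
    for edit, c in sorted(expected_edit_counts("heehee", "teehe").items()):
        print(edit, round(c, 4))
```

Normalizing these counts (together with one END count per training pair) would give the re-estimated emission probabilities for the M-step.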
![Page 23: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/23.jpg)
Multiple states
[State diagram: three states, 1, 2, and 3, each with its own emission table]
State 1:
e Pr(e)
<a,a> 0.10
<e,e> 0.10
<h,h> 0.10
<a,-> 0.05
<h,t> 0.05
<-,h> 0.01
... ..
State 2:
e Pr(e)
<a,a> 0.10
<e,e> 0.10
<h,h> 0.10
<a,-> 0.05
State 3:
e Pr(e)
<a,a> 0.10
<e,e> 0.10
<h,h> 0.10
<a,-> 0.05
![Page 24: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/24.jpg)
An extension: multiple states
[Two copies of the matrix (rows v=1,...,V by columns t=1,...,T), one for state l=1 and one for state l=2]
Conceptually, add a “state” dimension to the model
EM methods generalize easily to this setting
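Adding the state dimension to the forward table might look like the sketch below, where `a[l][t][v]` is the probability of producing the first t letters of x and the first v letters of y with the last edit emitted from state l. The model interface (a start distribution, a transition table, and a per-state emission function) and all numeric values are assumptions; the slides do not specify how states, transitions, and emissions are parameterized.

```python
EPS = "-"


def edit_prob(a, b):
    """Hypothetical emission probabilities, shared by all states in this demo."""
    if a == EPS or b == EPS:
        return 0.02
    return 0.10 if a == b else 0.01


def multi_state_forward(x, y, states, start, trans, emit):
    """Return Pr(x, y) summed over states and alignments (END term omitted)."""
    T, V = len(x), len(y)
    if T == 0 and V == 0:
        return 1.0
    a = {l: [[0.0] * (V + 1) for _ in range(T + 1)] for l in states}

    def incoming(l, t, v):
        """Mass arriving in state l just before it emits from cell (t, v)."""
        if t == 0 and v == 0:
            return start[l]
        return sum(a[lp][t][v] * trans[lp][l] for lp in states)

    for t in range(T + 1):
        for v in range(V + 1):
            if t == 0 and v == 0:
                continue
            for l in states:
                total = 0.0
                if t > 0 and v > 0:
                    total += incoming(l, t - 1, v - 1) * emit(l, x[t - 1], y[v - 1])
                if t > 0:
                    total += incoming(l, t - 1, v) * emit(l, x[t - 1], EPS)
                if v > 0:
                    total += incoming(l, t, v - 1) * emit(l, EPS, y[v - 1])
                a[l][t][v] = total
    return sum(a[l][T][V] for l in states)


# Two states with identical emissions and uniform transitions (demo values).
STATES = [1, 2]
START = {1: 0.5, 2: 0.5}
TRANS = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 2: 0.5}}


def same_emit(l, a, b):
    return edit_prob(a, b)


if __name__ == "__main__":
    print(multi_state_forward("h", "t", STATES, START, TRANS, same_emit))
```

With identical emissions in every state the result collapses to the single-state forward value, which makes a convenient sanity check. An affine-gap model would instead give the states different emission tables, for example one state that favors matches and others that favor gaps.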
![Page 25: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/25.jpg)
Back to R&Y paper...
• They consider “coarse” and “detailed” models, as well as mixtures of both.
• Coarse model is like a back-off model – merge edit operations into equivalence classes (e.g. based on equivalence classes for chars).
• Test by learning distance for K-NN with an additional latent variable
![Page 26: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/26.jpg)
K-NN with latent prototypes
[Figure: the test example y (a string of phonemes) is compared, using the learned phonetic distance, with the possible prototypes x1, x2, x3, ..., xm (known word pronunciations) of the words w1, w2, ..., wK from the dictionary]
![Page 27: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/27.jpg)
K-NN with latent prototypes
[Figure: test example y, prototypes x1, x2, x3, ..., xm, and dictionary words w1, w2, ..., wK, linked by the learned phonetic distance]
The method needs (x,y) pairs to train a distance. To handle this, an additional level of EM is used to pick the “latent prototype” x to pair with each y.
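At test time the classifier reduces to a nearest-prototype lookup, which the sketch below illustrates. It is not the paper's method: plain Levenshtein distance stands in for the learned phonetic distance, the EM step that selects latent prototypes during training is not shown, and the lexicon entries and phoneme symbols are invented for the example.

```python
def levenshtein(x, y):
    """Unit-cost edit distance between two sequences (stand-in distance)."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def classify(y, lexicon, distance=levenshtein):
    """Return (word, prototype, distance) for the prototype closest to y."""
    best = (None, None, float("inf"))
    for word, prototypes in lexicon.items():
        for x in prototypes:
            d = distance(x, y)
            if d < best[2]:
                best = (word, x, d)
    return best


# Hypothetical lexicon: each word has one or more known pronunciations.
LEXICON = {
    "tomato": [("t", "ah", "m", "ey", "t", "ow"),
               ("t", "ah", "m", "aa", "t", "ow")],
    "potato": [("p", "ah", "t", "ey", "t", "ow")],
}

if __name__ == "__main__":
    observed = ("t", "ah", "m", "aa", "t", "ah")
    print(classify(observed, LEXICON))
```

Passing a different `distance` function, such as one of the pair-HMM distances, changes only the comparison and leaves the lookup unchanged.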
![Page 28: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/28.jpg)
Hidden prototype K-nn
![Page 29: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/29.jpg)
Experiments
• E1: on-line pronunciation dictionary
• E2: subset of E1 with corpus words
• E3: dictionary from training corpus
• E4: dictionary from training + test corpus (!)
• E5: E1 + E3
![Page 30: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/30.jpg)
Experiments
![Page 31: Pair HMMs and edit distance Ristad & Yianilos. Special meeting Wed 4/14 What: Evolving and Self-Managing Data Integration Systems Who: AnHai Doan, Univ](https://reader035.vdocuments.us/reader035/viewer/2022062515/56649f435503460f94c64277/html5/thumbnails/31.jpg)
Experiments