Hidden Markov Model
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520
Outline
• Markov Chain
• Hidden Markov Model
  – Observations, hidden states, initial, transition, and emission probabilities
• Three problems
  – Pb(observations): forward, backward procedure
  – Infer hidden states: forward-backward, Viterbi
  – Estimate parameters: Baum-Welch
iid process
• iid: independently and identically distributed
  – Events are not correlated to each other
  – The current event has no predictive power for the future event
  – E.g. Pb(girl | boy) = Pb(girl), Pb(coin H | H) = Pb(H), Pb(die 1 | 5) = Pb(1)
Discrete Markov Chain
• Discrete Markov process
  – Distinct states: S1, S2, …, SN
  – Regularly spaced discrete times: t = 1, 2, …
  – Markov chain: the future state depends only on the present state, not on the path taken to get there
  – aij: transition probability

$P[q_t = S_j \mid q_{t-1} = S_i, q_{t-2} = S_k, \ldots] = P[q_t = S_j \mid q_{t-1} = S_i]$

$a_{ij} = P[q_t = S_j \mid q_{t-1} = S_i], \qquad a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1$
Markov Chain Example 1
• States: exam grade 1 – Pass, 2 – Fail
• Discrete times: exam #1, #2, #3, …
• State transition probability

$A = \{a_{ij}\} = \begin{pmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{pmatrix}$

• Given PPPFFF, the probability of passing the next exam:

$P(q_7 = S_1 \mid A, O = \{S_1, S_1, S_1, S_2, S_2, S_2\}) = P(q_7 = S_1 \mid A, q_6 = S_2) = 0.5$
Markov Chain Example 2
• States: 1 – rain; 2 – cloudy; 3 – sunny
• Discrete times: day 1, 2, 3, …
• State transition probability

$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$

• Given sunny (S3) at t = 1:

$P(O = \{S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3\} \mid A, q_1 = S_3) = a_{33} a_{33} a_{31} a_{11} a_{13} a_{32} a_{23} = 0.8 \cdot 0.8 \cdot 0.1 \cdot 0.4 \cdot 0.3 \cdot 0.1 \cdot 0.2 = 1.536 \times 10^{-4}$
Markov Chain Example 3
• States: fair coin F, unfair (biased) coin B
• Discrete times: flip 1, 2, 3, …
• Initial probability: F = 0.6, B = 0.4
• Transition probability

$A = \begin{pmatrix} a_{FF} & a_{FB} \\ a_{BF} & a_{BB} \end{pmatrix} = \begin{pmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{pmatrix}$

• Prob(FFBBFFFB) $= \pi_F \, a_{FF} a_{FB} a_{BB} a_{BF} a_{FF} a_{FF} a_{FB} = 0.6 \cdot 0.9 \cdot 0.1 \cdot 0.7 \cdot 0.3 \cdot 0.9 \cdot 0.9 \cdot 0.1 \approx 9.2 \times 10^{-4}$ (computed in the sketch below)
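The following is a minimal Python sketch (not from the original slides) that scores a state sequence under this fair/biased-coin Markov chain; the function name chain_prob is my own.

```python
# Fair/biased-coin Markov chain from Example 3.
init = {"F": 0.6, "B": 0.4}                    # initial probabilities
trans = {("F", "F"): 0.9, ("F", "B"): 0.1,     # transition probabilities
         ("B", "F"): 0.3, ("B", "B"): 0.7}

def chain_prob(path):
    """P(path) = init[first state] * product of transition probabilities."""
    p = init[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= trans[(prev, cur)]
    return p

print(chain_prob("FFBBFFFB"))  # 0.6*0.9*0.1*0.7*0.3*0.9*0.9*0.1 ≈ 9.19e-4
```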
Hidden Markov Model
• Coin toss example
• Coin transition is a Markov chain
• The probability of H/T depends on the coin used
• The H/T observations are generated by a hidden Markov chain (the coin state is hidden)
Hidden Markov Model
• Elements of an HMM (coin toss)
  – N, the number of states (F / B)
  – M, the number of distinct observations (H / T)
  – A = {aij}, state transition probabilities
  – B = {bj(k)}, emission probabilities
  – π = {πi}, initial state distribution: πF = 0.4, πB = 0.6

$A = \begin{pmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{pmatrix}$

$b_F(H) = 0.5, \; b_F(T) = 0.5, \qquad b_B(H) = 0.8, \; b_B(T) = 0.2$
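As a running aid for the procedures below, here is a minimal Python sketch of these coin-toss HMM parameters; the variable names (states, pi, A, B, obs) are my own and are reused by the later snippets.

```python
# Coin-toss HMM parameters from the slides.
states = ["F", "B"]                        # fair / biased coin
pi = {"F": 0.4, "B": 0.6}                  # initial state distribution
A = {("F", "F"): 0.9, ("F", "B"): 0.1,     # transition probabilities
     ("B", "F"): 0.3, ("B", "B"): 0.7}
B = {("F", "H"): 0.5, ("F", "T"): 0.5,     # emission probabilities
     ("B", "H"): 0.8, ("B", "T"): 0.2}
obs = "HTTHHHT"                            # observed flips
```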
HMM Applications
• Stock market: the bull/bear market is the hidden Markov chain; daily stock up/down is observed and depends on the big market trend
• Speech recognition: sentences & words are the hidden Markov chain; the spoken sound is observed (heard) and depends on the words
• Digital signal processing: the source signal (0/1) is the hidden Markov chain; the arriving signal's fluctuation is observed and depends on the source
• Bioinformatics: sequence motif finding, gene prediction, genome copy number change, protein structure prediction, protein-DNA interaction prediction
Basic Problems for HMM
1. Given λ, how to compute P(O|λ) for an observation sequence O = O1O2…OT
   – Probability of observing HTTHHHT …
   – Forward procedure, backward procedure
2. Given the observation sequence O = O1O2…OT and λ, how to choose a state sequence Q = q1q2…qT
   – What is the hidden coin behind each flip?
   – Forward-backward, Viterbi
3. How to estimate λ = (A, B, π) so as to maximize P(O|λ)
   – How to estimate the coin parameters?
   – Baum-Welch (expectation maximization)
Problem 1: P(O|λ)
• Suppose we know the state sequence Q
  – O = H T T H H H T
  – Q = F F B F F B B
  – Q = B F B F B B B
• Each given path Q has a probability for O

$P(O \mid Q, \lambda) = b_{q_1}(O_1) \, b_{q_2}(O_2) \cdots b_{q_T}(O_T)$

$P(O \mid FFBFFBB, \lambda) = b_F(H) b_F(T) b_B(T) b_F(H) b_F(H) b_B(H) b_B(T) = 0.5 \cdot 0.5 \cdot 0.2 \cdot 0.5 \cdot 0.5 \cdot 0.8 \cdot 0.2$

$P(O \mid BFBFBBB, \lambda) = b_B(H) b_F(T) b_B(T) b_F(H) b_B(H) b_B(H) b_B(T) = 0.8 \cdot 0.5 \cdot 0.2 \cdot 0.5 \cdot 0.8 \cdot 0.8 \cdot 0.2$
Problem 1: P(O|λ)
• What is the probability of this path Q?
  – Q = F F B F F B B
  – Q = B F B F B B B
• Each given path Q has its own probability

$P(FFBFFBB \mid \lambda) = \pi_F \, a_{FF} a_{FB} a_{BF} a_{FF} a_{FB} a_{BB} = 0.6 \cdot 0.9 \cdot 0.1 \cdot 0.3 \cdot 0.9 \cdot 0.1 \cdot 0.7$

$P(BFBFBBB \mid \lambda) = \pi_B \, a_{BF} a_{FB} a_{BF} a_{FB} a_{BB} a_{BB} = 0.4 \cdot 0.3 \cdot 0.1 \cdot 0.3 \cdot 0.1 \cdot 0.7 \cdot 0.7$
Problem 1: P(O|λ)
• Therefore, the total probability of O = HTTHHHT
• Sum over all possible paths Q: each Q with its own probability, multiplied by the probability of O given Q
• For a sequence of length T with N hidden states there are N^T paths, an infeasible calculation (see the brute-force sketch below)

$P(O \mid \lambda) = \sum_{\text{all } Q} P(O, Q \mid \lambda) = \sum_{\text{all } Q} P(Q \mid \lambda) \, P(O \mid Q, \lambda)$
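To make the cost concrete, here is a brute-force Python sketch (my own illustration, not the slides') that enumerates all N^T hidden paths using the pi, A, B, states defined above:

```python
from itertools import product

def brute_force_prob(obs):
    """Sum P(Q|lambda) * P(O|Q,lambda) over every hidden path Q."""
    total = 0.0
    for path in product(states, repeat=len(obs)):   # all N**T paths
        p = pi[path[0]] * B[(path[0], obs[0])]      # initial state + emission
        for t in range(1, len(obs)):
            p *= A[(path[t - 1], path[t])] * B[(path[t], obs[t])]
        total += p
    return total

print(brute_force_prob("HTTHHHT"))  # exponential in T; motivates the forward procedure
```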
Solution to Problem 1: Forward Procedure
• Use dynamic programming
• Sum at every time point
• Keep previous subproblem solutions to speed up the current calculation
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization
  – Probability of seeing H1 from F1 or B1

$\alpha_1(i) = \pi_i \, b_i(O_1)$

[Trellis diagram: states F and B over flips H T T H …]
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization

$\alpha_1(i) = \pi_i \, b_i(O_1)$

• Induction
  – Probability of seeing T2 from F2 or B2
  – F2 could come from F1 or B1; each has its own probability, so add them up

$\alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(O_{t+1})$

[Trellis diagram: paths from F1 and B1 merging into F2 and B2]
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization: $\alpha_1(F) = 0.4 \cdot 0.5 = 0.2, \quad \alpha_1(B) = 0.6 \cdot 0.8 = 0.48$
• Induction

$\alpha_2(F) = (\alpha_1(F) a_{FF} + \alpha_1(B) a_{BF}) \, b_F(T) = (0.2 \cdot 0.9 + 0.48 \cdot 0.3) \cdot 0.5 = 0.162$

$\alpha_2(B) = (\alpha_1(F) a_{FB} + \alpha_1(B) a_{BB}) \, b_B(T) = (0.2 \cdot 0.1 + 0.48 \cdot 0.7) \cdot 0.2 = 0.0712$
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization, then induction continues

$\alpha_3(F) = [0.162 \cdot 0.9 + 0.0712 \cdot 0.3] \cdot 0.5 = 0.08358$

$\alpha_3(B) = [0.162 \cdot 0.1 + 0.0712 \cdot 0.7] \cdot 0.2 = 0.013208$
Forward Procedure
• Coin toss, O = HTTHHHT
• Initialization: $\alpha_1(i) = \pi_i \, b_i(O_1)$
• Induction: $\alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(O_{t+1})$
• Termination (a code sketch follows)

$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) = \alpha_4(F) + \alpha_4(B) \quad \text{(for the four flips H T T H shown in the trellis)}$
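Here is a minimal Python sketch of the whole forward procedure (my own code, assuming the pi, A, B, states dictionaries defined earlier):

```python
def forward(obs):
    """Return the alpha table and P(O|lambda) for an observation string."""
    # initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [{s: pi[s] * B[(s, obs[0])] for s in states}]
    # induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1})
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[(i, j)] for i in states) * B[(j, o)]
                      for j in states})
    # termination: P(O|lambda) = sum_i alpha_T(i)
    return alpha, sum(alpha[-1].values())

alpha, p_obs = forward("HTTHHHT")
print(alpha[1])  # {'F': 0.162, 'B': 0.0712}, matching the slide's induction step
```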
Solution to Problem 1: Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization
• Probability of a coin to see the flips after it

$\beta_T(i) = 1, \quad \text{i.e.} \; \beta_T(F) = \beta_T(B) = 1$

[Trellis diagram: states F and B over flips … H H H T]
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization: $\beta_T(i) = 1$
• Induction
• Probability of a coin to see the flips after it

$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization: $\beta_T(i) = 1$
• Induction

$\beta_{T-1}(F) = a_{FF} b_F(T) \cdot 1 + a_{FB} b_B(T) \cdot 1 = 0.9 \cdot 0.5 + 0.1 \cdot 0.2 = 0.47$

$\beta_{T-1}(B) = a_{BF} b_F(T) \cdot 1 + a_{BB} b_B(T) \cdot 1 = 0.3 \cdot 0.5 + 0.7 \cdot 0.2 = 0.29$
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization, then induction continues

$\beta_{T-2}(F) = a_{FF} b_F(H) \beta_{T-1}(F) + a_{FB} b_B(H) \beta_{T-1}(B) = 0.9 \cdot 0.5 \cdot 0.47 + 0.1 \cdot 0.8 \cdot 0.29 = 0.2347$

$\beta_{T-2}(B) = a_{BF} b_F(H) \beta_{T-1}(F) + a_{BB} b_B(H) \beta_{T-1}(B) = 0.3 \cdot 0.5 \cdot 0.47 + 0.7 \cdot 0.8 \cdot 0.29 = 0.2329$
Backward Procedure
• Coin toss, O = HTTHHHT
• Initialization: $\beta_T(i) = 1$
• Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)$
• Termination

$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i \, b_i(O_1) \, \beta_1(i) = \pi_F b_F(H) \beta_1(F) + \pi_B b_B(H) \beta_1(B)$

• Both forward and backward can be used to solve Problem 1 and should give identical results (see the sketch below)
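A matching Python sketch of the backward procedure (again my own code, assuming the earlier pi, A, B, states):

```python
def backward(obs):
    """Return the beta table and P(O|lambda) for an observation string."""
    # initialization: beta_T(i) = 1
    beta = [{s: 1.0 for s in states}]
    # induction, computed right to left:
    # beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {i: sum(A[(i, j)] * B[(j, o)] * nxt[j] for j in states)
                        for i in states})
    # termination: P(O|lambda) = sum_i pi_i * b_i(O_1) * beta_1(i)
    return beta, sum(pi[i] * B[(i, obs[0])] * beta[0][i] for i in states)

beta, p_obs_backward = backward("HTTHHHT")
print(p_obs_backward)  # identical to the forward procedure's P(O|lambda)
```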
Solution to Problem 2: Forward-Backward Procedure
• First run forward and backward separately
• Keep track of the scores at every point
• Coin toss
  – α: probability of this coin given all the flips now and before
  – β: probability of this coin given all the flips after

H      T      T      H      H      H      T
α1(F)  α2(F)  α3(F)  α4(F)  α5(F)  α6(F)  α7(F)
α1(B)  α2(B)  α3(B)  α4(B)  α5(B)  α6(B)  α7(B)
β1(F)  β2(F)  β3(F)  β4(F)  β5(F)  β6(F)  β7(F)
β1(B)  β2(B)  β3(B)  β4(B)  β5(B)  β6(B)  β7(B)
Solution to Problem 2: Forward-Backward Procedure
• Coin toss
• Gives a probabilistic prediction at every time point
• Forward-backward maximizes the expected number of correctly predicted states (coins); a code sketch follows

$\gamma_t(i) = \dfrac{\alpha_t(i) \, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \, \beta_t(j)}$

$\text{e.g.} \quad \gamma_3(F) = \dfrac{\alpha_3(F) \beta_3(F)}{\alpha_3(F) \beta_3(F) + \alpha_3(B) \beta_3(B)} = \dfrac{0.0025 \cdot 0.0047}{0.0025 \cdot 0.0047 + 0.0016 \cdot 0.0038} = 0.659$
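A minimal Python sketch of this posterior computation, reusing the forward() and backward() helpers above:

```python
def posterior(obs):
    """gamma_t(i) = alpha_t(i) * beta_t(i), normalized at each time point."""
    alpha, _ = forward(obs)
    beta, _ = backward(obs)
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        norm = sum(a_t[s] * b_t[s] for s in states)
        gamma.append({s: a_t[s] * b_t[s] / norm for s in states})
    return gamma

for t, g in enumerate(posterior("HTTHHHT"), start=1):
    print(t, g)  # posterior probability of F vs B at every flip
```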
Solution to Problem 2: Viterbi Algorithm
• Report the path that is most likely to give the observations
• Initiation

$\delta_1(i) = \pi_i \, b_i(O_1), \qquad \psi_1(i) = 0$

• Recursion

$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t), \qquad \psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}]$

• Termination

$P^* = \max_{1 \le i \le N} [\delta_T(i)], \qquad q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]$

• Path (state sequence) backtracking

$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1$
Viterbi Algorithm
• Observe: HTTHHHT
• Initiation: $\delta_1(i) = \pi_i \, b_i(O_1), \quad \psi_1(i) = 0$

$\delta_1(F) = 0.4 \cdot 0.5 = 0.2$

$\delta_1(B) = 0.6 \cdot 0.8 = 0.48$
Viterbi Algorithm
• Observe: HTTHHHT
• Recursion: max instead of +, keep track of the path

$\delta_2(F) = \max[\delta_1(F) a_{FF}, \delta_1(B) a_{BF}] \, b_F(T) = \max(0.18, 0.144) \cdot 0.5 = 0.09, \qquad \psi_2(F) = F$

$\delta_2(B) = \max[\delta_1(F) a_{FB}, \delta_1(B) a_{BB}] \, b_B(T) = \max(0.02, 0.336) \cdot 0.2 = 0.0672, \qquad \psi_2(B) = B$
Viterbi Algorithm
• Max instead of +, keep track of the path
• Best path (instead of all paths) up to here

[Trellis diagram: best incoming edge into F and B at each flip of H T T H]
Viterbi Algorithm
• Observe: HTTHHHT
• Recursion continues

$\delta_3(F) = \max[\delta_2(F) a_{FF}, \delta_2(B) a_{BF}] \, b_F(T) = \max(0.09 \cdot 0.9, 0.0672 \cdot 0.3) \cdot 0.5 = 0.0405, \qquad \psi_3(F) = F$

$\delta_3(B) = \max[\delta_2(F) a_{FB}, \delta_2(B) a_{BB}] \, b_B(T) = \max(0.09 \cdot 0.1, 0.0672 \cdot 0.7) \cdot 0.2 = 0.0094, \qquad \psi_3(B) = B$
Viterbi Algorithm
• Max instead of +, keep track of the path
• Best path (instead of all paths) up to here

[Trellis diagram: best paths into F and B extended one more flip over H T T H]
Viterbi Algorithm
• Terminate: pick the state that gives the best final δ score, and backtrack to get the path
• For HTTH, $\delta_4(F) = \max(0.0405 \cdot 0.9, 0.0094 \cdot 0.3) \cdot 0.5 = 0.0182$ beats $\delta_4(B) = \max(0.0405 \cdot 0.1, 0.0094 \cdot 0.7) \cdot 0.8 = 0.0053$, and every ψ above points back to the same coin, so FFFF is the most likely path to give HTTH (a code sketch follows)

[Trellis diagram: backtracking the best path over flips H T T H]
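A minimal Python sketch of Viterbi decoding (my own code, assuming the earlier pi, A, B, states):

```python
def viterbi(obs):
    """Return the most likely hidden path for an observation string."""
    # initiation: delta_1(i) = pi_i * b_i(O_1)
    delta = [{s: pi[s] * B[(s, obs[0])] for s in states}]
    psi = [{}]
    # recursion: max over predecessors instead of sum, remembering the argmax
    for o in obs[1:]:
        d, p = {}, {}
        for j in states:
            best = max(states, key=lambda i: delta[-1][i] * A[(i, j)])
            d[j] = delta[-1][best] * A[(best, j)] * B[(j, o)]
            p[j] = best
        delta.append(d)
        psi.append(p)
    # termination: best final state, then backtrack through psi
    q = max(states, key=lambda s: delta[-1][s])
    path = [q]
    for p in reversed(psi[1:]):
        q = p[q]
        path.append(q)
    return "".join(reversed(path))

print(viterbi("HTTH"))  # 'FFFF' with P* = delta_4(F) ≈ 0.0182
```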
Solution to Problem 3
• There is no analytical way to find the global optimum, so find a local maximum
• Baum-Welch algorithm (equivalent to expectation-maximization); a simplified sketch follows this list
  – Randomly initialize λ = (A, B, π)
  – Run Viterbi based on λ and O
  – Update λ = (A, B, π):
    • π: % of F vs B on the Viterbi path
    • A: frequency of F/B transitions on the Viterbi path
    • B: frequency of H/T emitted by F/B
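Here is a minimal Python sketch of the Viterbi-based update loop the slide describes (hard EM, often called Viterbi training; full Baum-Welch would instead use the forward-backward expectations). It assumes the viterbi() helper and the pi, A, B dictionaries above; the add-one smoothing is my addition to avoid zero counts on short sequences.

```python
def viterbi_train(obs, n_iter=10):
    """Re-estimate pi, A, B from counts along the decoded Viterbi path."""
    global pi, A, B
    for _ in range(n_iter):
        path = viterbi(obs)                       # decode with current lambda
        # pi: fraction of F vs B on the Viterbi path (add-one smoothed)
        pi = {s: (path.count(s) + 1) / (len(path) + 2) for s in states}
        # A: frequency of i -> j transitions on the Viterbi path
        pairs = list(zip(path, path[1:]))
        A = {(i, j): (pairs.count((i, j)) + 1) / (path[:-1].count(i) + 2)
             for i in states for j in states}
        # B: frequency of H/T emitted while on each coin
        emits = list(zip(path, obs))
        B = {(s, o): (emits.count((s, o)) + 1) / (path.count(s) + 2)
             for s in states for o in "HT"}

viterbi_train("HTTHHHT")
```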
GenScan
• HMM model for gene structure
  – Hexamer coding statistics
  – Matrix profile for gene structure
• Needs training sequences
  – Known coding/noncoding regions
• Could miss or mispredict a whole gene/exon
Summary
• Markov Chain
• Hidden Markov Model
  – Observations, hidden states, initial, transition, and emission probabilities
• Three problems
  – Pb(observations): forward, backward procedure (give the same results)
  – Infer hidden states: forward-backward (probabilistic prediction at each state), Viterbi (best path)
  – Estimate parameters: Baum-Welch