Hidden Markov Model
Example: CpG Island
We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We “solve” the first question by modeling strings with and without CpG islands as Markov chains over the same states {A,C,G,T} but with different transition probabilities: a “+” chain for island sequence and a “-” chain for background sequence.
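A standard way to turn these two chains into a classifier is a log-odds score over consecutive base pairs. Below is a minimal Python sketch of that idea; the transition tables here are made-up stand-ins (a real (+)/(-) pair would be estimated from labeled island and background training sequence):

```python
import numpy as np

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

# Hypothetical transition tables: rows = previous base, cols = next base.
# A_PLUS stands in for the CpG-island chain (note the relatively large
# C->G entry); A_MINUS stands in for background. Each row sums to 1.
A_PLUS = np.array([[0.18, 0.27, 0.43, 0.12],
                   [0.17, 0.37, 0.27, 0.19],
                   [0.16, 0.34, 0.37, 0.13],
                   [0.08, 0.36, 0.38, 0.18]])
A_MINUS = np.array([[0.30, 0.20, 0.29, 0.21],
                    [0.32, 0.30, 0.08, 0.30],
                    [0.25, 0.25, 0.30, 0.20],
                    [0.18, 0.24, 0.29, 0.29]])

def log_odds(seq):
    """Sum of log(a+/a-) over consecutive base pairs.
    Positive scores suggest the stretch comes from a CpG island."""
    return sum(np.log(A_PLUS[IDX[a], IDX[b]] / A_MINUS[IDX[a], IDX[b]])
               for a, b in zip(seq, seq[1:]))

print(log_odds("CGCGCGCG"))   # positive: looks like an island
print(log_odds("ATATATAT"))   # negative: looks like background
```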
Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):
A+ C+ G+ T+
A- C- G- T-
The problem is that we do not know the sequence of states that is traversed, only the sequence of letters.
Therefore we use a Hidden Markov Model here.
Hidden Markov Models - HMM
[Graphical model: a hidden chain H1 → H2 → … → HL-1 → HL, with each Hi emitting an observed Xi]
Hi: hidden variables
Xi: observed data
Hidden Markov Model
A Markov chain (s1,…,sL):
p(s1, …, sL) = ∏i=1..L p(si | si-1)
and for each state s and symbol x we have p(Xi = x | Si = s)
Application in communication: the message sent is (s1,…,sm) but we receive (x1,…,xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1,…,sm) but we recorded (x1,…,xm). Compute the most likely word said.
[HMM diagram: chain S1 → S2 → … → SL-1 → SL with transition matrix M on the chain edges; each Si emits xi through emission matrix T]
Hidden Markov Model
Notations:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = ast
Emission probabilities: p(Xi = b | Si = s) = es(b)
Example: The Dishonest Casino
A casino has two dice:
Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The casino player switches back and forth between the fair and loaded die on average once every 20 turns.
Game:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2
Question # 1 – Decoding
GIVEN
A sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION
What portion of the sequence was generated with the fair die, and what portion with the loaded die?
This is the DECODING question in HMMs
Question # 2 – Evaluation
GIVEN
A sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION
How likely is this sequence, given our model of how the casino works?
This is the EVALUATION problem in HMMs
Question # 3 – Learning
GIVEN
A sequence of rolls by the casino player
1245526462146146136136661664661636616366163616515615115146123562344
QUESTION
How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?
This is the LEARNING question in HMMs
The dishonest casino model
FAIR ⇄ LOADED
P(Fair → Loaded) = 0.05, P(Loaded → Fair) = 0.05
P(Fair → Fair) = 0.95, P(Loaded → Loaded) = 0.95
P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
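For later reference, here is the same model written down as data, in a minimal Python sketch (the array names A, E, A0 and the encoding FAIR = 0, LOADED = 1 are my own, not the slides'):

```python
import numpy as np

STATES = ["FAIR", "LOADED"]

# Transition matrix: A[s, t] = P(next state = t | current state = s)
A = np.array([[0.95, 0.05],
              [0.05, 0.95]])

# Emission matrix: E[s, x - 1] = P(roll = x | state = s)
E = np.array([[1/6] * 6,                 # fair die: uniform
              [1/10] * 5 + [1/2]])       # loaded die: 6 comes up half the time

# Initial state distribution (the examples below take 1/2, 1/2)
A0 = np.array([0.5, 0.5])
```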
A parse of a sequence
Given a sequence x = x1……xN,
A parse of x is a sequence of states π = π1, ……, πN
[Trellis diagram: K candidate states at each of the N positions, above observations x1, x2, x3, …, xN; a parse selects one state per column]
Likelihood of a parse
Given a sequence x = x1……xN
and a parse π = π1, ……, πN,
to find how likely the parse is (given our HMM):
P(x, π) = P(x1, …, xN, π1, ……, πN)
= P(xN, πN | x1…xN-1, π1, ……, πN-1) P(x1…xN-1, π1, ……, πN-1)
= P(xN, πN | πN-1) P(x1…xN-1, π1, ……, πN-1) = …
= P(xN, πN | πN-1) P(xN-1, πN-1 | πN-2) …… P(x2, π2 | π1) P(x1, π1)
= P(xN | πN) P(πN | πN-1) …… P(x2 | π2) P(π2 | π1) P(x1 | π1) P(π1)
= a0π1 aπ1π2 …… aπN-1πN eπ1(x1) …… eπN(xN)
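This product is straightforward to compute directly. A minimal sketch (my own rendering; it reuses the casino arrays A, E, A0 defined earlier, with states as integer indices and rolls 1-6):

```python
def parse_likelihood(x, pi, A, E, a0):
    """P(x, pi) = a0[pi_1] * e_{pi_1}(x_1) * prod_{i>1} a_{pi_{i-1} pi_i} * e_{pi_i}(x_i)."""
    p = a0[pi[0]] * E[pi[0], x[0] - 1]                  # initial state and first emission
    for i in range(1, len(x)):
        p *= A[pi[i - 1], pi[i]] * E[pi[i], x[i] - 1]   # transition, then emission
    return p
```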
Example: the dishonest casino
Let the sequence of rolls be:
x = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4
Then, what is the likelihood of
π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair?
(say initial probabilities a0Fair = ½, a0Loaded = ½)
½ × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) … P(4 | Fair) =
½ × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9
Example: the dishonest casino
So, the likelihood that the die is fair throughout this run is just 5.2 × 10^-9.
OK, but what is the likelihood of
π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
½ × P(1 | Loaded) P(Loaded | Loaded) … P(4 | Loaded) =
½ × (1/10)^8 × (1/2)^2 × (0.95)^9 = 0.00000000078781176215 ≈ 0.79 × 10^-9
Therefore, it is somewhat more likely that the die is fair all the way than that it is loaded all the way.
Example: the dishonest casino
Let the sequence of rolls be:
x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
Now, what is the likelihood of π = F, F, …, F?
½ × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, same as before
What is the likelihood of
π = L, L, …, L?
½ × (1/10)^4 × (1/2)^6 × (0.95)^9 = 0.00000049238235134735 ≈ 4.9 × 10^-7
So, it is about 100 times more likely that the die is loaded.
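The three numbers above can be reproduced with the parse_likelihood sketch from the "Likelihood of a parse" slide, reusing the casino arrays A, E, A0 (FAIR = 0, LOADED = 1):

```python
x1 = [1, 2, 1, 5, 6, 2, 1, 6, 2, 4]   # first example sequence
x2 = [1, 6, 6, 5, 6, 2, 6, 6, 3, 6]   # second example sequence

print(parse_likelihood(x1, [0] * 10, A, E, A0))   # ~5.2e-09: all Fair
print(parse_likelihood(x1, [1] * 10, A, E, A0))   # ~7.9e-10: all Loaded
print(parse_likelihood(x2, [1] * 10, A, E, A0))   # ~4.9e-07: all Loaded
```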
C-G Islands Example
[Diagram: two copies of the states A, C, G, T (one for inside a CpG island, one for outside), with "change" transitions between the two groups; the hidden chain H1 → H2 → … → HL answers "C-G island?" at each position, and each Hi emits an observed letter Xi ∈ {A, C, G, T}]
Hidden Markov Model for CpG Islands
The states:
[Chain diagram: hidden states S1 → S2 → … → SL-1 → SL, each emitting an observed Xi]
Domain(Si) = {+, -} × {A, C, G, T} (8 values)
In this representation P(xi | si) = 0 or 1, depending on whether xi is consistent with si. E.g., xi = G is consistent with si = (+,G) and with si = (-,G) but not with any other state of si.
The query of interest:
(s*1, …, s*L) = argmax(s1,…,sL) p(s1, …, sL | x1, …, xL)
1. Most Probable State Path = Decoding
First question: given an output sequence x = (x1,…,xL), a most probable path s* = (s*1,…,s*L) is one which maximizes p(s | x):
s* = (s*1, …, s*L) = argmax(s1,…,sL) p(s1, …, sL | x1, …, xL)
Decoding
GIVEN x = x1x2……xN
We want to find π = π1, ……, πN, such that P[x, π] is maximized:
π* = argmaxπ P[x, π]
We can use dynamic programming!
Let Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
= probability of the most likely sequence of states ending at state πi = k
Decoding – main idea
Given that for all states k, and for a fixed position i,
Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
What is Vj(i+1)?
From the definition,
Vj(i+1) = max{π1,…,πi} P[x1…xi, π1, …, πi, xi+1, πi+1 = j]
= max{π1,…,πi} P(xi+1, πi+1 = j | x1…xi, π1,…, πi) P[x1…xi, π1,…, πi]
= max{π1,…,πi} P(xi+1, πi+1 = j | πi) P[x1…xi-1, π1, …, πi-1, xi, πi]
= maxk [ P(xi+1, πi+1 = j | πi = k) max{π1,…,πi-1} P[x1…xi-1, π1,…, πi-1, xi, πi = k] ]
= ej(xi+1) maxk akj Vk(i)
The Viterbi Algorithm
Input: x = x1……xN
Initialization:
V0(0) = 1 (0 is the imaginary first position)
Vk(0) = 0, for all k > 0
Iteration:
Vj(i) = ej(xi) maxk akj Vk(i-1)
Ptrj(i) = argmaxk akj Vk(i-1)
Termination:
P(x, π*) = maxk Vk(N)
Traceback:
πN* = argmaxk Vk(N)
πi-1* = Ptrπi(i)
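A minimal runnable sketch of this algorithm in Python (my own rendering, not the course's code). It works in plain probabilities, as on the slide; the imaginary position 0 is folded into the initialization of position 1, and the underflow fix comes two slides later:

```python
import numpy as np

def viterbi(x, A, E, a0):
    """Most probable state path pi* = argmax_pi P(x, pi).
    x: list of 0-based symbol indices; A: K x K transitions;
    E: K x M emissions; a0: initial state distribution."""
    K, N = A.shape[0], len(x)
    V = np.zeros((K, N))
    ptr = np.zeros((K, N), dtype=int)
    V[:, 0] = a0 * E[:, x[0]]                  # initialization (start folded in)
    for i in range(1, N):                      # V_j(i) = e_j(x_i) max_k a_kj V_k(i-1)
        for j in range(K):
            scores = A[:, j] * V[:, i - 1]
            ptr[j, i] = int(np.argmax(scores))
            V[j, i] = E[j, x[i]] * scores[ptr[j, i]]
    path = [int(np.argmax(V[:, -1]))]          # termination: max_k V_k(N)
    for i in range(N - 1, 0, -1):              # traceback via the pointers
        path.append(int(ptr[path[-1], i]))
    return path[::-1], float(V[:, -1].max())

# Demo on the casino model (FAIR = 0, LOADED = 1); rolls shifted to 0-based.
A = np.array([[0.95, 0.05], [0.05, 0.95]])
E = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
A0 = np.array([0.5, 0.5])
rolls = [r - 1 for r in [1, 6, 6, 5, 6, 2, 6, 6, 3, 6]]
path, p = viterbi(rolls, A, E, A0)
print(path, p)   # the 6-heavy run decodes mostly to LOADED (1)
```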
The Viterbi Algorithm
Similar to “aligning” a set of states to a sequence
Time: O(K²N)
Space: O(KN)
[DP table: rows = states 1, 2, …, K; columns = observations x1 x2 x3 … xN; cell (j, i) holds Vj(i)]
Viterbi Algorithm – a practical detail
Underflows are a significant problem:
P[x1,…, xi, π1, …, πi] = a0π1 aπ1π2 …… aπi-1πi eπ1(x1) …… eπi(xi)
These numbers become extremely small – underflow.
Solution: take the logs of all values:
Vl(i) = log el(xi) + maxk [ Vk(i-1) + log akl ]
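A one-step sketch of this log-space recursion, vectorized over the destination state l (it assumes strictly positive probabilities; real implementations map log 0 to a large negative constant):

```python
import numpy as np

def viterbi_log_step(logV_prev, logA, logE_col):
    """V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ],
    computed for all destination states l at once.
    logV_prev: shape (K,); logA[k, l] = log a_kl; logE_col[l] = log e_l(x_i)."""
    return logE_col + np.max(logV_prev[:, None] + logA, axis=0)
```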
Example
Let x be a sequence with a portion of ~1/6 6's, followed by a portion of ~½ 6's:
x = 123456123456…12345 6626364656…1626364656
Then it is not hard to show that the optimal parse is:
FFF…………………...F LLL………………………...L
Six characters "123456" parsed as F contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5;
parsed as L, they contribute 0.95^6 × (1/2)^1 × (1/10)^5 ≈ 0.4 × 10^-5.
Six characters "162636" parsed as F contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5;
parsed as L, they contribute 0.95^6 × (1/2)^3 × (1/10)^3 ≈ 9.0 × 10^-5.
2. Computing P(x) = Evaluation
Given an output sequence x = (x1,…,xL), compute the probability that this sequence was generated by the model:
P(x) = Σs P(x, s)
The summation is taken over all state paths s generating x.
Evaluation
We will develop algorithms that allow us to compute:
P(x): probability of x given the model
P(xi…xj): probability of a substring of x given the model
P(πi = k | x): probability that the i-th state is k, given x
The Forward Algorithm
We want to calculate P(x) = probability of x, given the HMM.
Sum over all possible ways of generating x:
P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)
To avoid summing over an exponential number of paths π, define
fk(i) = P(x1…xi, πi = k)   (the forward probability)
The Forward Algorithm – derivation
Define the forward probability:
fk(i) = P(x1…xi, πi = k)
= Σπ1…πi-1 P(x1…xi-1, π1,…, πi-1, πi = k) ek(xi)
= Σj Σπ1…πi-2 P(x1…xi-1, π1,…, πi-2, πi-1 = j) ajk ek(xi)
= ek(xi) Σj fj(i-1) ajk
The Forward Algorithm
We can compute fk(i) for all k, i, using dynamic programming!
Initialization:
f0(0) = 1
fk(0) = 0, for all k > 0
Iteration:
fk(i) = ek(xi) Σj fj(i-1) ajk
Termination:
P(x) = Σk fk(N) ak0
where ak0 is the probability that the terminating state is k (usually ak0 = a0k).
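A minimal Python sketch of the forward pass (my own rendering; for simplicity it models no explicit end state, i.e. it takes ak0 = 1 in the termination step):

```python
import numpy as np

def forward(x, A, E, a0):
    """Forward probabilities f[k, i] = P(x_1..x_i, pi_i = k), plus P(x).
    x is a list of 0-based symbol indices."""
    K, N = A.shape[0], len(x)
    f = np.zeros((K, N))
    f[:, 0] = a0 * E[:, x[0]]                  # initialization
    for i in range(1, N):                      # f_k(i) = e_k(x_i) sum_j f_j(i-1) a_jk
        f[:, i] = E[:, x[i]] * (f[:, i - 1] @ A)
    return f, f[:, -1].sum()                   # termination with a_k0 = 1
```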
Relation between Forward and Viterbi
VITERBI
Initialization:
V0(0) = 1
Vk(0) = 0, for all k > 0
Iteration:
Vj(i) = ej(xi) maxk Vk(i-1) akj
Termination:
P(x, π*) = maxk Vk(N)

FORWARD
Initialization:
f0(0) = 1
fk(0) = 0, for all k > 0
Iteration:
fj(i) = ej(xi) Σk fk(i-1) akj
Termination:
P(x) = Σk fk(N) ak0
Motivation for the Backward Algorithm
We want to compute
P(πi = k | x),
the probability distribution of the i-th position, given x.
We start by computing
P(πi = k, x) = P(x1…xi, πi = k, xi+1…xN)
= P(x1…xi, πi = k) P(xi+1…xN | x1…xi, πi = k)
= P(x1…xi, πi = k) P(xi+1…xN | πi = k)
Then, P(πi = k | x) = P(πi = k, x) / P(x)
The first factor is the forward probability fk(i); the second is the backward probability bk(i).
The Backward Algorithm – derivation
Define the backward probability:
bk(i) = P(xi+1…xN | πi = k)
= Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)
= Σj Σπi+2…πN P(xi+1, xi+2, …, xN, πi+1 = j, πi+2, …, πN | πi = k)
= Σj ej(xi+1) akj Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = j)
= Σj ej(xi+1) akj bj(i+1)
The Backward Algorithm
We can compute bk(i) for all k, i, using dynamic programming
Initialization:
bk(N) = ak0, for all k
Iteration:
bk(i) = Σj ej(xi+1) akj bj(i+1)
Termination:
P(x) = Σj a0j ej(x1) bj(1)
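And a matching sketch of the backward pass, under the same no-end-state simplification (bk(N) = 1):

```python
import numpy as np

def backward(x, A, E, a0):
    """Backward probabilities b[k, i] = P(x_{i+1}..x_N | pi_i = k), plus P(x)."""
    K, N = A.shape[0], len(x)
    b = np.zeros((K, N))
    b[:, N - 1] = 1.0                          # initialization (a_k0 = 1, no end state)
    for i in range(N - 2, -1, -1):             # b_k(i) = sum_j e_j(x_{i+1}) a_kj b_j(i+1)
        b[:, i] = A @ (E[:, x[i + 1]] * b[:, i + 1])
    return b, (a0 * E[:, x[0]] * b[:, 0]).sum()    # P(x) = sum_j a_0j e_j(x_1) b_j(1)
```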
Computational Complexity
What are the running time and space required for Forward and Backward?
Time: O(K²N)
Space: O(KN)
Useful implementation techniques to avoid underflows:
Viterbi: sum of logs
Forward/Backward: rescaling at each position by multiplying by a constant
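A sketch of the rescaling trick for the forward pass: normalize the forward column at each position and accumulate the log of the scale factors, which sum to log P(x):

```python
import numpy as np

def forward_scaled(x, A, E, a0):
    """Scaled forward pass; returns log P(x) without underflow."""
    f = a0 * E[:, x[0]]
    log_px = 0.0
    for i in range(len(x)):
        if i > 0:
            f = E[:, x[i]] * (f @ A)   # ordinary forward recursion on scaled values
        c = f.sum()                    # scale constant for this position
        f = f / c                      # rescale so the column sums to 1
        log_px += np.log(c)            # log P(x) = sum of log scale constants
    return log_px
```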
Viterbi, Forward, Backward
VITERBI
Initialization:
V0(0) = 1
Vk(0) = 0, for all k > 0
Iteration:
Vj(i) = ej(xi) maxk Vk(i-1) akj
Termination:
P(x, π*) = maxk Vk(N)

FORWARD
Initialization:
f0(0) = 1
fk(0) = 0, for all k > 0
Iteration:
fj(i) = ej(xi) Σk fk(i-1) akj
Termination:
P(x) = Σk fk(N) ak0

BACKWARD
Initialization:
bk(N) = ak0, for all k
Iteration:
bj(i) = Σk ek(xi+1) ajk bk(i+1)
Termination:
P(x) = Σk a0k ek(x1) bk(1)
Posterior Decoding
We can now calculate
P(πi = k | x) = fk(i) bk(i) / P(x)
Then, we can ask:
What is the most likely state at position i of sequence x?
Define π^ by posterior decoding:
π^i = argmaxk P(πi = k | x)
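Combining the forward and backward sketches from earlier gives posterior decoding in a few lines (names are mine; FAIR = 0, LOADED = 1 in the demo):

```python
import numpy as np

def posterior_decode(x, A, E, a0):
    """pi-hat_i = argmax_k P(pi_i = k | x), with P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
    f, px = forward(x, A, E, a0)
    b, _ = backward(x, A, E, a0)
    post = f * b / px                       # post[k, i] = P(pi_i = k | x)
    return post.argmax(axis=0), post

# Demo on the casino model with a 6-heavy run of rolls.
A = np.array([[0.95, 0.05], [0.05, 0.95]])
E = np.array([[1/6] * 6, [1/10] * 5 + [1/2]])
A0 = np.array([0.5, 0.5])
rolls = [r - 1 for r in [1, 6, 6, 5, 6, 2, 6, 6, 3, 6]]
states, post = posterior_decode(rolls, A, E, A0)
print(states)   # mostly 1s: LOADED is the likely state at most positions
```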
Posterior Decoding
For each state, posterior decoding gives us a curve of the likelihood of that state at each position.
That is sometimes more informative than the Viterbi path π*.
Posterior decoding may give an invalid sequence of states.
Why? Each position is chosen independently, so consecutive states of π^ may be connected by transitions of zero (or negligible) probability.
Posterior Decoding
P(πi = k | x) = Σπ P(π | x) 1(πi = k)
= Σ{π : πi = k} P(π | x)
[Plot: the posterior curve P(πi = l | x) for a state l, drawn across positions x1 x2 x3 … xN over the K-state grid]
Posterior Decoding
Example: How do we compute P(πi = l, πj = l' | x)?
For the adjacent case j = i + 1, combining forward and backward probabilities gives
P(πi = l, πi+1 = l' | x) = fl(i) all' el'(xi+1) bl'(i+1) / P(x)
[Plot: posterior curves P(πi = l | x) and P(πj = l' | x) across positions x1 … xN]
Applications of the model
Given a DNA region x,
The Viterbi algorithm predicts locations of CpG islands
Given a nucleotide xi (say xi = A),
the Viterbi parse tells whether xi is in a CpG island in the most likely overall scenario.
The Forward/Backward algorithms can calculate
P(xi is in a CpG island) = P(πi = A+ | x)
Posterior decoding can assign locally optimal predictions of CpG islands:
π^i = argmaxk P(πi = k | x)
A model of CpG Islands – (2) Transitions
What about transitions between (+) and (-) states? They affect:
the average length of a CpG island
the average separation between two CpG islands
[Two-state diagram: X (self-loop probability p) and Y (self-loop probability q), with X → Y of probability 1-p and Y → X of probability 1-q]
Length distribution of region X:
P[lX = 1] = 1-p
P[lX = 2] = p(1-p)
…
P[lX = k] = p^(k-1) (1-p)
E[lX] = 1/(1-p)
This is a geometric distribution, with mean 1/(1-p); e.g., p = 0.95 gives an expected region length of 20.