Markov Chains
CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.

CpG islands are particular short subsequences in which the pair CpG is more frequent.
These CpG islands are known to appear in biologically more significant parts of the genome.
Two problems involving CpG islands:
1. Given a short genome sequence, decide if it comes from a CpG island or not.
2. Given a long DNA sequence, locate all the CpG islands in it.
We like to show a Markov chain graphically as a collection of 'states', each of which corresponds to a particular residue, with arrows between the states.
Formal definition:

A Markov chain is a triplet (Q, {p(x_1 = s)}, A), where:
• Q is a finite set of states. Each state corresponds to a symbol in the alphabet Σ.
• {p(x_1 = s)} are the initial state probabilities.
• A are the state transition probabilities, denoted a_st for each s, t ∈ Q.

For each s, t ∈ Q the transition probability is:

a_st = P(x_i = t | x_i-1 = s)
For any probabilistic model of sequences we can write the probability of the sequence as:

P(x) = P(x_L, x_L-1, …, x_1)

By applying P(X,Y) = P(X|Y)·P(Y) many times:

P(x) = P(x_L | x_L-1) · P(x_L-1 | x_L-2) ⋯ P(x_2 | x_1) · P(x_1) = P(x_1) · ∏_{i=2}^{L} a_{x_i-1 x_i}
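This factorization can be checked numerically with a toy chain; all values below are illustrative placeholders, not real DNA statistics.

```python
# Toy first-order Markov chain over the DNA alphabet.
# Uniform probabilities are illustrative, not real CpG statistics.
states = "ACGT"
trans = {s: {t: 0.25 for t in states} for s in states}
init = {s: 0.25 for s in states}

def chain_prob(x):
    """P(x) = P(x_1) * prod_{i=2}^{L} a_{x_{i-1} x_i}."""
    p = init[x[0]]
    for s, t in zip(x, x[1:]):
        p *= trans[s][t]
    return p

p = chain_prob("CGCG")  # 0.25 ** 4 with uniform transitions
```

Because each transition row sums to one, the probabilities of all sequences of a fixed length sum to one, which is a handy sanity check.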
Begin and end states (B and E) can be added to a Markov chain, with:

P(x_1 = s) = a_Bs
P(E | x_L = t) = a_tE
INPUT: A short DNA sequence X = (x_1, …, x_L), where x_i ∈ Σ = {A, C, G, T}.
QUESTION: Decide whether X is a CpG island.

We can use two Markov chain models to solve this problem: one for dealing with CpG islands (the '+' model) and the other for dealing with non-CpG islands (the '-' model).
The transition probabilities in each model are derived from a collection of human gene sequences, containing 48 putative CpG islands.

Let a+_st denote the transition probability of s, t inside a CpG island, and let c+_st be the number of times letter t followed letter s in the CpG islands. Then:

a+_st = c+_st / Σ_t' c+_st'
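A minimal sketch of this counting estimator; the helper name and the toy training string are ours, not the 48 real islands.

```python
from collections import defaultdict

def estimate_transitions(seqs):
    """a_st = c_st / sum_t' c_st', where c_st counts how often t follows s."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    return {s: {t: c / sum(row.values()) for t, c in row.items()}
            for s, row in counts.items()}

# Toy stand-in for island training data (not the real training set).
a_plus = estimate_transitions(["CGCGAACG"])
```

The same function applied to non-island training sequences would produce the '-' table.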
Each row sums to one
The table is asymmetric
Therefore we can compute a log-likelihood score for a sequence X by:

Score(X) = log( P(X | + model) / P(X | - model) ) = Σ_{i=2}^{L} log( a+_{x_i-1 x_i} / a-_{x_i-1 x_i} )

The higher this score, the more likely it is that X is a CpG island.
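In code, the score is a sum of log-ratios. The two table entries below are the C→G and G→C values commonly quoted from the '+' and '-' tables of Durbin et al.; only the entries needed to score this toy sequence are included.

```python
import math

# One entry per table (C->G and G->C only), per the commonly quoted
# '+' / '-' tables; a full implementation needs all 16 entries per model.
a_plus  = {"C": {"G": 0.274}, "G": {"C": 0.339}}
a_minus = {"C": {"G": 0.078}, "G": {"C": 0.246}}

def score(x):
    """Score(X) = sum_{i=2}^{L} log( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )."""
    return sum(math.log(a_plus[s][t] / a_minus[s][t]) for s, t in zip(x, x[1:]))

s = score("CGCG")  # positive => X looks like a CpG island
```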
CpG island sequences are shown in dark grey and non-CpG sequences in light grey.
INPUT: A long DNA sequence X = (x_1, …, x_L), where x_i ∈ Σ = {A, C, G, T}.
QUESTION: Locate the CpG islands along X.
A naive approach:

Extract a sliding window X_k = (x_k+1, …, x_k+l) (where l << L and 1 ≤ k ≤ L − l) from the sequence.
Calculate Score(X_k) for each of the resulting subsequences.
Subsequences that receive positive scores are potential CpG islands.

Main disadvantage: we have no information about the lengths of the islands.
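The naive scan can be sketched as follows; `cg_score` is a crude stand-in for the real Score(X), counting CpG dinucleotide excess rather than using the two Markov models.

```python
def cg_score(sub):
    """Crude stand-in for Score(X): CpG count minus a per-base baseline."""
    cpg = sum(1 for s, t in zip(sub, sub[1:]) if s + t == "CG")
    return cpg - 0.05 * len(sub)

def find_islands(x, window):
    """Score every window X_k = x[k : k+window]; positive => candidate island."""
    return [k for k in range(len(x) - window + 1)
            if cg_score(x[k:k + window]) > 0]

hits = find_islands("ATATATCGCGCGCGATATAT", 8)
```

Note how every window overlapping the CG-rich middle fires, illustrating the disadvantage above: the window length is fixed in advance and tells us nothing about the true island boundaries.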
A better solution: combine the two Markov chains of the previous problem into a unified model, with a small probability of switching from one chain to the other at each transition point.
In the new model, A+, C+, G+ and T+ emit A, C, G and T respectively in CpG island regions, while A-, C-, G- and T- emit the corresponding symbols in non-island regions:

State:           A+  C+  G+  T+  A-  C-  G-  T-
Emitted symbol:  A   C   G   T   A   C   G   T
A Hidden Markov Model (HMM) is a triplet M = (Σ, Q, Θ), where:
• Σ is an alphabet of symbols.
• Q is a finite set of states, capable of emitting symbols from the alphabet Σ.
• Θ is a set of probabilities, comprised of:
- State transition probabilities
- Emission probabilities
State transition probabilities:
The state sequence π = (π_1, …, π_L) is called a path. The probability of a state depends only on the previous state:

a_kl = P(π_i = l | π_i-1 = k)

Emission probabilities:
The symbol sequence is X = (x_1, …, x_L). The emission probability is the probability that symbol b is seen when in state k:

e_k(b) = P(x_i = b | π_i = k)
The probability that the sequence X was generated by the model M given the path π is therefore:

P(X, π) = a_{π_0 π_1} · ∏_{i=1}^{L} e_{π_i}(x_i) · a_{π_i π_i+1}

where for our convenience we denote π_0 = begin and π_L+1 = end.
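The joint probability can be evaluated directly; the two-state dice model below (F = fair, L = loaded) is the casino example in miniature, with made-up transition and begin/end values.

```python
# Two-state dice model: F = fair, L = loaded; "B"/"E" are begin/end states.
# All probability values are made up for illustration.
a = {"B": {"F": 0.5, "L": 0.5},
     "F": {"F": 0.90, "L": 0.05, "E": 0.05},
     "L": {"F": 0.10, "L": 0.85, "E": 0.05}}
e = {"F": {d: 1 / 6 for d in "123456"},
     "L": {**{d: 0.1 for d in "12345"}, "6": 0.5}}

def joint_prob(x, path):
    """P(X, pi) = a_{B,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}."""
    p = a["B"][path[0]]
    for i, k in enumerate(path):
        nxt = path[i + 1] if i + 1 < len(path) else "E"
        p *= e[k][x[i]] * a[k][nxt]
    return p

p_ll = joint_prob("66", "LL")  # two sixes, loaded throughout
p_ff = joint_prob("66", "FF")  # two sixes, fair throughout
```

Two sixes are far more probable under the all-loaded path than the all-fair path, which is exactly the comparison decoding algorithms will automate.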
Example :
modeling a dishonest casino dealer
INPUT: A hidden Markov model M = (Σ, Q, Θ) and a sequence X, for which the generating path π = (π_1, …, π_L) is unknown.
QUESTION: Find the most probable generating path π* for X.

In general there may be many state sequences that could give rise to any particular sequence of symbols.
In our example, (C+,G+,C+,G+), (C-,G-,C-,G-) and (C+,G-,C+,G-) would all generate the symbol sequence CGCG; however, they do so with very different probabilities. We are looking for a path π* such that P(X, π) is maximized:

π* = argmax_π { P(X, π) }
This algorithm is based on dynamic programming. The most probable path π* can be found recursively. Suppose the probability v_k(i) of the most probable path ending in state k with observation x_i is known for all states k. Then these probabilities can be calculated for observation x_i+1 as:

v_l(i+1) = e_l(x_i+1) · max_k { v_k(i) · a_kl }
The algorithm:
Time complexity: O(L·|Q|^2). Space complexity: O(L·|Q|).
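A sketch of Viterbi in log space for the two-state casino model; the parameter values are made up for illustration.

```python
import math

# Made-up two-state casino parameters: F = fair, L = loaded.
states = "FL"
a = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
e = {"F": {d: 1 / 6 for d in "123456"},
     "L": {**{d: 0.1 for d in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

def viterbi(x):
    """v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) a_{kl}, with backtracking."""
    v = {k: math.log(init[k] * e[k][x[0]]) for k in states}
    pointers = []
    for sym in x[1:]:
        nxt, back = {}, {}
        for l in states:
            k_best = max(states, key=lambda k: v[k] + math.log(a[k][l]))
            back[l] = k_best
            nxt[l] = v[k_best] + math.log(a[k_best][l]) + math.log(e[l][sym])
        v, pointers = nxt, pointers + [back]
    last = max(states, key=lambda k: v[k])  # best final state
    path = [last]
    for back in reversed(pointers):         # follow pointers backwards
        path.append(back[path[-1]])
    return "".join(reversed(path))

path = viterbi("6666")
```

A run of sixes decodes as loaded, a run of ones as fair, matching the intuition behind the dishonest-casino example.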
Example 1: CpG islands. The values of v for the sequence CGCG.
Example 2 : The casino
INPUT: A hidden Markov model M = (Σ, Q, Θ) and a sequence X, for which the generating path π = (π_1, …, π_L) is unknown.
QUESTION: What is the probability that observation x_i came from state k, given the observed sequence? I.e., P(π_i = k | X).
This is the posterior probability of state k at time i when the emitted sequence X is known.
To answer this question we need two algorithms, both of which assume that X is known:

Forward algorithm:
f_k(i) = P(x_1 … x_i, π_i = k), the probability of emitting the prefix (x_1,…,x_i) and ending in state k.

Backward algorithm:
b_k(i) = P(x_i+1 … x_L | π_i = k), the probability of emitting the suffix (x_i+1,…,x_L) given π_i = k.
In Markov chains, the probability of a sequence was calculated by the equation:

P(x) = P(x_1) · ∏_{i=2}^{L} a_{x_i-1 x_i}

What is the probability P(x) for an HMM? In order to find P(π_i = k | X), we need to know P(x).
In an HMM, many different state paths can generate the same sequence x, and the probability of x is the sum of the probabilities over all possible paths:

P(x) = Σ_π P(x, π)

The number of possible paths increases exponentially with the length of the sequence, so enumerating all paths is not practical.
One approach: we can use the probability of the Viterbi path π* as an approximation to P(x). In fact, the full probability can itself be calculated by dynamic programming (like Viterbi), replacing the maximization steps with sums:

f_k(i) = P(x_1 … x_i, π_i = k)
f_l(i+1) = e_l(x_i+1) · Σ_k f_k(i) · a_kl
P(x) = Σ_k f_k(L) · a_k,end
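The forward recursion, sketched without an explicit end state (so P(x) = Σ_k f_k(L)); the casino-style parameters are made up.

```python
# Made-up two-state casino parameters: F = fair, L = loaded.
states = "FL"
a = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
e = {"F": {d: 1 / 6 for d in "123456"},
     "L": {**{d: 0.1 for d in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

def forward(x):
    """f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) a_{kl}; returns P(x)."""
    f = {k: init[k] * e[k][x[0]] for k in states}
    for sym in x[1:]:
        f = {l: e[l][sym] * sum(f[k] * a[k][l] for k in states) for l in states}
    return sum(f.values())

# The probabilities of all length-2 roll sequences should sum to one.
total = sum(forward(d1 + d2) for d1 in "123456" for d2 in "123456")
```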
The algorithm :
Calculate the probability of the suffix (x_i+1,…,x_L): b_k(i) = P(x_i+1 … x_L | π_i = k). The recursion starts at the end of the sequence:

b_k(L) = a_k,end
b_k(i) = Σ_l a_kl · e_l(x_i+1) · b_l(i+1)
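A backward-recursion sketch (same made-up casino parameters, no explicit end state so b_k(L) = 1); for a length-2 sequence the result b_k(1) can be checked by hand.

```python
# Made-up two-state casino parameters: F = fair, L = loaded.
states = "FL"
a = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
e = {"F": {d: 1 / 6 for d in "123456"},
     "L": {**{d: 0.1 for d in "12345"}, "6": 0.5}}

def backward(x):
    """b_k(i) = sum_l a_kl e_l(x_{i+1}) b_l(i+1), starting from b_k(L) = 1."""
    b = {k: 1.0 for k in states}
    for sym in reversed(x[1:]):
        b = {k: sum(a[k][l] * e[l][sym] * b[l] for l in states) for k in states}
    return b  # values of b_k(1)

b1 = backward("16")  # b_k(1) = sum_l a_kl * e_l('6')
```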
Back to the posterior problem: now we can calculate

P(π_i = k | X) = f_k(i) · b_k(i) / P(x)

where P(x) is the result of the forward (or backward) algorithm.
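Combining the two tables gives the posterior directly; a useful sanity check is that the posterior over states sums to one at every position. Parameters are again the made-up casino values.

```python
# Made-up two-state casino parameters: F = fair, L = loaded.
states = "FL"
a = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
e = {"F": {d: 1 / 6 for d in "123456"},
     "L": {**{d: 0.1 for d in "12345"}, "6": 0.5}}
init = {"F": 0.5, "L": 0.5}

def posterior(x):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
    f = [{k: init[k] * e[k][x[0]] for k in states}]          # forward table
    for sym in x[1:]:
        f.append({l: e[l][sym] * sum(f[-1][k] * a[k][l] for k in states)
                  for l in states})
    b = [{k: 1.0 for k in states}]                            # backward table
    for sym in reversed(x[1:]):
        b.append({k: sum(a[k][l] * e[l][sym] * b[-1][l] for l in states)
                  for k in states})
    b.reverse()
    px = sum(f[-1][k] for k in states)                        # P(x)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

post = posterior("6661")
```

For "6661" the loaded state is more plausible at the first position (a six) than at the last (a one), as expected.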
Some comments on the Viterbi, forward and backward algorithms:

Complexity:
- Time complexity: O(L·|Q|^2)
- Space complexity: O(L·|Q|)

Implementations should work in log space to avoid underflow errors on a computer.
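The standard log-space trick is the log-sum-exp identity; here is a minimal sketch showing why it matters for forward/backward sums.

```python
import math

def logsumexp(vals):
    """Stable log(sum(exp(v))): factor out the max so nothing underflows."""
    m = max(vals)
    if m == float("-inf"):   # all probabilities are exactly zero
        return m
    return m + math.log(sum(math.exp(v - m) for v in vals))

# exp(-1000) underflows to 0.0 in double precision, so a naive
# log(exp(a) + exp(b)) fails, but the log-space sum is still exact:
x = logsumexp([-1000.0, -1000.0])  # log(2 * e^-1000) = -1000 + log(2)
```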
The casino example
The posterior probability of being in the fair die state throughout the sequence (x-axis: number of the roll; y-axis: probability of the fair die). The loaded die was used in the blue areas.
Uses for posterior decoding. Two alternative forms of decoding:

1. When many different paths have almost the same probability as the most probable one, we may want to consider other possible paths as well:

π̂_i = argmax_k { P(π_i = k | X) }

Note that π̂ may not be a legitimate path, if some transitions are not permitted.
2. When we are not interested in the state sequence itself, but in some other property derived from it, we can define a function g(k) on the states and compute:

G(i | x) = Σ_k P(π_i = k | X) · g(k)

For example, g(k) = 1 if k ∈ {A+, C+, G+, T+} and 0 otherwise; then G(i | x) is the posterior probability that base i is in a CpG island.
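This weighted sum is a one-liner once posteriors are available; the posterior values below are hypothetical numbers for a single position, not computed from a real model.

```python
def posterior_feature(post_row, g):
    """G(i | x) = sum_k P(pi_i = k | X) * g(k) for a per-state function g."""
    return sum(p * g(k) for k, p in post_row.items())

# Hypothetical posterior over the eight CpG-model states at one position.
row = {"A+": 0.30, "C+": 0.20, "G+": 0.05, "T+": 0.05,
       "A-": 0.25, "C-": 0.10, "G-": 0.03, "T-": 0.02}
in_island = posterior_feature(row, lambda k: 1.0 if k.endswith("+") else 0.0)
```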
The most difficult problem faced when using HMMs is specifying the model:
1. Designing the structure: which states to use and the connections between them.
2. Assigning the transition and emission probabilities a_kl and e_k(b).
Given training sequences X^1,…,X^n of lengths L_1,…,L_n respectively, which were all generated from the HMM M = (Σ, Q, Θ); however, the values of the probabilities in Θ are unknown. We want to construct an HMM that will best characterize X^1,…,X^n, so we need to assign values to Θ that will maximize the probability of X^1,…,X^n.
Since the sequences were generated independently, the probability of X^1,…,X^n given Θ is:

P(X^1,…,X^n | Θ) = ∏_{j=1}^{n} P(X^j | Θ)
Using the logarithmic score, our goal is to find Θ* such that:

Θ* = argmax_Θ { Score(X^1,…,X^n | Θ) }

Score(X^1,…,X^n | Θ) = log P(X^1,…,X^n | Θ) = Σ_{j=1}^{n} log P(X^j | Θ)
The sequences X^1,…,X^n are usually called the training sequences. We shall examine two cases for parameter estimation:
Estimation when the state sequence is known.
Estimation when the state sequence is unknown.
Estimation when the state sequence is known
When all the paths are known, we can count the number of times each particular transition or emission is used in the set of training sequences:

A_kl - the number of transitions from state k to state l in all the state sequences.
E_k(b) - the number of times an emission of the symbol b occurred in state k in all the state sequences.

The maximum-likelihood estimates are then:

a_kl = A_kl / Σ_l' A_kl'
e_k(b) = E_k(b) / Σ_b' E_k(b')
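A sketch of this counting estimation for the two-state dice model, on a single hypothetical labelled sequence (the rolls and path are made up):

```python
from collections import Counter

STATES, ALPHABET = "FL", "123456"

def estimate(labelled):
    """a_kl = A_kl / sum_l' A_kl' and e_k(b) = E_k(b) / sum_b' E_k(b')."""
    A, E = Counter(), Counter()
    for x, pi in labelled:
        for k, l in zip(pi, pi[1:]):      # count transitions A_kl
            A[k, l] += 1
        for b, k in zip(x, pi):           # count emissions E_k(b)
            E[k, b] += 1
    a = {k: {l: A[k, l] / max(sum(A[k, m] for m in STATES), 1)
             for l in STATES} for k in STATES}
    e = {k: {b: E[k, b] / max(sum(E[k, c] for c in ALPHABET), 1)
             for b in ALPHABET} for k in STATES}
    return a, e

# One made-up labelled sequence: rolls "6166" with known path "LFLL".
a, e = estimate([("6166", "LFLL")])
```

With such tiny counts many entries are zero, which motivates the Laplace correction below.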
Laplace's correction: in order to avoid zero probabilities we will use

A'_kl = A_kl + r_kl
E'_k(b) = E_k(b) + r_k(b)

where the pseudocounts r_kl and r_k(b) are usually equal to 1.
Estimation when the state sequence is unknown
In the case that the state sequences are not known, the problem of finding the optimal set of parameters Θ* is known to be NP-complete. The Baum-Welch algorithm is a heuristic for finding a solution; it is a special case of the EM (expectation-maximization) technique.
Initialization: assign arbitrary values to Θ.

Expectation: calculate A_kl and E_k(b) as the expected number of times each transition and emission is used, given the training sequences. The probability that a_kl is used at position i in sequence X^j is:

P(π_i = k, π_i+1 = l | X^j) = f_k^j(i) · a_kl · e_l(x_i+1^j) · b_l^j(i+1) / P(X^j)
Summing over all positions and training sequences gives A_kl:

A_kl = Σ_j (1 / P(X^j)) · Σ_i f_k^j(i) · a_kl · e_l(x_i+1^j) · b_l^j(i+1)

Similarly, we can find the expected number of times that letter b appears in state k:

E_k(b) = Σ_j (1 / P(X^j)) · Σ_{i : x_i^j = b} f_k^j(i) · b_k^j(i)
Maximization: update the values of a_kl and e_k(b) according to the equations:

a_kl = A_kl / Σ_l' A_kl'
e_k(b) = E_k(b) / Σ_b' E_k(b')

This process is iterated until the improvement in Score(X^1,…,X^n | Θ) is less than a given parameter ε.
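A compact Baum-Welch sketch for the two-state dice model (no end state, no scaling, made-up toy data and starting values); the EM guarantee that the log-likelihood never decreases between iterations can be checked directly.

```python
import math
from collections import defaultdict

STATES, ALPHABET = "FL", "123456"
init = {"F": 0.5, "L": 0.5}

def tables(x, a, e):
    """Forward and backward tables for one sequence (no end state)."""
    f = [{k: init[k] * e[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        f.append({l: e[l][sym] * sum(f[-1][k] * a[k][l] for k in STATES)
                  for l in STATES})
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        b.append({k: sum(a[k][l] * e[l][sym] * b[-1][l] for l in STATES)
                  for k in STATES})
    b.reverse()
    return f, b

def baum_welch_step(seqs, a, e):
    """One E + M step; returns updated (a, e) and log P(data | old params)."""
    A, E = defaultdict(float), defaultdict(float)
    loglik = 0.0
    for x in seqs:
        f, b = tables(x, a, e)
        px = sum(f[-1][k] for k in STATES)
        loglik += math.log(px)
        for i in range(len(x) - 1):   # expected transition counts A_kl
            for k in STATES:
                for l in STATES:
                    A[k, l] += f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l] / px
        for i, sym in enumerate(x):   # expected emission counts E_k(b)
            for k in STATES:
                E[k, sym] += f[i][k] * b[i][k] / px
    a2 = {k: {l: A[k, l] / sum(A[k, m] for m in STATES) for l in STATES}
          for k in STATES}
    e2 = {k: {s: E[k, s] / sum(E[k, c] for c in ALPHABET) for s in ALPHABET}
          for k in STATES}
    return a2, e2, loglik

# Made-up starting parameters and toy training rolls.
a = {"F": {"F": 0.8, "L": 0.2}, "L": {"F": 0.2, "L": 0.8}}
e = {"F": {d: 1 / 6 for d in ALPHABET},
     "L": {**{d: 0.1 for d in "12345"}, "6": 0.5}}
seqs = ["661666135", "24135666"]
a, e, ll1 = baum_welch_step(seqs, a, e)
a, e, ll2 = baum_welch_step(seqs, a, e)
```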
Baum-Welch converges to a local maximum of the target function Score(X^1,…,X^n | Θ).

Main problem: several local maxima may exist.

Solutions:
1. Run the algorithm several times, each time with different initial values for Θ.
2. Start with Θ values that are meaningful.
References:
- Hidden Markov Models, Ron Shamir: http://www.math.tau.ac.il/~rshamir/algmb/98/scribe/html/lec06/node1.html
- Biological Sequence Analysis, Durbin et al., chapter 3.