Post on 19-Dec-2015
HMM for CpG Islands
Parameter Estimation For HMM
Maximum Likelihood and the Information Inequality
Lecture #7
Background Readings: Chapter 3.3 in the textbook Biological Sequence Analysis, Durbin et al., 2001. Shlomo Moran, following Danny Geiger and Nir Friedman.
HMM for CpG Islands
Reminder: Hidden Markov Model

Hidden states S1, S2, …, SL-1, SL emit the observed symbols x1, x2, …, xL-1, xL. The joint probability of a state path s and an output sequence x is

$$p(s, x) = p(s_1,\dots,s_L;\ x_1,\dots,x_L) = \prod_{i=1}^{L} m_{s_{i-1} s_i}\, e_{s_i}(x_i)$$

where m_kl are the transition probabilities (m_{s_0 s_1} standing for the initial-state probability) and e_k(b) are the emission probabilities.

Next we apply HMMs to the question of recognizing CpG islands.
Hidden Markov Model for CpG Islands

The states: Domain(Si) = {+, −} × {A, C, G, T} (8 values).

In this representation P(xi | si) = 0 or 1, depending on whether xi is consistent with si. E.g., xi = G is consistent with si = (+, G) and with si = (−, G), but not with any other state of si.

[Figure: the eight states A±, C±, G±, T±, each deterministically emitting its own letter.]
Reminder: Most Probable State Path

Given an output sequence x = (x1,…,xL), a most probable path s* = (s*1,…,s*L) is one which maximizes p(s|x):

$$s^* = (s_1^*,\dots,s_L^*) = \operatorname*{argmax}_{(s_1,\dots,s_L)} p(s_1,\dots,s_L \mid x_1,\dots,x_L)$$
Predicting CpG islands via a most probable path:

Output symbols: A, C, G, T (4 letters). Markov chain states: 4 "−" states and 4 "+" states, one of each sign for each letter (8 states total). A most probable path (found by Viterbi's algorithm) predicts CpG islands: the maximal runs of "+" states along s*.

Experiments (Durbin et al., pp. 60-61) show that the predicted islands are shorter than the annotated ones. In addition, quite a few "false negatives" are found.
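The most probable path can be computed with Viterbi's algorithm in log space. The sketch below builds the 8-state "±" construction just described; the transition numbers are invented for illustration (a real CpG model would also condition the next letter on the previous one), so only the construction, not the parameter values, follows the slides.

```python
import math

def _log(p):
    # log of a probability, with log(0) treated as -infinity
    return math.log(p) if p > 0 else float("-inf")

def viterbi(x, states, init, m, e):
    """Most probable state path s* = argmax_s p(s, x), computed in log space."""
    V = [{k: _log(init[k]) + _log(e[k][x[0]]) for k in states}]
    back = []
    for xi in x[1:]:
        prev, col, ptr = V[-1], {}, {}
        for l in states:
            best = max(states, key=lambda k: prev[k] + _log(m[k][l]))
            ptr[l] = best
            col[l] = prev[best] + _log(m[best][l]) + _log(e[l][xi])
        V.append(col)
        back.append(ptr)
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# The 8 states (sign, letter); each emits its own letter with probability 1.
states = [(s, b) for s in "+-" for b in "ACGT"]
e = {(s, b): {c: 1.0 if c == b else 0.0 for c in "ACGT"} for s, b in states}
init = {k: 1.0 / 8 for k in states}
# Invented parameters: sticky signs; "+" favors C/G, "-" favors A/T.
sign_m = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
letter = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
          "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
m = {(s1, b1): {(s2, b2): sign_m[s1][s2] * letter[s2][b2]
                for s2, b2 in states} for s1, b1 in states}

path = viterbi("CGCGCGAAATTT", states, init, m, e)
signs = "".join(s for s, b in path)  # maximal '+' runs are the predicted islands
```

On this toy input the decoded signs mark the CG-rich prefix as a "+" run, exactly the island-calling rule of the slide.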
Reminder: Most Probable State

Given an output sequence x = (x1,…,xL), si is a most probable state (at location i) if si = argmax_k p(Si = k | x), where

$$p(S_i = k \mid x) = \frac{p(S_i = k,\ x)}{p(x)} \ \propto\ p(S_i = k,\ x)$$
Finding the probability that a letter is in a CpG island via the algorithm for most probable state:

The probability that the occurrence of G in the i-th location is in a CpG island (a "+" state) is

$$\sum_{s^+} p(S_i = s^+ \mid x) \ =\ \frac{1}{p(x)} \sum_{s^+} F(S_i = s^+)\, B(S_i = s^+)$$

where F and B are the forward and backward variables. The summation is formally over the 4 "+" states, but actually only the state G+ needs to be considered (why?).
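A minimal sketch of computing these posteriors from the forward and backward variables. The two-state model and its numbers are invented for illustration; as a sanity check, the F·B/p(x) posterior is compared against brute-force enumeration of all state paths.

```python
import itertools
import math

def forward(x, states, init, m, e):
    F = [{k: init[k] * e[k][x[0]] for k in states}]
    for xi in x[1:]:
        F.append({l: e[l][xi] * sum(F[-1][k] * m[k][l] for k in states)
                  for l in states})
    return F

def backward(x, states, m, e):
    B = [{k: 1.0 for k in states}]
    for xi in reversed(x[1:]):
        B.insert(0, {k: sum(m[k][l] * e[l][xi] * B[0][l] for l in states)
                     for k in states})
    return B

def posteriors(x, states, init, m, e):
    # p(S_i = k | x) = F_i(k) * B_i(k) / p(x)
    F, B = forward(x, states, init, m, e), backward(x, states, m, e)
    px = sum(F[-1][k] for k in states)
    return [{k: F[i][k] * B[i][k] / px for k in states} for i in range(len(x))]

def brute_posterior(x, states, init, m, e, i, k):
    # direct enumeration of all state paths, for checking only
    num = den = 0.0
    for path in itertools.product(states, repeat=len(x)):
        p = init[path[0]] * e[path[0]][x[0]]
        for t in range(1, len(x)):
            p *= m[path[t - 1]][path[t]] * e[path[t]][x[t]]
        den += p
        if path[i] == k:
            num += p
    return num / den

# Invented two-state "+/-" model over the DNA alphabet.
states = ["+", "-"]
init = {"+": 0.5, "-": 0.5}
m = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.2, "-": 0.8}}
e = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
post = posteriors("GCGA", states, init, m, e)
```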
Parameter Estimation for HMM
Defining the Parameters

An HMM model is defined by the parameters mkl and ek(b), for all states k, l and all symbols b. Let θ denote the collection of these parameters:

$$\theta = \{\, m_{kl} : k, l \text{ are states} \,\} \cup \{\, e_k(b) : k \text{ is a state},\ b \text{ is a letter} \,\}$$
Training Sets

To determine the values of (the parameters in) θ, we use a training set {x¹,...,xⁿ}, where each x^j is a sequence which is assumed to fit the model. Given the parameters θ, each sequence x^j has an assigned probability p(x^j | θ).
Maximum Likelihood Parameter Estimation for HMM

The elements of the training set {x¹,...,xⁿ} are assumed to be independent, so p(x¹,...,xⁿ | θ) = ∏_j p(x^j | θ).

ML parameter estimation looks for the θ which maximizes this product. The exact method for finding or approximating this θ depends on the nature of the training set used.
Data for HMM

The training set is characterized by:
1. For each x^j, the information on the states s^j_i (the symbols x^j_i are usually known).
2. Its size (the sum of the lengths of all the sequences).
Case 1: ML when Sequences are Fully Known

We know the complete structure of each sequence in the training set {x¹,...,xⁿ}. We wish to estimate mkl and ek(b) for all pairs of states k, l and symbols b.

By the ML method, we look for parameters θ* which maximize the probability of the sample set:

$$p(x^1,\dots,x^n \mid \theta^*) = \max_{\theta}\ p(x^1,\dots,x^n \mid \theta)$$
Case 1: Sequences are Fully Known

For each x^j we have:

$$p(x^j \mid \theta) = \prod_{i=1}^{L_j} m_{s_{i-1} s_i}\, e_{s_i}(x_i)$$

Let Mkl = |{i : si-1 = k, si = l}| and Ek(b) = |{i : si = k, xi = b}| (counted in x^j). Then:

$$p(x^j \mid \theta) = \prod_{(k,l)} m_{kl}^{M_{kl}} \prod_{(k,b)} \big[e_k(b)\big]^{E_k(b)}$$
Case 1 (cont.)

By the independence of the x^j's, p(x¹,...,xⁿ | θ) = ∏_j p(x^j | θ). Thus, if Mkl = #(transitions from k to l) in the training set, and Ek(b) = #(emissions of symbol b from state k) in the training set, we have:

$$p(x^1,\dots,x^n \mid \theta) = \prod_{(k,l)} m_{kl}^{M_{kl}} \prod_{(k,b)} \big[e_k(b)\big]^{E_k(b)}$$
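The counts Mkl and Ek(b) are all the likelihood needs. The sketch below checks this numerically on an invented toy model: the log-likelihood computed from the counts equals the sum of per-sequence log-probabilities (the initial-state factor is left out of both for brevity).

```python
import math
from collections import Counter

def transition_emission_counts(seqs):
    """seqs: list of (state_path, symbols) pairs with fully known states."""
    M, E = Counter(), Counter()
    for path, symbols in seqs:
        for k, l in zip(path, path[1:]):
            M[k, l] += 1          # transitions from k to l
        for k, b in zip(path, symbols):
            E[k, b] += 1          # emissions of b from state k
    return M, E

def loglik_from_counts(M, E, m, e):
    # log p(x^1..x^n | theta) = sum_kl M_kl log m_kl + sum_kb E_k(b) log e_k(b)
    return (sum(c * math.log(m[k][l]) for (k, l), c in M.items())
            + sum(c * math.log(e[k][b]) for (k, b), c in E.items()))

def loglik_direct(seqs, m, e):
    # the same quantity, accumulated term by term along each sequence
    total = 0.0
    for path, symbols in seqs:
        for k, l in zip(path, path[1:]):
            total += math.log(m[k][l])
        for k, b in zip(path, symbols):
            total += math.log(e[k][b])
    return total

# Invented toy parameters and labeled training sequences.
m = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.2, "-": 0.8}}
e = {"+": {"H": 0.7, "T": 0.3}, "-": {"H": 0.4, "T": 0.6}}
seqs = [("++--", "HHTT"), ("+-+", "HTH")]
M, E = transition_emission_counts(seqs)
```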
Case 1 (cont.)

So we need to find mkl's and ek(b)'s which maximize:

$$\prod_{(k,l)} m_{kl}^{M_{kl}} \prod_{(k,b)} \big[e_k(b)\big]^{E_k(b)}$$

subject to: for all states k, ∑_l mkl = 1 and ∑_b ek(b) = 1 [and mkl, ek(b) ≥ 0].
Case 1 (cont.)

Rewriting, we need to maximize:

$$F = \prod_{(k,l)} m_{kl}^{M_{kl}} \prod_{(k,b)} \big[e_k(b)\big]^{E_k(b)} = \prod_k \Big[\prod_l m_{kl}^{M_{kl}}\Big] \cdot \prod_k \Big[\prod_b \big[e_k(b)\big]^{E_k(b)}\Big]$$

subject to: for all k, ∑_l mkl = 1 and ∑_b ek(b) = 1.
Case 1 (cont.)

If for each k we maximize $\prod_l m_{kl}^{M_{kl}}$ subject to ∑_l mkl = 1, and also $\prod_b [e_k(b)]^{E_k(b)}$ subject to ∑_b ek(b) = 1, then we also maximize F.

Each of the above is a simpler ML problem, similar to ML parameter estimation for a die, treated next.
ML Parameter Estimation for a Single Die
Defining the Problem

Let X be a random variable with 6 values x1,…,x6 denoting the six outcomes of a (possibly unfair) die. Here the parameters are θ = {θ1, θ2, θ3, θ4, θ5, θ6}, ∑θi = 1. Assume that the data is one sequence:

Data = (x6, x1, x1, x3, x2, x2, x3, x4, x5, x2, x6)

So we have to maximize

$$P(Data \mid \theta) = \theta_1^2\, \theta_2^3\, \theta_3^2\, \theta_4\, \theta_5\, \theta_6^2$$

subject to: θ1 + θ2 + θ3 + θ4 + θ5 + θ6 = 1 [and θi ≥ 0], i.e.

$$P(Data \mid \theta) = \theta_1^2\, \theta_2^3\, \theta_3^2\, \theta_4\, \theta_5 \Big(1 - \sum_{i=1}^{5}\theta_i\Big)^2$$
Side Comment: Sufficient Statistics

To compute the probability of the data in the die example we only need to record the number of times Ni the die fell on side i (namely N1, N2,…,N6). We do not need to recall the entire sequence of outcomes.

$$P(Data \mid \theta) = \theta_1^{N_1}\, \theta_2^{N_2}\, \theta_3^{N_3}\, \theta_4^{N_4}\, \theta_5^{N_5} \Big(1 - \sum_{i=1}^{5}\theta_i\Big)^{N_6}$$

{Ni | i = 1,…,6} is called a sufficient statistic for the multinomial sampling.
Sufficient Statistics

A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(Data) is a sufficient statistic if for any two datasets Data and Data′:

s(Data) = s(Data′)  ⟹  P(Data | θ) = P(Data′ | θ)

Exercise: define a sufficient statistic for the HMM model.
Maximum Likelihood Estimate

By the ML approach, we look for parameters that maximize the probability of the data (i.e., the likelihood function). We will find the parameters by considering the corresponding log-likelihood function:

$$\log P(Data \mid \theta) = \log\Big[\theta_1^{N_1}\theta_2^{N_2}\theta_3^{N_3}\theta_4^{N_4}\theta_5^{N_5}\Big(1-\sum_{i=1}^{5}\theta_i\Big)^{N_6}\Big] = \sum_{i=1}^{5} N_i \log\theta_i + N_6 \log\Big(1-\sum_{i=1}^{5}\theta_i\Big)$$

A necessary condition for a (local) maximum is:

$$\frac{\partial \log P(Data \mid \theta)}{\partial \theta_j} = \frac{N_j}{\theta_j} - \frac{N_6}{1-\sum_{i=1}^{5}\theta_i} = 0 \qquad (j = 1,\dots,5)$$
Finding the Maximum

Rearranging terms:

$$\frac{N_j}{\theta_j} = \frac{N_6}{1-\sum_{i=1}^{5}\theta_i} = \frac{N_6}{\theta_6} \quad\Longrightarrow\quad \theta_j = \frac{N_j}{N_6}\,\theta_6$$

Dividing the j-th equation by the i-th equation gives θj/θi = Nj/Ni. Summing θj = (Nj/N6) θ6 from j = 1 to 6:

$$1 = \sum_{j=1}^{6} \theta_j = \frac{\theta_6}{N_6} \sum_{j=1}^{6} N_j = \frac{\theta_6}{N_6}\, N \quad\Longrightarrow\quad \theta_6 = \frac{N_6}{N}$$

So there is only one local, and hence global, maximum. Hence the MLE is given by:

$$\theta_i = \frac{N_i}{N}, \qquad i = 1,\dots,6$$

(where N = N1 + … + N6 is the total number of tosses).
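The relative-frequency MLE can be checked numerically on the Data sequence from the earlier die example; the comparison against the uniform die simply illustrates that the relative frequencies attain a higher likelihood (function names here are ours, not from the slides).

```python
from collections import Counter

def die_mle(data, faces):
    # theta_i = N_i / N: the relative frequency of each face
    n = len(data)
    counts = Counter(data)
    return {f: counts[f] / n for f in faces}

def likelihood(theta, data):
    # P(Data | theta) = product of theta over the observed outcomes
    p = 1.0
    for outcome in data:
        p *= theta[outcome]
    return p

faces = [1, 2, 3, 4, 5, 6]
data = [6, 1, 1, 3, 2, 2, 3, 4, 5, 2, 6]   # the Data sequence from the slide
theta_hat = die_mle(data, faces)
uniform = {f: 1 / 6 for f in faces}
```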
Note: Fractional Exponents are Possible

Some models allow the Ni's to be fractions (e.g., if we are uncertain of a die outcome, we may consider it "6" with 20% confidence and "5" with 80% confidence). Our analysis didn't assume that the Ni are integers, so it also applies to fractional exponents.
Generalization to a Distribution with Any Number k of Outcomes

Let X be a random variable with k values x1,…,xk denoting the k outcomes of independent, identically distributed experiments, with parameters θ = {θ1, θ2,…,θk} (θi is the probability of xi). Again, the data is one sequence of length n, in which xi appears ni times. Then we have to maximize

$$P(Data \mid \theta) = \theta_1^{n_1}\theta_2^{n_2}\cdots\theta_k^{n_k}, \qquad n = n_1 + n_2 + \dots + n_k$$

subject to: θ1 + θ2 + … + θk = 1, i.e.

$$P(Data \mid \theta) = \theta_1^{n_1}\cdots\theta_{k-1}^{n_{k-1}}\Big(1 - \sum_{i=1}^{k-1}\theta_i\Big)^{n_k}$$
Generalization for k Outcomes (cont.)

By treatment identical to the die case, the maximum is obtained when for all i:

$$\frac{\theta_i}{n_i} = \frac{\theta_k}{n_k}$$

Hence the MLE is given by the relative frequencies:

$$\theta_i = \frac{n_i}{n}, \qquad i = 1,\dots,k$$
ML for a Single Die, Normalized Version

Consider two experiments with a 3-sided die:
1. 10 tosses: 2 × x1, 3 × x2, 5 × x3.
2. 1000 tosses: 200 × x1, 300 × x2, 500 × x3.

Clearly, both imply the same ML parameters. In general, when formulating ML for a single die, we can ignore the actual number n of tosses and just use the fraction of each outcome.
Normalized Version of ML (cont.)

Thus we can replace the numbers of outcomes ni by pi = ni/n, and get the following normalized setting of the ML problem for a single die:

Given positive numbers p1,…,pk s.t. p1 + p2 + … + pk = 1, find parameters θ1,…,θk which maximize:

$$P(Data \mid \theta) = \theta_1^{p_1}\theta_2^{p_2}\cdots\theta_k^{p_k}$$

And the same analysis yields that a maximum is obtained when:

$$\theta_i = p_i, \qquad i = 1,\dots,k$$
Implication: The Kullback-Leibler Information Inequality
Rephrasing the ML Inequality

We can rephrase the "ML for a single die" result:

Let P = (p1,…,pk) be a probability distribution over a k-set. For any other such distribution Q = (q1,…,qk), let the likelihood be

$$P(Data \mid Q) = q_1^{p_1} q_2^{p_2}\cdots q_k^{p_k}$$

Then P(Data | Q) is maximized only when Q = P.

Taking logarithms, we get:

Let P = (p1,…,pk) be a probability distribution over a k-set. For any other such distribution Q = (q1,…,qk), consider the sum

$$R(Q) = \log P(Data \mid Q) = \sum_i p_i \log q_i$$

Then R(Q) has a unique maximum at Q = P.
The Kullback-Leibler Information Inequality

Given probability distributions P = (p1,…,pk) and Q = (q1,…,qk) over a k-set, the relative entropy of P and Q is defined by:

$$D(P \,\|\, Q) = \sum_{i=1}^{k} p_i \log \frac{p_i}{q_i}$$

Then D(P ∥ Q) ≥ 0, with equality only when P = Q.
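A small numeric illustration of the inequality (the distributions are chosen arbitrarily): the divergence is positive for distinct distributions, zero for identical ones, and, notably, not symmetric in P and Q.

```python
import math

def kl_divergence(P, Q):
    # D(P || Q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

P = [0.5, 0.3, 0.2]
Q = [0.25, 0.25, 0.5]
```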
Proof of the Information Inequality

By the logarithmic version of the "normalized maximum likelihood" (two slides back), ∑i pi log qi is uniquely maximized at Q = P, i.e. ∑i pi log pi ≥ ∑i pi log qi. Hence:

$$D(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i} = \sum_i p_i \log p_i - \sum_i p_i \log q_i \ \ge\ 0,$$

and equality holds only when pi = qi for all i. QED
Using the Solution for the "Die Maximum Likelihood" to Find Parameters for HMM When All States are Known
The Parameters

Let Mkl = #(transitions from k to l) in the training set, and Ek(b) = #(emissions of symbol b from state k) in the training set. We need to:

Maximize

$$\prod_{(k,l)} m_{kl}^{M_{kl}} \prod_{(k,b)} \big[e_k(b)\big]^{E_k(b)}$$

subject to: for all states k, ∑_l mkl = 1, ∑_b ek(b) = 1, and mkl, ek(b) ≥ 0.
Apply to HMM (cont.)

We apply the previous technique to get, for each state k, the parameters {mkl | l = 1,…,m} and {ek(b) | b ∈ Σ}:

$$m_{kl} = \frac{M_{kl}}{\sum_{l'} M_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$$

which gives the optimal ML parameters.
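These normalized counts are straightforward to compute from fully labeled sequences. The sketch below (toy data and names invented) also takes optional pseudo-counts, anticipating the next slides; with r = 0 it returns the plain ML estimates.

```python
def estimate_hmm(seqs, states, alphabet, r_m=0.0, r_e=0.0):
    """ML estimates m_kl = M_kl / sum_l' M_kl' and e_k(b) = E_k(b) / sum_b' E_k(b'),
    with an optional pseudo-count (r_m, r_e) added to every count."""
    M = {k: {l: r_m for l in states} for k in states}
    E = {k: {b: r_e for b in alphabet} for k in states}
    for path, symbols in seqs:
        for k, l in zip(path, path[1:]):
            M[k][l] += 1
        for k, b in zip(path, symbols):
            E[k][b] += 1
    m = {k: {l: M[k][l] / sum(M[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return m, e

seqs = [("++--", "ACGT")]                       # one labeled toy sequence
m, e = estimate_hmm(seqs, "+-", "ACGT")
m2, e2 = estimate_hmm(seqs, "+-", "ACGT", r_m=1.0, r_e=1.0)
```

Note that with no pseudo-counts a state that never occurs in the training set would cause a division by zero, which is one practical reason for the pseudo-counts of the coming slide.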
Summary of Case 1: Sequences are Fully Known

We know the complete structure of each sequence in the training set {x¹,...,xⁿ}. We wish to estimate mkl and ek(b) for all pairs of states k, l and symbols b. When everything is known, we can find the (unique set of) parameters θ* which maximizes

$$p(x^1,\dots,x^n \mid \theta^*) = \max_{\theta}\ p(x^1,\dots,x^n \mid \theta)$$
Adding Pseudo-Counts in HMM

We may modify the actual counts by our prior knowledge/belief (e.g., when the sample set is too small): rkl is our prior belief on transitions from k to l, and rk(b) is our prior belief on emissions of b from state k. Then:

$$m_{kl} = \frac{M_{kl} + r_{kl}}{\sum_{l'} (M_{kl'} + r_{kl'})}, \qquad e_k(b) = \frac{E_k(b) + r_k(b)}{\sum_{b'} \big(E_k(b') + r_k(b')\big)}$$
Case 2: State Paths are Unknown

Here we use ML with hidden parameters.
Die Likelihood with Hidden Parameters

Let X be a random variable with 3 values: 0, 1, 2. Hence the parameters are θ = {θ0, θ1, θ2}, ∑θi = 1.

Assume that the data is a sequence of 2 tosses which we don't see, but we know that the sum of the outcomes is 2.

The problem: find parameters which maximize the likelihood (probability) of the observed data.

Basic fact: the probability of an event is the sum of the probabilities of the simple events it contains. The probability space here is all sequences of 2 tosses:

(0,0), (0,1), (0,2), (1,0), …, (2,2)
Defining the Problem

Thus, we need to find parameters θ which maximize:

$$\Pr(\text{sum} = 2 \mid \theta) = \Pr\{(1,1), (2,0), (0,2)\} = \theta_1^2 + 2\,\theta_0\theta_2$$

Finding an optimal solution is in general a difficult task. Hence we use the following procedure:
1. "Guess" initial parameters.
2. Repeatedly improve the parameters using the EM algorithm (to be studied later in this course).

Next, we exemplify the EM algorithm on the above example.
E Step: Average Counts

Assume our initial parameters are θ0 = 0.5, θ1 = θ2 = 0.25. Then:

Pr(1,1) = 0.25² = 0.0625
Pr(2,0) = Pr(0,2) = 0.5 · 0.25 = 0.125
Pr(sum = 2 | θ) = 0.0625 + 2 · 0.125 = 0.3125.

We use the probabilities of the events to generate "average counts" of the outcomes:
Average count of 0: 2 · 0.125 = 0.25.
Average count of 1: 2 · 0.0625 = 0.125.
Average count of 2: 2 · 0.125 = 0.25.
M Step: Updating Probabilities by the Average Counts

The total of all average counts is 2 · 0.25 + 0.125 = 0.625. The relative frequencies of the average counts give the new parameters λ0, λ1, λ2:

λ0 = 0.25 / 0.625 = 0.4
λ1 = 0.125 / 0.625 = 0.2
λ2 = 0.25 / 0.625 = 0.4

The probabilities of the simple events according to the new parameters:

Pr(1,1) = 0.2² = 0.04
Pr(2,0) = Pr(0,2) = 0.4² = 0.16

The probability of the event by the new parameters:

Pr(sum = 2 | λ) = 0.04 + 2 · 0.16 = 0.36 ≥ 0.3125 = Pr(sum = 2 | θ).
Summary of the Algorithm

• Start with some estimated parameters θ.
• Use these parameters to define average counts of the outcomes.
• Define new parameters λ by the relative frequencies of the average counts.

We will show that this algorithm never decreases, and usually increases, the likelihood of the data. An application of this algorithm to HMMs is known as the Baum-Welch algorithm, which we will see next.
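The two-toss example can be run directly; a minimal sketch (the observation "sum = 2" is hard-coded, and the normalization of the average counts is folded into the M step):

```python
from itertools import product

def em_step(theta):
    """One E+M step for two hidden die tosses observed only through their sum."""
    # E step: "average counts" of each outcome, weighted by the joint
    # probability of the simple events consistent with the observation.
    counts = {0: 0.0, 1: 0.0, 2: 0.0}
    for a, b in product([0, 1, 2], repeat=2):
        if a + b == 2:
            p = theta[a] * theta[b]
            counts[a] += p
            counts[b] += p
    # M step: new parameters are the relative frequencies of the average counts.
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def likelihood(theta):
    # Pr(sum = 2 | theta) = theta_1^2 + 2 * theta_0 * theta_2
    return sum(theta[a] * theta[b] for a, b in product([0, 1, 2], repeat=2)
               if a + b == 2)

theta = {0: 0.5, 1: 0.25, 2: 0.25}   # the initial guess from the E-step slide
new = em_step(theta)
```

This reproduces the slides' numbers: likelihood 0.3125 before the step and 0.36 after, with new parameters (0.4, 0.2, 0.4).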