Hidden Markov Model Parameter Estimation

BMI/CS 576
www.biostat.wisc.edu/bmi576
Colin Dewey
Fall 2015
Three Important Questions

• How likely is a given sequence?
• What is the most probable “path” for generating a given sequence?
• How can we learn the HMM parameters given a set of sequences?
Learning without Hidden State

• learning is simple if we know the correct path for each sequence in our training set
• estimate parameters by counting the number of times each parameter is used across the training set

[figure: an HMM with begin state 0, emitting states 1–4, and end state 5; the sequence C A G T is annotated with its known state path 0, 2, 2, 4, 4, 5]
Learning with Hidden State

• if we don’t know the correct path for each sequence in our training set, consider all possible paths for the sequence
• estimate parameters through a procedure that counts the expected number of times each transition and emission occurs across the training set

[figure: the same HMM with begin state 0, emitting states 1–4, and end state 5; the state path for the sequence C A G T is now unknown (? ? ? ?)]
Learning Parameters

• if we know the state path for each training sequence, learning the model parameters is simple
  – no hidden state during training
  – count how often each transition and emission occurs
  – normalize/smooth to get probabilities
  – process is just like it was for Markov chain models
• if we don’t know the path for each training sequence, how can we determine the counts?
  – key insight: estimate the counts by considering every path weighted by its probability
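The known-path case can be sketched in a few lines: count every transition and emission along the given paths, then normalize. This is a minimal illustration, not code from the lecture; the dict-based parameter representation and the silent 'begin'/'end' state names are my own conventions.

```python
from collections import defaultdict

def estimate_from_known_paths(sequences, paths):
    """Maximum-likelihood HMM parameters when the state path for every
    training sequence is known: count each transition and emission,
    then normalize the counts into probabilities."""
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    for x, pi in zip(sequences, paths):
        # wrap the path with the silent begin and end states
        states = ["begin"] + list(pi) + ["end"]
        for k, l in zip(states, states[1:]):
            trans[k][l] += 1          # count transition k -> l
        for c, k in zip(x, pi):
            emit[k][c] += 1           # count emission of c from k
    # normalize each state's counts into a probability distribution
    a = {k: {l: n / sum(d.values()) for l, n in d.items()} for k, d in trans.items()}
    e = {k: {c: n / sum(d.values()) for c, n in d.items()} for k, d in emit.items()}
    return a, e
```

In practice one would also add pseudocounts before normalizing, as the later slides note.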
Learning Parameters: The Baum-Welch Algorithm

• a.k.a. the Forward-Backward algorithm
• an Expectation Maximization (EM) algorithm
  – EM is a family of algorithms for learning probabilistic models in problems that involve hidden state
  – generally used to find maximum likelihood estimates for the parameters of a model
• in this context, the hidden state is the path that explains each training sequence
Learning Parameters: The Baum-Welch Algorithm

• algorithm sketch:
  – initialize the parameters of the model
  – iterate until convergence
    • calculate the expected number of times each transition or emission is used
    • adjust the parameters to maximize the likelihood of these expected values
The Expectation Step

• first, we need to know the probability of the i-th symbol being produced by state k, given sequence x:

  P(π_i = k | x)

• given this, we can compute our expected counts for state transitions and character emissions
The Expectation Step

• the probability of producing x with the i-th symbol being produced by state k is

  P(π_i = k, x) = f_k(i) · b_k(i)

• the first term is f_k(i) = P(x_1 … x_i, π_i = k), computed by the forward algorithm
• the second term is b_k(i) = P(x_{i+1} … x_L | π_i = k), computed by the backward algorithm
The Expectation Step

• we want to know the probability of producing sequence x with the i-th symbol being produced by state k (for all x, i, and k)

[figure: an example HMM with begin state 0, emitting states 1–4, and end state 5, observing the sequence C A G T; the emitting states have emission distributions (A 0.4, C 0.1, G 0.2, T 0.3), (A 0.4, C 0.1, G 0.1, T 0.4), (A 0.2, C 0.3, G 0.3, T 0.2), and (A 0.1, C 0.4, G 0.4, T 0.1), with transition probabilities 0.5, 0.5, 0.2, 0.8, 0.4, 0.6, 0.1, 0.9, 0.2, 0.8 on the edges]
The Expectation Step

• the forward algorithm gives us f_k(i), the probability of being in state k having observed the first i characters of x

[figure: the same example HMM as on the previous slide]
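The forward values can be computed with a short dynamic program. The sketch below assumes the same dict-based representation as before (transition probabilities a[k][l], emission probabilities e[k][c], silent 'begin'/'end' states, and a non-empty sequence); these conventions are mine, not the slides'.

```python
def forward(x, states, a, e):
    """Forward algorithm: f[i][k] = P(x_1..x_i, state at position i is k).
    Also returns P(x), the total probability of the sequence."""
    L = len(x)
    f = [{k: 0.0 for k in states} for _ in range(L + 1)]
    # initialization: transitions out of the silent begin state
    for k in states:
        f[1][k] = a["begin"].get(k, 0.0) * e[k][x[0]]
    # recursion: sum over all predecessor states
    for i in range(2, L + 1):
        for l in states:
            f[i][l] = e[l][x[i - 1]] * sum(f[i - 1][k] * a[k].get(l, 0.0) for k in states)
    # termination: transitions into the silent end state
    px = sum(f[L][k] * a[k].get("end", 0.0) for k in states)
    return f, px
```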
The Expectation Step

• the backward algorithm gives us b_k(i), the probability of observing the rest of x, given that we’re in state k after i characters

[figure: the same example HMM as on the previous slide]
The Expectation Step

• putting forward and backward together, we can compute the probability of producing sequence x with the i-th symbol being produced by state k:

  P(π_i = k, x) = f_k(i) · b_k(i)

[figure: the same example HMM as on the previous slide]
The Backward Algorithm

• initialization:

  b_k(L) = a_{k,end}   for states k with a transition to the end state
The Backward Algorithm

• recursion (i = L−1 … 0):

  b_k(i) = Σ_l a_{kl} · e_l(x_{i+1}) · b_l(i+1)

• an alternative to the forward algorithm for computing the probability of a sequence:

  P(x) = b_0(0) = Σ_l a_{0l} · e_l(x_1) · b_l(1)
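The initialization and recursion above translate directly into code. This sketch mirrors the forward pass (same assumed dict-based parameters and silent 'begin'/'end' states, conventions of mine rather than the slides'), filling the table from the end of the sequence backward.

```python
def backward(x, states, a, e):
    """Backward algorithm: b[i][k] = P(x_{i+1}..x_L | state at position i is k).
    Also returns P(x) via the begin state, as an alternative to forward."""
    L = len(x)
    b = [{k: 0.0 for k in states} for _ in range(L + 1)]
    # initialization: b_k(L) = a_{k,end} for states with a transition to end
    for k in states:
        b[L][k] = a[k].get("end", 0.0)
    # recursion, filling positions L-1 down to 1
    for i in range(L - 1, 0, -1):
        for k in states:
            b[i][k] = sum(a[k].get(l, 0.0) * e[l][x[i]] * b[i + 1][l] for l in states)
    # i = 0 case for the silent begin state gives P(x) = b_0(0)
    px = sum(a["begin"].get(l, 0.0) * e[l][x[0]] * b[1][l] for l in states)
    return b, px
```

On the toy single-state model used to test the forward sketch, this returns the same P(x), which is a useful sanity check when implementing both passes.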
The Expectation Step

• now we can calculate the probability of the i-th symbol being produced by state k, given x:

  P(π_i = k | x) = P(π_i = k, x) / P(x) = f_k(i) · b_k(i) / P(x)
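Given the two tables, the posterior is a pointwise combination. In this sketch I assume f and b are lists of per-position dicts as produced by the forward/backward sketches (position 0 unused), which is my convention, not the slides':

```python
def posterior(f, b, px):
    """For each position i >= 1 and each state k, compute
    P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
    return [{k: f[i][k] * b[i][k] / px for k in f[i]} for i in range(1, len(f))]
```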
The Expectation Step

• now we can calculate the expected number of times letter c is emitted by state k:

  n_{k,c} = Σ_j (1 / P(x^j)) Σ_{i : x_i^j = c} f_k^j(i) · b_k^j(i)

  where the outer sum is over sequences and the inner sum is over positions where c occurs in x^j

• here we’ve added the superscript j to refer to a specific sequence in the training set
The Expectation Step

• and we can calculate the expected number of times that the transition from k to l is used:

  n_{k→l} = Σ_j (1 / P(x^j)) Σ_i f_k^j(i) · a_{kl} · e_l(x_{i+1}^j) · b_l^j(i+1)

• or, if l is a silent state:

  n_{k→l} = Σ_j (1 / P(x^j)) Σ_i f_k^j(i) · a_{kl} · b_l^j(i)
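The two count formulas can be accumulated in one pass over each sequence. This is an illustrative sketch with names of my choosing: fwd[j], bwd[j], and pxs[j] are the forward table, backward table, and P(x^j) for training sequence seqs[j], in the same dict-based representation as the earlier sketches; the silent end state is handled as the only silent successor.

```python
def expected_counts(seqs, fwd, bwd, pxs, states, a, e, alphabet):
    """E-step: accumulate expected emission counts n_emit[k][c] and
    expected transition counts n_trans[k][l] across the training set."""
    n_emit = {k: {c: 0.0 for c in alphabet} for k in states}
    n_trans = {k: {l: 0.0 for l in states + ["end"]} for k in states}
    for x, f, b, px in zip(seqs, fwd, bwd, pxs):
        L = len(x)
        for i in range(1, L + 1):
            for k in states:
                # emission counts: f_k(i) b_k(i) / P(x), credited to letter x_i
                n_emit[k][x[i - 1]] += f[i][k] * b[i][k] / px
                if i < L:
                    # transitions to emitting states:
                    # f_k(i) a_{kl} e_l(x_{i+1}) b_l(i+1) / P(x)
                    for l in states:
                        n_trans[k][l] += f[i][k] * a[k].get(l, 0.0) * e[l][x[i]] * b[i + 1][l] / px
                else:
                    # transition to the silent end state: f_k(L) a_{k,end} / P(x)
                    n_trans[k]["end"] += f[L][k] * a[k].get("end", 0.0) / px
    return n_emit, n_trans
```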
The Maximization Step

• let n_{k,c} be the expected number of emissions of c from state k for the training set
• estimate new emission parameters by:

  e_k(c) = n_{k,c} / Σ_{c′} n_{k,c′}

• just like in the simple case
• but typically we’ll do some “smoothing” (e.g. add pseudocounts)
The Maximization Step

• let n_{k→l} be the expected number of transitions from state k to state l for the training set
• estimate new transition parameters by:

  a_{kl} = n_{k→l} / Σ_{l′} n_{k→l′}
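Both M-step updates are the same normalization applied to the expected-count tables. A minimal sketch, assuming the nested-dict count tables produced by the E-step sketch above (pseudocounts, if used, would already be folded into the counts):

```python
def m_step(n_emit, n_trans):
    """M-step: turn expected counts into new emission and transition
    probabilities by normalizing each state's counts to sum to 1."""
    e = {k: {c: n / sum(d.values()) for c, n in d.items()} for k, d in n_emit.items()}
    a = {k: {l: n / sum(d.values()) for l, n in d.items()} for k, d in n_trans.items()}
    return a, e
```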
The Baum-Welch Algorithm

• initialize the parameters of the HMM
• iterate until convergence
  – initialize n_{k,c}, n_{k→l} with pseudocounts
  – E-step: for each training set sequence j = 1 … n
    • calculate f_k(i) values for sequence j
    • calculate b_k(i) values for sequence j
    • add the contribution of sequence j to n_{k,c}, n_{k→l}
  – M-step: update the HMM parameters using n_{k,c}, n_{k→l}
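The whole loop can be assembled from the pieces sketched so far. This is one possible end-to-end sketch, not the lecture's reference implementation: it uses the same assumed dict-based parameters and silent 'begin'/'end' states, runs a fixed number of iterations instead of testing convergence, and (as a simplification) keeps the begin-state transitions fixed rather than re-estimating them.

```python
def baum_welch(seqs, states, alphabet, a, e, n_iter=10, pseudocount=0.0):
    """Baum-Welch: iterate E-step (expected counts via forward/backward)
    and M-step (normalize counts into new parameters)."""
    for _ in range(n_iter):
        n_e = {k: {c: pseudocount for c in alphabet} for k in states}
        n_t = {k: {l: pseudocount for l in states + ["end"]} for k in states}
        for x in seqs:
            L = len(x)
            # forward pass
            f = [{k: 0.0 for k in states} for _ in range(L + 1)]
            for k in states:
                f[1][k] = a["begin"].get(k, 0.0) * e[k][x[0]]
            for i in range(2, L + 1):
                for l in states:
                    f[i][l] = e[l][x[i - 1]] * sum(f[i - 1][k] * a[k].get(l, 0.0) for k in states)
            px = sum(f[L][k] * a[k].get("end", 0.0) for k in states)
            # backward pass
            b = [{k: 0.0 for k in states} for _ in range(L + 1)]
            for k in states:
                b[L][k] = a[k].get("end", 0.0)
            for i in range(L - 1, 0, -1):
                for k in states:
                    b[i][k] = sum(a[k].get(l, 0.0) * e[l][x[i]] * b[i + 1][l] for l in states)
            # E-step: accumulate this sequence's expected counts
            for i in range(1, L + 1):
                for k in states:
                    n_e[k][x[i - 1]] += f[i][k] * b[i][k] / px
                    if i < L:
                        for l in states:
                            n_t[k][l] += f[i][k] * a[k].get(l, 0.0) * e[l][x[i]] * b[i + 1][l] / px
                    else:
                        n_t[k]["end"] += f[L][k] * a[k].get("end", 0.0) / px
        # M-step: normalize expected counts into new parameters
        e = {k: {c: n / sum(d.values()) for c, n in d.items()} for k, d in n_e.items()}
        new_a = {k: {l: n / sum(d.values()) for l, n in d.items()} for k, d in n_t.items()}
        new_a["begin"] = a["begin"]  # simplification: begin transitions left fixed
        a = new_a
    return a, e
```

A real implementation would work in log space (or rescale) to avoid underflow on long sequences, as the complexity slide's O(N²L) passes otherwise multiply many small probabilities.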
Baum-Welch Algorithm Example

• given
  – the HMM with the parameters initialized as shown
  – the training sequences TAG, ACG

[figure: an HMM with begin state 0, two emitting states 1 and 2, and end state 3; begin → state 1 with probability 1.0; one emitting state has emission distribution (A 0.1, C 0.4, G 0.4, T 0.1) and the other (A 0.4, C 0.1, G 0.1, T 0.4); the remaining transition probabilities shown are 0.1, 0.9, 0.2, and 0.8]

• we’ll work through one iteration of Baum-Welch
Baum-Welch Example (Cont)

• determining the forward values for TAG
• here we compute just the values that represent events with non-zero probability
• in a similar way, we also compute forward values for ACG
Baum-Welch Example (Cont)

• determining the backward values for TAG
• here we compute just the values that represent events with non-zero probability
• in a similar way, we also compute backward values for ACG
Baum-Welch Example (Cont)

• determining the expected emission counts for state 1

[table: the contributions of TAG and ACG to the expected emission counts, plus a pseudocount column]

*note that the forward/backward values in these two columns differ; in each column they are computed for the sequence associated with the column
Baum-Welch Example (Cont)

• determining the expected transition counts for state 1 (not using pseudocounts)

[table: the contributions of TAG and ACG to the expected transition counts]

• in a similar way, we also determine the expected emission/transition counts for state 2
Baum-Welch Example (Cont)

• determining probabilities for state 1
Baum-Welch Convergence

• some convergence criteria
  – likelihood of the training sequences changes little
  – fixed number of iterations reached
  – parameters are not changing significantly
• usually converges in a small number of iterations
• will converge to a local maximum (in the likelihood of the data given the model)

[figure: likelihood as a function of the parameters, showing local maxima]
Computational Complexity of HMM Algorithms

• given an HMM with N states and a sequence of length L, the time complexity of the Forward, Backward, and Viterbi algorithms is O(N²L)
  – this assumes that the states are densely interconnected
• given M sequences of length L, the time complexity of Baum-Welch on each iteration is O(MN²L)
Learning and Prediction Tasks

• learning
  Given: a model, a set of training sequences
  Do: find model parameters that explain the training sequences with relatively high probability (the goal is to find a model that generalizes well to sequences we haven’t seen before)
• classification
  Given: a set of models representing different sequence classes, a test sequence
  Do: determine which model/class best explains the sequence
• segmentation
  Given: a model representing different sequence classes, a test sequence
  Do: segment the sequence into subsequences, predicting the class of each subsequence
Algorithms for Learning and Prediction Tasks

• learning
  – correct path known for each training sequence → simple maximum-likelihood or Bayesian estimation
  – correct path not known → Baum-Welch (EM) algorithm for maximum likelihood estimation
• classification
  – simple Markov model → calculate the probability of the sequence along the single path for each model
  – hidden Markov model → Forward algorithm to calculate the probability of the sequence along all paths for each model
• segmentation
  – hidden Markov model → Viterbi algorithm to find the most probable path for the sequence