TRANSCRIPT
Introduction to Hidden Markov Modeling (HMM)
Daniel S. Terry
Scott Blanchard and Harel Weinstein labs
1
HMM is useful for many, many problems.
2
Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling
So let’s say you’re riding out nuclear war in a bunker…
To keep sane, you want to know what the weather outside is like…
…but all you can observe is if the security guard brings his umbrella.
4
Probabilistic reasoning
5
P(Sunny|Umbrella)   P(Cloudy|Umbrella)   P(Rain|Umbrella)
P(Sunny|No Umbrella)   P(Cloudy|No Umbrella)   P(Rain|No Umbrella)

P(X|E) = probability of X happening if E is observed.
(X = hidden state, E = observation.)
6
Probabilistic reasoning in stochastic processes
Time →

Hidden state:  X0 → X1 → X2 → X3 → X4 → …
Observations ("Emissions"):  E0  E1  E2  E3  E4  …

This is called a Markov chain
7
Assumptions in Markov modeling
Assumption 1: This is a stationary process, specifically a first-order Markov Process:
P(Xt|Xt-1,Xt-2,Xt-3,…) = P(Xt|Xt-1)
…in other words, the current state depends only on the previous state. We call this the transition model.

Assumption 2: The current observation depends only on the current state:

P(Et|Xt,Xt-1,Xt-2,…,Et-1,Et-2,Et-3,…) = P(Et|Xt)

…in other words, the current observation depends only on the current state. We call this the observation (or emission) model.
Hidden state:  X0 → X1 → X2 → X3 → X4 → …
Observations ("Emissions"):  E0  E1  E2  E3  E4  …
The initial and transition probability models: π and A
8
X0 → X1 → … → Xt-1 → Xt

Transition probabilities A = P(Xt|Xt-1):

Xt-1      P(Xt=Sunny)   P(Xt=Cloudy)   P(Xt=Raining)
Sunny     0.70          0.25           0.05
Cloudy    0.33          0.33           0.33
Raining   0.20          0.60           0.20

Initial probabilities π = P(X0):

Sunny     0.70
Cloudy    0.15
Raining   0.15

Encodes prior knowledge about weather trends.
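The π and A tables above are enough to simulate the weather process itself. Below is a minimal Python sketch, not code from the talk: the names `STATES`, `PI`, `A`, and `sample_chain` are my own. Note the Cloudy row is printed as 0.33/0.33/0.33, which sums to 0.99; `random.choices` normalizes relative weights, so the sketch still runs as-is.

```python
import random

# Initial-state and transition probabilities from the slide's weather tables.
STATES = ["Sunny", "Cloudy", "Raining"]
PI = {"Sunny": 0.70, "Cloudy": 0.15, "Raining": 0.15}
A = {"Sunny":   {"Sunny": 0.70, "Cloudy": 0.25, "Raining": 0.05},
     "Cloudy":  {"Sunny": 0.33, "Cloudy": 0.33, "Raining": 0.33},
     "Raining": {"Sunny": 0.20, "Cloudy": 0.60, "Raining": 0.20}}

def sample_chain(T, seed=0):
    """Draw a weather sequence X_0..X_{T-1}: X_0 ~ pi, X_t ~ A[X_{t-1}]."""
    rng = random.Random(seed)
    x = rng.choices(STATES, weights=[PI[s] for s in STATES])[0]
    chain = [x]
    for _ in range(T - 1):
        x = rng.choices(STATES, weights=[A[x][s] for s in STATES])[0]
        chain.append(x)
    return chain
```

Sampling a long chain and tallying state frequencies is a quick sanity check that the transition table behaves as intended.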
The observation probability model: B
9
Observation model B = P(Et|Xt):

Xt        P(Et = Umbrella)
Sunny     0.05
Cloudy    0.10
Raining   0.85

Encodes prior knowledge about how likely people are to bring their umbrella depending on weather conditions.
Together these parameters define a Markov model.
10
λ = { π, A, B }

π: Initial State Probabilities (πC, πR)
A: State Transition Probabilities (aC,C, aC,R, aR,C, aR,R)
B: Observation Distributions (bC, bR)

[Diagram: two states C and R with their initial probabilities, transition arrows, and emission distributions.]
11
Predicting state sequences from observations
Observation Sequence (t=1..T)
Predicted Hidden State Sequence
Hidden state: X0 X1 X2 X3 …   Observations: E0 E1 E2 E3 …   (Markov chain)

[Diagram: the Markov model λ = {π, A, B}, with states C and R, initial probabilities πC, πR, transitions aC,C, aC,R, aR,C, aR,R, and emission distributions bC, bR, maps the observation sequence to the predicted hidden state sequence.]
Finding the optimal state sequence with Viterbi
12
Given a model λ = { π, A, B } that describes the system, we can determine the optimal state sequence (idealization) as follows:

[Trellis: candidate states S, C, R at each time X0, X1, X2, X3, …]

For each state at time t, calculate the probability of the state at time t (Xt) being a particular state xi (sunny, raining, etc.), given the observations and previous states:

P(Xt=xi | Et, Et-1, Et-2, …, Xt-1, Xt-2, …) = P(Xt=xi | Et, Xt-1=xj) ∝ P(Et | Xt=xi) × P(Xt=xi | Xt-1=xj)

Initial condition: P(X0=xi) = πi
Finding the optimal state sequence with Viterbi
13
[Trellis: states S, C, R at times X0, X1, X2, X3, …, with all possible transitions between adjacent time points.]

Repeat these calculations for all possible transitions recursively. Then at each point in time we have an estimate of how likely we are to be in a particular state at that time, given all possible previous paths. We also keep track of the most likely state at each point in time.

(This complex-looking thing is called a trellis. Can you see why?)
Finding the optimal state sequence with Viterbi
14
[Trellis: states S, C, R at times X0, X1, X2, X3, …]

Find the most likely end state from the probabilities. We can then backtrack to find the most likely state sequence. You have seen a similar procedure with sequence alignment.
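The recursion-plus-backtracking procedure of the last three slides can be sketched in a few lines. This is a minimal log-space Viterbi in Python, my own illustration rather than code from the talk; it assumes the umbrella tables from slides 8 and 9, with the No-Umbrella probability taken as one minus the printed Umbrella probability.

```python
import math

# Weather model from slides 8-9 (my own dictionary encoding of the tables).
STATES = ["Sunny", "Cloudy", "Raining"]
PI = {"Sunny": 0.70, "Cloudy": 0.15, "Raining": 0.15}
A = {"Sunny":   {"Sunny": 0.70, "Cloudy": 0.25, "Raining": 0.05},
     "Cloudy":  {"Sunny": 0.33, "Cloudy": 0.33, "Raining": 0.33},
     "Raining": {"Sunny": 0.20, "Cloudy": 0.60, "Raining": 0.20}}
B = {"Sunny":   {True: 0.05, False: 0.95},
     "Cloudy":  {True: 0.10, False: 0.90},
     "Raining": {True: 0.85, False: 0.15}}

def viterbi(obs):
    """Most likely weather sequence given umbrella sightings (True/False)."""
    # delta[j]: log-probability of the best path ending in state j.
    delta = {s: math.log(PI[s]) + math.log(B[s][obs[0]]) for s in STATES}
    backptr = []
    for e in obs[1:]:
        col, ptr = {}, {}
        for j in STATES:
            # Best previous state for a transition into j.
            i = max(STATES, key=lambda i: delta[i] + math.log(A[i][j]))
            ptr[j] = i
            col[j] = delta[i] + math.log(A[i][j]) + math.log(B[j][e])
        delta = col
        backptr.append(ptr)
    # Backtrack from the most likely end state.
    x = max(STATES, key=lambda s: delta[s])
    path = [x]
    for ptr in reversed(backptr):
        x = ptr[x]
        path.append(x)
    return path[::-1]
```

Working in log space avoids the numerical underflow that products of many small probabilities would otherwise cause on long traces.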
15
Predicting state sequences from observations
Observation Sequence (t=1..T)
Predicted Hidden State Sequence
[Diagram repeated from slide 11: hidden-state chain X0 X1 X2 X3 … with observations E0 E1 E2 E3 …, and the Markov model λ = {π, A, B} with states C and R.]
16
A practical example of Markov modeling:
Analysis of single-molecule fluorescence trajectories
[Figure: fluorescence and FRET trajectories vs. time (0–5 min).]

…Ok, so I’m bored of talking about the weather.
www.nia.NIH.gov, public domain.
Neurotransmitter release and reuptake is central to neuronal signaling and proper functioning of the brain.

[Figure: NSS-mediated reuptake at a synapse.]
Neurotransmitter:Sodium Symporter (NSS) proteins are the targets of many clinically-important drugs.
Drugs of Abuse
Therapeutic Inhibitors
www.nia.NIH.gov, public domain.
Intracellular
Extracellular
Neurotransmitter
High Na+ Outside
Low Na+ Inside
Key Question: What are the specific conformational changes required for such a mechanism and how do they mediate transport?
A practical example of Markov modeling:Analysis of single-molecule fluorescence trajectories
[Figure: FRET efficiency vs. donor–acceptor distance (2–10 nm), showing R0 and the donor and acceptor dyes.]

Single-molecule FRET: A tool for examining conformational dynamics

20
FRET imaging of single molecules can be achieved using a few tricks, including total internal reflection excitation.
[Figure: donor- and acceptor-labeled molecule at the surface under 532 nm TIR excitation; fluorescence and FRET trajectories vs. time (0–5 min).]

21
We want to know:1) How many distinct states are there?2) What are their FRET values?3) What are the rates?4) Most likely state at each point in time?
[Figure: conformation, fluorescence, and FRET trajectories vs. time (0–14 sec).]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system.

Sequence of Hidden States: X0 X1 X2 …
Sequence of Observations: E0 E1 E2 …

22

Unlike with the weather, we have to learn the model from the data itself!
Hidden Markov models have three components:
1) Initial state probabilities:

π = { πO, πC }

2) Transition probabilities:

A = { ai,j } = [ aO,O  aO,C
                 aC,O  aC,C ]

λ = { π, A, B }

[Diagram: two states O and C with initial probabilities πO, πC, transitions aO,O, aO,C, aC,O, aC,C, and emission distributions bO, bC.]

23

3) Observation probability distribution (OPD):

B = { bi(Et) },   bi(Et) = (1 / (σi·√(2π))) · exp( −(Et − μi)² / (2·σi²) )

[Histogram: Gaussian FRET distribution for state i (≈0.4–0.7), with mean μi and variance σi².]
Goal: best model to explain the experimental data.
λ̂ = argmaxλ P(λ | E)     (where λ is the model, E is the observed FRET trajectory)

In other words, we want to maximize the probability of the model given the data.

But we don’t know how to calculate P(λ | E)! Instead, turn it around using Bayes’ theorem:

P(λ | E) = P(E | λ) · P(λ) / P(E)

P(E) is independent of the model choice and will not affect model ranking. If we assume all models are a priori equally likely, then:

λ̂ = argmaxλ P(λ | E) = argmaxλ P(E | λ)

24

P(E | λ) is easy to calculate: it comes from the observation distribution. Why is X not here? We have to sum over all possible state sequences!
[Figure: FRET trajectory vs. time (0–5 min) with idealized state sequence overlaid.]
Segmental k-means (SKM): optimization on the cheap
25
State assignment (Viterbi)
Parameter re-estimation
• To get B, simply calculate the mean and std for each state from the current assignment.
• To get A, count the number of transitions of each type and normalize.
• To get π, count the number of times each dwell starts with each state xi and normalize.
F. Qin (2004), Biophys J 86: 1488
λ0
λi
Works only if the starting model is close to the final one.
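The three re-estimation bullets above can be sketched directly. A minimal Python sketch of one SKM maximization step; `reestimate` and its arguments are my own naming, and it assumes a single trace (with many traces, π would come from counting start states across traces).

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

def reestimate(obs, path):
    """One SKM maximization step: rebuild (pi, A, B) from a state assignment.

    obs:  list of FRET values, one per frame
    path: state label assigned to each frame (e.g. by Viterbi)
    """
    states = sorted(set(path))
    # B: mean and standard deviation of the frames assigned to each state.
    by_state = defaultdict(list)
    for o, x in zip(obs, path):
        by_state[x].append(o)
    B = {s: (mean(by_state[s]), pstdev(by_state[s])) for s in states}
    # A: count the number of transitions of each type and normalize rows.
    counts = Counter(zip(path, path[1:]))
    A = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        A[i] = {j: counts[(i, j)] / row_total if row_total else 0.0
                for j in states}
    # pi: which state this (single) trace starts in.
    pi = {s: 1.0 if path[0] == s else 0.0 for s in states}
    return pi, A, B
```

Alternating this step with Viterbi state assignment is the whole SKM loop.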
Model optimization: expectation maximization (EM).
Expectation: Calculate the probability of data given the model (expectation).
Maximization: Adjust model parameters to better fit the calculated probabilities.
Termination: Iterate until the log-likelihood converges (e.g., ΔLL < 10⁻⁴).
26
For a given state sequence X:

P(E | λ) = P(X0) · ∏ t=1..T [ P(Xt | Xt-1) · P(Et | Xt) ]

LL = log P(E | λ) = log[P(X0)] + Σ t=1..T log[ P(Xt | Xt-1) · P(Et | Xt) ]

Initial (π)      Transition (A)      Observation (B)

Restarts: if the likelihood “landscape” is very frustrated, restarting from a random initial model can help escape local optima.
27
The forward-backward algorithm (Baum Welch)
Hidden states X0 X1 X2 X3 … X96 X97 X98 X99, with observations E0 … E99.

Calculating the probabilities at a particular point in time (t):

The “past” (Forward):   αt = P(Xt | E1..t)
The “future” (Backward):   βt = P(Et+1..T | Xt)

P(Xt | E1..T) ∝ P(Xt | E1..t) · P(Et+1..T | Xt)

We can do this because of Bayes’ rule and the conditional independence of observations over time. We calculate these much like we did with Viterbi.
28
The forward algorithm
Partial probabilities (α) are calculated recursively as:
αt(j) = P(observation|hidden state is j) × P(all paths to state j at time t)
Initial condition: α0(j) = π(j)·B(j, E0)

Iterate:

αt+1(j) = B(j, Et+1) · Σ i=1..n [ αt(i) · ai,j ]

Then the total probability of the sequence is the sum of these α’s at the final time.

[Trellis: states O and C at times X0, X1, X2, X3, …]
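The forward recursion above translates almost line-for-line into code. A minimal Python sketch (my own naming; a practical implementation would also scale or work in log space to avoid underflow on long traces):

```python
def forward(obs, states, pi, A, B):
    """Total sequence probability P(E | model) via the forward recursion."""
    # Initial condition: alpha_0(j) = pi(j) * B(j, E_0)
    alpha = {j: pi[j] * B[j][obs[0]] for j in states}
    # Iterate: alpha_{t+1}(j) = B(j, E_{t+1}) * sum_i alpha_t(i) * a_ij
    for e in obs[1:]:
        alpha = {j: B[j][e] * sum(alpha[i] * A[i][j] for i in states)
                 for j in states}
    # Total probability is the sum of the final alphas.
    return sum(alpha.values())
```

A useful check on any forward implementation: summed over every possible observation sequence of a fixed length, the returned probabilities must add up to 1.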
Maximization using forward-backward probabilities
Probability of being in state i at time t:
Probability of transitioning from state i to j at time t:(from the Forward-Backward algorithm)
Model parameters adjusted to maximize log-likelihood:
29
This is very much like SKM, except we use explicit probabilities instead of just counting.
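The per-frame state probabilities used in this maximization step, γt(i) ∝ αt(i)·βt(i), can be sketched with a forward and a backward pass. A minimal Python illustration (my own naming; unscaled, so suitable only for short traces):

```python
def forward_backward(obs, states, pi, A, B):
    """Posterior P(X_t = i | all observations) for every frame t."""
    # Forward pass: alpha_t(i) is proportional to P(E_1..t, X_t = i).
    alphas = [{s: pi[s] * B[s][obs[0]] for s in states}]
    for e in obs[1:]:
        prev = alphas[-1]
        alphas.append({j: B[j][e] * sum(prev[i] * A[i][j] for i in states)
                       for j in states})
    # Backward pass: beta_t(i) = P(E_{t+1}..T | X_t = i), with beta_T = 1.
    betas = [{s: 1.0 for s in states}]
    for e in reversed(obs[1:]):
        nxt = betas[0]
        betas.insert(0, {i: sum(A[i][j] * B[j][e] * nxt[j] for j in states)
                         for i in states})
    # Combine and normalize: gamma_t(i) ∝ alpha_t(i) * beta_t(i).
    gammas = []
    for a, b in zip(alphas, betas):
        w = {s: a[s] * b[s] for s in states}
        z = sum(w.values())
        gammas.append({s: w[s] / z for s in states})
    return gammas
```

Each returned γt is a proper distribution over states, which is exactly what replaces the hard state counts of SKM.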
The problem of bias
30
• You can always get a better fit using more parameters! But it may not be a good model.

• Bayesian information criterion (BIC):

−2·ln P(E|k) ≈ BIC = −2·LL + k·ln(n)

k is the number of free parameters, LL is the log-likelihood of the optimal “fit”, and n is the number of data points.

• Akaike information criterion (AIC):

AIC = 2·k − 2·LL

• Maximum evidence methods (vbFRET), etc.
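Both criteria are one-liners; here LL is the log-likelihood value itself, and a common slip is writing ln(LL) instead. A small Python sketch (function names are mine), where in each case the lower-scoring model is preferred:

```python
import math

def bic(log_likelihood, k, n):
    """Bayesian information criterion (lower is better).
    k: free parameters, n: number of data points."""
    return -2.0 * log_likelihood + k * math.log(n)

def aic(log_likelihood, k):
    """Akaike information criterion (lower is better)."""
    return 2.0 * k - 2.0 * log_likelihood
```

Because BIC's penalty grows with ln(n), a many-state model must beat a simpler one by a large likelihood margin on a large dataset before it is preferred.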
We want to know:1) How many distinct states are there?2) What are their FRET values?3) What are the rates?4) Most likely state at each point in time?
[Figure: conformation, fluorescence, and FRET trajectories vs. time (0–14 sec).]

HMM is a statistical framework for modeling a hidden system using a sequence of observations generated by that system.

Sequence of Hidden States: X0 X1 X2 …
Sequence of Observations: E0 E1 E2 …

31
[Figure: FRET trajectory vs. time (0–5 min); 2 mM Na+, +2 mM Ala.]

Quantifying kinetics is then useful for understanding how outside factors (ligands) influence dynamics.
[Plots: dwell time (s) of the Open and Closed states, and occupancy (%), vs. log [Ala] (M).]
Zhao and Terry, et al (2011), Nature 474
Other important examples of Markov modeling:
• Single-channel recordings (patch clamp)
• Sequence analysis
• Cardiac electrical modeling
• Systems modeling of metabolic networks
33
[Diagram: two-state C ⇌ O channel scheme.]
We can do non-equilibrium Markov modeling, too
34

Geggier et al (2010), JMB 399: 576
HMM is useful for many, many problems.
35
Speech Recognition and Translation
Weather Modeling
Sequence Alignment
Financial Modeling
Some useful references
• Russell & Norvig, Artificial Intelligence: A Modern Approach
• http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html
• Rabiner (1989), Proc. of the IEEE 77: 257
• Qin F (2007), Principles of single-channel kinetic analysis. Methods Mol Biol 403
• Bronson et al (2009), Biophys J 97: 3196
• QuB software suite: www.qub.buffalo.edu
36