![Page 1: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/1.jpg)
MNW2 course
Introduction to Bioinformatics
Lecture 22: Markov models
Centre for Integrative Bioinformatics, FEW/FALW
![Page 2: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/2.jpg)
Problem in biology
• Data and patterns are often not clear cut
• When we want to make a method to recognise a pattern (e.g. a sequence motif), we have to learn from the data (e.g. maybe there are other differences between sequences that have the pattern and those that do not)
• This leads to Data mining and Machine learning
![Page 3: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/3.jpg)
Contents:
• Markov chain models (1st order, higher order and inhomogeneous models; parameter estimation; classification)
• Interpolated Markov models (and back-off models)
• Hidden Markov models (forward, backward and Baum-Welch algorithms; model topologies; applications to gene finding and protein family modeling)
A widely used machine learning approach: Markov models
![Page 4: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/4.jpg)
![Page 5: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/5.jpg)
Markov Chain Models
• A Markov chain model is defined by:
– a set of states
  • some states emit symbols
  • other states (e.g. the begin state) are silent
– a set of transitions with associated probabilities
  • the transitions emanating from a given state define a distribution over the possible next states
![Page 6: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/6.jpg)
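Markov Chain Models
• Given some sequence x of length L, we can ask how probable the sequence is given our model
• For any probabilistic model of sequences, we can write this probability as
$\Pr(x) = \Pr(x_L, x_{L-1}, \dots, x_1) = \Pr(x_L \mid x_{L-1}, \dots, x_1)\,\Pr(x_{L-1} \mid x_{L-2}, \dots, x_1) \cdots \Pr(x_1)$
• Key property of a (1st order) Markov chain: the probability of each $x_i$ depends only on $x_{i-1}$, so
$\Pr(x) = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$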
![Page 7: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/7.jpg)
Markov Chain Models
Pr(cggt) = Pr(c)Pr(g|c)Pr(g|g)Pr(t|g)
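The factorisation above can be sketched in a few lines of Python. The initial and transition probabilities below are hypothetical values chosen only for illustration; they are not taken from the lecture.

```python
def chain_probability(seq, initial, transition):
    """Pr(x) = Pr(x_1) * product over i of Pr(x_i | x_{i-1}) for a 1st order chain."""
    prob = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= transition[prev][cur]
    return prob


# Hypothetical parameters (assumed for this sketch only).
initial = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}
transition = {
    "a": {"a": 0.40, "c": 0.20, "g": 0.20, "t": 0.20},
    "c": {"a": 0.10, "c": 0.30, "g": 0.50, "t": 0.10},
    "g": {"a": 0.20, "c": 0.30, "g": 0.30, "t": 0.20},
    "t": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},
}

# Pr(cggt) = Pr(c) Pr(g|c) Pr(g|g) Pr(t|g) = 0.25 * 0.5 * 0.3 * 0.2
p_cggt = chain_probability("cggt", initial, transition)
```

For long sequences the product underflows quickly, so practical code usually sums log probabilities instead.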
![Page 8: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/8.jpg)
Markov Chain Models
Can also have an end state, allowing the model to represent:
• Sequences of different lengths
• Preferences for sequences ending with particular symbols
![Page 9: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/9.jpg)
Markov Chain Models
The transition parameters can be denoted by $a_{x_{i-1} x_i}$, where
$a_{x_{i-1} x_i} = \Pr(x_i \mid x_{i-1})$
Similarly we can denote the probability of a sequence x as
$\Pr(x) = a_{B x_1} \prod_{i=2}^{L} a_{x_{i-1} x_i}$
where $a_{B x_1}$ represents the transition from the begin state
![Page 10: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/10.jpg)
Example Application
• CpG islands
– CG dinucleotides are rarer in eukaryotic genomes than expected given the independent probabilities of C and G
– but the regions upstream of genes are richer in CG dinucleotides than elsewhere – CpG islands
– useful evidence for finding genes
• Could predict CpG islands with Markov chains
– one to represent CpG islands
– one to represent the rest of the genome
The example uses maximum likelihood and Bayesian statistics to derive data that are fed to an HM model
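One way to use the two chains for classification is a log-odds score: sum, over adjacent pairs, the log of the island transition probability divided by the background one. The sketch below assumes tiny made-up training sequences and add-one pseudocounts; the function names and data are hypothetical, not the lecture's.

```python
import math

ALPHABET = "acgt"


def train_transitions(sequences, pseudocount=1.0):
    """Estimate 1st order transition probabilities from counts plus pseudocounts."""
    counts = {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {
        p: {c: counts[p][c] / sum(counts[p].values()) for c in ALPHABET}
        for p in ALPHABET
    }


def log_odds(seq, island, background):
    """Positive scores favour the island model, negative the background model."""
    return sum(
        math.log(island[p][c] / background[p][c]) for p, c in zip(seq, seq[1:])
    )


# Made-up training data (assumption for illustration only).
island = train_transitions(["cgcgcgcgga", "gcgcgccgcg"])
background = train_transitions(["attatagcat", "ttaacatgta"])

score_cg = log_odds("cgcgcg", island, background)
score_at = log_odds("atatta", island, background)
```

With these toy models a CG-rich window scores above zero and an AT-rich window below zero; real models would be trained on annotated genome sequence.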
![Page 11: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/11.jpg)
Estimating the Model Parameters
• Given some data (e.g. a set of sequences from CpG islands), how can we determine the probability parameters of our model?
• One approach: maximum likelihood estimation
– given a set of data D
– set the parameters θ to maximize Pr(D | θ)
– i.e. make the data D look likely under the model
![Page 12: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/12.jpg)
Maximum Likelihood Estimation
• Suppose we want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t)
• And we’re given the sequences:
accgcgctta
gcttagtgac
tagccgttac
• Then the maximum likelihood estimates are:
Pr(a) = 6/30 = 0.2
Pr(c) = 9/30 = 0.3
Pr(g) = 7/30 = 0.233
Pr(t) = 8/30 = 0.267
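The estimates above are simply relative frequencies, which a short Python sketch can reproduce from the three training sequences (the function name is an assumption made for this example):

```python
from collections import Counter


def mle_base_frequencies(sequences):
    """Maximum likelihood estimate of Pr(base): its count divided by the total."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {base: counts[base] / total for base in "acgt"}


training = ["accgcgctta", "gcttagtgac", "tagccgttac"]
freqs = mle_base_frequencies(training)
# freqs["a"] is 6/30, freqs["c"] is 9/30, freqs["g"] is 7/30, freqs["t"] is 8/30
```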
![Page 13: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/13.jpg)
![Page 14: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/14.jpg)
![Page 15: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/15.jpg)
![Page 16: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/16.jpg)
![Page 17: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/17.jpg)
These data are derived from genome sequences
![Page 18: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/18.jpg)
![Page 19: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/19.jpg)
![Page 20: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/20.jpg)
![Page 21: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/21.jpg)
Higher Order Markov Chains
• An nth order Markov chain over some alphabet is equivalent to a first order Markov chain over the alphabet of n-tuples
• Example: a 2nd order Markov model for DNA can be treated as a 1st order Markov model over the alphabet: AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, and TT (i.e. all possible dinucleotides)
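As a small illustration of this equivalence, a sequence can be re-expressed as overlapping n-tuples, which then serve as the states of the first order chain. The helper name below is hypothetical.

```python
from itertools import product


def to_tuple_states(seq, n):
    """Rewrite seq as its overlapping n-tuples, the states of the 1st order chain."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


# The 16-letter alphabet of a 2nd order DNA model.
pair_alphabet = ["".join(p) for p in product("ACGT", repeat=2)]

states = to_tuple_states("ACGGT", 2)
# A step from "AC" to "CG" in this chain corresponds to Pr(G | A, C) in the
# 2nd order model; only tuples that overlap in this way can follow each other.
```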
![Page 22: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/22.jpg)
A Fifth Order Markov Chain
![Page 23: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/23.jpg)
Inhomogeneous Markov Chains
• In the Markov chain models we have considered so far, the probabilities do not depend on where we are in a given sequence
• In an inhomogeneous Markov model, we can have different distributions at different positions in the sequence
• Consider modeling codons in protein coding regions
![Page 24: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/24.jpg)
Inhomogeneous Markov Chains
![Page 25: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/25.jpg)
A Fifth Order Inhomogeneous Markov Chain
![Page 26: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/26.jpg)
Selecting the Order of a Markov Chain Model
• Higher order models remember more “history”
• Additional history can have predictive value
• Example:
– predict the next word in this sentence fragment “…finish __” (up, it, first, last, …?)
– now predict it given more history
  • “Fast guys finish __”
![Page 27: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/27.jpg)
Selecting the Order of a Markov Chain Model
• However, the number of parameters we need to estimate grows exponentially with the order
– for modeling DNA we need $4^{n+1}$ parameters for an nth order model, with n ≥ 5 normally
• The higher the order, the less reliable we can expect our parameter estimates to be
– estimating the parameters of a 2nd order homogeneous Markov chain from the complete genome of E. coli, we would see each word > 72,000 times on average
– estimating the parameters of an 8th order chain, we would see each word ~ 5 times on average
![Page 28: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/28.jpg)
Interpolated Markov Models
• The IMM idea: manage this trade-off by interpolating among models of various orders
• Simple linear interpolation:
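One common way to write simple linear interpolation is sketched below. The weights $\lambda_j$ and their normalisation are assumptions of this sketch rather than notation confirmed by the slide:

$$\Pr(x_i \mid x_{i-n}, \dots, x_{i-1}) = \lambda_0 \Pr(x_i) + \lambda_1 \Pr(x_i \mid x_{i-1}) + \dots + \lambda_n \Pr(x_i \mid x_{i-n}, \dots, x_{i-1}), \qquad \sum_{j=0}^{n} \lambda_j = 1$$

Each term is the estimate from a chain of a different order, so well-estimated low order terms can compensate for sparse high order counts.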
![Page 29: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/29.jpg)
Interpolated Markov Models
• We can make the weights depend on the history
– for a given order, we may have significantly more data to estimate some words than others
• General linear interpolation
![Page 30: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/30.jpg)
Gene Finding: Search by Content
• Encoding a protein affects the statistical properties of a DNA sequence
– some amino acids are used more frequently than others (Leu more popular than Trp)
– different numbers of codons for different amino acids (Leu has 6, Trp has 1)
– for a given amino acid, usually one codon is used more frequently than others
• This is termed codon preference
• Codon preferences vary by species
![Page 31: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/31.jpg)
Codon Preference in E. coli
AA   codon  /1000
----------------------
Gly  GGG     1.89
Gly  GGA     0.44
Gly  GGU    52.99
Gly  GGC    34.55

Glu  GAG    15.68
Glu  GAA    57.20

Asp  GAU    21.63
Asp  GAC    43.26
![Page 32: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/32.jpg)
Search by Content
• Common way to search by content
– build Markov models of coding & noncoding regions
– apply models to ORFs (Open Reading Frames) or fixed-sized windows of sequence
• GeneMark [Borodovsky et al.]
– popular system for identifying genes in bacterial genomes
– uses 5th order inhomogeneous Markov chain models
![Page 33: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/33.jpg)
The GLIMMER System
• Salzberg et al., 1998
• System for identifying genes in bacterial genomes
• Uses 8th order, inhomogeneous, interpolated Markov chain models
![Page 34: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/34.jpg)
IMMs in GLIMMER
• How does GLIMMER determine the λ values?
• First, let us express the IMM probability calculation recursively:
![Page 35: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/35.jpg)
IMMs in GLIMMER
• If we haven’t seen $x_{i-1} \dots x_{i-n}$ more than 400 times, then compare the counts for the following:
• Use a statistical test (χ²) to get a value d indicating our confidence that the distributions represented by the two sets of counts are different
![Page 36: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/36.jpg)
IMMs in GLIMMER
χ² score when comparing nth-order with (n-1)th-order Markov model (preceding slide)
![Page 37: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/37.jpg)
The GLIMMER method
• 8th order IMM vs. 5th order Markov model
• Trained on 1168 genes (ORFs really)
• Tested on 1717 annotated (more or less known) genes
![Page 38: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/38.jpg)
![Page 39: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/39.jpg)
![Page 40: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/40.jpg)
Hidden Markov models (HMMs)
Given say a T in our input sequence, which state emitted it?
![Page 41: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/41.jpg)
Hidden Markov models (HMMs)
Hidden State
• We will distinguish between the observed parts of a problem and the hidden parts
• In the Markov models we have considered previously, it is clear which state accounts for each part of the observed sequence
• In the model above (preceding slide), there are multiple states that could account for each part of the observed sequence
– this is the hidden part of the problem
– states are decoupled from sequence symbols
![Page 42: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/42.jpg)
HMM-based homology searching
HMM for ungapped alignment…
Transition probabilities and Emission probabilities
Gapped HMMs also have insertion and deletion states (next slide)
![Page 43: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/43.jpg)
Profile HMM: m = match state, I = insert state, d = delete state; go from left to right. I and m states output amino acids; d states are “silent”.
[Figure: profile HMM with Start and End states, match states m0 to m5, insert states I0 to I4 and delete states d1 to d4]
Model for alignment with insertions and deletions
![Page 44: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/44.jpg)
HMM-based homology searching
• Most widely used HMM-based profile searching tools currently are SAM-T99 (Karplus et al., 1998) and HMMER2 (Eddy, 1998)
• formal probabilistic basis and consistent theory behind gap and insertion scores
• HMMs good for profile searches, bad for alignment (due to parametrisation of the models)
• HMMs are slow
![Page 45: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/45.jpg)
Homology-derived Secondary Structure of Proteins
(HSSP) Sander & Schneider, 1991
It’s all about trying to push “don’t know region” down…
![Page 46: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/46.jpg)
The Parameters of an HMM
![Page 47: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/47.jpg)
HMM for Eukaryotic Gene Finding
Figure from A. Krogh, An Introduction to Hidden Markov Models for Biological Sequences
![Page 48: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/48.jpg)
A Simple HMM
![Page 49: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/49.jpg)
Three Important Questions
• How likely is a given sequence?
the Forward algorithm
• What is the most probable “path” for generating a given sequence?
the Viterbi algorithm
• How can we learn the HMM parameters given a set of sequences?
the Forward-Backward (Baum-Welch) algorithm
![Page 50: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/50.jpg)
How Likely is a Given Sequence?
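• The probability that the path $\pi_0 \dots \pi_N$ is taken and the sequence $x_1 \dots x_L$ is generated:
$\Pr(x_1 \dots x_L, \pi_0 \dots \pi_N) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$
• (assuming begin/end are the only silent states on path)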
![Page 51: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/51.jpg)
How Likely is a Given Sequence?
![Page 52: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/52.jpg)
How Likely is a Given Sequence?
The probability over all paths π is:
$\Pr(x) = \sum_{\pi} \Pr(x, \pi)$
• but the number of paths can be exponential in the length of the sequence
• the Forward algorithm enables us to compute this efficiently
![Page 53: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/53.jpg)
How Likely is a Given Sequence: The Forward Algorithm
• Define $f_k(i)$ to be the probability of being in state k having observed the first i characters of x
• We want to compute $f_N(L)$, the probability of being in the end state having observed all of x
• We can define this recursively
![Page 54: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/54.jpg)
How Likely is a Given Sequence:
![Page 55: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/55.jpg)
The forward algorithm
• Initialisation:
$f_0(0) = 1$ (start): probability that we’re in the start state and have observed 0 characters from the sequence
$f_k(0) = 0$ (other silent states k)
• Recursion:
$f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$ (emitting states)
$f_l(i) = \sum_k f_k(i)\, a_{kl}$ (silent states)
• Termination:
$\Pr(x) = \Pr(x_1 \dots x_L) = f_N(L) = \sum_k f_k(L)\, a_{kN}$: probability that we are in the end state and have observed the entire sequence
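The recursion can be sketched in Python for the simple case where begin and end are the only silent states. The two-state model below (a CG-rich state "H" and an AT-rich state "L") and all of its probabilities are hypothetical, introduced only to make the example runnable.

```python
def forward(x, states, start, trans, emit, end):
    """Return Pr(x) by the forward algorithm; x must be non-empty.

    start[k]    transition probability from the begin state to state k
    trans[k][l] transition probability a_kl between emitting states
    emit[k][c]  emission probability e_k(c)
    end[k]      transition probability from state k to the end state
    """
    # f[k] holds f_k(i) for the current position i.
    f = {k: start[k] * emit[k][x[0]] for k in states}
    for symbol in x[1:]:
        f = {
            l: emit[l][symbol] * sum(f[k] * trans[k][l] for k in states)
            for l in states
        }
    # Termination: move from every state into the end state.
    return sum(f[k] * end[k] for k in states)


# Hypothetical toy model (assumed values).
states = ["H", "L"]
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.5, "L": 0.4}, "L": {"H": 0.3, "L": 0.6}}
end = {"H": 0.1, "L": 0.1}
emit = {
    "H": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "L": {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3},
}

p_x = forward("ggca", states, start, trans, emit, end)
```

The cost is proportional to L times the square of the number of states, compared with an exponential number of explicit paths.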
![Page 56: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/56.jpg)
Forward algorithm example
![Page 57: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/57.jpg)
Three Important Questions
• How likely is a given sequence?
• What is the most probable “path” for generating a given sequence?
• How can we learn the HMM parameters given a set of sequences?
![Page 58: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/58.jpg)
Finding the Most Probable Path: The Viterbi Algorithm
• Define $v_k(i)$ to be the probability of the most probable path accounting for the first i characters of x and ending in state k
• We want to compute $v_N(L)$, the probability of the most probable path accounting for all of the sequence and ending in the end state
• Can be defined recursively
• Can use DP to find $v_N(L)$ efficiently
![Page 59: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/59.jpg)
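Finding the Most Probable Path: The Viterbi Algorithm
Initialisation:
$v_0(0) = 1$ (start), $v_k(0) = 0$ (non-silent states)
Recursion for emitting states (i = 1…L):
$v_l(i) = e_l(x_i) \max_k \left[ v_k(i-1)\, a_{kl} \right]$
Recursion for silent states:
$v_l(i) = \max_k \left[ v_k(i)\, a_{kl} \right]$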
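A minimal Python sketch of Viterbi with traceback follows, again assuming begin and end are the only silent states. The two-state model and its probabilities are hypothetical, chosen only so the example runs.

```python
def viterbi(x, states, start, trans, emit, end):
    """Return (probability, path) of the most probable state path for non-empty x."""
    v = {k: start[k] * emit[k][x[0]] for k in states}
    backpointers = []
    for symbol in x[1:]:
        pointer, nxt = {}, {}
        for l in states:
            # Best predecessor k for state l at this position.
            best = max(states, key=lambda k: v[k] * trans[k][l])
            pointer[l] = best
            nxt[l] = emit[l][symbol] * v[best] * trans[best][l]
        backpointers.append(pointer)
        v = nxt
    # Termination: choose the best state from which to enter the end state.
    last = max(states, key=lambda k: v[k] * end[k])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return v[last] * end[last], path


# Hypothetical toy model (assumed values).
states = ["H", "L"]
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.5, "L": 0.4}, "L": {"H": 0.3, "L": 0.6}}
end = {"H": 0.1, "L": 0.1}
emit = {
    "H": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "L": {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3},
}

best_prob, best_path = viterbi("ggcatt", states, start, trans, emit, end)
```

The only change from the forward algorithm is that the sum over predecessors becomes a max, plus the pointers needed to recover the path.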
![Page 60: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/60.jpg)
Finding the Most Probable Path: The Viterbi Algorithm
![Page 61: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/61.jpg)
Three Important Questions
• How likely is a given sequence? (clustering)
• What is the most probable “path” for generating a given sequence? (alignment)
• How can we learn the HMM parameters given a set of sequences?
![Page 62: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/62.jpg)
The Learning Task
• Given:
– a model
– a set of sequences (the training set)
• Do:
– find the most likely parameters to explain the training sequences
• The goal is to find a model that generalizes well to sequences we haven’t seen before
![Page 63: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/63.jpg)
Learning Parameters
• If we know the state path for each training sequence, learning the model parameters is simple
– no hidden state during training
– count how often each parameter is used
– normalize/smooth to get probabilities
– process just like it was for Markov chain models
• If we don’t know the path for each training sequence, how can we determine the counts?
– key insight: estimate the counts by considering every path weighted by its probability
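For the known-path case, counting and normalizing can be sketched as below. The labelled training pairs, the add-one pseudocount and the function name are assumptions for illustration, and begin/end transitions are left out for brevity.

```python
def estimate_from_labeled_paths(examples, states, alphabet, pseudocount=1.0):
    """Estimate a_kl and e_k(c) from (sequence, path) pairs by counting."""
    trans = {k: {l: pseudocount for l in states} for k in states}
    emit = {k: {c: pseudocount for c in alphabet} for k in states}
    for seq, path in examples:
        for symbol, state in zip(seq, path):
            emit[state][symbol] += 1
        for k, l in zip(path, path[1:]):
            trans[k][l] += 1

    def normalize(row):
        total = sum(row.values())
        return {key: value / total for key, value in row.items()}

    return (
        {k: normalize(trans[k]) for k in states},
        {k: normalize(emit[k]) for k in states},
    )


# Hypothetical labelled data: each path character is the state that emitted
# the sequence character at the same position.
examples = [("ggcatt", "HHHLLL"), ("atgcgc", "LLHHHH")]
trans_est, emit_est = estimate_from_labeled_paths(examples, "HL", "acgt")
```

Baum-Welch replaces these hard counts with expected counts computed over all paths.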
![Page 64: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/64.jpg)
Learning Parameters: The Baum-Welch Algorithm
• An EM (expectation maximization) approach, a forward-backward algorithm
• Algorithm sketch:
– initialize parameters of model
– iterate until convergence
  • calculate the expected number of times each transition or emission is used
  • adjust the parameters to maximize the likelihood of these expected values
![Page 65: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/65.jpg)
The Expectation step
![Page 66: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/66.jpg)
The Expectation step
![Page 67: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/67.jpg)
The Expectation step
![Page 68: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/68.jpg)
The Expectation step
![Page 69: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/69.jpg)
The Expectation step
• First, we need to know the probability of the i th symbol being produced by state k, given sequence x:
$\Pr(\pi_i = k \mid x)$
•Given this we can compute our expected counts for state transitions, character emissions
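This posterior can be obtained by combining forward and backward values as $\Pr(\pi_i = k \mid x) = f_k(i)\, b_k(i) / \Pr(x)$, where $b_k(i)$ is the probability of generating the rest of the sequence given state k at position i. The Python sketch below assumes begin and end are the only silent states and uses a hypothetical two-state model.

```python
def posterior_state_probs(x, states, start, trans, emit, end):
    """Return, for each position of non-empty x, a dict of Pr(state = k | x)."""
    n = len(x)
    # Forward table: f[i][k] covers symbols up to and including position i.
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for i in range(1, n):
        f.append({
            l: emit[l][x[i]] * sum(f[i - 1][k] * trans[k][l] for k in states)
            for l in states
        })
    # Backward table: b[i][k] covers symbols after position i and the end step.
    b = [None] * n
    b[n - 1] = {k: end[k] for k in states}
    for i in range(n - 2, -1, -1):
        b[i] = {
            k: sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l] for l in states)
            for k in states
        }
    p_x = sum(f[n - 1][k] * end[k] for k in states)
    return [{k: f[i][k] * b[i][k] / p_x for k in states} for i in range(n)]


# Hypothetical toy model (assumed values).
states = ["H", "L"]
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.5, "L": 0.4}, "L": {"H": 0.3, "L": 0.6}}
end = {"H": 0.1, "L": 0.1}
emit = {
    "H": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "L": {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3},
}

posteriors = posterior_state_probs("ggcatt", states, start, trans, emit, end)
```

Summing these posteriors over positions gives the expected emission counts used in the maximization step.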
![Page 70: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/70.jpg)
The Expectation step
![Page 71: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/71.jpg)
The Backward Algorithm
![Page 72: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/72.jpg)
The Expectation step
![Page 73: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/73.jpg)
The Expectation step
![Page 74: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/74.jpg)
The Expectation step
![Page 75: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/75.jpg)
The Maximization step
![Page 76: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/76.jpg)
The Maximization step
![Page 77: MNW2 course Introduction to Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062809/568157f2550346895dc56f34/html5/thumbnails/77.jpg)
The Baum-Welch Algorithm
• Initialize parameters of model
• Iterate until convergence
– calculate the expected number of times each transition or emission is used
– adjust the parameters to maximize the likelihood of these expected values
• This algorithm will converge to a local maximum (in the likelihood of the data given the model)
• Usually in a fairly small number of iterations