How to find foreign genes? Markov Models
TRANSCRIPT
![Page 1: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/1.jpg)
How to find foreign genes? Markov Models
AAAA: 10%
AAAC: 15%
AAAG: 40%
AAAT: 35%
AAA AAC AAG AAT ACA ... TTG TTT
Training Set
Building the model
![Page 2: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/2.jpg)
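How to find foreign genes? Markov Models

| Preceding 3 bases | A | C | G | T |
|---|---|---|---|---|
| AAA | 0.10 | 0.15 | 0.40 | 0.35 |
| AAC | 0.25 | 0.45 | 0.25 | 0.05 |
| AAG | 0.25 | 0.20 | 0.30 | 0.25 |
| AAT | 0.25 | 0.20 | 0.30 | 0.25 |
| ACA | 0.15 | 0.20 | 0.25 | 0.40 |
| ... | | | | |
| TTG | 0.20 | 0.50 | 0.05 | 0.25 |
| TTT | 0.10 | 0.55 | 0.25 | 0.10 |

Candidate gene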
AAAACAA…
0.10
3rd order Markov model
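As a minimal sketch of how such a table is used, the snippet below scores the candidate prefix AAAACAA by looking up each base given the three bases before it. Only the rows visible on the slide are included; the names `TABLE` and `conditional_probs` are illustrative, and the probability of the first three bases is deliberately left out because the slide does not give it.

```python
# Rows of the 3rd-order table shown on the slide (the remaining rows are elided there).
BASES = "ACGT"
TABLE = {
    "AAA": [0.10, 0.15, 0.40, 0.35],
    "AAC": [0.25, 0.45, 0.25, 0.05],
    "AAG": [0.25, 0.20, 0.30, 0.25],
    "AAT": [0.25, 0.20, 0.30, 0.25],
    "ACA": [0.15, 0.20, 0.25, 0.40],
    "TTG": [0.20, 0.50, 0.05, 0.25],
    "TTT": [0.10, 0.55, 0.25, 0.10],
}

def conditional_probs(seq, order=3):
    """Return P(base | previous `order` bases) for every position from `order` on."""
    probs = []
    for i in range(order, len(seq)):
        context = seq[i - order:i]
        probs.append(TABLE[context][BASES.index(seq[i])])
    return probs

probs = conditional_probs("AAAACAA")

# Multiply the per-position probabilities to score the whole prefix.
score = 1.0
for p in probs:
    score *= p

print(probs)  # [0.1, 0.15, 0.25, 0.15]
print(score)
```

The first value, 0.10, is the one highlighted on the slide: the probability of an A following AAA.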
![Page 3: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/3.jpg)
Markov Chains
A traffic light considered as a sequence of states
A trivial Markov chain – the transition probability between the states is always 1
$P_{gy} = 1$
$P_{yr} = 1$
$P_{rg} = 1$
![Page 4: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/4.jpg)
Markov Chains
A traffic light considered as a sequence of states
If we watch our traffic light, it will emit a string of states.
In the case of a simple Markov model, the state labels (e.g. green, red, yellow) are the observable outputs of the process.
![Page 5: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/5.jpg)
Markov Chains
An occasionally malfunctioning traffic light!!
The Markov property is that the probability of observing a given next state depends only on the current state!
$P_{gy} = 1$
$P_{yr} = 0.9$
$P_{rg} = 0.85$
$P_{ry} = 0.15$
$P_{yg} = 0.10$
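To make the "string of states" concrete, here is a small sketch that simulates the malfunctioning light using the transition probabilities above. The dictionary layout, the function name `simulate`, the starting state, and the fixed seed are all assumptions made for illustration.

```python
import random

# Transition probabilities from the slide: TRANSITIONS[s][t] = P(next = t | current = s).
TRANSITIONS = {
    "green":  {"yellow": 1.0},
    "yellow": {"red": 0.9, "green": 0.10},
    "red":    {"green": 0.85, "yellow": 0.15},
}

def simulate(start, steps, rng):
    """Emit a list of states; each step looks only at the current state."""
    states = [start]
    for _ in range(steps):
        options = TRANSITIONS[states[-1]]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        states.append(nxt)
    return states

rng = random.Random(0)  # fixed seed so the run is repeatable (an arbitrary choice)
chain = simulate("green", 10, rng)  # assumption: the light starts on green
print(chain)
```

Note that `simulate` never inspects anything but `states[-1]`, which is exactly the Markov property stated above.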
![Page 6: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/6.jpg)
Markov Chains
The Markov Property
$a_{st} = P(x_i = t \mid x_{i-1} = s)$
English Translation:
The transition probability $a_{st}$ from state s to state t…
…is equal to the probability that the ith state was t…
given that
the immediately preceding state ($x_{i-1}$) was s
This is a form of conditional probability
![Page 7: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/7.jpg)
Markov Chain
An occasionally malfunctioning traffic light!!
Now we can consider the probability of an observed sequence!
![Page 8: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/8.jpg)
Markov Chains
What is the probability of a chain of events x?
$P(x) = P(x_L, x_{L-1}, \ldots, x_1)$
English Translation:
The probability of observing the sequence of states x...
...is equal to the probability that the Lth state ($x_L$) was whatever it was,
AND the (L-1)th state ($x_{L-1}$) was whatever it was,
AND so on, down to the first state ($x_1$).
This is a form of joint probability
![Page 9: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/9.jpg)
Markov Chains
What is the probability of a chain of events x?
$P(x) = P(x_L, x_{L-1}, \ldots, x_1)$
$= P(x_L \mid x_{L-1}, \ldots, x_1) \, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
This is because $P(X, Y) = P(X \mid Y) \, P(Y)$
English Translation:
The probability of events X AND Y happening is equal to the probability of X happening given that Y has already
happened, times the probability of event Y
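As a short worked case, applying this rule twice to a chain of length three (the length is chosen only for illustration) gives:

$$P(x_3, x_2, x_1) = P(x_3 \mid x_2, x_1) \, P(x_2, x_1) = P(x_3 \mid x_2, x_1) \, P(x_2 \mid x_1) \, P(x_1)$$

Repeating the same step for each additional state produces the general product for a chain of length L.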
![Page 10: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/10.jpg)
Markov Chains
What is the probability of a chain of events x?
$P(x) = P(x_L \mid x_{L-1}, \ldots, x_1) \, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
But remember, the key property of a Markov chain is that the probability of symbol $x_i$ depends ONLY on
the value of the preceding symbol $x_{i-1}$!! Therefore:
$P(x) = P(x_L \mid x_{L-1}) \, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1) \, P(x_1)$
$P(x) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$
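The product formula can be sketched in a few lines for the malfunctioning traffic light. The transition values come from the earlier slide; the uniform starting distribution `INITIAL` and the names `A` and `sequence_probability` are assumptions, since the slides do not specify P(x1).

```python
# Transition probabilities of the malfunctioning traffic light (from the slides).
A = {
    ("green", "yellow"): 1.0,
    ("yellow", "red"): 0.9,
    ("yellow", "green"): 0.10,
    ("red", "green"): 0.85,
    ("red", "yellow"): 0.15,
}

def sequence_probability(x, initial):
    """P(x) = P(x1) * product over i = 2..L of a[x_(i-1), x_i]."""
    p = initial[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= A.get((prev, cur), 0.0)  # transitions not listed have probability 0
    return p

# Assumption: the light is equally likely to start in any state.
INITIAL = {"green": 1 / 3, "yellow": 1 / 3, "red": 1 / 3}

x = ["green", "yellow", "red", "green"]
p_x = sequence_probability(x, INITIAL)
print(p_x)
```

For this sequence the product is (1/3) × 1 × 0.9 × 0.85, while any sequence containing a step the light cannot take, such as green directly to red, gets probability 0.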
![Page 11: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/11.jpg)
Markov Chains
How about nucleic acid sequences?
There is no reason why nucleic acid sequences found in an organism cannot be modeled using Markov chains.
[Figure: state diagram over the states A, C, G, T]
![Page 12: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/12.jpg)
Markov Model
What do we need to probabilistically model DNA sequences?
[Figure: state diagram over the states A, C, G, T]
States
Transition probabilities
The states are the same for all organisms, so the transition probabilities are the model parameters we need to estimate
![Page 13: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/13.jpg)
Parameter estimation
AAAA: 10%
AAAC: 15%
AAAG: 40%
AAAT: 35%
AAA AAC AAG AAT ACA ... TTG TTT
Training Set
Building the Markov Model
This is a maximum likelihood approach to parameter estimation. Such procedures
maximize the overall probability of the training set data.
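For a Markov chain, the maximum likelihood estimate of each transition probability is simply the observed relative frequency in the training set. The sketch below counts, for every 3-base context, how often each base follows it. The function name `train_markov_model` and the toy training sequences are made up for illustration, and no pseudocounts are added, so contexts never seen in training are absent from the model.

```python
from collections import Counter, defaultdict

def train_markov_model(sequences, order=3, alphabet="ACGT"):
    """Estimate P(next base | previous `order` bases) by relative frequency."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, ctr in counts.items():
        total = sum(ctr.values())
        # Bases never seen after this context get probability 0.
        model[context] = {b: ctr[b] / total for b in alphabet}
    return model

# Toy training set, invented for this example only.
training_set = ["AAAAGAAAT", "AAAGAAAC"]
model = train_markov_model(training_set)
print(model["AAA"])
```

In this toy set the context AAA is followed by G twice out of five occurrences, so the estimate for G after AAA is 2/5.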
![Page 14: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/14.jpg)
Markov Model
Which model best explains a newly observed sequence?
Each organism will have different transition probability parameters, so you can ask “was the sequence more likely to be generated by model A or model B?”
[Figure: two state diagrams over the states A, C, G, T, one labeled Organism A and one labeled Organism B]
![Page 15: How to find foreign genes? Markov Models AAAA: 10% AAAC: 15% AAAG: 40% AAAT: 35% AAA AAC AAG AAT ACA... TTG TTT Training Set Building the model](https://reader036.vdocuments.us/reader036/viewer/2022082816/56649d1f5503460f949f2b34/html5/thumbnails/15.jpg)
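Markov Model
Which model best explains a newly observed sequence?
A commonly used metric for discrimination using Markov chains is the log-odds ratio:
$$S(x) = \log \frac{P(x \mid \text{model A})}{P(x \mid \text{model B})} = \sum_{i=1}^{L} \log \frac{a^{A}_{x_{i-1} x_i}}{a^{B}_{x_{i-1} x_i}}$$
Because the sum starts at i = 1, the first term presumably uses a begin state $x_0$, so that $a_{x_0 x_1}$ plays the role of $P(x_1)$ under each model.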
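A minimal sketch of the log-odds score for two first-order models follows. All parameter values here are hypothetical and not taken from any real organism; `log_odds`, `model_a`, and `model_b` are illustrative names, and the starting probabilities are passed in separately instead of using a begin state. Every probability must be nonzero for the logarithm to be defined.

```python
import math

def log_odds(seq, model_a, model_b, initial_a, initial_b):
    """S(x) = log P(x | A) - log P(x | B) for two first-order Markov chains."""
    score = math.log(initial_a[seq[0]] / initial_b[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        score += math.log(model_a[prev][cur] / model_b[prev][cur])
    return score

# Hypothetical parameters, invented for this example.
BASES = "ACGT"
uniform = {b: 0.25 for b in BASES}
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
model_a = {s: dict(gc_rich) for s in BASES}  # favors moving to C or G
model_b = {s: dict(uniform) for s in BASES}  # every transition equally likely

s_gc = log_odds("GCGCGC", model_a, model_b, uniform, uniform)
s_at = log_odds("ATATAT", model_a, model_b, uniform, uniform)
print(s_gc > 0, s_at < 0)  # True True
```

A positive score means the sequence is better explained by model A, a negative score by model B. Summing logarithms instead of multiplying probabilities also avoids numerical underflow on long sequences.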