Download - Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU
![Page 1: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/1.jpg)
Lab3: Lab3: Bayesian phylogenetic Bayesian phylogenetic Inference and MCMCInference and MCMC
Department of Department of Bioinformatics & Bioinformatics & Biostatistics, SJTUBiostatistics, SJTU
![Page 2: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/2.jpg)
Topics
Phylogenetics Bayesian inference and MCMC: overview Bayesian model testing MrBayesian tutorial and application
– Nexus file– Configuration of the process– How to execute the process– analyzing the results
![Page 3: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/3.jpg)
Phylogenetics
Greek: phylum + genesis
Broad definition: historical term, how the species evolve and fall
Narrow definition: infer relationship of the extant
We prefer the narrow one
![Page 4: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/4.jpg)
Infer relationships among three species:
Outgroup:
![Page 5: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/5.jpg)
Three possible trees (topologies):
A
B
C
![Page 6: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/6.jpg)
A B C
Prior distribution
prob
abili
ty 1.0
Posterior distribution
prob
abili
ty 1.0
Data (observations)
![Page 7: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/7.jpg)
What is needed for inference?
A probabilistic model of evolution Prior distribution on the parameters of
the model Data A method for calculating the posterior
distribution for the model, prior distribution and data
![Page 8: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/8.jpg)
What is needed for inference?
A probabilistic model of evolution Prior distribution on the parameters of
the model Data A method for calculating the posterior
distribution for the model, prior distribution and data
![Page 9: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/9.jpg)
Model: topology + branch lengths
Parameters
topology )(branch lengths )( iv
A
B
3vC
D
2v
1v4v
5v (expected amount of change)
),( v
![Page 10: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/10.jpg)
Model: molecular evolution
Parameters
instantaneous rate matrix(Jukes-Cantor)
111][
111][
111][
111][
][][][][
T
G
C
A
TGCA
Q
![Page 11: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/11.jpg)
What is needed for inference?
A probabilistic model of evolution Prior distribution on the parameters of
the model Data A method for calculating the posterior
distribution for the model, prior distribution and data
![Page 12: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/12.jpg)
Priors on parameters
Topology– All unique topologies have equal probabilities
Branch lengths– Exponential prior puts more weight on small branch
lengths; appr. uniform on transition probabilities
![Page 13: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/13.jpg)
What is needed for inference?
A probabilistic model of evolution Prior distribution on the parameters of
the model Data A method for calculating the posterior
distribution for the model, prior distribution and data
![Page 14: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/14.jpg)
Data
X The data (alignment)
Taxon Characters
A ACG TTA TTA AAT TGT CCT CTT TTC AGA
B ACG TGT TTC GAT CGT CCT CTT TTC AGA
C ACG TGT TTA GAC CGA CCT CGG TTA AGG
D ACA GGA TTA GAT CGT CCG CTT TTC AGA
![Page 15: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/15.jpg)
What is needed for inference?
A probabilistic model of evolution Prior distribution on the parameters of
the model Data A method for calculating the posterior
distribution for the model, prior distribution and data
![Page 16: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/16.jpg)
Bayes’ Theorem
dXlp
XlpXf
)|()(
)|()()|(
Posteriordistribution
Prior distribution Likelihood function
Normalizing Constant
![Page 17: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/17.jpg)
tree 1 tree 2 tree 3
)|( Xf
Posterior probability distribution
Parameter space (high-dimension 1d)
Post
eri
or
pro
bab
ility
![Page 18: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/18.jpg)
tree 1 tree 2 tree 3
20% 48% 32%
We can focus on any parameter of interest (there are no nuisance parameters) by marginalizing the posterior over the other parameters (integrating out the uncertainty in the other parameters)
(Percentages denote marginal probability distribution on trees)
![Page 19: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/19.jpg)
32.048.020.0
38.014.019.005.0
33.006.022.005.0
29.012.007.010.0
3
2
1
321
joint probabilities
marginal probabilities
Marginal probabilities
trees
bra
nch
len
gth
vect
ors
![Page 20: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/20.jpg)
How to estimate the posterior?
Analytical calculation? Impossible!!! except for very simple
examples
Random sampling of parameter space? Impossible too!!! computational infeasible
Dependent sampling using MCMC technique? Yes, you got it!
![Page 21: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/21.jpg)
Metropolis-Hastings Sampling
* * *
*
( ) ( | ) ( | )min 1, * *
( ) ( | ) ( | )
p l X qr
p l X q
Assume that the current state has parameter values
Consider a move to a state with parameter values according to proposal density q
Accept the move with probability
(prior ratio x likelihood ratio x proposal ratio)
![Page 22: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/22.jpg)
Sampling Principles
For a complex model, you typically have many “proposal” or “update” mechanisms (“moves”)
Each mechanism changes one or a few parameters
At each step (generation of the chain) one mechanism is chosen randomly according to some predetermined probability distribution
It makes sense to try changing ‘more difficult’ parameters (such as topology in a phylogenetic analysis) more often
![Page 23: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/23.jpg)
Analysis ofAnalysis of 85 insect taxa 85 insect taxa based on 18S rDNAbased on 18S rDNA
Application example
![Page 24: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/24.jpg)
Model parameters 1
General Time Reversible (GTR)substitution model
GTGCTCATA
GTTCGCAGA
CTTCGGACA
ATTAGGACC
rrr
rrr
rrr
rrr
Q
A
B
3vC
D
2v
1v4v
5v
topology branch lengths
n ,...,, 21
![Page 25: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/25.jpg)
Model parameters 2
Gamma-shaped rate variation across sites
![Page 26: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/26.jpg)
Priors on parameters
Topology– all unique topologies have equal probability
Branch lengths– exponential prior (exp(10) means that expected
mean is 0.1 (1/10)) State Frequencies
– Dirichlet prior: Dir(1,1,1,1) Rates (revmat)
– Dirichlet prior: Dir(1,1,1,1,1,1) Shape of gamma-distribution of rates
– Uniform prior: Uni(0,100)
( , , , )A C G T
( , , , , , )AC AG AT CG CT GTr r r r r r
![Page 27: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/27.jpg)
![Page 28: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/28.jpg)
burn-in
stationary phase sampled with thinning(rapid mixing essential)
![Page 29: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/29.jpg)
Majority rule consensus tree from sampled treesFrequencies represent the posterior probability of the clades
Probability of clade being true given data, model, and prior(and given that the MCMC sample is OK)
![Page 30: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/30.jpg)
Mean and 95% credibility interval for model parameters
![Page 31: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/31.jpg)
MrBayes tutorial
Introduction/examples
![Page 32: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/32.jpg)
Nexus format input file
Input: nexus format; accurately, nexus(ish)
![Page 33: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/33.jpg)
…
…
![Page 34: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/34.jpg)
![Page 35: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/35.jpg)
![Page 36: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/36.jpg)
Running MrBayes
① Use execute to bring data in a Nexus file into MrBayes
② Set the model and priors using lset and prset③ Run the chain using mcmc
④ Summarize the parameter samples using sump⑤ Summarize the tree samples using sumt
Note that MrBayes 3.1 runs two independent analyses by default
![Page 37: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/37.jpg)
Convergence Diagnostics
By default performs two independent analyses starting from different random trees (mcmc nruns=2)
Average standard deviation of clade frequencies calculated and presented during the run (mcmc mcmcdiagn=yes diagnfreq=1000) and written to file (.mcmc)
Standard deviation of each clade frequency and potential scale reduction for branch lengths calculated with sumt
Potential scale reduction calculated for all substitution model parameters with sump
![Page 38: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/38.jpg)
Bayes’ theorem
)(
)|()(
)|()(
)|()()|(
Xf
Xff
dXff
XffXf
Marginal likelihood (of the model)
)|(
),|()|(),|(
MXf
MXfMfMXf
We have implicitly conditioned on a model:
![Page 39: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/39.jpg)
Bayesian Model Choice
00
11
X|MfMf
X|MfMf
Posterior model odds:
0
110 X|Mf
X|MfB
Bayes factor:
![Page 40: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/40.jpg)
Bayesian Model Choice
The normalizing constant in Bayes’ theorem, the marginal probability of the model, f(X) or f(X|M), can be used for model choice
f(X|M) can be estimated by taking the harmonic mean of the likelihood values from the MCMC run (MrBayes will do this automatically with ‘sump’)
Any models can be compared: nested, non-nested, data-derived No correction for number of parameters Can prefer a simpler model over a more complex mode
![Page 41: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/41.jpg)
Bayes Factor Comparisons
Interpretation of the Bayes factor
2ln(B10) B10 Evidence against M0
0 to 2 1 to 3Not worth more than a bare mention
2 to 6 3 to 20 Positive
6 to 10 20 to 150 Strong
> 10 > 150 Very strong
![Page 42: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/42.jpg)
![Page 43: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/43.jpg)
![Page 44: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/44.jpg)
![Page 45: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/45.jpg)
![Page 46: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/46.jpg)
![Page 47: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/47.jpg)
![Page 48: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/48.jpg)
![Page 49: Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU](https://reader031.vdocuments.us/reader031/viewer/2022012922/56649f005503460f94c161c4/html5/thumbnails/49.jpg)