Climbing Peaks and Crossing Valleys: Metropolis Coupling and Rugged Phylogenetic Distributions
Jeremy M. Brown, Robert C. Thomson
@jembrown www.phyleauxgenetics.org
Markov Chain Monte Carlo (MCMC)
[Figure: probability density plotted over tree and parameter space]
1) Start somewhere.
2) Propose a new position.
3) Calculate the posterior density ratio (r) of new to old states.
   - If r > 1, accept.
   - If r < 1, accept with probability r.
4) Record state.
5) Repeat many times.
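The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: a single real-valued parameter stands in for tree and parameter space, the proposal is assumed to be symmetric (so r is just the density ratio), and the standard normal target and the names `metropolis` and `log_std_normal` are hypothetical.

```python
import math
import random


def metropolis(log_density, start, step, n_steps, rng):
    """Run a basic Metropolis sampler and return the recorded states."""
    state = start                                  # 1) start somewhere
    samples = []
    for _ in range(n_steps):                       # 5) repeat many times
        proposal = state + rng.gauss(0.0, step)    # 2) propose a new position
        # 3) log of the density ratio r (new state / old state)
        log_r = log_density(proposal) - log_density(state)
        # Accept if r >= 1, otherwise accept with probability r.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            state = proposal
        samples.append(state)                      # 4) record state
    return samples


def log_std_normal(x):
    """Hypothetical target: unnormalized log density of a standard normal."""
    return -0.5 * x * x


samples = metropolis(log_std_normal, start=0.0, step=1.0,
                     n_steps=20000, rng=random.Random(1))
mean = sum(samples) / len(samples)
```

Working with log densities avoids numerical underflow, which matters for phylogenetic likelihoods that are typically very small numbers.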
[Figure: probability density over tree and parameter space, with proposed moves annotated "Yes!" and "Maybe"]
MCMC Has Trouble With Rugged Distributions
[Figure: probability density over tree and parameter space]
Bipartition Bayes Factors
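[Figure: example tree with tips A, B, C, D, E]

$$\text{Bayes Factor} = \frac{\text{Marginal likelihood with } AB \mid CDE}{\text{Marginal likelihood without } AB \mid CDE}$$

The numerator is marked + and the denominator is marked -.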
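Because marginal likelihoods are usually reported on the log scale, the ratio above is typically computed as a difference of logs. The sketch below illustrates that arithmetic only; the function name and the two log marginal likelihood values are hypothetical and are not results from the talk.

```python
import math


def log_bayes_factor(ln_ml_with, ln_ml_without):
    """ln Bayes factor: ln ML with the bipartition minus ln ML without it."""
    return ln_ml_with - ln_ml_without


# Hypothetical log marginal likelihoods, for illustration only.
ln_ml_with = -10250.0      # analysis constrained to contain AB | CDE
ln_ml_without = -10254.5   # analysis constrained to exclude AB | CDE

ln_bf = log_bayes_factor(ln_ml_with, ln_ml_without)
bf = math.exp(ln_bf)       # Bayes factor on the natural scale
```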
Negative Constraints = Rugged Distributions
[Figure: three alternative trees for the same ten taxa: homo_sapiens, pantherophis_guttata, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, crocodylus_porosus, pelomedusa_subrufa, sphenodon_tuatara, chrysemys_picta]
Alternative Insertion Swaps are Difficult
[Figure: two trees for the same ten taxa, each labeled "Data"]
The Po-Boy Problem
How do you change the seafood on your po-boy while someone’s holding the sandwich?
[Figure: po-boys labeled "Shrimp" and "Oysters"]
Halves of French roll = Naturally monophyletic taxa
Seafood = Inserted taxon
Metropolis Coupling (MC3) Improves Mixing
Additional heated chains can act as “scouts”.
[Figure: probability density over tree and parameter space, with a proposed exchange between chains labeled "Swap?"]
Peaks All Found, But Different Probabilities?
[Figure: the three alternative trees for the ten taxa, each annotated with a pair of estimated probabilities: 0.50 and 0.25; 0.24 and 0.38; 0.25 and 0.24]
[Figure: trace plot of ln L against generation for Run 1 and Run 2]
A Closer Look at the Acceptance Ratio
Does chain i like where chain j is?
Does chain j like where chain i is?

$$r = \frac{p_i(\tau_j, \theta_j \mid D)\; p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D)\; p_j(\tau_j, \theta_j \mid D)}$$

With heated posteriors $p_k(\tau, \theta \mid D) = p(\tau, \theta \mid D)^{1/T_k}$, this simplifies to:

$$r = \left[ \frac{p(\tau_j, \theta_j \mid D)}{p(\tau_i, \theta_i \mid D)} \right]^{\frac{1}{T_i} - \frac{1}{T_j}}$$

When temperatures are equal, ALL swaps are accepted regardless of posterior density.
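The swap ratio is easy to evaluate on the log scale. The following sketch assumes each chain samples the posterior raised to the power 1/T, which is what the exponent in the ratio implies; the function names are hypothetical.

```python
import math


def log_swap_ratio(log_post_i, log_post_j, temp_i, temp_j):
    """ln r for exchanging the states of chains i and j.

    log_post_i and log_post_j are unheated log posterior densities at the
    current states of chains i and j.
    """
    return (1.0 / temp_i - 1.0 / temp_j) * (log_post_j - log_post_i)


def swap_probability(log_post_i, log_post_j, temp_i, temp_j):
    """Metropolis acceptance probability min(1, r) for the proposed swap."""
    log_r = log_swap_ratio(log_post_i, log_post_j, temp_i, temp_j)
    return 1.0 if log_r >= 0 else math.exp(log_r)


# Equal temperatures: the exponent is zero, so r = 1 whatever the densities are.
equal_temps = swap_probability(-100.0, -5.0, 2.0, 2.0)
# Cold chain (T = 1) in a better state than the hot chain (T = 2): r < 1.
cold_is_better = swap_probability(-10.0, -12.0, 1.0, 2.0)
```

Note that the swap step only compares the two chains with each other. It carries no information about how much posterior probability each peak holds overall.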
A Simple One-Parameter Example
[Figure: probability density (0 to 5) against parameter value (0.0 to 1.0), with two peaks holding probability 0.8 and 0.2]
https://github.com/jembrown/toyMC3/
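For readers who want to experiment without the repository, here is a rough, self-contained sketch of Metropolis coupling on a two-peak, one-parameter target. It is not the code from the repository above. The peak locations and width, the evenly spaced temperature ladder, the proposal step size, and all names (`PEAKS`, `WIDTH`, `log_density`, `mc3`) are assumptions made for illustration; only the peak probabilities 0.8 and 0.2 come from the slides.

```python
import math
import random

PEAKS = ((0.25, 0.8), (0.75, 0.2))   # (assumed location, probability)
WIDTH = 0.05                         # assumed standard deviation of each peak


def log_density(x):
    """Unnormalized log density of the assumed two-peak target on [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return -math.inf
    return math.log(sum(w * math.exp(-0.5 * ((x - m) / WIDTH) ** 2)
                        for m, w in PEAKS))


def mc3(n_chains, max_temp, n_gens, rng, step=0.05):
    """Return the fraction of cold-chain samples in Peak One (x < 0.5).

    Requires n_chains >= 2. Chain 0 is the cold chain (T = 1).
    """
    # Assumed heating scheme: temperatures evenly spaced from 1 to max_temp.
    temps = [1.0 + (max_temp - 1.0) * k / (n_chains - 1)
             for k in range(n_chains)]
    states = [rng.random() for _ in temps]
    in_peak_one = 0
    for _ in range(n_gens):
        # Within-chain Metropolis update on the heated density p(x)^(1/T).
        for k, temp in enumerate(temps):
            proposal = states[k] + rng.gauss(0.0, step)
            log_r = (log_density(proposal) - log_density(states[k])) / temp
            if log_r >= 0 or rng.random() < math.exp(log_r):
                states[k] = proposal
        # Propose one swap between a randomly chosen pair of chains.
        i, j = rng.sample(range(n_chains), 2)
        log_r = (1.0 / temps[i] - 1.0 / temps[j]) * (
            log_density(states[j]) - log_density(states[i]))
        if log_r >= 0 or rng.random() < math.exp(log_r):
            states[i], states[j] = states[j], states[i]
        in_peak_one += states[0] < 0.5
    return in_peak_one / n_gens


estimate = mc3(n_chains=5, max_temp=10.0, n_gens=2000, rng=random.Random(0))
```

Repeating the call with different seeds, chain counts, and maximum temperatures gives a feel for how the Peak One estimate varies from run to run under these assumed settings.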
Max Temp > Number of Chains
[Figure: Peak One probability against maximum temperature (2 to 10) for 5, 10, and 20 chains; inset shows the two-peak density with probabilities 0.8 and 0.2]
Peaks Have Different “Capture” Probabilities
[Figure: the two-peak density (probabilities 0.8 and 0.2), with capture probabilities P=0.8 and P=0.2 marked for the two peaks]
Spurious Convergence by Chain Number
[Figure: the two-peak density (probabilities 0.8 and 0.2), with capture probabilities P=0.8 and P=0.2]
When two runs end up with the same distribution of poorly mixing chains across peaks, they will estimate nearly identical (but incorrect!) probabilities.
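How often two runs share the same distribution of chains can be estimated with a simple binomial model. The sketch below assumes that each chain is captured independently by Peak One with probability 0.8 and then never leaves its peak; both are simplifying assumptions, and the function names are hypothetical.

```python
from math import comb


def split_pmf(n_chains, p_capture):
    """Binomial probabilities for the number of chains captured by Peak One."""
    return [comb(n_chains, k) * p_capture ** k
            * (1.0 - p_capture) ** (n_chains - k)
            for k in range(n_chains + 1)]


def prob_same_split(n_chains, p_capture=0.8):
    """Probability that two independent runs capture the same number of
    chains in Peak One, under the assumptions stated above."""
    return sum(q * q for q in split_pmf(n_chains, p_capture))


# Chance of matching splits for a few chain counts.
matches = {n: prob_same_split(n) for n in (2, 5, 10, 20)}
```

Under this model, agreement between two runs shows only that the chains were split the same way, not that the estimate is correct, which is one argument for using more than two runs.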
Lots of Chains Looks Like Convergence
[Figure: Peak One probability / standard deviation against maximum temperature (2 to 10) for 5, 10, and 20 chains; inset shows the two-peak density with probabilities 0.8 and 0.2]
N (large #) chains: Peak One holds 0.8 * N (P=0.8), Peak Two holds 0.2 * N (P=0.2).
Law of Large Numbers
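The Law of Large Numbers argument can be made quantitative with a short calculation. Assuming each of N chains is captured independently by Peak One with probability 0.8 (a simplification), the fraction of chains in Peak One has standard deviation sqrt(0.8 * 0.2 / N), which shrinks as N grows. The function name is hypothetical.

```python
import math


def fraction_sd(n_chains, p_capture=0.8):
    """Standard deviation of the fraction of chains captured by Peak One,
    assuming independent capture with probability p_capture."""
    return math.sqrt(p_capture * (1.0 - p_capture) / n_chains)


# Spread of the captured fraction for increasing numbers of chains.
spread = {n: fraction_sd(n) for n in (5, 10, 20, 100)}
```

With many chains, every run ends up with close to the same fraction of chains in each peak, so runs agree with one another whether or not the chains are actually mixing between peaks.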
Negative Constraint on Bird Monophyly
[Figure: tree for the ten taxa under a negative constraint on bird monophyly, with probability against maximum temperature (0.0 to 3.0) for 2, 4, 8, 16, and 32 chains]
[Figure: the same tree, with probability / standard deviation against maximum temperature (0.0 to 3.0) for 2, 4, 8, 16, and 32 chains]
Warnings
• Despite improving mixing, MC3 analyses still require careful thought.
• With small numbers of chains and small numbers of runs, estimated probabilities can be incorrect but identical across some runs.
• With large numbers of chains, estimated probabilities become increasingly similar across all runs.
Recommendations
• For rugged distributions, increase the maximum chain temperature, not the number of chains.
• For broad distributions, increase the number of chains.
• Use more than 2 runs.
TreeScaper
Guifang Zhou (SSB symposium lightning talk) - Monday, 1:45-1:50 - Ballroom A - "A network framework to explore phylogenetic structure in genome data"
Guifang Zhou (iEvoBio talk) - Tuesday, 2:05-2:12 - Meeting Room 9C - "TreeScaper: Software to visualize and extract phylogenetic signals from sets of trees"
https://github.com/whuang08/TreeScaper