svp 2012 talk: time-scaling trees in the fossil record

Post on 11-May-2015


DESCRIPTION

A talk I gave on my time-scaling method at SVP 2012 in Raleigh.

TRANSCRIPT

TIME-SCALING TREES IN THE FOSSIL RECORD

David Bapst

University of Chicago

Geophysical Sciences

(Pretty figure taken from Ted Garland’s website)

Phylogeny in the Fossil Record

[Figure: phylogeny of taxa A to L; axis: Time]

Sampling in the Fossil Record

[Figure: phylogeny of taxa A to L, annotated ‘Sampling Event’; axis: Time]

What We Have

[Figure: cladogram of taxa A, H, K, G, E, I, D, shown with the same taxa plotted against time; axis: Time]

What We Want

[Figure: cladogram of taxa A, H, K, G, E, I, D, shown with two panels of the same taxa plotted against time, one labeled ‘Original’; axis: Time]

Of Time and Trees: ‘Basic’ Method

• Clades are as old as their earliest observed descendant

[Figure: two panels, ‘Original’ and ‘Basic Method’, showing taxa A, K, I, G, H, D, E; axis: Time]

Smith, 1994
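The rule on this slide can be sketched in a few lines. This is a minimal illustration, not the paleotree implementation: the nested-tuple tree, the taxon names and the first-appearance dates are hypothetical, and ages are assumed to be in time units before present (larger is older).

```python
# Hypothetical toy data: nested tuples are clades, strings are taxa.
# Ages are in time units before present, so larger numbers are older.

def basic_node_ages(tree, fad, ages=None):
    """Return (age of this clade, dict of clade -> age) under the basic rule."""
    if ages is None:
        ages = {}
    if isinstance(tree, str):
        # A tip is dated by its first appearance (FAD).
        return fad[tree], ages
    child_ages = [basic_node_ages(child, fad, ages)[0] for child in tree]
    # A clade is as old as its earliest observed descendant.
    ages[tree] = max(child_ages)
    return ages[tree], ages


tree = (("A", "B"), "C")
fad = {"A": 10.0, "B": 12.0, "C": 12.0}
root_age, node_ages = basic_node_ages(tree, fad)

# Length of the branch from the root to clade (A, B).
internal_branch = root_age - node_ages[("A", "B")]
print(root_age, node_ages[("A", "B")], internal_branch)
```

In this toy case the root and the clade (A, B) receive the same age, so the branch between them has zero length, which is the nuisance raised on the next slide.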

Of Time and Trees: ‘Basic’ Method

• Creates zero-length branches (…nuisance)

• Common fix: Extend branches a small amount

• No measurement of the uncertainty involved

[Figure: two panels, ‘Original’ and ‘Basic Method’, showing taxa A, K, I, G, H, D, E, annotated ‘Nodes Separated by Zero-Length Branches’; axis: Time]

Dealing with Uncertainty: Stochastic Analyses

• Example: Dealing with Discrete Intervals

[Figure: taxa A, I, G, H, E, K, D plotted against time intervals t.1 to t.6; axis: Time Intervals]

Lloyd et al., 2012

Dealing with Uncertainty: Stochastic Analyses

• Example: Dealing with Discrete Intervals

[Figure: taxa A, I, G, H, E, K, D plotted against time intervals t.1 to t.6, annotated ‘Randomly Drop FADs and LADs’; axis: Time Intervals]

Lloyd et al., 2012

Dealing with Uncertainty: Stochastic Analyses

• Example: Dealing with Discrete Intervals

[Figure: taxa A, I, G, H, E, K, D plotted against time intervals t.1 to t.6 in several alternative versions, annotated ‘Repeat!’; axis: Time Intervals]

Lloyd et al., 2012
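One way to picture ‘randomly drop FADs and LADs’ is the sketch below. It assumes uniform draws within each interval, which is an assumption of this illustration rather than something stated on the slides. The interval bounds, the taxon ranges and the convention that t.1 is the oldest interval are all hypothetical.

```python
import random


def draw_dates(interval_ranges, interval_bounds, rng):
    """Draw one continuous-time (FAD, LAD) pair per taxon.

    interval_ranges: taxon -> (first interval, last interval)
    interval_bounds: interval -> (older bound, younger bound), time before present
    """
    dates = {}
    for taxon, (first_int, last_int) in interval_ranges.items():
        old_f, young_f = interval_bounds[first_int]
        old_l, young_l = interval_bounds[last_int]
        fad = rng.uniform(young_f, old_f)
        lad = rng.uniform(young_l, old_l)
        if first_int == last_int:
            # Same interval: order the two draws so the FAD is the older date.
            fad, lad = max(fad, lad), min(fad, lad)
        dates[taxon] = (fad, lad)
    return dates


# Hypothetical intervals t.1 (oldest) to t.6 (youngest), each 5 time units long.
interval_bounds = {f"t.{i}": (30.0 - 5 * (i - 1), 25.0 - 5 * (i - 1)) for i in range(1, 7)}
interval_ranges = {"A": ("t.1", "t.3"), "K": ("t.2", "t.2"), "D": ("t.4", "t.6")}

rng = random.Random(1)
# 'Repeat!': each draw is one possible dating of the same interval data.
samples = [draw_dates(interval_ranges, interval_bounds, rng) for _ in range(100)]
```

Each element of `samples` can then be passed to a time-scaling step, so the uncertainty from discrete intervals carries through to the sample of trees.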

Stochastic Time-Scaling

• Randomly select new node ages across a cladogram

– Set lower bounds by starting at the root (with a loose lower bound) and working up, node by node

• Rinse, repeat many times to produce a large sample of trees
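The two bullets above can be sketched as follows. This is a simplified illustration with unweighted (uniform) draws. The tree, the first-appearance dates and the `root_slack` value standing in for the ‘loose lower bound’ are hypothetical. Ages are in time units before present.

```python
import random


def earliest_fad(tree, fad):
    """Oldest first appearance among the taxa descended from this node."""
    if isinstance(tree, str):
        return fad[tree]
    return max(earliest_fad(child, fad) for child in tree)


def sample_node_ages(tree, fad, oldest_allowed, rng, ages=None):
    """Draw node ages from the root upward; each node is bounded by its parent."""
    if ages is None:
        ages = {}
    if isinstance(tree, str):
        return ages
    # A node can be no younger than its earliest observed descendant.
    youngest_allowed = earliest_fad(tree, fad)
    ages[tree] = rng.uniform(youngest_allowed, oldest_allowed)
    for child in tree:
        # The age just drawn becomes the oldest allowed age for the children.
        sample_node_ages(child, fad, ages[tree], rng, ages)
    return ages


tree = (("A", "B"), "C")
fad = {"A": 10.0, "B": 12.0, "C": 11.0}
root_slack = 5.0  # hypothetical loose bound on how far back the root may go

rng = random.Random(42)
oldest_root = earliest_fad(tree, fad) + root_slack
# Rinse, repeat: a sample of 20 alternative sets of node ages.
samples = [sample_node_ages(tree, fad, oldest_root, rng) for _ in range(20)]
```

Uniform draws treat every allowed age as equally plausible. Replacing them with weighted draws is one natural way to bring in a probability model.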

[Figure: tree with a node age marked ‘?’; axis: Time]

Stochastic Time-Scaling: Extensions

• Stochastically infer ancestor-descendant relationships by allowing node ages to occur after the earliest taxon appears

[Figure; axis: Time]

• Stochastically resolve soft polytomies by iteratively placing lineages over multiple steps

[Figure: a soft polytomy of taxa A, B, C marked ‘??’, shown with alternative arrangements of the three taxa; axis: Time]

Stochastic Time-Scaling

• Expect more uncertainty in poorly-sampled fossil records

• Weight the selection of node ages with a probability model of the unobserved evolutionary history at each node

[Figure: tree annotated with Pr(Σ gaps); axis: Time]

A Probabilistic Model of Gaps

• Total minimum unobserved evolutionary history depends on sampling rates

– Sampling rates can be obtained via methods such as the freqRat

[Figure; axis: Time]

A Probabilistic Model of Gaps

• But also dependent on diversification: branching and extinction rates

– Unsampled ‘twigs’ matter!

• Node ages need to be calibrated with three rates

– Cal3 time-scaling method

– Probability of unobserved twigs derived with Matt Pennell and Emily King

Foote et al., 1999; Friedman and Brazeau, 2011

[Figure; axis: Time]


But how good is the time-scaling?


Let’s do some simulations to find out!

So, How Good is the Time-Scaling?

[Figure: median of median error in per-node ages, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]

Simulation settings: 100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates est. via ML

Using R library paleotree (Bapst, 2012; MEE)

Squared-Error: Cal3 has More Error

[Figure: median of median squared-error in per-node ages, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]

Better Estimator of Uncertainty?

[Figure: proportion of true node ages within 95% age quantiles, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]
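As a rough sketch of how a ‘proportion of true node ages within 95% age quantiles’ could be computed from a sample of time-scaled trees: the version below assumes the interval is the central 2.5% to 97.5% range of sampled ages for each node, which is an interpretation rather than something the slides specify. The node names and ages are hypothetical.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def coverage(true_ages, sampled_ages):
    """Fraction of nodes whose true age falls inside the central 95% of samples."""
    hits = 0
    for node, truth in true_ages.items():
        draws = sorted(sampled_ages[node])
        if quantile(draws, 0.025) <= truth <= quantile(draws, 0.975):
            hits += 1
    return hits / len(true_ages)


# Hypothetical example: node n1 is covered by its sampled ages, node n2 is not.
true_ages = {"n1": 12.0, "n2": 30.0}
sampled_ages = {
    "n1": [10.0 + 0.2 * i for i in range(21)],   # spans 10 to 14
    "n2": [20.0 + 0.25 * i for i in range(21)],  # spans 20 to 25
}
result = coverage(true_ages, sampled_ages)
print(result)
```

A method that represents uncertainty well should give a proportion near 0.95. A single point estimate per node has no interval to evaluate.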

Similar Patterns with Terminal Branch Lengths

[Figure: terminal branch lengths, two panels: proportion within 95% age quantiles; median of median squared error. By method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res]


But this isn’t what we’re interested in most often…

Analyses of Trait Evolution

[Figure: median estimated rate of trait change (log-scale axis), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny]

Analyses of Trait Evolution

[Figure: AICc weight of BM vs OU (for trait simulated under BM), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny]
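For readers unfamiliar with the quantity on this axis, the sketch below shows the standard small-sample AIC correction and Akaike weights. The log-likelihoods, parameter counts and sample size are hypothetical placeholders, not values from the simulations.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters fit to n taxa."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a dict of model -> AICc into relative model weights."""
    best = min(scores.values())
    rel = {model: math.exp(-0.5 * (s - best)) for model, s in scores.items()}
    total = sum(rel.values())
    return {model: value / total for model, value in rel.items()}


# Hypothetical fits to one time-scaled tree with 50 taxa.
n_taxa = 50
scores = {
    "BM": aicc(log_lik=-60.0, k=2, n=n_taxa),  # assumed parameters: rate, root state
    "OU": aicc(log_lik=-59.5, k=3, n=n_taxa),  # assumed: one extra parameter (alpha)
}
weights = akaike_weights(scores)
print(weights)
```

Since the trait was simulated under BM, a well-behaved time-scaling should tend to leave most of the weight on BM.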

Conclusions

• New time-scaling method: cal3 (paleotree v1.5)

– Time-scaling calibrated with estimated rates of sampling, branching and extinction

• Fidelity of time-scaling can be decoupled from fidelity of comparative analyses

– Why? A certain je ne sais quoi of the time-scaling?

• Frequency of internal ZLBs? Balance of total branch-length distribution?

– If analytical performance cannot be easily extrapolated or predicted, simulations are key

Thanks to M. Foote, E. King, J. Felsenstein, A. Haber, M. Pennell, G. Hunt, G. Lloyd, M. Friedman, P. Wagner, M. Webster, D. Jablonski, M. LaBarbera, K. Boyce, J. Mitchell, P. Harnik, G. Slater and the R-Sig-Phylo Email List!

Slightly Better Polytomy Resolver

[Figure: proportion of collapsed clades correctly resolved, by method: Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; timeLadderTree]

Fidelity of VCV Matrices: All High

[Figure: median random skewer similarity of VCV matrices to true VCV matrix, by method: Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; Basic Rand-Res]

Samp. Rate Cond. gives best est across samp rates

[Figure: 100 trees each (not SRC), ~50 taxa; axis label: Lmy^-1]

Model Fitting At Other Parameters

[Figure: AICc weight of BM vs OU (for trait simulated under BM), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny. Annotated: p = q = 0.1, r = 0.5 per Ltu]

Model Fitting At Other Parameters

[Figure: AICc weight of BM vs OU (for trait simulated under BM), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny. Annotated: Bifurcating Cladogenesis]

A Model of Unobserved History

[Figure; axis: Time]

Using R library paleotree (Bapst, 2012)

A Model of Unobserved History

• Best fit with gamma models

[Figure; axis: Time]

Using R library paleotree (Bapst, 2012)

A Model of Unobserved History

• Importance of Diversification: Twigs Matter!

• Calibrate with Three Rates: ‘Cal3’ time-scaling

– Sampling, branching and extinction rates

[Figure; axis: Time]

(Derived with Matt Pennell and Emily King)

Of Time and Trees

• Usually start with data like this…

– (Thanks to Melanie Hopkins for this example data!)

Taxon Ranges

Unscaled Topology

Hopkins, 2011 (Strict Consensus Tree)


Of Time and Trees

[Figure; axis: Time]

• …but want a time-scaled tree for analyses

• How do we time-scale? Effect on analyses?

Note: Time = Strat meters for Hopkins (2011)

Of Time and Trees

[Figure; axis: Time]

• One solution uses morph ‘clock’-like approach

• But what about trees with no char change info?

Previous Approaches

• Unit-length branches

• “speciational” scale

• No actual time-scale!

– Is setting all branches equal a good approx?

• Soft polytomies have to be randomly resolved beforehand

– (True of most methods)

[Figure: scaled consensus tree from Hopkins, polytomies randomly resolved; axis: # of observed branching events]

Previous Approaches

• Basic Approach

• Clade age = age of earliest observed descendant

• Creates many zero-length branches (ZLBs)

– Unrealistic

– Singular variance-covariance matrices: the math for BM, etc. doesn’t work

[Figure: time-scaled consensus tree from Hopkins, polytomies randomly resolved; axis: Time]

Previous Approaches

• Adding X to all branches (ABA) or just very short branches

– X = a number

• Avoids singularity

• Similar: MinBrLength

• Widely used but how to pick X?

• Can push root back unrealistically far
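To make the root problem concrete, here is a small sketch of a minimum-branch-length style rule in which every branch is forced to be at least X long. It is a simplified stand-in for this family of fixes, not the ABA or MinBrLength code from any library. The pectinate tree and the dates are hypothetical, with ages in time units before present.

```python
def min_branch_ages(tree, fad, min_len, ages=None):
    """Date each clade so that every branch below it is at least min_len long."""
    if ages is None:
        ages = {}
    if isinstance(tree, str):
        return fad[tree], ages
    child_ages = [min_branch_ages(child, fad, min_len, ages)[0] for child in tree]
    # Oldest child age plus the minimum length X; X = 0 gives the basic method.
    ages[tree] = max(child_ages) + min_len
    return ages[tree], ages


# Hypothetical pectinate (ladder-like) tree whose taxa all first appear at 10.
tree = (((("A", "B"), "C"), "D"), "E")
fad = {taxon: 10.0 for taxon in "ABCDE"}

basic_root, _ = min_branch_ages(tree, fad, min_len=0.0)
pushed_root, _ = min_branch_ages(tree, fad, min_len=1.0)
print(basic_root, pushed_root)
```

With four nested nodes, the root moves back by four times X, so the choice of X directly sets how old the root appears.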

[Figure: time-scaled consensus tree from Hopkins, polytomies randomly resolved; axis: Time]

Previous Approaches

• ‘Equal’ Method – Graeme Lloyd

• Pull root down by X, redistribute time on earlier branches along ZLBs

– X = a number

• Widely used, but how to choose X?

[Figure: time-scaled consensus tree from Hopkins, polytomies randomly resolved; axis: Time]

New Method: Sampling Rate Conditioned

• Est samp rate r

• Randomly pick ‘gaps’ in evol history to scale branches, using P(gaps|r) as weights

• Repeat many times…

[Figure: time-scaled consensus tree from Hopkins, polytomies NOT randomly resolved; axis: Time]

New Method: Sampling Rate Conditioned

• Est samp rate r

• Randomly pick ‘gaps’ in evol history to scale branches, using P(gaps|r) as weights

• Repeat many times, make many trees! – No single answer!

• Resolve polytomies, identify ancestors

25 Time-Scaled Trees, Polytomies NOT Rand-Res.

New Method: Sampling Rate Conditioned

• Est samp rate r

• Randomly pick ‘gaps’ in evol history to scale branches, using L(gaps|r) as weights

• Repeat many times, make many trees! – No single answer!

• Resolve polytomies, identify ancestors – Stratolikelihood-esque

But How Do These Methods Perform?

25 Time-Scaled Trees, Polytomies NOT Rand-Res.


Let’s Run Some Birth-Death-Sampling Simulations and Find Out! (…For Some Stuff)

Quick Guide to Beanplots!

[Figure: example beanplot; annotations: Obs Data; Est PDF (KDE); True Sim Value (OUR TARGET). Axis labels: Sampling Rates; Methods; Value]

Using R library beanplot

Samp. Rate Cond. gives best est across samp rates

[Figure: 100 trees each (not SRC), ~50 taxa; axis label: Lmy^-1]

Signal estimate bad across the board; bias for zero

[Figure: 100 trees each (not SRC), ~50 taxa; axis label: Lmy^-1]

• Implications for trait evol model-fitting in fossil record?

• Bias for low-signal models like OU?

High correlation generally; SRC performs worst

[Figure: 100 trees each (not SRC), ~50 taxa; only fully extinct clades; 1 MY timebins; axis label: Lmy^-1]

• Similar results for FirstDiffs

• Corr increases with time step size used

• Obs ranges good

– True under diff sampling model?

– Clades with living desc? (Lane et al. 2005)

Samp. Rate Cond. better than randomly resolving

[Figure: 100 trees each, ~50 taxa; ~50% of nodes removed; axis label: Lmy^-1]

Results and Future Work

• Results

– New method: Sampling Rate Conditioned

– Good for BM rate, not so good for diversity curve

– No method unbiased for estimating phylo signal

• Future Work

– Compare fidelity with poorly resolved trees

– Test possible bias in trait model-fitting analyses

– Effect of random zombie lineages on div corr

– Time-scale with joint L(gaps,morph)

– Integrate with birth-death models of branch length distribution for paleo trees

• Simulations necessary to understand the reliability of methods in paleobiology

• Code to be released soon in R library

Thanks for Code: G. Lloyd, G. Hunt

Thanks for Data: Melanie Hopkins

Thanks for Comments and Ideas: M. Foote, M. Webster, D. Jablonski, E. King, P. Wagner, J. Mitchell, M. Friedman, G. Slater, M. Pennell, L. Harmon and the R-Sig-Phylo Email List!

• Move up tree, node by node – calculate likelihoods for each possible position of a node

– Randomly sample a position, using likelihoods as weights

• Repeat to produce large sample of time-scaled trees

[Figure: five panels showing taxa A and B; axis: Time]

L(obs gap of length t) = r * exp(-r * t), where r = instantaneous sampling rate

(Bapst, in prep. C)
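The likelihood on this slide and the weighted draw can be sketched together. In this illustration the weight for a candidate node age is taken as the product of the gap likelihoods for the two descendant lineages, which assumes the gaps are independent. The candidate ages, first-appearance dates and sampling rate are hypothetical.

```python
import math
import random


def gap_likelihood(t, r):
    """L(obs gap of length t) = r * exp(-r * t), r = instantaneous sampling rate."""
    return r * math.exp(-r * t)


def node_weights(candidate_ages, descendant_fads, r):
    """Weight each candidate node age by the gaps it implies below the node."""
    weights = []
    for age in candidate_ages:
        w = 1.0
        for fad in descendant_fads:
            # Gap = unobserved time between the node and the first appearance.
            w *= gap_likelihood(age - fad, r)
        weights.append(w)
    return weights


# Hypothetical node joining taxa A (FAD 10) and B (FAD 12), ages before present.
descendant_fads = [10.0, 12.0]
candidate_ages = [12.0, 13.0, 14.0, 15.0, 16.0]
r = 0.5

weights = node_weights(candidate_ages, descendant_fads, r)
rng = random.Random(7)
# Pick one candidate by weighted random sampling.
picked = rng.choices(candidate_ages, weights=weights, k=1)[0]
print(weights, picked)
```

Older candidate ages imply longer unobserved gaps and so receive smaller weights. A higher sampling rate r makes that penalty steeper.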

[Figure: five panels showing taxa A and B, annotated ‘Pick One by Weighted Random Sampling’; axis: Time]

(Bapst, in prep. C)

Time-scaling Difficulties

• In an extinct clade of fossil taxa...

– Temporal placement of nodes constrained only by appearance of descendant taxa

[Figure: two trees of taxa A, B, C; axis: Time]

Time-scaling Difficulties

• In an extinct clade of fossil taxa...

– Ancestors are potentially among our sampled taxa (particularly in well-sampled clades)

[Figure: two trees of taxa A, B, C, labeled ‘Budding’ and Anagenesis; axis: Time]

Zipper Method: A Stochastic Solution

• Produces stochastic samples of time-scaled trees

• In each run, samples many hypotheses of branch lengths and also of ancestor-descendant relationships, weighted by sampling probabilities

• Cannot reconstruct multi-budding scenario

• Requires integrated phylogenetic inference method

• As many truly interesting things do

Problems Created by a Really Big Supertree of Dead Plankton

• Dealing with topological uncertainty

– Need to resolve soft polytomies for time-scaling

– Randomly resolving can produce poor overall fit to observed sequence of appearances

[Figure: two panels showing taxa A, B, C, D; axis: Time]

Problems Created by a Really Big Supertree of Dead Plankton

• Dealing with topological uncertainty

• Developed alternative method based on stochastic sampling (Bapst, in prep. B)

– Uses sampling rates in the fossil record, estimated from range data (Foote, 1997)

– Reconstructs more nodes correctly than randomly resolving nodes in simulated trees

• Evolutionary analyses must be repeated over large samples of potential topologies

• Additional uncertainties in time-scaling phylogenies of fossil taxa
