SVP 2012 Talk: Time-Scaling Trees in the Fossil Record


Uploaded by david-bapst on 11 May 2015


DESCRIPTION

A talk I gave on my time-scaling method at SVP 2012 in Raleigh.

TRANSCRIPT


TIME-SCALING TREES IN THE FOSSIL RECORD

David Bapst

University of Chicago

Geophysical Sciences

(Pretty figure taken from Ted Garland’s website)

Phylogeny in the Fossil Record

(Figure: a phylogeny of taxa A to L plotted against time)

Sampling in the Fossil Record

(Figure: the same phylogeny of taxa A to L against time, with sampling events marked)

What We Have

(Figure: the sampled taxa A, H, K, G, E, I and D and their relationships, against time)

What We Want

(Figure: the sampled taxa A, H, K, G, E, I and D as a time-scaled tree, shown next to the original phylogeny)

Of Time and Trees: ‘Basic’ Method

• Clades are as old as their earliest observed descendant

(Figure panels: Original; Basic Method)

Smith, 1994
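The ‘basic’ rule can be sketched in a few lines of Python. This sketch makes assumptions that are not in the talk: the tree is written as nested tuples of taxon names, the FADs are hypothetical ages in Ma before present (larger = older), and tips are placed at their FADs. Each node takes the age of its oldest descendant FAD. Any node that shares its age with its parent therefore sits at the end of a zero-length branch. This is an illustration only, not paleotree's implementation.

```python
def basic_node_ages(tree, fad):
    """Date each clade by the oldest first appearance among its descendants.

    tree: a taxon name (str) or a tuple of subtrees.
    fad:  {taxon: first appearance date in Ma before present (larger = older)}.
    Returns {subtree: age} for every tip and internal node.
    """
    ages = {}

    def visit(node):
        if isinstance(node, str):
            ages[node] = fad[node]  # tips are placed at their FADs
        else:
            ages[node] = max(visit(child) for child in node)
        return ages[node]

    visit(tree)
    return ages


def branch_lengths(tree, ages):
    """Length of the branch above each node: parent age minus node age."""
    lengths = {}

    def visit(node):
        if isinstance(node, tuple):
            for child in node:
                lengths[child] = ages[node] - ages[child]
                visit(child)

    visit(tree)
    return lengths


# Hypothetical three-taxon example.
tree = (("A", "B"), "C")
fad = {"A": 12.0, "B": 10.0, "C": 11.0}
ages = basic_node_ages(tree, fad)
lengths = branch_lengths(tree, ages)
zlbs = [node for node, length in lengths.items() if length == 0]
print(zlbs)  # [('A', 'B'), 'A']
```

Taxon A is the oldest descendant of both the root and clade (A, B). As a result, the branch leading to (A, B) and the terminal branch of A both have zero length.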

• Creates zero-length branches (…nuisance)

• Common fix: Extend branches a small amount

• No measurement of the uncertainty involved

(Figure panels: Original; Basic Method, annotated "Nodes Separated by Zero-Length Branches")

Dealing with Uncertainty: Stochastic Analyses

• Example: Dealing with Discrete Intervals

(Figure: taxa A, D, E, G, H, I and K, whose ranges are known only to discrete time intervals t.1 to t.6. Randomly place FADs (first appearance dates) and LADs (last appearance dates) within their intervals, then repeat!)

Lloyd et al., 2012
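Below is a minimal sketch of one stochastic draw. It assumes each taxon's first and last appearances are known only to intervals, given as hypothetical (older bound, younger bound) pairs in Ma. Dates are drawn uniformly within their intervals. A draw is rejected if the FAD would come out younger than the LAD. This illustrates the idea on the slide; it is not Lloyd et al.'s code or paleotree's.

```python
import random

def draw_ranges(interval_ranges, rng):
    """One random draw of point FADs and LADs from interval-censored ranges.

    interval_ranges: {taxon: ((fad_old, fad_young), (lad_old, lad_young))}
    with ages in Ma before present (larger = older).
    """
    dates = {}
    for taxon, ((f_old, f_young), (l_old, l_young)) in interval_ranges.items():
        while True:
            fad = rng.uniform(f_young, f_old)
            lad = rng.uniform(l_young, l_old)
            if fad >= lad:  # a taxon must appear no later than it disappears
                break
        dates[taxon] = (fad, lad)
    return dates


# Hypothetical taxa binned into 5-Myr intervals.
ranges = {
    "A": ((30.0, 25.0), (25.0, 20.0)),
    "K": ((25.0, 20.0), (25.0, 20.0)),  # FAD and LAD in the same interval
}
rng = random.Random(1)
samples = [draw_ranges(ranges, rng) for _ in range(100)]  # repeat!
```

Each draw in `samples` would then be used to time-scale the tree once, so the spread across trees reflects dating uncertainty.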

Stochastic Time-Scaling

• Randomly select new node ages across a cladogram

– Set lower bounds by starting with the root (which has a loose lower bound) and working up the tree, node by node

• Rinse and repeat many times to produce a large sample of trees

(Figure: a cladogram against time, with node ages unknown)
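The root-to-tip procedure can be sketched as follows, under some assumptions. The tree is written as nested tuples, the FADs are hypothetical ages in Ma, and `root_max` is a user-chosen oldest allowed root age that plays the part of the ‘loose lower bound’. Each node age is drawn uniformly between its parent's age and the oldest FAD among its descendants. The uniform draw is a simplification: the later slides weight this choice with a probability model.

```python
import random

def oldest_fad(node, fad):
    """Oldest first appearance among the taxa descending from node."""
    if isinstance(node, str):
        return fad[node]
    return max(oldest_fad(child, fad) for child in node)


def sample_node_ages(tree, fad, root_max, rng):
    """Draw one set of node ages, working from the root up the tree.

    Ages are Ma before present (larger = older). A node can be no older
    than its parent and no younger than its oldest descendant FAD.
    """
    ages = {}

    def visit(node, oldest_allowed):
        if isinstance(node, str):
            ages[node] = fad[node]  # tips sit at their FADs in this sketch
            return
        youngest_allowed = oldest_fad(node, fad)
        ages[node] = rng.uniform(youngest_allowed, oldest_allowed)
        for child in node:
            visit(child, ages[node])  # the parent's age bounds its children

    visit(tree, root_max)
    return ages


tree = (("A", "B"), "C")
fad = {"A": 12.0, "B": 10.0, "C": 11.0}
rng = random.Random(42)
# Rinse, repeat: a sample of 1000 time-scaled trees.
trees = [sample_node_ages(tree, fad, root_max=20.0, rng=rng) for _ in range(1000)]
```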

Stochastic Time-Scaling: Extensions

• Stochastically infer ancestor-descendant relationships by allowing node ages to fall after the first appearance of the earliest descendant taxon

• Stochastically resolve soft polytomies by iteratively placing lineages over multiple steps

(Figure: a polytomy of A, B and C, resolved step by step into alternative dated topologies)
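Iterative placement can be sketched in a simplified form. Lineages from a hypothetical soft polytomy are added one at a time, oldest first appearance first. Each is attached to a uniformly chosen branch of the growing subtree. That uniform choice is an assumption standing in for the talk's weighted placement, so this sketch is closer to random resolution. Attaching to the root branch makes the new lineage sister to everything placed so far.

```python
import random

def subtrees(tree):
    """Every clade and tip in preorder; each sits at the end of a branch
    that a new lineage could attach to."""
    found = [tree]
    if isinstance(tree, tuple):
        for child in tree:
            found.extend(subtrees(child))
    return found


def attach(tree, target, new_taxon):
    """Copy of tree with new_taxon made sister to the subtree equal to target.
    Assumes taxon names are unique, so every subtree is unique."""
    if tree == target:
        return (tree, new_taxon)
    if isinstance(tree, tuple):
        return tuple(attach(child, target, new_taxon) for child in tree)
    return tree


def resolve_polytomy(fad, rng):
    """One random bifurcating resolution, adding lineages oldest-first.

    fad: {taxon: first appearance date in Ma before present (larger = older)}.
    """
    order = sorted(fad, key=fad.get, reverse=True)  # oldest first
    tree = (order[0], order[1])
    for taxon in order[2:]:
        tree = attach(tree, rng.choice(subtrees(tree)), taxon)
    return tree


# Hypothetical soft polytomy of three taxa.
polytomy = {"A": 14.0, "B": 12.0, "C": 11.0}
rng = random.Random(3)
resolutions = [resolve_polytomy(polytomy, rng) for _ in range(5)]
```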

Stochastic Time-Scaling

• Expect more uncertainty in poorly sampled fossil records

• Weight the selection of node ages using a probability model of the unobserved evolutionary history at each node

(Figure: candidate node ages against time, weighted by Pr(Σ gaps))

A Probabilistic Model of Gaps

• The total minimum unobserved evolutionary history depends on sampling rates

– Sampling rates can be estimated via methods such as the freqRat
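The freqRat is the frequency ratio of Foote and Raup (1996). It estimates the per-interval sampling probability from the distribution of observed taxon durations counted in discrete intervals: R = f2² / (f1 · f3), where f_k is the number of taxa whose observed range spans k intervals. Below is a minimal Python sketch on hypothetical counts. It is not paleotree's `freqRat` function.

```python
from collections import Counter

def freq_rat(durations):
    """Frequency ratio R = f2**2 / (f1 * f3): per-interval sampling probability.

    durations: observed range of each taxon, counted in discrete intervals
    (a taxon found in a single interval has duration 1).
    """
    counts = Counter(durations)
    f1, f2, f3 = counts[1], counts[2], counts[3]
    if f1 == 0 or f3 == 0:
        raise ValueError("need taxa with observed durations of 1 and 3 intervals")
    return f2 ** 2 / (f1 * f3)


# Hypothetical range data: 50 one-interval taxa, 20 two-interval, 10 three-interval, ...
durations = [1] * 50 + [2] * 20 + [3] * 10 + [4] * 6
print(freq_rat(durations))  # 0.8
```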

• But it also depends on diversification: branching and extinction rates

– Unsampled ‘twigs’ matter!

• Node ages need to be calibrated with three rates

– Cal3 time-scaling method

– Probability of unobserved twigs derived with Matt Pennell and Emily King

Foote et al., 1999; Friedman and Brazeau, 2011

But how good is the time-scaling? Let’s do some simulations to find out!

So, How Good is the Time-Scaling?

(Figure: median of median error in per-node ages, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates estimated via ML. Using R library paleotree (Bapst, 2012; MEE)

Squared-Error: Cal3 has More Error

(Figure: median of median squared error in per-node ages, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates estimated via ML. Using R library paleotree (Bapst, 2012)
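One plausible reading of the ‘median of median’ summaries on these slides is a two-step calculation. First take the median error across nodes within each sampled tree, then take the median of those values across the trees. The inputs below are hypothetical. The slides do not say whether the error is signed, so this sketch uses estimate minus truth and, optionally, its square.

```python
from statistics import median

def median_of_median_error(true_ages, sampled_trees, squared=False):
    """Median over sampled trees of the median per-node error.

    true_ages: {node: true age}; sampled_trees: list of {node: estimated age}.
    Error is estimate minus truth (squared if squared=True).
    """
    per_tree = []
    for est in sampled_trees:
        errors = [est[node] - true_ages[node] for node in true_ages]
        if squared:
            errors = [e * e for e in errors]
        per_tree.append(median(errors))
    return median(per_tree)


true_ages = {"n1": 10.0, "n2": 7.0, "n3": 4.0}
sampled = [
    {"n1": 11.0, "n2": 7.0, "n3": 3.0},
    {"n1": 13.0, "n2": 9.0, "n3": 4.0},
    {"n1": 10.0, "n2": 9.0, "n3": 6.0},
]
print(median_of_median_error(true_ages, sampled))                # 2.0
print(median_of_median_error(true_ages, sampled, squared=True))  # 4.0
```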

Better Estimator of Uncertainty?

(Figure: proportion of true node ages within the 95% age quantiles, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates estimated via ML. Using R library paleotree (Bapst, 2012)
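The coverage statistic on this slide can be sketched as follows. The sketch assumes that, for each node, you have the ages it received across the sampled trees plus the true simulated age; all values below are hypothetical. Quantiles come from Python's `statistics.quantiles` with `method="inclusive"`. Other quantile definitions would shift the bounds slightly.

```python
from statistics import quantiles

def coverage_95(true_ages, sampled_ages):
    """Proportion of nodes whose true age lies within the 2.5% to 97.5%
    quantiles of that node's ages across the sampled trees.

    true_ages: {node: true age}; sampled_ages: {node: [age in each tree]}.
    """
    inside = 0
    for node, truth in true_ages.items():
        # n=40 gives cut points at 2.5%, 5%, ..., 97.5%
        cuts = quantiles(sampled_ages[node], n=40, method="inclusive")
        low, high = cuts[0], cuts[-1]
        inside += low <= truth <= high
    return inside / len(true_ages)


# Hypothetical: ages of two nodes across 20 sampled trees.
sampled = {
    "n1": [float(age) for age in range(1, 21)],
    "n2": [float(age) for age in range(1, 21)],
}
truth = {"n1": 10.0, "n2": 25.0}
print(coverage_95(truth, sampled))  # 0.5
```

A well-calibrated method should cover about 95% of true node ages. Values well below that mean the sample of trees understates the real uncertainty.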

Similar Patterns with Terminal Branch Lengths

(Figure panels: proportion within 95% age quantiles; median of median squared error. Methods: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates estimated via ML. Using R library paleotree (Bapst, 2012)

But this isn’t what we’re interested in most often…

Analyses of Trait Evolution

(Figure: median estimated rate of trait change (log-scale axis), by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates estimated via ML

(Figure: AICc weight of BM vs. OU, for a trait simulated under BM, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; discrete intervals = 5 tu; rates estimated via ML
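The AICc weights on this slide can be computed as below. The log-likelihoods are hypothetical placeholders; in practice they would come from fitting both models to the trait data on each time-scaled tree. The parameter counts are also assumptions: BM is given 2 parameters and OU 3, a common convention.

```python
import math

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n taxa."""
    return -2 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: weight}, with weights summing to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-(s - best) / 2) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


n_taxa = 50
scores = {
    "BM": aicc(log_lik=-100.0, k=2, n=n_taxa),  # hypothetical fit
    "OU": aicc(log_lik=-99.5, k=3, n=n_taxa),   # hypothetical fit
}
weights = akaike_weights(scores)
print({m: round(w, 3) for m, w in weights.items()})  # {'BM': 0.653, 'OU': 0.347}
```

In this example OU fits slightly better in raw log-likelihood. The penalty for its extra parameter still leaves BM with most of the weight.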

Page 26: SVP 2012 Talk: Time-Scaling Trees in the Fossil Record

Conclusions

• New time-scaling method: cal3 (paleotree v1.5)

– Time-scaling calibrated with estimated rates of sampling, branching and extinction

• Fidelity of time-scaling can be decoupled from fidelity of comparative analyses

– Why? A certain je ne sais quoi of the time-scaling?

• Frequency of internal zero-length branches (ZLBs)? Balance of the total branch-length distribution?

– If analytical performance cannot be easily extrapolated or predicted, simulations are key

Thanks to M. Foote, E. King, J. Felsenstein, A. Haber, M. Pennell, G. Hunt, G. Lloyd, M. Friedman, P. Wagner, M. Webster, D. Jablonski, M. LaBarbera, K. Boyce, J. Mitchell, P. Harnik, G. Slater and the R-Sig-Phylo Email List!

Slightly Better Polytomy Resolver

(Figure: proportion of collapsed clades correctly resolved, by method: Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; timeLadderTree)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates estimated via ML

Fidelity of VCV Matrices: All High

(Figure: median random-skewer similarity of VCV matrices to the true VCV matrix, by method: Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; Basic Rand-Res)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates estimated via ML
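Random skewers is a general technique for comparing two matrices. Multiply both matrices by the same random unit vector (a ‘skewer’), take the cosine of the angle between the two response vectors, and average over many skewers. The sketch below is a generic pure-Python version of that technique. The talk's settings, such as the number of skewers, are not given, so the defaults here are assumptions.

```python
import math
import random

def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def random_skewers(m1, m2, n_skewers=1000, seed=0):
    """Mean cosine similarity of the responses of m1 and m2 to shared random skewers."""
    rng = random.Random(seed)
    dim = len(m1)
    total = 0.0
    for _ in range(n_skewers):
        skewer = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in skewer))
        skewer = [x / norm for x in skewer]  # random direction, unit length
        total += cosine(mat_vec(m1, skewer), mat_vec(m2, skewer))
    return total / n_skewers


# Hypothetical 3-taxon VCV matrices: the "true" one and an estimate.
true_vcv = [[3.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
est_vcv = [[3.5, 2.0, 0.0], [2.0, 3.5, 0.0], [0.0, 0.0, 2.5]]
similarity = random_skewers(true_vcv, est_vcv)
```

Because the measure uses only the direction of each response, a matrix scaled by a constant scores 1. The comparison is about the covariance structure, not the overall rate.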

Sampling Rate Conditioned (SRC) method gives the best estimates across sampling rates

100 trees each (not SRC), ~50 taxa (Lmy⁻¹)

Model Fitting At Other Parameters

(Figure: AICc weight of BM vs. OU, for a trait simulated under BM, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates estimated via ML

(Panel label: p = q = 0.1, r = 0.5 per Ltu)

(Figure: AICc weight of BM vs. OU, for a trait simulated under BM, by method: Basic Rand-Res; Cal3 w/ Ancestors; Cal3 w/o Ancestors; Cal3 w/o Ancestors Rand-Res; True Phylogeny)

100 simulation runs; samples of 20 trees; ~50 taxa; budding; p = q = r = 0.1 per Ltu; interval length = 5 tu; rates estimated via ML

(Panel label: Bifurcating Cladogenesis)

A Model of Unobserved History

Using R library paleotree (Bapst, 2012)

• Best fit with gamma models

Using R library paleotree (Bapst, 2012)

• Importance of Diversification: Twigs Matter!

• Calibrate with Three Rates: ‘Cal3’ time-scaling

– Sampling, branching and extinction rates

(Derived with Matt Pennell and Emily King)

Of Time and Trees

• Usually start with data like this…

– (Thanks to Melanie Hopkins for this example data!)

(Figure panels: Taxon Ranges; Unscaled Topology)

Hopkins, 2011 (Strict Consensus Tree)

• …but want a time-scaled tree for analyses

• How do we time-scale? What is the effect on analyses?

Note: for Hopkins (2011), time is measured in stratigraphic meters

• One solution uses a morphological ‘clock’-like approach

• But what about trees with no character-change information?

Previous Approaches

• Unit-length branches

• “Speciational” scale

• No actual time-scale!

– Is setting all branches equal a good approximation?

• Soft polytomies have to be randomly resolved beforehand

– (True of most methods)

(Figure: Scaled Consensus Tree from Hopkins, Polytomies Rand-Res.; axis: number of observed branching events)

• Basic approach

• Clade age = age of earliest observed descendant

• Creates many zero-length branches (ZLBs)

– Unrealistic

– Singular variance-covariance matrices: the math for BM, etc. doesn’t work

(Figure: Time-Scaled Consensus Tree from Hopkins, Polytomies Rand-Res.)
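Here is a small sketch of why ZLBs break BM calculations. Under Brownian motion, the covariance between two tips is the branch length they share from the root. Two sister taxa whose terminal branches both have zero length therefore get identical rows, and the VCV matrix becomes singular (zero determinant). The three-taxon tree and the edge names are hypothetical. Adding a small length to those branches, as in the ‘extend branches a small amount’ fix, makes the matrix invertible again.

```python
def vcv(paths):
    """Brownian-motion VCV matrix from root-to-tip paths.

    paths: {tip: [(edge_id, length), ...]} listing the edges from root to tip.
    Entry (i, j) is the total length of the edges shared by tips i and j.
    """
    tips = sorted(paths)
    matrix = []
    for a in tips:
        on_path_a = {edge for edge, _ in paths[a]}
        matrix.append([sum(length for edge, length in paths[b] if edge in on_path_a)
                       for b in tips])
    return tips, matrix


def det(matrix, tol=1e-12):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            return 0.0  # no usable pivot: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def tree_paths(terminal):
    """((A, B), C): A and B share a 3-unit stem; terminal = A and B tip lengths."""
    return {
        "A": [("stem_AB", 3.0), ("tip_A", terminal)],
        "B": [("stem_AB", 3.0), ("tip_B", terminal)],
        "C": [("tip_C", 3.0)],
    }


_, zlb_matrix = vcv(tree_paths(0.0))
_, fixed_matrix = vcv(tree_paths(0.1))
print(det(zlb_matrix))    # 0.0
print(det(fixed_matrix))  # about 1.83
```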

• Adding X to all branches (ABA), or just to very short branches

– X = an arbitrary number

• Avoids singularity

• Similar: MinBrLength

• Widely used, but how to pick X?

• Can push the root back unrealistically far

(Figure: Time-Scaled Consensus Tree from Hopkins, Polytomies Rand-Res.)
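A MinBrLength-style fix can be sketched on a basic-scaled tree stored as node ages (Ma before present) plus a parent-to-children map. The helper, the tree and the minimum length below are hypothetical; this is not paleotree's implementation. Tips keep their dates. Each ancestor is pushed back only as far as needed for every branch beneath it to reach the minimum length.

```python
def enforce_min_branch_length(children, ages, root, min_len):
    """Return new node ages in which every branch is at least min_len long.

    children: {internal node: [child, ...]}; ages: {node: Ma before present}
    (larger = older). Tips keep their dates; ancestors are pushed back in time.
    """
    new_ages = dict(ages)

    def visit(node):
        for child in children.get(node, []):
            visit(child)  # fix the subtree first, then make room above it
            new_ages[node] = max(new_ages[node], new_ages[child] + min_len)

    visit(root)
    return new_ages


# Basic-method ages for ((A, B), C): n1 and the root share A's age (ZLBs).
children = {"root": ["n1", "C"], "n1": ["A", "B"]}
ages = {"root": 12.0, "n1": 12.0, "A": 12.0, "B": 10.0, "C": 11.0}
fixed = enforce_min_branch_length(children, ages, "root", 1.0)
print(fixed)  # {'root': 14.0, 'n1': 13.0, 'A': 12.0, 'B': 10.0, 'C': 11.0}
```

Even a 1-Myr minimum pushes the root back by 2 Myr here, because the adjustments accumulate down each chain of ZLBs. The choice of X matters for the same reason.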

• ‘Equal’ method (Graeme Lloyd)

• Pull the root down by X, then redistribute time from earlier branches along the ZLBs

– X = an arbitrary number

• Widely used, but how to choose X?

(Figure: Time-Scaled Consensus Tree from Hopkins, Polytomies Rand-Res.)

New Method: Sampling Rate Conditioned

• Estimate the sampling rate, r

• Randomly pick ‘gaps’ in evolutionary history to scale branches, using L(gaps|r) as weights

• Repeat many times, make many trees!

– No single answer!

• Resolve polytomies, identify ancestors

– Stratolikelihood-esque

(Figures: Time-Scaled Consensus Tree from Hopkins, Polytomies NOT Rand-Res.; 25 Time-Scaled Trees, Polytomies NOT Rand-Res.)

But how do these methods perform? Let’s run some birth-death-sampling simulations and find out! (…For some stuff)

Quick Guide to Beanplots!

(Figure legend: estimated PDF of the observed data (KDE); true simulated value (OUR TARGET). Axes: methods, grouped by sampling rate; value.)

Using R library beanplot


Signal estimates are bad across the board; biased toward zero

100 trees each (not SRC), ~50 taxa (Lmy⁻¹)

• Implications for trait-evolution model-fitting in the fossil record?

• Bias toward low-signal models like OU?

High correlation generally; SRC performs worst

100 trees each (not SRC), ~50 taxa; only fully extinct clades; 1-Myr time bins (Lmy⁻¹)

• Similar results for first differences

• Correlation increases with the time-step size used

• Observed ranges are good

– True under a different sampling model?

– Clades with living descendants? (Lane et al., 2005)
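Here is a minimal sketch of the comparison. It assumes two diversity curves have already been tallied in the same 1-Myr time bins; the numbers are hypothetical. The code computes the Pearson correlation of the raw curves and of their first differences (bin-to-bin changes). Differencing removes shared long-term trends, which can otherwise inflate correlations between time series.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_differences(series):
    """Change in the series from each time bin to the next."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical diversity per 1-Myr bin: true curve vs. one from a time-scaled tree.
true_div = [2, 4, 7, 9, 8, 5, 3]
est_div = [1, 3, 5, 8, 8, 6, 3]
raw_r = pearson(true_div, est_div)
diff_r = pearson(first_differences(true_div), first_differences(est_div))
```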

Sampling Rate Conditioned method is better than randomly resolving

100 trees each, ~50 taxa, ~50% of nodes removed (Lmy⁻¹)

Results and Future Work

• Results

– New method: Sampling Rate Conditioned

– Good for BM rate, not so good for diversity curve

– No method is unbiased for estimating phylogenetic signal

• Future work

– Compare fidelity with poorly resolved trees

– Test possible bias in trait model-fitting analyses

• Simulations are necessary to understand the reliability of methods in paleobiology

• Code to be released soon in an R library

Thanks for code: G. Lloyd, G. Hunt. Thanks for data: Melanie Hopkins. Thanks for comments and ideas: M. Foote, M. Webster, D. Jablonski, E. King, P. Wagner, J. Mitchell, M. Friedman, G. Slater, M. Pennell, L. Harmon and the R-Sig-Phylo Email List!

– Effect of random zombie lineages on diversity-curve correlations

– Time-scale with the joint likelihood L(gaps, morph)

– Integrate with birth-death models of branch-length distributions for paleo trees

• Move up the tree, node by node

– Calculate likelihoods for each possible position of a node

– Randomly sample a position, using the likelihoods as weights (pick one by weighted random sampling)

• Repeat to produce a large sample of time-scaled trees

L(observed gap of length t) = r * exp(-r * t), where r = instantaneous sampling rate

(Figure: alternative positions of the node joining A and B, against time)

(Bapst, in prep. C)
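The weighting step can be sketched with Python's `random.choices`. For illustration, the sketch assumes each candidate node age implies a single observed gap of length t before the earliest descendant appears. The candidates are hypothetical. Each position is weighted by L = r * exp(-r * t), and one is drawn at random. The method in the talk combines gaps across the tree, which this sketch does not attempt.

```python
import math
import random

def gap_likelihood(t, r):
    """Likelihood of an observed gap of length t under instantaneous sampling rate r."""
    return r * math.exp(-r * t)


def pick_node_position(candidates, r, rng):
    """Draw one candidate node age, weighted by the likelihood of its gap.

    candidates: {candidate node age: implied gap length t}.
    """
    ages = list(candidates)
    weights = [gap_likelihood(candidates[a], r) for a in ages]
    return rng.choices(ages, weights=weights, k=1)[0]


# Hypothetical: the earliest descendant appears at 10 Ma; candidate node ages 10 to 14 Ma.
candidates = {age: age - 10.0 for age in [10.0, 11.0, 12.0, 13.0, 14.0]}
rng = random.Random(7)
draws = [pick_node_position(candidates, r=0.5, rng=rng) for _ in range(1000)]
print({a: draws.count(a) for a in candidates})  # shorter gaps are drawn most often
```

With a higher sampling rate r, the weights fall off faster and the sampled node ages cluster closer to the first appearance. This matches the expectation of less uncertainty in well-sampled records.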

Time-scaling Difficulties

• In an extinct clade of fossil taxa...

– Temporal placement of nodes is constrained only by the appearance of descendant taxa

(Figure: alternative node placements for taxa A, B and C against time)

– Ancestors are potentially among our sampled taxa (particularly in well-sampled clades)

(Figure panels for taxa A, B and C: ‘Budding’; Anagenesis)

Zipper Method: A Stochastic Solution

• Produces stochastic samples of time-scaled trees

• In each run, samples many hypotheses of branch lengths and (also) ancestor-descendant relationships, weighted by sampling probabilities

• Cannot reconstruct multi-budding scenarios

• Requires an integrated phylogenetic inference method

– As many truly interesting things do

Problems Created by a Really Big Supertree of Dead Plankton

• Dealing with topological uncertainty

– Need to resolve soft polytomies for time-scaling

– Random resolution can produce a poor overall fit to the observed sequence of appearances

(Figure: two alternative resolutions of taxa A, B, C and D against time)

• Developed an alternative method based on stochastic sampling (Bapst, in prep. B)

– Uses sampling rates in the fossil record, estimated from range data (Foote, 1997)

– Reconstructs more nodes correctly than randomly resolving nodes in simulated trees

• Evolutionary analyses must be repeated over large samples of potential topologies

• Additional uncertainties in time-scaling phylogenies of fossil taxa