finding good models of molecular evolution in phylogenetics - rob lanfear

Post on 17-Jul-2015

Finding good models of molecular evolution in phylogenetics

Rob Lanfear

Australian National University,

National Evolutionary Synthesis Centre, USA

Acknowledgements

Simon Ho

Stephane Guindon

Brett Calcott

Alexis Stamatakis

[Figure: an alignment of four DNA sequences, annotated with the three components of a substitution model.]

Rate Matrix: substitution rates between the bases A, G, C and T

Base Frequencies: πA + πC + πG + πT = 1

Site Rates: + I + G

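[Figure: the rate matrix drawn as the four bases A, G, C and T, with the six rates between them labelled a, b, c, d, e and f.]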

Model: rate matrix; base frequencies; site rates; free parameters

JC: a=b=c=d=e=f; πA=πC=πG=πT; no I or G; 0 free parameters

GTR+I+G: a, b, c, d, e, f; πA, πC, πG, πT; I, G; 10 free parameters

GTR: a, b, c, d, e, f; πA, πC, πG, πT; no I or G; 8 free parameters

HKY: a=c=d=f, b=e; πA, πC, πG, πT; no I or G; 4 free parameters
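
The free-parameter counts for these models follow from a simple rule, sketched below as an illustration rather than anything taken from the talk. It assumes the usual conventions: rates are relative, so one rate class is fixed; the four base frequencies sum to 1, so at most three are free; and +I and +G each add one parameter. The function name and its arguments are hypothetical.

```python
def free_parameters(rate_classes, equal_frequencies, invariant_sites, gamma):
    """Count the free parameters of a nucleotide substitution model.

    Hypothetical helper for illustration only.
    """
    # Rates are relative, so one rate class is fixed and the rest are free.
    k = rate_classes - 1
    # Four frequencies that sum to 1 leave three free, unless all are equal.
    if not equal_frequencies:
        k += 3
    # +I (invariant sites) and +G (gamma rate variation) add one parameter each.
    k += int(invariant_sites) + int(gamma)
    return k


models = {
    "JC": free_parameters(1, True, False, False),
    "HKY": free_parameters(2, False, False, False),
    "GTR": free_parameters(6, False, False, False),
    "GTR+I+G": free_parameters(6, False, True, True),
}

for name, k in models.items():
    print(name, k)
```

HKY has two rate classes because a=c=d=f form one class and b=e form the other.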

Model Selection

Compare all models.

1. Calculate the Likelihood of each model

2. Penalise models with more parameters, e.g. Bayesian Information Criterion (BIC)

3. Use the model with the smallest BIC score
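
The three steps above can be sketched in a few lines. This is a minimal illustration, assuming the standard definition BIC = -2 lnL + k ln(n), with n taken to be the number of alignment sites (in practice k usually also includes branch lengths, which are left out here for brevity). The log-likelihood values are made up for the example and are not results from the talk.

```python
import math


def bic(log_likelihood, n_parameters, n_sites):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_sites)


# Made-up (log-likelihood, free parameters) pairs, for illustration only.
candidates = {
    "JC": (-5200.0, 0),
    "HKY": (-5050.0, 4),
    "GTR": (-5045.0, 8),
    "GTR+I+G": (-5040.0, 10),
}
n_sites = 1000  # assumed alignment length

# Steps 1 and 2: score each model, penalising extra parameters.
scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}

# Step 3: keep the model with the smallest BIC score.
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

With these toy numbers the small likelihood gains of the richer models do not pay for their extra parameters, so a simpler model wins.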

The Problem

Almost always select GTR+I+G (the most complex model)

“like an overweight man shopping in the women's petites department” (Gatesy J, Trends Ecol Evol 2007, 22:509-10)

Partitioning

[Figure: the alignment analysed under a single GTR+I+G model (a, b, c, d, e, f; πA, πC, πG, πT; I, G; 10 free parameters), then split into subsets, each with its own GTR+I+G model (10 free parameters each).]

[Figure: an alignment of five species (Spp1 to Spp5) covering Gene 1, Gene 2 and Gene 3, divided into 9, 6 or 2 subsets.]

A Solution

Compare all possible partitioning schemes.

1. Calculate the Likelihood of each scheme

2. Penalise schemes with more parameters, e.g. Bayesian Information Criterion (BIC)

3. Use the scheme with the smallest BIC score
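
The same recipe applied to whole partitioning schemes might look like the sketch below. It rests on assumptions that are not stated in the slides: subsets share one tree, so the scheme's log-likelihood is the sum of the subset log-likelihoods; branch lengths are counted once; and n is the total number of sites. All numbers and names are made up for illustration.

```python
import math


def scheme_bic(subsets, n_branch_lengths):
    """BIC of a partitioning scheme (hypothetical helper).

    Each subset is a dict with its log-likelihood ("lnl"), the free
    parameters of its substitution model ("k") and its length ("sites").
    """
    lnl = sum(s["lnl"] for s in subsets)
    k = n_branch_lengths + sum(s["k"] for s in subsets)
    n = sum(s["sites"] for s in subsets)
    return -2.0 * lnl + k * math.log(n)


# Two made-up schemes for the same 3000-site alignment.
schemes = {
    "one_subset": [{"lnl": -9000.0, "k": 10, "sites": 3000}],
    "by_gene": [
        {"lnl": -2900.0, "k": 10, "sites": 1000},
        {"lnl": -2950.0, "k": 4, "sites": 1000},
        {"lnl": -3000.0, "k": 10, "sites": 1000},
    ],
}

n_branch_lengths = 7  # an unrooted tree of 5 species has 2 * 5 - 3 branches

scheme_scores = {
    name: scheme_bic(subsets, n_branch_lengths)
    for name, subsets in schemes.items()
}
best_scheme = min(scheme_scores, key=scheme_scores.get)
print(best_scheme)
```

Here the partitioned scheme wins because its likelihood gain outweighs the penalty for 14 extra parameters; with a smaller gain the single subset would be preferred.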

The Problem

Many models (HKY, GTR) for each subset

Many ways to partition a dataset
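
To give a sense of how many ways there are to partition a dataset: the number of ways to group n data blocks into subsets is the Bell number B(n). The sketch below computes Bell numbers with the Bell triangle. It is an illustration of the combinatorics only, not a description of how any particular program searches the schemes.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] using the Bell triangle."""
    bells = [1]  # B(0) = 1
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry adds the previous-row entry to its left neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


bells = bell_numbers(87)
for n_blocks in (3, 9, 87):
    n_schemes = bells[n_blocks]
    print(n_blocks, "data blocks:", len(str(n_schemes)), "digit(s) in the scheme count")
```

Even nine data blocks already allow 21,147 schemes, which is why exhaustive comparison quickly becomes impractical.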

PartitionFinder

www.robertlanfear.com/partitionfinder

15,404 sites from whole mitochondrial genomes

87 data blocks

8,000 unit improvement in BIC score

Future directions

1. Genome scale analyses (finished)

2. Cloud computing (started)

3. GUI (planned)

4. Better algorithms (planned)
