molecular clocks


Post on 18-Jan-2016


TRANSCRIPT

Page 1: Molecular Clocks

Molecular Clocks

Rose Hoberman

Page 2: Molecular Clocks

The Holy Grail

Fossil evidence is sparse and imprecise (or nonexistent)

Predict divergence times by comparing molecular data

Page 3: Molecular Clocks

• Given
– a phylogenetic tree
– branch lengths (rt)
– a time estimate for one (or more) nodes

[Figure: phylogenetic tree with tips labeled C, H, D, R, M and one node dated at 110 MYA]

• Can we date other nodes in the tree?

• Yes... if the rate of molecular change is constant across all branches

Page 4: Molecular Clocks

Page & Holmes p240

Rate Constancy?

Page 5: Molecular Clocks

Protein Variability

• Protein structures & functions differ
– Proportion of neutral sites differs
• Rate constancy does not hold across different protein types
• However...
– Each protein does appear to have a characteristic rate of evolution

Page 6: Molecular Clocks

Evidence for Rate Constancy in Hemoglobin

Page and Holmes p229

Large carnivorous marsupial

Page 7: Molecular Clocks

The Molecular Clock Hypothesis

• Amount of genetic difference between sequences is a function of time since separation.

• Rate of molecular change is constant (enough) to predict times of divergence

Page 8: Molecular Clocks

Outline

• Methods for estimating time under a molecular clock
– Estimating genetic distance
– Determining and using calibration points
– Sources of error
• Rate heterogeneity
– reasons for variation
– how it's taken into account when estimating times
• Reliability of time estimates
• Estimating gene duplication times

Page 9: Molecular Clocks

Measuring Evolutionary time with a molecular clock

1. Estimate genetic distance d = number of amino acid replacements

2. Use paleontological data to determine date of common ancestor

T = time since divergence

3. Estimate calibration rate (number of genetic changes expected per unit time)

r = d / 2T

4. Calculate time of divergence for novel sequences

T_ij = d_ij / 2r
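The four steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the distances and the 110 MYA calibration pairing below are made-up numbers, not data from the slides.

```python
def calibration_rate(d_calib, t_calib):
    """Rate per lineage per unit time from one calibration: r = d / (2T)."""
    return d_calib / (2.0 * t_calib)


def divergence_time(d_ij, rate):
    """Time since divergence of sequences i and j: T_ij = d_ij / (2r)."""
    return d_ij / (2.0 * rate)


# Hypothetical calibration pair: distance 0.22, fossil date 110 MYA
r = calibration_rate(0.22, 110.0)

# Hypothetical novel pair with distance 0.10
t_new = divergence_time(0.10, r)

print(f"rate = {r:.4f} per MY per lineage, divergence = {t_new:.1f} MYA")
```

The factor of 2 appears in both formulas because the distance d accumulates along two lineages since the common ancestor.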

Page 10: Molecular Clocks

Estimating Genetic Differences

If all nt equally likely, observed difference would plateau at 0.75

Simply counting differences underestimates distances

Fails to account for multiple hits

(Page & Holmes p148)
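One way to see the plateau and the size of the underestimate is the Jukes-Cantor correction. The slide does not name a model; Jukes-Cantor is assumed here because it is the simplest case where all nucleotides are equally likely. The function names are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites p for multiple hits.

    Assumes the Jukes-Cantor model (equal base frequencies, equal rates).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); at 0.75 the sequences are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def expected_difference(d):
    """Expected observed difference after d substitutions per site (same model)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for p in (0.05, 0.30, 0.60):
    print(f"observed {p:.2f} -> corrected {jukes_cantor_distance(p):.3f}")
```

For small p the correction is negligible, but it grows quickly as p approaches 0.75, where the observed difference stops carrying information about the true distance.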

Page 11: Molecular Clocks

Estimating Genetic Distance with a Substitution Model

• accounts for relative frequency of different types of substitutions

• allows variation in substitution rates between sites

• given learned parameter values
– nucleotide frequencies
– transition/transversion bias
– alpha parameter of gamma distribution

• can infer branch length from differences

Page 12: Molecular Clocks

Distances from Gamma-Distributed Rates

• rate variation among sites
– “fast/variable” sites
  • 3rd codon positions
  • codons on surface of globular protein
– “slow/invariant” sites
  • Tryptophan (1 codon) structurally required
  • 1st or 2nd codon position when disulfide bond needed
• alpha parameter of gamma distribution describes degree of variation of rates across positions
• modeling rate variation changes the branch length / sequence difference curve

Page 13: Molecular Clocks

Gamma Corrected Distances

• high rate sites saturate quickly

• sequence difference rises much more slowly as the low-rate sites gradually accumulate differences

• Felsenstein Inferring Phylogenies p219
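The effect of gamma-distributed rates on the distance estimate can be illustrated with the gamma-corrected Jukes-Cantor distance. The choice of Jukes-Cantor as the base model and the alpha values are assumptions for illustration; the slides do not specify them.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance with one rate for all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    Small alpha means strong rate variation; large alpha approaches jc_distance.
    """
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.30  # hypothetical observed proportion of differing sites
for alpha in (0.5, 1.0, 5.0):
    print(f"alpha={alpha}: d = {jc_gamma_distance(p, alpha):.3f}")
print(f"no rate variation: d = {jc_distance(p):.3f}")
```

For the same observed difference, stronger rate variation (smaller alpha) implies a longer inferred branch, because the fast sites have already saturated.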

Page 14: Molecular Clocks

The ‘Sloppy’ Clock

• ‘Ticks’ are stochastic, not deterministic
– Mutations happen randomly according to a Poisson distribution.
• Many divergence times can result in the same number of mutations
• Actually over-dispersed Poisson
– Correlations due to structural constraints

Page 15: Molecular Clocks

Poisson Variance (Assuming a Perfect Molecular Clock)

If one mutation occurs per MY on average:
• Poisson variance
– 95% of lineages 15 MY old have 8-22 substitutions
– 8 substitutions could also be 5 MY old

Molecular Systematics p532

Page 16: Molecular Clocks

Need for Calibrations

• Changes = rate * time
• Can explain any observed branch length
– Fast rate, short time
– Slow rate, long time
• Suppose 16 changes along a branch
– Could be 2 * 8 or 8 * 2
– No way to distinguish
– If told time = 8, then rate = 2
• Assume rate = 2 along all branches
– Can infer all times

Page 17: Molecular Clocks

Estimating Calibration Rate

• Calculate separate rate for each data set (species/genes) using known date of divergence (from fossil, biogeography)

• One calibration point
– Rate = d/2T
• More than one calibration point
– use regression
– use generative model that constrains time estimates (more later)

Page 18: Molecular Clocks

Calibration Complexities

• Cannot date fossils perfectly

• Fossils usually not direct ancestors
– branched off tree before (after?) splitting event.

• Impossible to pinpoint the age of last common ancestor of a group of living species

Page 19: Molecular Clocks

Linear Regression

• Fix intercept at (0,0)
• Fit line between divergence estimates and calibration times
• Calculate regression and prediction confidence limits

Molecular Systematics p536
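A minimal sketch of the fit with the intercept fixed at the origin follows. The calibration times and distances are invented for illustration, the function names are hypothetical, and the confidence limits mentioned above are not computed here.

```python
def fit_through_origin(times, distances):
    """Least-squares slope for distance = slope * time, intercept fixed at 0."""
    sxy = sum(t * d for t, d in zip(times, distances))
    sxx = sum(t * t for t in times)
    return sxy / sxx


def predict_time(distance, slope):
    """Invert the fitted line to date a pair with a known distance."""
    return distance / slope


# Hypothetical calibration points: (divergence time in MY, genetic distance)
calib_times = [20.0, 50.0, 80.0, 110.0]
calib_dists = [0.041, 0.098, 0.165, 0.218]

slope = fit_through_origin(calib_times, calib_dists)
print(f"slope = {slope:.5f} distance units per MY")
print(f"distance 0.120 -> {predict_time(0.120, slope):.1f} MY")
```

With pairwise distances on the y-axis the slope corresponds to 2r, since both lineages contribute to the divergence.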

Page 20: Molecular Clocks

Molecular Dating: Sources of Error

• Both X and Y values only estimates
– substitution model could be incorrect
– tree could be incorrect
– errors in orthology assignment
– Poisson variance is large
• Pairwise divergences correlated (Systematics p534?)
– inflates correlation between divergence & time
• Sometimes calibrations correlated
– if using derived calibration points
• Error in inferring slope
• Confidence interval for predictions much larger than confidence interval for slope

Page 21: Molecular Clocks

Rate Heterogeneity

• Rate of molecular evolution can differ between
– nucleotide positions
– genes
– genomic regions
– genomes (nuclear vs organelle)
– species
– over time

• If not considered, introduces bias into time estimates

Page 22: Molecular Clocks

Rate Heterogeneity among Lineages

Cause | Reason
Repair equipment | e.g. RNA viruses have error-prone polymerases
Metabolic rate | More free radicals
Generation time | Copies DNA more frequently
Population size | Affects mutation fixation rate

Page 23: Molecular Clocks

Local Clocks?

• Closely related species often share similar properties, likely to have similar rates

• For example
– murid rodents on average 2-6 times faster than apes and humans (Graur & Li p150)
– mouse and rat rates are nearly equal (Graur & Li p146)

Page 24: Molecular Clocks

Rate Changes within a Lineage

Cause | Reason
Population size changes | Genetic drift more likely to fix neutral alleles in small population
Strength of selection changes over time | 1. new role/environment; 2. gene duplication; 3. change in another gene

Page 25: Molecular Clocks

Working Around Rate Heterogeneity

1. Identify lineages that deviate and remove them

2. Quantify degree of rate variation to put limits on possible divergence dates
– requires several calibration dates, not always available
– gives very conservative estimates of molecular dates

3. Explicitly model rate variation

Page 26: Molecular Clocks

Search for Genes with Uniform Rate across Taxa

Many ‘clock’ tests:

– Relative rates tests
  • compares rates of sister nodes using an outgroup
– Tajima test
  • Number of sites in which character shared by outgroup and only one of two ingroups should be equal for both ingroups
– Branch length test
  • deviation of distance from root to leaf compared to average distance
– Likelihood ratio test
  • identifies deviance from clock but not the deviant sequences
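As an illustration of the Tajima test described above, the sketch below counts the two kinds of informative sites and compares them with a one-degree-of-freedom chi-square statistic. The sequences are toy data and the function name is hypothetical; skipping sites that contain a gap character is an assumption of this sketch.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Count lineage-specific changes and test whether the two counts are equal.

    m1: sites where seq2 matches the outgroup but seq1 does not.
    m2: sites where seq1 matches the outgroup but seq2 does not.
    Returns (m1, m2, chi-square statistic, p-value with 1 degree of freedom).
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # assumption: ignore alignment gaps
        if a != o and b == o:
            m1 += 1
        elif b != o and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 df
    return m1, m2, chi2, p_value


# Toy aligned sequences (hypothetical)
result = tajima_relative_rate("ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAA")
print(result)
```

With so few informative sites the test has essentially no power, which is the practical limitation of such tests on short sequences.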

Page 27: Molecular Clocks

Likelihood Ratio Test

• estimate a phylogeny under molecular clock and without it
– e.g. root-to-tip distances must be equal
• twice the difference in log-likelihood ~ Chi^2 with n-2 degrees of freedom (n = number of sequences)
– asymptotically
– when models are nested
– when nested parameters aren’t set to boundary
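In its usual form the test compares 2 * (lnL without clock - lnL with clock) against a chi-square distribution with n-2 degrees of freedom, where n is the number of sequences. The sketch below uses only the standard library; the log-likelihood values are invented and the function names are hypothetical.

```python
import math


def chi2_sf(value, df):
    """Upper tail probability of a chi-square distribution with integer df >= 1."""
    if value <= 0.0:
        return 1.0
    x = value / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    # ... then step the shape parameter up to df / 2 using the gamma recurrence.
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Likelihood ratio test of the molecular clock for n_taxa sequences."""
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for an 8-taxon data set
stat, df, p = clock_lrt(lnl_clock=-5432.1, lnl_free=-5420.3, n_taxa=8)
print(f"2*delta lnL = {stat:.1f}, df = {df}, p = {p:.2g}")
```

A small p-value rejects the clock for the data set as a whole, without saying which lineages are responsible.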

Page 28: Molecular Clocks

Relative Rates Tests

• Tests whether distances between two taxa and an outgroup are equal (or average rate of two clades vs an outgroup)
– need to compute expected variance
– many triples to consider, and not independent
• Lacks power, esp.
– short sequences
– low rates of change
• Given length and number of variable sites in typical sequences used for dating, Bromham et al. (2000) say:
– unlikely to detect moderate variation between lineages (1.5-4x)
– likely to result in substantial error in date estimates

Page 29: Molecular Clocks

Modeling Rate Variation: Relaxing the Molecular Clock

• Learn rates and times, not just branch lengths
– Assume root-to-tip times equal
– Allow different rates on different branches
– Rates of descendants correlate with that of common ancestor
• Restricts choice of rates, but still too much flexibility to choose rates well

[Figure: example tree with node labels A, B, C, D, E, F, M, N, R]

Page 30: Molecular Clocks

Relaxing the Molecular Clock

• Likelihood analysis
– Assign each branch a rate parameter
  • explosion of parameters, not realistic
– User can partition branches based on domain knowledge
– Rates of partitions are independent
• Nonparametric methods
– smooth rates along tree
• Bayesian approach
– stochastic model of evolutionary change
– prior distribution of rates
– Bayes theorem
– MCMC

Page 31: Molecular Clocks

Parsimonious Approaches

• Sanderson 1997, 2002
– infer branch lengths via parsimony
– fit divergence times to minimize difference between rates in successive branches
– (unique solution?)
• Cutler 2000
– infer branch lengths via parsimony
– rates drawn from a normal distribution (negative rates set to zero)
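The idea of fitting divergence times so that rates change as little as possible between successive branches can be shown on a toy tree ((A,B),C) with a fixed root age. This is a simplified sketch, not Sanderson's implementation: the squared-difference penalty, the grid search, and all branch lengths are assumptions for illustration.

```python
def rate_roughness(t, root_age, b_a, b_b, b_int, b_c):
    """Sum of squared rate differences between neighbouring branches.

    t is the age of the (A,B) ancestor; b_* are branch lengths in substitutions.
    """
    r_a = b_a / t                       # branch from (A,B) ancestor to A
    r_b = b_b / t                       # branch from (A,B) ancestor to B
    r_int = b_int / (root_age - t)      # branch from root to (A,B) ancestor
    r_c = b_c / root_age                # branch from root to C
    return (r_a - r_int) ** 2 + (r_b - r_int) ** 2 + (r_int - r_c) ** 2


def best_node_age(root_age, b_a, b_b, b_int, b_c, steps=1000):
    """Grid search for the node age that gives the smoothest rates."""
    candidates = [root_age * k / steps for k in range(1, steps)]
    return min(candidates,
               key=lambda t: rate_roughness(t, root_age, b_a, b_b, b_int, b_c))


# Clock-like branch lengths: the node sits halfway between root and tips
clocklike = best_node_age(20.0, b_a=10, b_b=10, b_int=10, b_c=20)

# Longer A and B branches: the fit pushes the node back in time
fast_clade = best_node_age(20.0, b_a=14, b_b=14, b_int=10, b_c=20)

print(clocklike, fast_clade)
```

When the branch lengths are clock-like, the penalty reaches zero and the estimate matches a strict clock; otherwise the node age is a compromise between neighbouring rates.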

Page 32: Molecular Clocks

Bayesian Approaches

Learn rates, times, and substitution parameters simultaneously

Devise model of relationship between rates
– Thorne/Kishino et al
  • Assigns new rates to descendant lineages from a lognormal distribution with mean equal to ancestral rate and variance increasing with branch length
– Huelsenbeck et al
  • Poisson process generates random rate changes along tree
  • new rate is current rate * gamma-distributed random variable
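The lognormal rate model can be sketched by drawing a descendant rate given its ancestor's rate. The slides do not give the parameterization; here the variance of the log rate is assumed to be nu * branch_time, and the log-scale mean is shifted by half that variance so that the expected descendant rate equals the ancestral rate. The name nu and the numbers are hypothetical.

```python
import math
import random


def descendant_rate(ancestral_rate, branch_time, nu, rng):
    """Draw a descendant rate from a lognormal centred on the ancestral rate.

    The log-scale variance grows linearly with branch_time (assumption).
    """
    variance = nu * branch_time
    mu = math.log(ancestral_rate) - variance / 2.0  # keeps E[rate] = ancestral_rate
    return rng.lognormvariate(mu, math.sqrt(variance))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [descendant_rate(1.0, 10.0, 0.01, rng) for _ in range(20000)]
mean_rate = sum(samples) / len(samples)

print(f"mean descendant rate = {mean_rate:.3f}")
print(f"range = {min(samples):.2f} to {max(samples):.2f}")
```

Longer branches or a larger nu widen the spread of descendant rates, which is how such a model lets closely related lineages keep similar rates while distant ones drift apart.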

Page 33: Molecular Clocks

Comparison of Likelihood & Bayesian Approaches for Estimating Divergence Times (Yang & Yoder 2003)

• Analyzed two mitochondrial genes
– each codon position treated separately
– tested different model assumptions
– used 7 calibration points
• Neither model reliable when
– using only one codon position
– using a single model for all positions
• Results similar for both methods
– using the most complex model
– use separate parameters for each codon position (could use codon model?)

Page 34: Molecular Clocks

Sources of Error/Variance

• Lack of rate constancy (due to lineage, population size or selection effects)
• Wrong assumptions in evolutionary model
• Errors in orthology assignment
• Incorrect tree
• Stochastic variability
• Imprecision of calibration points
• Imprecision of regression
• Human sloppiness in analysis
– self-fulfilling prophecies

Page 35: Molecular Clocks

Reading the entrails of chickens (Graur and Martin 2004)

• single calibration point
• error bars removed from calibration points
• standard error bars instead of 95% confidence intervals
• secondary/tertiary calibration points treated as reliable and precise
– based on incorrect initial estimates
– variance increases with distance from original estimate
• few proteins used

Page 36: Molecular Clocks

Multiple Gene Loci

• “Trying to estimate time of divergence from one protein is like trying to estimate the average height of humans by measuring one human”

--Molecular Systematics p539

Use multiple genes!

(and multiple calibration points)

Page 37: Molecular Clocks

Even so... Be Very Wary of Molecular Times

• Point estimates are absurd

• Sample errors often based only on the difference between estimates in the same study

• Even estimates with confidence intervals unlikely to really capture all sources of variance

Page 38: Molecular Clocks

McLysaght, Hokamp, Wolfe 2002: Dating Human Gene Duplications

• [758] Trees generated (ML method using PAM matrix)
• [602] Alpha parameter for gamma distribution learned
– (Gu and Zhang 1997) faster than ML, more accurate than parsimony
– Thrown out if variance > mean. Why would this happen?
– “May be problematic to apply this model for gene family evolution because of the possible functional divergence among paralogous genes”
• [481] NJ trees built from Gamma-corrected distances
– Family kept only if worm/fly group together
• [191] Two-cluster test of rate constancy (Takezaki et al 1995)

Page 39: Molecular Clocks

Blanc, Hokamp, Wolfe: Dating Arabidopsis Duplications

• Create nucleotide alignments
• Estimate “Level of” Synonymous substitutions (Yang’s ML method)
– per site? per synonymous site?
• Ks values > 10 ignored (Yang; Anisimova)
• Why used different method than for human?
• How reliable is ranking of Ks values? How much variance expected?

Page 40: Molecular Clocks

Ks > 10 unreliable ?

• Yang (abstract) calculates effect of evolutionary rate on accuracy of phylogenetic reconstruction

• Anisimova calculates accuracy and power of LRT in detecting adaptive molecular evolution

• Neither seems to give any cutoff regarding dS > 10.

Page 41: Molecular Clocks

Future Improvements

• Calculate accurate confidence intervals taking into account multiple sources of variance

• Novel models that account for variation in rates between taxa

• Build explicit models that predict rates based on an understanding of the underlying processes that generate differences in substitutions rates

Page 42: Molecular Clocks

General References

Reviews/Critiques
1. Bromham and Penny. The modern molecular clock. Nature Reviews Genetics, 2003.
2. Graur and Martin. Reading the entrails of chickens...the illusion of precision. Trends in Genetics, 2004.

Textbooks:
1. Molecular Systematics. 2nd edition. Edited by Hillis, Moritz, and Mable.
2. Inferring Phylogenies. Felsenstein.
3. Molecular Evolution, a phylogenetic approach. Page and Holmes.

Page 43: Molecular Clocks

Rate Heterogeneity References

Dealing with Rate Heterogeneity
1. Yang and Yoder. Comparison of likelihood and bayesian methods for estimating divergence times... Syst. Biol., 2003.
2. Kishino, Thorne, and Bruno. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol., 2001.
3. Huelsenbeck, Larget, and Swofford. A compound Poisson process for relaxing the molecular clock. Genetics, 2000.

Testing for Rate Heterogeneity
1. Takezaki, Rzhetsky and Nei. Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol., 1995.
2. Bromham, Penny, Rambaut, and Hendy. The power of relative rates tests depends on the data. J. Mol. Evol., 2000.

Page 44: Molecular Clocks

Dating Duplications References

Dating duplications:
• McLysaght, Hokamp, and Wolfe. Extensive genomic duplication during early chordate evolution. Nature Genetics, 2002.
• Blanc, Hokamp, and Wolfe. Recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Research, 2003.

References used for dating duplications in above papers
• Gu and Zhang. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol., 1997.
• Yang Z. On the best evolutionary rate for phylogenetic analysis. Syst. Biol., 1998.
• Anisimova, Bielawski, Yang. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol. Biol. Evol., 2001.