Post on 15-Jan-2016

Lecture 3

Molecular Evolution and Phylogeny

Facts on the molecular basis of life

• Every life form is genome based

• Genomes evolve

• There are large numbers of apparently homologous intra-genomic (paralog) and inter-genomic (ortholog) genes

• Some genes, especially those related to the function of transcription and translation, are common to ALL life forms

• The closer two organisms seem to be phylogenetically, the more similar their genomes and corresponding genes are

Central dogma of molecular biology

DNA → RNA → Protein

Basic assumptions of molecular evolution

• More closely related organisms have more similar genomes

• Highly similar genes are homologs (have the same ancestor)

• A universal ancestor exists for all life forms

• Molecular differences in homologous genes (or protein sequences) are positively correlated with evolution time

• Phylogenetic relations can be expressed by a dendrogram (a “tree”)

The five steps in phylogenetics dancing

Modified from Hillis et al., (1993). Methods in Enzymology 224, 456-487

[Flowchart]

• Sequence data → Align sequences → Phylogenetic signal? (Patterns → evolutionary processes?) → Choose a method → Calculate or estimate best fit tree → Test phylogenetic reliability

• Distance methods: distance calculation (which model?), then LS, ME or NJ

• Character-based methods: MP (weighting of sites and changes?), ML (model?), MB (model?)

• Single tree / optimality criterion

Why protein phylogenies?

• For historical reasons - first sequences...

• Most genes encode proteins...

• To study protein structure, function and evolution

• Comparing DNA and protein based phylogenies can be useful

• Different genes - e.g. 18S rRNA versus EF-2 protein

• Protein encoding gene - codons versus amino acids

Proteins were the first molecular sequences to be used for phylogenetic inference

Fitch and Margoliash (1967). Construction of phylogenetic trees. Science 155, 279-284.

Most of what follows is taken from:

Statistical Physics and Biological Information, Institute of Theoretical Physics, University of California at Santa Barbara, 2001 May 7

Understanding trees

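[Figure: rooted tree drawn against a time axis. Labels: Root, 30 Mya, 22 Mya, 7 Mya. Two drawings of the tree are marked "same as", i.e. they are equivalent.]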

Understanding trees #2

Understanding trees #3

Difference in homologous sequences is a measure of evolution time

Part of multiple sequence alignment of Mitochondrial Small Sub-Unit rRNA

Full length is ~ 950

11 primate species (order Primates) with mouse as outgroup

Change similarity matrix to distance matrix: d = 1 - S

From alignment construct pairwise distance

** Note: Alignment is not the only way to compute distance
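The conversion d = 1 - S can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names, the toy sequences, and the choice to skip gapped columns are all assumptions made for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Return d = 1 - S, where S is the fraction of identical sites.

    Columns where either sequence has a gap are skipped (an assumption
    of this sketch; other gap treatments are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    similarity = sum(a == b for a, b in pairs) / len(pairs)
    return 1.0 - similarity


def distance_matrix(alignment):
    """Build a symmetric pairwise distance table from {name: aligned sequence}."""
    names = list(alignment)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = p_distance(alignment[x], alignment[y])
    return dist


# Hypothetical toy alignment, not the primate rRNA data from the slides.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTAC-A"}
D = distance_matrix(toy)
```

Here `D[("s1", "s2")]` is 1/8 because one of eight aligned sites differs.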

Models of sequence evolution

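Jukes-Cantor (minimal) Model

All substitution rates $= \alpha$; all base frequencies $= 1/4$

Fraction of sites that differ between two sequences that diverged a time $t$ ago: $3\,P_{ij}(2t)$

Derivation of Jukes-Cantor formula

• Let the probability of a site being a given base at time $t$ be $P(t)$

• After elapsed time $\Delta t$: loss from mutating to the other three bases is $-3\alpha\,\Delta t\,P(t)$; gain from the other bases is $\alpha\,\Delta t\,(1 - P(t))$

• Hence $P(t + \Delta t) = P(t) - 3\alpha\,\Delta t\,P(t) + \alpha\,\Delta t\,(1 - P(t))$, so $dP(t)/dt = \alpha - 4\alpha P(t)$

• Write $P(t) = a\,e^{-bt} + c$; the solution is $b = 4\alpha$, $c = 1/4$, so $P(t) = a\,e^{-4\alpha t} + 1/4$

• If $P(0) = 1$, then $a = 3/4$. If $P(0) = 0$, then $a = -1/4$

• Finally $P_{\text{same}}(t) = 1/4 + (3/4)\,e^{-4\alpha t}$ and $P_{\text{change}}(t) = 1/4 - (1/4)\,e^{-4\alpha t}$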
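The two Jukes-Cantor probabilities can be checked numerically with a short sketch. Here `alpha` stands for the per-base substitution rate of the model. The distance correction d = -(3/4) ln(1 - 4p/3) is not written out on the slides; it is the standard consequence of setting the observed fraction of differing sites to p = 3 P_change(2t) and the expected number of substitutions per site to 6·alpha·t. Function names are illustrative.

```python
import math


def p_same(t, alpha):
    """Probability that a site holds the same base after time t."""
    return 0.25 + 0.75 * math.exp(-4.0 * alpha * t)


def p_change(t, alpha):
    """Probability that a site has changed to one specific other base after time t."""
    return 0.25 - 0.25 * math.exp(-4.0 * alpha * t)


def jc_distance(p):
    """Jukes-Cantor corrected distance from the observed fraction p of differing sites.

    Returns expected substitutions per site; undefined for p >= 3/4,
    where the sequences look as different as random ones.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Round trip: two lineages evolve for time t each, so total time is 2t.
alpha, t = 0.01, 5.0
observed = 3.0 * p_change(2.0 * t, alpha)  # fraction of differing sites
corrected = jc_distance(observed)          # recovers 6 * alpha * t
```

The corrected distance is always at least as large as the observed fraction, because it accounts for multiple substitutions at the same site.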

Transition (purine ↔ purine or pyrimidine ↔ pyrimidine): A ↔ G or C ↔ T

Transversion (purine ↔ pyrimidine): e.g. A ↔ T or C ↔ G

Hasegawa-Kishino-Yano model

Has a more general substitution rate
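To make the transition/transversion distinction concrete, the sketch below counts both kinds of difference between two aligned DNA sequences. It is only an illustration of the classification; it does not implement the Hasegawa-Kishino-Yano model, and the function names and example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Classify a pair of bases as 'same', 'transition' or 'transversion'."""
    if a == b:
        return "same"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"      # change within purines or within pyrimidines
    return "transversion"        # change between a purine and a pyrimidine


def count_changes(seq_a, seq_b):
    """Count site classes over aligned columns where both symbols are A, C, G or T."""
    counts = {"same": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq_a, seq_b):
        if a in "ACGT" and b in "ACGT":
            counts[classify(a, b)] += 1
    return counts


counts = count_changes("AACCGT", "AGCTTT")
```

Models with separate transition and transversion rates use exactly these two counts as their raw input.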

Part of Jukes-Cantor distance matrix for primate examples

(is much larger; for outgroup)

Matrix will be used for clustering methods

Clustering

UPGMA

Neighbor-Joining Method

N-J Method produces an Unrooted, Additive tree

Neighbor-Joining Method: An Example

0. Distance Matrix

What is required for the Neighbour joining method? Distance matrix

PAM        Spinach   Rice    Mosquito   Monkey   Human
Spinach      0.0     84.9     105.6      90.8     86.3
Rice        84.9      0.0     117.8     122.4    122.6
Mosquito   105.6    117.8       0.0      84.7     80.8
Monkey      90.8    122.4      84.7       0.0      3.3
Human       86.3    122.6      80.8       3.3      0.0

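1. First Step

PAM distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into MonHum and calculate the new distances. (Joining the raw minimum-distance pair is a simplification used in this example; the full Neighbor-Joining algorithm picks the pair using distances corrected for each taxon's average distance to all the others.)

[Figure: Monkey and Human joined as Mon-Hum; Spinach, Mosquito and Rice not yet joined]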

2. Calculation of New Distances

After we have joined two species in a subtree we have to compute the distances from every other node to the new subtree. We do this with a simple average of distances:

Dist[Spinach, MonHum] = (Dist[Spinach, Monkey] + Dist[Spinach, Human])/2 = (90.8 + 86.3)/2 = 88.55

[Figure: Spinach connected to the Mon-Hum subtree (Monkey, Human)]

3. Next Cycle

PAM        Spinach   Rice    Mosquito   MonHum
Spinach      0.0     84.9     105.6      88.6
Rice        84.9      0.0     117.8     122.5
Mosquito   105.6    117.8       0.0      82.8
MonHum      88.6    122.5      82.8       0.0

[Figure: Mosquito joined to Mon-Hum, giving Mos-(Mon-Hum); Spinach and Rice not yet joined]

4. Penultimate Cycle

PAM         Spinach   Rice    MosMonHum
Spinach       0.0     84.9      97.1
Rice         84.9      0.0     120.2
MosMonHum    97.1    120.2       0.0

[Figure: Spinach and Rice joined as Spin-Rice, alongside Mos-(Mon-Hum)]

5. Last Joining

PAM         SpinRice   MosMonHum
SpinRice       0.0       108.7
MosMonHum    108.7         0.0

[Figure: the two remaining subtrees joined as (Spin-Rice)-(Mos-(Mon-Hum))]

The result: Unrooted Neighbor-Joining Tree

[Figure: unrooted tree with topology ((Spinach, Rice), (Mosquito, (Monkey, Human)))]
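The worked example can be replayed with a short script. It follows the procedure exactly as the slides describe it: join the pair with the smallest distance, then set the distance from every other node to the new subtree to the simple average of the two old distances. This is a sketch of that simplified procedure only; it does not implement the corrected pair-selection criterion of the full Neighbor-Joining algorithm, and the function name is hypothetical.

```python
def cluster_by_simple_average(names, matrix):
    """Join the closest pair repeatedly; update distances by simple averaging.

    Returns the final tree as nested tuples and the list of joins
    as (left, right, distance) in the order they were made.
    """
    # Store each pairwise distance once, keyed by the unordered pair of nodes.
    dist = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                dist[frozenset((a, b))] = matrix[i][j]

    nodes = list(names)
    joins = []
    while len(nodes) > 1:
        # Pick the pair of current nodes with the minimum distance.
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda pair: dist[frozenset(pair)],
        )
        joins.append((a, b, dist[frozenset((a, b))]))
        new = (a, b)
        others = [n for n in nodes if n != a and n != b]
        # Distance to the new subtree = average of distances to its two parts.
        for n in others:
            dist[frozenset((new, n))] = (
                dist[frozenset((a, n))] + dist[frozenset((b, n))]
            ) / 2
        nodes = others + [new]
    return nodes[0], joins


names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
pam = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]
tree, joins = cluster_by_simple_average(names, pam)
```

The joins come out in the same order as on the slides: Monkey with Human, then Mosquito, then Spinach with Rice, then the two subtrees. The last distance is 108.6125 here, whereas the slides show 108.7 because they average the already rounded values 97.1 and 120.2.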

Bootstrapping
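The slide gives only the title, so the following is a minimal sketch of the usual nonparametric bootstrap, assuming the standard recipe: resample alignment columns with replacement, rebuild a tree from each resampled alignment, and count how often each clade reappears. Only the resampling step is shown; the function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length.

    The same column indices are used for every sequence, so each
    resampled column is an intact column of the original alignment.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n, seed=0):
    """Generate n pseudo-replicate alignments from one seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical toy alignment; each replicate would be fed to a tree-building method.
toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACCTACTA"}
replicates = bootstrap_replicates(toy_alignment, 3, seed=1)
```

A clade that appears in most of the replicate trees is considered well supported by the data.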

Why are trees not exact?

Pairwise distances usually not tree-like

Searching tree space

Maximum likelihood criterion

Parsimony criterion

Parsimony with molecular data

Parsimony criterion

Paul Higgs:

Is the best tree much better than others?

L: likelihood at nodes

Use Maximum Likelihood to rank alternate trees


same topology

NJ tree is 2nd best

Use Parsimony to rank alternate trees

different topology; parsimony differentiates weakly

Quartet puzzling

MCMC: Markov chain Monte Carlo

Topology probabilities according to MCMC

Clade probability compared from tree methods

NJ method is very fast and close to being the best

Lecture and Book

• Lecture by Paul Higgs

• online.itp.ucsb.edu/online/infobio01/higgs/

• see online.itp.ucsb.edu/online/infobio01/ for many lectures

• Book by Wen-Hsiong Li 李文雄

• “Molecular Evolution” (Sinauer Associates, 1997)

Some web sites on Molecular Evolution

• CMS Molecular Biology Resource

• www.unl.edu/stc-95/ResTools/cmshp.html

• Phylogeny - Molecular Evolution

• www.unl.edu/stc-95/ResTools/biotools/biotools2.html

• The Tree of Life Web Project

• tolweb.org/tree/phylogeny.html

• Web Resources in Molecular Evolution and Systematics

• darwin.eeb.uconn.edu/molecular-evolution.html

Some web sites on ClustalW

• On-line service

• www.ebi.ac.uk/clustalw/

• clustalw.genome.ad.jp/

• Software

• ftp-igbmc.u-strasbg.fr/pub/ClustalX/

• ftp-igbmc.u-strasbg.fr/pub/ClustalW/
