Phylogenetic trees

Post on 21-Dec-2015


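[Figure: four drawings of a tree on Chimp, Human, and Gorilla, joined by equals signs. The left-to-right order of the leaves differs between drawings, but they depict the same tree.]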

Terminology

Trees:

A branch = an edge

External node = leaf

Internal nodes

The root

[Figure: a tree on Human, Chimp, Chicken, and Gorilla with a branch, a leaf, the internal nodes, and the root labeled.]

Ingroup / Outgroup:

[Figure: the same tree on Human, Chimp, Chicken, and Gorilla, with the leaves marked INGROUP and OUTGROUP.]

The maximum parsimony principle.

(The shortest path)

Modified from Inferring Phylogenies (book) by Prof. Joe Felsenstein

Genes: 0 = absence, 1 = presence

species  g1 g2 g3 g4 g5 g6
s1       1  0  0  1  1  0
s2       0  0  1  0  0  0
s3       1  1  0  0  0  0
s4       1  1  0  1  1  1
s5       0  0  1  1  1  0

Evaluate this tree…

[Figure: a rooted tree with leaves in the order s1, s4, s3, s2, s5.]

Gene number 1

s1 s4 s3 s2 s5
1  1  1  0  0

Gene number 1: the most parsimonious ancestral character states

[Figure: the tree with leaf states 1 1 1 0 0 (s1, s4, s3, s2, s5), shown with two alternative assignments of states to the internal nodes: Option number 1 and Option number 2.]

Minimal number of changes for gene 1 (character 1) = 1

Gene number 2

s1 s4 s3 s2 s5
0  1  1  0  0

[Figure: the tree with these leaf states, shown with three alternative assignments of states to the internal nodes: Option number 1, Option number 2, and Option number 3.]

Number of changes for gene 2 (character 2) = 2
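The options for genes 1 and 2 can be reproduced by brute force. The following is a minimal Python sketch, assuming the tree is (((s1,s4),s3),(s2,s5)). The figure is not part of the transcript, so this topology is an assumption, and the internal node names a, b, c, r are hypothetical labels. It tries every 0/1 labeling of the internal nodes and keeps the labelings with the fewest changes.

```python
from itertools import product

# ASSUMPTION: topology (((s1,s4),s3),(s2,s5)); a, b, c, r are made-up
# names for the internal nodes. Each edge is (parent, child).
EDGES = [("a", "s1"), ("a", "s4"), ("b", "a"), ("b", "s3"),
         ("c", "s2"), ("c", "s5"), ("r", "b"), ("r", "c")]
INTERNAL = ["a", "b", "c", "r"]


def best_assignments(leaf_states):
    """Return (minimal number of changes, list of optimal labelings).

    Each labeling is a tuple of states for the nodes in INTERNAL.
    """
    best, winners = None, []
    for values in product((0, 1), repeat=len(INTERNAL)):
        state = {**leaf_states, **dict(zip(INTERNAL, values))}
        # A change is an edge whose two endpoints carry different states.
        changes = sum(state[parent] != state[child] for parent, child in EDGES)
        if best is None or changes < best:
            best, winners = changes, [values]
        elif changes == best:
            winners.append(values)
    return best, winners


gene1 = {"s1": 1, "s2": 0, "s3": 1, "s4": 1, "s5": 0}
gene2 = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 0}

for name, gene in (("gene 1", gene1), ("gene 2", gene2)):
    changes, labelings = best_assignments(gene)
    print(name, "minimal changes:", changes, "optimal labelings:", len(labelings))
```

Under the assumed topology this gives a minimum of 1 change for gene 1 with two equally good labelings, and 2 changes for gene 2 with three equally good labelings, which agrees with the number of options shown for each gene.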

Total number of changes given the tree

Genes: 0 = absence, 1 = presence

species  g1 g2 g3 g4 g5 g6
s1       1  0  0  1  1  0
s2       0  0  1  0  0  0
s3       1  1  0  0  0  0
s4       1  1  0  1  1  1
s5       0  0  1  1  1  0
changes  1  2  1  2  2  1

Sum of changes = 9

Can we do better?

YES WE CAN!

[Figure: two tree topologies compared. The tree evaluated so far has sum of changes = 9; an alternative topology has sum of changes = 8.]

The MP (most parsimonious) tree:

[Figure: the MP tree, with leaves in the order s1, s4, s3, s2, s5.]

Sum of changes for this tree topology = 8
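The two sums can be checked mechanically. The sketch below counts changes per gene with a small Fitch-style recursion on two topologies. Both topologies are assumptions, since the figures are not in the transcript: they were chosen because they match the leaf order s1, s4, s3, s2, s5 and reproduce the reported totals of 9 and 8.

```python
GENES = {
    "s1": "100110",
    "s2": "001000",
    "s3": "110000",
    "s4": "110111",
    "s5": "001110",
}


def fitch(tree, column):
    """Return (state set, number of changes) for one character.

    `tree` is a leaf name or a pair (left, right); `column` maps each
    leaf name to its state for the character being scored.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def changes_per_character(tree, data):
    """List of minimal change counts, one per character (column)."""
    length = len(next(iter(data.values())))
    return [fitch(tree, {name: seq[i] for name, seq in data.items()})[1]
            for i in range(length)]


# ASSUMED topologies (not recoverable from the slides).
first_tree = ((("s1", "s4"), "s3"), ("s2", "s5"))
alternative_tree = (("s1", ("s4", "s3")), ("s2", "s5"))

for label, tree in (("first tree", first_tree), ("alternative", alternative_tree)):
    per_gene = changes_per_character(tree, GENES)
    print(label, per_gene, "sum =", sum(per_gene))
```

With these assumed topologies the first tree gives 1, 2, 1, 2, 2, 1 (sum 9) and the alternative gives a sum of 8. The only difference is gene 2, which needs a single change once s3 and s4 are neighbors.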

Intermediate Summary

MP tree = the tree that needs the minimal number of changes to explain the data

We can now search for the best tree under the MP criterion

Challenges

Evaluating a big tree “by hand” can be problematic. We want the computer to do it.

Going over all the trees? How many trees are there?

Can we generalize to nucleotides? To amino acids?

Is the parsimony criterion ideal?
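On the question of how many trees there are: for n labeled taxa, the number of unrooted binary trees is (2n-5)!! and the number of rooted binary trees is (2n-3)!!. This is a standard counting result that the slides do not state. A short Python sketch, with illustrative function names:

```python
def odd_double_factorial(k):
    """Product 1 * 3 * 5 * ... * k for odd k (1 when k <= 1)."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted binary trees on n_taxa labeled leaves."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa):
    """Number of rooted binary trees on n_taxa labeled leaves."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Five taxa already give 15 unrooted trees, and ten taxa give more than two million, so going over all the trees quickly becomes impractical.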

Positions:

species  p1 p2 p3 p4 p5 p6
s1       A  A  G  T  A  A
s2       C  A  A  A  A  C
s3       C  A  G  G  A  A
s4       A  A  A  T  A  C
s5       G  C  G  C  C  A

Position number 1

s1 s4 s3 s2 s5
A  A  C  C  G

[Figure: the tree with these leaf states and the inferred states at the internal nodes.]

Number of changes for position 1 = 2

Exercise

Find the MP score of the tree for these sequences

[Figure: a tree whose five leaves (Human, Chimp, Chicken, Gorilla, Duck) carry the sequences GACA, GGGA, CAAG, GCGA, GAAA.]

How to efficiently compute the MP score of a tree

The Fitch algorithm (1971):

Postorder tree scan. At each internal node, if the intersection of its children's state sets is empty, we apply a union operator. Otherwise, an intersection.

[Figure: a tree on Human, Chimp, Chicken, Gorilla, and Duck with leaf states A, G, C, C, A. The internal nodes are labeled {A,G}, {A,C,G}, {A,C}, and {A,C}.]

Total number of changes = number of union operators.
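The postorder scan can be sketched in a few lines of Python. The topology used here, (((Human,Chimp),Gorilla),(Chicken,Duck)), is an assumption: the figure is not in the transcript, and this is the arrangement that yields the four internal sets {A,G}, {A,C,G}, {A,C}, {A,C} from the leaf states A, G, C, C, A. The function and variable names are illustrative.

```python
def fitch_sets(tree, states, trace):
    """Postorder scan of a rooted binary tree.

    `tree` is a leaf name or a pair (left, right). Returns the state set
    of the node and appends (state set, operator) to `trace` for every
    internal node, in the order the nodes are finished.
    """
    if isinstance(tree, str):
        return {states[tree]}
    left = fitch_sets(tree[0], states, trace)
    right = fitch_sets(tree[1], states, trace)
    common = left & right
    if common:
        trace.append((common, "intersection"))
        return common
    merged = left | right
    trace.append((merged, "union"))
    return merged


# ASSUMED topology, chosen to be consistent with the sets on the slide.
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

trace = []
root_set = fitch_sets(tree, states, trace)
score = sum(operator == "union" for _, operator in trace)

for node_set, operator in trace:
    print(sorted(node_set), operator)
print("MP score:", score)
```

On this assumed tree the scan applies three unions and one intersection (at the root), so the MP score for the character is 3.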

Positions:

species  p1 p2 p3 p4 p5 p6
Human    A  A  G  T  A  A
Chimp    A  A  T  T  A  C
Gorilla  A  C  A  T  A  A

[Figure: for each position, the three possible rooted topologies on Chimp (C), Human (H), and Gorilla (G), with the leaf states of that position.]

Position 1 (A, A, A): total number of changes = 0 for all 3 possible tree topologies.

Position 2 (A, A, C): total number of changes = 1 for all 3 possible tree topologies.

Position 3 (G, T, A): total number of changes = 2 for all 3 possible tree topologies.

With three taxa, the total number of changes is always the same for all 3 possible tree topologies.
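The three-taxon claim can be checked for every position of the Human, Chimp, Gorilla table. This is a minimal Python sketch using a Fitch-style count; the function names are illustrative and not from the slides.

```python
ALIGNMENT = {"Human": "AAGTAA", "Chimp": "AATTAC", "Gorilla": "ACATAA"}


def fitch(tree, column):
    """Return (state set, number of changes) for one position."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def changes_per_position(tree, alignment):
    """Minimal number of changes at each position of the alignment."""
    length = len(next(iter(alignment.values())))
    return [fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
            for i in range(length)]


# The three rooted topologies on three taxa.
TOPOLOGIES = [
    (("Chimp", "Human"), "Gorilla"),
    (("Gorilla", "Chimp"), "Human"),
    (("Human", "Gorilla"), "Chimp"),
]

for tree in TOPOLOGIES:
    per_position = changes_per_position(tree, ALIGNMENT)
    print(tree, per_position, "total =", sum(per_position))
```

Every topology gives the same per-position counts (0, 1, 2, 0, 0, 1), so three sequences alone cannot favor one tree over another under parsimony.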

[Figure: rooted trees on Gorilla (G), Orangutan (O), Human (H), and Chimp (C), together with an unrooted tree on the same four taxa whose five branches are numbered 1 to 5.]

Conclusion

The position of the root does not affect the MP score.

Rooting does not change MP score

[Figure: two examples of a tree on Chimp, Orangutan, Gorilla, and Human, with nucleotide states (A, C, G) at the leaves and internal nodes, drawn under different rootings.]

After “bending” the trees, the association of changes and branches does not change!
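The invariance can be tried out by placing the root on each branch of an unrooted tree in turn and scoring every rooted version. In the sketch below the unrooted topology, the internal node names u and v, and the leaf states are all hypothetical, picked only for illustration because the figure contents are not recoverable.

```python
# ASSUMPTION: unrooted tree with Chimp and Human on one side and Gorilla
# and Orangutan on the other; u and v are made-up internal node names.
EDGES = [("Chimp", "u"), ("Human", "u"), ("u", "v"),
         ("Gorilla", "v"), ("Orangutan", "v")]
# Hypothetical states for a single character.
STATES = {"Chimp": "C", "Human": "C", "Gorilla": "G", "Orangutan": "A"}


def adjacency(edges):
    """Neighbor lists of an unrooted tree given as a list of edges."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).append(y)
        adj.setdefault(y, []).append(x)
    return adj


def subtree(adj, node, parent):
    """Rooted subtree hanging below `node` when entered from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a leaf
    left, right = (subtree(adj, child, node) for child in children)
    return (left, right)


def fitch(tree, states):
    """Return (state set, number of changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


adj = adjacency(EDGES)
# Placing the root in the middle of edge (x, y) splits the tree in two.
rootings = [(subtree(adj, x, y), subtree(adj, y, x)) for x, y in EDGES]
scores = [fitch(tree, STATES)[1] for tree in rootings]

for tree, score in zip(rootings, scores):
    print(score, tree)
```

With these made-up states every one of the five rootings scores 2 changes.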

With 4 taxa, there are 3 different unrooted trees.

[Figure: the three unrooted trees on Human, Chimp, Chicken, and Gorilla. Human is paired with Chimp in the first, with Gorilla in the second, and with Chicken in the third.]

One tree gets a better score (fewer changes) than the other trees.
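As an illustration of how one of the three trees can come out ahead, the sketch below enumerates the three four-taxon topologies and scores each one. The slides give no sequences for this example, so the four-site alignment is hypothetical and invented purely for illustration. Each unrooted tree is scored as if rooted on its central branch, which is harmless because the position of the root does not affect the MP score.

```python
# HYPOTHETICAL alignment, invented for illustration only.
ALIGNMENT = {"Human": "AGAT", "Chimp": "AGCT", "Gorilla": "CTAT", "Chicken": "CTCA"}


def quartet_topologies(taxa):
    """The three ways to split four taxa into two neighboring pairs."""
    first, *rest = taxa
    return [((first, partner), tuple(t for t in rest if t != partner))
            for partner in rest]


def fitch(tree, column):
    """Return (state set, number of changes) for one position."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def mp_score(tree, alignment):
    """Sum of minimal changes over all positions."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))


trees = quartet_topologies(["Human", "Chimp", "Chicken", "Gorilla"])
scores = [mp_score(tree, ALIGNMENT) for tree in trees]

for tree, score in zip(trees, scores):
    print(score, tree)
print("best:", trees[scores.index(min(scores))])
```

For this made-up alignment the tree pairing Human with Chimp needs 5 changes, while the other two need 7 and 6. With different data a different tree could win.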

We then use external knowledge, that Chicken is the outgroup, and get a rooted tree.

[Figure: the best-scoring tree on Human, Chimp, Chicken, and Gorilla, rooted using Chicken as the outgroup.]
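Outgroup rooting amounts to placing the root on the branch that leads to the outgroup. A minimal Python sketch follows. It assumes the unrooted tree pairs Human with Chimp and Gorilla with Chicken (the slides do not say which of the three trees scored best), and u and v are hypothetical names for the internal nodes.

```python
# ASSUMED unrooted tree; u and v are made-up internal node names.
EDGES = [("Human", "u"), ("Chimp", "u"), ("u", "v"),
         ("Gorilla", "v"), ("Chicken", "v")]


def adjacency(edges):
    """Neighbor lists of an unrooted tree given as a list of edges."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).append(y)
        adj.setdefault(y, []).append(x)
    return adj


def subtree(adj, node, parent):
    """Rooted subtree hanging below `node` when entered from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node  # a leaf
    left, right = (subtree(adj, child, node) for child in children)
    return (left, right)


def root_with_outgroup(edges, outgroup):
    """Put the root on the branch between the outgroup leaf and the rest."""
    adj = adjacency(edges)
    (neighbor,) = adj[outgroup]  # a leaf has exactly one neighbor
    return (outgroup, subtree(adj, neighbor, outgroup))


rooted = root_with_outgroup(EDGES, "Chicken")
print(rooted)
```

Under these assumptions the result is (Chicken, ((Human, Chimp), Gorilla)): Chicken branches off first, and the three ingroup species form a single clade.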

Exercise

Can you root the unrooted tree to obtain the tree below?

[Figure: an unrooted tree and a rooted tree whose nodes are labeled with C, H, O, X, and Y.]

Exercise

How many rooted trees result from an unrooted tree with n taxa?

Exercise

Assume you have three sequences and the MP score of the unrooted tree is X. You now add another sequence. Can the score of the 4-taxa tree be lower than that of the 3-taxa tree?