
Posted on 29-Jan-2016


Doug Raiford, Lesson 9

3 approaches: distance, parsimony, maximum likelihood

We have already seen a distance method (UPGMA)

04/22/23 Phylogenetics Part II

What’s wrong with UPGMA? Let’s revisit the example. Can this be? Doesn’t the derived tree imply that B is equidistant from C and D, even though the matrix gives d(B,C) = 4 and d(B,D) = 5?

     A  B  C  D
A    0  7  6  7
B       0  4  5
C          0  3
D             0

UPGMA averaged the two and put both branches (for C and for D) at 1.5.
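The averaging can be reproduced with a minimal UPGMA sketch. The function name `upgma` and the frozenset-keyed distance dictionary are illustrative choices, not something from the slides. The function merges the closest pair of clusters and places both children at half the merge distance. It then replaces their distances to every other cluster with a size-weighted average.

```python
from itertools import combinations

def upgma(labels, dist):
    """Minimal UPGMA sketch returning a Newick string with branch lengths.

    labels: taxon names; dist: frozenset({x, y}) -> distance for every pair.
    """
    clusters = {name: (1, 0.0) for name in labels}  # name -> (size, height)
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((x, y))] / 2
        (nx, hx), (ny, hy) = clusters.pop(x), clusters.pop(y)
        # Both children hang from the same height: the equal-rates assumption
        merged = f"({x}:{height - hx:g},{y}:{height - hy:g})"
        # Size-weighted average distance from the merged cluster to the others
        for z in clusters:
            d[frozenset((merged, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        clusters[merged] = (nx + ny, height)
    (root,) = clusters
    return root

# The matrix from the slide
pairs = {("A", "B"): 7, ("A", "C"): 6, ("A", "D"): 7,
         ("B", "C"): 4, ("B", "D"): 5, ("C", "D"): 3}
tree = upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()})
print(tree)
```

In the resulting tree, B sits 2.25 + 0.75 + 1.5 = 4.5 from both C and D. That is the equidistance the slide questions.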

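What if we don’t have equal rates of evolution after a divergence?

     A  B  C  D
A    0  7  6  7
B       0  4  5
C          0  3
D             0

[Figure: a tree with unequal branch lengths that fits this matrix: A = 4, root to the (B,(C,D)) node = 0.5, B = 2.5, (C,D) internal branch = 0.5, C = 1, D = 2]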
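The following check assumes the branch lengths read off the figure: A = 4, 0.5 from the root to the (B,(C,D)) node, B = 2.5, 0.5 to the (C,D) node, C = 1, D = 2. The internal node names `root`, `P` and `Q` are hypothetical labels. The check sums path lengths through the lowest common ancestor and compares them with the matrix. If this reading is right, the unequal-rates tree reproduces every entry exactly, which the UPGMA tree cannot.

```python
# child -> (parent, branch length); internal node names are hypothetical
parent = {"A": ("root", 4), "P": ("root", 0.5), "B": ("P", 2.5),
          "Q": ("P", 0.5), "C": ("Q", 1), "D": ("Q", 2)}

def path_to_root(node):
    """Map each ancestor of `node` (including itself) to its distance from `node`."""
    dists, total = {node: 0.0}, 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        dists[node] = total
    return dists

def tree_distance(x, y):
    """Path length between two leaves, via their lowest common ancestor."""
    up_x, up_y = path_to_root(x), path_to_root(y)
    return min(up_x[a] + up_y[a] for a in up_x if a in up_y)

matrix = {("A", "B"): 7, ("A", "C"): 6, ("A", "D"): 7,
          ("B", "C"): 4, ("B", "D"): 5, ("C", "D"): 3}
for (x, y), d in matrix.items():
    print(x, y, "matrix:", d, "tree:", tree_distance(x, y))
```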

Differing rates of evolution can sometimes cause problems with UPGMA, especially when the taxa are very similar (small distances).

This tree yields this matrix, which in turn yields this tree:

[Figure: true tree with branch lengths 1, 2, 1, 1]

     A  B  C
A    0  4  3
B       0  3
C          0

[Figure: tree produced from the matrix, with leaf order B, C, A]
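One way to see the problem is the three-point (ultrametric) condition. In every triple of taxa, the two largest distances must be equal for a clock-like tree to reproduce the matrix exactly, and that is the situation UPGMA assumes. The sketch below tests this condition. The helper name `is_ultrametric` is hypothetical.

```python
from itertools import combinations

def is_ultrametric(taxa, d):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        _, second, largest = sorted(
            [d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]]
        )
        if second != largest:
            return False
    return True

three = {frozenset(p): v for p, v in [(("A", "B"), 4), (("A", "C"), 3), (("B", "C"), 3)]}
print(is_ultrametric("ABC", three))  # False: the distances are 3, 3, 4
```

The four-taxon matrix from the earlier slides fails the same test, for example on the triple A, B, C (4, 6, 7).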

Parsimony: also called the minimum evolution method

Definition of parsimony: 1a: the quality of being careful with money or resources: thrift; 1b: the quality or state of being stingy

2: economy in the use of means to an end; especially: economy of explanation in conformity with Occam's razor

Ockham's razor: the simplest explanation is usually the best


Looks at each column of an MSA and attempts to find a tree that describes that column

Then builds a consensus tree

atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag


What do we mean when we say “attempts to find a tree that describes” a column?

We try fitting every possible tree to each column and choose the best.

How do we determine all possible trees? How do we determine which one fits best? Assume that the majority nucleotide represents the ancestor.

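Example MSA of four sequences: AGCT, AACT, AACT, AACT. Column 2 reads G, A, A, A.

One possible tree has leaves A, A, A, G. Each internal (ancestral) node could be an A or a G. If they are all A, every branch costs 0 except the one leading to the G leaf, which costs 1.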


Total mutations needed to explain this tree = 1

Pretty darn good
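The slide scores a tree by assuming the majority nucleotide sits at the ancestral nodes. A standard exact way to compute the minimum number of changes for one column is Fitch's small-parsimony algorithm. It is shown here as a swapped-in technique, not as the lecture's method. The leaf names `s1` to `s4` are hypothetical labels for the four sequences.

```python
def fitch(tree, states):
    """Fitch small parsimony for one MSA column.

    tree: nested 2-tuples of leaf names; states: leaf name -> nucleotide.
    Returns (possible ancestral states at the root, minimum number of changes).
    """
    if isinstance(tree, str):            # leaf: its own state, no changes
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:                           # children can agree: no extra change
        return common, lc + rc
    return ls | rs, lc + rc + 1          # disagreement costs one change

# Column 2 of AGCT / AACT / AACT / AACT
column = {"s1": "G", "s2": "A", "s3": "A", "s4": "A"}
print(fitch((("s1", "s2"), ("s3", "s4")), column))  # ({'A'}, 1)
```

The root state set `{'A'}` matches the majority-nucleotide assumption, and the score of 1 matches the slide.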

When there are two organisms, there is only one possible tree: A and B joined at the root.

What about when there are three? The third could go…

For each of the previous 3 trees, we could add a 4th taxon to any of its branches (or it could form a new root).

Each of those trees has 4 branches, so the new taxon could go in one of 4 locations (or splice in at the top): 5 choices in all.

So the total number of trees with 4 leaves is 3 × 5 = 15.
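The stepwise-addition count can be checked by brute force. This sketch represents rooted trees as nested 2-tuples, and the helper names `insert_everywhere` and `rooted_trees` are hypothetical. Each new taxon is attached above every node of every existing tree. Attaching above a non-root node puts it on that node's branch, and attaching above the root forms a new root.

```python
def insert_everywhere(tree, taxon):
    """Yield every tree made by attaching `taxon` to one branch of `tree`,
    or above the current root (the 'new root' option)."""
    yield (tree, taxon)                  # splice in above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)

def rooted_trees(taxa):
    """All rooted binary trees on `taxa`, built by stepwise addition."""
    trees = [(taxa[0], taxa[1])]         # two taxa: only one tree
    for taxon in taxa[2:]:
        trees = [t for old in trees for t in insert_everywhere(old, taxon)]
    return trees

print(len(rooted_trees("ABC")), len(rooted_trees("ABCD")), len(rooted_trees("ABCDE")))
# 3 15 105
```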


If this were the tree

$N_i$ is the number of trees given $i$ taxa

$B_i$ is the number of branches in a tree given $i$ taxa

$B_i = B_{i-1} + 2$ (equivalently $B_i = 2i - 2$) and $N_i = N_{i-1}(B_{i-1} + 1)$, where the plus 1 is due to the possible new root

$N_2 = 1$, $B_2 = 2$

Taxa  Branches  Trees
2     2         1
3     4         3
4     6         15
5     8         105
6     10        945
7     12        10,395
8     14        135,135
9     16        2,027,025
10    18        34,459,425
11    20        654,729,075

Defined by a recurrence relation, so … that’s right, as usual, exponential (strictly, it grows even faster than any exponential)

What does this growth rate look like?
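The rooted table follows directly from the recurrence. This sketch tabulates it, starting from $N_2 = 1$, $B_2 = 2$. The function name `rooted_counts` is hypothetical.

```python
def rooted_counts(max_taxa):
    """(taxa, branches, trees) for rooted binary trees, via
    B_i = B_{i-1} + 2 and N_i = N_{i-1} * (B_{i-1} + 1), with N_2 = 1, B_2 = 2."""
    rows = [(2, 2, 1)]
    for i in range(3, max_taxa + 1):
        _, b_prev, n_prev = rows[-1]
        rows.append((i, b_prev + 2, n_prev * (b_prev + 1)))
    return rows

for taxa, branches, trees in rooted_counts(11):
    print(f"{taxa:>4} {branches:>8} {trees:>15,}")
```

Each step multiplies by an ever larger factor (3, 5, 7, …). That is why the growth outpaces any fixed exponential base.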

Rooted vs. un-rooted: wherever the root is, un-kink it


Always bifurcated: can never have 3 branches “from” a single node. What are the odds?

[Figure: an un-rooted tree for A, B, C, D]

Three possible trees

[Figure: the three un-rooted trees for A, B, C, D, one for each way of pairing A with another taxon]

Are there any other combinations?

For each of the three trees with 4 taxa, a fifth taxon could be added on any of the 5 branches

3 × 5 = 15 trees


Outgroup: include an organism that is known to be further away from all the taxa than they are from each other

If the outgroup goes here…

[Figure: an un-rooted tree for A, B, C, D, and the rooted tree that results, with leaves outgroup, A, B, C, D]

$N_i$ is the number of trees given $i$ taxa

$B_i$ is the number of branches in a tree given $i$ taxa

$B_i = B_{i-1} + 2$ (equivalently $B_i = 2i - 3$) and $N_i = N_{i-1} B_{i-1}$

No need for a “plus 1” for a possible new root, because un-rooted trees have no root

$N_2 = 1$, $B_2 = 1$

Taxa  Branches  Trees
3     3         1
4     5         3
5     7         15
6     9         105
7     11        945
8     13        10,395
9     15        135,135
10    17        2,027,025
11    19        34,459,425
12    21        654,729,075
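The un-rooted recurrence can be tabulated the same way. It starts from $N_2 = 1$, $B_2 = 1$, and there is no extra factor for a new root. The function name `unrooted_counts` is hypothetical.

```python
def unrooted_counts(max_taxa):
    """(taxa, branches, trees) for un-rooted binary trees, via
    B_i = B_{i-1} + 2 and N_i = N_{i-1} * B_{i-1}, with N_2 = 1, B_2 = 1."""
    rows = [(2, 1, 1)]
    for i in range(3, max_taxa + 1):
        _, b_prev, n_prev = rows[-1]
        rows.append((i, b_prev + 2, n_prev * b_prev))
    return rows

for taxa, branches, trees in unrooted_counts(12)[1:]:   # the table starts at 3 taxa
    print(f"{taxa:>4} {branches:>8} {trees:>15,}")
```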

Noticed that for un-rooted trees: $B_i = 2i - 3$ (for $i \ge 2$)

Also noticed $N_i = N_{i-1} B_{i-1}$

This reduces to $(2n-5)(2n-7)(2n-9)\cdots(3)(1)$, where $n$ is the number of taxa. Shorthand: $(2n-5)!!$

For rooted trees, $N_i = N_{i-1}(B_{i-1} + 1)$

This reduces to $(2n-3)!!$

$$
\begin{aligned}
N_i &= B_{i-1} N_{i-1} \\
    &= \bigl(2(i-1) - 3\bigr) N_{i-1} \\
    &= (2i - 5) N_{i-1} \\
    &= (2i - 5)(2i - 7) N_{i-2}
\end{aligned}
$$

Continue until the index of $N$ gets to 3 ($N_3 = 1$)

Double factorial: each successive factor is reduced by two

A radical reduction in the number of trees

Yet it still only buys one additional taxon: the un-rooted count for $n$ taxa equals the rooted count for $n - 1$
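The closed forms can be checked against the recurrence numerically. The function names below are hypothetical. `double_factorial(k)` multiplies k, k − 2, k − 4, … down to 1 (or 2) and returns 1 for k ≤ 1.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ...; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_tree_count(n):
    """Un-rooted binary trees on n >= 3 taxa: (2n - 5)!!"""
    return double_factorial(2 * n - 5)

def rooted_tree_count(n):
    """Rooted binary trees on n >= 2 taxa: (2n - 3)!!"""
    return double_factorial(2 * n - 3)

# Cross-check the closed form against N_i = (2i - 5) N_{i-1}, starting from N_3 = 1
n_prev = 1
for i in range(4, 13):
    n_prev *= 2 * i - 5
    assert unrooted_tree_count(i) == n_prev

print(unrooted_tree_count(12), rooted_tree_count(12))
```

Because $2(n+1) - 5 = 2n - 3$, the un-rooted count for $n + 1$ taxa is the rooted count for $n$ taxa. That is the single-taxon shift visible between the two tables.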

Taxa  Un-rooted trees  Rooted trees
3     1                3
4     3                15
5     15               105
6     105              945
7     945              10,395
8     10,395           135,135
9     135,135          2,027,025
10    2,027,025        34,459,425
11    34,459,425       654,729,075
12    654,729,075      13,749,310,575

Even brighter mathematicians


Can you see why?

Not really a candidate for dynamic programming: we don’t solve the same sub-problems over and over. Each sub-problem is a tree, and they are all unique.

Still exponential

Discard large subsets of possible solutions

Use heuristics or predictions


Don’t bother

Calculate a reasonable upper bound using a fast algorithm like UPGMA (hierarchical clustering)

Incrementally grow potential trees; stop investigating any that go over the threshold
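Here is a sketch of branch and bound for maximum parsimony under stated assumptions. Trees are nested 2-tuples, and each column is scored with Fitch's algorithm. Gaps are treated as a fifth character state, which is a simplification. The initial upper bound comes from a "ladder" tree that adds taxa in input order, a cheap stand-in for the UPGMA tree the slide suggests. Pruning is safe because adding a taxon never shortens a tree, so a partial tree already at the bound cannot lead to a better one. All function names are hypothetical.

```python
def fitch(tree, column):
    """Fitch small parsimony on nested 2-tuples; returns (state set, changes)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def score(tree, msa):
    """Parsimony length: Fitch changes summed over every MSA column."""
    names = list(msa)
    return sum(fitch(tree, dict(zip(names, col)))[1] for col in zip(*msa.values()))

def insert_everywhere(tree, taxon):
    """Attach `taxon` above every node of `tree` (every branch, plus above the root)."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)

def branch_and_bound(msa):
    """Exact maximum-parsimony search over un-rooted trees (needs >= 3 taxa).

    An un-rooted tree is stored as (outgroup, rooted subtree on the other taxa).
    """
    outgroup, first, second, *rest = list(msa)
    ladder = (first, second)
    for taxon in rest:
        ladder = (ladder, taxon)
    best_tree = (outgroup, ladder)
    bound = score(best_tree, msa)        # upper bound from a quick initial tree

    def grow(partial, remaining):
        nonlocal best_tree, bound
        length = score((outgroup, partial), msa)
        if length >= bound:              # over threshold: don't bother extending
            return
        if not remaining:                # complete tree that beats the bound
            best_tree, bound = (outgroup, partial), length
            return
        for bigger in insert_everywhere(partial, remaining[0]):
            grow(bigger, remaining[1:])

    grow((first, second), rest)
    return best_tree, bound

msa = {"a": "AACG", "b": "AACG", "c": "CCAT", "d": "CCAT"}
print(branch_and_bound(msa))  # (('a', ('b', ('c', 'd'))), 4)
```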

[Figure: partial trees for A, B, C, D; extensions marked X exceed the threshold]

Don’t bother, over threshold

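Some columns are all the same: they add no meaning, because every tree scores the minimum (zero changes).

Columns that are all different also add no meaning.

A column must have at least 2 nt’s (or aa’s) that are the same; more precisely, to discriminate between trees, at least two different characters must each appear at least twice.

All-same columns are useful in one respect: if all are the same, we can infer the makeup of the ancestor.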
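The standard parsimony-informative test is sketched below: at least two different characters, each occurring in at least two sequences. The helper names are hypothetical, and gaps are treated as an ordinary character here, which is an assumption. Many tools treat gaps as missing data instead.

```python
from collections import Counter

def is_informative(column):
    """True if at least two different characters each occur at least twice."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

def informative_columns(seqs):
    """Indices of MSA columns that can discriminate between trees under parsimony."""
    return [i for i, col in enumerate(zip(*seqs)) if is_informative(col)]

# Four-sequence MSA AGCT / AACT / AACT / ACCT: no column is informative
print(informative_columns(["AGCT", "AACT", "AACT", "ACCT"]))  # []
```

A column such as A, A, C, C passes the test. A, A, A, G does not: every four-taxon tree explains it with exactly one change.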

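Example MSA: AGCT, AACT, AACT, ACCT. Column 1 is all A: on a tree with leaves A, A, A, A, every internal node is A and every branch costs 0, so the ancestor can be inferred to be A.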

Each column yields a tree. If they all agree, we are done; if some differ, use majority rule.

If the sample is too small, perform bootstrapping: randomly draw columns (with replacement) from the MSA to generate more trees, then label each branch with the percentage of bootstrap trees in which it appears.

This is used as a measure of support (repeatability).
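A small bootstrap sketch is shown below, restricted to four taxa so that the three candidate un-rooted trees can be listed directly. It resamples alignment columns with replacement, as in the standard phylogenetic bootstrap. It picks the most parsimonious of the three trees for each replicate and reports the percentage of replicates supporting each split. Function names, the seeded `random.Random`, and the first-candidate tie-breaking are illustrative assumptions.

```python
import random
from collections import Counter

def fitch(tree, column):
    """Fitch small parsimony on nested 2-tuples; returns (state set, changes)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def best_quartet(msa):
    """Most parsimonious of the three un-rooted trees on exactly four taxa,
    returned as a split: a frozenset of the two sister pairs."""
    a, b, c, d = msa
    candidates = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def length(tree):
        return sum(fitch(tree, dict(zip(msa, col)))[1] for col in zip(*msa.values()))

    best = min(candidates, key=length)   # ties go to the first candidate
    return frozenset(frozenset(pair) for pair in best)

def bootstrap_support(msa, replicates=100, seed=0):
    """Percentage of bootstrap replicates that recover each split."""
    rng = random.Random(seed)
    n_cols = len(next(iter(msa.values())))
    counts = Counter()
    for _ in range(replicates):
        # Draw as many columns as the MSA has, with replacement
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        sample = {name: "".join(seq[i] for i in cols) for name, seq in msa.items()}
        counts[best_quartet(sample)] += 1
    return {split: 100 * k / replicates for split, k in counts.items()}

msa = {"a": "AACGT", "b": "AACGA", "c": "CCAGT", "d": "CCATA"}
for split, pct in bootstrap_support(msa).items():
    print(sorted(sorted(pair) for pair in split), f"{pct:.0f}%")
```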

Still have maximum likelihood, plus some inferential material, but that’s all in the next lecture (Phylogenetics Part III)
