week 3: parsimony variants, compatibility, statistics and...

50
Week 3: Parsimony variants, compatibility, statistics and parsimony Genome 570 January, 2014 Week 3: Parsimony variants, compatibility, statistics and parsimony – p.1/50

Upload: others

Post on 20-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Week 3: Parsimony variants, compatibility, statistics andparsimony

Genome 570

January, 2014

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.1/50

Page 2: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Camin-Sokal parsimony

1 0 1

0

0 1 0

0 1

1

1

4 changes

Unidirectional change. Easy to reconstruct ancestors and count states.

This scheme is of importance in comparative genomics, with 1 = presenceof a deletion.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.2/50

Page 3: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Dollo parsimony

1 0 1

0

0 1 0

0 1

1

1

4 changes

(3 losses)

Assumes that it is much more difficult to gain state 1 than to lose it. Has

been used to model gain and loss of restriction sites.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.3/50

Page 4: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Polymorphism parsimony

1 0 1

0

0 1 0

0 1

1

1

5 retentions ofpolymorphism

01

Note that in this case the retention (not the loss) of the polymorphic state(01) is considered unlikely and is penalized.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.4/50

Page 5: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

nonadditive binary coding

0

1

2

3

4

original

new characters

0

0

0

0

0

a b

0

1

1

1

1

c

−1 1 0 0 0 0

d e

0 0 0

0

1

1

0

0

0

1

0

0

0

0

1

−1

0

1

2 3

4

a b

c d

e

Using multiple 0/1 characters to construct a case with the sameparsimony score on all trees as a single multistate character with a

“character state tree”

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.5/50

Page 6: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Dollo parsimony – a paradox

−1

0

1

2 3

4 A B C D

3 32 4

y

x

With a character-state tree, can you always find a Dollo parsimonyreconstruction that has only one origin of state 2 ?

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.6/50

Page 7: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Weighting using a function

Suppose that each step is to be weighted 1/(n + 2) if the character has a

total of n steps.

Then the total weight of steps when there are n steps in that character is

n × 1/(n + 2) which is n/(n + 2)

Total steps in Weight of Total weightthe character each step of all steps

0 0.5 01 0.3333 0.33332 0.25 0.53 0.2 0.64 0.1667 0.66675 0.142857 0.71428

The rationale for doing this is to somewhat discount the information frommore rapidly changing characters.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.7/50

Page 8: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Successive weighting

J.S. Farris suggested (1969):

1. Infer a tree with unweighted parsimony

2. Calculate weights for each character (or site) based on the number

of changes in that character on that tree

3. Use those weights to infer a new tree

4. Unless the tree hasn’t changed, go back to step 2.

This method, which was put forward in the first modern quantitative paperon weighting, was intended to discount rapidly-evolving characters.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.8/50

Page 9: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Successive weighting

There are 15 possible unrooted trees, which fall into 5 types according tohow many changes they have in each character. The table shows the totalweighted number of changes when each tree type is evaluated using the

weights implied by the 5 different tree types.

number have pattern type tree type used for weights

of trees of changes: of tree I II III IV V

1 (1,1,1,2,2,1) I 2.333 2.250 2.167 2.417 2.083

2 (1,2,1,2,2,1) II 2.667 2.500 2.500 2.667 2.333

2 (2,1,2,2,2,1) III 3.000 2.917 2.667 2.917 2.583

3 (2,2,2,1,1,1) IV 2.833 2.667 2.500 2.500 2.333

7 (2,2,2,2,2,1) V 3.333 3.167 3.000 3.167 2.833

From this table you can figure out what the sequence of trees will be if you

start from one of these types. Do you get different ultimate outcomes?

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.9/50

Page 10: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Ties in successive weighting

An example of successive weighting that would show the difficulty it has in

detecting ties. The table shows the total weighted number of changeswhen each tree type is evaluated using the weights implied by the differenttree types.

number have pattern type tree type used for weights

of trees of changes: of tree I II III

1 (1,1,2,2) I 1.667 1.833 1.5

1 (2,2,1,1) II 1.833 1.667 1.5

1 (2,2,2,2) III 2.333 2.333 2

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.10/50

Page 11: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Nonsuccessive weighting

We can use a different algorithm to avoid the issue of dependence onstarting point which is seen in successive weighting:

Search over tree space in the usual way

For each tree, evaluate the number of steps in each character (orsite)

Calculate the weight for the character from its number of steps onthe current tree in the search.

Add up the weighted steps across characters to evaluate that tree.

This is equivalent to looking only at the diagonals in the preceding tablesof trees. Weights are never based on a different tree.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.11/50

Page 12: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Evaluating compatibility with 0/1 characters

E. O. Wilson (Systematic Zoology, 1965) pointed out that for two characters,if all four combinations 00, 01, 10 and 11 exist in one or another species,the two characters must be incompatible with each other, in the sense thatthere can be no tree on which they each change only once (so the derived

state is uniquely derived).

Compatiblecharacters

0 1

0 X X1 X

Incompatiblecharacters

0 1

0 X X1 X X

The logic is straightforward: to originate a new combination requires a

step in one character (or both). With four combinations there are 3 stepsrequired, one more than the number that would be needed if bothcharacters were incompatible. (In genomics this is called the

“three-gamete condition” – out of the four possibilities there must be three

or fewer for there to have been no recombination and no recurrentmutations).

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.12/50

Page 13: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Data example for compatibility

The data set Table 1.1 with an added species all of whose characters are0.

Characters

Species 1 2 3 4 5 6

Alpha 1 0 0 1 1 0

Beta 0 0 1 0 0 0

Gamma 1 1 0 0 0 0

Delta 1 1 0 1 1 1

Epsilon 0 0 1 1 1 0

Omega 0 0 0 0 0 0

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.13/50

Page 14: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

The compatibility matrix for this data set

1 2 3 4 5 61 2 3 4 5 6

123456

The darkly-shaded boxes are the combinations that are compatible.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.14/50

Page 15: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

The Pairwise Compatibility Theorem

A set S of characters has all pairs of characterscompatible with each other if and only if all ofthe characters in the set are jointly compatible (inthat there exists a tree with which all of them arecompatible).

This theorem is true, given the right conditions, which we will explain.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.15/50

Page 16: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

The compatibility graph

Maximal cliques? Largest?

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.16/50

Page 17: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

One of the cliques it contains

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.17/50

Page 18: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

A maximal clique

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.18/50

Page 19: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

The largest maximal clique

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.19/50

Page 20: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Making the tree by “tree popping”

(for clique {1, 2, 3, 6})

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

4 5

AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1Character 3

1 2 3 6

101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.20/50

Page 21: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Making the tree by “tree popping”

(for clique {1, 2, 3, 6})

Gamma

Delta

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

Beta

EpsilonAlpha

4 5

AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1

Character 2

Character 3

1 2 3 6

101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.21/50

Page 22: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Making the tree by “tree popping”

(for clique {1, 2, 3, 6})

Gamma

DeltaGamma

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

Beta

EpsilonAlpha

Delta

Beta

EpsilonAlpha

4 5

AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1

Character 2

Character 3

Character 6

1 2 3 6

101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.22/50

Page 23: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Making the tree by “tree popping”

(for clique {1, 2, 3, 6})

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

Beta

EpsilonAlpha

Alpha

Gamma

Delta

Beta

Epsilon

Tree is:

Gamma

Delta

Gamma

Delta

Beta

EpsilonAlpha

4 5

AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1

Character 2

Character 3

Character 6

1 2 3 6

101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.23/50

Page 24: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Unknown states and compatibility

A data set that has all pairs of characters compatible, but which cannothave all characters compatible with the same tree. This violates thePairwise Compatibility Theorem, owing to the unknown ("?") states.

Alpha 0 0 0

Beta ? 0 1

Gamma 1 ? 0

Delta 0 1 ?

Epsilon 1 1 1

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.24/50

Page 25: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Multiple character states and compatibility

Walter Fitch’s set of nucleotide sequences that have each pair of sitescompatible, but which are not all compatible with the same tree.

Alpha A A A

Beta A C C

Gamma C G C

Delta C C G

Epsilon G A G

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.25/50

Page 26: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Haplotypes of SNPs in a simulated population

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.26/50

Page 27: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Values of the LD measure |D′|

1.0

0.8

0.6

0.4

0.2

0

D’

1 106 7 9 11 14 15 16 17 19 21 22 24 25 28 29 30

1

10

679

1114151617192122242528293031

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.27/50

Page 28: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Two of the pairs of SNPs that show |D′| = 1

29 3021 22

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.28/50

Page 29: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Transition probabilities in a two-state model

In a symmetric two-state model with rate of change r per unit time, where

0 1r

r we have Prob (1 | 0, t, r) = 12(1 − e

−2 r t)

0.0 0.5 1.0 1.5 2.00.0

0.2

0.4

0.6

0.8

1.0

2 (1 1 − e

r t

r tP

rob

ab

ilit

y o

f c

ha

ng

e

)− 2rt

When rt is small then to very good approximation the probability of theevents in the branch is rt or 1 − rt . The latter is nearly 1.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.29/50

Page 30: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Likelihood and parsimony

L = Prob (Data|Tree)

=chars∏

i=1

recon−structions

(

12

B∏

j=1

{

ritj if this character changes1 − ritj if it does not change

}

)

The sum is over all reconstructions of changes in that character that result

in the observed states at the tips of the tree. It is not just over all mostparsimonious reconstructions of changes.

We will see (later in the course) that maximizing the likelihood is a Good

Thing. We want to show that in the case where changes are infrequent,

there is a relationship between that and minimizing the (weighted)parsimony score, and what the proper weights turn out to be.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.30/50

Page 31: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

approximating ...

Since the probabilities of change are (nearly) rt, and the probabilities of

non-change are (nearly) 1, for each character (i) the likelihood isapproximately the product over all branches (the jth of which has length

tj)

1

2

B∏

i=1

(ritj)nij .

For them all it is approximately

L ≃chars∏

i=1

1

2

branches∏

j=1

(ritj)nij

.

Tossing out the 1/2 terms and taking logarithms, if the number of

changes in character i in branch j is nij,

− ln L ≃

chars∑

i=1

branches∑

j=1

nij (− ln (ritj)) .

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.31/50

Page 32: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Weights and likelihood

Probabilities of change and resulting weights for an imaginary case.

Character ri tj changes total weight

1 0.01 1 4.6052 0.01 2 9.2103 0.00001 1 11.519

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.32/50

Page 33: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

with one-tenth as much change ...

Character ri tj changes total weight

1 0.001 1 6.9082 0.001 2 13.8163 0.000001 1 13.816

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.33/50

Page 34: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

... and with one-tenth less again

Character ri tj changes total weight

1 0.0001 1 9.2102 0.0001 2 18.4213 0.0000001 1 16.118

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.34/50

Page 35: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Trees, steps, and patterns

0 0 0

1 1 1

1 1 1

1 1 1

1 1 1

0000

0001

0010

0100

0111

1000

1011

1101

1110

1111

1 1 1

1 1 1

1 2 2

1 2 2

0011

0101

0110

1001

1010

22 122 1

22 122 1

1100

1 1 1

111

0 0 0

a c

b d

a b

c d

a b

d c

This table shows how many changes of state (steps) are needed for each

of the 16 possible character patterns on each of the three unrooted treetopologies, under a parsimony criterion.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.35/50

Page 36: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Trees, steps, and patterns with DNA

AAAA

AAAC

AAAG

AAAT

AACA

AACG

AACT

AAGA

AAGC

AAGT

AATA

AATC

AATG

ACAA

ACAG

ACAT

ACCC

TTTT

...

0 0 0

1 1 1

1 1 1

1 1 1

1 1 11 1 1

2 2 2

2 2 2

1 1 1

2 2 2

2 2 2

1 1 1

2 2 2

2 2 2

1 1 1

2 2 2

2 2 2

AACC

AAGG

AATT

ACAC

ACCA

1 2 2

1 2 2

1 2 2

2 1 2

2 2 11 1 1

0 0 0

...

a c a b a b

b d c d d c

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.36/50

Page 37: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Parsimony and patterns

Ignoring all the patterns that have the same number of steps on all

topologies, the ones that matter to a parsimony method have one of thesethree sums:

nxxyy + 2nxyxy + 2nxyyx = 2(nxxyy + nxyxy + nxyyx) − nxxyy

2nxxyy + nxyxy + 2nxyyx = 2(nxxyy + nxyxy + nxyyx) − nxyxy

2nxxyy + 2nxyxy + nxyyx = 2(nxxyy + nxyxy + nxyyx) − nxyyx.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.37/50

Page 38: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

The counterexample

p pq

q q

A

B

C

D

The long branches have probabilities of change p, the short branches

have probabilities of change q. This is the canonical case of “long branchattraction”.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.38/50

Page 39: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Pattern probabilities

If tip pattern is 1100, and internal nodes are 1, 1 the probability is

1

2(1 − p)(1 − q)(1 − q)pq

and in general summing over all four possibilities:

P1100 = 12

(

(1 − p)(1 − q)2pq + (1 − p)2(1 − q)2q

+ p2q3 + pq(1 − p)(1 − q)2)

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.39/50

Page 40: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Pattern probabilities

Pxxyy = (1 − p)(1 − q)[q(1 − q)(1 − p) + q(1 − q)p]

+ pq[(1 − q)2(1 − p) + q2p]

Pxyxy = (1 − p)q[q(1 − q)p + q(1 − q)(1 − p)]

+ p(1 − q)[p(1 − q)2 + (1 − p)q2]

Pxyyx = (1 − p)q[(1 − p)q2 + p(1 − q)2]

+ p(1 − q)[q(1 − q)p + q(1 − q)(1 − p)]

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.40/50

Page 41: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Taking differences

Pxyxy − Pxyyx = (1 − 2q)[

q2(1 − p)2 + (1 − q)2p2]

Which is always positive as long as q < 1/2 and either p or q is

positive. Thus Pxyxy > Pxyyx so we don’t need to concern ourselves with

Pxyyx.

To have Pxxyy be the largest of the three, we only need to know that

Pxxyy − Pxyxy > 0

and after a struggle that turns out to require

(1 − 2q)[

q(1 − q) − p2]

> 0

which (provided q < 1/2) is true if and only if

q(1 − q) > p2

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.41/50

Page 42: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Pattern frequencies win out

100 200 500 1000 2000 5000 10000

0.00

0.05

0.10

0.15

0.20

Fre

qu

en

cie

s o

f th

e t

hre

e p

atr

tern

s

Number of characters

y x

x

x y

y

yy

x x yx

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.42/50

Page 43: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Conditions for inconsistency

p

q0.1 0.2 0.3 0.4 0.50

0.1

0.2

0.3

0.4

0.5

0

consistent

inconsistent

p2< q (1−q)

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.43/50

Page 44: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Example for patterns with DNA

root

z

p

qq

w

p

q

1(C) 3(A)

4(A)2(C)

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.44/50

Page 45: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Calculating a pattern frequency

Prob [CCAA] = 118

(1 − p)(1 − q)2pq + 127

pq2(1 − p)(1 − q)

+ 1162

p2q2(1 − q) + 7972

p2q3

+ 112

(1 − p)2(1 − q)2q.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.45/50

Page 46: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Pattern frequencies

Prob [xxyy] = (1 − p)2q(1 − q)2 + 23p(1 − p)q(1 − q)2

+ 49p(1 − p)q2(1 − q) + 2

27p2q2(1 − q)

+ 781

p2q3

Prob [xyxy] = 13(1 − p)2q2(1 − q) + 2

9p(1 − p)q2(1 − q)

+ 427

p(1 − p)q3 + 13p2(1 − q)3

+ 29p2q2(1 − q) + 2

81p2q3

Prob [xyyx] = 181

(1 − p)2q3 + 23p(1 − p)q(1 − q)2

+ 49p(1 − p)q3 + 1

9p2q(1 − q)2

+ 627

p2q2(1 − q) + 281

p2q3.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.46/50

Page 47: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Conditions for inconsistency with DNA

p <−18 q + 24 q2 +

243 q − 567 q2 + 648 q3 − 288 q4

9 − 24 q + 32 q2

(This will not be on the test).

For small p and q this is approximately

1

3p2 < q

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.47/50

Page 48: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Conditions for inconsistency with DNA

0.7

0.6

0.5

0.4

0.3

0.2

0.1

00.2 0.4 0.60

q

p

inconsistent

consistent

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.48/50

Page 49: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Inconsistency with a clock

A B C D E A B C D E

x x

Cases like this were discovered by Michael Hendy and David Penny in1989.

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.49/50

Page 50: Week 3: Parsimony variants, compatibility, statistics and ...evolution.gs.washington.edu/gs570/2014/week3.pdf · Week3:Parsimonyvariants,compatibility,statisticsandparsimony–p.11/50

Example showing inconsistency with a clock

200

100

0100 200 500 1000 2000

sitestr

ees c

orr

ect out of 1000

b

b

E

A

B

C

D

0.50

0.30

0.20−b

b = 0.03

b = 0.02

Week 3: Parsimony variants, compatibility, statistics and parsimony – p.50/50