Species Trees & Constraint Programming

Upload: clifton-newman

Post on 11-Jan-2016


Page 1: Species Trees & Constraint Programming. Ongoing work with Ian Gent, Barbara Smith, Wu Wei (Christine)

Species Trees & Constraint Programming


Ongoing work with

Ian Gent, Barbara Smith, Wu Wei (Christine)


The Tree of Life

A central goal of systematics

• construct the tree of life

• a tree that represents the relationship between all living things
  • including constraint programmers

• The leaf nodes of the tree are species

• The interior nodes are hypothesized species
  • extinct, where species diverged


Properties of a Species Tree

• We have a set of leaf nodes, each labelled with a species
• the interior nodes have no labels
• each interior node has 2 children and one parent
  • except the root (it has no parent)
• if we have n leaf nodes we then have n - 1 interior nodes
• it is a bifurcating tree


Super Trees

• We are given two trees, T1 and T2

• T1 has leaf set S1 and T2 has leaf set S2
  • remember, leaves are species!

• But S1 and S2 have a non-empty intersection
  • why? How can that happen?

• We want to combine T1 and T2
  • so, why is that a problem?


Most Recent Common Ancestors (mrca)

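[Figure: a rooted tree on three species, with a and b as siblings and c joined at the root]

We have 3 species, a, b, and c

Species a and b are more closely related to each other than they are to c

The most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and b and c)

• mrca(a,b) > mrca(a,c)
• mrca(a,b) > mrca(b,c)
• mrca(a,c) = mrca(b,c)

(here > means "further from the root")

Written as a rooted triple: ab|c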


Triples (and Fans)

[Figure: two rooted triples, ab|c and bc|d]

Species trees are frequently presented as a set of triples (and fans)

{ab|c, bc|d}


Triples (and Fans)

[Figure: the triples ab|c and bc|d, combined into a single tree on a, b, c and d]


BreakUp & OneTree (circa 1996)

Algorithm breakUp takes a species tree and produces a set of rooted triples R that define that tree.

Algorithm OneTree takes a set of species and a set of rooted triples, and builds a tree that respects those triples, or reports that no tree exists (in polytime)

OneTree is a specialisation of Build, an algorithm proposed by Aho, Sagiv, Szymanski, and Ullman in 1981
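
To make "a set of rooted triples that define that tree" concrete, here is a minimal Python sketch. It is not the BreakUp algorithm itself (the slides do not give its details); it is a naive stand-in that lists every rooted triple a bifurcating tree displays. The nested-tuple tree representation and the names `leaves` and `all_triples` are assumptions made for illustration.

```python
from itertools import combinations

def leaves(tree):
    """Leaf labels of a tree given as nested 2-tuples, e.g. (('a', 'b'), 'c')."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def all_triples(tree):
    """Every rooted triple (x, y, z), read as xy|z, displayed by the tree.

    Naive enumeration for illustration only, not BreakUp.
    """
    if not isinstance(tree, tuple):
        return set()
    left, right = tree
    result = all_triples(left) | all_triples(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    # Two leaves on one side of a node are closer to each other
    # than either is to any leaf on the other side.
    for side, other in ((left_leaves, right_leaves), (right_leaves, left_leaves)):
        for x, y in combinations(sorted(side), 2):
            for z in other:
                result.add((x, y, z))
    return result

triples = all_triples(((('a', 'b'), 'c'), 'd'))
```

For the four-species tree above this yields ab|c, ab|d, ac|d and bc|d, one triple for each choice of three species.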


The Flavour of OneTree

Given a set of species S and rooted triples R

• produce a node N
• construct a graph G
  • with vertices in S
  • and edge (x,y) if triple xy|z is in R
• if G is a single component fail
• else recursively build
  • on the left with one component
    • with S’ and R’ (the set of species and triples in that component)
  • on the right, with the other components
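
The steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the published OneTree code: a triple xy|z is represented as the tuple `(x, y, z)`, the result is a nested 2-tuple, every triple passed in is assumed to mention only species in `species`, and the choice of which component goes "on the left" (the alphabetically first one) is arbitrary.

```python
def one_tree(species, triples):
    """Return a bifurcating tree (nested 2-tuples) respecting the triples, or None."""
    species = set(species)
    if len(species) == 1:
        return next(iter(species))

    # Connected components of G via union-find: edge (x, y) for each triple xy|z.
    parent = {s: s for s in species}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y, _ in triples:
        parent[find(x)] = find(y)

    components = {}
    for s in species:
        components.setdefault(find(s), set()).add(s)

    if len(components) == 1:
        return None  # G is a single component: fail

    groups = sorted(components.values(), key=sorted)
    left = groups[0]                    # one component on the left
    right = set().union(*groups[1:])    # the other components on the right

    def restrict(subset):
        # Keep only the triples whose three species all lie in the subset.
        return {t for t in triples if set(t) <= subset}

    left_tree = one_tree(left, restrict(left))
    right_tree = one_tree(right, restrict(right))
    if left_tree is None or right_tree is None:
        return None
    return (left_tree, right_tree)

tree = one_tree({'a', 'b', 'c', 'd'}, {('a', 'b', 'c'), ('b', 'c', 'd')})
```

With S = {a,b,c,d} and R = {ab|c, bc|d} the sketch returns `((('a', 'b'), 'c'), 'd')`; adding cd|a makes G a single component, so it returns `None`.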


The Flavour of OneTree

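S = {a,b,c,d}
R = {ab|c, bc|d}

[Figure: graph G on vertices a, b, c, d with edges a-b and b-c; d is isolated]

Left: S = {a,b,c}, R = {ab|c}

[Figure: graph on vertices a, b, c with edge a-b; c is isolated]

Right: S = {d}, R = {}

[Figure: the single vertex d]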


Min-cut Super Trees

• What happens if OneTree fails?

• Do the best you can
  • by breaking some triples (resulting in fans)
  • by excluding some species

• There are polytime algorithms for this
  • but they are greedy and biased


Constraint Programming solutions to building a species tree from a set of rooted triples


A naïve constraint encoding (footnotes 756, 789, 794, 796)

• n-1 variables as interior nodes
  • v[i] = j means parent(v[i]) = v[j]
  • no loops/cycles

• Barbara used set variables (ILOG)
• Patrick used a specialised constraint (Choco)

• Francois then encoded set variables!
• n variables as leaf nodes
  • each takes a value respecting triples

• I am sparing you (and me) the details


Why was this a naïve constraint encoding?

• It produced the right number of trees when no triples
  • the Catalan number
  • symmetry breaking

• It would produce a tree if one existed

A 2 stage process

• (1) build a tree from the interior nodes
  • there are Catalan many of these

• (2) given an “interior tree” place the leaf nodes
  • there are n! ways to do this

• if step (2) fails generate the next interior tree in (1)

Yikes! That’s expensive. Imagine {ab|c, bc|d, cd|a}
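
A rough feel for the cost of the 2 stage process, as a small Python calculation. One assumption is made here that the slides leave implicit: "Catalan many" is read as the Catalan number C(n-1), which counts the ordered binary tree shapes with n - 1 interior nodes. The function names are illustrative.

```python
from math import comb, factorial

def catalan(k):
    """k-th Catalan number: 1, 1, 2, 5, 14, ..."""
    return comb(2 * k, k) // (k + 1)

def naive_search_space(n):
    """Worst-case (interior tree, leaf placement) pairs for n species.

    Assumes C(n-1) interior trees in stage (1) and n! placements in stage (2).
    """
    return catalan(n - 1) * factorial(n)

small = naive_search_space(4)    # 5 interior trees * 24 placements
larger = naive_search_space(10)
```

On an unsatisfiable instance such as {ab|c, bc|d, cd|a}, every one of these combinations may be tried before the search can report failure.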


Ultrametric Trees & Species Trees (footnotes 803,804,805,810,819)

What is an ultrametric tree?

• We are given a 2d symmetric matrix D
  • D[i,j] is the time of divergence of species i and j
  • mrca(i,j) is labelled with that time of divergence
  • D[i,j] is the value of mrca(i,j)

• Build a bifurcating tree
  • n leaves and n - 1 interior nodes
  • interior nodes labelled with entries from D
  • any path from the root is a strictly decreasing sequence


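Ultrametric Trees: here’s one I (well, Dan Gusfield actually) prepared earlier

    A  B  C  D  E
A   0  8  8  5  3
B      0  3  8  8
C         0  8  8
D            0  5
E               0

[Figure: the ultrametric tree for this matrix. The root is labelled 8; its children are labelled 3 and 5. Node 3 has leaves B and C. Node 5 has leaf D and a child labelled 3 with leaves A and E.]

Note: if the sequence increases, we have a min-ultrametric tree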


Ultrametric Matrix: necessary & sufficient conditions

• cannot have more than n - 1 distinct values
  • because there are n - 1 interior nodes

• For every 3 indices i,j,k
  • there is a tie for the maximum between D[i,j], D[i,k], D[j,k]

Given an ultrametric matrix, an ultrametric tree can be constructed in O(n^2)

… see Dan Gusfield’s book “Algorithms on Strings, Trees, and Sequences”
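
The three-index condition is easy to test directly. The following Python sketch checks it on the five-species example matrix (species A to E, values as read from the Gusfield slide); the function name `is_ultrametric` and the list-of-lists layout are illustrative choices. It runs in O(n^3), so it only checks a matrix; it is not the O(n^2) tree construction.

```python
from itertools import combinations

def is_ultrametric(D):
    """True if every three indices i, j, k show a tie for the maximum
    among D[i][j], D[i][k] and D[j][k]."""
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        _, second, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if second != largest:
            return False
    return True

# Rows and columns in the order A, B, C, D, E.
D = [
    [0, 8, 8, 5, 3],
    [8, 0, 3, 8, 8],
    [8, 3, 0, 8, 8],
    [5, 8, 8, 0, 5],
    [3, 8, 8, 5, 0],
]
ok = is_ultrametric(D)
```

Raising D[A,D] to 9 breaks the condition: the triple A, D, E would then have values 9, 3, 5 with no tie for the maximum.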


A CP encoding of D

• We have a 2 dimensional matrix of constrained integer variables D
• We must ensure that for any i,j,k the following holds

D[i,j] ≥ D[i,k] = D[j,k]
∨ D[i,k] ≥ D[i,j] = D[j,k]
∨ D[j,k] ≥ D[i,j] = D[i,k]

[Figure: a triangle with vertices i, j, k and sides labelled D[i,j], D[j,k], D[i,k]]

Think isosceles triangles, allowing equilateral

An ultrametric space, composed of isosceles triangles


A CP encoding of D


Any instantiation of the variables in D is now guaranteed to be min-ultrametric

We get a Catalan number of min-ultrametric solutions


How can we exploit this?

• We are given triples and fans, but not distances!
• But we can consider a triple ij|k as a constraint

[Figure: the rooted triple ij|k, with i and j as siblings and k joined at the root]

D[i,j] > D[i,k] ∧ D[i,j] > D[j,k] ∧ D[i,k] = D[j,k]

Note: our tree is min-ultrametric!

This over-rides the disjunctions posted across the matrix


The CP encoding (contd)

• we have the “blanket” disjunctive constraint to ensure min-ultrametric

• triples are constraints that break the disjunctions

• a solution (if one exists) is min-ultrametric respecting triples

• we can then produce a tree from the matrix, as a post process

• NOTE: we need a pre-process to break up trees into triples
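
The shape of the encoding can be illustrated without a constraint solver. The Python sketch below is a brute-force stand-in, not the ILOG or Choco models mentioned earlier: it enumerates every assignment to the upper triangle of D, keeps those that satisfy the blanket min-ultrametric condition, and then applies each triple xy|z as D[x,y] > D[x,z] = D[y,z]. The domain 1..n-1 and all names are assumptions made for illustration; a real CP model would propagate these constraints rather than enumerate.

```python
from itertools import combinations, product

def tie_for_min(a, b, c):
    """Isosceles (allowing equilateral): the two smallest values are equal."""
    lowest, middle, _ = sorted((a, b, c))
    return lowest == middle

def solve(species, triples):
    """Return a min-ultrametric matrix (dict on pairs) respecting the triples, or None.

    Exponential generate-and-test; only practical for a handful of species.
    """
    species = sorted(species)
    pairs = list(combinations(species, 2))

    for values in product(range(1, len(species)), repeat=len(pairs)):
        D = dict(zip(pairs, values))

        def d(x, y):
            return D[(x, y)] if (x, y) in D else D[(y, x)]

        # Blanket disjunctive constraint over every three species.
        if not all(tie_for_min(d(i, j), d(i, k), d(j, k))
                   for i, j, k in combinations(species, 3)):
            continue
        # Each triple xy|z picks one branch of the disjunction, strictly.
        if all(d(x, y) > d(x, z) == d(y, z) for x, y, z in triples):
            return D
    return None

solution = solve('abcd', [('a', 'b', 'c'), ('b', 'c', 'd')])
```

For {ab|c, bc|d} the only solution over 1..3 has D[a,b] = 3, D[a,c] = D[b,c] = 2 and every entry involving d equal to 1, which corresponds to the tree (((a,b),c),d). Adding cd|a leaves no solution.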


So where are we?

Good question:

• we have not yet tried real data
• we have a number of different micro-encodings
• Are we in P for decision?
  • Not sure yet
• How about optimisation?
  • We can see a way, by introducing penalties
• Wu Wei is coding up BreakUp and OneTree
  • so we have something real to compare with
• We need real data to check this out
• I need to get funding for this
  • write a grant proposal with DRG I think!


Questions?