Constructing a level-2 phylogenetic network from a dense set of input triplets

Constructing a level-2 phylogenetic network from a dense set of input triplets Leo van Iersel 1 , Judith Keijsper 1 , Steven Kelk 2 , Leen Stougie 12 (1) Technische Universiteit Eindhoven (TU/e) (2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam Email: [email protected] Web: http://homepages.cwi.nl/~kelk


Post on 06-Jan-2016


Page 1: Constructing a level-2 phylogenetic network from a dense set of input triplets

Constructing a level-2 phylogenetic network from a dense set of input triplets

Leo van Iersel (1), Judith Keijsper (1), Steven Kelk (2), Leen Stougie (1,2)

(1) Technische Universiteit Eindhoven (TU/e)
(2) Centrum voor Wiskunde en Informatica (CWI), Amsterdam

Email: [email protected] Web: http://homepages.cwi.nl/~kelk

Page 2: Constructing a level-2 phylogenetic network from a dense set of input triplets

Triplet-based methods (1)

Given a set of rooted triplets zw|x, yx|w, xy|z, wz|y. (Note zw|x = wz|x.)

Find a tree from which each of the triplets can be obtained as a minor, i.e. by contracting and deleting edges

[Figure: the four input triplets zw|x, xy|z, yx|w and wz|y are passed to the algorithm, whose solution is a tree on the leaves w, z, x, y.]

Page 5: Constructing a level-2 phylogenetic network from a dense set of input triplets

From trees to networks…

• The algorithm of Aho et al. (1981) can be used to construct trees from rooted triplets.

• But…what if the algorithm fails? Why might the algorithm fail?

• Possible reason 1: The underlying evolution is tree-like, but the input triplets contain errors.

• Possible reason 2: The triplets are correct, but the underlying evolution is not tree-like. Biological phenomena such as hybridization, horizontal gene transfer, recombination and gene duplication can lead to evolutionary scenarios that are not tree-like!

• Response: try and construct not phylogenetic trees, but phylogenetic networks
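The tree-building step attributed to Aho et al. above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build`, the encoding of a triplet ab|c as the tuple `(a, b, c)`, and the nested-`frozenset` output are all assumptions made for this example. The idea is to join a and b for every triplet ab|c, split the leaves into the resulting groups, and recurse; if everything ends up in one group, no tree exists.

```python
def build(leaves, triplets):
    """Aho-style tree construction (illustrative sketch).

    A triplet ab|c is encoded as the tuple (a, b, c).
    Returns a leaf name, a frozenset of subtrees, or None if no tree exists.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the leaves: a and b are joined for every triplet ab|c
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only triplets whose three leaves all lie in the current leaf set matter
    relevant = [t for t in triplets if set(t) <= leaves]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    groups = {}
    for x in leaves:
        groups.setdefault(find(x), set()).add(x)

    if len(groups) == 1:
        return None  # the leaves cannot be split below a root: the algorithm fails

    subtrees = []
    for group in groups.values():
        sub = build(group, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return frozenset(subtrees)


# The example from the "Triplet-based methods" slide
T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(build("wxyz", T))
# The conflicting input {xy|z, xz|y} cannot be displayed by any tree
print(build("xyz", [("x", "y", "z"), ("x", "z", "y")]))
```

On the four triplets of the earlier slide the sketch separates {w, z} from {x, y}; on the conflicting pair it returns `None`, which is the failure case the bullets above ask about.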

Page 6: Constructing a level-2 phylogenetic network from a dense set of input triplets

From trees to networks (2)

• For example, suppose the input is {xy|z, xz|y}.

[Figure: the two triplets xy|z and xz|y, together with a network on the leaves x, y, z.]

(Note that there are cases when, even if there is at most one triplet per 3 species, a tree is not possible)

Page 9: Constructing a level-2 phylogenetic network from a dense set of input triplets

Level-k phylogenetic networks

[Figure: a network on the leaves x, y, z, with labels marking the root (only one!), a leaf-vertex, a split-vertex and a recombination-vertex.]

A level-k phylogenetic network is a rooted, directed acyclic graph where every biconnected component (in the underlying undirected graph) contains at most k recombination vertices.
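The definition above can be turned into a small check. The sketch below is an illustration under stated assumptions: the network is given as a list of arcs `(parent, child)` without parallel arcs, the function name `network_level` is made up for this example, and a recombination vertex is counted in a biconnected component when at least two of that component's arcs point into it. Biconnected components are found with the standard depth-first search that keeps a stack of edges.

```python
from collections import Counter, defaultdict


def network_level(arcs):
    """Smallest k such that the network given by `arcs` is level-k (sketch)."""
    arc_set = set(arcs)
    adj = defaultdict(set)          # underlying undirected graph
    for u, v in arcs:
        adj[u].add(v)
        adj[v].add(u)

    disc, low = {}, {}
    stack = []                      # edge stack for biconnected components
    best = 0
    counter = 0

    def dfs(u, parent):
        nonlocal best, counter
        counter += 1
        disc[u] = low[u] = counter
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # Pop one biconnected component and count, per vertex,
                    # how many of its arcs enter that vertex
                    heads = Counter()
                    while True:
                        a, b = stack.pop()
                        heads[b if (a, b) in arc_set else a] += 1
                        if (a, b) == (u, v):
                            break
                    recombinations = sum(1 for c in heads.values() if c >= 2)
                    best = max(best, recombinations)
            elif disc[v] < disc[u]:
                stack.append((u, v))    # back edge to an ancestor
                low[u] = min(low[u], disc[v])

    for start in list(adj):
        if start not in disc:
            dfs(start, None)
    return best


# A network on x, y, z with a single recombination vertex r
level1 = [("root", "a"), ("root", "b"), ("a", "r"), ("b", "r"),
          ("a", "y"), ("b", "z"), ("r", "x")]
print(network_level(level1))
```

A tree has only single-edge components, so it comes out as level 0; the example network has one cycle containing one recombination vertex.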

Page 10: Constructing a level-2 phylogenetic network from a dense set of input triplets

Level-1 Networks

• A set of input triplets is dense iff, for every subset of 3 species, there is at least one triplet corresponding to those 3 species.

• Therefore, a dense set of input triplets for n species contains O(n^3) triplets.

• Jansson & Sung (2006) showed:

Given a dense set of triplets T for a set L of species, it is possible to determine in polynomial time whether a level-1 phylogenetic network N exists such that all the triplets in T are consistent with N. (And if so, to construct such a network.)

• They later showed, together with Nguyen, how to do this in time linear in |T|. They also showed that, in the non-dense case, the problem is NP-hard.

• But what about level-2 networks, and higher?
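The density condition from the first bullet on this slide is easy to test. The sketch below assumes triplets are encoded as tuples `(a, b, c)` meaning ab|c; the name `is_dense` is chosen for this example only. Since each set of 3 species admits at most 3 distinct triplets, a dense set on n species has between C(n,3) and 3·C(n,3) triplets, which is where the O(n^3) bound comes from.

```python
from itertools import combinations


def is_dense(leaves, triplets):
    """True iff every 3-subset of `leaves` is covered by at least one triplet."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(c) in covered for c in combinations(sorted(leaves), 3))


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(is_dense("wxyz", T))       # all four 3-subsets are covered
print(is_dense("wxyz", T[:3]))   # {w, y, z} is no longer covered
```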

Page 11: Constructing a level-2 phylogenetic network from a dense set of input triplets

Here is an example of a level-2 network.

Main result: Given a dense set of triplets T for a set L of species, it is possible to determine in time O(|T|^3) whether a level-2 phylogenetic network N exists such that all the triplets in T are consistent with N. (And if so, to construct such a network.)

Page 12: Constructing a level-2 phylogenetic network from a dense set of input triplets

Algorithm, basic idea

• The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex.

• For level-1 and level-2 networks, if such a clear dichotomy again exists, we recurse on the two subsets.

[Figure: a root vertex with two sub-networks hanging below it.]

Page 14: Constructing a level-2 phylogenetic network from a dense set of input triplets

Algorithm, basic idea

• The basic idea behind Aho’s algorithm for trees is that we are able to determine, recursively, which species belong to which of the two subtrees hanging from some root vertex.

• For level-1 networks, if such a clear dichotomy again exists, we recurse on the two subsets. Otherwise there must exist a network of the form

[Figure: a backbone network (drawn in blue) with several sub-networks hanging from it.]

• Find the partition of the species (leaves) into the sub-networks

• Find the blue backbone network

• Treat each of the partition elements (sub-networks) as leaves to be hung on the backbone

• Recurse on the sub-networks

Page 15: Constructing a level-2 phylogenetic network from a dense set of input triplets

Algorithm, high-level idea

For level-2 networks the idea is similar:

[Figure: a level-2 backbone network (drawn in blue) with several sub-networks hanging from it.]

• Find the partition of the species (leaves) into the sub-networks. There is a complication in level-2.

• Find the blue backbone network! There are more level-2 backbone forms.

• Treat each of the partition elements (sub-networks) as (meta-)leaves to be hung on the backbone

• Recurse on the sub-networks

Page 16: Constructing a level-2 phylogenetic network from a dense set of input triplets

Definition: inducing new triplet sets from partitions of the leaf set

• Suppose I have a partition P = {P1, P2, …, Pt} of the leaf set L.

• Suppose I have a dense set of triplets T on the leaf set L.

• Let T’ be a new triplet set on leaf set {q1, q2,…, qt} defined as follows:

• qiqj|qk is in T’ if and only if i, j and k are pairwise distinct and there exists a triplet xy|z in T such that x is in Pi, y is in Pj and z is in Pk

• Then we say that T’ is the triplet set induced by the partition P of L.

• Critically: if T is dense, then T’ is also dense.

• In some sense this can be perceived as a ‘coarsening’ of the input set.
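The induced triplet set defined on this slide can be computed in one pass over T. In the sketch below the names `induce` and `block_of` are illustrative, a triplet xy|z is the tuple `(x, y, z)`, and the meta-leaf qi is represented simply by the index i of its block in the partition (0-based here, an arbitrary choice).

```python
def induce(partition, triplets):
    """Triplet set induced on the blocks of `partition` (sketch).

    Returns tuples (i, j, k) meaning qi qj | qk, with i < j.
    """
    block_of = {leaf: i for i, block in enumerate(partition) for leaf in block}
    induced = set()
    for x, y, z in triplets:
        i, j, k = block_of[x], block_of[y], block_of[z]
        # Keep only triplets whose three leaves fall in three different blocks
        if len({i, j, k}) == 3:
            induced.add((min(i, j), max(i, j), k))
    return induced


partition = [{"a", "b"}, {"c"}, {"d"}]
T = [("a", "b", "c"), ("a", "c", "d"), ("b", "d", "c")]
print(induce(partition, T))
```

The triplet ab|c is dropped because a and b share a block, while ac|d and bd|c survive as triplets on the three meta-leaves.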

Page 17: Constructing a level-2 phylogenetic network from a dense set of input triplets

Definition: simple level-2 networks

Lemma: There are exactly 4 different backbone networks

A simple level-2 network is any network obtained by “hanging leaves” off one of the above structures.

Page 18: Constructing a level-2 phylogenetic network from a dense set of input triplets

A picture description of the simple level-2 algorithm

Here the leaves {a,b,c,d,e,f,g,h} have been ‘hung’ from structure 8a, to yield a simple level-2 network.

Page 19: Constructing a level-2 phylogenetic network from a dense set of input triplets

Level-2 network algorithm

Assume some oracle gives us the partition of the leaves into sub-networks

Treat each subnetwork as a leaf and construct a simple level-2 network

The simple level-2 network algorithm:
• Guess the right “recombination leaf”
• Remove it and remove the triplets that contain this leaf
• One recombination vertex is left, with a caterpillar below it

Page 20: Constructing a level-2 phylogenetic network from a dense set of input triplets

Suppose we can correctly ‘guess’ that leaf g hangs directly below a recombination node.

If we remove g, and all triplets that contain g, then we know that a level-1 network must be possible on this new set of triplets (because now fewer recombination nodes are needed).
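One way to read the ‘guess’ above is as exhaustive enumeration: try each leaf in turn as the recombination leaf and hand the reduced triplet set to the next stage. The sketch below shows only that enumeration and the removal of triplets; the generator name `recombination_leaf_candidates` is hypothetical, triplets are tuples `(a, b, c)` for ab|c, and the level-1 test that would follow each guess is not shown.

```python
def recombination_leaf_candidates(leaves, triplets):
    """Yield (g, remaining triplets) for every candidate recombination leaf g."""
    for g in sorted(leaves):
        # Drop every triplet that mentions the removed leaf
        yield g, [t for t in triplets if g not in t]


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
for g, rest in recombination_leaf_candidates("wxyz", T):
    print(g, rest)
```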

Page 21: Constructing a level-2 phylogenetic network from a dense set of input triplets
Page 22: Constructing a level-2 phylogenetic network from a dense set of input triplets

Level-2 network algorithm

Assume some oracle gives us the partition of the leaves into sub-networks

Treat each subnetwork as a leaf and construct a simple level-2 network

The simple level-2 network algorithm:
• Guess the right “recombination leaf”
• Remove it and remove the triplets that contain this leaf
• One recombination vertex is left, with a caterpillar below it
• Guess the right “caterpillar set”

Page 23: Constructing a level-2 phylogenetic network from a dense set of input triplets

Caterpillar set

A caterpillar set with respect to a dense triplet set T is the set of leaves of a caterpillar subgraph of a network consistent with T

The empty set is also a caterpillar set

[Figure: a caterpillar.]

Page 24: Constructing a level-2 phylogenetic network from a dense set of input triplets

Suppose we subsequently guess that the caterpillar with h now hangs below a recombination node in the new network.

If we remove the h-caterpillar, and all triplets that contain leaves of it, then we know that a level-0 network must be possible on this new set of triplets (because now even fewer recombination nodes are needed).

Page 25: Constructing a level-2 phylogenetic network from a dense set of input triplets

Level-2 network algorithm

Assume some oracle gives us the partition of the leaves into sub-networks

Treat each subnetwork as a leaf and construct a simple level-2 network

The simple level-2 network algorithm:
• Guess the right “recombination leaf”
• Remove it and remove the triplets that contain this leaf
• One recombination vertex is left, with a caterpillar below it
• Guess the right “caterpillar set”
• Remove it and remove the triplets that contain any element of this set
• Construct the unique tree for the remaining triplets [Jansson&Sung 2006]

Page 26: Constructing a level-2 phylogenetic network from a dense set of input triplets

In such a case the resulting tree is UNIQUE (J&S).

Page 27: Constructing a level-2 phylogenetic network from a dense set of input triplets

So now we have a tree. We are going to guess how to add the h-caterpillar back in, and then guess how to add leaf g back in.

Page 28: Constructing a level-2 phylogenetic network from a dense set of input triplets

Adding the h-caterpillar back in.

Page 29: Constructing a level-2 phylogenetic network from a dense set of input triplets

And finally adding leaf g back in.


Page 30: Constructing a level-2 phylogenetic network from a dense set of input triplets

Level-2 network algorithm

Assume some oracle gives us the partition of the leaves into sub-networks

Treat each subnetwork as a leaf and construct a simple level-2 network

The simple level-2 network algorithm:
• Guess the right “recombination leaf”
• Remove it and remove the triplets that contain this leaf
• One recombination vertex is left, with a caterpillar below it
• Guess the right “caterpillar set”
• Remove it and remove the triplets that contain any element of this set
• Construct the unique tree for the remaining triplets [Jansson&Sung 2006]
• Insert the caterpillar set and the recombination leaf in the tree in the correct way

For each pair of guesses try all 4 backbone structures

Page 31: Constructing a level-2 phylogenetic network from a dense set of input triplets

Simple level-2 algorithm

Theorem: The simple level-2 network algorithm runs in O(|T|^3) time.

Page 32: Constructing a level-2 phylogenetic network from a dense set of input triplets

SN-sets to partition the set of leaves

• Jansson & Sung introduced the SN-set to partition the set of leaves
• SN-sets are special subsets of the leaves L, and are defined w.r.t. T
• All sets containing just a single leaf are SN-sets.
• Any other SN-set is any subset of leaves obtained by taking the closure of some subset S of the leaves L w.r.t. the following operation:

If x,y ∈ S and xz|y ∈ T or yz|x ∈ T then z ∈ S

The SN-set that is equal to the total leaf set L is called the trivial SN-set.

An SN-set that is non-trivial, and is not a strict subset of any other non-trivial SN-set, is called a maximal SN-set.

(If the network is a tree there are 2 maximal SN-sets: one is the set of leaves of the right subtree and the other is the set of leaves of the left subtree of the root.)
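The closure operation and the maximal SN-sets can be sketched directly from these definitions. The names `sn_closure` and `maximal_sn_sets` are illustrative, and a triplet ab|c is the tuple `(a, b, c)`. One assumption is made loudly here: the candidate SN-sets are taken to be the singletons plus the closures of all pairs of leaves, which relies on the Jansson & Sung observation that, for a dense T, every SN-set arises this way. This is a simple polynomial-time sketch, not their faster method.

```python
from itertools import combinations


def sn_closure(seed, triplets):
    """Close `seed` under: x, y in S and xz|y in T (or yz|x in T) => z in S."""
    s = set(seed)
    changed = True
    while changed:
        changed = False
        for a, b, c in triplets:                 # triplet ab|c
            # c and exactly one of a, b are in S: the other one must join S
            if c in s and (a in s) != (b in s):
                s.update((a, b))
                changed = True
    return frozenset(s)


def maximal_sn_sets(leaves, triplets):
    """Maximal SN-sets of a dense triplet set (assumes pair closures suffice)."""
    leaves = frozenset(leaves)
    candidates = {frozenset([x]) for x in leaves}
    candidates |= {sn_closure(pair, triplets)
                   for pair in combinations(sorted(leaves), 2)}
    nontrivial = {s for s in candidates if s != leaves}
    # Keep the sets that are not strictly contained in another non-trivial SN-set
    return {s for s in nontrivial if not any(s < t for t in nontrivial)}


T = [("z", "w", "x"), ("y", "x", "w"), ("x", "y", "z"), ("w", "z", "y")]
print(maximal_sn_sets("wxyz", T))
```

For the tree example from the start of the talk this yields the two leaf sets {w, z} and {x, y}, matching the remark above about trees.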

Page 33: Constructing a level-2 phylogenetic network from a dense set of input triplets

Definition: maximal SN-set

• Jansson and Sung proved that the maximal SN-sets indeed partition the leaf set L. So no two maximal SN-sets overlap, and they completely cover the set of input leaves.

• All SN-sets and all maximal SN-sets can be found in polynomial time.

• Jansson & Sung solved the level-1 problem by observing that each maximal SN-set hangs as a ‘meta-leaf’ on the level-1 backbone network; each maximal SN-set can be completely separated from the rest of the network by removing just one edge.

• There are maximal SN-sets in level-2 networks that can hang under more than one edge!

Page 34: Constructing a level-2 phylogenetic network from a dense set of input triplets

Definition: highest cut-edge

In a phylogenetic network N, a cut-edge (x,y) is an edge whose removal disconnects the underlying undirected graph.

A cut-edge (x,y) is said to be a trivial cut-edge iff y is a leaf.

A cut-edge (x,y) is said to be highest iff there is no cut-edge (p,q) such that there is a directed path from q to x in N.
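These definitions can be checked by brute force on small networks. In the sketch below the network is a list of arcs `(x, y)`; the function names are illustrative. Cut-edges are found by deleting each arc in turn and testing connectivity of the underlying undirected graph, which is simple rather than efficient. One interpretive assumption: a cut-edge (p,q) with q = x is treated as lying above (x,y), i.e. a path of length zero from q to x counts.

```python
from collections import defaultdict


def cut_edges(arcs):
    """Arcs whose removal disconnects the underlying undirected graph."""
    nodes = {v for arc in arcs for v in arc}

    def connected(kept):
        adj = defaultdict(set)
        for u, v in kept:
            adj[u].add(v)
            adj[v].add(u)
        start = next(iter(nodes))
        seen, todo = {start}, [start]
        while todo:
            for w in adj[todo.pop()]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        return seen == nodes

    return [e for e in arcs if not connected([f for f in arcs if f != e])]


def highest_cut_edges(arcs):
    """Cut-edges (x, y) with no other cut-edge (p, q) such that q reaches x."""
    children = defaultdict(set)
    for u, v in arcs:
        children[u].add(v)

    def reachable(src):
        # Vertices reachable from src along directed arcs (src included)
        seen, todo = {src}, [src]
        while todo:
            for w in children[todo.pop()]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        return seen

    cuts = cut_edges(arcs)
    return [(x, y) for (x, y) in cuts
            if not any(x in reachable(q) for (p, q) in cuts if (p, q) != (x, y))]


tree = [("root", "u"), ("root", "z"), ("u", "x"), ("u", "y")]
print(sorted(highest_cut_edges(tree)))
```

In the small tree every arc is a cut-edge, but only the two arcs leaving the root are highest, because the arcs below u sit under the cut-edge (root, u).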

Page 35: Constructing a level-2 phylogenetic network from a dense set of input triplets

• Fact. Let (x,y) be a highest cut-edge and let L’ be the set of leaves reachable from y. Let L* be a strict subset of L’. Then L* is not a maximal SN-set.

• Proof: the set of leaves L’ reachable from a highest cut-edge (x,y) is itself an SN-set. Clearly, for any two leaves p,q in L’ and any leaf r outside L’, T can contain neither pr|q nor qr|p: the edge (x,y) forms a bottleneck. Since T is dense, pq|r must therefore be in T, so the closure operation never adds a leaf from outside L’.

[Figure: the cut-edge (x,y), the leaf set L’ below y, and triplets on the leaves p, q, r.]

So: each maximal SN-set can be expressed as the union of the leaves reachable from one or more highest cut-edges.

Page 36: Constructing a level-2 phylogenetic network from a dense set of input triplets

Central Theorem (simplified). Suppose there is a dense triplet set T consistent with some simple level-2 network N. Then there exists a level-2 network N’ (not necessarily simple) such that, with the exception of perhaps one maximal SN-set with respect to T, every maximal SN-set appears below a single cut-edge in N’. The remaining, ‘odd-one-out’ maximal SN-set (if it exists) will be equal to the union of leaves below two cut-edges.

In other words: there exists at most one maximal SN-set which is the union of the leaves below two highest cut-edges, whereas all other maximal SN-sets consist of the leaves below one highest cut-edge.

Page 37: Constructing a level-2 phylogenetic network from a dense set of input triplets

The algorithm
• Determine the maximal SN-sets
• Guess the right SN-set to be split
• Treat the max SN-sets and the two split sets as leaves {S1,S2,…,Sq}
• Adapt T to a new triplet set T’: SiSk|Sh ∈ T’ if and only if there exist x∈Si, y∈Sk, z∈Sh s.t. xy|z ∈ T
• Construct a simple level-2 network for T’
• Recursively find the sub-networks for the sets S1,S2,…,Sq

Page 38: Constructing a level-2 phylogenetic network from a dense set of input triplets

Conclusions & open problems

• So we know how to efficiently construct level-2 networks from dense triplet sets. What’s next?

• Applicability: how useful is it?

• Initial implementation: programming and fine-tuning

• Improving running time: in the spirit of the “SN-tree” of J&S&N

• Complexity: what about level-3 and higher?

• Bounds: worst-case, best-case scenarios

• Building all networks

• Properties of output networks as function of input

• Different triplet restrictions

• Confidence: how good are the solutions?

• Exponential-time exact algorithms for NP-hard problems