CS 581 Paper Presentation (tandy.cs.illinois.edu/khan-presentation.pdf)

Post on 08-Aug-2020


CS 581 Paper Presentation

Muhammad Samir Khan

Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis

by

Sebastien Roch and Sagi Snir

Overview

• Introduction (what is LGT?)

• Notation

• Model
  • Bounded Rates Model
  • Yule Process

• Quartet Based Approach
  • Bounded Rates Model
  • Yule Process
  • Preferential LGT

• Further Results

What is LGT?

• Non-vertical transfer of genes

• Overall evolution is tree-like

• Particularly common in bacteria

• Primary reason for the spread of antibiotic resistance [1]

1. https://en.wikipedia.org/wiki/Horizontal_gene_transfer
2. http://www.nature.com/nrmicro/journal/v3/n9/images/nrmicro1253-f1.gif

Species Phylogeny

• $T_s = (V_s, E_s, L_s: r, \tau)$
  • $V_s$: vertices
  • $E_s$: edges
  • $L_s$: leaves
  • $r$: root
  • $\tau(e)$: interspeciation times

• Number of leaves $n = n^+ + n^-$
  • $n^+ > 0$ extant species
  • $n^- \ge 0$ extinct species

[Figure: species tree with root $r$, extinct and extant leaves, and an edge labeled $\tau(e)$]

Extant Phylogeny

• Denoted $T_s^+ = (V_s^+, E_s^+, L_s^+: r^+, \tau^+)$
  • Restrict to extant leaves: $T_s|_{L_s^+}$
  • Suppress vertices of degree 2 (add up the branch lengths)
  • Root at the most recent common ancestor of $L_s^+$

• 𝑇𝑠+ is ultrametric

• Want to recover the extant phylogeny
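
The three steps above (restrict, suppress, re-root) can be sketched in a few lines. This is only an illustrative sketch: the dictionary-based tree encoding and the names `extant_phylogeny`, `children`, and `length` are assumptions made for the example, not notation from the paper.

```python
def extant_phylogeny(children, length, root, extant):
    """Restrict a rooted tree to the extant leaves.

    children: dict mapping each vertex to the list of its children ([] for leaves).
    length:   dict mapping each non-root vertex to the length of the edge above it.
    extant:   set of extant leaves (assumed non-empty).
    Returns (new_root, new_children, new_length).
    """
    new_children, new_length = {}, {}

    def prune(v):
        # Returns (top vertex of the restricted subtree, length accumulated
        # from v down to that vertex), or None if no extant leaf lies below v.
        if not children[v]:
            if v in extant:
                new_children[v] = []
                return v, 0.0
            return None
        kept = []
        for c in children[v]:
            sub = prune(c)
            if sub is not None:
                w, extra = sub
                kept.append((w, extra + length[c]))
        if not kept:
            return None
        if len(kept) == 1:
            # v would have a single child: suppress it and add up branch lengths.
            return kept[0]
        new_children[v] = [w for w, _ in kept]
        for w, l in kept:
            new_length[w] = l
        return v, 0.0

    # Any length left above the returned vertex is dropped, which roots the
    # result at the most recent common ancestor of the extant leaves.
    new_root, _ = prune(root)
    return new_root, new_children, new_length


# Hypothetical example: x is an extinct leaf; a, b, c are extant.
children = {"r": ["u", "v"], "u": ["a", "x"], "v": ["b", "c"],
            "a": [], "x": [], "b": [], "c": []}
length = {"u": 1.0, "v": 1.0, "a": 2.0, "x": 0.5, "b": 2.0, "c": 2.0}
root_plus, children_plus, length_plus = extant_phylogeny(
    children, length, "r", {"a", "b", "c"})
```

In this example vertex `u` is suppressed, so `a` hangs directly off the root with branch length 1.0 + 2.0, and every extant leaf ends up at the same distance from the root, as ultrametricity requires.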

[Figure: extant phylogeny with root $r$ drawn against a time axis]

Gene Trees

• $T_g = (V_g, E_g, L_g: \omega_g)$ for a gene $g$ is an unrooted tree
  • $V_g$: vertices
  • $E_g$: edges
  • $L_g$: leaves, a subset of $L_s$
  • $\omega_g(e)$: branch lengths (expected number of substitutions)

• Each vertex of degree 2 or 3

• 𝒯𝑔 = 𝒯[𝑇𝑔] is the topology of 𝑇𝑔 with degree 2 vertices suppressed

• Not ultrametric

LGT Transfer – Subtree Prune and Regraft

• An LGT event takes place at locations along the edges

• Recipient location: the subtree below it is pruned

• Donor location: the pruned subtree is regrafted here

• A new node is created at the donor location

1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

Contemporaneous Locations

• Two locations $x, y$ are contemporaneous if their $\tau$-distance to the root is identical: $\tau(x, r) = \tau(y, r)$

• For $R > 0$, $C_x^{(R)}$ is the set of locations contemporaneous to $x$ and with MRCA at $\tau$-distance at most $R$ from $x$:

$$C_x^{(R)} = \{\, y : \tau(x, r) = \tau(y, r),\ \tau(x, y) \le 2R \,\}$$
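
The factor of 2 in the set definition follows from the contemporaneity condition. As a short derivation, write $m$ for the MRCA of $x$ and $y$ (the symbol $m$ is introduced here only for illustration). Since $m$ lies on the path from the root to both locations,

$$\tau(x, r) = \tau(x, m) + \tau(m, r), \qquad \tau(y, r) = \tau(y, m) + \tau(m, r),$$

so $\tau(x, r) = \tau(y, r)$ gives $\tau(x, m) = \tau(y, m)$, and therefore

$$\tau(x, y) = \tau(x, m) + \tau(m, y) = 2\,\tau(x, m).$$

Hence the MRCA is at $\tau$-distance at most $R$ from $x$ exactly when $\tau(x, y) \le 2R$.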

Random LGT

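• Species phylogeny fixed: $T_s = (V_s, E_s, L_s: r, \tau)$

• $0 < R \le \infty$ (possibly depending on $n$)

• Each edge has a rate of LGT $\lambda(e)$: $0 < \lambda(e) < +\infty$
  • $\Lambda(e) = \lambda(e)\,\tau(e)$
  • $\Lambda_{tot} = \sum_{e \in E_s} \Lambda(e)$
  • $\Lambda = \sum_{e \in E(T_s|_{L_s^+})} \Lambda(e)$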

• Taxon sampling probability 𝑝 ∶ 0 < 𝑝 ≤ 1

Random LGT

• LGT locations:
  • Start from the root (chronologically)
  • Along each edge $e \in E_s$, select recipient locations according to a continuous-time Poisson process with rate $\lambda(e)$
  • If $x$ is selected as a recipient location, the donor location is selected uniformly at random from $C_x^{(R)}$

• Keep each extant leaf independently with probability 𝑝, to get 𝐿𝑔

• Gene tree 𝑇𝑔 is obtained by keeping the subtree restricted to 𝐿𝑔
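
Two of the random steps above are easy to sketch: placing recipient locations along a single edge, and sampling taxa. The function names and the choice to measure locations from the top of the edge are assumptions for this example; choosing the donor uniformly from the contemporaneous set and performing the prune-and-regraft move need a full tree structure and are left out.

```python
import random


def recipient_locations(tau_e, lam_e, rng):
    """Sample recipient locations on one edge, as distances from its top end.

    The points of a Poisson process with rate lam_e on [0, tau_e) are obtained
    by adding up independent exponential gaps with rate lam_e.
    """
    locations = []
    t = rng.expovariate(lam_e)
    while t < tau_e:
        locations.append(t)
        t += rng.expovariate(lam_e)
    return locations


def sample_gene_leaves(extant_leaves, p, rng):
    """Keep each extant leaf independently with probability p."""
    return [leaf for leaf in extant_leaves if rng.random() < p]


rng = random.Random(0)
locs = recipient_locations(tau_e=2.0, lam_e=1.5, rng=rng)
gene_leaves = sample_gene_leaves(["a", "b", "c", "d"], p=0.8, rng=rng)
```

The expected number of recipient locations on the edge is λ(e)·τ(e) = Λ(e), which is why Λ(e) measures the amount of LGT along the edge.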

Bounded Rates Model

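• Constants:
  • $\rho_\lambda$: $0 < \rho_\lambda < 1$
  • $\rho_\tau$: $0 < \rho_\tau < 1$
  • $\bar{\tau}$: $0 < \bar{\tau} < +\infty$
  • $\bar{\lambda}$ (possibly depending on $n^+$): $0 < \bar{\lambda} < +\infty$
    • Used to control the amount of LGT

• Under the bounded rates model:

$$\rho_\lambda \bar{\lambda} \le \lambda(e) \le \bar{\lambda} \quad \forall e \in E_s$$

$$\rho_\tau \bar{\tau} \le \tau^+(e^+) \le \bar{\tau} \quad \forall e^+ \in E_s^+$$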

Yule Process

• Branching process that starts with two species

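• Each species generates a new offspring at rate $\nu$: $0 < \nu < +\infty$
  • No extinct species

• Stop when the number of species reaches $n + 1$ (ignore the last species)

• $\rho_\lambda \bar{\lambda} \le \lambda(e) \le \bar{\lambda}$ for every edge $e \in E_s$
  • $\rho_\lambda$ constant: $0 < \rho_\lambda < 1$
  • $\bar{\lambda}$ possibly depending on $n$: $0 < \bar{\lambda} < +\infty$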
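
The timing of the branching process described above can be simulated directly. This sketch only generates speciation times, not the tree shape; the function name and return format are assumptions for the example.

```python
import random


def yule_speciation_times(n, nu, rng):
    """Times at which the species count grows from 2 up to n + 1.

    With k species alive, each splitting at rate nu, the waiting time until
    the next speciation is exponential with rate k * nu.
    """
    t, k, times = 0.0, 2, []
    while k < n + 1:
        t += rng.expovariate(k * nu)
        k += 1
        times.append(t)
    return times


times = yule_speciation_times(n=5, nu=1.0, rng=random.Random(1))
```

The list has n - 1 entries. The last entry is the moment the (n + 1)-th species appears, which is where the process stops; the tree observed just before that moment has n leaves.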

Quartet Based Approach

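• Input: gene trees $T_{g_1}, \dots, T_{g_N}$

• Output: estimated extant species phylogeny $T$

• Let $X = \{a, b, c, d\}$ be a four-tuple of extant species
  • Three possible quartets:
    • $q_1 = ab|cd$
    • $q_2 = ac|bd$
    • $q_3 = ad|bc$

• Frequency of quartet:

$$f_X(q_i) = \frac{\left|\{\, g_j : X \subseteq L_{g_j},\ \mathcal{T}_{g_j}|_X = q_i \,\}\right|}{\left|\{\, g_j : X \subseteq L_{g_j} \,\}\right|}$$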
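
The frequency above can be computed by counting. In the sketch below each gene tree is given as its leaf set plus one side of each bipartition (split) defined by its edges. That encoding, and the names `induced_quartet` and `quartet_frequencies`, are assumptions for illustration. A tree induces ab|cd on X exactly when one of its edges separates {a, b} from {c, d}.

```python
def induced_quartet(splits, X):
    """Quartet topology induced on X = (a, b, c, d), or None if unresolved.

    splits: iterable of frozensets, each holding the leaves on one side of an
    edge of the unrooted gene tree. Assumes all four taxa are leaves of the tree.
    """
    a, b, c, d = X
    pairings = {
        "ab|cd": ({a, b}, {c, d}),
        "ac|bd": ({a, c}, {b, d}),
        "ad|bc": ({a, d}, {b, c}),
    }
    for side in splits:
        inside = side & set(X)
        for name, (left, right) in pairings.items():
            if inside == left or inside == right:
                return name
    return None


def quartet_frequencies(gene_trees, X):
    """gene_trees: list of (leaf_set, splits) pairs. Returns f_X as a dict."""
    counts = {"ab|cd": 0, "ac|bd": 0, "ad|bc": 0}
    total = 0
    for leaves, splits in gene_trees:
        if not set(X) <= leaves:
            continue  # this gene does not contain all four taxa
        total += 1
        q = induced_quartet(splits, X)
        if q is not None:
            counts[q] += 1
    return {q: (k / total if total else 0.0) for q, k in counts.items()}


# Hypothetical input: three genes contain all of a, b, c, d; the fourth does not.
gene_trees = [
    ({"a", "b", "c", "d", "e"}, [frozenset("ab"), frozenset("cd")]),
    ({"a", "b", "c", "d"}, [frozenset("ac")]),
    ({"a", "b", "c", "d"}, [frozenset("ab")]),
    ({"a", "b", "c"}, []),
]
freqs = quartet_frequencies(gene_trees, ("a", "b", "c", "d"))
```

Here two of the three usable genes display ab|cd and one displays ac|bd, so the frequencies are 2/3, 1/3, and 0.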

Quartet Based Approach

[Figure from Roch and Snir (2013); full citation given above.]

Bounded Rates Model

[Figure from Roch and Snir (2013); full citation given above.]

Yule Process

[Figure from Roch and Snir (2013); full citation given above.]

Preferential LGT

[Figure from Roch and Snir (2013); full citation given above.]

Further Results

• Highways of LGT
  • The same model as before with additional “highways”
  • Highways are pairs of edges where LGT occurs deterministically
  • Highways can be different for different genes
  • Same result holds under the bounded rates model
    • Assuming no extinctions
    • Frequency of genes affected by highways is low

• Distance Based Approach under the GTR model
  • Compute the distance matrix by using the median of distances
  • Use any statistically consistent distance based method
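
The median step of the distance-based approach can be sketched as an entrywise median over genes. The input format (one dictionary of pairwise distances per gene) and the function name are assumptions for this example, and the exact estimator used in the paper may differ in detail.

```python
from statistics import median


def median_distance_matrix(taxa, gene_distances):
    """Entrywise median, over genes, of the pairwise distances.

    gene_distances: list of dicts mapping frozenset({u, v}) to a distance.
    A gene that lacks a pair does not contribute to that entry.
    """
    result = {}
    for i, u in enumerate(taxa):
        for v in taxa[i + 1:]:
            pair = frozenset((u, v))
            values = [d[pair] for d in gene_distances if pair in d]
            if values:
                result[pair] = median(values)
    return result


# Hypothetical distances: the third gene has an inflated a-b distance,
# as might happen after a transfer.
ab, ac, bc = frozenset("ab"), frozenset("ac"), frozenset("bc")
gene_distances = [
    {ab: 1.0, ac: 3.0, bc: 3.1},
    {ab: 1.2, ac: 2.9, bc: 3.0},
    {ab: 5.0, ac: 3.2},
]
D = median_distance_matrix(["a", "b", "c"], gene_distances)
```

The outlying a-b distance of 5.0 does not move the median, which stays at 1.2. The resulting matrix can then be passed to any statistically consistent distance-based method.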

Questions?
