http://creativecommons.org/licenses/by-sa/2.0/. cis786, lecture 3 usman roshan
Post on 21-Dec-2015
http://creativecommons.org/licenses/by-sa/2.0/
CIS786, Lecture 3
Usman Roshan
Maximum Parsimony
• Character based method
• NP-hard (shown by a reduction from the Steiner tree problem)
• Widely-used in phylogenetics
• Slower than NJ but more accurate
• Faster than ML
• Assumes sites are independent and identically distributed (i.i.d.)
Maximum Parsimony
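• Input: Set S of n aligned sequences of length k
• Output: A phylogenetic tree T
– leaf-labeled by sequences in S
– with additional sequences of length k labeling the internal nodes of T
such that $\sum_{(i,j) \in E(T)} H(i,j)$ is minimized, where E(T) is the edge set of T and H(i,j) is the Hamming distance between the sequences labeling nodes i and j.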
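To make the objective concrete, here is a minimal Python sketch that evaluates the sum above for a tree whose internal nodes are already labeled. The edge-list and label-dictionary representation and the names `hamming` and `mp_score` are illustrative assumptions, not part of the lecture.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(x != y for x, y in zip(a, b))


def mp_score(edges, labels) -> int:
    """Sum of H(i, j) over all edges (i, j) of a fully labeled tree."""
    return sum(hamming(labels[i], labels[j]) for i, j in edges)


if __name__ == "__main__":
    # Quartet ((ACT, ACA), (GTT, GTA)); u and v are the internal nodes.
    labels = {"a": "ACT", "b": "ACA", "c": "GTT", "d": "GTA",
              "u": "ACA", "v": "GTA"}
    edges = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
    print(mp_score(edges, labels))  # 4
```

Note that this only scores a given labeling; finding the labeling that minimizes the sum is a separate step.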
Maximum parsimony (example)
• Input: four sequences: ACT, ACA, GTT, GTA
• Question: which of the three possible unrooted trees has the best MP score?
Maximum Parsimony
The three possible unrooted binary trees on these four sequences: ((ACT, GTT), (ACA, GTA)), ((ACA, GTT), (ACT, GTA)), and ((ACT, ACA), (GTT, GTA))
Maximum Parsimony
• ((ACT, GTT), (ACA, GTA)), internal nodes labeled GTT and GTA: nonzero edge costs 2, 1, 2, so MP score = 5
• ((ACA, GTT), (ACT, GTA)): internal labels ACA and ACT give edge costs 3, 1, 3 (total 7), but labeling both internal nodes ACA gives nonzero edge costs 3, 1, 2, so MP score = 6
• ((ACT, ACA), (GTT, GTA)), internal nodes labeled ACA and GTA: nonzero edge costs 1, 2, 1, so MP score = 4
Optimal MP tree: ((ACT, ACA), (GTT, GTA))
Maximum Parsimony: computational complexity
[Figure: the optimal tree ((ACT, ACA), (GTT, GTA)) from the example, MP score = 4]
• Finding the optimal MP tree is NP-hard
• An optimal labeling of the internal nodes of a given tree can be computed in linear time, O(nk)
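One standard way to obtain the linear-time optimal labeling is Fitch's algorithm for unweighted parsimony; the lecture does not name the algorithm, so treat the sketch below as an illustration under that assumption. Trees are written as nested pairs of sequences (a representation chosen here for illustration) and rooted arbitrarily, since the parsimony score does not depend on where the root is placed. Each site takes one post-order pass over the n leaves, giving O(nk) time overall.

```python
from typing import Tuple, Union

# A rooted binary tree: a leaf is an aligned sequence (str);
# an internal node is a pair (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, site: int) -> Tuple[frozenset, int]:
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):
        return frozenset(tree[site]), 0
    left_set, left_cost = fitch_site(tree[0], site)
    right_set, right_cost = fitch_site(tree[1], site)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Empty intersection: one extra substitution is unavoidable here.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree: Tree, k: int) -> int:
    """MP score of a fixed binary tree over k aligned sites."""
    return sum(fitch_site(tree, s)[1] for s in range(k))


if __name__ == "__main__":
    for tree in [(("ACT", "GTT"), ("ACA", "GTA")),
                 (("ACA", "GTT"), ("ACT", "GTA")),
                 (("ACT", "ACA"), ("GTT", "GTA"))]:
        print(tree, fitch_score(tree, 3))
```

On the three example trees it gives 5 for ((ACT, GTT), (ACA, GTA)), 6 for ((ACA, GTT), (ACT, GTA)), and 4 for ((ACT, ACA), (GTT, GTA)). The 6 is lower than the 7 obtained with internal labels ACA and ACT because labeling both internal nodes ACA costs only 3 + 1 + 2 = 6.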
Local search strategies
[Figure: cost plotted over the space of phylogenetic trees, marking a global optimum and a local optimum]
Local search for MP
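• Determine a candidate solution s
• While s is not a local minimum
– Find a neighbor s’ of s such that MP(s’) < MP(s)
– If found, set s = s’
– Else return s and exit
• Time complexity: unknown. The search always terminates (each move strictly lowers the MP score and there are finitely many trees), but it may end quickly or take a very long time, depending on the starting tree and the local move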
• Need to specify how to construct starting tree and local move
Starting tree for MP
• Random phylogeny: O(n) time
• Greedy-MP
Greedy-MP
Greedy-MP adds the sequences one at a time, placing each on the edge of the current tree that gives the lowest MP score. It takes O(n^3k) time: each of the n insertions tries O(n) edges, and scoring each placement takes O(nk) time.
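A sketch of this stepwise addition, assuming sequences are added in a random order (as in the "randomized greedy MP tree" used in the experiments) and each is placed where the Fitch score is lowest. The nested-pair tree representation and the names `insertions` and `greedy_mp` are hypothetical. Because it rescores the whole tree for every candidate edge, it has the O(n^3k) cost, not the faster bound obtained with 3-way labeling.

```python
import random
from typing import List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, k: int) -> int:
    """Unweighted parsimony score of a rooted binary tree (Fitch)."""
    def site(t: Tree, s: int):
        if isinstance(t, str):
            return {t[s]}, 0
        (ls, lc), (rs, rc) = site(t[0], s), site(t[1], s)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return sum(site(tree, s)[1] for s in range(k))


def insertions(tree: Tree, leaf: str):
    """Yield every tree obtained by attaching `leaf` on one edge of `tree`.

    Some unrooted placements are generated twice (around the root);
    this only costs redundant evaluations.
    """
    yield (tree, leaf)  # attach on the edge above this (sub)tree
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def greedy_mp(seqs: List[str], seed: Optional[int] = None) -> Tree:
    """Randomized stepwise addition: place each sequence where MP is lowest."""
    if len(seqs) < 3:
        raise ValueError("need at least three sequences")
    order = list(seqs)
    random.Random(seed).shuffle(order)
    k = len(order[0])
    tree: Tree = ((order[0], order[1]), order[2])
    for leaf in order[3:]:
        tree = min(insertions(tree, leaf), key=lambda t: fitch_score(t, k))
    return tree
```

For the four example sequences every insertion order can reach the optimal quartet, so the greedy tree scores 4; on larger inputs, different seeds can lead to different trees.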
Faster Greedy-MP: 3-way labeling
• If we can assign optimal labels to each internal node rooted in each of its three possible ways, we can speed up the computation by a factor of n
• Optimal 3-way labeling
– Sort all 3n subtrees by size using bucket sort, in O(n) time
– Starting from the smallest subtrees, compute optimal labelings
– For each subtree rooted at v, the optimal labelings of its children have already been computed
– Total time: O(nk)
With the optimal 3-way labeling, the MP score of placing a sequence on each edge can be computed in constant time per site (O(k) per edge), so the total Greedy-MP time is O(n^2k).
Local moves for MP: NNI
• For each internal edge we get two different topologies
• Neighborhood size is 2n-6 (two for each of the n-3 internal edges of an unrooted binary tree)
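A sketch of the NNI neighborhood, assuming an unrooted binary tree stored as an adjacency dictionary (a representation chosen here for illustration; all function names are hypothetical). For an internal edge (u, v) with subtrees a, b on u's side and c, d on v's side, the two NNI moves swap b with c or b with d; with n-3 internal edges this gives 2n-6 neighbors.

```python
from typing import Dict, Iterable, List, Set, Tuple

Adj = Dict[str, Set[str]]


def from_edges(edges: Iterable[Tuple[str, str]]) -> Adj:
    """Build an adjacency dictionary from an edge list."""
    adj: Adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def internal_edges(adj: Adj) -> List[Tuple[str, str]]:
    """Edges whose endpoints are both internal (degree-3) nodes."""
    return [(u, v) for u in adj for v in adj[u]
            if u < v and len(adj[u]) == 3 and len(adj[v]) == 3]


def swap(adj: Adj, u: str, x: str, v: str, y: str) -> Adj:
    """Copy of adj with subtree x (hanging off u) exchanged with y (off v)."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    new[u].remove(x)
    new[x].remove(u)
    new[v].remove(y)
    new[y].remove(v)
    new[u].add(y)
    new[y].add(u)
    new[v].add(x)
    new[x].add(v)
    return new


def nni_neighbors(adj: Adj) -> List[Adj]:
    """Both NNI rearrangements around every internal edge."""
    result = []
    for u, v in internal_edges(adj):
        a, b = sorted(adj[u] - {v})  # the two subtrees on u's side
        c, d = sorted(adj[v] - {u})  # the two subtrees on v's side
        # a stays with u; pairing it with c or with d gives two new topologies.
        result.append(swap(adj, u, b, v, c))
        result.append(swap(adj, u, b, v, d))
    return result
```

For a six-leaf caterpillar tree (three internal edges) the function returns 2(6) - 6 = 6 neighbors, and the input tree is left unchanged.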
Local moves for MP: SPR
• Neighborhood size is quadratic in the number of taxa
• Computing the minimum number of SPR moves between two rooted phylogenies is NP-hard
Local moves for MP: TBR
• Neighborhood size is cubic in the number of taxa
• Computing the minimum number of TBR moves between two unrooted phylogenies is NP-hard
Tree Bisection and Reconnection (TBR)
• Delete an edge, splitting the tree into two subtrees
• Reconnect the trees with a new edge that bifurcates an edge in each tree
Local optima are a problem
[Plot: TNT results; x-axis ticks 1 to 336, y-axis ticks 0 to 0.08]
Iterated local search: escape local optima by perturbation
[Figure: local search reaches a local optimum; a perturbation moves away from it, and local search resumes from the output of the perturbation]
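The perturb-then-descend loop can be written as a small generic skeleton. This Python sketch accepts a new local optimum only if it is strictly better; that acceptance rule is an assumption, and real ILS methods such as the ratchet or TNT may use others. `local_search`, `perturb`, and `score` are hypothetical callables supplied by the caller.

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def iterated_local_search(start: S,
                          local_search: Callable[[S], S],
                          perturb: Callable[[S, random.Random], S],
                          score: Callable[[S], float],
                          iterations: int,
                          seed: int = 0) -> S:
    """Alternate perturbation and local search, keeping the best optimum."""
    rng = random.Random(seed)
    best = local_search(start)
    for _ in range(iterations):
        # Kick the current local optimum, then descend again from there.
        candidate = local_search(perturb(best, rng))
        if score(candidate) < score(best):
            best = candidate
    return best
```

With costs [5, 3, 4, 1, 0, 2], a steepest-descent `local_search` over adjacent indices, and a deterministic `perturb` that jumps two positions, the search escapes the local optimum at index 1 and returns index 4.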
ILS for MP
• Ratchet
• Iterative-DCM3
• TNT
Ratchet
• Perturbation input: alignment and phylogeny
– Sample with replacement p% of sites and reweight them to w
– Perform local search on the modified dataset, starting from the input phylogeny
– Reset the alignment to the original after completion and output the local minimum
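A minimal sketch of the ratchet's data perturbation. It assumes p is given as a percentage, that each sampled site gets weight w (a site drawn more than once still gets weight w), and that unsampled sites keep weight 1; the function names are hypothetical. Local search would then use a weighted score such as `weighted_hamming` (or a weighted Fitch score), and resetting the alignment corresponds to returning to all-ones weights.

```python
import random
from typing import List


def ratchet_weights(k: int, p_percent: float, w: int,
                    rng: random.Random) -> List[int]:
    """Site weights for one ratchet round: sampled sites get weight w."""
    weights = [1] * k
    for _ in range(round(p_percent / 100 * k)):
        # Sampling with replacement: the same site may be drawn again.
        weights[rng.randrange(k)] = w
    return weights


def weighted_hamming(a: str, b: str, weights: List[int]) -> int:
    """Edge cost under the perturbed (reweighted) alignment."""
    return sum(wt for x, y, wt in zip(a, b, weights) if x != y)
```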
Ratchet: escaping local minima by data perturbation
[Figure: local search reaches a local optimum; a ratchet search on the reweighted data moves away from it, and local search continues from the output of the ratchet]
But how well does this perform? We have to examine this experimentally on real data.
Experimental methodology for MP on real data
• Collect alignments of real datasets
– Usually constructed using ClustalW
– Followed by manual (eye) adjustments
– Must be reliable to get a sensible tree!
• Run methods for a fixed time period
• Compare MP scores as a function of time
– Examine how scores improve over time
– Rate of convergence of the different methods (as a function of time, not of sequence length)
Experimental methodology for MP on real data
• We use rRNA and DNA alignments
• Obtained from researchers and public databases
• We run iterative improvement and ratchet each for 24 hours, beginning from a randomized greedy MP tree
• Each method was run five times and average scores were plotted
• We use PAUP*, a very widely used software package for various types of phylogenetic analysis
500 aligned rbcL sequences (Zilla dataset)
854 aligned rbcL sequences
2000 aligned Eukaryotes
7180 aligned 3domain
13921 aligned Proteobacteria
Comparison of MP heuristics
• What about other techniques for escaping local minima?
• TNT: a combination of divide-and-conquer, simulated annealing, and genetic algorithms
– Sectorial search (random): construct ancestral sequence states using parsimony; randomly select a subset of nodes; compute iterative-improvement trees on it and replace the current tree if a better one is found
– Genetic algorithm (fuse): exchange subtrees between two trees to see if better ones are found
– Default search: (1) do sectorial search starting from five randomized greedy MP trees; (2) apply the genetic algorithm to find better ones; (3) output the best tree
How does this compare to PAUP*-ratchet?
Experimental methodology for MP on real data
• We use rRNA and DNA alignments
• Obtained from researchers and public databases
• We run PAUP*-ratchet, TNT-default, and TNT-ratchet each for 24 hours beginning from randomized greedy MP trees
• Each method was run five times on each dataset and average scores were plotted
500 aligned rbcL sequences (Zilla dataset)
854 aligned rbcL sequences
2000 aligned Eukaryotes
7180 aligned 3domain
13921 aligned Proteobacteria
Can we do even better?
Yes! But first let’s look at
Disk-Covering Methods
Disk Covering Methods (DCMs)
• DCMs are divide-and-conquer booster methods. They divide the dataset into small subproblems, compute subtrees using a given base method, merge the subtrees, and refine the supertree.
• DCMs to date
– DCM1: for improving the statistical performance of distance-based methods
– DCM2: for improving heuristic search for MP and ML
– DCM3: the latest, fastest, and best (in accuracy and optimality) DCM
DCM2 technique for speeding up MP searches
1. Decompose sequences into overlapping subproblems
2. Compute subtrees using a base method
3. Merge subtrees using the Strict Consensus Merge (SCM)
4. Refine to make the tree binary
DCM2 decomposition
• Input: distance matrix d, threshold $q \in \{d_{ij}\}$, sequences S
• Algorithm:
1a. Compute a threshold graph G using q and d
1b. Perform a minimum weight triangulation of G
2. Find a separator X in G which minimizes $\max_i |A_i \cup X|$, where the $A_i$ are the connected components of G - X
3. Output the subproblems $A_i \cup X$
Threshold graph
• Add edges (in order of increasing distance) until the graph is connected
• Perform a minimum weight triangulation
– NP-hard
– A graph is triangulated if and only if it has a perfect elimination ordering (PEO)
– Maximal cliques of a triangulated graph can be determined in linear time
– Use a greedy triangulation heuristic: compute a PEO by adding the vertices which minimize the largest edge added
– Worst case is O(n^3), but fast in practice
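The first step can be sketched as follows, assuming q is chosen as the smallest distance at which the threshold graph becomes connected; the lecture also compares different thresholds, so this is only one possible choice. Adding edges in increasing order with union-find (as in Kruskal's algorithm) finds that q. `threshold_graph` and `min_connecting_threshold` are hypothetical names.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple


def threshold_graph(d: Sequence[Sequence[float]],
                    q: float) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, of the threshold graph: d[i][j] <= q."""
    return {(i, j) for i, j in combinations(range(len(d)), 2) if d[i][j] <= q}


def min_connecting_threshold(d: Sequence[Sequence[float]]) -> float:
    """Add edges by increasing distance until the graph is connected.

    Returns the distance of the last edge needed (0.0 if n <= 1).
    """
    n = len(d)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for i, j in sorted(combinations(range(n), 2), key=lambda e: d[e[0]][e[1]]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
            if components == 1:
                return d[i][j]
    return 0.0
```

For the 4 x 4 matrix with d01 = 1, d12 = 2, d23 = 3 and all other distances larger, the threshold is 3 and the threshold graph is the path 0-1-2-3.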
Finding DCM2 separator
1. Find a separator X in G which minimizes $\max_i |A_i \cup X|$, where the $A_i$ are the connected components of G - X
2. Output the subproblems $A_i \cup X$
3. This takes O(n^3) worst-case time: a depth-first search of G - X (O(n^2)) for each of the O(n) candidate separators
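The separator search can be sketched in Python, assuming the candidate separators (O(n) of them for a triangulated graph) are supplied by the caller; `components_without` and `dcm2_decompose` are hypothetical names. Each candidate costs one depth-first search of G - X.

```python
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]


def components_without(graph: Graph, removed: Set[str]) -> List[Set[str]]:
    """Connected components of G - X via iterative depth-first search."""
    seen: Set[str] = set(removed)
    comps = []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def dcm2_decompose(graph: Graph,
                   candidates: Iterable[Set[str]]) -> List[Set[str]]:
    """Pick the separator X minimizing max |A_i | X|; return each A_i | X."""
    best_cost: Optional[int] = None
    best: List[Set[str]] = []
    for X in candidates:
        comps = components_without(graph, X)
        if len(comps) < 2:
            continue  # X does not separate the graph
        cost = max(len(A | X) for A in comps)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [A | X for A in comps]
    return best
```

On the path a-b-c-d-e with candidates {b}, {c}, {d}, the separator {c} wins (largest subproblem of size 3), giving subproblems {a, b, c} and {c, d, e}.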
DCM2 subsets
DCM1 vs DCM2
DCM1 decomposition: NJ gets better accuracy on small-diameter subproblems (which we shall return to later)
DCM2 decomposition: getting a smaller number of smaller subproblems speeds up the solution
We saw how decomposition takes place, now on to supertree methods
1. Decompose sequences into overlapping subproblems
2. Compute subtrees using a base method
3. Merge subtrees using the Strict Consensus Merge (SCM)
4. Refine to make the tree binary
Supertree Methods
Optimization problems
• Subtree Compatibility: Given a set of trees $\mathcal{T} = \{T_1, \ldots, T_k\}$, does there exist a tree $T$ such that $T|_{L(t)} = t$ for all $t \in \mathcal{T}$, where $T|_{L(t)}$ is $T$ restricted to the leaf set $L(t)$ of $t$ (we say $T$ contains $\mathcal{T}$)?
• NP-hard (Steel 1992)
• Special cases are polynomial-time (rooted trees, DCM)
• MRP: also NP-hard
Direct supertree methods
• Strict consensus supertrees, MinCutSupertrees
Indirect supertree methods
• MRP, Average consensus
MRP: Matrix Representation using Parsimony (very popular)
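MRP encodes every input tree as binary characters and then runs an MP search on the resulting matrix. Here is a sketch of the encoding, assuming rooted input trees written as nested tuples: each non-root internal node (clade) becomes one character, coded 1 for taxa in the clade, 0 for other taxa in that tree, and ? for taxa missing from it. The names are hypothetical, and the MP search itself is not shown.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(t: Tree) -> Set[str]:
    """Leaf (taxon) set of a tree."""
    return {t} if isinstance(t, str) else set().union(*(leaves(c) for c in t))


def clades(t: Tree) -> List[Set[str]]:
    """Leaf sets of the non-root internal nodes of a rooted tree."""
    out: List[Set[str]] = []

    def walk(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return  # single leaves give uninformative characters
        if not is_root:
            out.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(t, True)
    return out


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """One binary character per clade: 1 inside, 0 outside, ? if absent."""
    taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows: Dict[str, List[str]] = {x: [] for x in taxa}
    for t in trees:
        present = leaves(t)
        for clade in clades(t):
            for x in taxa:
                rows[x].append("1" if x in clade else "0" if x in present else "?")
    return {x: "".join(chars) for x, chars in rows.items()}
```

For the trees ((a, b), c) and ((b, c), d), the matrix rows are a: 1?, b: 11, c: 01, d: ?0.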
Strict Consensus Merger (SCM): faster, and used in DCMs
[Figure: merging a tree on leaves 1-6 with a tree on leaves 1-4 and 7 through their shared leaves 1-4, giving a tree on leaves 1-7]
Strict Consensus Merger: compatible subtrees
Strict Consensus Merger: compatible but collision
Strict Consensus Merger: incompatible subtrees
Strict Consensus Merger: incompatible and collision
Strict Consensus Merger: difference from Gordon’s SC method
Tree Refinement
• Challenge: given an unresolved tree, find a binary refinement with an optimal parsimony score
• NP-hard
[Figure: an unresolved tree on leaves a-h and several alternative refinements of it]
Comparing DCM decompositions
Study of DCM decompositions
DCM2 is faster and better than DCM1
[Figure panels: comparison of MP scores; comparison of running times]
Best DCM (DCM2) vs Random
[Figure panels: comparison of MP scores; comparison of running times]
DCM2 is better than RANDOM with respect to both MP scores and running times
DCM2 (comparing two different thresholds)
[Figure panels: comparison of MP scores; comparison of running times]
Threshold selection techniques
Biological dataset of 503 rRNA sequences. The threshold value at which we get two subproblems has the best MP score.
Comparing supertree methods
MRP vs. SCM
SCM is better than MRP
[Figure panels: comparison of MP scores; comparison of running times]
Comparing tree refinement techniques
Study of tree refinement techniques
[Figure panels: comparison of MP scores; comparison of running times]
Constrained tree search had the best MP scores but is slower than the other methods
Next time
• DCM1 for improving NJ
• Recursive-Iterative-DCM3: state of the art in solving MP and ML