![Page 1: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/1.jpg)
SuperFine, a new supertree method
Shel Swenson
September 17th 2009
![Page 2: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/2.jpg)
Tree of Life challenges:
- millions of species
- lots of missing data
Reconstructing the Tree of Life
Two possible approaches:
- Combined Analysis
- Supertree Methods
![Page 3: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/3.jpg)
Two competing approaches
[Figure: a species-by-gene data matrix (gene 1, gene 2, ..., gene k) passed to Combined Analysis]
![Page 4: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/4.jpg)
Combined Analysis Methods
gene 1
S1 TCTAATGGAA
S2 GCTAAGGGAA
S3 TCTAAGGGAA
S4 TCTAACGGAA
S7 TCTAATGGAC
S8 TATAACGGAA

gene 2
S4 GGTAACCCTC
S5 GCTAAACCTC
S6 GGTGACCATC
S7 GCTAAACCTC

gene 3
S1 TATTGATACA
S3 TCTTGATACC
S4 TAGTGATGCA
S7 CATTCATACC
S8 TAGTGATGCA
![Page 5: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/5.jpg)
Combined Analysis

| Species | gene 1 | gene 2 | gene 3 |
|---|---|---|---|
| S1 | TCTAATGGAA | ?????????? | TATTGATACA |
| S2 | GCTAAGGGAA | ?????????? | ?????????? |
| S3 | TCTAAGGGAA | ?????????? | TCTTGATACC |
| S4 | TCTAACGGAA | GGTAACCCTC | TAGTGATGCA |
| S5 | ?????????? | GCTAAACCTC | ?????????? |
| S6 | ?????????? | GGTGACCATC | ?????????? |
| S7 | TCTAATGGAC | GCTAAACCTC | CATTCATACC |
| S8 | TATAACGGAA | ?????????? | TAGTGATGCA |
![Page 6: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/6.jpg)
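Two competing approaches
[Figure: the species-by-gene data matrix (gene 1, gene 2, ..., gene k) is either passed to Combined Analysis, or each gene is analyzed separately and the resulting trees are passed to a Supertree Method]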
![Page 7: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/7.jpg)
Why use supertree methods?
• Missing data
• Large dataset sizes
• Incompatible data types (e.g., morphological features, biomolecular sequences, gene orders, even distances based upon biochemistry)
• Unavailable sequence data (only trees)
![Page 8: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/8.jpg)
Many Supertree Methods
• MRP
• weighted MRP
• Min-Cut
• Modified Min-Cut
• Semi-strict Supertree
• MRF
• MRD
• QILI
• SDM
• Q-imputation
• PhySIC
• Majority-Rule Supertrees
• Maximum Likelihood Supertrees
• and many more ...
MRP = Matrix Representation with Parsimony (most commonly used and most accurate)
![Page 9: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/9.jpg)
Today’s Outline
• Supertree and combined analysis methods
• Why we need better supertree methods
• SuperFine: a new supertree method that is fast and more accurate than other supertree methods
– Strict Consensus Merger (SCM)
– Resolving polytomies
– Performance of SuperFine (compared to MRP and combined analyses)
– Applications and future work
![Page 10: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/10.jpg)
Previous Simulation Studies
1. Generate Model Tree
2. Generate sequence data (taxa by gene 1, gene 2, ..., gene k)
3. Select Subsets
4. Construct Source Trees
5. Apply Supertree Method
6. Compare to Model Tree
![Page 11: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/11.jpg)
What leads to missing data?
• Evolution (gain and loss of genes)
• Dataset selection
• Limited resources (time, money, etc.)
![Page 12: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/12.jpg)
My Simulation Study
1. Generate model trees (100-1000 taxa)
2. Simulate gene gain and loss and generate sequences
3. Simulate techniques for gene and taxon selection
• Clade-based datasets
• Scaffold dataset
4. Generate source trees and a combined dataset
5. Apply supertree and combined analysis methods
6. Compare each estimated tree to the model tree, and record topological error
![Page 13: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/13.jpg)
Experimental Parameters
• Number of taxa in model tree: 100, 500, and 1000
– Generate 5, 15 and 25 clade-based datasets, respectively
• Scaffold density: 20%, 50%, 75%, and 100%
• Six super-methods:
– Combined analysis using ML and MP
– MRP on ML and MP source trees
– Weighted MRP on ML and MP source trees
(MRP = Matrix Representation with Parsimony)
![Page 14: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/14.jpg)
Quantifying Topological Error
[Figure: a True Tree and an Estimated Tree, each on leaves A, B, C, D, E, F]
• False positive (FP): An edge in the estimated tree not in the true tree
• False negative (FN): An edge in the true tree missing from the estimated tree
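The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming that an "edge" is identified with the bipartition of the leaf set it induces and that trees are written as nested tuples; the two six-leaf trees are hypothetical and are not the ones drawn on the slide.

```python
def leaf_set(tree):
    """Return the leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (internal edges) of a tree.

    Each bipartition is stored as the side that does not contain the
    smallest leaf label, so the same split always gets the same key.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaf_set(node)
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def topological_error(true_tree, estimated_tree):
    """Return (number of false positives, number of false negatives)."""
    true_splits = bipartitions(true_tree)
    estimated_splits = bipartitions(estimated_tree)
    false_positives = estimated_splits - true_splits
    false_negatives = true_splits - estimated_splits
    return len(false_positives), len(false_negatives)


# Hypothetical trees, chosen only to illustrate the definitions.
true_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
estimated_tree = ((("A", "B"), "D"), ("C", ("E", "F")))
fp, fn = topological_error(true_tree, estimated_tree)
print(f"FP = {fp}, FN = {fn}")  # prints "FP = 1, FN = 1"
```

Dividing these counts by the number of internal edges in the estimated and true tree, respectively, is one common way to turn them into rates; the normalization used in the study is not stated on the slide.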
![Page 15: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/15.jpg)
Comparison of MRP-ML and CA-ML (False Negative Rate)
[Figure: false negative rate plotted against scaffold density (%)]
![Page 16: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/16.jpg)
We still need supertree methods!
Combined analysis cannot be used for:
– Datasets that are very large
– Incompatible data types
– Unavailable sequence data
![Page 17: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/17.jpg)
Outline
• Supertree and combined analysis methods
• Why we need better supertree methods
• SuperFine: a new supertree method that is fast and more accurate than other supertree methods
– Strict Consensus Merger (SCM)
– Resolving polytomies
– Performance of SuperFine (compared to MRP and combined analyses)
– Applications and future work
![Page 18: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/18.jpg)
Methods that Led to SuperFine
• The Strict Consensus Merger (SCM) (Huson et al. 1999)
• Quartet MaxCut (QMC) (Snir and Rao 2008)
![Page 19: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/19.jpg)
Strict Consensus Merger (SCM)
[Figure: SCM example with two source trees, one on leaves a, b, c, d, e, f, g and one on leaves a, b, c, d, h, i, j, which overlap on leaves a, b, c, d]
![Page 20: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/20.jpg)
Theorem
Let S be a collection of source trees and T be an SCM tree on S.
Then for every s in S, $\Sigma(T|_{L(s)}) \subseteq \Sigma(s)$, where $T|_{L(s)}$ is the induced subtree of T on the leaf set of s.
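A small sketch may help make the statement concrete. It assumes that Σ(t) denotes the set of non-trivial bipartitions of a tree t and that the relation in the theorem is containment, so that T restricted to the leaves of s only contains edges that s also has. The trees below are hypothetical: `T` is a hand-built tree standing in for an SCM tree, not the output of an SCM implementation.

```python
def leaf_set(tree):
    """Return the leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree, keep=None):
    """Non-trivial bipartitions of `tree` restricted to the leaves in `keep`.

    With keep=None the full leaf set is used. Each bipartition is stored
    as the side that does not contain the smallest kept leaf.
    """
    leaves = leaf_set(tree)
    keep = leaves if keep is None else leaves & frozenset(keep)
    anchor = min(keep)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaf_set(node) & keep
        side = keep - below if anchor in below else below
        if 2 <= len(side) <= len(keep) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


# Hypothetical source trees that disagree on a, b, c, d.
s1 = (("a", "b"), ("c", "d"), ("e", ("f", "g")))
s2 = (("a", "c"), ("b", "d"), ("h", ("i", "j")))
# Hypothetical stand-in for an SCM tree: a polytomy of degree 6.
T = ("a", "b", "c", "d", ("e", ("f", "g")), ("h", ("i", "j")))

for s in (s1, s2):
    restricted = splits(T, leaf_set(s))
    # Every edge of T restricted to L(s) is also an edge of s.
    print(restricted <= splits(s))
```

In this example the restriction of `T` to the leaves of `s1` keeps only the edges separating e, f, g (and f, g) from the rest, both of which are present in `s1`.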
![Page 21: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/21.jpg)
Intuition for the Theorem
[Figure: the SCM example again, with the two source trees on leaves a, b, c, d, e, f, g and a, b, c, d, h, i, j and the merged tree on leaves a to j]
![Page 22: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/22.jpg)
Performance of SCM
• Low false positive (FP) rate (Estimated supertree has few false edges)
• High false negative (FN) rate (Estimated supertree is missing many true edges)
![Page 23: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/23.jpg)
Methods that Led to SuperFine
• The Strict Consensus Merger (SCM) (Huson et al. 1999)
• Quartet MaxCut (QMC) (Snir and Rao 2008)
![Page 24: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/24.jpg)
Quartet MaxCut (QMC)
QMC is a heuristic for the following optimization problem:
Given a collection Q of quartet trees, find a supertree T, with leaf set $L(T) = \bigcup_{q \in Q} L(q)$, that displays the maximum number of quartet trees in Q.
[Figure: a tree on leaves 1 to 7 and two quartet trees on leaves 1, 2, 4, 5]
![Page 25: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/25.jpg)
Maximizing # of Quartet Trees Displayed
• 12|34, 23|45, 34|56, 45|67 are compatible quartet trees, with a supertree on leaves 1 to 7
• Adding the quartet 17|23 creates an incompatible set of quartet trees. An “optimal” supertree would be the same as above, because it agrees with 4 out of 5 quartet trees.
[Figure: the supertree on leaves 1 to 7 and the individual quartet trees]
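The count in this example can be checked with a short sketch. It assumes that a tree displays a quartet ab|cd when some edge separates {a, b} from {c, d}, and it uses the caterpillar tree on leaves 1 to 7 as the supertree. That tree is an assumption made here because it displays the four compatible quartets; the tree drawn on the slide is not recoverable from the text.

```python
def leaf_set(tree):
    """Return the leaf labels below a node of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Return the leaf set below every internal node of the tree."""
    if not isinstance(tree, tuple):
        return []
    found = [leaf_set(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def displays(tree, quartet):
    """True if some edge of `tree` separates the two pairs of `quartet`."""
    (a, b), (c, d) = quartet
    everything = leaf_set(tree)
    for below in clades(tree):
        above = everything - below
        if {a, b} <= below and {c, d} <= above:
            return True
        if {c, d} <= below and {a, b} <= above:
            return True
    return False


# Assumed supertree: the caterpillar tree on leaves 1..7.
supertree = ((1, 2), (3, (4, (5, (6, 7)))))
quartets = [
    ((1, 2), (3, 4)),
    ((2, 3), (4, 5)),
    ((3, 4), (5, 6)),
    ((4, 5), (6, 7)),
    ((1, 7), (2, 3)),  # the quartet that makes the set incompatible
]
displayed = sum(displays(supertree, q) for q in quartets)
print(f"{displayed} of {len(quartets)} quartets displayed")  # prints "4 of 5 quartets displayed"
```

This only scores a given tree; searching for the tree that maximizes the score is the hard optimization problem that QMC addresses heuristically.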
![Page 26: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/26.jpg)
QMC as a Supertree Method
• Step 1: Encode source trees as a set of quartets
• Step 2: Apply QMC
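Step 1 can be sketched as follows. This minimal version assumes the encoding takes every quartet tree induced by a source tree; the slides do not say whether all quartets or only a sample are used, so that choice, the nested-tuple tree format, and the helper name `quartets_of` are all assumptions made for illustration.

```python
from itertools import combinations


def leaf_set(tree):
    """Return the leaf labels below a node of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Return the leaf set below every internal node of the tree."""
    if not isinstance(tree, tuple):
        return []
    found = [leaf_set(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def quartets_of(tree):
    """Return all quartet trees ab|cd induced by `tree`.

    Each quartet is a frozenset holding its two pairs as frozensets.
    """
    everything = leaf_set(tree)
    sides = [(below, everything - below) for below in clades(tree)]
    quartets = set()
    for a, b, c, d in combinations(sorted(everything), 4):
        pairings = (({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c}))
        for left, right in pairings:
            separated = any(
                (left <= x and right <= y) or (left <= y and right <= x)
                for x, y in sides
            )
            if separated:
                quartets.add(frozenset([frozenset(left), frozenset(right)]))
    return quartets


# Hypothetical five-leaf source tree.
source_tree = (("a", "b"), ("c", ("d", "e")))
encoded = quartets_of(source_tree)
print(len(encoded))  # prints 5: one resolved quartet per 4-leaf subset
```

The union of these sets over all source trees would then be the input handed to QMC in Step 2. The number of quartets grows with the fourth power of the number of leaves, which is one reason an implementation might sample them instead.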
![Page 27: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/27.jpg)
Idea behind SuperFine
• First, construct a supertree with low false positives using SCM (the Strict Consensus Merger)
• Then, refine the tree to reduce false negatives by resolving each polytomy using QMC (Quartet MaxCut)
![Page 28: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/28.jpg)
Resolving a single polytomy, v
• Step 1: Encode each source tree as a collection of quartet trees on {1,2,...,d}, where d=degree(v)
• Step 2: Apply Quartet MaxCut (Snir and Rao) to the collection of quartet trees, to produce a tree t on leafset {1,2,...,d}
• Step 3: Replace the star tree at v by tree t
Why?
![Page 29: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/29.jpg)
Back to Our Example
[Figure: the example trees on leaves a to j; the parts of the merged tree around the polytomy are labeled 1 to 6, and the leaves of the two source trees are relabeled with these labels]
![Page 30: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/30.jpg)
Where We Use the Theorem
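For every s in S, $\Sigma(T|_{L(s)}) \subseteq \Sigma(s)$
[Figure: the merged tree on leaves a to j and the two source trees, shown with their relabeled versions on labels {1, 2, 3, 4} and {1, 4, 5, 6}]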
![Page 31: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/31.jpg)
Step 1: Encode each source tree as a collection of quartet trees on {1,2,...,d}
[Figure: the two source trees and their encodings as trees on labels {1, 2, 3, 4} and {1, 4, 5, 6}]
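The relabeling in Step 1 can be sketched as below. This is a simplification under stated assumptions, not necessarily the encoding SuperFine uses: each leaf is mapped to the label of the part of the merged tree it falls in around the polytomy, and a source-tree quartet is kept only if its four leaves receive four different labels. The mapping `component_of`, the function name, and the input quartets are hypothetical.

```python
def relabel_quartets(quartets, component_of):
    """Map quartets on original leaves to quartets on labels 1..d.

    `quartets` holds pairs of pairs ((a, b), (c, d)) meaning ab|cd.
    Quartets whose leaves do not land in four distinct components
    carry no information about the polytomy and are dropped.
    """
    encoded = set()
    for (a, b), (c, d) in quartets:
        labels = [component_of[leaf] for leaf in (a, b, c, d)]
        if len(set(labels)) < 4:
            continue
        left = frozenset(labels[:2])
        right = frozenset(labels[2:])
        encoded.add(frozenset([left, right]))
    return encoded


# Hypothetical polytomy of degree d = 6: leaves a, b, c, d are their own
# components, e/f/g form component 5 and h/i/j form component 6.
component_of = {"a": 1, "b": 2, "c": 3, "d": 4,
                "e": 5, "f": 5, "g": 5,
                "h": 6, "i": 6, "j": 6}

# A few hypothetical quartets from one source tree.
source_quartets = [
    (("a", "b"), ("c", "d")),  # becomes 12|34
    (("a", "b"), ("e", "f")),  # dropped: e and f share component 5
    (("a", "e"), ("c", "d")),  # becomes 15|34
    (("a", "f"), ("c", "d")),  # also 15|34, so it adds nothing new
]
encoded = relabel_quartets(source_quartets, component_of)
print(len(encoded))  # prints 2
```

Several original quartets can collapse to the same relabeled quartet, so the collection passed to QMC is over only d labels rather than over all leaves.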
![Page 32: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/32.jpg)
Step 2: Apply Quartet MaxCut (QMC) to the collection of quartet trees
[Figure: QMC takes the trees on labels {1, 2, 3, 4} and {1, 4, 5, 6} and produces a tree on labels 1 to 6]
![Page 33: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/33.jpg)
Replace polytomy using tree from QMC
[Figure: the polytomy in the merged tree on leaves a to j is replaced by the QMC tree on labels 1 to 6, giving a more resolved tree on leaves a to j]
![Page 34: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/34.jpg)
False Negative Rate
[Figure: false negative rate plotted against scaffold density (%)]
![Page 35: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/35.jpg)
False Negative Rate
[Figure: false negative rate plotted against scaffold density (%)]
![Page 36: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/36.jpg)
False Positive Rate
[Figure: false positive rate plotted against scaffold density (%)]
![Page 37: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/37.jpg)
Running Time: SuperFine vs. MRP
MRP 8-12 sec.
SuperFine 2-3 sec.
[Figure: three running-time panels, each plotted against scaffold density (%)]
![Page 38: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/38.jpg)
Observations
• SuperFine is much more accurate than MRP, with comparable performance only when the scaffold density is 100%
• SuperFine is almost as accurate as CA-ML
• SuperFine is extremely fast
![Page 39: SupreFine, a new supertree method Shel Swenson September 17th 2009](https://reader033.vdocuments.us/reader033/viewer/2022051522/5a4d1b647f8b9ab0599af4fb/html5/thumbnails/39.jpg)
Future Work
• Exploring algorithm design space for SuperFine
– Different quartet encodings
– Not using SCM in Step 1
– Parallel version
– Post-processing step to minimize Sum-of-FN to source trees
• Using SuperFine to enable phylogeny estimation
– without an alignment
– on many-marker combined datasets
• Using SuperFine in conjunction with divide-and-conquer methods to create more accurate phylogenetic methods
• Exploration of impact of source tree collections (in particular the scaffold) on supertree analyses
• Revisiting specific biological supertrees