astral tutorial - tandy warnow · gens (k), shown in log scale (see fig. s2 for normal scale). a...
TRANSCRIPT
ASTRAL Tutorial
• Instructor: Siavash Mirarab, [email protected]
• Github site: https://github.com/smirarab/ASTRAL
• Email: [email protected]
1
OrangutanGorilla ChimpHuman
The species tree
A gene treeOrang.Gorilla ChimpHuman Orang.Gorilla Chimp Human
Gene tree discordance
gene1000gene 1
OrangutanGorilla ChimpHuman
The species tree
A gene treeOrang.Gorilla ChimpHuman Orang.Gorilla Chimp Human
Gene tree discordance
Causes of gene tree discordance include:• Duplication and loss • Horizontal Gene Transfer (HGT) and Hybridization • Incomplete Lineage Sorting (ILS)
gene1000gene 1
OrangutanGorilla ChimpHuman
Gene evolution model (MSC)
Orang.GorillaChimp
Human Orang.Gorilla ChimpHuman
Orang.Gorilla
ChimpHuman
Orang.Chimp Human
ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG
CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G
AGCAGCATCGTG AGCAGC-TCGTG AGCAGC-TC-TG C-TA-CACGGTG
CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT
Sequence evolution model (GTR)
3
Species tree
Gene tree
Sequence data(Alignments)
Gene tree Gene tree Gene tree
Sequence data(Alignments)
Model
Orang.GorillaChimp
Human Orang.Gorilla ChimpHuman
Orang.Gorilla
ChimpHuman
Orang.Chimp Human
ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG
CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G
AGCAGCATCGTG AGCAGC-TCGTG AGCAGC-TC-TG C-TA-CACGGTG
CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT
Step 1: infer gene trees (e.g., ML)
4
Gene tree Gene tree Gene tree Gene tree
Two-step approach
OrangutanGorilla ChimpHuman
Step 2: gene tree summarization
Orang.GorillaChimp
Human Orang.Gorilla ChimpHuman
Orang.Gorilla
ChimpHuman
Orang.Chimp Human
ACTGCACACCG ACTGC-CCCCG AATGC-CCCCG -CTGCACACGG
CTGAGCATCG CTGAGC-TCG ATGAGC-TC- CTGA-CAC-G
AGCAGCATCGTG AGCAGC-TCGTG AGCAGC-TC-TG C-TA-CACGGTG
CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT
Step 1: infer gene trees (e.g., ML)
4
Gene tree Gene tree Gene tree Gene tree
Two-step approach
ASTRAL• Input: Unrooted gene trees
• Can have missing data
• Can have polytomies
• Can have multiple individuals per species
• Output: The estimated unrooted species tree
• Will have branch lengths in coalescent units on internal branches (not super accurate)
• Will have a measure of support called localPP
5
A bit on the input
6
Typical pipeline to produce input
1. Find orthologous parts of the genomes for species of interest
2. Align sequences per ortholog using your favorite MSA method (e.g., PASTA, UPP, MAFFT, etc.)
3. Infer “gene trees” per ortholog using Maximum Likelihood methods (e.g., RAxML or FastTree)Optionally: perform various filtering steps to remove errors
7
samples
ACTGCACACCG ACTGC-CCCCG AATGC-CCCC- -CTGCACACGG
CAGGCACGCACGAA AGC-CACGC-CATA ATGGCACGC-C-TA AGCTAC-CACGGAT
ACTGCACACCG ACTGCCCCCG AATGCCCCC CTGCACACGG
CAGGCACGCACGAA AGCCACGCCATA ATGGCACGCCTA AGCTACCACGGAT
gene 1000
gene 1 MSA
MSAOrtholog gene mining (using
WGA, annotation, HMMs, etc.)SequencingSample preparation Assembly
CTGAGCATCG
Reads
CTGAGCTCG
ATGAGCTCCTGACACG
CTGAGCTCG
ATGAGCTC
CTGACACGAACGACTAG...ACCGAAGTAAATATATAAT
AGTACACGAACGA...ACCAAGAAGTAATATACATATA
ATGACACGAACAAG...TACGAAGTGTACCGAGATATATACA
ACAAACGATCGACATAATT...ACCGAAGTAAACGTAT
Putative genomes
gene 1000
gene 1
Contamination Sequencing error Misassembly fragmentation
Mis-annotation, hidden paralogy, mis-alignment, fragmentation
Alignment error
Should you filter?• Filtering genes based on missing data?
• Generally not beneficial [Molloy and Warnow, 2018]
8
Should you filter?• Filtering genes based on missing data?
• Generally not beneficial [Molloy and Warnow, 2018]
• Filtering genes based on gene tree estimation error?
• Depends on conditions [Molloy and Warnow, 2018]
8
Should you filter?• Filtering genes based on missing data?
• Generally not beneficial [Molloy and Warnow, 2018]
• Filtering genes based on gene tree estimation error?
• Depends on conditions [Molloy and Warnow, 2018]
• Filtering fragmentary sequences while keeping the gene?
• Often beneficial [Sayyari, Whitfield, and Mirarab, 2018]
8
Should you filter?• Filtering genes based on missing data?
• Generally not beneficial [Molloy and Warnow, 2018]
• Filtering genes based on gene tree estimation error?
• Depends on conditions [Molloy and Warnow, 2018]
• Filtering fragmentary sequences while keeping the gene?
• Often beneficial [Sayyari, Whitfield, and Mirarab, 2018]
• Filtering super long branches helps (talk today by Uyen Mai)
8
bestML versus MLBS
9
• Multi-locus Bootstrapping (MLBS) is possible
• See -b option here: https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrapping
• You need a file with bootstrap replicates for each gene tree
• MLBS not suggested as seem to degrade accuracy compared to simply using ML gene trees.
Bioinformatics, 2014, Mirarab et al.
Low support branches• Does it help to contract
branches with low support?
• Yes, but only for very low support branches
10
Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping
genes
All replicates
BMC Bioinformatics, 2018, Zhang et al.
Low support branches• Does it help to contract
branches with low support?
• Yes, but only for very low support branches
• Mostly helps in the presence of low support gene trees
10
Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping
genes
All replicates
BMC Bioinformatics, 2018, Zhang et al.
Low support branches• Does it help to contract
branches with low support?
• Yes, but only for very low support branches
• Mostly helps in the presence of low support gene trees
• More genes allows for more filtering
10
Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping
genes
All replicates
BMC Bioinformatics, 2018, Zhang et al.
A bit on the output
11
Unrooted quartets under MSC model
For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)
12
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
Unrooted quartets under MSC model
For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)
12
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
The most frequent gene tree = The most likely species tree
13
Orang.Gorilla ChimpHuman Rhesus
More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)
13
Orang.Gorilla ChimpHuman Rhesus
1. Break gene trees into (n 4 ) quartets of species
2. Find the dominant tree for all quartets of taxa
3. Combine quartet trees
Some tools (e.g.. BUCKy-p [Larget, et al., 2010])
More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)
13
Orang.Gorilla ChimpHuman Rhesus
1. Break gene trees into (n 4 ) quartets of species
2. Find the dominant tree for all quartets of taxa
3. Combine quartet trees
Some tools (e.g.. BUCKy-p [Larget, et al., 2010])
More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)
Alternative: weight all 3(n
4 ) quartet topologies by their frequency
and find the optimal tree
Maximum Quartet Support Species Tree
• Optimization problem (NP-Hard):
14
the set of quartet trees induced by T
a gene treeScore(T) =
k
∑1
|Q(T) ∪ Q(ti) |
Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees
Maximum Quartet Support Species Tree
• Optimization problem (NP-Hard):
• Statistically consistent under the multi-species coalescent model when solved exactly
14
the set of quartet trees induced by T
a gene treeScore(T) =
k
∑1
|Q(T) ∪ Q(ti) |
Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees
Maximum Quartet Support Species Tree
• Optimization problem (NP-Hard):
• Statistically consistent under the multi-species coalescent model when solved exactly
• ASTRAL: an exact solution using dynamic programming a constrained version that can run on large datasets
14
the set of quartet trees induced by T
a gene treeScore(T) =
k
∑1
|Q(T) ∪ Q(ti) |
Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees
Beyond topology, ASTRAL can …
• estimate length of internal branches in coalescent units: # generations / population size
15
Beyond topology, ASTRAL can …
• estimate length of internal branches in coalescent units: # generations / population size
• estimate a measure of branch support called local posterior probability
• The probability of the branch being correct assuming gene trees are generated by the MSC model
15
Beyond topology, ASTRAL can …
• estimate length of internal branches in coalescent units: # generations / population size
• estimate a measure of branch support called local posterior probability
• The probability of the branch being correct assuming gene trees are generated by the MSC model
• test for polytomies
15
Beyond topology, ASTRAL can …
• estimate length of internal branches in coalescent units: # generations / population size
• estimate a measure of branch support called local posterior probability
• The probability of the branch being correct assuming gene trees are generated by the MSC model
• test for polytomies
• annotate species tree branches with quartet-based measures of discordance
15
A bit on the search space
16
ASTRAL versions• ASTRAL-I (<v. 4.7.3) restricts the search space to combinations
of bipartitions seen in gene trees.
• This make it fast but it remains statistically consistent
17
ASTRAL versions• ASTRAL-I (<v. 4.7.3) restricts the search space to combinations
of bipartitions seen in gene trees.
• This make it fast but it remains statistically consistent
• ASTRAL-II (<v. 5.1.0) increased the search space heuristically
• Improved the accuracy at the expense of running time
• Can handle polytomies in input gene trees
17
ASTRAL versions• ASTRAL-I (<v. 4.7.3) restricts the search space to combinations
of bipartitions seen in gene trees.
• This make it fast but it remains statistically consistent
• ASTRAL-II (<v. 5.1.0) increased the search space heuristically
• Improved the accuracy at the expense of running time
• Can handle polytomies in input gene trees
• ASTRAL-III (>v. 5.1.1) changed the search space again for a better running time versus accuracy trade-off
• Improved running time for unresolved trees; makes it feasible to remove very low support branches from gene trees
17
Modifying the Search Space
• For small enough datasets, you can run ASTRAL in *exact* mode, which will find the globally optimal tree! (-x option)
• ASTRAL’s default usage will not run in exact mode, and will constrain the search space largely using the gene trees.
• It may be beneficial to *expand* the search space, using the “-e” or “-f” options (see tutorial at GitHub site). In particular, you can add species trees you’ve estimated using other methods, such as concatenation, SVDquartets, or species trees that other studies have suggested.
18
A bit on the running time
19
Empirical running time as a function of the number of genes
20
Zhang et al. Page 13 of 34
●
●
●
●
●
●
●
●
●
●
●
●
●
2.272.09
●
●
●
●
●
●
●
●
●
●
●
●
●
2.292.06
500 1500
28 29 210 211 212 213 214 28 29 210 211 212 213 214
20
25
210
#Genes
Run
ning
tim
e (m
inut
es)
●
●
ASTRAL2
ASTRAL3
Figure 5 Running time versus k. Average running times (4 replicates) are shown for ASTRAL-IIand ASTRAL-III on the avian dataset with 500bp or 1500bp alignments with varying numbers ofgens (k), shown in log scale (see Fig. S2 for normal scale). A line is fit to the data points in thelog/log space and line slopes are shown. ASTRAL-II did not finish on 214 genes in 48 hour.
branches has minimal impact on the discordance (eight discordant branches with
binned MP-EST instead of nine). However, contracting low support branches with
3%–33% thresholds dramatically reduces the discordance with the reference tree (2,
2, 4, 2, 3, and 3 discordant branches, respectively, for 3%, 5%, 7%, 10%, 20%, and
33%). Three thresholds (3%, 5%, and 10%) produce an identical tree (Fig. 4d). The
remaining di↵erences are among the branches that are deemed unresolved by Jarvis
et al. and change among the reference trees as well [5]. Contracting at 50% and 75%
thresholds, however, increases discordance to five and six branches, respectively.
Thus, consistent with simulations, contracting very low support branches seems
to produce the best results, when judged by similarity with the reference trees. To
summarize, ASTRAL-III obtained on unbinned but collapsed gene trees agreed with
all major relations in Jarvis et al., including the novel Columbea group, whereas
the unresolved tree missed important clades (Fig. 4).
3.2 RQ2: Running time improvements
We study the improvements in running time as various parameters change.
3.2.1 Varying the number of genes (k)
We compare ASTRAL-III to ASTRAL-II on the avian simulated dataset, changing
the number of genes from 28 to 214 and forcing X to be the same for both versions to
enable comparing impacts of improved weight calculation. We allow each replicate
run to take up to two days. ASTRAL-II is not able to finish on the dataset with
k = 214, while ASTRAL-III finishes on all conditions. ASTRAL-III improves the
running time over ASTRAL-II and the extent of the improvement depends on k
(Fig. 5). With 1000 genes or more, there is at least a 2.1X improvement. With
213 genes, the largest value where both versions could run, ASTRAL-III finishes
on average 3.2 times faster than ASTRAL-II (234 versus 758 minutes). Moreover,
fitting a line to the average running time in the log-log scale graph reveals that
on this dataset, the running time of ASTRAL-III on average grows as O(k2.08),
which is better than that of ASTRAL-II at O(k2.28), and both are better than the
theoretical worst case, which is O(k2.726).
Zhang et al. Page 13 of 34
●
●
●
●
●
●
●
●
●
●
●
●
●
2.272.09
●
●
●
●
●
●
●
●
●
●
●
●
●
2.292.06
500 1500
28 29 210 211 212 213 214 28 29 210 211 212 213 214
20
25
210
#Genes
Run
ning
tim
e (m
inut
es)
●
●
ASTRAL2
ASTRAL3
Figure 5 Running time versus k. Average running times (4 replicates) are shown for ASTRAL-IIand ASTRAL-III on the avian dataset with 500bp or 1500bp alignments with varying numbers ofgens (k), shown in log scale (see Fig. S2 for normal scale). A line is fit to the data points in thelog/log space and line slopes are shown. ASTRAL-II did not finish on 214 genes in 48 hour.
branches has minimal impact on the discordance (eight discordant branches with
binned MP-EST instead of nine). However, contracting low support branches with
3%–33% thresholds dramatically reduces the discordance with the reference tree (2,
2, 4, 2, 3, and 3 discordant branches, respectively, for 3%, 5%, 7%, 10%, 20%, and
33%). Three thresholds (3%, 5%, and 10%) produce an identical tree (Fig. 4d). The
remaining di↵erences are among the branches that are deemed unresolved by Jarvis
et al. and change among the reference trees as well [5]. Contracting at 50% and 75%
thresholds, however, increases discordance to five and six branches, respectively.
Thus, consistent with simulations, contracting very low support branches seems
to produce the best results, when judged by similarity with the reference trees. To
summarize, ASTRAL-III obtained on unbinned but collapsed gene trees agreed with
all major relations in Jarvis et al., including the novel Columbea group, whereas
the unresolved tree missed important clades (Fig. 4).
3.2 RQ2: Running time improvements
We study the improvements in running time as various parameters change.
3.2.1 Varying the number of genes (k)
We compare ASTRAL-III to ASTRAL-II on the avian simulated dataset, changing
the number of genes from 28 to 214 and forcing X to be the same for both versions to
enable comparing impacts of improved weight calculation. We allow each replicate
run to take up to two days. ASTRAL-II is not able to finish on the dataset with
k = 214, while ASTRAL-III finishes on all conditions. ASTRAL-III improves the
running time over ASTRAL-II and the extent of the improvement depends on k
(Fig. 5). With 1000 genes or more, there is at least a 2.1X improvement. With
213 genes, the largest value where both versions could run, ASTRAL-III finishes
on average 3.2 times faster than ASTRAL-II (234 versus 758 minutes). Moreover,
fitting a line to the average running time in the log-log scale graph reveals that
on this dataset, the running time of ASTRAL-III on average grows as O(k2.08),
which is better than that of ASTRAL-II at O(k2.28), and both are better than the
theoretical worst case, which is O(k2.726).
Zhang et al. Page 13 of 34
●
●
●
●
●
●
●
●
●
●
●
●
●
2.272.09
●
●
●
●
●
●
●
●
●
●
●
●
●
2.292.06
500 1500
28 29 210 211 212 213 214 28 29 210 211 212 213 214
20
25
210
#Genes
Run
ning
tim
e (m
inut
es)
●
●
ASTRAL2
ASTRAL3
Figure 5 Running time versus k. Average running times (4 replicates) are shown for ASTRAL-IIand ASTRAL-III on the avian dataset with 500bp or 1500bp alignments with varying numbers ofgens (k), shown in log scale (see Fig. S2 for normal scale). A line is fit to the data points in thelog/log space and line slopes are shown. ASTRAL-II did not finish on 214 genes in 48 hour.
branches has minimal impact on the discordance (eight discordant branches with
binned MP-EST instead of nine). However, contracting low support branches with
3%–33% thresholds dramatically reduces the discordance with the reference tree (2,
2, 4, 2, 3, and 3 discordant branches, respectively, for 3%, 5%, 7%, 10%, 20%, and
33%). Three thresholds (3%, 5%, and 10%) produce an identical tree (Fig. 4d). The
remaining di↵erences are among the branches that are deemed unresolved by Jarvis
et al. and change among the reference trees as well [5]. Contracting at 50% and 75%
thresholds, however, increases discordance to five and six branches, respectively.
Thus, consistent with simulations, contracting very low support branches seems
to produce the best results, when judged by similarity with the reference trees. To
summarize, ASTRAL-III obtained on unbinned but collapsed gene trees agreed with
all major relations in Jarvis et al., including the novel Columbea group, whereas
the unresolved tree missed important clades (Fig. 4).
3.2 RQ2: Running time improvements
We study the improvements in running time as various parameters change.
3.2.1 Varying the number of genes (k)
We compare ASTRAL-III to ASTRAL-II on the avian simulated dataset, changing
the number of genes from 28 to 214 and forcing X to be the same for both versions to
enable comparing impacts of improved weight calculation. We allow each replicate
run to take up to two days. ASTRAL-II is not able to finish on the dataset with
k = 214, while ASTRAL-III finishes on all conditions. ASTRAL-III improves the
running time over ASTRAL-II and the extent of the improvement depends on k
(Fig. 5). With 1000 genes or more, there is at least a 2.1X improvement. With
213 genes, the largest value where both versions could run, ASTRAL-III finishes
on average 3.2 times faster than ASTRAL-II (234 versus 758 minutes). Moreover,
fitting a line to the average running time in the log-log scale graph reveals that
on this dataset, the running time of ASTRAL-III on average grows as O(k2.08),
which is better than that of ASTRAL-II at O(k2.28), and both are better than the
theoretical worst case, which is O(k2.726).
Avian simulations: n = 48 species k = 256 to 16,384 gene trees Relatively high ILS Low/moderate gene tree error 4 replicates per dot
Empirical running time seems to increase close to O(k2)
17 hours
21
Zhang et al. Page 34 of 34
●
● ●
● ●
●
1.09
10k
20k
50k
100k
200k
500k
1M
200 500 1000 2000 5000 10000#Species
Size
of X
●
●
●
●
●
●
1.83
1
10
100
1000
200 500 1000 2000 5000 10000#Species
Run
ning
tim
e (m
inut
es)
Figure S8 Empirical running time of ASTRAL-III with n. Average running time is shown forASTRAL-III for datasets with varying n. Averages are over 20 replicates. One replicate of 2000species dataset could not finish in 2 days and is removed from the analysis. Note that thesedatasets have factors other than n that change as well (e.g., the amount of ILS, etc.). Thus, theserunning times should be treated as ball-park estimates. Finally, we note that on the 10,000dataset, we have only 2 replicates and not 20.
Empirical running time as a function of the number of species
Empirical running time seems to increase close to O(n1.8)
Simphy simulations: n = 200 to 10,000 species k = 1000 gene trees Varying ILS Varying gene tree error 20 replicates everywhere, except for 10000 that has only 2 replicates
52 hours
Parallelization and scaling limits
• We have now developed ASTRAL-MP to use parallelism.
• Can analyze datasets with 10,000 species and 1000 genes in less than a day given 24 cores and a GPU
• https://github.com/smirarab/ASTRAL/tree/MP-similarity
22
●
●
●
●
●
●
●
●
●
●
●
4
31
73
108
1 12 162 20 244 6 8CPU cores
Spee
dup
vers
us A
STR
AL−I
II
●
●
No GPU
1 GPU
Examples
23
Simple run
• Runs ASTRAL version 5.6.1 on 1KP-genetrees.tre and save results in 1KP-speciestrees.tre with logs saved to 1KP-log.txt.
• The dataset is from Wickett, Mirarab, et al. PNAS (2014), Phylotranscriptomic Analysis of the Origins and early Diversification of Land Plants.
24
java -jar ~/workspace/ASTRAL/astral.5.6.1.jar -i 1KP-genetrees.tre -o 1KP-speciestrees.tre 2> 1KP-log.txt
Open the tree in FigTree
• ASTRAL trees are unrooted.
• You will have to root them.
• You may have to open 2-3 times in FigTree (not sure why)
25
Equisetum_diffusum
Bazzania_trilobata
Rhynchostegium_serrulatum
Riccia_sp
Hibiscus_cannabinus
Bryum_argenteum
Juniperus_scopulorum
Vitis_vinifera
Eschscholzia_californicaPodophyllum_peltatum
Physcomitrella_patens
Mesostigma_viride
Rosulabryum_cf_capillare
Prumnopitys_andina
Acorus_americanus
Selaginella_moellendorffii_1kp
Carica_papaya
Roya_obtusa
Chlorokybus_atmophyticus
Spirogyra_sp
Sorghum_bicolor
Thuidium_delicatulum
Sciadopitys_verticillata
Pseudolycopodiella_caroliniana
Taxus_baccata
Sarcandra_glabra
Boehmeria_nivea
Persea_americana
Catharanthus_roseus
Arabidopsis_thaliana
Nuphar_advena
Spirotaenia_minuta
Cycas_rumphii
Penium_margaritaceum
Dioscorea_villosa
Cycas_micholitzii
Kadsura_heteroclita
Sphagnum_lescurii
Ceratodon_purpureus
Aquilegia_formosa
Angiopteris_evecta
Coleochaete_irregularis
Kochia_scoparia
Cunninghamia_lanceolata
Netrium_digitus
Marchantia_polymorpha
Inula_helenium
Nothoceros_aenigmaticus
Ipomoea_purpurea
Yucca_filamentosa
Cylindrocystis_cushleckae
Gnetum_montanum
Metzgeria_crassipilis
Ginkgo_bilobaPsilotum_nudum
Dendrolycopodium_obscurum
Monomastix_opisthostigma
Liriodendron_tulipifera
Mougeotia_sp
Mesotaenium_endlicherianum
Diospyros_malabarica
Ephedra_sinica
Allamanda_cathartica
Zamia_vazquezii
Rosmarinus_officinalis
Saruma_henryi
Pteridium_aquilinum
Sphaerocarpos_texanus
Klebsormidium_subtile
Colchicum_autumnale
Amborella_trichopoda
Pinus_taeda
Leucodon_brachypus
Pyramimonas_parkeae
Sabal_bermudana
Cosmarium_ochthodes
Chaetosphaeridium_globosum
Huperzia_squarrosa
Zea_mays
Chara_vulgaris
Medicago_truncatula
Nephroselmis_pyriformis
Tanacetum_parthenium
Alsophila_spinulosa
Populus_trichocarpa
Cedrus_libani
Anomodon_attenuatus
Oryza_sativa
Marchantia_emarginata
Brachypodium_distachyon
Cylindrocystis_brebissonii
Uronema_sp
Entransia_fimbriata
Nothoceros_vincentianus
Smilax_bona
Larrea_tridentata
Polytrichum_commune
Coleochaete_scutata
Selaginella_moellendorffii_genome
Welwitschia_mirabilis
Ophioglossum_petiolatum
Houttuynia_cordata
Hedwigia_ciliata
Length in coalescent units
• Often better to look at cladogram
26
Entransia_fimbriata
Smilax_bona
Mougeotia_sp
Klebsormidium_subtile
Angiopteris_evecta
Marchantia_emarginata
Pinus_taedaCedrus_libani
Acorus_americanus
Psilotum_nudum
Eschscholzia_californica
Ophioglossum_petiolatum
Hibiscus_cannabinus
Sarcandra_glabra
Amborella_trichopoda
Catharanthus_roseus
Cylindrocystis_brebissonii
Oryza_sativa
Bazzania_trilobata
Uronema_sp
Chlorokybus_atmophyticus
Pyramimonas_parkeae
Brachypodium_distachyon
Houttuynia_cordata
Carica_papaya
Populus_trichocarpa
Dioscorea_villosa
Ephedra_sinica
Selaginella_moellendorffii_1kp
Diospyros_malabarica
Cylindrocystis_cushleckaeMesotaenium_endlicherianum
Coleochaete_scutata
Pteridium_aquilinum
Sorghum_bicolor
Cycas_micholitzii
Chara_vulgaris
Cycas_rumphii
Nuphar_advena
Arabidopsis_thaliana
Prumnopitys_andina
Pseudolycopodiella_caroliniana
Nephroselmis_pyriformis
Metzgeria_crassipilis
Aquilegia_formosa
Riccia_sp
Chaetosphaeridium_globosum
Physcomitrella_patens
Vitis_vinifera
Leucodon_brachypus
Roya_obtusa
Sciadopitys_verticillata
Larrea_tridentata
Equisetum_diffusum
Penium_margaritaceum
Sphaerocarpos_texanus
Zea_mays
Liriodendron_tulipifera
Marchantia_polymorpha
Ipomoea_purpurea
Sabal_bermudana
Rosulabryum_cf_capillare
Medicago_truncatula
Dendrolycopodium_obscurum
Ceratodon_purpureus
Allamanda_cathartica
Selaginella_moellendorffii_genome
Yucca_filamentosa
Saruma_henryi
Mesostigma_viride
Podophyllum_peltatum
Thuidium_delicatulum
Inula_heleniumRosmarinus_officinalis
Welwitschia_mirabilis
Spirogyra_sp
Colchicum_autumnale
Cosmarium_ochthodes
Kadsura_heteroclita
Coleochaete_irregularis
Rhynchostegium_serrulatum
Monomastix_opisthostigma
Taxus_baccata
Gnetum_montanum
Persea_americana
Alsophila_spinulosa
Sphagnum_lescurii
Cunninghamia_lanceolata
Netrium_digitus
Spirotaenia_minuta
Kochia_scoparia
Huperzia_squarrosa
Juniperus_scopulorum
Polytrichum_commune
Zamia_vazquezii
Nothoceros_vincentianus
Tanacetum_parthenium
Ginkgo_biloba
Boehmeria_nivea
Hedwigia_ciliata
Nothoceros_aenigmaticus
Anomodon_attenuatus
Bryum_argenteum
Branch support
27
Entransia_fimbriata
Smilax_bona
Mougeotia_sp
Klebsormidium_subtile
Angiopteris_evecta
Marchantia_emarginata
Pinus_taedaCedrus_libani
Acorus_americanus
Psilotum_nudum
Eschscholzia_californica
Ophioglossum_petiolatum
Hibiscus_cannabinus
Sarcandra_glabra
Amborella_trichopoda
Catharanthus_roseus
Cylindrocystis_brebissonii
Oryza_sativa
Bazzania_trilobata
Uronema_sp
Chlorokybus_atmophyticus
Pyramimonas_parkeae
Brachypodium_distachyon
Houttuynia_cordata
Carica_papaya
Populus_trichocarpa
Dioscorea_villosa
Ephedra_sinica
Selaginella_moellendorffii_1kp
Diospyros_malabarica
Cylindrocystis_cushleckaeMesotaenium_endlicherianum
Coleochaete_scutata
Pteridium_aquilinum
Sorghum_bicolor
Cycas_micholitzii
Chara_vulgaris
Cycas_rumphii
Nuphar_advena
Arabidopsis_thaliana
Prumnopitys_andina
Pseudolycopodiella_caroliniana
Nephroselmis_pyriformis
Metzgeria_crassipilis
Aquilegia_formosa
Riccia_sp
Chaetosphaeridium_globosum
Physcomitrella_patens
Vitis_vinifera
Leucodon_brachypus
Roya_obtusa
Sciadopitys_verticillata
Larrea_tridentata
Equisetum_diffusum
Penium_margaritaceum
Sphaerocarpos_texanus
Zea_mays
Liriodendron_tulipifera
Marchantia_polymorpha
Ipomoea_purpurea
Sabal_bermudana
Rosulabryum_cf_capillare
Medicago_truncatula
Dendrolycopodium_obscurum
Ceratodon_purpureus
Allamanda_cathartica
Selaginella_moellendorffii_genome
Yucca_filamentosa
Saruma_henryi
Mesostigma_viride
Podophyllum_peltatum
Thuidium_delicatulum
Inula_heleniumRosmarinus_officinalis
Welwitschia_mirabilis
Spirogyra_sp
Colchicum_autumnale
Cosmarium_ochthodes
Kadsura_heteroclita
Coleochaete_irregularis
Rhynchostegium_serrulatum
Monomastix_opisthostigma
Taxus_baccata
Gnetum_montanum
Persea_americana
Alsophila_spinulosa
Sphagnum_lescurii
Cunninghamia_lanceolata
Netrium_digitus
Spirotaenia_minuta
Kochia_scoparia
Huperzia_squarrosa
Juniperus_scopulorum
Polytrichum_commune
Zamia_vazquezii
Nothoceros_vincentianus
Tanacetum_parthenium
Ginkgo_biloba
Boehmeria_nivea
Hedwigia_ciliata
Nothoceros_aenigmaticus
Anomodon_attenuatus
Bryum_argenteum
1
1
1
1
1
1
1
0.94
1
1
1
1
1
0.99
1
1
1
1
1
1
1
1
1
0.7
0.39
1
0.63
0.81
1
1
0.61
1
0.68
1
11
1
1
1
1
1
1
0.98
1
1
1
1
0.42
1
1
1
1
1
1
1
1
0.98
0.86
1
1
1
1
1
1
1
1 1
1
1
1
1
1
0.76
1
0.81
1
1
0.9
0.81
1
1
1
0.98
0.97
1
1
1
1
1
1
1
1
1
11
0.99
1
0.95
0.331
1
By default, nodes labelled by localPP
[Sayyari and Mirarab, 2016]
The log (1KP-log.txt) includes …
28
This is ASTRAL version 5.6.1 424 trees read from 1KP-genetrees.tre … Number of taxa: 103 (103 species) … Taxon occupancy: {Pseudolycopodiella_caroliniana=282, Kadsura_heteroclita=234, Entransia_fimbriata=294, … … Number of Clusters after addition by greedy: 11043 … Final quartet score is: 339023690 Final normalized quartet score is: 0.8946722590876793 … ASTRAL finished in 36.372 secs
Number of individuals
Number of species
Search space sizeNormalized
quartet scoreRaw quartet score
Annotate (score) a given tree and compute quartet support
• Scores 1KP-speciestrees.tre based on the 1KP-genetrees.tre and annotates branches with the quartet score of the branch (-t 1), saving results into 1KP-speciestrees-qs.tre.
• Check out other annotation options at https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#extensive-branch-annotations
29
java -jar ~/workspace/ASTRAL/astral.5.6.1.jar -i 1KP-genetrees.tre -q 1KP-speciestrees.tre -o 1KP-speciestrees-qs.tre -t 1
Quartet scores as branch-specific estimate of gene tree discordance
30
Quartet score for the main resolution
Brachypodium_distachyon
Alsophila_spinulosa
Selaginella_moellendorffii_genome
Eschscholzia_californica
Riccia_sp
Ophioglossum_petiolatum
Kochia_scoparia
Ephedra_sinica
Sarcandra_glabra
Nephroselmis_pyriformis
Cycas_micholitzii
Liriodendron_tulipifera
Zamia_vazquezii
Chlorokybus_atmophyticus
Smilax_bona
Nothoceros_aenigmaticus
Pseudolycopodiella_caroliniana
Entransia_fimbriata
Marchantia_emarginata
Persea_americana
Sphaerocarpos_texanus
Ipomoea_purpurea
Chara_vulgaris
Nothoceros_vincentianus
Marchantia_polymorpha
Welwitschia_mirabilis
Mougeotia_sp
Podophyllum_peltatum
Taxus_baccata
Cylindrocystis_brebissoniiSpirogyra_sp
Rhynchostegium_serrulatum
Rosulabryum_cf_capillare
Physcomitrella_patens
Larrea_tridentata
Sciadopitys_verticillata
Cosmarium_ochthodesPenium_margaritaceum
Coleochaete_irregularis
Sorghum_bicolor
Ceratodon_purpureus
Ginkgo_biloba
Medicago_truncatula
Houttuynia_cordata
Gnetum_montanum
Metzgeria_crassipilis
Rosmarinus_officinalis
Nuphar_advena
Uronema_sp
Cedrus_libani
Mesotaenium_endlicherianum
Acorus_americanus
Catharanthus_roseus
Hedwigia_ciliataBryum_argenteum
Chaetosphaeridium_globosum
Sabal_bermudana
Cycas_rumphii
Kadsura_heteroclita
Equisetum_diffusum
Netrium_digitus
Cylindrocystis_cushleckae
Inula_helenium
Selaginella_moellendorffii_1kp
Polytrichum_commune
Populus_trichocarpa
Angiopteris_evecta
Roya_obtusa
Tanacetum_parthenium
Colchicum_autumnale
Monomastix_opisthostigma
Psilotum_nudum
Allamanda_cathartica
Diospyros_malabarica
Boehmeria_niveaVitis_vinifera
Sphagnum_lescurii
Coleochaete_scutata
Pinus_taeda
Bazzania_trilobata
Prumnopitys_andina
Carica_papaya
Anomodon_attenuatus
Dendrolycopodium_obscurum
Thuidium_delicatulum
Mesostigma_viride
Dioscorea_villosa
Aquilegia_formosa
Hibiscus_cannabinus
Spirotaenia_minuta
Zea_mays
Pteridium_aquilinum
Huperzia_squarrosa
Arabidopsis_thaliana
Amborella_trichopoda
Oryza_sativa
Juniperus_scopulorum
Saruma_henryi
Yucca_filamentosa
Leucodon_brachypus
Klebsormidium_subtile
Cunninghamia_lanceolata
Pyramimonas_parkeae
48.11
70.41
37.78
55.64
90.25
65.36
87.08
64.57
95.51
51.6
36.96
86.81
75.2
92.65
57.27
72.05
67.74
51.14
82.35
34.03
49.47
92.72
61.25
43.33
40.24
37.22
98.52
33.4898.4
34.3
95.77
69.74
45.03
70.15
76.64
93.95
43.26
58.96
96.9
45.5
42.97
93.18
78.39
98.12
50.75
41.94
91.86
67.53
93.79
58.95
49.52
38.95
99.97
89.7
66.4
39.62
41.88
82.16
71.69
58.96
84.83
47.94
51.89
51.75
96.57
39.25
36.52
81.83
57.51
58.89
95.29
99.11
97.73
90.51
46.9
54.59
64.15
68.29
42.25
41.61
38.67
90.38
43.86
60.47
88.99
88.19
97.39
53.11
81.34
45.5
62.82
84.47
53.71
53.04
38.09
43.5
40.93
62.11
69.27
46.05
42.53
Annotate (score) a given tree and compute quartet support
• Scores 1KP-speciestrees.tre based on the 1KP-genetrees.tre and annotates branches with the quartet score of the branch and the two alternative resolutions (-t 8), saving results into 1KP-speciestrees-qsall.tre.
31
java -jar ~/workspace/ASTRAL/astral.5.6.1.jar -i 1KP-genetrees.tre -q 1KP-speciestrees.tre -o 1KP-speciestrees-qsall.tre -t 8
Quartet scores of alternative resolutions
32
Quartet scores for all three resolutions around the branch. Example: 52%, 21%, 27%
Populus_trichocarpa
Hedwigia_ciliata
Hibiscus_cannabinus
Brachypodium_distachyon
Anomodon_attenuatus
Cycas_rumphii
Ceratodon_purpureus
Uronema_sp
Dioscorea_villosa
Thuidium_delicatulum
Penium_margaritaceum
Ephedra_sinica
Rhynchostegium_serrulatum
Mesostigma_viride
Bazzania_trilobata
Ginkgo_biloba
Nothoceros_vincentianus
Sabal_bermudana
Dendrolycopodium_obscurum
Marchantia_polymorpha
Saruma_henryi
Pyramimonas_parkeae
Sphagnum_lescurii
Ipomoea_purpurea
Juniperus_scopulorum
Polytrichum_commune
Sorghum_bicolor
Prumnopitys_andina
Riccia_spMetzgeria_crassipilis
Nephroselmis_pyriformis
Houttuynia_cordataPersea_americana
Cunninghamia_lanceolata
Rosulabryum_cf_capillare
Amborella_trichopoda
Spirotaenia_minuta
Podophyllum_peltatum
Spirogyra_sp
Leucodon_brachypus
Nuphar_advena
Equisetum_diffusum
Chaetosphaeridium_globosum
Oryza_sativa
Catharanthus_roseus
Taxus_baccata
Coleochaete_scutata
Welwitschia_mirabilis
Medicago_truncatula
Tanacetum_parthenium
Eschscholzia_californica
Psilotum_nudum
Selaginella_moellendorffii_genome
Vitis_vinifera
Alsophila_spinulosa
Physcomitrella_patens
Boehmeria_nivea
Colchicum_autumnale
Liriodendron_tulipifera
Monomastix_opisthostigma
Chlorokybus_atmophyticus
Zamia_vazquezii
Huperzia_squarrosa
Yucca_filamentosa
Coleochaete_irregularis
Bryum_argenteum
Klebsormidium_subtile
Ophioglossum_petiolatum
Smilax_bona
Inula_helenium
Acorus_americanus
Sciadopitys_verticillata
Aquilegia_formosa
Carica_papaya
Kadsura_heteroclita
Nothoceros_aenigmaticus
Mesotaenium_endlicherianum
Pteridium_aquilinum
Roya_obtusaCosmarium_ochthodes
Gnetum_montanum
Allamanda_cathartica
Selaginella_moellendorffii_1kp
Cycas_micholitzii
Entransia_fimbriata
Rosmarinus_officinalis
Larrea_tridentata
Marchantia_emarginata
Netrium_digitus
Angiopteris_evecta
Arabidopsis_thaliana
Cylindrocystis_brebissonii
Chara_vulgaris
Pinus_taedaCedrus_libani
Zea_mays
Pseudolycopodiella_caroliniana
Sarcandra_glabra
Mougeotia_sp
Sphaerocarpos_texanus
Cylindrocystis_cushleckae
Kochia_scopariaDiospyros_malabarica
[q1=0.43;q2=0.27;q3=0.31]
[q1=0.37;q2=0.34;q3=0.29]
[q1=0.77;q2=0.1;q3=0.13]
[q1=0.97;q2=0.01;q3=0.03]
[q1=0.38;q2=0.28;q3=0.34]
[q1=0.59;q2=0.23;q3=0.18]
[q1=0.93;q2=0.02;q3=0.05]
[q1=0.53;q2=0.22;q3=0.25]
[q1=0.42;q2=0.38;q3=0.21]
[q1=0.52;q2=0.26;q3=0.22]
[q1=0.41;q2=0.3;q3=0.29]
[q1=0.62;q2=0.2;q3=0.18]
[q1=0.4;q2=0.38;q3=0.22]
[q1=0.82;q2=0.08;q3=0.1]
[q1=0.53;q2=0.16;q3=0.31]
[q1=0.92;q2=0.02;q3=0.06]
[q1=0.96;q2=0.03;q3=0.02]
[q1=0.45;q2=0.36;q3=0.18]
[q1=0.85;q2=0.07;q3=0.09]
[q1=0.33;q2=0.34;q3=0.32]
[q1=0.42;q2=0.3;q3=0.28]
[q1=0.48;q2=0.26;q3=0.26]
[q1=0.65;q2=0.19;q3=0.15]
[q1=0.4;q2=0.25;q3=0.35]
[q1=0.7;q2=0.12;q3=0.18]
[q1=0.97;q2=0.02;q3=0.01]
[q1=0.51;q2=0.21;q3=0.28]
[q1=0.78;q2=0.14;q3=0.08]
[q1=0.59;q2=0.27;q3=0.14]
[q1=0.59;q2=0.34;q3=0.07]
[q1=0.68;q2=0.18;q3=0.14]
[q1=0.81;q2=0.1;q3=0.09]
[q1=0.52;q2=0.22;q3=0.26]
[q1=0.98;q2=0.01;q3=0.01]
[q1=0.47;q2=0.22;q3=0.32]
[q1=0.64;q2=0.22;q3=0.14]
[q1=0.39;q2=0.31;q3=0.3]
[q1=0.43;q2=0.3;q3=0.27]
[q1=0.94;q2=0.02;q3=0.04]
[q1=0.58;q2=0.17;q3=0.25]
[q1=0.57;q2=0.18;q3=0.24]
[q1=0.88;q2=0.08;q3=0.04]
[q1=0.5;q2=0.23;q3=0.28]
[q1=0.43;q2=0.31;q3=0.26]
[q1=0.37;q2=0.32;q3=0.31]
[q1=0.68;q2=0.19;q3=0.13]
[q1=0.54;q2=0.26;q3=0.2]
[q1=0.51;q2=0.22;q3=0.27]
[q1=0.95;q2=0.02;q3=0.02]
[q1=0.87;q2=0.06;q3=0.07]
[q1=0.82;q2=0.08;q3=0.1]
[q1=0.68;q2=0.14;q3=0.18]
[q1=0.99;q2=0.01;q3=0.01]
[q1=0.72;q2=0.12;q3=0.17]
[q1=0.59;q2=0.23;q3=0.18]
[q1=0.72;q2=0.11;q3=0.17]
[q1=0.63;q2=0.21;q3=0.17]
[q1=0.44;q2=0.15;q3=0.42]
[q1=0.46;q2=0.28;q3=0.26]
[q1=0.48;q2=0.4;q3=0.12]
[q1=0.99;q2=0.01;q3=0]
[q1=0.45;q2=0.26;q3=0.28]
[q1=0.94;q2=0.04;q3=0.02]
[q1=0.45;q2=0.24;q3=0.31]
[q1=1;q2=0;q3=0]
[q1=0.65;q2=0.22;q3=0.13]
[q1=0.93;q2=0.04;q3=0.04]
[q1=0.9;q2=0.05;q3=0.05]
[q1=0.52;q2=0.21;q3=0.27]
[q1=0.89;q2=0.07;q3=0.04]
[q1=0.69;q2=0.14;q3=0.16]
[q1=0.34;q2=0.33;q3=0.33]
[q1=0.66;q2=0.16;q3=0.18]
[q1=0.38;q2=0.36;q3=0.26]
[q1=0.9;q2=0.05;q3=0.05]
[q1=0.93;q2=0.05;q3=0.02]
[q1=0.9;q2=0.05;q3=0.05]
[q1=0.55;q2=0.21;q3=0.25]
[q1=0.7;q2=0.14;q3=0.15]
[q1=0.75;q2=0.11;q3=0.14]
[q1=0.96;q2=0.02;q3=0.02]
[q1=0.82;q2=0.12;q3=0.06]
[q1=0.37;q2=0.3;q3=0.34]
[q1=0.61;q2=0.13;q3=0.26]
[q1=0.56;q2=0.25;q3=0.19][q1=0.49;q2=0.26;q3=0.24]
[q1=0.34;q2=0.34;q3=0.32]
[q1=0.42;q2=0.25;q3=0.33]
[q1=0.87;q2=0.06;q3=0.08]
[q1=0.84;q2=0.09;q3=0.07]
[q1=0.39;q2=0.31;q3=0.3]
[q1=0.43;q2=0.32;q3=0.25][q1=0.6;q2=0.28;q3=0.11]
[q1=0.7;q2=0.17;q3=0.14]
[q1=0.44;q2=0.22;q3=0.35]
[q1=0.91;q2=0.03;q3=0.07]
[q1=0.39;q2=0.28;q3=0.34]
[q1=0.42;q2=0.32;q3=0.26]
[q1=0.98;q2=0.01;q3=0]
[q1=0.98;q2=0.01;q3=0.01]
[q1=0.97;q2=0.01;q3=0.02]
Quartet scores of alternative resolutions
32
Quartet scores for all three resolutions around the branch. Example: 52%, 21%, 27%
Populus_trichocarpa
Hedwigia_ciliata
Hibiscus_cannabinus
Brachypodium_distachyon
Anomodon_attenuatus
Cycas_rumphii
Ceratodon_purpureus
Uronema_sp
Dioscorea_villosa
Thuidium_delicatulum
Penium_margaritaceum
Ephedra_sinica
Rhynchostegium_serrulatum
Mesostigma_viride
Bazzania_trilobata
Ginkgo_biloba
Nothoceros_vincentianus
Sabal_bermudana
Dendrolycopodium_obscurum
Marchantia_polymorpha
Saruma_henryi
Pyramimonas_parkeae
Sphagnum_lescurii
Ipomoea_purpurea
Juniperus_scopulorum
Polytrichum_commune
Sorghum_bicolor
Prumnopitys_andina
Riccia_spMetzgeria_crassipilis
Nephroselmis_pyriformis
Houttuynia_cordataPersea_americana
Cunninghamia_lanceolata
Rosulabryum_cf_capillare
Amborella_trichopoda
Spirotaenia_minuta
Podophyllum_peltatum
Spirogyra_sp
Leucodon_brachypus
Nuphar_advena
Equisetum_diffusum
Chaetosphaeridium_globosum
Oryza_sativa
Catharanthus_roseus
Taxus_baccata
Coleochaete_scutata
Welwitschia_mirabilis
Medicago_truncatula
Tanacetum_parthenium
Eschscholzia_californica
Psilotum_nudum
Selaginella_moellendorffii_genome
Vitis_vinifera
Alsophila_spinulosa
Physcomitrella_patens
Boehmeria_nivea
Colchicum_autumnale
Liriodendron_tulipifera
Monomastix_opisthostigma
Chlorokybus_atmophyticus
Zamia_vazquezii
Huperzia_squarrosa
Yucca_filamentosa
Coleochaete_irregularis
Bryum_argenteum
Klebsormidium_subtile
Ophioglossum_petiolatum
Smilax_bona
Inula_helenium
Acorus_americanus
Sciadopitys_verticillata
Aquilegia_formosa
Carica_papaya
Kadsura_heteroclita
Nothoceros_aenigmaticus
Mesotaenium_endlicherianum
Pteridium_aquilinum
Roya_obtusaCosmarium_ochthodes
Gnetum_montanum
Allamanda_cathartica
Selaginella_moellendorffii_1kp
Cycas_micholitzii
Entransia_fimbriata
Rosmarinus_officinalis
Larrea_tridentata
Marchantia_emarginata
Netrium_digitus
Angiopteris_evecta
Arabidopsis_thaliana
Cylindrocystis_brebissonii
Chara_vulgaris
Pinus_taedaCedrus_libani
Zea_mays
Pseudolycopodiella_caroliniana
Sarcandra_glabra
Mougeotia_sp
Sphaerocarpos_texanus
Cylindrocystis_cushleckae
Kochia_scopariaDiospyros_malabarica
[q1=0.43;q2=0.27;q3=0.31]
[q1=0.37;q2=0.34;q3=0.29]
[q1=0.77;q2=0.1;q3=0.13]
[q1=0.97;q2=0.01;q3=0.03]
[q1=0.38;q2=0.28;q3=0.34]
[q1=0.59;q2=0.23;q3=0.18]
[q1=0.93;q2=0.02;q3=0.05]
[q1=0.53;q2=0.22;q3=0.25]
[q1=0.42;q2=0.38;q3=0.21]
[q1=0.52;q2=0.26;q3=0.22]
[q1=0.41;q2=0.3;q3=0.29]
[q1=0.62;q2=0.2;q3=0.18]
[q1=0.4;q2=0.38;q3=0.22]
[q1=0.82;q2=0.08;q3=0.1]
[q1=0.53;q2=0.16;q3=0.31]
[q1=0.92;q2=0.02;q3=0.06]
[q1=0.96;q2=0.03;q3=0.02]
[q1=0.45;q2=0.36;q3=0.18]
[q1=0.85;q2=0.07;q3=0.09]
[q1=0.33;q2=0.34;q3=0.32]
[q1=0.42;q2=0.3;q3=0.28]
[q1=0.48;q2=0.26;q3=0.26]
[q1=0.65;q2=0.19;q3=0.15]
[q1=0.4;q2=0.25;q3=0.35]
[q1=0.7;q2=0.12;q3=0.18]
[q1=0.97;q2=0.02;q3=0.01]
[q1=0.51;q2=0.21;q3=0.28]
[q1=0.78;q2=0.14;q3=0.08]
[q1=0.59;q2=0.27;q3=0.14]
[q1=0.59;q2=0.34;q3=0.07]
[q1=0.68;q2=0.18;q3=0.14]
[q1=0.81;q2=0.1;q3=0.09]
[q1=0.52;q2=0.22;q3=0.26]
[q1=0.98;q2=0.01;q3=0.01]
[q1=0.47;q2=0.22;q3=0.32]
[q1=0.64;q2=0.22;q3=0.14]
[q1=0.39;q2=0.31;q3=0.3]
[q1=0.43;q2=0.3;q3=0.27]
[q1=0.94;q2=0.02;q3=0.04]
[q1=0.58;q2=0.17;q3=0.25]
[q1=0.57;q2=0.18;q3=0.24]
[q1=0.88;q2=0.08;q3=0.04]
[q1=0.5;q2=0.23;q3=0.28]
[q1=0.43;q2=0.31;q3=0.26]
[q1=0.37;q2=0.32;q3=0.31]
[q1=0.68;q2=0.19;q3=0.13]
[q1=0.54;q2=0.26;q3=0.2]
[q1=0.51;q2=0.22;q3=0.27]
[q1=0.95;q2=0.02;q3=0.02]
[q1=0.87;q2=0.06;q3=0.07]
[q1=0.82;q2=0.08;q3=0.1]
[q1=0.68;q2=0.14;q3=0.18]
[q1=0.99;q2=0.01;q3=0.01]
[q1=0.72;q2=0.12;q3=0.17]
[q1=0.59;q2=0.23;q3=0.18]
[q1=0.72;q2=0.11;q3=0.17]
[q1=0.63;q2=0.21;q3=0.17]
[q1=0.44;q2=0.15;q3=0.42]
[q1=0.46;q2=0.28;q3=0.26]
[q1=0.48;q2=0.4;q3=0.12]
[q1=0.99;q2=0.01;q3=0]
[q1=0.45;q2=0.26;q3=0.28]
[q1=0.94;q2=0.04;q3=0.02]
[q1=0.45;q2=0.24;q3=0.31]
[q1=1;q2=0;q3=0]
[q1=0.65;q2=0.22;q3=0.13]
[q1=0.93;q2=0.04;q3=0.04]
[q1=0.9;q2=0.05;q3=0.05]
[q1=0.52;q2=0.21;q3=0.27]
[q1=0.89;q2=0.07;q3=0.04]
[q1=0.69;q2=0.14;q3=0.16]
[q1=0.34;q2=0.33;q3=0.33]
[q1=0.66;q2=0.16;q3=0.18]
[q1=0.38;q2=0.36;q3=0.26]
[q1=0.9;q2=0.05;q3=0.05]
[q1=0.93;q2=0.05;q3=0.02]
[q1=0.9;q2=0.05;q3=0.05]
[q1=0.55;q2=0.21;q3=0.25]
[q1=0.7;q2=0.14;q3=0.15]
[q1=0.75;q2=0.11;q3=0.14]
[q1=0.96;q2=0.02;q3=0.02]
[q1=0.82;q2=0.12;q3=0.06]
[q1=0.37;q2=0.3;q3=0.34]
[q1=0.61;q2=0.13;q3=0.26]
[q1=0.56;q2=0.25;q3=0.19][q1=0.49;q2=0.26;q3=0.24]
[q1=0.34;q2=0.34;q3=0.32]
[q1=0.42;q2=0.25;q3=0.33]
[q1=0.87;q2=0.06;q3=0.08]
[q1=0.84;q2=0.09;q3=0.07]
[q1=0.39;q2=0.31;q3=0.3]
[q1=0.43;q2=0.32;q3=0.25][q1=0.6;q2=0.28;q3=0.11]
[q1=0.7;q2=0.17;q3=0.14]
[q1=0.44;q2=0.22;q3=0.35]
[q1=0.91;q2=0.03;q3=0.07]
[q1=0.39;q2=0.28;q3=0.34]
[q1=0.42;q2=0.32;q3=0.26]
[q1=0.98;q2=0.01;q3=0]
[q1=0.98;q2=0.01;q3=0.01]
[q1=0.97;q2=0.01;q3=0.02]
Hard to read
• https://github.com/esayyari/DiscoVista • Sayyari, et. al. “DiscoVista: Interpretable Visualizations of
Gene Tree Discordance.” MPE 122 (2018): 110–15.
33
Discovista: visualizing discordance
• https://github.com/esayyari/DiscoVista • Sayyari, et. al. “DiscoVista: Interpretable Visualizations of
Gene Tree Discordance.” MPE 122 (2018): 110–15.
33
Discovista: visualizing discordance
Handling multiple individuals• Handling multiple individuals is supported
(the relevant paper has been in review for a long time)
• The mapping between names in gene trees and names in the species tree should be provided using the -a option. Two formats are supported (either can be used):
34
cat: siamesecat12,persiancat17,wildcat dog: dog1,labrador,bulldog-1,bulldog-2 horse: pony,shire-2,mustang110
cat 3 siamesecat12 persiancat17 wildcat dog 4 dog1 labrador bulldog-1 bulldog-2 horse 3 pony shire-2 mustang110
Format 1:
Format 2:
Main publications• Mirarab, Siavash, Rezwana Reaz, Md. Shamsuzzoha Bayzid, Théo
Zimmermann, M. S. Swenson, and Tandy Warnow. “ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation.” Bioinformatics 30, no. 17 (2014): i541–48. https://doi.org/10.1093/bioinformatics/btu462.
• Mirarab, S., and T. Warnow. “ASTRAL-II: Coalescent-Based Species Tree Estimation with Many Hundreds of Taxa and Thousands of Genes.” Bioinformatics 31, no. 12 (2015). https://doi.org/10.1093/bioinformatics/btv234.
• Zhang, Chao, Maryam Rabiee, Erfan Sayyari, and Siavash Mirarab. “ASTRAL-III: Polynomial Time Species Tree Reconstruction from Partially Resolved Gene Trees.” BMC Bioinformatics 19, no. S6 (2018): 153. https://doi.org/10.1186/s12859-018-2129-y.
• Sayyari, Erfan, and Siavash Mirarab. “Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies.” Molecular Biology and Evolution 33, no. 7 (2016): 1654–68. https://doi.org/10.1093/molbev/msw079.
35
Some other ASTRAL-related papers• Testing for polytomies in species trees using quartet frequencies (Sayyari and Mirarab).
Genes (2018) (Note: Implemented in option -t 10)
• Filtering loci is not beneficial! (Molloy and Warnow), Systematic Biology (2018)
• ASTRAL consistent under models of missing data (Nute, Molloy, Chou, and Warnow). BMC Genomics (2018)
• SIESTA improves ASTRAL (Vachaspati and Warnow). BMC Genomics (2018)
• Visualizing Discordance using DiscoVista (Sayyari, Whitfield, and Mirarab). Molecular Phylogenetics and Evolution (2018)
• Fragmentary sequences can negatively impact ASTRAL trees (Sayyari, Whitfield, and Mirarab). Molecular Biology and Evolution (2017)
• How many genes does ASTRAL need? (Shekhar, Roch, and Mirarab). Transactions on Computational Biology and Bioinformatics (2017)
• Using ASTRAL as a supertree method (Vachaspati and Warnow). Bioinformatics (2017)
• Performance under ILS and HGT (Davidson, Vachaspati, Mirarab, and Warnow). BMC Genomics (2015).
36
For more info• Contact us:
Tandy Warnow, [email protected], Siavash Mirarab, [email protected]
• Software available at Github site: https://github.com/smirarab/ASTRAL
• See tutorial and README at GitHub site: https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
• Email: [email protected]
• More related papers at http://tandy.cs.illinois.edu/papers-all.html and http://eceweb.ucsd.edu/~smirarab/publications.html
37