![Page 1: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/1.jpg)
Phylogenetics in the Age of Genomics: Prospects and Challenges
Antonis Rokas
Department of Biological Sciences, Vanderbilt University
http://as.vanderbilt.edu/rokaslab
![Page 2: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/2.jpg)
http://pubmed2wordle.appspot.com/
![Page 3: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/3.jpg)
What is a Phylogenetic Tree?
A phylogenetic tree is the mathematical structure used to depict the evolutionary history of a group of organisms or genes
Phylogenetic trees show historical relationships, not similarities
![Page 4: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/4.jpg)
Why are Phylogenetic Trees Useful?
Benefits to Science:
- The origin and history of life
- Evolution of molecules, phenotypes and developmental mechanisms
Benefits to Society:
- Human health (identification of disease agents & reservoirs)
- Agriculture (crops’ wild relatives)
- Biodiversity (conservation strategies)
![Page 5: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/5.jpg)
CSI: Phylogeny
A gastroenterologist was accused of attempted second-degree murder for injecting his former girlfriend with blood obtained from a patient under his care who was infected with HIV type 1 (HIV-1)
Phylogenetic analyses of HIV-1 sequences were admitted and used as evidence in court
Metzker et al. (2002) PNAS
Figure labels: patient and victim HIV-1 sequences; sequences from a local population sample of HIV-1-infected individuals
![Page 6: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/6.jpg)
Phylogenetic Concepts and Terminology
tree / topology | taxon - taxa | branches, nodes & internodes | polytomy & resolution
![Page 7: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/7.jpg)
Rooted and Unrooted Trees
![Page 8: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/8.jpg)
Phylogenetic Concepts and Terminology
![Page 9: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/9.jpg)
The Meaning of Vertical and Horizontal Axes in Trees
![Page 10: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/10.jpg)
Estimating Phylogenies
1. Select an optimality criterion
2. Select a search strategy
3. Use the selected search strategy to generate a series of trees, and apply the selected optimality criterion to each, always keeping track of the ‘best’ tree(s) examined thus far
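The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular program. It assumes parsimony (scored with the Fitch algorithm) as the optimality criterion and an exhaustive search over the three unrooted topologies possible for four taxa. The toy alignment and all function names are hypothetical.

```python
# Toy four-taxon alignment (assumed data, for illustration only).
TOY_ALIGNMENT = {
    "seq1": "AAAATCGTGGGG",
    "seq2": "ATAATGGTGGCG",
    "seq3": "AAAATGGTGGCG",
    "seq4": "ATAATGGTGGCG",
}


def fitch_site(tree, states):
    """Fitch pass for one site; returns (possible states, number of changes)."""
    if isinstance(tree, str):  # a leaf is represented by its taxon name
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Step 1, the optimality criterion: total changes summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


def four_taxon_topologies(a, b, c, d):
    """Step 2, the search strategy: enumerate every topology (exhaustive).

    Each unrooted tree is written with an arbitrary root; the parsimony
    score does not depend on where the root is placed.
    """
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def exhaustive_search(alignment):
    """Step 3: score each tree, keeping track of the best tree(s) so far."""
    best_score, best_trees = None, []
    for tree in four_taxon_topologies(*sorted(alignment)):
        score = parsimony_score(tree, alignment)
        if best_score is None or score < best_score:
            best_score, best_trees = score, [tree]
        elif score == best_score:
            best_trees.append(tree)
    return best_score, best_trees


best_score, best_trees = exhaustive_search(TOY_ALIGNMENT)
print(best_score, best_trees)
```

Swapping in another criterion (for example a likelihood function) or another tree generator (a heuristic one) changes steps 1 or 2 without touching the bookkeeping in step 3.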
![Page 11: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/11.jpg)
Optimality Criteria

| Criterion | Data |
|---|---|
| Minimum Evolution | pairwise distances |
| Least Squares | pairwise distances |
| Parsimony | discrete characters |
| Maximum Likelihood | discrete characters |
| Bayesian Inference | discrete characters |

Distance methods first convert an alignment into a matrix of pairwise distances
Discrete character methods use the alignment as is and consider each site/character directly
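As a rough illustration of the first step of a distance method, the sketch below turns an alignment into a matrix of pairwise distances. It assumes the simplest possible measure, the uncorrected proportion of differing sites (p-distance). Real analyses typically use model-corrected distances and must handle gaps, which this sketch does not. The alignment and function names are hypothetical.

```python
# Toy alignment (assumed data, for illustration only).
TOY_ALIGNMENT = {
    "seq1": "AAAATCGTGGGG",
    "seq2": "ATAATGGTGGCG",
    "seq3": "AAAATGGTGGCG",
    "seq4": "ATAATGGTGGCG",
}


def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n_diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return n_diffs / len(seq_a)


def distance_matrix(alignment):
    """Pairwise distances for every ordered pair of taxa, keyed by (a, b)."""
    names = sorted(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a in names
        for b in names
    }


DIST = distance_matrix(TOY_ALIGNMENT)
for a in sorted(TOY_ALIGNMENT):
    print(a, [round(DIST[(a, b)], 3) for b in sorted(TOY_ALIGNMENT)])
```

A discrete character method would skip this conversion and work on the alignment columns themselves.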
![Page 12: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/12.jpg)
Search Strategies
There is a need for a search strategy because the number of possible trees goes up very fast with an increasing number of species
$$B(T) = \prod_{n=3}^{T} (2n - 5)$$
Search Strategy: Exhaustive | Heuristic
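To make the growth concrete, the short sketch below evaluates the product of (2n - 5) for n from 3 to T, which counts the unrooted bifurcating trees for T taxa. The function name is hypothetical; `math.prod` requires Python 3.8 or later.

```python
from math import prod


def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: product of (2n - 5), n = 3..T."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return prod(2 * n - 5 for n in range(3, n_taxa + 1))


for t in (4, 5, 10, 20, 50):
    print(t, num_unrooted_trees(t))
```

Four taxa give 3 trees and ten taxa already give 2,027,025, which is why exhaustive evaluation stops being practical beyond a handful of species.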
![Page 13: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/13.jpg)
Exhaustive Search
![Page 14: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/14.jpg)
Heuristic Search
Heuristic searches don’t guarantee discovery of optimal topology
The lazy subtree rearrangement (LSR) tree search strategy used by the RAxML program
![Page 15: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/15.jpg)
Assessing Robustness in Inference: Bootstrap
      1234567...
seq1: AAAATCGTGGGG
seq2: ATAATGGTGGCG
seq3: AAAATGGTGGCG
seq4: ATAATGGTGGCG
Characters are sampled with replacement to create many bootstrap replicate data sets equal in size to the original one
Each bootstrap replicate data set is analysed (e.g., with parsimony, likelihood)
Agreement among the resulting trees is summarized with a majority-rule consensus tree
The frequency of occurrence of clades, also known as bootstrap values, is a measure of support for those groups
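The resampling step can be sketched as follows. This is a minimal illustration under stated assumptions: alignment columns are drawn with replacement to build each replicate, and the per-replicate "analysis" is a deliberately crude stand-in (the pair of taxa with the fewest differences) rather than a real parsimony or likelihood tree search. No consensus tree is built. All names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

TOY = {
    "seq1": "AAAATCGTGGGG",
    "seq2": "ATAATGGTGGCG",
    "seq3": "AAAATGGTGGCG",
    "seq4": "ATAATGGTGGCG",
}


def bootstrap_replicate(alignment, rng):
    """Sample columns with replacement; the replicate keeps the original size."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def closest_pair(alignment):
    """Stand-in for tree inference: the two taxa with the fewest differences."""
    def n_diffs(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return frozenset(min(combinations(sorted(alignment), 2), key=n_diffs))


def bootstrap_support(alignment, infer_group, n_replicates=200, seed=1):
    """Frequency with which each inferred group appears across replicates."""
    rng = random.Random(seed)
    counts = Counter(
        infer_group(bootstrap_replicate(alignment, rng))
        for _ in range(n_replicates)
    )
    return {group: n / n_replicates for group, n in counts.items()}


for group, freq in bootstrap_support(TOY, closest_pair).items():
    print(sorted(group), freq)
```

In a real analysis `infer_group` would be replaced by a full tree search on each replicate, and the clade frequencies would be read off the set of replicate trees.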
![Page 16: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/16.jpg)
The Problem of Incongruence
Species tree?
Gene X Gene Y
Incongruence is pervasive in the phylogenetics literature
Rokas & Chatzimanolis (2008) in Phylogenomics (W. J. Murphy, Ed.)
![Page 17: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/17.jpg)
A Systematic Evaluation of Single Gene Phylogenies
Dataset: 106 genes distributed across all 16 chromosomes, totaling 127 kb and corresponding to roughly 1% of the genomic sequence and 2% of the genes
Analyses: Maximum Likelihood (ML) & Maximum Parsimony (MP) on nucleotide (nt) data sets, and MP on amino acid (aa) data sets
S. cerevisiae
S. paradoxus
S. mikatae
S. bayanus
Kellis et al. (2003) Nature
![Page 18: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/18.jpg)
Incongruence in Shallow Time
ML / MP
Rokas et al. (2003) Nature
![Page 19: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/19.jpg)
Concatenation of 106 Genes Yields a Single Yeast Phylogeny
ML / MP on nt; MP on aa
Rokas et al. (2003) Nature
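Concatenation itself is a simple operation: the per-gene alignments are joined end to end, taxon by taxon, into one long alignment that is then analysed as a single data set. The sketch below is a hypothetical illustration with made-up toy sequences and taxon labels. Padding a taxon that lacks a gene with `?` is an assumption of this sketch, not something stated on the slide.

```python
def concatenate(gene_alignments, missing="?"):
    """Join per-gene alignments into one supermatrix, one row per taxon."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        gene_length = len(next(iter(aln.values())))
        for taxon in taxa:
            # A taxon without this gene is padded with missing-data symbols.
            parts[taxon].append(aln.get(taxon, missing * gene_length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data (assumed): two short "genes" for three yeast taxa.
GENE_1 = {"Scer": "ATGC", "Spar": "ATGA", "Smik": "ATGT"}
GENE_2 = {"Scer": "GGA", "Spar": "GGC"}  # Smik is missing this gene

SUPERMATRIX = concatenate([GENE_1, GENE_2])
for taxon, seq in SUPERMATRIX.items():
    print(taxon, seq)
```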
![Page 20: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/20.jpg)
Phylogenetic Accuracy is Positively Correlated with Gene Number
Rokas & Carroll (2005) Mol. Biol. Evol.
Murphy et al. (2001) Science; Zanis et al. (2002) PNAS
![Page 21: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/21.jpg)
Reconstructing the Tree of Life
Ciccarelli et al. (2006) Science; James et al. (2006) Nature
![Page 22: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/22.jpg)
Methods of Assessing Robustness are Subject to Systematic Error
Rokas & Carroll (2006) PLoS Biol.; Rokas et al. (2003) Nature
![Page 23: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/23.jpg)
Gene Trees Can Differ from Species Trees
Degnan & Rosenberg (2009) Trends Ecol. Evol.
![Page 24: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/24.jpg)
Inferring the Species Tree from Individual Gene Histories
Concordance Factor: The proportion of the genome for which a clade is true
Ané et al. (2007) Mol. Biol. Evol.
STEM: Maximum likelihood estimation of species trees
Kubatko et al. (2009) Bioinformatics
BEST: Bayesian estimation of species trees
Liu (2008) Bioinformatics
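The idea behind a concordance factor can be illustrated with a naive sample proportion: given a set of gene trees, count the fraction in which a clade appears. This is only a sketch of the concept and is not the estimator used by the cited methods. Each gene tree is assumed to be represented simply as a set of clades (frozensets of taxon names), and the toy gene trees are invented for illustration.

```python
def sample_concordance(clade, gene_tree_clades):
    """Fraction of gene trees that contain the given clade."""
    clade = frozenset(clade)
    hits = sum(1 for clades in gene_tree_clades if clade in clades)
    return hits / len(gene_tree_clades)


# Toy data (assumed): the clades found in four hypothetical gene trees.
GENE_TREES = [
    {frozenset({"Scer", "Spar"}), frozenset({"Scer", "Spar", "Smik"})},
    {frozenset({"Scer", "Spar"}), frozenset({"Scer", "Spar", "Smik"})},
    {frozenset({"Scer", "Smik"}), frozenset({"Scer", "Spar", "Smik"})},
    {frozenset({"Scer", "Spar"}), frozenset({"Scer", "Spar", "Sbay"})},
]

print(sample_concordance({"Scer", "Spar"}, GENE_TREES))
print(sample_concordance({"Scer", "Spar", "Smik"}, GENE_TREES))
```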
![Page 25: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/25.jpg)
![Page 26: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/26.jpg)
Lack of Resolution Due to Lack of Data or to Radiation?
Balavoine & Adoutte (1998) Science / Rokas et al. (2003) Evol. Dev.
![Page 27: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/27.jpg)
Lack of Resolution Among Most Metazoan Phyla
Rokas et al. (2005) Science
50 genes, ML/MP
![Page 28: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/28.jpg)
Biological Explanations for the Lack of Resolution
Hypothesis I: Mutational Saturation
The phylogenetic signal originally contained in these protein sequences has been erased by multiple substitutions
Hypothesis II: Evolutionary Radiation
The close spacing of cladogenetic events early in the history of metazoans limits their resolution
![Page 29: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/29.jpg)
![Page 30: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/30.jpg)
Time of Origin of Metazoa & Fungi
1) Fossils
Metazoa: Xiao et al. (1998) Nature; Fungi: Yuan et al. (2005) Science
2) Relative molecular clocks

| Study | Fungi | Metazoa |
|---|---|---|
| Douzery et al. (2004) PNAS | 727 Myr | 849 Myr |
| Heckman et al. (2001) Science | 1208 Myr | 1177 Myr |
| Hedges et al. (2004) BMC Evol. Biol. | 968 Myr | 976 Myr |

3) The way this set of 50 genes has evolved across the two Kingdoms is remarkably similar
![Page 31: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/31.jpg)
A Remarkable Contrast in Phylogenetic Resolution
50 genes, ML/MP
Rokas et al. (2005) Science
Given:
- the lack of resolution in Metazoa (using lots of genes)
- the contrast in resolution between Metazoa and Fungi (using identical genes)
the early history of animals is best viewed as an evolutionary radiation
![Page 32: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/32.jpg)
Valentine (2002) Ann. Rev. Ecol. Syst.
Rapid Tempo of Cladogenesis Early in Metazoan History
![Page 33: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/33.jpg)
Incongruence in Deep Time
Rokas (2008) Ann. Rev. Genet.
![Page 34: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/34.jpg)
![Page 35: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/35.jpg)
What Makes Animals Hard to Resolve But Yeasts Easy?
Rokas & Carroll (2006) PLOS Biol.
Internode length: influences amount of phylogenetic signal (I)
Homoplasy: independent evolution of identical characters (*, )
![Page 36: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/36.jpg)
Homoplasy
the independent evolution of identical character states
(e.g., amino acid residues) in different branches of a
phylogenetic tree that are not directly inherited from a
common ancestor
![Page 37: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/37.jpg)
Rokas & Carroll (2008) Mol. Biol. Evol.
Quantifying Homoplasy Across the Tree of Life
![Page 38: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/38.jpg)
Measuring Homoplasy Across the Tree of Life
Rokas & Carroll (2008) Mol. Biol. Evol.
![Page 39: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/39.jpg)
Measuring Homoplasy Across the Tree of Life
![Page 40: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/40.jpg)
An Overabundance of Homoplastic Substitutions

| Dataset | Obs(H) | Exp(H) | Obs(H)/Exp(H) |
|---|---|---|---|
| Saccharomyces yeasts | 4.7% | 2.0% | 2.4 |
| Aspergillus filamentous fungi | 6.9% | 2.5% | 2.8 |
| Fungal phyla | 6.7% | 3.5% | 1.9 |
| Eukaryote phyla | 7.5% | 3.5% | 2.1 |
| Plants | 2.3% | 0.9% | 2.7 |
| Drosophila fruit-flies | 2.6% | 1.3% | 2.0 |
| Cetartiodactyls mammals | 12.3% | 4.9% | 2.5 |
| Paenungulata mammals | 10.4% | 5.0% | 2.1 |
| Metazoan phyla | 7.4% | 3.3% | 2.3 |
| Vertebrates | 10.2% | 3.2% | 3.2 |
| Average | 7.1% | 2.3% | 2.4 |

Rokas & Carroll (2008) Mol. Biol. Evol.
![Page 41: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/41.jpg)
Excess Homoplasy is Specific to Homoplastic Substitutions
Parsimony-informative sites

| Clade | Obs(H) | Exp(H) | Obs(H)/Exp(H) |
|---|---|---|---|
| Saccharomyces yeasts | 3.3% | 3.3% | 0.0 |
| Aspergillus filamentous ascomycetes | 10.3% | 9.8% | 1.1 |
| Fungal phyla | 2.5% | 2.2% | 1.1 |
| Eukaryotic phyla | 2.3% | 2.1% | 1.1 |
| Land plants | 31.6% | 31.7% | 1.0 |
| Drosophila fruit-flies | 10.2% | 10.1% | 1.0 |
| Cetartiodactyl mammals | 4.5% | 3.6% | 1.3 |
| Metazoan phyla | 3.1% | 2.8% | 1.1* |
| Paenungulata mammals | 5.4% | 2.4% | 2.3* |
| Vertebrates | 2.9% | 2.4% | 1.2 |
| Average | 7.6% | 7.0% | 1.1 |

Rokas & Carroll (2008) Mol. Biol. Evol.
![Page 42: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/42.jpg)
Homoplasy Stems From Frequently Exchanged Amino Acids
Rokas & Carroll (2008) Mol. Biol. Evol.
190 possible interchanges among 20 amino acids
- 75 can be achieved via a single nucleotide substitution
- The other 115 require two or three substitutions
65% of observed interchanges involve the 12 most frequently observed amino acid interchanges that are a single mutational step away
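The counts of 190, 75 and 115 can be checked directly from the standard genetic code. The sketch below assumes the standard code (written in the conventional TCAG codon order), enumerates every single-nucleotide change between sense codons, and collects the distinct amino acid pairs that result. Variable and function names are hypothetical.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_step_pairs():
    """Amino acid pairs interconvertible by one nucleotide substitution."""
    pairs = set()
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                other = CODE[mutant]
                if other != "*" and other != aa:
                    pairs.add(frozenset((aa, other)))
    return pairs


AMINO_ACIDS = sorted(set(AMINO) - {"*"})
ALL_PAIRS = list(combinations(AMINO_ACIDS, 2))
ONE_STEP = single_step_pairs()

print(len(ALL_PAIRS))                  # all possible interchanges
print(len(ONE_STEP))                   # reachable by a single substitution
print(len(ALL_PAIRS) - len(ONE_STEP))  # need two or three substitutions
```

Aspartic acid (GAC/T) and glutamic acid (GAA/G), which differ only at the third codon position, are one of the single-step pairs.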
![Page 43: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/43.jpg)
Conservative AA Substitutions Are Very Common in Alignments
>Cimi EGAAGPIRFNMRLPESRYHCAGGITTIVVY
>Acla AATTSPEDVETRKDDERVEHGGGITTVLVY
>Afis AATASAEEFELRKEDERVEHGDGITTILIY
>Afla SPAAAADEFDLQRDEDQNDYADSVSSVFIF
>Afum AATTSAEEFELRKEDERVEHGDGITTILIY
>Anid APAATADVSDTRLKQEQAEQAGGITAILVY
>Anig APAAAPDDLETRLEEEQADYAGSVSSILIY
>Aory SPAAAADEFDLQRDEDQNDYADSVSSVFIF
>Ater SPAAAPDDVDTQREEDQNDYAGSVSSIFVF
GAC/T: Aspartic Acid (D)
GAA/G: Glutamic Acid (E)
![Page 44: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/44.jpg)
![Page 45: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/45.jpg)
What Are the Limits to Resolution?
Philippe et al. (1994) Development
For internodes in a ~600 my animal phylogeny

| Time | Amount of Data |
|---|---|
| 40 my | 700 variable sites |
| 10 my | 2,800 variable sites |
| 1 my | 28,000 variable sites |
| 0.1 my | ?????? variable sites |
| 1 y | ?????? variable sites |

![Page 46: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/46.jpg)
What If Mammals Had Diversified in the Cambrian?
Springer et al. (2003) PNAS
![Page 47: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/47.jpg)
107 Myr tree | 600 Myr tree
![Page 48: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/48.jpg)
Phylogenetic Accuracy is Inversely Correlated with Elapsed Time
![Page 49: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/49.jpg)
Failure to Resolve Internodes < 10 Million Years
Rokas et al. (2005) Science
![Page 50: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/50.jpg)
Why Are There So Few Bushes?
Data in, fully-resolved phylogenetic tree out
![Page 51: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/51.jpg)
Carroll (2003) Nature
The Human / Chimp / Gorilla Tree
Inform. Sites
1302 / 11293
8561 / 11293
1430 / 11293
Patterson et al. (2006) Nature
![Page 52: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/52.jpg)
Bushes in the Mammalian Tree
Inform. Sites
25
22
21
Nishihara et al. (2009) PNAS
![Page 53: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/53.jpg)
The three major lineages first appeared within a window of 20 to 30 million years, approximately 390 million years ago
Inform. Sites
92 / 294
96 / 294
106 / 294
Takezaki et al. (2004) Mol. Biol. Evol.
44 genes, ML/MP/NJ
The Tetrapod / Lungfish / Coelacanth Bush
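The informative-site counts on the three "bush" slides can be put on a common scale by converting them to the share of sites favouring each alternative. The sketch below does only that arithmetic. The counts are those shown on the slides, in the order listed there, and the dataset labels are shorthand introduced here. Which topology each count supports is not stated in the slide text, so the alternatives are left unlabelled.

```python
def support_fractions(counts):
    """Share of informative sites favouring each alternative topology."""
    total = sum(counts)
    return [count / total for count in counts]


# Informative-site counts as listed on the slides.
DATASETS = {
    "human/chimp/gorilla": [1302, 8561, 1430],
    "mammals": [25, 22, 21],
    "tetrapod/lungfish/coelacanth": [92, 96, 106],
}

for name, counts in DATASETS.items():
    shares = ", ".join(f"{f:.1%}" for f in support_fractions(counts))
    print(f"{name}: {shares}")
```

In the hominid data one alternative takes roughly three quarters of the informative sites, whereas in the other two data sets the three alternatives each take close to a third, which is the signature of a bush.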
![Page 54: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/54.jpg)
“One can use the most sophisticated audio equipment to listen, for an eternity, to a recording of white noise and still not glean a useful scrap of information”
Mind the Gap Between Real Data and Models
Rodrigo et al. (1994) Chapter in: Sponges in Time and Space: Biology, Chemistry, Paleontology
![Page 55: Phylogenetics in the Age of Genomics: Prospects and Challenges](https://reader031.vdocuments.us/reader031/viewer/2022021308/620842ffc065ba194e6a4c85/html5/thumbnails/55.jpg)
Acknowledgements
http://as.vanderbilt.edu/rokaslab