did viruses evolve as a distinct supergroup from common ...€¦ · 18.04.2016 · be polarized a...
TRANSCRIPT
1
Did viruses evolve as a distinct supergroup from common ancestors of cells?
Ajith Harish1*, Aare Abroi2, Julian Gough3 and Charles Kurland4
1Structural and Molecular Biology Group, Department of Cell and Molecular Biology, Biomedical Center, Uppsala University, 751 24 Uppsala, Sweden.
2Estonian Biocentre, Riia 23, Tartu, 51010, Estonia. 3Computational Genomics Group, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK
4Microbial Ecology, Department of Biology, Lund University, Ecology Building, SE-223 62 Lund, Sweden.
*Corresponding author: [email protected] (preferred), [email protected] (institutional)
Abstract
The evolutionary origins of viruses according to marker gene phylogenies, as well as their
relationships to the ancestors of host cells remains unclear. In a recent article Nasir
and Caetano-Anollés reported that their genome-scale phylogenetic analyses identify an
ancient origin of the “viral supergroup” (Nasir et al (2015) A phylogenomic data-driven
exploration of viral origins and evolution. Science Advances, 1(8):e1500527). It suggests that
viruses and host cells evolved independently from a universal common ancestor.
Examination of their data and phylogenetic methods indicates that systematic errors likely
affected the results. Reanalysis of the data with additional tests shows that small-genome
attraction artifacts distort their phylogenomic analyses. These new results indicate that their
suggestion of a distinct ancestry of the viral supergroup is not well supported by the
evidence.
Introduction
The debate on the ancestry of viruses is still undecided: In particular, it is still unclear
whether viruses evolved before their host cells or if they evolved more recently from the host
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
2
cells. The virus-early hypothesis posits that viruses predate or coevolved with their cellular
hosts [1]. Two alternatives describe the virus-late scenario: (i) the progressive evolution also
known as the escape hypothesis and (ii) regressive evolution or reduction hypothesis. Both
propose that viruses evolved from their host cells [1]. According to the first of these two
virus-late models, viruses evolved from their host cells through gradual acquisition of genetic
structures. The other alternative suggests that viruses, like host-dependent endoparasitic
bacteria, evolved from free-living ancestors by reductive evolution. The recent discovery of
the so-called giant viruses with double-stranded DNA genomes that parallel endoparasitic
bacteria with regards to genome size, gene content and particle size revived the reductive
evolution hypothesis. However, there are so far no identifiable ‘universal’ viral genes that are
common to viruses such as the ubiquitous cellular genes. In other words, examples of
common viral components that are analogous to the ribosomal RNA and ribosomal protein
genes, which are common to cellular genomes, are not found. This is one compelling reason
that phylogenetic tests of the “common viral ancestor” hypotheses seem so far inconclusive.
Recently, Nasir and Caetano-Anollés [2] employed phylogenetic analysis of whole-
genomes and gene contents of thousands of viruses and cellular organisms to test the
alternative hypotheses. The authors conclude that viruses are an ancient lineage that diverged
independently and in parallel with their cellular hosts from a universal common ancestor
(UCA). They reiterate their earlier claim [3] that viruses are a unique lineage, which predated
or coevolved with the last UCA of cellular lineages (LUCA) through reductive evolution
rather than through more recent multiple origins. Their claims are based on analyses of
statistical- and phyletic distribution patterns of protein domains, classified as superfamilies
(SFs) in Structural Classification of Proteins (SCOP) [4].
Detailed re-examination of Nasir and Caetano-Anollés' phylogenomic approach [2, 3]
suggests that small genomes systematically distort their phylogenetic reconstructions of the
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
3
tree of life (ToL), especially the rooting of trees. Here the ToL is described as the
evolutionary history of contemporary genomes: as a tree of genomes or a tree of proteomes
(ToP) [5-8]. The bias due to highly reduced genomes of parasites and endosymbionts in
genome-scale phylogenies has been known for over a decade [5, 6]. In fact, prior to the recent
proposal [2] these authors recognized the anomalous effects of including small genomes in
reconstructing the ToL in analyses that were limited to cellular organisms [9] or which
included giant viruses [3]: As they say “In order to improve ToP reconstructions, we
manually studied the lifestyles of cellular organisms in the total dataset and excluded
organisms exhibiting parasitic (P) and obligate parasitic (OP) lifestyles, as their inclusion is
known to affect the topology of the phylogenetic tree” [3]. But, they may not have adequately
addressed this problem, particularly when the samplings include viral genomes that are likely
to further exacerbate bias due to small genomes [2, 3]. For this reason we systematically
tested the reliability of the phylogenetic trees, especially the rooting approach favored by
Nasir and Caetano-Anollés [2]. This approach depends critically on a pseudo-outgroup to
root the ToP (ToL), but that pseudo-outgroup is not identified empirically. Rather, it is
assumed a priori to be an empty set [2].
We show here in several independent phylogenetic reconstructions that a rooting
based on a hypothetical “all-zero” outgroup—an ancestor that is assumed to be an empty set
of protein domains—creates specific phylogenetic artifacts: In this particular approach [2, 3]
implementing the all-zero outgroup artifactually draws the taxa with the smallest genomes
(proteomes) into a false rooting very much like the classical distortions due to long-branch
attractions (LBA) in gene trees [10].
Results and Discussion
We emphasize at the outset of this study that virtually all the evolutionary
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
4
interpretations based on phylogenetic reconstructions depend on a reliable identification of
the root of a tree [11, 12]. In particular, the rooting of a tree will determine the branching
order of species, and define the ancestor-descendant relationships between taxa as well as the
derived features of characters (character states). In effect, the root polarizes the order of
evolutionary changes along the tree with respect to time. By the same token, the rooting of a
tree distinguishes ancestral states from derived states among the different observed states of a
character. If ancestral states are identified directly, for example from fossils, characters can
be polarized a priori with regard to determining the tree topology. A priori polarization
provides for intrinsic rooting. However, direct identification of ancestral states, particularly
for extant genetic data is rarely possible. Consequently conventional phylogenetic methods
use time-reversible (undirected) models of character evolution and they only compute
unrooted trees. For example, the root of the iconic ribosomal RNA ToL or any other gene-
tree has not been determined directly [13, 14].
Accordingly, conventional approaches to character polarization are indirect and the
rooting of trees is normally a two-stage process. The most common rooting method is the
outgroup comparison method, which is based on the premise that character-states common to
the ingroup (study group) and a closely related sister-group (the outgroup) are likely to be
ancestral to both. Therefore, in an unrooted tree the root is expected to be positioned on the
branch that connects the outgroup to the ingroup. In this way, the tree (and characters) may
be polarized a posteriori [12, 14]. However, there are no known outgroups for the ToL. In the
absence of natural outgroups, pseudo-outgroups are used to root the ToL [14]. The best-
known case is the root grafted onto the unrooted ribosomal RNA ToL based on presumed
ancient (pre-LUCA) gene duplications [15, 16]. Here the paralogous proteins act as
reciprocal outgroups that root each other. Unlike gene duplications used to root gene trees,
the challenge of identifying suitable outgroups becomes more acute for genome trees.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
5
To meet this challenge Nasir and Caetano-Anollés use a hypothetical pseudo-
outgroup: an artificial taxon constructed from presumed ancestral states. For each character
(SF), the state '0' or 'absence' of a SF is assumed to be “ancestral” a priori. This artificial “all-
zero” taxon is used as outgroup to root the ToL. Further, they use the Lundberg rooting
method, in which outgroups are not included in the initial tree reconstruction. The Lundberg
method involves estimating an unrooted tree for the ingroup taxa only, and then attaching
outgroup(s) (when available) or a hypothesized ancestor to the tree a posteriori to determine
the position of the root [17]. Unrooted trees describe relatedness of taxa based on graded
compositional similarities of characters (and states). Accordingly, we can expect the “all-
zero” pseudo-outgroup to cluster with genomes (proteomes) in which the smallest number of
SFs is present. The latter are the proteomes described by the largest number of ‘0s’ in the
data matrix.
The instability of rooting with an all-zero pseudo-outgroup becomes clear when the
smallest proteome in a given taxon sampling varies in the rooting experiments (Figs. 1 and
2). Rooting experiments were preformed both for SF occurrence (presence/absence) patterns
and for SF abundance (copy number) patterns. However, we present results for the SF
abundance patterns, as in [2]. Throughout, we refer to genomic protein repertoires as
proteomes. Proteome size related as the number of distinct SFs in a proteome (SF occurrence)
is depicted next to each taxon for easy comparison in Figs. 1 and 2. Phylogenetic analyses
were carried out as described in [2, 3] (see methods). SFs that are shared between proteomes
of viruses and cellular organisms were used as the characters (Fig. 1a) as in [2]. Initially no
viruses are included in tree reconstructions and here the root was placed within the Archaea,
which has the smallest proteome (503 SFs) among the supergroups (Fig 1b). When a still
smaller bacterial proteome (420 SFs) was included, the position of the root as well as the
branching order changed. In this case, the bacteria were split into two groups and the root
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
6
was placed within one of the bacterial groups (Fig. 1c). Further, when a much smaller
archaeal proteome was included (228 SFs), the root was relocated to a branch leading to the
now smallest proteome (Fig. 1d). Note that the newly included taxa, both bacteria and
archaea are host-dependent symbionts with reduced genomes.
Similarly including viruses in the analyses draws the root towards the smaller viral
proteomes (Fig. 2). As in the rooting experiments in Fig. 1, a group of DNA viruses (107-175
SFs) was introduced. These DNA viruses have larger proteomes than do the RNA viruses,
but they are much smaller than most known endosymbiotic bacteria (Fig. 2d). Again, the root
was repositioned within the DNA viruses group (Fig. 2a). Following this experiment, two
extremely reduced (107 SFs each) endosymbiotic bacteria classified as Betaproteobacteria
were included. These further displaced the root closer to the smallest set of proteomes (Fig.
2b). Finally, a set of four RNA viruses (4-17 SFs) in their genomes was introduced and they
rooted the tree within the RNA viruses (Fig. 2c). These results challenge the conclusion
drawn previously that the proteomes of RNA viruses are more ancient than proteomes of
DNA viruses [2]. In addition, the results contradict the purported antiquity of viral proteomes
as such. Rather, the data suggest that there are severe artifacts generated by genome size-bias
due to the inclusion of the viral proteomes in the analysis. These artifacts are expressed as
grossly erroneous rootings caused by small-genome attraction (SGA) in the Lundberg rooting
using the hypothetical all-zero outgroup.
Including the all-zero pseudo-outgroup in the analysis either implicitly (defined by the
ANCSTATES option in PAUP*) or explicitly (as a taxon in the data matrix) does not make a
difference to the tree topology and rooting. We note that including the hypothetical ancestor
during tree estimation amounts to a priori character polarization and pre-specification of the
root. In addition, the position of the root was the same in the different rooting experiments
when the all-zero pseudo-ancestor was explicitly specified as the outgroup for root trees
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
7
using the outgroup method (see supplementary Figs S1 & S2). These rooting experiments
reveal a strong bias in rooting that favors small genomes irrespective of the different rooting
methods—outgroup rooting, Lundberg rooting and intrinsic rooting—used to root the ToL
with the “all-zero” hypothetical ancestor. This SGA artifact is comparable to the better-
known LBA artifact that is associated with compositional bias of nucleotides or of amino
acids that distort gene trees [10].
The use of artificial outgroups is not uncommon in rooting experiments when rooting
is ambiguous [11]. Artificial taxa are either an all-zero outgroup or an outgroup constructed
by randomizing characters and/or character-states of real taxa. Although rooting experiments
with multiple real outgroups, or randomized artificial outgroups that simulate loss of
phylogenetic signal can minimize the ambiguity in rooting the all-zero outgroup has proved
to be of little use [11, 14]. Conclusions based on an all-zero outgroup are often refuted when
empirically grounded analysis with real taxa are carried out [14]. Indeed, the present rooting
experiments (Figs. 1 and 2) clearly show that the position of the root depends on the smallest
genome in the sample when a hypothetical all-zero ancestor/outgroup is used. In effect, the
rooting approach favored by Nasir and Caetano-Anolles [2] is not reliable.
Nevertheless, small proteome size is not an irreconcilable feature of genome-tree
reconstructions [5, 6, 18]. Small genome attraction artifacts may be observed when highly
reduced proteomes of obligate endosymbionts are included in analyses with common
samplings [3, 5, 6, 18]. However, their untoward effects are only observed in the shallow end
of the tree where the endosymbionts might be clustered with unrelated groups. In such cases
the deep divergences or clustering of major branches are unperturbed [6, 18]. Such size-
biases are readily corrected by normalizations that account for genome size (actually specific
SF content in this case) [5, 6, 18].
Though size-bias correction for SF content phylogeny is known to be reliable, both
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
8
for measures based on distance [6] and those based on frequencies of character distribution
[18], Nasir and Caetano-Anollés do not apply such corrections. They only attend to the
proteome size variations associated with SF abundances [2, 3]. Instead of accounting for
novel taxon-specific SFs in their model of evolution, the authors choose to exclude
potentially problematic small proteomes of parasitic bacteria and to include only the
proteomes of ‘free-living’ cellular organisms in their analyses [2, 3]. But all viruses are
parasites and obviously even more extreme examples of minimal proteomes.
This problem is further exacerbated by the uneven and largely incomplete annotation
of SF domains in viral proteins [19, 20]. In fact, many viral ‘proteomes’ that were sampled in
[2] are as small as a single SF. It is not clear why the inclusion of small viral proteomes was
not recognized as even more problematic than the inclusion of small parasitic bacterial
proteomes, in spite of the previous assertion of these authors that small proteomes should be
excluded [9]. Nevertheless, including small viral proteomes is inconsistent with specifically
excluding small cellular proteomes in the ToL, especially when hypotheses of reductive
evolution are considered. Screening taxa based on ‘lifestyle’ (free-living or parasitic) seems
unwarranted since extreme reductive genome evolution, sometimes called genome
streamlining, is not limited to host adapted parasitic bacteria but is common in free-living
bacteria as well as eukaryotes [21-23].
In addition to the ToP, the authors use a so-called tree of domains (ToD) to support
their conclusion that proteomes of viruses are ancient and that proteomes of RNA viruses are
particuarly ancient. The ToD is projected as the evolutionary trajectory of individual SFs.
Such projections are used as proxies to determine the relative antiquity or novelty of SFs [2].
The ToD like the ToP is also rooted with a presumed pseudo-outgroup and that rooting may
be an artifact as for the ToP. Much more serious than potential artifacts in ToD is an
egregiously bad assumption from the perspective of the SCOP hierarchal classification: the
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
9
very notion that the ToD describes evolutionary relationships between SFs in the same way
the ToL describes genealogy of species. Evolutionary relationships between SFs with
different folds in SCOP classification are not established [4, 24]. Physicochemical protein
folding experiments and corresponding statistical analyses of sequence evolution patterns,
including simulations of protein folding are all consistent with the observations that the
sequence-structure space of SFs is discontinuous [25, 26]. Empirical data indicate that the
evolutionary transition from one SF to another through gradual changes implied in the ToD is
unlikely, if at all feasible. This makes the ToD hypothesis, which assumes that all SFs are
related to one another by common ancestry untenable. Thus the ToD contradicts the very
basis upon which SFs are classified in the SCOP hierarchy [4, 24]. The ToD is therefore
uninterpretable as an evolutionary history of individual SFs. Accordingly, the ToD cannot
reflect the ‘relative ages’ of SFs nor can it support the inferred antiquity of viruses in the
ToP.
Unlike phylogenetic trees that describe the evolution of individual proteomes, Venn
diagrams, SF sharing patterns and summary statistics of SF frequencies among groups of
proteomes only depict generalized trends. Multiple evolutionary scenarios can be invoked a
priori to explain the general trends without any phylogenetic analyses. Although such
patterns may be suggestive, they do not by themselves support reliable phylogenetic
inferences [19, 20]. Thus the authors’ inferences in [2] based on statistical distributions of
SFs alone are speculative, at best.
In summary, the authors’ [2, 3] proposed rooting for the ToL seems to be influenced
by clearly identifiable artifacts. The conceptual issue of proteome evolution may be traced to
a very widely held view: namely that “ToP were rooted by the minimum character state,
assuming that modern proteomes evolved from a relatively simpler urancestral organism that
harbored only few FSFs” [2, 9]. Thus, a common prejudice is that the origins of modern
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
10
proteomes ought to reflect a monotonic evolutionary progression that is embodied in
something like Aristotle’s Great Chain of Being [10]. The “all-zero” or “all-absent”
hypothetical ancestor is neither empirically grounded nor biologically meaningful, but it does
indeed reflect a common prejudice [10, 14]. Indeed, we bring into question the inferred
relative antiquity of viruses and Archaea in the ToL and the notion that viruses make up an
independent fourth supergroup [2]. In effect, we suggest that the phylogenetic approach of
Nasir and Caetano-Anolles’ [2, 3] provides neither a test nor a confirmation of any one of the
hypotheses for the origins of viruses [1]. Despite its importance, reconciling the extensive
genetic and morphological diversity of viruses as well as their evolutionary origins remains
murky [1, 27]. Better methods and empirical models are required to test whether a
multiplicity of scenarios or a single over-arching hypothesis is applicable to understand the
origins of viruses.
Methods
Here, genomic protein repertoires are referred to as proteomes. We re-analyzed a
subset of the 368 proteomes sampled in [2] for phylogenetic rooting of diverse cells and their
viruses. Here, we sampled 102 cellular proteomes containing 34 each from Archaea, Bacteria
and Eukaryotes, respectively as well as 16 viral proteomes from [2]. For the latter, we note
that the DNA virus proteomes were substantially larger than those of RNA viruses in terms of
the number of identified SFs. In addition we included for comparison some of the smallest
known proteomes of Archaea and Bacteria not included in [2]. Roughly, half of the sampled
proteomes were analyzed (Figs 1 and 2) for computational simplicity. Results did not vary
when all the sampled taxa were included (see supplementary Figs S3 & S4). Rooting
experiments were preformed both with SF occurrence (presence/absence) patterns and SF
abundance (copy number) patterns; however, we present results for the SF abundance
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
11
patterns as in [2]. In addition to the Lundberg rooting procedures carried out as in [2, 3],
rooting experiments were repeated by including the all-zero taxon in the tree reconstruction
process implicitly (using the ANCSTATES option in PAUP*) and explicitly as taxon in the
data matrix. Further, when the all-zero taxon was explicitly included, rooting experiments
were also repeated with the outgroup rooting method. Phylogenetic reconstructions were
carried out using maximum parsimony criteria implemented in PAUP* ver. 4.0b10 [28] with
heuristic tree searches using 1,000 replicates of random taxon addition and tree bisection
reconnection (TBR) branch swapping. Trees were rooted by Lundberg method, outgroup
method or intrinsically rooted by including the hypothetical all-zero ancestor in tree searches.
Acknowledgements: A. H. acknowledges support from The Swedish Research Council (to
Måns Ehrenberg) and the Knut and Alice Wallenberg Foundation, RiboCORE (to Måns
Ehrenberg), A.A. acknowledges support from European Regional Development Fund through
the Centre of Excellence in Chemical Biology grant number 3.2.0101.08-0017 and C.G.K.
acknowledges support from the Nobel Committee for Chemistry of the Royal Swedish
Academy of Sciences.
References:
1. Wessner, D., The origins of viruses. Nature Education, 2010. 3(9): p. 37. 2. Nasir, A. and G. Caetano-Anollés, A phylogenomic data-driven exploration of viral
origins and evolution. Science Advances, 2015. 1(8). 3. Nasir, A., K. Kim, and G. Caetano-Anolles, Giant viruses coexisted with the cellular
ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya. BMC Evolutionary Biology, 2012. 12(1): p. 156.
4. Murzin, A.G., et al., SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 1995. 247(4): p. 536-540.
5. Snel, B., P. Bork, and M.A. Huynen, Genome phylogeny based on gene content. Nature Genetics, 1999. 21(1): p. 108-110.
6. Yang, S., R.F. Doolittle, and P.E. Bourne, Phylogeny determined by protein domain content. Proceedings of the National Academy of Sciences of the United States of America, 2005. 102(2): p. 373-378.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
12
7. Fang, H., et al., A daily-updated tree of (sequenced) life as a reference for genome research. Scientific Reports, 2013. 3.
8. Kurland, C.G. and A. Harish, Structural biology and genome evolution: An introduction. Biochimie, 2015. 119: p. 205-208.
9. Kim, K.M. and G. Caetano-Anollés, The proteomic complexity and rise of the primordial ancestor of diversified life. BMC Evolutionary Biology, 2011. 11(1).
10. Gouy, R., D. Baurain, and H. Philippe, Rooting the tree of life: the phylogenetic jury is still out. Phil. Trans. R. Soc. B, 2015. 370(1678): p. 20140329.
11. Graham, S.W., R.G. Olmstead, and S.C.H. Barrett, Rooting Phylogenetic Trees with Distant Outgroups: A Case Study from the Commelinoid Monocots. Molecular Biology and Evolution, 2002. 19(10): p. 1769-1781.
12. Morrison, D.A., Phylogenetic Analyses of Parasites in the New Millennium, in Advances in Parasitology. 2006. p. 1-124.
13. Pace, N.R., A molecular view of microbial diversity and the biosphere. Science, 1997. 276(5313): p. 734-740.
14. Wheeler, W.C., Systematics: A course of lectures. 2012: John Wiley & Sons. 15. Schwartz, R. and M. Dayhoff, Origins of prokaryotes, eukaryotes, mitochondria, and
chloroplasts. Science, 1978. 199(4327): p. 395-403. 16. Woese, C.R., O. Kandler, and M.L. Wheelis, Towards a natural system of organisms:
Proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences of the United States of America, 1990. 87(12): p. 4576-4579.
17. Swofford, D.L. and P.B. Begle, PAUP: Phylogenetic Analysis Using Parsimony: User's Manual (Version 3.1). 1993, Champaign, IL, USA.: Illinois Natural History Survey.
18. Harish, A., A. Tunlid, and C.G. Kurland, Rooted phylogeny of the three superkingdoms. Biochimie, 2013. 95(8): p. 1593-1604.
19. Abroi, A. and J. Gough, Are viruses a source of new protein folds for organisms? - Virosphere structure space and evolution. BioEssays, 2011. 33(8): p. 626-635.
20. Abroi, A., A protein domain-based view of the virosphere-host relationship. Biochimie, 2015. 119: p. 231-243.
21. Andersson, S.G.E. and C.G. Kurland, Reductive evolution of resident genomes. Trends in Microbiology, 1998. 6(7): p. 263-268.
22. Giovannoni, S.J., J. Cameron Thrash, and B. Temperton, Implications of streamlining theory for microbial ecology. ISME J, 2014. 8(8): p. 1553-1565.
23. Dujon, B., et al., Genome evolution in yeasts. Nature, 2004. 430(6995): p. 35-44. 24. Gough, J., et al., Assignment of homology to genome sequences using a library of
hidden Markov models that represent all proteins of known structure. Journal of Molecular Biology, 2001. 313(4): p. 903-919.
25. Oliveberg, M. and P.G. Wolynes, The experimental survey of protein-folding energy landscapes. Quarterly reviews of biophysics, 2005. 38(03): p. 245-288.
26. Wolynes, P.G., Evolution, energy landscapes and the paradoxes of protein folding. Biochimie, 2015. 119: p. 218-230.
27. Forterre, P., M. Krupovic, and D. Prangishvili, Cellular domains and viral lineages. Trends in Microbiology, 2014. 22(10): p. 554-558.
28. Swofford, D.L., PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. 2003, Sunderland, Massachusetts: Sinauer Associates.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
13
Figure 1
Figure 1. Implementing an “all-zero” pseudo-outgroup [2] severely distorts rooting of the ToL. Rooted
trees were reconstructed from a subset of 368 taxa (proteomes) sampled in [2], which included 17 taxa each
from Archaea, Bacteria, Eukarya (ABE) and 9 taxa from the virus groups (V). (a) Venn diagram shows the 455
SFs shared between viruses and cells (ABEV), which were used to reconstruct trees. (b) Single most
parsimonious tree of ABE taxa rooted within Archaea. (c, d) New taxa, which represent the smallest proteome
after inclusion, were progressively included in size order. The position of the root node changed accordingly to
the branch corresponding to a group (or taxon) with the smallest proteome, which is Bacteria (c), Archaea (d);
the Eukarya section is collapsed since tree topology is unaffected. Taxa are described by their NCBI taxonomy
ID, taxonomic affiliation (A, B, E or V) and proteome size in terms of the number of distinct SFs present in the
genome. To compare the position of the root node trees are drawn to show branching patterns only, branch
lengths are not proportional to the quantity of evolutionary change.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
14
Figure 2
Figure 2. Rooting experiments (continued from Fig. 1) show rooting bias of “all-zero” pseudo-outgroup
towards small proteomes. (a-c) New taxa, which represent the smallest proteome after inclusion, were
progressively included in size order, where the smallest proteome was from mega DNA viruses (a), Bacteria (b)
and RNA viruses (c). Details in trees are same as in Fig. 1. (d) Comparison of proteome sizes of sampled taxa in
terms of SF occurrence used to estimate trees in this study and in [2]. Numbers in parentheses above each group
indicate the number of proteomes.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
224324_B_577
367737_B_607
383372_B_722
262724_B_634
243090_B_707
362976_A_570
399549_A_513
272569_A_639
192952_A_618
272568_B_692
70601_A_503
349307_A_512
391774_B_666
290317_B_639
240015_B_694
69014_A_536
411154_B_664
224325_A_569
321956_B_506
402880_A_509
410358_A_525
187420_A_523
272562_B_706
368407_A_598
348780_A_588
326298_B_602
64091_A_549
262543_B_690
273063_A_524
351607_B_638
259564_A_557
323259_A_591
273068_B_646
391009_B_584
70601_A_503
64091_A_549272569_A_639
273063_A_524
348780_A_588
326298_B_602
362976_A_570
321956_B_506
262543_B_690
402880_A_509
224324_B_577
1105097_B_420
383372_B_722
259564_A_557
187420_A_523
192952_A_618
69014_A_536
272562_B_706
410358_A_525
411154_B_664272568_B_692
391009_B_584
240015_B_694
262724_B_634
243090_B_707
351607_B_638
391774_B_666290317_B_639
224325_A_569
368407_A_598323259_A_591
349307_A_512
273068_B_646
399549_A_513
367737_B_607
391009_B_584
323259_A_591
391774_B_666290317_B_639
240015_B_694
272562_B_706
351607_B_638
399549_A_513
272569_A_639
272568_B_692
224325_A_569
410358_A_525
321956_B_506
224324_B_577
273063_A_524
1105097_B_420
368407_A_598
69014_A_536
383372_B_722
70601_A_503
192952_A_618
367737_B_607
273068_B_646
262724_B_634
362976_A_570
262543_B_690
64091_A_549
243090_B_707
402880_A_509
326298_B_602
187420_A_523
348780_A_588
349307_A_512
259564_A_557
228908_A_228
411154_B_664
326298_B_602
64091_A_549
192952_A_618
272569_A_639
391774_B_666
348780_A_588
402880_A_509
411154_B_664
262724_B_634
367737_B_607
273063_A_524
351607_B_638
70601_A_503
391009_B_584
410358_A_525
187420_A_523
321956_B_506
362976_A_570
349307_A_512
273068_B_646
All_Zero_Anc
240015_B_694
290317_B_639
323259_A_591
224325_A_569
243090_B_707
69014_A_536
259564_A_557
272562_B_706
224324_B_577
399549_A_513
383372_B_722
262543_B_690
368407_A_598
272568_B_692
262543_B_690
224324_B_577
362976_A_570
273063_A_524
64091_A_549
290317_B_639
All_Zero_Anc
402880_A_509
391009_B_584
351607_B_638
321956_B_506
383372_B_722
272562_B_706
272569_A_639
349307_A_512
262724_B_634
243090_B_707
187420_A_523
1105097_B_420
391774_B_666
224325_A_569
240015_B_694
367737_B_607
272568_B_692
399549_A_513
368407_A_598323259_A_591
192952_A_618
69014_A_53670601_A_503
259564_A_557
273068_B_646
410358_A_525
348780_A_588
326298_B_602
411154_B_664
399549_A_513
64091_A_549
273063_A_524
272568_B_692
349307_A_512
262724_B_634
224324_B_577
228908_A_228
290317_B_639
362976_A_570
402880_A_509
323259_A_591
224325_A_569
367737_B_607
383372_B_722
273068_B_646272562_B_706
69014_A_536
351607_B_638
243090_B_707
321956_B_506
272569_A_639
All_Zero_Anc
410358_A_525
1105097_B_420
411154_B_664
70601_A_503
348780_A_588
391774_B_666
187420_A_523
368407_A_598
391009_B_584
326298_B_602
192952_A_618
240015_B_694
259564_A_557
262543_B_690
Intrinsincally rooted trees‘0’ prespeci�ed as the ancestral stateand hypothetical “all-zero” ancestor included in tree search.
EUKARYABACTERIAARCHAEA Smallest proteome
Root node
Outgroup rooted trees“all-zero” taxon explicitly includedin the data matrix and speci�ed asthe outgroup.
(a) (b) (c)
(d) (e) (f )
Figure S1. Rooting experiments (continued from Fig 1) where trees were either intrinsically rooted by including the hypothetical ancestor in tree search (a-c) or byincluding and explicit all-zero ancestor taxon and specifying it to be the outgroup.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
1094892_V_175
290317_B_639
262724_B_634
224325_A_569
323259_A_591
399549_A_51369014_A_536
1349409_V_108
273068_B_646
70601_A_503
391774_B_666
224324_B_577
262543_B_690
212035_V_176
351607_B_638
368407_A_598
272568_B_692
228908_A_228
349307_A_512
383372_B_722
367737_B_607
411154_B_664
1105097_B_420321956_B_506
243090_B_707
240015_B_694
259564_A_557
64091_A_549
391009_B_584
1235314_V_177
187420_A_523
192952_A_618
272569_A_639
410358_A_525
1349410_V_107
273063_A_524
326298_B_602
362976_A_570
272562_B_706
402880_A_509
348780_A_588
224325_A_569
391009_B_584
70601_A_503
290317_B_639
243090_B_707
273068_B_646
349307_A_512
240015_B_694
1349409_V_108
272568_B_692
64091_A_549
326298_B_602
402880_A_509
192952_A_618
410358_A_525
11520_V_7
368407_A_598
224324_B_577
262724_B_634
228908_A_228
362976_A_570
323259_A_591
399549_A_513
187420_A_523
367737_B_607
272569_A_639
69014_A_536
351607_B_638
1343077_B_107
11128_V_17
1105097_B_420
212035_V_176
321956_B_506
1235314_V_177
391774_B_666
348780_A_588
272562_B_706262543_B_690
411154_B_664
656190_V_4
273063_A_524
1349410_V_107
259564_A_557
1094892_V_175
383372_B_722
81931_V_5
1053648_B_107
367737_B_607
192952_A_618
1235314_V_177
326298_B_602
262543_B_690
348780_A_588
1053648_B_107
321956_B_506
349307_A_512
212035_V_176
243090_B_707
1343077_B_107
224324_B_577
351607_B_638
187420_A_523
70601_A_503262724_B_634
272569_A_639
1094892_V_175
1349409_V_108
224325_A_569
259564_A_557
402880_A_509
240015_B_694
391774_B_666
273068_B_646
391009_B_584
272562_B_706
399549_A_513
362976_A_570
410358_A_525
383372_B_722
323259_A_591
411154_B_664
290317_B_639
1105097_B_420
64091_A_549
368407_A_598
228908_A_228
69014_A_536
1349410_V_107
272568_B_692
273063_A_524
362976_A_570
259564_A_557
272568_B_692
69014_A_536
1094892_V_1751235314_V_177
272562_B_706
391774_B_666
411154_B_664
383372_B_722
367737_B_607
1105097_B_420
351607_B_638
368407_A_598
240015_B_694
410358_A_525
192952_A_618
348780_A_588
1349410_V_107
262724_B_634
349307_A_512
64091_A_549
228908_A_228
272569_A_639
70601_A_503
1349409_V_108
273068_B_646
391009_B_584
321956_B_506
399549_A_513
All_Zero_Anc
262543_B_690
224325_A_569
224324_B_577
273063_A_524
212035_V_176
290317_B_639
402880_A_509187420_A_523
243090_B_707
323259_A_591
326298_B_602
411154_B_664
273063_A_524
1094892_V_175
187420_A_523
391774_B_666
1053648_B_107
351607_B_638
410358_A_525
290317_B_639
All_Zero_Anc
402880_A_509
272569_A_639
323259_A_591
272568_B_692
259564_A_557
64091_A_549
228908_A_228
362976_A_570
262543_B_690
399549_A_513
224325_A_569
348780_A_588
367737_B_607
321956_B_506
383372_B_722
262724_B_634
1343077_B_107
326298_B_602
368407_A_598
212035_V_176
243090_B_707
272562_B_706
1105097_B_420
224324_B_577
240015_B_694
1235314_V_177
1349409_V_108
391009_B_584
273068_B_646
192952_A_618
349307_A_512
69014_A_53670601_A_503
1349410_V_107
290317_B_639
272568_B_692
367737_B_607
399549_A_513
349307_A_512
1349410_V_107
1105097_B_420
212035_V_176
243090_B_707
262724_B_634
368407_A_598323259_A_591
240015_B_694
410358_A_525
1094892_V_175
228908_A_228
64091_A_549
1349409_V_108
1343077_B_107
273068_B_646
383372_B_722
224324_B_577
656190_V_4
391774_B_666
272569_A_639
11128_V_17
192952_A_618
81931_V_5
411154_B_664
262543_B_690
362976_A_570
11520_V_7
272562_B_706
259564_A_557
391009_B_584
273063_A_524
321956_B_506
402880_A_509
70601_A_503
1053648_B_107
326298_B_602
187420_A_523
1235314_V_177
224325_A_569
348780_A_588
All_Zero_Anc
351607_B_638
69014_A_536
EUKARYABACTERIAARCHAEADNA VirusesRNA viruses
Smallest proteomeRoot node
Outgroup rooted trees“all-zero” taxon explicitly includedin the data matrix and speci�ed asthe outgroup.
Intrinsincally rooted trees‘0’ prespeci�ed as the ancestral stateand hypothetical “all-zero” ancestor included in tree search.
(a) (b) (c)
(d) (e) (f )
Figure S2. Rooting experiments (continued from Fig 2) where trees were either intrinsically rooted by including the hypothetical ancestor in tree search (a-c) or byincluding and explicit all-zero ancestor taxon and specifying it to be the outgroup.
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
326298_B_602
6239_E_944
224324_B_577
368407_A_598
411154_B_664
243090_B_707
262724_B_634
262543_B_690
259564_A_557
272569_A_639
225164_E_1047
3702_E_988
272562_B_706
167879_B_765
9615_E_1091
208963_B_846
4558_E_960
273116_A_486
9913_E_1090
318161_B_758
399549_A_513
3694_E_974
290317_B_639
362976_A_570
340102_A_493
10090_E_1110
251221_B_733
273063_A_524
399550_A_446
45157_E_821
453591_A_431
367110_E_883
64091_A_549
374847_A_451
486041_E_869
436308_A_502
224308_B_781
312017_E_777
420247_A_500
397948_A_493
349307_A_512
284811_E_765
349101_B_734
192952_A_618
70601_A_503
381666_B_810
306902_E_775
402880_A_509
1054217_A_469
104341_E_864
284590_E_770
4896_E_777
323259_A_591
365044_B_770
383372_B_722
272557_A_469
190192_A_473
448385_B_835
273507_E_834
240015_B_694
1036743_B_726
273068_B_646
444157_A_468
380703_B_807
410358_A_525
243232_A_499
368408_A_449
8355_E_1008
7227_E_985
6669_E_1042
5762_E_843
269482_B_829
7955_E_1088
70791_E_915
263820_A_491
391009_B_584
4932_E_777
100226_B_776
339860_A_475
357804_B_762
321956_B_506
296587_E_898
391774_B_666
391037_B_726
240292_B_783
351607_B_638
187420_A_523
556484_E_858
272568_B_692
81824_E_900
69014_A_536
367737_B_607
565050_B_724
415426_A_455
224325_A_569
9606_E_1113
2903_E_963
9598_E_1088
348780_A_588
44056_E_842
227321_E_899
44689_E_894
318161_B_758
284811_E_765
6669_E_1042
69014_A_536
44056_E_842
383372_B_722
8355_E_1008
436308_A_502
273116_A_486
312017_E_777
415426_A_455
224324_B_577
391774_B_666
70791_E_915
45157_E_821
410358_A_525
227321_E_899
4932_E_777
351607_B_638
272557_A_469
326298_B_602
391037_B_726
367737_B_607
306902_E_775
296587_E_898
1054217_A_469
349101_B_734
104341_E_864
243232_A_499
224308_B_781
9615_E_1091
4896_E_777
348780_A_588
9606_E_1113
9598_E_1088
100226_B_776
381666_B_810
190192_A_473
486041_E_869
192952_A_618
1036743_B_726
251221_B_733
208963_B_846
7955_E_1088
420247_A_500
6239_E_944
9913_E_1090
339860_A_475
269482_B_829
2903_E_963
224325_A_569
448385_B_835
167879_B_765
323259_A_591
453591_A_431
365044_B_770
399549_A_513273063_A_524
444157_A_468
3702_E_988
4558_E_9603694_E_974
391009_B_584
340102_A_493
7227_E_985
284590_E_770
262724_B_634
368408_A_449
565050_B_724
362976_A_570
10090_E_1110
349307_A_512
44689_E_894
70601_A_503
259564_A_557
273507_E_834
262543_B_690
374847_A_451
290317_B_639
272568_B_692
81824_E_900
272569_A_639
240292_B_783
397948_A_493
321956_B_506
411154_B_664
187420_A_523
556484_E_858
367110_E_883
273068_B_646
263820_A_491
5762_E_843
243090_B_707
399550_A_446
1105097_B_420
402880_A_509
64091_A_549
380703_B_807
225164_E_1047
357804_B_762
272562_B_706
240015_B_694
368407_A_598
444157_A_468
284590_E_770
362976_A_570
7955_E_1088
391774_B_666
380703_B_807
399549_A_513
243232_A_499
486041_E_869
3702_E_988
187420_A_523
340102_A_493
70791_E_915
44056_E_842
312017_E_777
367737_B_607
224324_B_577
339860_A_475
306902_E_775
259564_A_557
348780_A_588
402880_A_509
81824_E_900
273507_E_834
367110_E_883
273063_A_524
9913_E_1090
4896_E_777
368407_A_598
6669_E_1042
391037_B_726
290317_B_639
411154_B_664
399550_A_446
224308_B_781
565050_B_724
391009_B_584
6239_E_944
262543_B_690
240015_B_694
296587_E_898
243090_B_707
436308_A_502
269482_B_829
7227_E_985
448385_B_835
192952_A_618
224325_A_569
272557_A_469
272562_B_706
9615_E_1091
5762_E_843
10090_E_1110
4558_E_960
326298_B_602
323259_A_591
45157_E_821
272569_A_639
420247_A_500
381666_B_810
4932_E_777
64091_A_549
349101_B_734
263820_A_491
365044_B_770
100226_B_776
44689_E_894
227321_E_899
240292_B_783
228908_A_228
208963_B_846
357804_B_762
321956_B_506
397948_A_493
190192_A_473
273116_A_486
453591_A_431
374847_A_451
251221_B_733
167879_B_765
1105097_B_420
3694_E_974
284811_E_765
9606_E_1113
2903_E_963
9598_E_1088
318161_B_758
349307_A_512
70601_A_503
383372_B_722
272568_B_692
410358_A_525
262724_B_634
69014_A_536
225164_E_1047
104341_E_864
1054217_A_469
556484_E_858
1036743_B_726
368408_A_449
415426_A_455
273068_B_646
8355_E_1008
351607_B_638
(a) (b) (c)
EUKARYABACTERIAARCHAEA Smallest proteome
Root node
Figure S3. Rooting experiments (corresponding to Fig 1) with a larger taxon sampling
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint
1054217_A_469
1036743_B_726272568_B_692
351607_B_638
486041_E_869
397948_A_493
9615_E_1091
339860_A_475
348780_A_588
306902_E_775
224325_A_569
4896_E_777
565050_B_724
3702_E_988
225164_E_1047
269482_B_829
391037_B_726
5762_E_843
272562_B_706
420247_A_500
7227_E_985
273063_A_524
224308_B_781
243090_B_707
262724_B_634
167879_B_765
243232_A_499
4558_E_9602903_E_963
290317_B_639
367110_E_883
81824_E_900
187420_A_523
8355_E_1008
383372_B_722
410358_A_525
1105097_B_420
399550_A_446
340102_A_493
368408_A_449
4932_E_777
272569_A_639
318161_B_758
64091_A_549
453591_A_431
367737_B_607
9606_E_1113
284811_E_765
399549_A_513
368407_A_598
411154_B_664
391009_B_584
380703_B_807
212035_V_176
391774_B_666
349101_B_734
272557_A_469
44056_E_842
374847_A_451
70791_E_915227321_E_899
263820_A_491
402880_A_509
6669_E_1042
104341_E_864
69014_A_536
6239_E_944
362976_A_570
296587_E_898
312017_E_777
556484_E_858
208963_B_846
190192_A_473
9598_E_1088
357804_B_762
326298_B_602
415426_A_455
100226_B_776
365044_B_770
1094892_V_175
44689_E_8943694_E_974
273116_A_486
192952_A_618
224324_B_577
381666_B_810
444157_A_468
284590_E_770
240015_B_694
321956_B_506
70601_A_503
262543_B_690
10090_E_1110
45157_E_821
1349409_V_108
448385_B_835
273068_B_646
1235314_V_177228908_A_228
273507_E_834
436308_A_502
1349410_V_107
7955_E_1088
349307_A_512
251221_B_733
259564_A_557
9913_E_1090
323259_A_591
240292_B_783
5762_E_843
368408_A_449
263820_A_491
273068_B_646
9598_E_1088
4896_E_777
187420_A_523
3702_E_988
227321_E_899
323259_A_591
4932_E_777
273116_A_486
269482_B_829
391009_B_584321956_B_506
192952_A_618
243232_A_499
100226_B_776
357804_B_762
367110_E_883
284811_E_765
486041_E_869
251221_B_733
272562_B_706
1105097_B_420
224325_A_569
312017_E_777
349101_B_734
415426_A_455
9606_E_1113
348780_A_588272569_A_639
1343077_B_107
362976_A_570
208963_B_846
397948_A_493
2903_E_963
6239_E_9449615_E_1091
240015_B_694565050_B_724
444157_A_468
167879_B_765
296587_E_898
45157_E_821
420247_A_500
383372_B_722391037_B_726
7955_E_108810090_E_1110
411154_B_664
380703_B_807
225164_E_1047
7227_E_985
1053648_B_107
44689_E_894
448385_B_835
6669_E_1042
1094892_V_175
70791_E_915
453591_A_431
306902_E_775
273507_E_834
228908_A_228
290317_B_639
374847_A_451
44056_E_842
340102_A_493
1349409_V_108
240292_B_783
104341_E_864
262724_B_634
339860_A_475
284590_E_770
262543_B_690
9913_E_1090
326298_B_602367737_B_607
399549_A_513
381666_B_810
259564_A_557
8355_E_1008
243090_B_707
272568_B_692
410358_A_525
273063_A_524
69014_A_536
224308_B_781
365044_B_770
1349410_V_107
190192_A_473
4558_E_960
212035_V_176
402880_A_509
70601_A_503
351607_B_638
556484_E_858
318161_B_758
224324_B_577
1235314_V_177
349307_A_512
368407_A_598
81824_E_900
64091_A_549
399550_A_446
436308_A_502
3694_E_974
391774_B_666
1054217_A_469
1036743_B_726
272557_A_469
69014_A_536
44689_E_894
1343077_B_107
11128_V_17
1349410_V_107
348780_A_588
411154_B_664
1094892_V_175
273507_E_834
227321_E_899
349101_B_734
368407_A_598
357804_B_762
374847_A_451
1053648_B_107
269482_B_829
167879_B_765
4932_E_777
656190_V_4
3702_E_988
284590_E_770
240015_B_694
2903_E_963
486041_E_869
399550_A_446
381666_B_810
349307_A_512
391009_B_584
391037_B_726
444157_A_468
225164_E_1047
312017_E_777
565050_B_724
243232_A_499
362976_A_570
284811_E_765
70791_E_915
9913_E_1090
7955_E_1088
263820_A_491
436308_A_502
368408_A_449
8355_E_1008
212035_V_176
340102_A_493
228908_A_228
208963_B_846
10090_E_1110
273068_B_646
273116_A_486
1105097_B_420
81931_V_5
306902_E_775
6239_E_944
224324_B_577351607_B_638
391774_B_666
556484_E_858
240292_B_783
397948_A_493
365044_B_770
1054217_A_469
4558_E_960
224325_A_569
44056_E_842
1349409_V_108
7227_E_985
420247_A_500
9606_E_1113
9615_E_1091
243090_B_707
9598_E_1088
192952_A_618
190192_A_473
3694_E_974
323259_A_591
273063_A_524
64091_A_549
380703_B_807
187420_A_523
6669_E_1042
272562_B_706
81824_E_900
104341_E_864
367737_B_607
367110_E_883
262724_B_634
70601_A_503
321956_B_506
339860_A_475
1235314_V_177
251221_B_733
453591_A_431
448385_B_835
296587_E_898
45157_E_821
318161_B_758
259564_A_557
4896_E_777
272557_A_469
326298_B_602
224308_B_781272568_B_692
410358_A_525
100226_B_776
262543_B_690
290317_B_639
383372_B_722
402880_A_509
11520_V_7
5762_E_843
1036743_B_726
272569_A_639
399549_A_513
415426_A_455
EUKARYABACTERIAARCHAEADNA VirusesRNA viruses
Smallest proteomeRoot node
(a) (b) (c)
Figure S4. Rooting experiments (corresponding to Fig 2) with a larger taxon sampling
not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which wasthis version posted April 18, 2016. ; https://doi.org/10.1101/049171doi: bioRxiv preprint