Title:
Descent of Bacteria and Eukarya from an archaeal root of life

Authors:
Xi Long, Hong Xue and J. Tze-Fei Wong*

Affiliation:
Division of Life Science, Hong Kong University of Science and Technology,
Clear Water Bay, Hong Kong, China

*Corresponding author:
Email: [email protected]

bioRxiv preprint doi: https://doi.org/10.1101/745372; this version posted August 23, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Abstract
The three biological domains delineated based on SSU rRNAs are confronted by uncertainties regarding the relationship between Archaea and Bacteria, and the origin of Eukarya. Herein the homologies between the paralogous valyl-tRNA and isoleucyl-tRNA synthetases in a wide spectrum of species revealed vertical gene transmission from an archaeal root of life through a Primitive Archaea Cluster to an Ancestral Bacteria Cluster of species. The higher homologies of the ribosomal proteins (rProts) of eukaryotic Giardia toward archaeal relative to bacterial rProts established that an archaeal-parent rather than a bacterial-parent underwent genome merger with an alphaproteobacterium to generate Eukarya. Moreover, based on the top-ranked homology of the proteins of Aciduliprofundum among archaea toward the Giardia and Trichomonas proteomes and the pyruvate phosphate dikinase of Giardia, together with their active acquisition of exogenous bacterial genes plausibly through foodchain gene adoption, the Aciduliprofundum archaea were identified as leading candidates for the archaeal-parent of Eukarya.

Molecular evolution analysis of small subunit ribosomal RNA (SSU rRNA) yielded a universal but unrooted tree of life (ToL) that comprises the three biological domains of Archaea, Bacteria and Eukarya [1]. A ToL of transfer RNAs based on the genetic distances between the 20 classes of tRNA acceptors for different amino acids located its root near the hyperthermophilic archaeal methanogen Methanopyrus (Mka) [2]. Although this rooting is supported by a wide range of evidence [3-9], and by the age of ~2.7 Gya for the Methanopyrus lineage as the oldest among living organisms [10], the phylogenies of the three biological domains are beset by two fundamental problems, viz. the uncertain relationship between Archaea and Bacteria, and the identity of the prokaryotic-parent that underwent genome merger with an alphaproteobacterium to give rise to Eukarya. As long as these two problems remain unresolved, the nature of the root of life is open to diverse formulations [11-15]. Accordingly, the objective of the present study is to examine the pathways of descent of Bacteria and Eukarya from an archaeal root of life, and the nature of the archaeal-parent of Eukarya.

The antiquity of proteins could be assessed based on the increasing divergence of paralogous proteins in time [16]. Applying this approach, BLASTP was performed between the intraspecies valyl-tRNA synthetase (VARS) and isoleucyl-tRNA synthetase (IARS) in the genomic sequences for 5,398 species in NCBI GenBank. Arrangement of the BLASTP bitscores obtained in descending order (Supplementary Table S1 and partly in Fig. 1) showed that the 119 highest scoring species were all archaeons, topped by Mka and including Mfe, Afu, Mnt and Mja with bitscores of 473, 436, 387, 387 and 387 respectively. The top scoring bacterium was the Clostridium Mau with a bitscore of 378, and the top scoring eukaryote was the filamentous brown alga Esi with a bitscore of 240. These results established the foremost antiquity of Mka among extant organisms.
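
The ranking step described above can be illustrated with a minimal Python sketch. It uses only the seven bitscores quoted in the text; the `Record` layout and the function names `rank_by_bitscore` and `top_per_domain` are assumptions made for illustration and are not the authors' pipeline. The complete ranking of 5,398 species is in Supplementary Table S1.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    species: str   # species abbreviation as used in the text
    domain: str    # "Archaea", "Bacteria" or "Eukarya"
    bitscore: int  # intraspecies VARS-IARS BLASTP bitscore


# Only the seven values quoted in the text; listed unsorted on purpose.
RECORDS: List[Record] = [
    Record("Esi", "Eukarya", 240),
    Record("Mfe", "Archaea", 436),
    Record("Mau", "Bacteria", 378),
    Record("Mka", "Archaea", 473),
    Record("Afu", "Archaea", 387),
    Record("Mnt", "Archaea", 387),
    Record("Mja", "Archaea", 387),
]


def rank_by_bitscore(records: List[Record]) -> List[Record]:
    """Sort records by descending bitscore; ties keep their input order."""
    return sorted(records, key=lambda r: r.bitscore, reverse=True)


def top_per_domain(records: List[Record]) -> Dict[str, Record]:
    """Return the highest-scoring record found for each domain."""
    best: Dict[str, Record] = {}
    for record in rank_by_bitscore(records):
        # The first record seen for a domain is its top scorer.
        best.setdefault(record.domain, record)
    return best


if __name__ == "__main__":
    for rank, record in enumerate(rank_by_bitscore(RECORDS), start=1):
        print(rank, record.species, record.domain, record.bitscore)
```

With these inputs the ranking is headed by Mka, and the top scorers per domain are Mka, Mau and Esi, matching the values stated in the text.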

The positions of some of the species analyzed in Fig. 1 were indicated on the SSU rRNA tree, with their intraspecies VARS-IARS bitscores expressed in circles colored according to the thermal scale (Fig. 2a). There was a concentration of euryarchaeons with high VARS-IARS homology in a ‘Primitive Archaea Cluster’ spread between Hal and Mfe. In the Bacteria domain, there was likewise an ‘Ancestral Bacteria Cluster’ with high VARS-IARS homology spread between Det and Mau. The deepest branching species in the Bacteria domain were two members of the Aquificae phylum, viz. the anaerobic Det with high VARS-IARS homology, and the microaerobic, low-homology Aae. Since mutations could cause loss of homology more easily than gain of homology, this suggests that Aae has evolved far from the ancestral Aquificae species, possibly as part of the wave of tumultuous changes undergone by former anaerobes in response to the appearance of atmospheric oxygen [17], thereby sustaining extensive evolutionary erosion of its VARS-IARS homology. The enhanced resistance of paralogue homology to perturbation by horizontal gene transfer (HGT), due to the difficulty of transfer of a pair of proteins compared to transfer of a single protein, was illustrated by the preservation of low VARS-IARS bitscores in the proteobacterial region of the tree against any shift toward elevated VARS-IARS homology on account of HGT events, despite the high HGT-susceptibility of, for example, Eco, which acquired 18% of its genes through HGT subsequent to its departure from Salmonella enterica about 100 million years ago [18]. Previously, based on the intraspecies alloacceptor tRNA-distances of various species on the tRNA tree, LUCA was positioned between the branches leading to Mka and Ape at a distance ratio of 1.00 from the Mka branch versus 1.14 from the Ape branch [2], and this position was adopted in the SSU rRNA tree in Fig. 2.
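
Read as a proportion along the segment joining the two branches, the quoted distance ratio places LUCA slightly nearer the Mka branch than the Ape branch. This reading is an interpretation offered for illustration, and the symbols d_Mka, d_Ape and f_Mka are introduced here rather than taken from the source:

```latex
\[
f_{\mathrm{Mka}}
  = \frac{d_{\mathrm{Mka}}}{d_{\mathrm{Mka}} + d_{\mathrm{Ape}}}
  = \frac{1.00}{1.00 + 1.14}
  \approx 0.47
\]
```

Under this assumption LUCA would sit about 47% of the way from the Mka branch to the Ape branch.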

Given the relative paucity of HGT effects on VARS-IARS homology in a majority if not all of the species on the SSU rRNA tree, the parallel prominence of the high VARS-IARS homology species in the Primitive Archaea Cluster and the Ancestral Bacteria Cluster was explicable by vertical genetic transmission of the VARS and IARS genes from an Mka-proximal root of life to the archaeal cluster, and in turn to the bacterial cluster. Since the top-ranked bacterial bitscore of Mau at 378 was between that of Mac at 382 and Pfu at 369, the results suggest that the Ancestral Bacteria Cluster branched off from the Primitive Archaea Cluster close to the Mka-proximal root of Archaea. The medium VARS-IARS bitscores of Esi, Tps, Bpr and Cme among the Eukarya (Fig. 2a) were indicative of the extension of the intraspecies VARS-IARS homology into this domain. The much higher VARS homologies (colored squares) and IARS homologies (colored triangles) between bacterial species and the eukaryote Gla compared to those between archaeal species and Gla indicated that Eukarya received the VARS and IARS genes from the Bacteria instead of the Archaea domain (Fig. 2b).

The aligned segments of VARS and IARS from the archaeon Mka, the bacterium Mau and the eukaryote Esi in Fig. 3 were portions of the six complete sequences (Supplementary Fig. S1). These segments showed 42/207 columns where all six sequences carried the same amino acid, thereby providing strong evidence for the vertical transmission of VARS and IARS genes from Archaea to Bacteria and Eukarya.
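
The column count quoted above (42 of 207, roughly 20%) is the number of alignment columns in which all six sequences carry the same residue. A minimal Python sketch of such a count follows. The function name `count_conserved_columns` and the toy alignment `TOY_ALIGNMENT` are assumptions for illustration only: the toy sequences are invented and are not the VARS or IARS sequences of Fig. 3, and treating gap-only agreement as non-conserved is a design choice of this sketch.

```python
from typing import List, Tuple


def count_conserved_columns(aligned: List[str]) -> Tuple[int, int]:
    """Return (conserved, total) column counts for equal-length aligned sequences.

    A column is counted as conserved when every sequence carries the same
    residue and that residue is not the gap character '-'.
    """
    if not aligned:
        return 0, 0
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    conserved = sum(
        1
        for column in zip(*aligned)  # iterate column by column
        if column[0] != "-" and len(set(column)) == 1
    )
    return conserved, length


# Invented six-sequence toy alignment (hypothetical data, not from Fig. 3).
TOY_ALIGNMENT = [
    "MKVTLGAP",
    "MRVTLGSP",
    "MKV-LGAP",
    "MQVSLGAP",
    "MKVTIGAP",
    "MKVTLGCP",
]

if __name__ == "__main__":
    conserved, total = count_conserved_columns(TOY_ALIGNMENT)
    print(f"{conserved}/{total} columns fully conserved")
    print(f"fraction reported in the text: {42 / 207:.3f}")
```

In the toy alignment, columns 1, 3, 6 and 8 are identical across all six sequences, so the function returns 4 conserved columns out of 8.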

It has been suggested that an endosymbiotic event between an archaeal-parent and an alphaproteobacterium acting as mitochondrion-parent led to the formation of the Last Eukaryotic Common Ancestor (LECA) and ushered in the Eukarya domain [19-21]. Proposals regarding the identity of the archaeal-parent have focused on a range of prokaryotes including Thermoplasmatales, where the lack of a rigid cell wall could facilitate its engulfment of the mitochondrion-parent to bring about endosymbiosis [22-24]; and various archaeons, especially the Asgard archaeons Lok and Tho [25,26], that are enriched in eukaryotic signature proteins (ESPs) [27]. There is no consensus regarding a choice between these two groups of organisms [28].

Upon BLASTP comparisons of the fifty-six ribosomal proteins (rProts) of Gla, the lowest branching eukaryote on the SSU rRNA tree, with corresponding prokaryotic rProts, fifty-three of them yielded higher bitscores with archaeons relative to bacteria (Fig. 4a), indicating that eukaryogenesis began with an archaeal-parent instead of a bacterial-parent. S21e and L36e yielded no bitscore with any archaeal or bacterial rProt, suggesting that they were derived from a prokaryote not surveyed in the present study, altered beyond recognition by BLASTP, or invented by the eukaryogenesis system. The S7e of eleven eukaryotes from Table 1, although not that of Gla, showed detectable homology toward the archaeon Alt. The Sce, Esi and Hsa rProts differed from those of Gla in two respects: about one-sixth of the rProts that showed higher homology toward archaea relative to bacteria in Gla switched to lower homology toward archaea relative to bacteria in Sce, Esi and Hsa; and there were also some additional rProts in Sce, Esi and Hsa, mainly bacteria-derived ones, not found in Gla (Supplementary Fig. S2). These findings pointed to a significant influx of bacterial rProts into Sce, Esi and Hsa after their divergence from Gla, resulting in the replacement of some of the archaea-derived rProts found in Gla by bacteria-derived ones. Acf, Abo and Mac displayed the highest BLASTP bitscores among archaeons toward Gla pertaining to pyruvate phosphate dikinase (EC 2.7.9.1), which would be consistent with a possible role of these three archaeons as archaeal-parent. Interestingly, the bitscores were high for Hei and Tho but low for Odi and Lok among the Asgard archaea, and high for Tac and Tvo but low for Fac, Min and Mte among the Thermoplasmatales (Fig. 4b).
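
The rProt comparison at the start of the preceding paragraph amounts to asking, for each Gla rProt, whether its best archaeal bitscore exceeds its best bacterial bitscore. The Python sketch below shows one way such a tally could be made. The function name `tally_affinity`, the input layout, the protein labels `rProt_A` to `rProt_E` and all bitscores in `TOY_HITS` are hypothetical and chosen only for illustration; none of them are data from Fig. 4a.

```python
from typing import Dict, Optional, Tuple

# protein name -> (best archaeal bitscore, best bacterial bitscore);
# None means that no BLASTP hit was found in that domain.
BestHits = Dict[str, Tuple[Optional[float], Optional[float]]]


def tally_affinity(best_hits: BestHits) -> Dict[str, int]:
    """Count proteins scoring higher toward archaea, toward bacteria, tied, or unmatched."""
    tally = {"archaeal": 0, "bacterial": 0, "tie": 0, "no_hit": 0}
    for archaeal, bacterial in best_hits.values():
        if archaeal is None and bacterial is None:
            tally["no_hit"] += 1
        elif bacterial is None or (archaeal is not None and archaeal > bacterial):
            tally["archaeal"] += 1
        elif archaeal is None or bacterial > archaeal:
            tally["bacterial"] += 1
        else:
            tally["tie"] += 1
    return tally


# Hypothetical toy input: invented labels and bitscores, not values from the paper.
TOY_HITS: BestHits = {
    "rProt_A": (210.0, 95.0),   # higher toward archaea
    "rProt_B": (150.0, None),   # archaeal hit only
    "rProt_C": (80.0, 120.0),   # higher toward bacteria
    "rProt_D": (None, None),    # no hit in either domain
    "rProt_E": (60.0, 60.0),    # equal scores
}

if __name__ == "__main__":
    print(tally_affinity(TOY_HITS))
```

Applied to real per-protein best hits, the `archaeal` and `no_hit` counts would correspond to the fifty-three archaea-leaning rProts and the two unmatched rProts (S21e and L36e) reported in the text.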

Figure 4c shows the prokaryotic distribution of homologues of some of the 162 Gla proteins that were ESPs or homologous toward a limited number of prokaryotes (Supplementary Table S2). Tho, Odi, Xca and Lok, the four prokaryotes endowed with the largest numbers of the Gla-homologous proteins, harbored only 26, 19, 17 and 16 of these Gla-homologous proteins respectively (Supplementary Table S3), and the four Asgard archaeons also did not fully share their Gla-like proteins with one another via HGT, thus underlining the difficulty for any one archaeon or bacterium to accumulate a sufficient number of eukaryote-type proteins to launch the Eukarya domain relying only on its own inventiveness and HGT. On the other hand, one or more potential prokaryotic sources were found for each of the Gla-homologous proteins despite the survey of only a small spectrum of prokaryotes, indicating that the obstacle to eukaryogenesis posed by gene deficiency might be overcome if some dependable mechanism were available to assemble the requisite genes from a broad spectrum of prokaryotes. Addressing the inadequacy of ESP coverage by single archaeal species [29,30], it was suggested that HGTs, or development of phagocytosis by an ESP-rich archaeon, might provide a solution [26,31]. However, the frequencies of HGTs might be a limiting factor [10,32], and rProts could be particularly resistant to HGT [33].

During eukaryogenesis, the archaeal-parent might join up with the mitochondrion-parent to develop directly into LECA in a mitochondria-early scenario, or it might serve as First Eukaryotic Common Ancestor (FECA), and go through successive generations of genomic expansion prior to merger with the mitochondrion-parent to form LECA in a mitochondria-late scenario [34]. By measuring the phylogenetic distances between different components of LECA and their closest prokaryotic relatives, evidence has been obtained in support of a mitochondria-late timetable, with the appearance of the nucleolus preceding that of the nucleus, the endomembrane system and finally mitochondria [35]. Previously, the proteins of the eukaryote Sce were observed to contain a substantial variety of bacterial proteins and also some archaeal ones, and it was pointed out that the influx of bacterial genes into Sce could not be explained by a merger between the archaeal-parent and another bacterium besides the mitochondrion-parent, or by Sce uptake of bacterial genes through ingestion of bacteria as food. Instead, the mitochondrion-parent was a major source of the exogenous bacterial proteins in Sce [20]. When the eukaryotic Gla and Trv proteomes were employed as homology probes for BLASTP query against various prokaryotic proteomes, they gave rise to so many hits with both archaea and bacteria (Supplementary Table S4) that the influx of both archaeal and bacterial genes into the eukaryogenesis system had to be mediated by some highly efficient mechanism; and the similar prokaryotic-homology spectra for Gla and Trv (Fig. 5a) suggest that a majority of the prokaryotic genes in these two eukaryotic genomes entered the eukaryogenic lineage prior to the divergence between Gla and Trv. In fact, archaea have long relied on bacteria as a source of genetic diversity, and there was precedent of influx of bacterial genes being a determinant of archaeal phylogenies: a large number of bacterial genes entered into the methanogen that begot the haloarchaeal archaeons [36,37]. In view of this, an influx of beneficial prokaryotic genes into the archaeal-parent lineage likely began prior to the FECA stage and continued through LECA to the early eukaryotes, as illustrated by the entry of bacterial rProt genes into Sce, Esi and Hsa (Supplementary Fig. S2).

Among 46 archaeal proteomes analyzed, the Abo and Acf proteomes displayed the highest average homology bitscores toward the eukaryotic proteomes of both Gla and Trv (Fig. 5b, Supplementary Fig. S3 and Supplementary Table S4), which suggests that these archaeons could be candidate archaeal-parents. Once a bacterial protein entered into the FECA-eukaryote lineage, its bacterial and eukaryotic versions became segregated locationwise, and evolved independently. The divergence between the two versions would thus increase with time, as in the case of functional paralogues such as VARS and IARS. Accordingly, the bitscores of Ctr and Tpa proteins were high toward Gla and Trv likely because they were taken up late by the eukaryogenic lineage, whereas the bitscores of Mpn and Aae were low likely because they were taken up early (Fig. 5b and Supplementary Fig. S3). Moreover, when the 46 archaeons were compared regarding their ability to import bacterial genes into their own genomes, several archaeons with relatively large proteomes, viz. Hla, Hgi and Mac (with 3,704 to 4,469 protein genes), as well as Lok (5,378 protein genes) and Pfu (2,053 protein genes), displayed distinctive homologies toward bacteria (Fig. 5c left panel). However, when the individual archaeal bitscores were normalized with respect to the protein gene numbers of the archaeons, the normalized bitscores of the small-proteome Abo-group of seven euryarchaeons comprising Abo, Acf, Mte, Tvo and Tac (each with [...]
they took as food [41]. To perform FGA, Abo would employ its array of thirty proteases to digest away the proteins of dead prokaryotes to prepare their naked DNA, and import it through the Beveridge bridal S-layer of its cell surface (S-layer lattices are known to house regular channels of 2-6 nm diameter [42]) for implantation into its own genome. In using proteases to purify DNA for cloning, Abo predates by eons the same usage by modern genetic engineering. The S-layer of Abo is highly flexible and can be bent to form small blebbing vesicles with sharp curvature, indicating that the bonding forces between the S-layer subunits are unusually weak or transient. These vesicles can bud off and anneal with other cells [43]. Pseudomonas aeruginosa also releases comparable membrane vesicles that contain the Pseudomonas quinolone signal for cell-cell communication and group behavior [44]; and Sulfolobus islandicus produces cell-derived S-layer coated spherical membrane vesicles of 90-180 nm diameter [45]. Importantly, such flexibility of the Abo cell surface could facilitate the formation of eukaryotic endomembrane and the prerequisite phagocytosis machinery for eukaryogenesis [31,46,47]. Furthermore, while all prokaryotic cells evolve on the basis of nucleotidyl mutations through the replacement, addition and subtraction of nucleotides, FGA gene uptake would enable the cells to evolve also on the basis of gene-content mutations through the replacement, addition and subtraction of genes, or gene clusters, expediting eukaryogenesis by orders of magnitude. Tac, for example, has succeeded in acquiring gene clusters from other organisms for rProts, NADH dehydrogenase, precorrin biosynthesis, flagellar proteins and a protein degradation pathway, amounting to 32% of its total open reading frames, plausibly via FGA [40]. Besides, the blebbing vesicles of Abo and Acf could mediate gene exchanges between cells engaged in eukaryogenesis, thus further facilitating the process. Therefore, based on the highest archaeal BLASTP bitscores of Abo and Acf toward Gla pertaining to pyruvate phosphate dikinase (Fig. 4b), their highest average archaeal bitscores toward Gla and Trv proteomes (Fig. 5b and Supplementary Fig. S3) and highest archaeal normalized bitscores toward bacteria (Fig. 5c), blebbing membrane vesicles, and possession of nine out of ten glycolytic enzymes needed for metabolic cooperation with mitochondrial respiration, these Aciduliprofundum archaea represent exceptionally advantaged candidates for the archaeal-parent role. They even share the scavenger life style with the deep branching eukaryote Gla, the rProts of which have remained more archaeal than those of Sce, Esi and Hsa. Between Acf and Abo, the catalase/peroxidase HP1-encoding facultatively anaerobic Acf might be more resistant to oxidative damage than Abo during merger with an alphaproteobacterium, and could hunt a wide range of ecological niches for beneficial food species. The bitscore 936 of Acf toward Gla with respect to pyruvate phosphate dikinase was also slightly higher than the 928 bitscore of Abo. The Asgard archaeons have been highly regarded as candidate archaeal-parents on the strength of their important ESP genes, which could render any one of them a valuable food species as well for the archaeal-parent. Moreover, the new cultivatable Asgard Candidatus Prometheoarchaeum syntrophicum MK-D1 [48,49] can degrade amino acids through syntrophy, and displays wisp-like membrane protrusions indicative of a flexible cell surface and possible FGA activity, in keeping with the significant albeit modest bitscores between Lok and the bacterial species Bja and Tht, and between Odi and several bacterial species, in Fig. 5c left and right panels respectively.

Based on the BLASTP bitscores between the proteins of various prokaryotes and the mitochondrial gene-encoded proteins in four eukaryotes (Supplementary Table S5), the alphaproteobacteria Haematobacter, Chelativorans and Tateyamaria were closest to the lineage of the mitochondrion-parent (Fig. 5d).

In conclusion, in the present study, Methanopyrus kandleri was found to be the top-ranked organism in VARS-IARS homology among 5,398 species from all three biological domains, and therefore close to the root of life. Moreover, the high VARS-IARS homologies in the Primitive Archaea Cluster and in the Ancestral Bacteria Cluster delineated a pathway of descent of Bacteria from Archaea that diverged early from Archaea to form the Bacteria domain. The preeminent homology between the Gla rProts and archaeal rProts established that the prokaryote-parent of Eukarya that entered into genome merger with an alphaproteobacterial mitochondrion-parent was an archaeal-parent. The archaeal-parent was suggested to be a scavenger archaeon such as the Aciduliprofundum archaea, capable of generating a chimeric eukaryote through large scale foodchain gene adoption. Notwithstanding such elaborate phylogenetic developments, the asterisked columns in Fig. 3, where all six aligned protein sequences showed the same Val or Leu residue despite the ease with which Val, Leu and Ile can be interchanged in evolution, represented a level of protein sequence conservation across two different proteins, three biological domains, and a time span of more than two giga-years that required the vertical genetic descent of Bacteria and Eukarya from an archaeal root of life.

References
1. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U.S.A. 87, 4576-4579 (1990).
2. Xue, H., Tong, K. L., Marck, C., Grosjean, H. & Wong, J. T. Transfer RNA paralogs: evidence for genetic code-amino acid biosynthesis coevolution and an archaeal root of life. Gene 310, 59-66 (2003).
3. Tong, K. L. & Wong, J. T. Anticodon and wobble evolution. Gene 333, 169-177 (2004).
4. Wong, J. T. F. Coevolution theory of the genetic code at age thirty. BioEssays 27, 416-425 (2005).
5. Wong, J. T., Chen, J., Mat, W. K., Ng, S. K. & Xue, H. Polyphasic evidence delineating the root of life and roots of biological domains. Gene 403, 39-52 (2007).
6. Yu, Z., Takai, K., Slesarev, A., Xue, H. & Wong, J. T. Search for primitive Methanopyrus based on genetic distance between Val- and Ile-tRNA synthetases. J. Mol. Evol. 69, 386-394 (2009).
7. Wong, J. T., Ng, S. K., Mat, W. K., Hu, T. & Xue, H. Coevolution theory of the genetic code at age forty: pathway to translation and synthetic life. Life (Basel) 6, 12; 10.3390/life6010012 (2016).
8. Kelly, S., Wickstead, B. & Gull, K. Archaeal phylogenomics provides evidence in support of a methanogenic origin of the Archaea and a thaumarchaeal origin for the eukaryotes. Proc. Royal Soc. B 278, 1009-1018 (2011).
9. Williams, T. A. et al. Integrative modeling of gene and genome evolution roots the archaeal tree of life. Proc. Natl. Acad. Sci. U.S.A. 114, E4602-E4611 (2017).
10. Blank, C. E. Low rates of lateral gene transfer among metabolic genes define the evolving biogeochemical niches of archaea through deep time. Archaea 2012, 843539; 10.1155/2012/843539 (2012).
11. Doolittle, W. F. Phylogenetic classification and the universal tree. Science 284, 2124-2128 (1999).
12. Cavalier-Smith, T. Rooting the tree of life by transition analyses. Biol. Direct 1, 19; 10.1186/1745-6150-1-19 (2006).
13. Lake, J. A., Skophammer, R. G., Herbold, C. W. & Servin, J. A. Genome beginnings: rooting the tree of life. Philos. Trans. Royal Soc. B 364, 2177-2185 (2009).
14. Harish, A., Tunlid, A. & Kurland, C. G. Rooted phylogeny of the three superkingdoms. Biochimie 95, 1593-1604 (2013).
15. Forterre, P. The universal tree of life: an update. Front. Microbiol. 6, 717; 10.3389/fmicb.2015.00717 (2015).
16. Schwartz, R. M. & Dayhoff, M. O. Origins of prokaryotes, eukaryotes, mitochondria, and chloroplasts. Science 199, 395-403 (1978).
17. Raymond, J. & Segre, D. The effect of oxygen on biochemical networks and the evolution of complex life. Science 311, 1764-1767 (2006).
18. Lawrence, J. G. & Ochman, H. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. U.S.A. 95, 9413-9417 (1998).
19. Andersson, S. G. E. et al. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396, 133-140 (1998).
20. Esser, C. et al. A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes. Mol. Biol. Evol. 21, 1643-1660 (2004).
21. Fitzpatrick, D. A., Creevey, C. J. & McInerney, J. O. Genome phylogenies indicate a meaningful alpha-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales. Mol. Biol. Evol. 23, 74-85 (2006).
22. Searcy, D. G., Stein, D. B. & Searcy, K. B. A mycoplasma-like archaebacterium possibly related to the nucleus and cytoplasm of eukaryotic cells. Ann. N. Y. Acad. Sci. 361, 312-324 (1981).
23. Margulis, L., Dolan, M. F. & Guerrero, R. The chimeric eukaryote: origin of the nucleus from the karyomastigont in amitochondriate protists. Proc. Natl. Acad. Sci. U.S.A. 97, 6954-6959 (2000).
24. Pisani, D., Cotton, J. A. & McInerney, J. O. Supertrees disentangle the chimerical origin of eukaryotic genomes. Mol. Biol. Evol. 24, 1752-1760 (2007).
25. Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173-179 (2015).
26. Eme, L., Spang, A., Lombard, J., Stairs, C. W. & Ettema, T. J. G. Archaea and the origin of eukaryotes. Nat. Rev. Microbiol. 15, 711-723 (2017).
27. Hartman, H. & Fedorov, A. The origin of the eukaryotic cell: A genomic investigation. Proc. Natl. Acad. Sci. U.S.A. 99, 1420-1425 (2002).
28. Gribaldo, S., Poole, A. M., Daubin, V., Forterre, P. & Brochier-Armanet, C. The origin of eukaryotes and their relationship with the Archaea: are we at a phylogenomic impasse? Nat. Rev. Microbiol. 8, 743-752 (2010).
29. Martin, W. F., Tielens, A. G. M., Mentel, M., Garg, S. G. & Gould, S. B. The physiology of phagocytosis in the context of mitochondrial origin. Microbiol. Mol. Biol. Rev. 83, e00008-17; 10.1128/MMBR.00008-17 (2017).
30. Nasir, A., Kim, K. M. & Caetano-Anolles, G. Lokiarchaeota: eukaryote-like missing links from microbial dark matter? Trends Microbiol. 23, 448-450 (2015).
31. Yutin, N., Wolf, M. Y., Wolf, Y. I. & Koonin, E. V. The origins of phagocytosis and eukaryogenesis. Biol. Direct 4, 9; 10.1186/1745-6150-4-9 (2009).
32. Kurland, C. G., Canback, B. & Berg, O. G. Horizontal gene transfer: a critical view. Proc. Natl. Acad. Sci. U.S.A. 100, 9658-9662 (2003).
33. Shi, T. & Falkowski, P. G. Genome evolution in cyanobacteria: the stable core and the 349
variable shell. Proc. Natl. Acad. Sci. U.S.A. 105, 2510-2515 (2008). 350
34. Koumandou, V. L. et al. Molecular paleontology and complexity in the last eukaryotic 351
common ancestor. Crit. Rev. Biochem. Mol. Biol. 48, 373-396 (2013). 352
35. Pittis, A. A. & Gabaldon, T. Late acquisition of mitochondria by a host with chimaeric 353
prokaryotic ancestry. Nature 531, 101-104 (2016). 354
36. Akanni, W. A. et al. Horizontal gene flow from eubacteria to archaebacteria and what it 355
means for our understanding of eukaryogenesis. Philos. Trans. Royal Soc. B 370, 356
20140337; 10.1098/rstb.2014.0337 (2015). 357
37. Nelson-Sathi, S. et al. Acquisition of 1,000 eubacterial genes physiologically 358
transformed a methanogen at the origin of Haloarchaea. Proc. Natl. Acad. Sci. U.S.A. 359
109, 20537-20542 (2012). 360
38. Reysenbach, A. L. et al. A ubiquitous thermoacidophilic archaeon from deep-sea 361
hydrothermal vents. Nature 442, 444-447 (2006). 362
39. Subhraveti P., O. Q., Keseler I., Kothari A., Caspi R., Karp P. D. Summary of 363
Aciduliprofundum sp. MAR08-339, version 23.0. 364
https://biocyc.org/organism-summary?object=ASP673860 (2017). 365
40. Ruepp, A. et al. The genome sequence of the thermoacidophilic scavenger 366
Thermoplasma acidophilum. Nature 407, 508-513 (2000). 367
41. Doolittle, W. E. You are what you eat: a gene transfer ratchet could account for bacterial 368
genes in eukaryotic nuclear genomes. Trends Genet. 14, 307-311 (1998). 369
42. Sara, M. & Sleytr, U. B. Production and characteristics of ultrafiltration membranes with 370
uniform pores from two-dimensional arrays of proteins. J. Membr. Sci. 33, 27-49 (1987). 371
43. Reysenbach, A. L. & Flores, G. E. Electron microscopy encounters with unusual 372
thermophiles helps direct genomic analysis of Aciduliprofundum boonei. Geobiology 6,
331-336 (2008).
44. Mashburn, L. M. & Whiteley, M. Membrane vesicles traffic signals and facilitate group
activities in a prokaryote. Nature 437, 422-425 (2005).
45. Prangishvili, D. et al. Sulfolobicins, specific proteinaceous toxins produced by strains of
the extremely thermophilic archaeal genus Sulfolobus. J. Bacteriol. 182, 2985-2988
(2000).
46. Poole, A. M. & Gribaldo, S. Eukaryotic origins: how and when was the mitochondrion
acquired? Cold Spring Harb. Perspect. Biol. 6, a015990; 10.1101/cshperspect.a015990
(2014).
47. McInerney, J., Pisani, D. & O'Connell, M. J. The ring of life hypothesis for eukaryote
origins is supported by multiple kinds of data. Philos. Trans. Royal Soc. B 370,
20140323; 10.1098/rstb.2014.0323 (2015).
48. Imachi, H. et al. Isolation of an archaeon at the prokaryote-eukaryote interface.
Preprint at https://doi.org/10.1101/726976 (2019).
49. Lambert, J. Scientists glimpse oddball microbe that could explain rise of complex life.
Nature 572, 294 (2019).
50. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence
alignments using Clustal Omega. Mol. Syst. Biol. 7, 539; 10.1038/msb.2011.75 (2011).
Supplementary Information is available online.

Author Contribution
J.T.W. and H.X. conceived the study; X.L. collected the data and performed computational analysis;
and J.T.W., H.X. and X.L. wrote the paper. All authors read and approved the final manuscript.

Acknowledgments
This study was supported by University Grants Council of Hong Kong SAR (ITS/113/15FP),
and X.L. was the recipient of a Hong Kong Government Ph.D. Fellowship.

Author information
The authors declare that they have no competing interests. Correspondence and material
requests should be addressed to J.T.W. ([email protected]).

Data availability
All data supporting the findings of this study are available within the paper and its
Supplementary Information files.

Methods
Source of data and materials
The protein and SSU rRNA sequences analyzed in the present study were retrieved from the
NCBI GenBank release 231 (ftp://ftp.ncbi.nlm.nih.gov/genomes/)51,52. For species without
available SSU rRNA information in NCBI, quality-checked SSU rRNA sequences were
downloaded from the SILVA database release 132 (https://www.arb-silva.de/)53. For species
with multiple SSU rRNA sequences, the one yielding the highest total bitscore (using
BLASTN54 with the ‘-word_size’ flag set to 4) with SSU rRNAs of other species from the same
domain was employed for analysis. The alignment of the 83 SSU rRNA sequences used to
build the SSU rRNA tree is given in Supplementary File S1. Eukaryotic mitochondrial
gene-encoded protein sequences were retrieved from the NCBI Protein database
(https://www.ncbi.nlm.nih.gov/protein) by searching for the species name of the eukaryote of
interest and setting the ‘Genetic compartments’ filter to ‘Mitochondrion’.
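The rule for choosing one SSU rRNA sequence per species can be sketched in Python as follows. This is an illustration only, not the pipeline used in the study. It assumes the BLASTN hits have already been parsed into (query, subject, bitscore) tuples and restricted to one domain. The function name pick_representative_copy and the seq_to_species mapping are hypothetical. Counting only the best hit per other species toward the total is also an assumption, since the text does not specify how multiple hits are summed.

```python
from collections import defaultdict


def pick_representative_copy(hits, seq_to_species, species):
    """Return the SSU rRNA copy of `species` with the highest total bitscore.

    hits: iterable of (query_id, subject_id, bitscore) tuples, assumed to be
          parsed from BLASTN output and limited to a single domain.
    seq_to_species: dict mapping each sequence id to its species name.
    Returns None when no copy of `species` has a hit to another species.
    """
    # best[query][other_species] = highest bitscore of that copy against that species
    best = defaultdict(dict)
    for query, subject, bitscore in hits:
        if seq_to_species.get(query) != species:
            continue
        other = seq_to_species.get(subject)
        if other is None or other == species:
            continue  # skip self-hits and hits between copies of the same species
        if bitscore > best[query].get(other, 0.0):
            best[query][other] = bitscore
    if not best:
        return None
    # Assumption: total bitscore = sum of the best hit against each other species.
    totals = {q: sum(per_species.values()) for q, per_species in best.items()}
    # Ties are broken by sequence id so that the result is deterministic.
    return max(sorted(totals), key=totals.get)
```

Called once per species that has several SSU rRNA sequences, the function returns the identifier of the copy to carry forward into the alignment.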

Estimation of nuclear or mitochondrial proteome homology
When the proteomes of two species were compared using BLASTP54 (with the ‘-evalue’ flag
set to 0.05), only query and subject sequences that were each other's unique best match, viz.
query n has the highest bitscore with subject m, and subject m has the highest bitscore with
query n at the same time, were considered in calculating proteome similarity.
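The reciprocal best-match criterion described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code. The hits are assumed to be (query, subject, bitscore) tuples already filtered at the E-value threshold. The function names are hypothetical, and summing the forward-direction bitscores to obtain a proteome similarity is an assumption.

```python
def unique_best(hits):
    """Map each query to (subject, bitscore) when it has a single top-scoring subject."""
    best = {}  # query -> (subject, bitscore, is_unique)
    for query, subject, score in hits:
        current = best.get(query)
        if current is None or score > current[1]:
            best[query] = (subject, score, True)
        elif score == current[1] and subject != current[0]:
            # A different subject ties for the top score: the best match is not unique.
            best[query] = (current[0], current[1], False)
    return {q: (s, sc) for q, (s, sc, unique) in best.items() if unique}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (query, subject, bitscore) pairs that are each other's unique best match."""
    forward = unique_best(a_vs_b)
    reverse = unique_best(b_vs_a)
    return [
        (q, s, score)
        for q, (s, score) in forward.items()
        if s in reverse and reverse[s][0] == q
    ]


def proteome_similarity(pairs):
    """Return (total bitscore, number of hits, average bitscore per hit)."""
    total = sum(score for _, _, score in pairs)
    count = len(pairs)
    return total, count, (total / count if count else 0.0)
```

The three values returned by proteome_similarity correspond to the kinds of quantities plotted in Fig. 5a and 5b: a total bitscore, a number of homologous hits, and an average bitscore per hit.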

Estimation of protein family homology
Protein sequence similarity was estimated by the maximum bitscore of each pair of sequences
yielded by BLASTP54 with all default parameters. To estimate ribosomal protein similarities
among eukaryotes and prokaryotes, all seed sequences of 80 ribosomal protein families
(Supplementary Table S6) retrieved from the Pfam database55 were queried with BLASTP (with
the ‘-evalue’ flag set to 0.05) against the proteomes of 21 eukaryotes, 46 archaea and 36
bacteria, respectively. For every ribosomal protein family, only the one protein sequence of
each species yielding the highest bitscore with the seed sequences was selected for further
analysis. To remove false-positive sequences, the selected sequences were submitted to the
NCBI Batch CD-search Tool56 to search against the Pfam database, and sequences that failed
to map to the same ribosomal protein family were removed. Finally, the eukaryotic ribosomal
protein sequences were queried with BLASTP against all the prokaryotic ribosomal protein
sequences, and the maximum bitscores
are shown in Fig. 4a.
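The step of keeping one top-scoring sequence per species for each ribosomal protein family can be sketched as follows. This is a simplified Python illustration, not the authors' implementation. It assumes the seed-versus-proteome hits are available as (family, species, protein_id, bitscore) tuples, and the function name is hypothetical. The CD-search validation step is not reproduced here.

```python
def best_per_family_and_species(hits):
    """Keep, for each (family, species) pair, the protein with the highest bitscore.

    hits: iterable of (family, species, protein_id, bitscore) tuples, assumed to
          come from BLASTP searches of Pfam seed sequences against each proteome.
    Returns a dict mapping (family, species) to (protein_id, bitscore).
    """
    best = {}
    for family, species, protein_id, bitscore in hits:
        key = (family, species)
        # The first hit seen is kept when two hits have the same bitscore.
        if key not in best or bitscore > best[key][1]:
            best[key] = (protein_id, bitscore)
    return best
```

The selected sequences would then be checked against Pfam before the final eukaryote-versus-prokaryote comparison.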

To determine non-rProt families with sequence homology between Gla and prokaryotes, each
Gla protein sequence was queried with BLASTP against each of the 82 prokaryotic
proteomes, and the best matches were submitted to the NCBI Batch CD-search Tool to search
against the Pfam database. Only cases where both query and subject sequences belonged to
the same protein family were kept; 162 of these cases are shown in Supplementary
Table S2, and in part in Fig. 4c.
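The same-family filter applied to the best matches can be sketched as follows. This is a minimal Python illustration under stated assumptions: the family assignments are assumed to have been exported from a domain search into a dict mapping each protein id to a set of family names, and the function name and data layout are hypothetical.

```python
def same_family_pairs(best_matches, families):
    """Keep query-subject pairs whose sequences share at least one protein family.

    best_matches: iterable of (query_id, subject_id, bitscore) tuples.
    families: dict mapping a protein id to the set of families assigned to it.
    Returns a list of (query_id, subject_id, bitscore, shared_families).
    """
    kept = []
    for query, subject, bitscore in best_matches:
        shared = families.get(query, set()) & families.get(subject, set())
        if shared:  # drop pairs with no family in common or no annotation
            kept.append((query, subject, bitscore, sorted(shared)))
    return kept
```

Pairs in which either sequence lacks a family assignment are dropped, which is the conservative reading of the criterion in the text.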

51. Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank.
Nucleic Acids Res. 44, D67-72 (2016).
52. O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status,
taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733-745
(2016).
53. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data
processing and web-based tools. Nucleic Acids Res. 41, D590-596 (2013).
54. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421;
10.1186/1471-2105-10-421 (2009).
55. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47,
D427-D432 (2019).
56. Marchler-Bauer, A. & Bryant, S. H. CD-Search: protein domain annotations on the fly.
Nucleic Acids Res. 32, W327-331 (2004).
Table 1. Partial list of species analyzed.

Abbr.  Species name

Archaea
Abo    Aciduliprofundum boonei
Acf    Aciduliprofundum sp. MAR08-339
Aen    C. Aenigmarchaeota archaeon
Afu    Archaeoglobus fulgidus
Aia    Acidilobus sp. 7A
Alt    C. Altiarchaeales archaeon
Ape    Aeropyrum pernix
Bat    C. Bathyarchaeota archaeon
Csu    C. Caldiarchaeum subterraneum
Csy    Cenarchaeum symbiosum
Dia    C. Diapherotrites archaeon
Fac    Ferroplasma acidiphilum
Ffo    Fervidicoccus fontis
Hal    Halobacterium salinarum
Hei    C. Heimdallarchaeota archaeon
Hgi    Haloferax gibbonsii
Hla    Halobiforma lacisalsi
Kcr    C. Korarchaeum cryptofilum
Lok    C. Lokiarchaeota archaeon
Mac    Methanosarcina acetivorans
Man    C. Mancarchaeum acidiphilum
Mar    C. Marsarchaeota G2 archaeon
Mbo    Methanoregula boonei
Mco    Methanocella conradii
Met    C. Methanosuratus sp.
Mfe    Methanothermus fervidus
Mic    C. Micrarchaeota archaeon
Min    C. Methanomassiliicoccus intestinalis
Mja    Methanocaldococcus jannaschii
Mka    Methanopyrus kandleri
Mlt    C. Methanoliparum thermophilum
Mnt    Methanonatronarchaeum thermophilum
Mph    Methanophagales archaeon
Mte    C. Methanoplasma termitum
Nca    C. Nitrosocaldus cavascurensis
Nga    C. Nitrososphaera gargensis
Nko    C. Nitrosopumilus koreensis
Nst    C. Nanobsidianus stetteri
Odi    C. Odinarchaeota archaeon
Pae    Pyrobaculum aerophilum
Pfu    Pyrococcus furiosus
Sso    Saccharolobus solfataricus
Tac    Thermoplasma acidophilum
Tho    C. Thorarchaeota archaeon
Tvo    Thermoplasma volcanium
Woa    C. Woesearchaeota archaeon

Bacteria
Aae    Aquifex aeolicus
Atu    Agrobacterium tumefaciens
Bap    Buchnera aphidicola
Bja    Bradyrhizobium japonicum
Blo    Bifidobacterium longum
Bsu    Bacillus subtilis
Cex    Caldisericum exile
Cje    Campylobacter jejuni
Cpo    Cloacibacillus porcorum
Ctr    Chlamydia trachomatis
Cvi    Caulobacter vibrioides
Cvo    Chelativorans sp. BNC1
Det    Desulfurobacterium thermolithotrophum
Dra    Deinococcus radiodurans
Dth    Dictyoglomus thermophilum
Eco    Escherichia coli
Hth    Hungateiclostridium thermocellum
Kol    Kosmotoga olearia
Mau    Mahella australiensis
Mhy    Megamonas hypermegale
Mpn    Mycoplasma pneumoniae
Mtu    Mycobacterium tuberculosis
Pel    Pelobacter sp. SFB93
Pmo    Petrotoga mobilis
Rpr    Rickettsia prowazekii
Rru    Rhodospirillum rubrum
Rso    Ralstonia solanacearum
Spn    Streptococcus pneumoniae
Ssp    Sporanaerobacter sp. NJN-17
Syn    Synechocystis sp. PCC 6803
Tht    Thermobaculum terrenum
Tis    Tistrella mobilis
Tma    Thermotoga maritima
Tpa    Treponema pallidum
Tte    Caldanaerobacter subterraneus
Xca    Xanthomonas campestris

Eukarya
Aca    Acanthamoeba castellanii
Bbo    Babesia bovis
Bho    Blastocystis hominis
Bpr    Bathycoccus prasinos
Cel    Caenorhabditis elegans
Cme    Cyanidioschyzon merolae
Dme    Drosophila melanogaster
Dre    Danio rerio
Esi    Ectocarpus siliculosus
Gla    Giardia intestinalis
Hsa    Homo sapiens
Lma    Leishmania major
Pfa    Plasmodium falciparum
Pma    Perkinsus marinus
Sce    Saccharomyces cerevisiae
Spa    Saprolegnia parasitica
Spo    Schizosaccharomyces pombe
Sra    Strongyloides ratti
Tps    Thalassiosira pseudonana
Ttr    Thecamonas trahens
Trv    Trichomonas vaginalis

Note: C. in front of a species name stands for Candidatus. Detailed species information is
given in Supplementary Table S1.

Figure 1. Ranking of bitscores for the VARS-IARS homology of species in
descending order (from left to right). Bitscores were obtained using BLASTP for a
selection of archaeal (red), bacterial (green) and eukaryotic (blue) species. The complete list
of 1,185 archaeal, 3,621 bacterial and 592 eukaryotic species from NCBI GenBank release
231 and their bitscores are given in Supplementary Table S1. The full names and three-letter
abbreviations for a number of the species are given in Table 1.

Figure 2. The rooted universal SSU rRNA tree. The tree was constructed using the
Fitch-Margoliash method from the alignment of SSU rRNAs from 33 archaeal, 31 bacterial,
and 19 eukaryotic species. Pairwise distances between the aligned sequences were estimated
using DNADIST in PHYLIP with the Jukes-Cantor model. The thermal scale in (a) expresses
the bitscore for the VARS-IARS pair of each species (Supplementary Table S1), and that in
(b) expresses the genetic distances of VARS (squares) or IARS (triangles) between Gla and
other organisms on the tree.
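For orientation, the Jukes-Cantor correction converts the observed proportion p of differing sites between two aligned sequences into a distance d = -(3/4) ln(1 - 4p/3). The Python sketch below illustrates that formula only. It is not a reimplementation of DNADIST, whose handling of gaps and ambiguous bases is more elaborate, and the function name and the choice to skip any column containing a non-ACGTU character are assumptions.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Columns in which either sequence has a gap or a non-ACGTU character are
    skipped (a simplifying assumption). Returns math.inf when the observed
    difference reaches the 0.75 saturation limit of the model.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    bases = set("ACGTU")
    columns = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in bases and b in bases
    ]
    if not columns:
        raise ValueError("no comparable columns")
    # p: observed proportion of differing sites among the comparable columns
    p = sum(a != b for a, b in columns) / len(columns)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A matrix of such pairwise distances is the kind of input a distance-based method such as Fitch-Margoliash works from.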

Figure 3. Segments of the aligned VARS and IARS sequences of Mka, Mau and Esi. The
six sequences were aligned using Clustal Omega50. Tick marks indicate the positions of the
sequence segments on the complete alignment (Supplementary Fig. S1). Similar amino acids
are colored in orange, and those ≥ 50% conserved in blue. Asterisks indicate the six positions
displaying conservation of the same V or L amino acid in all six sequences.

Figure 4. Protein sequence homology between eukaryotic and prokaryotic species. (a)
BLASTP bitscores between Gla rProts and prokaryotic rProts. (b) Bitscores between Gla
pyruvate phosphate dikinase and prokaryotic proteins. Among all Gla proteins, this protein
elicited the highest total BLASTP bitscore from all the prokaryotes. (c) Bitscores displayed
by some of the 162 Gla proteins from Supplementary Table S2 towards prokaryotes. The
color coding and order of different prokaryotic species on the x-axis in (b) and (c) are the
same as those in (a). Bitscores with E-value < 0.05 are shown, except for asterisked entries in
(c) where E-value < 0.1.

Figure 5. Interspecies protein homologies. (a) Total homology bitscores of Gla and Trv
proteomes towards prokaryotic proteomes. (b) Relationships between the number of
homologous hits (x-axis) and average bitscore per hit (y-axis) between prokaryotic and Gla
proteomes. (c) Homologous hits between archaeal proteomes (y-axis) and bacterial
proteomes (x-axis) with (left) or without (right) normalization based on the number of
proteins in the archaeon. The complete heat map is given in Supplementary Fig. S4. (d) Bitscores
of homologous hits of the mitochondrial gene-encoded proteins of two eukaryotic species
towards prokaryotes. Total bitscores in each instance are shown in parentheses.