
Title:

Descent of Bacteria and Eukarya from an archaeal root of life

Authors:

Xi Long, Hong Xue and J. Tze-Fei Wong*

Affiliation:

Division of Life Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China

*Corresponding author:

Email: [email protected]

bioRxiv preprint doi: https://doi.org/10.1101/745372; this version posted August 23, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

The three biological domains delineated based on SSU rRNAs are confronted by uncertainties regarding the relationship between Archaea and Bacteria, and the origin of Eukarya. Herein the homologies between the paralogous valyl-tRNA and isoleucyl-tRNA synthetases in a wide spectrum of species revealed vertical gene transmission from an archaeal root of life through a Primitive Archaea Cluster to an Ancestral Bacteria Cluster of species. The higher homologies of the ribosomal proteins (rProts) of eukaryotic Giardia toward archaeal relative to bacterial rProts established that an archaeal-parent rather than a bacterial-parent underwent genome merger with an alphaproteobacterium to generate Eukarya. Moreover, based on the top-ranked homology of the proteins of Aciduliprofundum among archaea toward the Giardia and Trichomonas proteomes and the pyruvate phosphate dikinase of Giardia, together with their active acquisition of exogenous bacterial genes plausibly through foodchain gene adoption, the Aciduliprofundum archaea were identified as leading candidates for the archaeal-parent of Eukarya.

Molecular evolution analysis of small subunit ribosomal RNA (SSU rRNA) yielded a universal but unrooted tree of life (ToL) that comprises the three biological domains of Archaea, Bacteria and Eukarya [1]. A ToL of transfer RNAs based on the genetic distances between the 20 classes of tRNA acceptors for different amino acids located its root near the hyperthermophilic archaeal methanogen Methanopyrus (Mka) [2]. Although this rooting is supported by a wide range of evidence [3-9], and by the age of ~2.7 Gya for the Methanopyrus lineage as the oldest among living organisms [10], the phylogenies of the three biological domains are beset by two fundamental problems, viz. the uncertain relationship between Archaea and Bacteria, and the identity of the prokaryotic-parent that underwent genome merger with an alphaproteobacterium to give rise to Eukarya. As long as these two problems remain unresolved, the nature of the root of life is open to diverse formulations [11-15]. Accordingly, the objective of the present study is to examine the pathways of descent of Bacteria and Eukarya from an archaeal root of life, and the nature of the archaeal-parent of Eukarya.

The antiquity of proteins could be assessed based on the increasing divergence of paralogous proteins in time [16]: the more similarity a pair of paralogues retains within a species, the less that species has diverged from the ancestral state. Applying this approach, BLASTP was performed between the intraspecies valyl-tRNA synthetase (VARS) and isoleucyl-tRNA synthetase (IARS) in the genomic sequences for 5,398 species in NCBI GenBank. Arrangement of the BLASTP bitscores obtained in descending order (Supplementary Table S1 and partly in Fig. 1) showed that the 119 highest scoring species were all archaeons, topped by Mka and including Mfe, Afu, Mnt and Mja with bitscores of 473, 436, 387, 387 and 387 respectively. The top scoring bacterium was the Clostridium Mau with a bitscore of 378, and the top scoring eukaryote was the filamentous brown alga Esi with a bitscore of 240. These results established the foremost antiquity of Mka among extant organisms.
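The ranking step described above can be illustrated with a short sketch. It uses only the seven bitscores quoted in this paragraph, not the full 5,398-species table, and the record layout and function names (`rank_by_bitscore`, `top_per_domain`) are assumptions made for illustration rather than the authors' pipeline.

```python
# Intraspecies VARS-IARS BLASTP bitscores quoted in the text.
# The (species, domain, bitscore) layout is an illustrative assumption.
BITSCORES = [
    ("Mka", "Archaea", 473),
    ("Mfe", "Archaea", 436),
    ("Afu", "Archaea", 387),
    ("Mnt", "Archaea", 387),
    ("Mja", "Archaea", 387),
    ("Mau", "Bacteria", 378),
    ("Esi", "Eukarya", 240),
]


def rank_by_bitscore(records):
    """Return the records sorted by descending bitscore (ties keep input order)."""
    return sorted(records, key=lambda rec: rec[2], reverse=True)


def top_per_domain(records):
    """Return {domain: (species, bitscore)} for the best scorer in each domain."""
    best = {}
    for species, domain, score in rank_by_bitscore(records):
        # The first record seen for a domain is its highest scorer.
        best.setdefault(domain, (species, score))
    return best


ranked = rank_by_bitscore(BITSCORES)
best = top_per_domain(BITSCORES)
print(ranked[0])
print(best)
```

With the complete Supplementary Table S1 the same two functions would apply unchanged; only the input list would grow.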


The positions of some of the species analyzed in Fig. 1 were indicated on the SSU rRNA tree, with their intraspecies VARS-IARS bitscores expressed in circles colored according to the thermal scale (Fig. 2a). There was a concentration of euryarchaeons with high VARS-IARS homology in a ‘Primitive Archaea Cluster’ spread between Hal and Mfe. In the Bacteria domain, there was likewise an ‘Ancestral Bacteria Cluster’ with high VARS-IARS homology spread between Det and Mau. The deepest branching species in the Bacteria domain were two members of the Aquificae phylum, viz. the anaerobic Det with high VARS-IARS homology, and the microaerobic, low-homology Aae. Since mutations could cause loss of homology more easily than gain of homology, this suggests that Aae has evolved far from the ancestral Aquificae species, possibly as part of the wave of tumultuous changes undergone by former anaerobes in response to the appearance of atmospheric oxygen [17], thereby sustaining extensive evolutionary erosion of its VARS-IARS homology. The enhanced resistance of paralogue homology to perturbation by horizontal gene transfer (HGT), due to the difficulty of transferring a pair of proteins compared to a single protein, was illustrated by the preservation of low VARS-IARS bitscores in the proteobacterial region of the tree against any shift toward elevated VARS-IARS homology on account of HGT events, despite the high HGT-susceptibility of, for example, Eco, which acquired 18% of its genes through HGT subsequent to its departure from Salmonella enterica about 100 million years ago [18]. Previously, based on the intraspecies alloacceptor tRNA-distances of various species on the tRNA tree, LUCA was positioned between the branches leading to Mka and Ape at a distance ratio of 1.00 from the Mka branch versus 1.14 from the Ape branch [2], and this position was adopted in the SSU rRNA tree in Fig. 2.

Given the relative paucity of HGT effects on VARS-IARS homology in a majority if not all of the species on the SSU rRNA tree, the parallel prominence of the high VARS-IARS homology species in the Primitive Archaea Cluster and the Ancestral Bacteria Cluster was explicable by vertical genetic transmission of the VARS and IARS genes from an Mka-proximal root of life to the archaeal cluster, and in turn to the bacterial cluster. Since the top ranked bacterial bitscore of Mau at 378 was between that of Mac at 382 and Pfu at 369, the results suggest that the Ancestral Bacteria Cluster branched off from the Primitive Archaea Cluster close to the Mka-proximal root of Archaea. The medium VARS-IARS bitscores of Esi, Tps, Bpr and Cme among the Eukarya (Fig. 2a) were indicative of the extension of the intraspecies VARS-IARS homology into this domain. The much higher VARS homologies (colored squares) and IARS homologies (colored triangles) between bacterial species and the eukaryote Gla compared to those between archaeal species and Gla indicated that Eukarya received the VARS and IARS genes from the Bacteria instead of the Archaea domain (Fig. 2b).
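The placement of the top bacterial bitscore within the archaeal ranking can be checked mechanically. The sketch below is only an illustration: it is restricted to the archaeal bitscores quoted in the text (the full ranking is in Supplementary Table S1), and the helper name `flanking_species` is hypothetical.

```python
from bisect import bisect_left

# Archaeal intraspecies VARS-IARS bitscores quoted in the text.
ARCHAEAL = {"Mka": 473, "Mfe": 436, "Afu": 387, "Mnt": 387,
            "Mja": 387, "Mac": 382, "Pfu": 369}


def flanking_species(scores, query):
    """Return (species scoring at or just above query, species scoring just below).

    Either element is None when the query falls outside the listed range.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    values = [score for _, score in ordered]
    i = bisect_left(values, query)  # index of first score >= query
    below = ordered[i - 1][0] if i > 0 else None
    above = ordered[i][0] if i < len(ordered) else None
    return above, below


above, below = flanking_species(ARCHAEAL, 378)  # 378 is the Mau bitscore
print(above, below)
```

Among the species listed here, the Mau score of 378 falls between Mac (382) and Pfu (369), as stated in the text.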

The aligned segments of VARS and IARS from the archaeon Mka, the bacterium Mau and the eukaryote Esi in Fig. 3 were portions of the six complete sequences (Supplementary Fig. S1). These segments showed 42/207 columns where all six sequences carried the same amino acid, thereby providing strong evidence for the vertical transmission of VARS and IARS genes from Archaea to Bacteria and Eukarya.
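Counting fully conserved columns in a multiple alignment is a simple operation, sketched below. The six toy sequences are invented placeholders, not the Fig. 3 sequences, and the function name `conserved_columns` and the choice to exclude all-gap columns are assumptions made for illustration.

```python
def conserved_columns(aligned):
    """Return indices of columns where every sequence carries the same residue.

    Columns whose shared character is the gap symbol '-' are not counted.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i in range(length)
        if aligned[0][i] != "-" and all(seq[i] == aligned[0][i] for seq in aligned)
    ]


# Toy alignment of six made-up sequences (NOT the VARS/IARS data of Fig. 3).
toy_alignment = [
    "MKV-LGA",
    "MRVTLGS",
    "MKVSLGA",
    "MHV-LGT",
    "MKVALGA",
    "MQV-LGC",
]

cols = conserved_columns(toy_alignment)
print(cols, len(cols) / len(toy_alignment[0]))

# Fraction of conserved columns reported in the text: 42 of 207.
reported_fraction = 42 / 207
print(round(reported_fraction, 3))
```

The reported 42/207 corresponds to roughly one fifth of the aligned columns being identical across all six sequences.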

It has been suggested that an endosymbiotic event between an archaeal-parent and an alphaproteobacterium acting as mitochondrion-parent led to the formation of the Last Eukaryotic Common Ancestor (LECA) and ushered in the Eukarya domain [19-21]. Proposals regarding the identity of the archaeal-parent have focused on a range of prokaryotes including Thermoplasmatales, where the lack of a rigid cell wall could facilitate engulfment of the mitochondrion-parent to bring about endosymbiosis [22-24]; and various archaeons, especially the Asgard archaeons Lok and Tho [25,26], that are enriched in eukaryotic signature proteins (ESPs) [27]. There is no consensus regarding a choice between these two groups of organisms [28].

Upon BLASTP comparisons of the fifty-six ribosomal proteins (rProts) of Gla, the lowest branching eukaryote on the SSU rRNA tree, with corresponding prokaryotic rProts, fifty-three of them yielded higher bitscores with archaeons relative to bacteria (Fig. 4a), indicating that eukaryogenesis began with an archaeal-parent instead of a bacterial-parent. S21e and L36e yielded no bitscore with any archaeal or bacterial rProt, suggesting that they were derived from a prokaryote not surveyed in the present study, altered beyond recognition by BLASTP, or invented by the eukaryogenesis system. The S7e of eleven eukaryotes from Table 1, although not that of Gla, showed detectable homology toward the archaeon Alt. The Sce, Esi and Hsa rProts differed from those of Gla in two aspects: about one-sixth of the rProts that showed higher homology toward archaea relative to bacteria in Gla switched to lower homology toward archaea relative to bacteria in Sce, Esi and Hsa; and there were also some additional rProts in Sce, Esi and Hsa, mainly bacteria-derived ones, not found in Gla (Supplementary Fig. S2). These findings pointed to a significant influx of bacterial rProts into Sce, Esi and Hsa after their divergence from Gla, resulting in the replacement of some of the archaea-derived rProts found in Gla by bacteria-derived ones. Acf, Abo and Mac displayed the highest BLASTP bitscores among archaeons toward Gla pertaining to pyruvate phosphate dikinase (EC 2.7.9.1), which would be consistent with a possible role of these three archaeons as archaeal-parent. Interestingly, the bitscores were high for Hei and Tho but low for Odi and Lok among the Asgard archaea, and high for Tac and Tvo but low for Fac, Min and Mte among the Thermoplasmatales (Fig. 4b).
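The archaeal-versus-bacterial comparison of rProt bitscores amounts to a per-protein tally, sketched below. All bitscores and the names `rProt_A` to `rProt_C` are made-up placeholders; only the absence of hits for S21e and L36e is taken from the text. The function name `classify_rprots` and the input layout are likewise assumptions.

```python
def classify_rprots(best_hits):
    """Tally rProts by which prokaryotic domain gives the higher best bitscore.

    best_hits maps an rProt name to a pair
    (best archaeal bitscore or None, best bacterial bitscore or None).
    """
    tally = {"archaea": [], "bacteria": [], "tie": [], "no_hit": []}
    for name, (arch, bact) in best_hits.items():
        if arch is None and bact is None:
            tally["no_hit"].append(name)
            continue
        # A missing hit in one domain is treated as a bitscore of zero.
        a = 0.0 if arch is None else arch
        b = 0.0 if bact is None else bact
        if a > b:
            tally["archaea"].append(name)
        elif b > a:
            tally["bacteria"].append(name)
        else:
            tally["tie"].append(name)
    return tally


# Hypothetical values for illustration; not data from Fig. 4a.
toy_hits = {
    "rProt_A": (150.0, 60.0),
    "rProt_B": (210.0, 95.0),
    "rProt_C": (120.0, 130.0),
    "S21e": (None, None),  # no archaeal or bacterial hit, as stated in the text
    "L36e": (None, None),  # no archaeal or bacterial hit, as stated in the text
}

tally = classify_rprots(toy_hits)
print({group: len(names) for group, names in tally.items()})
```

Applied to the real table, the "archaea" group would hold the fifty-three Gla rProts reported in the text.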


Figure 4c shows the prokaryotic distribution of homologues of some of the 162 Gla proteins that were ESPs or homologous toward a limited number of prokaryotes (Supplementary Table S2). Tho, Odi, Xca and Lok, the four prokaryotes endowed with the largest numbers of the Gla-homologous proteins, harbored only 26, 19, 17 and 16 of these Gla-homologous proteins respectively (Supplementary Table S3), and the four Asgard archaeons also did not fully share their Gla-like proteins with one another via HGT, thus underlining the difficulty for any one archaeon or bacterium to accumulate a sufficient number of eukaryote-type proteins to launch the Eukarya domain relying only on its own inventiveness and HGT. On the other hand, one or more potential prokaryotic sources were found for each of the Gla-homologous proteins despite the survey of only a small spectrum of prokaryotes, indicating that the obstacle to eukaryogenesis posed by gene deficiency might be overcome if some dependable mechanism were available to assemble the requisite genes from a broad spectrum of prokaryotes. Addressing the inadequacy of ESP coverage by single archaeal species [29,30], it was suggested that HGTs, or development of phagocytosis by an ESP-rich archaeon, might provide a solution [26,31]. However, the frequencies of HGTs might be a limiting factor [10,32], and rProts could be particularly resistant to HGT [33].
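The contrast between single-species coverage and pooled coverage can be made concrete with a little arithmetic. The counts below (26, 19, 17 and 16 out of 162) are from the text; the donor sets `donor_1` to `donor_3` and protein labels `p1` to `p6` are invented purely to show how a union of partial sources can cover what no single source does.

```python
TOTAL_GLA_PROTEINS = 162
# Numbers of Gla-homologous proteins per prokaryote, as quoted in the text.
COUNTS = {"Tho": 26, "Odi": 19, "Xca": 17, "Lok": 16}


def coverage_fractions(counts, total):
    """Return the fraction of the protein set found in each species."""
    return {species: n / total for species, n in counts.items()}


def union_coverage(donor_sets, universe):
    """Return the fraction of the universe covered by pooling all donor sets."""
    pooled = set().union(*donor_sets.values()) & universe
    return len(pooled) / len(universe)


fractions = coverage_fractions(COUNTS, TOTAL_GLA_PROTEINS)
print({species: round(f, 3) for species, f in fractions.items()})

# Hypothetical toy example: no single donor covers everything, the pool does.
toy_universe = {"p1", "p2", "p3", "p4", "p5", "p6"}
toy_donors = {
    "donor_1": {"p1", "p2"},
    "donor_2": {"p2", "p3", "p4"},
    "donor_3": {"p5", "p6"},
}
best_single = max(len(s & toy_universe) for s in toy_donors.values()) / len(toy_universe)
pooled = union_coverage(toy_donors, toy_universe)
print(best_single, pooled)
```

Even the best-endowed species in the text, Tho, carries only about 16% of the 162 proteins, which is the gap that a gene-assembling mechanism would have to close.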

During eukaryogenesis, the archaeal-parent might join up with the mitochondrion-parent to develop directly into LECA in a mitochondria-early scenario, or it might serve as First Eukaryotic Common Ancestor (FECA), and go through successive generations of genomic expansion prior to merger with the mitochondrion-parent to form LECA in a mitochondria-late scenario [34]. By measuring the phylogenetic distances between different components of LECA and their closest prokaryotic relatives, evidence has been obtained in support of a mitochondria-late time table, with the appearance of nucleolus preceding that of nucleus, endomembrane system and finally mitochondria [35]. Previously, the proteins of the eukaryote Sce were observed to contain a substantial variety of bacterial proteins and also some archaeal ones, and it was pointed out that the influx of bacterial genes into Sce could not be explained by a merger between the archaeal-parent and another bacterium besides the mitochondrion-parent, or by Sce uptake of bacterial genes through ingestion of bacteria as food. Instead, the mitochondrion-parent was a major source of the exogenous bacterial proteins in Sce [20]. When the eukaryotic Gla and Trv proteomes were employed as homology probes for BLASTP query against various prokaryotic proteomes, they gave rise to so many hits with both archaea and bacteria (Supplementary Table S4) that the influx of both archaeal and bacterial genes into the eukaryogenesis system had to be mediated by some highly efficient mechanism; and the similar prokaryotic-homology spectra for Gla and Trv (Fig. 5a) suggest that a majority of the prokaryotic genes in these two eukaryotic genomes entered the eukaryogenic lineage prior to the divergence between Gla and Trv. In fact, archaea have long relied on bacteria as a source of genetic diversity, and there was precedent of influx of bacterial genes being a determinant of archaeal phylogenies: a large number of bacterial genes entered into the methanogen that begot the haloarchaeal archaeons [36,37]. In view of this, an influx of beneficial prokaryotic genes into the archaeal-parent lineage likely began prior to the FECA stage and continued through LECA to the early eukaryotes, as illustrated by the entry of bacterial rProt genes into Sce, Esi and Hsa (Supplementary Fig. S2).

Among 46 archaeal proteomes analyzed, the Abo and Acf proteomes displayed the highest average homology bitscores toward the eukaryotic proteomes of both Gla and Trv (Fig. 5b, Supplementary Fig. S3 and Supplementary Table S4), which suggests that these archaeons could be candidate archaeal-parents. Once a bacterial protein entered into the FECA-eukaryote lineage, its bacterial and eukaryotic versions became segregated in location, and evolved independently. The divergence between the two versions would thus increase with time, as in the case of functional paralogues such as VARS and IARS. Accordingly, the bitscores of Ctr and Tpa proteins were high toward Gla and Trv likely because they were taken up late by the eukaryogenic lineage, whereas the bitscores of Mpn and Aae were low likely because they were taken up early (Fig. 5b and Supplementary Fig. S3). Moreover, when the 46 archaeons were compared regarding their ability to import bacterial genes into their own genomes, several archaeons with relatively large proteomes, viz. Hla, Hgi and Mac (with 3,704 to 4,469 protein genes), as well as Lok (5,378 protein genes) and Pfu (2,053 protein genes), displayed distinctive homologies toward bacteria (Fig. 5c left panel). However, when the individual archaeal bitscores were normalized with respect to the protein gene numbers of the archaeons, the normalized bitscores of the small-proteome Abo-group of seven euryarchaeons comprising Abo, Acf, Mte, Tvo and Tac (each with

they took as food [41]. To perform FGA, Abo would employ its array of thirty proteases to digest away the proteins of dead prokaryotes to prepare their naked DNA, and import it through the Beveridge bridal S-layer of its cell surface (S-layer lattices are known to house regular channels of 2-6 nm diameter [42]) for implantation into its own genome. In using proteases to purify DNA for cloning, Abo predates by eons the same usage by modern genetic engineering. The S-layer of Abo is highly flexible and can be bent to form small blebbing vesicles with sharp curvature, indicating that the bonding forces between the S-layer subunits are unusually weak or transient. These vesicles can bud off and anneal with other cells [43]. Pseudomonas aeruginosa also releases comparable membrane vesicles that contain the Pseudomonas quinolone signal for cell-cell communication and group behavior [44]; and Sulfolobus islandicus produces cell-derived S-layer coated spherical membrane vesicles of 90-180 nm diameter [45]. Importantly, such flexibility of the Abo cell surface could facilitate the formation of eukaryotic endomembrane and the prerequisite phagocytosis machinery for eukaryogenesis [31,46,47]. Furthermore, while all prokaryotic cells evolve on the basis of nucleotidyl mutations through the replacement, addition and subtraction of nucleotides, FGA gene uptake would enable the cells to evolve also on the basis of gene-content mutations through the replacement, addition and subtraction of genes, or gene clusters, expediting eukaryogenesis by orders of magnitude. In the example of Tac, it has succeeded in acquiring gene clusters from other organisms for rProts, NADH dehydrogenase, precorrin biosynthesis, flagellar proteins and a protein degradation pathway, amounting to 32% of its total open reading frames, plausibly via FGA [40]. Besides, the blebbing vesicles of Abo and Acf could mediate gene exchanges between cells engaged in eukaryogenesis, thus further facilitating the process. Therefore, based on the highest archaeal BLASTP bitscores of Abo and Acf toward Gla pertaining to pyruvate phosphate dikinase (Fig. 4b), their highest average archaeal bitscores toward Gla and Trv proteomes (Fig. 5b and Supplementary Fig. S3) and highest archaeal normalized bitscores toward bacteria (Fig. 5c), blebbing membrane vesicles, and possession of nine out of ten glycolytic enzymes needed for metabolic cooperation with mitochondrial respiration, these Aciduliprofundum archaea represent exceptionally advantaged candidates for the archaeal-parent role. They even share the scavenger life style with the deep branching eukaryote Gla, the rProts of which have remained more archaeal than those of Sce, Esi and Hsa. Between Acf and Abo, the catalase/peroxidase HP1-encoding facultatively anaerobic Acf might be more resistant to oxidative damage than Abo during merger with an alphaproteobacterium, and could hunt a wide range of ecological niches for beneficial food species. The bitscore 936 of Acf toward Gla with respect to pyruvate phosphate dikinase was also slightly higher than the 928 bitscore of Abo. The Asgard archaeons have been highly regarded as candidate archaeal-parents on the strength of their important ESP genes, which could render any one of them a valuable food species as well for the archaeal-parent. Moreover, the new cultivatable Asgard Candidatus Prometheoarchaeum syntrophicum MK-D1 [48,49] can degrade amino acids through syntrophy, and displays wisp-like membrane protrusions indicative of flexible cell surface and possible FGA activity, in keeping with the significant albeit modest bitscores between Lok and the bacterial species Bja and Tht, and between Odi and several bacterial species, in Fig. 5c left and right panels respectively.

Based on the BLASTP bitscores between the proteins of various prokaryotes and the mitochondrial gene-encoded proteins in four eukaryotes (Supplementary Table S5), the alphaproteobacteria Haematobacter, Chelativorans and Tateyamaria were closest to the lineage of the mitochondrion-parent (Fig. 5d).

In conclusion, in the present study, Methanopyrus kandleri was found to be the top-ranked organism in VARS-IARS homology among 5,398 species from all three biological domains, and therefore close to the root of life. Moreover, the high VARS-IARS homologies in the Primitive Archaea Cluster and in the Ancestral Bacteria Cluster delineated a pathway of descent of Bacteria from Archaea, in which a lineage diverged early from Archaea to form the Bacteria domain. The preeminent homology between the Gla rProts and archaeal rProts established that the prokaryote-parent of Eukarya that entered into genome merger with an alphaproteobacterial mitochondrion-parent was an archaeal-parent. The archaeal-parent was suggested to be a scavenger archaeon such as the Aciduliprofundum archaea, capable of generating a chimeric eukaryote through large scale foodchain gene adoption. Notwithstanding such elaborate phylogenetic developments, the asterisked columns in Fig. 3, where all six aligned protein sequences showed the same Val or Leu residue despite the ease with which Val, Leu and Ile can be interchanged in evolution, represented a level of protein sequence conservation across two different proteins, three biological domains, and a time span of more than two billion years that required the vertical genetic descent of Bacteria and Eukarya from an archaeal root of life.


References

1. Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U.S.A. 87, 4576-4579 (1990).

2. Xue, H., Tong, K. L., Marck, C., Grosjean, H. & Wong, J. T. Transfer RNA paralogs: evidence for genetic code-amino acid biosynthesis coevolution and an archaeal root of life. Gene 310, 59-66 (2003).

3. Tong, K. L. & Wong, J. T. Anticodon and wobble evolution. Gene 333, 169-177 (2004).

4. Wong, J. T. F. Coevolution theory of the genetic code at age thirty. BioEssays 27, 416-425 (2005).

5. Wong, J. T., Chen, J., Mat, W. K., Ng, S. K. & Xue, H. Polyphasic evidence delineating the root of life and roots of biological domains. Gene 403, 39-52 (2007).

6. Yu, Z., Takai, K., Slesarev, A., Xue, H. & Wong, J. T. Search for primitive Methanopyrus based on genetic distance between Val- and Ile-tRNA synthetases. J. Mol. Evol. 69, 386-394 (2009).

7. Wong, J. T., Ng, S. K., Mat, W. K., Hu, T. & Xue, H. Coevolution theory of the genetic code at age forty: pathway to translation and synthetic life. Life (Basel) 6, 12; 10.3390/life6010012 (2016).

8. Kelly, S., Wickstead, B. & Gull, K. Archaeal phylogenomics provides evidence in support of a methanogenic origin of the Archaea and a thaumarchaeal origin for the eukaryotes. Proc. Royal Soc. B 278, 1009-1018 (2011).

9. Williams, T. A. et al. Integrative modeling of gene and genome evolution roots the archaeal tree of life. Proc. Natl. Acad. Sci. U.S.A. 114, E4602-E4611 (2017).

10. Blank, C. E. Low rates of lateral gene transfer among metabolic genes define the evolving biogeochemical niches of archaea through deep time. Archaea 2012, 843539; 10.1155/2012/843539 (2012).

11. Doolittle, W. F. Phylogenetic classification and the universal tree. Science 284, 2124-2128 (1999).

12. Cavalier-Smith, T. Rooting the tree of life by transition analyses. Biol. Direct 1, 19; 10.1186/1745-6150-1-19 (2006).

13. Lake, J. A., Skophammer, R. G., Herbold, C. W. & Servin, J. A. Genome beginnings: rooting the tree of life. Philos. Trans. Royal Soc. B 364, 2177-2185 (2009).

14. Harish, A., Tunlid, A. & Kurland, C. G. Rooted phylogeny of the three superkingdoms. Biochimie 95, 1593-1604 (2013).

15. Forterre, P. The universal tree of life: an update. Front. Microbiol. 6, 717; 10.3389/fmicb.2015.00717 (2015).

16. Schwartz, R. M. & Dayhoff, M. O. Origins of prokaryotes, eukaryotes, mitochondria, and chloroplasts. Science 199, 395-403 (1978).

17. Raymond, J. & Segre, D. The effect of oxygen on biochemical networks and the evolution of complex life. Science 311, 1764-1767 (2006).

18. Lawrence, J. G. & Ochman, H. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. U.S.A. 95, 9413-9417 (1998).

19. Andersson, S. G. E. et al. The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396, 133-140 (1998).

20. Esser, C. et al. A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes. Mol. Biol. Evol. 21, 1643-1660 (2004).

21. Fitzpatrick, D. A., Creevey, C. J. & McInerney, J. O. Genome phylogenies indicate a meaningful alpha-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales. Mol. Biol. Evol. 23, 74-85 (2006).

  • 15

    22. Searcy, D. G., Stein, D. B. & Searcy, K. B. A mycoplasma-like archaebacterium 323

    possibly related to the nucleus and cytoplasm of eukaryotic cells. Ann. N. Y. Acad. Sci. 324

    361, 312-324 (1981). 325

    23. Margulis, L., Dolan, M. F. & Guerrero, R. The chimeric eukaryote: origin of the nucleus 326

    from the karyomastigont in amitochondriate protists. Proc. Natl. Acad. Sci. U.S.A. 97, 327

    6954-6959 (2000). 328

    24. Pisani, D., Cotton, J. A. & McInerney, J. O. Supertrees disentangle the chimerical origin 329

    of eukaryotic genomes. Mol. Biol. Evol. 24, 1752-1760 (2007). 330

    25. Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and 331

    eukaryotes. Nature 521, 173-179 (2015). 332

    26. Eme, L., Spang, A., Lombard, J., Stairs, C. W. & Ettema, T. J. G. Archaea and the origin 333

    of eukaryotes Nat. Rev. Microbiol. 15, 711-723 (2017). 334

    27. Hartman, H. & Fedorov, A. The origin of the eukaryotic cell: A genomic investigation. 335

    Proc. Natl. Acad. Sci. U.S.A. 99, 1420-1425 (2002). 336

    28. Gribaldo, S., Poole, A. M., Daubin, V., Forterre, P. & Brochier-Armanet, C. The origin 337

    of eukaryotes and their relationship with the Archaea: are we at a phylogenomic impasse? 338

    Nat. Rev. Microbiol. 8, 743-752 (2010). 339

    29. Martin, W. F., Tielens, A. G. M., Mentel, M., Garg, S. G. & Gould, S. B. The physiology of phagocytosis in the context of mitochondrial origin. Microbiol. Mol. Biol. Rev. 81, e00008-17; 10.1128/MMBR.00008-17 (2017).

    30. Nasir, A., Kim, K. M. & Caetano-Anolles, G. Lokiarchaeota: eukaryote-like missing links from microbial dark matter? Trends Microbiol. 23, 448-450 (2015).

    31. Yutin, N., Wolf, M. Y., Wolf, Y. I. & Koonin, E. V. The origins of phagocytosis and eukaryogenesis. Biol. Direct 4, 9; 10.1186/1745-6150-4-9 (2009).

    32. Kurland, C. G., Canback, B. & Berg, O. G. Horizontal gene transfer: a critical view. Proc. Natl. Acad. Sci. U.S.A. 100, 9658-9662 (2003).

    33. Shi, T. & Falkowski, P. G. Genome evolution in cyanobacteria: the stable core and the variable shell. Proc. Natl. Acad. Sci. U.S.A. 105, 2510-2515 (2008).

    34. Koumandou, V. L. et al. Molecular paleontology and complexity in the last eukaryotic common ancestor. Crit. Rev. Biochem. Mol. Biol. 48, 373-396 (2013).

    35. Pittis, A. A. & Gabaldon, T. Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature 531, 101-104 (2016).

    36. Akanni, W. A. et al. Horizontal gene flow from eubacteria to archaebacteria and what it means for our understanding of eukaryogenesis. Philos. Trans. Royal Soc. B 370, 20140337; 10.1098/rstb.2014.0337 (2015).

    37. Nelson-Sathi, S. et al. Acquisition of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc. Natl. Acad. Sci. U.S.A. 109, 20537-20542 (2012).

    38. Reysenbach, A. L. et al. A ubiquitous thermoacidophilic archaeon from deep-sea hydrothermal vents. Nature 442, 444-447 (2006).

    39. Subhraveti, P., Ong, Q., Keseler, I., Kothari, A., Caspi, R. & Karp, P. D. Summary of Aciduliprofundum sp. MAR08-339, version 23.0. https://biocyc.org/organism-summary?object=ASP673860 (2017).

    40. Ruepp, A. et al. The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature 407, 508-513 (2000).

    41. Doolittle, W. F. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet. 14, 307-311 (1998).

    42. Sara, M. & Sleytr, U. B. Production and characteristics of ultrafiltration membranes with uniform pores from two-dimensional arrays of proteins. J. Membr. Sci. 33, 27-49 (1987).

    43. Reysenbach, A. L. & Flores, G. E. Electron microscopy encounters with unusual thermophiles helps direct genomic analysis of Aciduliprofundum boonei. Geobiology 6, 331-336 (2008).

    44. Mashburn, L. M. & Whiteley, M. Membrane vesicles traffic signals and facilitate group activities in a prokaryote. Nature 437, 422-425 (2005).

    45. Prangishvili, D. et al. Sulfolobicins, specific proteinaceous toxins produced by strains of the extremely thermophilic archaeal genus Sulfolobus. J. Bacteriol. 182, 2985-2988 (2000).

    46. Poole, A. M. & Gribaldo, S. Eukaryotic origins: how and when was the mitochondrion acquired? Cold Spring Harb. Perspect. Biol. 6, a015990; 10.1101/cshperspect.a015990 (2014).

    47. McInerney, J., Pisani, D. & O'Connell, M. J. The ring of life hypothesis for eukaryote origins is supported by multiple kinds of data. Philos. Trans. Royal Soc. B 370, 20140323; 10.1098/rstb.2014.0323 (2015).

    48. Imachi, H. et al. Isolation of an archaeon at the prokaryote-eukaryote interface. Preprint at https://doi.org/10.1101/726976 (2019).

    49. Lambert, J. Scientists glimpse oddball microbe that could explain rise of complex life. Nature 572, 294 (2019).

    50. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539; 10.1038/msb.2011.75 (2011).


    Supplementary Information is available online.

    Author Contribution

    J.T.W. and H.X. conceived the study; X.L. collected the data and performed computational analysis; and J.T.W., H.X. and X.L. wrote the paper. All authors read and approved the final manuscript.

    Acknowledgments

    This study was supported by the University Grants Council of Hong Kong SAR (ITS/113/15FP), and X.L. was the recipient of a Hong Kong Government Ph.D. Fellowship.

    Author information

    The authors declare that they have no competing interests. Correspondence and material requests should be addressed to J.T.W. ([email protected]).

    Data availability

    All data supporting the findings of this study are available within the paper and its Supplementary Information files.

    Methods

    Source of data and materials

    The protein and SSU rRNA sequences analyzed in the present study were retrieved from NCBI GenBank release 231 (ftp://ftp.ncbi.nlm.nih.gov/genomes/) (refs. 51, 52). For species without available SSU rRNA information in NCBI, quality-checked SSU rRNA sequences were downloaded from the SILVA database release 132 (https://www.arb-silva.de/) (ref. 53). For species with multiple SSU rRNA sequences, the one yielding the highest total bitscore against the SSU rRNAs of other species from the same domain, as computed by BLASTN (ref. 54) with the ‘-word_size’ flag set to 4, was employed for analysis. The alignment of the 83 SSU rRNA sequences used to build the SSU rRNA tree is given in Supplementary File S1. Eukaryotic mitochondrial gene-encoded protein sequences were retrieved from the NCBI Protein database (https://www.ncbi.nlm.nih.gov/protein) by searching for the species name of the eukaryote of interest and setting the ‘Genetic compartments’ filter to ‘Mitochondrion’.
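The choice of a representative SSU rRNA for a species with several copies can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' script: BLASTN tabular results are assumed to be already parsed into (query, subject, bitscore) tuples restricted to one domain, `species_of` is a hypothetical mapping from sequence identifier to species, and counting only the best bitscore per query-subject pair is a choice made here.

```python
from collections import defaultdict


def pick_representative_ssu(hits, species_of):
    """Return {species: sequence_id} with the highest total bitscore.

    hits: iterable of (query_id, subject_id, bitscore) from BLASTN.
    species_of: dict mapping each sequence id to its species name.
    """
    # Keep the best bitscore per (query, subject) pair so that several
    # alignments of the same pair are not counted more than once.
    best_pair = {}
    for query, subject, bitscore in hits:
        if species_of[query] == species_of[subject]:
            continue  # only hits against other species count
        key = (query, subject)
        if bitscore > best_pair.get(key, 0.0):
            best_pair[key] = bitscore

    # Total bitscore of every candidate sequence.
    totals = defaultdict(float)
    for (query, _subject), bitscore in best_pair.items():
        totals[query] += bitscore

    # Highest-scoring candidate per species (first one wins on ties).
    representative = {}
    for seq_id, total in totals.items():
        species = species_of[seq_id]
        current = representative.get(species)
        if current is None or total > totals[current]:
            representative[species] = seq_id
    return representative
```

A sequence with no hits against other species never enters the totals, so a species whose copies all lack hits would be absent from the result and would need separate handling.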

    Estimation of nuclear or mitochondrial proteome homology

    When the proteomes of two species were compared using BLASTP (ref. 54) with the ‘-evalue’ flag set to 0.05, only query and subject sequences that were each other's unique best match were considered in calculating proteome similarity; that is, query n has the highest bitscore with subject m and, at the same time, subject m has the highest bitscore with query n.
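The reciprocal unique-best-match rule described above can be sketched in Python as follows. This is an illustrative reading of the text rather than the authors' code: it assumes BLASTP results from both search directions are available as (query, subject, bitscore) tuples already filtered at the E-value cutoff, and the function names, the handling of ties, and the use of the forward-direction bitscore in the totals are assumptions.

```python
def unique_best_hits(hits):
    """Map each query to (subject, bitscore) when it has one clear best subject."""
    best = {}  # query -> (bitscore, subject), subject is None when tied
    for query, subject, bitscore in hits:
        top = best.get(query)
        if top is None or bitscore > top[0]:
            best[query] = (bitscore, subject)
        elif bitscore == top[0] and subject != top[1]:
            best[query] = (bitscore, None)  # two subjects tie: not unique
    return {q: (s, b) for q, (b, s) in best.items() if s is not None}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return (query, subject, bitscore) for pairs that are mutual best hits."""
    forward = unique_best_hits(forward_hits)
    reverse = unique_best_hits(reverse_hits)
    pairs = []
    for query, (subject, bitscore) in forward.items():
        back = reverse.get(subject)
        if back is not None and back[0] == query:
            pairs.append((query, subject, bitscore))
    return pairs


def proteome_similarity(pairs):
    """Return (total bitscore, number of hits, average bitscore per hit)."""
    total = sum(bitscore for _query, _subject, bitscore in pairs)
    count = len(pairs)
    return total, count, (total / count if count else 0.0)
```

The three values returned by `proteome_similarity` correspond to the kinds of quantities plotted in Fig. 5a and 5b (total bitscore, number of homologous hits, average bitscore per hit).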

    Estimation of protein family homology

    Protein sequence similarity was estimated by the maximum bitscore of each pair of sequences yielded by BLASTP (ref. 54) with all default parameters. To estimate ribosomal protein similarities among eukaryotes and prokaryotes, all seed sequences of 80 ribosomal protein families (Supplementary Table S6) retrieved from the Pfam database (ref. 55) were blasted (with the ‘-evalue’ flag set to 0.05) against the proteomes of 21 eukaryotes, 46 archaea and 36 bacteria, respectively. For every ribosomal protein family, only the one protein sequence from each species that yielded the highest bitscore with the seed sequences was selected for further analysis. To remove false-positive sequences, the selected sequences were submitted to the NCBI Batch CD-Search tool (ref. 56) to search against the Pfam database, and sequences that failed to map to the same ribosomal protein family were removed. Finally, the eukaryotic ribosomal protein sequences were blasted against all the prokaryotic ribosomal protein sequences, and the maximum bitscores are shown in Fig. 4a.
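The per-family, per-species selection step can be sketched as below. This is a hedged Python illustration, assuming the seed-versus-proteome BLASTP hits have been flattened into (family, species, protein, bitscore) tuples; the tuple layout, function name and identifiers in the usage example are hypothetical.

```python
def select_family_representatives(seed_hits):
    """Pick, for each (family, species), the protein with the highest bitscore.

    seed_hits: iterable of (family, species, protein_id, bitscore), one entry
    per hit of any seed sequence of `family` in the proteome of `species`.
    """
    best = {}  # (family, species) -> (protein_id, bitscore)
    for family, species, protein, bitscore in seed_hits:
        key = (family, species)
        if key not in best or bitscore > best[key][1]:
            best[key] = (protein, bitscore)
    return {key: protein for key, (protein, _score) in best.items()}


# Usage with made-up identifiers and scores:
example_hits = [
    ("Ribosomal_L2", "Gla", "gla_001", 310.0),
    ("Ribosomal_L2", "Gla", "gla_777", 45.0),
    ("Ribosomal_L2", "Mka", "mka_010", 280.0),
    ("Ribosomal_S7", "Gla", "gla_050", 120.0),
]
representatives = select_family_representatives(example_hits)
```

The selected proteins would then go through the CD-Search check described above before the eukaryote-versus-prokaryote comparison.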

    To determine non-rProt families with sequence homology between Gla and prokaryotes, each Gla protein sequence was blasted against each of the 82 prokaryotic proteomes, and the best matches were submitted to the NCBI Batch CD-Search tool to search against the Pfam database. Only cases where both query and subject sequences belonged to the same protein family were kept; 162 of these cases are shown in Supplementary Table S2, and in part in Fig. 4c.
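The same-family filter can be sketched as a simple set intersection. This Python fragment is illustrative only: it assumes the CD-Search output has been parsed into a hypothetical `families_of` dictionary mapping each protein identifier to the set of Pfam families it maps to, and that a pair is kept when the two sets share at least one family.

```python
def same_family_pairs(best_matches, families_of):
    """Keep best-match pairs whose query and subject share a Pfam family.

    best_matches: iterable of (query_id, subject_id, bitscore).
    families_of: dict mapping protein id -> set of Pfam family names.
    """
    kept = []
    for query, subject, bitscore in best_matches:
        shared = families_of.get(query, set()) & families_of.get(subject, set())
        if shared:
            kept.append((query, subject, bitscore, sorted(shared)))
    return kept
```

Proteins without any family assignment are treated as having an empty set, so their pairs are dropped.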

    51. Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 44, D67-72 (2016).

    52. O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733-745 (2016).

    53. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590-596 (2013).

    54. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421; 10.1186/1471-2105-10-421 (2009).

    55. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427-D432 (2019).

    56. Marchler-Bauer, A. & Bryant, S. H. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 32, W327-331 (2004).


    Table 1. Partial list of species analyzed.

    Abbr.  Species name

    Archaea
    Abo    Aciduliprofundum boonei
    Acf    Aciduliprofundum sp. MAR08-339
    Aen    C. Aenigmarchaeota archaeon
    Afu    Archaeoglobus fulgidus
    Aia    Acidilobus sp. 7A
    Alt    C. Altiarchaeales archaeon
    Ape    Aeropyrum pernix
    Bat    C. Bathyarchaeota archaeon
    Csu    C. Caldiarchaeum subterraneum
    Csy    Cenarchaeum symbiosum
    Dia    C. Diapherotrites archaeon
    Fac    Ferroplasma acidiphilum
    Ffo    Fervidicoccus fontis
    Hal    Halobacterium salinarum
    Hei    C. Heimdallarchaeota archaeon
    Hgi    Haloferax gibbonsii
    Hla    Halobiforma lacisalsi
    Kcr    C. Korarchaeum cryptofilum
    Lok    C. Lokiarchaeota archaeon
    Mac    Methanosarcina acetivorans
    Man    C. Mancarchaeum acidiphilum
    Mar    C. Marsarchaeota G2 archaeon
    Mbo    Methanoregula boonei
    Mco    Methanocella conradii
    Met    C. Methanosuratus sp.
    Mfe    Methanothermus fervidus
    Mic    C. Micrarchaeota archaeon
    Min    C. Methanomassiliicoccus intestinalis
    Mja    Methanocaldococcus jannaschii
    Mka    Methanopyrus kandleri
    Mlt    C. Methanoliparum thermophilum
    Mnt    Methanonatronarchaeum thermophilum
    Mph    Methanophagales archaeon
    Mte    C. Methanoplasma termitum
    Nca    C. Nitrosocaldus cavascurensis
    Nga    C. Nitrososphaera gargensis
    Nko    C. Nitrosopumilus koreensis
    Nst    C. Nanobsidianus stetteri
    Odi    C. Odinarchaeota archaeon
    Pae    Pyrobaculum aerophilum
    Pfu    Pyrococcus furiosus
    Sso    Saccharolobus solfataricus
    Tac    Thermoplasma acidophilum
    Tho    C. Thorarchaeota archaeon
    Tvo    Thermoplasma volcanium
    Woa    C. Woesearchaeota archaeon

    Bacteria
    Aae    Aquifex aeolicus
    Atu    Agrobacterium tumefaciens
    Bap    Buchnera aphidicola
    Bja    Bradyrhizobium japonicum
    Blo    Bifidobacterium longum
    Bsu    Bacillus subtilis
    Cex    Caldisericum exile
    Cje    Campylobacter jejuni
    Cpo    Cloacibacillus porcorum
    Ctr    Chlamydia trachomatis
    Cvi    Caulobacter vibrioides
    Cvo    Chelativorans sp. BNC1
    Det    Desulfurobacterium thermolithotrophum
    Dra    Deinococcus radiodurans
    Dth    Dictyoglomus thermophilum
    Eco    Escherichia coli
    Hth    Hungateiclostridium thermocellum
    Kol    Kosmotoga olearia
    Mau    Mahella australiensis
    Mhy    Megamonas hypermegale
    Mpn    Mycoplasma pneumoniae
    Mtu    Mycobacterium tuberculosis
    Pel    Pelobacter sp. SFB93
    Pmo    Petrotoga mobilis
    Rpr    Rickettsia prowazekii
    Rru    Rhodospirillum rubrum
    Rso    Ralstonia solanacearum
    Spn    Streptococcus pneumoniae
    Ssp    Sporanaerobacter sp. NJN-17
    Syn    Synechocystis sp. PCC 6803
    Tht    Thermobaculum terrenum
    Tis    Tistrella mobilis
    Tma    Thermotoga maritima
    Tpa    Treponema pallidum
    Tte    Caldanaerobacter subterraneus
    Xca    Xanthomonas campestris

    Eukarya
    Aca    Acanthamoeba castellanii
    Bbo    Babesia bovis
    Bho    Blastocystis hominis
    Bpr    Bathycoccus prasinos
    Cel    Caenorhabditis elegans
    Cme    Cyanidioschyzon merolae
    Dme    Drosophila melanogaster
    Dre    Danio rerio
    Esi    Ectocarpus siliculosus
    Gla    Giardia intestinalis
    Hsa    Homo sapiens
    Lma    Leishmania major
    Pfa    Plasmodium falciparum
    Pma    Perkinsus marinus
    Sce    Saccharomyces cerevisiae
    Spa    Saprolegnia parasitica
    Spo    Schizosaccharomyces pombe
    Sra    Strongyloides ratti
    Tps    Thalassiosira pseudonana
    Ttr    Thecamonas trahens
    Trv    Trichomonas vaginalis

    Note: C. in front of a species name stands for Candidatus. Detailed species information is given in Supplementary Table S1.


    Figure 1. Ranking of bitscores for the VARS-IARS homology of species in descending order (from left to right). Bitscores were obtained using BLASTP for a selection of archaeal (red), bacterial (green) and eukaryotic (blue) species. The complete list of 1,185 archaeal, 3,621 bacterial and 592 eukaryotic species from NCBI GenBank release 231 and their bitscores are given in Supplementary Table S1. The full names and three-letter abbreviations for a number of the species are indicated in Table 1.


    Figure 2. The rooted universal SSU rRNA tree. The tree was constructed using the Fitch-Margoliash method from the alignment of SSU rRNAs from 33 archaeal, 31 bacterial, and 19 eukaryotic species. Pairwise distances between the aligned sequences were estimated using DNADIST in PHYLIP with the Jukes-Cantor model. The thermal scale in (a) expresses the bitscore for the VARS-IARS pair of each species (Supplementary Table S1), and that in (b) expresses the genetic distances of VARS (squares) or IARS (triangles) between Gla and other organisms on the tree.
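For reference, the Jukes-Cantor correction converts the observed proportion p of differing sites between two aligned sequences into a distance d = -(3/4) ln(1 - (4/3) p). The sketch below is an illustrative Python version, not the DNADIST implementation; skipping alignment columns that contain gaps or ambiguity codes is a simplifying assumption made here.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA and DNA alphabets alike by mapping U to T.
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    compared = 0
    differing = 0
    for a, b in zip(a_norm, b_norm):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumption)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A matrix of such pairwise distances is the kind of input a distance-based method like Fitch-Margoliash takes to fit a tree.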


    Figure 3. Segments of the aligned VARS and IARS sequences of Mka, Mau and Esi. The six sequences were aligned using Clustal Omega (ref. 50). Tick marks indicate the positions of the sequence segments on the complete alignment (Supplementary Fig. S1). Similar amino acids are colored in orange, and ≥ 50% conserved ones in blue. Asterisks indicate the six positions displaying conservation of the same V or L amino acid in all six sequences.


    Figure 4. Protein sequence homology between eukaryotic and prokaryotic species. (a) BLASTP bitscores between Gla rProts and prokaryotic rProts. (b) Bitscores between Gla pyruvate phosphate dikinase and prokaryotic proteins. Among all Gla proteins, this protein elicited the highest total BLASTP bitscore from all the prokaryotes. (c) Bitscores displayed by some of the 162 Gla proteins from Supplementary Table S2 towards prokaryotes. The color coding and order of different prokaryotic species on the x-axis in (b) and (c) are the same as those in (a). Bitscores with E-value < 0.05 are shown, except for asterisked entries in (c) where E-value < 0.1.


    Figure 5. Interspecies protein homologies. (a) Total homology bitscores of Gla and Trv proteomes towards prokaryotic proteomes. (b) Relationships between the number of homologous hits (x-axis) and average bitscore per hit (y-axis) between prokaryotic and Gla proteomes. (c) Homologous hits between archaeal proteomes (y-axis) and bacterial proteomes (x-axis) with (left) or without (right) normalization based on the number of proteins in the archaeon. The complete heat map is given in Supplementary Fig. S4. (d) Bitscores of homologous hits of the mitochondrial gene-encoded proteins of two eukaryotic species towards prokaryotes. Total bitscores in each instance are shown in parentheses.
