
Evidence supporting a viral origin of the eukaryotic nucleus

Dr Philip JL Bell

Microbiogen Pty Ltd

Correspondence should be addressed to Email [email protected]

Keywords: Viral eukaryogenesis, nucleus, eukaryote origin, viral factory, mRNA capping, phylogeny

bioRxiv preprint doi: https://doi.org/10.1101/679175. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.


Abstract

The defining feature of the eukaryotic cell is the possession of a nucleus that uncouples transcription from translation. This uncoupling depends on a complex process employing hundreds of eukaryote-specific genes acting in concert and requires the 7-methylguanylate (m7G) cap to prime eukaryotic mRNA for splicing, nuclear export, and cytoplasmic translation. The origin of this complex system is currently a paradox since it is not found or needed in prokaryotic cells, which lack nuclei, yet it was apparently present and fully functional in the Last Eukaryotic Common Ancestor (LECA). According to the Viral Eukaryogenesis (VE) hypothesis, the abrupt appearance of the nucleus in the eukaryotic lineage occurred because the nucleus descends from the viral factory of a DNA phage that infected the archaeal ancestor of the eukaryotes. Consequently, the system for uncoupling transcription from translation in eukaryotes is predicted by the VE hypothesis to be viral in origin. In support of this hypothesis, it is shown here that the m7G capping apparatus that primes the uncoupling of transcription from translation in eukaryotes is present in viruses of the Mimiviridae but absent from bona-fide archaeal relatives of the eukaryotes such as Lokiarchaeota. Furthermore, phylogenetic analysis of the m7G capping pathway indicates that eukaryotic nuclei and Mimiviridae obtained this pathway from a common ancestral source that predated the origin of LECA. These results support the VE hypothesis and suggest the eukaryotic nucleus and the Mimiviridae descend from a common First Eukaryotic Nuclear Ancestor (FENA).


Introduction

A membrane-bound nucleus defines the eukaryotic domain, and all cellular organisms without nuclei are prokaryotic (Sapp 2005; Stanier and Van Niel 1962). Its presence contributes to a great divide between eukaryotes and prokaryotes defined by features such as linear chromosomes, telomeres, nuclear pores, the spliceosome, mitosis, meiosis, the sexual cycle, and the endoplasmic reticulum. Since the nucleus separates the eukaryotic genome from the ribosomal apparatus, its presence also introduces an uncoupling of transcription from translation unique to the eukaryotic domain.

Lokiarchaeota reportedly ‘bridge the gap between prokaryotes and eukaryotes’ and are proposed to be bona-fide archaeal relatives of the eukaryotes (Spang et al. 2015). Lokiarchaeota and related Asgardians encode Crenactins, the ESCRT-III complex, a family of small Ras-like GTPases and a ubiquitin system, making them plausible direct descendants of an archaeal ancestor of the eukaryotes (Koonin 2015). This discovery supports the Eocyte tree of life, in which eukaryotes evolved from a specific archaeon rather than representing a sister group to the archaea (Riviera and Lake 1992). If Lokiarchaeota are bona-fide archaeal relatives of the eukaryotes, the last common ancestor of the Asgard archaea and the eukaryotes can be inferred to be an archaeal First Eukaryotic Common Ancestor (FECA) (Eme et al. 2017).

Despite sharing a proposed archaeal ancestor with Lokiarchaeota, the last eukaryotic common ancestor (LECA) possessed both a nucleus and a mitochondrion, and no eukaryotes are descended from any earlier intermediates without both these complex organelles (Neumann et al. 2010). Whilst the abrupt appearance of the mitochondrion in LECA is persuasively explained by its endosymbiotic descent from an alpha-proteobacterium (e.g. Lang et al. 1999), the similarly abrupt appearance of a nucleus has been much more difficult to explain (Martin 1999; Martin 2005).


The presence of the eukaryotic nucleus results in an uncoupling of transcription from translation, and this uncoupling requires mRNA to be synthesised inside the nucleus, capped, processed, and exported into the cytoplasm for translation (Kyrieleis et al. 2014). This contrasts with prokaryotic translational systems, which rely on direct recognition of uncapped mRNA by the ribosomal apparatus (Benelli and Londei 2011). Evidence of archaeal methanogens 3.8 billion years ago (Battistuzzi et al. 2004) shows that prokaryotes evolved well before the eukaryotes originated 1.8 billion years ago (Parfrey et al. 2011). Accordingly, the prokaryotic system of direct recognition of mRNA by the ribosomal apparatus existed for nearly two billion years before the nucleus and its cap-based system abruptly appeared in LECA.

The change from a prokaryotic translational system found in FECA (Figure 1) to the uncoupled eukaryotic system found in LECA (Figure 2) involved the evolution of a complex molecular system involving hundreds of interacting genes. The m7G cap is critical to this process since it primes the mRNA for processing, export and translation (Figure 2). The genes required to add the m7G cap include an RNA polymerase (RNAP-II) dedicated to capped mRNA synthesis (Sentenac 1985), and the RNA triphosphatase (TPase), guanylyltransferase (GTase) and methyltransferase (MTase) required for capping mRNA (Kyrieleis et al. 2014). A cap binding protein (eIF4E) is also essential since it is required for initiating translation of the capped mRNA in the cytoplasm (Marcotrigiano et al. 1997). Paradoxically, the high level of complexity and the integrated nature of the cap-based system of uncoupling transcription from translation suggest a long evolutionary history, yet no transitional cellular forms linking the prokaryotic (Figure 1) and eukaryotic (Figure 2) systems have been described. Consequently, if only prokaryotes are considered as a source for the eukaryotic m7G cap-based system, an abrupt and currently insurmountable phylogenetic impasse is encountered.

The Viral Eukaryogenesis (VE) hypothesis proposes that the nucleus derives from an ancient DNA phage/virus and predicts that the m7G cap-based system that primes the uncoupling of transcription from translation in eukaryotes originated amongst the prokaryotic viruses (Bell 2001). Although fossil evidence for viruses is unlikely to be found, viruses almost certainly existed before the origin of LECA. For example, a prokaryotic genome that is free of genetic parasites is expected to show signs of genome degeneration, because prokaryotes need a mechanism to overcome the degradation of their genomes caused by processes such as Muller’s ratchet (Iranzo et al. 2016). There are also strong biological arguments that the emergence of genetic parasites is inevitable due to the instability of parasite-free states (Koonin et al. 2017). Further support for a pre-LECA origin of viruses comes from phylogenomic analysis, which shows that modern eukaryotic viruses evolved from pre-existing prokaryotic phages (Koonin et al. 2015). It can thus be anticipated that viruses would have emerged in concert with the first prokaryotes and existed for much of the two billion years between the appearance of the first methanogens and the appearance of LECA.

The VE hypothesis has been supported by the discovery that the Pseudomonas jumbophage 201 Φ2-1 constructs a nucleus-like viral factory that uncouples transcription from translation (Chaikeeratisak et al. 2017b). The viral factory established by 201 Φ2-1 confines phage DNA within the factory whilst excluding ribosomes (Chaikeeratisak et al. 2017b). Thus, once the factory is established, transcription occurs within the factory and the mRNA must be exported into the cytoplasm for translation. Functionally, infection results in the bacterial protoplasm being divided into a viroplasm, where viral information processing occurs, and a cytoplasm, where translation and metabolic enzymes are localised. Since virally encoded enzymes such as RNA polymerases and DNA polymerases must function inside the viral factory whilst components of the phage virions are assembled in the cytoplasm, it can be inferred that the boundary of these viral factories must be able to selectively sort which proteins, RNA transcripts and other factors can move across it.

Deepening the similarities between the eukaryotic nucleus and the viral factories of phage 201 Φ2-1, the phage possesses a homologue of eukaryotic tubulin (PhuZ), and this tubulin polymerises via dynamic instability, positioning the factory in the centre of the infected cell (Chaikeeratisak et al. 2017). The PhuZ spindle is the only known example of a cytoskeletal structure that shares three key properties with the eukaryotic spindle: dynamic instability, a bipolar array of filaments, and central positioning of DNA (Chaikeeratisak et al. 2017). This is a significant parallel since eukaryotic nuclei are positioned in the cell by microtubule-dependent motors during development and differentiation (Star 2009).

It was similarities between the eukaryotic nucleus and the Pox viruses that led to the original VE proposal that the nucleus was derived from a virus that infected the archaeal ancestor of the eukaryotes (Bell 2001). In particular, the observations supporting the model were that the Pox viruses could produce capped mRNA, possessed linear chromosomes, could separate transcription from translation, and had an ability to replicate entirely within the host cytoplasm (Bell 2001). Subsequently, Pox viruses were found to be members of an ancient monophyletic group, the nucleocytoplasmic large DNA viruses (NCLDV viruses) (Iyer et al. 2001). The discovery of the giant Mimivirus in 2004 and its allocation to the NCLDV group (Raoult et al. 2004) demonstrated that pox-viral relatives could be of unprecedented size and possess a complexity comparable to prokaryotic cells (Raoult et al. 2004). Many other giant NCLDV viruses have since been discovered, including even more complex relatives such as the Klosneuvirus (Schultz et al. 2017) and Tupanvirus (Abrahão et al. 2018).

A prokaryotic viral ancestry for both the Poxviruses and the other NCLDV viruses has been supported by phylogenomic studies (Koonin and Yutin 2010) and is compatible with the NCLDV common ancestor existing at or before the origin of LECA (Boyer et al. 2010; Nasir et al. 2012; Yutin et al. 2009). Furthermore, comparison between the inferred genome of the NCLDV common ancestor (Yutin and Koonin 2012) and the modern PhiKZ-like viruses (including 201 Φ2-1) reveals that both classes of giant virus possess large genomes, encode homologues of DNA polymerases (Kazlauskas and Venclovas 2011), multi-subunit RNA polymerases (Ceyssens et al. 2014), DNA ligases (Wojtus et al. 2017) and RNA ligases (Wojtus et al. 2017), and replicate with a high degree of autonomy from their hosts (Yuan and Gao 2017). Furthermore, distantly related NCLDV viruses from the Poxviridae, Asfarviridae, Pithoviridae, Marseilleviridae and Mimiviridae families replicate partially or exclusively within large cytoplasmic viral factories (Fridmann-Sirkis et al. 2016). Since the genes for adding an m7G cap to mRNA were present in the common ancestor of all the NCLDV viruses (Iyer et al. 2001; Yutin and Koonin 2012), it can be inferred from these observations that the common ancestor of the NCLDV viruses was a virus that could produce capped mRNA and, like phage 201 Φ2-1, could establish a viral factory in its host’s cytoplasm.

In addition to inheriting the ability to add an m7G cap to mRNA from the NCLDV common ancestor, two separate groups of NCLDV viruses, the Pandoraviridae and the Mimiviridae, also possess homologues of the eukaryotic cap binding protein eIF4E (Schultz et al. 2017). Unlike many NCLDV viruses, the Pandoraviruses possess introns in their genes, strongly suggesting that at least part of the Pandoravirus genome is transcribed in the nucleus (Phillipe et al. 2013). By contrast, members of the Mimiviridae have been shown to replicate entirely in the host cytoplasm and establish a nucleus-like uncoupling of transcription from translation (Fridmann-Sirkis et al. 2016). Furthermore, the cap binding protein encoded by eIF4E is located in the cytoplasm outside the viral factory during infection (Fridmann-Sirkis et al. 2016). Thus, as shown in Figure 3, the viral factories of Mimiviruses and 201 Φ2-1 share fundamental features with each other, such as the ability to uncouple transcription from translation and to selectively control which macromolecules enter and exit the viral factory, and the Mimivirus viral factories also share further specific features with the eukaryotic nucleus. Amongst these shared features, the Mimivirus has a linear genome, establishes a nucleus-like organelle in its host’s cytoplasm, possesses its own RNA polymerase dedicated to transcribing capped mRNA, exports capped mRNA into the cytoplasm for translation, and possesses its own version of the cap binding protein (eIF4E), which is located in the host cytoplasm during infection and is presumably involved in controlling the initiation of translation of capped Mimiviral transcripts. These discoveries have led to independent suggestions that the nucleus is derived from a viral factory (e.g. Forterre and Raoult 2017).

Amongst the phylogenetically diverse NCLDV viruses, the Mimiviridae appear to be the only group that establishes a solely cytoplasmic viral factory and possesses the eIF4E gene. Thus, by analogy with the proposal that the presence of Crenactins, the ESCRT-III complex, a family of small Ras-like GTPases and a ubiquitin system make Lokiarchaeota a plausible direct descendant of an archaeal ancestor of the eukaryotes (Koonin 2015), it is proposed here that the ability to construct a viral factory that uncouples transcription from translation, the possession of the m7G capping apparatus and the presence of the eIF4E cap binding protein make the Mimiviridae a plausible direct descendant of a viral ancestor of the eukaryotic nucleus. It is further proposed that this common ancestor of the Mimiviridae and the eukaryotic nucleus was the First Eukaryotic Nuclear Ancestor (FENA). To test this hypothesis, phylogenetic analysis was performed on the largest subunit of RNAP-II, which is required for synthesis of mRNA destined for capping; the capping apparatus, which is required to add the m7G cap to eukaryotic mRNA; and the eIF4E gene, which is required to initiate translation of capped mRNA in the cytoplasm (see Figure 2).

Results

The closest known archaeal relative of the eukaryotes shows no evidence of the eukaryotic genes required to prime the uncoupling of transcription from translation in eukaryotes

To confirm the extent to which the closest archaeal relatives of the eukaryotes lack homologues of the eukaryotic cap-based system for uncoupling transcription from translation, the genome of Heimdallarchaeota LC-3 (formerly Loki3; Spang et al. 2015; Spang et al. 2018) was searched with BLAST to identify homologues of four of the Saccharomyces cerevisiae genes required to uncouple transcription from translation.

RPO21 of S. cerevisiae was used to identify homologues of RNAP-II in Heimdallarchaeota LC-3. RPO21 was chosen because it encodes the largest subunit of RNAP-II in S. cerevisiae, including the C-terminal domain (CTD) containing the heptapeptide repeat (YSPTSPS) that recruits the capping enzymes to the nascent mRNA transcript and is intricately involved in further processing of capped mRNA (McCracken et al. 1997). The Homo sapiens genome was also searched for homologues to illustrate the level of homology of these genes in distantly related descendants of LECA. Despite the large evolutionary distance between yeast and humans, three different homologues of RPO21 with very significant E values were identified in H. sapiens (Table 1). These three correspond to RNAP-I, RNAP-II and RNAP-III, which are present in all eukaryotes (Sentenac 1985). By contrast, in Heimdallarchaeota LC-3 only a single RNA polymerase subunit A’ was identified as a homologue. This is consistent with Heimdallarchaeota LC-3 possessing a prokaryotic transcription system in which all RNA is transcribed by the same RNA polymerase (Werner 2007).

The capping apparatus in eukaryotes requires a TPase, a GTase and an MTase. Since the TPase gene in eukaryotes apparently arose from two phylogenetically different origins (Ramanathan et al. 2016; Kyrieleis et al. 2014), only the GTase and MTase genes required for constructing the m7G cap were used to search the Heimdallarchaeota LC-3 genome. Using the CEG1 (GTase) gene of S. cerevisiae to search for homologues in H. sapiens identifies the human GTase with a very significant E value. By contrast, although some putative homologues with non-significant E values were identified with BLAST in Heimdallarchaeota LC-3, only one of these showed homology with the known domain structure of the GTase. This gene was an ATP ligase (Table 1), a member of a group known to share homology with the GTases (Shuman and Schwer 1995). Using the ABD1 (MTase) gene of S. cerevisiae identifies homologues with significant E values in both humans and Heimdallarchaeota LC-3. However, it is known that the methyltransferase domain of the capping enzyme shares homology with a wide family of methyltransferases, and according to the annotated genome of Heimdallarchaeota LC-3, the gene detected in this search shares affinity with the trans-aconitate 2-methyltransferases rather than the capping MTase.

Using the S. cerevisiae eIF4E gene (CDC33) to search for homologues in H. sapiens identifies the human eIF4E with a very significant E value (Table 1). By contrast, no homologues of eIF4E with significant E values were found in the Heimdallarchaeota LC-3 genome. Furthermore, none of the genes with even low degrees of homology detected in Heimdallarchaeota LC-3 possessed the conserved sites that are known to be involved when eIF4E binds the m7G cap (Marcotrigiano et al. 1997).
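
The searches described above rest on separating hits with significant E values from weak matches. A minimal sketch of that filtering step follows, assuming BLAST results saved in the standard 12-column tabular format (-outfmt 6). The hit rows, the subject identifiers and the 1e-5 cutoff are illustrative assumptions and are not values reported in this study.

```python
# Sketch: split BLAST tabular hits (-outfmt 6) into significant and weak matches.
# The example rows and the 1e-5 cutoff are hypothetical, not results from the paper.

E_VALUE_CUTOFF = 1e-5  # assumed threshold; the text does not state one

# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_HITS = """\
CDC33\thuman_hit_1\t35.2\t180\t110\t3\t30\t209\t35\t214\t2e-30\t112
CDC33\tarchaeal_hit_1\t24.1\t58\t40\t2\t90\t147\t12\t67\t3.1\t28.5
CEG1\thuman_hit_2\t38.0\t400\t230\t9\t5\t390\t270\t660\t1e-70\t230
"""


def parse_hits(text):
    """Yield (query, subject, evalue) tuples from tabular BLAST output."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[10])


def split_by_significance(hits, cutoff=E_VALUE_CUTOFF):
    """Return (significant, weak) lists of hits using an E-value cutoff."""
    significant, weak = [], []
    for hit in hits:
        (significant if hit[2] <= cutoff else weak).append(hit)
    return significant, weak


significant, weak = split_by_significance(parse_hits(EXAMPLE_HITS))
print("significant:", [(q, s) for q, s, _ in significant])
print("weak:", [(q, s) for q, s, _ in weak])
```

As the MTase result above illustrates, an E value alone does not establish that a hit performs the same function; hits still need to be checked against the known domain structure and conserved sites.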

These results are consistent with the Asgard archaea being authentically archaeal in design and thus lacking a nucleus, the defining feature of the eukaryotic domain (Stanier and Van Niel 1962). The absence of any sign of a nucleus in Asgard archaea and the sudden appearance of the nucleus in LECA are strikingly similar to the sudden appearance of the mitochondrion in the eukaryotic lineage. Due to fundamental similarities between mitochondria and alpha-proteobacteria, the abrupt appearance of the mitochondrion in LECA is widely accepted to be the result of endosymbiosis between a bacterium and the ancestor of the eukaryotes (e.g. Lang et al. 1999). The similarly abrupt appearance of a highly complex nucleus in LECA is consistent with an endosymbiotic origin, but the nucleus is clearly not of prokaryotic cellular origin since it lacks an obvious homologue or precursor among prokaryotes and is primarily an information processing organelle (Martin 1999; Martin 2005).

Mimiviral and eukaryotic RNAP-II, GTase, MTase and eIF4E form two discrete monophyletic groups


Unlike any known prokaryotes, members of the Mimiviridae construct nucleus-like viral factories in the cytoplasm of their hosts that separate transcription from translation (Figure 3). They also possess a functional mRNA capping pathway that is homologous to the pathway utilised by the eukaryotes to prime the uncoupling of transcription from translation. This pathway includes an RNAP dedicated to mRNA synthesis, a TPase, GTase and MTase required to add the m7G cap to the mRNA, and eIF4E, the cytoplasmic cap binding protein required to initiate translation of the capped transcript in the cytoplasm.

In this phylogenetic analysis, members of the eukaryotic domain were carefully selected to cover the major eukaryotic supergroups and thus span the diversity of the eukaryotic domain (see Materials and Methods). Where possible, eukaryotic clades were chosen that contained at least one member that has been studied in depth at a molecular level and for which experimental knowledge of the processes of transcription and translation exists. In addition, all phylogenetic analyses use the same organisms, and only species with complete genomes in which all four genes (RNAP-II, GTase, MTase and eIF4E) could be unambiguously identified were used in tree construction.
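
The inclusion rule stated above (a species is used only if all four genes were identified) can be sketched as a simple set test. The species names and gene assignments below are hypothetical placeholders and do not reflect the taxa actually sampled.

```python
# Sketch: keep only species in which all four marker genes were identified.
# Species names and gene presence below are hypothetical placeholders.

REQUIRED_GENES = {"RNAP-II", "GTase", "MTase", "eIF4E"}

gene_hits = {
    "species_A": {"RNAP-II", "GTase", "MTase", "eIF4E"},
    "species_B": {"RNAP-II", "GTase", "MTase"},  # eIF4E not identified
    "species_C": {"RNAP-II", "GTase", "MTase", "eIF4E"},
}


def complete_species(hits, required=REQUIRED_GENES):
    """Return the sorted names of species whose hits include every required gene."""
    return sorted(name for name, genes in hits.items() if required <= genes)


selected = complete_species(gene_hits)
print(selected)
```

Using the same filtered species list for every gene keeps the individual trees directly comparable.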

As shown in Figure 4a, the unrooted phylogenetic tree of the RNAP largest subunit resolves into two discrete monophyletic clades: the eukaryotes, which descend from LECA, and the Mimiviridae, which descend from the common ancestor of the Mimiviridae. Despite the more limited phylogenetic information contained in the GTase, MTase and eIF4E alignments, patterns similar to the RNAP tree are observed, and the monophyly of the eukaryotes and the Mimiviridae is maintained in each case. Concatenating the four genes (Figure 4e) generates a phylogenetic tree with bootstrap values higher than any of the individual trees, suggesting that the four genes have a common phylogenetic signal. Within the eukaryotic domain, clades corresponding to Holozoa, Amoebozoa, Fungi, Viridiplantae, Alveolata and Excavata were well resolved with high support. These results are consistent with studies that show LECA possessed a functional eukaryotic nucleus (Neumann et al. 2010), and with all four eukaryotic genes identified as critical in the uncoupling of transcription from translation primed by the m7G cap descending from a common ancestral set of genes that were present in LECA. The Mimiviridae also belong to a well-supported monophyletic group, suggesting that all four genes were also present in the common viral ancestor of the Mimiviridae. The Mimiviridae resolved into three clades that generally correspond to those previously described in the Mimiviridae (Claverie and Abergel 2018).
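
Concatenation of the four genes amounts to joining the per-gene alignments end to end for each taxon. The sketch below illustrates the idea with toy data; the taxon names and short aligned sequences are invented for illustration, and this section does not describe how the concatenation was actually carried out.

```python
# Sketch: concatenate per-gene alignments into one supermatrix, taxon by taxon.
# Taxon names and the short aligned sequences are toy data for illustration only.

alignments = {
    "RNAP": {"taxon_1": "MKV-L", "taxon_2": "MRVAL", "taxon_3": "MKVAL"},
    "GTase": {"taxon_1": "GT-K", "taxon_2": "GTAK", "taxon_3": "GSAK"},
    "MTase": {"taxon_1": "WDE", "taxon_2": "WDE", "taxon_3": "W-E"},
    "eIF4E": {"taxon_1": "FKND", "taxon_2": "FRND", "taxon_3": "FKND"},
}
GENE_ORDER = ["RNAP", "GTase", "MTase", "eIF4E"]


def concatenate(alns, gene_order):
    """Join aligned sequences gene by gene; every taxon must appear in every gene."""
    taxa = set(alns[gene_order[0]])
    for gene in gene_order:
        if set(alns[gene]) != taxa:
            raise ValueError(f"taxon sets differ for {gene}")
        lengths = {len(seq) for seq in alns[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene} is not a rectangular alignment")
    return {t: "".join(alns[g][t] for g in gene_order) for t in sorted(taxa)}


supermatrix = concatenate(alignments, GENE_ORDER)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The checks on taxon sets and alignment lengths mirror the requirement that every gene be identified in every species before tree construction.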

The eukaryotic RNAP-II dedicated to capping mRNA shares a common ancestor with the Mimiviridae RNAP, and the common ancestor predates the origin of LECA

Although all the phylogenetic trees in Figure 4 have been drawn with a root between the viral and eukaryotic versions of the genes, establishing the root of the MTase, GTase and eIF4E phylogenetic trees is challenging since the capping apparatus is unique to the eukaryotic domain. Thus only paralogues of these three genes exist outside the eukaryotes and the NCLDV viruses, making it difficult to establish informative outgroups. In addition, despite being conserved, these genes are short and thus possess relatively little phylogenetic information. By contrast, the RNAP gene is a large, phylogenetically informative gene that is found in all cellular domains. Since independent fossil evidence suggests that domain Archaea existed some two billion years before the appearance of LECA (Knoll 2015), and the eukaryotes apparently descend from a particular branch of the archaea (Spang et al. 2015), the archaeal RNAP large subunit is a suitable outgroup that can polarise the relationship between the eukaryotic RNAP-II and Mimiviral RNA polymerases. An additional advantage of the RNAP-based tree is that all eukaryotes possess multiple RNAPs (Sentenac 1985). Since these multiple RNAPs were present in LECA, they can be used in concert with the archaeal sequences to firmly establish the root of the RNAP tree. Since both logic and the phylogenetic analysis performed here show that the RNAP, GTase, MTase and eIF4E genes are part of a co-evolving module responsible for producing and translating capped mRNA, it can be argued that establishing the root of the RNAP tree can be used to deduce the phylogeny of the entire capping apparatus.

As shown in Figure 5, using the archaeal RNAP subunit A’ and the homologous region of the eukaryotic RNAP-III to polarise the relationship between RNAP-II and the Mimiviridae RNAP shows that both the eukaryotic and Mimiviral genes descend from a common ancestral gene that predated the origin of LECA. The high bootstrap values give confidence that there is significant phylogenetic information in the alignment. In addition, both subtrees of the eukaryotic RNAP genes recapitulate the expected phylogenetic relationships between the eukaryotes, including establishing the Excavata as the most divergent eukaryotic supergroup (Hampl et al. 2009). Furthermore, within the eukaryotic domains, all the chosen eukaryotes were assigned to their accepted branches. A parsimonious explanation of the observed tree is that the ability to produce m7G capped mRNA was a feature of the ancestor of both the eukaryotic RNAP-II and the Mimiviridae RNA polymerase, since both the eukaryotic and viral polymerases are associated with producing capped mRNA, whilst neither RNAP-III nor the archaeal RNAP is associated with producing capped mRNA. Although other interpretations may be possible, the tree is entirely consistent with descent of the eukaryotic nucleus and the Mimiviridae from an ancient viral factory that could produce capped mRNA, a defining, core component of the apparatus required to uncouple transcription from translation by the eukaryotic nucleus that has not been observed in the archaeal relatives of the eukaryotes.
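The parsimony argument in the preceding paragraph can be illustrated with a small worked example. The Python sketch below applies Fitch's small-parsimony algorithm to a deliberately simplified four-taxon rooted tree. The topology, the taxon labels and the binary coding of the capped-mRNA trait are assumptions made purely for illustration; they are not taken from Figure 5 and do not reproduce the analysis performed in MEGA7.

```python
# Fitch small-parsimony on an ASSUMED, simplified topology:
#   (Archaeal_RNAP, (RNAP_III, (RNAP_II, Mimiviridae_RNAP)))
# Trait coding (assumption): 1 = associated with capped mRNA, 0 = not associated.

def fitch(node, states):
    """Bottom-up Fitch pass over a nested-tuple binary tree.

    Returns (candidate state set at this node,
             minimum number of state changes below this node,
             dict mapping each internal node to its candidate state set).
    """
    if isinstance(node, str):  # leaf: its observed state, no changes
        return {states[node]}, 0, {}
    left, right = node
    lset, lcost, lsets = fitch(left, states)
    rset, rcost, rsets = fitch(right, states)
    common = lset & rset
    if common:
        node_set, extra = common, 0        # children agree: no change needed
    else:
        node_set, extra = lset | rset, 1   # children disagree: one change
    sets = {**lsets, **rsets, node: node_set}
    return node_set, lcost + rcost + extra, sets


TREE = ("Archaeal_RNAP", ("RNAP_III", ("RNAP_II", "Mimiviridae_RNAP")))
CAPPING = {
    "Archaeal_RNAP": 0,
    "RNAP_III": 0,
    "RNAP_II": 1,
    "Mimiviridae_RNAP": 1,
}

root_set, changes, node_sets = fitch(TREE, CAPPING)
capping_clade = ("RNAP_II", "Mimiviridae_RNAP")
print("Minimum number of trait changes:", changes)
print("State set at the RNAP-II/Mimiviridae ancestor:", node_sets[capping_clade])
```

Under these assumptions a single gain of the trait on the branch leading to the RNAP-II/Mimiviridae ancestor is sufficient, whereas two independent gains would be required if that ancestor lacked the trait.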

Discussion

Here it is shown that the apparatus used by eukaryotic nuclei to produce and translate capped mRNA is not found in the closest archaeal relatives of eukaryotes. This is significant since in the eukaryotic nucleus, the uncoupling of transcription from translation requires a complex, highly evolved pathway consisting of hundreds of genes acting in concert (Figure 2), and the m7G cap is critical to this pathway since it is used to prime mRNA for processing, nuclear export, and cytoplasmic translation (Figure 2). The absence of the m7G apparatus implies that the highly complex pathway for uncoupling transcription from translation is also absent from archaeal relatives of the eukaryotes. This presents a major biological paradox since such a complex pathway incorporating the concerted action of hundreds of genes unique to the eukaryotic domain implies a long evolutionary history, yet no sign of the pathway is found in the closest archaeal relatives of the eukaryotes.

Although the apparatus for producing capped mRNA is absent from the archaeal relatives of the eukaryotes, the apparatus is present in the Mimiviridae, which is consistent with the postulates of the VE hypothesis. Phylogenetic analysis performed here demonstrates that viral and eukaryotic genes form discrete monophyletic clades, and that both viral and eukaryotic clades descend from a common ancestor that existed prior to the appearance of LECA. This pattern is consistent with the proposal that the eukaryotic nucleus and the Mimiviridae both descend from a First Eukaryotic Nuclear Ancestor (FENA).

Prior to the discovery of the nucleus-like viral factory of 201 Φ2-1, the ability to uncouple transcription from translation was thought to be an exclusive innovation of the eukaryotic nucleus. Thus, arguments could be made that the viral factory of the Mimiviruses had evolved by borrowing genes from the nucleus to allow it to establish the eukaryotic uncoupling of transcription from translation. However, since 201 Φ2-1 infects bacteria, it seems very unlikely that it obtained its ability to build a viral factory and uncouple transcription from translation from the eukaryotes; rather, this ability appears to have evolved in prokaryotic viruses as part of their replication cycle. Thus the discovery of 201 Φ2-1 demonstrates that uncoupling of transcription from translation is most likely a viral innovation, and since prokaryotes existed billions of years before the origin of the eukaryotes, viral factories potentially existed for billions of years before the origin of LECA. Studies on a relative of phage 201 Φ2-1 (PhiKZ) show that the viral factory appears to shield phage DNA from host immune systems including the CRISPR-Cas system (Mendoza et al. 2018). Thus viral factories may have evolved to provide biological protection from various anti-phage systems possessed by prokaryotic hosts (Hendrickson and Poole 2018).

The modern nucleus is clearly differentiated from any member of the Mimiviridae by its ability to construct a fully functional translational system including ribosomes. In the absence of their own translational machinery, all known viruses are dependent upon their host’s translation machinery to produce polypeptides required for their own reproduction. Thus all mRNAs produced by viruses must engage cellular ribosomes to ensure translation (Jan et al. 2016). However, when the Mimivirus was first discovered the “most unexpected discovery was the presence of numerous genes encoding central protein-translation components” (Raoult et al. 2004). The discovery of the Klosneuvirus increased the number of translation related genes found in viruses to levels that far exceed that seen in the original Mimivirus (Schulz et al. 2017), and sequencing of the Tupanvirus genome revealed that some members of the Mimiviridae possess a translation associated gene set that ‘only lacks the ribosome’ (Abrahão et al. 2018). Amongst this set of translational genes are up to 70 tRNA, 20 aaRS, 11 factors for all translation steps and factors related to tRNA/mRNA maturation and ribosome protein modification (Abrahão et al. 2018). Since it appears that the ancestor of the Mimiviridae did not possess all these functions, the appearance of so many translation related genes in viruses such as the Klosneuvirus and the Tupanvirus suggests that they acquired these components of the eukaryotic translational machinery via a piecemeal capture process (Schulz et al. 2017).

The expanded viral repertoire of translational genes found in the Tupanvirus and Klosneuvirinae suggests that there is selective pressure to acquire these genes in some branches of the Mimiviridae. This capture process may also be acting amongst the modern giant phage, where captured ribosomal genes appear to be part of the mechanism(s) by which phage direct the host translation apparatus to selectively translate viral mRNA (Al-Shayeb et al. 2019). If the VE hypothesis is valid and a similar capture process was operating before the origin of the first eukaryotes, the translational apparatus acquired by FENA could only have been captured from prokaryotic cells. Since FENA is proposed to have infected an Asgardian ancestor, many of the translation related genes would be derived from its archaeal Asgardian host and directed to enhancing translation of the viral transcripts by the host’s archaeal translational system. Consistent with this proposal, it is known that eukaryotic nuclei possess a core set of archaeal related translation initiation factors including eIF1A, eIF2, eIF2B, eIF4A, eIF5B and eIF6 (Jagus et al. 2012), and a core set of eukaryotic specific initiation factors (eIF5, eIF4E, eIF4G, eIF4B, eIF4H and eIF3) (Jagus et al. 2012). With the exception of eIF5, all these eukaryotic specific initiation factors are involved with 5’-cap-binding and scanning processes required for translation of capped eukaryotic mRNA (Jagus et al. 2012). Furthermore, in the process of evolving into the nucleus, a viral ancestor of the nucleus must have acquired the ability to synthesise uncapped rRNA and tRNA, and thus a part of the transition into a fully autonomous nucleus would have been the capture by the virus of a second and a third RNA polymerase dedicated to the synthesis of non-capped RNA associated with functioning of the ribosomes.

If the VE hypothesis can be accepted, the descent of the nucleus from a viral factory provides a plausible resolution to several of the major paradoxes associated with the origin of the nucleus. That is, if the nucleus descends from a viral factory and the viral factory set up by FENA was similar in structure and function to the 201 Φ2-1 and Mimiviridae viral factories (Figure 3), the VE hypothesis explains why the nucleus is mainly an information containing and processing compartment, why its boundary selectively controls the entry and exit of proteins and nucleic acids, why it exports mRNA into the cytoplasm, why it contains no functional ribosomes, why it possesses linear rather than circular chromosomes, why it is positioned in the cell by the tubulin cytoskeleton, and, as explored in this paper, why the eukaryotes possess highly evolved complex machinery to allow uncoupling of transcription from translation with no prokaryotic precedents. It also provides a rationale for the neo-functionalisation of RNA polymerases in the eukaryotes, since the viral factory introduces its own RNA polymerase specifically dedicated to the transcription of capped viral mRNA destined for translation in the cytoplasm. The origin of the nucleus from a viral ancestor has also been shown to provide a plausible mechanistic model for the origin of mitosis, meiosis and the sexual cycle (Bell 2006, Bell 2013), a problem described as the queen of evolutionary problems (Bell 1982). Thus the origin of the nucleus from a viral factory addresses many of the challenges in explaining the apparently abrupt appearance of a fully formed and functional nucleus in LECA, despite its complete absence from bona fide archaeal relatives such as members of the Asgard archaea.

It should be noted that the VE hypothesis is not a pure ‘endosymbiotic theory’. According to the VE hypothesis (Bell 2001), the eukaryotic cell is descended from an archaeal ancestor of the eukaryotic cytoplasm, a bacterial ancestor of the mitochondrion, and, as explored in this paper, a viral ancestor of the nucleus. Although the archaeal ancestor of the cytoplasm may have had a mutually beneficial symbiotic relationship with a bacterium leading to the origin of the mitochondria, the host archaeon did not gain any benefit from the viral infection; rather, the archaeon host was enslaved by the virus and its genome was ultimately destroyed.

However, like the endosymbiotic theories for the origin of the mitochondria and the chloroplasts, the VE hypothesis deals with complex irreversible events that are difficult to directly test (Margulis 1975). In the case of the mitochondria, it took nearly 100 years before the consilience of evidence built up sufficiently for the endosymbiotic origin of the mitochondria to become (almost) universally accepted. Although it is a more radical concept than endosymbiosis, if the VE hypothesis is similarly supported by the accumulation of multiple lines of evidence, it will introduce a major paradigm shift in our understanding of the evolution of complex life on earth. In particular, if the VE hypothesis is ultimately accepted, it implies that the eukaryotic cell derives from a consortium of three organisms that became integrated to such an extent that they created an emergent ‘super-organism’. The novel features of this emergent ‘super-organism’ allowed it to escape the limitations of prokaryotic evolution and evolve to levels of unprecedented organismal complexity.

Materials and Methods

Choice of eukaryotic organisms

i) Eukaryotes

The organisms used in this study were carefully selected to cover all the relevant groupings of eukaryotes, whilst limiting the complexity of the phylogenetic analysis. Currently 5 or 6 eukaryotic supergroups are proposed to cover the vast majority of eukaryotic diversity (Hampl et al. 2009). The present study focussed on ‘model’ organisms for the phylogenetic trees so that significant knowledge of the molecular biology was available for at least one member of each division. To represent the Holozoa, Homo sapiens, Mus musculus, Danio rerio and Caenorhabditis elegans were chosen since each is a model organism, and the phylogenetic relationships are well established. To represent Amoebozoa, Dictyostelium discoideum was chosen since it is a model organism. Dictyostelium purpureum and Acytostelium subglobosum were chosen as suitably distant relatives. To represent the Fungi, Saccharomyces cerevisiae, Kluyveromyces marxianus and Aspergillus niger were chosen since all three are model organisms and the phylogenetic relationships are well understood. To represent Viridiplantae, Arabidopsis lyrata was chosen as a model species and Brassica napus was chosen as a relatively close relative. Ostreococcus tauri was chosen as a distant algal relative of the land plants. To represent the SAR group, focus was placed on the Alveolata group since members such as Plasmodium and Cryptosporidium have been studied in depth at a molecular level. To ensure the robustness of the tree and limit the effects of long-branch attraction, Plasmodium falciparum and Plasmodium vivax were chosen as close relatives whilst Theileria equi strain WA, Cryptosporidium muris, and Perkinsus marinus were chosen as increasingly distantly related members of the Alveolata. To represent the Excavata, members of the Trypanosoma were chosen since they are model organisms that have been studied in depth at a molecular level. To ensure robustness of the tree and to minimise the effects of long-branch attraction, Trypanosoma cruzi cruzi and Trypanosoma rangeli were chosen as close relatives whilst Leishmania mexicana, Leptomonas seymouri and Bodo saltans were chosen as increasingly distantly related members of the Excavata. The organisms listed above include members of all 5 or 6 major clades. In addition, complete genomes are available for each of the organisms listed, ensuring that all the phylogenetic trees included exactly the same organisms.

ii) Mimiviridae

Only members of the Mimiviridae containing clear homologues to RNAP, GTase, MTase and eIF4E were chosen for analysis. Based on phylogenetic analysis by Claverie and Abergel (2018), the following viruses were chosen to represent three informal groupings of the Mimiviridae. Mesomimivirinae: Tetraselmis virus, Chrysochromulina ericina virus and Phaeocystis globosa virus. Klosneuvirinae: Klosneuvirus, Catovirus, Indivirus and Bodo saltans virus. Megavirinae: Acanthamoeba polyphaga mimivirus, Powai lake megavirus, Moumouvirus australiensis, Acanthamoeba polyphaga moumouvirus, Tupanvirus deep ocean and Tupanvirus soda lake. Cafeteria roenbergensis virus (CroV) is basal to the Klosneuvirinae and Megavirinae and does not appear to have other close relatives available yet.

Choice of sequences

i) RNAP subunits

Of the three RNA polymerases in eukaryotes, RNAP-II is the one intimately associated with the capping of transcripts. The largest RNAP-II subunit possesses a carboxy terminal domain (CTD) consisting of a heptapeptide repeat region that is involved in mRNA processing including capping, splicing and polyadenylation (McCracken et al. 1997). Homologues of RPO21, the largest subunit of RNAP-II of S. cerevisiae, were identified. With the exception of the members of the Excavata and Perkinsus marinus, a CTD heptapeptide repeat region was readily identified in all RNAP-II subunits used in the phylogenetic analysis. Although the heptapeptide repeat is absent from the Excavata studied, Trypanosoma RNAP-II genes possess a non-canonical C-terminal extension (Smith et al. 1989). As a result, the Trypanosoma cruzi cruzi RNAP-II was used to identify RNAP-II homologues in the Excavata clade. In the Mimiviridae only one homologue of the largest subunit of RNAP-II was detected.

ii) GTase and MTase

Although three enzymatic functions are universally required to produce capped mRNA (Kyrieleis et al. 2014), only the GTase and MTase are monophyletic in eukaryotes, with the TPase apparently originating from two independent sources (Ramanathan et al. 2016; Kyrieleis et al. 2014). In S. cerevisiae and most other unicellular eukaryotes such as Alveolata, all three functions are encoded by separate genes. In both Holozoa and Viridiplantae the TPase and GTase are encoded in the same polypeptide. In Excavata, two capping complexes are present (Takagi et al. 2007). Of these, the gene encoding both the GTase and MTase in the same polypeptide is essential for growth and for adding the m7G cap, and was thus chosen for phylogenetic analysis. In the Mimiviridae, all three functions are present in the same polypeptide.

iii) eIF4E

Although there is only one eIF4E gene in the yeast Saccharomyces cerevisiae, the core role of eIF4E in protein translation has meant that in higher eukaryotes several paralogous eIF4E genes have evolved that encode distinctly featured proteins. In addition to regular translation initiation, these paralogues are involved in the preferential translation of particular mRNAs or are tissue and/or developmental stage specific. For example, eight such genes have been found in Drosophila and five in Caenorhabditis (Frydryskova et al. 2016). In humans, where there are multiple paralogues, the three isoforms of eIF4E1 bring the mRNAs to the ribosome via an interaction with scaffold protein eIF4G (Frydryskova et al. 2016). As a result, in this study the human eIF4E1 isoform was used to conduct BLAST searches of Holozoa, and the hits with the highest BLAST score were taken for phylogenetic analysis. In Arabidopsis, EIF4E1 is expressed in all tissues except in the cells of the specialization zone of the roots, whereas the At.EIF4E2 mRNA is particularly abundant in floral organs and in young developing tissues (Rodriguez et al. 1998). The Arabidopsis EIF4E1 gene was thus used in BLAST searches of plants, and the genes with the highest similarity were taken for phylogenetic analysis. Where molecular knowledge was insufficient for such rational sequence selection, the homologue with the highest similarity to the Saccharomyces gene was identified, and provided that the gene possessed regions equivalent to the structurally important regions that bind to the m7G cap (Marcotrigiano et al. 1997), this gene was used to identify the closest homologues within the supergroup. With the exception of the Tupanviruses, the Mimiviridae were found to encode only one eIF4E homologue. In the case of the Tupanviruses, two eIF4E homologues were identified, and only one of these was included in the phylogenetic analysis. The homologue with the highest similarity to the Mimivirus homologue was used in both cases.
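The selection rule described above, in which the highest-scoring hit is retained for each organism, can be sketched as follows. This is a minimal illustration only: the `Hit` record, the accession strings and the scores are hypothetical placeholders rather than data from this study, and the check for conserved cap-binding regions is not modelled.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    organism: str
    accession: str    # hypothetical identifier, for illustration only
    bit_score: float  # hypothetical score, for illustration only


def best_hit_per_organism(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep, for each organism, the hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.organism)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.organism] = hit
    return best


example_hits = [
    Hit("Mus musculus", "hypothetical_A", 410.0),
    Hit("Mus musculus", "hypothetical_B", 255.5),
    Hit("Danio rerio", "hypothetical_C", 198.0),
    Hit("Danio rerio", "hypothetical_D", 372.0),
]

selected = best_hit_per_organism(example_hits)
for organism, hit in sorted(selected.items()):
    print(organism, hit.accession, hit.bit_score)
```

Retaining one sequence per organism keeps the taxon set identical across the gene trees, which is the stated aim of the organism selection.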

Phylogenetic analysis

Homology searches were carried out using the BLASTp and PSI-BLAST algorithms (Altschul et al. 1997). MEGA7 (Kumar et al. 2016) was used for all phylogenetic analysis. Unless otherwise stated, all program parameters for homology searching and domain identification were left at their respective defaults. Protein alignments were performed using MUSCLE. Once the sequences from all organisms had been aligned for a particular gene, the alignments were trimmed and used for tree construction. The evolutionary histories were inferred by using the Maximum Likelihood method based on the JTT matrix-based model (Jones et al. 1992). All bootstrap consensus trees were inferred from 1000 replicates (Felsenstein 1985) and are taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 51% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value.
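The bootstrap summary described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not a reproduction of MEGA7's implementation: each tree is represented simply as a set of clades (frozensets of taxon names), the taxa and replicate trees are hypothetical, and only 10 replicates are used instead of 1000 to keep the example short. The 51% threshold follows the text.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def bootstrap_support(reference: Set[Clade],
                      replicates: List[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees that contain each clade of the reference tree."""
    counts = Counter(clade for rep in replicates for clade in rep)
    n = len(replicates)
    return {clade: 100.0 * counts[clade] / n for clade in reference}


def collapse(support: Dict[Clade, float], threshold: float = 51.0) -> Set[Clade]:
    """Keep clades with support at or above the threshold; the rest are collapsed."""
    return {clade for clade, pct in support.items() if pct >= threshold}


# Hypothetical clades over four placeholder taxa A, B, C, D.
AB = frozenset({"A", "B"})
ABC = frozenset({"A", "B", "C"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
ACD = frozenset({"A", "C", "D"})

reference_tree = {AB, ABC}
replicate_trees = [{AB, ABC}] * 4 + [{AB, CD}] * 5 + [{AC, ACD}]

support = bootstrap_support(reference_tree, replicate_trees)
kept = collapse(support)
print({tuple(sorted(c)): pct for c, pct in support.items()})
print([tuple(sorted(c)) for c in kept])
```

In this toy data set the clade (A, B) appears in 9 of 10 replicates and is retained, whereas (A, B, C) appears in only 4 of 10 and would be collapsed into a polytomy.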

References

1. Abrahão J, Silva L, Silva LS, Khalil JYB, Rodrigues R, Arantes T, Assis F, Boratto P, Andrade M, Kroon EG, Ribeiro B, Bergier I, Seligmann H, Ghigo E, Colson P, Levasseur A, Kroemer G, Raoult D, La Scola B. 2018. Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nat Commun. 9:749.
2. Al-Shayeb B et al. 2019. Clades of huge phage from across Earth’s ecosystems. BioRxiv [Preprint] March 11, 2019 [Cited 19/06/2019]. Available from: https://doi.org/10.1101/572362
3. Battistuzzi FU, Feijao A, Hedges SB. 2004. A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol Biol. 4:44.
4. Bell G. 1982. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. London: Croom Helm. p. 19.
5. Bell PJ. 2001. Viral eukaryogenesis: was the ancestor of the nucleus a complex DNA virus? J Mol Evol. 53(3):251-256.
6. Bell PJ. 2006. Sex and the eukaryotic cell cycle is consistent with a viral ancestry for the eukaryotic nucleus. J Theor Biol. 243(1):54-63.
7. Bell PJ. 2013. Meiosis: Its Origin According to the Viral Eukaryogenesis Theory. In: Bernstein C, Bernstein M, editors. Meiosis. IntechOpen. p. 77-99.
8. Benelli D, Londei P. 2011. Translation initiation in Archaea: conserved and domain-specific features. Biochem Soc Trans. 39(1):89-93.
9. Boyer M, Madoui MA, Gimenez G, La Scola B, Raoult D. 2010. Phylogenetic and phyletic studies of informational genes in genomes highlight existence of a 4th domain of life including giant viruses. PLoS One. (12):e15530.
10. Ceyssens PJ, Minakhin L, Van den Bossche A, Yakunina M, Klimuk E, Blasdel B, De Smet J, Noben JP, Bläsi U, Severinov K, Lavigne R. 2014. Development of giant bacteriophage ϕKZ is independent of the host transcription apparatus. J Virol. (18):10501-10510.
11. Chaikeeratisak V, Nguyen K, Egan ME, Erb ML, Vavilina A, Pogliano J. 2017. The phage nucleus and tubulin spindle are conserved among large Pseudomonas phages. Cell Rep. 20(7):1563-1571.
12. Chaikeeratisak V, Nguyen K, Khanna K, Brilot AF, Erb ML, Coker JK, Vavilina A, Newton GL, Buschauer R, Pogliano K, Villa E, Agard DA, Pogliano J. 2017. Assembly of a nucleus-like structure during viral replication in bacteria. Science. 355(6321):194-197.
13. Claverie JM, Abergel C. 2018. Mimiviridae: An expanding family of highly diverse large dsDNA viruses infecting a wide phylogenetic range of aquatic eukaryotes. Viruses. 10(9):506.
14. Eme L, Spang A, Lombard J, Stairs CW, Ettema TJG. 2017. Archaea and the origin of eukaryotes. Nat Rev Microbiol. 15(12):711-723.
15. Felsenstein J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution. 39:783-791.
16. Forterre P, Raoult D. 2017. The transformation of a bacterium into a nucleated virocell reminds the viral eukaryogenesis hypothesis. Virologie. 21(4):28-30.
17. Fridmann-Sirkis Y, Milrot E, Mutsafi Y, Ben-Dor S, Levin Y, Savidor A, Kartvelishvily E, Minsky A. 2016. Efficiency in complexity: composition and dynamic nature of Mimivirus replication factories. J Virol. 90(21):10039-10047.
18. Frydryskova K, Masek T, Borcin K, Mrvova S, Venturi V, Pospisek M. 2016. Distinct recruitment of human eIF4E isoforms to processing bodies and stress granules. BMC Mol Biol. 17(1):21.
19. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, Simpson AG, Roger AJ. 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups". Proc Natl Acad Sci U S A. 106(10):3859-3864.
20. Hendrickson HL, Poole AM. 2018. Manifold routes to a nucleus. Front Microbiol. 9:2604.
21. Hernández G, Proud CG, Preiss T, Parsyan A. 2012. On the diversification of the translation apparatus across eukaryotes. Comp Funct Genomics. 2012:256848.
22. Iranzo J, Puigbò P, Lobkovsky AE, Wolf YI, Koonin EV. 2016. Inevitability of genetic parasites. Genome Biol Evol. 8(9):2856-2869.
23. Iyer LM, Aravind L, Koonin EV. 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J Virol. 75(23):11720-11734.
24. Jagus R, Bachvaroff TR, Joshi B, Place AR. 2012. Diversity of Eukaryotic Translational Initiation Factor eIF4E in Protists. Comp Funct Genomics. 2012:134839.
25. Jan E, Mohr I, Walsh D. 2016. A cap-to-tail guide to mRNA translation strategies in virus-infected cells. Annu Rev Virol. 3(1):283-307.
26. Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 8:275-282.
27. Kabachinski G, Schwartz TU. 2015. The nuclear pore complex--structure and function at a glance. J Cell Sci. 128(3):423-429.
28. Katahira J. 2015. Nuclear export of messenger RNA. Genes. 6(2):163-184.
29. Kazlauskas D, Venclovas C. 2011. Computational analysis of DNA replicases in double-stranded DNA viruses: relationship with the genome size. Nucleic Acids Res. (19):8291-8305.
30. Knoll AH. 2015. Paleobiological perspectives on early microbial evolution. Cold Spring Harb Perspect Biol. 7(7):a018093.
31. Koonin EV, Dolja VV, Krupovic M. 2015. Origins and evolution of viruses of eukaryotes: The ultimate modularity. Virology. 479-480:2-25.
32. Koonin EV, Wolf YI, Katsnelson MI. 2017. Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states. Biol Direct. 12(1):31.
33. Koonin EV, Yutin N. 2010. Origin and evolution of eukaryotic large nucleo-cytoplasmic DNA viruses. Intervirology. 53(5):284-292.
34. Koonin EV. 2015. Archaeal ancestors of eukaryotes: not so elusive any more. BMC Biol. 13:84.
35. Kumar S, Stecher G, Tamura K. 2016. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 33:1870-1874.
36. Kyrieleis OJ, Chang J, de la Peña M, Shuman S, Cusack S. 2014. Crystal structure of vaccinia virus mRNA capping enzyme provides insights into the mechanism and evolution of the capping apparatus. Structure. 22(3):452-465.
37. Lang BF, Gray MW, Burger G. 1999. Mitochondrial genome evolution and the origin of eukaryotes. Annu Rev Genet. 33:351-397.
38. Marcotrigiano J, Gingras AC, Sonenberg N, Burley SK. 1997. Co-crystal structure of the messenger RNA 5' cap-binding protein (eIF4E) bound to 7-methyl-GDP. Cell. 89(6):951-961.
39. Margulis L. 1975. Symbiotic theory of the origin of eukaryotic organelles; criteria for proof. Symp Soc Exp Biol. (29):21-38.
40. Martin W. 1999. A briefly argued case that mitochondria and plastids are descendants of endosymbionts, but that the nuclear compartment is not. Proc Biol Sci. 266(1426):1387.

41. Martin W. 2005. Archaebacteria (Archaea) and the origin of the eukaryotic nucleus. Curr Opin Microbiol. (6):630-637.

42. McCracken S, Fong N, Yankulov K, Ballantyne S, Pan G, Greenblatt J, Patterson SD, Wickens M, Bentley DL. 1997. The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature. 385(6614):357-361.

43. Mendoza SD, Berry JD, Nieweglowska ES, Leon LM, Agard DA, Bondy-Denomy J. 2018. A nucleus-like compartment shields bacteriophage DNA from CRISPR-Cas and restriction nucleases. BioRxiv [Preprint] July 17, 2018. [Cited 19/06/2019] Available from: https://doi.org/10.1101/370791.

44. Mutsafi Y, Zauberman N, Sabanay I, Minsky A. 2010. Vaccinia-like cytoplasmic replication of the giant Mimivirus. Proc Natl Acad Sci U S A. 107(13):5978–5982.

45. Nasir A, Kim KM, Caetano-Anolles G. 2012. Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya. BMC Evol Biol. 12:156.

46. Neumann N, Lundin D, Poole AM. 2010. Comparative genomic evidence for a complete nuclear pore complex in the last eukaryotic common ancestor. PLoS One. 5(10):e13241.

47. Okamura M, Inose H, Masuda S. 2015. RNA Export through the NPC in Eukaryotes. Genes (Basel). 6(1):124–149.

48. Parfrey LW, Lahr DJ, Knoll AH, Katz LA. 2011. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc Natl Acad Sci U S A. 108(33):13624–13629.

49. Philippe N, Legendre M, Doutre G, Couté Y, Poirot O, Lescot M, Arslan D, Seltzer V, Bertaux L, Bruley C, Garin J, Claverie J-M, Abergel C. 2013. Pandoraviruses: Amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science. 341(6143):281-286.

50. Ramanathan A, Robb GB, Chan SH. 2016. mRNA capping: biological functions and applications. Nucleic Acids Res. 44(16):7511–7526.

51. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM. 2004. The 1.2-megabase genome sequence of Mimivirus. Science. 306(5700):1344-1350.

52. Rivera MC, Lake JA. 1992. Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science. 257:74–76.

53. Rodriguez CM, Freire MA, Camilleri C, Robaglia C. 1998. The Arabidopsis thaliana cDNAs coding for eIF4E and eIF(iso)4E are not functionally equivalent for yeast complementation and are differentially expressed during plant development. Plant J. (4):465-73.

54. Sapp J. 2005. The prokaryote-eukaryote dichotomy: meanings and mythology. Microbiol Mol Biol Rev. 69(2):292–305.

55. Schulz F, Yutin N, Ivanova NN, Ortega DR, Lee TK, Vierheilig J, Daims H, Horn M, Wagner M, Jensen GJ, Kyrpides NC, Koonin EV, Woyke T. 2017. Giant viruses with an expanded complement of translation system components. Science. 356(6333):82-85.

56. Sentenac A. 1985. Eukaryotic RNA polymerases. Crit Rev Biochem. 18(1):31-90.

57. Shuman S, Schwer B. 1995. RNA capping enzyme and DNA ligase: a superfamily of covalent nucleotidyl transferases. Mol Microbiol. 17(3):405-10.

58. Smith JL, Levin JR, Ingles CJ, Agabian N. 1989. In trypanosomes the homolog of the largest subunit of RNA polymerase II is encoded by two genes and has a highly unusual C-terminal domain structure. Cell. 56(5):815-27.

59. Spang A, Eme L, Saw JH, Caceres EF, Zaremba-Niedzwiedzka K, Lombard J, Guy L, Ettema TJG. 2018. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLoS Genet. 14(3):e1007080.

60. Spang A, Saw JH, Jørgensen SL, Zaremba-Niedzwiedzka K, Martijn J, Lind AE, van Eijk R, Schleper C, Guy L, Ettema TJG. 2015. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature. 521:173-179.

61. Stanier RY, van Niel CB. 1962. The concept of a bacterium. Arch Microbiol. 42:17-35.

62. Starr DA. 2009. A nuclear-envelope bridge positions nuclei and moves chromosomes. J Cell Sci. 122(Pt 5):577–586.

63. Takagi Y, Sindkar S, Ekonomidis D, Hall MP, Ho CK. 2007. Trypanosoma brucei encodes a bifunctional capping enzyme essential for cap 4 formation on the spliced leader RNA. J Biol Chem. 282(22):15995-6005.

64. Werner F. 2007. Structure and function of archaeal RNA polymerases. Mol Microbiol. 65(6):1395-404.

65. Wojtus JK, Fitch JL, Christian E, Dalefield T, Lawes JK, Kumar K, Peebles CL, Altermann E, Hendrickson HL. 2017. Complete genome sequences of three novel Pseudomonas fluorescens SBW25 bacteriophages, Noxifer, Phabio, and Skulduggery. Genome Announc. 5(31):e00725-17.

66. Yuan Y, Gao M. 2017. Jumbo bacteriophages: an overview. Front Microbiol. 8:403.

67. Yutin N, Koonin EV. 2012. Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virol J. 9:161.

68. Yutin N, Wolf YI, Raoult D, Koonin EV. 2009. Eukaryotic large nucleocytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol J. 6:223.

69. Zauberman N, Mutsafi Y, Halevy DB, Shimoni E, Klein E, Xiao C, Sun S, Minsky A. 2008. Distinct DNA exit and packaging portals in the virus Acanthamoeba polyphaga mimivirus. PLoS Biol. 6(5):e114.

Figures

Figure 1: The coupled prokaryotic system of transcription and translation. Both archaea and bacteria utilize one type of multi-component RNA polymerase (RNAP) to transcribe all RNA (Werner 2007). Transcription and translation in prokaryotes are coupled because both occur directly in the protoplasm, and thus translation initiation can occur before the mRNA transcript is fully synthesised. Translation in prokaryotes relies on direct recognition of mRNA by the ribosomal apparatus via sequences such as Shine-Dalgarno sequences or short UTRs (Benelli and Londei 2011). In the case of Shine-Dalgarno sites, the 30S ribosomal subunit binds to the mRNA in such a way that the AUG codon lies on the peptidyl (P) site and the second codon lies on the aminoacyl (A) site. The initiator tRNA binds to the P site, the large ribosomal subunit docks with the small subunit, the initiation factors are released and the ribosome is ready to start translation. Since prokaryotes originated 3.8 billion years ago (Battistuzzi et al. 2004), the coupled prokaryotic process predates the uncoupled eukaryotic system by close to 2 billion years and thus is the most ancient cellular system of transcription and translation.

Figure 2: The eukaryotic system to uncouple transcription from translation is complex and employs hundreds of genes that act in concert. The dominant eukaryotic cap-dependent system of transcription and translation, which was apparently present and fully functional in LECA (Neumann et al. 2010), is described below. i) Subunits of RNAP-II are translated in the cytoplasm and imported into the nucleus through the Nuclear Pore Complex (NPC). RNAP-II initiates transcription of mRNA by binding to the promoter regions of protein-coding genes. ii) After the synthesis of the first 20 to 25 nucleotides of mRNA, the polymerase pauses until the mRNA is capped (Ramanathan et al. 2016, Okamura et al. 2015). The eukaryotic m7G cap (marked by a symbol in the figure) consists of 7-methylguanosine linked via a reversed 5’-5’ triphosphate linkage to the transcript and is the first modification made to RNAP-II transcribed RNA. Three enzymatic functions are required to generate the cap. Firstly, an RNA 5’-triphosphatase (TPase) hydrolyses the 5’-triphosphate end of the nascent mRNA to generate a 5’-diphosphate. The 5’-diphosphate is then capped with guanosine monophosphate by an RNA guanylyltransferase (GTase) to generate a 5’ GpppRNA cap on the transcript. Finally, the guanosine of the GpppRNA cap is methylated by RNA (guanine-N7)-methyltransferase (MTase) (Kyrieleis et al. 2014). iii) The nuclear cap binding complex (CBC) binds to the m7G cap, which then forms a complex with snRNPs to initiate splicing and polyadenylation (Ramanathan et al. 2016). Splicing of mRNA transcripts is unique to the eukaryotes and requires interaction of hundreds of proteins and the conserved snRNAs. iv) The m7G cap primes the mRNA for transport through the nuclear pores into the cytoplasm (Katahira 2015) by binding trans-acting factors to form a mature messenger ribonucleoprotein (mRNP). Recruitment of the multisubunit TRanscription-EXport (TREX-1) complex requires the 5’ capping of pre-mRNA because CBP80 interacts with the AlyRef and THO sub-complexes of TREX-1 (Okamura et al. 2015). v) The nuclear pore complex (NPC) is integral to the uncoupling of transcription from translation because the NPC acts as a gatekeeper, controlling which macromolecules enter and exit the nucleus. NPCs are unique to the eukaryotes, and a single NPC comprises ∼500 individual protein molecules collectively known as nucleoporins (Nups) (Kabachinski and Schwartz 2015). The NPC includes a nuclear ring, a central transport channel and eight cytoplasmic fibrils, which allow molecules smaller than 40-60 kDa to freely diffuse (Kabachinski and Schwartz 2015). Large molecules such as mRNA must associate with specific export receptors such as Nxf1, Crm1 or other karyopherins to be actively transported through the NPC. vi) To initiate translation, the 43S ribosomal preinitiation complex is recruited to the 5’ end of the mRNA, a process that is co-ordinated by eIF4E through its interactions with eIF4G and the 40S ribosomal subunit associated eIF3 (Hernandez et al. 2012). Several eukaryote-specific initiation factors (eIF4E, eIF4G, eIF4B, eIF4H and eIF3) are involved in the 5’-cap-binding and scanning processes that are essential to the initiation and translation of capped eukaryotic mRNA (Jagus et al. 2012). vii) Once the ribosome has been recruited to the capped mRNA transcript, a scanning process occurs and translation is generally initiated at the first AUG encountered.
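The three enzymatic steps of cap formation described in part ii) of the caption can be sketched as a simple chain of state changes at the 5’ end of the transcript. This is only an illustrative Python sketch: the function names and the string labels for the 5’ end states (`ppp`, `pp`, `Gppp`, `m7Gppp`) are hypothetical shorthand introduced here, not notation from the paper, and no enzyme kinetics or structure is modelled.

```python
# Illustrative model of the capping steps in Figure 2 (ii).
# All names and state labels below are hypothetical shorthand.

def tpase(five_prime_end: str) -> str:
    """RNA 5'-triphosphatase: 5'-triphosphate end -> 5'-diphosphate end."""
    if five_prime_end != "ppp":
        raise ValueError("TPase expects a 5'-triphosphate end")
    return "pp"


def gtase(five_prime_end: str) -> str:
    """RNA guanylyltransferase: adds GMP to the 5'-diphosphate, giving Gppp."""
    if five_prime_end != "pp":
        raise ValueError("GTase expects a 5'-diphosphate end")
    return "Gppp"


def mtase(five_prime_end: str) -> str:
    """RNA (guanine-N7)-methyltransferase: methylates the cap guanosine."""
    if five_prime_end != "Gppp":
        raise ValueError("MTase expects an unmethylated Gppp cap")
    return "m7Gppp"


def cap_transcript(five_prime_end: str = "ppp") -> str:
    """Apply the three steps in the order given in the caption."""
    for step in (tpase, gtase, mtase):
        five_prime_end = step(five_prime_end)
    return five_prime_end


if __name__ == "__main__":
    print(cap_transcript("ppp") + "RNA")
```

Each function rejects any input other than the product of the preceding step, which mirrors the fixed order of the pathway: the guanylyltransferase acts on the diphosphate end left by the triphosphatase, and the methyltransferase acts on the unmethylated cap.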

Figure 3: 201 Φ2-1 viral factories, Mimivirus viral factories, and the eukaryotic nucleus share the ability to uncouple transcription from translation. i a) Image of Phage 201 Φ2-1 viral factory (Chaikeeratisak et al. 2017b). ii a) Image of Mimivirus viral factory (Zauberman et al. 2008). iii a) The eukaryotic nucleus. i b) Phage 201 Φ2-1 establishes a viral factory in the cytoplasm of the bacterial host, confining DNA replication and transcription to the viral factory. Translation is confined to the cytoplasm since host bacterial ribosomes are excluded from the viral factory (Chaikeeratisak et al. 2017b). Since PhiKZ relatives of 201 Φ2-1 can complete infection in the absence of bacterial RNA polymerase (RNAP) activity (Ceyssens et al. 2014), it can be inferred that the multi-subunit RNAP genes encoded by the phage are transcribed in the viral factory, the transcripts are exported into the cytoplasm for translation, and the proteins are re-imported into the viral factory to transcribe the phage DNA. ii b) The Mimivirus also establishes a viral factory in the cytoplasm of its eukaryotic host (Mutsafi et al. 2010), confining DNA replication and transcription to the viral factory. Translation is confined to the cytoplasm since host ribosomes are excluded from the viral factory (Fridmann-Sirkis et al. 2016). Mimiviruses encode a multi-subunit RNA polymerase that transcribes Mimiviral DNA and functions within the viral factory (Fridmann-Sirkis et al. 2016). It can therefore be inferred that the Mimivirus viral factory controls which macromolecules are transported in and out of the viroplasm. Like the eukaryotic nucleus, Mimiviridae encode their own mRNA capping apparatus and a version of the eIF4E gene. In cells infected by the Mimiviridae, eIF4E remains located in the host cytoplasm (Fridmann-Sirkis et al. 2016). iii b) The eukaryotic nucleus, like the viral factories of both phage 201 Φ2-1 and the Mimivirus, is a specialised compartment located in the cytoplasm that confines DNA replication and transcription within its boundaries. Translation is confined to the cytoplasm since functional ribosomes are excluded from the nucleus. The mRNAs encoding RNAP-II subunits are transcribed within the nucleus and exported into the cytoplasm for translation, and the resulting proteins are re-imported into the nucleus to transcribe nuclear DNA. Unlike viral factories, the mechanisms by which the nucleus sorts the macromolecules that can enter and exit the nucleus are well understood, and known to be controlled by the NPCs. Like the Mimivirus viral factory, eukaryotic nuclei encode their own capping apparatus and encode eIF4E, which binds to the m7G cap; both are part of a complex system to uncouple transcription from translation.

Figure 4: Unrooted phylogenetic trees of the mRNA capping pathway in selected eukaryotes and Mimiviridae. All five trees use sequences from the same set of carefully selected organisms (see Materials and Methods) and the proposed position of LECA is marked in each tree. The number of conserved amino acids in the final alignment for each gene is marked on the diagram. Trees were constructed and drawn using the ML method with default settings in MEGA7 and 1000 bootstrap replicates. NCBI accession numbers are given for each sequence in the Materials and Methods. Mimiviridae informal grouping names are based on Claverie and Abergel 2018. a) RNAP largest subunit gene tree. b) GTase gene tree. c) MTase gene tree. d) eIF4E gene tree. e) Phylogenetic tree inferred from concatenation of all four gene sequences.

Figure 5: Maximum Likelihood tree of RNA polymerases using Archaeal RNAP subunit A’ as an outgroup. The RNAP A’ subunit of archaea was used as an outgroup to establish the root of the largest subunit of the Mimiviral RNAP and Eukaryotic RNAP-II and RNAP-III genes. RNAP-II and RNAP-III are found to belong to two separate monophyletic groups. Both the RNAP-II and RNAP-III trees are robust, assign eukaryotes to their correct phylogenetic branches and recapitulate the expected phylogenetic relationships between the eukaryotes, including the early divergence of the Excavata (Hampl et al. 2009). The Mimiviridae tree is consistent with previous phylogenetic analyses of the Mimiviridae (Claverie and Abergel 2018). This tree shows that the Mimiviridae and eukaryotic RNAP-II genes share a common ancestor. This ancestor existed before LECA and is consistent with the proposal that both descend from FENA, a proposed viral ancestor of both the Mimiviridae and the eukaryotic nucleus that infected an archaeal ancestor of the eukaryotes. Since both viral and eukaryotic RNAP-II synthesise m7G capped mRNA, it can be inferred that the common RNA polymerase ancestor also produced capped mRNA. This tree was produced from an alignment of 64 sequences and 598 positions using the Maximum Likelihood method and the JTT substitution model. Bootstrap values are indicated on each branch and are based on 1000 replicates. The tree was drawn and the computations were performed using MEGA7. NCBI accession numbers are given for each sequence in the Materials and Methods.
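The bootstrap values mentioned in the caption rest on resampling alignment columns with replacement and re-inferring a tree from each resampled alignment. The Python sketch below illustrates only the resampling step under stated assumptions: the function names and the toy three-sequence alignment are hypothetical, tree inference is left out, and nothing here reproduces MEGA7's internal implementation.

```python
import random

# Minimal sketch of nonparametric bootstrap resampling of an alignment.
# Names and the toy data are hypothetical; tree building is not included.


def bootstrap_alignment(alignment, rng):
    """Return one replicate: columns drawn with replacement, same width."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw column indices with replacement; the same index is applied to
    # every sequence so that each sampled column stays intact.
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in picked)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments (1000 as in the caption)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seqA": "MKTAY", "seqB": "MKSAY", "seqC": "MRTAF"}
    replicates = bootstrap_replicates(toy, n_replicates=3, seed=1)
    for replicate in replicates:
        print(replicate)
```

In a full analysis, a tree would be inferred from each replicate, and the support for a branch would be the percentage of replicate trees that contain the same grouping.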

Tables

Table 1. Homologues of S. cerevisiae RNAP-II, GTase, MTase and eIF4E in Homo sapiens and Heimdallarchaeota LC-3 identified using Blast.

| Saccharomyces gene name | Annotation in Homo sapiens | Homo sapiens accession | Homo sapiens E value | Heimdallarchaeota LC-3 accession | Heimdallarchaeota LC-3 E value | Annotation in Heimdallarchaeota LC-3 |
| | RNA polymerase I large subunit | NP_056240 | 1.00E-97 | | | |
| RPO21 (NP_010141) | RNA polymerase II large subunit | NP_000928 | 0 | OLS19521.1 | 0 | RNA polymerase subunit A' |
| | RNA polymerase III large subunit | NP_008986 | 0 | | | |
| CEG1 (NP_011385) | Guanylyltransferase | AAH19954 | 2.00E-17 | OLS26805.1 | 2.7 | DNA ligase |
| ABD1 (NP_009795) | Methyltransferase | NP_003790 | 2.00E-38 | OLS26405.1 | 4.00E-05 | Trans-aconitate 2-methyltransferase |
| CDC33 (NP_014502) | eIF4E | NP_001959 | 3.00E-36 | OLS27732.1 | 0.37 | 5-exo-hydroxycamphor dehydrogenase |

Table 2. Summary of Accession numbers used in this study

| Group | Subgroup | Species | GTase | MTase | eIF4E | RNAP-II | RNAP-III |
| Eukarya | Fungi | Saccharomyces cerevisiae | NP_011385 | NP_009795 | NP_014502 | NP_010141 | NP_014759 |
| Eukarya | Fungi | Kluyveromyces marxianus | XP_022676394 | XP_022674569 | XP_022678436 | XP_022677581 | XP_022678447 |
| Eukarya | Fungi | Aspergillus niger | XP_001400555 | XP_001394253 | XP_001395221 | XP_001389676 | XP_001393726 |
| Eukarya | Holozoa | Homo sapiens | AAH19954 | BAA82447 | NP_001959 | NP_000928 | NP_008986 |
| Eukarya | Holozoa | Mus musculus | NP_036014 | NP_080716 | NP_031943 | AAB58418 | NP_001074716 |
| Eukarya | Holozoa | Danio rerio | NP_998032 | NP_001038465 | NP_001007778 | XP_005156282 | NP_001263425 |
| Eukarya | Holozoa | Caenorhabditis elegans | NP_001020979 | NP_492674 | NP_503124 | NP_500523 | NP_501127 |
| Eukarya | Viridiplantae | Arabidopsis lyrata | XP_002873017 | XP_002894293 | XP_020875354 | XP_020873010 | XP_020884300 |
| Eukarya | Viridiplantae | Brassica napus | XP_013647283 | XP_013640697 | AGA20262 | XP_013656472 | XP_013681133 |
| Eukarya | Viridiplantae | Ostreococcus tauri | XP_003075327 | XP_003081423 | XP_022840751 | XP_022839775 | XP_022840814 |
| Eukarya | Amoebozoa | Dictyostelium discoideum | XP_636333 | XP_642389 | XP_647593 | XP_641735 | XP_642724 |
| Eukarya | Amoebozoa | Dictyostelium purpureum | XP_003293052 | XP_003293647 | XP_003293106 | XP_003285719 | XP_003284018 |
| Eukarya | Amoebozoa | Acytostelium subglobosum LB1 | XP_012756463 | XP_012752660 | XP_012756585 | XP_012756853 | XP_012752065 |
| Eukarya | Alveolata | Plasmodium falciparum | KNC37820 | ETW19449 | XP_001351220 | XP_001351252 | XP_001350009 |
| Eukarya | Alveolata | Plasmodium vivax | KMZ83875 | SGX75114 | XP_001614562 | XP_001614530 | XP_001614080 |
| Eukarya | Alveolata | Theileria equi strain WA | XP_004828897 | XP_004828862 | XP_004829399 | XP_004831990 | XP_004830926 |
| Eukarya | Alveolata | Perkinsus marinus ATCC 50983 | XP_002774114 | XP_002774250 | XP_002774365 | XP_002767562 | XP_002778409 |
| Eukarya | Alveolata | Cryptosporidium muris RN66 | XP_002140608 | XP_002139632 | XP_002140059 | XP_002141559 | XP_002142344 |
| Eukarya | Excavata | Trypanosoma cruzi cruzi | PBJ71163 | PBJ71163 | PBJ73557 | PBJ81421 | PBJ72541 |
| Eukarya | Excavata | Trypanosoma rangeli | RNF00410 | RNF00410 | RNF02202 | RNF07318 | RNF04215 |
| Eukarya | Excavata | Leishmania mexicana | XP_003875466 | XP_003875466 | XP_003876737 | XP_003877779 | XP_003878621 |
| Eukarya | Excavata | Leptomonas seymouri | KPI83387 | KPI83387 | KPI89876 | KPI84927 | KPI86235 |
| Eukarya | Excavata | Bodo saltans | CUG90421 | CUG90421 | CUF95139 | CUI14899 | CUI14455 |
| Mimiviridae | Klosneuvirinae | Catovirus CTV1 | ARF09224 | ARF09224 | ARF09024 | ARF09013-20 | |
| Mimiviridae | Klosneuvirinae | Klosneuvirus KNV1 | ARF11732 | ARF11732 | ARF11337 | ARF11340-43 | |
| Mimiviridae | Klosneuvirinae | Indivirus ILV1 | ARF09638 | ARF09638 | ARF09452 | ARF09455 | |
| Mimiviridae | Klosneuvirinae | Bodo saltans virus | ATZ80933 | ATZ80933 | ATZ80516 | ATZ80519 | |
| Mimiviridae | Mimivirinae | Acanthamoeba polyphaga mimivirus | AEJ34618 | AEJ34618 | AKI79272 | YP_003987013 | |
| Mimiviridae | Mimivirinae | Moumouvirus australiensis | AVL94825 | AVL94825 | AVL94704 | AVL94698 | AVL94696 |
| Mimiviridae | Mimivirinae | Powai lake megavirus | ANB50623 | ANB50623 | ANB50499 | ANB50494 | ANB50492 |
| Mimiviridae | Mimivirinae | Tupanvirus deep ocean | AUL79325 | AUL79325 | AUL79602 | AUL79608 | |
| Mimiviridae | Mimivirinae | Tupanvirus soda lake | AUL78031 | AUL78031 | AUL78296 | AUL78302 | |
| Mimiviridae | Mimivirinae | Acanthamoeba polyphaga moumouvirus | YP_007354410 | YP_007354410 | YP_007354285 | YP_007354277 | |
| Mimiviridae | Mesomimivirinae | Chrysochromulina ericina virus | YP_009173557 | YP_009173557 | YP_009173322 | YP_009173653 | |
| Mimiviridae | Mesomimivirinae | Phaeocystis globosa virus | YP_008052553 | YP_008052553 | YP_008052407 | YP_008052581 | |
| Mimiviridae | Mesomimivirinae | Tetraselmis virus 1 | AUF82182 | AUF82182 | AUF82209 | AUF82600 | |
| Mimiviridae | CroV | Cafeteria roenbergensis virus BV-PW1 | YP_003969844 | YP_003969844 | YP_003969852 | YP_003970001 | |
| Archaea | Crenarchaeota | Saccharolobus solfataricus | | | | WP_009990476 | |
| Archaea | Crenarchaeota | Sulfolobus acidocaldarius | | | | WP_011277574 | |
| Archaea | Asgardarchaea | Candidatus Odinarchaeota archaeon LCB_4 | | | | OLS17382 | |
| Archaea | Euryarchaeota | Pyrococcus furiosus | | | | WP_014835440 | |
