pnas-2011-arslan-1110889108

12
Distant Mimivirus relative with a larger genome highlights the fundamental features of Megaviridae Defne Arslan 1 , Matthieu Legendre 1 , Virginie Seltzer, Chantal Abergel 2 , and Jean-Michel Claverie 2 Information Génomique et Structurale, Centre National de la Recherche Scientique-Unité Propre de Recherche 2589, Aix-Marseille University, Institut de Microbiologie de la Méditerranée, Parc Scientique de Luminy, Case 934, FR-13288 Marseille, France Edited by James L. Van Etten, University of Nebraska, Lincoln, NE, and approved September 13, 2011 (received for review July 6, 2011) Mimivirus, a DNA virus infecting acanthamoeba, was for a long time the largest known virus both in terms of particle size and gene content. Its genome encodes 979 proteins, including the rst four aminoacyl tRNA synthetases (ArgRS, CysRS, MetRS, and TyrRS) ever found outside of cellular organisms. The discovery that Mimivirus encoded trademark cellular functions prompted a wealth of theoretical studies revisiting the concept of virus and associated large DNA viruses with the emergence of early eukaryotes. How- ever, the evolutionary signicance of these unique features remained impossible to assess in absence of a Mimivirus relative exhibiting a suitable evolutionary divergence. Here, we present Megavirus chilensis, a giant virus isolated off the coast of Chile, but capable of replicating in fresh water acanthamoeba. Its 1,259,197-bp ge- nome is the largest viral genome fully sequenced so far. It encodes 1,120 putative proteins, of which 258 (23%) have no Mimivirus homologs. The 594 Megavirus/Mimivirus orthologs share an aver- age of 50% of identical residues. Despite this divergence, Megavi- rus retained all of the genomic features characteristic of Mimivirus, including its cellular-like genes. Moreover, Megavirus exhibits three additional aminoacyl-tRNA synthetase genes (IleRS, TrpRS, and AsnRS) adding strong support to the previous suggestion that the Mimivirus/Megavirus lineage evolved from an ancestral cellular genome by reductive evolution. The main differences in gene con- tent between Mimivirus and Megavirus genomes are due to (i ) lineages specic gains or losses of genes, (ii ) lineage specic gene family expansion or deletion, and (iii ) the insertion/migration of mobile elements (intron, intein). Mimiviridae | DNA virus phylogeny | girus | viral translation | tree of life T he discovery of the viral nature of Acanthamoeba polyphaga Mimivirus (1), followed by the determination of its out- standing genome sequence (1,182 kb) (2) led to an irreversible change in the way microbiologists looked at viruses (36), even reviving the debate on their classication as living micro- organisms (7, 8). After Mimivirus, no obvious limit could be set anymore on the expected size of a viral particle, or the com- plexity of its gene content, both of them now largely overlapping with that of the simplest cellular organisms, such as parasitic bacteria (9). Moreover, the Mimivirus genome was found to en- code a number of functions thought to be trademarks of cellular organisms, such as aminoacyl tRNA synthetases (AARS), threat- ening to invalidate the absence of a translation system as the last inviolate criterion separating viruses from the cellular world (1, 10). Subsequent studies revealed additional Mimivirus idiosyn- crasies such as its unique virion unloading/loading mechanisms (11), singular gene transcription signaling such as the hairpin rule(12), and its susceptibility to infection by a new type of satellite virus, called virophage(13, 14). Beyond the initial excitation of their discovery, the next step is now to assess to what extent these features are anecdotal or, in the contrary, deeply linked to the emergence and mode of evolution of giant DNA viruses with genome sizes >1 Mb (hereby referred to as Megaviridae). Such a task requires comparing Mimivirus with relatives situated at optimal evolutionary distances, close enough to allow the unambiguous assignment of homologous features, but divergent enough to provide a clear illustration of the evo- lutionary forces at work. Several Mimivirus-related megaviridae have been mentioned in recent literature: Mamavirus (15) (nearly identical to Mimivirus) and few others more briey de- scribed (Terra1, Terra2, Courdo, or Moumou) (16) and for which no genome sequence is available. Through a campaign of random aquatic environmental sam- pling, followed by culturing on a panel of acanthamoeba species, we isolated Megavirus from sea water sampled close to the shore off the ECIM marine station in Las Cruces, Chile (SI Materials and Methods). Megavirus is now routinely cultured on A. cas- tellanii. Its complete genome sequence was determined by using a combination of 454-titanium and Illumina HiSeq approaches. Here, we present an electron microscopy study of the Megavirus replication cycle in A. castellanii and an analysis of its genome. Despite their substantial evolutionary divergence, all of the unique features noticed in Mimivirus are conserved in Mega- virus, and delineate a core set of cellular-like functions that might be fundamentally linked to the origin and evolution of Megaviridae. Results and Discussion Electron Microscopy Data. Megavirus and Mimivirus virion par- ticles exhibit a very similar overall morphology, with a dense core nucleocapsid encased into an icosasedral-like capsid, itself cov- ered by a layer of bers. However, details make them readily recognizable, even in mixed culture (Fig. 1A): The Megavirus bers are noticeably shorter (75 ± 5 nm compared with 120 ± 5 nm for Mimivirus) and cover a capsid slightly larger in di- ameter (440 ± 10 nm, compared with 390 ± 10 nm for Mim- ivirus). These dimensions refer to dehydrated virions as prepared for thin section transmission electronic microscopy (TEM), which induce an 20% shrinking of the capsids (17). Native Megavirus icosasedral capsids are thus 520 nm in diameter, corresponding to a total particle diameter of 680 nm. In addition, the hairof Megavirus virions often exhibits one or two patches of slightly longer and denser bers (nicknamed cowlicks) (Fig. 1A Inset). The Megavirus particles exhibit a clearly visible special vertex (Fig. 1B), corresponding to the Stargatealready de- scribed for Mimivirus, a ve-pronged star structure the opening of which triggers the release of the nucleocapsid into the host cell cytoplasm (11, 14). Upon protease treatments, Megavirus par- Author contributions: C.A. and J.-M.C. designed research; D.A., M.L., V.S., C.A., and J.-M.C. performed research; D.A., M.L., V.S., C.A., and J.-M.C. analyzed data; and M.L., C.A., and J.-M.C. wrote the paper. The authors declare no conict of interest. This article is a PNAS Direct Submission. Data deposition: The sequence reported in this paper has been deposited in the GenBank database (accession no. JN258408). A Megavirus genome browser is available at www. giantvirus.org/megavirus/. 1 D.A. and M.L. contributed equally to this work. 2 To whom correspondence may be sent. E-mail: [email protected] or chantal. [email protected] This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1110889108/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1110889108 PNAS Early Edition | 1 of 6 MICROBIOLOGY

Upload: vlad-preda

Post on 23-Nov-2015

6 views

Category:

Documents


1 download

DESCRIPTION

science

TRANSCRIPT

  • Distant Mimivirus relative with a larger genomehighlights the fundamental features of MegaviridaeDefne Arslan1, Matthieu Legendre1, Virginie Seltzer, Chantal Abergel2, and Jean-Michel Claverie2

    Information Gnomique et Structurale, Centre National de la Recherche Scientique-Unit Propre de Recherche 2589, Aix-Marseille University, Institutde Microbiologie de la Mditerrane, Parc Scientique de Luminy, Case 934, FR-13288 Marseille, France

    Edited by James L. Van Etten, University of Nebraska, Lincoln, NE, and approved September 13, 2011 (received for review July 6, 2011)

    Mimivirus, a DNA virus infecting acanthamoeba, was for a longtime the largest known virus both in terms of particle size and genecontent. Its genome encodes 979 proteins, including the rst fouraminoacyl tRNA synthetases (ArgRS, CysRS,MetRS, and TyrRS) everfound outside of cellular organisms. The discovery that Mimivirusencoded trademark cellular functions prompted a wealth oftheoretical studies revisiting the concept of virus and associatedlarge DNA viruses with the emergence of early eukaryotes. How-ever, the evolutionary signicance of these unique features remainedimpossible to assess in absence of a Mimivirus relative exhibiting asuitable evolutionary divergence. Here, we present Megaviruschilensis, a giant virus isolated off the coast of Chile, but capableof replicating in fresh water acanthamoeba. Its 1,259,197-bp ge-nome is the largest viral genome fully sequenced so far. It encodes1,120 putative proteins, of which 258 (23%) have no Mimivirushomologs. The 594 Megavirus/Mimivirus orthologs share an aver-age of 50% of identical residues. Despite this divergence, Megavi-rus retained all of the genomic features characteristic of Mimivirus,including its cellular-like genes. Moreover, Megavirus exhibitsthree additional aminoacyl-tRNA synthetase genes (IleRS, TrpRS,and AsnRS) adding strong support to the previous suggestion thattheMimivirus/Megavirus lineage evolved froman ancestral cellulargenome by reductive evolution. The main differences in gene con-tent between Mimivirus and Megavirus genomes are due to (i)lineages specic gains or losses of genes, (ii) lineage specic genefamily expansion or deletion, and (iii) the insertion/migration ofmobile elements (intron, intein).

    Mimiviridae | DNA virus phylogeny | girus | viral translation | tree of life

    The discovery of the viral nature of Acanthamoeba polyphagaMimivirus (1), followed by the determination of its out-standing genome sequence (1,182 kb) (2) led to an irreversiblechange in the way microbiologists looked at viruses (36), evenreviving the debate on their classication as living micro-organisms (7, 8). After Mimivirus, no obvious limit could be setanymore on the expected size of a viral particle, or the com-plexity of its gene content, both of them now largely overlappingwith that of the simplest cellular organisms, such as parasiticbacteria (9). Moreover, the Mimivirus genome was found to en-code a number of functions thought to be trademarks of cellularorganisms, such as aminoacyl tRNA synthetases (AARS), threat-ening to invalidate the absence of a translation system as the lastinviolate criterion separating viruses from the cellular world (1,10). Subsequent studies revealed additional Mimivirus idiosyn-crasies such as its unique virion unloading/loading mechanisms(11), singular gene transcription signaling such as the hairpinrule (12), and its susceptibility to infection by a new type ofsatellite virus, called virophage (13, 14). Beyond the initialexcitation of their discovery, the next step is now to assess towhat extent these features are anecdotal or, in the contrary,deeply linked to the emergence and mode of evolution of giantDNA viruses with genome sizes >1 Mb (hereby referred to asMegaviridae). Such a task requires comparing Mimivirus withrelatives situated at optimal evolutionary distances, close enoughto allow the unambiguous assignment of homologous features,

    but divergent enough to provide a clear illustration of the evo-lutionary forces at work. Several Mimivirus-related megaviridaehave been mentioned in recent literature: Mamavirus (15)(nearly identical to Mimivirus) and few others more briey de-scribed (Terra1, Terra2, Courdo, or Moumou) (16) and forwhich no genome sequence is available.Through a campaign of random aquatic environmental sam-

    pling, followed by culturing on a panel of acanthamoeba species,we isolated Megavirus from sea water sampled close to the shoreoff the ECIM marine station in Las Cruces, Chile (SI Materialsand Methods). Megavirus is now routinely cultured on A. cas-tellanii. Its complete genome sequence was determined by usinga combination of 454-titanium and Illumina HiSeq approaches.Here, we present an electron microscopy study of the Megavirusreplication cycle in A. castellanii and an analysis of its genome.Despite their substantial evolutionary divergence, all of theunique features noticed in Mimivirus are conserved in Mega-virus, and delineate a core set of cellular-like functions thatmight be fundamentally linked to the origin and evolution ofMegaviridae.

    Results and DiscussionElectron Microscopy Data. Megavirus and Mimivirus virion par-ticles exhibit a very similar overall morphology, with a dense corenucleocapsid encased into an icosasedral-like capsid, itself cov-ered by a layer of bers. However, details make them readilyrecognizable, even in mixed culture (Fig. 1A): The Megavirusbers are noticeably shorter (75 5 nm compared with 120 5 nm for Mimivirus) and cover a capsid slightly larger in di-ameter (440 10 nm, compared with 390 10 nm for Mim-ivirus). These dimensions refer to dehydrated virions as preparedfor thin section transmission electronic microscopy (TEM),which induce an 20% shrinking of the capsids (17). NativeMegavirus icosasedral capsids are thus 520 nm in diameter,corresponding to a total particle diameter of 680 nm. In addition,the hair of Megavirus virions often exhibits one or two patchesof slightly longer and denser bers (nicknamed cowlicks) (Fig.1A Inset). The Megavirus particles exhibit a clearly visible specialvertex (Fig. 1B), corresponding to the Stargate already de-scribed for Mimivirus, a ve-pronged star structure the openingof which triggers the release of the nucleocapsid into the host cellcytoplasm (11, 14). Upon protease treatments, Megavirus par-

    Author contributions: C.A. and J.-M.C. designed research; D.A., M.L., V.S., C.A., and J.-M.C.performed research; D.A., M.L., V.S., C.A., and J.-M.C. analyzed data; and M.L., C.A., andJ.-M.C. wrote the paper.

    The authors declare no conict of interest.

    This article is a PNAS Direct Submission.

    Data deposition: The sequence reported in this paper has been deposited in the GenBankdatabase (accession no. JN258408). A Megavirus genome browser is available at www.giantvirus.org/megavirus/.1D.A. and M.L. contributed equally to this work.2To whom correspondence may be sent. E-mail: [email protected] or [email protected]

    This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110889108/-/DCSupplemental.

    www.pnas.org/cgi/doi/10.1073/pnas.1110889108 PNAS Early Edition | 1 of 6

    MICRO

    BIOLO

    GY

  • ticles appeared more fragile than the Mimivirus ones, showinga larger proportion of damaged particles in TEM preparation.The Megavirus isolate was initially tested for growth on A.

    grifni, A. polyphaga, and A.castellanii, and found to replicate inthese three species. For comparison purpose, the replication ofMegavirus was then studied in detail on A. castellanii (NeffAmerican Type Culture Collection 30010). The infectious cycleof Megavirus lasts 17 h from the initial phagocytosis event to thetime of maximal release of particles from fully mature virionfactories, compared with 12 h for Mimivirus (MOI 10). Thedifference is mostly due to a slower progression from the seedstage (Fig. 1 C and D) to the fully bloomed virion factories (Fig.1 G and H). Another macroscopic phenomenon specic ofMegavirus is that 35% of the A. castellanii cells appear to diewithout undergoing productive infections. The reasons for this

    cytotoxicity is unknown but suggest that the laboratory A. cas-tellanii strain behaves as a nonoptimal substitute for the un-known natural environmental host of Megavirus. We previouslyobserved that Mimivirus induced a rounding of the infected A.castellanii cells, 6 h after infection. The same phenomenon,albeit slightly delayed, was observed with Megavirus. However,at variance with Mimivirus where rounded cells remained ad-herent, Megavirus infection caused most of them to come offtheir support, making the cultures more difcult to monitor byregular microscopy. A. castellanii cells infected by Megavirusexhibited three distinctive ultrastructural features in succession,as described for Mimivirus (11, 14, 18). First, the seed, cor-responding to the Megavirus core nucleocapsid extracted fromthe external particle layers, appeared clearly separated from thecell cytoplasm by a well-delineated lipid membrane (Fig. 1C andD).

    Fig. 1. Electron microscopy of Megavirus compared with Mimivirus. (A) Mimivirus (Upper) and Megavirus (Lower) particles in a same vacuole (coinfection).(A, Inset) Cowlicks (arrow) as often seen in the Megavirus ber outer layer. (B) Megavirus stargate. (B, Inset) Transversal section of a Megavirus particle belowan open stargate. Megavirus (C) and Mimivirus (D) seeds surrounded by a lipid membrane (arrows). Megavirus (E) and Mimivirus (F) early stages of the virionfactories with the seeds at their centers. Megavirus (G) and Mimivirus (H) mature virion factories in full production. (Scale bars: AF, 200 nm; G and H, 1 m).

    2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1110889108 Arslan et al.

  • This membrane likely derives from the most internal of the twomembranes visible inside the virus particles, as expected from theinfection process (opening of the stargate, followed by the fusionof the rst virus membrane with the vacuole membrane). Theseseeds then progressively turn into early virion factories recog-nizable as electron-dense nucleic acid-rich regions (18) isolatedfrom the surrounding cytoplasm by an exclusion zone constitutedof a mesh of brils 16 nm in diameter (Fig. 1 E and F). Thebiochemical nature of these brils remains to be characterized.Finally, fully mature virion factories release a large number ofparticles from their periphery where three stages of maturationare seen: empty assembled particles budding from the factoryedges, bald particles lled with the nucleocapsid, and ber-covered mature particles ready to exit the host cell (Fig. 1 Gand H). The burst size of Megavirus-infected A. castellanii cells isapproximately one-half the thousand virions released by thoseinfected by Mimivirus.

    Megavirus Overall Genome Structure and Gene Content. The ge-nome of Megavirus is a linear double-stranded DNA moleculewith a size of 1,259,197 bp (74.76% A+T), making it the largestdescribed viral genome. The whole sequence was assembled andcorrected at once from a dataset of 278,663 454-titanium readscombined with 42,288,396 Illumina Hiseq (paired-end) reads (SIMaterials and Methods). We annotated 1,120 putative protein-coding sequences (CDSs) and 3 tRNAs (1 Trp, and 2 Leu).Megavirus CDSs range from 29 aa to 2,908 aa in length, for anaverage of 338.4 aa (median = 281.5 aa). The distance sepa-rating consecutive CDSs was short (114.5 nt in average), result-ing into a very high coding density (90.14%). Eight hundredsixty-two of the 1120 (77%) predicted Megavirus CDS havehomologs in Mimivirus, whereas 793 of the 979 (81%) MimivirusCDSs (19) have homologs in Megavirus. Using the best re-ciprocal match criterion (SI Materials and Methods), Megavirusand Mimivirus were found to share 594 orthologous proteinsexhibiting a broad distribution of similarity centered on an av-erage of 50% identical residues (Table S1A and Figs. S1 and S2).Most likely inherited from a Megavirus/Mimivirus common an-cestor, the corresponding gene set provides a minimal estimateof the core genome of ancestral Megaviridae.Fig. 2 illustrates at one glance both the overall similarity and

    divergence of the Megavirus and Mimivirus genomes. They dis-play a large central region of colinearity extending from mg210to mg804 in Megavirus, and L192 to R730 in Mimivirus. Thisregion is solely disrupted by the inversion of a central 338-kbgenome segment and the translocation of a 76-kb distal segment.Interestingly, one of the boundary of the inversion coincides withthe slope reversal of the A+C excess prole, a position associ-ated to the origin of replication in bacteria (Fig. S3). However,this quasiperfect colinearity abruptly vanishes at the two ex-tremities of the Megavirus chromosome, respectively corre-sponding to 193 kb and 327 kb. Although these regions stillencompass several hundred of homologous genes, their locationand orientation appear extensively shufed (Fig. 2). The processleading to the total loss of colinearity at the genome extremitiesremains unknown, because it is not correlated with a local en-richment in transposases, or to more divergent orthologoussequences. For instance, the highest conservation (93% identicalresidues) was found for a predicted cholinesterase (MimivirusL906/Megavirus mg981) located at the extremity of the chro-mosomes.Interestingly, the same pattern is observed when comparing

    two poxviruses exhibiting a level of sequence divergence com-parable to the one between Megavirus and Mimivirus (e.g., DNApolymerases exhibiting 65% identical residues) (Fig. S4). Thisobservation suggests that Poxviruses and Megaviridae, despitetheir considerable differences, might share a genome replicationstrategy (e.g., coupling replication with recombination) (20, 21)

    that favor the rearrangement, gain, or loss of genes at the ex-tremities of viral chromosomes. Again, the dramatic 190-kb ge-nome reduction exhibited by a recently described spontaneousMimivirus mutant is mainly due to large deletions occurring atboth ends of the genome (22).

    Unique Transcriptional Features in Megaviridae. Previous analyses ofthe Mimivirus genome uncovered two distinctive features withinits noncoding moiety. The rst one was the perfect conservationof the octameric motif AAAATTGA in front of 45% ofMimivirus CDSs (23). The second was the presence of unrelatedpalindromic sequences (capable of generating hairpins with aminimal stem length of 15 bp) at the 3 end of 72% of allMimivirus mapped transcripts (12). The AAAATTGA motif waslater shown to be strongly correlated to early expressed tran-scripts (24), and the predicted hairpins were demonstrated toserve as polyadenylation signals (12, 24). We examined the 100-nt region upstream of the predicted start codon of the 1,120Megavirus CDSs. Overall 446 (40%) were found to contain theexact AAAATTGA motif. This proportion was 33.8% (201/594)among Megavirus genes with orthologs in Mimivirus. For 170 ofthese (85%), the AAAATTGA sequence was also found up-stream of Mimivirus orthologs. This ratio shows that (i) Mega-virus and Mimivirus are using the same motif to specify earlygene expression, and (ii) that the expression pattern of orthol-ogous genes is globally well conserved. Detailed statistics on thedistribution of early and late promoter elements are presented inTable S1B.Similarly, we searched the Megavirus genome 3 intergenic

    regions for palindromic sequences obeying the same constraintsused to identify them in Mimivirus. Nine hundred fty-fourMegavirus CDSs (85%) genes were found to be followed by

    Megavirus CDSs

    Mim

    iviru

    s C

    DS

    s

    1 250 500 750 1120

    125

    050

    075

    097

    9

    1 233000 550000 871000 1259197

    1

    279000

    612000

    913000

    1181549

    ivi

    miM

    rsu

    snoit

    isop

    ci

    mone

    gMegavirus genomic positionsA

    B

    Fig. 2. Comparison of Mimivirus and Megavirus genomes. (A) Colinearity ofMimivirus and Megavirus CDSs. Each dot symbolizes the best BLAST match (evalue 105) between the CDSs of the two viruses, in the same orientation(blue) or in reverse orientation (red). (B) The distribution of homologousCDSs. Megavirus 594 CDSs with Mimivirus orthologs are shown in red. The268 additional Megavirus CDSs with a signicant (nonreciprocal) match inMimivirus CDSs are shown in blue. CDSs specic to Megavirus are shown inyellow. Orthologs clearly cluster in the central region, whereas the two othercategories of CDSs (e.g., duplicated in or specic to Megavirus) tend tocluster at the extremities.

    Arslan et al. PNAS Early Edition | 3 of 6

    MICRO

    BIOLO

    GY

  • a suitable predicted hairpin. The proportion was also 85% forMegavirus genes orthologous to Mimivirus genes exhibiting ahairpin. This correlation again suggests that these hairpins arethe termination signal of Megavirus transcripts and that thetranscript structures are well conserved between the two viruses.This prediction was experimentally veried by sequencing the 3end of the cDNA of Megavirus mg464, orthologous to Mimivirusmajor capsid (MCP) gene, L425. These two CDSs, 78% identicalat the nucleotide level, encode two proteins sharing 79% ofidentical residues. As shown in Fig. S5, the polyadenylation ofthe Megavirus MCP mRNA occurs within the predicted hairpin,albeit 14 nucleotides upstream of the site used in Mimivirus.Given the low level of sequence similarity between these 3UTRs (
  • Finally, Megavirus mg431 encodes a uridine monophosphatekinase, an enzyme previously undescribed in a DNA virus. Thisenzyme is 44% identical to its closest homologs in bacteria whereit is the rate-limiting enzyme of the pyrimidine salvage pathway(Table S1C).

    Lineage-Specic Genome Features. The progressive accumulationof point mutations (random drift) appears to be the main causeof the divergence between the Megavirus and Mimivirusgenomes. The orthologous CDSs, as well as the orthologousintergenic regions, exhibit an average similarity of 65% identicalnucleotides, higher than the 50% identical residues shared byorthologous proteins (Fig. S1).Two hundred fty-eight Megavirus CDSs exhibit no obvious

    homolog in Mimivirus and, reciprocally, 186 Mimivirus CDSshave no homolog in Megavirus. More than 85% of these lineage-specic CDSs correspond to proteins without functional pre-dictions. They appear to cluster at the ends of the Megaviruschromosome (Fig. 2B). These CDSs might also correspond tofast evolving proteins pushed below our similarity threshold (e 50. A very similar tree was computed by using the default option of the Phylogeny.fr server (alignment with MUSCLE andtree reconstruction by PhyML) (2). Despite being one of the least canonical of the AARS (sensu Woese et al.; see ref. 3), the Megavirus AsnRS nicely separatesthe archeal enzymes from all of the eukaryotic (including mitochondrial) types known to intermix with bacterial enzymes. (B) TrpRSs (mg844): This PhyML tree(rooted on the Archaea branch) was computed on the Phylogeny.fr server (2) from 327 conserved positions. The nodes are labeled with their bootstrap valueswhen >50. A similar tree was computed by using the default option of the MAFFT server (tree reconstruction by neighbor joining). The Megavirus TrpRS isbranching off the eukaryotic domain before the radiation of all clades. (C) DNA polymerase (mg582). This neighbor-joining tree was computed from the 523ungapped positions of an alignment of 38 DNA polymerases sequences from the main large DNA virus families: Poxviridae, Iridoviridae, Herpesviridae, As-farviridae, Marseilleviridae, and Phycodnaviridae. Megavirus, Mimivirus, and Terra-2 form a tight cluster (in red) within a larger well supported group ofunclassied (unc) aquatic viruses including those with the largest known genome sizes. This group (red and green) shares a number of distinctive features andis proposed to constitute a new family: the Megaviridae. PoV, Pyramimonas orientalis virus (560 kb); CroV, Cafeteria roenbergensis virus (730 kb); PpV,Phaeocystis pouchetii virus (485 kb); CeV, Chrysochromulina ericina virus (510 kb); OLV, Organic Lake virus (1 and 2); HaVDNA, Heterosigma akashiwo DNAvirus; PBCV, Paramecium bursaria Chlorella virus; ATCV, Acanthocystis turfacea Chlorella virus; BpV, Bathycoccus sp. RCC1105 virus; OsV, Ostreococcus virus;

    Legend continued on following page

    Arslan et al. www.pnas.org/cgi/content/short/1110889108 5 of 6

  • Mimivirus L244

    Megavirus mg339

    1084

    1548

    1548

    866 787 78 288

    1661 34 332

    L247 L246 L245

    mg341 mg340

    1197175 460

    146 340

    125 320

    135 394

    125 325681 993

    1071909

    Fig. S8. Complex reorganization of the RPB2 gene in Megavirus. Exons are shown in blue, introns in brown, and the intein in green. Numbers correspond toDNA segment sizes in base pairs. The rst exon is the only gene segment for which there is a one-to-one correspondence between Mimivirus and Megavirus.The second Megavirus exon incorporates most of the coding sequence of Mimivirus second and third exons, in addition to a 1,084-bp intein (1). The purple andorange boxes correspond to the last (H) and the rst (S) amino acid of the N-terminal and C-terminal exteins, respectively. The third Megavirus exon corre-sponds to the end of the Mimivirus third exon and its entire fourth exon.

    1. Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9:286e298.2. Dereeper A, et al. (2008) Phylogeny.fr: Robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 36(Web Server issue):W465W469.3. Woese CR, Olsen GJ, Ibba M, Sll D (2000) Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev 64:202e236.

    1. Perler FB (2002) InBase: The intein database. Nucleic Acids Res 30:383e384.

    Other Supporting Information Files

    Table S1 (DOC)

    OtV, Ostreococcus tauri virus; MpV, Micromonas sp. RCC1109 virus; OlV, Ostreococcus lucimarinus virus; EhV, Emiliania huxleyi virus; FsV, Feldmannia speciesvirus; EsV, Ectocarpus siliculosus virus; WIV,Wiseana iridescent virus; IIV, Invertebrate iridescent virus; LDV, Lymphocystis disease virus; ISKNV, Infectious spleenand kidney necrosis virus; ASFV, African swine fever virus.

    Arslan et al. www.pnas.org/cgi/content/short/1110889108 6 of 6