author's personal copy - university of british columbia · 2014-07-15 · author's...
TRANSCRIPT
This article appeared in a journal published by Elsevier. The attachedcopy is furnished to the author for internal non-commercial researchand education use, including for instruction at the authors institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling orlicensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of thearticle (e.g. in Word or Tex form) to their personal website orinstitutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies areencouraged to visit:
http://www.elsevier.com/authorsrights
Author's personal copy
The others: our biased perspective ofeukaryotic genomesJavier del Campo1,2, Michael E. Sieracki3, Robert Molestina4, Patrick Keeling2,Ramon Massana5, and Inaki Ruiz-Trillo1,6,7
1 Institut de Biologia Evolutiva, CSIC-Universitat Pompeu Fabra, Barcelona, Catalonia, Spain2 University of British Columbia, Vancouver, BC, Canada3 Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA4 American Type Culture Collection, Manassas, VA, USA5 Institut de Cie ncies del Mar, CSIC, Barcelona, Catalonia, Spain6 Departament de Gene tica, Universitat de Barcelona, Barcelona, Catalonia, Spain7 Institucio Catalana de Recerca i Estudis Avancats (ICREA), Barcelona, Catalonia, Spain
Understanding the origin and evolution of the eukaryoticcell and the full diversity of eukaryotes is relevant tomany biological disciplines. However, our current under-standing of eukaryotic genomes is extremely biased,leading to a skewed view of eukaryotic biology. Weargue that a phylogeny-driven initiative to cover the fulleukaryotic diversity is needed to overcome this bias. Weencourage the community: (i) to sequence a representa-tive of the neglected groups available at public culturecollections, (ii) to increase our culturing efforts, and (iii)to embrace single cell genomics to access organismsrefractory to propagation in culture. We hope that thecommunity will welcome this proposal, explore theapproaches suggested, and join efforts to sequencethe full diversity of eukaryotes.
The need for a phylogeny-driven eukaryotic genomeprojectEukaryotes are the most complex of the three domains oflife. The origin of eukaryotic cells and their complexityremains one of the longest-debated questions in biology,famously referred to by Roger Stanier as the ‘greatestsingle evolutionary discontinuity’ in life [1]. Thus, under-standing how this complex cell originated and how itevolved into the diversity of forms we see today is relevantto all biological disciplines including cell biology, evolu-tionary biology, ecology, genetics, and biomedical research.Progress in this area relies heavily on both genome datafrom extant organisms and on an understanding of theirphylogenetic relationships.
Genome sequencing is a powerful tool that helps us tounderstand the complexity of eukaryotes and their evolu-tionary history. However, there is a significant bias ineukaryotic genomics that impoverishes our understandingof the diversity of eukaryotes, and leads to skewed views of
what eukaryotes even are, as well as their role in theenvironment. This bias is simple and widely recognized:most genomics focuses on multicellular eukaryotes andtheir parasites. The problem is not exclusive to eukaryotes.The launching of the so-called ‘Genomic Encyclopedia ofBacteria and Archaea’ [2] has begun to reverse a similarbias within prokaryotes, but there is currently no equiva-lent for eukaryotes. Targeted efforts have recently beeninitiated to increase the breadth of our genomic knowledgefor several specific eukaryotic groups, but again these tendto focus on animals [3], plants [4], fungi [5], their parasites[6], or opisthokont relatives of animals and fungi [7].Unfortunately, a phylogeny-driven initiative to sequenceeukaryotic genomes specifically to cover the breadth oftheir diversity is lacking. The tools already exist to over-come these biases and fill in the eukaryotic tree, and we
Opinion
Glossary
18S rDNA: genes encoding the RNA of the small ribosomal subunit are found
in all eukaryotes in many copies per genome. They are also highly expressed
and its nucleotide structure combine well-conserved and variable regions.
Because of these characteristics 18S rDNA has been used as a marker to
identify and barcode eukaryotes at the species or genus level (with some
exceptions). It is also the most widely used eukaryotic phylogenetic marker.
Culturing bias: cultured microbial strains do not necessarily represent, and
usually are not, the dominant members of the environment from which they
were isolated. This bias affects bacteria, viruses, and protists. The culturing
bias can be the result of a lack of continuous culturing efforts, or inadequate
isolation and/or culturing strategies – or because, for whatever reason, some
species in the environment may be refractory to isolation and culturing.
Genomes OnLine Database (GOLD): an online resource for comprehensive
access to information regarding genome and metagenome sequencing
projects, and their associated metadata (http://www.genomesonline.org/).
Operational taxonomic unit (OTU): an operational definition of a species or
group of species. In microbial ecology, and in particular protist ecology, this
operational definition is generally based in a percentage similarity threshold of
the 18S rDNA (e.g., OTU97 refers to a cluster of sequences with >97% similarity
that are inferred to represent a single taxonomic unit).
Single amplified genomes (SAGs): the products of single cell whole-genome
amplification that can be further analyzed in similar ways to DNA extracts from
pure cultures.
Single cell genomics (SCG): a method to amplify and sequence the genome of a
single cell. The method consists of an integrated pipeline that starts with the
collection and preservation of environmental samples, followed by physical
separation, lysis, and whole-genome amplification from individual cells. This is
followed by sequencing of the resulting material. SCG is a powerful complement
to culture-based and environmental microbiology approaches [23].
0169-5347/
� 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tree.2014.03.006
Corresponding authors: del Campo, J. ([email protected]); Ruiz-Trillo,I. ([email protected]).Keywords: eukaryotic genomics; phylogeny; ecology; eukaryotic tree of life; culturecollections; culturing bias; single cell genomics.
252 Trends in Ecology & Evolution, May 2014, Vol. 29, No. 5
Author's personal copy
therefore hope that researchers will be inspired to explorethese tools and embrace the prospect of working towards acommunity-driven initiative to sequence the full diversityof eukaryotes.
The multicellular effectIt is not surprising that the first and main bias in the study ofeukaryotes arises from our anthropocentric view of life.More than 96% of the described eukaryotic species are eitherMetazoa (animals), Fungi, or Embryophyta (land plants) [8](Figure 1A) – which we call the ‘big three’ of multicellularorganisms (even though the Fungi also include unicellularmembers such as the yeasts). However, these lineages onlyrepresent 62% of the 18S rDNA (see Glossary) Genbanksequences (Figure 1B), which is of course a biased sample, or23% of all operational taxonomic units (OTUs) in environ-mental surveys (Figure 1C). This bias is not new; researchhas historically focused on these three paradigmatic eukary-otic kingdoms, which are indeed important, but are alsosimply more conspicuous and familiar to us. In genomicsthis bias is amplified considerably: 85% of the completed orprojected genome projects {as shown by the Genomes On-Line Database (GOLD) [9]} belong to the ‘big three’(Figure 1D). Moreover, even within these groups thereare biases. For example, many diverse invertebrate groupssuffer from a lack of genomic data as keenly as do microbialgroups. This makes for a pitiful future if we aim to under-stand and appreciate the complete eukaryotic tree of life. Ifwe do not change this trend we risk neglecting the majorityof eukaryotic diversity in future genomic or metagenomic-based ecological and evolutionary studies. This would pro-vide us with a far from realistic picture.
The ‘multicellular bias’ is the most serious, but is notalone. The eukaryotic groups with most species depositedin culture collections and/or genome projects are alsobiased towards either those containing mainly photo-trophic species or those that are parasitic and/or economi-cally important (Figure 2). For example, bothArchaeplastida and Stramenopila have more cultured spe-cies than other eukaryotes as a result of a long phycologicaltradition and the well-provided phycological culture collec-tions [10], and also because they are easier to maintain inculture than heterotrophs. In both cases this translates to acomparatively large number of genome projects: severalgenomic studies target photosynthetic stramenopiles[11,12] and, owing to their economic relevance in theagriculture, the peronosporomycetes [13]. In addition,the apicomplexans within the Alveolata are also relativelywell studied at the genomic level because they containimportant human and animal parasites [14] such as Plas-modium and Toxoplasma. If we look instead at the numberof sequenced strains rather than species, these biases areincreased further (Figure 3). As a result, a significantproportion of the retrieved cultures and genomes corre-spond to different strains of the same dominant species.Therefore, we have a pool of species that have been redun-dantly cultured and sequenced.
The missing branches of the eukaryotic tree of lifeAlthough we lack an incontrovertible, detailed phylogenet-ic tree of the eukaryotes, a consensus tree is emerging
Other pro�stsHaptophytaChryptophyceaeStramenopilaRhizariaOpisthokontaExcavataArchaeplas�daAmoebozoaAlveolata
(B) (C) (D) (E)
0
20
40
60
80
100(A)
Total 18S
Cultures
Genomes
Described
Env 18S
TRENDS in Ecology & Evolution
Figure 2. Relative representation of eukaryotic supergroup diversity in different
databases. (excluding metazoans, fungi, and land plants). (A) Percentage of
described species per eukaryotic supergroup according to the CBOL ProWG. (B)
Percentage of 18S rDNA OTU97 per eukaryotic supergroups in GenBank. (C)
Percentage of environmental 18S rDNA OTU97 per eukaryotic supergroups. (D)
Percentage of species with a cultured strain in any of the analyzed culture collections.
Culture data are from five large protist culture collections (n = 3084) (the American
Type Culture Collection, Culture Collection of Algae and Protozoa [24], the Roscoff
Culture Collection [25], the National Center for Marine Algae and Microbiota [26] and
the Culture Collection of Algae at Gottingen University [27]). (E) Relative numbers of
species with a genome project completed or in progress according to GOLD, per
eukaryotic group. Data from panels A–C are from [8]. Data from panels D and E are
publicly available and the taxonomic analysis can be found in the supplementary
data online. Abbreviations: CBOL ProWG, Consortium for the Barcode of Life Protist
Working Group; Env 18S, environmental 18S rDNA sequences; GOLD, Genomes
OnLine Database; OTU97, operational taxonomic unit (>97% sequence identity).
TRENDS in Ecology & Evolution
Metazoa Fungi Embryophyta Pro�sts
GenomesDescribed species
Total18S rDNA
Environmental18S rDNA
(A)
(B)
(C)
(D)
Figure 1. Relative representation of metazoans, fungi, and land plants versus all
the other eukaryotes in different databases. (A) Relative numbers of described
species according to the CBOL ProWG (n = 2 001 573). (B) Relative numbers of 18S
rDNA OTU97 in GenBank (n = 22 475). (C) Relative number of environmental 18S
rDNA OTU97 in GenBank (n = 1165). (D) Relative number of species with a genome
project completed or in progress according to GOLD, per eukaryotic group
(n = 1758). Data in panels A–C are from [8]. Abbreviations: CBOL ProWG,
Consortium for the Barcode of Life Protist Working Group; GOLD, Genomes
OnLine Database; OTU97, operational taxonomic unit (>97% sequence identity).
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
253
Author's personal copy
thanks to molecular phylogenies [15]. The five monophy-letic supergroups of eukaryotes are summarized in Box 1.The distribution of cultured and sequenced species over thetree provides a broad overview of our current knowledge of
eukaryotic diversity (Figure 4). However, a quarter of therepresented lineages lack even a single culture in any ofthe analyzed culture collections and, notably, 51% of themlack a genome. The most important gaps are within the
105
1580
0
20
40
60
80
100
120
0
0
50
100
150
200
250
300
20
40
60
80
100
120
140
0
0
129
258
290
3083
Colorless Pigmented Both
**
* **
* **
* * ** * * *
*
*Parasi�c organisms
Ince
rtae
sedi
s dic
tyoc
hoph
ycea
e
Ince
rtae
sedi
s dic
tyoc
hoph
ycea
e
Ince
rtae
sedi
s opi
stho
kont
a
Ince
rtae
sedi
s euk
aryo
ta
**
(A) The 25 species with the most strains in culture collec�ons
(B) The 25 species with the most ongoing genome projects
(C) The 25 most abundant SAGs OTU97
TRENDS in Ecology & Evolution
Figure 3. Eukaryotic diversity distribution among the analyzed databases. (A) The 25 speciesa with the most strains represented in the analyzed culture collections. (B) The
25 speciesa with the most ongoing genome projects. (C) The 25 most abundant SAGs OTU97 in the analyzed dataset. Abbreviations: MAST, marine stramenopile; OTU97,
operational taxonomic unit (>97% sequence identity); SAG, single amplified genome.aSome strains are not described at the species level and have been grouped by genus. Therefore they may represent more than a single species.
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
254
Author's personal copy
Rhizaria, the Amoebozoa, and the Stramenopila, wheremany lineages are still underrepresented. However, manyother lineages that lack any representative genome se-quence are also found in the relatively well-describedOpisthokonta and Excavata groups. This map is likely tobe incomplete because several genome projects may not bereflected in the GOLD database, and because many cul-tures are not deposited in culture collections, but theoverall trends probably afford an accurate representationof the biases we currently face.
Filling the gaps: how toAlthough there may not be bad choices when selectingorganisms for genome sequencing, there are certainlybetter choices if we aim to understand eukaryotic diversity.We argue that at least some of the effort should be specifi-cally directed towards filling the gaps in the eukaryotictree of life, focusing on those lineages that occupy keyphylogenetic positions. How can that be done? One optionis to sequence more cultured organisms. In fact, 95% ofprotist species in culture are not yet targeted for a genomeproject (Figure S1 in the supplementary data online).Thus, by obtaining the genome of some available culturedlineages that have not yet been sequenced, we could easilyfill some of the important gaps of the tree, including someheterotrophic Stramenopila, Amoebozoa, and Rhizaria.However, selecting species that are available in cultureis itself strongly biasing, and most lineages remain withoutany cultured representative [16]. Publicly accessible
protist collections [such as the American Type CultureCollection (ATCC) and the Culture Collection of Algaeand Protozoa (CCAP); summarized in Box 2] are consid-erably smaller than their bacterial or fungal counter-parts. Among the reasons is the lack of a required,systematic deposit of newly described taxa, in contrastto the situation for bacteria [17]. Notably, and unfortu-nately, half of the species with genome projects completedor in progress are not deposited in any of the five analyzedpublicly accessible culture collections. To avoid more ‘lostcultures’ in the future the community should establishand adopt standard procedures similar to those used inbacteriology to release cultures to protist collections. Thewhole community will benefit from this in the short andlong term. In addition, there is an inherent technical biasin culturing, as well as a bias in culturing efforts. Forexample, phototrophic representatives of Stramenopilaand Alveolata tend to have more cultures available thantheir heterotrophic counterparts (Figure 4). Indeed,70.6% of the most common protist strains present inculture collections are phototrophic organisms(Figure 3). Therefore there is a need both to increasethe culturing effort for a wider variety of environmentsand to develop novel and alternative culture techniques toretrieve refractory organisms [18], both of which taketime, energy, and funding. Importantly, culture collec-tions will need to be supported so that they can take on thechallenge of maintaining more cultures and open theirscope to include more difficult organisms that tend to be
Box 1. The five eukaryotic supergroups
Thanks to molecular phylogenetics, to ultrastructural analyses, and to
the efforts of many researchers, we have in recent years advanced
significantly our understanding of the tree of eukaryotes. According to
the most recent consensus taxonomy [28], the eukaryotes can be
divided into five monophyletic supergroups. We here introduce these
supergroups, detailing some specific features of each.
Amoebozoa: this group consists of amoeboid organisms, most of
them possessing a relatively simple life cycle and limited morpholo-
gical features, as well as a few flagellated organisms [30]. They are
common free-living protists inhabiting marine, freshwater, and
terrestrial environments. Some well-known amoebozoans include
the causative agent of amoebiasis (Entamoeba histolytica) and
Dictyostelium sp., a model organism used in the study of the origin
of multicellularity.
Archaeplastida: also known as ‘the green lineage’ or Viridiplantae,
this group comprises the green algae and the land plants. The
Archaeplastida is one of the major groups of oxygenic photosynthetic
eukaryotes [31]. Green algae are diverse and ubiquitous in aquatic
habitats. The land plants are probably the most dominant primary
producers on terrestrial ecosystems. Both green algae and land plants
have historically played a central role in the global ecosystem.
Excavata: the group Excavata was proposed based of shared
morphological characters [32], and was later confirmed through
phylogenomic analyses [33]. Most members of this group are
heterotrophic organisms, among them some well-known human
parasites such as Trichomonas vaginalis (the agent of trichomoniasis)
and Giardia lamblia (the agent of giardiasis), as well as animal
parasites such as Leishmania sp. (the agent of leishmaniasis) as well
as Trypanosoma brucei, and Trypanosoma cruzi (the agents of
sleeping sickness and Chagas disease respectively).
Opisthokonta: the opisthokonts include two of the best-studied
kingdoms of life: the Metazoa (animals) and the Fungi. Recent
phylogenetic and phylogenomic analyses have shown that the
Opisthokonta also include several unicellular lineages [34]. These
include the Choanoflagellata (the closest unicellular relatives of the
animals) and the Ichthyospora (that include several fish parasites that
impact negatively on aquaculture).
SAR (Stramenopila - Alveolata, and Rhizaria): three groups that
have been historically studied separately. Phylogenetic analyses,
however, have shown that those three groups share a common
ancestor, forming a supergroup known as SAR [36]. This eukaryotic
assemblage comprises the highest diversity within the protists.
Stramenopila: also known as heterokonts, the stramenopiles
include a wide range of ubiquitous phototrophic and heterotrophic
organisms [37]. Most are unicellular flagellates but there are also
some multicellular organisms, such as the giant kelps. Other relevant
members of the Stramenopila are the diatoms (algae contained within
a silica cell wall), the chrysophytes (abundant in freshwater environ-
ments), the MAST (marine stramenopile) groups (the most abundant
microbial predators of the ocean), and plant parasites such as the
Peronosporomycetes.
Alveolata: a widespread group of unicellular eukaryotes that have
adopted diverse life strategies such as predation, photoautotrophy,
and intracellular parasitism [29]. They include some environmen-
tally relevant groups such as the Syndiniales, the Dinoflagellata,
and the ciliates (Ciliophora), as well as the Apicomplexa group that
contains notorious parasites such as Plasmodium sp. (the agent of
malaria), Toxoplasma sp. (the agent of toxoplasmosis), and
Cryptosporidium sp.
Rhizaria: this is a diverse group of mostly heterotrophic unicellular
eukaryotes including both amoeboid and flagellate forms [35]. Two
iconic protist groups, Haeckel’s Radiolaria and the Foraminifera, are
members of the Rhizaria. Foraminifera have been very useful in
paleoclimatology and paleoceanography due to their external shell
that can be detected in the fossil record.
Incertae sedis: Latin for ‘of uncertain placement’, a term used to
indicate those organisms or lineages with unclear taxonomical
position.
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
255
Author's personal copy
excluded from existing collections, in particular hetero-trophs.
A complementary option to increase the breadth ofeukaryotic genomics is to use single cell genomics (SCG)[19]. Although the technology is still developing, this isprobably the best way we have today to retrieve genomicinformation from abundant microbial eukaryotes that areecologically relevant but are refractory to being cultured.For example, the single amplified genomes (SAGs) from
different global oceanic sites obtained during the TaraOceans cruise (M.E.S., unpublished data) fill reasonablywell the culture and genomic gaps that some of the mostabundant groups in the oceans suffer from (Figure 4). Inparticular, a significant fraction of the SAGs correspond touncultured organisms such as the marine stramenopilesMAST-4 and MAST-7 [20], chrysophyte groups H and G[21], and the Syndiniales [22]. Importantly, sequence tag-ging shows that only 10% of the SAGs are present in any
Tsukubamonas
Jakobida
Euglenozoa
HeteroloboseaFornica
ta
Parabasalia
Palpitomonas
Picozoa
Cryptophyceae
Glaucophyta
Embryophyta
Charophyta
PalmophyllalesPrasinophytaeNephroselmisMamiellophyceaePedinophyceae
UlvophyceaeChlorophyceaeTrebouxiophyceae
Chlorodendrophyceae
Rhodophyceae
Centrohelida
TelonemaHaptophyta
ChrysophyceaeSynurales
Eus�gmatales
Pinguiochrysidales
PhaeophyceeSchizocladia
Xanthophyceae
PhaeothamniophyceaeRaphidophyceaeAc�nophryidae
PelagophyceaeDictyochophyceae
DiatomeaBolidomonas
MAST 2MAST 1PeronosporomycetesHyphochytriales
OpalinataBlastocys�s
MAST 12
Bicosoecid
aMAST 3
Placidida
MAST
4 / 7
/ 11
MAST 8 / 1
0
Laby
rinth
ulom
ycet
es
Cilio
phor
a
Perk
insid
ae
Synd
inial
es
Oxyr
rhis
Dinofla
gella
ta
Colp
odel
lida
Chro
mer
ida
Apico
mpl
exa Chlorarachniophyta
Imbricatea
Glissomonadida
Cercomonadidae
Pansomonadida
ThecofiloseaGranofiloseaM
etromonadea
Tremula
Phytomyxea
Ascetosporea
Filoreta
Gromia
Vampyrellida
Foraminifera
Acantharia
Polycis�nea
Cultures
Key:
GenomesSAGs
ColorlessPigmentedBoth
OpisthokontaAmoebozoaExcavataArchaeplas�daRhizariaAlveolataStramenopilaIncertae sedis
Collo
dictyo
nidae
Breviatea Myx
ogas
tria
Schi
zopl
asm
odiid
aGr
acili
podi
daFr
acto
vite
lliid
aPr
otos
telii
daCa
vost
eliid
aPr
otos
pora
ngiid
aDi
ctyo
stel
iaM
ul�c
iliaTu
bulin
eaDisc
osea
Arch
amoe
bae
Apusomonadida
Ancyromonadida
AphelidaNuclearia
CorallochytriumFilasterea
Choanomonada
Metazoa
IchthyosporeaFungi
Rozella
Fon�cula
Mantam
onas
Malawim
onas
Preaxosty
la
Rigifilida
TRENDS in Ecology & Evolution
Figure 4. The tree of eukaryotes, showing the distribution of current effort on culturing, genomics, and environmental single amplified genome (SAG) genomics for the
main protistan lineages. Eukaryotic schematic tree representing major lineages. Colored branches represent the seven main eukaryotic supergroups, whereas grey
branches are phylogenetically contentious taxa. The sizes of the dots indicate the proportion of species/OTU97 in each database. Culture data are from the analyzed publicly
available protist culture collections (n = 3084). Genome data were extracted from the Genomes OnLine Database (GOLD) (n = 258) [9]. SAGs of OTU97 correspond to those
retrieved during the Tara Oceans cruise (n = 158) (M.E.S., unpublished data). Taxonomic annotation of all datasets is based on [28]. The ‘big three’ (in bold) have been
excluded from this analysis. Abbreviation: OTU97, operational taxonomic unit (>97% sequence identity).
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
256
Author's personal copy
culture collection, and only 2.5% have an ongoing genomeproject (based on cultured taxa). It is worth mentioningthat the SAGs so far available represent only marinemicroeukaryotes. Thus, although the analyzed SAGs cer-tainly overcome part of the bias, they do not cover the fulldiversity of eukaryotes.
Given the potential of SAGs to improve further ourunderstanding of eukaryotic diversity, an important ques-tion to ask is whether high-quality genome data can beacquired from SAGs [19]. Currently, there seems to be adiversity of outcomes when using SAGs owing to the biasintroduced by the whole-genome amplification procedure.The completeness range of the retrieved genome varies fromless than 10% to a complete genome, and depends on theintrinsic properties of the cell studied as well as on theamplification method [23]. Culture certainly provides amore reliable way to obtain a genome of high quality atpresent, and a species in culture also provides researcherswith a direct window to the biology of the organism and post-genomic research. Auto-ecological experiments, ultrastruc-ture analyses, and even functional experiments can all beperformed in culture, thereby providing a deeper context forthe genome and the organism. However, in light of the lack ofdata we currently face, and the unlikelihood that a signifi-cant increase in resources for cultivation will soon appear,we argue strongly that genomic sequencing of SAGs is animportant complement to culture-based research in further-ing our understanding of eukaryotic diversity.
Make the tree thrive: a call to actionGenome sequences have cast invaluable light on the clas-sification of organisms, notably in many cases where par-ticular species were misclassified (Box 3). However, theavailable genome sequences of eukaryotes do not inform usonly about the biology of the particular organism. They also
make significant contributions to our understanding ofeukaryotic biology in general, and to large-scale evolution-ary and ecological processes. Nevertheless, for this poten-tial to be completely fulfilled we must sample broadly, andthere are currently important gaps in the diversity ofeukaryotic genome sequences that undermine our effortsto capitalize on this potential. Understanding the whole ofeukaryotic diversity will doubtless contribute to our un-derstanding of specific biological questions, including someof our more pernicious problems in medicine, agriculture,evolution, and ecology.
We propose that filling in the eukaryotic tree at thegenomic level based on phylogenetic diversity should be apriority for the community. We also argue that this can beachieved by a combination of three complementaryapproaches. First, at least one genome from underrepre-sented lineages from which cultures are available shouldbe sequenced. This is a straightforward problem, requiringphycologists, protistologists, culture collection curators,and genomic sequencing centers to coordinate effortsand expertise to choose the best target taxa and sequencingstrategies. Second, efforts to culture diverse organismsshould be supported, by sampling additional areas of theplanet, developing novel techniques to include more recal-citrant species (especially heterotrophs), and by rewardingthis difficult but essential task, especially in youngerresearchers before they conclude en masse that such crucialwork is a professional dead-end. Such efforts are time-consuming and have a built-in failure rate that makesthem risky, and therefore policy changes will be helpfulin order that funding agencies, universities, and researchcenters recognize the value of such work independently ofthe publication outcome. Finally, microbial ecologists andgenomic centers should embrace the use of SCG and con-tinue to improve the technology, which we believe will be
Box 2. Protist culture collections
Culture collections are cornerstones for the development of all
microbiological disciplines. Cultures are key to the establishment of
model organisms and, therefore, to a better understanding of their
biology. Below we describe some of the major protistan collections.
ATCC (American Type Culture Collection; Manassas, Virginia,
USA): a private, non-profit biological resource center established in
1925 with the aim of creating a central collection to supply
microorganisms to scientists all over the world (http://www.atcc.org).
ATCC collections include a great variety of biological materials such
as cell lines, molecular genomics tools, microorganisms, and
bioproducts. The microorganism collection includes more than 18
000 strains of bacteria, 3000 different types of viruses, over 49 000
yeast and fungal strains, and 2000 strains of protists.
CCAP (Culture Collection of Algae and Protozoa; Oban, Scotland,
UK): a culture collection funded by the UK Natural Environmental
Research Centre (NERC) that contains algae and protozoa from both
freshwater and marine environments. The foundations of CCAP
(http://www.ccap.ac.uk) were laid by Prof. Ernst Georg Pringsheim
and his collaborators and the cultures they established at the
Botanical Institute of the German University of Prague in the 1920s.
Pringsheim moved to England where the collection was expanded
and taken over by Cambridge University in 1947. In 1970 these
cultures formed the basis of the Culture Centre of Algae and Protozoa
that later became the modern CCAP.
NCMA (Provasoli-Guillard National Center for Marine Algae and
Microbiota, East Boothbay, Maine, USA): this integrated collection of
marine algae, protozoa, bacteria, archaea, and viruses was named a
National Center and Facility by the US Congress in 1992. The NCMA
(http://ncma.bigelow.org) originated from private culture collections
established by Dr Luigi Provasoli at Yale University and Dr Robert R.L.
Guillard at Woods Hole Oceanographic Institution. When it was born
in the 1980s it was known as the Culture Collection of Marine
Phytoplankton (CCMP) and provided to the community algal cultures
of scientific interest or for aquaculture.
RCC (Roscoff Culture Collection; Roscoff, France): this collection
(http://www.roscoff-culture-collection.org) is located at the Station
Biologique de Roscoff and is closely linked to the Oceanic Plankton
group of this institution. They maintain more than 3000 strains of
marine phytoplankton, especially picoplankton and picoeukaryotes
from various oceanic regions. Most of the strains are available for
distribution whereas others are in the process of being described.
SAG (Sammlung von Algenkulturen: Culture Collection of Algae at
Go ttingen University, Go ttingen, Germany): the SAG is a non-profit
organization maintained by the University of Gottingen (http://
www.epsag.uni-goettingen.de). The collection primarily contains
microscopic algae and cyanobacteria from freshwater or terrestrial
habitats, but there are also some marine algae. With more than 2400
strains, the SAG is among the three largest culture collections of
algae in the world. Prof. Pringsheim is also the founder of the SAG: it
was initiated in 1953 when he returned to Gottingen after his time as a
refugee scientist in England. From then on the Pringsheim algal
collection has been growing and evolving into the service collection
we know nowadays.
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
257
Author's personal copy
the key to filling in missing parts of the tree in the shortterm. To coordinate all these efforts, funding agenciesshould also support the development of communityresources such as publicly accessible culture collectionsand the maintenance of key taxa that are difficult to keep.
We believe strongly that the time is ripe to reverse thegenome sequencing bias in the tree of eukaryotes. We nowhave in our hands all the elements needed to change thisskewed view and further our understanding of eukaryoticbiology and evolution. All that needs to change is the willand a joint coordinated initiative. Thus, we hope that theeukaryotic community will welcome this proposal to build arepresentative and diverse ‘Genomic Encyclopedia ofEukaryotes’ and collaborate to make this happen.
Appendix A. Supplementary dataSupplementary data associated with this article can be found, in the onlineversion, at http://dx.doi.org/10.1016/j.tree.2014.03.006.
References1 Stanier, R.Y. et al. (1957) The Microbial World, Prentice-Hall2 Wu, D. et al. (2009) A phylogeny-driven genomic encyclopaedia of
Bacteria and Archaea. Nature 462, 1056–10603 Pennisi, E. (2009) No genome left behind. Science 326, 794–7954 Bennetzen, J. and Kellogg, E. (1998) A plant genome initiative. Plant
Cell 10, 488–4945 Galagan, J.E. et al. (2005) Genomics of the fungal kingdom: insights
into eukaryotic biology. Genome Res. 15, 1620–16316 Degrave, W.M. et al. (2001) Parasite genome initiatives. Int. J.
Parasitol. 31, 532–5367 Ruiz-Trillo, I. et al. (2008) A phylogenomic investigation into the origin
of metazoa. Mol. Biol. Evol. 25, 664–6728 Pawlowski, J. et al. (2012) CBOL Protist Working Group, barcoding
eukaryotic richness beyond the Animal, Plant, and Fungal Kingdoms.PLoS Biol. 10, e1001419
9 Pagani, I. et al. (2012) The Genomes OnLine Database (GOLD) v.4,status of genomic and metagenomic projects and their associatedmetadata. Nucleic Acids Res. 40, D571–D579
10 Day, J.G. et al. (2004) Pringsheim’s living legacy: CCALA, CCAP, SAGand UTEX culture collections of algae. Nova Hedwigia 79, 27–37
11 Bowler, C. et al. (2008) The Phaeodactylum genome reveals theevolutionary history of diatom genomes. Nature 456, 239–244
12 Cock, J.M. et al. (2010) The Ectocarpus genome and the independentevolution of multicellularity in brown algae. Nature 465, 617–621
13 Pais, M. et al. (2013) From pathogen genomes to host plant processes:the power of plant parasitic oomycetes. Genome Biol. 14, 211
14 Van Dooren, G.G. and Striepen, B. (2013) The algal past and parasitepresent of the apicoplast. Annu. Rev. Microbiol. 67, 271–289
15 He, D. et al. (2014) An alternative root for the eukaryote tree of life.Curr. Biol. 24, 465–470
16 del Campo, J. et al. (2013) Culturing bias in marine heterotrophicflagellates analyzed through seawater enrichment incubations.Microb. Ecol. 66, 489–499
17 Lapage, S.P. et al. (1992) International Code of Nomenclature ofBacteria, ASM Press (http://www.ncbi.nlm.nih.gov/books/NBK8817/)
18 del Campo, J. et al. (2013) Taming the smallest predators of the oceans.ISME J. 7, 351–358
19 Yoon, H.S. et al. (2011) Single-cell genomics reveals organismalinteractions in uncultivated marine protists. Science 332, 714–717
20 Massana, R. et al. (2013) Exploring the uncultured microeukaryotemajority in the oceans: reevaluation of ribogroups withinstramenopiles. ISME J. 8, 854–866
21 del Campo, J. and Massana, R. (2011) Emerging diversity withinChrysophytes, Choanoflagellates and Bicosoecids based onmolecular surveys. Protist 162, 435–448
22 Guillou, L. et al. (2008) Widespread occurrence and genetic diversity ofmarine parasitoids belonging to Syndiniales (Alveolata). Environ.Microbiol. 10, 3349–3365
23 Stepanauskas, R. (2012) Single cell genomics: an individual look atmicrobes. Curr. Opin. Microbiol. 15, 613–620
24 Gachon, C.M.M. et al. (2007) The Culture Collection of Algae andProtozoa (CCAP): a biological resource for protistan genomics. Gene406, 51–57
25 Vaulot, D. et al. (2004) The Roscoff Culture Collection (RCC):a collection dedicated to marine picoplankton. Nova Hedwigia 79,49–70
26 Andersen, R. et al. (1997) Provasoli–Guillard National Center forculture of marine phytoplankton 1997 list of strains. J. Phycol. 33(s6), 1–75
27 Friedl, T. and Lorenz, M. (2012) The Culture Collection of Algae atGottingen University (SAG): a biological resource for biotechnologicaland biodiversity research. Procedia Environ. Sci. 15, 110–117
28 Adl, S.M. et al. (2012) The revised classification of eukaryotes. J.Eukaryot. Microbiol. 59, 429–514
Box 3. The rectification of names and our understanding of eukaryotic biology
A better understanding on the eukaryotic diversity has a deep impact
in several biological disciplines such as medicine, agriculture,
evolution, and ecology. A large body of research backs up this
statement. Below we mention a few examples that illustrate the
power of having a better understanding of the diversity, biology, and
evolution of eukaryotes.
Medicine has greatly benefited from evolutionary studies in
eukaryotes. Studies on the genome and biology of close relatives of
parasites have provided unique insights into analogous molecular
mechanisms involved in the clinical effects of parasites. A proper
taxonomic assignment of pathogenic organisms has also been the
key to fighting them. A good example is Pneumocystis, an
opportunistic pathogen affecting immunocompromised patients,
predominantly HIV-infected. Pneumocystis was considered for years
to a protozoan of unclear taxonomic assignment. It was not until
molecular data allowed researchers to properly assign Pneumocystis
to the fungi, in 1988, that adequate treatments based on antifungal
agents could be used [38]. The opposite situation happened with the
fungus-like Phytophthora, the causative agent of the potato blight.
New molecular data showed that Phytophthora are peronosporomy-
cetes (stramenopiles) within the order Peronosporales, and not fungi
as previously thought, thus explaining the ineffective use of
fungicides [39]. Further knowledge of its genome provided insights
not only into its evolution but also into the potential reasons for its
speed to form resistant forms [40].
It is, however, in evolutionary studies where the impact of having a
broad taxon sampling of eukaryotes is more apparent. Indeed, and
looking back in time, it is clear that the absence of key taxa in
evolutionary analyses led to hypotheses that are now known to be in
error. The fact is that to elucidate which genomic or morphological
features have been conserved, which were ancestral to eukaryotes,
and which are novel, one needs to perform comparative analyses that
must include key taxa from each major eukaryotic lineage. For
example: instances of lineage-specific gene loss in Choanoflagellatea
and Fungi, and the absence of representative taxa from non-parasites
Excavata and Rhizaria, confounded attempts to reconstruct accurately
the gene content of the last unicellular ancestor of metazoans [41] and
the last eukaryotic common ancestor [42], respectively.
Ecology is also influenced by a better understanding of eukaryote
biology. The global ecological cycles are deeply influenced by several
groups of eukaryotes, most of them unicellular. We have a good
understanding of phototrophic eukaryotes that, together with the
Cyanobacteria, drive most of the carbon cycle and the oxygen
production on earth. Nevertheless, our understanding of hetero-
trophic protists remains insufficient. For example, both MASTs and
the Syndiniales are extremely abundant in the oceans [43]. Therefore,
they are surely influential in global processes. However, we cannot
understand their role if we lack information on their metabolic
pathways or biology, something we can only obtain from genomic
data.
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
258
Author's personal copy
29 Fast, N.M. et al. (2002) Re-examining alveolate evolution usingmultiple protein molecular phylogenies. J. Eukaryot. Microbiol. 49,30–37
30 Smirnov, A. et al. (2005) Molecular phylogeny and classification of thelobose amoebae. Protist 156, 129–142
31 Leliaert, F. et al. (2012) Phylogeny and molecular evolution of the greenalgae. CRC Crit. Rev. Plant Sci. 31, 1–46
32 Simpson, A.G.B. and Patterson, D.J. (1999) The ultrastructure ofCarpediemonas membranifera (Eukaryota) with reference to the‘Excavate hypothesis’. Eur. J. Protistol. 35, 353–370
33 Hampl, V. et al. (2009) Phylogenomic analyses support the monophylyof Excavata and resolve relationships among eukaryotic ‘supergroups’.Proc. Natl. Acad. Sci. U.S.A. 106, 3859–3864
34 Torruella, G. et al. (2011) Phylogenetic relationships within theOpisthokonta based on phylogenomic analyses of conserved singlecopy protein domains. Mol. Biol. Evol. 29, 531–544
35 Burki, F. and Keeling, P.J. (2014) Rhizaria. Curr. Biol. 24, 103–107
36 Burki, F. et al. (2007) Phylogenomics reshuffles the eukaryoticsupergroups. PLoS ONE 2, 790
37 Riisberg, I. et al. (2009) Seven gene phylogeny of heterokonts. Protist160, 191–204
38 Edman, J.C. et al. (1988) Ribosomal RNA sequence showsPneumocystis carinii to be a member of the fungi. Nature 334, 519–522
39 Thines, M. and Kamoun, S. (2010) Oomycete–plant coevolution: recentadvances and future prospects. Curr. Opin. Plant. Biol. 13, 427–433
40 Haas, B.J. et al. (2009) Genome sequence and analysis of the Irishpotato famine pathogen Phytophthora infestans. Nature 461, 393–398
41 Sebe-Pedros, A. et al. (2010) Ancient origin of the integrin-mediatedadhesion and signaling machinery. Proc. Natl. Acad. Sci. U.S.A. 107,10142–10147
42 Sebe-Pedros, A. et al. (2014) Evolution and classification of myosins, apaneukaryotic whole genome approach. Genome Biol. Evol. 6, 290–235
43 Not, F. et al. (2009) New insights into the diversity of marinepicoeukaryotes. PLoS ONE 4, 7
Opinion Trends in Ecology & Evolution May 2014, Vol. 29, No. 5
259