Diversity patterns of uncultured Haptophytes unravelled by pyrosequencing in Naples Bay
LUCIE BITTNER,* ANGELIQUE GOBET,*† STEPHANE AUDIC,* SARAH ROMAC,*
ELIANNE S. EGGE,‡ SEBASTIEN SANTINI,§ HIROYUKI OGATA,§ IAN PROBERT,*
BENTE EDVARDSEN‡ and COLOMBAN DE VARGAS*
*CNRS, UMR7144 & Universite Pierre et Marie Curie, Team EPPO, Station biologique de Roscoff, Place Georges Tessier,
Roscoff, France, †Genoscope (CEA), CNRS UMR 8030, Universite d’Evry, 2 rue Gaston Cremieux, BP5706, 91057 Evry,
France, ‡Department of Biology, Marine Biology, University of Oslo, NO-0316 Oslo, Norway, §CNRS, Aix-Marseille
Universite, IGS UMR7256, FR-13288 Marseille, France
Abstract
Haptophytes are a key phylum of marine protists, including ~300 described morphospe-
cies and 80 morphogenera. We used 454 pyrosequencing on large subunit ribosomal
DNA (LSU rDNA) fragments to assess the diversity from size-fractioned plankton sam-
ples collected in the Bay of Naples. One group-specific primer set targeting the LSU
rDNA D1/D2 region was designed to amplify Haptophyte sequences from nucleic acid
extracts (total DNA or RNA) of two size fractions (0.8–3 or 3–20 µm) and two sampling
depths [subsurface, at 1 m, or deep chlorophyll maximum (DCM) at 23 m]. 454 reads
were identified using a database covering the entire Haptophyta diversity currently
sequenced. Our data set revealed several hundred Haptophyte clusters. However,
most of these clusters could not be linked to taxonomically known sequences: considering
OTUs97% (clusters built at a sequence identity level of 97%) on our global data set,
less than 1% of the reads clustered with sequences from cultures, and less than 12% clus-
tered with reference sequences obtained previously from cloning and Sanger sequencing
of environmental samples. Thus, we highlighted a large uncharacterized environmental
genetic diversity, which clearly shows that currently cultivated species poorly reflect the
actual diversity present in the natural environment. The Haptophyte community appeared to
be significantly structured according to depth. The highest diversity and evenness
were obtained in samples from the DCM, and samples from the large size fraction (3–20 µm) taken at the DCM shared a lower proportion of common OTUs97% with the other
samples. Reads from the species Chrysoculter rhomboideus were notably found at the
DCM, while they could not be detected at the subsurface. The highest proportion of totally
unknown OTUs97% was collected at the DCM in the smallest size fraction (0.8–3 µm).
Overall, this study emphasized several technical and theoretical barriers inherent to the
exploration of the large and largely unknown diversity of unicellular eukaryotes.
Keywords: 454 pyrosequencing, DCM, environmental genetic diversity, environmental samples,
Haptophyta, LSU rDNA, LSU rRNA
Received 2 October 2011; revision received 13 September 2012; accepted 20 September 2012
Introduction
Massively parallel pyrosequencing has supplemented
Sanger sequencing in recent years, especially for
environmental exploration of the microbial world
(Margulies et al. 2005; Bik et al. 2012). The use of
high-throughput sequencing technology avoids the time-
consuming and potentially biasing ligation and transforma-
tion steps inherent to classical clone library approaches,
and enables relatively exhaustive study of environmental
biodiversity for a lower cost (Sogin et al. 2006).
Correspondence: Lucie Bittner, Fax: +33 (0)2 98 29 23 23;
E-mail: [email protected]
© 2012 Blackwell Publishing Ltd
Molecular Ecology (2013) 22, 87–101 doi: 10.1111/mec.12108
The targeted locus most commonly chosen to study
microbial community diversity is the small subunit of
the ribosomal DNA or RNA (SSU rDNA or rRNA). This
molecular marker has many advantages, notably being
present and fulfilling the same function in all organisms
(Olsen et al. 1986; Woese 1987). The singular, mosaic
evolutionary rate of this marker facilitates the design of
broad-taxonomic range primers and probes in highly
conserved regions, while comparison of variable
regions is used for diversity studies. Variable SSU
rDNA regions have already been used in many 454
pyrosequencing surveys of the environmental diversity
of Bacteria and Archaea (e.g. Kysela et al. 2005; Sogin
et al. 2006; Huber et al. 2007; Roesch et al. 2007; Huse
et al. 2008; Brown et al. 2009; Barberan et al. 2011;
Eiler et al. 2011) and protists (e.g. Amaral-Zettler et al.
2009; Brown et al. 2009; Stoeck et al. 2009, 2010; Behnke
et al. 2010; Nolte et al. 2010; Cheung et al. 2010; Pawlowski
et al. 2011; Edgcomb et al. 2011; Logares et al. 2012).
The choice of the genomic region to amplify can be
constrained by both analytical (informativeness of the
targeted region for diversity studies, relevance of com-
parative database) and technological steps (e.g. higher
error rates are found when sequences longer than
300 bp are amplified, or long fragments can statistically
involve more homopolymers) (Huse et al. 2007; Schloss
2010; Kunin et al. 2010; Behnke et al. 2010). In particular,
the interpretation of environmental sequences is highly
dependent on the use of a reference database including
as many sequences as possible from taxonomically
described organisms (Stoeck et al. 2010; Pawlowski et al.
2011). SSU rDNA is by far the most common genetic
marker used to identify the strains of bacterial, archaeal
or eukaryotic microbes. However, the heterogeneity
of substitution rates in SSU rDNA can be a weakness
when diversity studies are undertaken at a global
eukaryotic scale. For example, relatively high evolution-
ary rates are observed in SSU rDNA sequences of
Foraminifera, Acantharea and Acanthamoeba (Pawlowski
& Burki 2009; Caron et al. 2009), as compared to Prasin-
ophyceae (Piganeau et al. 2011) or Haptophyta (Liu
et al. 2009). Most pyrosequencing surveys focusing
on environmental unicellular eukaryotic (protistan)
diversity published to date (Stoeck et al. 2009, 2010;
Edgcomb et al. 2011; Pawlowski et al. 2011; Shalchian-
Tabrizi et al. 2011; Logares et al. 2012) have used ‘uni-
versal’ eukaryotic primers targeting the V4 and/or the
V9 variable regions of the SSU rDNA. In these studies,
pyrosequences assigned to the Haptophyta are rela-
tively rare. This trend might be partly explained by the
fact that some of the ecosystems investigated (e.g.
anoxic lakes, deep-sea habitats) are not expected to harbour
significant Haptophyte populations. However, an important
bias could also result from a lower affinity of the PCR
amplification step for Haptophytes, related to the intrinsic
high GC content of Haptophyte DNA (Liu et al. 2009;
Stoeck et al. 2010) or simple mismatches in the priming
site, as observed in the V4 reverse ‘universal’ primer
(Stoeck et al. 2010). The low Haptophyte diversity
detected in these studies may also result from the rela-
tively slow rate of rDNA substitution in this lineage.
For example, V9 rDNA sequences are identical among
species within the genus Phaeocystis (Pawlowski et al.
2011). The amplification and sequencing of LSU rDNA
D1-D2 domain fragments of five clone libraries from
picoplankton size fraction (0.2–3 µm) samples taken in
subpolar and subtropical oceanic waters highlighted
hundreds of new Haptophyte ribotypes (Liu et al. 2009).
This unveiled diversity could explain the paradox of
the apparent dominance of Haptophytes in photosyn-
thetic pigment–based analyses from marine ecosystems
compared to the scarcity of Haptophyte sequences in
SSU rDNA studies when using universal primers.
In this study, we probed marine Haptophyte genetic
diversity using a primer set specifically targeting the
Haptophyte D1-D2 LSU rDNA region. This newly
designed primer set was used to amplify nucleic acid
extracts from water samples collected in the Bay of
Naples, a location a priori rich in Haptophyta (e.g.
McDonald et al. 2007). Eight samples were studied, corre-
sponding to a combination of the following parameters:
rDNA or rRNA/cDNA (reverse-transcribed from rRNA),
0.8–3 µm or 3–20 µm filtration size fractions, subsurface
(1 m) or deep chlorophyll maximum (DCM, 23 m).
We addressed three main questions: (i) can we reveal new
diversity in environmental Haptophyta populations at a
single geographical location using a group-specific 454
pyrosequencing approach? (ii) which template (rDNA or
rRNA), filtration size fraction and hydrographical
conditions tested herein yield the highest proportion of
unknown phylotypes and the highest Haptophyte genetic
diversity? and (iii) can we detect significant differences
in the composition of the communities according to depth,
size fraction or template?
Materials and methods
Sampling, rDNA and rRNA extraction, PCR amplification and 454 sequencing
Samples were collected in the Bay of Naples on 13th
October 2009 at the ‘Mare Chiara’ station (position 40°48.5′ N, 14° 15′ E) (Fig. S1, Supporting information) as
part of the BioMarKs project (http://www.biomarks.eu/).
Sea water was sampled with Niskin bottles at two
depths (1 and 23 m). Samples from 1 m are hereafter
referred to as ‘subsurface’ samples, whereas the deeper
samples correspond to the deep chlorophyll maximum
‘DCM’. After prefiltration through a 20-µm pore-size
plankton net, 30 L of sea water was successively filtered
through 3-µm and 0.8-µm pore-size polycarbonate filters
(142 mm in diameter). To limit RNA degradation,
filtration time did not exceed 30 min. Filters
were then flash-frozen in liquid nitrogen and stored at
−80 °C. In the laboratory, filters with cells were cryo-crushed
(6 knocks/sec for 1 min; FreezerMill 6700).
Total DNA and RNA were extracted simultaneously
from the same crushed filter using the NucleoSpin®
RNA L kit and quantified using a Nanodrop ND-1000
Spectrophotometer. The quality of nucleic acid extracts
was checked on a 1.5% agarose gel. Total RNA extracts
were treated with the TurboDNA free kit in order to
remove any contaminating DNA. RT–PCR was then
performed with Superscript III according to the manu-
facturer’s instructions. Eight samples were finally
obtained, corresponding to a combination of the follow-
ing conditions: rDNA or rRNA (cDNA), filtration size
fractions of [0.8–3 µm] or [3–20 µm], subsurface or DCM.
Based on a reference alignment of LSU rDNA D1-D2
sequences from 172 cultured Haptophyte strains repre-
senting 75 species belonging to all known families,
a primer set, named LSU1 (Table S1, Supporting
information), was manually designed to specifically
amplify Haptophyta sequences in environmental sam-
ples (Fig. S2, Supporting information). A nucleic frag-
ment ranging between 350 and 400 bp was targeted,
and for the highest number of Haptophyta lineages, we
tried to minimize the number of mismatches appearing
in the 5′ region of the primer. At the same time, we
tried to maximize the number of mismatches with LSU
sequences from non-Haptophyta lineages, referring to
LSU sequences available on the SILVA database (~23 600
sequences) (Table S2, Supporting information). Specific-
ity of the primer set was then tested on DNA
extracts of several protistan cultures (Haptophyte and
non-Haptophyte) by PCR, cloning and Sanger sequenc-
ing. These preliminary steps allowed us to consider our
primer set as Haptophyta-biased.
PCRs were conducted with ‘fusion’ primers, which
include the primers designed in this study linked to
adaptor and key sequences required for 454 sequencing
on a FLX Titanium Sequencer. For each of the 8 ampli-
fied samples, a 7-bp multiplex identifier or MID
sequence was designed and included in one of the
fusion primers in order to identify the origin of every
single read from the pooled population generated on a
single run. Structures of the ‘fusion’ primers were as fol-
lows: Primer 1: (5′) Adaptor A + MID + Key + [Forward
primer] (3′); Primer 2: (5′) Adaptor B + Key + [Reverse
primer] (3′) (with Adaptor A: 5′-CCATCTCATCCCTGC
GTGTCTCCGAC-3′, Adaptor B: 5′-CCTATCCCCTGTGT
GCCTTGGCAGTC-3′, and Key: 5′-TCAG-3′). Three PCR
amplifications were conducted from each of the 8
extracts with Phusion® High-Fidelity DNA Polymerase
(Finnzymes) with an initial denaturation step at 98 °C for 30 s, followed by 25 cycles of 10 s at 98 °C, 30 s at
53 °C for annealing, 30 s at 72 °C, and a final elongation
step at 72 °C for 10 min. PCR products were run on a
1.5% agarose gel to check for successful amplification
products of the expected length. Replicated PCRs were
then pooled and purified using the NucleoSpin® Extract
II kit. The purified products were quantified using a
Nanodrop spectrophotometer and finally mixed in equal
concentrations. The final mix was delivered for sequenc-
ing at the Norwegian Sequencing Centre, University of
Oslo. Emulsion PCR and sequencing were performed
using a GS FLX emPCR amplicon kit using unidirec-
tional sequencing with Lib-L chemistry (Genome
Sequencer FLX Titanium, 454 Life Sciences from Roche,
Branford, CT, USA).
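The layout of the two ‘fusion’ primers described above can be sketched in a few lines of Python. The adaptor and key sequences are those quoted in the text; the MID and the forward and reverse primer sequences used in the example call are hypothetical placeholders (the actual LSU1 primers are listed in Table S1 and are not reproduced here), and the function name `fusion_primers` is ours.

```python
# Adaptor and key sequences are quoted from the text. The MID and the
# forward/reverse primers passed in below are HYPOTHETICAL placeholders.
ADAPTOR_A = "CCATCTCATCCCTGCGTGTCTCCGAC"
ADAPTOR_B = "CCTATCCCCTGTGTGCCTTGGCAGTC"
KEY = "TCAG"


def fusion_primers(mid, forward, reverse):
    """Return (primer1, primer2), 5'->3', following the layout in the text.

    Primer 1: Adaptor A + MID + Key + forward primer
    Primer 2: Adaptor B + Key + reverse primer
    """
    if len(mid) != 7 or set(mid) - set("ACGT"):
        raise ValueError("the MID must consist of 7 unambiguous bases")
    primer1 = ADAPTOR_A + mid + KEY + forward
    primer2 = ADAPTOR_B + KEY + reverse
    return primer1, primer2


# Example with placeholder sequences (not the real LSU1 primers).
p1, p2 = fusion_primers("ACGAGTG", "GGTACCGAAT", "TTCGGCAGGT")
print(p1)
print(p2)
```

Because only Primer 1 carries the MID, each of the eight samples needs its own Primer 1, whereas Primer 2 can be shared across samples.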
Sequence data cleaning and processing
454 GS FLX flowgrams (sff files) were deposited on
the Dryad database (see the Data Accessibility sec-
tion). From sff files, we extracted untrimmed sequence
and quality data using the sff2fastq software
(http://github.com/indraniel/sff2fastq), which converts files
to the easily parsable fastq format. In each sequence,
we searched for the MID, followed by the sequence
of the forward primer, the targeted genetic sequence
and the sequence of the reverse primer, in order to
assign sequences to one of the 8 initial samples. We
extracted the targeted part of each sequence, together
with its quality value. For each sequence, we computed
the expected number of errors in any 50-bp
window (EE) from the quality scores, using the formula
$EE = \sum_i 10^{-Q_i/4}$, where $Q_i$ is the quality value
of the flowgram at position $i$. Any sequence with a
50-bp window with > 1% error ($EE/50 \times 100 > 1\%$)
was discarded. Finally, we applied chimera detection
in each sample, using the uchime module from the
usearch v4.0 software (Edgar 2010; http://www.drive5.com/usearch/),
either using the external reference
database used later for sequence assignment (see
below) or using the experimental sequences (obtained
in this study) as references because chimeras in a
sample should be formed from sequences from the
same sample.
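The windowed quality filter described in this section can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes that each base contributes an error probability of 10^(-Q/divisor) to the expected number of errors, with `divisor` defaulting to 10 (the usual Phred convention) and left adjustable so that the exponent can be matched to the formula given in the text. The function names and the handling of reads shorter than one window are our own choices.

```python
def max_window_expected_errors(quals, window=50, divisor=10.0):
    """Largest expected number of errors (EE) over all `window`-bp windows.

    Each base contributes 10 ** (-Q / divisor). ASSUMPTION: divisor=10 is the
    standard Phred scaling; adjust it to match the formula being reproduced.
    Reads shorter than one window are treated as a single window.
    """
    probs = [10 ** (-q / divisor) for q in quals]
    if len(probs) <= window:
        return sum(probs)
    current = sum(probs[:window])
    worst = current
    # Slide the window one base at a time, updating the running sum.
    for i in range(window, len(probs)):
        current += probs[i] - probs[i - window]
        worst = max(worst, current)
    return worst


def passes_filter(quals, window=50, max_error_pct=1.0, divisor=10.0):
    """True if no window exceeds `max_error_pct` expected error (EE/window x 100)."""
    ee = max_window_expected_errors(quals, window, divisor)
    return ee / window * 100 <= max_error_pct


good_read = [30] * 100              # uniformly high quality
bad_tail = [30] * 60 + [15] * 50    # quality drops towards the end of the read
print(passes_filter(good_read), passes_filter(bad_tail))
```

A read is rejected as soon as a single window fails, which is why reads whose quality degrades only near the 3′ end are discarded as a whole rather than trimmed.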
Taxonomic assignment of 454 reads
Hierarchical clustering. Taxonomic assignment of reads
was performed using our pre-existing database of LSU
rDNA sequences from Haptophyta (see the Data Acces-
sibility section). This database includes 1462 reference
LSU rDNA sequences generated by Sanger sequencing:
172 sequences from Haptophyte strains in culture (75
species representing all current known families and
almost all cultivable species currently known) and 1290
sequences from environmental clone libraries. The LSU
rDNA reference database provided in this study covers
the entire Haptophyta diversity currently
sequenced. For each primer set, we detected the for-
ward and reverse primers within the reference database
using position weight matrices allowing up to 5 degen-
eracies, and extracted the amplified parts of the
sequences. Experimental sequences (reads), sorted by
abundance, were then aligned with the reference
extracted sequences sorted by decreasing length. All
sequences, experimental and reference, were then clustered
to 85% identity using the global alignment clustering
option of the uclust module from the usearch v4.0
software (Edgar 2010). Each 85% cluster was then reclu-
stered at a higher stringency level (86%) and so on
(87%, 88%,…) in a hierarchical manner up to 100%
similarity. Each experimental sequence was then
identified by the list of clusters to which it belonged
at the 85–100% levels. This information can be viewed as a
matrix with the rows corresponding to different
sequences and the columns corresponding to the cluster
membership at each clustering level. Taxonomic assign-
ment for a given read was performed by first looking
whether reference sequences clustered with the experi-
mental sequence at the 100% clustering level. If this
was the case, the last common taxonomic name of the
reference sequence(s) within the cluster was used to
assign the environmental read. If not, the same proce-
dure was applied to clusters from 99% to 85% similarity
if necessary, until a cluster was found containing both
the experimental read and reference sequence(s), in
which case sequences were taxonomically assigned as
described above.
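The assignment rule described above (walk down from the 100% level until a cluster contains at least one reference sequence, then report the last common taxonomic name of those references) can be sketched as follows. This is an illustration under our own assumptions rather than the code used in this study: the membership matrix is represented as a nested dictionary, reference lineages are root-to-tip lists, and all identifiers and toy data below are hypothetical.

```python
def last_common_name(lineages):
    """Deepest taxonomic name shared by all lineages (each a root-to-tip list)."""
    common = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break
        common.append(names[0])
    return common[-1] if common else None


def assign_read(read_id, membership, ref_lineages, levels=range(100, 84, -1)):
    """Return (level, taxon) for the most stringent level at which the read
    shares a cluster with at least one reference, or (None, None)."""
    for level in levels:
        cluster = membership[read_id][level]
        refs = [r for r in ref_lineages if membership[r][level] == cluster]
        if refs:
            return level, last_common_name([ref_lineages[r] for r in refs])
    return None, None


# Toy membership matrix: sequence -> {clustering level: cluster id}.
# Only three levels are shown to keep the example short.
membership = {
    "read1": {100: "a", 97: "A", 85: "X"},
    "read2": {100: "b", 97: "B", 85: "X"},
    "read3": {100: "c", 97: "C", 85: "Y"},
    "ref1": {100: "a", 97: "A", 85: "X"},
    "ref2": {100: "d", 97: "B", 85: "X"},
    "ref3": {100: "e", 97: "B", 85: "X"},
}
ref_lineages = {
    "ref1": ["Haptophyta", "Prymnesiophyceae", "Phaeocystales", "Phaeocystis globosa"],
    "ref2": ["Haptophyta", "Prymnesiophyceae", "Phaeocystales", "Phaeocystis antarctica"],
    "ref3": ["Haptophyta", "Prymnesiophyceae", "Prymnesiales", "Prymnesium parvum"],
}
toy_levels = (100, 97, 85)
for read in ("read1", "read2", "read3"):
    print(read, assign_read(read, membership, ref_lineages, toy_levels))
```

In the toy data, `read2` shares its 97% cluster with references from two different orders, so it is only assigned to their common class. This is the behaviour expected from the "last common taxonomic name" rule.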
Phylogenetic mapping. Several dedicated programs for
phylogenetic mapping of anonymous sequences onto
reference trees, which can handle very large data sets,
are available (Matsen et al. 2010; Berger et al. 2011).
These methods require a reference alignment and a
corresponding reference phylogenetic tree, onto which
the position of the query sequences is examined using
phylogenetic tree reconstruction algorithms. Here, two
reference alignments of LSU rDNA were built: the first
included the 172 sequences of cultured strains from our
database (subsequently labelled data set 1), and the second
included all 1462 sequences from our database (cultures
and environmental samples, subsequently labelled
data set 2). LSU rDNA sequences were aligned using
MAFFT v6.818 taking into account RNA secondary
structure (Q-INS-i option; Katoh & Toh 2008), with
subsequent de visu refinement in BioEdit v7.0.5.3 (Hall
1999). The general time-reversible (GTR) model was
selected as the best nucleotide substitution model
according to the corrected Akaike information criterion
and the Bayesian Information Criterion implemented
and calculated in jModeltest v0.1.1 (Posada 2008). Two
LSU rDNA trees (tree1 corresponding to data set 1 and
tree2 corresponding to data set 2) were built using
maximum-likelihood (ML) inference with a GTR model
with gamma-distributed rate variation and a proportion of invariant sites, as implemented
in PhyML v3.0 (Guindon & Gascuel 2003).
These alignment/tree couples were used as references
for phylogenetic mapping. Environmental 454 reads
obtained after our cleaning process and assigned to
Haptophyta (see section Sequence data cleaning and
processing above) were aligned to the hidden Markov
model (HMM) profiles built from the reference alignments
using tools from the HMMER v3.0 suite
(http://hmmer.org/). The resulting alignments were curated,
that is, gapped columns in the reference alignment were
removed. Finally, the phylogenetic positions of the
reads were computed using Pplacer, which enables
efficient ML and posterior probability phylogenetic
mapping (Matsen et al. 2010). Each 454 read was thus
mapped to the reference alignment, and the most
probable location was reported on the reference trees.
Haptophyte community diversity
Differences between Haptophyte communities were
investigated by considering operational taxonomic units
(OTUs) as Haptophyte sequences clustering at 97%
identity. This level of clustering was chosen according
to cultured Haptophyta intra- and inter-rank genetic
diversity (Fig. S3, Supporting information) and accord-
ing to the shape of rarefaction curves built with our 454
data (Fig. S4, Supporting information). Abundance
tables from OTU97% were built. Alpha diversity was cal-
culated using Shannon’s diversity index (Shannon 1948)
and Simpson’s evenness (Simpson 1949). Mean values
of Shannon and Simpson diversity indexes were com-
pared by an overall Kruskal–Wallis test and subsequent
pairwise Wilcoxon–Mann–Whitney tests. The OTU97%
abundance tables were standardized using the Hellinger
transformation to lower the weight of rare ‘species’
(Legendre & Gallagher 2001), and pairwise distance
matrices were then calculated using the Bray–Curtis
dissimilarity index (Bray & Curtis 1957). Variation in
Haptophyte community structure was then determined
by applying nonmetric multidimensional scaling
(NMDS, Gower 1966) to the dissimilarity matrices. The
Haptophyte community composition recorded in differ-
ent conditions (i.e. depths, size fractions and template)
was compared and tested for significant differences
using the analysis of similarity (ANOSIM, Clarke 1993),
followed by 10 000 Monte Carlo permutation tests and
Bonferroni correction. Community turnover was deter-
mined by calculating the proportion of shared OTU97%
and the proportion of specific OTU97% between the 8
samples. All data and statistical analyses were carried
out using the vegan (Oksanen et al. 2007), the MASS and
the limma packages (from the bioconductor website:
http://www.bioconductor.org/biocLite.R), as well as
custom R scripts in the R statistical environment (R ver-
sion 2.10.0, R Development Core Team, 2009). ANOSIM
was calculated through the PAST software (Hammer
et al. 2001).
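As an illustration of the quantities named in this section, the sketch below computes Shannon's index, one common form of Simpson's evenness, the Hellinger transformation and the Bray–Curtis dissimilarity from OTU abundance vectors. It is a plain-Python sketch, not the R code used in this study. The evenness definition (inverse Simpson index divided by the number of observed OTUs) is an assumption, as the exact variant is not specified in the text, and the two abundance vectors are invented toy data.

```python
import math


def shannon(counts):
    """Shannon's diversity index H' = -sum(p * ln p) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_evenness(counts):
    """ASSUMED definition: inverse Simpson index divided by observed richness."""
    total = sum(counts)
    present = [c for c in counts if c > 0]
    inverse_simpson = 1.0 / sum((c / total) ** 2 for c in present)
    return inverse_simpson / len(present)


def hellinger(counts):
    """Hellinger transformation: square root of relative abundances."""
    total = sum(counts)
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Toy OTU abundance vectors for two samples (hypothetical values).
dcm = [50, 30, 15, 5, 0]
subsurface = [80, 10, 0, 5, 5]

print(round(shannon(dcm), 3), round(shannon(subsurface), 3))
print(round(simpson_evenness(dcm), 3), round(simpson_evenness(subsurface), 3))
print(round(bray_curtis(hellinger(dcm), hellinger(subsurface)), 3))
```

The square root in the Hellinger transformation compresses large counts more strongly than small ones, so the dominant OTUs weigh less in the subsequent dissimilarity calculation.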
Results
Haptophyte diversity revealed by 454 pyrosequencing of LSU rDNA sequences
Following our stringent strategy of data cleaning and
processing, 13 501 reads were kept and assigned as
Haptophyta sequences (Table 1, Table S3, Supporting
information). Our cleaning process involved the
removal of reads with errors in the adaptor and MID
sequences, removal of reads with one or more unre-
solved bases (Ns), a strict selection of sequences with
error score <1% and removal of presumed chimeras.
Consequently, the quantity of analysed reads was rather
low (~32%, Table S3), but of high quality. Only reads
that clustered with Haptophyta reference sequences to
an identity level >85% were retained. A large majority
of reads (~87%) were assigned to Haptophyta after
cleaning and clustering steps (Table S3, Fig. S5, Sup-
porting information). This result confirms the high
Haptophyte specificity of the primer set LSU1 designed
here. In a parallel ongoing study (Bittner et al. in
preparation), the same samples were indeed amplified
with universal eukaryotic primer sets targeting variable
regions of the SSU rRNA (the V4 and the V9 region),
but only a very low proportion of the pyrosequencing
reads were assigned to Haptophyta (in the best case
2.6% of the total reads; Fig. S6, Supporting information).
This further demonstrates the advantage of using the
Haptophyta-biased primer set LSU1 to specifically
amplify Haptophyta rDNAs out of total DNA and RNA
samples.
For each sample, the total number of cleaned Hap-
tophyta 454 reads, the number of Haptophyta OTUs97%,
the proportion of single reads (sr, in%), the percentage
of OTUs97% including only one read (uq), the percent-
age of assigned OTUs97% clustering with a reference
sequence previously obtained with Sanger sequenc-
ing (aREF) and the percentage of assigned OTUs97% clustering with a reference sequence from a cultured
Haptophyte strain (aREFcult) were calculated (Table 1).
On the global data set, the proportion of OTUs97% including only one read was about 28% (Table 1).

Table 1 Number of reads and number of OTU97% obtained after the cleaning process (see section Sequence data cleaning and processing). (sr)% indicates the proportion of single reads. (uq)% indicates the proportion of OTU97% including only 1 read. (aREF)% indicates the proportion of OTUs97% assigned to a reference sequence (a reference sequence was produced with Sanger sequencing and corresponds to the amplification from a Haptophyte strain in culture or from environmental clone libraries). (aREFcult)% indicates the proportion of OTUs97% assigned to a reference sequence obtained from cultured Haptophyta strains.

Depth       Size fraction  Template  Reads   sr (%)  OTUs97%  uq (%)  aREF (%)  aREFcult (%)
DCM         0.8–3 µm       rRNA      2032    4.1     254      32.7    16.5      1.4
DCM         0.8–3 µm       rDNA      2296    4.3     286      34.3    14        1.4
DCM         3–20 µm        rRNA      872     13.5    284      41.5    15.3      1.6
DCM         3–20 µm        rDNA      1517    9.3     361      39.3    17.9      1.7
Subsurface  0.8–3 µm       rRNA      2012    7.2     371      39.1    18.6      1.3
Subsurface  0.8–3 µm       rDNA      1745    7.4     312      41      19.6      1.3
Subsurface  3–20 µm        rRNA      1519    6.2     259      36.3    15.8      1.9
Subsurface  3–20 µm        rDNA      1508    4.3     212      30.7    15.1      1.9
LSU1 (8 samples pooled)              13501   1.8     871      28      11.8      0.9
The most striking result was the very low percentage
of OTU97% clustering with reference sequences. On the
global data set, only 11.8% of the OTU97% clustered
with reference sequences. Additionally, 0.9% of the
OTU97% clustered with reference sequences from cul-
tured Haptophyte strains, representing a wide range of
the group’s natural diversity. Fig. 1A, B further explores
this significant proportion of Haptophyte unreferenced
OTUs at different levels of clustering. Fig. 1A shows
the extent of new clusters that are built when environ-
mental 454 reads are added to Sanger sequences.
Fig. 1B depicts the proportion of unknown Haptophyta
clusters obtained with 454 sequencing for each set of
primers. Most Haptophyte-assigned reads clustered
with Sanger reference sequences from environmental
clone libraries rather than with cultured reference
Haptophyta taxa (Fig. 1B, Table 1). After a hierarchical
process of assignment decreasing to a sequence
identity level of 85%, only 7.5% of reads could be
assigned to cultured, reference Haptophyta taxa (data
not shown).
The proportion of totally new OTU97% (clusters that
do not include sequences previously obtained by Sanger
sequencing from culture or environmental samples,
which can be deduced from aREF) was on average higher
in sub-data sets from the large size fraction (3–20 µm)
sampled at the DCM, and corresponding to the rRNA
template (Table 1). However, the proportion of taxonomically
unknown OTU97% (clusters that do not include
any sequence from Haptophytes in culture, which can be
deduced from aREFcult) was on average higher in the
samples from the small size fraction (0.8–3 µm) and from
the DCM (Table 1).
Haptophyte community diversity and structuring
Abundance tables of OTU97% were used to calculate
Shannon’s diversity index and Simpson’s evenness. The
highest alpha diversity was found in the rRNA data
from the large size fraction samples collected at the
DCM (Fig. 2, Fig. S7, Supporting information). This dif-
ference in diversity was nevertheless only significant
when comparing samples from the subsurface and from
the DCM (Kruskal–Wallis and Wilcoxon tests, details of
the results not shown). The same trend was found
when OTUs were defined at lower levels of clustering
or when singletons were removed from the abundance
tables (data not shown).
Variations in the Haptophyte community structure
were determined by a two-dimensional representation
of NMDS (Fig. 3). The Haptophyte community seems to
be mostly structured according to depth. Community
structure also seems to be somewhat influenced by
the size fraction. In contrast, rDNA and rRNA community
structures show a large overlap. Group separation of
the samples was further tested by analysis of similarity
(ANOSIM): the Haptophyte communities showed a
significant differentiation in structure
between the two sampling depths (R = 0.66, P < 0.05)
(Table S4, Supporting information). No other significant
dissimilarity was detected when comparing
the samples from the small and the large size fraction,
or when comparing rDNA and rRNA data (Table S4).
Fig. 1 Known versus unknown diversity. (A) Number of clusters as a function of clustering level. (B) Proportion of unassigned
vs. assigned reads as a function of clustering level. Full lines indicate the proportion of reads clustering with at least
one reference sequence obtained by Sanger sequencing of environmental or cultured samples. Dashed lines indicate the
proportion of reads clustering with reference sequences from cultured Haptophyte strains.

OTU97% found in all conditions were rare: only 12 were found in our data set. Venn diagrams
comparing the number of common OTU97% between
samples showed that, on the total data set, the proportion
of clusters sharing both rDNA and rRNA reads
was about 1/2 (Fig. S8, Supporting information). The
proportion of common OTU97% between the small and
the large size fractions and between the subsurface and
the DCM samples reached 43% and 39%, respectively
(Fig. S8). Samples including the highest proportion of
specific (or nonshared) OTU97% were taken at the DCM.
A pairwise comparison of the eight samples
clearly highlighted that the sample sharing
the lowest proportion of common OTU97% with the
others was the one taken at the DCM, corresponding to
the large size fraction and to the rRNA
template (Fig. S9, Supporting information).
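The shared and sample-specific OTU proportions discussed above amount to simple set operations on the OTU lists of each sample. The sketch below is a minimal illustration with invented OTU identifiers. Expressing the shared proportion relative to the union of the two sets is our assumption, as the denominator is not stated explicitly in the text.

```python
def shared_proportion(otus_a, otus_b):
    """Fraction of the OTUs seen in either set that occur in both.

    ASSUMPTION: the denominator is the union of the two sets.
    """
    a, b = set(otus_a), set(otus_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def specific_otus(samples, name):
    """OTUs found in sample `name` and in no other sample."""
    others = set().union(*(s for k, s in samples.items() if k != name))
    return set(samples[name]) - others


# Toy OTU sets (hypothetical identifiers).
samples = {
    "DCM": {"otu1", "otu2", "otu3", "otu4"},
    "subsurface": {"otu3", "otu4", "otu5"},
}
core = set.intersection(*samples.values())  # OTUs present in every sample

print(shared_proportion(samples["DCM"], samples["subsurface"]))
print(sorted(specific_otus(samples, "DCM")))
print(sorted(core))
```

The same two functions can be applied to pooled sets (for example all rDNA reads versus all rRNA reads) or to each pair of individual samples.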
[Fig. 2 image: three boxplot panels with y-axis ‘Shannon diversity index (OTU97%)’ (ticks 4.6 to 5.2). Panel 1 groups samples by single factor (rDNA, rRNA, [0.8-3], [3-20], subsurface, DCM); panel 2 by pairs of factors (size fraction and template, depth and template, size fraction and depth); panel 3 by the eight combinations of size fraction, template and depth.]
Fig. 2 Boxplots summarizing the range of α-diversity (Shannon’s index) calculated at the 97% clustering level. Bottom, middle and top
lines of boxes represent the 25th (lower hinge), 50th (median) and 75th (upper hinge) percentiles; whiskers represent the nonextreme
sample minimum and maximum (i.e. less than 1.5 × the interquartile range of the box).
[Fig. 3 image: three NMDS ordination panels (axes NMDS1 and NMDS2; stress = 2.4% in each), with samples grouped by depth (DCM vs. subsurface), by size fraction ([3-20] vs. [0.8-3]) and by template (DNA vs. cDNA).]
Fig. 3 Haptophyte community structure based on NMDS (nonmetric multidimensional scaling) ordination of the LSU rDNA data set
distance matrices from OTU97%. The distance matrix was calculated beforehand using the Bray–Curtis dissimilarity index. Each object
on the plot represents a sample for a given template, size range and depth. Samples with the lighter colour correspond to the
samples taken at the subsurface, whereas samples with the darker colour correspond to samples taken at the DCM. Large circles and
large diamonds correspond to samples from the 3–20 µm size fraction, whereas small circles and small diamonds correspond to samples
from the 0.8–3 µm size fraction. Circles correspond to rDNA samples, whereas diamonds correspond to rRNA samples. Similarity
in Haptophyte community structure is indicated by the distance between objects: a smaller distance indicates a higher
resemblance in community structure. Samples are here grouped according to the size range. The goodness-of-fit of the NMDS
representation is indicated by the low stress values.
Phylogenetic mapping
The 13 501 LSU environmental reads obtained after our
cleaning process were mapped onto a reference tree
built from an alignment of the 172 reference sequences
from cultured Haptophytes (Fig. 4 A, B). Considering
all samples, reads belonged to all Haptophyta orders,
except the Pavlovales and the Zygodiscales. The highest
proportion of reads was mapped in the Prymnesiales
from clade B2 (or Chrysochromulinaceae): considering
the pooled data set, 68.6% of the reads were identified
as Chrysochromulinaceae reads. Phaeocystales represented the
second most abundant group in terms of number of
reads (11.1%). A substantial proportion of the reads
(8.6%) could not be assigned precisely to a Haptophyte
species or even to an order. Prymnesiales from
clade B1 (or Prymnesiaceae) represented 5.4% of the
reads, Coccolithales 3%, and Syracosphaerales and Isochrysidales
1.2% each. In addition, 0.02% of the reads were assigned
to the species Chrysoculter rhomboideus. In each sample,
the proportions indicated above are approximately the
same (Fig. 4B, Fig. S10, Supporting information). The
most important differences that can be highlighted
between communities from the subsurface and the
DCM are as follows (i) a higher proportion of reads
were assigned to Phaeocystales in DCM samples; and
(ii) reads assigned to the species Chrysoculter rhomboi-
deus were only found in the DCM samples.
Discussion
Stringent primer design and cleaning process allow an accurate targeted metagenomics approach
The LSU rDNA region targeted herein is longer than
previously pyrosequenced eukaryotic genomic regions
(>350 bp in the current study, as compared to a maximum
of 270 bp in previous studies using the V4 SSU rDNA
region), increasing the likelihood of producing low-quality
sequences towards the end of the reads (Huse et al.
2007; Gilles et al. 2011; Quince et al. 2011). Therefore, our
cleaning process was stringent, with the following two
steps: (i) a chimera detection step for all reads; and (ii)
specific quality checking for sequences appearing only once
in a sample. The proportion of low-quality reads detected
and eliminated by our cleaning pipeline was at least twice
as high as in previously published studies exploring
protistan diversity (Amaral-Zettler et al. 2009; Stoeck et al. 2009,
2010; Nolte et al. 2010; Behnke et al. 2010; Cheung et al.
2010; Pawlowski et al. 2011). Considering that the
estimated error rate of the 454 GS FLX Titanium is about 1%
of the whole data output (Gilles et al. 2011), interpreting
our data using a 97% sequence identity threshold seems
to be a reasonable strategy to minimize inflation of the
number of clusters (OTUs) and hence to limit the
overestimation of diversity.
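The reasoning above can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the 1% error rate applies independently to every base of a 350 bp read; real 454 errors are not uniform (they concentrate in homopolymers and towards read ends), so the numbers are indicative only.

```python
from math import comb, floor


def prob_more_than_k_errors(length, error_rate, k):
    """P(X > k) for X ~ Binomial(length, error_rate)."""
    p_at_most_k = sum(
        comb(length, i) * error_rate**i * (1 - error_rate) ** (length - i)
        for i in range(k + 1)
    )
    return 1.0 - p_at_most_k


read_length = 350          # bp, lower bound of the read length given in the text
error_rate = 0.01          # ASSUMPTION: 1% treated as an independent per-base rate
identity_threshold = 0.97  # OTU clustering level used in the study

# Average number of erroneous bases expected in one read.
expected_errors = read_length * error_rate

# Number of mismatches a read may carry and still be >= 97% identical to its template.
allowed_mismatches = floor((1 - identity_threshold) * read_length)

# Chance that sequencing errors alone push a read outside the 97% threshold.
p_escape = prob_more_than_k_errors(read_length, error_rate, allowed_mismatches)

print(f"expected errors per read: {expected_errors:.1f}")
print(f"mismatches tolerated at 97%: {allowed_mismatches}")
print(f"P(read falls outside its OTU by error alone): {p_escape:.4f}")
```

Under these assumptions a read carries a few errors on average, which would split almost every read into its own cluster at 100% identity, whereas only a very small fraction of reads would accumulate enough errors to leave a 97% cluster.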
The clustering level used to define OTUs is lineage
specific and marker dependent (Caron et al. 2009; Nebel
et al. 2011). In our data set, rarefaction curves based on
OTUs100% did not reach saturation, whereas they did
when using OTUs97%. This clustering level was thus
also chosen to accommodate the relatively slow rate of
rDNA substitution known from Haptophyta, as
revealed by the presence of rDNA from two different
reference cultured species in the same OTU100%. Using
OTUs97%, the proportion of clusters including only
one read is less than one-third. This proportion is similar to
that calculated in previous NGS diversity studies
(Sogin et al. 2006; Roesch et al. 2007; Brown et al. 2009;
Stoeck et al. 2009; Behnke et al. 2010; Huse et al. 2010).
The structure of the Haptophyte diversity found here
still supports the ‘rare biosphere’ model (Sogin et al.
2006; Dawson & Hagen 2009; Caron & Countway 2009):
environmental microbial communities are dominated by
a few relatively abundant populations, and hundreds of
low-abundance populations account for most of the
observed phylogenetic diversity.
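To illustrate the two quantities discussed in this paragraph, the sketch below computes an analytical rarefaction curve (the expected number of OTUs in a random subsample of n reads, following the classical hypergeometric formula of Hurlbert) and the proportion of single-read OTUs. The OTU abundances are hypothetical, and the study's own rarefaction software may have used a different (for example, resampling-based) procedure.

```python
from math import comb


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n reads."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    denom = comb(total, n)
    # For each OTU, 1 - P(no read of this OTU is drawn in the subsample).
    return sum(1 - comb(total - c, n) / denom for c in otu_counts if c > 0)


def singleton_fraction(otu_counts):
    """Proportion of OTUs that contain exactly one read."""
    present = [c for c in otu_counts if c > 0]
    return sum(1 for c in present if c == 1) / len(present)


# Hypothetical OTU abundances: a few abundant OTUs and a tail of rare ones.
counts = [50, 30, 10, 5, 2, 1, 1, 1]

for n in (10, 25, 50, 100):
    print(n, round(rarefied_richness(counts, n), 2))
print("singleton fraction:", singleton_fraction(counts))
```

A curve that flattens as n approaches the total number of reads suggests that additional sequencing would add few new OTUs at that clustering level.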
For the first time, this study tested the accuracy of a set
of LSU rDNA primers designed to specifically amplify a major
group of unicellular eukaryotes – the Haptophytes – from
environmental samples. Our set of primers gave excellent
results concerning specificity (87% of postcleaning reads
were assigned to Haptophyta). Monchy et al. (2011)
published a comparative study of reads obtained with
universal eukaryote primers and specific fungus-designed
primers, targeting SSU hypervariable regions, in which
enrichment in fungal sequences with specific primers
reached only 3–10%. The fungus-specific primers
decreased the proportion of Metazoa, Viridiplantae and
Stramenopiles sequences, but largely favoured the
amplification of Katablepharidophyta and Cryptophyta. Thus,
designing primer sets allowing the extraction of a rather
large monophyletic group out of the otherwise extremely
rich and ancient protistan diversity is not trivial. Our
study particularly highlighted the importance of
designing both forward and reverse group-specific primers.
Fig. 4 Pplacer phylogenetic mapping of the 13 501 environmental reads (obtained after the cleaning process) onto a phylogenetic tree
including all reference sequences from cultured Haptophytes (172 sequences of cultured strains). (A) Phylogenetic mapping on the
phylogenetic tree. The number of reads assigned to a node or a branch is indicated in the green dots. Nodes labelled with a (*) correspond
to ‘basal’ nodes that do not yet have an (uncontested) taxonomic denomination. (B) Details on the proportion of reads assigned
by phylogenetic mapping to the main Haptophyte lineages.
[Fig. 4 plot: (A) reference tree with mapped reads; (B) stacked bar chart. Values shown in (B), as proportion of reads (%) per lineage. Columns 1–4 are subsurface samples and columns 5–8 are DCM samples; L = large size fraction (3–20), S = small size fraction (0.8–3); each depth and size fraction comprises one DNA and one cDNA sample; Total = pooling of the 8 samples.]

Lineage                    1 (L)  2 (L)  3 (S)  4 (S)  5 (L)  6 (L)  7 (S)  8 (S)  Total
Prymnesiales B2             71.8   59.2   73.4   74.6   52.9   83.7   66.2   69.1   68.6
Phaeocystales                4.4    5.4    8.6   10.7   16.6    5.4   17.4   16.5   11.1
Basal nodes (*)              9.4   10.9    9.1    8.8    7      2.2    9.9    8.1    8.6
Prymnesiales B1              9.5   11.7    5.6    3.3    6      4.1    4.5    3.4    5.9
Coccolithales                0.6    6.3    2.3    2.4    4.5    1.4    1.2    1.6    2.5
Syracosphaerales             0.5    2.5    0.9    0.1    8.8    1.9    0.8    1.1    1.9
Isochrysidales               3.7    4      0.2    0      4.1    1.1    0      0.2    1.5
Chrysoculter rhomboideus     0      0      0      0      0.2    0.1    0.1    0      0.04
454 sequencing reveals significant new environmental Haptophyte diversity
Using Sanger sequencing of Haptophyte-specific clone
libraries, Liu et al. (2009) published 674 environmental
LSU rDNA D1-D2 sequences from the picoplankton size
fraction. None of the environmental phylotypes was 100%
identical to any of the taxonomically predefined
sequences obtained by sequencing Haptophytes
in culture. In our study, 454 sequencing of
LSU rDNA from a single geographical location
(the Bay of Naples) unveiled hundreds of new,
unassigned Haptophyta clusters, even at relatively low
clustering levels. Among the few 454 reads that could be
assigned to pre-existing Sanger sequences, most
were related to environmental sequences, corroborating
previous results based on SSU rDNA clone libraries
(Shi et al. 2009; Cuvelier et al. 2010). At the 97% clustering
level, we suggest that at least 99% of the genetic
LSU rDNA Haptophyte diversity is currently not
available in culture, as predicted by previous clone library
studies (Shi et al. 2009; Liu et al. 2009; Cuvelier et al.
2010; Shi et al. 2011). Strains that are currently in culture
are therefore far from reflecting the natural environmental
genetic diversity. Clearly, the extent of unknown
protistan genetic diversity will soar with the continued use of
next-generation sequencing (NGS) technologies.
The proportion of totally new OTU97% (OTUs that do
not cluster with sequences from cultured Haptophyta
strains or environmental samples) and the
diversity indexes were higher in rRNA sub-data sets
from the large size fraction (3–20 µm) taken at the
DCM. The slightly higher diversity observed with
rRNA sequences may be partly due to the fact that they
have been transcribed and reverse-transcribed, and both
processes are more error prone than replication
(Pulsinelli & Temin 1994; Sydow & Cramer 2009). These
additional steps might have introduced artefacts that
inflate the diversity of rRNA samples. Nevertheless,
we would have expected these artificial
sequences to constitute rare phylotypes. We
observed, however, that rDNA OTU97% were less
evenly distributed (few abundant OTUs and many rare
OTUs) than rRNA OTU97% (more even OTUs). As PCR
amplification preferentially amplifies the more abundant
templates, we thus concluded that a slightly higher, but
‘true’ diversity was found in the rRNA samples.
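The contrast between a skewed and an even OTU distribution can be quantified with standard diversity indexes. The sketch below assumes the Shannon and Simpson indexes (both cited in the reference list) together with Pielou's evenness; which exact formulations the study used is an assumption here, and the OTU counts are hypothetical.

```python
from math import log


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with at least one read."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity in its 1 - sum(p_i^2) form."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where S is the number of OTUs present."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / log(richness) if richness > 1 else 0.0


# Hypothetical communities with the same richness and read total.
skewed = [90, 4, 3, 2, 1]     # few abundant OTUs and many rare ones
even = [20, 20, 20, 20, 20]   # reads spread evenly across OTUs

for name, counts in (("skewed", skewed), ("even", even)):
    print(name, round(shannon(counts), 3), round(simpson(counts), 3),
          round(pielou_evenness(counts), 3))
```

With equal richness, the even community scores higher on all three measures, which is the pattern the text describes for rRNA relative to rDNA OTUs.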
The proportion of taxonomically unknown OTU97%
(OTUs that do not cluster with any sequences from
cultured Haptophyta strains) was higher in the samples
from the small size fraction (0.8–3 µm). This confirms
previously published results obtained with SSU rDNA
sequences from environmental pico-prymnesiophytes
(Haptophyta < 2–3 µm) sorted by flow cytometry:
Cuvelier et al. (2010) showed that potential new
Haptophytan phylogenetic lineages can be found when
studying the smallest size fractions. Moreover, one should bear
in mind that only approximately ten Haptophyte
species of this size have been described (Vaulot et al. 2008).
Taxonomic characterization of the novel environmental diversity
As the hierarchical clustering did not permit assignment
of 99% of the environmental LSU rDNA 454 reads to
Sanger sequences from cultures, linking them to a
reliable taxonomic framework was not possible. The
phylogeny inferred from all 1462 previous LSU rDNA
Sanger sequences from cultures and environmental
samples showed that the vast majority of environmental
sequences branch deep in the Haptophyte tree, in
clades often characterized by weak bootstrap support.
Therefore, only the reference tree based on cultured
Haptophytes could be used to map the environmental
454 reads with a certain degree of accuracy. This
analysis revealed that a substantial proportion of our reads,
overall and in each of our samples, mapped into the
Prymnesiales from clade B2, a group recently
designated as the family Chrysochromulinaceae (Edvardsen
et al. 2011) and highlighted for its wide distribution in
planktonic ecosystems (Liu et al. 2009). The
Chrysochromulina species from this group are typically small
(<5 µm), noncalcifying and saddle-shaped cells (Vaulot
et al. 2008; Edvardsen et al. 2011). A substantial
proportion of reads were also assigned to Phaeocystales. The
microalga Phaeocystis, the only genus currently
described in this order, is one of the most extensively
studied taxa of marine phytoplankton, notably because
of its major contribution to the global carbon budget
(Arigo et al. 1999). Phaeocystis is described as a
cosmopolitan bloom-forming microalga (Schoemann et al.
2005) whose individual cells are generally smaller
than 6 µm (Long et al. 2007).
The 454 sequencing from a single geographical site
uncovered reads belonging to all Haptophyta orders,
except the Pavlovales and the Zygodiscales. Zygodi-
scales include coccolithophore genera such as Helicosph-
aera, Rhabdosphaera, Discosphaera or Scyphosphaera, which
are nevertheless commonly observed in microscopy-
based surveys. For the first time, environmental reads
were assigned to the genus Chrysoculter, a noncalcifying
Haptophyte that has only been reported thus far from
coastal waters of Northern Japan (Nakayama et al. 2005).
Structure of environmental Haptophyte communities
In Naples Bay, we found communities that were princi-
pally structured according to sampling depth. This
structural difference was significant. Studies comparing
the structuring of genetic diversity in marine
ecosystems between the subsurface and the DCM have so far
been rare, and the dissimilarities observed were often
attributed to undersampling issues (Massana et al. 2011).
This does not seem to be the case here, because our
rarefaction curves reached a plateau.
Moreover, Haptophyta communities have already been
shown to differ in composition according to
depth in previous studies, in the South Pacific (Shi et al.
2009, 2011) and in the Red Sea (Man-Aharonovich et al.
2010). Little information has been published so far on
the vertical distribution of Haptophytes in the water column,
and it relies mostly on scanning electron microscopy
(SEM) studies of coccolithophore lineages
(Coccolithales, Isochrysidales, Zygodiscales,
Syracosphaerales) (Winter et al. 2002). In SEM studies, taxonomic
composition was indeed shown to change mainly
according to depth. Furthermore, in the upper
layers, taxonomic composition was also influenced by
temperature and availability of phosphate, and in the
deeper layers, it was mainly influenced by temperature
and light availability (Cortes et al. 2001). Emiliania
huxleyi (an Isochrysidale) is distributed throughout the
water column, but other species and genera are
clearly restricted to specific depths: for example,
Oolithotus (a Coccolithale) and Algirosphaera (a
Syracosphaerale) are typically found in the middle photic zone
(Cortes et al. 2001; Malinverno et al. 2003; Frada et al.
2012). The few reads that we were able to assign to the
species level in this study tend to confirm the results
previously obtained with SEM studies. The significance
of these patterns will thus have to be tested with a
broader range of samples, including periodic
sampling and environmental parameters, in order to take
into account seasonal variations.
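The Supporting information (Table S4) reports ANOSIM values for comparisons between templates, size fractions and depths. As a rough illustration of how such a test works, the sketch below implements a basic ANOSIM with a permutation p-value. The dissimilarity matrix is hypothetical (four subsurface and four DCM samples with artificially clear separation), and the implementation is a simplified stand-in, not the software or settings used in the study.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..len(values), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Observed R and a permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical dissimilarities: small within a depth, large between depths.
groups = ["subsurface"] * 4 + ["DCM"] * 4
n = len(groups)
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        base = 0.10 if groups[i] == groups[j] else 0.60
        dist[i][j] = dist[j][i] = base + 0.01 * (i + j)

r_stat, p_value = anosim(dist, groups)
print(f"R = {r_stat:.2f}, p = {p_value:.3f}")
```

R close to 1 indicates that samples within a depth resemble each other more than samples from different depths, while R close to 0 indicates no grouping. With only four samples per group, the number of distinct label arrangements is small, which limits how low the p-value can go.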
We did not find significant structuring differences in
terms of size or template (rRNA or rDNA). Not et al.
(2009) suggested that metagenomic approaches based
on rRNA may significantly reduce the biases inherent
in rDNA surveys (such as the group-specific variability
in rDNA copy numbers, dormant cells and occurrence
of extracellular, ancient DNA), depicting more accu-
rately the active part of the communities. Considering
the relatively low differences in community structuring
based on rRNA and rDNA reads, it appears that the
most abundant Haptophytes in our samples were also
the most physiologically active taxa.
Suggestions for interpreting environmental Haptophyte diversity
In our study, a high OTU diversity was revealed
through high-throughput sequencing of an
LSU rDNA fragment at a single location. However, we
cannot conclude whether the extended genetic diversity
shown here is the result of intraspecific or interspecific
diversity and/or nonconcerted evolution of ribosomal
operon copies and/or sequencing errors. Diversity
created by sequencing errors certainly occurred, but the
stringency of our cleaning process and our clustering
strategy should have largely reduced these biases in
our data set. This extended genetic diversity could also be
observed if multiple copies of LSU rDNA are present in
Haptophyte genomes, and if these copies have
accumulated mutations (in a single genome) and
were then amplified. Unfortunately, reliable information
about the copy number and the variability of copies of
LSU (or even SSU) rDNA in the Haptophytes is presently
not available. Despite the sequencing of Haptophyte
genomes/transcriptomes/ESTs (e.g. from Emiliania
huxleyi), this information is not available because of the
way data have been assembled (mainly through the
consensus of short sequences). SSU and LSU rDNA
sequences from E. huxleyi generally show no sequence
variation (in public databases, only one version is
given). rDNA copy numbers can differ greatly between
taxa from different eukaryotic domains (Zhu
et al. 2005). rDNA and rRNA surveys are therefore expected
to give very different views when groups with very high
copy numbers (like Alveolates) and very low copy numbers
(like Pelagophytes) are analysed together in a universal
study (Zhu et al. 2005; Not et al. 2009; Logares
et al. 2012). In general, when rDNA and rRNA results from
the same sample are compared, lineages
showing a higher proportion of rDNA reads
than of rRNA reads would be identified
as lineages with a high rDNA copy number. In
this study, which targeted only Haptophytes, the ratio of
rDNA to rRNA is globally similar in the majority of the
lineages, so the rDNA copy number issue and
nonconcerted evolution probably play a minor role. We can
thus expect to deal here with ‘real’ intraspecific or
interspecific diversity. Only lineages representing less than 2.5%
of the reads (Syracosphaerales, Isochrysidales and
Chrysoculter rhomboideus) might be affected by
nonconcerted evolution.
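The comparison of rDNA and rRNA proportions per lineage described above can be sketched as follows. The read counts, the lineage selection and the flagging threshold of 2.0 are all hypothetical choices made for illustration; they are not values from the study.

```python
def proportions(counts):
    """Convert read counts per lineage into proportions of the sample total."""
    total = sum(counts.values())
    return {lineage: n / total for lineage, n in counts.items()}


def rdna_to_rrna_ratio(rdna_counts, rrna_counts):
    """Ratio of rDNA to rRNA read proportions for lineages present in both."""
    p_dna = proportions(rdna_counts)
    p_rna = proportions(rrna_counts)
    return {
        lineage: p_dna[lineage] / p_rna[lineage]
        for lineage in p_dna
        if p_rna.get(lineage, 0) > 0
    }


# Hypothetical read counts for one sample sequenced from both templates.
rdna = {"Prymnesiales B2": 700, "Phaeocystales": 100, "Coccolithales": 200}
rrna = {"Prymnesiales B2": 700, "Phaeocystales": 250, "Coccolithales": 50}

ratios = rdna_to_rrna_ratio(rdna, rrna)

# ASSUMPTION: an arbitrary cut-off, chosen only to show how lineages
# over-represented in rDNA relative to rRNA could be flagged.
threshold = 2.0
flagged = sorted(lineage for lineage, r in ratios.items() if r > threshold)

for lineage, r in ratios.items():
    print(f"{lineage}: {r:.2f}")
print("possible high copy number:", flagged)
```

A ratio near 1 across lineages, as reported in the text for most Haptophyte groups, is what one would expect if copy number differences play a minor role.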
In our study, sub-data sets shared no more than
half of their clusters. There are multiple potential
explanations for this relatively low overlap. Even if
undersampling and insufficient sequencing depth were expected
to be reduced with NGS methods, we can
see here that they unfortunately remain ongoing issues. This
relatively low overlap also shows that we
are still far from describing the entire environmental
Haptophyte diversity when focusing on a single
location, even when using a Haptophyte-specific primer
set and deep sequencing methods. We confirmed
here that studying rDNA and rRNA from the same
sample gives complementary information on diversity
(Not et al. 2009), and for future exploration of
environmental Haptophyte diversity, it seems
necessary to us to include both templates and, if possible,
different sets of primers targeting different genomic
regions. Inferring diversity from a single source of
information, such as a single molecular marker (Bittner
et al. 2010; Piganeau et al. 2011) or a single primer
set, may bias the results or at least
give a different view of diversity (Stoeck et al. 2010).
Pluralistic approaches should be used in future studies
in order to build a truly exhaustive picture of
Haptophyte diversity.
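The overlap between sub-data sets discussed in this section can be expressed in more than one way, and the text does not state which denominator was used. The sketch below therefore reports two common measures, under that stated uncertainty: the Jaccard index (shared OTUs over all OTUs in either set) and the fraction of the smaller set that is shared. The OTU identifiers are hypothetical.

```python
def otu_overlap(otus_a, otus_b):
    """Return (number shared, Jaccard index, fraction of the smaller set shared)."""
    a, b = set(otus_a), set(otus_b)
    if not a or not b:
        return 0, 0.0, 0.0
    shared = a & b
    jaccard = len(shared) / len(a | b)
    of_smaller = len(shared) / min(len(a), len(b))
    return len(shared), jaccard, of_smaller


# Hypothetical OTU identifiers detected in two sub-data sets.
rdna_otus = ["otu1", "otu2", "otu3", "otu4", "otu5", "otu6"]
rrna_otus = ["otu4", "otu5", "otu6", "otu7", "otu8", "otu9"]

n_shared, jaccard, of_smaller = otu_overlap(rdna_otus, rrna_otus)
print(n_shared, round(jaccard, 2), round(of_smaller, 2))
```

Because these measures count presence only, rare OTUs weigh as much as abundant ones, so they are sensitive to sequencing depth in the way the text describes.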
Conclusion
This work is the first NGS environmental study
focusing on Haptophytes. Our Haptophyte-specific primer
set, targeting the D1-D2 domain of the LSU rDNA gene,
permitted the discovery of a significantly higher number
of Haptophyta phylotypes than previous
studies based on Sanger sequencing of clone libraries or
pyrosequencing using universal eukaryote primers
targeting SSU rDNA. The majority of the new environmental
LSU 454 reads did not cluster with taxonomically
known sequences. This result further highlights the
major gap between the well-defined diversity
and classification inferred from cultivated
microorganisms and their significantly larger natural diversity, which
is not yet well understood. We also pointed out that
estimating the number and the diversity of rDNA
copies in Haptophyte genomes is important
for interpreting environmental diversity studies. We
detected a significant dissimilarity between
communities from different depths; this trend will have
to be further linked to physicochemical parameters in
Naples Bay and to the presence of other
microorganisms (protists, bacteria, viruses) for a better understanding
of the structuring. Our study highlighted once more
that diversity inferred from environmental
samples is partly dependent on the samples used.
For future studies, we therefore recommend deep
sequencing of each sample, combining results from
the rDNA and rRNA templates and, if possible, using
more than one primer set.
Acknowledgments
We thank the BioMarKs consortium, and especially Fabrice
Not, Adriana Zingone and the staff and crew of the Stazione
Zoologica Napoli, for organizing the sampling of the material
analysed herein. We thank Professor Jean-Michel Claverie for
providing free access to the large computer facility of the
PACA-Bioinfo IBISA platform. Thanks to Richard Christen and
Frederic Mahe for their support and advice about computer
analyses. This work is part of the EU EraNet BiodivErsA pro-
gram BioMarKs (CdV) and the Norwegian Research Council
project HAPTODIV (190307/S40) (BE, EE). We acknowledge
the following programmes for additional support: the French
ANR grant POSEIDON (ANR-09-BLAN-0348-01) for LB, the
ANR project PROMETHEUS (ANR-09-GENM-031) for AG, the
projects ANR-09-PCS-GENM-218 and ANR-08-BDVA-003 for
SS and HO, and the EU ASSEMBLE project (227799) for IP.
The authors are grateful to Michelle Gehringer and to Micah
Dunthorn for helpful rereadings. The authors gratefully thank
anonymous referees and the subject editor for their thorough
reviews and constructive criticism on previous versions of the
manuscript.
References
Amaral-Zettler LA, McCliment EA, Ducklow HW, Huse SM
(2009) A method for studying protistan diversity using mas-
sively parallel sequencing of V9 hypervariable regions of
small-subunit ribosomal RNA genes. PLoS ONE, 4, e6372.
Barberan A, Bates ST, Casamayor EO, Fierer N (2011) Using
network analysis to explore co-occurrence patterns in soil
microbial communities. ISME Journal, 6, 343–351.
Behnke A, Engel M, Christen R, Nebel M, Klein RR, Stoeck T
(2010) Depicting more accurate pictures of protistan commu-
nity complexity using pyrosequencing of hypervariable SSU
rRNA gene regions. Environmental Microbiology, 13, 340–349.
Berger SA, Krompass D, Stamatakis A (2011) Performance,
accuracy, and web server for evolutionary placement of
short sequence reads under maximum likelihood. Systematic
Biology, 60, 291–302.
Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R,
Thomas WK (2012) Sequencing our way towards under-
standing global eukaryotic biodiversity. Trends in Ecology and
Evolution, 27, 233–243.
Bittner L (*), Halary S (*), Payri C et al. (2010) Some consider-
ations for analyzing biodiversity using integrative metage-
nomics and gene networks. Biology Direct, 5, 47.
Bray JR, Curtis JT (1957) An ordination of the upland forest
communities of Southern Wisconsin. Ecological Monographs,
27, 326–349.
Brown MV, Philip GK, Bunge JA, Smith MC, Bisset A, Lauro
FM et al. (2009) Microbial community structure in the North
Pacific Ocean. ISME Journal, 3, 1374–1386.
Caron DA, Countway PD (2009) Hypotheses on the role of the
protistan rare biosphere in a changing world. Aquatic Micro-
bial Ecology, 57, 227–238.
Caron DA, Countway PD, Savai P, Gast RJ et al. (2009) Defin-
ing DNA-based operational taxonomic units for microbial
eukaryote ecology. Applied Environmental Microbiology, 75,
5797–5808.
Cheung MK, Au CH, Chu KH, Kwan HS, Wong CK (2010)
Composition and genetic diversity of picoeukaryotes in sub-
tropical coastal waters as revealed by 454 pyrosequencing.
ISME Journal, 4, 1053–1059.
Clarke KR (1993) Non-parametric multivariate analyses of changes
in community structure. Australian Journal of Ecology, 18, 117–143.
Cortes MY, Bollmann J, Thierstein HR (2001) Coccolithophore
ecology at the HOT station ALOHA, Hawaii. Deep-Sea
Research II, 48, 1957–1981.
Cuvelier ML, Allen AE, Monier A et al. (2010) Targeted
metagenomics and ecology of globally important uncultured
eukaryotic phytoplankton. Proceedings of the National Academy
of Sciences of the United States of America, 107, 14679–14684.
Dawson SC, Hagen KD (2009) Mapping the protistan ‘rare
biosphere’. Journal of Biology, 8, 105.
Edgar RC (2010) Search and clustering orders of magnitude
faster than BLAST. Bioinformatics, 26, 2460–2461.
Edgcomb V, Orsi W, Bunge J et al. (2011) Protistan microbial
observatory in the Cariaco Basin, Caribbean I. Pyrosequencing
vs Sanger insights into species richness. ISME Journal, 5,
1344–1356.
Edvardsen B, Eikrem W, Throndsen JAS, Probert I, Medlin L
(2011) Ribosomal DNA phylogenies and a morphological
revision provide the basis for a new taxonomy of Prymnesiales
(Haptophyta). European Journal of Phycology, 46, 202–228.
Eiler A, Heinrich F, Bertilsson S (2011) Coherent dynamics and
association networks among lake bacterioplankton taxa.
ISME Journal, 6, 330–342.
Frada MJ, Bidle KD, Probert I, de Vargas C (2012) In situ survey
of life cycle phases of the coccolithophore Emiliania huxleyi
(Haptophyta). Environmental Microbiology, 14, 1558–1569.
Gilles A, Meglecz E, Pech N, Ferreira S, Malausa T, Martin JF
(2011) Accuracy and quality assessment of 454 GS-FLX Tita-
nium pyrosequencing. BMC Genomics, 19, 245.
Gower JC (1966) Some distance properties of latent root and
vector methods used in multivariate analysis. Biometrika, 53,
325–338.
Guindon S, Gascuel O (2003) A simple, fast, and accurate algo-
rithm to estimate large phylogenies by maximum likelihood.
Systematic Biology, 52, 696–704.
Hall TA (1999) BioEdit: a user-friendly biological sequence
alignment editor and analysis program for Windows 95/98/
NT. Nucleic Acids Symposium Series, 41, 95–98.
Hammer Ø, Harper DAT, Ryan PD (2001) PAST: paleontological
statistics software package for education and data analysis.
Palaeontologia Electronica, 4, 1–9.
Huber JA, Mark Welch DB, Morrison HG et al. (2007) Microbial
population structures in the deep marine biosphere. Science,
318, 97–100.
Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM (2007)
Accuracy and quality of massively parallel DNA
pyrosequencing. Genome Biology, 8, R143.
Huse SM, Dethlefsen L, Huber JA, et al. (2008) Exploring micro-
bial diversity and taxonomy using SSU rRNA hypervariable
tag sequencing. PLoS Genetics, 4, e1000255.
Huse SM, Welch DM, Morrison HG, Sogin ML (2010) Iron-
ing out the wrinkles in the rare biosphere through
improved OTU clustering. Environmental Microbiology, 12,
1889–1898.
Katoh K, Toh H (2008) Improved accuracy of multiple ncRNA
alignment by incorporating structural information into a
MAFFT-based framework. BMC Bioinformatics, 9, 212–224.
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P (2010)
Wrinkles in the rare biosphere: pyrosequencing errors can
lead to artificial inflation of diversity estimates. Environmental
Microbiology, 12, 118–123.
Kysela DT, Palacios C, Sogin ML (2005) Serial analysis of
V6-ribosomal sequence tags (SARST-V6): a method for effi-
cient, high-throughput analysis of microbial community
composition. Environmental Microbiology, 7, 356–364.
Legendre P, Gallagher ED (2001) Ecologically meaningful
transformations for ordination of species data. Oecologia, 129,
271–280.
Liu H, Probert I, Uitz J et al. (2009) Extreme diversity in non-
calcifying Haptophytes explains a major pigment paradox in
open oceans. Proceedings of the National Academy of Sciences of
the United States of America, 106, 12803–12808.
Logares R, Audic S, Santini S, Pernice MC, de Vargas C,
Massana R (2012) Diversity patterns and activity of uncultured
marine heterotrophic flagellates unveiled with pyrosequenc-
ing. ISME Journal, 6, 1823–1833.
Long JD, Smalley GW, Barsby T, Anderson JT, Hay ME (2007)
Chemical cues induce consumer-specific defenses in a
bloom-forming marine phytoplankton. Proceedings of the
National Academy of Sciences of the United States of America,
104, 10512–10517.
Malinverno E, Ziveri P, Corselli C (2003) Coccolithophorid
distribution in the Ionian Sea and its relationship to eastern
Mediterranean circulation during late fall to early winter
1997. Journal of Geophysical Research, 108, 8115.
Man-Aharonovich D, Philosof A, Kirkup BC et al. (2010) Diver-
sity of active marine picoeukaryotes in the Eastern Mediter-
ranean Sea unveiled using photosystem-II psbA transcripts.
ISME Journal, 4, 1044–1052.
Margulies M, Egholm M, Altman WE et al. (2005) Genome
sequencing in microfabricated high-density picolitre reactors.
Nature, 437, 376–380.
Massana R, Pernice M, Bunge JA, del Campo J (2011) Sequence
diversity and novelty of natural assemblages of picoeukary-
otes from the Indian Ocean. ISME Journal, 5, 184–195.
Matsen FA, Kodner RB, Armbrust EV (2010) Pplacer: linear time
maximum-likelihood and Bayesian phylogenetic placement of
sequences onto a fixed reference tree. BMC Bioinformatics, 11, 538.
McDonald SM, Sarno D, Scanlan DJ, Zingone A (2007) Genetic
diversity of eukaryotic ultraphytoplankton in the Gulf of
Naples during an annual cycle. Aquatic Microbial Ecology, 50,
75–89.
Medlin LK, Kooistra WHCF (2010) Methods to estimate the
diversity in the marine photosynthetic protist community
with illustrations from case studies: a review. Diversity, 2,
973–1014.
Monchy S, Sanciu G, Jobard M et al. (2011) Exploring and
quantifying fungal diversity in freshwater lake ecosystems
using rDNA cloning/sequencing and SSU tag pyrosequencing.
Environmental Microbiology, 13, 1433–1453.
Nakayama T, Yoshida M, Noel M-H, Kawachi M, Inouye I
(2005) Ultrastructure and phylogenetic position of Chrysocul-
ter rhomboideus gen. et sp. nov (Prymnesiophyceae), a new
flagellate haptophyte from Japanese coastal waters. Phycolo-
gia, 44, 369–383.
Nebel M, Pfabel C, Stock A, Dunthorn M, Stoeck T (2011)
Delimiting operational taxonomic units for assessing cili-
ate environmental diversity using small-subunit rRNA
gene sequences. Environmental Microbiology Reports, 3,
154–158.
Nolte V, Pandey RV, Jost S et al. (2010) Contrasting seasonal
niche separation between rare and abundant taxa conceals the
extent of protist diversity. Molecular Ecology, 19, 2908–2915.
Not F, del Campo J, Balague V, de Vargas C, Massana R (2009)
New insights into the diversity of marine picoeukaryotes.
PLoS ONE, 4, e7143.
Oksanen J, Kindt R, Legendre P, O’Hara RB (2007) vegan:
community ecology package, version 1.8-5. Available from
http://r-forge.r-project.org/projects/vegan/.
Olsen GJ, Lane DJ, Giovannoni SJ, Pace NR, Stahl DA (1986)
Microbial ecology and evolution: a ribosomal RNA approach.
Annual Review of Microbiology, 40, 337–365.
Pawlowski J, Burki F (2009) Untangling the Phylogeny of
Amoeboid Protists. Journal of Eukaryotic Microbiology, 56, 16–25.
Pawlowski J, Christen R, Lecroq B et al. (2011) Eukaryotic
richness in the abyss: insights from pyrotag sequencing.
PLoS ONE, 6, e18169.
Piganeau G, Eyre-Walker A, Grimsley N, Moreau H (2011)
How and why DNA barcodes underestimate the diversity of
microbial eukaryotes. PLoS ONE, 6, e16342.
Posada D (2008) jModelTest: phylogenetic model averaging.
Molecular Biology and Evolution, 25, 1253–1256.
Pulsinelli GA, Temin HM (1994) High rate of mismatch
extension during reverse transcription in a single round of
retrovirus replication. Proceedings of the National Academy of Sciences
of the United States of America, 91, 9490–9494.
Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ (2011)
Removing noise from pyrosequenced amplicons. BMC Bioin-
formatics, 12, 38.
Roesch LF, Fulthorpe RR, Riva A et al. (2007) Pyrosequencing
enumerates and contrasts soil microbial diversity. ISME
Journal, 1, 283–290.
Schloss PD (2010) The effects of alignment quality, distance
calculation method, sequence filtering, and region on the
analysis of 16S rRNA gene-based studies. PLoS Computational
Biology, 6, e1000844.
Schoemann V, Becquevort S, Stefels J, Rousseau V, Lancelot C (2005)
Phaeocystis blooms in the global ocean and their controlling
mechanisms: a review. Journal of Sea Research, 53, 43–66.
Shalchian-Tabrizi K, Reier-Røberg K, Ree DK, Klaveness D,
Brate J (2011) Marine-freshwater colonizations of
Haptophytes inferred from phylogeny of environmental 18S rDNA
sequences. Journal of Eukaryotic Microbiology, 58, 315–318.
Shannon CE (1948) A mathematical theory of communication.
Bell System Technical Journal, 27, 379–423.
Shi XL, Marie D, Jardillier L, Scanlan DJ, Vaulot D (2009)
Groups without cultured representatives dominate eukary-
otic picophytoplankton in the oligotrophic South East Pacific
Ocean. PLoS ONE, 4, e7657.
Shi XL, Lepere C, Scanlan DJ, Vaulot D (2011) Plastid 16S rRNA gene
diversity among eukaryotic picophytoplankton sorted by flow
cytometry from the South Pacific Ocean. PLoS ONE, 6, e18979.
Simpson EH (1949) Measurement of diversity. Nature, 163, 688.
Sogin ML, Morrison HG, Huber J et al. (2006) Microbial diver-
sity in the deep sea and the underexplored ‘‘rare biosphere’’.
Proceedings of the National Academy of Sciences of the United
States of America, 103, 12115–12120.
Stoeck T, Behnke A, Christen R et al. (2009) Massively parallel
tag sequencing reveals the complexity of anaerobic marine
protistan communities. BMC Biology, 7, 72.
Stoeck T, Bass D, Nebel M et al. (2010) Multiple marker parallel
tag environmental DNA sequencing reveals a highly com-
plex eukaryotic community in marine anoxic water. Molecu-
lar Ecology, 19, 21–31.
Sydow JF, Cramer P (2009) RNA polymerase fidelity and tran-
scriptional proofreading. Current Opinion in Structural Biol-
ogy, 19, 732–739.
Winter A, Rost B, Hilbrecht H, Elbrachter M (2002) Vertical
and horizontal distribution of coccolithophores in the Carib-
bean Sea. Geo-Marine Letters, 22, 150–161.
Woese CR (1987) Bacterial evolution. Microbiological Reviews, 51,
221–271.
L.B. and C.D.V. initiated the research; L.B. designed and
coordinated it. L.B. and S.R. performed the molecular
experiments. L.B., I.P., B.E., E.S.E., and C.D.V. helped build
the reference database. L.B., S.A., A.G., S.S., and H.O.
analyzed the data. L.B. wrote the manuscript. All authors
participated in revising the manuscript, and all read and
approved the final article.
Data accessibility
Dryad Digital Repository, Package Identifier
doi:10.5061/dryad.tv5v1v26, and Sequence Read Archive (SRA),
http://www.ebi.ac.uk/ena/data/view/ERP001891
Supporting information
Additional supporting information may be found in the online ver-
sion of this article.
Table S1 Primer set used to amplify the D1-D2 region of the
Haptophyte LSU rDNA
Table S2 Specificity of our primer set highlighted by in silico
analysis.
Table S3 Effect of cleaning.
Table S4 Comparison of community similarities between tem-
plates, size fractions and depths as described by ANOSIM values.
Table S5 Number of reads and number of OTU considering
LSU1 reads at 99%, 98% and 97% of sequence identity.
Fig. S1 Location of the ‘Mare Chiara’ (MC) station, Bay of
Naples, Mediterranean Sea.
Fig. S2 Relative position of the LSU1 Haptophyta-specific pri-
mer set compared to the LSU rDNA D1-D2 domain.
Fig. S3 Expected D1-D2 LSU rDNA genetic distances at vari-
ous Haptophyta taxonomic levels.
Fig. S4 Rarefaction analysis of OTU100%, OTU99%, OTU98% and
OTU97% pooling reads from the eight samples.
Fig. S5 Taxonomic composition of LSU rDNA reads not
assigned to Haptophyta at 85% homology, as inferred from
BLAST analyses in GenBank, using default parameters.
Fig. S6 In the framework of another study, SSU rDNA reads
from the V4 and the V9 region were pyrosequenced for the
eight same samples from Naples using ‘universal’ eukaryotic
primers (details on the primers can be found in Stoeck et al.
2010 or in Logares et al. 2012).
© 2012 Blackwell Publishing Ltd
Fig. S7 Boxplots summarizing the range of α-diversity
(calculated using Simpson’s index) for each condition, calculated
at the 97% clustering level.
Fig. S8 Venn diagrams calculated with OTU97%.
Fig. S9 Heatmap summarizing the % of shared OTU97%
between the samples.
Fig. S10 More details on the proportion of reads assigned by
phylogenetic mapping for the main Haptophytes lineages.
Fig. S11 Haptophyte community structure based on NMDS
(nonmetric multidimensional scaling) ordination of the LSU
rDNA dataset distance matrices from OTU97%, followed by
comparison of community similarities between depths, size
fractions and templates as described by ANOSIM values.