assessment of microbial communities by graph …aem.asm.org/content/75/18/5863.full.pdfstudies have...

8
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Sept. 2009, p. 5863–5870 Vol. 75, No. 18 0099-2240/09/$08.000 doi:10.1128/AEM.00748-09 Copyright © 2009, American Society for Microbiology. All Rights Reserved. Assessment of Microbial Communities by Graph Partitioning in a Study of Soil Fungi in Two Alpine Meadows L. Zinger, 1 * E. Coissac, 1 P. Choler, 1,2,3 and R. A. Geremia 1 Laboratoire d’Ecologie Alpine, CNRS UMR 5553, Universite ´ de Grenoble, BP 53, F-38041 Grenoble Cedex 09, France 1 ; Station Alpine J. Fourier CNRS UMS 2925, Universite ´ de Grenoble, F-38041 Grenoble, France 2 ; and CSIRO Marine and Atmospheric Research, Canberra, Australia 3 Received 2 April 2009/Accepted 10 July 2009 Understanding how microbial community structure and diversity respond to environmental conditions is one of the main challenges in environmental microbiology. However, there is often confusion between determining the phylogenetic structure of microbial communities and assessing the distribution and diversity of molecular operational taxonomic units (MOTUs) in these communities. This has led to the use of sequence analysis tools such as multiple alignments and hierarchical clustering that are not adapted to the analysis of large and diverse data sets and not always justified for characterization of MOTUs. Here, we developed an approach combining a pairwise alignment algorithm and graph partitioning by using MCL (Markov clustering) in order to generate discrete groups for nuclear large-subunit rRNA gene and internal transcript spacer 1 sequence data sets obtained from a yearly monitoring study of two spatially close but ecologically contrasting alpine soils (namely, early and late snowmelt locations). We compared MCL with a classical single-linkage method (Ccomps) and showed that MCL reduced bias such as the chaining effect. Using MCL, we characterized fungal communities in early and late snowmelt locations. We found contrasting distributions of MOTUs in the two soils, suggesting that there is a high level of habitat filtering in the assembly of alpine soil fungal communities. However, few MOTUs were specific to one location. Assessment of the diversity and composition of microbial communities is a central issue in understanding the effects of environmental conditions on microbial community assembly. With this framework, massive cloning and sequencing or py- rosequencing of universal DNA markers (e.g., rRNA genes) allows microbial communities to be characterized in a qualita- tive and semiquantitative way (37, 50). The composition and diversity of microbial communities based on such methods are assessed by (i) evaluating distances between DNA sequences, (ii) classifying DNA sequences in molecular taxonomic units (MOTUs), and (iii) identifying MOTUs using reference data- bases (i.e., barcoding approaches) (48). Although numerous studies have focused on subsequent estimation of diversity and richness (24, 39, 44), only a few studies have cast doubt on the alignment and sequence classification methods (25). Multiple-sequence alignment tools are fairly popular in en- vironmental microbiology, although establishment of phyloge- netic links between phylotypes is not necessarily required. DNA markers that are well suited for barcoding approaches sensu lato, as defined by Valentini et al. (48), require sufficient variation in composition and size, making analysis of subse- quent data sets with multiple-sequence alignments difficult. Indeed, based on heuristics, these algorithms excessively pe- nalize insertion-deletion events (32). Pairwise-sequence align- ments obtained using exact algorithms, such as NWS (35), avoid this limitation and have already been used in bacterial (25) and fungal community studies (37, 39). However a large proportion of DNA sequences obtained from massive sequenc- ing are partial sequences due to interruption of the sequencing reaction, which causes the NWS algorithm to overestimate the distance between complete and partial sequences that are ac- tually similar. In this context, use of a variant of this algorithm, Free End Gap (46), has the advantage that gaps opening or extending at the beginning or at the end of partial sequences are not penalized. DNA classification is an application of graph theory from mathematics. It consists of partitioning a graph composed of nodes (DNA sequences) connected by edges (similarities) so that each node belongs to only one cluster and is connected to most of the other nodes in this cluster (Fig. 1). There are numerous graph partitioning methods with different philoso- phies. The use of a particular method can influence the diver- sity and richness estimates for microbial communities. The most classical graph partitioning methods, the single- and com- plete-linkage methods (3), are based on merging a node or a cluster with the nearest (single link) or farthest (complete link) neighbors. While the single-linkage method is a true partition- ing method, the latter approach relies on the maximal clique concept, where one node can belong to several clusters. This feature is often bypassed by artificially organizing the classifi- cation into a hierarchy, which is not always justified biologi- cally. On the other hand, the single-linkage method fails to separate clusters connected by one or few nodes, resulting in the so-called “chaining effect” (Fig. 1). The Markov clustering algorithm (MCL) is an alternative to the classical clustering methods. Like the single-linkage method, it is a true partition- ing method that additionally simulates node merging according * Corresponding author. Mailing address: Laboratoire d’Ecologie Alpine UJF/CNRS, Universite ´ de Grenoble, 2233, rue de la Piscine, BP 53 Bat D Biologie, Grenoble F-38041, France. Phone: 33-4-76-51- 44-59. Fax: 33-4-76-51-42-79. E-mail: [email protected]. † Supplemental material for this article may be found at http://aem .asm.org/. Published ahead of print on 17 July 2009. 5863 on June 6, 2018 by guest http://aem.asm.org/ Downloaded from

Upload: hoangthuy

Post on 21-Apr-2018

214 views

Category:

Documents


2 download

TRANSCRIPT

APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Sept. 2009, p. 5863–5870 Vol. 75, No. 180099-2240/09/$08.00�0 doi:10.1128/AEM.00748-09Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Assessment of Microbial Communities by Graph Partitioning in aStudy of Soil Fungi in Two Alpine Meadows�†

L. Zinger,1* E. Coissac,1 P. Choler,1,2,3 and R. A. Geremia1

Laboratoire d’Ecologie Alpine, CNRS UMR 5553, Universite de Grenoble, BP 53, F-38041 Grenoble Cedex 09, France1;Station Alpine J. Fourier CNRS UMS 2925, Universite de Grenoble, F-38041 Grenoble, France2; and

CSIRO Marine and Atmospheric Research, Canberra, Australia3

Received 2 April 2009/Accepted 10 July 2009

Understanding how microbial community structure and diversity respond to environmental conditions is oneof the main challenges in environmental microbiology. However, there is often confusion between determiningthe phylogenetic structure of microbial communities and assessing the distribution and diversity of molecularoperational taxonomic units (MOTUs) in these communities. This has led to the use of sequence analysis toolssuch as multiple alignments and hierarchical clustering that are not adapted to the analysis of large anddiverse data sets and not always justified for characterization of MOTUs. Here, we developed an approachcombining a pairwise alignment algorithm and graph partitioning by using MCL (Markov clustering) in orderto generate discrete groups for nuclear large-subunit rRNA gene and internal transcript spacer 1 sequencedata sets obtained from a yearly monitoring study of two spatially close but ecologically contrasting alpine soils(namely, early and late snowmelt locations). We compared MCL with a classical single-linkage method(Ccomps) and showed that MCL reduced bias such as the chaining effect. Using MCL, we characterized fungalcommunities in early and late snowmelt locations. We found contrasting distributions of MOTUs in the twosoils, suggesting that there is a high level of habitat filtering in the assembly of alpine soil fungal communities.However, few MOTUs were specific to one location.

Assessment of the diversity and composition of microbialcommunities is a central issue in understanding the effects ofenvironmental conditions on microbial community assembly.With this framework, massive cloning and sequencing or py-rosequencing of universal DNA markers (e.g., rRNA genes)allows microbial communities to be characterized in a qualita-tive and semiquantitative way (37, 50). The composition anddiversity of microbial communities based on such methods areassessed by (i) evaluating distances between DNA sequences,(ii) classifying DNA sequences in molecular taxonomic units(MOTUs), and (iii) identifying MOTUs using reference data-bases (i.e., barcoding approaches) (48). Although numerousstudies have focused on subsequent estimation of diversity andrichness (24, 39, 44), only a few studies have cast doubt on thealignment and sequence classification methods (25).

Multiple-sequence alignment tools are fairly popular in en-vironmental microbiology, although establishment of phyloge-netic links between phylotypes is not necessarily required.DNA markers that are well suited for barcoding approachessensu lato, as defined by Valentini et al. (48), require sufficientvariation in composition and size, making analysis of subse-quent data sets with multiple-sequence alignments difficult.Indeed, based on heuristics, these algorithms excessively pe-nalize insertion-deletion events (32). Pairwise-sequence align-ments obtained using exact algorithms, such as NWS (35),

avoid this limitation and have already been used in bacterial(25) and fungal community studies (37, 39). However a largeproportion of DNA sequences obtained from massive sequenc-ing are partial sequences due to interruption of the sequencingreaction, which causes the NWS algorithm to overestimate thedistance between complete and partial sequences that are ac-tually similar. In this context, use of a variant of this algorithm,Free End Gap (46), has the advantage that gaps opening orextending at the beginning or at the end of partial sequencesare not penalized.

DNA classification is an application of graph theory frommathematics. It consists of partitioning a graph composed ofnodes (DNA sequences) connected by edges (similarities) sothat each node belongs to only one cluster and is connected tomost of the other nodes in this cluster (Fig. 1). There arenumerous graph partitioning methods with different philoso-phies. The use of a particular method can influence the diver-sity and richness estimates for microbial communities. Themost classical graph partitioning methods, the single- and com-plete-linkage methods (3), are based on merging a node or acluster with the nearest (single link) or farthest (complete link)neighbors. While the single-linkage method is a true partition-ing method, the latter approach relies on the maximal cliqueconcept, where one node can belong to several clusters. Thisfeature is often bypassed by artificially organizing the classifi-cation into a hierarchy, which is not always justified biologi-cally. On the other hand, the single-linkage method fails toseparate clusters connected by one or few nodes, resulting inthe so-called “chaining effect” (Fig. 1). The Markov clusteringalgorithm (MCL) is an alternative to the classical clusteringmethods. Like the single-linkage method, it is a true partition-ing method that additionally simulates node merging according

* Corresponding author. Mailing address: Laboratoire d’EcologieAlpine UJF/CNRS, Universite de Grenoble, 2233, rue de la Piscine,BP 53 Bat D Biologie, Grenoble F-38041, France. Phone: 33-4-76-51-44-59. Fax: 33-4-76-51-42-79. E-mail: [email protected].

† Supplemental material for this article may be found at http://aem.asm.org/.

� Published ahead of print on 17 July 2009.

5863

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

to the local neighborhood topology, avoiding the chaining phe-nomenon (49). This method is being used increasingly in bioin-formatics, mostly in proteomic and genetic analyses (7, 13,29, 52).

Fungi are ubiquitous in soils and can have a broad range offunctions in ecosystem processes. They are involved in nutrientcycling through mycorrhizal associations and litter decompo-sition, and they also strongly affect plant diversity via mutual-isms or pathogenic interactions (1, 17, 21). Moreover, soilscontain a great diversity of fungal species (1, 6, 37), whosespatial distribution results from many factors, such as dispersal,plant coverage, land use, and soil properties (14, 27, 30, 43, 51).Temperate alpine tundra landscapes exhibit mesotopographi-cal variations that strongly affect local snow cover dynamics(28), which in turn influences soil properties, plant cover, andecosystem processes (10, 12, 31). Alpine tundra thus is a mo-saic of ecosystems distributed along snow cover gradients.Fungi have previously been described as an important compo-nent of the soil microbial biomass in alpine soils (43), and

numerous studies have focused on mycorrhizal fungus distri-butions (11, 18, 34, 41). However, only a few surveys haveattempted to characterize whole soil fungal communities (5,43). In previous work, we highlighted covariations in snowcover dynamics and the composition and diversity of soil fungalcommunities (55). Although these studies provided insight intothe seasonal succession and diversity of fungal communities inalpine tundra, the assembly of these communities remainspoorly characterized.

To avoid the limitations of classical analysis methods forDNA sequence classification, we coupled pairwise-sequencealignments with a graph partitioning approach by using MCL.In order to illustrate the suitability of MCL, we compared theclustering quality obtained with this algorithm and that ob-tained with a classical single-linkage method, Ccomps (con-nected components). We analyzed two data sets resulting frommassive cloning and sequencing of internal transcript spacer 1(ITS1) and part of the large-subunit rRNA gene (LSU). TheseDNA regions were chosen because more-numerous sequencesare available in international databases and provided the op-portunity to assess the suitability of the clustering method forboth highly polymorphic and conserved DNA fragments. Datasets were obtained for two closed soils that differed in snowcover duration: soil from an early snowmelt location (ESM)and soil from a late snowmelt location (LSM) (see Materialsand Methods). The proposed approach was used to furthercharacterize ESM and LSM fungal communities and their sea-sonal variations.

MATERIALS AND METHODS

Study site description and sampling. The study site was located in the GrandGalibier massif (French southwestern Alps; 45°0.05�N, 06°0.38�E; elevation,2,480 m). The compositions of microbial communities at ESM and LSM loca-tions were studied. Although the two locations were separated by only approx-imately 20 m, they differed greatly in the duration of winter snow cover becauseof topographical effects. The shallow winter snowpack in the ESM plot led tofreezing of the soil in winter and dominance of a stress-tolerant graminoid,Kobresia muysoroides (Cyperaceae), and a dwarf shrub, Dryas octopetala (Rosa-ceae). In contrast, the LSM plot had a persistent and deep winter snowpack,which insulated the soil from cold winter temperatures. It was dominated bylow-stature species that have a shorter growing season, such as Carex foetida(Cyperaceae), Alpecurus alpinus (Poaceae), Alchemilla pentaphyllea (Rosaceae),and Salix herbaceae (Salicaceae) (4, 10). The soils of these two plots also differedin clay content (which was higher in the LSM plot) and in organic matter content(which was higher in the ESM plot). The soil pH ranged from �5.5 in the LSMplot to �6.5 in the ESM plot, but the values were inverted in May (55).

The following four sampling dates were chosen to measure variations in themicrobial communities in relation to snow cover dynamics: 24 June 2005, whenthe growing season was starting in the ESM plot and the snow was melting in theLSM plot; 10 August 2005, during the time when the amount of standing biomasswas largest; 10 October 2005, during litter fall; and 3 May 2006, at the end ofwinter, just after thawing occurred in the ESM plot and the LSM plot was stillunder a consistent snowpack that was 2.5 m deep. Five spatial replicates wereobtained for each plot on each date and sieved (2 mm) in order to homogenizethe soil and remove rocks and the main parts of roots.

Clone library construction. Soil DNA extraction was carried out with a Powersoil extraction kit (MO BIO Laboratories, Ozyme, St. Quentin en Yvelines,France) as previously described (55), and the DNA extracts from the five spatialreplicates for each location were pooled to limit the soil spatial heterogeneity.Moreover, this pooling strategy has been reported to yield a representativefungal molecular signature for the area studied (45), which was confirmed by apreliminary study of spatial replicates using capillary electrophoresis–single-strand conformation polymorphism analysis (data not shown).

From the eight remaining DNA pools, the LSU gene was amplified withprimers U1 (5�-GTGAAATTGTTGAAAGGGAA-3�) (42) and nLSU1221R(5�-CTAGATGAACYAACACCTT-3�) (43) as described previously (55). A sec-

FIG. 1. Weighted nondirected graph showing the clustering resultsfor a data subset subjected to (A) Ccomps and (B) MCL clustering.Each node represents a clone, and each edge represents a similarityvalue of at least 98% for two clones. The different shades of gray oflabels illustrate the subgroups formed by Ccomps and MCL. Thegraphic was drawn with fdp (www.graphviz.org).

5864 ZINGER ET AL. APPL. ENVIRON. MICROBIOL.

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

ond clone library of the ITS1 region was also constructed with primers ITS5(5�-GGAAGTAAAAGTCGTAACAACG-3�) and ITS2 (5�-GCTGCGTTCTTCATCGATGC-3�) (54) using the PCR conditions reported previously (55) forcapillary electrophoresis–single-strand conformation polymorphism analysis. Forboth LSU and ITS1, eight independent PCR amplifications were performed, andthen the preparations were mixed for each of the DNA pools in order to dilutethe PCR biases.

The 16 PCR products were then cloned using a TOPO TA PCR 2.1 cloning kit(Invitrogen SARL, Molecular Probes, Cergy Pontoise, France), and ligated prep-arations were sent to the Centre National de Sequencage (Genoscope, Evry,France) for transformation and sequencing. A total of 2,844 and 2,956 sequenceswere obtained for LSU and ITS1, respectively, and the number of sequences persample ranged from 350 to 380.

Sequence analysis. DNA sequences obtained from the 16 clone libraries weregrouped in one data set for each DNA marker. Sequences shorter than 100 bp(ITS1) or 500 bp (LSU) were removed from the analysis to ensure overlap of allsequences. Our data sets contained numerous sequences that were variable in length(incomplete LSU sequences and great size polymorphism of ITS sequences), whichprevented us from obtaining reliable alignments using multiple-sequence alignmenttools. Thus, we performed pairwise-sequence alignment based on the Free End Gapdynamic programming alignment algorithm (46) and computed similarity matricesusing the “borneo” software (available by request at [email protected]).

Based on the similarity matrices, both graphs were subjected to analysis witha single-linkage method, Ccomps (www.graphviz.org), and with the Markovclustering algorithm MCL (49; micans.org) with 95, 97, and 98% similaritythresholds. The main parameter of MCL, inflation, also called granularity, de-termines the partition sharpness. The parameters of MCL were adjusted with aninitial inflation value of 6 and a main iteration value of 100. Graph visualizationwas obtained using fdp (www.graphviz.org). In order to evaluate the sensitivity ofclustering to the chaining effect, we determined the cluster number and cohesionobtained with Ccomps and MCL for the two data sets. Cluster cohesion wasassessed as follows: (e/E) � (n/N), where e is the number of edges observed, Eis the maximum number of edges, n is the number of nodes, and N is the totalnumber of nodes in the graph. This formula results in a value between 0 and 1,with 0 meaning that no nodes are connected in the graph and 1 meaning that allnodes are connected.

Each DNA sequence was then compared with imported reference sequencesfrom the GenBank database (http://www.ncbi.nlm.nih.gov; accession date, No-vember 2008) using BLAST (2). The best BLAST match was used for sequenceidentification. Nonmatching or partially matching sequences were considerednonexploitable sequences and thus were not used in subsequent analyses. Be-cause of the ambiguity of the rarest MOTUs (e.g., the MOTUs containing lessthan five sequences), these MOTUs were not considered in the analysis of theESM and LSM fungal communities. Based on BLAST results, the congruence ofthe taxonomic information carried by the LSU and ITS1 markers was checked atthe order level by comparing the abundances of fungal orders detected in the twodata sets using the nonparametric Kendall tau rank correlation test. This test wasalso used to compare the abundance rankings of MOTUs for the ESM and LSMfungal communities. Statistical analyses were performed using the R software(40).

Nucleotide sequence accession numbers. The GenBank accession numbers forthe LSU sequences are FJ568339 to FJ570564, as previously reported (55), andthe GenBank accession numbers for the ITS1 sequences are FN293398 toFN295479 (this study).

RESULTS

Comparison of Ccomps and MCL clustering methods.Based on similarity matrices obtained from pairwise-sequencealignments, the two clustering methods did not result in thesame discretization of MOTUs, as shown in Fig. 1 for a datasubset. Indeed, MCL generated more clusters than Ccomps,which was particularly evident for the LSU graph (Table 1),while MCL generated �1.2-fold more MOTUs in the ITS1graph, resulting in two- to eightfold more LSU MOTUs. In thesame way, the number of singletons obtained with MCL wasalmost constant, while singletons were less numerous and thenumber of them increased with the similarity threshold whenCcomps was used (Table 1). Finally, the cohesion of MOTUs(the number of edges in one MOTU) was greater for MOTUs

obtained by MCL (Table 1). Again, the use of MCL increasedthe cohesion of MOTUs �15-fold for the ITS1 data set, whileit resulted in 14- to �2,500-fold more cohesive MOTUs for theLSU data set. Multiple-sequence alignments of each of theclusters obtained from MCL were found to be consistent (datanot shown).

Distribution of MOTUs in samples. For the distributionanalysis, we used the MOTUs composed of at least five se-quences (see Materials and Methods). With a threshold levelof 98%, MCL analysis resulted in 62 and 100 MOTUs for LSUand ITS1, respectively. Taxonomic identification of DNA se-quences by BLAST revealed that only one LSU MOTU (17clones) and three ITS1 MOTUs (a total of 27 clones) corre-sponded to nonfungal sequences (stramenophiles or plants).Also, 18 MOTUs from the ITS1 data set that were suspectedchimeras or did not match GenBank reference sequences (227clones) were not used in the subsequent analyses. The remain-ing MOTUs included 60 MOTUs for LSU and 79 MOTUs forITS1. At the genus level, while most of the ITS1 sequencesdisplayed the same level of identity in each MOTU, the resultswere less consistent for the LSU data set (see Tables S1 and S2in the supplemental material). The congruence of MOTUabundance data between the DNA markers used was thenchecked by comparing the rank of fungal order (sensu taxon-omy) abundance for taxa that were common to both data sets(Fig. 2A). This resulted in a positive correlation supported bythe results of a Kendall tau rank correlation test (� � 0.723 andP � 0.001 for the ESM plot and � � 0.566 and P � 0.01 for theLSM plot), which was mostly due to the dominant taxonomicgroups.

Most MOTUs displayed contrasting abundances across loca-tions (Fig. 2B), which was confirmed by the Kendall tau rankcorrelation test (� � �0.202 and P � 0.05 for LSU and � ��0.399 and P � 0.001 for ITS1). However, only 9 LSU MOTUswere specific to the ESM location and 16 LSU MOTUs werespecific to the LSM location. In addition, 21 ITS1 MOTUs werespecific to the ESM location and 17 ITS1 MOTUs were specificto the LSM location. These location-specific MOTUs were raremost of the time. On the other hand, few MOTUs were found to

TABLE 1. Comparison of LSU and ITS1 graphs partitioned byCcomps and MCL using different similarity thresholds

DNAregion

Clusteringmethod

Similaritythreshold

(%)

Clustering performance

No. ofgroupsa

No. ofsingletonsa

Cohesion index(103, mean SD)b

LSU Ccomps 95 11 23 17 33 (5)97 34 63 3 7 (27)98 56 115 0.09 0.16 (36)

MCL 95 56 124 245 97 (25)97 111 133 215 116 (48)98 125 133 248 131 (62)

ITS1 Ccomps 95 153 185 99 16 (78)97 191 227 92 16 (89)98 203 274 99 16 (96)

MCL 95 182 185 143 65 (93)97 207 227 118 54 (97)98 214 277 1,278 131 (100)

a The total numbers of sequences were 2,661 for LSU and 2,956 for ITS.b The values are based on data for MOTUs containing more than four se-

quences. The numbers in parentheses are the numbers of MOTUs.

VOL. 75, 2009 GRAPH PARTITIONING FOR FUNGAL COMMUNITY STUDIES 5865

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

be represented equally at the two locations (Fig. 2B). Becausealmost 80% of the clones were found to be distributed in 30MOTUs whatever samples and DNA marker were used, we fo-cused on these MOTUs for further description.

Spatial and seasonal variation of abundant MOTUs. (i)LSU data set. Globally, the ESM location had few dominantMOTUs and the majority of MOTUs were rare, except in May,when the MOTUs were more uniformly distributed. Similar towhat was observed for the ESM May sample, MOTUs had auniform distribution at the LSM location (Fig. 3). Neverthe-less, this location was noticeably dominated by MOTUs L3(Helotiales) and L5 (Mortierellaceae) in August and Octoberand by MOTU L23 (Tricholomataceae) in June. At the ESMlocation, Basidiomycota were found to be, on average, moreabundant, and this group was represented mainly by MOTUsL1 (Cortinariaceae), L4 (Cortinariaceae and Russulaceae), andL9 and L10 (Clavulinaceae) as well as by MOTU L3 (Pleospo-rale and Ascomycota). These MOTUs were for the most partstable during the growing season, except for the ClavulinaceaeMOTUs, which appeared to be overrepresented in May.

(ii) ITS1 data set. Ascomycota were much less common inthe ITS1 data set (Fig. 3). As observed for the LSU data set,the distribution of MOTUs was more uniform at the LSMlocation. However, several MOTUs displayed time-dependentdominance, including MOTU I3 (Hypocreales) in May, MOTUI6 (mitosporic Ascomycota) in October, MOTUs I4 (Atheli-aceae) and I22 (Thelephoraceae) in May, and MOTU I5 (Hy-grophoraceae) in June. Unfortunately, a large proportion ofLSM MOTUs were not identified. The ESM location waslargely dominated by Basidiomycota, particularly by MOTU I1(Russulaceae) from June to October and by MOTUs I7 (Cor-tinariaceae) and I2 (Clavulinaceae) in May.

DISCUSSION

DNA analysis considerations. DNA sequences can be con-sidered a continuum in which each sequence shares more orfewer similarities with other sequences. To assess the similar-ities, sequence alignment is a critical step, and numerous andvaried tools exist for this purpose. While multiple-sequencealignments can answer phylogenetic questions, they are notappropriate for assessment of the diversity and structure ofmicrobial communities based on DNA barcoding approachessensu lato (48). In this context, the Free End Gap algorithm ismuch more appropriate because it is less affected by insertion-deletion events and does not penalize partial DNA sequences,a current feature in large data sets. Comparison of microbialcommunities across continua is possible by comparing the in-tracommunity DNA sequence similarity with the intercommu-nity DNA sequence similarity (53). However, such a compar-ison does not provide information about the microbialcommunities themselves. In contrast, artificial discretization ofa continuum allows easier comparisons of microbial commu-nities to be made by producing data comparable to the dataused for traditional taxonomy-based approaches. In this con-text, DNA classification is usually assessed by using hierarchi-cal clustering methods (44). However, this approach is notjustified since connections between DNA classes are not nec-essarily required for global characterization of microbial com-munities. In this context, simple discretization of our contin-uum is sufficient and can be assessed by graph partitioningmethods.

Ideally, a graph partitioning method should be able to (i)produce highly cohesive and discrete classes, (ii) detect single-tons, and (iii) be efficient in terms of the computation time

FIG. 2. Comparison of MOTU abundance according to the DNA markers and the locations studied. (A) Abundance of MOTUs belonging tocommon fungal orders for the LSU and ITS1 markers. (B) Abundance of MOTUs for the LSM and ESM locations. The axes are log10 scales. Thenonparametric Kendall tau rank correlation coefficient revealed significant correspondence of the abundances of fungal orders for the DNAmarker used and negative correspondence of MOTU abundances for the two locations studied.

5866 ZINGER ET AL. APPL. ENVIRON. MICROBIOL.

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

required to process large data sets. Here, we also tested clus-tering efficiency with both highly polymorphic (ITS1) and moreconserved (LSU) DNA markers. We found that the partition-ing of the ITS1 graph resulted in more MOTUs with bothclustering methods (Table 1) according to the degree of poly-morphism of the two markers studied. Furthermore, conser-vative DNA regions generated more continuum graph struc-tures than less conserved regions generated. The latter regionsare harder to partition, because they are more prone to thechaining effect (Fig. 1). A good partitioning method shouldthus be able to form discrete classes within the continuumstructures.

The MCL algorithm has previously been reported to behighly efficient for processing large data sets in a more robust

way (7). Our observations support the performance of MCL,which produced more numerous and cohesive MOTUs thanCcomps, especially for the LSU graph (Table 1 and Fig. 1).This result reflects the ability of MCL to avoid chaining effectsin continuum structures. Furthermore, it indicates that theclustering methods can generate erroneous results, as shownby the small number of LSU MOTUs found by Ccomps. In thesame way, MCL detected the maximum number of singletonsat the lowest similarity thresholds, while Ccomps did not detectthe maximum number of singletons at a similarity level of 98%(Table 1). These results show that the clustering method usedcan affect the assessment of richness and diversity and thatMCL is more appropriate for our purposes. Besides, MCLcould be used for other conservative DNA regions, such as the

FIG. 3. Spatial and temporal distribution of the main MOTUs. (a) LSU data set; (b) ITS data set. The columns indicate the 30 most abundantMOTUs. The lines indicate samples. The size of a square indicates the percentage of clones in the sample. A, Ascomycota; B, Basidiomycota; Z,Zygomycota; U, unclassified fungi.

VOL. 75, 2009 GRAPH PARTITIONING FOR FUNGAL COMMUNITY STUDIES 5867

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

16S rRNA gene for analyzing prokaryotic communities. Al-though this method has been used previously in comparativegenomic analyses (29, 52), to our knowledge this is the firsttime that this algorithm has been used for characterization ofmicrobial communities.

Despite the formation of discrete and cohesive clusters, tax-onomic assignment of MOTUs, especially for the LSU data set(see Table S1 in the supplemental material), remained prob-lematic for three main reasons. First, a “universal similaritythreshold” does not exist and the threshold varies according tothe taxonomic family considered and the DNA marker used(6). Second, the taxonomy of fungi is still evolving and canhardly be fixed because of the paraphyletism or uncertainty ofthe phylogenetic positions of certain fungal groups, such asHelotiales (23) and Russulales (22). Third, the quality of publicdatabases is doubtful; 20% of fungal sequences in internationaldatabases may be incorrectly identified at the species level(36). Although the quality is still unknown at higher taxonomiclevels, we found, for instance, that Cryptococcus GenBank se-quences (Tremellales, Basidiomycota) are annotated as belong-ing to Tremellales and also to Cystofilobasidiales, Filobasidiales,and the Ascomycota phylum. Although the abundance valuesfor fungal orders were found to be quite similar for the LSUand ITS1 data sets (Fig. 2A), the use of taxonomy as a criterionfor DNA classification is a doubtful strategy in this framework,and thus DNA classification should be based instead on se-quence similarities. Moreover, taxonomic assignment ofMOTUs and subsequent ecological conclusions should bemade carefully.

Thus, the use of a graph partitioning approach based onpairwise-sequence alignment and coupled with MCL clusteringseems a good alternative for analyzing large DNA sequencedata sets obtained using pyrosequencing techniques, since itavoids recurrent issues of “manual corrections” of multiple-sequence alignments and the reference database errors. Al-though this method does not explain the evolutionary connec-tion between MOTUs, it allows assessment of microbialcommunity structure, richness, and diversity and may also be afirst filtration step for creating more homogeneous data sub-sets for further phylogenetic studies, as suggested by Loytynojaand Goldman (32).

Phylotype diversity in ESM and LSM soils. The fungal com-munities at both locations studied contained few dominantMOTUs and numerous rare MOTUs (Fig. 2B). For instance,the 10 most abundant MOTUs accounted for 40 to 90% ofeach clone library, similar to what is typically observed inmacroorganism communities (20, 24). Interestingly, at theESM location there were fewer highly dominant MOTUs thanat the LSM location, supporting our previous findings obtainedwith classical approaches (e.g., multiple alignments, completelinkage) showing that the diversity of fungal communities washigher at LSM locations (55).

Although our study locations were spatially close, the differ-ences in snow cover dynamics, vegetation type, soil properties,and ecosystem processes were substantial (4), and these factorsvaried with the microbial communities (55). These differencessignificantly affected the distribution of MOTUs irrespective ofthe DNA marker used (Fig. 2B and 3). However, only a fewMOTUs were strictly specific to one location or one season.Rather, the differences in fungal communities between the two

locations were expressed in terms of abundance (Fig. 3). Mostof the site-specific MOTUs were rare, and our sequencingeffort may not have been sufficient to detect them at bothlocations. The lack of MOTU specificity is quite surprisingbecause ESM and LSM areas have dramatic differences inplant species composition that could recruit specific mycorrhi-zal fungi. Thus, Cripps et al. (11) reported patchy distributionof mycorrhizal fungi that mirrored the vegetation mosaic inalpine tundra. The feature observed here could be explainedby the high rate of dispersal of these organisms (16) and theshort distance between the locations studied, as previouslyobserved for freshwater bacterial communities (15). On theother hand, our results may indicate a lack of taxonomic res-olution with the DNA markers used.

On the other hand, as suggested by the Kendall tau corre-lation test, the abundant MOTUs at the LSM location werequite rare at the ESM location and vice versa (Fig. 2B and 3),suggesting that contrasting environmental conditions filtercontrasting fungal communities. Indeed, the ESM locationrepresents a stressful and nutrient-limited habitat (10). Thislocation was dominated by phylotypes related to ectomycor-rhizal fungi (e.g., Cortinariaceae, Russulaceae, and Helotiales)(Fig. 3) often associated with the plants K. myosuroides and D.octopetala dominant at ESM locations (18, 34, 43) and toPleosporales, previously described as dark septate fungistrongly involved in nitrogen translocation in alpine grasslands(19, 38). These dominance patterns were stable from spring toautumn, suggesting that the fungi shift along a parasitism-mutualism continuum through both saprotrophic activities andnutrient uptake capacities (26, 33). In late winter, these fungalgroups were replaced by presumably psychrotolerant Can-tharellales (Fig. 3), which are known to form ectomycorrhizalassociations (47) and for their lignin degradation abilities (9).At the ESM location, the dominance of ectomycorrhizal asso-ciations and phylotypes related to fungal species able to de-grade complex organic matter was comparable to the domi-nance that occurs in ecosystems with slow nutrient turnover(8). In contrast, at the LSM location the litter degradation rateis high and the nutrient availability is high (3a). LSM fungalcommunities appeared to be more temporally variable andmore diversified with an important component of poorly doc-umented fungal groups, which precluded further conclusions(Fig. 3). However, the higher diversity is potentially related tohigher nutrient availability (51) and has been associated withopen nutrient cycle ecosystems (8).

This study shows that use of a pairwise-sequence alignmentalgorithm coupled with efficient MCL clustering is a reliabletechnique for analyzing large data sets for microbial commu-nity studies. This is particularly useful given the increasingavailability of massive sequencing methods. Moreover, the ob-served difference between the clustering methods tested alsoindicates that there should be more thorough consideration ofthe tools employed for definition of MOTUs in microbial ecol-ogy. Applied to the soil fungal communities of two contrastingalpine habitats, our analysis confirmed the importance of hab-itat filtering in the assembly of fungal communities, but furthercharacterization combining traditional approaches coupledwith meta-transcriptomics is needed to determine the mycor-rhizal, pathogen or saprophytic status of the MOTUs describedhere.

5868 ZINGER ET AL. APPL. ENVIRON. MICROBIOL.

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

ACKNOWLEDGMENTS

We thank Christelle Melodelima for helpful discussions and DavidLejon for comments on an early draft of the manuscript. We also thankSerge Aubert and the staff of Station Alpine J. Fourier for providinglogistic facilities during the field work and Alice Roy and the CentreNational de Sequencage (Genoscope, Evry) for sequencing clone li-braries.

This work was supported by “Microalpes” project ANR-06-BLAN-0301.

REFERENCES

1. Allen, E. B., M. F. Allen, D. J. Helm, J. M. Trappe, R. Molina, and E. Rincon.1995. Patterns and regulation of mycorrhizal plant and fungal diversity. PlantSoil 170:47–62.

2. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990.Basic local alignment search tool. J. Mol. Biol. 215:403–410.

3. Augustson, J. G., and J. Minker. 1970. An analysis of some graph theoreticalcluster techniques. J. ACM 17:571–588.

3a.Baptist, F., N. G. Yoccoz, and P. Choler. Direct and indirect snow covercontrol over decomposition in alpine tundra along a snowmelt gradient.Plant Soil, in press.

4. Baptist, F., and P. Choler. 2008. A simulation of the importance of length ofgrowing season and canopy functional properties on the seasonal grossprimary production of temperate alpine meadows. Ann. Bot. 101:549–559.

5. Bjork, R. G., M. P. Bjorkman, M. X. Andersson, and L. Klemedtsson. 2008.Temporal variation in soil microbial communities in alpine tundra. Soil Biol.Biochem. 40:266–268.

6. Bridge, P., and B. Spooner. 2001. Soil fungi: diversity and detection. PlantSoil 232:147–154.

7. Brohee, S., and J. Van Helden. 2006. Evaluation of clustering algorithms forprotein-protein interaction networks. BMC Bioinformatics. 7:488.

8. Chapman, S. K., J. A. Langley, S. C. Hart, and G. W. Koch. 2006. Plantsactively control nitrogen cycling: uncorking the microbial bottleneck. NewPhytol. 169:27–34.

9. Chen, D. M., A. F. S. Taylor, R. M. Burke, and J. W. G. Cairney. 2001.Identification of genes for lignin peroxidases and manganese peroxidases inectomycorrhizal fungi. New Phytol. 152:151–158.

10. Choler, P. 2005. Consistent shifts in alpine plant traits along a mesotopo-graphical gradient. Arct. Antarct. Alp. Res. 37:444–453.

11. Cripps, C. L., and L. H. Eddington. 2005. Distribution of mycorrhizal typesamong alpine vascular plant families on the Beartooth Plateau, rocky moun-tains, USA, in reference to large-scale patterns in arctic-alpine habitats.Arct. Antarct. Alp. Res. 37:177–188.

12. Edwards, A. C., R. Scalenghe, and M. Freppaz. 2007. Changes in the sea-sonal snow cover of alpine regions and its effect on soil processes: a review.Quat. Int. 162:172–181.

13. Enright, A. J., S. Van Dongen, and C. A. Ouzounis. 2002. An efficientalgorithm for large-scale detection of protein families. Nucleic Acids Res.30:1575–1584.

14. Ettema, C. H., and D. A. Wardle. 2002. Spatial soil ecology. Trends Ecol.Evol. 17:177–183.

15. Fierer, N., J. L. Morse, S. T. Berthrong, E. S. Bernhardt, and R. B. Jackson.2007. Environmental controls on the landscape-scale biogeography of streambacterial communities. Ecology 88:2162–2173.

16. Finlay, B. J. 2002. Global dispersal of free-living microbial eukaryote species.Science 296:1061–1063.

17. Finlay, R. D. 2008. Ecological aspects of mycorrhizal symbiosis: with specialemphasis on the functional diversity of interactions involving the extraradicalmycelium. J. Exp. Bot. 59:1115–1126.

18. Gardes, M., and A. Dahlberg. 1996. Mycorrhizal diversity in arctic and alpinetundra: an open question. New Phytol. 133:147–157.

19. Green, L. E., A. Porras-Alfaro, and R. L. Sinsabaugh. 2008. Translocation ofnitrogen and carbon integrates biotic crust and grass production in desertgrassland. J. Ecol. 96:1076–1085.

20. Grime, J. P. 1998. Benefits of plant diversity to ecosystems: immediate, filterand founder effects. J. Ecol. 86:902–910.

21. Hattenschwiler, S., A. V. Tiunov, and S. Scheu. 2005. Biodiversity and litterdecomposition interrestrial ecosystems. Annu. Rev. Ecol. Evol. Syst. 36:191–218.

22. Hibbett, D. S. 2006. A phylogenetic overview of the Agaricomycotina. My-cologia 98:917–925.

23. Hibbett, D. S., M. Binder, J. F. Bischoff, M. Blackwell, P. F. Cannon, O. E.Eriksson, S. Huhndorf, T. James, P. M. Kirk, R. Lucking, H. T. Lumbsch, F.Lutzoni, P. B. Matheny, D. J. McLaughlin, M. J. Powell, S. Redhead, C. L.Schoch, J. W. Spatafora, J. A. Stalpers, R. Vilgalys, M. C. Aime, A. Aptroot,R. Bauer, D. Begerow, G. L. Benny, L. A. Castlebury, P. W. Crous, Y. C. Dai,W. Gams, D. M. Geiser, G. W. Griffith, C. Gueidan, D. L. Hawksworth, G.Hestmark, K. Hosaka, R. A. Humber, K. D. Hyde, J. E. Ironside, U. Koljalg,C. P. Kurtzman, K. H. Larsson, R. Lichtwardt, J. Longcore, J. Miad-likowska, A. Miller, J. M. Moncalvo, S. Mozley-Standridge, F. Oberwinkler,

E. Parmasto, V. Reeb, J. D. Rogers, C. Roux, L. Ryvarden, J. P. Sampaio, A.Schussler, J. Sugiyama, R. G. Thorn, L. Tibell, W. A. Untereiner, C. Walker,Z. Wang, A. Weir, M. Weiss, M. M. White, K. Winka, Y. J. Yao, and N.Zhang. 2007. A higher-level phylogenetic classification of the Fungi. Mycol.Res. 111:509–547.

24. Hughes, J. B., J. J. Hellmann, T. H. Ricketts, and B. J. M. Bohannan. 2001.Counting the uncountable: statistical approaches to estimating microbialdiversity. Appl. Environ. Microbiol. 67:4399–4406.

25. Hur, I., and J. Chun. 2004. A method for comparing multiple bacterialcommunity structures from 16S rDNA clone library sequences. J. Microbiol.42:9–13.

26. Johnson, N. C., J. H. Graham, and F. A. Smith. 1997. Functioning ofmycorrhizal associations along the mutualism-parasitism continuum. NewPhytol. 135:575–586.

27. Kasel, S., L. T. Bennett, and J. Tibbits. 2008. Land use influences soil fungalcommunity composition across central Victoria, south-eastern Australia. SoilBiol. Biochem. 40:1724–1732.

28. Korner, C. 1999. Alpine plant life. Springer Verlag, Berlin, Germany.29. Krogan, N. J., G. Cagney, H. Yu, G. Zhong, X. Guo, A. Ignatchenko, J. Li, S.

Pu, N. Datta, A. P. Tikuisis, T. Punna, J. M. PeregrAn-Alvarez, M. Shales,X. Zhang, M. Davey, M. D. Robinson, A. Paccanaro, J. E. Bray, A. Sheung,B. Beattie, D. P. Richards, V. Canadien, A. Lalev, F. Mena, P. Wong, A.Starostine, M. M. Canete, J. Vlasblom, S. Wu, C. Orsi, S. R. Collins, S.Chandran, R. Haw, J. J. Rilstone, K. Gandi, N. J. Thompson, G. Musso, P.St Onge, S. Ghanny, M. H. Y. Lam, G. Butland, A. M. Altaf-Ul, S. Kanaya,A. Shilatifard, E. O’Shea, J. S. Weissman, C. J. Ingles, T. R. Hughes, J.Parkinson, M. Gerstein, S. J. Wodak, A. Emili, and J. F. Greenblatt. 2006.Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.Nature 440:637.

30. Lauber, C. L., M. S. Strickland, M. A. Bradford, and N. Fierer. 2008. Theinfluence of soil properties on the structure of bacterial and fungal commu-nities across land-use types. Soil Biol. Biochem. 40:2407–2415.

31. Litaor, M. I., T. R. Seastedt, and D. A. Walker. 2001. Spatial analysis ofselected soil attributes across an alpine topographic/snow gradient. Landsc.Ecol. 17:71–85.

32. Loytynoja, A., and N. Goldman. 2008. Phylogeny-aware gap placement pre-vents errors in sequence alignment and evolutionary analysis. Science 320:1632–1635.

33. Martin, F., and M. A. Selosse. 2008. The Laccaria genome: a symbiontblueprint decoded. New Phytol. 180:296–310.

34. Muhlmann, O., and U. Peintner. 2008. Ectomycorrhiza of Kobresia myosur-oides at a primary successional glacier forefront. Mycorrhiza 18:355–362.

35. Needleman, S. B., and C. D. Wunsch. 1970. A general method applicable tothe search for similarities in the amino acid sequence of two proteins. J. Mol.Biol. 48:443–453.

36. Nilsson, R. H., M. Ryberg, E. Kristiansson, K. Abarenkov, K.-H. Larsson,and U. Koljalg. 2006. Taxonomic reliability of DNA sequences in publicsequence databases: a fungal perspective. PLoS ONE 1:e59.

37. O’Brien, H. E., J. L. Parrent, J. A. Jackson, J.-M. Moncalvo, and R. Vilgalys.2005. Fungal community analysis by large-scale sequencing of environmentalsamples. Appl. Environ. Microbiol. 71:5544–5550.

38. Porras-Alfaro, A., J. Herrera, R. L. Sinsabaugh, K. J. Odenbach, T. Lowrey,and D. O. Natvig. 2008. Novel root fungal consortium associated with adominant desert grass. Appl. Environ. Microbiol. 74:2805–2813.

39. Porter, T. M., J. E. Skillman, and J. M. Moncalvo. 2008. Fruiting body andsoil rDNA sampling detects complementary assemblage of Agaricomycotina(Basidiomycota, Fungi) in a hemlock-dominated forest plot in southern On-tario. Mol. Ecol. 17:3037–3050.

40. R Development Core Team. 2007. R: a language and environment for sta-tistical computing. R Development Core Team, Vienna, Austria.

41. Read, D. J., and K. Haselwandter. 1981. Observations on the mycorrhizalstatus of some alpine plant communities. New Phytol. 88:341–352.

42. Sandhu, G. S., B. C. Kline, L. Stockman, and G. D. Roberts. 1995. Molecularprobes for diagnosis of fungal infections. J. Clin. Microbiol. 33:2913–2919.

43. Schadt, C. W., A. P. Martin, D. A. Lipson, and S. K. Schmidt. 2003. Seasonaldynamics of previously unknown fungal lineages in tundra soils. Science301:1359–1361.

44. Schloss, P. D., and J. Handelsman. 2005. Introducing DOTUR, a computerprogram for defining operational taxonomic units and estimating speciesrichness. Appl. Environ. Microbiol. 71:1501–1506.

45. Schwarzenbach, K., J. Enkerli, and F. Widmer. 2007. Objective criteria toassess representativity of soil fungal community profiles. J. Microbiol. Meth-ods 68:358–366.

46. Sellers, P. H. 1974. On the theory and computation of evolutionary dis-tances. SIAM J. Appl. Math. 26:787–793.

47. Tedersoo, L., T. Jairus, B. M. Horton, K. Abarenkov, T. Suvi, I. Saar, and U.Koljalg. 2008. Strong host preference of ectomycorrhizal fungi in a Tasma-nian wet sclerophyll forest as revealed by DNA barcoding and taxon-specificprimers. New Phytol. 180:479–490.

48. Valentini, A., F. Pompanon, and P. Taberlet. 2009. DNA barcoding forecologists. Trends Ecol. Evol. 24:110.

VOL. 75, 2009 GRAPH PARTITIONING FOR FUNGAL COMMUNITY STUDIES 5869

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from

49. Van Dongen, S. 2000. Graph clustering by flow simulation. University ofUtrecht, Utrecht, The Netherlands.

50. Venter, J. C., K. Remington, J. F. Heidelberg, A. L. Halpern, D. Rusch, J. A.Eisen, D. Y. Wu, I. Paulsen, K. E. Nelson, W. Nelson, D. E. Fouts, S. Levy,A. H. Knap, M. W. Lomas, K. Nealson, O. White, J. Peterson, J. Hoffman, R.Parsons, H. Baden-Tillson, C. Pfannkoch, Y. H. Rogers, and H. O. Smith.2004. Environmental genome shotgun sequencing of the Sargasso Sea. Sci-ence 304:66–74.

51. Waldrop, M. P., D. R. Zak, C. B. Blackwood, C. D. Curtis, and D. Tilman.2006. Resource availability controls fungal diversity across a plant diversitygradient. Ecol. Lett. 9:1127–1135.

52. Wall, P. K., J. Leebens-Mack, K. F. Muller, D. Field, N. S. Altman, and C. W.

dePamphilis. 2008. PlantTribes: a gene and gene family resource for com-parative genomics in plants. Nucleic Acids Res. 36:D970–D976.

53. Webb, C. O., D. D. Ackerly, M. A. McPeek, and M. J. Donoghue. 2002.Phylogenies and community ecology. Annu. Rev. Ecol. Syst. 33:475.

54. White, T. J., T. Bruns, S. Lee, and J. Taylor. 1990. Amplification anddirect sequencing of fungal ribosomal RNA genes for phylogenetics, p.315–322. In M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White(ed.), PCR protocols: a guide to methods and applications. AcademicPress, San Diego, CA.

55. Zinger, L., B. Shanhavaz, F. Baptist, R. A. Geremia, and P. Choler. 2009.Microbial diversity in alpine tundra soils correlates with snow cover dynam-ics. ISME J. 3:850–859.

5870 ZINGER ET AL. APPL. ENVIRON. MICROBIOL.

on June 6, 2018 by guesthttp://aem

.asm.org/

Dow

nloaded from