

Contents lists available at ScienceDirect

Food Microbiology

journal homepage: www.elsevier.com/locate/fm

Food Microbiology 41 (2014) 42-51

Metagenomic analysis of the microbial community in kefir grains

Ufuk Nalbantoglu a,1, Atilla Cakar b,1, Haluk Dogan c,1, Neslihan Abaci b, Duran Ustek b, Khalid Sayood a, Handan Can c,*

a Department of Electrical Engineering, University of Nebraska-Lincoln, 68588 Nebraska, USA
b Department of Genetics, Institute of Experimental Medical Research, Istanbul University, 34093 Istanbul, Turkey
c Department of Genetics and Bioengineering, Istanbul Bilgi University, 34060 Istanbul, Turkey

Article info

Article history:
Received 24 June 2013
Received in revised form 15 January 2014
Accepted 24 January 2014
Available online 1 February 2014

Keywords:
Kefir
Metagenomics
Pyrosequencing
Whole genome sequencing
Microbial diversity

* Corresponding author. Tel.: +90 212 3117431; fax: +90 212 4278270.
E-mail address: [email protected] (H. Can).

1 These authors have contributed equally.

0740-0020/$ - see front matter © 2014 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.fm.2014.01.014

Abstract

Kefir grains as a probiotic have been subject to microbial community identification using culture-dependent and independent methods that target specific strains in the community, or that are based on limited 16S rRNA analysis. We performed whole genome shotgun pyrosequencing using two Turkish Kefir grains. Sequencing generated 3,682,455 high quality reads for a total of ~1.6 Gbp of data assembled into 6151 contigs with a total length of ~24 Mbp. Species identification mapped 88.16% and 93.81% of the reads, leaving 4 Mbp of assembly that did not show any homology to known bacterial sequences. Identified communities in the two grains showed high concordance where Lactobacillus was the most abundant genus with a mapped abundance of 99.42% and 99.79%. This genus was dominantly represented by three species, Lactobacillus kefiranofaciens, Lactobacillus buchneri and Lactobacillus helveticus, with a total mapped abundance of 97.63% and 98.74%. We compared and verified our findings with 16S pyrosequencing and model based 16S data analysis. Our results suggest that microbial community profiling using whole genome shotgun data is feasible, can identify novel species data, and has the potential to generate a more accurate and detailed assessment of the underlying bacterial community, especially for low abundance species.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Kefir is a traditional drink obtained via fermentation of milk by “kefir grains”. Kefir grains, which are complex mixtures of bacteria, yeast, and the polysaccharides produced by this microflora, propagate and pass their properties on to the following generation of new grains (Abraham and De Antoni, 1999; Marshall et al., 1984). Kefir, which is believed to be a “functional food” due to its health benefits and disease prevention properties beyond its basic nutritional value, is becoming increasingly popular throughout the world (Farnworth and Mainville, 2003). Understanding the structure and stability of the bacterial community in the kefir grain is important for the success of production strategies and the use of kefir as functional food. Although there have been some attempts at identifying the bacterial community in the kefir grain, these studies are either limited to culture-dependent methods only (Angulo et al., 1993; Fujisawa et al., 1988; Garrote et al., 2001; Simova et al., 2002; Witthuhn et al., 2004) or target specific strains in the community (Delfederico et al., 2006; Kesmen and Kacmaz, 2011).

Recently, culture-independent methods such as Polymerase Chain Reaction (PCR)-based amplification and sequencing of 16S rRNA genes or Denaturing Gradient Gel Electrophoresis (DGGE) have been used to analyze microbial diversity in kefir grains (Chen et al., 2008; Dobson et al., 2011; Kesmen and Kacmaz, 2011; Leite et al., 2012; Zhou et al., 2009). However, such analyses might not provide a complete picture of the microbial community and may lead to ambiguous results due to limitations and errors inherent in these classical profiling methods. Although PCR-based methods are widely used in assessing microbial diversity, these methods often erroneously determine the underlying species and/or strains and may miss up to half of the microbial diversity (Hong et al., 2009). In species identification studies using sequencing of 16S rRNA gene regions, generally a 95% identity is used as the cut-off for sequence similarity. However, as different species may still exhibit similarities above this threshold, these studies do not accurately report the underlying community profile with high resolution (Petrosino et al., 2009).

Metagenomic analysis using whole genome sequencing (WGS) that does not involve cloning or 16S rRNA gene region amplification provides a culture independent approach, and is extremely important as this approach overcomes the aforementioned problems involved in alternative species identification methods (Kalyuzhnaya et al., 2008; Pallen et al., 2010; Ventura et al., 2009). The most important limitation that has delayed or even prevented application of whole genome sequencing to microbial community profiling is the lack of advanced bioinformatics algorithms that can handle the complex nature of the data produced (Ng and Kirkness, 2010).

In this study, for the first time, we identified the microbial diversity in kefir grains in a fast and accurate manner using WGS via pyrosequencing, which is a culture independent approach and does not require any cloning. In the species identification phase, we used a robust taxonomic classification method that employs an iterative procedure and successfully maps models derived from the relatively short contigs generated in metagenomic studies. We used two different Turkish Kefir grains as our model and compared and validated our findings with two separate 16S analysis approaches. We performed PCR amplification of the hypervariable V1-V2 regions of the 16S rRNA gene of the two kefir grains followed by pyrosequencing, and also extracted the 16S rRNA gene reads that come from WGS using a Hidden Markov Model based approach. We assessed the community profile of both Kefir grains using the three different methods and performed a comparative analysis both between the kefir grains used in this study and against those reported in the literature. In Fig. 1, we summarize our analysis strategy.

2. Materials and methods

2.1. Kefir grain samples

Two Turkish kefir grains were used for the present study. The first kefir grain sample (Kefir1) was obtained from Ege University, Faculty of Agriculture, Department of Dairy Technology, İzmir, Turkey. The second kefir grain sample (Kefir2) was obtained from a family living in the Northwest region of Turkey who cultivate the kefir grains for self-consumption. Kefir grain samples were transported to the laboratory and cultured in sterilized whole milk. 50 g of kefir grains were inoculated into 500 ml of the sterilized milk and incubated at 25 °C for 3 days. This step was repeated several times until the kefir grains had appropriate characteristics and increased in biomass (10%). Later, the grains were filtered to remove fermented milk beverages.

Fig. 1. Overall analysis strategy performed to analyze Kefir’s microbial community. DNA from two Kefir samples was used for Whole Genome Shotgun (WGS) Sequencing. Following trimming, filtering, and assembly of these reads, species identification was done using RAIphy and BLAST. Reads from the WGS data were subject to Hidden Markov Modeling to computationally identify the reads coming from the 16S rRNA genes. These reads were then subject to species identification. Separate (other than WGS) pyrosequencing was performed only on the amplified 16S rRNA region for the microbial community. These sequencing results were used for species identification. All three identification steps were separately done for the two Kefir samples.

2.2. Isolation of metagenomic DNA

Kefir grains were homogenized in sterile 0.9% NaCl solution for 3 min for total DNA extraction. 2 ml of each homogenate was centrifuged for 15 min at 10,000 × g and the pellet was washed twice with sterile water. Lysis steps were based on the method of DNA isolation from kefir grains with some modifications (Kowalczyk et al., 2012). Pellets were resuspended in 1 ml of lysis buffer (50 mM EDTA, 0.1 M NaCl, 10 mM Tris-HCl [pH 7.5]) containing 25 mM sucrose. After thorough resuspension, three freeze-thaw steps were performed. 100 µl lysozyme (30 mg/ml), 5000 ul/ml mutanolysin and 10 µl RNase (10 mg/ml) were added to the mixture and incubated for 1 h at 37 °C with occasional agitation. 50 µl of 20% SDS and 5 µl of Proteinase K (10 mg/ml) were then added and the mixture was incubated for another 1 h at 37 °C to allow cell lysis. The lysate was centrifuged at 13,000 × g at 25 °C for 10 min and the supernatant was transferred to a clean tube. Each sample was subjected to DNA extraction using the Wizard® Genomic DNA Purification kit (Promega BioSciences, LLC, San Luis Obispo, USA) according to the manufacturer’s protocol for Gram bacteria. The extracted DNA was stored at -20 °C.

2.3. Pyrosequencing

The microflora of kefir grains was characterized by two sequencing methods using the Roche/454 GS FLX+ system (Roche Diagnostics Co., Indianapolis, IN, USA). The first method used WGS of metagenomic DNA while the second method used amplified hypervariable regions (containing V1 and V2) of the 16S rRNA gene.

For the first method, 500 ng of DNA was used to generate the shotgun libraries using the GS FLX+ Rapid Library Preparation method for the WGS sequencing. Pyrosequencing was conducted using the GS FLX+ System following the manufacturer’s instructions. For the second method, bacterial 16S rRNA genes were amplified from metagenomic DNA with two universal primers: the forward primer Y1 (5′-TGGCTCAGGACGAACGCTGGCGGC-3′) and the reverse primer Y2 (5′-CCTACTGCTGCCTCCCGTAGGAGT-3′), which correspond to Escherichia coli positions 20 to 361 (Young et al., 1991). 454-adapters were also attached to both, followed by a 10-bp sample specific barcode sequence. PCR was performed using the following cycling profile: initial denaturation at 95 °C for 5 min followed by 25 cycles of 94 °C for 30 s, 52 °C for 40 s and 72 °C for 30 s, and a final extension step at 72 °C for 10 min (Leite et al., 2012). Amplicons were purified using the Mini-Elute Kit (Qiagen Inc, Valencia, CA, USA) and quantified. The purified amplicon library was further verified and quantified and subjected to 454/Roche pyrosequencing.

Sequence reads obtained from the FLX+ system were further trimmed and filtered using Popoolation (Kofler et al., 2011). High quality sequencing reads were obtained by first removing low quality bases (trimming) and then keeping only reads with a minimum length of 40 after trimming (filtering). Novel sequence information identified in this study has been submitted to DDBJ/EMBL/GenBank with BioProject IDs PRJNA200716 and PRJNA200717, locus tags L238 and L239, accession numbers ASXE00000000 and ASRJ00000000, and versions ASXE01000000 and ASRJ01000000 for the two Kefir samples used in our study, respectively.
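The trimming and filtering step can be illustrated with a minimal sketch. It is not Popoolation's actual algorithm: it simply clips low quality bases from both ends of a read and then applies the minimum length of 40 stated above. The Phred cut-off of 20 and the function names are assumptions made for illustration only.

```python
MIN_LENGTH = 40    # minimum read length stated in the text
MIN_QUALITY = 20   # assumed Phred cut-off; the paper does not state one


def trim_read(seq, quals, min_quality=MIN_QUALITY):
    """Clip bases below min_quality from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


def trim_and_filter(reads, min_quality=MIN_QUALITY, min_length=MIN_LENGTH):
    """Trim every (sequence, qualities) pair, then drop reads that became too short."""
    kept = []
    for seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_read(seq, quals, min_quality)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_quals))
    return kept
```

Trimming before filtering matters here: a read is judged on the length that remains after its low quality ends are removed, not on its raw length.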

2.4. Metagenome analysis

Kefir metagenome phylotyping analysis was carried out using three consecutive steps: read assembly, species binning, and similarity search for unclassified DNA sequences. In the assembly step, overlapping sequence reads were grouped together to be classified as a single unit. Species binning was conducted by classifying the generated contigs using a sequence composition based method. Finally, the sequencing reads and contigs that were unclassified at the binning step were searched for similarity against the nucleotide database of NCBI (http://www.ncbi.nlm.nih.gov/nuccore). This database is a combination of various NCBI databases including GenBank, RefSeq, Third Party Annotation (TPA) Sequence, and Protein Data Bank (PDB).

2.4.1. The assembly method

Assembly of the pyrosequencing reads is used as a preprocessing step in order to obtain longer DNA fragments for more accurate sequence binning. The algorithm employed to assemble contigs was a slightly modified version of the Meta-IDBA algorithm (Peng et al., 2011) that kept track of the number of reads used in the generation of a contig. Therefore the abundance information is not lost, and this information is later used in the binning phase. The Eulerian path paradigm was used to generate the contigs (Idury and Waterman, 1995; Pevzner et al., 2001). In the scheme employed, short oligonucleotides in the sequencing data are determined and a de Bruijn graph is constructed. In this graph, the sequence reads form the vertices and the edges denote the overlapping sequences. An acyclic path in the graph represents a fragment of a sequenced genome. The assembly algorithm used in this work has a slight modification of the Eulerian methods. While building the graphs, the vertices represented k-mers (a DNA sequence of length k), and the edges were weighted by the number of reads in which the corresponding k-mer overlaps were observed. Therefore, choosing an optimal path with the maximal score can eliminate sequencing errors and help avoid chimeric sequences. Starting from an arbitrary k-mer, the maximal path visiting the edges giving the maximum total weight was determined. The Viterbi algorithm was used to find the optimal path (Press et al., 2007). The sequence of vertices determined the contig, and the total assembly score (representing the number of reads) reflected the abundance of that contig in the metagenome. The edges and the vertices were removed from the graph as they were visited, and the process terminated when there were no remaining vertices. In the final workflow for the assembly phase, low coverage filtering was disabled and k-mers of length 50 were utilized.
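The weighted-graph idea described above can be sketched in a few lines. This is a simplified illustration and not the modified Meta-IDBA implementation: vertices are k-mers, each edge is weighted by the number of reads in which the two k-mers overlap, and the path is extended greedily along the heaviest edge, whereas the paper finds the optimal path with the Viterbi algorithm. Using the sum of edge weights as the contig score, and the names `build_graph` and `greedy_contig`, are assumptions.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Weight of edge (u, v) = number of reads in which k-mer u is followed by k-mer v."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        seen = set()
        for i in range(len(read) - k):
            u, v = read[i:i + k], read[i + 1:i + k + 1]
            if (u, v) not in seen:  # count each read at most once per edge
                seen.add((u, v))
                graph[u][v] += 1
    return graph


def greedy_contig(graph, start):
    """Extend a contig from `start`, always following the heaviest unvisited edge."""
    contig, score = start, 0
    visited = {start}
    node = start
    while True:
        candidates = {v: w for v, w in graph.get(node, {}).items() if v not in visited}
        if not candidates:
            break
        nxt = max(candidates, key=candidates.get)
        score += candidates[nxt]   # accumulated read support along the path
        contig += nxt[-1]          # each step adds one new base
        visited.add(nxt)
        node = nxt
    return contig, score
```

With the reads `["ACGTTG", "ACGTTG", "ACGTCA"]` and k = 3, the branch supported by a single read is outvoted by the branch supported by two reads, which is the sense in which a maximal-weight path can suppress sequencing errors.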

2.4.2. Species binning

RAIphy was used to bin the contigs obtained from the previous assembly step by assigning them to known species (Nalbantoglu et al., 2011). RAIphy obtains a model using whole or partial genomes of species and uses this model to classify the contigs. RAIphy models were built using NCBI’s RefSeq database with 2379 species for which whole-genome sequences or sequences longer than 100 Kbp were available. RAIphy assigns a likelihood score to a DNA sequence belonging to a species. A 95% confidence threshold was used to classify a given sequence. This threshold was determined as the average likelihood score rejecting 95% of the false positives by simulating a mixture sampling 1000 random fragments from each sequenced (complete or near-complete) microbial genome. The cumulative assembly score of the contigs assigned to an organism was used as the abundance score for that organism. Contigs that were not assigned to any species using RAIphy were mapped using megablast with E-value and similarity cut-off values of 10^-10 and 99%, respectively (Zhang et al., 2000). We used the metagenomics RAST (MG-RAST) server (http://metagenomics.anl.gov/) for the functional analysis of the WGS metagenomic sequences (Meyer et al., 2008).
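The abundance scoring described above amounts to summing read support per species. The sketch below illustrates that bookkeeping and the two percentages reported later (relative to mapped reads only, and relative to all reads). The input layout, a list of `(species, read_count)` pairs with `None` for unassigned contigs, and the function name are hypothetical.

```python
from collections import defaultdict


def abundance(contigs, total_reads):
    """Sum per-contig read counts by species and express them as percentages.

    contigs: iterable of (species or None, number of reads in the contig).
    total_reads: all high quality reads, mapped or not.
    """
    per_species = defaultdict(int)
    for species, n_reads in contigs:
        if species is not None:          # skip contigs left unassigned
            per_species[species] += n_reads
    mapped_reads = sum(per_species.values())
    return {
        species: {
            "mapped_pct": 100.0 * n / mapped_reads,
            "total_pct": 100.0 * n / total_reads,
        }
        for species, n in per_species.items()
    }
```

For example, with 1000 reads of which 900 are mapped, a species supported by 800 reads has a total abundance of 80% and a mapped abundance of about 88.9%.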

2.5. 16S analysis

16S analyses were done using two different approaches: i) 16S sequences were found from the whole genome shotgun sequencing data using a computational approach; ii) 16S reads were directly obtained using pyrosequencing performed separately from whole genome sequencing. In the first approach, reads that were predicted to come from a 16S sequence were found using a Hidden Markov Model based approach (Huang et al., 2009). Both data sets were similarly mapped to the Ribosomal Database Project Release 10 16S database consisting of 2,639,157 16S sequences (Cole et al., 2009) using megablast.

3. Results

3.1. Sequence analysis

Whole genome shotgun sequencing of the Kefir1 sample yielded 2,066,539 reads containing 1,781,686,229 base pairs (bp), giving an average read length of 862 bp. Following trimming and filtering, we obtained 1,832,793 reads with an average read length of 438 bp amounting to 802,989,242 bp of total sequence data. Kefir2 sequencing resulted in 2,026,402 raw sequence reads containing 2,392,340,770 bp of total raw sequence data with an average read length of 1180 bp. When low quality bases were trimmed and the remaining short sequences were filtered out, the Kefir2 sample contained 1,849,662 high quality reads with an average read length of 466 bp and a total sequence data of 862,317,873 bp. These results are summarized in Table 1. In Fig. 2, we show the distribution of average read lengths and number of reads for varying average quality values, underscoring the high quality of the reads obtained for both kefir samples.

Table 1
Sequence statistics for the two Kefir samples comparing raw and high quality sequence data (following trimming and filtering) used in downstream analysis.

                          Kefir1                        Kefir2
                          Raw            T&F            Raw            T&F
# of reads                2,066,539      1,832,793      2,026,402      1,849,662
Total bp                  1,781,686,229  802,989,242    2,392,340,770  862,317,873
Average read length (bp)  862            438            1180           466

Abbreviations: T&F, Trimmed and Filtered; bp, base pairs.
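As a quick consistency check, the average read lengths in Table 1 follow from dividing total base pairs by the number of reads. The short script below reproduces them from the table values; the reported averages match the quotient truncated to an integer, which is why integer division is used. The dictionary layout and function name are illustrative only.

```python
# (number of reads, total bp) copied from Table 1
TABLE_1 = {
    ("Kefir1", "Raw"): (2_066_539, 1_781_686_229),
    ("Kefir1", "T&F"): (1_832_793, 802_989_242),
    ("Kefir2", "Raw"): (2_026_402, 2_392_340_770),
    ("Kefir2", "T&F"): (1_849_662, 862_317_873),
}


def average_read_length(n_reads, total_bp):
    """Average read length in bp, truncated to an integer."""
    return total_bp // n_reads


for (sample, stage), (n_reads, total_bp) in TABLE_1.items():
    print(sample, stage, average_read_length(n_reads, total_bp))

# Pooled high quality (T&F) data across both samples
hq_reads = sum(n for (_, stage), (n, _bp) in TABLE_1.items() if stage == "T&F")
hq_bp = sum(bp for (_, stage), (_n, bp) in TABLE_1.items() if stage == "T&F")
print(hq_reads, hq_bp)
```

The pooled totals come to 3,682,455 reads and 1,665,307,115 bp, in line with the 3,682,455 reads and roughly 1.6 Gbp quoted in the abstract.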


3.2. Metagenome phylotyping

Following assembly, we obtained 3268 and 2883 contigs in Kefir1 and Kefir2, respectively. The numbers of reads that were not part of any contig (singletons) were 1129 in Kefir1 and 1397 in Kefir2, amounting to 0.06% and 0.07% of the total number of reads, respectively. The total assembly length for Kefir1 was 12,486,312 bp with the largest contig comprising 214,415 bp. Similarly, for Kefir2, we obtained a total assembly length of 11,403,121 bp where the largest contig was 298,694 bp long. The N50 values for the Kefir1 and Kefir2 assemblies were 24,721 bp and 19,185 bp, respectively, with an average contig length of 3821 bp for Kefir1 and 3956 bp for Kefir2. These statistics are summarized in Table 2.
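For readers unfamiliar with the N50 statistic quoted above, the sketch below uses the common definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The paper does not spell out its exact definition, so this is an assumption, and the example lengths are made up.

```python
def n50(lengths):
    """Contig length at which the cumulative sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one contig")


# Hypothetical contig lengths, not taken from the Kefir assemblies
print(n50([100, 80, 60, 40, 20]))
```

Because a few long contigs can hold most of the assembled bases, N50 can be much larger than the mean or median contig length, as is the case for both Kefir assemblies.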

Fig. 2. Distribution of average read lengths and number of reads with respect to average quality of the reads.

74.23% and 81.81% of all the reads (represented in 514 and 363 contigs accounting for 7,295,787 bp and 6,674,497 bp of total assembly) were successfully mapped to a species using RAIphy for Kefir1 and Kefir2, respectively. Out of the remaining contigs, 1320 and 1473 (representing 13.93% and 11.36% of all the reads with total assemblies of 2,885,652 bp and 2,972,563 bp) were mapped to a known species using BLAST for Kefir1 and Kefir2, respectively. Therefore, in the end there were 11.84% and 6.83% unmapped reads (represented in 1433 and 1046 contigs amounting to 2,304,873 bp and 1,756,061 bp of total assembly) for Kefir1 and Kefir2, respectively. These statistics are summarized in Table 2. In Table 3, we present the combined percent abundance of species found using RAIphy and BLAST for Kefir1 and Kefir2. In Table 3, the percent abundance of a species is given both by considering all reads and by considering mapped reads only. We included species so that the coverage is over 99.9% of reads that have been assigned to a species. If a species was not mapped in a Kefir sample due to its low abundance, this is indicated by a ‘*’ representing insignificant abundance. Overall, using the 99.9% coverage cut-off, we identified 27 species in Kefir1 and 17 species in Kefir2. There were 14 species commonly found in both samples. In Fig. 3, we show the phylogenetic tree of these species based on rRNA analysis where E. coli PK3 is used as an outgroup. In this figure, we indicate the species commonly and uniquely found in Kefir1 and Kefir2 along with the species’ total abundance rates.
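The 99.9% coverage cut-off can be read as: rank species by mapped abundance and keep adding them until the cumulative mapped abundance reaches 99.9%. The sketch below shows one way to express that rule; the exact tie-breaking used in the study is not stated, and the function name and example values are hypothetical.

```python
def species_up_to_coverage(mapped_pct, cutoff=99.9):
    """Return the most abundant species whose cumulative mapped % first reaches `cutoff`."""
    selected, running = [], 0.0
    ranked = sorted(mapped_pct.items(), key=lambda item: item[1], reverse=True)
    for species, pct in ranked:
        if running >= cutoff:
            break                      # coverage target already met
        selected.append(species)
        running += pct
    return selected


# Hypothetical mapped abundances (percent), not taken from Table 3
example = {"sp_A": 90.0, "sp_B": 9.0, "sp_C": 0.95, "sp_D": 0.04, "sp_E": 0.01}
print(species_up_to_coverage(example))
```

In this toy example the first three species already cover 99.95% of the mapped reads, so the two rarest species fall below the cut-off, which corresponds to the ‘*’ entries in Table 3.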

Whole genome shotgun metagenomic analysis identified four bacterial families: Lactobacillaceae, Leuconostocaceae, Enterococcaceae, and Streptococcaceae. Lactobacillaceae, which was found to be the most abundant family in both Kefir samples, was represented by two genera, Lactobacillus and Pediococcus, in the analyzed samples. The genus Lactobacillus accounted for 88.07% and 92.93% of the bacterial community in Kefir1 and Kefir2 samples, respectively (see Table 3). On the other hand, the genus Pediococcus accounted for 0.11% and 0.021% of the community in Kefir1 and Kefir2 samples, respectively. The Leuconostocaceae family was represented only by the Oenococcus genus in Kefir1 for a total abundance of 0.096%, while this family had both the Oenococcus and the Leuconostoc genera present in Kefir2 for a total abundance of 0.069%. The Enterococcaceae family was only observed in Kefir1 with the genus Tetragenococcus at a total rate of 0.034%. Finally, the family Streptococcaceae was found in both samples, represented by the genus Lactococcus at an abundance rate of 0.046% in Kefir1 and 0.013% in Kefir2.

In both Kefir samples, most of the community was made up of the species Lactobacillus kefiranofaciens (74.30% in Kefir1, 84.63% in Kefir2; see Table 3). When we aligned the assembled contigs with the whole genome of L. kefiranofaciens (Wang et al., 2011), Kefir1 contigs achieved 183× average coverage, mapping to 99.54% of the target genome. Similarly, the contigs from the assembly of Kefir2 shotgun data resulted in 206× average coverage, mapping to 99.79% of the L. kefiranofaciens genome. L. kefiranofaciens was followed, in order of abundance, by Lactobacillus buchneri (8.017% in Kefir1, 2.53% in Kefir2) and Lactobacillus helveticus (3.75% in Kefir1, 4.79% in Kefir2). These three species make up 86.05% of the total community in Kefir1 and 91.96% of the total community in Kefir2.

3.3. 16S analysis

Hidden Markov Model based 16S sequence modeling revealed 8847 reads in Kefir1 (mean length ~495 bp) and 5163 reads in Kefir2 (mean length ~521 bp). 16S pyrosequencing, done separately from the whole genome sequencing, resulted in 1507 and 2057 high quality reads with an average read length of ~257 bp and ~209 bp for Kefir1 and Kefir2, respectively. Species identification using the modeled 16S reads that come from the whole genome sequence data revealed 15 and 14 species for Kefir1 and Kefir2 samples, respectively. Using the 16S pyrosequencing reads, we identified 11 species for both Kefir1 and Kefir2 samples. These results are summarized in Table 4 where species that were more than 0.05% of the total community are shown.

In the two 16S analysis methods shown in Table 4 (S: sequencing of the V1 and V2 regions from the metagenomic DNA; M: modeled 16S sequences from the whole genome sequencing of the metagenomic DNA), we identified two families: Lactobacillaceae and Enterococcaceae. Similar to the whole genome metagenomic analysis, the Lactobacillaceae family was the most abundant family and was represented by two genera, Lactobacillus and Pediococcus. In Kefir1, the Lactobacillus genus made up 97.041% of the total community using the 16S sequencing (S), while this abundance was found to be 99.09% using the 16S modeling (M). In Kefir1, the genus Pediococcus was not identified using the S method and was found to make up only 0.091% of the total community using the M method. Similar results were obtained for Kefir2. While the genus Lactobacillus was found to be very abundant (98.73% using the S method and 99.16% using the M method), the genus Pediococcus was only found using the M method at a rate of 0.175%. The family Enterococcaceae was only found in Kefir1 using the M method and the abundance was at 0.091%, represented by the genus Tetragenococcus.

Table 2
Assembly and mapping statistics for the two Kefir samples. Assembled contigs are first mapped using RAIphy. Contigs that are not mapped using this species binning step are mapped using BLAST.

Contigs  Assembly                RAIphy                  BLAST                   Unmapped
         Kefir1      Kefir2      Kefir1      Kefir2      Kefir1      Kefir2      Kefir1      Kefir2
#        3268        2883        514         363         1320        1473        1433        1046
Length   12,486,312  11,403,121  7,295,787   6,674,497   2,885,652   2,972,563   2,304,873   1,756,061
Largest  214,415     298,694     214,415     298,694     40,451      52,189      10,478      10,052
N50      24,721      19,185      22,641      33,156      3157        2335        1111        1212
N90      1812        1725        3755        5430        1327        1070        980         900
Median   1686        1919        11,463      13,523      1427        1256        1235        1215
Mean     3821        3956        14,194      18,387      2186        2018        1608        1679

Table 3
Abundance of species mapped using RAIphy and BLAST. The ‘Total’ abundance percentage is the percentage of all sequencing reads mapped to a given species; the ‘Mapped’ abundance percentage is the percentage when only mapped sequencing reads are considered. Species whose mapped abundance percentages add up to 99.9% in the respective Kefir sample are shown. If the abundance of a species was too low to be included in this list for the respective Kefir sample, this is indicated by a *.

Species                        Kefir1              Kefir2
                               Mapped%   Total%    Mapped%   Total%
Lactobacillus kefiranofaciens  84.280    74.301    90.884    84.639
Lactobacillus buchneri         9.094     8.017     2.722     2.536
Lactobacillus helveticus       4.262     3.757     5.143     4.792
Lactobacillus casei            0.272     0.240     0.157     0.146
Lactobacillus acidophilus      0.367     0.324     0.208     0.194
Lactobacillus amylovorus       0.305     0.269     0.397     0.370
Lactobacillus brevis           0.143     0.126     0.017     0.016
Lactobacillus delbrueckii      0.089     0.079     *         *
Lactobacillus plantarum        0.232     0.204     0.056     0.053
Lactobacillus pentosus         0.210     0.185     0.075     0.070
Pediococcus claussenii         0.149     0.131     0.022     0.021
Oenococcus oeni                0.109     0.096     0.051     0.048
Pediococcus damnosus           0.097     0.086     *         *
Lactobacillus salivarius       0.060     0.053     *         *
Lactococcus garvieae           0.043     0.038     0.014     0.013
Tetragenococcus halophilus     0.038     0.034     *         *
Lactobacillus johnsonii        0.024     0.021     0.085     0.079
Lactobacillus rhamnosus        0.021     0.018     *         *
Lactobacillus crispatus        0.018     0.016     *         *
Pediococcus halophilus         0.018     0.016     *         *
Lactobacillus gasseri          0.017     0.015     0.011     0.010
Lactobacillus rossiae          0.012     0.011     *         *
Pediococcus pentosaceus        0.010     0.009     *         *
Lactobacillus kefiri           0.010     0.009     *         *
Lactobacillus sakei            0.009     0.008     *         *
Lactococcus lactis             0.009     0.008     *         *
Lactobacillus reuteri          0.008     0.007     *         *
Leuconostoc mesenteroides      *         *         0.022     0.021
Lactobacillus gallinarum       *         *         0.016     0.015
Lactobacillus paracasei        *         *         0.026     0.013

Total                          99.905    88.077    99.906    93.034

* Insignificant abundance.

In both samples, as expected, most of the bacterial community was found to consist of L. kefiranofaciens using the 16S analysis methods. In Kefir1, L. kefiranofaciens abundance was calculated to be 77.01% using the M method, while this rate came down to 59.98% based on the S method. For Kefir2, 16S modeling (M) analysis revealed 78.14% of the community to be L. kefiranofaciens and 16S sequencing (S) analysis resulted in an abundance of 62.63% for this species. Although the other species identified by the two 16S analysis methods show high agreement (only three species identified using the S method were not found with the M method), the percent abundance of the species other than L. kefiranofaciens did not show high concordance between the two methods.

4. Discussion

Sequencing results for the two Kefir samples (Table 1) represent typical outputs of an FLX+ sequencing run, with a high quality average read length of over 400 bp and over 800 Mbp of high quality total sequence data for each run. The distribution of the average read lengths based on the average quality values of the reads is unimodal for both kefir samples and peaks in the 25-32 average read quality range, exceeding 500 bp average length per read. Similarly, the number of reads for a given average read quality value exhibits a unimodal distribution with the bulk of the data having an average read quality in the 24-30 range. Although the Kefir1 sample resulted in larger read lengths for average quality values exceeding 32, this does not amount to a significant difference as the number of reads in this average read quality range is small. Overall, these results suggest that both sequencing runs produced comparable high quality read data in the expected amounts for downstream analysis.

Fig. 3. Phylogenetic analysis of species found in Kefir1 and Kefir2 using whole genome metagenomic analysis. If a sequence is exclusively listed for one of the Kefir samples due to its insignificant abundance in the other Kefir sample, the sample with ample abundance is indicated in parentheses. The total percent abundance of the species is indicated in brackets for Kefir1 and Kefir2, respectively.

U. Nalbantoglu et al. / Food Microbiology 41 (2014) 42–51

Table 4. Abundance of species found using 16S analysis. M (%) indicates the percent abundance based on the modeled 16S reads using the whole genome sequence results. S (%) indicates the percent abundance based on the 16S pyrosequencing of the Kefir1 and Kefir2 samples.

Species                        Kefir1             Kefir2
                               M (%)    S (%)     M (%)    S (%)
Lactobacillus kefiranofaciens  77.018   59.985    78.149   62.632
Lactobacillus acidophilus      10.241    0.074    11.023   –
Lactobacillus sunkii            4.217    7.544     2.469    4.110
Lactobacillus johnsonii         1.614    0.370     1.691    0.440
Lactobacillus crispatus         1.262   –          1.089   –
Lactobacillus kefiri            1.262    9.541     0.816   11.957
Lactobacillus otakiensis        1.250   10.281     1.050    8.115
Lactobacillus helveticus        1.091   –          1.691   –
Lactobacillus kalixensis        0.580   –          0.583   –
Lactobacillus rapi              0.296    1.923     0.272    3.033
Pediococcus lolii               0.205   –          0.175   –
Lactobacillus diolivorans       0.114   –          0.097   –
Tetragenococcus halophilus      0.091   –         –        –
Lactobacillus buchneri          0.080    2.663     0.136    4.881
Lactobacillus parafarraginis    0.068   –         –        –
Lactobacillus parabuchneri     –         1.849    –         0.054
Lactobacillus plantarum        –         1.405    –         0.062
Lactobacillus parakefiri       –         1.405    –         2.832
Lactobacillus brevis           –        –          0.097    0.734
Total                          99.386   97.041    99.339   98.850

The total assembly length, N50, and average contig length values for the two sequencing runs, together with an almost insignificant percentage of singletons (0.06–0.07%), show that the assembly phase was carried out successfully, generating large contigs with high coverage (see Table 2). The average contig sizes for each metagenome sample were greater than 1000 bp, the range in which RAIphy is reported to operate accurately (Nalbantoglu et al., 2011), justifying the use of this species binning method. RAIphy successfully mapped the bulk of the reads (74.23% and 81.81% for Kefir1 and Kefir2, respectively), which amount to the majority of the assembly (58.4% for Kefir1 and 58.5% for Kefir2). The contigs mapped by RAIphy (average lengths of 14,194 bp and 18,387 bp for Kefir1 and Kefir2, respectively) were significantly larger than the average assembly contig length. As RAIphy uses a k-mer based profile developed from genomic fragments for species binning, the fact that long (>1000 bp) contigs have been used to derive the profile improves the confidence in these mapping results. The assembly mapped by BLAST and the unmapped assembly roughly share the remaining reads, and the average contig lengths mapped by BLAST are slightly larger than those of the unmapped contigs. Considering the stringent criteria used in species identification, the whole genome metagenomic analysis proposed here has found ~2.3 Mbp and ~1.8 Mbp of unmapped assembly, which represent potentially novel sequence data for the Kefir1 and Kefir2 communities, respectively. These novel sequences are represented in 2479 contigs in total, with an average length of over 1500 bp.
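
For readers less familiar with the assembly statistics discussed above, the following is a minimal sketch of how N50 and average contig length are conventionally computed from a list of contig lengths. The function name `assembly_stats` and the toy lengths are illustrative assumptions; they are not code or values from this study.

```python
def assembly_stats(contig_lengths):
    """Return (total_length, mean_length, n50) for a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    mean = total / len(lengths)
    running = 0
    for length in lengths:
        running += length
        # N50 is the length of the contig at which the cumulative sum of the
        # length-sorted contigs first reaches half of the total assembly.
        if 2 * running >= total:
            return total, mean, length


# Hypothetical contig lengths in bp, for illustration only.
toy_contigs = [100, 200, 300, 400, 500]
print(assembly_stats(toy_contigs))  # (1500, 300.0, 400)
```

A high N50 relative to the mean contig length indicates that most of the assembled bases sit in long contigs, which is the property the binning step relies on.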

According to the results summarized in Table 3, the thirteen species listed exclusively under Kefir1 (as they were found only in trace amounts in Kefir2) add up to 0.35% of the total read abundance in Kefir1. Similarly, the three species that had insignificant abundance in Kefir1, and were therefore listed only under Kefir2, amount to 0.05% of the total read abundance in Kefir2. These results suggest that the microbial communities found in the two samples overlap greatly with respect to both the species profile and the species abundance. There are, however, minute differences between the community profiles of the two kefir samples.
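
The two totals quoted above can be checked directly against Table 3. The sketch below sums the abundances of the species that carry an asterisk for the other sample. It assumes that the second and fourth numeric columns of Table 3 are the shares of all reads for Kefir1 and Kefir2, respectively; the dictionary and function names are illustrative.

```python
# Species reported for only one sample in Table 3, with the value taken from
# the second (Kefir1) or fourth (Kefir2) numeric column of that table.
# Assumption: those columns give the share of all reads, in percent.
KEFIR1_ONLY = {
    "Lactobacillus delbrueckii": 0.079,
    "Pediococcus damnosus": 0.086,
    "Lactobacillus salivarius": 0.053,
    "Tetragenococcus halophilus": 0.034,
    "Lactobacillus rhamnosus": 0.018,
    "Lactobacillus crispatus": 0.016,
    "Pediococcus halophilus": 0.016,
    "Lactobacillus rossiae": 0.011,
    "Pediococcus pentosaceus": 0.009,
    "Lactobacillus kefiri": 0.009,
    "Lactobacillus sakei": 0.008,
    "Lactococcus lactis": 0.008,
    "Lactobacillus reuteri": 0.007,
}
KEFIR2_ONLY = {
    "Leuconostoc mesenteroides": 0.021,
    "Lactobacillus gallinarum": 0.015,
    "Lactobacillus paracasei": 0.013,
}


def exclusive_share(table):
    """Number of sample-exclusive species and their summed abundance (%)."""
    return len(table), round(sum(table.values()), 3)


print(exclusive_share(KEFIR1_ONLY))  # (13, 0.354)
print(exclusive_share(KEFIR2_ONLY))  # (3, 0.049)
```

Rounded to two decimals, these sums give the 0.35% and 0.05% reported in the text.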

We used 16S data analysis to verify and compare the whole genome shotgun metagenomic results. When we used whole genome reads that were predicted to come from 16S sequences, we identified a less diverse community, with about 60–80% of the total number of species identified using the whole genome approach. On the other hand, 16S pyrosequencing generated an even less precise community profile, in which only 40–60% of the species identified by the whole genome approach were found. These results suggest that the proposed approach using WGS is superior to 16S phylotyping using pyrosequencing, as it not only generates novel sequence data but also identifies a higher-resolution community profile. Seven species in Kefir1 and five species in Kefir2 identified by the 16S based methods were not discovered by the whole genome metagenomic approach. However, these correspond to 5–7% of the community profile discovered using the 16S based metagenomic analysis. Over 90% of the 16S species identification results matched our whole genome approach, providing a justification for the use of the proposed approach. Therefore, we hypothesize that microbial community profiling using whole genome shotgun data is feasible, can identify novel species data, and has the potential to generate a more accurate assessment of the underlying distribution.


16S modeled species identification showed high concordance between the two Kefir samples, as 12 species were found in both samples out of the 15 and 14 species identified in Kefir1 and Kefir2, respectively. We observed a similar agreement in the results of the 16S pyrosequencing data analysis, as 9 species were found in both samples out of the 11 species identified in each of Kefir1 and Kefir2. When the two different 16S analysis approaches were compared, we observed that only three species (amounting to ~3–4.5% of the total community) identified by the 16S pyrosequencing approach were not found using the 16S modeled read data analysis. Therefore, the whole genome data, even when boiled down to only the 16S reads, not only identifies ~75% of the species found using 16S pyrosequencing but also generates an ~50% more diverse population profile.
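
The comparison between the two 16S approaches amounts to a set difference over Table 4. As a minimal sketch, the code below lists the species reported by 16S pyrosequencing (S) but not by 16S modeling (M) in each sample and sums their S abundances. The values are transcribed from Table 4; the names `TABLE4` and `only_in_s` are illustrative, and `None` is used here to stand for a species the method did not report.

```python
# Table 4 rows: species -> (Kefir1 M, Kefir1 S, Kefir2 M, Kefir2 S), in percent.
# None marks a species that the method did not report for that sample.
TABLE4 = {
    "L. kefiranofaciens": (77.018, 59.985, 78.149, 62.632),
    "L. acidophilus": (10.241, 0.074, 11.023, None),
    "L. sunkii": (4.217, 7.544, 2.469, 4.110),
    "L. johnsonii": (1.614, 0.370, 1.691, 0.440),
    "L. crispatus": (1.262, None, 1.089, None),
    "L. kefiri": (1.262, 9.541, 0.816, 11.957),
    "L. otakiensis": (1.250, 10.281, 1.050, 8.115),
    "L. helveticus": (1.091, None, 1.691, None),
    "L. kalixensis": (0.580, None, 0.583, None),
    "L. rapi": (0.296, 1.923, 0.272, 3.033),
    "P. lolii": (0.205, None, 0.175, None),
    "L. diolivorans": (0.114, None, 0.097, None),
    "T. halophilus": (0.091, None, None, None),
    "L. buchneri": (0.080, 2.663, 0.136, 4.881),
    "L. parafarraginis": (0.068, None, None, None),
    "L. parabuchneri": (None, 1.849, None, 0.054),
    "L. plantarum": (None, 1.405, None, 0.062),
    "L. parakefiri": (None, 1.405, None, 2.832),
    "L. brevis": (None, None, 0.097, 0.734),
}


def only_in_s(m_col, s_col):
    """Species reported by S but not by M, plus their summed S abundance (%)."""
    names = sorted(
        species
        for species, row in TABLE4.items()
        if row[s_col] is not None and row[m_col] is None
    )
    share = round(sum(TABLE4[species][s_col] for species in names), 3)
    return names, share


print(only_in_s(0, 1))  # Kefir1: three species, 4.659% of the S profile
print(only_in_s(2, 3))  # Kefir2: three species, 2.948% of the S profile
```

The same three species appear for both samples, and their summed shares are roughly in line with the ~3–4.5% range quoted in the text.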

L. kefiranofaciens turned out to be the most dominant species in the bacterial community hosted by both of the Kefir samples (74.30% in Kefir1 and 84.63% in Kefir2; see Table 3), showing slightly different abundance rates depending on the 16S analysis method employed (~80% in 16S modeling and ~60% in 16S sequencing; see Table 4). The results of mapping the assembled contigs to the whole genome of this species in both kefir samples showed ~200× mean coverage, with over 99.5% of the bases in the genome accounted for. Although studies employing culture dependent methods to identify the microbial community in Kefir have mostly failed to identify this species (Angulo et al., 1993; Fujisawa et al., 1988; Garrote et al., 2001; Simova et al., 2002; Witthuhn et al., 2004), studies using culture independent methods have reported L. kefiranofaciens as the most dominant member of the community despite analyzing kefir grains from different sources (Chen et al., 2008; Dobson et al., 2011; Kesmen and Kacmaz, 2011; Leite et al., 2012; Zhou et al., 2009). In a study analyzing Taiwanese Kefir, PCR-DGGE and 16S analysis revealed that 25–43% of the community consists of L. kefiranofaciens (Chen et al., 2008), and in another study using a similar approach to analyze various Turkish Kefir samples, L. kefiranofaciens was found to be the most dominant species in all the samples (Kesmen and Kacmaz, 2011). Concordant results have been observed in studies analyzing Tibetan (Zhou et al., 2009), Irish (Dobson et al., 2011), and Brazilian (Leite et al., 2012) Kefir samples using culture independent methods. Our 16S analysis approaches reveal a community profile similar to those found in the literature that employ 16S based identification techniques. However, in these studies the abundance of L. kefiranofaciens has been reported to be around 40%. We have observed a similar trend, i.e., in our 16S sequencing based analyses, L. kefiranofaciens abundance is significantly lower than the abundance we report using the WGS metagenomic analysis and closer to the ratios obtained in other studies. This difference is potentially due to the different primers used in amplifying certain regions of the 16S rRNA gene, which may limit the ability to assess the dominant species’ abundance with high accuracy (Hong et al., 2009; Petrosino et al., 2009).
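
The two mapping figures quoted above, mean coverage and the fraction of genome bases accounted for, are standard summaries of a per-base depth vector. The following is a minimal sketch under that assumption; the function name `coverage_summary` and the toy depths are hypothetical, and the text does not state which tool produced the reported values.

```python
def coverage_summary(depths):
    """Return (mean depth, breadth) for a list of per-base mapping depths.

    Breadth is the fraction of reference positions with depth greater than 0.
    """
    if not depths:
        raise ValueError("empty depth vector")
    mean_depth = sum(depths) / len(depths)
    breadth = sum(1 for depth in depths if depth > 0) / len(depths)
    return mean_depth, breadth


# Hypothetical depths for a 10-base reference, for illustration only.
toy_depths = [0, 180, 210, 205, 195, 220, 190, 200, 210, 190]
print(coverage_summary(toy_depths))  # (180.0, 0.9)
```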

Based on the whole genome pyrosequencing analysis, we have identified L. buchneri as the second most abundant species (8.017% in Kefir1 and 2.53% in Kefir2; see Table 3). In most previous studies, such as Chen et al. (2008), Kesmen and Kacmaz (2011), and Zhou et al. (2009), the second most abundant species in almost all of the analyzed Kefir samples has been reported to be Lactobacillus kefiri. In these studies, the V3 region of the 16S rRNA gene was amplified and the sequence results were analyzed using the PCR-DGGE method. In one of these studies it was reported that L. buchneri and L. kefiri reveal identical DGGE profiles, and that GenBank and EMBL sequence analysis does not show any difference between the 16S rRNA V3 regions of these two species (Kesmen and Kacmaz, 2011). The authors concluded that they had observed L. kefiri, not L. buchneri, as previous culture dependent methods had isolated L. kefiri from Kefir samples. In our WGS metagenomic analysis, we have not identified L. kefiri in Kefir2, and this species was found only in trace amounts (0.009%; see Table 3) in Kefir1. On the other hand, both of our 16S based analyses reveal results similar to those found in the literature, where L. buchneri is reported at much lower levels (Kefir1: 0.080% (16S modeling), 2.663% (16S sequencing); Kefir2: 0.136% (16S modeling), 4.881% (16S sequencing); see Table 4) than L. kefiri (Kefir1: 1.262% (16S modeling), 9.541% (16S sequencing); Kefir2: 0.816% (16S modeling), 11.957% (16S sequencing); see Table 4). However, as the sequenced regions for these two species do not vary enough to make a correct assessment, we believe that the WGS analysis here may have identified the microbial community more accurately by calling the species in question L. buchneri. We note that the average contig length used to identify species based on WGS was around 15 kbp in the RAIphy analysis and over 2 kbp in the BLAST analysis (see Table 2). These sequences are much longer than the ones used in 16S based sequence identification and therefore have a much higher chance of yielding a correct assignment for a given query sequence. This observation is further strengthened by the fact that the short 16S rRNA gene regions used to identify species may not show high enough variance between different species. Our results based on the 16S modeled sequences are more similar to the WGS phylotyping results, supporting its findings, as the average sequence lengths used in the 16S modeling analysis (495 bp for Kefir1, 521 bp for Kefir2) are longer than the average sequence lengths used in the 16S sequencing based analysis (257 bp for Kefir1 and 209 bp for Kefir2). Moreover, the 16S modeled sequences do not necessarily come from one particular region of the 16S rRNA gene, which may not show high enough variance. These results potentially exhibit the primer sequence bias problem observed in 16S rRNA based approaches, which may be bypassed using WGS metagenomic analysis.

A similar observation can be made for Lactobacillus otakiensis. This species was found to be the second most abundant species using 16S sequencing (10.281% in Kefir1 and 8.115% in Kefir2; see Table 4) and the seventh most abundant species using 16S modeling analysis (1.250% in Kefir1 and 1.050% in Kefir2; see Table 4). However, our WGS metagenomic analysis has not identified L. otakiensis as one of the community members in the Kefir microflora. A careful investigation of the 16S rRNA gene sequences of L. buchneri, L. kefiri, L. otakiensis, and Lactobacillus sunkii reveals that these sequences exhibit more than 97% identity, that the V3 region shows 100% identity, and that the V1–V2 region exhibits over 95% pairwise identity (see Fig. 4). Therefore, we believe that 16S based analysis methods do not provide enough detail to differentiate between these species. Similar sequence identity results have been observed for Lactobacillus rapi, which has been identified using the 16S based methods but not the WGS analysis (data not shown).
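
Pairwise identity figures of the kind quoted above are computed column by column over an alignment. The sketch below shows one common convention, in which columns where both sequences have a gap are skipped and a gap against a base counts as a mismatch. This convention, the function name `percent_identity`, and the toy fragments are assumptions for illustration; the fragments are not real 16S sequences.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where both sequences carry a gap character.
    columns = [
        (x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical, already-aligned toy fragments (not real 16S sequences).
toy_alignment = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAC",
    "species_C": "ACGTTCGTAC",
    "species_D": "ACGT-CGTAC",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(toy_alignment.items(), 2):
    print(name_a, name_b, f"{percent_identity(seq_a, seq_b):.1f}%")
```

When every pair in a region scores 100%, as reported for the V3 region, no classifier restricted to that region can separate the species.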

Our whole genome phylotyping results reveal L. helveticus as the third most dominant species in both Kefir samples (3.75% for Kefir1 and 4.79% for Kefir2; see Table 3). Among studies using other culture independent methods, this species has been found to be one of the dominant members in the Tibetan and Taiwanese Kefir samples (Chen et al., 2008; Zhou et al., 2009) but was not listed as one of the major players in the microflora identified in other studies (Dobson et al., 2011; Kesmen and Kacmaz, 2011; Leite et al., 2012). This is probably due to differences between the community profiles of Kefir samples that do not have the same origin.

Fig. 4. Multiple sequence alignment of the V1–V3 regions of the 16S rRNA gene for Lactobacillus buchneri, Lactobacillus kefiri, Lactobacillus otakiensis, and Lactobacillus sunkii. Alignments other than matched regions are indicated by vertical white lines.

The species found in the Kefir microbial community by our WGS approach that were also previously identified in Kefir by other studies constitute more than 99% of the abundance we have found in the analyzed Kefir samples. The newly identified species in this work all belong to the order Lactobacillales, which is expected, as members of the families in this order are generally potential probiotics and are seen in other fermented products (Reid et al., 2006). For example, Lactococcus garvieae, a newly identified species in our study, has been found in dairy products manufactured from raw milk, possibly contributing to the sensory profile and safety of the final product (Alegria et al., 2009; Florez et al., 2012; Foschino et al., 2008). Another species not previously identified in Kefir, Tetragenococcus halophilus, has been identified in Mexican cheese, where it contributes to proteolytic activity (Morales et al., 2011). Similar observations can be made for Lactobacillus crispatus and Lactobacillus johnsonii (El-Baradei et al., 2008; Henri-Dubernet et al., 2008; Wegmann et al., 2009). Therefore, it is possible that these species contribute to the sensory attributes and safety of the analyzed Kefir grains. The low abundance of these species does not imply that they play a limited or ineffective functional role in forming the Kefir grain. Confirming these new species and their roles in Kefir remains a future challenge for improving the understanding of Kefir’s microbial community structure.

Although pyrosequencing is being used with increasing popularity to identify the microbial community in fermented milk products, this technique has been applied to Kefir samples in only two studies so far (Dobson et al., 2011; Leite et al., 2012). In both of these studies, which analyzed Brazilian and Irish Kefir samples, 16S rRNA gene regions were amplified for pyrosequencing. The phyla Actinobacteria, Proteobacteria, and Bacteroidetes, which were identified by these two studies but not in the current study, reflect the diversity seen in Kefir samples depending on the sample’s place of origin. On the other hand, there are also commonalities, such as the phylum Firmicutes, which has been identified in all three studies. However, we note that the whole genome approach is capable of identifying the species in the phylum Firmicutes, which contains the most dominant species found in Kefir, with greater resolution and accuracy, especially for the low-abundance species.

Fig. 5. MG-RAST analyses of Kefir1 and Kefir2 WGS metagenomic sequences. A: Genus level assignment of the sequences. B: Functional annotation of predicted protein sequences.

MG-RAST analyses of the WGS metagenomic sequences revealed an α-diversity of 23.9 for Kefir1 and 21.4 for Kefir2, implying similar species diversity in both Kefir samples, which is larger than the diversity levels obtained by typical 16S studies aiming at identifying the community profile in Kefir (Dobson et al., 2011; Kesmen and Kacmaz, 2011; Leite et al., 2012). The attainability of a higher α-diversity validates the use of the WGS approach for metagenomic species identification, providing a more accurate and resolved community profile. The rarefaction curves for both Kefir samples reached a plateau and remained flat as the number of sequences increased, indicating that the current sampling of the communities is sufficient and that very few additional species remain to be discovered with more intensive sampling. The GC content distributions for both Kefir samples were unimodal, with peaks around 45%. The genus level assignments made by MG-RAST are shown in Fig. 5. In both Kefir samples, the genus Lactobacillus dominates the profile with ~90% abundance, followed by the genera Oenococcus, Enterococcus, Streptococcus, Pediococcus, and Leuconostoc with varying but comparable abundance. These results are in accordance with the profile obtained by the workflow employed for species binning.
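
As a rough illustration of the two diversity summaries discussed above, the sketch below computes a Shannon-based effective number of species and a simple rarefaction curve from a vector of per-species read counts. It assumes that an α-diversity of this kind is derived from the Shannon index; the exact procedure behind the reported MG-RAST values is not described in the text, and the function names and toy counts are hypothetical.

```python
import math
import random


def effective_species(counts):
    """exp(Shannon entropy) of a count vector: the effective number of species."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    shannon = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(shannon)


def rarefaction_curve(counts, depths, seed=0):
    """Observed species richness in nested random subsamples of given sizes."""
    # One entry per read, labelled with the index of its species.
    pool = [species for species, c in enumerate(counts) for _ in range(c)]
    random.Random(seed).shuffle(pool)
    # Prefixes of one shuffled pool are subsamples drawn without replacement.
    return [(n, len(set(pool[:n]))) for n in depths]


# Hypothetical reads per species for a community with one dominant member.
toy_counts = [900, 50, 30, 15, 5]
print(effective_species(toy_counts))
print(rarefaction_curve(toy_counts, [10, 100, 1000]))
```

A curve whose richness stops growing as the subsample size increases corresponds to the plateau described in the text.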

Predicted protein features that showed similarity to proteins with known functions accounted for 79.9% and 78.2% of the metagenomic sequences in Kefir1 and Kefir2, respectively. Of the features that could be annotated, 65% and 67% were placed in a hierarchy in Kefir1 and Kefir2, respectively. In Fig. 5, we show the percent abundance of the functional categories found in the Kefir1 and Kefir2 samples. The categories show almost identical abundance in both samples, with the most abundant category being clustering-based subsystems, which include genes with functional coupling evidence but without a specific or known task. The highly abundant subsystems follow an expected order, with carbohydrate, protein, amino acid, and DNA/RNA metabolism dominating the list, as is similarly observed in other comparable metagenomes on the MG-RAST server. On the other hand, the low abundance of categories such as motility and chemotaxis and dormancy and sporulation is consistent with the functional characteristics of Kefir, as the community members are not motile and do not sporulate.

5. Conclusions

In this study we assessed the microbial community in two Turkish Kefir grains using whole genome shotgun metagenomics via pyrosequencing. We compared and verified our results with two 16S based analyses: one using pyrosequencing of the V1–V2 regions of the 16S rRNA genes, and one using the WGS reads predicted to come from the 16S rRNA genes. Our results indicate that the WGS based approach identifies the underlying community with higher resolution and better abundance accuracy. 16S based approaches are vulnerable to primer and amplification bias and may inaccurately assess the abundance of the community members due to the high similarity of the corresponding 16S sequences. Moreover, using a WGS based approach it is possible to identify novel species sequence data, contributing to our understanding of the biological mechanisms that give rise to the underlying microflora.

Acknowledgments

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK), grant number 111T369 (to HC).

References

Abraham, A.G., De Antoni, G.L., 1999. Characterization of kefir grains grown in cows’ milk and in soya milk. J. Dairy Res. 66, 327–333.

Alegria, A., Alvarez-Martin, P., Sacristan, N., Fernandez, E., Delgado, S., Mayo, B., 2009. Diversity and evolution of the microbial populations during manufacture and ripening of Casein, a traditional Spanish, starter-free cheese made from cow’s milk. Int. J. Food Microbiol. 136, 44–51.

Angulo, L., Lopez, E., Lema, C., 1993. Microflora present in kefir grains of the Galician region (north-west of Spain). J. Dairy Res. 60, 263–267.

Chen, H.C., Wang, S.Y., Chen, M.J., 2008. Microbiological study of lactic acid bacteria in kefir grains by culture-dependent and culture-independent methods. Food Microbiol. 25, 492–501.

Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J., Kulam-Syed-Mohideen, A.S., McGarrell, D.M., Marsh, T., Garrity, G.M., Tiedje, J.M., 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37, D141–D145.

Delfederico, L., Hollmann, A., Martinez, M., Iglesias, N.G., De Antoni, G., Semorile, L., 2006. Molecular identification and typing of lactobacilli isolated from kefir grains. J. Dairy Res. 73, 20–27.

Dobson, A., O’Sullivan, O., Cotter, P.D., Ross, P., Hill, C., 2011. High-throughput sequence-based analysis of the bacterial composition of kefir and an associated kefir grain. FEMS Microbiol. Lett. 320, 56–62.

El-Baradei, G., Delacroix-Buchet, A., Ogier, J.C., 2008. Bacterial biodiversity of traditional Zabady fermented milk. Int. J. Food Microbiol. 121, 295–301.

Farnworth, E.R., Mainville, I., 2003. Kefir: a fermented milk product. In: Farnworth, E.R. (Ed.), Handbook of Fermented Functional Foods. CRC Press, Boca Raton, FL, pp. 77–112.

Florez, A.B., Reimundo, P., Delgado, S., Fernandez, E., Alegria, A., Guijarro, J.A., Mayo, B., 2012. Genome sequence of Lactococcus garvieae IPLA 31405, a bacteriocin-producing, tetracycline-resistant strain isolated from a raw-milk cheese. J. Bacteriol. 194, 5118–5119.

Foschino, R., Nucera, D., Volponi, G., Picozzi, C., Ortoffi, M., Bottero, M.T., 2008. Comparison of Lactococcus garvieae strains isolated in northern Italy from dairy products and fishes through molecular typing. J. Appl. Microbiol. 105, 652–662.

Fujisawa, T., Adachi, S., Toba, T., Arihara, K., Mitsuoka, T., 1988. Lactobacillus kefiranofaciens sp. nov. isolated from kefir grains. Int. J. Syst. Bacteriol. 38, 12–14.

Garrote, G.L., Abraham, A.G., De Antoni, G.L., 2001. Chemical and microbiological characterisation of kefir grains. J. Dairy Res. 68, 639–652.

Henri-Dubernet, S., Desmasures, N., Gueguen, M., 2008. Diversity and dynamics of lactobacilli populations during ripening of RDO Camembert cheese. Can. J. Microbiol. 54, 218–228.

Hong, S., Bunge, J., Leslin, C., Jeon, S., Epstein, S.S., 2009. Polymerase chain reaction primers miss half of rRNA microbial diversity. ISME J. 3, 1365–1373.

Huang, Y., Gilna, P., Li, W., 2009. Identification of ribosomal RNA genes in metagenomic fragments. Bioinformatics 25, 1338–1340.

Idury, R.M., Waterman, M.S., 1995. A new algorithm for DNA sequence assembly. J. Comput. Biol. 2, 291–306.

Kalyuzhnaya, M.G., Lapidus, A., Ivanova, N., Copeland, A.C., McHardy, A.C., Szeto, E., Salamov, A., Grigoriev, I.V., Suciu, D., Levine, S.R., Markowitz, V.M., Rigoutsos, I., Tringe, S.G., Bruce, D.C., Richardson, P.M., Lidstrom, M.E., Chistoserdova, L., 2008. High-resolution metagenomics targets specific functional types in complex microbial communities. Nat. Biotechnol. 26, 1029–1034.

Kesmen, Z., Kacmaz, N., 2011. Determination of lactic microflora of kefir grains and kefir beverage by using culture-dependent and culture-independent methods. J. Food Sci. 76, M276–M283.

Kofler, R., Orozco-terWengel, P., De Maio, N., Pandey, R.V., Nolte, V., Futschik, A., Kosiol, C., Schlotterer, C., 2011. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS One 6, e15925.

Kowalczyk, M., Kolakowski, P., Radziwill-Bienkowska, J.M., Szmytkowska, A., Bardowski, J., 2012. Cascade cell lyses and DNA extraction for identification of genes and microorganisms in kefir grains. J. Dairy Res. 79, 26–32.

Leite, A.M., Mayo, B., Rachid, C.T., Peixoto, R.S., Silva, J.T., Paschoalin, V.M., Delgado, S., 2012. Assessment of the microbial diversity of Brazilian kefir grains by PCR-DGGE and pyrosequencing analysis. Food Microbiol. 31, 215–221.

Marshall, V.M., Cole, W.M., Brooker, B.E., 1984. Observations on the structure of kefir grains and the distribution of the microflora. J. Appl. Bacteriol. 57, 491–497.

Meyer, F., Paarmann, D., D’Souza, M., Olson, R., Glass, E.M., Kubal, M., Paczian, T., Rodriguez, A., Stevens, R., Wilke, A., Wilkening, J., Edwards, R.A., 2008. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinform. 9, 386.

Morales, F., Morales, J.I., Hernandez, C.H., Hernandez-Sanchez, H., 2011. Isolation and partial characterization of halotolerant lactic acid bacteria from two Mexican cheeses. Appl. Biochem. Biotechnol. 164, 889–905.

Nalbantoglu, O.U., Way, S.F., Hinrichs, S.H., Sayood, K., 2011. RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles. BMC Bioinform. 12, 41.

Ng, P.C., Kirkness, E.F., 2010. Whole genome sequencing. Methods Mol. Biol. 628, 215–226.

Pallen, M.J., Loman, N.J., Penn, C.W., 2010. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Curr. Opin. Microbiol. 13, 625–631.

Peng, Y., Leung, H.C., Yiu, S.M., Chin, F.Y., 2011. Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics 27, i94–101.

Petrosino, J.F., Highlander, S., Luna, R.A., Gibbs, R.A., Versalovic, J., 2009. Metagenomic pyrosequencing and microbial identification. Clin. Chem. 55, 856–866.

Pevzner, P.A., Tang, H., Waterman, M.S., 2001. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U. S. A. 98, 9748–9753.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Section 16.2. Viterbi Decoding, Numerical Recipes: the Art of Scientific Computing. Cambridge University Press, New York.

Reid, G., Kim, S.O., Kohler, G.A., 2006. Selecting, testing and understanding probiotic microorganisms. FEMS Immunol. Med. Microbiol. 46, 149–157.

Simova, E., Beshkova, D., Angelov, A., Hristozova, T., Frengova, G., Spasov, Z., 2002. Lactic acid bacteria and yeasts in kefir grains and kefir made from them. J. Ind. Microbiol. Biotechnol. 28, 1–6.

Ventura, M., Turroni, F., Canchaya, C., Vaughan, E.E., O’Toole, P.W., van Sinderen, D., 2009. Microbial diversity in the human intestine and novel insights from metagenomics. Front. Biosci. 14, 3214–3221.


Wang, Y., Wang, J., Ahmed, Z., Bai, X., Wang, J., 2011. Complete genome sequence of Lactobacillus kefiranofaciens ZW3. J. Bacteriol. 193, 4280–4281.

Wegmann, U., Overweg, K., Horn, N., Goesmann, A., Narbad, A., Gasson, M.J., Shearman, C., 2009. Complete genome sequence of Lactobacillus johnsonii FI9785, a competitive exclusion agent against pathogens in poultry. J. Bacteriol. 191, 7142–7143.

Witthuhn, R.C., Schoeman, T., Britz, T.J., 2004. Isolation and characterization of the microbial population of different South African kefir grains. Int. J. Dairy Technol. 57, 33–37.

Young, J.P., Downer, H.L., Eardly, B.D., 1991. Phylogeny of the phototrophic rhizobium strain BTAi1 by polymerase chain reaction-based sequencing of a 16S rRNA gene segment. J. Bacteriol. 173, 2271–2277.

Zhang, Z., Schwartz, S., Wagner, L., Miller, W., 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214.

Zhou, J., Liu, X., Jiang, H., Dong, M., 2009. Analysis of the microflora in Tibetan kefir grains using denaturing gradient gel electrophoresis. Food Microbiol. 26, 770–775.