
Contents lists available at ScienceDirect

Soil Biology & Biochemistry

journal homepage: www.elsevier.com/locate/soilbio

Soil Biology & Biochemistry 69 (2014) 101–109

Artificial neural network modeling of microbial community structures in the Atlantic Forest of Brazil

Eder C. Santos a, Eduardo Dutra Armas a, David Crowley b, Marcio Rodrigues Lambais a,*

a Departamento de Ciência do Solo, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo, Av. Pádua Dias 11, 13418-900 Piracicaba, SP, Brazil
b Department of Environmental Sciences, University of California Riverside, 318 Science Laboratories I, CA 92521 Riverside, CA, USA

Article info

Article history:
Received 25 March 2013
Received in revised form 21 October 2013
Accepted 26 October 2013
Available online 15 November 2013

Keywords:
Artificial neural networks
Ecosystem modeling
FAME
Microbial communities
Soil ecology
Tropical forests

* Corresponding author. Tel.: +55 19 3417 2107.
E-mail address: [email protected] (M.R. Lambais).

0038-0717/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.soilbio.2013.10.049

Abstract

Microbial communities vary across the landscape in forest soils, but prediction of their biomass and composition is a difficult challenge due to the large numbers of variables that influence their community structures. Here we examine the use of artificial neural network (ANN) models for extraction of patterns among soil chemical variables and microbial community structures in forest soils from three regions of the Atlantic Forest of Brazil. At each location, variations in soil chemical properties and FAME profiles of microbial community structures were mapped at 20 × 20 m intervals within 10 ha parcels. Geostatistical analyses showed that spatial variability in soil physical and chemical variables could be mapped at scale distances of 20 m, but that FAME profiles representing the microbial communities were highly variable and had no spatial dependence at the same scale in most cases. RDA analysis showed that FAME signatures representing different microbial groups were positively associated with soil pH, OM, P and base cations concentrations, whereas microbial biomass was negatively associated with the same environmental factors. In contrast, ANN models revealed clear relationships between microbial community structures at each parcel location, and generated verifiable predictions of variations in FAME profiles in relation to soil pH, texture, and the relative abundances of base cations. The results suggest that ANN modeling provides a useful approach for describing the relationships between microbial community structures and soil properties in tropical forest soils that were not able to be captured using geostatistical and RDA analyses.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

The Atlantic Forest of Brazil is one of the Earth’s “biodiversity hotspots” and comprises widely distributed, remnant patches of tropical and subtropical moist forests, tropical dry forest, tropical savannas, and mangroves. While largely decimated over the past three centuries, the remaining Atlantic Forest still harbors approximately 20,000 plant species, of which half are endemic (Tabarelli et al., 2003, 2005). As part of the effort to characterize the relationships between environmental variables and the composition of the remaining forest fragments, the São Paulo Research Foundation (FAPESP) has initiated a research program (BIOTA) to map the forest vegetation, soil types, and microbial communities. The latter component of this program focuses on soil variables that shape microbial community composition. To date this effort has met with limited success, which is thought to be due to the large numbers of variables that simultaneously and interactively shape microbial community structures. Other challenges include the difficulty in determining the appropriate level of resolution and selection of spatial scales for model development that can reliably predict changes in microbial community structures across the landscape.

A variety of molecular and biochemical methods are now available for characterizing the composition of soil microbial communities. To this end, one of the most useful methods for describing microbial communities has been the use of fatty acid profiles that reflect broad level differences in microbial community composition by measuring the concentrations of signature fatty acids that represent different functional groups (White, 1993; White et al., 1996; Kaur et al., 2005). These methods include both fatty acid methyl ester (FAME) and phospholipid fatty acid (PLFA) analyses, both of which have relatively similar abilities to discriminate microbial communities. Nonetheless, the interpretation of data sets describing microbial communities is inherently complicated by their multidimensionality, which in the case of PLFA or FAME typically includes some 30 or more fatty acids that are used to describe the community structures.

Moreover, as many different environmental variables appear to simultaneously influence the composition of microbial communities, many of which are correlated and interactive, the relationships between individual variables and the composition of individual fatty acids are difficult to extract using mathematical models. One of the primary methods for separating out the effects of different variables in microbial ecology employs canonical ordination such as redundancy analysis (RDA), which combines multiple regression with classical ordination (Ramette, 2007). Using RDA, the main patterns of species variation can be depicted and explained by the measured environmental variables. In addition, correlation coefficients between species and each environmental variable can be determined.
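The combination of multiple regression and ordination that defines RDA can be illustrated with a minimal sketch. The function name `rda`, the synthetic matrices and the returned keys are assumptions made for illustration only; this is not the CANOCO implementation used in the study.

```python
import numpy as np

def rda(Y, X):
    """Sketch of redundancy analysis: regression of Y on X, then PCA of the fitted values."""
    Yc = Y - Y.mean(axis=0)          # centered responses (e.g. FAME abundances)
    Xc = X - X.mean(axis=0)          # centered predictors (e.g. soil variables)
    # Multiple regression of every response column on the predictors
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B
    # PCA (via SVD) of the fitted values gives the canonical axes
    _, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    eig = s**2 / (Y.shape[0] - 1)
    total_var = Yc.var(axis=0, ddof=1).sum()
    return {
        "eigenvalues": eig,
        "explained_total": eig.sum() / total_var,    # share of total variance constrained by X
        "explained_constrained": eig / eig.sum(),    # share of constrained variance per axis
        "species_scores": Vt.T,
        "site_scores": Yc @ Vt.T,
    }

# Hypothetical data: 60 samples, 3 soil variables, 5 response variables
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
W = rng.normal(size=(3, 5))
Y = X @ W + 0.1 * rng.normal(size=(60, 5))
res = rda(Y, X)
```

In this sketch `explained_constrained` plays the role of the per-axis percentages reported for canonical axes, while `explained_total` indicates how much of the overall response variation the predictors account for.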

Another approach that is now receiving increased attention is the use of artificial neural network models, which can be trained to extract nonlinear patterns that exist in large complex data sets without requiring a priori hypotheses to guide the model development. These models may thereafter be used to predict how different combinations of variables may affect microbial community structures, and tested using new data sets to evaluate their validity (Noble et al., 2000; Mele and Crowley, 2008).

In this report, we describe the application of ANN modeling approaches for determining the relationships between FAME profiles of microbial communities from three different ecosystems of the Atlantic Forest and their correspondence with soil chemical variables that shape the community structures. The ANN models included the use of both unsupervised Kohonen self-organizing maps (KSOM) and a function-based model (Englebrecht, 2007; Mele and Crowley, 2008). KSOM are used primarily for unsupervised pattern recognition in which the goal is to find underlying structure in the data. When two variables have similar distributions in relation to the other variables in the data set, the color-coded patterns representing those variables will correspond to one another. Likewise, inverse relationships can be observed as inverse patterns. On the other hand, when two variables are unrelated they will have different map patterns. Variables that map as multiple clusters in turn suggest the existence of nonlinear interactions that are driven by different combinations of variables that come into play for each cluster.
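How corresponding and inverse map patterns arise can be sketched with a very small self-organizing map. The grid size, learning-rate schedule, neighborhood width and the synthetic three-variable data set below are all assumptions chosen for illustration; they do not reproduce the settings of the software used in the study.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small Kohonen map; returns node weights of shape (rows, cols, n_variables)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighborhood function
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighborhood shrinks over time
        x = data[rng.integers(len(data))]
        # Best-matching unit: node whose weight vector is closest to the sample
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighborhood around the best-matching unit on the map grid
        grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma**2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def component_plane(weights, j):
    """Map panel for variable j (what a color-coded KSOM panel would display)."""
    return weights[:, :, j]

# Hypothetical data: variable 1 is the inverse of variable 0, variable 2 copies variable 0
rng = np.random.default_rng(1)
x = rng.random(300)
data = np.column_stack([x, 1.0 - x, x])
som = train_som(data)
```

Comparing `component_plane(som, 0)` with `component_plane(som, 2)` gives matching patterns, whereas `component_plane(som, 1)` gives the inverse pattern, which is the visual cue described above.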

Going beyond the unsupervised models that are generated by KSOM, ANN function based models are supervised models in which selected input variables are studied in relation to a specific dependent/output variable that is controlled by the independent input variables. Selection of the input variables can be optimized by using sensitivity analysis that measures the relative importance of all the independent variables for determining the value of the output variable. Supervised models are constructed by iterative model runs in which different sets of independent variables are methodically evaluated for their predictive power and by measuring their error when run with a previously unseen validation data set. To this end, a portion of the original data set that has been randomized is set aside for later use in model validation.

Here, we used both KSOM and function based models to examine the relationships between soil chemical and biological properties. The KSOM represent a starting point in the data analysis, and provide color coded maps, in which the relationships between all independent and dependent variables in the data set can be used to visually assess clusters in the data and form hypotheses regarding potential relationships among the variables. We then developed a series of ANN function models that are based on a multilayer perceptron architecture. In these models, selected independent variables (environmental variables) are represented by nodes that are mathematically interconnected to the dependent variables (microbial community descriptors) in the simulated neural network. The linkages between the input and output variables are initially set with random values, and are thereafter trained using an iterative training process in which the mathematical linkages are refined to minimize the error in predicting the dependent variables. After optimizing the number of training cycles and running a series of independent ANN models to test for model robustness, we then selected the best performing models for use in predicting the concentrations of FAME biomarkers that represent different components of the microbial communities from the soil chemical data. We further compared the results obtained by the ANN approach with those obtained by conventional RDA and geostatistical analyses. Our final objective was to evaluate the relative utility of these different methods for mapping the composition of the microbial communities and to determine those variables that were most influential in shaping soil microbial community structures.
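The iterative refinement of randomly initialized linkages can be sketched with a one-hidden-layer perceptron trained by gradient descent. The network size, learning rate, activation function and synthetic data are assumptions for illustration and are not taken from the modeling software used here.

```python
import numpy as np

def train_mlp(X_tr, y_tr, X_val, y_val, hidden=8, cycles=1000, lr=0.05, seed=0):
    """Train a one-hidden-layer perceptron; returns weights and per-cycle (train, validation) errors."""
    rng = np.random.default_rng(seed)
    # Linkages (weights) start at random values
    W1 = rng.normal(scale=0.5, size=(X_tr.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(cycles):
        H = np.tanh(X_tr @ W1 + b1)       # hidden-layer activations
        err = H @ W2 + b2 - y_tr          # prediction error on the training set
        # Backpropagate the squared-error gradient through both layers
        gW2 = H.T @ err / len(X_tr)
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H**2)
        gW1 = X_tr.T @ dH / len(X_tr)
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
        val_err = np.tanh(X_val @ W1 + b1) @ W2 + b2 - y_val
        history.append((np.mean(err**2), np.mean(val_err**2)))
    return (W1, b1, W2, b2), np.array(history)

# Hypothetical nonlinear relationship between two inputs and one output
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (np.sin(2 * X[:, 0]) + X[:, 1] ** 2).reshape(-1, 1)
params, history = train_mlp(X[:170], y[:170], X[170:], y[170:])
```

The two columns of `history` hold the training and validation errors per cycle, which is the information needed to judge when further training stops improving predictions on unseen data.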

2. Materials and methods

2.1. Site description and soil physico-chemical analyses

In preliminary work to establish the experiment, sites were selected from three different ecosystems of the Atlantic Forest that represent different types of forests. To our knowledge, all of the sites represent pristine Atlantic Forest, which today covers only 3–5% of its original area in Brazil. Samples were collected in 10 ha permanent parcels established in the Carlos Botelho State Park (CB), Assis Ecological Station (AS) and Caetetus Ecological Station (GA). The main features of the sites are described in Table 1.

At each location, the permanent parcel was physically mapped and subdivided into 20 × 20 m plots arranged in a rectangular grid over the landscape. Within each of the 256 subplots at each location, soil samples were collected at 0–5 cm depth using stainless steel cylinders (50 mm diameter) that were hammered into the soil to obtain intact soil cores that were placed in coolers and transported to the laboratory for processing. The soil samples were air dried and removed from the cores, after which they were sieved through a 2-mm screen and homogenized. The sieved soils were then divided into paired subsamples, one of which was sent to the University of California, Riverside for FAME analysis, and the other was analyzed for soil chemical properties at the ESALQ campus of the University of São Paulo in Piracicaba, Brazil. Air-dried and sieved soil samples were analyzed for physical (sand, silt and clay contents) and chemical (pH, OM, P, Na, K, Ca, Mg, Al) properties, according to standard soil testing methods (Raij et al., 1987).

2.2. FAME analyses

Duplicate (detection limit analysis) or triplicate (sample discrimination analysis) subsamples of 0.5 g of soil (DW) (unless stated otherwise) were extracted according to the Microbial Identification System (MIS; Microbial ID Inc., Newark, DE) standard procedure. To each soil sample, 3.25 M NaOH in MeOH:H2O (1:1) was added (1 ml solution added per 1 g soil). The samples were vortexed and then placed in an 80 °C water bath for 30 min, during which time the cells were lysed and the FAs were cleaved from the cellular lipids. Following this saponification step, the FAs were converted to FAMEs by adding 6.0 M HCl:MeOH (1:0.85) (2 ml solution per 1 g soil) to each sample. To extract the FAMEs from the acidic aqueous phase into the organic phase, a hexane:MTBE (1:1) solution was added to each sample (2 ml solution per 1 g soil). Following addition of the hexane:MTBE (1:1) solution, the MIDI procedure then was modified as described in Cavigelli et al. (1995) by adding a 10 min 2500 rpm centrifugation step following extraction. The organic phase subsequently was removed, washed with a mild base (0.3 M NaOH), and dried with N2 before re-dissolution in hexane containing an internal standard (19:0 FAME, Sigma Chemical Co., St. Louis, MO). FAME concentrations (nmol g⁻¹ soil) were calculated by comparing peak areas to an analytical standard calibration curve. The FAMEs are described by the number of C atoms, followed by a colon, the number of double bonds and then by the position of the first double bond from the methyl (ω) end of the molecule. Isomers are indicated by c, and branched fatty acids are indicated by the prefixes i and a for iso and anteiso, respectively. Other notations are Me for methyl, OH for hydroxy and cyclo for cyclopropane.

Table 1
Main features of the sampling sites.

| Site | Altitude (m) | Coordinates | Annual average temperature (°C) | Annual rainfall (mm) | Climate (Köppen’s classification) | Vegetation |
|---|---|---|---|---|---|---|
| CB | 300 | 24° 03′ 00″ S, 47° 49′ 00″ W | 19 | 1685 | Cfa | Sub-montane ombrophilous dense forest |
| AS | 500 | 22° 35′ 14″ S, 50° 22′ 38″ W | 22 | 1250 | Transition between Cwa and Cfa | Cerrado lato sensu |
| GA | 600 | 22° 41′ 00″ S, 49° 16′ 00″ W | 21 | 1300 | Cwa | Semi-deciduous seasonal forest |

CB, Carlos Botelho State Park; AS, Assis Ecological Station; GA, Caetetus Ecological Station.

2.3. Statistical analyses

2.3.1. RDA and geostatistical analyses

RDA was performed using the relative abundances of FAMEs that have been suggested as microbial indicators (Bosio et al., 1998; Zelles, 1999). Microbial biomass was calculated as the sum of the individual FAMEs (nmol g⁻¹ soil). Diversity indices were calculated using FAME indicators (Table S1, Supplemental material) and the function diversity from the R package vegan (Oksanen et al., 2010). RDA was performed using CANOCO software to determine the relationships between microbiological variables and environmental variables. A Monte Carlo permutation test (499 permutations) was used to determine the significance of data ordination.
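The two summary descriptors mentioned above, total biomass as a sum of FAMEs and a Shannon diversity index computed from a FAME profile, can be sketched as follows. The function names and the example profile are hypothetical; the Shannon index is written with natural logarithms, which is assumed here to match the default of the vegan `diversity` function.

```python
import numpy as np

def total_biomass(fame):
    """Microbial biomass proxy: sum of individual FAME concentrations (nmol per g soil)."""
    return float(np.sum(fame))

def shannon_index(fame):
    """Shannon index H = -sum(p * ln p) over the nonzero relative abundances."""
    x = np.asarray(fame, dtype=float)
    p = x[x > 0] / x.sum()
    return float(-(p * np.log(p)).sum())

# Hypothetical FAME profile for one soil sample (nmol per g soil)
profile = np.array([12.0, 8.0, 0.0, 5.0, 25.0])
biomass = total_biomass(profile)
H = shannon_index(profile)
```

A perfectly even profile of four FAMEs gives H = ln 4, the maximum for four components, which is a convenient sanity check for any implementation.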

Geostatistical analyses were performed in three stages. Firstly, data normalization was performed using Box–Cox transformations. For all variables, a significant normal distribution was not achieved. Nonetheless, variables showed data distribution close to normality based on Q–Q plots, and, considering the peculiarity of the dataset, were used for further analyses. The spatial autocorrelation between the sample points for each variable was determined based on the estimated semivariograms. Since most of the biological soil properties showed no spatial correlation, data were subjected to interpolation using the Inverse of Distance Weighted (IDW) model. Geostatistical analyses were performed using software R and package GSTAT for IDW interpolation model (Pebesma, 2004). The criteria used to validate the interpolation model included root mean square error (RMSE), mean bias error (MBA) and mean absolute error (MAE) (Webster and Oliver, 2001).

Table 2
Evaluation of ANN model robustness for independently generated ANN function models.

| Run | Biomass(a) TR | Biomass(a) VAL | G−:G+(b) TR | G−:G+(b) VAL | F:B(c) TR | F:B(c) VAL |
|---|---|---|---|---|---|---|
| 1 | 42,221 | 48,223 | 0.956 | 0.750 | 0.290 | 0.260 |
| 2 | 47,122 | 52,267 | 0.930 | 0.940 | 0.286 | 0.282 |
| 3 | 43,942 | 61,874 | 0.980 | 0.555 | 0.290 | 0.269 |
| 4* | 48,264 | 43,647 | 1.000 | 0.248 | 0.290 | 0.180 |
| 5 | 48,358 | 41,665 | 0.978 | 0.380 | 0.297 | 0.203 |
| 6 | 48,135 | 44,579 | 0.960 | 0.710 | 0.291 | 0.256 |
| 7 | 47,878 | 47,984 | 0.950 | 0.749 | 0.248 | 0.443 |
| 8 | 46,999 | 52,626 | 0.690 | 1.750 | 0.298 | 0.206 |

(a) Biomass values are sum of FAME peak area determined by GC-FID.
(b) G−:G+ = ratio of Gram negative to Gram positive bacteria biomarkers.
(c) F:B = ratio of fungi to bacteria biomarkers.
TR = training data set; VAL = validation data set; eight separate models are compared.
* Model 4 selected for the final analysis based on best data fit as determined by its minimum error values for both the TR and VAL data sets.

2.3.2. Artificial neural network analyses

ANN analyses were conducted using the software Synapse (Peltarion Corporation, Sweden) following the protocols provided by the modeling program. The data were first entered into EXCEL files where they were inspected, randomized, and then saved as text files for import into Synapse. KSOM were generated from the full data set, including both the independent and dependent variables. For the function models, the software divides the original data into both a training data set and a validation data set, the latter consisting of the last 15% of data entries in the spreadsheet. As the validation data set is automatically generated from the last entries in the spreadsheet, the data required randomization by sample (row) in order to obtain full representation of the data from different locations within the validation data set. Preliminary model runs were then conducted using incremental steps of 100 training cycles to evaluate the optimal number of training cycles that were required to minimize model error in predicting the values of both the training and validation data sets, which were inspected at the end of each set of 100 training cycles. The model was optimized for biomass after only a few hundred cycles, whereas F:B and G−:G+ ratios were optimized at approximately 1000 cycles. Further increases in the cycle numbers used to train the model resulted in overtraining such that there was no better resolution of differences for the training set, but the models lost power to fit the validation data sets. We thus selected 1000 cycles for model training.
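The two data-handling steps in this paragraph, row randomization before a split that takes the validation set from the end of the table, and inspection of the validation error in steps of 100 training cycles, can be sketched as below. The function names, the 768-row table (three sites of 256 subplots) and the example error curve are assumptions for illustration; the actual split is performed internally by the modeling software.

```python
import numpy as np

def randomize_and_split(table, val_fraction=0.15, seed=0):
    """Shuffle rows, then take the trailing fraction of rows as the validation set."""
    rng = np.random.default_rng(seed)
    shuffled = table[rng.permutation(len(table))]
    n_val = max(1, int(round(val_fraction * len(table))))
    return shuffled[:-n_val], shuffled[-n_val:]

def pick_training_cycles(val_errors, step=100):
    """Return the cycle count whose checkpoint has the lowest validation error."""
    checkpoints = np.arange(step, step * len(val_errors) + 1, step)
    return int(checkpoints[int(np.argmin(val_errors))])

# Hypothetical table: one row per soil sample, first column is a site code 0, 1 or 2
table = np.column_stack([np.repeat([0, 1, 2], 256), np.arange(768)])
train, val = randomize_and_split(table)

# Hypothetical validation errors inspected every 100 cycles (rises again when overtrained)
val_errors = [0.90, 0.70, 0.55, 0.47, 0.42, 0.39, 0.37, 0.36, 0.355, 0.35, 0.36, 0.38]
best_cycles = pick_training_cycles(val_errors)
```

Without the shuffle, the trailing rows of a table sorted by site would all come from a single location, which is the problem the randomization step avoids.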

Each time a model is generated by the computer, the run starts with randomized inputs such that independently generated models will have different powers of resolution. To select the most powerful model and to evaluate the consistency in predictions among independent ANN models, 8 independent models were generated using newly randomized data sets for each model run, after which the model that yielded the best resolution was selected for use. The output from this particular model was then used to produce figures depicting the interrelationships between selected combinations of physical and chemical variables as drivers of microbial community structure. Table 2 shows that model 4 was clearly the best, providing the lowest standard error values for each of the three representative biological variables that were used to assess the power of the model.


After selecting the ANN model that best fit both the training and validation data, the relationships between selected combinations of dependent and independent variables were examined using the postprocessing tools in the program. Predicted output values for the dependent variables (fatty acids, sum of fatty acids, ratios of signature fatty acids for fungi and bacteria, Gram negative/Gram positive bacteria, etc.) were generated by entering specific values that covered the full range of each independent variable or pair of independent variables (3-D plots) into the model and recording the output values. The output values were then entered into new spreadsheets for generation of figures using Sigma Plot (SYSTAT Software Inc).
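Generating predictions over the full range of a pair of independent variables can be sketched as a grid sweep through a fitted model. Holding the remaining variables at their data-set averages follows the description given in Section 3.3; the function name `response_surface`, the grid resolution and the stand-in `toy_model` are assumptions, since the study used the postprocessing tools of the modeling program.

```python
import numpy as np

def response_surface(predict, X, i, j, n=25):
    """Sweep variables i and j over their observed ranges, other variables fixed at their means."""
    base = X.mean(axis=0)
    xi = np.linspace(X[:, i].min(), X[:, i].max(), n)
    xj = np.linspace(X[:, j].min(), X[:, j].max(), n)
    gi, gj = np.meshgrid(xi, xj, indexing="ij")
    grid = np.tile(base, (n * n, 1))     # every row starts as the average sample
    grid[:, i] = gi.ravel()
    grid[:, j] = gj.ravel()
    return xi, xj, predict(grid).reshape(n, n)

# Stand-in for a trained model: any function mapping rows of inputs to one output
def toy_model(Z):
    return Z[:, 0] + 2.0 * Z[:, 1] * Z[:, 2]

rng = np.random.default_rng(0)
X = rng.random((100, 4))                 # hypothetical soil variables
xi, xj, surface = response_surface(toy_model, X, 0, 1)
```

The returned `surface` array holds one predicted value per grid point and can be written to a spreadsheet or passed to any 3-D plotting tool.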

3. Results

3.1. Physical mapping of FAME markers

Maps of the FAME markers representing different taxonomic groups of microorganisms showed that each location contained “hotspots” in which particular groups of microorganisms dominated the communities (Fig. 1 and Figs. S1–S3, supplemental material). The map shown in Fig. 1 is a surface plot for biomass (sum of FAMEs) corresponding to the actual map coordinates at this location. However, regression analysis of the FAME data with soil chemical and physical variables did not reveal any direct correlations between the FAME descriptors of community structure and soil variables when analyzed across the entire data set or when analyzed separately by individual location (data not shown). Nor did stepwise regression analysis indicate any consistent relationships between combinations of chemical and physical variables that were associated with changes in community structure. This suggests that these variables may interact in nonlinear ways that could not be separated using traditional statistical analyses.

Fig. 1. Surface plot for relative biomass (sum of FAMEs) corresponding to the map coordinates from Carlos Botelho State Park.

3.2. RDA and geostatistical analyses

In our study RDA was employed to analyze the relationships between FAME profiles and environmental variables. The central concept of the RDA is the use of linear regression to explain variation between independent and dependent variables in an iterative process to obtain the best ordination.

RDA is one of the main linear statistical tools used to analyze microbial biomarker data sets. Our results showed that the first and second canonical axes accounted for 75.5 and 14.8% of the variance of FAME signatures–environment relations, respectively (Monte Carlo test, p = 0.002; Fig. S4, supplemental material), together explaining approximately 90% of the variability. This indicates that RDA may be very efficient for detecting groups of variables related to microbial biomarkers, even though no modeling is possible. Our results mainly show a site discrimination based on soil chemical properties where GA is positively related with basic cations and pH, whereas AS is positively related to Al concentrations. In contrast, CB is mostly associated with high levels of OM, clay and silt. The Myc:Fungi, Gram+:Gram− bacteria (G+:G−) and Fungi:Bacteria (F:B) ratios were also positively associated with OM, pH, base cations and P.

Figs. S5–S7 (supplemental material) show the results of the semivariogram analyses for several FAME biomarkers for each site studied. In general, most of the model outputs showed no significant spatial dependency for the FAME biomarkers evaluated, with a pure nugget effect evidenced by the semivariograms. A large portion of the variance occurred over small spatial distances and no minimum distance range in which data exhibited spatial correlation could be determined.
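What a pure nugget effect looks like can be sketched with an empirical semivariogram computed on spatially uncorrelated values. The 16 × 16 layout at 20 m spacing, the lag bins and the random values are assumptions for illustration; the study used the R package GSTAT rather than this code.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Mean of 0.5 * (z_i - z_j)^2 over all point pairs whose separation falls in each lag bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)   # count every pair once
    d, sq = d[iu], sq[iu]
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        mask = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        if mask.any():
            gamma[k] = sq[mask].mean()
    return gamma

# Hypothetical sampling grid: 256 points at 20 m spacing, values with no spatial structure
side = np.arange(16) * 20.0
gx, gy = np.meshgrid(side, side)
coords = np.column_stack([gx.ravel(), gy.ravel()])
rng = np.random.default_rng(0)
values = rng.normal(size=len(coords))
edges = np.array([10, 30, 50, 70, 90, 110], dtype=float)
gamma = empirical_semivariogram(coords, values, edges)
```

For uncorrelated values the semivariance is roughly the same at every lag and close to the overall variance, so the curve is flat from the first lag onward; a spatially structured variable would instead show semivariance rising with distance up to a range.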

The semivariance analyses did not reveal spatial dependency structure in the variance of the FAME biomarkers evaluated, not even when linear and nonlinear models were applied, suggesting that the community structure and microbial biomass may vary at small scales (<1 m) in these ecosystems. In this case, deterministic spatial models such as IDW may be considered a good approach to spatially interpolate data sets with no spatial dependency inferred from variographic analysis based on stochastic models. Using IDW, the spatial autocorrelation statistic is replaced by a weight matrix showing the local influence of measured points on the prediction location, which decreases with increasing distance. The spatial distribution of the FAME biomarkers estimated by IDW is shown in Figs. S1–S3 (supplemental material).
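The distance-based weighting behind IDW can be sketched in a few lines. The function name `idw_predict`, the power of 2 and the two-point example are assumptions for illustration and are not taken from the GSTAT settings used in the study.

```python
import numpy as np

def idw_predict(coords, values, targets, power=2.0):
    """Inverse distance weighted prediction: weights fall off as 1 / distance**power."""
    d = np.linalg.norm(targets[:, None, :] - coords[None, :, :], axis=2)
    out = np.empty(len(targets))
    for k, row in enumerate(d):
        exact = row == 0
        if exact.any():
            # Target coincides with a measured point: return the measured value
            out[k] = values[exact].mean()
        else:
            w = 1.0 / row**power
            out[k] = np.sum(w * values) / np.sum(w)
    return out

# Hypothetical measurements at two points 20 m apart
coords = np.array([[0.0, 0.0], [20.0, 0.0]])
values = np.array([10.0, 30.0])
targets = np.array([[10.0, 0.0], [5.0, 0.0], [0.0, 0.0]])
pred = idw_predict(coords, values, targets)
```

At the midpoint both measurements receive equal weight, whereas at 5 m from the first point that measurement is weighted nine times more heavily than the second, which pulls the prediction toward it.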

3.3. ANN analyses

Visual inspection of the KSOM starts with the U-matrix, which is the panel in the upper left corner of Fig. 2. The U-matrix summarizes the overall variation in the data set, which is then laid out in the remaining panels that represent individual variables. Here, the U-matrix revealed three distinct clusters in the data set that corresponded with broad level differences in the three locations (Fig. 2). At the bottom of each panel, the values for the individual measurements are coded by color. Where color patterns between two panels correspond, the KSOM indicate that the variables co-vary. For example, soils with high clay content are inversely correlated with those having high sand content. Panels representing the base cations showed an expected covariance. Likewise, the panels displaying signature fatty acids or combinations of variables (e.g. Gram−/Gram+) can be inspected for possible relationships between the microbial community structures and the soil chemical properties.

Patterns that clearly emerge from inspection of the KSOM show that the three locations (AS, CB and GA) were distinct on the basis of differences in soil pH. The panel representing the summary environmental and biological characteristics for GA corresponds directly with high pH, whereas the soil at CB had low pH, and AS contained soils spanning the full pH range of the other two sites. Other independent variables that co-associated with location included differences in the concentrations of base cations, organic matter, and phosphorus, and by differences in the concentrations of exchangeable aluminum. With respect to biological variables, soils from GA were associated with a lower overall microbial biomass that was comprised primarily of fungi, and a high abundance of mycorrhizal fungi and protozoa as compared to the other two locations.

Variables showing positive correspondence with the soils from the AS site included a high sand content, low soil pH, and corresponding low levels of base cations, organic matter, and phosphorus, with intermediate concentrations of exchangeable aluminum. Biological features included intermediate levels of fungi and actinomycetes, a relatively high biomass of mycorrhizal fungi, and a low overall microbial biomass. In turn, the soils at the CB location had a high clay and silt content, low pH, high concentrations of sodium and potassium, high concentrations of exchangeable aluminum, intermediate levels of organic matter, and low levels of base cations. The soil microbial communities at this location generally had a high biomass, which was dominated by actinomycetes, slow-growth bacteria, and rhizobia, and intermediate abundance of fungi, as compared to the other two locations.

Altogether, relationships displayed by the KSOM suggested that microbial community structures varied in relation to several chemical and physical variables in a non-linear manner, although caution must be used in attributing these variations to direct effects of individual variables. Soil microbial communities from GA were generally dominated by fungi, as indicated by high F:B ratios. On the other hand, no single variable appeared to drive the shifts in the ratio of G+:G− bacteria, which clustered in the center of the panel representing this variable. This cluster occurred at a convergent location corresponding to soils having a low pH and high clay content. This also corresponded to the locations in CB having the highest plant diversity (Shannon H index) and plant species richness. Overall microbial diversity (Shannon H values, and richness as calculated based on FAMEs) was highest in CB and in AS, but was only weakly associated with plant diversity.
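For readers unfamiliar with the diversity measures mentioned above, the following minimal Python sketch shows how a Shannon H value and a richness count can be obtained from a single profile of marker abundances. It assumes the natural logarithm and treats any non-zero abundance as a detected marker; the exact conventions used for the FAME-based values in this study are not restated here, and the function names and example profile are hypothetical.

```python
import math


def shannon_h(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over markers with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("profile must contain at least one positive abundance")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def richness(abundances):
    """Number of markers detected (abundance greater than zero)."""
    return sum(1 for a in abundances if a > 0)


# Hypothetical profile of four markers with equal abundance: H equals ln(4).
even_profile = [2.5, 2.5, 2.5, 2.5]
h_even = shannon_h(even_profile)
n_markers = richness(even_profile)
```

H is largest when all markers are equally abundant and falls to zero when a single marker accounts for the whole profile.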

To further model the relationships between the variables, an ANN function model was generated using soil chemical and physical variables to predict the microbial community biomass and structure (F:B and G−:G+ ratios), as shown in Figs. S8–S17 (supplemental material).

The ANN function models that were used to examine relationships between pairs of chemical and physical variables yielded several clear patterns in the data. To examine specific relationships between any two variables, the values for all of the other independent variables in the data set were set at the average values obtained for those variables across the entire data set. Among the relationships that emerged, microbial biomass was predicted to increase with increasing concentrations of potassium and phosphorus, each lending a partially additive effect when both variables were increased in a stepwise fashion (Fig. 3). The effect of increasing potassium alone was relatively small, whereas increasing phosphorus from 10 to 40 mg P kg⁻¹ resulted in much larger effects of increasing potassium. Overall, the data suggest that phosphorus was the major driving variable that influenced microbial community structure. The model further suggested that both soil pH and organic matter content contributed to high microbial biomass, although to a lesser extent than phosphorus and potassium. Microbial biomass increased in a linear fashion with increases in soil pH over the range from 3 to 6, and with increasing organic matter from 0.20 to 2% (Fig. S8, supplemental material).
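The procedure of varying two input variables while holding all others at their data-set means can be sketched as follows. This is a minimal illustration rather than the code used in this study: `predict` stands for any trained model that maps a matrix of inputs to one output per row, and the toy model, data matrix, and column indices below are hypothetical.

```python
import numpy as np


def pairwise_response(predict, X, i, j, grid_i, grid_j):
    """Predicted response over a grid of variables i and j.

    All other columns of X are fixed at their means across the whole data set.
    Returns an array of shape (len(grid_i), len(grid_j)).
    """
    base = X.mean(axis=0)                       # mean of every input variable
    gi, gj = np.meshgrid(grid_i, grid_j, indexing="ij")
    inputs = np.tile(base, (gi.size, 1))        # one row per grid point
    inputs[:, i] = gi.ravel()                   # overwrite the two varied inputs
    inputs[:, j] = gj.ravel()
    return predict(inputs).reshape(gi.shape)


def toy_model(inputs):
    """Hypothetical stand-in for a trained ANN: interaction of columns 0 and 1."""
    return inputs[:, 0] * inputs[:, 1] + inputs[:, 2]


X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])
surface = pairwise_response(toy_model, X, 0, 1, [0.0, 1.0], [10.0, 20.0])
```

Because the remaining variables are fixed at a single point, a surface of this kind describes the model response along one slice of the input space only; repeating the calculation with site-specific means gives the per-parcel comparison.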

At low pH, increasing soil organic matter had a greater effect on microbial biomass than at high pH. Relative biomass (sum of FAMEs) increased from 17.7 to 31.8 nmol g⁻¹ soil (~100% increase) at pH 3 when organic matter was increased from 20 to 200 mg kg⁻¹. In comparison, at pH 6, the microbial biomass increased from 42 to 57 nmol g⁻¹ soil (30%) over the same change in level of soil organic matter. The same effects were also examined independently for each parcel, where there appeared to be location-specific differences in the size of the effects of each of the physical and chemical variables on microbial biomass (Figs. S9 and S10, supplemental material). For example, the effects of phosphorus increases were much greater at AS and CB than at GA, the latter site being characterized by high pH, high organic matter, and low aluminum (Fig. S9, supplemental material).
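The relative size of the two organic matter effects can be checked directly from the biomass endpoints reported above. The short Python sketch below is only an arithmetic illustration; the helper name `percent_increase` is hypothetical. Computed from the reported endpoints, the increases come to roughly 80% at pH 3 and 36% at pH 6, so the percentages quoted in the text are best read as approximate.

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return 100.0 * (after - before) / before


# Biomass endpoints (nmol per g soil) as reported for the two pH levels.
low_ph = percent_increase(17.7, 31.8)   # pH 3
high_ph = percent_increase(42.0, 57.0)  # pH 6
```

In both readings the conclusion is the same: the proportional gain in biomass with added organic matter is about twice as large at pH 3 as at pH 6.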

The ANN function models revealed several factors that were associated with changes in F:B ratios (Fig. 4 and Figs. S11–S13, supplemental material). In contrast to microbial biomass, F:B ratios decreased with increasing phosphorus and potassium (Fig. 4). The highest F:B ratio was predicted to occur in soils having 2 mg K kg⁻¹ and 80 mg P kg⁻¹, at which fungi comprised 45% of the biomass. When K and P were increased to 12 and 80 mg kg⁻¹, respectively, the F:B ratio decreased to 19%. Soil pH and organic matter had very little effect on the F:B ratio, which varied only from 32 to 38 nmol g⁻¹ soil over the full range of pH and organic matter levels encompassed across the three locations. Moreover, the effects of pH and soil organic matter, while suggesting that fungi became more dominant as organic matter was increased in soils with high pH, were below the confidence intervals for statistical significance.

Fig. 2. Kohonen self-organizing maps of variables describing soil physical, chemical and biological properties of three permanent parcels located at the Assis Ecological Station (AS), Carlos Botelho State Park (CB) and Caetetus Ecological Station (GA), representing different climatic and vegetation zones in the Atlantic Forest of São Paulo, Brazil.

E.C. Santos et al. / Soil Biology & Biochemistry 69 (2014) 101–109

4. Discussion

While much remains to be understood regarding the structure and function of microbial communities, previous studies have shown that soil microbial communities vary across the landscape (de Vries et al., 2012). It can therefore be hypothesized that while different subsets of microorganisms are carrying out the same functions, communities with different structures may carry out biogeochemical processes at different rates or may have different contributions. For example, fungal dominated communities are thought to provide greater inputs of stable carbon that contribute to soil aggregate formation than bacterial dominated communities (Chantigny et al., 1997; Six et al., 2006). The composition of microbial communities may also reflect the environmental factors that have shaped the community, for example the abundance of G−/G+ bacteria or actinomycetes in relation to pH and water stress. For these reasons, there is considerable interest in understanding the environmental factors that shape different microbial community structures in terrestrial ecosystems.

Fig. 3. Microbial biomass in the soil of the Atlantic Forest as predicted by the ANN model with increasing concentrations of potassium and phosphorus. All variables other than K and P were set at their mean values for the entire data set across all locations. Biomass values are the sum of FAME biomarkers.

In this research, our objective was to model the variations in soil microbial community structures in relation to soil chemical (pH, cations) and physical (soil texture) variables. Many other factors that were not examined here will also influence the structure of microbial communities, including the type of vegetation and its substrate quality, soil moisture, seasonal climate variation, and physical properties such as mineral composition, bulk density, and soil aeration. Given all of the disparate types of variables that potentially shape microbial communities and the likelihood of positive, negative, and synergistic interactions between various combinations of variables, the description of microbial communities in relation to soil properties poses a tremendous challenge in ecology. This is further complicated by deciding on the appropriate level of resolution to describe the microbial community, which can be approached taxonomically at the level of individual species and families, or by broad level descriptions of taxonomic and functional groups at higher levels (e.g. Bacteria, Fungi, etc.). Previous studies have shown the influence of pH as a major control of microbial diversity, and the large-scale control of fungi/bacterial ratios (Bosio et al., 1998; Bossio and Scow, 1995; Fierer and Jackson, 2006). These interactions are further complicated by spatial variability across the landscape. Many studies have examined agricultural fields where tillage homogenizes much of the variability. At the other extreme, forests present a more spatially variable system in which different types of plants are distributed over the landscape, and the terrain will vary in three dimensions, leading to local accumulation of organic matter, variation in soil particle size, nutrients, and hydrological characteristics (Jacquemyn et al., 2003; Jamoneau et al., 2011).

Fig. 4. Effects of extractable soil potassium and phosphorus on Fungi:Bacteria ratios as predicted by the ANN model. All variables other than K and P were set at their mean values for the entire data set across all locations.

In the multidisciplinary Biota-FAPESP program, one of the major objectives is to characterize the spatial variability in vegetation and microbial community structures across representative pristine landscapes of the Brazilian Atlantic Forest, with the aim of understanding the interconnectivity of the above and below ground ecosystems (Joly et al., 2010). Fundamental information from this project can be used to monitor or predict the impacts of climate change and anthropogenic factors on the remaining native forest. This knowledge can also be used to assist in transforming agricultural lands back into Atlantic Forest vegetation in areas that are now being restored. In a related project, all the plant species contained in 10 ha parcels at CB, AS and GA have been physically mapped, so that the geographical positions of the plants in the areas are known (Rodrigues, R.R., unpublished). In the same areas, we have collected soil cores at 20 m intervals in a grid pattern to construct maps of the soil physical, chemical and biological characteristics that could then be overlaid on the vegetation spatial distribution maps and statistically analyzed to determine whether the vegetation controls the microbial community structure as represented by FAME profiles. These profiles provide marker FAMEs that can be used to estimate broad characteristics of the microbial community, including biomass, fungi, mycorrhizae, G− and G+ bacteria, actinomycetes, and community structure characteristics such as F:B and G−/G+ ratios (Table S1, supplemental material).

Maps of the microbial communities as revealed by FAME profiles showed that the biomass and structural characteristics varied greatly across the landscapes, with hotspots in which there were extreme changes over short distances of meters (Figs. S1–S3, supplemental data maps). Geostatistical analyses showed that the physical and chemical variables could be mapped at distances of 20 m, whereas semivariograms for all of the biological variables showed no spatial dependence (Figs. S5–S7, supplemental figures). RDA further showed that approximately 90% of the variation in the soil biological properties could be explained by chemical and physical variables. Among the chemical variables, pH and bases (calcium, potassium, magnesium) were the most important variables (highest loading factors) associated with axis 1, whereas Al concentration and plant species diversity and richness were associated with axis 2. When examined across the three different landscapes, the samples separated out into 3 distinct clusters indicating similarities for physical, chemical, and biological variables within the parcels (Fig. S4, supplemental material). Comparisons of means for the marker FAMEs, based on 95% confidence intervals, further showed that there were statistically significant differences in the microbial community composition for most taxa across the three sites (data not shown). However, these differences in taxon distribution could not be attributed to any combinations of variables when analyzed by multivariate analyses or stepwise regressions.
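The notion of spatial dependence used in the geostatistical analyses above can be made concrete with an empirical semivariogram, which for each lag class h averages half the squared difference between all pairs of samples separated by that distance. The Python sketch below is a minimal illustration under stated assumptions and is not the geostatistical software used in this study: the function name, the lag classes, and the four-point transect are hypothetical.

```python
import numpy as np


def empirical_semivariogram(coords, values, lag_edges):
    """Semivariance per lag class: 0.5 * mean((z_i - z_j)^2) over pairs in the class.

    coords: (n, 2) sample positions; values: (n,) measurements;
    lag_edges: increasing distances defining classes (edge_k, edge_k+1].
    Classes that contain no pairs are returned as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    iu, ju = np.triu_indices(len(values), k=1)      # every unordered pair once
    dist = np.linalg.norm(coords[iu] - coords[ju], axis=1)
    sqdiff = (values[iu] - values[ju]) ** 2

    gamma = np.full(len(lag_edges) - 1, np.nan)
    for k in range(len(lag_edges) - 1):
        in_class = (dist > lag_edges[k]) & (dist <= lag_edges[k + 1])
        if in_class.any():
            gamma[k] = 0.5 * sqdiff[in_class].mean()
    return gamma


# Hypothetical transect sampled every 20 m with a steady trend in the variable.
coords = [(0, 0), (20, 0), (40, 0), (60, 0)]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_semivariogram(coords, values, [0, 20, 40, 60])
```

A semivariance that rises with lag distance, as in this toy transect, indicates spatial dependence; a semivariogram that is flat from the shortest lag onward corresponds to the absence of spatial dependence at the sampled scale.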

Given the difficulty in separating out the influences of the main drivers of community structure using standard statistical analyses and linear ordination, the most promising approach was to construct ANN models that used soil chemical and physical variables as input data to describe the biological variables. These variables were visualized using KSOM (Fig. 2), which revealed the associations between all of the variables simultaneously. The KSOM analysis was in agreement with the RDA analysis in showing site dependent clustering of the samples, with pH, aluminum, and base cations being the main chemical variables that differentiated the three sites, but the KSOM was more useful in showing the distributions of values for each of the different independent and dependent variables. In the individual panels of the KSOM that represent single variables, there were several instances in which individual variables demonstrated one or more clusters of maximum value, indicating that there were site dependent interactions (Fig. 2). The strongest clustering of the soil chemical variables and FAME signatures occurred with respect to site location, after which multiple clusters in the data appear to correspond to particular situations. For example, the map panel for biomass showed two local maxima, one associated with CB, and the other with AS. At CB, these local maxima were further associated with high aluminum, sodium, potassium, and clay content. At GA, the main chemical factor was low aluminum. None of these relationships were easily inferred from linear ordination (RDA) or geostatistical analyses.

One of the criteria for evaluating the utility of the ANN modeling approach is its ability to predict relationships that have already been well described in the literature. For example, there has been much prior research examining the effects of soil chemical and physical variables on F:B ratios. Fungi are generally more abundant under low pH conditions (Pennanen et al., 1998). In the ANN model generated here, there was very little effect of pH on the F:B ratio, with the model predicting slight decreases in the F:B ratios with increasing pH. F:B ratios varied from 0.4 at pH 3 to 0.3 at pH 6 in soils with low organic matter, and from 0.29 to 0.36, respectively, at high organic matter levels (Fig. S11, supplemental material). In contrast to our results, growth-based measurements by Rousk et al. (2009) revealed a 5-fold decrease in bacterial growth and a 5-fold increase in fungal growth with lower pH. Their data thus indicated an approximately 30-fold increase in fungal dominance as the soil pH decreased from 8.3 to 4.5.

Although the KSOM was useful for examining the relationships between the different variables, it is not possible to attribute cause and effect relationships between chemical and biological variables from these diagrams. On the other hand, several of the relationships among physical and chemical variables were obvious; for example, the CB soils had a high clay content, which further corresponded to low sand, a low pH, and high aluminum concentration. The GA soils had a high pH, which corresponded to high base cation content, high organic matter, high phosphorus, and low aluminum. Where the associations become particularly interesting in the context of this research is the correspondence of these chemical and physical variables to the FAME data that describe the microbial communities. To investigate these relationships, we constructed an ANN function model using the chemical and physical variables as input to describe biological output variables including biomass, markers for individual microbial groups, and community structure. The resulting models showed that variations in phosphorus and potassium corresponded with changes in microbial biomass, and that there were positive interactions between these variables (Fig. 3), which could not be detected using RDA, probably due to the lack of linear relationships between the variables. Likewise, pH and organic matter content were important factors positively associated with high microbial biomass. To examine the pairwise combinations of selected variables for particular interactions, generation of these models required setting all of the other variables at selected fixed values. Nonetheless, the real power of a multilayer perceptron ANN model is the ability to examine how different combinations of values affect community structure to predict all possible conditions that occur within the data set, which is not possible using linear or non-linear ordination methods. For example, it is possible with the model to predict combined effects of organic matter, pH, potassium, and phosphorus simultaneously at any combination of values within the numerical ranges observed for these variables, and how this differs across the three locations. It has been shown in several instances that microbial biomass is a function of soil organic matter content, with microbial biomass C normally comprising 2–4% of the organic C (Sparling, 1992). In our modeling, phosphorus had a highly significant relationship with microbial biomass, which was also site dependent. Relative biomass was predicted to increase from 5 to 45 nmol g⁻¹ over phosphorus concentrations ranging from 10 to 90 mg P kg⁻¹ at GA, and from 18 to 65 mg P kg⁻¹ at AS. On the other hand, caution must be used with these models when the data extrapolate beyond the range of actual values for the individual variable at a particular location. In this case, ANN models may be useful for guiding hypothesis formation, for example by providing an a priori prediction of the effects of fertilizing with supplemental phosphorus that can be experimentally tested and analyzed in carefully selected experimental plots with control and experimental treatments.
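The caution regarding extrapolation can be applied in practice by flagging any model input that lies outside the range of values observed at a given location before its prediction is interpreted. The Python sketch below is a minimal illustration of such a check; the function name and the example values (a phosphorus column and a pH column) are hypothetical and are not the measured data.

```python
import numpy as np


def outside_observed_range(X_site, X_query):
    """Boolean mask marking query values beyond the per-variable min/max at a site.

    X_site: (n_samples, n_variables) values observed at one location.
    X_query: (n_queries, n_variables) inputs for which predictions are wanted.
    """
    lower = X_site.min(axis=0)
    upper = X_site.max(axis=0)
    return (X_query < lower) | (X_query > upper)


# Hypothetical site data: columns are phosphorus (mg per kg) and soil pH.
X_site = np.array([[10.0, 3.0],
                   [90.0, 6.0]])
X_query = np.array([[50.0, 4.0],    # inside the observed ranges
                    [120.0, 5.0],   # phosphorus above the observed maximum
                    [5.0, 7.0]])    # both variables outside
flags = outside_observed_range(X_site, X_query)
```

Predictions for flagged rows are extrapolations and are better treated as hypotheses for experimental testing than as estimates.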

5. Conclusion

FAME profile analyses representing soil microbial community biomass and structure were linked to spatial variation in chemical and physical properties of Atlantic Forest soils when analyzed using non-linear statistical procedures. Our findings showed that the ANN modeling approach provides novel and detailed insight into the structure of microbial communities in the Atlantic Forest, and should prove useful for monitoring impacts of anthropogenic factors and assisting in forest restoration programs.

Acknowledgments

We thank Dr. Ricardo Ribeiro Rodrigues for providing data on plant species composition in the sampled sites, Dr. Lucas C.B. Azevedo for the critical review of the multivariate analyses, and Stephen Qi for technical assistance with the FAME analysis. We also thank the anonymous reviewers for their helpful and constructive comments that greatly contributed to improving the final version of the manuscript. This research was supported by São Paulo Research Foundation (BIOTA-FAPESP).

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.soilbio.2013.10.049.


References

Bosio, D., Scow, K.M., Gunapala, N., Graham, K.J., 1998. Determinants of soil microbial communities: effects of agricultural management, season and soil type on phospholipid fatty acid profiles. Microb. Ecol. 36, 1–12.

Bossio, D.A., Scow, K.M., 1995. Impact of carbon and flooding on the metabolic diversity of microbial communities in soils. Appl. Environ. Microbiol. 61, 4043–4050.

Cavigelli, M.A., Robertson, G.P., Klug, M.J., 1995. Fatty acid methyl ester (FAME) profiles as measures of soil microbial community structure. Plant Soil 170, 99–113.

Chantigny, M.H., Angers, D.A., Prévost, D., Vézina, L.-P., Chalifour, F.-P., 1997. Soil aggregation and fungal and bacterial biomass under annual and perennial cropping systems. Soil Sci. Soc. Am. J. 61, 262–267.

de Vries, F.T., Manning, P., Tallowin, J.R., Mortimer, S.R., Pilgrim, E.S., Harrison, K.A., Hobbs, P.J., Quirk, H., Shipley, B., Cornelissen, J.H., Kattge, J., Bardgett, R.D., 2012. Abiotic drivers and plant traits explain landscape-scale patterns in soil microbial communities. Ecol. Lett. 15, 1230–1239.

Englebrecht, A.P., 2007. Computational Intelligence: an Introduction, second ed. Wiley, Inc. 628 pp.

Fierer, N., Jackson, R., 2006. The diversity and biogeography of soil bacterial communities. Proc. Natl. Acad. Sci. U. S. A. 103, 626–631.

Jacquemyn, H., Butaye, J., Hermy, M., 2003. Influence of environmental and spatial variables on regional distribution of forest plant species in a fragmented and changing landscape. Ecography 26, 768–776.

Jamoneau, A., Sonnier, G., Chabrerie, O., Closset-Kopp, D., Saguez, R., Gallet-Moron, E., Decocq, G., 2011. Drivers of plant species assemblages in forest patches among contrasted dynamic agricultural landscapes. J. Ecol. 99, 1152–1161.

Joly, C.A., Rodrigues, R.R., Metzger, J.P., Haddad, C.F.B., Verdade, L.M., Oliveira, M.C., Bolzani, V.S., 2010. Biodiversity conservation research, training, and policy in São Paulo. Science 328, 1358–1359.

Kaur, A., Chaudhary, A., Kaur, A., Choudhary, R., Kaushik, R., 2005. Phospholipid fatty acid – a bioindicator of environment monitoring and assessment in soil ecosystem. Curr. Sci. 89, 1103–1112.

Mele, P.M., Crowley, D.E., 2008. Application of self-organizing maps for assessing soil biological quality. Agric. Ecosyst. Environ. 126, 139–152.

Noble, P.A., Almeida, J.S., Lovell, C.R., 2000. Application of neural computing methods for interpreting phospholipid fatty acid profiles of natural microbial communities. Appl. Environ. Microbiol. 66, 694–699.

Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., O'Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Wagner, H., 2010. Vegan: Community Ecology Package. R Package Version 1.17-3. http://CRAN.r-project.org/package=vegan.

Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Comput. Geosci. 30, 683–691.

Pennanen, T., Fritze, H., Vanhala, P., Kiikkila, O., Neuvonen, S., Baath, E., 1998. Structure of a microbial community in soil after prolonged addition of low levels of simulated acid rain. Appl. Environ. Microbiol. 64, 2173–2180.

van Raij, B., Quaggio, J.A., Cantarella, H., Ferreira, M.E., Lopes, A.S., Bataglia, O.C., 1987. Análise química do solo para fins de fertilidade. Fundação Cargill, Campinas, p. 170.

Ramette, A., 2007. Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 62, 142–160.

Rousk, J., Brookes, P.C., Baath, E., 2009. Contrasting soil pH effects on fungal and bacterial growth suggest functional redundancy in carbon mineralization. Appl. Environ. Microbiol. 75, 1589–1596.

Six, J., Frey, S.D., Thiet, R.K., Batten, K.M., 2006. Bacterial and fungal contributions to carbon sequestration in agroecosystems. Soil Sci. Soc. Am. J. 70, 555–569.

Sparling, G.P., 1992. Ratio of microbial biomass carbon to soil organic carbon as a sensitive indicator of changes in soil organic matter. Aust. J. Soil Res. 30, 195–207.

Tabarelli, M., Pinto, L.P., Silva, J.M.C., Costa, C.M.R., 2003. The Atlantic forest of Brazil: endangered species and conservation planning. In: Galindo-Leal, C., Câmara, I.G. (Eds.), The Atlantic Forest of South America: Biodiversity Status, Trends, and Outlook. Center for Applied Biodiversity Science – Island Press, Washington, D.C., pp. 86–94.

Tabarelli, M., Pinto, L.P., Silva, J.M.C., Hirota, M.M., Bedê, L.C., 2005. Desafios e oportunidades para a conservação da biodiversidade na Mata Atlântica brasileira. Megadiversidade 1, 132–138.

Webster, R., Oliver, M., 2001. Geostatistics for Environmental Scientists. John Wiley & Sons, Ltd., Chichester, England.

White, D.C., Stair, J.O., Ringelberg, D.B., 1996. Quantitative comparisons of in situ microbial biodiversity by signature biomarker analysis. J. Ind. Microbiol. 17, 185–196.

White, D.C., 1993. In-situ measurement of microbial biomass, community structure and nutritional status. Philos. Trans. R. Soc. Lond. 344, 59–67.

Zelles, L., 1999. Fatty acid patterns of phospholipids and lipopolysaccharides in the characterisation of microbial communities in soil: a review. Biol. Fertil. Soils 29, 111–129.