
Ecological Informatics 6 (2011) 303–308

Contents lists available at ScienceDirect

Ecological Informatics

journal homepage: www.elsevier.com/locate/ecolinf

Using multi-target clustering trees as a tool to predict biological water quality indices based on benthic macroinvertebrates and environmental parameters in the Chaguana watershed (Ecuador)

Luis Dominguez-Granda a,b, Koen Lock a,⁎, Peter L.M. Goethals a

a Ghent University, Laboratory of Environmental Toxicology and Aquatic Ecology, Belgium
b Escuela Superior Politécnica del Litoral (ESPOL), Instituto de Ciencias Químicas y Ambientales (ICQA), Guayaquil, Ecuador

⁎ Corresponding author at: Ghent University, Laboratory of Environmental Toxicology and Aquatic Ecology, J. Plateaustraat 22, 9000 Ghent, Belgium. Tel.: +32 9 2643996; fax: +32 9 2643766.

E-mail address: [email protected] (K. Lock).

1574-9541/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.ecoinf.2011.05.004

Article info

Article history: Received 22 December 2009; received in revised form 23 May 2011; accepted 24 May 2011; available online 1 June 2011

Keywords: Aquatic insects; Biological indices; Multitarget clustering trees; Diversity; Macroinvertebrates; Richness

Abstract

Macroinvertebrates were sampled in the Chaguana river basin in SW Ecuador in the wet season (March) and the dry season (September) of 2005 and 2006. To assess the robustness of several biological indicators, correlations were calculated between both years and between the wet and the dry season. In addition, it was tested whether the indices gave significantly different results for sites with a bad, poor, moderate and good ecological water quality. Composition measures performed poorly in most cases; however, abundance, diversity and richness measures often performed better, and tolerance measures, the so-called biotic indices, performed very well, even indices developed for temperate regions. By using pruned multitarget clustering trees, it was possible to predict several well-performing ecological water quality indices simultaneously on the basis of the occurring key macroinvertebrate taxa or, alternatively, on the basis of key environmental variables. In contrast to unpruned trees, which were complex, difficult to interpret and performed worse, pruning resulted in transparent trees. Water quality indices scored high when Hydropsychidae were present and even higher when Megapodagrionidae were also present. When neither Hydropsychidae nor Libellulidae were present, the indices reached the lowest scores. However, this model based on key taxa occurrences did not perform well during validation. Water quality indices scored higher with increasing dissolved oxygen concentrations and a strong current velocity. The latter model, based on environmental variables, in contrast performed well during validation. In the presented study, the ecological water quality could thus be accurately predicted solely on the basis of dissolved oxygen concentration and current velocity. It can therefore be concluded that multitarget clustering trees can be easily used as a practical tool for cost-effective decision support by water quality managers.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Assessment of river health using biological methods is currently commonplace in most temperate countries. Several of these methods have been standardized and included in national and regional monitoring programs (De Pauw et al., 2006; Hering et al., 2003), serving as a basis for policy decisions concerning surface water management. However, this is not the case in most tropical countries, where physical–chemical methods, some of which require expensive laboratory analysis, are predominantly used to assess running water quality. Since most tropical regions consist of developing countries, their limited technical and financial resources for environmental issues constrain the establishment of national monitoring programs, and cost-effective monitoring programs are therefore needed. After a process of adaptation, testing and standardization, biotic indices for macroinvertebrates can be reliable systems for application in river management of tropical regions.

Several studies have already evaluated the applicability of water quality assessment methods from temperate regions in neotropical rivers with satisfactory results (Baptista et al., 2007; Fenoglio et al., 2002; Jacobsen, 1998; Marques and Barbosa, 2001; Silveira et al., 2005; Umana-Villalobos and Springer, 2006). In addition, several bioassessment methods have been developed for tropical regions, generally based on a biotic approach and predominantly adaptations of the English BMWP (Biological Monitoring Working Party) (Astorga et al., 1997; Baptista et al., 2007; Chessman, 1995, 2003; Chutter, 1972; Dickens and Graham, 2002; Junqueira and Campos, 1998; Mustow, 2002; Roldán, 2003; Sharma and Moog, 1998). These methods are usually adapted for tropical rivers through (1) the inclusion of local taxa and the exclusion of absent ones and (2) the modification of tolerance values.

A range of techniques has already been applied to model the water quality of running waters. Artificial neural networks (Dedecker et al., 2007), fuzzy logic (Adriaenssens et al., 2004a; Mouton et al., 2009), classification trees (Dakou et al., 2007; Dzeroski et al., 2000), Bayesian belief networks (Adriaenssens et al., 2004b) and support vector machines (Ambelu et al., 2010; Hoang et al., 2010) have proven to have a high potential for water quality assessment. However, in the present study, multi-target clustering trees were used, which have rarely been used for water quality assessment (Everaert et al., in press). This technique is applicable for relatively small datasets and has the advantage that several parameters can be predicted simultaneously. In addition, their transparency makes them suitable as a practical tool for decision support by water quality managers.

Walley and Dzeroski (1995) already used data mining techniques for biological river quality assessment; however, Dzeroski et al. (1997) were among the first to apply classification trees for river community analysis: British rivers were classified based on biological data, and Slovenian rivers were classified based on the influence of physical and chemical parameters on selected bioindicator organisms. Blockeel et al. (1999) made simultaneous predictions of multiple physical–chemical properties of river water from its biological properties using a single decision tree, and they also predicted past physical–chemical properties of the river water from its current biological properties. On the basis of biological data, Dzeroski et al. (2000) predicted physical–chemical variables: taxa that occurred in many trees were considered useful indicator taxa. Habitat suitability for six macroinvertebrate taxa in the river Axios (Northern Greece) was predicted by Dakou et al. (2007) by inducing decision trees. D'heygere et al. (2003, 2006) also predicted the occurrence of several macroinvertebrate taxa in Flanders based on a selection of environmental variables. River types for abiotic features, riffle zones, fish assemblages and macroinvertebrate assemblages from river edges were defined by Turak and Koop (2008) on the basis of reference sites in New South Wales (Australia). In contrast to the classical single-target approach, which learns a model for each target attribute separately, a multi-target approach was recently developed, which builds one model for all target attributes simultaneously (Kocev et al., 2009). The latter technique was already used to predict the presence of several alien macroinvertebrates based on the measured environmental variables (Everaert et al., in press) and is potentially also a suitable tool in water quality assessments.

The general objectives of the present study were (1) to evaluate the suitability of several biological indices for water quality assessment of the Chaguana watershed in Ecuador and (2) to use multi-target clustering trees to predict several reliable biological indices simultaneously on the basis of the presence of key macroinvertebrate taxa or, alternatively, based on key environmental variables. In this way, it was assessed whether multi-target clustering trees could be used as a tool for cost-effective water quality assessment.

2. Materials and methods

2.1. Study area

The Chaguana river basin is situated in El Oro Province in SW Ecuador (Fig. 1). The basin drains an area of approximately 32,000 ha, flowing from the occidental slope of the W Andes to a larger watershed called the Pagua river basin. The Chaguana river basin drains a mountainous area that is difficult to access, with headwaters located at about 2900 m above sea level. Near natural conditions (humid forests and brushes, mangroves and uncultivated land) cover 37% of the land, while activities performed in the basin are shrimp farming, human settlements and, in recent years, gold mining (Matamoros, 2004).

Fig. 1. Location of the sampling stations in the Chaguana watershed, with indication of the sites with a high (black), an intermediate (grey) and a low human impact (white).

2.2. Sampling

Within the Chaguana river basin, 29 sampling sites, distributed among different land use categories, were selected to be surveyed in the wet season (March) and the dry season (September) of 2005 and 2006. However, in 2006, access to some stations was denied due to land use conflicts. In total, 104 samples were taken: 29 during the dry season of 2005, 29 during the wet season of 2005, 24 during the dry season of 2006 and 22 during the wet season of 2006. Macroinvertebrate samples were always collected by the same operator by means of a standard hand net consisting of a metal frame holding a conical net (20×30 cm, 300 μm mesh size). Sampling duration was 3 min of active sampling in 2005; in 2006, it was increased to 10 min of active sampling. Organisms were collected from the different habitats present at the sampling site. Riffle habitats were sampled by holding the net downstream while the operator disturbed the substratum by kicking directly in front of the net opening. Stream edge habitats were sampled by vigorously sweeping along the stream margins, disturbing bottom and bank substratum. The objective of the sampling was to collect the most representative taxa of macroinvertebrates at the site examined. After separation, macroinvertebrates were identified under a stereomicroscope. The taxonomical knowledge of the fauna of Ecuadorian streams is still scarce; therefore, aquatic insects were identified at family level with the available literature containing identification keys and descriptions of the riverine fauna of the region (Domínguez et al., 1994; Fernández and Dominguez, 2001; Roldán, 1988). Non-insects were mostly identified at higher taxonomic levels.

2.3. Indices

This study deals with the performance of biological methods developed in Western Europe, North, South and Central America, Africa, Asia and Australia for the assessment of the Chaguana river basin. The Biological Monitoring Working Party (BMWP) (Armitage et al., 1983), which is the water quality index used in the UK, has its origin in the Trent Biotic Index, the first biotic index developed for the assessment of running water. The BMWP was improved by Walley and Hawkes (1996, 1997). If the BMWP index is divided by the number of scoring families present in the taxa list, the result is known as the Average Score Per Taxon (ASPT) index. The BMWP was adapted for Colombia (BMWP/Col) by Roldán (2003) and for Costa Rica (BMWP/CR) by Astorga et al. (1997). The Stream Invertebrate Grade Number Average Level index (SIGNAL) was developed by Chessman (1995) for the assessment of organic pollution in running waters of SE Australia and later adjusted for application in the whole country (Chessman, 2003). This index can be calculated with and without abundance weighting. The Nepalese Biotic Score (NEPBIOS) (Sharma and Moog, 1998) is also an adaptation of the BMWP, and Mustow (2002) developed the BMWP for Thailand (BMWPTHAI). The South African Scoring System (SASS) was originally developed by Chutter (1972) for river quality assessment in South Africa. Several improvements have been made and here, the fifth version (SASS5) was applied (Dickens and Graham, 2002). The Family Biotic Index (FBI) was developed by Hilsenhoff (1988) for application in the US. The Iberian BMWP (IBMWP) was adapted for the Iberian Peninsula (Alba-Tercedor and Sanchez-Ortega, 1988). When available, the latest versions of these indices were applied in order to take into account their recent improvements (e.g. updated tolerance values, inclusion of new taxa).

Table 1. Spearman rank correlations between the years 2005 and 2006 and between the dry and the wet season. The discriminative power (Mann–Whitney U test) is given for the samples with a bad, poor, moderate and good ecological quality according to the BMWP-Colombia.

| Measure | Year | Season | Bad–good |
| Abundance measures | | | |
| # Individuals | 0.19 nc | 0.33 * | *** |
| Diversity measures | | | |
| Margalef | 0.55 *** | 0.49 *** | *** |
| Shannon | 0.64 *** | 0.49 *** | *** |
| Simpson | 0.56 *** | 0.31 * | *** |
| Evenness | 0.15 nc | 0.26 nc | |
| Richness measures | | | |
| # Taxa | 0.47 ** | 0.42 ** | *** |
| # EPT taxa | 0.72 *** | 0.59 *** | *** |
| # Ephemeroptera taxa | 0.60 *** | 0.53 *** | *** |
| # Plecoptera taxa | 0.80 *** | 0.29 * | ** |
| # Trichoptera taxa | 0.60 *** | 0.44 ** | *** |
| # Diptera taxa | 0.55 *** | 0.56 *** | *** |
| Tolerance measures | | | |
| BMWP | 0.59 *** | 0.47 *** | *** |
| BMWP-ASPT | 0.66 *** | 0.53 *** | *** |
| IBMWP | 0.56 *** | 0.45 *** | *** |
| IBMWP-IASPT | 0.40 ** | 0.32 * | *** |
| Family Biotic index | −0.052 nc | 0.00054 nc | |
| BMWP/Col | 0.64 *** | 0.61 *** | *** |
| BMWP-ASPT/Col | 0.53 *** | 0.39 ** | *** |
| BMWP (CR) | 0.66 *** | 0.64 *** | *** |
| BMWP-ASPT (CR) | 0.74 *** | 0.62 *** | *** |
| Signal 2 score (ab.) | 0.69 *** | 0.64 *** | *** |
| Signal 2 score (not ab.) | 0.68 *** | 0.59 *** | *** |
| SASS5 | 0.68 *** | 0.60 *** | *** |
| SASS5-ASPT | 0.63 *** | 0.50 *** | *** |
| NEPBIOS | 0.68 *** | 0.60 *** | *** |
| NEPBIOS-ASPT | 0.66 *** | 0.68 *** | *** |
| BMWPTHAI | 0.64 *** | 0.57 *** | *** |
| BMWP-ASPTTHAI | 0.52 *** | 0.46 *** | *** |
| Composition measures | | | |
| % Hydropsychidae of Trichoptera | 0.44 * | −0.056 nc | *** |
| % EPT | 0.59 *** | 0.16 nc | ** |
| % Ephemeroptera | 0.48 ** | 0.23 nc | |
| % Trichoptera | 0.75 *** | 0.58 *** | *** |
| % Diptera | 0.046 nc | 0.080 nc | |
| % Chironomidae | 0.012 nc | 0.096 nc | |

p<0.05 *; p<0.01 **; p<0.001 ***; nc = no significant correlation; ab. = abundance.

Remaining discriminative power columns (Bad–moderate, Poor–good, Bad–poor, Poor–moderate, Moderate–good); row and cell boundaries were lost in the transcript:

*** *** ** * **
*** *** *** *** ****** *** ** ** ***** *** **
*** *** *** *** ****** *** *** *** ****** *** ** *** *
**** *** *** * ****** *** ** ***
*** *** *** *** ****** *** ****** *** *** *** ****** *** *** *
***** *** *** *** ****** *** *** **** *** *** *** ****** *** *** ****** *** ** **** ** ****** *** *** *** ****** ****** *** *** *** ****** *** *** * ***** *** *** *** ****** *** *** * *
*** ** * **** ***** *** *** *

Fig. 2. Multi-target clustering tree which predicted the number of EPT-taxa, the BMWP-Colombia and the BMWP-Costa Rica based on the occurrence of macroinvertebrate taxa.

Hydropsychidae
  absent: Libellulidae
    absent: Number of EPT-taxa = 1.6; BMWP Colombia = 25; BMWP Costa Rica = 19
    present: Number of EPT-taxa = 1.7; BMWP Colombia = 46; BMWP Costa Rica = 30
  present: Megapodagrionidae
    absent: Number of EPT-taxa = 5.0; BMWP Colombia = 78; BMWP Costa Rica = 59
    present: Number of EPT-taxa = 5.6; BMWP Colombia = 115; BMWP Costa Rica = 87

Apart from biotic indices, several diversity indices were also calculated: the Margalef (1951) index, the Simpson (1949) index, the Shannon–Wiener index (Shannon and Weaver, 1963) and the Shannon evenness index, which is calculated by dividing the Shannon–Wiener index by the natural logarithm of the number of taxa.
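The index definitions given in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the tolerance scores in the usage example are placeholders rather than values from any published BMWP table, and the exact Margalef and Simpson variants used in the study are not stated, so the common forms (S − 1)/ln N and Σp² are assumed here.

```python
import math


def biotic_score(families_present, tolerance_scores):
    """BMWP-style total: sum of the scores of the scoring families present.

    Returns the total and the number of scoring families.
    """
    scoring = [f for f in set(families_present) if f in tolerance_scores]
    return sum(tolerance_scores[f] for f in scoring), len(scoring)


def aspt(total_score, n_scoring_families):
    """Average Score Per Taxon: index total divided by the number of scoring families."""
    return total_score / n_scoring_families if n_scoring_families else 0.0


def shannon_wiener(counts):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def shannon_evenness(counts):
    """Shannon-Wiener index divided by the natural logarithm of the number of taxa."""
    n_taxa = sum(1 for c in counts if c > 0)
    return shannon_wiener(counts) / math.log(n_taxa) if n_taxa > 1 else 0.0


def margalef(counts):
    """Margalef richness (S - 1) / ln N (assumed variant)."""
    n_taxa = sum(1 for c in counts if c > 0)
    return (n_taxa - 1) / math.log(sum(counts))


def simpson_dominance(counts):
    """Simpson index as sum(p_i ** 2) (assumed variant; 1 - D and 1 / D are also in use)."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


# Placeholder scores for hypothetical families, for illustration only.
scores = {"family_a": 8, "family_b": 4}
total, n_scoring = biotic_score(["family_a", "family_b", "family_x"], scores)
print(total, aspt(total, n_scoring))
```

Families without a tolerance score (here `family_x`) are ignored, which mirrors the statement that the ASPT divides by the number of scoring families only.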

2.4. Statistics

To check the robustness of the biological indicators, Spearman rank correlations were applied between the years 2005 and 2006 and between the dry and the wet season. As the BMWP-Colombia included the highest number of taxa that were present in the Chaguana watershed and because Colombia is the country which is closest to Ecuador of all countries which developed biological indices, the performance of all indices was evaluated based on the outcome of the BMWP-Colombia. In addition, the BMWP-Colombia best reflected the three classes of human impact, which were separated based on the macroinvertebrate community composition using multivariate analysis (Fig. 1) (Dominguez-Granda et al., in press). Four water quality classes were recognised based on the outcome of the BMWP-Colombia: bad (0–40), poor (41–70), moderate (71–100) and good (>100). The Mann–Whitney U test was used to identify significant differences between sites with different water quality classes.
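The class boundaries and the two rank-based statistics named above can be illustrated with a small, dependency-free Python sketch. It is not the authors' code: the function names are hypothetical, ties receive average ranks (a common convention that the paper does not specify), and only the U statistic is computed. In practice, p-values would usually be taken from a statistics package, for example `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu`.

```python
import math


def quality_class(bmwp_colombia):
    """Water quality class from the BMWP-Colombia score, using the class limits in the text."""
    if bmwp_colombia <= 40:
        return "bad"
    if bmwp_colombia <= 70:
        return "poor"
    if bmwp_colombia <= 100:
        return "moderate"
    return "good"


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient; NaN when one input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of U_a and U_b); no p-value is computed."""
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[: len(group_a)])
    u_a = rank_sum_a - len(group_a) * (len(group_a) + 1) / 2
    return min(u_a, len(group_a) * len(group_b) - u_a)
```

For a given index, `spearman` would be applied to the paired site scores of 2005 and 2006 (or of the dry and wet season), and `mann_whitney_u` to the index scores of two quality classes at a time.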

Based on a training set consisting of the 58 samples taken in 2005, multi-target clustering trees were built using CLUS (Blockeel and Struyf, 2002). A test set consisting of the 46 samples taken in 2006 was used for model validation. The leaves of a multi-target classification tree store a vector of class values, instead of storing a single class value like single-target classification trees do. This means that each component of the vector is a prediction for one of the target attributes. Multi-target clustering trees were constructed by top-down induction and the trees were pruned by only generating clusters with at least 10 instances in each subset. The stability of the trees was maximised using a 10-fold cross-validation procedure. Model performance was evaluated based on Pearson correlation.

3. Results

The total number of individuals was not constant over the years because the sampling effort was increased during the second year of sampling; nevertheless, the number of individuals already gave a relatively good idea of the ecological quality (Table 1). The diversity indices of Margalef and Shannon performed very well, while the Simpson index performed less well and the evenness performed very badly: its results varied depending on the year and the season and it had a very low discriminative power. Most richness measures, especially the number of EPT taxa, were good indicators of water quality: constant results were obtained over the years and the seasons, and discriminative power was high. Only the number of Plecoptera taxa did not perform that well. With one exception, the biotic measures performed very well: results were hardly affected by the year or the season of sampling and the discriminative power was usually high. However, the Family Biotic Index varied between the years and the seasons, and only the sites with a high and a low human impact could be separated. The discriminative power of the indices was usually higher than that of the average score per taxon (ASPT) variant of the respective indices. The composition measures were poor indicators; only the fraction of Trichoptera performed relatively well.

Using a multi-target clustering tree, three ecological water quality indices with a good performance (number of EPT-taxa, BMWP-Colombia and BMWP-Costa Rica) were predicted based on the occurring macroinvertebrate taxa (Pearson R=0.79) (Fig. 2). If Hydropsychidae were present, water quality was better. When Megapodagrionidae were also present, the highest water quality scores were obtained. When neither Hydropsychidae nor Libellulidae were present, the lowest scores were predicted. However, when the model developed for the samples of 2005 was validated with the samples of 2006, the ecological water quality could not be predicted accurately (Pearson R=0.21).

The ecological water quality indices could also be predicted on the basis of the environmental variables (Pearson R=0.73) (Fig. 3). When the dissolved oxygen concentration was lower than 6.83 mg l−1, the lowest ecological water quality scores were found. When the dissolved oxygen concentration was higher, water quality further improved with a current velocity higher than 32 cm s−1, and when the dissolved oxygen concentration was higher than 7.62 mg l−1, the highest scores were obtained. In contrast to trees based on taxa composition, the trees based on environmental variables performed well during validation (Pearson R=0.66).

Fig. 3. Multi-target clustering tree which predicted the number of EPT-taxa, the BMWP-Colombia and the BMWP-Costa Rica based on environmental variables.

Dissolved oxygen
  ≤6.83 mg l−1: Number of EPT-taxa = 2.0; BMWP Colombia = 35; BMWP Costa Rica = 24
  >6.83 mg l−1: Current velocity
    ≤32 cm s−1: Number of EPT-taxa = 2.9; BMWP Colombia = 68; BMWP Costa Rica = 49
    >32 cm s−1: Dissolved oxygen
      ≤7.62 mg l−1: Number of EPT-taxa = 5.1; BMWP Colombia = 81; BMWP Costa Rica = 61
      >7.62 mg l−1: Number of EPT-taxa = 6.3; BMWP Colombia = 122; BMWP Costa Rica = 93
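Because the tree based on environmental variables (Fig. 3) has only three splits, it can be written out as a short decision function. The sketch below is an illustration rather than output from CLUS: the function and field names are hypothetical, and the assignment of the leaf vectors to the branches is read from the figure and the description in the text, with values equal to a threshold sent to the lower branch.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """Vector of target values stored in one leaf of the multi-target tree."""
    ept_taxa: float
    bmwp_colombia: float
    bmwp_costa_rica: float


def predict_from_environment(dissolved_oxygen_mg_l, current_velocity_cm_s):
    """Predict three indices at once from dissolved oxygen and current velocity."""
    if dissolved_oxygen_mg_l <= 6.83:
        return Prediction(2.0, 35, 24)    # low oxygen: lowest scores
    if current_velocity_cm_s <= 32:
        return Prediction(2.9, 68, 49)    # enough oxygen, slow current
    if dissolved_oxygen_mg_l <= 7.62:
        return Prediction(5.1, 81, 61)    # fast current, intermediate oxygen
    return Prediction(6.3, 122, 93)       # fast current, high oxygen: highest scores


print(predict_from_environment(7.9, 45))
```

A single call returns all three target values, which is the practical difference from fitting one single-target tree per index.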

4. Discussion

With the exception of the Family Biotic Index (Hilsenhoff, 1988), all biotic measures performed quite well. This could be expected because these measures were especially developed for this purpose. Not only the methods developed for Colombia (Roldán, 2003) and Costa Rica (Astorga et al., 1997), but even the methods that were developed for temperate regions (Alba-Tercedor and Sanchez-Ortega, 1988; Armitage et al., 1983; Walley and Hawkes, 1996, 1997) and Thailand (Mustow, 2002), South Africa (Chutter, 1972; Dickens and Graham, 2002), Australia (Chessman, 1995, 2003) and Nepal (Sharma and Moog, 1998), gave good results in the Chaguana watershed. Notwithstanding the fact that most biotic measures performed quite well, it is preferable to use an index of an area with a similar macrobenthic fauna. Since Colombia and Costa Rica had the macrobenthic fauna that most resembled that of the Chaguana river basin in Ecuador, these indices are preferable until Ecuador develops its own index. The biotic indices usually performed better than the average score per taxon (ASPT) variant of the respective indices: not taking the number of occurring taxa into account thus seemed to decrease the performance of the biotic indices.

The reduction of the dimensionality of a clustering tree contributes to an easier interpretation of the revealed trends in the data, focusing the attention on the important variables (Dzeroski et al., 1997). Optimal pruning is an important mechanism as it improves the transparency of the induced trees by reducing their size and enhances the accuracy by eliminating errors that are present due to noise in the data (Dakou et al., 2007). In addition, the generalisation capacity of complex trees is usually reduced because these trees can be overfitted to the dataset used. Therefore, the tree size in the present study was reduced by using a large minimum group size for division. In this way, small trees were obtained which are easy to interpret and which can be used for decision support by water quality managers.
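The effect of a minimum group size on tree size can be sketched with a toy top-down induction routine. This is a simplified stand-in and not the CLUS algorithm: it assumes numeric features, uses the summed squared error over all target components as the split criterion, does not rescale targets (so targets with large ranges dominate the split choice) and omits cross-validation. All names are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of target vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def sum_squared_error(rows):
    """Squared deviation from the mean vector, summed over all targets."""
    means = mean_vector(rows)
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, means))


def build_tree(features, targets, min_leaf=10):
    """Grow a multi-target tree; every subset must keep at least min_leaf instances."""
    node = {"prediction": mean_vector(targets)}
    best = None
    for f in range(len(features[0])):
        values = sorted(set(row[f] for row in features))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [i for i, row in enumerate(features) if row[f] <= threshold]
            right = [i for i, row in enumerate(features) if row[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # pruning rule: no cluster smaller than min_leaf
            error = (sum_squared_error([targets[i] for i in left])
                     + sum_squared_error([targets[i] for i in right]))
            if best is None or error < best[0]:
                best = (error, f, threshold, left, right)
    if best is None or best[0] >= sum_squared_error(targets):
        return node  # no admissible or useful split: this node is a leaf
    _, f, threshold, left, right = best
    node["feature"], node["threshold"] = f, threshold
    node["left"] = build_tree([features[i] for i in left],
                              [targets[i] for i in left], min_leaf)
    node["right"] = build_tree([features[i] for i in right],
                               [targets[i] for i in right], min_leaf)
    return node


def predict(node, row):
    """Follow the splits down to a leaf and return its stored target vector."""
    while "feature" in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["prediction"]
```

With 58 training samples and a minimum of 10 instances per subset, such a routine can produce at most five leaves, which is why the resulting trees stay small enough to read at a glance.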

Based on the occurring taxa and the environmental variables, multi-target clustering trees could be induced that were able to predict several ecological water quality indices simultaneously. The performance of trees based on the occurring taxa and of trees based on the environmental variables was similar for the models developed from the samples of 2005. However, validation with the samples of 2006 indicated that trees based on taxa composition did not result in accurate predictions. Although sampling effort was higher in 2006 (10 min active sampling) than in 2005 (3 min active sampling), it is unlikely that sampling effort caused the poor performance of the clustering tree based on taxa composition, since the indicator taxa were usually present at high densities if they were present. It is more probable that the taxa composition of macroinvertebrate communities is too variable to make decisions based on the occurrence of a single taxon. However, multi-target clustering trees based on environmental variables performed much better during validation, and their predictive performance would probably further improve if additional variables such as nutrient concentrations could be incorporated as well.

5. Conclusions

The presented results indicate that indices developed for other, even remote, countries can already give a good idea of the ecological water quality. However, for application in routine water quality assessments, it is advised that region-specific methods are developed. With some minor modifications to adapt the existing indices to the local fauna, it should therefore be possible to develop a biotic index for macroinvertebrates for water quality assessment in Ecuador. Using multi-target clustering trees, it was possible to predict the ecological water quality on the basis of the occurring taxa as well as on the basis of the environmental parameters. However, validation of these models indicated that models based on occurrences did not perform well, whereas models based on environmental variables were more reliable, and the inclusion of additional parameters could even improve their predictive power. Multi-target clustering trees can therefore be a valuable tool for cost-effective decision support in water quality assessment.

Acknowledgments

The first author is grateful for the financial support of the VLIR-ESPOL IUC programme in Ecuador and SENACYT. In particular, we would like to thank Magda Vincx, coordinator of the VLIR-ESPOL IUC programme, Nancy Fockedey, pioneer of this programme in Ecuador, and Pilar Cornejo, director of this programme in Ecuador. We would also like to thank Galo, Erick, Christian L., Christian R., Christian V., Felix and Santiago for their help during the field work. Koen Lock is currently supported by a post-doctoral fellowship from the Fund for Scientific Research (FWO-Vlaanderen, Belgium).

References

Adriaenssens, V., De Baets, B., Goethals, P.L.M., De Pauw, N., 2004a. Fuzzy rule-basedmodels for decision support in ecosystem management. Sci. Total. Environ. 319,1–12.

Adriaenssens, V., Goethals, P.L.M., Charles, J., De Pauw, N., 2004b. Application ofBayesian Belief Networks for the prediction of macroinvertebrate taxa in rivers.Ann. Limnol.- Int. J. Limnol. 40, 181–191.

Alba-Tercedor, J., Sanchez-Ortega, A., 1988. Un método rápido y simple para evaluar la calidad biológica de las aguas corrientes basado en el de Hellawell (1978). Limnetica 4, 51–56.

Ambelu, A., Lock, K., Goethals, P.L.M., 2010. Comparison of modeling techniques to predict macroinvertebrate community composition in rivers of Ethiopia. Ecol. Inform. 5, 147–152.

Armitage, P.D., Moss, D., Wright, J.F., Furse, M.T., 1983. The performance of a new biological water quality score system based on macroinvertebrates over a wide range of unpolluted running-water sites. Wat. Res. 17, 333–347.

Astorga, Y., De Pauw, N., Persoone, G., 1997. Development and application of cost-effective methods for biological monitoring of rivers in Costa Rica. European Community. Final report, Joint research European Union Project No NCI1* CT-92-0094.

Baptista, D.F., Buss, D.F., Egler, M., Giovanelli, A., Silveira, M.P., Nessimian, J.L., 2007. A multimetric index based on benthic macroinvertebrates for evaluation of Atlantic forest streams at Rio de Janeiro State, Brazil. Hydrobiologia 575, 83–94.

Blockeel, H., Struyf, J., 2002. Efficient algorithms for decision tree cross-validation. J. Mach. Learn. Res. 3, 621–650.

Blockeel, H., Dzeroski, S., Grbovic, J., 1999. Simultaneous prediction of multiple chemical parameters of river water quality with TILDE. Lect. Notes Artif. Intell. 1704, 32–40.

Chessman, B.C., 1995. Rapid assessment of rivers using macroinvertebrates: a procedure based on habitat-specific sampling, family level identification and a biotic index. Aust. J. Ecol. 20, 122–129.

Chessman, B.C., 2003. SIGNAL 2 - a scoring system for macroinvertebrate (‘water bugs’) in Australian rivers. Monitoring River Health Initiative Technical Report no. 31. Commonwealth of Australia, Canberra. 32 pp.

Chutter, F.M., 1972. An empirical biotic index of the quality of water in South African streams and rivers. Wat. Res. 6, 19–30.

D'heygere, T., Goethals, P.L.M., De Pauw, N., 2003. Use of genetic algorithms to select input variables in decision tree models for the prediction of benthic macroinvertebrates. Ecol. Model. 160, 291–300.

D'heygere, T., Goethals, P.L.M., De Pauw, N., 2006. Genetic algorithms for optimisation of predictive ecosystems models based on decision trees and neural networks. Ecol. Model. 195, 20–29.

Dakou, E., D'heygere, T., Dedecker, A.P., Goethals, P.L.M., Lazaridou-Dimitriadou, M., De Pauw, N., 2007. Decision tree models for prediction of macroinvertebrate taxa in the river Axios (Northern Greece). Aquat. Ecol. 41, 399–411.

De Pauw, N., Gabriels, W., Goethals, P.L.M., 2006. River monitoring and assessment methods based on macroinvertebrates. In: Ziglio, G., Siligardi, M., Flaim, G. (Eds.), Biological Monitoring of Rivers: Applications and Perspectives. John Wiley & Sons, Chichester, pp. 113–134.

Dedecker, A., Van Melckebeke, K., Goethals, P.L.M., De Pauw, N., 2007. Development of migration models for macroinvertebrates in the Zwalm river basin (Flanders, Belgium) as tools for restoration management. Ecol. Model. 203, 72–86.

Dickens, C.W.S., Graham, P.M., 2002. The South African Scoring System (SASS) version 5 rapid bioassessment method for rivers. Afr. J. Aquat. Sci. 27, 1–10.

Domínguez, E., Hubbard, M.D., Pescador, M.L., 1994. Los Ephemeroptera en Argentina. Fauna de Agua Dulce de la Republica Argentina 33, 1–142.

308 L. Dominguez-Granda et al. / Ecological Informatics 6 (2011) 303–308

Dominguez-Granda, L., Lock, K., Goethals, P.L.M., in press. Application of classification trees to determine biological and chemical indicators for river assessment: case-study in the Chaguana watershed (Ecuador). J. Hydroinform.

Dzeroski, S., Grbovic, J., Walley, W.J., 1997. Machine learning applications in biological classification of river water quality. In: Michalski, R.S., Bratko, I., Kubat, M. (Eds.), Machine Learning and Data Mining: Methods and Applications. John Wiley and Sons Ltd., New York, pp. 429–448.

Dzeroski, S., Demsar, D., Grbovic, J., 2000. Predicting chemical parameters of river water quality from bioindicator data. Appl. Intell. 13, 7–17.

Everaert, G., Boets, P., Lock, K., Goethals, P.L.M., in press. Application of decision trees to analyze the ecological impact of invasive species in polder lakes in Flanders, Belgium. Ecol. Model. doi:10.1016/j.ecolmodel.2010.08.013.

Fenoglio, S., Badino, G., Bona, F., 2002. Benthic macroinvertebrate communities as indicators of river environment quality: an experience in Nicaragua. Rev. Biol. Trop. 50, 1125–1131.

Fernández, H.R., Dominguez, E., 2001. Guía para la determinación de los artrópodos bentónicos sudamericanos. Universidad Nacional de Tucumán, Tucumán. 282 pp.

Hering, D., Buffagni, A., Moog, O., Sandin, L., Sommerhäuser, M., Stubauer, I., Feld, C., Johnson, R., Pinto, P., Skoulikidis, N., Verdonschot, P., Zahrádková, S., 2003. The development of a system to assess the ecological quality of streams based on macroinvertebrates - design of the sampling programme within the AQEM project. Int. Rev. Hydrobiol. 88, 345–361.

Hilsenhoff, W.L., 1988. Rapid field assessment of organic pollution with a family-level biotic index. J. N. Amer. Benth. Soc. 7, 65–68.

Hoang, T.H., Lock, K., Mouton, A., Goethals, P.L.M., 2010. Application of decision trees and support vector machines to model the presence of macroinvertebrates in rivers in Vietnam. Ecol. Inform. 5, 140–146.

Jacobsen, D., 1998. The effect of organic pollution on the macroinvertebrate fauna of Ecuadorian highland streams. Arch. Hydrobiol. 143, 179–195.

Junqueira, V.M., Campos, S.C.M., 1998. Adaptation of the “BMWP” method for water quality evaluation to Rio das Velhas watershed (Minas Gerais, Brazil). Acta Limnol. Bras. 10, 123–135.

Kocev, D., Dzeroski, S., White, M.D., Newell, G.R., Griffoens, P., 2009. Using single and multi-target regression trees and ensembles to model a compound index of vegetation condition. Ecol. Model. 220, 1159–1168.

Margalef, R., 1951. Diversidad de especies en las comunidades naturales. Publ. Inst. Biol. Appl. Barc. 6, 59–72.

Marques, M., Barbosa, F., 2001. Biological quality of waters from an impacted tropical watershed (middle Rio Doce basin, southeast Brazil), using benthic macroinvertebrate communities as an indicator. Hydrobiologia 457, 69–76.

Matamoros, D., 2004. Predicting river concentrations of pesticides from banana plantations under data-poor condition. Ghent University, Ghent. 204 pp.

Mouton, A.M., De Baets, B., Goethals, P.L.M., 2009. Knowledge-based versus data-driven fuzzy habitat suitability models for river management. Environ. Modell. Softw. 24, 982–993.

Mustow, S.E., 2002. Biological monitoring of rivers in Thailand: use and adaptation of the BMWP score. Hydrobiologia 479, 191–229.

Roldán, G., 1988. Guía para el estudio de los macroinvertebrados acuáticos del Departamento de Antioquia. Fondo FEN Colombia, Conciencias-Universidad de Antioquia, Santafé de Bogota (Colombia).

Roldán, G., 2003. Bioindicación de la calidad del agua en Colombia. Uso del método BMWP/Col. Universidad de Antioquia, Medellín, Medellín (Colombia). 182 pp.

Shannon, C.E., Weaver, W., 1963. The Mathematical Theory of Communication. University of Illinois Press, Urbana, Urbana (Illinois). 117 pp.

Sharma, S., Moog, O., 1998. The applicability of biotic indices and scores in water quality assessment of Nepalese rivers. In: Chalise, S.R., Herrmann, A., Khanal, N.R., Lang, H., Molnar, L., Pokhrel, A.P. (Eds.), Ecohydrology of High Mountain Areas, Proceedings of the International Conference on Ecohydrology of High Mountain Areas. ICIMOD, UNESCO, Kathmandu. 680 pp.

Silveira, M.P., Baptista, D.F., Buss, D.F., Nessimian, J.L., Egler, M., 2005. Application of biological measures for stream integrity assessment in South-East Brazil. Environ. Monit. Assess. 101, 117–128.

Simpson, E.H., 1949. Measurement of diversity. Nature 163, 688–688.

Turak, E., Koop, K., 2008. Multi-attribute ecological river typology for assessing ecological condition and conservation planning. Hydrobiologia 603, 83–104.

Umana-Villalobos, G., Springer, M., 2006. Environmental variation in the Grande de Terraba river and some of its tributaries, south Pacific of Costa Rica. Rev. Biol. Trop. 54, 265–272.

Walley, W.J., Dzeroski, S., 1995. Biological monitoring: a comparison between Bayesian, neural and machine learning methods of water quality classification. In: Denzer, R., Shimak, G., Russell, D. (Eds.), Environmental Software Systems. Chapman & Hall, London, pp. 229–240.

Walley, W.J., Hawkes, H.A., 1996. A computer-based reappraisal of Biological Monitoring Working Party scores using data from the 1990 River Quality Survey of England and Wales. Wat. Res. 30, 2086–2094.

Walley, W.J., Hawkes, H.A., 1997. A computer-based development of the Biological Monitoring Working Party score system incorporating abundance rating, biotope type and indicator value. Wat. Res. 31, 201–210.