Download - JVS 5795 De Caceres Accepted-Revised
Numerical reproduction of traditional classifications and automatic vegetation identification
Miquel de Cáceres1,2,*, Xavier Font1, Paloma Vicente1, Francesc Oliva2
1Department of Plant Biology, University of Barcelona, Avda. Diagonal 645, Barcelona, ES-08028, Spain; 2Department of Statistics, University of Barcelona, Avda. Diagonal 645, Barcelona, ES-08028, Spain; *Corresponding author; E-mail: [email protected]
Abstract
Questions: Is it possible to develop an expert system to provide reliable automatic identifications
of plant communities at the precision level of phytosociological associations? How can unreliable
expert-based knowledge be discarded before applying supervised classification methods?
Material: We used 3677 relevés from Catalonia (Spain), belonging to eight orders of terrestrial
vegetation. These relevés were classified by experts into 222 low-level units (associations or
subassociations).
Methods: We reproduced low-level expert-defined vegetation units as independent fuzzy clusters
by using the Possibilistic C-means algorithm. Those relevés detected as transitional between
vegetation types were excluded in order to maximize the number of units numerically
reproduced. Cluster centroids were then considered static and used to perform supervised
classifications of vegetation data. Finally, we evaluated the classifier’s ability to correctly
identify the unit of both typical (i.e. training) and transitional relevés.
Results: Only 166 of the 222 original units (75%) could be numerically reproduced.
Almost all the unrecognized units were subassociations. Among the original relevés, 61% were
deemed transitional or untypical. Typical relevés were correctly identified 95% of the time, while
the efficiency of the classifier on transitional data was only 64%. However, if the classifier’s
second choice was also considered, the rate of correct classification for transitional relevés
was 80%.
Conclusions: Our approach stresses the transitional nature of relevé data coming from vegetation
databases. Relevé selection is justified in order to adequately represent the vegetation concepts
associated with expert-defined units.
Keywords: Fuzzy sets; Expert systems; Possibilistic C-means; Phytosociological data;
Syntaxonomy.
Introduction
In recent years, there has been a renewed interest in vegetation classification, even in
parts of the world with little phytosociological tradition (e.g. Rodwell et al. 1995, Jennings 2003).
Nature managers are in need of consistent systems of vegetation classification. Indeed, assigning
a meaningful vegetation type to the plant community observed in a sampling site is the first step
in applied ecological studies, such as landscape mapping, vegetation conservation or restoration
planning. Such assignment (i.e. the determination of the community type) would be a simpler
task if the identification of possible types was done through the use of remote expert systems of
vegetation classification (Noble 1987). To date, there is no expert system specifically designed
for providing web-based vegetation classification services on the basis of species
composition/abundance. Nevertheless, several local computer programs are already available for
this purpose (van Tongeren 1986, Hill 1996, Pot 1997, Tichý 2002, van Tongeren et al. 2008),
and Czech vegetation scientists distribute expert system configurations to be used locally within
the JUICE program (Chytrý 2007).
Methodologically speaking, the act of identifying the predefined class or classes to which a
given plant community may be assigned is usually called supervised classification. Standard
statistical tools such as quadratic discriminant analysis (Ejrnæs et al. 2004) and especially artificial
neural networks (Cerná & Chytrý 2005) have recently been advocated as efficient
methodological approaches for the identification of plot data. Simpler but more easily
interpretable approaches consist in calculating resemblance values between the target relevé and
each of the predefined vegetation units. After that, the relevé is identified with the nearest unit(s).
Relevé resemblance computation may be performed by combining information from species
composition, abundance values, and/or the presence of diagnostic species (van Tongeren 1986,
Hill 1989, Kocí et al. 2003, Tichý 2005, van Tongeren et al. 2008). Another approach consists in
identifying potential units by progressing downward from higher to lower hierarchical levels (Pot
1997). In either case, a classifier is developed from a training data set of plot observations
whose classification is previously known and is assumed to be valid. Such an assumption can be a
source of problems in expert domains where it does not hold, or when there is no consensus on
the classification of the training set. Traditional expert-based vegetation classifications usually
suffer from several inconsistencies (i.e. different researchers used variable and sometimes not
explicitly stated classification criteria) and/or contain loosely defined units (i.e. plant
communities defined by the occurrence, dominance, or absence of particular species). Under this
scenario, supervised classification methods may spread potentially wrong knowledge if traditional
expert-defined classifications are not previously validated using a common classification
criterion. Since contemporary vegetation scientists are increasingly using numerical clustering
(i.e. unsupervised) methods to derive new vegetation units (Mucina & van der Maarel 1989,
Mucina 1997), these methods should also be used to review traditional classifications. However, note that
current conservation policies, like those of the Natura 2000 networking program, are based on
habitat definitions (e.g. the CORINE biotopes manual, Devillers et al. 1991), which in turn rely
on traditional phytosociological units. Therefore, drastic changes in regional/national vegetation
classifications can be problematic and should be avoided. Even if traditional vegetation units are
considered valid, we believe the classification criterion of supervised classifications should be
congruent with the one used in the original classification of the training data set. Otherwise,
the efficiency and/or interpretation of results may be affected. This explains why
supervised approaches emulating traditional phytosociological concepts perform better when the
expert classification of the training set is used instead of that resulting from numerical clustering
analyses (e.g. van Tongeren et al. 2008).
The aim of this paper is to propose a methodological framework for translating low-level
expert-defined vegetation units into an automatic vegetation identifier. It consists of two main
steps. First, we use Possibilistic C-means, a fuzzy unsupervised classification method, to
reproduce expert-defined vegetation units. Second, cluster centroids resulting from the first step
are used to identify new observations by means of a fuzzy classifier. We use the traditional
phytosociological classification of the Catalan vegetation to build numerical clusters and to
evaluate the classifier’s ability to provide satisfactory answers at the precision level of
association.
Material and Methods
Data sets and data transformations
We took the traditional phytosociological classification of terrestrial vegetation in Catalonia,
north-eastern Spain. In order to span a broad range of vegetation types, we considered eight
syntaxonomical orders (see Table 1), which include different types of grasslands, shrublands and
forests. For each order we compiled all relevés from those phytosociological associations
containing at least 3 representatives. Relevés were drawn from the Biodiversity Data Bank of
Catalonia (Font 2008). Original authors of the relevés had assigned them to associations or
subassociations that were fitted into the syntaxonomical classification made by Bolòs & Vigo
(1984). Only Brometalia erecti grasslands had undergone a numerical revision, based on
correspondence analyses, of the original expert-based classification (Font 1993). We did not
perform any stratified re-sampling (Knollová et al. 2005), nor did we eliminate those relevés
with unusually small or large plot sizes (Otýpková & Chytrý 2006). Relevé compilation resulted
in eight distinct datasets, one corresponding to each order. Taken together, we considered 3677
relevés, which belong to 222 distinct low-level (i.e. association or subassociation) vegetation
units. These vegetation types were the expert knowledge to be validated and emulated by means
of numerical methods.
Species nomenclature was homogenized using a regional flora (Bolòs et al. 1990). Unsure
plant determinations, determinations not reaching the species level and taxon names not
appearing in the flora were eliminated. Although they are not consistently reported, we kept
cryptogam records because they are diagnostic for some vegetation units. Braun-Blanquet
cover-abundance values were first transformed to the nine-degree ordinal scale (van der Maarel 1979).
We then applied the Hellinger transformation (Legendre & Gallagher 2001). The Hellinger
distance (Rao 1995, Legendre & Legendre 1998) is equal to the chord distance (Orlóci 1967)
computed after taking the square root of the abundance values. The multivariate space provided
by the Hellinger distance was used to define numerical clusters reproducing expert-defined
vegetation units.
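The equivalence between the Hellinger distance and the chord distance on square-rooted abundances can be checked numerically. The following is a minimal sketch, not the code used in this study; the function names and the two example relevés (already on the nine-degree ordinal scale, with 0 meaning absent) are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hellinger_transform(abundances):
    """Square root of the relative abundance of each species in one relevé."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("empty relevé: no species recorded")
    return [math.sqrt(a / total) for a in abundances]

def hellinger_distance(x, y):
    """Euclidean distance between Hellinger-transformed relevés."""
    return euclidean(hellinger_transform(x), hellinger_transform(y))

def chord_distance(x, y):
    """Euclidean distance after scaling each relevé to unit length."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return euclidean([a / norm_x for a in x], [b / norm_y for b in y])

# Hypothetical relevés: ordinal abundance of four species (0 = absent).
releve_a = [5, 3, 0, 2]
releve_b = [0, 3, 7, 2]

d_hellinger = hellinger_distance(releve_a, releve_b)
d_chord_sqrt = chord_distance([math.sqrt(v) for v in releve_a],
                              [math.sqrt(v) for v in releve_b])
print(d_hellinger, d_chord_sqrt)  # the two values agree up to rounding error
```

Both distances are bounded above by the square root of 2, which is reached when two relevés share no species.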
Cluster model
In the opinion of many vegetation scientists, vegetation types are not crisp classes but types
that are conceptually vague and fuzzy (e.g. Dale 1988, Moraczewski 1993, Willner 2006).
Therefore, any numerical classification of vegetation should allow some degree of overlap and
even allow leaving some relevés unclassified. Setting a hierarchical tree or a partition (either
fuzzy or crisp) as classification model seemed excessively constraining to us. In addition, we
wanted a cluster model where new clusters could be defined without changing all those clusters
previously defined. For these two reasons, we turned our attention to the Possibilistic
C-Means algorithm (PCM, Krishnapuram & Keller 1993, 1996), which implements a clustering
model where clusters are both fuzzy and independent. The PCM algorithm originated from Fuzzy
C-means (FCM, Bezdek 1981), an unsupervised partitive clustering procedure well-known among
vegetation scientists (Marsili-Libelli 1989, Mucina 1997). Table 2 summarizes the mathematical
differences between PCM and FCM models. In the possibilistic approach, fuzzy membership
values are not relative (i.e. probabilistic) as in FCM, but are interpreted as absolute cluster
typicalities. Cluster independence is obtained because the partition constraint of FCM is
eliminated. That is, for any object the sum of its possibilistic membership values does not have to
be one. Resulting from this fundamental difference, PCM is a mode-seeking algorithm. That is, in
PCM each vegetation cluster corresponds to a dense region in the multivariate space of relations
between plots. A single PCM run can be regarded as c independent runs of an algorithm looking
for a single cluster (Davé & Krishnapuram 1997). The PCM model solves the FCM problem
raised by Dale (1995), consisting of the possible data contamination resulting from types not
well represented and whose centre lies outside the available data. Fig. 1 further illustrates the
differences between the two models, by showing their corresponding results on relevé data from
three xerophytic grassland associations.
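The contrast between relative (FCM) and absolute (PCM) memberships can be illustrated with the membership functions as they are commonly written in the literature (Bezdek 1981; Krishnapuram & Keller 1993). This is a sketch under stated assumptions: Table 2 is not reproduced in this text, so the exact notation may differ, and the distances, cluster sizes (eta) and fuzziness exponent m = 2 below are hypothetical values chosen only for readability.

```python
def fcm_memberships(distances, m=2.0):
    """Relative memberships of one object to every cluster (they sum to 1).

    distances[i] is the distance from the object to centroid i.
    """
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        # The object coincides with one or more centroids.
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]

def pcm_memberships(distances, etas, m=2.0):
    """Absolute typicalities: each cluster is evaluated independently.

    etas[i] is the size parameter of cluster i (assumed given here).
    """
    exponent = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / eta) ** exponent)
            for d, eta in zip(distances, etas)]

# Hypothetical outlier, equally far from two cluster centroids.
outlier_distances = [2.0, 2.0]
cluster_sizes = [0.25, 0.25]

u_fcm = fcm_memberships(outlier_distances)
u_pcm = pcm_memberships(outlier_distances, cluster_sizes)
print(u_fcm)  # FCM splits full membership between the two clusters
print(u_pcm)  # PCM gives a low typicality to both clusters
```

Under FCM the outlier still contributes half of its weight to each centroid, whereas under PCM it is nearly ignored by both, which is the behaviour that allows relevés to remain unclassified.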
Reproduction of traditional units into numerical clusters
Whenever possible, we created one possibilistic fuzzy cluster for each traditional low-level
vegetation unit (syntaxonomical association or subassociation). One additional advantage of
PCM over FCM is that it avoids the need to specify the number of clusters to be sought.
Instead, distinct clusters are permitted as long as they represent distinguishable dense regions of
the multivariate space. In our case, we considered two clusters as distinguishable when their
amount of overlap was less than 10% (see below). We used this criterion to detect poorly defined
vegetation units. Moreover, relevé databases may be plagued with noisy and transitional plot
data. Indiscriminately including all the available relevés would preclude the PCM algorithm from
distinguishing many expert-defined units. Therefore some relevés were discarded during the
reproduction process.
The following steps were performed for each of the eight datasets. We started by taking those
relevés belonging to the first low-level vegetation unit. The one-cluster PCM algorithm was then
run on this initial training relevé set, using the three closest relevés as starting cluster members.
The fuzziness exponent was set to m = 1.03, which is a rather crisp value but allowed higher
sensitivity of the algorithm. The PCM cluster size parameter (ηi in Table 2) was then
progressively augmented in order to make the cluster grow. This was done by using a method
described in De Cáceres et al. (2006), which allows finding appropriate PCM cluster sizes. Once
grown, the relevés showing very low membership values (i.e. with uij < 0.0001) were excluded
from the training data set and stored in a set of transitional (non-typical) relevés. The final cluster
configuration was also stored. After reproducing this first unit, the relevés belonging to the next
vegetation unit were included in the training relevé set, and we let the previously defined PCM
cluster(s) “react” to the newly added relevés by running the PCM algorithm from its last
configuration, also allowing for changes in the cluster size parameter (De Cáceres et al. 2006). It
could happen that some of the newly added relevés became members (i.e. with a possibilistic
fuzzy membership uij > 0.1) of any of the previous clusters. In this case, those relevés were
deemed transitional, and they were also excluded from the training set and stored in the
transitional set. We then reloaded the stored cluster configuration(s) and the “reacting” process
was rerun without the noisy transitional relevés. This was repeated until all previously defined
PCM clusters were stable to the new relevés. Note that this process of discarding relevés could
leave a given vegetation unit without enough relevés for being translated into a numerical cluster.
If enough relevés were left, we used the three closest relevés as starting cluster members for a
new PCM cluster, which was grown as described above. Any given PCM cluster reproducing a
low-level expert unit was only accepted when it fulfilled the following three conditions: (a)
the sum of membership values for the fuzzy cluster set (i.e. its cardinality) was equal to or greater
than 3; (b) all relevés with a membership value for the fuzzy cluster above 0.1 had been classified
by experts into the same vegetation type (this condition ensured that the PCM cluster represented
the proper vegetation concept); and (c) the proportion of overlap between the fuzzy cluster and
any of the remaining PCM clusters was lower than 10%. Cluster overlap between any pair of
clusters was calculated as the cardinality of the fuzzy intersection set divided by the cardinality of
the fuzzy union set. Whenever a cluster failed to be accepted, a distinct set of three relevés was
used as starting cluster configuration. The steps described above were iteratively repeated until
all the traditional low-level vegetation units had been considered. Subassociations were given
priority over associations as units to be reproduced. This algorithm yielded three sets: (1) a final
training set, made of typical relevés only (this is hereafter also referred to as the typical relevé
set); (2) a transitional set, containing those relevés that were outliers or similar to more than one
numerical cluster; and (3) a set of PCM fuzzy clusters corresponding to reproduced
expert-defined vegetation units.
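The three acceptance conditions can be expressed compactly once the membership vectors are available. The sketch below is illustrative only: it assumes that fuzzy intersection and union are taken as the element-wise minimum and maximum (the usual fuzzy-set operators, which the text does not state explicitly), and all function names, labels and membership values are hypothetical.

```python
def cardinality(memberships):
    """Cardinality of a fuzzy set: the sum of its membership values."""
    return sum(memberships)

def overlap(u_a, u_b):
    """Cardinality of the fuzzy intersection over that of the fuzzy union.

    Assumes min/max as intersection/union operators.
    """
    intersection = sum(min(a, b) for a, b in zip(u_a, u_b))
    union = sum(max(a, b) for a, b in zip(u_a, u_b))
    return intersection / union if union > 0 else 0.0

def accept_cluster(u_new, expert_units, target_unit, other_clusters,
                   min_cardinality=3.0, member_threshold=0.1,
                   max_overlap=0.10):
    """Check conditions (a), (b) and (c) for a candidate PCM cluster."""
    cond_a = cardinality(u_new) >= min_cardinality
    cond_b = all(unit == target_unit
                 for u, unit in zip(u_new, expert_units)
                 if u > member_threshold)
    cond_c = all(overlap(u_new, u_other) < max_overlap
                 for u_other in other_clusters)
    return cond_a and cond_b and cond_c

# Hypothetical memberships of six relevés to a candidate cluster for unit "A"
# and to one previously accepted cluster (unit "B").
u_candidate = [0.90, 0.95, 0.80, 0.70, 0.02, 0.00]
u_previous = [0.00, 0.01, 0.00, 0.05, 0.90, 0.95]
expert_units = ["A", "A", "A", "A", "B", "B"]

accepted = accept_cluster(u_candidate, expert_units, "A", [u_previous])
print(accepted)  # True: cardinality 3.37, members all from "A", overlap about 1.5%
```

If one of the four high-membership relevés had been assigned by experts to a different unit, condition (b) would fail and a new starting configuration would have to be tried.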
Supervised classification of relevés
We used the probabilistic approach of FCM to perform supervised classifications. In order to
use this unsupervised method in a supervised mode, the centroid coordinates for each of the fuzzy
clusters must be considered static (but see the leave-one-out procedure below). Supervised FCM
classification of any relevé j was performed in two simple steps: (1) Compute eij, the distance
between the relevé j and each fixed cluster centroid i; and (2) Compute uij, the relevé fuzzy
membership to each cluster i, by using the FCM membership function (eq. 1 in Table 2). We set
the fuzziness exponent to m = 1.2 in this case, as recommended by several authors (e.g.
Marsili-Libelli 1989, Podani 1990, Escudero & Pajarón 1994).
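The two steps of the supervised classification can be sketched as follows. This is a hedged illustration rather than the implementation used in the study: it uses the FCM membership function in its commonly published form (Table 2 is not reproduced in this text), Euclidean distances between already transformed coordinates, and hypothetical unit names and centroid coordinates.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def fcm_memberships(distances, m=1.2):
    """FCM memberships of one relevé given its distances to fixed centroids."""
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]

def classify(releve, centroids, m=1.2):
    """Step 1: distances to static centroids. Step 2: FCM memberships.

    Returns (unit, membership) pairs sorted from best to worst choice.
    """
    names = list(centroids)
    distances = [euclidean(releve, centroids[name]) for name in names]
    memberships = fcm_memberships(distances, m)
    return sorted(zip(names, memberships),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical static centroids and target relevé in a 3-species space.
centroids = {
    "unit_A": [0.8, 0.6, 0.0],
    "unit_B": [0.0, 0.6, 0.8],
    "unit_C": [0.6, 0.0, 0.8],
}
ranking = classify([0.75, 0.65, 0.12], centroids)
print(ranking[0][0], ranking[1][0])  # first and second choice of the classifier
```

Because the memberships are relative, they always sum to one across units, which is what allows the first and second choices to be read as a ranking.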
Evaluation of the classifier
Our objective was to assess the performance of the classifier by measuring its rate of correct
identification at the precision level of association. If a given association (and its possible
subassociations) had not been reproduced, it was not represented in the set of fuzzy clusters.
Hence, its relevés could not be used to evaluate the classifier’s performance. However, if some
subassociations of an association or the association itself had been reproduced, then all its
subassociations were considered to be represented because in this case the classifier was capable
of returning a correct answer at the level of association.
Both typical and transitional relevé sets were used for the evaluation of the classifier. Since
relevés of the transitional set had been discarded in the definition of PCM clusters, they could be
used as a test set. However, relevés of the typical (training) set exerted an attraction on the
centroids, and thus their re-classification was biased. Aiming to remove this bias, we used a
leave-one-out cross-validation procedure. For each training relevé to be classified, we temporarily
removed it from the training set and the PCM clusters were allowed to “react” as explained
above. After this step, identification could be done without the influence of the target relevé on
cluster centroids.
The classifier responses were homogenized at the level of association. For each represented
association within each order we estimated the sensitivity and positive predictive power of the
classifier (see Cerná & Chytrý 2005 for details). We also calculated rates of correct association
identification for each of the eight datasets, and for all datasets taken together. In order to gain
more detailed information on the classifier’s performance, we repeated this efficiency assessment
also taking into account the second choice as an additional source of correct identification.
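The evaluation measures can be made concrete with a small example. The sketch assumes the usual definitions (sensitivity as the proportion of relevés of an association that are assigned to it, positive predictive power as the proportion of relevés assigned to an association that truly belong to it); the association labels and classifier rankings are hypothetical.

```python
def sensitivity_and_ppv(true_units, first_choices, unit):
    """Sensitivity and positive predictive power for one association."""
    pairs = list(zip(true_units, first_choices))
    tp = sum(1 for t, p in pairs if t == unit and p == unit)
    fn = sum(1 for t, p in pairs if t == unit and p != unit)
    fp = sum(1 for t, p in pairs if t != unit and p == unit)
    sensitivity = tp / (tp + fn) if tp + fn else None
    ppv = tp / (tp + fp) if tp + fp else None
    return sensitivity, ppv

def correct_rate(true_units, ranked_choices, top=1):
    """Proportion of relevés whose true unit is among the first `top` choices."""
    hits = sum(1 for t, ranked in zip(true_units, ranked_choices)
               if t in ranked[:top])
    return hits / len(true_units)

# Hypothetical expert assignments and classifier rankings for five relevés.
true_units = ["A", "A", "B", "B", "C"]
ranked_choices = [["A", "B"], ["B", "A"], ["B", "C"], ["C", "B"], ["C", "A"]]
first_choices = [ranked[0] for ranked in ranked_choices]

rate_first = correct_rate(true_units, ranked_choices, top=1)
rate_first_or_second = correct_rate(true_units, ranked_choices, top=2)
sens_b, ppv_b = sensitivity_and_ppv(true_units, first_choices, "B")
print(rate_first, rate_first_or_second, sens_b, ppv_b)
```

Counting the second choice as correct can only raise the rate, so the gap between the two rates gives an indication of how many misidentified relevés lie between two neighbouring units.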
Results
Reproduction of traditional units
Among the 222 original low-level units, 166 (75%) could be numerically reproduced using
the strategy described (see Table 1). Only two of the 56 non-reproduced units were associations. The
remaining 54 non-reproduced units were subassociations, which means that in all these cases
other subassociations of the same association could be reproduced. Approximately 39% of the
original relevés were finally kept in the training set, but this percentage varied from 31% (for
Fagetalia beech forests) to 57% (for Galio-Alliarietalia megaforb communities). Hence, nearly
25% of the expert-defined vegetation units and 61% of the relevés can be considered of
transitional nature following our cluster building criteria.
Performance of the vegetation classifier
The two non-reproduced associations accounted for 27 relevés. The remaining 3650 relevés
belonged to associations represented in the classifier, so they were used to assess its performance.
We report detailed result tables on the sensitivity and positive predictive power for each
association in App. 1. We show in Table 3 the rates of correct identification computed for the
eight datasets independently and altogether. The overall rate of correct association identification
for the typical relevés was very high: 95% of relevés were classified into the correct association
in the first choice, and 99% taking into account the first and second choices of the classifier (see
Table 3). This high rate of success is not surprising, since the relevés of this set were those
which, by definition, were closest to cluster centroids. In contrast, the classifier identified the
correct association for 64% of the relevés of the transitional set. Nevertheless, if we take into
account the transitional nature of these relevés, the percentage of correct identification using the
first and second choices may be a more realistic measure of performance. Over all
phytosociological orders, this latter percentage was 79.5%. Identification of beech forests
(Fagetalia sylvaticae) was the least successful (66%) and that of Quercus ilex forests and related
communities (Quercetalia ilicis) the most successful (89.3%). When considering both typical and
transitional relevés, the estimated overall efficiency of the classifier was 76.3% of correct
identification on first choice, and 86.9% considering also the classifier’s second choice.
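As a rough consistency check, the overall rates can be approximated by weighting the rates of the typical and transitional sets by their approximate shares of the relevés (39% and 61%). This is only an approximation under the assumption that the shares reported for all 3677 relevés also hold for the 3650 relevés actually evaluated, and it uses the rounded percentages given in the text.

```latex
\begin{align*}
\text{first choice:} \quad
  & 0.39 \times 0.95 + 0.61 \times 0.64 = 0.3705 + 0.3904 \approx 0.761 \\
\text{first or second choice:} \quad
  & 0.39 \times 0.99 + 0.61 \times 0.795 = 0.3861 + 0.4850 \approx 0.871
\end{align*}
```

Both values are within a few tenths of a percentage point of the reported 76.3% and 86.9%, a difference that is compatible with rounding of the input percentages.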
Discussion
Reproduction of traditional classifications
Previous attempts to reproduce traditional vegetation classifications usually forced the
reproduction of all expert-defined units into the classifier (e.g. van Tongeren 1986, Hill 1989, van
Tongeren et al. 2008). In the case of Kocí et al. (2003), the use of the Cocktail algorithm
(Bruelheide 2000) allowed excluding poorly differentiated units, but their approach was still
essentially expert-based (Chytrý 2007). Going a step further, we stressed here the necessity of
validating traditional vegetation units through the use of an unsupervised clustering method.
Although we tried to maximize the number of vegetation types that could be numerically
reproduced, 25% of the original low-level units could not be upheld.
Subassociations turned out to be more difficult to reproduce because many of them are
traditionally defined as a subclass of an association that shows a tendency towards an
ecologically neighbouring association (in other words, they are transitional).
Moreover, in previous approaches relevé identification was usually performed using
assignment rules that were different from the rules originally used in the classification of training
data (e.g. Kocí et al. 2003, Tichý 2005, van Tongeren et al. 2008). We preferred to use the
resemblance in species abundance values only, as a simple common criterion for both
unsupervised and supervised classification. Not using Cocktail’s species groups but overall
species composition has the advantage that it allows reproducing units lacking differential species
(i.e. ‘basal’ or ‘central’ communities). However, the classifier is not expected to provide accurate
results with such units due to their high variability and large number of transitional relevés.
Performance of the vegetation classifier
Whereas inconsistency in the original classification methods can be avoided by applying
numerical clustering, it reappears when attempting to evaluate the efficiency of the classifier
because the reference classification is expert-based. That is, the precision in the original
assignments may be affecting the percentages of successful identification. In addition, relevés
belonging to transitional subassociations were more difficult to classify correctly than relevés
belonging to reproduced vegetation units (even if both were represented at the level of
association). This occurred because the classifier lacked centroids to represent these units and
hence their relevés were assigned to one of the neighbouring units. The high number of
unrecognized subassociations in Fagetalia beech forests (see Table 1) may account for the low
classifier results on this data set (Table 3). There are other possible sources of low supervised
classification efficiency, derived from inconsistencies in the sampling methods that different
authors use. Otýpková & Chytrý (2006) showed that smaller plots tend to produce less stable
ordinations in data sets of low beta diversity. The reading of their findings in terms of
classification is that relevés from small plots may be easily misclassified because of their higher
degree of variability both in species presence and abundance. The same reasoning may be applied
to the inconsistent recording of cryptogams.
Sampling and the appropriate representation of vegetation types
We carefully selected the relevés included in the training set, which certainly is a critical point
in our approach and must be justified. Statistically speaking, such relevé selection is still a
subjective decision that completely biases sampling and precludes any inference on the validity
of groups. Hence, one cannot expect to accurately reflect the real patterns of vegetation.
Moreover, Cerná & Chytrý (2005) found that selecting plots with diagnostic species as training
set resulted in lower efficiency of neural network classifiers compared to using a randomly
selected training set. Nevertheless, nowadays vegetation scientists generally agree that vegetation
is mainly of continuous nature. Therefore, as long as an optimal vegetation sampling theory is
lacking, statistical inference on clustering results will remain a delicate subject (e.g. Rolecek et
al. 2007). Meanwhile, vegetation classification should not aim at discovering true vegetation
types, but should provide a knowledge basis for performing applied ecological studies. Having
this in mind, we considered it more important to keep the vegetation concept to be reproduced very
clear. We set a specific point in the multivariate space (i.e. the cluster centroid) as the
representative of the expert-defined unit. Not including transitional relevés in the centroid
definition helped in keeping it as an ideal type. Ensuring that the nomenclatural type relevé (if
available) shows a high membership to the unit would be a way to allow using the syntaxon name
for the fuzzy cluster.
Limitations of the numerical cluster model
Note that our numerical cluster model assumes roughly spherical clusters, both when building
PCM clusters and when executing the FCM classifier. One of Dale’s (1995) criticisms of FCM
was its inability to cope with non-spherical cluster shapes. It is possible, however, to allow
hyperellipsoidal clusters in FCM and PCM algorithms (Krishnapuram & Keller 1993) by taking
into account the cluster variance-covariance matrix. Another limitation of our approach is that
the FCM membership function works better with clusters of similar size. The PCM typicality function
may be used instead, but at the expense of obtaining values which cannot be interpreted as
probabilities.
Final remarks and future work
In our opinion, vegetation scientists should decide whether they would prefer: (1) a vegetation
classifier designed as an interface to communicate expert vegetation knowledge to non-experts;
or (2) a computer program like the former, but which could also promote the revision of the
expert knowledge itself. In the first case the program would simply run supervised classification
methods from a knowledge that would be assumed to be true. In contrast, in the second case the
system would allow doubting expert knowledge, and even changing the expert’s point of view. We
believe this second model is more flexible and promising. We implemented our proposals in a
set of related computer programs called Araucaria (see App. 2 and
http://biodiver.bio.ub.es/vegana/araucaria). One of them allows experts to feed the classifier with
new plot data, and see how the current set of PCM clusters “reacts” to this new information.
Regarding future developments, we strongly believe that a comparison of vegetation
classification methodologies is necessary, not only in terms of efficiency but also aiming at a
unification of traditional and numerical approaches. Since vegetation classifications are
regionally restricted, studying solutions for biogeographical issues (e.g. vicariant units) would be
another interesting research topic. Nevertheless, large-scale vegetation expert systems (say, valid
for all of Europe) will certainly be difficult to develop.
Acknowledgements
We would like to thank Lubomir Tichý and an anonymous reviewer for their very useful
comments on a previous version of this manuscript. This study was supported by a Ph.D. grant
awarded by the “Comissionat per a Universitats i Recerca” (1999SGR00059), of the
“Departament d’Universitats, Recerca i Societat de la Informació de la Generalitat de Catalunya”
(2001 FI 00269), and by a research project from the Spanish “Ministerio de Educación y Ciencia”
(CGL2006-13421-C04-01/BOS).
References
Bezdek, J. C. 1981. Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
Bolòs, O. de & Vigo, J. 1984. Flora dels Països Catalans. Vol. 1. Ed. Barcino, Barcelona.
Bolòs, O. de, Vigo, J., Masalles, R. M. & Ninot, J. M. 1990. Flora Manual dels Països Catalans. 2nd ed. Pòrtic, Barcelona.
Braun-Blanquet, J. 1964. Pflanzensoziologie: Grundzüge der Vegetationskunde. Springer.
Bruelheide, H. 2000. A new measure of fidelity and its application to defining species groups. Journal of Vegetation Science 11: 167-178.
Cerná, L. & Chytrý, M. 2005. Supervised classification of plant communities with artificial neural networks. Journal of Vegetation Science 16: 407-414.
Chytrý, M. (ed.) 2007. Vegetation of the Czech Republic. 1. Grassland and Heathland Vegetation. Academia, Praha, 525 pp. http://www.sci.muni.cz/botany/vegsci/expertni_system.php?lang=en
Dale, M. B. 1988. Some fuzzy approaches to phytosociology. Ideals and instances. Folia geobotanica et phytotaxonomica 23: 239-274.
Dale, M. B. 1995. Evaluating classification strategies. Journal of Vegetation Science 6: 437-440.
Davé, R. N. & Krishnapuram, R. 1997. Robust clustering methods: a unified view. IEEE transactions on fuzzy systems 5: 270-293.
De Cáceres, M., Oliva, F. & Font, X. 2006. On relational possibilistic clustering. Pattern recognition 39: 2010-2024.
Devillers, P., Devillers-Terschuren, J. & Ledant, J.-P. 1991. CORINE biotopes manual. Habitats of the European Community. A method to identify and describe consistently sites of major importance for nature conservation. Data specifications - Part 2. Office for Official Publications of the European Communities, Luxembourg.
Ejrnæs, R., Bruun, H. H., Aude, E. & Buchwald, E. 2004. Developing a classifier for the Habitats Directive grassland types in Denmark using species lists for prediction. Applied Vegetation Science 7: 71-80.
Escudero, A. & Pajarón, S. 1994. Numerical syntaxonomy of the Asplenietalia petrarchae in the Iberian Peninsula. Journal of Vegetation Science 5: 205-214.
Font, X. 1993. Estudis geobotànics sobre els prats xeròfils de l’estatge montà dels pirineus. Institut d’Estudis Catalans, Barcelona, ES.
Font, X. 2008. Mòdul Flora i Vegetació. Banc de Dades de Biodiversitat de Catalunya. Generalitat de Catalunya i Universitat de Barcelona. http://biodiver.bio.ub.es/biocat/homepage.html
Hill, M. O. 1989. Computerized matching of relevés and association tables, with an application to the British National Vegetation Classification. Vegetatio 83: 187-194.
Hill, M. O. 1996. TABLEFIT version 1.0, for identification of vegetation types. Institute of Terrestrial Ecology, Huntingdon, UK.
Jennings, M. 2003. Guidelines for Describing Associations and Alliances of the US National Vegetation Classification. Ecological Society of America.
Knollová, I., Chytrý, M., Tichý, L. & Hajek, O. 2005. Stratified resampling of phytosociological databases: some strategies for obtaining more representative data sets for classification studies. Journal of Vegetation Science 16: 479-486.
Kocí, M., Chytrý, M. & Tichý, L. 2003. Formalized reproduction of an expert-based phytosociological classification: A case study of subalpine tall-forb vegetation. Journal of Vegetation Science 14: 601-610.
Krishnapuram, R. & Keller, J. M. 1993. A possibilistic approach to clustering. IEEE transactions on fuzzy systems 1: 98-110.
Krishnapuram, R. & Keller, J. M. 1996. The possibilistic c-means algorithm: Insights and
recommendations. IEEE Transactions on Fuzzy Systems 4: 385-393.
Legendre, P. & Gallagher, E. D. 2001. Ecologically meaningful transformations for ordination of
species data. Oecologia 129: 271-280.
Legendre, P. & Legendre, L. 1998. Numerical Ecology. 2nd English ed. Elsevier.
Marsili-Libelli, S. 1989. Fuzzy clustering of ecological data. Coenoses 4: 95-106.
Moraczewski, I. R. 1993. Fuzzy logic for phytosociology: 1. Syntaxa as vague concepts.
Vegetatio 106: 1-11.
Mucina, L. 1997. Classification of vegetation: Past, present and future. Journal of Vegetation
Science 8: 751-760.
Mucina, L. & van der Maarel, E. 1989. Twenty years of numerical syntaxonomy. Vegetatio 81:
1-15.
Noble, I. R. 1987. The role of expert systems in vegetation science. Vegetatio 69: 115-121.
Orlóci, L. 1967. An agglomerative method for classification of plant communities. Journal of
Ecology 55: 193-206.
Otýpková, Z. & Chytrý, M. 2006. Effects of plot size on the ordination of vegetation samples.
Journal of Vegetation Science 17: 465-472.
Podani, J. 1990. Comparison of fuzzy classifications. Coenoses 5: 17-21.
Pot, R. 1997. SYNDIAT, SYNtaxonomical DIAgnostics Tool, a computer program based on the
deductive method of community identification. Acta Botanica Neerlandica 46: 230.
Rao, C. R. 1995. A review of canonical coordinates and an alternative to correspondence analysis
using Hellinger distance. Qüestiió (Quaderns d'Estadística i Investigació Operativa) 19:
23-63.
Rodwell, J. S., Pignatti, S., Mucina, L. & Schaminée, J. H. J. 1995. European Vegetation Survey:
update on progress. Journal of Vegetation Science 6: 759-762.
Roleček, J., Chytrý, M., Hájek, M., Lvončík, S. & Tichý, L. 2007. Sampling in large-scale
vegetation studies: Do not sacrifice ecological thinking to statistical puritanism. Folia
Geobotanica 42: 199-208.
Tichý, L. 2002. JUICE, software for vegetation classification. Journal of Vegetation Science 13:
451-453.
Tichý, L. 2005. New similarity indices for the assignment of relevés to the vegetation units of an
existing phytosociological classification. Plant Ecology 179: 67-72.
van der Maarel, E. 1979. Transformation of cover-abundance values in phytosociology and its
effects on community similarity. Vegetatio 39: 97-114.
van Tongeren, O. 1986. FLEXCLUS, an interactive program for classification and tabulation of
ecological data. Acta Botanica Neerlandica 35: 137-142.
van Tongeren, O., Gremmen, N. & Hennekens, S. M. 2008. Assignment of relevés to predefined
classes by supervised clustering of plant communities using a new composite index.
Journal of Vegetation Science 19: 525-536.
Willner, W. 2006. The association concept revisited. Phytocoenologia 36: 67-76.
Table 1. The eight phytosociological orders studied and results of the numerical reproduction of
their low-level classification.

| Phytosociological order | Short description | Original units | Original relevés | Reproduced units | Non-reproduced units | Training (typical) relevés |
|---|---|---|---|---|---|---|
| Brometalia erecti | mesophytic or slightly xerophytic pastures | 30 | 531 | 26 | 4 | 231 |
| Origanetalia vulgaris | herb communities growing on forest fringes | 12 | 133 | 10 | 2 | 67 |
| Galio-Alliarietalia | megaforb sciophilous communities | 13 | 124 | 12 | 1 | 71 |
| Prunetalia spinosae | shrub communities growing on deciduous forest fringes | 18 | 353 | 16 | 2 | 161 |
| Populetalia albae | riverine meso-macroforests growing on wet fluvisols with high water-table | 17 | 199 | 10 | 7 | 107 |
| Quercetalia ilicis | mediterranean woodlands, scrublands and maquis | 31 | 753 | 25 | 6 | 254 |
| Quercetalia pubescentis | submediterranean deciduous oak woodlands | 41 | 651 | 30 | 11 | 243 |
| Fagetalia sylvaticae | beech forests | 60 | 933 | 37 | 23 | 286 |
| Total | | 222 | 3677 | 166 | 56 | 1420 |
Table 2. Main mathematical characteristics of the Fuzzy C-means (FCM) and Possibilistic C-means
(PCM) clustering algorithms.

| | FCM | PCM |
|---|---|---|
| Fuzzy membership definition | $\sum_{i=1}^{c} u_{ij} = 1$ for all objects $j = 1, \ldots, n$ | $\sum_{i=1}^{c} u_{ij} > 0$ for all objects $j = 1, \ldots, n$ |
| Optimisation function | $J_{FCM} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} e_{ij}^{2}$ | $J_{PCM} = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} e_{ij}^{2} + \sum_{i=1}^{c} \eta_{i} \sum_{j=1}^{n} (1 - u_{ij})^{m}$ |
| Membership function | $u_{ij} = 1 / \sum_{l=1}^{c} (e_{ij} / e_{lj})^{2/(m-1)}$ (1) | $u_{ij} = 1 / \left(1 + (e_{ij}^{2} / \eta_{i})^{1/(m-1)}\right)$ (2) |
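The two membership functions in Table 2 can be compared with a small numerical sketch. The code below is only an illustration under assumptions not stated in the table itself: `distances[i][j]` is read as the distance e_ij between object j and the centroid of cluster i, `eta[i]` as the reference distance parameter of cluster i, and the function names and toy values are hypothetical. All distances are assumed to be strictly positive.

```python
def fcm_memberships(distances, m):
    """Eq. (1): u_ij = 1 / sum_l (e_ij / e_lj)^(2/(m-1)).

    distances[i][j] is the (strictly positive) distance between
    cluster i and object j; m > 1 is the fuzziness exponent.
    """
    c = len(distances)
    n = len(distances[0])
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        for i in range(c):
            denom = sum((distances[i][j] / distances[l][j]) ** exponent
                        for l in range(c))
            u[i][j] = 1.0 / denom
    return u


def pcm_memberships(distances, eta, m):
    """Eq. (2): u_ij = 1 / (1 + (e_ij^2 / eta_i)^(1/(m-1))).

    Each cluster is evaluated independently of the others.
    """
    exponent = 1.0 / (m - 1.0)
    return [[1.0 / (1.0 + (e ** 2 / eta[i]) ** exponent) for e in row]
            for i, row in enumerate(distances)]


# Toy data (hypothetical): 2 clusters, 3 objects.
# Object 0 is close to cluster 0, object 1 is equidistant,
# object 2 is far from both clusters.
distances = [[0.2, 0.5, 0.9],
             [0.8, 0.5, 0.9]]
eta = [0.25, 0.25]

u_fcm = fcm_memberships(distances, m=1.2)
u_pcm = pcm_memberships(distances, eta, m=1.2)
```

In this toy case the distant object 2 still receives FCM memberships that sum to one, whereas its PCM memberships are close to zero for both clusters. This is the property that allows transitional or atypical relevés to be detected.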
Table 3. Classification efficiency of the numerical classifier at the association level. Column
blocks list the efficiency on the typical and transitional relevé sets, as well as the overall
efficiency for the represented associations. Ass.: Number of represented associations. Rel.:
Number of relevés. %: Percentage of relevés correctly classified; L/U: Lower/upper 95%
confidence limits following the binomial distribution.

| Phytosociological order | Ass. | Typical Rel. | Typical 1st choice % | L | U | Typical 1st/2nd choice % | L | U | Transitional Rel. | Transitional 1st choice % | L | U | Transitional 1st/2nd choice % | L | U | Represented Rel. | Represented 1st choice % | L | U | Represented 1st/2nd choice % | L | U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brometalia erecti | 20 | 231 | 97.4 | 94.4 | 99.0 | 99.1 | 96.9 | 99.9 | 285 | 68.8 | 63.5 | 74.6 | 85.6 | 81.1 | 89.6 | 516 | 81.6 | 78.4 | 85.2 | 91.7 | 89.1 | 94.0 |
| Origanetalia vulgaris | 10 | 67 | 92.5 | 83.4 | 97.5 | 100.0 | 94.6 | 100.0 | 66 | 39.4 | 27.6 | 52.2 | 78.8 | 67.0 | 87.9 | 133 | 66.2 | 57.5 | 74.1 | 89.5 | 83.0 | 94.1 |
| Galio-Alliarietalia | 11 | 71 | 94.4 | 86.2 | 98.4 | 97.2 | 90.2 | 99.7 | 53 | 56.6 | 42.3 | 70.2 | 73.6 | 59.7 | 84.7 | 124 | 78.2 | 69.9 | 85.1 | 87.1 | 79.9 | 92.4 |
| Prunetalia spinosae | 9 | 161 | 96.3 | 92.1 | 98.6 | 98.8 | 95.6 | 99.8 | 192 | 72.9 | 66.0 | 79.1 | 85.9 | 80.2 | 90.5 | 353 | 83.6 | 79.3 | 87.3 | 91.8 | 88.4 | 94.4 |
| Populetalia albae | 7 | 107 | 92.5 | 85.8 | 96.7 | 94.4 | 88.2 | 97.9 | 92 | 64.1 | 53.5 | 73.9 | 82.6 | 73.3 | 89.7 | 199 | 79.4 | 73.1 | 84.8 | 88.9 | 83.7 | 92.9 |
| Quercetalia ilicis | 13 | 254 | 99.2 | 97.2 | 99.9 | 99.2 | 97.2 | 99.9 | 487 | 80.9 | 79.4 | 86.7 | 89.3 | 88.2 | 93.8 | 741 | 87.2 | 86.7 | 91.5 | 92.7 | 92.2 | 95.9 |
| Quercetalia pubescentis | 10 | 243 | 90.5 | 86.1 | 93.9 | 98.8 | 96.4 | 99.7 | 408 | 65.7 | 60.9 | 70.3 | 82.1 | 78.0 | 85.7 | 651 | 75.0 | 71.4 | 78.2 | 88.3 | 85.6 | 90.7 |
| Fagetalia sylvaticae | 22 | 286 | 96.2 | 93.2 | 98.1 | 99.0 | 97.0 | 99.8 | 647 | 49.0 | 45.1 | 52.9 | 66.0 | 62.2 | 69.6 | 933 | 63.5 | 60.3 | 66.5 | 76.1 | 73.2 | 78.8 |
| Total | 102 | 1420 | 95.4 | 94.2 | 96.4 | 98.6 | 97.8 | 99.1 | 2230 | 64.1 | 62.1 | 66.2 | 79.5 | 77.9 | 81.3 | 3650 | 76.3 | 75.1 | 77.9 | 86.9 | 86.0 | 88.2 |
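The caption of Table 3 states only that the confidence limits follow the binomial distribution. One common way to obtain such limits is the exact (Clopper-Pearson) interval, and the sketch below assumes that method; this is an assumption and is not confirmed by the text. It is at least consistent with the row where 67 of 67 relevés are correct, for which the exact lower limit is 0.025^(1/67), approximately 94.6%. The function names are hypothetical, and the count 225 of 231 is inferred here from the reported 97.4%.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k < 0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        # Work in log space to avoid overflow in the binomial coefficient.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect(f, target, increasing, iters=100):
    """Solve f(p) = target on (0, 1) for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) limits for k successes out of n trials."""
    # Lower limit: P(X >= k | p) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper limit: P(X <= k | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


# Typical relevés of Brometalia erecti: 225 of 231 correct (inferred count).
low, high = clopper_pearson(225, 231)
print(round(100 * low, 1), round(100 * high, 1))
```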
Fig. 1: Example of clustering results of FCM and PCM on relevés belonging to three grassland
associations of Brometalia erecti. (a) Classical multidimensional scaling coordinates from
Bray-Curtis distances, with the original vegetation units labelled using different symbols (filled
circles: Koelerio-Avenuletum ibericae; squares: Adonido-Brometum erecti; diamonds: Lino
viscosi-Brometum erecti; empty circles: intermediate artificial relevés created by averaging
randomly selected relevés from the three groups). (b) FCM (m=1.2) solution with three groups.
(c) PCM (m=1.09) solution with three groups, after setting appropriate reference distance
parameters as described in De Cáceres et al. (2006). Symbol size and colour intensity are a
function of the object’s largest membership value.
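The two data-preparation steps mentioned in the caption, Bray-Curtis distances and artificial intermediate relevés, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the caption does not say how many relevés were averaged or how they were drawn, so the sketch assumes one randomly chosen relevé per group. The function names and the toy abundance vectors are hypothetical.

```python
import random


def bray_curtis(x, y):
    """Bray-Curtis distance between two abundance vectors of equal length."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def intermediate_releve(groups, rng):
    """Average one randomly chosen relevé from each group (assumed scheme)."""
    picks = [rng.choice(group) for group in groups]
    n_species = len(picks[0])
    return [sum(p[s] for p in picks) / len(picks) for s in range(n_species)]


# Toy data (hypothetical): three groups, four species.
groups = [
    [[5, 3, 0, 0], [4, 4, 0, 0]],
    [[0, 5, 3, 0], [0, 4, 4, 0]],
    [[0, 0, 5, 3], [0, 0, 4, 4]],
]
rng = random.Random(1)
mixed = intermediate_releve(groups, rng)
# Distance from the artificial relevé to one relevé of each group.
d_to_groups = [bray_curtis(mixed, group[0]) for group in groups]
```

A relevé built this way lies between the groups by construction, which makes it a convenient test case for the way a clustering method handles transitional objects.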