Classification
(Dis)similarity measures, resemblance functions
Cluster analysis
TWINSPAN
-
Similarity measures
Each ordination or classification method is based (explicitly or implicitly) on some similarity measure.
(Compare the two possible formulations of the ordination problem.)
-
Similarities (dissimilarities, distances)
Resemblance functions: the term covers both similarities and dissimilarities.
If 0 ≤ S ≤ 1, then often D = 1 − S, or D = √(1 − S), or D = √(1 − S²).
Different indices are usually used for sample similarity than for species similarity.
Similarity of two cases (samples) has a meaning by itself; similarity of two species has a meaning only in relation to the data set.
The species set is fixed (e.g. all vascular plant species), whereas the cases are usually a random selection from a population of sites.
-
Distance should fulfil the triangle inequality: for any three points A, B and C,
AB ≤ AC + BC
-
Resemblance functions
Probably hundreds have been proposed and tens are in use.

Data type                 Comparing cases (Q)                                        Comparing species (R)
Presence/absence (0/1)    Sørensen coefficient, Jaccard coefficient                  Pearson φ (V) coefficient, Yule (Q) coefficient
Quantitative              Euclidean distance, χ² distance, percentage similarity     Correlation coefficients, χ² distance
-
Case similarity based on qualitative data

                          Species in sample B
                            +       -
Species in sample A    +    a       b
                       -    c       d

Sørensen: S = 2a / (2a + b + c)
Jaccard: J = a / (a + b + c)
Here a is the number of species present in both compared cases; b and c are the numbers of species present in only one of them.
d is the number of species absent in both cases (usually not used); consequently, the value is independent of the other cases in the table.
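The a, b, c counts can be turned into the two coefficients with a few lines of Python. This is a minimal sketch using the usual textbook forms S = 2a / (2a + b + c) and J = a / (a + b + c); the species sets and the helper names (`abc_counts`, `violates_triangle`) are hypothetical. It also checks the triangle inequality for the dissimilarity D = 1 − S on one small example.

```python
def abc_counts(case_a, case_b):
    """Return (a, b, c): shared species, species only in A, species only in B."""
    a = len(case_a & case_b)
    b = len(case_a - case_b)
    c = len(case_b - case_a)
    return a, b, c


def sorensen(case_a, case_b):
    a, b, c = abc_counts(case_a, case_b)
    return 2 * a / (2 * a + b + c)   # undefined if both cases are empty


def jaccard(case_a, case_b):
    a, b, c = abc_counts(case_a, case_b)
    return a / (a + b + c)           # undefined if both cases are empty


def violates_triangle(d_ab, d_ac, d_bc):
    """True if going A -> C -> B would be 'shorter' than going A -> B directly."""
    return d_ab > d_ac + d_bc


# Hypothetical species lists of three cases
A = {"sp1"}
B = {"sp2"}
C = {"sp1", "sp2"}

sorensen_breaks = violates_triangle(
    1 - sorensen(A, B), 1 - sorensen(A, C), 1 - sorensen(B, C))
jaccard_breaks = violates_triangle(
    1 - jaccard(A, B), 1 - jaccard(A, C), 1 - jaccard(B, C))
print(sorensen_breaks, jaccard_breaks)
```

In this particular example 1 − S breaks the triangle inequality while 1 − J does not, which is one reason to check a dissimilarity before treating it as a distance.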
-
Species similarity based on presence/absence

                 Species B
                   +       -
Species A     +    a       b
              -    c       d

V = (ad − bc) / √((a + b)(c + d)(a + c)(b + d))
Q = (ad − bc) / (ad + bc)
d is the number of cases containing neither species; here it is absolutely necessary.
          Table A            Table B
           +      -           +      -
     +    50     50          50     50
     -    50   1000          50      5
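To see why d matters for species similarity, the two coefficients can be computed for contingency tables that differ only in d. The sketch below uses the standard forms of Pearson's V (the phi coefficient) and Yule's Q; the counts are illustrative, and the function names are hypothetical.

```python
import math


def pearson_v(a, b, c, d):
    """Pearson's V (phi coefficient) of a 2 x 2 presence/absence table."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def yule_q(a, b, c, d):
    """Yule's Q of a 2 x 2 presence/absence table."""
    return (a * d - b * c) / (a * d + b * c)


# Illustrative counts: a, b and c are identical, only d changes.
many_double_absences = (50, 50, 50, 1000)
few_double_absences = (50, 50, 50, 5)

for table in (many_double_absences, few_double_absences):
    print(table, round(pearson_v(*table), 3), round(yule_q(*table), 3))
```

With many double absences the two species appear positively associated; with few, the same a, b and c give a negative association.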
-
Species vs. case similarity
Species similarity (i.e. similarity of the species' ecological behaviour, e.g. V, Q) is often scaled from −1 to 1. The null model is independence of the species, in which case V = Q = 0.
Case similarity (S, J) is usually scaled from 0 (no common species) to 1 (identical species composition). No meaningful null model is available (compare a random selection of two sets of species from the species pool?).
-
Quantitative data
Transformation is an algebraic function X'ij = f(Xij) that is applied to each value independently of the other values. Standardization is done either with respect to the values of the other species in the case (standardization by cases) or with respect to the values of the species in the other cases (standardization by species).
Centering means subtracting a mean so that the resulting variable (species) or case has a mean of zero. Standardization usually means dividing each value by the case (species) norm or by the total of all the values in the case (species).
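The difference between a transformation and a standardization is easiest to see on a small table. The following sketch assumes a hypothetical table with cases in rows and species in columns; the function names are illustrative only.

```python
import math

# Hypothetical data: rows are cases, columns are species.
table = [
    [10.0, 0.0, 5.0],
    [2.0, 8.0, 0.0],
    [0.0, 4.0, 6.0],
]


def log_transform(data):
    """Transformation: every value is changed independently of all the others."""
    return [[math.log10(x + 1) for x in row] for row in data]


def standardize_cases_by_total(data):
    """Standardization by cases: each value is divided by its case (row) total."""
    return [[x / sum(row) for x in row] for row in data]


def center_by_species(data):
    """Centering by species: each species (column) gets a mean of zero."""
    n_cases = len(data)
    means = [sum(column) / n_cases for column in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


print(log_transform(table)[0])
print(standardize_cases_by_total(table)[0])
print(center_by_species(table)[0])
```

Changing one value in the table alters only that value after the log transformation, but it alters a whole row after standardization by cases and a whole column after centering by species.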
-
Ordinal transformation of the Braun-Blanquet (Br.-Bl.) scale is roughly equivalent to log transformation of the cover values

Br.-Bl. scale   Ordinal tr.   Cover (Bannister 1966)   log(Cover+1)
r               1             0.1                      0.04139
+               2             0.5                      0.17609
1               3             3                        0.60206
2               4             15                       1.20412
3               5             37.5                     1.58546
4               6             62.5                     1.80277
5               7             87.5                     1.94694
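The claim can be checked numerically. The sketch below recomputes the log10(Cover + 1) column from the cover values in the table and compares how closely the ordinal values follow the log-transformed and the raw cover; `pearson_r` is a small helper written for this example.

```python
import math

ordinal = [1, 2, 3, 4, 5, 6, 7]
cover = [0.1, 0.5, 3, 15, 37.5, 62.5, 87.5]      # cover values from the table
log_cover = [math.log10(c + 1) for c in cover]   # decimal logarithm, as in the table


def pearson_r(x, y):
    """Ordinary (product-moment) correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print([round(v, 5) for v in log_cover])
print(pearson_r(ordinal, log_cover), pearson_r(ordinal, cover))
```

The ordinal values correlate more tightly with log(Cover + 1) than with the raw cover, which is what "roughly equivalent" means here.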
-
Euclidean distance
For ED, standardize by case norm, not by total.

CASES        1    2    3    4    1t    2t    3t    4t    1n    2n    3n    4n
Species 1   10    0    5    0    1     0     0.33  0     1     0     0.58  0
Species 2    0   10    0    5    0     1     0     0.33  0     1     0     0.58
Species 3    0    0    5    0    0     0     0.33  0     0     0     0.58  0
Species 4    0    0    0    5    0     0     0     0.33  0     0     0     0.58
Species 5    0    0    5    0    0     0     0.33  0     0     0     0.58  0
Species 6    0    0    0    5    0     0     0     0.33  0     0     0     0.58

Table 6-1 Hypothetical table with samples 1 and 2 containing one species each and samples 3 and 4 containing three equally abundant species each (for standardized data the actual quantities are not important). Sample 1 has no species in common with sample 2, and sample 3 has no species in common with sample 4. The samples marked t contain values standardized by the total; those marked n are standardized by the sample norm. For samples standardized by total, ED12 = 1.41 (√2), whereas ED34 = 0.82; for samples standardized by sample norm, ED12 = ED34 = 1.41.
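The numbers quoted for this hypothetical table can be reproduced directly. In the sketch below the exact species assigned to cases 3 and 4 are an assumption; all that matters is that each has three equally abundant species and that the two share none.

```python
import math


def by_total(case):
    """Standardize a case by its total (values sum to 1)."""
    total = sum(case)
    return [x / total for x in case]


def by_norm(case):
    """Standardize a case by its norm (sum of squared values is 1)."""
    norm = math.sqrt(sum(x * x for x in case))
    return [x / norm for x in case]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Abundances of six species; species assignment in cases 3 and 4 is assumed.
case1 = [10, 0, 0, 0, 0, 0]
case2 = [0, 10, 0, 0, 0, 0]
case3 = [5, 0, 5, 0, 5, 0]
case4 = [0, 5, 0, 5, 0, 5]

ed12_total = euclidean(by_total(case1), by_total(case2))
ed34_total = euclidean(by_total(case3), by_total(case4))
ed12_norm = euclidean(by_norm(case1), by_norm(case2))
ed34_norm = euclidean(by_norm(case3), by_norm(case4))
print(round(ed12_total, 2), round(ed34_total, 2), round(ed12_norm, 2), round(ed34_norm, 2))
```

After standardization by total, two cases with no species in common look closer simply because they are richer in species; standardization by norm removes that artefact.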
-
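Percentage similarity (quantitative Sørensen)
PS = 200 Σ min(x_k1, x_k2) / Σ (x_k1 + x_k2), where x_k1 and x_k2 are the quantities of species k in the two compared cases
Neither ED nor PS takes into consideration species that are absent from both compared cases.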
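A short sketch of percentage similarity, assuming the commonly used form PS = 200 Σ min(x1, x2) / Σ (x1 + x2). The abundance vectors are hypothetical. The example also shows that adding species absent from both cases leaves PS unchanged, and that for presence/absence data PS reduces to the Sørensen coefficient expressed in percent.

```python
def percentage_similarity(case_1, case_2):
    """Percentage similarity of two cases given as equally long abundance lists."""
    shared = sum(min(x, y) for x, y in zip(case_1, case_2))
    total = sum(case_1) + sum(case_2)
    return 200 * shared / total


# Hypothetical abundances of three species in two cases
x = [10, 0, 5]
y = [5, 5, 0]
print(percentage_similarity(x, y))

# Species absent from both cases do not change the value
print(percentage_similarity(x + [0, 0], y + [0, 0]))

# Presence/absence data: a = 1, b = 1, c = 1, so Sorensen = 2 / 4 = 0.5
print(percentage_similarity([1, 1, 0], [1, 0, 1]))
```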
-
Similarity of species based on quantitative data
Correlation coefficients (ordinary, rank), i.e. again taking into account the cases where both species are missing.
Note the implicit double standardization (by both case and species totals). Consequently, the value changes with the composition of the other cases in the table.
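How strongly the cases lacking both species influence a correlation can be seen on a toy example. The abundances below are hypothetical, and `pearson_r` is a helper written for this sketch.

```python
import math


def pearson_r(x, y):
    """Ordinary (product-moment) correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances of two species in the two cases where they occur
species_a = [3, 1]
species_b = [1, 3]
r_without_absences = pearson_r(species_a, species_b)

# The same two species after four cases containing neither are added to the table
r_with_absences = pearson_r(species_a + [0, 0, 0, 0], species_b + [0, 0, 0, 0])
print(r_without_absences, r_with_absences)
```

The correlation changes sign once the double absences are included, so the similarity of two species depends on which other cases happen to be in the table.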
-
Similarity of samples vs. similarity of communities
Inspired by the seemingly high beta-diversity of insects in the tropics

            Population (%)   Sample 1 (indiv.)   Sample 2 (indiv.)
Spec. 1     5                3                   1
Spec. 2     3                1                   2
Spec. 3     1                0                   1
Spec. 4     1                0                   1
.           1                0                   0
.           1                0                   0
.           1                1                   0
.           1                0                   1
.           1                0                   0
.           1                0                   0
.           1                1                   0
.           1                0                   1
.           1                0                   0
.           1                1                   0
.           1                1                   1
.           1                0                   0
.           0.5              0                   0
.           0.5              1                   1
.           0.2              0                   0
.           0.2              0                   0
.           0.1              1                   0
.           0.1              0                   1
Spec. n     0.1              1                   0
Etc.
-
Normalized expected species shared (NESS) index (Grassle & Smith 1976)
NESS = ESS12 / (0.5 (ESS11 + ESS22)), where
ESS12 = expected number of species in common between two random subsamples of a certain size drawn without replacement from the two compared larger samples
ESS11 = expected number of shared species in two subsamples taken randomly from the first sample
ESS22 = expected number of shared species in two subsamples taken randomly from the second sample
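The three expectations can be approximated by simulation. The sketch below is a Monte Carlo illustration, not the analytical formula of Grassle & Smith: it assumes NESS = ESS12 / (0.5 (ESS11 + ESS22)), represents each sample as a list with one species label per individual, and draws the two subsamples from the same sample independently of each other. Function names, sample contents and the number of draws are hypothetical.

```python
import random


def expected_shared(sample_x, sample_y, m, n_draws, rng):
    """Monte Carlo estimate of the expected number of species shared by a random
    subsample of m individuals from sample_x and another from sample_y."""
    total = 0
    for _ in range(n_draws):
        sub_x = set(rng.sample(sample_x, m))   # drawn without replacement
        sub_y = set(rng.sample(sample_y, m))
        total += len(sub_x & sub_y)
    return total / n_draws


def ness(sample_1, sample_2, m, n_draws=2000, seed=0):
    """Approximate NESS for subsample size m (an estimate, not an exact value)."""
    rng = random.Random(seed)
    ess_12 = expected_shared(sample_1, sample_2, m, n_draws, rng)
    ess_11 = expected_shared(sample_1, sample_1, m, n_draws, rng)
    ess_22 = expected_shared(sample_2, sample_2, m, n_draws, rng)
    return ess_12 / (0.5 * (ess_11 + ess_22))


# Hypothetical samples: one species label per individual
sample_a = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
sample_b = ["a"] * 2 + ["b"] * 2 + ["d"] * 6
print(ness(sample_a, sample_b, m=3))
```

Two samples drawn from the same community should give a value close to 1 even when their raw species lists overlap only partly, which is the point of comparing subsamples rather than whole samples.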
-
Similarity of objects when variables are measured on various scales
-
Gower distance
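A minimal sketch of the Gower distance for mixed data, in its simplest form: quantitative variables contribute |difference| / range, categorical variables contribute 0 for a match and 1 for a mismatch, and the contributions are averaged with equal weights. The objects, the variable types and the function name are hypothetical; missing values and unequal weights are not handled.

```python
def gower_distance(obj_a, obj_b, kinds, ranges):
    """Gower distance of two objects.

    kinds[i] is "num" or "cat"; ranges[i] is the range of a numeric variable
    in the whole data set (ignored for categorical variables).
    """
    parts = []
    for a, b, kind, value_range in zip(obj_a, obj_b, kinds, ranges):
        if kind == "num":
            parts.append(abs(a - b) / value_range)
        else:
            parts.append(0.0 if a == b else 1.0)
    return sum(parts) / len(parts)


# Hypothetical objects described by one quantitative and one categorical variable
objects = [(10.0, "a"), (20.0, "a"), (30.0, "b")]
kinds = ["num", "cat"]
ranges = [30.0 - 10.0, None]

print(gower_distance(objects[0], objects[1], kinds, ranges))
print(gower_distance(objects[0], objects[2], kinds, ranges))
print(gower_distance(objects[1], objects[2], kinds, ranges))
```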
-
Standardization of variables by the s.d. or by the range is sometimes problematic; a possibility is to standardize by variation, see Lepš et al. 2006
-
Similarity matrices are used directly in:
Multidimensional scaling (both metric and non-metric, see Šmilauer's lecture)
Mantel test
-
Mantel test
Question: is there any dependence between two (dis)similarity matrices?
E.g. is the distance between individual plants in physical space correlated with their genetic dissimilarity?
-
Individuals in the plot
[Figure: positions of five plants in the plot, at coordinates (1, 2), (2, 3), (1, 1), (2, 2) and (10, 10).]
Indiv. no. 5: this individual is strange (just one of five)
-
Two dissimilarity matrices

Distance in the plot
plant      1       2       3       4
2       1.41
3       1.00    2.24
4       1.00    1.00    1.41
5      12.04   10.63   12.73   11.31

Genetic distance
plant      1       2       3       4
2        0.1
3        0.2     0.2
4        0.1     0.3     0.2
5        0.9     0.6     0.7     0.8
-
Regression is highly significant
(but the ten pairwise distances, derived from only five plants, are treated as independent observations!)
Moreover, four of the ten observations, all involving individual no. 5, are the largest in both matrices
-
Solution
Permutation test
Not the individual distances, but the individuals are permuted
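The permutation scheme can be written out for the five plants. The sketch below uses the two distance matrices from the slides (rounded values), takes the ordinary correlation between the matrices as the test statistic, and, because five individuals allow only 120 permutations, enumerates all of them instead of sampling. The function names are hypothetical, and the one-sided p-value convention is an assumption.

```python
import math
from itertools import permutations

# Lower triangles of the two matrices (plants 1-5)
spatial_lower = [[], [1.41], [1.00, 2.24], [1.00, 1.00, 1.41],
                 [12.04, 10.63, 12.73, 11.31]]
genetic_lower = [[], [0.1], [0.2, 0.2], [0.1, 0.3, 0.2],
                 [0.9, 0.6, 0.7, 0.8]]


def full_matrix(lower):
    """Expand a lower triangle into a symmetric matrix with a zero diagonal."""
    n = len(lower)
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            matrix[i][j] = matrix[j][i] = value
    return matrix


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matrix_r(m1, m2, order):
    """Correlation of m1 with m2 after relabelling the individuals of m2."""
    n = len(m1)
    x = [m1[i][j] for i in range(n) for j in range(i)]
    y = [m2[order[i]][order[j]] for i in range(n) for j in range(i)]
    return pearson_r(x, y)


def mantel_exact(m1, m2):
    """Observed correlation and one-sided p-value over all permutations."""
    n = len(m1)
    observed = matrix_r(m1, m2, tuple(range(n)))
    all_orders = list(permutations(range(n)))
    as_extreme = sum(matrix_r(m1, m2, p) >= observed - 1e-12 for p in all_orders)
    return observed, as_extreme / len(all_orders)


r_obs, p_value = mantel_exact(full_matrix(spatial_lower), full_matrix(genetic_lower))
print(r_obs, p_value)
```

Permuting whole individuals moves entire rows and columns together, so the dependence among the distances that share a plant is preserved under the null hypothesis.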
-
Classification
  nonhierarchical (e.g., K-means clustering)
  hierarchical
    divisive
      monothetic (e.g. Association analysis): of historical significance only
      polythetic (e.g. TWINSPAN)
    agglomerative (classical cluster analysis)
-
Non-hierarchical classification
K-means clustering
In fact, a reversed ANOVA. In ANOVA, F = MSgroup / MSresidual.
The goal: divide the set into groups so as to maximize F (the multivariate counterpart of F)
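The "reversed ANOVA" idea can be sketched as follows: k-means moves objects between groups to reduce the within-group sum of squares, which for a fixed number of groups is the same as increasing the F-like ratio of between-group to within-group mean squares. The code is a plain Lloyd-type iteration with a deterministic farthest-point start, chosen here only to keep the example reproducible; real programs usually use several random starts. The data and the function names are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iter=100):
    """Return (labels, centers) after alternating assignment and averaging."""
    centers = [points[0]]
    while len(centers) < k:   # farthest-point initialization (a simplification)
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda g: sq_dist(p, centers[g]))
                      for p in points]
        if new_labels == labels:   # no object changed its group: converged
            break
        labels = new_labels
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centers[g] = mean_point(members)
    return labels, centers


def pseudo_f(points, labels, centers):
    """Between-group mean square divided by within-group mean square."""
    n, k = len(points), len(centers)
    grand_mean = mean_point(points)
    ss_within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    ss_between = sum(sq_dist(centers[lab], grand_mean) for lab in labels)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = kmeans(points, k=2)
print(labels, centers, pseudo_f(points, labels, centers))
```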
-
Hierarchical agglomerative classification (cluster analysis)
Original data matrix (species × cases) -> resemblance function -> similarity matrix (samples × samples) -> clustering algorithm -> hierarchical classification of the samples
-
Subjective decisions in the objective procedure
Nevertheless, the procedure is reproducible
Field sampling -> (importance value) -> raw data -> (transformation, standardization, similarity measure) -> (dis)similarity matrix -> (clustering algorithm) -> tree
-
Cluster analysis: joining
Distances among objects are in the (dis)similarity matrix. In hierarchical classification, we also need the distances among clusters.
-
Single linkage (nearest neighbour, a representative of "short hand" methods) and complete linkage (furthest neighbour, a representative of "long hand" methods).
Several other methods exist, e.g. Ward's method (minimum dispersion). Average linkage is the most popular, but the term has been used for several methods; the preferred name is UPGMA (Unweighted Pair Group Method with Arithmetic mean).
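The three linkage rules differ only in how the distance between two clusters is derived from the distances among their members. The sketch below is a naive agglomerative procedure (it recomputes all cluster distances at every step) for single linkage, complete linkage and UPGMA; the four objects placed on a line and the function names are hypothetical.

```python
def cluster_distance(c1, c2, dist, linkage):
    """Distance between two clusters given as lists of object indices."""
    pairs = [dist[i][j] for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbour
        return min(pairs)
    if linkage == "complete":    # furthest neighbour
        return max(pairs)
    if linkage == "upgma":       # arithmetic mean of all between-cluster pairs
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage: " + linkage)


def agglomerate(dist, linkage):
    """Return the list of merges as (members of the new cluster, joining height)."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        height, p, q = min(
            (cluster_distance(clusters[p], clusters[q], dist, linkage), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = sorted(clusters[p] + clusters[q])
        merges.append((merged, height))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return merges


# Hypothetical objects placed on a line; distances are absolute differences
positions = [0, 1, 3, 7]
dist = [[abs(a - b) for b in positions] for a in positions]

for method in ("single", "complete", "upgma"):
    print(method, agglomerate(dist, method))
```

The order of the joins is the same in this example, but the joining heights differ: single linkage joins clusters at the shortest available link, which is what produces chaining in less regular data.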
-
Single linkage -> chaining
-
Order does not play a role
-
TWINSPAN: Two-Way INdicator SPecies ANalysis
Invented (by Mark Hill) to search for a pattern in extensive vegetation tables
Inspired by classical phytosociology
The algorithm is based on presence/absence data
Quantitative data are used for the definition of pseudospecies
-
TWINSPAN 2: pseudospecies
The definition of cut levels has a similar effect to a transformation (weighting dominance vs. presence/absence)
Compare the cut levels 0, 1, 10, 100 vs. 0, 10, 20, 30, 40
Each cut level is a lower, exclusive border
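How cut levels turn one quantitative value into several presence/absence "pseudospecies" can be sketched in a few lines. The function below is hypothetical and is not taken from the TWINSPAN program; it follows the note above in treating each cut level as a lower exclusive border, which is an assumption about the exact rule (implementations may treat the border differently).

```python
def pseudospecies(abundance, cut_levels):
    """Return the 1-based indices of the pseudospecies present for one species
    in one case: every cut level that the abundance strictly exceeds."""
    return [k for k, cut in enumerate(cut_levels, start=1) if abundance > cut]


# The two sets of cut levels compared in the text
roughly_logarithmic = [0, 1, 10, 100]
linear = [0, 10, 20, 30, 40]

for value in (0, 0.5, 15, 35):
    print(value, pseudospecies(value, roughly_logarithmic), pseudospecies(value, linear))
```

With the first set of cut levels, small abundances already produce different numbers of pseudospecies, so presence and low cover carry more weight; with the second, only differences among the dominants are resolved.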
-
Divisive method: each group is divided on the basis of the first CA axis
The first axis comes from a CA ordination; it is then not surprising that TWINSPAN results correspond well to, e.g., DCA, and that the individual groups are well clustered in ordination space.
-
Divisive method: each group is divided on the basis of the first CA axis
Most of the cases are usually around the center -> we need some polarization
-
Polarized ordination (based on indicator species)
-
Group 01 is more similar to group 1 than group 00 is
The order of the groups reflects a possible gradient in the table
-
SSSSSSSSSSSSSS
aaaaaaaaaaaaaa
mmmmmmmmmmmmmm
pppppppppppppp
00000000000000
00000000000000
00000000011111
21345678901234
4 Sali SIle ---------5---- 0000
29 Anth Alpi -----2---3---- 0000
30 Hype Macu ------2--3---- 0000
31 Rubu Idae ------2--3---- 0000
28 Aden Alli -----2---2---- 0001
1 Pice Abie 6666665---5--- 001000
7 Oxal Acet 55344-4--3---- 001001
9 Sold Hung 43444--------- 001001
18 Luzu Pilo 2-2----------- 001001
20 Luzu Sylv --3243-------- 001001
12 Gent Ascl 23333333------ 001010
32 Geun Mont ------2--3-33- 1101
5 Juni Comm -----------24- 111
34 Puls Alba -----------32- 111
38 Oreo Dist ----------5564 111
39 Fest Supi ----------3444 111
40 Camp Alpi ----------34-4 111
41 Junc Trif ----------4453 111
42 Luzu Alpi ----------33-- 111
43 Hier Alpi ----------233- 111
44 Care Semp -----------545 111
45 Tris Fusc -----------33- 111
46 Pote Aure ------------32 111
47 Sale Herb -------------5 111
48 Prim Mini -------------4 111
00000000001111
0000000111
0000011