analysis of multivariate ecological databiol09.biol.umontreal.ca › trieste16 ›...
TRANSCRIPT
Analysis of Multivariate� Ecological Data
School on Recent Advances in Analysis of Multivariate Ecological Data24-28 October 2016
Prof. Pierre LegendreDr. Daniel Borcard
Département de sciences biologiquesUniversité de Montréal
C.P. 6128, succursale Centre VilleMontréal QC
H3C 3J7 [email protected]@umontreal.ca
Brain storming in Pierre Legendre's lab
Daniel Borcard Pierre Legendre Pedro Peres-Neto Stéphane Dray
3
Day 1
4
Introduction to data analysis
Day 1
5
Main references
• Legendre, P. & L. Legendre. 2012. Numerical ecology. 3rd English Edition. Elsevier, Amsterdam.
• Borcard, D., F. Gillet & P. Legendre. 2011. Numerical Ecology with R. Springer, New York.
• http://www.numericalecology.com/
6
Introduction to data analysis
0 1 2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
Distribution de F, 15 et 30 d.l.
F
Densité
# Import data into R spe <- read.csv("DoubsSpe.csv", row.names=1) env <- read.csv("DoubsEnv.csv", row.names=1) # Hellinger-transform the species dataset spe.hel <- decostand(spe, "hellinger") # Redundancy analysis (RDA) # ************************* # RDA of the Hellinger-transformed fish species data, constrained # by all the environmental variables contained in env2 (spe.rda <- rda(spe.hel ~ ., env)) # Observe the shortcut # formula summary(spe.rda) # Scaling 2 (default) # Canonical coefficients from the rda object coef(spe.rda) # R^2 retrieved from the rda object RsquareAdj(spe.rda)
−0.5 0.0 0.5 1.0
−0.5
0.0
0.5
RDA triplot − Scaling 1 − lc
RDA 1
RDA
2 ●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●●
●●
● ●
●
●
●
●
●
●
●●
12
3
4
56
7
9
10
11 12 1314
15
16
17181920
21 22
23
24
25
2627
28
2930SatrPhphChna
Baba
AlbiRham
Legi
CycaAbbr
Gyce
Ruru
Blbj
Alal
Anan
alt
flopH
har
pho
nit
amm
oxy
bdo
● ●
●
●slolowslomoderate
slosteep
slovery_steep
7
Software
R 3.3.1 for Windows, Mac OS or Linux
8
Definitions
Numerical ecology “The field of quantitative ecology devoted to the numerical analysis of ecological data sets. (...) The purpose of numerical ecology is to describe and interpret the structure of data sets by combining a variety of numerical approaches. Numerical ecology differs from descriptive or inferential ecological statistics in that it combines relevant multidimensional statistical methods with heuristic techniques (e.g. cluster analysis) (...) ” (Legendre & Legendre, 2012).
9
Definitions
Numerical ecologyLet us add that a great number of the methods used in numerical ecology, especially the new approaches developed since the 1980’s, have been developed by ecologists (and not pure statisticians!), in response to specific ecological problems.
10
Definitions
Multivariate, multidimensional analysis: methods of numerical analysis addressing whole data tables where every observation, every sampling unit is characterised by several variables:
- species abundances
- climatic measures- and so on...
11
The data
Table I - Structure of an ecological data table DescriptorsObjects Variable 1 Variable 2 Variable j Variable pObject 1 y11 y12 ... y1j ... y1pObject 2 y21 y22 ... y2j ... y2p....Object i yi1 yi2 ... yij ... yip...Object n yn1 yn2 ... ynj ... ynpThe objects are the observations (sites, relevés...).
12
The data
.
Species
Objects
1
n
p 1
n
m
Environnementalvariables
Spatialvariables1
n
q
Descriptors
13
Aims of multivariate analysis
v Measurement of resemblance among objects or variables of a data table;
v Clustering of the objects or variables according to these resemblances;
v Ordination in a reduced space allowing to emphasise their main structures (especially gradients);v Modelling of the relationships between response data tables and explanatory variables;
v Tests of these relationships for statistical significance.
14
Ordination in reduced space
Day 1
15
Ordination in reduced space
1. Generalities
.
00
1
2
3
4
5
6
1 2 3 4 5
Object 1
Object 2
Object 3Object 4
Object 5 Object 6
Variable 1
Var
iabl
e 2
Var.1 Var.2
Obj.1Obj.2Obj.3Obj.4Obj.5Obj.6
1 54 43 31 32 24 1
16
Ordination in reduced space
2. Principal component analysis (PCA)
.
00
1
2
3
4
5
6
1 2 3 4 5
Object 1
Object 2Object 3
Object 4
Object 5
Object 6
Variable 1
Var
iabl
e 2
Axis PCA 1
Axis PC
A 2Object 1
Object 4 Object 5Object 6
Object 3
Object 2
Axis PCA 1
Axi
s PCA
2
17
Ordination in reduced space
2. Principal component analysis (PCA)
..
Object 1
Object 4 Object 5
Object 3
Object 2
Axis PCA 1Object 6
Variable 1
Variable 2A
xis P
CA 2
λ1 = 0.705
λ 2 =
0.2
95
18
Ordination in reduced space
2. Principal component analysis (PCA)
One can conduct a PCA and display its results in different ways:Covariance or correlation?
PCA on the raw(centred) variables (= PCA on a covariance matrix) is only appropriate when the variables are dimensionally homogeneous.
PCA on a correlation matrix: when the variables are not dimensionally homogeneous or when one wants all variables to have equal weights in the analysis.
19
Ordination in reduced space
2. Principal component analysis (PCA)
One can conduct a PCA and display its results in different ways:
Scaling:
PCA Scaling 1 = Distance biplot: the eigenvectors are scaled to unit length; the main properties of the biplot are the following:
(1) Distances among objects in the biplot are approximations of their Euclidean distances in multidimensional space.
(2) Projecting an object at right angle on a descriptor approximates the position of the object along that descriptor.
20
Ordination in reduced space
2. Principal component analysis (PCA)
One can conduct a PCA and display its results in different ways:
Scaling:
PCA Scaling 1 = Distance biplot: the eigenvectors are scaled to unit length; the main properties of the biplot are the following:
(3) Since descriptors have length 1 in the full-dimensional space, the length of the projection of a descriptor in the reduced space indicates how much it contributes to that space.
(4) The angles among descriptor vectors do not reflect their correlations.
PCA Scaling 1 = Distance biplot: the eigenvectors are scaled to unit length; the main properties of the biplot are the following: (1) Distances among objects in the biplot are approximations of their Euclidean distances in multidimensional space. (2) Projecting an object at right angle on a descriptor approximates the position of the object along that descriptor. (3) Since descriptors have length 1 in the full-dimensional space, the length of the projection of a descriptor in the reduced space
indicates how much it contributes to that space. (4) The angles among descriptor vectors do not reflect their correlations.
22
Ordination in reduced space
2. Principal component analysis (PCA)
PCA Scaling 2 = correlation biplot: the eigenvectors are scaled to the square root of their eigenvalue.
(1) Distances among objects in the biplot are not approximations
of their Euclidean distances in multidimensional space. (2) Projecting an object at right angle on a descriptor
approximates the position of the object along that descriptor.
23
Ordination in reduced space
2. Principal component analysis (PCA)
(3) (3a) PCA on a covariance matrix: descriptors have length sj (= their standard deviation) in the full-dimensional space; therefore, the length of the projection of a descriptor in the reduced space is an approximation of its standard deviation.
(4) (3b) PCA on a correlation matrix: all the descriptors have unit variance (s = 1); the length of the projection of a descriptor in the reduced space reflects its contribution to that space.
(5) The angles between descriptors in the biplot reflect their correlations.
PCA: scaling 2
Observe thethree groups ofhighly correlatedspecies.
25
Ordination in reduced space
2. Principal component analysis (PCA) - in summary
One can conduct a PCA and display its results in different ways:
Scaling:
If the main interest of the analysis is to interpret the relationships among objects, choose scaling 1.
If the main interest focuses on the relationships among descriptors, choose scaling 2.
26
Ordination in reduced space
2. Principal component analysis (PCA)
Equilibrium contribution circle
• In scaling 1.• Equilibrium contribution: length that a descriptor-vector
would have if it contributed equally to all the dimensions (principal axes) of the PCA.
27
Ordination in reduced space
2. Principal component analysis (PCA)
Equilibrium contribution circle
• The circle has a radius equal to √(d/p), where d equals the number of dimensions of the reduced space considered (usually d=2) and p equals the total number of descriptors (and hence of principal components) in the analysis.
PCA: Equilibrium contribution circle
-5 0 5
-6-4
-20
24
6
PCA biplot - Scaling 1
PCA 1
PC
A 2
1
234
56
7
910
1112131415
16 171819202122
23
24
25
26
27
282930
dfs
ele
slo
dis
pH
har
pho
nit
amm
oxy
bod
-2 0 2 4
-3-2
-10
12
3
PCA biplot - Scaling 2
PCA 1
PC
A 2
1
2
34
56
7
910
1112
131415
16 171819202122
23
24
25
26
27
28
29
30 dfs
eleslo
dis
pH
har
pho
nit
amm
oxy
bod
29
Ordination in reduced space
2. Principal component analysis (PCA)
Interpretation of meaningful components
Not all ordination axes are meaningful. Which ones are?1. Arbitrary decision (ex.: interpretation of axes representing
75% variance)2. Interpretation of axes whose eigenvalues are larger than the
average of all eigenvalues (Kaiser-Guttman criterion)
30
Ordination in reduced space
2. Principal component analysis (PCA)
Interpretation of meaningful components
Not all ordination axes are meaningful. Which ones are?3. Broken stick model: comparison of eigenvalues to a model
predicting the (decreasing) sizes of the pieces of a stick broken in as many pieces as there are eigenvalues.1. Comparison of individual eigenvalues;2. Comparison of cumulative eigenvalues.
31
Ordination in reduced space
2. Principal component analysis (PCA)
Uses and misuses of PCA
Main applications in ecology: ordination of sites on the basis of quantitative environmental variables (physical and chemical descriptors), not raw (untransformed) species abundances in most cases (see Chapter 2, double zero problem). However, PCA is now very often applied to species data that have undergone a pre-transformation following Legendre and Gallagher (2001). See slides 36 and following.
Legendre, P. & E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
32
Ordination in reduced space
2. Principal component analysis (PCA)
Uses and misuses of PCA
PCA has originally been defined for data with multinormal distributions. However, PCA is not very sensitive to departure from multinormality, as long as the distributions are not exaggerately skewed. If some are, then the first few PCA axes will display several objects with extreme values on these variables instead of main trends.
33
Ordination in reduced space
2. Principal component analysis (PCA)
Uses and misuses of PCA
PCA is computed from a matrix of dispersion (linear covariances or correlations). This means that the method must be applied on a dispersion matrix among descriptors that are quantitative and for which valid estimations of covariances or correlations may be obtained.
34
Ordination in reduced space2. Principal component analysis (PCA)
Uses and misuses of PCAThese conditions are violated in the following cases: (1) The data matrix must not be transposed, since covariances or correlations among objects are meaningless. (2) Covariances and correlations are defined for quantitative variables. However, PCA is very robust to variations in the precision of data.
35
Ordination in reduced space
2. Principal component analysis (PCA)
Uses and misuses of PCAThese conditions are violated in the following cases: (3) With data sets with many zeros, PCA produce inadequate estimates of the site positions, unless the abundance data have been transformed in an ecologically appropriate way.(4) Avoid the (all too common) mistake of interpreting the relationships among variables with help of the proximities of the apices (points) of the vector-arrows instead of the angles (when appropriate) on the biplots!
36
Transformation of species abundance data tables prior to
linear analyses
Day 1
37
Data transformation
0. Preliminary remarks: general types of transformation for any kind of variables
38
Data transformation
1. Make comparable descriptors that have been measured in different units:
Ranging: Standardization:
yi' =
yi − yminymax − ymin
yi' =
yiymax
yi' = zi =
yi − y sy
Example: model II regression, ranged major axis
Example: explanatory variables in RDA
39
Data transformation
2. Normalize the data and stabilize their varianceSquare root transformation: y'i = √(yi+c):
.
0
10
20
30
0 2 4 6 8 10 12Nb. Syrphid Diptera
.5 1 1.5 2 2.5 3 3.5 4Square root(Nb. Syrphids)
Nb.
obs
erva
tion
hour
s
Example: species data with moderate asymmetryNote: constant c must be added only if there are negative values in the data, i.e., not in the case of species abundances.
40
Data transformation
2. Normalize the data and stabilize their varianceLog transformation: y'i = ln(yi+c):
.
02468
10121416
0 .5 1 1.5 2 2.5 3 3.5 4 4.5ln(nb. ind.+1)
05
1015202530
0 10 20 30 40 50 60 70Nb. individuals
Nb.
of s
oil c
ores
Oppiella nova
Example: species data with more severe asymmetry
41
Data transformation
3. Recode semi-quantitative variables as quantitativeExample: transformation of Braun-Blanquet's phytosociological scale into
quantitative values:
42
Data transformation
4. Binary coding of nominal variables
1 qualitative descriptor 4 binary descriptors (dummy variables)
Modalities Classes (factors) Calcosol Brunisol Neoluvisol Calcisol
Calcosol 1 1 0 0 0
Brunisol 2 0 1 0 0
Neoluvisol 3 0 0 1 0
Calcisol 4 0 0 0 1 Translated from F. Gillet (Neuchâtel, Switzerland) course notes.
For k levels, (k – 1) dummy variables needed <-> (k – 1) d.f.
à later: see orthogonal Helmert contrasts
43
Data transformation
5. Transformations to obtain ecologically meaningful relationships among sites while using linear techniques
44
1. The double-zero problem
Community composition data sampled over variable environmental conditions, e.g. along long environmental gradients, typically contain many zero values because species are known to generally have unimodal distributions along environmental gradients and to be absent from sites far from their optimal living conditions.
The proportion of zeros is greater when the environmental conditions are more variable across the sampling sites.
Transformation of species data
45
1. The double-zero problem
Transformation of species data
Brachysp Phthirsp Hoplcfpa Rhysardu Atrostri Protopsp Malacfeg Malacfpr Malacosp Eniominu 1 17 5 5 3 2 1 4 2 2 1 2 2 7 16 0 6 0 4 2 0 0 3 4 3 1 1 2 0 3 0 0 0 4 23 7 10 2 2 0 4 0 1 2 5 5 8 13 9 0 13 0 0 0 3 6 19 7 5 9 3 2 3 0 0 20 7 17 3 8 2 3 0 3 0 0 19 8 5 4 8 2 1 2 3 0 0 1 9 3 3 2 2 1 1 12 0 0 0 10 22 4 5 3 0 0 0 0 0 11
46
1. The double-zero problem
The zero value in a matrix of species abundances (or presence-absence) is tricky to interpret:
The presence of a species at a given site (i.e., non-zero value) generally implies that this site provides a set of minimal conditions allowing the species to survive (the dimensions of its ecological niche).
Transformation of species data
47
1. The double-zero problem
The zero value in a matrix of species abundances (or presence-absence) is tricky to interpret:
The absence of a species from a site may be due to a variety of causes: the species' niche may be occupied by a replacement species, or the absence of the species is due to adverse conditions on any of the important dimensions of its ecological niche, or the species has been missed because of a purely stochastic component of its spatial distribution.
Transformation of species data
48
1. The double-zero problem
Double zero: absence of a given species from two sites.
A double zero cannot readily be interpreted as a resemblance between the two sites.
Transformation of species data
The methods of analysis of species data must take this problem into account.
Most methods of analysis of species data work by comparing the sites, explicitly or implicitly, on the basis of an association measure.
49
1. The double-zero problem
• The association coefficients that consider the double zero as a resemblance (as any other value) are said to be symmetrical.
• The association coefficients that do not consider the double zero as a resemblance (as any other value) are said to be asymmetrical.
It is preferable to use asymmetrical coefficients when analysing species data
Transformation of species data
50
2. Pre-transformation of species data
Since it is a linear method working in a Euclidean space, PCA is not adapted to raw species abundance data, since the zero is treated as any other value. The Euclidean distance imbedded in PCA is a symmetrical measure.
Therefore….
Transformation of species data
51
2. Pre-transformation of species data
Legendre, P. & E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Pre-transform the species data in such a way that, after PCA, the distance preserved among objects is no more the Euclidean distance, but an ecologically meaningful one.
Transformation of species data
52
Ordination in reduced space
2. Pre-transformation of species data
Chord distance (D3)
χ2 metric (D15)
χ2 distance (D16)
Distance between species profiles
Hellinger distance (D17)
yij' =
yij
yij2
j=1
p
∑
yij' =
yijyi+ y+ j
yij' = y++
yijyi+ y+ j
yij' =
yijyi+
yij' =
yijyi+
53
Ordination in reduced space
2. Pre-transformation of species data: illustrationThe species abundance paradox (Orlóci, 1978)
Transformations for community composition data 329
Figure 7.8 Species abundance paradox data, modified from Orlóci (1978). The paradox is that the
Euclidean distance between sites 2 and 3, which have no species in common, is smaller than that
between sites 1 and 2 which share species 2 and 3. This results in an incorrect assessment of the
ecological relationships among sites. With the other coefficients in this figure, which are
asymmetrical, the distance between sites 2 and 3 is larger than that between sites 1 and 2, and
the distance between sites 1 and 3 is the same as between sites 2 and 3, or very nearly so.
Distance matrix D15 (not shown) is equal to D16/ = D16/ .y++
15
In these data, the Euclidean distance between sites 2 and 3 (no species in common) is smaller than the distance between sites 1 and 2.
Transformations for community composition data 329
Figure 7.8 Species abundance paradox data, modified from Orlóci (1978). The paradox is that the
Euclidean distance between sites 2 and 3, which have no species in common, is smaller than that
between sites 1 and 2 which share species 2 and 3. This results in an incorrect assessment of the
ecological relationships among sites. With the other coefficients in this figure, which are
asymmetrical, the distance between sites 2 and 3 is larger than that between sites 1 and 2, and
the distance between sites 1 and 3 is the same as between sites 2 and 3, or very nearly so.
Distance matrix D15 (not shown) is equal to D16/ = D16/ .y++
15
Site 1 Site 2 Site 3Site 1Site 2Site 3
54
Ordination in reduced space
3. Pre-transformation of species data: illustrationThe species abundance paradox (Orlóci, 1978)
Transformations for community composition data 329
Figure 7.8 Species abundance paradox data, modified from Orlóci (1978). The paradox is that the
Euclidean distance between sites 2 and 3, which have no species in common, is smaller than that
between sites 1 and 2 which share species 2 and 3. This results in an incorrect assessment of the
ecological relationships among sites. With the other coefficients in this figure, which are
asymmetrical, the distance between sites 2 and 3 is larger than that between sites 1 and 2, and
the distance between sites 1 and 3 is the same as between sites 2 and 3, or very nearly so.
Distance matrix D15 (not shown) is equal to D16/ = D16/ .y++
15
Chord
Species profiles
Hellinger
χ2 distance
Euclidean
55
2. Pre-transformation of species data
The chord transformation consists in transforming the species abundances from each object into a vector of length 1.
Transformation of species data
y.chord <- decostand(y, method="normalize")
If one transforms a data table with this formula, any method that preserves the Euclidean distance among the objects (e.g., PCA, RDA, ANOVA) is now accessible for species abundance data.
56
2. Pre-transformation of species data
The Hellinger transformation consists in taking the squareroot of the relative species abundances.
Transformation of species data
y.hel <- decostand(y, method="hellinger")
If one transforms a data table with this formula, any method that preserves the Euclidean distance among the objects (e.g., PCA, RDA, ANOVA) is now accessible for species abundance data.
57
2. Pre-transformation of species data
The chord and Hellinger transformations appear to be the best for general use.
They can both be applied to presence-absence data as well, with identical results.
Transformation of species data
58
Correspondence analysis (CA)
Day 1
59
Ordination in reduced space
4. Correspondence analysis (CA)
CA is actually a PCA on a species data table that has been transformed into a table of Pearson's χ2 statistic (the one used in contingeny tables).
The χ2 distance (D16) is preserved among sites or species.
CA is a method adapted to the analysis of species abundance data.
60
Ordination in reduced space
4. Correspondence analysis (CA)
The data must be dimensionally homogeneous and 0 or positive.CA produces one axis less than min[n,p].The variation of the data is measured by a quantity named inertia.Inertia is partitioned among ordination axes as in PCA. The weight
of each axis is also measured by an eigenvalue.
The objects and the species are represented as points on the same joint plot.
61
Ordination in reduced space
4. Correspondence analysis (CA)
Scalings:
CA scaling type 1: rows (objects) are at the centroids of columns.
The most appropriate if one is primarily interested in the
ordination of objects (sites). In the multidimensional space, χ2 distance is preserved among
objects.
62
Ordination in reduced space
4. Correspondence analysis (CA)
CA scaling type 1: interpretation:(1) The distances among objects in the reduced space
approximate their χ2 distance. Thus, object points that are close to one another are likely to be relatively similar in their species relative frequencies.
(2) Any object found near the point representing a species is likely to have a high contribution of that species. For presence-absence data, the object is more likely to posses the state "1" for that species.
63
Ordination in reduced space
4. Correspondence analysis (CA)Scalings:
CA scaling type 1
..
Species 1 Species 3
Species 2
Object 1
Object 2Object 4
Object 3
Object 6 Object 5
Axis CA 1
Axi
s CA
2
λ1 = 0.2295
λ 2 =
0.0
857
Université Laval Analyse multivariable - mars-avril 2008 13
Dr. Daniel Borcard Université de Montréal
(2) Any object found near the point representing a species is likelyto have a high contribution of that species. For presence-absencedata, the object is more likely to possess the state "1" for thatspecies.
• CA scaling type 2: columns are at the centroids of rows. This scalingis the most appropriate if one is primarily interested in the ordination ofspecies. In the multidimensional space, χ2 distance is preserved amongspecies. Interpretation:
(1) The distances among species in the reduced space approximatetheir χ2 distance. Thus, species points that are close to one anotherare likely to have relatively similar relative frequencies in theobjects.(2) Any species that lies close to the point representing an object ismore likely to be found in that object, or to have a higherfrequency there than in objects that are further away in the jointplot.
The following example (Table VII) will be submitted to acorrespondence analysis:
Table VII - Artificial data for CA
Spec.1
Obj.1Obj.2Obj.3Obj.4Obj.5Obj.6
1 54 43 31 32 24 1
260540
Spec.3Spec.2
64
4. Ordination in reduced space
-2 -1 0 1 2
-2-1
01
2
Oribatid mites, Lac Geai
CA1
CA2 Brachy
PHTH
HPAV
RARD
SSTR
ProtoplMEGR
MPRO
TVIE
HMIN
HMIN2
NPRA
TVEL
ONOV
SUCT
LCIL
Oribatl1
Ceratoz1
PWIL
Galumna1
Stgncrs2
HRUF
Trhypch1
PPEL
NCOR
SLAT
FSETLepidzts
Eupelops
Miniglmn
LRUGPLAG2Ceratoz3
Oppiminu
Trimalc2
123
45678
9
10 111213
14
15
161718
19
20
212223
24
252627
28
29 30 31
32
33
34
3536
37
38
39
40
41
42
4344
45
46
47
48
49
50
5152
53
5455
56
57
58
59
60
6162
63
64
65
66
67
6869
70
CA scaling type 1
library(vegan) data(mite) mite.ca <- cca(mite) plot(mite.ca, scaling=1, main="Oribatid mites, Lac Geai")
65
Ordination in reduced space
4. Correspondence analysis (CA)
Scalings:
CA scaling type 2: columns (species) are at the centroids of rows.
The most appropriate if one is primarily interested in the
ordination of species. In the multidimensional space, χ2 distance is preserved among
variables (species).
66
Ordination in reduced space
4. Correspondence analysis (CA)
CA scaling type 2: interpretation:(1) The distances among species in the reduced space
approximate their χ2 distance. Thus, species points that are close to one another are likely to have relatively similar relative frequencies in the objects.
(2) Any species that lies close to the point representing an object is more likely to be found in that object, or to have a higher frequency there than in objects that are further away in the joint plot.
67
Ordination in reduced space
4. Correspondence analysis (CA)
Scalings:
CA scaling type 2
..
Species 1
Species 2
Object 1
Object 2Object 4
Object 3
Object 6 Object 5
Axis CA 1
Axi
s CA
2
λ1 = 0.2295
λ 2 =
0.0
857
Species 3
Université Laval Analyse multivariable - mars-avril 2008 13
Dr. Daniel Borcard Université de Montréal
(2) Any object found near the point representing a species is likelyto have a high contribution of that species. For presence-absencedata, the object is more likely to possess the state "1" for thatspecies.
• CA scaling type 2: columns are at the centroids of rows. This scalingis the most appropriate if one is primarily interested in the ordination ofspecies. In the multidimensional space, χ2 distance is preserved amongspecies. Interpretation:
(1) The distances among species in the reduced space approximatetheir χ2 distance. Thus, species points that are close to one anotherare likely to have relatively similar relative frequencies in theobjects.(2) Any species that lies close to the point representing an object ismore likely to be found in that object, or to have a higherfrequency there than in objects that are further away in the jointplot.
The following example (Table VII) will be submitted to acorrespondence analysis:
Table VII - Artificial data for CA
Spec.1
Obj.1Obj.2Obj.3Obj.4Obj.5Obj.6
1 54 43 31 32 24 1
260540
Spec.3Spec.2
68
4. Ordination in reduced space
-2 -1 0 1 2
-2-1
01
Oribatid mites, Lac Geai
CA1
CA2
Brachy
PHTH
HPAV
RARD
SSTR
Protopl MEGR
MPRO
TVIE
HMIN
HMIN2
NPRA
TVEL
ONOV
SUCT
LCIL
Oribatl1
Ceratoz1
PWIL
Galumna1
Stgncrs2
HRUFTrhypch1
PPEL
NCOR
SLATFSETLepidzts
Eupelops
Miniglmn
LRUGPLAG2Ceratoz3
Oppiminu
Trimalc2
12
3
4
5
6
7
8
9
10 11
1213
14
15
1617
18
19
20
2122
23
24
2526
27
28
2930 31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
6162
63
64
65
66
67
68
69
70
CA scaling type 2library(vegan) data(mite) mite.ca <- cca(mite) plot(mite.ca, scaling=2, main="Oribatid mites, Lac Geai")
69
Ordination in reduced space
4. Correspondence analysis (CA)
Words of caution
• Graphically, correspondence analysis seems very sensitive to rare species, which take extreme values in plots.
• Rare species have a small influence on the extraction of the principal axes, however.
70
Ordination in reduced space
4. Correspondence analysis (CA)
Words of caution
• Strong gradients generate arch effects:Environmental gradients often support a succession of species that tend to have unimodal distributions.At the ends of the gradients, series are distorted.Distortion occurs also in PCA (horseshoe effects).
71
Ordination in reduced space
4. Correspondence analysis (CA)
Words of cautionSpecies packing model:
CA approximates a Gaussian ordination.
Position of species along axes approximate their ecological optimum. 0
5
10
15
20
1 7 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97
72
Ordination in reduced space
Arch and horseshoe effects
-1.41 -1.34
-0.70
-0.66
0.00
0.02
0.70
0.71
1.41
1.39
-14.63-17.31
-7.31
-10.15
0.00
-2.99
7.32
4.18
14.63
11.34
CA: arch effect PCA: horseshoe effect
A B
73
Ordination in reduced space
4. Correspondence analysis (CA)
Detrended correspondence analysis (DCA):
Detrending by segments:
Axis I is divided into an arbitrary number of segments.
Within each segment, the mean of the object scores along axis 2 is made equal to zero.
This methods has been strongly rejected by many authors.
The scores on the second axis are essentially meaningless.
74
Principal coordinate analysis (PCoA)
Day 1
75
Ordination in reduced space
5. Principal coordinate analysis (PCoA)
Allows to obtain a Euclidean representation of a set of objects whose relationships are measured by any similarity or dissimilarity coefficient chosen by the user.PCoA produces a set of orthogonal ordination axes whose importance is measured by eigenvalues (as in PCA and CA).
PCoA can only represent the relationships among objects (Q mode) or variables (R mode), but not both at the same time, unless scores for variables are computed after a Q-mode PCoA (using correlations or weighted averages).
76
Ordination in reduced space
5. Principal coordinate analysis (PCoA)In the case of Euclidean association measures, PCoA will behave in a Euclidean manner.
If the association coefficient used is nonmetric, semimetric or has other problems of "non-Euclideanarity", then PCoA will react by producing several negative eigenvalues in addition to the positive ones.If one wants to retain all the variance of the original association matrix, there are technical solutions to this problem (e.g. taking the square root of the dissimilarities, or else apply Lingoes or Caillez correction).
77
Ordination in reduced space
5. Principal coordinate analysis (PCoA)
..
-0.30 -0.15
-0.10
-0.05
0.10
0.05
0.30
0.15
0.50
0.25
Object 5
Object 1
Object 2
Object 4
Object 3
Object 6
Axi
s PCo
A 2
λ 2 =
0.0
95
Axis PCoA 1λ1 = 0.359
1 × 2 axis of a PCoA on a D14 (percentage difference, aka Bray-Curtis) matrix of the 6 × 3 data used in the first CA example. The two first axes have eigenvalues of 0.359 and 0.095; they represent 59.5% and 20.2% variance respectively. This PCoA gives 4 positive, one zero and one negative eigenvalue.
78
-0.4 -0.2 0.0 0.2 0.4
-0.4
-0.2
0.0
0.2
0.4
Oribatid mites, D14, PCoA
Dim1
Dim2
1
234
5 6
78
9
10
11
1213
1415
161718
19
20
21
22
23
24
25
26
27
28
2930
31
32
33
34
35
36
37
38
39
40
41
42
43
44
4546
47
48
49
50
51
52
53
5455
56
57
58
59
60
61
62
6364
65
66
67
68 69
70
Brachy
PHTH
HPAV
RARD
SSTR
Protopl
MEGR
MPRO
TVIEHMIN
HMIN2
NPRA
TVELONOV
SUCT
LCIL
Oribatl1
Ceratoz1
PWILGalumna1
Stgncrs2
HRUFTrhypch1
PPEL
NCOR
SLATFSETLepidzts
Eupelops
Miniglmn
LRUG PLAG2Ceratoz3Oppiminu
Trimalc2
mite.bray <- vegdist(mite, "bray") mite.PCoA <- cmdscale(mite.bray, k=ncol(mite)-1, eig=TRUE) ordiplot(scores(mite.PCoA)[,c(1,2)], type="t", main="Oribatid mites, D14, PCoA") abline(h=0, lty=3) abline(v=0, lty=3) sp.wa <- wascores(mite.PCoA$points[,1:2], mite) text(sp.wa, rownames(sp.wa), cex=0.7, col="red")
Species scores have been computed after the PCoA analysis by weighted averages of site scores
79
PCoA on the Doubs fish data. Scaling 1.Species scores have been computed after the PCoA analysis by correlation with the ordination axes, as in PCA.
-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6
-0.6
-0.4
-0.2
0.0
0.2
0.4
0.6
Axis.1
Axis.2
1
2
3
45
6
7
8
9
10
1112
13
14
15
1617
18
19
20 2122
23
24
25
26
2728
29
30
-0.4 -0.2 0.0 0.2 0.4
-0.4
-0.2
0.0
0.2
0.4
Cogo
Satr
Phph
Babl
Thth
Teso
Chna
PatoLele
Sqce
Baba
Albi
Gogo
Eslu
Pefl
Rham
Legi
Scer
Cyca
Titi
Abbr
Icme
Gyce
Ruru
Blbj
Alal
Anan
PCoA biplotResponse variables projected
as in PCA with scaling 1
80
Ordination in reduced space
5. Principal coordinate analysis (PCoA)
Sometimes used as an intermediate, technical step in more complex analyses.Examples:• Raw variables neither quantitative nor orthogonal ==> PCoA
==> one obtains a set of quantitative and orthogonal variables (the PCoA axes)!
• Spatial analysis using dbMEM (distance-based Moran's eigenvector maps à Day 5
81
Ordination in reduced space
X =Explanatory
variables
Y = Raw data
(sites x species)
Short gradients: CCA or RDALong gradients: CCA
Constrained ordination of species data(d) Classical approach
(e) Transformation-based approach (tb-RDA)
(f) Distance-based approach (db-RDA)
Raw data
(sites x species)
X =Explanatory
variables
Y=Transformeddata
(sites x species)
RDA
Raw data
(sites x species)Distancematrix
X =Explanatory
variables
Y =(sites x
principal coord.)
PCoA
RDA
Representation of elements:Species = arrowsSites = symbolsExplanatory variables = lines
Canonicalordination triplot
Unconstrained ordination of species data(a) Classical approach
(b) Transformation-based approach (tb-PCA)
Y = Raw data
(sites x species)
Short gradients: CA or PCALong gradients: CA
Raw data
(sites x species)
Y=Transformeddata
(sites x species)
PCA
Ordination biplot
Representation of elements:Species = arrowsSites = symbols
Raw data
(sites x species)Distancematrix
PCoA
(c) Distance-based approach (PCoA)Ordination of sites
Representation of elements:Sites = symbols
82
Ordination in reduced spaceX =Explanatory
variables
Y = Raw data
(sites x species)
Short gradients: CCA or RDALong gradients: CCA
Constrained ordination of species data(d) Classical approach
(e) Transformation-based approach (tb-RDA)
(f) Distance-based approach (db-RDA)
Raw data
(sites x species)
X =Explanatory
variables
Y=Transformeddata
(sites x species)
RDA
Raw data
(sites x species)Distancematrix
X =Explanatory
variables
Y =(sites x
principal coord.)
PCoA
RDA
Representation of elements:Species = arrowsSites = symbolsExplanatory variables = lines
Canonicalordination triplot
Unconstrained ordination of species data(a) Classical approach
(b) Transformation-based approach (tb-PCA)
Y = Raw data
(sites x species)
Short gradients: CA or PCALong gradients: CA
Raw data
(sites x species)
Y=Transformeddata
(sites x species)
PCA
Ordination biplot
Representation of elements:Species = arrowsSites = symbols
Raw data
(sites x species)Distancematrix
PCoA
(c) Distance-based approach (PCoA)Ordination of sites
Representation of elements:Sites = symbols