chapter 19 correspondence analysis

CHAPTER 19 Correspondence Analysis From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon http://www.pcord.com Tables, Figures, and Equations



Page 1: CHAPTER 19 Correspondence Analysis

CHAPTER 19 Correspondence Analysis

From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon http://www.pcord.com

Tables, Figures, and Equations

Page 2: CHAPTER 19 Correspondence Analysis

Figure 19.1. A synthetic data set of eleven species with noiseless hump-shaped responses to an environmental gradient. The gradient was sampled at eleven points (sample units), numbered 1-11.

Page 3: CHAPTER 19 Correspondence Analysis

[Figure: two ordination panels, PCA and CA, each plotting sample units SU1 to SU11 on Axis 1 versus Axis 2 with an overlay vector labeled EnvGradA.]

Figure 19.2. Comparison of PCA and CA of the data set shown in Figure 19.1. PCA curves the ends of the gradient in, while CA does not. The vectors indicate the correlations of the environmental gradient with the axis scores.

Page 4: CHAPTER 19 Correspondence Analysis

[Figure: three one-dimensional ordinations along Axis 1, labeled NMS, PCA, and CA, each showing the positions of sample units 1 to 11.]

Figure 19.3. One-dimensional ordinations of the data set in Figures 19.1 and 19.2, using nonmetric multidimensional scaling (NMS), PCA, and CA. Both NMS and CA ordered the points correctly on the first axis, while PCA did not.

Page 5: CHAPTER 19 Correspondence Analysis

How it works

Axes are rotated simultaneously through species space and sample space with the object of maximizing the correspondence between the two spaces. This produces two sets of linear equations, X and Y, where

X = A Y and Y = A' X

A is the original data matrix with n rows (henceforth sample units) and p columns (henceforth species). X contains the coordinates for the n sample units on k axes (dimensions). Y contains the coordinates for the p species on k axes.

Note that both Y and X refer to the same coordinate system.

Page 6: CHAPTER 19 Correspondence Analysis

The goal is to maximize the correspondence, R, defined as:

$$R = \frac{\sum_{i=1}^{n} \sum_{j=1}^{p} a_{ij}\, x_i\, y_j}{\sum_{i=1}^{n} \sum_{j=1}^{p} a_{ij}}$$

under the constraints that

$$\sum_{i=1}^{n} x_i^2 = 1 \quad \text{and} \quad \sum_{j=1}^{p} y_j^2 = 1$$
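As a minimal sketch of the definition above, the correspondence R can be evaluated for any trial pair of score vectors. The function name `correspondence` and the small abundance matrix are hypothetical, chosen only for illustration; the scores are rescaled inside the function so that the two sum-of-squares constraints hold.

```python
import numpy as np

def correspondence(A, x, y):
    """Correspondence R between sample unit scores x and species scores y."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Enforce the constraints sum(x_i^2) = 1 and sum(y_j^2) = 1.
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # Numerator: sum_i sum_j a_ij x_i y_j. Denominator: grand total of A.
    return (x @ A @ y) / A.sum()

# Hypothetical 3 x 2 abundance matrix (sample units x species).
A = np.array([[4, 1], [2, 2], [1, 5]])
R = correspondence(A, x=[1, 0, -1], y=[1, -1])
print(R)
```

Scores that place sample units near the species they contain give a larger R than scores that ignore the data, which is what the rotation of axes is meant to achieve.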

Page 7: CHAPTER 19 Correspondence Analysis

Major steps: eigenanalysis approach

1. Calculate weighting coefficients based on the reciprocals of the sample unit totals and the species totals.

$$v_i = \frac{1}{a_{i+}} \quad \text{and} \quad w_j = \frac{1}{a_{+j}}$$

where

v contains the sample unit weights,

w contains the species weights,

$a_{i+}$ is the total for sample unit i,

$a_{+j}$ is the total for species j.

Page 8: CHAPTER 19 Correspondence Analysis

The square roots of these weights are placed in the diagonal of the two matrices V½ and W½, which are otherwise filled with zeros.

Given n sample units and p species: V½ has dimensions n × n, and W½ has dimensions p × p.

Page 9: CHAPTER 19 Correspondence Analysis

2. Weight the data matrix A by V½ and W½:

$$B = V^{1/2} A W^{1/2}$$

In other words,

$$b_{ij} = \frac{a_{ij}}{\sqrt{a_{i+}\, a_{+j}}}$$

This is a simultaneous weighting by row and column totals. The resulting matrix B has n rows and p columns.
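Steps 1 and 2 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `weight_data` and the example matrix are hypothetical, and every sample unit and species is assumed to have a positive total so that the reciprocals exist.

```python
import numpy as np

def weight_data(A):
    """Return B = V^(1/2) A W^(1/2) plus the weight vectors v and w."""
    A = np.asarray(A, dtype=float)
    row_totals = A.sum(axis=1)  # a_{i+}, one per sample unit
    col_totals = A.sum(axis=0)  # a_{+j}, one per species
    if np.any(row_totals <= 0) or np.any(col_totals <= 0):
        raise ValueError("every sample unit and species needs a positive total")
    v = 1.0 / row_totals  # sample unit weights
    w = 1.0 / col_totals  # species weights
    V_half = np.diag(np.sqrt(v))  # n x n, zeros off the diagonal
    W_half = np.diag(np.sqrt(w))  # p x p, zeros off the diagonal
    B = V_half @ A @ W_half
    return B, v, w

# Hypothetical 3 x 2 abundance matrix (sample units x species).
A = np.array([[4, 1], [2, 2], [1, 5]])
B, v, w = weight_data(A)

# The element-wise form b_ij = a_ij / sqrt(a_i+ * a_+j) gives the same matrix.
B_elementwise = A / np.sqrt(np.outer(A.sum(axis=1), A.sum(axis=0)))
print(np.allclose(B, B_elementwise))
```

For large data sets the element-wise form avoids building the two diagonal matrices, but the matrix product mirrors the notation used here.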

Page 10: CHAPTER 19 Correspondence Analysis

3. Calculate a cross-products matrix: S = BB' = V½AWA'V½. The dimensions of S are n × n. The term on the right has dimensions:

(n × n)(n × p)(p × p)(p × n)(n × n)

Note that S, the cross-products matrix, is a variance-covariance matrix as in PCA except that the cross-products are weighted by the reciprocals of the square roots of the sample unit totals and the species totals.
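The dimension bookkeeping in step 3 can be checked numerically. The sketch below assumes S is formed as the product of B with its transpose in the order that yields an n × n matrix, matching the dimensions stated above; the example matrix is hypothetical.

```python
import numpy as np

# Hypothetical 3 x 2 abundance matrix (n = 3 sample units, p = 2 species).
A = np.array([[4.0, 1.0], [2.0, 2.0], [1.0, 5.0]])
n, p = A.shape

V = np.diag(1.0 / A.sum(axis=1))  # n x n, reciprocals of sample unit totals
W = np.diag(1.0 / A.sum(axis=0))  # p x p, reciprocals of species totals
V_half = np.sqrt(V)  # element-wise root is valid because V is diagonal
W_half = np.sqrt(W)

B = V_half @ A @ W_half  # n x p
S = B @ B.T              # n x n cross-products matrix

# Same matrix written out as (n x n)(n x p)(p x p)(p x n)(n x n).
S_expanded = V_half @ A @ W @ A.T @ V_half

print(S.shape, np.allclose(S, S_expanded))
```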

Page 11: CHAPTER 19 Correspondence Analysis

4. Now find eigenvalues as in PCA. Each eigenvalue (latent root) is a lambda (λ) that solves:

$$|S - \lambda I| = 0$$

This is the “characteristic equation.” Note that it is the same as that used in PCA, except for the contents of S.

Page 12: CHAPTER 19 Correspondence Analysis

5. Also find the eigenvectors Y (p × k) and X (n × k) for each of k dimensions:

$$[S - \lambda I]\,x = 0$$

and

$$[W^{1/2} A' V A W^{1/2} - \lambda I]\,y = 0$$

using the same set of λ in both cases. For each axis there is one λ and there is one vector x. For every λ there is one vector y.
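Steps 4 and 5 can be sketched with a symmetric eigensolver. The function name `ca_eigen` and the example matrix are hypothetical. The sketch assumes the n × n matrix is the product of B with its transpose and the p × p matrix is the reverse product, so both share the same nonzero eigenvalues. `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are reversed here.

```python
import numpy as np

def ca_eigen(A):
    """Eigenvalues and eigenvectors for sample units (X) and species (Y)."""
    A = np.asarray(A, dtype=float)
    B = A / np.sqrt(np.outer(A.sum(axis=1), A.sum(axis=0)))
    lam_x, X = np.linalg.eigh(B @ B.T)  # n x n problem, vectors x
    lam_y, Y = np.linalg.eigh(B.T @ B)  # p x p problem, vectors y
    # Reverse so that the largest eigenvalue comes first.
    lam_x, X = lam_x[::-1], X[:, ::-1]
    lam_y, Y = lam_y[::-1], Y[:, ::-1]
    return lam_x, X, lam_y, Y

# Hypothetical 3 x 2 abundance matrix (sample units x species).
A = np.array([[4, 1], [2, 2], [1, 5]])
lam_x, X, lam_y, Y = ca_eigen(A)
k = min(A.shape)
print(lam_x[:k], lam_y[:k])  # the same set of eigenvalues in both cases
```

With this weighting and no prior centering, the largest eigenvalue equals 1 and belongs to a trivial axis whose elements are proportional to the square roots of the totals. Ordination axes are taken from the eigenvectors that follow it.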

Page 13: CHAPTER 19 Correspondence Analysis

6. At this point, we have found X and Y for k dimensions such that:

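$$\underset{n \times k}{X} = \underset{n \times p}{A}\ \underset{p \times k}{Y} \quad \text{and} \quad \underset{p \times k}{Y} = \underset{p \times n}{A'}\ \underset{n \times k}{X}$$

where Y contains the species ordination, A is the original data matrix, and X contains the sample ordination.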

Page 14: CHAPTER 19 Correspondence Analysis

Each component or axis can be represented as a linear combination of the original variables.

Each eigenvector contains the coefficients for the equation for one axis.

For eigenvector 1 (the first column of Y):

Score 1 for entity i:

$$x_i = y_1 a_{i1} + y_2 a_{i2} + \dots + y_p a_{ip}$$
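The linear combination can be written either as an explicit sum over species or as a single matrix-vector product. The coefficients in `y` below are hypothetical values chosen for easy arithmetic, not an eigenvector of any real data set.

```python
import numpy as np

# Hypothetical data: 3 sample units (rows) by 2 species (columns).
A = np.array([[4.0, 1.0], [2.0, 2.0], [1.0, 5.0]])
y = np.array([0.5, -0.5])  # hypothetical coefficients y_1, y_2

def score(i):
    """Explicit sum y_1*a_i1 + y_2*a_i2 + ... + y_p*a_ip for entity i."""
    return sum(y[j] * A[i, j] for j in range(A.shape[1]))

x_loop = np.array([score(i) for i in range(A.shape[0])])

# The same scores for all entities at once.
x = A @ y
print(x)  # [ 1.5  0.  -2. ]
```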

Page 15: CHAPTER 19 Correspondence Analysis

The sample unit scores are scaled by multiplying each element of the SU eigenvectors, X, by

$$\text{SU scaling factor} = \sqrt{a_{++} / a_{i+}}$$

where $a_{++}$ is the grand total of the matrix A.

The species scores are scaled by multiplying each element of the species eigenvectors, Y, by

$$\text{Species scaling factor} = \sqrt{a_{++} / a_{+j}}$$
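The scaling step can be sketched as below, assuming the scaling factors are read as $\sqrt{a_{++}/a_{i+}}$ for sample units and $\sqrt{a_{++}/a_{+j}}$ for species. As a convenience, the sketch uses a singular value decomposition of B in place of two separate eigenanalyses: the left and right singular vectors are the eigenvectors x and y, and the squared singular values are the eigenvalues. The function name `scaled_ca_scores` and the example matrix are hypothetical.

```python
import numpy as np

def scaled_ca_scores(A):
    """Scaled sample unit scores X, species scores Y, and eigenvalues."""
    A = np.asarray(A, dtype=float)
    row_totals = A.sum(axis=1)  # a_{i+}
    col_totals = A.sum(axis=0)  # a_{+j}
    grand_total = A.sum()       # a_{++}
    B = A / np.sqrt(np.outer(row_totals, col_totals))
    # Singular values come back in descending order.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Drop the first (trivial) axis, whose singular value is 1.
    X = U[:, 1:] * np.sqrt(grand_total / row_totals)[:, None]
    Y = Vt.T[:, 1:] * np.sqrt(grand_total / col_totals)[:, None]
    return X, Y, s[1:] ** 2

# Hypothetical 3 x 2 abundance matrix (sample units x species).
A = np.array([[4, 1], [2, 2], [1, 5]])
X, Y, eigenvalues = scaled_ca_scores(A)
print(X.ravel(), Y.ravel(), eigenvalues)
```

Under this reading, the scaled scores on each axis have a weighted mean of zero and a weighted variance of one, with weights proportional to the sample unit or species totals.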

Page 16: CHAPTER 19 Correspondence Analysis

Major steps: reciprocal averaging approach

1. Arbitrarily assign scores, x, to the n sample units. The scores position the sample units on an ordination axis.

Page 17: CHAPTER 19 Correspondence Analysis

2. Calculate species scores as weighted averages, where $a_{+j}$ is the total for species j:

$$y_j = \frac{\sum_{i=1}^{n} a_{ij}\, x_i}{a_{+j}}$$

Page 18: CHAPTER 19 Correspondence Analysis

3. Calculate new site scores by weighted averaging of the species scores, where $a_{i+}$ is the total for sample unit i:

$$x_i = \frac{\sum_{j=1}^{p} a_{ij}\, y_j}{a_{i+}}$$

Page 19: CHAPTER 19 Correspondence Analysis

4. Center and standardize the site scores so that

$$\sum_{i=1}^{n} a_{i+}\, x_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} a_{i+}\, x_i^2 = 1$$

Page 20: CHAPTER 19 Correspondence Analysis

5. Check for convergence of the solution.

If the site scores are closer than a prescribed tolerance to the site scores of the preceding iteration, then stop. Otherwise, return to step 2.
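The five steps of reciprocal averaging can be sketched as a short loop. This is a minimal illustration, not a reference implementation: the function names, the random starting scores, the tolerance, the iteration cap, and the example matrix are all assumptions. It finds only the first axis and assumes the data contain some structure beyond the row and column totals, so that the centered scores do not collapse to zero.

```python
import numpy as np

def _standardize(x, row_totals):
    """Step 4: center and rescale so sum(a_i+ x_i) = 0 and sum(a_i+ x_i^2) = 1."""
    x = x - np.sum(row_totals * x) / np.sum(row_totals)
    return x / np.sqrt(np.sum(row_totals * x ** 2))

def reciprocal_averaging(A, tol=1e-10, max_iter=10_000, seed=0):
    """First-axis site scores x and species scores y by reciprocal averaging."""
    A = np.asarray(A, dtype=float)
    row_totals = A.sum(axis=1)  # a_{i+}
    col_totals = A.sum(axis=0)  # a_{+j}
    # Step 1: arbitrary starting scores for the n sample units.
    rng = np.random.default_rng(seed)
    x = _standardize(rng.normal(size=A.shape[0]), row_totals)
    for _ in range(max_iter):
        y = (A.T @ x) / col_totals                    # step 2
        x_new = (A @ y) / row_totals                  # step 3
        x_new = _standardize(x_new, row_totals)       # step 4
        converged = np.max(np.abs(x_new - x)) < tol   # step 5
        x = x_new
        if converged:
            break
    y = (A.T @ x) / col_totals  # species scores matching the final site scores
    return x, y

# Hypothetical 4 x 3 abundance matrix (sample units x species).
A = np.array([[10, 5, 0], [4, 6, 3], [0, 5, 9], [1, 2, 8]])
x, y = reciprocal_averaging(A)
print(np.round(x, 4), np.round(y, 4))
```

Up to an arbitrary sign, the converged site scores should agree with the first nontrivial axis from the eigenanalysis approach, which is one way to check an implementation of either method.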

Page 21: CHAPTER 19 Correspondence Analysis

[Figure: one-dimensional ordination axis. Sample units A and B are labeled with scores -192 and 160; species 1, 2, and 3 are labeled with scores -254, 56, and 212.]

Figure 19.4. One-dimensional CA ordination of the same data set used in the weighted averaging example in the previous chapter (Fig. 18.1). Scores were standardized to unit variance, then multiplied by 100.

Page 22: CHAPTER 19 Correspondence Analysis

[Figure: three ordination panels, RA (CA), NMS, and PCA, each plotting Axis 1 against Axis 2.]

Figure 19.5. Comparison of 2-D CA (RA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure. The lines connect sample points along the major environmental gradient. The minor secondary gradient is nearly orthogonal to the major gradient. In the perfect ordination, the points would form a horizontally elongate grid. Inset: the ideal result is a regular 3 × 10 grid.

Page 23: CHAPTER 19 Correspondence Analysis

Table 19.1. Comparison of correspondence analysis (CA), nonmetric multidimensional scaling (NMS), and principal components analysis (PCA) of a data set with known underlying structure.

                                              CA       NMS      PCA
Proportion of variance represented*
  Axis 1                                     0.473    0.832    0.327
  Axis 2                                     0.242    0.084    0.194
  Cumulative                                 0.715    0.917    0.521
Correlations with environmental variables
  Axis 1
    Gradient 1                              -0.982    0.984   -0.852
    Gradient 2                              -0.067    0.022   -0.397
  Axis 2
    Gradient 1                              -0.058    0.102   -0.204
    Gradient 2                              -0.241    0.790    0.059

* Eigenvalue-based for CA and PCA; distance-based for NMS