
Analysis of Temporal (Stratigraphic) and Spatial Data

NUMERICAL ANALYSIS OF BIOLOGICAL AND ENVIRONMENTAL DATA

John Birks

Introduction

Temporal stratigraphic data

Single sequence

Partitioning or zonation

Sequence splitting

Rate-of-change analysis

Gradient analysis and summarisation

Analogue matching

Relationships between two or more sets of variables in same sequence

Two or more sequences

Sequence comparison and correlation

Multi-proxy studies

Hypothesis testing

Spatial geographical data

Spatial autocorrelation

Spatially constrained clusterings

Spatially constrained ordinations

Predictive models for spatial data

ANALYSIS OF TEMPORAL AND SPATIAL DATA

INTRODUCTION

Analysis of quadrats, lakes, streams, etc. Assume no autocorrelation, i.e. the values of a variable at some point in space cannot be predicted from known values at other sampling points.

PALAEOECOLOGY – fixed sample order in time.

strong autocorrelation – temporal autocorrelation

STRATIGRAPHICAL DATA

biostratigraphic, lithostratigraphic, geochemical, geophysical, morphometric, isotopic

multivariate

continuous or discontinuous time series

ordering very important – display, partitioning, trends, interpretation

SPATIAL DATA

many types, spatial autocorrelation

spatial or geographical co-ordinates very important

raises problems of statistical inference as samples not independent

TEMPORAL STRATIGRAPHIC DATA

ANALYSIS OF SINGLE SEQUENCE

ZONATION OR PARTITIONING

Useful for:

1) description

2) discussion and interpretation

3) comparisons in time and space

Zone = “sediment body with a broadly similar composition that differs from underlying and overlying sediment bodies in the kind and/or amount of its composition”.

CONSTRAINED CLUSTERINGS

1) Constrained agglomerative procedures CONSLINK

CONISS

2) Constrained binary divisive procedures

Partition into g groups by placing g – 1 boundaries.

Number of possibilities: $\binom{n-1}{g-1}$, i.e. $n - 1$ for $g = 2$

Compared with $2^{n-1} - 1$ for the non-constrained situation.

Criteria – within-group sum-of-squares or variance SPLITLSQ

– within-group information SPLITINF

$$\sum_{i=1}^{n} \sum_{k=1}^{m} p_{ik} \log \frac{p_{ik}}{q_{ik}}$$

3) Constrained optimal divisive analysis OPTIMAL

[Figure: partitions of one sequence into 2, 3 and 4 groups, with group sizes labelled n1, n2, n3.]

4) Variable barriers approach BARRIER

All methods in one program: ZONE

RIOJA

Pollen diagram and numerical zonation analyses for the complete Abernethy Forest 1974 data set.

Birks & Gordon (1985)

CONISS = constrained incremental sum-of-squares (= constrained Ward's minimum variance)

OPTIMAL SUM-OF-SQUARES PARTITIONS OF THE ABERNETHY FOREST 1974 DATA

Number of groups g (zones)   Percentage of total sum-of-squares   Markers
 2                           59.3                                 15
 3                           28.4                                 15 32
 4                           18.9                                 15 33 41
 5                           14.7                                 15 33 41 45
 6                           10.6                                 15 32 34 41 45
 7                            8.1                                 15 26 32 34 41 45
 8                            5.8                                 8 15 26 32 34 41 45
 9                            4.7                                 8 15 24 29 32 34 41 45
10                            3.9                                 8 15 24 29 32 33 34 41 45
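An optimal sum-of-squares partition of this kind can be sketched with dynamic programming. This is a minimal illustration, not the OPTIMAL or ZONE code: the function names are hypothetical, the data are assumed to be a list of samples in stratigraphical order, and a marker is taken here to be the 1-based number of the last sample in a group, which may differ from the convention used in the table.

```python
def within_ss(rows):
    """Within-group sum-of-squares: squared deviations from the group mean,
    summed over all variables."""
    n, m = len(rows), len(rows[0])
    total = 0.0
    for k in range(m):
        mean_k = sum(r[k] for r in rows) / n
        total += sum((r[k] - mean_k) ** 2 for r in rows)
    return total


def optimal_partition(data, g):
    """Best split of an ordered sequence into g contiguous groups.

    Returns (total within-group SS, markers), where each marker is the
    1-based number of the last sample in one of the first g - 1 groups
    (an assumed convention).
    """
    n = len(data)
    # cost[i][j] = within-group SS of samples i .. j-1 (0-based, half-open)
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = within_ss(data[i:j])

    inf = float("inf")
    # best[k][j] = minimal SS for the first j samples split into k groups
    best = [[inf] * (n + 1) for _ in range(g + 1)]
    back = [[0] * (n + 1) for _ in range(g + 1)]
    best[0][0] = 0.0
    for k in range(1, g + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = best[k - 1][i] + cost[i][j]
                if cand < best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Trace the boundaries back from the last group to the first.
    markers = []
    j = n
    for k in range(g, 1, -1):
        j = back[k][j]
        markers.append(j)
    return best[g][n], sorted(markers)


def percent_of_total(data, g):
    """Residual within-group SS as a percentage of the total SS."""
    ss, _ = optimal_partition(data, g)
    return 100.0 * ss / within_ss(data)
```

Unlike a binary divisive procedure, the boundaries found for g groups need not be retained for g + 1 groups, which is the behaviour seen in the markers column (32 for three groups, 33 for four).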

HOW MANY ZONES?

K D Bennett (1996) Determination of the number of zones in a bio-stratigraphical sequence. New Phytologist 132, 155-170

Broken stick model

$$\mathrm{Pr}_k = \frac{1}{n} \sum_{i=k}^{n} \frac{1}{i}$$

RIOJA (R)

BSTICK
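The broken-stick expectation is simple to compute directly. The sketch below is an illustration only, not the BSTICK or rioja code. The helper `zones_exceeding_model` is hypothetical, and its use of the length of the observed list as n is an assumption made for the example.

```python
def broken_stick(n):
    """Expected proportion of the k-th largest of n pieces, for k = 1..n."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def zones_exceeding_model(observed):
    """Count the leading observed proportions of variance that exceed the
    broken-stick expectation; stop at the first one that does not."""
    expected = broken_stick(len(observed))
    count = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break
        count += 1
    return count
```

The expected proportions always sum to 1, so an observed curve that stays above the model for its first few splits and then drops below it suggests that only those first splits account for more variance than expected by chance.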

Pollen percentage diagram plotted against depth. Lithostratigraphic column is represented; symbols are based on Troels-Smith (1955).

Tzedakis (1994)

Ioannina Basin

Variance accounted for by the nth zone as a proportion of the total variance (fluctuating curve) compared with values from a broken-stick model (smooth curve):

(a) randomized data set,

(b) original data set.

Zonation method: binary divisive using the information content statistic.

Data set; Ioannina.

Original data

Broken stick model

Bennett (1996)

Technical Point

Turns out that the binary divisive procedures SPLITINF and SPLITLSQ of Gordon and Birks (1972) are an early implementation of De’ath’s (2002) multivariate regression trees (MRT) discussed in the Modern Regression lecture.

Both are MRTs where a vector of sample depths or ages is used as the sole explanatory predictor variable

SPLITINF = distance-based MRT with information content as the dissimilarity measure

SPLITLSQ = MRT with Euclidean distance as the distance measure

Advantage of MRT over SPLITINF/SPLITLSQ as a zonation procedure is that the k-fold cross-validation in CARTs provides a simple way to assess the number of zones into which the stratigraphical sequence should be split.

MRT using the optimal partitioning approach still to be implemented.

mvpart (R)

SEQUENCE SPLITTING

Walker & Wilson (1978) J. Biogeog. 5, 1–21

Walker & Pittelkow (1981) J. Biogeog. 8, 37–51

SPLIT, SPLIT2

BOUND2

Need statistically ‘independent’ curves

Pollen influx (grains cm⁻² year⁻¹)

PCA or CA or DCA axes CANOCO

Aitchison log-ratio transformation LOGRATIO

$$Z_{ik} = \log \frac{p_{ik}}{\bar{p}_i} \quad \text{where} \quad \log \bar{p}_i = \frac{1}{m} \sum_{k=1}^{m} \log p_{ik}$$
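A centred log-ratio of this form can be sketched for a single sample as follows. This is a minimal illustration, not the LOGRATIO program: the function name is hypothetical, and zero proportions are simply rejected here, whereas real pollen data would need a stated replacement rule for zeros.

```python
import math


def centred_log_ratio(p):
    """Log of each proportion minus the mean log over the m taxa of one sample."""
    if any(x <= 0 for x in p):
        raise ValueError("all proportions must be positive")
    logs = [math.log(x) for x in p]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

The transformed values of a sample sum to zero, which removes the closure constraint that makes raw percentage curves statistically dependent on one another.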

Correlograms of sequence splits with charcoal, inorganic matter and total pollen influxes for three sections of the pollen record. The vertical scales give correlations; the horizontal scales give time lag in years (assuming a sampling interval of 50 years).

Technical Point

The sequence splitting of Walker and Wilson (1978) is a precursor of regression trees within CART (see Modern Regression lecture).

In a regression tree a quantitative response variable, in our case a stratigraphical sequence of taxon A, is repeatedly split so that at each partition the sequence is divided into two mutually exclusive groups, each of which is as homogeneous as possible.

In the regression tree implementation, a vector of sample depths or ages is used as the sole explanatory predictor variable. The splitting is then applied to each group separately until some stopping rule is reached.

Usually k-fold cross-validation is used to find the optimal tree-size using cost-complexity (CC) pruning.

CC = Timpurity + α(Tcomplexity)

where Timpurity is the impurity of the current tree over all terminal nodes; Tcomplexity is the number of terminal leaves; and α is a real number >0.

α is the tuning parameter in CC pruning. It represents the trade-off between tree-size and goodness-of-fit.

Small values of α give large trees; large values of α lead to small trees.

Starting with the full tree, search to identify the terminal node that results in the lowest CC for a given value of α.

As the penalty α on tree complexity is increased, the tree that minimises CC will become smaller and smaller until the penalty is so great that a tree with a single node (i.e. the original data) has the lowest CC.

Search produces a sequence of progressively smaller trees with associated CC.

k-fold cross-validation is used to find the optimal value of α that gives the minimal root mean squared error (RMSE). Alternative is to select the smallest tree that lies within 1 standard error of the RMSE of the best tree.

rpart (R)
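The trade-off in cost-complexity pruning, and the 1 standard error rule, can be illustrated with a few lines of arithmetic. This sketch does not use rpart: the function names are hypothetical, and the candidate trees are assumed to be already summarised as (impurity, number of leaves) pairs or as (number of leaves, cross-validated RMSE, standard error) triples.

```python
def cost_complexity(impurity, n_leaves, alpha):
    """CC = impurity + alpha * number of terminal leaves."""
    return impurity + alpha * n_leaves


def best_tree(trees, alpha):
    """Pick the (impurity, n_leaves) pair with the lowest CC for a given alpha."""
    return min(trees, key=lambda t: cost_complexity(t[0], t[1], alpha))


def one_se_rule(cv_results):
    """Smallest tree whose cross-validated RMSE lies within one standard
    error of the best tree. cv_results holds (n_leaves, rmse, se) triples."""
    _, best_rmse, best_se = min(cv_results, key=lambda r: r[1])
    threshold = best_rmse + best_se
    eligible = [r for r in cv_results if r[1] <= threshold]
    return min(eligible, key=lambda r: r[0])[0]
```

With made-up candidates such as `[(10.0, 1), (4.0, 2), (2.5, 3), (2.0, 4), (1.95, 5)]`, a small alpha selects the five-leaf tree and a large alpha selects the single-node tree, matching the behaviour described above.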

RATE OF CHANGE ANALYSIS

Amount of palynological compositional change per unit time.

Calculate dissimilarity between pollen assemblages of two adjacent samples and standardise to constant time unit, e.g. 250 14C years.

Jacobson & Grimm (1986) Ecology 67, 958-966

Grimm & Jacobson (1992) Climate Dynamics 6, 179-184

RATEPOL

POLSTACK

(TILIA)
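The basic rate-of-change calculation (dissimilarity between adjacent samples, standardised to a constant time unit) can be sketched as below. This is not the RATEPOL or TILIA implementation: the chord distance is an assumed choice of dissimilarity, the function names are hypothetical, and any smoothing or interpolation to equal time intervals used in published analyses is left out.

```python
import math


def chord_distance(a, b):
    """Chord distance between two samples of proportions."""
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b)))


def rate_of_change(samples, ages, unit=250.0):
    """Dissimilarity between adjacent samples per `unit` years.

    samples: list of proportion vectors in stratigraphical order.
    ages:    sample ages in years, same order, no two adjacent ages equal.
    """
    rates = []
    for i in range(len(samples) - 1):
        dt = abs(ages[i + 1] - ages[i])
        d = chord_distance(samples[i], samples[i + 1])
        rates.append(d / dt * unit)
    return rates
```

Dividing by the time elapsed between samples matters because unevenly spaced samples would otherwise show spuriously large change across long gaps.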

Graph of distance (number of standard deviations) moved every 100 yr in the first three dimensions of the ordination vs age. Greater distance indicates greater change in pollen spectra in 100yr.

Jacobson & Grimm (1986)


GRADIENT ANALYSIS OF SINGLE SEQUENCE

Ordination methods: CA/DCA (joint plot) or PCA (biplot)

Constrained CA or PCA

Sample summary: CA/DCA/PCA

Species arrangement: CCA or simple discriminants

CA = correspondence analysis

DCA = detrended correspondence analysis

PCA = principal components analysis

CCA = canonical correspondence analysis

VEGAN

CANOCO

Biplot of the Kirchner Marsh data; C2 = 0.746. The lengths of the Picea and Quercus vectors have been scaled down relative to the other vectors. Stratigraphically neighbouring levels are joined by a line.

PCA Biplot 74.6%

Gordon, 1982

Correspondence analysis representation of the Kirchner Marsh data; C2 = 0.620. Stratigraphically neighbouring levels are joined by a line.

CA Joint Plot 62%

Gordon, 1982

Stratigraphical plot of sample scores on the first correspondence analysis axis (left) and of rarefaction estimate of richness (E(Sn)) (right) for Diss Mere, England. Major pollen-stratigraphical and cultural levels are also shown. The vertical axis is depth (cm). The scale for sample scores runs from –1.0 (left) to + 1.2 (right).

The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd axis contrasts taxa preferring wetter sites with those preferring drier sites.

Haberle & Bennett 2004

Percentage pollen and spore diagram from Abernethy Forest, Inverness-shire. The percentages are plotted against time, the age of each sample having been estimated from the deposition time. Nomenclatural conventions follow Birks (1973a) unless stated in Appendix 1. The sediment lithology is indicated on the left side, using the symbols of Troels-Smith (1955). The pollen sum, P, includes all non-aquatic taxa. Aquatic taxa, pteridophytes, and algae are calculated on the basis of P + group as indicated.

Species arrangement

Pollen types re-arranged on the basis of the weighted average for depth TRAN

ANALOGUE ANALYSIS

Modern training set – similar taxonomy

– similar sedimentary environment

Compare fossil sample 1 with all modern samples, use appropriate DC, find sample in modern set ‘most like’ (i.e. lowest DC) fossil sample 1, call it ‘closest analogue’, repeat for fossil sample 2, etc.

Overpeck et al. (1985) Quat. Res. 23, 87–108

ANALOG

MATCH

MAT

ANALOGUE – R package

RIOJA

For each fossil sample i:

1. Compare fossil sample i with modern sample j

2. Calculate similarity Sij between i and j

3. Repeat for all modern samples

4. Find modern sample with highest similarity: the 'ANALOGUE'

5. ? Evaluation

Repeat for all fossil samples
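The matching loop can be sketched in a few lines. This is an illustration only, not the ANALOG, MATCH, MAT or R package code: the squared chord distance is used as the dissimilarity coefficient because it appears in the figures for this section, the function names are hypothetical, and samples are assumed to be proportions over the same ordered list of taxa.

```python
import math


def squared_chord(a, b):
    """Squared chord distance between two samples of proportions."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))


def closest_analogues(fossil, modern, dc=squared_chord):
    """For each fossil sample return (index of closest modern sample, DC value)."""
    result = []
    for f in fossil:
        dists = [dc(f, m) for m in modern]
        j = min(range(len(modern)), key=dists.__getitem__)
        result.append((j, dists[j]))
    return result
```

The returned dissimilarity is what the evaluation step needs: a closest analogue is only a good analogue if its DC is below some cut-off chosen for the training set.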

Dissimilarity coefficients, radiocarbon dates, pollen zones, and vegetation types represented by the top ten analogues from the Lake West Okoboji site.

Maps of squared chord distance values with modern samples at selected time

intervals

Plots of minimum squared chord-distance for each fossil spectrum at each of the eight sites.

A schematic representation of how fossil diatom zones/samples in a sediment core from an acidified lake can be compared numerically with modern surface sediment samples collected from potential modern analogue lakes. In this space-for-time model the vertical axis represents sedimentary diatom zones defined by depth and time; the horizontal axis represents spatially distributed modern analogue lakes and the dotted lines indicate good floristic matches (dij = <0.65), as defined by the mean squared Chi-squared estimate of dissimilarity (SCD, see text).

Flower et al. (1997)

Analogues and lake restoration

Flower et al. (1997)

COMPARISON AND CORRELATION BETWEEN TIME SERIES

Two or more stratigraphical sets of variables from same sequence.

Are the temporal patterns similar?

(1) Separate ordinations

Oscillation log-likelihood G-test or χ² test

(2) Constrained ordinations

Pollen data - 3 or 4 ordination axes or major patterns of variation Y

Chemical data - 3 or 4 ordination axes X

Depth as a covariable

Does 'chemistry' explain or predict 'pollen'? i.e. is variance in Y well explained by X?

Lotter et al. (1992) J. Quat. Sci. Pollen 16O/18O (depth)

34% 16% 12%

79% 12% 4% 1%


Pollen, oxygen-isotope stratigraphy, and sediment composition of Aegelsee core AE-1 (after Wegmüller and Lotter 1990)

Pollen and oxygen-isotope stratigraphy of Gerzensee core G-III (after Eicher and Siegenthaler

1976)

Is there a statistically significant relationship between the pollen stratigraphy and the stable-isotope record?

Summary of the results from detrended correspondence analysis (DCA) of late-glacial pollen spectra from five sequences. The percentage variance represented by each DCA axis is listed.

Reduce pollen data to DCA axes. Use these then as ‘responses’

                                                  Percentage variance on DCA axis
Site              No. of samples   No. of taxa    1      2      3     4
Aegelsee AE-1     100              26             57.2   12.0   2.3   1.4
Aegelsee AE-3      54              32             44.3    3.3   1.5   1.4
Gerzensee G-III    65              28             37.6    4.0   1.2   0.9
Faulenseemoos      62              25             44.1   18.8   5.0   3.8
Rotsee RL-250      44              23             38.2   13.3   3.1   2.3

Results of redundancy analysis and partial redundancy analysis permutation tests for the significance of axis 1 when oxygen isotopes and depth are predictor variables, when oxygen is the only predictor, and when oxygen isotopes are the predictor variable and depth is a covariable.

Site              Predictor variables:   Predictor variable: δ18O   Predictor variable:   Number of response variables
                  δ18O and depth         Covariable: depth          δ18O                  (pollen DCA axes)
Aegelsee AE-1     0.01a                  0.01a                      0.02a                 2
Aegelsee AE-3     0.01a                  0.16                       0.20                  1
Gerzensee G-III   0.01a                  0.46                       0.57                  1
Faulenseemoos     0.01a                  0.01a                      0.01a                 3
Rotsee RL-250     0.01a                  0.21                       0.08                  2

a Significant at p < 0.05

(Lotter et al. 1992)

In multi-proxy studies (e.g. pollen, diatoms, chironomids, etc. studied on the same core), important question is ‘are the major stratigraphical patterns of variation (‘signal’) the same in all proxies?’

Laguna Facil, southern Chile

Massaferro et al. 2005 Quaternary Science Reviews 24: 2510-2522

Pollen and chironomids studied on the same core

Simplified each data-set to the first ordination axes of a correspondence analysis (CA) and a principal components analysis (PCA) for both data-sets

MULTI-PROXY STUDIES

Massaferro et al. 2005

Chironomid stratigraphy


Pollen stratigraphy

Can detect similarities in both proxies and differences

1. Major change in both prior to 14,700 cal yr BP.

2. Changes in the chironomids tend to lag behind changes in the pollen. Perhaps a chironomid response to changes in vegetation (tree canopy and forest type) or lake chemistry, resulting from changes in catchment soils as a result of vegetational change.

3. At about 7200 cal yr BP, chironomids change before the pollen. May be a response to climate change.

4. Strong correlations between the charcoal stratigraphy and pollen and chironomid stratigraphies. Probable importance of fire and/or vulcanism in influencing both vegetational and limnological dynamics.

Charcoal


Can use ordination methods to summarise several palaeoecological proxies and to compare with other proxies

Major changes between pre-European period (A)

and European settlement (B)

Lake Euramoo, NE Queensland, last 800 years

Haberle et al. 2006

Tested how well different proxies ‘predict’ or ‘explain’ (in a statistical sense) other proxies

Only proxy that significantly predicted other proxies was pollen that predicted changes in diatoms (25.4%) and chironomids (15.4%)

Illustrates the importance of catchment and its vegetation on the lake and its biota

Assessing Potential External 'Drivers' on an Aquatic Ecosystem

Bradshaw et al. 2005 The Holocene 15: 1152-1162

Dalland Sø, a small (15 ha), shallow (2.6 m) lowland eutrophic lake on the island of Funen, Denmark.

Catchment (153 ha) today

agriculture 77 ha

built-up areas 41 ha

woodland 32 ha

wetlands 3 ha

Nutrient rich – total P 65-120 µg l⁻¹

Map of Dalland Sø

Multi-proxy study to assess role of potential external 'drivers' or forcing functions on changes in the lake ecosystem in last 7000 yrs.

Data                                            No. of samples   Transformation
Sediment loss-on-ignition %                     560              None
Sediment dry mass accumulation rate             560              Log (x + 1)
Sediment minerogenic matter accumulation rate   560              Log (x + 1)
Plant macrofossil concentrations                280              Log (x + 1)
Pollen %                                         90              None
Diatoms %                                       118              None
Diatom inferred total P                         118              None
Biogenic silica                                  84              Not used
Pediastrum %                                     90              None
Zooplankton                                      31              Not used

Terrestrial landscape or catchment development

Bradshaw et al. 2005

Aquatic ecosystem development


DCA of pollen and diatom data separately to summarise major underlying trends in both data sets

Pollen – high scores for trees, low scores for light-demanding herbs and crops

Diatom - high scores mainly planktonic and large benthic types, low scores for Fragilaria spp. and eutrophic spp. (e.g. Cyclostephanos dubius)


Major contrast between samples before and after Late Bronze Age forest clearances


'Catchment'

'Lake'

Prior to clearance, lake experienced few impacts.

After the clearance, lake heavily impacted.

Canonical Correspondence Analysis

Response variables:

Diatom taxa

Predictor variables:

Pollen taxa, LOI, dry mass and minerogenic accumulation rates, plant macrofossils, Pediastrum

Covariable:

Age

69 matching samples

Partial CCA with age partialled out as a covariable. Makes interpretation of effects of predictors easier by removing temporal trends and temporal autocorrelation

Partial CCA all variables:

18.4% of variation in diatom data explained by Poaceae pollen, Cannabis-type pollen, and Daphnia ephippia, the only three independent and statistically significant predictors.

As different external factors may be important at different times, divided data into 50 overlapping data sets – sample 1-20, 2-21, 3-22, etc.

CCA of 50 subsets from bottom to top and % variance explained
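The overlapping subsets are a moving window over the samples in stratigraphical order. The sketch below only generates the window boundaries. The function name is hypothetical, and the window width of 20 is inferred from the stated ranges 1-20, 2-21, 3-22, which for the 69 matching samples gives exactly the 50 subsets mentioned.

```python
def moving_windows(n_samples, width):
    """1-based (first, last) sample numbers of every window of the given width."""
    return [(start + 1, start + width) for start in range(n_samples - width + 1)]


windows = moving_windows(69, 20)
# len(windows) is 69 - 20 + 1 = 50; the first is (1, 20) and the last is (50, 69)
```

Each window would then be analysed separately, so the percentage of variance explained can be followed from the bottom to the top of the sequence.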


1. 4520-1840 BC Poaceae is sole predictor variable (20-22% of diatom variance)

2. 3760-1310 BC LOI and Populus pollen (16-33%)

3. 3050-600 BC Betula, Ulmus, Populus, Fagus, Plantago, etc. (17-40%)

i.e. in these early periods, diatom change influenced to some degree by external catchment processes and terrestrial vegetation change.

4. 2570 BC – 1260 AD Erosion indicators (charcoal, dry mass accumulation), retting indicator Linum capsules, Daphnia ephippia, Secale and Hordeum pollen (11-52%)

i.e. changing water depth and external factors

5. 160 BC – 1900 AD Hordeum, Fagus, Cannabis pollen, Pediastrum boryanum, Nymphaea seeds (22-47%)

i.e. nutrient enrichment as a result of retting hemp, also changes in water depth and water clarity

Strong link between inferred catchment change and within-lake development. Timing and magnitude are not always perfectly matched, e.g. transition to Mediæval Period


ANALYSIS OF TWO OR MORE SEQUENCES

Regional zones, description of common features, interpretation, detection of unique features.

Sequence comparison and correlation.

Sequence slotting

SLOTSEQ

FITSEQ

CONSSLOT

Combined scaling of two or more sequences.

CANOCO

Slotting of the sequences S1 (A1, A2, ..., A10) and S2 (B1, B2, ..., B7), illustrating the contributions to the measure of discordance ψ(S1, S2) and the 'length' of the sequences, (S1, S2).

The results of sequence-slotting of the Wolf Creek and Horseshoe Lake pollen sequences (ψ = 2.095). Radiocarbon dates for the pollen zone boundaries are also given, expressed as radiocarbon years before present (BP).

SLOTSEQ

Birks & Gordon (1985)

Comparison of oxygen-isotope records from Swiss lakes Aegelsee (AE-3), Faulenseemoos (FSM) and Gerzensee (G-III) with the Greenland Dye 3 record (Dansgaard et al, 1982). LST marks the position of the Laacher See Tephra (11,000 yr BP). Letters and numbers mark the position of synchronous events (for details see text).

Psi values for pair-wise sequence slotting of the stable-isotope stratigraphy at five Swiss late-glacial sites and the Dye 3 site in Greenland. Values above the diagonal are constrained slotting, using the three major shifts shown in the previous figure; values below the diagonal are for sequence slotting in the absence of any external constraints. The mean δ18O and standard deviation for each sequence are also listed.

CONSLOXY

Lotter et al. (1992)

FUGLA NESS, Shetland

Pollen diagram from Sel Ayre showing the frequencies of all determinable and indeterminable pollen and spores expressed as percentages of total pollen and spores (P).

Abbreviations: undiff. = undifferentiated, indet = indeterminable.

Comparison of Bjärsjöholmssjön and Färskesjön using principal component analysis. The mean scores of the local pollen zones and the ranges of the sample scores in each zone are plotted on the first and second principal components, and are joined up in stratigraphic order. The Blekinge regional pollen assemblage zones are also shown.

Birks & Berglund (1979)

Comparison of Färskesjön and Lösensjön using principal component analysis. The mean scores of the local pollen zones and the ranges of the sample scores in each zone are plotted on the first and second principal components, and are joined up in stratigraphic order. The regional pollen assemblage zones are also shown.

The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd axis contrasts taxa preferring wetter sites with those preferring drier sites

Haberle & Bennett, 2004

Pollen percentage diagram of selected taxa plotted against depth. Lithostratigraphic symbols are based on Troels-Smith (1955). For correlations and ages see Tzedakis (1993, 1994).

Tzedakis & Bennett (1995)

Pollen percentage diagrams of selected arboreal taxa of the Metsovon, Zista, Pamvotis, and Dodoni I and II forest periods of Ioannina 249.

5e

7c

9c

11a + b + c

Tzedakis & Bennett (1995)

Solar insolation values of mid-month day for selected periods at latitude 39º40'N. Values are given for July and January extremes and July minus January for each interglacial period calculated at thousand year intervals. Values are expressed in cal cm⁻² day⁻¹. In parentheses are percentage differences from 10 ka values. Timing of extreme insolation excursions also given. Data from a computer program written by N.G. Pisias, based on Berger (1978). Chronology based on Imbrie et al. (1984) and Martinson et al. (1987).

Combined plot of sample scores on the first two principal components for Metsovon, Zista, Pamvotis, and Dodoni I forest periods. Asterisks indicate the base of the intervals considered.

Results of comparison of vegetation and climatic signatures of different interglacial periods. '+' sign means similar and '-' means different. First sign refers to climate and second to vegetation character.

Different climate, similar pollen in one comparison

Tzedakis & Bennett, 1995

TEMPORAL DATA

e.g. from monitoring

Rate of change

Gradient analysis (unconstrained, constrained)

Principal response curves

Variance partitioning

Trend analysis – regression against time, Monte Carlo permutation testing

Time-series analysis – see Gavin Simpson’s lecture

- many (>100) points

- few (10–20) points

HYPOTHESIS TESTING

1. External climate forcing functions

2. Catchment forcing functions

3. Lake as isolated system that evolves through time with its own internal dynamics

Assessing potential 'drivers' on aquatic ecosystems.

What determines changes in lake organisms and lake sediments?

Lake Development and Catchment Change

Birks et al. 2000

(a) Sägistalsee, Bernese Oberland, Swiss Alps

A.F. Lotter et al. 2003

J. Paleolimnology 30: 253-342

Andy Lotter

Lotter & Birks 2003

Lotter & Birks 2003

Age-depth model Sedimentation rate

Wick et al. 2003

Heiri & Lotter 2003

Sägistalsee, Switzerland

Ideal study:

1. Critical ecological situation at tree-line today; sensitive

2. One core. Many proxies (pollen, macros, chironomids, cladocera, grain size, sediment magnetics, sediment geochemistry)

3. Well dated; 18 AMS 14C dates on terrestrial plant material

4. Well co-ordinated by A.F. Lotter

5. High quality data:

Data-set       No. of samples   No. of taxa/variables
Pollen         212              203
Plant macros   372               53
Chironomids     82               30
Cladocera      112                7
Geochemistry   176               14
Grain-size     294                6
Magnetics      504                5

6. Consistent numerical methodology on all proxies

7. Numerical methods used to test hypotheses about the influence of climate and catchment processes on the aquatic ecosystem in the perspective of the Holocene time-scale. (Partial redundancy analysis with restricted Monte Carlo permutation tests)

Of the catchment changes, the main ones appear to be the spread of Picea abies at about 6300 cal BP and Bronze Age and subsequent forest clearances and conversion to grazing pastures.

8. Split proxy data into one predictor variable (plant macrofossils as a reflection of catchment vegetation) and several response variables (cladocera, chironomids, pollen, sediment grain-size, magnetics, geochemistry)

Predictor variables:

Lotter & Birks 2003

Hypotheses tested:

1. Climate has had a significant control on lake ecosystem changes

2. Catchment vegetation has played a significant role in lake changes

"Responses" (proxies)   Scale                  Climate a significant   Catchment vegetation a
                                               predictor? *            significant predictor? #
Terrestrial
  Pollen                Catchment & regional   Y                       Y
  Macrofossils          Catchment              -                       -
Lake biotic
  Chironomids           Lake                   N                       Y
  Cladocera             Lake                   N                       Y
Lake abiotic
  Grain size            Lake                   -                       Y
  Magnetics             Lake                   -                       Y
  Geochemistry          Lake                   -                       (Y)

* Tested against insolation, central European cold phases, & Atlantic IRD record

# Veg phases: Betula-Pinus cembra; Alnus-Pinus cembra; Picea abies ~ 6300 cal BP; Pasture phases from Bronze Age to present

SPATIAL GEOGRAPHICAL DATA

Geographical co-ordinates X, Y

Spatial analysis

Legendre & Fortin (1989) Vegetatio 80: 107-138

Legendre (1993) Ecology 74: 1659-1673

Koenig (1999) Trends in Ecology & Evolution 14: 22-26

Borcard et al. (2004) Ecology 85: 1826-1832

STATISTICAL ANALYSIS

Random sample assumption

Spatial autocorrelation

Effect of spatial autocorrelation on tests of correlation coefficients for randomly generated, positively autocorrelated data

[Figure: confidence interval of a correlation coefficient r, on an axis from -1 through 0 to +1.]

True interval: r not significantly different from zero

Confidence interval computed from the usual tables: r ≠ 0 ***

‘Liberal’ results – too many coefficients will be judged statistically significant when, in reality, they are not

SPATIAL AUTOCORRELATION

Classical statistics assumes independence of observations.

Ecological variables very commonly show spatial structure in the sample space.

Variable is autocorrelated when it is possible to predict values of this variable at some points in space from the known values at other sampling points whose spatial positions are known.

Correlation in relative mean density of mountain hares between eleven provinces in Finland over 39 years (1946-85) plotted against distance between centres of provinces.

HOW TO TEST FOR SPATIAL STRUCTURE?

Spatial autocorrelation coefficients – Moran's I

H0 – no spatial autocorrelation

Each value of the I coefficient is equal to

$$E(I) = -(n-1)^{-1} \approx 0$$

where E(I) is the expected I and n is the number of data points

H1 – there is significant spatial autocorrelation

The value of I is significantly different from E(I)

$$I(d) = \frac{n \sum_{i} \sum_{j} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{W \sum_{i} (y_i - \bar{y})^2}$$

where y represents the values of the variables, all summations are for i and j varying from 1 to n, the number of data points but excluding where i = j. The wij's take the value 1 when the pair (i,j) relates to distance class d (the one being computed) and is 0 otherwise, W is the sum of the wij's or the number of pairs (in the whole square matrix of distances between points) taken into account when computing coefficients for a given distance class.

I(d) is computed for each distance class d.

Moran's I usually -1 to +1 but can exceed these values.

Positive I suggests positive correlation

Negative I suggests negative correlation.

Can test for significance by standard errors and confidence intervals or by randomisation tests.

Behaves like Pearson's correlation coefficient r as its numerator is sum of cross-products of centred terms (covariance term), comparing in turn the values found at all pairs of points in the given distance class.

Sensitive to extreme values, like r is.

Plot a CORRELOGRAM where Moran's I is plotted against distance (d).

All-directional correlogram – assume that the phenomenon is isotropic, namely that the autocorrelation function is the same whatever direction is considered.
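Moran's I for one distance class of an all-directional correlogram can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes two-dimensional co-ordinates and a distance class defined as the half-open interval (d_min, d_max], and it applies no significance test.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I for the distance class (d_min, d_max].

    w_ij is 1 when the distance between points i and j falls in the class
    and 0 otherwise; pairs with i == j are excluded. Returns None when the
    class holds no pairs or the variable has no variance.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0
    w_total = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if d_min < dist <= d_max:
                cross += dev[i] * dev[j]
                w_total += 1
    denom = sum(d * d for d in dev)
    if w_total == 0 or denom == 0:
        return None
    return (n * cross) / (w_total * denom)


def expected_i(n):
    """Expected value of I under the null hypothesis of no autocorrelation."""
    return -1.0 / (n - 1)
```

For a linear trend 1, 2, 3, 4, 5 along a transect with unit spacing, the class (0, 1] gives I = 0.5 and the class (3, 4] gives I = -2, an instance of I falling outside the range -1 to +1. A correlogram is obtained by calling `morans_i` once per distance class and plotting the results against distance.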

Correlograms for artificial data. Black squares are significant at α = 0.05

Legendre & Fortin 1989

Moran's I correlogram for cross-validation residuals for transfer functions. See low I in MAT and ANN, high I in WA and GLR (ML) (spatial autocorrelation not sucked in by these methods), intermediate I in WAPLS

Legendre (1987) In: Evolutionary Biogeography of the Marine Algae of the North Atlantic (eds. D.J. Garbary & R.R. Soult). Springer

Legendre & Legendre (1984) Can. J. Fish. Aquat. Sci. 41, 1781-1802

Andersson (1988) Vegetatio 74, 95-106

Openshaw (1974) Computer Applic. 3-4, 136-160

Webster & Burrough (1972) J. Soil Sc. 23, 222-234

SPATIALLY CONSTRAINED CLUSTERINGS

REGIONALISATION

REGULAR GRID

A) Only group objects if they are adjacent (CONCLUST)

DC matrix of objects: D

Adjacency matrix (1/0): A (adjacent if have side or corner in common)

Compare D and A. If not adjacent, flag as negative DC and ignore.

Generalised agglomerative strategy: 7 methods

As fuse, update adjacency matrix

If Dab or Dbc positive, Dabc must be positive

Plot results as map for 10, 9, 8... 2 groups

CONCMAP (printer), CONCSCR (screen colours)

Observations:

1) Little difference in results between clustering methods (cf unconstrained cluster analysis).

Little difference with different DCs (within reason!).

2) Faster than unconstrained cluster analysis.

3) Spatial constraints with biogeographical data make little difference, i.e. data strongly structured themselves.

IRREGULAR GRID

B) Weight DC matrix between objects

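Webster & Burrough (1972) CONDCMAT

The dissimilarity Dij between objects is weighted by their geographical distance dij, using a weighting factor w. Three forms of weighting:

distance weighting

inverse square

exponential

[Equations for the three weightings and a plot of weighting factor against geographical distance: not recoverable.]

Similar results to CONCLUST, but does not have to be grid pattern.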

Andersson (1988) neighbour weighting 1/0 data for species (variable) analysis

NEIWEI

'Pseudofrequency' scores: a cell with a presence (1) scores 1 plus one for each neighbouring cell marked +.

Cell with eight neighbours: 1 + 8 = 9 score

Corner cell with three neighbours: 1 + 3 = 4 score

[Figure: presence grid for Species A and the resulting pseudofrequency scores.]

SPATIALLY CONSTRAINED ORDINATIONS

CCA or RDA detect simple gradients using x and y co-ordinates

$$b_1 x + b_2 y$$

Direction of gradient is $\tan^{-1}(b_2 / b_1)$

Complex gradients

Trend-surface analysis

quadratic:

$$b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2$$

cubic:

$$b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2 + b_6 x^3 + b_7 x^2 y + b_8 x y^2 + b_9 y^3$$

Can partial out spatial effects – remove effects of spatial autocorrelation.

CANOCO
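The polynomial terms of a trend surface, and the direction of a simple linear gradient, can be generated as in the sketch below. The function names are hypothetical. The terms are returned in the order x, y, x², xy, y², x³, x²y, xy², y³, so that they line up with coefficients b1 to b9. The direction uses `atan2` rather than a plain arctangent of b2/b1, a substitution made here so that the case b1 = 0 and the quadrant are handled.

```python
import math


def trend_surface_terms(x, y, degree=3):
    """Polynomial terms of the geographical co-ordinates up to `degree`."""
    terms = []
    for total in range(1, degree + 1):
        for j in range(total + 1):
            terms.append((x ** (total - j)) * (y ** j))
    return terms


def gradient_direction(b1, b2):
    """Direction of the linear gradient b1*x + b2*y, in degrees from the x axis."""
    return math.degrees(math.atan2(b2, b1))
```

In practice the co-ordinates are usually centred before the terms are computed, and the resulting columns are then supplied as explanatory variables or covariables to the constrained ordination.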

Maps obtained by block kriging for the sample scores, on canonical axes 1 (top) and 2 (bottom), in the species space (left) and in the trend-surface geographic space (right); values multiplied by 100 for mapping. Peaks are shadowed. No samples had been taken from the blanked area on the left.

[Figure panels: axis 1 (top) and axis 2 (bottom). Species space (left): CCA site scores as WA of species scores. Geographical space (right): CCA site scores as linear combinations of env. variables.]

VARIANCE PARTITIONING INTO FOUR ADDITIVE COMPONENTS

a) Non-spatial environmental variation, i.e. environmental effects after partialling geographical variation: 'Local environmental'

b) Spatially structured environmental variation, i.e. spatially covarying environmental variation: 'Regional environmental'

c) Spatial variation not shared by environmental variables, i.e. spatial effects after partialling environmental variables: 'Pure spatial'

d) Unexplained

| Analysis | Explanatory vars | Covariables | Sum of canonical eigenvalues | % |
|---|---|---|---|---|
| 1) CCA | Envir | - | 0.268 | 18.6 |
| 2) CCA | Geography | - | 0.373 | 25.9 |
| 3) partial CCA | Envir | Geography | 0.156 | 10.8 |
| 4) partial CCA | Geography | Envir | 0.261 | 18.1 |

Total inertia 1.443

a) Non-spatial (analysis 3) 10.8%

b) Spatially covarying environmental variation (analysis 1 minus analysis 3) 7.8%

c) Pure spatial (analysis 4) 18.1%

d) Unexplained 63.3%
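
The arithmetic behind the four fractions can be checked with a short sketch. The eigenvalue sums and total inertia are the ones tabulated above; the function name and argument names are hypothetical.

```python
def partition_variation(env, space, env_given_space, space_given_env, total):
    """Split total inertia into fractions (a)-(d), returned as percentages.

    env, space:       sums of canonical eigenvalues from analyses 1 and 2
    env_given_space:  partial CCA, geography as covariables (analysis 3)
    space_given_env:  partial CCA, environment as covariables (analysis 4)
    """
    a = 100.0 * env_given_space / total          # non-spatial environmental
    b = 100.0 * (env - env_given_space) / total  # spatially structured environmental
    c = 100.0 * space_given_env / total          # pure spatial
    d = 100.0 - (a + b + c)                      # unexplained
    # Fraction (b) can equally be obtained from the spatial side.
    b_check = 100.0 * (space - space_given_env) / total
    return {"a": a, "b": b, "c": c, "d": d, "b_check": b_check}


fractions = partition_variation(0.268, 0.373, 0.156, 0.261, 1.443)
for name, value in fractions.items():
    print(name, round(value, 1))
```

The two routes to fraction (b), analysis 1 minus analysis 3 and analysis 2 minus analysis 4, agree here because both differences equal 0.112.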

[Figure: bar divided into fractions (a), (b), (c) and (d). Environmental variance spans (a) + (b); spatial structure variance spans (b) + (c); (d) is unexplained.]

Variation partitioning of a species data table, showing that fraction (b) is the intersection of the environmental and spatial components of the species variation.

Variation partitioning of the oribatid mites data matrix

[Figure: stacked bar for Oribatids, percent of variation. Environment 13.7%; Env + space 31.0%; Space 12.2%; Undetermined 43.0%]

Fraction A

Non-spatial environmental variation 13.7%

'Local environment'

'Pure environment' independent of space

Fraction B

Spatially-structured environmental variation 31.0%

(Spatial component of the environmental influence)

Substrate moisture content

Fraction C

Non-environmentally explained variation 12.2%

Spatial structure independent of the environmental variables

'Pure spatial'

Theoretical causal relationships between environmental variables (representing processes) and community structure. Fractions (a), (b), (c) and (d) of the community data variation refer to Figure 5. ECM: Environmental control model. BCM: Biotic control model. HD: Historical dynamics. Asterisks (*) indicate factors not explicitly spelled out in the model.

| Fraction | Causal factor | Process - Effect |
|---|---|---|
| (a) Non-spatial environmental variation (Local environment) | Environmental factor | ECM - Community structure |
| (a)* | Non-spatially structured factor not included in the analysis | ECM - Env. variable in the analysis - Non-spatial community var. |
| | Historical events without spatial structure at the study scale | HD - Env. variable in the analysis - Non-spatial community var. |
| (b) Spatially structured env. variation (Covariation between environment and space) | Env. factor with spatial structure | ECM - Community spatial structure |
| (b)* | Spatially structured env. factor not included in the analysis | ECM - Env. variable in the analysis - Community spatial structure |
| | Spatially structured historical events | HD - Env. variable in the analysis - Community spatial structure |
| (c)* Non-envir spatial variation (Spatial) | Spatially structured factors not included in the analysis | ECM - Community spatial structure |
| | Spatially structured historical events | HD - Community spatial structure |
| | Predation, competition, etc. | BCM - Community spatial structure |
| (d)* Unexplained | Factor not included in the analysis, not spatially structured (at study scale) | ECM - Non-explained community var. |
| | Biotic control factors not spatially structured (at study scale) | BCM - Non-explained community var. |
| | Random variation, sampling error, etc. | Noise - Non-explained community var. |

Major limitation of this approach is that it is unsuitable for spatial structures present at a WIDE range of different spatial scales.

Principal co-ordinates analysis of neighbour matrices (PCNM).

Borcard & Legendre (2002) Ecological Modelling 153: 51-68

Borcard et al. (2004) Ecology 85: 1826-1832

Eigenvalue decomposition of a truncated matrix of geographic distances between the sampling sites.

Eigenvectors corresponding to positive eigenvalues are used as spatial descriptors in regression or canonical ordinations.

SPACEMAKER

PCNM (R)

spacemakeR

Borcard & Legendre (2002)

PCNM of linear transect of 100 samples, 1 m apart.

Set distance threshold at 1 m to retain only the closest neighbours; all other distances replaced by 4 x 1 m = 4 m.

Principal co-ordinates correspond to a series of sinusoids with decreasing periods. Longest period is n + 1, shortest is ~3.
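
The first two PCNM steps for this transect, truncating the distance matrix and centring it for principal co-ordinates analysis, can be sketched with the standard library alone. The function names are hypothetical. The final step is not shown: the PCNM variables are the eigenvectors of the centred matrix that have positive eigenvalues, and any symmetric eigen-solver can supply them.

```python
def truncated_distances(positions, threshold):
    """Distance matrix along a transect, truncated as in PCNM.

    Distances up to `threshold` are kept; all larger ones are replaced
    by 4 * threshold.
    """
    n = len(positions)
    big = 4.0 * threshold
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = abs(positions[i] - positions[j])
                dist[i][j] = d if d <= threshold else big
    return dist


def gower_centre(dist):
    """Double-centre -0.5 * d^2, the matrix decomposed in PCoA."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in a]  # equal to column means (symmetric)
    grand_mean = sum(row_means) / n
    return [
        [a[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# 100 samples, 1 m apart, threshold 1 m.
positions = [float(i) for i in range(100)]
dist = truncated_distances(positions, threshold=1.0)
centred = gower_centre(dist)
print(dist[0][:4])
```

Only adjacent samples keep their true 1 m spacing, so the decomposition describes spatial structure built from nearest-neighbour links alone.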

Borcard & Legendre (2002)

Ecological data – Adiantum tomentosum abundance along transects in NE Peru. 260 adjacent 5 x 5 m subplots

(a) Fern (thick), PCNM (thin line)

(b) very broad scale (thick), broad scale (thin line)

(c) medium scale

(d) fine scale

Oribatid mites and PCNM – irregular two-dimensional sampling

PCNM gives 43 variables with truncation distance of 1.012 m

Show coarse broad-scale patterns and fine-scale patterns

Forward selection in RDA retains 12 PCNM variables. Explains 45.1% of variance (cf. 43.2% in simple RDA)

RDA Axis 1: 22.6% variance, R2 = 0.48 – shrubs or no shrubs

RDA Axis 2: 8.4% variance, R2 = 0.11 – shrubs or hummocks

RDA Axis 3: 4.5% variance, R2 = 0.34 – areas of low water content and no shrubs

When environmental variables and a simple X-Y trend are used as covariables in an RDA with the PCNM variables, two significant axes remain. These may reflect unmeasured abiotic or biotic mechanisms, such as food sources.

Atlantic foraminifera & SST Telford & Birks (2005)

Matrix of PCNM variables created from matrix of distances between N Atlantic sites truncated at 781 km, the minimum distance that links all sites into a single network.

385 orthogonal PCNM representing space.

Forward selection in CCA retained 37 of these.

Represent large spatial patterns.

SST independent of space 1.8% variance

Covariation between SST & space 29.9% variance

Space independent of SST 42.5% variance

Unexplained 25.7%

Pure space explains most. Therefore there are important unknown spatial structures in the data. If only considering SST, expect strong spatial autocorrelation in residuals of SST transfer function models.

Lowest auto-correlation in MAT and ANN residuals

Highest auto-correlation in WA and GLR (= ML) residuals

Highlights 'secret assumption' of transfer functions

PREDICTIVE MODELS FROM SPATIAL DATA

Nature management – well explored areas, poorly explored areas

Lesotho bird atlas

Habitat variables PCA axes

Logistic regression to model species occurrences and absences in terms of habitat PCA

Wildlife management with GIS: Mt Graham red squirrel in relation to env vars, using logistic regression

Pereira & Itami (1991) Photogr. Engin. & Remote Sensing 57, 1475–1486

$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5$

$x_1$ to $x_4$: PCA site scores; $x_5$: recording effort

Distribution maps for three bird species in Lesotho produced by logistic modelling of presence-absence data. Higher probabilities of occurrence are indicated by increasing circle size and actual field records are shown as filled circles.

| | Pied crow | Ground woodpecker | Cape vulture 1 | Cape vulture 2* |
|---|---|---|---|---|
| PC1 | -0.90 (0.28) | 0.54 (0.18) | 0.40 (0.14) | 0.85 (0.28) |
| PC2 | -0.14 (0.41) | -0.72 (0.29) | 0.02 (0.22) | -0.15 (0.25) |
| PC3 | -0.49 (0.35) | 0.01 (0.28) | -0.31 (0.23) | -0.44 (0.27) |
| PC4 | -0.34 (0.29) | -0.24 (0.29) | 0.02 (0.29) | 0.76 (0.48) |
| Effort | 0.15 (0.09) | 0.31 (0.14) | 0.04 (0.03) | 0.10 (0.04) |
| Constant | -2.43 (0.92) | -1.52 (0.84) | -0.75 (0.42) | -1.96 (0.79) |
| Deviance | 33.95 | 45.21 | 62.73 | 48.88 |
| Df | 49 | 49 | 49 | 47 |
| P-value+ | 0.95 | 0.63 | 0.09 | 0.40 |

Summary of the overall logistic models. The upper data are regression coefficients with their standard errors in brackets.

* Cape vulture 2 excludes data for two squares identified as having a disproportionate effect on the model using all the data (Cape vulture 1).

+ The P-value is best interpreted as a measure of standardized deviance, useful for comparing models with differing degrees of freedom.
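
To show how such a fitted model turns site data into a probability of occurrence, here is a minimal sketch using the Pied crow coefficients from the summary table. The function name, the dictionary layout and the site scores passed in are hypothetical; only the coefficients come from the table.

```python
import math

# Pied crow coefficients from the summary table (standard errors omitted).
PIED_CROW = {
    "constant": -2.43,
    "pc": [-0.90, -0.14, -0.49, -0.34],  # PC1..PC4
    "effort": 0.15,
}


def occurrence_probability(coefs, pc_scores, effort):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    eta = coefs["constant"] + coefs["effort"] * effort
    eta += sum(b * x for b, x in zip(coefs["pc"], pc_scores))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical squares: an 'average' habitat with no recording effort,
# and a square with a low PC1 score and some recording effort.
p_baseline = occurrence_probability(PIED_CROW, [0.0, 0.0, 0.0, 0.0], effort=0.0)
p_low_pc1 = occurrence_probability(PIED_CROW, [-2.0, 0.0, 0.0, 0.0], effort=5.0)
print(round(p_baseline, 3), round(p_low_pc1, 3))
```

Because the PC1 coefficient is negative, squares with low PC1 scores get higher predicted probabilities, and including recording effort as a predictor lets the model separate genuine absence from under-recording.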

Hill (1991) J. Biogeogr. 18, 247–255

CCA

species data: +/–

environmental data: max altitude, annual rainfall, mean temperature, geology, presence of coast

$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_2 + b_4 x_3 + b_5 x_4$

x1 – x4 are site scores in CCA

Predict distributions given simple environmental data.

[Figure panels: actual and predicted distributions of Dipper, Little Ringed Plover and Rockrose]

Actual and predicted distributions of species using logit regression with six parameters. The species are Dipper (Cinclus cinclus), Little Ringed Plover (Charadrius dubius) and Common Rockrose (Helianthemum nummularium). Circles of increasing size signify categories of probability as follows: 1-4%; 5-10%; 11-30%; 31-50%; 51-75%; 76-100%.

PREDICTION OF UPLAND PLANT COMMUNITY DISTRIBUTION USING LOGISTIC REGRESSION

54 upland vegetation types recorded in 1,514 ten-kilometre grid squares in the uplands of Scotland, England, and Wales.

Environmental variables from National Land Characteristics Data Bank.

Topography 13 variables (22 possible)

Climate 18 variables (29 possible)

Geology 19 variables (29 possible)

Soil types 8 variables (8 possible)

Land-use 2 variables (22 possible)

Reduced 31 Topography + Climate variables to 5 PCA axes (63.6% variance) and 27 Geology + Soil type variables to 2 PCA axes (20.3% variance).

Used the 5 PCA axes + their squared terms, the 2 PCA axes, + the Land-use variables as predictors in logistic regression, with the +/- (presence/absence) of each vegetation type as the response variable.

54 models

7 have rho (r2) < 0.20

26 have rho 0.20 - 0.40

20 have rho 0.40 - 0.60

2 have rho > 0.60

Mean rho values

Calcareous grassland 0.38

Heaths 0.41

Mires 0.26

Other grasslands 0.41

Woodland & scrub 0.40

Alpine snow-beds etc. 0.52

Poorest fits: Heaths 1, Mires 5, Grasslands 1

Predicted and known 10km square distribution of NVC U20 (Pteridium aquilinum – Galium saxatile community). Predictions were not made for lowland areas.

Predicted and known 10km square distribution of NVC U10 (Carex bigelowii – Racomitrium lanuginosum moss-heath).

Predicted and known 10km square distribution of NVC H13 (Calluna vulgaris – Cladonia arbuscula heath).

Predicted and known 10km square distribution of NVC H9 (Calluna vulgaris – Deschampsia flexuosa heath) in the uplands.

Predicted and known 10km square distribution of NVC M6 (Carex echinata – Sphagnum recurvum/auriculatum mire).

Predicted and known 10km square distribution of NVC M10 (Carex dioica – Pinguicula vulgaris mire).

Predicted and known 10km square distribution of NVC W19 (Juniperus communis – Oxalis acetosella woodland).

Salix herbacea-Racomitrium heterostichum, snow-bed

Cryptogramma crispa-Athyrium distentifolium, snow-bed

Luzula sylvatica-Geum rivale, tall-herb community

Saxifraga aizoides-Alchemilla glabra, banks

Nardus stricta-Galium saxatile, grassland

Festuca ovina-Agrostis capillaris-Galium saxatile, grassland

Festuca ovina-Agrostis capillaris-Rumex acetosella, grassland

Calluna vulgaris-Erica cinerea, heath

Erica tetralix-Sphagnum compactum, wet heath

Erica tetralix-Sphagnum papillosum, raised and blanket mire

PREDICTING THE PROBABILITY OF SPECIES OCCURRENCE USING SURVEY DATA

Le Duc et al. (1992) Watsonia 19: 97-105

Le Duc et al. (1992) Aspects of Applied Biology 29: 41-48

Firbank et al. (1998) Weed Research 35: 1-10

Plant recording: 10 km grid squares

Tetrads: 2 km grid squares

Impossible to record all tetrads, so only 3 are recorded in each 10 km square (A, J, and W)

Convert tetrad data to probabilities of species occurrence, introducing some spatial smoothing in the interpolation.

Layout of the botanical monitoring scheme of the BSBI.

Gaussian smoothing of occurrence in tetrads.

[Figure panels: Species occurrence; Probability of species occurrence]
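
One way to picture the smoothing step is a Gaussian-kernel weighted mean of the 0/1 tetrad records. This is a sketch under stated assumptions, not the procedure of Le Duc et al. (1992): the kernel-average form, the bandwidth, the function name and the example records are all hypothetical.

```python
import math


def smoothed_probability(target, records, bandwidth_km):
    """Gaussian-kernel weighted mean of presence (1) / absence (0) records.

    target:   (x, y) location in km
    records:  iterable of (x, y, present) for the recorded tetrads
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, present in records:
        dist_sq = (x - target[0]) ** 2 + (y - target[1]) ** 2
        weight = math.exp(-dist_sq / (2.0 * bandwidth_km**2))
        weighted_sum += weight * present
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical records: two occupied tetrads in the west, two empty in the east.
records = [(0.0, 0.0, 1), (2.0, 0.0, 1), (10.0, 0.0, 0), (12.0, 0.0, 0)]

p_west = smoothed_probability((1.0, 0.0), records, bandwidth_km=4.0)
p_east = smoothed_probability((11.0, 0.0), records, bandwidth_km=4.0)
print(round(p_west, 2), round(p_east, 2))
```

A larger bandwidth gives smoother maps that borrow more strength from distant tetrads; a smaller one stays closer to the raw records.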

To predict species occurrence, need external predictors (e.g. soil type, land-use classes) and logistic regression.

(a) data

(b) estimated probability

(c) estimated probability using soil groups

(d) estimated probability using land-use classes

Soil type main predictor

Veronica montana

Predicting weed distribution using tetrad data and soil types.

Firbank et al. (1998)

$\log_e\left(\frac{p}{1-p}\right) = a + b \times \text{Smooth} + c \times \text{Soil}$

Soil: 16 classes

Alopecurus myosuroides

(a) tetrads

(b) smoothed probability of occurrence

(c) prediction using (b) + soils

(d) 10 km square map

(a) Elymus repens

(b) Legousia hybrida

(c) Papaver rhoeas

(d) Senecio jacobaea

(a) grass weeds of cereals

(b) broad-leaved weeds

(c) distribution of arable land

Species pool of cereal weeds greatest in central and southern England. Does not entirely coincide with distribution of arable farming.

PREDICTION OF FUTURE CHANGES - TROLLIUS EUROPAEUS

OBSERVED today: known distribution of globeflower (Trollius europaeus) (data from the Biological Records Centre).

PREDICTED today: predicted current distribution using Jan min. & July max. temp and annual precipitation as independent variables in a logistic regression.

PREDICTED future: predicted distribution in 2050 using the same model but imposing the UK transient climate scenario for 2050.

Watt et al. (1997)

KEY RESEARCHERS IN ANALYSIS OF TEMPORAL PALAEOECOLOGICAL DATA

Steve Juggins

Allan Gordon

Ed Cushing

Keith Bennett

Eric Grimm

Bent Odgaard

Andy Lotter

KEY RESEARCHERS IN SPATIAL ANALYSIS OF ECOLOGICAL DATA

Pierre Legendre

Daniel Borcard

Mark Hill

Marie-Josée Fortin

Richard Telford
