Analysis of Temporal (Stratigraphic) and Spatial
Data
NUMERICAL ANALYSIS OF BIOLOGICAL AND
ENVIRONMENTAL DATA
John Birks
Introduction
Temporal stratigraphic data
Single sequence
Partitioning or zonation
Sequence splitting
Rate-of-change analysis
Gradient analysis and summarisation
Analogue matching
Relationships between two or more sets of variables in same sequence
Two or more sequences
Sequence comparison and correlation
Multi-proxy studies
Hypothesis testing
Spatial geographical data
Spatial autocorrelation
Spatially constrained clusterings
Spatially constrained ordinations
Predictive models for spatial data
ANALYSIS OF TEMPORAL AND SPATIAL DATA
INTRODUCTION
Analysis of quadrats, lakes, streams, etc. Assumes no autocorrelation, namely that the values of a variable at some point in space cannot be predicted from known values at other sampling points.
PALAEOECOLOGY – fixed sample order in time.
strong autocorrelation – temporal autocorrelation
STRATIGRAPHICAL DATA
biostratigraphic, lithostratigraphic, geochemical, geophysical, morphometric, isotopic
multivariate
continuous or discontinuous time series
ordering very important – display, partitioning, trends, interpretation
SPATIAL DATA
many types, spatial autocorrelation
spatial or geographical co-ordinates very important
raises problems of statistical inference as samples not independent
TEMPORAL STRATIGRAPHIC DATA
ANALYSIS OF SINGLE SEQUENCE
ZONATION OR PARTITIONING
Useful for:
1) description
2) discussion and interpretation
3) comparisons in time and space
“sediment body with a broadly similar composition that differs from underlying and overlying sediment bodies in the kind and/ or amount of its composition”.
CONSTRAINED CLUSTERINGS
1) Constrained agglomerative procedures CONSLINK
CONISS
2) Constrained binary divisive procedures
Partition into g groups by placing g – 1 boundaries.
Number of possibilities: $n - 1$ for $g = 2$
Compared with non-constrained situation: $2^{n-1} - 1$
Criteria – within-group sum-of-squares or variance SPLITLSQ
– within-group information SPLITINF
$\sum_{i=1}^{n} \sum_{k=1}^{m} p_{ik} \log \left( p_{ik} / q_{ik} \right)$
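A minimal sketch of a single constrained binary split using the within-group sum-of-squares criterion. It illustrates the idea only and is not the SPLITLSQ program; the function names and the six-sample sequence are hypothetical.

```python
def within_ss(rows):
    """Sum of squared deviations from the column means of a block of samples."""
    if not rows:
        return 0.0
    n_vars = len(rows[0])
    means = [sum(r[k] for r in rows) / len(rows) for k in range(n_vars)]
    return sum((r[k] - means[k]) ** 2 for r in rows for k in range(n_vars))


def best_binary_split(samples):
    """Try all n - 1 boundaries of an ordered sequence.

    Returns (boundary, total within-group sum-of-squares); boundary = b
    means samples[:b] and samples[b:] form the two zones.
    """
    best = None
    for b in range(1, len(samples)):
        ss = within_ss(samples[:b]) + within_ss(samples[b:])
        if best is None or ss < best[1]:
            best = (b, ss)
    return best


# Hypothetical sequence: 6 samples x 2 taxa (proportions), in stratigraphic order.
sequence = [
    [0.80, 0.20], [0.70, 0.30], [0.75, 0.25],
    [0.20, 0.80], [0.30, 0.70], [0.25, 0.75],
]
boundary, ss = best_binary_split(sequence)

n = len(sequence)
constrained_splits = n - 1               # boundaries that respect the sample order
unconstrained_splits = 2 ** (n - 1) - 1  # two-group partitions ignoring the order
```

Applying the same search again within each resulting zone gives the binary divisive procedure; the order constraint is what keeps the search small (5 candidate boundaries here against 31 unconstrained two-group partitions).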
3) Constrained optimal divisive analysis OPTIMAL
2 group
3 group
4 group
4) Variable barriers approach BARRIER
All methods in one program: ZONE
RIOJA
Pollen diagram and numerical zonation analyses for the complete Abernethy Forest 1974 data set.
Birks & Gordon (1985)
CONISS = constrained incremental sum-of-squares (= constrained Ward's minimum
variance)
OPTIMAL SUM OF SQUARES PARTITIONS OF THE ABERNETHY FOREST 1974 DATA
Number of groups g (zones)   Percentage of total sum-of-squares   Markers
 2                           59.3                                 15
 3                           28.4                                 15 32
 4                           18.9                                 15 33 41
 5                           14.7                                 15 33 41 45
 6                           10.6                                 15 32 34 41 45
 7                            8.1                                 15 26 32 34 41 45
 8                            5.8                                 8 15 26 32 34 41 45
 9                            4.7                                 8 15 24 29 32 34 41 45
10                            3.9                                 8 15 24 29 32 33 34 41 45
HOW MANY ZONES?
K D Bennett (1996) Determination of the number of zones in a bio-stratigraphical sequence. New Phytologist 132, 155-170
Broken stick model
$Pr = \frac{1}{n} \sum_{i=k}^{n} \frac{1}{i}$
RIOJA (R)
BSTICK
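A minimal sketch of the broken-stick comparison, assuming that n is the number of zones and that the observed proportions of variance are listed in the order in which the zones are added. It is not the BSTICK code; the function names and the observed values are hypothetical.

```python
def broken_stick(n):
    """Expected proportion for the k-th of n pieces: (1/n) * sum_{i=k}^{n} 1/i."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def n_zones_above_model(observed, model):
    """Count leading zones whose observed proportion exceeds the model value."""
    count = 0
    for obs, exp in zip(observed, model):
        if obs <= exp:
            break
        count += 1
    return count


model = broken_stick(5)
# Hypothetical proportions of total variance accounted for by successive zones.
observed = [0.50, 0.30, 0.10, 0.06, 0.04]
n_keep = n_zones_above_model(observed, model)
```

Zones are retained while the observed curve lies above the broken-stick curve; with these made-up values only the first two zones do.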
Pollen percentage diagram plotted against depth. Lithostratigraphic column is represented; symbols are based on Troels-Smith (1955).
Tzedakis (1994)
Ioannina Basin
Variance accounted for by the nth zone as a proportion of the total variance (fluctuating curve) compared with values from a broken-stick model (smooth curve):
(a) randomized data set,
(b) original data set.
Zonation method: binary divisive using the information content statistic.
Data set: Ioannina.
Original data
Broken stick model
Bennett (1996)
Technical Point
It turns out that the binary divisive procedures SPLITINF and SPLITLSQ of Gordon and Birks (1972) are an early implementation of De’ath’s (2002) multivariate regression trees (MRT) discussed in the Modern Regression lecture.
Both are MRTs where a vector of sample depths or ages is used as the sole explanatory predictor variable.
SPLITINF = distance-based MRT with information content as the dissimilarity measure
SPLITLSQ = MRT with Euclidean distance as the distance measure
The advantage of MRT over SPLITINF/SPLITLSQ as a zonation procedure is that the k-fold cross-validation in CARTs provides a simple way to assess the number of zones into which the stratigraphical sequence should be split.
MRT using the optimal partitioning approach is still to be implemented.
mvpart (R)
SEQUENCE SPLITTING
Walker & Wilson (1978) J. Biogeog. 5, 1–21
Walker & Pittelkow (1981) J. Biogeog. 8, 37–51
SPLIT, SPLIT2
BOUND2
Need statistically ‘independent’ curves
Pollen influx (grains cm–2 year–1)
PCA or CA or DCA axes CANOCO
Aitchison log-ratio transformation LOGRATIO
$Z_{ik} = \log \left( p_{ik} / \bar{p}_i \right)$
where $\log \bar{p}_i = \frac{1}{m} \sum_{k=1}^{m} \log p_{ik}$
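A minimal sketch of the log-ratio transformation for one sample, assuming all proportions are strictly positive (zero values would need replacing first, which is not shown). The function name and the three-taxon sample are hypothetical and this is not the LOGRATIO program.

```python
import math


def centred_log_ratio(p):
    """Log of each proportion minus the mean of the logs for that sample."""
    log_p = [math.log(v) for v in p]   # fails on zeros: replace them beforehand
    mean_log = sum(log_p) / len(log_p)
    return [lp - mean_log for lp in log_p]


# Hypothetical sample with m = 3 taxa.
z = centred_log_ratio([0.5, 0.3, 0.2])
```

The transformed values of a sample sum to zero, which removes the constant-sum constraint that makes percentage curves statistically dependent.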
Correlograms of sequence splits with charcoal, inorganic matter and total pollen influxes for three sections of the pollen record. The vertical scales give correlations; the horizontal scales give time lag in years (assuming a sampling interval of 50 years).
Technical Point
The sequence splitting of Walker and Wilson (1978) is a precursor of regression trees within CART (see Modern Regression lecture).
In a regression tree a quantitative response variable, in our case a stratigraphical sequence of taxon A, is repeatedly split so that at each partition the sequence is divided into two mutually exclusive groups, each of which is as homogeneous as possible.
In the regression tree implementation, a vector of sample depths or ages is used as the sole explanatory predictor variable. The splitting is then applied to each group separately until some stopping rule is reached.
Usually k-fold cross-validation is used to find the optimal tree-size using cost-complexity (CC) pruning.
$CC = T_{impurity} + \alpha \, (T_{complexity})$
where $T_{impurity}$ is the impurity of the current tree over all terminal nodes; $T_{complexity}$ is the number of terminal leaves; and α is a real number >0.
α is the tuning parameter in CC pruning, in which CC is minimised. It represents the trade-off between tree-size and goodness-of-fit.
Small values of α give large trees; large values of α lead to small trees.
Starting with the full tree, search to identify the terminal node that results in the lowest CC for a given value of α.
As the penalty α on tree complexity is increased, the tree that minimises CC will become smaller and smaller until the penalty is so great that a tree with a single node (i.e. the original data) has the lowest CC.
The search produces a sequence of progressively smaller trees with associated CC.
k-fold cross-validation is used to find the optimal value of α that gives the minimal root mean squared error (RMSE). An alternative is to select the smallest tree that lies within 1 standard error of the RMSE of the best tree.
rpart (R)
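A minimal sketch of how the penalty changes which tree minimises CC. The nested sequence of trees (number of terminal leaves, impurity) is hypothetical, and this is not how rpart is called; it only works through the arithmetic of CC = impurity + α × number of leaves.

```python
def cost_complexity(impurity, n_leaves, alpha):
    """CC = tree impurity + alpha * number of terminal leaves."""
    return impurity + alpha * n_leaves


def best_tree(trees, alpha):
    """Return the (n_leaves, impurity) pair with the lowest CC for this alpha."""
    return min(trees, key=lambda t: cost_complexity(t[1], t[0], alpha))


# Hypothetical nested sequence of pruned trees: (terminal leaves, impurity).
trees = [(1, 100.0), (2, 60.0), (3, 40.0), (5, 30.0), (8, 27.0)]

# Increasing the penalty selects progressively smaller trees.
chosen = {alpha: best_tree(trees, alpha)[0] for alpha in (0.5, 2.0, 8.0, 50.0)}
```

With these made-up values the selected tree shrinks from 8 leaves to 5, 3, and finally the single-node tree as α grows. In practice α is chosen by k-fold cross-validation rather than by inspection.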
RATE OF CHANGE ANALYSIS
Amount of palynological compositional change per unit time.
Calculate dissimilarity between pollen assemblages of two adjacent samples and standardise to constant time unit, e.g. 250 14C years.
Jacobson & Grimm (1986) Ecology 67, 958-966
Grimm & Jacobson (1992) Climate Dynamics 6, 179-184
RATEPOL
POLSTACK
(TILIA)
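A minimal sketch of a rate-of-change calculation: dissimilarity between adjacent samples divided by the time between them and scaled to a constant time unit. Chord distance is an assumption made here for illustration (other dissimilarities or ordination distances can be used), no smoothing or interpolation is done, and the samples and ages are hypothetical.

```python
import math


def chord_distance(a, b):
    """Chord distance between two samples expressed as proportions."""
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b)))


def rate_of_change(samples, ages, unit=250.0):
    """Dissimilarity between adjacent samples per `unit` years."""
    rates = []
    for i in range(1, len(samples)):
        dt = abs(ages[i] - ages[i - 1])   # time between adjacent samples
        rates.append(chord_distance(samples[i - 1], samples[i]) / dt * unit)
    return rates


# Hypothetical sequence: 3 samples x 2 taxa, with ages in years.
samples = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
ages = [0.0, 250.0, 750.0]
rates = rate_of_change(samples, ages)
```

The second interval spans 500 years, so its dissimilarity is halved when expressed per 250 years.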
Graph of distance (number of standard deviations) moved every 100 yr in the first three dimensions of the ordination vs age. Greater distance indicates greater change in pollen spectra in 100yr.
Jacobson & Grimm (1986)
GRADIENT ANALYSIS OF SINGLE SEQUENCE
Ordination methods: CA/DCA (joint plot) or PCA (biplot)
Constrained CA or PCA
Sample summary: CA/DCA/PCA
Species arrangement: CCA or simple discriminants
CA = correspondence analysis
DCA = detrended correspondence analysis
PCA = principal components analysis
CCA = canonical correspondence analysis
VEGAN
CANOCO
Biplot of the Kirchner Marsh data; C2 = 0.746. The lengths of the Picea and Quercus vectors have been scaled down relative to the other vectors. Stratigraphically neighbouring levels are joined by a line.
PCA Biplot 74.6%
Gordon, 1982
Correspondence analysis representation of the Kirchner Marsh data; C2 = 0.620. Stratigraphically neighbouring levels are joined by a line.
CA Joint Plot 62%
Gordon, 1982
Stratigraphical plot of sample scores on the first correspondence analysis axis (left) and of rarefaction estimate of richness (E(Sn)) (right) for Diss Mere, England. Major pollen-stratigraphical and cultural levels are also shown. The vertical axis is depth (cm). The scale for sample scores runs from –1.0 (left) to + 1.2 (right).
The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd axis contrasts taxa preferring wetter sites with those preferring drier sites.
Haberle & Bennett 2004
Percentage pollen and spore diagram from Abernethy Forest, Inverness-shire. The percentages are plotted against time, the age of each sample having been estimated from the deposition time. Nomenclatural conventions follow Birks (1973a) unless stated in Appendix 1. The sediment lithology is indicated on the left side, using the symbols of Troels-Smith (1955). The pollen sum, P, includes all non-aquatic taxa. Aquatic taxa, pteridophytes, and algae are calculated on the basis of P + group as indicated.
Species arrangement
Pollen types re-arranged on the basis of the weighted average for depth TRAN
ANALOGUE ANALYSIS
Modern training set – similar taxonomy
– similar sedimentary environment
Compare fossil sample 1 with all modern samples using an appropriate dissimilarity coefficient (DC); find the modern sample ‘most like’ fossil sample 1 (i.e. with the lowest DC) and call it the ‘closest analogue’; repeat for fossil sample 2, etc.
Overpeck et al. (1985) Quat. Res. 23, 87–108
ANALOG
MATCH
MAT
ANALOGUE – R package
RIOJA
Flowchart of analogue matching:
1. Compare fossil sample i with modern sample j
2. Calculate similarity between i and j (Sij)
3. Repeat for all modern samples
4. Find modern sample with highest similarity ('ANALOGUE')
5. Repeat for all fossil samples
6. Evaluation?
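A minimal sketch of the matching loop, using squared chord distance as the dissimilarity coefficient (one of the coefficients evaluated by Overpeck et al. 1985). The function names and the tiny modern and fossil data sets are hypothetical; this is not the ANALOG or MAT code.

```python
import math


def squared_chord_distance(a, b):
    """Squared chord distance between two samples expressed as proportions."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))


def closest_analogues(fossil, modern):
    """For each fossil sample return (name of closest modern sample, its DC)."""
    result = []
    for f in fossil:
        scored = [(name, squared_chord_distance(f, m)) for name, m in modern.items()]
        result.append(min(scored, key=lambda t: t[1]))   # lowest DC = closest
    return result


# Hypothetical modern training set and fossil samples (3 taxa, proportions).
modern = {"forest": [0.8, 0.1, 0.1], "tundra": [0.1, 0.2, 0.7]}
fossil = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
matches = closest_analogues(fossil, modern)
```

The evaluation step then asks whether the lowest DC is small enough to count as a good analogue, for example by comparing it with a threshold derived from dissimilarities among the modern samples.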
Dissimilarity coefficients, radiocarbon dates, pollen zones, and vegetation types represented by the top ten analogues from the Lake West Okoboji site.
Maps of squared chord distance values with modern samples at selected time intervals
Plots of minimum squared chord-distance for each fossil spectrum at each of the eight sites.
A schematic representation of how fossil diatom zones/samples in a sediment core from an acidified lake can be compared numerically with modern surface sediment samples collected from potential modern analogue lakes. In this space-for-time model the vertical axis represents sedimentary diatom zones defined by depth and time; the horizontal axis represents spatially distributed modern analogue lakes and the dotted lines indicate good floristic matches (dij = <0.65), as defined by the mean squared Chi-squared estimate of dissimilarity (SCD, see text).
Flower et al. (1997)
Analogues and lake restoration
Flower et al. (1997)
COMPARISON AND CORRELATION BETWEEN TIME SERIES
Two or more stratigraphical sets of variables from same sequence.
Are the temporal patterns similar?
(1) Separate ordinations
Oscillation log-likelihood G-test or χ² test
(2) Constrained ordinations
Pollen data - 3 or 4 ordination axes or major patterns of variation Y
Chemical data - 3 or 4 ordination axes X
Depth as a covariable
Does 'chemistry' explain or predict 'pollen'? i.e. is variance in Y well explained by X?
Lotter et al. (1992) J. Quat. Sci.
Pollen, 16O/18O (depth)
34% 16% 12%
79% 12% 4% 1%
Pollen, oxygen-isotope stratigraphy, and sediment composition of Aegelsee core AE-1 (after Wegmüller and Lotter 1990)
Pollen and oxygen-isotope stratigraphy of Gerzensee core G-III (after Eicher and Siegenthaler 1976)
Is there a statistically significant relationship between the pollen stratigraphy and the stable-isotope record?
Summary of the results from detrended correspondence analysis (DCA) of late-glacial pollen spectra from five sequences. The percentage variance represented by each DCA axis is listed.
Reduce pollen data to DCA axes. Use these then as ‘responses’
Site              No. of samples   No. of taxa   DCA Axis 1   DCA Axis 2   DCA Axis 3   DCA Axis 4
Aegelsee AE-1     100              26            57.2         12.0         2.3          1.4
Aegelsee AE-3      54              32            44.3          3.3         1.5          1.4
Gerzensee G-III    65              28            37.6          4.0         1.2          0.9
Faulenseemoos      62              25            44.1         18.8         5.0          3.8
Rotsee RL-250      44              23            38.2         13.3         3.1          2.3
Results of redundancy analysis and partial redundancy analysis permutation tests for the significance of axis 1 when oxygen isotopes and depth are predictor variables, when oxygen is the only predictor, and when oxygen isotopes are the predictor variable and depth is a covariable.
Site              Predictor: δ18O and depth   Predictor: δ18O; covariable: depth   Predictor: δ18O   No. of response variables (pollen DCA axes)
Aegelsee AE-1     0.01a                       0.01a                                0.02a             2
Aegelsee AE-3     0.01a                       0.16                                 0.20              1
Gerzensee G-III   0.01a                       0.46                                 0.57              1
Faulenseemoos     0.01a                       0.01a                                0.01a             3
Rotsee RL-250     0.01a                       0.21                                 0.08              2
a Significant at p < 0.05 (Lotter et al. 1992)
In multi-proxy studies (e.g. pollen, diatoms, chironomids, etc. studied on the same core), an important question is: ‘are the major stratigraphical patterns of variation (‘signal’) the same in all proxies?’
Laguna Facil, southern Chile
Massaferro et al. 2005 Quaternary Science Reviews 24: 2510-2522
Pollen and chironomids studied on the same core
Simplified each data-set to the first ordination axes of a correspondence analysis (CA) and a principal components analysis (PCA) for both data-sets
MULTI-PROXY STUDIES
Massaferro et al. 2005
Chironomid stratigraphy
Massaferro et al. 2005
Pollen stratigraphy
Massaferro et al. 2005
Can detect similarities in both proxies and differences
1. Major change in both prior to 14,700 cal yr BP.
2. Changes in the chironomids tend to lag behind changes in the pollen. Perhaps a chironomid response to changes in vegetation (tree canopy and forest type) or lake chemistry, resulting from changes in catchment soils as a result of vegetational change.
3. At about 7200 cal yr BP, chironomids change before the pollen. May be a response to climate change.
4. Strong correlations between the charcoal stratigraphy and pollen and chironomid stratigraphies. Probable importance of fire and/or vulcanism in influencing both vegetational and limnological dynamics.
Charcoal
Massaferro et al. 2005
Can use ordination methods to summarise several palaeoecological proxies and to compare with other proxies
Major changes between pre-European period (A)
and European settlement (B)
Lake Euramoo, NE Queensland, last 800 years
Haberle et al. 2006
Tested how well different proxies ‘predict’ or ‘explain’ (in a statistical sense) other proxies
Only proxy that significantly predicted other proxies was pollen that predicted changes in diatoms (25.4%) and chironomids (15.4%)
Illustrates the importance of catchment and its vegetation on the lake and its biota
Assessing Potential External 'Drivers' on an Aquatic Ecosystem
Bradshaw et al. 2005 The Holocene 15: 1152-1162
Dalland Sø, a small (15 ha), shallow (2.6 m) lowland eutrophic lake on the island of Funen, Denmark.
Catchment (153 ha) today
agriculture 77 ha
built-up areas 41 ha
woodland 32 ha
wetlands 3 ha
Nutrient rich – total P 65-120 µg l-1
Map of Dalland Sø
Multi-proxy study to assess role of potential external 'drivers' or forcing functions on changes in the lake ecosystem in last 7000 yrs.
Data                                             No. of samples   Transformation
Sediment loss-on-ignition %                      560              None
Sediment dry mass accumulation rate              560              Log (x + 1)
Sediment minerogenic matter accumulation rate    560              Log (x + 1)
Plant macrofossil concentrations                 280              Log (x + 1)
Pollen %                                          90              None
Diatoms %                                        118              None
Diatom inferred total P                          118              None
Biogenic silica                                   84              Not used
Pediastrum %                                      90              None
Zooplankton                                       31              Not used
Terrestrial landscape or catchment development
Bradshaw et al. 2005
Aquatic ecosystem development
Bradshaw et al. 2005
DCA of pollen and diatom data separately to summarise major underlying trends in both data sets
Pollen – high scores for trees, low scores for light-demanding herbs and crops
Diatom - high scores mainly planktonic and large benthic types, low scores for Fragilaria spp. and eutrophic spp. (e.g. Cyclostephanos dubius)
Bradshaw et al. 2005
Major contrast between samples before and after Late Bronze Age forest clearances
Bradshaw et al. 2005
'Catchment'
'Lake'
Prior to clearance, lake experienced few impacts.
After the clearance, lake heavily impacted.
Canonical Correspondence Analysis
Response variables:
Diatom taxa
Predictor variables:
Pollen taxa, LOI, dry mass and minerogenic accumulation rates, plant macrofossils, Pediastrum
Covariable:
Age
69 matching samples
Partial CCA with age partialled out as a covariable. Makes interpretation of effects of predictors easier by removing temporal trends and temporal autocorrelation
Partial CCA all variables:
18.4% of variation in diatom data explained by Poaceae pollen, Cannabis-type pollen, and Daphnia ephippia, the only three independent and statistically significant predictors.
As different external factors may be important at different times, divided data into 50 overlapping data sets – sample 1-20, 2-21, 3-22, etc.
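A minimal sketch of how the overlapping subsets can be indexed, assuming the 69 matching samples and a window of 20 samples moved one sample at a time. The function name is hypothetical.

```python
def moving_windows(n_samples, width):
    """1-based (first, last) sample numbers of each overlapping subset."""
    return [(start, start + width - 1) for start in range(1, n_samples - width + 2)]


windows = moving_windows(69, 20)
n_subsets = len(windows)
```

With 69 samples and a width of 20 this gives 69 - 20 + 1 = 50 subsets, running from samples 1-20 to samples 50-69; a separate CCA is then run on each subset.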
CCA of 50 subsets from bottom to top and % variance explained
Bradshaw et al. 2005
1. 4520-1840 BC Poaceae is sole predictor variable (20-22% of diatom variance)
2. 3760-1310 BC LOI and Populus pollen (16-33%)
3. 3050-600 BC Betula, Ulmus, Populus, Fagus, Plantago, etc. (17-40%)
i.e. in these early periods, diatom change influenced to some degree by external catchment processes and terrestrial vegetation change.
4. 2570 BC – 1260 AD Erosion indicators (charcoal, dry mass accumulation), retting indicator Linum capsules, Daphnia ephippia, Secale and Hordeum pollen (11-52%)
i.e. changing water depth and external factors
5. 160 BC – 1900 AD Hordeum, Fagus, Cannabis pollen, Pediastrum boryanum, Nymphaea seeds (22-47%)
i.e. nutrient enrichment as a result of retting hemp, also changes in water depth and water clarity
Strong link between inferred catchment change and within-lake development. Timing and magnitude are not always perfectly matched, e.g. transition to Mediæval Period
Bradshaw et al. 2005
Regional zones, description of common features, interpretation, detection of unique features.
Sequence comparison and correlation.
Sequence slotting
SLOTSEQ
FITSEQ
CONSSLOT
Combined scaling of two or more sequences.
CANOCO
ANALYSIS OF TWO OR MORE SEQUENCES
Slotting of the sequences S1 (A1, A2, ..., A10) and S2 (B1, B2, ..., B7), illustrating the contributions to the measure of discordance (S1, S2) and the 'length' of the sequences, (S1, S2).
The results of sequence-slotting of the Wolf Creek and Horseshoe Lake pollen sequences (ψ = 2.095). Radiocarbon dates for the pollen zone boundaries are also given, expressed as radiocarbon years before present (BP).
SLOTSEQ
Birks & Gordon (1985)
Comparison of oxygen-isotope records from Swiss lakes Aegelsee (AE-3), Faulenseemoos (FSM) and Gerzensee (G-III) with the Greenland Dye 3 record (Dansgaard et al, 1982). LST marks the position of the Laacher See Tephra (11,000 yr BP). Letters and numbers mark the position of synchronous events (for details see text).
Psi values for pair-wise sequence slotting of the stable-isotope stratigraphy at five Swiss late-glacial sites and the Dye 3 site in Greenland. Values above the diagonal are constrained slotting, using the three major shifts shown in previous figure; values below the diagonal are for sequence slotting in the absence of any external constraints. The mean δ18O and standard deviation for each sequence are also listed.
CONSLOXY
Lotter et al. (1992)
FUGLA NESS, Shetland
Pollen diagram from Sel Ayre showing the frequencies of all determinable and indeterminable pollen and spores expressed as percentages of total pollen and spores (P).
Abbreviations: undiff. = undifferentiated, indet = indeterminable.
Comparison of Bjärsjöholmssjön and Färskesjön using principal component analysis. The mean scores of the local pollen zones and the ranges of the sample scores in each zone are plotted on the first and second principal components, and are joined up in stratigraphic order. The Blekinge regional pollen assemblage zones are also shown.
Birks & Berglund (1979)
Comparison of Färskesjön and Lösensjön using principal component analysis. The mean scores of the local pollen zones and the ranges of the sample scores in each zone are plotted on the first and second principal components, and are joined up in stratigraphic order. The regional pollen assemblage zones are also shown.
The 1st and 2nd axis of the Detrended Correspondence Analysis for Laguna Oprasa and Laguna Facil plotted against calibrated calendar age (cal yr BP). The 1st axis contrasts taxa from warmer forested sites with cooler herbaceous sites. The 2nd axis contrasts taxa preferring wetter sites with those preferring drier sites
Haberle & Bennett, 2004
Pollen percentage diagram of selected taxa plotted against depth. Lithostratigraphic symbols are based on Troels-Smith (1955). For correlations and ages see Tzedakis (1993, 1994).
Tzedakis & Bennett (1995)
Pollen percentage diagrams of selected arboreal taxa of the Metsovon, Zista, Pamvotis, and Dodoni I and II forest periods of Ioannina 249.
5e
7c
9c
11a + b + c
Tzedakis & Bennett (1995)
Solar insolation values of mid-month day for selected periods at latitude 39º40'N. Values are given for July and January extremes and July minus January for each interglacial period calculated at thousand year intervals. Values are expressed in cal cm-2 day-1. In parentheses are percentage differences from 10 ka values. Timing of extreme insolation excursions also given. Data from a computer program written by N.G. Pisias, based on Berger (1978). Chronology based on Imbrie et al. (1984) and Martinson et al. (1987).
Combined plot of sample scores on the first two principal components for Metsovon, Zista, Pamvotis, and Dodoni I forest periods. Asterisks indicate the base of the intervals considered.
Results of comparison of vegetation and climatic signatures of different interglacial periods. '+' sign means similar and '-' means different. First sign refers to climate and second to vegetation character.
Different climate, similar pollen in one comparison
Tzedakis & Bennett, 1995
TEMPORAL DATA
e.g. from monitoring
Rate of change
Gradient analysis (unconstrained, constrained)
Principal response curves
Variance partitioning
Trend analysis – regression against time, Monte Carlo permutation testing
Time-series analysis – see Gavin Simpson’s lecture
- many (>100) points
- few (10–20) points
HYPOTHESIS TESTING
1. External climate forcing functions
2. Catchment forcing functions
3. Lake as isolated system that evolves through time with its own internal dynamics
Assessing potential 'drivers' on aquatic ecosystems.
What determines changes in lake organisms and lake sediments?
Lake Development and Catchment Change
Birks et al. 2000
(a) Sägistalsee, Bernese Oberland, Swiss Alps
A.F. Lotter et al. 2003
J. Paleolimnology 30: 253-342
Andy Lotter
Lotter & Birks 2003
Lotter & Birks 2003
Age-depth model Sedimentation rate
Wick et al. 2003
Wick et al. 2003
Heiri & Lotter 2003
Sägistalsee, Switzerland
Ideal study:
1. Critical ecological situation at tree-line today; sensitive
2. One core. Many proxies (pollen, macros, chironomids, cladocera, grain size, sediment magnetics, sediment geochemistry)
3. Well dated; 18 AMS 14C dates on terrestrial plant material
4. Well co-ordinated by A.F. Lotter
5. High quality data:
Data-set        No. of samples   No. of taxa/variables
Pollen          212              203
Plant macros    372               53
Chironomids      82               30
Cladocera       112                7
Geochemistry    176               14
Grain-size      294                6
Magnetics       504                5
6. Consistent numerical methodology on all proxies
7. Numerical methods used to test hypotheses about the influence of climate and catchment processes on the aquatic ecosystem in the perspective of the Holocene time-scale. (Partial redundancy analysis with restricted Monte Carlo permutation tests)
Of the catchment changes, the main ones appear to be the spread of Picea abies at about 6300 cal BP and Bronze Age and subsequent forest clearances and conversion to grazing pastures.
8. Split proxy data into one predictor variable (plant macrofossils as a reflection of catchment vegetation) and several response variables (cladocera, chironomids, pollen, sediment grain-size, magnetics, geochemistry)
Predictor variables:
Lotter & Birks 2003
Hypotheses tested:
1. Climate has had a significant control on lake ecosystem changes
2. Catchment vegetation has played significant role on lake changes
"Responses" (proxies)   Scale                  Climate a significant predictor? *   Catchment vegetation a significant predictor? #
Terrestrial
  Pollen                Catchment & regional   Y                                    Y
  Macrofossils          Catchment              -                                    -
Lake biotic
  Chironomids           Lake                   N                                    Y
  Cladocera             Lake                   N                                    Y
Lake abiotic
  Grain size            Lake                   -                                    Y
  Magnetics             Lake                   -                                    Y
  Geochemistry          Lake                   -                                    (Y)
* Tested against insolation, central European cold phases, & Atlantic IRD record
# Veg phases: Betula-Pinus cembra; Alnus-Pinus cembra; Picea abies ~ 6300 cal BP; Pasture phases from Bronze Age to present
SPATIAL GEOGRAPHICAL DATA
Geographical co-ordinates X, Y
Spatial analysis
Legendre & Fortin (1989) Vegetatio 80: 107-138
Legendre (1993) Ecology 74: 1659-1673
Koenig (1999) Trends in Ecology & Evolution 14: 22-26
Borcard et al. (2004) Ecology 85: 1826-1832
STATISTICAL ANALYSIS
Random sample assumption
Spatial autocorrelation
Effect of spatial autocorrelation on tests of correlation coefficients for randomly generated, positively autocorrelated data
Confidence interval of a correlation coefficient (r axis from -1 through 0 to +1):
True interval: r not significantly different from zero
Confidence interval computed from the usual tables: r ≠ 0 ***
‘Liberal’ results – too many coefficients will be judged statistically significant when, in reality, they are not
SPATIAL AUTOCORRELATION
Classical statistics assumes independence of observations.
Ecological variables very commonly show spatial structure in the sample space.
Variable is autocorrelated when it is possible to predict values of this variable at some points in space from the known values at other sampling points whose spatial positions are known.
Correlation in relative mean density of mountain hares between eleven provinces in Finland over 39 years (1946-85) plotted against distance between centres of provinces.
HOW TO TEST FOR SPATIAL STRUCTURE?
Spatial autocorrelation coefficients – Moran's I
H0 – no spatial autocorrelation
Each value of the I coefficient is equal to
$E(I) = -(n - 1)^{-1} \approx 0$
where E(I) is the expected I and n is the number of data points
H1 – there is significant spatial autocorrelation
The value of I is significantly different from E(I)
$I(d) = \dfrac{n \sum_{i} \sum_{j} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{W \sum_{i} (y_i - \bar{y})^2}$
where y represents the values of the variable; all summations are for i and j varying from 1 to n, the number of data points, but excluding the cases where i = j. The wij's take the value 1 when the pair (i,j) relates to distance class d (the one being computed) and 0 otherwise. W is the sum of the wij's, or the number of pairs (in the whole square matrix of distances between points) taken into account when computing the coefficient for a given distance class.
I(d) is computed for each distance class d.
Moran's I usually -1 to +1 but can exceed these values.
Positive I suggests positive autocorrelation.
Negative I suggests negative autocorrelation.
Can test for significance by standard errors and confidence intervals or by randomisation tests.
Behaves like Pearson's correlation coefficient r as its numerator is sum of cross-products of centred terms (covariance term), comparing in turn the values found at all pairs of points in the given distance class.
Sensitive to extreme values, like r is.
Plot a CORRELOGRAM where Moran's I is plotted against distance (d).
All-directional correlogram – assume that the phenomenon is isotropic, namely that the autocorrelation function is the same whatever direction is considered.
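A minimal sketch of Moran's I for one distance class, following the formula above with binary weights. The function name, the treatment of the class as the interval (d_min, d_max], and the six-point transect are assumptions made for illustration; no significance test is included.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I for pairs whose distance lies in (d_min, d_max]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * (y_i - mean) * (y_j - mean)
    w_sum = 0     # W: number of ordered pairs in this distance class
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d_min < dist <= d_max:
                cross += dev[i] * dev[j]
                w_sum += 1
    total_ss = sum(d * d for d in dev)
    if w_sum == 0 or total_ss == 0.0:
        return None   # undefined for an empty class or a constant variable
    return (n * cross) / (w_sum * total_ss)


# Hypothetical transect: 6 equally spaced points with a linear gradient.
coords = [(float(k), 0.0) for k in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
i_near = morans_i(values, coords, 0.0, 1.0)   # neighbouring points
i_far = morans_i(values, coords, 4.0, 5.0)    # the two ends of the transect
expected_i = -1.0 / (len(values) - 1)
```

On this gradient neighbouring points give a positive I and the two ends give a strongly negative I that falls below -1, showing that the coefficient can exceed its usual range. Repeating the calculation for each distance class and plotting I against d gives the correlogram.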
Correlograms for artificial data. Black squares are significant at α = 0.05
Legendre & Fortin 1989
Moran's I correlogram for cross-validation residuals for transfer functions. See low I in MAT and ANN, high I in WA and GLR (ML) (spatial autocorrelation not sucked in by these methods), intermediate I in WAPLS
Legendre (1987) In: Evolutionary Biogeography of the Marine Algae of the North Atlantic (eds. D.J. Garbary & R.R. Soult). Springer
Legendre & Legendre (1984) Can. J. Fish. Aquat. Sci. 41, 1781-1802
Andersson (1988) Vegetatio 74, 95-106
Openshaw (1974) Computer Applic. 3-4, 136-160
Webster & Burrough (1972) J. Soil Sc. 23, 222-234
SPATIALLY CONSTRAINED CLUSTERINGS
REGIONALISATION
REGULAR GRID
A) Only group objects if they are adjacent
CONCLUST DC matrix of objects D
Adjacency matrix (1/0) A
(adjacent if have side or corner in common)
Compare D and A. If not adjacent, flag as negative DC and ignore.
Generalised agglomerative strategy 7 methods
As fuse, update adjacency matrix
If Dab or Dbc positive, Dabc must be positive
Plot results as map for 10, 9, 8... 2 groups
CONCMAP (printer), CONCSCR (screen colours)
Observations:
1) Little difference in results between clustering methods (cf. unconstrained cluster analysis).
Little difference with different DCs (within reason!).
2) Faster than unconstrained ca.
3) Spatial constraints with biogeographical data make little difference, i.e. data strongly structured themselves.
IRREGULAR GRID
B) Weight DC matrix between objects
Webster & Burrough (1972) CONDCMAT
Distance weighting: the DC between objects i and j (Dij) is weighted by a function of their geographical distance dij and a weighting parameter w.
Forms of weighting: distance weighting, inverse square, exponential.
Similar results to CONCLUST, but does not have to be grid pattern.
[Figure: weighting factor plotted against geographical distance]
Andersson (1988) neighbour weighting 1/0 data for species (variable) analysis
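NEIWEI
'Pseudofrequency' scores: a cell where the species is present scores 1 plus the number of neighbouring cells where it is also present, e.g. 1 + 8 = 9 for a cell with all eight neighbours occupied and 1 + 3 = 4 for a corner cell with three occupied neighbours.
[Figure: presence (1) grids for Species A and the resulting scores]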
SPATIALLY CONSTRAINED ORDINATIONS
CCA or RDA detect simple gradients using x and y co-ordinates
$b_1 x + b_2 y$
Direction of gradient is tan–1 (b2/b1)
Complex gradients
Trend-surface analysis
Can partial out spatial effects – remove effects of spatial autocorrelation.
quadratic: $b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2$
cubic: quadratic terms $+ \; b_6 x^3 + b_7 x^2 y + b_8 x y^2 + b_9 y^3$
CANOCO
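A minimal sketch of the spatial predictors used in trend-surface analysis: the nine polynomial terms of a cubic surface for one site, in the order b1 to b9, and the direction of a simple linear gradient. The function names are hypothetical. atan2 is used in place of a plain tan–1 (b2/b1) so that the quadrant is kept and b1 = 0 does not fail.

```python
import math


def trend_surface_terms(x, y, degree=3):
    """Terms x^p * y^q for 1 <= p + q <= degree, ordered x, y, x^2, xy, y^2, ..."""
    terms = []
    for total in range(1, degree + 1):
        for q in range(total + 1):
            p = total - q
            terms.append((x ** p) * (y ** q))
    return terms


def gradient_direction(b1, b2):
    """Direction of the gradient b1*x + b2*y in degrees from the x axis."""
    return math.degrees(math.atan2(b2, b1))


terms = trend_surface_terms(2.0, 3.0)        # hypothetical site at x = 2, y = 3
direction = gradient_direction(1.0, 1.0)     # hypothetical coefficients
```

The terms for all sites form the matrix of spatial variables that can be used as predictors, or as covariables when spatial effects are to be partialled out.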
Maps obtained by block kriging for the sample scores, on canonical axes 1 (top) and 2 (bottom), in the species space (left) and in the trend-surface geographic space (right); values multiplied by 100 for mapping. Peaks are shadowed. No samples had been taken from the blanked area on the left.
[Figure panels: rows are Axis 1 (top) and Axis 2 (bottom). Left column, SPECIES SPACE: CCA site scores as WA of species scores. Right column, GEOGRAPHICAL SPACE: CCA site scores as linear combinations of env. variables.]
VARIANCE PARTITIONING INTO FOUR ADDITIVE COMPONENTS
a) Non-spatial environmental variation (local environmental)
i.e. environmental effects after partialling geographical variation
b) Spatially structured environmental variation (regional environmental)
i.e. spatially covarying environmental variation
c) Spatial variation not shared by environmental variables (pure spatial)
i.e. spatial effects after partialling environmental variables
d) Unexplained
Analysis         Explanatory vars   Covariables   Σ canonical λ's   %
1) CCA           Envir              -             0.268             18.6
2) CCA           Geography          -             0.373             25.9
3) partial CCA   Envir              Geography     0.156             10.8
4) partial CCA   Geography          Envir         0.261             18.1
Total inertia 1.443
a) Non-spatial (analysis 3) 10.8%
b) Spatially covarying environmental variation (analysis 1 minus analysis 3) 7.8%
c) Pure spatial (analysis 4) 18.1%
d) Unexplained 63.3%
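The arithmetic behind the four fractions can be checked with a short sketch that uses only the total inertia and the four sums of canonical eigenvalues quoted above. The variable names are hypothetical.

```python
total_inertia = 1.443
env = 0.268             # 1) CCA, environment as explanatory variables
geo = 0.373             # 2) CCA, geography as explanatory variables
env_given_geo = 0.156   # 3) partial CCA, environment with geography as covariables
geo_given_env = 0.261   # 4) partial CCA, geography with environment as covariables


def pct(eigen_sum):
    """Express a sum of canonical eigenvalues as % of total inertia."""
    return 100.0 * eigen_sum / total_inertia


a = pct(env_given_geo)    # non-spatial environmental variation
c = pct(geo_given_env)    # pure spatial variation
b = pct(env) - a          # spatially structured environmental variation
d = 100.0 - (a + b + c)   # unexplained

# Fraction b can equally be obtained from the geography analyses.
b_check = pct(geo) - c
```

Rounded to one decimal place this reproduces 10.8%, 7.8%, 18.1% and 63.3%, and the two routes to fraction (b) agree.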
[Figure: bar divided into fractions (a), (b), (c) and (d). Environmental variance spans (a) + (b); spatial structure variance spans (b) + (c); (d) is unexplained.]
Variation partitioning of a species data table, showing that fraction (b) is the intersection of the environmental and spatial components of the species variation.
Variation partitioning of the oribatid mites data matrix
[Stacked bar chart, percent of variation, Oribatids: Environment 13.7%, Env + space 31.0%, Space 12.2%, Undetermined 43.0%]
Fraction A
Non-spatial environmental variation 13.7%
'Local environment'
'Pure environment' independent of space
Fraction B
Spatially-structured environmental variation 31.0%
(Spatial component of the environmental influence)
Substrate moisture content
Fraction C
Non-environmentally explained variation 12.2%
Spatial structure independent of the environmental variables
'Pure spatial'
Theoretical causal relationships between environmental variables (representing processes) and community structure. Fractions (a), (b), (c) and (d) of the community data variation refer to Figure 5. ECM: Environmental control model. BCM: Biotic control model. HD: Historical dynamics. Asterisks (*) indicate factors not explicitly spelled out in the model.
Fraction | Causal factor | Process - Effect

Non-spatial environmental variation (Local environment)
(a) Environmental factor | ECM - Community structure
(a)* Non-spatially structured factor not included in the analysis | ECM - Env. variable in the analysis - Non-spatial community var.
     Historical events without spatial structure at the study scale | HD - Env. variable in the analysis - Non-spatial community var.

Spatially structured env. variation (Covariation between environment and space)
(b) Env. factor with spatial structure | ECM - Community spatial structure
(b)* Spatially structured env. factor not included in the analysis | ECM - Env. variable in the analysis - Community spatial structure
     Spatially structured historical events | HD - Env. variable in the analysis - Community spatial structure

Non-environmental spatial variation (Spatial)
(c)* Spatially structured factors not included in the analysis | ECM - Community spatial structure
     Spatially structured historical events | HD - Community spatial structure
     Predation, competition, etc. | BCM - Community spatial structure

Unexplained
(d)* Factor not included in the analysis, not spatially structured (at study scale) | ECM - Non-explained community var.
     Biotic control factors not spatially structured (at study scale) | BCM - Non-explained community var.
     Random variation, sampling error, etc. | Noise - Non-explained community var.
Major limitation of this approach is that it is unsuitable for spatial structures present at a WIDE range of different spatial scales.
Principal co-ordinates analysis of neighbour matrices (PCNM).
Borcard & Legendre (2002) Ecological Modelling 153: 51-68
Borcard et al. (2004) Ecology 85: 1826-1832
Eigenvalue decomposition of a truncated matrix of geographic distances between the sampling sites.
Eigenvectors corresponding to positive eigenvalues are used as spatial descriptors in regression or canonical ordinations.
SPACEMAKER
PCNM (R)
spacemakeR
Borcard & Legendre (2002)
PCNM of linear transect of 100 samples, 1 m apart.
Set distance threshold at 1 m to retain only the closest neighbours; replace all other distances by 4 x 1 m = 4 m.
Principal co-ordinates correspond to a series of sinusoids with decreasing periods. Largest period is n + 1, smallest is ~3.
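The steps described for the transect example (truncate the distance matrix, replace larger distances by four times the threshold, run a principal co-ordinates analysis, keep the axes with positive eigenvalues) can be sketched with NumPy. This is an illustrative reimplementation under those stated steps, not the SPACEMAKER or R code; the function name `pcnm` and the numerical tolerance are assumptions.

```python
import numpy as np


def pcnm(coords, threshold):
    """Return (PCNM variables, eigenvalues) for sites at `coords`.

    Distances above `threshold` are replaced by 4 * threshold before a
    principal co-ordinates analysis; only positive eigenvalues are kept.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]  # a transect: one co-ordinate per site
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    truncated = np.where(dist <= threshold, dist, 4.0 * threshold)

    # Principal co-ordinates analysis: double-centre -0.5 * D^2, then eigen-decompose.
    n = truncated.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gower = centring @ (-0.5 * truncated ** 2) @ centring
    eigenvalues, eigenvectors = np.linalg.eigh(gower)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    keep = eigenvalues > 1e-8 * eigenvalues[0]  # tolerance is an assumption
    return eigenvectors[:, keep] * np.sqrt(eigenvalues[keep]), eigenvalues[keep]


# Linear transect of 100 samples, 1 m apart, threshold 1 m.
variables, eigenvalues = pcnm(np.arange(100.0), threshold=1.0)
print(variables.shape)
```

The columns of `variables` are mutually orthogonal and can be used as spatial explanatory variables; plotting the first few against position along the transect shows the broad-scale sinusoids.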
Borcard & Legendre (2002)
Ecological data – Adiantum tomentosum abundance along transects in NE Peru. 260 adjacent 5 x 5 m subplots
(a) Fern (thick), PCNM (thin line)
(b) very broad scale (thick), broad scale (thin line)
(c) medium scale
(d) fine scale
Oribatid mites and PCNM – irregular two-dimensional sampling
PCNM gives 43 variables with truncation distance of 1.012 m
Show coarse broad-scale patterns and fine-scale patterns
Forward selection in RDA retains 12 PCNM variables. Explains 45.1% of variance (cf. 43.2% in simple RDA)
RDA Axis 1: 22.6% variance, R2 = 0.48 – shrubs or no shrubs
RDA Axis 2: 8.4% variance, R2 = 0.11 – shrubs or hummocks
RDA Axis 3: 4.5% variance, R2 = 0.34 – areas of low water content and no shrubs
When environmental variables and a simple X-Y trend are used as covariables in an RDA with the PCNM variables, two significant axes remain. These may reflect unmeasured abiotic or biotic mechanisms, such as food sources.
Atlantic foraminifera & SST: Telford & Birks (2005)
Matrix of PCNM variables created from matrix of distances between N Atlantic sites truncated at 781 km, the minimum distance that links all sites into a single network.
385 orthogonal PCNM representing space.
Forward selection in CCA retained 37 of these.
Represent large spatial patterns.
SST independent of space 1.8% variance
Covariation between SST & space 29.9% variance
Space independent of SST 42.5% variance
Unexplained 25.7%
Pure space explains most. Therefore there are important unknown spatial structures in the data. If only considering SST, expect strong spatial autocorrelation in residuals of SST transfer function models.
Lowest auto-correlation in MAT and ANN residuals
Highest auto-correlation in WA and GLR (= ML) residuals
Highlights 'secret assumption' of transfer functions
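One common way to check model residuals for spatial autocorrelation is Moran's I. The sketch below is illustrative only: the slides do not say which statistic was used, the function name `morans_i` is hypothetical, and the binary neighbour weights (1 for sites within `max_dist`, 0 otherwise) are an assumption.

```python
import numpy as np


def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: sites within `max_dist` are neighbours."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    weights = ((dist > 0) & (dist <= max_dist)).astype(float)

    z = values - values.mean()
    numerator = (weights * np.outer(z, z)).sum()
    denominator = (z ** 2).sum()
    return (len(values) / weights.sum()) * numerator / denominator


# Hypothetical residuals along a 10-site transect, 1 unit apart.
smooth_trend = np.arange(10.0)        # neighbours alike: positive autocorrelation
alternating = np.array([1.0, -1.0] * 5)  # neighbours opposite: negative autocorrelation
print(morans_i(smooth_trend, np.arange(10.0), max_dist=1.0))
print(morans_i(alternating, np.arange(10.0), max_dist=1.0))
```

Values well above zero indicate that nearby sites have similar residuals, which is the situation expected when spatial structure is left unmodelled.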
PREDICTIVE MODELS FROM SPATIAL DATA
Nature management – well explored areas, poorly explored areas
Lesotho bird atlas
Habitat variables PCA axes
Logistic regression to model species occurrences and absences in terms of habitat PCA
Wildlife management with GIS: Mt Graham red squirrel in relation to environmental variables, using logistic regression
Pereira & Itami (1991) Photogr. Engin. & Remote Sensing 57, 1475–1486
$\log \left[ p / (1 - p) \right] = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5$
$x_1$ to $x_4$ = PCA site scores
$x_5$ = recording effort
Distribution maps for three bird species in Lesotho produced by logistic modelling of presence-absence data. Higher probabilities of occurrence are indicated by increasing circle size and actual field records are shown as filled circles.
            Pied crow       Ground woodpecker   Cape vulture 1   Cape vulture 2*
PC1         -0.90 (0.28)    0.54 (0.18)         0.40 (0.14)      0.85 (0.28)
PC2         -0.14 (0.41)    -0.72 (0.29)        0.02 (0.22)      -0.15 (0.25)
PC3         -0.49 (0.35)    0.01 (0.28)         -0.31 (0.23)     -0.44 (0.27)
PC4         -0.34 (0.29)    -0.24 (0.29)        0.02 (0.29)      0.76 (0.48)
Effort      0.15 (0.09)     0.31 (0.14)         0.04 (0.03)      0.10 (0.04)
Constant    -2.43 (0.92)    -1.52 (0.84)        -0.75 (0.42)     -1.96 (0.79)
Deviance    33.95           45.21               62.73            48.88
Df          49              49                  49               47
P-value+    0.95            0.63                0.09             0.40
Summary of the overall logistic models. The upper data are regression coefficients with their standard errors in brackets.
* Cape vulture 2 excludes data for two squares identified as having a disproportionate effect on the model using all the data (Cape vulture 1).
+ The P-value is best interpreted as a measure of standardized deviance, useful for comparing models with differing degrees of freedom.
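A fitted logistic model of this kind converts predictor values for a grid square into a probability of occurrence by inverting the logit. The sketch below uses the Pied crow coefficients from the table; the PCA scores and recording effort in the usage example are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import math

# Pied crow coefficients from the table (PC1-PC4, recording effort, constant).
PIED_CROW = {
    "constant": -2.43,
    "coefficients": [-0.90, -0.14, -0.49, -0.34, 0.15],
}


def occurrence_probability(model, predictors):
    """Invert log(p / (1 - p)) = b0 + sum(b_i * x_i) to obtain p."""
    linear_predictor = model["constant"] + sum(
        b * x for b, x in zip(model["coefficients"], predictors)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical square: PCA scores on axes 1-4 and a recording effort of 10.
p = occurrence_probability(PIED_CROW, [-1.0, 0.2, 0.0, 0.5, 10.0])
print(round(p, 3))
```

Because PC1 has a negative coefficient for this species, squares with lower PC1 scores receive higher predicted probabilities, all else being equal.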
Hill (1991) J. Biogeogr. 18, 247–255
CCA: species data (+/–); environmental data: max altitude, annual rainfall, mean temperature, geology, presence of coast
$\log \left[ p / (1 - p) \right] = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_2 + b_4 x_3 + b_5 x_4$
$x_1$ – $x_4$ are site scores in CCA
Predict distributions given simple environmental data.
[Map panels: actual and predicted distributions of DIPPER, LITTLE RINGED PLOVER, and ROCKROSE]
Actual and predicted distributions of species using logit regression with six parameters. The species are Dipper (Cinclus cinclus), Little Ringed Plover (Charadrius dubius) and Common Rockrose (Helianthemum nummularium). Circles of increasing size signify categories of probability as follows: 1-4%; 5-10%; 11-30%; 31-50%; 51-75%; 76-100%.
PREDICTION OF UPLAND PLANT COMMUNITY DISTRIBUTION USING LOGISTIC REGRESSION
54 upland vegetation types recorded in 1,514 ten-kilometre grid squares in the uplands of Scotland, England, and Wales.
Environmental variables from National Land Characteristics Data Bank.
Topography 13 variables (22 possible)
Climate 18 variables (29 possible)
Geology 19 variables (29 possible)
Soil types 8 variables (8 possible)
Land-use 2 variables (22 possible)
Reduced 31 Topography + climate variables to 5 PCA axes (63.6% variance) and 27 Geology + Soil type variables to 2 PCA axes (20.3%)
Used the 5 PCA axes and their squared terms, the 2 PCA axes, and the Land-use variables as predictors in logistic regression, with the presence/absence (+/-) of each vegetation type as the response variable.
54 models
7 have rho (r2) < 0.20
26 have rho 0.20 - 0.40
20 have rho 0.40 - 0.60
2 have rho > 0.60
Mean rho values
Calcareous grassland 0.38
Heaths 0.41
Mires 0.26
Other grasslands 0.41
Woodland & scrub 0.40
Alpine snow-beds etc. 0.52
Poorest fits: Heaths 1, Mires 5, Grasslands 1
Predicted and known 10km square distribution of NVC U20 (Pteridium aquilinum – Galium saxatile community). Predictions were not made for lowland areas.
Predicted and known 10km square distribution of NVC U10 (Carex bigelowii – Racomitrium lanuginosum moss-heath).
Predicted and known 10km square distribution of NVC H13 (Calluna vulgaris – Cladonia arbuscula heath).
Predicted and known 10km square distribution of NVC H9 (Calluna vulgaris – Deschampsia flexuosa heath) in the uplands.
Predicted and known 10km square distribution of NVC M6 (Carex echinata – Sphagnum recurvum/auriculatum mire).
Predicted and known 10km square distribution of NVC M10 (Carex dioica – Pinguicula vulgaris mire).
Predicted and known 10km square distribution of NVC W19 (Juniperus communis – Oxalis acetosella woodland).
Salix herbacea-Racomitrium heterostichum, snow-bed
Cryptogramma crispa-Athyrium distentifolium, snow-bed
Luzula sylvatica-Geum rivale, tall-herb community
Saxifraga aizoides-Alchemilla glabra, banks
Nardus stricta-Galium saxatile, grassland
Festuca ovina-Agrostis capillaris-Galium saxatile, grassland
Festuca ovina-Agrostis capillaris-Rumex acetosella, grassland
Calluna vulgaris-Erica cinerea, heath
Erica tetralix-Sphagnum compactum, wet heath
Erica tetralix-Sphagnum papillosum, raised and blanket mire
PREDICTING THE PROBABILITY OF SPECIES OCCURRENCE USING SURVEY DATA
Le Duc et al. (1992) Watsonia 19: 97-105
Le Duc et al. (1992) Aspects of Applied Biology 29: 41-48
Firbank et al. (1998) Weed Research 35: 1-10
Plant recording 10 km grid squares
Tetrads 2 km grid squares
Impossible to record all tetrads, only record 3 (A, J, and W)
Convert tetrad data to probabilities of species occurrence, introducing some spatial smoothing in the interpolation.
Layout of the botanical monitoring scheme of the BSBI.
Gaussian smoothing of occurrence in tetrads.
[Figure panels: species occurrence; probability of species occurrence]
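As a rough illustration of how Gaussian smoothing can turn presence/absence records from the surveyed tetrads into a probability surface, the sketch below takes a Gaussian-weighted average of the recorded values around each target location. This is an assumed, simplified scheme and not necessarily the procedure used in the cited papers; the function name, the co-ordinates, and the bandwidth are all hypothetical.

```python
import numpy as np


def smooth_occurrence(recorded_xy, presence, target_xy, bandwidth_km):
    """Gaussian kernel-weighted mean of 0/1 records at each target location."""
    recorded_xy = np.asarray(recorded_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    presence = np.asarray(presence, dtype=float)

    diff = target_xy[:, None, :] - recorded_xy[None, :, :]
    squared_dist = (diff ** 2).sum(axis=2)
    weights = np.exp(-squared_dist / (2.0 * bandwidth_km ** 2))
    return (weights @ presence) / weights.sum(axis=1)


# Hypothetical surveyed tetrads (km co-ordinates) and records (1 = present).
recorded = [[1.0, 1.0], [9.0, 3.0], [5.0, 9.0], [21.0, 1.0]]
records = [1, 1, 0, 0]
targets = [[2.0, 2.0], [20.0, 2.0]]
print(smooth_occurrence(recorded, records, targets, bandwidth_km=5.0))
```

The result always lies between 0 and 1, and a larger bandwidth gives a smoother surface that borrows information from more distant tetrads.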
To predict species occurrence, need external predictors (e.g. soil type, land-use classes) and logistic regression.
(a) data
(b) estimated probability
(c) estimated probability using soil groups
(d) estimated probability using land-use classes
Soil type main predictor
Veronica montana
Predicting weed distribution using tetrad data and soil types.
Firbank et al. (1998)
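$\log_e \left[ p / (1 - p) \right] = a + b \times \text{Smooth} + c \times \text{Soil}$
Smooth = smoothed probability of occurrence
Soil = 16 classes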
Alopecurus myosuroides
(a) tetrads
(b) smoothed probability of occurrence
(c) prediction using (b) + soils
(d) 10 km square map
(a) Elymus repens
(b) Legousia hybrida
(c) Papaver rhoeas
(d) Senecio jacobaea
(a) grass weeds of cereals
(b) broad-leaved weeds
(c) distribution of arable land
Species pool of cereal weeds greatest in central and southern England. Does not entirely coincide with distribution of arable farming.
PREDICTION OF FUTURE CHANGES - TROLLIUS EUROPAEUS
[Map panels: OBSERVED today; PREDICTED today; PREDICTED future]
Known distribution of globeflower (Trollius europaeus) (data from the Biological Records Centre)
Predicted current distribution using Jan min. & July max. temp and annual precipitation as independent variables in a logistic regression.
Predicted distribution in 2050 using the same model but imposing the UK transient climate scenario for 2050.
Watt et al. (1997)
KEY RESEARCHERS IN ANALYSIS OF TEMPORAL PALAEOECOLOGICAL DATA
Steve Juggins
Allan Gordon
Ed Cushing
Keith Bennett
Eric Grimm
Bent Odgaard
Andy Lotter
KEY RESEARCHERS IN SPATIAL ANALYSIS OF ECOLOGICAL DATA
Pierre Legendre
Daniel Borcard
Mark Hill
Marie-Josée Fortin
Richard Telford