

ARTICLE IN PRESS

Computers & Geosciences 36 (2010) 54–70

Contents lists available at ScienceDirect

Computers & Geosciences

journal homepage: www.elsevier.com/locate/cageo

Exploring spatial variation and spatial relationships in a freshwater acidification critical load data set for Great Britain using geographically weighted summary statistics

Paul Harris a,*, Chris Brunsdon b

a National Centre for Geocomputation, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
b Department of Geography, University of Leicester, Leicester LE1 7RH, UK

Article info

Article history:
Received 20 October 2008
Received in revised form 29 March 2009
Accepted 1 April 2009

Keywords:
Local statistics
Geographical kernel weighting
Nonstationarity
Acidified surface waters
Catchment characteristics

0098-3004/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2009.04.012

* Corresponding author. Tel.: +353 1708 6208; fax: +353 1708 6456.
E-mail address: [email protected] (P. Harris).

Abstract

In this study, geographically weighted summary statistics (GWSSs) are used to investigate spatial variation and spatial relationships in a freshwater acidification critical load data set covering Great Britain. This use of GWSSs not only provides valuable insight into the critical load process prior to a geographically weighted regression (GWR) calibration, but also helps in interpreting its output. GWSSs are similarly useful prior to the calibration of other spatial models, such as those used in geostatistics. The results agree with previous research in that relationships between critical load and contextual catchment data can vary across space. However, the more sophisticated models used here are shown to be much more flexible and informative, revealing more spatial patterns than before.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Acid deposition is a major environmental threat to lakes and streams throughout large areas of upland Britain (Mason, 1993). Pollutants that contribute to freshwater acidification are generally emitted as sulphur dioxide and nitrogen oxides. The major sources of such acidifying compounds are the combustion of fossil fuels at power stations and other industrial processes; vehicle exhausts, agriculture, volcanoes and the oceans also contribute. For freshwater acidification, most of the atmospheric deposition falls on the terrestrial part of the catchment rather than on open water. Lake and stream acidification is therefore a function of flow paths and of the physical and chemical properties of catchment soils. Acidified freshwaters are a hostile environment for many forms of aquatic life and are consequently of environmental concern. Continuous assessment and informed management strategies are fundamental to the protection of freshwaters.

One approach to protecting freshwaters focuses on calculating acid deposition critical load values at freshwater sites. Critical load values are calculated to indicate a site’s capacity to buffer the input of strong acid anions of sulphur and nitrogen. They are thresholds and can be compared directly with current and future deposition values. For sites where the deposition value exceeds the critical load value, acidification and associated environmental damage are expected. Spatial variability in critical load values should therefore be considered jointly with spatial variability in deposition values, which allows the sites where the critical load is exceeded to be identified and managed preferentially. Two avenues are possible for remediating such sites: (a) reduce (nearby) deposition rates or (b) physically neutralise freshwater acidity (e.g. by adding an alkali compound). In general, the susceptibility of freshwaters to acidification varies with geology and land use. Waters situated on bedrock with a high weathering rate are usually well buffered against rain-deposited acidity by the relatively rapid release of neutralising base cations (mainly Ca²⁺ and Mg²⁺). For areas of slowly weathering bedrock, however, the reverse is true: acidifying compounds displace H⁺ ions, which lead directly to acidification. In Great Britain, the granite regions of Scotland and Wales are particularly affected by acidification.

Calculating a critical load for any given freshwater site requires surface water chemistry data, and collecting such data for every site across Great Britain is prohibitively expensive. Previous research has therefore looked at ways of predicting critical loads at sites where water chemistry data are unavailable, as an alternative to a costly sampling programme. In this respect, research has sought to link critical load variation with various catchment characteristics. This is useful because many catchment variables can be formulated from existing data sources and are therefore relatively inexpensive. Catchment variables have been used to predict classes of critical load (Hall et al., 1995) or to explain critical load variation (Kernan et al., 1998, 2001) at Great Britain and similar spatial scales. These studies found moderately strong relationships between critical load and catchment data, where the strength and nature of the relationships could vary with sample scale in both attribute- and geographic-space. However, they applied only basic methods: any local regression modelling was rudimentary in design, relying on arbitrary aspatial or spatial partitions.

For this and companion studies (in preparation), it is assumed that a critical load spatial process is better investigated using more sophisticated techniques. In the first instance (this study), geographically weighted summary statistics (GWSSs; Brunsdon et al., 2002; Fotheringham et al., 2002) are used to explore the critical load data set spatially. This initial study acts as an informative precursor to an exploration with geographically weighted regression (GWR; Brunsdon et al., 1996; Fotheringham et al., 2002) and with other spatial models. With GWSSs (and GWR), nearby data are given more influence by weighting observations according to a distance-decay function. This use of spatially weighted data enables the calibration of numerous location-specific statistics (or regressions). This ‘moving-window’ approach, in which the weights follow a focal point around the map, allows statistics to be computed for regions not necessarily attainable in any partition-based approach. Such a use of sample information tends to provide a continuous and smooth model output, where a local statistic can be mapped and visually explored. The nonparametric techniques of GWSS and GWR are influenced by kernel density estimation (KDE) methods (Silverman, 1986; Wand and Jones, 1995), the attribute-space local regression (LR) models of Cleveland (1979) and Loader (2004), and the generalised additive/varying-coefficient models of Hastie and Tibshirani (1990, 1993). In addition to the modelling of continuous spatial processes (i.e. the models used in this study), the kernel approach has been extensively adopted for the modelling of point spatial processes (see Diggle, 1985; Silverman, 1986).

The models of this and companion studies will use critical load and catchment variables similar to those used in the studies of Hall, Kernan and co-workers (see above), so some limited comparison between studies will be possible. Furthermore, the use of GWSSs and GWR need not be confined to this particular data set, as they are likely to be similarly applicable to data sets from other environmental processes that are considered heterogeneous: for example, critical load data sets for regions of China, where acidification currently poses a major environmental problem (Brimblecombe, 2007), or critical load data sets for other pollutants, such as heavy metals (e.g. see Slootweg et al., 2007).

2. Data

The calculation of a critical load value for any freshwater site is itself a complex issue, and competing models exist for it. Steady-state approaches calculate values such that exceedances (deposition minus critical load) reflect potential future damage once steady state is achieved. Steady-state models include the steady-state water chemistry (SSWC) model (Henriksen et al., 1992; Curtis et al., 2000), the Diatom model (Battarbee et al., 1996) and the First-order Acidity Balance model (Posch et al., 1997; Curtis et al., 2000). Models can be calibrated for sulphur deposition, for nitrogen deposition or for both (total acidity). For this study, critical load values from the SSWC model for total acidity are spatially modelled. Units for critical loads (and deposition data) are keq H⁺ ha⁻¹ year⁻¹.
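As a small illustration of the comparison described in the introduction (damage is expected where deposition exceeds the critical load), the sketch below flags exceeded sites. It takes exceedance as deposition minus critical load, so that positive values mark exceeded sites; the site names, values and function name are invented for illustration.

```python
# Hypothetical illustration: site names and values are invented, not data from this study.
# Units for both quantities: keq H+ ha^-1 year^-1.

def exceedance(critical_load, deposition):
    """Deposition minus critical load; a positive value means the critical load
    is exceeded and damage is expected once steady state is reached."""
    return deposition - critical_load

sites = {
    "site_A": (0.4, 1.1),   # (critical load, deposition)
    "site_B": (2.5, 0.9),
    "site_C": (0.8, 0.8),   # exactly at the critical load: not exceeded
}

exceeded = sorted(name for name, (cl, dep) in sites.items() if exceedance(cl, dep) > 0)
print(exceeded)  # ['site_A']
```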

Researchers at the Department of Geography, University College London (UCL), provided the critical load and the contextual catchment data. The critical load data stem from a water chemistry sampling programme for Great Britain carried out as part of the UK Department of Transport and Regions critical loads mapping programme (Kreiser et al., 1993). Water chemistry samples were taken during the autumn or early spring over the period 1992–1994. Sites were chosen to represent the most sensitive water body within either a 10 km grid square (for medium- to high-sensitivity areas) or a 20 km grid square (for low- or non-sensitive areas), so that the minimum critical load could be calculated. Research teams within the Critical Loads Advisory Group (CLAG) then used the water chemistry data to calculate and map critical load values. Details of the sampling programme and mapping exercise are given in CLAG Freshwaters (1995).

The version of the critical load and catchment data used for this study was provided in January 2002. By this time, the water chemistry data had been screened for problematic values by researchers at UCL, resulting in a critical load (and catchment) data set of 1371 sites covering the whole of Great Britain. For this and companion studies, the data set was further manipulated to avoid problems of preferential sampling (i.e. data in medium- to high-sensitivity areas are over-represented) when calibrating GWSSs and other spatial models (not presented here). This manipulation also removed sites with missing data. The result was a spatially representative (declustered) data set of 497 sites for model calibration and a spatially representative (set-aside) data set of 189 sites for model validation (not used here). The coverage of the resulting calibration data (Fig. 2b) is extensive and fairly regular, which is suitable for spatial modelling. Investigations (not presented) found this rather large loss of calibration information to have a negligible effect on model interpretation or performance.
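The declustering procedure itself is not described here. One simple way to obtain a spatially representative subset, shown only as a hypothetical sketch and not as the procedure actually used, is cell declustering: overlay a square grid and keep one randomly chosen site per occupied cell. The cell size, coordinates and function name below are assumptions.

```python
import random

def cell_decluster(sites, cell_size, seed=0):
    """Keep at most one randomly chosen site per square grid cell.

    sites: list of (site_id, easting, northing) tuples, with coordinates in the
    same units as cell_size (e.g. metres).
    """
    rng = random.Random(seed)
    cells = {}
    for site in sites:
        _, x, y = site
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append(site)
    # One representative per occupied cell removes the over-representation
    # of densely sampled (here: more acid-sensitive) areas.
    return sorted(rng.choice(members) for members in cells.values())

# Invented coordinates (metres): sites a, b and c share one 20 km cell.
sites = [("a", 1_000, 2_000), ("b", 5_000, 9_000), ("c", 12_000, 15_000),
         ("d", 45_000, 2_000), ("e", 70_000, 61_000)]
kept = cell_decluster(sites, cell_size=20_000)
print(len(kept))  # 3
```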

To explain critical load variation, four percentage-based class variables are used and manipulated. These catchment-specific variables are geological sensitivity (GSP), soil buffering capacity (SBCP), soil critical load (SCLP) and land cover (LCP). The first three relate to a freshwater site’s ability to buffer acid loading and comprise four, three and five ordinal classes for GSP, SBCP and SCLP, respectively. The twenty-five-class LCP variable is nominal. These data were generated by overlaying digitised catchment areas for each sampled site on to existing digital maps. Full descriptions of the GSP and LCP data generation can be found in Kernan et al. (1998, 2001); for the SBCP and SCLP data generation, the reader is referred to Kernan et al. (1998), where SBCP is termed soil sensitivity. If data reliability is considered an issue, the following order of reliability is assumed (most reliable first): LCP, GSP, SBCP and SCLP. Other contextual variables were also available (e.g. site altitude, rainfall), but each added little to the variance explained by any regression fit, so these variables were discarded.

To facilitate the use of percentage-based class variables in this study’s correlations (and a companion study’s regressions), the three ordinal variables were re-formulated into single-value, weighted sensitivity data (Wt.GSP, Wt.SBCP and Wt.SCLP). This gives a continuous variable with only a marginal loss of information. Table 1 summarises the range of values that each original percentage-based variable and its corresponding weighted variable can take according to expected acid buffering capacity (or acid sensitivity). Low critical load values would thus be expected to correspond to low Wt.GSP, Wt.SBCP and Wt.SCLP values (and vice versa).

Table 1
Summary of ordinal/continuous catchment variables.

| Buffering capacity | GSP | Wt.GSP | SBCP | Wt.SBCP | SCLP | Wt.SCLP | Acid sensitivity |
|---|---|---|---|---|---|---|---|
| Low | 1 | 1.0 | 1 | 10.0 | 5 | 0.1 | High |
| ↓ | ↓ | ↓ | ↓ | ↓ | ↓ | ↓ | ↓ |
| High | 4 | 4.0 | 3 | 80.0 | 1 | 4.0 | Low |

Note the reverse order of SCLP.

Table 2
Nine-class aggregation (LC9) of original twenty-five-class land cover data (LC25).

| LC9 class | LC25 class | Description |
|---|---|---|
| 1 | 1–4 and 20–22 | Water and built/bare ground |
| 2 | 6 | Mown/grazed turf |
| 3 | 7 | Meadow/verge/semi-natural |
| 4 | 18 | Tilled land |
| 5 | 14 and 15 | Deciduous woodland |
| 6 | 16 | Coniferous woodland |
| 7 | 5, 8, 13, 19 and 23–25 | Lowland semi-natural grass/moor |
| 8 | 9, 12 and 17 | Upland semi-natural grass/bog moor |
| 9 | 10 and 11 | Upland semi-natural shrub moor |

The most abundant LC25 classes are in bold.

Table 3
Four kernel weighting functions.

| Kernel | Weighting function |
|---|---|
| Box-car | $w_{ij} = 1$ if $d_{ij} \le r$; $w_{ij} = 0$ otherwise |
| Bi-square | $w_{ij} = \left(1 - (d_{ij}/r)^2\right)^2$ if $d_{ij} \le r$; $w_{ij} = 0$ otherwise |
| Gaussian | $w_{ij} = \exp\left(-d_{ij}^2 / (2b^2)\right)$ |
| Exponential | $w_{ij} = \exp\left(-d_{ij}/b\right)$ |

Twenty-five land cover classes would be problematic when calibrating and interpreting regressions. Hence an aggregation giving a nine-class land cover variable was undertaken (see Table 2). It was then useful to convert this percentage-based variable into a dominant catchment attribute (LC9D), as analysis with a dominant class variable can more easily assess how land cover classes discriminate between different critical load populations. This dominant class variable can be considered low-resolution land cover data, but overall there was little loss of information between the high- and low-resolution data forms.
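The two re-formulations described above (a single weighted sensitivity score and a dominant land cover class) can be sketched as follows. The weighted score is assumed here to be the percentage-weighted mean of per-class scores; only the end-point GSP scores (class 1 → 1.0, class 4 → 4.0) come from Table 1, so the class 2 and 3 scores are assumptions. The LC25 → LC9 mapping is transcribed from Table 2, and the dominant class is taken as the LC9 class with the largest catchment percentage. Function names and catchment values are hypothetical.

```python
# Assumed scores: only classes 1 and 4 are given in Table 1.
GSP_SCORES = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}

# LC25 -> LC9 aggregation, transcribed from Table 2.
LC9_FROM_LC25 = {}
for lc9, lc25_classes in {
    1: [1, 2, 3, 4, 20, 21, 22],
    2: [6],
    3: [7],
    4: [18],
    5: [14, 15],
    6: [16],
    7: [5, 8, 13, 19, 23, 24, 25],
    8: [9, 12, 17],
    9: [10, 11],
}.items():
    for c in lc25_classes:
        LC9_FROM_LC25[c] = lc9

def weighted_score(percentages, scores):
    """Collapse {class: % of catchment} into one percentage-weighted score."""
    total = sum(percentages.values())
    return sum(pct * scores[cls] for cls, pct in percentages.items()) / total

def dominant_lc9(lc25_percentages):
    """Aggregate LC25 percentages to LC9 and return the dominant LC9 class."""
    lc9 = {}
    for cls, pct in lc25_percentages.items():
        key = LC9_FROM_LC25[cls]
        lc9[key] = lc9.get(key, 0.0) + pct
    return max(lc9, key=lc9.get)

# Invented catchment: 60% in GSP class 1 and 40% in class 3.
print(weighted_score({1: 60.0, 3: 40.0}, GSP_SCORES))  # 1.8
# Invented land cover: 30% conifers (LC25 16) vs 25% + 20% upland grass/bog (LC25 9, 12).
print(dominant_lc9({16: 30.0, 9: 25.0, 12: 20.0, 7: 25.0}))  # 8
```

Dividing by the summed percentages, rather than by 100, keeps the score well defined if a catchment’s class percentages do not add up exactly to 100.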

3. Geographically weighted summary statistics (GWSSs)

3.1. Fixed and adaptive window sizes

The simplest way to find a local statistic surface is with a moving-window algorithm, where local statistics are calculated and mapped at each window’s centre using only the data within each window. Summary statistics found using overlapping or non-overlapping rectangular moving windows appear in many spatial studies, often prior to a geostatistical analysis (e.g. see Isaaks and Srivastava, 1989; Rossi et al., 1992; Carroll and Oliver, 2005; Zhang et al., 2007). Other window shapes are possible; in this study only circular ones are considered. Of most importance, however, is the window size. If the window is too small, too few data are used to calculate the local statistic, resulting in an erratic or spiky output. If the window is too large, the local statistic will tend to the corresponding global statistic and thus provide little spatial insight.

Window size is commonly defined either (a) as a fixed size by distance or (b) as an adaptive size, where a fixed number of local data items is used for each local statistic calculation. For sample data on a fairly regular grid, either method is usually appropriate. For sample data on an irregular grid, the adaptive method is preferred; it results in different window sizes according to the density of local data and is hence adaptive in a distance sense. Adaptive specifications eliminate poorly informed statistics from windows with little or no information, but at a possible cost of reduced ‘localness’ in some areas.

3.2. Distance-decay kernel functions

The moving-window approach can be generalised to the calculation of locally weighted statistics, where data are weighted according to their proximity to a local calibration point (i.e. GWSSs). The weighting functions are called kernel functions, and a moving-window approach corresponds to a GWSS specified with a box-car kernel. Importantly, the use of a distance-decay kernel can maximise sample information whilst still retaining a local focus, and it produces a smooth output across space. However, models using a distance-decay kernel should not automatically be preferred to those using a box-car kernel, as the simple moving-window specification is more likely to provide an output showing abrupt changes in continuity that may be of special interest.

The box-car kernel and the three distance-decay kernels considered in this study are defined in Table 3. Each function includes a bandwidth parameter (r or b), which controls the rate of decay. All functions are defined in terms of weighting the sample data, where i is the index of the calibration point, j the index of the sample data point and $d_{ij}$ the distance between the points indexed by i and j. For the box-car and bi-square functions, the bandwidth r can be specified beforehand (i.e. a fixed distance) or as the distance between point i and its Nth nearest neighbour, where N is specified beforehand (i.e. an adaptive distance). The bi-square function gives fractional, decaying weights according to the proximity of the data to each location i, up to a fixed distance or to the distance of a specified Nth nearest neighbour. The local search strategy for this and the box-car function is simply all neighbours within a fixed radius r, or the N nearest neighbours for an adaptive approach. Both functions can suffer from discontinuity, although the bi-square function can be defined with a bandwidth that uses all of the data to minimise such problems.
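The kernels of Table 3 and the adaptive bandwidth (the distance from point i to its Nth nearest neighbour) can be written directly as functions of $d_{ij}$. This is a minimal sketch; the function names and the example distances are hypothetical.

```python
import math

def box_car(d, r):
    """Table 3 box-car: weight 1 inside the window, 0 outside."""
    return 1.0 if d <= r else 0.0

def bisquare(d, r):
    """Table 3 bi-square: smooth decay to exactly zero at d = r."""
    return (1.0 - (d / r) ** 2) ** 2 if d <= r else 0.0

def gaussian(d, b):
    """Table 3 Gaussian: continuous, never exactly zero."""
    return math.exp(-d * d / (2.0 * b * b))

def exponential(d, b):
    """Table 3 exponential: continuous, with a cusp at the origin."""
    return math.exp(-d / b)

def adaptive_bandwidth(distances, n_neighbours):
    """Distance from calibration point i to its Nth nearest sample point."""
    return sorted(distances)[n_neighbours - 1]

def kernel_weights(distances, kernel, bandwidth):
    return [kernel(d, bandwidth) for d in distances]

# Invented distances d_ij (km) from one calibration point i to six sample points.
d_i = [3.0, 8.0, 12.0, 20.0, 31.0, 45.0]
r = adaptive_bandwidth(d_i, n_neighbours=4)       # r = 20.0
print(kernel_weights(d_i, box_car, r))            # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print([round(w, 3) for w in kernel_weights(d_i, bisquare, r)])
# [0.956, 0.706, 0.41, 0.0, 0.0, 0.0]
```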

The Gaussian and exponential functions are continuous and use all the data, with weights that decay according to a Gaussian or exponential curve. Depending on the bandwidth set, data that are a long way from the calibration point i receive virtually zero weight. The key difference between these two functions is their behaviour at the origin. These continuous functions are usually defined with a fixed bandwidth b, but can be constructed to behave in an adaptive manner. The bi-square function is useful as it can provide an intermediate weighting between the box-car and the Gaussian functions. To get similar weights from the bi-square and Gaussian functions, the bandwidths r and b can be approximately related by $r \approx (3\sqrt{2}/2)\,b$, so that b = 11.785 gives r ≈ 25. The shapes of the functions are pictured in Fig. 1, with bandwidths chosen to highlight this relationship. For all functions, if r or b is set large enough, all data receive a weight of (effectively) one and the corresponding global statistic is found.

Fig. 1 (plot): kernel weight K(t) against t from −50 to 50, with curves for box-car, bi-square, Gaussian and exponential weighting.

Fig. 1. Kernel shapes (r = 25 and b = 11.785; see Table 3).
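The approximate bandwidth relation can be checked numerically with the Fig. 1 settings. This sketch (with hypothetical function names) recovers r ≈ 25 from b = 11.785 and prints both sets of weights at a few distances; the weights agree closely near the calibration point and diverge further out, where the Gaussian weights remain positive beyond r.

```python
import math

def bisquare(d, r):
    return (1.0 - (d / r) ** 2) ** 2 if d <= r else 0.0

def gaussian(d, b):
    return math.exp(-d * d / (2.0 * b * b))

b = 11.785                            # Gaussian bandwidth used in Fig. 1
r = 3.0 * math.sqrt(2.0) / 2.0 * b    # approximately matching bi-square bandwidth
print(round(r, 2))                    # 25.0

for d in (0.0, 5.0, 10.0, 15.0, 20.0, 25.0):
    # distance, bi-square weight, Gaussian weight
    print(d, round(bisquare(d, r), 3), round(gaussian(d, b), 3))
```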

3.3. Bandwidth size

There are many interacting specifications when implementing a kernel-based model. Bandwidth size, type of bandwidth (adaptive or fixed), type of kernel, shape of kernel (circular, elliptical, etc.), local search strategy and the data requirements of the local statistic or model (with respect to reliability) are all interconnected. The problem lies in how to specify the model so that the true heterogeneous nature of a given process is adequately depicted. In practice, bandwidth size is almost always the crucial model parameter, and the focus is usually on finding it (see Clark, 1977). For KDE there are many automated approaches for finding an optimal bandwidth, of which cross-validation is commonly used (see Bowman and Azzalini, 1997, pp. 31–36). However, cross-validation is possible only if there is an objective function to cross-validate with, and for most local statistics there is none. Thus, for any local statistic that cannot be used to predict, bandwidths need to be chosen subjectively. This is not necessarily a problem, as calculating and visualising the statistic over a range of bandwidths is appropriate in an exploratory analysis.
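The GW mean is one local statistic that can also act as a predictor, so a leave-one-out objective function can be defined for it. The sketch below, which is an illustration and not a procedure used in this study, scores candidate adaptive box-car bandwidths by the mean squared error of predicting each held-out site from its N nearest other sites. The data are synthetic and the function name is hypothetical.

```python
import math
import random

def loo_cv_score(coords, values, n_neighbours):
    """Leave-one-out mean squared error of an adaptive box-car GW mean
    used to predict each held-out site from its N nearest other sites."""
    sq_err = 0.0
    for i, (xi, yi) in enumerate(coords):
        # Rank all *other* sites by distance; site i itself is left out.
        others = sorted(
            (math.hypot(xi - xj, yi - yj), values[j])
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        nearest = [v for _, v in others[:n_neighbours]]
        prediction = sum(nearest) / len(nearest)
        sq_err += (values[i] - prediction) ** 2
    return sq_err / len(values)

# Synthetic data: an east-west trend plus noise (not the critical load data).
rng = random.Random(1)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
values = [0.1 * x + rng.gauss(0, 1) for x, _ in coords]

candidates = (5, 10, 25, 50, 100, 199)   # 199 = all other sites, i.e. a global mean
scores = {n: loo_cv_score(coords, values, n) for n in candidates}
best_n = min(scores, key=scores.get)
print(best_n, round(scores[best_n], 3))
```

For statistics such as a local SD or skewness there is no held-out value to predict, which is why the exploratory, multi-bandwidth approach described above is used instead.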

3.4. Formulae: univariate and bivariate GWSSs

Formulae for the calculation of GWSSs can now be defined; in all cases they accord with those given in Brunsdon et al. (2002) and Fotheringham et al. (2002, pp. 159–185). For parameterising and interpreting a GWR model, it is (at least) useful to explore the change across space in the mean, variance, coefficient of variation, skewness and correlation coefficient. Local correlations are particularly useful in that they allow preliminary investigations into relationship nonstationarity prior to a GWR fit. Global correlations (and by extension the global multiple linear regression (MLR) fit) should not be overlooked. This is especially true for physical processes such as the critical load process, which are not expected to show relationships as strongly nonstationary as those commonly found with socio-demographic/economic data (for which GWR was largely developed and to which it has been extensively applied).

Thus, for attribute z, a local mean can be defined as

$$m(z_i) = \frac{\sum_{j=1}^{n} w_{ij}\, z_j}{\sum_{j=1}^{n} w_{ij}},$$

where $m(z_i)$ is the local mean value at any location i and $w_{ij}$ accords to some kernel function. A local variance can be defined as

$$s^2(z_i) = \frac{\sum_{j=1}^{n} w_{ij} \left(z_j - m(z_i)\right)^2}{\sum_{j=1}^{n} w_{ij}}$$

and a local coefficient of variation (CV) as $\mathrm{CV}(z_i) = s(z_i)/m(z_i)$, where $s(z_i)$ is the local standard deviation (SD). A local skewness can be defined as

$$b(z_i) = \frac{\sqrt[3]{\sum_{j=1}^{n} w_{ij} \left(z_j - m(z_i)\right)^3 \Big/ \sum_{j=1}^{n} w_{ij}}}{s(z_i)}.$$

For attributes z and y, a local correlation coefficient can be defined as $\rho(z_i, y_i) = c(z_i, y_i)/\left(s(z_i)\, s(y_i)\right)$, where $\rho(z_i, y_i)$ is the local correlation coefficient at any location i; $s(z_i)$ and $s(y_i)$ are the respective local SDs; and

$$c(z_i, y_i) = \frac{\sum_{j=1}^{n} w_{ij} \left(z_j - m(z_i)\right)\left(y_j - m(y_i)\right)}{\sum_{j=1}^{n} w_{ij}}$$

is the local covariance.
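A direct transcription of these formulae for a single calibration point i is sketched below, given any vector of kernel weights $w_{ij}$. The function name and the tiny data set are hypothetical; the signed cube root keeps the sign of a negative third moment.

```python
import math

def gw_summary(z, y, w):
    """Geographically weighted summary statistics at one calibration point i.

    z, y: attribute values at the n sample points.
    w: kernel weights w_ij for this i (any kernel; they need not sum to one).
    """
    sw = sum(w)
    m_z = sum(wj * zj for wj, zj in zip(w, z)) / sw            # local mean
    m_y = sum(wj * yj for wj, yj in zip(w, y)) / sw
    var_z = sum(wj * (zj - m_z) ** 2 for wj, zj in zip(w, z)) / sw
    var_y = sum(wj * (yj - m_y) ** 2 for wj, yj in zip(w, y)) / sw
    s_z, s_y = math.sqrt(var_z), math.sqrt(var_y)              # local SDs
    third = sum(wj * (zj - m_z) ** 3 for wj, zj in zip(w, z)) / sw
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third) / s_z
    cov = sum(wj * (zj - m_z) * (yj - m_y) for wj, zj, yj in zip(w, z, y)) / sw
    return {
        "mean": m_z,
        "sd": s_z,
        "cv": s_z / m_z,
        "skew": skew,
        "corr": cov / (s_z * s_y),
    }

# Tiny invented example with box-car weights (1 inside the window, 0 outside).
z = [1.0, 2.0, 3.0, 10.0]
y = [2.0, 4.0, 6.0, 0.0]
w = [1.0, 1.0, 1.0, 0.0]   # the fourth point lies outside the window
stats = gw_summary(z, y, w)
print(stats["mean"], round(stats["corr"], 6))  # 2.0 1.0
```

Because the fourth point has zero weight, its extreme values do not affect any of the local statistics at this calibration point.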

3.5. Tests

Fotheringham et al. (2002, pp. 165–169) discuss methods for interpreting local statistics. They suggest that the magnitude of local z-scores can indicate how local means differ from the global mean; that is, local z-scores can identify areas where the local mean varies more than expected under the assumption of no local variation in the mean. Local z-scores $zs_i$ are defined as

$$zs_i = \frac{m(z_i) - m}{s\sqrt{\sum_j w_{ij}^2}}$$

at any location indexed by i, where m is the global mean estimate, s the global SD estimate and $w_{ij}$ can be any weighting function, but re-scaled to sum to one for each i. The 95% limits (i.e. $|zs_i| \ge 1.96$) can be used to identify interesting or unusual local means for exploratory purposes, but should not be used as a formal statistical significance test.
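The local z-score can be computed in a few lines, re-scaling the weights to sum to one as required. The values and the box-car window below are invented, and the global SD is estimated here with the n − 1 divisor (an assumption). With only three points in the window, the z-score comes out at about −1.72, inside the ±1.96 limits even though the local mean is well below the global mean.

```python
import math
import statistics

def local_z_score(local_mean, global_mean, global_sd, weights):
    """zs_i = (m(z_i) - m) / (s * sqrt(sum_j w_ij^2)), weights rescaled to sum to one."""
    total = sum(weights)
    w = [wj / total for wj in weights]
    return (local_mean - global_mean) / (global_sd * math.sqrt(sum(wj * wj for wj in w)))

z = [0.5, 0.7, 0.6, 3.0, 4.0, 3.5, 2.0, 1.5]   # invented values
w = [1, 1, 1, 0, 0, 0, 0, 0]                    # box-car window holding the first three
local_mean = sum(wj * zj for wj, zj in zip(w, z)) / sum(w)
zs = local_z_score(local_mean, statistics.mean(z), statistics.stdev(z), w)
print(round(zs, 2), abs(zs) >= 1.96)
```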

For higher moments it is not easy to find an approximate distribution, so similar local significance tests are not so easily derived. Monte Carlo simulation tests are therefore advocated. These tests identify areas where local statistics are ‘significantly’ different from local statistics arising by chance or as artefacts of random variation in the data. The sample data are successively randomised, and the nonstationary model is applied after each randomisation. A significance test is then possible by comparing the actual results with results from a large number of randomised distributions. The randomisation hypothesis is that any pattern seen in the data occurs by chance, so any permutation of the data is equally likely. The test proceeds as follows:

• calculate the true local statistic at any location i using the sample data;
• randomly choose a permutation of the data (note that the coordinates are kept in the same pairs, as are the attribute pairs for correlations);
• calculate the same local statistic at location i using the randomised data;
• repeat steps 2 and 3, say 999 times (the more the better);
• rank the 999 simulated local statistics together with the true local statistic;
• ascertain where the true local statistic lies in this ranked scale of 1000 values;
• if the true local statistic lies in the top or bottom 2.5% tail of this ranked distribution, it can be said to be ‘significantly’ different (at the 95% level) from a local statistic found by chance.

Note that the test is conditional on the specification of the nonstationary model in the first place (i.e. bandwidth size, type of kernel, etc.).
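A sketch of the steps above for one local statistic (an adaptive box-car GW mean) at one location follows. Because the coordinates are fixed, the window’s neighbour set is found once and only the attribute values are permuted. The data are synthetic, the function names are hypothetical, and ties between simulated and true values are ignored.

```python
import math
import random

def nearest_indices(point, coords, n_neighbours):
    """Indices of the N sample points nearest to `point` (adaptive box-car window)."""
    px, py = point
    order = sorted(range(len(coords)),
                   key=lambda j: math.hypot(px - coords[j][0], py - coords[j][1]))
    return order[:n_neighbours]

def permutation_test(point, coords, values, n_neighbours, n_sim=999, seed=0):
    """Monte Carlo test of a local (box-car GW) mean at one location.

    Returns (true statistic, rank of the true value among the n_sim + 1
    values with 1 = smallest, whether it lies in either 2.5% tail).
    """
    rng = random.Random(seed)
    window = nearest_indices(point, coords, n_neighbours)  # coordinates never move

    def local_mean(vals):
        return sum(vals[j] for j in window) / len(window)

    true_stat = local_mean(values)
    shuffled = list(values)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)          # permute attributes across the fixed locations
        sims.append(local_mean(shuffled))
    n_total = n_sim + 1
    rank = 1 + sum(s < true_stat for s in sims)
    tail = (25 * n_total) // 1000      # 25 of 1000 values in each 2.5% tail
    significant = rank <= tail or rank > n_total - tail
    return true_stat, rank, significant

# Synthetic data: noise plus a patch of high values around (20, 20).
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(150)]
values = [rng.gauss(0, 1) + (3.0 if math.hypot(x - 20, y - 20) < 15 else 0.0)
          for x, y in coords]
stat, rank, significant = permutation_test((20, 20), coords, values, n_neighbours=15)
print(round(stat, 2), rank, significant)
```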

4. Analysis of the critical load data set

4.1. Standard EDA

4.1.1. Critical load distribution and trend

A histogram of the critical load data is shown in Fig. 2a, where multi-modality suggests two or three critical load populations. The sample distribution is positively skewed, with a median value about 42% lower than the mean value (see Table 4). From the spatial distribution given in Fig. 2b, low critical loads are predominantly in areas of N Scotland, Wales and parts of N England, and to a lesser extent in a few areas of SW and SE England. All such areas are therefore the most sensitive to inputs of acid anions. High critical loads cover the majority of England and central to southern Scotland. A general trend from high to low critical loads is thus apparent in the SE to NW direction. Outlying data can be found in N and SW England. Overall, there is visual evidence for both global and local trends in critical load variability.

Stronger evidence of any global trend in the critical load data may be found by plotting critical load against each spatial coordinate. Such plots are given in Fig. 3a, where, unfortunately, no clearly defined relationship is evident. A plot limited to the SE to NW direction would be expected to show the strongest relationship, but again none is evident. Only modest relationships are found with simple MLR trend fits: first- and second-order polynomials of the coordinates give R² values of only 0.25 and 0.26, respectively, and a trend fit limited to the SE to NW direction gives a similarly weak R² value of 0.24.
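The R² values quoted for the polynomial trend fits come from ordinary least squares on the coordinates. A pure-Python sketch of first- and second-order trend surfaces is given below on synthetic data with a SE-to-NW gradient (easting x and northing y, rescaled to [0, 1]). The helper names are hypothetical, and in practice a library routine such as `numpy.linalg.lstsq` would normally be used instead of solving the normal equations by hand.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def trend_terms(x, y, order):
    """Polynomial terms of the coordinates for a first- or second-order surface."""
    if order == 1:
        return [1.0, x, y]
    return [1.0, x, y, x * x, x * y, y * y]

def trend_r2(coords, z, order):
    """R^2 of an ordinary least squares trend surface (via the normal equations)."""
    X = [trend_terms(x, y, order) for x, y in coords]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xtz = [sum(row[p] * zi for row, zi in zip(X, z)) for p in range(k)]
    beta = solve(xtx, xtz)
    fitted = [sum(bp * tp for bp, tp in zip(beta, row)) for row in X]
    mean_z = sum(z) / len(z)
    ss_res = sum((zi - fi) ** 2 for zi, fi in zip(z, fitted))
    ss_tot = sum((zi - mean_z) ** 2 for zi in z)
    return 1.0 - ss_res / ss_tot

# Synthetic data: values fall from SE (high x, low y) to NW (low x, high y).
rng = random.Random(7)
coords = [(rng.random(), rng.random()) for _ in range(300)]
z = [2.0 * (x - y) + rng.gauss(0.0, 1.0) for x, y in coords]
r2_1, r2_2 = trend_r2(coords, z, 1), trend_r2(coords, z, 2)
print(round(r2_1, 2), round(r2_2, 2))
```

Because the first-order surface is nested within the second-order one, the second-order R² can never be lower, so a small gain such as 0.25 to 0.26 indicates that the extra curvature explains very little.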

4.1.2. Global relationships

A linear correlation matrix relating critical load to the continuous catchment data is given in Table 5, where all three catchment variables are moderately and positively correlated with critical load (a five-number summary of each variable is given in Table 6). These catchment variables are also moderately collinear with each other (r = 0.59 to 0.73) and may therefore offer similar explanatory power for critical load. Both raw and ranked data correlations are given; the ranked data correlations reveal relationships similar to those found with the raw data. Thus, at this global scale, any outlying data that exist appear to have a minimal influence on the relationships. Scatterplots (Fig. 3b) largely confirm these relationships (albeit with much scatter), and experimentation with various data transforms could not strengthen any relationship. It is likely that only one of these variables should be used at a time in any (global) MLR model. Locally, with GWR, this may be different.
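The raw and ranked correlations in Table 5 correspond to Pearson's r computed on the raw values and on their ranks (Spearman's rho). A minimal NumPy sketch follows; `average_ranks` and `correlation_tables` are hypothetical helper names, and tied values (common in Wt.GSP, whose lower quartile equals its minimum in Table 6) are assumed to share their average rank.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of the ranks they span."""
    x = np.asarray(x, float)
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and xs[j + 1] == xs[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def correlation_tables(data):
    """Return (raw, ranked) correlation matrices for the columns of `data`."""
    data = np.asarray(data, float)
    raw = np.corrcoef(data, rowvar=False)
    ranked = np.column_stack([average_ranks(col) for col in data.T])
    return raw, np.corrcoef(ranked, rowvar=False)
```

When the two matrices agree closely, as in Table 5, a few extreme values are unlikely to be driving the global correlations.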

Conditional critical load distributions for the LC9D variable are investigated using parallel box plots in Fig. 4a. LC9D classes 2–4 appear to relate to high critical loads (these land cover classes occur in large areas of England and, to a lesser extent, in central to southern Scotland; see Fig. 5a). It is also possible that a contextual variable may discriminate between critical load populations. To explore this, the LC9D variable is related to three critical load populations, which are experimentally defined using thresholds of 3 and 17 keq H⁺ ha⁻¹ year⁻¹ (chosen by examining the cumulative critical load distribution for changes in gradient; see Fig. 2c). From Fig. 4b, it appears that the LC9D variable can discriminate between the first (low-valued) and second (medium-valued) critical load populations, which is promising.
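Assigning sites to the three experimental populations and cross-tabulating them against LC9D, as summarised in Fig. 4b, can be sketched as follows. The integer coding of LC9D as 1 to 9, the placement of a value lying exactly on a threshold into the higher class, and all function names are assumptions for illustration.

```python
import numpy as np

THRESHOLDS = (3.0, 17.0)  # keq H+ ha-1 year-1, read from the cumulative distribution

def assign_population(cl, thresholds=THRESHOLDS):
    """0 = low (< 3), 1 = medium (3 to < 17), 2 = high (>= 17); boundary handling is an assumption."""
    return np.digitize(np.asarray(cl, float), thresholds)

def population_crosstab(lc9d, cl, n_classes=9):
    """Site counts by LC9D class (rows; assumed coded 1..9) and critical load population (columns)."""
    pops = assign_population(cl)
    table = np.zeros((n_classes, 3), dtype=int)
    np.add.at(table, (np.asarray(lc9d) - 1, pops), 1)
    return table

def empirical_cdf(cl):
    """Sorted values and cumulative frequency (%), the quantities plotted in Fig. 2c."""
    xs = np.sort(np.asarray(cl, float))
    return xs, 100.0 * np.arange(1, len(xs) + 1) / len(xs)
```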

4.2. EDA with GWSSs

4.2.1. Specifications

The irregular shape of Great Britain should favour adaptive bandwidths over fixed ones and, unless stated otherwise, box-car kernels are chosen. These specifications should provide local statistic surfaces that are fairly simple to interpret and that highlight any unusual spatial features. Bandwidths are given as a percentage of the sample; for example, a bandwidth of 5% uses a data subset of the nearest N = 25 observations (5% of the 497 sites). All local statistics are calculated on the same rectangular 35 (E) × 50 (N) grid, and maps are presented in isoline form. In all cases, test values are calculated on a much coarser 10 × 16 grid to aid interpretation. These are then mapped together with the same local statistic calculated on the 35 × 50 grid for context, using the same bandwidth in each case. This visualisation strategy is advocated in Fotheringham et al. (2002, p. 167). All of the local statistic surfaces presented are judged to be representative only after some initial exploration and experimentation. The local statistic algorithms were developed within the R statistical computing environment (Ihaka and Gentleman, 1996).
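With an adaptive box-car kernel, a GWSS reduces to ordinary sample moments of the N sites nearest each grid node. The NumPy sketch below is not the authors' R implementation: rounding N up (which reproduces N = 25, 75 and 150 for 5%, 15% and 30% of 497 sites), using population (1/N) moments, and the names `adaptive_n`, `make_grid` and `gwss_boxcar` are all assumptions.

```python
import math
import numpy as np

def adaptive_n(pct, n_obs):
    """Number of nearest sites for a percentage bandwidth (assumption: rounded up)."""
    return math.ceil(pct / 100.0 * n_obs)

def make_grid(coords, nx=35, ny=50):
    """Rectangular nx-by-ny grid of nodes spanning the bounding box of the sites."""
    coords = np.asarray(coords, float)
    (xmin, ymin), (xmax, ymax) = coords.min(axis=0), coords.max(axis=0)
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, nx), np.linspace(ymin, ymax, ny))
    return np.column_stack([gx.ravel(), gy.ravel()])

def gwss_boxcar(coords, z, grid, pct):
    """Local mean, variance, skew and CV per node (box-car: N nearest weighted 1, others 0)."""
    coords, z, grid = (np.asarray(a, float) for a in (coords, z, grid))
    n_near = adaptive_n(pct, len(z))
    out = np.full((len(grid), 4), np.nan)
    for g, node in enumerate(grid):
        d = np.hypot(coords[:, 0] - node[0], coords[:, 1] - node[1])
        local = z[np.argsort(d)[:n_near]]
        m, sd = local.mean(), local.std()  # assumption: 1/N (population) moments
        out[g, 0], out[g, 1] = m, sd**2
        if sd > 0:
            out[g, 2] = np.mean((local - m) ** 3) / sd**3
        if m != 0:
            out[g, 3] = sd / m
    return out
```

In practice, nodes falling outside Great Britain would be masked before mapping, and the same function evaluated on a coarser 10 × 16 grid supplies the nodes at which test values are computed.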

4.2.2. Outlying critical load data

Before investigating spatial change in the critical load statistics, it is first useful to identify spatial outliers. Identifying spatial outliers can aid the interpretation of any spatial model output, and an analysis of the correspondingly filtered data set can act as a simple alternative to specifically designed robust models (e.g. see Brunsdon et al., 2002, for the use of quantile-based GWSSs). Spatial outliers can be identified from a geostatistical variographic analysis (e.g. see Ploner, 1999), but for this study, the method of Hawkins (1980) described in Rossi et al. (1992) is followed.

In this method, every sample value $z(\mathbf{x}_\alpha)$ is suspected a priori of being a spatial outlier, and $z(\mathbf{x}_\alpha)$ is declared a spatial outlier if

$$\frac{N\,\big(z(\mathbf{x}_\alpha)-m_l\big)^2}{(N+1)\,s_l^2} \ge \chi^2_{\mathrm{crit},1}.$$

Here $N$ is the number of neighbouring values of $z(\mathbf{x}_\alpha)$, $m_l$ the local mean, $s_l^2$ the average variance for equivalently sized neighbourhoods across the sample area (i.e. the average local variance) and $\chi^2_{\mathrm{crit},1}$ is a critical value of the chi-squared distribution for one degree of freedom. As there is

ARTICLE IN PRESS

[Fig. 2(c) plot: cumulative frequency (%) against critical load]

Fig. 2. Critical load aspatial (a), spatial (b) and aspatial cumulative (c) distributions. Note that the Orkney and Shetland Islands (off NW Great Britain) have been omitted, as they have no sampled sites. The cumulative distribution is shown with experimental thresholds at 3 and 17 keq H⁺ ha⁻¹ year⁻¹.

Table 4. Summary statistics for critical load data.

Minimum   0        Standard deviation         5.74

Mean      5.87     Coefficient of variation   0.98

Median    3.41     Skew                       1.21

Maximum   31.32    Kurtosis                   1.04

P. Harris, C. Brunsdon / Computers & Geosciences 36 (2010) 54–70 59

no objective function for cross-validation, neighbourhood definitions can only be chosen subjectively with this test statistic.
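The outlier test above can be sketched directly from its definition. The following choices are assumptions rather than details given in the text: neighbours are the N nearest other sites (the tested value itself is excluded), local variances use the N − 1 divisor, and the 95% critical value of χ² with one degree of freedom is hard-coded; `hawkins_spatial_outliers` is a hypothetical name.

```python
import numpy as np

CHI2_1DF_95 = 3.841458820694124  # upper 5% point of chi-squared with 1 d.f.

def hawkins_spatial_outliers(coords, z, n_neigh, crit=CHI2_1DF_95):
    """Flag z(x_a) when N (z - m_l)^2 / ((N + 1) s_l^2) >= crit."""
    coords, z = np.asarray(coords, float), np.asarray(z, float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                 # assumption: a site is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :n_neigh]   # N nearest neighbours of each site
    m_l = z[nbrs].mean(axis=1)                  # local means
    s2_l = z[nbrs].var(axis=1, ddof=1).mean()   # average local variance over the study area
    stat = n_neigh * (z - m_l) ** 2 / ((n_neigh + 1) * s2_l)
    return stat >= crit, stat
```

For the critical load data, a 5% adaptive bandwidth corresponds to `n_neigh=25`.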

To calculate this test statistic for the critical load data, the local means and variances were found using an adaptive bandwidth of 5%. At the 95% level of confidence, approximately 5% of the data (23 of the 497 values) were identified as spatially outlying, which in turn yielded a filtered data set of 474 values. Fig. 5b maps the identified spatial outliers. There are no obvious trends or distinct clusters of spatial outliers, although the highest concentrations appear in N England and on the border of SE Wales/SW England. It is suspected that many of the outliers detected in these areas influence each other in this method (e.g. see the cluster in SE Wales); that is, removing only one or two key outliers may result in the remaining nearby outliers no longer being classified as spatially outlying. Overall, the critical load process appears smooth and continuous for much of Scotland, whereas for parts of England and Wales there is evidence of much discontinuity.

4.2.3. Critical load distribution and trend

Local mean, variance and skew surfaces for the critical load data are given in Fig. 6a. From the local mean surface, the distribution of unusual means (via z-scores located on the 10 × 16 grid) suggests unusually low critical loads in N Scotland.


Fig. 3. Critical load scatterplots versus (a) spatial coordinates and (b) continuous catchment data. Scatterplots are shown with marginal box plots, MLR fit (dashed line) and LR fit (solid line) using R code provided with the applied regression work of Fox (2002). The LR smoothing parameter is conservatively chosen at 0.8 (i.e. the proportion of observations included locally). Correlations for critical load to Easting and critical load to Northing are r = 0.46 and −0.43, respectively. Continuous catchment variables are jittered (a random error added to the coordinate) to negate the effects of over-plotting.

Table 5. Linear raw (and ranked) data correlations for critical load and continuous catchment data.

                Critical load   Wt.GSP        Wt.SBCP       Wt.SCLP

Critical load   1               0.58 (0.58)   0.65 (0.64)   0.54 (0.57)

Wt.GSP          –               1             0.66 (0.67)   0.59 (0.58)

Wt.SBCP         –               –             1             0.73 (0.73)

Wt.SCLP         –               –             –             1

Table 6. Five-number summaries for critical load and continuous catchment data.

                Min.    Q1      Median   Q3      Max.

Critical load   0       1.29    3.41     9.31    31.32

Wt.GSP          1.00    1.00    2.00     3.10    4.00

Wt.SBCP         10.00   10.00   21.10    58.40   80.00

Wt.SCLP         0.10    0.50    0.50     1.00    4.00


Fig. 4. Land cover data: (a) nine conditional distributions and (b) according to three experimental critical load populations defined by thresholds of Fig. 2(c).

Fig. 5. (a) LC9D classes 2/3/4 and (b) location of critical load spatial outliers.


Unusually high critical loads can be found in a large area of central to SE England and a small area of N England/S Scotland. Data in the identified areas relate to the multi-modal nature of the critical load histogram. From the local variance surface, critical loads in SW and N England show the highest variability. Clearly, spatial outliers influence the magnitude of the variance in such areas (see Fig. 5b). A randomisation test suggests that critical load variability in a large area of N Scotland is ‘significantly’ low, and that there are pockets of ‘significantly’ high and low variance elsewhere. Thus N Scotland shows the least variation in critical load, as does a large area of central to SE England. This is interesting, as these areas have predominantly low and high critical loads, respectively. Thus high critical loads do not necessarily coincide with high variance.
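The randomisation tests referred to here can be sketched as a Monte Carlo permutation test: critical load values are shuffled over the fixed site locations, the local statistic is recomputed at a test node, and the observed value is compared with the resulting reference distribution. The box-car neighbourhood, the 999 permutations, the two-sided pseudo p-value and the name `local_randomisation_test` are assumptions; the exact scheme behind Fig. 6 is not specified here.

```python
import numpy as np

def local_randomisation_test(coords, z, node, n_near, stat=np.var, n_perm=999, seed=0):
    """Observed local statistic, its z-score and a two-sided pseudo p-value under random relabelling."""
    rng = np.random.default_rng(seed)
    coords, z = np.asarray(coords, float), np.asarray(z, float)
    d = np.hypot(coords[:, 0] - node[0], coords[:, 1] - node[1])
    idx = np.argsort(d)[:n_near]  # box-car neighbourhood of the test node
    observed = stat(z[idx])
    # Shuffle values over the fixed locations and recompute the same local statistic
    sims = np.array([stat(rng.permutation(z)[idx]) for _ in range(n_perm)])
    centre = sims.mean()
    zscore = (observed - centre) / sims.std()
    extreme = np.sum(np.abs(sims - centre) >= abs(observed - centre))
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, zscore, p_value
```

Passing `stat=np.mean` tests local means, as used for the z-scores on the 10 × 16 grid; the default `np.var` tests local variances.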

This unusual phenomenon can be seen more clearly with a CV surface. If high critical loads coincided with high variance (or SD) and low critical loads with low variance (or SD), i.e. if the local SD were proportional to the local mean, then the CV would be fairly uniform across space. Fig. 6b presents three such surfaces, with bandwidths of 5%, 10% and 20% (to highlight how interpretations can change depending on the scale at which the process is viewed). Clearly, there is little


Fig. 6. Critical load local statistic surfaces using an adaptive box-car kernel: (a) bandwidths set at 5% for the mean and 10% for the variance and skew surfaces and (b) bandwidths set at 5%, 10% and 20% (top to bottom) for the CV surfaces. All surfaces are shown with corresponding test results (defined on a 10 × 16 grid). For the mean and variance surfaces, a few white areas within the sampled region indicate local moments outside the ranges specified.


evidence that the SD scales with the mean for the critical load process on any global scale. In addition to the relatively low variation evident in central to SE England, a relatively high local variation is evident in N England.

From the local skew surface in Fig. 6a, the same area of central to SE England that has low variance also has a negatively skewed distribution. This contrary direction of skew can be explained by a domination of high critical loads combined with a handful of low critical loads; that is, the left tail of this local distribution is stretched by low critical loads. Interestingly, a randomisation test identifies as unusual not only this area of negative skew, but also an area of positive skew in Scotland. These findings suggest that data transforms may need to be defined locally.

4.2.4. Critical load distribution and trend: use of more robust measures

To gauge the effect of spatial outliers, local moments are next calculated using the filtered data set of 474 values. As most outliers are high-valued, calibration with the filtered data generally lowers the local means and variances in those areas most affected by outlying data (compare surfaces in Fig. 7 with


Fig. 7. Critical load surfaces for robust statistics using an adaptive box-car kernel. Bandwidths are set at 5% for the mean (a) and 10% for the variance (b) surface (both use filtered data). The bandwidth is set at 10% for the reduced variability (c) surface. All surfaces are shown with corresponding test results.


corresponding surfaces in Fig. 6a). For local skew (not shown), a few areas in central S England change sign. However, reducing unwanted variability is the key reason for identifying outliers in the first place. One way to visualise any reduction in variance is shown in Fig. 7c, where the differences between local variances from the full and filtered data (at the 474 retained sites) are locally averaged. Here it is clear that critical load variability in N England has fallen dramatically. This relates to the influential area of outliers observed before. With the full data set, this area is likely to have the least continuity in critical load and to prove the most problematic when modelling with GWR and similar spatial models.
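The reduced variability surface can be sketched in three steps: compute each retained site's local variance from the full data, compute it again from the filtered data, and smooth the differences onto the grid with a further box-car average. Scaling N to each data set's size, using the same percentage bandwidth for all three steps, and the names `knn_apply` and `reduced_variability` are assumptions.

```python
import math
import numpy as np

def knn_apply(coords, values, targets, n_near, fn):
    """Apply fn to the n_near values nearest each target location (box-car neighbourhood)."""
    out = np.empty(len(targets))
    for i, t in enumerate(targets):
        d = np.hypot(coords[:, 0] - t[0], coords[:, 1] - t[1])
        out[i] = fn(values[np.argsort(d)[:n_near]])
    return out

def reduced_variability(coords, z, keep, grid, pct=10.0):
    """Locally averaged drop in local variance when sites with keep == False are filtered out."""
    coords, z, grid = (np.asarray(a, float) for a in (coords, z, grid))
    keep = np.asarray(keep, bool)
    kc, kz = coords[keep], z[keep]
    # Assumption: the same percentage bandwidth, with N scaled to each data set
    n_full = math.ceil(pct / 100.0 * len(z))
    n_filt = math.ceil(pct / 100.0 * len(kz))
    # Local variance at each retained site: full data minus filtered data
    diff = knn_apply(coords, z, kc, n_full, np.var) - knn_apply(kc, kz, kc, n_filt, np.var)
    # Box-car average of those differences around each grid node
    return knn_apply(kc, diff, grid, n_filt, np.mean)
```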

4.2.5. Critical load distribution and trend: use of distance-decay kernels

As the kernel and bandwidth cannot usually be chosen objectively, it is important to experiment. In the first instance, therefore, the local mean, variance and skew surfaces are re-specified with a bi-square kernel using 15%, 30% and 30% adaptive bandwidths, respectively (i.e. N = 75, 150 and 150; see Fig. 8a). In each case, this results in a more continuous and smooth output, in which the previously perceived moment nonstationarity is (reassuringly) confirmed. Obtaining such smooth outputs with a box-car kernel would require a much larger bandwidth, which would reduce local detail. In the second instance, an alternative comparison is made by re-specifying one statistic using different kernels. Here the local CV surfaces are re-specified in Fig. 8b using bi-square, Gaussian and exponential kernels, where fixed bandwidths are chosen to highlight the smoothing similarities between the weighting functions (cf. Fig. 1). As expected, all three surfaces again give a more continuous output. There is still little evidence that the SD scales with the mean on any global scale, but the relatively low variation in central to SE England now appears more clearly defined.
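The distance-decay kernels compared here can be sketched as weighting functions feeding weighted versions of the local moments. Because Fig. 1 is not reproduced in this section, the functional forms below (a bi-square kernel that is zero beyond r, and Gaussian and exponential kernels with scale b) are common textbook forms assumed for illustration, and all names are hypothetical. For an adaptive kernel, the bandwidth is taken as the distance to the Nth nearest site.

```python
import numpy as np

def bisquare(d, r):
    """(1 - (d/r)^2)^2 inside radius r, zero outside (assumed form)."""
    return np.where(d < r, (1.0 - (d / r) ** 2) ** 2, 0.0)

def gaussian(d, b):
    return np.exp(-0.5 * (d / b) ** 2)  # assumed form

def exponential(d, b):
    return np.exp(-d / b)  # assumed form

def weighted_moments(z, w):
    """Weighted local mean, variance, skew and CV (weights rescaled to sum to one)."""
    if w.sum() <= 0:
        return (np.nan,) * 4
    w = w / w.sum()
    m = np.sum(w * z)
    var = np.sum(w * (z - m) ** 2)
    skew = np.sum(w * (z - m) ** 3) / var**1.5 if var > 0 else np.nan
    cv = np.sqrt(var) / m if m != 0 else np.nan
    return m, var, skew, cv

def gwss_kernel(coords, z, grid, kernel, bw, adaptive_n=None):
    """Weighted local moments per node; if adaptive_n is set, bw is the distance to the Nth nearest site."""
    coords, z, grid = (np.asarray(a, float) for a in (coords, z, grid))
    out = []
    for node in grid:
        d = np.hypot(coords[:, 0] - node[0], coords[:, 1] - node[1])
        h = np.sort(d)[adaptive_n - 1] if adaptive_n else bw
        out.append(weighted_moments(z, kernel(d, h)))
    return np.array(out)
```

With coordinates in km, the fixed CV settings quoted for Fig. 8b would correspond to calls such as `gwss_kernel(coords, cl, grid, bisquare, 350)` and `gwss_kernel(coords, cl, grid, gaussian, 165)`.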

4.2.6. Distribution and trend with the continuous catchment data

To interpret local correlation between a continuous catchment variable and critical load, it is first useful to assess how the catchment variable itself varies across space. Hence local mean and SD surfaces are presented in Fig. 9 for Wt.GSP, Wt.SBCP and Wt.SCLP. In a broad global sense, these variables vary in a similar fashion to each other and also to critical load (cf. Fig. 6a). Consequently, any one of these variables should at least explain the vague NW–SE trend in critical load across Great Britain, and these initial results tend to suggest fairly stationary relationships between critical load and the continuous catchment data. Areas of low SD in an independent variable can also indicate where GWR may have calibration difficulties (i.e. towards singular matrices). In this respect, areas of low SD should be noted: these are SE England and N Scotland for Wt.GSP; N Scotland for Wt.SBCP; and Wales and N England/Scotland for Wt.SCLP. Observe also that the spatial distribution of the land cover classes, with respect to their relationship to high and low critical load values (i.e. those given in Fig. 5a), shows a broad similarity with the mean surface of


Fig. 8. Critical load surfaces for: (a) mean, variance and skew, each using an adaptive bi-square kernel, and (b) CV using bi-square, Gaussian and exponential fixed kernels (top to bottom). Bandwidths (bi-square kernel) are set at 15% for the mean and 30% for the variance and skew surfaces. For the bi-square mean surface, a few white areas indicate local means outside the ranges specified. CV bandwidths are set at 350 km for the bi-square kernel (i.e. r = 350) and 165 km for the Gaussian and exponential kernels (i.e. b = 165). All surfaces are shown with corresponding test results.


each continuous catchment variable. This suggests further collinearity among the catchment data.
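The link drawn above between a low local SD and near-singular GWR matrices can be illustrated by the condition number of the local cross-product matrix XᵀWX for a regression on an intercept and one catchment variable. This is an illustrative sketch on synthetic values; `local_condition_number` is a hypothetical helper, not part of any GWR package.

```python
import numpy as np

def local_condition_number(x, w):
    """Condition number of X^T W X for a local regression of y on [1, x] with weights w."""
    x = np.asarray(x, float)
    X = np.column_stack([np.ones_like(x), x])
    XtWX = X.T @ (np.asarray(w, float)[:, None] * X)
    return np.linalg.cond(XtWX)

w = np.ones(20)                                   # box-car weights for 20 neighbours
varied = np.linspace(1.0, 4.0, 20)                # a locally variable covariate
nearly_flat = 2.0 + 1e-3 * np.linspace(0, 1, 20)  # a covariate that is almost constant locally
print(local_condition_number(varied, w), local_condition_number(nearly_flat, w))
```

The second value is several orders of magnitude larger: as the local SD of x shrinks, the intercept and slope columns become almost proportional, so the system solved in each local GWR calibration approaches singularity.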

4.2.7. Local relationships: collinearity in the continuous catchment data

As the continuous catchment data vary in a similar spatial fashion to each other, it is likely that the (global) collinearity in these variables extends to a more local scale. To examine this, a local correlation matrix for the continuous catchment data is given in Fig. 10, where map intervals coloured (a) pink suggest little or no collinearity, (b) white suggest moderate to strong positive collinearity and (c) green suggest moderate to strong negative collinearity. Clearly, the degree of collinearity at a global scale does not always extend to the local scale after all. For example, Wt.SBCP and Wt.SCLP have little relationship in W Scotland, and Wt.GSP and Wt.SCLP have little relationship over large areas of SE England and N England/S Scotland. The results suggest that a combination of continuous catchment variables can be included in a GWR fit without compromising its interpretation. However, such combinations are still unlikely to be viable with an MLR fit. It remains to be seen whether the LC9D variable is complementary to the continuous variables; LC9D is difficult to investigate in relation to the other independent data because of its categorical nature.
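With a box-car kernel, a local correlation matrix of the kind mapped in Fig. 10 reduces to an ordinary correlation matrix of the N sites nearest each node. The sketch assumes an (n, 3) array holding Wt.GSP, Wt.SBCP and Wt.SCLP; `local_corr_matrices` is a hypothetical name, and a neighbourhood in which any variable is constant would yield NaN correlations.

```python
import numpy as np

def local_corr_matrices(coords, X, grid, n_near):
    """One p-by-p correlation matrix per grid node, from the n_near nearest sites."""
    coords, X, grid = (np.asarray(a, float) for a in (coords, X, grid))
    mats = np.empty((len(grid), X.shape[1], X.shape[1]))
    for g, node in enumerate(grid):
        d = np.hypot(coords[:, 0] - node[0], coords[:, 1] - node[1])
        mats[g] = np.corrcoef(X[np.argsort(d)[:n_near]], rowvar=False)
    return mats
```

For the 15% bandwidth of Fig. 10, `n_near` would be about 75 of the 497 sites.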


Fig. 9. Local mean (a) and SD (b) surfaces for Wt.GSP, Wt.SBCP and Wt.SCLP. All surfaces are specified with an adaptive box-car kernel using a 5% bandwidth for the mean and a 15% bandwidth for the SD surfaces. All surfaces are shown with corresponding test results.


4.2.8. Local relationships: critical load correlations

Local correlations can now be used to explore spatial relationships between the continuous catchment data and critical load. Surfaces and histograms are given in Fig. 11, where the histograms are constructed from local correlation values at every calibration data location. For all surfaces, there is no significant change of correlation sign across space. Overall there are no strongly nonstationary relationships and, in general, any change in relationship occurs at a fairly large spatial scale. Critical load's relationship with Wt.SBCP is considered the most stationary, whilst its relationship with Wt.GSP is the most nonstationary (primarily due to a weak correlation in N England/S Scotland). The consistently moderate to strong critical load to Wt.SBCP relationship suggests that this variable is likely to be the most promising in any regression. Wt.SBCP relates best to critical load in Wales and SW England. Regional differences in critical load's relationship to Wt.GSP can also be illustrated by subsetting the data into (a) sites below 400 km Northing, (b) sites between 400 and 700 km Northing and (c) sites above 700 km Northing, and then plotting Wt.GSP against critical load. The relationship is strongest for subset (a) in the south of Great Britain (r = 0.58, against 0.24 and 0.36 for subsets (b) and (c); see Fig. 12). Clearly, a local correlation surface is a more elegant approach to exploring local relationships.
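The Northing subsets behind Fig. 12 can be formed as below. Treating a site lying exactly at 400 or 700 km as belonging to the more northerly band is an assumption, and `northing_band_correlations` is a hypothetical name.

```python
import numpy as np

def northing_band_correlations(northing_km, x, y, breaks=(400.0, 700.0)):
    """Pearson r between x and y within Northing bands (a) < 400, (b) 400 to < 700, (c) >= 700 km."""
    band = np.digitize(np.asarray(northing_km, float), breaks)  # 0, 1 or 2
    x, y = np.asarray(x, float), np.asarray(y, float)
    return [np.corrcoef(x[band == b], y[band == b])[0, 1] for b in range(len(breaks) + 1)]
```

Unlike a local correlation surface, the result depends on where the arbitrary band edges fall.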


Fig. 10. Local correlation matrix for continuous catchment variables. Correlations are defined with an adaptive 15% bandwidth using a box-car kernel. All surfaces are shown with corresponding test results. Inter-quartile ranges are also given for local correlation values at every calibration data location only (i.e. n = 497).


4.2.9. Local relationships: critical load correlations with distance-decay kernels

Each continuous catchment variable appears to relate weakly to critical load in areas of NW Scotland. For Wt.SCLP, a weak correlation is likely to be genuine and a consequence of little variation in one or both variables. However, for Wt.GSP and Wt.SBCP, the more abrupt changes in correlation are likely to be an artefact of one or two outlying observation pairs just falling within the box-car kernel. An outlying value (or relationship) can seriously affect a linear construct such as the correlation coefficient. In the spirit of exploration, it is worthwhile experimenting with other kernel specifications that are less susceptible to outliers. As such, local correlation surfaces for Wt.GSP, Wt.SBCP and Wt.SCLP are re-specified with an exponential kernel in Fig. 13a. Here the chosen bandwidth matches that used in an AIC-defined GWR model of a companion study. With this new specification, weak correlations with critical load are much less evident in NW Scotland for Wt.GSP and Wt.SBCP, whereas for Wt.SCLP a weak correlation remains (as expected).
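A kernel-weighted correlation replaces the equal weights of the box-car with smoothly decaying ones, so an outlying pair far from the node contributes little. The sketch uses a fixed exponential kernel exp(−d/b) for simplicity; the adaptive 3.62% exponential bandwidth described for Fig. 13 is not reproduced, and `weighted_corr` and `gw_correlation` are hypothetical names.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.asarray(w, float) / np.sum(w)
    mx, my = w @ x, w @ y
    cov = w @ ((x - mx) * (y - my))
    return cov / np.sqrt((w @ (x - mx) ** 2) * (w @ (y - my) ** 2))

def gw_correlation(coords, x, y, grid, b):
    """Local correlation at each grid node using exponential weights exp(-d / b) (assumed fixed form)."""
    coords, grid = np.asarray(coords, float), np.asarray(grid, float)
    out = []
    for node in grid:
        d = np.hypot(coords[:, 0] - node[0], coords[:, 1] - node[1])
        out.append(weighted_corr(x, y, np.exp(-d / b)))
    return np.array(out)
```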

4.2.10. Local relationships: robust critical load correlations

Specifying distance-decay kernels to negate the effects of outlying data on local correlations is only one approach to this problem. A filtered data approach could be used instead. However, it does not directly follow that representative correlation surfaces would be found using the filtered data set obtained earlier, as that filtering was based only on identifying outlying data in a univariate sense. Instead, a different filtered data set is needed, in which critical loads that are unusual in their relationship to the catchment data are removed. For example, such relationship outliers could be identified from an assessment of high prediction errors from some regression fit.
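One way to obtain such relationship outliers, sketched here under stated assumptions, is to fit a simple linear regression of critical load on one catchment variable and flag sites with large internally studentised residuals. The cut-off of 2.5 and the name `relationship_outliers` are illustrative choices, not values from this study.

```python
import numpy as np

def relationship_outliers(x, y, cutoff=2.5):
    """Flag observations whose studentised residual from an OLS fit of y on [1, x] exceeds cutoff."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))  # diagonal of the hat matrix
    s2 = resid @ resid / (len(y) - X.shape[1])
    r = resid / np.sqrt(s2 * (1.0 - leverage))
    return np.abs(r) > cutoff  # assumption: 2.5 is an illustrative threshold
```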

Alternatively, a third and more direct approach to a robust local correlation surface is to re-apply the local correlation algorithm to ranked data. Even though individual outlying relationships are not directly identified, as they would be with a filtered data approach, this direct approach should at least identify regions where outlying relationships are most influential. As such, local rank correlation surfaces for Wt.GSP, Wt.SBCP and Wt.SCLP are given in Fig. 13b, each using the same kernel as specified in Fig. 13a. The use of ranked data appears to have the greatest impact on local relationships between critical load and Wt.SCLP, whereas the other critical load relationships remain broadly similar. These findings loosely mirror those found globally (see Table 5).
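A local rank correlation can be sketched by ranking each variable and passing the ranks to the same kernel-weighted correlation. Whether ranks should be taken over the whole data set or recomputed within each neighbourhood is not stated above; this sketch assumes global ranks (ties averaged) and a fixed exponential kernel, and its function names are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n with tied values assigned their mean rank."""
    x = np.asarray(x, float)
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and xs[j + 1] == xs[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def weighted_corr(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.asarray(w, float) / np.sum(w)
    mx, my = w @ x, w @ y
    cov = w @ ((x - mx) * (y - my))
    return cov / np.sqrt((w @ (x - mx) ** 2) * (w @ (y - my) ** 2))

def gw_rank_correlation(coords, x, y, node, b):
    """Exponentially weighted correlation of globally ranked x and y at one node (assumed scheme)."""
    coords = np.asarray(coords, float)
    d = np.hypot(coords[:, 0] - node[0], coords[:, 1] - node[1])
    return weighted_corr(average_ranks(x), average_ranks(y), np.exp(-d / b))
```

A monotone but strongly curved relationship gives a rank correlation of one, while the raw weighted correlation is pulled down by the curvature; this is the sense in which ranks blunt the influence of extreme pairs.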

5. Discussion

5.1. Analysis summary

Standard EDA has found evidence for both global and local spatial trends in critical load variability across Great Britain. Moderately strong correlation coefficients are found at the global scale, but these single-valued statistics can mask high scattering


Fig. 11. Local correlation surfaces (a) and histograms (b) for critical load relationships with Wt.GSP, Wt.SBCP and Wt.SCLP. All correlations are found using an adaptive box-car kernel with a 15% bandwidth. All surfaces are shown with corresponding test results. Histograms represent local correlation values at every calibration data location only (i.e. n = 497).


in each critical load to continuous catchment variable (i.e. geological sensitivity, soil buffering capacity and soil critical load) relationship. Similar explanatory power for critical load is likely with a categorical land cover variable, which could only partially discriminate between three (experimental) critical load populations.

According to the mapping of local statistics, there is strong evidence of mean, variance, CV and skew nonstationarity in the critical load process. Spatial outliers are most influential in an area of N England, giving rise to a heightened critical load variance. The continuous catchment variables vary spatially in a similar (global) fashion to each other and to critical load. Locally, however, there are differences, and the collinearity found at a global scale does not always extend to the local scale. For local correlation with critical load, there is no significant change of sign across space for any of the continuous catchment variables. In general, and as expected with a physical process, any change in relationship between these data and critical load occurs only at a fairly large spatial scale. Geological sensitivity appears to have the most markedly nonstationary relationship with critical load, whilst soil buffering capacity appears to have the most stationary relationship.


Fig. 12. Scatterplots for subsetted critical load and Wt.GSP data: (a) sites below 400 km Northing, (b) sites between 400 and 700 km Northing and (c) sites above 700 km Northing. Linear correlation coefficients are 0.58, 0.24 and 0.36 for subsets (a)–(c), respectively. All plots are shown with marginal box plots, MLR fit (dashed line) and LR fit (solid line, smoothing parameter = 0.8). Wt.GSP is jittered to aid visualisation.


5.2. The critical load data

As with any statistical analysis, model choice and output depend strongly on the sample data. In this respect, two misgivings about the critical load data need to be noted. Firstly, the water chemistry samples were taken over a 2-year period (1992–1994), so it must be assumed that spatial variation in water chemistry has not been contaminated by temporal variation. Secondly, the water chemistry sites were chosen to represent the most sensitive water body within a 10 or 20 km grid square, so that the minimum critical load could be calculated. Unfortunately, this site selection was unlikely to be free from error: Curtis et al. (1995) suggested that as many as a third of the selected sites were not at the most sensitive water body within the given grid square. This implies that up to a third of the critical load data could significantly overestimate the minimum critical load. Consequently (and for both misgivings), the critical load data may actually reflect a mixture of critical load populations, which may then lead to incorrect model choices and spurious results. A random site selection within each grid square, over a much shorter sampling period, would have reduced such concerns.

6. Conclusions

It has been worthwhile to investigate a freshwater acidification critical load data set for Great Britain using GWSSs, as a good understanding of spatial variation and spatial covariation in this data set has unfolded. With GW univariate statistics, the critical load and catchment data distributions can be taken as nonstationary in all of their key moments. With GW bivariate statistics, many critical load to catchment data relationships are also nonstationary (which agrees with previous research), but only at a fairly large spatial scale (which is to be expected from a physical process). From these exploratory results, an investigation of the same data set with GWR is now warranted. GWR enables a more complete investigation into nonstationary relationships than the GW correlation coefficient, as (a) GWR incorporates the controlling effects that the independent variables have on each other, (b) GWR can easily include any categorical independent variable and (c) GWR calibration does not have to depend on an arbitrarily chosen weighting function. Other spatial models can also follow this GWSS analysis, as ultimately, models for predicting critical loads are sought. A GW mean or GWR itself can be used as one such predictor. Alternatively, if a geostatistical predictor is preferred, then it should be constructed in a manner that caters for the nonstationarities observed here.

Acknowledgements

Research presented in this paper was funded by a Strategic Research Cluster Grant (07/SRC/I1168) from Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support and the first author’s Ph.D.


Fig. 13. Local correlation surfaces for critical load relationships with Wt.GSP, Wt.SBCP and Wt.SCLP: (a) raw data and (b) ranked data. All surfaces are specified with an exponential kernel using an adaptive bandwidth set at 3.62%. This bandwidth is a nonlinear parameter, which loosely reflects the local data subset size (as a percentage) that exerts the greatest influence on each local correlation calculation. All surfaces are shown with corresponding test results.


studentship at Newcastle University. Thanks are also due to A.S. Fotheringham and S. Juggins.

References

Battarbee, R.W., Allott, T.E.H., Juggins, S., Kreiser, A.M., Curtis, C., Harriman, R., 1996. Critical loads of acidity to surface waters: an empirical diatom-based paleolimnological model. Ambio 25, 366–369.

Bowman, A.W., Azzalini, A., 1997. Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford University Press, New York (193 pp.).

Brimblecombe, P., 2007. Preface. Water, Air, and Soil Pollution: Focus 7, 1–2.

Brunsdon, C., Fotheringham, A.S., Charlton, M.E., 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geographical Analysis 28, 281–298.

Brunsdon, C., Fotheringham, A.S., Charlton, M.E., 2002. Geographically weighted summary statistics: a framework for localised exploratory data analysis. Computers, Environment and Urban Systems 26, 501–524.

Carroll, Z.L., Oliver, M.A., 2005. Exploring the spatial relations between soil physical properties and apparent electrical conductivity. Geoderma 128, 354–374.

CLAG Freshwaters, 1995. Critical loads of acid deposition for United Kingdom freshwaters. Sub-report on Freshwaters, Critical Loads Advisory Group, Institute of Terrestrial Ecology (ITE), Penicuik, Scotland, 80 pp.

Clark, R.M., 1977. Non-parametric estimation of a smooth regression function. Journal of the Royal Statistical Society B 39, 107–113.

Cleveland, W.S., 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829–836.

Curtis, C.J., Allott, T.E.H., Battarbee, R.W., Harriman, R., 1995. Validation of the UK critical loads for freshwaters: site selection and sensitivity. Water, Air, and Soil Pollution 85, 2467–2472.

Curtis, C., Allott, T., Hall, J., Harriman, R., Helliwell, R., Hughes, M., Kernan, M., Reynolds, B., Ullyett, J., 2000. Critical loads of sulphur and nitrogen for


freshwaters in Great Britain and assessment of deposition reduction requirements with the First-order Acidity Balance (FAB) model. Hydrology and Earth System Sciences 4, 125–140.

Diggle, P., 1985. A kernel method for smoothing point process data. Applied Statistics 34, 138–147.

Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley, Chichester, Sussex (269 pp.).

Fox, J., 2002. An R and S-Plus Companion to Applied Regression. Sage, London (312 pp.).

Hall, J.R., Wright, S.M., Sparks, T.H., Ullyett, J., Allott, T.E.H., Hornung, M., 1995. Predicting freshwater critical loads from national data on geology, soils and land use. Water, Air, and Soil Pollution 85, 2443–2448.

Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models. Chapman & Hall, London (335 pp.).

Hastie, T.J., Tibshirani, R.J., 1993. Varying-coefficient models. Journal of the Royal Statistical Society B 55, 757–796.

Hawkins, D.M., 1980. Identification of Outliers. Chapman & Hall, London (188 pp.).

Henriksen, A., Kamari, J., Posch, M., Wilander, A., 1992. Critical loads of acidity: Nordic surface waters. Ambio 21, 356–363.

Ihaka, R., Gentleman, R., 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5, 299–314.

Isaaks, E.H., Srivastava, R.M., 1989. An Introduction to Applied Geostatistics. Oxford University Press, New York (561 pp.).

Kernan, M.R., Allott, T.E.H., Battarbee, R.W., 1998. Predicting freshwater critical loads of acidification at the catchment scale: an empirical model. Water, Air, and Soil Pollution 185, 31–41.

Kernan, M.R., Helliwell, R.C., Hughes, M.J., 2001. Predicting freshwater critical loads from catchment characteristics using national datasets. Water, Air, and Soil Pollution: Focus 1, 415–435.

Kreiser, A.M., Patrick, S.T., Battarbee, R.W., 1993. Critical loads for UK freshwaters: introduction, sampling strategy and use of maps. In: Hornung, M., Skeffington, R.A. (Eds.), Critical Loads: Concepts and Applications. Proceedings of ITE Symposium No. 28, HMSO (Her Majesty’s Stationery Office), London, pp. 94–98.

Loader, C., 2004. Smoothing: local regression techniques. In: Gentle, J., Härdle, W., Mori, Y. (Eds.), Handbook of Computational Statistics. Springer-Verlag, Heidelberg, pp. 539–564.

Mason, C.F., 1993. Biology of Freshwater Pollution. John Wiley, New York (351 pp.).

Ploner, A., 1999. The use of the variogram cloud in geostatistical modelling. Environmetrics 10, 413–437.

Posch, M., Kamari, J., Forsius, M., Henriksen, A., Wilander, A., 1997. Exceedance of critical loads for lakes in Finland, Norway and Sweden: reduction requirements for acidifying nitrogen and sulphur deposition. Environmental Management 21, 291–304.

Rossi, R.E., Mulla, D.J., Journel, A.G., Franz, E.H., 1992. Geostatistical tools for modelling and interpreting ecological spatial dependence. Ecological Monographs 62, 277–314.

Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London (175 pp.).

Slootweg, J., Hettelingh, J-P., Posch, M., Schutze, G., Spranger, T., de Vries, W., Rinds, G.J., van’t Zelfde, M., Dutchak, S., Illyin, I., 2007. European critical loads of cadmium, lead and mercury and their exceedances. Water, Air, and Soil Pollution: Focus 7, 371–377.

Wand, M.P., Jones, M.C., 1995. Kernel Smoothing. Chapman & Hall, London (212 pp.).

Zhang, C., Jordan, C., Higgins, A., 2007. Using neighbourhood statistics and GIS to quantify and visualize spatial variation in geochemical variables: an example using Ni concentrations in the topsoils of Northern Ireland. Geoderma 137, 466–476.