Research Paper

Abstract

Fragmentation of habitat is an issue of great concern to ecologists and conservation

managers, and poses a potential threat to the continued existence of species

populations. Many studies have been conducted worldwide to assess this threat, and

habitat-abundance models exist for a variety of species. Few studies, though, have

looked at multiple species, or spatial arrangement of habitat.

The uplands of Scotland are characterised by a fragmented landscape with a trend for

larger scale conservation management. This paper proposes an approach combining

GIS data and statistical modelling with habitat spatial arrangement, to predict, across

Scotland, both overall diversity, and distribution and frequency (relative abundance)

for a smaller set of focal species. Models created from GIS and stepwise regression are

shown to provide powerful tools for high resolution predictive modelling, and for

understanding the optimum spatial configurations of habitat that maximise diversity.

1 Introduction

Habitat fragmentation has been identified as a significant control on population

dynamics for many species. Malanson and Cramer (1999, p.1) describe fragmentation

as “[possibly] the greatest current threat to biodiversity”. Despite this, the effects of

fragmentation have not been fully explored (Tucker, 2001), and relatively little is

known about implications for multiple species populations or for overall diversity.

1.1 Studies of upland bird species

Upland Scotland is host to a range of bird species populations of international

importance (Stillman and Brown, 1998; Tharme et al., 2001). The landscape is

characterised by fragmented habitat with large-scale conservation management

(Moorland Working Group, 2002). Therefore species-habitat models which account

for fragmentation can assist land management for bird diversity in Scottish upland

regions. Most previous models have focussed on single species, e.g. Baines (1995),

Hancock et al. (1999), Hill (2001), Hack (2002), Pearce-Higgins and Yalden (2003).

One notable exception is that of Stillman and Brown (1998) who carried out an

analysis of presence/absence (distribution) data across British upland areas. They

used the same bird survey data as this study (Gibbons et al., 1993), and a similar

system of land cover classes to define upland habitats. From this they produced a

definitive list of upland bird species. They did not, however, look at the spatial

arrangement of habitat. Thus in Scotland in particular, neither diversity, nor its

relationship with habitat spatial structure, have been fully examined.

Statistical modelling is a widely used standard tool for many biological and ecological

studies. More recently, studies combining GIS and statistical modelling have been

used. In Scotland this approach has been used to predict the occurrence of Scottish

black grouse nationally (Hill, 2001) and also at a local home-range level (Hack,

2002). This paper seeks to extend these techniques, in order to predict the

environmental variables and landscape patterns that govern overall diversity.

1.2 Proposed approach

This paper utilises data from a 10-km nationwide bird survey (Gibbons et al., 1993).

Bird data consist of species presence/absence in every 10-km square (distribution),

and the proportion of 2-km tetrads visited within each 10-km square in which a

species was observed (relative abundance, or frequency). These are combined with

land cover polygons, interpreted from aerial-photography dating from the same period

(Macaulay Land Use Research Institute, 1989).

Statistical models are constructed for five selected upland-associated species, and also

for an index of overall diversity based on 27 upland bird species. Analysis of

landscape pattern allows the addition of many new potential explanatory variables.

These measures, or metrics, are submitted to the statistical modelling process in order

to determine the most significant spatial variables in controlling species distribution,

frequency, and overall diversity.

2 Methods

2.1 The study area

The extent of the land cover dataset used in this study is for Scotland only. Upland

regions are therefore selected from within Scotland, including all islands. Upland

areas are defined as polygons of suitable land cover types, i.e. cover types known to

constitute land lying above the line of enclosed or managed land. Additionally, the

study also includes land defined as non-upland that is completely enclosed by upland

habitat, as this may form “effective” habitat (Pearce-Higgins and Yalden, 2003).

2.2 Software used

Data preparation and later processing were undertaken in Microsoft Excel. Spatial

analysis was performed in ArcInfo 8 (ESRI, 2001), or ArcGIS 8.3 (ESRI, 2004), with

maps produced using ArcGIS 8.3. Landscape pattern metrics were calculated, to

quantify spatial structures, using Fragstats 3.3 (McGarigal et al., 2002), automated by

use of Arc Macro Language (AML) and Python scripts (www.python.org).

Partitioning of data, to give build and evaluation datasets, was performed in Excel

using simple randomisation (see section 2.4.1 for details). Data sets were assembled

into a single table, with one unique multiple-valued record for each grid square, using

Oracle 9i (Luscher, 2001) and SQL. All statistical models and predictions were made

using S-PLUS 6.0 (see S-PLUS 6 for Windows User’s Guide, 2001).

2.3 Data sources

The three principal data sets used in this study are outlined in Table 1. These are a

bird survey data set, air-photo derived land cover polygons, and raster elevation data.

2.3.1 Bird survey data

All bird data used in this study originate from a volunteer survey organised by the

British Trust for Ornithology (BTO) between 1988 and 1991 (Gibbons et al., 1993).

For this study we looked at 691 upland 10-km squares for which bird numbers, land

cover, and land pattern metric data were available. These squares were partitioned

from an overall set of 993 squares, leaving a prediction set of 302 (30%). The spatial

unit of 10 km is considered appropriate for this study as it was the scale of unit used

by Gibbons et al. (1993) and is sufficiently large to hold a number of home ranges for

most species (see Cramp and Simmonds, 1977).

2.3.2 Focal species

Data for a total of 29 upland bird species were provided by the BTO for this study.

Of these, 27 species were selected for analysis. These formed the basis for the

calculation of overall diversity in each 10-km square. The species names and BTO

codes are listed in Table 2. From the initial set, five species were selected for

individual statistical analysis using generalised linear modelling. These were black

grouse (BK), curlew (CU), golden plover (GP), meadow pipit (MP), and ptarmigan

(PM), and were chosen to reflect a range of generalist (commonly spread) and

specialist (localised) habitat-type species. A fuller review of each species is given in

Macdonald (2004).

Two species were not considered. Golden eagles were omitted as they can have a

home range more than double the size of the 10-km unit used in this study

(Sterry, 1995, p.42; see also www.hawk-conservancy.org/priors/geagle.shtml). Data

for common scoters were sparse and unsuitable for analysis.

2.3.3 Diversity Index

The Shannon Diversity Index, H (Shannon and Weaver, 1963; Begon et al., 1990),

was chosen to assess the overall diversity in each 10-km square. In this study we use

the natural logarithm, ln, as the basis for this index. The equation is given below.

$$H = -\sum_{i} P_i \ln P_i \qquad \text{(Equation 1)}$$

This index is conventionally calculated from proportional abundance data, i.e. the

proportion of individuals, P_i, in a community that belong to species i. In this study

the index is instead calculated from bird frequency, or relative abundance. This

means that the version of the Shannon Index presented here is in reality a transformed

measure of relative density. Since the number of tetrads visited per square is constant

for all species recorded in that square, however, and since positive frequency values

will always be positively correlated with actual abundance, or count, values, we thus

have a measure that is directly proportional to the true diversity index. Therefore in

this study, H, is a comparative measure of overall bird diversity in each 10-km square.
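
The calculation of H can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name and the `normalise` option are assumptions made for illustration, and the text does not state whether frequencies were rescaled to sum to one before the index was computed. By default the values are used as given, and zero frequencies are skipped, following the usual convention that 0 ln 0 = 0.

```python
import math


def shannon_index(values, normalise=False):
    """Return H = -sum(p * ln p) over the positive entries of `values`.

    `values` holds one frequency (or proportion) per species for a
    single 10-km square. `normalise` is a hypothetical option that
    rescales the positive values to sum to 1 before the index is
    computed, giving the conventional proportional form.
    """
    positive = [v for v in values if v > 0]  # 0 * ln 0 is treated as 0
    if normalise and positive:
        total = sum(positive)
        positive = [v / total for v in positive]
    return -sum(v * math.log(v) for v in positive)


# Two species, each recorded in half of the visited tetrads.
h = shannon_index([0.5, 0.5])
```

With frequencies of 0.5 for each of two species, each term contributes 0.5 ln 2, so H equals ln 2.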

Table 1 Principal data sets used for creation and testing of predictive models, diversity, distribution and frequency of Scottish upland birds.

Data set: BTO bird survey data set. The New Atlas of Breeding Birds in Britain and Ireland, 1988-1991
Description: Survey organised by the British Trust for Ornithology. Comprises bird distribution, frequency, and mean count per tetrad for every 10-km square in British and Irish National Grids.
Source: Under licence to the Institute of Geography, University of Edinburgh.
Reference: Gibbons et al. (1993)

Data set: Land Cover of Scotland 1988 (LCS88)
Description: Land cover polygon data set produced by the Macaulay Land Use Research Institute. Consists of 126 primary land cover classes and 1197 mosaic (2-type) polygons. Data interpreted from air photo survey flown in 1988.
Source: Under licence to the Institute of Geography, University of Edinburgh.
Reference: Macaulay Land Use Research Institute (1989)

Data set: Ordnance Survey Panorama 50m Digital Elevation Model
Description: Raster elevation data derived by OS from their 10m contour data set. Overall quoted accuracy 5m.
Source: Under licence to the Institute of Geography, University of Edinburgh. Also available from: www.edina.ac.uk/digimap [Accessed 26 June 2004]

Table 2 Names and codes for all 27 upland bird species analysed in this study. Species

marked with an asterisk * were selected for individual statistical modelling. Sources: BTO

species code guide (www.bto.org), Sterry (1995).

Species Latin name BTO code

Black Grouse* Tetrao tetrix BK

Buzzard Buteo buteo BZ

Common Sandpiper Actitis hypoleucos CS

Curlew* Numenius arquata CU

Dipper Cinclus cinclus DI

Dunlin Calidris alpina DN

Goosander Mergus merganser GD

Greenshank Tringa nebularia GK

Grey Wagtail Motacilla cinerea GL

Golden Plover* Pluvialis apricaria GP

Hen Harrier Circus cyaneus HH

Merlin Falco columbarius ML

Meadow Pipit* Anthus pratensis MP

Peregrine Falco peregrinus PE

Ptarmigan* Lagopus mutus PM

Red Grouse Lagopus lagopus RG

Red-breasted Merganser Mergus serrator RM

Raven Corvus corax RN

Ring Ouzel Turdus torquatus RZ

Skylark Alauda arvensis S.

Short-eared Owl Asio flammeus SE

Snipe Gallinago gallinago SN

Green-winged Teal Anas crecca T.

Twite Carduelis flavirostris TW

Wheatear Oenanthe oenanthe W.

Whinchat Saxicola rubetra WC

Wigeon Anas penelope WN

2.3.4 Topography data

Spatial summary statistics were calculated from an Ordnance Survey Panorama 50m

digital elevation model (DEM) using GIS overlay techniques. Values were extracted

from the DEM, and transferred to the 10-km square data set to form a set of

topographic explanatory variables for each grid square (see Table 4).
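
The six topographic summaries named in Table 4 can be sketched in a few lines of Python. This is an illustration only: the study derived them with GIS overlay tools, the function name is hypothetical, and the tie-breaking rule (lowest elevation wins) is an assumption.

```python
from collections import Counter


def zonal_summary(elevations):
    """Summarise the DEM cell values falling inside one 10-km square.

    Returns the MIN, MAX, MEAN, MAJORITY, MINORITY and VARIETY
    statistics. Ties for MAJORITY or MINORITY are resolved in favour
    of the lowest elevation value (an assumption for this sketch).
    """
    counts = Counter(elevations)
    most = max(counts.values())
    least = min(counts.values())
    return {
        "MIN": min(elevations),
        "MAX": max(elevations),
        "MEAN": sum(elevations) / len(elevations),
        "MAJORITY": min(v for v, c in counts.items() if c == most),
        "MINORITY": min(v for v, c in counts.items() if c == least),
        "VARIETY": len(counts),  # number of distinct height values
    }


summary = zonal_summary([10, 10, 20, 30, 30, 30])
```

For the six invented cell values above, the most frequent height is 30, the least frequent is 20, and three distinct values occur.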

2.3.5 Land cover data

Land cover data were selected from the LCS88 data set (Macaulay Land Use Research

Institute, 1989). This was processed using GIS and ecological pattern analysis

software to derive metrics quantifying the amount (area and perimeter) of 17 different

habitat classes (see Table 3). An extended set of metrics was also calculated both at

class (Table 4) and landscape (Table 5) levels to measure the spatial arrangement of

habitat types. Combining all variables together gave a complete set of 885

explanatory variables, from which a working set of 79 was selected, summarised in

Tables 4 and 5.

Table 3 Land cover classes included in modelling. Classes were extracted and reclassified to

a land cover class (0-16) from LCS88 polygons (Macaulay Land Use Research Institute,

1989) to create an ‘upland’ dataset.

Class Description

0 Non-upland ‘islands’, completely enclosed by upland habitat.

1 Upland cliffs ( > 5km from coastline)

2 Water features

3 Coniferous plantation

4 Semi-natural conifers

5 Undifferentiated broadleaved woodland

6 Undifferentiated mixed woodland

7 Recent ploughing

8 Recent felling

9 Open canopy young plantation

10 Land recently ripped for afforestation

11 Heather moor

12 Coarse grassland

13 Smooth grassland

14 Bracken

15 Blanket bog and peatlands

16 Montane vegetation

Table 4 Explanatory variables used in statistical modelling – Geographic, topographic, and

land cover class metrics. Topographic variables were calculated over 10-km square zones.

Group | Code | Description
Geographic | X, Y | Easting and northing.
Topographic | MIN, MAX, MEAN | Minimum, maximum, mean elevation.
Topographic | MAJORITY, MINORITY, VARIETY | The most and least frequent height value, and the number of different values within the 10-km zone.
Class metrics: area | CA0…16 | Total area of each land cover class.
Class metrics: perimeter | TE0…16 | Total perimeter length of each class.

Table 5 Explanatory variables used in modelling – landscape metrics. Measures of landscape patch

pattern and spatial arrangement calculated from raster grids using Fragstats pattern analysis software

(McGarigal et al., 2002; McGarigal and Marks, 1995). Distribution statistics record average measures

across all patches. Single metrics are individual (single-value) measures of landscape character. For a

detailed textual and mathematical description, see the on-line help resources available at:

www.umass.edu/landeco/research/fragstats/documents/Metrics/Metrics TOC.htm

Group or type | Name (Fragstats code) | Description

Landscape metrics: distribution statistics
Distribution statistics | .MN, .AM, .MD, .RA, .SD, .CV | Mean, Area-weighted Mean, Median, Range, Standard Deviation, Coefficient of Variation. Applied to each of the five metrics below.
Area | AREA | Patch area.
Shape | SHAPE, FRAC, PARA, CONTIG | Shape Index (irregularity), Fractal Dimension, Perimeter-area Ratio, Contiguity Index.

Landscape metrics: single metrics
Area/edge | NP, PD, LPI, ED, LSI | Number of Patches, Patch Density, Largest Patch (size) Index, Edge Density, Landscape Shape Index.
Shape | PAFRAC | Perimeter-area Fractal Dimension.
Connectivity | COHESION | ‘Connectedness’ of focal patch.
Contagion/interspersion | CONTAG, PLADJ, IJI, DIVISION, SPLIT, AI | Contagion (class dispersion), % Like Adjacencies (cells), Interspersion and Juxtaposition Index, Division (grid mesh size), Split (landscape subdivision), Aggregation Index.
Diversity | PR, PRD, SHDI, SIDI, MSIDI, SHEI, SIEI, MSIEI | Patch Richness, Patch Richness Density, Shannon and Simpson diversity and evenness indices.

2.4 Statistical modelling

2.4.1 Data partitioning

Partitioning of 10-km squares for build and evaluation sets was achieved using a

simple randomisation technique. A sequence of numbers was generated in Excel and

tested for goodness-of-fit to a uniform distribution, and serial independence. Testing

is described in Macdonald (2004). Applying a one-dimensional set of pseudo-random

numbers to two-dimensional space of course gives rise to the possibility of spatial

autocorrelation or clumping across rows or columns. Inspection of the square datasets

shown in Figure 1 does reveal some clustering. Squares were numbered in rows from

left to right, top to bottom (i.e. sorted by easting and northing); artificial clumping

therefore occurs in vertical or diagonal trends, due to periodicity in the number

generator algorithm. Techniques such as stratified random sampling (see Longley et

al., 2001, p.104) can reduce (i.e. scale down) this problem. However, they require

us to choose an arbitrary aggregated spatial unit, within which the randomisation

process is repeated across the dataset, and the size of this unit affects the degree of

autocorrelation.
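
A simple random partition of this kind can be sketched in Python. This is not the Excel procedure used in the study: the function name and the seed are hypothetical, and Python's generator will not reproduce the sequence, or the periodicity, described above. Only the set sizes (691 build squares and 302 evaluation squares out of 993) are taken from the text.

```python
import random


def partition_squares(square_ids, n_evaluation, seed=0):
    """Split grid-square identifiers into build and evaluation sets.

    `n_evaluation` squares are drawn without replacement; the rest
    form the build set. `seed` is a hypothetical parameter that makes
    the draw repeatable.
    """
    rng = random.Random(seed)
    evaluation = set(rng.sample(sorted(square_ids), n_evaluation))
    build = [s for s in square_ids if s not in evaluation]
    return build, sorted(evaluation)


# 993 squares numbered 0..992 stand in for the real grid references.
build_set, evaluation_set = partition_squares(list(range(993)), 302)
```

Drawing without replacement guarantees that the two sets are disjoint and together cover every square. It does nothing to control spatial clumping, which is the limitation noted above.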

[Figure 1: map panels "Build Set" and "Evaluation Set"; scale bar 0 to 200 kilometres.]

This map is based on data provided with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown and the Post Office.

Figure 1 Partitioned 10-km squares. Left: build set (691 squares, 70%), right: evaluation set (302

squares, 30%). Squares assigned to one of the two sets using simple randomisation. Lines represent

50-km GB National Grid squares.

2.4.2 Generalised Linear Models

Generalised linear modelling (GLM) provides a framework for regression models that

allows for the specification of non-uniform mean-variance relationships, and non-

normal error distributions or structures (Nelder and Wedderburn, 1972; McCullagh

and Nelder, 1983). In addition to the error structure, a GLM has two other important

components. These are the linear predictor and the link function. The former is a

linear sum of the effects of all explanatory variables, while the latter is used to relate

this to the predicted values. The predicted value is obtained by applying the inverse

link function to the linear predictor.
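
The relationship between the linear predictor and the predicted value can be shown with a short Python sketch. The function names are hypothetical and the coefficients are invented for illustration. Only the structure follows the text: a linear sum of effects, followed by the inverse of the link function.

```python
import math


def linear_predictor(intercept, coefficients, x):
    """Linear sum of the effects of all explanatory variables."""
    return intercept + sum(b * xi for b, xi in zip(coefficients, x))


def inverse_identity(eta):
    """Inverse of the identity link: the prediction is eta itself."""
    return eta


def inverse_logit(eta):
    """Inverse of the logit link, written to avoid overflow in exp."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predict(intercept, coefficients, x, inverse_link):
    """Apply the inverse link function to the linear predictor."""
    return inverse_link(linear_predictor(intercept, coefficients, x))


# Invented coefficients and covariate values, for illustration only.
gaussian_prediction = predict(1.0, [2.0, -1.0], [0.5, 3.0], inverse_identity)
binomial_prediction = predict(0.0, [0.4], [0.0], inverse_logit)
```

With the identity link the prediction is unbounded, whereas the inverse logit always returns a value between 0 and 1. This is why the logit link suits presence/absence and proportion data.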

2.4.2.1 Diversity model

In this study two GLM families were used. Exploratory data analysis revealed the

calculated diversity index, H, had positive values in the range of 0.9 to 2.9, with mean

of approximately 2.2. Considering equation 1 again, positive numbers are expected as

all values are calculated from the negative sum of a set of negative terms. The

individual terms will always be negative, since all frequency values are decimal

proportions between 0 and 1, and the logarithms of such numbers are negative.

Therefore H is expected to be positive with real-valued mean. The values were seen

to resemble the shape of the Normal distribution, and therefore the Gaussian GLM

family was selected.

GLM Model 1: Diversity, H. Gaussian family GLM with identity link.

$$H = a + \sum_{i=1}^{79} b_i x_i \qquad \text{(Equation 2)}$$

The equation for the GLM diversity model is shown in Equation 2. The terms are as

follows: a is the intercept coefficient, and each b_i is the coefficient of the

corresponding explanatory variable x_i, for each of the 79 explanatory variables.

2.4.2.2 Distribution models

Both distribution and frequencies, provided in the bird survey data, are proportional

data (values between 0 and 1). These require analysis using a binomial error

structure, commonly with a logit link function (Crawley, 2002, pp.513-536). The

distribution data were complicated by the inclusion of both single sightings and

confirmed breeding. Simplification was achieved by excluding less reliable single

sightings, thereby creating a binary response variable. This could then be analysed by

simple unweighted regression, as a special single-trial form of the Binomial

distribution known as the Bernoulli distribution (Crawley, 2002).

GLM Model 2: Distribution. Binary response binomial GLM with logit link.

$$p = \frac{\exp\left(a + \sum_{i=1}^{79} b_i x_i\right)}{1 + \exp\left(a + \sum_{i=1}^{79} b_i x_i\right)} \qquad \text{(Equation 3)}$$

The general equation for the GLM distribution models is shown in Equation 3. The

terms are as follows: a is the intercept coefficient, and each b_i is the coefficient of the

corresponding explanatory variable x_i, for each of the 79 explanatory variables. The computed value, p, is the probability of

presence in each 10-km square.

2.4.2.3 Frequency models

Frequency data would normally be analysed using weighted binomial regression. The

survey data were supplied as decimal proportions with no information regarding the

number of tetrads visited in each 10-km square, i.e. the binomial denominators, or

weights. Therefore unweighted regression was used as the preferred available option.

GLM Model 3: Frequency. Unweighted binomial GLM with logit link.

$$p = \frac{\exp\left(a + \sum_{i=1}^{79} b_i x_i\right)}{1 + \exp\left(a + \sum_{i=1}^{79} b_i x_i\right)} \qquad \text{(Equation 4)}$$

The general equation for the GLM frequency models is shown in Equation 4. The

terms are as follows: a is the intercept coefficient, and each b_i is the coefficient of the

corresponding explanatory variable x_i, for each of the 79 explanatory variables. The computed value, p, is the proportion of

occupied tetrads (probability of tetrad occupancy).

2.4.3 Model fitting

Model fitting involves selecting only the most important explanatory variables in

order to create a parsimonious model. That is, the simplest possible model that will

explain the greatest amount of variation in the response (Crawley, 2002). Inclusion of

all variables in a model may lead to retention of irrelevant correlation between

explanatory variables (Macdonald, 2004).

Model fitting was performed using S-Plus’ automated stepwise selection function.

This selects the optimum model based on lowering a computed measure of fit,

Akaike’s Information Criterion (AIC). The stepwise process successively adds and

removes all terms until the ‘best’ model, that with the lowest AIC, is selected. The

process can take the form of forward addition, starting from a null (or intercept-only)

model, or backwards removal of terms from a full model. Forward selection removes

terms that are no longer significant in the model, thereby reducing the spurious or

exaggerated correlation between explanatory variables (G. Buchanan, pers. comm.).

Thus, in light of the number of potentially correlated land metrics, forward selection

was chosen for use in this study.
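
The logic of AIC-driven stepwise selection can be sketched in Python as follows. This is not the S-Plus routine: `stepwise_aic` and the `aic_of` callback are hypothetical names, and the toy AIC function uses invented gains in place of real model fits. The search starts from the null model and, at each step, makes the single addition or removal that lowers AIC the most, stopping when no move improves it.

```python
def stepwise_aic(candidates, aic_of):
    """Greedy stepwise search starting from the null model.

    `candidates` is an iterable of variable names. `aic_of` is a
    hypothetical callback that returns the AIC of the model fitted
    with a given frozenset of variables.
    """
    candidates = list(candidates)
    current = frozenset()
    best_aic = aic_of(current)
    while True:
        # Every model reachable by adding or dropping one term.
        moves = [current | {v} for v in candidates if v not in current]
        moves += [current - {v} for v in current]
        if not moves:
            break
        challenger = min(moves, key=lambda m: (aic_of(m), sorted(m)))
        challenger_aic = aic_of(challenger)
        if challenger_aic >= best_aic:
            break  # no single move lowers AIC any further
        current, best_aic = challenger, challenger_aic
    return current, best_aic


# Invented reductions in deviance per variable; each term costs 2.
GAIN = {"CA11": 30.0, "MAX": 12.0, "SHDI": 5.0, "X": 0.5}


def toy_aic(model):
    return 100.0 - sum(GAIN[v] for v in model) + 2 * len(model)


selected, final_aic = stepwise_aic(GAIN, toy_aic)
```

In this toy case the three variables whose gain exceeds the penalty of 2 per term are retained, and X is rejected because its gain of 0.5 does not offset the penalty. Passing the fitting step in as a callback keeps the search independent of the model family.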

2.4.4 Model prediction

The models were used to predict values of diversity based on all bird species,

probability of a selected set of species’ presence, and the species’ relative abundance,

or frequency, for each of the evaluation 10-km squares in Scotland, based on the set

of variables chosen by stepwise selection. Distribution predictions were interpreted as

predicted presence for all values greater than or equal to 0.5, and predicted absence

for values less than 0.5.

2.5 Model evaluation

2.5.1 Evaluation of partitioned data

Because this study uses contemporaneous bird survey and land cover data, no other

comparable datasets are available for the study area. The data therefore had to be

partitioned into two sets, one for model building and one for evaluation. We are thus

unable to verify the models against a separately collected survey of known bird

distributions and frequencies. We are, however, able to test how well the models fit

across the entire survey extent, using separate build and test data at the same spatial scale.

This allows us to quantify, across Scotland, the strongest

relationships between species, habitat and, of particular interest, spatial configuration

of habitat. Additionally, it avoids the loss of information that can occur when

predictions must be spatially aggregated to match coarser-grained evaluation data.

2.5.2 Diversity model

Diversity predictions were evaluated by measuring the correlation between the

predicted and observed values. The Pearson product-moment correlation was

calculated for this purpose. A simple linear regression plot of observed H vs.

predicted H was also used, for visual analysis. Values for the average degree of over

and under-prediction were calculated, and residuals were also examined for spatial

patterns, either to test for problems due to autocorrelation (see 2.4.1 above), or to test

for any geographical variation in model accuracy.
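
These evaluation steps can be sketched in Python as follows. The Pearson coefficient is the standard product-moment formula. The definition of average over- and under-prediction used here (the mean size of positive and negative errors, taken separately) is an assumption, since the text does not give the exact calculation, and both function names are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson product-moment correlation between two sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy / math.sqrt(sxx * syy)


def mean_over_under(observed, predicted):
    """Mean over-prediction and mean under-prediction (assumed form).

    Over-prediction averages the errors where predicted > observed;
    under-prediction averages those where predicted < observed.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    over = [e for e in errors if e > 0]
    under = [-e for e in errors if e < 0]
    mean_over = sum(over) / len(over) if over else 0.0
    mean_under = sum(under) / len(under) if under else 0.0
    return mean_over, mean_under


# Invented observed and predicted H values for three squares.
r = pearson_r([2.0, 2.5, 1.5], [2.1, 2.3, 1.8])
over, under = mean_over_under([2.0, 2.5, 1.5], [2.1, 2.3, 1.8])
```

A high correlation alone does not rule out systematic bias, which is why the over- and under-prediction averages are reported alongside it.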

2.5.3 Distribution models

Evaluation of distribution data, for all five species, was performed using a set of

performance measures devised for presence/absence models by Fielding and Bell

(1997). These measures are calculated from ‘confusion matrices’ recording the

numbers of correct and incorrect predictions of presence and absence for each square.

The performance measures used are listed in Table 6.

2.5.4 Frequency models

Frequency predictions, for all five species, were evaluated by measuring the

correlation between the predicted and observed values. The Pearson product-moment

correlation was calculated as for the diversity model. Observed values were plotted

against predicted for inspection, and comparison between species. Again, as for

diversity, the average degree of over and under-prediction were calculated, for all five

species, and residuals examined for spatial patterns, either to test for problems due to

autocorrelation (see 2.4.1 above), or to test for any geographical variation in model

accuracy.

Table 6 Performance measures for assessing classification accuracy of presence-absence

models (after Fielding and Bell, 1997). Formulae are derived from confusion matrix terms as

follows: a = true positive predictions, b = false positive predictions, c = false negative predictions, d =

true negative predictions. N is the number of cases, or sample size.

Measure | Formula
Prevalence | (a+c)/N
Overall diagnostic power | (b+d)/N
Correct classification rate: proportion of all cases correctly predicted | (a+d)/N
Misclassification rate: proportion of all cases incorrectly predicted | (b+c)/N
Sensitivity | a/(a+c)
Specificity | d/(b+d)
False positive rate | b/(b+d)
False negative rate | c/(a+c)
Positive predictive power (PPP) | a/(a+b)
Negative predictive power (NPP) | d/(c+d)
Odds-ratio | (ad)/(bc)
Kappa K: proportion of specific agreement | [(a+d) - ((a+c)(a+b)+(b+d)(c+d))/N] / [N - ((a+c)(a+b)+(b+d)(c+d))/N]
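
A subset of these measures can be computed from predicted probabilities with the Python sketch below. The function names are hypothetical and the example counts are invented. The 0.5 threshold follows the rule given for distribution predictions, and the formulae follow the confusion matrix terms a, b, c and d defined for Table 6.

```python
def confusion_counts(observed, probabilities, threshold=0.5):
    """Count a, b, c, d from observed presence (True/False or 1/0)
    and predicted probabilities; p >= threshold predicts presence."""
    a = b = c = d = 0
    for obs, p in zip(observed, probabilities):
        predicted_present = p >= threshold
        if predicted_present and obs:
            a += 1  # true positive
        elif predicted_present:
            b += 1  # false positive
        elif obs:
            c += 1  # false negative
        else:
            d += 1  # true negative
    return a, b, c, d


def performance(a, b, c, d):
    """Selected Table 6 measures. Assumes that at least one presence
    and one absence were observed, so no denominator is zero."""
    n = a + b + c + d
    chance = ((a + c) * (a + b) + (b + d) * (c + d)) / n
    return {
        "prevalence": (a + c) / n,
        "correct_classification_rate": (a + d) / n,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "kappa": ((a + d) - chance) / (n - chance),
    }


# Invented counts for 100 evaluation squares.
measures = performance(40, 10, 5, 45)
```

With these invented counts, 85 of 100 squares are classified correctly and the agreement expected by chance is 50, so kappa is (85 - 50) / (100 - 50) = 0.7.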

3 Results

3.1 Optimum models of overall diversity, distribution and frequency

3.1.1 Model summaries

A total of 11 generalised linear models were created following the steps detailed in

section 2. These included a single model for diversity based on all 27 upland bird

species, and both a distribution and frequency model for each of the five species of

special interest.

Figure 2 shows a comparison of the deviance explained by each of the 11 GLM

models produced by the stepwise modelling process. The amount of deviance

explained generally lies between about 25% and 50%, ptarmigan being the exception (58-66%). Distribution

models explain more deviance than frequency models for all species except golden plover. Table 7 provides

a summary of the GLMs produced, and includes a measure of each model’s statistical

significance, the p-value.
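
The percentage of deviance explained follows directly from the null and residual deviances. The Python sketch below uses a hypothetical function name; the two input values are the null and residual deviances reported in Table 7 for the black grouse distribution model.

```python
def percent_deviance_explained(null_deviance, residual_deviance):
    """Percentage of the null deviance accounted for by the model."""
    return 100.0 * (null_deviance - residual_deviance) / null_deviance


# Black grouse distribution model: null 750.2474, residual 444.3877.
black_grouse = round(percent_deviance_explained(750.2474, 444.3877), 2)
print(black_grouse)  # 40.77
```

The result agrees with the 40.77% listed for that model.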

[Figure 2: bar chart. x-axis: Species (BK, CU, GP, MP, PM, H); y-axis: % Explained Deviance (0% to 70%); series: Distribution Based, Frequency Based.]

Figure 2 Percentage deviance explained by GLM models in this study. Models for

distribution shown in blue, frequency models and diversity in yellow. All models are

identified by the relevant BTO species code: BK = black grouse, CU = curlew, GP = golden

plover, MP = meadow pipit, PM = ptarmigan. H represents the diversity (H) GLM.

Table 7 Summary of all GLMs produced in this study (d.f. = degrees of freedom). The Pearson product-moment correlation is given for the degree of correlation between observed and predicted values, for all frequency models and diversity, H.

Distribution models | *Dispersion | Null deviance | d.f. | Residual deviance | d.f. | p | % Explained deviance
Black Grouse | 1 | 750.2474 | 690 | 444.3877 | 669 | 0 | 40.77%
Curlew | 1 | 850.3904 | 690 | 480.9521 | 671 | 0 | 43.44%
Golden Plover | 1 | 957.8122 | 690 | 493.7721 | 669 | 0 | 48.45%
Meadow Pipit | 1 | 188.0872 | 690 | 119.1379 | 680 | 7.07e-011 | 36.66%
Ptarmigan | 1 | 519.2351 | 690 | 176.8258 | 685 | 0 | 65.94%

Frequency and diversity models | Dispersion | Null deviance | d.f. | Residual deviance | d.f. | p | % Explained deviance | Pearson | t | p
Black Grouse | 1 | 83.66891 | 690 | 63.15693 | 688 | 3.51e-005 | 24.52% | 0.1774197 | 3.1225 | 0.002
Curlew | 1 | 394.4595 | 690 | 249.0929 | 682 | 0 | 36.85% | 0.6212267 | 13.7309 | 0
Golden Plover | 1 | 240.1563 | 690 | 120.1112 | 687 | 0 | 49.99% | 0.5971541 | 12.8945 | 0
Meadow Pipit | 1 | 355.7941 | 690 | 249.5841 | 684 | 0 | 29.85% | 0.6583335 | 15.1485 | 0
Ptarmigan | 1 | 61.4291 | 690 | 25.85195 | 687 | 9.20e-008 | 57.92% | 0.4554774 | 8.8617 | 0
H | 0.2128242 | 209.4335 | 690 | 143.2306 | 673 | 9.61e-008 | 31.61% | 0.5346674 | 10.9586 | 0

* Dispersion parameter of 1 assumed by S-Plus for binomial regression.

3.1.2 Model fitting and selection of explanatory variables

3.1.2.1 Diversity model

Table 8 provides a list of computed parameter estimates (regression coefficients) and

related information for the diversity model. Similar output and related information for

all models used in this study is listed in Macdonald (2004).

Variables selected as controlling overall diversity included, intuitively, the Shannon

diversity index calculated for landscape (SHDI), maximum elevation (MAX), and

areas of blanket bog and peatland (CA15) and heather moor (CA11). No particular

associations with easting or northing (X, Y) were revealed. Possible quadratic

relationships, however, can be detected in the model. For instance perimeter-area

ratio (PARA) overall is a weakly negative predictor. This is due to the positive

influence of the mean ratio and fractionally stronger negative influence of the median.

In contrast to this, the stepwise process picked out the presence of edge habitat for

water, smooth grassland, and bracken classes as positive predictors, bracken being

quite significant.

3.1.2.2 Distribution models

Parameter estimates selected for distribution models are shown in Table 9. In general,

a similar number of variables as for diversity were chosen for distribution models.

The exceptions were for meadow pipit and ptarmigan. This result, of course, might be

expected due to the unique patterns of distribution for these species.

For the distribution of black grouse, the moorland edge (TE11) and also edge habitats

of water, bracken and montane vegetation were selected as predictors. This appears to

be corroborated by a positive association with perimeter-area ratio (PARA). Fractal

dimension (shape complexity) however, is found to be strongly negative (FRAC).

Variety of patch types (PR), degree of class aggregation into single patches (AI), and,

to a lesser extent, patch area coefficient of variation (AREA.CV), are positive.

Grouse were also predicted to be weakly associated with easting, having a stronger

negative association with northing. Surprisingly, open canopy young plantation

(CA9) is found to be a negative predictor. This was expected to show a positive

relationship, since black grouse are known to favour plantation shrub and herb growth

that occurs in the first 15-20 years before canopy closure. Around this time, light no

longer reaches the forest floor and any grouse present will move on to fresh habitat

(Hill, 2001).

From the other models, ptarmigan were predicted to respond to maximum elevation

(MAX), and contiguous montane vegetation (CA16, CONTIG.MN) located in

northern regions (Y). Curlews were predicted to favour a diverse and varied landscape

(SHDI, SHAPE.SD), but again with contiguous patches (CONTIG.AM). They were

also predicted to favour areas of smooth grassland and montane vegetation, with

edges of blanket bog and conifer plantations other favoured habitats. A smaller model

was produced for meadow pipits, again selecting contiguity of patches, but also

blanket bog edge as particularly significant. Golden plovers were predicted to favour

some areas of heather moor (CA11), coarse grassland (CA12), and blanket bog and

peatland (CA15). Overall, a slight positive relationship with more complex patch

shape was predicted for golden plover (FRAC, SHAPE.MD).

3.1.2.3 Frequency models

Parameter estimates selected for frequency models are shown in Table 10. For

frequency models, a much smaller set of variables was selected. For the black grouse

model, only total conifer plantation perimeter (TE3) and mean elevation (MEAN)

were selected. Ptarmigan, the other specialist examined, were predicted by a small

model of topographic variation (VARIETY), blanket bog edge (TE15), and mean

patch area (AREA.MN).

Curlew were predicted to favour coarse grassland areas (CA12), but also edges of

smooth grassland (TE13). They also exhibited a slight eastward preference (X).

Golden plovers, however, were predicted to favour land to the north (Y). The model

also picked out uniform blanket bog and peatland areas (CA15, COHESION) as

important predictor variables for this species. Meadow pipits were predicted to be
negatively associated with areas of mixed woodland, although woodland edges were
positive, possibly indicating a more complex quadratic interaction. Meadow pipits
were additionally predicted to favour a highly aggregated landscape (AI) and blanket bog
edge (TE15). Mean elevation was predicted to have a negative relationship.
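As an illustration of how a fitted frequency model turns predictor values into a predicted relative frequency, the sketch below applies the inverse logit to the black grouse coefficients listed in Table 10. The units assumed for TE3 (metres of edge) and MEAN (metres of elevation), and the example square itself, are hypothetical and are not taken from the study.

```python
import math

# Black grouse frequency model coefficients as listed in Table 10.
INTERCEPT = -5.149640
COEF_TE3 = 0.000014   # total conifer plantation perimeter (TE3)
COEF_MEAN = 0.003998  # mean elevation (MEAN)


def predicted_frequency(te3: float, mean_elevation: float) -> float:
    """Inverse logit of the linear predictor; always lies in (0, 1)."""
    eta = INTERCEPT + COEF_TE3 * te3 + COEF_MEAN * mean_elevation
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical 10-km square: 50 km of plantation edge, mean elevation 400 m.
# Both the values and their units are assumptions made for illustration.
example = predicted_frequency(te3=50_000.0, mean_elevation=400.0)
print(f"predicted relative frequency: {example:.3f}")
```

Because both coefficients are positive, the predicted frequency rises with either predictor, but the logistic form keeps it bounded between 0 and 1.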


Table 8 The model chosen by forward stepwise Gaussian regression for overall bird

diversity, H, in Scottish upland habitats. The information presented here has been adapted

directly from S-Plus output, and includes the selected variables (described in Tables 4 and 5),

parameter estimate (regression coefficients), standard errors, and Student’s t test values.

Variables are presented in the order they are selected. Complete information for all models is

presented in Macdonald (2004).

Variable       Parameter estimate   Standard error     t value
(Intercept)         1.729815           2.195503       0.787890
CONTIG.AM           6.421927           0.840860       7.637330
SHDI                0.329566           0.000655       5.030813
CA15                0.000055           0.000016       3.487625
TE2                 0.000002           0.000001       2.036191
TE13                0.000002           0.000001       3.646451
CA0                 0.000130           0.000046      -2.847954
CA11                0.000051           0.000016       3.191789
MAX                 0.000416           0.000104      -3.993171
CONTIG.MD          -7.438591           1.999564      -3.720105
AREA.MN             0.003361           0.000899       3.738703
CONTIG.CV          -0.177609           0.031611      -5.618632
CONTIG.SD          21.32197            4.094209       5.207836
PARA.MD             0.003879           0.001289      -3.009283
TE14                0.000009           0.000004       2.477047
AREA.SD             0.000515           0.000220      -2.342149
PARA.MN             0.003492           0.001705       2.048171
CA14                0.000685           0.000383      -1.788750

Dispersion = 0.212824

Null deviance = 209.4335 on 690 degrees of freedom

Residual deviance = 143.2306 on 673 degrees of freedom, p = 9.614724e-9 (<< 0.05)

Percentage deviance explained = 31.61%
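The summary figures above are linked by simple arithmetic, sketched below. The sketch assumes that the 143.2306 figure on 673 degrees of freedom is the residual deviance of the fitted model, an interpretation that reproduces both the reported dispersion and the reported percentage; the function names are illustrative only.

```python
# Summary values reported beneath Table 8.
NULL_DEVIANCE = 209.4335
RESIDUAL_DEVIANCE = 143.2306  # assumed to be the residual deviance
RESIDUAL_DF = 673


def percent_deviance_explained(null_dev: float, resid_dev: float) -> float:
    """Share of the null deviance accounted for by the model, in percent."""
    return 100.0 * (null_dev - resid_dev) / null_dev


def dispersion(resid_dev: float, resid_df: int) -> float:
    """Residual deviance divided by residual degrees of freedom."""
    return resid_dev / resid_df


pct = percent_deviance_explained(NULL_DEVIANCE, RESIDUAL_DEVIANCE)
disp = dispersion(RESIDUAL_DEVIANCE, RESIDUAL_DF)
print(f"deviance explained: {pct:.2f}%  dispersion: {disp:.6f}")
```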

Table 9 Distribution models: Parameter estimates for variables selected by stepwise logistic regression. For full output see Macdonald (2004).

Black Grouse            | Curlew                  | Golden Plover           | Meadow Pipit            | Ptarmigan
Variable    Coefficient | Variable    Coefficient | Variable    Coefficient | Variable    Coefficient | Variable    Coefficient
(Intercept)    7.481033 | (Intercept)    43.87716 | (Intercept)   -109.6322 | (Intercept)   1438959.0 | (Intercept)   -33.02291
PR             0.188541 | X              0.000023 | COHESION       0.632103 | CONTIG.AM      563.7860 | CA16           0.000572
AI             0.275518 | SHDI           3.163872 | CA15           0.000768 | CONTIG.CV     -0.070284 | MAX            0.010032
CA16          -0.001271 | CONTIG.AM      134.4478 | CA12           0.001395 | TE15           0.000062 | Y              0.000008
SIDI          10.122840 | TE8           -0.000070 | Y              0.000005 | MINORITY      -0.003915 | CONTIG.MN      19.95929
AREA.CV        0.008323 | IJI           -0.043319 | CA8           -0.005016 | ED            -0.083003 | TE2            0.000017
TE1           -0.000074 | TE16          -0.000018 | PRD           -5.718816 | PLADJ         -14395.18 |
Y             -0.000012 | Y             -0.000007 | SHAPE.MD      -9.651047 | TE2            0.000142 |
X              0.000009 | TE15           0.000014 | TE12          -0.000012 | X              0.000005 |
TE12          -0.000006 | MEAN          -0.006403 | FRAC.MD        58.10790 | DIVISION       4.096126 |
CA14          -0.007195 | AI            -1.385507 | MAJORITY       0.008162 | PARA.AM       -899.3146 |
TE11           0.000013 | FRAC.MD       -31.84203 | MEAN          -0.006308 |                         |
TE2            0.000025 | MINORITY      -0.001094 | CA11           0.000466 |                         |
FRAC.AM      -36.654340 | FRAC.CV       -1.081644 | MINORITY       0.001105 |                         |
CA13           0.000628 | SHAPE.SD       2.501858 | CA7           -0.000706 |                         |
TE16           0.000017 | CA13           0.000785 | FRAC.CV        1.163122 |                         |
PARA.MN       -0.024742 | AREA.MD       -0.048671 | LPI            0.061476 |                         |
IJI           -0.032698 | CA16           0.000628 | AREA.RA       -0.000627 |                         |
TE14           0.000035 | TE3            0.000012 | AREA.MD        0.073012 |                         |
CA9           -0.000492 | CA9           -0.000618 | SHDI           2.861849 |                         |
PARA.MD        0.018517 | PAFRAC        -10.85243 | PARA.CV        0.055401 |                         |
MAX           -0.002016 |                         |                         |                         |

Table 10 Frequency models: Parameter estimates for variables selected by stepwise logistic regression. For full output see Macdonald (2004).

Black Grouse            | Curlew                  | Golden Plover           | Meadow Pipit            | Ptarmigan
Variable    Coefficient | Variable    Coefficient | Variable    Coefficient | Variable    Coefficient | Variable    Coefficient
(Intercept)   -5.149640 | (Intercept)    23.68616 | (Intercept)   -104.1109 | (Intercept)   -12.23221 | (Intercept)   -10.94814
TE3            0.000014 | X              0.000006 | CA15           0.000178 | AI             0.140184 | VARIETY        0.007213
MEAN           0.003998 | VARIETY       -0.003428 | Y              0.000005 | CA6           -0.003597 | TE15           0.000015
                        | TE13           0.000009 | COHESION       0.990911 | TE15           0.000007 | AREA.MN        0.007164
                        | CA12           0.000311 |                         | MEAN          -0.001967 |
                        | FRAC.MN       -20.36037 |                         | AREA.MN        0.005340 |
                        | PR             0.127755 |                         | TE6            0.000015 |
                        | PAFRAC        -2.867721 |                         | CA3           -0.000177 |


3.2 Model evaluation


The computed Pearson product-moment correlation for the diversity model
(0.5346674) is shown in Table 7, together with the associated Student’s t statistic
and model p-value. Values of the average residual and of the ratio of over- to
under-predicted squares are shown in Table 13. A plot of observed versus predicted
diversity is presented in the top-left frame of Figure 7.
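For readers wishing to reproduce this kind of evaluation, the following minimal sketch computes a Pearson product-moment correlation between observed and predicted values and the corresponding Student's t statistic, t = r * sqrt((n - 2) / (1 - r^2)). The data shown are invented for illustration and are not values from the study; the original analysis was carried out in S-Plus.

```python
import math


def pearson_r(observed, predicted):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def t_statistic(r, n):
    """Student's t for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical observed and predicted diversity values for five squares.
observed_h = [1.2, 1.9, 2.4, 2.0, 2.8]
predicted_h = [1.5, 1.8, 2.1, 2.2, 2.6]
r = pearson_r(observed_h, predicted_h)
print(f"r = {r:.4f}, t = {t_statistic(r, len(observed_h)):.4f}")
```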

Maps showing predicted diversity, H, and the average percentage of over or under

prediction in H are shown in Figure 3. A map showing locations of over and under

predictions of H compared to those for black grouse frequency is shown in Figure 4.


Evaluation maps of all distribution models are shown in Figures 5 (more specialist

species) and 6 (generalists). The confusion matrices used to produce these are listed

in Table 11. The computed performance assessments, described in Table 6, are

shown in Table 12.


The computed Pearson product-moment correlation values for the frequency models are
shown in Table 7, together with the associated Student’s t statistics and model
p-values. Values of the average residuals and of the ratios of over- to under-
predicted squares are shown in Table 13. Plots of observed versus predicted frequency
are shown in Figure 7.

Maps showing the locations of over- and under-predicted frequencies of curlew and
meadow pipit are shown in Figure 8. These show results of similar quality to those of
the generalist distribution models.

A direct comparison of the two model outputs, probability of presence (distribution)
and frequency, is given for black grouse in Figure 9.

Figure 3 Predicted diversity for all upland bird species (left), and (right) the percentage over (+ve) or under (-ve) prediction of the observed value. Lines represent 50-km GB National Grid squares.
Legend classes, predicted diversity H: 0.87 - 1.20, 1.21 - 1.60, 1.61 - 2.00, 2.01 - 2.40, 2.41 - 2.80, 2.81 - 3.20.
Legend classes, percentage over or under-predicted diversity H: -41 - -25, -24 - 0, 1 - 25, 26 - 50, 51 - 75, 76 - 100.
This map is based on data provided with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown and the Post Office.

Figure 4 Locations of over (+1) or under (-1) predicted diversity for all upland bird species (left), and (right) the locations of over (+1) or under (-1) predicted black grouse frequency. Yellow squares have an observed species frequency of zero. Lines represent 50-km GB National Grid squares.
Panel titles: Over (+ve) or under (-ve) prediction of diversity H; Over (+ve) or under (-ve) prediction of Black Grouse (a zero response is almost always over-predicted). Legend classes: -1, 0, 1.


Table 11 Confusion matrices, for all species and H, showing the number of true

positive, false positive, false negative and true negative predictions of species

presence/absence in each 10-km square. In each matrix the terms are as follows:

a correct positive prediction (top-left), b false positive prediction (top-right),

c false negative prediction (bottom-left), d true negative prediction (bottom-right).

Black Grouse                              Curlew
                Actual +   Actual -                       Actual +   Actual -
Predicted +        32         29          Predicted +       180         34
Predicted -        23        218          Predicted -        27         61

Golden Plover                             Meadow Pipit
                Actual +   Actual -                       Actual +   Actual -
Predicted +       110         40          Predicted +       298          4
Predicted -        40        112          Predicted -         0          0

Ptarmigan
                Actual +   Actual -
Predicted +        36         10
Predicted -        11        245
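The performance measures derived from these matrices can be recomputed directly from the four cell counts. The sketch below uses the standard confusion-matrix formulae with the a, b, c, d labelling of Table 11; it is an illustration rather than the code used in the study, and the function and key names are hypothetical.

```python
def confusion_measures(a, b, c, d):
    """Measures from a 2x2 confusion matrix.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives (the layout of Table 11).
    """
    n = a + b + c + d
    ccr = (a + d) / n  # correct classification rate
    # Agreement expected by chance, from the row and column totals.
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "prevalence": (a + c) / n,
        "correct_classification_rate": ccr,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        # Undefined when either off-diagonal cell is zero.
        "odds_ratio": (a * d) / (b * c) if b * c else None,
        "kappa": (ccr - chance) / (1 - chance),
    }


# Black grouse counts from Table 11.
black_grouse = confusion_measures(a=32, b=29, c=23, d=218)
for name, value in black_grouse.items():
    print(f"{name}: {value:.6f}")
```

For a species present in every evaluation square, such as meadow pipit, the false negative count is zero and the odds ratio is undefined, which the sketch returns as `None`.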

Table 12 Performance measures derived from confusion matrices shown in Table 11. These results specify the degree of success with which each model accurately predicts both presence and absence. Measures utilising all matrix data are the odds ratio and Kappa. Formal definitions are given in Table 6, after Fielding and Bell (1997).

Performance measure                Black Grouse   Curlew      Golden Plover   Meadow Pipit   Ptarmigan   Overall
Prevalence                         0.182119       0.68543     0.496689        0.986755       0.155629    0.501325
Overall diagnostic power           0.817881       0.31457     0.503311        0.013245       0.844371    0.498675
Correct classification rate        0.827815       0.798013    0.735099        0.986755       0.930464    0.855629
Misclassification rate             0.172185       0.201987    0.264901        0.013245       0.069536    0.144371
Sensitivity                        0.581818       0.869565    0.733333        1              0.765957    0.790135
Specificity                        0.882591       0.642105    0.736842        0              0.960784    0.644465
False positive rate                0.117409       0.357895    0.263158        1              0.039216    0.355535
False negative rate                0.418182       0.130435    0.266667        0              0.234043    0.209865
Positive predictive power (PPP)    0.52459        0.841121    0.733333        0.986755       0.782609    0.773682
Negative predictive power (NPP)    0.904564       0.693182    0.736842        N/A            0.957031    2.573846*
Odds-ratio                         10.45877       11.96078    7.7             N/A            80.18182    50.165010*
Kappa K                            0.445519       0.522078    0.470175        0              0.733103    0.434175

* These values are averaged over only four cases.

Figure 5 Evaluation ‘grid’ maps for predicted distribution of black grouse (left) and ptarmigan (right). Green shades indicate correct predictions. Red shades indicate incorrect predictions. Lines represent 50-km GB National Grid squares.
Panel titles: Nature of Black Grouse prediction; Nature of Ptarmigan prediction. Legend classes: False negative, True negative, True positive, False positive.

Figure 6 Evaluation ‘grid’ maps for predicted distribution of curlew (left), golden plover (middle), and meadow pipit (right). Green shades indicate correct predictions. Red shades indicate incorrect predictions. Lines represent 50-km GB National Grid squares.
Panel titles: Nature of Curlew prediction; Nature of Golden Plover prediction; Nature of Meadow Pipit prediction. Legend classes: False negative, True negative, True positive, False positive (meadow pipit: True positive and False positive only).

Table 13 Measures of the degree of over and under prediction in frequency and diversity GLM models. The two leftmost fields are averaged over all cases, including those where the observed value is zero. These are subject to statistical noise (i.e. very rarely will the model predict exactly zero due to the probabilistic nature of the output), therefore values based on only non-zero observed values are given in the remaining columns. These include the average degree of over or under prediction, given as a percentage of the true (observed) value. Care is advisable in interpreting these types of result, but they are given here for completeness.

                 Average     Over/under    Average residual    Over/under prediction   Percentage over/under
                 residual    prediction    (excl. zero-valued  ratio (excl. zero-      prediction (excl. zero-
                             ratio         responses)          valued responses)       valued responses)
Black Grouse    -0.005690    6.023256      0.119217            0.046512                -65.61%
Curlew           0.011149    1.013333      0.049312            0.66                     15.33%
Golden Plover    0.014383    2.431818      0.049465            0.522727                 -2.55%
Meadow Pipit     0.010469    0.650273      0.027447            0.606557                 11.05%
Ptarmigan        0.008295    9.413793      0.017988            0.137931                 -5.61%
H                0.074895    0.495050      0.089555            0.485149                 -2.28%
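The measures in Table 13 can be sketched as follows. The sign conventions are assumptions inferred from the table rather than stated definitions: the residual is taken as observed minus predicted, the ratio as the number of over-predicted squares divided by the number of under-predicted squares, and the percentage as the mean of (predicted - observed) / observed. The example data are invented.

```python
def prediction_bias(observed, predicted, exclude_zero_observed=False):
    """Summarise over- and under-prediction for paired observed/predicted values."""
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if not (exclude_zero_observed and o == 0)]
    residuals = [o - p for o, p in pairs]        # assumed: observed - predicted
    over = sum(1 for r in residuals if r < 0)    # model predicts too much
    under = sum(1 for r in residuals if r > 0)   # model predicts too little
    summary = {
        "average_residual": sum(residuals) / len(residuals),
        "over_under_ratio": over / under if under else None,
    }
    if exclude_zero_observed:
        # Only defined once zero-valued observations have been removed.
        pct = [100.0 * (p - o) / o for o, p in pairs]
        summary["percent_over_under"] = sum(pct) / len(pct)
    return summary


# Hypothetical frequencies for four squares, the first with no birds observed.
observed = [0.0, 0.2, 0.5, 0.4]
predicted = [0.1, 0.1, 0.6, 0.3]
all_cases = prediction_bias(observed, predicted)
non_zero = prediction_bias(observed, predicted, exclude_zero_observed=True)
print(all_cases)
print(non_zero)
```

Because a probabilistic model almost never predicts exactly zero, every zero-valued observation counts as an over-prediction, which is why the ratio changes so sharply once those squares are excluded.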


Figure 7 Correlation plots of observed versus predicted diversity (top-left) and frequency, for

each of the five species modelled. Simple linear regression line of best fit and correlation

coefficient, R2, added for comparison between models.

Panels (x-axis: Predicted; y-axis: Observed), with fitted line and R2:
Observed vs. Predicted H: y = 0.7091x + 0.7038, R2 = 0.3539
Frequency Black Grouse: y = 0.3716x + 0.0133, R2 = 0.0315
Frequency Curlew: y = 0.9232x + 0.0424, R2 = 0.3859
Frequency Golden Plover: y = 1.0007x + 0.0143, R2 = 0.3566
Frequency Ptarmigan: y = 1.057x + 0.0072, R2 = 0.2075
Frequency Meadow Pipit: y = 1.0645x - 0.0372, R2 = 0.4334

Figure 8 Locations of over (+1) or under (-1) predicted frequency for upland generalists, curlew (left), and meadow pipit (right). Yellow squares have an observed species frequency of zero. Lines represent 50-km GB National Grid squares.
Panel titles: Over (+ve) or under (-ve) prediction of Curlew frequency; Over (+ve) or under (-ve) prediction of Meadow Pipit frequency. Legend classes: -1, 0, 1.

Figure 9 Predicted distribution (probability of presence) of black grouse (left), and predicted frequency of black grouse (right). Lines represent 50-km GB National Grid squares.
Legend classes, predicted distribution of Black Grouse: 0.00 - 0.20, 0.21 - 0.40, 0.41 - 0.60, 0.61 - 0.80, 0.81 - 1.00.
Legend classes, predicted frequency (relative abundance) of Black Grouse: 0.01 - 0.05, 0.06 - 0.09, 0.10 - 0.14, 0.15 - 0.18, 0.19 - 0.23.


4 Discussion

4.1 Ecological interpretation

A description of species characteristics is given by Macdonald (2004), which readers

unfamiliar with the species may find helpful in reviewing the following text.

4.1.1 Mathematical vs. biological selection

This study seeks to create species-habitat models for several upland bird species, and

also for overall diversity, at a national scale. The modelling process, however,

involves the use of theoretical statistical models rooted firmly in mathematics.

Stepwise model fitting produces a mathematically correct result, but is likely to

include variables other than the most biologically intuitive. Correlation may exist

between explanatory variables (Hill, 2001). This does not necessarily mean a poor
choice of submitted parameters for, as Hill (2001) points out, a species’ predilection
for transitional or ‘gradational’ landscapes (McGarigal and Cushman, 2002) can lead
to variable equivalence. The stepwise procedure simply selects the numerically
strongest variable. Interpretation must therefore be made in light of real-world ecology
rather than purely statistical criteria.

4.1.2 Species predictors

The models include intuitively sensible variables. Blanket bog and peatland, and

heather moor, are host to numerous species (Thompson et al., 1995; Thirgood, 2000).

Selection of a good set of metrics for H (Table 8, and see 3.1.2.1) means the model is

not dominated by any one species and is truly representative of overall diversity.

Cohesive and contiguous landscapes are seen to be important generally. Heather

moor is considered home to unusually high numbers of curlew, golden plover and

more recently lowland species such as the meadow pipit (Thirgood et al., 2000;
Thompson et al., 1995). Only golden plovers appear to respond positively to shape

metrics. Again though, particularly for the frequency model, cohesion was selected.


British curlews in the north and west may face pressure from modern agricultural

practices (Sterry, 1995). The predicted eastward association (Table 9, also Figure 6)

may reflect preference for less intensive grazing, or perhaps upland areas near to less

intensive arable farmland seen in eastern Scotland. This hypothesis would require
verification by other means, e.g. local survey; however, it is echoed by a predicted
preference for patchy landscapes (higher AI and PR). This suggests models may be

useful for predicting the geographical variation in species and even possible human

activity-species interactions.

Selection of young plantation as a negative predictor of black grouse distribution

requires consideration. The species favours herbaceous growth that often appears

after fire disturbance (Moorland Working Group, 2002), and is known to exploit

similar plant growth in young plantations (see Baines, 1995; Hill, 2001; Hack, 2002;
and Hancock et al., 1999). Looking further, black grouse are predicted to respond

negatively to montane vegetation, and positively to moorland edge, and coarse

grassland found in sub-montane zones (Macaulay Land Use Research Institute, 1989).

Conifer plantations are commonly associated with mid-range elevations, and logically

do not occur at high altitudes. The model therefore indirectly predicts that black

grouse are associated with conifer plantations as expected. This more complex

picture may possibly be due to the sparseness of data for black grouse.

4.2 Prediction Success


The diversity model produces good results with a slight average under-prediction of

-2.28% (Table 13). This is echoed in the graphical plots in Figures 3 and 4 (left). The

correlation plot of Figure 7 (top-left) shows clustering along the line of best fit which

appears within the limits of statistical noise. The model thus appears, graphically, to

be a credible conservation tool.


All models show reasonable results in terms of explained deviance and p-values.

Evaluation of distribution model success has, however, been standardised by the set of

measures provided by Fielding & Bell (1997). The models have highly impressive


correct classification rates, all above 70%, with an average overall prediction success

rate of 86%. Measures of the prediction success for presence and absence (sensitivity

and specificity) are reasonably close for most models, and near identical for golden

plover. Meadow pipits are present throughout the dataset, so these measures take their
extreme values for this species, and some measures are unavailable because the species
is present everywhere. This is a significant result, meaning prediction of

widespread species (where intuitive selection of model controls is difficult) is indeed

possible. GIS-statistical modelling appears to be applicable to many species types,

and hence ‘total’ diversity.

Measures utilising all confusion matrix values are the odds ratio and Cohen’s Kappa

index. The odds ratio for ptarmigan is high; however, this simply reflects the low
number of incorrect predictions. Landis and Koch (1977) proposed a suitable scale of
classes for Kappa (also Fielding and Bell, 1997). Most values were classed as
moderate, 0.41 < K < 0.60, with ptarmigan classed as substantial, 0.61 < K < 0.80. The
Kappa value for meadow pipit is not meaningful, owing to the species’ widespread nature.
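The Landis and Koch (1977) scale referred to above can be written as a small lookup. The band boundaries below follow the commonly quoted form of that scale; the function name is hypothetical, and the Kappa values are those listed in Table 12.

```python
def landis_koch_class(kappa):
    """Return the Landis and Koch (1977) agreement class for a Kappa value."""
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1")


# Kappa values from Table 12 (meadow pipit omitted, see text).
kappa_values = {
    "Black Grouse": 0.445519,
    "Curlew": 0.522078,
    "Golden Plover": 0.470175,
    "Ptarmigan": 0.733103,
}
for species, k in kappa_values.items():
    print(f"{species}: K = {k:.3f} ({landis_koch_class(k)})")
```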

Figures 5 and 6 illustrate the variables selected (listed in Table 9). Black grouse and

ptarmigan (Figure 5) reflect sparse data, with the eastward trend of curlews clear in

Figure 6 (left map). The presence of golden plover in the north-west highlands may

reflect the predicted slight positive correlation with shape complexity (Table 9).


The majority of frequency models produced reasonable values of explained deviance,

and Pearson product-moment correlation, both with low p-values. The degree of

scatter in correlation plots is much more pronounced (Figure 7). This reflects the

often more sparse nature of the frequency data, likely due to low numbers of tetrads in

some squares. Figures 4 and 9 show this feature of the frequency data. Clearly, the

quality of results (for all models) is entirely dependent upon that of input data.


4.3 Limitations and potential error sources

4.3.1 Biotic errors

The relatively high success rates (Tables 7-9, Figures 2-5) suggest that sufficient

environmental variables were considered. Biotic errors, due to inadequate models of

species ecology, are thus kept small.

The problems of zero inflated data, a form of overdispersion, have been explored by

Barry and Welsh (2002). Testing for overdispersion (residual deviance divided by

residual degrees of freedom) does not apply to binary response data; however, we see

that frequency-derived models are actually underdispersed (Table 7), indicating

negatively correlated responses (Wilson and Hardy, 2002). This may also be related

to the lack of binomial weights (surveyed tetrads) and thus a lack of model fit.

Methods to correct for underdispersion have not received significant attention in

statistical literature. In most cases, however, the phenomenon is not thought to pose

overly significant problems (Wilson and Hardy, 2002).

4.3.2 Algorithmic errors

4.3.2.1 Randomised data partitioning and spatial autocorrelation

Autocorrelation in samples is normally to be avoided since this can lead to

autocorrelated residuals or clustering of over/under-predicted cases. In this study,

however, there are a number of points worth considering. Firstly, the data extent,

bounded by the outline of Scotland, inevitably increases the likelihood of

autocorrelation, especially in remote islands. The size of the build set (70%) within

this constraint also means most squares will be neighboured by other survey squares,

regardless of what pattern of squares we remove for evaluation. Additionally,

autocorrelation between adjacent squares forms a key part of the ecological systems

studied, since species are often seen to favour contiguous habitats. In such cases, a

small amount of artificial pattern may help to distinguish clustering due to natural

processes from that due to algorithmic error, in our predictions.
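A randomised partition of survey squares into build and evaluation sets, with the 70% build fraction mentioned above, might be sketched as follows. The square identifiers, the total of 1000 squares and the fixed seed are hypothetical; the study's actual partitioning procedure is not reproduced here.

```python
import random


def partition_squares(square_ids, build_fraction=0.7, seed=0):
    """Randomly split square identifiers into build and evaluation sets."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(square_ids)
    rng.shuffle(shuffled)
    n_build = round(build_fraction * len(shuffled))
    return shuffled[:n_build], shuffled[n_build:]


# Hypothetical identifiers standing in for 10-km survey squares.
squares = [f"SQ{i:04d}" for i in range(1000)]
build, evaluation = partition_squares(squares)
print(len(build), len(evaluation))
```

A purely random split of this kind takes no account of location, so neighbouring squares routinely fall on both sides of the partition, which is the source of the autocorrelation concern discussed above.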

Examination of Figures 4 and 8 shows a small degree of clustered predictions. The

results for curlew are the least clustered with little autocorrelation of over and under


predictions. Results for diversity, H, (Figure 4, left) and for golden plover show a few

clusters e.g. Morayshire coast (linear pattern), west of Inverness/Black Isle, and in

Perthshire. These are, however, within or near to significant upland areas. The

clusters do vary between species and do not strongly indicate any undue bias or

systematic error. The patterns are assumed to reflect the local land cover within those

10-km squares.

4.3.3 Boundary effects: land cover polygons

The use of 10-km squares as a fixed sampling unit may introduce errors through

imposing unnatural boundaries upon land cover polygons where these are split into

multiple patches. This was considered preferable to counting whole polygon area,

which would result in repeat counting where polygons overlap multiple squares. The

choice of a 10-km unit was considered sufficiently large to hold a number of home

ranges for each species examined. The Fragstats landscape pattern analysis package

also allows for exclusion of unnatural boundaries imposed by the landscape (map)

boundary.

Finally, Hill (2001) notes that imprecision in the interpretation of LCS88 source

photography may give rise to mixed habitat within polygons, or particular land cover

in areas where not expected. This may therefore contribute to small numbers of

spurious or inaccurate predictions. This problem may be increased in some cases due

to the land cover selection methodology. Figure 1 shows a large number of coastal

squares present in the data, with low levels of genuine upland type habitat. This may

lead to further small errors in models.


5 Conclusions

5.1 Spatial controls of upland diversity

Statistical modelling reveals that a varied landscape of peat and heather patches, with

the presence of grass and conifer edge habitats, provides the conditions necessary for

increased overall diversity.

Modelling of the controls upon individual species’ distribution and frequency (or

relative abundance) has shown most species to favour larger, contiguous or cohesive,

patch areas. Requirements for suitable edge habitat, often with coniferous
plantations, are nevertheless significant. Potential conflict arises where patch shape complexity, or

proportion of ‘edge’, is increased. Of the five species examined, only one, golden

plover Pluvialis apricaria, exhibited a preference for greater shape complexity, as

chosen by a stepwise statistical selection process. A cohesive, compacted, habitat

was, however, also selected as a control on this species’ distribution.

5.2 Application of models as conservation/land management tools

The models provide land managers and conservationists with the first steps towards

mapping the ‘ideal landscape configuration’ for diversity. Statistical measures can be

assessed comparatively with maps of the existing landscape to infer the optimum

spatial structures that maximise diversity.

As Hill (2001) notes, the current power lies in ‘real-world’ application, where

proposed land developments (already in motion) can be substituted into models in

place of land cover data. The addition of land pattern analysis in this study, while

embryonic, adds further power to this process, allowing us not merely to see the

effects on one localised species population, but to draw inferences regarding the

consequences for all bird life.

This process is, however, still somewhat limited. Real power would come from the

ability to visualise the ideal landscape, produced from specified statistics, rather than

repeatedly searching through a set of arbitrarily defined landscapes for the best

example currently available.


5.3 Future work

The study could be extended in several ways. Firstly, a more detailed investigation of

all landscape metrics, including possible non-linear interactions or correlation, could

be undertaken. Refinement of models in terms of the numbers of metrics could

simplify the interpretation process where the more unusual variables are chosen.

Secondly, landscape patterns could be measured on a range of scales, e.g. from

species home ranges up to multiple 10-km squares. Further to this, use could be made

of class or patch level metrics. These are not available for every 10-km square due to
each square’s individual geography; however, some additional metrics exist only at
class and patch levels and may prove worthy of investigation. Alternatively, a similar

analysis of diversity as in this study, but grouped by smaller geographical areas

similar to the approach of Hill (2001) may provide interesting information on regional

variation.

The most exciting opportunity is the potential development of maps, graphically

illustrating possible optimal landscapes for maximising species diversity. These could

be achieved using iterative simulation techniques, e.g. Monte Carlo random

landscapes, incorporating suitable land pattern metrics as predicted by GIS-powered

statistical modelling of bird diversity.
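
A minimal sketch of such a Monte Carlo search is given below. It rests on several assumptions that are not part of this study: the landscape is reduced to a binary habitat grid, the pattern metric is a simplified edge density (not the FRAGSTATS definition), and the target value of 0.45 is a placeholder for whatever value the statistical models would indicate.

```python
import random


def random_landscape(size, habitat_frac, rng):
    """Binary grid: 1 = habitat cell, 0 = non-habitat cell."""
    return [
        [1 if rng.random() < habitat_frac else 0 for _ in range(size)]
        for _ in range(size)
    ]


def edge_density(grid):
    """Fraction of 4-neighbour cell pairs whose classes differ."""
    n = len(grid)
    edges = pairs = 0
    for r in range(n):
        for c in range(n):
            if c + 1 < n:  # neighbour to the right
                pairs += 1
                edges += grid[r][c] != grid[r][c + 1]
            if r + 1 < n:  # neighbour below
                pairs += 1
                edges += grid[r][c] != grid[r + 1][c]
    return edges / pairs


def best_landscape(target, n_draws=500, size=20, habitat_frac=0.5, seed=0):
    """Draw random landscapes and keep the one closest to the target metric."""
    rng = random.Random(seed)
    best, best_gap = None, float("inf")
    for _ in range(n_draws):
        grid = random_landscape(size, habitat_frac, rng)
        gap = abs(edge_density(grid) - target)
        if gap < best_gap:
            best, best_gap = grid, gap
    return best, best_gap


# Hypothetical target metric value, standing in for a model-derived optimum.
grid, gap = best_landscape(target=0.45)
print(f"Closest landscape differs from target by {gap:.4f}")
```

The retained grid could then be drawn as a map. A fuller implementation would need to match several metrics at once and use landscape generators that produce realistic patch structure rather than independent random cells.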

6 Cited References

Baines, D. (1995) Habitat requirements of Black Grouse. In Proceedings of the Sixth

International Grouse Symposium (ed. D Jenkins), pp. 147-150. World Pheasant

Association/Istituto Nazionale per la Fauna Selvatica, Ozzano dell’Emilia.

Barry, S.C. and Welsh, A.H. (2002) Generalized additive modelling and zero inflated

data. Ecological Modelling, 157, 179-188.

Begon, M., Harper, J.L., and Townsend, C.R. (1990) Ecology: Individuals,

Populations and Communities (2nd edn). Blackwell Scientific Publications,

Massachusetts.

British Trust for Ornithology (2004) Species Codes. PDF document available from

BTO web site, www.bto.org [Accessed 03 September 2004].

Cramp, S. & Simmons, K.E.L. (eds) (1977) Handbook of the Birds of Europe, the

Middle East and North Africa. Oxford University Press, Oxford.

Crawley, M.J. (2002) Statistical Computing: An Introduction to Data Analysis using

S-Plus. Wiley, Chichester.

ESRI (2000) ArcInfo 8: A New GIS for the New Millennium. Environmental Systems

Research Institute, Inc., Redlands, California.

ESRI (2004) ArcGIS Desktop. Environmental Systems Research Institute, Inc.,

Redlands, California.

Fielding, A.H. and Bell, J.F. (1997) A review of methods for the assessment of

prediction errors in conservation presence/absence models. Environmental

Conservation, 21, 38-49.

Gibbons, D.W., Reid, J.B., and Chapman, R.A. (1993) The New Atlas of Breeding

Birds in Britain and Ireland: 1988-1991. T & AD Poyser, London.

Hancock, M. et. al. (1999) Status of male Black Grouse Tetrao tetrix in Britain in

1995-1996. Bird Study, 46, 1-15.

Landis, J.R. and Koch, G.C. (1977) The measurement of observer agreement for

categorical data. Biometrics, 33, 159-174.

Luscher, L.M. (2001) Oracle 9i Database Concepts – Release 1 (9.0.1). Oracle

Corporation, Redwood City, California.

Macaulay Land Use Research Institute (1989). The Land Cover of Scotland by Air

Photo Interpretation. Contract specification for the Scottish Development

Department.

Macdonald, O.N. (2004) Modelling the spatial arrangement of upland bird habitat in

Scotland: Technical Report. MSc thesis, University of Edinburgh.

Malanson, G.P. and Cramer, B.E. (1999) Landscape heterogeneity, connectivity, and

critical landscapes for conservation. Diversity and Distributions, 5, 27-39.

McCullagh, P. and Nelder, J. (1983) Generalized Linear Models. Chapman and Hall,

London.

McGarigal, K. (2002) Landscape pattern metrics. In: A. H. El-Shaarawi and W. W.

Piegorsch, eds. Encyclopaedia of Environmetrics Volume 2: 1135-1142. John Wiley

& Sons, Sussex.

McGarigal, K. and Cushman, S.A. (2002) The Gradient Concept of Landscape

Structure: Or, Why are there so many patches? Landscape Ecology. In Press.

Available online at www.umass.edu/landeco/pubs/pubs.html. [Accessed 26 August

2004]

McGarigal, K., et. al. (2002) FRAGSTATS: (Version 3) Spatial Pattern Analysis

Program for Categorical Maps. Computer software program produced by the authors

at the University of Massachusetts, Amherst. Available online at www.umass.edu/

landeco/research/fragstats/fragstats.html [Accessed 26 August 2004].

McGarigal, K. and Marks, B.J. (1995) FRAGSTATS: (Version 2) Spatial pattern

analysis program for quantifying landscape structure. General Technical Report

PNW-GTR-351. USDA Forest Service, Pacific Northwest Research Station, Portland,

Oregon.

Moorland Working Group (2002) Scotland’s Moorland: The Nature of Change.

Battleby: Scottish Natural Heritage.

Nelder, J.A. and Wedderburn, R.W.M. (1972) Generalized Linear Models. Journal of

the Royal Statistical Society, Series A 135, 370-384.

Pearce-Higgins, J.W. and Yalden, D. W. (2003) Variation in the use of pasture by

breeding European Golden Plovers Pluvialis apricaria in relation to prey availability.

Ibis, 145, 365-381.

S-PLUS 6 for Windows User’s Guide (2001) Insightful Corporation, Seattle,

Washington. Available online at www.insightful.com [Accessed 26 August 2004].

Sterry, P. ed. (1995) Explore Britain’s Birds. AA Publishing, Basingstoke.

Stillman, R.A. and Brown, A.F. (1998) Pattern in the distribution of Britain’s upland

breeding birds. Journal of Biogeography, 25, 73-82.

Tharme et. al. (2001) The effect of management for red grouse shooting on

the population density of breeding birds on heather-dominated moorland. Journal of

Applied Ecology, 38, 439–457.

Thirgood, S. et. al. (2000) Conservation conflicts and management solutions.

Conservation Biology, 14(1), 95-104.

Thompson, D.B.A. et. al. (1995) Upland heather moorland in Great Britain: A review

of international importance, vegetation change and some objectives for nature

conservation. Biological Conservation, 71, 163-178.

Tucker, N.I.J. (2001) Linkage restoration: Interpreting fragmentation theory for the

design of a rainforest linkage in the humid Wet Tropics of north-eastern Queensland.

Ecological Management and Restoration, 1(1), 35-41.

Wilson, K. and Hardy, I.C.W. (2002) Sex ratios. Concepts and research methods.

Cambridge University Press, Cambridge.
