bs bs banner research stacking species distribution models

14
RESEARCH PAPER Stacking species distribution models and adjusting bias by linking them to macroecological models Justin M. Calabrese 1 *, Grégoire Certain 2 , Casper Kraan 3 and Carsten F. Dormann 4 1 Conservation Ecology Center, Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Rd., Front Royal, VA 22630, USA, 2 Institute of Marine Research, 9019 Tromsø, Norway, 3 National Institute of Water and Atmospheric Research, Hamilton 3216, New Zealand, 4 Biometry and Environmental System Analysis, University of Freiburg, 79104 Freiburg, Germany ABSTRACT Aim Species distribution models (SDMs) are common tools in biogeography and conservation ecology. It has been repeatedly claimed that aggregated (stacked) SDMs (S-SDMs) will overestimate species richness. One recently suggested solution to this problem is to use macroecological models of species richness to constrain S-SDMs. Here, we examine current practice in the development of S-SDMs to identify methodological problems, provide tools to overcome these issues, and quantify the performance of correctly stacked S-SDMs alongside macroecological models. Locations Barents Sea, Europe and Dutch Wadden Sea. Methods We present formal mathematical arguments demonstrating how S-SDMs should and should not be stacked. We then compare the performance of macroecological models and correctly stacked S-SDMs on the same data to deter- mine if the former can be used to constrain the latter. Next, we develop a maximum-likelihood approach to adjusting S-SDMs and discuss how it could potentially be used in combination with macroecological models. Finally, we use this tool to quantify how S-SDMs deviate from observed richness in four very different case studies. Results We demonstrate that stacking methods based on thresholding site-level occurrence probabilities will almost always be biased, and that these biases will tend toward systematic overprediction of richness. Next, we show that correctly stacked S-SDMs perform very similarly to macroecological models in that they both have a tendency to overpredict richness in species-poor sites and underpredict it in species-rich sites. Main conclusions Our results suggest that the perception that S-SDMs consis- tently overpredict richness is driven largely by incorrect stacking methods. With these biases removed, S-SDMs perform similarly to macroecological models, sug- gesting that combining the two model classes will not offer much improvement. However, if situations where coupling S-SDMs and macroecological models would be beneficial are subsequently identified, the tools we develop would facilitate such a synthesis. Keywords Boosted regression trees, Kumaraswamy distribution, macroecological models, maximum likelihood, poisson binomial distribution, richness regression models, species richness, stacked species distribution models. *Correspondence: Justin M. Calabrese, Conservation Ecology Center, Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Road, Front Royal, VA 22630, USA. E-mail: [email protected] Global Ecology and Biogeography, (Global Ecol. Biogeogr.) (2014) 23, 99–112 © 2013 John Wiley & Sons Ltd DOI: 10.1111/geb.12102 http://wileyonlinelibrary.com/journal/geb 99

Upload: others

Post on 10-Dec-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

RESEARCHPAPER

Stacking species distribution models andadjusting bias by linking them tomacroecological modelsJustin M. Calabrese1*, Grégoire Certain2, Casper Kraan3 and

Carsten F. Dormann4

1Conservation Ecology Center, Smithsonian

Conservation Biology Institute, National

Zoological Park, 1500 Remount Rd., Front

Royal, VA 22630, USA, 2Institute of Marine

Research, 9019 Tromsø, Norway, 3National

Institute of Water and Atmospheric Research,

Hamilton 3216, New Zealand, 4Biometry and

Environmental System Analysis, University of

Freiburg, 79104 Freiburg, Germany

ABSTRACT

Aim Species distribution models (SDMs) are common tools in biogeography andconservation ecology. It has been repeatedly claimed that aggregated (stacked)SDMs (S-SDMs) will overestimate species richness. One recently suggested solutionto this problem is to use macroecological models of species richness to constrainS-SDMs. Here, we examine current practice in the development of S-SDMs toidentify methodological problems, provide tools to overcome these issues, andquantify the performance of correctly stacked S-SDMs alongside macroecologicalmodels.

Locations Barents Sea, Europe and Dutch Wadden Sea.

Methods We present formal mathematical arguments demonstrating howS-SDMs should and should not be stacked. We then compare the performance ofmacroecological models and correctly stacked S-SDMs on the same data to deter-mine if the former can be used to constrain the latter. Next, we develop amaximum-likelihood approach to adjusting S-SDMs and discuss how it couldpotentially be used in combination with macroecological models. Finally, we usethis tool to quantify how S-SDMs deviate from observed richness in four verydifferent case studies.

Results We demonstrate that stacking methods based on thresholding site-leveloccurrence probabilities will almost always be biased, and that these biases will tendtoward systematic overprediction of richness. Next, we show that correctly stackedS-SDMs perform very similarly to macroecological models in that they both have atendency to overpredict richness in species-poor sites and underpredict it inspecies-rich sites.

Main conclusions Our results suggest that the perception that S-SDMs consis-tently overpredict richness is driven largely by incorrect stacking methods. Withthese biases removed, S-SDMs perform similarly to macroecological models, sug-gesting that combining the two model classes will not offer much improvement.However, if situations where coupling S-SDMs and macroecological models wouldbe beneficial are subsequently identified, the tools we develop would facilitate sucha synthesis.

KeywordsBoosted regression trees, Kumaraswamy distribution, macroecological models,maximum likelihood, poisson binomial distribution, richness regressionmodels, species richness, stacked species distribution models.

*Correspondence: Justin M. Calabrese,Conservation Ecology Center, SmithsonianConservation Biology Institute, NationalZoological Park, 1500 Remount Road, FrontRoyal, VA 22630, USA.E-mail: [email protected]

bs_bs_banner

Global Ecology and Biogeography, (Global Ecol. Biogeogr.) (2014) 23, 99–112

© 2013 John Wiley & Sons Ltd DOI: 10.1111/geb.12102http://wileyonlinelibrary.com/journal/geb 99

INTRODUCTION

Species distribution models (SDMs) have become standard

tools in biogeography and conservation biology, both for under-

standing the factors that affect species’ geographical ranges and

for predicting the response of species to global change (Scott

et al., 2002; Franklin, 2009; Peterson et al., 2011). The increasing

availability of large-scale, multispecies data sets, coupled with

advances in modelling techniques and software, has resulted in

the development of SDMs for all or many constituent species of

some communities (Thuiller, 2003; Elith et al., 2006). When

community-wide SDM coverage exists, it is natural to attempt

to combine the species-level models into a descriptor of

community-level properties such as species richness. The appeal

of this approach is that it may facilitate an easy assessment of

site-level biodiversity from knowledge of a handful of readily

measurable – or perhaps already available – covariates. The

process by which SDMs are combined into community-level

models is often referred to as ‘stacking’ (e.g. Ferrier & Guisan,

2006), so these models of species richness are also called stacked

species distribution models (S-SDMs, e.g. Mateo et al., 2012).

Despite their potential, current evidence suggests that individual

SDMs do not aggregate well into an unbiased description of

species richness (Guisan & Rahbek, 2011). Specifically, S-SDMs

are thought to systematically overpredict site-level richness

(Guisan & Rahbek, 2011; Hortal et al., 2012).

To remedy this situation, Guisan & Rahbek (2011) proposed

an integrated framework, SESAM, that starts with site-level rich-

ness and species composition predictions from S-SDMs, and

then progressively refines these predictions, by applying macr-

oecological models (MEMs), dispersal filters and ecological

assembly rules. They suggest that in addition to delivering better

predictions of richness than current S-SDMs, this approach

would also yield improved prediction of species composition via

the collection of refined site-level SDM occurrence probabilities.

The core assumptions upon which SESAM rests are that: (1)

S-SDMs consistently overpredict richness relative to MEMs; and

(2) the reason for this discrepancy is that S-SDMs lack biotic

filters such as dispersal limitation and ecological assembly rules

(Guisan & Rahbek, 2011; Hortal et al., 2012). Although SESAM

represents a bold attempt at synthesis and integration, we feel

that its foundational assumptions need to be examined before it

can be accepted, and several technical hurdles must be overcome

before it could be implemented.

First, before pursuing biotic explanations for the frequently

observed discrepancy between S-SDMs and MEMs, which

might require significant new research effort, simple statistical

artifacts introduced by the way S-SDMs are built should be

ruled out. A key candidate for such an artifactual cause is the

currently common practice of applying thresholds to SDM-

predicted occurrence probabilities to produce binary presence/

absence predictions. These presence/absence predictions are

then summed for each site to ‘stack’ the S-SDM. We are aware of

no formal, theoretical justification for this practice, and the large

array of different ad hoc thresholding schemes currently prolif-

erating in the literature (Table 1) suggests there is confusion

around this issue. Furthermore, the few studies that have com-

pared S-SDMs stacked both with and without thresholds show

dramatic differences between the two approaches (Aranda &

Lobo, 2011; Dubuis et al., 2011). A deeper exploration of the

process of stacking individual SDMs into an S-SDM is therefore

clearly warranted.

Second, the assumption that S-SDMs overpredict richness

relative MEMs needs to be evaluated. MEMs have contributed

substantially to our understanding of large-scale ecology and

biodiversity, and have been valuable research tools (Brown,

1995; Gaston, 2000; Gaston & Blackburn, 2000; Hawkins &

Diniz-Filho, 2004; Kerr et al., 2007; Algar et al., 2009). We might

therefore expect MEMs to predict site-level richness well, and

probably better and more consistently than S-SDMs, which have

substantial problems dealing with rare species (Graham &

Hijmans, 2006; Pineda & Lobo, 2009; Guisan & Rahbek, 2011).

Although this assumption is a core premise of the SESAM

framework, we are not aware of any systematic studies of how

accurately MEMs and S-SDMs predict observed species richness

on the same data sets. Given that MEMs and S-SDMs tend to use

the same predictor variables, it could be that S-SDMs, when

correctly stacked, perform similarly to MEMs.

Third, we explore how correctly-stacked S-SDMs deviate

from observed richness. Understanding how SDM occurrence

probabilities need to be adjusted to bring S-SDM predictions in

line with observed richness should allows us to examine the

specific ways in which correctly stacked S-SDMs are deficient,

and should also suggest how they may be improved in the

future. Such adjustments would also be necessary to use MEMs

to constrain S-SDMs, as suggested by Guisan & Rahbek (2011),

should further evidence suggest that such an approach is war-

ranted, but the methods to do this are currently lacking.

Here, we bring together four diverse data sets to explore how

S-SDMs perform relative to MEMs. First, we present the relevant

probability theory to establish the distribution of a site-level

richness prediction under an S-SDM and the correct (and exact)

way to stack individual SDMs into such a prediction. We review

the S-SDM literature to document the different stacking

approaches that have been used, most of which are based on

thresholding site-level occurrence probabilities, and then

present a mathematical argument to demonstrate why thresh-

olding will generally be incorrect. Next, we examine the accuracy

of the species richness predictions of both correctly stacked

S-SDMs and MEMs side-by-side on our four data sets. Finally,

we develop a maximum-likelihood approach to adjusting

S-SDM occurrence probabilities given some estimate or predic-

tion of site-level species richness, and then apply this tool to our

data sets to quantify the ways in which correctly-stacked

S-SDMs deviate from observed site-level richness.

Case studies

The four different data sets we use to illustrate our points

throughout the paper were chosen to represent contrasting

properties, in an attempt to maximize the generality of any

pattern we might detect.

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd100

The ‘Wadden Sea macrozoobenthos’ data (Kraan et al., 2007,

2009, 2010) were collected in 2005 and represent 2549 sites in

which the abundance of 24 organisms (polychaetes, bivalves and

crustaceans) was sampled to the lowest taxonomic level. These

sites are arranged in a fixed grid with 250 m intervals between

sites, encompassing 225 km2 of soft-sediment flats which are

exposed during low tide. Two environmental predictors

(median grain size and elevation relative to the Dutch ordnance

datum) were shown to be consistent and good predictors for

these species (Kraan et al., 2013).

‘EU forest trees’ is a gridded (20 km ¥ 20 km) version of the

distribution of forest trees in the member states of the European

Union (Köble & Seufert, 2001). This data set of 111 species

represents a satellite-based classification trained on national

forest inventories, and is restricted to forested area, so does not

represent the full range of environmental conditions under

which these species may occur outside forests (e.g. in parks and

gardens). As predictor variables, we used temperature seasonal-

ity (standard deviation ¥ 100), maximum temperature of the

warmest month, mean temperature of the coldest quarter,

annual precipitation, precipitation of the driest quarter, and

precipitation of the warmest quarter (these are WorldClim’s

Bioclim variables 4, 5, 11, 12, 17, 18; Hijmans et al., 2005).

Additionally, we used growing degree days (http://www.sage

.wisc.edu/atlas/maps.php?datasetid=31&includerelatedlinks

=1&dataset=31) and water balance (http://edit.csic.es/Climate

.html). These variables were not problematically collinear

(|r| < 0.7; Dormann et al., 2013).

The ‘Barents Sea trawls’ data set was collected each summer

(August–September) from 2004 to 2008, following a regular

sampling scheme of stations separated by 30–40 km (Jakobsen &

Ozhigin, 2011; Johannesen et al., 2012). The standard towing

time was 15 min at 3 knots, equivalent to a towing distance of

0.75 nautical miles. Data were collected for 81 taxonomic groups

of fishes (mainly species) and 29 taxonomic groups of inverte-

brates (Johannesen et al., 2012). The community is restricted to

the 79 species that were frequent enough for an SDM to be

implemented. We used 25 environmental predictors including:

bottom depth; bottom depth gradient; bottom temperature in

August–October averaged over the last 30 years; year bottom

temperature anomaly; year bottom temperature gradient;

surface temperature in August–October averaged over the last

Table 1 Non-exhaustive chronology of studies stacking SDMs to yield richness estimates (to July 2012). (For explanation ofgoodness-of-classification measures, see Fielding & Bell, 1997).

Study Method* Organism group Comments†

Skov & Borchsenius, 1997 Eqn 2 Ecuadorian palms

Guisan et al., 1999 T k Nevada mountain plants S-SDM compared to CCA, not richness

Erasmus et al., 2002 T CCR Various taxa, South Africa Birds, mammals, reptiles, butterflies

Ferrier et al., 2002 Eqn 2 various NZ taxa

Lehmann et al., 2002 Eqn 2 NZ ferns

Loiselle et al., 2003 T (0.5, 0.85, 0.95) Brazilian birds

Peppler-Lisbach & Schröder, 2004 T k Grassland plants Community composition, not richness

Graham & Hijmans, 2006 T k Amphibians and reptiles, California MaxEnt; 159 species

Loiselle et al., 2007 T (ª 0.01) Plant richness, Bolivia/Ecuador MaxEnt; one threshold for all species

Algar et al., 2009 T LS Canadian butterflies MaxEnt; S-SDM & MEM extremely similar

Pineda & Lobo, 2009 T (21 different) Mexican amphibians MaxEnt; same threshold for all species

Raes et al., 2009 T (0.1) Plant richness, Borneo MaxEnt; same threshold for all species

Randin et al., 2009 T CCR Swiss alpine plants Biomod

Barbet-Massin et al., 2010 T SES Iberian/African birds Biomod

Buisson et al., 2010 T TSS French stream fish Biomod

Aranda & Lobo, 2011 Eqn 2, T (various) Plants, Tenerife, Spain MaxEnt; same threshold for all species

Dubuis et al., 2011 Eqn 2, T TSS, B Swiss plants Huge different between Eqn 2 and T

Fitzpatrick et al., 2011 T TSS North American ants

Kaschner et al., 2011 T (various) World marine mammals Same threshold for all species

Mateo et al., 2012 T (min. commission) Two Andean plant groups Ensemble; max. commission error 0.05

Rondinini et al., 2011 T (3 unknown values) Global mammals

Pineda & Lobo, 2012 T (21 different) Mexican amphibians MaxEnt; same threshold for all species

Schmidt-Lebuhn et al., 2012 Eqn 2 Australian Asteraceae MaxEnt; suitability interpreted as probability

*Methods: Eqn 2, summing individual probabilities; T, thresholding then summing; B, drawing repeatedly from a Bernoulli distribution. Thresholdsused: k, maximizing kappa; CCR, maximizing correct classification rate; TSS, maximizing true skill statistic (sensitivity + specificity -1); SES, thresholdfor which sensitivity equals specificity; LS, threshold determined by the lowest probability under which a species has been observed. Numeric values referto the threshold(s) used.†Biomod (Thuiller, 2003; Thuiller et al., 2009) has built-in facilities to average models and may automatically threshold predicted occurrence prob-abilities; MaxEnt (Phillips et al., 2006) refers to a common modelling approach that does not yield probabilities, but a relative index of habitat suitabilityranging from 0 to 100 (Elith et al., 2010). This index is still often converted into presence/absence data and stacked in the same way as probabilities.

Stacking species distribution models

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd 101

30 years; year surface temperature anomaly; year surface tem-

perature gradient; bottom salinity in August–October averaged

over the last 30 years; year bottom salinity anomaly; year bottom

salinity gradient; surface salinity in August–October averaged

over the last 30 years; year surface salinity anomaly; year surface

salinity gradient; gradient of potential energy deficit; mixed

layer depth; gradient of mixed layer depth; surface concentra-

tion of chlorophyll a (CHLa); gradient of surface concentration

of CHLa; number of ice-covered months in the last 10 months

before the summer; median velocity of the bottom current taken

in January–March; horizontal component of the bottom current

vector averaged over January–March; gradient of horizontal

component of the bottom current vector averaged over

January–March, vertical component of the bottom current

vector averaged over January–March, and gradient of vertical

component of the bottom current vector averaged over

January–March. None of these predictors were problematically

collinear (|r| < 0.7; Dormann et al., 2013).

‘EU mammals’ is a 50 ¥ 50 km gridded version of the Euro-

pean Mammal Assessment (Temple & Terry, 2007). It comprises

140 species in 3036 grid cells. We used 13 uncorrelated environ-

mental predictors, of which five were climatic (growing degree

days, annual precipitation, summer precipitation, temperature

seasonality and residuals of absolute minimum temperature),

six were related to land cover (proportion of crop, grassland,

mosaic habitat, shrubland, urban and forest) and two were

topographic (residuals of mean elevation, residuals of slope)

(see Dormann et al., 2010, for details).

All data were analysed as presence/absence data using boosted

regression trees (BRTs; Elith et al., 2008). We used a tree com-

plexity of three, a learning rate of 0.005 and 5-fold cross-

validation. For common species, we increased the learning rate

to 0.025, aiming for 2000 to 5000 trees per species. In a previous

analysis (Dormann et al., 2010), these settings worked well,

yielding BRT models with high discriminatory power. We did

not investigate the species-specific models in any detail, as we

used these models merely as an approach to generate expected

probabilities of occurrence under different environmental con-

ditions. To verify that our results were not driven by the BRTs

overfitting the data, in Appendix S1 we used GLMs (with

quadratic terms and first-order interactions) combined with a

conservative approach to model selection to yield more parsi-

monious SDMs, and then performed the same set of analyses

that we describe below on these GLM-based SDMs.

Species richness was analysed in the same way, separately

assuming Poisson or normal distributions and using the one

that better fitted the data (based on a lower BIC). Observed

species richness Sjobs( ) was computed as the sum of species

recorded for site j and analysed in the MEM also with BRTs,

trying both Poisson and normal distributions. In each case, the

residual diagnostics indicated the normal model to be slightly

better.

Summary statistics for the data sets can be found in Table 2.

The EU mammals data are typical of range-map analyses. The

EU forest trees data are similar, but much more restricted in

their scope, as they only include occurrences in forests. This

could introduce a strong bias, because we are thus modelling

forest use, rather than a physiological niche. Both marine data

sets are based on ‘point’ samples (even if the points are half-hour

trawls), i.e. data with high accuracy for the sampled site. This

kind of data is inherently much better than range maps or atlas

data, because environmental conditions can be quantified

without the loss of subscale variability than is inevitable with

grid-based analyses (see Rocchini et al., 2011, and Beck et al.,

2012, for recent discussions of data quality issues). However,

some predictors used in the analysis (Barents Sea: bottom tem-

perature, speed of currents; Wadden Sea: elevation) are based on

models of ocean currents (Barents Sea) or interpolated from a

lower-resolution sampling scheme (Wadden Sea) and may thus

be of lower quality. Furthermore, these single-visit point

samples are very likely to have detected only a proportion of the

local community actually present, thus consistently underesti-

mating true local richness.

Distribution theory for stacking SDMs

Stacked SDM predictions are built from the J ¥ K matrix of

SDM-predicted occurrence probabilities, P, where J and K are

the number of sites and species in the data set, respectively. We

use the term ‘site’ here loosely to mean a location corresponding

to a model prediction, which may range from a single point

where sampling was conducted to the much coarser grid cells of

range-map analyses. The matrix P has elements pj,k, each of

which is the occurrence probability of the k-th species at the j-th

site. Let the 1 ¥ K vector, pj, represent the occurrence probabili-

ties of each species at site j. The occurrence of the k-th species at

the j-th site is a Bernoulli trial (like a coin toss) with probability

of success pj,k. As in all other studies that sum the individual

SDM occurrence probabilities or presence/absence predictions,

we assume that all K Bernoulli trials at site j are independent.

The species richness at the j-th site is then the sum of K inde-

pendent but non-identical Bernoulli trials with probability of

success vector pj.

Table 2 Case study characteristics,sorted by increasing mean site-levelrichness, Sj

obs( ).

Data set Sample size Mean Sjobs( ) (min, max) Reference

Wadden Sea benthos 2549 4.0 (1, 11) Kraan et al. (2010)

EU forest trees 13038 4.9 (1, 21) Köble & Seufert (2001)

Barents Sea trawls 2457 16.4 (1, 33) Johannesen et al. (2012)

EU mammals 3036 44.2 (1, 73) Dormann et al. (2010)

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd102

Given these considerations, the site-level species richness pre-

diction, Sj, follows a Poisson binomial (sometimes also called

Poisson’s binomial) distribution with probability mass function

(Wang, 1993; Fernandez & Williams, 2010)

Pr , ,SK

e p e pj ji nS K

j k

i n

Kj k

k

Kj|p( ) =

++ −( )⎡

⎣⎢⎤⎦⎥

− +( ) +

=

1

112 1

2

1

1

ππ

∏∏∑⎛⎝⎜

⎞⎠⎟

=n

K

0

, (1)

where i = −1 is the imaginary unit. The expected value (mean)

and variance of S-SDM predictions for the j-th site under the

Poisson binomial distribution are

E S pj j k

k

K

( ) ==

∑ ,

1

(2)

Var S p pj j k j k

k

K

( ) = −( )=

∑ 11

, , . (3)

Equations 2 & 3 are both exact and provide a formal theoretical

basis for stacking S-SDMs. Given the pj, the expected value of

the Poisson binomial distribution, Eqn 2, is the best predictor of

site-level species richness. This means that the proper way to

aggregate individual SDMs into an S-SDM given the above

assumptions is simply to sum the site-level occurrence prob-

abilities. Equation 3 is the exact variance of the site-level rich-

ness prediction assuming the pj are fixed, known quantities. In

reality, the pj are estimated with uncertainty, and ignoring this

uncertainty would result in misleadingly narrow confidence

intervals on Sj. Error propagation techniques (e.g. Chapter 5 in

Clark, 2007) could be used in combination with Eqn 3 to fully

account for uncertainty in the site-level richness prediction.

Why thresholds should not be used to stack SDMs

In practice, many studies have not applied Eqn 2, but have

instead employed various thresholding schemes to convert site-

level occurrence probabilities into binary presence/absence pre-

dictions, which are then summed (Table 1). The arguments in

the previous section provide a formal justification, grounded in

probability theory, for stacking via Eqn 2. However, despite the

commonness of stacking via thresholded occurrence probabili-

ties (Table 1), we were unable to find a similar theoretical justi-

fication for this practice in the literature. Although thresholding

methods that account for differences in prevalence among

species have been shown to perform somewhat better (Liu et al.,

2005), Table 1 illustrates that a single global threshold across

species differing vastly in prevalence has even been used. Our

goal in this section is to show that thresholding schemes, even if

they account for species-specific prevalences, will generally yield

incorrect results when the aim is to construct an S-SDM from a

set of SDMs.

Under a thresholding scheme, the presence or absence of the

k-th species at the j-th site is given by the indicator

I p Tp T

p Tj k j k

j k j k

j k j k, ,

, ,

, ,

, ,( ) =>≤{1

0

if

if

where the threshold, T, is, for now, both site-specific and

species-specific. Thresholded richness at the j-th site is then

calculated as

S I p Tj j k j k

k

Ktsh( )

=

= ( )∑ , ,, ,1

(4)

where we have used a superscript in parentheses as a label on Sj.

The values of pj,k are estimated with uncertainty and can thus be

described by probability distributions. Under repeated sam-

pling, the average probability of success (occurrence) of the k-th

species at site j over N independent realizations of pj,k is

pN

pj k j k l

l

N

, , ,==∑1

1

, where pj,k,l denotes the l-th realization of pj,k.

We define the exact threshold for species k at site j such that,

as N becomes large, the proportion of thresholded presences is

pj k, . This threshold is exact in the sense that, if all species-by-site

combinations had such a threshold, the average result of

summing the thresholded presence values for each site j would

converge to Eqn 2 as N approaches infinity. The exact threshold

is defined as

T Q pj k j k j k, , , ,= ( ) (5)

where Qj,k(y) represents the y-th quantile of the probability dis-

tribution of pj,k. This thresholding scheme is exact only on

average over a large number of realizations. Furthermore, this

scheme requires J ¥ K unique thresholds, making it unreason-

able in practice.

We now consider the most commonly used thresholding

approach (prevalence-based thresholding), where each species

has a single threshold, Tk, across all sites. Let pk be the vector of

occurrence probabilities of the k-th species across all J sites. The

values of pk are described by a probability distribution with

support on [0, 1], which we will refer to as the ‘parent distribu-

tion’. We further assume that the J sites can be ordered according

to their suitability for the k-th species, and that the occurrence

probabilities of the k-th species tend to increase with site

suitability.

A single random realization of pk is generated by drawing J

independent values from the parent distribution and ordering

them from minimum to maximum value. The minimum is

assigned to the least suitable site for species k, the second small-

est is assigned to site with the second lowest suitability, and so on

until the sample maximum is assigned to the most suitable site.

Each of these ordered occurrence probabilities is an order sta-

tistic and, under repeated sampling, is described by a probability

distribution related to the parent distribution (Casella & Berger,

2002).

For convenience, we label the sites according to their suitabil-

ity for the k-th species, such that site j = 1 is the least suitable and

j = J is the most suitable. In general, the j-th order statistic of a

random sample of size J drawn from a parent distribution with

parameter vector q has expected value mj (J, q), and quantile

function Qj (y; J, q). A key property of order statistics is that the

functional forms of the mean and quantile functions depend on

j, and thus, in our setup, are different at each site. In other words,

Stacking species distribution models

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd 103

the first moments m1 and m2, for example, do not just have

different numerical values but are different functions. The same

is true of the quantile functions. We provide a specific example

of this property in Appendix S2. From Eqn 5, a single, species-

specific threshold for all sites would be exact only if

T Q m J J jk j j k k= ( )( ), ; , .θ θ for all (6)

Notice that for species k, J and qk will be the same for all sites.

However, because each site represents a different order statistic,

the functional forms of both mj and Qj change at each site j. The

particular way in which the functional forms of the mean and

quantile function change across the different order statistics

(sites) is determined entirely by the parent distribution.

However, as all of these functions are very likely to be non-linear

in qk and J, the conditions under which Eqn 6 would hold across

all sites seem to be exceptionally narrow. We are not aware of any

probability distribution with support on [0, 1] that would satisfy

Eqn 6, and we suspect these conditions will never be met in

practice. In Appendix S2, we assume the parent distribution is a

Kumaraswamy distribution (an alternative to the beta distribu-

tion, which is more mathematically tractable for this analysis;

Jones, 2009), and then prove by counterexample that Eqn 6

cannot generally be satisfied, even in the simplified case of only

J = 2 sites.

We therefore conclude that thresholding schemes will lead,

quite generally, to biased results relative to the exact calculation

given by Eqn 2. Only the case of a unique threshold for each

site-by-species combination guarantees agreement with Eqn 2,

and even then only asymptotically. Thus, thresholding

approaches suffer from the dual limitation that there is no clear

justification for using them to construct S-SDMs and that, in

doing so, one is almost certain to obtain an incorrect result.

To date, two studies have compared S-SDMs stacked via

thresholding with those stacked via Eqn 2. Both Aranda & Lobo

(2011) and Dubuis et al. (2011) found dramatic differences

between summing probabilities (or, in the case of Aranda &

Lobo, 2011, relative suitabilities) and threshold-derived

presence/absences. In Appendix S3, we add to this body of

results. Specifically, we compare S-SDM predictions with and

without thresholding for our four empirical examples. We use

the prevalence-based thresholding scheme advocated by Liu

et al. (2005). In all four cases, thresholding led to the overpre-

diction of richness relative to Eqn 2. In three out of four cases,

the bias introduced by thresholding was severe. Our arguments

above and in Appendix S2 demonstrate why it is not surprising

that such differences exist, but the two case studies of Aranda &

Lobo (2011) and Dubuis et al. (2011), coupled with our four

examples (Appendix S3), provide clear evidence that the direc-

tion of the biases will tend towards overprediction, and that the

magnitude of these discrepancies will often be very large. In

other words, the distinction between stacking according to Eqn

2 and stacking via thresholding is not merely one of fine detail,

but can instead result in a qualitatively different relationship

between S-SDM predictions and observed richness.

Are macroecological models less biasedthan S-SDMs?

In the previous sections, we have provided a formal basis from

which to build S-SDMs, showed that thresholding will generally

yield incorrect results, and illustrated with our cases studies

(and two studies from the literature) that thresholding intro-

duces an often pronounced bias in the direction of overpredic-

tion. With these issues out of the way, we now examine how

correctly stacked S-SDMs compare to MEMs. So far, only two

studies, Algar et al. (2009) and Dubuis et al. (2011), have per-

formed such a comparison, and both show extremely similar

patterns for S-SDMs stacked according to Eqn 2 and MEMs. In

Algar et al. (2009), S-SDMs and MEMs both fit the observed

richness very closely: the results of both approaches are similar

to each other and to the observed richness. Although both

approaches are very similar to each other in the study of Dubuis

et al. (2011), they both overpredict species richness in species-

poor sites, and underestimate in species-rich sites (their

Fig. 1a,c), in contrast to the assertion of Guisan & Rahbek

(2011) that S-SDMs consistently overestimate richness.

Based on our four data sets (Table 2; see Appendix S4 for

visualization), we can tentatively conclude that correctly stacked

S-SDMs are no worse than MEMs (mean R2 across the case

studies of 0.71 vs. 0.75; see Table 3); that S(S-SDM) and S(MEM) are

highly correlated (r = 0.945); and that both S-SDMs and MEMs

fairly consistently underpredict richness at species-rich sites and

overpredict at species-poor sites (i.e. that calibration slopes

are < 1 and intercepts > 0; Table 3).

Obviously, six data sets (the four described herein, plus those

of Algar et al., 2009, and Dubuis et al., 2011) are not sufficient to

generalize these findings. The high consistency does suggest,

however, that both correctly stacked S-SDMs and MEMs tend to

exhibit the same biases. Combining these results with those of

the previous sections on thresholding strongly suggests that the

oft-cited overprediction of richness exhibited by S-SDMs is pri-

marily a statistical artifact introduced by threshold-based stack-

ing methods, and is not a result of these models ignoring

dispersal filters or species interactions.

A maximum-likelihood approach to adjustingS-SDM predictions

We now develop a general method for adjusting S-SDM richness

predictions by modifying the pj,k across a data set as a function of

site-level species richness, Sj. When an MEM that accurately

predicts richness is available, we can set S Sj j= ( )MEM , which is the

MEM-predicted richness at site j. Our approach then could be

used to facilitate the synthesis of S-SDMs and MEMs envisioned

by Guisan & Rahbek (2011). The resulting adjusted occurrence

probabilities, p j*, should then yield more accurate information

about the composition of the community, which the MEM

alone could not provide.

When we wish to quantify the nature and magnitude of the

discrepancy between a raw S-SDM and observed data, we can set

S Sj j= ( )obs , the observed species richness at the j-th site. The

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd104

adjustment is then summarized by the values of the adjustment

parameters (see below), and this summary will facilitate com-

parisons of the performance of S-SDM predictions across data

sets and taxa. Such comparisons should help to reveal consistent

patterns in the way that S-SDM predictions go wrong, and

might suggest concrete ways in which S-SDMs could be modi-

fied to improve their performance.

By definition, the occurrence probabilities must satisfy

0 � pj,k � 1 for all j and k. We wish to apply an adjustment to the

pj that depends on Sj, which may be SjMEM( ) or Sj

obs( ) depending

on the context. A simple way to do this, while still respecting the

above-mentioned constraint, is to first logit-transform the pj,

apply an additive adjustment that depends on Sj to the logit-

transformed values, and then inverse-logit-transform the

adjusted values back to the probability scale. Let

q pp

pj k j k

j k

j k, ,

,

,

log ;= ( ) =−

⎛⎝⎜

⎞⎠⎟

logit1

(7)

the adjusted values can then be defined as

q q aS bj k j k j, ,* ,= + + (8)

where a and b are the adjustment parameters. The adjusted

occurrence probabilities are then

p qe

j k j kqj k

, , ** ( * ) .

,= =

+−

−logit 1 1

1(9)

For a given data set, the values of the adjustment parameters a

and b can be estimated via maximum likelihood, by calculating

Pre

dic

ted

rich

nes

s

0 2 4 6 8 10 120

2

4

6

8

10

12 A

Original

Corrected

0 5 10 15 200

5

10

15

20B

0 5 10 15 20 25 30 350

5

10

15

20

25

30

35 C

0 10 20 30 40 50 60 700

10

20

30

40

50

60

70 D

Observed richness

Figure 1 Predicted richness from theoriginal (blue) and adjusted (red)S-SDMs versus observed richness, for theWadden Sea macrobenthos (a),European forest trees (b), Barents Seatrawls (c) and European mammals (D)data sets. All four examples respondedsimilarly to adjustment, but requiredadjustments of different strengths.Regression lines (solid) and 95%confidence bands (dashed) are providedto summarize the relationship betweenthe original and adjusted occurrenceprobabilities and observed richness. Notethat the regression line for the adjustedprobabilities (red) falls very close to the1:1 line (black) in all four cases.

Table 3 Calibration regressions(ordinary least squares) of S-SDMs andMEMs on observed species richness.Perfect fits would have intercepts of 0and slopes of 1. Notice high correlationbetween both approaches (last column).

S-SDM MEM S-SDM–MEM

Intercept slope R2 Intercept slope R2 Correlation

Wadden Sea benthos 2.119 0.435 0.46 2.003 0.466 0.49 0.989

EU forest trees 0.816 0.820 0.90 0.665 0.159 0.79 0.934

Barents Sea trawls 8.700 0.472 0.61 6.598 0.602 0.72 0.928

EU mammals 3.636 0.894 0.87 0.454 0.990 1.00 0.930

Stacking species distribution models

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd 105

the probability of Sj for each site given the modified p j* charac-

terizing that site, which depend on the original pj and the adjust-

ment parameters. These probabilities are given by the Poisson

binomial distribution (Eqn 1). Let S = {S1, S2, . . ., SJ} denote the

vector of species richnesses (either predicted or observed) for

the J sites within a data set. The log-likelihood function is then

L a b S pj j k

k

K

j

J

, log(Pr( * )),,| |S( ) ===

∑∑11

(10)

where the dependence on the parameters a and b enters through

the adjusted occurrence probabilities, pj k,* . Equation 10 can be

maximized numerically to yield the maximum-likelihood esti-

mates a and b . We implemented the estimation procedure in R

(R Development Core Team, 2012) using the package poibin

(Hong, 2011), which provides an efficient implementation of

Eqn 1.

The features of the pj, and how adjustment changes them as a

function of Sj, can be summarized by fitting beta distributions to

them on a site-by-site basis. This can be readily achieved using

the method of moments, because the beta distribution has

closed-form moment estimators (Chapter 4 in Clark, 2007). The

beta distribution is essentially identical to the Kumaraswamy

distribution, which we use in Appendix S2, in terms of shape

and behaviour, but has contrasting mathematical properties

(Jones, 2009). Although the Kumaraswamy distribution is useful

for the analyses of Appendix S2, the beta distribution is much

more convenient for our purposes here. Comparing the fitted

beta distributions of the original and adjusted occurrence prob-

abilities at a site allows one to visualize the effects of adjustment,

but, because all of our example data sets have a large number of

sites, a site-by-site summary is impractical. Instead, we can look

at the average effects of adjustment by averaging the fitted beta-

distribution parameters within each data set over sites with the

same observed species richness. This approach yields a pair of

original and adjusted beta-distribution parameters for each level

of species richness in the data set. The two distributions for each

richness level can then be plotted against each other to visualize

the average effects of the adjustment.

Adjustment will most strongly affect low and high occurrence

probabilities, which correspond to rare and common species,

respectively. It is therefore useful to visualize the relative effects

of adjustment on these two groups of species. Rare species will

tend to occur in the lower quantiles of the pj, and common

species in the upper quantiles. We therefore use the 0.05 quantile

of pj at each site to represent the rare species and the 0.95

quantile to represent the common species. We then average the

value of each of these quantiles over all sites within a data set

that share a particular richness level. The ratio of these averaged

quantile values after adjustment to their values before adjust-

ment (Qadj/Qorg) then indicates the degree of adjustment that

was applied. When this ratio is 1, there was no adjustment; when

it is above 1, occurrence probabilities were increased by the

adjustment; when it is less than 1, the probabilities were

decreased. The ratio for the rare species can then be plotted

together with that for the common species in order to visualize

the degree of adjustment experienced by rare and common

species across sites ranging from low richness to high richness.

How S-SDM predictions deviate fromobserved richness

All four example data sets show the same qualitative pattern of

discrepancy relative to observed species richness: richness is

overestimated at species-poor sites and underestimated at

species-rich sites (Fig. 1). All data sets responded well to adjust-

ment, with the mean of the adjusted predictions falling very

close to the 1:1 line across the range of observed species richness

(Fig. 1). The strength of adjustment required, however, differed

among data sets and seems to be related to the maximum single-

site species richness in the data set (Table 4). Specifically, the

data sets with high site-level richness had low values of the

adjustment slope, a, and vice versa (Table 4). The magnitude of

the adjustment intercept, however, did not show a clear relation-

ship either with the maximum site-level species richness or with

the total number of species in a data set. Both adjustment

parameters differed significantly from zero for all four data sets,

indicating that adjustment was beneficial in all cases (Table 4).

The distributions of pj varied considerably with observed

species richness in all data sets (Fig. 2). Species-poor sites fea-

tured unimodal pj distributions with the mode occurring at 0

(Fig. 2, left column). Species-rich sites, in contrast, had bimodal

(U-shaped) distributions, with modes at 0 and 1 (Fig. 2, right

column). Intermediate sites were either unimodal or bimodal,

with the two data sets with the highest total richness – EU trees

and EU mammals – showing the bimodal pattern at

intermediate-richness sites (Fig. 2, middle column). Adjustment

tended to accentuate the original patterns at the low-richness

and high-richness site extremes, but had little effect at interme-

diate sites.

The patterns revealed by examining the distributions of the pj

suggest that species tend to have either very low or very high

occurrence probabilities. Adjustment should therefore have a

strong effect on the tails of the distributions of occurrence prob-

abilities. Focusing on the 0.05 and 0.95 quantiles of the pj dis-

tributions, adjustment tended to depress the values of both

Table 4 Maximum likelihood estimates (MLEs), standard errors(SEs), z-values and P-values for the correction parameters a and

b across the four data sets.

Data set Param. MLE SE z-value Pr(z)

Wadden Sea benthos a 0.290 0.006 45.46 < 10-15

b -1.248 0.032 -38.74 < 10-15

EU forest trees a 0.115 0.002 47.83 < 10-15

b -0.706 0.018 -38.62 < 10-15

Barents Sea trawls a 0.101 0.001 79.49 < 10-15

b -1.783 0.024 -73.66 < 10-15

EU mammals a 0.017 < 10-5 5319.20 < 10-15

b -0.365 < 10-4 -5621.20 < 10-15

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd106

common (0.95 quantile) and rare (0.05 quantile) species by a

similar amount in the low-richness sites (Fig. 3). In other words,

in species-poor sites, the occurrence probabilities of both

common and rare species were overpredicted. In the higher-

richness sites, adjustment substantially increased the occurrence

probabilities of rare species (Fig. 3); the occurrence probabilities

of common species also increased, but the magnitude of the

increase was small, and tended to level off for sites at the high

end of the richness spectrum (Fig. 3). In between, each data set

featured a (not necessarily integer) richness value where the

adjustment had no effect on either quantile.

DISCUSSION

How (and how not) to stack

From our probability theory argument in the first section, it is

clear that the process of stacking SDMs is straightforward and

requires only a summation of the per-site occurrence probabili-

ties, pj. We have also clearly demonstrated that the far more

common approach of first converting pj values into presence/

absence predictions and then summing them will generally be

incorrect, and will result in biased predictions of species rich-

ness. Additionally, our four example data sets, plus two existing

data sets in the literature (Algar et al., 2009; Dubuis et al.,

2011), show that thresholding schemes lead to S-SDMs that

overpredict richness relative to using Eqn 2, often dramatically.

Taken together, our formal argument justifying stacking via

Eqn 2, the notable lack of a similar argument justifying thresh-

olding, our analytical proof that thresholding will generally

lead to biased results, and six empirical examples showing that

thresholding leads to systematic overprediction of site-level

richness provide strong evidence that using a thresholding

scheme to build an S-SDM is incorrect and will produce biased

results. We therefore strongly recommend that this practice be

immediately put to rest. It is important to note, however, that

our results apply only to using thresholds to stack S-SDMs, and

do not imply anything about the use of thresholds for other

purposes (e.g. for generating range maps from SDM-predicted

occurrence probabilities).

Researchers working primarily with presence-only data may

object that their models do not predict the probability of occur-

rence but rather an index of habitat suitability (Phillips et al.,

2006; Phillips & Dudík, 2008). Although true, the conversion

using one or more arbitrary thresholds, as is commonly done

(Table 1), does not solve this problem. It implicitly assumes that

suitability values are comparable among species, which may or

may not be the case. It should only be a matter of time until

recent papers on the equivalence in principle of MaxEnt and

Poisson point process models (e.g. Renner & Warton, 2013)

facilitate a conversion of MaxEnt output to probabilities. Until

then, we propose to use our maximum-likelihood approach to

adjust MaxEnt-generated suitabilities, i.e. to treat them as if

they were probabilities.

Relationships between S-SDMs and MEMs

When stacked via Eqn 2, we find that S-SDMs make richness

predictions that are very similar to those of MEMs on the same

data.All four of our data sets confirm this pattern,with an average

correlation of 0.945 between S-SDMs and MEMs (Table 3). Two

other studies in the literature have also noted this similarity. In a

Pro

bab

ility

den

sity

0.0 0.2 0.4 0.6 0.8 1.00.00.51.01.52.02.53.03.5

Macben, Sobs = 1

Original

Adjusted

0.0 0.2 0.4 0.6 0.8 1.00.00.51.01.52.02.53.03.5

Macben, Sobs = 5

0.0 0.2 0.4 0.6 0.8 1.00.00.51.01.52.02.53.03.5

Macben, Sobs = 10

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

EUtree, Sobs = 1

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

EUtree, Sobs = 10

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

EUtree, Sobs = 19

0.0 0.2 0.4 0.6 0.8 1.00.0

0.5

1.0

1.5

2.0

2.5

3.0

Barsea, Sobs = 4

0.0 0.2 0.4 0.6 0.8 1.00.0

0.5

1.0

1.5

2.0

2.5

3.0

Barsea, Sobs = 17

0.0 0.2 0.4 0.6 0.8 1.00.0

0.5

1.0

1.5

2.0

2.5

3.0

Barsea, Sobs = 32

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

EUmams, Sobs = 1

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

EUmams, Sobs = 35

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

1.0

EUmams, Sobs = 70

Probability of occurrence

Figure 2 Fitted beta distributionprobability density functions for theoriginal (blue) and adjusted (red) pj atlow (left column), medium (middlecolumn), and high (right column)richness sites for the Wadden Seamacrobenthos (first row), Europeanforest trees (second row), Barents Seatrawls (third row) and Europeanmammals (fourth row) data sets.

Stacking species distribution models

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd 107

study of Canadian butterflies, Algar et al. (2009) found very high

agreement both between MEMs and S-SDMs and between both

methods and observed richness patterns. Similarly, Dubuis et al.

(2011) report that correctly stacked S-SDMs and MEMs show

very similar richness predictions (their Figure 1a,c) and make

very similar predictions for the effect of elevation (their

Figure 2c,d). These six studies suggest that correctly stacked

S-SDMs do not exhibit a systematic tendency to overpredict

richness, as is frequently claimed in the literature.

Interestingly, S-SDMs and MEMs for all six data sets (our

four; Algar et al., 2009; and Dubuis et al., 2011) show the same

tendency to overestimate species richness in species-poor sites,

and to underestimate it on species-rich sites (Fig. 1; Appendix

S4). Of our examples, the EU mammals data set was the only

case where an MEM could potentially be used together with our

likelihood-based adjustment method to appreciably improve

S-SDM performance. Although the MEM for this example was

highly correlated with observed richness (R2 = 0.997), the

S-SDM was almost as good (R2 = 0.87). We have yet to encoun-

ter an example where the MEM performed very well and a

correctly stacked S-SDM performed poorly. More examples will

need to appear in the literature before we can conclude whether

or not MEMs and S-SDMs can productively be used together in

the manner suggested by Guisan & Rahbek (2011), but the avail-

able evidence is not encouraging.

Our maximum-likelihood approach to adjusting the pj sug-

gests that there is a systematic pattern of deviation that decreases

with increasing maximum site-level richness. Furthermore,

examining the adjusted pj in detail revealed that the rarer species

(small pj,k) typically required stronger correction, in both

species-rich and species-poor sites, to bring S-SDMs into line

with observed richness. Although it is tempting to speculate

about the origins of these patterns, we urge caution as they are

based on only four data sets. However, we have provided the

tools to examine patterns of error in a wide range of S-SDMs. If

similar patterns are observed across many more examples, these

regularities could potentially be used to correct bias in S-SDM

predictions.

Based on our examination of stacking methods and our com-

parison of correctly stacked S-SDMs and MEMs, we found no

evidence of systematic differences between the two model types.

In other words, the core observation on which SESAM is based

– that S-SDMs consistently overpredict richness whereas MEMs

do not – appears to be the result of a statistical artifact intro-

duced by using thresholding schemes to produce S-SDMs.

What are the causes of biased predictions?

We suspect that the tendency for both correctly stacked S-SDMs

and MEMs to overpredict in species-poor sites and underpredict

in species-rich sites may have several causes, which are more

likely to be statistical than ecological. First, it is well known that

issues of sample size plague the estimation of SDMs for rare

species (e.g. Jiménez-Valverde et al., 2009). Also, richness pat-

Rel

ativ

e ad

just

men

t

0 2 4 6 8 100

1

2

3

4

5A

Rare (0.05 quantile)

Common (0.95 quantile)

0 5 10 150

1

2

3

4

5B

0 5 10 15 20 25 300

1

2

3

4

5C

0 10 20 30 40 50 60 700

1

2

3

4

5D

Observed richness

Figure 3 The relative effect ofadjustment on rare (green, 0.05quantile) and common (brown, 0.95quantile) species as a function ofobserved species richness. The relativeadjustment is the ratio of the adjusted,Qadj, to original, Qorg, value of the focalquantile, averaged over all sites sharingthe same richness within a given dataset. Only richness values observed at � 5sites within a data set are shown.Relative adjustment values > 1 indicatethe occurrence probabilities for the focalspecies group have been increased byadjustment, values < 1 indicate adecrease, and values = 1 signify nochange. A horizontal line for relativeadjustment = 1 is shown for reference.The panels are arranged as in Fig. 1. Thescale of the y-axis is the same in allpanels to emphasize the differentstrengths of adjustment.

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd108

terns are largely driven by common species (i.e. those with a

high prevalence), as shown by Jetz & Rahbek (2002) and more

systematically by Lennon et al. (2004) and Šizling et al. (2009).

A signal of prevalence in model accuracy has been consis-

tently reported, with SDMs for both common and rare species

being less reliable than those for species of intermediate preva-

lence (McPherson et al., 2004; Santika, 2011). It could thus be

that both MEM and S-SDM patterns are largely determined by

common species, which we may fail to represent properly in our

models.

Second, the high consistency between S-SDMs and MEMs

suggests a common underlying cause of their biases. The

observed bias in S-SDMs may have nothing to do with SDM

model accuracy or neglecting species interactions during stack-

ing, but may instead be caused by ‘regression dilution’ or ‘attenu-

ation’ (Madansky, 1959; MacMahon et al., 1990; McInerny &

Purves, 2011). This refers to a bias in the estimation of the

strength of an effect due to unaccounted variability in the pre-

dictor (see Frost & Thompson, 2000, for a review). In other

words, because environmental predictors may not represent the

habitat conditions of the species but rather the average across

the whole grid cell, the former are represented by the latter with

a large error. This error could lead to underestimation of the

strength of the relationship between predictor (say, tempera-

ture) and the probability of occurrence. If the true function was,

for example, linear with an intercept of 2 and a slope of 2,

regression dilution could lead to estimates of 3 and 1.5, respec-

tively. Thus, the intercept would compensate for lower slope

estimates. As habitat preferences among species are on average

positively correlated (otherwise we would not see any pattern of

species richness along environmental gradients), regression

dilution will lead to underestimation of occurrence at suitable

sites and overestimation at unsuitable sites, which is exactly the

pattern we found.

If this hypothesis is correct, analysing species richness directly

should yield a very similar discrepancy, because MEMs fre-

quently use the same environmental predictors as S-SDMs.

Although speculative at the moment, the issue of subscale vari-

ability has been on the list of problems of SDMs for many years

(e.g. Rahbek & Graves, 2000, 2001; Vaughan & Ormerod, 2003;

Rahbek, 2005; Beck et al., 2012), and we have simply described a

particular way in which it may be manifested.

CONCLUSIONS

The use of S-SDMs to predict species richness is still in its

infancy and many problems remain to be solved. We have

attempted to put the process of stacking S-SDMs on firmer

statistical ground by demonstrating the correct way to build

S-SDMs. Our results strongly suggest that the use of ad hoc

stacking methods based on thresholding introduces a systematic

bias in S-SDM richness predictions that can account for the

frequently noted discrepancies between S-SDMs and MEMs.

These results therefore cast substantial doubt on the core

assumptions of the SESAM framework – that S-SDMs system-

atically overpredict richness because they lack dispersal filters

and ecological assembly rules. We recommend that future inves-

tigations into the relationship between S-SDMs and MEMs first

rule out artifactual causes for differences between these model-

ling frameworks before invoking biotic mechanisms. We also

suggest that more effort be directed at studying the extent to

which regression dilution can account for the similar biases

observed in MEMs and correctly stacked S-SDMs.

ACKNOWLEDGEMENTS

We are grateful to Renate Köble at the EC Joint Research Centre in

Ispra for providing the EU tree data, and to Christian Hof, Holger

Kreft and two anonymous reviewers for comments on an earlier

version of the manuscript. The benthic monitoring carried out in

the Dutch Wadden Sea was initiated, organized and led by

Theunis Piersma and Anne Dekinga, and received full support

from the Royal Netherlands Institute for Sea Research. A large

number of colleagues, students and volunteers contributed to the

collection of the field data. G.C. was funded by the BarEcoRe

project (NRC contract number 200793/S30), under the Norwe-

gian Research Council NORKLIMA programme. C.K. was sup-

ported by the Marsden Fund Council from Government funding,

administered by the Royal Society of New Zealand.

REFERENCES

Algar, A.C., Kharouba, H.M., Young, E.R. & Kerr, J.T. (2009)

Predicting the future of species diversity: macroecological

theory, climate change, and direct tests of alternative forecast-

ing methods. Ecography, 32, 22–33.

Aranda, S.C. & Lobo, J.M. (2011) How well does presence-only-

based species distribution modelling predict assemblage

diversity? A case study of the Tenerife flora. Ecography, 34,

31–38.

Barbet-Massin, M., Thuiller, W. & Jiguet, F. (2010) How much

do we overestimate future local extinction rates when restrict-

ing the range of occurrence data in climate suitability models?

Ecography, 33, 878–886.

Beck, J., Ballesteros-Mejia, L., Buchmann, C.M., Dengler, J.,

Fritz, S.A., Gruber, B., Hof, C., Jansen, F., Knapp, S., Kreft, H.,

Schneider, A.-K., Winter, M. & Dormann, C. (2012) What’s

on the horizon for macroecology? Ecography, 35, 673–683.

Brown, J.H. (1995) Macroecology. University of Chicago Press,

Chicago, IL.

Buisson, L., Thuiller, W., Casajus, N., Lek, S. & Grenouillet, G.

(2010) Uncertainty in ensemble forecasting of species distri-

bution. Global Change Biology, 16, 1145–1157.

Casella, G. & Berger, R. (2002) Statistical inference. Duxbury

Press, Belmont, CA.

Clark, J.S. (2007) Models for ecological data: an introduction.

Princeton University Press, Princeton, NJ.

Dormann, C.F., Gruber, B., Winter, M. & Herrmann, D. (2010)

Evolution of climate niches in European mammals? Biology

Letters, 6, 229–232.

Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G.,

Carré, G., García Marquéz, J.R., Gruber, B., Lafourcade, B.,

Stacking species distribution models

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd 109

Leitão, P.J., Münkemüller, T., McClean, C., Osborne, P.E.,

Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D. & Lau-

tenbach, S. (2013) Collinearity: a review of methods to deal

with it and a simulation study evaluating their performance.

Ecography, 36, 27–46.

Dubuis, A., Pottier, J., Rion, V., Pellissier, L., Theurillat, J. &

Guisan, A. (2011) Predicting spatial patterns of plant species

richness: a comparison of direct macroecological and species

stacking modelling approaches. Diversity and Distributions,

17, 1122–1131.

Elith, J., Graham, C.H., Anderson, R.P. et al. (2006) Novel

methods improve prediction of species’ distributions from

occurrence data. Ecography, 29, 129–151.

Elith, J., Leathwick, J.R. & Hastie, T. (2008) A working guide to

boosted regression trees. Journal of Animal Ecology, 77, 802–

813.

Elith, J., Kearney, M. & Phillips, S. (2010) The art of modelling

range-shifting species. Methods in Ecology and Evolution, 1,

330–342.

Erasmus, B.F.N., Van Jaarsveld, A.S., Chown, S.L., Kshatriya, M.

& Wessels, K.J. (2002) Vulnerability of South African animal

taxa to climate change. Global Change Biology, 87, 679–693.

Fernandez, M. & Williams, S. (2010) Closed-form expression for

the Poisson-binomial probability density function. IEEE

Transactions on Aerospace and Electronic Systems, 46, 803–817.

Ferrier, S. & Guisan, A. (2006) Spatial modelling of biodiversity

at the community level. Journal of Applied Ecology, 43, 393–

404.

Ferrier, S., Watson, G., Pearce, J. & Drielsma, M. (2002)

Extended statistical approaches to modelling spatial pattern in

biodiversity in northeast New South Wales. I. Species-level

modelling. Biodiversity and Conservation, 11, 2275–2307.

Fielding, A.H. & Bell, J.F. (1997) A review of methods for the

assessment of prediction errors in conservation presence/

absence models. Environmental Conservation, 24, 38–49.

Fitzpatrick, M.C., Sanders, N.J., Ferrier, S., Longino, J.T., Weiser,

M.D. & Dunn, R. (2011) Forecasting the future of biodiver-

sity: a test of single- and multi-species models for ants in

North America. Ecography, 34, 836–847.

Franklin, J. (2009) Mapping species distributions: spatial inference

and prediction. Cambridge University Press, Cambridge, UK.

Frost, C. & Thompson, S.G. (2000) Correcting for regression

dilution bias: comparison of methods for a single predictor

variable. Journal of the Royal Statistical Society A, 163, 173–

189.

Gaston, K.J. (2000) Global patterns in biodiversity. Nature, 405,

220–227.

Gaston, K.J. & Blackburn, T.M. (2000) Pattern and process in

macroecology. Blackwell, Malden, MA.

Graham, C.H. & Hijmans, R.J. (2006) A comparison of methods

for mapping species ranges and species richness. Global

Ecology and Biogeography, 15, 578–587.

Guisan, A. & Rahbek, C. (2011) SESAM – a new framework

integrating macroecological and species distribution models

for predicting spatio-temporal patterns of species assem-

blages. Journal of Biogeography, 38, 1433–1444.

Guisan, A., Weiss, S.B. & Weiss, A.D. (1999) GLM versus CCA

spatial modeling of plant species distribution. Plant Ecology,

143, 107–122.

Hawkins, B.A. & Diniz-Filho, J.A.F. (2004) ‘Latitude’ and geo-

graphic patterns in species richness. Ecography, 27, 268–272.

Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. & Jarvis, A.

(2005) Very high resolution interpolated climate surfaces for

global land areas. International Journal of Climatology, 25,

1965–1978.

Hong, Y. (2011) poibin: the Poisson binomial distribution. Avail-

able at: http://cran.r-project.org/web/packages/poibin/index

.html (accessed 11 July 2013).

Hortal, J., De Marco, P., Jr, Santos, A.M.C. & Diniz-Filho, J.A.F.

(2012) Integrating biogeographical processes and local com-

munity assembly. Journal of Biogeography, 39, 627–628.

Jakobsen, T. & Ozhigin, V. (eds) (2011) The Barents Sea. Ecosys-

tem, resources, management. Half a century of Russian–

Norwegian cooperation. Tapir Academic Press, Trondheim,

Norway.

Jetz, W. & Rahbek, C. (2002) Geographic range size and deter-

minants of avian species richness. Science, 297, 1548–1551.

Jiménez-Valverde, A., Lobo, J.M. & Hortal, J. (2009) The effect of

prevalence and its interaction with sample size on the reliabil-

ity of species distribution models. Community Ecology, 10,

196–205.

Johannesen, E., Høines, Å.S., Dolgov, A.V. & Fossheim, M.

(2012) Demersal fish assemblages and spatial diversity pat-

terns in the Arctic-Atlantic transition zone in the Barents Sea.

PLoS ONE, 7, e34924.

Jones, M.C. (2009) Kumaraswamy’s distribution: a beta-type

distribution with some tractability advantages. Statistical

Methodology, 6, 70–81.

Kaschner, K., Tittensor, D.P., Ready, J., Gerrodette, T. & Worm,

B. (2011) Current and future patterns of global marine

mammal biodiversity. PLoS ONE, 6, e19653.

Kerr, J.T., Kharouba, H.M. & Currie, D.J. (2007) The macroeco-

logical contribution to global change solutions. Science, 316,

1581–1584.

Köble, R. & Seufert, G. (2001) Novel maps for forest tree species

in Europe. Proceedings of the 8th European Symposium on the

Physico-Chemical Behaviour of Air Pollutants: ‘A Changing

Atmosphere!’, Turin, Italy, 17–20 September 2001.

Kraan, C., Piersma, T., Dekinga, A., Koolhaas, A. & van der Meer,

J. (2007) Dredging for edible cockles (Cerastoderma edule) on

intertidal flats: short-term consequences of fisher patch-

choice decisions for target and non-target benthic fauna. ICES

Journal of Marine Science, 64, 1735–1742.

Kraan, C., van Gils, J.A., Spaans, B., Dekinga, A., Bijleveld, A.I.,

van Roomen, M., Kleefstra, R. & Piersma, T. (2009)

Landscape-scale experiment demonstrates that Wadden Sea

intertidal flats are used to capacity by molluscivore migrant

shorebirds. Journal of Animal Ecology, 78, 1259–1268.

Kraan, C., Aarts, G., van der Meer, J. & Piersma, T. (2010) The

role of environmental variables in structuring landscape-scale

species distributions in seafloor habitats. Ecology, 91, 1583–

1590.

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd110

Kraan, C., Aarts, G., Piersma, T. & Dormann, C.F. (2013) Tem-

poral variability of ecological niches: a study on intertidal

macrobenthic fauna. Oikos, 122, 754–760.

Lehmann, A., Overton, J.M. & Austin, M.P. (2002) Regression

models for spatial prediction: their role for biodiversity and

conservation. Biodiversity and Conservation, 11, 2085–2092.

Lennon, J.J., Koleff, P., Greenwood, J.J.D. & Gaston, K.J. (2004)

Contribution of rarity and commonness to patterns of species

richness. Ecology Letters, 7, 81–87.

Liu, C., Berry, P.M., Dawson, T.P. & Pearson, R.G. (2005) Select-

ing thresholds of occurrence in the prediction of species dis-

tributions. Ecography, 28, 385–393.

Loiselle, B.A., Howell, C.A., Graham, C.H., Goerck, J.M., Brooks,

T., Smith, K.G. & Williams, P.H. (2003) Avoiding pitfalls of

using species distribution models in conservation planning.

Conservation Biology, 17, 1591–1600.

Loiselle, B.A., Jørgensen, P.M., Consiglio, T., Jiménez, I., Blake,

J.G., Lohmann, L.G. & Montiel, O.M. (2007) Predicting

species distributions from herbarium collections: does

climate bias in collection sampling influence model out-

comes? Journal of Biogeography, 35, 105–116.

McInerny, G.J. & Purves, D.W. (2011) Fine-scale environmental

variation in species distribution modelling: regression dilu-

tion, latent variables and neighbourly advice. Methods in

Ecology and Evolution, 2, 248–257.

MacMahon, S., Peto, R., Cutler, J., Collins, R., Sorlie, P., Neaton,

J., Abbott, R., Godwin, J., Dyer, A. & Stamler, J. (1990) Blood

pressure, stroke, and coronary heart disease: part 1, prolonged

differences in blood pressure: prospective observational

studies corrected for the regression dilution bias. The Lancet,

335, 765–774.

McPherson, J.M., Jetz, W. & Rogers, D.J. (2004) The effects of

species’ range sizes on the accuracy of distribution models:

ecological phenomenon or statistical artefact? Journal of

Applied Ecology, 41, 811–823.

Madansky, A. (1959) The fitting of straight lines when both

variables are subject to error. Journal of the American Statisti-

cal Association, 54, 173–205.

Mateo, R.G., Felicísimo, Á.M., Pottier, J., Guisan, A. & Muñoz, J.

(2012) Do stacked species distribution models reflect altitu-

dinal diversity patterns? PLoS ONE, 7, e32586.

Peppler-Lisbach, C. & Schröder, B. (2004) Predicting the

species composition of Nardus stricta communities by logistic

regression modelling. Journal of Vegetation Science, 15, 623–

634.

Peterson, A.T., Soberón, J., Pearson, R.G., Anderson, R.P.,

Martínez-Meyer, E., Nakamura, M. & Araújo, M. (2011) Eco-

logical niches and geographic distributions. Monographs in

Population Biology 49. Princeton University Press, Princeton,

NJ.

Phillips, S.J. & Dudík, M. (2008) Modeling of species distribu-

tions with Maxent: new extensions and a comprehensive

evaluation. Ecography, 31, 161–175.

Phillips, S.J., Anderson, R.P. & Schapire, R.E. (2006) Maximum

entropy modeling of species geographic distributions. Ecologi-

cal Modelling, 190, 231–259.

Pineda, E. & Lobo, J.M. (2009) Assessing the accuracy of species

distribution models to predict amphibian species richness

patterns. Journal of Animal Ecology, 78, 182–190.

Pineda, E. & Lobo, J.M. (2012) The performance of range maps

and species distribution models representing the geographic

variation of species richness at different resolutions. Global

Ecology and Biogeography, 21, 935–944.

R Development Core Team (2012) R: a language and environ-

ment for statistical computing. R Foundation for Statistical

Computing, Vienna, Austria.

Raes, N., Roos, M.C., Slik, J.W.F., Van Loon, E.E. & ter Steege, H.

(2009) Botanical richness and endemicity patterns of Borneo

derived from species distribution models. Ecography, 32, 180–

192.

Rahbek, C. (2005) The role of spatial scale and the perception of

large-scale species-richness patterns. Ecology Letters, 8, 224–

239.

Rahbek, C. & Graves, G.R. (2000) Detection of macro-ecological

patterns in South American hummingbirds is affected by

spatial scale. Proceedings of the Royal Society B: Biological Sci-

ences, 267, 2259–2265.

Rahbek, C. & Graves, G.R. (2001) Multiscale assessment of pat-

terns of avian species richness. Proceedings of the National

Academy of Sciences USA, 98, 4534–4539.

Randin, C.F., Engler, R., Normand, S., Zappa, M., Zimmermann,

N.E., Pearman, P.B., Vittoz, P., Thuiller, W. & Guisan, A.

(2009) Climate change and plant distribution: local models

predict high-elevation persistence. Global Change Biology, 15,

1557–1569.

Renner, I.W. & Warton, D.I. (2013) Equivalence of MAXENT

and Poisson point process models for species distribution

modeling in ecology. Biometrics, 69, 274–281.

Rocchini, D., Hortal, J., Lengyel, S., Lobo, J.M., Jiménez-

Valverde, A., Ricotta, C., Bacaro, G. & Chiarucci, A. (2011)

Accounting for uncertainty when mapping species distribu-

tions: the need for maps of ignorance. Progress in Physical

Geography, 35, 211–226.

Rondinini, C., Di Marco, M., Chiozza, F., Santulli, G., Baisero,

D., Visconti, P., Hoffmann, M., Schipper, J., Stuart, S.N.,

Tognelli, M.F., Amori, G., Falcucci, A., Maiorano, L. & Boitani,

L. (2011) Global habitat suitability models of terrestrial

mammals. Philosophical Transactions of the Royal Society B:

Biological Sciences, 366, 2633–2641.

Santika, T. (2011) Assessing the effect of prevalence on the

predictive performance of species distribution models

using simulated data. Global Ecology and Biogeography, 20,

181–192.

Schmidt-Lebuhn, A.N., Knerr, N.J. & González-Orozco, C.E.

(2012) Distorted perception of the spatial distribution of

plant diversity through uneven collecting efforts: the example

of Asteraceae in Australia. Journal of Biogeography, 39, 2072–

2080.

Scott, J.M., Heglund, P.J., Morrison, M.L., Haufler, J.B., Raphael,

M.G., Wall, W.A. & Samson, F.B. (2002) Predicting species

occurrences: issues of accuracy and scale. Island Press,

Washington, DC.

Stacking species distribution models

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd 111

Šizling, A.L., Šizlingová, E., Storch, D., Reif, J. & Gaston, K.J.

(2009) Rarity, commonness, and the contribution of indi-

vidual species to species richness patterns. The American

Naturalist, 174, 82–93.

Skov, F. & Borchsenius, F. (1997) Predicting plant species

distribution patterns using simple climatic parameters:

a case study of Ecuadorian palms. Ecography, 20, 347–

355.

Temple, H.J. & Terry, A. (2007) The status and distribution of

European mammals. Office for Official Publications of the

European Communities, Luxemburg.

Thuiller, W. (2003) BIOMOD – optimizing predictions

of species distributions and projecting potential future

shifts under global change. Global Change Biology, 9, 1353–

1362.

Thuiller, W., Lafourcade, B., Engler, R. & Araújo, M.B. (2009)

BIOMOD – a platform for ensemble forecasting of species

distributions. Ecography, 32, 369–373.

Vaughan, I.P. & Ormerod, S.J. (2003) Improving the quality of

distribution models for conservation by addressing short-

comings in the field collection of training data. Conservation

Biology, 17, 1601–1611.

Wang, Y.H. (1993) On the number of successes in independent

trials. Statistica Sinica, 3, 295–312.

SUPPORTING INFORMATION

Additional supporting information may be found in the online

version of this article at the publisher’s web-site.

Appendix S1 GLM-based results.

Appendix S2 Single, species-specific threshold when the pk

values follow a Kumaraswamy distribution.

Appendix S3 A comparison of S-SDMs with and without

thresholding.

Appendix S4 Additional figures.

BIOSKETCH

Justin Calabrese is a quantitative ecologist at the

Smithsonian Conservation Biology Institute. His work

mixes ecological theory, statistics and probability

models, and empirical data. His research interests range

broadly and include studying the drivers of animal

movement, exploring how phenology affects individual

fitness and population dynamics, and understanding

the factors governing parasite/host interactions.

Editor: José Alexandre Diniz-Filho

J. M. Calabrese et al.

Global Ecology and Biogeography, 23, 99–112, © 2013 John Wiley & Sons Ltd112