Data Needs and Challenges for the Present
and Future of Species Distribution Modelling
Robert P. Anderson, City University of New York, USA
Miguel Araujo, Museo Nacional de Ciencias Naturales, Spain
Antoine Guisan, University of Lausanne, Switzerland
Jorge M. Lobo, Museo Nacional de Ciencias Naturales, Spain
Enrique Martinez-Meyer, Universidad Nacional Autonoma de Mexico, Mexico
A. Townsend Peterson, University of Kansas, USA
Jorge Soberon, University of Kansas, USA – Group leader
with support from Dmitry Schigel, GBIF Secretariat
Task Group on GBIF Data Fitness for Use in Distribution Modeling
Biodiversity data quality symposium
Sao Paulo, March 2016
A Shift from Quantity to Quality in Primary Biodiversity Data
Jorge Miguel Lobo
Dpto. Biogeography and Global Change
Museo Nacional de Ciencias Naturales
C.S.I.C. Madrid.
[Figure: SDM workflow. Input data: occurrence data of species (geographic space) and environmental information (e.g. temperature, moisture). A modeling algorithm (GLM, GAM, MaxEnt, GARP, Bioclim, etc.) builds a niche model in ecological space. Output: the predicted potential distribution, obtained by projecting back to geography.]
What is Species Distribution Modeling (or Ecological Niche Modeling)?
SDM / ENM is a correlative analytical approach meant to infer the ecological conditions under which a species lives and to represent them geographically
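The correlative idea can be sketched with a rectangular climate envelope, loosely in the spirit of BIOCLIM. This is a minimal illustration, not the algorithm of any particular package: the two variables (temperature, moisture), their values, the grid cells and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Env = Tuple[float, float]   # (temperature, moisture); hypothetical units
Cell = Tuple[int, int]      # (row, column) of a grid cell in geographic space


def fit_envelope(presence_env: List[Env]) -> Tuple[Env, Env]:
    """Niche model: the min and max of each variable over the presence records."""
    temps = [t for t, _ in presence_env]
    moist = [m for _, m in presence_env]
    return (min(temps), min(moist)), (max(temps), max(moist))


def project(envelope: Tuple[Env, Env], grid: Dict[Cell, Env]) -> Dict[Cell, bool]:
    """Project back to geography: a cell is suitable if it falls inside the envelope."""
    lo, hi = envelope
    return {
        cell: all(l <= v <= h for v, l, h in zip(env, lo, hi))
        for cell, env in grid.items()
    }


# Hypothetical input data: environment sampled at three occurrence localities.
presences = [(18.0, 0.40), (22.0, 0.55), (20.0, 0.70)]
# Hypothetical environmental layers for four grid cells.
grid = {
    (0, 0): (19.0, 0.50),
    (0, 1): (25.0, 0.50),
    (1, 0): (21.0, 0.30),
    (1, 1): (22.0, 0.70),
}

envelope = fit_envelope(presences)
suitable = project(envelope, grid)
```

The envelope is fitted in ecological space and only then mapped onto cells, which mirrors the input, niche model and output steps of the workflow.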
SDM has become very popular in the last two decades
Rodríguez-Castañeda et al. 2012. PLoS ONE 7(9): e44402
In part because it has proven useful for addressing questions in both basic and applied science, in fields such as biogeography, ecology, evolution, conservation biology, and public health
Rodríguez-Castañeda et al. 2012. PLoS ONE 7(9): e44402
Williams et al. 2009. Div & Dist. 15: 565-576
Raxworthy et al. 2003. Nature 426: 837-841
SDM has been successfully used:
In biological survey and exploration
In Taxonomy
Rissler & Apodaca. 2007. Syst Biol. 56: 924-942
Combining genetic and ecological information to strengthen the criteria to delimit cryptic species
In Conservation Biology
Araiza et al. 2012. Cons Biol 26: 630-637
For reintroducing extinct species in the wild
[Figure: map with area labels 333.6 km², 283.7 km², 385.8 km², 270 km² and 381.6 km²]
Martínez-Meyer et al. 2004. GEB 13: 305-314
To investigate species’ responses to climate change in the past
Vole (Phenacomys intermedius)
In Climate Change Biology
Identification of corridors
Williams et al. 2005. Cons. Biol. 19: 1063-1074
New areas
New irreplaceable areas
Flexible areas
Identification of new important areas for conservation
And proactive actions for the future
In Climate Change Biology
Also because data to produce models are widely available through the internet for thousands of species all over the world, and software for modeling and analysis is freely available and increasingly user-friendly
Lozier et al. 2009. J.Biogeog. 36: 1623-1627
This is generally good news. Nonetheless, the ready availability of data and the ease of use of modeling programs have caused a euphoria for SDM that, in many cases, ignores its assumptions and limits, thus abusing and overselling the approach and degrading the scientific rigor of the field
Data needs for building SDM
Presence & absence
Ideally, to characterize the ecological niche of a species, we would need data on the conditions the species likes (occurrence data) and the conditions it DOES NOT like (absence data). However, there are few or no databases recording the absence of species
Data needs for building SDM
Presence/absence (e.g. ANN)
Presence-only (e.g. BIOCLIM)
Presence/pseudo-absence (e.g. GARP)
Presence/background (e.g. Maxent)
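The difference between background points and pseudo-absences can be sketched as two sampling rules. This is an illustration under stated assumptions, not how GARP or Maxent draw their samples internally: the 10 x 10 study area, the presence cells and both function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def sample_background(cells: Sequence[Cell], n: int, seed: int = 0) -> List[Cell]:
    """Background: draw n cells from the whole study area, presence cells included."""
    rng = random.Random(seed)
    return rng.sample(list(cells), n)


def sample_pseudo_absences(
    cells: Sequence[Cell], presences: Set[Cell], n: int, seed: int = 0
) -> List[Cell]:
    """Pseudo-absences: draw n cells only from cells with no presence record."""
    candidates = [c for c in cells if c not in presences]
    rng = random.Random(seed)
    return rng.sample(candidates, n)


# Hypothetical 10 x 10 study area with three presence cells.
cells = [(r, c) for r in range(10) for c in range(10)]
presences = {(1, 1), (2, 3), (5, 5)}

background = sample_background(cells, 20, seed=42)
pseudo_absences = sample_pseudo_absences(cells, presences, 20, seed=42)
```

Neither sample is a true absence: a cell without records may simply be unsurveyed.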
There are several sources of primary biodiversity data (PBD)
GBIF is the largest data platform, with the widest taxonomic and geographic coverage: ≈650 million records of more than 1.6 million species
Monitoring
Specialized literature
Scientific collections & herbaria
Distributed databases
However, occurrence primary biodiversity data (PBD) are subject to errors and biases from different sources
The two main errors in PBD for SDM are:
Taxonomic
When the identity of the species in the record is wrong
Brown Howler Monkey (Alouatta palliata)
Black Howler Monkey (Alouatta pigra)
Geographic
When the position of the record is wrong
Myocastor coypus
8 mi. S of San Marcial, Tucumán, Argentina
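Some geographic errors can be caught with simple screening of the coordinates attached to a record. The sketch below is one possible set of checks, not GBIF's own validation: the flag names and the function are hypothetical. It does not cover textual localities such as the one above, which first need georeferencing with an associated uncertainty.

```python
from typing import List, Optional


def coordinate_flags(lat: Optional[float], lon: Optional[float]) -> List[str]:
    """Return a list of hypothetical quality flags for one record's coordinates."""
    if lat is None or lon is None:
        return ["missing_coordinates"]
    flags: List[str] = []
    if not -90.0 <= lat <= 90.0:
        flags.append("latitude_out_of_range")
    if not -180.0 <= lon <= 180.0:
        flags.append("longitude_out_of_range")
    # Exactly (0, 0) is often a placeholder rather than a real locality.
    if lat == 0.0 and lon == 0.0:
        flags.append("zero_coordinates")
    return flags


# Arbitrary illustrative values, not the coordinates of any real record.
clean = coordinate_flags(-26.8, -65.2)
suspect = coordinate_flags(0.0, 0.0)
```

A record with no flags is not necessarily correct. Such checks only remove the most obvious positional errors.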
There are other important biases of biodiversity data for SDM
Geographic/ecologic bias
Geographic/taxonomic/temporal completeness
Errors and biases have an important impact on SDMs: effect of data uncertainty
[Figure, by John Wieczorek: Thomomys bottae (T. bottae). Panels: "Model not accounting for uncertainty" and "Effect of uncertainty". Scale bar: 100 km.]
Uncertainty does have an effect on SDM and can mislead interpretation and conclusions
As a consequence, a substantial share of the data downloaded from GBIF is currently being discarded before analysis
Source: a survey of 136 experts by the DM-GBIF Task Group
Data quality makes all the difference in SDM
Modeling algorithms are quite powerful and can model anything. But they are dumb: algorithms do not make decisions, they only follow orders
GLM Model
Maxent Model
We make decisions, so we are responsible for model outputs and interpretation
The Future of SDM and data requirements
Kearney & Porter. 2009. Ecol. Letters 12: 334-350
Coupling with other approaches that demand an integration of different sorts of data: physiological data
[Figure: abundance (ind/route, axis 0 to 60) versus distance to centroid (axis 0 to 10), with fitted curve y = 49.107 x^(-1.9078); R² = 0.5745, P < 0.001, n = 48]
© Dick Cannings
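The fit reported in the abundance figure can be evaluated directly. The sketch below assumes the flattened expression is a power law, y = 49.107 x^(-1.9078), with x the distance to the centroid and y the abundance in individuals per route. The function name is hypothetical, and the curve is only meaningful within the plotted range of distances.

```python
def predicted_abundance(distance: float, a: float = 49.107, b: float = -1.9078) -> float:
    """Evaluate y = a * x**b for a positive distance to the niche centroid."""
    if distance <= 0:
        raise ValueError("distance to centroid must be positive")
    return a * distance ** b


# Predicted abundance at a few illustrative distances within the plotted range.
curve = {d: predicted_abundance(d) for d in (1.0, 2.0, 4.0, 8.0)}
```

Because the exponent is negative, predicted abundance falls steeply as a site's conditions move away from the centroid.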
The Future of SDM and data requirements
Coupling with other approaches that demand an integration of different data realms: population data
Martínez-Meyer et al. 2013. Biol Lett 9: 20120637
Anderson et al. 2009. Proc. Roy. Soc. B 276: 1415-1420
The Future of SDM and data requirements
Coupling with other approaches that demand an integration of different data realms: dispersal data
Araújo & Luoto. 2007. GEB 16: 743-753
The Future of SDM and data requirements
Coupling with other approaches that demand an integration of different data realms: interaction networks
Challenges
Occurrence data are the cornerstone of SDM. Quantity, but mainly quality, of biodiversity data is key for producing reliable models. In their current form, at least three issues limit a faster development of SDM:
1. Data issues: inaccuracies (geographic, taxonomic and temporal errors and biases)
2. Accessibility issues: full information on records is often missing or hard to obtain because data-processing functionality is inefficient
3. Use issues: careless or inappropriate use of data, and insufficient knowledge of the taxonomic group, the data and/or the modeling methods
Stakeholders of PBD (i.e., data providers, data aggregators and data users) need a clear understanding of their roles and duties in PBD circulation and use, as well as more fluent communication among them, to address and overcome these challenges
Challenges
SDM progress is seriously hindered by:
a. The biases and gaps in the available biological information
b. The lack of reliable absence data and the use of background or pseudo-absences as an alternative
c. The use of sophisticated and complex modelling methods that are able to model the errors and biases in the response variables themselves
d. The absence of an agreed “gold-standard” procedure to validate model outputs
e. The lack of information about the survey effort carried out at each locality
f. The lack of discrimination between poorly and relatively well-surveyed localities
g. Our inability to determine the location of the minimum number of localities required to generate reliable interpolations that recover the full spectrum of environmental conditions in a territory
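Points e and f can be illustrated with a crude proxy: counting records per grid cell and separating cells above a threshold from the rest. This is only a sketch under strong assumptions, since record counts are not a true measure of survey effort. The 1-degree cell size, the threshold, the coordinates and the function names are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def cell_of(lat: float, lon: float, size_deg: float = 1.0) -> Cell:
    """Index of the grid cell containing a coordinate, for square cells of size_deg."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def records_per_cell(records: Iterable[Tuple[float, float]], size_deg: float = 1.0) -> Counter:
    """Count occurrence records falling in each grid cell."""
    return Counter(cell_of(lat, lon, size_deg) for lat, lon in records)


def well_surveyed(counts: Counter, min_records: int) -> Set[Cell]:
    """Cells whose record count reaches a chosen, arbitrary threshold."""
    return {cell for cell, n in counts.items() if n >= min_records}


# Hypothetical (latitude, longitude) pairs for five records.
records = [(10.2, -84.1), (10.7, -84.9), (10.5, -84.5), (11.3, -84.2), (-3.4, 20.6)]
counts = records_per_cell(records)
better_cells = well_surveyed(counts, min_records=3)
```

In practice the threshold would have to be justified, for example with completeness estimates, rather than chosen by hand.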
A Proposal
[Diagram: the GBIF strategy (GBIF data) and the SDM strategy (SDMs), linked by model validation and gaps identification]
• To provide a “seal of quality” for the data
• An agreement on the criteria determining data quality
Conclusions
SDM / ENM is a popular research field that is still expanding and offers fertile ground for generating basic and applied knowledge
The present and future of SDM are quite exciting but demand a major effort from all stakeholders to make PBD more abundant, more accessible and more reliable
Integration of PBD with other data realms opens new opportunities to take biodiversity analyses to the next level. Standards and protocols for interconnection with other data sources are therefore needed, as well as new and creative methodologies for their analysis