2013_cse-srs_poster
Methods
If the geographical space defined by a given leaf has enough utility, a model is run on the environmental and occurrence data within that space. Utility is defined as the ratio of the number of observed species locations to the total size of the geographic space.
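The utility check can be sketched in a few lines of Python. This is a minimal illustration, not the poster's implementation: the function names, the use of area as the measure of "total geographic space", and the `min_utility` threshold are all assumptions made for the example.

```python
def utility(n_occurrences: int, area: float) -> float:
    """Ratio of observed occurrence count to the size of the region.

    `area` is an assumed measure of the geographic space (for example,
    a number of grid cells); the poster does not specify the unit.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    return n_occurrences / area


def has_enough_utility(n_occurrences: int, area: float, min_utility: float) -> bool:
    """Decide whether a leaf qualifies for a base model.

    `min_utility` is a hypothetical threshold; the poster gives no value.
    """
    return utility(n_occurrences, area) >= min_utility
```

For instance, under these assumptions a leaf covering 100 cells with 5 occurrences has utility 0.05 and would qualify for a base model at a threshold of 0.01.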
Once all the base models have generated predictions, these predictions are aggregated, with each base model's predictions weighted by the relative size of the geographic extent used to train the model. The final prediction at each location is the weighted average of the predictions of all base models at that location. Our work uses Maximum Entropy (Maxent) modeling as the base model.
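The aggregation step for a single location can be sketched as below. This is an illustrative sketch under stated assumptions: the function name is hypothetical, extents are given as plain numbers, and a base model that does not cover the location is marked with `None` and skipped (the poster does not say how uncovered locations are handled).

```python
from typing import Optional, Sequence


def aggregate_predictions(
    predictions: Sequence[Optional[float]],
    extents: Sequence[float],
) -> Optional[float]:
    """Weighted average of base-model predictions at one location.

    predictions[i] is the prediction of base model i, or None if that
    model does not cover the location (an assumption of this sketch).
    extents[i] is the size of the geographic extent used to train model i.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for prediction, extent in zip(predictions, extents):
        if prediction is None:
            continue  # this base model says nothing about the location
        weighted_sum += extent * prediction
        total_weight += extent
    if total_weight == 0.0:
        return None  # no base model covers the location
    # Dividing by the total weight makes the weights relative extents.
    return weighted_sum / total_weight
```

With two base models predicting 0.2 and 0.8 from extents of size 1 and 3, the model trained on the larger extent dominates and the aggregate is about 0.65.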
[Figure: workflow diagram with stages Partition Data (HDDT), Generate Base Model (model training on partitioned data: occurrences + environment), Aggregate Predictions, and Overall Predictions.]
We propose a model that recursively partitions the geographic space into regions appropriately sized as input into local or “base” models, and then aggregates the weighted predictions of the base models as the final prediction.
We use a Hellinger Distance Decision Tree (HDDT) to recursively partition the space, with each leaf of the tree defining a particular geographic space. As a skew-insensitive method, the HDDT is effective even when the number of species observations is small.
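To illustrate why a Hellinger distance criterion is insensitive to skew, the sketch below scores a candidate split from per-branch class counts. It follows the standard two-class Hellinger distance used as a decision tree splitting criterion; the function name and the encoding of presences and background points as two count lists are assumptions of this example, not details taken from the poster.

```python
import math
from typing import Sequence


def hellinger_split_score(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hellinger distance between the class-conditional branch distributions.

    pos_counts[j] and neg_counts[j] are the numbers of positive (e.g. observed)
    and negative (e.g. unobserved) points falling into branch j of a split.
    Each class is normalized by its own total, so the class ratio cancels out.
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # the score is undefined without both classes; treat as no separation
    squared = sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    )
    return math.sqrt(squared)
```

The score ranges from 0 (both classes distributed identically across branches) to the square root of 2 (perfect separation). Scaling the negative counts by any factor, for example from [2, 8] to [200, 800], leaves the score unchanged, which is the property that makes the criterion robust when observed locations are rare.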
Model
Together, these conditions define the variables of the geographical space G. Species distribution models are correlative methods that estimate the area with suitable abiotic conditions for a species, known as G_A, based upon observed locations. Often neglected, however, is the implicit effect of the size of the geographic space itself, which, if too large, is likely to produce a drastic imbalance in the number of observed locations versus unobserved ones and, if too small, may represent only a fraction of the species' suitable conditions. Our work focuses on reducing these effects by ensuring that input data comes from a reasonable geographic extent.
The geographical distribution of a species is defined by the confluence of three factors: biotic conditions, abiotic conditions, and movement conditions. Each is elaborated upon in the following diagram.
Background
Species: Vireo bellii
Data Source: GBIF
The method outperforms the base models by all metrics measured, including AUROC and AUPR. The predictions also display visibly greater fine-grained detail. Shown below is a species distribution prediction generated by the partitioning method for the North American songbird Bell’s Vireo.
Results
Knowledge of the potential distributions of species is important for the continued development of conservation strategies. One method to assist in this process is species distribution modeling, which models species' niche requirements by combining occurrence data with ecological and environmental variables. We develop a method of making species distribution models more robust by partitioning the geographic extent, whose size can vary significantly. Decision trees are used to recursively partition the extent, with local predictions aggregated from many base models. The method improves upon state-of-the-art techniques.
Abstract
Data, Inference Analytics, and Learning Lab @ ND
Reid A. Johnson, Nitesh V. Chawla
Computer Science and Engineering
University of Notre Dame
Recursively Partitioning the Geographic Space Using Decision Trees
Species Distribution Modeling