Environmental Modeling: Advanced Weighting of GIS Layers
1. Issue
► Modeling the habitat of the red squirrel in the Mt. Graham area
► Red squirrels prefer a shaded and humid environment and feed on pine cones, both of which Mt. Graham offers
► The issue is whether the construction of an astronomy observatory will affect the habitat
Pereira, J.M.C., and R.M. Itami, 1991. GIS-based habitat modeling using logistic multiple regression: a study of the Mt. Graham Red Squirrel. Photogrammetric Engineering and Remote Sensing, 57(11):1475-1486.
2. Factors
a. Topography:
   Elevation
   Slope
   Aspect (e-w)
   Aspect (n-s)
b. Vegetation:
   Land cover
   Canopy closure
   Food productivity
   Tree diameter
Distance to openness (canopy closure and roads)
3. Raw Data
DEM
Vegetation cover
Roads
200 presence sites (observed)
200 absence sites (randomly located)
4. Logistic Regression
► The dependent variable is dichotomous (on/off, 1/0, presence/absence)
► The independent variables can be numeric (ratio or interval data), ranked (ordinal data), or categorical (nominal data)
► The method is widely used in natural resources and human impact related projects
At each of the 400 locations, collect both the dependent and the independent variables
4. LR - dependent variable
► Dependent variable: presence/absence
► A total of 400 sites
for the 200 presence sites: dep value = 1
for the 200 absence sites: dep value = 0
4. LR - independent variables
► Independent variables (14)
the continuous variables (1-5, ratio data)
1. elevation
2. slope
3. aspect (e-w)
4. aspect (n-s)
5. distance to openness (buffer to roads or to land cover)
4. LR - circular var
Variables 3 and 4: aspect is a circular variable. To differentiate its circular values, divide it into e-w | n-s, or use sin or cos.
45°~135° (e) vs. 225°~315° (w)
-45°~45° (n) vs. 135°~225° (s)
sin 0° = 0, cos 0° = 1
sin 90° = 1, cos 90° = 0
sin 180° = 0, cos 180° = -1
sin 270° = -1, cos 270° = 0
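A minimal sketch of the sin/cos option, assuming aspect is measured in degrees clockwise from north (the usual GIS convention, not stated on the slide). The function name `aspect_components` and the labels "eastness" and "northness" are illustrative, not taken from the source.

```python
import math

def aspect_components(aspect_deg):
    """Return (eastness, northness) for an aspect in degrees clockwise from north."""
    rad = math.radians(aspect_deg)
    eastness = math.sin(rad)   # +1 facing east (90), -1 facing west (270)
    northness = math.cos(rad)  # +1 facing north (0), -1 facing south (180)
    return eastness, northness

# Aspects of 359 and 1 degrees are numerically far apart but nearly identical
# in direction; the two components reflect that.
for deg in (0, 90, 180, 270, 359, 1):
    e, n = aspect_components(deg)
    print(f"{deg:>3} deg  eastness={e:+.3f}  northness={n:+.3f}")
```

The two components can then stand in for aspect (e-w) and aspect (n-s) as continuous variables.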
Extract Distance Info
1. Calculate the distance
The vector way
use point-to-line distance, or point-to-point distance
The raster way
use “distance” in Spatial Analyst
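The vector way can be sketched as a point-to-line calculation in planar coordinates. This is an illustration only, not the GIS tool's implementation; the function names and the sample road are hypothetical, and a road is assumed to be a polyline with at least two vertices.

```python
import math

def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Shortest planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_polyline(px, py, vertices):
    """Distance from P to a road given as a list of (x, y) vertices."""
    return min(
        point_to_segment_distance(px, py, ax, ay, bx, by)
        for (ax, ay), (bx, by) in zip(vertices, vertices[1:])
    )

road = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]  # hypothetical road, map units
print(distance_to_polyline(50.0, 30.0, road))
```

Running this for each of the 400 sites would give the distance value for variable 5; the raster way produces the same kind of value for every cell at once.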
4. LR - categorical ind var
The categorical ind variables 6-14 (nominal, ordinal, or interval data)
6-8. Food productivity
9-11. Canopy closure
12-14. Tree diameter
4. LR - categorical ind var
Food productivity: variable 6-8
► four categories: high, medium, low, none
► each is 1 or 0
► for sites that have a high productivity, high = 1; for the same site, medium = 0, low = 0
► for sites that have a medium productivity, high = 0, medium = 1, low = 0
► ...
Only three of the four variables will show in the regression. The remaining one is used as a reference
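The 1/0 coding described above can be sketched as follows. The helper name `dummy_code` is hypothetical, and treating "none" as the reference category is an assumption based on the three variables (high, medium, low) that the slides list.

```python
def dummy_code(value, levels, reference):
    """Code one categorical value as 0/1 indicators, omitting the reference level."""
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}

levels = ["high", "medium", "low", "none"]
for site_value in levels:
    print(site_value, dummy_code(site_value, levels, reference="none"))
```

A site in the reference category gets 0 for all three indicators, which is why a fourth variable would carry no extra information.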
4. LR - categorical ind var
Canopy closure: variable 9-11
► Four categories: high, medium, low, and none
► Three variables: high, medium, low
Tree dbh: variable 12-14
► Four categories: > 25cm, 15-25cm, 0-15cm, no trees
► Three variables
5. Statistical Testing
► t test for continuous ind variables
for each variable, say elevation
H0: mean1 = mean2
► χ² test for categorical ind variables, say food productivity
four categories
observed count, expected count
► Land cover types of the area and at bear sighting sites
Cover type            %Area   Expected#   Actual#
Douglas Fir            10.1      9.2         7
Subalpine fir          10.2      9.3        10
Whitebark pine          2.2      1.5         8
Mountain hemlock        3.8      3.5         5
Pacific silver fir      8.4      7.7         4
Western hemlock        10.1      9.2         7
Hardwood forest         1.2      1.1         0
Tall shrub              4.9      4.5         4
Lowland herb            8.5      7.7        12
…                       …        …           …
Total                  100%      91         91
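To show how the observed and expected counts feed the χ² test, the sketch below computes each listed cover type's contribution, (observed - expected)² / expected. The table is truncated in the source, so the sum here covers only the listed rows and is not the full test statistic. Variable and function names are illustrative.

```python
# (expected count, actual count) for the rows listed in the table.
# The source table is truncated, so these rows do not add up to the total of 91.
listed_rows = {
    "Douglas Fir": (9.2, 7),
    "Subalpine fir": (9.3, 10),
    "Whitebark pine": (1.5, 8),
    "Mountain hemlock": (3.5, 5),
    "Pacific silver fir": (7.7, 4),
    "Western hemlock": (9.2, 7),
    "Hardwood forest": (1.1, 0),
    "Tall shrub": (4.5, 4),
    "Lowland herb": (7.7, 12),
}

def chi_square_contribution(expected, observed):
    """One category's term in the chi-square sum."""
    return (observed - expected) ** 2 / expected

contributions = {
    name: chi_square_contribution(expected, observed)
    for name, (expected, observed) in listed_rows.items()
}
partial_chi_square = sum(contributions.values())  # listed rows only

for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20}{value:8.3f}")
print(f"{'Partial sum':<20}{partial_chi_square:8.3f}")
```

Categories with large contributions are the ones where sightings depart most from what the area share alone would predict.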
6. Data Partition
► Data partition for model development and model validation
► 75% of sites are used to develop the logistic model
150 presence sites and 150 absence sites
► 25% for model validation
50 presence sites and 50 absence sites
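One way to produce such a partition is a random split done separately for presence and absence sites, so that each group keeps the 75/25 proportion. The site identifiers, the seeds, and the function name `split_sites` below are hypothetical; the slides do not say how the sites were actually assigned.

```python
import random

def split_sites(site_ids, development_fraction=0.75, seed=0):
    """Shuffle the sites and split them into development and validation sets."""
    shuffled = list(site_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]

presence_ids = [f"P{i:03d}" for i in range(1, 201)]  # 200 hypothetical presence sites
absence_ids = [f"A{i:03d}" for i in range(1, 201)]   # 200 hypothetical absence sites

dev_presence, val_presence = split_sites(presence_ids, seed=1)
dev_absence, val_absence = split_sites(absence_ids, seed=2)
print(len(dev_presence), len(dev_absence), len(val_presence), len(val_absence))
```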
7. The Logistic Model
► The logistic model is sensitive to the middle range values of an ind var
Y = b_0 + b_1 X_1 + b_2 X_2 + … + b_n X_n
P(Y) = 1/[1 + exp(-Y)]
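A short sketch of why the middle range matters, reading the bullet above as a statement about the shape of the curve: the slope of P with respect to Y is P(1 - P), which peaks at Y = 0 (P = 0.5) and shrinks toward both tails. The function names are illustrative.

```python
import math

def logistic(y):
    """P(Y) = 1 / [1 + exp(-Y)]"""
    return 1.0 / (1.0 + math.exp(-y))

def logistic_slope(y):
    """dP/dY = P * (1 - P): how strongly P responds to a small change in Y."""
    p = logistic(y)
    return p * (1.0 - p)

for y in (-6, -3, 0, 3, 6):
    print(f"Y={y:+d}  P={logistic(y):.4f}  dP/dY={logistic_slope(y):.4f}")
```

The same change in an independent variable therefore moves P much more when the site sits near the middle of the curve than when P is already close to 0 or 1.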
7. The Logistic Model
Y = 0.002 ele - 0.228 slope + 0.685 canopy(high) + 0.443 canopy(medium) + 0.481 canopy(low) + 0.009 aspect(e-w)
P(Y) = 1/[1 + exp(-Y)]
P - The probability of red squirrel habitat
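As a worked illustration, the fitted coefficients can be applied to a single site. The site values and their units below are hypothetical, the dictionary keys are names chosen for this sketch, and no intercept is used because the slide does not show one.

```python
import math

# Coefficients as given on the slide (no intercept is shown there)
COEFFICIENTS = {
    "ele": 0.002,
    "slope": -0.228,
    "canopy_high": 0.685,
    "canopy_medium": 0.443,
    "canopy_low": 0.481,
    "aspect_ew": 0.009,
}

def habitat_probability(site):
    """Return P for a site given as {variable name: value}; missing variables count as 0."""
    y = sum(coef * site.get(name, 0.0) for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical site: only canopy_high is set to 1, the other canopy dummies default to 0
site = {"ele": 3000.0, "slope": 10.0, "canopy_high": 1, "aspect_ew": 0.0}
print(f"P = {habitat_probability(site):.3f}")
```

For this site Y = 0.002(3000) - 0.228(10) + 0.685 = 4.405, so P is close to 1.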
8. Accuracy Assessment
► Decide a cut-off value for P
The convention is 0.5
► Convert the P values into two categories
site value < 0.5: unsuitable
site value ≥ 0.5: suitable
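The conversion is a one-line rule. The sketch below uses a hypothetical function name `classify` and made-up P values; the 0.5 default follows the convention stated above.

```python
def classify(p, cutoff=0.5):
    """Convert a probability into one of two categories."""
    return "suitable" if p >= cutoff else "unsuitable"

for p in (0.12, 0.49, 0.5, 0.93):  # hypothetical site probabilities
    print(p, classify(p))
```

Keeping the cut-off as a parameter makes it easy to see how the error matrix changes if a value other than 0.5 is chosen.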
8. Accuracy Assessment
Error Matrix
                     True Category
Mapped Category    Primary   Secondary   Total
Oak >= 50%            15         5         20
Oak < 50%              3        52         55
Total                 18        57         75
Accuracy of the oak forest map: (15 + 52)/75 = 89%
8. Accuracy Assessment
► Error Matrix
for the 150 presence and 150 absence sites that are used to develop the logistic model
                      Modeled
Truth        presence   absence   total   accuracy
presence        123        27      150      82%
absence          36       114      150      76%
total                              300
Overall accuracy = (123 + 114)/300 = 79%
8. Model Validation
► Error Matrix
for the 50 presence and 50 absence sites that are put aside for model validation
                      Modeled
Truth        presence   absence   total   accuracy
presence         37        13       50      74%
absence          16        34       50      68%
total                              100
Overall accuracy = (37 + 34)/100 = 71%
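The accuracy figures in the development and validation error matrices can be checked with a small helper. The function and argument names are illustrative; the counts are the ones reported on the slides.

```python
def accuracy_summary(presence_as_presence, presence_as_absence,
                     absence_as_presence, absence_as_absence):
    """Per-class and overall accuracy from a 2 x 2 error matrix (rows = truth)."""
    presence_total = presence_as_presence + presence_as_absence
    absence_total = absence_as_presence + absence_as_absence
    correct = presence_as_presence + absence_as_absence
    return {
        "presence": presence_as_presence / presence_total,
        "absence": absence_as_absence / absence_total,
        "overall": correct / (presence_total + absence_total),
    }

development = accuracy_summary(123, 27, 36, 114)
validation = accuracy_summary(37, 13, 16, 34)

for label, result in (("development", development), ("validation", validation)):
    print(label, {name: f"{value:.0%}" for name, value in result.items()})
```

The drop from 79% on the development sites to 71% on the held-out sites is the kind of gap the validation step is meant to reveal.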
9. GIS Overlay
Y = elevation * 0.002 + slope * (-0.228) + canopy closure (assign 0.685 for all cells = high, 0.443 for cells = medium, 0.481 for cells = low) + aspect (e-w) * 0.009
P(Y) = 1/[1 + exp(-Y)]
► Keep the output as a continuous probability map or a suitable/unsuitable map
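The overlay amounts to cell-by-cell map algebra. The sketch below uses plain nested lists in place of raster layers; the 2 x 2 grids are hypothetical, the function name `overlay` is illustrative, and giving "none" canopy cells a weight of 0 is an assumption that follows from treating "none" as the reference category. A real workflow would run the same expression in a raster calculator.

```python
import math

# Canopy closure is reclassified to its coefficient; "none" is assumed to contribute 0
CANOPY_WEIGHT = {"high": 0.685, "medium": 0.443, "low": 0.481, "none": 0.0}

def overlay(elevation, slope, canopy, aspect_ew):
    """Return a grid of P values computed cell by cell from the input grids."""
    probability = []
    for r in range(len(elevation)):
        row = []
        for c in range(len(elevation[0])):
            y = (elevation[r][c] * 0.002
                 + slope[r][c] * -0.228
                 + CANOPY_WEIGHT[canopy[r][c]]
                 + aspect_ew[r][c] * 0.009)
            row.append(1.0 / (1.0 + math.exp(-y)))
        probability.append(row)
    return probability

# Hypothetical 2 x 2 layers
elevation = [[3000.0, 2800.0], [2500.0, 0.0]]
slope = [[10.0, 35.0], [20.0, 0.0]]
canopy = [["high", "low"], ["medium", "none"]]
aspect_ew = [[0.0, 1.0], [0.0, 0.0]]

probability = overlay(elevation, slope, canopy, aspect_ew)           # continuous map
suitable = [[p >= 0.5 for p in row] for row in probability]          # two-category map
for prob_row, flag_row in zip(probability, suitable):
    print([f"{p:.3f}" for p in prob_row], flag_row)
```

The `probability` grid corresponds to the continuous output and `suitable` to the suitable/unsuitable map obtained with the 0.5 cut-off.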