environmental modeling advanced weighting of gis layers

Environmental Modeling

Advanced Weighting of GIS Layers

1. Issue

► Modeling the habitat of the red squirrel in the Mt. Graham area

► Red squirrels prefer a shaded and humid environment and feed on pine cones, both of which Mt. Graham offers

► The issue is whether the construction of an astronomy observatory will affect the habitat

Pereira, J.M.C., and R.M. Itami, 1991. GIS-based habitat modeling using logistic multiple regression: a study of the Mt. Graham Red Squirrel. Photogrammetric Engineering and Remote Sensing, 57(11):1475-1486.

2. Factors

a. Topography:

Elevation

Slope

Aspect (e-w)

Aspect (n-s)

b. Vegetation:

Land cover

Canopy closure

Food productivity

Tree diameter

Distance to openness (canopy closure and roads)

3. Raw Data

DEM

Vegetation cover

Roads

200 presence sites (observed)

200 absence sites (randomly located)

4. Logistic Regression

► The dependent variable is dichotomous (on/off, 1/0, presence/absence)

► The independent variables can be numeric (ratio data), categorical (nominal data), ranked (ordinal data), or interval (interval data)

► The method is widely used in natural resources and human impact related projects

At each of the 400 locations, collect both the dependent and the independent variables

4. LR - dependent variable

► Dependent variable: presence/absence

► A total of 400 sites

for the 200 presence sites: dependent value = 1

for the 200 absence sites: dependent value = 0

4. LR - independent variables

► Independent variables (14)

the continuous variables (1-5, ratio data)

1. elevation

2. slope

3. aspect (e-w)

4. aspect (n-s)

5. distance to openness (buffer to roads or to land cover)

4. LR - circular variables

Variables 3 and 4: aspect is a circular variable. To differentiate its circular values, divide it into e-w and n-s components, or use sin or cos.

45°~135° (e) vs. 225°~315° (w)

-45°~45° (n) vs. 135°~225° (s)

sin 0° = 0, cos 0° = 1

sin 90° = 1, cos 90° = 0

sin 180° = 0, cos 180° = -1

sin 270° = -1, cos 270° = 0
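The sin/cos option can be sketched in a few lines of Python. This is a minimal illustration, assuming aspect is measured in degrees clockwise from north (the usual GIS convention, which matches the sin and cos values listed above); the function name `encode_aspect` is hypothetical and the original study may have coded aspect differently.

```python
import math

def encode_aspect(aspect_deg):
    """Split a circular aspect value into two linear components.

    Assumes aspect is in degrees clockwise from north, so that
    sin gives the east-west component (east = +1, west = -1) and
    cos gives the north-south component (north = +1, south = -1).
    """
    theta = math.radians(aspect_deg % 360.0)
    return math.sin(theta), math.cos(theta)

# The four cardinal directions from the slide
for aspect in (0, 90, 180, 270):
    ew, ns = encode_aspect(aspect)
    print(f"aspect {aspect:3d}: e-w = {ew:.2f}, n-s = {ns:.2f}")
```

With this coding, aspects of 359° and 1° get nearly identical values, which a raw 0-360 scale would not give.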

Extract Distance Info

1. Calculate the distance

The vector way

use point-to-line distance, or point-to-point distance

The raster way

use "distance" in Spatial Analyst
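For the vector way, the point-to-line distance can be sketched as follows. This is a minimal pure-Python illustration, assuming planar (projected) coordinates and a road stored as a list of (x, y) vertices; the function names and the sample coordinates are hypothetical, and a GIS package would normally do this step.

```python
import math

def point_to_segment(px, py, ax, ay, bx, by):
    """Shortest planar distance from point P to the segment A-B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: A and B coincide (point-to-point distance)
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_polyline(px, py, vertices):
    """Shortest distance from point P to a polyline given as (x, y) vertices."""
    return min(
        point_to_segment(px, py, *a, *b)
        for a, b in zip(vertices, vertices[1:])
    )

# Hypothetical road and site location, in map units
road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(distance_to_polyline(5.0, 3.0, road))
```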

4. LR - categorical independent variables

The categorical independent variables 6-14 (nominal, ordinal, or interval data)

6-8. Food productivity

9-11. Canopy closure

12-14. Tree diameter

4. LR - categorical independent variables

Food productivity: variables 6-8

► four categories: high, medium, low, none

► each variable is 1 or 0

► for sites that have a high productivity, high = 1, medium = 0, low = 0

► for sites that have a medium productivity, high = 0, medium = 1, low = 0

► ......

Only three of the four categories appear as variables in the regression. The remaining one is used as a reference.
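The 1/0 coding above can be sketched in Python. This is a minimal illustration, assuming "none" is the reference category (consistent with the three variables high, medium, and low); the function name `dummy_code` is hypothetical.

```python
# The three categories that become regression variables.
# "none" is the reference category and gets no variable of its own.
LEVELS = ("high", "medium", "low")

def dummy_code(category):
    """Return the three 1/0 variables for one site's category."""
    if category != "none" and category not in LEVELS:
        raise ValueError(f"unknown category: {category!r}")
    return {level: int(category == level) for level in LEVELS}

print(dummy_code("high"))    # {'high': 1, 'medium': 0, 'low': 0}
print(dummy_code("medium"))  # {'high': 0, 'medium': 1, 'low': 0}
print(dummy_code("none"))    # {'high': 0, 'medium': 0, 'low': 0}
```

A site in the reference category has all three variables set to 0, so the coefficients of the other categories are read relative to it.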

4. LR - categorical independent variables

Canopy closure: variables 9-11

► Four categories: high, medium, low, and none

► Three variables: high, medium, low

Tree dbh: variables 12-14

► Four categories: > 25 cm, 15-25 cm, 0-15 cm, no trees

► Three variables

5. Statistical Testing

► t test for continuous independent variables

for each variable, say elevation

H_0: mean1 = mean2

► χ² test for categorical independent variables, say food productivity

four categories

observed count, expected count

► Example: land cover types of the area and at bear sighting sites

Cover type            %Area   Expected#   Actual#
Douglas fir           10.1    9.2         7
Subalpine fir         10.2    9.3         10
Whitebark pine        2.2     1.5         8
Mountain hemlock      3.8     3.5         5
Pacific silver fir    8.4     7.7         4
Western hemlock       10.1    9.2         7
Hardwood forest       1.2     1.1         0
Tall shrub            4.9     4.5         4
Lowland herb          8.5     7.7         12
…                     …       …           …
Total                 100%    91          91
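The χ² statistic compares observed and expected counts category by category. The sketch below is a minimal illustration using only the nine cover types listed in the table; because the table is truncated, the printed value is the contribution of those rows only, not the full test statistic. The function name `chi_square` is hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Actual# and Expected# for the nine cover types shown in the table.
# The remaining rows are elided in the source, so this is a partial sum.
observed = [7, 10, 8, 5, 4, 7, 0, 4, 12]
expected = [9.2, 9.3, 1.5, 3.5, 7.7, 9.2, 1.1, 4.5, 7.7]

partial = chi_square(observed, expected)
print(f"chi-square contribution of the listed rows: {partial:.2f}")
```

The full statistic would be compared against a χ² distribution with (number of categories - 1) degrees of freedom.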

6. Data Partition

► Data partition for model development and model validation

► 75% of sites are used to develop the logistic model

150 presence sites and 150 absence sites

► 25% for model validation

50 presence sites and 50 absence sites
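A 75/25 partition that keeps presence and absence sites balanced can be sketched as below. This is a minimal illustration, assuming the sites are assigned at random within each group (the slides do not say how sites were assigned); the site IDs, seeds, and the function name `split_sites` are hypothetical.

```python
import random

def split_sites(site_ids, train_fraction=0.75, seed=0):
    """Randomly split one group of sites into development and validation sets."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for the 200 presence and 200 absence sites
presence_ids = [f"P{i}" for i in range(200)]
absence_ids = [f"A{i}" for i in range(200)]

# Split each group separately so both sets stay half presence, half absence
train_p, valid_p = split_sites(presence_ids, seed=1)
train_a, valid_a = split_sites(absence_ids, seed=2)

print(len(train_p), len(valid_p), len(train_a), len(valid_a))  # 150 50 150 50
```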

7. The Logistic Model

► The logistic model is most sensitive to the mid-range values of an independent variable

Y = b_0 + b_1 X_1 + b_2 X_2 + … + b_n X_n

P(Y) = 1/[1 + exp(-Y)]

7. The Logistic Model

Y = 0.002 ele - 0.228 slope + 0.685 canopy(high) + 0.443 canopy(medium) + 0.481 canopy(low) + 0.009 aspect(e-w)

P(Y) = 1/[1 + exp(-Y)]

P - The probability of red squirrel habitat
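The fitted model can be evaluated for a single site as sketched below. The coefficients are the ones listed above; everything else is an assumption for illustration: the function name `habitat_probability`, the treatment of canopy "none" as the reference category with weight 0, and the sample input values. The slides do not state the units of elevation and slope or the coding of aspect (e-w), so the inputs must use whatever units the model was fitted with.

```python
import math

# Coefficients from the fitted model; "none" is assumed to be the
# reference category and therefore contributes 0.
CANOPY_COEF = {"high": 0.685, "medium": 0.443, "low": 0.481, "none": 0.0}

def habitat_probability(elevation, slope, canopy, aspect_ew):
    """Return P, the modeled probability of red squirrel habitat."""
    y = (0.002 * elevation
         - 0.228 * slope
         + CANOPY_COEF[canopy]
         + 0.009 * aspect_ew)
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical site values, for illustration only
p = habitat_probability(elevation=1000.0, slope=10.0, canopy="high", aspect_ew=0.0)
print(f"P = {p:.3f}")
```

When Y = 0 the function returns exactly 0.5, and P changes fastest around that point, which is why the model is most sensitive to mid-range values.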

8. Accuracy Assessment

► Decide a cut-off value for P

The convention is 0.5

► Convert the P values into two categories

site value < 0.5: unsuitable

site value ≥ 0.5: suitable

Error Matrix

                    True Category
Mapped Category     Primary   Secondary   Total
Oak >= 50%          15        5           20
Oak < 50%           3         52          55
Total               18        57          75

Accuracy of the oak forest map: (15 + 52)/75 = 89%


► Error Matrix

for the 150 presence and 150 absence sites that are used to develop the logistic model

              Modeled
Truth         presence   absence   total   accuracy
presence      123        27        150     82%
absence       36         114       150     76%
total                              300

Overall accuracy = (123 + 114)/300 = 79%
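The accuracy figures can be reproduced from the error matrix with a short Python sketch. The function name `accuracy_report` is hypothetical; rows are the true class and columns the modeled class, in the order presence, absence.

```python
def accuracy_report(matrix):
    """Per-class and overall accuracy from a square error matrix.

    matrix[i][j] is the number of sites whose true class is i
    and whose modeled class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    per_class = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    return per_class, correct / total

# Error matrix for the 300 model-development sites
per_class, overall = accuracy_report([[123, 27],
                                      [36, 114]])
print([f"{a:.0%}" for a in per_class], f"{overall:.0%}")  # ['82%', '76%'] 79%
```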

8. Model Validation

► Error Matrix

for the 50 presence and 50 absence sites that are put aside for model validation

              Modeled
Truth         presence   absence   total   accuracy
presence      37         13        50      74%
absence       16         34        50      68%
total                              100

Overall accuracy = (37 + 34)/100 = 71%

9. GIS Overlay

Y = elevation * 0.002 + slope * (-0.228) + canopy closure (assign 0.685 for all cells = high, 0.443 for cells = medium, 0.481 for cells = low) + aspect (e-w) * 0.009

P(Y) = 1/[1 + exp(-Y)]

► Keep the output as a continuous probability map or a suitable/unsuitable map
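The overlay applies the same equation cell by cell across aligned raster layers. The sketch below is a minimal pure-Python illustration using nested lists as rasters; the 2 x 2 grids and their values are hypothetical, canopy "none" is assumed to carry a weight of 0, and the 0.5 cut-off follows the convention given under Accuracy Assessment. In practice this step would be done with a raster calculator in the GIS.

```python
import math

# Weights assigned to canopy closure cells; "none" is assumed to be 0
CANOPY_WEIGHT = {"high": 0.685, "medium": 0.443, "low": 0.481, "none": 0.0}

def overlay(elevation, slope, canopy, aspect_ew, cutoff=0.5):
    """Return (probability map, suitable/unsuitable map) for aligned rasters."""
    prob_map, suitable_map = [], []
    for e_row, s_row, c_row, a_row in zip(elevation, slope, canopy, aspect_ew):
        p_row = []
        for e, s, c, a in zip(e_row, s_row, c_row, a_row):
            y = e * 0.002 + s * (-0.228) + CANOPY_WEIGHT[c] + a * 0.009
            p_row.append(1.0 / (1.0 + math.exp(-y)))
        prob_map.append(p_row)
        # Cells at or above the cut-off are classed as suitable
        suitable_map.append([p >= cutoff for p in p_row])
    return prob_map, suitable_map

# Hypothetical 2 x 2 layers, for illustration only
elevation = [[1000.0, 800.0], [500.0, 0.0]]
slope = [[10.0, 25.0], [5.0, 0.0]]
canopy = [["high", "low"], ["medium", "none"]]
aspect_ew = [[1.0, -1.0], [0.0, 0.0]]

prob_map, suitable_map = overlay(elevation, slope, canopy, aspect_ew)
for p_row, s_row in zip(prob_map, suitable_map):
    print([f"{p:.2f}" for p in p_row], s_row)
```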
