Page 1
Lauren Bennett, Flora Vale, Alberto Nieto
Beyond Where: Modeling Spatial Relationships and Making Predictions
esriurl.com/spatialstats
Page 2
Modeling Relationships
when variables are related, information can be learned about one variable by observing the values of the related variable(s)
Page 3
Explore correlations
Predict unknown values
Understand key factors
Page 4
Correlation: 0.992558
Divorce Rate in Maine vs Per Capita Consumption of Margarine
tylervigen.com
Page 5
Correlation: 0.666004
Number People Who Drowned by Falling into a Swimming-Pool vs Number of Nicolas Cage Films
tylervigen.com
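Strong correlations like these can come from nothing more than a shared trend. Here is a minimal sketch with two made-up series that both decline over the same period. The numbers are hypothetical and are not the tylervigen.com data.

```python
import numpy as np

# Two hypothetical series that both decline over the same five years.
# They have no causal link; they only share a downward trend.
series_a = np.array([10.0, 9.0, 8.0, 7.0, 6.0])
series_b = np.array([5.0, 4.6, 4.1, 3.7, 3.2])

# Pearson correlation coefficient between the two series
r = np.corrcoef(series_a, series_b)[0, 1]
print(f"correlation: {r:.4f}")
```

The correlation is close to 1 even though neither series explains the other.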
Page 6
http://xkcd.com/552/
Page 7
GeneralizedLinearRegression
Modeling linear relationships
Page 8
a statistical process for estimating linear relationships between variables
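In standard notation, a linear regression model with n explanatory variables can be written as shown below. The symbols are an assumption, because the slides introduce the terms one at a time. Here y is the dependent variable, x₁ … xₙ are the explanatory variables, β₀ … βₙ are the coefficients, and ε is the residual:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon
```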
Page 10
Dependent Variable
What are you trying to predict or understand?
Page 11
Explanatory Variables
Variables you believe to cause or explain the dependent variable
Page 12
Coefficients
Represent the strength and type of relationship that X has to y
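Coefficients are typically estimated by ordinary least squares. This minimal sketch uses hypothetical data generated from a known equation, so you can check the recovered coefficients. `numpy.linalg.lstsq` solves the least squares problem.

```python
import numpy as np

# Hypothetical data generated exactly as y = 2 + 0.5 * X,
# so the fitted coefficients should be 2 (intercept) and 0.5 (slope).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * X

# A column of ones lets the model estimate an intercept (beta_0).
design = np.column_stack([np.ones_like(X), X])

# Ordinary least squares: coefficients that minimize the squared residuals.
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slope = coefficients
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

A positive slope is a positive relationship: y rises as X rises. A negative slope would mean y falls as X rises.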
Page 13
Coefficients
Positive relationship: as obesity rates rise, diabetes rates also rise
Page 14
Coefficients
Negative relationship: as foreclosure rates rise, home values drop
Page 15
Coefficients
No relationship: the value for X is not correlated with the value for y
Page 16
Residual
Model over- and under-predictions
Page 17
Residual
[Figure: the residual is the gap between an observed value and its predicted value]
Page 18
Residual
[Figure: an over-prediction]
Page 19
Residual
the difference between the observed value (y) and the predicted value (ŷ): ε = y − ŷ
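The residual is computed feature by feature as observed minus predicted. A negative residual means the model over-predicted, and a positive residual means it under-predicted. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical observed values and a model's predictions for the same features.
observed = np.array([3.0, 5.0, 4.0, 6.0])
predicted = np.array([3.5, 4.5, 4.0, 5.0])

# Residual (epsilon) = observed - predicted
residuals = observed - predicted

for obs, pred, res in zip(observed, predicted, residuals):
    if res < 0:
        label = "over-prediction"   # model predicted too high
    elif res > 0:
        label = "under-prediction"  # model predicted too low
    else:
        label = "exact"
    print(f"observed={obs}, predicted={pred}, residual={res:+.1f} ({label})")
```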
Page 21
Evaluating our model
Page 22
Every variable should be statistically significant *
Page 23
Each variable should tell a different part of the story
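Redundancy among explanatory variables (multicollinearity) is commonly measured with the variance inflation factor (VIF). To compute it, regress each variable on all the others and take 1 / (1 − R²). This is a minimal sketch with hypothetical variables. The slide does not name a specific check, so treating VIF as the test is an assumption.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (rows = features, columns = explanatory variables).

    Each column is regressed on the remaining columns plus an intercept, and
    VIF = 1 / (1 - R^2). Large values flag a variable that repeats information
    already carried by the others.
    """
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical variables: x2 nearly copies x1, x3 is unrelated to both.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = 2.0 * x1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vif)
```

x1 and x2 tell the same part of the story, so both get very large VIFs. x3 stays low.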
Page 24
Residuals should not be clustered in location or in value
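One common way to check whether residuals cluster in location is global Moran's I computed on the residuals. The slide names no specific test, so this choice is an assumption. In this minimal sketch, a hypothetical spatial weights matrix marks six features along a line as neighbors of the features next to them.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for `values` given an n x n spatial weights matrix.

    Values well above the expected -1/(n-1) mean similar residuals sit
    next to each other (clustering); strongly negative values mean
    neighbors tend to differ (dispersion).
    """
    z = values - values.mean()
    n = len(values)
    s0 = weights.sum()
    return (n / s0) * (z @ weights @ z) / (z @ z)

# Hypothetical weights: six features on a line, adjacent features are neighbors.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])     # over-predictions on one end
alternating = np.array([-1.0, 1.0, -1.0, 1.0, -1.0, 1.0])   # no spatial pattern of that kind
print(morans_i(clustered, w), morans_i(alternating, w))
```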
Page 26
Model should have a strong R-Squared
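R-squared is the share of the variance in the dependent variable that the model explains: 1 minus the residual sum of squares divided by the total sum of squares. A minimal sketch with hypothetical values:

```python
import numpy as np

def r_squared(observed, predicted):
    """1 - SSE/SST: share of the variance in `observed` explained by `predicted`."""
    sse = np.sum((observed - predicted) ** 2)
    sst = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - sse / sst

observed = np.array([2.0, 4.0, 6.0, 8.0])
good_fit = np.array([2.1, 3.9, 6.2, 7.8])
mean_only = np.full(4, observed.mean())  # a model that always predicts the mean
print(r_squared(observed, good_fit), r_squared(observed, mean_only))
```

A model that only predicts the mean explains nothing (R-squared of 0). The close fit scores near 1.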
Page 27
When comparing models, lower AICc is better
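For a Gaussian (ordinary least squares) model, AICc adds a small-sample correction to AIC. This sketch assumes that k counts the regression coefficients plus the error variance. Software packages differ in these constants, so AICc values are only comparable between models fit to the same dependent variable. The data are hypothetical.

```python
import numpy as np

def aicc_ols(y, design):
    """Corrected Akaike Information Criterion for an OLS fit with Gaussian errors."""
    n, p = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    k = p + 1  # coefficients plus the error variance
    aic = n * np.log(2 * np.pi * rss / n) + n + 2 * k  # -2 log-likelihood + 2k
    return aic + 2 * k * (k + 1) / (n - k - 1)

x = np.arange(1.0, 9.0)
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.9, 8.1])
ones = np.ones_like(x)

intercept_only = aicc_ols(y, ones[:, None])
with_x = aicc_ols(y, np.column_stack([ones, x]))
print(f"intercept only: {intercept_only:.1f}, with x: {with_x:.1f}")
```

The model that includes x has the lower AICc, so it is the better of the two.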
Page 28
Exploratory Regression
Page 30
GeographicallyWeightedRegression
Exploring spatial variation
Page 31
each feature gets a separate equation
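Each feature's equation can be pictured as a weighted regression in which nearby features count more than distant ones. This sketch shows one local fit using a Gaussian distance kernel and a fixed distance bandwidth. The kernel choice, the helper name `local_coefficients`, and the data are assumptions for illustration, not the tool's implementation.

```python
import numpy as np

def local_coefficients(coords, X, y, target, bandwidth):
    """Weighted least squares fit for one feature (a GWR-style local equation).

    Weights decay with distance from feature `target` via a Gaussian kernel;
    `bandwidth` is a distance in the same units as `coords`.
    Returns [intercept, slope] for that feature.
    """
    distances = np.linalg.norm(coords - coords[target], axis=1)
    weights = np.exp(-0.5 * (distances / bandwidth) ** 2)
    design = np.column_stack([np.ones(len(y)), X])
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary least squares.
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef

# Hypothetical: ten features along a line; y rises 1 per unit of X in the
# western half and 3 per unit of X in the eastern half.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0])
slopes = np.where(coords[:, 0] < 5, 1.0, 3.0)
y = slopes * X

west = local_coefficients(coords, X, y, target=0, bandwidth=1.0)
east = local_coefficients(coords, X, y, target=9, bandwidth=1.0)
print(f"west slope={west[1]:.2f}, east slope={east[1]:.2f}")
```

A single global regression would blend the two slopes into one. The local fits recover a different slope at each end.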
Page 41
Defining local
Neighborhood Type
• Number of neighbors
• Distance band
Neighborhood Selection Method
• Golden search
• Manual intervals
• User defined
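Golden search narrows a search interval step by step to find the neighborhood size (a number of neighbors or a distance band) that optimizes a model criterion such as AICc. This is a minimal sketch of golden-section search that assumes the criterion has a single minimum over the interval. The scoring function is a hypothetical stand-in, because computing GWR's AICc does not fit in a short example. For a number-of-neighbors search, you would also round the result to an integer.

```python
import math

def golden_section_search(score, low, high, tol=1e-4):
    """Return the value in [low, high] that minimizes a unimodal `score`.

    Each step shrinks the interval by the golden ratio and reuses one
    interior point, so only one new score evaluation is needed per step.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = score(c), score(d)
    while b - a > tol:
        if fc < fd:   # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(c)
        else:         # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(d)
    return (a + b) / 2

# Hypothetical stand-in for "AICc at a given distance band", lowest at 1200.
best = golden_section_search(lambda bw: (bw - 1200.0) ** 2, 100.0, 5000.0)
print(round(best))
```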
Page 43
Local R-Squared
Condition Number
Residuals
Predictions
Coefficients
Page 44
Three model types
• Gaussian – continuous
• Logistic – binary
• Poisson – count
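In the standard generalized linear model formulation, each model type pairs the linear predictor (β₀ + β₁x₁ + …) with a link function: identity for Gaussian, logit for logistic, and log for Poisson. This minimal sketch shows how the linear predictor maps to the modeled value. The function and its string arguments are hypothetical, not an ArcGIS API.

```python
import math

def expected_value(linear_predictor, model_type):
    """Map a linear predictor to the modeled mean using each family's canonical link."""
    if model_type == "gaussian":   # continuous: identity link
        return linear_predictor
    if model_type == "logistic":   # binary: inverse logit gives the probability of a 1
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    if model_type == "poisson":    # count: inverse log gives the expected count
        return math.exp(linear_predictor)
    raise ValueError(f"unknown model type: {model_type}")

for family in ("gaussian", "logistic", "poisson"):
    print(family, expected_value(0.5, family))
```

The logistic output always falls between 0 and 1, and the Poisson output is always positive. These are the ranges a probability and a count need.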
Page 45
Gaussian – model a continuous variable
Sales profits
Healthcare spending
Mortality rate
Temperature
Page 46
Poisson – model a count variable
Number of people with cancer per 10,000
Traffic accidents
Crime counts
Sales per month
Page 47
Logistic – model a binary variable
Fire damage
Pass/fail inspection
Disease presence
Insurance fraud
Page 49
LocalBivariateRelationships
Examining relationships across space
Page 50
“… the measurement of a relationship depends on where the measurement is taken.”
Page 51
for two variables, Y and X: determine relationship significance and type, and examine relationships across geography
Page 52
… what does it mean for two variables to be related to each other?
Page 55
“good” relationships: low entropy
null relationships: higher entropy
Page 56
variable Y
low entropy
Page 57
variable Y
high entropy
Page 59
variable Y, variable X: mutual information
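For discrete (binned) values, entropy and mutual information can be sketched with the plug-in formulas H = −Σ p log₂ p and I(Y; X) = H(Y) + H(X) − H(Y, X). This illustrates the concepts only. The slides point to a minimum spanning tree based estimate, so it is an assumption that binning resembles what the tool does. The values are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(y, x):
    """I(Y; X) = H(Y) + H(X) - H(Y, X), estimated from discrete values."""
    joint = [f"{a}|{b}" for a, b in zip(y, x)]  # each (y, x) pair as one joint label
    return entropy(y) + entropy(x) - entropy(joint)

x = [0, 0, 1, 1, 2, 2, 3, 3]
y_related = [0, 0, 1, 1, 2, 2, 3, 3]     # Y determined by X: low joint entropy
y_unrelated = [0, 1, 0, 1, 0, 1, 0, 1]   # Y tells nothing about X
print(mutual_information(y_related, x), mutual_information(y_unrelated, x))
```

When Y is determined by X, the joint entropy is low and the mutual information is high. When Y is unrelated to X, the mutual information is zero.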
Page 60
minimum spanning trees for low-entropy and higher-entropy data
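A minimum spanning tree connects every point with the shortest possible total edge length. One family of entropy estimators builds on that total length. Points that lie tightly along a pattern give a shorter tree, and so a lower entropy estimate, than points scattered over the same range. The slides do not give the tool's exact formula, so this sketch computes only the tree length, using Prim's algorithm on hypothetical (x, y) pairs.

```python
import numpy as np

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    # Distance from each point to the nearest point already in the tree.
    best = np.linalg.norm(points - points[0], axis=1)
    total = 0.0
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        total += candidates[j]
        in_tree[j] = True
        best = np.minimum(best, np.linalg.norm(points - points[j], axis=1))
    return total

line = np.array([[0, 0], [1, 1], [2, 2], [3, 3]], dtype=float)       # tight pattern
scattered = np.array([[0, 3], [1, 0], [2, 3], [3, 0]], dtype=float)  # same range, no pattern
print(mst_length(line), mst_length(scattered))
```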
Page 61
…how do we know if the relationship is SIGNIFICANT???
Page 62
Permutation-based distribution estimation
Page 64
each feature has two values: Y (dependent) and X (explanatory)
Page 65
the relationship between Y and X is evaluated for a neighborhood
Page 67
neighborhood distribution of variables → minimum spanning tree → entropy value: 0.56
Page 68
what is the probability of observing this entropy if Y and X are actually independent of each other?
Page 69
observed local entropy: 0.56
Page 70
start a permutation (observed entropy: 0.56)
Page 71
the dependent (Y) values are shuffled, while the explanatory (X) values are kept the same
Page 73
minimum spanning trees and entropy are calculated for the permutation: 0.64 (observed entropy: 0.56)
Page 75
permutations: repeating the shuffle gives a permutation entropy distribution for the neighborhood (0.61, 0.64, 0.57, …; observed entropy: 0.56)
Page 76
the observed entropy (0.56) is compared with the permutation entropies and converted to a p-value
p-value = 0.0012
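The permutation test on these slides can be sketched as follows. The `statistic` argument stands in for the tool's MST-based entropy, which is not reproduced here. This example passes a hypothetical stand-in, 1 − |correlation|, which is also low when Y tracks X. Computing the p-value as (count + 1) / (permutations + 1) is a common convention and an assumption here, because the slides do not say how the tool counts.

```python
import numpy as np

def permutation_p_value(y, x, statistic, permutations=999, seed=0):
    """Share of permutations with a statistic at or below the observed one.

    Y is shuffled while X stays fixed, which simulates Y and X being
    independent. Low values (like low entropy) indicate a stronger relationship.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(y, x)
    as_extreme = sum(
        statistic(rng.permutation(y), x) <= observed for _ in range(permutations)
    )
    return (as_extreme + 1) / (permutations + 1)

def one_minus_abs_corr(y, x):
    """Hypothetical stand-in statistic: near 0 when Y tracks X, near 1 when unrelated."""
    return 1.0 - abs(np.corrcoef(y, x)[0, 1])

x = np.arange(20, dtype=float)
y = 3.0 * x + 1.0  # hypothetical: Y fully determined by X
p = permutation_p_value(y, x, one_minus_abs_corr)
print(p)
```

A small p-value means few or no shuffled arrangements produced a value as low as the observed one. That pattern is unlikely if Y and X were independent.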
Page 77
entropy helps determine if relationships are significant, but it does not tell us what type of relationship exists
Page 78
classifying the local relationships
Page 79
• Positive Linear
• Negative Linear
• Convex
• Concave
• Undefined Complex
• Not Significant
Page 80
start with the distribution of X and Y values in the significant feature’s neighborhood
Page 81
estimate an ordinary linear regression model (Y vs. X) and calculate AICc
Page 82
estimate a second linear regression model using the square of the explanatory variable (Y vs. square of X)
Page 83
AICc values are compared, and a model is selected
• linear model (Y vs. X): AICc -42.3
• quadratic model (Y vs. square of X): AICc -51.4
lower AICc is better, so in this example the quadratic model is selected
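The model comparison can be sketched as follows. The sketch assumes the second model includes an intercept, X, and X², as a standard quadratic regression does; the slide mentions only the square of X. It uses a Gaussian AICc in which k counts the coefficients plus the error variance. The neighborhood values are hypothetical.

```python
import numpy as np

def aicc(y, design):
    """Gaussian AICc for an OLS fit; returns (AICc, coefficients)."""
    n, p = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    k = p + 1  # coefficients plus the error variance
    aic = n * np.log(2 * np.pi * rss / n) + n + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1), coef

# Hypothetical neighborhood values with a U-shaped (convex) pattern.
x = np.linspace(-2.0, 2.0, 15)
y = x ** 2 + 0.1 * np.sin(7 * x)  # small deterministic wiggle keeps RSS above zero

ones = np.ones_like(x)
linear_aicc, linear_coef = aicc(y, np.column_stack([ones, x]))
quad_aicc, quad_coef = aicc(y, np.column_stack([ones, x, x ** 2]))

# Lower AICc wins.
chosen = "quadratic" if quad_aicc < linear_aicc else "linear"
print(f"linear AICc={linear_aicc:.1f}, quadratic AICc={quad_aicc:.1f}, chosen={chosen}")
```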
Page 84
is the R² value greater than 0.05?
• No → Undefined Complex
• Yes → which model was chosen?
  • linear → sign of coefficient? positive: Positive Linear; negative: Negative Linear
  • quadratic → sign of coefficient? positive: Convex; negative: Concave
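The slide's decision tree can be written as a small function. The sketch assumes that, for the quadratic model, the sign is read from the coefficient on the squared term, because the slide says only "sign of coefficient". The function name is hypothetical.

```python
def classify_relationship(r_squared, chosen_model, coefficient):
    """Label a significant local relationship following the slide's decision tree.

    `coefficient` is the slope for the linear model, or (assumed) the
    coefficient on the squared term for the quadratic model.
    """
    if r_squared <= 0.05:
        return "Undefined Complex"
    if chosen_model == "linear":
        return "Positive Linear" if coefficient > 0 else "Negative Linear"
    if chosen_model == "quadratic":
        return "Convex" if coefficient > 0 else "Concave"
    raise ValueError(f"unexpected model: {chosen_model}")

print(classify_relationship(0.42, "quadratic", 1.3))
```

Features that fail the entropy significance test never reach this function. They are labeled Not Significant.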
Page 85
entropy → classify type
Page 94
Positive Linear
Negative Linear
Concave
Convex
Undefined Complex
Not Significant
Page 96
"Essentially, all models are wrong, but some are useful."
- George E. P. Box
Page 97
Want to learn more???
esriurl.com/spatialstats
TUESDAY
1:45p Data Visualization for Spatial Analysis 146C
3:00p Machine Learning in ArcGIS 146C
4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
WEDNESDAY
8:30a Machine Learning in ArcGIS 146C
11a Data Visualization for Spatial Analysis 146C
1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C
4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C
5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C
Please fill out a course survey!!!