University at Albany School of Public Health, EPI 621: Geographic Information Systems and Public Health. Introduction to Smoothing and Spatial Regression. Glen Johnson, PhD, Lehman College / CUNY School of Public Health.
![Page 1: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/1.jpg)
University at Albany School of Public Health
EPI 621, Geographic Information Systems and Public Health
Glen Johnson, PhD
Lehman College / CUNY School of Public Health
Introduction to Smoothing and Spatial Regression
![Page 2: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/2.jpg)
Consider points distributed in space
“Pure” point process: only coordinates locating some “events”.
Set of points, S = {s1, s2, … , sn}
Examples include
• location of burglaries
• location of disease cases
• location of trees, etc.
Points represent locations of something that is measured. Values of a random variable, Z, are observed for a set S of locations, such that the set of measurements is Z(s) = {Z(s1), Z(s2), … , Z(sn)}
Examples include
• cases and controls (binary outcome) identified by location of residence
• population-based counts (integer outcome) tied to geographic centroids
• PCBs measured in mg/kg (continuous outcome) in soil cores taken at specific point locations
![Page 3: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/3.jpg)
Example of a Pure Point Process: Baltimore Crime Events
Question: How to interpolate a smoothed surface that shows varying “intensity” of the points?
(source: http://www.people.fas.harvard.edu/~zhukov/spatial.html)
![Page 4: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/4.jpg)
From: Cromley and McLafferty. 2002. GIS and Public Health.
Kernel Density Estimation
![Page 5: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/5.jpg)
Kernel Density Estimation
Estimate “intensity” of events at regular grid points as a function of nearby observed events. General formula for any point x is:
$$\hat{\lambda}(x) = \frac{1}{h^{2}} \sum_{i=1}^{n} k\!\left(\frac{x - x_i}{h}\right)$$
where the x_i are the “observed” points for i = 1, …, n locations in the study area, and k(·) is a kernel function that assigns decreasing weight to observed points as their distance from x approaches the bandwidth h. Points that lie beyond the bandwidth h are given zero weight.
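As a rough illustration of the estimator described here, the following sketch sums kernel weights from every observed event at each grid point and scales by 1/h². The quartic kernel, the function names `quartic_kernel` and `kernel_intensity`, and the toy coordinates are all assumptions made for this example; the slides do not say which kernel or software routine was used.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic (biweight) kernel on the unit disk; u has shape (..., 2).

    Weight decreases with distance and is exactly zero once the scaled
    distance exceeds 1, i.e. beyond the bandwidth.
    """
    r2 = np.sum(u ** 2, axis=-1)
    return np.where(r2 <= 1.0, (3.0 / np.pi) * (1.0 - r2) ** 2, 0.0)

def kernel_intensity(grid, events, h):
    """Intensity at each grid point: (1 / h^2) * sum_i k((x - x_i) / h)."""
    # Scaled differences between every grid point and every event.
    diffs = (grid[:, None, :] - events[None, :, :]) / h  # (n_grid, n_events, 2)
    return quartic_kernel(diffs).sum(axis=1) / h ** 2

# Toy data (hypothetical): three events, two grid points.
events = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
grid = np.array([[0.0, 0.0], [5.0, 5.0]])
intensity = kernel_intensity(grid, events, h=0.5)
```

The second grid point lies farther than h from every event, so its estimated intensity is zero. Widening h would spread each event's contribution over a larger area and give a smoother surface.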
![Page 6: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/6.jpg)
Baltimore Crime Locations (Kernel Density)
Results from Kernel Density Smoothing in R
[Figure: four kernel density surfaces, with Bandwidth = 0.007, 0.05, 0.1 and 0.15; color scale from 0 to 160000]
![Page 7: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/7.jpg)
Source: http://spatialityblog.com/2011/09/29/spatial-analysis-of-nyc-bikeshare-maps/
Kernel Density Surface of Bike Share Locations in NYC
![Page 8: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/8.jpg)
Examples of Values Observed at Point Locations, Z(s) :
Question: How to interpolate a smoothed surface that captures variation in Z(s)?
![Page 9: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/9.jpg)
First, consider “deterministic” approaches to spatial interpolation:
• Deterministic models do not acknowledge uncertainty.
• Only real advantage is simplicity; good for exploratory analysis
• Several options, all with limitations. We will consider Inverse Distance Weighted (IDW) because of its common usage.
![Page 10: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/10.jpg)
Inverse Distance Weighted Surface Interpolation
Define search parameters
Define power of distance-decay function
Interpolate value at point $s_0$ as
$$Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i)$$
for $n$ neighboring observed values $Z(s_i)$, where the weight
$$\lambda_i = \frac{d_{0,i}^{-p}}{\sum_{j=1}^{n} d_{0,j}^{-p}}$$
for distance $d_{0,i}$ between $s_0$ and $s_i$, and power $p$.
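A minimal sketch of inverse distance weighting, assuming the weights are inverse distances raised to the power p and normalized to sum to one. The function name `idw_predict`, the `n_neighbors` search parameter, and the toy values are hypothetical choices for illustration, not part of the original slides.

```python
import numpy as np

def idw_predict(s0, sites, values, p=2.0, n_neighbors=None):
    """Inverse-distance-weighted prediction at location s0.

    sites: (n, 2) coordinates of observed points; values: (n,) observations.
    p: power of the distance-decay function.
    n_neighbors: use only the nearest n_neighbors sites (None = use all).
    """
    d = np.linalg.norm(sites - s0, axis=1)
    # At an observed location, return the observation itself (avoids 1/0).
    if np.any(d == 0.0):
        return values[np.argmin(d)]
    nearest = np.argsort(d)[:n_neighbors]
    w = d[nearest] ** -p
    return np.sum(w * values[nearest]) / np.sum(w)

# Toy data (hypothetical): two observations on a line.
sites = np.array([[0.0, 0.0], [2.0, 0.0]])
values = np.array([10.0, 20.0])
z_hat = idw_predict(np.array([0.5, 0.0]), sites, values, p=2.0)
```

With p = 2, the nearer site at distance 0.5 gets nine times the weight of the site at distance 1.5, so the prediction is pulled toward 10. A larger p makes the surface more local; a smaller p makes it flatter.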
![Page 11: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/11.jpg)
Illustration: Tampa Bay sediment total organic carbon
![Page 12: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/12.jpg)
True “geostatistical” models assume the data, Z(S) = {Z(s1), Z(s2), … , Z(sn)}, are a partial realization of a random field.
Note that the set of locations S is a subset of some 2-dimensional spatial domain D, which is itself a subset of the real plane.
![Page 13: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/13.jpg)
General Protocol:
1. Characterize properties of spatial autocorrelation through variogram modeling;
2. Predict values for spatial locations where no data exist, through Kriging.
![Page 14: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/14.jpg)
A semivariogram is defined as
$$\gamma(h) = \frac{1}{2} E\left[\left(Z(s) - Z(s+h)\right)^{2}\right]$$
for distance h between the two locations, and is estimated as
$$\hat{\gamma}(h_j) = \frac{1}{2 n_h} \sum_{i=1}^{n_h} \left(Z(s_i) - Z(s_i + h_j)\right)^{2}$$
for $n_h$ pairs separated by distance $h_j$ (called a “lag”).
After repeating for different lags, say j = 1, …, 10, the semivariance can be plotted as a function of distance.
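The estimator can be sketched as follows: for each lag, collect the pairs whose separation falls in that lag's distance bin, then take half the mean squared difference. This is an omnidirectional version (no direction or angular tolerance), and the function name `empirical_semivariogram`, the bin edges, and the toy transect are assumptions for illustration only.

```python
import numpy as np

def empirical_semivariogram(sites, z, lag_edges):
    """Half the mean squared difference of z over pairs in each distance bin.

    sites: (n, 2) coordinates; z: (n,) observed values.
    lag_edges: increasing bin edges; bin j covers (lag_edges[j], lag_edges[j+1]].
    Returns (gamma, counts); gamma is NaN for bins that contain no pairs.
    """
    i, j = np.triu_indices(len(z), k=1)  # each unordered pair once
    dist = np.linalg.norm(sites[i] - sites[j], axis=1)
    sqdiff = (z[i] - z[j]) ** 2
    n_bins = len(lag_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist > lag_edges[b]) & (dist <= lag_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = sqdiff[in_bin].sum() / (2 * counts[b])
    return gamma, counts

# Toy data (hypothetical): four points on a transect with a linear trend.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
z = np.array([0.0, 1.0, 2.0, 3.0])
gamma, counts = empirical_semivariogram(sites, z, lag_edges=[0.5, 1.5, 2.5, 3.5])
```

Here the semivariance keeps rising with lag because the toy values contain a trend. Plotting `gamma` against the bin midpoints gives the empirical semivariogram to which a model would then be fitted.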
![Page 15: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/15.jpg)
Given any location si, all other locations are treated as within distance h if they fall within a search window defined by the direction, lag h, angular tolerance and bandwidth.
Adapted from Waller and Gotway. Applied Spatial Statistics for Public Health. Wiley, 2004.
![Page 16: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/16.jpg)
Example semivariogram cloud for pairwise differences (red dots), with the average semivariance for each lag (blue +) and a fitted semivariogram model (solid blue line)
![Page 17: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/17.jpg)
Characteristics of a semivariogram
Range = the distance within which positive spatial autocorrelation exists
Nugget = spatial discontinuity + observation error
Sill = maximum semivariance
![Page 18: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/18.jpg)
If the variogram form does not depend on direction, the spatial process is isotropic. If it does depend on direction, it is anisotropic.
Multiple semivariograms for different directions. Note that the parameter that changes is the range.
Surface map of semivariance shows values more similar in NW-SE direction and more different in SW-NE direction.
![Page 19: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/19.jpg)
Kriging then uses semivariogram model results to define weights used for interpolating values where no data exist. The result is called the “Best Linear Unbiased Predictor”. The basic form is
$$Z(s_0) = \sum_{i=1}^{p} \lambda_i Z(s_i)$$
where the $\lambda_i$ assign weights to the $p$ neighboring values according to semivariogram modeling that defines a distance-decay relation within the range, beyond which the weight goes to zero.
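To make the role of the semivariogram concrete, here is a minimal sketch of ordinary kriging at a single location. It assumes an exponential semivariogram with no nugget and uses every observation rather than a search neighborhood; the model, its parameters, and the names `exp_semivariogram` and `ordinary_kriging` are illustrative assumptions, not the method or software used in the course.

```python
import numpy as np

def exp_semivariogram(h, sill=1.0, rng=1.0):
    """Exponential semivariogram model with zero nugget (assumed for illustration)."""
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def ordinary_kriging(s0, sites, z, gamma=exp_semivariogram):
    """Ordinary kriging prediction at s0; returns (prediction, weights).

    Solves the kriging system in which the weights sum to one:
        [ Gamma  1 ] [ lambda ]   [ gamma_0 ]
        [ 1^T    0 ] [   m    ] = [    1    ]
    where Gamma holds semivariances between observed sites and gamma_0
    holds semivariances between each observed site and s0.
    """
    n = len(z)
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(sites - s0, axis=1))
    solution = np.linalg.solve(A, b)
    weights = solution[:n]  # last entry is the Lagrange multiplier m
    return weights @ z, weights

# Toy data (hypothetical): three observations.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = np.array([1.0, 2.0, 3.0])
z_hat, weights = ordinary_kriging(np.array([0.3, 0.3]), sites, z)
```

Unlike inverse distance weighting, the weights here come from the fitted spatial autocorrelation structure, and with no nugget the predictor reproduces the observed value at an observed location.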
![Page 20: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/20.jpg)
Several variations of Kriging:
• Simple (assumes known mean)
• Ordinary (assumes constant mean, though unknown) [our focus this week]
• Universal (non-stationary mean)
• Cokriging (prediction based on more than one inter-related spatial process)
• Indicator (probability mapping based on binary variable) [you will see in the lab work]
• Block (areal prediction from point data)
• And other variations …
![Page 21: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/21.jpg)
Example of two types of Kriging for California O3:
1. Ordinary Kriging (Detrended, Anisotropic)
- continuous surface
2. Indicator Kriging
- probability isolines
![Page 22: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/22.jpg)
What if point locations are centroids of polygons and the value Z(si) represents aggregation within polygon i ?
![Page 23: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/23.jpg)
With polygon data, we can still define neighbors as some function of Euclidean distance between polygon centroids, as we do for point-level data,
but now we have other ways to define neighbors and their weights …
![Page 24: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/24.jpg)
Defining spatial “Neighborhoods”
Raster or Lattice:
• Rook
• Queen, 1st order
• Queen, 2nd order
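The lattice neighborhoods can be sketched in a few lines. This example assumes that rook neighbors share an edge, queen neighbors share an edge or a corner, and that an order-k neighborhood is cumulative (it includes all lower orders), which is one common reading; some software instead reports only the outer ring. The function name `lattice_neighbors` and its arguments are hypothetical.

```python
def lattice_neighbors(row, col, n_rows, n_cols, rule="queen", order=1):
    """Cells within `order` rook or queen moves of cell (row, col) on a grid.

    rule="rook":  moves across shared edges only (Manhattan distance).
    rule="queen": moves across shared edges or corners (Chebyshev distance).
    The neighborhood is cumulative: order 2 includes the order-1 cells.
    """
    if rule not in ("rook", "queen"):
        raise ValueError("rule must be 'rook' or 'queen'")
    neighbors = []
    for dr in range(-order, order + 1):
        for dc in range(-order, order + 1):
            if dr == 0 and dc == 0:
                continue  # a cell is not its own neighbor
            if rule == "rook":
                steps = abs(dr) + abs(dc)
            else:
                steps = max(abs(dr), abs(dc))
            r, c = row + dr, col + dc
            # Keep cells that are close enough and inside the grid.
            if steps <= order and 0 <= r < n_rows and 0 <= c < n_cols:
                neighbors.append((r, c))
    return neighbors

# Neighbors of the center cell of a 5 x 5 lattice.
rook_1 = lattice_neighbors(2, 2, 5, 5, rule="rook", order=1)
queen_1 = lattice_neighbors(2, 2, 5, 5, rule="queen", order=1)
queen_2 = lattice_neighbors(2, 2, 5, 5, rule="queen", order=2)
```

Cells on the edge of the lattice have fewer neighbors than interior cells, which matters when the neighbor lists are turned into spatial weights.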
![Page 25: Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny](https://reader036.vdocuments.us/reader036/viewer/2022062501/56816163550346895dd0ee14/html5/thumbnails/25.jpg)
Spatial Regression Modeling as a method for both
• assessing the effects of covariates
and
• smoothing a response variable