crime hotspot mapping and analysis · 2011-10-20 · definition of crime hotspot a hotspot is an...
Post on 29-Jun-2020
Crime Hotspot Mapping and Analysis
Paul Zandbergen
Department of Geography
University of New Mexico

Workshop Presentation Posted
http://www.paulzandbergen.com/workshops
Outline
• Introduction
  – Definition of crime hotspots
  – Purpose of hotspot mapping
  – Crime data for hotspot mapping
• Hotspot techniques
  – Grid-based mapping
  – Local Moran’s I
  – Gi*
  – Kernel density
  – Nearest Neighbor Hierarchical clustering (NNH)
  – Spatial and Temporal Analysis of Crime (STAC)
• Hands-on demonstrations
• Comparison of hotspot techniques
• Recent developments
  – Predictive crime mapping
  – Hotspot visualization
Definition of Crime Hotspot
A hotspot is an area that has a greater than average number of criminal or disorder events, or an area where people have a higher than average risk of victimization.
• Ways to express hotspots:
  – Counts
  – Densities (counts per sq. km)
  – Rates (counts per population at risk)
Purpose of Hotspot Mapping
• Describe a large volume of data in a meaningful way
• Detect spatial and temporal patterns and trends
• Strategically allocate resources
• Test whether policies are working
• Identify underlying causes for crime events
Geocoding Crime Events
Global vs. Local Clustering
• Global clustering
  – Determines whether a pattern is clustered
  – Produces a single statistic with confidence intervals
  – e.g. Nearest Neighbor Index (NNI)
• Local clustering
  – Determines where clusters are located
  – Produces a map of clusters (“hotspots”)
  – e.g. kernel density
Methods for Global Clustering
• Methods:
  – Nearest Neighbor Index
  – K-nearest Neighbor
  – Quadrat analysis
  – Ripley’s K-function
  – Moran’s I
  – Getis-Ord General G
• Determine whether a pattern is clustered, but not where.
Example: Nearest Neighbor Index
$$\mathrm{NNI} = \frac{\text{Observed Mean Distance}}{\text{Expected Mean Distance}}$$
Expected mean distance:
$$d_e = \frac{0.5}{\sqrt{n/A}}$$
Standard error:
$$SE = \frac{0.26136}{\sqrt{n^2/A}}$$
where n is the number of points and A is the area of the study region.
Example values: n = 100, A = 100,000
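The NNI calculation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ArcGIS implementation: the function name `nearest_neighbor_index` and the four sample points are hypothetical, distances are plain Euclidean, and the z-score is taken as (observed minus expected) divided by the standard error.

```python
import math

def nearest_neighbor_index(points, area):
    """Return (NNI, z-score) for a list of (x, y) points in a study area."""
    n = len(points)
    # Observed mean distance from each point to its nearest neighbor
    # (brute force, O(n^2); fine for a small illustration).
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    d_obs = sum(nearest) / n
    d_exp = 0.5 / math.sqrt(n / area)       # expected mean distance
    se = 0.26136 / math.sqrt(n * n / area)  # standard error
    return d_obs / d_exp, (d_obs - d_exp) / se

# Hypothetical data: four points on the corners of a unit square, area = 4.
nni, z = nearest_neighbor_index([(0, 0), (1, 0), (0, 1), (1, 1)], area=4.0)
print(nni, z)  # NNI > 1 here, i.e. more dispersed than random
```

An NNI below 1 with a strongly negative z-score, as in the crime examples in this workshop, indicates clustering.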
Example: Nearest Neighbor Index
[Figures: homicide locations; homicide results from the Average Nearest Neighbor tool]
Z-scores and p-values
• P-value: probability that the observed pattern was created by a random process
• Z-score: number of standard deviations corresponding to a certain p-value
Example: Nearest Neighbor Index

Crime Type      NNI     Z-score    Conclusion
Auto burglary   0.346   -134.24    Highly clustered
Homicide        0.467   -9.17      Highly clustered
Robbery         0.345   -54.67     Highly clustered

[Figures: homicides, auto burglary, robbery]
Methods for Local Clustering
• Methods:
  – Grid-based mapping
  – Local Moran’s I
  – Gi*
  – Kernel density
  – Nearest Neighbor Hierarchical clustering (NNH)
  – Spatial and Temporal Analysis of Crime (STAC)
• Determine where clustering occurs
• Clusters of high values are called “hotspots”
Aggregated Data vs. Point Data
[Figure: examples of Aggregation and of Point Pattern Analysis]
Why Aggregation?
• Determine crime rates (e.g. # per 10,000 residents)
• Allows for different types of statistical tests
Hotspot Methods
• Aggregated data
  – Grid-based thematic mapping
  – Local Moran’s I
  – Gi*
• Point data
  – Kernel density
  – Nearest Neighbor Hierarchical clustering (NNH)
  – Spatial and Temporal Analysis of Crime (STAC)
Grid-based Thematic Mapping

Hands-on: Grid-based Thematic Mapping
• Overlay a grid over the study area
  – Several hundred meters to a few km
  – ArcGIS: Create Fishnet tool
• Generate a count of crimes per grid cell
  – ArcGIS: Spatial Join
• Remove grid cells with count = 0
  – ArcGIS: SQL (Select by Attribute, Select)
• Determine a threshold value for a “hotspot”
  – Typically a quintile classification
  – ArcGIS: data classification or SQL
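Outside ArcGIS, the same four steps can be sketched with plain Python. This is only an illustration of the logic: the function names, the 1,000 m cell size, and the sample coordinates are hypothetical, and the quintile cutoff is taken here simply as the count at the 80th-percentile position among non-empty cells.

```python
from collections import Counter

def grid_counts(points, cell_size):
    """Count points per square grid cell; empty cells never appear in the result."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

def top_quintile_cells(counts):
    """Return the cells whose count is in the highest fifth of non-empty cells."""
    ranked = sorted(counts.values())
    cutoff = ranked[(4 * len(ranked)) // 5]  # count at the 80th-percentile position
    return {cell for cell, c in counts.items() if c >= cutoff}

# Hypothetical projected coordinates in meters.
points = ([(100, 100), (1500, 100)] + [(100, 1500)] * 2
          + [(1500, 1500)] * 3 + [(2500, 2500)] * 10)
counts = grid_counts(points, cell_size=1000)
hot = top_quintile_cells(counts)
print(hot)
```

Because the grid is built only from cells that contain at least one point, the "remove cells with count = 0" step is implicit.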
Grid-based Thematic Mapping
[Figure: 2 km grid; 10 class quintile classification; highest class]

Demo: Grid-based Thematic Mapping
Local Moran’s I and Gi*
• Basic question: Are nearby features similar?
• Two approaches for aggregated data:
  1. Measuring the similarity of nearby features
     – Global: Moran’s I
     – Local: Local Moran’s I
  2. Measuring the concentration of high or low values
     – Global: General G-statistic
     – Local: Gi*
Global: Moran’s I and General G-statistic
• Global measures of spatial autocorrelation
  – i.e. a single statistic for one spatial pattern
• Significance testing
  – using Z-scores derived from the standard error
• Analysis based on normalized data
  – i.e. using densities and rates, not raw counts
• Defining what “nearby” means is critical
  – this is referred to as “spatial weights”
Moran’s I
$$I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}\,(x_i - x')(x_j - x')}{\sum_i (x_i - x')^2}$$
• x' is the mean value for all features
• i is the index for the target feature
• j is the index for the neighbor feature
• n is the number of features
• w_ij is the spatial weight for the pair
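As a small worked illustration of the Moran’s I formula, the sketch below evaluates it directly from a list of values and a spatial weights matrix. The function name `morans_i`, the four sample values, and the binary "neighbors along a line" weights are assumptions made for the example; they are not part of the workshop data.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations for every (target, neighbor) pair.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)

# Hypothetical example: four features in a row, each adjacent to the next.
values = [1.0, 2.0, 3.0, 4.0]
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
print(morans_i(values, weights))  # positive: similar values sit next to each other
```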
Expected Value for Moran’s I
• Values of I range from -1 to 1
• Random distribution: I ~ 0
• Clustered: I > 0 (positive spatial autocorrelation)
• Dispersed: I < 0 (negative spatial autocorrelation)
• Significance test is based on the standard deviation
• Z-scores are used in the reporting
Moran’s I Example
[Figure: crime density (# per sq. km)]
General G-Statistic
$$G(d) = \frac{\sum_i \sum_j w_{ij}(d)\, x_i x_j}{\sum_i \sum_j x_i x_j}, \quad j \neq i$$
• i is the index for the target feature
• j is the index for the neighbor feature
• n is the number of features
• w_ij is the spatial weight for the pair
Expected value for General G-statistic
• The General G-statistic is a relative statistic, so no conclusions can be drawn from its absolute value. This is because the sum of weights for a dataset can vary.
• The theoretical range, however, is from 0 to 1, but values close to 1 are rare.
• If the observed G-statistic is higher than the expected value, there is a concentration of high values
• If the observed G-statistic is lower than the expected value, there is a concentration of low values
$$E[G(d)] = \frac{W}{n(n-1)}$$
where W is the sum of all spatial weights.
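The comparison between the observed and expected General G can be sketched as follows. This is a minimal illustration assuming binary weights and excluding the i = j pairs; the function name `general_g` and the sample values are hypothetical.

```python
def general_g(values, weights):
    """Return (observed G, expected G) for values and an n x n weights matrix."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    num = sum(weights[i][j] * values[i] * values[j] for i, j in pairs)
    den = sum(values[i] * values[j] for i, j in pairs)
    w_total = sum(weights[i][j] for i, j in pairs)  # W, the sum of all weights
    return num / den, w_total / (n * (n - 1))

# Hypothetical example: four features in a row, each adjacent to the next.
values = [1.0, 2.0, 3.0, 4.0]
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
g_obs, g_exp = general_g(values, weights)
print(g_obs > g_exp)  # True suggests a concentration of high values
```

A full test would also need the variance of G to produce a z-score; that step is omitted here.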
General G-statistic Example
[Figure: crime density (# per sq. km)]
Spatial Neighborhoods
• Two basic types:
  – Adjacency (i.e. sharing a boundary)
  – Distance-based
• Adjacency is not as widely used, since it is very sensitive to specific local boundaries
• Many variations for distance-based neighborhoods exist
• Neighborhood relationships are stored in a spatial weights matrix
Spatial weights matrix
• Matrix for all features in a spatial dataset
  – n features produce an n x n matrix
• Values in the matrix indicate the “weight”
  – Can be 0s and 1s to indicate “no neighbor” and “neighbor”
  – Can be a value that indicates relative weights, i.e. some neighbors count more than others
• Weights can be row-normalized to account for potential biases
  – e.g. the number of neighbors may vary with the size of polygons, so you can normalize for the number of neighbors
Example: Kansas Counties
[Figure: 105 x 105 spatial weights matrix; rows are target counties and columns are neighbor counties, both labeled 001, 003, 005, …, 209]
Example: Polygon Contiguity

Regular spatial weights matrix (row for target 167):
       001   …   009     052     053     105     141     165     …   209
167    0     0   1       1       1       1       1       1       0   0

Row-standardized spatial weights matrix (row for target 167):
       001   …   009     052     053     105     141     165     …   209
167    0     0   0.167   0.167   0.167   0.167   0.167   0.167   0   0
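Row standardization divides each weight by the sum of its row, so a feature with six neighbors gives each of them a weight of 1/6 (about 0.167). A minimal sketch, with a hypothetical function name and the row for target 167 from the example above:

```python
def row_standardize(weights):
    """Divide every weight by its row sum; rows with no neighbors stay all zero."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized

# Row for target 167: six neighbors flagged with 1.
row_167 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print([round(w, 3) for w in row_standardize([row_167])[0]])
```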
Distance-based Neighborhoods
• Distance is based on centroids of polygons
• Data needs to be projected
• Fixed threshold distance
  – within the distance all neighbors count the same
• Inverse distance weighted
  – Regular or squared
  – Up to a threshold distance or to infinite (end of study area)
Example: Fixed Distance
• Distance determination uses centroids
• Threshold distance should be based on an understanding of what makes up a meaningful “neighborhood” for the variable in question
• Row standardization is recommended for fixed distances
Example: Inverse Distance Weighted
• Distance determination uses centroids
• Can be regular or squared
• Typically no threshold distance
• Row standardization is not recommended since it makes all distances relative
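The two distance-based schemes can be sketched from polygon centroids as follows. This is an illustration only: the function names, the `power` parameter (1 for regular, 2 for squared), and the three centroids are hypothetical, and coordinates are assumed to be in a projected system so that Euclidean distance is meaningful.

```python
import math

def fixed_distance_weights(centroids, threshold):
    """Weight 1 for every other feature within the threshold distance, else 0."""
    n = len(centroids)
    return [[1.0 if i != j and math.dist(centroids[i], centroids[j]) <= threshold
             else 0.0 for j in range(n)] for i in range(n)]

def inverse_distance_weights(centroids, power=1):
    """Weight 1 / d**power for every other feature; no threshold distance."""
    n = len(centroids)
    return [[0.0 if i == j else 1.0 / math.dist(centroids[i], centroids[j]) ** power
             for j in range(n)] for i in range(n)]

# Hypothetical projected centroids (same linear unit for x and y).
centroids = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
print(fixed_distance_weights(centroids, threshold=6.0))
print(inverse_distance_weights(centroids, power=2))
```

Two features with identical centroids would cause a division by zero in the inverse-distance version; real data would need a guard for that case.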
Hands-on: Moran’s I and General G
• Aggregate data to polygons
  – ArcGIS: Spatial Join
• Determine densities or rates:
  – ArcGIS: Add Field, Field Calculator
• Determine patterns
  – ArcGIS: Spatial Autocorrelation (Moran’s I)
  – ArcGIS: High/Low Clustering (Getis-Ord General G)
  – Select spatial weights

Demo: Moran’s I and General G
Local Moran’s I
• Local version of Moran’s I; for every feature:
  – a value for local Moran’s I
  – Z-score and corresponding p-value
  – type of cluster
• Spatial weights: inverse distance
• Interpretation
  – Values for Moran’s I indicate whether clustering occurs
  – Z-scores indicate whether the result is statistically significant
Local Moran’s I Results
[Figures: % of people w/ diabetes; local clustering result]
Local Moran’s I Interpretation
• Clusters:
  – HH (high-high) – cluster of high values
  – LL (low-low) – cluster of low values
• Outliers:
  – HL (high-low) – high value surrounded by low values
  – LH (low-high) – low value surrounded by high values
• Not significant
  – p-value > 0.05
• For crime data we are interested in HH clusters
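The labeling logic above can be written as a small function. This sketch assumes the p-value has already been computed by the local Moran’s I tool and compares the feature and its neighbors against the global mean; the function name, argument names, and sample numbers are hypothetical.

```python
def local_moran_cluster_type(value, neighbor_mean, global_mean, p_value, alpha=0.05):
    """Label a feature HH, LL, HL, LH, or 'Not significant'."""
    if p_value > alpha:
        return "Not significant"
    own = "H" if value > global_mean else "L"           # the feature itself
    nbr = "H" if neighbor_mean > global_mean else "L"   # its neighborhood
    return own + nbr

# Hypothetical crime densities (# per sq km).
print(local_moran_cluster_type(50.0, 45.0, 20.0, p_value=0.01))  # HH
print(local_moran_cluster_type(50.0, 5.0, 20.0, p_value=0.01))   # HL
```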
Local Moran’s I Example
[Figures: crime density (# per sq km); statistically significant high-high clusters]
Gi*
• Local version of the General G-statistic
  – a Z-score and corresponding p-value for every feature
  – no local Gi* value reported
• Spatial weights: fixed distance
• Two types:
  – Gi: does not include the target feature in the neighborhood
  – Gi*: includes the target feature in the neighborhood
Gi* Results
[Figures: % of people w/ diabetes; local clustering result]
Gi* Interpretation
• Positive Z-scores > 1.96
  – Clustering of high values
• Negative Z-scores < -1.96
  – Clustering of low values
• Z-scores between -1.96 and 1.96
  – Not significant
• For crime data we are interested in clustering of high values
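The slides do not give the Gi* formula, so the sketch below uses the commonly published z-score form of the Getis-Ord Gi* statistic and should be read as an assumption rather than as the exact ArcGIS computation. The function names, the ten sample values, and the "self plus adjacent cells" weights are hypothetical.

```python
import math

def gi_star_z(values, weights):
    """Gi* z-score per feature; weights[i][i] must be non-zero (target included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w = weights[i]
        w_sum = sum(w)
        w_sq = sum(x * x for x in w)
        num = sum(w[j] * values[j] for j in range(n)) - mean * w_sum
        den = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        z_scores.append(num / den)
    return z_scores

def classify(z, critical=1.96):
    """Apply the interpretation rules listed above."""
    if z > critical:
        return "clustering of high values"
    if z < -critical:
        return "clustering of low values"
    return "not significant"

# Hypothetical densities along a row of ten cells; neighbors are self and adjacent cells.
values = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
n = len(values)
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
z = gi_star_z(values, weights)
print(classify(z[1]), "|", classify(z[8]))
```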
Gi* Example
[Figures: crime density (# per sq km); clustering of high values]
Local Moran’s I vs. Gi*
• Both indicate spatial clustering of high crime areas
• Local Moran’s I is more robust, since it is not dependent on whether high or low values cluster
• Gi* is most meaningful if either high or low values cluster, not both
  – This is typically the case for crime densities, so Gi* is well suited for crime hotspot mapping
Hands-on: Local Moran’s I and Gi*
• Aggregate data to polygons
  – ArcGIS: Spatial Join
• Determine densities or rates:
  – ArcGIS: Add Field, Field Calculator
  – Densities: normalize by area
  – Rates: normalize by population at risk
• Determine patterns
  – ArcGIS: Cluster and Outlier Analysis (Anselin Local Moran’s I)
  – ArcGIS: Hot Spot Analysis (Getis-Ord Gi*)
  – Select spatial weights

Demo: Local Moran’s I and Gi*
Mapping Density
• Point values can be used to calculate a local density.
  – Essentially, the point values are spread out over a surface. The measured quantity of the points is distributed throughout a landscape, and a density value is calculated for each cell in the output raster.
• Apply a search radius or bandwidth
  – A circular search area is applied to each cell in the output raster being created. The search area determines the distance to search for points in order to calculate a density value for each cell in the output raster.
• Calculating density is NOT a type of interpolation.
Surface Density Calculation
When mapping density you determine the count per unit of area for a surface. This process takes a discrete set of points and creates a raster surface where each cell is given a density value. The calculation is the count divided by the area of a circle with a user-specified search radius that is centered on the cell, and it is applied to each cell in the raster.
Surface Density Calculation
The simple method for creating a density surface uses a circular search area, or neighborhood, to calculate cell values. In a density surface, individual cell values are calculated by dividing the number of features that fall within the search area (e.g., observations) by the size of the area (e.g., 2.88 acres). The resulting value is then assigned to the cell. Every cell in the surface is processed in the same way.
Density Calculations
• You can calculate density using simple or kernel calculations:
• In a simple density calculation, points that fall within the search area are summed and then divided by the search area size to get each cell’s density value.
• The kernel density calculation works the same as the simple density calculation, except the points lying near the center of a raster cell’s search area are weighted more heavily than those lying near the edge. The result is a smoother distribution of values.
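The difference between the two calculations can be sketched for a single raster cell. The slides do not name the kernel function, so the quartic kernel used here is an assumption; the function names and sample coordinates are hypothetical as well.

```python
import math

def simple_density(cell, points, radius):
    """Points inside the circular search area divided by the area of that circle."""
    inside = sum(1 for p in points if math.dist(cell, p) <= radius)
    return inside / (math.pi * radius ** 2)

def kernel_density(cell, points, radius):
    """Quartic-kernel density: points near the cell center count more (assumed kernel)."""
    total = 0.0
    for p in points:
        u = math.dist(cell, p) / radius
        if u < 1.0:
            # The 3/pi factor makes each point contribute a total volume of 1.
            total += (3.0 / math.pi) * (1.0 - u * u) ** 2
    return total / radius ** 2

cell = (0.0, 0.0)
near, far = [(0.1, 0.0)], [(1.9, 0.0)]
print(simple_density(cell, near, 2.0) == simple_density(cell, far, 2.0))  # True
print(kernel_density(cell, near, 2.0) > kernel_density(cell, far, 2.0))   # True
```

Running either function for every cell center in a grid would produce the raster surface described above.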
Density Types
[Figures: simple vs. kernel density surfaces]

Kernel Density
[Figures: kernel density examples]
Hands-on: Kernel Density Example
• Start with point data
  – Create weight field if needed
• Create kernel density surface
  – ArcGIS: Kernel Density
  – Select bandwidth (search radius)
  – Select output raster cell size
• Create hotspots from surface
  – ArcGIS: Classify or Reclassify
  – Select threshold values

Demo: Kernel density
Importance of Search Radius
• Boundaries vary greatly with search radius
• No single best way to pick the best value
• Size should correspond to scale of analysis
  – regional vs. local vs. micro
• General guidelines
  – consistency: keep the radius the same for comparisons
  – uniform, i.e. no adaptive radius (very confusing)
  – “typical” values range from 100 to 500 m for “local” analysis
Importance of Threshold
• What density should be considered “hot”?
• No single best technique
• One recommended technique
  – Remove cells with density = 0
  – Determine average density (e.g. x crimes per sq km)
  – Map multiples of the mean (< 1*mean, 1-2*mean, etc.)
• What NOT to use:
  – Software defaults
  – Classifications driven by distribution (e.g. Natural Breaks)
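The recommended multiples-of-the-mean classification can be sketched as below. The function name and the sample densities are hypothetical; class 0 means below 1*mean, class 1 means 1-2*mean, and so on, with zero-density cells left unclassified.

```python
def classify_by_mean_multiples(densities):
    """Return (mean of non-zero cells, class per cell); zero cells get None."""
    nonzero = [d for d in densities if d > 0]
    mean = sum(nonzero) / len(nonzero)
    classes = [None if d == 0 else int(d // mean) for d in densities]
    return mean, classes

# Hypothetical cell densities (crimes per sq km).
mean, classes = classify_by_mean_multiples([0, 1, 2, 3, 6])
print(mean, classes)  # 3.0 [None, 0, 0, 1, 2]
```

A hotspot threshold can then be stated as a class number, for example "cells at 2*mean or above".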
Demo: Kernel density
Nearest Neighbor Hierarchical (NNH) Clustering
• Identifies clusters of points based on:
  – Number of points (user defined)
  – Distance to points (user defined or based on randomness)
• Output:
  – Ellipses of clusters by order (1st, 2nd, etc.)
  – Convex hulls of clusters by order (1st, 2nd, etc.)
Hands-on: NNH Clustering
• Prepare points in ArcGIS
  – Coordinate system, units
  – XY coordinates
• Open data in CrimeStat
  – Specify units, fields
• Run NNH in CrimeStat
  – Set parameters
  – Specify output location (for multiple files)
• Open results in ArcGIS for mapping
NNH in CrimeStat
Nearest Neighbor Hierarchical Clustering
Spatial and Temporal Analysis of Crime
• Identifies clusters based on:
  – Search neighborhood (user defined)
  – Number of points (user defined)
  – Overlapping clusters combined into one
• Outputs
  – Ellipses of clusters (one order only, vary in size)
Comparison of Hotspot Techniques

Hotspot technique     Creating map   Key parameters                                        Interpretation   Statistical significance   Software
Grid-based thematic   Easy           Grid-cell size                                        Easy             No                         ArcGIS
Local Moran’s I       Moderate       Polygon type & size, spatial weights, normalization   Difficult        Yes                        ArcGIS
Gi*                   Moderate       Polygon type & size, spatial weights, normalization   Difficult        Yes                        ArcGIS
Kernel density        Easy           Search radius                                         Easy             No                         ArcGIS
NNH                   Difficult      Search radius, minimum count                          Moderate         No                         CrimeStat
STAC                  Difficult      Search radius, minimum count                          Moderate         No                         CrimeStat
Recent Developments
• Predictive crime mapping
• Hotspot visualization
• Spatio-temporal hotspots
Predictive Crime Mapping
[Figure: hotspots in Time Period 1 vs. Time Period 2 (?)]
Time periods can be years, seasons, months, weeks, shifts, etc.
Measures of Hotspot Reliability
• Hit Rate (%)
  – Percentage of crimes in period 2 that fall in the hotspot derived from period 1
  – Higher values are better
• Predictive Accuracy Index (PAI)
  – Ratio of hit rate to the area percentage
  – Measures predictive accuracy of the hotspot
  – Higher values are better
• Recapture Rate Index (RRI)
  – Ratio of hotspot crime densities for periods 2 and 1
  – Standardized for change in total number of crimes
  – Higher values are better
Example Calculation – Assaults in Las Vegas
• Overlay 1 km grid; 1,035 km2 total area
• 1,366 crimes in 2007, of which 653 within a hotspot of 68 km2
• 1,531 crimes in 2008, of which 580 within the hotspot based on 2007 data
• Hit Rate = (580/1,531) = 37.9%
• Predictive Accuracy Index = (580/1,531) / (68/1,035) = 5.77
• Recapture Rate Index = (580/653) * (1,366/1,531) = 0.79
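The three measures are simple ratios, so they are easy to recompute from the counts in this example. The function and argument names below are hypothetical; the formulas are the ones written out on the slide.

```python
def hit_rate(hot_crimes_p2, total_crimes_p2):
    """Share of period-2 crimes that fall inside the period-1 hotspot."""
    return hot_crimes_p2 / total_crimes_p2

def predictive_accuracy_index(hot_crimes_p2, total_crimes_p2, hot_area, total_area):
    """Hit rate divided by the share of the study area covered by the hotspot."""
    return hit_rate(hot_crimes_p2, total_crimes_p2) / (hot_area / total_area)

def recapture_rate_index(hot_crimes_p1, hot_crimes_p2, total_crimes_p1, total_crimes_p2):
    """Ratio of hotspot crimes in period 2 vs. 1, standardized for the change in total crimes."""
    return (hot_crimes_p2 / hot_crimes_p1) * (total_crimes_p1 / total_crimes_p2)

# Las Vegas assault counts from the example (2007 = period 1, 2008 = period 2).
hr = hit_rate(580, 1531)
pai = predictive_accuracy_index(580, 1531, 68, 1035)
rri = recapture_rate_index(653, 580, 1366, 1531)
print(f"HR = {100 * hr:.1f}%  PAI = {pai:.2f}  RRI = {rri:.2f}")
```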
Accuracy of Crime Predictions
Charlotte, NC – assaults

Hotspot technique                          Hit Rate   Predictive Accuracy Index   Recapture Rate Index
Grid-based thematic – 250 m grid           45.6       15.8                        0.81
Local Moran’s I – 250 m grid               62.3       9.2                         0.86
Local Moran’s I – blockgroups              38.1       7.8                         1.02
Gi* – 250 m grid                           59.5       7.6                         0.95
Gi* – blockgroups                          24.1       8.4                         1.02
Kernel density – 200 m radius              34.8       14.5                        0.60
Nearest neighbor hierarchical clustering   5.8        563                         0.70
Spatial and Temporal Analysis of Crime     3.2        1,083                       0.70

Hotspot method has a major effect
Effects of Hotspot Parameters
Las Vegas, NV – auto thefts

[Figure: five map panels; metrics in the order extracted]
HR (%)   PAI    RRI
88.0     2.48   0.96
67.2     3.79   0.94
47.6     5.01   0.90
39.8     5.51   0.87
11.6     9.67   0.82
Panel labels: 100 crimes/km2, 50 crimes/km2, 42 crimes/km2, 25 crimes/km2, 10 crimes/km2

There are trade-offs among accuracy metrics
Effect of Hotspot Parameters
Charlotte, NC – assaults – kernel density

Kernel density bandwidth   Hit Rate   Predictive Accuracy Index   Recapture Rate Index
50 m                       33.6       214.2                       0.59
100 m                      35.0       65.4                        0.69
200 m                      40.0       25.3                        0.75
300 m                      45.1       17.1                        0.83
400 m                      47.9       13.4                        0.87
500 m                      50.2       11.6                        0.92
1,000 m                    53.3       8.0                         0.98

Hotspot parameters have a major effect
Hotspot Visualization
3D View of KDE
Temporal Patterns
Isosurfaces
Resources
https://www.ncjrs.gov/pdffiles1/nij/209393.pdf
Contact
Paul Zandbergen – University of New Mexico
zandberg@unm.edu – www.paulzandbergen.com