san francisco crime
DSO 510 Business Analytics | Group Project
Crime in San Francisco
Phase 4 Presentation
Andrew Chen | Yile Wu | Chi Zhang | Chulsoon Pak
PHASE I
Define business analytics proposal, data required, data analysis approach,
and decision making and innovation framework
• 2014 Population: 852,469 [1]
• 13th most populous city in the nation
• Separated into 10 districts
• Top global innovation center, with the
highest concentration of technology-related
jobs in the U.S.
• Crime in San Francisco has historically
been higher than the U.S. average
• Crime Index of SF is rated 3 out of 100
(safer than 3% of other U.S. cities) [2]
BACKGROUND: SAN FRANCISCO
1. United States Census Bureau, July 2014
2. "Crime rates for San Francisco, CA", NeighborhoodScout, 2013
In order to make San Francisco a safer place,
we aim to identify factors that promote criminal
behavior to predict crime more accurately.
GOAL DEFINITION
DEFINING OUR VARIABLES
Dependent Variables
1. Number of Crimes
2. Time of Crime
3. Date of Crime
4. Severity of Crime
5. Type of Crime
6. Location of Crime
Independent Variables
1. Day of Week
2. Season of Year
3. Weather
4. Daylight
5. Income Level of District
6. Average Housing Price of District
7. Age Composition of District
8. Population density
9. Degree of Urbanization
10. Modes of Transportation
11. Level of Education
12. Divorce rate of Families
• San Francisco Crime Data:
• https://data.sfgov.org/Public-Safety/
• 700,000+ data points (5 years of data)
• San Francisco Weather, Population,
Housing, Hazard Risk, and Demographics
• www.sfclimatehealth.org/
• http://aa.usno.navy.mil/data/docs/RS_OneYear.php
• San Francisco Housing, Income Level,
Employment, and Transportation by
District
• www.sf-planning.org
DATA COLLECTION
INTERPRETATION & ACTION
1. Identify to what degree different variables contribute to crime
2. Predict probability and severity of crimes in terms of time and location
Implementation
1. Assist SFPD in efficient deployment of its police force
2. Integrate data with mapping algorithms to provide the safest real-time routes
3. Organize anti-crime education in high-crime areas (how to respond to crimes in different
situations)
4. Utilize data in product development and marketing of security-related products
5. Enhance San Francisco city-planning to reduce crimes
PHASE II
In order to make San Francisco a safer place,
we aim to identify factors that promote criminal
behavior to predict crime more accurately.
GOAL DEFINITION
DEFINING OUR VARIABLES
Dependent Variables
1. Number of Crimes per Day
2. Number of Crimes per Month
3. Time Slot of Crime
4. Date of Crime
5. Severity of Crime
6. Location of Crime
Independent Variables
1. Day of Week
2. Month
3. Weather
4. Daylight
5. Income Level of District
6. Age Composition of District
7. Modes of Transportation
8. Level of Education
9. Employment of District
SUMMARY STATISTICS
• 5 Years of Data
• From August 2010
• To August 2015
• 726,245 Crimes Reported
Monthly Statistics
MONTHLY CRIME DATA (2010 – 2015)
CRIME PATTERNS BY MONTH OF YEAR
WHICH CRIMES ARE MOST FREQUENTLY
COMMITTED?
Top 5 Crimes*
1. Theft
2. Assault
3. Vandalism
4. Drug Violation
5. Vehicle Theft
*Other Offenses, Non-Criminal
Offenses, and Warrants are excluded
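A ranking like this one could be produced along the following lines. This is a minimal sketch, assuming each incident record carries a category label; the sample records and the exact spelling of the excluded labels are illustrative assumptions, not values taken from the dataset.

```python
from collections import Counter

# Categories the slide excludes from the ranking (label spellings are assumed).
EXCLUDED = {"OTHER OFFENSES", "NON-CRIMINAL", "WARRANTS"}


def top_crimes(categories, n=5):
    """Return the n most frequent categories, skipping the excluded ones."""
    counts = Counter(c for c in categories if c not in EXCLUDED)
    return counts.most_common(n)


# Hypothetical sample, not real data.
sample = (
    ["LARCENY/THEFT"] * 4
    + ["OTHER OFFENSES"] * 5
    + ["ASSAULT"] * 3
    + ["VANDALISM"] * 2
    + ["WARRANTS"] * 2
    + ["DRUG/NARCOTIC"]
)
ranking = top_crimes(sample, n=3)
print(ranking)
```

Filtering before counting keeps the catch-all categories from crowding out the crimes of interest, even when they are the most numerous labels in the raw data.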
CRIMES THAT DEMAND GREATER ATTENTION
• High frequency, high severity: Assault, Robbery, Missing Person
• High frequency, low severity: Theft, Vandalism
• Low frequency, high severity: Forcible Sex Offenses, Murder, Kidnapping
• Low frequency, low severity: Disorderly Conduct, Gambling, Loitering
CRIMES PER DAY OF THE WEEK
• Fridays and Saturdays have
the most crimes committed
– Late-night parties/events
• Sundays and Mondays have
the fewest crimes committed
– Church, family gatherings
– Back to work/school
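The day-of-week tally behind a chart like this can be sketched as follows, assuming each incident has a calendar date. The sample dates are hypothetical and serve only to show the counting step.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def crimes_per_weekday(dates):
    """Count incidents by weekday name, listed Monday through Sunday."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    return {name: counts[name] for name in WEEKDAYS}


# Hypothetical incident dates (7 August 2015 was a Friday).
sample_dates = [date(2015, 8, 7), date(2015, 8, 7),
                date(2015, 8, 8), date(2015, 8, 10)]
by_day = crimes_per_weekday(sample_dates)
print(by_day)
```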
CRIME PER DISTRICT
• Number of Crimes per District
• Some districts have significantly higher crime than others
• A good indicator to help SFPD deploy police forces by districts
INNER JOIN WITH DAYLIGHT DATA
• Crime Data joined with Sunrise and Sunset Data
• Inner join by date
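An inner join by date can be sketched in plain Python as below. The field names (`date`, `time`, `category`, `sunrise`, `sunset`) and all sample values are assumptions made for illustration; the slides do not show the actual column names or the tool used.

```python
from datetime import date, time


def inner_join_by_date(crimes, daylight):
    """Attach sunrise and sunset to each crime; drop crimes with no matching date."""
    by_date = {row["date"]: row for row in daylight}
    joined = []
    for crime in crimes:
        sun = by_date.get(crime["date"])
        if sun is not None:  # inner join: keep only dates present on both sides
            joined.append({**crime,
                           "sunrise": sun["sunrise"],
                           "sunset": sun["sunset"]})
    return joined


# Hypothetical rows, not real data.
crimes = [
    {"date": date(2015, 8, 1), "time": time(22, 15), "category": "ASSAULT"},
    {"date": date(2015, 8, 2), "time": time(9, 40), "category": "LARCENY/THEFT"},
    {"date": date(2015, 8, 3), "time": time(14, 5), "category": "VANDALISM"},
]
daylight = [
    {"date": date(2015, 8, 1), "sunrise": time(6, 14), "sunset": time(20, 18)},
    {"date": date(2015, 8, 2), "sunrise": time(6, 15), "sunset": time(20, 17)},
]
joined = inner_join_by_date(crimes, daylight)
print(len(joined))
```

With pandas, the same result would come from `crimes.merge(daylight, on="date", how="inner")` on two DataFrames that share a `date` column.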
DAYLIGHT AFFECTS SOME TYPES OF CRIMES
• Crime breakdown based on day or nighttime (in percentages)
– The data contradicted our initial hypothesis that crimes are more likely to be committed at night
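The day versus night breakdown could be computed along these lines once each crime carries its date's sunrise and sunset. This is a sketch under stated assumptions: an incident is treated as "day" when its time falls between sunrise and sunset, and the records and field names are hypothetical.

```python
from collections import defaultdict
from datetime import time


def day_night_share(records):
    """Return {category: (pct_day, pct_night)}.

    An incident counts as 'day' when sunrise <= time < sunset on its date.
    """
    tally = defaultdict(lambda: [0, 0])  # category -> [day count, night count]
    for rec in records:
        is_day = rec["sunrise"] <= rec["time"] < rec["sunset"]
        tally[rec["category"]][0 if is_day else 1] += 1
    return {
        cat: (100 * day / (day + night), 100 * night / (day + night))
        for cat, (day, night) in tally.items()
    }


# Hypothetical joined records (crime time plus that date's sunrise and sunset).
SUNRISE, SUNSET = time(6, 20), time(20, 10)
records = [
    {"category": "LARCENY/THEFT", "time": time(13, 0)},
    {"category": "LARCENY/THEFT", "time": time(15, 30)},
    {"category": "LARCENY/THEFT", "time": time(18, 45)},
    {"category": "LARCENY/THEFT", "time": time(23, 10)},
    {"category": "ASSAULT", "time": time(1, 30)},
    {"category": "ASSAULT", "time": time(12, 0)},
]
for rec in records:
    rec.update(sunrise=SUNRISE, sunset=SUNSET)

shares = day_night_share(records)
print(shares)
```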
LOOKING AHEAD
Data Manipulation
• Clean up and join other demographic data to existing data
• Categorize meaningful variables into numeric values in order to
run further statistical models
• Assign values for severity and frequency of each crime
Further Insights
• Dig deeper into crimes by district, day of week, and time of day
• Produce a spatial map of crime
PHASE III
DEFINING OUR VARIABLES
Dependent Variables
1. Number of Crimes per Day
2. Number of Crimes during the Day
3. Number of Crimes during the Night
4. Number of Crimes per Month
5. Time Slot of Crime
Independent Variables
1. Day of Week
2. Month
3. Average Temperature
4. Precipitation
5. Daylight
6. Income Level of District
7. Age Composition of District
8. Modes of Transportation
9. Level of Education
10. Employment of District
TEN CRIMES TO FOCUS ON
• Weighted based on frequency and severity of crime sentence
Crime                   Frequency  Low.yr  High.yr  Avg.yr     Weight
LARCENY/THEFT             168,901       0       25      13  2,136,598
ASSAULT                    62,449       1       25      13    811,837
DRUG/NARCOTIC              31,180       1       40      20    631,395
ROBBERY                    18,652      15       30      23    419,670
BURGLARY                   29,020       3       20      12    333,730
SEX OFFENSES, FORCIBLE      3,927      20      100      60    235,620
FRAUD                      14,237       1       25      13    185,081
VEHICLE THEFT              31,002       5        5       5    155,010
KIDNAPPING                  2,162       0      100      50    108,208
WEAPON LAWS                 7,444       0       20      10     74,812
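The slide does not state how Weight was calculated. One reading that fits several rows is frequency multiplied by the midpoint of the low and high sentence lengths; the sketch below uses that assumed formula. It reproduces the ROBBERY, BURGLARY, and VEHICLE THEFT rows exactly but not every row (LARCENY/THEFT, for example), so the authors may have used different sentence averages for some crimes.

```python
def severity_weight(frequency, low_years, high_years):
    """Frequency times the midpoint of the sentence range (an assumed formula)."""
    return frequency * (low_years + high_years) / 2


# (frequency, low sentence in years, high sentence in years) from the table.
rows = {
    "ROBBERY": (18_652, 15, 30),
    "BURGLARY": (29_020, 3, 20),
    "VEHICLE THEFT": (31_002, 5, 5),
}
weights = {name: severity_weight(*values) for name, values in rows.items()}

# Rank crimes from highest to lowest weight.
for name, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15}{weight:>12,.0f}")
```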
CORRELATIONS
LINEAR REGRESSION MODEL
• Dependent Variable:
• Total Daily Crime
• Independent Variables:
• Day of Week
• Average Temperature
• Precipitation
• Model p-value: < .0001
• R-squared: 0.1956 (the model explains about 20% of the variation in total daily crime)
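A model of this shape can be sketched with ordinary least squares in plain Python. Several things here are assumptions rather than the authors' method: day of week is dummy-coded with Monday as the baseline, the coefficients are found by solving the normal equations, and the data are synthetic values built from made-up coefficients so the fit can be checked.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted model."""
    pred = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def design_row(weekday, temp, precip):
    """Intercept, six weekday dummies (Monday = 0 is the baseline), temp, precip."""
    dummies = [1.0 if weekday == d else 0.0 for d in range(1, 7)]
    return [1.0] + dummies + [temp, precip]


# Synthetic days: made-up coefficients with a Friday and Saturday uplift.
true_beta = [380.0, 0.0, 0.0, 0.0, 15.0, 20.0, 0.0, 0.5, -8.0]
X = [design_row(i % 7, 55 + (3 * i) % 11, 0.1 * ((i * i) % 5)) for i in range(28)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = fit_ols(X, y)
r2 = r_squared(X, y, beta)
print([round(b, 3) for b in beta], round(r2, 4))
```

Because the synthetic response is an exact linear function of the inputs, the fit recovers the made-up coefficients and R-squared is essentially 1. On the real data the slide reports 0.1956, so most of the day-to-day variation is left unexplained by these three variables.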
RESIDUALS ANALYSIS
ANOVA
PHASE IV
BINARY LOGISTIC REGRESSION
PREDICTIVE MODELING