making use of big data to evaluate the …docs.trb.org/prp/16-2886.pdf · simandl, graettinger,...
Post on 07-Apr-2018
216 Views
Preview:
TRANSCRIPT
Simandl, Graettinger, Smith, Jones, Barnett 1
Paper revised from original submission.
MAKING USE OF BIG DATA TO EVALUATE THE EFFECTIVENESS OF 1
SELECTIVE LAW ENFORCEMENT IN REDUCING CRASHES 2
3
Jenna K. Simandl*, Andrew J. Graettinger, Randy K. Smith, Steven Jones, Timothy E. Barnett 4
*Corresponding Author 5
6
Jenna K. Simandl 7
Kimley-Horn and Associates, Inc. 8
1700 Willow Lawn Drive Suite 200 9
Richmond, VA 23230 10
Tel: (804) 396-6017; Email: jenna.simandl@kimley-horn.com 11
12
Andrew J. Graettinger 13
The University of Alabama 14
Civil, Construction, and Environmental Engineering 15
Box 870205, Tuscaloosa, AL 35487 – 0205 16
Tel: (205) 348-1707; Fax: (205) 348-0783; Email: andrewg@eng.ua.edu 17
18
Randy K. Smith 19
The University of Alabama 20
Department of Computer Science 21
Box 870290, Tuscaloosa, AL 34587 – 0205 22
Tel: (205) 348-1664; Fax: (205) 348-0219; Email: rsmith@cs.ua.edu 23
24
Steven Jones 25
The University of Alabama 26
Civil, Construction, and Environmental Engineering 27
Box 870205, Tuscaloosa, AL 35487 – 0205 28
Tel: (205) 348-3137; Fax: (205) 348-0783; Email: sjones@eng.ua.edu 29
30
Timothy E. Barnett 31
State Safety Operations Engineer 32
Alabama Department of Transportation 33
Montgomery, AL 36110 34
Tel: (334) 353-6464; Email: barnettt@dot.state.al.us 35
36
Word Count: 250 (abstract) + 5571 (text) + 1500 (6 tables/figures x 250) = 7321 words 37
Original Submission Date: 07/30/2015 38
Simandl, Graettinger, Smith, Jones, Barnett 2
Paper revised from original submission.
ABSTRACT 39
State Departments of Transportation (DOTs) across the nation fund selective enforcement 40
campaigns aimed at intensifying law enforcement at certain locations in order to improve traffic 41
safety. At the same time, many states passively collect large datasets such as officer Global 42
Position System (GPS) location tracks. To evaluate the effectiveness of selective enforcement, 43
an approach was developed that employs Structured Query Language (SQL) and Geographic 44
Information System (GIS) technology to mine and integrate police patrol patterns, citations 45
issued, crash occurrences, and selective enforcement periods. This information was analyzed in 46
a relational database within a spatial and temporal analysis framework. The intent was to solely 47
use GIS technology, however the size of the datasets were prohibiting and SQL was used as a 48
Big Data Analytic. The SQL techniques successfully turned over 37 million points of GPS data 49
into 1.3 million points of selective enforcement location information, enabled the geolocation of 50
72.6% of Electronic Citations (eCitations), and identified 21 selective enforcement areas across 51
the State of Alabama. With Big Data Analytics, GIS technology was reestablished as useful for 52
the evaluation of changes in crash and citation frequencies before and during selective 53
enforcement. Paired difference t-tests confirm the decrease of crash frequency with 85% 54
confidence at urban and rural locations. The analysis of the number of citations at the locations 55
confirmed that citations increased during selective enforcement by an average of 254%. The 56
developed methodology is a successful approach using large datasets for an unintended purpose 57
to make valuable engineering conclusions and data-driven discoveries. 58
INTRODUCTION 59
In the age of Big Data, large datasets have become increasingly available to scientists and 60
researchers. The volume of data is growing, the speed of data creation is increasing, and the 61
variety of data, while generally unorganized, is accumulating (McAfee & Brynjolfsson, 2012). 62
Big Data from both experimental and open-source platforms has created a transformative data-63
intensive science for researchers in science, engineering, and other business disciplines 64
(Jahanian, 2012). Advanced management strategies, computations, and minimization techniques 65
are being developed to turn Big Data into useful information for advanced knowledge, 66
predictions, and decision making (Lohr, 2012; Mayer-Schonberger & Cukier, 2013; Mestyan, 67
Yasseri, & Kertesz, 2013). 68
Big Data enables a paradigm shift from hypothesis-driven research to data-driven 69
discovery (Lohr, 2012; Jahanian, 2012). Big Data is also proposed as a means to help resolve the 70
Grand Challenges defined by the National Academy of Engineering (Engineering and Physical 71
Sciences Research Council, 2014). A cohesive relationship between data science, Big Data, and 72
domain expertise are required for effective decision making and data-driven discoveries (Provost 73
& Fawcett, 2013). However, the nature of Big Data is that the dataset is too large for traditional 74
data processing and new technologies or techniques are required. 75
A research project by The University of Alabama for the Alabama Department of 76
Transportation (ALDOT) sought to use Geographic Information System (GIS) technology in 77
order to evaluate the effectiveness of selective law enforcement along state routes in reducing 78
Simandl, Graettinger, Smith, Jones, Barnett 3
Paper revised from original submission.
crashes in Alabama. Officer patrol route data, Electronic Citation (eCitation) data, Electronic 79
Crash (eCrash) data, and selective enforcement data were to be integrated into one spatial and 80
temporal GIS framework for analysis of citation counts and crash frequencies before and during 81
selective enforcement. However, the large dataset quickly overwhelmed GIS processing tools. 82
Structured Query Language (SQL) is the standard language for working with relational 83
database tables, and can be used as a Big Data Analytics tool (Cohen, Dolan, Dunlap, 84
Hellerstein, & Welton, 2009). While not as advanced as other data management tools, SQL 85
queries were executed for successful data minimization in order to perform traffic safety analysis 86
on the effectiveness of selective enforcement along state routes in Alabama from August 1, 2010 87
to July 31, 2011. 88
A relational database is a collection of data, organized in various tables of attributes 89
(columns) and records (rows) that have a specific, unique relationship with one another. The 90
tables are related through the use of primary keys, a unique attribute in a table, which is included 91
in other tables as a foreign key. SQL query expressions were written and executed in order to 92
translate data in tables to valuable information (Viescas & Hernandez, 2014). The following 93
sections present the logic behind the executed SQL queries used for data minimization of Big 94
Data in order to use GIS tools to evaluate the effectiveness of selective law enforcement. The 95
developed methodology is a successful approach of using large datasets for an unintended 96
purpose in order to make valuable engineering conclusions and discoveries. 97
BACKGROUND 98
State Departments of Transportation (DOTs) across the nation fund selective enforcement 99
campaigns aimed at intensifying law enforcement at certain locations in the state in order to 100
improve traffic safety and reduce the number of crashes. Many case studies show that selective 101
police enforcement, whether for a specific negative driver behavior or for a particular high crash 102
location, can have an influential impact on driver behavior, which can successfully reduce the 103
number of crashes in a location (Jonah & Grant, 1985; Vaa, 1997; Erke, Goldenbeld, & Vaa, 104
2009; Walter, Broughton, & Knowles, 2011). The first edition of the Highway Safety Manual, 105
while focused on engineering analysis and treatment, states that crash frequencies may be 106
reduced through enforcement efforts (AASHTO, 2010). 107
Large datasets are available for the analysis of the effectiveness of selective enforcement 108
efforts. By integrating police officer patrol patterns, citations issued, crash occurrences, and 109
selective enforcement periods into one spatial and temporal analysis framework, selective 110
enforcement locations can be verified, and citation and crash frequencies in the respective 111
locations can be identified and analyzed. Due to the volume of the data and its constant growth, 112
big data management strategies need to be employed before analysis is feasible. 113
The Alabama Mobile Selective Law Enforcement Campaign attempts to change negative 114
driver behaviors that contribute to the most severe crashes, including speeding, driving under the 115
influence, and the failure to wear a seatbelt (Peden, et al., 2004; Erke, Goldenbeld, & Vaa, 2009). 116
An agreement between the Alabama Department of Public Safety (DPS) and ALDOT was in 117
Simandl, Graettinger, Smith, Jones, Barnett 4
Paper revised from original submission.
effect between October 2007 and September 2011, employing police officers to work extra-duty 118
shifts for increased law enforcement efforts on various state and federal roadways. 119
The over-arching goal of the increased enforcement at selected high crash locations was 120
to deter negative driver behaviors with an increase in issued citations with the goal of decreasing 121
the number of crashes. In order to evaluate the effectiveness of selective enforcement, SQL and 122
GIS technology were used to integrate police officer patrol patterns, citations issued, crash 123
occurrences, and selective enforcement periods into a relational database and a spatial and 124
temporal analysis framework. A SQL relational database was used as the Big Data Analytic. 125
The database and GIS map information provides the means to verify selective enforcement 126
locations and evaluate the increases and decreases in crash and citation frequencies before and 127
during selective enforcement. 128
A relational database of officer patrol routes, citations, and selective enforcement data 129
was created. In the state of Alabama, State Trooper vehicles are equipped with a Global 130
Positioning System (GPS) unit that is polled every 30 seconds. The location of the trooper is 131
recorded along with a timestamp. Officer patrol route location data is a large dataset that is 132
continually growing. Additionally, Electronic Citations (eCitations) issued by law enforcement 133
officials in the state of Alabama are stored in a database. These eCitations do not require GPS 134
coordinates or other structured location data. The eCitations do have an electronic timestamp 135
indicating when the citation was issued. A temporal join methodology was developed to cross-136
reference timestamps between officer patrol route GPS records and issued citations in order to 137
geolocate citations. Selective enforcement data for participating officers was obtained from DPS 138
timesheets and DOT invoices. A methodology to verify selective enforcement locations was 139
developed by comparing officer patrol routes and dates and hours of selective enforcement shifts. 140
The steps utilized SQL in order to obtain only relevant data for a one year analysis of selective 141
enforcement. The relevant data was imported into a GIS, along with crash data from the Critical 142
Analysis Reporting Environment (CARE) (Center for Advanced Public Safety, 2009), in order to 143
quantify the number of crashes and the number of citations in selective enforcement locations 144
before and during the selective enforcement effort. Due to the sensitive nature of crash data, 145
maps were only generated for internal DOT use. This methodology is extensible to other states 146
with available officer patrol routes, citation locations, crash data, and selective enforcement time 147
periods. 148
METHODOLOGY 149
A methodology was developed using SQL to geolocate eCitations and verify where selective 150
enforcement efforts took place in Alabama. This required identifying key locations in the GPS 151
officer location data and importing that data into GIS technology in order to evaluate the 152
effectiveness of the selective enforcement campaign in reducing crashes at high crash locations. 153
Initial work for this project involved digitizing selective enforcement shift information, including 154
the date and hours worked, for each participating officer from DPS extra-duty timesheets and 155
DOT invoice files to spreadsheets for use in electronic analysis. Law enforcement officials in 156
the state of Alabama are given a unique UserID, which is used for identification purposes in both 157
Simandl, Graettinger, Smith, Jones, Barnett 5
Paper revised from original submission.
the GPS officer patrol route and eCitation databases. The unique UserID was identified for each 158
participating officer in the selective enforcement data. 159
Data processing tasks were executed using SQL queries because of the large datasets and 160
the limited capabilities of GIS tools. The SQL queries described herein are presented in detail in 161
Simandl, 2015. The first task for this project included developing a methodology to cross-162
reference timestamps between officer patrol routes and issued citations in order to geolocate 163
citations with and without GPS coordinates. Furthermore, a methodology to verify selective 164
enforcement locations was developed by comparing officer patrol routes and dates and hours of 165
selective enforcement shifts. The resulting data from the SQL queries was specific information 166
as opposed to the Big Data in its entirety, and was imported into a GIS along with crash data 167
from the Critical Analysis Reporting Environment (CARE). Only essential data for analysis was 168
imported in order to increase the feasibility of GIS to verify selective enforcement locations and 169
quantify the number of crashes and citations in selective enforcement locations before and during 170
selective enforcement efforts. 171
Geolocating eCitations 172
Performing a spatial-temporal analysis through a sequence of SQL queries, the time-stamped 173
eCitations and State Trooper location data were cross-referenced in order to accurately locate an 174
eCitation to the nearest GPS point in time. The UserIDs of the officers who participated in 175
selective enforcement were joined to the eCitation data based on UserID, in order to return 176
eCitations written by the relevant officers. The criterion to exclude voided eCitations was also 177
included, as voided eCitations were not officially issued. Over the course of the year being 178
analyzed in this research, including both selective enforcement shifts and regular duty shifts, 179
475,214 eCitations were issued by the participating officers. 180
Data for each individual GPS trace point includes the UserID, the latitude and longitude 181
location, and a timestamp recorded in UTC time. The officers who participated in selective 182
enforcement were joined to the GPS data based on UserID, in order to return GPS location points 183
for the relevant officers exclusively. For the year being analyzed in this research, including both 184
selective enforcement shifts and regular duty shifts, 37,628,124 GPS points were recorded for the 185
selective enforcement participating officers. 186
In order to cross-reference the timestamps of the eCitations and the GPS location points, 187
a SQL query was executed to calculate the time difference between GPS location points and 188
eCitations for a particular officer using the DATEDIFF() command. This query only returned 189
eCitations joined to a GPS location point within 600 seconds in order to maximize the number of 190
eCitations located while also providing reasonable eCitation location accuracies. Out of the 191
returned results, the minimum time difference between GPS location point and eCitation was 192
chosen to signify the most accurate location of the eCitation. 193
The spatial-temporal analysis, produced through a sequence of SQL queries, located 194
68.6% of eCitations. Further investigation showed that 49 of the officers (11.6%) participating 195
in selective enforcement efforts did not have any GPS trace data associated with their respective 196
UserIDs, potentially due to lack of GPS equipment and/or faulty equipment. With respect to the 197
Simandl, Graettinger, Smith, Jones, Barnett 6
Paper revised from original submission.
88.4% of officers that had trace data, 72.6% of eCitations were geolocated using the developed 198
methodology. Additionally, 95% of those eCitations were joined to a GPS point with a time 199
difference of only 30 seconds. The resulting dataset of 325,926 eCitations was imported into 200
GIS for visual mapping. Citations that were not successfully located lacked officer location data 201
or there were breaks in the officer trace location data due to lost GPS signals. The loss of 202
eCitation data from the analysis has minimal negative implications to the results. The eCitations 203
that were included represent a conservative understanding of the number of eCitations at various 204
locations. 205
Identifying Selective Enforcement Locations 206
Selective enforcement DPS timesheets and DOT invoices provided the following information: 207
Date Worked, First Name, Last Name, Middle Initial, Suffix, and Hours. To relate the selective 208
enforcement hours worked with the GPS location data, the total hours worked in a full shift was 209
found. By understanding the relationship between total hours worked and selective enforcement 210
hours worked by an officer on a particular day, specific GPS location points were defined as 211
either “during selective enforcement” or “not during selective enforcement.” The location of the 212
“during selective enforcement,” or overtime work, is vital for the analysis of the effectiveness of 213
selective enforcement. The workflow framework for identifying individual GPS points as 214
“during selective enforcement” is shown in FIGURE 1. FIGURE 1(a) illustrates the steps to 215
calculate full shifts from the GPS officer patrol route data, in order to establish the necessary 216
relationship to join in the selective enforcement data, denoted by the pink arrow. These steps are 217
described in the sub-section titled, Calculating Full Officer Shifts from GPS Location Data. 218
FIGURE 1(b) illustrates the steps to identify individual GPS points as during or not during 219
selective enforcement shifts after the establishment of the relationship between the data, as noted 220
by the pink arrow. These steps are described in the sub-section titled, Defining Individual GPS 221
Points During Selective Enforcement Shifts. The following steps were used to confirm that 222
selective enforcement was completed in the appropriate locations. 223
Simandl, Graettinger, Smith, Jones, Barnett 7
Paper revised from original submission.
224
FIGURE 1 Workflow framework for (a) establishing a relationship between the 225
GPS data and selective enforcement (SE) data, denoted by the pink arrow; and (b) 226
identifying individual GPS points as during selective enforcement 227
Calculating Full Officer Shifts from GPS Location Data 228
A series of SQL queries was used to define the beginning and ending of a full working shift for 229
an officer and calculate the total hours worked for the shift. The LEAD function in SQL 230
accesses the previous row in a column of a table, and the LAG function in SQL accesses the next 231
row in a column of a table. To calculate the time difference in seconds between two successive 232
GPS location points, the LEAD function was used, partitioned by UserID and ordered by GPS 233
Time to access the Next GPS Time in the table. The results of the LEAD calculation are stored 234
in a new column in a table. The DATEDIFF() command created an additional column, recording 235
the difference between the GPS Time column and the newly added Next GPS Time column. An 236
example of this query is shown in FIGURE 2. 237
Simandl, Graettinger, Smith, Jones, Barnett 8
Paper revised from original submission.
238
FIGURE 2 Example dataset and query results to calculate the time difference 239
between successive GPS points 240
The beginning and ending of a shift was identified in the table by using the CASE 241
function in SQL, which enforces a “when-then” statement. When the time difference between 242
successive GPS location points was greater than a predetermined time tolerance limit, then the 243
GPS location point was considered to be the beginning of a shift. An example of a large time 244
difference signifying the beginning of a shift is shown in FIGURE 2, highlighted with a red 245
circle. On the other hand, when the time difference between successive GPS location points was 246
less than or equal to the predetermined time tolerance limit, then the GPS location point was 247
considered to be within a shift. To select the time tolerance for shift generation that best 248
represents the GPS data and reasonable shift lengths that officers work, the time difference 249
tolerance of six hours was chosen because six hours minimized shifts of over 24 hours compared 250
to using eight hours, kept an optimum number of seven-, eight-, and nine-hour shifts, and 251
minimized the number of one-hour shifts compared to using five hours. 252
Using the time tolerance of six hours, the beginning and ending timestamps of a 253
particular shift for each officer were selected and compiled into a new table with new columns of 254
Shift Start and Shift End. A query was used to calculate the difference between these two 255
timestamps, which resulted in the total hours worked for a particular shift and generated a total 256
of 55,402 GPS-identified shifts. The date difference was calculated in seconds and then 257
converted to hours to ensure the precision of the time. Shifts with hours worked between 0.5 258
hours and 17 hours were considered reasonable, as 0.5 hours is the smallest increment recorded 259
in the selective enforcement invoices and 17 hours provides a conservative inclusion of a 260
maximum possible shift of two back-to-back 8-hour shifts. Out of all the GPS-identified shifts, 261
94.2% had a reasonable shift length. 262
Defining Individual GPS Points During Selective Enforcement Shifts 263
By understanding the GPS point data in terms of full officer shifts, a relationship to the selective 264
enforcement data was established. Using the relationship between hours worked in a GPS-265
identified shift to the hours worked by an officer for selective enforcement, a series of SQL 266
queries was used to define an individual GPS point as “during selective enforcement” or “not 267
during selective enforcement.” To relate the selective enforcement shifts to the identified GPS 268
Simandl, Graettinger, Smith, Jones, Barnett 9
Paper revised from original submission.
location point shifts, a join was executed in SQL based on UserID and Shift Start date. There 269
were 66% of selective enforcement shifts accounted for and joined to an appropriate GPS-270
identified shift. Because of the 49 officers that did not have any GPS trace data associated with 271
their respective UserIDs, 477 selective enforcement shifts were not successfully included. When 272
excluding those particular officers, the GPS generated shifts successfully recognized 72.6% of 273
the logged selective enforcement shifts. Additionally, this query calculated the difference 274
between GPS shift hours worked and Selective Enforcement shift hours worked. The loss of 275
selective enforcement data from the analysis due to a lack of officer GPS data suggests that 276
additional selective enforcement locations across the state were potentially unidentified or that 277
various locations across the state could potentially have had more police enforcement presence 278
than the analysis shows. However, the verified selective enforcement locations in the study still 279
represent the highest level of selective enforcement presence from the available data across the 280
state between August 2010 and July 2011. 281
Simultaneously, queries to calculate cumulative time worked in a GPS-identified shift 282
were executed. Using the difference between the GPS-identified shift hours worked and the 283
selective enforcement hours worked in conjunction with the cumulative time worked, individual 284
GPS location points were denoted as “during selective enforcement” or “not during selective 285
enforcement.” The scenarios for the CASE function, a “when-then” command, used in the SQL 286
query and the necessary assumptions for the methodology are shown in TABLE 1. 287
TABLE 1 The CASE function scenarios, assumptions, and query description for the 288
SQL query used to denote an individual GPS point as during selective enforcement. 289
Scenario Assumption Query
Logged selective
enforcement hours < GPS
shift hours worked
The last ‘x’ hours of the
GPS shift are the overtime
selective enforcement
hours
When Diff <= Cumulative
Time, then ‘Yes’
Logged selective
enforcement hours = GPS
shift hours worked
Within 0.5 hours is
considered “equal”
When Diff is Between -0.5
and 0.5, then ‘Yes’
Logged selective
enforcement hours > GPS
shift hours worked
The whole shift is
selective enforcement, and
the discrepancy can be
explained by a loss of GPS
signal
When Diff < 0, then ‘Yes’
A final query was executed to create a table of only GPS points during selective 290
enforcement shifts, resulting in a total of 1,647,711 GPS points. The series of SQL queries 291
demonstrates successful data minimization, by turning over 37.6 million GPS data points, into 292
approximately 1.65 million points of useful information. 293
Simandl, Graettinger, Smith, Jones, Barnett 10
Paper revised from original submission.
Verifying Selective Enforcement Locations 294
Officers participating in the Alabama selective enforcement efforts were allowed up to two hours 295
of paid travel time to selective enforcement locations. Therefore, the GPS points during 296
selective enforcement are not all localized in a particular selective enforcement area. 297
Additionally, police presence in a selective enforcement location does not necessarily mean the 298
presence was statistically significant over other areas of patrol. In order to verify selective 299
enforcement locations, a statistical method employing clustering of spatial data was 300
implemented. To employ the location verification process, the GPS points during selective 301
enforcement and eCitation data were integrated into GIS based on Route-Milepost (Route-MP) 302
along state routes, resulting in 1,283,211 GPS points during selective enforcement and 243,800 303
eCitations over the course of the year. With Route-MP data associated to each point event, a 304
clustering methodology was executed on the GPS point data using a SQL query technique 305
involving the summation and grouping of point events at each Route-MP with various bucket 306
sizes. This technique provides an efficient Hot Spot Analysis. Hot spot analysis tools in GIS 307
were not used because of the large amount of officer GPS location data, and GIS Hot Spot tools 308
neglect the one-dimensional integrity of the roadway network. 309
SQL Query Technique to Verify Selective Enforcement Locations 310
A series of SQL queries using GPS point data identified as “selective enforcement” was 311
implemented to verify selective enforcement locations by finding clusters of points at identifiable 312
Route-Milepost locations. The Route-MP data was incorporated by summarizing the number of 313
selective enforcement points at Route-MP locations using the COUNT aggregate function. 314
Bucket sizes of one-tenth of a mile, one-half of a mile, and one mile were evaluated. To 315
standardize the bucket sizes and confirm appropriate lengths along a particular route in the 316
network, queries to calculate the difference between successive Mileposts and the cumulative 317
difference of Mileposts were employed. The summation of “selective enforcement” points in 318
groups of the three bucket sizes was centered in the middle of each bucket size and recorded. 319
The LEAD and LAG function were used to calculate the number of selective enforcement points 320
prior to and after a particular row of data. A hypothetical example of the logic for the 321
summation query is shown in FIGURE 3. 322
The Sum column in FIGURE 3 was executed for all three bucket sizes used in the 323
analysis. The sum data was imported back into GIS for thematic mapping using natural jenks for 324
statistical recognition of large sums, or clusters, of selective enforcement locations. Natural 325
jenks organize the data into groups, or classes, that exclusively minimize the variance of the data 326
points included in each class. While the located selective enforcement points may not be the 327
exact area law enforcement was directed to patrol with extra-duty hours, the largest natural jenk 328
was used to identify the areas with statistically high levels of selective enforcement and was 329
therefore considered a selective enforcement location for the analysis. 330
Simandl, Graettinger, Smith, Jones, Barnett 11
Paper revised from original submission.
331
FIGURE 3 SQL query summation analysis logic: centering the summation on a 332
particular milepost by LEAD and LAG functions 333
Three natural jenk classes were used on the one mile bucket summation and grouping 334
analysis values in order to optimize the length of the located and verified selective enforcement 335
locations. The one-tenth of a mile and one-half of a mile analyses generated locations of smaller 336
length, which were in series and relatively close together, whereas the one mile analysis 337
recognized the smaller, close locations as one whole location. The limit value for the one mile 338
analysis was 2885 selective enforcement points at a particular Route-MP. Across the state of 339
Alabama, 26 locations were identified as selective enforcement areas through this procedure. 340
Incorporating eCrash Data from CARE 341
The Critical Analysis Reporting Environment (CARE) is data analysis software developed by 342
The Center for Advanced Public Safety at The University of Alabama. A data source of crash 343
records from 2010-2014 were imported into CARE, and filters were created to export crash 344
report data for the research time period as a GIS shapefile with GPS coordinates. Two exports 345
were completed: crashes during one year before the selective enforcement year and crashes 346
during the selective enforcement year. The created filters also incorporated highway 347
classifications, and only exported crashes along Interstate, Federal and State routes. A spatial 348
join of the GIS shapefiles of the exported crash data to the Route-Milepost information was 349
performed to ensure consistency of location identification with the selective enforcement points 350
and eCitations. 351
Simandl, Graettinger, Smith, Jones, Barnett 12
Paper revised from original submission.
RESULTS 352
For a one year study of the Alabama Mobile Selective Law Enforcement Campaign from August 353
1, 2010 through July 21, 2011, selective enforcement efforts were located at 26 areas along state 354
routes across the state of Alabama. After further investigation of each location using GIS and 355
Google Maps, three locations that were found to have high selective enforcement presence were 356
located at state trooper posts and were omitted from further study. Additionally, two different 357
areas were found to have two separate defined locations of selective enforcement high points, 358
separated by 0.2 miles, and were therefore merged. 359
Number of Crashes and Citations Before and During Selective Enforcement 360
The number of crashes and the number of issued citations were evaluated at the 21 361
locations before and during the selective enforcement research year, including both regular and 362
overtime shifts. The number of crashes and citations for each location over the course of the 363
before and during year are shown in TABLE 2, along with the percent change between the 364
before and during selective enforcement crash and citation counts. A negative percent change 365
for crashes marks a decrease in the number of crashes during the selective enforcement research 366
year, and a positive percent change for citations marks an increase in the number of issued 367
citations. TABLE 2 is ordered by the percent change in the number of crashes at each location, 368
and is sectioned off by negative and positive percent changes. There were 11 locations that 369
experienced a decrease in the number of crashes. These locations had variable changes in the 370
number of issued citations. Notably, Location 9, an enforcement effort between two 371
interchanges, experienced a citation increase of 2015%. 372
Simandl, Graettinger, Smith, Jones, Barnett 13
Paper revised from original submission.
TABLE 2 Crash and Citation Data for the Selective Enforcement Locations, 373
organized by percent change on crashes before and during selective enforcement 374
Crashes Citations Equivalent C
Level Crashes
Location
One
Year
Before
One
Year
During
Percent
Change
One
Year
Before
One
Year
During
Percent
Change
One
Year
Before
One
Year
During
7 2 0 -100.00% 93 153 64.52% 0 0
8 2 0 -100.00% 31 23 -25.81% 0 0
12 33 19 -42.42% 1512 963 -36.31% 27 5
21 22 14 -36.36% 91 257 182.42% 10 14
11 10 7 -30.00% 180 180 0.00% 6 4
19 17 14 -17.65% 75 505 573.33% 5 11
10 24 20 -16.67% 268 1133 322.76% 0 9
5 15 13 -13.33% 4 41 925.00% 8 1
13 16 14 -12.50% 916 511 -44.21% 8 9
9 16 15 -6.25% 33 698 2015.15% 4 4
15 16 15 -6.25% 214 826 285.98% 10 10
3 22 22 0.00% 886 1243 40.29% 11 8
16 2 2 0.00% 33 46 39.39% 6 2
20 9 9 0.00% 144 343 138.19% 2 2
1 13 15 15.38% 311 223 -28.30% 7 7
4 43 50 16.28% 35 170 385.71% 21 15
6 7 9 28.57% 371 365 -1.62% 1 1
18 13 19 46.15% 229 568 148.03% 7 9
2 19 30 57.89% 140 161 15.00% 7 11
17 3 5 66.67% 27 91 237.04% 0 0
14 13 34 161.54% 868 1599 84.22% 8 10
375
Severity weighted crash counts are also presented in TABLE 2. The weighted crash 376
counts were included in the analysis of the effectiveness of selective enforcement in order to 377
draw conclusions on any changes in the severity of crashes. For instance, the number of severe 378
crashes may have decreased, while the number of total crashes may have remained constant or 379
even increased. The severity weighted crash counts for the 21 selective enforcement areas 380
before and during the selective enforcement year are presented in terms of equivalent C level 381
crashes, obtained by using the “KABCO” injury scale (Federal Highway Administration, 2011). 382
Evaluating the Effectiveness of Selective Enforcement 383
The effectiveness of the selective enforcement efforts were evaluated in terms of crash 384
frequencies and the number of issued citations. A paired difference t-test was used to determine 385
whether the mean, or expected value, of the differences between the before and during crash 386
frequencies and citation counts yielded a significant change. A positive value change for crashes 387
Simandl, Graettinger, Smith, Jones, Barnett 14
Paper revised from original submission.
will conclude that the number of crashes decreased during the selective enforcement efforts. The 388
null and alternative hypotheses for the before and during crash number tests are, 389
𝐻0: 𝜇𝑑 ≤ 0 and 𝐻1: 𝜇𝑑 > 0 390
where 𝐻0 is the null hypothesis, 𝐻1 is the alternative hypothesis, and 𝜇𝑑 is the difference in the 391
means. A negative value change for citations will conclude that the number of issued citations 392
increased during selective enforcement. The null and alternative hypotheses for the before and 393
during citations number tests are, 394
𝐻0: 𝜇𝑑 ≥ 0 and 𝐻1: 𝜇𝑑 < 0 395
If the null hypothesis is rejected based on the t-test statistic and a corresponding P-value 396
less than a particular significance level, than the alternative hypothesis is accepted, meaning 397
there was a statistical change in the number of crashes or citations at the various selective 398
enforcement locations. For example, at a significance level of 0.05, the null hypothesis is 399
rejected with 95% confidence and is considered statistically significant. The paired difference t-400
test determines whether a change in the mean values of two samples is statistically significant, 401
interpreted by the resulting P-value. The actual mean values of the samples do not indicate the 402
results of the test since the variance of the datasets may be differing. 403
The 21 locations were analyzed by area type. Locations were considered rural if the 404
population of the location municipality was less than 5000, while locations were considered 405
urban if the population of the location municipality was greater than 5000. The paired difference 406
t-test results for the total number of crashes, the number of issued citations, and the number of 407
equivalent C injury level crashes for urban and rural selective enforcement locations are shown 408
in TABLE 3. 409
For urban selective enforcement locations along state routes in Alabama from August 1, 410
2010 through July 21, 2011, the p-value for the crash frequency data was 0.148. The null 411
hypothesis (the number of crashes remained constant or increased) can be rejected at a 412
significance level of 0.15, or with 85% confidence. While the actual mean value of the crash 413
frequency went up during selective enforcement, from 14.3 to 17.2, the variance also increased. 414
For rural selective enforcement locations, the p-value for the crash frequency data was 0.122. 415
The null hypothesis can be rejected at a significance level of 0.15, or with 85% confidence. The 416
results confirm that crash frequencies were reduced at both urban and rural selective enforcement 417
locations, with 85% confidence. While not considered statistically significant at a significance 418
level of 0.15, a marked decrease or trend was identified. The number of issued citations 419
increased during selective enforcement for both urban and rural locations, with rejections of the 420
null hypotheses with low-p values of 0.040 and 0.103, respectively. The results confirm that the 421
number of issued citations was increased at selective enforcement locations. The results for the 422
equivalent C severity level crash frequencies at both urban and rural locations did not constitute 423
the rejection of the null hypothesis. Therefore, when analyzing urban and rural locations 424
exclusively, it is plausible that selective enforcement was not effective in reducing the severity of 425
crash occurrences. 426
Simandl, Graettinger, Smith, Jones, Barnett 15
Paper revised from original submission.
In summary, there is evidence of trends between selective enforcement and the decrease 427
of crashes by analyzing urban and rural locations separately, as opposed to collectively. 428
Analyzing the locations by area type explains some of the variance of the data, and shows some 429
improvement to understanding the effectiveness of selective enforcement. However, the 430
decrease in the severity of crashes was inconclusive. In each paired difference test performed, 431
the results confirm that the number of issued citations increased during the selective enforcement 432
efforts. Overall, there was a statistically significant increase in citations at all identified selective 433
enforcement locations, while there was a marked decrease in the number of crashes. An analysis 434
that included crash and citation frequency collectively to directly examine the relationship 435
between the two was not included in this study. The goal of this work was to evaluate the effect 436
of selective enforcement on the number of citations, and more importantly the number of 437
crashes, before and during selective enforcement. The characteristics of the selective 438
enforcement locations, such as number of lanes, average annual daily traffic, and even safe 439
pullout zones for officers, varied greatly and have a large effect on evaluating correlations 440
between citations and crashes across all selective enforcement areas. 441
TABLE 3 Paired Difference t-test Results for Urban and Rural selective 442
enforcement locations along state routes in Alabama 443
Urban Locations
Crashes Citations
Equivalent C
Crashes
One Year
Before
One Year
During
One Year
Before
One Year
During
One Year
Before
One Year
During
Mean 14.3 17.2 181.1 357.8 6.9 6.7
Variance 151.122 257.956 66137.21 222289.1 34.9889 30.9
Observations 10 10 10 10 10 10
Hypothesized μd 0
0
0
Deg. of Freedom 9
9
9
t-Statistic -1.111
-1.967
0.198
P-value 0.148
0.040
0.424
Rural Locations
Crashes Citations Equivalent C
Crashes
One Year
Before
One Year
During
One Year
Before
One Year
During
One Year
Before
One Year
During
Mean 15.818 14 422.727 592.818 7.182 5.909
Variance 72.564 39.8 227955.4 166447 58.964 17.491
Observations 11 11 11 11 11 11
Hypothesized μd 0 0 0
Deg. of Freedom 10 10 10
t-Statistic 1.237 -1.353 0.525
P-value 0.122 0.103 0.305
Simandl, Graettinger, Smith, Jones, Barnett 16
Paper revised from original submission.
CONCLUSIONS 444
For the analysis of the effectiveness of selective law enforcement in Alabama, Structured Query 445
Language (SQL) was used to employ data minimization techniques necessary to turn a Big 446
Dataset into valuable information. The SQL techniques successfully turned over 37 million 447
points of GPS data into 1.3 million points of selective enforcement location information, enabled 448
the geolocation of 72.6% of Electronic Citations (eCitations), and identified 21 selective 449
enforcement areas across the State of Alabama. With Big Data Analytics, Geographic 450
Information System (GIS) technology was shown to be useful for the evaluation of the increases 451
and decreases in crash frequencies and the number of issued citations before and during selective 452
enforcement. 453
Officer patrol routes, geolocated citations, crash data, and selective enforcement time 454
periods were integrated into one cohesive source of spatial and temporal data for the analysis. 455
The processing tasks in this research were essential to the evaluation of selective law 456
enforcement and the results, exemplifying data-driven discovery. Paired difference t-tests 457
confirm the decrease of crash frequency with 85% confidence, at urban and rural locations 458
separately. The analysis of the number of issued citations at the locations confirmed that 459
citations increased during the selective enforcement year by an average of 254%. While a 460
statistical increase in the number of issued citations does not necessarily cause a statistical 461
decrease in the number of crashes, the increase seems to have some effect on the number of 462
crashes at various locations during the selective enforcement effort year in the study. Future 463
work will include analyzing crash frequencies several years after the selective enforcement effort 464
to potentially draw conclusions on any lasting effects, in attempts to understand possible changes 465
in driver behavior. However, continued selective enforcement campaigns or a high presence of 466
selective enforcement police presence in the same locations may cause potential problems for 467
interpreting these results. Future work will also involve reducing crash frequency at high crash 468
locations through selective enforcement campaign recommendations such as implementing steps 469
to locate crash hot spots for selective enforcement efforts, developing crash modification factors, 470
and calculating a return on investment for selective enforcement campaigns. 471
ACKNOWLEDGMENTS 472
This project was funded by the Alabama Department of Transportation. Special thanks for their 473
contributions to this work go to Beau Elliott and Jesse Norris, The Center for Advanced Public 474
Safety, and Brittany Shake, Graduate Research Assistant, The University of Alabama. 475
REFERENCES 476
AASHTO. (2010). Highway Safety Manual. Washington, DC: American Society of State 477
Highway and Transportation Officials. 478
Center for Advanced Public Safety. (2009). Center for Advanced Public Safety. Retrieved 479
February 22, 2015, from Overview: http://caps.ua.edu/overview.aspx 480
Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J., & Welton, C. (2009). MAD Skills: New 481
Analysis Practices for Big Data. 2(2), 1481-1492. 482
Simandl, Graettinger, Smith, Jones, Barnett 17
Paper revised from original submission.
Engineering and Physical Sciences Research Council. (2014). Engineering Grand Challenges: 483
Report on outcomes of a retreat. Stratford-upon-Avon. 484
Erke, A., Goldenbeld, C., & Vaa, T. (2009). Good practices in the selected key areas: Speeding, 485
drink driving, and seat belt wearing: Results from meta-analysis. European Commission. 486
Jahanian, F. (2012, July). Pillars of Societal Innovation: Research and Education in Computing 487
and Information. Biennial CRA Conference. Snowbird, UT: National Science Foundation. 488
Jonah, B., & Grant, B. (1985). Long-term effectiveness of selective traffic enforcement programs 489
for increasing seat belt use. The Journal of Applied Psychology, 70(2), 257-263. 490
Lohr, S. (2012, February 11). The Age of Big Data. The New York Times, 11. 491
Mayer-Schonberger, V., & Cukier, K. (2013). Letting the data speak. In Big Data: A revolution 492
that will transform how we live, work, and think. Houghton Mifflin Harcourt. 493
McAfee, A., & Brynjolfsson, E. (2012). Big Data: The Management Revolution. Harvard 494
Business Review, 90(10), pp. 61-67. 495
Mestyan, M., Yasseri, T., & Kertesz, J. (2013). Early prediction of movie box office success 496
based on Wikipedia activity big data. PloS one, 8.8. 497
Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A., Jarawan, E., et al. (2004). World 498
report on road traffic injury prevention. Geneva: World Heath Organization. 499
Provost, F., & Fawcett, T. (2013). Data Science and its relationship to big data and data-driven 500
decision making. Big Data, 1(1), 51-59. 501
Simandl, J. (2015). GIS-Based Evaluation of the Effectiveness of Selective Law Enforcement 502
Campaigns in Reducing Crashes. Tuscaloosa, Alabama: The University of Alabama. 503
Vaa, T. (1997). Increased police enforcement: Effects on speed. Accident Analysis and 504
Prevention, 29(3), 373-385. 505
Viescas, J., & Hernandez, M. (2014). SQL Queries for Mere Mortals. Boston, MA: Addison-506
Wesley. 507
Walter, L., Broughton, J., & Knowles, J. (2011). The effects of increased police enforcement 508
along a route in London. Accident Analysis and Prevention, 43(3), 1219-1227. 509
top related