
Simandl, Graettinger, Smith, Jones, Barnett 1

Paper revised from original submission.

MAKING USE OF BIG DATA TO EVALUATE THE EFFECTIVENESS OF SELECTIVE LAW ENFORCEMENT IN REDUCING CRASHES

Jenna K. Simandl*, Andrew J. Graettinger, Randy K. Smith, Steven Jones, Timothy E. Barnett

*Corresponding Author

Jenna K. Simandl
Kimley-Horn and Associates, Inc.
1700 Willow Lawn Drive Suite 200
Richmond, VA 23230
Tel: (804) 396-6017; Email: [email protected]

Andrew J. Graettinger
The University of Alabama
Civil, Construction, and Environmental Engineering
Box 870205, Tuscaloosa, AL 35487-0205
Tel: (205) 348-1707; Fax: (205) 348-0783; Email: [email protected]

Randy K. Smith
Department of Computer Science

Steven Jones
Civil, Construction, and Environmental Engineering

Timothy E. Barnett
State Safety Operations Engineer
Alabama Department of Transportation
Montgomery, AL 36110
Tel: (334) 353-6464; Email: [email protected]

Word Count: 250 (abstract) + 5571 (text) + 1500 (6 tables/figures x 250) = 7321 words

Original Submission Date: 07/30/2015



ABSTRACT

State Departments of Transportation (DOTs) across the nation fund selective enforcement campaigns aimed at intensifying law enforcement at certain locations in order to improve traffic safety. At the same time, many states passively collect large datasets such as officer Global Positioning System (GPS) location tracks. To evaluate the effectiveness of selective enforcement, an approach was developed that employs Structured Query Language (SQL) and Geographic Information System (GIS) technology to mine and integrate police patrol patterns, citations issued, crash occurrences, and selective enforcement periods. This information was analyzed in a relational database within a spatial and temporal analysis framework. The intent was to use GIS technology alone; however, the size of the datasets was prohibitive, and SQL was used as a Big Data analytics tool. The SQL techniques successfully turned over 37 million points of GPS data into 1.3 million points of selective enforcement location information, enabled the geolocation of 72.6% of Electronic Citations (eCitations), and identified 21 selective enforcement areas across the State of Alabama. With Big Data Analytics, GIS technology was reestablished as useful for the evaluation of changes in crash and citation frequencies before and during selective enforcement. Paired difference t-tests confirm the decrease of crash frequency with 85% confidence at urban and rural locations. The analysis of the number of citations at the locations confirmed that citations increased during selective enforcement by an average of 254%. The developed methodology is a successful approach to using large datasets for an unintended purpose to make valuable engineering conclusions and data-driven discoveries.

INTRODUCTION

In the age of Big Data, large datasets have become increasingly available to scientists and researchers. The volume of data is growing, the speed of data creation is increasing, and the variety of data, while generally unorganized, is accumulating (McAfee & Brynjolfsson, 2012). Big Data from both experimental and open-source platforms has created a transformative data-intensive science for researchers in science, engineering, and other business disciplines (Jahanian, 2012). Advanced management strategies, computations, and minimization techniques are being developed to turn Big Data into useful information for advanced knowledge, predictions, and decision making (Lohr, 2012; Mayer-Schonberger & Cukier, 2013; Mestyan, Yasseri, & Kertesz, 2013).

Big Data enables a paradigm shift from hypothesis-driven research to data-driven discovery (Lohr, 2012; Jahanian, 2012). Big Data is also proposed as a means to help resolve the Grand Challenges defined by the National Academy of Engineering (Engineering and Physical Sciences Research Council, 2014). A cohesive relationship between data science, Big Data, and domain expertise is required for effective decision making and data-driven discoveries (Provost & Fawcett, 2013). However, the nature of Big Data is that the dataset is too large for traditional data processing, so new technologies or techniques are required.

A research project by The University of Alabama for the Alabama Department of Transportation (ALDOT) sought to use Geographic Information System (GIS) technology to evaluate the effectiveness of selective law enforcement along state routes in reducing crashes in Alabama. Officer patrol route data, Electronic Citation (eCitation) data, Electronic Crash (eCrash) data, and selective enforcement data were to be integrated into one spatial and temporal GIS framework for analysis of citation counts and crash frequencies before and during selective enforcement. However, the large dataset quickly overwhelmed GIS processing tools. Structured Query Language (SQL) is the standard language for working with relational database tables, and can be used as a Big Data Analytics tool (Cohen, Dolan, Dunlap, Hellerstein, & Welton, 2009). While SQL is not as advanced as other data management tools, SQL queries were executed for successful data minimization in order to perform traffic safety analysis on the effectiveness of selective enforcement along state routes in Alabama from August 1, 2010 to July 31, 2011.

A relational database is a collection of data, organized in various tables of attributes (columns) and records (rows) that have a specific, unique relationship with one another. The tables are related through the use of primary keys: a primary key is a unique attribute in one table that is included in other tables as a foreign key. SQL query expressions were written and executed in order to translate data in tables into valuable information (Viescas & Hernandez, 2014). The following sections present the logic behind the executed SQL queries used for data minimization of Big Data in order to use GIS tools to evaluate the effectiveness of selective law enforcement. The developed methodology is a successful approach to using large datasets for an unintended purpose in order to make valuable engineering conclusions and discoveries.

BACKGROUND

State Departments of Transportation (DOTs) across the nation fund selective enforcement campaigns aimed at intensifying law enforcement at certain locations in the state in order to improve traffic safety and reduce the number of crashes. Many case studies show that selective police enforcement, whether for a specific negative driver behavior or for a particular high crash location, can have an influential impact on driver behavior, which can successfully reduce the number of crashes in a location (Jonah & Grant, 1985; Vaa, 1997; Erke, Goldenbeld, & Vaa, 2009; Walter, Broughton, & Knowles, 2011). The first edition of the Highway Safety Manual, while focused on engineering analysis and treatment, states that crash frequencies may be reduced through enforcement efforts (AASHTO, 2010).

Large datasets are available for the analysis of the effectiveness of selective enforcement efforts. By integrating police officer patrol patterns, citations issued, crash occurrences, and selective enforcement periods into one spatial and temporal analysis framework, selective enforcement locations can be verified, and citation and crash frequencies in the respective locations can be identified and analyzed. Due to the volume of the data and its constant growth, big data management strategies need to be employed before analysis is feasible.

The Alabama Mobile Selective Law Enforcement Campaign attempts to change negative driver behaviors that contribute to the most severe crashes, including speeding, driving under the influence, and the failure to wear a seatbelt (Peden, et al., 2004; Erke, Goldenbeld, & Vaa, 2009). An agreement between the Alabama Department of Public Safety (DPS) and ALDOT was in effect between October 2007 and September 2011, employing police officers to work extra-duty shifts for increased law enforcement efforts on various state and federal roadways.

The overarching goal of the increased enforcement at selected high crash locations was to deter negative driver behaviors through an increase in issued citations, and thereby decrease the number of crashes. In order to evaluate the effectiveness of selective enforcement, SQL and GIS technology were used to integrate police officer patrol patterns, citations issued, crash occurrences, and selective enforcement periods into a relational database and a spatial and temporal analysis framework. A SQL relational database was used as the Big Data analytics tool. The database and GIS map information provide the means to verify selective enforcement locations and evaluate the increases and decreases in crash and citation frequencies before and during selective enforcement.

A relational database of officer patrol routes, citations, and selective enforcement data was created. In the state of Alabama, State Trooper vehicles are equipped with a Global Positioning System (GPS) unit that is polled every 30 seconds. The location of the trooper is recorded along with a timestamp. Officer patrol route location data is a large dataset that is continually growing. Additionally, Electronic Citations (eCitations) issued by law enforcement officials in the state of Alabama are stored in a database. These eCitations do not require GPS coordinates or other structured location data. The eCitations do have an electronic timestamp indicating when the citation was issued. A temporal join methodology was developed to cross-reference timestamps between officer patrol route GPS records and issued citations in order to geolocate citations. Selective enforcement data for participating officers was obtained from DPS timesheets and DOT invoices. A methodology to verify selective enforcement locations was developed by comparing officer patrol routes with the dates and hours of selective enforcement shifts. The steps utilized SQL in order to obtain only relevant data for a one-year analysis of selective enforcement. The relevant data was imported into a GIS, along with crash data from the Critical Analysis Reporting Environment (CARE) (Center for Advanced Public Safety, 2009), in order to quantify the number of crashes and the number of citations in selective enforcement locations before and during the selective enforcement effort. Due to the sensitive nature of crash data, maps were only generated for internal DOT use. This methodology is extensible to other states with available officer patrol routes, citation locations, crash data, and selective enforcement time periods.

METHODOLOGY

A methodology was developed using SQL to geolocate eCitations and verify where selective enforcement efforts took place in Alabama. This required identifying key locations in the GPS officer location data and importing that data into GIS technology in order to evaluate the effectiveness of the selective enforcement campaign in reducing crashes at high crash locations. Initial work for this project involved digitizing selective enforcement shift information, including the date and hours worked, for each participating officer from DPS extra-duty timesheets and DOT invoice files to spreadsheets for use in electronic analysis. Law enforcement officials in the state of Alabama are given a unique UserID, which is used for identification purposes in both the GPS officer patrol route and eCitation databases. The unique UserID was identified for each participating officer in the selective enforcement data.

Data processing tasks were executed using SQL queries because of the large datasets and the limited capabilities of GIS tools. The SQL queries described herein are presented in detail in Simandl (2015). The first task for this project included developing a methodology to cross-reference timestamps between officer patrol routes and issued citations in order to geolocate citations with and without GPS coordinates. Furthermore, a methodology to verify selective enforcement locations was developed by comparing officer patrol routes with the dates and hours of selective enforcement shifts. The data resulting from the SQL queries was specific information, as opposed to the Big Data in its entirety, and was imported into a GIS along with crash data from the Critical Analysis Reporting Environment (CARE). Only data essential for analysis was imported, which made it feasible to use GIS to verify selective enforcement locations and quantify the number of crashes and citations in selective enforcement locations before and during selective enforcement efforts.

Geolocating eCitations

In a spatial-temporal analysis performed through a sequence of SQL queries, the time-stamped eCitations and State Trooper location data were cross-referenced in order to assign each eCitation to the GPS point nearest to it in time. The UserIDs of the officers who participated in selective enforcement were joined to the eCitation data based on UserID, in order to return eCitations written by the relevant officers. A criterion to exclude voided eCitations was also included, as voided eCitations were not officially issued. Over the course of the year being analyzed in this research, including both selective enforcement shifts and regular duty shifts, 475,214 eCitations were issued by the participating officers.

Data for each individual GPS trace point includes the UserID, the latitude and longitude location, and a timestamp recorded in UTC time. The officers who participated in selective enforcement were joined to the GPS data based on UserID, in order to return GPS location points for the relevant officers exclusively. For the year being analyzed in this research, including both selective enforcement shifts and regular duty shifts, 37,628,124 GPS points were recorded for the selective enforcement participating officers.

In order to cross-reference the timestamps of the eCitations and the GPS location points, a SQL query was executed to calculate the time difference between GPS location points and eCitations for a particular officer using the DATEDIFF() command. This query only returned eCitations joined to a GPS location point within 600 seconds in order to maximize the number of eCitations located while also providing reasonable eCitation location accuracies. Out of the returned results, the GPS location point with the minimum time difference to the eCitation was chosen to signify the most accurate location of the eCitation.
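The temporal join described above can be sketched as follows. This is a minimal illustration, not the authors' queries (those are documented in Simandl, 2015). It assumes an in-memory SQLite database, hypothetical table and column names (gps, citations, user_id, gps_time, issued_time), and timestamps stored as integer seconds so that plain subtraction stands in for the DATEDIFF() command.

```python
import sqlite3

# Hypothetical schema and sample rows; timestamps are integer seconds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gps (user_id TEXT, gps_time INTEGER, lat REAL, lon REAL);
    CREATE TABLE citations (citation_id INTEGER, user_id TEXT, issued_time INTEGER);
""")
conn.executemany("INSERT INTO gps VALUES (?, ?, ?, ?)", [
    ("A1", 1000, 33.20, -87.50),
    ("A1", 1030, 33.21, -87.51),
    ("A1", 1060, 33.22, -87.52),
    ("B2", 5000, 32.40, -86.30),
])
conn.executemany("INSERT INTO citations VALUES (?, ?, ?)", [
    (1, "A1", 1040),  # closest GPS point is 10 s away
    (2, "A1", 1700),  # closest GPS point is 640 s away, outside the window
    (3, "B2", 5200),  # closest GPS point is 200 s away
])

QUERY = """
    WITH candidates AS (
        -- every GPS point of the same officer within the time window
        SELECT c.citation_id, g.lat, g.lon,
               ABS(c.issued_time - g.gps_time) AS gap_seconds
        FROM citations AS c
        JOIN gps AS g ON g.user_id = c.user_id
        WHERE ABS(c.issued_time - g.gps_time) <= :max_gap
    ),
    best AS (
        -- smallest time difference per citation
        SELECT citation_id, MIN(gap_seconds) AS gap_seconds
        FROM candidates
        GROUP BY citation_id
    )
    SELECT cand.citation_id, cand.lat, cand.lon, cand.gap_seconds
    FROM candidates AS cand
    JOIN best ON best.citation_id = cand.citation_id
             AND best.gap_seconds = cand.gap_seconds
    ORDER BY cand.citation_id
"""

# 600 seconds is the window stated in the text.
located = conn.execute(QUERY, {"max_gap": 600}).fetchall()
print(located)
```

In this sketch, a citation with two equally close GPS points would be returned twice; a production query would need a tie-breaking rule, which the text does not specify.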

The spatial-temporal analysis, produced through a sequence of SQL queries, located 68.6% of eCitations. Further investigation showed that 49 of the officers (11.6%) participating in selective enforcement efforts did not have any GPS trace data associated with their respective UserIDs, potentially due to lack of GPS equipment and/or faulty equipment. With respect to the 88.4% of officers that had trace data, 72.6% of eCitations were geolocated using the developed methodology. Additionally, 95% of those eCitations were joined to a GPS point with a time difference of only 30 seconds. The resulting dataset of 325,926 eCitations was imported into GIS for visual mapping. Citations that were not successfully located either lacked officer location data or fell within breaks in the officer trace location data caused by lost GPS signals. The loss of eCitation data from the analysis has minimal negative implications for the results. The eCitations that were included represent a conservative understanding of the number of eCitations at various locations.

Identifying Selective Enforcement Locations

Selective enforcement DPS timesheets and DOT invoices provided the following information: Date Worked, First Name, Last Name, Middle Initial, Suffix, and Hours. To relate the selective enforcement hours worked with the GPS location data, the total hours worked in a full shift was found. By understanding the relationship between total hours worked and selective enforcement hours worked by an officer on a particular day, specific GPS location points were defined as either “during selective enforcement” or “not during selective enforcement.” The location of the “during selective enforcement,” or overtime, work is vital for the analysis of the effectiveness of selective enforcement. The workflow framework for identifying individual GPS points as “during selective enforcement” is shown in FIGURE 1. FIGURE 1(a) illustrates the steps to calculate full shifts from the GPS officer patrol route data, in order to establish the necessary relationship to join in the selective enforcement data, denoted by the pink arrow. These steps are described in the sub-section titled Calculating Full Officer Shifts from GPS Location Data. FIGURE 1(b) illustrates the steps to identify individual GPS points as during or not during selective enforcement shifts after the establishment of the relationship between the data, as noted by the pink arrow. These steps are described in the sub-section titled Defining Individual GPS Points During Selective Enforcement Shifts. The following steps were used to confirm that selective enforcement was completed in the appropriate locations.



FIGURE 1 Workflow framework for (a) establishing a relationship between the GPS data and selective enforcement (SE) data, denoted by the pink arrow; and (b) identifying individual GPS points as during selective enforcement

Calculating Full Officer Shifts from GPS Location Data

A series of SQL queries was used to define the beginning and ending of a full working shift for an officer and calculate the total hours worked for the shift. The LEAD function in SQL accesses the next row in a column of a table, and the LAG function in SQL accesses the previous row in a column of a table. To calculate the time difference in seconds between two successive GPS location points, the LEAD function was used, partitioned by UserID and ordered by GPS Time to access the Next GPS Time in the table. The results of the LEAD calculation are stored in a new column in a table. The DATEDIFF() command created an additional column, recording the difference between the GPS Time column and the newly added Next GPS Time column. An example of this query is shown in FIGURE 2.
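The LEAD step can be illustrated with a short sketch. It is not the query shown in FIGURE 2; it assumes an SQLite build with window-function support, a hypothetical gps table, and timestamps stored as integer seconds, so that subtraction replaces DATEDIFF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gps (user_id TEXT, gps_time INTEGER)")
conn.executemany("INSERT INTO gps VALUES (?, ?)", [
    ("A1", 0), ("A1", 30), ("A1", 60), ("A1", 30000),
    ("B2", 100), ("B2", 130),
])

# LEAD looks one row ahead within each officer's time-ordered points.
rows = conn.execute("""
    SELECT user_id,
           gps_time,
           LEAD(gps_time) OVER (PARTITION BY user_id ORDER BY gps_time)
               AS next_gps_time,
           LEAD(gps_time) OVER (PARTITION BY user_id ORDER BY gps_time) - gps_time
               AS diff_seconds
    FROM gps
    ORDER BY user_id, gps_time
""").fetchall()

for row in rows:
    print(row)
```

The last point of each officer has no following row, so its next time and difference are NULL (None in Python). The large difference on the third row of officer A1 is the kind of gap that marks a shift boundary.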



FIGURE 2 Example dataset and query results to calculate the time difference between successive GPS points

The beginning and ending of a shift were identified in the table by using the CASE function in SQL, which enforces a “when-then” statement. When the time difference between successive GPS location points was greater than a predetermined time tolerance limit, then the GPS location point was considered to be the beginning of a shift. An example of a large time difference signifying the beginning of a shift is shown in FIGURE 2, highlighted with a red circle. On the other hand, when the time difference between successive GPS location points was less than or equal to the predetermined time tolerance limit, then the GPS location point was considered to be within a shift. A time tolerance was sought that best represents the GPS data and the reasonable shift lengths that officers work. A tolerance of six hours was chosen because it minimized shifts of over 24 hours compared to using eight hours, kept an optimum number of seven-, eight-, and nine-hour shifts, and minimized the number of one-hour shifts compared to using five hours.

Using the time tolerance of six hours, the beginning and ending timestamps of a particular shift for each officer were selected and compiled into a new table with new columns of Shift Start and Shift End. A query was used to calculate the difference between these two timestamps, which resulted in the total hours worked for a particular shift and generated a total of 55,402 GPS-identified shifts. The date difference was calculated in seconds and then converted to hours to ensure the precision of the time. Shifts with hours worked between 0.5 hours and 17 hours were considered reasonable, as 0.5 hours is the smallest increment recorded in the selective enforcement invoices and 17 hours provides a conservative inclusion of a maximum possible shift of two back-to-back 8-hour shifts. Out of all the GPS-identified shifts, 94.2% had a reasonable shift length.
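The shift-building logic can be sketched in a few lines. The study implemented it with SQL CASE expressions; the function and variable names below are hypothetical, and treating the 0.5-hour and 17-hour bounds as inclusive is an assumption.

```python
from itertools import groupby

SHIFT_GAP_SECONDS = 6 * 3600       # six-hour tolerance from the text
MIN_HOURS, MAX_HOURS = 0.5, 17.0   # reasonable shift lengths from the text


def build_shifts(points):
    """Split (user_id, epoch_seconds) points into shifts.

    A new shift starts whenever the gap to the previous point of the
    same officer exceeds SHIFT_GAP_SECONDS. Returns a list of
    (user_id, shift_start, shift_end, hours_worked) tuples.
    """
    shifts = []
    for user_id, group in groupby(sorted(points), key=lambda p: p[0]):
        times = [t for _, t in group]
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > SHIFT_GAP_SECONDS:
                shifts.append((user_id, start, prev, (prev - start) / 3600))
                start = t
            prev = t
        shifts.append((user_id, start, prev, (prev - start) / 3600))
    return shifts


def is_reasonable(hours):
    """Apply the 0.5 to 17 hour filter (bounds assumed inclusive)."""
    return MIN_HOURS <= hours <= MAX_HOURS


# Hypothetical trace: an 8-hour shift, a 10-hour gap, then a 15-minute trace.
points = [("A1", h * 3600) for h in range(9)] + [("A1", 64800), ("A1", 65700)]
shifts = build_shifts(points)
print(shifts)
print([is_reasonable(s[3]) for s in shifts])
```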

Defining Individual GPS Points During Selective Enforcement Shifts

By understanding the GPS point data in terms of full officer shifts, a relationship to the selective enforcement data was established. Using the relationship between hours worked in a GPS-identified shift and the hours worked by an officer for selective enforcement, a series of SQL queries was used to define an individual GPS point as “during selective enforcement” or “not during selective enforcement.” To relate the selective enforcement shifts to the identified GPS location point shifts, a join was executed in SQL based on UserID and Shift Start date. Of the selective enforcement shifts, 66% were accounted for and joined to an appropriate GPS-identified shift. Because of the 49 officers that did not have any GPS trace data associated with their respective UserIDs, 477 selective enforcement shifts were not successfully included. When excluding those particular officers, the GPS-generated shifts successfully recognized 72.6% of the logged selective enforcement shifts. Additionally, this query calculated the difference between GPS shift hours worked and Selective Enforcement shift hours worked. The loss of selective enforcement data from the analysis due to a lack of officer GPS data suggests that additional selective enforcement locations across the state were potentially unidentified or that various locations across the state could potentially have had more police enforcement presence than the analysis shows. However, the verified selective enforcement locations in the study still represent the highest level of selective enforcement presence from the available data across the state between August 2010 and July 2011.

Simultaneously, queries to calculate cumulative time worked in a GPS-identified shift were executed. Using the difference between the GPS-identified shift hours worked and the selective enforcement hours worked in conjunction with the cumulative time worked, individual GPS location points were denoted as “during selective enforcement” or “not during selective enforcement.” The scenarios for the CASE function, a “when-then” command, used in the SQL query and the necessary assumptions for the methodology are shown in TABLE 1.

TABLE 1 The CASE function scenarios, assumptions, and query description for the SQL query used to denote an individual GPS point as during selective enforcement.

Scenario | Assumption | Query
Logged selective enforcement hours < GPS shift hours worked | The last 'x' hours of the GPS shift are the overtime selective enforcement hours | When Diff <= Cumulative Time, then 'Yes'
Logged selective enforcement hours = GPS shift hours worked | Within 0.5 hours is considered "equal" | When Diff is Between -0.5 and 0.5, then 'Yes'
Logged selective enforcement hours > GPS shift hours worked | The whole shift is selective enforcement, and the discrepancy can be explained by a loss of GPS signal | When Diff < 0, then 'Yes'
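The three scenarios in TABLE 1 can be read as a single decision rule. The sketch below assumes that Diff is the GPS shift hours minus the logged selective enforcement hours, which matches the sign conventions in the table, and that the whole-shift scenarios are tested before the partial-shift scenario. The function name and the order of the tests are illustrative assumptions.

```python
def during_selective_enforcement(gps_shift_hours, se_hours, cumulative_hours):
    """Return True if a GPS point falls in the selective enforcement part of a shift.

    gps_shift_hours:  total hours of the GPS-identified shift
    se_hours:         logged selective enforcement hours for that shift
    cumulative_hours: hours elapsed in the shift at this GPS point
    """
    diff = gps_shift_hours - se_hours
    if diff < 0:
        # Logged hours exceed the GPS shift: the whole shift counts.
        return True
    if -0.5 <= diff <= 0.5:
        # Within 0.5 hours is treated as equal: the whole shift counts.
        return True
    # Logged hours are shorter than the GPS shift: only the last
    # se_hours of the shift count, i.e. points at or after diff hours.
    return diff <= cumulative_hours


# A 10-hour GPS shift with 2 logged selective enforcement hours:
print(during_selective_enforcement(10, 2, 7.5))  # before the last 2 hours
print(during_selective_enforcement(10, 2, 8.0))  # start of the last 2 hours
```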

A final query was executed to create a table of only GPS points during selective enforcement shifts, resulting in a total of 1,647,711 GPS points. The series of SQL queries demonstrates successful data minimization by turning over 37.6 million GPS data points into approximately 1.65 million points of useful information.



Verifying Selective Enforcement Locations

Officers participating in the Alabama selective enforcement efforts were allowed up to two hours of paid travel time to selective enforcement locations. Therefore, the GPS points during selective enforcement are not all localized in a particular selective enforcement area. Additionally, police presence in a selective enforcement location does not necessarily mean the presence was statistically significant over other areas of patrol. In order to verify selective enforcement locations, a statistical method employing clustering of spatial data was implemented. To employ the location verification process, the GPS points during selective enforcement and eCitation data were integrated into GIS based on Route-Milepost (Route-MP) along state routes, resulting in 1,283,211 GPS points during selective enforcement and 243,800 eCitations over the course of the year. With Route-MP data associated with each point event, a clustering methodology was executed on the GPS point data using a SQL query technique involving the summation and grouping of point events at each Route-MP with various bucket sizes. This technique provides an efficient hot spot analysis. Hot spot analysis tools in GIS were not used because of the large amount of officer GPS location data and because GIS hot spot tools neglect the one-dimensional integrity of the roadway network.

SQL Query Technique to Verify Selective Enforcement Locations

A series of SQL queries using GPS point data identified as “selective enforcement” was implemented to verify selective enforcement locations by finding clusters of points at identifiable Route-Milepost locations. The Route-MP data was incorporated by summarizing the number of selective enforcement points at Route-MP locations using the COUNT aggregate function. Bucket sizes of one-tenth of a mile, one-half of a mile, and one mile were evaluated. To standardize the bucket sizes and confirm appropriate lengths along a particular route in the network, queries to calculate the difference between successive Mileposts and the cumulative difference of Mileposts were employed. The summation of “selective enforcement” points for each of the three bucket sizes was centered on the middle of the bucket and recorded. The LEAD and LAG functions were used to access the number of selective enforcement points in the rows after and before a particular row of data, respectively. A hypothetical example of the logic for the summation query is shown in FIGURE 3.
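The centered summation can be sketched outside SQL as follows. The sketch assumes each GPS point on a single route has already been reduced to an integer tenth-mile index (milepost multiplied by 10), and that the bucket is expressed as a number of neighboring rows on each side of the center; both are illustrative assumptions rather than details taken from the study's queries.

```python
from collections import Counter


def centered_bucket_sums(tenth_mile_indices, half_width):
    """Sum point counts in a window centered on each tenth-mile index.

    tenth_mile_indices: one integer per GPS point on a single route
    half_width:         number of neighboring indices included on each side
    Returns {index: centered sum} for every index between the minimum
    and maximum observed, so empty mileposts keep the window aligned.
    """
    counts = Counter(tenth_mile_indices)          # COUNT ... GROUP BY milepost
    lo, hi = min(counts), max(counts)
    return {
        i: sum(counts.get(j, 0) for j in range(i - half_width, i + half_width + 1))
        for i in range(lo, hi + 1)
    }


# Hypothetical points between mileposts 12.0 and 12.4 on one route.
points = [120] * 2 + [122] * 5 + [123] * 40 + [124] * 60
sums = centered_bucket_sums(points, half_width=1)
print(sums)
```

Filling in zero counts for mileposts with no points plays the role of the milepost-difference checks described above: without it, a window of neighboring rows could span more roadway than intended.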

The Sum column in FIGURE 3 was calculated for all three bucket sizes used in the analysis. The sum data was imported back into GIS for thematic mapping using Jenks natural breaks for statistical recognition of large sums, or clusters, of selective enforcement locations. Jenks natural breaks organize the data into groups, or classes, that minimize the variance of the data points included in each class. While the located selective enforcement points may not be the exact area law enforcement was directed to patrol with extra-duty hours, the highest natural breaks class was used to identify the areas with statistically high levels of selective enforcement, and these areas were therefore considered selective enforcement locations for the analysis.
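The idea behind natural breaks classification can be shown with a small brute-force sketch that tries every way of cutting the sorted values into classes and keeps the one with the smallest total within-class squared deviation. This is an illustration only: GIS software uses a more efficient algorithm, the values below are hypothetical, and exhaustive search is practical only for very small inputs.

```python
from itertools import combinations


def sum_squared_dev(values):
    """Sum of squared deviations from the class mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, n_classes):
    """Return the classes (lists of sorted values) with minimum
    total within-class squared deviation, found by exhaustive search."""
    data = sorted(values)
    best_cost, best_classes = None, None
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0, *cuts, len(data))
        classes = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_squared_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes


# Hypothetical centered sums at eight mileposts, split into three classes.
classes = natural_breaks([2, 7, 45, 105, 100, 4, 50, 3], n_classes=3)
print(classes)
print(classes[-1][0])  # smallest value in the highest class
```

Under this reading, mileposts whose sums fall in the highest class would be the candidate selective enforcement locations.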



FIGURE 3 SQL query summation analysis logic: centering the summation on a particular milepost by LEAD and LAG functions

Three natural breaks classes were used on the one-mile bucket summation and grouping analysis values in order to optimize the length of the located and verified selective enforcement locations. The one-tenth of a mile and one-half of a mile analyses generated locations of smaller length, which were in series and relatively close together, whereas the one-mile analysis recognized the smaller, close locations as one whole location. The limit value for the one-mile analysis was 2,885 selective enforcement points at a particular Route-MP. Across the state of Alabama, 26 locations were identified as selective enforcement areas through this procedure.

Incorporating eCrash Data from CARE

The Critical Analysis Reporting Environment (CARE) is data analysis software developed by The Center for Advanced Public Safety at The University of Alabama. A data source of crash records from 2010-2014 was imported into CARE, and filters were created to export crash report data for the research time period as a GIS shapefile with GPS coordinates. Two exports were completed: crashes during one year before the selective enforcement year and crashes during the selective enforcement year. The created filters also incorporated highway classifications, and only exported crashes along Interstate, Federal, and State routes. A spatial join of the GIS shapefiles of the exported crash data to the Route-Milepost information was performed to ensure consistency of location identification with the selective enforcement points and eCitations.



RESULTS

For a one-year study of the Alabama Mobile Selective Law Enforcement Campaign from August 1, 2010 through July 31, 2011, selective enforcement efforts were located at 26 areas along state routes across the state of Alabama. After further investigation of each location using GIS and Google Maps, three locations that were found to have high selective enforcement presence were located at state trooper posts and were omitted from further study. Additionally, two different areas were each found to have two separate defined locations of selective enforcement high points, separated by 0.2 miles, and these were therefore merged, leaving 21 locations for analysis.

Number of Crashes and Citations Before and During Selective Enforcement

The number of crashes and the number of issued citations were evaluated at the 21 locations before and during the selective enforcement research year, including both regular and overtime shifts. The number of crashes and citations for each location over the course of the year before and the year during selective enforcement are shown in TABLE 2, along with the percent change between the before and during selective enforcement crash and citation counts. A negative percent change for crashes marks a decrease in the number of crashes during the selective enforcement research year, and a positive percent change for citations marks an increase in the number of issued citations. TABLE 2 is ordered by the percent change in the number of crashes at each location, and is sectioned off by negative and positive percent changes. There were 11 locations that experienced a decrease in the number of crashes. These locations had variable changes in the number of issued citations. Notably, Location 9, an enforcement effort between two interchanges, experienced a citation increase of 2015%.
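The percent change reported for each location is the change in count relative to the before-year count. A minimal sketch of the arithmetic, using the Location 9 counts from TABLE 2 (the function name is hypothetical):

```python
def percent_change(before, during):
    """Percent change from the before-year count to the during-year count."""
    if before == 0:
        raise ValueError("percent change is undefined when the before count is zero")
    return (during - before) / before * 100


# Location 9: citations rose from 33 to 698, crashes fell from 16 to 15.
print(round(percent_change(33, 698), 2))
print(round(percent_change(16, 15), 2))
```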



TABLE 2 Crash and Citation Data for the Selective Enforcement Locations, Organized by Percent Change in Crashes Before and During Selective Enforcement

| Location | Crashes, One Year Before | Crashes, One Year During | Crashes, Percent Change | Citations, One Year Before | Citations, One Year During | Citations, Percent Change | Equivalent C Level Crashes, One Year Before | Equivalent C Level Crashes, One Year During |
|---|---|---|---|---|---|---|---|---|
| 7 | 2 | 0 | -100.00% | 93 | 153 | 64.52% | 0 | 0 |
| 8 | 2 | 0 | -100.00% | 31 | 23 | -25.81% | 0 | 0 |
| 12 | 33 | 19 | -42.42% | 1512 | 963 | -36.31% | 27 | 5 |
| 21 | 22 | 14 | -36.36% | 91 | 257 | 182.42% | 10 | 14 |
| 11 | 10 | 7 | -30.00% | 180 | 180 | 0.00% | 6 | 4 |
| 19 | 17 | 14 | -17.65% | 75 | 505 | 573.33% | 5 | 11 |
| 10 | 24 | 20 | -16.67% | 268 | 1133 | 322.76% | 0 | 9 |
| 5 | 15 | 13 | -13.33% | 4 | 41 | 925.00% | 8 | 1 |
| 13 | 16 | 14 | -12.50% | 916 | 511 | -44.21% | 8 | 9 |
| 9 | 16 | 15 | -6.25% | 33 | 698 | 2015.15% | 4 | 4 |
| 15 | 16 | 15 | -6.25% | 214 | 826 | 285.98% | 10 | 10 |
| 3 | 22 | 22 | 0.00% | 886 | 1243 | 40.29% | 11 | 8 |
| 16 | 2 | 2 | 0.00% | 33 | 46 | 39.39% | 6 | 2 |
| 20 | 9 | 9 | 0.00% | 144 | 343 | 138.19% | 2 | 2 |
| 1 | 13 | 15 | 15.38% | 311 | 223 | -28.30% | 7 | 7 |
| 4 | 43 | 50 | 16.28% | 35 | 170 | 385.71% | 21 | 15 |
| 6 | 7 | 9 | 28.57% | 371 | 365 | -1.62% | 1 | 1 |
| 18 | 13 | 19 | 46.15% | 229 | 568 | 148.03% | 7 | 9 |
| 2 | 19 | 30 | 57.89% | 140 | 161 | 15.00% | 7 | 11 |
| 17 | 3 | 5 | 66.67% | 27 | 91 | 237.04% | 0 | 0 |
| 14 | 13 | 34 | 161.54% | 868 | 1599 | 84.22% | 8 | 10 |

Severity-weighted crash counts are also presented in TABLE 2. The weighted crash counts were included in the analysis of the effectiveness of selective enforcement in order to draw conclusions on any changes in the severity of crashes. For instance, the number of severe crashes may have decreased while the total number of crashes remained constant or even increased. The severity-weighted crash counts for the 21 selective enforcement areas before and during the selective enforcement year are presented in terms of equivalent C level crashes, obtained by using the “KABCO” injury scale (Federal Highway Administration, 2011).

Evaluating the Effectiveness of Selective Enforcement

The effectiveness of the selective enforcement efforts was evaluated in terms of crash frequencies and the number of issued citations. A paired difference t-test was used to determine whether the mean, or expected value, of the differences between the before and during crash frequencies and citation counts indicated a significant change. With each difference taken as the before value minus the during value, a positive mean difference for crashes indicates that the number of crashes decreased during the selective enforcement efforts. The null and alternative hypotheses for the before and during crash tests are

$H_0: \mu_d \le 0$ and $H_1: \mu_d > 0$

where $H_0$ is the null hypothesis, $H_1$ is the alternative hypothesis, and $\mu_d$ is the mean of the paired differences (equivalently, the difference in the means). A negative mean difference for citations indicates that the number of issued citations increased during selective enforcement. The null and alternative hypotheses for the before and during citation tests are

$H_0: \mu_d \ge 0$ and $H_1: \mu_d < 0$

If the null hypothesis is rejected based on the t-test statistic and a corresponding P-value less than a particular significance level, then the alternative hypothesis is accepted, meaning there was a statistically significant change in the number of crashes or citations at the selective enforcement locations. For example, at a significance level of 0.05, the null hypothesis is rejected with 95% confidence and the result is considered statistically significant. The paired difference t-test determines whether a change in the mean values of two samples is statistically significant, as interpreted through the resulting P-value. The mean values of the samples alone do not indicate the result of the test, since the variances of the datasets may differ.
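As a rough illustration of the paired difference t-test described above, the sketch below computes the t statistic and degrees of freedom from the crash counts in TABLE 2. It pools all 21 locations, because the urban or rural assignment of each location is not listed here, so it is not expected to reproduce the urban and rural results in TABLE 3. The function name `paired_t_statistic` is an assumption introduced for this example, and the P-value step is left out because it requires a t-distribution table or a statistics library.

```python
import math
import statistics

# Crash counts for the 21 locations, in TABLE 2 row order.
crashes_before = [2, 2, 33, 22, 10, 17, 24, 15, 16, 16, 16,
                  22, 2, 9, 13, 43, 7, 13, 19, 3, 13]
crashes_during = [0, 0, 19, 14, 7, 14, 20, 13, 14, 15, 15,
                  22, 2, 9, 15, 50, 9, 19, 30, 5, 34]


def paired_t_statistic(before, during):
    """Return (t statistic, degrees of freedom) for paired samples.

    Differences are taken as before - during, so a positive t points
    toward a decrease during the second period.
    """
    if len(before) != len(during):
        raise ValueError("paired samples must have equal length")
    diffs = [b - d for b, d in zip(before, during)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1


t_stat, dof = paired_t_statistic(crashes_before, crashes_during)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The resulting t statistic would then be compared against a one-tailed critical value for the stated degrees of freedom, in the direction given by the alternative hypothesis.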

The 21 locations were analyzed by area type. Locations were considered rural if the population of the location municipality was less than 5000, and urban if the population of the location municipality was greater than 5000. The paired difference t-test results for the total number of crashes, the number of issued citations, and the number of equivalent C injury level crashes for urban and rural selective enforcement locations are shown in TABLE 3.

For urban selective enforcement locations along state routes in Alabama from August 1, 2010 through July 21, 2011, the P-value for the crash frequency data was 0.148. The null hypothesis (the number of crashes remained constant or increased) can be rejected at a significance level of 0.15, or with 85% confidence. While the mean crash frequency went up during selective enforcement, from 14.3 to 17.2, the variance also increased. For rural selective enforcement locations, the P-value for the crash frequency data was 0.122. The null hypothesis can again be rejected at a significance level of 0.15, or with 85% confidence. The results indicate that crash frequencies were reduced at both urban and rural selective enforcement locations, with 85% confidence. While not statistically significant at a significance level of 0.05, a marked decrease or trend was identified. The number of issued citations increased during selective enforcement for both urban and rural locations, with the null hypotheses rejected at low P-values of 0.040 and 0.103, respectively. The results indicate that the number of issued citations increased at selective enforcement locations. The results for the equivalent C severity level crash frequencies at both urban and rural locations did not support rejection of the null hypothesis. Therefore, when analyzing urban and rural locations separately, it is plausible that selective enforcement was not effective in reducing the severity of crashes.
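The rejection rule used in this discussion can be made explicit by comparing each reported P-value with a chosen significance level. The sketch below is only an illustration: the dictionary layout and the function name `rejected_tests` are assumptions introduced here, while the P-values are those reported in TABLE 3.

```python
# One-tailed P-values as reported in TABLE 3.
p_values = {
    ("urban", "crashes"): 0.148,
    ("urban", "citations"): 0.040,
    ("urban", "equivalent C crashes"): 0.424,
    ("rural", "crashes"): 0.122,
    ("rural", "citations"): 0.103,
    ("rural", "equivalent C crashes"): 0.305,
}


def rejected_tests(p_values, alpha):
    """Return the tests whose P-value falls below the significance level."""
    return sorted(key for key, p in p_values.items() if p < alpha)


for alpha in (0.05, 0.15):
    print(f"alpha = {alpha}:")
    for area, measure in rejected_tests(p_values, alpha):
        print(f"  reject H0 for {area} {measure}")
```

Listing the outcomes at both 0.05 and 0.15 shows which conclusions depend on the more lenient significance level.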



In summary, there is evidence of a trend linking selective enforcement to a decrease in crashes when urban and rural locations are analyzed separately rather than collectively. Analyzing the locations by area type explains some of the variance in the data and improves the understanding of the effectiveness of selective enforcement. However, the decrease in the severity of crashes was inconclusive. In each paired difference test performed, the results indicate that the number of issued citations increased during the selective enforcement efforts. Overall, there was a statistically significant increase in citations across the identified selective enforcement locations, while there was a marked decrease in the number of crashes. An analysis that combined crash and citation frequency to directly examine the relationship between the two was not included in this study. The goal of this work was to evaluate the effect of selective enforcement on the number of citations, and more importantly the number of crashes, before and during selective enforcement. The characteristics of the selective enforcement locations, such as number of lanes, average annual daily traffic, and even safe pullout zones for officers, varied greatly and have a large effect on evaluating correlations between citations and crashes across all selective enforcement areas.

TABLE 3 Paired Difference t-test Results for Urban and Rural Selective Enforcement Locations along State Routes in Alabama

Urban Locations

| Statistic | Crashes, One Year Before | Crashes, One Year During | Citations, One Year Before | Citations, One Year During | Equivalent C Crashes, One Year Before | Equivalent C Crashes, One Year During |
|---|---|---|---|---|---|---|
| Mean | 14.3 | 17.2 | 181.1 | 357.8 | 6.9 | 6.7 |
| Variance | 151.122 | 257.956 | 66137.21 | 222289.1 | 34.9889 | 30.9 |
| Observations | 10 | 10 | 10 | 10 | 10 | 10 |
| Hypothesized μd | 0 | | 0 | | 0 | |
| Deg. of Freedom | 9 | | 9 | | 9 | |
| t-Statistic | -1.111 | | -1.967 | | 0.198 | |
| P-value | 0.148 | | 0.040 | | 0.424 | |

Rural Locations

| Statistic | Crashes, One Year Before | Crashes, One Year During | Citations, One Year Before | Citations, One Year During | Equivalent C Crashes, One Year Before | Equivalent C Crashes, One Year During |
|---|---|---|---|---|---|---|
| Mean | 15.818 | 14 | 422.727 | 592.818 | 7.182 | 5.909 |
| Variance | 72.564 | 39.8 | 227955.4 | 166447 | 58.964 | 17.491 |
| Observations | 11 | 11 | 11 | 11 | 11 | 11 |
| Hypothesized μd | 0 | | 0 | | 0 | |
| Deg. of Freedom | 10 | | 10 | | 10 | |
| t-Statistic | 1.237 | | -1.353 | | 0.525 | |
| P-value | 0.122 | | 0.103 | | 0.305 | |



CONCLUSIONS

For the analysis of the effectiveness of selective law enforcement in Alabama, Structured Query Language (SQL) was used to apply the data minimization techniques necessary to turn a Big Dataset into valuable information. The SQL techniques reduced over 37 million GPS data points to 1.3 million points of selective enforcement location information, enabled the geolocation of 72.6% of Electronic Citations (eCitations), and identified 21 selective enforcement areas across the State of Alabama. Combined with Big Data analytics, Geographic Information System (GIS) technology was shown to be useful for evaluating the increases and decreases in crash frequencies and in the number of issued citations before and during selective enforcement.

Officer patrol routes, geolocated citations, crash data, and selective enforcement time periods were integrated into one cohesive source of spatial and temporal data for the analysis. The processing tasks in this research were essential to the evaluation of selective law enforcement and to the results, exemplifying data-driven discovery. Paired difference t-tests indicate a decrease in crash frequency with 85% confidence at urban and rural locations analyzed separately. The analysis of the number of issued citations at the locations indicated that citations increased during the selective enforcement year by an average of 254%. While a statistical increase in the number of issued citations does not necessarily cause a statistical decrease in the number of crashes, the increase seems to have some effect on the number of crashes at various locations during the selective enforcement year studied. Future work will include analyzing crash frequencies several years after the selective enforcement effort to draw conclusions on any lasting effects, in an attempt to understand possible changes in driver behavior. However, continued selective enforcement campaigns or a high selective enforcement police presence in the same locations may complicate the interpretation of these results. Future work will also involve reducing crash frequency at high crash locations through selective enforcement campaign recommendations, such as implementing steps to locate crash hot spots for selective enforcement efforts, developing crash modification factors, and calculating a return on investment for selective enforcement campaigns.

ACKNOWLEDGMENTS

This project was funded by the Alabama Department of Transportation. Special thanks for their contributions to this work go to Beau Elliott and Jesse Norris, The Center for Advanced Public Safety, and Brittany Shake, Graduate Research Assistant, The University of Alabama.

REFERENCES

AASHTO. (2010). Highway Safety Manual. Washington, DC: American Association of State Highway and Transportation Officials.

Center for Advanced Public Safety. (2009). Center for Advanced Public Safety. Retrieved February 22, 2015, from Overview: http://caps.ua.edu/overview.aspx

Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J., & Welton, C. (2009). MAD Skills: New Analysis Practices for Big Data. Proceedings of the VLDB Endowment, 2(2), 1481-1492.

Engineering and Physical Sciences Research Council. (2014). Engineering Grand Challenges: Report on outcomes of a retreat. Stratford-upon-Avon.

Erke, A., Goldenbeld, C., & Vaa, T. (2009). Good practices in the selected key areas: Speeding, drink driving, and seat belt wearing: Results from meta-analysis. European Commission.

Jahanian, F. (2012, July). Pillars of Societal Innovation: Research and Education in Computing and Information. Biennial CRA Conference. Snowbird, UT: National Science Foundation.

Jonah, B., & Grant, B. (1985). Long-term effectiveness of selective traffic enforcement programs for increasing seat belt use. The Journal of Applied Psychology, 70(2), 257-263.

Lohr, S. (2012, February 11). The Age of Big Data. The New York Times, 11.

Mayer-Schonberger, V., & Cukier, K. (2013). Letting the data speak. In Big Data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.

McAfee, A., & Brynjolfsson, E. (2012). Big Data: The Management Revolution. Harvard Business Review, 90(10), 61-67.

Mestyan, M., Yasseri, T., & Kertesz, J. (2013). Early prediction of movie box office success based on Wikipedia activity big data. PLoS ONE, 8(8).

Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A., Jarawan, E., et al. (2004). World report on road traffic injury prevention. Geneva: World Health Organization.

Provost, F., & Fawcett, T. (2013). Data Science and its relationship to big data and data-driven decision making. Big Data, 1(1), 51-59.

Simandl, J. (2015). GIS-Based Evaluation of the Effectiveness of Selective Law Enforcement Campaigns in Reducing Crashes. Tuscaloosa, Alabama: The University of Alabama.

Vaa, T. (1997). Increased police enforcement: Effects on speed. Accident Analysis and Prevention, 29(3), 373-385.

Viescas, J., & Hernandez, M. (2014). SQL Queries for Mere Mortals. Boston, MA: Addison-Wesley.

Walter, L., Broughton, J., & Knowles, J. (2011). The effects of increased police enforcement along a route in London. Accident Analysis and Prevention, 43(3), 1219-1227.
