finding patterns with a rotten core: data mining for crime series with cores tong wang, cynthia...

21
Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Upload: hector-sims

Post on 13-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Finding Patterns with a Rotten Core:

Data Mining for Crime Series with Cores

Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Page 2: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• One of the most important problems in crime analysis is that of crime series detection, or the detection of a set of crimes committed by the same individual or group.• Criminals follow a modus operandi (M.O.作案手法或模式 ) that

characterizes their crime series; for instance, some criminals operate exclusively during the day, others work at night, some criminals target apartments for housebreaks, while others target single family houses.

Page 3: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• Crime analysts need to identify these crime series within police databases at the same time as they identify the M.O.• Currently, analysts identify crime series by hand: They manually

search through the data using database queries trying to locate patterns, which can be very challenging and time-consuming.

Page 4: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• From a computational perspective, crime series detection is a very difficult problem: It is a clustering problem with cluster-specific feature selection, where the set of features for the cluster is the M.O. of the criminal(s).• One cannot know the M.O. of the series without determining which

crimes were involved, and one cannot locate the set of crimes without knowing the M.O.—the M.O. and the set of crimes need to be determined simultaneously.

Page 5: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• The problem of crime series detection is much more data intensive and computationally harder than hotspot prediction (when and where crimes will occur).• To determine the M.O., we require very fine-grained detailed

information about past crimes: where the offender(s) entered (front door, window, etc.); how they entered (pried doors 撬開門 , forced doors破門 , unlocked doors (未上鎖的門 ), pushed in air conditioner, etc.), whether they ransacked the premise洗劫房屋 , the type of premise, etc.

Page 6: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• This is high dimensional structured data, leading to a computationally challenging data mining problem of finding all crimes that are similar to each other in several ways.

Page 7: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• The main hypothesis in this work is that most crime patterns have a core of crimes that exemplify the M.O. of the series. • A core might be approximately three or four crimes that are very

similar to each other in many different ways. • According to our hypothesis that each crime series has a core, if we

can locate all small cores of crime within the dataset, we have thus located pieces of most crime series.

Page 8: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• Though we cannot determine the M.O. of a series before we see it, we can characterize parts of the M.O.s we expect generally—for instance, crimes in a series are often close in time and space.• We define a pattern-general similarity (犯罪的一般模式相似度 )

that encodes factors that are generally common to most crime series. • It is learned from past crime series; for instance, proximity in time and

space highly contribute to the pattern-general similarity.

Page 9: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• The pattern-general similarity induces a similarity graph over the set of crimes, where there is an edge between two crimes when they are similar according to the pattern-general similarity.

Page 10: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• Crimes in a series also have pattern-specific similarity (犯罪的特定模式相似度 ), where crimes are similar to each other if they all share the same M.O. • The pattern-specific similarity and pattern-general similarity are used

for detecting cores: A core must have both sufficiently large pattern-general similarity and pattern-specific similarity.• Once the cores are found, we construct the full crime series by

merging overlapping cores.

Page 11: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• Our method is a general subspace clustering method that can be used beyond crime series, and for other completely different application domains. • It is novel in that it considers both pattern-general and pattern-

specific aspects of patterns, where ‘‘pattern-general’’ means that it is common to many crime series, and ‘‘pattern-specific’’ meaning aspects of a particular crime series.

Page 12: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• Our method also can find subspace clusters whose subspace morphs dynamically over the cluster; an M.O. can change when an offender becomes more sophisticated, adapts his M.O. to avoid detection, or when his preferred method is not available (e.g., he prefers entry through unlocked rear doors but will push in the air conditioner if his preferred means of entry is not available).

Page 13: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• Our three methodology sections follow the three main components of our method: learn a similarity graph, mine cores, and merge cores.• We then show experiments where our method was tested on the full

housebreak (入侵家戶 ) database from the Cambridge Police Department containing detailed information from thousands of crimes from over a decade. • This method has been able to provide new insights into true patterns

of crime committed in Cambridge.

Page 14: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Introduction

• One observation that has been revealed in our experiments is that computers do not have the same biases that humans do.• Computers search through the database in a different way than a

human would, and thus can work symbiotically (共生地 ) with analysts, showing them avenues to consider where they would not have normally ventured.

Page 15: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

FIG. 7. The locations of crimes in the first series.

Page 16: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Details of a one crime series with 10 crimes.

d is the number of defining features, where the defines features characterize the M.O. of the crime series.

Page 17: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Crimes 1 to 5 are well connected as a subset, and crimes 6 to 10 are well connected as another subset. From only the similarity graph, the two subsets do not seem very related except for a single edge between crimes {5, 6} connecting them; however, this is only the pattern-general part of the story.

Page 18: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

We discover cores of size 3 with at least 6 defining features. Table 3 lists the cores and their defining features in this series, where a check mark means the feature is a defining feature and a circle means it is not. These cores show how the crimes are similar to each other in a pattern-specific way.

Page 19: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Crime Series #1

• The defining feature set was chosen to include six features, which are geographic location, days apart, location of entry, the ransacked indicator, time of day, and day of the week.• As these data were reconsidered by crime analysts, we found out that

when these crimes were analyzed back in 2006–2007, they were viewed as two unrelated patterns, one at the end of 2006, crimes 1 to 5, and one at the beginning of 2007, crimes 6 to 10. • The connection between these two subsets of crime is very subtle,

and there is over a month gap between the two patterns, so it did not occur to the crime analysts to link them.

Page 20: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Crime Series #1

• Their intuition agrees completely with the similarity graph, as the two subsets are weakly connected only by one edge; however, recall that this only describes the pattern-general similarity–what one would expect from generic pattern without considering a specific M.O. • On examination of the cores, not only are they correlated in six

features, but five of the cores (core indices 11, 12, 14, 15, 16) contain crimes from both of the subsets, which is strong evidence that the two subsets should be merged together.

Page 21: Finding Patterns with a Rotten Core: Data Mining for Crime Series with Cores Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri

Conclusion

• Our automated series detection methods are able to detect crime patterns within big data sets where a human could not. • Detecting patterns among crime data is critical to effective policing.• When a pattern is identified, police can implement effective strategies

to prevent future crimes, solve past crimes, identify and capture offenders, and ensure offenders are investigated and prosecuted fully for the crimes for which they are responsible.