cascading spatio-temporal pattern discovery: a summary of results

Page 1

Cascading spatio-temporal pattern discovery: A summary of results

Pradeep Mohan¹, Shashi Shekhar¹, James A. Shine², James P. Rogers²

¹University of Minnesota, Twin-Cities, {mohan,shekhar}@cs.umn.edu
²Engineering Research and Development Center, Alexandria, VA, {James.A.Shine, James.P.Rogers.II}@usace.army.mil

Page 2

Outline

• Introduction
   – Motivation
   – Problem Statement
   – Related Work
   – Contributions
• Interest Measure
• CSTP Miner Algorithm
• Evaluation and Case Study
• Conclusion and Future Work

Page 3

Motivation: Public Safety

Stages: Bar Closing, Assault, Drunk Driving, Hurricane, Climate change, etc.

Cascading spatio-temporal pattern (CSTP), e.g. over Bar Closing, Assault and Drunk Driving:
• Partially ordered subsets of ST event types.
• Located together in space.
• Occur in stages over time.

Other applications: climate change, epidemiology, evacuation planning.

[Figure: instances of Assault (A.1 to A.4), Drunk Driving (C.1 to C.4) and Bar Closing (B.1, B.2) shown at times T1, T2 and T3, followed by an aggregate view over (T1, T2, T3).]

Page 4

Problem Definition

Input: a) ST framework, b) directed ST neighbor relation R, c) interest measure threshold

Objective: Minimize computation costs while discovering statistically meaningful CSTPs.

Output: A set of CSTPs with interestingness >= threshold

Constraints: Correctness and completeness

Example: ST Join (R) with R = {0.5 miles, 2 min.} and threshold = 0.5

[Figure: a candidate CSTP over event types B, A and C, next to the aggregate (T1, T2, T3) view of instances A.1 to A.4, B.1, B.2 and C.1 to C.4.]
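One way to read the example relation R = {0.5 miles, 2 min.} is as a directed test between two event instances. The sketch below is only an illustration: it assumes planar coordinates in miles, timestamps in minutes, and that the later event must occur strictly after the earlier one. None of these specifics are stated on the slide, and the names `Event` and `is_directed_st_neighbor` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: type label, planar position (miles), time (minutes)."""
    event_type: str
    x_miles: float
    y_miles: float
    t_minutes: float


def is_directed_st_neighbor(earlier, later, max_miles=0.5, max_minutes=2.0):
    """Return True if `later` follows `earlier` within the space and time bounds.

    Assumption: the relation is directed forward in time (0 < lag <= max_minutes).
    """
    lag = later.t_minutes - earlier.t_minutes
    if not (0 < lag <= max_minutes):
        return False
    distance = math.hypot(later.x_miles - earlier.x_miles,
                          later.y_miles - earlier.y_miles)
    return distance <= max_miles


# Made-up instances for illustration only.
bar_closing = Event("B", 0.0, 0.0, 0.0)
assault = Event("A", 0.3, 0.3, 1.5)
far_assault = Event("A", 3.0, 3.0, 1.0)
```

Because the test is directed, `is_directed_st_neighbor(assault, bar_closing)` is false even though the two instances are close in space and time.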

Page 5

Challenges and Contributions

Challenges
• Space and time are continuous: many overlapping ST neighborhoods; neighborhood enumeration is computationally challenging.
• Conflicting requirements, e.g. statistical interpretation vs. computational scalability.
• Exponential candidate space, e.g. candidate CSTPs are exponential in the number of event types.

Contributions
• Interest measures: statistical interpretation, computational structure.
• CSTP Miner algorithm: filtering strategies.
• Evaluation: experimental evaluation, case study.

Page 6

Limitations of Related Work: ST Data Mining

Limitations
[ST Co-occurrence] Treats space and time independently. Absence of partial order.
[ST Sequence] Does not account for multiply connected patterns (e.g. nonlinear). Misses non-linear semantics. No ST statistical interpretation.

Related Work                     ST Sequences        ST Subsets
Partial order                    √                   X
Multiply connected               X                   √
Multiple patterns                √                   √
ST statistical interpretation    X (only spatial)    X

Page 7

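Interest Measures

Cascade Participation Ratio (CPR):

\[
\mathrm{CPR}(\mathrm{CSTP}, M_j) = \frac{\#\,\text{instances of event type } M_j \text{ in CSTP}}{\text{total } \#\,\text{instances of event type } M_j \text{ in dataset}}
\]

[Conditional probability of observing an instance of CSTP having seen an instance of A]

Cascade Participation Index (CPI):

\[
\mathrm{CPI}(\mathrm{CSTP}) = \min_j \frac{\#\,\text{instances of event type } M_j \text{ in CSTP}}{\text{total } \#\,\text{instances of event type } M_j \text{ in dataset}}
\]

Lower bound on the conditional probability of observing an instance of CSTP having seen an instance of A, B or C.

Example:

[Figure: CSTP over event types B, A and C, next to the aggregate (T1, T2, T3) view of instances A.1 to A.4, B.1, B.2 and C.1 to C.4.]

\[
\mathrm{CPR}(\mathrm{CSTP}, A) = \tfrac{2}{4} = 0.5, \quad
\mathrm{CPR}(\mathrm{CSTP}, B) = \tfrac{1}{2} = 0.5, \quad
\mathrm{CPR}(\mathrm{CSTP}, C) = \tfrac{2}{4} = 0.5
\]

\[
\mathrm{CPI} = \min\{\mathrm{CPR}(\mathrm{CSTP}, C),\ \mathrm{CPR}(\mathrm{CSTP}, A),\ \mathrm{CPR}(\mathrm{CSTP}, B)\} = 0.5
\]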
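The two measures can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: a pattern instance is represented as a tuple of (event type, instance id) pairs, and the two pattern instances below are hypothetical, chosen only so that the participation counts match the slide's values (2 of 4 for A, 1 of 2 for B, 2 of 4 for C). The function names are also assumptions.

```python
from collections import defaultdict


def cascade_participation_ratios(pattern_instances, totals):
    """CPR per event type: distinct participating instances / total instances."""
    participating = defaultdict(set)
    for instance in pattern_instances:
        for event_type, instance_id in instance:
            participating[event_type].add(instance_id)
    return {t: len(participating[t]) / totals[t] for t in totals}


def cascade_participation_index(pattern_instances, totals):
    """CPI: the minimum CPR over the event types of the pattern."""
    return min(cascade_participation_ratios(pattern_instances, totals).values())


# Dataset sizes from the slide's example: four A, two B and four C instances.
totals = {"A": 4, "B": 2, "C": 4}

# Hypothetical pattern instances (the original figure is not recoverable).
pattern_instances = [
    (("B", "B.1"), ("A", "A.1"), ("C", "C.1")),
    (("B", "B.1"), ("A", "A.3"), ("C", "C.2")),
]

ratios = cascade_participation_ratios(pattern_instances, totals)
cpi = cascade_participation_index(pattern_instances, totals)
```

Counting distinct participating instances matters here: B.1 appears in both pattern instances but contributes only once to the numerator for B.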

Page 8

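Interest Measures: Statistical Interpretation

Spatial statistics: ST K-function (Diggle et al. 1995)

\[
\hat{K}_{AB}(h, t) = \frac{1}{\lambda_A \lambda_B} \cdot \frac{1}{S\,T} \sum_i \sum_j I_{h,t}\big(d(A_i, B_j),\ t_d(A_i, B_j)\big)
\]

Cascade Participation Index (CPI) is an upper bound to the ST K-function.

Example:

[Figure: three example configurations of A and B instances plotted on X, Y and time axes.]

Values for the three configurations (left to right):

ST K-Function    2/9    3/9 = 1/3    9/9 = 1
CPI              2/3    1            1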
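The example values (2/9 vs. 2/3, 1/3 vs. 1, and 1 vs. 1) can be reproduced with a small sketch. It assumes three instances of A and three of B in each configuration, and uses the fraction of neighboring (A, B) pairs as a stand-in for the normalized K-function value. The pair lists are hypothetical, since the original figure is not recoverable, and the function names are illustrative.

```python
from fractions import Fraction


def neighbor_pair_fraction(pairs, n_a, n_b):
    """Fraction of all possible (A, B) pairs that are ST neighbors."""
    return Fraction(len(set(pairs)), n_a * n_b)


def participation_index(pairs, n_a, n_b):
    """Minimum, over A and B, of the fraction of instances with a neighbor."""
    a_participants = {a for a, _ in pairs}
    b_participants = {b for _, b in pairs}
    return min(Fraction(len(a_participants), n_a),
               Fraction(len(b_participants), n_b))


# Hypothetical neighbor pairs (A index, B index) for three configurations.
config_1 = [(0, 0), (1, 1)]                          # two isolated pairs
config_2 = [(0, 0), (1, 1), (2, 2)]                  # every instance paired once
config_3 = [(a, b) for a in range(3) for b in range(3)]  # all instances together

results = [
    (neighbor_pair_fraction(c, 3, 3), participation_index(c, 3, 3))
    for c in (config_1, config_2, config_3)
]
```

In each configuration the participation index is at least the pair fraction, which is the upper-bound relationship the slide states: with a participating A instances and b participating B instances, there can be at most a·b neighboring pairs.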

Page 9

CSTP Miner Algorithm: Overview

[Flowchart: Upper Bound Filter, Candidate Generation*, Multi-resolution Filter, Cycle checking, Compute CPI, Prune CSTP, Prevalent CSTPs. Labels: R, CPI Threshold, Filtering Choice, Cycles Removed, Pruned CSTPs.]

*using same strategy as [Kuramochi and Karypis ’04]

CPI computation involves an ST join, which is the computational bottleneck.

ST Join: sort-merge over time, nested loop over space.
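A minimal sketch of an ST join in that style follows: both inputs are sorted by time and merged with a moving start pointer, and the spatial check is a plain nested loop over the events inside the time window. The event layout `(id, x, y, t)`, the strictly-later time condition, and the function name `st_join` are assumptions made for illustration, not details taken from the slides.

```python
import math


def st_join(earlier_events, later_events, max_dist, max_lag):
    """Pair each earlier event with later events within max_dist and max_lag.

    Events are (id, x, y, t) tuples. A pair qualifies when
    0 < t_later - t_earlier <= max_lag and the planar distance <= max_dist.
    """
    earlier_sorted = sorted(earlier_events, key=lambda e: e[3])
    later_sorted = sorted(later_events, key=lambda e: e[3])
    pairs = []
    start = 0  # first later event that occurs strictly after the current one
    for e_id, ex, ey, et in earlier_sorted:
        # Merge step over time: the start pointer only moves forward.
        while start < len(later_sorted) and later_sorted[start][3] <= et:
            start += 1
        k = start
        # Nested loop over space for the events inside the time window.
        while k < len(later_sorted) and later_sorted[k][3] - et <= max_lag:
            l_id, lx, ly, _ = later_sorted[k]
            if math.hypot(lx - ex, ly - ey) <= max_dist:
                pairs.append((e_id, l_id))
            k += 1
    return pairs


# Made-up events for illustration only.
bar_closings = [("B.1", 0.0, 0.0, 0), ("B.2", 5.0, 5.0, 1)]
assaults = [("A.1", 0.3, 0.0, 1), ("A.2", 0.3, 0.0, 5),
            ("A.3", 5.1, 5.0, 2), ("A.4", 0.0, 0.2, 0)]

joined = st_join(bar_closings, assaults, max_dist=0.5, max_lag=2)
```

The time window keeps the inner loop short when events are spread over time, but every event inside the window is still compared spatially, which is consistent with the slide calling the join the bottleneck.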

Page 10

Filtering strategies

Enhance savings: filter non-prevalent CSTPs before CPI computation.

Before candidate generation: Upper bound (UB) filter
• Key idea: CPI has an anti-monotone upper bound.

After candidate generation: Multi-resolution ST (MST) filter
• Key idea: There exists a low-dimensional embedding in space and time.
• Overestimate CPI by coarsening the ST dataset.
• If overestimate(CPI) < threshold, the candidate is pruned.
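The coarsening idea behind the multi-resolution filter can be illustrated for a two-event-type pattern. The sketch below is not the authors' MST filter; it is one possible construction, assuming a grid whose cell size equals the neighborhood distance and whose time slot equals the maximum lag. Under that assumption, any true neighbor pair falls in the same or adjacent cells, so counting participation on the grid can only overestimate the exact value. All names are hypothetical.

```python
import math


def coarse_cell(event, cell_size, slot_len):
    """Map an (id, x, y, t) event to a coarse (column, row, time-slot) cell."""
    _, x, y, t = event
    return (math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(t / slot_len))


def coarse_cpi_upper_bound(earlier, later, max_dist, max_lag):
    """Overestimate the participation index of 'earlier -> later' on a grid."""
    earlier_cells = {coarse_cell(e, max_dist, max_lag) for e in earlier}
    later_cells = {coarse_cell(e, max_dist, max_lag) for e in later}

    def may_have_neighbor(cell, other_cells, slot_offsets):
        cx, cy, ct = cell
        return any((cx + dx, cy + dy, ct + dt) in other_cells
                   for dx in (-1, 0, 1)
                   for dy in (-1, 0, 1)
                   for dt in slot_offsets)

    # A later neighbor lies in the same or the next time slot.
    n_earlier = sum(
        may_have_neighbor(coarse_cell(e, max_dist, max_lag), later_cells, (0, 1))
        for e in earlier)
    # An earlier neighbor lies in the same or the previous time slot.
    n_later = sum(
        may_have_neighbor(coarse_cell(e, max_dist, max_lag), earlier_cells, (-1, 0))
        for e in later)
    return min(n_earlier / len(earlier), n_later / len(later))


# Made-up events for illustration only.
bars = [("B.1", 0.1, 0.1, 0.5), ("B.2", 10.0, 10.0, 0.5)]
assaults = [("A.1", 0.3, 0.1, 1.0), ("A.2", 50.0, 50.0, 1.0)]

bound = coarse_cpi_upper_bound(bars, assaults, max_dist=0.5, max_lag=2.0)
can_prune = bound < 0.6  # prune without the exact ST join if below threshold
```

Only set lookups on grid cells are needed, so a candidate whose bound already falls below the threshold never reaches the exact join.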

Page 11

Evaluation

Real dataset: City of Lincoln, Nebraska, year 2007.

Platform: Matlab 7.0, X5355 2.66 GHz with 16 GB main memory, Linux OS.

Events within an interval of 10 minutes were assigned the same time stamp.

Goals
a. What is the effect of # event types on execution time?
b. What is the effect of CPI threshold?
c. Other experiments: effect of neighborhood size, dataset size, grid parameters.

Page 12

Experimental Analysis

Questions
a. What is the effect of # event types?
b. What is the effect of CPI threshold?

Trends
a. Pattern size is exponential in the number of event types.
b. MST filter enhances computational savings.

Fixed parameters: a. CPI = 0.2, b. time neighborhood = 1750 time stamps.

Fixed parameters: a. # of event types = 5, b. time neighborhood = 1750 time stamps.

Page 13

Lincoln, NE crime dataset: Case study

Is bar closing a generator for crime-related CSTPs?

Questions
• Is bar closing a crime generator?
• Are there other generators (e.g. Saturday nights)?

Observation: Crime peaks around bar closing!

[Figure: bar locations in Lincoln, NE]

Bar closing: increase (larceny, vandalism, assaults)

Saturday night: increase (larceny, vandalism, assaults)

K-S test: Saturday night is significantly different from normal-day bar closing (p-value = 1.249 × 10^-7, K = 0.41).
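For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of two samples. The sketch below computes only that statistic in plain Python; it does not compute a p-value, and the sample values are toy numbers that have nothing to do with the Lincoln data.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a sample point.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Toy samples for illustration only.
toy_gap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

A larger statistic means the two samples are less likely to come from the same distribution; the significance level then depends on the sample sizes.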

Page 14

Conclusions
• Cascading ST patterns are useful in applications like public safety and climate change science.
• Statistically interpretable interest measure.
• Complementary filtering strategies.
• ST multi-resolution filtering enhances computational performance.

Future work
• New interest measure alternatives.
• Qualitative comparison with graphical models (e.g. Dynamic Bayes Nets, Hidden Markov Models, etc.)

Page 15

Acknowledgment

Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities.

This work was supported by grants from the US Army and NSF.

Thank you for your questions, comments and patience!

Page 16

Crime Report Schema Alignment

University of Texas at Dallas

Page 17

Overview

Washington DC Incidents Reported

NID     CCN       Offence     …    Long           Latitude
3768    571398    Arson            38.87010181    -76.9822237
3787    519110    Theft            38.88852       -76.9370033
3779    519097    Burglary         38.95143       -77.0238048

Lincoln_Nebraska Incidents Reported

INC_     Time_    Date_         …    Team_Area
45111    21:24    11-17-2007         Northwest Team
41000    18:22    12-2-2007          Center Team

Code     Crime
45111    Arson
41000    Auto-theft
41000    Unauthorized use of motor vehicle

• Two different tables from two different data sources. Our goal is to align attributes between the two tables.

Page 18

Dataset ER Diagram

[Figure: ER diagrams for the Washington DC and Lincoln datasets. Both contain Incident_2007_reported, Bars and Football Match, connected by “located” relationships; crime appears as an attribute in one diagram and as the Crime_type table in the other.]

• Heterogeneity: Crime is an attribute in the Washington DC dataset, while it is a table in the Lincoln dataset.

Page 19

Schema Alignment

– Syntactic Matching: keyword-based matching on crime name
   • Lincoln.CrimeType.IncidentClassification = “Robbery”
   • Washington.Crime = “Robbery”

– Semantic Matching: semantically relevant
   A. Specialization vs. Generalization
      – Lincoln.CrimeType.IncidentClassification = “Death”
      – Washington.Crime = “Homicide”
      – Death is a superclass of Homicide
   B. Finding Semantic Matching
      I. Definition of crimes: using shared words to determine similarity
      II. Relevant words: find relevant words using K-medoid clustering and Normalized Google Distance (NGD) *

* Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Geographically-Typed Semantic Schema Matching,” in Proc. of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009. Extended version submitted to Journal of Web Semantics, Springer.
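For reference, the Normalized Google Distance between two terms is usually defined from search hit counts as NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts pages containing the term(s) and N is the number of indexed pages. The sketch below evaluates that formula; the hit counts passed in are hypothetical placeholders, not real search results, and the function name is an assumption.

```python
import math


def normalized_google_distance(hits_x, hits_y, hits_xy, total_pages):
    """NGD from hit counts: 0 means the terms always co-occur.

    hits_x, hits_y: pages containing each term; hits_xy: pages containing both;
    total_pages: size of the index. All counts must be positive.
    """
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total_pages) - min(log_x, log_y))


# Hypothetical counts for illustration only.
always_together = normalized_google_distance(100, 100, 100, 10_000)
rarely_together = normalized_google_distance(1_000, 1_000, 10, 1_000_000)
```

Smaller distances indicate terms that tend to appear on the same pages, which is what makes the measure usable as the distance inside a clustering step such as K-medoid.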

Page 20

I. Finding Semantic Matching using Definition of Crime

• Finding shared words to determine similarity
• Larceny-Theft: Unlawful taking, carrying, leading, or riding away of property from the possession or constructive possession of another; attempts to do these acts are included in the definition. [1]
• Theft: Illegal taking of another person's property without that person's freely-given consent. [2]
• Assault: An act that causes another to apprehend an immediate harmful contact. [3]

Red keywords are common words in crime definitions, while blue keywords are not common.

[1] http://www.fbi.gov/ucr/cius_04/offenses_reported/property_crime/larceny-theft.html
[2] http://en.wikipedia.org/wiki/Theft
[3] http://en.wikipedia.org/wiki/Assult
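The shared-word comparison can be sketched with the three definitions quoted above. The slide does not say how shared words are turned into a score, so the Jaccard ratio used here is an assumption, as are the small stop-word list and the function names.

```python
import re

# Assumed stop-word list; the slides do not specify one.
STOPWORDS = {"a", "an", "the", "of", "or", "to", "do", "that",
             "these", "are", "in", "from", "s"}

DEFINITIONS = {
    "Larceny-Theft": ("Unlawful taking, carrying, leading, or riding away of "
                      "property from the possession or constructive possession "
                      "of another; attempts to do these acts are included in "
                      "the definition."),
    "Theft": ("Illegal taking of another person's property without that "
              "person's freely-given consent."),
    "Assault": ("An act that causes another to apprehend an immediate "
                "harmful contact."),
}


def keywords(definition):
    """Lower-case alphabetic tokens of a definition, minus stop words."""
    return set(re.findall(r"[a-z]+", definition.lower())) - STOPWORDS


def shared_words(name_a, name_b):
    """Keywords that appear in both definitions."""
    return keywords(DEFINITIONS[name_a]) & keywords(DEFINITIONS[name_b])


def jaccard_similarity(name_a, name_b):
    """Shared keywords divided by all distinct keywords of the two definitions."""
    words_a, words_b = keywords(DEFINITIONS[name_a]), keywords(DEFINITIONS[name_b])
    return len(words_a & words_b) / len(words_a | words_b)


theft_overlap = shared_words("Larceny-Theft", "Theft")
```

With these definitions, Larceny-Theft and Theft share "taking", "another" and "property", while Theft and Assault share only "another", so the first pair scores higher.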

Page 21

II. K-medoid + NGD Instance Similarity

Washington DC (Column 1)

Offence     Long           Latitude
Arson       38.87010181    -76.9822237
Theft       38.88852       -76.9370033
Burglary    38.95143       -77.0238048

Lincoln (Column 2)

INC_     Team_Area
Arson    Northwest Team
Theft    Center Team

Step 1: Extract distinct keywords from compared columns.
   C1 = {“Arson”, “Theft”, “Burglary”, …}
   C2 = {“Arson”, “Theft”, “Northwest”, …}
   Keywords extracted from columns (C1 ∪ C2) = {Arson, Theft, Stolen, …}

Step 2: Group distinct keywords together into semantic clusters.

Step 3: Calculate similarity.
   Similarity = H(C|T) / H(C)
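The similarity ratio H(C|T) / H(C) can be sketched as follows. The slide does not define C and T; this sketch assumes C is the column a keyword came from and T is the semantic cluster it was assigned to, so the ratio approaches 1 when every cluster mixes keywords from both columns and 0 when clusters separate the columns. The cluster assignments are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def column_similarity(assignments):
    """H(C|T) / H(C) for a list of (cluster, column) keyword assignments."""
    n = len(assignments)
    h_c = entropy(Counter(column for _, column in assignments).values())
    if h_c == 0:
        return 0.0  # all keywords come from one column; the ratio is undefined
    h_c_given_t = 0.0
    for cluster, size in Counter(cl for cl, _ in assignments).items():
        within = Counter(col for cl, col in assignments if cl == cluster)
        h_c_given_t += size / n * entropy(within.values())
    return h_c_given_t / h_c


# Hypothetical assignments: each cluster holds keywords from both columns.
mixed = [("t1", "C1"), ("t1", "C2"), ("t2", "C1"), ("t2", "C2")]
# Hypothetical assignments: each cluster holds keywords from one column only.
separated = [("t1", "C1"), ("t1", "C1"), ("t2", "C2"), ("t2", "C2")]

mixed_score = column_similarity(mixed)
separated_score = column_similarity(separated)
```

Knowing the cluster tells nothing about the column in the mixed case, so the conditional entropy equals the column entropy; in the separated case the cluster determines the column and the conditional entropy vanishes.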