spatial data mining: three case studies

Page 1: Spatial Data Mining: Three Case Studies

C.T. Lu Spatial Data Mining 1

Spatial Data Mining: Three Case Studies

Presented by: Chang-Tien Lu

Spatial Database Lab Department of Computer Science

University of Minnesota

[email protected]

http://www.cs.umn.edu/research/shashi-group

Group Members: Shashi Shekhar, Weili Wu, Yan Huang, C.T. Lu

Page 2: Spatial Data Mining: Three Case Studies

Outline

• Introduction
• Case 1: Location Prediction
• Case 2: Spatial Association: Co-location
• Case 3: Spatial Outlier Detection
• Conclusion and Future Directions

Page 3: Spatial Data Mining: Three Case Studies

Introduction: spatial data mining

Spatial databases are too large to analyze manually
• NASA Earth Observation System (EOS)
• National Institute of Justice - crime mapping
• Census Bureau, Dept. of Commerce - census data

Spatial data mining
• Discover frequent and interesting spatial patterns for post-processing (knowledge discovery)
• Pattern examples: spatial outliers, location prediction, clustering, spatial association, trends, ...

Historical example
• London, 1854: cholera and the water pump

Page 4: Spatial Data Mining: Three Case Studies

Framework

• Problem statement: capture special needs
• Data exploration: maps
• Try reusing classical methods (data mining, spatial statistics)
• Invent new methods if reuse is not applicable
• Develop efficient algorithms
• Validation, performance tuning

Page 5: Spatial Data Mining: Three Case Studies

Case 1: Location Prediction

Problem: predict nesting sites in marshes
• Given vegetation, water depth, distance to edge, etc.

Data: maps of nests and attributes
• Spatially clustered nests, spatially smooth attributes

Classical methods: logistic regression, decision trees, Bayesian classifier
• But the independence assumption is violated!
• Misses auto-correlation!

Spatial auto-regression (SAR)
• Open issue: spatial accuracy vs. classification accuracy
• Open issue: performance - SAR learning is slow!

Page 6: Spatial Data Mining: Three Case Studies

Location Prediction

Given:
1. Spatial framework: $S = \{s_1, \ldots, s_n\}$
2. Explanatory functions: $f_{X_k} : S \to R$
3. A dependent class: $f_C : S \to C = \{c_1, \ldots, c_M\}$
4. A family of function mappings: $R \times \cdots \times R \to C$

Find: classification model $\hat{f}_c$

Objective: maximize classification_accuracy$(\hat{f}_c, f_c)$

Constraints: spatial autocorrelation exists

[Figure panels: nest locations, distance to open water, vegetation durability, water depth]

Page 7: Spatial Data Mining: Three Case Studies

Evaluation: Change Model

Linear regression
• $y = X\beta + \epsilon$

Spatial Autoregression Model (SAR)
• $y = \rho W y + X\beta + \epsilon$
• $W$ models neighborhood relationships
• $\rho$ models the strength of spatial dependencies
• $\epsilon$ is the error vector

Mixed Spatial Autoregression Model (MSAR)
• $y = \rho W y + X\beta + W X \gamma + \epsilon$
• Considers the impact of the explanatory variables from the neighboring observations

Page 8: Spatial Data Mining: Three Case Studies

Measure: ROC Curve

Classification accuracy: confusion matrix

Receiver Operating Characteristic (ROC)
• ROC curve: locus of the pair (TPR, FPR) for each cut-off probability
• TPR = AnPn / (AnPn + AnPnn)
• FPR = AnnPn / (AnnPn + AnnPnn)
• An / Ann = actual nest / actual no-nest; Pn / Pnn = predicted nest / predicted no-nest
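The TPR and FPR definitions can be traced with a few lines of Python. This is a minimal sketch, assuming binary labels (1 = nest, 0 = no nest) and predicted probabilities; the function name `roc_points` and the sample data are illustrative and do not come from the slides.

```python
def roc_points(actual, prob, cutoffs):
    """Return one (cutoff, TPR, FPR) triple per cut-off probability."""
    points = []
    for c in cutoffs:
        pairs = list(zip(actual, prob))
        # Confusion-matrix cells, using the slide's naming.
        an_pn = sum(1 for a, p in pairs if a == 1 and p >= c)   # actual nest, predicted nest
        an_pnn = sum(1 for a, p in pairs if a == 1 and p < c)   # actual nest, predicted no-nest
        ann_pn = sum(1 for a, p in pairs if a == 0 and p >= c)  # actual no-nest, predicted nest
        ann_pnn = sum(1 for a, p in pairs if a == 0 and p < c)  # actual no-nest, predicted no-nest
        tpr = an_pn / (an_pn + an_pnn)
        fpr = ann_pn / (ann_pn + ann_pnn)
        points.append((c, tpr, fpr))
    return points


# Hypothetical labels and predicted probabilities for six locations.
actual = [1, 1, 0, 0, 1, 0]
prob = [0.9, 0.6, 0.7, 0.2, 0.4, 0.1]

for cutoff, tpr, fpr in roc_points(actual, prob, [0.3, 0.5, 0.8]):
    print(cutoff, round(tpr, 3), round(fpr, 3))
```

Sweeping the cut-off from high to low moves the point from near (0, 0) toward (1, 1); plotting FPR against TPR gives the curve.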

Page 9: Spatial Data Mining: Three Case Studies

Evaluation: Change Model

• Linear regression: $y = X\beta + \epsilon$
• Spatial regression: $y = \rho W y + X\beta + \epsilon$
• The spatial model is better

Page 10: Spatial Data Mining: Three Case Studies

Solution Procedures

Spatial Autoregression Model (SAR)
• $y = \rho W y + X\beta + \epsilon$
• Solutions $\rho$ and $\beta$ can be estimated using maximum likelihood theory or Bayesian statistics.
• e.g., the spatial econometrics package uses a Bayesian approach based on sampling with the Markov Chain Monte Carlo (MCMC) method.
• Maximum likelihood-based estimation requires $O(n^3)$ operations.
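To illustrate where the cubic cost of maximum-likelihood estimation comes from, here is a minimal NumPy sketch. It is not the spatial econometrics package's procedure: it simply grid-searches $\rho$ over the concentrated log-likelihood $\ln|I - \rho W| - \frac{n}{2}\ln\hat{\sigma}^2$. The helper names (`chain_weights`, `fit_sar`), the 1-D chain neighborhood, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def chain_weights(n):
    """Row-normalized W for a 1-D chain: each site neighbors i-1 and i+1."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def sar_concentrated_loglik(rho, y, X, W):
    """Concentrated log-likelihood (up to a constant) and beta estimate at a fixed rho."""
    n = len(y)
    A = np.eye(n) - rho * W
    Ay = A @ y
    beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)  # OLS of (I - rho W) y on X
    resid = Ay - X @ beta
    sigma2 = resid @ resid / n
    _, logdet = np.linalg.slogdet(A)  # dense log-determinant: the O(n^3) step
    return logdet - 0.5 * n * np.log(sigma2), beta


def fit_sar(y, X, W, grid=np.linspace(-0.9, 0.9, 181)):
    """Pick the rho on the grid with the highest concentrated log-likelihood."""
    best = max(grid, key=lambda r: sar_concentrated_loglik(r, y, X, W)[0])
    return best, sar_concentrated_loglik(best, y, X, W)[1]


# Simulate y = rho W y + X beta + eps with known parameters, then recover them.
rng = np.random.default_rng(0)
n = 300
W = chain_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_rho, true_beta = 0.6, np.array([1.0, 2.0])
eps = rng.normal(scale=0.5, size=n)
y = np.linalg.solve(np.eye(n) - true_rho * W, X @ true_beta + eps)

rho_hat, beta_hat = fit_sar(y, X, W)
print(rho_hat, beta_hat)
```

Every candidate $\rho$ needs a fresh log-determinant of an n-by-n matrix, which is why learning is slow on large spatial frameworks unless the structure of $W$ is exploited.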

Page 11: Spatial Data Mining: Three Case Studies

Evaluation: Change Measure

Spatial accuracy (map similarity)

New measure: ADNP, the average distance to nearest prediction

$ADNP(A, P) = \frac{1}{K} \sum_{k=1}^{K} dist(A_k, A_k.nearest(P))$
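ADNP is simple to compute by brute force. The sketch below assumes Euclidean distance and point coordinates given as tuples; the function name `adnp` and the sample coordinates are illustrative only.

```python
import math


def adnp(actual, predicted):
    """Average distance from each actual site to its nearest predicted site."""
    total = 0.0
    for a in actual:
        total += min(math.dist(a, p) for p in predicted)
    return total / len(actual)


# Hypothetical actual and predicted nest coordinates.
actual_nests = [(0, 0), (4, 0), (0, 3)]
predicted_nests = [(0, 1), (4, 0)]

print(adnp(actual_nests, predicted_nests))
```

Unlike classification accuracy, this measure rewards a prediction that lands close to an actual nest even when it misses the exact cell.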

Page 12: Spatial Data Mining: Three Case Studies

Predicting Location using Map Similarity

Page 13: Spatial Data Mining: Three Case Studies

Predicting location using Map Similarity

PLUMS components
• Map similarity: average distance to nearest prediction (ADNP), ...
• Search algorithm: greedy, gradient descent
• Function family: generalized linear (GL) (logit, probit), non-linear, GL with auto-correlation
• Discretization of parameter space: uniform, non-uniform, multi-resolution, ...

Page 14: Spatial Data Mining: Three Case Studies

Association Rule

Supermarket shelf management
• Goal: identify items that are bought together by sufficiently many customers
• Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items (transaction data)

A classic rule
• If a customer buys diapers and milk, then he is very likely to buy beer
• So don't be surprised if you find six-packs of beer stacked next to diapers!

Page 15: Spatial Data Mining: Three Case Studies

Association Rules: Support and Confidence

• Item set I = {i1, i2, ..., ik}
• Transactions T = {t1, t2, ..., tn}
• Association rule: A -> B

Support S
• (A and B) occur in at least S percent of the transactions
• P(A U B)

Confidence C
• Of all the transactions in which A occurs, at least C percent contain B
• P(B|A)
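Support and confidence can be checked on a handful of baskets. This is a small sketch with made-up transactions; the function names `support` and `confidence` are illustrative. Support here is the fraction of transactions that contain every item of A U B.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent) = support(A U B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
transactions = [
    {"diaper", "milk", "beer"},
    {"diaper", "milk"},
    {"diaper", "milk", "beer", "bread"},
    {"milk", "bread"},
    {"beer"},
]

print(support(transactions, {"diaper", "milk", "beer"}))
print(confidence(transactions, {"diaper", "milk"}, {"beer"}))
```

Both measures depend on a well-defined set of transactions, which is exactly what continuous spatial data lacks.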

Page 16: Spatial Data Mining: Three Case Studies

Case 2: Spatial Association Rule

Problem: given a set of Boolean spatial features, find subsets of co-located features
• e.g. (fire, drought, vegetation)

Data: continuous space, partition not natural

Classical data mining approach: association rules
• But no transactions! No support measure!

Approach: work with continuous data without transactionizing it
• Participation index (support): minimum fraction of instances of a feature in the join result
• Confidence = Pr[fire at s | drought in N(s) and vegetation in N(s)]
• New algorithm using spatial joins

Page 17: Spatial Data Mining: Three Case Studies

Co-location

Can you find co-location patterns from the following sample dataset?

Answers: [two co-location patterns, shown as map symbols in the figure]

Page 18: Spatial Data Mining: Three Case Studies

Co-location

Can you find co-location patterns from the following sample dataset?

Page 19: Spatial Data Mining: Three Case Studies

Co-location

Spatial co-location: a set of features that are frequently co-located

Given
• A set T of K Boolean spatial feature types, $T = \{f_1, f_2, \ldots, f_K\}$
• A set P of N locations, $P = \{p_1, \ldots, p_N\}$, in a spatial framework S; each $p_i \in P$ is of some spatial feature in T
• A neighbor relation R over locations in S

Find
• $T_c$ = subsets of T that are frequently co-located

Objective
• Correctness
• Completeness
• Efficiency

Constraints
• R is symmetric and reflexive
• Monotonic prevalence measure

[Figure panels: Reference Feature Centric, Window Centric, Event Centric]

Page 20: Spatial Data Mining: Three Case Studies

Co-location

Participation index
• Participation index = $\min_i \{pr(f_i, c)\}$
• Participation ratio $pr(f_i, c)$ of feature $f_i$ in co-location $c = \{f_1, f_2, \ldots, f_k\}$: the fraction of instances of $f_i$ with features $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby

Comparison with association rules

                                   Association rules     Co-location rules
Underlying space                   discrete sets         continuous space
Item-types                         item-types            events / Boolean spatial features
Collections                        transactions          neighborhoods
Prevalence (A -> B)                Support: P(A U B)     Participation index
Conditional probability (A -> B)   Confidence: P[A|B]    P[A in N(L) | B at L]

Page 21: Spatial Data Mining: Three Case Studies

Spatial Co-location Patterns

Dataset
• Spatial features A, B, C and their instances
• Possible associations are (A, B), (B, C), etc.
• The neighbor relationship includes the following pairs:
  - A1, B1
  - A2, B1
  - A2, B2
  - B1, C1
  - B2, C2

Page 22: Spatial Data Mining: Three Case Studies

Spatial Co-location Patterns

Spatial features A, B, C and their instances (dataset)

Partition approach [Yasuhiko, KDD 2001]
• One partitioning: Support(A,B) = 2, Support(B,C) = 2
• Another partitioning: Support(A,B) = 1, Support(B,C) = 2
• Support is not well defined, i.e., not independent of the execution trace
• Has a fast heuristic that is hard to analyze for correctness/completeness

Page 23: Spatial Data Mining: Three Case Studies

Spatial Co-location Patterns

Spatial features A, B, C and their instances (dataset)

Reference feature approach [Han SSD 95]
• Use C as the reference feature to get transactions
• Transactions: (B1), (B2)
• Support(A,B) = ∅

Note: the neighbor relationship includes the following pairs:
• A1, B1
• A2, B1
• A2, B2
• B1, C1
• B2, C2
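The effect of choosing a reference feature can be reproduced from the listed neighbor pairs. In this sketch each instance of the reference feature yields one transaction holding its neighbors; the function name `transactions_for` and the convention that an instance ID starts with its feature letter are assumptions for the example.

```python
# Neighbor pairs from the sample dataset.
neighbor_pairs = [("A1", "B1"), ("A2", "B1"), ("A2", "B2"), ("B1", "C1"), ("B2", "C2")]


def transactions_for(reference):
    """Build one transaction per instance of the reference feature."""
    tx = {}
    for u, v in neighbor_pairs:
        # The neighbor relation is symmetric, so consider both orientations.
        for ref, other in ((u, v), (v, u)):
            if ref[0] == reference:  # assumes IDs like "C1": first char is the feature
                tx.setdefault(ref, set()).add(other)
    return tx


print(transactions_for("C"))
```

With C as the reference, no transaction contains an instance of A, so the pair (A, B) can never be counted even though A and B are neighbors in the data.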

Page 24: Spatial Data Mining: Three Case Studies

Spatial Co-location Patterns

Spatial features A, B, C and their instances (dataset)

Our approach (event centric)
• Neighborhoods instead of transactions
• Spatial join on the neighbor relationship
• Support: participation index = min(p_ratio)
• P_ratio(A, (A,B)) = fraction of instances of A participating in join(A, B, neighbor)

Examples
• Support(A,B) = min(2/2, 2/2) = 1
• Support(B,C) = min(2/2, 2/2) = 1
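The participation index for pairs of features can be computed directly from the neighbor pairs of the sample dataset. This is a minimal sketch limited to size-2 co-locations; the function names and the ID convention (first character = feature) are assumptions. Counting distinct participating instances, as the definition states, gives a ratio of at most 1 for every feature.

```python
from collections import defaultdict

# Instances per feature and the neighbor pairs of the sample dataset.
instances = {"A": {"A1", "A2"}, "B": {"B1", "B2"}, "C": {"C1", "C2"}}
neighbor_pairs = [("A1", "B1"), ("A2", "B1"), ("A2", "B2"), ("B1", "C1"), ("B2", "C2")]


def feature_of(instance):
    return instance[0]  # assumes IDs like "A1": first character is the feature


def participation_index(f1, f2):
    """Participation index of the size-2 co-location (f1, f2)."""
    participating = defaultdict(set)
    # The rows of join(f1, f2, neighbor) are the neighbor pairs across the two features.
    for u, v in neighbor_pairs:
        if {feature_of(u), feature_of(v)} == {f1, f2}:
            participating[feature_of(u)].add(u)
            participating[feature_of(v)].add(v)
    ratios = [len(participating[f]) / len(instances[f]) for f in (f1, f2)]
    return min(ratios)


print(participation_index("A", "B"))
print(participation_index("B", "C"))
print(participation_index("A", "C"))
```

Because the measure is defined on the join result rather than on a partition or a reference feature, it does not depend on the order in which the data is processed.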

Page 25: Spatial Data Mining: Three Case Studies

Spatial Co-location Patterns

Spatial features A, B, C and their instances (dataset)

Partition approach
• One partitioning: Support(A,B) = 2, Support(B,C) = 2
• Another partitioning: Support(A,B) = 1, Support(B,C) = 2

Reference feature approach
• C as reference feature
• Transactions: (B1), (B2)
• Support(A,B) = ∅

Our approach
• Support(A,B) = min(2/2, 2/2) = 1
• Support(B,C) = min(2/2, 2/2) = 1

Page 26: Spatial Data Mining: Three Case Studies

Case 3: Spatial Outlier Detection

Spatial outlier: a data point that is extreme relative to its neighbors

Page 27: Spatial Data Mining: Three Case Studies

Application Domain: Traffic Data

Page 28: Spatial Data Mining: Three Case Studies

Spatial Outlier Detection

Given
• A spatial framework SF consisting of locations $s_1, s_2, \ldots, s_n$
• An attribute function $f : s_i \to R$ (R: set of real numbers)
• A neighborhood relationship $N \subseteq SF \times SF$
• A neighborhood aggregation function $f^N_{aggr} : R^N \to R$
• A difference function $F_{diff} : R \times R \to R$
• A statistic test function $ST : R \to \{True, False\}$; the test is based on $F_{diff}(f, f^N_{aggr}(f, N))$

Find
• $O = \{s_i \mid s_i \in SF, s_i \text{ is a spatial outlier}\}$

Objective
• Correctness: the attribute value of $s_i$ is extreme compared with its neighbors
• Computational efficiency

Page 29: Spatial Data Mining: Three Case Studies

An example of Spatial outlier

Page 30: Spatial Data Mining: Three Case Studies

Spatial Outlier Detection: Zs(x) approach

Function: $S(x) = f(x) - \frac{1}{k} \sum_{y \in N(x)} f(y)$

Declare x as a spatial outlier if

$Z_{S(x)} = \left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$
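The Zs(x) test can be sketched in a few lines. This example assumes a threshold parameter `theta` with a default of 2.0, a 1-D chain of stations, and made-up attribute values; none of these come from the slides. It standardizes the difference between each node and the average of its neighbors.

```python
import statistics


def spatial_outliers(f, neighbors, theta=2.0):
    """Return nodes x whose standardized S(x) exceeds theta in absolute value."""
    # S(x): attribute value minus the average over the node's neighbors.
    S = {x: f[x] - sum(f[y] for y in neighbors[x]) / len(neighbors[x]) for x in f}
    mu = statistics.mean(S.values())
    sigma = statistics.pstdev(S.values())
    return [x for x in f if abs((S[x] - mu) / sigma) > theta]


# Hypothetical readings for ten stations along a road; station 4 is abnormal.
values = [10, 11, 10, 12, 50, 11, 10, 12, 11, 10]
f = dict(enumerate(values))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in f}

print(spatial_outliers(f, neighbors))
```

Station 4 is not extreme because 50 is globally large but because it differs sharply from the stations next to it, which is what distinguishes a spatial outlier from a global one.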

Page 31: Spatial Data Mining: Three Case Studies

Evaluation of Statistical Assumption

• Distribution of traffic station attribute $f(x)$ looks normal
• Distribution of $S(x) = f(x) - \frac{1}{k} \sum_{y \in N(x)} f(y)$ looks normal too!

Page 32: Spatial Data Mining: Three Case Studies

Different Spatial Outlier Tests

Spatial statistic approaches
• Scatter plot approach (Luc Anselin '94)
• Moran scatter plot approach (Luc Anselin '95)
• Variogram cloud approach (graphic)

Page 33: Spatial Data Mining: Three Case Studies

Scatter plot approach

Given
• An attribute function $f(x)$
• A neighborhood relationship $N(x)$
• An aggregation function $f^N_{aggr} : E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
• A difference function $F_{diff} : \epsilon = E(x) - (m f(x) + b)$

Detect spatial outliers by
• Statistic test function $ST : \left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$

Page 34: Spatial Data Mining: Three Case Studies

Graphical Spatial Outlier Test

Page 35: Spatial Data Mining: Three Case Studies

Original Data

Graphical Spatial Tests

Page 36: Spatial Data Mining: Three Case Studies

A Unified Algorithm

Separate two phases
• Model building
• Testing (a node or a set of nodes)

Computation structure of model building
• Key insights:
  - Spatial self-join using the N(x) relationship
  - Algebraic aggregate functions can be computed in one disk scan of the spatial join

Computation structure of testing
• Single node: spatial range query
• Get-All-Neighbors(x) operation

Page 37: Spatial Data Mining: Three Case Studies

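An example: Scatter plot

Model building
• An attribute function $f(x)$
• Neighborhood aggregate function: $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
• Distributive aggregate functions: $\sum f(x)$, $\sum E(x)$, $\sum f(x)E(x)$, $\sum f^2(x)$, $\sum E^2(x)$
• Algebraic aggregate functions:

  $m = \frac{N \sum f(x)E(x) - \sum f(x) \sum E(x)}{N \sum f^2(x) - (\sum f(x))^2}$

  $b = \frac{\sum E(x) \sum f^2(x) - \sum f(x) \sum f(x)E(x)}{N \sum f^2(x) - (\sum f(x))^2}$

  $\sigma_\epsilon = \sqrt{\frac{S_{yy} - m^2 S_{xx}}{n - 2}}$

  where $S_{xx} = \sum f^2(x) - \frac{(\sum f(x))^2}{n}$ and $S_{yy} = \sum E^2(x) - \frac{(\sum E(x))^2}{n}$

Testing
• Difference function: $\epsilon = E(x) - (m f(x) + b)$, where $E(x) = \frac{1}{k} \sum_{y \in N(x)} f(y)$
• Statistic test function: $\left| \frac{\epsilon - \mu_\epsilon}{\sigma_\epsilon} \right| > \theta$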
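The two phases of the scatter plot test can be sketched as follows: model building accumulates the five running sums and derives m, b, and the residual standard deviation from them; testing standardizes each residual. The threshold `theta`, the chain-shaped neighborhood, and the sample data are assumptions for the illustration. Because the least-squares residuals have zero mean, the test reduces to comparing |ε / σ_ε| with the threshold.

```python
import math


def scatterplot_outliers(f, neighbors, theta=2.0):
    """Scatter plot test: regress neighborhood average E(x) on f(x), flag large residuals."""
    xs = list(f)
    n = len(xs)
    E = {x: sum(f[y] for y in neighbors[x]) / len(neighbors[x]) for x in xs}

    # Distributive aggregates: plain running sums, computable in a single scan.
    sf = sum(f[x] for x in xs)
    sE = sum(E[x] for x in xs)
    sfE = sum(f[x] * E[x] for x in xs)
    sff = sum(f[x] ** 2 for x in xs)
    sEE = sum(E[x] ** 2 for x in xs)

    # Algebraic aggregates derived from the sums.
    denom = n * sff - sf ** 2
    m = (n * sfE - sf * sE) / denom
    b = (sE * sff - sf * sfE) / denom
    sxx = sff - sf ** 2 / n
    syy = sEE - sE ** 2 / n
    sigma = math.sqrt((syy - m ** 2 * sxx) / (n - 2))

    # Testing phase: residual of each node against the fitted line.
    eps = {x: E[x] - (m * f[x] + b) for x in xs}
    flagged = [x for x in xs if abs(eps[x] / sigma) > theta]
    return flagged, m, b, eps


# Hypothetical data: a smooth trend over 30 stations with one spike at station 15.
f = {i: float(i) for i in range(30)}
f[15] = 60.0
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 30] for i in f}

flagged, m, b, eps = scatterplot_outliers(f, neighbors)
print(flagged, round(m, 3), round(b, 3))
```

Note that the immediate neighbors of a spike can be flagged as well, since their neighborhood averages are inflated by the abnormal value.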

Page 38: Spatial Data Mining: Three Case Studies

Outlier Stations Detected

Page 39: Spatial Data Mining: Three Case Studies

Outlier Station Detected

Page 40: Spatial Data Mining: Three Case Studies

Conclusion and Future Directions

Spatial domains may not satisfy assumptions of classical methods
• Data: auto-correlation, continuous geographic space
• Patterns: global vs. local, e.g., outliers vs. spatial outliers
• Data exploration: maps and albums

Open issues
• Patterns: hot-spots, spatial trends, ...
• Metrics: spatial accuracy (predicted locations), spatial contiguity (clusters)
• Spatio-temporal datasets: spatio-temporal outliers
• Scale and resolution sensitivity of patterns
• Geo-statistical confidence measures for mined patterns

Page 41: Spatial Data Mining: Three Case Studies

References

1. S. Shekhar and Y. Huang, "Discovering Spatial Co-location Patterns: a Summary of Results", in Proc. of the 7th International Symposium on Spatial and Temporal Databases (SSTD01), July 2001.

2. S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outliers: Algorithms and Applications", the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.

3. S. Shekhar, C.T. Lu, P. Zhang, "Detecting Graph-based Spatial Outlier", Intelligent Data Analysis, to appear in Vol. 6(3), 2002.

4. S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, "Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations", in Proc. of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.

5. S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, "Modeling Spatial Dependencies for Mining Geospatial Data", First SIAM International Conference on Data Mining, 2001.

6. S. Shekhar, Y. Huang, W. Wu, C.T. Lu, "What's Spatial about Spatial Data Mining: Three Case Studies", chapter in Data Mining for Scientific and Engineering Applications, V. Kumar, R. Grossman, C. Kamath, R. Namburu (eds.), Kluwer Academic Pub., 2001, ISBN 1-4020-0033-2.

7. S. Shekhar and Y. Huang, "Multi-resolution Co-location Miner: a New Algorithm to Find Co-location Patterns in Spatial Datasets", Fifth Workshop on Mining Scientific Datasets (SIAM 2nd Data Mining Conference), April 2002.

Page 42: Spatial Data Mining: Three Case Studies

http://www.cs.umn.edu/research/shashi-group

Thank you!