
A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets

Name: Sujing Wang
Advisor: Dr. Christoph F. Eick
Data Mining & Machine Learning Group


Outline

1. Introduction
2. Framework Architecture
3. Methodology
4. Case Study
5. Conclusion and Future Work

Data Mining & Machine Learning Sujing Wang 2


Introduction

Spatial Data Mining (SDM): the process of analyzing and discovering interesting and useful patterns, associations, or relationships from large spatial datasets.

Spatial object structure: (<spatial attributes>; <non-spatial attributes>)

Example: [figure not preserved in transcript]


Introduction

Spatial objects: point, trajectory (line), polygon (region)
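The (<spatial attributes>; <non-spatial attributes>) structure and the three object kinds can be sketched as a small record type. This is only an illustration: the class name `SpatialObject`, its field names, the (longitude, latitude) ordering, and the example values are all assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Assumed convention: each coordinate is (longitude, latitude).
Coordinate = Tuple[float, float]

# The three kinds of spatial objects named on the slide.
VALID_KINDS = ("point", "trajectory", "polygon")


@dataclass
class SpatialObject:
    """Hypothetical container pairing spatial and non-spatial attributes."""

    kind: str                      # spatial attribute: object kind
    coordinates: List[Coordinate]  # spatial attribute: geometry
    attributes: Dict[str, Any] = field(default_factory=dict)  # non-spatial

    def __post_init__(self) -> None:
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown spatial object kind: {self.kind!r}")
        if self.kind == "polygon" and len(self.coordinates) < 3:
            raise ValueError("a polygon needs at least three vertices")


# Made-up example values for illustration only (not data from the slides).
region = SpatialObject(
    kind="polygon",
    coordinates=[(-95.4, 29.7), (-95.3, 29.7), (-95.3, 29.8), (-95.4, 29.8)],
    attributes={"temperature_f": 85.0, "wind_speed_mph": 5.0},
)
```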


Introduction

Challenges:
- Complexity of spatial data types
- Spatial relationships
- Spatial autocorrelation

Motivation:
- Polygons, especially overlapping polygons, are very important for mining spatial datasets.
- Traditional clustering algorithms do not work for spatial polygons.

Research goals:
- Develop new distance functions and new spatial clustering algorithms for polygon clustering.
- Implement novel post-clustering techniques with plug-in reward functions to capture domain experts' notion of interestingness.


Framework Architecture

[Figure: A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets. Diagram labels: Geospatial Datasets, DCONTOUR, Spatial Clusters, Poly_SNN, Meta Clusters, Post-processing, Reward Functions, Domain Experts, Notion of Interestingness, Summaries and Interesting Patterns]


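Methodology

1. Domain Driven Final Clustering Generation Methodology

Inputs:
- A meta-clustering $M = \{X_1, \ldots, X_k\}$; at most one object will be selected from each meta-cluster $X_i$ ($i = 1, \ldots, k$).
- The user provides the individual cluster reward function $\mathit{Reward}_U$, whose values are in $[0, \infty)$.
- A reward threshold $\theta_U$; clusters with low rewards are not included in the final clusterings.
- A cluster distance threshold $\theta_d$, which expresses to what extent the user would like to tolerate cluster overlap.
- A cluster distance function $\mathit{dist}$.

Find $Z \subseteq X_1 \cup \ldots \cup X_k$ that maximizes:

$$q(Z) = \sum_{c \in Z} \mathit{Reward}_U(c)$$

subject to:
1. $\forall x \in Z \; \forall x' \in Z \; (x \neq x' \Rightarrow \mathit{dist}(x, x') > \theta_d)$
2. $\forall x \in Z \; (\mathit{Reward}_U(x) > \theta_U)$
3. $\forall x \in Z \; \forall x' \in Z \; ((x \in X_i \wedge x' \in X_k \wedge x \neq x') \Rightarrow i \neq k)$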
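For small inputs, the selection problem above can be solved by exhaustive search. The sketch below is not the algorithm used in the framework (the slides do not say how the optimization is solved); it just enumerates every way of picking at most one cluster per meta-cluster and keeps the feasible choice with the highest total reward. The function name `select_final_clustering` and the parameter names `theta_u` (reward threshold) and `theta_d` (distance threshold) are assumptions for illustration.

```python
from itertools import combinations, product
from typing import Any, Callable, List, Sequence, Tuple


def select_final_clustering(
    meta_clusters: Sequence[Sequence[Any]],
    reward: Callable[[Any], float],
    dist: Callable[[Any, Any], float],
    theta_u: float,
    theta_d: float,
) -> Tuple[List[Any], float]:
    """Brute-force search for the subset Z with the largest summed reward.

    Runtime grows exponentially with the number of meta-clusters, so this
    is only meant for toy inputs.
    """
    # Reward constraint: keep only clusters whose reward exceeds theta_u.
    # None encodes "select nothing from this meta-cluster".
    options = [
        [None] + [c for c in group if reward(c) > theta_u]
        for group in meta_clusters
    ]
    best: List[Any] = []
    best_q = 0.0
    # Picking one option per meta-cluster enforces "at most one object
    # from each meta-cluster" by construction.
    for choice in product(*options):
        z = [c for c in choice if c is not None]
        # Distance constraint: every pair must be farther apart than theta_d.
        if any(dist(a, b) <= theta_d for a, b in combinations(z, 2)):
            continue
        q = sum(reward(c) for c in z)
        if q > best_q:
            best, best_q = z, q
    return best, best_q


# Toy data (hypothetical): clusters are names with a reward and a 1-D position.
rewards = {"a1": 5.0, "a2": 3.0, "b1": 4.0, "b2": 1.0}
position = {"a1": 0.0, "a2": 2.0, "b1": 0.1, "b2": 3.0}
meta = [["a1", "a2"], ["b1", "b2"]]

z, q = select_final_clustering(
    meta,
    reward=lambda c: rewards[c],
    dist=lambda x, y: abs(position[x] - position[y]),
    theta_u=0.5,
    theta_d=0.5,
)
```

In this toy run the two highest-reward clusters, `a1` and `b1`, are too close together to be selected jointly, so the search settles on `a2` and `b1`.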


Methodology

2. Finding interesting clusters with respect to a continuous non-spatial variable $v$:

- Let $X_i \in 2^A$ be a cluster in the A-space.
- Let $\sigma^2$ be the variance of $v$ in dataset $D$, and $\sigma^2(X_i)$ the variance of $v$ in cluster $X_i$.
- Let $m_v(X_i)$ be the mean value of $v$ in cluster $X_i$.
- Let $t_1 \geq 0$ be a mean value reward threshold and $t_2 \geq 1$ a variance reward threshold.

Interestingness function for each cluster:

$$\varphi(X_i) = \max(0, |m_v(X_i)| - t_1) \times \max(0, \sigma^2 - (\sigma^2(X_i) \times t_2))$$
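The interestingness function can be sketched directly from its definition: a cluster scores well when the mean of the variable is far from zero and its variance inside the cluster is well below the variance in the whole dataset. The function name `interestingness` and the use of population variance (`statistics.pvariance`) are assumptions; the slides do not say which variance estimator is used.

```python
from statistics import fmean, pvariance
from typing import Sequence


def interestingness(
    cluster_values: Sequence[float],
    dataset_values: Sequence[float],
    t1: float = 0.0,
    t2: float = 1.0,
) -> float:
    """Product of a mean-based reward and a variance-based reward.

    cluster_values: values of the variable for the objects in one cluster.
    dataset_values: values of the variable for the whole dataset.
    """
    if t1 < 0 or t2 < 1:
        raise ValueError("expected t1 >= 0 and t2 >= 1")
    # Reward means whose magnitude exceeds the threshold t1.
    mean_term = max(0.0, abs(fmean(cluster_values)) - t1)
    # Reward clusters whose (scaled) variance is below the dataset variance.
    # Assumption: population variance for both cluster and dataset.
    variance_term = max(
        0.0, pvariance(dataset_values) - pvariance(cluster_values) * t2
    )
    return mean_term * variance_term


# Toy values (hypothetical), chosen so the arithmetic is easy to follow.
dataset = [-2.0, -1.0, 0.0, 1.0, 2.0]   # variance 2.0
tight_high = [1.0, 2.0]                  # mean 1.5, variance 0.25
spread_zero_mean = [-2.0, 2.0]           # mean 0.0

score_tight = interestingness(tight_high, dataset, t1=0.5)         # 1.0 * 1.75
score_spread = interestingness(spread_zero_mean, dataset, t1=0.5)  # mean term is 0
```

Because the two terms are multiplied, a cluster scores zero as soon as either condition fails, which is what happens for the second toy cluster.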


Case Study

1. Meta-clusters generated from multiple spatial datasets:

[Figure: map of meta-clusters; x-axis: Longitude (-95.8 to -94.8), y-axis: Latitude (29.0 to 30.4)]


Case Study

2. Final clusters with area of polygons as plug-in reward function:

[Figure: map of final clusters showing polygons 13, 21, 80, 125, and 150; x-axis: Longitude (-95.8 to -94.8), y-axis: Latitude (29.0 to 30.4)]

Polygon ID                              13      21      80      125     150
Temperature (°F)                        79.0    86.35   89.10   84.10   88.87
Solar Radiation (Langleys per minute)   N/A     1.33    1.17    0.13    1.10
Wind Speed (miles per hour)             4.50    6.10    6.20    4.90    5.39
Time of Day                             6 p.m.  1 p.m.  2 p.m.  2 p.m.  12 p.m.
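An area-based plug-in reward function could look like the sketch below, which uses the shoelace formula on a polygon's vertex list. The slides do not show how area was computed, so the function names `polygon_area` and `area_reward` are hypothetical, and the code assumes simple (non-self-intersecting) polygons in planar coordinates. Longitude/latitude vertices would need to be projected first for the result to be a true area.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in planar coordinates (assumed)


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon via the shoelace formula.

    Vertices may be listed clockwise or counter-clockwise; the polygon is
    closed implicitly (the last vertex connects back to the first).
    """
    if len(vertices) < 3:
        return 0.0
    # Pair each vertex with its successor, wrapping around at the end.
    successors: List[Point] = list(vertices[1:]) + [vertices[0]]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, successors):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def area_reward(vertices: Sequence[Point]) -> float:
    """Plug-in reward: larger polygons receive larger rewards (always >= 0)."""
    return polygon_area(vertices)


# Toy shapes (hypothetical) with easily checked areas.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
right_triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

square_reward = area_reward(unit_square)       # 1.0
triangle_reward = area_reward(right_triangle)  # 6.0
```

Since the result is never negative, it fits the requirement that reward values lie in [0, ∞).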


Case Study

3. Finding interesting meta-clusters with respect to solar radiation:

Cluster ID   Mean      Variance   Number of Polygons
5            -0.9144   0.1981     5
15           1.1218    0.1334     5
21           1.0184    0.0350     3

[Figure: map of meta-clusters 5, 15, and 21; x-axis: Longitude (-95.8 to -94.8), y-axis: Latitude (29.0 to 30.2)]


Conclusion and Future Work

Conclusions:
- Our framework can effectively cluster overlapping spatial polygons that are similar in size, shape, and location.
- Our post-clustering techniques with different plug-in reward functions can guide the extraction of interesting patterns and generate summaries from large spatial datasets.

Future work:
- Develop novel spatial-temporal clustering techniques and embed them into our framework.
- Investigate novel change analysis techniques to identify spatial and temporal changes in spatial data.
- Evaluate our framework in challenging case studies.


Publications:

- S. Wang, C.S. Chen, V. Rinsourongkawong, F. Akdag, C.F. Eick, “Polygon-based Methodology for Mining Related Spatial Datasets”, ACM SIGSPATIAL GIS Workshop on Data Mining for Geoinformatics (DMG), in conjunction with ACM SIGSPATIAL GIS 2010, San Jose, CA, Nov. 2010. NSF Travel Award for ACM GIS 2010.

- S. Wang, C. Eick, Q. Xu, “A Space-Time Analysis Framework for Mining Geospatial Datasets”, CyberGIS’12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012. NSF Travel Award for CyberGIS 2012.

- C. Eick, G. Forestier, S. Wang, Z. Cao, S. Goyal, “A Methodology for Finding Uniform Regions in Spatial Data”, CyberGIS’12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.

- S. Wang, C.F. Eick, “A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets”, Geoinformatica (under review).


Thank you!
