#4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Upload: databeersbcn

Post on 11-Feb-2017

Page 1: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Finding Patterns | Showing Patterns | Putting it All Together | Conclusions

Visualizing Geolocated Tweets: A Spatial Data Mining Approach

Joana Simões (1)

(1) Eurecat, Centre Tecnologic de Catalunya

September 15, 2015

Page 2: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Why we love Twitter

Due to its willingness to share data, Twitter has been a prime playground for researchers and practitioners around the world.

Users on Twitter generate over 400 million tweets every day.

Approximately 1% of all tweets published on Twitter are geolocated.

Page 3: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Motivation

How can we assimilate these large datasets and extract relevant information?

Page 4: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Approach

Technological Challenge + Representation Challenge

Machine Learning / GIS

Page 5: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Clustering

An unsupervised, descriptive method.

Widely used for its segmentation and summarization features.

Identifies groups of objects that are similar to one another and distinct from the rest.

Page 6: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

DBSCAN

Density-based clustering.

A dense region is defined by two global parameters:

epsilon: neighbourhood radius.

minPts: minimum number of points required to form a dense region.
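
To make the roles of epsilon and minPts concrete, here is a minimal DBSCAN sketch in plain Python. It is an illustration only, not the implementation used in the talk: the function names, the toy coordinates, and the use of Euclidean distance on planar points are all assumptions (real geolocated tweets would need a great-circle distance or a projected coordinate system), and the brute-force neighbour search is only suitable for small inputs.

```python
import math

NOISE = -1  # label used for points that belong to no dense region


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
    return labels


# Hypothetical toy data: two tight squares and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (5, 5)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
print(toy_labels)
```

With these toy values, each square forms its own cluster and the isolated point is labelled as noise. Because epsilon and minPts apply to the whole dataset, a single choice has to fit both dense and sparse areas.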

Page 7: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

The Bottleneck: Global Parameters

Figure panels: less strict combination of parameters; strict combination of parameters.

Page 8: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

OPTICS

Generalization of DBSCAN.

Orders the database such that points that are spatially closest become neighbours in the ordering.

It does not produce a strict partition of the data.

Page 9: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

OPTICS (cont.)

In the ordering, clusters appear as valleys, separated by noise regions (peaks).
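
The ordering and its valleys can be sketched with a small OPTICS implementation. This is a simplified illustration under stated assumptions, not the code behind the slides: the function name, the toy points, the Euclidean distance, and the brute-force neighbour search are all hypothetical choices. Each point is emitted with a reachability value; low values correspond to valleys (clusters) and high values to peaks.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return (ordering, reachability), with reachability listed in output order.

    A reachability of math.inf means "undefined" (first point of a component).
    """
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    ordering = []

    def neighbours(i):
        # (distance, index) pairs for all points within eps of point i.
        pairs = ((math.dist(points[i], points[j]), j) for j in range(n))
        return [(d, j) for d, j in pairs if d <= eps]

    def core_distance(nbrs):
        # Distance to the min_pts-th nearest neighbour (the point itself counts).
        if len(nbrs) < min_pts:
            return None
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # priority queue ordered by reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale queue entry with an outdated reachability
            processed[i] = True
            ordering.append(i)
            nbrs = neighbours(i)
            core = core_distance(nbrs)
            if core is None:
                continue  # not a core point, so it cannot extend the ordering
            for d, j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core, d)
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return ordering, [reach[i] for i in ordering]


# Hypothetical toy data: two groups of three points along a line.
line_points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
line_order, line_reach = optics(line_points, eps=20, min_pts=2)
print(line_order)
print(line_reach)
```

For this toy input the reachability values stay small inside each group and jump to a single large value where the ordering crosses from the first group to the second. That jump is the peak separating the two valleys.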

Page 10: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

OPTICSxi: Detecting Clusters

We can define the start and end of a cluster based on a threshold (xi).

This yields a hierarchical partition.
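
As a rough illustration of how the threshold is used, the sketch below marks the positions in a reachability sequence where the value drops or rises by at least a fraction xi. It covers only this first step: the full xi extraction goes on to combine such steep positions into nested clusters, which is omitted here. The function name and the sample reachability values are assumptions made for the example.

```python
import math


def steep_points(reachability, xi):
    """Find positions where reachability changes by at least a factor (1 - xi).

    Position i is steep downward if r[i + 1] <= r[i] * (1 - xi) and
    steep upward if r[i] <= r[i + 1] * (1 - xi).
    """
    down, up = [], []
    for i in range(len(reachability) - 1):
        current, following = reachability[i], reachability[i + 1]
        if following <= current * (1 - xi):
            down.append(i)  # candidate start of a cluster (entering a valley)
        elif current <= following * (1 - xi):
            up.append(i)    # candidate end of a cluster (leaving a valley)
    return down, up


# Hypothetical reachability sequence: two valleys separated by one peak.
sample_reach = [math.inf, 1.0, 1.0, 8.0, 1.0, 1.0]
sample_down, sample_up = steep_points(sample_reach, xi=0.05)
print(sample_down, sample_up)
```

In this sample the sequence drops steeply at positions 0 and 3 and rises steeply at position 2, which brackets the first valley. A larger xi demands sharper changes, so fewer and more pronounced clusters would be detected.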

Page 11: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

GIS: Putting Geographic Information into Context

Visualization is a key element in understanding and interpreting the results of data mining.

What about geospatial information...?

Page 12: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

NASA WorldWind

WorldWind is an open-source virtual globe developed by NASA that accesses a number of geographic datasets.

OpenGL.

Page 13: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Cluster Explorer

Java app based on FOSS.

Generates clusters based on DBSCAN, OPTICSxi (or both).

Displays the results on a virtual globe.

Page 14: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

A Case Study

A set of geolocated tweets in the city of Barcelona, over a period of five days.

Figure panels: DBSCAN run; OPTICSxi run.

Page 15: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

GIS and the value of Location-Analysis (cont.)

Page 16: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

GIS and the value of Location-Analysis (cont.)

Page 17: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

GIS and the value of Location-Analysis (cont.)

Identified clusters (hierarchical level):

Plaza de Catalunya cluster (L1)
    Triangle Shopping Centre (L2)
        FNAC shop (L3)
    Font de Canaletes/Metro (L2)
    Hard Rock Cafe (L2)
    Central gardens (L2)

Page 18: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Conclusions

Flat cluster partition: good for summarizing the dataset and easy to interpret.

Hierarchical cluster partition: allows us to look at the city at multiple scales.

Page 19: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Conclusions (cont.)

Visualizations provided by a virtual globe proved to be a flexible, context-enhancing tool that was crucial for interpreting the results.

Combining machine learning and GIS can be a promising approach in the field of spatial data mining.

Page 20: #4 DataBeersBCN - "Visualizing Geolocated Tweets" by Joana Simoes

Thank You for Listening!

This presentation is available at: http://tinyurl.com/nd29g3f
