Geo-spatial Event Detection in the Twitter Stream Michael Kaisser, AGT International Berlin Buzzwords, June 3, 2013


Page 1

Geo-spatial Event Detection in the Twitter Stream

Michael Kaisser, AGT International

Berlin Buzzwords, June 3, 2013

Page 2

Outline

1. Introduction & Context

• Social Media Analysis in a C2 Center

2. The “Avalanche” event detection approach

• Identify posting “hot spots”

• Evaluate post clusters with Machine Learning approach

3. Evaluation

4. Future work

Page 3

Background: Social Data

• Social Media continuously creates massive amounts of data

• E.g. 500 million tweets each day: ~300 GB of raw data

• Nature of the data:

• time-stamped

• textual (many languages, lingo & slang, rife with spelling mistakes, only a few words per tweet)

• links to pictures

• links to newspaper articles (more text)

• sometimes geo-spatial (contains coordinates)

• Creating real, actionable insights from this data isn’t an easy problem

This talk gives one specific example of how this can be done
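A back-of-the-envelope reading of the volume figures above, assuming decimal units (1 GB = 10^9 bytes) and that the ~300 GB corresponds to the 500 million tweets of a single day:

```latex
\frac{300 \times 10^{9}\ \text{B}}{500 \times 10^{6}\ \text{tweets}} = 600\ \text{B per tweet},
\qquad
\frac{500 \times 10^{6}\ \text{tweets}}{86\,400\ \text{s}} \approx 5\,787\ \text{tweets/s}
```

That is a few hundred bytes of raw data per tweet, well above the few words of text each tweet carries, arriving at an average of several thousand tweets per second.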

Page 4

Use case: Urban Management & Public Safety

• Cities today are complex and need to be organized

• The administration is responsible for keeping the population safe

• emergency services

• health services

• fire fighters

• police

Command & Control Center

Page 5

Urban Management & Public Safety

Why is Social Media relevant in this context?

?

Page 6

Urban Management & Public Safety

Why is Social Media relevant in this context?

“There's a plane in the Hudson. I'm on the ferry going to pick up the people. Crazy”

Page 7

Urban Management & Public Safety

Why is Social Media relevant in this context?

“De tering, wat een hel!!! 1,4 miljoen mensen op dat terrein! #loveparade” (Dutch: “Damn, what hell!!! 1.4 million people on that site! #loveparade”)

Page 8

Urban Management & Public Safety

Why is Social Media relevant in this context?

“#Hoboken is on fire. Building above Hoboken Farm Corporation at 300 Washington is all smoked out”

Social Media can help create a situational awareness picture

Page 9

Context: Social Media in a C2 Center

Page 10

Avalanche: Event detection in a C2 Center

Page 11

Avalanche: Event detection in a C2 Center

Page 12

Avalanche: Event detection in a C2 Center

Page 13

Avalanche: Event detection in a C2 Center

Page 14

Avalanche: Event detection in a C2 Center

Page 15

Avalanche: Event detection in a C2 Center

Page 16

Two step approach:

1. Identify locations with high tweet activity

• Collect geo-spatial tweet clusters

2. Evaluate clusters with a Machine Learning approach

• Do these clusters constitute a real-world event that the tweeters are witnessing first-hand?

Work in Progress:

3. Classify events according to type

How is it done?

Page 17

Machine Learning – What is the task?

= geo-located Social Media post (Tweet)

Page 18

Machine Learning – What is the task?

Good:

• Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X

• Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2

• Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura

• Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf

vs.

Bad:

• RT @refinery29: This image of Madeleine Albright playing the drums will be the best thing you'll see today: http://t.co/rGwQ5RdG

• «@_PrettyPoison Guess ill fill out more job apps today» make punna fill out some 2!

• The Glamour & Glitz at the 2012 Emmy' s that we loved! http://t.co/CiTFszfL

• @IszwanieSyahira: i'm happy and i hope u feel the same too. weeeee ~.~

• How to prepare yourself for Friday's apocalypse http://cnet.co/lPU

We need to automatically determine which of the tweet clusters (tweets issued close to each other in a short time frame) represent real-world events and which are just random chatter.

Page 19

• We look for geo-spatial clusters of tweets (e.g. 3 or more tweets within a 200 m radius, posted within 30 minutes)

• These become “event candidates”

• Event candidates are evaluated with a Machine Learning scheme.

• We currently use C4.5 decision trees.
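A minimal sketch of the candidate-detection step, using the example thresholds from the first bullet (at least 3 tweets, 200 m, 30 minutes). The slides do not say how tweets are actually grouped; as an assumption, each tweet here acts as a seed and collects the tweets posted up to 30 minutes later within 200 m of it. The `Tweet` record and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass(frozen=True)
class Tweet:
    id: str
    user: str
    lat: float
    lon: float
    ts: float  # Unix timestamp in seconds
    text: str


def haversine_m(a: Tweet, b: Tweet) -> float:
    """Great-circle distance between two tweets in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def event_candidates(tweets, radius_m=200.0, window_s=30 * 60, min_size=3):
    """Group each seed tweet with the tweets posted up to window_s later and
    within radius_m of it; keep groups with at least min_size tweets.
    Brute force O(n^2), for illustration only."""
    tweets = sorted(tweets, key=lambda t: t.ts)
    candidates, seen = [], set()
    for i, seed in enumerate(tweets):
        members = [seed]
        for other in tweets[i + 1:]:
            if other.ts - seed.ts > window_s:
                break  # sorted by time: no later tweet can fall in the window
            if haversine_m(seed, other) <= radius_m:
                members.append(other)
        key = frozenset(t.id for t in members)
        if len(members) >= min_size and key not in seen:
            seen.add(key)
            candidates.append(members)
    return candidates


if __name__ == "__main__":
    base = 1_370_000_000  # arbitrary epoch second
    stream = [
        Tweet("1", "u1", 40.7527, -73.9772, base, "Suspicious package in #GrandCentral"),
        Tweet("2", "u2", 40.7529, -73.9770, base + 300, "the trains are closed"),
        Tweet("3", "u3", 40.7525, -73.9775, base + 600, "Something happened in #GrandCentral"),
        Tweet("4", "u4", 40.7800, -73.9600, base + 700, "lunch time"),
    ]
    print([[t.id for t in c] for c in event_candidates(stream)])  # [['1', '2', '3']]
```

At real stream volumes the quadratic scan would give way to a spatial index or grid bucketing, and the seed-based grouping can emit overlapping candidates that would still need merging.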

Architecture

Page 20

Machine Learning - Features

Tweet cluster:

• Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X

• Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2

• Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura

• Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf

Page 21

Blue = training, Green = runtime

In offline ML, we train once, but use the predictive model possibly millions of times a day.

It’s okay if training isn’t lightning fast. But at runtime, every CPU cycle can count.

Scalable Machine Learning … …with Weka!
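To make the training/runtime split concrete, here is a hand-rolled one-split decision stump; it is not Weka's J48 (Weka's C4.5 implementation), only a sketch that uses the gain-ratio criterion C4.5 is known for. Training scores every candidate threshold, while prediction is a single comparison. The JSON round trip stands in for persisting the model once and loading it in a runtime service; the feature values and labels are invented for illustration.

```python
import json
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Information gain of the split `value <= threshold`, divided by the
    split information (the normalisation C4.5 applies)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    wl, wr = len(left) / n, len(right) / n
    gain = entropy(labels) - wl * entropy(left) - wr * entropy(right)
    split_info = -(wl * math.log2(wl) + wr * math.log2(wr))
    return gain / split_info


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(values, labels):
    """Offline and comparatively slow: score every midpoint between adjacent
    distinct feature values and keep the best one."""
    xs = sorted(set(values))
    if len(xs) < 2:
        raise ValueError("need at least two distinct feature values")
    best = max(((a + b) / 2 for a, b in zip(xs, xs[1:])),
               key=lambda t: gain_ratio(values, labels, t))
    return {
        "threshold": best,
        "left": majority([y for x, y in zip(values, labels) if x <= best]),
        "right": majority([y for x, y in zip(values, labels) if x > best]),
    }


def predict(stump, value):
    """Runtime, potentially millions of calls a day: one comparison."""
    return stump["left"] if value <= stump["threshold"] else stump["right"]


if __name__ == "__main__":
    # Hypothetical feature values (e.g. a unique-posters score) and labels.
    scores = [0.2, 0.3, 0.35, 0.7, 0.8, 0.9]
    labels = ["bad", "bad", "bad", "good", "good", "good"]
    model_json = json.dumps(train_stump(scores, labels))  # persisted once
    stump = json.loads(model_json)                        # loaded at runtime
    print(predict(stump, 0.6), predict(stump, 0.1))       # good bad
```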

Page 22

Scalable Machine Learning … …with Weka!

… which can be optimized further in various ways.

See e.g. Nima Asadi, Jimmy Lin, Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 2013.
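As a generic illustration of such runtime tuning (not a reproduction of the specific techniques evaluated in the cited paper), a trained tree can be stored as parallel arrays indexed by node id instead of as linked node objects, so that prediction becomes a tight loop over compact memory. In CPython this mainly shows the layout; the locality benefit matters most in compiled code. The tree, its two features and its thresholds are hypothetical.

```python
from array import array


class FlatTree:
    """A trained binary decision tree stored as parallel arrays indexed by
    node id, rather than as linked node objects."""

    def __init__(self, feature, threshold, left, right, label):
        self.feature = array("i", feature)      # feature index tested; -1 marks a leaf
        self.threshold = array("d", threshold)  # go left if x[feature] <= threshold
        self.left = array("i", left)            # child node ids (unused at leaves)
        self.right = array("i", right)
        self.label = array("i", label)          # predicted class (used at leaves)

    def predict(self, x):
        node = 0  # root
        while self.feature[node] >= 0:
            if x[self.feature[node]] <= self.threshold[node]:
                node = self.left[node]
            else:
                node = self.right[node]
        return self.label[node]


if __name__ == "__main__":
    # Hypothetical tree over two features: x[0] = unique-posters score,
    # x[1] = common-theme score; label 1 = event, 0 = no event.
    tree = FlatTree(
        feature=[0, -1, 1, -1, -1],
        threshold=[0.5, 0.0, 0.4, 0.0, 0.0],
        left=[1, -1, 3, -1, -1],
        right=[2, -1, 4, -1, -1],
        label=[0, 0, 0, 0, 1],
    )
    print([tree.predict(x) for x in ([0.3, 0.9], [0.8, 0.9], [0.8, 0.2])])  # [0, 1, 0]
```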

Page 23

Evaluation setup:

• 1,000 hand-labeled tweet clusters.

• 319 good, 681 bad.

• 10-fold cross validation.

Machine Learning - Evaluation
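A minimal sketch of stratified 10-fold cross-validation on a dataset with the slide's class distribution (319 good, 681 bad). The slides do not say whether the folds were stratified; stratification is an assumption here. The majority-class baseline is included because it shows what raw accuracy is worth on this data: a classifier that always answers “bad” is right for 681 of the 1,000 clusters, i.e. 68.1%.

```python
import random
from collections import Counter


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds that each keep roughly the overall
    class ratio: shuffle each class, then deal its indices out round-robin."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    dealt = 0  # continue dealing where the previous class stopped, to balance fold sizes
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(dealt + j) % k].append(idx)
        dealt += len(idxs)
    return folds


def cross_validate(examples, labels, train_fn, predict_fn, k=10):
    """Accuracy over k folds: train on k-1 folds, test on the held-out one."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([examples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, examples[i]) == labels[i] for i in test_idx)
    return correct / len(labels)


def train_majority(examples, labels):
    """Trivial baseline: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, example):
    return model


if __name__ == "__main__":
    labels = ["good"] * 319 + ["bad"] * 681   # class distribution from the slide
    examples = list(range(len(labels)))        # placeholder feature vectors
    print(cross_validate(examples, labels, train_majority, predict_majority))  # 0.681
```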

Page 24

Evaluation setup:

• 1,000 hand-labeled tweet clusters.

• 319 good, 681 bad.

• 10-fold cross validation.

Machine Learning - Evaluation

Page 25

Machine Learning - Evaluation

Evaluation setup:

• 1,000 hand-labeled tweet clusters.

• 319 good, 681 bad.

• 10-fold cross validation.

[Figure: Unique Posters score vs. Common Theme score. Blue: event, Red: no event]

Page 26

If there are several tweets …

• from roughly the same location

• at roughly the same time

• from different users

• that nevertheless use the same words

… chances are good that we have detected an event.

(Somewhat simplified) Summary
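The evaluation plot names two features, a Unique Posters score and a Common Theme score, but the slides do not define them. The functions below are hypothetical stand-ins that follow the summary above: the share of distinct users in a cluster, and the share of tweets containing the cluster's most widely shared content word. Tokenization and the stopword list are simplifying assumptions. On the Grand Central cluster (with made-up user ids), four different users give 1.0, and “nyc” occurring in three of the four tweets gives 0.75.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9]+")
# Tiny illustrative stopword list; a real system would need per-language lists.
STOPWORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "and", "is", "are", "not", "i", "it"}


def content_words(text):
    """Lower-cased alphanumeric tokens of a tweet, without URLs and stopwords
    (a '#' or '@' prefix is dropped by the tokenizer)."""
    text = URL_RE.sub(" ", text.lower())
    return {w for w in TOKEN_RE.findall(text) if w not in STOPWORDS}


def unique_posters_score(cluster):
    """Distinct users / tweets: 1.0 if every tweet comes from a different user."""
    return len({user for user, _ in cluster}) / len(cluster)


def common_theme_score(cluster):
    """Share of tweets that contain the cluster's most widely shared content word."""
    doc_freq = Counter()
    for _, text in cluster:
        doc_freq.update(content_words(text))  # a set, so each word counts once per tweet
    if not doc_freq:
        return 0.0
    return doc_freq.most_common(1)[0][1] / len(cluster)


if __name__ == "__main__":
    cluster = [  # the Grand Central cluster from the slides, with made-up user ids
        ("u1", "Suspicious package in #GrandCentral #NYC #bomb threat possibility not sure?? http://t.co/VwU7SP3X"),
        ("u2", "Suspicious package found in Grand Central Station... the 456 train..the trains are closed !! [pic]: http://t.co/9YPki4k2"),
        ("u3", "Something happened in the #456 #trainstation in #GrandCentral #NYC http://t.co/GGKvQura"),
        ("u4", "Accident on the #456train in #midtown #NYC http://t.co/fj2mJJmf"),
    ]
    print(unique_posters_score(cluster), common_theme_score(cluster))  # 1.0 0.75
```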

Page 27

Outlook – work in progress and future work

Derive more coordinates

• from shared pictures

• from toponyms in posts

• use image sharing sites directly

Make use of posts without coordinates

• and add them to already existing clusters

Explore real-time TF-IDF

• to get rid of the Kardashians & Beliebers

Evaluate system with real-world data

• because recall numbers are currently somewhat misleading
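The outlook only names the idea of real-time TF-IDF. One possible reading, sketched here under that assumption: keep document frequencies over a sliding window of the whole stream, so that words everybody is currently tweeting about (celebrity chatter) get a low IDF, while words concentrated in one local cluster stand out. The class name, window size and smoothing are hypothetical choices.

```python
import math
from collections import Counter, deque


class SlidingIdf:
    """Document frequencies over the most recent `window` tweets of the whole
    stream, so IDF reflects what everybody is talking about right now."""

    def __init__(self, window=100_000):
        self.window = window
        self.docs = deque()   # word sets of the tweets currently in the window
        self.df = Counter()   # word -> number of windowed tweets containing it

    def add(self, words):
        words = set(words)
        self.docs.append(words)
        self.df.update(words)
        if len(self.docs) > self.window:  # evict the oldest tweet
            for w in self.docs.popleft():
                self.df[w] -= 1
                if self.df[w] == 0:
                    del self.df[w]

    def idf(self, word):
        # Smoothed IDF: 0 for a word in every windowed tweet, large for rare words.
        return math.log((1 + len(self.docs)) / (1 + self.df[word]))

    def cluster_tfidf(self, tweets_words):
        """TF-IDF weight of each word across one candidate cluster."""
        tf = Counter(w for words in tweets_words for w in words)
        return {w: count * self.idf(w) for w, count in tf.items()}


if __name__ == "__main__":
    idf = SlidingIdf(window=4)
    for tweet in [{"kardashian", "lol"}, {"kardashian", "dress"},
                  {"kardashian", "wow"}, {"fire", "hoboken"}]:
        idf.add(tweet)
    weights = idf.cluster_tfidf([{"kardashian", "fire"}, {"fire", "smoke"}])
    print(max(weights, key=weights.get))  # fire
```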

Page 28

Machine Learning – Relevance Feedback

Machine Learning Model

Users (journalists, C2 operators)

Documents (e.g. tweets, post clusters)

Good

Good

Bad

• Users implicitly rate documents by how they interact with them

• User performs follow-up actions → relevant

• User clicks document away → irrelevant

System learns to present more relevant documents

System can adapt to changing needs over time

Work in progress
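The slide shows the feedback loop but not the learner behind it. As a hypothetical stand-in, an online perceptron over the words of each document can turn the two implicit signals into updates: a follow-up action counts as a positive label, a dismissal as a negative one, and every judgement the current weights get wrong moves them. The interaction names, features and learning rule are assumptions chosen for illustration.

```python
# Hypothetical mapping from observed operator interactions to implicit labels.
IMPLICIT_LABEL = {"follow_up_action": +1, "dismissed": -1}


class FeedbackRanker:
    """Online perceptron over bag-of-words document features: each implicit
    judgement the current weights get wrong nudges them, so the ranking keeps
    adapting to what operators act on."""

    def __init__(self, learning_rate=1.0):
        self.lr = learning_rate
        self.weights = {}  # word -> weight

    def score(self, words):
        return sum(self.weights.get(w, 0.0) for w in words)

    def feedback(self, words, interaction):
        y = IMPLICIT_LABEL[interaction]
        if y * self.score(words) <= 0:  # mistake (or no opinion yet): update
            for w in words:
                self.weights[w] = self.weights.get(w, 0.0) + self.lr * y

    def rank(self, docs):
        """docs: list of (doc_id, words); returns ids, most relevant first."""
        return [doc_id for doc_id, words in
                sorted(docs, key=lambda d: self.score(d[1]), reverse=True)]


if __name__ == "__main__":
    ranker = FeedbackRanker()
    ranker.feedback({"fire", "smoke", "hoboken"}, "follow_up_action")
    ranker.feedback({"emmys", "glamour"}, "dismissed")
    print(ranker.rank([("a", {"emmys", "dress"}), ("b", {"smoke", "street"})]))  # ['b', 'a']
```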

Page 29

Example: Explosion in an image

Explosion detected with Image Analysis

OMG!!!

http://t.co/maiAgHoh

Problem:

• Not all tweets contain useful textual information

• Shared text might be hard to analyze

Solution:

• ~35% of tweets contain linked images

• Images provide a wealth of information that can be analyzed

• Objects, events, persons

• coordinates

Image Analysis of shared pictures

Work in progress
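Of the information listed above, coordinates are the easiest to illustrate. Photos from GPS-enabled phones often carry their position as EXIF GPS tags: latitude and longitude each stored as three rationals (degrees, minutes, seconds) plus a hemisphere reference. A minimal sketch of the conversion to decimal degrees, assuming the tags have already been read by some EXIF library (not shown) and that the shared image still carries them; the function name and sample values are hypothetical.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds), given as ints, Fractions or
    'num/den' strings as in EXIF rationals, plus a hemisphere reference
    ('N', 'S', 'E' or 'W') into signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref.upper() in ("S", "W") else value)


if __name__ == "__main__":
    # Illustrative values, not taken from a real photo.
    lat = dms_to_decimal((40, 45, "972/100"), "N")
    lon = dms_to_decimal((73, 58, "3792/100"), "W")
    print(lat, lon)  # 40.7527 -73.9772
```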

Page 30

Thank you!