semantics + filtering + search = twitcident - exploring information in social web streams

Post on 11-May-2015

DESCRIPTION

Talk by Ke Tao (from Web Information Systems, TU Delft) at 23rd ACM Conference on Hypertext and Social Media, June 28 2012, Milwaukee, WI, USA

TRANSCRIPT

Delft University of Technology

Semantics + Filtering + Search = Twitcident

Exploring Information in Social Web Streams

Hypertext 2012, Milwaukee, WI – June 28

Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao

Web Information Systems, TU Delft, the Netherlands

200,000,000: number of tweets published per day

Pukkelpop 2011

People tweet about everything, everywhere :-)

Pukkelpop 2011 became a tragedy

81,000 tweets in four hours

[Diagram: 200,000,000 tweets → Filtering → Search & Analytics → Useful tweets?]

Case: Nijmegen train accident

First tweet…

And then your train blasts off full of the anvils. #Nijmegen #veolia

First picture…

Astonishing! My train rams the platform at Nijmegen!

http://pic.twitter.com/QVVfJHyd

Traditional news media

A train rammed the anvils at Nijmegen.

Research Challenges

1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident?

2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets?

[Diagram: Twitter streams → Filtering (topic) → Search & Analytics (information need)]

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Twitcident system

Incident detection

• Twitcident relies on Emergency Broadcasting Services for detecting incidents.

• In the Netherlands: P2000 communication network

Incident Profiling

For an incident i:

• The profile of an incident is described as a set of tuples.

• Each tuple includes a facet-value pair (f, v) and its weight to the incident i.

Example profile:

• (Location, Netherlands): 0.4

• (Incident, Train accident): 0.5

• (Location, Nijmegen): 0.8

• (Organization, Veolia): 0.6

• (Incident, Crash): 1.0
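An incident profile of this kind can be pictured as a mapping from facet-value pairs to weights. The following is a minimal Python sketch under that assumption, not Twitcident's actual implementation. The names `FacetValue` and `IncidentProfile` and the incident identifier are hypothetical; the pairs and weights are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# A facet-value pair (f, v), e.g. ("Location", "Nijmegen").
FacetValue = Tuple[str, str]


@dataclass
class IncidentProfile:
    """Hypothetical container for an incident profile: (f, v) -> weight."""

    incident_id: str
    weights: Dict[FacetValue, float] = field(default_factory=dict)

    def add(self, facet: str, value: str, weight: float) -> None:
        # Store the weight of one facet-value pair for this incident.
        self.weights[(facet, value)] = weight

    def weight(self, facet: str, value: str) -> float:
        # Pairs that are not part of the profile contribute nothing.
        return self.weights.get((facet, value), 0.0)


# Profile of the Nijmegen train accident, using the slide's values.
profile = IncidentProfile("nijmegen-train-accident")  # identifier is made up
profile.add("Location", "Netherlands", 0.4)
profile.add("Incident", "Train accident", 0.5)
profile.add("Location", "Nijmegen", 0.8)
profile.add("Organization", "Veolia", 0.6)
profile.add("Incident", "Crash", 1.0)
```

Keying the dictionary on the whole pair rather than on the facet alone lets one facet (here `Location` and `Incident`) carry several values with different weights.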

Social Media Aggregation

• Collecting Twitter messages, pictures, and videos from Social Media Platforms, e.g. Twitter, PhotoBucket, Vimeo

Semantic Enrichment

• Named Entity Recognition

• Classification: Casualties, Damages, Risks…

• Linkage: External Resources

• Metadata extraction
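To make two of these enrichment steps concrete, here is a rough Python sketch of classification and metadata extraction on a single tweet. It is an illustration only: the slides do not say how Twitcident classifies tweets, so the keyword lists, the function name `enrich`, and the sample tweet are all assumptions. Named entity recognition and linkage to external resources are left out because they would need an external service or model.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword lists; a real classifier would be trained or curated.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "Casualties": {"injured", "wounded", "dead", "casualties"},
    "Damages": {"damage", "damaged", "destroyed", "collapsed"},
    "Risks": {"danger", "risk", "smoke", "fire"},
}

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def enrich(text: str) -> Dict[str, List[str]]:
    """Attach simple metadata and category labels to one tweet."""
    tokens = set(re.findall(r"\w+", text.lower()))
    # A category applies when the tweet shares at least one keyword with it.
    categories = sorted(
        name for name, words in CATEGORY_KEYWORDS.items() if tokens & words
    )
    return {
        "hashtags": HASHTAG.findall(text),
        "urls": URL.findall(text),
        "categories": categories,
    }


# Invented tweet, used only to exercise the function.
sample = "Two people injured, platform damaged #Nijmegen http://example.org/x"
enriched = enrich(sample)
```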

Filtering

• Which tweets are relevant to the incident?

• Preprocessing: Language detection

• Semantic Filtering: Compare tweet with the incident profile P(i)

• Semantic Filtering with News Context

  • P’(i): P(i) complemented with facet-value pairs from news
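The slides do not spell out how a tweet is compared with P(i). One simple possibility, sketched below in Python, is to sum the weights of the profile's facet-value pairs that also occur in the enriched tweet and to keep tweets whose score reaches a threshold. The scoring rule, the threshold of 1.0, the function names, and the news-derived pair are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

FacetValue = Tuple[str, str]
Profile = Dict[FacetValue, float]


def relevance(tweet_pairs: Set[FacetValue], profile: Profile) -> float:
    """Sum the weights of profile pairs that also occur in the tweet."""
    return sum(w for pair, w in profile.items() if pair in tweet_pairs)


def with_news_context(profile: Profile, news_pairs: Profile) -> Profile:
    """Build P'(i): P(i) complemented with facet-value pairs from news."""
    extended = dict(news_pairs)
    extended.update(profile)  # weights already in P(i) take precedence
    return extended


def is_relevant(
    tweet_pairs: Set[FacetValue], profile: Profile, threshold: float = 1.0
) -> bool:
    # The threshold is an assumed tuning parameter, not a value from the talk.
    return relevance(tweet_pairs, profile) >= threshold


# P(i) with the weights from the incident profiling slide.
p_i: Profile = {
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}

on_topic = {("Location", "Nijmegen"), ("Organization", "Veolia")}
off_topic = {("Location", "Netherlands")}

# Hypothetical pair extracted from news articles about the incident.
p_i_news = with_news_context(p_i, {("Incident", "Derailment"): 0.5})
```

With these values the first tweet scores roughly 1.4 and passes, while the second scores 0.4 and is filtered out. A tweet that only mentions the news-derived pair would score 0 against P(i) but 0.5 against P’(i), which is the kind of additional recall the news context is meant to provide.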

Faceted Search

• Strategies (ranking)

  • Frequency-based

  • Time-sensitive

  • Personalized
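As a rough illustration of the first two ranking strategies, the Python sketch below orders facet-values by how many filtered tweets mention them, and then by a recency-weighted count. The slides give no formulas, so the exponential decay, the one-hour half-life, and all function names are assumptions. The personalized strategy is omitted because it would need a user model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

FacetValue = Tuple[str, str]


def rank_by_frequency(tweets: Iterable[Set[FacetValue]]) -> List[FacetValue]:
    """Order facet-values by the number of tweets that mention them."""
    counts = Counter(pair for pairs in tweets for pair in pairs)
    # Descending count, then alphabetical order to break ties.
    return [p for p, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank_time_sensitive(
    tweets: Iterable[Tuple[float, Set[FacetValue]]],
    now: float,
    half_life: float = 3600.0,  # assumed: a mention loses half its weight per hour
) -> List[FacetValue]:
    """Order facet-values by mentions weighted with exponential time decay."""
    scores: Dict[FacetValue, float] = defaultdict(float)
    for timestamp, pairs in tweets:
        decay = 0.5 ** ((now - timestamp) / half_life)
        for pair in pairs:
            scores[pair] += decay
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


crash = ("Incident", "Crash")
veolia = ("Organization", "Veolia")

# Two old tweets mention the crash; one recent tweet mentions Veolia.
timed = [(0.0, {crash}), (0.0, {crash}), (7200.0, {veolia})]

by_frequency = rank_by_frequency(pairs for _, pairs in timed)
by_recency = rank_time_sensitive(timed, now=7200.0)
```

In this example the frequency-based ranking puts the crash first (two mentions against one), whereas the time-sensitive ranking puts Veolia first, because each two-hour-old mention has decayed to a quarter of its weight.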

Real-time analytics

What type of things are mentioned in the tweets?

What aspects are mentioned over time? What do people report about over time?

Impact Area

Evaluation - Dataset

• Twitter corpus (TREC Microblog Track 2011)

  • 16 million tweets (Jan. 24th – Feb. 8th, 2011)

  • 4,766,901 tweets classified as English

  • 6.2 million entity-extractions

• News (same time period)

  • 62 RSS News Feeds

  • 13,959 News Articles

  • 357,559 entity-extractions

Evaluation - Tweet Filtering (1/2)

Semantic strategies outperform the keyword-based filtering regarding all metrics.

Evaluation - Tweet Filtering (2/2)

The semantic strategy is more robust and achieves higher precision for complex topics.

Evaluation - Faceted Search (1/2)

The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.

Evaluation - Faceted Search (2/2)

The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.

Conclusions

• What we have done:

  • Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web Streams

• What we have achieved:

  • Better filtering of Twitter messages for a given incident.

  • Better search for relevant information about an incident within the filtered messages.

Thank you!

Ke Tao @taubau

@wisdelft

http://twitcident.org
