Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Delft University of Technology. Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter. ISWC, Bonn, Germany, Oct 27th 2011. Fabian Abel¹, Ilknur Celik¹, Geert-Jan Houben, Patrick Siehndel². ¹Web Information Systems, TU Delft, the Netherlands; ²L3S Research Center, Hannover, Germany

Upload: web-information-systems-tu-delft

Post on 11-May-2015

DESCRIPTION

Slides presented at ISWC 2011, Bonn, Germany. Corresponding paper: http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Research_Paper/12/70310001.pdf

TRANSCRIPT

Page 1: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Delft University of Technology

Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

ISWC, Bonn, Germany, Oct 27th 2011

Fabian Abel¹, Ilknur Celik¹, Geert-Jan Houben, Patrick Siehndel²

¹Web Information Systems, TU Delft, the Netherlands

²L3S Research Center, Hannover, Germany

Page 2: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Adaptive Faceted Search on Twitter

Personalized Recommendations

Personalized Search

Adaptive Systems

What we do: Science and Engineering for the Personal Web

Social Web

Analysis and User Modeling

user/usage data

Semantic Enrichment, Linkage and Alignment

domains: news, social media, cultural heritage, public data, e-learning

Page 3: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

200,000,000: number of tweets published per day

Page 4: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

1: number of tweets that are interesting for me now

Page 5: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Searching on Twitter

Page 6: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Issues with Multiple Keywords Search

Page 7: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Let’s try to search with One Keyword

Page 8: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Search results, page 1

Page 9: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Search results, page 2

Page 10: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Search results, page 3

Page 11: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Search results, page 60!!

tweet I was looking for

Next Saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my hometwon Eindhoven. #realliveshit #iwillspinrecords

about 9 hours ago via Blackberry

Music Artist

Locations

Page 12: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Is there an easier way?

Current Query: Eindhoven Music

Expand Query:
Locations more...
Events more...
Music Artists:
+ Guilty Simpson
+ Bryan Adams
+ Elton John
+ Golden Earring
+ Rihanna
+ The eagles
+ 3 Doors Down
more...

Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents

Faceted Search can help (hypothesis)

Page 13: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Challenges

Page 14: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Facets of a Tweet

@bob: Julian Assange got arrested

http://bit.ly/5d4r2t

Facet type       Facet value
Creator          @bob
Location         Delft, the Netherlands
Creation time    Nov 29th 2011

Challenge 1: How to infer facets that describe the content of a tweet?

Page 15: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted Search: selecting facet-value pairs

Current Query: Music

Expand Query:
Locations
+ Aachen
+ Aalborg
+ Aalesund
+ Aarhus
+ Aasiaat
+ Abaiang
+ Abakan
more...
Events more...
Music Artists more…

Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents

Number of selectable facet values may be very high!

Challenge 2: How to adapt the faceted search interface to the current demands of a user?

Page 16: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Adaptive Faceted Search Framework

Page 17: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Adaptive Faceted Search Framework

Adaptive Faceted Search

Twitter posts

Semantic Enrichment

User and Context Modeling

user

How to adapt the facet-value pair ranking to the current demands of the user?

How to represent the content of a tweet? Facet extraction

Page 18: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Facet Extraction and Semantic Enrichment

@bob: Julian Assange got arrested http://bit.ly/5d4r2t

Tweet-based enrichment: Julian Assange

Linked page: Julian Assange arrested. Julian Assange, the founder of WikiLeaks, is under arrest in London…

Link-based enrichment: Julian Assange, London, WikiLeaks
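The two enrichment steps on this slide can be sketched as follows. This is only an illustration: the small GAZETTEER lookup is a hypothetical stand-in for a real named-entity recognition service, and the facet types ("person", "location", "organization") and function names are assumptions, not taken from the slides.

```python
from typing import Dict, Set, Tuple

FVP = Tuple[str, str]  # (facet type, facet value)

# Hypothetical gazetteer standing in for a real entity recognition service.
GAZETTEER: Dict[str, str] = {
    "Julian Assange": "person",
    "London": "location",
    "WikiLeaks": "organization",
}


def extract_fvps(text: str) -> Set[FVP]:
    """Return every gazetteer entity mentioned in the text as a facet-value pair."""
    return {(ftype, name) for name, ftype in GAZETTEER.items() if name in text}


def enrich(tweet_text: str, linked_text: str = "") -> Set[FVP]:
    """Tweet-based enrichment, plus link-based enrichment when a linked page is given."""
    return extract_fvps(tweet_text) | extract_fvps(linked_text)


tweet = "@bob: Julian Assange got arrested http://bit.ly/5d4r2t"
article = ("Julian Assange arrested. Julian Assange, the founder of WikiLeaks, "
           "is under arrest in London")

tweet_only = enrich(tweet)          # facets found in the tweet text alone
with_link = enrich(tweet, article)  # facets from the tweet and the linked page
```

With the linked page included, the same tweet is described by three facet-value pairs instead of one, which is the effect the slide illustrates.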

Page 19: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Impact of Link-based enrichment

Representation of tweets:

significantly more facets per tweet with link-based enrichment

Page 20: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted Search Strategies

• Challenge: most-relevant facet-value pair should appear at the top of the ranking

• Baseline: hashtag-based keyword search

Locations
1. Aachen
2. Aalborg
3. Aalesund
4. Aarhus
…
2145. Eindhoven

Locations
1. Eindhoven
2. Delft
3. Amsterdam
4. Rotterdam
5. London
…

Page 21: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted Search Strategies

• Faceted Search Strategies:

1. Occurrence frequency: count occurrence frequencies of FVP (baseline)

[Formula: rank of a facet-value pair (FVP) = number of tweets in the current hit list of matching tweets that contain the FVP]
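A minimal sketch of the occurrence-frequency strategy, assuming each tweet in the hit list has already been reduced to a set of facet-value pairs. The function name and the alphabetical tie-break are illustrative choices, not part of the slides.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

FVP = Tuple[str, str]  # (facet type, facet value)


def rank_by_frequency(hit_list: Iterable[Set[FVP]]) -> List[FVP]:
    """Order FVPs by the number of tweets in the hit list that contain them."""
    counts: Counter = Counter()
    for tweet_fvps in hit_list:
        # Each tweet is a set, so it contributes at most 1 per FVP.
        counts.update(tweet_fvps)
    # Most frequent first; ties are broken alphabetically for determinism.
    return sorted(counts, key=lambda fvp: (-counts[fvp], fvp))


hits = [
    {("location", "Eindhoven"), ("music artist", "Guilty Simpson")},
    {("location", "Eindhoven"), ("location", "Delft")},
    {("location", "Eindhoven")},
    {("location", "Delft")},
]
ranking = rank_by_frequency(hits)
```

Here (location, Eindhoven) occurs in three tweets, (location, Delft) in two and (music artist, Guilty Simpson) in one, so they are ranked in that order.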

Page 22: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted Search Strategies

2. Personalization: adapt ranking to user profile (different user modeling strategies possible; here: entire tweeting history of the user)

Personalized FVP ranking strategy: rank of the FVP = weight in user profile

[Figure: timeline of the user's tweets, June 27 to July 4]

User Profile
FVP                     weight
(location, Delft)       6
(event, JazzBaltica)    4
(person, ChetBaker)     3
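The personalized strategy can be sketched roughly as below, assuming the user profile is built by counting the facet-value pairs in the user's entire tweeting history, as stated on the slide. The helper names and the example history are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

FVP = Tuple[str, str]  # (facet type, facet value)


def build_profile(history: Iterable[Set[FVP]]) -> Counter:
    """User profile: how often each FVP occurred in the user's past tweets."""
    profile: Counter = Counter()
    for tweet_fvps in history:
        profile.update(tweet_fvps)
    return profile


def rank_personalized(candidates: Iterable[FVP], profile: Counter) -> List[FVP]:
    """Rank candidate FVPs by their weight in the user profile (0 if unseen)."""
    return sorted(candidates, key=lambda fvp: (-profile.get(fvp, 0), fvp))


# Hypothetical tweeting history of one user, already semantically enriched.
history = [
    {("location", "Delft"), ("event", "JazzBaltica")},
    {("location", "Delft")},
    {("person", "ChetBaker"), ("event", "JazzBaltica")},
    {("location", "Delft")},
]
profile = build_profile(history)

candidates = {("location", "Eindhoven"), ("location", "Delft"), ("person", "ChetBaker")}
ranking = rank_personalized(candidates, profile)
```

FVPs the user has never tweeted about keep a weight of 0 and fall to the bottom of the list.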

Page 23: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted Search Strategies

3. Diversification: increase variety among the top-ranked FVPs

Genre
+ Blues
+ Jazz
+ JazzMusic
+ Rock
more...

minimize overlaps

Genre
+ Blues
+ Jazz
+ Rock
+ Classic
more...
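One way to "minimize overlaps" is a greedy selection that always picks the FVP covering the most tweets not yet covered by higher-ranked FVPs. This is an assumed technique chosen for illustration; the slides do not specify how the diversification is computed, and the tweet ids below are made up.

```python
from typing import Dict, List, Set, Tuple

FVP = Tuple[str, str]  # (facet type, facet value)


def rank_diversified(fvp_to_tweets: Dict[FVP, Set[int]]) -> List[FVP]:
    """Greedy ranking: prefer FVPs that add the most not-yet-covered tweets."""
    remaining = dict(fvp_to_tweets)
    covered: Set[int] = set()
    ranking: List[FVP] = []
    while remaining:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(remaining),
            key=lambda f: (len(remaining[f] - covered), len(remaining[f])),
        )
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking


# Hypothetical hit list: tweet ids matched by each genre FVP.
genres = {
    ("genre", "Blues"): {1, 2, 3, 4},
    ("genre", "Jazz"): {5, 6, 7},
    ("genre", "JazzMusic"): {5, 6},  # fully overlaps with Jazz
    ("genre", "Rock"): {8, 9},
    ("genre", "Classic"): {10},
}
ranking = rank_diversified(genres)
```

By plain frequency JazzMusic would rank third, but because its tweets are already covered by Jazz it drops below Rock and Classic, matching the before and after lists on the slide.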

Page 24: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted Search Strategies

4. Time-sensitivity: adapt FVP ranking to temporal context

• Semantic enrichment: (i) tweet-based and (ii) link-based enrichment

[Figure: occurrence frequency of FVP over time (June 20, June 27, July 4) for (event, JazzBaltica) and (event, FrenchOpen); marker: search]

Event
+ JazzBaltica
+ FrenchOpen
more...
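A time-sensitive ranking could, for example, weight each occurrence of an FVP by how recent the tweet is at search time. The exponential decay and the 7-day half-life below are assumptions made for this sketch; the slides only state that the ranking is adapted to the temporal context. The example timestamps are invented.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple

FVP = Tuple[str, str]  # (facet type, facet value)


def rank_time_sensitive(
    hits: Iterable[Tuple[datetime, Set[FVP]]],
    now: datetime,
    half_life_days: float = 7.0,  # assumed decay parameter
) -> List[FVP]:
    """Rank FVPs by occurrence frequency, discounting older tweets."""
    scores: Dict[FVP, float] = defaultdict(float)
    for created_at, fvps in hits:
        age_days = (now - created_at).total_seconds() / 86400.0
        weight = 0.5 ** (age_days / half_life_days)  # halves every half-life
        for fvp in fvps:
            scores[fvp] += weight
    return sorted(scores, key=lambda f: (-scores[f], f))


search_time = datetime(2011, 7, 4)
hits = [
    (datetime(2011, 6, 20), {("event", "FrenchOpen")}),
    (datetime(2011, 6, 20), {("event", "FrenchOpen")}),
    (datetime(2011, 6, 20), {("event", "FrenchOpen")}),
    (datetime(2011, 7, 3), {("event", "JazzBaltica")}),
    (datetime(2011, 7, 3), {("event", "JazzBaltica")}),
]
ranking = rank_time_sensitive(hits, search_time)
```

Although (event, FrenchOpen) occurs more often overall, its tweets are two weeks old at search time, so the more recent (event, JazzBaltica) is ranked first.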

Page 25: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Research Questions

1. How well does faceted search that is supported by the semantic enrichment perform in comparison to keyword search?

2. What strategy performs best in ranking facet-value pairs that allow users to find relevant tweets on Twitter?

3. How do the different building blocks of the faceted search framework influence the performance?

Page 26: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Dataset

more than:

20,000 Twitter users

30,000,000 tweets

4 months

[Timeline: Nov 15, Dec 15, Jan 15, Feb 15; Egyptian revolution marked at Jan 25]

Page 27: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Evaluation Framework

• User Simulation Model [cf. Koren et al., WWW’08]:

• Input: search settings = { (user who searches, relevant target tweet) }

• Drill down the search result list until no more FVPs can be applied or fewer than 10 tweets match the query

• Simulating click behavior: the first matching FVP is selected (user knows the target resource)

• Ground truth: relevant target tweet = tweet that has been re-tweeted by the user

• Metrics:

• Success@k: probability that a relevant FVP appears in the top k (the higher the Success@k, the faster the search and the lower the user effort)

• MRR: mean reciprocal rank of the target tweet when the user selected it
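The two metrics can be computed as sketched below, assuming that for each simulated search setting we have recorded the rank of the relevant FVP and the rank of the target tweet (None when it was not found). Function names and the sample ranks are illustrative only.

```python
from typing import Optional, Sequence


def success_at_k(fvp_ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of search settings where the relevant FVP is ranked in the top k."""
    hits = sum(1 for rank in fvp_ranks if rank is not None and rank <= k)
    return hits / len(fvp_ranks)


def mean_reciprocal_rank(tweet_ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank of the target tweet; settings without a hit contribute 0."""
    total = sum(1.0 / rank for rank in tweet_ranks if rank is not None)
    return total / len(tweet_ranks)


# Hypothetical ranks from four simulated search settings (1 = top position).
fvp_ranks = [1, 3, 12, None]
tweet_ranks = [1, 2, 4, None]

s5 = success_at_k(fvp_ranks, 5)           # 2 of 4 settings are within the top 5
mrr = mean_reciprocal_rank(tweet_ranks)   # (1 + 1/2 + 1/4 + 0) / 4
```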

Page 28: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Faceted-search vs. hashtag-based (keyword) search

Faceted search based on semantic enrichment of tweets outperforms hashtag-based search significantly.

Page 29: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Results: Overview

Personalized strategy achieves ~12% better performance than other semantic strategies (and 2x better than hashtag-based)

Page 30: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Impact of link-based enrichment

Personalized strategy outperforms baseline significantly

Link-based enrichment improves quality for both strategies

Page 31: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Impact of time-sensitivity

Time-sensitivity-based ranking improves quality for both frequency and diversification strategies

Page 32: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Application of the Faceted Search Framework

Page 33: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Twitcident.com

Twitter-based crisis management system

Semantic enrichment allows for:
1. Grouping tweets into incidents
2. Faceted search
3. Thematic Views
4. Analysis

Page 34: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Conclusions

What we did:
• Adaptive Faceted Search on Twitter + Evaluation Framework
• Analysis and Evaluation (+ Application in Twitcident)

Findings:
1. Semantic Enrichment allows for structured representation of the content of tweets, which is the basis for faceted search
2. Faceted search performs significantly better than hashtag-based keyword search
3. Different building blocks for making faceted search on Twitter adaptive improve the search quality:
a) Link-based enrichment: more discoverable tweets, better search performance
b) Personalization leads to significant improvements
c) Time-sensitivity improves performance as well

Page 35: Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

Thank you!

Twitter: @fabianabel

http://wis.ewi.tudelft.nl/iswc2011/