ecir2017-inferring user interests for passive users on twitter by leveraging followee biographies

Guangyuan Piao, John G. Breslin

Unit for Social Semantics

39th European Conference on Information Retrieval, Aberdeen, Scotland, 9-13 April 2017

Inferring User Interests for Passive Users on Twitter by Leveraging Followee Biographies


1/3 of users seek medical information and over 50% of users consume news on social networks

Facebook and Twitter together generate more than 5 billion microblogs / day

[SOURCE] Semantic Filtering for Social Data, Amit et al., Internet Computing’16

According to research by Twopcharts, 44% of Twitter users have never sent a tweet

[SOURCE] http://guardianlv.com/2014/04/twitter-users-are-not-tweeting/

How can we infer user interests for passive users based on the info of their followees?

Related Work

! user modeling for active users
•  analyzing users’ tweets
•  representing user interests using different approaches
    •  bag-of-words
    •  topic modeling
    •  bag-of-concepts

[Figure: example bag-of-concepts profile, interest (frequency): dbpedia:Eagles_of_Death_Metal (5), dbpedia:The_Wombats (2)]

Related Work

! user modeling for passive users
•  analyzing information of users’ followees
•  HIW(followees_tweet) [Chen et al. SIGCHI’10]
    •  a great amount of data
    •  but also noisy
•  SA(followees_name) [Besel et al. SAC’16, Faralli et al. SNAM’16]
    •  link names to entities
    •  construct category-based user profiles: spreading activation + WiBi taxonomy (Wikipedia categories)


[Figure: dbpedia:Cristiano_Ronaldo (5) → Real_Madrid_C.F._players, 2014_FIFA_World_Cup_players → Category A, Category B, …]

Exploiting Background Knowledge

! Wikipedia category system
•  Wikipedia Bitaxonomy (Flati et al. ACL’14)
•  Hierarchical Interest Graph (Kapanipathi et al. ESWC’14)


Exploiting Background Knowledge

! DBpedia
•  beyond category information
•  using related entities with various properties


Aim of Work

! user modeling for passive users
•  limitation of using followees’ names
    •  link names to entities (only popular followees can be linked)
    •  12.7% in [Faralli et al. SNAM’16]
•  we aim to
    •  investigate whether we can leverage the biographies (bios) of followees for inferring user interest profiles
    •  evaluate our approach against two state-of-the-art user modeling strategies


Our Approach

! user modeling leveraging biographies of followees
•  1. fetch user’s followees (Twitter API)
•  2. extract entities from bios of followees (Aylien API)
•  3. interest propagation (WiBi taxonomy, DBpedia graph)
•  Twitter user @bob → interest profile

[Figure: example bio of Bob Horry (@bob), "Android developer, educator" → dbpedia:Android (5) → Smartphones, Mobile_operating_systems, … → Category A, …]
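The entity-counting part of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_LINKER` dictionary is a hypothetical stand-in for the external entity extraction service, and the bios other than "Android developer, educator" are invented for illustration.

```python
from collections import Counter

# Hypothetical stand-in for an entity extraction service: maps a word found
# in a bio to a DBpedia entity. A real system would call an external API.
TOY_LINKER = {
    "android": "dbpedia:Android",
    "football": "dbpedia:Association_football",
}


def extract_entities(bio):
    """Return the DBpedia entities mentioned in one bio (toy keyword match)."""
    tokens = bio.lower().replace(",", " ").split()
    return [TOY_LINKER[t] for t in tokens if t in TOY_LINKER]


def build_profile(followee_bios):
    """Count how often each entity appears across all followee bios."""
    profile = Counter()
    for bio in followee_bios:
        profile.update(extract_entities(bio))
    return profile


# Illustrative bios; only the first one appears on the slide.
bios = ["Android developer, educator", "Android fan", "Football writer"]
profile = build_profile(bios)
```

The resulting counter plays the role of the entity-frequency profile, for example dbpedia:Android (5) on the slide, before any interest propagation is applied.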


Interest Propagation

! spreading activation (Kapanipathi et al. ESWC’14)

[Figure: activation spreads from an entity to its Category nodes, discounted by d_subnodes]
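A rough sketch of spreading activation over a category hierarchy follows. The slide only names the factor d_subnodes, so the decay used here, 1 / ln(number of sub-nodes), is an assumption, as are the function names, the sub-node counts, and the `Mobile_phones` parent category.

```python
import math
from collections import defaultdict


def decay(n_subnodes):
    # Assumed form of d_subnodes: broader categories (more sub-nodes)
    # receive less activation. Clamped so the factor never exceeds 1.
    return 1.0 / max(1.0, math.log(n_subnodes)) if n_subnodes > 0 else 1.0


def spread(entity_weights, parents, subnodes, max_depth=5):
    """Push entity weights upward through the category graph.

    parents:  node -> list of parent categories
    subnodes: category -> number of sub-nodes (hypothetical counts)
    Activation arriving over several paths is summed.
    """
    activation = defaultdict(float)
    for entity, weight in entity_weights.items():
        frontier = [(entity, float(weight))]
        for _ in range(max_depth):  # depth guard against long chains
            next_frontier = []
            for node, value in frontier:
                for cat in parents.get(node, []):
                    passed = value * decay(subnodes.get(cat, 1))
                    activation[cat] += passed
                    next_frontier.append((cat, passed))
            frontier = next_frontier
    return dict(activation)


parents = {
    "dbpedia:Android": ["Smartphones", "Mobile_operating_systems"],
    "Smartphones": ["Mobile_phones"],  # hypothetical parent category
}
subnodes = {"Smartphones": 100, "Mobile_operating_systems": 20, "Mobile_phones": 1000}
result = spread({"dbpedia:Android": 5}, parents, subnodes)
```

With these made-up counts, the narrower category Mobile_operating_systems ends up with more weight than Smartphones, and the weight shrinks again one level further up.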


Interest Propagation

!  interest propagation using DBpedia (SEMANTiCS’16)
•  SP: # of subpages
•  SC: # of subcategories
•  P: # of properties appearing in the whole DBpedia graph
•  intuition 1: discount common categories
•  intuition 2: discount related entities connected with common properties
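The two intuitions can be sketched as discount functions. The slide gives only the quantities SP, SC and P, so the inverse-log forms below are assumptions, and the helper names, the example counts, and the related entity `dbpedia:Google` are hypothetical.

```python
import math


def _inv_log(count):
    # Assumed shape of the discount: 1 / ln(count), clamped to at most 1.
    return 1.0 / max(1.0, math.log(count)) if count > 0 else 1.0


def category_discount(sp, sc):
    """Intuition 1: categories with many subpages (SP) and subcategories (SC)
    are common, so they receive a smaller share of the weight."""
    return _inv_log(sp) * _inv_log(sc)


def property_discount(p):
    """Intuition 2: entities reached through a property that is frequent in
    the whole graph (P) receive a smaller share of the weight."""
    return _inv_log(p)


def propagate(entity, weight, categories, related):
    """categories: list of (category, SP, SC); related: list of (entity, P)."""
    out = {entity: float(weight)}
    for cat, sp, sc in categories:
        out[cat] = out.get(cat, 0.0) + weight * category_discount(sp, sc)
    for ent, p in related:
        out[ent] = out.get(ent, 0.0) + weight * property_discount(p)
    return out


# Hypothetical counts for illustration only.
expanded = propagate(
    "dbpedia:Android", 5,
    categories=[("Smartphones", 50, 5)],
    related=[("dbpedia:Google", 100000)],
)
```

The original entity keeps its full weight, while categories and related entities receive discounted shares.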


Experiment Setup

! main goal
•  analyze & compare different user modeling strategies in the context of link (URL) recommendations

! recommendation algorithm
•  cosine similarity between a user and a link (URL)

! ground truth
•  links shared by users in their last two weeks

! candidate set (1,377 distinct links)
•  all links shared by users in their last two weeks
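The recommendation step can be sketched as ranking candidate links by cosine similarity to the user profile. This assumes that both users and links are represented as sparse weighted vectors over the same entities and categories; the slide does not say how link profiles are built, and the URLs and weights here are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user_profile, link_profiles, n=10):
    """Return the top-n candidate URLs, most similar first."""
    ranked = sorted(
        link_profiles.items(),
        key=lambda item: cosine(user_profile, item[1]),
        reverse=True,
    )
    return [url for url, _ in ranked[:n]]


# Hypothetical profiles for illustration only.
user = {"dbpedia:Android": 5, "Smartphones": 1}
links = {
    "http://example.org/a": {"dbpedia:Android": 1},
    "http://example.org/b": {"dbpedia:The_Wombats": 1},
}
top = recommend(user, links)
```

A link that shares no concept with the user scores 0 and falls to the bottom of the list.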


Experiment Setup

! Twitter dataset
•  461 random users
•  902,544 followees
•  90% of them have filled in their biographies

! dataset for experiment
•  50 users
•  84,646 followees, 77,825 distinct ones
•  7,785 (10%) out of 77,825 followees can be linked to entities
•  72,145 (92.7%) of followees have bios


Experiment Setup

! evaluation metrics
•  MRR (Mean Reciprocal Rank)
    •  how early the 1st relevant item occurs, on average, in the recommendations
•  S@N (Success rate)
    •  mean probability that a relevant item occurs in the top-N list
•  P@N (Precision)
    •  mean probability that retrieved items in the top-N list are relevant
•  R@N (Recall)
    •  mean probability that relevant items are retrieved in the top-N list
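For a single user, these metrics can be computed as in the sketch below, using the standard textbook definitions; the reported scores would be the mean over all users. Function names and the toy ranking are illustrative.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def success_at(ranked, relevant, n):
    """1 if at least one relevant item is in the top n, else 0."""
    return 1.0 if any(item in relevant for item in ranked[:n]) else 0.0


def precision_at(ranked, relevant, n):
    """Fraction of the top n items that are relevant (n must be positive)."""
    return sum(item in relevant for item in ranked[:n]) / n


def recall_at(ranked, relevant, n):
    """Fraction of all relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in ranked[:n]) / len(relevant)


def mean(values):
    """Average a per-user metric over all users."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy example: one user's ranked recommendations and ground-truth links.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d", "z"}
rr = reciprocal_rank(ranked, relevant)
s2 = success_at(ranked, relevant, 2)
p2 = precision_at(ranked, relevant, 2)
r2 = recall_at(ranked, relevant, 2)
```

In the toy example the first relevant item sits at rank 2, so the reciprocal rank is 0.5.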


Observation

! # of entities extracted from names & bios of followees
•  more than twice as many entities are extracted from the bios of followees
•  on average, 509 entities (bios) vs. 210 entities (names)

[Figure: bar chart of the average number of entities (y-axis) per data source for extracting entities (x-axis): followees_bio vs. followees_name]


Results

[Figure: recommendation performance (y-axis) by user modeling strategies (x-axis), panels MRR and S@10]

strategy                MRR      S@10
SA(followees_name)      0.3402   0.6250
HIW(followees_tweet)    0.4665   0.6250
SA(followees_bio)       0.5532   0.8125
IP(followees_bio)       0.5616   0.7708


Results

[Figure: recommendation performance (y-axis) by user modeling strategies (x-axis), panels P@10 and R@10]

strategy                P@10     R@10
SA(followees_name)      0.1625   0.0726
HIW(followees_tweet)    0.2521   0.1186
SA(followees_bio)       0.2896   0.1334
IP(followees_bio)       0.3354   0.1555

Conclusions

•  leveraging biographies of followees can provide:
    •  richer user profiles (a greater number of entities)
    •  higher-quality user profiles in terms of recommendation performance
•  leveraging DBpedia for interest propagation provides better performance compared to using categories only


Thank you for your attention!

Guangyuan Piao
homepage: http://parklize.github.io
twitter: https://twitter.com/parklize
slideshare: http://www.slideshare.net/parklize
