ecir2017-inferring user interests for passive users on twitter by leveraging followee biographies
Guangyuan Piao, John G. Breslin
Unit for Social Semantics
39th European Conference on Information Retrieval, Aberdeen, Scotland, 9-13 April 2017
Inferring User Interests for Passive Users on Twitter by Leveraging Followee Biographies
One in three users seek medical information, and over 50% of users consume news, on social networks
Facebook and Twitter together generate more than 5 billion microblogs per day
[SOURCE] Semantic Filtering for Social Data, Sheth and Kapanipathi, IEEE Internet Computing’16
According to research by Twopcharts, 44% of Twitter users have never sent a tweet
[SOURCE] http://guardianlv.com/2014/04/twitter-users-are-not-tweeting/
How can we infer user interests for passive users based on the info of their followees?
Related Work
! user modeling for active users
• analyzing users’ tweets
• representing user interests using different approaches
  • bag-of-words
  • topic modeling
  • bag-of-concepts
[Figure: example bag-of-concepts profile with interest frequencies: dbpedia:Eagles_of_Death_Metal (5), dbpedia:The_Wombats (2)]
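A bag-of-concepts profile like the example above (dbpedia:Eagles_of_Death_Metal with frequency 5, dbpedia:The_Wombats with frequency 2) can be sketched as a plain entity counter. This is a minimal illustration rather than the authors' implementation; the helper name `bag_of_concepts` and the toy mention list are assumptions.

```python
from collections import Counter

def bag_of_concepts(entity_mentions):
    """Count how often each entity URI is mentioned (hypothetical helper)."""
    return Counter(entity_mentions)

# Toy input mirroring the frequencies shown on the slide.
mentions = ["dbpedia:Eagles_of_Death_Metal"] * 5 + ["dbpedia:The_Wombats"] * 2
profile = bag_of_concepts(mentions)
```

The resulting mapping from entity to frequency is the weighted vector that later slides compare against links.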
Related Work
! user modeling for passive users
• analyzing information of users’ followees
• HIW(followees_tweet) [Chen et al. SIGCHI’10]
  • a great amount of data
  • but also noisy
• SA(followees_name) [Besel et al. SAC’16, Faralli et al. SNAM’16]
  • link names to entities
  • construct category-based user profiles: spreading activation + WiBi taxonomy (Wikipedia categories)
[Figure: dbpedia:Cristiano_Ronaldo (5) linked to the categories Real_Madrid_C.F._players and 2014_FIFA_World_Cup_players, and onward to Category A, Category B, …]
Exploiting Background Knowledge
! Wikipedia category system
• Wikipedia Bitaxonomy (Flati et al. ACL’14)
• Hierarchical Interest Graph (Kapanipathi et al. ESWC’14)
Exploiting Background Knowledge
! DBpedia
• beyond category information
• using related entities with various properties
Aim of Work
! user modeling for passive users
• limitation of using followees’ names
  • link names to entities (only popular followees can be linked)
  • only 12.7% of followees linked in [Faralli et al. SNAM’16]
• we aim to
  • investigate whether we can leverage the biographies (bios) of followees to infer user interest profiles
  • evaluate our approach against two state-of-the-art user modeling strategies
[Example Twitter profile: Bob Horry (@bob), bio: Android developer, educator]
Our Approach
[Figure: pipeline from Twitter user @bob to an interest profile]
1. fetch user’s followees (Twitter API)
2. extract entities from bios of followees (Aylien API)
3. interest propagation (WiBi taxonomy, DBpedia graph)
! user modeling leveraging biographies of followees
[Figure: example. Profile: Bob Horry (@bob), bio: Android developer, educator. Entity: dbpedia:Android (5). Linked categories: Smartphones, …, Mobile_operating_systems, Category A, …]
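The three steps of the approach (fetch followees, extract entities from their bios, propagate interests) can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the authors' code: the four callables are hypothetical stand-ins for the Twitter API, an entity-extraction service, and a propagation strategy, and the followee handles and bios are toy data.

```python
from collections import Counter

def build_interest_profile(user, fetch_followees, fetch_bio,
                           extract_entities, propagate):
    """Followees -> entities from bios -> propagation (sketch).

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    entity_counts = Counter()
    for followee in fetch_followees(user):               # step 1
        bio = fetch_bio(followee)
        if bio:                                          # empty bios contribute nothing
            entity_counts.update(extract_entities(bio))  # step 2
    return propagate(entity_counts)                      # step 3

# Toy stand-ins (assumptions, not real API calls).
FOLLOWEES = {"@bob": ["@followee_1", "@followee_2", "@followee_3"]}
BIOS = {
    "@followee_1": "Android developer, educator",
    "@followee_2": "",
    "@followee_3": "Android fan",
}

def toy_extract(bio):
    """Pretend entity linker: recognises only the word 'Android'."""
    return ["dbpedia:Android"] if "Android" in bio else []

profile = build_interest_profile(
    "@bob",
    fetch_followees=lambda u: FOLLOWEES.get(u, []),
    fetch_bio=lambda f: BIOS.get(f, ""),
    extract_entities=toy_extract,
    propagate=dict,  # no propagation in this toy run, just copy the counts
)
```

Passing the steps in as callables keeps the sketch independent of any particular API client; a real propagation function would replace `dict` in step 3.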
Interest Propagation
! spreading activation (Kapanipathi et al. ESWC’14)
[Figure: spreading activation from an entity to its categories; labels: Category, d_subnodes, entity]
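As a rough illustration of spreading activation over a category hierarchy, the sketch below pushes an entity's weight upward to its categories, attenuating it at each hop. The slide's exact activation function is not recoverable from the transcript, so the constant `decay`, the hop limit, and the toy taxonomy are all assumptions; the slide's d_subnodes factor is not modeled here.

```python
from collections import defaultdict

def spread_activation(entity_weights, parents, decay=0.5, max_hops=2):
    """Propagate weights from entities up to their categories.

    entity_weights: {node: initial activation}
    parents:        {node: [parent categories]} (toy taxonomy, assumed)
    decay:          share of activation passed on per hop (assumed value)
    """
    profile = defaultdict(float)
    frontier = dict(entity_weights)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, activation in frontier.items():
            for parent in parents.get(node, []):
                next_frontier[parent] += decay * activation
        # Accumulate what each category received in this hop.
        for node, activation in next_frontier.items():
            profile[node] += activation
        frontier = next_frontier
    return dict(profile)

# Toy taxonomy reusing the category names from the slide.
PARENTS = {
    "dbpedia:Android": ["Smartphones", "Mobile_operating_systems"],
    "Smartphones": ["Category A"],
    "Mobile_operating_systems": ["Category A"],
}
category_profile = spread_activation({"dbpedia:Android": 5}, PARENTS)
```

Only categories end up in the returned profile, which matches the category-based profiles described for the SA strategies.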
Interest Propagation
! interest propagation using DBpedia (SEMANTiCS’16)
• SP: # of subpages
• SC: # of subcategories
• P: # of times a property appears in the whole DBpedia graph
• intuition 1: discount common categories
• intuition 2: discount related entities connected with common properties
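The two intuitions can be illustrated with a simple discounting scheme. The exact formulas are not present in this transcript, so the logarithmic form `1 / (1 + ln n)` below is an assumption chosen only because it shrinks as SP, SC, or P grow; the function names and toy counts are likewise hypothetical.

```python
import math

def log_discount(count):
    """Illustrative discount in (0, 1]: 1.0 for count <= 1, smaller as count grows."""
    return 1.0 / (1.0 + math.log(max(count, 1)))

def category_discount(sp, sc):
    """Intuition 1: categories with many subpages/subcategories are discounted more."""
    return log_discount(sp) * log_discount(sc)

def property_discount(p):
    """Intuition 2: entities reached via very common properties are discounted more."""
    return log_discount(p)

def propagate(entity_weight, categories, related_entities):
    """categories: {name: (SP, SC)}; related_entities: {name: P of the linking property}."""
    profile = {}
    for name, (sp, sc) in categories.items():
        profile[name] = entity_weight * category_discount(sp, sc)
    for name, p in related_entities.items():
        profile[name] = entity_weight * property_discount(p)
    return profile

# Toy counts (made up for illustration only).
toy_profile = propagate(
    5,
    categories={"narrow_category": (10, 2), "broad_category": (100000, 50)},
    related_entities={"entity_via_rare_property": 3,
                      "entity_via_common_property": 1000000},
)
```

With these toy counts the narrow category and the entity reached through the rare property keep more of the original weight than their common counterparts.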
Experiment Setup
! main goal
• analyze & compare different user modeling strategies in the context of link (URL) recommendations
! recommendation algorithm
• cosine similarity between a user and a link (URL)
! ground truth
• links shared by users in their last two weeks
! candidate set (1,377 distinct links)
• all links shared by users in their last two weeks
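The recommendation step ranks candidate links by cosine similarity to the user profile. A minimal sketch follows, assuming both profiles are sparse dictionaries mapping an entity or category to a weight; the function names and the toy profiles are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_u * norm_v)

def recommend(user_profile, link_profiles, n=10):
    """Return the n candidate links most similar to the user profile."""
    ranked = sorted(
        link_profiles,
        key=lambda url: cosine(user_profile, link_profiles[url]),
        reverse=True,
    )
    return ranked[:n]

# Toy profiles (hypothetical links).
user = {"dbpedia:Android": 5, "Smartphones": 2}
links = {
    "link_b": {"dbpedia:The_Wombats": 1},
    "link_a": {"dbpedia:Android": 1},
}
top = recommend(user, links, n=2)
```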
Experiment Setup
! Twitter dataset
• 461 random users
• 902,544 followees
  • 90% of them filled in their biographies
! dataset for experiment
• 50 users
• 84,646 followees, 77,825 distinct ones
  • 7,785 (10%) of the 77,825 distinct followees can be linked to entities
  • 72,145 (92.7%) of the distinct followees have bios
Experiment Setup
! evaluation metrics
• MRR (Mean Reciprocal Rank)
  • at which rank the first relevant item occurs, on average, in the recommendations
• S@N (Success rate)
  • mean probability that a relevant item occurs in the top-N list
• P@N (Precision)
  • mean probability that retrieved items in the top-N are relevant
• R@N (Recall)
  • mean probability that relevant items are retrieved in the top-N
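The four metrics can be computed per user and then averaged. The sketch below uses the standard textbook definitions, which may differ in detail from the authors' evaluation code; the function names and the `runs` input format are assumptions.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is recommended."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def success_at_n(ranked, relevant, n):
    """1.0 if at least one relevant item appears in the top n, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:n]) else 0.0

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n slots filled with relevant items."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    """Fraction of all relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)

def evaluate(runs, n=10):
    """runs: list of (ranked_list, relevant_set) pairs, one per user."""
    if not runs:
        raise ValueError("need at least one user to evaluate")
    count = len(runs)
    return {
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / count,
        "S@N": sum(success_at_n(r, rel, n) for r, rel in runs) / count,
        "P@N": sum(precision_at_n(r, rel, n) for r, rel in runs) / count,
        "R@N": sum(recall_at_n(r, rel, n) for r, rel in runs) / count,
    }
```

Each user contributes one ranked list and one ground-truth set, so every reported score is a mean over users.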
Observation
! # of entities extracted from names & bios of followees
• using bios of followees yields more than twice as many entities as using their names
• on average, 509 entities (bios) vs. 210 entities (names)
[Figure: bar chart of the average number of entities (y-axis, 0 to 600) by data source for extracting entities (x-axis): followees_bio vs. followees_name]
Results
[Figure: bar charts of recommendation performance (y-axis) by user modeling strategy (x-axis); panels: MRR, S@10. Bar values in legend order:]
strategy               MRR      S@10
SA(followees_name)     0.3402   0.6250
HIW(followees_tweet)   0.4665   0.6250
SA(followees_bio)      0.5532   0.8125
IP(followees_bio)      0.5616   0.7708
Results
[Figure: bar charts of recommendation performance (y-axis) by user modeling strategy (x-axis); panels: P@10, R@10. Bar values in legend order:]
strategy               P@10     R@10
SA(followees_name)     0.1625   0.0726
HIW(followees_tweet)   0.2521   0.1186
SA(followees_bio)      0.2896   0.1334
IP(followees_bio)      0.3354   0.1555
Conclusions
• leveraging biographies of followees can provide:
  • larger user profiles (a greater number of entities)
  • better user profiles in terms of recommendation performance
• leveraging DBpedia for interest propagation performs better than using categories only
Thank you for your attention!
Guangyuan Piao
homepage: http://parklize.github.io
e-mail: [email protected]
twitter: https://twitter.com/parklize
slideshare: http://www.slideshare.net/parklize