ProfileRank: Finding Relevant Content and Influential Users based on Information Diffusion
@SNAKDD’13, Chicago, IL

Arlei Silva¹, Sara Guimarães², Wagner Meira Jr.², Mohammed Zaki³
¹Computer Science Department – University of California, Santa Barbara, CA
²Computer Science Department – Universidade Federal de Minas Gerais, Brazil
³Computer Science Department – Rensselaer Polytechnic Institute, NY


Page 1: ProfileRank: Finding Relevant Content and Influential Users based … › ~arlei › talks › snakdd13.pdf · 2014-07-03



Social Media in Numbers

Twitter: 500M users, 340M tweets/day

Tumblr: 100M users, 75M posts/day

Facebook: 1.15B users, 1B pieces of content shared/day

Instagram: 30M users, 5M photos shared/day


Influence and Relevance in Social Media: Questions

Who are the influentials?

- Influence: the ability to popularize information
- Personalized influence

What is relevant?

- Relevance: the capacity to satisfy a user’s information needs
- Personalized relevance

Why are these questions important?

- Information diffusion mechanisms
- Recommender systems
- Viral marketing


Information Diffusion Data

Content creation and propagation are represented as tuples:

- <user, content, time>

(a) Twitter [figure]: @user_0 posts A and B; @user_1 retweets A (RT@user_0 A) and posts C; @user_2 retweets B (RT@user_0 B); @user_3 retweets C (RT@user_1 C)

(b) Diffusion data: <user_0, A, t0>, <user_0, B, t1>, <user_1, A, t2>, <user_1, C, t3>, <user_2, B, t4>, <user_3, C, t5>

How can we measure influence and relevance?
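A minimal Python sketch of the tuple representation, using the six example tuples. User names are written as user_0 to user_3 to match the Twitter handles, and times are written as integers. The helper `creators_and_propagators` is hypothetical. It assumes that the earliest tuple for a piece of content marks its creator and that every later tuple marks a propagation, such as a retweet. The slides do not spell out this rule.

```python
from collections import defaultdict

# The six <user, content, time> tuples from the example (times as integers).
tuples = [
    ("user_0", "A", 0), ("user_0", "B", 1), ("user_1", "A", 2),
    ("user_1", "C", 3), ("user_2", "B", 4), ("user_3", "C", 5),
]

def creators_and_propagators(tuples):
    """Hypothetical helper: the earliest tuple for each content item is taken
    as its creation; every later tuple is taken as a propagation."""
    creator = {}
    propagators = defaultdict(list)
    for user, content, _time in sorted(tuples, key=lambda t: t[2]):
        if content not in creator:
            creator[content] = user
        else:
            propagators[content].append(user)
    return creator, dict(propagators)

creator, propagators = creators_and_propagators(tuples)
print(creator)      # {'A': 'user_0', 'B': 'user_0', 'C': 'user_1'}
print(propagators)  # {'A': ['user_1'], 'B': ['user_2'], 'C': ['user_3']}
```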


ProfileRank

Random walks over a content-user graph

Relevant content is created and propagated by influential users, and influential users create relevant content

Relies on content propagation instead of a social network:

- In some scenarios, no social network is available
- # of followers ≠ capacity to propagate content [Cha et al.’10]

(a) Diffusion data: <user_0, A, t0>, <user_0, B, t1>, <user_1, A, t2>, <user_1, C, t3>, <user_2, B, t4>, <user_3, C, t5>
(b) Diffusion model [figure]


ProfileRank: Formulation

Information diffusion data → information diffusion graph

- G(U, C, F, E)

G can be represented as two matrices:

1. M: User-content matrix

2. L: Content-user matrix
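The slides do not give the entries of M and L. The sketch below shows one plausible construction, assumed here for illustration only. M[u][c] spreads each user's weight uniformly over the content that user created or propagated. L[c][u] spreads each content item's weight uniformly over the users who created or propagated it. Under this weighting, both matrices are row-stochastic. `build_matrices` is a hypothetical helper, and the actual ProfileRank weighting may differ, for example by treating creation and propagation differently.

```python
# The example tuples, with times written as integers.
tuples = [
    ("user_0", "A", 0), ("user_0", "B", 1), ("user_1", "A", 2),
    ("user_1", "C", 3), ("user_2", "B", 4), ("user_3", "C", 5),
]

def build_matrices(tuples):
    """Return (users, contents, M, L) as dense lists of lists.

    Assumed weighting: M[u][c] = 1 / (#items u touched) and
    L[c][u] = 1 / (#users who touched c), so every row sums to 1."""
    users = sorted({u for u, _, _ in tuples})
    contents = sorted({c for _, c, _ in tuples})
    u_idx = {u: k for k, u in enumerate(users)}
    c_idx = {c: k for k, c in enumerate(contents)}
    M = [[0.0] * len(contents) for _ in users]   # users x content
    L = [[0.0] * len(users) for _ in contents]   # content x users
    for u, c in {(u, c) for u, c, _ in tuples}:  # the set drops repeated pairs
        M[u_idx[u]][c_idx[c]] = 1.0
        L[c_idx[c]][u_idx[u]] = 1.0
    for row in M + L:  # M + L is a new list holding the same row objects
        total = sum(row)
        for k in range(len(row)):
            row[k] /= total
    return users, contents, M, L

users, contents, M, L = build_matrices(tuples)
```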

Relevance r and influence i are computed as:

$r = iM$, $i = rL$

$r^{(k)} = r^{(k-1)} LM$, $i^{(k)} = i^{(k-1)} ML$
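The iteration above has no damping term. The following sketch shows a damped power iteration that converges to the closed-form expression which follows. It assumes a damping factor d = 0.85 and uniform restart vectors. It also uses hypothetical toy matrices M and L, built from the example tuples under an assumed uniform weighting: each user spreads weight evenly over its content, and each content item spreads weight evenly over its users. `profilerank_iterate` is an illustrative name, not the authors' code.

```python
def vecmat(v, A):
    """Row vector times matrix: (vA)_j = sum_k v_k A_kj."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]

def profilerank_iterate(M, L, d=0.85, tol=1e-12, max_iter=1000):
    """Damped power iteration with uniform restart vectors:
    r <- (1 - d) u_c + d r L M   (relevance, one entry per content item)
    i <- (1 - d) u_u + d i M L   (influence, one entry per user)"""
    n_users, n_content = len(M), len(L)
    u_c = [1.0 / n_content] * n_content
    u_u = [1.0 / n_users] * n_users
    r, i = list(u_c), list(u_u)
    for _ in range(max_iter):
        r_new = [(1 - d) * a + d * b for a, b in zip(u_c, vecmat(vecmat(r, L), M))]
        i_new = [(1 - d) * a + d * b for a, b in zip(u_u, vecmat(vecmat(i, M), L))]
        delta = max(abs(x - y) for x, y in zip(r + i, r_new + i_new))
        r, i = r_new, i_new
        if delta < tol:
            break
    return r, i

# Hypothetical toy matrices for users user_0..user_3 and content A, B, C.
M = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # users x content
L = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]  # content x users
r, i = profilerank_iterate(M, L)
```

Each step preserves the total mass of r and i, because LM and ML are row-stochastic under this weighting and each restart vector sums to 1.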

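$r = (1-d)\,u\,(I - dLM)^{-1}$, $i = (1-d)\,u\,(I - dML)^{-1}$, where $d$ is the damping factor and $u$ is the teleportation vector (over content for $r$, over users for $i$)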

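These equations always have a unique solution. For instance, if $L$ and $M$ are row-stochastic and $0 < d < 1$, the spectral radius of $dLM$ (and of $dML$) is at most $d < 1$, so $I - dLM$ and $I - dML$ are invertible.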
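To illustrate the closed form, the sketch below computes x = (1 − d) u (I − dP)^{-1} by solving the linear system x (I − dP) = (1 − d) u with Gauss-Jordan elimination. Here P is LM for relevance and ML for influence. It uses hypothetical toy matrices built from the example tuples under an assumed uniform weighting, with d = 0.85 and a uniform u. `closed_form` and `solve_row_system` are illustrative names. Dense elimination is only practical at toy sizes, not for graphs like those in the evaluation.

```python
def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def solve_row_system(A, b):
    """Solve x A = b for the row vector x (equivalently A^T x^T = b^T)
    with Gauss-Jordan elimination and partial pivoting."""
    n = len(A)
    aug = [[A[j][i] for j in range(n)] + [b[i]] for i in range(n)]  # rows of [A^T | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[k][n] / aug[k][k] for k in range(n)]

def closed_form(P, d, u):
    """x = (1 - d) u (I - d P)^{-1}, found by solving x (I - d P) = (1 - d) u."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - d * P[i][j] for j in range(n)] for i in range(n)]
    return solve_row_system(A, [(1 - d) * x for x in u])

# Hypothetical toy matrices for users user_0..user_3 and content A, B, C.
M = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # users x content
L = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]  # content x users

d = 0.85
u_content = [1 / 3] * 3
u_users = [1 / 4] * 4
r = closed_form(matmul(L, M), d, u_content)  # relevance of A, B, C
i = closed_form(matmul(M, L), d, u_users)    # influence of user_0..user_3
```

On this toy graph every content item is touched by two users, so the relevance vector is uniform. user_0 and user_1 each touch two items, so they receive more influence than user_2 and user_3.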


Related Work

Social influence and information diffusion [Gruhl et al.’04, Leskovec et al.’07, Tang et al.’09, Cha et al.’09, Cha et al.’10, Weng et al.’10, Goyal et al.’10, Romero et al.’11]

Content search and recommendation [Baluja et al.’08, Chen et al.’10, De Choudhury et al.’11, Kim and Shim’11]

Link prediction in social networks [Liben-Nowell and Kleinberg’03, Hannon et al.’10, Leroy et al.’10, Gomez Rodriguez et al.’10]

Relevance in hyperlinked environments [Kleinberg’98, Page et al.’99]


Evaluation

Problem: absence of ground-truth information on

- Influential users
- Relevant content

Solution: consider personalized assessments

- A user is influential to another user
- A piece of content is relevant to a given user

ProfileRank can be personalized to provide recommendations

Assumption: Recommendation accuracy → model quality
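The slides do not detail how ProfileRank is personalized. The sketch below uses a common approach from personalized PageRank, assumed here: replace the uniform restart vector u with one concentrated on the target user's own items. The matrix LM is a hypothetical toy example, derived from the example tuples under an assumed uniform weighting of M and L. `personalized_scores` is an illustrative name.

```python
def vecmat(v, A):
    """Row vector times matrix."""
    return [sum(v[k] * A[k][j] for k in range(len(v))) for j in range(len(A[0]))]

def personalized_scores(P, restart, d=0.85, iters=200):
    """Damped walk x <- (1 - d) * restart + d * x P with a non-uniform restart vector."""
    x = list(restart)
    for _ in range(iters):
        x = [(1 - d) * a + d * b for a, b in zip(restart, vecmat(x, P))]
    return x

# Hypothetical toy LM (content x content) for content A, B, C.
LM = [[0.50, 0.25, 0.25],
      [0.25, 0.75, 0.00],
      [0.25, 0.00, 0.75]]

# Target: user_2, who propagated only B, so the walk restarts at B.
r_user2 = personalized_scores(LM, [0.0, 1.0, 0.0])
# Rank content user_2 has not propagated by personalized relevance.
candidates = sorted(["A", "C"], key=lambda c: r_user2["ABC".index(c)], reverse=True)
```

Here A ranks above C. B was created by user_0, who also created A, so the walk reaches A directly from B but reaches C only through A.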


Evaluation: Datasets

Dataset       Content  #Users      #Pieces of content  #Propagations  Source
TW-CARS       tweet    529,630     369,287             1,368,080      Twitter
TW-SOCCER     tweet    837,559     3,485,313           958,144        Twitter
TW-ELECTIONS  tweet    3,860,251   4,067,221           15,844,788     Twitter
TW-LARGE      tweet    17,069,982  476,553,560         71,835,017     Twitter
MEME          meme     96,608,034  210,999,824         126,905,936    MemeTracker

Table: Information diffusion datasets.

Dataset    Edge               #Edges         Source
TW-SOCCER  follower-followee  269,217,548    Twitter
TW-LARGE   follower-followee  1,470,000,000  Twitter

Table: Network datasets.


Evaluation: Content Recommendation

Task: predicting which content users will propagate

- 50/50 training/test split

Content: tweets and memes

ProfileRank (global and personalized):

- Recommendations based on relevance scores

Baselines: collaborative filtering

- MyMediaLite library


Evaluation: Content Recommendation

[Figure: content recommendation results. (a) ROC: true positive rate vs. false positive rate; (b) Prec-recall: precision vs. recall; (c) Precision@n and (d) Recall@n for n = 5 to 20. Methods: PPR, WRMF, POPULAR, PR.]
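The panels above, and the result tables in this talk, use ROC/AUC, Precision@n, Recall@n and the break-even point (BEP). The slides do not define these metrics, so the sketch below assumes their standard definitions for a single user's ranked list. BEP is taken as precision at cutoff n = |relevant|, the point where precision equals recall. All function names are hypothetical, and the AUC helper assumes every candidate item appears in the ranked list.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommendations that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top n."""
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)

def break_even_point(ranked, relevant):
    """Precision at cutoff |relevant|, where precision and recall coincide."""
    return precision_at_n(ranked, relevant, len(relevant))

def auc(ranked, relevant):
    """Probability that a random relevant item is ranked above a random
    non-relevant one, counted pair by pair over the full ranking."""
    n_pos = sum(1 for item in ranked if item in relevant)
    n_neg = len(ranked) - n_pos
    pos_seen = correct_pairs = 0
    for item in ranked:
        if item in relevant:
            pos_seen += 1
        else:
            correct_pairs += pos_seen  # relevant items ranked above this one
    return correct_pairs / (n_pos * n_neg)

# Toy ranking of five candidate items, two of which the user actually propagated.
ranked = ["c1", "c2", "c3", "c4", "c5"]
relevant = {"c1", "c4"}
```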


Evaluation: User Recommendation

Task: predicting influence links

- Cold-start setting

Follower relationships in the Twitter data

ProfileRank (global and personalized):

- Recommendations based on influence scores

Baselines: cold-start link prediction [Leroy et al.’10]

- # content shared (CC)
- Adamic-Adar score (AA)
- # content shared + common neighbors (CC+CN)
- Adamic-Adar score + common neighbors (AA+CN)
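In a cold-start setting there is no follower graph, so the content-based baselines can only use the user-content data. The sketch below shows one natural adaptation, assumed here rather than taken from [Leroy et al.’10]. CC counts the content two users have both posted. Adamic-Adar sums 1/log(#users who posted c) over that shared content, so items shared by few users weigh more. The helper names are hypothetical, the toy data adds one extra tuple to the example, and the common-neighbor (CN) variants are not shown.

```python
import math
from collections import defaultdict

# Example tuples plus one extra hypothetical propagation (user_2 shares A).
tuples = [
    ("user_0", "A", 0), ("user_0", "B", 1), ("user_1", "A", 2),
    ("user_1", "C", 3), ("user_2", "B", 4), ("user_3", "C", 5),
    ("user_2", "A", 6),
]

def content_sets(tuples):
    """Index the bipartite user-content graph in both directions."""
    by_user, by_content = defaultdict(set), defaultdict(set)
    for user, content, _ in tuples:
        by_user[user].add(content)
        by_content[content].add(user)
    return by_user, by_content

def shared_content(x, y, by_user):
    """CC: number of content items both users posted."""
    return len(by_user[x] & by_user[y])

def adamic_adar(x, y, by_user, by_content):
    """AA: sum of 1 / log(#users who posted c) over content shared by x != y.
    Shared content has at least two users, so log(1) = 0 never occurs."""
    return sum(1.0 / math.log(len(by_content[c])) for c in by_user[x] & by_user[y])

by_user, by_content = content_sets(tuples)
```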


Evaluation: User Recommendation

[Figure: user recommendation results. (a) ROC: true positive rate vs. false positive rate; (b) Prec-recall: precision vs. recall; (c) Precision@n and (d) Recall@n for n = 5 to 20. Methods: PPR, AA, CC, AA+CN, CC+CN, PR.]


Evaluation: Top Influentials - US Elections

User             Description
BarackObama      US President and Democratic candidate
Obama2012        Obama’s campaign
UberFacts        Comedy facts
BorowitzReport   Comedy news
StephenAtHome    Comedian
truthteam2012    Obama’s campaign
Real Liam Payne  Pop singer
MittRomney       Republican candidate
thinkprogress    Political blog
realDonaldTrump  Businessman


Evaluation: Top Relevant Tweets - US Elections

- @BarackObama hi mr Obama have you got up all night yet?
  (Message from Liam Payne to Barack Obama)
- That was one of the strangest days ever will smith taylor swift justin bieber michelle obama wow what it going on with my life!!
  (Liam Payne about the 2012 Kids’ Choice Awards)
- Obama, congratulations on being the first sitting President to support marriage equality. Feels like the future, and not the past. #NoFear
  (Lady Gaga about Obama’s support for gay marriage)
- “Same-sex couples should be able to get married.” –President Obama
  (Obama about his support for gay marriage)
- Summertime with @NiallOfficial and @BarackObama! http://t.co/KNnWnfz7
  (Josh Devine about a picture including a statue of Obama)


Concluding Remarks

We proposed a simple model that accurately measures user influence and content relevance in information diffusion data.

The model is based on random walks over a content-user graph

Extensive evaluation:

- Quantitative: user and content recommendation
- Qualitative: intuitive results on real data

Future work:

- ProfileRank + filtering for search on Twitter
- Incorporating temporal dynamics for up-to-date assessments
- Incorporating textual and network information


ProfileRank: Finding Relevant Content and Influential Users based on Information Diffusion

More information:


http://www.cs.ucsb.edu/~arlei

http://code.google.com/p/profilerank/

This student received a travel award. Thanks!


Evaluation: Content Recommendation

[Figure: content recommendation results. (a) ROC: true positive rate vs. false positive rate; (b) Prec-recall: precision vs. recall; (c) Precision@n and (d) Recall@n for n = 5 to 20. Methods: PPR, WIKNN, POPULAR, PR.]


Evaluation: Content Recommendation

Method   AUC   BEP   P@5   P@20  R@5   R@20
PPR      0.81  0.28  0.12  0.08  0.12  0.22
PR       0.64  0.01  0.01  0.01  0.01  0.02
WRMF     0.61  0.11  0.04  0.03  0.05  0.08
WBPRMF   0.58  0.08  0.02  0.01  0.03  0.04
WIKNN    0.57  0.13  0.05  0.03  0.05  0.09
WUKNN    0.57  0.13  0.05  0.03  0.05  0.09
POPULAR  0.55  0.01  0.01  0.01  0.01  0.03

(a) TW-CARS

Method   AUC   BEP   P@5   P@20  R@5   R@20
PPR      0.89  0.46  0.27  0.11  0.46  0.58
WIKNN    0.75  0.43  0.22  0.09  0.35  0.44
WUKNN    0.75  0.38  0.21  0.09  0.35  0.44
WBRMF    0.71  0.09  0.04  0.02  0.07  0.13
WRMF     0.71  0.05  0.01  0.01  0.01  0.01
POPULAR  0.65  0.01  0.01  0.01  0.01  0.02
PR       0.62  0.01  0.01  0.01  0.01  0.02

(b) MEME


Evaluation: User Recommendation

Method   AUC   BEP   P@5   P@20  R@5   R@20
PPR      0.88  0.25  0.42  0.25  0.18  0.30
PR       0.84  0.07  0.06  0.05  0.02  0.06
AA+CN    0.78  0.06  0.18  0.10  0.07  0.12
CC+CN    0.70  0.02  0.13  0.06  0.05  0.08
AA       0.62  0.17  0.41  0.20  0.16  0.24
CC       0.61  0.10  0.28  0.13  0.12  0.17

Table: TW-SOCCER


Evaluation: Pairwise Ranking Correlations

              ProfileRank  PageRank  #propag.  #followers
ProfileRank   -            n/a       0.89      n/a
PageRank      0.28         -         n/a       n/a
#propag.      0.81         0.30      -         n/a
#followers    0.29         0.81      0.32      -

(a) User metrics

                  ProfileRank  #content propag.  PageRank  #user propag.  #followers
ProfileRank       -            0.36              n/a       0.42           n/a
#content propag.  0.22         -                 n/a       0.44           n/a
PageRank          0.26         -0.02             -         n/a            n/a
#user propag.     0.27         0.11              0.42      -              n/a
#followers        0.25         -0.01             0.83      0.45           -

(b) Content metrics
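The slide does not name the correlation measure behind these tables. As an illustration only, the sketch below computes Kendall's tau (tau-a) between two score vectors over the same users or content items. Spearman's rho would be an equally plausible choice. `kendall_tau` is a hypothetical helper. Its O(n²) pair loop is fine for illustration but not for millions of users.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same items:
    (concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(scores_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Example: two metrics that agree on every pair except one.
tau = kendall_tau([1, 2, 3], [1, 3, 2])
```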


Evaluation: Execution Time

Dataset       ProfileRank  PageRank
TW-CARS       3.85         10.04
TW-SOCCER     39.32        133.55
TW-ELECTIONS  5.28         9.20
TW-LARGE      17.74        59.33
MEME          1.23         3.86

Table: Running time (in seconds).