gdm@fudan

Which Topic will You Follow?
Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang and Sheng Huang

GDM@FUDAN Graph Data Management Lab, School of Computer Science, Fudan University, Shanghai, China
http://gdm.fudan.edu.cn

yangdeqing, [email protected]
ECML/PKDD 2012, Bristol, UK

Introduction

•Model
•Predicting whether an author will follow a given topic is a two-category classification problem
•A Multiple Logistic Regression (MLR) model fits this scenario; the probability of topic-following is formalized as:

where x_i is an explanatory variable, and α and β are the parameters to be estimated by training
•Baseline model

where a is the number of neighbors who have followed the topic
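As a minimal sketch of the MLR model above, assuming the standard logistic form π(x) = 1 / (1 + exp(-(α + Σ β_i x_i))): the function names `follow_probability` and `predict_follow` and the 0.5 decision threshold are illustrative assumptions, not taken from the poster.

```python
import math

def follow_probability(x, alpha, betas):
    """Logistic probability pi(x) for explanatory variables x (assumed standard form)."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    z = alpha + sum(b * xi for b, xi in zip(betas, x))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_follow(x, alpha, betas, threshold=0.5):
    """Two-category decision: True if the author is predicted to follow the topic."""
    return follow_probability(x, alpha, betas) >= threshold
```

With all parameters at zero the model is uninformative and returns 0.5 for every author; training moves α and β away from zero.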

•Explanatory Variables
•Social Influence

•An author u's tendency to follow topic s in year t is composed of the tendencies of all his neighbors v toward this topic, taking their coauthor strengths into account
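One possible reading of this variable is a strength-weighted average of the neighbors' tendencies. The sketch below assumes that reading; the function name `social_influence`, the two dictionary inputs, and the normalization by total coauthor strength are all hypothetical choices, not the poster's definition.

```python
def social_influence(u, topic, coauthor_counts, topic_tendency):
    """Hypothetical F_SI: strength-weighted average of neighbours' tendencies.

    coauthor_counts[u][v]: papers u and v coauthored before year t (coauthor strength).
    topic_tendency[v][topic]: v's tendency toward the topic before year t.
    """
    neighbours = coauthor_counts.get(u, {})
    total = sum(neighbours.values())
    if total == 0:
        return 0.0  # an author without coauthors receives no social influence
    weighted = sum(
        strength * topic_tendency.get(v, {}).get(topic, 0.0)
        for v, strength in neighbours.items()
    )
    return weighted / total
```

Under this assumption, a neighbor with three joint papers contributes three times as much as a neighbor with one.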

•Homophily
•We use topic similarity to capture homophily among users in the context of topic-following
•A 25-dimensional vector u represents an author's topic history; each dimension is the number of his papers on a given topic
•The topic similarity between users u and v can then be defined as

•With respect to the users who have followed topic s before t, we measure u's homophily as
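A minimal sketch of the two homophily quantities above, assuming cosine similarity between topic-history vectors and a plain average over the earlier followers of topic s. Both choices, and the names `topic_similarity` and `homophily`, are assumptions made for illustration.

```python
import math

def topic_similarity(u_vec, v_vec):
    """Cosine similarity between two topic-history vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u_vec, v_vec))
    norm_u = math.sqrt(sum(a * a for a in u_vec))
    norm_v = math.sqrt(sum(b * b for b in v_vec))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an author with no papers is similar to nobody
    return dot / (norm_u * norm_v)

def homophily(u_vec, follower_vecs):
    """Hypothetical F_TS: mean similarity of u to the earlier followers of topic s."""
    if not follower_vecs:
        return 0.0
    return sum(topic_similarity(u_vec, v) for v in follower_vecs) / len(follower_vecs)
```

In the poster's setting each vector would have 25 dimensions, one paper count per topic; the functions work for any common length.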

•Then, the whole MLR model is

•Y=π(x)=1, if u follows s or its related topics
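One way to write the whole model, assuming the two explanatory variables are the social-influence feature F_SI (coefficient β1) and the topic-similarity feature F_TS (coefficient β2) named under Parameter Estimation, and assuming the standard logistic form:

```latex
\pi(x) = \frac{\exp\left(\alpha + \beta_1 F_{SI} + \beta_2 F_{TS}\right)}
              {1 + \exp\left(\alpha + \beta_1 F_{SI} + \beta_2 F_{TS}\right)}
```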

Modeling Topic Diffusion in Scientific Collaboration Networks

•Motivations
•Who are the most appropriate candidates to receive a call for papers or a call for participation?
•What session topics should we propose for next year's conference?
•To address these questions, we study authors' topic-following behavior in a Scientific Collaboration Network (SCN), i.e., an author follows others in publishing papers on a given topic

•Basic Idea
•Scientific Collaboration Network

•It is represented as a graph in which vertices represent authors and edges represent coauthor relationships extracted from the DBLP dataset
•It is a temporal graph Gt, in which the vertices and edges grow as time t elapses
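The temporal graph Gt can be sketched as a snapshot built from all papers published up to year t. The `(year, authors)` record format is a hypothetical simplification of DBLP entries, and `build_scn` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations

def build_scn(papers, year):
    """Build G_t: the coauthor graph from all papers published up to `year`.

    papers: iterable of (publication_year, [author, ...]) records.
    Returns {author: {coauthor: number_of_joint_papers}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for pub_year, authors in papers:
        if pub_year > year:
            continue  # paper not yet published at time t
        unique = sorted(set(authors))
        for a in unique:
            graph[a]  # register the vertex, even for single-author papers
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {a: dict(nbrs) for a, nbrs in graph.items()}
```

Because later snapshots include every earlier paper, the vertex and edge sets of Gt can only grow with t. The joint-paper counts double as a simple coauthor strength.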

•An author's topic-following behavior is a process of topic diffusion in a social network, driven by two typical forces: social influence and homophily
•We look for variables that precisely capture social influence and homophily in our scenario and use them to predict an author's future topic-following behavior

•Challenges
•How to distinguish social influence from homophily?
•Topic definition and identification
•Sample sparseness

•Contributions
•Uncover the effects of social influence and homophily on topic diffusion
•Propose a Multiple Logistic Regression (MLR) model to predict authors' topic-following behavior
•Extensive experiments show that our model outperforms the baseline

Empirical Study
•Driving Forces of Topic-Following

•U1: users affected by both social influence and homophily
•U2: users affected only by social influence
•U3: users affected only by homophily
•U4: users affected by neither
•Results:

•The two forces jointly drive topic-following
•Their impacts are time-sensitive and decay exponentially

•Social Influence
•An author adopts a topic with higher probability when more of his neighbors have already followed it

•x is the number (or proportion) of affected neighbors
•p(x) is the probability that an author follows the topic
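The curve p(x) can be estimated empirically as the fraction of followers among the authors observed at each value of x. A minimal sketch, assuming the observations are available as hypothetical `(x, followed)` pairs:

```python
from collections import defaultdict

def empirical_follow_probability(observations):
    """Estimate p(x) from (x, followed) pairs.

    x: number (or proportion) of an author's neighbours who already follow the topic.
    followed: True if the author subsequently followed the topic.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for x, followed in observations:
        totals[x] += 1
        if followed:
            positives[x] += 1
    return {x: positives[x] / totals[x] for x in sorted(totals)}
```

When x is a proportion rather than a count, the values would typically be binned first so that each bucket holds enough authors.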

•An author is more likely to follow topics adopted by those neighbors (direct propagation) who have coauthored more papers with him

•Parameter Estimation
•By maximum likelihood on the training set

•β2 has a larger Wald value than β1, indicating that FTS (homophily) has a stronger impact on topic-following behavior than FSI (social influence)
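Maximum-likelihood estimation of α and β can be sketched as gradient ascent on the logistic log-likelihood. This is a minimal illustration under the standard logistic form, not the fitting procedure used for the poster; `fit_mlr`, the learning rate, and the epoch count are assumptions, and the sketch does not compute Wald statistics.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_mlr(X, y, lr=0.1, epochs=2000):
    """Estimate alpha and betas by gradient ascent on the mean log-likelihood.

    X: list of feature rows, e.g. [F_SI, F_TS]; y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    alpha, betas = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_a, grad_b = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Gradient of the log-likelihood is (label - predicted probability) * input.
            err = yi - sigmoid(alpha + sum(b * v for b, v in zip(betas, xi)))
            grad_a += err
            for j in range(d):
                grad_b[j] += err * xi[j]
        alpha += lr * grad_a / n
        betas = [b + lr * g / n for b, g in zip(betas, grad_b)]
    return alpha, betas
```

In practice a statistics package would be used instead, since it also reports the standard errors needed for Wald tests.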

•Model Evaluation
•Metrics

•Recall (sensitivity), specificity, precision, accuracy, AUC
•Fβ, with β=1.1 to favor recall slightly
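The count-based metrics above follow directly from a confusion matrix, using Fβ = (1 + β²)·P·R / (β²·P + R). A minimal sketch; the function name and the dictionary return format are illustrative, and AUC is omitted because it needs ranked scores rather than counts.

```python
def evaluation_metrics(tp, fp, tn, fn, beta=1.1):
    """Compute metrics from confusion-matrix counts; beta > 1 weights recall above precision."""
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }
```

With β=1.1, a classifier whose recall is below its precision scores slightly lower than it would under F1, which is the sense in which the metric favors recall.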

•For topic XML
•Area under the ROC curve (AUC) is 0.743 for MLR vs. 0.638 for the baseline

•For the other 4 representative topics, MLR outperforms the baseline in both accuracy and Fβ
