gdm@fudan


Which Topic will You Follow?
Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang and Sheng Huang

GDM@FUDAN: Graph Data Management Lab, School of Computer Science, Fudan University, Shanghai, China
http://gdm.fudan.edu.cn

yangdeqing, [email protected]
ECML/PKDD 2012, Bristol, UK

Introduction

•Model
  •Predicting whether an author will follow a given topic is a two-class classification problem.
  •A Multiple Logistic Regression (MLR) model fits this scenario. The probability of topic-following is formalized as
    $$\pi(x) = \frac{e^{\alpha + \sum_i \beta_i x_i}}{1 + e^{\alpha + \sum_i \beta_i x_i}}$$
    where each $x_i$ is an explanatory variable, and $\alpha$ and $\beta_i$ are the parameters estimated by training.
  •Baseline model: defined in terms of a, the number of neighbors who have followed the topic.
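As a minimal sketch of the logistic form described above, the following Python function evaluates π(x) for given parameters. The function names, the 0.5 decision threshold, and the parameter values are assumptions made for illustration; they do not come from the poster.

```python
import math

def follow_probability(x, alpha, betas):
    """Logistic probability pi(x) = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    z = alpha + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict_follow(x, alpha, betas, threshold=0.5):
    """Two-class decision: 1 (follows) if pi(x) reaches the threshold, else 0.

    The 0.5 threshold is an assumption, not a value given in the source.
    """
    return 1 if follow_probability(x, alpha, betas) >= threshold else 0

# Hypothetical parameter values, chosen only for illustration.
p = follow_probability([0.4, 0.7], alpha=-1.0, betas=[1.5, 2.0])
label = predict_follow([0.4, 0.7], alpha=-1.0, betas=[1.5, 2.0])
```

This form, 1 / (1 + e^(-z)), is algebraically the same as e^z / (1 + e^z).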

•Explanatory Variables
  •Social Influence
    •An author u’s tendency to follow topic s in year t is composed of the tendencies of all his neighbors v toward this topic, taking their coauthor strengths into account.
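One way such a strength-weighted social-influence feature could be computed is sketched below. The aggregation used here (the coauthor-strength-weighted share of neighbors who followed the topic before year t) is an assumption for illustration, not the poster's formula, and the function and argument names are hypothetical.

```python
def social_influence(u, topic, year, coauthor_strength, followed_year):
    """Hypothetical F_SI: strength-weighted share of u's neighbors who
    followed `topic` strictly before `year`.

    coauthor_strength: dict mapping author -> {neighbor: number of coauthored papers}
    followed_year: dict mapping (author, topic) -> first year the author
                   published on that topic
    """
    neighbors = coauthor_strength.get(u, {})
    total = sum(neighbors.values())
    if total == 0:
        return 0.0  # an author without coauthors receives no influence
    influenced = sum(
        w for v, w in neighbors.items()
        if followed_year.get((v, topic), float("inf")) < year
    )
    return influenced / total

# Toy data, made up for illustration.
strength = {"u": {"a": 3, "b": 1, "c": 1}}
followed = {("a", "XML"): 2003, ("b", "XML"): 2006}
fsi_2005 = social_influence("u", "XML", 2005, strength, followed)
```

With this toy data only neighbor a has followed the topic before 2005, so the feature equals a's share of u's total coauthor strength.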

  •Homophily
    •We use topic similarity to describe the homophily among users in the context of topic-following.
    •A 25-dimensional vector u represents an author’s topic history; each dimension is the number of his papers on a given topic.
    •The topic similarity between users u and v is then defined on their topic vectors.
    •We measure u’s homophily with respect to the users who have followed topic s before t.
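The homophily feature can be illustrated with a short sketch. Cosine similarity and the mean over earlier followers are both assumptions made here, since the poster's own formulas are not available; the toy vectors have 3 dimensions instead of 25 for brevity.

```python
import math

def topic_similarity(u_vec, v_vec):
    """Cosine similarity between two topic-history vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u_vec, v_vec))
    norm_u = math.sqrt(sum(a * a for a in u_vec))
    norm_v = math.sqrt(sum(b * b for b in v_vec))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an author with no papers is similar to nobody
    return dot / (norm_u * norm_v)

def homophily(u_vec, follower_vecs):
    """Hypothetical F_TS: mean similarity between u and the users who
    followed topic s before t."""
    if not follower_vecs:
        return 0.0
    return sum(topic_similarity(u_vec, v) for v in follower_vecs) / len(follower_vecs)

# Toy vectors, made up for illustration: paper counts per topic.
u = [1, 0, 0]
earlier_followers = [[1, 0, 0], [0, 1, 0]]
fts = homophily(u, earlier_followers)
```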

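    •The whole MLR model, with $F_{SI}$ (social influence) and $F_{TS}$ (topic similarity) as explanatory variables, is then
      $$\pi(x) = \frac{e^{\alpha + \beta_1 F_{SI} + \beta_2 F_{TS}}}{1 + e^{\alpha + \beta_1 F_{SI} + \beta_2 F_{TS}}}$$
    •Y = π(x) = 1 if u follows s or one of its related topics.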

Modeling Topic Diffusion in Scientific Collaboration Networks

•Motivations
  •Who are the most appropriate candidates to receive a call for papers or a call for participation?
  •What session topics should we propose for next year’s conference?
  •To address these questions, we study authors’ topic-following behavior in a Scientific Collaboration Network (SCN), i.e., an author follows others in publishing papers on a given topic.

•Basic Idea
  •Scientific Collaboration Network
    •It is represented as a graph in which vertices represent authors and edges represent coauthor relationships extracted from the DBLP dataset.
    •It is a temporal graph Gt, in which vertices and edges accumulate as time t elapses.
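A temporal coauthor graph of this kind can be sketched as a snapshot function over publication records. The record layout (year, author list) and the use of the coauthored-paper count as edge weight are assumptions for illustration; real DBLP records would need parsing first.

```python
from collections import defaultdict
from itertools import combinations

def snapshot(papers, t):
    """Build G_t from all papers published up to and including year t.

    papers: iterable of (year, [author, ...]) records (hypothetical layout)
    Returns (vertices, edges), where edges maps a sorted author pair to the
    number of papers the two have coauthored so far.
    """
    vertices = set()
    edges = defaultdict(int)
    for year, authors in papers:
        if year > t:
            continue
        unique = sorted(set(authors))
        vertices.update(unique)
        for u, v in combinations(unique, 2):
            edges[(u, v)] += 1
    return vertices, dict(edges)

# Toy records, made up for illustration.
papers = [(2001, ["a", "b"]), (2002, ["a", "b", "c"]), (2003, ["d"])]
v_2001, e_2001 = snapshot(papers, 2001)
v_2002, e_2002 = snapshot(papers, 2002)
```

Because each snapshot only adds records, the vertex and edge sets of G_t are always contained in those of any later snapshot.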

  •An author’s topic-following behavior is a process of topic diffusion in a social network, driven by two typical forces: social influence and homophily.
  •We look for variables that precisely capture social influence and homophily in our scenario, and use them to predict an author’s future topic-following behavior.

•Challenges
  •How to distinguish social influence from homophily?
  •Topic definition and identification
  •Sample sparseness

•Contributions
  •Uncover the effects of social influence and homophily on topic diffusion.
  •Propose a Multiple Logistic Regression (MLR) model to predict authors’ topic-following behavior.
  •Show through extensive experiments that the model outperforms the baseline.

Empirical Study

•Driving Forces of Topic-Following
  •U1: users affected by both social influence and homophily
  •U2: users affected only by social influence
  •U3: users affected only by homophily
  •U4: users affected by neither
  •Results:
    •The two forces jointly affect topic-following.
    •Their impact is time-sensitive and decays exponentially.

•Social Influence
  •An author is more likely to adopt a topic when more of his neighbors have already followed it.
    •x is the number or proportion of affected neighbors.
    •p(x) is the probability that an author follows the topic.
  •An author is more likely to follow topics adopted by neighbors (direct propagation) who have coauthored more papers with him.
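The relationship between x and p(x) can be estimated empirically by grouping authors by their number of affected neighbors. The sketch below assumes observations arrive as (x, followed) pairs; that layout and the toy data are hypothetical.

```python
from collections import defaultdict

def empirical_follow_probability(observations):
    """Estimate p(x) as the fraction of authors with x affected neighbors
    who went on to follow the topic.

    observations: iterable of (x, followed) pairs, where x is the number of
    affected neighbors and followed is True or False.
    """
    totals = defaultdict(int)
    follows = defaultdict(int)
    for x, followed in observations:
        totals[x] += 1
        if followed:
            follows[x] += 1
    return {x: follows[x] / totals[x] for x in sorted(totals)}

# Toy observations, made up for illustration.
observations = [
    (0, False), (0, False), (0, False), (0, True),
    (1, True), (1, False),
    (2, True), (2, True),
]
p_of_x = empirical_follow_probability(observations)
```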

•Parameter Estimation
  •Parameters are estimated by maximum likelihood on the training set.
  •β2 has a larger Wald value than β1, indicating that FTS (homophily) affects topic-following behavior more strongly than FSI (social influence).
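Maximum-likelihood estimation and Wald values for a logistic model can be sketched as follows. This is an illustrative Newton-Raphson fit on made-up data, not the authors' procedure; the Wald value is taken as the squared ratio of a coefficient to its standard error, with variances read from the inverse Fisher information.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def score_and_information(xs, ys, theta):
    """Gradient of the log-likelihood and the Fisher information X^T W X."""
    k = len(theta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for i in range(k):
            grad[i] += (y - p) * x[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * x[i] * x[j]
    return grad, info

def fit_mlr(features, ys, iters=25):
    """Newton-Raphson maximum-likelihood fit. Returns (theta, wald, grad)."""
    xs = [[1.0] + list(f) for f in features]  # prepend the intercept term
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        grad, info = score_and_information(xs, ys, theta)
        theta = [t + s for t, s in zip(theta, solve(info, grad))]
    grad, info = score_and_information(xs, ys, theta)
    k = len(theta)
    wald = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        variance = solve(info, unit)[j]  # j-th diagonal entry of the inverse
        wald.append(theta[j] ** 2 / variance)
    return theta, wald, grad

# Toy data (made up): the second feature has the stronger effect on the label.
features, ys = [], []
for x1 in range(4):
    for x2 in range(4):
        z = x1 + 2 * x2
        features += [(x1, x2)] * 11
        ys += [1] * (1 + z) + [0] * (10 - z)

theta, wald, grad = fit_mlr(features, ys)  # theta = [alpha, beta1, beta2]
```

On this toy data the second coefficient carries the larger Wald value, which is the kind of comparison the poster draws between β2 and β1.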

•Model Evaluation
  •Metrics
    •Recall (sensitivity), specificity, precision, accuracy, AUC
    •Fβ, with β = 1.1 to favor recall slightly
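The threshold-based metrics listed above can be computed from a confusion matrix, as in this sketch. The function name and the toy labels are hypothetical; Fβ uses the standard definition (1 + β²)·P·R / (β²·P + R), under which β > 1 weights recall more heavily than precision.

```python
def evaluate(y_true, y_pred, beta=1.1):
    """Recall, specificity, precision, accuracy and F-beta for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }

# Toy labels, made up for illustration: 6 TP, 4 FN, 2 FP, 8 TN.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8
metrics = evaluate(y_true, y_pred)
```

In this toy case recall is lower than precision, so F1.1 comes out slightly below F1, reflecting the extra weight on recall.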

  •For topic XML
    •The area under the ROC curve (AUC) is 0.743 for MLR vs. 0.638 for the baseline.
  •For the other 4 representative topics, MLR outperforms the baseline in both accuracy and Fβ.
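AUC can be computed directly from predicted scores as the probability that a randomly chosen positive example is ranked above a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise definition; the function name and toy scores are hypothetical, and the quadratic loop is meant for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))

# Toy scores, made up for illustration.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.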