![Page 1: DM-Group Meetingpeople.cs.vt.edu/liangzhe/slides/04-02-2015-liangzhe.pdf · 2/4/2015 · DM-Group Meeting Liangzhe Chen, Apr. 2 2015 . Papers to be present ... of a group T on attribute](https://reader036.vdocuments.us/reader036/viewer/2022071015/5fce0c1dcb9e923df51044c3/html5/thumbnails/1.jpg)
DM-Group Meeting Liangzhe Chen, Apr. 2 2015
Papers to be presented
On Integrating Network and Community Discovery
WSDM’15
J. Liu, C. Aggarwal, J. Han.
Global Diffusion via Cascading Invitations: Structure, Growth and Homophily
WWW’15
A. Anderson, D. Huttenlocher, J. Kleinberg, J. Leskovec, M. Tiwari.
1st Paper
On Integrating Network and Community Discovery
WSDM’15
J. Liu, C. Aggarwal, J. Han.
Introduction
Most algorithms for community detection assume that the entire network is available for analysis.
Privacy constraints in Facebook
Hard to crawl the whole network in Twitter
Discovery of the entire network itself is a costly task
Can we integrate community detection with network discovery?
Problem Definition
G(N,A): N is the set of all nodes, A is the set of all edges in the network.
Gs(Ns,As,Qs): Ns is the set of observed nodes, As is the set of observed edges, and Qs is the set of costs to query the nodes in Ns.
Given Gs(Ns,As,Qs), a target node set Nt (a subset of Ns), and the ability to query any currently observed node i for its adjacent links at cost ci, cluster Nt into the k most tightly linked communities within a total budget B.
Framework
Initialization
Get k clusters
Select a node to query, and update the graph
Update the clusters
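The four steps above can be sketched as a budgeted loop. This is a minimal sketch, not the paper's implementation: `discover_and_cluster`, the `cluster`, `score` and `query` callbacks, and `default_cost` are hypothetical names, and the stopping rule (stop when no unqueried node fits the remaining budget) is an assumption.

```python
from typing import Callable, Dict, Hashable, Set

Node = Hashable
Graph = Dict[Node, Set[Node]]


def discover_and_cluster(
    observed: Graph,
    costs: Dict[Node, float],
    budget: float,
    cluster: Callable[[Graph], Dict[Node, int]],
    score: Callable[[Graph, Dict[Node, int], Node], float],
    query: Callable[[Node], Set[Node]],
    default_cost: float = 1.0,
) -> Dict[Node, int]:
    # Initialization: copy the observed graph so the caller's data is untouched.
    graph: Graph = {u: set(vs) for u, vs in observed.items()}
    # Get k clusters on what is currently visible.
    labels = cluster(graph)
    queried: Set[Node] = set()
    spent = 0.0
    while True:
        # Candidates: observed, not yet queried, and affordable.
        candidates = [
            u for u in graph
            if u not in queried and spent + costs.get(u, default_cost) <= budget
        ]
        if not candidates:
            break
        # Select a node to query (highest score wins).
        best = max(candidates, key=lambda u: score(graph, labels, u))
        spent += costs.get(best, default_cost)
        queried.add(best)
        # Update the graph with the newly revealed links.
        for v in query(best):
            graph.setdefault(best, set()).add(v)
            graph.setdefault(v, set()).add(best)
        # Update the clusters.
        labels = cluster(graph)
    return labels
```

Passing the clustering and scoring steps as callbacks keeps the loop independent of which score (normalized cut or modularity) and which community model is plugged in.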
How to select a node to query
Calculate a score for each candidate
Adjust the score according to the cost
Two ways used to calculate scores for nodes
Normalized cut
Modularity
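For reference, the two measures can be computed for a given partition as follows. This sketch uses the standard textbook definitions on an undirected, unweighted edge list; how the paper turns them into a per-node query score is not shown on the slide and is not reproduced here. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(edges: Iterable[Edge], labels: Dict[Hashable, int]) -> float:
    """Q = sum over communities c of (m_c / m) - (d_c / (2m))^2."""
    edges = list(edges)
    m = len(edges)
    intra = defaultdict(int)   # m_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def normalized_cut(edges: Iterable[Edge], labels: Dict[Hashable, int]) -> float:
    """Sum over communities c of cut(c, rest) / vol(c)."""
    cut = defaultdict(int)  # edges leaving community c
    vol = defaultdict(int)  # total degree of community c
    for u, v in edges:
        vol[labels[u]] += 1
        vol[labels[v]] += 1
        if labels[u] != labels[v]:
            cut[labels[u]] += 1
            cut[labels[v]] += 1
    return sum(cut[c] / vol[c] for c in vol)
```

A good partition has high modularity and low normalized cut, so a selection rule built on either one has to fix a sign convention.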
Incorporating the costs Qc
For each node i, the rank of that node is adjusted by the cost of querying that node according to the following equation:
A parameter in this equation controls how much the cost affects the resulting ranks.
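The equation itself is not reproduced in these notes, so the following is only a hypothetical illustration of the idea, not the paper's formula. It assumes the raw score is divided by the query cost raised to an exponent `alpha` (an invented name), so that `alpha = 0` ignores cost entirely and larger values penalize expensive nodes more.

```python
from typing import Dict, Hashable


def cost_adjusted_scores(
    scores: Dict[Hashable, float],
    costs: Dict[Hashable, float],
    alpha: float,
) -> Dict[Hashable, float]:
    # Hypothetical adjustment: score / cost**alpha (NOT taken from the paper).
    return {u: scores[u] / (costs[u] ** alpha) for u in scores}


def best_candidate(adjusted: Dict[Hashable, float]) -> Hashable:
    # Pick the node with the highest adjusted score.
    return max(adjusted, key=adjusted.get)
```

With scores `{"a": 10, "b": 8}` and costs `{"a": 4, "b": 1}`, node `a` wins when cost is ignored, while node `b` wins once `alpha` reaches 1.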
Community Discovery
A generative model for the graph:
$\theta_{ik}$: the propensity of node $i$ to have edges of community $k$
$\sum_k \theta_{ik}\theta_{jk}$: the expected number of links between nodes $i$ and $j$
The likelihood of the graph:
Parameter updating rules (see details in the paper)
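The likelihood formula is not reproduced in these notes. As a rough sketch, assume the number of links between each pair of nodes is Poisson distributed with mean equal to the expected number of links defined above (the sum over communities k of `theta[i][k] * theta[j][k]`). The Poisson assumption, the exclusion of self-loops, and the function names are all assumptions made for illustration; see the paper for the actual form and the update rules.

```python
import math
from typing import List, Sequence


def expected_links(theta: Sequence[Sequence[float]], i: int, j: int) -> float:
    # Sum over communities k of theta[i][k] * theta[j][k].
    return sum(a * b for a, b in zip(theta[i], theta[j]))


def log_likelihood(adj: List[List[int]], theta: Sequence[Sequence[float]]) -> float:
    """Poisson log-likelihood over unordered node pairs (self-loops ignored)."""
    n = len(theta)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            lam = expected_links(theta, i, j)
            a = adj[i][j]
            if lam > 0:
                # log of Poisson pmf: a*log(lam) - lam - log(a!)
                total += a * math.log(lam) - lam - math.lgamma(a + 1)
            elif a > 0:
                # An observed link with zero expected links is impossible.
                return float("-inf")
    return total
```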
Recap of their algorithm
Initialization
Get k clusters
Select a node to query, and update the graph
Update the clusters
Experiments: Datasets
Synthetic: 36,000 nodes, 6,000 of which are generated from 5 clusters. Each of these has 3 out-of-cluster neighbors and 8 within-cluster neighbors. The remaining 30,000 nodes have random links.
DBLP: co-authorship network with 115 authors from 4 research groups.
IMDB: co-actor and co-director network. Different genres are treated as different clusters.
Experiments: Results
2nd Paper
Global Diffusion via Cascading Invitations: Structure, Growth and Homophily
WWW’15
A. Anderson, D. Huttenlocher, J. Kleinberg, J. Leskovec, M. Tiwari.
Introduction
Many popular websites catalyze their growth through invitations from existing members. New members can in turn issue invitations, creating a cascade of member signups.
Member Signups
Two ways to sign up
A cold signup: sign up directly at the site
A warm signup: sign up through clicking an invitation from others
The signups form a forest
Cold signups are root nodes
Warm signups have exactly 1 parent
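A small sketch of how such a forest could be built from signup records, assuming each record stores the inviter of a member, or `None` for a cold signup. The `inviter` mapping and the function names are illustrative and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

Member = Hashable


def build_forest(
    inviter: Dict[Member, Optional[Member]],
) -> Tuple[List[Member], Dict[Member, List[Member]]]:
    """Return (roots, children): cold signups are roots, warm signups hang off their inviter."""
    children: Dict[Member, List[Member]] = defaultdict(list)
    roots: List[Member] = []
    for member, parent in inviter.items():
        if parent is None:
            roots.append(member)              # cold signup: root of a tree
        else:
            children[parent].append(member)   # warm signup: exactly one parent
    return roots, children


def cascade_sizes(
    roots: List[Member], children: Dict[Member, List[Member]]
) -> Dict[Member, int]:
    """Number of members in each tree, keyed by its root."""
    sizes = {}
    for root in roots:
        stack, count = [root], 0
        while stack:
            u = stack.pop()
            count += 1
            stack.extend(children.get(u, []))
        sizes[root] = count
    return sizes
```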
Quantifying virality as a whole
Structural Virality
The goal of structural virality is to numerically distinguish shallow, broadcast-like diffusions from deep branching structures.
Use the Wiener index to capture the structural virality of a tree: the average path distance between pairs of nodes in the tree.
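As an illustration, the average pairwise distance of a tree can be computed with one breadth-first search per node. This is a straightforward quadratic-time sketch, adequate for small trees and not optimized for large cascades; the function name is illustrative.

```python
from collections import defaultdict, deque
from typing import Hashable, Iterable, Tuple


def structural_virality(edges: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Average shortest-path distance over all pairs of distinct nodes in a tree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0
    for source in nodes:
        # BFS distances from source to every other node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    # total counts every ordered pair once, so divide by n * (n - 1).
    return total / (n * (n - 1))
```

A star on five nodes (a broadcast) scores 1.6, while a path on five nodes (a deep chain) scores 2.0, which is the separation the measure is meant to provide.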
High correlation between cascade size and structural virality, different from other datasets.
Homophily
Edge homophily
Cascade homophily
Edge Homophily
Directly calculating P(Ai|Ai)
High edge homophily is present in the dataset
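A sketch of the direct calculation, reading P(Ai|Ai) as the probability that an invitee has attribute value Ai given that the inviter has value Ai. That reading, the `(inviter, invitee)` edge format, and the function name are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def edge_homophily(
    edges: Iterable[Tuple[Hashable, Hashable]],
    attr: Dict[Hashable, str],
) -> Dict[str, float]:
    """For each value a: fraction of edges with an inviter of value a whose invitee also has a."""
    parent_count = Counter()
    match_count = Counter()
    for parent, child in edges:
        a = attr[parent]
        parent_count[a] += 1
        if attr[child] == a:
            match_count[a] += 1
    return {a: match_count[a] / parent_count[a] for a in parent_count}
```

To judge whether a value shows homophily, each probability would be compared against the overall share of that value in the population.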
Cascade Homophily
Population diversity measure used in sociology
Within-similarity WA(T) of a group T on attribute A
Probability that two randomly selected nodes in T match on attribute A
Between-similarity BA(T1,T2)
Probability that a randomly selected node in T1 and a randomly selected node in T2 match on attribute A
Comparing WA and BA to identify cascade homophily.
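The two similarities can be sketched as below. The sketch assumes that the two nodes drawn for the within-similarity are distinct (sampling without replacement); the slide does not say which convention the paper uses. Function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def within_similarity(values: Sequence[str]) -> float:
    """Probability that two distinct randomly chosen nodes of one group share the attribute value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two nodes")
    counts = Counter(values)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def between_similarity(values1: Sequence[str], values2: Sequence[str]) -> float:
    """Probability that a random node of group 1 and a random node of group 2 match."""
    if not values1 or not values2:
        raise ValueError("both groups must be non-empty")
    c1, c2 = Counter(values1), Counter(values2)
    return sum(c1[a] * c2[a] for a in c1) / (len(values1) * len(values2))
```

Here each argument is the list of attribute values of the nodes in one cascade. If cascades are homophilous, within-similarity should tend to exceed between-similarity.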
Different attribute values show different levels of homophily
Cascade & Edge Homophily
Is the cascade homophily the same as the local edge homophily?
Model the edge homophily with a first-order Markov chain using P(Ai|Aj)
Simulate the cascade tree using the Markov model and compare it to the real tree.
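A minimal sketch of the simulation step: keep the tree structure fixed, start from the root's attribute value, and draw each child's value from a transition table conditioned only on its parent. The `children` and `transition` data layouts and the function name are assumptions; the paper's exact procedure may differ.

```python
import random
from typing import Dict, Hashable, List


def simulate_attributes(
    children: Dict[Hashable, List[Hashable]],
    root: Hashable,
    root_value: str,
    transition: Dict[str, Dict[str, float]],
    rng: random.Random,
) -> Dict[Hashable, str]:
    """Assign attribute values down a tree with a first-order Markov chain.

    transition[parent_value][child_value] is P(child value | parent value).
    """
    values = {root: root_value}
    stack = [root]
    while stack:
        u = stack.pop()
        options = list(transition[values[u]].keys())
        weights = [transition[values[u]][o] for o in options]
        for child in children.get(u, []):
            # The child's value depends only on its direct parent.
            values[child] = rng.choices(options, weights=weights, k=1)[0]
            stack.append(child)
    return values
```

Running this many times on the real tree shapes and computing the within-similarity of each simulated tree gives the baseline against which the observed cascades are compared.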
A first-order Markov chain does not reproduce the data well.
The attributes of users are not entirely determined by the attributes of their direct parents, but by the rest of the cascade as well.
Edge level homophily is insufficient to explain cascade level homophily.
Guessing the root
The edge homophily suggests that the cascade tends to retain some memory of the root. How quickly does the cascade lose its root information and relax to the background distribution?
Status Gradient
A status gradient is observed in some of the attributes that do not show homophily
Timescale of transmission
Invitations to others are sent long after the registration of the user.
Invitations are adopted quickly after a user receives one.
Cascade Growth Trajectories
Cascade size grows almost linearly with respect to time.