Community Detection via Semi–Synchronous Label Propagation Algorithms
Gennaro Cordasco and Luisa Gargano
Dipartimento di Informatica ed Applicazioni “R.M. Capocelli” Università degli Studi di Salerno, ITALY
Business Applications of Social Network Analysis (BASNA) 2010
Outline
• Complex networks
  – Community structure in networks
• Community detection algorithms
• Label Propagation algorithms (LPAs)
  – Synchronous vs. Asynchronous LPAs
  – Stopping Criteria and Tie Resolution strategies
• Our Proposal: Semi-synchronous LPA
  – Convergence
  – Experimental Results
• Conclusion
12/12/2010 Business Applications of Social Network Analysis (BASNA) 2010 2
Complex networks
• Real-world networks are characterized by several properties:
– Small-world phenomenon
– Scale-free distribution
– …
– Community structure
Community Structure in networks
The largest connected component of a coauthorship network connecting physicists who have published together on networks. Each node is colored according to community membership.
Enron email corpus at UC Berkeley (2005)
A metabolic network consisting of metabolic genes (blue hexagons) regulated by carbon, light, and nitrogen treatments.
Community Detection Algorithms
• Input: A network G=(V,E)
• Output: A partition of V into communities consisting of nodes with similar characteristics.
• Two families of methods:
  – Agglomerative
  – Divisive
[Figure: Agglomerative and Divisive methods]
Label Propagation Algorithms (LPAs) [RAK07]
1. Each vertex in the network is assigned a unique label (its own community)
2. An iterative process is then performed so that connected groups of vertices reach a consensus on a label; each group sharing a label forms a community.
LPAs have been shown experimentally to be fast and effective.
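The two steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes an undirected graph stored as an adjacency dict, and the helper names `initial_labels` and `best_labels` are hypothetical. `best_labels` returns every label carried by the maximum number of neighbors of v, i.e. the candidate labels that each LPA variant chooses from.

```python
from collections import Counter

def initial_labels(adj):
    """Step 1: every vertex gets a unique label (here, its own id)."""
    return {v: v for v in adj}

def best_labels(adj, labels, v):
    """Labels shared by the maximum number of neighbors of v."""
    counts = Counter(labels[u] for u in adj[v])
    if not counts:  # isolated vertex: nothing to adopt, keep its own label
        return [labels[v]]
    top = max(counts.values())
    return [l for l, c in counts.items() if c == top]

# Small example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = initial_labels(adj)
print(best_labels(adj, labels, 3))  # → [2]
print(best_labels(adj, labels, 2))  # → [0, 1, 3] (three-way tie)
```

Repeating this choice until labels settle is what produces the consensus; the variants that follow differ in when a vertex reads its neighbors' labels and in how ties are broken.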
Synchronous LPA
• Easy to parallelize
• Stable
• Labels may oscillate
• Complex stopping criteria are needed
In the synchronous LPA, each vertex computes its label at step i based on the labels of its neighbors at step i − 1.
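Because every vertex reads only the previous labeling, a synchronous step is a pure function of the old labels, which is why it parallelizes easily. The sketch below assumes the adjacency-dict representation and uniformly random tie-breaking (the LPA-Random rule); `sync_step` is a hypothetical name.

```python
import random
from collections import Counter

def sync_step(adj, labels, rng):
    """One synchronous step: labels at step i depend only on the labels at step i - 1."""
    new = {}
    for v, nbrs in adj.items():
        counts = Counter(labels[u] for u in nbrs)  # reads the old labeling only
        if not counts:
            new[v] = labels[v]
            continue
        top = max(counts.values())
        new[v] = rng.choice([l for l, c in counts.items() if c == top])  # random tie-break
    return new  # the old labeling is left untouched

# Path 0 - 1 - 2: no ties occur, so the result does not depend on the seed.
adj = {0: [1], 1: [0, 2], 2: [1]}
old = {0: "a", 1: "b", 2: "a"}
print(sync_step(adj, old, random.Random(0)))  # → {0: 'b', 1: 'a', 2: 'b'}
```

All three vertices swap labels: since each one reads the old values, adjacent vertices can exchange labels instead of agreeing on one.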
Synchronous LPA: The oscillation phenomenon
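The oscillation can be reproduced on a bipartite graph. The sketch below assumes a 4-cycle whose two sides start with different labels (a configuration chosen for illustration, not taken from the slide). Every vertex sees only the other side's label, so no ties arise and the synchronous update swaps the two labels at every step.

```python
from collections import Counter

def sync_step(adj, labels):
    """Synchronous update; ties (none occur in this demo) go to the largest label."""
    new = {}
    for v, nbrs in adj.items():
        counts = Counter(labels[u] for u in nbrs)
        top = max(counts.values())
        new[v] = max(l for l, c in counts.items() if c == top)
    return new

# 4-cycle 0-1-2-3-0: bipartite, with sides {0, 2} and {1, 3}.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
history = [{0: "a", 1: "b", 2: "a", 3: "b"}]
for _ in range(4):
    history.append(sync_step(adj, history[-1]))

for step, labels in enumerate(history):
    print(step, labels)
```

A stop rule that only checks for an unchanged labeling never fires on this run, even though the labeling repeats with period 2.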
Asynchronous LPA
• Hard to parallelize: dependencies between updates must be taken into account
• Unstable: different runs may produce different results
• Less effective: some runs may produce a "monster" community
In the asynchronous LPA, at each step, vertex labels are updated sequentially based on the current labeling.
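A minimal sketch of one asynchronous step, assuming the adjacency-dict representation and random tie-breaking; `async_step` is a hypothetical name. Labels are overwritten in place while scanning a fresh random permutation, so a vertex visited later already sees the choices made earlier in the same step. This ordering dependence is also why runs with different permutations can end in different partitions.

```python
import random
from collections import Counter

def async_step(adj, labels, rng):
    """One asynchronous step; updates labels in place and reports whether any label changed."""
    order = list(adj)
    rng.shuffle(order)  # new random permutation of the vertices at every step
    changed = False
    for v in order:
        counts = Counter(labels[u] for u in adj[v])  # current, partially updated labeling
        if not counts:
            continue
        top = max(counts.values())
        choice = rng.choice([l for l, c in counts.items() if c == top])
        if choice != labels[v]:
            labels[v] = choice
            changed = True
    return changed

# Star with center 0 and leaves 1, 2, 3, all starting with distinct labels.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
labels = {0: "c", 1: "a", 2: "b", 3: "d"}
rng = random.Random(1)
steps = 0
while async_step(adj, labels, rng) and steps < 100:
    steps += 1
print(labels)  # a single shared label; which one depends on the seed
```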
Asynchronous LPA
[Figure: example run on a 10-vertex network. At each propagation step (Propagation Step 1, Propagation Step 2, …), a random permutation of the vertices is generated and the vertices are updated in that order.]
LPA Tie Resolution Strategies
• When a tie occurs (i.e., two or more labels maximize the sum in the labeling equation):
  – LPA-Random (LPA): one of the maximizing labels is chosen at random.
  – LPA-Prec: if the current label satisfies the labeling equation, the vertex keeps it; otherwise a maximizing label is chosen at random.
  – LPA-Max: each tie is resolved deterministically by taking the maximizing label with the highest value.
  – LPA-Prec-Max: combines both rules above (keep the current label if it satisfies the labeling equation, otherwise take the highest maximizing label).
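Labeling equation: $l_v = \arg\max_{l} \sum_{u \in N(v)} [\, l_u = l \,]$, where N(v) is the set of neighbors of v and [·] equals 1 if its argument holds and 0 otherwise.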
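The four tie-resolution rules can be written as a single choice function. This is a sketch: the strategy strings are hypothetical names, and labels are assumed to be mutually comparable (e.g. the integer ids assigned at initialization) so that LPA-Max can take the highest one.

```python
import random
from collections import Counter

def choose_label(neighbor_labels, current, strategy, rng):
    """Apply the labeling equation, then resolve ties with the given strategy."""
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    maximizers = [l for l, c in counts.items() if c == top]
    if strategy in ("prec", "prec-max") and current in maximizers:
        return current              # LPA-Prec: keep the current label if it is a maximizer
    if strategy in ("max", "prec-max"):
        return max(maximizers)      # LPA-Max: deterministic, highest maximizing label
    return rng.choice(maximizers)   # LPA-Random, and the LPA-Prec fallback

rng = random.Random(0)
nbrs = [3, 5, 5, 3, 1]  # labels 3 and 5 tie with two votes each
for s in ("random", "prec", "max", "prec-max"):
    print(s, choose_label(nbrs, current=3, strategy=s, rng=rng))
```

With current label 3, LPA-Prec and LPA-Prec-Max keep 3, LPA-Max switches to 5, and LPA-Random may return either.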
LPA Stopping Criteria
• Stopping criteria:
  (C0) Compare the current labeling with the previous one; if no label has changed, the algorithm is stopped.
  (C1) If, in the current labeling, every vertex has a label shared by the maximum number of its neighbors, the algorithm is stopped.
  (C2) If either l_v(i) = l_v(i − 1) for each v ∈ V or l_v(i) = l_v(i − 2) for each v ∈ V, the algorithm is stopped.
[Slide callouts: "Does not prevent cycles", "LPA Asynch", "LPA Synch"]
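The three criteria translate directly into predicates over labelings stored as dicts. The sketch assumes the adjacency-dict representation; `c0`, `c1`, `c2` and `maximizers` are hypothetical helper names. The example checks them on the period-2 oscillation of a 4-cycle under synchronous updates.

```python
from collections import Counter

def maximizers(adj, labels, v):
    """Labels shared by the maximum number of neighbors of v (solutions of the labeling equation)."""
    counts = Counter(labels[u] for u in adj[v])
    if not counts:
        return {labels[v]}
    top = max(counts.values())
    return {l for l, c in counts.items() if c == top}

def c0(curr, prev):
    """C0: no label changed with respect to the previous step."""
    return curr == prev

def c1(adj, curr):
    """C1: every vertex already carries a label that maximizes its neighbor count."""
    return all(curr[v] in maximizers(adj, curr, v) for v in adj)

def c2(curr, prev, prev2):
    """C2: labeling equal to the one at step i - 1 or to the one at step i - 2."""
    return curr == prev or (prev2 is not None and curr == prev2)

# Synchronous oscillation on the 4-cycle 0-1-2-3-0: labelings at steps i-2, i-1, i.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
s0 = {0: "a", 1: "b", 2: "a", 3: "b"}
s1 = {0: "b", 1: "a", 2: "b", 3: "a"}
s2 = dict(s0)
print(c0(s2, s1), c1(adj, s2), c2(s2, s1, s0))  # → False False True
```

Neither C0 nor C1 stops this run, while C2 detects the 2-cycle.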
Our Proposal: Semi–Synchronous LPA
Two phases:
• Coloring Phase. Color the network vertices so that no two adjacent vertices share the same color (e.g., with any distributed graph coloring algorithm).
• Propagation Phase. Each label propagation step is divided into stages: at stage c, labels are simultaneously propagated to the vertices that were assigned color c during the coloring phase.
Neighbouring nodes are never updated simultaneously.
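A sequential sketch of one semi-synchronous propagation step, assuming the adjacency-dict representation and random tie-breaking. For the coloring phase it swaps in a simple sequential greedy coloring instead of a distributed algorithm; any proper coloring works here, since only the fact that same-colored vertices are non-adjacent is used. `greedy_coloring` and `semi_sync_step` are hypothetical names.

```python
import random
from collections import Counter

def greedy_coloring(adj):
    """Proper coloring: each vertex takes the smallest color unused by its colored neighbors."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def semi_sync_step(adj, labels, color, rng):
    """One propagation step, split into one stage per color class."""
    for c in sorted(set(color.values())):
        new = {}
        for v in adj:
            if color[v] != c:
                continue
            counts = Counter(labels[u] for u in adj[v])  # no neighbor belongs to this stage
            if counts:
                top = max(counts.values())
                new[v] = rng.choice([l for l, k in counts.items() if k == top])
        labels.update(new)  # all vertices of color c change simultaneously
    return labels

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
labels = {0: "a", 1: "b", 2: "a", 3: "b"}
color = greedy_coloring(adj)
print(semi_sync_step(adj, labels, color, random.Random(0)))  # → {0: 'b', 1: 'b', 2: 'b', 3: 'b'}
```

On the 4-cycle configuration that oscillates under synchronous updates, a single semi-synchronous step reaches consensus, because vertices 1 and 3 read the labels already written at stage 0.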
Our Proposal: Semi–Synchronous LPA
[Figure: example run with panels Coloring Phase (vertex colors A, B, C), Propagation Step 1 and Propagation Step 2 (each step processed in stages A, B, C)]
Our Proposal: Convergence
Theorem 1:
Consider a network G = (V,E). Assume that Semi–Synchronous LPA uses the stopping criterion (C1), that is, it ends at the first step t such that, for each v ∈ V, one of the following conditions holds:
i) l_v(t) = l_v(t − 1)
ii) l_v(t) ≠ l_v(t − 1), but this change is due to a tie.
Then the algorithm converges, independently of the tie-resolution rule.
Proof hint: the number of monochromatic edges (edges whose endpoints share a label) strictly increases at each propagation step before termination; since it is bounded by |E|, the algorithm must stop.
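The potential used in the hint is easy to compute. The sketch below (hypothetical helper `monochromatic_edges`, assuming integer vertex ids in an adjacency dict) counts edges whose endpoints share a label and checks, on a small example, that a stage of non-tie updates applied to non-adjacent vertices increases it: each updated vertex gains at least one monochromatic incident edge, and its neighbors do not move during that stage.

```python
def monochromatic_edges(adj, labels):
    """Count undirected edges whose endpoints share a label (each edge counted once via v < u)."""
    return sum(1 for v in adj for u in adj[v] if v < u and labels[u] == labels[v])

# Path 0 - 1 - 2 - 3 - 4 with labels a b a c c.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
before = {0: "a", 1: "b", 2: "a", 3: "c", 4: "c"}
# One stage updating the independent set {1}: its unique majority label is "a" (no tie).
after = {**before, 1: "a"}
print(monochromatic_edges(adj, before), monochromatic_edges(adj, after))  # → 1 3
```

Since the count can never exceed |E|, it can increase only finitely many times, which is the core of the convergence argument.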
Our Proposal: Experimental Results
• Analyzed Networks:
1. Zachary’s Karate: Social network of friendships between 34 members of a karate club at a US university in the 1970s [Zac77];
2. Dolphins: Social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand [LSB+03];
3. Football: Network of American football games between Division IA colleges during regular season Fall 2000 [GN02];
4. NetScience: Coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006 [New06a];
5. Power: A network representing the topology of the Western States Power Grid of the United States [WS98];
6. Internet: A symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views [New06b];
7. Cond-Mat: Coauthorships between scientists posting preprints between Jan 1, 1995 and June 30, 2003 on the Condensed Matter E-Print Archive [New01].
Our Proposal: Experimental Results
Test Settings:
Each test is identified by a triple (P,N,T), where
P indicates the algorithm,
N indicates the network,
T indicates the tie-resolution strategy.
Overall, 56 different test settings were considered.
Test Execution:
i. Initial labeling (randomly chosen) and network coloring (if needed)
ii. LPA execution (stopping criterion (C1))
iii. Community identification
iv. Community structure evaluation (quality and stability)
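Quality of Communities: We measure the quality of a network partition using the modularity [NG04].
The modularity of a partition C of a graph G=(V,E) with m = |E| edges is defined as $Q(C) = \sum_{c \in C} \left[ \frac{m_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, where m_c is the number of edges with both endpoints in community c and d_c is the sum of the degrees of the vertices in c.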
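A direct computation of modularity for an undirected, unweighted graph, assuming the standard Newman–Girvan form: for each community, the fraction of edges inside it minus the square of its share of edge endpoints (total degree divided by 2m). The function name and the edge-list input are assumptions of this sketch.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an undirected simple graph.
    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each vertex to its community id."""
    m = len(edges)
    inside = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)  # total degree of the vertices of community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))  # ≈ 0.357 (exactly 5/14)
```

Putting every vertex in one community gives Q = 0, while the natural split into the two triangles gives Q = 5/14.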
Our Proposal: Experimental Results
[Figure: Modularity]
Our Proposal: Experimental Results
[Figure: Number of stages]
Our Proposal: Experimental Results
[Figure: Modularity standard deviation]
Conclusion
• Semi–synchronous LPA is
• Effective
• Efficient
• Stable
• Several problems related to the convergence of LPAs and their speed are still open.
Thanks for your attention
Gennaro Cordasco
Luisa Gargano