leonid e. zhukov - leonid zhukov · lecture outline 1 cohesive subgroups graph cliques k-plex,...

30
Network communities Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization of Networks Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 1 / 30

Upload: others

Post on 31-May-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Network communities

Leonid E. Zhukov

School of Data Analysis and Artificial IntelligenceDepartment of Computer Science

National Research University Higher School of Economics

Structural Analysis and Visualization of Networks

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 1 / 30

Page 2: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Lecture outline

1 Cohesive subgroupsGraph cliquesk-plex, k-core

2 Network communitiesSimilarity based clusteringGraph partitioning

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 2 / 30

Page 3: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Network communities

Connected and undirected graphsLeonid E. Zhukov (HSE) Lecture 8 3.03.2015 3 / 30

Page 4: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Network communities

What makes a community (cohesive subgroup):

Mutuality of ties. Everyone in the group has ties (edges) to oneanother

Compactness. Closeness or reachability of group members in smallnumber of steps, not necessarily adjacency

Density of edges. High frequency of ties within the group

Separation. Higher frequency of ties among group members comparedto non-members

Wasserman and Faust

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 4 / 30

Page 5: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Graph cliques

Definition

A clique is a complete (fully connected) subgraph, i.e. a set of verticeswhere each pair of vertices is connected.

Cliques can overlapLeonid E. Zhukov (HSE) Lecture 8 3.03.2015 5 / 30

Page 6: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Graph cliques

A maximal clique is a clique that cannot be extended by includingone more adjacent vertex (not included in larger one)

A maximum clique is a clique of the largest possible size in a givengraph

Graph clique number is the size of the maximum clique

image from D. Eppstein

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 6 / 30

Page 7: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Graph cliques

Computational issues:

Finding click of fixed given size k - O(nkk2)

Finding maximum clique O(3n/3)

But in sparse graphs...

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 7 / 30

Page 8: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Relaxation of a clique

k-plex of size n is a maximal subset of n vertices such that eachvertex is connected to at least n − k others in the subset (any vertexcan be lacking ties with no more than k members).

k-core is a maximal subset of vertices such that each is connected toat least k others in the subset (degree of every vertex in k-coreki ≥ k) . (k+1) core is always a subgraph of k − core

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 8 / 30

Page 9: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

K-core

The core number of a vertex is the highest order of a core thatcontains this vertex

image from Alvarez-Hamelin et.al., 2005

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 9 / 30

Page 10: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Graph cliques

Zachary Karate Club, 1977

Maximal cliques:Clique size: 2 3 4 5Number of cliques: 11 21 2 2

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 10 / 30

Page 11: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Graph cliques

Zachary karate club 1,2,3,4 - cores

Maximum cliques

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 11 / 30

Page 12: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

K-cores

Zachary karate club: 1,2,3,4 - cores

4−core====⇒

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 12 / 30

Page 13: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Network communities

Definition

Network communities are groups of vertices similar to each other.

Community detection is an assignment of vertices to communities.Non-overlapping communities (every vertex belongs to a single group)

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 13 / 30

Page 14: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Similarity based clustering

Similarity based vertex clustering:

Define similarity measure between vertices based on network structure- Jaccard similarity- Cosine similarity- Pearson correlation- Eucledian distance (dissimilarity)

Calculate similarity between all pairs of vertices in the graph(similarity matrix)

Group together vertices with high similarities

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 14 / 30

Page 15: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Hierarchical clustering

Agglomerative clustering:

Assign each vertex to a group of its ownFind two groups with the highest similarity and join them in a singlegroupCalculate similarity between groups:- single-linkage clustering (most similar in the group)- complete-linkage clustering (least similar in the group)- average-linkage clustering (mean similarity between groups)Repeat until all joined into single group

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 15 / 30

Page 16: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Similarity matrix

Zachary karate club

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 16 / 30

Page 17: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Hierarchical clustering

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 17 / 30

Page 18: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Hierarchical clustering

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 18 / 30

Page 19: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Hierarchical clustering

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 19 / 30

Page 20: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Hierarchical clustering

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 20 / 30

Page 21: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Hierarchical clustering

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 21 / 30

Page 22: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Network communities

Definition

Network communities are groups of vertices such that vertices inside thegroup connected with many more edges than between groups.

Graph partitioning problem

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 22 / 30

Page 23: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Graph partitioning

Combinatorial problem:

Number of ways to divide network of n nodes in 2 groups(bi-partition):

n!

n1!n2!, n = n1 + n2

Dividing into k non-empty groups (Stirling numbers of the secondkind)

S(n, k) =1

k!

n∑j=0

(−1)jC jk(k − j)n

Number of all possible partitions (n-th Bell number):

Bn =n∑

k=1

S(n, k)

B20 = 5, 832, 742, 205, 057

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 23 / 30

Page 24: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Heuristic approach

Focus on edges that connect communities.Edge betweenness -number of shortest paths σst(e) going through edge e

CB(e) =∑s 6=t

σst(e)

σst

Construct communities by progressively removing edgesLeonid E. Zhukov (HSE) Lecture 8 3.03.2015 24 / 30

Page 25: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Edge betweenness

Newman-Girvan, 2004

Algorithm: Edge Betweenness

Input: graph G(V,E)

Output: Dendrogram

repeatFor all e ∈ E compute edge betweenness CB(e);

remove edge ei with largest CB(ei ) ;

until edges left;

If bi-partition, then stop when graph splits in two components(check for connectedness)

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 25 / 30

Page 26: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Edge betweenness

Hierarchical algorithm, dendrogram

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 26 / 30

Page 27: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Community ”quality”

Let nc - number of classes, ci - class label per nodeCompare fraction of edges within the cluster to expected fraction ifedges were distributed at randomModularity:

Q =1

2m

∑ij

(Aij −

kikj2m

)δ(ci , cj), δ(ci , cj)- kronecker delta

The higher the modularity score - the better is communityModularity score range Q ∈ [−1/2, 1)Single class, δ(ci , cj) = 1, Q = 0

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 27 / 30

Page 28: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Dendrogram and modularity score

Newman and Girvan, 2004

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 28 / 30

Page 29: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

Network communities

Zachary karate club

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 29 / 30

Page 30: Leonid E. Zhukov - Leonid Zhukov · Lecture outline 1 Cohesive subgroups Graph cliques k-plex, k-core 2 Network communities Similarity based clustering Graph partitioning Leonid E

References

Finding and evaluating community structure in networks, M.E.J.Newman, M. Girvan, Phys. Rev E, 69, 2004

Modularity and community structure in networks, M.E.J. Newman,PNAS, vol 103, no 26, pp 8577-8582, 2006

S. E. Schaeffer. Graph clustering. Computer Science Review,1(1):2764, 2007.

S. Fortunato. Community detection in graphs, Physics Reports, Vol.486, Iss. 35, pp 75-174, 2010

Leonid E. Zhukov (HSE) Lecture 8 3.03.2015 30 / 30