Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology


Post on 19-Jan-2016


Community Discovery in Social Network

Yunming Ye

Department of Computer Science

Shenzhen Graduate School

Harbin Institute of Technology


Agenda

Introduction to Social Network and Community Discovery

Classical Community Discovery Algorithms

Hot Research Issues


Introduction to Social Network and Community Discovery


Studies on Networks

Lots of “networked” data:

Technological networks: power grids, road networks

Biological networks: food webs, protein networks

Social networks: collaboration networks, friendships

Language networks: semantic networks


Studies on Networks

Social networks: QQ, Kaixin, Renren, Facebook, email, Twitter, co-citation, blogs


Community

A property that seems to be common to many networks is community structure.

Community structure: the division of network nodes into groups within which the network connections are dense, but between which they are sparser.
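The phrase "dense within, sparser between" can be made concrete. The sketch below takes a given partition and measures the fraction of possible node pairs that are linked inside communities and the fraction linked across them. The function name `edge_densities` and the toy graph, two triangles joined by one edge, are illustrative assumptions and do not come from the slides.

```python
def edge_densities(edges, communities):
    """Return (internal density, external density) of a partition.

    Density = observed edges / possible node pairs, for an undirected simple
    graph. Assumes at least two communities and at least one community with
    two or more nodes (otherwise a denominator is zero).
    """
    label = {v: k for k, c in enumerate(communities) for v in c}
    internal = sum(1 for u, v in edges if label[u] == label[v])
    external = len(edges) - internal
    n = len(label)
    possible_internal = sum(len(c) * (len(c) - 1) // 2 for c in communities)
    possible_external = n * (n - 1) // 2 - possible_internal
    return internal / possible_internal, external / possible_external


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(edge_densities(edges, [{1, 2, 3}, {4, 5, 6}]))  # (1.0, 0.1111111111111111)
```

A partition that matches the community structure should give high internal density and low external density.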


Subjectivity of Community Definition

[Figure: two example graphs, captioned "Each component is a community" and "A densely-knit community"]

The definition of a community can be subjective (community discovery is a form of unsupervised learning).



Community Detection

Community detection: finding the community structure of a social network.

Community detection is important: identifying modules and their boundaries allows vertices to be classified according to their structural position in the modules.


Community Detection

Public opinion monitoring

Product (commodity) recommendation

Network optimization

Network security

Epidemic monitoring


Classical Community Discovery Algorithms


Clustering based on Vertex Similarity

Apply k-means or similarity-based clustering to the nodes. Vertex similarity is defined in terms of the similarity of their neighborhoods.

Structural equivalence: two nodes are structurally equivalent iff they are connected to the same set of actors.

Structural equivalence is too restrictive for practical use.

In the example figure, nodes 1 and 3 are structurally equivalent; so are nodes 5 and 6.


Vertex Similarity

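Jaccard similarity: $J(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$

Cosine similarity: $\cos(v_i, v_j) = \frac{|N_i \cap N_j|}{\sqrt{|N_i| \cdot |N_j|}}$

where $N_i$ denotes the set of neighbors of node $v_i$.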
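The sketch below computes vertex similarity from neighbor sets. It assumes the standard neighborhood-overlap definitions: Jaccard is $|N_i \cap N_j| / |N_i \cup N_j|$ and cosine is $|N_i \cap N_j| / \sqrt{|N_i||N_j|}$, where a node is not part of its own neighborhood. It also adds a strict structural-equivalence test (identical neighbor sets). The slides' example graph did not survive, so both graphs below are hypothetical.

```python
import math


def jaccard(adj, i, j):
    """|N_i & N_j| / |N_i | N_j| over neighbor sets."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


def cosine(adj, i, j):
    """|N_i & N_j| / sqrt(|N_i| * |N_j|) over neighbor sets."""
    if not adj[i] or not adj[j]:
        return 0.0
    return len(adj[i] & adj[j]) / math.sqrt(len(adj[i]) * len(adj[j]))


def structurally_equivalent(adj, i, j):
    """Strict test: both nodes are connected to exactly the same set of actors."""
    return adj[i] == adj[j]


# Hypothetical graph stored as {node: set of neighbors}.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
print(jaccard(adj, 1, 4))  # 0.5: shared neighbor {2}, union {2, 3}
print(cosine(adj, 1, 4))   # about 0.707 = 1 / sqrt(2 * 1)

# Hypothetical 4-cycle 1-2-3-4-1: nodes 1 and 3 share the neighbors {2, 4}.
square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(structurally_equivalent(square, 1, 3))  # True
```

Either similarity can feed k-means or any similarity-based clustering, for example by using 1 minus the similarity as a distance.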


Linkage Clustering

[Figure] Illustration of three cluster-to-cluster dissimilarity criteria. R and S are two clusters, and $N_R$ and $N_S$ are the sizes of these two clusters; $r_i \in R$ and $s_j \in S$ are the i-th object in cluster R and the j-th object in cluster S, respectively.
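The figure itself is lost. Single, complete, and average linkage are a common trio of cluster-to-cluster criteria, so this sketch assumes those are the three illustrated. Each one compares every $r_i \in R$ with every $s_j \in S$ under a user-supplied point distance. The function names and the one-dimensional example points are hypothetical.

```python
from itertools import product


def single_linkage(R, S, dist):
    """Smallest distance between any r_i in R and s_j in S."""
    return min(dist(r, s) for r, s in product(R, S))


def complete_linkage(R, S, dist):
    """Largest distance between any r_i in R and s_j in S."""
    return max(dist(r, s) for r, s in product(R, S))


def average_linkage(R, S, dist):
    """Mean of all N_R * N_S pairwise distances."""
    return sum(dist(r, s) for r, s in product(R, S)) / (len(R) * len(S))


def absdiff(a, b):
    return abs(a - b)


R, S = [0, 1], [4, 6]
print(single_linkage(R, S, absdiff))    # 3
print(complete_linkage(R, S, absdiff))  # 6
print(average_linkage(R, S, absdiff))   # 4.5 = (4 + 6 + 3 + 5) / 4
```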


Greedy on Similarity

Merge the pair of clusters whose distance is minimum (i.e., the most similar pair).

The procedure finds n partitions, each with a different number of clusters, from n down to 1.

At each iteration step, one needs to compute the variation $\Delta Q$ of modularity given by the merger of any two communities of the running partition, so that one can choose the best merger.
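The sketch below implements the similarity-based greedy procedure: start from n singleton clusters and repeatedly merge the closest pair until one cluster remains. It records all n partitions along the way. Single linkage is assumed as the cluster distance, and the function name `agglomerate` and the example points are hypothetical. For graphs, `dist` could for instance be 1 minus a vertex similarity.

```python
from itertools import combinations


def agglomerate(items, dist):
    """Greedy agglomerative clustering with single linkage.

    Returns the n partitions visited, with n, n-1, ..., 1 clusters.
    """
    clusters = [frozenset([x]) for x in items]
    partitions = [list(clusters)]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(x, y) for x in a for y in b)

    while len(clusters) > 1:
        # Merge the pair whose distance is minimum (i.e., the most similar pair).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        partitions.append(list(clusters))
    return partitions


points = [0, 1, 5, 6, 20]
for partition in agglomerate(points, lambda a, b: abs(a - b)):
    print(sorted(sorted(c) for c in partition))
```

Choosing one level of this hierarchy (for example, the one with three clusters) gives a flat clustering. The modularity-driven variant replaces "minimum distance" with "largest increase in Q".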


CNM algorithm

Clauset, Newman, and Moore (CNM algorithm): “Finding community structure in very large networks”, A. Clauset, M. E. J. Newman, C. Moore, Physical Review E, 2004 (cited 351 times)

The idea of CNM is based on the greedy optimization of the quantity known as modularity.

CNM is an agglomerative hierarchical method.


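Modularity Maximization

Modularity measures the strength of a community partition by taking into account the degree distribution.

Given a network with m edges, the expected number of edges between two nodes with degrees $d_i$ and $d_j$ is $\frac{d_i d_j}{2m}$.

Strength of a community $C$: $\sum_{i \in C, j \in C} \left( A_{ij} - \frac{d_i d_j}{2m} \right)$

Modularity: $Q = \frac{1}{2m} \sum_{l=1}^{k} \sum_{i \in C_l, j \in C_l} \left( A_{ij} - \frac{d_i d_j}{2m} \right)$, where $A$ is the adjacency matrix and $C_1, \dots, C_k$ are the communities.

A larger value indicates a better community structure.

Example (given the degree distribution in the figure): the expected number of edges between nodes 1 and 2 is $3 \times 2 / (2 \times 14) = 3/14$.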
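The sketch below evaluates the modularity formula term by term. It sums $A_{ij} - d_i d_j / 2m$ over ordered pairs inside each community (including $i = j$) and divides by $2m$. It also reproduces the slide's expected-edge arithmetic with exact fractions. The graph is hypothetical (two triangles joined by one edge) and is assumed to be simple and undirected.

```python
from fractions import Fraction
from itertools import product


def expected_edges(d_i, d_j, m):
    """Expected number of edges between nodes of degrees d_i and d_j: d_i d_j / 2m."""
    return Fraction(d_i * d_j, 2 * m)


def modularity(adj, communities):
    """Q = 1/(2m) * sum over communities C of sum_{i,j in C} (A_ij - d_i d_j / 2m)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    two_m = sum(degree.values())  # each undirected edge is counted twice
    total = 0.0
    for community in communities:
        for i, j in product(community, repeat=2):  # ordered pairs, including i == j
            a_ij = 1.0 if j in adj[i] else 0.0
            total += a_ij - degree[i] * degree[j] / two_m
    return total / two_m


print(expected_edges(3, 2, 14))  # 3/14, matching the example above

# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge (3, 4); m = 7.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(modularity(adj, [{1, 2, 3}, {4, 5, 6}]))  # about 0.357 (= 5/14)
print(modularity(adj, [set(adj)]))              # close to 0: one big community
```

Putting the whole graph in one community always gives Q = 0, because the observed and expected edge counts then cancel exactly. Positive Q means more internal edges than the degree-based expectation.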


CNM

Initially, we view every single node as a community. We then repeatedly join together the two communities whose amalgamation produces the largest increase in Q.

For a network of n vertices, after n − 1 such joins we are left with a single community and the algorithm stops.

The entire process can be represented as a tree whose leaves are the vertices of the original network and whose internal nodes correspond to the joins.
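Below is a deliberately naive sketch of this greedy join process for a connected graph. It is not the CNM paper's efficient implementation, which keeps ΔQ values in sparse structures with heaps. Here, every adjacent pair of communities is rescanned at each step. The increase is computed as $\Delta Q = L_{ab}/m - d_a d_b / (2m^2)$, where $L_{ab}$ is the number of edges between communities a and b and $d_a, d_b$ are their total degrees; this follows from the modularity definition. The function name and the toy graph are hypothetical.

```python
from itertools import combinations


def greedy_modularity(edges):
    """Naive greedy modularity agglomeration in the spirit of CNM (no heaps).

    Returns a list of (partition, Q) pairs, one per level of the dendrogram.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    communities = [frozenset([v]) for v in sorted(degree)]
    # Singletons have no internal edges: Q = -sum_v (d_v / 2m)^2.
    q = -sum((d / (2 * m)) ** 2 for d in degree.values())
    history = [(list(communities), q)]
    while len(communities) > 1:
        best = None
        for a, b in combinations(range(len(communities)), 2):
            ca, cb = communities[a], communities[b]
            l_ab = sum(1 for u, v in edges
                       if (u in ca and v in cb) or (u in cb and v in ca))
            if l_ab == 0:
                continue  # only communities joined by an edge are candidates
            d_a = sum(degree[v] for v in ca)
            d_b = sum(degree[v] for v in cb)
            delta_q = l_ab / m - d_a * d_b / (2 * m * m)
            if best is None or delta_q > best[0]:
                best = (delta_q, a, b)
        if best is None:
            break  # disconnected graph: no adjacent pair is left to join
        delta_q, a, b = best
        merged = communities[a] | communities[b]
        communities = [c for k, c in enumerate(communities) if k not in (a, b)]
        communities.append(merged)
        q += delta_q
        history.append((list(communities), q))
    return history


# Hypothetical graph: two triangles joined by the edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
history = greedy_modularity(edges)
best_partition, best_q = max(history, key=lambda step: step[1])
print(sorted(sorted(c) for c in best_partition), round(best_q, 3))  # [[1, 2, 3], [4, 5, 6]] 0.357
```

The `history` list is the dendrogram read level by level: n partitions for n vertices. The cut with the largest Q is the usual choice of final communities.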


CNM

A dendrogram represents a hierarchical decomposition of the network into communities at all levels.


CNM algorithm

It is observed that merging communities of unbalanced sizes has a great impact on the computational efficiency of CNM.


Results


Girvan and Newman Method

Among the hierarchical methods, the algorithm of Girvan and Newman (Girvan & Newman 2002) presents an important improvement.

“Community structure in social and biological networks”, M. Girvan, M. E. J. Newman, Proceedings of the National Academy of Sciences, 2002 (cited 1302 times)

The GN method is a divisive hierarchical method.


Edge Betweenness

The strength of a tie can be measured by edge betweenness.

Edge betweenness: the number of shortest paths that pass along the edge.

The edge betweenness of e(1, 2) is 4 (= 6/2 + 1). The shortest paths from 2 to each of {4, 5, 6, 7, 8, 9} pass along either e(1, 2) or e(2, 3) with equal weight, contributing 6 × 1/2 = 3. In addition, e(1, 2) is itself the shortest path between 1 and 2.


Edge Betweenness

Girvan and Newman use a metric called edge betweenness, a measure that favors edges lying between communities and disfavors those lying inside communities.


Edge Betweenness

Define the edge betweenness of an edge as the number of shortest paths between pairs of vertices that run along it. If there is more than one shortest path between a pair of vertices, each path is given equal weight such that the total weight of all the paths is unity.

If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges.
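The definition above can be computed without enumerating paths by using Brandes-style dependency accumulation, a technique substituted here because the slides do not say how betweenness is computed. Each vertex in turn is treated as a BFS source. Shortest-path counts are gathered during the BFS, and path weight is then pushed back from the farthest vertices. As in the definition, each unordered pair of vertices contributes a total weight of 1, split equally among its shortest paths. The graph representation and example are assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph {node: set(neighbors)}.

    Every unordered pair of vertices contributes a total weight of 1, split
    equally among all of its shortest paths.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: distances, shortest-path counts, predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back toward s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {edge: value / 2 for edge, value in score.items()}


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
bet = edge_betweenness(adj)
print(bet[frozenset((3, 4))])  # 9.0: all 3 x 3 cross-community pairs use the bridge
print(bet[frozenset((1, 3))])  # 4.0: pair {1, 3} plus pairs {1, 4}, {1, 5}, {1, 6}
```

As the slides predict, the single inter-community edge gets by far the highest score.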


Edge Betweenness

Thus, the edges connecting communities will have high edge betweenness.

By removing these edges, we separate groups from one another and so reveal the underlying community structure of the graph.


Procedure

The algorithm is stated as follows:

1. Calculate the betweenness for all edges in the network.

2. Remove the edge with the highest betweenness.

3. Recalculate betweennesses for all edges affected by the removal.

4. Repeat from step 2 until no edges remain.
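The four steps above can be sketched as a loop, with one simplification: step 3 recomputes betweenness for all remaining edges rather than only the affected ones. The loop records a new level of the divisive hierarchy each time a removal splits a component. Betweenness is computed with Brandes-style accumulation (an assumed technique), ties are broken arbitrarily, and all function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Unweighted, undirected edge betweenness; each vertex pair has total weight 1."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:  # BFS from s counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # push path weight back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return {edge: value / 2 for edge, value in score.items()}


def connected_components(adj):
    """Return the connected components as a list of frozensets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        parts.append(frozenset(comp))
    return parts


def girvan_newman(edges):
    """Return the successive partitions (connected components) produced by GN."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = [connected_components(adj)]
    while any(adj.values()):                        # step 4: until no edges remain
        scores = edge_betweenness(adj)              # steps 1 and 3 (full recompute)
        u, v = tuple(max(scores, key=scores.get))   # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        parts = connected_components(adj)
        if len(parts) > len(levels[-1]):            # record only when a split occurs
            levels.append(parts)
    return levels


# Hypothetical graph: two triangles joined by the edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
levels = girvan_newman(edges)
print(sorted(sorted(c) for c in levels[1]))  # [[1, 2, 3], [4, 5, 6]]
```

The first edge removed is the bridge (3, 4), so the first split already separates the two triangles. Later levels break them down until every vertex is on its own.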


Divisive clustering based on edge betweenness

After removing e(4, 5), the betweenness of e(4, 6) becomes 20, which is the highest.

After removing e(4, 6), the edge e(7, 9) has the highest betweenness value, 4, and should be removed.

[Figure: initial betweenness values]

Idea: progressively remove edges with the highest betweenness.


Procedure


Procedure


Procedure


Hot Directions


Hot Directions

Discovery of Overlapping Communities

Incremental algorithms

Topic-sensitive Community Discovery

Local Community Discovery

Community Discovery in Multi-relational Network


Q&A

Thanks!