
Page 1: Finding & Evaluating Community Structure in Networks

LMU Lehr- und Forschungseinheit für Datenbanksysteme

Finding & Evaluating Community Structure in Networks

Seminar: Mining Volatile Data

November 7th 2012

Sigrid Zänkert

Page 2: Finding & Evaluating Community Structure in Networks

11/07/2012 | Finding & Evaluating Community Structure in Networks – Sigrid Zänkert

Executive Summary

Motivation

Methodology

Results

Evaluation

Page 3: Finding & Evaluating Community Structure in Networks

Section: Executive Summary

Page 4: Finding & Evaluating Community Structure in Networks

Executive Summary

"Finding and evaluating community structure in networks" by Mark E. J. Newman and Michelle Girvan

Physical Review E 69, 026113 (The American Physical Society, 2004)

Set of algorithms for discovering community structure in networks

Quality measure for the detected community structure

Demonstration of effectiveness with both real-world and computer-generated network data

Page 5: Finding & Evaluating Community Structure in Networks

Section: Motivation

Page 6: Finding & Evaluating Community Structure in Networks

What is a Community?


• Community structure is the natural division of network nodes into densely connected subgroups

• Communities are densely connected internally, with only sparse connections between different communities
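
As a rough illustration of this definition, the sketch below counts how many edges fall inside communities and how many run between them. The toy network (two triangles joined by a single edge), the community labels, and the function name `edge_split` are assumptions made for this example and do not come from the slides.

```python
def edge_split(edges, community_of):
    """Return (internal, external) edge counts for a given community assignment."""
    internal = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return internal, len(edges) - internal


# Hypothetical toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

internal, external = edge_split(edges, community_of)
print(f"edges within communities: {internal}, between communities: {external}")
```

A division that matches the definition keeps most edges internal and leaves only a few between groups.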

Page 7: Finding & Evaluating Community Structure in Networks

Community Structure in Networks


• Detecting community structures within networks: important research topic in statistical physics, applied mathematics, computer science and sociology

• Versatile application of "network ideas": World Wide Web, epidemiology, citation, collaboration and many more

• Challenge: Find algorithms that generate valid results within reasonable computing time

Page 8: Finding & Evaluating Community Structure in Networks

Section: Methodology

Page 9: Finding & Evaluating Community Structure in Networks

Hierarchical Clustering

Agglomerative (addition of edges)

• Similarity between vertex pairs

• Edges are added based on that similarity

• These algorithms often find only the cores of communities, or even produce incorrect community structures

Divisive (removal of edges)

• Start with the complete network of interest

• Repeated removal of the edges between the least similar connected pairs of vertices
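
The agglomerative variant described above can be sketched as follows. The slides do not name a similarity measure, so the Jaccard overlap of closed neighbourhoods used here is an assumption, as are the function names and the toy network. Pairs are joined in order of decreasing similarity until the requested number of groups remains.

```python
from itertools import combinations


def jaccard(adj, u, v):
    """Similarity of two vertices: overlap of their closed neighbourhoods (assumed measure)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def agglomerate(adj, n_groups):
    """Join the most similar vertex pairs first until n_groups groups remain."""
    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = sorted(combinations(sorted(adj), 2),
                   key=lambda p: jaccard(adj, *p), reverse=True)
    groups = len(adj)
    for u, v in pairs:
        if groups <= n_groups:
            break
        ru, rv = find(u), find(v)
        if ru != rv:          # adding this "edge" merges two groups
            parent[rv] = ru
            groups -= 1

    result = {}
    for v in adj:
        result.setdefault(find(v), set()).add(v)
    return list(result.values())


# Hypothetical toy network: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(agglomerate(adj, 2))
```

Stopping at a fixed number of groups is a simplification; a full hierarchical clustering would record every merge as a dendrogram.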

Page 10: Finding & Evaluating Community Structure in Networks

An Iterative and Divisive Approach

Remove the edges that are most "between", because they connect the different communities.

1. Calculate the betweenness score for each edge

2. Remove the edge with the highest betweenness score

3. Perform an analysis of the network's components

Recalculate: after each removal, return to step 1 and recompute the betweenness scores.
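
The loop above can be sketched in plain Python. This is a minimal illustration under assumptions, not the authors' implementation: it uses shortest-path betweenness on an unweighted, undirected graph without self-loops, stores the graph as a dict of neighbour sets, and stops as soon as the number of components increases. The function names and the toy network are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge, accumulated from one BFS per vertex."""
    score = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        paths = defaultdict(float)   # number of shortest paths from s
        paths[s] = 1.0
        preds = defaultdict(list)    # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        flow = defaultdict(float)
        for w in reversed(order):    # farthest vertices first
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + flow[w])
                score[frozenset((v, w))] += share
                flow[v] += share
    # Every vertex pair was visited from both ends, so halve the totals.
    return {edge: total / 2.0 for edge, total in score.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def split_once(adj):
    """Steps 1-3 with recalculation, repeated until one more component appears."""
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)            # step 1, recalculated every round
        if not scores:                            # no edges left to remove
            break
        u, v = max(scores, key=scores.get)        # step 2
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)                        # step 3


# Hypothetical toy network: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(split_once(adj))
```

Calling `split_once` again on each resulting component would continue the division and build up the full hierarchy.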

Page 11: Finding & Evaluating Community Structure in Networks

Example


Page 12: Finding & Evaluating Community Structure in Networks

Calculation of Edge-Betweenness


1. Shortest-path betweenness: number of shortest paths running along an edge

2. Random-walk betweenness: net number of times a random walk passes along an edge

3. Current-flow betweenness: amount of current flowing from source to sink along an edge
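
To make the first definition concrete, the sketch below counts shortest paths per edge directly from the definition by enumerating every shortest path between every vertex pair. It is deliberately brute force and only suitable for tiny graphs. When a pair has several shortest paths, each path is given an equal share of one unit, which is the usual convention. The function names and the toy network are assumptions for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def shortest_paths(adj, s, t):
    """All shortest paths from s to t as vertex lists (BFS, then backtracking)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if t not in dist:
        return []

    def back(v):
        if v == s:
            return [[s]]
        return [p + [v]
                for u in adj[v] if dist.get(u) == dist[v] - 1
                for p in back(u)]

    return back(t)


def betweenness_by_definition(adj):
    """Sum, over all vertex pairs, of the share of shortest paths using each edge."""
    score = defaultdict(float)
    for s, t in combinations(sorted(adj), 2):
        paths = shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                score[frozenset((u, v))] += 1.0 / len(paths)
    return dict(score)


# Hypothetical toy network: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = betweenness_by_definition(adj)
for edge, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(edge), value)
```

On this network the edge joining the two triangles carries every path between the two groups, so it receives the highest score.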

Page 13: Finding & Evaluating Community Structure in Networks

Calculation of Edge-Betweenness

Random-walk betweenness (2) and current-flow betweenness (3):

• Produce exactly the same output

• Only practical for small graphs because of poor performance

Page 14: Finding & Evaluating Community Structure in Networks

Calculation of Edge-Betweenness

Shortest-path betweenness (1):

• Recommended by the authors

• Runs in O(n³) time on sparse graphs

• Usable for networks of up to 10,000 vertices

Page 15: Finding & Evaluating Community Structure in Networks

Modularity


• Modularity Q is an objective quality measure for each split of a network into communities

• Measures the degree of correlation between the probability of having an edge joining two sites and the fact that the sites belong to the same community

• Calculated for each split while moving down a dendrogram to detect local peaks
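
As a minimal sketch of how Q can be computed, the code below uses the form Q = Σ_i (e_ii − a_i²), where e_ii is the fraction of edges inside community i and a_i is the fraction of edge ends attached to community i. The function name, the variable names, and the toy network are assumptions for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Q = sum over communities of (e_ii - a_i**2) for an undirected edge list."""
    m = len(edges)
    internal = defaultdict(float)   # e_ii: fraction of edges inside community i
    ends = defaultdict(float)       # a_i: fraction of edge ends attached to community i
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            internal[cu] += 1.0 / m
        ends[cu] += 0.5 / m
        ends[cv] += 0.5 / m
    return sum(internal[c] - ends[c] ** 2 for c in ends)


# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "all" for v in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(f"two triangles: Q = {q_split:.3f}, single community: Q = {q_single:.3f}")
```

Putting every vertex into one community gives Q = 0, so a positive value indicates more internal edges than a random arrangement of the same edge ends would produce.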

Page 16: Finding & Evaluating Community Structure in Networks

Modularity

Values approaching Q = 1 (the maximum) indicate strong community structure; in practice, values typically fall between about 0.3 and 0.7

Page 17: Finding & Evaluating Community Structure in Networks

Modularity


Local Peak

Page 18: Finding & Evaluating Community Structure in Networks

Section: Results

Page 19: Finding & Evaluating Community Structure in Networks

Results

• Shortest-path algorithm delivers quality results in computer-generated and real-life network data

• Application: detection of community structures, analysis and visualization of difficult and non-obvious network structures

• Examples:

1. Collaboration network of scientists

2. Bottlenose dolphins

Page 20: Finding & Evaluating Community Structure in Networks

Collaboration Network of Scientists


Page 21: Finding & Evaluating Community Structure in Networks

Collaboration Network of Scientists


Page 22: Finding & Evaluating Community Structure in Networks

Collaboration Network of Scientists


Page 23: Finding & Evaluating Community Structure in Networks

Bottlenose Dolphins

• Split corresponds to a known division of the dolphin community

• Q = 0.52

• Closely related to the evolution of communities in human social networks

Page 24: Finding & Evaluating Community Structure in Networks

Section: Evaluation

Page 25: Finding & Evaluating Community Structure in Networks

Evaluation

• Newman & Girvan present the first satisfactory and feasible solution for the task of finding community structures in networks

• Foundation for many similar approaches and variations of the presented set of algorithms

Problem

• Computing time of O(n³) is only reasonable for networks of up to 10,000 vertices

• In reality, the networks of interest are much bigger

Possible Solutions

• Parallelization

• Betweenness calculation for certain subsets

• Greedy optimization of modularity

• Stochastic blockmodels (2011)
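
One of the listed alternatives, greedy optimization of modularity, can be sketched as follows. This is a naive illustration under assumptions, not an efficient or published implementation: every vertex starts as its own community, and the pair of connected communities whose merge raises Q the most is joined until no merge improves Q. The gain of merging communities i and j is taken as 2(e_ij − a_i a_j), with e_ij half the fraction of edges between them and a_i the fraction of edge ends in community i. The function name and toy network are hypothetical.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Repeatedly merge the pair of communities with the largest positive gain in Q."""
    m = len(edges)
    label = {}                                   # vertex -> community id
    for u, v in edges:
        label.setdefault(u, u)
        label.setdefault(v, v)

    while True:
        e = defaultdict(float)                   # e[(i, j)] as defined in the lead-in
        a = defaultdict(float)                   # a[i]: fraction of edge ends in i
        for u, v in edges:
            i, j = label[u], label[v]
            e[(i, j)] += 0.5 / m
            e[(j, i)] += 0.5 / m
            a[i] += 0.5 / m
            a[j] += 0.5 / m

        best_gain, best_pair = 0.0, None
        for (i, j), e_ij in e.items():
            if i != j:
                gain = 2.0 * (e_ij - a[i] * a[j])
                if gain > best_gain + 1e-12:     # tolerance guards against float noise
                    best_gain, best_pair = gain, (i, j)

        if best_pair is None:                    # no merge increases Q any further
            break
        keep, drop = best_pair
        for v in label:
            if label[v] == drop:
                label[v] = keep

    groups = defaultdict(set)
    for v, c in label.items():
        groups[c].add(v)
    return list(groups.values())


# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(greedy_modularity(edges))
```

This version rebuilds all counts after every merge, which keeps the code short but gives up the speed advantage that motivates the approach; a practical implementation would update the counts incrementally.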

Page 26: Finding & Evaluating Community Structure in Networks

Thank you very much!

…Questions?


Page 27: Finding & Evaluating Community Structure in Networks

Image Sources

Slide 01+06: http://www.nieuwsmarkt.nl/wp-content/uploads/2011/09/comm.jpg

Slide 06: Newman, M. E. J. and Girvan, M. (2004): Finding and evaluating community structure in networks. In: Physical Review E 69, 026113 (2004), p. 11.

Slide 10: Newman, M. E. J. and Girvan, M. (2004): Finding and evaluating community structure in networks. In: Physical Review E 69, 026113 (2004), p. 2.

Slide 16+17: Newman, M. E. J. and Girvan, M. (2004): Finding and evaluating community structure in networks. In: Physical Review E 69, 026113 (2004), p. 8.

Slide 20-22: Newman, M. E. J. and Girvan, M. (2004): Finding and evaluating community structure in networks. In: Physical Review E 69, 026113 (2004), p. 11.

Slide 23: http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2008/04/17/dolphin11a.jpg

Newman, M. E. J. and Girvan, M. (2004): Finding and evaluating community structure in networks. In: Physical Review E 69, 026113 (2004), p. 12.

Slide 26: http://www.businesscartoons.co.uk/shop/images/uploads/3803bwc.gif

Slide 28: Newman, M. E. J. and Girvan, M. (2004): Finding and evaluating community structure in networks. In: Physical Review E 69, 026113 (2004), p. 10.

Page 28: Finding & Evaluating Community Structure in Networks

Backup – Example Application Without Recalculation
