Graph Based Clustering
DESCRIPTION
AACIMP 2011 Summer School. Operational Research Stream. Lecture by Erik Kropat.
TRANSCRIPT
![Page 1: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/1.jpg)
Graph Based Clustering
Summer School
“Achievements and Applications of Contemporary Informatics,
Mathematics and Physics” (AACIMP 2011)
August 8-20, 2011, Kiev, Ukraine
Erik Kropat
University of the Bundeswehr Munich
Institute for Theoretical Computer Science, Mathematics and Operations Research
Neubiberg, Germany
![Page 2: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/2.jpg)
Real World Networks
• Biological Networks
− Gene regulatory networks
− Metabolic networks
− Neural networks
− Food webs
• Technological Networks
− Telecommunication networks
− Internet
− Power grids
Figures: food web, power grid
![Page 3: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/3.jpg)
Real World Networks
• Social Networks
− Communication networks
− Organizational networks
− Social media
− Online communities
• Economic Networks
− Financial market networks
− Trade networks
− Collaboration networks
Figures: social networks, economic networks
Source: Frank Schweitzer et al., “Economic Networks: The New Challenges,” Science 325, no. 5939 (July 24, 2009): 422-425.
![Page 4: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/4.jpg)
Graph Theory
• Graph theory can provide more detailed information about the inner structure of a data set in terms of
− cliques (subsets of nodes in which every pair of nodes is connected)
− clusters (highly connected groups of nodes)
− centrality (important nodes, hubs)
− outliers (unimportant nodes)
− . . .
• Applications
− social network analysis
− diffusion of information
− spreading of diseases or rumours
⇒ marketing campaigns, viral marketing, social network advertising
![Page 5: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/5.jpg)
Graph-Based Clustering
• A collection of widely used clustering algorithms that are based on graph theory.
• They organize the information in large datasets so that users can access the information they need faster.
![Page 6: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/6.jpg)
Idea
• Objects are represented as nodes in a complete or connected graph.
• Assign a weight to each branch (edge) between two nodes x and y.
The weight is defined by the distance d(x,y) between the nodes.
Diagram: distance between objects, distance between clusters, clustering
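The idea of weighting each branch by the distance d(x,y) can be sketched in a few lines of Python. This is only an illustration: the function name `complete_distance_graph`, the sample coordinates, and the choice of the Euclidean distance are assumptions and do not come from the slides.

```python
import math
from itertools import combinations


def complete_distance_graph(points):
    """Return {(x, y): d(x, y)} for every unordered pair of objects.

    Assumption: d is the Euclidean distance; any other distance works too.
    """
    return {
        (x, y): math.dist(points[x], points[y])
        for x, y in combinations(sorted(points), 2)
    }


# Hypothetical objects with 2-D coordinates (not taken from the slides).
points = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (6.0, 8.0)}
graph = complete_distance_graph(points)

for (x, y), weight in graph.items():
    print(f"branch ({x},{y}) has weight {weight}")
```

Every pair of objects gets one branch, so n objects produce n(n-1)/2 weighted branches.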
![Page 7: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/7.jpg)
Idea
Figure: graph, minimal spanning tree, clusters
![Page 8: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/8.jpg)
Graph Based Clustering
Hierarchical method
(1) Determine a minimal spanning tree (MST)
(2) Delete branches iteratively
Each new connected component is a cluster.
Figure: example graph with branch weights 1, 3, 5, 8, 4, 6
![Page 9: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/9.jpg)
Minimal Spanning Trees
![Page 10: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/10.jpg)
Minimal Spanning Tree
A minimal spanning tree of a connected graph G = (V,E)
is a connected subgraph with minimal total weight
that contains all nodes of G and has no cycles.
Figure: graph G = (V, E) with nodes a, b, c, d and branch weights 1, 3, 5, 8, 4, 6, shown next to its minimal spanning tree
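The structural part of this definition (contains all nodes, connected, no cycles) can be checked mechanically. The sketch below is an assumption-laden illustration: the helper name `is_spanning_tree` is hypothetical, and it relies on the standard fact that a connected graph on |V| nodes with exactly |V| - 1 branches has no cycles. It does not check minimality of the weight.

```python
def is_spanning_tree(nodes, tree_edges):
    """True if tree_edges connect all nodes without forming a cycle."""
    # A tree on |V| nodes has exactly |V| - 1 branches.
    if len(tree_edges) != len(nodes) - 1:
        return False
    neighbours = {v: set() for v in nodes}
    for x, y in tree_edges:
        neighbours[x].add(y)
        neighbours[y].add(x)
    # Depth-first search from an arbitrary node to test connectivity.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbours[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == set(nodes)


nodes = ["a", "b", "c", "d"]
print(is_spanning_tree(nodes, [("a", "b"), ("a", "d"), ("b", "c")]))  # a tree
print(is_spanning_tree(nodes, [("a", "b"), ("a", "d"), ("b", "d")]))  # a cycle, c is missing
```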
![Page 11: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/11.jpg)
Minimal spanning trees can be calculated with...
(1) Prim’s algorithm.
(2) Kruskal’s algorithm.
Figure: example graph with nodes a, b, c, d and branch weights 1, 3, 5, 8, 4, 6
![Page 12: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/12.jpg)
Example – Prim’s Algorithm
Figure: example graph with nodes a, b, c, d and branch weights 1, 3, 5, 8, 4, 6
Set VT = {a}, ET = { }.
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
VT = {a,b} and ET = { (a,b) }.
![Page 13: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/13.jpg)
Example – Prim’s Algorithm
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
VT = {a,b,d} and ET = { (a,b), (a,d) }.
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
VT = {a,b,c,d} and ET = { (a,b), (a,d), (b,c) }.
Figure: resulting minimal spanning tree with the branches (a,b), (a,d), (b,c)
![Page 14: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/14.jpg)
Prim’s Algorithm
INPUT: Weighted graph G = (V, E), undirected and connected
OUTPUT: Minimal spanning tree T = (VT, ET)
(1) Set VT = {v}, ET = { }, where v is an arbitrary node from V (starting point).
(2) REPEAT
(3) Choose an edge (a,b) with minimal weight such that a ∈ VT and b ∉ VT.
(4) Set VT = VT ∪ {b} and ET = ET ∪ { (a,b) }.
(5) UNTIL VT = V
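A direct, unoptimized Python transcription of these steps might look as follows. It is a sketch, not the lecturer's implementation. The function name `prim_mst` and the dictionary representation are assumptions. The assignment of the weights 1, 3, 4, 5, 6, 8 to particular branches is also an assumption, chosen so that the selection order matches the example (a,b), (a,d), (b,c); the transcript does not say which branch carries which weight.

```python
def prim_mst(nodes, weights, start):
    """Prim's algorithm; weights is {(x, y): w} for an undirected, connected graph."""
    in_tree = {start}            # VT
    tree_edges = []              # ET, stored as (x, y, w)
    while in_tree != set(nodes):
        # Candidate branches have exactly one endpoint inside the tree.
        candidates = [
            (w, x, y)
            for (x, y), w in weights.items()
            if (x in in_tree) != (y in in_tree)
        ]
        # min() raises ValueError here if the graph is not connected.
        w, x, y = min(candidates)
        tree_edges.append((x, y, w))
        in_tree.update((x, y))
    return tree_edges


# Assumed weight assignment (hypothetical, see the note above).
weights = {
    ("a", "b"): 1, ("a", "d"): 3, ("b", "c"): 4,
    ("b", "d"): 5, ("a", "c"): 6, ("c", "d"): 8,
}
mst = prim_mst(["a", "b", "c", "d"], weights, start="a")
print(mst)
```

Scanning all branches in every round costs O(|V|·|E|); a priority queue would be the usual refinement for large graphs.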
![Page 15: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/15.jpg)
Kruskal’s Algorithm
INPUT: Weighted graph G = (V, E), undirected and connected
OUTPUT: Minimal spanning tree T = (VT, ET)
(1) Set VT = V, ET = { }, H = E.
(2) Initialize a queue to contain all edges in G, using the weights in ascending order as keys.
(3) WHILE H ≠ { }
(4) Choose an edge e ∈ H with minimal weight.
(5) Set H = H \ {e}.
(6) If (VT, ET ∪ {e}) has no cycles, then ET = ET ∪ {e}.
(7) END
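The cycle test in step (6) is commonly implemented with a union-find (disjoint-set) structure: an edge closes a cycle exactly when both endpoints already lie in the same component. The sketch below makes that substitution explicit; it is one possible realization, not necessarily the one intended in the lecture. The name `kruskal_mst` and the weight assignment are assumptions for illustration.

```python
def kruskal_mst(nodes, weights):
    """Kruskal's algorithm; weights is {(x, y): w} for an undirected graph."""
    parent = {v: v for v in nodes}   # union-find forest, one set per node

    def find(v):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree_edges = []
    # Process all edges in ascending order of weight (the queue H).
    for (x, y), w in sorted(weights.items(), key=lambda item: item[1]):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:         # adding (x, y) creates no cycle
            parent[root_x] = root_y
            tree_edges.append((x, y, w))
    return tree_edges


# Assumed weight assignment (hypothetical; not given in the transcript).
weights = {
    ("a", "b"): 1, ("a", "d"): 3, ("b", "c"): 4,
    ("b", "d"): 5, ("a", "c"): 6, ("c", "d"): 8,
}
mst = kruskal_mst(["a", "b", "c", "d"], weights)
print(mst)
```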
![Page 16: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/16.jpg)
Branch Deletion
![Page 17: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/17.jpg)
Delete Branches - Different Strategies
(1) Delete the branch with maximum weight.
(2) Delete inconsistent branches.
(3) Delete by analysis of weights.
![Page 18: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/18.jpg)
(1) Delete the branch with maximum weight
• In each step, create two new clusters by deleting the branch with maximum weight.
• Repeat until the given number of clusters is reached.
Figure: minimal spanning tree with branch weights 2, 2, 6, 3, 4, 2, 2, 2
![Page 19: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/19.jpg)
Example: Delete the branch with maximum weight
Figure: minimal spanning tree
Ordered weights of branches: 6, 4, 3, 2, 2, 2, 2, 2.
![Page 20: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/20.jpg)
Example: Delete the branch with maximum weight
Ordered weights of branches: 6, 4, 3, 2, 2, 2, 2, 2.
Step 1: Delete branch (weight 6) ⇒ 2 clusters
![Page 21: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/21.jpg)
Example: Delete the branch with maximum weight
Ordered weights of branches: 6, 4, 3, 2, 2, 2, 2, 2.
Step 1: Delete branch (weight 6) ⇒ 2 clusters
Step 2: Delete branch (weight 4) ⇒ 3 clusters
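The two deletion steps can be reproduced with a short Python sketch. The shape of the tree is not recoverable from the transcript, so the code assumes a simple path of nine hypothetical nodes n0 to n8 carrying the weights 2, 2, 6, 3, 4, 2, 2, 2; the function names are illustrative as well.

```python
def components(nodes, edges):
    """Connected components of an undirected graph given as (x, y, w) triples."""
    neighbours = {v: set() for v in nodes}
    for x, y, _ in edges:
        neighbours[x].add(y)
        neighbours[y].add(x)
    seen, result = set(), []
    for v in nodes:
        if v in seen:
            continue
        cluster, stack = {v}, [v]
        while stack:
            for u in neighbours[stack.pop()] - cluster:
                cluster.add(u)
                stack.append(u)
        seen |= cluster
        result.append(sorted(cluster))
    return result


def cluster_by_max_weight(nodes, mst_edges, k):
    """Delete the k - 1 heaviest MST branches, which leaves k clusters."""
    by_weight = sorted(mst_edges, key=lambda edge: edge[2])
    kept = by_weight[: len(mst_edges) - (k - 1)]
    return components(nodes, kept)


# Assumed tree shape: a path n0 - n1 - ... - n8 (hypothetical).
nodes = [f"n{i}" for i in range(9)]
path_weights = [2, 2, 6, 3, 4, 2, 2, 2]
mst = [(nodes[i], nodes[i + 1], w) for i, w in enumerate(path_weights)]

two = cluster_by_max_weight(nodes, mst, 2)    # branch with weight 6 removed
three = cluster_by_max_weight(nodes, mst, 3)  # branches with weights 6 and 4 removed
print(len(two), len(three))
```

Removing one branch from a tree always splits exactly one component in two, which is why k - 1 deletions yield k clusters.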
![Page 22: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/22.jpg)
(2) Delete inconsistent branches
• A branch e is inconsistent if its weight $d_e$ is (much) larger than a reference value $\bar{d}_e$.
• The reference value $\bar{d}_e$ can be defined as the average weight of all branches adjacent to e.
Example: branch e has weight 6 and three adjacent branches with weights 3, 2 and 1.
$\bar{d}_e = \frac{3 + 2 + 1}{3} = 2$
$d_e = 6 > 2 = \bar{d}_e$ ⇒ e is inconsistent
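The inconsistency rule can be sketched in Python as follows. The node names u, v, p, q, r and the way the three adjacent branches attach to e are assumptions, since the figure is not part of the transcript. The `factor` parameter is also hypothetical: it is one way to express "(much) larger", with 1.0 meaning plain "larger".

```python
def reference_values(edges):
    """Average weight of the branches that share an endpoint with each branch."""
    refs = {}
    for x, y, w in edges:
        adjacent = [
            w2
            for (x2, y2, w2) in edges
            if (x2, y2) != (x, y) and {x, y} & {x2, y2}
        ]
        # A branch without adjacent branches has no reference value.
        refs[(x, y)] = sum(adjacent) / len(adjacent) if adjacent else None
    return refs


def inconsistent_branches(edges, factor=1.0):
    """Branches whose weight exceeds factor times their reference value."""
    refs = reference_values(edges)
    return [
        (x, y)
        for x, y, w in edges
        if refs[(x, y)] is not None and w > factor * refs[(x, y)]
    ]


# Branch e = (u, v) with weight 6; adjacent branches with weights 3, 2 and 1.
tree = [("u", "v", 6), ("u", "p", 3), ("u", "q", 2), ("v", "r", 1)]
refs = reference_values(tree)
print(refs[("u", "v")])
print(inconsistent_branches(tree))
```

With a larger factor, for example 4.0, the branch e would no longer count as inconsistent, which shows how strongly the result depends on the interpretation of "much larger".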
![Page 23: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/23.jpg)
(3) Delete by analysis of weights
• Analyze the weights of all branches in the MST and determine a threshold S.
• The threshold can be estimated from a histogram of the branch weights (= lengths of the branches).
• Delete every branch whose weight is higher than the threshold S.
Figure: histograms of the number of branches over the weight of branch (length of branch), with the threshold S marked
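The slides do not say how the threshold S is read off the histogram. The sketch below builds a histogram of the branch weights and then uses one possible heuristic, namely placing S in the middle of the widest gap between consecutive weights. This heuristic, the function names, and the reuse of the weights 6, 4, 3, 2, 2, 2, 2, 2 are assumptions for illustration only.

```python
from collections import Counter


def weight_histogram(weights, bin_width=1.0):
    """Count how many branch weights fall into each bin of the given width."""
    return Counter(int(w // bin_width) for w in weights)


def largest_gap_threshold(weights):
    """Assumed heuristic: S lies in the middle of the widest gap between weights."""
    ordered = sorted(set(weights))
    if len(ordered) < 2:
        return None
    _, low, high = max((b - a, a, b) for a, b in zip(ordered, ordered[1:]))
    return (low + high) / 2


weights = [6, 4, 3, 2, 2, 2, 2, 2]
histogram = weight_histogram(weights)
S = largest_gap_threshold(weights)
long_branches = [w for w in weights if w > S]   # branches to delete

print(dict(histogram))
print("threshold S =", S, "deleted weights:", long_branches)
```

In practice the threshold is often chosen by inspecting the histogram visually, so an automatic rule like this one should be treated as a starting point.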
![Page 24: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/24.jpg)
Exercise
Find a minimal spanning tree and provide a clustering of the graph by deleting all inconsistent branches.
Figure: graph with nodes a, b, c, d, e, f, g and branch weights 10, 2, 12, 4, 1, 3, 20, 8, 5, 9, 15, 6
![Page 25: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/25.jpg)
Example
Set VT = {a}, ET = { }.
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
![Page 26: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/26.jpg)
Example
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
![Page 27: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/27.jpg)
Example
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
![Page 28: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/28.jpg)
Example
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
minimal spanning tree
![Page 29: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/29.jpg)
Example
For each branch calculate the reference value
(average weight of adjacent branches)
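Figure: minimal spanning tree on nodes a, b, c, d, e, f, g with branch weights 2, 4, 1, 3, 5, 6; reference values in parentheses: (3), (3), (4.5), (3.6), (5), (4)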
![Page 30: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/30.jpg)
Example
Delete inconsistent branches
(weight is larger than the reference value)
Figure: nodes a, b, c, d, e, f, g with the remaining branches (weights 4, 1, 3; reference values (3), (3), (4))
2 clusters
Noise?
![Page 31: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/31.jpg)
Summary
![Page 32: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/32.jpg)
Summary
• In graph-based clustering, objects are represented as nodes in a complete or connected graph.
• The distance between two objects is given by the weight of the corresponding branch.
• Hierarchical method
(1) Determine a minimal spanning tree (MST)
(2) Delete branches iteratively
• Visualization of information in large datasets.
![Page 33: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/33.jpg)
Literature
• V. Kumar, M. Steinbach, P.-N. Tan
Introduction to Data Mining.
Addison Wesley, 2005.
Other work mentioned in the presentation
• J.A. Dunne, R.J. Williams, N.D. Martinez, R.A. Wood, D.H. Erwin
Compilation and Network Analyses of Cambrian Food Webs.
PLoS Biol 6(4): e102. doi:10.1371/journal.pbio.0060102
• F. Schweitzer, G. Fagiolo, D. Sornette, F. Vega-Redondo, A. Vespignani, D.R. White
Economic Networks: The New Challenges.
Science 325, no. 5939 (July 24, 2009): 422-425.
![Page 34: Graph Based Clustering](https://reader036.vdocuments.us/reader036/viewer/2022081801/5550b5edb4c90504628b4bc0/html5/thumbnails/34.jpg)
Thank you very much!