Post on 19-Dec-2015
HCS Clustering Algorithm
A Clustering Algorithm
Based on Graph Connectivity
ECS289A Modeling Gene Regulation • HCS Clustering Algorithm • Sophie Engle 2
Presentation Outline
• The Problem
• HCS Algorithm Overview
  – Main Players
  – General Algorithm
  – Properties
  – Improvements
• Conclusion
The Problem
• Clustering:
  – Group elements into subsets based on similarity between pairs of elements
• Requirements:
  – Elements in the same cluster are highly similar to each other
  – Elements in different clusters have low similarity to each other
• Challenges:
  – Large sets of data
  – Inaccurate and noisy measurements
HCS Algorithm Overview
• Highly Connected Subgraphs Algorithm
  – Uses graph theoretic techniques
• Basic Idea
  – Uses similarity information to construct a similarity graph
  – Groups elements that are highly connected with each other
HCS: Main Players
• Similarity Graph
  – Nodes correspond to elements (genes)
  – Edges connect similar elements (those whose similarity value is above some threshold)
[Figure: nodes gene1, gene2 and gene3 joined pairwise: gene1 is similar to gene2, gene1 is similar to gene3, gene2 is similar to gene3.]
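A minimal sketch of how such a similarity graph could be built, assuming the pairwise similarities are available as a dictionary. The function name, the scores, the extra element gene4 and the threshold 0.8 are all made up for illustration and do not come from the slides.

```python
from itertools import combinations

def build_similarity_graph(elements, similarity, threshold):
    """Return an adjacency dict in which an edge joins two elements
    whose similarity value is above the threshold."""
    graph = {e: set() for e in elements}
    for a, b in combinations(elements, 2):
        if similarity[frozenset((a, b))] > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical similarity scores (not taken from the slides).
genes = ["gene1", "gene2", "gene3", "gene4"]
similarity = {
    frozenset(("gene1", "gene2")): 0.92,
    frozenset(("gene1", "gene3")): 0.85,
    frozenset(("gene2", "gene3")): 0.90,
    frozenset(("gene1", "gene4")): 0.10,
    frozenset(("gene2", "gene4")): 0.30,
    frozenset(("gene3", "gene4")): 0.20,
}

graph = build_similarity_graph(genes, similarity, threshold=0.8)
print(sorted(graph["gene1"]))  # ['gene2', 'gene3']
print(sorted(graph["gene4"]))  # [] (no similarity above the threshold)
```

Raising the threshold removes edges and lowering it adds them, so the choice of threshold shapes everything the later steps see.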
• Edge Connectivity
  – Minimum number of edges whose removal results in a disconnected graph
[Figure: graph on gene1, gene2, gene3 and gene4. At least 3 edges must be removed to disconnect it, so its edge connectivity is k(G) = 3.]
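To make the definition concrete, here is a brute-force sketch that tries every way of splitting the nodes into two non-empty sides and counts the edges crossing each split. It is exponential in the number of nodes, so it only suits tiny graphs. The function name is illustrative, and the example assumes the four-gene figure is the complete graph on four nodes, which is the only four-node graph with k(G) = 3.

```python
from itertools import combinations

def edge_connectivity(nodes, edges):
    """Smallest number of edges crossing any split of the nodes into
    two non-empty sides (0 if the graph is already disconnected)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0
    first, rest = nodes[0], nodes[1:]
    best = len(edges)
    # Keep `first` on one side so that each split is examined once;
    # `size` stops short of len(rest) so the other side is never empty.
    for size in range(len(rest)):
        for others in combinations(rest, size):
            side = {first, *others}
            crossing = sum((u in side) != (v in side) for u, v in edges)
            best = min(best, crossing)
    return best

genes = ["gene1", "gene2", "gene3", "gene4"]
edges = list(combinations(genes, 2))    # every pair joined: 6 edges
print(edge_connectivity(genes, edges))  # 3
```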
• Highly Connected Subgraphs
  – Subgraphs whose edge connectivity exceeds half the number of nodes
[Figure: graph on gene1 to gene8.] Entire graph: nodes = 8, edge connectivity = 1. Not HCS, since 1 does not exceed 8 ÷ 2.
[Figure: the same graph with one subgraph highlighted.] Subgraph: nodes = 5, edge connectivity = 3. HCS, since 3 exceeds 5 ÷ 2.
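The two checks above can be replayed in a few lines. This sketch takes the node count and the edge connectivity as given numbers, and the function name is illustrative.

```python
def is_highly_connected(num_nodes, edge_conn):
    """True when the edge connectivity exceeds half the number of nodes.
    Comparing 2 * k with n keeps the test in whole numbers."""
    return 2 * edge_conn > num_nodes

print(is_highly_connected(8, 1))  # entire graph: False
print(is_highly_connected(5, 3))  # highlighted subgraph: True
```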
• Cut
  – A set of edges whose removal disconnects the graph
[Figure: a cut in the graph on gene1 to gene8.]
• Minimum Cut
  – A cut with a minimum number of edges
[Figure: a minimum cut in the graph on gene1 to gene8.]
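The slides do not say which minimum cut routine is used. As one possibility, the sketch below follows the Stoer-Wagner approach, a known deterministic method that repeatedly merges nodes and needs no source or sink. The function name and the example graph (two triangles joined by a single edge) are assumptions made for illustration, and the input is assumed to contain no self-loops.

```python
def stoer_wagner_min_cut(nodes, edges):
    """Return (cut_size, side) for a minimum cut of an undirected,
    unweighted graph with at least two nodes and no self-loops.
    `side` is the set of nodes on one side of the cut."""
    # weight[u][v] = number of original edges between merged groups u and v
    weight = {u: {} for u in nodes}
    for u, v in edges:
        weight[u][v] = weight[u].get(v, 0) + 1
        weight[v][u] = weight[v].get(u, 0) + 1
    members = {u: {u} for u in weight}  # original nodes inside each group
    best_size, best_side = None, None
    while len(weight) > 1:
        # One phase: grow a set from `start`, always adding the group
        # that is most tightly attached to the set built so far.
        start = next(iter(weight))
        attach = {u: weight[start].get(u, 0) for u in weight if u != start}
        prev, last = None, start
        while attach:
            nxt = max(attach, key=attach.get)
            cut_of_phase = attach.pop(nxt)
            for v, w in weight[nxt].items():
                if v in attach:
                    attach[v] += w
            prev, last = last, nxt
        # Separating `last` from everything else is a candidate cut.
        if best_size is None or cut_of_phase < best_size:
            best_size, best_side = cut_of_phase, set(members[last])
        # Merge `last` into `prev`, adding up parallel edges.
        for v, w in weight.pop(last).items():
            del weight[v][last]
            if v != prev:
                weight[prev][v] = weight[prev].get(v, 0) + w
                weight[v][prev] = weight[prev][v]
        members[prev] |= members.pop(last)
    return best_size, best_side

# Hypothetical example: two triangles joined by the single edge (3, 4).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
size, side = stoer_wagner_min_cut(nodes, edges)
print(size)          # 1
print(sorted(side))  # one of the two triangles
```

The returned side and its complement are the two subgraphs left once the cut edges are removed.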
HCS: Algorithm (by example)
[Figure sequence: a similarity graph on nodes 1 to 12, clustered step by step.]
• Find and remove a minimum cut
• Are the resulting subgraphs highly connected?
  – One subgraph is highly connected: it becomes Cluster 1
• Repeat the process on the non-highly connected subgraphs
• Find and remove a minimum cut
• Are the resulting subgraphs highly connected?
  – Both subgraphs are highly connected: they become Cluster 2 and Cluster 3
• Resulting clusters: Cluster 1, Cluster 2, Cluster 3
HCS: Algorithm
HCS( G ) {
  { H1, … , Ht } = MINCUT( G )
  for each Hi, i = 1, … , t {
    if k( Hi ) > n ÷ 2
      output Hi as a cluster
    else
      HCS( Hi )
  }
}
Find a minimum cut in graph G. This returns a set of subgraphs { H1, … , Ht } resulting from the removal of the cut set.
For each subgraph…
If the subgraph is highly connected, then report that subgraph as a cluster. (Note: k( Hi ) denotes the edge connectivity of graph Hi, and n denotes the number of nodes in Hi.)
Otherwise, repeat the algorithm on the subgraph (recursive function). This continues until there are no more subgraphs, and all clusters have been found.
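The recursion can be sketched in runnable form as follows. This is an illustration under stated assumptions rather than the authors' implementation: the minimum cut is found by brute force (fine for about a dozen nodes, hopeless beyond that), the graph handed to each call is tested itself before being split, and single nodes are dropped instead of being reported as clusters. The names `min_cut` and `hcs` and the example graph are made up.

```python
from itertools import combinations

def min_cut(nodes, edges):
    """Brute-force minimum cut of a small graph.
    Returns (cut_size, side_a, side_b)."""
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    for size in range(len(rest)):
        for others in combinations(rest, size):
            side = {first, *others}
            crossing = sum((u in side) != (v in side) for u, v in edges)
            if best is None or crossing < best[0]:
                best = (crossing, side, set(nodes) - side)
    return best

def hcs(nodes, edges):
    """Return the clusters found, as a list of node sets."""
    if len(nodes) < 2:
        return []                      # a singleton is not a cluster
    size, side_a, side_b = min_cut(nodes, edges)
    if 2 * size > len(nodes):          # k(G) > n / 2: highly connected
        return [set(nodes)]
    clusters = []
    for side in (side_a, side_b):      # recurse on both sides of the cut
        inner = [(u, v) for u, v in edges if u in side and v in side]
        clusters += hcs(side, inner)
    return clusters

# Hypothetical 12-node graph: three groups of four fully connected
# nodes, chained together by the single edges (4, 5) and (8, 9).
groups = [range(1, 5), range(5, 9), range(9, 13)]
edges = [e for g in groups for e in combinations(g, 2)] + [(4, 5), (8, 9)]
clusters = hcs(range(1, 13), edges)
print([sorted(c) for c in clusters])
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Any routine that returns a minimum cut and its two sides can replace `min_cut` here without changing `hcs`.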
Running time is bounded by 2N × f( n, m ), where N is the number of clusters found and f( n, m ) is the time complexity of computing a minimum cut in a graph with n nodes and m edges.
Deterministic minimum cut for an unweighted graph: takes O(nm) steps, where n is the number of nodes and m is the number of edges.
HCS: Properties
• Homogeneity
  – Each cluster has a diameter of at most 2
    • Distance is the length of the shortest path between two nodes, measured as the number of edges traveled between them
    • Diameter is the longest distance in the graph
  – Each cluster is at least half as dense as a clique
    • A clique is a graph in which every pair of nodes is joined by an edge, so it has the maximum possible edge connectivity
[Figure: example graph on nodes a to f with a clique marked. Dist( a, d ) = 2, Dist( a, e ) = 3, Diam( G ) = 4.]
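Distance and diameter can be computed with a breadth-first search, as in the sketch below. The edge list is a hypothetical graph chosen only so that it reproduces the three values quoted for the figure. The actual edges of the figure are not recoverable from the transcript.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: number of edges on a shortest path from
    `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(graph):
    """Longest shortest-path distance; assumes the graph is connected."""
    return max(max(distances_from(graph, u).values()) for u in graph)

# Hypothetical edges: a, b, c form a clique, with a tail c-d-e and f on a.
edges = [("f", "a"), ("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(distances_from(graph, "a")["d"])  # 2
print(distances_from(graph, "a")["e"])  # 3
print(diameter(graph))                  # 4
```

A graph like this one, with diameter 4, could not be a single HCS cluster, since every cluster has diameter at most 2.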
• Separation
  – Any non-trivial split is unlikely to have diameter of two
  – Number of edges removed by each iteration is linear in the size of the underlying subgraph
    • Compared to quadratic number of edges within final clusters
    • Indicates separation unless sizes are small
    • Does not bound the total number of edges removed overall
HCS: Improvements
• Choosing between cut sets
[Figure sequence: alternative minimum cuts of one example graph.]
• Iterated HCS
  – Sometimes there are multiple minimum cuts to choose from
    • Some cuts may create “singletons”, nodes that become disconnected from the rest of the graph
  – Performs several iterations of HCS until no new cluster is found (to find best final clusters)
    • Theoretically adds another O(n) factor to running time, but typically only needs 1 to 5 more iterations
• Remove low degree nodes first
  – If a node has low degree, it will likely just be separated from the rest of the graph
  – Calculating separation for those nodes is expensive
  – Removal helps eliminate unnecessary iterations and significantly reduces running time
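A sketch of this preprocessing step, assuming removal is repeated until no remaining node falls below a chosen degree threshold. The threshold `min_degree`, the function name and the example graph are hypothetical; the slides do not state how "low degree" is defined.

```python
def remove_low_degree_nodes(graph, min_degree):
    """Repeatedly drop nodes whose degree is below `min_degree`.
    Returns (pruned_graph, removed_nodes); the input is left unchanged."""
    pruned = {u: set(nbrs) for u, nbrs in graph.items()}
    removed = set()
    low = [u for u, nbrs in pruned.items() if len(nbrs) < min_degree]
    while low:
        u = low.pop()
        if u in removed:
            continue                   # already handled via another path
        removed.add(u)
        for v in pruned.pop(u):
            pruned[v].discard(u)
            # Losing a neighbour may push v below the threshold too.
            if len(pruned[v]) < min_degree:
                low.append(v)
    return pruned, removed

# Hypothetical graph: nodes 1-4 fully connected, plus a tail 4-5-6.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

pruned, removed = remove_low_degree_nodes(graph, min_degree=2)
print(sorted(pruned))   # [1, 2, 3, 4]
print(sorted(removed))  # [5, 6]
```

Node 5 starts with degree 2 and is removed only after node 6 goes, which is why the pruning is repeated rather than done in a single pass.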
Conclusion
• Performance
  – With improvements, can handle problems with up to thousands of elements in reasonable computing time
  – Generates clusters with high homogeneity and separation
  – More robust (responds better when noise is introduced) than other approaches based on connectivity
References
“A Clustering Algorithm Based on Graph Connectivity”
By Erez Hartuv and Ron Shamir
March 1999 (Revised December 1999)
http://www.math.tau.ac.il/~rshamir/papers.html