Recent Advances in Stochastic Flow Clustering
Srinivasan Parthasarathy
Data Mining Research Laboratory Dept. of Computer Science and Engineering
The Ohio State University
http://www.cse.ohio-state.edu/~srini
Graph Clustering: A Fundamental Problem
Given a graph, discover groups of nodes that are strongly connected to one another but weakly connected to the rest of the graph.
What Makes this Problem Hard?
• Scale – High-throughput experiments, social media, high-res images.
• Noise – False positive interactions; false negatives
• Novel Topological Characteristics – Hub nodes; power-law
• Domain Insights – Balance; known biological relationships
• Dynamics – Changes to nodes, links and content
Extant Solutions
• Spectral methods [Shi ’00]
• Edge-based agglomerative/divisive methods [Newman ’04]
• Graclus/Kernel K-Means [Dhillon ’07]
• Metis [Karypis ’98] + MQI [Leskovec, Lang ’10]
• Markov Clustering [van Dongen ’00]
• A host of specialized solutions (e.g. MCODE, LINK-CLUSTER, etc.)
Markov Clustering (MCL): Stijn van Dongen, 2000
The original stochastic flow clustering algorithm
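[Figure: example graph on nodes 1, 2, 3, 4 with edges 1-2, 1-4, 2-3, 2-4]

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0.33 | 0.25 | | 0.33 |
| 2 | 0.33 | 0.25 | 0.5 | 0.33 |
| 3 | | 0.25 | 0.5 | |
| 4 | 0.33 | 0.25 | | 0.33 |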
Column 2 holds the out-flows of node 2; row 2 holds the in-flows of node 2.
Column Stochastic Matrix: a matrix in which each column sums to 1.
Stochastic Flow: an entry in a column stochastic matrix, interpreted as the “flow” or “transition probability”.
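A minimal NumPy sketch (our own illustration, not code from the slides) that builds the canonical transition matrix M_G = (A + I)D^-1 for the 4-node example, with D the diagonal matrix of column sums of A + I. The edge list 1-2, 1-4, 2-3, 2-4 is read off the matrix above; nodes are 0-indexed in the code, and the function name is hypothetical.

```python
import numpy as np


def canonical_transition_matrix(A: np.ndarray) -> np.ndarray:
    """Return M_G = (A + I) D^-1, where D is the diagonal matrix of column sums of A + I."""
    A_hat = A + np.eye(A.shape[0])     # add a self-loop to every node
    col_sums = A_hat.sum(axis=0)       # degree of each node, counting its self-loop
    return A_hat / col_sums            # broadcasting divides column j by col_sums[j]


# Example graph, 0-indexed: slide edges 1-2, 1-4, 2-3, 2-4
edges = [(0, 1), (0, 3), (1, 2), (1, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

M_G = canonical_transition_matrix(A)
print(np.round(M_G, 2))
```

Column j of the result is the out-flow distribution of node j, so every column sums to 1 and the rounded entries match the slide's matrix.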
Repeatedly apply certain operations to the flow matrix until the matrix converges and can be interpreted as a clustering.

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | | | | |
| 2 | 1.0 | 1.0 | | 1.0 |
| 3 | | | 1.0 | |
| 4 | | | | |

[Figure: the same 4-node graph]
The MCL algorithm
Input: A, the adjacency matrix.
Initialize M to M_G, the canonical transition matrix: $M := M_G := (A+I)D^{-1}$, where D is the diagonal matrix of column sums of A + I.
Repeat until M converges:
• Expand: M := M*M. Enhances flow to well-connected nodes (i.e. nodes within a community).
• Inflate: M := M.^r (r usually 2), then renormalize columns. Increases inequality in each column: “Rich get richer, poor get poorer.” (reduces flow across communities)
• Prune: remove entries close to zero. Saves memory and enables faster convergence.
Output clusters. Clustering interpretation: nodes flowing into the same sink node are assigned the same cluster label.
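The loop can be sketched in NumPy as follows. This is an illustrative sketch, not van Dongen's reference implementation: the function names, prune threshold, convergence tolerance and the choice of each column's largest entry as its sink are our own assumptions (in general several attractor nodes can share one cluster).

```python
import numpy as np


def mcl(A, r=2.0, prune_threshold=1e-5, max_iter=100, tol=1e-9):
    """Sketch of Markov Clustering: expand, inflate, prune until the flow stops changing."""
    A_hat = A + np.eye(A.shape[0])
    M = A_hat / A_hat.sum(axis=0)               # M := M_G = (A + I) D^-1
    for _ in range(max_iter):
        prev = M
        M = M @ M                               # Expand
        M = M ** r                              # Inflate: element-wise power ...
        M = M / M.sum(axis=0)                   # ... then renormalize columns
        M[M < prune_threshold] = 0.0            # Prune entries close to zero
        M = M / M.sum(axis=0)                   # keep columns stochastic after pruning
        if np.abs(M - prev).max() < tol:        # Converged?
            break
    return M


def clusters_from_flow(M):
    """Assign each node the label of the sink (attractor) its column flows into."""
    sinks = M.argmax(axis=0)
    label_of_sink = {}
    return [label_of_sink.setdefault(int(s), len(label_of_sink)) for s in sinks]


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

labels = clusters_from_flow(mcl(A))
print(labels)
```

On this toy graph the flow of each triangle collapses onto the node that touches the bridge, so the two triangles receive two different labels.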
[van Dongen ’00]
MCL Strengths
1. Theoretically well founded [van Dongen ’00]
2. Simple, linear algebraic operations
3. Noise tolerant [Brohee ’06, Vlasblom ’09]
[Chakrabarti and Faloutsos ‘06]
MCL Limitations
1. Outputs many small clusters [Satuluri, Parthasarathy ’09]
2. Does not scale well [Chakrabarti, Faloutsos ’06]
[Chakrabarti and Faloutsos ‘06]
MCL Flaws
1. Outputs many small clusters.
Fix I: Regularized MCL
2. Does not scale well.
Fix II: Multi-Level Regularized MCL
Fix III: Localized Graph Sparsification
Key Idea I: The Regularize operator
Why does MCL output many clusters? Because of overfitting: it does not penalize divergence of flows between neighbors.
Remedy: Penalize divergence in flows between neighbors, using KL divergence (a well-known measure for comparing probability distributions).
This turns out to have a nice closed-form solution:
$$\mathrm{Regularize}(M) := M(A+I)D^{-1} = M M_G$$
The Regularized-MCL (R-MCL) algorithm
Input: A, the adjacency matrix. Initialize M to M_G, the canonical transition matrix: $M := M_G := (A+I)D^{-1}$.
Repeat until M converges:
• Regularize: M := M*M_G. Takes into account the flows of the neighbors.
• Inflate: M := M.^r (r usually 2), then renormalize columns [Hadamard power + rescaling]. Increases inequality in each column: “Rich get richer, poor get poorer.”
• Prune: remove entries close to zero. Saves memory and enables faster convergence.
Output clusters: nodes flowing into the same sink node are assigned the same cluster label.
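A minimal NumPy sketch of R-MCL, assuming a prune threshold, convergence tolerance and iteration cap of our own choosing (none are given on the slides); the function name is hypothetical. The only algorithmic difference from MCL is that the Expand step M*M is replaced by Regularize, M*M_G, with M_G held fixed.

```python
import numpy as np


def r_mcl(A, r=2.0, prune_threshold=1e-5, max_iter=200, tol=1e-9):
    """Sketch of Regularized MCL: the Expand step M @ M is replaced by M @ M_G."""
    A_hat = A + np.eye(A.shape[0])
    M_G = A_hat / A_hat.sum(axis=0)       # canonical transition matrix (A + I) D^-1
    M = M_G.copy()
    for _ in range(max_iter):
        prev = M
        M = M @ M_G                       # Regularize: column j becomes the average of
                                          # the flow columns of j and its neighbours
        M = M ** r                        # Inflate (Hadamard power) ...
        M = M / M.sum(axis=0)             # ... and rescale columns
        M[M < prune_threshold] = 0.0      # Prune
        M = M / M.sum(axis=0)
        if np.abs(M - prev).max() < tol:
            break
    return M


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

M = r_mcl(A)
sinks = M.argmax(axis=0)      # nodes sharing a sink share a cluster
print(sinks)
```

Column j of M*M_G is a plain average because M_G[k, j] = 1/d_j for j and each of its neighbours k (d_j counting the self-loop). On the toy graph, nodes 0-2 flow into sink 2 and nodes 3-5 into sink 3.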
Key Idea II: Multi-level Regularized MCL
1. Coarsen repeatedly: Input Graph → Intermediate Graph → … → Coarsest Graph.
2. Starting from the coarsest graph, run curtailed R-MCL and project the flow onto the next finer graph; repeat at each intermediate graph.
3. On the input graph, run R-MCL to convergence and output clusters.
• Faster to run on smaller graphs first!
• Captures the global topology of the graph!
• Good initialization for the refined flow matrix!
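The coarsen-then-project idea can be sketched in NumPy. The slides do not say how graphs are coarsened or how flow is projected, so both steps here are assumptions: coarsening uses greedy heavy-edge matching (a standard multilevel scheme, used for example by Metis), and projection lets each fine node inherit its parent's flow column, with the flow destined for a coarse node split evenly among that node's children. The curtailed R-MCL runs between levels are omitted, and `coarsen` and `project_flow` are hypothetical names.

```python
import numpy as np


def coarsen(A):
    """One coarsening level by greedy heavy-edge matching (an assumed scheme).

    Returns the coarse adjacency matrix and parent[i], the coarse node of fine node i.
    """
    n = A.shape[0]
    parent = -np.ones(n, dtype=int)
    n_coarse = 0
    for i in range(n):
        if parent[i] >= 0:
            continue
        # unmatched neighbours of i
        candidates = [j for j in np.flatnonzero(A[i]) if parent[j] < 0 and j != i]
        parent[i] = n_coarse
        if candidates:
            j = max(candidates, key=lambda k: A[i, k])   # heaviest incident edge
            parent[j] = n_coarse
        n_coarse += 1
    P = np.zeros((n, n_coarse))
    P[np.arange(n), parent] = 1.0          # fine-to-coarse assignment matrix
    A_coarse = P.T @ A @ P                 # summed edge weights between groups
    np.fill_diagonal(A_coarse, 0.0)        # drop self-loops created by merging
    return A_coarse, parent


def project_flow(M_coarse, parent):
    """Project a coarse flow matrix onto the finer graph (an assumed scheme)."""
    children = np.bincount(parent)
    # fine entry (i, j) = coarse flow parent(j) -> parent(i), split among parent(i)'s children
    return M_coarse[np.ix_(parent, parent)] / children[parent][:, None]


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

A_coarse, parent = coarsen(A)
A_hat = A_coarse + np.eye(A_coarse.shape[0])
M_coarse = A_hat / A_hat.sum(axis=0)       # stand-in for a curtailed R-MCL result
M_fine = project_flow(M_coarse, parent)
print(parent, M_fine.sum(axis=0))
```

The even split keeps every projected column summing to 1, so the projected matrix is a valid starting flow for the next, finer level.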
Comparison with MCL on Protein Interaction Networks

| Dataset (n, m) | Quality change | Speedup (time) |
|---|---|---|
| Yeast (5k, 15k) | 36% | 2.5x (0.4s) |
| Yeast_Noisy (6k, 200k) | 300% | 57x (8s) |
| Human (10k, 60k) | 21.6% | 200x (2s) |
[Hardware: Quad-core Intel i5 CPU, 3.2 GHz, with 16GB RAM ]
Comparison with Graclus and Metis
Quality: MLR-MCL improves upon both Graclus and Metis
Speed: MLR-MCL is faster than Graclus, comparable to Metis
Key Idea III: Graph Sparsification
Is there a simple pre-processing of the graph to reduce the edge set that can “clarify” or “simplify” its cluster structure?
[Figure: original graph vs. sparsified graph]
Our Approach
Main Idea: Retain edges which are likely to be intra-cluster edges, while discarding likely inter-cluster edges.
Similarity-based Sparsification Heuristic: An edge (i, j) is likely to be an intra-cluster edge if vertices i and j have highly overlapping adjacency lists.
$$\mathrm{Sim}(i,j) = \frac{|\mathrm{Adj}(i) \cap \mathrm{Adj}(j)|}{|\mathrm{Adj}(i) \cup \mathrm{Adj}(j)|}$$
Algorithm: Global Sparsification (G-Spar)
Parameter: Sparsification ratio, s
1. For each edge <i,j>:
(i) Calculate Sim(<i,j>)
2. Retain the top s% of edges in order of Sim; discard the others.
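A minimal pure-Python sketch of G-Spar. It assumes Adj(i) is the set of neighbours of i without i itself, that s is given as a fraction (0.8 for 80%), and that the kept count is rounded up; none of these details is fixed by the slides, and the function names are hypothetical.

```python
import math


def jaccard(a, b):
    """Sim(i, j): size of the intersection over size of the union of two adjacency sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def g_spar(edges, s):
    """Global sparsification: keep the top fraction s of edges ranked by Sim."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    ranked = sorted(edges, key=lambda e: jaccard(adj[e[0]], adj[e[1]]), reverse=True)
    return ranked[:math.ceil(s * len(edges))]   # rounding up is our own choice


# Hypothetical example: two triangles joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(g_spar(edges, 0.8))
```

On this example the bridge (2, 3) has Sim = 0, since nodes 2 and 3 share no neighbours, so it is the edge G-Spar discards.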
With G-Spar, dense clusters are over-represented and sparse clusters under-represented. It works well when the goal is just to find the top communities.
Algorithm: Local Sparsification (L-Spar)
Parameter: Sparsification exponent, e (0 < e < 1)
1. For each node i of degree $d_i$:
(i) For each neighbor j:
(a) Calculate Sim(<i,j>)
(ii) Retain the top $d_i^e$ neighbors of node i in order of Sim
Edges compete to be retained locally (“think globally, act locally” paradigm).
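A minimal pure-Python sketch of L-Spar. Rounding $d_i^e$ up, keeping an edge when either endpoint selects it, and breaking similarity ties by neighbour id are our own assumptions; the function names are hypothetical.

```python
import math


def jaccard(a, b):
    """Sim(i, j): size of the intersection over size of the union of two adjacency sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def l_spar(edges, e):
    """Local sparsification: each node i keeps its top ceil(d_i ** e) neighbours by Sim."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    kept = set()
    for i, nbrs in adj.items():
        k = math.ceil(len(nbrs) ** e)            # at least one edge per node survives
        # highest Sim first; ties broken by neighbour id for determinism
        ranked = sorted(nbrs, key=lambda j: (-jaccard(adj[i], adj[j]), j))
        for j in ranked[:k]:
            kept.add((min(i, j), max(i, j)))     # edge kept if either endpoint selects it
    return sorted(kept)


# Hypothetical example: two triangles joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(l_spar(edges, 0.5))
```

Because the budget $d_i^e$ is per node, low-degree nodes in sparse regions still keep their best edges, whereas a single global cutoff can strip them entirely.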
L-Spar ensures representation of clusters of varying densities.
But...
Similarity computation is expensive!
Solution: A randomized, approximate solution based on Minwise Hashing [Broder et al., 1998]
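A minimal pure-Python sketch of the minwise-hashing idea: each adjacency list is summarised by k minimum hash values, and the fraction of positions where two signatures agree estimates Sim(i, j). Using random linear hash functions modulo a prime is our own simplifying assumption (such functions are only approximately min-wise independent), k = 200 is arbitrary, and all names are hypothetical.

```python
import random

PRIME = 2_147_483_647  # the Mersenne prime 2**31 - 1; node ids are assumed to be smaller


def make_hash_functions(k, seed=0):
    """k random linear hash functions h(x) = (a*x + b) mod PRIME, stored as (a, b)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def minhash_signature(items, hash_functions):
    """For each hash function, the minimum hash value over the set."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_functions]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical adjacency lists with true Jaccard similarity 50 / 150 = 1/3
rng = random.Random(42)
pool = rng.sample(range(1_000_000), 150)
adj_i, adj_j = set(pool[:100]), set(pool[50:])
hashes = make_hash_functions(200, seed=1)
est = estimate_jaccard(minhash_signature(adj_i, hashes), minhash_signature(adj_j, hashes))
print(round(est, 3))
```

Signatures are computed once per node, so scoring an edge costs k comparisons instead of a set intersection over two possibly long adjacency lists.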
L-Spar: Results Using MLR-MCL

| Dataset | Spars. Ratio | L-Spar Speedup | L-Spar Quality |
|---|---|---|---|
| Yeast | 17% | 17x | +4% |
| Human | 40% | 6x | +1% |
[Hardware: Quad-core Intel i5 CPU, 3.2 GHz, with 16GB RAM ]
Results using MLR-MCL

| Dataset (Nodes, Edges) | Spars. Ratio | RandomEdge Speedup | RandomEdge QualityΔ | G-Spar Speedup | G-Spar QualityΔ | L-Spar Speedup | L-Spar QualityΔ |
|---|---|---|---|---|---|---|---|
| BioGrid (6K, 200K) | 17% | 6x | -16% | 38x | -23% | 17x | +4% |
| Wiki (1.1M, 53M) | 15% | 19x | -58% | 92x | -54% | 23x | -4.5% |
| Orkut (3M, 117M) | 17% | 6x | -32% | 39x | -59% | 22x | 0 |
| Twitter (146K, 83M) | 4% | 63x | -90% | 188x | +10% | 22x | +40% |

L-Spar enables high speedups without significant loss of accuracy.
1 Theoretical Results
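• Theorem 2.1(c) of [1]: The multiplicity of 0 as an eigenvalue of the graph Laplacian is equal to the number of components of the graph.
• Theorem 2.2(d) of [1]: $\sum_{i=1}^{n} \lambda_i = 2|E(G)| = \sum_{v} d(v)$, where $\lambda_1, \dots, \lambda_n$ are the Laplacian eigenvalues and d(v) is the degree of v.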
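Both results are easy to check numerically. A small NumPy sketch on a hypothetical 5-node graph (a triangle plus a disjoint edge), assuming L = D - A, the combinatorial Laplacian; the slide does not spell out which Laplacian [1] uses, but both statements hold for this one.

```python
import numpy as np


def laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A


# Hypothetical graph: triangle {0,1,2} and a separate edge {3,4}, i.e. two components
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

eigvals = np.linalg.eigvalsh(laplacian(A))          # L is symmetric, so use eigvalsh
zero_multiplicity = int(np.sum(np.abs(eigvals) < 1e-9))
num_edges = int(A.sum() / 2)
print(zero_multiplicity, eigvals.sum(), 2 * num_edges)
```

The eigenvalues are 0, 0, 2, 3, 3: zero appears once per component, and they sum to 8 = 2 × 4 edges, which is also the sum of the degrees.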
2 SIGMOD Paper
Flickr: 33911; Flickr.spars: 33953. Total 64903 nodes.
Wikipedia: 1; Wikipedia.spars: 164. Total 1129060 nodes.
Orkut: 186; Orkut.spars: 257. Total 3072626 nodes.

Table 1: Spectral Gap Comparisons

| Dataset | Original | L-Spar | G-Spar |
|---|---|---|---|
| BioGrid | $\lambda_{4197} = 0.34284616$ | $\lambda_{4197} = 0.08680336$ | $\lambda_{4894} = 0.14459744$ |
| DIP | $\lambda_{45} = 0.117378$ | $\lambda_{54} = 0.03226163$ | $\lambda_{888} = 0.036117$ |
| Human | $\lambda_{219} = 0.1038301$ | $\lambda_{234} = 0.05650493$ | $\lambda_{2266} = 0.05255889$ |

Figure 1: BioGrid Eigenvalues Comparison
References
[1] B. Mohar. The Laplacian spectrum of graphs. Graph Theory, Combinatorics, and Applications, 2:871–898, 1991.
Impact of Sparsification on Spectrum: Yeast PPI
Global sparsification results in multiple components.
Local sparsification appears to match the trends of the original graph.
Human PPI
Synthetic data (Fortunato ’09)
As the clustering problem gets harder, L-Spar becomes more beneficial.
SOCIAL IMPACT: EMERGENCY RESPONSE AND FLOOD MAPPING
Copyright 2006, Data Mining Research Laboratory
Crisis Informatics and Flood Mapping
• Disaster informatics, or crisis informatics, is the study of the use of information and technology in the different phases of disasters or crises.
• Flood mapping: mapping the extent of flood damage is a key step for relief and recovery.
Chennai Floods (2015): Social Sensing Enhanced Flood Mapping
[Figure: prior to the 1st depression vs. after the 3rd depression]
Markov Clustering and Flood Mapping
• Water delineation, i.e. segmentation of remotely sensed images, is a key strategy employed in flood mapping.
• Confounding factors: cloud cover, sinuous river beds, urban area reflectance effects
• Key Idea: Semi-supervised flood mapping using MLR-MCL as a key pre-processing step.
Procedure
1. Cluster patches of the satellite image with MLR-MCL – graph-based image segmentation [Shi-Malik ’00]
2. Guided patch labeling:
a) Volunteer crowdsourcing
b) Social-media induced labels
3. Semi-supervised learning of flood extent:
HUG-FM (KNN variant)
SEANO (neural net)
Houston Floods (original image)
Otsu Thresholding
• Clustering-based image thresholding [Otsu ’79]
• Converts the image to a binary image
• Simple and widely used, but prone to false positives
• e.g. detects highways as water bodies
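For reference, a minimal NumPy sketch of Otsu's method: it picks the histogram threshold that maximizes the between-class variance of the two resulting pixel classes. The bin count, the use of bin centres as class means, the function name and the synthetic intensities are our own assumptions.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the intensity threshold maximizing between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # pixel count in bins 0..k (class 0)
    w1 = w0[-1] - w0                        # pixel count in bins k+1.. (class 1)
    cum = np.cumsum(hist * centers)
    mu0 = cum / np.maximum(w0, 1)           # class means, guarding against empty classes
    mu1 = (cum[-1] - cum) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2    # proportional to the between-class variance
    k = int(np.argmax(between))
    return edges[k + 1]                     # class 0 is everything below this value


# Hypothetical bimodal intensities standing in for a single-band image
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(40, 10, 5000), rng.normal(180, 20, 5000)])
t = otsu_threshold(pixels)
binary = pixels >= t                        # True = bright class, False = dark class
print(round(float(t), 1))
```

Because one global threshold splits pixels purely by intensity, any land cover whose intensity falls on the water side of it ends up in the water class, which is consistent with the false positives noted above.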
Watershed Algorithm
• Relies on pre-identified landmarks [Beucher ’79, Meyer ’92]
• Applies gradient transformation and thresholding
• Prone to smoothing errors
Normal Thresholding
• Improved variant of Otsu and watershed
• Relies on land-cover identification
• Nuanced threshold separation of land-cover types from water
• State-of-the-art in remote sensing [2016]
HUG-FM + MLR-MCL
Quantitative evaluation on Houston dataset
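| Method | Accuracy | F1 |
|---|---|---|
| Otsu Thr | 0.89 | 0.74 |
| Watershed | 0.89 | 0.68 |
| Normal Thr | 0.87 | 0.84 |
| HUG-FM | 0.96 | 0.87 |
| SEANO | 0.97 | 0.90 |

Otsu Thr, Watershed and Normal Thr are standard remote sensing methods. HUG-FM and SEANO rely on MLR-MCL preprocessing and offer a quality (SEANO) vs. speed (HUG-FM) tradeoff.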
Chennai Floods 11/24 (between the 2nd and 3rd depressions)
Watershed (Beucher, Meyer 1992)
N-cuts (100 partitions) (Shi, Malik ’01)
HUG-FM + MLR-MCL patching
Take Home: Recent Advances in MCL
Key Idea 1: Regularization
• Avoids fragmenting community structure. [SIGKDD’09, ACM BCB’10, Bioinformatics 2012]
Key Idea 2: Multi-level Regularization
• Improves scalability. [SIGKDD’09, ACM BCB’10]
Key Idea 3: Sparsification: Simple pre-processing that makes a difference
• Reduces clustering time from hours down to minutes. [SIGMOD’11, WWW’13]
• Theoretical rationale [SoCG’17]
Key Ideas 4 & 5: Soft Clustering [ISMB’12] & GPU acceleration [HiPC’14]
Social Impact: Use of MLR-MCL for flood mapping shows promise.
References (incomplete)
1. MCL: Graph Clustering by Flow Simulation. S. van Dongen, Ph.D. thesis, University of Utrecht, 2000.
2. Graclus: Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. Dhillon et al., IEEE Trans. PAMI, 2007.
3. Metis: A fast and high quality multilevel scheme for partitioning irregular graphs. Karypis and Kumar, SIAM J. on Scientific Computing, 1998.
4. Normalized Cuts and Image Segmentation. Shi and Malik, IEEE Trans. PAMI, 2000.
5. Finding and evaluating community structure in networks. Newman and Girvan, Phys. Rev. E 69, 2004.
6. The identification of functional modules from the genomic association of genes. Snel et al., PNAS 2002.
Thanks & Acknowledgements
• Joint work with:
  • Albert Liang
  • Peter Jacobs
  • Nikhita Vedula
  • Venu Satuluri (Twitter)
  • Yu-Keng Shih (GraphSQL)
  • Sitaram Asur (HP Laboratories)
  • Duygu Ucar (Jackson Laboratories)
• Grant Acknowledgements: NSF HazardSEES #1520870 and SOCS #IIS-1111118
• Software and References: https://sites.google.com/site/stochasticflowclustering/