2010 ICML
TRANSCRIPT

Multiple Non-Redundant Spectral Clustering Views

Donglin Niu, Jennifer G. Dy
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA

Michael I. Jordan
EECS and Statistics Departments, University of California, Berkeley
Motivation of Multiple Clustering
Another Example

Given medical data:
- From the doctor's view: cluster according to type of disease.
- From the insurance company's view: cluster based on the patient's cost/risk.
Previous Work

Two kinds of approaches: iterative & simultaneous.

Iterative: Given an existing clustering, find another clustering.
- Conditional Information Bottleneck. Gondek and Hofmann (2004)
- COALA. Bae and Bailey (2006)
- Minimizing KL-divergence. Qi and Davidson (2009)
- Orthogonal Projection (multiple alternative clusterings). Cui et al. (2007)

Simultaneous: Discovery of all the possible partitionings.
- Meta Clustering. Caruana et al. (2006)
- De-correlated kmeans. Jain et al. (2008)
Different from:
- Ensemble Clustering
- Hierarchical Clustering
Problem Formulation

[Figure: the same data partitioned two different ways — VIEW 1 and VIEW 2]

There are O(K^N) possible clustering solutions. We'd like to find solutions that:
1. have high cluster quality, and
2. are non-redundant,
and we'd like to simultaneously
3. learn the subspace in each view.
Cluster Quality

Normalized Cut (On Spectral Clustering, Ng et al.):
- maximize within-cluster similarity and minimize between-cluster similarity.

Let U be the (relaxed) cluster assignment:

    max_U tr(U^T D^(-1/2) K D^(-1/2) U)   s.t.  U^T U = I

Advantage: can discover arbitrarily-shaped clusters.
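The relaxed objective above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation; the RBF kernel and the bandwidth `sigma` are assumptions made here for concreteness):

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """Relaxed normalized cut: U = top-k eigenvectors of D^(-1/2) K D^(-1/2)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2 * X @ X.T    # pairwise squared distances
    K = np.exp(-dist2 / (2 * sigma ** 2))              # RBF similarity matrix
    d = K.sum(axis=1)                                  # degree vector
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]  # D^(-1/2) K D^(-1/2)
    _, vecs = np.linalg.eigh(M)                        # eigenvalues in ascending order
    return vecs[:, -k:]                                # k largest -> embedding U
```

Following Ng et al., the rows of U would then be normalized to unit length and clustered with k-means. The constraint U^T U = I holds automatically, since eigenvectors of a symmetric matrix are orthonormal.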
Non-Redundant Clustering Views

There are several possible criteria for measuring redundancy: correlation, mutual information.
- Correlation: can capture only linear dependencies.
- Mutual information: can capture non-linear dependencies, but requires estimating the joint probability distribution.

In this approach, we choose the Hilbert-Schmidt Independence Criterion:

    HSIC(x, y) = ||C_xy||^2_HS

Advantage: can detect non-linear dependence and does not need to estimate joint probability distributions.
Hilbert-Schmidt Independence Criterion (HSIC)

HSIC is the norm of the cross-covariance matrix in kernel space:

    C_xy = E_xy[(x - mu_x)(y - mu_y)^T],   HSIC(x, y) = ||C_xy||^2_HS

Empirical estimate of HSIC:

    HSIC(X, Y) := (1/n^2) tr(K H L H)

where K_ij := k(x_i, x_j), L_ij := l(y_i, y_j), H := I_n - (1/n) 1_n 1_n^T, n is the number of observations, and k, l are kernel functions.
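The empirical estimator above is straightforward to compute. Here is a minimal NumPy sketch (the RBF kernels and bandwidth are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-dist2 / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC(X, Y) = tr(K H L H) / n^2."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2
```

A dependent pair (e.g., y = sin(3x) plus small noise) yields a noticeably larger value than an independent pair, even though the dependence is non-linear — exactly the advantage claimed above.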
Overall Objective Function

Cluster Quality: Normalized Cut. Redundancy: HSIC.

maximize over {U_v, W_v}:

    sum_v tr(U_v^T D_v^(-1/2) K_v D_v^(-1/2) U_v)  -  lambda * sum_v sum_{q != v} HSIC(W_v^T x, W_q^T x)

    s.t.  U_v^T U_v = I,  U_v in R^(n x c_v),  W_v^T W_v = I,  (K_v)_ij = k(W_v^T x_i, W_v^T x_j)

where HSIC(W_v^T x, W_q^T x) = (1/n^2) tr(K_v H K_q H). Here U_v is the spectral embedding, K_v is the kernel matrix, and D_v is the degree matrix for each view v; H is the matrix that centers the kernel matrix. All of these are defined in the subspace W_v.
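Given candidate subspaces W_v and embeddings U_v, this objective can be evaluated directly. Below is a minimal sketch under illustrative assumptions (RBF kernels, a hypothetical trade-off weight `lam`, and the HSIC penalty written as tr(K_v H K_q H)/n^2); it is not the authors' code:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma ** 2))

def msc_objective(X, Ws, Us, lam=1.0, sigma=1.0):
    """Sum of normalized-cut quality terms minus lam * pairwise HSIC penalties."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Ks = [rbf_kernel(X @ W, sigma) for W in Ws]   # kernel in each subspace W_v
    quality = 0.0
    for K, U in zip(Ks, Us):
        d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
        M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
        quality += np.trace(U.T @ M @ U)
    redundancy = 0.0
    for v, Kv in enumerate(Ks):
        for q, Kq in enumerate(Ks):
            if q != v:
                redundancy += np.trace(Kv @ H @ Kq @ H) / n ** 2
    return quality - lam * redundancy
```

Note that each penalty term tr(K_v H K_q H) = tr((H K_v H)(H K_q H)) is non-negative for positive semi-definite kernels, so increasing `lam` can only decrease the objective value for fixed U_v, W_v.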
Algorithm

We use a coordinate ascent approach.

Step 1: Fix W_v, optimize for U_v.
The solution for U_v is given by the eigenvectors with the largest eigenvalues of the normalized kernel similarity matrix.

Step 2: Fix U_v, optimize for W_v.
We use gradient ascent on a Stiefel manifold.

Repeat Steps 1 & 2 until convergence.

K-means step: Normalize U_v, then apply k-means on U_v.
Initialize W_v

Cluster the features using spectral clustering: given data x = [f1 f2 f3 f4 f5 … fd], compute feature similarity based on HSIC(f_i, f_j) and group the features.

[Figure: each feature group (e.g., {f1, f2, f4} or {f3, f21, f9, f15, f34, f7, …}) defines a 0/1 column-selection transformation matrix W_v]
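This initialization can be sketched as follows: compute pairwise HSIC between features, then split the features by spectral bisection of the resulting similarity matrix. This is a simplified two-view stand-in for the full spectral clustering of features; the RBF kernel, bandwidth, and sign-based split are illustrative assumptions:

```python
import numpy as np

def rbf_1d(f, sigma=1.0):
    d2 = (f[:, None] - f[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic_features(fi, fj):
    """Empirical HSIC between two single features: tr(K H L H) / n^2."""
    n = fi.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_1d(fi) @ H @ rbf_1d(fj) @ H) / n ** 2

def init_two_views(X):
    """Split the d features into two groups by spectral bisection of the
    HSIC feature-similarity matrix; return 0/1 selection matrices W_v."""
    n, d = X.shape
    S = np.array([[hsic_features(X[:, i], X[:, j]) for j in range(d)]
                  for i in range(d)])
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)
    labels = vecs[:, -2] > 0          # sign of the 2nd-largest eigenvector
    Ws = []
    for side in (True, False):
        idx = np.flatnonzero(labels == side)
        W = np.zeros((d, idx.size))
        W[idx, np.arange(idx.size)] = 1.0   # each column selects one feature
        Ws.append(W)
    return Ws
```

On data where features form two mutually independent dependent groups, the bisection recovers the groups, and each W_v is exactly the 0/1 column-selection matrix pictured on the slide.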
Synthetic Data

Synthetic Data 1: View 1, View 2. Synthetic Data 2: View 1, View 2.

Normalized Mutual Information (NMI) Results

|        | Data 1, View 1 | Data 1, View 2 | Data 2, View 1 | Data 2, View 2 |
|--------|------|------|------|------|
| mSC    | 0.94 | 0.95 | 0.90 | 0.93 |
| OPC    | 0.89 | 0.85 | 0.02 | 0.07 |
| DK     | 0.87 | 0.94 | 0.03 | 0.05 |
| SC     | 0.37 | 0.42 | 0.31 | 0.25 |
| Kmeans | 0.36 | 0.34 | 0.03 | 0.05 |

mSC: our algorithm; OPC: orthogonal projection (Cui et al., 2007); DK: de-correlated kmeans (Jain et al., 2008); SC: spectral clustering.
Face Image Data

- Identity (ID) view and pose view.
- Mean face shown per cluster; the number below each image is cluster purity.

NMI Results

| FACE   | ID   | Pose |
|--------|------|------|
| mSC    | 0.79 | 0.42 |
| OPC    | 0.67 | 0.37 |
| DK     | 0.70 | 0.40 |
| SC     | 0.67 | 0.22 |
| Kmeans | 0.64 | 0.24 |
WebKB Data: High-Weight Words

High-weight words in each subspace view:
- View 1: Cornell, Texas, Wisconsin, Madison, Washington
- View 2: homework, student, professor, project, Ph.D.

NMI Results

| WebKB  | Univ. | Type |
|--------|-------|------|
| mSC    | 0.81  | 0.54 |
| OPC    | 0.43  | 0.53 |
| DK     | 0.48  | 0.57 |
| SC     | 0.25  | 0.39 |
| Kmeans | 0.10  | 0.50 |
NSF Award Data: High-Frequency Words

Subjects view:
- Physics: materials, chemical, metal, optical, quantum
- Information: control, programming, information, function, languages
- Biology: cell, gene, protein, DNA, biological

Work Type view:
- Theoretical: methods, mathematical, develop, equation, theoretical
- Experimental: experiments, processes, techniques, measurements, surface
Machine Sound Data

Normalized Mutual Information (NMI) Results

|        | Motor | Fan  | Pump |
|--------|-------|------|------|
| mSC    | 0.82  | 0.75 | 0.83 |
| OPC    | 0.73  | 0.68 | 0.47 |
| DK     | 0.64  | 0.58 | 0.75 |
| SC     | 0.42  | 0.16 | 0.09 |
| Kmeans | 0.57  | 0.16 | 0.09 |
Summary

Most clustering algorithms find only a single clustering solution. However, data may be multi-faceted (i.e., it can be interpreted in many different ways).

We introduced a new method for discovering multiple non-redundant clusterings.

Our approach, mSC, jointly optimizes a spectral clustering objective (to measure quality) and an HSIC regularization (to penalize redundancy).

mSC can discover multiple clustering views with flexible cluster shapes, while simultaneously finding the subspace in which each clustering view resides.
Thank you!