Unsupervised Feature Selection for Multi-Cluster Data
Deng Cai et al, KDD 2010
Presenter: Yunchao Gong, Dept. of Computer Science, UNC Chapel Hill
Introduction
The problem
• Feature selection for high dimensional data
• Assume high dimensional n*d data matrix X
• When d is too high (d > 10000), it causes problems:
  – Storing the data is expensive
  – Computation becomes challenging
A solution to this problem
• Select the most informative features
• Assume high dimensional n*d data matrix X
(figure: the n × d data matrix X)
After feature selection
(figure: the data matrix reduced from n × d to n × d′, with d′ ≪ d)

• The reduced data is more scalable and much more efficient to process
Previous approaches
• Supervised feature selection:
  – Fisher score
  – Information-theoretic feature selection
• Unsupervised feature selection:
  – Laplacian Score
  – MaxVariance
  – Random selection
Motivation of the paper
• Improving clustering (unsupervised) using feature selection
• Automatically select the most useful features by discovering the manifold structure of the data
Motivation and Novelty
• Previous approaches:
  – Greedy selection
  – Select each feature independently
• This approach:
  – Considers all features together
A toy example
Some background—manifold learning
• Tries to discover the manifold structure underlying the data
(figure: manifolds in vision — appearance variation traces out a smooth manifold)
Some background—manifold learning
• Discovers the smooth manifold structure
• Maps different manifolds (classes) as far apart as possible
The proposed method
Outline of the approach
• 1. manifold learning for cluster analysis
• 2. sparse coefficient regression
• 3. sparse feature selection
Outline of the approach
(diagram: the three steps in sequence; step 1 generates the response Y from X)
1. Manifold learning for clustering
• Manifold learning to find the clusters of the data

(figure: data with four clusters, labeled 1–4)
1. Manifold learning for clustering
• Observations:
  – Points in the same class are clustered together
1. Manifold learning for clustering
• Ideas:
  – How to discover the manifold structure?
1. Manifold learning for clustering
• Map similar points closer together
• Map dissimilar points far apart

min_f Σ_ij (f_i − f_j)² K_ij   (x_i: data points, K_ij: their similarity)
1. Manifold learning for clustering
• Map similar points closer together
• Map dissimilar points far apart

min_f Σ_ij (f_i − f_j)² K_ij = fᵀLf,  where L = D − K,  D = diag(sum(K))
1. Manifold learning for clustering
• Constrain f to be D-orthonormal (fᵀDf = I) to eliminate the free scaling
• So we have the following minimization problem:

min_f fᵀLf  subject to  fᵀDf = I
Summary of manifold discovery
• Construct graph K
• Choose a weighting scheme for K (e.g., binary or heat-kernel weights)
• Solve the generalized eigenproblem Lf = λDf for the smallest eigenvalues
• Use the eigenvectors f as the response vectors Y
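These steps can be sketched in code (a minimal illustration, assuming a kNN graph with heat-kernel weights; `spectral_response` and its parameters are hypothetical names, not from the paper):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def spectral_response(X, n_neighbors=5, n_clusters=4, sigma=1.0):
    """Step 1: build kNN graph K, form L = D - K, and use the smallest
    eigenvectors of L f = lambda D f as the response matrix Y."""
    n = X.shape[0]
    dist = cdist(X, X)                            # pairwise distances
    K = np.zeros((n, n))
    nbrs = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):                            # heat-kernel edge weights
        K[i, nbrs[i]] = np.exp(-dist[i, nbrs[i]] ** 2 / (2 * sigma ** 2))
    K = np.maximum(K, K.T)                        # symmetrize the graph
    D = np.diag(K.sum(axis=1))
    L = D - K
    vals, vecs = eigh(L, D)                       # generalized eigenproblem
    return vecs[:, 1:n_clusters + 1]              # skip the trivial constant f
```

For c clusters, the c smallest non-trivial eigenvectors serve as the columns of Y.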
Response vector Y
• Y reveals the manifold structure!
(figure: the four clusters, labeled 1–4, are separated by the columns of the response Y)
2. Sparse coefficient regression
• Once we have obtained Y, how do we use it to perform feature selection?
• Y reveals the cluster structure, so use it as the response in a sparse regression
Sparse regression
• The ``Lasso'':

min_a ‖Y_k − Xa‖² + β‖a‖₁

• The L1 penalty forces most entries of a to zero, i.e., a is estimated sparsely
Steps to perform sparse regression
• Generate Y from step 1
• Take the data matrix X as input
• Denote the k-th column of Y as Y_k
• For each Y_k, perform the following step to estimate the coefficient vector a_k:

min_{a_k} ‖Y_k − Xa_k‖² + β‖a_k‖₁
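This step can be sketched with scikit-learn's `Lasso` as a stand-in for the L1-regularized solver (the function name and the `beta` default are illustrative, not from the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coefficients(X, Y, beta=0.01):
    """Step 2: l1-penalized regression of each column Y_k of Y on X."""
    A = np.zeros((X.shape[1], Y.shape[1]))        # one column a_k per Y_k
    for k in range(Y.shape[1]):
        model = Lasso(alpha=beta, fit_intercept=False, max_iter=10000)
        model.fit(X, Y[:, k])
        A[:, k] = model.coef_                     # sparse: most entries are zero
    return A
```

The paper solves the same objective with LARS, which gives the full regularization path rather than a single β.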
illustration

(figure: Xa ≈ Y_k; the nonzero entries of a mark the selected features)
3. Sparse feature selection
• But for Y we have c different columns Y_k, and hence c coefficient vectors
• How do we finally combine them?
• A simple heuristic approach: score each feature by its largest coefficient across the c vectors
illustration
(figure: the c coefficient vectors a_1, …, a_c are combined into a single score per feature)
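The heuristic can be sketched as follows (a minimal illustration; the function name is hypothetical):

```python
import numpy as np

def select_features(A, n_selected):
    """Step 3: score feature j by max_k |a_{j,k}| and keep the top d'."""
    scores = np.abs(A).max(axis=1)                # largest coefficient per row
    return np.argsort(scores)[::-1][:n_selected]  # indices of the top features
```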
The final algorithm
• 1. manifold learning to obtain Y
• 2. sparse regression to select features
• 3. final combination
Lf = λDf,  Y = f
Step 1

(figure: the manifold embedding separates the four clusters, labeled 1–4, into the response Y)
Step 2
(figure: regress each column of Y on X; the nonzero entries of a mark the selected features)
Step 3
(figure: combine the c coefficient vectors into the final feature scores)
Discussion
Novelty of this approach
• Considers all features jointly
• Uses sparse (``lasso'') regression to perform feature selection
• Combines manifold learning with feature selection
Results
Face image clustering
Digit recognition
Summary
– The method is technically new and interesting
– But not ground-breaking
Summary
• Test on more challenging vision datasets:
  – Caltech
  – ImageNet