Feature Selection in k-Median Clustering
Olvi Mangasarian and Edward Wild
University of Wisconsin - Madison
Principal Objective
Find a reduced number of input space features such that clustering in the reduced space closely replicates the clustering in the full dimensional space
Basic Idea
Based on rigorous optimization theory, make a simple but fundamental modification in one of the two steps of the k-median algorithm
In each cluster, find a point closest in the 1-norm to all points in that cluster and to the median of ALL data points
Proposed approach can lead to a feature reduction as high as 64%, with clustering that agrees to within 4% with the clustering obtained using the original set of features
As the weight given to the data median increases, more features are deleted from the problem
FSKM Example
Start with median at origin
Apply k-median algorithm
As weight of data median increases, features are removed from the problem
Outline of Talk
Ordinary k-median algorithm
Two steps of the algorithm
Feature Selecting k-Median (FSKM) Algorithm
Overall optimization objective
Basic idea
Mathematical optimization formulation
Algorithm statement
Numerical examples
Conclusion & outlook
Ordinary k-Median Algorithm
Given m data points in n-dimensional input feature space
Find k cluster centers such that the sum of the 1-norm distances between each data point and its closest cluster center is minimized
Minimizing a sum of pointwise minima of linear functions is a concave minimization problem and is NP-hard
However, the two-step k-median algorithm terminates in a finite number of steps at a point satisfying the minimum principle necessary optimality condition
Two-Step k-Median Algorithm
(0) Start with k initial cluster centers
(1) Assign each data point to a 1-norm closest cluster center
(2) For each cluster compute a new cluster center that is 1-norm closest to all points in the cluster (median of cluster)
(3) Stop if all cluster centers are unchanged else go to (1)
Algorithm terminates in a finite number of steps at a point satisfying the minimum principle necessary optimality conditions
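The two steps above can be sketched in NumPy. This is a minimal illustration under my own naming, not the authors' implementation:

```python
import numpy as np

def k_median(X, centers, max_iter=100):
    """Two-step k-median sketch: alternate 1-norm assignment (step 1)
    and per-cluster coordinate-wise medians (step 2) until the centers
    stop changing (step 3)."""
    centers = centers.astype(float).copy()
    for _ in range(max_iter):
        # Step (1): assign each point to its 1-norm closest center.
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Step (2): each new center is the coordinate-wise median of its
        # cluster -- the 1-norm closest point to all points in the cluster.
        new_centers = centers.copy()
        for c in range(len(centers)):
            pts = X[labels == c]
            if len(pts) > 0:
                new_centers[c] = np.median(pts, axis=0)
        # Step (3): stop when all cluster centers are unchanged.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Finite termination follows because each step can only decrease the objective and there are finitely many assignments.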
Key Change in Step (2) of k-Median Algorithm
(2) For each cluster compute a new cluster center that minimizes the sum of 1-norm distances to all points in the cluster plus a weighted 1-norm distance to the median of all data points
The weight of the 1-norm distance to the dataset median determines the number of features deleted:
For a zero weight no features are suppressed
For a sufficiently large weight all features are suppressed
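The modified step (2) has a closed form per coordinate: the minimizer of a sum of weighted absolute deviations is a weighted median. A minimal NumPy sketch, with the data median passed in explicitly (function names are mine, not from the talk):

```python
import numpy as np

def weighted_median(values, weights):
    """Minimizer of sum_i weights[i] * |c - values[i]|: the smallest value
    at which the cumulative weight reaches half the total weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def fskm_center(cluster_pts, data_median, lam):
    """Modified step (2): coordinate-wise, minimize the sum of 1-norm
    distances to the cluster's points plus lam times the 1-norm distance
    to the median of ALL data points."""
    m, n = cluster_pts.shape
    center = np.empty(n)
    for j in range(n):
        vals = np.append(cluster_pts[:, j], data_median[j])
        wts = np.append(np.ones(m), lam)  # weight lam pulls toward the data median
        center[j] = weighted_median(vals, wts)
    return center
```

With lam = 0 this reduces to the ordinary cluster median; for sufficiently large lam every coordinate snaps to the data median, which is the feature-suppression mechanism.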
FSKM Theory
Subgradients
For convex f, a subgradient f′(x) satisfies f(y) − f(x) ≥ f′(x)ᵀ(y − x) for all x, y ∈ Rⁿ
Consider f(x) = |x|, x ∈ R:
If x < 0, the subgradient is −1
If x > 0, the subgradient is 1
If x = 0, the subgradient is any value in [−1, 1]
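The three cases above can be checked numerically: any subgradient g must satisfy |y| ≥ |x| + g·(y − x) for all y. A small self-check, illustrative only:

```python
def subdiff_abs(x):
    """Subdifferential of f(x) = |x| on R, returned as an interval (lo, hi)."""
    if x < 0:
        return (-1.0, -1.0)
    if x > 0:
        return (1.0, 1.0)
    return (-1.0, 1.0)  # at x = 0, every slope in [-1, 1] is a subgradient

def check_subgradient_inequality(x, grid):
    """Verify |y| - |x| >= g * (y - x) for the endpoints of the subdifferential."""
    lo, hi = subdiff_abs(x)
    return all(abs(y) - abs(x) >= g * (y - x) - 1e-12
               for g in (lo, hi) for y in grid)
```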
FSKM Theory (Continued)
Zeroing Cluster Features (Based on Necessary and Sufficient Optimality Conditions for Nondifferentiable Convex Optimization)
That is, cⱼ = 0 whenever the weight λ on the distance to the data median meets or exceeds a threshold determined by the points of the cluster
FSKM Algorithm
FSKM Example (Revisited)
Start with median at origin
Apply k-median algorithm
Compute the thresholds λ:
Cluster 1: λₓ = 1, λ_y = 5
Cluster 2: λₓ = 0, λ_y = 4
max λₓ = 1, max λ_y = 5
For λ = 1, feature x is removed from the problem
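The thresholds in the example can be found numerically: with the data shifted so the overall median is at the origin, the per-cluster, per-coordinate threshold is the smallest λ at which c = 0 minimizes Σᵢ |c − aᵢ| + λ|c|. A brute-force sketch of my own, scanning λ and exploiting the fact that a piecewise-linear convex function attains its minimum at a breakpoint:

```python
import numpy as np

def zero_threshold(vals, lams):
    """Smallest lam in the scan at which c = 0 minimizes
    sum_i |c - vals[i]| + lam * |c|  (data median already shifted to 0).
    The minimum of this piecewise-linear function lies at a breakpoint,
    so it suffices to compare c = 0 against c = vals[i]."""
    candidates = np.append(vals, 0.0)
    for lam in lams:
        obj = lambda c: np.abs(c - vals).sum() + lam * abs(c)
        best = min(obj(c) for c in candidates)
        if obj(0.0) <= best + 1e-12:  # c = 0 is now optimal
            return lam
    return None
```

A feature leaves the problem once λ reaches the largest of its per-cluster thresholds, as in the example above: the largest threshold for feature x is 1, so x drops out at λ = 1.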
Numerical Testing
FSKM tested on five publicly available labeled datasets
Labels were used only to test effectiveness of FSKM
The data are first clustered using k-median; then FSKM is applied to delete one feature at a time
Without using data labels, “error” in FSKM clustering with reduced features is obtained by comparison with the “gold standard” clustering with the full set of features
FSKM clustering error curve obtained without labels is compared with classification error curve obtained using data labels
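The label-free "error" described above is the fraction of points whose reduced-feature cluster disagrees with the full-feature one, minimized over relabelings (cluster indices are arbitrary). A small sketch, assuming k is small enough to enumerate permutations; names are mine:

```python
from itertools import permutations

def clustering_disagreement(gold, reduced, k):
    """Fraction of points assigned differently by the reduced-feature
    clustering vs. the full-feature 'gold standard' clustering,
    minimized over all k! relabelings of the reduced clustering."""
    n = len(gold)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for g, r in zip(gold, reduced) if g != perm[r])
        best = min(best, mismatches)
    return best / n
```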
3-Class Wine Dataset: 178 Points in 13-dimensional Space
Remarks
Curves close together
Largest increase in error as last few features are removed
Reduced 13 features to 4: clustering error < 4%; classification error decreased by 0.56 percentage points
2-Class Votes Dataset: 435 Points in 16-dimensional Space
Remarks
Curves have similar shape
Largest increase in error as last few features are removed
Reduced 16 features to 3: clustering error < 10%; classification error increased by 1.84 percentage points
2-Class WDBC Dataset (Wisconsin Diagnostic Breast Cancer): 569 Points in 30-dimensional Space
Remarks
Curves have similar shape for 14 and fewer features
Removing the first 3 features causes no change in either error curve
Reduced 30 features to 7: clustering error < 10%; classification error increased by 3.69 percentage points
2-Class Star/Galaxy-Bright Dataset: 2462 Points in 14-dimensional Space
Remarks
Clustering error increases gradually as number of features is reduced
Some features may be obstructing classification
Reduced 14 features to 4: clustering error < 10%; classification error decreased by 1.42 percentage points
2-Class Cleveland Heart Dataset: 297 Points in 13-dimensional Space
Remarks
Largest increase in both curves going from 13 to 9 features
Most features useful?
Reduced 13 features to 8: clustering error < 17%; classification error increased by 7.74 percentage points
Conclusion
FSKM is a fast method for selecting relevant features while maintaining clusters similar to those in the original full dimensional space
Features selected by FSKM without labels may be useful for labeled data classification as well
FSKM eliminates costly search for appropriately reduced number of features required for clustering in smaller dimensional spaces (e.g. 14-choose-6 = 3003 k-median runs to get best 6 features out of 14 for the Star/Galaxy-Bright dataset compared to 9 k-median runs required by FSKM)
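The run count quoted above is just a binomial coefficient; a one-line check:

```python
from math import comb

# Exhaustive search over all 6-feature subsets of 14 features would need
# comb(14, 6) separate k-median runs, versus one run per deleted feature
# for FSKM.
print(comb(14, 6))  # 3003
```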
Outlook
Feature & data selection for support vector machines
Sparse kernel approximation methods
Gene expression selection
Incorporation of prior knowledge into learning
Optimization-based clustering may be useful in other machine learning applications
Minimalist supervised & unsupervised learning
Select minimal knowledge for best model
Web Pages (Containing Paper & Talk)
www.cs.wisc.edu/~olvi
www.cs.wisc.edu/~wildt