Integrating Constraints and Metric Learning in Semi-Supervised Clustering
![Page 1: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/1.jpg)
Integrating Constraints and Metric Learning in Semi-Supervised Clustering
Mikhail Bilenko, Sugato Basu, Raymond J. Mooney
ICML 2004
Presented by Xin Li
![Page 2: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/2.jpg)
Semi-Supervised Clustering
K=4
![Page 3: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/3.jpg)
Semi-Supervised Clustering
![Page 4: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/4.jpg)
Semi-Supervised Clustering
![Page 5: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/5.jpg)
How to exploit supervision in clustering
Incorporate supervision as constraints
Learn a distance metric using supervision
Integration of these two approaches
![Page 6: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/6.jpg)
K-means Clustering
X = {x1,x2,…}
L = {l1,l2,…,lk}
Euclidean Distance:
Minimizing:
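For reference, the standard forms of the two quantities named on this slide can be sketched as follows. This assumes that \mu_h denotes the centroid of cluster h, that l_i is the cluster label of point x_i, and that N is the number of points; the symbols \mu and N are notation introduced here, not taken from the slide.

```latex
% Euclidean distance between a point and a cluster centroid
d(x_i, \mu_h) = \lVert x_i - \mu_h \rVert = \sqrt{(x_i - \mu_h)^{\top} (x_i - \mu_h)}

% K-means objective: total squared distance of each point to its assigned centroid
J_{\text{kmeans}} = \sum_{i=1}^{N} \lVert x_i - \mu_{l_i} \rVert^{2}
```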
![Page 7: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/7.jpg)
Clustering with constraints
Pairwise constraints:
M: must-link pairs; (xi, xj) should be in the same cluster
C: cannot-link pairs; (xi, xj) should be in different clusters
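A minimal sketch of how the two constraint sets can be checked against a cluster assignment. The function name `count_violations` and the index-pair representation of M and C are assumptions made for illustration.

```python
def count_violations(labels, must_link, cannot_link):
    """Count violated pairwise constraints for a given cluster assignment.

    labels: list where labels[i] is the cluster index of point i.
    must_link, cannot_link: iterables of (i, j) index pairs (the sets M and C).
    """
    # A must-link pair is violated when its two points land in different clusters.
    violated_m = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its two points share a cluster.
    violated_c = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return violated_m, violated_c


labels = [0, 0, 1, 1]
M = [(0, 1), (1, 2)]  # (1, 2) is violated: points 1 and 2 are in different clusters
C = [(0, 3), (2, 3)]  # (2, 3) is violated: points 2 and 3 are in the same cluster
print(count_violations(labels, M, C))  # (1, 1)
```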
![Page 8: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/8.jpg)
Learning a pairwise distance metric
Binary classification: (xi, xj) → 0/1
M: positive examples; (xi, xj) are in the same cluster
C: negative examples; (xi, xj) are in different clusters
Apply the learned distance metric in clustering
Metric learning and clustering are disjoint
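As a rough illustration of the binary-classification view, the constraint pairs can be turned into labeled training examples. The feature used here (per-dimension absolute difference) and the name `pairs_to_examples` are illustrative assumptions; the slide does not say how pairs are represented.

```python
def pairs_to_examples(points, must_link, cannot_link):
    """Turn pairwise constraints into (feature, label) training examples.

    Feature: per-dimension absolute difference |xi - xj| (an illustrative choice).
    Label: 1 for must-link pairs (same cluster), 0 for cannot-link pairs.
    """
    examples = []
    for label, pairs in ((1, must_link), (0, cannot_link)):
        for i, j in pairs:
            feature = [abs(p - q) for p, q in zip(points[i], points[j])]
            examples.append((feature, label))
    return examples


points = [[0.0, 0.0], [1.0, 0.5], [5.0, 5.0]]
examples = pairs_to_examples(points, must_link=[(0, 1)], cannot_link=[(0, 2)])
print(examples)  # [([1.0, 0.5], 1), ([5.0, 5.0], 0)]
```

Any binary classifier could then be trained on these examples, and its output used as a pairwise distance in a separate clustering step.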
![Page 9: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/9.jpg)
Unsupervised Clustering with Metric Learning
Maximizing the complete data log-likelihood under generalized K-means
Learn a distance metric that optimizes a quality function
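The objective on this slide is not in the transcript. A sketch of the generalized K-means objective, following the form used in the Bilenko, Basu and Mooney paper, is given below; it assumes one positive-definite matrix A_h per cluster, with \mu_h the centroid of cluster h and l_i the label of x_i. Minimizing it corresponds to maximizing the complete-data log-likelihood under Gaussian clusters.

```latex
% Distance under a cluster-specific metric A_h
\lVert x_i - \mu_h \rVert_{A_h} = \sqrt{(x_i - \mu_h)^{\top} A_h \,(x_i - \mu_h)}

% Generalized K-means objective with per-cluster metrics
J_{\text{mkmeans}} = \sum_{x_i \in X} \left( \lVert x_i - \mu_{l_i} \rVert^{2}_{A_{l_i}} - \log \det A_{l_i} \right)
```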
![Page 10: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/10.jpg)
Integrating Constraints and Metric Learning
Combining the previous two equations leads to the following objective function, which minimizes cluster dispersion under the learned metrics while reducing constraint violations.
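The objective itself is not in the transcript. A sketch following the form in the Bilenko, Basu and Mooney paper is shown below. The weights w_{ij} and \bar{w}_{ij}, the penalty functions f_M and f_C, and the indicator \mathbb{1}[\cdot] are notation assumed here rather than read from the slide.

```latex
J_{\text{mpckm}} =
  \sum_{x_i \in X} \left( \lVert x_i - \mu_{l_i} \rVert^{2}_{A_{l_i}} - \log \det A_{l_i} \right)
  + \sum_{(x_i, x_j) \in M} w_{ij} \, f_M(x_i, x_j) \, \mathbb{1}[l_i \neq l_j]
  + \sum_{(x_i, x_j) \in C} \bar{w}_{ij} \, f_C(x_i, x_j) \, \mathbb{1}[l_i = l_j]
```

The first sum is the dispersion term under the per-cluster metrics; the second and third sums charge a cost only for must-link and cannot-link pairs that the current assignment violates.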
![Page 11: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/11.jpg)
Penalty for violating constraints
The penalty for violating a must-link constraint between distant points should be higher than that between nearby points.
The penalty for violating a cannot-link constraint between nearby points should be higher than that between distant points.
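One way to realize these two requirements is sketched below, assuming a diagonal metric stored as a list of per-dimension weights. The function names, and the use of a "maximally separated" reference pair `far_pair` in the cannot-link penalty, are assumptions modeled on the paper rather than taken from the slide.

```python
def sq_dist(x, y, a):
    """Squared distance under a diagonal metric: sum_d a[d] * (x[d] - y[d])**2."""
    return sum(w * (p - q) ** 2 for p, q, w in zip(x, y, a))


def must_link_penalty(xi, xj, a_i, a_j):
    """Grows with the distance between the pair, averaged over both clusters' metrics."""
    return 0.5 * sq_dist(xi, xj, a_i) + 0.5 * sq_dist(xi, xj, a_j)


def cannot_link_penalty(xi, xj, a, far_pair):
    """Shrinks with the distance between the pair.

    far_pair: the two most widely separated points under metric a (assumed given),
    which keeps the penalty non-negative.
    """
    far_1, far_2 = far_pair
    return sq_dist(far_1, far_2, a) - sq_dist(xi, xj, a)


a = [1.0, 1.0]                           # identity metric, for illustration
near = ([0.0, 0.0], [1.0, 0.0])          # squared distance 1
far = ([0.0, 0.0], [3.0, 0.0])           # squared distance 9
far_pair = ([0.0, 0.0], [4.0, 0.0])      # squared distance 16

ml_near = must_link_penalty(*near, a, a)             # 1.0
ml_far = must_link_penalty(*far, a, a)               # 9.0
cl_near = cannot_link_penalty(*near, a, far_pair)    # 15.0
cl_far = cannot_link_penalty(*far, a, far_pair)      # 7.0
print(ml_near, ml_far, cl_near, cl_far)
```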
![Page 12: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/12.jpg)
MPCK-MEANS Algorithm
Constraints are utilized during cluster initialization and when assigning points to clusters.
The distance metric is adapted by re-estimating the weights in matrices Ah.
![Page 13: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/13.jpg)
Initialization
An initial guess of the clusters.
Assign each point x to one of K clusters in a way that satisfies the constraints.
Compute the centroid of each cluster.
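A simplified sketch of a constraint-aware initialization: group points by the transitive closure of the must-link pairs and take each group's centroid as a candidate initial center. This covers only must-link constraints; how cannot-link pairs are handled and how exactly K groups are selected are left out, and the helper names are hypothetical.

```python
def must_link_components(n_points, must_link):
    """Group point indices into connected components of the must-link graph."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in must_link:
        parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def centroid(points, indices):
    """Mean of the selected points, dimension by dimension."""
    dim = len(points[0])
    return [sum(points[i][d] for i in indices) / len(indices) for d in range(dim)]


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [5.0, 5.0]]
M = [(0, 1), (2, 3)]
groups = must_link_components(len(points), M)
centers = [centroid(points, g) for g in groups]
print(groups)   # [[0, 1], [2, 3], [4]]
print(centers)  # [[1.0, 0.0], [11.0, 10.0], [5.0, 5.0]]
```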
![Page 14: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/14.jpg)
E-step
Every point x is assigned to the cluster that minimizes the sum of the distance of x to the cluster centroid according to the local metric and the cost of any constraint violations incurred by the cluster assignment.
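The assignment rule can be sketched as follows, assuming diagonal per-cluster metrics and, as a simplification, a constant cost `w` per violated constraint instead of a distance-dependent penalty. The function `assign_point` and its parameters are hypothetical names.

```python
import math


def sq_dist(x, y, a):
    """Squared distance under a diagonal metric a."""
    return sum(w * (p - q) ** 2 for p, q, w in zip(x, y, a))


def assign_point(i, points, labels, centroids, metrics, must_link, cannot_link, w=1.0):
    """Return the cluster minimizing metric distance plus constraint-violation cost.

    labels[j] is the current cluster of point j, or None if not yet assigned.
    """
    best_h, best_cost = None, math.inf
    for h, (mu, a) in enumerate(zip(centroids, metrics)):
        # Distance under the local metric, with the log-determinant term
        # (for a diagonal metric, log det is the sum of log weights).
        cost = sq_dist(points[i], mu, a) - sum(math.log(v) for v in a)
        for p, q in must_link:
            if i in (p, q):
                other = q if p == i else p
                if labels[other] is not None and labels[other] != h:
                    cost += w  # must-link partner sits in a different cluster
        for p, q in cannot_link:
            if i in (p, q):
                other = q if p == i else p
                if labels[other] == h:
                    cost += w  # cannot-link partner sits in this cluster
        if cost < best_cost:
            best_h, best_cost = h, cost
    return best_h


points = [[0.0, 0.0], [4.0, 0.0], [1.8, 0.0]]
labels = [0, 1, None]
centroids = [[0.0, 0.0], [4.0, 0.0]]
metrics = [[1.0, 1.0], [1.0, 1.0]]

unconstrained = assign_point(2, points, labels, centroids, metrics, [], [])
constrained = assign_point(2, points, labels, centroids, metrics, [(2, 1)], [], w=5.0)
print(unconstrained, constrained)  # 0 1
```

Point 2 is nearer to centroid 0, but a must-link to point 1 with a large enough weight pulls it into cluster 1.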
![Page 15: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/15.jpg)
M-Step
Update metrics: set ∂J/∂Ah = 0 and solve for each Ah
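To show where a closed-form metric update can come from, here is a worked derivation for a simplified case: a single cluster h with point set X_h and centroid \mu_h, with all constraint terms ignored. This is a sketch under those assumptions, not the full update from the slide; with constraints, additional weighted outer-product terms would enter the sum being inverted.

```latex
% Per-cluster objective without constraint terms
J_h(A_h) = \sum_{x_i \in X_h} (x_i - \mu_h)^{\top} A_h \,(x_i - \mu_h) - \lvert X_h \rvert \log \det A_h

% Set the derivative with respect to A_h to zero
\frac{\partial J_h}{\partial A_h}
  = \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^{\top} - \lvert X_h \rvert \, A_h^{-1} = 0

% Solve for A_h: the inverse of the cluster's sample covariance
A_h = \lvert X_h \rvert \left( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^{\top} \right)^{-1}
```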
![Page 16: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/16.jpg)
Experimental Setting
![Page 17: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/17.jpg)
Single Metric, Diagonal Matrix A
![Page 18: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/18.jpg)
Single Metric, Diagonal Matrix A
![Page 19: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/19.jpg)
Multiple Metrics, Full Matrix A
![Page 20: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/20.jpg)
Multiple Metrics, Full Matrix A
![Page 21: Integrating Constraints and Metric Learning in Semi-Supervised Clustering](https://reader035.vdocuments.us/reader035/viewer/2022062323/56815c91550346895dca9f4e/html5/thumbnails/21.jpg)
Conclusion and Discussion
This paper has presented MPCK-MEANS, a new approach to semi-supervised clustering.
Supervision and metric learning are helpful in clustering and multiple distance metrics are not necessary in most cases.
Question 1: If we have supervision in clustering, why not use it in the same way as in a typical classification task?
Question 2: If there is an infinite number of classes, can we gain from supervision on only some of them?