Unsupervised Learning I: K-Means Clustering
TRANSCRIPT
(Source slides: web.cecs.pdx.edu/~mm/MachineLearningWinter2019/KMeansClustering.pdf)
Unsupervised Learning I:
K-Means Clustering
Reading:
Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp. 487-515, 532-541, 546-552
(http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf)
Unsupervised learning = No labels on training examples!
Main approach: Clustering
Example: Optdigits data set
Optdigits features: each digit image is an 8×8 grid of pixel values f1 ... f64, so an instance is represented as x = (f1, f2, ..., f64) = (0, 2, 13, 16, 16, 16, 2, 0, 0, ...).
Partitional Clustering of Optdigits
(Figure: clusters of digit instances in the 64-dimensional feature space; axes show Features 1–3.)
Hierarchical Clustering of Optdigits
(Figure: nested clusters of digit instances in the 64-dimensional feature space; axes show Features 1–3.)
Issues for clustering algorithms
• How to measure distance between pairs of instances?
• How many clusters to create?
• Should clusters be hierarchical? (I.e., clusters of clusters)
• Should clustering be “soft”? (I.e., an instance can belong to different clusters, with “weighted belonging”)
Most commonly used (and simplest) clustering algorithm:
K-Means Clustering
Adapted from Andrew Moore, http://www.cs.cmu.edu/~awm/tutorials
K-means clustering algorithm
Typically, use the mean of the points in a cluster as its centroid.
Distance metric: chosen by the user. For numerical attributes, the L2 (Euclidean) distance is often used:
$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
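As a concrete sketch, the loop described above can be written in a few lines of plain Python (an illustration only, not code from the lecture; the `kmeans` and `closest` helpers and the toy points are invented for the example):

```python
def closest(point, centroids):
    """Index of the centroid nearest to point, by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))

def kmeans(points, centroids, iters=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in points:
            clusters[closest(x, centroids)].append(x)
        new = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:  # converged: means have stopped changing
            break
        centroids = new
    return centroids, clusters

# Two well-separated 2-D blobs; seeds are instances taken from the data.
pts = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4), (5.0, 5.0), (5.3, 4.8), (4.9, 5.4)]
cents, clusters = kmeans(pts, [pts[0], pts[3]])
```

With seeds taken from each blob, the run converges in two iterations to one cluster per blob.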
Example: Image segmentation by K-means clustering of pixels by color (from http://vitroz.com/Documents/Image%20Segmentation.pdf)
(Figures: segmentations with K=5 and K=10 in RGB space.)
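A toy version of this idea can be sketched by clustering pixel colors and repainting each pixel with its cluster's mean color (a minimal illustration with made-up pixel data; a real segmentation would run on an actual image's RGB values):

```python
def nearest(px, palette):
    """Index of the palette color closest to pixel px (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(px, palette[j])))

def quantize(pixels, palette, iters=20):
    """K-means in RGB space: palette entries play the role of centroids."""
    for _ in range(iters):
        groups = [[] for _ in palette]
        for px in pixels:
            groups[nearest(px, palette)].append(px)
        palette = [tuple(sum(ch) / len(g) for ch in zip(*g)) if g else palette[j]
                   for j, g in enumerate(groups)]
    # Repaint: every pixel becomes its cluster's mean color.
    return [palette[nearest(px, palette)] for px in pixels], palette

# A tiny "image": mostly reddish and bluish pixels, K = 2.
pixels = [(250, 10, 10), (240, 20, 5), (255, 0, 0), (10, 10, 250), (0, 0, 255)]
segmented, palette = quantize(pixels, [pixels[0], pixels[3]])
```

After clustering, all reddish pixels share one mean color and all bluish pixels share the other, which is exactly the segmentation effect shown in the figures.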
Clustering text documents
• A text document is represented as a feature vector of word frequencies.
• Similarity between two documents is the cosine of the angle between their feature vectors (so distance can be taken as 1 − cosine).
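The cosine measure can be sketched directly from its definition (the term-frequency vectors are invented for the example):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two word-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Term-frequency vectors over a shared vocabulary (toy data).
doc_a = [3, 0, 1, 2]
doc_b = [6, 0, 2, 4]   # same direction as doc_a: similarity 1
doc_c = [0, 5, 0, 0]   # shares no terms with doc_a: similarity 0
```

Note that cosine ignores document length: doc_b is doc_a with every count doubled, yet the two are maximally similar.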
Figure 4. Two-dimensional map of the PMRA cluster solution, representing nearly 29,000 clusters and over two million articles.
Boyack KW, Newman D, Duhon RJ, Klavans R, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6(3): e18029. doi:10.1371/journal.pone.0018029 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018029
Exercise 1
How to evaluate clusters produced by K-means?
• Unsupervised evaluation
• Supervised evaluation
Unsupervised Cluster Evaluation
We don’t know the classes of the data instances
Let C denote a clustering (i.e., the set of K clusters produced by a clustering algorithm), let ci denote a cluster in C, and let |ci| denote the number of elements in ci.
How can we quantify how good a clustering this is?
(Figure: clustering C with clusters c1, c2, c3, where |c1| = 9, |c2| = 6, |c3| = 6.)
We want to maximize the coherence of each cluster c – i.e., minimize the distance between the elements of c and the centroid $\mu_c$ – by minimizing the Mean Square Error (mse):
$\mathrm{mse}(c) = \frac{\sum_{x \in c} d(x, \mu_c)^2}{|c|}$
$\text{average mse}(C) = \frac{\sum_{c \in C} \mathrm{mse}(c)}{K}$
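In code, the two quantities are a direct transcription of the formulas (the cluster data here is invented):

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mse(cluster, centroid):
    """Mean squared distance from the cluster's points to its centroid."""
    return sum(sq_dist(x, centroid) for x in cluster) / len(cluster)

def average_mse(clusters, centroids):
    """Average of the per-cluster mse over all K clusters."""
    return sum(mse(c, m) for c, m in zip(clusters, centroids)) / len(clusters)

c1, m1 = [(0, 0), (2, 0)], (1, 0)    # each point is distance 1 from the centroid
c2, m2 = [(5, 5), (5, 7)], (5, 6)
```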
Note: The assigned reading uses sum square error rather than mean square error.
We also want to maximize the pairwise separation of the clusters. That is, maximize the Mean Square Separation (mss):
$\mathrm{mss}(C) = \frac{\sum_{\text{distinct pairs } i \ne j} d(\mu_i, \mu_j)^2}{K(K-1)/2}$
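The mss formula can be sketched as (centroid values invented for the example):

```python
from itertools import combinations

def mss(centroids):
    """Mean squared separation: average squared distance over all
    K(K-1)/2 distinct pairs of cluster centroids."""
    pairs = list(combinations(centroids, 2))
    return sum(sum((a - b) ** 2 for a, b in zip(mi, mj))
               for mi, mj in pairs) / len(pairs)

centroids = [(0, 0), (4, 0), (0, 3)]   # pairwise squared distances: 16, 9, 25
```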
Exercises 2-3
Supervised Cluster Evaluation
Suppose we know the classes of the data instances: black, blue, green
(Figure: clusters c1, c2, c3, each containing instances of several classes.)
Entropy of a cluster: The degree to which a cluster consists of objects of a single class.
$\mathrm{entropy}(c_i) = -\sum_{j=1}^{|\text{Classes}|} p_{i,j} \log_2 p_{i,j}$
where $p_{i,j}$ = probability that a member of cluster i belongs to class j $= \frac{m_{i,j}}{m_i}$, where $m_{i,j}$ is the number of instances in cluster i with class j and $m_i$ is the number of instances in cluster i.
For the example clustering:
$\mathrm{entropy}(c_1) = -\left(\tfrac{3}{9}\log_2\tfrac{3}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{2}{9}\log_2\tfrac{2}{9}\right) \approx 1.53$
$\mathrm{entropy}(c_2) = -\left(\tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{4}{6}\log_2\tfrac{4}{6} + \tfrac{1}{6}\log_2\tfrac{1}{6}\right) \approx 1.25$
$\mathrm{entropy}(c_3) = -\left(\tfrac{4}{6}\log_2\tfrac{4}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6}\right) \approx 0.92$
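These per-cluster computations can be reproduced with a few lines (class counts taken from the example above):

```python
from math import log2

def cluster_entropy(class_counts):
    """Entropy of a cluster, given the number of its instances in each class."""
    m = sum(class_counts)
    return -sum((k / m) * log2(k / m) for k in class_counts if k > 0)

e1 = cluster_entropy([3, 4, 2])   # cluster c1: 3 + 4 + 2 = 9 instances
e2 = cluster_entropy([1, 4, 1])   # cluster c2
e3 = cluster_entropy([4, 2])      # cluster c3
```

A cluster whose members all share one class has entropy 0; the `if k > 0` guard skips absent classes, matching the convention $0 \log 0 = 0$.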
Mean entropy of a clustering: the average entropy over all clusters, weighted by the number of elements in each cluster:
$\text{mean entropy}(C) = \sum_{i=1}^{K} \frac{m_i}{m}\, \mathrm{entropy}(c_i)$
where $m_i$ is the number of instances in cluster $c_i$ and $m$ is the total number of instances in the clustering.
For the example clustering:
$\text{mean entropy}(C) = \tfrac{9}{21}(1.53) + \tfrac{6}{21}(1.25) + \tfrac{6}{21}(0.92) \approx 1.28$
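The weighted average can be sketched as follows, feeding in the per-cluster class counts from the example (|c1| = 9, |c2| = |c3| = 6, so m = 21):

```python
from math import log2

def cluster_entropy(class_counts):
    """Entropy of a cluster from its per-class instance counts."""
    m = sum(class_counts)
    return -sum((k / m) * log2(k / m) for k in class_counts if k > 0)

def mean_entropy(per_cluster_counts):
    """Each cluster's entropy, weighted by its share of all instances."""
    total = sum(sum(counts) for counts in per_cluster_counts)
    return sum(sum(counts) / total * cluster_entropy(counts)
               for counts in per_cluster_counts)

me = mean_entropy([[3, 4, 2], [1, 4, 1], [4, 2]])
```

Lower mean entropy means clusters agree better with the true class labels; a clustering whose clusters are all pure scores 0.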
What is the mean entropy of this clustering?
(Figure: a second example clustering with clusters c1, c2, c3.)
Entropy Exercise
Issues for K-means
• The algorithm is only applicable if the mean is defined. For categorical data, use K-modes: the centroid is represented by the most frequent value of each attribute.
• The user needs to specify K.
• The algorithm is sensitive to outliers. Outliers are data points that are very far away from other data points; they could be errors in the data recording, or special data points with very different values.
Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
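For the categorical case, the K-modes "centroid" update can be sketched as (attribute values invented for the example):

```python
from collections import Counter

def mode_centroid(cluster):
    """K-modes centroid: the most frequent value of each categorical attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))

# Each instance has two categorical attributes: color and animal.
cluster = [("red", "cat"), ("red", "dog"), ("blue", "cat")]
center = mode_centroid(cluster)
```

The distance measure changes accordingly: instead of Euclidean distance, K-modes typically counts the number of attributes on which two instances disagree.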
Issues for K-means: Problems with outliers
Dealing with outliers
• One method is to remove, during the clustering process, data points that are much further away from the centroids than other data points. This is expensive, and not always a good idea!
• Another method is to perform random sampling: since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small. The remaining data points are then assigned to clusters by distance or similarity comparison, or by classification.
Issues for K-means (cont.)
• The algorithm is sensitive to initial seeds.
(Figure: a choice of two initial seeds, marked +, leading to a poor clustering.)
Issues for K-means (cont.)
• If we use different seeds: good results.
• One can often improve K-means results by doing several random restarts.
• It is often useful to select instances from the data as initial seeds.
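Random restarts can be sketched like this: run K-means several times from different instance seeds and keep the run with the lowest sum of squared errors (all helper names and toy data here are invented for illustration):

```python
import random

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(points, k, rng, iters=50):
    """One K-means run, seeded with k instances sampled from the data."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            clusters[min(range(k), key=lambda j: sq_dist(x, centroids[j]))].append(x)
        centroids = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

def sse(centroids, clusters):
    """Total squared error of a clustering (lower is better)."""
    return sum(sq_dist(x, m) for m, c in zip(centroids, clusters) for x in c)

def best_of_restarts(points, k, restarts=10, seed=0):
    rng = random.Random(seed)          # fixed seed for reproducibility
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: sse(*run))

pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
cents, clusters = best_of_restarts(pts, 2)
```

Keeping the minimum-SSE run makes the overall result far less dependent on any single unlucky seeding.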
Issues for K-means (cont.)
• The K-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).
Other Issues
• What if a cluster is empty?
– Choose a replacement centroid: at random, or from the cluster that has the highest mean square error.
• How to choose K?
• The assigned reading discusses several methods for improving a clustering with “postprocessing”.
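One way to realize the "replacement centroid from the highest-mse cluster" idea in code: take the point farthest from the centroid of the worst cluster (a sketch; the helper names and data are invented):

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def replacement_centroid(clusters, centroids):
    """Seed for an emptied cluster: the farthest point from the centroid of
    the cluster that currently has the highest mean square error."""
    def mse(c, m):
        return sum(sq_dist(x, m) for x in c) / len(c)
    worst_c, worst_m = max(zip(clusters, centroids), key=lambda cm: mse(*cm))
    return max(worst_c, key=lambda x: sq_dist(x, worst_m))

clusters = [[(0.0, 0.0), (0.1, 0.0)],
            [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0)]]   # second cluster is spread out
centroids = [(0.05, 0.0), (11 / 3, 0.0)]
seed = replacement_centroid(clusters, centroids)
```

Splitting the worst cluster this way both fills the empty slot and reduces the largest source of error.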
Choosing the K in K-Means
• Hard problem! Often there is no “correct” answer for unlabeled data.
• Many methods have been proposed. Here are a few:
• Try several values of K and see which is best, via cross-validation.
– Metrics: mean square error, mean square separation, and a penalty for too many clusters [why?]
• Start with K = 2, then try splitting each cluster.
– The new means are one sigma away from the cluster center, in the direction of greatest variation.
– Use metrics similar to the above.
• “Elbow” method:
– Plot average MSE vs. K; choose the K at which the MSE (or other metric) stops decreasing abruptly – the “elbow” of the curve.
– However, sometimes there is no clear “elbow”.
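The elbow heuristic can be sketched as a rule over a precomputed MSE-vs-K curve: pick the smallest K after which the relative improvement falls below a threshold (the curve values and the 15% threshold are invented for the example):

```python
def pick_elbow(mse_by_k, threshold=0.15):
    """Smallest K after which adding a cluster no longer reduces the average
    MSE by more than `threshold` (as a fraction). Heuristic only: some
    curves have no clear elbow."""
    ks = sorted(mse_by_k)
    for k_prev, k in zip(ks, ks[1:]):
        drop = (mse_by_k[k_prev] - mse_by_k[k]) / mse_by_k[k_prev]
        if drop < threshold:
            return k_prev
    return ks[-1]

# Average MSE from (hypothetical) K-means runs at K = 1..5:
curve = {1: 100.0, 2: 40.0, 3: 12.0, 4: 11.0, 5: 10.5}
chosen_k = pick_elbow(curve)
```

Here the MSE drops sharply up to K = 3 and barely improves afterwards, so the heuristic selects K = 3.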