Machine Learning: Clustering
Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo
Clustering
• Grouping data into (hopefully useful) sets.
[Figure: points grouped into two example clusters, "things on the left" and "things on the right"]
Clustering
• Unsupervised Learning
  – No labels
• Why do clustering?
  – Hypothesis generation / data understanding
    • Clusters might suggest natural groups.
  – Visualization
  – Data pre-processing, e.g.:
    • Medical diagnosis
    • Text classification (e.g., search engines, Google Sets)
Some definitions
• Let X be the dataset: $X = \{x_1, x_2, x_3, \ldots, x_n\}$
• An m-clustering of X is a partition of X into m sets (clusters) $C_1, \ldots, C_m$ such that (a concrete check appears below):
  1. Clusters are non-empty: $C_i \neq \{\},\ i = 1, \ldots, m$
  2. Clusters cover all of X: $\bigcup_{i=1}^{m} C_i = X$
  3. Clusters do not overlap: $C_i \cap C_j = \{\}$ if $i \neq j$
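To make the three conditions concrete, here is a minimal Python sketch (the names are mine, not from the slides) that tests whether a candidate clustering is a valid partition of X:

```python
def is_m_clustering(clusters, X):
    """Check that `clusters` (a list of sets) is a valid m-clustering of set X."""
    # 1. Clusters are non-empty.
    if any(len(C) == 0 for C in clusters):
        return False
    # 2. Clusters cover all of X.
    if set().union(*clusters) != X:
        return False
    # 3. Clusters do not overlap: sizes sum to |X| only if pairwise disjoint.
    return sum(len(C) for C in clusters) == len(X)

X = {1, 2, 3, 4, 5}
print(is_m_clustering([{1, 2}, {3, 4, 5}], X))     # True: a valid 2-clustering
print(is_m_clustering([{1, 2}, {2, 3, 4, 5}], X))  # False: clusters overlap
```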
How many possible clusterings? (Stirling numbers)

$$S(n, m) = \frac{1}{m!} \sum_{i=0}^{m} (-1)^{m-i} \binom{m}{i}\, i^{n}$$

where n is the size of the dataset and m is the number of clusters. For example:

S(15, 3) = 2,375,101
S(20, 4) = 45,232,115,901
S(100, 5) ≈ 10^68
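A short Python check (the function name is mine) that evaluates this formula and reproduces the values above:

```python
from math import comb, factorial

def stirling2(n, m):
    """Number of ways to partition n items into exactly m non-empty clusters."""
    return sum((-1) ** (m - i) * comb(m, i) * i ** n for i in range(m + 1)) // factorial(m)

print(stirling2(15, 3))            # 2375101
print(stirling2(20, 4))            # 45232115901
print(f"{stirling2(100, 5):.1e}")  # 6.6e+67, on the order of 10^68 as above
```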
What does this mean?
• We can’t try all possible clusterings.
• Clustering algorithms look at a small fraction of all partitions of the data.
• The exact partitions tried depend on the kind of clustering used.
Who is right?
• Different techniques cluster the same data set DIFFERENTLY.
• Who is right? Is there a “right” clustering?
Classic Example: Half Moons
From Batra et al., http://www.cs.cmu.edu/~rahuls/pub/bmvc2008-clustering-rahuls.pdf
Steps in Clustering
• Select Features
• Define a Proximity Measure
• Define a Clustering Criterion
• Define a Clustering Algorithm
• Validate the Results
• Interpret the Results
Kinds of Clustering
• Sequential
  – Fast
  – Results depend on data order
• Hierarchical
  – Start with many clusters
  – Join clusters at each step
• Cost Optimization
  – Fixed number of clusters (typically)
  – Boundary detection, probabilistic classifiers
A Sequential Clustering Method
• Basic Sequential Algorithmic Scheme (BSAS)
  – S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, London, England, 1999
• Assumption: the number of clusters is not known in advance.
• Let:
  – d(x, C) be the distance between feature vector x and cluster C
  – Θ be the threshold of dissimilarity
  – q be the maximum number of clusters
BSAS Pseudo Code
m = 1
C_1 = {x_1}
For i = 2 to n
    Find C_k : d(x_i, C_k) = min_j d(x_i, C_j)
    If (d(x_i, C_k) > Θ) and (m < q)
        m = m + 1
        C_m = {x_i}
    Else
        C_k = C_k ∪ {x_i}
    End
End
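A minimal Python sketch of BSAS, filling in two choices the slide leaves open (Euclidean distance, with d(x, C) taken as the distance to the cluster mean):

```python
import numpy as np

def bsas(X, theta, q):
    """Basic Sequential Algorithmic Scheme.
    X: (n, d) array of feature vectors; theta: dissimilarity threshold;
    q: maximum number of clusters. Returns clusters as lists of indices."""
    clusters = [[0]]                  # C_1 = {x_1}
    means = [X[0].astype(float)]      # represent each cluster by its mean
    for i in range(1, len(X)):
        # Find the nearest cluster C_k.
        dists = [np.linalg.norm(X[i] - mu) for mu in means]
        k = int(np.argmin(dists))
        if dists[k] > theta and len(clusters) < q:
            clusters.append([i])      # start a new cluster for x_i
            means.append(X[i].astype(float))
        else:
            clusters[k].append(i)     # C_k = C_k ∪ {x_i}
            means[k] = X[clusters[k]].mean(axis=0)
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(bsas(X, theta=1.0, q=3))  # [[0, 1], [2, 3]]
```

Because it makes a single pass, a different presentation order of X can yield different clusters, which is the order sensitivity noted earlier.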
A Cost-Optimization Method
• K-means clustering
  – J. B. MacQueen (1967): "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
• A greedy algorithm
• Partitions n samples into k clusters
• Minimizes the sum of the squared distances to the cluster centers
The K-means algorithm
1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids (means).
2. Assign each object to the group that has the closest centroid (mean).
3. When all objects have been assigned, recalculate the positions of the K centroids (means).
4. Repeat steps 2 and 3 until the centroids no longer move (sketched below).
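A compact NumPy sketch of these four steps, assuming random-sample initialization (which, as the next slide notes, is only one possible choice):

```python
import numpy as np

def kmeans(X, k, seed=0):
    """Lloyd's k-means on an (n, d) array X with k clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: initial centroids, here chosen as k random samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    while True:
        # Step 2: assign each object to the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        centroids_new = np.array([X[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        # Step 4: repeat until the centroids no longer move.
        if np.allclose(centroids_new, centroids):
            return labels, centroids
        centroids = centroids_new

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels, centroids = kmeans(X, k=2)
print(np.round(centroids, 1))  # two centers, near (0, 0) and (5, 5)
```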
K-means clustering
• The way to initialize the mean values is not specified.
  – Randomly choose k samples?
• Results depend on the initial means.
  – Try multiple starting points?
• Assumes K is known.
  – How do we choose this?
EM Algorithm
• General probabilistic approach to dealing with missing data
• "Parameters" (model)
  – For mixture models (MMs): cluster distributions P(x | c_i)
    • For mixtures of Gaussians (MoGs): mean μ_i and variance σ_i² of each c_i
• "Variables" (data)
  – For MMs: assignments of data points to clusters
    • Probabilities of these represented as P(c_i | x_j)
• Idea: alternately optimize parameters and variables (see the sketch below)
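To make the alternation concrete, here is a minimal 1-D mixture-of-Gaussians sketch (my own code; equal mixing weights are assumed for brevity, a simplification of the general model):

```python
import numpy as np

def em_mog_1d(x, k, n_iter=100, seed=0):
    """EM for a 1-D mixture of k Gaussians with equal mixing weights."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)  # initial means
    var = np.full(k, x.var())                  # initial variances
    for _ in range(n_iter):
        # E-step ("variables"): responsibilities, i.e. P(c_i | x_j).
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step ("parameters"): re-estimate each mean and variance.
        weight = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / weight
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / weight
    return mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)])
mu, var = em_mog_1d(x, k=2)
print(np.round(mu, 1))  # means recovered near 0 and 6
```

K-means is the limiting case of this loop in which each responsibility is rounded to 0 or 1 and the variances are held fixed, which is why the summary calls k-means a special case of mixture modeling.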
Mixture Models for Documents
• Simultaneously learn P(w | topic) and P(topic | doc)
From Blei et al., 2003 (http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf)
Greedy Hierarchical Clustering
• Initialize one cluster for each data point
• Until done:
  – Merge the two nearest clusters (sketched below)
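A naive single-linkage sketch of this loop (the linkage and stopping rule are my assumptions; since the slide leaves "done" open, this version merges until a target number of clusters remains):

```python
import numpy as np

def agglomerate(X, target_k):
    """Naive greedy agglomerative clustering with single-linkage distance."""
    clusters = [[i] for i in range(len(X))]  # one cluster per data point
    while len(clusters) > target_k:
        # Find the two nearest clusters (closest pair of member points).
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a].extend(clusters.pop(b))  # merge the two nearest clusters
    return clusters

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]])
print(agglomerate(X, target_k=3))  # [[0, 1], [2, 3], [4]]
```

A real implementation caches the pairwise distance matrix rather than recomputing linkages at every merge; that recomputation is what makes this naive version slow.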
Hierarchical Clustering on Strings
• Features = contexts in which strings appear
Summary
• Algorithms:
  – Sequential clustering
    • Requires key distance threshold; sensitive to data order
  – K-means clustering
    • Requires # of clusters; sensitive to initial conditions
    • Special case of mixture modeling
  – Greedy agglomerative clustering
    • Naively takes on the order of n² runtime
    • Hard to tell when you're "done"