machine learning
![Page 1: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/1.jpg)
Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo
Machine Learning
Clustering
![Page 2: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/2.jpg)
Clustering
• Grouping data into (hopefully useful) sets.
[Figure labels: Things on the left · Things on the right]
![Page 3: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/3.jpg)
Clustering
• Unsupervised Learning
  – No labels
• Why do clustering?
  – Hypothesis Generation/Data Understanding
    • Clusters might suggest natural groups.
  – Visualization
  – Data pre-processing, e.g.:
    • Medical Diagnosis
    • Text Classification (e.g., search engines, Google Sets)
![Page 4: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/4.jpg)
Some definitions
• Let X be the dataset: $X = \{x_1, x_2, x_3, \ldots, x_n\}$
• An m-clustering of X is a partition of X into m sets (clusters) $C_1, \ldots, C_m$ such that:
  1. Clusters are non-empty: $C_i \neq \{\}$, $1 \le i \le m$
  2. Clusters cover all of X: $\bigcup_{i=1}^{m} C_i = X$
  3. Clusters do not overlap: $C_i \cap C_j = \{\}$ if $i \neq j$
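The three conditions above can be checked mechanically. The sketch below is only an illustration: the function name `is_clustering` and the choice to represent clusters as Python sets are assumptions, not part of the original slides.

```python
def is_clustering(X, clusters):
    """Return True if `clusters` (a list of sets) is a valid clustering of X."""
    X = set(X)
    union = set().union(*clusters)
    # 1. Clusters are non-empty.
    non_empty = all(len(c) > 0 for c in clusters)
    # 2. Clusters cover all of X.
    covers = union == X
    # 3. Clusters do not overlap: sizes add up exactly to the size of the union.
    disjoint = sum(len(c) for c in clusters) == len(union)
    return non_empty and covers and disjoint


print(is_clustering({1, 2, 3, 4}, [{1, 2}, {3, 4}]))     # valid 2-clustering
print(is_clustering({1, 2, 3, 4}, [{1, 2}, {2, 3, 4}]))  # clusters overlap
```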
![Page 5: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/5.jpg)
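How many possible clusterings? (Stirling numbers)
$$S(n, m) = \frac{1}{m!} \sum_{i=0}^{m} (-1)^{m-i} \binom{m}{i} \, i^{n}$$
where n is the size of the dataset and m is the number of clusters.
• $S(15, 3) = 2{,}375{,}101$
• $S(20, 4) = 45{,}232{,}115{,}901$
• $S(100, 5) \approx 10^{68}$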
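The counts on this slide can be reproduced directly from the formula for Stirling numbers of the second kind. This is a minimal sketch; the function name `stirling2` is an assumption made for illustration.

```python
from math import comb, factorial


def stirling2(n: int, m: int) -> int:
    """Number of ways to partition n items into m non-empty clusters."""
    total = sum((-1) ** (m - i) * comb(m, i) * i ** n for i in range(m + 1))
    # The alternating sum is always divisible by m!, so integer division is exact.
    return total // factorial(m)


for n, m in [(15, 3), (20, 4), (100, 5)]:
    print(f"S({n}, {m}) = {stirling2(n, m):,}")
```

Python integers have arbitrary precision, so even S(100, 5) is computed exactly rather than approximated.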
![Page 6: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/6.jpg)
What does this mean?
• We can’t try all possible clusterings.
• Clustering algorithms look at a small fraction of all partitions of the data.
• The exact partitions tried depend on the kind of clustering used.
![Page 7: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/7.jpg)
Who is right?
• Different techniques cluster the same data set DIFFERENTLY.
• Who is right? Is there a “right” clustering?
![Page 8: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/8.jpg)
Classic Example: Half Moons
From Batra et al., http://www.cs.cmu.edu/~rahuls/pub/bmvc2008-clustering-rahuls.pdf
![Page 9: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/9.jpg)
Steps in Clustering
• Select Features
• Define a Proximity Measure
• Define Clustering Criterion
• Define a Clustering Algorithm
• Validate the Results
• Interpret the Results
![Page 10: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/10.jpg)
Kinds of Clustering
• Sequential
  – Fast
  – Results depend on data order
• Hierarchical
  – Start with many clusters
  – Join clusters at each step
• Cost Optimization
  – Fixed number of clusters (typically)
  – Boundary detection, probabilistic classifiers
![Page 11: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/11.jpg)
A Sequential Clustering Method
• Basic Sequential Algorithmic Scheme (BSAS)
  – S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, London, England, 1999
• Assumption: The number of clusters is not known in advance.
• Let:
  – d(x, C) be the distance between feature vector x and cluster C
  – Θ be the threshold of dissimilarity
  – q be the maximum number of clusters
![Page 12: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/12.jpg)
BSAS Pseudo Code
    m = 1
    C_m = {x_1}
    For i = 2 to n
        Find C_k : d(x_i, C_k) = min_{1 ≤ j ≤ m} d(x_i, C_j)
        If (d(x_i, C_k) > Θ) and (m < q)
            m = m + 1
            C_m = {x_i}
        Else
            C_k = C_k ∪ {x_i}
        End
    End
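The BSAS scheme can be sketched in a few lines of Python. The slides leave d(x, C) unspecified, so this sketch assumes it is the Euclidean distance from x to the cluster mean; the names `bsas`, `centroid`, and `dist_to_cluster` are likewise illustrative.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def dist_to_cluster(x, cluster):
    # Assumption: d(x, C) is the distance from x to the mean of C.
    return math.dist(x, centroid(cluster))


def bsas(points, theta, q):
    """Single pass over the data; theta is the dissimilarity threshold, q the cluster cap."""
    clusters = [[points[0]]]
    for x in points[1:]:
        # Find the nearest existing cluster C_k.
        k = min(range(len(clusters)), key=lambda j: dist_to_cluster(x, clusters[j]))
        if dist_to_cluster(x, clusters[k]) > theta and len(clusters) < q:
            clusters.append([x])      # too far from everything: open a new cluster
        else:
            clusters[k].append(x)     # otherwise join the nearest cluster
    return clusters


points = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (0, 0.5)]
print(bsas(points, theta=2.0, q=3))
```

Because each point is assigned once and never revisited, presenting the same points in a different order can produce different clusters.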
![Page 13: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/13.jpg)
A Cost-optimization method
• K-means clustering
  – J. B. MacQueen (1967): "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
• A greedy algorithm
• Partitions n samples into k clusters
• Minimizes the sum of the squared distances to the cluster centers
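Written out, the cost being minimized is the within-cluster sum of squared distances. The symbols $\mu_j$ (center of cluster $C_j$) and $J$ are notation assumed here for illustration:

$$J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2}, \qquad \mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$

Because the algorithm is greedy, it is only guaranteed to reach a local minimum of $J$, not the global one.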
![Page 14: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/14.jpg)
The K-means algorithm
1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids (means).
2. Assign each object to the group that has the closest centroid (mean).
3. When all objects have been assigned, recalculate the positions of the K centroids (means).
4. Repeat Steps 2 and 3 until the centroids no longer move.
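The steps above can be sketched with the standard library alone. Several details are assumptions not fixed by the slides: initial centroids are K randomly sampled data points, distance is Euclidean, and a group that ends up empty keeps its previous centroid. The names `kmeans` and `mean` are illustrative.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: place K initial centroids (assumption: K random data points).
    centroids = rng.sample(points, k)
    groups = []
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Step 3: recalculate centroids (an empty group keeps its old centroid).
        new_centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = kmeans(points, k=2)
print(centroids)
print(groups)
```

Changing `seed` changes the starting centroids, which is one simple way to try multiple starting points.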
![Page 15: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/15.jpg)
K-means clustering
• The way to initialize the mean values is not specified.
  – Randomly choose k samples?
• Results depend on the initial means
  – Try multiple starting points?
• Assumes K is known.
  – How do we choose this?
![Page 16: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/16.jpg)
![Page 17: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/17.jpg)
![Page 18: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/18.jpg)
EM Algorithm
• General probabilistic approach to dealing with missing data
• “Parameters” (model)
  – For mixture models (MMs): cluster distributions $P(x \mid c_i)$
    • For mixtures of Gaussians (MoGs): mean $\mu_i$ and variance $\sigma_i^2$ of each $c_i$
• “Variables” (data)
  – For MMs: assignments of data points to clusters
    • Probabilities of these represented as P(i | x_i)
• Idea: alternately optimize parameters and variables
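As an illustration of the alternating idea, here is a minimal sketch of EM for a one-dimensional mixture of Gaussians. Everything specific in it is an assumption: the function names, the mixing weights, the fixed iteration count, and the small variance floor `min_var` used to avoid degenerate components.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iters=50, min_var=1e-6):
    """Alternate soft assignments (E-step) and parameter updates (M-step)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iters):
        # E-step: probability of each cluster i given each data point x.
        resp = []
        for x in data:
            joint = [weights[i] * gaussian_pdf(x, means[i], variances[i]) for i in range(k)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weight, mean and variance of each cluster.
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            weights[i] = n_i / len(data)
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(min_var, var)
    return means, variances, weights


data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
means, variances, weights = em_gmm_1d(data, [min(data), max(data)], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

If the soft assignments in the E-step are replaced by hard nearest-mean assignments and the variances are held fixed and equal, the loop reduces to K-means.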
![Page 19: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/19.jpg)
![Page 20: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/20.jpg)
![Page 21: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/21.jpg)
![Page 22: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/22.jpg)
Mixture Models for Documents
• Learn simultaneously P(w | topic), P(topic | doc)
From Blei et al., 2003 (http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf)
![Page 23: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/23.jpg)
Greedy Hierarchical Clustering
• Initialize one cluster for each data point
• Until done
  – Merge the two nearest clusters
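A minimal sketch of this loop follows. The slides do not say how the distance between two clusters is measured or what "done" means, so this sketch assumes single-link distance (closest pair of points) and stops when a requested number of clusters remains; all names are illustrative.

```python
import math


def cluster_distance(a, b):
    # Assumption: single-link distance, i.e. the closest pair across the two clusters.
    return min(math.dist(p, q) for p in a for q in b)


def greedy_agglomerative(points, num_clusters):
    # Initialize one cluster for each data point.
    clusters = [[p] for p in points]
    # Assumption: "done" means only num_clusters clusters remain.
    while len(clusters) > num_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        # Merge the two nearest clusters.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (0, 2)]
print(greedy_agglomerative(points, num_clusters=2))
```

Recording the sequence of merges instead of stopping early would yield the full hierarchy.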
![Page 24: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/24.jpg)
Hierarchical Clustering on Strings
• Features = contexts in which strings appear
![Page 25: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/25.jpg)
![Page 26: Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062500/5681514f550346895dbf7267/html5/thumbnails/26.jpg)
Summary
• Algorithms:
  – Sequential clustering
    • Requires key distance threshold, sensitive to data order
  – K-means clustering
    • Requires # of clusters, sensitive to initial conditions
    • Special case of mixture modeling
  – Greedy agglomerative clustering
    • Naively takes order of n^2 runtime
    • Hard to tell when you’re “done”