Clustering Algorithms
Presented by Michael Smaili, CS 157B, Spring 2009
Terminology
Cluster: a group of objects that are similar to one another and dissimilar to objects in other clusters
Clustering: the unsupervised classification of patterns (observations, data items, or feature vectors) into clusters. Also referred to as data clustering, cluster analysis, or typological analysis
Centroid: center of a cluster
Distance Measure: determines how the similarity between two elements is calculated.
Dendrogram: a tree diagram frequently used to illustrate the arrangement of the clusters
Applications
Data Mining
Pattern Recognition
Machine Learning
Image Analysis
Bioinformatics
and many more…
Simple Example
Idea Behind Unsupervised Learning
You walk into a bar. A stranger approaches and tells you:
“I’ve got data from k classes. Each class produces observations with a normal distribution and variance σ²I. Standard simple multivariate Gaussian assumptions. I can tell you all the P(ωi)’s.”
So far, this looks straightforward: “I need a maximum likelihood estimate of the μi’s.”
No problem. “There’s just one thing. None of the data are labeled. I have datapoints, but I don’t know what class they’re from (any of them!).”
Classifications
Exclusive Clustering
Overlapping Clustering
Hierarchical Clustering
Probabilistic Clustering
Exclusive Clustering
Data is grouped in an exclusive way: if a datum belongs to a definite cluster, it cannot be included in another cluster.
In two dimensions, the separation of the points is achieved by a straight line on the plane.
Overlapping Clustering
Uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
A given datum may therefore belong to multiple clusters.
Hierarchical Clustering
Builds (agglomerative), or breaks up (divisive), a hierarchy of clusters. Agglomerative algorithms begin at the leaves of the tree, whereas divisive algorithms begin at the root.
Agglomerative Clustering
Probabilistic Clustering
Uses a probabilistic approach to optimize the fit between the data and a model. Each cluster can be represented by a parametric distribution, such as a Gaussian (continuous) or a Poisson (discrete), and the entire data set is therefore modeled by a mixture of these distributions.
Mixture of Multivariate Gaussians
Distance Measure
Common measures:
Euclidean distance
Manhattan distance
Maximum norm
Mahalanobis distance
Hamming distance
Minkowski distance (for higher-dimensional data)
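As a quick illustrative sketch (not from the slides; the vectors and the covariance matrix below are made up), these measures can be computed directly with NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))   # L2 norm
manhattan = np.sum(np.abs(x - y))           # L1 norm
maximum   = np.max(np.abs(x - y))           # L-infinity (maximum) norm

# Minkowski generalizes the above: p=1 is Manhattan, p=2 is Euclidean,
# and p -> infinity approaches the maximum norm.
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1.0 / p)

# Hamming distance: number of positions at which two vectors differ.
a = np.array([1, 0, 1, 1])
b = np.array([1, 1, 0, 1])
hamming = np.sum(a != b)

# Mahalanobis distance accounts for the covariance of the data.
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.0],
                [0.0, 0.0, 1.5]])           # illustrative covariance matrix
diff = x - y
mahalanobis = np.sqrt(diff @ np.linalg.inv(cov) @ diff)
```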
4 Most Commonly Used Algorithms
K-means
Fuzzy C-means
Hierarchical
Mixture of Gaussians
K-means
An Exclusive Clustering algorithm whose steps are as follows:
1) Let k be the number of clusters
2) Randomly generate k clusters and determine the cluster centers, or generate k random points as temporary cluster centers
3) Assign each point to the nearest cluster center
4) Recompute the new cluster centers
5) Repeat steps 3 and 4 until the centroids no longer move
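A minimal NumPy sketch of steps 1–5 (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means on an (n, d) data array X; returns centers and labels.
    Note: a production version would also handle empty clusters."""
    rng = np.random.default_rng(seed)
    # Step 2: generate k random points as temporary cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each point to the nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: repeat until the centroids no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```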
K-means
Suppose we have n vectors x1, x2, ..., xn, each of which falls into one of k compact clusters, k < n. Let mi be the center of the vectors in cluster i.
Make initial guesses for the centers m1, m2, ..., mk.
Until there are no changes in any center:
Use the estimated centers to classify the samples into clusters.
For every cluster i, replace mi with the mean of all of the samples in cluster i.
Sample points m1 and m2 moving towards the centers of two clusters
Fuzzy C-means
An Overlapping Clustering algorithm whose steps are as follows:
1) Initialize the membership matrix U = [uij], U(0)
2) At step k: calculate the center vectors C(k) = [cj] using U(k)
3) Update U(k) to U(k+1)
4) Repeat steps 2 and 3 until ||U(k+1) − U(k)|| < ε, where ε is the given sensitivity threshold
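A minimal sketch of steps 1–4 (the update formulas are the standard fuzzy c-means equations; the function and variable names are my own):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-2, max_iter=100, seed=0):
    """Fuzzy c-means sketch: X is (n, d), c clusters, fuzziness m > 1."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the membership matrix U(0); rows sum to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: compute center vectors from the current memberships.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: update memberships from the distances to the centers:
        # u_ij is proportional to d_ij^(-2/(m-1)), normalized over clusters.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)              # avoid division by zero
        U_new = 1.0 / (d ** (2.0 / (m - 1)))
        U_new /= U_new.sum(axis=1, keepdims=True)
        # Step 4: stop when ||U(k+1) - U(k)|| < eps.
        done = np.linalg.norm(U_new - U) < eps
        U = U_new
        if done:
            break
    return centers, U
```

Calling fuzzy_c_means(X, c=3, m=2.0, eps=0.3) would mirror the example on the next slide.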
Fuzzy C-means
Suppose 20 data points and 3 clusters are used to initialize the algorithm and to compute the U matrix. The color of each datum in the graph below is that of its nearest cluster. Assume a fuzziness coefficient m = 2 and ε = 0.3.
Initial graph; final condition reached after 8 steps
Fuzzy C-means
Can we do better? Yes, but at the cost of more computation. Assume the same conditions as before, except ε = 0.01.
Result? Final condition reached after 37 steps!
Hierarchical
A Hierarchical Clustering algorithm whose steps are as follows (agglomerative):
1) Assign each item to its own cluster
2) Find the closest (most similar) pair of clusters and merge them into a single cluster
3) Compute distances (similarities) between the new cluster and the old clusters using single-linkage, complete-linkage, or average-linkage
4) Repeat steps 2 and 3 until there is just a single cluster
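In practice these steps are rarely hand-coded; as a sketch, SciPy's hierarchical clustering routines (my choice of library, not the slides') cover them directly:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).random((10, 2))  # 10 illustrative 2-D points

# 'method' selects the step-3 rule: 'single', 'complete', or 'average'.
Z = linkage(X, method='single')

labels = fcluster(Z, t=3, criterion='maxclust')  # cut into 3 flat clusters
# dendrogram(Z) draws the hierarchical tree (requires matplotlib).
```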
Hierarchical
There is also divisive hierarchical clustering, which does the reverse: it starts with all objects in one cluster and subdivides. However, divisive methods are generally not available and have rarely been applied.
Single-Linkage
Definitions: D = [d(i,j)] is the proximity matrix, L(k) is the level of the kth clustering, and d[(r),(s)] is the proximity between clusters (r) and (s). Follow these steps:
1) Begin with level L(0) = 0 and sequence number m = 0
2) Find a pair (r), (s), such that d[(r),(s)] = min d[(i),(j)] where the minimum is over all pairs of clusters in the current clustering.
3) Increment the sequence number: m = m + 1 and merge clusters (r) and (s) into a single cluster, L(m) = d[(r),(s)]
4) Update D by deleting the rows and columns for clusters (r) and (s) and adding a row and column for the newly formed cluster. The proximity between new cluster (r,s) and old cluster (k) is:
d[(k),(r,s)] = min { d[(k),(r)], d[(k),(s)] }
5) Repeat steps 2 through 4 until all objects are in one cluster
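A self-contained, naive sketch of steps 1–5 (the function name and the 4-point distance matrix are purely illustrative):

```python
import numpy as np

def single_linkage(D, names):
    """Naive single-linkage clustering following steps 1-5 above.
    D is a symmetric distance matrix with zeros on the diagonal;
    names labels the initial singleton clusters."""
    D = D.astype(float).copy()
    clusters = [(n,) for n in names]      # step 1: one cluster per item
    merges = []                           # records (new cluster, level L(m))
    while len(clusters) > 1:
        np.fill_diagonal(D, np.inf)
        # Step 2: find the pair (r), (s) with minimum proximity.
        r, s = divmod(int(np.argmin(D)), len(clusters))
        # Step 3: merge them; the clustering level is d[(r),(s)].
        merged = clusters[r] + clusters[s]
        merges.append((merged, D[r, s]))
        # Step 4: single-linkage update,
        # d[(k),(r,s)] = min{ d[(k),(r)], d[(k),(s)] }.
        row = np.minimum(D[r], D[s])
        keep = [i for i in range(len(clusters)) if i not in (r, s)]
        newD = np.zeros((len(keep) + 1, len(keep) + 1))
        newD[:-1, :-1] = D[np.ix_(keep, keep)]
        newD[-1, :-1] = newD[:-1, -1] = row[keep]
        clusters = [clusters[i] for i in keep] + [merged]
        D = newD
    return merges

# Hypothetical 4-point distance matrix, purely illustrative:
D = np.array([[ 0,  2,  6, 10],
              [ 2,  0,  5,  9],
              [ 6,  5,  0,  4],
              [10,  9,  4,  0]])
print(single_linkage(D, ["a", "b", "c", "d"]))
```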
Single-Linkage
Suppose we want a hierarchical clustering of distances between some Italian cities using single-linkage.
The nearest pair is MI and TO, at distance 138; merge them into cluster “MI/TO” with level L(MI/TO) = 138 and sequence number m = 1.
Single-Linkage
Next we compute the distance from this new compound object to all other objects. In single-linkage clustering the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object.
The distance from “MI/TO” to RM is chosen to be 564, which is the distance from MI to RM.
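For example, taking d(TO, RM) = 669 from the tutorial’s distance table: d[(MI/TO),(RM)] = min { d(MI, RM), d(TO, RM) } = min { 564, 669 } = 564.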
Single-Linkage
After some computation we have:
Finally, we merge the last 2 clusters
Hierarchical Tree
Mixture of Gaussians
A Probabilistic Clustering algorithm whose steps are as follows:
1) Assume there are k components, where ωi represents the i’th component and has a mean vector μi with covariance matrix σ²I
Mixture of Gaussians
2) Select a random component i with probability P(ωi)
3) A datapoint can then be generated from N(μi, σ²I)
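A minimal sketch of this generative process (all parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: k components, each with a mean vector and covariance sigma^2 I.
mus = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 5.0]])  # illustrative means
sigma2 = 1.0
priors = np.array([0.5, 0.3, 0.2])                    # the P(w_i)'s

def sample_point():
    # Step 2: select a random component i with probability P(w_i).
    i = rng.choice(len(priors), p=priors)
    # Step 3: generate the datapoint from N(mu_i, sigma^2 I).
    return rng.multivariate_normal(mus[i], sigma2 * np.eye(2))

X = np.array([sample_point() for _ in range(500)])
```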
Mixture of Gaussians
4) More generally, each component generates data from a Gaussian with mean μi and covariance matrix Σi, so a datapoint is distributed as N(μi, Σi).
5) Next, we are interested in the probability of observing data x from class ωi, given the means μ1, ..., μk:
P(x | ωi, μ1, ..., μk)
Mixture of Gaussians
6) Goal: maximize the probability of a datum given the centers of the Gaussians
P(x | μ1, μ2, …, μk) = Σi P(ωi) P(x | ωi, μ1, μ2, …, μk)
P(data | μ1, μ2, …, μk) = ∏j=1..N Σi P(ωi) P(xj | ωi, μ1, μ2, …, μk)
The most popular and simple algorithm used for this is the Expectation-Maximization (EM) algorithm.
![Page 28: Clustering Algorithms Presented by Michael Smaili CS 157B Spring 2009 1](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649f435503460f94c63f73/html5/thumbnails/28.jpg)
Expectation-Maximization (EM)
An iterative algorithm for finding maximum likelihood estimates of parameters in probabilistic models. Its steps are as follows:
1) Initialize the distribution parameters
2) Estimate the Expected value of the unknown variables
3) Re-estimate the distribution parameters to Maximize the likelihood of the data
4) Repeat steps 2 and 3 until convergence
Expectation-Maximization (EM)
Given probabilities of grades in a class: P(A) = ½, P(B) = μ, P(C) = 2μ, P(D) = ½ − 3μ, where 0 ≤ μ ≤ 1/6. What is the maximum likelihood estimate of μ?
We begin with a guess for μ, iterating between E and M to improve our estimates of μ, a, and b (a = # of A’s and b = # of B’s).
Define μ(t) as the estimate of μ on the t’th iteration, and b(t) as the estimate of b on the t’th iteration.
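A sketch of the resulting iteration, assuming (as in the “Clustering with Gaussian Mixtures” tutorial cited in the references) that we observe only the combined count h of A’s and B’s, plus the counts of C’s and D’s; the specific counts below are made up:

```python
def em_grades(h, c, d, mu=0.1, iters=20):
    """EM for the grades model P(A)=1/2, P(B)=mu, P(C)=2*mu, P(D)=1/2-3*mu.
    Observed: h = # of A's and B's together, c = # of C's, d = # of D's;
    the split of h into a and b is the hidden variable."""
    for _ in range(iters):
        # E-step: expected number of B's among the h high grades,
        # b(t) = h * mu(t) / (1/2 + mu(t)).
        b = h * mu / (0.5 + mu)
        # M-step: maximum likelihood mu given the completed counts,
        # mu(t+1) = (b + c) / (6 * (b + c + d)).
        mu = (b + c) / (6.0 * (b + c + d))
    return mu

# Hypothetical counts: 14 A/B grades, 6 C's, 10 D's.
print(em_grades(h=14, c=6, d=10))
```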
EM Convergence
Prob(data | μ) must increase or remain the same from one iteration to the next, but it can never exceed 1; therefore it must converge.
Mixture of Gaussians
Now let us see the effects of the probabilistic approach over several iterations of the EM algorithm.
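Before the pictures, here is a compact sketch of EM for a mixture of spherical Gaussians N(μi, σi²I) (per-component variances, a middle ground between the shared σ²I above and the general Σi; all names are mine):

```python
import numpy as np

def em_gmm(X, k, iters=20, seed=0):
    """EM for a mixture of spherical Gaussians N(mu_i, sigma_i^2 I)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(n, size=k, replace=False)]   # initialize means
    sigma2 = np.full(k, X.var())                    # per-component variance
    priors = np.full(k, 1.0 / k)                    # the P(w_i)'s
    for _ in range(iters):
        # E-step: responsibility of component i for point x_j, proportional
        # to P(w_i) * N(x_j | mu_i, sigma_i^2 I), computed in log space.
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        log_p = (np.log(priors) - 0.5 * d * np.log(2 * np.pi * sigma2)
                 - sq / (2 * sigma2))
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        R = np.exp(log_p)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means, and variances.
        Nk = R.sum(axis=0) + 1e-12                  # guard empty components
        priors = Nk / n
        mus = (R.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (R * sq).sum(axis=0) / (d * Nk)
    return priors, mus, sigma2
```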
Mixture of Gaussians
After the first iteration
Mixture of Gaussians
After the second iteration
Mixture of Gaussians
After the third iteration
Mixture of Gaussians
After the fourth iteration
Mixture of Gaussians
After the fifth iteration
Mixture of Gaussians
After the sixth iteration
Mixture of Gaussians
After the 20th iteration
Questions?
References
“A Tutorial on Clustering Algorithms”, http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/
“Cluster Analysis”, http://en.wikipedia.org/wiki/Data_clustering
“Clustering”, http://www.cs.cmu.edu/afs/andrew/course/15/381-f08/www/lectures/clustering.pdf
“Clustering with Gaussian Mixtures”, http://autonlab.org/tutorials/gmm14.pdf
“Data Clustering, A Review”, http://www.cs.rutgers.edu/~mlittman/courses/lightai03/jain99data.pdf
“Finding Communities by Clustering a Graph into Overlapping Subgraphs”, www.cs.rpi.edu/~magdon/talks/clusteringIADIS05.ppt