CSE601 Mixture Model Clustering - cse.buffalo.edu › ~jing › cse601 › fa13 › materials › clustering_mixture.pdf
TRANSCRIPT
Clustering Lecture 5: Mixture Model
Jing Gao SUNY Buffalo
1
Outline
• Basics – Motivation, definition, evaluation
• Methods – Partitional
– Hierarchical
– Density-based
– Mixture model
– Spectral methods
• Advanced topics – Clustering ensemble
– Clustering in MapReduce
– Semi-supervised clustering, subspace clustering, co-clustering, etc.
2
Using Probabilistic Models for Clustering
• Hard vs. soft clustering
– Hard clustering: Every point belongs to exactly one cluster
– Soft clustering: Every point belongs to several clusters with
certain degrees
• Probabilistic clustering
– Each cluster is mathematically represented by a parametric distribution
– The entire data set is modeled by a mixture of these distributions
3
x
f(x)
μ
σ
Changing μ shifts the distribution left or right
Changing σ increases or decreases the spread
Probability density function f(x) is a function of x given μ and σ
Gaussian Distribution
4
N(x | μ, σ²) = (1 / √(2πσ²)) exp( −(x − μ)² / (2σ²) )
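The density above can be evaluated directly in plain Python; this is a minimal sketch (the function name `gaussian_pdf` is illustrative, not from the slides):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density N(x | mu, sigma^2) of a univariate Gaussian."""
    coef = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

# Peak of the standard normal: f(0) = 1/sqrt(2*pi) ≈ 0.3989
peak = gaussian_pdf(0.0, mu=0.0, sigma=1.0)
```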
Likelihood
x
f(x)
Define likelihood as a function of μ and σ given x1, x2, …, xn
Which Gaussian distribution is more likely to generate the data?
5
L(μ, σ²) = ∏_{i=1}^{n} N(x_i | μ, σ²)
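The "which Gaussian is more likely?" question can be answered numerically by comparing log likelihoods (logs avoid numerical underflow of the product); a sketch with made-up data and parameters:

```python
import math

def log_likelihood(data, mu, sigma):
    # Sum of log N(x_i | mu, sigma^2) over all data points
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma**2)
        - (x - mu) ** 2 / (2.0 * sigma**2)
        for x in data
    )

data = [1.8, 2.1, 2.4, 1.9, 2.2]
# A Gaussian centered near the data should score higher
ll_near = log_likelihood(data, mu=2.0, sigma=0.5)
ll_far = log_likelihood(data, mu=5.0, sigma=0.5)
```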
6
Gaussian Distribution
• Multivariate Gaussian
N(x | μ, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp( −(1/2)(x − μ)^T Σ^{-1} (x − μ) )
μ: mean vector, Σ: covariance matrix
• Log likelihood
L(μ, Σ) = ln ∏_{i=1}^{n} N(x_i | μ, Σ) = Σ_{i=1}^{n} ( −(1/2)(x_i − μ)^T Σ^{-1} (x_i − μ) ) − (n/2) ln|Σ| + const
7
Maximum Likelihood Estimate
• MLE
– Find model parameters that maximize log likelihood
• MLE for Gaussian
L(μ, σ²) is maximized in closed form by
μ̂ = (1/n) Σ_{i=1}^{n} x_i,  σ̂² = (1/n) Σ_{i=1}^{n} (x_i − μ̂)²
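The closed-form MLE for a univariate Gaussian is just the sample mean and the (biased, 1/n) sample variance; a minimal sketch (`gaussian_mle` is an illustrative name):

```python
def gaussian_mle(data):
    """MLE for a univariate Gaussian: sample mean and 1/n sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

mu_hat, var_hat = gaussian_mle([1.0, 2.0, 3.0])
```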
8
Gaussian Mixture
• Linear combination of Gaussians
p(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k)
where Σ_{k=1}^{K} π_k = 1 and 0 ≤ π_k ≤ 1
Parameters to be estimated: π_k, μ_k, Σ_k for k = 1, …, K
• To generate a data point:
– first pick one of the K clusters with probability π_k
– then draw a sample from that cluster's distribution N(x | μ_k, Σ_k)
9
Gaussian Mixture
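The two-step generative process above can be sketched in plain Python for the 1-D case (the function name `sample_gmm` and the parameter values are illustrative, not from the slides):

```python
import random

def sample_gmm(weights, mus, sigmas, n, seed=0):
    """Draw n points from a 1-D Gaussian mixture: first pick component k
    with probability weights[k], then sample from N(mus[k], sigmas[k]^2)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        points.append(rng.gauss(mus[k], sigmas[k]))
    return points

# A well-separated two-component mixture
pts = sample_gmm([0.5, 0.5], mus=[-5.0, 5.0], sigmas=[1.0, 1.0], n=200)
```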
10
• Maximize log likelihood
ln p(X) = Σ_{i=1}^{n} ln Σ_{k=1}^{K} π_k N(x_i | μ_k, Σ_k)
• Each data point is generated by one of the K clusters; a latent variable z_i, indicating which cluster generated x_i, is associated with each point
• Regard the values of the latent variables as missing data
Gaussian Mixture
11
Expectation-Maximization (EM) Algorithm
• E-step: for given parameter values we can compute the expected values of the latent variables (posterior probabilities of cluster membership):
γ(z_ik) = π_k N(x_i | μ_k, Σ_k) / Σ_{j=1}^{K} π_j N(x_i | μ_j, Σ_j)
– Note that γ(z_ik) takes soft values in [0, 1] instead of the hard values {0, 1} of z_ik, but we still have Σ_{k=1}^{K} γ(z_ik) = 1
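The E-step computation can be sketched in plain Python for the 1-D case (`e_step` and its arguments are illustrative names; each row of the result sums to 1, as required):

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density N(x | mu, sigma^2) of a univariate Gaussian
    coef = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))

def e_step(data, weights, mus, sigmas):
    """Posterior probability (responsibility) of each component for each point."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, s)
                 for w, m, s in zip(weights, mus, sigmas)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

# Points at the two component means get near-certain memberships
resp = e_step([-5.0, 5.0], [0.5, 0.5], [-5.0, 5.0], [1.0, 1.0])
```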
12
• M-step: maximize the expected complete log likelihood
• Parameter update:
N_k = Σ_{i=1}^{n} γ(z_ik)
π_k = N_k / n
μ_k = (1/N_k) Σ_{i=1}^{n} γ(z_ik) x_i
Σ_k = (1/N_k) Σ_{i=1}^{n} γ(z_ik) (x_i − μ_k)(x_i − μ_k)^T
Expectation-Maximization (EM) Algorithm
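The M-step updates above can be sketched in plain Python for the 1-D case, given responsibilities from the E-step (`m_step` is an illustrative name; with hard 0/1 responsibilities it reduces to per-cluster means and variances):

```python
import math

def m_step(data, resp):
    """Re-estimate mixture weights, means, and std devs (1-D case)
    from responsibilities resp[i][k] of component k for point i."""
    n = len(data)
    K = len(resp[0])
    weights, mus, sigmas = [], [], []
    for k in range(K):
        nk = sum(r[k] for r in resp)              # effective cluster size N_k
        mu = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / nk
        weights.append(nk / n)
        mus.append(mu)
        sigmas.append(math.sqrt(var))
    return weights, mus, sigmas

# Hard responsibilities: first two points in cluster 0, last two in cluster 1
data = [-5.0, -4.0, 4.0, 5.0]
resp = [[1, 0], [1, 0], [0, 1], [0, 1]]
weights, mus, sigmas = m_step(data, resp)
```

Iterating `e_step` and `m_step` until the log likelihood stops improving gives the full EM loop described on the next slide.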
13
EM Algorithm
• Iterate E-step and M-step until the log likelihood of the data no longer increases
– Converges to a local optimum
– Need to restart the algorithm with different initial parameter guesses (as in K-means)
• Relation to K-means
– Consider a GMM with common covariance Σ_k = σ²I for all clusters
– As σ² → 0, the two methods coincide: the soft assignments become hard
20
K-means vs GMM
• K-means
– Objective function: minimize sum of squared errors
– Can be optimized by an EM-style algorithm
– E-step: assign points to clusters
– M-step: optimize cluster centers
– Performs hard assignment during E-step
– Assumes spherical clusters with equal probability for each cluster
• GMM
– Objective function: maximize log-likelihood
– EM algorithm
– E-step: compute posterior probability of membership
– M-step: optimize parameters
– Performs soft assignment during E-step
– Can be used for non-spherical clusters
– Can generate clusters with different probabilities
Mixture Model
• Strengths
– Give probabilistic cluster assignments
– Have probabilistic interpretation
– Can handle clusters with varying sizes, variance etc.
• Weaknesses
– Initialization matters
– Need to choose appropriate distributions
– Overfitting issues
21
Take-away Message
• Probabilistic clustering
• Maximum likelihood estimate
• Gaussian mixture model for clustering
• EM algorithm that alternately assigns points to clusters and estimates model parameters
• Strengths and weaknesses
22