Lecture 17: Gaussian Mixture Models and Expectation Maximization
Machine Learning
Last Time
• Logistic Regression
Today
• Gaussian Mixture Models
• Expectation Maximization
The Problem
• You have data that you believe is drawn from n populations
• You want to identify parameters for each population
• You don’t know anything about the populations a priori
  – Except you believe that they’re Gaussian…
Gaussian Mixture Models
• Rather than identifying clusters by “nearest” centroids
• Fit a set of k Gaussians to the data
• Maximum likelihood over a mixture model
GMM example
Mixture Models
• Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution:
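In the usual notation, with K components and mixing weights π_k forming a valid distribution, this is

  p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0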
Gaussian Mixture Models
• GMM: the weighted sum of a number of Gaussians, where the weights are determined by a distribution:
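With Gaussian components this becomes

  p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)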
Graphical Models with unobserved variables
• What if you have variables in a graphical model that are never observed?
  – Latent variables
• Training latent variable models is an unsupervised learning application
(Figure: example graphical model with nodes “laughing”, “amused”, “sweating”, “uncomfortable”)
Latent Variable HMMs
• We can cluster sequences using an HMM with unobserved state variables
• We will train latent variable models using Expectation Maximization
Expectation Maximization
• Both the training of GMMs and of graphical models with latent variables can be accomplished using Expectation Maximization
  – Step 1: Expectation (E-step)
    • Evaluate the “responsibilities” of each cluster with the current parameters
  – Step 2: Maximization (M-step)
    • Re-estimate parameters using the existing “responsibilities”
• Similar to k-means training.
Latent Variable Representation
• We can represent a GMM involving a latent variable
• What does this give us?
  (TODO: plate notation)
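In the standard latent-variable form, using a 1-of-K indicator z:

  p(z_k = 1) = \pi_k, \qquad p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)

so that marginalizing out z recovers the mixture, p(x) = \sum_z p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k). What it gives us: every observation x_n now has an associated latent assignment z_n, which is exactly the quantity EM works with.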
GMM data and Latent variables
One last bit
• We have representations of the joint p(x,z) and the marginal, p(x)…
• The conditional p(z|x) can be derived using Bayes’ rule.
  – The responsibility that a mixture component takes for explaining an observation x.
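In symbols, the responsibility γ(z_k) follows directly from Bayes’ rule:

  \gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}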
Maximum Likelihood over a GMM
• As usual: Identify a likelihood function
• And set partials to zero…
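For data X = {x_1, …, x_N} the log likelihood is

  \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}

Setting its partial derivatives to zero does not give a closed-form solution (the responsibilities depend on the parameters), but it does give the fixed-point equations on the next few slides.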
Maximum Likelihood of a GMM
• Optimization of means.
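Setting the derivative with respect to μ_k to zero gives the responsibility-weighted mean:

  \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})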
Maximum Likelihood of a GMM
• Optimization of covariance
• Note the similarity to the standard Gaussian MLE, apart from the responsibility terms.
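The corresponding stationary point is the responsibility-weighted sample covariance:

  \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{\top}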
Maximum Likelihood of a GMM
• Optimization of mixing term
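Maximizing with respect to π_k under the constraint \sum_k \pi_k = 1 (a Lagrange multiplier argument) gives

  \pi_k = \frac{N_k}{N}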
MLE of a GMM
EM for GMMs
• Initialize the parameters
  – Evaluate the log likelihood
• Expectation-step: Evaluate the responsibilities
• Maximization-step: Re-estimate the parameters
  – Evaluate the log likelihood
  – Check for convergence
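A minimal NumPy sketch of this loop is below (illustrative only: the function and variable names are my own, initialization is random, and the small diagonal term added to each covariance is a guard against the singularities discussed later):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    """Fit a K-component GMM to data X (N x D) by EM. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialize the parameters: means at random data points, identity
    # covariances, uniform mixing weights.
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    Sigma = np.array([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf

    for _ in range(n_iters):
        # E-step: evaluate the responsibilities gamma[n, k] = p(z_k = 1 | x_n).
        weighted = np.column_stack(
            [pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
             for k in range(K)])                       # shape (N, K)
        gamma = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters using the responsibilities.
        Nk = gamma.sum(axis=0)                         # effective counts per component
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(D)               # guard against collapsing components
        pi = Nk / N

        # Evaluate the log likelihood and check for convergence.
        ll = np.log(weighted.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return pi, mu, Sigma, gamma
```

Usage would look like `pi, mu, Sigma, gamma = em_gmm(X, K=3)` for a three-component fit.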
EM for GMMs
• E-step: Evaluate the Responsibilities
EM for GMMs
• M-Step: Re-estimate Parameters
Visual example of EM
Potential Problems
• Incorrect number of Mixture Components
• Singularities
Incorrect Number of Gaussians
Singularities
• A minority of the data can have a disproportionate effect on the model likelihood.
• For example…
GMM example
Singularities
• When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero.
• Consider the likelihood function as the covariance goes to zero.
• The likelihood approaches infinity.
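Concretely, if a component’s mean coincides with a data point x_n and its covariance is isotropic, \Sigma_k = \sigma^2 I, that point contributes

  \mathcal{N}(x_n \mid \mu_k = x_n, \sigma^2 I) = \frac{1}{(2\pi)^{D/2} \, \sigma^{D}}

which grows without bound as σ → 0, so the overall likelihood can be driven to infinity by shrinking one component onto a single point.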
Relationship to K-means
• K-means makes hard decisions.
  – Each data point gets assigned to a single cluster.
• GMM/EM makes soft decisions.
  – Each data point can yield a posterior p(z|x).
• Soft K-means is a special case of EM.
Soft K-Means as GMM/EM
• Assume equal covariance matrices for every mixture component:
• Likelihood:
• Responsibilities:
• As epsilon approaches zero, the responsibility of the nearest mean approaches one and all others approach zero (a hard assignment).
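With the shared covariances Σ_k = εI assumed above, the standard forms of these two quantities are

  p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp\!\left( -\frac{\lVert x - \mu_k \rVert^2}{2\epsilon} \right)

  \gamma(z_{nk}) = \frac{\pi_k \exp\!\left( -\lVert x_n - \mu_k \rVert^2 / 2\epsilon \right)}{\sum_{j} \pi_j \exp\!\left( -\lVert x_n - \mu_j \rVert^2 / 2\epsilon \right)}

and in the ε → 0 limit the exponential for the nearest mean dominates, giving a 0/1 assignment.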
Soft K-Means as GMM/EM
• Overall Log likelihood as epsilon approaches zero:
• The expectation maximized by soft k-means is the negative of the intra-cluster variability (the k-means distortion measure)
• Note: only the means are re-estimated in soft k-means.
  – The covariance matrices are all tied.
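In that limit the expected complete-data log likelihood reduces, up to an additive constant, to

  \mathbb{E}_{Z}\!\left[ \ln p(X, Z \mid \mu, \Sigma, \pi) \right] \;\to\; -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2 + \text{const}

where r_{nk} ∈ {0, 1} is the hard assignment, so maximizing this expectation is the same as minimizing the k-means distortion.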
General form of EM
• Given a joint distribution over observed and latent variables:
• Want to maximize the likelihood of the observed data
1. Initialize parameters
2. E-Step: Evaluate the posterior over the latent variables given the current parameters
3. M-Step: Re-estimate parameters (based on expectation of complete-data log likelihood)
4. Check for convergence of params or likelihood
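In symbols, with observed data X, latent variables Z, and parameters θ, the goal is to maximize \ln p(X \mid \theta); the E-step evaluates p(Z \mid X, \theta^{\text{old}}), and the M-step solves

  \theta^{\text{new}} = \arg\max_{\theta} \, \mathcal{Q}(\theta, \theta^{\text{old}}), \qquad \mathcal{Q}(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \, \ln p(X, Z \mid \theta)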
Next Time
• Proof of Expectation Maximization in GMMs
• Hidden Markov Models