

Machine Learning for Big Data CSE547/STAT548, University of Washington

John Thickstun

April 26, 2016

Review: Mixtures of Gaussians

Case Study 2: Document Retrieval

Some Data


Gaussian Mixture Model


Most commonly used mixture model.

Observations: $x_1, \ldots, x_N \in \mathbb{R}^d$

Parameters: $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^K$ (mixture weights, means, covariances)

Cluster indicator: $z_i \in \{1, \ldots, K\}$, with $P(z_i = k) = \pi_k$

Per-cluster likelihood: $P(x_i \mid z_i = k) = \mathcal{N}(x_i; \mu_k, \Sigma_k)$

Ex.: $z_i$ = country of origin, $x_i$ = height of the $i$th person; the $k$th mixture component = distribution of heights in country $k$

Generative Model

• We can think of sampling observations from the model

• For each observation $i$:

– Sample a cluster assignment $z_i \sim \mathrm{Categorical}(\pi)$

– Sample the observation from the selected Gaussian: $x_i \mid z_i = k \sim \mathcal{N}(\mu_k, \Sigma_k)$
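To make the generative story concrete, here is a minimal NumPy sketch of this two-stage sampling process; the parameter values and the function name `sample_gmm` are made-up illustrations, not anything from the course materials:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example parameters for K=3 clusters in 2 dimensions (made-up values).
pi = np.array([0.5, 0.3, 0.2])                          # mixture weights
mus = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])   # cluster means
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # covariances

def sample_gmm(n):
    """Draw n observations: first a cluster z_i, then a Gaussian draw x_i."""
    z = rng.choice(len(pi), size=n, p=pi)               # cluster assignments
    x = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z

X, Z = sample_gmm(500)
```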


Also Useful for Density Estimation


[Figure: contour plot of the joint density]

Density as Mixture of Gaussians

• Approximate density with a mixture of Gaussians


[Figures: mixture of 3 Gaussians; contour plot of the joint density]


Summary of GMM Components

• Observations $x_i$

• Hidden cluster labels $z_i$

• Hidden mixture means $\mu_k$

• Hidden mixture covariances $\Sigma_k$

• Hidden mixture probabilities $\pi_k$, with $\sum_{k=1}^K \pi_k = 1$


Gaussian mixture marginal and conditional likelihood:

$$P(x_i) = \sum_{k=1}^K \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)$$

$$P(z_i = k \mid x_i) = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}$$
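Both quantities are straightforward to compute; a minimal NumPy/SciPy sketch (the function name and argument layout are my own, reusing the `pi`, `mus`, `Sigmas` arrays from the sampling sketch above):

```python
import numpy as np
from scipy.stats import multivariate_normal

def marginal_and_posterior(X, pi, mus, Sigmas):
    """Return P(x_i) and P(z_i = k | x_i) for each row x_i of X."""
    # comp[i, k] = pi_k * N(x_i; mu_k, Sigma_k)
    comp = np.stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                     for k in range(len(pi))], axis=1)
    marginal = comp.sum(axis=1)              # P(x_i), the mixture density
    posterior = comp / marginal[:, None]     # P(z_i = k | x_i), responsibilities
    return marginal, posterior
```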



Application to Document Modeling

Case Study 2: Document Retrieval

Task 2: Cluster Documents


Now:

Cluster documents based on topic

Document Representation


Bag-of-words model: represent document $d$ only by its word counts, ignoring word order:

$$n_d = (n_{d,1}, \ldots, n_{d,W}), \quad n_{d,w} = \text{count of word } w \text{ in document } d$$
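For concreteness, a tiny sketch of building such a count vector; the toy vocabulary and the crude regex tokenizer are illustrative assumptions:

```python
import re
from collections import Counter

def bag_of_words(text, vocab):
    """Map a document to its vector of word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))  # crude tokenizer
    return [counts[w] for w in vocab]

vocab = ["ball", "game", "president", "vote"]              # toy vocabulary
print(bag_of_words("The president called a vote. The vote passed.", vocab))
# -> [0, 0, 1, 2]
```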


A Generative Model


Documents: $d = 1, \ldots, D$, each represented by its word counts $n_d$

Associated topics: $z_d \in \{1, \ldots, K\}$, one hidden topic per document

Parameters: $\theta = \{\pi, \theta_1, \ldots, \theta_K\}$, where $\pi_k = P(z_d = k)$ and $\theta_k$ is a distribution over the vocabulary for topic $k$

Generative model: sample a topic $z_d \sim \mathrm{Categorical}(\pi)$, then sample each word of document $d$ i.i.d. from $\mathrm{Categorical}(\theta_{z_d})$

Form of Likelihood


Conditioned on topic $z_d = k$, the document likelihood is multinomial:

$$P(\text{doc } d \mid z_d = k) \propto \prod_w \theta_{k,w}^{\,n_{d,w}}$$

Marginalizing the latent topic assignment:

$$P(\text{doc } d) = \sum_{k=1}^K \pi_k \prod_w \theta_{k,w}^{\,n_{d,w}}$$
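A small sketch of evaluating this marginal (in log space, for numerical stability); the function name is my own, with `counts` assumed to be a D×W matrix of word counts, `log_theta` a K×W matrix of log word probabilities, and `pi` the topic weights:

```python
import numpy as np
from scipy.special import logsumexp

def doc_log_likelihood(counts, pi, log_theta):
    """log P(doc d) = logsumexp_k [log pi_k + sum_w n_dw * log theta_kw]."""
    # per_topic[d, k] = log P(doc d, z_d = k), up to the multinomial coefficient
    per_topic = np.log(pi)[None, :] + counts @ log_theta.T
    return logsumexp(per_topic, axis=1)  # marginalize over topics in log space
```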



Review: EM Algorithm

Case Study 2: Document Retrieval

Learning Model Parameters

• Want to learn model parameters


[Figure: our actual observations, drawn from a mixture of 3 Gaussians; from C. Bishop, Pattern Recognition & Machine Learning]

ML Estimate of Mixture Model Params


Log likelihood:

$$\ell(\theta; \mathcal{D}) = \sum_{i=1}^N \log \sum_{k=1}^K \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)$$

Want the ML estimate $\hat{\theta} = \arg\max_\theta \ell(\theta; \mathcal{D})$.

Assume the per-cluster likelihoods are in the exponential family.

The objective is neither convex nor concave, and has local optima.

Complete Data

• Imagine we have an assignment of each $x_i$ to a cluster


[Figures: our actual observations (unlabeled) vs. the complete data, labeled by true cluster assignments; from C. Bishop, Pattern Recognition & Machine Learning]

Cluster Responsibilities

• We must infer the cluster assignments from the observations

C. Bishop, Pattern Recognition & Machine Learning

Posterior probabilities of assignments to each cluster, *given* the model parameters:

$$r_{ik} = P(z_i = k \mid x_i, \theta) = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}$$

These are soft assignments to clusters.

Iterative Algorithm


Motivates a coordinate-ascent-like algorithm:

1. Infer the missing values given the current estimate of the parameters
2. Optimize the parameters to produce a new estimate given the "filled in" data
3. Repeat

Example: MoG (derivation soon… + HW; see the sketch after this list)

1. Infer the "responsibilities" $r_{ik}$
2. Optimize the parameters $\pi_k, \mu_k, \Sigma_k$
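A compact NumPy/SciPy sketch of this alternation for a mixture of Gaussians, as a minimal illustration rather than the course's reference implementation (initialization is crude, and numerical safeguards are elided):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=20, seed=0):
    """EM for a Gaussian mixture: alternate soft assignments and updates."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, K, replace=False)]        # init means at random points
    Sigmas = np.array([np.cov(X.T) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(z_i = k | x_i, theta)
        r = np.stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                      for k in range(K)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = r.sum(axis=0)
        pi = Nk / N
        mus = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mus, Sigmas
```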

Gaussian Mixture Example

[Figures: EM fits to a Gaussian mixture, shown at the start and after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations]

Expectation Maximization (EM) – Setup


More broadly applicable than just to the mixture models considered so far.

Model: a joint distribution $P(x, z \mid \theta)$.

Interested in maximizing (w.r.t. $\theta$) the marginal likelihood:

$$P(x \mid \theta) = \sum_z P(x, z \mid \theta)$$

Special case: mixture models, where $z$ collects the cluster assignments.

$x$: observable – "incomplete" data
$(x, z)$: not (fully) observable – "complete" data
$\theta$: parameters

EM Algorithm


Initial guess: $\theta^{(0)}$

Estimate at iteration $t$: $\theta^{(t)}$

E-Step

Compute the expected complete-data log likelihood under the current posterior over $z$:

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim P(z \mid x, \theta^{(t)})}\big[\log P(x, z \mid \theta)\big]$$

M-Step

Compute the new estimate:

$$\theta^{(t+1)} = \arg\max_\theta \, Q(\theta \mid \theta^{(t)})$$

Example – Mixture Models


Consider i.i.d. observations $x_1, \ldots, x_N$ with hidden assignments $z_1, \ldots, z_N$.

E-Step: Compute the responsibilities $r_{ik} = P(z_i = k \mid x_i, \theta^{(t)})$, giving

$$Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^N \sum_{k=1}^K r_{ik} \big[\log \pi_k + \log P(x_i \mid z_i = k, \theta)\big]$$

M-Step: Compute $\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$.

Initialization


In the mixture model case, where $\theta = \{\pi_k, \mu_k, \Sigma_k\}$, there are many ways to initialize the EM algorithm.

Examples:

– Choose K observations at random to define each cluster; assign the other observations to the nearest "centroid" to form initial parameter estimates

– Pick the centers sequentially to provide good coverage of the data (see the sketch after this list)

– Grow the mixture model by splitting (and sometimes removing) clusters until K clusters are formed

Initialization can be quite important to convergence rates in practice.
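One common way to realize the sequential-coverage idea is farthest-first seeding (the flavor behind k-means++); a minimal sketch, offered as one plausible instance of the heuristic the slide names:

```python
import numpy as np

def seed_centers(X, K, seed=0):
    """Pick K centers sequentially: each new center is the point farthest
    (in squared distance) from all centers chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]       # first center chosen at random
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])      # farthest point from current set
    return np.array(centers)
```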

What you need to know

• Mixture model formulation
– Generative model
– Likelihood

• Expectation Maximization (EM) Algorithm
– Derivation
– Concept of non-decreasing log likelihood
– Application to standard mixture models



Review: Connection to k-means

Case Study 2: Document Retrieval

K-means


1. Ask the user how many clusters they'd like (e.g., k = 5).

2. Randomly guess k cluster center locations.

3. Each data point finds out which center it's closest to.

4. Each center finds the centroid of the points it owns.

K-means

• Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$

• Classify: Assign each point $j \in \{1, \ldots, N\}$ to the nearest center:

$$z_j \leftarrow \arg\min_i \, \lVert \mu_i - x_j \rVert_2^2$$

• Recenter: $\mu_i$ becomes the centroid of its points:

$$\mu_i^{(t+1)} \leftarrow \arg\min_\mu \sum_{j : z_j = i} \lVert \mu - x_j \rVert_2^2$$

– Equivalent to $\mu_i$ = average of its points!
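Putting the classify/recenter alternation into code, a minimal NumPy sketch of Lloyd's algorithm (the function name and random initialization are my own choices):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and recentering."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # Classify: z_j = index of the nearest center to x_j
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Recenter: each center becomes the mean of its assigned points
        new_mu = np.array([X[z == i].mean(axis=0) if np.any(z == i) else mu[i]
                           for i in range(k)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, z
```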


Special Case: Spherical Gaussians + hard assignments

• If $P(x \mid z = k)$ is spherical, with the same variance $\sigma^2$ for all classes:

• Then, compare the EM objective with k-means:


$$P(x_i \mid z_i = k) \propto \exp\left[-\frac{1}{2\sigma^2}\,\lVert x_i - \mu_k \rVert_2^2\right]$$

versus the general joint likelihood with full covariances:

$$P(z_i = k, x_i) = \frac{1}{(2\pi)^{m/2}\,\lvert \Sigma_k \rvert^{1/2}} \exp\left[-\frac{1}{2}(x_i - \mu_k)^{\mathsf{T}} \Sigma_k^{-1} (x_i - \mu_k)\right] P(z_i = k)$$
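To spell out the connection, a short sketch of the standard argument (assuming, for simplicity, equal mixture weights $P(z_i = k) = 1/K$): with the spherical likelihood above and hard assignments, the complete-data log likelihood is, up to terms that do not depend on $z$ or $\mu$,

$$\sum_{i=1}^N \log P(z_i, x_i) = \text{const} - \frac{1}{2\sigma^2} \sum_{i=1}^N \lVert x_i - \mu_{z_i} \rVert_2^2,$$

so maximizing over the assignments and means is exactly minimizing the k-means objective $\sum_i \lVert x_i - \mu_{z_i} \rVert_2^2$: EM with hard assignments and a shared spherical covariance reduces to Lloyd's algorithm.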
