

Machine Learning for Big Data CSE547/STAT548, University of Washington

John Thickstun

April 26, 2016

Review: Mixtures of Gaussians

Case Study 2: Document Retrieval

Some Data


Gaussian Mixture Model


Most commonly used mixture model.

Observations: $x_1, \ldots, x_N \in \mathbb{R}^d$

Parameters: $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^K$ (mixture weights, means, covariances)

Cluster indicator: $z_i \in \{1, \ldots, K\}$, with $P(z_i = k) = \pi_k$

Per-cluster likelihood: $P(x_i \mid z_i = k) = \mathcal{N}(x_i; \mu_k, \Sigma_k)$

Ex.: $z_i$ = country of origin, $x_i$ = height of the $i$th person; the $k$th mixture component = distribution of heights in country $k$

Generative Model

• We can think of sampling observations from the model

• For each observation $i$:

– Sample a cluster assignment $z_i \sim \mathrm{Categorical}(\pi)$

– Sample the observation from the selected Gaussian: $x_i \mid z_i = k \sim \mathcal{N}(\mu_k, \Sigma_k)$
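To make the generative story concrete, here is a minimal NumPy sketch of this two-stage sampling process; the parameter values and the function name `sample_gmm` are made-up illustrations, not anything from the course materials:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example parameters for K=3 clusters in 2 dimensions (made-up values).
pi = np.array([0.5, 0.3, 0.2])                          # mixture weights
mus = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])   # cluster means
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # covariances

def sample_gmm(n):
    """Draw n observations: first a cluster z_i, then a Gaussian draw x_i."""
    z = rng.choice(len(pi), size=n, p=pi)               # cluster assignments
    x = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z

X, Z = sample_gmm(500)
```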


Also Useful for Density Estimation


[Figure: contour plot of the joint density]

Density as Mixture of Gaussians

• Approximate density with a mixture of Gaussians


[Figures: mixture of 3 Gaussians; contour plot of the joint density]


Summary of GMM Components

• Observations $x_i$

• Hidden cluster labels $z_i$

• Hidden mixture means $\mu_k$

• Hidden mixture covariances $\Sigma_k$

• Hidden mixture probabilities $\pi_k$, with $\sum_{k=1}^K \pi_k = 1$


Gaussian mixture marginal and conditional likelihood:

$$P(x_i) = \sum_{k=1}^K \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)$$

$$P(z_i = k \mid x_i) = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}$$
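Both quantities are straightforward to compute; a minimal NumPy/SciPy sketch (the function name and argument layout are my own, reusing the `pi`, `mus`, `Sigmas` arrays from the sampling sketch above):

```python
import numpy as np
from scipy.stats import multivariate_normal

def marginal_and_posterior(X, pi, mus, Sigmas):
    """Return P(x_i) and P(z_i = k | x_i) for each row x_i of X."""
    # comp[i, k] = pi_k * N(x_i; mu_k, Sigma_k)
    comp = np.stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                     for k in range(len(pi))], axis=1)
    marginal = comp.sum(axis=1)              # P(x_i), the mixture density
    posterior = comp / marginal[:, None]     # P(z_i = k | x_i), responsibilities
    return marginal, posterior
```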



Application to Document Modeling

Case Study 2: Document Retrieval

Task 2: Cluster Documents


Now:

Cluster documents based on topic

Document Representation


Bag-of-words model: represent document $d$ only by its word counts, ignoring word order:

$$n_d = (n_{d,1}, \ldots, n_{d,W}), \quad n_{d,w} = \text{count of word } w \text{ in document } d$$
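For concreteness, a tiny sketch of building such a count vector; the toy vocabulary and the crude regex tokenizer are illustrative assumptions:

```python
import re
from collections import Counter

def bag_of_words(text, vocab):
    """Map a document to its vector of word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))  # crude tokenizer
    return [counts[w] for w in vocab]

vocab = ["ball", "game", "president", "vote"]              # toy vocabulary
print(bag_of_words("The president called a vote. The vote passed.", vocab))
# -> [0, 0, 1, 2]
```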


A Generative Model


Documents: $d = 1, \ldots, D$, each represented by its word counts $n_d$

Associated topics: $z_d \in \{1, \ldots, K\}$, one hidden topic per document

Parameters: $\theta = \{\pi, \theta_1, \ldots, \theta_K\}$, where $\pi_k = P(z_d = k)$ and $\theta_k$ is a distribution over the vocabulary for topic $k$

Generative model: sample a topic $z_d \sim \mathrm{Categorical}(\pi)$, then sample each word of document $d$ i.i.d. from $\mathrm{Categorical}(\theta_{z_d})$

Form of Likelihood


Conditioned on topic $z_d = k$, the document likelihood is multinomial:

$$P(\text{doc } d \mid z_d = k) \propto \prod_w \theta_{k,w}^{\,n_{d,w}}$$

Marginalizing the latent topic assignment:

$$P(\text{doc } d) = \sum_{k=1}^K \pi_k \prod_w \theta_{k,w}^{\,n_{d,w}}$$
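A small sketch of evaluating this marginal (in log space, for numerical stability); the function name is my own, with `counts` assumed to be a D×W matrix of word counts, `log_theta` a K×W matrix of log word probabilities, and `pi` the topic weights:

```python
import numpy as np
from scipy.special import logsumexp

def doc_log_likelihood(counts, pi, log_theta):
    """log P(doc d) = logsumexp_k [log pi_k + sum_w n_dw * log theta_kw]."""
    # per_topic[d, k] = log P(doc d, z_d = k), up to the multinomial coefficient
    per_topic = np.log(pi)[None, :] + counts @ log_theta.T
    return logsumexp(per_topic, axis=1)  # marginalize over topics in log space
```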



Review: EM Algorithm

Case Study 2: Document Retrieval

Learning Model Parameters

• Want to learn model parameters


[Figure: our actual observations, drawn from a mixture of 3 Gaussians; from C. Bishop, Pattern Recognition & Machine Learning]

ML Estimate of Mixture Model Params


Log likelihood:

$$\ell(\theta; \mathcal{D}) = \sum_{i=1}^N \log \sum_{k=1}^K \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)$$

Want the ML estimate $\hat{\theta} = \arg\max_\theta \ell(\theta; \mathcal{D})$.

Assume the per-cluster likelihoods are in the exponential family.

The objective is neither convex nor concave, and has local optima.

Complete Data

• Imagine we have an assignment of each $x_i$ to a cluster


[Figures: our actual observations (unlabeled) vs. the complete data, labeled by true cluster assignments; from C. Bishop, Pattern Recognition & Machine Learning]

Cluster Responsibilities

• We must infer the cluster assignments from the observations

C. Bishop, Pattern Recognition & Machine Learning

Posterior probabilities of assignments to each cluster, *given* the model parameters:

$$r_{ik} = P(z_i = k \mid x_i, \theta) = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}$$

These are soft assignments to clusters.

Iterative Algorithm


Motivates a coordinate-ascent-like algorithm:

1. Infer the missing values given the current estimate of the parameters
2. Optimize the parameters to produce a new estimate given the "filled in" data
3. Repeat

Example: MoG (derivation soon… + HW; see the sketch after this list)

1. Infer the "responsibilities" $r_{ik}$
2. Optimize the parameters $\pi_k, \mu_k, \Sigma_k$
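A compact NumPy/SciPy sketch of this alternation for a mixture of Gaussians, as a minimal illustration rather than the course's reference implementation (initialization is crude, and numerical safeguards are elided):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=20, seed=0):
    """EM for a Gaussian mixture: alternate soft assignments and updates."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, K, replace=False)]        # init means at random points
    Sigmas = np.array([np.cov(X.T) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(z_i = k | x_i, theta)
        r = np.stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                      for k in range(K)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = r.sum(axis=0)
        pi = Nk / N
        mus = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mus, Sigmas
```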

Gaussian Mixture Example

[Figures: EM fits to a Gaussian mixture, shown at the start and after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations]

Expectation Maximization (EM) – Setup


More broadly applicable than just to the mixture models considered so far.

Model: a joint distribution $P(x, z \mid \theta)$.

Interested in maximizing (w.r.t. $\theta$) the marginal likelihood:

$$P(x \mid \theta) = \sum_z P(x, z \mid \theta)$$

Special case: mixture models, where $z$ collects the cluster assignments.

$x$: observable – "incomplete" data
$(x, z)$: not (fully) observable – "complete" data
$\theta$: parameters

EM Algorithm


Initial guess: $\theta^{(0)}$

Estimate at iteration $t$: $\theta^{(t)}$

E-Step

Compute the expected complete-data log likelihood under the current posterior over $z$:

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{z \sim P(z \mid x, \theta^{(t)})}\big[\log P(x, z \mid \theta)\big]$$

M-Step

Compute the new estimate:

$$\theta^{(t+1)} = \arg\max_\theta \, Q(\theta \mid \theta^{(t)})$$

Example – Mixture Models


Consider i.i.d. observations $x_1, \ldots, x_N$ with hidden assignments $z_1, \ldots, z_N$.

E-Step: Compute the responsibilities $r_{ik} = P(z_i = k \mid x_i, \theta^{(t)})$, giving

$$Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^N \sum_{k=1}^K r_{ik} \big[\log \pi_k + \log P(x_i \mid z_i = k, \theta)\big]$$

M-Step: Compute $\theta^{(t+1)} = \arg\max_\theta Q(\theta \mid \theta^{(t)})$.

Initialization


In the mixture model case, where $\theta = \{\pi_k, \mu_k, \Sigma_k\}$, there are many ways to initialize the EM algorithm.

Examples:

– Choose K observations at random to define each cluster; assign the other observations to the nearest "centroid" to form initial parameter estimates

– Pick the centers sequentially to provide good coverage of the data (see the sketch after this list)

– Grow the mixture model by splitting (and sometimes removing) clusters until K clusters are formed

Initialization can be quite important to convergence rates in practice.
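One common way to realize the sequential-coverage idea is farthest-first seeding (the flavor behind k-means++); a minimal sketch, offered as one plausible instance of the heuristic the slide names:

```python
import numpy as np

def seed_centers(X, K, seed=0):
    """Pick K centers sequentially: each new center is the point farthest
    (in squared distance) from all centers chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]       # first center chosen at random
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])      # farthest point from current set
    return np.array(centers)
```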

What you need to know

• Mixture model formulation
– Generative model
– Likelihood

• Expectation Maximization (EM) Algorithm
– Derivation
– Concept of non-decreasing log likelihood
– Application to standard mixture models



Review: Connection to k-means

Case Study 2: Document Retrieval

K-means


1. Ask the user how many clusters they'd like (e.g., k = 5).

2. Randomly guess k cluster center locations.

3. Each data point finds out which center it's closest to.

4. Each center finds the centroid of the points it owns.

K-means

• Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$

• Classify: Assign each point $j \in \{1, \ldots, N\}$ to the nearest center:

$$z_j \leftarrow \arg\min_i \, \lVert \mu_i - x_j \rVert_2^2$$

• Recenter: $\mu_i$ becomes the centroid of its points:

$$\mu_i^{(t+1)} \leftarrow \arg\min_\mu \sum_{j : z_j = i} \lVert \mu - x_j \rVert_2^2$$

– Equivalent to $\mu_i$ = average of its points!
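Putting the classify/recenter alternation into code, a minimal NumPy sketch of Lloyd's algorithm (the function name and random initialization are my own choices):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and recentering."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # Classify: z_j = index of the nearest center to x_j
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Recenter: each center becomes the mean of its assigned points
        new_mu = np.array([X[z == i].mean(axis=0) if np.any(z == i) else mu[i]
                           for i in range(k)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, z
```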


Special Case: Spherical Gaussians + hard assignments

• If $P(x \mid z = k)$ is spherical, with the same variance $\sigma^2$ for all classes:

• Then, compare the EM objective with k-means:


$$P(x_i \mid z_i = k) \propto \exp\left[-\frac{1}{2\sigma^2}\,\lVert x_i - \mu_k \rVert_2^2\right]$$

versus the general joint likelihood with full covariances:

$$P(z_i = k, x_i) = \frac{1}{(2\pi)^{m/2}\,\lvert \Sigma_k \rvert^{1/2}} \exp\left[-\frac{1}{2}(x_i - \mu_k)^{\mathsf{T}} \Sigma_k^{-1} (x_i - \mu_k)\right] P(z_i = k)$$
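To spell out the connection, a short sketch of the standard argument (assuming, for simplicity, equal mixture weights $P(z_i = k) = 1/K$): with the spherical likelihood above and hard assignments, the complete-data log likelihood is, up to terms that do not depend on $z$ or $\mu$,

$$\sum_{i=1}^N \log P(z_i, x_i) = \text{const} - \frac{1}{2\sigma^2} \sum_{i=1}^N \lVert x_i - \mu_{z_i} \rVert_2^2,$$

so maximizing over the assignments and means is exactly minimizing the k-means objective $\sum_i \lVert x_i - \mu_{z_i} \rVert_2^2$: EM with hard assignments and a shared spherical covariance reduces to Lloyd's algorithm.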
