
Page 1

CS 679: Text Mining

Lecture #13: Gibbs Sampling for LDA

Credit: Many slides are from presentations by Tom Griffiths of Berkeley.

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Page 2

Announcements

• Required reading for today: Griffiths & Steyvers, “Finding Scientific Topics”
• Final Project Proposal
  • Clear, detailed: ideally, the first half of your project report!
  • Talk to me about ideas
  • Teams are an option
  • Due date to be specified

Page 3

Objectives

• Gain further understanding of LDA
• Understand the intractability of inference with the model
• Gain further insight into Gibbs sampling
• Understand how to estimate the parameters of interest in LDA using a collapsed Gibbs sampler

Page 4

Latent Dirichlet Allocation (slightly different symbols this time)

[Plate diagram: $D$ documents, $N_d$ tokens per document, $T$ topics]

$$\theta^{(d)} \sim \text{Dirichlet}(\alpha) \qquad \text{distribution over topics for each document}$$
$$z_i \sim \text{Categorical}(\theta^{(d)}) \qquad \text{topic assignment for each word}$$
$$\phi^{(j)} \sim \text{Dirichlet}(\beta) \qquad \text{distribution over words for each topic}$$
$$w_i \sim \text{Categorical}(\phi^{(z_i)}) \qquad \text{word generated from assigned topic}$$

$\alpha$ and $\beta$ are the parameters of the Dirichlet priors.

(Blei, Ng, & Jordan, 2001; 2003)
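To make the generative story concrete, here is a minimal sketch of the process in Python/NumPy; the corpus sizes, vocabulary size, and hyperparameter values are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the slides)
D, T, W, N_d = 5, 2, 8, 10    # documents, topics, vocabulary size, tokens/doc
alpha, beta = 0.5, 0.1        # symmetric Dirichlet hyperparameters

# One word distribution phi^(j) per topic, one topic distribution
# theta^(d) per document, then generate each token.
phi = rng.dirichlet(np.full(W, beta), size=T)      # T x W
theta = rng.dirichlet(np.full(T, alpha), size=D)   # D x T

corpus = []
for d in range(D):
    z = rng.choice(T, size=N_d, p=theta[d])             # topic for each token
    w = np.array([rng.choice(W, p=phi[j]) for j in z])  # word from assigned topic
    corpus.append((w, z))
```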

Page 5

The Statistical Problem of Meaning

Generating data from parameters is easy

Learning parameters from data is hard

What does it mean to identify the “meaning” of a document?

Page 6

Estimation of the LDA Generative Model

• Maximum likelihood estimation (EM): similar to the method presented by Hofmann for pLSI (1999)
• Deterministic approximate algorithms:
  • Variational EM (Blei, Ng & Jordan, 2001, 2003)
  • Expectation propagation (Minka & Lafferty, 2002)
• Markov chain Monte Carlo (our focus):
  • Full Gibbs sampler (Pritchard et al., 2000)
  • Collapsed Gibbs sampler (Griffiths & Steyvers, 2004): the papers you read for today

Page 7

Review: Markov Chain Monte Carlo (MCMC)

• Samples drawn from a Markov chain converge to a target distribution
• Allows sampling from an unnormalized posterior distribution
• Can compute approximate statistics from otherwise intractable distributions

(MacKay, 2002)

Page 8

Review: Gibbs Sampling

• Most straightforward kind of MCMC
• For variables $x_1, \dots, x_n$
• Requires the full (or “complete”) conditional distribution for each variable: draw $x_i^{(t)}$ from $P(x_i \mid \mathbf{x}_{-i})$, where

$$\mathbf{x}_{-i} = x_1^{(t)}, x_2^{(t)}, \dots, x_{i-1}^{(t)}, x_{i+1}^{(t-1)}, \dots, x_n^{(t-1)}$$
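As a minimal illustration of this scheme (not from the slides), here is a Gibbs sampler for a bivariate normal target, where each full conditional is a univariate normal; the target distribution and the correlation value are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                      # assumed correlation of target N(0, [[1, rho], [rho, 1]])
n_iters = 5000
samples = np.empty((n_iters, 2))

x1, x2 = 0.0, 0.0              # arbitrary initialization
for t in range(n_iters):
    # Full conditionals of a bivariate normal are univariate normals:
    # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[t] = (x1, x2)

print(np.corrcoef(samples[1000:].T))  # approaches rho after burn-in
```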

Page 9

Bayesian Inference in LDA

We would like to reason with the full joint distribution:

$$P(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)$$

Given $\mathbf{w}$, the distribution over the latent variables is desirable, but the denominator (the marginal likelihood) is intractable to compute:

$$P(\mathbf{z}, \theta, \phi \mid \mathbf{w}, \alpha, \beta) = \frac{P(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)}{P(\mathbf{w} \mid \alpha, \beta)}$$

We marginalize the model parameters out of the joint distribution so that we can focus on the words in the corpus ($\mathbf{w}$) and their assigned topics ($\mathbf{z}$):

$$P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta) = \iint P(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)\, d\theta\, d\phi$$

This leads to our use of the term “collapsed” sampler.

Page 10

Posterior Inference in LDA

From this marginalized joint distribution, we can compute the posterior distribution over topics for a given corpus ($\mathbf{w}$):

$$P(\mathbf{z} \mid \mathbf{w}) = \frac{P(\mathbf{w}, \mathbf{z})}{\sum_{\mathbf{z}} P(\mathbf{w}, \mathbf{z})}$$

But there are $T^n$ possible topic assignments, where $n$ is the number of tokens in the corpus! I.e., inference is still intractable!

Working with this topic posterior is only tractable up to a constant multiple:

$$P(\mathbf{z} \mid \mathbf{w}) \propto P(\mathbf{w}, \mathbf{z})$$

Page 11

Collapsed Gibbs Sampler for LDA

Since we’re now focusing on the topic posterior, namely $P(\mathbf{z} \mid \mathbf{w}) \propto P(\mathbf{w}, \mathbf{z}) = P(\mathbf{w} \mid \mathbf{z})\, P(\mathbf{z})$, let’s find these factors by marginalizing separately:

$$P(\mathbf{w} \mid \mathbf{z}) = \int P(\mathbf{w} \mid \mathbf{z}, \phi)\, p(\phi \mid \beta)\, d\phi = \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^{\!T} \prod_{j=1}^{T} \frac{\prod_{w} \Gamma(n_j^{(w)} + \beta)}{\Gamma(n_j^{(\cdot)} + W\beta)}$$

$$P(\mathbf{z}) = \int P(\mathbf{z} \mid \theta)\, p(\theta \mid \alpha)\, d\theta = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{\!D} \prod_{d=1}^{D} \frac{\prod_{j} \Gamma(n_j^{(d)} + \alpha)}{\Gamma(n_{\cdot}^{(d)} + T\alpha)}$$

Where:
• $n_j^{(w)}$ is the number of times word $w$ is assigned to topic $j$
• $n_j^{(d)}$ is the number of times topic $j$ is used in document $d$
• a dot in place of an index denotes a sum over that index

Page 12

Collapsed Gibbs Sampler for LDA

We only sample each $z_i$! Complete (or full) conditionals can now be derived for each $z_i$ in $\mathbf{z}$:

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$

Where:
• $d_i$ is the document in which word $w_i$ occurs
• $n_{-i,j}^{(w)}$ is the number of times (ignoring position $i$) word $w$ is assigned to topic $j$
• $n_{-i,j}^{(d)}$ is the number of times (ignoring position $i$) topic $j$ is used in document $d$
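As a sketch (with assumed count-matrix shapes and variable names, not code from the slides), the full conditional above is cheap to evaluate from cached counts:

```python
import numpy as np

def conditional(w_i, d_i, n_wt, n_dt, alpha, beta):
    """Normalized P(z_i = j | z_-i, w) for all topics j at once.

    n_wt: W x T word-topic counts; n_dt: D x T document-topic counts.
    Both must already exclude token i (the "-i" in the formula).
    """
    W = n_wt.shape[0]
    left = (n_wt[w_i] + beta) / (n_wt.sum(axis=0) + W * beta)  # P(w_i | topic j)
    right = n_dt[d_i] + alpha                                  # proportional to P(j | d_i)
    # The second denominator, n_{-i,.}^{(d_i)} + T*alpha, is the same for
    # every topic j, so it drops out under normalization.
    p = left * right
    return p / p.sum()
```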

Page 13

Steps for deriving the complete conditionals

1. Begin with the full joint distribution over the data, latent variables, and model parameters, given the fixed parameters $\alpha$ and $\beta$ of the prior distributions.
2. Write out the desired collapsed joint distribution and set it equal to the appropriate integral over the full joint in order to marginalize over $\theta$ and $\phi$.
3. Perform algebra and group like terms.
4. Expand the generic notation by applying the closed-form definitions of the Multinomial, Categorical, and Dirichlet distributions.
5. Transform the representation: change the product indices from products over documents and word sequences to products over cluster labels and token counts.
6. Simplify by combining products, adding exponents, and pulling constant multipliers outside of integrals.
7. When you have integrals over terms in the form of the kernel of the Dirichlet distribution, consider how to convert the result into a familiar distribution (see the worked identity below).
8. Once you have the expression for the joint, derive the expression for the conditional distribution.
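For step 7, the key identity (standard Dirichlet algebra, not spelled out on the slide) is that the integral of a Dirichlet kernel is a ratio of Gamma functions:

```latex
\int \prod_{w=1}^{W} \phi_w^{\,n_w + \beta - 1} \, d\phi
  = \frac{\prod_{w=1}^{W} \Gamma(n_w + \beta)}{\Gamma\!\left(\sum_{w=1}^{W} (n_w + \beta)\right)}
  = \frac{\prod_{w} \Gamma(n_w + \beta)}{\Gamma\!\left(n_{\cdot} + W\beta\right)}
% This is the normalizing constant of Dirichlet(n_1 + beta, ..., n_W + beta),
% which is exactly the per-topic factor appearing in P(w | z) on page 11.
```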

Page 14

Collapsed Gibbs Sampler for LDA

For $t = 1$ to $N_{\text{iter}}$:
    For all variables $z_i$ (i.e., for $i = 1$ to $n$):
        Draw $z_i^{(t)}$ from $P(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$
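Putting the pieces together, here is a minimal self-contained sketch of this loop in Python/NumPy; the corpus encoding (parallel arrays of word and document indices) and the default hyperparameters are assumptions, not from the slides:

```python
import numpy as np

def collapsed_gibbs_lda(words, docs, W, D, T, alpha=0.1, beta=0.01,
                        n_iters=1000, seed=0):
    """words[i], docs[i]: word id and document id of token i."""
    rng = np.random.default_rng(seed)
    n = len(words)
    z = rng.integers(T, size=n)          # random initial assignments

    # Cached count matrices (the "n" counts from the slides).
    n_wt = np.zeros((W, T))              # word-topic counts
    n_dt = np.zeros((D, T))              # document-topic counts
    n_t = np.zeros(T)                    # tokens per topic
    for i in range(n):
        n_wt[words[i], z[i]] += 1
        n_dt[docs[i], z[i]] += 1
        n_t[z[i]] += 1

    for _ in range(n_iters):
        for i in range(n):
            w_i, d_i, j = words[i], docs[i], z[i]
            # Remove token i from the counts ("ignoring position i").
            n_wt[w_i, j] -= 1; n_dt[d_i, j] -= 1; n_t[j] -= 1
            # Full conditional over topics, up to a constant.
            p = (n_wt[w_i] + beta) / (n_t + W * beta) * (n_dt[d_i] + alpha)
            j = rng.choice(T, p=p / p.sum())
            # Add token i back under its new topic.
            z[i] = j
            n_wt[w_i, j] += 1; n_dt[d_i, j] += 1; n_t[j] += 1
    return z, n_wt, n_dt
```

The count matrices returned here are exactly what the next slide uses to recover $\theta$ and $\phi$.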

Page 15

Collapsed Gibbs Sampler for LDA

This is nicer than your average Gibbs sampler:
• Memory: counts (the “$n$” counts) can be cached in two sparse matrices
• No special functions, simple arithmetic
• The distributions on $\theta$ and $\phi$ are analytic given the topic assignments $\mathbf{z}$ and the words $\mathbf{w}$, and can later be recomputed from the samples in a given iteration of the sampler:

$$\hat{\theta}_j^{(d)} = \frac{n_j^{(d)} + \alpha}{n_{\cdot}^{(d)} + T\alpha} \qquad\qquad \hat{\phi}_w^{(j)} = \frac{n_j^{(w)} + \beta}{n_j^{(\cdot)} + W\beta}$$
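Under the same assumed variable names as the sampler sketch above, recovering these point estimates is a one-liner each:

```python
import numpy as np

def estimate_theta_phi(n_wt, n_dt, alpha, beta):
    W, T = n_wt.shape
    # theta[d, j]: P(topic j | document d); phi[j, w]: P(word w | topic j)
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + T * alpha)
    phi = ((n_wt + beta) / (n_wt.sum(axis=0) + W * beta)).T
    return theta, phi
```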

Page 16

Gibbs sampling in LDA

[Table: one row per token $i = 1, \dots, 50$, listing the word $w_i$ (MATHEMATICS, KNOWLEDGE, RESEARCH, WORK, SCIENTIFIC, ..., JOY), its document $d_i$, and a random initial topic assignment $z_i \in \{1, 2\}$. Here $T = 2$ topics, $N_d = 10$ tokens per document, $M = 5$ documents. Iteration 1.]

Page 17

Gibbs sampling in LDA

[Same table at the start of iteration 2: a new $z_i$ column is added, and the assignment for token $i = 1$ is marked “?”, about to be resampled.]

Page 18

Gibbs sampling in LDA

[The new value of $z_1$ is drawn from the full conditional derived earlier:]

$$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$

Page 19

Gibbs sampling in LDA

[Same table and conditional: the counts $n_{-i,j}$ used in the formula exclude token 1 itself.]

Page 20

Gibbs sampling in LDA

[Token 1 receives its sampled topic, $z_1 = 2$; token 2 is marked “?” and resampled from the same conditional.]

Page 21

Gibbs sampling in LDA

[Sampled so far in iteration 2: $z_1 = 2$, $z_2 = 1$; token 3 is marked “?”.]

Page 22

Gibbs sampling in LDA

[Sampled so far: 2, 1, 1; token 4 is marked “?”.]

Page 23

Gibbs sampling in LDA

[Sampled so far: 2, 1, 1, 2; the sweep continues through all 50 tokens.]

Page 24

Gibbs sampling in LDA

[After repeated sweeps the chain mixes: the table shows the $z_i$ columns for iteration 1, iteration 2, ..., iteration 1000, with every token resampled once per sweep from the same full conditional.]

Page 25

A Visual Example: Bars

• pixel = word; image = document
• sample each pixel from a mixture of topics
• A toy problem: just a metaphor for inference on text.

Page 26

Documents generated from the topics.

Page 27

Evolution of the topics (the $\phi$ matrix)

Page 28

Interpretable decomposition

• SVD gives a basis for the data, but not an interpretable one

• The true basis is not orthogonal, so rotation does no good

Page 29

Effects of Hyper-parameters

$\alpha$ and $\beta$ control the relative sparsity of $\theta$ and $\phi$:
• smaller $\alpha$: fewer topics per document
• smaller $\beta$: fewer words per topic

Good assignments $\mathbf{z}$ are a compromise in sparsity; see the quick demonstration below.
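A quick way to see this effect (an illustration, not from the slides) is to draw from symmetric Dirichlets at different concentrations:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10
for a in (0.01, 0.1, 1.0, 10.0):
    theta = rng.dirichlet(np.full(T, a))
    # Count how many topics carry non-negligible mass in one draw.
    print(f"alpha={a:5}: {np.sum(theta > 0.05)} topics above 5% mass")
# Small alpha concentrates mass on a few topics (sparse theta);
# large alpha spreads mass evenly across topics.
```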

Page 30

Bayesian model selection

How many topics do we need? A Bayesian would consider the posterior:

$$P(T \mid \mathbf{w}) \propto P(\mathbf{w} \mid T)\, P(T)$$

Computing $P(\mathbf{w} \mid T)$ involves summing over all assignments $\mathbf{z}$.
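Griffiths & Steyvers approximate $P(\mathbf{w} \mid T)$ with the harmonic mean of $P(\mathbf{w} \mid \mathbf{z})$ over posterior samples of $\mathbf{z}$. A sketch of that estimator, assuming you already have per-sample log-likelihoods (and noting the estimator is notoriously high-variance):

```python
import numpy as np
from scipy.special import logsumexp

def log_harmonic_mean(log_liks):
    """log P(w | T), approximated as -log of the mean of 1 / P(w | z_s)
    over S posterior samples z_1, ..., z_S."""
    log_liks = np.asarray(log_liks)
    S = len(log_liks)
    return -(logsumexp(-log_liks) - np.log(S))
```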

Page 31

Sweeping T

Page 32

Analysis of PNAS abstracts

• Used all D = 28,154 abstracts from 1991-2001
• Used any word occurring in at least five abstracts and not on a “stop” list (W = 20,551)
• Segmentation by any delimiting character; a total of n = 3,026,970 word tokens in the corpus
• Also used the PNAS class designations for 2001

(Acknowledgment: Kevin Boyack)

Page 33

Running the algorithm

• Memory requirements linear in T(W + D); runtime proportional to nT
• T = 50, 100, 200, 300, 400, 500, 600, (1000)
• Ran 8 chains for each T, with a burn-in of 1000 iterations and 10 samples per chain at a lag of 100
• All runs completed in under 30 hours on the Blue Horizon supercomputer in San Diego

Page 34

How many topics?

Page 35

Topics by Document Length

Page 36

A Selection of Topics

[Six example topics, each shown as its top words ranked by $P(w \mid z)$:]

• FORCE SURFACE MOLECULES SOLUTION SURFACES MICROSCOPY WATER FORCES PARTICLES STRENGTH POLYMER IONIC ATOMIC AQUEOUS MOLECULAR PROPERTIES LIQUID SOLUTIONS BEADS MECHANICAL
• HIV VIRUS INFECTED IMMUNODEFICIENCY CD4 INFECTION HUMAN VIRAL TAT GP120 REPLICATION TYPE ENVELOPE AIDS REV BLOOD CCR5 INDIVIDUALS ENV PERIPHERAL
• MUSCLE CARDIAC HEART SKELETAL MYOCYTES VENTRICULAR MUSCLES SMOOTH HYPERTROPHY DYSTROPHIN HEARTS CONTRACTION FIBERS FUNCTION TISSUE RAT MYOCARDIAL ISOLATED MYOD FAILURE
• STRUCTURE ANGSTROM CRYSTAL RESIDUES STRUCTURES STRUCTURAL RESOLUTION HELIX THREE HELICES DETERMINED RAY CONFORMATION HELICAL HYDROPHOBIC SIDE DIMENSIONAL INTERACTIONS MOLECULE SURFACE
• NEURONS BRAIN CORTEX CORTICAL OLFACTORY NUCLEUS NEURONAL LAYER RAT NUCLEI CEREBELLUM CEREBELLAR LATERAL CEREBRAL LAYERS GRANULE LABELED HIPPOCAMPUS AREAS THALAMIC
• TUMOR CANCER TUMORS HUMAN CELLS BREAST MELANOMA GROWTH CARCINOMA PROSTATE NORMAL CELL METASTATIC MALIGNANT LUNG CANCERS MICE NUDE PRIMARY OVARIAN

Page 37

[Plot: topic popularity over time, contrasting cold topics (declining) with hot topics (rising).]

Page 38

Cold topics / Hot topics

[Three hot topics (rising over 1991-2001), shown by topic number and top words:]

• Topic 2: SPECIES GLOBAL CLIMATE CO2 WATER ENVIRONMENTAL YEARS MARINE CARBON DIVERSITY OCEAN EXTINCTION TERRESTRIAL COMMUNITY ABUNDANCE
• Topic 134: MICE DEFICIENT NORMAL GENE NULL MOUSE TYPE HOMOZYGOUS ROLE KNOCKOUT DEVELOPMENT GENERATED LACKING ANIMALS REDUCED
• Topic 179: APOPTOSIS DEATH CELL INDUCED BCL CELLS APOPTOTIC CASPASE FAS SURVIVAL PROGRAMMED MEDIATED INDUCTION CERAMIDE EXPRESSION

Page 39

Cold topics / Hot topics

[The same three hot topics (2, 134, 179), now contrasted with three cold topics (declining over 1991-2001):]

• Topic 37: CDNA AMINO SEQUENCE ACID PROTEIN ISOLATED ENCODING CLONED ACIDS IDENTITY CLONE EXPRESSED ENCODES RAT HOMOLOGY
• Topic 289: KDA PROTEIN PURIFIED MOLECULAR MASS CHROMATOGRAPHY POLYPEPTIDE GELS SDS BAND APPARENT LABELED IDENTIFIED FRACTION DETECTED
• Topic 75: ANTIBODY ANTIBODIES MONOCLONAL ANTIGEN IGG MAB SPECIFIC EPITOPE HUMAN MABS RECOGNIZED SERA EPITOPES DIRECTED NEUTRALIZING

Page 40

Conclusions

• Estimation/inference in LDA is more or less straightforward using Gibbs sampling, i.e., easy!
• Not so easy in all graphical models

Page 41

Coming Soon

Topical n-grams