topic models: priors, stop words and languages (wallach/talks/priors.pdf)

topic models: priors, stop words and languages hanna m. wallach university of massachusetts amherst [email protected]


Page 1

topic models: priors, stop words and languages

hanna m. wallach, university of massachusetts amherst

[email protected]

Page 2

hanna m. wallach :: topic models: priors, stop words and languages

Finding Needles in Haystacks

● As more information becomes available, it can be harder and harder to find what we want

● We don't even always know what we want!

● Need new tools to help us organize, search and understand information

Page 3

A Solution: Topic Models

● Use topic models to discover hidden topic-based patterns

● Use discovered topics to annotate the collection

● Use annotations to organize, understand, summarize, search...

Page 4

Topics ↔ Words

Page 5

Documents ↔ Topics

Page 6

Probabilistic Modeling

● Treat data as observations that arise from a generative probabilistic process that includes hidden variables:

– For documents, the hidden variables represent the thematic structure of the collection

● Infer the hidden structure using posterior inference:

– What are the topics that describe this collection?

● Situate new data into the estimated model:

– Which topics best describe the new documents?
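The generative view described above can be sketched concretely for an LDA-style model. This is a minimal illustration, not code from the talk; the sizes and hyperparameter values are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not from the talk)
K, V, D, N = 3, 8, 2, 10        # topics, vocabulary size, documents, tokens per doc
alpha, beta = 0.5, 0.1          # symmetric Dirichlet concentration parameters

# Hidden structure: K topics, each a distribution over the V-word vocabulary
phi = rng.dirichlet(np.full(V, beta), size=K)

docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))        # document-specific topic distribution
    z = rng.choice(K, size=N, p=theta)              # hidden topic assignment per token
    w = [int(rng.choice(V, p=phi[t])) for t in z]   # observed words
    docs.append(w)
```

Posterior inference runs this process in reverse: given only the observed words, recover the hidden `theta`, `phi`, and `z`.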

Page 7

Documents ↔ Topics ↔ Words

Page 8

Probabilistic LSA

[Figure: pLSA graphical model, showing the document-specific topic distribution, the topic assignment, the observed word, and the topics]

(Hofmann, 1999)

Page 9

Latent Dirichlet Allocation

[Figure: LDA graphical model, showing the Dirichlet parameters, the document-specific topic distribution, the topic assignment, the observed word, and the topics]

(Blei et al., 2003)

Page 10

Dirichlet Distribution

● Distribution over K-dimensional positive vectors that sum to one (i.e., points on the probability simplex)

● Two parameters:

– Base measure (positive vector; sums to one)

– Concentration parameter α (positive scalar)
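The two parameters above can be checked numerically: writing the distribution as Dir(α m) for base measure m and concentration α, the mean is m regardless of α, while α controls how tightly samples cluster around it. A small sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Base measure: positive vector summing to one (illustrative values)
m = np.array([0.5, 0.3, 0.2])

spread = {}
for alpha in (1.0, 100.0):
    # Sample theta ~ Dir(alpha * m) and measure per-component spread
    samples = rng.dirichlet(alpha * m, size=5000)
    spread[alpha] = float(samples.std(axis=0).mean())

# Larger concentration parameter -> samples pulled tighter around m,
# while the empirical mean stays near the base measure in both cases.
```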

Page 11

An Aside: The Simplex

● K-dimensional probability distributions (i.e., points on the K-1 simplex) can be plotted in (K-1)-d:

[Figure: the 2-simplex, drawn as a triangle with vertices A, B, and C]

Page 12

Dirichlet Parameters

Page 13

Latent Dirichlet Allocation

symmetric priors: uniform base measures

Page 14

rethinking lda: why priors matter

hanna m. wallach, david mimno, andrew mccallum

Page 15

Priors for LDA

● Almost all work on LDA uses symmetric Dirichlet priors

– Two scalar concentration parameters: α and β

● Concentration parameters are usually set heuristically

● Some recent work on inferring optimal concentration parameter values from data (Asuncion et al., 2009)

● No rigorous study of the Dirichlet priors:

– Base measure: asymmetric vs. symmetric

– Treatment: optimize vs. integrate out

Page 16

Topic Modeling in Practice

● Help! All my topics consist of “the, and, of, to, a ...”

– Preprocess data to remove stop words

● Now all my topics consist of “data, model, results ...”

– Make a new corpus-specific stop word list

● Wait, but how do I choose the right number of topics T?

– Evaluate probability of held-out data for different T

● That sounds really time-consuming

– Use a nonparametric model ...
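One common heuristic for the corpus-specific stop word list mentioned above is document frequency: treat any word that occurs in nearly every document as a stop word. A minimal sketch; the helper name and threshold are my assumptions, not from the talk:

```python
from collections import Counter

def corpus_stop_words(docs, df_threshold=0.9):
    """Hypothetical helper: words whose document frequency exceeds df_threshold."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each word at most once per document
    n_docs = len(docs)
    return {w for w, c in df.items() if c / n_docs > df_threshold}

docs = [
    "the model fits the data".split(),
    "the results of the model".split(),
    "we evaluate the data and results".split(),
]
stops = corpus_stop_words(docs)      # only "the" occurs in every document here
```

The point of the talk, though, is that the right prior can absorb stop-word-like behavior into dedicated topics without this kind of preprocessing.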

Page 17

Symmetric → Asymmetric

● Use the prior over Θ as a running example

● Uniform base measure → nonuniform base measure

● Asymmetric prior: some topics more likely a priori
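The move from a uniform to a nonuniform base measure can be checked numerically: with a skewed m, samples from Dir(α m) favor some topics a priori. All values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
K, alpha = 4, 2.0

u = np.full(K, 1.0 / K)              # uniform base measure: symmetric prior
m = np.array([0.7, 0.1, 0.1, 0.1])   # nonuniform base measure: asymmetric prior

sym = rng.dirichlet(alpha * u, size=10000).mean(axis=0)
asym = rng.dirichlet(alpha * m, size=10000).mean(axis=0)

# Under the symmetric prior every topic is equally likely a priori;
# under the asymmetric prior topic 0 is expected to dominate.
```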

Page 18

Predictive Distributions

● Predictive probability of topic t in document d, given the previous topic assignments and a Dirichlet prior with base measure m and concentration α:

P(z = t | previous assignments, α, m) = (N_{t|d} + α m_t) / (N_d + α)

● If topic t has not yet occurred in document d, then P(z = t | ...) = α m_t / (N_d + α)

● The empirical proportion N_{t|d} / N_d is smoothed with the topic-specific quantity α m_t
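This smoothed predictive probability has a simple closed form. A sketch, with illustrative counts and a helper name of my own choosing:

```python
import numpy as np

def predictive_topic_prob(counts, alpha, m):
    """P(next token's topic = t | topic counts in this document so far),
    under a Dirichlet prior with base measure m and concentration alpha:
    (N_{t|d} + alpha * m_t) / (N_d + alpha)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha * np.asarray(m)) / (counts.sum() + alpha)

# Illustrative: 3 topics, uniform base measure, 5 tokens assigned so far
p = predictive_topic_prob([4, 1, 0], alpha=3.0, m=[1/3, 1/3, 1/3])
# Topic 2 has not occurred in the document, yet it still receives
# probability alpha * m_2 / (N_d + alpha) = 1/8 rather than zero.
```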

Page 19

Handling Unknown m

● Can take a fully Bayesian approach:

– Give m a Dirichlet prior: m ~ Dir(α' u), with u uniform

– Integrate out m thanks to conjugacy

Page 20

An Observation

● As α' → ∞, the asymmetric hierarchical Dirichlet prior over Θ approaches a symmetric Dirichlet prior

● Symmetric Dirichlet prior is a special case of the asymmetric hierarchical Dirichlet prior
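This limit can be seen from the closed-form Dirichlet variance: for m ~ Dir(α' u), each component has variance u_k(1 − u_k)/(α' + 1), which vanishes as α' grows, pinning m at the uniform u. A quick numerical check with illustrative sizes:

```python
import numpy as np

K = 10
u = np.full(K, 1.0 / K)    # uniform base measure over K topics

def base_measure_std(alpha_prime):
    # Closed-form std. dev. of each component of m ~ Dir(alpha' * u)
    return np.sqrt(u * (1 - u) / (alpha_prime + 1))

stds = {a: float(base_measure_std(a).max()) for a in (1.0, 100.0, 10000.0)}
# As alpha' -> infinity, m collapses to u, so the hierarchical prior
# Dir(alpha * m) reduces to the symmetric prior Dir(alpha * u).
```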

Page 21

Four Combinations of Priors

● “SS”: symmetric prior over Θ, symmetric prior over Φ

● “AS”: asymmetric prior over Θ, symmetric prior over Φ

● “SA”: symmetric prior over Θ, asymmetric prior over Φ

● “AA”: asymmetric prior over Θ, asymmetric prior over Φ

Page 22

Inferred Topics and Stop Words

Page 23

Results: Log Probabilities

Page 24

Sampled Concentration Parameters

● Sampled concentration parameters from “AA”

● The sampled β' is large, so the prior over Φ is effectively symmetric: “AA” → “AS”

Page 25

Intuition

● Topics are specialized distributions over words

– Want topics to be as distinct as possible

– Asymmetric prior over {φ_t} makes topics more similar to each other (and to the corpus word frequencies)

– Want a symmetric prior to preserve topic “distinctness”

● Still have to account for power-law word usage:

– Asymmetric prior over {θ_d} means some topics (e.g., “the, a, of, to ...”) can be used more often than others

Page 26

Conclusions

● Careful thinking about priors can yield new insights

– e.g., priors and stop word handling are related

● For LDA the choice of prior is surprisingly important:

– Asymmetric prior for document-specific topic distributions

– Symmetric prior for topic-specific word distributions

● Rethinking priors for LDA facilitates new topic models

– e.g., polylingual topic model ...

Page 27

NESCAI 2010

● April 16-18 @ UMass Amherst

● http://nescai.cs.umass.edu/cfp.php

Page 28

questions?