Hierarchical Topic Models and the Nested Chinese Restaurant Process
Blei, Griffiths, Jordan, Tenenbaum
presented by Rodrigo de Salvo Braz
Document classification
• One-class approach: one topic per document, with words generated according to the topic.
• For example, a Naive Bayes model.
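A minimal sketch of this one-topic-per-document model (the sizes, priors, and names here are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 10, 20                        # hypothetical: topics, vocab, words
topic_word = rng.dirichlet(np.ones(V), size=K)   # one word distribution per topic
topic_prior = np.full(K, 1.0 / K)

def generate_one_topic_document():
    """Naive-Bayes-style model: a single topic generates every word."""
    z = rng.choice(K, p=topic_prior)             # the document's one topic
    words = rng.choice(V, size=doc_len, p=topic_word[z])
    return z, words

print(generate_one_topic_document())
```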
Document classification
• It is more realistic to assume more than one topic per document.
• Generative model: pick a mixture distribution over K topics and generate words from it.
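Under the same hypothetical setup, the multiple-topic version draws a fresh mixture for each document and then a topic per word (a sketch, not the paper's exact parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 10, 20                        # hypothetical sizes
topic_word = rng.dirichlet(np.ones(V), size=K)   # fixed per-topic word distributions

def generate_mixture_document():
    """Each word gets its own topic, drawn from a per-document mixture."""
    theta = rng.random(K)
    theta /= theta.sum()                          # an arbitrary mixture over K topics
    zs = rng.choice(K, size=doc_len, p=theta)     # one topic per word
    words = np.array([rng.choice(V, p=topic_word[z]) for z in zs])
    return theta, words

theta, words = generate_mixture_document()
print(theta.round(2), words)
```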
Document classification
• Even more realistic: topics may be organized in a hierarchy (not independent);
• Pick a path from root to leaf in a tree; each node is a topic; sample words from the mixture of topics along the path.
Dirichlet distribution (DD)
• Distribution over distribution vectors of dimension K: P(p; u) = (1/Z(u)) ∏_i p_i^(u_i − 1), where Z(u) is the normalizing constant;
• The parameters u_i act as a prior, playing the role of “previous observations”;
• The symmetric Dirichlet distribution assumes a uniform prior (u_i = u_j for all i, j).
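A quick check of these properties with SciPy (the parameter values are made up):

```python
import numpy as np
from scipy.stats import dirichlet

u = np.array([2.0, 3.0, 5.0])   # pseudo-counts: the "previous observations"
p = np.array([0.2, 0.3, 0.5])   # a point on the K=3 simplex

# Density P(p; u) = (1/Z(u)) * prod_i p_i^(u_i - 1)
print(dirichlet.pdf(p, u))

# Symmetric Dirichlet: all u_i equal, i.e. a uniform prior over components
rng = np.random.default_rng(0)
print(rng.dirichlet(np.full(3, 1.0), size=2))
```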
Latent Dirichlet Allocation (LDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a Dirichlet distribution;
• Pick topics according to this distribution and generate words according to the word distribution for each topic.
Latent Dirichlet Allocation (LDA)
[Plate diagram: DD hyperparameter → topic distribution → topics → words w; plates of size K (topics) and W (words).]
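A minimal sketch of the LDA generative process just described (sizes and hyperparameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 4, 12, 30                        # hypothetical sizes
alpha = np.full(K, 0.5)                          # Dirichlet (DD) hyperparameter
beta = rng.dirichlet(np.ones(V), size=K)         # per-topic word distributions

def generate_lda_document():
    theta = rng.dirichlet(alpha)                  # 1. topic mixture ~ Dirichlet
    zs = rng.choice(K, size=doc_len, p=theta)     # 2. a topic for each word
    words = np.array([rng.choice(V, p=beta[z]) for z in zs])  # 3. word ~ its topic
    return words

print(generate_lda_document())
```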
Chinese Restaurant Process (CRP)
[Animation across nine slides: customers 1 out of 9 through 9 out of 9 seat themselves at tables one at a time; caption on the final slide: “Data point (a distribution itself) sampled”.]
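A minimal CRP sampler matching the nine-customer animation: each arriving customer joins an occupied table with probability proportional to its occupancy, or opens a new table with probability proportional to the concentration parameter gamma (whose value here is arbitrary):

```python
import numpy as np

def crp_seating(n_customers, gamma, rng):
    """Seat customers one at a time: join an occupied table with probability
    proportional to its occupancy, or open a new table with prob. ~ gamma."""
    tables, assignments = [], []
    for _ in range(n_customers):
        weights = np.array(tables + [gamma], dtype=float)
        t = rng.choice(len(weights), p=weights / weights.sum())
        if t == len(tables):
            tables.append(1)                      # a new table is opened
        else:
            tables[t] += 1
        assignments.append(t)
    return assignments, tables

rng = np.random.default_rng(0)
print(crp_seating(9, gamma=1.0, rng=rng))
```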
Species Sampling Mixture
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a CRP prior;
• Pick topics according to this distribution and generate words according to the word distribution for each topic.
Species Sampling Mixture
[Plate diagram: CRP hyperparameter → topic distribution → topics → words w; plates of size K (topics) and W (words).]
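One way to sketch this CRP-prior mixture (an assumption about the setup, not the paper's exact construction): run the CRP over the words of a document, let tables play the role of topics, and draw a topic's word distribution lazily the first time its table is opened.

```python
import numpy as np

rng = np.random.default_rng(0)
V, doc_len, gamma = 12, 30, 1.0                   # hypothetical sizes
table_counts, table_topics = [], []               # occupancy and word distributions

def generate_crp_mixture_document():
    words = []
    for _ in range(doc_len):
        weights = np.array(table_counts + [gamma], dtype=float)
        t = rng.choice(len(weights), p=weights / weights.sum())
        if t == len(table_counts):                # new table => new topic
            table_counts.append(1)
            table_topics.append(rng.dirichlet(np.ones(V)))
        else:
            table_counts[t] += 1
        words.append(int(rng.choice(V, p=table_topics[t])))
    return words

print(generate_crp_mixture_document())
```

Unlike LDA, the number of topics is not fixed in advance: new tables (topics) keep appearing as more data arrives.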
Nested CRP
[Diagram built up across six slides (steps 1 through 6): a tree of restaurants, with a root-to-leaf path chosen by one CRP decision per level.]
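A sketch of the nested CRP: a tree of restaurants, descended one CRP decision per level (the depth and gamma values here are arbitrary):

```python
import numpy as np

def nested_crp_path(tree, depth, gamma, rng):
    """Descend the tree of restaurants, running one CRP choice per level.
    `tree` maps a path prefix (tuple) to its children's occupancy counts."""
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        weights = np.array(counts + [gamma], dtype=float)
        c = rng.choice(len(weights), p=weights / weights.sum())
        if c == len(counts):
            counts.append(1)                      # open a new restaurant below
        else:
            counts[c] += 1
        path += (c,)
    return path

rng = np.random.default_rng(0)
tree = {}
for _ in range(6):                                # six documents, as in the slides
    print(nested_crp_path(tree, depth=3, gamma=1.0, rng=rng))
```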
Hierarchical LDA (hLDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a nested CRP prior;
• Pick topics according to this distribution and generate words according to the word distribution for each topic.
hLDA graphical model
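Putting the pieces together, a sketch of the hLDA generative process: a nested CRP chooses a root-to-leaf path, and each word mixes over the topics along that path. All sizes are hypothetical, and the level mixture is a plain Dirichlet here, which may not match the paper's exact choice.

```python
import numpy as np

rng = np.random.default_rng(0)
V, depth, doc_len, gamma = 12, 3, 30, 1.0         # hypothetical sizes
tree = {}            # nested-CRP state: path prefix -> child occupancy counts
node_topics = {}     # tree node (path prefix) -> that topic's word distribution

def nested_crp_path():
    """Choose a root-to-leaf path, one CRP decision per level."""
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        weights = np.array(counts + [gamma], dtype=float)
        c = rng.choice(len(weights), p=weights / weights.sum())
        if c == len(counts):
            counts.append(1)
        else:
            counts[c] += 1
        path += (c,)
    return path

def topic_at(node):
    """Lazily draw a word distribution for each tree node (topic)."""
    if node not in node_topics:
        node_topics[node] = rng.dirichlet(np.ones(V))
    return node_topics[node]

def generate_hlda_document():
    path = nested_crp_path()                      # 1. pick a path through the tree
    nodes = [path[:l] for l in range(depth + 1)]  # root () plus each prefix
    theta = rng.dirichlet(np.ones(depth + 1))     # 2. mixture over the path's levels
    levels = rng.choice(depth + 1, size=doc_len, p=theta)
    words = [int(rng.choice(V, p=topic_at(nodes[l]))) for l in levels]  # 3. emit
    return path, words

for _ in range(3):
    path, words = generate_hlda_document()
    print(path, words[:8])
```

Documents sharing a path prefix share the topics near the root, which is what gives the learned hierarchy its general-to-specific structure.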
Artificial data experiment
[Figure: 100 documents of 1,000 words each over a 25-term vocabulary; each vertical bar is a topic.]
CRP prior vs. Bayes Factors
Predicting the structure
NIPS abstracts
Comments
• Accommodates growing collections of data;
• Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for it;
• No mention of running time; perhaps inference takes a very long time.