Hierarchical Topic Models and the Nested Chinese Restaurant Process
Blei, Griffiths, Jordan, Tenenbaum
presented by Rodrigo de Salvo Braz
Document classification
• One-class approach: one topic per document, with words generated according to the topic.
• For example, a Naive Bayes model.
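A minimal sketch of this one-topic-per-document model (the sizes, priors, and names here are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 10, 20                        # hypothetical: topics, vocab, words
topic_word = rng.dirichlet(np.ones(V), size=K)   # one word distribution per topic
topic_prior = np.full(K, 1.0 / K)

def generate_one_topic_document():
    """Naive-Bayes-style model: a single topic generates every word."""
    z = rng.choice(K, p=topic_prior)             # the document's one topic
    words = rng.choice(V, size=doc_len, p=topic_word[z])
    return z, words

print(generate_one_topic_document())
```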
Document classification
• It is more realistic to assume more than one topic per document.
• Generative model: pick a mixture distribution over K topics and generate words from it.
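Under the same hypothetical setup, the multiple-topic version draws a fresh mixture for each document and then a topic per word (a sketch, not the paper's exact parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 10, 20                        # hypothetical sizes
topic_word = rng.dirichlet(np.ones(V), size=K)   # fixed per-topic word distributions

def generate_mixture_document():
    """Each word gets its own topic, drawn from a per-document mixture."""
    theta = rng.random(K)
    theta /= theta.sum()                          # an arbitrary mixture over K topics
    zs = rng.choice(K, size=doc_len, p=theta)     # one topic per word
    words = np.array([rng.choice(V, p=topic_word[z]) for z in zs])
    return theta, words

theta, words = generate_mixture_document()
print(theta.round(2), words)
```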
Document classification
• Even more realistic: topics may be organized in a hierarchy (not independent);
• Pick a path from root to leaf in a tree; each node is a topic; sample words from the mixture of topics along the path.
Dirichlet distribution (DD)
• Distribution over distribution vectors of dimension K: P(p; u) = (1/Z(u)) ∏_i p_i^(u_i − 1), where Z(u) is the normalizing constant;
• The parameters u_i act as a prior, playing the role of “previous observations”;
• The symmetric Dirichlet distribution assumes a uniform prior (u_i = u_j for all i, j).
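A quick check of these properties with SciPy (the parameter values are made up):

```python
import numpy as np
from scipy.stats import dirichlet

u = np.array([2.0, 3.0, 5.0])   # pseudo-counts: the "previous observations"
p = np.array([0.2, 0.3, 0.5])   # a point on the K=3 simplex

# Density P(p; u) = (1/Z(u)) * prod_i p_i^(u_i - 1)
print(dirichlet.pdf(p, u))

# Symmetric Dirichlet: all u_i equal, i.e. a uniform prior over components
rng = np.random.default_rng(0)
print(rng.dirichlet(np.full(3, 1.0), size=2))
```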
Latent Dirichlet Allocation (LDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a Dirichlet distribution;
• Pick topics according to this distribution and generate words according to the word distribution for each topic.
Latent Dirichlet Allocation (LDA)
[Plate diagram: DD hyperparameter → topic distribution → topics → words w; plates of size K (topics) and W (words).]
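A minimal sketch of the LDA generative process just described (sizes and hyperparameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 4, 12, 30                        # hypothetical sizes
alpha = np.full(K, 0.5)                          # Dirichlet (DD) hyperparameter
beta = rng.dirichlet(np.ones(V), size=K)         # per-topic word distributions

def generate_lda_document():
    theta = rng.dirichlet(alpha)                  # 1. topic mixture ~ Dirichlet
    zs = rng.choice(K, size=doc_len, p=theta)     # 2. a topic for each word
    words = np.array([rng.choice(V, p=beta[z]) for z in zs])  # 3. word ~ its topic
    return words

print(generate_lda_document())
```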
Chinese Restaurant Process (CRP)
[Animation across nine slides: customers 1 out of 9 through 9 out of 9 seat themselves at tables one at a time; caption on the final slide: “Data point (a distribution itself) sampled”.]
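A minimal CRP sampler matching the nine-customer animation: each arriving customer joins an occupied table with probability proportional to its occupancy, or opens a new table with probability proportional to the concentration parameter gamma (whose value here is arbitrary):

```python
import numpy as np

def crp_seating(n_customers, gamma, rng):
    """Seat customers one at a time: join an occupied table with probability
    proportional to its occupancy, or open a new table with prob. ~ gamma."""
    tables, assignments = [], []
    for _ in range(n_customers):
        weights = np.array(tables + [gamma], dtype=float)
        t = rng.choice(len(weights), p=weights / weights.sum())
        if t == len(tables):
            tables.append(1)                      # a new table is opened
        else:
            tables[t] += 1
        assignments.append(t)
    return assignments, tables

rng = np.random.default_rng(0)
print(crp_seating(9, gamma=1.0, rng=rng))
```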
Species Sampling Mixture
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a CRP prior;
• Pick topics according to this distribution and generate words according to the word distribution for each topic.
Species Sampling Mixture
[Plate diagram: CRP hyperparameter → topic distribution → topics → words w; plates of size K (topics) and W (words).]
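One way to sketch this CRP-prior mixture (an assumption about the setup, not the paper's exact construction): run the CRP over the words of a document, let tables play the role of topics, and draw a topic's word distribution lazily the first time its table is opened.

```python
import numpy as np

rng = np.random.default_rng(0)
V, doc_len, gamma = 12, 30, 1.0                   # hypothetical sizes
table_counts, table_topics = [], []               # occupancy and word distributions

def generate_crp_mixture_document():
    words = []
    for _ in range(doc_len):
        weights = np.array(table_counts + [gamma], dtype=float)
        t = rng.choice(len(weights), p=weights / weights.sum())
        if t == len(table_counts):                # new table => new topic
            table_counts.append(1)
            table_topics.append(rng.dirichlet(np.ones(V)))
        else:
            table_counts[t] += 1
        words.append(int(rng.choice(V, p=table_topics[t])))
    return words

print(generate_crp_mixture_document())
```

Unlike LDA, the number of topics is not fixed in advance: new tables (topics) keep appearing as more data arrives.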
Nested CRP
[Diagram built up across six slides (steps 1 through 6): a tree of restaurants, with a root-to-leaf path chosen by one CRP decision per level.]
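A sketch of the nested CRP: a tree of restaurants, descended one CRP decision per level (the depth and gamma values here are arbitrary):

```python
import numpy as np

def nested_crp_path(tree, depth, gamma, rng):
    """Descend the tree of restaurants, running one CRP choice per level.
    `tree` maps a path prefix (tuple) to its children's occupancy counts."""
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        weights = np.array(counts + [gamma], dtype=float)
        c = rng.choice(len(weights), p=weights / weights.sum())
        if c == len(counts):
            counts.append(1)                      # open a new restaurant below
        else:
            counts[c] += 1
        path += (c,)
    return path

rng = np.random.default_rng(0)
tree = {}
for _ in range(6):                                # six documents, as in the slides
    print(nested_crp_path(tree, depth=3, gamma=1.0, rng=rng))
```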
Hierarchical LDA (hLDA)
• Generative model of multiple-topic documents;
• Generate a mixture distribution on topics using a nested CRP prior;
• Pick topics according to this distribution and generate words according to the word distribution for each topic.
hLDA graphical model
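Putting the pieces together, a sketch of the hLDA generative process: a nested CRP chooses a root-to-leaf path, and each word mixes over the topics along that path. All sizes are hypothetical, and the level mixture is a plain Dirichlet here, which may not match the paper's exact choice.

```python
import numpy as np

rng = np.random.default_rng(0)
V, depth, doc_len, gamma = 12, 3, 30, 1.0         # hypothetical sizes
tree = {}            # nested-CRP state: path prefix -> child occupancy counts
node_topics = {}     # tree node (path prefix) -> that topic's word distribution

def nested_crp_path():
    """Choose a root-to-leaf path, one CRP decision per level."""
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        weights = np.array(counts + [gamma], dtype=float)
        c = rng.choice(len(weights), p=weights / weights.sum())
        if c == len(counts):
            counts.append(1)
        else:
            counts[c] += 1
        path += (c,)
    return path

def topic_at(node):
    """Lazily draw a word distribution for each tree node (topic)."""
    if node not in node_topics:
        node_topics[node] = rng.dirichlet(np.ones(V))
    return node_topics[node]

def generate_hlda_document():
    path = nested_crp_path()                      # 1. pick a path through the tree
    nodes = [path[:l] for l in range(depth + 1)]  # root () plus each prefix
    theta = rng.dirichlet(np.ones(depth + 1))     # 2. mixture over the path's levels
    levels = rng.choice(depth + 1, size=doc_len, p=theta)
    words = [int(rng.choice(V, p=topic_at(nodes[l]))) for l in levels]  # 3. emit
    return path, words

for _ in range(3):
    path, words = generate_hlda_document()
    print(path, words[:8])
```

Documents sharing a path prefix share the topics near the root, which is what gives the learned hierarchy its general-to-specific structure.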
Artificial data experiment
[Figure: 100 documents of 1,000 words each over a 25-term vocabulary; each vertical bar is a topic.]
CRP prior vs. Bayes Factors
Predicting the structure
NIPS abstracts
Comments
• Accommodates growing collections of data;
• Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for it;
• No mention of running time; perhaps inference takes a very long time.