text-classification using latent dirichlet allocation - intro graphical model lei li
Post on 20-Jan-2018
Text-classification using Latent Dirichlet Allocation - intro graphical model
Lei Li (leili@cs)
Outline
• Introduction
• Unigram model and mixture
• Text classification using LDA
• Experiments
• Conclusion
Text Classification
What class can you tell, given a doc?
• "… the New York Stock Exchange … America's Nasdaq … buy …" → finance
• "… bank, debt, loan, interest, billion, buy …" → finance
• "… Iraq war, weapon, army, AK-47, bomb …" → military
Why do DB guys care?
• These models could be adapted to model discrete random variables:
  – disk failures
  – user access patterns
  – social networks, tags
  – blogs
Document
• "Bag of words": no order on words
• d = (w1, w2, …, wN)
• wi takes one value in 1…V (1-of-V scheme)
• V: vocabulary size
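The bag-of-words representation above can be sketched in a few lines of Python; the sample sentence below is made up for illustration:

```python
from collections import Counter

# Bag of words: drop word order, keep only counts.
text = "the new york stock exchange and the nasdaq"
words = text.split()
bag = Counter(words)           # multiset of words, order discarded

# 1-of-V scheme: each word maps to one index in the vocabulary.
vocab = sorted(set(words))     # V distinct words
index = {w: i for i, w in enumerate(vocab)}
d = [index[w] for w in words]  # d = (w1, ..., wN) as vocabulary indices

print(bag["the"], len(vocab))  # → 2 7
```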
Modeling Document
• Unigram: simple multinomial distribution
• Mixture of unigrams
• LDA
• Others: PLSA, bigram
Unigram Model for Classification
• Y is the class label; d = {w1, w2, …, wN}
• Use Bayes rule: P(Y|d) = P(d|Y) P(Y) / P(d); compare P(d|Y=1) P(Y=1) with P(d|Y=0) P(Y=0)
• How to model the document given the class: P(d|Y) = Π_{i=1..N} P(wi|Y)
• P(wi|Y) ~ multinomial distribution, estimated as word frequency
[Graphical model: Y → w, plate over N words]
Unigram: example

P(w|Y)     bank    debt    interest  war     army    weapon
finance    0.2     0.15    0.1       0.0001  0.0001  0.0001
military   0.0001  0.0001  0.0001    0.1     0.15    0.2

P(Y)
finance    0.6
military   0.4

d = bank × 100, debt × 110, interest × 130, war × 1, army × 0, weapon × 0
P(finance|d) = ?  P(military|d) = ?
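The unigram classification on this example can be computed directly; a minimal sketch using the slide's numbers (log probabilities avoid underflow with exponents like 0.2^100):

```python
import math

# Parameter values taken from the slide's example tables.
p_w_given_y = {
    "finance":  {"bank": 0.2, "debt": 0.15, "interest": 0.1,
                 "war": 0.0001, "army": 0.0001, "weapon": 0.0001},
    "military": {"bank": 0.0001, "debt": 0.0001, "interest": 0.0001,
                 "war": 0.1, "army": 0.15, "weapon": 0.2},
}
p_y = {"finance": 0.6, "military": 0.4}
doc = {"bank": 100, "debt": 110, "interest": 130,
       "war": 1, "army": 0, "weapon": 0}

def log_joint(y):
    # log P(Y) + sum over words of count(w) * log P(w|Y)
    return math.log(p_y[y]) + sum(
        c * math.log(p_w_given_y[y][w]) for w, c in doc.items())

scores = {y: log_joint(y) for y in p_y}
# Normalize with log-sum-exp to get P(Y|d).
m = max(scores.values())
z = sum(math.exp(s - m) for s in scores.values())
posterior = {y: math.exp(s - m) / z for y, s in scores.items()}
print(posterior)  # finance dominates overwhelmingly
```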
Mixture of unigrams for classification
[Graphical model: Y → z → w, plate over N words]
• For each class, assume k topics
• Each topic represents a multinomial distribution over words
• Under each topic, each word is drawn from that topic's multinomial
Mixture of unigrams: example

d = bank × 100, debt × 110, interest × 130, war × 1, army × 0, weapon × 0
P(finance|d) = ?  P(military|d) = ?

P(Y)
finance    0.6
military   0.4

P(w|z,Y)        bank    debt    interest  war     army    weapon
finance, z=1    0.01    0.15    0.1       0.0001  0.0001  0.0001
finance, z=2    0.2     0.01    0.01      0.0001  0.0001  0.0001
military, z=1   0.0001  0.0001  0.0001    0.1     0.15    0.01
military, z=2   0.0001  0.0001  0.0001    0.01    0.01    0.2

P(z|Y)          z=1     z=2
finance         0.3     0.7
military        0.5     0.5
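In the mixture of unigrams, a single topic is drawn per document, so P(d|Y) = Σz P(z|Y) Πi P(wi|z,Y). A sketch of the scoring, using the slide's example numbers:

```python
import math

# Parameters from the slide's mixture example (two topics per class).
p_w = {  # P(w | z, Y)
    ("finance", 1):  {"bank": 0.01, "debt": 0.15, "interest": 0.1,
                      "war": 0.0001, "army": 0.0001, "weapon": 0.0001},
    ("finance", 2):  {"bank": 0.2, "debt": 0.01, "interest": 0.01,
                      "war": 0.0001, "army": 0.0001, "weapon": 0.0001},
    ("military", 1): {"bank": 0.0001, "debt": 0.0001, "interest": 0.0001,
                      "war": 0.1, "army": 0.15, "weapon": 0.01},
    ("military", 2): {"bank": 0.0001, "debt": 0.0001, "interest": 0.0001,
                      "war": 0.01, "army": 0.01, "weapon": 0.2},
}
p_z = {("finance", 1): 0.3, ("finance", 2): 0.7,
       ("military", 1): 0.5, ("military", 2): 0.5}
p_y = {"finance": 0.6, "military": 0.4}
doc = {"bank": 100, "debt": 110, "interest": 130,
       "war": 1, "army": 0, "weapon": 0}

def log_lik(y):
    # log P(d|Y) = log sum_z P(z|Y) * prod_w P(w|z,Y)^count, via log-sum-exp
    terms = []
    for z in (1, 2):
        t = math.log(p_z[(y, z)])
        t += sum(c * math.log(p_w[(y, z)][w]) for w, c in doc.items())
        terms.append(t)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

scores = {y: math.log(p_y[y]) + log_lik(y) for y in p_y}
predicted = max(scores, key=scores.get)
print(predicted)  # → finance
```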
Bayesian Network
• Given a DAG
• Nodes are random variables or parameters
• Arrows represent conditional probability dependencies
• Given probabilities on some nodes, there are algorithms to infer values for the other nodes
Latent Dirichlet Allocation
• Model the topic proportions θ with a Dirichlet distribution parameterized by α
• For the n-th term wn:
  – Model the n-th latent variable zn as a multinomial distribution according to θ
  – Model wn as a multinomial distribution according to zn and β
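The generative process above can be sketched in plain Python; the vocabulary, topic count, and all parameter values below are illustrative assumptions, not the paper's:

```python
import random

random.seed(0)
vocab = ["bank", "debt", "interest", "war", "army", "weapon"]
alpha = [1.0, 1.0]  # Dirichlet prior over 2 hypothetical topics
beta = [[0.3, 0.3, 0.3, 0.05, 0.025, 0.025],   # topic 1: finance-like
        [0.05, 0.025, 0.025, 0.3, 0.3, 0.3]]   # topic 2: military-like

def sample_dirichlet(a):
    # theta ~ Dirichlet(a), via normalized Gamma draws
    g = [random.gammavariate(ai, 1.0) for ai in a]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(p):
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

def generate_document(n_words):
    theta = sample_dirichlet(alpha)    # per-document topic mixture
    doc = []
    for _ in range(n_words):
        z = sample_categorical(theta)  # latent topic for this word
        w = sample_categorical(beta[z])  # word drawn from topic z
        doc.append(vocab[w])
    return doc

doc = generate_document(10)
print(doc)
```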
Variational inference for LDA
• Direct inference with LDA is HARD
• Approximate with a variational distribution
• Use a factorized distribution over variational parameters γ and Φ to approximate the posterior distribution of the latent variables θ and z
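Concretely, the factorized variational family used in Blei et al. (cited below) takes the form

```latex
q(\theta, z \mid \gamma, \phi) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)
```

where γ is a Dirichlet parameter and each φn is a per-word multinomial parameter; γ and φ are fit to make q close (in KL divergence) to the true posterior p(θ, z | w, α, β).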
Experiment
• Data set: Reuters-21578; 8681 training documents, 2966 test documents
• Classification task: “EARN” vs. “Non-EARN”
• For each document, learn LDA features and classify with them (discriminative)
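The feature-based setup above can be sketched as follows; the slides do not specify the classifier, so this uses a simple nearest-centroid rule over hypothetical per-document topic proportions (all feature values are made up for illustration):

```python
# Assume LDA has already produced per-document topic proportions (theta).
# Hypothetical 2-topic training features with EARN / Non-EARN labels:
train = [([0.9, 0.1], "EARN"), ([0.8, 0.2], "EARN"),
         ([0.2, 0.8], "Non-EARN"), ([0.1, 0.9], "Non-EARN")]

def centroid(label):
    # Mean feature vector of all training docs with this label.
    vecs = [v for v, y in train if y == label]
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]

cents = {y: centroid(y) for y in {"EARN", "Non-EARN"}}

def classify(theta):
    # Predict the label whose centroid is closest in squared distance.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(cents, key=lambda y: dist(theta, cents[y]))

print(classify([0.85, 0.15]))  # → EARN
```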
Result: most frequent words in each topic

Topic 1: bank, banks, debt, billion, foreign, dlrs, government, interest, loans
Topic 2: trade, japan, japanese, states, united, officials, reuter, told, government
Topic 3: shares, company, stock, dlrs, share, reuter, offer, common, pct
Topic 4: tonnes, mln, reuter, sugar, production, gold, wheat, nil, gulf
Classification Accuracy
[chart not preserved in transcript]

Comparison of Accuracy
[chart not preserved in transcript]
Take-Away Message
• LDA with few topics and little training data can produce relatively better results
• Bayesian networks are useful for modeling multiple random variables, and nice algorithms exist for them
• Potential uses of LDA:
  – disk failures
  – database access patterns
  – user preferences (collaborative filtering)
  – social networks (tags)
Reference
• Blei, D., Ng, A., Jordan, M.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003), 993–1022.