Text Mining Using LDA with Context
Christoph Kling, Steffen Staab
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Web and Internet Science Group · ECS · University of Southampton, UK
Text Mining Using LDA with Context 2/68Steffen Staab
Text Mining Documents
Documents are PDFs, emails, tweets, Flickr photo tags, CVs, ...
Documents consist of
▪ a bag of words
▪ metadata: author(s), timestamp, geolocation, publisher, booktitle, device, ...
Objective: cluster, categorize, and explain
[Figure: example topics: Chinese food (dimsum, duck, eggs, ...), Vegan food (vegan, tofu, ...), Breakfast (eggs, ham, ...)]
Latent Dirichlet Allocation (LDA)
Document-topic distributions
Topic-word distributions
K topics; M documents; each document m ∈ M has length N_m
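The two kinds of distributions can be illustrated with a minimal sketch of the standard LDA generative process. It assumes symmetric Dirichlet priors; the hyperparameters `alpha` and `beta`, the vocabulary size `V`, and the function name are illustrative choices, not taken from the slides.

```python
import numpy as np

def generate_lda_corpus(K, M, V, doc_lengths, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the standard LDA generative process."""
    rng = np.random.default_rng(seed)
    # Topic-word distributions: one distribution over V words per topic.
    phi = rng.dirichlet(np.full(V, beta), size=K)
    # Document-topic distributions: one distribution over K topics per document.
    theta = rng.dirichlet(np.full(K, alpha), size=M)
    docs = []
    for m in range(M):
        # Draw a topic for each of the N_m word positions,
        # then draw each word from the chosen topic's word distribution.
        z = rng.choice(K, size=doc_lengths[m], p=theta[m])
        words = np.array([rng.choice(V, p=phi[k]) for k in z], dtype=int)
        docs.append(words)
    return theta, phi, docs

theta, phi, docs = generate_lda_corpus(K=3, M=4, V=20, doc_lengths=[5, 8, 6, 7])
```

Inference in LDA runs in the opposite direction: given only the words in `docs`, it estimates `theta` and `phi`.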
Use Metadata to Help Topic Prediction
Improve topic detection
→ Morning times may help to improve the breakfast topic
Describe dependencies: metadata ↔ topics
→ The breakfast topic happens during morning hours
Usage
▪ Autocompletion
→ From words to words
▪ Prediction of search queries
→ From metadata to words
→ From words to metadata
[Figure: example topics: Chinese food (dimsum, duck, eggs, ...), Vegan food (vegan, tofu, ...), Breakfast (eggs, ham, ...)]
Structures of Metadata Spaces
Nominal, ordinal, cyclic, spherical, networked
[Figure: examples of each structure, including the labels Nejdl, Staab, Kling]
Challenges for Using Metadata for Text Mining
Generalizing the Text Mining Model
▪ Creating a special text mining model for every dataset with its own kind of metadata spaces is impractical
→ we need flexible models!
Efficiency of the Text Mining Model
▪ Rich metadata → complex models → complex inference, slow convergence of samplers
→ analysis of big datasets impossible
Explaining the Result
▪ Importance of metadata
→ learn how to weight metadata
→ exclude irrelevant metadata (improves efficiency!)
▪ Complex dependencies & complex probability functions
→ Learned parameters incomprehensible
→ Reduced usefulness for data analysis / visualisation
→ No sanity checks on parameters
Topic Models for Arbitrary Metadata
Predict document-topic distributions using metadata
→ Gaussian Process Regression Topic Model (Agovic & Banerjee, 2012)
→ Dirichlet-Multinomial Regression Topic Model (Mimno & McCallum, 2012)
→ Structural Topic Model (logistic normal regression) (Roberts et al., 2013)
Regression input: metadata
Regression output: topic distribution
Topic Models for Arbitrary Metadata
Three regression variants map metadata to document-topic distributions:
▪ Dirichlet-multinomial regression
▪ Gaussian process regression
▪ Logistic normal regression
[Figures: one graphical model per variant, each labeled with Metadata and Document-topic distributions]
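For the Dirichlet-multinomial regression variant, the core idea can be sketched in a few lines: the Dirichlet prior over a document's topic distribution is a log-linear function of the document's metadata features. The feature encoding (a bias term plus a hypothetical `is_morning` indicator), the weight values, and the function name are assumptions made for illustration only.

```python
import numpy as np

def dmr_dirichlet_prior(X, lam):
    """Per-document Dirichlet parameters alpha[d, k] = exp(x_d . lambda_k).

    X:   (D, F) metadata features, one row per document
    lam: (F, K) regression weights, one column per topic
    """
    return np.exp(X @ lam)

# Two documents with features [bias, is_morning]; two topics.
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
lam = np.array([[0.0, 0.0],    # bias weights per topic
                [0.0, 2.0]])   # a morning timestamp raises the prior of topic 1
alpha = dmr_dirichlet_prior(X, lam)
# Prior mean of the document-topic distribution implied by alpha.
prior_mean = alpha / alpha.sum(axis=1, keepdims=True)
```

The weights in `lam` live on a log scale, which is one reason such regression parameters are harder to read than probabilities.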
Topic Models for Arbitrary Metadata
Alternating inference:
▪ Estimate topics
▪ Estimate regression model
▪ Use prediction for re-estimating topics
▪ Re-estimate regression model with new topics
▪ ...
→ slow convergence
Topic Models for Arbitrary Metadata
▪ Applicable to a wide range of metadata!
▪ Estimation of regression parameters is relatively expensive
▪ Learned parameters have no natural interpretation
▪ Alternating process of parameter estimation is expensive
Topic Models for Arbitrary Metadata
Dirichlet-multinomial and logistic-normal regression do not support complex input data (e.g. geographical data, temporal cycles, …)
Gaussian process regression topic models are very powerful with the right kernel function
...but require expert knowledge for kernel selection and efficient inference!
Hierarchical Multi-Dirichlet Process Topic Models: The Idea
Topic Prediction
[Figure: topic probability over metadata (e.g. time) for documents, e.g. emails]
Dirichlet-Multinomial Regression
[Figure: topic probability over metadata (e.g. time)]
Gaussian Process Regression
[Figure: topic probability over metadata (e.g. time)]
Cluster-Based Prediction
[Figures: topic probability over metadata (e.g. time), with several separate topic-probability panels]
Idea
Two-step model:
1) Cluster similar documents
2) Learn topics for clusters and documents simultaneously
▪ Learn topic distributions of document clusters
▪ Use cluster-topic distributions for topic prediction
Performance, Complex Metadata
Cluster documents for each kind of metadata
+ nominal, ordinal, cyclic, spherical data
+ any data which can be clustered!
Performance, Complex Metadata
Metadata clusters are associated with topics
[Figure: metadata clusters with the example topic labels German, Beer, Party]
Mixture of Metadata Predictions
The topic prediction for a single document is a mixture of the predictions of its metadata clusters
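The mixture can be sketched as a plain convex combination of cluster-topic distributions. This is a simplification: the slides describe a model built on Dirichlet processes, and all names, cluster assignments, and numbers below are made up for illustration.

```python
import numpy as np

def predict_topics(cluster_topic, doc_clusters, weights):
    """Mix the topic distributions of a document's metadata clusters.

    cluster_topic: dict metadata name -> array (n_clusters, K), rows sum to 1
    doc_clusters:  dict metadata name -> cluster index of this document
    weights:       dict metadata name -> non-negative mixing weight
    """
    total = sum(weights.values())
    K = next(iter(cluster_topic.values())).shape[1]
    prediction = np.zeros(K)
    for name, w in weights.items():
        # Each metadata contributes the topic distribution of the cluster
        # the document falls into, scaled by its normalized weight.
        prediction += (w / total) * cluster_topic[name][doc_clusters[name]]
    return prediction

cluster_topic = {
    "time": np.array([[0.8, 0.2], [0.3, 0.7]]),
    "location": np.array([[0.5, 0.5], [0.1, 0.9]]),
}
pred = predict_topics(cluster_topic,
                      doc_clusters={"time": 0, "location": 1},
                      weights={"time": 0.75, "location": 0.25})
```

Because every row of `cluster_topic` sums to one and the weights are normalized, `pred` is again a probability distribution over topics.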
Smoothing of HMDP
Cluster-Based Prediction vs. Outliers and Noisy Data
[Figure: topic probability over metadata (e.g. time)]
Adjacency Smoothing
Naive approach: Smoothed value of a cluster is the mean of the cluster and its adjacent clusters
Repeat n times
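The naive approach can be sketched as follows. The adjacency structure (four clusters on a cycle) and the values are hypothetical; the function works for scalar values as well as for whole topic distributions per cluster.

```python
import numpy as np

def adjacency_smooth(values, neighbors, n_iter=1):
    """Replace each cluster value by the mean of itself and its adjacent clusters."""
    values = np.asarray(values, dtype=float)
    for _ in range(n_iter):
        # Build the new values from the old ones so all clusters update together.
        values = np.array([
            np.mean([values[c]] + [values[a] for a in neighbors[c]], axis=0)
            for c in range(len(values))
        ])
    return values

# Four clusters on a cycle (e.g. quarters of a day): each has two neighbors.
cyclic_neighbors = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
noisy = [0.0, 1.0, 0.0, 0.0]   # one outlier cluster
smoothed = adjacency_smooth(noisy, cyclic_neighbors, n_iter=1)
```

Only the `neighbors` mapping changes between metadata structures: a chain for ordinal data, a ring for cyclic data, a graph for networked data.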
Smoothing topics associated with metadata clusters
Documents receive topics from their own and neighboring metadata clusters
Performance, Complex Metadata
Smooth topics associated with metadata clusters
[Figure: metadata cluster structures: nominal, ordinal, cyclic, spherical, networked]
Smoothing
Smoothing strength is learned during inference
▪ Similar clusters → stronger smoothing
▪ Dissimilar clusters → softer smoothing
Alternatively, the smoothing strength can be predefined by the user
Metadata Weighting in HMDPs
Feature Weighting
One variable η governs the influence of a metadata cluster on documents
If η < threshold, ignore the variable.
Metadata Weighting
Importance of metadata is learned during inference, answering the question:
What percentage of the topics is explained by a given metadata? (e.g. time, geographical coordinates, ...)
→ Interpretable parameter!
Metadata with a low weight can be removed during inference
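A minimal sketch of such pruning, assuming the weights are available as fractions per metadata. The weight values, the threshold, and the function name are invented for illustration and are not results from the talk.

```python
def prune_metadata(weights, threshold=0.05):
    """Drop metadata whose weight is below the threshold and renormalize the rest."""
    kept = {name: w for name, w in weights.items() if w >= threshold}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing survived the threshold
    return {name: w / total for name, w in kept.items()}

# Made-up weights: share of topic assignments explained by each metadata.
weights = {"time": 0.60, "location": 0.38, "device": 0.02}
pruned = prune_metadata(weights)
```

Dropping a low-weight metadata means its clusters no longer need to be updated, which is where the efficiency gain comes from.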
Example Application
Dataset
Linux Kernel Mailing List
3,400,000 emails with timestamps and mailing list ID
Metadata: timeline, yearly cycle, weekly cycle, daily cycle, mailing list
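One way to derive the cyclic metadata from a timestamp is sketched below. The equal-width binning into clusters and both function names are assumptions; the slides do not specify how the clusters were formed.

```python
import calendar
from datetime import datetime, timezone

def cyclic_positions(ts):
    """Map a timestamp to positions in [0, 1) on the yearly, weekly and daily cycles."""
    day_fraction = (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400
    days_in_year = 366 if calendar.isleap(ts.year) else 365
    return {
        "yearly": (ts.timetuple().tm_yday - 1 + day_fraction) / days_in_year,
        "weekly": (ts.weekday() + day_fraction) / 7,   # Monday is day 0
        "daily": day_fraction,
    }

def cluster_index(position, n_clusters):
    """Assign a cyclic position to one of n equally sized bins."""
    return int(position * n_clusters) % n_clusters

# A Monday at 06:00 UTC.
pos = cyclic_positions(datetime(2015, 1, 5, 6, 0, tzinfo=timezone.utc))
hour_cluster = cluster_index(pos["daily"], 24)
weekday_cluster = cluster_index(pos["weekly"], 7)
```

On a cycle the last bin is adjacent to the first, so 23:00 and 00:00 are neighbors even though their positions are far apart numerically.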
Topics
Professional topics: [figure not preserved]
Hobbyist topics: [figure not preserved]
Metadata weighting: [figure not preserved; annotation: can be removed during inference]
Efficient Inference in HMDP
Hierarchical Multi-Dirichlet Process Topic Model (HMDP)
[Figure: graphical model labeled with Metadata, Cluster-topic distributions, Document-topic distributions]
Inference: nearly completely collapsed inference!
We only need to learn
▪ Global topic distribution
▪ Topic assignments to words
▪ Dirichlet parameters
Approximations:
▪ Variational
▪ Practical
▪ Stochastic
→ low memory consumption
→ online inference
Parameters of HMDP
▪ Cluster-topic distributions: How many documents of a cluster contain topic x?
▪ Metadata weights: How many of the topics of documents are explained by metadata x?
▪ Dirichlet process scaling parameters: How many pseudo-counts do we add to the topic distributions?
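The pseudo-count reading of a scaling parameter can be sketched with the usual Dirichlet posterior mean: observed topic counts are combined with `scale` pseudo-counts distributed according to a base distribution. The counts, the base distribution, and the function name are hypothetical.

```python
import numpy as np

def smoothed_topic_distribution(counts, base, scale):
    """Posterior mean when `scale` pseudo-counts distributed like `base` are added."""
    counts = np.asarray(counts, dtype=float)
    base = np.asarray(base, dtype=float)
    return (counts + scale * base) / (counts.sum() + scale)

doc_counts = [8, 2, 0]           # topic assignments observed in one document
cluster_prior = [0.2, 0.3, 0.5]  # topic distribution of the document's cluster
weak = smoothed_topic_distribution(doc_counts, cluster_prior, scale=1.0)
strong = smoothed_topic_distribution(doc_counts, cluster_prior, scale=90.0)
```

With a small scale the document's own counts dominate; with a large scale the estimate is pulled towards the cluster's distribution.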
Properties of HMDP
▪ Interpretable parameters
▪ Simultaneous inference of topics and metadata-topic dependencies
▪ Efficient online inference
Comparison of Topic Models for Arbitrary Metadata
Comparison
Gaussian Process Topic Model
The “perfect” model:
▪ Can cope with arbitrary metadata
▪ Models dependencies between metadata
▪ Parameter learning is very expensive
▪ Kernel selection and inference require expert knowledge
▪ Parameters of Gaussian processes are hard to interpret
Comparison
Multinomial Regression Topic Model
The “straightforward” model:
▪ Can cope with many kinds of metadata
▪ Parameter learning is cheaper than for Gaussian processes but still expensive (due to alternating inference and repeated distance calculations)
▪ Cannot cope with complex metadata (e.g. geographical, cyclic, ...)
▪ Does not model dependencies between metadata
▪ Regression weights of Dirichlet-multinomial regression are hard to interpret
Comparison
Hierarchical Multi-Dirichlet Process Topic Model
The “fast” model:
▪ Can cope with arbitrary metadata
▪ Fast inference (simultaneously for topics and topic predictions)
▪ All parameters have natural interpretations as probabilities or pseudo-counts
▪ Requires a (simple) pre-clustering of documents
▪ Does not model dependencies between metadata
THANK YOU FOR YOUR ATTENTION!