visualizing topic models generated using lda

27
Visualizing Topic Models Generated Using LDA Ashwinkumar Ganesan, Kiante Brantley, Shimei Pan & Jian Chen

Upload: others

Post on 29-Apr-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Visualizing Topic Models Generated Using LDA

Visualizing Topic Models Generated Using LDAAshwinkumar Ganesan, Kiante Brantley,

Shimei Pan & Jian Chen

Page 2: Visualizing Topic Models Generated Using LDA

What is Topic Modeling ?

• Find hidden topics

• Process Large Sets of Documents

• Group words & information together

• Understand topics & documents

1

Page 3: Visualizing Topic Models Generated Using LDA

DRACULA

MOBY DICK

Page 4: Visualizing Topic Models Generated Using LDA

• Probabilistic method to find hidden topics

• Word distributions

• Topic distributions

• Link topic, words & documentsTopic Proportions

Topic AssignmentWord distribution

Topics

5

Page 5: Visualizing Topic Models Generated Using LDA

• Hidden Parameters

• Number of Topics

• Number of Iterations

• Naïve or Knowledgeable Users

• Understanding Documents

Page 6: Visualizing Topic Models Generated Using LDA
Page 7: Visualizing Topic Models Generated Using LDA

• Check Word Distributions

• Isolate useful topics

Topics

• Remove unnecessary topics

• Filter the documents based on Rank or Probability

Documents

• Combine Similar Documents

• Refine topics and provide feedback for a new model

Cluster

Annotate Topics

Page 8: Visualizing Topic Models Generated Using LDA

SERENDIP 3

TIARA 4

Page 9: Visualizing Topic Models Generated Using LDA

• Document Exploration without modeling User Feedback

• Hyper parameters that need configuration

• Topics Generated are difficult to understand

• Large Document Corpus require Multiple Views / Scrolling

• Real Time computation of model is difficult

Page 10: Visualizing Topic Models Generated Using LDA

• Visualizing Topics in the document corpus

• Topic Document Relations

• Filtering Documents

• Performing Set Operations

• Clustering Topics & Documents

• Topic Annotations

Page 11: Visualizing Topic Models Generated Using LDA

JSON

FORMAT

Extract Data

Perform LDA

Transform to JSON

Extract JSON

Render Design Using D3

BACKEND FRONTEND

Page 12: Visualizing Topic Models Generated Using LDA

Topic – Document Relations

Topic Information Individual Filter

Word to document search

Page 13: Visualizing Topic Models Generated Using LDA

• Shows the Top 10 Words• Rectangular sizes show

which word is more probable

Page 14: Visualizing Topic Models Generated Using LDA

Document IDs Topics Axis Filter

Page 15: Visualizing Topic Models Generated Using LDA

Multiple Filters

Page 16: Visualizing Topic Models Generated Using LDA

Key word Searching

Page 17: Visualizing Topic Models Generated Using LDA

• Using TF-IDF as a measure to clean the data• There is a preset threshold for eliminating the

words

• Word Probabilities:

𝑃 𝑤𝑥, 𝑑𝑦 ='!"#

$

'%"#

$

𝑃 𝑤𝑥, 𝑡𝑖 ∗ 𝑃(𝑡𝑖, 𝑑𝑦)

Page 18: Visualizing Topic Models Generated Using LDA

• There are 5 participants in the survey

• The participants did not have any formal training on the tool

• There is a brief introduction given to the participants at the start of the survey

• Some participants understand topic modeling

Page 19: Visualizing Topic Models Generated Using LDA

Survey Questionnaire

• Overview

• Topics

• Filtering

• Keyword Search

Page 20: Visualizing Topic Models Generated Using LDA

• Tasks Execution questionsHow many documents are represented in this visual?

• Understanding questionsDoes the visual have many things on it?

• Reasoning questionsWhich is the least important topic? (Important means highly ranked topic)

• Usability questionsAre the rankings on the axis visible / readable?

Page 21: Visualizing Topic Models Generated Using LDA

• Eliminate Documents• Connect topics and parallel coordinates• Combine Searching• Keep & Exclude can be connected to searching• Creating a reset button• Create probability mode & Ranking mode• Topic – Rank group• Filtering & Selection memory

Page 22: Visualizing Topic Models Generated Using LDA
Page 23: Visualizing Topic Models Generated Using LDA

[1]

[3]

[4]

[5][6]

Page 24: Visualizing Topic Models Generated Using LDA
Page 25: Visualizing Topic Models Generated Using LDA
Page 26: Visualizing Topic Models Generated Using LDA
Page 27: Visualizing Topic Models Generated Using LDA

• Ambiguity in Questions

• Imprecise human movements

• Filtering issues

• Remote tests

• Adding Ease of Use questions

• Adding a purpose section

• Better Example can be provided

• Data Selection