Vigilare: Watch for public opinions on Reddit
Ovunc Kocabas
Motivation
• Helping public health policymakers make better decisions
• Analyze public sentiment on Reddit
• Consulting project with Epidemico:
• Understanding the public’s sentiment on opioid drugs
• 6 out of 10 drug overdoses are related to opioids
Dataset
• 225 GB of raw text data (Jan. 2015 - May 2016)
• 70K posts mentioning any of 13 opioid drugs
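A minimal sketch of how posts mentioning a drug list might be pulled out of raw text. The slides do not list the 13 drugs, so the names below are illustrative placeholders, and `find_drug_mentions` is a hypothetical helper, not the project's actual code.

```python
import re

# Illustrative subset only; the slides do not name the 13 drugs.
DRUG_NAMES = ["oxycodone", "hydrocodone", "fentanyl"]

# One case-insensitive pattern with word boundaries, so a drug name is
# matched as a whole word rather than as part of a longer word.
DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in DRUG_NAMES) + r")\b",
    re.IGNORECASE,
)

def find_drug_mentions(text):
    """Return the sorted, lowercased drug names mentioned in a post."""
    return sorted({m.group(1).lower() for m in DRUG_PATTERN.finditer(text)})

posts = [
    "Switched from Oxycodone to hydrocodone last month.",
    "Anyone watching the game tonight?",
]

# Keep only posts that mention at least one drug from the list.
matching = [p for p in posts if find_drug_mentions(p)]
```

A real filter would likely also need brand names, slang, and misspellings, which a fixed keyword list does not cover.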
Sentiment Analysis
• Can we predict four sentiments?
• Positive, Negative, Neutral, Mixed
• Better resolution of population segments
• How to identify sentiments in multi-topic context?
Demo
http://www.vigilare-sentiment.me
Training Data
• 100 labeled examples
• Ask patients!
• Ratings and comments about drugs (9K comments)
• Train for: positive, negative, neutral
• What to do with mixed sentiment?
• Binary classifiers unite!
[Figure: drug ratings from 5 down to 1, mapped to positive (+), neutral (o), and negative (-) labels]
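One way patient ratings could be turned into training labels is sketched below. The thresholds (4-5 positive, 3 neutral, 1-2 negative) are an assumption, since the slides do not state the exact mapping, and `rating_to_label` is a hypothetical name.

```python
def rating_to_label(rating):
    """Map an integer 1-5 rating to a sentiment label.

    The cut-offs here are assumed for illustration only.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    if rating > 3:
        return "positive"
    if rating < 3:
        return "negative"
    return "neutral"

# Hypothetical (rating, comment) pairs standing in for patient reviews.
reviews = [(5, "Works great"), (3, "Not sure yet"), (1, "Awful side effects")]
labeled = [(text, rating_to_label(rating)) for rating, text in reviews]
```

Note that ratings alone cannot produce a "mixed" label, which is why the slides combine two binary classifiers for that class.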
Linear SVM
• How to get mixed sentiment?
• 2 classifiers
• Positive or Neutral
• Negative or Neutral
• Unbalanced dataset
• Only 1K neutral comments
• SMOTE neutrals!
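SMOTE oversamples a minority class by creating synthetic points between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea with the standard library only; it is a simplified version for illustration, not the implementation used in the project, and `smote_sketch` and the toy feature vectors are hypothetical.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Assumes `minority` holds at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy 2-D feature vectors standing in for the scarce neutral comments.
neutral = [(0.0, 1.0), (0.2, 0.8), (0.1, 1.1), (0.3, 0.9)]
new_points = smote_sketch(neutral, n_new=6)
```

In practice a library implementation such as `SMOTE` from imbalanced-learn would be the more usual choice; whether the project used it is not stated in the slides.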
[Figure: the four sentiment classes: Mixed, Positive, Negative, Neutral]
Classifier 1    Classifier 2    Prediction
+               -               Mixed
0               -               Negative
+               0               Positive
0               0               Neutral
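The decision table above can be sketched as a small lookup. This assumes Classifier 1 is the positive-or-neutral model (emitting "+" or "0") and Classifier 2 is the negative-or-neutral model (emitting "-" or "0"); `combine_predictions` is a hypothetical name.

```python
# Maps (Classifier 1 output, Classifier 2 output) to the final sentiment.
DECISION_TABLE = {
    ("+", "-"): "Mixed",
    ("0", "-"): "Negative",
    ("+", "0"): "Positive",
    ("0", "0"): "Neutral",
}

def combine_predictions(pos_clf_label, neg_clf_label):
    """Combine the two binary outputs into one of four sentiments.

    pos_clf_label: "+" or "0" from the positive-or-neutral classifier.
    neg_clf_label: "-" or "0" from the negative-or-neutral classifier.
    """
    return DECISION_TABLE[(pos_clf_label, neg_clf_label)]

# A post flagged by both classifiers is treated as mixed sentiment.
example = combine_predictions("+", "-")
```

Keeping the rule as a table makes it clear that "Mixed" is never predicted directly; it only arises when both binary classifiers fire on the same post.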
Linear SVM Performance
Accuracy Score: 71% F1-Score: 0.73
Accuracy Score: 73% F1-Score: 0.74
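For reference, accuracy and F1 for a binary classifier can be computed as sketched below. The labels are made-up toy data, not the project's results, and the slides do not say how F1 was averaged, so the single-positive-class definition here is an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 computed for the
    class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy labels: 1 = sentiment present, 0 = neutral.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

On an unbalanced dataset, F1 is usually the more informative of the two numbers, because accuracy can look good when a model mostly predicts the majority class.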
Four Sentiment Classifier
• Voted binary SVM classifier
• Accuracy: 34%
• Naive Bayes
• Off-the-shelf classifier
• Accuracy: 27.5%
Future Directions
• Improve performance of the classifier
• More labeled data
• Different approach: Word2vec
• Integrate into Epidemico’s surveillance tool
• Product-switching
• Product evaluation
About me
MS & PhD, Electrical and Computer Engineering
Backup slides
Input Page
Output Page - I
Output Page - II
Output Page - III
Process
[Figure: process flow diagram with the labels Topics, User input, SQL Database, Query Data, Reddit posts, Data Processing, Natural Language Processing, Sentiments, Output Result, Data Visualization]