Vigilare: Watch for public opinions on Reddit Ovunc Kocabas

Posted on 15-Apr-2017

Page 1: Ovunc Kocabas

Vigilare: Watch for public opinions on Reddit

Ovunc Kocabas

Page 2: Ovunc Kocabas

Motivation

• Helping public health policymakers make better decisions

• Analyze public sentiment on Reddit

• Consulting project with Epidemico:

• Understanding the public’s sentiment toward opioid drugs

• 6 out of 10 drug overdoses are related to opioids

Page 3: Ovunc Kocabas

Dataset

• 225 GB of raw text data (Jan. 2015 - May 2016)

• 70K posts mentioning 13 opioid drugs
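
Narrowing the raw text down to posts that mention a drug can be sketched as a keyword filter. This is a minimal illustration, not the project's actual pipeline: the three names in `OPIOID_TERMS` are a hypothetical stand-in (the slides do not list the 13 drugs), and `mentioned_drugs` and `filter_posts` are made-up helper names.

```python
import re

# Hypothetical subset of drug names; the slides do not list the 13 opioids.
OPIOID_TERMS = ["oxycodone", "hydrocodone", "fentanyl"]

# One case-insensitive pattern with word boundaries around each drug name.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, OPIOID_TERMS)) + r")\b",
    re.IGNORECASE,
)


def mentioned_drugs(text):
    """Return the sorted, lower-cased drug names found in one post."""
    return sorted({m.group(1).lower() for m in PATTERN.finditer(text)})


def filter_posts(posts):
    """Keep only the posts that mention at least one tracked drug."""
    return [p for p in posts if PATTERN.search(p)]
```

For example, `filter_posts(["I like cats", "hydrocodone made me dizzy"])` keeps only the second post.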

Page 4: Ovunc Kocabas

Sentiment Analysis

• Can we predict four sentiments?

• Positive, Negative, Neutral, Mixed

• Better resolution of population segments

• How to identify sentiments in multi-topic context?

Page 5: Ovunc Kocabas

Demo

http://www.vigilare-sentiment.me

Page 6: Ovunc Kocabas

Training Data

• 100 labeled examples

• Ask patients!

• Ratings and comments about drugs (9K comments)

• Train for: positive, negative, neutral

• What to do with mixed sentiment?

• Binary classifiers unite!

[Figure: patient ratings from 5 to 1 mapped to positive (+), neutral (o), and negative (-) labels]

Page 7: Ovunc Kocabas

Linear SVM

• How to get mixed sentiment?

• 2 classifiers

• Positive or Neutral

• Negative or Neutral

• Unbalanced dataset

• Only 1K neutral comments

• SMOTE neutrals!
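
The SMOTE step above can be sketched in plain Python: each synthetic neutral sample is placed at a random point on the line segment between an existing neutral sample and one of its nearest neutral neighbours. This is a simplified illustration under assumptions: samples are taken to be dense numeric feature vectors, and the function name `smote` and its parameters `n_new`, `k`, and `seed` are hypothetical. The slides do not say which implementation was used.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from the minority class.

    Each new point is interpolated between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

In practice a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be preferred over a hand-written version.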

[Figure labels: Positive, Negative, Mixed, Neutral]

Classifier 1    Classifier 2    Prediction
+               -               Mixed
0               -               Negative
+               0               Positive
0               0               Neutral
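
The combination table above translates directly into a small function. A minimal sketch, assuming the first classifier emits "+" or "0" (positive vs. neutral) and the second emits "-" or "0" (negative vs. neutral); the function name `combine` is hypothetical.

```python
def combine(clf1, clf2):
    """Map the two binary outputs to one of four sentiments.

    clf1: "+" or "0" from the positive-vs-neutral classifier.
    clf2: "-" or "0" from the negative-vs-neutral classifier.
    """
    positive = clf1 == "+"
    negative = clf2 == "-"
    if positive and negative:
        return "Mixed"
    if negative:
        return "Negative"
    if positive:
        return "Positive"
    return "Neutral"
```

A post is labeled Mixed only when both classifiers fire, which is how four sentiments are obtained from training data that carries only three labels.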

Page 8: Ovunc Kocabas

Linear SVM Performance

Accuracy Score: 71% F1-Score: 0.73

Accuracy Score: 73% F1-Score: 0.74
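
For reference, accuracy and F1 for a binary classifier can be computed as below. This is a generic sketch with made-up labels; the slides do not state which F1 variant (binary, macro, or weighted) was reported, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```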

Page 9: Ovunc Kocabas

Four Sentiment Classifier

• Voted binary SVM classifier

• Accuracy: 34%

• Naive Bayes

• Off-the-shelf classifier

• Accuracy: 27.5%

Page 10: Ovunc Kocabas

Future Directions

• Improve performance of the classifier

• More labeled data

• Different approach: Word2vec

• Integrate into Epidemico’s surveillance tool

• Product-switching

• Product evaluation

Page 11: Ovunc Kocabas

About me

MS & PhD

Electrical and Computer Engineering

Page 12: Ovunc Kocabas

Backup slides

Page 13: Ovunc Kocabas

Input Page

Page 14: Ovunc Kocabas

Output Page - I

Page 15: Ovunc Kocabas

Output Page - II

Page 16: Ovunc Kocabas

Output Page - III

Page 17: Ovunc Kocabas

Process

[Flow diagram; labels in order: Topics, User input, SQL Database, Query Data, Reddit posts, Data Processing, Natural Language Processing, Sentiments, Output Result, Data Visualization]