
Introduction Methods Results Discussion

Sentiment Knowledge Discovery in Twitter Streaming Data

Albert Bifet and Eibe Frank (2010)

Angermüller Christof

LMU, Lehr- und Forschungseinheit für Datenbanksysteme

November 14, 2012

Angermüller Christof Sentiment Knowledge Discovery in Twitter Streams 1/36


1 Introduction

2 Methods

3 Results

4 Discussion


Twitter

Social networking - microblogging service

Tweets

Text message of up to 140 characters

@runtastic: user name

#iPhone: subject or category

RT I like my iPhone: retweet (repetition of a tweet)


Twitter: Tweets per day


Twitter: Statistics

Statistics

More than 140 million users

More than 400 million tweets per day

Exponential growth

Makes Twitter attractive for

Companies

Politicians

Data mining


Sentiment analysis

Definition (Sentiment analysis)

Given a tweet

Predict its sentiment

positive, (neutral), negative

Twitter sentiment analysis

sentiment140.com

tweetfeel.com

tweettone.com


Challenges learning from Twitter streams

Twitter API provides access to all tweets

Incremental learning is required instead of batch learning

1 Time efficiency: samples arrive at a high rate

2 Memory efficiency: infinite training set

3 Concept drift: sentiments change dynamically

4 Evaluation: concept drift and unbalanced classes

2 Methods

Classification

Problem description

Given: training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$

$x^{(i)}$: data (unigrams, bigrams)

$y^{(i)}$: class ($y^{(i)} = +1$: positive, $y^{(i)} = -1$: negative)

Wanted: hypothesis $h_\theta(x) \to y$ that minimizes some cost function $J(\theta)$

Models for sentiment classification

Naïve Bayes

SVM

Hoeffding tree

Logistic regression

...
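The unigram/bigram representation of $x^{(i)}$ mentioned above can be sketched as follows. This is only an illustration: the helper names `ngrams` and `features` and the whitespace tokenization are assumptions, not taken from the slides.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    """Hypothetical feature extractor: lower-cased unigrams plus bigrams."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    return ngrams(tokens, 1) + ngrams(tokens, 2)


example = features("I love my iPhone")
```

Each distinct n-gram would then correspond to one component of the feature vector $x$.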


Naïve Bayes

Prediction

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)} \propto P(y) \prod_{j=1}^{n} P(w_j \mid y)^{x_j}$$

$$h_\theta(x) = \arg\max_{y'} P(y' \mid x)$$

Training

$P(w_j \mid y)$: probability of word $w_j$ in a tweet with sentiment $y$

$P(y)$: prior probability of sentiment $y$

Both can be easily approximated by relative frequencies

Count the occurrences of word $w_j$ in tweets with sentiment $y$

⇒ Simple incremental implementation
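A minimal sketch of such a count-based incremental Naïve Bayes learner is given below. The class name `IncrementalNaiveBayes` and the Laplace smoothing are assumptions added for illustration (the slides only mention relative frequencies); log-probabilities are used to avoid underflow.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial Naive Bayes that only stores counts."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing              # assumption: Laplace smoothing
        self.class_counts = Counter()           # tweets seen per sentiment y
        self.word_counts = defaultdict(Counter) # count of word w_j per sentiment y
        self.total_words = Counter()            # total words seen per sentiment y
        self.vocabulary = set()

    def update(self, words, label):
        """Incremental training step: just increment the counts."""
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocabulary.add(w)

    def predict(self, words):
        """Return argmax_y of log P(y) + sum_j log P(w_j | y)."""
        n = sum(self.class_counts.values())
        v = max(len(self.vocabulary), 1)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n)
            for w in words:
                num = self.word_counts[label][w] + self.smoothing
                den = self.total_words[label] + self.smoothing * v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = IncrementalNaiveBayes()
nb.update(["love", "iphone"], +1)
nb.update(["iphone", "broken"], -1)
```

Because `update` touches only counters, each new labeled tweet is processed in time proportional to its length.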


Batch Gradient Descent

Idea

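Minimize cost function $J(\theta) = \sum_{i=1}^{m} l(x^{(i)}, y^{(i)}, \theta)$

Move in the direction of steepest descent $-\nabla J(\theta)$

Algorithm

Initialize $\theta_0$ randomly

Loop until convergence:

$$\theta_{t+1} = \theta_t - \alpha \nabla J(\theta_t) = \theta_t - \alpha \sum_{i=1}^{m} \nabla l(x^{(i)}, y^{(i)}, \theta_t)$$

[Figure: cost function $J$ and its gradient $dJ$]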
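The batch update can be sketched in a few lines. The squared loss, the toy data, the zero initialization, and the fixed number of steps (instead of a convergence test) are all assumptions chosen to keep the example small; they are not from the slides.

```python
def squared_loss_grad(x, y, theta):
    """Gradient of l = 0.5 * (theta^T x - y)^2 with respect to theta (assumed loss)."""
    err = sum(t * xi for t, xi in zip(theta, x)) - y
    return [err * xi for xi in x]


def batch_gradient_descent(grad_loss, data, theta, alpha=0.1, steps=500):
    """Each step sums the per-sample gradients over the whole training set."""
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in data:
            g = grad_loss(x, y, theta)
            grad = [a + b for a, b in zip(grad, g)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data that is fitted exactly by theta = [2, -1].
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
theta = batch_gradient_descent(squared_loss_grad, data, [0.0, 0.0])
```

Every update needs a full pass over all $m$ samples, which is what makes the batch variant unsuitable for an unbounded stream.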


Stochastic Gradient Descent

Idea

Compute gradient of a single random sample

More efficient than Batch Gradient Descent

Algorithm

Initialize $\theta_0$ randomly

Loop until convergence:

Shuffle training set randomly

for $i = 1$ to $m$: $\theta_{t+1} = \theta_t - \alpha \nabla l(x^{(i)}, y^{(i)}, \theta_t)$

⇒ Simple incremental update for new $(x^{(i)}, y^{(i)})$
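As a rough sketch, the per-sample update might look like this. The squared loss, the toy data, the fixed epoch count, and the function names are illustrative assumptions; `sgd_step` is the part that would be called once per arriving sample in a stream.

```python
import random


def squared_loss_grad(x, y, theta):
    """Gradient of l = 0.5 * (theta^T x - y)^2 with respect to theta (assumed loss)."""
    err = sum(t * xi for t, xi in zip(theta, x)) - y
    return [err * xi for xi in x]


def sgd_step(grad_loss, x, y, theta, alpha):
    """One incremental update using the gradient of a single sample."""
    g = grad_loss(x, y, theta)
    return [t - alpha * gi for t, gi in zip(theta, g)]


def sgd(grad_loss, data, theta, alpha=0.1, epochs=300, seed=0):
    """Shuffle the training set each epoch, then update sample by sample."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            theta = sgd_step(grad_loss, x, y, theta, alpha)
    return theta


# Toy data that is fitted exactly by theta = [2, -1].
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
theta = sgd(squared_loss_grad, data, [0.0, 0.0])
```

In a streaming setting there is no shuffling or epoch loop; only `sgd_step` is applied to each new $(x, y)$.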


SVM: Support Vector Machine

Prediction

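Given weight vector $\theta$

$$h_\theta(x) = \begin{cases} +1 & \theta^T x \ge 0 \\ -1 & \theta^T x < 0 \end{cases}$$

Training

Hinge loss function:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \max\{1 - y^{(i)} \theta^T x^{(i)},\, 0\}$$

$\theta$ can be updated incrementally via SGD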
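One way to picture the incremental hinge-loss update is the sketch below. It uses the subgradient of $\max\{1 - y\,\theta^T x, 0\}$, which is $-y\,x$ when the margin is below 1 and zero otherwise. The function names, the learning rate, and the absence of a regularization term are assumptions for illustration.

```python
def predict(theta, x):
    """Linear decision rule: +1 if theta^T x >= 0, else -1."""
    score = sum(t * xi for t, xi in zip(theta, x))
    return 1 if score >= 0 else -1


def hinge_sgd_update(theta, x, y, alpha=0.1):
    """One SGD step on the hinge loss for a single sample (x, y)."""
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    if margin < 1:
        # Subgradient is -y * x, so the descent step adds alpha * y * x.
        return [t + alpha * y * xi for t, xi in zip(theta, x)]
    return theta  # sample is outside the margin: loss and subgradient are zero


theta = [0.0, 0.0]
for x, y in [([1.0, 0.0], 1), ([0.0, 1.0], -1)]:
    theta = hinge_sgd_update(theta, x, y)
```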


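Hoeffding Tree: Prediction

Each node tests an attribute

Each edge refers to an attribute value

$h_\theta(x)$ = label $y$ of the leaf to which $x$ is assigned

[Figure: decision tree with root test $x_1 < 5$, child tests $x_2 < 6$ and $x_2 < 3$, and leaves $\theta_1, \ldots, \theta_4$, shown next to the corresponding partition of the $(x_1, x_2)$ plane; example point $x = (6, 2)$]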


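Hoeffding Tree: Incremental tree induction

1 Assign $(x, y)$ to leaf

2 Use Hoeffding bound as split criterion

$$\Delta G = G(X_a) - G(X_b) > \epsilon$$

$$\epsilon = \sqrt{\frac{R^2 \ln(1/\alpha)}{2n}}$$

⇒ Hypothesis testing with significance level $\alpha$

Top-down model tree induction

[Figure: example tree with attribute tests $A_1$, $A_2$, $A_3$ and leaves $\theta_1, \ldots, \theta_5$; example sample $x = (\text{No}, \text{No}, \text{Yes})$]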
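The split test can be sketched as below. The slide does not define $R$ and $n$; following the usual Hoeffding tree formulation, the sketch assumes $R$ is the range of the split measure $G$ and $n$ the number of samples seen at the leaf. The function names are hypothetical.

```python
import math


def hoeffding_bound(value_range, alpha, n):
    """epsilon = sqrt(R^2 * ln(1/alpha) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / alpha) / (2.0 * n))


def should_split(gain_best, gain_second, value_range, alpha, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return gain_best - gain_second > hoeffding_bound(value_range, alpha, n)


eps_small_n = hoeffding_bound(1.0, 0.05, 100)
eps_large_n = hoeffding_bound(1.0, 0.05, 10000)
```

Since $\epsilon$ shrinks as $n$ grows, a leaf that cannot be split yet may become splittable once more samples have arrived.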


Labeling tweets

Question

How to label millions of tweets?

Answer

Emoticons :-)

Positive sentiment

$x$ = 'I love my iPhone more than my girlfriend :-)'

$y = +1$

Negative sentiment

$x$ = 'My iPhone is broken :-('

$y = -1$
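A minimal sketch of this emoticon-based labeling is shown below. The emoticon lists and the rule of skipping tweets with no emoticon or with mixed emoticons are assumptions for illustration.

```python
POSITIVE = (":-)", ":)", ":D")   # assumed list of positive emoticons
NEGATIVE = (":-(", ":(")         # assumed list of negative emoticons


def label_by_emoticon(tweet):
    """Return +1 or -1 based on emoticons, or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None  # no emoticon, or contradictory emoticons
    return 1 if has_pos else -1


y_pos = label_by_emoticon("I love my iPhone more than my girlfriend :-)")
y_neg = label_by_emoticon("My iPhone is broken :-(")
```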


Data preparation (Go et al., sentiment140.com)

1 Retrieve tweets via Twitter API

2 Discard tweets without emoticons

3 Define $y = +1$ / $-1$ for tweets with a positive/negative emoticon

4 Feature reduction

Remove emoticons

@john → USER

www.google.com → URL

heeeello → hello

5 Define x with unigrams/bigrams
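The feature-reduction step could be approximated with a few regular expressions. The patterns below are assumptions made for this sketch (for example, which emoticons are recognized and that runs of three or more identical characters collapse to one); they are not the exact rules used by Go et al.

```python
import re

EMOTICON = re.compile(r"[:;]-?[()DP]")  # assumed emoticon pattern


def normalize(tweet):
    """Hypothetical normalizer following the feature-reduction steps above."""
    text = EMOTICON.sub("", tweet)                          # remove emoticons
    text = re.sub(r"@\w+", "USER", text)                    # @john -> USER
    text = re.sub(r"(https?://\S+|www\.\S+)", "URL", text)  # links -> URL
    text = re.sub(r"(\w)\1{2,}", r"\1", text)               # heeeello -> hello
    return " ".join(text.split())                           # tidy whitespace


cleaned = normalize("@john heeeello www.google.com :-)")
```

Emoticons are removed because they already define the label $y$; keeping them as features would let the classifier trivially read off the answer.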


Training and Test set

Holdout validation

1 Divide data into training set (60%) and test set (40%)

2 Train model on training set

3 Measure performance on test set

⇒ Static test set

Prequential / Interleaved test-then-train

New sample (x , y) arrives...

1 Test whether $h_\theta(x) = y$ and update performance metric

2 Update θ with (x , y)

⇒ Dynamic test set
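The interleaved test-then-train loop can be sketched as follows. The `MajorityClass` learner is a hypothetical stand-in used only to make the example runnable; any model with `predict` and `update` methods would fit.

```python
from collections import Counter


def prequential_accuracy(stream, predict, update):
    """Test on each sample first, then use the same sample for training."""
    correct = total = 0
    for x, y in stream:
        if predict(x) == y:   # 1. test
            correct += 1
        total += 1
        update(x, y)          # 2. train
    return correct / total if total else 0.0


class MajorityClass:
    """Toy learner (assumption): always predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 1

    def update(self, x, y):
        self.counts[y] += 1


model = MajorityClass()
stream = [("t1", 1), ("t2", 1), ("t3", -1), ("t4", 1)]
acc = prequential_accuracy(stream, model.predict, model.update)
```

Every sample is used for testing exactly once and before the model has seen its label, so no separate holdout set is needed.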


Accuracy

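Confusion matrix: rows are the predicted class $h_\theta(x)$, columns the actual class $f(x)$

$$\begin{array}{c|ccc|c}
h_\theta(x) \backslash f(x) & 1 & \cdots & L & \\ \hline
1 & o_{1,1} & \cdots & o_{1,L} & r_1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
L & o_{L,1} & \cdots & o_{L,L} & r_L \\ \hline
 & c_1 & \cdots & c_L & m
\end{array}$$

Definition (Accuracy)

$$p_0 = \frac{1}{m} \sum_{i=1}^{L} o_{i,i}$$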

⇒ Fraction of correct predictions


Accuracy

Unbalanced classes

Unbalanced (skewed) class distribution

Can lead to misleadingly high accuracy

Cancer prediction

$y = -1$: person has no cancer (99%)

$y = +1$: person has cancer (1%)

$h_\theta(x) = -1$, i.e. always predict no cancer

⇒ $p_0 = 99\%$


κ statistic (Cohen, 1960)

Definition (kappa statistic)

$\frac{r_i}{m}$ = predicted frequency of class $i$

$\frac{c_i}{m}$ = actual frequency of class $i$

$$p_c = \sum_{i=1}^{L} \frac{r_i}{m} \cdot \frac{c_i}{m}$$

= probability of being correct by chance

$$\kappa = \frac{p_0 - p_c}{1 - p_c}$$

$\kappa$: accuracy relative to chance prediction

$p_0 = p_c \to \kappa = 0$

$p_0 = 1 \to \kappa = 1$


κ statistic

Cancer prediction

y = −1: person has no cancer (99%)

y = +1: person has cancer (1%)

$h_\theta(x) = -1$, i.e. always predict no cancer

$p_0 = 0.99$

$p_c = 1.00 \cdot 0.99 + 0.00 \cdot 0.01 = 0.99$

⇒ $\kappa = \frac{p_0 - p_c}{1 - p_c} = \frac{0.99 - 0.99}{1 - 0.99} = 0$
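The cancer example can be reproduced with a short sketch that computes $p_0$, $p_c$ and $\kappa$ from a confusion matrix. The function name `kappa`, the sample size of 100 people, and the convention of returning 0 when $p_c = 1$ are assumptions for illustration; rows are predicted classes and columns are actual classes.

```python
def kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: predicted, columns: actual)."""
    size = len(confusion)
    m = sum(sum(row) for row in confusion)
    p0 = sum(confusion[i][i] for i in range(size)) / m
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    pc = sum(r * c for r, c in zip(row_sums, col_sums)) / (m * m)
    if pc == 1.0:
        return 0.0  # assumption: define kappa as 0 when chance agreement is total
    return (p0 - pc) / (1 - pc)


# 100 people, 1 of them with cancer; the classifier always predicts "no cancer".
always_no_cancer = [[99, 1],
                    [0, 0]]
# A perfect classifier on the same population.
perfect = [[99, 0],
           [0, 1]]

kappa_trivial = kappa(always_no_cancer)
kappa_perfect = kappa(perfect)
```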


κ statistic: Incremental update

Variables

Only $2L + 1$ variables

$r_i, c_i$ for $i = 1, \ldots, L$: row, column sums

$p_0$: accuracy

Can be updated incrementally for a new $(x, y)$

Only consider most recent samples

Sliding window: store last $w$ samples

Fading factors: down-weight older samples
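A sketch of the sliding-window variant is given below; the fading-factor variant is not shown. The class name `SlidingWindowKappa` and the 0-based class indices are assumptions. Besides the window itself, the state consists of the $2L$ row and column sums plus one counter of correct predictions.

```python
from collections import deque


class SlidingWindowKappa:
    """Hypothetical kappa estimator over the last `window` (predicted, actual) pairs."""

    def __init__(self, num_classes, window):
        self.max_size = window
        self.samples = deque()
        self.r = [0] * num_classes  # row sums: predictions per class
        self.c = [0] * num_classes  # column sums: actual labels per class
        self.correct = 0            # number of correct predictions in the window

    def _apply(self, predicted, actual, sign):
        self.r[predicted] += sign
        self.c[actual] += sign
        self.correct += sign * int(predicted == actual)

    def add(self, predicted, actual):
        """Add a new sample and drop the oldest one if the window is full."""
        self.samples.append((predicted, actual))
        self._apply(predicted, actual, +1)
        if len(self.samples) > self.max_size:
            old_predicted, old_actual = self.samples.popleft()
            self._apply(old_predicted, old_actual, -1)

    def kappa(self):
        m = len(self.samples)
        if m == 0:
            return 0.0
        p0 = self.correct / m
        pc = sum(r * c for r, c in zip(self.r, self.c)) / (m * m)
        return (p0 - pc) / (1 - pc) if pc < 1 else 0.0


tracker = SlidingWindowKappa(num_classes=2, window=4)
for predicted, actual in [(0, 0), (0, 0), (1, 1), (1, 1)]:
    tracker.add(predicted, actual)
kappa_first = tracker.kappa()
for predicted, actual in [(0, 1), (0, 1), (0, 0), (0, 0)]:
    tracker.add(predicted, actual)
kappa_second = tracker.kappa()
```

Each update costs constant time, and old samples leave the statistic as soon as they fall out of the window, which lets the estimate follow concept drift.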

3 Results

Training sets

sentiment140.com (Go et al.)

800,000 positive tweets

800,000 negative tweets

Balanced classes - not representative

Edinburgh corpus

1,800,000 positive tweets

325,000 negative tweets

Unbalanced classes - representative


sentiment140.com: Accuracy

Only positives

⇒ High accuracy for unbalanced classes


sentiment140.com: κ statistic

Only positives

⇒ κ statistic detects unbalanced classes


sentiment140.com

Rank  Model           Accuracy  κ        Time
1     SGD             86.80%    62.60%   219.54 sec.
2     Naïve Bayes     75.05%    50.10%   116.62 sec.
3     Hoeffding Tree  73.11%    46.23%   5525.51 sec.

SGD scores best (Accuracy, κ)

Hoeffding tree scores worst (Accuracy, κ, Time)


Edinburgh corpus: Accuracy

⇒ Similar accuracy


Edinburgh corpus: κ statistic

⇒ κ discriminates better


Edinburgh corpus

Rank  Model           Accuracy  κ        Time
1     Naïve Bayes     86.11%    36.15%   173.28 sec.
2     SGD             86.26%    31.88%   293.98 sec.
3     Hoeffding Tree  84.76%    20.40%   6151.51 sec.

Naïve Bayes scores best (κ, Time)

Hoeffding tree scores worst (Accuracy, κ, Time)

κ is lower than on sentiment140.com due to unbalanced classes


Changing feature weights

Tag        Middle  End    Variation (End − Middle)
apple       0.3     0.7    0.4
microsoft  -0.4    -0.1    0.3
facebook   -0.3     0.4    0.7
mcdonalds   0.5     0.1   -0.4
google      0.3     0.6    0.3
disney      0.0     0.0    0.0
bmw         0.0    -0.2   -0.2
pepsi       0.1    -0.6   -0.7
dell        0.2     0.0   -0.2
gucci      -0.4     0.6    1.0
amazon     -0.1    -0.4   -0.3

High weight: more frequent in positive tweets

Low weight: more frequent in negative tweets

4 Discussion

κ statistic

Pros

Suitable for unbalanced classes

Simple computation

Suitable for incremental learning

Cons

Independence assumption for computing pc often invalid

Conservative estimate


Future improvements

Additional features

Emoticons

Number of friends/followers

Different languages

Neutral sentiments (three classes)

Semantics

Foo beats Bar

Bar beats Foo


Appendix

Sources

A. Bifet and E. Frank (2010). Sentiment knowledge discovery in Twitter streaming data.

A. Go et al. (2009). Twitter Sentiment Classification using Distant Supervision.

J. Gama et al. (2009). Issues in evaluation of stream learning algorithms.

J. Cohen (1960). A coefficient of agreement for nominal scales.
