
1

Topic-Sentiment Mixture: Modeling Facets and Opinions

in Weblogs

Qiaozhu Mei†, Xu Ling†, Matthew Wondra†, Hang Su‡, and ChengXiang Zhai†

† University of Illinois at Urbana-Champaign

‡ Yahoo! Inc.

2

Why Opinion Analysis?

• Customers: need peer opinions to make purchase decisions

• Business providers:
– need customers’ opinions to improve products
– need to track opinions to make marketing decisions

• Social researchers: want to know people’s reactions to social events

• Government: wants to know people’s reactions to a new policy

• Psychology, education, etc.

3

An Illustrative Example

Should I buy an iPod?

• Thumbs up or thumbs down?
– Positive, negative, neutral… (Sentiments)

• Are their opinions changing?
– Negative before 2005, but positive recently… (Dynamics)

• What do people say about the iPod?
– Price, battery, warranty, nano, … (Topics)

• What aspects are good/bad?
– Sound is good, battery is bad… (Faceted opinions)

4

Why Extract Opinions from Blogs?

• Easy to collect: huge amount, clean format

• Broadly distributed: demographics

• Topic diversified: free discussion about any topic/product/event

• Opinion rich: highly personalized

5

Evidence from Blog Search

[Figure: blog search results illustrating availability, broad distribution, topic diversity, and opinion richness]

Opinion-rich examples:

• Positive: …the trail leads to fascinating places that are richly …

• Negative: …when I first watched the big-screen version of The Da Vinci Code, I fell asleep twice. Not once. Twice! …

6

Existing Blog-opinion Analysis Work

• Opinmind: sentiment classification/search of blogs

No faceted analysis, no neutral fact description: Not informative enough to support decision making

7

Existing Blog-opinion Analysis Work (Cont.)

• Use content to predict sales
– Blog-level topic analysis
– Information diffusion through blogspace
– Use topic bursting to predict sales spikes
– E.g., [Gruhl et al. 2005]

No sentiment analysis, no faceted analysis: what if the hot discussion is “Negative”?

Hot criticisms may not lead to sales spikes

[Figure from Gruhl et al. 2005]

8

What’s Missing Here?

• Discussions are faceted
– E.g. iPod: battery? Price? Nano? …
– Usually different opinions on different facets

• Opinions have polarities
– Positive, negative, and neutral …
– Non-discriminative analysis may lead to wrong decision

• Opinions are changing over time …

9

Our Goal

• Model the mixture of facets and opinions (topics and sentiments)

• Generate a faceted opinion summary for an ad hoc query

• Track the change of opinions over time

[Figure: topic-sentiment dynamics (Topic = Price); strength of Positive, Negative, and Neutral opinions plotted over time]

Query: Dell Laptop

Topic-sentiment summary

Topic 1 (Price)
– Neutral: mac pro vs. dell precision: a price comparis..
– Neutral: DELL is trading at $24.66
– Positive: it is the best site and they show Dell coupon code as early as possible
– Negative: Even though Dell's price is cheaper, we still don't want it.
– ……

Topic 2 (Battery)
– Neutral: i still want a free battery from dell..
– Positive: One thing I really like about this Dell battery is the Express Charge feature.
– Negative: my Dell battery sucks
– Negative: Stupid Dell laptop battery
– ……

10

Challenges in Opinion Analysis from Blogs

• Topics and sentiments are mixed together

• No existing facet structure for ad hoc topics

• Difficult to identify sentiment polarities

• Difficult to associate sentiment polarities with facets

• Difficult to segment topics and sentiments
– Tracking sentiment dynamics

11

Our Approach: Modeling Topic-Sentiment Mixture

• Use language models to represent facets and sentiments
– Facets represented with topic models, extracted in an unsupervised/semi-supervised way
– Sentiment models extracted in a supervised way

• Model the mixture of topics and sentiments with a probabilistic generative model

• Segment associated topics and sentiments with a topical hidden Markov model

12

Probabilistic Model of Topic-Sentiment Mixture

[Figure: word distributions used by the model]

Facet models θ_1, θ_2, …, θ_k, for example:
– battery 0.3, life 0.2, ..
– nano 0.1, release 0.05, screen 0.02, ..
– apple 0.2, microsoft 0.1, compete 0.05, ..

Background B: is 0.05, the 0.04, a 0.03, ..

Positive sentiment model θ_P: love 0.2, awesome 0.05, good 0.01, ..

Negative sentiment model θ_N: suck 0.07, hate 0.06, stupid 0.02, ..

Choose a facet (subtopic) θ_i

Draw a word from the mixture of topics and sentiments (F, P, N), e.g. “battery”, “love”, “hate”, “the”
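The word-drawing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word probabilities are the truncated ones shown on the slide, renormalised so each distribution sums to 1, the order of the facets in `FACETS` is arbitrary, and the names `generate_word`, `lambda_b`, `facet_weights`, and `polarity_weights` are hypothetical.

```python
import random

def normalise(dist):
    """Rescale a truncated word distribution so that it sums to 1."""
    total = sum(dist.values())
    return {w: p / total for w, p in dist.items()}

# Toy distributions taken from the slide (truncated there with "..").
FACETS = [
    normalise({"battery": 0.3, "life": 0.2}),
    normalise({"nano": 0.1, "release": 0.05, "screen": 0.02}),
    normalise({"apple": 0.2, "microsoft": 0.1, "compete": 0.05}),
]
BACKGROUND = normalise({"is": 0.05, "the": 0.04, "a": 0.03})
POSITIVE = normalise({"love": 0.2, "awesome": 0.05, "good": 0.01})
NEGATIVE = normalise({"suck": 0.07, "hate": 0.06, "stupid": 0.02})

def draw(dist, rng):
    """Sample one word from a {word: probability} dictionary."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

def generate_word(rng, lambda_b, facet_weights, polarity_weights):
    """Generate one word from the topic-sentiment mixture.

    lambda_b            probability of using the background model (assumed name)
    facet_weights[j]    weight of facet j in the document
    polarity_weights[j] (neutral, positive, negative) weights for facet j
    """
    if rng.random() < lambda_b:
        return draw(BACKGROUND, rng)
    j = rng.choices(range(len(FACETS)), weights=facet_weights, k=1)[0]
    kind = rng.choices(["F", "P", "N"], weights=polarity_weights[j], k=1)[0]
    if kind == "F":
        return draw(FACETS[j], rng)      # neutral, factual word about facet j
    return draw(POSITIVE if kind == "P" else NEGATIVE, rng)
```

Calling `generate_word(random.Random(0), 0.3, [0.5, 0.3, 0.2], [(0.6, 0.2, 0.2)] * 3)` repeatedly yields a stream mixing background, facet, and sentiment words.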

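13

The “Generation” Process

[Figure: generation of a word w in document d. With probability λ_B the word is drawn from the background model B; with probability 1 − λ_B a topic θ_j (j = 1, …, k) is chosen with document-specific weight π_dj, and the word is then drawn from the facet model θ_j (neutral, facts), the positive model θ_P, or the negative model θ_N, with weights δ_{j,d,F}, δ_{j,d,P}, and δ_{j,d,N}.]

$$\log(\mathcal{C}) = \sum_{d \in \mathcal{C}} \sum_{w \in V} c(w,d)\, \log\Big[\lambda_B\, p(w|B) + (1-\lambda_B) \sum_{j=1}^{k} \pi_{dj} \big(\delta_{j,d,F}\, p(w|\theta_j) + \delta_{j,d,P}\, p(w|\theta_P) + \delta_{j,d,N}\, p(w|\theta_N)\big)\Big]$$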

• p(w|θ_i), p(w|θ_P), p(w|θ_N) can be estimated with a Maximum Likelihood Estimator (MLE) through an EM algorithm
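As a rough illustration of the objective that EM maximises, the sketch below evaluates the corpus log-likelihood of the mixture for fixed parameters. It is not the authors' implementation: the function name and the argument layout (`pi[d][j]` for the facet coverage of document d, `delta[d][j]` for the neutral/positive/negative split) are assumptions made for this example.

```python
import math

def log_likelihood(docs, background, facets, pos, neg, lambda_b, pi, delta):
    """Corpus log-likelihood of the topic-sentiment mixture.

    docs        list of {word: count} dictionaries, one per document
    background  {word: p(w|B)}
    facets      list of {word: p(w|theta_j)}
    pos, neg    {word: p(w|theta_P)}, {word: p(w|theta_N)}
    lambda_b    background mixing weight
    pi[d][j]    coverage of facet j in document d
    delta[d][j] (d_F, d_P, d_N) split of facet j in document d
    Assumes every observed word has non-zero probability under the mixture.
    """
    total = 0.0
    for d, counts in enumerate(docs):
        for w, c in counts.items():
            topical = 0.0
            for j, facet in enumerate(facets):
                d_f, d_p, d_n = delta[d][j]
                topical += pi[d][j] * (
                    d_f * facet.get(w, 0.0)
                    + d_p * pos.get(w, 0.0)
                    + d_n * neg.get(w, 0.0)
                )
            p_w = lambda_b * background.get(w, 0.0) + (1.0 - lambda_b) * topical
            total += c * math.log(p_w)
    return total
```

An EM procedure would alternate between computing, for each word occurrence, the posterior over which component generated it, and re-estimating the word distributions and mixing weights so that this quantity does not decrease.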

16

Learning Sentiment Models

• Problem: sentiment expressions are topic-biased
– E.g., “fearful” is negative in general, but how about for a ghost movie?
– E.g., “heavy” is positive for rock music, but how about for laptops?

• Impossible to create training data for every ad hoc topic

• Solution:
– Collect sentiment-labeled data with diversified topics
– Learn a general sentiment model from the mixed training data in training mode
– Use this general sentiment model as a prior to get the topic-biased sentiment models in testing mode

17

Estimating Topic Models

• Problem: no existing facet structure for ad hoc topics

• Unsupervised extraction: facets might not be what you like
– E.g., user wants “battery”, “price” and “sound quality”
– System returns “ipod nano”, “ipod video”, “ipod shuffle”..

• Solution: incorporate user-specified interests into automatically extracted facets
– User provides hints; add priors into the topic model
– Use MAP estimation instead of MLE
– See paper for technical details
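One common way to fold a prior into a unigram language model is to treat it as pseudo-counts. The sketch below shows that form only; it is an assumption for illustration, since the slide defers the actual estimator to the paper. The names `map_estimate`, `expected_counts`, and `mu` are hypothetical, and `prior` is assumed to sum to 1.

```python
def map_estimate(expected_counts, prior, mu):
    """Re-estimate a word distribution with a prior acting as pseudo-counts.

    expected_counts  {word: (expected) count assigned to this model}
    prior            {word: prior probability}, assumed to sum to 1
    mu               strength of the prior; mu = 0 gives the plain MLE
    """
    vocab = set(expected_counts) | set(prior)
    total = sum(expected_counts.values()) + mu
    return {
        w: (expected_counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / total
        for w in vocab
    }

# Example: a user hints that one facet should be about the battery.
hint = {"battery": 0.5, "charge": 0.5}
counts = {"nano": 3.0, "color": 1.0}
facet_model = map_estimate(counts, hint, mu=4.0)
```

A larger `mu` pulls the estimated facet toward the user's hint words, while a small `mu` lets the data dominate.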

18

Sentiment Segmentation and Dynamics Tracking

• Design a topic-sentiment enhanced HMM

• Associate states with topic/sentiment models

• Learn the transition prob. and segment the text

• Plot the sentiment dynamics by counting segments over time ( tagged with each facet and sentiment)

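[Figure: topic-sentiment HMM with topic states T1, T2, T3, sentiment states P and N, a background state B, and a state E, with transitions from and to E]

Example text to segment:

… the battery really sucks and it's really heavy in my part but where could you find laptops so affordable nowadays?...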
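Segmenting text with an HMM amounts to finding the most likely state sequence, which the Viterbi algorithm does. The following is a minimal sketch on a simplified state set (one topic state T plus P and N), not the full model in the figure: the lexicon, the probability floor, and the transition values are toy assumptions, and the background and E states are omitted.

```python
import math

def viterbi(words, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for a non-empty word list.

    log_start[s]     log probability of starting in state s
    log_trans[r][s]  log probability of moving from state r to state s
    log_emit(s, w)   log probability of state s emitting word w
    """
    best = [{s: log_start[s] + log_emit(s, words[0]) for s in states}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of s given the scores of the previous step.
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit(s, w)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy setup (all numbers are illustrative assumptions).
STATES = ["T", "P", "N"]
LEXICON = {
    "T": {"battery": 0.5, "laptops": 0.5},
    "P": {"affordable": 1.0},
    "N": {"sucks": 0.5, "heavy": 0.5},
}
FLOOR = 1e-3  # score given to words a state does not list

def log_emit(state, word):
    return math.log(LEXICON[state].get(word, FLOOR))

LOG_START = {s: math.log(1.0 / len(STATES)) for s in STATES}
LOG_TRANS = {
    r: {s: math.log(0.8 if r == s else 0.1) for s in STATES} for r in STATES
}

tags = viterbi(["battery", "sucks", "heavy", "affordable"],
               STATES, LOG_START, LOG_TRANS, log_emit)
```

Consecutive words that share a tag form one segment; in the full model the transition probabilities are learned rather than fixed as here.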

19

Experiment Setup

• Training data for sentiment models (diversified topics, downloaded from Opinmind)

Topic          # Pos   # Neg
laptops        346     142
movies         396     398
universities   464     414
airlines       283     400
cities         500     500
people         441     475
banks          292     229
insurances     354     297
nba teams      262     191
cars           399     334

• Test dataset: created by querying Google blog search and crawling from original sites (ad hoc)

Dataset         # docs   Time Period     Query Term
iPod            2988     01/06 ~ 11/06   ipod
Da Vinci Code   1000     01/06 ~ 10/06   da+vinci+code

20

Results: General Sentiment Models

• Sentiment models trained from a diversified topic mixture vs. single topics

Pos-Cities   Neg-Cities   Pos-Mix     Neg-Mix
beautiful    hate         love        suck
love         suck         awesome     hate
awesome      people       good        stupid
amaze        traffic      miss        ass
live         drive        amaze       fuck
good         fuck         pretty      horrible
night        stink        job         shitty
nice         move         god         crappy
time         weather      yeah        terrible
air          city         bless       people
greatest     transport    excellent   evil

[Figure: KL divergence between the learnt θ_P and θ_N and an unseen topic, plotted against the number of topics mixed in the training data]
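For reference, the KL divergence between two unigram word distributions can be computed as in the sketch below. The additive smoothing with `epsilon` is an assumption added here so that words missing from one model do not produce a division by zero; the slide does not say how this case was handled.

```python
import math

def smooth(dist, vocab, epsilon):
    """Add epsilon to every word in vocab and renormalise."""
    total = sum(dist.get(w, 0.0) + epsilon for w in vocab)
    return {w: (dist.get(w, 0.0) + epsilon) / total for w in vocab}

def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q) over the union vocabulary of two {word: prob} models."""
    vocab = set(p) | set(q)
    p_s = smooth(p, vocab, epsilon)
    q_s = smooth(q, vocab, epsilon)
    return sum(p_s[w] * math.log(p_s[w] / q_s[w]) for w in vocab)
```

A value of zero means the two models agree exactly; larger values mean that the model learnt from the training mixture is further from the sentiment model of the held-out topic.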

21

Results: Facets and Topic Models (I)

• Facets for iPod:

No Prior                                 |  With Prior
Battery, nano   Marketing   Ads, spam    |  Nano    Battery
battery         apple       free         |  nano    battery
shuffle         microsoft   sign         |  color   shuffle
charge          market      offer        |  thin    charge
nano            zune        freepay      |  hold    usb
dock            device      complete     |  model   hour
itune           company     virus        |  4gb     mini
usb             consumer    freeipod     |  dock    life
hour            sale        trial        |  inch    rechargable

22

Results: Facets and Topic Models (II)

• Facets for the Da Vinci Code:

No Prior                            |  With Prior
Story     Book        Background    |  Movie    Religion
landon    author      jesus         |  movie    religion
secret    idea        mary          |  hank     belief
murder    holy        gospel        |  tom      cardinal
louvre    court       magdalene     |  film     fashion
thrill    brown       testament     |  watch    conflict
clue      blood       gnostic       |  howard   metaphor
neveu     copyright   constantine   |  ron      complaint
curator   publish     bible         |  actor    communism

23

Results: Faceted Opinions (the Da Vinci Code)

Facet 1: Movie

• Neutral
– ... Ron Howards selection of Tom Hanks to play Robert Langdon.
– Directed by: Ron Howard Writing credits: Akiva Goldsman ...
– After watching the movie I went online and some research on ...

• Positive
– Tom Hanks stars in the movie,who can be mad at that?
– Tom Hanks, who is my favorite movie star act the leading role.
– Anybody is interested in it?

• Negative
– But the movie might get delayed, and even killed off if he loses.
– protesting ... will lose your faith by ... watching the movie.
– ... so sick of people making such a big deal about a FICTION book and movie.

Facet 2: Book

• Neutral
– I remembered when i first read the book, I finished the book in two days.
– I’m reading “Da Vinci Code” now.

• Positive
– Awesome book.
– So still a good book to past time.

• Negative
– ... so sick of people making such a big deal about a FICTION book and movie.
– This controversy book cause lots conflict in west society.

24

Results: Comparison with Opinmind

• Faceted opinions from TSM

Facet: iPod Nano
– Thumbs Up: (sweat) iPod Nano ok so ...
– Thumbs Up: Ipod Nano is a cool design, ...
– Thumbs Down: WHAT IS THIS SHIT??!!
– Thumbs Down: ipod nanos are TOO small!!!!

Facet: Battery
– Thumbs Up: the battery is one serious example of excellent relibability
– Thumbs Down: Poor battery life ...
– Thumbs Down: ...iPod’s battery completely died

Facet: iPod Video
– Thumbs Up: My new VIDEO ipod arrived!!!
– Thumbs Up: Oh yeah! New iPod video
– Thumbs Down: fake video ipod
– Thumbs Down: Watch video podcasts ...

• Opinions from Opinmind:

Thumbs Up                           |  Thumbs Down
I love my iPod, I love my G5...     |  I hate ipod.
I love my little black 60GB iPod    |  Stupid ipod out of batteries...
I LOVE MY iPOD                      |  “ hate ipod ” = 489..
I love my iPod.                     |  my iPod looked uglier...surface...
- I love my iPod.                   |  i hate my ipod.
... iPod video looks SO awesome     |  ... microsoft ... the iPod sucks

25

Results: Sentiment Dynamics

Facet: the book “the da vinci code” (bursts during the movie, Pos > Neg)

Facet: the impact on religious beliefs (bursts during the movie, Neg > Pos)
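Curves like these can be produced by counting tagged segments per time bucket. The sketch below assumes each segment has already been labeled with a time bucket, a facet, and a polarity; the tuple layout, the function name, and the sample data are hypothetical.

```python
from collections import Counter

def sentiment_dynamics(segments, facet, polarities=("P", "N")):
    """Count segments of one facet per time bucket and polarity.

    segments  list of (time_bucket, facet, polarity) tuples
    Returns (sorted time buckets, {polarity: [count per bucket]}).
    """
    counts = Counter((t, pol) for t, f, pol in segments if f == facet)
    buckets = sorted({t for t, f, _ in segments if f == facet})
    series = {pol: [counts[(t, pol)] for t in buckets] for pol in polarities}
    return buckets, series

# Hypothetical tagged segments, for illustration only.
sample = [
    ("2006-05", "book", "P"),
    ("2006-05", "book", "P"),
    ("2006-05", "book", "N"),
    ("2006-06", "book", "P"),
    ("2006-05", "religion", "N"),
]
months, curves = sentiment_dynamics(sample, "book")
```

Plotting each list in `curves` against `months` gives one strength-over-time line per polarity for the chosen facet.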

26

Summary and Future Work

• Algorithm: A new way to model the mixture of topics and sentiments

• Application: A new way to summarize faceted opinions, and track their dynamics

• Future Work:
– Beyond unigram language model?
– Better segmentation of sentiments and topics?
– Adapting existing facet structures?
– Develop an end-user application for opinion analysis

27

Thank You!