VISA: A VIsual Sentiment Analysis System Sept. 2012 Dongxu Duan 1 Weihong Qian 1 Shimei Pan 2 Lei Shi 3 Chuang Lin 4 1 IBM Research — China 2 IBM T. J. Watson Research Center 3 Institute of Software Chinese Academy of Sciences 4 Tsinghua University


VISA: A VIsual Sentiment Analysis System

Sept. 2012

Dongxu Duan1 Weihong Qian1 Shimei Pan2

Lei Shi3 Chuang Lin4

1 IBM Research — China

2 IBM T. J. Watson Research Center

3 Institute of Software, Chinese Academy of Sciences

4 Tsinghua University


What is Sentiment Analysis

• Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials. (From Wikipedia)

• A survey of sentiment analysis work by Pang and Lee in 2008, “Opinion mining and sentiment analysis”, cited 1189 times in Google Scholar and including 326 references

Probably the earliest study:


Motivation

The truth: sentiment analysis is becoming even more important
– Corporate
  * Brand analysis, sales campaign design, etc.
  * Crisis relationship management
– Government
  • As we all know ..

Observations:
– Sentiment analysis technologies are going deeper and becoming more versatile:
  * Aspect-oriented, domain-specific lexicon expansion, MT technology
– Average users are still relying on rather simple sentiment results
  • It’s hard for them (even domain experts) to understand sophisticated SA results
– There is a big gap and huge potential for sentiment visualization (visual opinion mining)


Agenda

• Related Works

• Research Problem and Challenges

• Sentiment-Tuple based Data Model

• VISA System Framework

• Visualization Optimizations

• Cases

• User Studies

• Summary


Basic Sentiment Representation

• Raw text/table or simple visualization


Brand Association Map


COBRA (COrporate Brand and Reputation Analysis)

Behal et al. (HCI 2009)


Opinion Observer

Liu et al. (KDD 2005); Liu et al. (IW3C2 2005)


Visual Sentiment Analysis of RSS News Feeds

Wanner et al. (VISSW 2009)


Pulse: Mining Customer Opinions from Free Text

Gamon et al. (IDA 2005)


Visualizing Sentiments in Financial Texts

Ahmad and Almas (IV 2005)


Visual Analysis of Conflicting Opinions

Chen et al. (VAST 2006)


Who Votes For What? A Visual Query Language for Opinion Data

Draper and Riesenfeld (Vis 2008)


Visual Opinion Analysis of Customer Feedback Data

Summary Report of printers

Scatterplot of customer reviews on printers

Circular Correlation Map

Oelke et al. (VAST 2009)


OpinionSeer: Interactive Visualization of Hotel Customer Feedback

Wu et al. (InfoVis 2010)


Taking the Pulse of the Web: Assessing Sentiment on Topics in Online Media

Brew et al. (WebSci 2010)


Understanding Text Corpora with Multiple Facets

Shi et al. (VAST 2010)


Research Problem

• Can we design a sentiment visualization system that:
– Shows how the sentiment evolves over time (trend)
– Visualizes both the sentiment analysis results and the structured facet data, e.g. the profile of the reviewer (facet)
– Rather than only showing which document or feature tends to be positive or negative, also demonstrates how the positives/negatives are described in documents (context)

• Most existing sentiment visualizations fail to meet all the requirements simultaneously
– Our VISA design is based on the TIARA prototype, which already brings together most features (trend, context, facet switching)


Retrospect on TIARA Visualization (Emergency Room Record)


Challenges for TIARA Sentiment Visualization

• Failure of the document trend visualization
– Binary/ternary/scored classification of document-level sentiments will drop valuable pieces

BUT: It has BED BUGS and they BITE me!!!


Challenges for TIARA Sentiment Visualization

• Keyword Summarization
– The content visualized is keywords summarized from all the text, which does not echo the sentiment-centric design

• Structured Facet
– Sentiment-aware facet associations and distributions
– Spatial (location) information

• Comparison
– Categorical and temporal comparison, and sentiment comparison as well

• Compatibility with sentiment analysis engines
– Consumability of all kinds of sentiment analysis results


Sentiment Tuple

• {Aspect, feature, opinion, polarity}
– Aspect: a sub-topic shared by some documents. In a hotel review: the room, the view, or the service
– Feature: the specific object the users are commenting on: an entity, person, location, or abstract concept
– Opinion: a particular word or phrase describing a feature
– Polarity: the polarity of the opinion word/phrase in its context

[Diagram: the Sentiment Analysis Model outputs lists of tuples (aspect: feature: opinion: polarity), which are aggregated into summaries such as { “view”, + }]
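A minimal Python sketch of the tuple and the aggregate step, assuming polarity is encoded as +1/-1/0 and that aggregation sums polarities per aspect. The slides specify neither the encoding nor the aggregation rule, so `SentimentTuple`, `aggregate_by_aspect` and the sample tuples are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentTuple:
    aspect: str    # sub-topic, e.g. "view"
    feature: str   # object being commented on
    opinion: str   # word or phrase describing the feature
    polarity: int  # assumed encoding: +1 positive, -1 negative, 0 neutral


def aggregate_by_aspect(tuples):
    """Sum polarities per aspect and map each sum to '+', '-' or '0'."""
    totals = Counter()
    for t in tuples:
        totals[t.aspect] += t.polarity
    return {a: "+" if s > 0 else "-" if s < 0 else "0" for a, s in totals.items()}


# Hypothetical tuples, as a sentiment engine might emit for hotel reviews
tuples = [
    SentimentTuple("view", "harbor", "stunning", +1),
    SentimentTuple("view", "window", "small", -1),
    SentimentTuple("view", "skyline", "great", +1),
    SentimentTuple("room", "bed", "bed bugs", -1),
]
summary = aggregate_by_aspect(tuples)
print(summary)
```

Here the "view" aspect aggregates to "+", matching the { “view”, + } form on the slide, while the single negative "room" tuple is kept instead of being lost in a document-level score.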


Keyword Summarization (TIARA)

A set of topics: {T1, …, Ti, …, TN}

A set of keywords: {W1, …, Wj, …, WM}

A set of topic probabilities: {…, P(Ti | Dk), …}, where Dk is the kth document in the collection

A set of word probabilities: {…, P(Wj | Ti), …}

Rank the topics to present the most valuable ones first

Select a keyword sub-set for each time segment for content summary: {…}t-1, {…, Wj, …}t, {…}t+1
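The two steps above (rank topics, then pick keywords) can be sketched in Python as follows. The ranking criterion used here, mean P(Ti | Dk) over the collection, is an assumption: the slide does not say how TIARA scores a topic as "most valuable". The per-time-segment selection is omitted, and all names and numbers are illustrative.

```python
def rank_topics(doc_topic):
    """Rank topic indices by mean P(T_i | D_k) over all documents (assumed criterion)."""
    n_topics = len(doc_topic[0])
    means = [sum(row[i] for row in doc_topic) / len(doc_topic) for i in range(n_topics)]
    return sorted(range(n_topics), key=lambda i: means[i], reverse=True)


def top_keywords(topic_word, vocab, topic, k=2):
    """Return the k words with the highest P(W_j | T_i) for one topic."""
    probs = topic_word[topic]
    best = sorted(range(len(vocab)), key=lambda j: probs[j], reverse=True)[:k]
    return [vocab[j] for j in best]


# Hypothetical model output: 3 documents, 2 topics, 4 keywords
doc_topic = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]           # P(T_i | D_k)
topic_word = [[0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.6, 0.2]]  # P(W_j | T_i)
vocab = ["room", "clean", "view", "harbor"]

order = rank_topics(doc_topic)
summary = [(t, top_keywords(topic_word, vocab, t)) for t in order]
print(summary)
```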


VISA Sentiment Keyword Summarization

A set of aspects/hotels: {C1, …, Ci, …, CN}

A set of sentiment keywords (opinions/features): {W1, …, Wj, …, WM}

A set of topic probabilities: {…, P(Ti | Dk), …}, where Dk is the kth document in the collection

A set of word probabilities: {…, P(Wj | Ti), …}

Let the user select to compare aspects of a hotel or an aspect of several hotels

Select a keyword sub-set for each time segment for sentiment summary: {…}t-1, {…, Wj, …}t, {…}t+1
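As a rough illustration of selecting a sentiment keyword sub-set per time segment, the sketch below groups keywords by (aspect or hotel, time segment) and keeps the most frequent ones. Plain frequency stands in for the probabilistic selection on the slide, so this is a simplification; `summarize` and the records are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(records, k=2):
    """Map (category, time segment) to its k most frequent sentiment keywords."""
    counts = defaultdict(Counter)
    for category, segment, keyword in records:
        counts[(category, segment)][keyword] += 1
    return {key: [w for w, _ in c.most_common(k)] for key, c in counts.items()}


# Hypothetical (aspect, time segment, sentiment keyword) records
records = [
    ("room", "2011-06", "clean"),
    ("room", "2011-06", "clean"),
    ("room", "2011-06", "small"),
    ("room", "2011-07", "noisy"),
    ("view", "2011-06", "stunning"),
]
result = summarize(records, k=1)
print(result)
```

Swapping the first field from an aspect to a hotel name gives the other comparison mode mentioned above (one aspect across several hotels).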


VISA Mashup Visualization

[Screenshot panels: Sentiment Tuple Trend, Facet Correlations, Sentiment Snippets, Search, Sentiment-Centric Document Ranking, Filters]


VISA Sentiment Visualization Framework

• Offline:
– Document pre-processing
– Sentiment analysis
– Meta data parsing
– Indexing

• Online:
– Data retrieval
– Visualization
– Interactions


Offline Analysis

[Diagram: Data Analysis Framework. Components shown: Raw Data, Reader, Extractor, Segment Extractor, Sentence Extractor, Text Extractor, Entity Policy, Filter, OpenNLP, Dictionary, Statistic Manager, IndexWriter, Index. Labels: Sentiment, Entity, Class, No/Not. Outputs: Meta Data and Sentiment Data (aspect: feature: opinion: polarity).]


Offline Analysis

[Diagram components: Raw Data, Reader, 3rd Party Sentiment Analysis Framework, IndexWriter, Index. Outputs: Meta Data and Sentiment Data (aspect: feature: opinion: polarity).]


Data Server

[Diagram components: HttpServlet, Query Parser, Data Retrieval, Data Adapter, Lucene, Hermes, Index, VISA]


Sentiment Trend Optimizations

• Sentiment tuple based negative/positive/(neutral) trends

[Figure labels: Positive, Negative. X axis: time. Y axis: sentiment value. Time Sensitive Feature/Opinion words.]
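A small sketch of how tuple-based trends could be computed before drawing, assuming each sentiment tuple carries a time segment and a signed polarity, and that the "sentiment value" is a per-segment count. The slides do not define the value, so this is only one plausible reading; `sentiment_trend` and the data are hypothetical.

```python
from collections import defaultdict


def sentiment_trend(tuples):
    """Return {time segment: (positive count, negative count)}, ordered by time."""
    trend = defaultdict(lambda: [0, 0])
    for segment, polarity in tuples:
        if polarity > 0:
            trend[segment][0] += 1
        elif polarity < 0:
            trend[segment][1] += 1
    return {seg: tuple(trend[seg]) for seg in sorted(trend)}


# Hypothetical (time segment, polarity) pairs taken from sentiment tuples
tuples = [
    ("2011-06", +1),
    ("2011-06", -1),
    ("2011-06", +1),
    ("2011-07", -1),
    ("2011-07", 0),  # neutral tuples are ignored in this sketch
]
trend = sentiment_trend(tuples)
print(trend)
```

Because counts are taken per tuple rather than per document, a review with both praise and complaints contributes to both the positive and the negative curve.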


Sentiment-Centric Interactions


Case Study ---- Summarizing Hotel Reviews

• Initial View


Case Study ---- Summarizing Hotel Reviews

• Switch to “Family” type only

(traveling in this type)


Case Study ---- Summarizing Hotel Reviews

• Click on the “Free” sentiment word

(want to enjoy the free time or free breakfast?)

• It’s 30 min distance from the harbor!


Case Study ---- Summarizing Hotel Reviews

• For two selected hotels

• Drill down to the “cleanliness” and “room” aspects

• Switch to the negative sentiments


Case Study ---- Summarizing Hotel Reviews

• Comparing the recent reviews


Case Study ---- NFL on Twitter

• Crawling tweets from Twitter on the topic of the National Football League (NFL), from 03/2011 to 08/2011 (when the famous lockout happened)

• 665360 tweets from 307973 users, with an average length of 16.8 words.

• Tweet collection pre-processing:
– Classify into 5 content topics: “season play”, “player draft”, “lockout bad”, “lockout end” and “football return”.
– Categorize according to the subject of the sentiments: the 32 NFL teams, by manually creating a relevant subject keyword list for each team (full/nick name, city, stadium, head coach, owner and superstars)
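The team categorization step could look roughly like the Python sketch below. It assumes simple case-insensitive substring matching against per-team keyword lists; the slides do not describe the matching rule, and the two keyword lists are abbreviated, hypothetical stand-ins for the manually built ones.

```python
def categorize(tweet, team_keywords):
    """Return the sorted list of teams whose keywords occur in the tweet."""
    text = tweet.lower()
    return sorted(
        team for team, words in team_keywords.items()
        if any(w in text for w in words)
    )


# Hypothetical, heavily abbreviated keyword lists (the study built one per team by hand)
team_keywords = {
    "Green Bay Packers": ["packers", "green bay", "lambeau"],
    "Pittsburgh Steelers": ["steelers", "pittsburgh", "heinz field"],
}

print(categorize("Steelers vote NO on the new CBA", team_keywords))
print(categorize("Lockout is over!", team_keywords))
```

A tweet can match several teams or none. Substring matching is naive (a short nickname may match inside an unrelated word), so a real pipeline would likely need tokenization or word-boundary checks.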


Case Study ---- NFL on Twitter

• Overview of sentiments on content topics
– Reaches a peak in July, when the new CBA was signed


Case Study ---- NFL on Twitter

• Subject-comparing view on 4 NFL teams
– “Green Bay Packers”, “Pittsburgh Steelers”, “New York Jets”, “New England Patriots”
– A very large RED “CBA” for the Steelers: the only team to vote “NO” to the CBA
– “Brett Favre” for the Packers: the former NFL all-star quarterback of the Packers, who has claimed several times that he would return. The fans are tired of the repeated news.


User Study ---- Setup

• Subject
– VISA system with all functionalities
– TripAdvisor.com
– A plain text editor with search function

• Data
– HK hotel cases with 3 hotels’ reviews
– Both structured (ratings) and unstructured (review comments) data inputs

• User
– 12 users (7 male, 5 female), age 26~35
– Each is given a gift as incentive

• Task
– T1: look up specific sentiment-related information of a hotel (e.g. traveler’s ratings).
– T2: summarize opinions on a general aspect of a hotel (e.g. the view of a hotel)

• Procedure
– Within-subject design: users perform all tasks with all the systems
– Record user demographics, time of completion, satisfaction, and answers to open-ended questions

[Screenshots: TripAdvisor, Text Editor, VISA]


User Study ---- Objective Results

• Three metrics: Elapsed time (in minutes), task completion rate and task correctness.

              Time (min)   Completion   Correctness
VISA          1.66         1            0.75
TripAdvisor   2.94         0.81         0.42
TextEditor    2.69         0.86         0.67

Significant advantages of VISA over the compared systems (t-test significance p < 0.004 ~ 0.034)
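To put the reported mean times in relative terms, the arithmetic can be worked through in a few lines of Python. Only the three means come from the slide; the helper `relative_reduction` is illustrative, and nothing here reproduces the t-test, which would need the per-user measurements.

```python
# Mean elapsed time in minutes, as reported on the slide
time_min = {"VISA": 1.66, "TripAdvisor": 2.94, "TextEditor": 2.69}


def relative_reduction(baseline, system):
    """Fraction by which `system` is lower than `baseline`."""
    return (baseline - system) / baseline


reductions = {
    name: relative_reduction(t, time_min["VISA"])
    for name, t in time_min.items()
    if name != "VISA"
}
for name, r in reductions.items():
    print(f"VISA vs {name}: {r:.1%} less time")
```

On these means, VISA users needed about 43.5% less time than with TripAdvisor and about 38.3% less than with the text editor.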


User Study ---- Subjective Results

• Three metrics: usefulness, usability and satisfaction.

Subjective Evaluation Results (chart axis: 0 to 5)

              Usefulness   Usability   Satisfaction
VISA          4.58         4.08        4.29
TripAdvisor   2.46         2.67        2.38
TextEditor    2.5          2.33        2.17


User Study ---- Open Surveys

• Why VISA is thought better than the baseline systems:
– “mash-up visualizations” and “rich interactions”
– “Mash-up visualizations provide more information and it’s quite intuitive”, “rich interactions make it easy to search what I want to know”

• Improvements to VISA: “it now needs some learning efforts to use VISA”, “It could introduce better UI design and richer interactions”.


Summary

• We have presented the VISA system for generic sentiment visualization
– The backend core is the new sentiment-tuple definition, as well as the faceted data model
– In visualization, we introduce several critical optimizations over TIARA for sentiment visualization scenarios: sentiment-tuple based trending, sentiment keywords, comparison, sentiment in document context, interactions
– Evaluated with two real-life case studies
– Conducted a formal user study to compare with two baseline systems and demonstrate the clear advantage


Thank You

Merci  Grazie  Gracias  Obrigado  Danke