Page 1: Visual Sentiment Analysis of Web Blogs - CEUR-WS.orgceur-ws.org/Vol-443/paper7.pdf · 2009-01-12 · Visual Sentiment Analysis of RSS News Feeds Featuring the US Presidential Election

Visual Sentiment Analysis of RSS News Feeds Featuring the US Presidential Election in 2008

Franz Wanner, Christian Rohrdantz, Florian Mansmann, Daniela Oelke, Daniel A. Keim
University of Konstanz, Germany

[email protected]

ABSTRACT
The technology behind RSS feeds offers great possibilities to retrieve more news items than ever. In contrast to these technical developments, human capabilities to read all these news items have not increased likewise. To bridge this gap, this paper presents a visual analytics tool for conducting semi-automatic sentiment analysis of large news feeds. While the tool automatically retrieves and analyzes RSS feeds with respect to positive and negative opinion words, the more demanding news analysis of finding trends, spotting peculiarities and putting events into context is left to the human expert. For a solid analysis the news similarity filter enables highlighting of similar or redundant news items. A case study about news related to the US presidential election in 2008 shows how the visual interface of the tool empowers the analyst to draw meaningful conclusions without the effort of reading all news postings.

Author Keywords
sentiment analysis, opinion mining, information visualization, visual analytics

ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: Miscellaneous

INTRODUCTION
The web is the largest information source in the world. One major aspect of the web is to bring news from all over the world via RSS feeds instantaneously to your screen. Apart from passive usage of the web as a medium, web 2.0 technology helps more and more people to actively contribute to this valuable information source by creating content in an easy way. There are many possibilities to take an active part in the web: blogs, reviews and other ways to state comments.

Analyzing news stories and user generated content is of huge importance for many people and organizations. Economic analysts, for example, would like to find consumer and public opinions on their products and services. Likewise, potential consumers seek experiences of existing users before making a purchase decision or afterwards to cope with the product’s shortcomings or praise its functionality. Furthermore, politicians want to find out their public reputation, the manner in which the news writes about them, and the public's reaction to these articles.

Workshop on Visual Interfaces to the Social and the Semantic Web (VISSW2009), IUI2009, Feb 8 2009, Sanibel Island, Florida, USA. Copyright is held by the author/owner(s).

Since public opinion polls are an expensive undertaking, our goal is to offer a semi-automatic approach by mining the web for particular key words, conducting sentiment analysis on the text to assess how positive or negative a particular news posting is, and then presenting the information in a visual exploration tool. While our approach is not suitable to completely replace a thoroughly conducted opinion poll due to the lack of accuracy, it also has some unique advantages, namely low costs and the possibility to continuously monitor a particular subject in real-time. Knowing at an early stage that consumers have a problem with a sub-component of a product gives the company more time to react appropriately and to avoid damage to valuable trademarks.

In this paper, we demonstrate a novel way of using text analysis methods in combination with a visual representation. On the one hand, this system automatically evaluates the emotional content of a news posting. On the other hand, the visual interface empowers the human expert to draw meaningful conclusions, to selectively read a few news postings with strong emotional content, to discover trends, and to gain an overview of the development of a chosen topic in the media.

To exemplify our tool we take a closer look at the news coverage on the web of the 2008 US presidential election. Out of 50 chosen political RSS news tickers, we retrieved all RSS articles containing at least one of the following key words: “Obama”, “McCain”, “Biden” and “Palin” as well as “Democrat” and “Republican”. Thereupon, the articles are automatically evaluated with respect to the contained positive and negative opinion words, resulting in a normalized sentiment score for each article.

For presentation purposes, these articles are then visualized on a daily timeline using symbols to encode the contained key words. The vertical position of each symbol is defined by the article’s sentiment score, which makes strongly emotional news more visible. Furthermore, we demonstrate an interactive feature to show relations between the news items to track the development of a specific topic.

The rest of this paper is structured as follows: In section Related Work, text and sentiment analysis methods and visual interfaces for them are discussed. The next section, Visual Sentiment Analysis, then presents our processing, visualization, and interaction approaches for analyzing the news coverage of the 2008 US presidential election. Afterwards, section Results shows how some interesting topics about the candidates and their parties manifest in our visualization. By summarizing our contributions we draw our conclusions in the last section.

RELATED WORK

Text Analysis
The visualization and visual analysis of textual data is increasingly attracting interest in different application domains. Many of the early approaches in that area dealt with the visualization of retrieval results (see e.g., VIBE [22] or InfoCrystal [27]). Furthermore, a variety of techniques concentrate on the visualization of large document collections, most of which are based on dimensionality-reduction methods (see e.g. WebSOM [23], Galaxies and ThemeScape of IN-SPIRE™ [30], or [9]). In contrast to this, text feature visualization techniques visualize single documents in detail and show the distribution of specific text features across the text. Prominent examples among these are TileBars [16], Seesoft [3], the FeatureLens [6], and Literature Fingerprinting [19]. Also worth mentioning are [1] and the Compus system of Fekete and Dufournaud [7]: as opposed to the other techniques, they offer the possibility to visualize several text features at once.

Relatively few approaches tackle the problem of visualizing temporal variations across a set of documents as we do in this paper. One example of such an approach is the well-known ThemeRiver visualization [15] that reveals the development of topics over time in a river-like graphic. According to the metaphor each topic is represented as one colored “current” in the “river” that flows in the direction of the timeline from left to right. To allow for several different themes to be displayed at once the currents are stacked on top of each other. The thickness of a current at a specific point in time represents the strength of the topic in the associated documents. TimeMines [28] and Narratives [8] are examples of visualizations that are based on standard line charts. TimeMines automatically determines keywords and judges those keywords with respect to their temporal significance in the context of the corpus. Furthermore, keywords that show a similar development over time are grouped to form a topic. Narratives presents the development of a specific topic over time and searches for correlated terms.

A similar concept is reported in [12]. The system BlogPulse (that can be found at www.blogpulse.com) monitors blogs and displays timelines that show how many blogs talk about a specific topic at a specific point in time. In addition, hot topics are detected automatically. All of the mentioned time-oriented approaches have a common limitation: They merely display the development of the significance of keywords or topics over time. Our approach goes beyond that by additionally revealing the sentiment of the documents.

Two further approaches related to our work are [2] and [11]. Both of them analyze blogs and / or newspaper articles with respect to their political orientation. However, neither of the approaches explores the development over time as we do. Instead they both focus on analyzing the link structure between the different blogs or, respectively, the citation patterns for newspaper articles. In addition, [11] takes into account how emotionally charged a post is.

Sentiment Analysis
Within the abundant literature that exists in the context of sentiment analysis and opinion mining, some major tasks can be identified:

• Classification of the statements of a document (or a sentence) as subjective or objective. (e.g. [29, 14])

• Classification of a document (or a sentence) as expressing a negative or positive sentiment (or opinion). (e.g. [25, 5])

• Feature-based opinion mining made up of two successive steps: First, the features (or attributes) that have been commented on are identified. Secondly, the respective opinion that has been expressed on them is detected. (e.g. [17, 18, 26, 21, 20])

Note that our approach does not contribute to the area of automatic sentiment analysis but makes use of some of its standard techniques. However, we contribute to the development of visualizations for sentiment analysis. Related work in this respect includes [10, 24, 13]. The visualization that most closely resembles our work can be found in [24]. The authors suggest using bars to visualize how many positive and negative statements commenting on one of the analyzed attributes of a product exist within the document corpus. Our work is similar in that we also use the vertical deflection of bars to encode the opinion that is expressed. In contrast to [24], however, in our case one bar represents one document instead of the summary of all sentences talking about a specific attribute of a product. Moreover, in our visualization the development over time is central, something that is completely omitted in all of the above mentioned approaches for sentiment analysis / opinion mining. In [10] customer reviews are visualized, too, but a Treemap representation is used to display the result of the analysis. Finally, [13] presents an adaptation of the Rose Plot visualizations to illustrate the affective content of a document. In addition to positive and negative sentiments, the documents are also analyzed with respect to the categories virtue, vice, pleasure, pain, power cooperative, and power conflict.

VISUAL SENTIMENT ANALYSIS
Data Processing
The data we used was gathered from 50 different RSS news feeds that mainly dealt with the 2008 US presidential elections. The RSS feeds were retrieved every 30 minutes during a time interval of one month (October 9 to November 10, 2008). For every news item in each feed we saved date, title and description, as well as the id of the feed. Next, noise was eliminated from the title and description. By noise we refer to strings that do not carry any content, such as URLs or strings consisting of special characters. The concatenation of title and description was then considered to be the content of the news item. Finally, we filtered out those documents that contained none of the following signal words: “Obama”, “McCain”, “Biden”, “Palin”, “Democrat” and “Republican”. More than 23,000 news items contained at least one of the six strings.
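The cleaning and filtering steps described above can be illustrated with a minimal Python sketch. The regular expressions, the function names (`clean`, `build_content`, `is_relevant`) and the case-sensitive substring match are assumptions made for illustration; the paper does not specify how noise was detected or how the signal words were matched.

```python
import re

# The six signal strings named in the text.
SIGNAL_WORDS = ("Obama", "McCain", "Biden", "Palin", "Democrat", "Republican")

# Assumed noise definitions: URLs and tokens made up only of special characters.
URL_PATTERN = re.compile(r"https?://\S+")
SPECIAL_ONLY = re.compile(r"[^A-Za-z0-9]+")


def clean(text):
    """Remove URLs and tokens that consist only of special characters."""
    text = URL_PATTERN.sub(" ", text)
    tokens = [t for t in text.split() if not SPECIAL_ONLY.fullmatch(t)]
    return " ".join(tokens)


def build_content(title, description):
    """Content of a news item: cleaned concatenation of title and description."""
    return clean(title + " " + description)


def is_relevant(content):
    """Keep an item if it contains at least one signal string (substring match)."""
    return any(word in content for word in SIGNAL_WORDS)


if __name__ == "__main__":
    content = build_content("Palin responds http://example.com/x", "*** More ---")
    print(content, is_relevant(content))
```

Because the match is on substrings, a word such as "Democrats" also satisfies the filter, which is consistent with the text speaking of "strings" rather than whole words.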

Pairwise similarities between news items were calculated by applying a similarity measure, which counts the number of non-stopwords that two items have in common (normalized by the length of the larger item). Although this is a relatively simple measure it works quite well for the short descriptive texts in the RSS news feeds.
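A minimal sketch of such a measure is given below. Several details are assumptions, since the text does not fix them: the tiny stopword list is a placeholder, words are compared as lowercase sets of distinct tokens, and the "length" of an item is taken to be its number of distinct non-stopwords.

```python
# Placeholder stopword list; a real list would be much longer (assumption).
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "to", "and", "is", "her", "as"})


def content_words(text):
    """Lowercased, punctuation-stripped set of non-stopwords."""
    words = (w.strip(".,:;!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def similarity(a, b):
    """Shared non-stopwords, normalized by the size of the larger item."""
    wa, wb = content_words(a), content_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / max(len(wa), len(wb))


if __name__ == "__main__":
    print(similarity("Palin abused power in Alaska",
                     "Probe finds Palin abused her power"))
```

In the example, the two headlines share three content words ("palin", "abused", "power") and the larger item has five, giving a similarity of 0.6.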

Another aspect of interest is the sentiment context of a news item, which we capture by enriching each item with a sentiment score. For this purpose we make use of a freely available list of words that evoke positive or negative associations [4]. We count the number of positive and negative words and evaluate the whole news item as rather positive if it contains in total more positive than negative words. Likewise, the item is evaluated as rather negative if it contains more negative than positive words. The relation of positive to negative words, normalized by the item’s length, provides our sentiment score. One important point to mention here is that the appearance of a candidate, e.g., in a negative context, does not necessarily mean that the item contains negative publicity for the candidate, but simply that he appears in a negatively connoted context. This becomes clear when we consider the example of news telling that racists planned to assassinate Obama (see section “Results”). This was bad news for Obama, not about Obama, with a visibly negative connotation.
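The scoring idea can be sketched as follows. The exact formula is not given in the text, so this sketch assumes the score is the difference between positive and negative word counts divided by the number of tokens; the two small word sets are hypothetical stand-ins for the word list cited above.

```python
# Hypothetical miniature lexicons standing in for the cited opinion word list.
POSITIVE = frozenset({"proper", "lawful", "win", "praise"})
NEGATIVE = frozenset({"abused", "plot", "assassinate", "scandal"})


def sentiment_score(text):
    """Assumed score: (positive count - negative count) / number of tokens."""
    tokens = [t.strip(".,:;!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


if __name__ == "__main__":
    print(sentiment_score("Palin abused power"))
    print(sentiment_score("Palin acted within proper and lawful authority"))
```

Under this assumption the score is negative when negative words dominate, positive when positive words dominate, and zero for a balanced or neutral item, which matches the qualitative description in the text.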

Data Visualization
The visualization on the one hand aims to give a meaningful representation of the data and on the other hand is intended to be an appropriate starting point for the interactive exploration and discovery of interesting patterns. Figure 4 shows a screenshot of the visualization. Each line represents one day and each colored object depicts one news item. The news item’s emotional score is encoded by a vertical displacement of the news item. Colors encode whether the text mentions the Democratic party, the Republican party or both. Additionally, the shape of the news objects visualizes whether the first candidate, the second candidate or only the name of the party itself was mentioned. The following passages describe each of those aspects in detail.

Placement
Every news item is represented by an object in a 2D plane. The position of the object within the plane depends on the date the news was published. The day it was published determines the line it will be placed in (as each line represents one day) and the time of day determines its horizontal position within the line. The exact vertical position depends on the sentiment score of the object. According to this value an object is slightly shifted up (positive) or down (negative). Horizontal lines mark the position that a news item would have that is neither positive nor negative.
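The placement rule can be written down as a small coordinate mapping. This is only a sketch: the function name `place`, the pixel dimensions, the clamping of the score to [-1, 1] and the screen convention that y grows downward are all assumptions, not details taken from the paper.

```python
from datetime import date, datetime


def place(published, score, first_day, width=1440.0, row_height=40.0, max_shift=15.0):
    """Map a news item to (x, y) on a day-per-row timeline.

    All sizes are hypothetical pixel values chosen for illustration.
    """
    row = (published.date() - first_day).days           # one row per day
    minutes = published.hour * 60 + published.minute
    x = width * minutes / (24 * 60)                     # time of day -> horizontal position
    baseline_y = row * row_height + row_height / 2      # neutral line of that day
    clamped = max(-1.0, min(1.0, score))
    y = baseline_y - clamped * max_shift                # positive scores move up (smaller y)
    return x, y


if __name__ == "__main__":
    print(place(datetime(2008, 10, 10, 12, 0), 0.5, date(2008, 10, 9)))
```

A neutral item lands exactly on the baseline of its day, so the horizontal reference lines mentioned in the text correspond to `baseline_y`.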

Coloring
Everything that is solely related to the conservatives (Republican party) is colored in red and everything purely related to the liberals (Democratic party) in blue. Gray news objects relate both to the liberals and the conservatives, which basically means that both camps are mentioned within the news’ content.

Shape
The use of different shapes for the object allows us to make a distinction between news items in which the first candidate of a party was mentioned, the second candidate but not the first candidate or none of them but only the name of the party. Figure 1 shows the visual appearance of the different shapes. Please note that we keep the horizontal interruptions that are utilized to mark news items that talk about the second candidate always at the same vertical position of each line (regardless of the vertical shift of the object that encodes the emotional score). This leads to a clear visual pattern of continuous white horizontal lines, if several neighboring objects refer to the second candidates only.

Figure 1. Symbols used to represent news items according to the appearance of certain keywords.
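The color and shape encodings amount to a lookup on the signal words found in an item. The sketch below is one possible reading: the dictionary layout, the function names and the shape labels are hypothetical, and the text does not say how the shape is chosen for gray items that mention both parties, so `shape_of` is evaluated per party.

```python
# Signal words per party as named in the text; the data layout is an assumption.
PARTY_TERMS = {
    "republican": {"first": "McCain", "second": "Palin", "party": "Republican"},
    "democratic": {"first": "Obama", "second": "Biden", "party": "Democrat"},
}
COLORS = {"republican": "red", "democratic": "blue"}


def mentioned_parties(content):
    """Parties for which any signal word occurs in the content."""
    return [p for p, terms in PARTY_TERMS.items()
            if any(t in content for t in terms.values())]


def color_of(content):
    """Red or blue for one camp, gray for both, None if no signal word occurs."""
    parties = mentioned_parties(content)
    if len(parties) == 2:
        return "gray"
    if len(parties) == 1:
        return COLORS[parties[0]]
    return None


def shape_of(content, party):
    """First candidate wins over second candidate, which wins over party name."""
    terms = PARTY_TERMS[party]
    if terms["first"] in content:
        return "first_candidate"
    if terms["second"] in content:
        return "second_candidate"
    if terms["party"] in content:
        return "party_only"
    return None


if __name__ == "__main__":
    text = "Probe accuses Palin of abuse of power"
    print(color_of(text), shape_of(text, "republican"))
```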

Opacity
We paint our news objects with a relatively low opacity. That means they are partly transparent, which comes with two advantages: First, the problem of overlapping news objects is reduced. In most cases every object is visible and can be differentiated clearly from its overlapping neighbors. Secondly, if multiple news items are put on top of each other, the overall opacity at this position increases, resulting in an object that is more opaque and can therefore be distinguished from objects that represent just one news item. Several feeds often bring the same news at nearly the same moment when the news is very important. That means that the more opaque news objects often represent news that is more important and more widely spread. Figure 2 visually illustrates the above mentioned design decisions.
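The accumulation effect can be quantified under the assumption that the objects are blended with standard "source-over" alpha compositing, which the paper does not state explicitly. In that model, n identical objects of opacity alpha drawn on top of each other yield a combined opacity of 1 - (1 - alpha)^n. The value 0.3 below is a hypothetical example.

```python
def stacked_alpha(alpha, n):
    """Combined opacity of n overlapping objects, each with opacity alpha.

    Assumes standard source-over compositing.
    """
    return 1.0 - (1.0 - alpha) ** n


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, round(stacked_alpha(0.3, n), 3))
```

With alpha = 0.3, one item has opacity 0.3, two stacked items about 0.51 and three about 0.66, so a story carried by several feeds at once becomes visibly darker than a single posting.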

[Figure 2 annotations: two horizontal lines represent one day; about one hour of the day; sentiment shift (+ / -); higher α-value: same news item from different feeds; highlighted news item; example items “Biden in neutral context”, “Democrats in negative context”, “McCain in positive context”.]

Figure 2. Semantics of the visualization

Interactive Visual Analytics
The visualization is designed for interactive data exploration. There are several possibilities to interact with the tool:

• Zooming: Continuous zooming allows the user to analyze certain parts at a greater level of detail.

• Details on demand: When the mouse is moved over a news object, a tooltip appears containing date, time, feed id, and content of the item.

• Similarity search: With a mouse click on a news object, the search for similar news items is started. The news item itself and every other news object that is related to it is highlighted (please refer to section “Data Processing” for our definition of similarity). Figure 3 shows an example.

• Filtering: The user can select the different candidates / parties they are interested in. Another possibility to reduce the number of items that are displayed is to select one specific RSS feed. Both filtering mechanisms can be used to analyze in detail the behaviour of one specific news provider or, respectively, the development of news for a subset of candidates and/or parties.

Figure 3. After selecting one news item, similar items are highlighted in yellow, enabling the user to track specific topics (low threshold) or redundant postings (high threshold).
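The similarity search and the filtering interaction can be sketched as two small selection functions. The `NewsItem` record, the function names and the toy `word_overlap` measure are hypothetical; in the tool, the similarity would be the measure defined in the Data Processing section, and the threshold values shown are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    feed_id: int
    content: str


def filter_items(items, keywords=None, feed_id=None):
    """Keep items from one feed and/or mentioning any of the selected keywords."""
    result = []
    for item in items:
        if feed_id is not None and item.feed_id != feed_id:
            continue
        if keywords and not any(k in item.content for k in keywords):
            continue
        result.append(item)
    return result


def highlight_similar(items, selected, similarity, threshold):
    """Selected item plus every item whose similarity reaches the threshold."""
    return [it for it in items
            if it is selected or similarity(selected.content, it.content) >= threshold]


def word_overlap(a, b):
    """Toy similarity for the demo: shared words over the larger word set."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), len(wb), 1)


if __name__ == "__main__":
    items = [
        NewsItem(37, "Palin denounces her critics as cowardly"),
        NewsItem(39, "Palin denounces her critics as cowardly"),
        NewsItem(31, "Palin fires back at leaks"),
        NewsItem(32, "Obama wins election"),
    ]
    print(len(highlight_similar(items, items[0], word_overlap, 0.9)))  # redundant postings
    print(len(highlight_similar(items, items[0], word_overlap, 0.1)))  # broader topic
```

A high threshold only picks up near-identical postings from other feeds, while a low threshold also pulls in loosely related items, which mirrors the two uses named in the caption of Figure 3.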

RESULTS
First of all, we present an overview of all 50 monitored RSS feeds over a time period of 31 days in Fig. 4. A predefined filter displays all news postings containing at least one of the terms Obama, McCain, Biden, Palin, Democrat, and Republican. To exemplify our Visual Analytics technique, we picked five interesting discussions in the monitored RSS feeds.

Palin abused power in Alaska
On Friday, 10th October, many negative news postings about Sarah Palin occurred. Almost all articles deal with the question of whether Sarah Palin had abused her power in Alaska. As demonstrated in Fig. 5 there is a high density of red shapes with two white bars symbolizing news postings about Palin. Their positions below the baseline denote that mainly negative emotion words were used in these postings. Only one exceptionally positive red news item sticks out in the visualization. A closer look at this posting reveals that it is a response from the McCain-Palin presidential campaign: “Sarah Palin acted ‘within proper and lawful authority’ in removing the state’s public safety commissioner”.

Fri Oct 10 19:24:20 CST 2008 (Feed 49): Alaska panel finds Palin abused power in firing: ANCHORAGE, Alaska (AP) -- A legislative committee investigating Alaska Gov. Sarah Palin has found she unlawfully abused her authority in firing the state's public safety commissioner. The investigative report concludes that a family grudge wasn't the sole reason for firing Public Safety Commissioner Walter Monegan but says it likely was a contributing factor....

Fri Oct 10 19:41:49 CST 2008 (Feed 19): Palin abused power Alaska 'Troopergate' probe finds: AFP - Republican vice-presidential nominee Sarah Palin abused her position as Alaska Governor by pressuring officials to dismiss a state trooper, an investigator's report said.

Fri Oct 10 22:15:22 CST 2008 (Feed 39): Palin says report says she acted lawfully (Reuters): Reuters - Alaska Gov. Sarah Palin acted "within proper and lawful authority" in removing the state's public safety commissioner, the McCain-Palin Republican presidential ticket said on Friday in response to a state report.

Fri Oct 10 21:06:44 CST 2008 (Feed 18): Probe accuses Palin of abuse of power (AFP): AFP - Investigators found vice presidential nominee Sarah Palin abused her powers as Alaska governor, dealing another blow to Republican John McCain's struggling White House bid.

Fri Oct 10 21:50:40 CST 2008 (Feed 32): Alaska ethics probe says Palin abused her power: CHILLICOTHE, Ohio (Reuters) - An Alaska ethics inquiry found on Friday that U.S. Republican vice presidential candidate Sarah Palin abused her power as the state's governor, casting a cloud over John McCain's controversial choice of running mate for the November 4 election.

Figure 5. Media coverage dealing with the topic of Sarah Palin’s abuse of power as a governor of Alaska.

Bad news for the Democrats
Approximately one week before the US presidential election we detected a high number of news items that included “Obama” (see Fig. 6). The sentiment scores of these postings were mainly negative and dealt with a plot to assassinate Barack Obama and 102 black people. Note that the news is bad for him but not about him, meaning that a negative event is related to him in the news postings although the negative opinion words do not refer to him as a person.

The emotion words used were so strong that even in the overview it is possible to recognize the emergence of the negative news of that event on 28th of October (see Fig. 4). Note that although each RSS posting only consists of a few sentences, the few contained positive or negative opinion words are sufficient to provide clear results. Further headlines of that day discuss the corruption scandal of a Democratic senator and result in negative headlines for the Democrats.

Mon Oct 27 14:24:25 CST 2008 (Feed 37): ATF disrupts skinhead plot to assassinate Obama (AP): AP - The ATF says it has broken up a plot to assassinate Democratic presidential candidate Barack Obama and shoot or decapitate 102 black people in a Tennessee murder spree.

Mon Oct 27 15:45:26 CST 2008 (Feed 38): Assassination plot targeting Obama disrupted (AP): AP - Law enforcement agents have broken up a plot by two neo-Nazi skinheads to assassinate Democratic presidential candidate Barack Obama and shoot or decapitate 88 black people, the Bureau of Alcohol, Tobacco Firearms and Explosives said Monday.

Mon Oct 27 16:45:39 CST 2008 (Feed 31): Skinheads held over Obama death plot: WASHINGTON (Reuters) - Two white supremacist skinheads were arrested in Tennessee over plans to go on a killing spree and eventually shoot Democratic presidential candidate Barack Obama, court documents showed on Monday.

Figure 6. Democrats appear in “negative context”. Bad news for Obama, but not about him.

Figure 4. 31 days of the 2008 US presidential election showing a scandal of power abuse by Palin (A), the TV debate McCain vs. Obama (B), assassination plans against Obama (C), the election day (D), and a debate about Palin’s election wardrobe (E).

TV debate Obama vs. McCain
In the middle of October the final TV debate between the Democratic candidate Barack Obama and the Republican candidate John McCain was held. As shown in Fig. 7, news postings of the event cover both candidates (gray) and generally have low sentiment scores due to the criticism of both candidates against each other. The debate revealed little novelty with respect to each candidate’s political plans after the election. Therefore, there were no strong positive statements about the event in the monitored feeds.

Wed Oct 15 22:20:20 CST 2008 (Feed 32): McCain and Obama battle in contentious debate: HEMPSTEAD, New York (Reuters) - Republican John McCain and Democrat Barack Obama battled fiercely on Wednesday in their liveliest and most contentious debate, with McCain attacking Obama's tax plan, campaign tone and relationship with a 1960s radical.

Wed Oct 15 22:36:54 CST 2008 (Feed 34): Obama, McCain Get Feisty in Final Presidential Debate: Final Presidential Debate: Candidates mix it up on campaign attacks economics, taxes, "Joe the plumber."

Figure 7. TV debate

Obama wins the election
As can be seen in annotation D in Fig. 4, the election day is dominated by gray bars. This is due to the fact that these news postings reported on election results in particular states, featuring scores of both candidates. In the evening of the election day many news postings were received about the winner Barack Obama. The density of news about the Democrats increased rapidly after the result was known and dominated the news for several days.

Palin’s wardrobe
Although the number of blue shapes increased immensely after the election, some red negatively rated items stick out (see Fig. 8). These outliers deal with some critical notes about the expensive wardrobe, which was bought by Sarah Palin for her campaign, and her inappropriate use of language describing her critics.

Fri Nov 07 16:01:19 CST 2008 (Feed 37): Palin denounces her critics as cowardly (AP): AP - Alaska Gov. Sarah Palin is striking back at critics of the high-priced wardrobe she wore as the Republican vice presidential candidate....

Fri Nov 07 15:40:35 CST 2008 (Feed 23): GOP tries to sort out Palin's donor-funded duds: WASHINGTON (AP) -- Republican Party lawyers are still trying to determine exactly what clothing was purchased for Alaska Gov. Sarah Palin, what was returned and what has become of the rest.....

Fri Nov 07 17:56:01 CST 2008 (Feed 31): Palin fires back at leaks questioning her smarts: WASHINGTON (Reuters) - Alaska Gov. Sarah Palin fired back on Friday against post-election claims by aides to Republican presidential candidate John McCain that she thought Africa was a country, not a continent, calling the anonymous sources "jerks."

Fri Nov 07 16:38:59 CST 2008 (Feed 39): Palin denounces her critics as cowardly (AP): AP - Alaska Gov. Sarah Palin called her critics cowards and jerks Friday for deriding her anonymously and insisted she never asked for the expensive wardrobe purchased for her use on the presidential campaign.

Figure 8. Palin under attack after the elections.

Further trends
The Democratic vice presidential candidate Joe Biden, who is represented by blue bars with two interruptions, was not referenced often. As can be seen in Fig. 4, he appears very rarely compared to the Republican vice presidential candidate Sarah Palin.

A further discovery was that some feeds show daily patterns. For example, one RSS feed only sent messages in the morning at about 7 AM, others broadcast their news during working hours, and some feeds even switched their coverage of political events within daily patterns, which is probably due to two editors each preferring news about one party and taking turns in writing news postings.
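Such daily patterns can be checked by counting postings per feed and hour of day. The following is a minimal sketch, not the tool's implementation: it assumes posting headers in the format shown in the Fig. 8 excerpts, and the names `parse_header` and `hourly_profile` are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Header format as printed in the Fig. 8 excerpts, e.g.
# "Fri Nov 07 16:01:19 CST 2008 (Feed 37):..."
HEADER = re.compile(
    r"^(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4}) \(Feed (\d+)\):"
)

def parse_header(line):
    """Return (feed_id, datetime) for one posting header, or None."""
    match = HEADER.match(line)
    if match is None:
        return None
    stamp, year, feed = match.groups()
    # Assumption: the time zone token is dropped, since all excerpts share one zone.
    when = datetime.strptime(f"{stamp} {year}", "%a %b %d %H:%M:%S %Y")
    return int(feed), when

def hourly_profile(lines):
    """Count postings per feed and hour of day (0-23)."""
    profile = defaultdict(Counter)
    for line in lines:
        parsed = parse_header(line)
        if parsed is not None:
            feed, when = parsed
            profile[feed][when.hour] += 1
    return profile
```

A feed that only posts around 7 AM would then show a single dominant hour in its counter, whereas a feed active during working hours would spread over several consecutive hours.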

Often, the same news story is broadcast in many different feeds (e.g., the above-mentioned news about Palin’s wardrobe). This is mainly because some feeds immediately broadcast the news copied from a particular news agency, whereas other feeds broadcast this information later. Another feed resent the same news posting several times, as shown in Fig. 9.

Figure 9. Technical failure or search engine optimization resulting in resending the same news postings over and over again.

CONCLUSIONS
The main contribution of this paper is the combination of a sentiment analysis method with a visualization technique revealing the emotional content of RSS news feeds over time. Through textual filters, we focused our analysis on the 2008 US presidential election, featuring positive and negative news items about the presidential candidates Obama and McCain, the vice presidential candidates Biden and Palin, and the two major parties. The timeline visualization builds upon three basic elements: first, color denotes the political party featured in the news article; second, different shapes distinguish between the discussed persons; and third, the emotional score of each RSS news article determines the vertical position of the representative symbol on the timeline.
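The three visual elements can be read as a mapping from a news item to a glyph. The sketch below illustrates that mapping under loud assumptions: the word lists `POSITIVE` and `NEGATIVE` are tiny hypothetical stand-ins for an opinion word lexicon, the score is a plain difference of word counts, and shapes are named after the person rather than after the actual symbols used by the tool.

```python
import re
from dataclasses import dataclass

# Assumption: hypothetical lookup tables for illustration only.
PARTY_OF = {"obama": "democrat", "biden": "democrat",
            "mccain": "republican", "palin": "republican"}
COLOR_OF = {"democrat": "blue", "republican": "red"}
POSITIVE = {"wins", "great", "praise"}
NEGATIVE = {"cowardly", "attack", "critics"}

@dataclass
class Glyph:
    color: str   # party: blue, red, or gray when both parties appear
    shape: str   # person discussed, or "mixed" for several persons
    y: int       # sentiment score, used as vertical position

def encode(text):
    """Map one news item to a Glyph, or None if no candidate is mentioned."""
    words = re.findall(r"[a-z]+", text.lower())
    persons = sorted({w for w in words if w in PARTY_OF})
    if not persons:
        return None
    parties = {PARTY_OF[p] for p in persons}
    color = COLOR_OF[next(iter(parties))] if len(parties) == 1 else "gray"
    shape = persons[0] if len(persons) == 1 else "mixed"
    # Assumed scoring: positive word count minus negative word count.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return Glyph(color, shape, score)
```

Items that mention candidates of both parties come out gray, which matches the gray elements described for the election day and the final TV debate.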

Within the result section, we showed how some emotional discussions manifested in our news visualization: 1) Palin abused power in Alaska, which resulted in many negative news items and her own version sticking out as a highly positive article. 2) The story about assassination plans against Obama dominated the news for several hours with highly negative sentiment scores. 3) The final TV debate consisted mainly of gray elements, since reports featured both candidates. In general, the accusations of both candidates against each other resulted in more negative than positive sentiment scores. 4) Obama wins the election, which is documented by the vast dominance of blue news elements on the evening of the election day and the following days. 5) Even after the election, a discussion about Palin's expensive wardrobe fills negative headlines.

The tool’s interaction concept shows the corresponding RSS news articles when the mouse is moved over a symbol on the timeline. To find redundant or similar news items in the process of analyzing particular events, we furthermore implemented a simple document similarity filter, which, after selecting a particular news item, highlights all related news postings surpassing a certain threshold of similarity.
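A threshold-based similarity filter of this kind can be sketched as follows. The paper describes its measure only as simple word matching, so the Jaccard overlap of word sets used here is an assumption, as are the function names and the default threshold of 0.5.

```python
import re

def word_set(text):
    """Lowercased set of alphabetic words in a news item."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of words common to both sets; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_items(selected, items, threshold=0.5):
    """Indices of items whose similarity to `selected` surpasses the threshold."""
    base = word_set(selected)
    return [i for i, item in enumerate(items)
            if jaccard(base, word_set(item)) > threshold]
```

For example, with the items `"Palin denounces her critics as cowardly"`, the same headline followed by `"(AP)"`, and `"Obama wins the election"`, selecting the first item highlights the first two and leaves the third unmarked.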

We believe that the presented analysis tool can not only be used to monitor public emotional discussions, but is also capable of evaluating product reviews, assessing public opinion on a particular subject, or giving hints about the reputation of an enterprise. By offering sentiment analysis functionality for a multitude of large RSS feeds in real time, the technique lets users take early action, such as reacting before a topic dominates news coverage. This strategic dimension of our application is very valuable for public relations specialists and could be implemented in early warning systems. Furthermore, we expect the tool to be useful for monitoring the evolution of the popularity of certain products, persons, or views, ultimately answering the question of why a positive public image turned into a negative one.

Future Work
For computing the similarity between news items, we used a simple word matching method. Because many news items are copied from other news tickers, related RSS postings are often based on the text of the same newswire announcement and therefore often contain almost identical vocabulary. For the analysis of other content, such as product reviews or the full articles linked in the RSS tickers, more complex document similarity measures could be employed. Furthermore, we believe that more sophisticated sentiment analysis methods can be integrated into the presented analysis tool.

Acknowledgement
This work has been funded by the research center “Computational Analysis of Linguistic Development” at the University of Konstanz and by the German Research Society (DFG) under the grant GK-1042, Explorative Analysis and Visualization of Large Information Spaces, Konstanz. We thank the anonymous reviewers of the VISSW 2009 for their valuable comments.

REFERENCES1. A. Abbasi and H. Chen. Categorization and analysis of

text in computer mediated communication archivesusing visualization. In JCDL ’07: Proceedings of the2007 conference on Digital libraries, pages 11–18,New York, NY, USA, 2007. ACM.

2. L. A. Adamic and N. Glance. The political blogosphereand the 2004 U.S. election: divided they blog. InLinkKDD ’05: Proceedings of the 3rd internationalworkshop on Link discovery, pages 36–43. ACM, 2005.

3. T. Ball and S. G. Eick. Software Visualization in theLarge. IEEE Computer, 29(4):33–43, 1996.

4. V. Buvac. Internet General Inquirer, 2008.http://www.webuse.umd.edu:9090/ as retrieved on Nov.14, 2008.

5. K. Dave, S. Lawrence, and D. M. Pennock. Mining thepeanut gallery: opinion extraction and semanticclassification of product reviews. In WWW ’03:Proceedings of the 12th international conference onWorld Wide Web, pages 519–528. ACM, 2003.

6. A. Don, E. Zheleva, M. Gregory, S. Tarkan, L. Auvil,T. Clement, B. Shneiderman, and C. Plaisant.Discovering interesting usage patterns in textcollections: integrating text mining with visualization.In CIKM ’07: Proceedings of the sixteenth ACMconference on Conference on information andknowledge management, pages 213–222. ACM, 2007.

7. J.-D. Fekete and N. Dufournaud. Compus: visualizationand analysis of structured documents for understandingsocial life in the 16th century. In DL ’00: Proceedingsof the fifth ACM conference on Digital libraries, pages47–55, New York, NY, USA, 2000. ACM.

8. D. Fisher, A. Hoff, G. Robertson, and M. Hurst.Narratives: A Visualization to Track Narrative Eventsas they Develop. In IEEE Symposium on VisualAnalytics and Technology (VAST 2007), pages115–122, 2008.

7

Page 8: Visual Sentiment Analysis of Web Blogs - CEUR-WS.orgceur-ws.org/Vol-443/paper7.pdf · 2009-01-12 · Visual Sentiment Analysis of RSS News Feeds Featuring the US Presidential Election

9. B. Fortuna, D. Mladenic, and M. Grobelnik.Visualization of Text Document Corpus. InformaticaJournal, 29(4):497–502, 2005.

10. M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger.Pulse: Mining Customer Opinions from Free Text. InAdvances in Intelligent Data Analysis VI, pages121–132. Springer, 2005.

11. M. Gamon, S. Basu, D. Belenko, D. Fisher, M. Hurst,and A. C. Konig. BLEWS: Using Blogs to ProvideContext for News Articles. In ICWSM, 2008.

12. N. Glance, M. Hurst, and T. Tomokiyo. BlogPulse:Automated Trend Discovery for Weblogs. In WWW2004 Workshop on the Weblogging Ecosystem. ACM,May 2004.

13. M. L. Gregory, N. Chinchor, P. Whitney, R. Carter,E. Hetzler, and A. Turner. User-directed SentimentAnalysis: Visualizing the Affective Content ofDocuments. In Workshop on Sentiment and Subjectivityin Text, pages 23–30, 2006.

14. V. Hatzivassiloglou and J. Wiebe. Effects of adjectiveorientation and gradability on sentence subjectivity,2000.

15. S. Havre, E. Hetzler, P. Whitney, and L. Nowell.ThemeRiver: Visualizing Thematic Changes in LargeDocument Collections. IEEE Transactions onVisualization and Computer Graphics, 8(1):9–20, 2002.

16. M. A. Hearst. TileBars: Visualization of TermDistribution Information in Full Text InformationAccess. In Proceedings of the Conference on HumanFactors in Computing Systems, CHI’95, 1995.

17. M. Hu and B. Liu. Mining and summarizing customerreviews. In KDD ’04: Proceedings of the tenth ACMSIGKDD international conference on Knowledgediscovery and data mining, pages 168–177. ACM,2004.

18. M. Hu and B. Liu. Mining Opinion Features inCustomer Reviews. In AAAI, pages 755–760, 2004.

19. D. A. Keim and D. Oelke. Literature Fingerprinting: ANew Method for Visual Literary Analysis. In EEESymposium on Visual Analytics and Technology (VAST2007), pages 115–122, 2007.

20. S.-M. Kim and E. Hovy. Extracting Opinions, OpinionHolders, and Topics Expressed in Online News MediaText. In Proceedings of the ACL Workshop onSentiment and Subjectivity in Text, pages 1–8, 2006.

21. N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, andT. Fukushima. Collecting Evaluative Expressions forOpinion Extraction. In IJCNLP, pages 596–605, 2004.

22. R. R. Korfhage. To see, or not to see – is That thequery? In SIGIR ’91: Proceedings of the 14th annualinternational ACM SIGIR conference on Research anddevelopment in information retrieval, pages 134–141.ACM Press, 1991.

23. K. Lagus, T. Honkela, S. Kaski, and T. Kohonen.Self-organizing maps of document collections: A newapproach to interactive exploration. In E. Simoudis,J. Han, and U. Fayyad, editors, Proceedings of theSecond International Conference on KnowledgeDiscovery and Data Mining, pages 238–243. AAAIPress, 1996.

24. B. Liu, M. Hu, and J. Cheng. Opinion observer:analyzing and comparing opinions on the Web. InWWW ’05: Proceedings of the 14th internationalconference on World Wide Web, pages 342–351. ACM,2005.

25. B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?:sentiment classification using machine learningtechniques. In EMNLP ’02: Proceedings of the ACL-02conference on Empirical methods in natural languageprocessing, pages 79–86. Association forComputational Linguistics, 2002.

26. A.-M. Popescu and O. Etzioni. Extracting productfeatures and opinions from reviews. In HLT ’05:Proceedings of the conference on Human LanguageTechnology and Empirical Methods in NaturalLanguage Processing, pages 339–346. Association forComputational Linguistics, 2005.

27. A. Spoerri. InfoCrystal: a visual tool for informationretrieval & management. In CIKM ’93: Proceedings ofthe second international conference on Information andknowledge management, pages 11–20. ACM, 1993.

28. R. Swan and D. Jensen. TimeMines: ConstructingTimelines with Statistical Models of Word Usage,2000.

29. B. Wang, B. Spencer, C. X. Ling, and H. Zhang.Semi-supervised Self-training for Sentence SubjectivityClassification, pages 344–355. Lecture Notes inComputer Science. Springer Berlin / Heidelberg, 2008.

30. J. A. Wise, J. J. Thomas, K. Pennock, D. Lantrip,M. Pottier, A. Schur, and V. Crow. Visualizing thenon-visual: spatial analysis and interaction withinformation from text documents. In INFOVIS ’95:Proceedings of the 1995 IEEE Symposium onInformation Visualization, pages 51–58, 1995.

8