Yahoo! Engagement Study

TECHNICAL REPORT

YL-2010-008

EDISCOPE: SOCIAL ANALYTICS FOR ONLINE NEWS

Yury Lifshits
Santa Clara, CA 95054

{[email protected]}

December 20, 2010

Bangalore • Barcelona • Haifa • Montreal • New York • Santiago • Silicon Valley


Yahoo! Labs Technical Report No. YL-2010-008


ABSTRACT: We present Ediscope, a system for measuring social engagement around online news articles. Ediscope collects signals from Twitter, Facebook and Bit.ly. Using our link spotter and social crawler we address a number of questions. What is the lifespan of a typical news story? What are the typical engagement numbers per pageview? Can social signals be used for pageview estimates? How much improvement can social optimization bring to a news source? Our first results indicate that less than 20% of activity around an article happens after its first 24 hours. On average, a story has 5-20 social actions per 1000 pageviews. For most feeds, the top 7 stories of a week capture 65% of Facebook actions and 25% of retweets. The correlation between pageviews and social signals is surprisingly low. Our measurements indicate a double-digit improvement potential for social optimization.


1. Introduction

Online news is on its way to becoming our primary source of information. In order to win the competition and delight users, the editors of online news have to constantly optimize their content strategy. Content strategy is a new applied discipline that addresses the following questions: What should we write about? How many articles per day? How should we allocate coverage shares between main topics? How do we discover breaking stories? Which stories should we promote within a website? What is the most effective navigation structure for our content? Next to content strategy, there is the emerging field of social media optimization (SMO): How do we maximize engagement? How do we maximize secondary traffic from social sources (Facebook, Twitter)? How do we grow the number of followers, subscribers and fans?

To solve the problems of content strategy and social media optimization, one needs both art and science. As web news is inherently more measurable than print news, the role of science is increasing. Until recently, most solutions were based on click-through rates, time spent, eye tracking and pageviews. This information is typically available only to website owners. Therefore, it was hard to create generic measurement and optimization solutions. Fortunately, in the last couple of years, social signals emerged as a universal and public feedback mechanism. In this paper, we present a study based on Facebook likes, links in Twitter, and clicks on Bit.ly links. The availability of social signals for content strategy problems created the new research direction of social media analytics [1].

The questions we address in this study: For how long does an average article receive user attention? Can we estimate pageview counts from social signals? Can social signals be used to promote the best stories? Should editors focus on producing better content or on producing more content? How much improvement can it bring?

Contribution. Our first contribution is the data engineering infrastructure we built for the project. The Ediscope system has modules for link discovery, signal monitoring, statistical analysis and visualization. Ediscope data and a lookup tool are available at http://ediscope.labs.yahoo.net. Currently, the Ediscope toolkit is available on request, as it is subject to third-party API rate limits. Feel free to contact Yury at [email protected] to use Ediscope for your project or to order a custom report on your favorite news source.

Our most surprising finding is the low correlation between social signals and the actual pageview counts. The gap is especially large for non-top news, where the Pearson coefficient approaches 0.5. To understand the role of these low correlations we introduce a simple user experience model. Under this model we demonstrate a potential for double-digit improvement at Gawker, Business Insider, Change.org and Forbes blogs.

On average we see around 10 Facebook/Twitter actions per 1000 pageviews. Correlation between social activities is higher for top news than in the average case. Mainstream sources have much more Facebook activity than mentions on Twitter; tech media shows the opposite pattern. Facebook actions are much more skewed toward top news. Finally, Twitter signals have slightly better correlation to pageview counts.


Our results show that, almost universally across news sources, less than 20% of activity happens after the first 24 hours. Feeds and frontpages drive attention to the latest content units. Search brings traffic to “evergreen” content like Wikipedia. But there is no driver for materials with a mid-range (few weeks to few months) lifespan. Perhaps we need a new promotion mechanism for this type of content.

Remark on focus. When scientists work with real-world data there are two mindsets. One can focus on hard/intelligent tasks like model fitting and parameter prediction. This approach makes it easy to judge the project by comparing the accuracy of results to previous work. The other approach is to measure the raw signals and turn them into actionable insights for domain experts. In this case, the findings are judged by the novelty of the measurements and the importance of the resulting recommendations. This study follows the second approach. Here are our takeaway lessons for editors and product managers of online news:

Create new promotion mechanisms for in-depth content. At the moment there is no middle ground between breaking news and reference content. Perhaps we need dedicated feed, section and frontpage modules that highlight articles with a mid-range lifespan.

Use social signals for content optimization. There is a serious gap between which content units are most liked and which content units receive the most pageviews. In other words, user experience can be improved by using Facebook likes and retweet counts to promote the most popular content.

Check your engagement scores. If you see fewer than 10-20 social actions per 1000 pageviews, your sharing functionality can be improved. Typically, it is as simple as making the buttons the right size, placing them in the right spot and minimizing the number of clicks needed to share your content.

Check your head/tail structure. If you have a heavy head, improvements in quality and promotion mechanisms should be your priorities. If you have a heavy tail, your best opportunity is in expanding content production. According to our measurements, one has a heavier-than-typical head if over 75% of weekly Facebook actions or over 45% of weekly retweets are concentrated in the top 7 articles.
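The last two checks can be expressed as small helper functions. The following is an illustrative sketch, not Ediscope code; the function names are ours, and the thresholds follow the text above:

```python
def engagement_per_1000(social_actions, pageviews):
    """Social actions per 1000 pageviews; values below roughly 10-20
    hint at sharing-functionality problems."""
    return 1000.0 * social_actions / pageviews

def has_heavy_head(weekly_counts, top_n=7, threshold=0.75):
    """True if the top_n articles capture more than `threshold` of the
    weekly social actions (0.75 for Facebook, 0.45 for retweets)."""
    counts = sorted(weekly_counts, reverse=True)
    return sum(counts[:top_n]) / sum(counts) > threshold
```

For example, a feed where one article collects 80 of 100 weekly actions has a heavy head under either threshold.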

1.1. Related work

Social signals (Facebook likes, retweet counters, Bit.ly click counters) are a relatively new phenomenon. In particular, the Facebook Like button was introduced in May 2010, just 6 months prior to this paper. Until now, social analytics research has centered around text-based signals [6, 7, 8]. To our knowledge, we present the first temporal study of Facebook like counts.

Before social signals, researchers were looking into comment counts, Digg counts and Youtube view counts. Tsagkias, Weerkamp and de Rijke developed several algorithms to predict the total volume of comments shortly after publication [12, 13]. Paul Ogilvie measured and modeled total comment counts across various RSS feeds as a part of the FeedHub project [9]. Cha, Kwak, Rodriguez, Ahn, and Moon performed a long-tail analysis of Youtube and Daum videos [3]. Avramova, Wittevrongel, Bruneel and De Vleeschauwer developed a classifier that distinguishes videos with exponential and power-law popularity decays [2]. Salman and Rangwala showed how to predict a total Digg count shortly after publication [10]. Spiliotopoulos studied correlations between Digg counts and comment counts for the most popular stories [11].

The key advantage of social signals compared to comment/Digg/Youtube counts is their universality. Only now can one develop optimization/prediction/recommendation systems that are applicable to any news source on the Web.

2. Overview of Ediscope System

2.1. Architecture of Ediscope.

For our study we implemented a new social analytics system called Ediscope. It has four primary components. The link spotting tool takes RSS feeds as input and checks them regularly to spot new links. In many cases, RSS feeds present proxy links in order to measure clicks from RSS readers; in particular, Feedburner and Pheedo do that. In these cases we convert proxy links to the original ones. The second component is the signal crawler. It takes news URLs and calls public APIs (Facebook, Bit.ly, TweetMeme) to retrieve the current numbers for a given story. We also implemented custom scraping for pageview counts. After that, we have a monitoring component that re-crawls active links in our database regularly (by default, every hour). Ediscope's monitor computes the deltas to the previous crawl for measuring activity over the last interval. The monitoring functionality is used for temporal analysis of social engagement. Finally, we call the Google Chart API for dynamic visualization of results at Ediscope's website.
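To make the monitoring step concrete, here is a minimal sketch of the delta computation described above. The `fetch_counts` function is a hypothetical stand-in for the actual Facebook/Bit.ly/TweetMeme API calls, which are not detailed in this report:

```python
def fetch_counts(url):
    """Placeholder for the public API calls (Facebook, Bit.ly,
    TweetMeme) that return current cumulative counts for a story."""
    raise NotImplementedError

def monitor_pass(urls, prev, fetch=fetch_counts):
    """One monitoring pass: re-crawl active links and compute deltas
    against the previous crawl, updating `prev` in place."""
    deltas = {}
    for url in urls:
        current = fetch(url)
        deltas[url] = {k: current[k] - prev.get(url, {}).get(k, 0)
                       for k in current}
        prev[url] = current
    return deltas
```

Running such a pass every hour yields the temporal activity profiles used in Section 3.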

In its current form, Ediscope has certain limitations. First of all, the APIs we use have strict rate limits; in particular, TweetMeme only allows 250 requests per 60-minute period. This forced us to focus on smaller datasets. Secondly, the same news article can be represented by several URLs. Sometimes Facebook, Bit.ly or Twitter fail to recognize these links as the same object. As a result, the APIs return lower engagement numbers, missing likes, clicks and retweets on non-canonical versions of an article. E.g., the Wall Street Journal has different URLs for a story when you visit it directly vs. when you visit it from the frontpage. Next, many top websites do not have RSS feeds or their feeds do not work properly. For example, Yahoo's Today module, the central piece of its frontpage, does not have a feed. In these cases, one has to use manual lookups or scraping. Finally, Ediscope uses a pull mechanism to discover new stories. By the time we add an article to our system, around 15% of its social activity has already happened. In the future, push mechanisms such as PubSubHubbub can be used to address this issue.

There are several commercial systems in the space of social analytics. PostRank is a proprietary article ranking algorithm that takes social signals into account. BackType is a lookup system that retrieves the current values of social metrics; unlike Ediscope, it does not have fully accessible temporal profiles or pageview extractor modules. Klout uses social signals to rate news sources and Twitter personalities.


2.2. Datasets.

We created three datasets for our study: a temporal set, a pageview set, and a head/tail set. For temporal analysis we selected 10 RSS feeds from major US news sources. We used our link spotting module to discover 20 articles per source. The link spotter checked the RSS feeds every 10 minutes in order to discover articles almost immediately after publication. Then, we used our monitoring tool to update social counts every hour and compute the corresponding delta values. As a result we obtained temporal social profiles for 20 articles at each of 10 sources. For pageview analysis we consider four major content networks that explicitly show viewcounts on their articles: Business Insider, Gawker, Forbes Blogs and Change.org. For every network, we picked three RSS feeds, launched our link spotting module and kept it live until we had spotted around 50-75 articles per network. We then waited several days until the total social counts were close to their final values, and used our crawler to measure social counts and pageview counts for every article in our dataset. For head/tail analysis we looked at RSS feeds of several major news sources. For every publisher, we used the link spotter to get all articles from a one-week period (around 200 articles per feed). Then we crawled them once to collect social counts.

3. Empirical Study

3.1. Article Lifespan

In our temporal study we track 20 articles from each of the following sources: Washington Post, Gizmodo, CNN, MSNBC, HuffingtonPost, Yahoo News, New York Times, Engadget, Mashable, and TechCrunch. On average, every story has 901 Facebook actions (likes, shares and Facebook comments), 221 retweets and 660 clicks from Bitly-shortened links. The following table shows the percentage of activity in the first, second, third, fourth and fifth 24-hour interval after publication. Note that the total share of activity is significantly less than 100%. This is due to activity in the interval between the time a story was published and the time Ediscope discovered it. Out of all sources, Engadget articles have the slowest decay of activity and Yahoo News the sharpest.


Signals (average)    Day 1    Day 2    Day 3    Day 4    Day 5
Facebook             73.94    11.57     2.83     1.29     0.48
Twitter              70.71     5.11     1.72     0.69     0.37
Bitly                73.27     8.07     2.49     1.06     1.01

Engadget signals
Facebook             56.13    24.40     9.03     4.35     1.99
Twitter              71.27     9.24     4.12     1.28     0.71
Bitly                76.53    10.02     4.12     1.54     0.86

Yahoo News signals
Facebook             85.49     6.69     1.01     0.38     0.10
Twitter              84.80     4.21     0.33     0.13     0.00
Bitly                33.88     2.08     0.40     0.21     0.05

Figure 1: Average activity of an Engadget article during the first 68 hours of tracking. Deep blue represents Facebook, light blue represents Twitter, yellow represents Bit.ly.

Figure 2: Social activity of the Engadget article “BlackBerry users running out of loyalty”
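The daily shares reported above can be derived from the hourly deltas that Ediscope's monitor records. A minimal sketch (the function name and input format are our illustration, not part of Ediscope):

```python
def daily_shares(hourly_deltas, days=5):
    """Percentage of total social activity falling in each 24-hour
    interval after tracking began, given a list of hourly delta
    counts for one article and one signal."""
    total = sum(hourly_deltas)
    shares = []
    for d in range(days):
        day = sum(hourly_deltas[24 * d : 24 * (d + 1)])
        shares.append(100.0 * day / total if total else 0.0)
    return shares
```

An article whose entire activity falls in its first day would yield shares of 100, 0, 0, 0, 0.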


Here are our main observations:

• Majority (typically over 80%) of social activity happens during the first 24 hours.
• Monotonicity. The majority of shapes are monotone, or monotone after daytime correction (the bump-next-morning effect).
• Twitter is geeky. While mainstream sources like NYT, Yahoo, CNN, MSNBC and Washington Post have up to 10 Facebook actions for one retweet, TechCrunch and Mashable have more retweets than Facebook signals. The Facebook advantage over Twitter in mainstream news indicates that it can be a more reliable signal for content optimization solutions.
• Non-original content has lower activity. HuffingtonPost has two patterns: one for original posts, another for aggregated content. Five links from the TechCrunch feed are re-posts from CrunchGear and TechCrunch.EU and have much lower counts than TC-proper articles.
• User experience flaws. Sharing functionality can have a serious effect on the total amount of activity. In particular, at the New York Times the Twitter buttons do not directly tweet the story, but instead ask the reader to use Twitter to log into NYT.

The fact that most activity happens during the first day has serious implications for editors and product managers of online news. As our study shows, the currently used promotion mechanisms (feeds, frontpage promotions, cross-linking) are only capable of driving the first-day audience. In such an environment, weekly/analytic/evergreen content is highly discouraged and unsustainable. Thus, if a publisher wants to produce longer-lifespan articles, it should depart from existing content promotion strategies. On the positive side, we feel that the opportunity for high-quality weekly/monthly analytic content is wide open in almost every vertical.

3.2. Per-pageview Statistics

Several online content networks display actual pageview counts. This allows us to compute average amounts of social activity per 1000 pageviews. In some cases several top stories have a different activity pattern than the rest of the site. To get more robust results we compute averages both for the full sets of articles and for the sets excluding the top 10 articles.

Network             Facebook   Twitter   Bit.ly   FB (non-top)   TW (non-top)   BT (non-top)
Gawker                 24.59      4.66    13.36          11.55           4.74           2.65
Forbes blogs            4.61      9.16    41.41           5.13          11.86          29.00
Business Insider        3.08      6.40    34.37           3.90          28.99         106.47
Change.org              4.43      2.74     3.54           8.69           4.12           6.25

Then we look at the Pearson correlation coefficient between social signals and the actual pageview counts. We also compute correlations between Facebook and Twitter signals and between Bit.ly and Twitter signals.
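For reference, the Pearson coefficient used throughout this section can be computed as follows (a plain-Python sketch; in practice one would use a statistics library):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length series, e.g.
    per-article retweet counts vs. pageview counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```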


Network             FB / PV   TW / PV   BT / PV   FB / TW   BT / TW
Gawker                 0.92      0.95      0.93      0.95      0.95
Forbes blogs           0.35      0.40      0.63      0.34      0.63
Business Insider       0.93      0.54      0.65      0.65      0.87
Change.org            -0.01      0.45      0.05      0.34      0.65

Excluding top 10 news
Gawker                 0.47      0.63      0.41      0.47      0.35
Forbes blogs           0.12      0.34      0.55      0.31      0.56
Business Insider       0.34      0.43      0.53      0.50      0.80
Change.org             0.67      0.50     -0.09      0.47      0.75

To get a visual sense of the correlations we present plots for Gawker and Change.org. Absolute values are scaled to fit in the same space. The top-right point in the Gawker plot is in fact far outside the chart (Gawker has one outstandingly popular story).

Figure 3: Correlation between retweets and pageviews at Gawker network

Let us make some observations from the above tables:

• On average, articles have around 10 Facebook/Twitter actions per 1000 pageviews.
• With the exception of Facebook signals at Gawker, top news has fewer social actions per pageview than average stories.
• For non-top news, the correlation between social signals and pageviews is around 0.5. Recall that the Pearson coefficient ranges from -1 (perfectly negatively correlated) through 0 (totally independent) to 1 (perfectly positively correlated). Thus, a value of 0.5 means that social signals are as close to perfect correlation as they are to total independence.
• In 6 cases out of 8, retweets have a higher correlation to pageviews than Facebook actions.
• Change.org shows negative correlations in some cases: an article is more likely to get Facebook activity if it has fewer pageviews. It turns out that the “Social Entrepreneurship” section has many more pageviews but the same (or even slightly lower) Facebook counts. Once we remove articles from this section, the correlation returns to a positive value.
• As expected, Bit.ly clicks are better correlated to retweets than Facebook signals.


Figure 4: Correlation between pageviews and Facebook (dark blue), Twitter (light blue) and Bit.ly (yellow) signals at Change.org. The gap in pageviews represents the difference in popularity between different sections of the portal.

Looking at our per-pageview results, one can try to reconstruct pageview counts for the rest of the Web. The baseline guess would be around the Facebook count (or Twitter count) times 100. As our measurements show, there is a better chance of accurately predicting the pageviews for a top story than for an average article. And, looking at our lifespan study, we recommend Facebook over Twitter as the primary signal for mainstream sources.

What lessons can one learn from these measurements? At the moment the role of social traffic in overall article success seems to be very small. For an average story there is a very low correlation between social signals and its pageview count. When we include top stories in the picture, social activity per pageview actually goes down. These observations hint that factors other than likability and social cascades are playing the leading role in pageview success. As a result, traffic is allocated to not-so-likable stories.

Let us do the following thought experiment. Assume for a moment that the Facebook count or Twitter count represents the actual reader satisfaction score. Then we can compute the total user satisfaction score as the sum of products of pageviews and Facebook/Twitter counts. Now, let us reallocate pageview counts so that the top pageview value corresponds to the top Facebook count, the second highest to the second highest, and so on. Then we can calculate the “optimal” user satisfaction score. In other words, we want to check how much user benefit promotion-by-likability can bring to existing content networks. Below is the table of our results.
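The reallocation in this thought experiment amounts to a sorted pairing of pageview counts and like counts. A minimal sketch (the function name is ours):

```python
def satisfaction_gain(pageviews, likes):
    """Ratio of 'optimal' to actual total satisfaction, where
    satisfaction is sum(pageviews_i * likes_i) and the optimum pairs
    the k-th largest pageview count with the k-th largest like count."""
    actual = sum(p * l for p, l in zip(pageviews, likes))
    optimal = sum(p * l for p, l in zip(sorted(pageviews), sorted(likes)))
    return optimal / actual
```

By the rearrangement inequality, the sorted pairing maximizes the sum of products, so the ratio is always at least 1.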

Network             FB increase   TW increase   FB increase (non-top)   TW increase (non-top)
Gawker                    1.019         1.026                   1.330                   1.181
Forbes blogs              1.566         1.403                   1.796                   1.341
Business Insider          1.047         1.342                   1.402                   1.227
Change.org                2.346         1.245                   1.109                   1.110


As we see, all networks have double-digit potential to increase their user experience for non-top stories. Forbes blogs and Change.org can significantly increase overall user experience. Again, the Change.org results look a bit odd because it has two clusters of very different articles: one is more likable, the other gets more pageviews, so the overall experience can be improved significantly. Once we remove the top news (one cluster), the rest of the site can only achieve a 10-11% increase. Of course, our model of the user satisfaction score is an oversimplification, but it can be used as a first-order approximation of the possible improvement based on social signals.

3.3. Head vs. Tail Analysis

In our final experiment we collect links from RSS feeds at several US news sources over the course of one week. These feeds have from 64 to 226 items per week. Then, for every source, we retrieve and sort the social counts for the discovered articles. We compute the percentage of weekly social activity that corresponds to the top story, the top 7 stories, and all stories outside the top 7. We use the constant 7 as a reflection of a one-story-per-day strategy.
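The head/tail percentages can be computed from weekly per-article counts as follows (a sketch; the function name is our illustration):

```python
def head_tail_shares(counts, top_n=7):
    """Share (%) of weekly activity captured by the top story, the
    top_n stories, and everything outside the top_n."""
    c = sorted(counts, reverse=True)
    total = sum(c)
    top1 = 100.0 * c[0] / total
    head = 100.0 * sum(c[:top_n]) / total
    return top1, head, 100.0 - head
```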

Feed                  Articles tracked   Top item (FB / TW)   Top 7 (FB / TW)   The rest (FB / TW)
TechCrunch                         182          32.3 /  4.6        61.5 / 16.8          38.5 / 83.2
Mashable                           162          23.1 /  2.1        47.1 / 13.2          52.9 / 86.8
Wired                              120           9.9 /  4.8        41.4 / 24.9          58.6 / 75.1
Engadget                           200          44.3 / 18.9        68.7 / 27.5          31.3 / 72.5
Wall Street Journal                201          36.6 /  5.8        65.4 / 18.5          34.6 / 81.5
Vanity Fair                         64          21.8 / 11.4        70.5 / 44.7          29.5 / 55.3
Yahoo! Upshot                      109          28.8 / 26.0        75.7 / 59.1          24.3 / 40.9
Yahoo! Top News                    226          20.9 /  9.1        45.6 / 29.6          54.4 / 70.4
All Things D                       139          66.2 / 17.2        89.2 / 41.5          10.8 / 58.5
Gizmodo                             82          36.1 /  5.2        70.0 / 21.1          30.0 / 78.9
Aol News                            78          19.2 / 11.4        85.1 / 44.4          14.9 / 55.6

One can make several immediate observations:

• Typically, around 65% of Facebook actions and 25% of retweets happen around the top 7 stories.
• Facebook activity is much more heavy-headed than retweets.
• Yahoo! Upshot is the most heavy-headed blog in our study. Only 40% of retweets and 25% of Facebook actions happen outside the top 7 articles. Perhaps this is because Upshot has very few dedicated readers, and the majority of activity corresponds to a few Yahoo-wide promoted stories. AllThingsD is also fairly heavy-headed.
• Mashable and Wired have the heaviest tails. Both have over 75% of retweets and over 50% of Facebook actions outside the top 7 stories.

Let us offer an interpretation from a content optimization perspective. A heavy head of social activity means that total user satisfaction can be improved by improving the quality of the tail content or by finding better ways to promote it. A heavy tail indicates that the tail content has its own audience and is well promoted. Thus, the best opportunity for heavy-tail websites lies in expanding their content production. For a more accurate interpretation, one should track individual consumption patterns. Goel, Broder, Gabrilovich and Pang have recently shown that the purpose of tail inventory is not only to capture new users but also to better serve users who like some-of-the-top and some-of-the-niche [5].

4. Roadmap for Social Analytics

There are a number of natural next steps for the Ediscope framework. First, we can turn our measurements into rankings of news sources and individual writers by the engagement scores and lifespans of their content. It is also informative to compare the signals for the same story covered at different destinations. Then, one can do an in-depth factor analysis to find which features of content and audience increase the overall success of an article. In particular, what is the role of frontpages and other in-site promotions? Another important step is to release datasets to the research community; the pageviews vs. social signals spreadsheet is likely to be published first. As we have identified the problem with content of mid-range lifespan, one should have a closer look at this area. Videos and products have a longer lifespan and should be studied through social signals. And, of course, Ediscope should collect larger datasets to make its findings more robust.

The general direction of using social signals for content management is wide open. Here is an overview of the key areas.

Data engineering. Ediscope establishes the basic architecture for news analytics systems. For a typical study, one needs content discovery, signal crawler, monitoring, statistical analysis and visualization components. Looking into the future, the research community will benefit from a shared public stack of these tools; we do not want to recreate the same code again and again. The Ediscope platform can be extended in a number of ways. Of course, we need more signals: StumbleUpon, Delicious, Yahoo Site Explorer, Digg, Spinn3r, comment counts, signals from public and private hit counters. In future versions, Ediscope can incorporate content metadata: author, publisher, keywords, topics, headlines, tags, full text, content type, date and time, staff/guest/sponsored. User data can be harder to add due to privacy concerns, but eventually it will be a part of analytics systems. We need real-time content discovery and signal stream processing. Higher rate limits should be negotiated with API providers. Then, there should be a way to add prediction, ranking and optimization algorithms on top of the basic infrastructure.

Measurements and modeling. Every category of web content can be a subject of social analytics: video, products, movies, books, websites, blogs, newspapers, magazines, TV shows, and content farms. One can focus either on a particular vertical or on a content network (Yahoo, MSN, Aol). A number of metrics can be created based on social signals: content lifespan, engagement score, engagement-per-visit, share of social traffic in overall pageviews. Once we focus on a certain content source and a metric, it is time for factor analysis: how do features of content, audience and user interface affect the social success of a published material? Then we need comprehensive industry studies: baseline numbers for social engagement and leaderboards. Finally, one can create a taxonomy of engagement scenarios for content units.

Content optimization. Of course, the ultimate goal of social analytics is not just to collect data and compute some metrics and rankings. The real impact is in using social insights to make better publishing choices. Every online publisher faces the following issues: choose stories and topics to cover; balance recency and importance in news coverage; optimize headlines; optimize article length; optimize in-network promotion; rank its own stream of news [4] and make the best selection for the frontpage; find and fix underperforming areas; optimize the user interface; make the best content easy to discover. To conclude, the future of Ediscope and other analytics systems is to recommend choices that maximize social engagement.

Acknowledgement. The author thanks Benjamin Moseley and Silvio Lattanzi for fruitful discussions at the early stage of this project.

References

[1] Workshop on Social Media Analytics, 2010. http://snap.stanford.edu/soma2010.
[2] Z. Avramova, S. Wittevrongel, H. Bruneel, and D. De Vleeschauwer. Analysis and modeling of video popularity evolution in various online video content systems: power-law versus exponential decay. In INTERNET'09.
[3] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon. Analyzing the video popularity characteristics of large-scale user generated content systems. IEEE/ACM Trans. Netw., 17(5):1357–1370, 2009.
[4] G.M. Del Corso, A. Gullì, and F. Romani. Ranking a stream of news. In WWW'05.
[5] S. Goel, A. Broder, E. Gabrilovich, and B. Pang. Anatomy of the long tail: Ordinary people with extraordinary tastes. In WSDM'10.
[6] J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In KDD'09.
[7] M. Mathioudakis and N. Koudas. TwitterMonitor: Trend detection over the Twitter stream. In SIGMOD'10.
[8] M. Mendoza, B. Poblete, and C. Castillo. Twitter under crisis: Can we trust what we RT? In SOMA'10.
[9] P. Ogilvie. Modeling blog post comment counts, 2008. http://livewebir.com/blog/2008/07/modeling-blog-post-comment-counts/.
[10] J. Salman and H. Rangwala. Digging Digg: Comment mining, popularity prediction, and social network analysis. In WISM'09.
[11] T. Spiliotopoulos. Votes and comments in recommender systems: The case of Digg, 2010. http://hci.uma.pt/courses/socialweb/projects/2009.digg.paper.pdf.


[12] M. Tsagkias, W. Weerkamp, and M. de Rijke. News comments: Exploring, modeling, and online prediction. In ECIR'10.
[13] M. Tsagkias, W. Weerkamp, and M. de Rijke. Predicting the volume of comments on online news stories. In CIKM'09.