magazine recommendations based on social media trends

Magazine recommendations based on social media trends Steffen Karlsson Kongens Lyngby 2014 B.Eng-2014

Upload: steffen-karlsson

Post on 17-Mar-2016

DESCRIPTION

My B.Eng project.

TRANSCRIPT

Page 1: Magazine recommendations based on social media trends

Magazine recommendations based on social media trends

Steffen Karlsson

Kongens Lyngby 2014
B.Eng-2014


Technical University of Denmark
Department of Applied Mathematics and Computer Science
Matematiktorvet, building 303B,
2800 Kongens Lyngby, Denmark
Phone +45 4525 [email protected]
B.Eng-2014


Summary (English)

Issuu uses a recommendation engine for predicting what a certain reader will enjoy. It is based on collaborative filtering, such as the reading history of other similar users, and content-based filtering, reflected as the document’s topics etc. So far all of those parameters are completely isolated from any external (non-Issuu) sources, causing the Matthew Effect. This project, done in collaboration with Issuu, is the first attempt to solve the problem by investigating how to extract trends from social media and incorporate them to improve Issuu’s magazine recommendations.

Popular social media networks have been investigated and evaluated, resulting in choosing Twitter as the data source. A framework for spotting trends in the data has been implemented. To map trends to Issuu, two approaches have been used: the Latent Dirichlet Allocation model and the Apache Solr search engine.


Summary (Danish)

Issuu uses a recommendation system to predict what will please a given reader. It is based on collaborative filtering, such as the reading history of similar users. In addition, it is based on content-based filtering, reflected as the document’s topic etc. So far all of these parameters have been completely isolated from external (non-Issuu) sources. This project was carried out in collaboration with Issuu and is the first attempt to solve the problem. This was done by investigating how trends can be extracted from social media and integrated in order to improve Issuu’s magazine recommendations.

Popular social media have been investigated and evaluated, which resulted in Twitter being chosen as the data source. A system for spotting trends in the data has been implemented. Two different methods have been used to integrate the trends on Issuu: the Latent Dirichlet Allocation model and the Apache Solr search engine.


Preface

This thesis was prepared at the Department of Applied Mathematics and Computer Science at the Technical University of Denmark (DTU) in fulfillment of the requirements for acquiring a B.Eng. in IT. The work was carried out in the period September 2013 to January 2014.

I would like to thank my supervisor Ole Winther from DTU, my external supervisor Andrius Butkus and Issuu for spending time and resources on having me around.

Lyngby, 10-January-2014

Steffen Karlsson


Contents

Summary (English)

Summary (Danish)

Preface

1 Introduction
    1.1 Problem definition
    1.2 Social media
    1.3 What is a trend?
    1.4 Related work
    1.5 Methodology
    1.6 Expected results
    1.7 Outline

2 Mining Twitter
    2.1 Twitter API
    2.2 Tweet’s location problem

3 Trending framework
    3.1 Raw data
    3.2 Normalizing data
    3.3 Detecting trends
    3.4 Recurring trends
    3.5 Trend score
    3.6 Aggregating trends

4 From trends to magazines
    4.1 LDA
    4.2 Using LDA
        4.2.1 Results using LDA
    4.3 Solr
    4.4 Using Solr
        4.4.1 Results using Solr

5 Conclusion
    5.1 Improvements of the trending framework
    5.2 Improvements of the LDA model
    5.3 LDA vs. Solr

A Dataset statistics
    A.1 Location
    A.2 Hashtag

B Example: #bostonstrong

C Implementation details
    C.1 Flask
    C.2 Peewee
    C.3 Database
    C.4 MySQL

Bibliography


List of Figures

1.1 Typical patterns for slow and fast trends.
1.2 Project flowchart
2.1 Mining Twitter flowchart
2.2 Visualization of the problem with the location
2.3 Visualization of the solution to the location problem
3.1 Total tweets per hour
3.2 Raw tweet count for hashtags
3.3 Weighted tweet count per hour
3.4 Normalized hashtags
3.5 Example sizes of w and r
3.6 w - r, where r = 2 hours.
3.7 w - r, where r = 24 hours.
3.8 Displaying use of threshold in the trending framework.
3.9 E/R Diagram v2
4.1 Plate notation of the LDA model [Ble09]
4.2 LDA topic simplex, with three topics
4.3 Representation of topic distribution using dummy data
4.4 #apple tag cloud
4.5 From trend to magazines flowchart
4.6 Topic distribution for #apple tweets
4.7 Subset of the similar #apple documents using LDA
4.8 Example of tokenizing and stemming
4.9 Subset of the similar #apple documents using Solr
5.1 Supported languages by Issuu
5.2 Translation module to improve the solution.
5.3 Three simultaneously running trending frameworks.
5.4 Top words in the topics.
5.5 LDA per page solution.
A.1 Top and bottom 10 of used locations
A.2 Top and bottom 10 of used hashtags
B.1 Total tweets per hour
B.2 Raw tweet count for hashtags
B.3 Fully processed data
B.4 #bostonstrong tag cloud
B.5 Subset of #bostonstrong LDA documents
B.6 Subset of #bostonstrong Solr documents
C.1 E/R Diagram


Chapter 1

Introduction

Issuu¹ is a leading online publishing platform with more than 15 million publications - a pool that keeps growing by more than 20 thousand new ones each day. The main challenge for the reader then becomes the navigation and discovery of interesting content among the vast number of documents. To solve the problem, Issuu uses a recommendation engine for predicting what a certain reader might enjoy.

Currently a whole range of parameters are part of Issuu’s recommendation algorithm: the reader’s location and language preferences (context), the reading history of other similar users (collaborative filtering [RIS+94], [SM95]), the document’s topics (content-based filtering [Sal89]) and the document’s overall popularity. There are also editorial and promoted documents. So far all of those parameters are completely isolated from any external (non-Issuu) sources.

¹ www.issuu.com


The main problem is that the same magazines get recommended again and again. This highlights the shortcomings of collaborative filtering rather than the reading habits of Issuu users. Issuu does not allow readers to rate magazines, so the read time is used instead. Naturally, popular magazines gather their read time very quickly and are then hard to beat for newly uploaded ones. They get recommended more and thereby only become stronger - a phenomenon known as the Matthew Effect [Jac88].

Incorporating local trends (what is happening around the reader) into the recommendations would address this problem and add a bit more freshness and serendipity.

1.1 Problem definition

How to extract trends from social media and incorporate them to improve Issuu’s magazine recommendations.

1.2 Social media

In this project, social media is the data source from which trends can be extracted. Many social media platforms could be used as the data source. Their suitability was evaluated based on these parameters:

Data - Defines the format of the data, the amount of data that is available and how semantically rich it is. This is the most important parameter, since it is all about the quality of the data and will directly impact the ability to extract trends. Text is the preferred data format here. The more data the better, since it adds stability to the resulting trends. Semantic richness is about how much meaning can be extracted from the data. We should not expect any highly organized and semantically rich taxonomies, since Twitter is a crowd-driven social medium rather than an editorially curated and organized one. In social networks we normally see data being organized as folksonomies², where "multiple users tag particular content with a variety of terms from a variety of vocabularies, thus creating a greater amount of metadata for that content" [Wal05]. Semantic richness in folksonomies comes from multiple users tagging the data with the same labels, which shows that they agree on what it is about. A folksonomy can be narrow or broad. In narrow ones only the creator of the content is allowed to label it with tags, while in broad ones multiple users can label a piece of content. Broad folksonomies are more stable and informative, given that there are enough users to label things, and are the preferred kind in this project.

Real-time - Defines the time from something important happening in the world until it appears on the particular social media network. An API supporting real-time streaming of data is naturally preferable, but a small delay is also acceptable.

Accessibility - Defines whether there are any restrictions in the API that limit the accessibility of the data.

The most popular social networks were evaluated based on these three parameters. The one that fitted best turned out to be Twitter³ (see Table 1.1). Facebook⁴ and Google+⁵ scored well on data and real-time but had to be ruled out due to limited API access and strict privacy settings. "The largest study ever conducted on Facebook on privacy, showed that in June 2011 around 53% of the profiles where private, which where an increase of 17% over 15 months." [Sag12].

² A term coined by Thomas Vander Wal, combining the words folk and taxonomy.
³ www.twitter.com
⁴ www.facebook.com
⁵ plus.google.com

| Parameter | Positive | Negative |
|---|---|---|
| Data | Average of 58 million tweets each day⁶. Over 85% of topics are headline or persistent news in nature [KLPM10]. | Length of the tweet. Lack of reliability regarding the location precision of the tweets. |
| Real-time | Pseudo real-time location-based streaming service. | 2 hours behind. |
| Accessibility | API easily accessible and usable. | Unpaid plan limited to a 1% representative subset of the data. |

Table 1.1: Evaluation of Twitter’s suitability for the project.

LinkedIn⁷ was discarded because of the nature of its data, which is industry and career oriented. Instagram⁸, Pinterest⁹ and Flickr¹⁰ are all big and interesting, but the data they provide is mostly images and thus hard to interpret; their data is also not that close to trending news. The same goes for YouTube¹¹ and Vine¹².

It is worth mentioning that trends can be spotted on Issuu as well. One problem is that they have a huge delay, since Issuu users are not as active as users of other social networks. Also, on Issuu trends would have to be inferred from what people read instead of what they post or comment on.

⁷ www.linkedin.com
⁸ www.instagram.com
⁹ www.pinterest.com
¹⁰ www.flickr.com
¹¹ www.youtube.com
¹² vine.twitter.com


1.3 What is a trend?

A trend can be understood in many different ways depending on the context - stock market, fashion, music, news, etc. The dictionary defines a trend as:

"a general direction in which something is developing or changing" - Definition in the dictionary.

In this project trends will be considered a bit differently. Basically, Issuu is interested in knowing what topic or event is currently hot in which country (or an even smaller area) and in recommending magazines similar to it. On Twitter, trends can be spotted by looking at the hashtags, so in this project trending hashtags and trends will be considered the same thing. Trends are taken as a "hashtag-driven topic that is immediately popular at a particular time"¹³.

Trends vary in terms of how unexpected they are. Seasonal holidays like Christmas or Halloween are trends, but very expected ones. Schumacher’s skiing accident, on the other hand, is a very unexpected one. Both types are equally interesting and valuable for Issuu. Another parameter is how quickly the trend is rising. We can have slow or fast trends (see Figure 1.1); the priority is spotting trends that rise fast.

Trend and popularity are not the same thing. If something becomes popular all of a sudden, it is a trend. But if it keeps being popular, it is not a trend anymore.

1.4 Related work

Extracting trends from Twitter is nothing new. The two widely used approaches are parametric and non-parametric. The most popular one is the parametric approach, where a trending hashtag is detected by observing its deviation from some baseline [IHS06], [BNG11], [CDCS10], using a sliding window. It is the simplest of the approaches and still quite successful, but it is based on the assumption that different trends behave similarly to one another. It is known that this is not the case in the real world - there are many types of trends, with all kinds of patterns.

¹³ www.hashtags.org/platforms/twitter/what-do-twitter-trends-mean/

[Figure: tweet count over a time period of approximately 24 hours for a fast trend and a slow trend]

Figure 1.1: Typical patterns for slow and fast trends.

To address that problem, non-parametric methods have been used as well [Nik12]. In those, the parameters are not set in advance but are learned from the data instead. Many patterns were observed and grouped into the ones that became trends and the ones that did not. New hashtag patterns can then be compared to the observed ones using Euclidean distance, and the similarity can be used to determine whether a hashtag is trending or not.
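The comparison step can be illustrated with a minimal sketch. It is a simplified nearest-neighbour variant, not the method of [Nik12]; the function names and the hourly counts are invented for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long count series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def looks_trending(series, trending_examples, other_examples):
    """Label a series by the group that holds its nearest observed pattern."""
    nearest_trend = min(euclidean(series, p) for p in trending_examples)
    nearest_other = min(euclidean(series, p) for p in other_examples)
    return nearest_trend < nearest_other


# Hypothetical hourly hashtag counts (assumption, not thesis data).
trending = [[0, 1, 2, 16, 30], [1, 0, 3, 12, 25]]
flat = [[2, 2, 3, 2, 2], [5, 4, 5, 5, 4]]

result = looks_trending([0, 2, 3, 14, 28], trending, flat)
```

In this toy setting a new pattern is simply assigned to the group of its closest example; a practical system would need many observed patterns and some form of normalization before distances become comparable.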

The requirements for spotting trends at Issuu are not that strict - there is no need to capture all the trends from a certain day, just the most significant ones. That makes things simpler, and that is why the parametric model was chosen for this project. It is the first time that Issuu is doing a project like this, so the idea was to try the simpler things first to see if they work. If not, the heavier non-parametric models could be applied.


1.5 Methodology

Figure 1.2 illustrates the methodology of the project.

[Figure: flowchart with the stages Twitter, Data, Trending Framework, Trends, Issuu, Documents]

Figure 1.2: Project flowchart

It is important to note early how this trend data will be used by Issuu, because it sets requirements on other parts of the project. Issuu uses Latent Dirichlet Allocation (LDA) [BNJ03] to extract topics from its documents, using the Gensim implementation [ŘS10]. Using the Jensen-Shannon distance (JSD), it is possible to compare documents to one another based on their LDA topic distributions. This allows Issuu, for example, to find documents similar to the one being read.

If we can capture trends from social media and express them as text (a "virtual document"), we can calculate the LDA topic distribution for each trend (one text file per trend) and use JSD to find similar documents.
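As a rough illustration of this comparison, the sketch below computes the Jensen-Shannon distance (the square root of the Jensen-Shannon divergence, here with base-2 logarithms) between topic distributions and ranks documents by it. The topic vectors and document names are invented, and Issuu's own implementation may differ.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Distance in [0, 1] between two discrete probability distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against rounding below zero


# Hypothetical 3-topic distributions (assumption, not thesis data).
trend = [0.7, 0.2, 0.1]
documents = {
    "tech_magazine": [0.6, 0.3, 0.1],
    "cooking_magazine": [0.05, 0.15, 0.8],
}

ranked = sorted(documents, key=lambda name: jensen_shannon_distance(trend, documents[name]))
```

The first entry of `ranked` is the document whose topic distribution is closest to that of the trend's virtual document.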

Issuu also uses the Apache Solr¹⁴ search engine, which takes text as input and can give similar documents as output. This is another approach, and it will be investigated whether it can be used as an alternative or complement to LDA.

With all that in mind, the plan is this:

• Access the Twitter API¹⁵ and retrieve tweets from a given country at a given time, storing them in a database.

• Calculate trends from the tweets. The output of this step is the list of trending hashtags per given time window.

• Find out how to feed those trends into both the LDA topic model and the Solr search engine.

• Get documents as the final result and evaluate.

¹⁴ lucene.apache.org/solr/
¹⁵ dev.twitter.com

1.6 Expected results

• Analysis of potential resources for mining data from social media networks, to be used at Issuu as a basis for recommendations.

• Data mining algorithms (Python¹⁶) to retrieve all the necessary data.

• An algorithm for extracting trends from tweets.

• A method of feeding trends into the LDA model and Solr.

• Evaluation of the results and final recommendations on the end-to-end solution for incorporating social media data into Issuu’s recommendation engine.

1.7 Outline

Chapter 2 explains how to retrieve tweets from the Twitter API service. These are processed and analyzed in Chapter 3. In Chapter 4 the trends are fed into the LDA model and the Solr search engine, resulting in similar documents. The final recommendations on the end-to-end solution are evaluated in Chapter 5.

¹⁶ www.python.org


Chapter 2

Mining Twitter

This chapter is about retrieving tweets from Twitter and storing them for later trend extraction. The USA was chosen as the country for this project for several reasons. First of all, most Issuu readers are from the USA. Secondly, more than half of Twitter users are from the USA too [Bee12]. Finally, having tweets in English makes things simpler, because Issuu’s LDA model was trained on the English Wikipedia, so sticking to English tweets means that no translation is needed.

[Figure: flowchart with the stages all tweets in the world, location filter [USA], tweets in USA, database]

Figure 2.1: Mining Twitter flowchart

The data on Twitter are messages of up to 140 characters called tweets. Often they contain some additional meta-data:

| Symbol | Description | Example |
|---|---|---|
| # | Groups tweets together by type or topic; known as a hashtag. | Wow, Mac OS X Mavericks is free and will be available for machines going back as far as 2007? #Apple #Keynote |
| @ | Used for referencing, mentioning or replying to another user. | @alastormspotter: iOS7 will release at around noon central time on Wednesday. |
| RT | Symbolizes a retweet (posting an existing tweet from another user). | RT @ThomasCDec: 50 days to #ElectionDay |

Table 2.1: Additional meta-data used in tweets.
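The three symbols can be pulled out of a tweet's text with a few regular expressions. The sketch below is only an illustration: the thesis does not state how the meta-data was extracted, the function name is hypothetical, and lower-casing the hashtags is an assumption made so that #Apple and #apple count as the same tag.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def tweet_metadata(text):
    """Return hashtags, mentions and a retweet flag for one tweet text."""
    return {
        # Lower-casing is an assumption, not something the thesis specifies.
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": MENTION.findall(text),
        "is_retweet": text.startswith("RT "),
    }


example = tweet_metadata("RT @ThomasCDec: 50 days to #ElectionDay")
```

For the retweet example from Table 2.1 this yields one hashtag, one mention and a retweet flag set to true.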

2.1 Twitter API

The Twitter API provides two different calls which may be suitable for this purpose:

GET search/tweets : Part of the ordinary API, i.e. with a rate limit of 450 requests per 15 minutes and continuation URLs, which means that each request returns a finite number of tweets before the next chunk has to be requested.

POST statuses/filter : Part of the streaming API, as mentioned in Table 1.1. For the unpaid plan it has the limitation of only providing a 1% representative subset of the full dataset.


Twitter uses a three-step heuristic to determine whether a given tweet falls within the specified location, defined as a bounding box¹:

1. If the tweet is geo-location tagged, this location is used for comparison with the bounding box.

2. A Twitter user can specify a location in the account settings, which the API refers to as place, and this is used for comparison if the tweet is not geo-tagged.

3. If neither of the rules above matches, the tweet is ignored by the streaming API.

The streaming API was chosen because it takes all three heuristics into account, whereas the search API only includes the second. Additionally, with the search API it is difficult to know how frequently to execute the call in order to stay up to date, due to the rate limits.
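The three steps listed above can be mimicked in a few lines. This is a simplified sketch, not Twitter's implementation: the tweet is a plain dictionary, the place is reduced to a single (longitude, latitude) point, and the bounding box values are rough illustrative numbers for the contiguous USA rather than the GeoNames coordinates.

```python
def in_bounding_box(lon, lat, sw, ne):
    """True if (lon, lat) lies inside the rectangle spanned by sw and ne."""
    return sw[0] <= lon <= ne[0] and sw[1] <= lat <= ne[1]


def matches_location(tweet, sw, ne):
    """Check coordinates first, then place; ignore tweets without either."""
    if tweet.get("coordinates"):
        lon, lat = tweet["coordinates"]
        return in_bounding_box(lon, lat, sw, ne)
    if tweet.get("place"):
        lon, lat = tweet["place"]
        return in_bounding_box(lon, lat, sw, ne)
    return False


# Rough, hypothetical bounding box as (longitude, latitude) pairs.
US_SW = (-125.0, 24.0)
US_NE = (-66.0, 50.0)

new_york = matches_location({"coordinates": (-74.0, 40.7)}, US_SW, US_NE)
copenhagen = matches_location({"place": (12.57, 55.68)}, US_SW, US_NE)
monterrey = matches_location({"coordinates": (-100.3, 25.7)}, US_SW, US_NE)
```

The point in northern Mexico also passes the check, because a rectangle around a country inevitably covers some neighbouring territory.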

2.2 Tweet’s location problem

A couple of problems were spotted with the location accuracy:

1. Twitter’s API supports streaming by location, but only with coordinates given as a SW and NE pair, defining each country by a rectangle. Figure 2.2 shows the tweets streamed for the USA (tweets that are actually from the USA have been filtered out, to provide a better overview).

2. Although the selected bounding box covers the USA and more, tweets from Guatemala and Honduras are still present (see Figure 2.2).

¹ Two pairs of longitude and latitude coordinates: the south-west (SW) and north-east (NE) corners of a rectangle

Figure 2.2: Visualization of the problem with the Twitter service, where each red dot represents a tweet. Duration is 1 hour and the number of tweets with a wrong location is 7,240.

Figure 2.3: Visualization of the solution to the Twitter service problem, where each red dot represents a tweet. Duration is 1 hour and the number of tweets is 112,851, which means an error rate of approximately 7%.

Applying the location filter to the streaming API means that the bounding box needs to be known. GeoNames² solves this by providing the coordinates needed for all countries.

The two problems spotted regarding the location accuracy turned out to have the same solution. Algorithm 1 checks whether the received tweet is from the desired country. The ones which are get stored in the MySQL³ database for further analysis. Appendix C contains information about the database choice and implementation details, including the E/R diagram.

Another problem occurred: some of the tweets were missing the country code, which means that they could not be processed. To solve this, OpenStreetMap’s reverse geocoding API⁴ was used, which can convert a pair of longitude and latitude values to a country code.

Algorithm 1: Parse tweet from data

if coordinates in data then
    if place not in data then
        country_code = reverse geocode coordinates
    if tweet.country_code is chosen country_code then
        # Parse the rest of the tweet
        add tweet to database
else
    raise LocationNotAvailableException
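Algorithm 1 could be written in Python roughly as follows. This is a sketch under stated assumptions, not the thesis implementation: the dictionary layout of `data`, the `reverse_geocode` callable (standing in for the OpenStreetMap lookup) and the list used in place of the MySQL database are all hypothetical.

```python
class LocationNotAvailableException(Exception):
    """Raised when a tweet carries no coordinates to check."""


def parse_tweet(data, chosen_country_code, reverse_geocode, database):
    """Store the tweet if it comes from the chosen country; return whether it did."""
    if "coordinates" not in data:
        raise LocationNotAvailableException("tweet has no coordinates")
    if "place" in data:
        country_code = data["place"]["country_code"]
    else:
        # Fall back to reverse geocoding when the country code is missing.
        country_code = reverse_geocode(data["coordinates"])
    if country_code == chosen_country_code:
        database.append(data)  # stands in for parsing and storing the tweet
        return True
    return False


db = []
stored = parse_tweet(
    {"coordinates": (-74.0, 40.7), "place": {"country_code": "US"}},
    "US",
    lambda coordinates: "XX",  # dummy geocoder, not called for this tweet
    db,
)
```

Passing the geocoder and the database in as arguments keeps the function testable without network or database access.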

For debugging purposes an interactive tweet-map, a graphical way of visualizing tweets, has been created (used for Figure 2.2 and Figure 2.3). It is a JavaScript/HTML5 based website hosted locally in Python with the module Flask⁵; implementation details are available in Section C.1.

² www.geonames.org - Licensed under a Creative Commons attribution license, which gives you free access to: Share - to copy, distribute and transmit the work, and Remix - to adapt the work and to make commercial use of it
³ www.mysql.com
⁴ wiki.openstreetmap.org/wiki/Nominatim/
⁵ flask.pocoo.org


Chapter 3

Trending framework

The previous chapter described how the tweets were collected, checked for location accuracy and stored in the MySQL database. This chapter focuses on how to turn those tweets into trends. "Fast" trends were chosen for this project, because they would have the most impact compared to "slow" trends. Eventually most trends will appear on Issuu, but with a huge delay, since Issuu users are not as active as users of other social networks. The challenge is therefore to reduce this delay.

To illustrate the idea, a time period of three days was chosen, knowing in advance that it contained several trends, to test whether the algorithm can find them.

On October 22nd Apple held its annual event, where it presented the updated product line (new iPads, MacBooks and of course the new OS X Mavericks). This event was chosen as one of the examples to start with.


3.1 Raw data

First we take a look at the raw data from the database. In addition, a subset of 3 consecutive days is used as an example to describe the trending framework:

| Type | Full dataset | Example |
|---|---|---|
| Duration (hours) | 1462 | 72 |
| Tweets | 127,930,378 | 4,103,273 |
| Hashtags | 25,502,269 | 770,453 |
| Unique hashtags | 3,180,466 | 206,850 |
| Avg. tweets per day | 2,099,858 | 1,367,757 |
| Avg. length per tweet (char) | 56 | 54 |
| Avg. words per tweet | 9.5 | 9.1 |

Table 3.1: Facts about the dataset collected.

More statistics about the dataset are presented in Appendix A.

The plot of total tweets per hour in Figure 3.1 - where the x-axis represents the 3 days (72 hours) and 0, 24, 48 and 72 are midnight (this also applies to the other plots in this chapter) - clearly shows that the frequency of tweets follows the same day/night rhythm as humans, as expected.

[Figure: total tweets per hour over the 72-hour example period]

Figure 3.1: Total tweets per hour

Hashtags are used to categorize or label a tweet with one word or phrase, and can be used to spot the trends in the tweets. The full text of the tweets could also have been used; this option was tested and found to be generally too vague.

Figure 3.2 shows the total number of tweets for the chosen hashtags. As described before, Apple is one of them. The other two are the TV show "Pretty Little Liars" (#pll), which was shown the same day, and the hashtag "jobs", which companies use to identify a job opening on Twitter.

These three hashtags represent different kinds of trends: one-time events, weekly recurring and daily recurring, which are described later in this chapter.

[Figure: tweets per hour for #pll, #apple and #jobs over the 72-hour example period]

Figure 3.2: Raw tweet count for hashtags

Due to the quite large fluctuation in the total number of tweets during a day, a weight function has been created with the purpose of reducing the importance of tweets posted during the night. It is expressed as a sigmoid function¹ (another mathematical function, such as the hyperbolic tangent, could also have been used):

$$ w_t = \frac{1}{1 + \exp\left(-\left(m \times \left(|\text{tweets} \in t| - X\right)\right)\right)} \tag{3.1} $$

where t defines a time period and m is the slope of the curve, which can be seen as how "expensive" it is to have a tweet count below the preferred amount X (see Figure 3.3).

¹ en.wikipedia.org/wiki/Sigmoid_function/
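A minimal sketch of Equation 3.1 shows how the weight behaves. The values chosen for X and m below are hypothetical and only serve to show the shape of the curve; the thesis does not state the values used.

```python
import math


def tweet_weight(tweet_count, preferred_count, slope):
    """Sigmoid weight w_t for a period with `tweet_count` tweets (Equation 3.1)."""
    return 1.0 / (1.0 + math.exp(-(slope * (tweet_count - preferred_count))))


# Hypothetical parameters (assumptions): X = 15,000 tweets per hour, m = 0.0005.
X = 15000
M = 0.0005

busy_hour = tweet_weight(30000, X, M)   # well above X, weight close to 1
night_hour = tweet_weight(1800, X, M)   # well below X, weight close to 0
at_preferred = tweet_weight(X, X, M)    # exactly X, weight 0.5
```

A larger m makes the transition around X steeper, so hours just below the preferred amount are penalized more heavily.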

[Figure: weighted tweet count over the 72-hour example period]

Figure 3.3: Weighted tweet count per hour

3.2 Normalizing data

To get reasonable results from data that vary in amount (in this case, the total number of tweets per hour), it is highly recommended to normalize the data. In this project every hashtag is normalized by the total number of tweets in the time period t, which results in a normalized value for each hashtag:

$$ f_t = \frac{|\text{tweets} \ni \text{the hashtag}|}{|\text{tweets} \in t|} \tag{3.2} $$

Figure 3.4 shows the result of applying Equation 3.2 to the data in Figure 3.2. The hashtags seem to follow the same pattern, although there are some differences during the night.
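Equation 3.2 amounts to a single division per hashtag and time period. The sketch below uses invented hourly counts to show why normalization matters; the function name and the numbers are assumptions for illustration.

```python
def normalized_frequency(hashtag_count, total_tweets):
    """f_t: share of the tweets in period t that contain the hashtag."""
    if total_tweets == 0:
        return 0.0  # no tweets in the period, so no evidence for the hashtag
    return hashtag_count / total_tweets


# Hypothetical (hashtag count, total tweets) pairs for a day hour and a night hour.
day_hour = normalized_frequency(120, 30000)
night_hour = normalized_frequency(12, 1800)
```

Although the raw count is ten times higher during the day, the normalized value is higher at night, which is why raw counts alone would be misleading.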

[Figure: normalized values for #pll, #apple and #jobs over the 72-hour example period]

Figure 3.4: Normalized hashtags

3.3 Detecting trends

To detect a trend, it is important to know how the hashtag behaved in the previous time window (reference window r), before the current time window w. The sizes of w and r are tunable parameters of the framework.

Figure 3.5 shows an example of the sizes of the reference window and the current window, which are 2 hours and 1 hour respectively.

Figure 3.5: Example sizes of w and r.

To decide whether the term is a trend, the normalized value of the reference window (ft_ref) is subtracted from that of the current window, to find out whether the interest has increased:

$$f_{t\_ref} = \frac{\sum^{|r|} |\text{the hashtag} \in \text{tweets}|}{\sum^{|r|} |\text{tweets} \in t|} \tag{3.3}$$

where r is a list of reference windows. The outcome of this step can be seen in Figure 3.6. It clearly shows a large effect on the "jobs" hashtag, whose influence has dropped.
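The reference value of Equation 3.3 and the subtraction from the current window can be sketched as follows. The function name is hypothetical. The numbers come from Appendix B: in the two hours before the current hour, #bostonstrong appeared in 4 and 7 tweets out of 149,266 and 154,043 tweets, and in the current hour in 5 out of 154,824 tweets.

```python
from typing import Sequence


def reference_frequency(hashtag_counts: Sequence[int], total_counts: Sequence[int]) -> float:
    """Normalized hashtag frequency pooled over all reference windows (Equation 3.3)."""
    total = sum(total_counts)
    return sum(hashtag_counts) / total if total else 0.0


f_ref = reference_frequency([4, 7], [149_266, 154_043])  # r = 2 hours
f_t = 5 / 154_824                                        # w = 1 hour
delta = f_t - f_ref                                      # negative: interest has dropped
print(f"{f_ref:.12f} {delta:.12f}")
```

Pooling the counts before dividing, rather than averaging the hourly fractions, is what Equation 3.3 expresses and is what reproduces the "ref fraction" column in Appendix B.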


Figure 3.6: w - r, where r = 2 hours.

3.4 Recurring trends

The sizes of w (current window) and r (reference window) create a problem with recurring trends. Hashtags like "jobs" turn out to be a trend every day, even though they follow the same pattern each day (see Figure 3.2).

These recurring trends can be daily, weekly or yearly:

Day - An example of a daily recurring trend is the hashtag "jobs". It recurs every day, though not necessarily at the same time, and tests show that the amount tends to be a bit lower during the weekend.

Week - TV shows are a good example of trends that recur each week on the same day and time, for as long as they are aired. Another type of weekly recurring trend is the natural difference between working weekdays and weekends.

Year - New Year's Eve, Christmas and Halloween are all examples of yearly recurring trends.


In this project the focus is on getting rid of the daily recurring trends. This is done by subtracting the maximal value of the hashtag from the day before the time period t:

$$\max(f_i), \quad i \in \{t-24;\, t\} \tag{3.4}$$

This makes the framework sensitive to outliers, but in this project outliers are in fact what is being looked for: trends. Subtracting the maximum does not mean that a hashtag cannot be trending two days in a row; the amount of tweets containing it just needs to rise.

The weekly and yearly recurring trends were not implemented, but the principle is the same.
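A minimal sketch of the daily correction in Equation 3.4 is given below. It assumes hourly normalized values stored in a list and takes the maximum over the 24 values preceding hour t. Whether the window slides hour by hour or is fixed to the previous calendar day is not spelled out by the equation, so the sliding window here is an assumption, as are the function and parameter names.

```python
from typing import Sequence


def previous_day_peak(frequencies: Sequence[float], t: int, hours: int = 24) -> float:
    """Largest normalized value of a hashtag in the `hours` periods before index t."""
    window = frequencies[max(0, t - hours):t]
    # With no history yet there is nothing to subtract, so fall back to 0.
    return max(window, default=0.0)


hourly = [0.0004, 0.0011, 0.0001, 0.0003]
print(previous_day_peak(hourly, t=4))
```

Setting hours to 168 would give the same correction for weekly recurring trends.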

3.5 Trend score

Combining Equations 3.1, 3.2, 3.3 and 3.4 gives the complete Equation 3.5 for calculating the trend_score of a hashtag at a given time t:

$$\text{trend\_score} = \left(f_t - f_{t\_ref} - \max(f_i)\right) \times w_t, \quad i \in \{t-24;\, t\} \tag{3.5}$$

Figure 3.7 shows the final result of the trending framework after applying Equation 3.5 to the data:

Figure 3.7: w - r, where r = 24 hours.

The x-axis represents time (as in the other plots) and the y-axis represents the trend_score.


In this plot the importance of the hashtag "jobs" is reduced to the point where it is insignificant (a trend score below 0), while the other two turn out to be trending, which is exactly what we want.

Sometimes the framework produces more trends than Issuu needs. Because of this, a threshold has been introduced: a hashtag is accepted as a trend if and only if its trend_score is above the threshold. The threshold is a tunable parameter.
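Equation 3.5 and the threshold test can be sketched together. The function names and the threshold value are hypothetical. The input values are one row of the Appendix B data for #bostonstrong (current fraction, reference fraction, previous-day maximum and weight).

```python
def trend_score(f_t: float, f_ref: float, f_peak: float, weight: float) -> float:
    """Equation 3.5: weighted rise over the reference window and the previous-day peak."""
    return (f_t - f_ref - f_peak) * weight


def is_trend(score: float, threshold: float) -> bool:
    """A hashtag is accepted as a trend only if its score is above the threshold."""
    return score > threshold


score = trend_score(
    f_t=0.000032294735,
    f_ref=0.000036266646,
    f_peak=0.001145866191,
    weight=0.998811839382,
)
# threshold=0.0 is a placeholder; the value chosen in the project is not stated here.
print(f"{score:.12f}", is_trend(score, threshold=0.0))
```

For this hour the score is negative, so #bostonstrong is not accepted as a trend, which matches the corresponding FINAL value in Appendix B.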

Figure 3.8: Trend scores for all trends on the 22nd of October. The red line displays the chosen threshold.

Figure 3.8 is a visual representation of the trend scores on the day of the Apple event, the 22nd of October. The red line represents the threshold.

3.6 Aggregating trends

At Issuu it is not very likely that users come back every hour, but more likely once a day; it would therefore be unnecessary to recommend new documents every hour. A solution that can aggregate trends over longer, tunable periods is preferable. The most efficient and extensible solution is to extend the existing database (explained in Section C.3) so that it can store trends and references to the corresponding tweets.


Two new tables, trend and tweet_trend_relation, were added to the database, containing the trend and the time at which it was trending (see Figure 3.9).

Figure 3.9: E/R Diagram v2
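The idea of storing trends and aggregating them over a longer period can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database in place of the project's database, and all column names, the helper function and the inserted rows (including the scores) are made up for illustration; only the table names trend and tweet_trend_relation come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE trend (
    id INTEGER PRIMARY KEY,
    hashtag TEXT,
    trending_at TEXT,   -- hour in which the hashtag was trending
    score REAL
);
CREATE TABLE tweet_trend_relation (
    tweet_id INTEGER REFERENCES tweet(id),
    trend_id INTEGER REFERENCES trend(id),
    PRIMARY KEY (tweet_id, trend_id)
);
""")

# Illustrative rows only; the scores are not results from the project.
conn.executemany(
    "INSERT INTO trend (hashtag, trending_at, score) VALUES (?, ?, ?)",
    [
        ("apple", "2013-10-22 18:00", 0.004),
        ("apple", "2013-10-22 19:00", 0.006),
        ("pll", "2013-10-23 01:00", 0.003),
    ],
)


def trends_between(connection, start, end):
    """Aggregate hourly trends into one row per hashtag for the period [start, end)."""
    return connection.execute(
        "SELECT hashtag, COUNT(*), MAX(score) FROM trend "
        "WHERE trending_at >= ? AND trending_at < ? "
        "GROUP BY hashtag ORDER BY MAX(score) DESC",
        (start, end),
    ).fetchall()


daily = trends_between(conn, "2013-10-22 00:00", "2013-10-23 00:00")
print(daily)
```

Because the period boundaries are plain query parameters, the aggregation window stays tunable without changing the stored data.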


Chapter 4

From trends to magazines

This chapter is about mapping the computed trends to Issuu, so that they can be presented to the users as similar magazines/documents. Two different approaches are investigated in order to solve this problem: Latent Dirichlet Allocation (LDA) and the Apache Solr search engine.

4.1 LDA

Latent Dirichlet Allocation (LDA) is a generative probabilistic model that makes it possible to automatically discover the topics in a document. A topic is defined by a probability distribution over the words in a fixed vocabulary, which means that each topic contains a probability for each word.

LDA can be expressed as a graphical model, known as plate notation (Figure 4.1).


Figure 4.1: Plate notation of the LDA model [Ble09]

where,

Variable   Definition
D          The number of documents.
N          Total number of words in all documents.
W          The observed word n for the document d.
Z          Assigns the topics for the n’th word in the d’th document.
α          K-dimensional per-document topic distribution vector, where K is the number of topics.
β          Y-dimensional per-topic word distribution vector, where Y defines the number of words in the corpus.
θ          Topic proportions for the d’th document.

Table 4.1: Definition of LDA model parameters


Figure 4.2 is an example of a visual representation of an LDA space with three topics. A given document x has a probability of belonging to each topic, and these probabilities sum to 1. Each corner of the simplex corresponds to probability 1 for the given topic.

Figure 4.2: LDA topic simplex, with three dummy topics.

The topic distribution of a document can be visualized using a bar plot. This shows which topics are present in the document and thereby its underlying hidden (latent) structure:

Figure 4.3: Representation of a topic distribution using dummy data, where the x-axis represents the topics (1 to x) and the y-axis represents the probability of belonging to the given topic.

At Issuu, the LDA model is trained on 4.5 million English Wikipedia¹ articles. LDA makes the assumption that all words in the same article are somehow related. Every article is unique in the sense that it has a unique distribution of words.

¹ www.wikipedia.org


This could be interpreted as a unique topic for each article, resulting in 4.5 million topics. That would be useless, since the goal is a model that finds similarities among documents instead of declaring them all different. One of the main steps in LDA is dimensionality reduction, where the number of topics is reduced (in Issuu’s case to 150 topics), forcing similar topics to “merge” and reveal deeper underlying patterns.

4.2 Using LDA

All the tweets containing the trending hashtag are used as the data source for Issuu’s LDA model, instead of only the hashtag itself. A hashtag alone does not provide enough information and context to give a stable result from Issuu’s LDA model. The model is context-dependent and would not be able to differentiate between the fruit and the electronics company based on the hashtag #apple alone.

To give an idea of the richness of the context behind the tweets for a single hashtag like #apple, two tag clouds were generated (see Figure 4.4). A tag cloud is a visual representation of text that emphasizes the most frequently used words, by either color or size.

Figure 4.4: Left: Tag cloud for all words in the tweets containing #apple. Right: Same tag cloud after removing #apple, #free and #mavericks, to get a deeper understanding.


The flowchart (Figure 4.5) contains four steps, visualized as arrows and denoted with numbers. It shows the overall structure of how to turn trends from Twitter into similar magazines/documents using LDA.

Figure 4.5: From trend to magazines flowchart

The tweets corresponding to the trending hashtag are fed into the LDA model (step 1), which produces the topic distribution (step 2). Figure 4.6 shows the topic distribution for the #apple tweets, where it is easy to see that the software/electronics topic dominates, as expected.

Figure 4.6: Topic distribution for #apple tweets (the dominant bar is labeled "Software")

Using this LDA topic distribution in combination with the Jensen-Shannon divergence (Equation 4.1) (step 3), it is possible to find similar magazines to recommend from Issuu (step 4):

$$JSD(P \| Q) = \frac{1}{2} D(P \| M) + \frac{1}{2} D(Q \| M), \quad \text{where } M = \frac{1}{2}(P + Q) \tag{4.1}$$

where P and Q are two probability distributions, which in Issuu’s case are two LDA topic distributions, and D denotes the Kullback-Leibler divergence.
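The Jensen-Shannon divergence can be sketched in plain Python using its standard definition, JSD(P‖Q) = ½ D(P‖M) + ½ D(Q‖M) with M = ½ (P + Q) and D the Kullback-Leibler divergence. This is a minimal illustration with hypothetical function names and toy three-topic distributions, not Issuu's implementation. Base-2 logarithms are used so that the result lies between 0 (identical distributions) and 1.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric divergence between two topic distributions of equal length."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


tweets_topics = [0.7, 0.2, 0.1]      # toy topic distribution for a trend
magazine_a = [0.6, 0.3, 0.1]         # similar magazine
magazine_b = [0.05, 0.05, 0.9]       # unrelated magazine

# The magazine with the smaller divergence is the better recommendation.
print(jensen_shannon_divergence(tweets_topics, magazine_a))
print(jensen_shannon_divergence(tweets_topics, magazine_b))
```

Because every entry of M is positive wherever P or Q is positive, the divergence stays finite even when a topic has probability 0 in one of the distributions.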


4.2.1 Results using LDA

Figure 4.7 is a subset of the magazines found to be similar, using LDA, to the tweets containing the hashtag #apple.

Figure 4.7: Subset of the similar #apple documents using LDA. NB: The documents are blurred due to copyright issues and the terms of service/privacy policy on Issuu; this applies to all figures which show magazine covers.

The resulting magazines range from learning material such as "... For Dummies" to magazines like "Computer Magazine" and "macworld". The popularity and the upload date on Issuu differ from magazine to magazine. These parameters could also be used to weight the documents.


4.3 Solr

Lucene is a Java-based high-performance text search engine library. A document in Lucene terms is not a document as we know it, but merely a collection of fields which describe the document. For any given document, these fields could be information such as the title and the number of pages. Lucene uses a text analyzer, which tokenizes the data from a field into a series of words. After that a process called stemming is performed, which reduces each word to its stem/base. See Figure 4.8 for an example.

Figure 4.8: Example of tokenizing and stemming. The text "Magazine recommendations based on social media trends" is tokenized into single words, which are then stemmed (recommendations to recommend, based to base, trends to trend).

Lucene uses term frequency-inverse document frequency (tf-idf) as part of its scoring model. Here tf is the term frequency, a measure of how often a term appears in a document, and idf is the inverse document frequency, which is based on how many documents in the collection contain the term and gives rarer terms a higher weight.
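A textbook version of tf-idf makes the interplay of the two factors concrete. The sketch below is not Lucene's scoring formula, which applies further normalization and boosts that are not reproduced here; the function name and the toy corpus are made up for illustration.

```python
import math
from collections import Counter
from typing import List


def tf_idf(term: str, document: List[str], corpus: List[List[str]]) -> float:
    """Textbook tf-idf: relative term frequency times log(N / document frequency)."""
    tf = Counter(document)[term] / len(document)
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    return tf * math.log(len(corpus) / containing)


corpus = [
    ["apple", "mac", "apple"],
    ["apple", "pie"],
    ["surf", "mavericks"],
]

# "apple" is frequent in the first document but also common across the corpus,
# so the rarer term "mac" ends up with the higher score.
print(tf_idf("apple", corpus[0], corpus))
print(tf_idf("mac", corpus[0], corpus))
```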

Solr is an open source enterprise search server, widely used by services like Netflix². It is a web application built around Lucene, adding useful functionality such as geospatial search, replication and a web administration interface for configuration.

² www.netflix.com


4.4 Using Solr

In this project an integration with Issuu’s Solr server has been created. The search engine is used through HTTP requests and JSON³ responses. JSON is a human-readable open standard text format, mostly used to transfer data between servers and web applications, as in this project. An example of a request could be:

<base url>?q=apple+mavericks+free+new&

wt=json&debug=true&start=0&rows=50&

where,

Parameter  Description
base url   Address to access the Solr search engine.
debug      If true, the response will contain additional information, including scores and the reason for each document.
q          Main text to be queried in the request.
rows       Maximum number of results; used to paginate.
start      Defines the current position; used in combination with rows to paginate.
wt         The format of the response, e.g. json.

Table 4.2: Explanation of parameters to Solr.

The q parameter is constructed from the x most frequent words in the tweets corresponding to the trending hashtag, where x is tunable.

³ json.org
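Building such a request from the most frequent words can be sketched with the Python standard library. The base URL, the function name and the tokenization (lowercasing, splitting on whitespace, stripping a leading #) are assumptions for illustration; stop-word removal, which a real pipeline would probably need, is left out.

```python
from collections import Counter
from typing import List
from urllib.parse import urlencode


def build_solr_request(base_url: str, tweets: List[str], top_words: int = 4, rows: int = 50) -> str:
    """Build a Solr request URL whose q parameter holds the most frequent words."""
    counts = Counter(
        word.lower().lstrip("#")
        for tweet in tweets
        for word in tweet.split()
    )
    terms = [word for word, _ in counts.most_common(top_words)]
    params = {"q": " ".join(terms), "wt": "json", "debug": "true", "start": 0, "rows": rows}
    # urlencode encodes the spaces between the terms as '+'.
    return f"{base_url}?{urlencode(params)}"


tweets = ["#apple mavericks free", "apple mavericks", "apple"]
url = build_solr_request("http://solr.example/select", tweets, top_words=2)
print(url)
```

Here top_words plays the role of the tunable x; increasing start by rows between requests would page through further results.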


4.4.1 Results using Solr

The similar magazines produced by Solr (Figure 4.9) are more diverse than the similar documents from the LDA model (Figure 4.7). They range from Apple magazines and learning material to magazines about the surf spot "Mavericks" in California and the NBA (National Basketball Association) team Dallas Mavericks.

Figure 4.9: Subset of the similar #apple documents using Solr

Appendix B contains a complete example using the hashtag #bostonstrong, displaying the complete process from tweets to similar documents using both LDA and Solr.


Chapter 5

Conclusion

A prototype of an end-to-end solution, with the purpose of spotting location-based trends on a social media network and mapping them to Issuu, has been developed. Twitter was selected as the data source because it suited the requirements best, as described in Section 1.2.

Both results (LDA: Figure 4.7 and Solr: Figure 4.9) suggest that improvements could be made. Four improvements for the trending framework and three for the LDA model were identified:

• Trending Framework:

1. Support the existing 28 languages supported¹ by Issuu.

2. Capture "Slow" trends.

3. Non-parametric model.

4. Recurring weekly and yearly trends.

¹ Magazines written in other languages do not have an LDA topic distribution, because only those languages are incorporated into the translation framework and can therefore be translated to English.


• LDA:

1. Limited by Issuu’s LDA model.

2. Wikipedia lacks certain topics.

3. Big magazines result in many topics.

5.1 Improvements of the trending framework

The USA was chosen as the location in this prototype, because most Issuu readers are from the USA and more than half of Twitter's users are from that country too. In the future the goal is to support the existing 28 languages supported by Issuu (Figure 5.1).

Figure 5.1: List of languages supported by Issuu, including a colored world map of countries speaking those languages (map legend: Supported / Not supported). Languages legible in the extracted figure: English, Spanish, German, French, Portuguese, Russian, Arabic, Italian, Dutch, Turkish, Farsi, Polish, Indonesian, Swedish, Norwegian, Catalan, Czech, Hebrew, Danish, Finnish, Romanian, Hungarian, Croatian, Icelandic.

One solution is to extend the existing end-to-end solution with Issuu’s translation framework (Figure 5.2), which will be able to translate all non-English tweets to English (1st improvement of the trending framework).

A tunable trending framework has been developed, which is capable of spotting "fast" trends on Twitter and reducing the importance of daily recurring trends. The hashtags #apple and #pll were found to be trending on the 22nd of October, among a vast number of unimportant recurring trends. It was the day Apple presented their updated product line, including the new OS X Mavericks², and the day the Halloween episode of the popular TV show "Pretty Little Liars" (pll) was shown.

² Apple Special Event: http://www.apple.com/apple-events/october-2013/


Figure 5.2: Translation module to improve the solution.

Hashtags like #happyhalloween, described in Appendix B along with #bostonstrong, could be spotted on the 31st of October, but the ability to spot "slow" trends (described in Section 1.3) needs to be improved (2nd improvement of the trending framework). A possible solution is to run multiple instances of the framework simultaneously, with various sizes of the current window (w) and the reference window (r) (Figure 5.3).

Figure 5.3: Three simultaneously running trending frameworks (window sizes of 2, 6 and 12 hours, ranging from fast to slow trends).

Creating a new trending framework built on a non-parametric model (3rd improvement of the trending framework) [Nik12] would make the system more robust and faster at spotting trends. It would be more robust because parameters like the threshold are not defined from the beginning, but learned from training data, which is used to decide whether a new dataset is a trend or not.


A last improvement (4th) would be to implement the weekly and yearly recurring trends (as described in Section 3.4), which is basically the same principle as for the daily recurring trends.

5.2 Improvements of the LDA model

This project is limited by Issuu’s LDA model (1st improvement of the LDA model). The similar magazines found using the Apple tweets (Figure 4.7) include magazines about Microsoft too, because Apple is part of the overall Software topic, which also includes Google, Microsoft etc. (see Figure 5.4).

Figure 5.4: Top words in the topics. The figure lists the top words of four topics about technology, computers and software (#98, #70, #1 and #66) next to the top words in the #apple tweets.

Wikipedia is a great resource to use as a text corpus for training the LDA model because it is free, it is broad enough to cover multiple topics, and it is very clean and focused in the sense that each article is about one topic. For example, a Wikipedia article about Italian food is very unlikely to write about technology or cars. The main disadvantage of using Wikipedia at Issuu is that Wikipedia does not cover all possible themes evenly (2nd improvement of the LDA model): it is overloaded on business and technology, while lacking in entertainment topics. That is one of the reasons why certain topics in Issuu’s LDA model are combined (for example American football and baseball), making it impossible for us to distinguish between them. This will be addressed when Issuu launches their new LDA model with more topics.

Magazines are often broad and big and therefore include many topics. This is a problem, because a single topic distribution is computed from the whole magazine. A solution to this problem (3rd improvement of the LDA model) would be to compute LDA per page instead (Figure 5.5), which makes it possible to recommend single pages within a magazine.

Figure 5.5: LDA per page solution.

5.3 LDA vs. Solr

The two approaches - LDA (Section 4.1) and Solr (Section 4.3) - differ in the results they provide. One could argue that the quality of the LDA approach is better than that of Solr. Once both systems are launched live, A/B testing³ could be used to see which one fits Issuu better.

³ en.wikipedia.org/wiki/A/B_testing


Appendix A

Dataset statistics

A.1 Location

Location            Count
Los Angeles         2,051,577
Texas               1,678,012
Georgia             1,539,281
New York            1,503,027
Manhattan           1,435,282
Chicago             1,280,852
Florida             1,273,262
Philadelphia        1,178,112
Ohio                1,116,043
South Carolina      1,078,361
Oberon              1
Macy's              1
Woodberry           1
Juntura             1
Conconully          1
German Valley       1
Unionville Center   1
Cedarbend           1
Funkley             1
Alicia              1

Figure A.1: Top and bottom 10 of used locations


A.2 Hashtag

Hashtag                  Count
job                      472,043
jobs                     412,111
tweetmyjobs              230,950
oomf                     133,661
wcw                      84,900
pdx                      76,820
veteranjob               68,357
mcm                      61,772
coupon                   56,949
nursing                  55,968
happyhalloween           13,274
bostonstrong             9,913
apple                    7,388
pll                      6,070
wrongconversation        1
billycorgan              1
justwantcheesecake       1
stalkerwife              1
travelquestions          1
uneedheadandshoulders    1
hdbros                   1
thoughtweweregonnamove   1
quietdownpeople          1
xodb                     1

Figure A.2: Top and bottom 10 of used hashtags, including the four hashtags analyzed in this project: happyhalloween, bostonstrong, apple and pll.


Appendix B

Example: #bostonstrong

This is a full example of how the trending framework works (part 1) and of the transformation from trends to magazines/documents on Issuu (part 2). Short facts about this subset of the dataset: 5,465,644 tweets containing 1,195,695 hashtags, of which 280,855 are unique.

The fluctuation of the total number of tweets per hour follows the same pattern as described in Section 3.1, as expected:

total tweets hour Wx constants count bostonstrong count happyhalloween count jobs fraction bostonstrong fraction happyhalloween fraction jobs ref fraction bostonstrong ref fraction happyhalloween ref fraction jobs A bostonstrong A happyhalloween A jobs FINAL bostonstrong FINAL happyhalloween FINAL jobs

157,676 80000 65 8 15 0.000412237753 0.000050736954 0.000095131789

155,341 0.00009 178 10 15 0.001145866191 0.000064374505 0.000096561758

130,396 9 9 20 0.000069020522 0.000069020522 0.000153378938

90,009 5 8 44 0.000055550001 0.000088880001 0.000488840005

54,489 1 3 34 0.000018352328 0.000055056984 0.000623979152

29,956 0 2 44 0.000000000000 0.000066764588 0.001468820937

15,455 4 0 34 0.000258815917 0.000000000000 0.002199935296

10,737 1 1 46 0.000093135885 0.000093135885 0.004284250722

20,177 7 0 25 0.000346929672 0.000000000000 0.001239034544

34,251 5 2 98 0.000145981139 0.000058392456 0.002861230329

43,133 6 2 94 0.000139104630 0.000046368210 0.002179305868

58,313 2 3 94 0.000034297669 0.000051446504 0.001611990465

75,845 4 6 296 0.000052739139 0.000079108709 0.003902696288

85,045 4 5 1015 0.000047033923 0.000058792404 0.011934858016

89,980 4 8 691 0.000044454323 0.000088908646 0.007679484330

90,166 10 17 818 0.000110906550 0.000188541135 0.009072155802

90,364 2 6 807 0.000022132708 0.000066398123 0.008930547563

92,980 2 8 656 0.000021510002 0.000086040009 0.007055280706

94,524 3 3 289 0.000031737971 0.000031737971 0.003057424569

99,682 5 6 222 0.000050159507 0.000060191409 0.002227082121

108,332 7 5 251 0.000064616180 0.000046154414 0.002316951593

128,469 2 6 175 0.000015567958 0.000046703874 0.001362196328

149,266 4 14 148 0.000026797797 0.000093792290 0.000991518497

154,043 7 9 39 0.000045441857 0.000058425245 0.000253176061

154,824 0.998811839382 5 9 18 0.000032294735 0.000058130522 0.000116261045 0.000036266646 0.000075830259 0.000616532975 -0.000003971911 -0.000017699737 -0.000500271930 -0.001148471910 -0.000205995824 -0.012420355014

145,399 0.997229380851 1 10 26 0.000006877626 0.000068776264 0.000178818286 0.000038851674 0.000058277511 0.000184545452 -0.000031974048 0.000010498753 -0.000005727165 -0.001174576892 -0.000177549095 -0.011907502368

120,202 0.973869650226 1 6 9 0.000008319329 0.000049915975 0.000074873962 0.000019985144 0.000063286291 0.000146557725 -0.000011665815 -0.000013370316 -0.000071683763 -0.001127285290 -0.000196635434 -0.011692806643

89,740 0.706117162445 5 7 13 0.000055716514 0.000078003120 0.000144862937 0.000007530092 0.000060240737 0.000131776612 0.000048186422 0.000017762383 0.000013086325 -0.000775090524 -0.000120589808 -0.008418167598

47,567 0.051223735645 2 4 10 0.000042045956 0.000084091912 0.000210229781 0.000028579322 0.000061921864 0.000104790847 0.000013466634 0.000022170048 0.000105438934 -0.000058005736 -0.000008522149 -0.000605947036

26,504 0.008044895961 1 1 11 0.000037730154 0.000037730154 0.000415031693 0.000050980649 0.000080112449 0.000167507847 -0.000013250495 -0.000042382295 0.000247523846 -0.000009324973 -0.000001857755 -0.000094023387

14,263 0.002687829072 0 0 28 0.000000000000 0.000000000000 0.001963121363 0.000040501681 0.000067502801 0.000283511766 -0.000040501681 -0.000067502801 0.001679609597 -0.000003188754 -0.000000688202 -0.000027564355

10,032 0.001838215698 0 2 21 0.000000000000 0.000199362041 0.002093301435 0.000024529644 0.000024529644 0.000956656119 -0.000024529644 0.000174832397 0.001136645316 -0.000002151440 -0.000000025200 -0.000019849444

18,514 0.003935633580 2 0 15 0.000108026358 0.000000000000 0.000810197688 0.000000000000 0.000082321465 0.002016875900 0.000108026358 -0.000082321465 -0.001206678212 -0.000004084557 -0.000001066016 -0.000051720271

33,579 0.015099336268 8 4 71 0.000238244141 0.000119122070 0.002114416749 0.000070062355 0.000070062355 0.001261122399 0.000168181785 0.000049059715 0.000853294350 -0.000014762386 -0.000002106077 -0.000167324256

44,987 0.041045201687 12 3 99 0.000266743726 0.000066685931 0.002200635739 0.000191964371 0.000076785749 0.001650893594 0.000074779355 -0.000010099817 0.000549742145 -0.000043962975 -0.000008153258 -0.000467304377

56,699 0.109379979688 11 8 81 0.000194006949 0.000141095963 0.001428596624 0.000254563043 0.000089097065 0.002163785862 -0.000060556094 0.000051998898 -0.000735189237 -0.000131958445 -0.000014934987 -0.001385849511

70,860 0.305212030948 16 11 601 0.000225797347 0.000155235676 0.008481512842 0.000226186496 0.000108176150 0.001770155184 -0.000000389149 0.000047059526 0.006711357659 -0.000349850920 -0.000043181889 -0.001594275153

78,774 0.472442953055 15 7 978 0.000190418158 0.000088861807 0.012415263920 0.000211666758 0.000148950682 0.005346545520 -0.000021248600 -0.000060088874 0.007068718399 -0.000551395158 -0.000117463496 -0.002298973371

81,468 0.532982036922 9 10 1049 0.000110472824 0.000122747582 0.012876221338 0.000207172167 0.000120293516 0.010552414558 -0.000096699343 0.000002454066 0.002323806780 -0.000662265109 -0.000099181065 -0.005122517665

80,945 0.521249692402 9 11 966 0.000111186608 0.000135894743 0.011934029279 0.000149773468 0.000106089540 0.012649617454 -0.000038586859 0.000029805204 -0.000715588174 -0.000617395788 -0.000082741055 -0.006594041186

82,375 0.553234966201 10 21 868 0.000121396055 0.000254931715 0.010537177542 0.000110828567 0.000129299994 0.012406642325 0.000010567488 0.000125631720 -0.001869464784 -0.000628086940 -0.000034803688 -0.007637034058

84,356 0.596773691157 13 11 949 0.000154108777 0.000130399734 0.011249940727 0.000116336027 0.000195934362 0.011229488121 0.000037772750 -0.000065534628 0.000020452606 -0.000661281013 -0.000151625731 -0.007110203695

86,172 0.635406055845 7 20 789 0.000081232883 0.000232093952 0.009156106392 0.000137946753 0.000191925917 0.010897793452 -0.000056713869 0.000040168035 -0.001741687060 -0.000764126653 -0.000094277166 -0.008690159564

89,045 0.692971862621 19 11 254 0.000213375260 0.000123533045 0.002852490314 0.000117282792 0.000181788328 0.010191874648 0.000096092467 -0.000058255283 -0.007339384334 -0.000727463653 -0.000171022974 -0.013356507622

99,441 0.851913697642 30 22 178 0.000301686427 0.000221236713 0.001790006134 0.000148387428 0.000176923472 0.005952618753 0.000153298999 0.000044313241 -0.004162612618 -0.000845581587 -0.000122869719 -0.013713655731

114,784 0.958135861164 81 35 175 0.000705673265 0.000304920546 0.001524602732 0.000259966257 0.000175079316 0.002291947413 0.000445707007 0.000129841230 -0.000767344680 -0.000670847623 -0.000056242484 -0.012170435920

132,495 0.991203360961 129 26 146 0.000973621646 0.000196233820 0.001101928375 0.000518146808 0.000266075388 0.001647800210 0.000455474838 -0.000069841568 -0.000545871835 -0.000684318230 -0.000256109804 -0.012370941376

152,910 0.998588796229 355 55 35 0.002321627101 0.000359688706 0.000228892813 0.000849243163 0.000246684919 0.001298128834 0.001472383938 0.000113003787 -0.001069236021 0.000326056964 -0.000075430750 -0.012985742611

153,378 0.998646922963 208 46 61 0.001356126694 0.000299912634 0.000397710232 0.001695835742 0.000283807221 0.000634186507 -0.000339709049 0.000016105413 -0.000236476275 -0.002657735157 -0.000343118398 -0.013094955123

153,392 0.998648624464 2163 41 36 0.014101126526 0.000267289037 0.000234692813 0.001838139268 0.000329755002 0.000313430497 0.012262987258 -0.000062465965 -0.000078737684 0.009927925646 -0.000421584181 -0.012937452007

119,355 0.971858093513 397 87 48 0.003326211721 0.000728917934 0.000402161619 0.007728917430 0.000283600091 0.000316197803 -0.004402705709 0.000445317843 0.000085963816 -0.006535097264 0.000083219370 -0.012430315292

82,169 0.548648112120 116 89 48 0.001411724616 0.001083133542 0.000584161910 0.009385987747 0.000469299387 0.000307977723 -0.007974263131 0.000613834155 0.000276184187 -0.005648820738 0.000139436421 -0.006912986596

49,355 0.059633623157 38 44 36 0.000769932124 0.000891500355 0.000729409381 0.002545602509 0.000873345110 0.000476370060 -0.001775670384 0.000018155244 0.000253039321 -0.000244336694 -0.000020366878 -0.000752766079

27,905 0.009116147803 17 46 67 0.000609209819 0.001648450099 0.002401003404 0.001170888963 0.001011222286 0.000638666707 -0.000561679144 0.000637227812 0.001762336697 -0.000026284646 0.000002530088 -0.000101315815

15,213 0.002927046635 3 40 38 0.000197199763 0.002629330178 0.002497863669 0.000711881957 0.001164897748 0.001333160756 -0.000514682194 0.001464432430 0.001164702913 -0.000008302010 0.000003233636 -0.000034280161

10,524 0.001921280998 11 43 59 0.001045229951 0.004085898898 0.005606233371 0.000463843406 0.001994526648 0.002435177884 0.000581386544 0.002091372250 0.003171055487 -0.000003343491 0.000003327051 -0.000018646351

19,994 0.004493856737 33 98 43 0.001650495149 0.004901470441 0.002150645194 0.000543963943 0.003224929090 0.003768893033 0.001106531206 0.001676541351 -0.001618247840 -0.000005460467 0.000005917747 -0.000065136068

37,977 0.022268325137 58 242 134 0.001527240172 0.006372277958 0.003528451431 0.001441772069 0.004620224130 0.003342289796 0.000085468103 0.001752053828 0.000186161635 -0.000049795516 0.000031005639 -0.000282586375

47,841 0.052435557917 57 398 248 0.001191446667 0.008319224096 0.005183838130 0.001569750392 0.005865001466 0.003053250763 -0.000378303725 0.002454222630 0.002130587367 -0.000141572379 0.000109828055 -0.000563453312

60,822 0.151097373647 48 519 164 0.000789188123 0.008533096577 0.002696392753 0.001340045212 0.007457642919 0.004451280617 -0.000550857089 0.001075453658 -0.001754887864 -0.000434024817 0.000108150204 -0.002210722174

73,817 0.364364626367 50 537 333 0.000677350746 0.007274747009 0.004511155967 0.000966290274 0.008438935056 0.003791538978 -0.000288939528 -0.001164188047 0.000719616988 -0.000951198134 -0.000555246784 -0.004429436602

79,683 0.492867983759 30 575 615 0.000376491849 0.007216093772 0.007718082904 0.000727872310 0.007843195508 0.003691352431 -0.000351380461 -0.000627101736 0.004026730472 -0.001317439848 -0.000486357416 -0.004361630721

82,518 0.556413771330 29 570 698 0.000351438474 0.006907583800 0.008458760513 0.000521172638 0.007244299674 0.006175895765 -0.000169734164 -0.000336715874 0.002282864747 -0.001386227717 -0.000387489099 -0.005894289492

82,125 0.547667296341 21 576 952 0.000255707763 0.007013698630 0.011592085236 0.000363746216 0.007059142669 0.008094894606 -0.000108038454 -0.000045444039 0.003497190630 -0.001330648365 -0.000221877955 -0.005136588390

80,785 0.517655156915 13 484 826 0.000160920963 0.005991211240 0.010224670421 0.000303687372 0.006960514568 0.010021683278 -0.000142766409 -0.000969303328 0.000202987143 -0.001275706009 -0.000687959580 -0.006560365036

83,266 0.572960435379 11 474 808 0.000132106742 0.005692599620 0.009703840703 0.000208704192 0.006506660119 0.010914001596 -0.000076597450 -0.000814060499 -0.001210160893 -0.001374087783 -0.000672511855 -0.008070939696

82,325 0.552122454395 10 450 573 0.000121469784 0.005466140298 0.006960218646 0.000146295969 0.005839647427 0.009960317218 -0.000024826185 -0.000373507129 -0.003000098573 -0.001295529547 -0.000404813884 -0.008765672716

85,047 0.611644482390 9 494 236 0.000105823839 0.005808552918 0.002774936212 0.000126818487 0.005580013407 0.008339825232 -0.000020994648 0.000228539511 -0.005564889020 -0.001432851667 -0.000080216681 -0.011279403400

88,557 0.683549014676 18 656 151 0.000203258918 0.007407658344 0.001705116479 0.000113519585 0.005640130966 0.004833544440 0.000089739333 0.001767527379 -0.003128427961 -0.001525604685 0.000962326738 -0.010939962259

90,514 0.720362412173 18 643 90 0.000198864264 0.007103873434 0.000994321320 0.000155526370 0.006624271330 0.002229211308 0.000043337894 0.000479602104 -0.001234889988 -0.001641193909 0.000086381105 -0.010165114194

101,491 0.873712464645 15 657 87 0.000147796356 0.006473480407 0.000857218867 0.000201037577 0.007254105913 0.001345834892 -0.000053241221 -0.000780625506 -0.000488616026 -0.002074952055 -0.000996306741 -0.011677024993

115,342 0.960104553736 8 690 52 0.000069358950 0.005982209429 0.000450833174 0.000171870524 0.006770657014 0.000921850993 -0.000102511574 -0.000788447585 -0.000471017819 -0.002327426581 -0.001102330881 -0.012814745095

129,015 0.988006802870 14 618 23 0.000108514514 0.004790140681 0.000178273844 0.000106072415 0.006212154054 0.000641046335 0.000002442099 -0.001422013373 -0.000462772491 -0.013929596125 -0.009835716354 -0.011910281442

117,151 0.965894303881 9 552 25 0.000076823928 0.004711867590 0.000213399800 0.000090032207 0.005352823942 0.000306927978 -0.000013208279 -0.000640956352 -0.000093528178 -0.013632955591 -0.008861165468 -0.011287067434

93,636 0.773335144678 3 435 28 0.000032038959 0.004645649109 0.000299030287 0.000093432887 0.004752890326 0.000194990372 -0.000061393927 -0.000107241217 0.000104039915 -0.010952374803 -0.006681876878 -0.008884109190

71,741 0.322280762354 1 300 160 0.000013939031 0.004181709204 0.002230244909 0.000056929507 0.004682451954 0.000251438656 -0.000042990476 -0.000500742750 0.001978806253 -0.004558376810 -0.002911432625 -0.003098174879

50,220 0.064151868077 1 201 28 0.000019912386 0.004002389486 0.000557546794 0.000024187160 0.004444390695 0.001136796532 -0.000004274775 -0.000442001209 -0.000579249737 -0.000904887843 -0.000575769289 -0.000780813876

32,794 0.014083885541 0 112 48 0.000000000000 0.003415258889 0.001463682381 0.000016398685 0.004107870549 0.001541476374 -0.000016398685 -0.000692611660 -0.000077793993 -0.000198829609 -0.000129933819 -0.000164357243

20,598 0.004743713550 2 96 58 0.000097096806 0.004660646665 0.002815807360 0.000012046161 0.003770448358 0.000915508228 0.000085050645 0.000890198307 0.001900299132 -0.000066488249 -0.000036255720 -0.000045975057

14,780 0.002815488608 0 49 37 0.000000000000 0.003315290934 0.002503382950 0.000037458795 0.003895714714 0.001985316152 -0.000037458795 -0.000580423780 0.000518066798 -0.000039807026 -0.000025659013 -0.000031178773

20,730 0.004800132821 0 19 159 0.000000000000 0.000916546068 0.007670043415 0.000056532308 0.004098592346 0.002685284640 -0.000056532308 -0.003182046277 0.004984758775 -0.000067958643 -0.000056234242 -0.000031716045

33,437 0.014910453545 5 21 89 0.000149534946 0.000628046775 0.002661722044 0.000000000000 0.001914953534 0.005519571952 0.000149534946 -0.001286906760 -0.002857849907 -0.000208024558 -0.000146420704 -0.000215455087

43,474 0.036008064557 3 25 135 0.000069006763 0.000575056356 0.003105304320 0.000092307124 0.000738456994 0.004578433363 -0.000023300362 -0.000163400639 -0.001473129044 -0.000508593275 -0.000313144033 -0.000470453079

57,497 0.116575160226 4 35 138 0.000069568847 0.000608727412 0.002400125224 0.000104016331 0.000598093901 0.002912457256 -0.000034447483 0.000010633511 -0.000512332032 -0.001647856805 -0.000993507497 -0.001411074382

72,142 0.330213409302 3 27 326 0.000041584652 0.000374261872 0.004518865571 0.000069326836 0.000594230026 0.002703746620 -0.000027742184 -0.000219968154 0.001815118950 -0.004665541906 -0.002890379347 -0.003228485370

80,110 0.502474979786 4 39 680 0.000049931344 0.000486830608 0.008488328548 0.000053996097 0.000478251144 0.003579169849 -0.000004064752 0.000008579464 0.004909158700 -0.007087505702 -0.004283356564 -0.003358003376

84,234 0.594128736584 6 45 711 0.000071230145 0.000534226084 0.008440772135 0.000045976408 0.000433491842 0.006607466569 0.000025253737 0.000100734242 0.001833305566 -0.008362880516 -0.005009908780 -0.005797971436

86,018 0.632189190487 3 39 832 0.000034876421 0.000453393476 0.009672394150 0.000060847977 0.000511123010 0.008463953658 -0.000025971556 -0.000057729534 0.001208440492 -0.008930998700 -0.005431027405 -0.006564427965

84,815 0.606673333129 0 28 862 0.000000000000 0.000330130284 0.010163296587 0.000052862815 0.000493386274 0.009063035970 -0.000052862815 -0.000163255991 0.001100260616 -0.008586847890 -0.005275845198 -0.006365110212

85,221 0.615357720536 3 34 912 0.000035202591 0.000398962697 0.010701587637 0.000017561010 0.000392195887 0.009916116909 0.000017641581 0.000006766810 0.000785470727 -0.008666381193 -0.005246742850 -0.006649933671

85,339 0.617868296609 5 14 650 0.000058589859 0.000164051606 0.007616681705 0.000017643323 0.000364628667 0.010433084759 0.000040946537 -0.000200577061 -0.002816403054 -0.008687339460 -0.005396260054 -0.008902548116

84,941 0.609375994694 17 18 353 0.000200138920 0.000211911798 0.004155825809 0.000046904315 0.000281425891 0.009158067542 0.000153234605 -0.000069514094 -0.005002241734 -0.008499510513 -0.005242224434 -0.010112184503

86,959 0.651652295784 8 22 273 0.000091997378 0.000252992790 0.003139410527 0.000129198966 0.000187925769 0.005890298332 -0.000037201588 0.000065067020 -0.002750887805 -0.009213273974 -0.005518210901 -0.009346631311

91,068 0.730295040770 5 19 177 0.000054904028 0.000208635305 0.001943602583 0.000145433392 0.000232693426 0.003641652123 -0.000090529364 -0.000024058121 -0.001698049541 -0.010364095916 -0.006249247639 -0.009705719518

96,702 0.818048283021 7 18 166 0.000072387334 0.000186138860 0.001716613927 0.000073022631 0.000230302145 0.002527706471 -0.000000635297 -0.000044163286 -0.000811092544 -0.011535922047 -0.007016612704 -0.010146398287

105,291 0.906885813868 10 25 67 0.000094974879 0.000237437198 0.000636331690 0.000063907973 0.000197049582 0.001826702881 0.000031066907 0.000040387616 -0.001190371191 -0.012759937469 -0.007701917278 -0.011592228400

Figure B.1: Total tweets per hour (x-axis: hours 1 to 72; y-axis: tweets per hour)


The three hashtags used in this example, #bostonstrong, #happyhalloween and #jobs, differ from the ones used in the explanation of the trending framework in chapter 3. The hashtag #bostonstrong, used by fans of the baseball team Boston Red Sox, reflects an unexpected event: the Red Sox became champions of the World Series¹. In contrast, #happyhalloween is a hashtag used to celebrate Halloween, a yearly recurring event.

Figure B.2: Raw tweet count for hashtags (series: #bostonstrong, #happyhalloween, #jobs; x-axis: hours 1 to 72; y-axis: tweet count)

Applying the knowledge learned and the combined trend_score equation (3.5) results in the expected trend, #bostonstrong:

Columns: total tweets (per hour); Wx; constants; count (#bostonstrong, #happyhalloween, #jobs); fraction (#bostonstrong, #happyhalloween, #jobs); ref fraction (#bostonstrong, #happyhalloween, #jobs); A (#bostonstrong, #happyhalloween, #jobs); FINAL (#bostonstrong, #happyhalloween, #jobs)

157,676 80000 65 8 15 0.000412237753 0.000050736954 0.000095131789

155,341 0.00009 178 10 15 0.001145866191 0.000064374505 0.000096561758

130,396 9 9 20 0.000069020522 0.000069020522 0.000153378938

90,009 5 8 44 0.000055550001 0.000088880001 0.000488840005

54,489 1 3 34 0.000018352328 0.000055056984 0.000623979152

29,956 0 2 44 0.000000000000 0.000066764588 0.001468820937

15,455 4 0 34 0.000258815917 0.000000000000 0.002199935296

10,737 1 1 46 0.000093135885 0.000093135885 0.004284250722

20,177 7 0 25 0.000346929672 0.000000000000 0.001239034544

34,251 5 2 98 0.000145981139 0.000058392456 0.002861230329

43,133 6 2 94 0.000139104630 0.000046368210 0.002179305868

58,313 2 3 94 0.000034297669 0.000051446504 0.001611990465

75,845 4 6 296 0.000052739139 0.000079108709 0.003902696288

85,045 4 5 1015 0.000047033923 0.000058792404 0.011934858016

89,980 4 8 691 0.000044454323 0.000088908646 0.007679484330

90,166 10 17 818 0.000110906550 0.000188541135 0.009072155802

90,364 2 6 807 0.000022132708 0.000066398123 0.008930547563

92,980 2 8 656 0.000021510002 0.000086040009 0.007055280706

94,524 3 3 289 0.000031737971 0.000031737971 0.003057424569

99,682 5 6 222 0.000050159507 0.000060191409 0.002227082121

108,332 7 5 251 0.000064616180 0.000046154414 0.002316951593

128,469 2 6 175 0.000015567958 0.000046703874 0.001362196328

149,266 4 14 148 0.000026797797 0.000093792290 0.000991518497

154,043 7 9 39 0.000045441857 0.000058425245 0.000253176061

154,824 0.998811839382 5 9 18 0.000032294735 0.000058130522 0.000116261045 0.000036266646 0.000075830259 0.000616532975 -0.000003971911 -0.000017699737 -0.000500271930 -0.001148471910 -0.000205995824 -0.012420355014

145,399 0.997229380851 1 10 26 0.000006877626 0.000068776264 0.000178818286 0.000038851674 0.000058277511 0.000184545452 -0.000031974048 0.000010498753 -0.000005727165 -0.001174576892 -0.000177549095 -0.011907502368

120,202 0.973869650226 1 6 9 0.000008319329 0.000049915975 0.000074873962 0.000019985144 0.000063286291 0.000146557725 -0.000011665815 -0.000013370316 -0.000071683763 -0.001127285290 -0.000196635434 -0.011692806643

89,740 0.706117162445 5 7 13 0.000055716514 0.000078003120 0.000144862937 0.000007530092 0.000060240737 0.000131776612 0.000048186422 0.000017762383 0.000013086325 -0.000775090524 -0.000120589808 -0.008418167598

47,567 0.051223735645 2 4 10 0.000042045956 0.000084091912 0.000210229781 0.000028579322 0.000061921864 0.000104790847 0.000013466634 0.000022170048 0.000105438934 -0.000058005736 -0.000008522149 -0.000605947036

26,504 0.008044895961 1 1 11 0.000037730154 0.000037730154 0.000415031693 0.000050980649 0.000080112449 0.000167507847 -0.000013250495 -0.000042382295 0.000247523846 -0.000009324973 -0.000001857755 -0.000094023387

14,263 0.002687829072 0 0 28 0.000000000000 0.000000000000 0.001963121363 0.000040501681 0.000067502801 0.000283511766 -0.000040501681 -0.000067502801 0.001679609597 -0.000003188754 -0.000000688202 -0.000027564355

10,032 0.001838215698 0 2 21 0.000000000000 0.000199362041 0.002093301435 0.000024529644 0.000024529644 0.000956656119 -0.000024529644 0.000174832397 0.001136645316 -0.000002151440 -0.000000025200 -0.000019849444

18,514 0.003935633580 2 0 15 0.000108026358 0.000000000000 0.000810197688 0.000000000000 0.000082321465 0.002016875900 0.000108026358 -0.000082321465 -0.001206678212 -0.000004084557 -0.000001066016 -0.000051720271

33,579 0.015099336268 8 4 71 0.000238244141 0.000119122070 0.002114416749 0.000070062355 0.000070062355 0.001261122399 0.000168181785 0.000049059715 0.000853294350 -0.000014762386 -0.000002106077 -0.000167324256

44,987 0.041045201687 12 3 99 0.000266743726 0.000066685931 0.002200635739 0.000191964371 0.000076785749 0.001650893594 0.000074779355 -0.000010099817 0.000549742145 -0.000043962975 -0.000008153258 -0.000467304377

56,699 0.109379979688 11 8 81 0.000194006949 0.000141095963 0.001428596624 0.000254563043 0.000089097065 0.002163785862 -0.000060556094 0.000051998898 -0.000735189237 -0.000131958445 -0.000014934987 -0.001385849511

70,860 0.305212030948 16 11 601 0.000225797347 0.000155235676 0.008481512842 0.000226186496 0.000108176150 0.001770155184 -0.000000389149 0.000047059526 0.006711357659 -0.000349850920 -0.000043181889 -0.001594275153

78,774 0.472442953055 15 7 978 0.000190418158 0.000088861807 0.012415263920 0.000211666758 0.000148950682 0.005346545520 -0.000021248600 -0.000060088874 0.007068718399 -0.000551395158 -0.000117463496 -0.002298973371

81,468 0.532982036922 9 10 1049 0.000110472824 0.000122747582 0.012876221338 0.000207172167 0.000120293516 0.010552414558 -0.000096699343 0.000002454066 0.002323806780 -0.000662265109 -0.000099181065 -0.005122517665

80,945 0.521249692402 9 11 966 0.000111186608 0.000135894743 0.011934029279 0.000149773468 0.000106089540 0.012649617454 -0.000038586859 0.000029805204 -0.000715588174 -0.000617395788 -0.000082741055 -0.006594041186

82,375 0.553234966201 10 21 868 0.000121396055 0.000254931715 0.010537177542 0.000110828567 0.000129299994 0.012406642325 0.000010567488 0.000125631720 -0.001869464784 -0.000628086940 -0.000034803688 -0.007637034058

84,356 0.596773691157 13 11 949 0.000154108777 0.000130399734 0.011249940727 0.000116336027 0.000195934362 0.011229488121 0.000037772750 -0.000065534628 0.000020452606 -0.000661281013 -0.000151625731 -0.007110203695

86,172 0.635406055845 7 20 789 0.000081232883 0.000232093952 0.009156106392 0.000137946753 0.000191925917 0.010897793452 -0.000056713869 0.000040168035 -0.001741687060 -0.000764126653 -0.000094277166 -0.008690159564

89,045 0.692971862621 19 11 254 0.000213375260 0.000123533045 0.002852490314 0.000117282792 0.000181788328 0.010191874648 0.000096092467 -0.000058255283 -0.007339384334 -0.000727463653 -0.000171022974 -0.013356507622

99,441 0.851913697642 30 22 178 0.000301686427 0.000221236713 0.001790006134 0.000148387428 0.000176923472 0.005952618753 0.000153298999 0.000044313241 -0.004162612618 -0.000845581587 -0.000122869719 -0.013713655731

114,784 0.958135861164 81 35 175 0.000705673265 0.000304920546 0.001524602732 0.000259966257 0.000175079316 0.002291947413 0.000445707007 0.000129841230 -0.000767344680 -0.000670847623 -0.000056242484 -0.012170435920

132,495 0.991203360961 129 26 146 0.000973621646 0.000196233820 0.001101928375 0.000518146808 0.000266075388 0.001647800210 0.000455474838 -0.000069841568 -0.000545871835 -0.000684318230 -0.000256109804 -0.012370941376

152,910 0.998588796229 355 55 35 0.002321627101 0.000359688706 0.000228892813 0.000849243163 0.000246684919 0.001298128834 0.001472383938 0.000113003787 -0.001069236021 0.000326056964 -0.000075430750 -0.012985742611

153,378 0.998646922963 208 46 61 0.001356126694 0.000299912634 0.000397710232 0.001695835742 0.000283807221 0.000634186507 -0.000339709049 0.000016105413 -0.000236476275 -0.002657735157 -0.000343118398 -0.013094955123

153,392 0.998648624464 2163 41 36 0.014101126526 0.000267289037 0.000234692813 0.001838139268 0.000329755002 0.000313430497 0.012262987258 -0.000062465965 -0.000078737684 0.009927925646 -0.000421584181 -0.012937452007

119,355 0.971858093513 397 87 48 0.003326211721 0.000728917934 0.000402161619 0.007728917430 0.000283600091 0.000316197803 -0.004402705709 0.000445317843 0.000085963816 -0.006535097264 0.000083219370 -0.012430315292

82,169 0.548648112120 116 89 48 0.001411724616 0.001083133542 0.000584161910 0.009385987747 0.000469299387 0.000307977723 -0.007974263131 0.000613834155 0.000276184187 -0.005648820738 0.000139436421 -0.006912986596

49,355 0.059633623157 38 44 36 0.000769932124 0.000891500355 0.000729409381 0.002545602509 0.000873345110 0.000476370060 -0.001775670384 0.000018155244 0.000253039321 -0.000244336694 -0.000020366878 -0.000752766079

27,905 0.009116147803 17 46 67 0.000609209819 0.001648450099 0.002401003404 0.001170888963 0.001011222286 0.000638666707 -0.000561679144 0.000637227812 0.001762336697 -0.000026284646 0.000002530088 -0.000101315815

15,213 0.002927046635 3 40 38 0.000197199763 0.002629330178 0.002497863669 0.000711881957 0.001164897748 0.001333160756 -0.000514682194 0.001464432430 0.001164702913 -0.000008302010 0.000003233636 -0.000034280161

10,524 0.001921280998 11 43 59 0.001045229951 0.004085898898 0.005606233371 0.000463843406 0.001994526648 0.002435177884 0.000581386544 0.002091372250 0.003171055487 -0.000003343491 0.000003327051 -0.000018646351

19,994 0.004493856737 33 98 43 0.001650495149 0.004901470441 0.002150645194 0.000543963943 0.003224929090 0.003768893033 0.001106531206 0.001676541351 -0.001618247840 -0.000005460467 0.000005917747 -0.000065136068

37,977 0.022268325137 58 242 134 0.001527240172 0.006372277958 0.003528451431 0.001441772069 0.004620224130 0.003342289796 0.000085468103 0.001752053828 0.000186161635 -0.000049795516 0.000031005639 -0.000282586375

47,841 0.052435557917 57 398 248 0.001191446667 0.008319224096 0.005183838130 0.001569750392 0.005865001466 0.003053250763 -0.000378303725 0.002454222630 0.002130587367 -0.000141572379 0.000109828055 -0.000563453312

60,822 0.151097373647 48 519 164 0.000789188123 0.008533096577 0.002696392753 0.001340045212 0.007457642919 0.004451280617 -0.000550857089 0.001075453658 -0.001754887864 -0.000434024817 0.000108150204 -0.002210722174

73,817 0.364364626367 50 537 333 0.000677350746 0.007274747009 0.004511155967 0.000966290274 0.008438935056 0.003791538978 -0.000288939528 -0.001164188047 0.000719616988 -0.000951198134 -0.000555246784 -0.004429436602

79,683 0.492867983759 30 575 615 0.000376491849 0.007216093772 0.007718082904 0.000727872310 0.007843195508 0.003691352431 -0.000351380461 -0.000627101736 0.004026730472 -0.001317439848 -0.000486357416 -0.004361630721

82,518 0.556413771330 29 570 698 0.000351438474 0.006907583800 0.008458760513 0.000521172638 0.007244299674 0.006175895765 -0.000169734164 -0.000336715874 0.002282864747 -0.001386227717 -0.000387489099 -0.005894289492

82,125 0.547667296341 21 576 952 0.000255707763 0.007013698630 0.011592085236 0.000363746216 0.007059142669 0.008094894606 -0.000108038454 -0.000045444039 0.003497190630 -0.001330648365 -0.000221877955 -0.005136588390

80,785 0.517655156915 13 484 826 0.000160920963 0.005991211240 0.010224670421 0.000303687372 0.006960514568 0.010021683278 -0.000142766409 -0.000969303328 0.000202987143 -0.001275706009 -0.000687959580 -0.006560365036

83,266 0.572960435379 11 474 808 0.000132106742 0.005692599620 0.009703840703 0.000208704192 0.006506660119 0.010914001596 -0.000076597450 -0.000814060499 -0.001210160893 -0.001374087783 -0.000672511855 -0.008070939696

82,325 0.552122454395 10 450 573 0.000121469784 0.005466140298 0.006960218646 0.000146295969 0.005839647427 0.009960317218 -0.000024826185 -0.000373507129 -0.003000098573 -0.001295529547 -0.000404813884 -0.008765672716

85,047 0.611644482390 9 494 236 0.000105823839 0.005808552918 0.002774936212 0.000126818487 0.005580013407 0.008339825232 -0.000020994648 0.000228539511 -0.005564889020 -0.001432851667 -0.000080216681 -0.011279403400

88,557 0.683549014676 18 656 151 0.000203258918 0.007407658344 0.001705116479 0.000113519585 0.005640130966 0.004833544440 0.000089739333 0.001767527379 -0.003128427961 -0.001525604685 0.000962326738 -0.010939962259

90,514 0.720362412173 18 643 90 0.000198864264 0.007103873434 0.000994321320 0.000155526370 0.006624271330 0.002229211308 0.000043337894 0.000479602104 -0.001234889988 -0.001641193909 0.000086381105 -0.010165114194

101,491 0.873712464645 15 657 87 0.000147796356 0.006473480407 0.000857218867 0.000201037577 0.007254105913 0.001345834892 -0.000053241221 -0.000780625506 -0.000488616026 -0.002074952055 -0.000996306741 -0.011677024993

115,342 0.960104553736 8 690 52 0.000069358950 0.005982209429 0.000450833174 0.000171870524 0.006770657014 0.000921850993 -0.000102511574 -0.000788447585 -0.000471017819 -0.002327426581 -0.001102330881 -0.012814745095

129,015 0.988006802870 14 618 23 0.000108514514 0.004790140681 0.000178273844 0.000106072415 0.006212154054 0.000641046335 0.000002442099 -0.001422013373 -0.000462772491 -0.013929596125 -0.009835716354 -0.011910281442

117,151 0.965894303881 9 552 25 0.000076823928 0.004711867590 0.000213399800 0.000090032207 0.005352823942 0.000306927978 -0.000013208279 -0.000640956352 -0.000093528178 -0.013632955591 -0.008861165468 -0.011287067434

93,636 0.773335144678 3 435 28 0.000032038959 0.004645649109 0.000299030287 0.000093432887 0.004752890326 0.000194990372 -0.000061393927 -0.000107241217 0.000104039915 -0.010952374803 -0.006681876878 -0.008884109190

71,741 0.322280762354 1 300 160 0.000013939031 0.004181709204 0.002230244909 0.000056929507 0.004682451954 0.000251438656 -0.000042990476 -0.000500742750 0.001978806253 -0.004558376810 -0.002911432625 -0.003098174879

50,220 0.064151868077 1 201 28 0.000019912386 0.004002389486 0.000557546794 0.000024187160 0.004444390695 0.001136796532 -0.000004274775 -0.000442001209 -0.000579249737 -0.000904887843 -0.000575769289 -0.000780813876

32,794 0.014083885541 0 112 48 0.000000000000 0.003415258889 0.001463682381 0.000016398685 0.004107870549 0.001541476374 -0.000016398685 -0.000692611660 -0.000077793993 -0.000198829609 -0.000129933819 -0.000164357243

20,598 0.004743713550 2 96 58 0.000097096806 0.004660646665 0.002815807360 0.000012046161 0.003770448358 0.000915508228 0.000085050645 0.000890198307 0.001900299132 -0.000066488249 -0.000036255720 -0.000045975057

14,780 0.002815488608 0 49 37 0.000000000000 0.003315290934 0.002503382950 0.000037458795 0.003895714714 0.001985316152 -0.000037458795 -0.000580423780 0.000518066798 -0.000039807026 -0.000025659013 -0.000031178773

20,730 0.004800132821 0 19 159 0.000000000000 0.000916546068 0.007670043415 0.000056532308 0.004098592346 0.002685284640 -0.000056532308 -0.003182046277 0.004984758775 -0.000067958643 -0.000056234242 -0.000031716045

33,437 0.014910453545 5 21 89 0.000149534946 0.000628046775 0.002661722044 0.000000000000 0.001914953534 0.005519571952 0.000149534946 -0.001286906760 -0.002857849907 -0.000208024558 -0.000146420704 -0.000215455087

43,474 0.036008064557 3 25 135 0.000069006763 0.000575056356 0.003105304320 0.000092307124 0.000738456994 0.004578433363 -0.000023300362 -0.000163400639 -0.001473129044 -0.000508593275 -0.000313144033 -0.000470453079

57,497 0.116575160226 4 35 138 0.000069568847 0.000608727412 0.002400125224 0.000104016331 0.000598093901 0.002912457256 -0.000034447483 0.000010633511 -0.000512332032 -0.001647856805 -0.000993507497 -0.001411074382

72,142 0.330213409302 3 27 326 0.000041584652 0.000374261872 0.004518865571 0.000069326836 0.000594230026 0.002703746620 -0.000027742184 -0.000219968154 0.001815118950 -0.004665541906 -0.002890379347 -0.003228485370

80,110 0.502474979786 4 39 680 0.000049931344 0.000486830608 0.008488328548 0.000053996097 0.000478251144 0.003579169849 -0.000004064752 0.000008579464 0.004909158700 -0.007087505702 -0.004283356564 -0.003358003376

84,234 0.594128736584 6 45 711 0.000071230145 0.000534226084 0.008440772135 0.000045976408 0.000433491842 0.006607466569 0.000025253737 0.000100734242 0.001833305566 -0.008362880516 -0.005009908780 -0.005797971436

86,018 0.632189190487 3 39 832 0.000034876421 0.000453393476 0.009672394150 0.000060847977 0.000511123010 0.008463953658 -0.000025971556 -0.000057729534 0.001208440492 -0.008930998700 -0.005431027405 -0.006564427965

84,815 0.606673333129 0 28 862 0.000000000000 0.000330130284 0.010163296587 0.000052862815 0.000493386274 0.009063035970 -0.000052862815 -0.000163255991 0.001100260616 -0.008586847890 -0.005275845198 -0.006365110212

85,221 0.615357720536 3 34 912 0.000035202591 0.000398962697 0.010701587637 0.000017561010 0.000392195887 0.009916116909 0.000017641581 0.000006766810 0.000785470727 -0.008666381193 -0.005246742850 -0.006649933671

85,339 0.617868296609 5 14 650 0.000058589859 0.000164051606 0.007616681705 0.000017643323 0.000364628667 0.010433084759 0.000040946537 -0.000200577061 -0.002816403054 -0.008687339460 -0.005396260054 -0.008902548116

84,941 0.609375994694 17 18 353 0.000200138920 0.000211911798 0.004155825809 0.000046904315 0.000281425891 0.009158067542 0.000153234605 -0.000069514094 -0.005002241734 -0.008499510513 -0.005242224434 -0.010112184503

86,959 0.651652295784 8 22 273 0.000091997378 0.000252992790 0.003139410527 0.000129198966 0.000187925769 0.005890298332 -0.000037201588 0.000065067020 -0.002750887805 -0.009213273974 -0.005518210901 -0.009346631311

91,068 0.730295040770 5 19 177 0.000054904028 0.000208635305 0.001943602583 0.000145433392 0.000232693426 0.003641652123 -0.000090529364 -0.000024058121 -0.001698049541 -0.010364095916 -0.006249247639 -0.009705719518

96,702 0.818048283021 7 18 166 0.000072387334 0.000186138860 0.001716613927 0.000073022631 0.000230302145 0.002527706471 -0.000000635297 -0.000044163286 -0.000811092544 -0.011535922047 -0.007016612704 -0.010146398287

105,291 0.906885813868 10 25 67 0.000094974879 0.000237437198 0.000636331690 0.000063907973 0.000197049582 0.001826702881 0.000031066907 0.000040387616 -0.001190371191 -0.012759937469 -0.007701917278 -0.011592228400

[Plot. Series: #bostonstrong, #happyhalloween, #jobs. Axis labels: "Trend score", "Tweet count", "tweets per hour". X-axis ticks: 1, 12, 24, 36, 48, 60, 72.]

Figure B.3: Fully processed data

As can be seen in figure B.3, #happyhalloween is also spotted as a trend, even though the settings used are those for spotting fast trends, which are the current focus, as described in chapter 3. Nevertheless, #happyhalloween satisfies the requirements for being considered a slow trend, as described in section 1.3.
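The first derived columns of the data table can be reproduced from the raw counts. The sketch below is inferred from the tabulated values and is not a quotation of equation 3.5: fraction equals the hashtag count divided by the total tweets, A equals fraction minus ref fraction, and the listed Wx values are consistent with a logistic function built from the two tabulated constants (80000 and 0.00009). The function and constant names are hypothetical, and the FINAL column is left out because its formula is not shown in this appendix.

```python
import math

# Constants as listed in the "constants" column of the table.
MIDPOINT = 80000       # assumed role: tweet volume at which the weight is 0.5
STEEPNESS = 0.00009    # assumed role: slope of the logistic curve


def fraction(count, total_tweets):
    """Share of all tweets in the hour that contain the hashtag."""
    return count / total_tweets


def hour_weight(total_tweets):
    """Logistic weight that matches the tabulated Wx values."""
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (total_tweets - MIDPOINT)))


def change(frac, ref_frac):
    """Column A: difference between the fraction and its reference fraction."""
    return frac - ref_frac


# Row with 154,824 total tweets, hashtag #bostonstrong (count 5).
total, count, ref_frac = 154824, 5, 0.000036266646
frac = fraction(count, total)

print(f"fraction = {frac:.12f}")               # table lists 0.000032294735
print(f"Wx       = {hour_weight(total):.12f}")  # table lists 0.998811839382
print(f"A        = {change(frac, ref_frac):.12f}")  # table lists -0.000003971911
```

The weight is close to 1 in busy hours and close to 0 in quiet hours, so a large relative change during a low-traffic hour is given less influence than the same change during a high-traffic hour.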

¹ en.wikipedia.org/wiki/World_Series


Two tag clouds of the top 100 words in all the tweets containing the hashtag #bostonstrong have been created to give an overall perspective of the semantic richness of the data.

Figure B.4: Top: Tag cloud for all words in the tweets containing #bostonstrong. Bottom: Tag cloud for the remaining words after removing #bostonstrong, to get a deeper understanding.

The documents similar to those tweets, computed using LDA and Solr, can be seen in figure B.5 and figure B.6. Although not all documents are about baseball or the Boston Red Sox, the overall impression of the documents is positive.

The documents found using Solr are clearly different from those found using LDA; magazines/documents about the city of Boston are among the selected ones too.


Figure B.5: Subset of the similar #bostonstrong documents using LDA

Figure B.6: Subset of the similar #bostonstrong documents using Solr


Appendix C

Implementation details

C.1 Flask

The existing Python code is used as the data provider for the website, in order to reuse as much of the code as possible. It could probably have been optimized by writing it all in JavaScript, but this is just a debug tool. A web application framework that supports execution of Python code is therefore preferable.

Flask is a lightweight Python web application micro-framework, which means that the core is kept simple but extensible. Flask supports both a local server and broadcasting, and allows a custom port number (the default is 5000). In addition, it provides the decorator app.route(), which is a URL trigger: as soon as a requested URL matches the one defined in the route decorator, the attached function is executed.


The interactive tweet map needs three different app.route() routes:

/ : The website's index page, which loads the HTML that displays the dialog shown in Figure ?? and discussed in the associated section.

/get_tweets/. . . : Retrieves the tweets from the database that match the settings specified in the initial dialog of the website. This URL needs three parameters: country_code, timestamp and duration. After the tweets have been received, they are plotted on the map.

If no tweets are available for the time period and country, an error message is displayed and a new time period can be selected.

/related_documents : When the tweets are loaded and displayed on the map, a sidebar appears in which the documents related to the tweets on the map can be seen. This URL is triggered by the sidebar and calculates all the documents using the trending framework in combination with LDA and Solr, which are discussed later in the report.
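The URL-trigger idea behind the three routes can be illustrated without the framework itself. The sketch below is a toy stand-in written with the standard library only, not Flask and not the project's code: the class MiniApp, its dispatch method and the handler bodies are hypothetical. Because the text elides the layout of the /get_tweets/ URL, the three named parameters are passed as keyword arguments here instead of being parsed from a path.

```python
from typing import Callable, Dict


class MiniApp:
    """Toy stand-in for a web framework; it only mimics the route idea."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[..., str]] = {}

    def route(self, url: str) -> Callable:
        # Return a decorator that registers the decorated function under `url`.
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._routes[url] = func
            return func
        return decorator

    def dispatch(self, url: str, **params) -> str:
        # Execute the function attached to the matching URL, if there is one.
        handler = self._routes.get(url)
        if handler is None:
            return "404: no route for " + url
        return handler(**params)


app = MiniApp()


@app.route("/")
def index() -> str:
    return "index page with settings dialog"


@app.route("/get_tweets")
def get_tweets(country_code: str, timestamp: int, duration: int) -> str:
    # Placeholder: the real tool would query the tweet database here.
    return f"tweets for {country_code} from {timestamp} lasting {duration}"


@app.route("/related_documents")
def related_documents() -> str:
    # Placeholder: the real tool would run the trending framework here.
    return "documents related to the tweets on the map"


print(app.dispatch("/get_tweets", country_code="US", timestamp=0, duration=3600))
```

The decorator only stores the function in a dictionary keyed by URL; the dispatch step then looks the URL up and calls whatever was registered, which is the behaviour the text describes for app.route().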

C.2 Peewee

Peewee¹, an ORM (object-relational mapping) module for Python that supports the chosen database, was selected.

ORM: A programming technique for converting data between incompatible type systems in object-oriented programming languages. For database use, it creates a "virtual object database" which synchronizes the state of the objects in the programming language with the database tables. ORM modules/libraries usually also provide a simple abstraction on top of the SQL language.

¹ peewee.readthedocs.org - open source under the MIT (Massachusetts Institute of Technology) license, which means that all code is free to use and free to change/modify however it fits the implementation.


Besides providing an extended abstraction of the SQL language on top of the database connection, Peewee also provides a Python script that makes it possible to read the schema of a database on a running MySQL server and auto-generate the Python classes, including tables, foreign keys etc.
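To make the ORM idea concrete, the sketch below maps a Python object to a table row and back by hand. It is an illustration of the technique only: it uses the standard-library sqlite3 module with an in-memory database instead of Peewee and MySQL, and the Tweet class, its two fields and the save/load helpers are hypothetical.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class Tweet:
    """Python-side object; each instance corresponds to one table row."""
    id: int
    text: str


def save(conn: sqlite3.Connection, tweet: Tweet) -> None:
    # Object -> row: the dataclass fields become the column values.
    conn.execute(
        "INSERT OR REPLACE INTO tweet (id, text) VALUES (?, ?)",
        astuple(tweet),
    )


def load(conn: sqlite3.Connection, tweet_id: int) -> Optional[Tweet]:
    # Row -> object: the selected columns are fed back into the class.
    row = conn.execute(
        "SELECT id, text FROM tweet WHERE id = ?", (tweet_id,)
    ).fetchone()
    return Tweet(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")

save(conn, Tweet(1, "#bostonstrong"))
restored = load(conn, 1)
print(restored)
```

An ORM module automates exactly this pair of conversions for every table, so the calling code works with objects and never writes the SQL statements itself.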

C.3 Database

There are many different databases to choose from, and each has its own advantages. A regular relational database, which is quick and easy to set up using predefined scripts, was preferred in this project. The database is not the main focus of this project and is mainly used as a store for the data to be saved, instead of keeping it in memory.

C.4 MySQL

Figure C.1: E/R Diagram

By examining the JSON result callback from the Twitter service and investigating the different analysis models and frameworks for achieving the goal of the project (described in section 1.1), a design for the database table(s) was found.

It turned out that all the information needed fits into a single table, named tweet (see figure C.1).

The desired information from a tweet consists of attributes such as the text, time stamp and longitude/latitude, which make it possible to analyze the tweets over time and by location.
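A minimal sketch of how a single table with these attributes supports queries over time and location is given below. It rests on several assumptions: the column names and types are hypothetical (the actual schema is the one in the E/R diagram), SQLite from the Python standard library replaces MySQL so that the example runs on its own, time stamps are plain integers, and the helper tweets_in_window is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns covering the attributes named in the text.
conn.execute(
    "CREATE TABLE tweet ("
    " id INTEGER PRIMARY KEY,"
    " text TEXT NOT NULL,"
    " timestamp INTEGER NOT NULL,"
    " longitude REAL,"
    " latitude REAL)"
)
conn.executemany(
    "INSERT INTO tweet (text, timestamp, longitude, latitude) VALUES (?, ?, ?, ?)",
    [
        ("#bostonstrong", 1000, -71.06, 42.36),    # Boston area, early
        ("#jobs", 1500, 12.57, 55.68),             # Copenhagen area, early
        ("#happyhalloween", 9000, -71.10, 42.35),  # Boston area, later
    ],
)


def tweets_in_window(conn, start, duration, west, south, east, north):
    """Return tweet texts inside a time window and a bounding box."""
    rows = conn.execute(
        "SELECT text FROM tweet"
        " WHERE timestamp >= ? AND timestamp < ?"
        " AND longitude BETWEEN ? AND ?"
        " AND latitude BETWEEN ? AND ?"
        " ORDER BY timestamp",
        (start, start + duration, west, east, south, north),
    ).fetchall()
    return [text for (text,) in rows]


boston_early = tweets_in_window(conn, 0, 3600, -72.0, 42.0, -70.0, 43.0)
print(boston_early)
```

Filtering on the time stamp and the coordinates in one statement is what lets a single table serve both the trend analysis over time and the map view by location.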


Bibliography

[Bee12] Beevolve. An exhaustive study of twitter users across the world. http://www.beevolve.com/twitter-statistics/, October 2012. Online; last read 6. January 2014.

[Ble09] David M. Blei. Topic models. http://videolectures.net/mlss09uk_blei_tm/, November 2009. Online Video Lecture; last viewed 26. December 2013.

[BNG11] H. Becker, M. Naaman, and L. Gravano. Beyond trending topics: Real-world event identification on twitter. In Fifth International AAAI Conference on Weblogs and Social Media, 2011.

[BNJ03] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, March 2003.

[CDCS10] Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. Emerging topic detection on twitter based on temporal and social terms evaluation. In Proceedings of the Tenth International Workshop on Multimedia Data Mining, MDMKDD ’10, pages 4:1–4:10, New York, NY, USA, 2010. ACM.

[IHS06] Alexander Ihler, Jon Hutchins, and Padhraic Smyth. Adaptive event detection with time-varying poisson processes. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06, pages 207–216, New York, NY, USA, 2006. ACM.

[Jac88] R. Jackson. The matthew effect in science. International Journal of Dermatology, 27(1):16–16, 1988.

[KLPM10] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM.

[Nik12] Stanislav Nikolov. Trend or no trend: A novel nonparametric method for classifying time series. Master’s thesis, Massachusetts Institute of Technology, Massachusetts, USA, September 2012.

[RIS+94] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW ’94, pages 175–186, New York, NY, USA, 1994. ACM.

[ŘS10] Radim Řehůřek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.

[Sag12] Jeff Saginor. Study finds facebook users more private than ever. http://www.digitaltrends.com/web/study-finds-facebook-users-more-private-than-ever/, February 2012. Online; last read 23. August 2013.

[Sal89] Gerard Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989.


[SM95] Upendra Shardanand and Pattie Maes. Social information filtering: Algorithms for automating “word of mouth”. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’95, pages 210–217, New York, NY, USA, 1995. ACM Press/Addison-Wesley Publishing Co.

[Wal05] Thomas Vander Wal. Explaining and showing broad and narrow folksonomies, 2005.