epidemiological modeling of online social network dynamics

11
arXiv:1401.4208v1 [cs.SI] 17 Jan 2014 1 Epidemiological modeling of online social network dynamics John Cannarella 1 , Joshua A. Spechler 1,* 1 Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ, USA * E-mail: Corresponding [email protected] Abstract The last decade has seen the rise of immense online social networks (OSNs) such as MySpace and Facebook. In this paper we use epidemiological models to explain user adoption and abandonment of OSNs, where adoption is analogous to infection and abandonment is analogous to recovery. We modify the traditional SIR model of disease spread by incorporating infectious recovery dynamics such that contact between a recovered and infected member of the population is required for recovery. The proposed infectious recovery SIR model (irSIR model) is validated using publicly available Google search query data for “MySpace” as a case study of an OSN that has exhibited both adoption and abandonment phases. The irSIR model is then applied to search query data for “Facebook,” which is just beginning to show the onset of an abandonment phase. Extrapolating the best fit model into the future predicts a rapid decline in Facebook activity in the next few years. Introduction Online social networks (OSNs) have revolutionized interpersonal interaction by greatly facilitating com- munication between users, leading to the popularity of OSNs seen today [1]. The rise to ubiquity of OSNs has been accompanied by the creation of a multibillion dollar industry to provide these social networking services. Two high profile examples are Facebook ($140 billion) and Twitter ($35 billion) [2], both of which command high valuations based on their large user bases and expectations of growth. Despite the recent success of Facebook and Twitter, the last decade also provides numerous examples of OSNs that have risen and fallen in popularity, most notably MySpace. MySpace, founded in 2003, reached its peak in 2008 with 75.9 million unique monthly visits in the US before subsequently decaying to obscurity by 2011 [3]. MySpace was purchased by News Corp for $580 million during its rapid growth phase in 2005 and was sold six years later at a loss for $35 million in 2011 [4]. The dynamics governing the rapid rises and falls of OSNs are therefore not only of academic interest, but also of financial interest to incumbent and emerging OSN providers and their stakeholders. In this paper we analyze the adoption and abandonment dynamics of OSNs by drawing analogy to the dynamics that govern the spread of infectious disease. The application of disease-like dynamics to OSN adoption follows intuitively, since users typically join OSNs because their friends have already joined [5]. The precedent for applying epidemiological models to non-disease applications has previously been set by research focused on modeling the spread of less-tangible applications such as ideas [6–9]. Ideas, like diseases, have been shown to spread infectiously between people before eventually dying out, and have been successfully described with epidemiological models. Again, this follows intuitively, as ideas are spread through communicative contact between different people who share ideas with each other. Idea manifesters ultimately lose interest with the idea and no longer manifest the idea, which can be thought of as the gain of “immunity” to the idea. The epidemiological models presented in this study are used to analyze publicly available Google search query data for different OSNs, which can be obtained from Google’s “Google Trends” service [10]. Google search query data has been used in a range of studies, including the monitoring of disease outbreak [11], economic forecasting [12], and the prediction of financial trading behavior [13]. The work in Ref. [11] is particularly interesting, as the authors are able to correlate search query frequencies of flu-related terms

Upload: dario-caliendo

Post on 27-Aug-2014

58.518 views

Category:

Social Media


3 download

DESCRIPTION

Facebook si è diffuso come un'epidemia virale, alla quale gli utenti stanno lentamente diventando immuni. Ad affermarlo sono John Cannarella e Joshua A. Spechler, due ricercatori della Princeton Universiry che in un'analisi nella quale hanno studiato il fenomeno del social network applicando un modello utilizzato per lo studio dello sviluppo delle epidemie, hanno determinato che la diffusione della piattaforma di Zuckerberg ha ormai superato il sui punto più alto ed a breve inizierà ad implodere, fino a perdere l'80% degli utenti entro il 2017.

TRANSCRIPT

Page 1: Epidemiological modeling of online social network dynamics

arX

iv1

401

4208

v1 [

csS

I] 1

7 Ja

n 20

14

1

Epidemiological modeling of online social network dynamicsJohn Cannarella1 Joshua A Spechler1lowast

1 Department of Mechanical and Aerospace Engineering Princeton University PrincetonNJ USAlowast E-mail Corresponding spechlerprincetonedu

Abstract

The last decade has seen the rise of immense online social networks (OSNs) such as MySpace andFacebook In this paper we use epidemiological models to explain user adoption and abandonment ofOSNs where adoption is analogous to infection and abandonment is analogous to recovery We modify thetraditional SIR model of disease spread by incorporating infectious recovery dynamics such that contactbetween a recovered and infected member of the population is required for recovery The proposedinfectious recovery SIR model (irSIR model) is validated using publicly available Google search querydata for ldquoMySpacerdquo as a case study of an OSN that has exhibited both adoption and abandonmentphases The irSIR model is then applied to search query data for ldquoFacebookrdquo which is just beginningto show the onset of an abandonment phase Extrapolating the best fit model into the future predicts arapid decline in Facebook activity in the next few years

Introduction

Online social networks (OSNs) have revolutionized interpersonal interaction by greatly facilitating com-munication between users leading to the popularity of OSNs seen today [1] The rise to ubiquity of OSNshas been accompanied by the creation of a multibillion dollar industry to provide these social networkingservices Two high profile examples are Facebook ($140 billion) and Twitter ($35 billion) [2] both ofwhich command high valuations based on their large user bases and expectations of growth Despite therecent success of Facebook and Twitter the last decade also provides numerous examples of OSNs thathave risen and fallen in popularity most notably MySpace MySpace founded in 2003 reached its peakin 2008 with 759 million unique monthly visits in the US before subsequently decaying to obscurity by2011 [3] MySpace was purchased by News Corp for $580 million during its rapid growth phase in 2005and was sold six years later at a loss for $35 million in 2011 [4] The dynamics governing the rapid risesand falls of OSNs are therefore not only of academic interest but also of financial interest to incumbentand emerging OSN providers and their stakeholders

In this paper we analyze the adoption and abandonment dynamics of OSNs by drawing analogy to thedynamics that govern the spread of infectious disease The application of disease-like dynamics to OSNadoption follows intuitively since users typically join OSNs because their friends have already joined [5]The precedent for applying epidemiological models to non-disease applications has previously been setby research focused on modeling the spread of less-tangible applications such as ideas [6ndash9] Ideas likediseases have been shown to spread infectiously between people before eventually dying out and havebeen successfully described with epidemiological models Again this follows intuitively as ideas arespread through communicative contact between different people who share ideas with each other Ideamanifesters ultimately lose interest with the idea and no longer manifest the idea which can be thoughtof as the gain of ldquoimmunityrdquo to the idea

The epidemiological models presented in this study are used to analyze publicly available Google searchquery data for different OSNs which can be obtained from Googlersquos ldquoGoogle Trendsrdquo service [10] Googlesearch query data has been used in a range of studies including the monitoring of disease outbreak [11]economic forecasting [12] and the prediction of financial trading behavior [13] The work in Ref [11] isparticularly interesting as the authors are able to correlate search query frequencies of flu-related terms

2

to numbers of flu cases from CDC data which is the basis of Googlersquos ldquoGoogle Flu Trendsrdquo service [14]The results of Ref [11] show that search query data can be used as a proxy for a tangible quantity(flu cases) with the advantage that search query data is quicker and easier to obtain than the actualaggregated reported flu case data [15] In this work we similarly use Google search query data as a proxyfor another quantity of interest OSN usership for which data is difficult to obtain

Model

Traditional SIR model

The classical SIR model for the spread of infections disease is presented in Equations 1a-1c [16] Thevariable names are summarized in Table 1 which compares the epidemiological interpretation of thevariables to the equivalent OSN dynamical interpretation of the variables In short the SIR model is acollection of three ordinary differential equations that govern the rates at which three compartments of theentire population N (susceptible S infected I and recovered R) evolve during a disease outbreak wherethe superscripted dot indicates a time derivative Equations 1a-1c sum to 0 when added together revealingan underlying assumption that the population remains constant during the outbreak MathematicallyS + I +R = N independent of time This assumption is valid for disease outbreaks that are short livedcompared to the lifespan of the population members

S = minus

βIS

N(1a)

I =βIS

Nminus γI (1b)

R = γI (1c)

Equation 1a shows that the rate at which the susceptible population becomes infected is proportionateto the infection rate β the fraction of infected population I

N and the susceptible population S This

indicates that the disease is transferred through interaction between the susceptible and infected popu-lations with an infection rate of β Equation 1c shows that the rate at which the infected populationrecovers is proportionate to I and the recovery rate γ but does not require any sort of interaction betweenpopulation compartments as was necessary for the spread of infection

There is no analytical solution to the SIR model but the set of ODEs can be solved numericallywith the addition of initial conditions for each of the population compartments S(0) = S0 I(0) = I0and R(0) = R0 S0 is the initial compartment of the population that is susceptible to contracting thedisease and generally represents the majority of the population at early times I0 represents the sizeof the initial outbreak which is generally much smaller than the total population I0 must be non-zerofor an infection to begin to spread as given by equation 1b Note that for the purposes of fitting theSIR model to a data set if no initial parameters are known than R0 can be arbitrarily set to 0 This isbecause the magnitude of R does not affect any of the dynamics other than contributing to the size ofN However since terms containing β are always divided by N R0 can be set arbitrarily and β is thentaken as a fitting parameter

One interesting feature of the model is the immunization criterion

S0

Nlt

γ

β(2)

which is derived by setting the right hand side of Equation 1b less than or equal to 0 and rearrangingIf Equation 2 is satisfied then I is always less than or equal to 0 meaning that I can never increaseThis is the basis for immunization campaigns In the context of OSNs where OSN users are analogous to

3

the infected population compartment this immunization criterion represents the conditions under whicha OSN will strictly decline

The SIR model can be applied to OSNs by drawing the correct OSN analogues to the SIR model pa-rameters When applying the SIR model to OSNs the susceptible population compartment is equivalentto all users that could potentially join the OSN The infected population compartment is analogous toOSN users potential OSN users are susceptible to joining the OSN through ldquoinfectionrdquo by contact witha current OSN users Finally the recovered population compartment is analogous to the population ofpeople who are opposed to joining the OSN In the case of an OSN this could be comprised of peoplewho have left the OSN with no intention of returning or people who resist joining the OSN in the firstplace A summary of all the OSN analogues to the epidemiological model parameters is provided in Table1

Infectious recovery SIR model

While the traditional SIR model captures the infectious uptake dynamics of OSNs the assumptionof a characteristic recovery rate in the traditional SIR model is of doubtful validity for modeling theabandonment of OSNs [5] Contrary to the SIR modelrsquos assumption of a characteristic recovery ratefor diseases OSN users do not join an OSN expecting to leave after a predetermined amount of timeInstead every user that joins the network expects to stay indefinitely but ultimately loses interest astheir peers begin to lose interest Thus a user that joins early on is expected to stay on the networklonger than a user that joins later Eventually users begin to leave and recovery spreads infectiously asusers begin to lose interest in the social network The notion of infectious abandonment is supported bywork analyzing user churn in mobile networks which show that users are more likely to leave the networkif their contacts have left [17] Therefore it is necessary to modify the traditional SIR model to includeinfectious recovery dynamics which intuitively provide a better description of OSN abandonment

To incorporate infectious recovery dynamics the recovery rate must be modified to be proportionateto the recovered population fraction R

N similar to how the spread of infection among the susceptible

population in the SIR model is proportionate to the infected population fraction IN This is achieved by

multiplying the γI terms in Equations 1b and 1c by RN

to give Equations 3b and 3c The constant γ isreplaced with a new constant ν which now represents an infectious recovery rate with the same units

S = minus

βIS

N(3a)

I =βIS

Nminus

νIR

N(3b)

R =νIR

N(3c)

In modifying Equation 3b and 3c R0 is no longer an arbitrary parameter as it was in the SIR modelbecause the magnitude of R is now important to the recovery dynamics This results in an additionalfitting parameter when applying the irSIR model to the data Similar to the requirement of an initialinfected population in the traditional SIR model the irSIR model necessarily requires a small initialrecovered population If R0 were set to 0 none of the infected population would be able to recover sincerecovery now requires contact with a recovered individual Mathematically Equation 3c would be zerofor all time and I would grow at the expense of S until the entire population is infected In the contextof OSN dynamics R0 can be thought of as the first users to leave the OSN or a compartment of thepopulation that resists joining the OSN altogether This small initial compartment of OSN resistors isultimately responsible for the abandonment of the OSN

An immunization criterion can be derived for the irSIR model similar to Equation 2

4

S0

R0

ltγ

β(4)

Similar to the implications of Equation 2 in the traditional SIR model if this criterion is satisfiedthen I can never increase One might also wonder if an analogous criterion for ldquoimmunizationrdquo againstrecovery can be derived for the irSIR model since the recovery and infection dynamics are now the sameIn the context of OSNs immunization against recovery would ensure that the OSN always grows in timeLooking at Equation 3c one can see that unless R0 = 0 R is always positive and R must always increasein time The implication is that there is no way to ldquoimmunizerdquo the infected population compartmentagainst infectious recovery in the irSIR model In the context of the irSIR model all OSNs are expectedto eventually decline

Methods

Google Trends Search Query Data

To test the theory of disease-like adoption dynamics of OSNs we use publicly available historical Googlesearch query data as a proxy for OSN usership as has been done in previous work [18] The public natureof this data is an important feature of the approach used in this paper as historical OSN user activitydata is typically proprietary and difficult to obtain [19] The Google search query data is obtained fromGooglersquos ldquoGoogle Trendsrdquo service and reports the relative number of Google search queries for a givensearch term [10] The use of search query data for the study of OSN adoption is advantageous comparedto using registration or membership data in that search query data provides a measure of the level of webtraffic for a given OSN Web traffic is arguably the best metric for OSN health in that it represents ameasure of user activity or interest within the network For example inactive members do not contributeto a social network and would not be counted using search data but would be counted using other metricssuch as registered account data Thus OSN usership as measured by search query data can be thoughtof as representing a less tangible albeit more meaningful metric of user activity or interest

The search query data obtained from Google Trends is presented in arbitrary units of weekly searchquery frequency which are normalized such that the maximum data point in the selected time periodcorresponds to a value of 100 The use of weekly search query data is advantageous over daily databecause it eliminates periodic variation due to day of week For example search queries for ldquoFacebookrdquotend to peak during the weekend The resolution of this search query data is 1 unit To increase theresolution of the search query data for model fitting shorter time periods of data are stitched togetherFor example consider two consecutive sets of weekly search query data chosen such that the last week ofthe first data set and the first week of the second data set are the same If that week has a value of 90 inthe first data set and 100 in the second then the two data sets can be stitched together by multiplyingthe second data set by a factor of 90100 so that they are now on the same scale This process can berepeated with data spanning the entire time range of interest and the resulting data set is subsequentlynormalized such that 100 corresponds to the maximum data point The ultimate result is a data set withresolution many times higher than the original 1 unit

The Google search query data for the term Facebook shows a large jump in search queries during earlyOctober 2012 as shown in Figure 1 This jump is assumed to be artifactual as it is unlikely that searchactivity for Facebook suddenly jumped by 20 in under a week and remained consistently at this elevatedlevel in the weeks following The authors believe this artifact is due to the October 5 2012 upgrade ofGoogles search algorithm to Penguin 3 [20] which coincides with the reported jump in Facebook searchqueries A similar jump albeit smaller in magnitude can also be observed in the Google search querydata for other search terms such as ldquoTwitterrdquo To the authors knowledge there is no publicly availableexplanation for how this update should affect the Google search query data reported by Google Trends

5

This jump is removed for the purposes of data analysis in this paper by multiplying the search querydata occurring after the week ending October 6 2012 by a factor of 0804 A multiplicative factor is usedinstead of subtraction of a constant because it is assumed to be more likely that an artifactual change insearch query reporting would affect all search queries equally as opposed to simply removing a constantamount of searches queries each week

The Google data for search query ldquoMySpacerdquo and search query ldquoFacebookrdquo are presented in Figure 2In Figure 2 the MySpace data is normalized to the same scale as the Facebook data in order to comparethe relative number of search queries for both OSNs Figure 2 shows a number of important featuresthat qualitatively validate the use of search query data as a proxy for OSN usership The shape of theMySpace curve in Figure 2 is in qualitative agreement with the timeline for the rise and fall of MySpacediscussed in Section [3] exhibiting a rise in 2005 a peak between 2007 and 2008 and a steady declinebetween 2009 and 2011 The intersection of the MySpace and Facebook curves occurs near April 2008the month in which Facebook is reported to have overtaken MySpace in terms of global web traffic [21]The Facebook curve in Figure 2 shows a decline in search activity starting in 2013 which corroboratesreports that the OSN started losing some of its younger user base in 2013 [22 23]

Search Query Data Curve Fitting

The Google search query data for the OSN is assumed to be representative of the magnitude of theinfected population compartment of the previously discussed epidemiological models Recall that theinfected compartment in the context of OSNs is the active user base of the OSN The epidemiologicalmodels are used to analyze the search query data by curve fitting the infected I(t) population curvegenerated by the model to the search query data The curve fitting is carried out using the MATLABcomputational software package [24] using the following approach to determine the best fit curve

To find the best fit curve an I(t) curve must be generated and then compared to the search querydata set This is achieved by choosing initial input parameters S0 I0 R0 β and ν and then solvingthe system of differential equations to determine the corresponding I(t) The sum of squared error (SSE)between the weekly search query data points and the generated I(t) curve is then calculated The bestfit curve is defined as the I(t) curve that minimizes the calculated SSE The best fit curve is determinedusing a built in MATLAB function fminsearch which uses a Nelder-Mead simplex method [25] tofind determine the set of 5 input parameters corresponding to the best fit I(t) curve This optimizationscheme uses a Dormand-Prince (Runge-Kutta 45) method [26] to compute S(t) I(t) and R(t) Oncethe best fit curve is determined the corresponding input parameters are saved so that the best fit curvecan now be generated by re-solving the system of differential equations using the best fit parameters

Results and discussion

MySpace

To validate the use of epidemiological models in OSN adoption dynamics we first consider the case ofMySpace MySpace is a particularly useful case study for model validation because it represents one ofthe largest OSNs in history to exhibit the full life cycle of an OSN from rise to fall MySpace also hasthe added advantage of having its entire lifespan occur within the range of search query data availablefrom Google Trends which is only available after January 2004 The data for search query MySpace isshown in Figure 3 as the solid blue line This is the same search query data that is presented in Figure2 but in Figure 3 the data is normalized such that the maximum value of weekly Google search queriesfor ldquoMySpacerdquo corresponds to a value of 100

To fit the Myspace data we first apply the SIR model As shown in Figure 3a the best fit SIRmodel provides a somewhat qualitative description of the search query data although it clearly fails to

6

fit near the shoulders of the curve The relatively poor fit is expected given that the constant recoveryrate assumption in the SIR model does not intuitively describe the adoption and abandonment dynamicsof OSNs Rather as discussed in detail in Section ldquorecoveryrdquo spreads infectiously users begin to leavethe OSN after their peers have left the OSN

To capture the intuitive infectious recovery dynamics for OSN abandonment the irSIR model isapplied to the Myspace data as shown in Fig 3b The irSIR model provides a significantly betterdescription of the search query data hugging the shoulders of the curve more tightly and resulting in a75 drop in SSE as seen in Table 2 The quality of fit of the irSIR model lends validity to the notion ofinfectious user abandonment of OSNs

Facebook

Having validated the application of the irSIR model to the adoption dynamics of OSNs we now applythe irSIR model to the Google search query data for Facebook Facebook presents an interesting casestudy as it is the largest OSN in history Moreover the search query data suggests that Facebook hasalready reached the peak of its popularity and has entered a decline phase as evidenced by the downwardtrend in search frequency after 2012 The presence of a year-long decline portion of the curve allows fordetermination of best fit R0 and ν parameters both of which are sensitive to the decline dynamics Giventhe early stage of Facebookrsquos decline at the time of writing the best fit irSIR model can be used not onlyto provide an explanation for the observed curve but also to forecast how the OSN will decline in thefuture according to the assumptions of the irSIR model

Applying the irSIR model to Facebook shows a high quality fit for the available search query data overthe time period of January 2004 to the last reported data point at the time of writing Extrapolating thebest fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming yearsshrinking to 20 of its maximum size by December 2014 We will use 20 as an arbitrary definition forend of life so that we can define a so-called ldquo20 daterdquo as the OSNrsquos end of life date

To illustrate the strength of the prediction upper and lower model prediction bounds for the aban-donment of Facebook are overlaid alongside the best fit irSIR model in Figure 4 The correspondingfitting parameter values are tabulated in Table 2 The prediction bounds plotted in Figure 4 representthe two curves with the earliest and latest 20 dates that exhibit less than or equal to 15 higher SSEthan the true best fit curve Thus all possible irSIR model solutions with an increase in SSE of lessthan a 15 over the best fit are bounded between the plotted prediction bounds The presentation ofa bounded prediction region emphasizes that the plotted best fit curve should be thought of as the bestforecast given the available data

The differences between the 15 SSE prediction bound curves and the best fit curve are mainlydetermined by the R0 and ν parameters which follows from the importance of these parameters indescribing the abandonment of the OSN in the irSIR model The early prediction bound curve has ahigher ν parameter than the best fit model which is responsible for the higher rate of decline in the earlyprediction bound To compensate for the rapid decline R0 is decreased such that the onset of the declineis later and the error bound coincides with declining data occurring after 2012 The opposite is true forthe late prediction bound curve which has a lower ν and higher R0 than the best fit This results in acurve that declines slower than the best fit but also begins declining earlier such that it coincides withthe 2013 data

It is also important to understand the nature of the asymmetry of the prediction bounds Thedifference in 20 dates between the late bound and the best fit model is much larger than the differencein 20 dates between the best fit model and the early bound Since the early and late prediction boundcurves can be thought of as equally probable future trajectories for Facebook it follows that Facebookis more likely to deviate from the best fit curve on the side of slower decline The reason for thisasymmetry is that the decline is entirely dictated by data occurring in after 2012 If the post-2012 datais ignored a solution in which Facebook continues indefinitely at a constant size is possible with a finite

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 2: Epidemiological modeling of online social network dynamics

2

to numbers of flu cases from CDC data which is the basis of Googlersquos ldquoGoogle Flu Trendsrdquo service [14]The results of Ref [11] show that search query data can be used as a proxy for a tangible quantity(flu cases) with the advantage that search query data is quicker and easier to obtain than the actualaggregated reported flu case data [15] In this work we similarly use Google search query data as a proxyfor another quantity of interest OSN usership for which data is difficult to obtain

Model

Traditional SIR model

The classical SIR model for the spread of infections disease is presented in Equations 1a-1c [16] Thevariable names are summarized in Table 1 which compares the epidemiological interpretation of thevariables to the equivalent OSN dynamical interpretation of the variables In short the SIR model is acollection of three ordinary differential equations that govern the rates at which three compartments of theentire population N (susceptible S infected I and recovered R) evolve during a disease outbreak wherethe superscripted dot indicates a time derivative Equations 1a-1c sum to 0 when added together revealingan underlying assumption that the population remains constant during the outbreak MathematicallyS + I +R = N independent of time This assumption is valid for disease outbreaks that are short livedcompared to the lifespan of the population members

S = minus

βIS

N(1a)

I =βIS

Nminus γI (1b)

R = γI (1c)

Equation 1a shows that the rate at which the susceptible population becomes infected is proportionateto the infection rate β the fraction of infected population I

N and the susceptible population S This

indicates that the disease is transferred through interaction between the susceptible and infected popu-lations with an infection rate of β Equation 1c shows that the rate at which the infected populationrecovers is proportionate to I and the recovery rate γ but does not require any sort of interaction betweenpopulation compartments as was necessary for the spread of infection

There is no analytical solution to the SIR model but the set of ODEs can be solved numericallywith the addition of initial conditions for each of the population compartments S(0) = S0 I(0) = I0and R(0) = R0 S0 is the initial compartment of the population that is susceptible to contracting thedisease and generally represents the majority of the population at early times I0 represents the sizeof the initial outbreak which is generally much smaller than the total population I0 must be non-zerofor an infection to begin to spread as given by equation 1b Note that for the purposes of fitting theSIR model to a data set if no initial parameters are known than R0 can be arbitrarily set to 0 This isbecause the magnitude of R does not affect any of the dynamics other than contributing to the size ofN However since terms containing β are always divided by N R0 can be set arbitrarily and β is thentaken as a fitting parameter

One interesting feature of the model is the immunization criterion

S0

Nlt

γ

β(2)

which is derived by setting the right hand side of Equation 1b less than or equal to 0 and rearrangingIf Equation 2 is satisfied then I is always less than or equal to 0 meaning that I can never increaseThis is the basis for immunization campaigns In the context of OSNs where OSN users are analogous to

3

the infected population compartment this immunization criterion represents the conditions under whicha OSN will strictly decline

The SIR model can be applied to OSNs by drawing the correct OSN analogues to the SIR model pa-rameters When applying the SIR model to OSNs the susceptible population compartment is equivalentto all users that could potentially join the OSN The infected population compartment is analogous toOSN users potential OSN users are susceptible to joining the OSN through ldquoinfectionrdquo by contact witha current OSN users Finally the recovered population compartment is analogous to the population ofpeople who are opposed to joining the OSN In the case of an OSN this could be comprised of peoplewho have left the OSN with no intention of returning or people who resist joining the OSN in the firstplace A summary of all the OSN analogues to the epidemiological model parameters is provided in Table1

Infectious recovery SIR model

While the traditional SIR model captures the infectious uptake dynamics of OSNs the assumptionof a characteristic recovery rate in the traditional SIR model is of doubtful validity for modeling theabandonment of OSNs [5] Contrary to the SIR modelrsquos assumption of a characteristic recovery ratefor diseases OSN users do not join an OSN expecting to leave after a predetermined amount of timeInstead every user that joins the network expects to stay indefinitely but ultimately loses interest astheir peers begin to lose interest Thus a user that joins early on is expected to stay on the networklonger than a user that joins later Eventually users begin to leave and recovery spreads infectiously asusers begin to lose interest in the social network The notion of infectious abandonment is supported bywork analyzing user churn in mobile networks which show that users are more likely to leave the networkif their contacts have left [17] Therefore it is necessary to modify the traditional SIR model to includeinfectious recovery dynamics which intuitively provide a better description of OSN abandonment

To incorporate infectious recovery dynamics the recovery rate must be modified to be proportionateto the recovered population fraction R

N similar to how the spread of infection among the susceptible

population in the SIR model is proportionate to the infected population fraction IN This is achieved by

multiplying the γI terms in Equations 1b and 1c by RN

to give Equations 3b and 3c The constant γ isreplaced with a new constant ν which now represents an infectious recovery rate with the same units

S = minus

βIS

N(3a)

I =βIS

Nminus

νIR

N(3b)

R =νIR

N(3c)

In modifying Equation 3b and 3c R0 is no longer an arbitrary parameter as it was in the SIR modelbecause the magnitude of R is now important to the recovery dynamics This results in an additionalfitting parameter when applying the irSIR model to the data Similar to the requirement of an initialinfected population in the traditional SIR model the irSIR model necessarily requires a small initialrecovered population If R0 were set to 0 none of the infected population would be able to recover sincerecovery now requires contact with a recovered individual Mathematically Equation 3c would be zerofor all time and I would grow at the expense of S until the entire population is infected In the contextof OSN dynamics R0 can be thought of as the first users to leave the OSN or a compartment of thepopulation that resists joining the OSN altogether This small initial compartment of OSN resistors isultimately responsible for the abandonment of the OSN

An immunization criterion can be derived for the irSIR model similar to Equation 2

4

S0

R0

ltγ

β(4)

Similar to the implications of Equation 2 in the traditional SIR model if this criterion is satisfiedthen I can never increase One might also wonder if an analogous criterion for ldquoimmunizationrdquo againstrecovery can be derived for the irSIR model since the recovery and infection dynamics are now the sameIn the context of OSNs immunization against recovery would ensure that the OSN always grows in timeLooking at Equation 3c one can see that unless R0 = 0 R is always positive and R must always increasein time The implication is that there is no way to ldquoimmunizerdquo the infected population compartmentagainst infectious recovery in the irSIR model In the context of the irSIR model all OSNs are expectedto eventually decline

Methods

Google Trends Search Query Data

To test the theory of disease-like adoption dynamics of OSNs we use publicly available historical Googlesearch query data as a proxy for OSN usership as has been done in previous work [18] The public natureof this data is an important feature of the approach used in this paper as historical OSN user activitydata is typically proprietary and difficult to obtain [19] The Google search query data is obtained fromGooglersquos ldquoGoogle Trendsrdquo service and reports the relative number of Google search queries for a givensearch term [10] The use of search query data for the study of OSN adoption is advantageous comparedto using registration or membership data in that search query data provides a measure of the level of webtraffic for a given OSN Web traffic is arguably the best metric for OSN health in that it represents ameasure of user activity or interest within the network For example inactive members do not contributeto a social network and would not be counted using search data but would be counted using other metricssuch as registered account data Thus OSN usership as measured by search query data can be thoughtof as representing a less tangible albeit more meaningful metric of user activity or interest

The search query data obtained from Google Trends is presented in arbitrary units of weekly searchquery frequency which are normalized such that the maximum data point in the selected time periodcorresponds to a value of 100 The use of weekly search query data is advantageous over daily databecause it eliminates periodic variation due to day of week For example search queries for ldquoFacebookrdquotend to peak during the weekend The resolution of this search query data is 1 unit To increase theresolution of the search query data for model fitting shorter time periods of data are stitched togetherFor example consider two consecutive sets of weekly search query data chosen such that the last week ofthe first data set and the first week of the second data set are the same If that week has a value of 90 inthe first data set and 100 in the second then the two data sets can be stitched together by multiplyingthe second data set by a factor of 90100 so that they are now on the same scale This process can berepeated with data spanning the entire time range of interest and the resulting data set is subsequentlynormalized such that 100 corresponds to the maximum data point The ultimate result is a data set withresolution many times higher than the original 1 unit

The Google search query data for the term Facebook shows a large jump in search queries during earlyOctober 2012 as shown in Figure 1 This jump is assumed to be artifactual as it is unlikely that searchactivity for Facebook suddenly jumped by 20 in under a week and remained consistently at this elevatedlevel in the weeks following The authors believe this artifact is due to the October 5 2012 upgrade ofGoogles search algorithm to Penguin 3 [20] which coincides with the reported jump in Facebook searchqueries A similar jump albeit smaller in magnitude can also be observed in the Google search querydata for other search terms such as ldquoTwitterrdquo To the authors knowledge there is no publicly availableexplanation for how this update should affect the Google search query data reported by Google Trends

5

This jump is removed for the purposes of data analysis in this paper by multiplying the search querydata occurring after the week ending October 6 2012 by a factor of 0804 A multiplicative factor is usedinstead of subtraction of a constant because it is assumed to be more likely that an artifactual change insearch query reporting would affect all search queries equally as opposed to simply removing a constantamount of searches queries each week

The Google data for search query ldquoMySpacerdquo and search query ldquoFacebookrdquo are presented in Figure 2In Figure 2 the MySpace data is normalized to the same scale as the Facebook data in order to comparethe relative number of search queries for both OSNs Figure 2 shows a number of important featuresthat qualitatively validate the use of search query data as a proxy for OSN usership The shape of theMySpace curve in Figure 2 is in qualitative agreement with the timeline for the rise and fall of MySpacediscussed in Section [3] exhibiting a rise in 2005 a peak between 2007 and 2008 and a steady declinebetween 2009 and 2011 The intersection of the MySpace and Facebook curves occurs near April 2008the month in which Facebook is reported to have overtaken MySpace in terms of global web traffic [21]The Facebook curve in Figure 2 shows a decline in search activity starting in 2013 which corroboratesreports that the OSN started losing some of its younger user base in 2013 [22 23]

Search Query Data Curve Fitting

The Google search query data for the OSN is assumed to be representative of the magnitude of theinfected population compartment of the previously discussed epidemiological models Recall that theinfected compartment in the context of OSNs is the active user base of the OSN The epidemiologicalmodels are used to analyze the search query data by curve fitting the infected I(t) population curvegenerated by the model to the search query data The curve fitting is carried out using the MATLABcomputational software package [24] using the following approach to determine the best fit curve

To find the best fit curve an I(t) curve must be generated and then compared to the search querydata set This is achieved by choosing initial input parameters S0 I0 R0 β and ν and then solvingthe system of differential equations to determine the corresponding I(t) The sum of squared error (SSE)between the weekly search query data points and the generated I(t) curve is then calculated The bestfit curve is defined as the I(t) curve that minimizes the calculated SSE The best fit curve is determinedusing a built in MATLAB function fminsearch which uses a Nelder-Mead simplex method [25] tofind determine the set of 5 input parameters corresponding to the best fit I(t) curve This optimizationscheme uses a Dormand-Prince (Runge-Kutta 45) method [26] to compute S(t) I(t) and R(t) Oncethe best fit curve is determined the corresponding input parameters are saved so that the best fit curvecan now be generated by re-solving the system of differential equations using the best fit parameters

Results and discussion

MySpace

To validate the use of epidemiological models in OSN adoption dynamics we first consider the case ofMySpace MySpace is a particularly useful case study for model validation because it represents one ofthe largest OSNs in history to exhibit the full life cycle of an OSN from rise to fall MySpace also hasthe added advantage of having its entire lifespan occur within the range of search query data availablefrom Google Trends which is only available after January 2004 The data for search query MySpace isshown in Figure 3 as the solid blue line This is the same search query data that is presented in Figure2 but in Figure 3 the data is normalized such that the maximum value of weekly Google search queriesfor ldquoMySpacerdquo corresponds to a value of 100

To fit the Myspace data we first apply the SIR model As shown in Figure 3a the best fit SIRmodel provides a somewhat qualitative description of the search query data although it clearly fails to

6

fit near the shoulders of the curve The relatively poor fit is expected given that the constant recoveryrate assumption in the SIR model does not intuitively describe the adoption and abandonment dynamicsof OSNs Rather as discussed in detail in Section ldquorecoveryrdquo spreads infectiously users begin to leavethe OSN after their peers have left the OSN

To capture the intuitive infectious recovery dynamics for OSN abandonment the irSIR model isapplied to the Myspace data as shown in Fig 3b The irSIR model provides a significantly betterdescription of the search query data hugging the shoulders of the curve more tightly and resulting in a75 drop in SSE as seen in Table 2 The quality of fit of the irSIR model lends validity to the notion ofinfectious user abandonment of OSNs

Facebook

Having validated the application of the irSIR model to the adoption dynamics of OSNs we now applythe irSIR model to the Google search query data for Facebook Facebook presents an interesting casestudy as it is the largest OSN in history Moreover the search query data suggests that Facebook hasalready reached the peak of its popularity and has entered a decline phase as evidenced by the downwardtrend in search frequency after 2012 The presence of a year-long decline portion of the curve allows fordetermination of best fit R0 and ν parameters both of which are sensitive to the decline dynamics Giventhe early stage of Facebookrsquos decline at the time of writing the best fit irSIR model can be used not onlyto provide an explanation for the observed curve but also to forecast how the OSN will decline in thefuture according to the assumptions of the irSIR model

Applying the irSIR model to Facebook shows a high quality fit for the available search query data overthe time period of January 2004 to the last reported data point at the time of writing Extrapolating thebest fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming yearsshrinking to 20 of its maximum size by December 2014 We will use 20 as an arbitrary definition forend of life so that we can define a so-called ldquo20 daterdquo as the OSNrsquos end of life date

To illustrate the strength of the prediction upper and lower model prediction bounds for the aban-donment of Facebook are overlaid alongside the best fit irSIR model in Figure 4 The correspondingfitting parameter values are tabulated in Table 2 The prediction bounds plotted in Figure 4 representthe two curves with the earliest and latest 20 dates that exhibit less than or equal to 15 higher SSEthan the true best fit curve Thus all possible irSIR model solutions with an increase in SSE of lessthan a 15 over the best fit are bounded between the plotted prediction bounds The presentation ofa bounded prediction region emphasizes that the plotted best fit curve should be thought of as the bestforecast given the available data

The differences between the 15 SSE prediction bound curves and the best fit curve are mainlydetermined by the R0 and ν parameters which follows from the importance of these parameters indescribing the abandonment of the OSN in the irSIR model The early prediction bound curve has ahigher ν parameter than the best fit model which is responsible for the higher rate of decline in the earlyprediction bound To compensate for the rapid decline R0 is decreased such that the onset of the declineis later and the error bound coincides with declining data occurring after 2012 The opposite is true forthe late prediction bound curve which has a lower ν and higher R0 than the best fit This results in acurve that declines slower than the best fit but also begins declining earlier such that it coincides withthe 2013 data

It is also important to understand the nature of the asymmetry of the prediction bounds Thedifference in 20 dates between the late bound and the best fit model is much larger than the differencein 20 dates between the best fit model and the early bound Since the early and late prediction boundcurves can be thought of as equally probable future trajectories for Facebook it follows that Facebookis more likely to deviate from the best fit curve on the side of slower decline The reason for thisasymmetry is that the decline is entirely dictated by data occurring in after 2012 If the post-2012 datais ignored a solution in which Facebook continues indefinitely at a constant size is possible with a finite

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 3: Epidemiological modeling of online social network dynamics

3

the infected population compartment this immunization criterion represents the conditions under whicha OSN will strictly decline

The SIR model can be applied to OSNs by drawing the correct OSN analogues to the SIR model pa-rameters When applying the SIR model to OSNs the susceptible population compartment is equivalentto all users that could potentially join the OSN The infected population compartment is analogous toOSN users potential OSN users are susceptible to joining the OSN through ldquoinfectionrdquo by contact witha current OSN users Finally the recovered population compartment is analogous to the population ofpeople who are opposed to joining the OSN In the case of an OSN this could be comprised of peoplewho have left the OSN with no intention of returning or people who resist joining the OSN in the firstplace A summary of all the OSN analogues to the epidemiological model parameters is provided in Table1

Infectious recovery SIR model

While the traditional SIR model captures the infectious uptake dynamics of OSNs the assumptionof a characteristic recovery rate in the traditional SIR model is of doubtful validity for modeling theabandonment of OSNs [5] Contrary to the SIR modelrsquos assumption of a characteristic recovery ratefor diseases OSN users do not join an OSN expecting to leave after a predetermined amount of timeInstead every user that joins the network expects to stay indefinitely but ultimately loses interest astheir peers begin to lose interest Thus a user that joins early on is expected to stay on the networklonger than a user that joins later Eventually users begin to leave and recovery spreads infectiously asusers begin to lose interest in the social network The notion of infectious abandonment is supported bywork analyzing user churn in mobile networks which show that users are more likely to leave the networkif their contacts have left [17] Therefore it is necessary to modify the traditional SIR model to includeinfectious recovery dynamics which intuitively provide a better description of OSN abandonment

To incorporate infectious recovery dynamics the recovery rate must be modified to be proportionateto the recovered population fraction R

N similar to how the spread of infection among the susceptible

population in the SIR model is proportionate to the infected population fraction IN This is achieved by

multiplying the γI terms in Equations 1b and 1c by RN

to give Equations 3b and 3c The constant γ isreplaced with a new constant ν which now represents an infectious recovery rate with the same units

S = minus

βIS

N(3a)

I =βIS

Nminus

νIR

N(3b)

R =νIR

N(3c)

In modifying Equation 3b and 3c R0 is no longer an arbitrary parameter as it was in the SIR modelbecause the magnitude of R is now important to the recovery dynamics This results in an additionalfitting parameter when applying the irSIR model to the data Similar to the requirement of an initialinfected population in the traditional SIR model the irSIR model necessarily requires a small initialrecovered population If R0 were set to 0 none of the infected population would be able to recover sincerecovery now requires contact with a recovered individual Mathematically Equation 3c would be zerofor all time and I would grow at the expense of S until the entire population is infected In the contextof OSN dynamics R0 can be thought of as the first users to leave the OSN or a compartment of thepopulation that resists joining the OSN altogether This small initial compartment of OSN resistors isultimately responsible for the abandonment of the OSN

An immunization criterion can be derived for the irSIR model similar to Equation 2

4

S0

R0

ltγ

β(4)

Similar to the implications of Equation 2 in the traditional SIR model if this criterion is satisfiedthen I can never increase One might also wonder if an analogous criterion for ldquoimmunizationrdquo againstrecovery can be derived for the irSIR model since the recovery and infection dynamics are now the sameIn the context of OSNs immunization against recovery would ensure that the OSN always grows in timeLooking at Equation 3c one can see that unless R0 = 0 R is always positive and R must always increasein time The implication is that there is no way to ldquoimmunizerdquo the infected population compartmentagainst infectious recovery in the irSIR model In the context of the irSIR model all OSNs are expectedto eventually decline

Methods

Google Trends Search Query Data

To test the theory of disease-like adoption dynamics of OSNs we use publicly available historical Googlesearch query data as a proxy for OSN usership as has been done in previous work [18] The public natureof this data is an important feature of the approach used in this paper as historical OSN user activitydata is typically proprietary and difficult to obtain [19] The Google search query data is obtained fromGooglersquos ldquoGoogle Trendsrdquo service and reports the relative number of Google search queries for a givensearch term [10] The use of search query data for the study of OSN adoption is advantageous comparedto using registration or membership data in that search query data provides a measure of the level of webtraffic for a given OSN Web traffic is arguably the best metric for OSN health in that it represents ameasure of user activity or interest within the network For example inactive members do not contributeto a social network and would not be counted using search data but would be counted using other metricssuch as registered account data Thus OSN usership as measured by search query data can be thoughtof as representing a less tangible albeit more meaningful metric of user activity or interest

The search query data obtained from Google Trends is presented in arbitrary units of weekly searchquery frequency which are normalized such that the maximum data point in the selected time periodcorresponds to a value of 100 The use of weekly search query data is advantageous over daily databecause it eliminates periodic variation due to day of week For example search queries for ldquoFacebookrdquotend to peak during the weekend The resolution of this search query data is 1 unit To increase theresolution of the search query data for model fitting shorter time periods of data are stitched togetherFor example consider two consecutive sets of weekly search query data chosen such that the last week ofthe first data set and the first week of the second data set are the same If that week has a value of 90 inthe first data set and 100 in the second then the two data sets can be stitched together by multiplyingthe second data set by a factor of 90100 so that they are now on the same scale This process can berepeated with data spanning the entire time range of interest and the resulting data set is subsequentlynormalized such that 100 corresponds to the maximum data point The ultimate result is a data set withresolution many times higher than the original 1 unit

The Google search query data for the term Facebook shows a large jump in search queries during earlyOctober 2012 as shown in Figure 1 This jump is assumed to be artifactual as it is unlikely that searchactivity for Facebook suddenly jumped by 20 in under a week and remained consistently at this elevatedlevel in the weeks following The authors believe this artifact is due to the October 5 2012 upgrade ofGoogles search algorithm to Penguin 3 [20] which coincides with the reported jump in Facebook searchqueries A similar jump albeit smaller in magnitude can also be observed in the Google search querydata for other search terms such as ldquoTwitterrdquo To the authors knowledge there is no publicly availableexplanation for how this update should affect the Google search query data reported by Google Trends

5

This jump is removed for the purposes of data analysis in this paper by multiplying the search querydata occurring after the week ending October 6 2012 by a factor of 0804 A multiplicative factor is usedinstead of subtraction of a constant because it is assumed to be more likely that an artifactual change insearch query reporting would affect all search queries equally as opposed to simply removing a constantamount of searches queries each week

The Google data for search query ldquoMySpacerdquo and search query ldquoFacebookrdquo are presented in Figure 2In Figure 2 the MySpace data is normalized to the same scale as the Facebook data in order to comparethe relative number of search queries for both OSNs Figure 2 shows a number of important featuresthat qualitatively validate the use of search query data as a proxy for OSN usership The shape of theMySpace curve in Figure 2 is in qualitative agreement with the timeline for the rise and fall of MySpacediscussed in Section [3] exhibiting a rise in 2005 a peak between 2007 and 2008 and a steady declinebetween 2009 and 2011 The intersection of the MySpace and Facebook curves occurs near April 2008the month in which Facebook is reported to have overtaken MySpace in terms of global web traffic [21]The Facebook curve in Figure 2 shows a decline in search activity starting in 2013 which corroboratesreports that the OSN started losing some of its younger user base in 2013 [22 23]

Search Query Data Curve Fitting

The Google search query data for the OSN is assumed to be representative of the magnitude of theinfected population compartment of the previously discussed epidemiological models Recall that theinfected compartment in the context of OSNs is the active user base of the OSN The epidemiologicalmodels are used to analyze the search query data by curve fitting the infected I(t) population curvegenerated by the model to the search query data The curve fitting is carried out using the MATLABcomputational software package [24] using the following approach to determine the best fit curve

To find the best fit curve an I(t) curve must be generated and then compared to the search querydata set This is achieved by choosing initial input parameters S0 I0 R0 β and ν and then solvingthe system of differential equations to determine the corresponding I(t) The sum of squared error (SSE)between the weekly search query data points and the generated I(t) curve is then calculated The bestfit curve is defined as the I(t) curve that minimizes the calculated SSE The best fit curve is determinedusing a built in MATLAB function fminsearch which uses a Nelder-Mead simplex method [25] tofind determine the set of 5 input parameters corresponding to the best fit I(t) curve This optimizationscheme uses a Dormand-Prince (Runge-Kutta 45) method [26] to compute S(t) I(t) and R(t) Oncethe best fit curve is determined the corresponding input parameters are saved so that the best fit curvecan now be generated by re-solving the system of differential equations using the best fit parameters

Results and discussion

MySpace

To validate the use of epidemiological models in OSN adoption dynamics we first consider the case ofMySpace MySpace is a particularly useful case study for model validation because it represents one ofthe largest OSNs in history to exhibit the full life cycle of an OSN from rise to fall MySpace also hasthe added advantage of having its entire lifespan occur within the range of search query data availablefrom Google Trends which is only available after January 2004 The data for search query MySpace isshown in Figure 3 as the solid blue line This is the same search query data that is presented in Figure2 but in Figure 3 the data is normalized such that the maximum value of weekly Google search queriesfor ldquoMySpacerdquo corresponds to a value of 100

To fit the Myspace data we first apply the SIR model As shown in Figure 3a the best fit SIRmodel provides a somewhat qualitative description of the search query data although it clearly fails to

6

fit near the shoulders of the curve The relatively poor fit is expected given that the constant recoveryrate assumption in the SIR model does not intuitively describe the adoption and abandonment dynamicsof OSNs Rather as discussed in detail in Section ldquorecoveryrdquo spreads infectiously users begin to leavethe OSN after their peers have left the OSN

To capture the intuitive infectious recovery dynamics for OSN abandonment the irSIR model isapplied to the Myspace data as shown in Fig 3b The irSIR model provides a significantly betterdescription of the search query data hugging the shoulders of the curve more tightly and resulting in a75 drop in SSE as seen in Table 2 The quality of fit of the irSIR model lends validity to the notion ofinfectious user abandonment of OSNs

Facebook

Having validated the application of the irSIR model to the adoption dynamics of OSNs we now applythe irSIR model to the Google search query data for Facebook Facebook presents an interesting casestudy as it is the largest OSN in history Moreover the search query data suggests that Facebook hasalready reached the peak of its popularity and has entered a decline phase as evidenced by the downwardtrend in search frequency after 2012 The presence of a year-long decline portion of the curve allows fordetermination of best fit R0 and ν parameters both of which are sensitive to the decline dynamics Giventhe early stage of Facebookrsquos decline at the time of writing the best fit irSIR model can be used not onlyto provide an explanation for the observed curve but also to forecast how the OSN will decline in thefuture according to the assumptions of the irSIR model

Applying the irSIR model to Facebook shows a high quality fit for the available search query data overthe time period of January 2004 to the last reported data point at the time of writing Extrapolating thebest fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming yearsshrinking to 20 of its maximum size by December 2014 We will use 20 as an arbitrary definition forend of life so that we can define a so-called ldquo20 daterdquo as the OSNrsquos end of life date

To illustrate the strength of the prediction upper and lower model prediction bounds for the aban-donment of Facebook are overlaid alongside the best fit irSIR model in Figure 4 The correspondingfitting parameter values are tabulated in Table 2 The prediction bounds plotted in Figure 4 representthe two curves with the earliest and latest 20 dates that exhibit less than or equal to 15 higher SSEthan the true best fit curve Thus all possible irSIR model solutions with an increase in SSE of lessthan a 15 over the best fit are bounded between the plotted prediction bounds The presentation ofa bounded prediction region emphasizes that the plotted best fit curve should be thought of as the bestforecast given the available data

The differences between the 15 SSE prediction bound curves and the best fit curve are mainlydetermined by the R0 and ν parameters which follows from the importance of these parameters indescribing the abandonment of the OSN in the irSIR model The early prediction bound curve has ahigher ν parameter than the best fit model which is responsible for the higher rate of decline in the earlyprediction bound To compensate for the rapid decline R0 is decreased such that the onset of the declineis later and the error bound coincides with declining data occurring after 2012 The opposite is true forthe late prediction bound curve which has a lower ν and higher R0 than the best fit This results in acurve that declines slower than the best fit but also begins declining earlier such that it coincides withthe 2013 data

It is also important to understand the nature of the asymmetry of the prediction bounds Thedifference in 20 dates between the late bound and the best fit model is much larger than the differencein 20 dates between the best fit model and the early bound Since the early and late prediction boundcurves can be thought of as equally probable future trajectories for Facebook it follows that Facebookis more likely to deviate from the best fit curve on the side of slower decline The reason for thisasymmetry is that the decline is entirely dictated by data occurring in after 2012 If the post-2012 datais ignored a solution in which Facebook continues indefinitely at a constant size is possible with a finite

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 4: Epidemiological modeling of online social network dynamics

4

S0

R0

ltγ

β(4)

Similar to the implications of Equation 2 in the traditional SIR model if this criterion is satisfiedthen I can never increase One might also wonder if an analogous criterion for ldquoimmunizationrdquo againstrecovery can be derived for the irSIR model since the recovery and infection dynamics are now the sameIn the context of OSNs immunization against recovery would ensure that the OSN always grows in timeLooking at Equation 3c one can see that unless R0 = 0 R is always positive and R must always increasein time The implication is that there is no way to ldquoimmunizerdquo the infected population compartmentagainst infectious recovery in the irSIR model In the context of the irSIR model all OSNs are expectedto eventually decline

Methods

Google Trends Search Query Data

To test the theory of disease-like adoption dynamics of OSNs we use publicly available historical Googlesearch query data as a proxy for OSN usership as has been done in previous work [18] The public natureof this data is an important feature of the approach used in this paper as historical OSN user activitydata is typically proprietary and difficult to obtain [19] The Google search query data is obtained fromGooglersquos ldquoGoogle Trendsrdquo service and reports the relative number of Google search queries for a givensearch term [10] The use of search query data for the study of OSN adoption is advantageous comparedto using registration or membership data in that search query data provides a measure of the level of webtraffic for a given OSN Web traffic is arguably the best metric for OSN health in that it represents ameasure of user activity or interest within the network For example inactive members do not contributeto a social network and would not be counted using search data but would be counted using other metricssuch as registered account data Thus OSN usership as measured by search query data can be thoughtof as representing a less tangible albeit more meaningful metric of user activity or interest

The search query data obtained from Google Trends is presented in arbitrary units of weekly searchquery frequency which are normalized such that the maximum data point in the selected time periodcorresponds to a value of 100 The use of weekly search query data is advantageous over daily databecause it eliminates periodic variation due to day of week For example search queries for ldquoFacebookrdquotend to peak during the weekend The resolution of this search query data is 1 unit To increase theresolution of the search query data for model fitting shorter time periods of data are stitched togetherFor example consider two consecutive sets of weekly search query data chosen such that the last week ofthe first data set and the first week of the second data set are the same If that week has a value of 90 inthe first data set and 100 in the second then the two data sets can be stitched together by multiplyingthe second data set by a factor of 90100 so that they are now on the same scale This process can berepeated with data spanning the entire time range of interest and the resulting data set is subsequentlynormalized such that 100 corresponds to the maximum data point The ultimate result is a data set withresolution many times higher than the original 1 unit

The Google search query data for the term Facebook shows a large jump in search queries during earlyOctober 2012 as shown in Figure 1 This jump is assumed to be artifactual as it is unlikely that searchactivity for Facebook suddenly jumped by 20 in under a week and remained consistently at this elevatedlevel in the weeks following The authors believe this artifact is due to the October 5 2012 upgrade ofGoogles search algorithm to Penguin 3 [20] which coincides with the reported jump in Facebook searchqueries A similar jump albeit smaller in magnitude can also be observed in the Google search querydata for other search terms such as ldquoTwitterrdquo To the authors knowledge there is no publicly availableexplanation for how this update should affect the Google search query data reported by Google Trends

5

This jump is removed for the purposes of data analysis in this paper by multiplying the search querydata occurring after the week ending October 6 2012 by a factor of 0804 A multiplicative factor is usedinstead of subtraction of a constant because it is assumed to be more likely that an artifactual change insearch query reporting would affect all search queries equally as opposed to simply removing a constantamount of searches queries each week

The Google data for search query ldquoMySpacerdquo and search query ldquoFacebookrdquo are presented in Figure 2In Figure 2 the MySpace data is normalized to the same scale as the Facebook data in order to comparethe relative number of search queries for both OSNs Figure 2 shows a number of important featuresthat qualitatively validate the use of search query data as a proxy for OSN usership The shape of theMySpace curve in Figure 2 is in qualitative agreement with the timeline for the rise and fall of MySpacediscussed in Section [3] exhibiting a rise in 2005 a peak between 2007 and 2008 and a steady declinebetween 2009 and 2011 The intersection of the MySpace and Facebook curves occurs near April 2008the month in which Facebook is reported to have overtaken MySpace in terms of global web traffic [21]The Facebook curve in Figure 2 shows a decline in search activity starting in 2013 which corroboratesreports that the OSN started losing some of its younger user base in 2013 [22 23]

Search Query Data Curve Fitting

The Google search query data for the OSN is assumed to be representative of the magnitude of theinfected population compartment of the previously discussed epidemiological models Recall that theinfected compartment in the context of OSNs is the active user base of the OSN The epidemiologicalmodels are used to analyze the search query data by curve fitting the infected I(t) population curvegenerated by the model to the search query data The curve fitting is carried out using the MATLABcomputational software package [24] using the following approach to determine the best fit curve

To find the best fit curve an I(t) curve must be generated and then compared to the search querydata set This is achieved by choosing initial input parameters S0 I0 R0 β and ν and then solvingthe system of differential equations to determine the corresponding I(t) The sum of squared error (SSE)between the weekly search query data points and the generated I(t) curve is then calculated The bestfit curve is defined as the I(t) curve that minimizes the calculated SSE The best fit curve is determinedusing a built in MATLAB function fminsearch which uses a Nelder-Mead simplex method [25] tofind determine the set of 5 input parameters corresponding to the best fit I(t) curve This optimizationscheme uses a Dormand-Prince (Runge-Kutta 45) method [26] to compute S(t) I(t) and R(t) Oncethe best fit curve is determined the corresponding input parameters are saved so that the best fit curvecan now be generated by re-solving the system of differential equations using the best fit parameters

Results and discussion

MySpace

To validate the use of epidemiological models in OSN adoption dynamics we first consider the case ofMySpace MySpace is a particularly useful case study for model validation because it represents one ofthe largest OSNs in history to exhibit the full life cycle of an OSN from rise to fall MySpace also hasthe added advantage of having its entire lifespan occur within the range of search query data availablefrom Google Trends which is only available after January 2004 The data for search query MySpace isshown in Figure 3 as the solid blue line This is the same search query data that is presented in Figure2 but in Figure 3 the data is normalized such that the maximum value of weekly Google search queriesfor ldquoMySpacerdquo corresponds to a value of 100

To fit the Myspace data we first apply the SIR model As shown in Figure 3a the best fit SIRmodel provides a somewhat qualitative description of the search query data although it clearly fails to

6

fit near the shoulders of the curve The relatively poor fit is expected given that the constant recoveryrate assumption in the SIR model does not intuitively describe the adoption and abandonment dynamicsof OSNs Rather as discussed in detail in Section ldquorecoveryrdquo spreads infectiously users begin to leavethe OSN after their peers have left the OSN

To capture the intuitive infectious recovery dynamics for OSN abandonment the irSIR model isapplied to the Myspace data as shown in Fig 3b The irSIR model provides a significantly betterdescription of the search query data hugging the shoulders of the curve more tightly and resulting in a75 drop in SSE as seen in Table 2 The quality of fit of the irSIR model lends validity to the notion ofinfectious user abandonment of OSNs

Facebook

Having validated the application of the irSIR model to the adoption dynamics of OSNs we now applythe irSIR model to the Google search query data for Facebook Facebook presents an interesting casestudy as it is the largest OSN in history Moreover the search query data suggests that Facebook hasalready reached the peak of its popularity and has entered a decline phase as evidenced by the downwardtrend in search frequency after 2012 The presence of a year-long decline portion of the curve allows fordetermination of best fit R0 and ν parameters both of which are sensitive to the decline dynamics Giventhe early stage of Facebookrsquos decline at the time of writing the best fit irSIR model can be used not onlyto provide an explanation for the observed curve but also to forecast how the OSN will decline in thefuture according to the assumptions of the irSIR model

Applying the irSIR model to Facebook shows a high quality fit for the available search query data overthe time period of January 2004 to the last reported data point at the time of writing Extrapolating thebest fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming yearsshrinking to 20 of its maximum size by December 2014 We will use 20 as an arbitrary definition forend of life so that we can define a so-called ldquo20 daterdquo as the OSNrsquos end of life date

To illustrate the strength of the prediction upper and lower model prediction bounds for the aban-donment of Facebook are overlaid alongside the best fit irSIR model in Figure 4 The correspondingfitting parameter values are tabulated in Table 2 The prediction bounds plotted in Figure 4 representthe two curves with the earliest and latest 20 dates that exhibit less than or equal to 15 higher SSEthan the true best fit curve Thus all possible irSIR model solutions with an increase in SSE of lessthan a 15 over the best fit are bounded between the plotted prediction bounds The presentation ofa bounded prediction region emphasizes that the plotted best fit curve should be thought of as the bestforecast given the available data

The differences between the 15 SSE prediction bound curves and the best fit curve are mainlydetermined by the R0 and ν parameters which follows from the importance of these parameters indescribing the abandonment of the OSN in the irSIR model The early prediction bound curve has ahigher ν parameter than the best fit model which is responsible for the higher rate of decline in the earlyprediction bound To compensate for the rapid decline R0 is decreased such that the onset of the declineis later and the error bound coincides with declining data occurring after 2012 The opposite is true forthe late prediction bound curve which has a lower ν and higher R0 than the best fit This results in acurve that declines slower than the best fit but also begins declining earlier such that it coincides withthe 2013 data

It is also important to understand the nature of the asymmetry of the prediction bounds Thedifference in 20 dates between the late bound and the best fit model is much larger than the differencein 20 dates between the best fit model and the early bound Since the early and late prediction boundcurves can be thought of as equally probable future trajectories for Facebook it follows that Facebookis more likely to deviate from the best fit curve on the side of slower decline The reason for thisasymmetry is that the decline is entirely dictated by data occurring in after 2012 If the post-2012 datais ignored a solution in which Facebook continues indefinitely at a constant size is possible with a finite

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 5: Epidemiological modeling of online social network dynamics

5

This jump is removed for the purposes of data analysis in this paper by multiplying the search querydata occurring after the week ending October 6 2012 by a factor of 0804 A multiplicative factor is usedinstead of subtraction of a constant because it is assumed to be more likely that an artifactual change insearch query reporting would affect all search queries equally as opposed to simply removing a constantamount of searches queries each week

The Google data for search query ldquoMySpacerdquo and search query ldquoFacebookrdquo are presented in Figure 2In Figure 2 the MySpace data is normalized to the same scale as the Facebook data in order to comparethe relative number of search queries for both OSNs Figure 2 shows a number of important featuresthat qualitatively validate the use of search query data as a proxy for OSN usership The shape of theMySpace curve in Figure 2 is in qualitative agreement with the timeline for the rise and fall of MySpacediscussed in Section [3] exhibiting a rise in 2005 a peak between 2007 and 2008 and a steady declinebetween 2009 and 2011 The intersection of the MySpace and Facebook curves occurs near April 2008the month in which Facebook is reported to have overtaken MySpace in terms of global web traffic [21]The Facebook curve in Figure 2 shows a decline in search activity starting in 2013 which corroboratesreports that the OSN started losing some of its younger user base in 2013 [22 23]

Search Query Data Curve Fitting

The Google search query data for the OSN is assumed to be representative of the magnitude of theinfected population compartment of the previously discussed epidemiological models Recall that theinfected compartment in the context of OSNs is the active user base of the OSN The epidemiologicalmodels are used to analyze the search query data by curve fitting the infected I(t) population curvegenerated by the model to the search query data The curve fitting is carried out using the MATLABcomputational software package [24] using the following approach to determine the best fit curve

To find the best fit curve an I(t) curve must be generated and then compared to the search querydata set This is achieved by choosing initial input parameters S0 I0 R0 β and ν and then solvingthe system of differential equations to determine the corresponding I(t) The sum of squared error (SSE)between the weekly search query data points and the generated I(t) curve is then calculated The bestfit curve is defined as the I(t) curve that minimizes the calculated SSE The best fit curve is determinedusing a built in MATLAB function fminsearch which uses a Nelder-Mead simplex method [25] tofind determine the set of 5 input parameters corresponding to the best fit I(t) curve This optimizationscheme uses a Dormand-Prince (Runge-Kutta 45) method [26] to compute S(t) I(t) and R(t) Oncethe best fit curve is determined the corresponding input parameters are saved so that the best fit curvecan now be generated by re-solving the system of differential equations using the best fit parameters

Results and discussion

MySpace

To validate the use of epidemiological models in OSN adoption dynamics we first consider the case ofMySpace MySpace is a particularly useful case study for model validation because it represents one ofthe largest OSNs in history to exhibit the full life cycle of an OSN from rise to fall MySpace also hasthe added advantage of having its entire lifespan occur within the range of search query data availablefrom Google Trends which is only available after January 2004 The data for search query MySpace isshown in Figure 3 as the solid blue line This is the same search query data that is presented in Figure2 but in Figure 3 the data is normalized such that the maximum value of weekly Google search queriesfor ldquoMySpacerdquo corresponds to a value of 100

To fit the Myspace data we first apply the SIR model As shown in Figure 3a the best fit SIRmodel provides a somewhat qualitative description of the search query data although it clearly fails to

6

fit near the shoulders of the curve The relatively poor fit is expected given that the constant recoveryrate assumption in the SIR model does not intuitively describe the adoption and abandonment dynamicsof OSNs Rather as discussed in detail in Section ldquorecoveryrdquo spreads infectiously users begin to leavethe OSN after their peers have left the OSN

To capture the intuitive infectious recovery dynamics for OSN abandonment the irSIR model isapplied to the Myspace data as shown in Fig 3b The irSIR model provides a significantly betterdescription of the search query data hugging the shoulders of the curve more tightly and resulting in a75 drop in SSE as seen in Table 2 The quality of fit of the irSIR model lends validity to the notion ofinfectious user abandonment of OSNs

Facebook

Having validated the application of the irSIR model to the adoption dynamics of OSNs we now applythe irSIR model to the Google search query data for Facebook Facebook presents an interesting casestudy as it is the largest OSN in history Moreover the search query data suggests that Facebook hasalready reached the peak of its popularity and has entered a decline phase as evidenced by the downwardtrend in search frequency after 2012 The presence of a year-long decline portion of the curve allows fordetermination of best fit R0 and ν parameters both of which are sensitive to the decline dynamics Giventhe early stage of Facebookrsquos decline at the time of writing the best fit irSIR model can be used not onlyto provide an explanation for the observed curve but also to forecast how the OSN will decline in thefuture according to the assumptions of the irSIR model

Applying the irSIR model to Facebook shows a high quality fit for the available search query data overthe time period of January 2004 to the last reported data point at the time of writing Extrapolating thebest fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming yearsshrinking to 20 of its maximum size by December 2014 We will use 20 as an arbitrary definition forend of life so that we can define a so-called ldquo20 daterdquo as the OSNrsquos end of life date

To illustrate the strength of the prediction upper and lower model prediction bounds for the aban-donment of Facebook are overlaid alongside the best fit irSIR model in Figure 4 The correspondingfitting parameter values are tabulated in Table 2 The prediction bounds plotted in Figure 4 representthe two curves with the earliest and latest 20 dates that exhibit less than or equal to 15 higher SSEthan the true best fit curve Thus all possible irSIR model solutions with an increase in SSE of lessthan a 15 over the best fit are bounded between the plotted prediction bounds The presentation ofa bounded prediction region emphasizes that the plotted best fit curve should be thought of as the bestforecast given the available data

The differences between the 15 SSE prediction bound curves and the best fit curve are mainlydetermined by the R0 and ν parameters which follows from the importance of these parameters indescribing the abandonment of the OSN in the irSIR model The early prediction bound curve has ahigher ν parameter than the best fit model which is responsible for the higher rate of decline in the earlyprediction bound To compensate for the rapid decline R0 is decreased such that the onset of the declineis later and the error bound coincides with declining data occurring after 2012 The opposite is true forthe late prediction bound curve which has a lower ν and higher R0 than the best fit This results in acurve that declines slower than the best fit but also begins declining earlier such that it coincides withthe 2013 data

It is also important to understand the nature of the asymmetry of the prediction bounds Thedifference in 20 dates between the late bound and the best fit model is much larger than the differencein 20 dates between the best fit model and the early bound Since the early and late prediction boundcurves can be thought of as equally probable future trajectories for Facebook it follows that Facebookis more likely to deviate from the best fit curve on the side of slower decline The reason for thisasymmetry is that the decline is entirely dictated by data occurring in after 2012 If the post-2012 datais ignored a solution in which Facebook continues indefinitely at a constant size is possible with a finite

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 6: Epidemiological modeling of online social network dynamics

6

fit near the shoulders of the curve The relatively poor fit is expected given that the constant recoveryrate assumption in the SIR model does not intuitively describe the adoption and abandonment dynamicsof OSNs Rather as discussed in detail in Section ldquorecoveryrdquo spreads infectiously users begin to leavethe OSN after their peers have left the OSN

To capture the intuitive infectious recovery dynamics for OSN abandonment the irSIR model isapplied to the Myspace data as shown in Fig 3b The irSIR model provides a significantly betterdescription of the search query data hugging the shoulders of the curve more tightly and resulting in a75 drop in SSE as seen in Table 2 The quality of fit of the irSIR model lends validity to the notion ofinfectious user abandonment of OSNs

Facebook

Having validated the application of the irSIR model to the adoption dynamics of OSNs we now applythe irSIR model to the Google search query data for Facebook Facebook presents an interesting casestudy as it is the largest OSN in history Moreover the search query data suggests that Facebook hasalready reached the peak of its popularity and has entered a decline phase as evidenced by the downwardtrend in search frequency after 2012 The presence of a year-long decline portion of the curve allows fordetermination of best fit R0 and ν parameters both of which are sensitive to the decline dynamics Giventhe early stage of Facebookrsquos decline at the time of writing the best fit irSIR model can be used not onlyto provide an explanation for the observed curve but also to forecast how the OSN will decline in thefuture according to the assumptions of the irSIR model

Applying the irSIR model to Facebook shows a high quality fit for the available search query data overthe time period of January 2004 to the last reported data point at the time of writing Extrapolating thebest fit into the future shows that Facebook is expected to undergo rapid decline in the upcoming yearsshrinking to 20 of its maximum size by December 2014 We will use 20 as an arbitrary definition forend of life so that we can define a so-called ldquo20 daterdquo as the OSNrsquos end of life date

To illustrate the strength of the prediction upper and lower model prediction bounds for the aban-donment of Facebook are overlaid alongside the best fit irSIR model in Figure 4 The correspondingfitting parameter values are tabulated in Table 2 The prediction bounds plotted in Figure 4 representthe two curves with the earliest and latest 20 dates that exhibit less than or equal to 15 higher SSEthan the true best fit curve Thus all possible irSIR model solutions with an increase in SSE of lessthan a 15 over the best fit are bounded between the plotted prediction bounds The presentation ofa bounded prediction region emphasizes that the plotted best fit curve should be thought of as the bestforecast given the available data

The differences between the 15 SSE prediction bound curves and the best fit curve are mainlydetermined by the R0 and ν parameters which follows from the importance of these parameters indescribing the abandonment of the OSN in the irSIR model The early prediction bound curve has ahigher ν parameter than the best fit model which is responsible for the higher rate of decline in the earlyprediction bound To compensate for the rapid decline R0 is decreased such that the onset of the declineis later and the error bound coincides with declining data occurring after 2012 The opposite is true forthe late prediction bound curve which has a lower ν and higher R0 than the best fit This results in acurve that declines slower than the best fit but also begins declining earlier such that it coincides withthe 2013 data

It is also important to understand the nature of the asymmetry of the prediction bounds Thedifference in 20 dates between the late bound and the best fit model is much larger than the differencein 20 dates between the best fit model and the early bound Since the early and late prediction boundcurves can be thought of as equally probable future trajectories for Facebook it follows that Facebookis more likely to deviate from the best fit curve on the side of slower decline The reason for thisasymmetry is that the decline is entirely dictated by data occurring in after 2012 If the post-2012 datais ignored a solution in which Facebook continues indefinitely at a constant size is possible with a finite

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 7: Epidemiological modeling of online social network dynamics

7

SSE determined mainly by deviation from the post-2012 data This implies that there exists an infiniterange of possible slower declining solutions than the best fit all with SSE values bounded primarily bydeviation from the post-2012 data Conversely solutions declining faster than the best fit are boundedby a lower limit set by the date of the last reported data point in the data set

Conclusions

In this paper we have applied a modified epidemiological model to describe the adoption and abandonmentdynamics of user activity of online social networks Using publicly available Google data for search queryMyspace as a case study we showed that the traditional SIR model for modeling disease dynamicsprovides a poor description of the data A 75 decrease in SSE is achieved by modifying the traditionalSIR model to incorporate infectious recovery dynamics which is a better description of OSN dynamicsHaving validated the irSIR model of OSN dynamics on Google data for search query Myspace we thenapplied the model to the Google data for search query Facebook Extrapolating the best fit model intothe future suggests that Facebook will undergo a rapid decline in the coming years losing 80 of its peakuser base between 2015 and 2017

Acknowledgments

This publication is the culmination of talk originally presented at the 2012 Princeton Research Symposiumtitled rdquoThe social network disease epidemiology and the demise of Facebookrdquo The authors acknowledgePRS for its provision of an intellectually stimulating forum for the discussion of research without whichthis work would not have been conceived The authors acknowledge the Princeton Graduate Schoolspecifically Dean William B Russel for continued support of PRS and help in meeting the costs ofpublication In addition the authors acknowledge Professor Craig B Arnold for fruitful golf discussion

References

1 Boyd D Ellison N (2007) Social network sites Definition history and scholarship Journal ofComputer-Mediated Communication 13 210-230

2 Google finance URL httpfinancegooglecom Accessed 2013-11-20

3 Gillette F (2011) The rise and inglorious fall of myspace URL httpcligs94lb3yf

4 (2012) Communication Technology Update and Fundamentals Taylor and Francis

5 Wu S Das Sarma A Fabrikant A Lattanzi S Tomkins A (2013) Arrival and departure dynamicsin social networks In Proceedings of the Sixth ACM International Conference on Web Searchand Data Mining WSDM rsquo13 pp 233ndash242

6 Goffman W (1966) Mathematical approach to the spread of scientific ideasthe history of mast cellresearch Nature 449452

7 Watts DJ (2002) A simple model of global cascades on random networks Proceedings of theNational Academy of Sciences 99 5766-5771

8 Bartholomew DJ (1984) Recent developments in nonlinear stochastic modelling of social processesCanadian Journal of Statistics 12 39ndash52

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 8: Epidemiological modeling of online social network dynamics

8

9 Bettencourt LM Cintrn-Arias A Kaiser DI Castillo-Chvez C (2006) The power of a good ideaQuantitative modeling of the spread of ideas from epidemiological models Physica A StatisticalMechanics and its Applications 364 513 - 536

10 Google trends URL httptrendsgooglecom Accessed 2014-1-6

11 Detecting influenza epidemics using search engine query data Nature 457 1012-1014

12 Choi H Varian H (2012) Predicting the present with google trends Economic Record 88 2-9

13 T Preis HSM Stanley HE (2013) Quantifying trading behavior in financial markets using googletrends Scientific Reports 3

14 Google flu trends URL httpwwwgoogleorgflutrends Accessed 2013-12-23

15 Carneiro HA Mylonakis E (2009) Google trends A web-based tool for real-time surveillance ofdisease outbreaks Clinical Infectious Diseases 49 1557-1564

16 Bailey N (1987) Mathematical Theory of Infectious Diseases Mathematics in Medicine SeriesOxford University Press Incorporated

17 Dasgupta K Singh R Viswanathan B Chakraborty D Mukherjea S et al (2008) Social ties andtheir relevance to churn in mobile telecom networks In Proceedings of the 11th InternationalConference on Extending Database Technology Advances in Database Technology EDBT rsquo08 pp668ndash677

18 Garcia D Mavrodiev P Schweitzer F (2013) Social resilience in online communities The autopsyof friendster In Proceedings of the First ACM Conference on Online Social Networks COSN rsquo13pp 39ndash50

19 Torkjazi M Rejaie R Willinger W (2009) Hot today gone tomorrow On the migration of myspaceusers In Proceedings of the 2Nd ACM Workshop on Online Social Networks WOSN rsquo09

20 McGee M Google penguin update 3 released impacts 03 of english-language queries URLhttpsearchenginelandcomgoogle-penguin-update-3-135527 Accessed 2013-11-20

21 (2013) Facebook is not only the worldrsquos largest social network it is also the fastest growing URLhttptechcrunchcom20080812facebook-is-not-only-the-worlds-largest-social-network-it-is-als

22 Facebook q3 2013 earnings call transcript URL httpinvestorfbcomresultscfm Ac-cessed 2013-12-02

23 Miller D (Nov 24 2013) What will we learn from the fall of facebook URLhttpblogsuclacuk

24 URL httpwwwmathworkscom

25 Lagarias J Reeds J Wright M Wright P (1998) Convergence properties of the nelderndashmead simplexmethod in low dimensions SIAM Journal on Optimization 9 112-147

26 Dormand J Prince P (1980) A family of embedded runge-kutta formulae Journal of Computationaland Applied Mathematics 6 19 - 26

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 9: Epidemiological modeling of online social network dynamics

9

115

95

105

125

12012 42012 72012 72013102012 12013 42013

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 1 Google search query data for ldquoFacebookrdquo between January 2012 and July 2013before and after removal of the artifactual October 2012 jump in search queries Both datasets are scaled such that 100 corresponds to the maximum weekly Google search queries for the set withthe jump removed over the plotted time period

2012201020082006 2014112004

100

80

60

40

20

MySpace

Facebook

Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 2 Search query data for ldquoFacebookrdquo and ldquoMySpacerdquo obtained from Google Trendsoverlaid on top of each other The data for both curves are scaled such that 100 corresponds to themaximum weekly Google search queries for rdquoFacebookrdquo over the plotted time period Search queries forldquoMySpacerdquo peak at 10 of the maximum weekly search queries for ldquoFacebookrdquo in this time period

Figure Legends

Tables

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 10: Epidemiological modeling of online social network dynamics

10

100

80

60

40

20

112004 2008 2012

MySpace

best fitSIR model

best fitirSIR model

2004 2008 2012

MySpace

Date Date

Norm

alized w

eekly

sear

ch q

uerie

s

Figure 3 Data for search query ldquoMyspacerdquo with best fit (a) SIR and (b) irSIR modelsoverlaid The search query data are normalized such that the maximum data point corresponds to avalue of 100

Facebook

best fitirSIR model

predictionbound curves

112004 2008 2012 2016 2020

100

80

60

40

20

Date

Norm

alized w

eekly

sear

ch q

uerie

s

early late

Figure 4 Data for search query ldquoFacebookrdquo with best fit irSIR model overlaid Predictionbounds corresponding to the earliest and latest decline times with a 15 increase in SSE over the bestfit are overlaid The data are scaled such that the maximum weekly search queries for ldquoFacebookrdquocorresponds to a value of 100

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models

Page 11: Epidemiological modeling of online social network dynamics

11

Table 1 Epidemiological model parameter definitionsSymbol Units Disease Model Parameter Equivalent OSN Model Parameter

S People Susceptible Potential OSN usersI People Infected OSN usersR People RecoveredImmune Population opposed to OSN useβ Timeminus1 Infection rate Rate at which potential users join OSNγ Timeminus1 recovery rate -ν Timeminus1 - OSN abandonment rate

Summary of parameters in the epidemiological models and their equivalent OSN analogues

Table 2 Epidemiological model fitting parametersFit β ν S0

NI0N

R0

NN SSE 20 date

MySpace SIR 492 middot 10minus2 539 (γ) 0996 41 middot 10minus3 0 (fixed) 322 167 middot 104 122010MySpace irSIR 598 middot 10minus2 268 middot 10minus2 0992 949 middot 10minus4 719 middot 10minus3 9294 433 middot 103 112010Facebook best 336 middot 10minus2 498 middot 10minus2 100 643 middot 10minus5 235 middot 10minus6 945 134 middot 103 12015Facebook early 343 middot 10minus2 823 middot 10minus2 100 547 middot 10minus5 104 middot 10minus9 9361 153 middot 103 072014Facebook late 327 middot 10minus2 271 middot 10minus2 100 809 middot 10minus5 410 middot 10minus4 970 154 middot 103 022016

Table showing fitting parameter values from fitting of search query data with epidemiological models