
Page 1

2016-11-28 © ETH Zürich |

Modeling and Simulating Social Systems with MATLAB

Lecture 10–Dynamics on Networks

© ETH Zürich |

Computational Social Science

Olivia Woolley, Lloyd Sanders, Dirk Helbing

Page 2

2016-11-28 Modeling and Simulating Social Systems with MATLAB 2

Schedule of the course 26.09. 03.10. 10.10. 17.10. 24.10. 31.10. 07.11. 14.11. 21.11. 28.11. 05.12. 12.12. 19.12.

Introduction to MATLAB

Introduction to social science modelling and simulations

Working on projects (seminar thesis)

Handing in seminar thesis and giving a presentation

Flash Talks

Modeling overview

Complex networks From structure to dynamics

Page 3

Final presentation schedule

§  Project presentation: 10 min + 5 min for Q&A

§  All group members have to actively participate in the presentation

§  Registration for the final presentation is binding; if you do not want to obtain credits, do not register!

§  There are 16 slots on two days:

§  Monday, 19 December: 16:30 – 18:30

§  Tuesday, 20 December: 14:30 – 16:30

§  Sign up for slots begins today: http://goo.gl/4psqsM


Page 4

I. Disease spread on networks

§  The structure of social interactions and human movement has a critical effect on disease spread

§  We can use networks to model this structure


Source: Bearman et al. (2004)

Page 5

Recap: Kermack-McKendrick model


Page 6

Effect of topology on disease spread

§  Small diameter leads to faster spread.

(Figure panels: long-range connections vs. short-range clustered connections)
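The claim that a small diameter speeds up spreading can be illustrated with a minimal Python sketch. It assumes the simplest possible contagion, in which every infected node infects all of its neighbours in each round. Under that rule, the number of rounds needed to reach everyone equals the largest shortest-path distance from the seed. The ring lattice (short-range clustered links), the random shortcuts (long-range links) and all function names are illustrative assumptions.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on either side
    (short-range, clustered connections)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def add_random_shortcuts(adj, m, seed=0):
    """Add m random long-range links between distinct, not yet connected nodes."""
    rng = random.Random(seed)
    n = len(adj)
    added = 0
    while added < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


def rounds_to_reach_all(adj, seed_node=0):
    """Breadth-first search from the seed. If every infected node infects all
    of its neighbours each round, the connected graph is fully infected after
    (largest distance from the seed) rounds."""
    dist = {seed_node: 0}
    queue = deque([seed_node])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values())


clustered = ring_lattice(200, 2)
with_shortcuts = add_random_shortcuts(ring_lattice(200, 2), m=20)
print("short-range only:", rounds_to_reach_all(clustered))
print("with long-range shortcuts:", rounds_to_reach_all(with_shortcuts))
```

Adding links can never lengthen a shortest path, so the shortcut version never needs more rounds than the pure ring. In practice, a handful of long-range links cuts the count sharply.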

Page 7

Recap: Kermack-McKendrick model


Page 8

Recap: Kermack-McKendrick model

$$\frac{ds}{dt} = -\beta\, j(t)\, s(t)$$

$$\frac{dj}{dt} = \beta\, j(t)\, s(t) - \gamma\, j(t)$$

$$\frac{dr}{dt} = \gamma\, j(t)$$

N: number of individuals; s = S/N, j = I/N, r = R/N; β: infection/contact rate; γ: immunity/recovery rate
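The recap equations can be integrated with a minimal forward-Euler sketch, written here in Python. In MATLAB, an adaptive solver such as `ode45` would be the usual choice. The parameter values, step size and initial infected fraction `j0` are illustrative assumptions.

```python
def simulate_sir(beta, gamma, j0=0.01, t_max=100.0, dt=0.01):
    """Forward-Euler integration of the normalized Kermack-McKendrick model.

    s, j, r are the susceptible, infected and recovered fractions, so
    s + j + r = 1 throughout. Returns a list of (t, s, j, r) tuples.
    """
    s, j, r = 1.0 - j0, j0, 0.0
    t = 0.0
    trajectory = [(t, s, j, r)]
    for _ in range(int(round(t_max / dt))):
        new_infections = beta * j * s * dt   # flow S -> I
        recoveries = gamma * j * dt          # flow I -> R
        s -= new_infections
        j += new_infections - recoveries
        r += recoveries
        t += dt
        trajectory.append((t, s, j, r))
    return trajectory


traj = simulate_sir(beta=0.5, gamma=0.1)     # illustrative parameters
peak_t, _, peak_j, _ = max(traj, key=lambda row: row[2])
print(f"peak infected fraction {peak_j:.3f} at t = {peak_t:.1f}")
print(f"recovered fraction at the end: {traj[-1][3]:.3f}")
```

Running `simulate_sir(beta=0.05, gamma=0.1)` instead gives an infected fraction that only decreases from t = 0, which is the β ≤ γ die-out case.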

Page 9

Recap: Kermack-McKendrick model

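$$\frac{ds}{dt} = -\beta\, j(t)\, s(t)$$

$$\frac{dj}{dt} = \beta\, j(t)\, s(t) - \gamma\, j(t)$$

$$\frac{dr}{dt} = \gamma\, j(t)$$

N: number of individuals; s = S/N, j = I/N, r = R/N

The disease will die out if:

$$\frac{dj}{dt} = j(t)\,\big(\beta\, s(t) - \gamma\big) \le 0 \;\Leftrightarrow\; \beta\, s(t) \le \gamma$$

Since s(t) ≤ 1, this holds at all times whenever β ≤ γ.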

Page 10

Recap: Reproductive number

$$R_0 = \frac{\beta}{\gamma}$$

$$R_0 > 1 \;\Rightarrow\; \text{infection invades the population}$$

Page 11

Approximating spreading on a network

§  Assume that each of the k neighbors is equally likely to be infected, i.e. infected with probability j.

§  Probability that a node with k neighbors becomes infected in time interval dt:

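$$1 - (1 - \beta\, dt)^{k j} \simeq \beta\, k\, j\, dt$$

Here β dt is the probability that contact occurs with a single infected neighbor, k j is the expected number of infected neighbors, and the right-hand side is the leading order of the Taylor expansion when β dt ≪ 1.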
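The quality of the leading-order approximation can be checked numerically. This sketch assumes that each of the k j infected neighbours transmits independently with probability β dt per interval. The values of β, k and j are hypothetical.

```python
def exact_infection_prob(beta, dt, k, j):
    """Probability that at least one of the k*j infected neighbours transmits,
    each independently with probability beta*dt."""
    return 1.0 - (1.0 - beta * dt) ** (k * j)


def linear_infection_prob(beta, dt, k, j):
    """Leading-order Taylor approximation, valid for beta*dt << 1."""
    return beta * k * j * dt


beta, k, j = 0.5, 10, 0.2   # hypothetical values: k*j = 2 infected neighbours
for dt in (0.1, 0.01, 0.001):
    exact = exact_infection_prob(beta, dt, k, j)
    approx = linear_infection_prob(beta, dt, k, j)
    print(f"dt={dt:<6} exact={exact:.6f} approx={approx:.6f} "
          f"rel. error={(approx - exact) / exact:.2%}")
```

For these values, the relative error is roughly β dt / 2, so it shrinks linearly with dt.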

Page 12

Approximating spreading on a network

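$$\frac{ds}{dt} = -\beta \langle k\rangle\, j(t)\, s(t)$$

$$\frac{dj}{dt} = \beta \langle k\rangle\, j(t)\, s(t) - \gamma\, j(t)$$

$$\frac{dr}{dt} = \gamma\, j(t)$$

N: number of individuals; s = S/N, j = I/N, r = R/N; β: contact rate per link; ⟨k⟩: average degree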

Page 13

Approximating spreading on a network

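$$\frac{ds}{dt} = -\beta \langle k\rangle\, j(t)\, s(t)$$

$$\frac{dj}{dt} = \beta \langle k\rangle\, j(t)\, s(t) - \gamma\, j(t)$$

$$\frac{dr}{dt} = \gamma\, j(t)$$

N: number of individuals; s = S/N, j = I/N, r = R/N

§  The disease invades the population more easily for larger ⟨k⟩

$$R_0 = \frac{\beta \langle k\rangle}{\gamma}$$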
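This is a minimal sketch of the threshold R0 = β⟨k⟩/γ. It assumes that ⟨k⟩ is estimated as the mean of an observed degree sequence. The critical per-link rate β_c = γ/⟨k⟩ follows from setting R0 = 1. Both degree sequences and the rates are hypothetical.

```python
def mean_degree(degrees):
    """Average degree <k> of a degree sequence."""
    return sum(degrees) / len(degrees)


def r0_homogeneous(beta, gamma, degrees):
    """R0 = beta * <k> / gamma (homogeneous mean-field approximation)."""
    return beta * mean_degree(degrees) / gamma


def critical_beta(gamma, degrees):
    """Per-link contact rate at which R0 = 1, i.e. beta_c = gamma / <k>."""
    return gamma / mean_degree(degrees)


gamma = 0.5
sparse, dense = [2, 2, 3, 1], [8, 9, 7, 8]   # hypothetical degree sequences
for name, degs in (("sparse", sparse), ("dense", dense)):
    print(name, "<k> =", mean_degree(degs),
          "R0 =", r0_homogeneous(0.1, gamma, degs),
          "beta_c =", critical_beta(gamma, degs))
```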

Page 14

Super-spreaders: Degree heterogeneity

§  What happens when the degree distribution is heterogeneous (e.g. scale-free)?

Figure: Swedish survey of sexual behaviour (1996). Source: Liljeros et al. (2001)

Page 15

Super-spreaders: Degree heterogeneity

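$$\frac{ds_k}{dt} = -\beta k\, s_k(t)\, \Theta_k(t)$$

$$\frac{dj_k}{dt} = \beta k\, s_k(t)\, \Theta_k(t) - \gamma\, j_k(t)$$

$$\frac{dr_k}{dt} = \gamma\, j_k(t)$$

N: number of individuals; s = S/N, j = I/N, r = R/N (the subscript k refers to individuals of degree k)

Θ_k(t): density of infected neighbors around a node with degree k, here taken to be the same for every degree class (as in a network without degree correlations):

$$\Theta_k(t) = \Theta(t)$$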

§  Write down a compartmental model that explicitly tracks the state and degree of individuals
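The degree-resolved model can be integrated in the same way. The slides only state Θ_k(t) = Θ(t). To close the system, this sketch assumes the closure commonly used for SIR on uncorrelated networks:

Θ(t) = Σ_k (k − 1) p(k) j_k(t) / ⟨k⟩

Here k − 1 discounts the link back to the node that transmitted the infection. The two-class degree distribution and the rates are hypothetical.

```python
def simulate_hmf_sir(pk, beta, gamma, j0=1e-3, t_max=200.0, dt=0.01):
    """Forward-Euler integration of the degree-based (heterogeneous mean-field)
    SIR model. pk maps degree k -> fraction p(k) of nodes with that degree.

    Assumed closure for an uncorrelated network:
        Theta(t) = sum_k (k - 1) p(k) j_k(t) / <k>
    """
    ks = sorted(pk)
    mean_k = sum(k * pk[k] for k in ks)
    s = {k: 1.0 - j0 for k in ks}
    j = {k: j0 for k in ks}
    r = {k: 0.0 for k in ks}
    for _ in range(int(round(t_max / dt))):
        # Theta is evaluated once per step, before any degree class is updated
        theta = sum((k - 1) * pk[k] * j[k] for k in ks) / mean_k
        for k in ks:
            new_inf = beta * k * s[k] * theta * dt
            new_rec = gamma * j[k] * dt
            s[k] -= new_inf
            j[k] += new_inf - new_rec
            r[k] += new_rec
    return s, j, r


pk = {2: 0.9, 20: 0.1}          # hypothetical: many low-degree nodes, a few hubs
s, j, r = simulate_hmf_sir(pk, beta=0.05, gamma=0.1)
for k in sorted(pk):
    print(f"k={k:>2}: ever infected = {1.0 - s[k]:.3f}")
```

The force of infection on a node is proportional to k, so the hubs run out of susceptibles first. This is the mechanism behind the super-spreader slides.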

Page 16

Super-spreaders: Degree heterogeneity

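§  Higher heterogeneity makes epidemics more likely

§  In scale-free networks with a very broad degree distribution, epidemics are unavoidable!

For a heterogeneous degree distribution,

$$R_0 = \frac{\beta \langle k\rangle}{\gamma}$$

becomes

$$R_0 = \frac{\beta}{\gamma}\,\frac{\langle k^2\rangle - \langle k\rangle}{\langle k\rangle}$$

For derivation see Pastor-Satorras et al. (2001) and Barrat et al. (2008)

$$p(k) = C k^{-\alpha} \text{ with } \alpha \le 3 \;\Rightarrow\; \langle k^2\rangle \to \infty \text{ as } N \to \infty$$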
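Both thresholds can be evaluated directly from a degree distribution. This sketch assumes a truncated power law p(k) = C k^(−α) on k = 1 to k_max, with α = 2.5 and β/γ = 0.1, all hypothetical. Raising k_max stands in for growing N.

```python
def degree_moments(pk):
    """First and second moments <k>, <k^2> of a degree distribution pk: k -> p(k)."""
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return k1, k2


def r0_heterogeneous(beta, gamma, pk):
    """R0 = (beta / gamma) * (<k^2> - <k>) / <k>."""
    k1, k2 = degree_moments(pk)
    return (beta / gamma) * (k2 - k1) / k1


def truncated_power_law(alpha, k_min, k_max):
    """p(k) = C k^(-alpha) on k_min..k_max, with C fixed by normalisation."""
    weights = {k: k ** (-alpha) for k in range(k_min, k_max + 1)}
    c = 1.0 / sum(weights.values())
    return {k: c * w for k, w in weights.items()}


# As the largest possible degree grows, <k^2> keeps growing for alpha <= 3,
# so R0 increases without bound even though beta / gamma stays fixed.
for k_max in (10, 100, 1000, 10000):
    pk = truncated_power_law(alpha=2.5, k_min=1, k_max=k_max)
    k1, k2 = degree_moments(pk)
    print(f"k_max={k_max:>5}: <k>={k1:.2f} <k^2>={k2:.1f} "
          f"R0={r0_heterogeneous(0.1, 1.0, pk):.2f}")
```

For a regular network in which every degree equals k, the formula reduces to β(k − 1)/γ rather than βk/γ. The −⟨k⟩ term discounts the link back to the infector.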

Page 17

High-degree nodes are more exposed

we expect a set of nominated friends to get infected earlier than a set of randomly chosen individuals (who represent the population as a whole). More specifically, a random sample of individuals from a social network will have a mean degree of m (the mean degree for the population); but the friends of these random individuals will have a mean degree of m plus a quantity defined by the variance of the degree distribution divided by m. Hence, when there is variance in degree in a population, and especially when there is high variance, the mean number of contacts for the friends will be greater (and potentially much greater) than the mean for the random sample. This is sometimes known as the “friendship paradox” (“your friends have more friends than you do”) [15–19].
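The statement that friends have a mean degree of m + σ²/m can be verified on any small graph. Pooling the degrees of all friends of all individuals counts each node once per friendship, which gives ⟨k²⟩/⟨k⟩ = m + σ²/m. The star network below is a hypothetical example.

```python
def friendship_paradox(adj):
    """Return (m, variance, mean degree of friends) for an undirected graph.

    Friend degrees are pooled over every friend of every node, so each edge
    is counted once from each end."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    m = sum(degrees) / n
    var = sum((k - m) ** 2 for k in degrees) / n
    friend_degrees = [len(adj[u]) for v in adj for u in adj[v]]
    friend_mean = sum(friend_degrees) / len(friend_degrees)
    return m, var, friend_mean


# Hypothetical star: node 0 has four friends, each of whom only knows node 0.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
m, var, friend_mean = friendship_paradox(star)
print(m, var, friend_mean, m + var / m)
```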

While the idea of immunizing such friends of randomly chosen people has previously been explored in a stimulating theoretical paper [12], to our knowledge, a method that uses nominated friends as sensors for early detection of an outbreak has not previously been proposed, nor has it been tested on any sort of real outbreak. To evaluate the effectiveness of nominated friends as social network sensors, we therefore monitored the spread of flu at Harvard College from September 1 to December 31, 2009. In the fall of 2009, both seasonal flu (which typically kills 41,000 Americans each year [20]) and the H1N1 strain were prevalent in the US, though the great majority of cases in 2009 have been attributed to the latter.[1] It is estimated that this H1N1 epidemic, which began roughly in April 2009, infected over 50 million Americans. Unlike seasonal flu, which typically affects individuals older than 65, H1N1 tends to affect young people. Nationally, according to the CDC, the epidemic peaked in late October 2009, and vaccination only became widely available in December 2009. Whether another outbreak of H1N1 will occur (for example, in areas and populations that have heretofore been spared) is a

Figure 1. Network Illustrating Structural Parameters. This real network of 105 students shows variation in structural attributes and topological position. Each circle represents a person and each line represents a friendship tie. Nodes A and B have different “degree,” a measure that indicates the number of ties. Nodes with higher degree also tend to exhibit higher “centrality” (node A with six friends is more central than B and C who both only have four friends). If contagions infect people at random at the beginning of an epidemic, central individuals are likely to be infected sooner because they lie a shorter number of steps (on average) from all other individuals in the network. Finally, although nodes B and C have the same degree, they differ in “transitivity” (the probability that any two of one’s friends are friends with each other). Node B exhibits high transitivity with many friends that know one another. In contrast, node C’s friends are not connected to one another and therefore they offer more independent possibilities for becoming infected earlier in the epidemic. doi:10.1371/journal.pone.0012948.g001

Figure 2. Theoretical expectations of differences in contagion between central individuals and the population as a whole. A contagious process passes through two phases, one in which the number of infected individuals exponentially increases as the contagion spreads, and one in which incidence exponentially decreases as susceptible individuals become increasingly scarce. These dynamics can be modeled by a logistic function. Central individuals lie on more paths in a network compared to the average person in a population and are therefore more likely to be infected early by a contagion that randomly infects some individuals and then spreads from person to person within the network. This shifts the S-shaped logistic cumulative incidence function forward in time for central individuals compared to the population as a whole (left panel). It also shifts the peak infection rate forward (right panel). doi:10.1371/journal.pone.0012948.g002

Social Network Sensors

PLoS ONE | www.plosone.org 2 September 2010 | Volume 5 | Issue 9 | e12948

Source: Christakis et al. (2010)

Page 18

Exploiting structure for disease control

§  Early warning for epidemic spreading

§  Is there a discrepancy between infection levels in high-degree nodes and low-degree nodes?

Page 19

Exploiting structure for disease control

§  Early warning for epidemic spreading

§  Is there a discrepancy between infection levels in high-degree nodes and low-degree nodes?

§  Answering this directly requires a lot of information about the network structure

§  Smart local solution based on the way you sample individuals in a network

§  Clue: your friends have more friends than you
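The local solution can be sketched as a sampling procedure that only looks at the friend lists of the sampled people. This sketch assumes that each sampled person nominates one friend uniformly at random; Christakis et al. let respondents name up to three. The hub-and-spoke network and the function names are hypothetical.

```python
import random


def sample_groups(adj, n_samples, seed=0):
    """Local sampling: draw random individuals and let each one nominate a
    random friend. Only the sampled people's own friend lists are needed."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    random_group, friend_group = [], []
    for _ in range(n_samples):
        person = rng.choice(nodes)
        random_group.append(person)
        if adj[person]:
            friend_group.append(rng.choice(sorted(adj[person])))
    return random_group, friend_group


def mean_degree_of(adj, group):
    """Average number of friends among the members of a sampled group."""
    return sum(len(adj[v]) for v in group) / len(group)


# Hypothetical hub-and-spoke network: one hub with 20 friends who know only the hub.
hub = {0: set(range(1, 21))}
hub.update({v: {0} for v in range(1, 21)})
rand_grp, friend_grp = sample_groups(hub, n_samples=1000)
print("random group mean degree:", mean_degree_of(hub, rand_grp))
print("friend group mean degree:", mean_degree_of(hub, friend_grp))
```

The friend group is dominated by the hub, so monitoring it would reveal infections in high-degree nodes without ever mapping the whole network.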


Page 20

Exploiting structure for disease control


and the same person was frequently nominated several times. Hence, our data collection procedures wound up yielding information about 1,789 unique, inter-connected students who were either surveyed or were identified as friends by those who took part in the study. A connected component of 714 people was in turn apparent within these 1,789 individuals. We illustrate the spread of flu in this component in Figure 4, which shows the tendency of the flu to “bloom” in more central nodes of the network, and also in a 122-frame movie of daily flu prevalence available online (see Supporting Information Video S1).

Sampling a densely interconnected population also allowed us to actually measure egocentric network properties like in-degree (number of times a subject was nominated as a friend), betweenness centrality (the number of shortest paths in the network that pass through an individual), coreness (the number of friends an individual has when all individuals with fewer friends are iteratively removed from the network), and transitivity (the probability that two of one’s friends are friends with one another). This would not be possible in a deployment of the friends’ technique in larger populations (wherein surveyed individuals would be much less likely to actually be connected to each other). The results showed that, as expected, the friend group differed significantly from the random group for all these measures, exhibiting higher in-degree (Mann Whitney U test p < 0.001), higher centrality (p < 0.001), higher k-coreness (p < 0.001), and lower transitivity (p = 0.039).

We hypothesized that each of these measures could help to identify groups that could be used as social network sensors when full network information is, indeed, available (see Figure 5). For example, we expect in-degree to be associated with early contagion because more friends means more paths to others in the network who might be infected. NLS estimates suggest that each additional nomination shifts the flu curve left by 5.7 days (95% C.I. 3.6–8.1) for flu diagnoses by medical staff and 8.0 days (95% C.I. 7.3–8.5) for self-reported symptoms. On the other hand, the same is not true for out-degree (the number of friends a person names). Pertinently, this is the only quantity that would be straightforwardly ascertainable by asking respondents about themselves. However, there is low variance in this measure in the present setting since most people named three friends (the maximum allowed by our survey).

We also expect betweenness centrality to be associated with early contagion. NLS estimates suggest that individuals with maximum observed centrality shift the flu curve left by 16.5 days (95% C.I. 1.9–28.3) for flu diagnoses by medical staff and 22.9 days (95% C.I. 20.0–27.2) for self-reported symptoms, relative to those with minimum centrality. A related measure, k-coreness, also suggests that people at the center of the network get the flu earlier. NLS estimates suggest that increasing the measure k by one (the range is from 0 to 3) shifts the flu curve left by 4.3 days (95% C.I. 1.8–6.5) for flu diagnoses by medical staff and 7.5 days (95% C.I. 6.8–8.2) for self-reported symptoms. Moreover, both betweenness centrality and k-coreness remain significant even when controlling for both in-degree and out-degree, suggesting that it is not just the number of friends that is important with respect to flu risk, but also the number of friends of friends, friends of friends of friends, and so on [6].

Finally, we expect transitivity to be negatively associated with early contagion. People with high transitivity may be poorly connected to the rest of the network because their friends tend to

Figure 3. Empirical differences in flu contagion between “friend” group and randomly chosen individuals. We compared two groups, one composed of individuals randomly selected from our population, and one composed of individuals who were nominated as a friend by members of the random group. The friend group was observed to have significantly higher measured in-degree and betweenness centrality than the random group (see Supporting Information Text S1). In the left panel, a nonparametric maximum likelihood estimate (NPMLE) of cumulative flu incidence (based on diagnoses by medical staff) shows that individuals in the friend group tended to get the flu earlier than individuals in the random group. Moreover, predicted daily incidence from a nonlinear least squares fit of the data to a logistic distribution function suggests that the peak incidence of flu is shifted forward in time for the friends group by 13.9 days (right panel). A significant (p < 0.05) lead time for the friend group was first detected with data available up to Day 16. Raw data for daily flu cases in the friend group (blue) and random group (red) is shown in the inset box (right panel). doi:10.1371/journal.pone.0012948.g003


Source: Christakis et al. (2010)

Social sensors for early warning

Page 21

Exploiting structure for disease control

§  Which nodes would you immunize to stop the spread of disease most effectively?

Page 22

Exploiting structure for disease control

§  Which nodes would you immunize to stop the spread of disease most effectively?

§  High-degree nodes

§  The friends of randomly chosen individuals
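The two candidate strategies can be compared in a rough simulation. This sketch assumes that immunized nodes are simply removed from the network. The size of the largest remaining connected component then serves as a proxy for how far an outbreak could reach. The preferential-attachment network, the 20% budget and all names are illustrative assumptions. Random immunization is included as a baseline.

```python
import random
from collections import deque


def barabasi_albert(n, m, seed=0):
    """Preferential-attachment graph: each new node links to m existing nodes
    chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    targets = list(range(m))
    repeated = []                      # node v appears deg(v) times
    for v in range(m, n):
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
        repeated.extend(targets)
        repeated.extend([v] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best


def random_immunization(adj, budget, rng):
    return set(rng.sample(sorted(adj), budget))


def acquaintance_immunization(adj, budget, rng):
    """Repeatedly immunize a random friend of a randomly chosen individual."""
    nodes, chosen = sorted(adj), set()
    while len(chosen) < budget:
        person = rng.choice(nodes)
        if adj[person]:
            chosen.add(rng.choice(sorted(adj[person])))
    return chosen


def degree_immunization(adj, budget):
    """Immunize the highest-degree nodes (needs global knowledge of the network)."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:budget])


net = barabasi_albert(2000, 2, seed=1)
budget = len(net) // 5                  # immunize 20% of the population
rng = random.Random(2)
results = {
    "random": largest_component(net, random_immunization(net, budget, rng)),
    "friends of random people": largest_component(net, acquaintance_immunization(net, budget, rng)),
    "highest degree": largest_component(net, degree_immunization(net, budget)),
}
for name, size in results.items():
    print(f"{name:>25}: largest component {size} of {len(net)}")
```

Degree-targeted immunization needs the full degree sequence. The friends-of-random-people strategy only needs each sampled person to name a contact, yet it still tends to catch the hubs.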


Page 23

II. Models of social spreading


Can your friends make you fat?

The Spread of Obesity in a Large Social Network Over 32 Years

n engl j med 357;4 www.nejm.org july 26, 2007 373

educational level; the ego’s obesity status at the previous time point (t); and most pertinent, the alter’s obesity status at times t and t + 1.[25] We used generalized estimating equations to account for multiple observations of the same ego across examinations and across ego–alter pairs.[26] We assumed an independent working correlation structure for the clusters.[26,27]

The use of a time-lagged dependent variable (lagged to the previous examination) eliminated serial correlation in the errors (evaluated with a Lagrange multiplier test[28]) and also substantially controlled for the ego’s genetic endowment and any intrinsic, stable predisposition to obesity. The use of a lagged independent variable for an alter’s weight status controlled for homophily.[25] The key variable of interest was an alter’s obesity at time t + 1. A significant coefficient for this variable would suggest either that an alter’s weight affected an ego’s weight or that an ego and an alter experienced contemporaneous events affecting both their weights. We estimated these models in varied ego–alter pair types.

To evaluate the possibility that omitted variables or unobserved events might explain the associations, we examined how the type or direction of the social relationship between the ego and the alter affected the association between the ego’s obesity and the alter’s obesity. For example, if unobserved factors drove the association between the ego’s obesity and the alter’s obesity, then the directionality of friendship should not have been relevant.

We evaluated the role of a possible spread in smoking-cessation behavior as a contributor to the spread of obesity by adding variables for the smoking status of egos and alters at times t and t + 1 to the foregoing models. We also analyzed the role of geographic distance between egos and alters by adding such a variable.

We calculated 95% confidence intervals by sim-ulating the first difference in the alter’s contem-

Figure 1. Largest Connected Subcomponent of the Social Network in the Framingham Heart Study in the Year 2000.

Each circle (node) represents one person in the data set. There are 2200 persons in this subcomponent of the social network. Circles with red borders denote women, and circles with blue borders denote men. The size of each circle is proportional to the person’s body-mass index. The interior color of the circles indicates the person’s obesity status: yellow denotes an obese person (body-mass index, ≥30) and green denotes a nonobese person. The colors of the ties between the nodes indicate the relationship between them: purple denotes a friendship or marital tie and orange denotes a familial tie.

The New England Journal of Medicine Downloaded from nejm.org at ETH ZUERICH on November 18, 2013. For personal use only. No other uses without permission.

Copyright © 2007 Massachusetts Medical Society. All rights reserved.

Source: Christakis et al. (2007)

Page 24

Models of social spreading


Source: Christakis et al. (2007)


poraneous obesity (changing from 0 to 1), using 1000 randomly drawn sets of estimates from the coefficient covariance matrix and assuming mean values for all other variables.29 All tests were two-tailed. The sensitivity of the results was as-sessed with multiple additional analyses (see the Supplementary Appendix).

Results

Figure 1 depicts the largest connected subcomponent of the social network in the year 2000. This network is sufficiently dense to obscure much of the underlying structure, although regions of the network with clusters of obese or nonobese persons can be seen. Figure 2 illustrates the spread of obesity between adjoining nodes in a part of the network over time. A video (available with the full text of this article at www.nejm.org) depicts the evolution of the largest component of the network and shows the progress of the obesity epidemic over the 32-year study period.

Figure 3A characterizes clusters within the entire network more formally. To quantify these clusters, we compared the whole observed network with simulated networks with the same network topology and the same overall prevalence of obesity as the observed network, but with the incidence of obesity randomly distributed among the nodes (in what we call “random body-mass–index networks”). If clustering is occurring, then the probability that an alter will be obese, given that an ego is known to be obese, should be higher in the observed network than in the random body-mass–index networks. What we call the “reach” of the clusters is the point, in terms of an alter’s degree of separation from any given ego, at which the probability of an alter’s obesity is no longer related to whether the ego is obese. In all of the examinations (from 1971 through 2003), the risk of obesity among alters who were connected to an obese ego (at one degree of separation) was about 45% higher in the observed network than in a random network. The

(Figure 3 plot area: Panels A and B, x-axis categories 1–6, y-axis ticks 0–100; one series per examination, Examinations 1–7.)

Figure 3. Effect of Social and Geographic Distance from Obese Alters on the Probability of an Ego’s Obesity in the Social Network of the Framingham Heart Study.

Panel A shows the mean effect of an ego’s social proximity to an obese alter; this effect is derived by comparing the conditional probability of obesity in the observed network with the probability of obesity in identical networks (with topology preserved) in which the same number of obese persons is randomly distributed. The social distance between the alter and the ego is represented by degrees of separation (1 denotes one degree of separation from the ego, 2 denotes two degrees of separation from the ego, and so forth). The examination took place at seven time points. Panel B shows the mean effect of an ego’s geographic proximity to an obese alter. We ranked all geographic distances (derived from geocoding) between the homes of directly connected egos and alters (i.e., those pairs at one degree of separation) and created six groups of equal size. This figure shows the effects observed for the six mileage groups (based on their average distance): 1 denotes 0 miles (i.e., closest to the alter’s home), 2 denotes 0.26 mile, 3 denotes 1.5 miles, 4 denotes 3.4 miles, 5 denotes 9.3 miles, and 6 denotes 471 miles (i.e., farthest from the alter’s home). There is no trend in geographic distance. I bars for both panels show 95% confidence intervals based on 1000 simulations. To convert miles to kilometers, multiply by 1.6.
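The comparison with “random body-mass–index networks” can be sketched as a permutation test. The procedure keeps the topology, shuffles which nodes carry the trait, and recomputes P(alter obese | ego obese). This toy version only looks at one degree of separation. The two-clique network and the obesity labels are hypothetical, not Framingham data.

```python
import random


def p_alter_given_ego(adj, obese):
    """P(alter obese | ego obese), pooled over all directed ego -> alter ties."""
    pairs = [(ego, alter) for ego in adj if ego in obese for alter in adj[ego]]
    return sum(alter in obese for _, alter in pairs) / len(pairs)


def random_label_baseline(adj, obese, n_sims=1000, seed=0):
    """Mean of the same statistic over networks with the topology preserved and
    the same number of obese persons placed uniformly at random."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    total = 0.0
    for _ in range(n_sims):
        shuffled = set(rng.sample(nodes, len(obese)))
        total += p_alter_given_ego(adj, shuffled)
    return total / n_sims


# Hypothetical toy network: two 5-person cliques joined by the tie 4-5,
# with obesity concentrated in the first clique.
adj = {v: set() for v in range(10)}
for group in (range(0, 5), range(5, 10)):
    for a in group:
        for b in group:
            if a != b:
                adj[a].add(b)
adj[4].add(5)
adj[5].add(4)
obese = {0, 1, 2, 3, 4}

observed = p_alter_given_ego(adj, obese)
baseline = random_label_baseline(adj, obese)
print(f"observed {observed:.3f}, random-label baseline {baseline:.3f}, "
      f"relative increase {observed / baseline - 1:.0%}")
```

Repeating the calculation for alters at two, three or more steps away would trace out the “reach” of the clusters in the same way.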


Page 25

When is it social influence?


Page 26

Is it really influence?

Observation: You are more likely to be fat if you have fat friends.

§  Three competing hypotheses:

§  Social influence

§  Homophily

§  Covariation of another variable

Page 27

Is it really influence?

Observation: You are more likely to be fat if you have fat friends.

§  Three competing hypotheses:

§  Social influence: Behavior spreads from one friend to another. You like McDonald's and because of this I start liking it too.

§  Homophily

§  Covariation of another variable

Page 28

Is it really influence?

Observation: You are more likely to be fat if you have fat friends.

§  Three competing hypotheses:

§  Social influence

§  Homophily: Similar people are more likely to be friends. We both like McDonald's so we're more likely to meet or like each other.

§  Covariation of another variable

Page 29

Is it really influence?

Observation: You are more likely to be fat if you have fat friends.

§  Three competing hypotheses:

§  Social influence

§  Homophily

§  Covariation of another variable: We are friends because we live in the same neighborhood and there are many McDonald's restaurants there.

Page 30

Is it really influence?

Observation: You are more likely to be fat if you have fat friends.

§  Three competing hypotheses:

§  Social influence: Behavior spreads from one friend to another. You like McDonald's and I like to eat with you.

§  Homophily: Similar people are more likely to be friends. We both like McDonald's so I think you're cool.

§  Covariation of another variable: We are friends because we live close and there are many McDonald's restaurants in our neighborhood.

Without a controlled experiment, it is impossible to distinguish these hypotheses from observational data alone (see Shalizi & Thomas, 2011).
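Why observation alone cannot separate the hypotheses can be illustrated with a toy simulation in which there is no influence at all: traits are drawn once and never change, and only tie formation depends on similarity (homophily). This is a hypothetical sketch; the network size and the tie probabilities `p_same` and `p_diff` are illustrative assumptions.

```python
import random


def homophily_only_network(n, p_same, p_diff, seed=0):
    """Random network where ties form more often between similar people.

    Traits are drawn once and never updated, so there is no social influence.
    """
    rng = random.Random(seed)
    trait = [rng.random() < 0.5 for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_same if trait[i] == trait[j] else p_diff
            if rng.random() < p:
                edges.append((i, j))
    return trait, edges


def p_ego_given_alter(trait, edges):
    """P(ego has trait | alter has trait), counting each tie in both directions."""
    hits = total = 0
    for a, b in edges:
        for ego, alter in ((a, b), (b, a)):
            if trait[alter]:
                total += 1
                hits += trait[ego]
    return hits / total


trait, edges = homophily_only_network(200, p_same=0.1, p_diff=0.01)
base_rate = sum(trait) / len(trait)
cond = p_ego_given_alter(trait, edges)
print(f"base rate {base_rate:.2f}, P(ego | alter) {cond:.2f}")
```

The friend-to-friend correlation comes out far above the base rate even though no one's trait ever changed, which is exactly the pattern that social influence would also produce.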

Page 31

Adoption and behavioral contagion

§  Infection (adoption) could require multiple infected neighbors

§  This could be due to peer pressure, learning from others, or synergies of adopting together

Page 32

Adoption and behavioral contagion

§  Threshold model:

§  k: number of neighbors

§  m: number of infected neighbors

§  ϕ: threshold

§  Infection occurs if m/k ≥ ϕ
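The threshold rule can be sketched as a deterministic simulation on an adjacency list. This is a minimal sketch, assuming synchronous updates, one threshold ϕ shared by all nodes, and irreversible adoption; `threshold_cascade` is a hypothetical helper, not course code.

```python
def threshold_cascade(adj, seeds, phi):
    """Deterministic threshold model with synchronous updates.

    adj   : dict mapping each node to a list of its neighbours (undirected graph)
    seeds : initially infected (adopting) nodes
    phi   : threshold; a node with k neighbours, m of them infected, adopts if m / k >= phi
    Adoption is irreversible. Returns the final set of infected nodes.
    """
    infected = set(seeds)
    while True:
        # Decide all new adopters from the current state, then apply them together.
        new = {
            node
            for node, nbrs in adj.items()
            if node not in infected
            and nbrs
            and sum(v in infected for v in nbrs) / len(nbrs) >= phi
        }
        if not new:
            return infected
        infected |= new


# Ring of 10 nodes (k = 2 for everyone), one seed.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
print(len(threshold_cascade(ring, [0], phi=0.5)))  # 10: m/k = 1/2 meets the threshold
print(len(threshold_cascade(ring, [0], phi=0.6)))  # 1: 1/2 < 0.6, nothing spreads
```

Because m/k is a fraction, the same single infected neighbor that suffices on the ring would not suffice for a node with many neighbors, which is the mechanism behind the next slide.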

Page 33

Adoption and behavioral contagion

§  Too much connectivity can stop global spreading

Source: Watts (2002)

explicitly excluding the percolating cluster (when it exists) from the sum $\sum_n q_n x^n$. Using Eq. 3b, it follows that $S_v = 1 - H_0(1) = P - G_0(H_1(1))$, where $H_1(1)$ satisfies Eq. 3a. Outside the cascade window, the only solution to Eq. 3a is $H_1(1) = 1$, which yields $S_v = 0$ (and therefore no cascades) as expected. But inside the cascade window, where the percolating vulnerable cluster exists, Eq. 3a has an additional solution that corresponds to a non-zero value of $S_v$. In the special case of a uniform random graph with homogeneous thresholds, we obtain $S_v = Q(K^*+1, z) - e^{z(H_1-1)}\,Q(K^*+1, zH_1)$, in which $H_1$ satisfies $H_1 = 1 - Q(K^*, z) + e^{z(H_1-1)}\,Q(K^*, zH_1)$. We contrast this expression with that for the size of the entire connected component of the graph, $S = 1 - e^{-zS}$ (32), which is equivalent to allowing $K^* \to \infty$ (or $\phi^* \to 0$). In Fig. 2b we show the exact solutions for both $S_v$ (long-dashed line) and $S$ (solid line) for the case of $\phi^* = 0.18$, and compare these quantities with the frequency and size of global cascades observed in the full dynamical simulation of 10,000 nodes averaged over 1,000 random realizations of the network and the initial condition. (The corresponding numerical values for $S_v$ and $S$ are indistinguishable from the analytical curves, except near the upper boundary of the window.)
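For the uniform random (Poisson) graph, the $S_v$ and $S$ equations above can be solved numerically by fixed-point iteration. A minimal sketch, assuming $Q(a, y)$ is the regularized upper incomplete gamma function (for integer $a$ it equals the probability that a Poisson($y$) variable is at most $a - 1$) and $K^* = \lfloor 1/\phi^* \rfloor$, the largest degree at which a single active neighbor tips a node; the function names are hypothetical.

```python
import math


def Q(a, y):
    """Regularized upper incomplete gamma for integer a >= 1.

    Equals P(Poisson(y) <= a - 1) = exp(-y) * sum_{j < a} y**j / j!.
    """
    term, total = math.exp(-y), 0.0
    for j in range(a):
        total += term
        term *= y / (j + 1)
    return total


def vulnerable_cluster_size(z, phi, iters=10_000):
    """S_v for a Poisson graph with mean degree z and uniform threshold phi."""
    k_star = math.floor(1 / phi)  # largest degree that one active neighbour can tip
    h = 0.0
    # Iterating from 0 converges to the smallest root of the H_1 equation,
    # which is H_1 = 1 (hence S_v = 0) outside the cascade window.
    for _ in range(iters):
        h = 1 - Q(k_star, z) + math.exp(z * (h - 1)) * Q(k_star, z * h)
    return Q(k_star + 1, z) - math.exp(z * (h - 1)) * Q(k_star + 1, z * h)


def giant_component_size(z, iters=10_000):
    """S solving S = 1 - exp(-z S); starting from 1 selects the non-zero root."""
    s = 1.0
    for _ in range(iters):
        s = 1 - math.exp(-z * s)
    return s


for z in (4, 10):
    print(z, round(vulnerable_cluster_size(z, 0.18), 3), round(giant_component_size(z), 3))
```

With $\phi^* = 0.18$ this should give a clearly non-zero $S_v$ at $z = 4$ and $S_v \approx 0$ at $z = 10$, where the graph is too well connected for single active neighbors to matter.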

The frequency of global cascades (open circles), that is, cascades that are "successful", is obviously related to the size of the vulnerable component: the larger $S_v$ is, the more likely a randomly chosen initial site is to be a part of it. In particular, if $S_v$ does not percolate, then global cascades are impossible. Fig. 2b clearly supports this intuition, but it is equally clear that, within the cascade window, $S_v$ seriously underestimates the likelihood of a global cascade. The reason is that, according to our original decision rule, an individual's choice of state depends only on the states of its neighbors; hence, even stable vertices, although they do not participate in the initial stages of a global cascade, can still trigger them as long as they are directly adjacent to the vulnerable cluster. The true likelihood of a global cascade is therefore determined by the size of what we call the extended vulnerable cluster $S_e$, consisting of the vulnerable cluster itself, and any stable vertices immediately adjacent to it. We have not solved for $S_e$ exactly (although this may be possible), but it is relatively simple to determine numerically, and as the corresponding (dotted) curve in Fig. 2b demonstrates, the average value of $S_e$ is an excellent approximation to the observed frequency of global cascades.

The average size of global cascades (solid circles) is clearly not governed either by the size of the vulnerable cluster $S_v$, or by $S_e$, but by $S$, the connectivity of the network as a whole. This is a surprising result, the reason for which is not entirely clear, but a plausible explanation is as follows. If a global cascade is triggered by an initially small seed striking the extended vulnerable cluster, it is guaranteed to occupy the entire vulnerable cluster, and therefore a finite fraction of even an infinite network. At this stage, the small-seed condition no longer holds, and so nodes that are still in the off state can now have multiple (early-adopting) neighbors in the on state. Hence, even individuals that were originally classified as stable (the early and late majority) can now be toppled, allowing the cascade to occupy not just the vulnerable component that allowed the cascade to spread initially, but the entire connected component of the graph. That the activation of a percolating

Fig. 1. Cascade windows for the threshold model. The dashed line encloses the region of the $(\phi^*, z)$ plane in which the cascade condition (Eq. 5) is satisfied for a uniform random graph with a homogenous threshold distribution $f(\phi) = \delta(\phi - \phi^*)$. The solid circles outline the region in which global cascades occur for the same parameter settings in the full dynamical model for n = 10,000 (averaged over 100 random single-node perturbations).

Fig. 2. Cross section of the cascade window from Fig. 1, at $\phi^* = 0.18$. (a) The average time required for a cascade to terminate diverges at both the lower and upper boundaries of the cascade window, indicating two phase transitions. (b) Comparison between connected components of the network and the properties of global cascades. The frequency of global cascades in the numerical model (open circles) is well approximated by the fractional size of the extended vulnerable cluster (short dashes). For comparison, the size of the vulnerable cluster is also shown, both the exact solution derived in the text (long dashes) and the average over 1,000 realizations of a random graph (crosses). The exact and numerical solutions agree everywhere except at the upper phase transition, where the finite size of the network (n = 10,000) affects the numerical results. Finally, the average size of global cascades is shown (solid circles) and compared with the exact solution for the largest connected component (solid line).

Watts PNAS | April 30, 2002 | vol. 99 | no. 9 | 5769

Figure: infected fraction vs. average degree (Watts, 2002).
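The "too much connectivity" effect can be checked with a small simulation of the threshold model on Poisson random graphs, counting how often a single random seed triggers a cascade that covers most of the network. A rough sketch, assuming a uniform threshold ϕ* = 0.18 as in the Watts figures, a 50% cutoff for calling a cascade global, and small networks (n = 300) for speed; these parameters and all function names are illustrative assumptions, not Watts' setup (which used n = 10,000).

```python
import random


def poisson_random_graph(n, z, rng):
    """Erdős-Rényi G(n, p) with p = z / (n - 1), so the mean degree is about z."""
    p = z / (n - 1)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def cascade_fraction(adj, seed, phi):
    """Final fraction of active nodes after activating `seed` (irreversible adoption)."""
    active = [False] * len(adj)
    active[seed] = True
    m = [0] * len(adj)  # number of active neighbours of each node
    stack = [seed]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if active[v]:
                continue
            m[v] += 1
            if m[v] / len(adj[v]) >= phi:
                active[v] = True
                stack.append(v)
    return sum(active) / len(adj)


def global_cascade_frequency(n, z, phi, trials, seed=0, cutoff=0.5):
    """Share of single-seed runs whose cascade exceeds `cutoff` of the network."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        adj = poisson_random_graph(n, z, rng)
        if cascade_fraction(adj, rng.randrange(n), phi) > cutoff:
            hits += 1
    return hits / trials


for z in (3, 20):
    print(z, global_cascade_frequency(300, z, 0.18, trials=25))
```

Since adoption is irreversible and the rule is monotone, processing newly activated nodes from a stack reaches the same final state as synchronous updating, while touching each tie at most twice. At z = 20 almost no node has degree ≤ 5, so one active neighbor never reaches the threshold and cascades die at the seed.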

Page 34

Adoption and behavioral contagion

§  Spreading can be slower with more long-range connections

not alter the topology in which they were embedded (e.g., by making new ties). In both conditions, each participant was randomly assigned to occupy a single node in one network. The occupants of the immediately adjacent nodes in the network (i.e., the network neighbors) constituted a participant's health buddies (13). Each node in a social network had an identical number of neighbors as the other nodes in the network, and participants could only see the immediate neighbors to whom they were connected.

Consequently, the size of each participant's social neighborhood was identical for all participants within a network and across conditions. More generally, every aspect of a participant's experience before the initiation of the diffusion dynamics was equivalent across conditions, and the only difference between the conditions was the pattern of connectedness of the social networks in which the participants were embedded. Thus, any differences in the dynamics of diffusion between the two conditions can be attributed to the effects of network topology.

There are four advantages of this experimental design over observational data. (i) The present study isolates the effects of network topology, independent of frequently co-occurring factors such as homophily (3, 16), geographic proximity (17), and interpersonal affect (4, 18), which are easily conflated with the effects of topological structure in observational studies (2, 3, 11). (ii) I study the spread of a health-related behavior that is unknown to the participants before the study (13), thereby eliminating the effects of nonnetwork factors from the diffusion dynamics, such as advertising, availability, and pricing, which can confound the effects of topology on diffusion when, for example, the local structure of a social network correlates with greater resources for learning about or adopting an innovation (11, 19). (iii) This study eliminates the possibility for social ties to change and thereby identifies the effects of network structure on the dynamics of diffusion without the confounding effects of homophilous tie formation (1, 20). (iv) Finally, this design allows the same diffusion process to be observed multiple times, under identical structural conditions, thus allowing the often stochastic process of individual adoption (21) to be studied in a way that provides robust evidence for the effects of network topology on the dynamics of diffusion.

I report the results from six independent trials of this experimental design, each consisting of a matched pair of network conditions. In each pair, participants were randomized to either a clustered-lattice network or a corresponding random network (13). This yielded 12 independent diffusion processes. Diffusion was initiated by selecting a random "seed node," which sent signals to its network neighbors encouraging them to adopt a health-related behavior, namely registering for a health forum Web site (13). Every time a participant adopted the behavior (i.e., registered for the health forum), messages were sent to her health buddies inviting them to adopt. If a participant had multiple health buddies who adopted the behavior, then she would receive multiple signals, one from each neighbor. The more neighbors who adopted, the more reinforcing signals a participant received. The sequence of adoption decisions made by the

Figure (Centola 2010, Fig. 2): fraction adopted vs. time (days), one panel per trial (A to F).

Fig. 2. Time series showing the adoption of a health behavior spreading through clustered-lattice (solid black circles) and random (open triangles) social networks. Six independent trials of the study are shown, including (A) N = 98, Z = 6, (B to D) N = 128, Z = 6, and (E and F) N = 144, Z = 8. The success of diffusion was measured by the fraction of the total network that adopted the behavior. The speed of the diffusion process was evaluated by comparing the time required for the behavior to spread to the greatest fraction reached by both conditions in each trial.

Figure (Centola 2010, Fig. 3): hazard ratio vs. number of reinforcing signals (2, 3, 4).

Fig. 3. Hazard ratios for adoption for individuals receiving two, three, and four social signals. The hazard ratio g indicates that the likelihood of adoption increases by a factor of g for each additional signal k, compared to the likelihood of adoption from receiving k − 1 signals. The 95% confidence intervals from the Cox proportional hazards model are shown by error bars. The effect of an additional signal on the likelihood of adoption is significant if the 95% confidence interval does not contain g = 1 (13).
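Read as a chain, the per-signal hazard ratios combine multiplicatively. A short worked relation, writing h(k) for the adoption hazard after k reinforcing signals and g_k for the hazard ratio of the k-th signal (notation assumed here, following the caption's definition of g):

```latex
g_k = \frac{h(k)}{h(k-1)}, \qquad
\frac{h(k)}{h(1)} = \prod_{j=2}^{k} g_j, \qquad
\text{e.g.}\quad \frac{h(4)}{h(1)} = g_2\, g_3\, g_4 .
```

So if every g_j exceeds 1, each extra signal raises the hazard further, and the cumulative effect of four signals relative to one is the product of the three reported ratios.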

www.sciencemag.org SCIENCE VOL 329 3 SEPTEMBER 2010 1195


Source: Centola (2010)

Lattice vs. Random
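A stylized version of the lattice vs. random contrast can be simulated with a complex-contagion rule. This is a hypothetical sketch, not Centola's protocol: it assumes deterministic adoption once at least two neighbors have adopted, a one-dimensional ring lattice with Z = 6 as an assumed stand-in for the clustered lattice, a degree-preserving randomization (double-edge swaps) as the random condition, and a seed of three adjacent nodes.

```python
import random


def ring_lattice(n, z):
    """Clustered lattice: each node tied to its z/2 nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, z // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def degree_preserving_rewire(adj, swaps, seed=0):
    """Randomize ties by double-edge swaps (a-b, c-d -> a-d, c-b); degrees stay fixed."""
    rng = random.Random(seed)
    adj = {u: set(vs) for u, vs in adj.items()}
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    done = 0
    while done < swaps:
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create self-loops or duplicate ties.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        for x, y in ((a, b), (c, d)):
            adj[x].remove(y)
            adj[y].remove(x)
        for x, y in ((a, d), (c, b)):
            adj[x].add(y)
            adj[y].add(x)
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return adj


def complex_contagion(adj, seeds, min_signals=2, max_steps=1000):
    """Synchronous updates: a node adopts once >= min_signals neighbours have adopted.

    Returns (fraction adopted, number of steps in which someone adopted).
    """
    adopted = set(seeds)
    steps = 0
    while steps < max_steps:
        new = {v for v in adj if v not in adopted and len(adj[v] & adopted) >= min_signals}
        if not new:
            break
        adopted |= new
        steps += 1
    return len(adopted) / len(adj), steps


lattice = ring_lattice(128, 6)
random_net = degree_preserving_rewire(lattice, swaps=2000)
seeds = {0, 1, 2}  # hypothetical seed: three adjacent nodes
print("lattice:", complex_contagion(lattice, seeds))  # (1.0, 31)
print("random: ", complex_contagion(random_net, seeds))
```

In the lattice the behavior advances two nodes per side at each step and reaches everyone, because neighbors of adopters share several adopting contacts; in the randomized network, where ties mostly point to unrelated nodes, the same rule can stall near the seed.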

Page 35

References

§  Barrat, A., Barthelemy, M., & Vespignani, A. (2008). Dynamical processes on complex networks. Cambridge: Cambridge University Press.

§  Bearman, P. S., Moody, J., & Stovel, K. (2004). Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 110(1), 44-91.

§  Liljeros, F., Edling, C. R., Amaral, L. A. N., Stanley, H. E., & Åberg, Y. (2001). The web of human sexual contacts. Nature, 411(6840), 907-908.

§  Pastor-Satorras, R., & Vespignani, A. (2001). Epidemic spreading in scale-free networks. Physical Review Letters, 86(14), 3200.

§  Christakis, N. A., & Fowler, J. H. (2010). Social network sensors for early detection of contagious outbreaks. PLoS ONE, 5(9), e12948.

Page 36

References Continued

§  Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4), 370-379.

§  Shalizi, C. R., & Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2), 211-239.

§  Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194-1197.

§  Watts, D. J. (2002). A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9), 5766-5771.
