
Discussion Topics and Ego Networks on Twitter

Emma S. Spiro

University of California, Irvine
Department of Sociology

Presented at MURI AHM May 25, 2010

This material is based on research supported by the Office of Naval Research under award N00014-08-1-1015.

NCASD

NETWORKS, COMPUTATION, and SOCIAL DYNAMICS

Scalable Methods for the Analysis of Network-Based Data


Talk Outline

- Project Motivation
- Introduction to Twitter
- Data Collection
- Communication Dynamics
- Structural Characteristics of Personal Networks


Project Motivation

- Informal communication channels are often the primary means by which time-sensitive hazard information first reaches members of the public.
- Social media technologies, e.g. micro-blogging, provide a means for gathering, sorting and disseminating information: a venue for collective problem solving.
- Relatively little is known about the dynamics of informal communication of information in emergencies or hazards.


Why Twitter?

- Twitter represents an extremely large social network: 100 million users.
- Tie formation and destruction are rapid and widespread.
- Combination of text and interpersonal networks.
- Extreme heterogeneity in terms of network properties as well as communication behavior.
- Scalable methods and models.


Modeling Discussion Topics on Twitter

Consider the population of individuals talking about a given topic. Can we make predictions about
- the dynamics of this communication?
- the network properties of this discussion group?

For now, we rely on sampling-based approaches.


Project Activities

Using automated data collection methods, we collect information
- on the dynamics of communication content.
- on the properties of communicants’ online interpersonal networks.


Twitter Data Collection, Part I - Topic Dynamics

- Public, global content is searchable by keyword.
- Begin with a list of topics, each characterized by a set of keywords.
- We include a control topic in which words are chosen from Ogden’s word list.
- Automated data collection is designed to capture all public tweets containing a given keyword.
- Some data may be missing.
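The keyword-based assignment of tweets to topics described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's collection code: the topic names, the keyword sets and the `match_topics` helper are hypothetical, and an actual collector would query Twitter's search interface rather than filter text already in memory.

```python
import re

# Hypothetical topics; each is characterized by a set of keywords.
TOPICS = {
    "oil_spill": {"oil spill", "oilspill"},
    "lightning": {"lightning"},
    "control": {"table", "window"},  # stand-ins for words from a general word list
}

def match_topics(text, topics=TOPICS):
    """Return the sorted names of topics with at least one keyword in the text."""
    lowered = text.lower()
    hits = []
    for name, keywords in topics.items():
        for kw in keywords:
            # Whole-word (or whole-phrase) match on the lowercased text.
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                hits.append(name)
                break  # one matching keyword is enough for this topic
    return sorted(hits)
```

Whole-word matching is a design choice made here for the sketch; a substring match would capture more tweets at the cost of false positives such as "tablet" for "table".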


Twitter Data Collection, Part II - Personal Networks

- Each user on Twitter has a personal network consisting of friends (out-ties) and followers (in-ties).
- For each keyword we sample 20 recently active users each day and keep them in the sample for 7 days.
- For each user we obtain a list of alters, as well as various covariates if available.
- Potential covariates: location, privacy settings, time zone, account creation date, activity level, language.
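The rolling sample described above (20 new users per keyword per day, each followed for 7 days) might be maintained along these lines. This is a sketch only: the `update_panel` helper and its data layout are hypothetical, uniform random selection among recently active users is an assumption, and the slides do not say whether a user may re-enter the sample after leaving it.

```python
import random
from datetime import date, timedelta

USERS_PER_DAY = 20   # from the slide: 20 recently active users per keyword per day
DAYS_IN_SAMPLE = 7   # from the slide: each user stays in the sample for 7 days

def update_panel(panel, active_users, today, rng=random):
    """Drop users whose 7-day window has ended, then add up to 20 new users.

    panel maps user id -> date the user entered the sample.
    active_users lists users who recently tweeted the keyword.
    """
    cutoff = today - timedelta(days=DAYS_IN_SAMPLE)
    # A user who entered on day d is kept on days d .. d+6.
    kept = {u: d for u, d in panel.items() if d > cutoff}
    # Assumption: users already in the panel are not drawn a second time.
    candidates = [u for u in active_users if u not in kept]
    new = rng.sample(candidates, min(USERS_PER_DAY, len(candidates)))
    kept.update({u: today for u in new})
    return kept
```

With enough active users each day, the panel for one keyword settles at 7 × 20 = 140 users from the seventh day onward.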


Research Questions

- What seasonality exists within a discussion?
- How do exogenous events affect communication dynamics?
- What are the structural characteristics of the interpersonal networks of the discussant group?
- Individual-level prediction?


Oil spill: Seasonality and Exogenous Events

[Figure: tweet count over time for the oil spill topic, 03-May to 24-May. X-axis: Time; y-axis: Tweet Count (ticks at 500, 1000, 1500). Weekends are marked, and two events are annotated: "Oil Reaches Coastline in LA" and "BP Launches Live Webcam of Leak".]
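One simple way to look for the weekly seasonality shown in this figure is to bin tweet timestamps and compare weekday with weekend activity. The sketch below assumes hourly bins and Python `datetime` timestamps; the bin width used for the figure is not stated in the slides, and both helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Count tweets per clock hour; timestamps are datetime objects."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )

def weekday_weekend_means(counts):
    """Mean tweets per observed hour, split into weekday and weekend hours.

    Hours with no tweets are absent from counts and so are not averaged in.
    """
    groups = {"weekday": [], "weekend": []}
    for hour, n in counts.items():
        key = "weekend" if hour.weekday() >= 5 else "weekday"  # 5 = Sat, 6 = Sun
        groups[key].append(n)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}
```

Spikes that follow annotated events would still need to be separated from this regular pattern, for example by comparing each bin with the mean for the same hour and day type.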


Structural Characteristics of Personal Networks

- Mean degree of topic participants.
- Is an increase in overall mean degree due to those already present in the discussion gaining alters, or is it due to high-degree individuals entering the discussion?


Degree Distribution Dynamics

- Consider the mean degree in the population, i.e. the topic discussion sample, over time.
- The mean degree is affected by different population processes: those entering the sample (immigrants), those who stay in the sample (non-migrants), and those who leave the sample (emigrants).
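The three groups can be identified by comparing sample membership at two consecutive time points. The following is a minimal sketch; the function names and the use of plain user-id sets are assumptions made for illustration.

```python
def partition_sample(sample_t, sample_t1):
    """Split users by membership in the samples at times t and t+1."""
    prev, curr = set(sample_t), set(sample_t1)
    return {
        "immigrants": curr - prev,    # in the sample at t+1 but not at t
        "non_migrants": prev & curr,  # in the sample at both times
        "emigrants": prev - curr,     # in the sample at t but not at t+1
    }

def mean_degree(users, degree):
    """Mean degree of a group; degree maps user id -> degree.

    Returns None for an empty group.
    """
    users = list(users)
    return sum(degree[u] for u in users) / len(users) if users else None
```

For example, with samples `["a", "b", "c"]` at time t and `["b", "c", "d"]` at time t+1, user `a` is an emigrant, `d` is an immigrant, and `b` and `c` are non-migrants.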


Lightning: Mean Degree Dynamics

[Figure: mean degree over time for the lightning topic, Mar-10 to Apr-10. X-axis: Date; y-axis: Degree. Series are distinguished by type: e, i, s, t.]


Degree Dynamics Decomposition

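[Diagram: the population and the sample drawn from it at time t and at time t+1. Group labels as extracted: I, S, E, E' at one time point and I', S', E', E'' at the other.]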


Degree Dynamics Decomposition

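$$
\begin{aligned}
\bar{d}_{t+1} - \bar{d}_t
&= \frac{\bar{d}_{t+1}(I')\,N_{I'} + \bar{d}_{t+1}(S')\,N_{S'}}{N_{t+1}}
 - \frac{\bar{d}_t(S')\,N_{S'} + \bar{d}_t(E')\,N_{E'}}{N_t} \\
&= \frac{\bar{d}_{t+1}(I')\,N_{I'}}{N_{t+1}}
 + \frac{\bar{d}_{t+1}(S')\,N_{S'}}{N_{t+1}}
 - \frac{\bar{d}_t(S')\,N_{S'}}{N_t}
 - \frac{\bar{d}_t(E')\,N_{E'}}{N_t}
\end{aligned}
$$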
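The four terms of this decomposition can be computed directly from group means and group sizes. The sketch below assumes, as the first line of the equation implies, that the sample at t+1 consists of immigrants plus stayers and the sample at t of stayers plus emigrants; the function and argument names are hypothetical.

```python
def decompose_change(d_I, n_I, d_S_next, d_S_prev, n_S, d_E, n_E):
    """Four terms of the change in mean degree between t and t+1.

    d_I: mean degree at t+1 of the n_I immigrants
    d_S_next, d_S_prev: mean degree of the n_S stayers at t+1 and at t
    d_E: mean degree at t of the n_E emigrants
    """
    n_next = n_I + n_S   # N_{t+1}: immigrants plus stayers (assumption)
    n_prev = n_S + n_E   # N_t: stayers plus emigrants (assumption)
    return {
        "immigrants": d_I * n_I / n_next,
        "stayers_next": d_S_next * n_S / n_next,
        "stayers_prev": -d_S_prev * n_S / n_prev,
        "emigrants": -d_E * n_E / n_prev,
    }
```

Summing the four values gives the overall change in mean degree, so the size and sign of each term show how much of the change comes from arrivals, from departures, and from stayers gaining or losing alters.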


Degree Dynamics Decomposition

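- Let $\bar{d}_{t+1}(I') = \alpha\,\bar{d}_t(S')$, $\bar{d}_t(E') = \varepsilon\,\bar{d}_t(S')$, and $\bar{d}_{t+1}(S') = \gamma\,\bar{d}_t(S')$.
- Intuitively, we are expressing the respective degrees of the immigrants, emigrants, and (t+1) stayers in terms of what the stayers’ degrees were at time t.
- Likewise, let $w^{I'}_{t+1} = N_{I'}/N_{t+1}$, $w^{E'}_{t} = N_{E'}/N_t$, and $w^{S'}_{t+1} = N_{S'}/N_{t+1}$ be the relative population weights for the three groups.

$$
\bar{d}_{t+1} - \bar{d}_t
= \alpha\,\bar{d}_t(S')\,w^{I'}_{t+1}
+ \gamma\,\bar{d}_t(S')\,w^{S'}_{t+1}\left[1 - \frac{N_{t+1}}{\gamma N_t}\right]
- \varepsilon\,\bar{d}_t(S')\,w^{E'}_{t}
$$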
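As a numerical check on this parameterization, the ratios α, ε and γ and the implied change in mean degree can be computed from the same group means and sizes. This is an illustrative sketch with hypothetical names; it assumes the stayers' mean degree at time t and at time t+1 are both nonzero, and that the samples at t and t+1 are stayers plus emigrants and immigrants plus stayers respectively.

```python
def relative_parameters(d_I, n_I, d_S_next, d_S_prev, n_S, d_E, n_E):
    """Return alpha, epsilon, gamma and the change in mean degree they imply."""
    n_next, n_prev = n_I + n_S, n_S + n_E   # N_{t+1} and N_t (assumption)
    alpha = d_I / d_S_prev        # immigrants relative to stayers at t
    epsilon = d_E / d_S_prev      # emigrants relative to stayers at t
    gamma = d_S_next / d_S_prev   # stayers at t+1 relative to themselves at t
    # Relative population weights of the three groups.
    w_I, w_S, w_E = n_I / n_next, n_S / n_next, n_E / n_prev
    change = (
        alpha * d_S_prev * w_I
        + gamma * d_S_prev * w_S * (1 - n_next / (gamma * n_prev))
        - epsilon * d_S_prev * w_E
    )
    return alpha, epsilon, gamma, change
```

With immigrants of mean degree 200 (20 users), stayers moving from 100 to 110 (30 users) and emigrants of mean degree 50 (10 users), this gives α = 2, ε = 0.5, γ = 1.1 and a change of 58.5, which matches 146 − 87.5 computed from the two sample means directly.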


Lightning: Mean Degree Dynamics

[Figure: parameter values over time for the lightning topic, Mar-10 to Apr-10. X-axis: Date; y-axis: Parameter (0 to 30). Series are distinguished by type: a, e, g.]


Summary

- Modeling large-scale dynamic networks with a text component.
- Scalability.
- Activity sampling and egocentric properties.


Future Work

- Time-series analysis of the topic data.
- Complete sampling of discussant groups.
- Decomposition of the change in average number of shared partners or other statistics.
- Statistical models of topics on dynamic networks.

Questions?
