beyond the individual: applications of social network analysis for … · 2017. 4. 24. · network...

44
Beyond the individual: Applications of social network analysis for the social sciences Dr Scott A. Hale Oxford Internet Institute http://www.scotthale.net/, @computermacgyve 8 March 2017

Upload: others

Post on 02-Nov-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Beyond the individual:Applications of social network analysis for the social

sciences

Dr Scott A. HaleOxford Internet Institute

http://www.scotthale.net/, @computermacgyve

8 March 2017

Page 2: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Introduction

The Oxford Internet Institute is . . .

a multidisciplinary department

of the University of Oxford

studying “life online”

I am . . .

a senior data scientist / computer scientist

analyzing large-scale data and

conducting experiments

to investigate multilingualism and collective action

Page 3: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Introduction

The Oxford Internet Institute is . . .

a multidisciplinary department

of the University of Oxford

studying “life online”

I am . . .

a senior data scientist / computer scientist

analyzing large-scale data and

conducting experiments

to investigate multilingualism and collective action

Page 4: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

What’s different

Surveys ask questions of individuals, and wethen infer large-scale / population-levelcharacteristics from the data. This isatomistic and ignores the social connectionsand influences people have on one another.

Social network analysis tries to capture thesocial context of individuals and linkmicro-level behaviours to emergentmicro-level outcomes.

Page 5: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Classic findings

Weak ties

Homophily

Six degrees of separation

Page 6: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Networks to show social context

Networks (graphs) are set of nodes (verticies) connected by edges (links,ties, arcs)

Further details

Whole vs. ego: whole networks have allnodes within a natural boundary(platform, organization, etc.). An egonetwork has one node and all of itsimmediate neighbors.

Edges can be directed or undirected andweighted or unweighted

Additionally, networks may be multilayerand/or multimodal.

Page 7: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Networks to show social context

Networks (graphs) are set of nodes (verticies) connected by edges (links,ties, arcs)

Further details

Whole vs. ego: whole networks have allnodes within a natural boundary(platform, organization, etc.). An egonetwork has one node and all of itsimmediate neighbors.

Edges can be directed or undirected andweighted or unweighted

Additionally, networks may be multilayerand/or multimodal.

Page 8: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Why?

Characterize network structure

How far apart / well-connected are nodes?Are some nodes at more important positions?Is the network composed of communities?

How does network structure affect processes?

Information diffusionCoordination/cooperationResilience to failure/attack

Page 9: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

A network

First questions when approaching a network

What are edges? What are nodes?

What kind of network?

Inclusion/exclusion criteria

Page 10: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Network data formats

Data format

The simplest form of network is simply two spreadsheets orcomma-separated values (csv) files. One sheet for the nodes of the networkand one sheet for the edges of the network.

Example

Nodesid name professionn1 Alan studentn2 Betty teachern3 Charlotte student

Edgesid source target weighte1 n1 n2 1e2 n1 n3 100

Page 11: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Gephi

Open-source, cross-platform GUI interface

Primary strength is to visualize networks

Basic statistical properties are also available

Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more

Page 12: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Other software

NodeXL (http://nodexl.codeplex.com/):

Add-in for Microsoft Excel (Windows only)Good data collection options (Twitter, Facebook, YouTube, . . . )Basic visualization

Pajek

http://pajek.imfm.si/doku.php?id=downloadStandalone, Windows (or Linux with wine)Good interactive environment for metrics and basic visualization

iGraph

install.packages(”igraph”) in R (r-project.org) (or python)Cross-platformText driven: powerful for analysis

NetworkX

Cross-platform python libraryExamples/introduction in this slide deck

Page 13: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

An example: bilingualism

Given the big differences in the information available in different languages,do bilingual users on social media serve as ‘bridges’ introducing contentfrom one language into another language online?

Page 14: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Bilingualism and novel information

Bilingualism common (offline)

“multilingualism...[is] the norm for most of the world’s societies” (Birner,2005), with over half of Europe and over a fifth of the US multilingual(Erard, 2012); yet, many platforms are designed only with monolingual usersin mind.

Novel information

Benefits for connecting clusters both at a large, network level & at anindividual level (Granovetter, 1973; Burt, 2004; Uzzi & Spiro, 2005; Aral &Alstyne, 2011). Beyond ‘filter bubbles’ (Pariser, 2011), language is a largerestriction on available content.

Page 15: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Language and structure

◦ Twitter user

−→ Mention, Retweet

© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure

Colour: Most-used language

Page 16: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Language and structure

◦ Twitter user

−→ Mention, Retweet

© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure

Colour: Most-used language

Page 17: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Language and structure

◦ Twitter user

−→ Mention, Retweet

© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure

Colour: Most-used language

Page 18: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Language and structure

◦ Twitter user

−→ Mention, Retweet

© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure

Colour: Most-used language

Page 19: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Language and structure

Label propagation algorithm (Raghavan, Albert, & Kumara, 2007) found20,253 clusters.

: Histograms of the size of clusters (left) and the number of languages within eachcluster (right). Modularity score of 0.81 for this division of the network.

Page 20: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Bilinguals bridge clusters

◦ Twitter user

−→ Mention, Retweet

Colour: Most-used language

Colour: Monolingual language;Black: Bilingual

Page 21: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Bilinguals bridge clusters

◦ Twitter user

−→ Mention, Retweet

Colour: Most-used language

Colour: Monolingual language;Black: Bilingual

Page 22: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Bilinguals bridge clusters

◦ Twitter user

−→ Mention, Retweet

Colour: Most-used language

Colour: Monolingual language;Black: Bilingual

Page 23: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Bilinguals bridge clusters

Size of the largest, weakly-connected component (left), total number ofcomponents (center), and average size of the components (right) created byremoving all bilingual users, an equivalent number of monolingual users randomly,an equivalent number of all users randomly, and removing all bilingual users from anetwork with the same degree distribution but with edges randomly shuffled. Boxplots show values from 100 realizations. Mean values are indicated with +.

Page 24: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Bilinguals bridge clusters

Removing bilingual users results in asmaller largest, weakly-connectedcomponent compared to removing anequivalent number of monolingualusers randomly, an equivalent numberof all users randomly, or removing allbilingual users from a network withthe same degree distribution but withedges randomly shuffled. Box plotsshow values from 100 realizations.Mean values are indicated with +.

Page 25: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Bridging by language

The percentage of remaining users not in the largest connected componentafter removing all users in different languages.

Page 26: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Twitter: Cross-language connections

ar

de

en

es fil

fr

it

jako

ms

nl

pt

ru

th

tr

Mentions and retweets acrosslanguages

Nodes represent most-usedlanguage (size proportional tonumber of primary users)

Edges are weighted by thepercent error in the expected vs.the actual number of mentionsand retweets between languages

Colors indicate clusters found bythe infomap communitydetection algorithm

Page 27: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Bilingual bridging

Bilinguals on Twitter are in network positions to serve asbridges,

but what about the content they write in differentlanguages?

Bilinguals able to recall memories and taught knowledge more easily whenthe language of encoding matched the language of retrieval (Marian &Neisser, 2000; Marian & Fausey, 2006)

Page 28: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Similar text in different languages

Page 29: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Similar text in different languages

Page 30: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Similar text in different languages

Page 31: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Wikipedia: Do bilinguals bridge editions?

Do bilinguals edit similar articles across languages?

A large number of users did not edit any of the same articles in their primarylanguages, but a large number of users also always edited the same articles in theirprimary languages.

Page 32: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Wikipedia: Do bilinguals bridge editions?

Do bilinguals edit similar articles across languages?

A large number of users did not edit any of the same articles in their primarylanguages, but a large number of users also always edited the same articles in theirprimary languages.

Page 33: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Awareness and discovery

Awareness and discovery of relevant foreign-languagecontent is difficult for bilinguals

Many interfaces are designed for monolingual speakers

Page 34: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Other network statistics

With many nodes visualizations are often difficult/impossible to interpret.Statistical measures can be very revealing, however.

Node-level

Degree (in, out): How many incoming/outgoing edges does a node have?Centrality (next slide)Constraint

Network-level

Components: Number of disconnected subsets of nodesDensity: observed edges

maximum number of edges possible

Clustering coefficient closed tripletsconnected triples

Path length distributionDistributions of node-level measures

Page 35: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Centrality measures

Degree

Closeness: Measures the average geodesic distance to ALL other nodes.Informally, an indication of the ability of a node to diffuse a propertyefficiently.

Betweenness: Number of shortest paths the node lies on. Informally,the betweenness is high if a node bridges clusters.

Eigenvector: A weighted degree centrality (inbound links from highlycentral nodes count more).

PageRank: Not strictly a centrality measure, but similar to eigenvectorbut modeled as a random walk with a teleportation parameter

Page 36: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Implications: Bilingualism

Bilingualism is the norm offline, but not online

Twitter: 11%, Wikipedia: 15%

Differing level of competency in different languages (mirrors offline)

Proportion of bilinguals ∝ 1/self-focus bias

Importance of content discovery and interface design

Bilinguals are important

Small, dedicated group; large-sized contributions in primary languages

Contributions in non-primary language small, but valuable

Different types of contributions

Mixed evidence on serving as bridges between languages

Page 37: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Active areas of research

Multilingual topic modelling, word embeddings

(Word-level) language detection

Identifying similar/translated content across languages

Computational sociolinguistics: Nguyen, Dogruoz, Rose, and de Jong (2016)

Page 38: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Applying results

Page 39: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Applying results

Page 40: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Further resources for networks

NetworkX documentation (http://networkx.github.io/documentation/latest/reference/)

Newman, M.E.J., Networks: An Introduction

Kadushin, C., Understanding Social Networks: Theories, Concepts, andFindings

De Nooy, W., et al., Exploratory Social Network Analysis with Pajek

Shneiderman B., and Smith, M., Analyzing Social Media Networks withNodeXL

Page 41: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Beyond the individual:Applications of social network analysis for the social

sciences

Dr Scott A. HaleOxford Internet Institute

http://www.scotthale.net/, @computermacgyve

8 March 2017

Page 42: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Aral, S., & Alstyne, M. V. (2011). The Diversity-Bandwidth Trade-off.American Journal of Sociology, 117(1), 90–171. Retrieved fromhttp://www.jstor.org/stable/10.1086/661238

Birner, B. (2005). Bilingualism (Tech. Rep.). Washington, DC, USA:Linguistic Socieyt of America. Retrieved fromhttp://www.linguisticsociety.org/files/Bilingual.pdf

Burt, R. S. (2004). Structural Holes and Good Ideas. The American Journalof Sociology, 110(2), 349–399. Retrieved fromhttp://www.jstor.org/stable/3568221

Erard, M. (2012, January). Are we Really Monolingual? Retrieved fromhttp://www.nytimes.com/2012/01/15/opinion/sunday/

are-we-really-monolingual.html

Granovetter, M. (1973). The Strength of Weak Ties. The American Journalof Sociology, 78(6), 1360–1380. Retrieved fromhttp://www.jstor.org/stable/2776392

Page 43: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Marian, V., & Fausey, C. M. (2006). Language-dependent memory inbilingual learning. Applied Cognitive Psychology, 20(8), 1025–1047.Retrieved from http://dx.doi.org/10.1002/acp.1242 doi:10.1002/acp.1242

Marian, V., & Neisser, U. (2000, Sep). Language-dependent recall ofautobiographical memories. Journal of Experimental Psychology:General, 129(3), 361–368. (Russian–English bilinguals / multilingualsremembered more experiences from the Russian-speaking period oftheir lives when interviewed in Russian and more experiences from theEnglish-speaking period of their lives when interviewed in English thanvice versa (Marian & Neisser, 2000).)

Nguyen, D., Dogruoz, A. S., Rose, C. P., & de Jong, F. (2016).Computational sociolinguistics: A survey. Computational Linguistics.Retrieved from https://arxiv.org/abs/1508.07544

Pariser, E. (2011). The filter bubble: What the Internet is hiding from you.London: Viking.

Page 44: Beyond the individual: Applications of social network analysis for … · 2017. 4. 24. · Network data formats Data format The simplest form of network is simply two spreadsheets

Raghavan, U. N., Albert, R., & Kumara, S. (2007, September). Near lineartime algorithm to detect community structures in large-scale networks.Phys. Rev. E, 76(3), 36106. Retrieved fromhttp://link.aps.org/doi/10.1103/PhysRevE.76.036106 doi:10.1103/PhysRevE.76.036106

Uzzi, B., & Spiro, J. (2005). Collaboration and Creativity: The Small WorldProblem. The American Journal of Sociology, 111(2), 447–504.Retrieved from http://www.jstor.org/stable/10.1086/432782