beyond the individual: applications of social network analysis for … · 2017. 4. 24. · network...
TRANSCRIPT
Beyond the individual:Applications of social network analysis for the social
sciences
Dr Scott A. HaleOxford Internet Institute
http://www.scotthale.net/, @computermacgyve
8 March 2017
Introduction
The Oxford Internet Institute is . . .
a multidisciplinary department
of the University of Oxford
studying “life online”
I am . . .
a senior data scientist / computer scientist
analyzing large-scale data and
conducting experiments
to investigate multilingualism and collective action
Introduction
The Oxford Internet Institute is . . .
a multidisciplinary department
of the University of Oxford
studying “life online”
I am . . .
a senior data scientist / computer scientist
analyzing large-scale data and
conducting experiments
to investigate multilingualism and collective action
What’s different
Surveys ask questions of individuals, and wethen infer large-scale / population-levelcharacteristics from the data. This isatomistic and ignores the social connectionsand influences people have on one another.
Social network analysis tries to capture thesocial context of individuals and linkmicro-level behaviours to emergentmicro-level outcomes.
Classic findings
Weak ties
Homophily
Six degrees of separation
Networks to show social context
Networks (graphs) are set of nodes (verticies) connected by edges (links,ties, arcs)
Further details
Whole vs. ego: whole networks have allnodes within a natural boundary(platform, organization, etc.). An egonetwork has one node and all of itsimmediate neighbors.
Edges can be directed or undirected andweighted or unweighted
Additionally, networks may be multilayerand/or multimodal.
Networks to show social context
Networks (graphs) are set of nodes (verticies) connected by edges (links,ties, arcs)
Further details
Whole vs. ego: whole networks have allnodes within a natural boundary(platform, organization, etc.). An egonetwork has one node and all of itsimmediate neighbors.
Edges can be directed or undirected andweighted or unweighted
Additionally, networks may be multilayerand/or multimodal.
Why?
Characterize network structure
How far apart / well-connected are nodes?Are some nodes at more important positions?Is the network composed of communities?
How does network structure affect processes?
Information diffusionCoordination/cooperationResilience to failure/attack
A network
First questions when approaching a network
What are edges? What are nodes?
What kind of network?
Inclusion/exclusion criteria
Network data formats
Data format
The simplest form of network is simply two spreadsheets orcomma-separated values (csv) files. One sheet for the nodes of the networkand one sheet for the edges of the network.
Example
Nodesid name professionn1 Alan studentn2 Betty teachern3 Charlotte student
Edgesid source target weighte1 n1 n2 1e2 n1 n3 100
Gephi
Open-source, cross-platform GUI interface
Primary strength is to visualize networks
Basic statistical properties are also available
Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more
Other software
NodeXL (http://nodexl.codeplex.com/):
Add-in for Microsoft Excel (Windows only)Good data collection options (Twitter, Facebook, YouTube, . . . )Basic visualization
Pajek
http://pajek.imfm.si/doku.php?id=downloadStandalone, Windows (or Linux with wine)Good interactive environment for metrics and basic visualization
iGraph
install.packages(”igraph”) in R (r-project.org) (or python)Cross-platformText driven: powerful for analysis
NetworkX
Cross-platform python libraryExamples/introduction in this slide deck
An example: bilingualism
Given the big differences in the information available in different languages,do bilingual users on social media serve as ‘bridges’ introducing contentfrom one language into another language online?
Bilingualism and novel information
Bilingualism common (offline)
“multilingualism...[is] the norm for most of the world’s societies” (Birner,2005), with over half of Europe and over a fifth of the US multilingual(Erard, 2012); yet, many platforms are designed only with monolingual usersin mind.
Novel information
Benefits for connecting clusters both at a large, network level & at anindividual level (Granovetter, 1973; Burt, 2004; Uzzi & Spiro, 2005; Aral &Alstyne, 2011). Beyond ‘filter bubbles’ (Pariser, 2011), language is a largerestriction on available content.
Twitter: Language and structure
◦ Twitter user
−→ Mention, Retweet
© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure
Colour: Most-used language
Twitter: Language and structure
◦ Twitter user
−→ Mention, Retweet
© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure
Colour: Most-used language
Twitter: Language and structure
◦ Twitter user
−→ Mention, Retweet
© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure
Colour: Most-used language
Twitter: Language and structure
◦ Twitter user
−→ Mention, Retweet
© Clusters: Automated groups ofstrongly–connected users basedpurely on network structure
Colour: Most-used language
Twitter: Language and structure
Label propagation algorithm (Raghavan, Albert, & Kumara, 2007) found20,253 clusters.
: Histograms of the size of clusters (left) and the number of languages within eachcluster (right). Modularity score of 0.81 for this division of the network.
Twitter: Bilinguals bridge clusters
◦ Twitter user
−→ Mention, Retweet
Colour: Most-used language
Colour: Monolingual language;Black: Bilingual
Twitter: Bilinguals bridge clusters
◦ Twitter user
−→ Mention, Retweet
Colour: Most-used language
Colour: Monolingual language;Black: Bilingual
Twitter: Bilinguals bridge clusters
◦ Twitter user
−→ Mention, Retweet
Colour: Most-used language
Colour: Monolingual language;Black: Bilingual
Twitter: Bilinguals bridge clusters
Size of the largest, weakly-connected component (left), total number ofcomponents (center), and average size of the components (right) created byremoving all bilingual users, an equivalent number of monolingual users randomly,an equivalent number of all users randomly, and removing all bilingual users from anetwork with the same degree distribution but with edges randomly shuffled. Boxplots show values from 100 realizations. Mean values are indicated with +.
Twitter: Bilinguals bridge clusters
Removing bilingual users results in asmaller largest, weakly-connectedcomponent compared to removing anequivalent number of monolingualusers randomly, an equivalent numberof all users randomly, or removing allbilingual users from a network withthe same degree distribution but withedges randomly shuffled. Box plotsshow values from 100 realizations.Mean values are indicated with +.
Twitter: Bridging by language
The percentage of remaining users not in the largest connected componentafter removing all users in different languages.
Twitter: Cross-language connections
ar
de
en
es fil
fr
it
jako
ms
nl
pt
ru
th
tr
Mentions and retweets acrosslanguages
Nodes represent most-usedlanguage (size proportional tonumber of primary users)
Edges are weighted by thepercent error in the expected vs.the actual number of mentionsand retweets between languages
Colors indicate clusters found bythe infomap communitydetection algorithm
Bilingual bridging
Bilinguals on Twitter are in network positions to serve asbridges,
but what about the content they write in differentlanguages?
Bilinguals able to recall memories and taught knowledge more easily whenthe language of encoding matched the language of retrieval (Marian &Neisser, 2000; Marian & Fausey, 2006)
Similar text in different languages
Similar text in different languages
Similar text in different languages
Wikipedia: Do bilinguals bridge editions?
Do bilinguals edit similar articles across languages?
A large number of users did not edit any of the same articles in their primarylanguages, but a large number of users also always edited the same articles in theirprimary languages.
Wikipedia: Do bilinguals bridge editions?
Do bilinguals edit similar articles across languages?
A large number of users did not edit any of the same articles in their primarylanguages, but a large number of users also always edited the same articles in theirprimary languages.
Awareness and discovery
Awareness and discovery of relevant foreign-languagecontent is difficult for bilinguals
Many interfaces are designed for monolingual speakers
Other network statistics
With many nodes visualizations are often difficult/impossible to interpret.Statistical measures can be very revealing, however.
Node-level
Degree (in, out): How many incoming/outgoing edges does a node have?Centrality (next slide)Constraint
Network-level
Components: Number of disconnected subsets of nodesDensity: observed edges
maximum number of edges possible
Clustering coefficient closed tripletsconnected triples
Path length distributionDistributions of node-level measures
Centrality measures
Degree
Closeness: Measures the average geodesic distance to ALL other nodes.Informally, an indication of the ability of a node to diffuse a propertyefficiently.
Betweenness: Number of shortest paths the node lies on. Informally,the betweenness is high if a node bridges clusters.
Eigenvector: A weighted degree centrality (inbound links from highlycentral nodes count more).
PageRank: Not strictly a centrality measure, but similar to eigenvectorbut modeled as a random walk with a teleportation parameter
Implications: Bilingualism
Bilingualism is the norm offline, but not online
Twitter: 11%, Wikipedia: 15%
Differing level of competency in different languages (mirrors offline)
Proportion of bilinguals ∝ 1/self-focus bias
Importance of content discovery and interface design
Bilinguals are important
Small, dedicated group; large-sized contributions in primary languages
Contributions in non-primary language small, but valuable
Different types of contributions
Mixed evidence on serving as bridges between languages
Active areas of research
Multilingual topic modelling, word embeddings
(Word-level) language detection
Identifying similar/translated content across languages
Computational sociolinguistics: Nguyen, Dogruoz, Rose, and de Jong (2016)
Applying results
Applying results
Further resources for networks
NetworkX documentation (http://networkx.github.io/documentation/latest/reference/)
Newman, M.E.J., Networks: An Introduction
Kadushin, C., Understanding Social Networks: Theories, Concepts, andFindings
De Nooy, W., et al., Exploratory Social Network Analysis with Pajek
Shneiderman B., and Smith, M., Analyzing Social Media Networks withNodeXL
Beyond the individual:Applications of social network analysis for the social
sciences
Dr Scott A. HaleOxford Internet Institute
http://www.scotthale.net/, @computermacgyve
8 March 2017
Aral, S., & Alstyne, M. V. (2011). The Diversity-Bandwidth Trade-off.American Journal of Sociology, 117(1), 90–171. Retrieved fromhttp://www.jstor.org/stable/10.1086/661238
Birner, B. (2005). Bilingualism (Tech. Rep.). Washington, DC, USA:Linguistic Socieyt of America. Retrieved fromhttp://www.linguisticsociety.org/files/Bilingual.pdf
Burt, R. S. (2004). Structural Holes and Good Ideas. The American Journalof Sociology, 110(2), 349–399. Retrieved fromhttp://www.jstor.org/stable/3568221
Erard, M. (2012, January). Are we Really Monolingual? Retrieved fromhttp://www.nytimes.com/2012/01/15/opinion/sunday/
are-we-really-monolingual.html
Granovetter, M. (1973). The Strength of Weak Ties. The American Journalof Sociology, 78(6), 1360–1380. Retrieved fromhttp://www.jstor.org/stable/2776392
Marian, V., & Fausey, C. M. (2006). Language-dependent memory inbilingual learning. Applied Cognitive Psychology, 20(8), 1025–1047.Retrieved from http://dx.doi.org/10.1002/acp.1242 doi:10.1002/acp.1242
Marian, V., & Neisser, U. (2000, Sep). Language-dependent recall ofautobiographical memories. Journal of Experimental Psychology:General, 129(3), 361–368. (Russian–English bilinguals / multilingualsremembered more experiences from the Russian-speaking period oftheir lives when interviewed in Russian and more experiences from theEnglish-speaking period of their lives when interviewed in English thanvice versa (Marian & Neisser, 2000).)
Nguyen, D., Dogruoz, A. S., Rose, C. P., & de Jong, F. (2016).Computational sociolinguistics: A survey. Computational Linguistics.Retrieved from https://arxiv.org/abs/1508.07544
Pariser, E. (2011). The filter bubble: What the Internet is hiding from you.London: Viking.
Raghavan, U. N., Albert, R., & Kumara, S. (2007, September). Near lineartime algorithm to detect community structures in large-scale networks.Phys. Rev. E, 76(3), 36106. Retrieved fromhttp://link.aps.org/doi/10.1103/PhysRevE.76.036106 doi:10.1103/PhysRevE.76.036106
Uzzi, B., & Spiro, J. (2005). Collaboration and Creativity: The Small WorldProblem. The American Journal of Sociology, 111(2), 447–504.Retrieved from http://www.jstor.org/stable/10.1086/432782