mining and analyzing social media part 2 - hicss47 tutorial - dave king
DESCRIPTION
Part 2 of a Tutorial on Mining and Analyzing Social Media at HICSS 47TRANSCRIPT
![Page 1: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/1.jpg)
Mining and Analyzing Social Media: Part 2Dave King
HICSS 47 - January 2014
![Page 2: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/2.jpg)
• Introduction to Social Network Analysis Metrics– Degrees of Separation– Hooray for Bollywood
• Standard Measures– Centrality– Cohesion
• Levels of Social Network Analysis – Facebook & Egocentric Analysis– Searching for Cohesive Subgroups
Agenda: Part 2
2
![Page 3: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/3.jpg)
3
Analyzing Social Networks – The World of MetricsSocial Network Analysis
Social scientists, physicists, computer scientists, and mathematicians have collaborated to create theories and
algorithms for calculating novel measurements of social networks and the people and things that populate them. These quantitative network metrics allow analysts to systematically dissect the
social world, creating a basis on which to compare networks, track changes in a network over time, and determine the relative position of individuals and clusters within a network.
![Page 4: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/4.jpg)
4
Wide Variety of Metrics AvailableSocial Network Analysis
![Page 5: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/5.jpg)
5
Best Known Social MetricSocial Network Analysis
6
![Page 6: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/6.jpg)
6
6 Degrees of SeparationSocial Network Analysis
A fascinating game grew out of this discussion. One of us suggested performing the following experiment to prove that the population of the Earth is closer together now than they have ever been before. We should select any person from the 1.5 billion inhabitants of the Earth—anyone, anywhere at all. He bet us that, using no more than five individuals, one of whom is a personal acquaintance, he could contact the selected individual using nothing except the network of personal acquaintances.
Frigyes Karinthy , Chains, 1929
I1 I2 I3 I4 I5P1 P2
Degrees of separation ~ Average Path Length ~ Distance
![Page 7: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/7.jpg)
Why the notoriety?Social Network Analysis
7
John Guare1990
Frigyes Karinthy1929
Stanley Milgram1967
Duncan Watts1998
Six Degrees of Kevin Bacon (1994)
![Page 8: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/8.jpg)
8
The Oracle of BaconSocial Networks
![Page 9: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/9.jpg)
9
Oracle of Bacon gone mainstreamSocial Network Analysis
![Page 10: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/10.jpg)
10
How the Oracle of Bacon WorksSocial Network Analysis
![Page 11: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/11.jpg)
Center of the Hollywood UniverseSocial Network Analysis
11
![Page 12: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/12.jpg)
of Bachchan
Social Network Analysis
Amitabh Bachchan Abhishek BachchanAishwarya Rai Bachchan
![Page 13: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/13.jpg)
13
There’s a new game in townSocial Network Analysis
![Page 14: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/14.jpg)
14
The Center of BollywoodSocial Network Analysis
HTML- Page Source
Relational DB – 627 Movies – 1061 Actors (2010-2013) Pajek .NET File
Wikipedia Web Page
![Page 15: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/15.jpg)
15
Bipartite Network – Movies and ActorsSocial Network Analysis
Pajek .NET File
![Page 16: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/16.jpg)
16
Analyzing BollywoodSocial Network Analysis
![Page 17: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/17.jpg)
17
Rules of Networks
• RULE 1: WE SHAPE OUR NETWORK• RULE 2: OUR NETWORK SHAPES
US• RULE 3: OUR FRIENDS AFFECT US• RULE 4: OUR FRIENDS’ FRIENDS’
FRIENDS AFFECT US• RULE 5: THE NETWORK HAS A LIFE
OF ITS OWN
Social Networks
![Page 18: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/18.jpg)
18
Bifurcated MethodologySocial Network Analysis
Local Measures
Global Measures
![Page 19: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/19.jpg)
19
Walks and Paths
• Two points may be directly connected or indirectly connected thru a sequence of lines called a walk.
• If points and lines are distinct it’s a path.
• The length of a path is the number of edges (or lines) that make it up.
• The shortest length is called the distance or geodesic.
Social Network Analysis
Undirected
Directed
Sample Walks((C,A),(A,D))
((C,A),(A,B),(B,D))((C,A),(A,D),(D,A),(A,D))
A
B
CD
A
B
CD
Sample Walks((C,A),(A,D))
((C,A),(A,B),(B,D))
Sample Paths((C,A),(A,D))
((C,A),(A,B),(B,D))
Sample Paths((C,A),(A,D))
((C,A),(A,B),(B,D))
![Page 20: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/20.jpg)
Centrality – Who is most influential?Social Network Analysis
20
Measure Definition Interpretation ReasoningDegree Number of edges or links. In
degree- links in, Out-degree - links out
How connected is a node? How many people can this person reach directly?
Higher probability of receiving and transmitting information flows in the network. Nodes considered to have influence over larger number of nodes and or are capable of communicating quickly with the nodes in their neighborhood.
Betweenness Number of times node or vertex lies on shortest path between 2 nodes divided by number of all the shortest paths
How important is a node in terms of connecting other nodes? How likely is this person to be the most direct route between two people in the network?
Degree to which node controls flow of information in the network. Those with high betweenness function as brokers. Useful where a network is vulnerable.
Closeness 1 over the average distance between a node and every other node in the network
How easily can a node reach other nodes? How fast can this person reach everyone in the network?
Measure of reach. Importance based on how close a node is located with respect to every other node in the network. Nodes able to reach most or be reached by most all other nodes in the network through geodesic paths.
Eigenvector Proporational to the sum of the eigenvector centralities of all the nodes directly connected to it.
How important, central, or influential are a node’s neighbors? How well is this person connected to other well-connected people?
Evaluates a player's popularity. Identifies centers of large cliques. Node with more connections to higher scoring nodes is more important.
![Page 21: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/21.jpg)
Cohesion – How well connected?Social Network Analysis
21
Cohesion Definition Interpretation Reasoning
Density Ratio of the number of edges in the network over the total number of possible edges between all pairs of nodes
How well connected is the overall network?
Perfectly connected network is called a "clique" and has a density of 1.
Clustering A node's clustering coefficient is the density of it's 1.5 degree egocentric network (ratio of connecting among ego's alters). For entire network it is the average of all the coefficients for the individual nodes.
What proportion of ego's alters are connected? More technically, how many nodes form triangular subgraphs with their adjacent nodes?
Measures certain aspects of "cliquishness." Proportion of you friends that are also friends with each other. Another way to measure is to determine (in a undirected) graph the ratio of the number of times that two links eminating from the same node are also linked.
Average Path Length (Distance)
Average number of edges or links between any two nodes (along the shortest path)
On average, how far apart are any two nodes?
This is synonymous with the "degrees of separation" in a network.
Diameter Longest (shortest path) between any two nodes
At most, how long will it take to reach any node in the network? Sparse networks usually have greater diameters.
Measure of the reach of the network
Centralization Normalize ratio of the sum of the variances of the centrality of each node from the most central node to the maximum sum possible
Indicates how unequal the distribution of centrality is in a network.
Measures how much variance there is in the distribution of centrality in a network. The measure applies to all forms of centrality.
![Page 22: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/22.jpg)
22
Generic Example for
• For each of the nodes what is it’s – Degree Centrality– Betweenness Centrality– Closeness Centrality– Eigenvector Centrality
• For the entire network what is it’s– Degree Centralization– Betweenness Centralization– Closeness Centralization– Eigenvector Centralization
Social Network Analysis
A
B E
GD
FC
H
RI N
P
KO
J
ML
Q
S
![Page 23: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/23.jpg)
23
Visualizing and Analyzing with ORASocial Network Analysis
![Page 24: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/24.jpg)
Centrality – Who is most important?Social Network Analysis
24
Eigenvector
Degree
Closeness
Betweenness
A
B E
GD
FC
H
RI N
P
KO
J
ML
Q
S
![Page 25: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/25.jpg)
Cohesion – How well connected?Social Network Analysis
25
Measure ValueNetwork Size 19Average Degree 3.37Degree Centralization 0.22Betweenness Centralization 0.48Closeness Centralization 0.27Eigenvector Centralization 0.56Clustering Coefficient 0.43Density 0.19Average Distance 3.06Diameter 8Number of Unreachable Nodes 0
Node ClusteringA 0.67B 0.67C 0.00D 0.40E 1.00F 0.50G 0.50H 0.10I 0.33J 0.29K 0.67L 0.67
M 0.33N 0.67O 1.00P 0.67Q 0.00R NAS NA
A
B E
GD
FC
H
RI N
P
KO
J
ML
Q
S
![Page 26: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/26.jpg)
26
The World of FacebookSocial Network Analysis
4.5B
Likes Daily
67%
53%F/47%M
Sex Ratio
.2B Users
Europe
4.8B
Pieces of ContentShared Daily
94%
1.2B
Active Users
18% .7B
Logon Daily
25%
300M/day
Photo Uploads
1/5 on FB
Of Total WebPage Views
.9B
Mobile Users
45%
5/second
New Profiles
30% of users
Age 24-34
20 minutes
Avg TimePer Visit
![Page 27: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/27.jpg)
27
Facebook in the WorldSocial Network Analysis
facebookstories.com/stories/1574/#color=language-official&story=1&country=SA
![Page 28: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/28.jpg)
28
Your World of FacebookSocial Network Analysis
app.thefacesoffacebook.com apps.facebook.com/challenger_meurs/?fb_source=appcenter&fb_appcenter=1
apps.facebook.com/touchgraph/Geographical.cz/socialMap/
![Page 29: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/29.jpg)
29
Egocentric Network – Facebook ExampleSocial Network Analysis
![Page 30: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/30.jpg)
30
Egocentric Network – Facebook ExampleSocial Network Analysis
![Page 31: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/31.jpg)
Egocentric Network – Facebook ExampleSocial Network Analysis
31
Netvizz v0.93 .gdf (GUESS) file format
One simple way to generate SNA data from Facebook
![Page 32: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/32.jpg)
32
Egocentric Network – Facebook ExampleSocial Network
Cleansed-Augmented Guess .GDF File used for analysis
guess.wikispot.org/The_GUESS_.gdf_format
![Page 33: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/33.jpg)
33
Egocentric Networks – by DegreeSocial Network Analysis
1.0 1.52.0
Ego
(Ego’s) Alter
Alter’s Alter
![Page 34: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/34.jpg)
34
Egocentric Network – Facebook ExampleSocial Network Analysis
![Page 35: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/35.jpg)
Egocentric Network – Facebook ExampleSocial Network Analysis
35
Getting Started With The Gephi Network Visualisation App
– My Facebook Network, Part I
blog.ouseful.info/2010/04/16/getting-started-with-gephi-network-
visualisation-app-my-facebook-network-part-i/
![Page 36: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/36.jpg)
36
Egocentric Network – Facebook ExampleSocial Network Analysis
PL = N(N-1)/2 = 3081 Ego Density = L/PL = 312/3081 =.10
Matrix Representation Graph Representation
![Page 37: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/37.jpg)
37
Egocentric Network – Facebook ExampleSocial Network Analysis
(Avg 7.9)
(Avg .26)
(Avg 7.9)
(Avg 32.0)
(Avg 2.7)
(Avg .66)
![Page 38: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/38.jpg)
38
Another Analytical Alternative - NodeXLSocial Network Analysis
![Page 39: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/39.jpg)
Facebook AnalysisSocial Network Analysis
39
The emergence of online social networking services over the past decade has revolutionized how social scientists study the structure of human … previously invisible social structures are being captured at tremendous scale and with unprecedented detail.
Accessed within 28 days of May ’11At least one friend
Over 13 years of age
![Page 40: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/40.jpg)
40
• Nearly fully connected, with 99.91% of individuals belonging to a single large connected component
• Confirm ‘six degrees of separation’ phenomenon on a global scale. Second, by studying
• While the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense
• Clear degree assortativity.
• Strong impact of age on friendship preferences as well as a globally modular community structure driven by nationality, but no gender homophily.
Social Network Analysis
![Page 41: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/41.jpg)
Facebook AnalysisSocial Network Analysis
41
14% for 100
Assortativity P(F|F) = .52P(F|M) = .51
![Page 42: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/42.jpg)
Facebook AnalysisSocial Network Analysis
42
Average4.7
Average4.3
World 92% 99.6%US 96% 99.7%
![Page 43: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/43.jpg)
43
Looking for Cohesive SubgroupsSocial Network Analysis
![Page 44: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/44.jpg)
44
Cohesive Subgroups
Many individuals have large clusters of friends corresponding to well-defined foci of interaction in their lives, such as their cluster of co-workers or the cluster of people with whom they attended college. Since many people within these clusters know each other, the clusters contain links of very high embeddedness, even though they do not necessarily correspond to particularly strong ties.
Sociology
![Page 45: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/45.jpg)
45
Cohesive Subgroups
• In a complete graph all the nodes are adjacent to one another
• A network is connected if every two nodes in the network are connected by some path in the network– A directed graph is strongly connected if and only if for
every pair of vertices x , y, there is a directed path from x to y and a directed path from y to x between them. It is weakly connected if there is at least an undirected path for every pair of vertices.
• A graph G has components G1 and G2 if no (undirected) path exists between any node in G1 to any node of G2 (component is an isolated subgraph)
• A link is a bridge in a network if it’s removal increases the number of components in the network
Social Network Analysis
A
B E
GD
FC
H
RI N
P
KO
J
ML
Q
S
![Page 46: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/46.jpg)
46
Cohesive Subgroups
• A clique is a maximal complete subnetwork containing three vertices or more.
• n-clique is a maximal subnetwork in which the maximum path length for connection is “n” (e.g. 2-clique has nodes connected by paths of 1 or 2 lengths).
• A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork (e.g. 3-core)
• Girvan-Newman Algorithm breaks complex networks into “communities” by removing edges (betweenness) resulting in fragmentation
Social Network Analysis
A
B E
GD
FC
H
RI N
P
KO
J
ML
Q
S
![Page 47: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/47.jpg)
47
Social Network Analysis
![Page 48: Mining and analyzing social media part 2 - hicss47 tutorial - dave king](https://reader036.vdocuments.us/reader036/viewer/2022062702/5549e78eb4c90512488b4c2c/html5/thumbnails/48.jpg)
48
It’s a Small World after all, or is it?Social Network Analysis
6.0 vs. 4.7