social network analysis - utkweb.eecs.utk.edu/~cphill25/cs594_spring2017/...
Social Network Analysis
Colin Bird, Clarence Jackson, Brett Hagan
Questions
1. In what year did Leo Katz introduce Katz Centrality?
2. What clustering algorithm did we implement?
3. What was the name of the professor whose student adopted using lines and
points to represent social relations?
Colin Bird
Clarence Jackson II
Major: Computer Science
Advisor: Dr. Michael Langston
Research: Graph Theoretics, Machine Learning
Hometown: Flint, MI
Interests: Basketball, gaming, math, tech
Brett Hagan
Computer Science Undergraduate, Senior
Born in Morristown, TN
Hobbies include music production, gaming, and Tennessee athletics
Presentation Outline
Overview
History
Algorithms
Applications
Implementations
Open Issues
Social Networks
What is a social network?
A collection of social entities and their interactions. Typically, these entities are
people, but they could represent other things.
Social Network Analysis (SNA)
What is social network analysis?
The process of investigating social structures through the use of networks
and graph theory. SNA assumes non-randomness, or locality. This condition is
the hardest to formalize, but the intuition is that relationships tend to
cluster: two people with a friend in common are more likely to know each other.
Examples of Social Networks
Telephone Networks
Email Networks
Collaboration Networks
History of SNA
Social structure developed as one of the early key concepts
in the social sciences.
Early 20th century scientists began to systematically theorize
social relationships.
Mathematical and computational models form the basis of
more recent applications.
History of SNA
Leopold von Wiese (1924/1932) - Adopted the use of lines
and points to describe social relations.
Jacob Moreno (1934) - Introduced the idea of
depicting social structure as a network diagram
(‘sociometry’)
History of SNA
Leo Katz (1953) - Measure of centrality in a network,
used to measure degrees of influence in a social
network.
Ithiel de Sola Pool and Manfred Kochen (1978) -
Introduced small world model quantifying the distance
between people through chains of connections.
Robert Luce & Albert Perry (1949) - First to use graph-theoretic
methods in SNA, specifically to identify cliques.
Centrality
Centrality: Indicators of centrality identify the most important vertices in a graph
Intuitively, the nodes with the highest centrality in a social network are usually
the ones we are most interested in
Eigenvector centrality was an early method, but it is inapplicable to directed
acyclic graphs, where its scores collapse to zero
Katz Centrality: Introduced by Leo Katz in 1953
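Why eigenvector centrality fails on directed acyclic graphs can be seen in a few lines. The sketch below uses a hypothetical four-node DAG chosen only for illustration and repeatedly applies the unnormalized eigenvector-centrality update, in which each node sums its in-neighbors' scores. Because a DAG has no cycles, every score drains away once the number of steps exceeds the longest path, leaving no nonzero vector to normalize.

```python
# Hypothetical DAG used only for illustration: a -> b -> c -> d, plus a -> c.
NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def propagate(nodes, edges, steps):
    """Apply x <- A^T x `steps` times, starting from all ones.

    Each node's new score is the sum of its in-neighbors' scores, which is the
    eigenvector-centrality update before normalization.
    """
    x = {v: 1.0 for v in nodes}
    for _ in range(steps):
        nxt = {v: 0.0 for v in nodes}
        for u, v in edges:  # edge u -> v passes u's score on to v
            nxt[v] += x[u]
        x = nxt
    return x


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, propagate(NODES, EDGES, k))
```

By k = 4 (one more than the longest path, a -> b -> c -> d) every score is zero. Normalizing at each step, as eigenvector centrality does, only rescales the vector and cannot prevent this.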
Katz Centrality
Computes the relative influence of individual nodes within a network
Takes into account the total number of walks between pairs of nodes
Let A be the adjacency matrix of the network under consideration. Element A_ij
of A takes the value 1 if node i is connected to node j and 0 otherwise. The
powers of A indicate the presence (or absence) of links between two nodes through
intermediaries: (A^k)_ij counts the walks of length k from i to j.
α is an attenuation factor (0 < α < 1) that penalizes connections made with
distant nodes: a walk of length k is weighted by α^k
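Written out, the standard Katz definition sums, over every walk length k, the number of walks that end at node i, each weighted by α^k. The closed form and convergence condition below are the usual textbook ones; λ_max (the largest eigenvalue of A) and the all-ones vector **1** are notation not used on the slides.

```latex
C_{\mathrm{Katz}}(i) = \sum_{k=1}^{\infty} \sum_{j=1}^{n} \alpha^{k} \, (A^{k})_{ji},
\qquad
\mathbf{c}_{\mathrm{Katz}} = \left( (I - \alpha A^{\mathsf{T}})^{-1} - I \right) \mathbf{1},
\qquad
0 < \alpha < \frac{1}{\lambda_{\max}(A)}
```

If α is at or above 1/λ_max, the series diverges and the score is undefined.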
Katz Centrality
Attenuation factor = .5
[Figure: example network with nodes Tim, John, Ben, Peter, Ashley, Eric, and Sarah]
John and Ben are neighbors of Tim, so the weight assigned to these edges is (.5)^1 = .5
Katz Centrality
Attenuation factor = .5
Peter, Ashley, and Eric are a path length of two away from Tim, so the weight of each
of these paths is (.5)^2 = .25
Katz Centrality
Attenuation factor = .5
Sarah is a path length of three away from Tim, so the weight of that path is
(.5)^3 = .125
Katz(Tim) = 2(.5) + 3(.25) + (.125) = 1.875
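The slide's arithmetic can be reproduced with a short breadth-first search. Note that the slides weight each node once, by α raised to its shortest-path distance from Tim. The full Katz definition counts every walk, so this is a simplified, distance-weighted version. The edge list is hypothetical, because the slides show the network only as a picture. It was chosen so that the distances from Tim match the slides: John and Ben at 1; Peter, Ashley, and Eric at 2; Sarah at 3.

```python
from collections import deque

# Hypothetical edges: the slides show the network only as a picture, so these
# were chosen to reproduce the distances from Tim used in the example.
EDGES = [
    ("Tim", "John"), ("Tim", "Ben"),
    ("Ben", "Peter"), ("Ben", "Ashley"),
    ("John", "Eric"), ("Eric", "Sarah"),
]


def build_graph(edges):
    """Undirected graph as a dict of neighbor sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest-path (hop) distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_weighted_score(graph, source, alpha):
    """Sum alpha**d over every other reachable node, d = shortest-path distance."""
    dist = bfs_distances(graph, source)
    return sum(alpha ** d for node, d in dist.items() if node != source)


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(round(distance_weighted_score(g, "Tim", 0.5), 3))  # 1.875
    print(round(distance_weighted_score(g, "Tim", 0.3), 3))  # 0.897
```

The α = .3 result matches the value on the next slide.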
Katz Centrality
Attenuation factor = .3
Lowering the attenuation factor causes longer paths to have less influence:
neighbors now count (.3)^1 = .3, nodes two away (.3)^2 = .09, and Sarah (.3)^3 = .027
Katz(Tim) = 2(.3) + 3(.09) + (.027) = .897
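For comparison, here is a hedged sketch of the walk-counting Katz score from the definition given earlier, computed as a truncated power series in plain Python. It uses a hypothetical edge list chosen to match the slides' distances from Tim. It uses α = .3 because, for this particular graph, α = .5 lies close to the convergence limit 1/λ_max and the series would need many more terms.

```python
# Hypothetical edges (the slides show the network only as a picture), chosen so
# that John and Ben are 1 step from Tim, Peter, Ashley and Eric 2, and Sarah 3.
EDGES = [
    ("Tim", "John"), ("Tim", "Ben"),
    ("Ben", "Peter"), ("Ben", "Ashley"),
    ("John", "Eric"), ("Eric", "Sarah"),
]


def build_graph(edges):
    """Undirected graph as a dict of neighbor sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def katz_walks(graph, source, alpha, max_len=500, tol=1e-12):
    """Truncated Katz series: sum over k >= 1 of alpha**k times the number of
    walks of length k starting at `source`.

    In an undirected graph this equals the number of walks ending at `source`,
    so it matches the usual definition. The series converges only when
    alpha < 1 / lambda_max (largest eigenvalue of the adjacency matrix).
    """
    walks = {v: 0 for v in graph}  # walks[v] = walks of the current length ending at v
    walks[source] = 1
    total = 0.0
    for k in range(1, max_len + 1):
        nxt = {v: 0 for v in graph}
        for u, count in walks.items():
            for v in graph[u]:
                nxt[v] += count  # extend every walk ending at u by one edge
        walks = nxt
        term = alpha ** k * sum(walks.values())
        total += term
        if term < tol:  # remaining terms are negligible
            break
    return total


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(katz_walks(g, "Tim", 0.3))
```

The result is noticeably larger than the .897 above, because walks that revisit nodes, such as Tim → John → Tim, also contribute.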
Clustering
Graph Clustering: Finding sets of related vertices in a graph
From a Social Network perspective, we might expect different clusters of users
to have sets of common friends or interests
The Markov Clustering Algorithm (MCL)
Developed by Stijn van Dongen in 2000 at the Centre for Mathematics and
Computer Science in the Netherlands
Markov Clustering Algorithm (MCL)
Markov Chain: Sequence of random variables where, given the present state, the
future state is independent of the past states
In the MCL, the Markov Chain is a random walk on the graph, described by a
column-stochastic transition matrix (each column is a probability distribution summing to 1)
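A minimal sketch of the transition matrix that MCL works with, assuming a 0/1 adjacency matrix stored as a list of lists. Adding self-loops before normalizing is a common MCL convention, not something the slides state. The hypothetical `step` helper shows the Markov property: the next distribution is computed from the current one alone.

```python
def column_stochastic(adjacency, self_loops=True):
    """Turn a square 0/1 adjacency matrix (list of lists) into a column-stochastic
    matrix: entry [i][j] is the probability that a random walk at node j steps
    to node i, so every column sums to 1.

    Adding self-loops (A + I) first is a common MCL convention, assumed here.
    """
    n = len(adjacency)
    m = [[float(adjacency[i][j]) for j in range(n)] for i in range(n)]
    if self_loops:
        for i in range(n):
            m[i][i] += 1.0
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        if col_sum > 0:  # an isolated node without a self-loop keeps a zero column
            for i in range(n):
                m[i][j] /= col_sum
    return m


def step(m, p):
    """One Markov-chain step: the next distribution depends only on the current one."""
    n = len(m)
    return [sum(m[i][j] * p[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
    m = column_stochastic(path)
    print(step(m, [1.0, 0.0, 0.0]))  # walker starts at node 0 -> [0.5, 0.5, 0.0]
```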
The Markov Clustering Algorithm (MCL)
During early powers of the Markov Chain transition matrix, flow (edge weight) is
higher along links within clusters and lower along links between clusters, because
random walks tend to stay inside densely connected regions
The MCL boosts this effect using two mathematical operations
Expansion: Taking powers of the Markov Chain transition matrix
Inflation: Raising each entry to a power r (usually r > 1), followed by column normalization
The Markov Clustering Algorithm (MCL)
Expansion: Allows flow to connect different parts of the graph
Inflation: Strengthens strong currents and weakens weak ones.
Corresponds to taking the Hadamard (entrywise) power of the matrix, followed by column normalization.
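The two operations can be sketched in plain Python on a small column-stochastic matrix. The function names `expand` and `inflate`, the defaults e = 2 and r = 2, and the example matrix are illustrative assumptions.

```python
def matmul(a, b):
    """Plain square matrix product of two lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expand(m, e=2):
    """Expansion: raise the column-stochastic matrix m to the integer power e >= 1."""
    result = m
    for _ in range(e - 1):
        result = matmul(result, m)
    return result


def inflate(m, r=2.0):
    """Inflation: Hadamard (entrywise) power r, then rescale each column to sum to 1."""
    n = len(m)
    powered = [[m[i][j] ** r for j in range(n)] for i in range(n)]
    for j in range(n):
        col_sum = sum(powered[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                powered[i][j] /= col_sum
    return powered


if __name__ == "__main__":
    # A small column-stochastic matrix chosen for illustration.
    m = [[0.5, 0.3, 0.2],
         [0.3, 0.5, 0.3],
         [0.2, 0.2, 0.5]]
    print([round(row[0], 3) for row in inflate(m)])  # [0.658, 0.237, 0.105]
```

Inflation turns column 0's (.5, .3, .2) into roughly (.658, .237, .105): the largest entry grows and the smallest shrinks, while expansion keeps every column summing to 1.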
The Markov Clustering Algorithm (MCL)
1. Input graph, power parameter e, and inflation parameter r
2. Create adjacency matrix
3. Normalize the columns of the adjacency matrix to create a column-stochastic (probability) matrix
4. Expansion: raise the matrix to the power e
5. Inflation: apply the inflation operation with parameter r
6. Alternate between expansion and inflation until convergence
7. Interpret the resulting matrix to extract the clusters
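Putting the seven steps together gives the following minimal sketch. It is not a reference implementation: the self-loops, the stopping tolerance, and the step 7 rule (assign each node to the row holding most of its column's mass) are simplifying assumptions. The example graph is two triangles joined by a single edge, which MCL should split into the two triangles.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (all-zero columns are left alone)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    """Plain square matrix product of two lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, e=2, r=2.0, max_iter=100, tol=1e-9):
    """Minimal Markov Clustering sketch on a 0/1 adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Steps 2-3: adjacency plus self-loops (assumed convention), made column-stochastic.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _iteration in range(max_iter):
        prev = m
        # Step 4: expansion, the matrix raised to the power e.
        expanded = prev
        for _power in range(e - 1):
            expanded = matmul(expanded, prev)
        # Step 5: inflation, entrywise power r followed by column normalization.
        m = normalize_columns([[x ** r for x in row] for row in expanded])
        # Step 6: stop once an iteration no longer changes the matrix.
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Step 7 (simplified): node j joins the cluster of the "attractor" row
    # that holds most of column j's mass.
    clusters = {}
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        clusters.setdefault(attractor, []).append(j)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0] * 6 for _ in range(6)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    print(mcl(adj))
```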
Applications
Crime (FBI, NSA, terror prevention)
Health care (primary care pattern analysis)
Mining social media data
Implementation and results
● First, we needed to collect the data
● We started by using tools we had already created
● Used open source libraries for betweenness centrality
● Implemented Markov Clustering
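The slides do not name the open source library used for betweenness centrality, so the following is not that library's code. It is a from-scratch sketch of Brandes' algorithm for unweighted, undirected graphs stored as a dict mapping each node to a set of neighbors. Scores are unnormalized, with each pair of endpoints counted once.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbors)}."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []                      # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in graph}  # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {v: -1 for v in graph}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


if __name__ == "__main__":
    star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
    print(betweenness(star))  # the hub lies on the path between every pair of leaves
```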
Cooking the data
● Facebook
○ Click Farm 1 (500 bots ~ 20% densely interconnected)
○ Click Farm 2 (500 people)
● Twitter
○ Click Farm 1 (10k bots ~ 2% densely interconnected)
○ Click Farm 2 (10k people)
MCL Clustering of Facebook w/ Click Farms
MCL Facebook actual
Facebook w/ MCL Clustering close up
Without MCL Clustering
With MCL Reduction
Open Issues
Adversarial nature of the problem (for click farms)
False positives are harmful
Large, dynamic data
References
Otte, Evelien; Rousseau, Ronald (2002). "Social network analysis: a powerful strategy, also for the information sciences". Journal of Information Science. 28 (6): 441–453. doi:10.1177/016555150202800601.
Katz, Leo (1953). "A New Status Index Derived from Sociometric Analysis". Psychometrika. 18 (1): 39–43.
http://phya.snu.ac.kr/~dkim/PRL87278701.pdf
https://www.researchgate.net/publication/281368621_Network_Analysis_History_of
Discussion
Questions
1. In what year did Leo Katz introduce Katz Centrality?
2. What clustering algorithm did we implement?
3. What was the name of the professor whose student adopted using lines and
points to represent social relations?