social network analysis - utkweb.eecs.utk.edu/~cphill25/cs594_spring2017/...social networks what is...


Social Network Analysis

Colin Bird, Clarence Jackson, Brett Hagan

Questions

1. In what year did Leo Katz introduce Katz Centrality?

2. What clustering algorithm did we implement?

3. What was the name of the professor whose student adopted using lines and points to represent social relations?

Colin Bird

Clarence Jackson II

Major: Computer Science

Advisor: Dr. Michael Langston

Research: Graph Theoretics, Machine Learning

Hometown: Flint, MI

Interests: Basketball, gaming, math, tech

Brett Hagan

Computer Science Undergraduate, Senior

Born in Morristown, TN

Hobbies include music production, gaming, and Tennessee athletics

Presentation Outline

Overview

History

Algorithms

Applications

Implementations

Open Issues

Social Networks

What is a social network?

A collection of social entities and their interactions. Typically, these entities are people, but they could represent other things.

Social Network Analysis (SNA)

What is social network analysis?

The process of investigating social structures through the use of networks and graph theory. SNA assumes non-randomness, or locality. This condition is the hardest to formalize, but the intuition is that relationships tend to cluster.

Examples of Social Networks

Telephone Networks

Email Networks

Collaboration Networks

History of SNA

Social structure developed as one of the early key concepts in the social sciences.

Early 20th-century scientists began to theorize social relationships systematically.

Mathematical and computational models underpin more recent applications.

History of SNA

Leopold von Wiese (1924/1932) - Adopted using lines and points to describe social relations.

Jacob Moreno (1934) - Introduced the idea of depicting social structure as a network diagram ('sociometry')

History of SNA

Leo Katz (1953) - Introduced a measure of centrality in a network, used to measure degrees of influence in a social network.

Ithiel de Sola Pool and Manfred Kochen (1978) - Introduced the small-world model, quantifying the distance between people through chains of connections.

Robert Luce & Albert Perry (1949) - First to apply graph theory to SNA, specifically to the study of cliques.

Centrality

Centrality: Indicators of centrality identify the most important vertices in a graph.

Intuitively, the nodes in a social network with the highest centrality are probably the ones we are most interested in.

Eigenvector centrality was an early method, but it is inapplicable to directed acyclic graphs (DAGs), where it assigns every node a score of zero.
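To see why eigenvector centrality breaks down on DAGs, here is a minimal sketch on a hypothetical three-node DAG. The adjacency matrix of a DAG is nilpotent, so repeatedly applying it, as power iteration does, drives every score to zero within n steps. The per-step normalization of real power iteration is skipped here because rescaling cannot revive a zero vector.

```python
def in_neighbor_sums(adj, scores):
    """One power-iteration step (A^T x): each node's new score is the sum
    of the scores of the nodes that point to it."""
    n = len(adj)
    return [sum(adj[j][i] * scores[j] for j in range(n)) for i in range(n)]


# Hypothetical DAG with edges 0 -> 1, 0 -> 2, 1 -> 2 (adj[i][j] = 1 for i -> j).
adj = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

scores = [1, 1, 1]  # start every node with the same score
history = [scores]
for _ in range(len(adj)):
    scores = in_neighbor_sums(adj, scores)
    history.append(scores)

for step, s in enumerate(history):
    print(step, s)
```

The scores go from [1, 1, 1] to [0, 1, 2], then [0, 0, 1], and finally [0, 0, 0]: no walk in a DAG can be longer than n - 1 edges, so nothing survives.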

Katz Centrality: Introduced by Leo Katz in 1953

Katz Centrality

Computes the relative influence of individual nodes within a network

Takes into account the total number of walks between pairs of nodes

Let $A$ be the adjacency matrix of the network under consideration. Its elements $A_{ij}$ take the value 1 if node $i$ is connected to node $j$ and 0 otherwise. The powers of $A$ indicate the presence (or absence) of links between two nodes through intermediaries: $(A^k)_{ij}$ counts the walks of length $k$ from node $i$ to node $j$.

𝛂 is an attenuation factor that penalizes connections with distant nodes: a walk of length $k$ is weighted by $\alpha^k$.
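For reference, the standard matrix formulation of Katz centrality, as found in textbook treatments rather than stated on these slides, sums the $\alpha^k$-weighted walk counts over all walk lengths $k$, using the same $A$ and $\alpha$ defined above:

$$
C_{\mathrm{Katz}}(i) = \sum_{k=1}^{\infty} \sum_{j=1}^{n} \alpha^{k} \, (A^{k})_{ji}
$$

The series converges only when $\alpha < 1/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of $A$. In that case, writing $\mathbf{1}$ for the all-ones vector, all scores can be computed at once:

$$
\mathbf{c} = \left( (I - \alpha A^{T})^{-1} - I \right) \mathbf{1}
$$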

Katz Centrality

[Figure: example network of Peter, Ashley, Ben, Tim, John, Eric, and Sarah, labeled with each node's weight relative to Tim]

Attenuation factor = .5

John and Ben are neighbors of Tim, so the weight assigned to each of them is (.5)^1 = .5

Peter, Ashley, and Eric are a path length of two away from Tim, so each receives a weight of (.5)^2 = .25

Sarah is a path length of three away from Tim, so her weight is (.5)^3 = .125

Katz(Tim) = 2(.5) + 3(.25) + (.125) = 1.875

Attenuation factor = .3

Lowering the attenuation factor causes longer paths to have less influence. The weights become .3 for Tim's two neighbors, .09 for the three nodes two steps away, and .027 for Sarah.

Katz(Tim) = 2(.3) + 3(.09) + (.027) = .897
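The worked example can be reproduced with a short sketch. The slides give only each node's distance from Tim, so the exact edge list below is a hypothetical wiring chosen to match those distances. The example weights each node by α raised to its shortest-path distance, which is a simplification: the formal Katz score counts every walk, including ones that revisit nodes. The hypothetical `katz_walk_sum` function approximates that full sum by truncating it at a fixed walk length, which is only meaningful when α < 1/λ_max.

```python
from collections import deque

# Hypothetical edges: the slides only fix that John and Ben are Tim's
# neighbors, Peter, Ashley, and Eric are two hops away, and Sarah is three
# hops away. This particular wiring is an assumption consistent with that.
EDGES = [
    ("Tim", "John"), ("Tim", "Ben"),
    ("Ben", "Peter"), ("Ben", "Ashley"),
    ("John", "Eric"), ("Eric", "Sarah"),
]


def build_graph(edges):
    """Return an undirected adjacency list (dict of lists)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph


def distance_weighted_score(graph, source, alpha):
    """Slide-style score: sum of alpha**d over every other node, where d is
    that node's shortest-path distance from source (found by BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return sum(alpha ** d for node, d in dist.items() if node != source)


def katz_walk_sum(graph, target, alpha, max_len=20):
    """Truncated Katz sum: alpha**k times the number of walks of length k
    (from any start node) that end at target, for k = 1..max_len."""
    walks = {v: 1 for v in graph}  # one walk of length 0 starts at each node
    total = 0.0
    for k in range(1, max_len + 1):
        # Walks of length k ending at v extend walks of length k-1 ending at a neighbor.
        walks = {v: sum(walks[u] for u in graph[v]) for v in graph}
        total += alpha ** k * walks[target]
    return total


graph = build_graph(EDGES)
print(distance_weighted_score(graph, "Tim", 0.5))            # 1.875
print(round(distance_weighted_score(graph, "Tim", 0.3), 3))  # 0.897
print(katz_walk_sum(graph, "Tim", 0.3))  # larger: also counts longer, repeating walks
```

The walk-counting version is always at least as large as the distance-weighted one, since every shortest path is also a walk.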

Clustering

Graph Clustering: Finding sets of related vertices in a graph

From a social network perspective, we might expect each cluster of users to share a set of common friends or interests

The Markov Clustering Algorithm (MCL)

Developed by Stijn van Dongen in 2000 at the Centre for Mathematics and Computer Science in the Netherlands

Markov Clustering Algorithm (MCL)

Markov Chain: A sequence of random variables in which, given the present state, the past and future states are independent

In MCL, the Markov chain is represented by column-stochastic transition matrices, whose columns hold the probabilities of stepping from one node to each of its neighbors

The Markov Clustering Algorithm (MCL)

In early powers of the Markov chain's transition matrix, flow is higher on links within clusters and lower on links between clusters, because short random walks tend to stay inside a cluster

MCL boosts this effect using two mathematical operations:

Expansion: Taking powers of the Markov chain transition matrix

Inflation: Raising each entry to a power r (typically r > 1), followed by column normalization

The Markov Clustering Algorithm (MCL)

Expansion: Allows flow to reach different parts of the graph

Inflation: Strengthens strong currents and weakens weak ones. Corresponds to taking the Hadamard (entry-wise) power of the matrix, followed by normalization.
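The two operations can be sketched in plain Python as below. This is a minimal illustration, not the presenters' implementation: matrices are lists of rows, and the 2x2 matrix is hypothetical. Inflating a column that splits flow 0.6/0.4 with r = 2 sharpens it toward the stronger entry, while an already uniform column stays unchanged. Expansion keeps every column summing to 1.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expand(m, e):
    """Expansion: raise the stochastic matrix m to the integer power e >= 1."""
    result = m
    for _ in range(e - 1):
        result = mat_mult(result, m)
    return result


def inflate(m, r):
    """Inflation: Hadamard (entry-wise) power r, then rescale each column
    so that it sums to 1 again."""
    n = len(m)
    powered = [[x ** r for x in row] for row in m]
    col_sums = [sum(powered[i][j] for i in range(n)) for j in range(n)]
    return [[powered[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


# Hypothetical column-stochastic matrix: column 0 splits flow 0.6 / 0.4,
# column 1 splits it evenly.
m = [[0.6, 0.5],
     [0.4, 0.5]]

print(inflate(m, 2))  # column 0 becomes roughly [0.692, 0.308]; column 1 stays [0.5, 0.5]
print(expand(m, 2))   # columns still sum to 1
```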

The Markov Clustering Algorithm (MCL)

1. Input graph, power parameter e, and inflation parameter r

2. Create the adjacency matrix

3. Normalize each column of the adjacency matrix to create a stochastic (transition probability) matrix

4. Expansion: raise the matrix to the power e

5. Inflation: apply the inflation operation with parameter r

6. Alternate between expansion and inflation until convergence

7. Interpret the resulting matrix to extract the clusters
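The seven steps can be sketched end to end as follows. This is a minimal plain-Python version for small undirected graphs, not the implementation used in this project. Adding self-loops in step 2 is an assumption borrowed from common MCL practice, and the two-triangle test graph is hypothetical. In step 7, rows that still carry flow after convergence are treated as attractors, and the nonzero entries of each such row form one cluster.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(n, edges, e=2, r=2.0, max_iter=100, tol=1e-9):
    """Minimal Markov Clustering sketch for an undirected graph on nodes 0..n-1."""
    # Step 2: adjacency matrix. Self-loops are an assumption (common MCL
    # practice that keeps random walks from oscillating).
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    for i in range(n):
        adj[i][i] = 1.0

    # Step 3: column-normalize into a stochastic matrix.
    m = normalize_columns(adj)

    # Steps 4-6: alternate expansion and inflation until the matrix stops changing.
    for _iteration in range(max_iter):
        expanded = m
        for _ in range(e - 1):
            expanded = mat_mult(expanded, m)
        inflated = normalize_columns([[x ** r for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break

    # Step 7: each row that still carries flow is an attractor; its nonzero
    # entries list the members of one cluster (duplicates are skipped).
    clusters = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical test graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(mcl(6, edges))  # [[0, 1, 2], [3, 4, 5]]
```

Raising r makes inflation more aggressive and tends to produce more, smaller clusters. The dense-list representation keeps the sketch readable but is only practical for small graphs; large social networks need sparse matrices and pruning of tiny entries.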

Applications

Crime (FBI, NSA, terrorism prevention)

Health care (primary-care pattern analysis)

Mining social media data

Implementation and results

● First, we needed to collect data

● We started with tools we had already created

● Used open-source libraries for betweenness centrality

● Implemented Markov Clustering

Cooking the data

● Facebook

○ Click Farm 1 (500 bots ~ 20% densely interconnected)

○ Click Farm 2 (500 people)

● Twitter

○ Click Farm 1 (10k bots ~ 2% densely interconnected)

○ Click Farm 2 (10k people)

MCL Clustering of Facebook w/ Click Farms

MCL Facebook actual

Facebook w/ MCL Clustering close up

Without MCL Clustering

With MCL Reduction

Open Issues

Adversarial nature of the problem (for click farms)

False positives are harmful

Large, dynamic datasets

References

Otte, E., & Rousseau, R. (2002). Social network analysis: a powerful strategy, also for the information sciences. Journal of Information Science, 28(6), 441–453. doi:10.1177/016555150202800601

Katz, L. (1953). A New Status Index Derived from Sociometric Analysis. Psychometrika, 18(1), 39–43.

http://phya.snu.ac.kr/~dkim/PRL87278701.pdf

https://www.researchgate.net/publication/281368621_Network_Analysis_History_of?enrichId=rgreq-93ca7beab51cc6ca0390597397632fb3-XXX&enrichSource=Y292ZXJQYWdlOzI4MTM2ODYyMTtBUzoyNjg0MTYxNzkyNDA5NjNAMTQ0MTAwNjgxMjI4Nw%3D%3D&el=1_x_2&_esc=publicationCoverPdf

Discussion

