
Social Network Analysis

Outline

Background of social networks: definition, examples and properties

Data in social networks: data creation, flow and storage

Analytic tasks in social networks: problems, solutions and examples

Summary

What is a Social Network?

A definition from Wikipedia

– A social network is a social structure made up of a set of social actors (such as individuals or organizations) and a set of the dyadic ties between these actors.

– Social network analysis: analyze the structure of the whole network, identify local and global patterns, locate influential entities, and examine network dynamics.

Social Network Representation

Graph Representation Matrix Representation
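As a minimal sketch of the two representations, the snippet below stores the same small network first as an adjacency list (graph view) and then as an adjacency matrix (matrix view). The four actors and their ties are hypothetical, chosen only for illustration, and ties are assumed to be undirected.

```python
# Hypothetical 4-person network; the names and ties are illustrative only.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan")]

# Graph representation: adjacency list mapping each actor to its neighbors.
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# Matrix representation: entry [i][j] is 1 when actors i and j share a tie.
actors = sorted(adjacency)
index = {name: i for i, name in enumerate(actors)}
matrix = [[0] * len(actors) for _ in actors]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected tie, so the matrix is symmetric

for name, row in zip(actors, matrix):
    print(name, row)
```

The adjacency list is compact for sparse networks, while the matrix makes it easy to test whether any given pair is tied.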

Social Network: Examples

The Scale and Growth of Social Networks

Facebook statistics

– 829 million daily active users on average in June 2014

– 1.32 billion monthly active users as of June 30, 2014

– 81.7% of daily active users are outside the U.S. and Canada

– 22% increase in Facebook users from 2012 to 2013

Facebook activities (every 20 minutes on Facebook)

– 1 million links shared

– 2 million friends requested

– 3 million messages sent

http://newsroom.fb.com/company-info/

http://www.statisticbrain.com/facebook-statistics/

Visualizing Friendships on Facebook

The Scale and Growth of Social Networks

Twitter statistics

– 271 million monthly active users in 2014

– 135,000 new users signing up every day

– 78% of Twitter active users are on mobile

– 77% of accounts are outside the U.S.

Twitter activities

– 500 million Tweets are sent per day

– 9100 Tweets are sent per second

https://about.twitter.com/company

http://www.statisticbrain.com/twitter-statistics/

A Tweet Map of America

Properties of Large-Scale Social Networks

Scale-free distributions

Small-world effect

Strong community structure

Scale-free Distributions

Degree distribution in large-scale networks often follows a power law; that is, the fraction p(x) of nodes in the network having x connections to other nodes goes, for large values of x, as:

$$p(x) \sim x^{-\alpha}, \quad \alpha > 0$$

Also known as a long-tail distribution or scale-free distribution

Log-log Plot

A power law distribution becomes a straight line when plotted on a log-log scale

Friendship Network in Flickr Friendship Network in YouTube
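The straight line follows from taking logarithms: if p(x) is proportional to x raised to the power -alpha, then log p(x) equals a constant minus alpha times log x, so the slope on a log-log plot is -alpha. The sketch below recovers that slope with an ordinary least-squares fit. The data is synthetic and noise-free, and the exponent 2.5 is an assumed value, not a measurement from Flickr or YouTube; on real degree data a plain least-squares fit is only a rough estimate.

```python
import math

def loglog_slope(xs, ps):
    """Least-squares slope of log(p) against log(x)."""
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    mean_x = sum(lx) / len(lx)
    mean_p = sum(lp) / len(lp)
    num = sum((a - mean_x) * (b - mean_p) for a, b in zip(lx, lp))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Synthetic, noise-free data with an assumed exponent of 2.5 (not measured data).
alpha = 2.5
degrees = list(range(1, 101))
fractions = [x ** -alpha for x in degrees]

slope = loglog_slope(degrees, fractions)
print(round(slope, 3))  # the fitted slope is -alpha
```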

Small-world Effect

“Six Degrees of Separation”

A famous experiment conducted by Travers and Milgram (1969)

– Subjects were asked to forward a chain letter to an acquaintance in order to reach a target person

– The average path length is around 5.5

Verified on a planetary-scale IM network of 180 million users (Leskovec and Horvitz 2008)

– The average path length is 6.6

Facebook users (721 million) were separated by 4.74 degrees as of May 2011.

Diameter

Measures used to calibrate the small-world effect

– Diameter: the longest shortest-path distance in a network

– Average shortest path length

Example

– The shortest distance between node 1 and node 9 is 4.

– The diameter of the network is 5, corresponding to the shortest distance between nodes 2 and 9.

Shortest Path

The Longest Shortest Path
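Both measures can be computed with breadth-first search from every node. The sketch below does this for a 9-node network. The example figure is not reproduced in the text, so the edge list is an assumption, reconstructed to be consistent with the values quoted in these slides (distance 4 between nodes 1 and 9, diameter 5 between nodes 2 and 9).

```python
from collections import deque

# Assumed edge list for the 9-node example figure (not given in the text).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

def build_graph(edges):
    """Undirected adjacency list: node -> set of neighbors."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def bfs_distances(graph, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

graph = build_graph(EDGES)
all_dist = {s: bfs_distances(graph, s) for s in graph}

# Diameter: the largest shortest-path distance over all node pairs.
diameter = max(d for dists in all_dist.values() for d in dists.values())

# Average shortest path length over all unordered pairs of distinct nodes.
pairs = [(s, t, d) for s in graph for t, d in all_dist[s].items() if s < t]
avg_path = sum(d for _, _, d in pairs) / len(pairs)

print(all_dist[1][9], diameter)  # 4 5
```

Running BFS from every node is adequate for a toy network; on networks with millions of nodes the average path length is usually estimated from a sample of source nodes instead.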

Community Structure

Community: people in a group interact with each other more frequently than with those outside the group

Friends of a friend are likely to be friends as well

Measured by the clustering coefficient: the density of connections among one's friends

Clustering Coefficient

For node 6: degree d6 = 4, neighbors N6 = {4, 5, 7, 8}

k6 = 4 edges among those neighbors: e(4,5), e(5,7), e(5,8), e(7,8)

C6 = k6 / (d6 (d6 - 1) / 2) = 4/(4*3/2) = 2/3

Average clustering coefficient

C = (C1 + C2 + … + Cn)/n

C = 0.61 for the left network
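The calculation above can be sketched in a few lines. The network figure is not reproduced in the text, so the edge list below is an assumption, reconstructed to agree with the quoted values (neighbors of node 6, C6 = 2/3, and an average of 0.61). Nodes with fewer than two neighbors are given a coefficient of 0, which is a common convention and is assumed here.

```python
from itertools import combinations

# Assumed edge list for the 9-node example figure (not given in the text).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

graph = {}
for u, v in EDGES:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    nbrs = graph[node]
    d = len(nbrs)
    if d < 2:
        return 0.0  # convention: no neighbor pairs, so the coefficient is 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (d * (d - 1) / 2)

c6 = clustering_coefficient(graph, 6)
average = sum(clustering_coefficient(graph, n) for n in graph) / len(graph)
print(round(c6, 3), round(average, 2))  # 0.667 0.61
```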

Data in Social Networks

Data creation

Data flow

Data storage

Data Creation in Social Networks

User profiles and relationships

User-generated content

– Text (blogs, microblogs, messages, reviews, etc.): 500 million tweets are sent per day.

– Images, audio, and video: 100 hours of video are uploaded to YouTube every minute.

Distinction from Content in Traditional Media (Newspaper, TV, etc.)

Inexpensive to generate and publish

Widely accessible

Varying quality

Rich user interaction

Data Flow Architecture at Facebook

Hadoop: a distributed file system and map-reduce platform

Scribe: a distributed and scalable data bus that aggregates logs from web servers

Hive: a data warehousing framework for reporting, querying and analysis

Federated MySQL: contains all the Facebook site related data

[Thusoo et al., SIGMOD’10]

Data Storage at Facebook

The production cluster usually has to hold only one month’s worth of data

The ad hoc cluster needs to hold all the historical data, so that measures, models and hypotheses can be tested on historical data

Data is compressed with gzip, giving a compression factor of 6-7
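A compression factor is simply the raw size divided by the compressed size. The sketch below measures it with Python's standard `gzip` module on synthetic log lines. The log format is hypothetical, and highly repetitive synthetic data will not reproduce the 6-7 factor reported for real warehouse data; the point is only how the factor is computed.

```python
import gzip

# Synthetic log lines in a hypothetical format; real logs will compress differently.
lines = [f"12:00:{i % 60:02d} GET /profile/{i % 500} 200\n" for i in range(20000)]
raw = "".join(lines).encode("utf-8")

compressed = gzip.compress(raw)
factor = len(raw) / len(compressed)  # compression factor = raw size / compressed size

print(len(raw), len(compressed), round(factor, 1))
```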

Cold Data Storage

Facebook uses 10,000 Blu-ray discs to store a petabyte (=1,000,000 GB) of ‘cold’ data that hardly ever needs to be accessed, including duplicates of its users’ photos and videos that Facebook keeps for backup purposes.

The Blu-ray system reduces costs by 50% and energy use by 80% compared with its current cold-storage system, which uses hard disk drives.
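As a quick check on the figures quoted above, dividing the stated capacity by the stated disc count gives the implied capacity per disc (ignoring any redundancy overhead, which the text does not describe):

```latex
\frac{1{,}000{,}000\ \text{GB}}{10{,}000\ \text{discs}} = 100\ \text{GB per disc}
```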

Server Racks in Facebook’s Data Center

Data Analytic Tasks in Social Networks

Community detection

Friend recommendation

Importance of nodes

Influence propagation

Event detection

Community Detection

What is a Community?

Community: a group of individuals who interact with each other more frequently than with those outside the group

– a.k.a. group, cluster, cohesive subgroup, module in different contexts

Two types of groups in social networks

– Explicit groups: formed by user subscriptions

– Implicit Groups: implicitly formed by social interactions

Community Example

[McAuley and Leskovec, NIPS’2012]

Subjectivity of Community Definition

Each component is a community

A densely-knit community

Definition of a community can be subjective.

Community Detection

Community detection: discovering groups in a network where individuals’ group memberships are not explicitly given

Some social media sites allow people to join groups. Is it still necessary to extract groups based on network topology?

– Not all sites provide a community platform

– Not all people want to make the effort to join groups

– Groups can change dynamically

Community Detection based on Cliques

Clique: a maximum complete subgraph in which all nodes are adjacent to each other

In a clique of size k, each node maintains degree >= k-1 (for example, node 7 with degree 4)

Nodes with degree < k-1 will not be included in the clique (for example, node 9 with degree 1)

Nodes 5, 6, 7 and 8 form a clique of size 4

Maximum Clique Example

In order to find a clique of size > 3, repeatedly remove all nodes with degree <= 3-1 = 2

– Step 1. Remove nodes 2 and 9

– Step 2. Remove nodes 1 and 3

– Step 3. Remove node 4
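The pruning steps above can be sketched as a loop that keeps removing low-degree nodes until none remain. The example figure is not reproduced in the text, so the edge list is an assumption, reconstructed to be consistent with the steps and node degrees quoted in these slides. Pruning only narrows the candidates, so the sketch also checks that the surviving nodes are pairwise adjacent.

```python
# Assumed edge list for the 9-node example figure (not given in the text).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

def prune_low_degree(edges, threshold):
    """Repeatedly remove every node whose degree is <= threshold.

    Returns the nodes removed in each round and the remaining graph.
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    rounds = []
    while True:
        doomed = sorted(n for n, nbrs in graph.items() if len(nbrs) <= threshold)
        if not doomed:
            break
        for n in doomed:
            for nbr in graph.pop(n):
                if nbr in graph:
                    graph[nbr].discard(n)  # removing n lowers its neighbors' degrees
        rounds.append(doomed)
    return rounds, graph

# A clique of size > 3 needs every member to have degree >= 3, so prune degree <= 2.
rounds, remaining = prune_low_degree(EDGES, threshold=2)
is_clique = all(len(nbrs) == len(remaining) - 1 for nbrs in remaining.values())

print(rounds)             # [[2, 9], [1, 3], [4]]
print(sorted(remaining))  # [5, 6, 7, 8]
print(is_clique)          # True
```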

Clique Percolation Method (CPM)

Clique is a very strict definition, and unstable

Normally use cliques as a core or a seed to find larger communities

CPM is such a method to find overlapping communities

– Input: a parameter k, and a network

– Procedure

– Step 1. Find all cliques of size k in the given network

– Step 2. Construct a clique graph. Two cliques are adjacent if they share k-1 nodes

– Step 3. Each connected component in the clique graph forms a community

CPM Example

Cliques of size 3: {1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}

Communities: {1, 2, 3, 4} and {4, 5, 6, 7, 8}
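A minimal sketch of the clique percolation procedure reproduces this example with k = 3. The example figure is not reproduced in the text, so the edge list is an assumption, reconstructed to yield the seven cliques listed above. Cliques are enumerated by brute force over all k-node subsets, which is fine for a toy network but far too slow for a real one.

```python
from itertools import combinations

# Assumed edge list for the 9-node example figure (not given in the text).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

def clique_percolation(edges, k):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    # Step 1: all k-node subsets whose members are pairwise adjacent (brute force).
    cliques = [frozenset(c) for c in combinations(sorted(graph), k)
               if all(b in graph[a] for a, b in combinations(c, 2))]

    # Steps 2 and 3: treat cliques sharing k-1 nodes as adjacent and collect
    # the connected components of that clique graph with a depth-first search.
    seen = set()
    communities = []
    for start in range(len(cliques)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            for j in range(len(cliques)):
                if j not in seen and len(cliques[i] & cliques[j]) == k - 1:
                    seen.add(j)
                    stack.append(j)
        communities.append(members)
    return cliques, communities

cliques, communities = clique_percolation(EDGES, k=3)
print(len(cliques))  # 7
print(communities)   # [{1, 2, 3, 4}, {4, 5, 6, 7, 8}]
```

Node 4 appears in both communities, which is how the method produces overlapping groups.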

Friend Recommendation

Friend Recommendation Example

What is Friend Recommendation?

Given a snapshot of a social network, can we recommend new friendships among its members that are likely to occur in the near future?

– a.k.a. link prediction

Observation: Users do not form friendships at random with all other users. Instead, they tend to prefer other users that are “close” to them.
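One simple way to make "close" concrete is to count common neighbors: two users who are not yet friends but share many friends are scored as likely future friends. This measure is an assumption chosen for illustration, since the text does not name a specific closeness measure, and the 9-node network below is a hypothetical toy example.

```python
# Hypothetical toy network used only to illustrate the scoring.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

graph = {}
for u, v in EDGES:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def recommend(graph, user):
    """Rank non-friends of `user` by the number of friends they share with `user`."""
    scores = {}
    for other in graph:
        if other == user or other in graph[user]:
            continue  # skip the user and people who are already friends
        common = len(graph[user] & graph[other])
        if common:
            scores[other] = common
    # Highest score first; ties broken by node id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(recommend(graph, 2))  # [(4, 2)]
print(recommend(graph, 9))  # [(5, 1), (6, 1), (8, 1)]
```

Node 2 and node 4 share friends 1 and 3, so node 4 is the only candidate suggested to node 2.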
