influence and correlation in social networks aris anagnostopoulos ravi kumar mohammad mahdian

Influence and Correlation in Social Networks

Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian

Preliminaries

- Correlations exist in users' behaviors
- Representation: individuals are nodes of a social graph G; every node is either "active" or "inactive"
- Formally, correlation means: if u and v are adjacent in G, the event that u becomes active is correlated with the event that v becomes active

- Goal: distinguish between different sources of social correlation

Models of Social Correlation

- Homophily = tendency for individuals to choose friends with similar characteristics / preferences

- Confounding = external influence from elements in the environment (confounding factors)

- Influence = the action of one individual induces another individual to act in a similar way.

Motivation

- Useful to know when social influence is the source of correlation

- Viral marketing: we want to target selected individuals

- Influencing behavior: create "role models" (e.g., in fashion)

- We want to identify situations when such techniques can be applied.

- Also useful for analysis (e.g., predicting the future state of the network)

Modeling Influence

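1. Graph G is drawn according to some distribution.
2. In each of the time steps 1, ..., T, each non-active agent decides whether to become active.
3. An agent becomes active with probability p(a), a logistic function of the number a of its neighbors that are already active:

$$p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}$$

or, equivalently,

$$\ln\frac{p(a)}{1 - p(a)} = \alpha \ln(a+1) + \beta$$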
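A minimal Python sketch of this process, assuming the logistic form p(a) = e^z / (1 + e^z) with z = α ln(a+1) + β, a fixed graph (the distribution over G in step 1 is not modeled) and a hypothetical `friends` dictionary that maps each node to the set of its neighbors. The names `activation_probability` and `simulate` are illustrative, not taken from the paper.

```python
import math
import random


def activation_probability(a: int, alpha: float, beta: float) -> float:
    """Logistic p(a) = e^z / (1 + e^z) with z = alpha * ln(a + 1) + beta."""
    z = alpha * math.log(a + 1) + beta
    return 1.0 / (1.0 + math.exp(-z))  # equal to e^z / (1 + e^z)


def simulate(friends: dict, T: int, alpha: float, beta: float,
             rng: random.Random) -> dict:
    """Run steps 1..T and return {node: activation time} for activated nodes.

    friends[v] is the set of v's neighbors (hypothetical input format);
    every node must appear as a key.
    """
    times = {}
    for t in range(1, T + 1):
        newly_active = []
        for v in friends:
            if v in times:
                continue  # active nodes stay active
            # a = number of v's neighbors already active when step t begins
            a = sum(1 for u in friends[v] if u in times)
            if rng.random() < activation_probability(a, alpha, beta):
                newly_active.append(v)
        for v in newly_active:  # step-t activations take effect together
            times[v] = t
    return times
```

For example, `simulate({0: {1}, 1: {0, 2}, 2: {1}}, T=5, alpha=1.0, beta=-1.0, rng=random.Random(0))` returns the activation times of one random run. Activations within a step are applied only after every node has decided, so a is always measured at the beginning of the step.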

Some remarks...

- The coefficient α measures social correlation.

- Since activations persist, a counts the neighbors that became active at any earlier time step

- This model is relatively simplistic:
  - the activation probability does not vary between nodes
  - it does not vary as time passes either

- However, these simplifying assumptions are practical

Estimating α, β

- α and β can be estimated by maximum-likelihood logistic regression

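- Maximize the likelihood

$$\prod_{a} p(a)^{Y_a} \bigl(1 - p(a)\bigr)^{N_a}$$

where $Y_a = \sum_t Y_{a,t}$ and $N_a = \sum_t N_{a,t}$; $Y_{a,t}$ is the number of users who at the beginning of time step t had a active friends and became active at time t, and $N_{a,t}$ is the number of users who at the beginning of time step t had a active friends but did not become active at time t.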
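The maximization can be sketched in Python with Newton-Raphson iterations, one standard way to fit a two-parameter logistic regression (not necessarily the solver the authors used). The sketch assumes p(a) = e^z / (1 + e^z) with z = α ln(a+1) + β, and takes aggregated counts Y_a (users with a active friends who became active) and N_a (users with a active friends who did not), summed over all time steps. `estimate_alpha_beta` is a hypothetical name.

```python
import math


def estimate_alpha_beta(counts: dict, iters: int = 100, tol: float = 1e-10):
    """Maximum-likelihood (alpha, beta) for p(a) = sigmoid(alpha*ln(a+1) + beta).

    counts maps a -> (Y_a, N_a): users with a active friends who did / did not
    become active, summed over all time steps. Fitted by Newton-Raphson.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        g_alpha = g_beta = 0.0       # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0     # negated Hessian (observed information)
        for a, (y, n_inactive) in counts.items():
            x = math.log(a + 1)
            n = y + n_inactive
            p = 1.0 / (1.0 + math.exp(-(alpha * x + beta)))
            residual = y - n * p     # observed minus expected activations
            g_alpha += residual * x
            g_beta += residual
            w = n * p * (1.0 - p)
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12 * h_aa * h_bb:
            raise ValueError("need data for at least two distinct values of a")
        # Newton step: solve the 2x2 system (negated Hessian) * delta = gradient
        d_alpha = (h_bb * g_alpha - h_ab * g_beta) / det
        d_beta = (h_aa * g_beta - h_ab * g_alpha) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta
```

With counts generated exactly from α = 1.2 and β = -2.5 (for example Y_a = 1000 p(a) and N_a = 1000 (1 - p(a)) for a = 0, ..., 5), the function recovers those values up to floating-point error. Counts for a single value of a cannot separate α from β, hence the error.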

The Shuffle Test

- Idea: if influence plays no role, the timing of users' activations should be independent of each other; in particular, for two users u and v:

Pr(u active before v) = Pr(v active before u)

The Shuffle Test

1. Estimate α for the original data
2. Randomly permute the order in which the active nodes were activated: draw a uniformly random permutation π of the active nodes and set the activation time of each active node u to the original activation time of π(u)
3. Estimate α' for this shuffled configuration
4. If α and α' are close to each other, the data exhibit little or no social influence
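A sketch of the bookkeeping behind the shuffle test, assuming integer time steps 1..T, a hypothetical `friends` dictionary mapping each node to the set of neighbors whose activity it sees, and a `times` dictionary of activation times. `activation_counts` builds the Y_a and N_a tables used in the likelihood; `shuffle_times` permutes activation times among the active nodes. Both names are illustrative.

```python
import random
from collections import defaultdict


def activation_counts(friends: dict, times: dict, T: int) -> dict:
    """Return {a: [Y_a, N_a]} summed over time steps 1..T.

    friends[v]: set of nodes whose activity v sees (hypothetical input).
    times[v]: activation time of v; nodes absent from `times` never activate.
    """
    counts = defaultdict(lambda: [0, 0])
    for t in range(1, T + 1):
        for v in friends:
            if v in times and times[v] < t:
                continue  # already active before step t
            # a = friends of v that were active at the beginning of step t
            a = sum(1 for u in friends[v] if u in times and times[u] < t)
            if times.get(v) == t:
                counts[a][0] += 1  # became active at step t
            else:
                counts[a][1] += 1  # stayed inactive at step t
    return dict(counts)


def shuffle_times(times: dict, rng: random.Random) -> dict:
    """Randomly permute the activation times among the active nodes."""
    nodes = list(times)
    shuffled = [times[v] for v in nodes]
    rng.shuffle(shuffled)
    return dict(zip(nodes, shuffled))
```

α is then fit on `activation_counts(friends, times, T)` and α' on `activation_counts(friends, shuffle_times(times, rng), T)` with the same logistic regression. The shuffle keeps the set of active nodes and the number of activations per step intact; only who activated when changes.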

The Edge-reversal Test

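1. Reverse the direction of all the edges
2. Run the same logistic regression on the data using the new graph

If the correlation is not due to influence, α should not change: homophily and confounding factors are oblivious to the direction of an edge, whereas influence follows it.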
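Reversing the edges is a small transformation of the graph, sketched here on a hypothetical `friends` dictionary in which `friends[v]` is the set of nodes whose activity v sees: if v sees u in the original graph, u sees v in the reversed one. A mutual pair of edges maps to itself, so only non-mutual edges can change the outcome of the test. `reverse_edges` is an illustrative name.

```python
def reverse_edges(friends: dict) -> dict:
    """Reverse every directed edge: if v sees u, then u sees v in the result."""
    reversed_friends = {v: set() for v in friends}  # keep nodes with no edges
    for v, seen in friends.items():
        for u in seen:
            reversed_friends.setdefault(u, set()).add(v)
    return reversed_friends
```

The activation counts and the logistic regression are then recomputed on the reversed graph with the original activation times.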

Generative Models

- No Correlation

- Influence

- Correlation, no influence

Generative Models - No Correlation

- the network grows just as in the real data
- at every step, pick n nodes uniformly at random and make them active
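A Python sketch of the no-correlation model. For simplicity it assumes a fixed node set instead of a network that grows like the real data, and activates n uniformly random inactive nodes per step. The graph plays no role, which is why this model produces no correlation. `no_correlation_model` is a hypothetical name.

```python
import random


def no_correlation_model(nodes, T: int, n: int, rng: random.Random) -> dict:
    """Return {node: activation time}; each step activates n random inactive nodes."""
    inactive = list(nodes)
    times = {}
    for t in range(1, T + 1):
        k = min(n, len(inactive))  # fewer than n once nodes run out
        for v in rng.sample(inactive, k):
            times[v] = t
        inactive = [v for v in inactive if v not in times]
    return times
```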

Influence Model

- the network grows just as in the real data
- at every step, every inactive node flips a coin and becomes active with probability p(a), where a is its number of active neighbors

Correlation, No Influence Model

- the network grows just as in the real data
- pick a subset S of G:
  - randomly pick centers and add the ball of radius 2 around each center to S
  - repeat until |S| reaches the parameter L
- pick the nodes that become active uniformly at random from S
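The correlation-without-influence model can be sketched as follows, assuming a fixed graph given as a hypothetical adjacency dictionary `adj` and n activations per step (both assumptions of this sketch, not details from the slides). Balls are found with breadth-first search, centers are drawn uniformly from all nodes, and the last ball added may push |S| past L. All function names are illustrative.

```python
import random
from collections import deque


def ball(adj: dict, center, radius: int = 2) -> set:
    """All nodes within `radius` hops of `center`, found by breadth-first search."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand past the radius
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)


def correlated_set(adj: dict, L: int, rng: random.Random, radius: int = 2) -> set:
    """Union of balls around uniformly random centers, grown until |S| >= L."""
    nodes = list(adj)
    S = set()
    while len(S) < min(L, len(nodes)):
        S |= ball(adj, rng.choice(nodes), radius)
    return S


def correlation_no_influence_model(adj: dict, L: int, T: int, n: int,
                                   rng: random.Random) -> dict:
    """Each step activates n random inactive members of S; returns {node: time}."""
    inactive = list(correlated_set(adj, L, rng))
    times = {}
    for t in range(1, T + 1):
        for v in rng.sample(inactive, min(n, len(inactive))):
            times[v] = t
        inactive = [v for v in inactive if v not in times]
    return times
```

Because S is a union of small neighborhoods, active nodes tend to be close to each other in G, yet no node's activation depends on whether its neighbors are already active.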

Distinguishing Influence: Shuffle Test

[Figure: shuffle-test results, with panels for the influence model and the correlation model]

Distinguishing Influence: Edge Reversal

[Figure: edge-reversal test results, with panels for the correlation model and the influence model]

Real Data: the Flickr Dataset

- analyzed 800K users over 16 months
- about 340K of them exhibited tagging behavior

- size of giant component: 160K

- 2.8M directed edges, 28.5% not mutual

- analyzed 1,700 tags independently:
  - various types (event, color, object, etc.)
  - various numbers of users
  - various growth patterns (bursty, smooth, periodic)
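The share of non-mutual edges matters for the edge-reversal test, since reversing a mutual pair of edges leaves the graph unchanged. A small sketch of how such a percentage can be computed from a hypothetical list of directed `(u, v)` edge pairs; `non_mutual_fraction` is an illustrative name and the sketch does not reproduce the Flickr figures.

```python
def non_mutual_fraction(edges) -> float:
    """Fraction of distinct directed edges (u, v) whose reverse (v, u) is absent."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    one_way = sum(1 for (u, v) in edge_set if (v, u) not in edge_set)
    return one_way / len(edge_set)
```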

Distinguishing Influence in Flickr

[Figure: shuffle test on the Flickr data]

Distinguishing Influence in Flickr

[Figure: edge-reversal test on the Flickr data]

Some Influence

- can discover traces of influence by looking at similar tags

- for the tag "graffiti", the difference between the αs was 0

- however, for the misspelling "grafitti", the difference was slightly larger

- for the even less common misspelling "graffitti", the difference increased further

Conclusions

- distinguishing between correlation and causation is difficult

- timing information can help determine whether influence is present (shuffle test)

- knowledge of asymmetric social ties is also useful (edge-reversal test)

Further research directions

- formal verification of the results? (controlled experiments)
- quantification of the strength of influence?
- identifying which nodes influence others
- what if social ties are symmetric?
- distinguishing between other forms of correlation

- distinguishing between different forms of social influence

Questions?
