Social Influence & Homophily

Post on 14-Dec-2014

Category:

Technology

DESCRIPTION

Quantifying the individual effects of Social Influence and Homophily in a Dataset.

TRANSCRIPT

Social Influence & Homophily

Nitish Upreti

nzu100@cse.psu.edu

OUTLINE

• Introduction and Review

• Motivation

• Related Work

• Problem Definition

• Statistics Background

• Methodology

• Where to go from here?

• Summary

PROBLEM DEFINITION

“Identifying and measuring individual Homophily and Social Influence effects on a dataset.”

Quick Review

• Social Influence: Our friendships and behavior are affected by social influence (the pressure to conform to our neighbors' values).

• Selection: We have a tendency to be friends with people who are like us.

• Homophily: A widely observed social phenomenon which states that “we tend to be similar to our friends”.

Quick Note before we start…

We will refer to Selection as Homophily (Reason: Authors assume that if Homophily effects are present, we tend to select individuals with similar values)

MOTIVATION

Selection Vs social influence: Why do we care?

• If Social Influence is a significant factor, then targeting key individuals and trying to modify undesirable behavior can be effective since we are then viewing such behavior as a process of influence spread.

• Otherwise, focusing on a few individuals will at best change the behavior of a few individuals.

REAL WORLD SCENARIO

• A firm selling products to consumers in a social network.

• The firm knows that friends in the network often make similar purchases.

• What is the reason behind this similarity?

• Is it because they have similar tastes, since, after all, they are friends?

• Is it because one influences the other’s decision, as they communicate frequently?

Credits: “Homophily or Influence? – Analysis of Purchase Decisions in a Social Network Context”, Liye Ma, Alan Montgomery and Ramayya Krishnan

How can the firm take advantage?

• If it is the taste similarity that drives the similar decisions, the firm should directly target friends of that customer by offering discounts to them.

• If it is social influence that drives the similarity, the firm should incentivize that customer to promote the product or service to her friends.

Credits: “Homophily or Influence? – Analysis of Purchase Decisions in a Social Network Context”, Liye Ma, Alan Montgomery and Ramayya Krishnan

SELF ANALYSIS

A real-world problem worth solving.

EXISTING WORK

• A lot of research has gone into understanding “Homophily” and “Social Influence” in social networks.

• Quickly mention studies which involve direct analysis of “Identifying and measuring Homophily and social influence effects”.

• This problem area is one of the biggest open-ended challenges for social scientists (and will make a good class project as well :D).

SURVEY OF RELATED WORK

RELATED WORK - 1

• “Homophily or Influence? – Analysis of Purchase Decisions in a Social Network Context” http://people.stern.nyu.edu/bakos/wise/papers/wise2009-5b2_paper.pdf

QUICK LOOK AT THE STUDY

• Phone call history dataset (3.7 Million) from an Indian Telecom company over a 6 month period for purchase records of monthly Caller Ring Back Tones (CRBT) subscription.

• Social Influence & Homophily are studied.

• The study builds a “Hierarchical Bayesian model” which simultaneously accounts for both Homophily and social influence effects in consumers’ decision process.

RELATED WORK - 2

• “Social selection and peer influence in an online social network.” http://www.irle.berkeley.edu/culture/conf2012/lewis_soc12.pdf

QUICK LOOK AT THE STUDY

• Employs Facebook activity of college students.

• Coevolution of friendship and tastes in music, movies and books over a 4 year time period is analyzed.

• “Stochastic actor-based” modeling is employed to analyze the individual effects of Social Influence & Homophily.

RELATED WORK - 3

• “Distinguishing influence-based contagion from Homophily driven diffusion in dynamic networks.” http://www.pnas.org/content106/51/21544.full.pdf

QUICK LOOK AT THE STUDY

• Employs the study of a longitudinal dataset that combines the global network of daily instant messaging (IM) traffic among 27.4 million users of Yahoo with day-by-day adoption of a mobile service application (Yahoo! Go)

• A framework based on “Matched sample estimation” is developed to distinguish influence from Homophily.

ANALYSIS OF EXISTING APPROACHES

• Empirical Investigations (focus on demonstrating the presence of Homophily and Influence in real-world data sets)

• Significance Tests for Relational and Social Network Data (focus mostly on static networks)

• Modeling Techniques for distinguishing Homophily & Influence (accuracy is impacted by suitability of the model)

TODAY’S FOCUS

“Randomization Tests for Distinguishing Social Influence and Homophily Effects.” https://www.cs.purdue.edu/homes/neville/papers/lafond-neville-www2010.pdf

INTRODUCTION

• In a social network, connected instances are likely to have autocorrelated attribute values.

• “Two friends are more likely to share a common political belief than two random strangers.”

• The paper presents a randomization technique for temporal network data that measures the individual contributions of Homophily and Social Influence (details coming soon!).

THE EXPERIMENT / SUPPORT

• A subset of data from a Facebook group at Purdue.

• Time step from 2008 (t) to 2009 (t+1).

• Hypothesis tested on:

1. Semi-synthetic data with no Homophily or Social Influence.

2. Semi-synthetic data with a strong Homophily or Influence effect.

3. Actual experiment on the real dataset.

• The approach was shown to be effective under all conditions.

PROBLEM DEFINITION

• Relational data is represented as an undirected, attributed graph G = (V, E).

• Each node v ∈ V has a number of attributes (X1, …, Xm).

• Between time steps t and t+1, the attributes and relationships can change.

• Significant Influence : Attributes in t+1 depend on link structure at t.

• Significant Homophily : Link structure in t+1 will depend on attributes at t.

(Keep them in mind! We will come back to them)

BACKGROUND

• In Statistics, an association is a relationship between two statistically dependent quantities.

• ‘Relational Autocorrelation’: Statistical dependency between values of the same variable on related objects. (Abundant in our dataset) Why?

• In this work we use the Chi-Square statistic.

STATISTICS 101

CHI-SQUARE STATISTICS

• How likely is an observed distribution due to chance?

• Observe 100 students to see “whether attending class influences how students perform on exam?”

• Four categories:

– Students who attend class and pass.

– Students who attend class and do not pass.

– Students who do not attend class and pass.

– Students who do not attend class and do not pass.

• Null Hypothesis : There is no difference based on attending classes.

CHI-SQUARE Continued…

• The test compares the observed data to a model that distributes the data according to the expectation that the variables are independent. The more the observed data departs from the model, the stronger the evidence that the variables are dependent, i.e. the stronger the evidence against the null hypothesis.

• Degrees of freedom: The number of values in the final calculation that are free to vary.

• Calculate the Chi-Square value. (How?)

• Calculate the more interesting ‘p’ value (the probability of seeing a Chi-Square value at least this large if the null hypothesis were true).
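To make the class-attendance example concrete, here is a minimal sketch in plain Python. The counts are hypothetical (the slides give none), and the function name `chi_square_2x2` is ours. For a 2x2 table there is 1 degree of freedom, which lets the p-value be computed with the standard library's `math.erfc`.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # A 2x2 table has (2-1)*(2-1) = 1 degree of freedom. For 1 degree of
    # freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for 100 students (not taken from the slides).
observed = [
    [50, 10],  # attended class: pass, do not pass
    [20, 20],  # did not attend: pass, do not pass
]
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.5f}")
```

A small p-value here means counts this far from independence would be unusual if attendance made no difference; it is not the probability that the null hypothesis is true.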

Calculating Relational Autocorrelation

CORRELATION GAIN

gain(t, t+1) = C(X_{t+1}, G_{t+1}) – C(X_t, G_t)

(The gain could be due to Homophily or Social Influence)

HOMOPHILY Continued…

If a Homophily effect is present in the data, the autocorrelation will increase when we consider the link changes from time t to time t+1: C(X_t, G_{t+1}) – C(X_t, G_t)

(The Chi-Square value is a single number that adds up all the differences between our actual data and the data expected.)

SOCIAL INFLUENCE Continued…

If an influence effect is present in the data, the autocorrelation will increase when we consider the attribute changes from time t to time t+1: C(X_{t+1}, G_t) – C(X_t, G_t)

(The Chi-Square value is a single number that adds up all the differences between our actual data and the data expected.)
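The three gains above can be sketched as follows. This assumes C(X, G) is the Chi-Square statistic over the pairs of attribute values found at the two ends of each edge; the slides do not spell out the exact construction, so treat `autocorrelation` as one plausible reading. The toy graph and the names `autocorrelation` and `correlation_gains` are illustrative.

```python
def autocorrelation(attrs, edges):
    """Chi-square statistic over attribute-value pairs at the ends of each edge.

    Assumed form of C(X, G); attrs maps node -> value, edges is a list of pairs.
    """
    values = sorted(set(attrs.values()))
    index = {v: i for i, v in enumerate(values)}
    k = len(values)
    table = [[0] * k for _ in range(k)]
    for u, v in edges:
        # Count each undirected edge in both directions so the table is symmetric.
        table[index[attrs[u]]][index[attrs[v]]] += 1
        table[index[attrs[v]]][index[attrs[u]]] += 1
    n = sum(sum(row) for row in table)
    if n == 0:
        return 0.0
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def correlation_gains(x_t, g_t, x_t1, g_t1):
    """Total, homophily and influence gains relative to C(X_t, G_t)."""
    base = autocorrelation(x_t, g_t)
    return {
        "total": autocorrelation(x_t1, g_t1) - base,
        "homophily": autocorrelation(x_t, g_t1) - base,  # only links change
        "influence": autocorrelation(x_t1, g_t) - base,  # only attributes change
    }

# Toy data (hypothetical): four nodes with one binary attribute.
x_t = {"a": 1, "b": 1, "c": 0, "d": 0}
g_t = [("a", "c"), ("b", "d"), ("a", "b")]
x_t1 = dict(x_t)                 # attributes unchanged in this toy example
g_t1 = [("a", "b"), ("c", "d")]  # links rewired toward similar nodes
result = correlation_gains(x_t, g_t, x_t1, g_t1)
print(result)
```

In this toy example only the links change, so the whole gain shows up in the homophily term and the influence term is zero.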

METHODOLOGY (Randomization Tests)

RANDOMIZATION TESTS

• Provide a robust statistical technique for hypothesis testing.

• Generates several Pseudosamples (permutations of original data sets).

• Correlation gain is calculated for each Pseudosample.

• Value of observed gain is then compared to distribution of scores.

• An observed gain that falls far out in the tail of this distribution is deemed significant.
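The steps above can be sketched as a generic helper. The names `randomization_test`, `make_pseudosample` and `gain_fn` are hypothetical, and the add-one correction in the p-value is a common convention rather than something stated in the slides. The demo at the bottom uses random numbers as stand-in gains purely to show the mechanics.

```python
import random

def randomization_test(observed_gain, make_pseudosample, gain_fn,
                       n_samples=1000, seed=0):
    """Compare an observed gain to the gains of randomized pseudosamples.

    make_pseudosample(rng) returns one permuted copy of the data;
    gain_fn(sample) returns the correlation gain of that copy.
    """
    rng = random.Random(seed)
    null_gains = [gain_fn(make_pseudosample(rng)) for _ in range(n_samples)]
    # Fraction of pseudosamples whose gain is at least as large as observed.
    # The +1 terms count the observed data as one more sample.
    extreme = sum(1 for g in null_gains if g >= observed_gain)
    p_value = (extreme + 1) / (n_samples + 1)
    return p_value, null_gains

# Stand-in demo: pseudosample gains are drawn from a standard normal.
# A real test would permute the network data instead.
p_value, null_gains = randomization_test(
    observed_gain=5.0,
    make_pseudosample=lambda rng: rng.gauss(0.0, 1.0),
    gain_fn=lambda sample: sample,
)
print(f"p = {p_value:.4f}")
```

In a real run, `make_pseudosample` would be a permutation scheme that respects the null hypothesis being tested, and `gain_fn` would compute the corresponding correlation gain.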

ANALYSIS OF KEY ISSUES AND ASSUMPTIONS

(For Randomization Tests)

• Make an appropriate NULL Hypothesis.

• The data is permuted in a way that accurately reflects the null hypothesis.

SELF ANALYSIS

The approach is relevant and appropriate because it makes no assumptions about the underlying model. Also, both the attribute values and the links change over time, which allows assessing both Influence and Homophily.

NULL HYPOTHESIS

• H_0^H: Link changes are random and are not due to attribute values in t.

• H_0^I: Attribute changes are random and are not due to friends in t.

• H_0^F: Both attribute and link changes are random.

POSSIBLE PERMUTATIONS

CHOICE BASED RANDOMIZATION

• For H_0^H (the Homophily null hypothesis) we can keep the edge additions in t+1 but randomize the choice of target node, so that each node keeps the same number of additions and deletions.

• For H_0^I (the Influence null hypothesis) we can randomize the choice of attribute value to replace in t+1, so that any similarity of the value is destroyed.

• This is referred to as “choice-based” randomization, as we are randomizing the result of choices (attribute/link changes).

CALCULATING CHOICE BASED RANDOMIZATION

• Non-trivial problem.

• A greedy assignment is involved.

• Collect all the changes (edges & attributes).

• Sort the nodes and attributes from those with the least number of random options to those with the most options.

• Prevents abusing the underlying NULL hypothesis
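A simplified sketch of the link side of this idea, for the Homophily null hypothesis, is shown below. It is not the paper's procedure: it only re-targets edge additions, keeps the number of additions per source node, and treats the first endpoint of each added edge as the source, which is an arbitrary choice made here. The function name `randomize_additions` and the toy data are hypothetical.

```python
import random

def norm(u, v):
    """Canonical form of an undirected edge."""
    return (u, v) if u <= v else (v, u)

def randomize_additions(nodes, edges_t, edges_t1, rng):
    """Re-target each edge added between t and t+1 to a random new partner."""
    old = {norm(u, v) for u, v in edges_t}
    new = {norm(u, v) for u, v in edges_t1}
    added = sorted(new - old)
    result = set(new & old)  # edges present at both time steps stay untouched

    def options(u):
        # Valid targets: not u itself, not linked to u at time t,
        # and not already linked to u in the pseudosample.
        return [w for w in nodes
                if w != u and norm(u, w) not in old and norm(u, w) not in result]

    # Greedy order: sources with the fewest valid targets are assigned first.
    sources = sorted((u for u, _ in added), key=lambda u: len(options(u)))
    for u in sources:
        candidates = options(u)
        if not candidates:
            raise ValueError(f"no valid random target left for node {u!r}")
        result.add(norm(u, rng.choice(candidates)))
    return result

# Toy data (hypothetical): edge (b, c) is deleted, (c, d) and (e, f) are added.
nodes = ["a", "b", "c", "d", "e", "f"]
edges_t = [("a", "b"), ("b", "c")]
edges_t1 = [("a", "b"), ("c", "d"), ("e", "f")]
pseudo = randomize_additions(nodes, edges_t, edges_t1, random.Random(0))
print(sorted(pseudo))
```

The pseudosample has the same number of edges as the observed graph at t+1, but the new links no longer depend on attribute values at t. A fuller version would also balance additions and deletions per node and randomize attribute changes in the same greedy fashion.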

SELF ANALYSIS

Where to go from here?

• Changing the granularity of time step to investigate deeper.

• Investigating why certain groups had more of Homophily or Social Influence?

• Apart from friendship, considering other influential effects.

SUMMARY

• Successfully employed a randomization technique for distinguishing Homophily and Social Influence.

• Tested the hypotheses on semi-synthetic and real-world data sets.

• Influence and Homophily varied to different degrees across groups, depending on group properties.

PERSONAL TAKEAWAY

Take a Statistics Class !

THANK YOU!
