Contextual Prediction of Communication Flow in Social Networks



Page 1: Contextual Prediction of Communication Flow in Social Networks

Contextual Prediction of Communication Flow in Social Networks

Munmun De Choudhury

Hari Sundaram

Ajita John

Dorée Duncan Seligmann

@IEEE Web Intelligence 2007 November 5, 2008

Arts, Media & Engineering

Arizona State University, Tempe

Collaborative Applications Research

Avaya Labs, New Jersey

Page 2: Contextual Prediction of Communication Flow in Social Networks

Introduction

• Why is the problem important?
  • Determine information propagation and the roles of people in the process.
  • Targeted advertising, spread of fashions and fads, innovations, consumer interests, etc.
  • Determine community evolution.

A context-based framework to predict communication flow in large-scale social networks.

[Figure: communication flow between Alice and Bob; spread of innovations]

Page 3: Contextual Prediction of Communication Flow in Social Networks

Our Approach

• Computation of intent to communicate and delay between two individuals on a particular topic.
  • Communication context: neighborhood, topic, and recipient context.
  • A set of features capturing communication semantics.
  • An SVM regression method for prediction.
• Experimental results on a MySpace dataset show effective prediction (error ~15-20%).

[Figure: Error in prediction of intent to communicate, baseline vs. our approach, showing the improvement in predicted error]

Page 4: Contextual Prediction of Communication Flow in Social Networks

Related Work

• Work on information diffusion [Gruhl, Tomkins ’04].
• Early-adoption-based flow model for recommendation systems [Song ’06].
• Analysis of emails of software developers [Bird ’06].
• But in web-based analysis, information flow is estimated from indirect evidence, not from evidence of communication.
  • e.g. a topic appears on a blog several days after it appeared on another blog.
• Context has not been considered.

[Figure: Temporal pattern of blog posts [Gruhl et al. 2004]]

Page 5: Contextual Prediction of Communication Flow in Social Networks

Outline

• Introduction / Related work
• Problem Statement
  • Two sub-problems: intent to communicate, communication delay
  • A physics metaphor
• Communication Context
• SVM Based prediction
• MySpace dataset
• Experimental Results
• Conclusions

Page 6: Contextual Prediction of Communication Flow in Social Networks

What is Intent to Communicate?

• The probability that a person will engage in communication with another person, given a particular topic and a certain point in time.
  • It is contingent upon several factors, or features, defined by the communication context.

[Figure: Alice, Bob, and Ann, with per-topic intent values: Movie 40%, Sports 40%; Movie 80%, Dinner 20%]

Page 7: Contextual Prediction of Communication Flow in Social Networks

What is Delay in Propagation?

• The amount of time passed between the reception of a message (on a certain topic) and the corresponding response by a person.

[Figure: Alice, Bob, and Ann, with per-topic delays: Movie 4 hours, Sports 25 mins; Movie 2 days, Dinner 15 hours]

Page 8: Contextual Prediction of Communication Flow in Social Networks

Wavefront Metaphor

• Thomas Young’s experiments on the wave theory of light.
• Three concepts:
  • Ann and Alice’s messages: primary wavefronts.
  • When Bob receives and responds: secondary wavefronts.
  • Some of the secondary wavefronts travel back to Ann and Alice: backscatter.

[Figure: Young’s double slit experiment, alongside the wavefront metaphor for Alice, Bob, and Ann]

Page 9: Contextual Prediction of Communication Flow in Social Networks

Outline

• Introduction / Related work
• Problem Statement
• Communication Context
  • What is communication context?
  • Role of context
  • Neighborhood context
  • Topic context
  • Recipient context
• SVM Based prediction
• MySpace dataset
• Experimental Results
• Conclusions

[Figure: Neighborhood, Topic, Recipient]

Page 10: Contextual Prediction of Communication Flow in Social Networks

Communication Context

• Communication context [Mani and Sundaram ’07] is the set of attributes that affect communication between two individuals.
• Contextual attributes are dynamic [Dourish ’02].
  • relationship between messages
  • past communication behavior of a person
  • response patterns from others

[Figure credit: Mani and Sundaram ’07]

Page 11: Contextual Prediction of Communication Flow in Social Networks

Neighborhood Context: Susceptibility

• The susceptibility due to a contact v to her entire social network in time slice t_i is given by

$$\theta_{v,u}(\Lambda, t_i) = \sum_{w} \sum_{j=1}^{|n_{v \to w}|} \varphi(\Lambda, t_j, t_i),$$

where
  • t_j is the time-stamp of the jth message on topic Λ from v to u,
  • φ(Λ, t_j, t_i) is an indicator function: 1 if t_j lies in time slice t_i and 0 otherwise.

[Figure: Susceptibility, showing Alice, Bob, Emily, Charlie, and Donny]

Page 12: Contextual Prediction of Communication Flow in Social Networks

Neighborhood Context: Backscatter

• The backscatter of u due to a contact v in time slice t_i is given by

$$\theta_{v \to u}(\Lambda, t_i) = \sum_{j=1}^{|n_{v \to u}|} \varphi(\Lambda, t_j, t_i),$$

where
  • t_j is the time-stamp of the jth message on topic Λ from v to u,
  • φ(Λ, t_j, t_i) is an indicator function: 1 if t_j lies in time slice t_i and 0 otherwise.

[Figure: Backscatter, showing Bob, Emily, Charlie, and Alice]
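As a rough illustration of the two neighborhood features, both susceptibility and backscatter come down to counting messages whose time-stamp falls inside a time slice, which is what the indicator function φ expresses. The sketch below assumes a hypothetical in-memory message record (`sender`, `recipient`, `topic`, `timestamp`) and half-open time slices; neither is specified on the slides. It also reads susceptibility as counting the messages v sends to anyone in her network, which is one interpretation of the slide text rather than a confirmed definition.

```python
from collections import namedtuple

# Hypothetical message record; the slides do not specify a data format.
Message = namedtuple("Message", "sender recipient topic timestamp")


def indicator(timestamp, slice_start, slice_end):
    """phi: 1 if the time-stamp lies in [slice_start, slice_end), else 0."""
    return 1 if slice_start <= timestamp < slice_end else 0


def susceptibility(messages, v, topic, slice_start, slice_end):
    """Count messages on `topic` that contact v sent to anyone in the slice."""
    return sum(
        indicator(m.timestamp, slice_start, slice_end)
        for m in messages
        if m.sender == v and m.topic == topic
    )


def backscatter(messages, v, u, topic, slice_start, slice_end):
    """Count messages on `topic` that contact v sent to u in the slice."""
    return sum(
        indicator(m.timestamp, slice_start, slice_end)
        for m in messages
        if m.sender == v and m.recipient == u and m.topic == topic
    )


# Toy data (made up for illustration); timestamps are arbitrary time units.
messages = [
    Message("Alice", "Bob", "movie", 3),
    Message("Alice", "Emily", "movie", 5),
    Message("Alice", "Bob", "sports", 6),
    Message("Alice", "Charlie", "movie", 12),
]

print(susceptibility(messages, "Alice", "movie", 0, 10))
print(backscatter(messages, "Alice", "Bob", "movie", 0, 10))
```

Both functions share the same indicator, so changing the slice convention (for example to closed intervals) only touches one place.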

Page 13: Contextual Prediction of Communication Flow in Social Networks

Topic Context: Message Coherence

• ConceptNet is used to compute distances between messages.
• Why ConceptNet?
  • Expands on pure lexical terms to compound terms, e.g. “buy food”.
  • Contains practical knowledge: we can infer that a student is near a library.
• The distance between a message m and a topic Λ is given as

$$d(m, \Lambda) = \max_{q} \min_{k} d_c(w_q, w_k),$$

where
  • w_q is a word in message m,
  • w_k is a word corresponding to topic Λ.
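The max-min distance between a message and a topic can be sketched in a few lines. The word-level distance is passed in as a function, because the slides only say that ConceptNet is used and give no API; the lookup table below is a made-up stand-in for that distance, and all names are hypothetical.

```python
def message_topic_distance(message_words, topic_words, word_distance):
    """Max over message words of the min distance to any topic word."""
    return max(
        min(word_distance(wq, wk) for wk in topic_words)
        for wq in message_words
    )


# Made-up distances standing in for a ConceptNet-based word distance.
TOY_DISTANCES = {
    ("movie", "film"): 0.1,
    ("movie", "cinema"): 0.3,
    ("ticket", "film"): 0.6,
    ("ticket", "cinema"): 0.4,
}


def toy_distance(message_word, topic_word):
    """Look up the illustrative distance for a (message word, topic word) pair."""
    return TOY_DISTANCES[(message_word, topic_word)]


distance = message_topic_distance(
    ["movie", "ticket"], ["film", "cinema"], toy_distance
)
print(distance)
```

Here "movie" is closest to "film" (0.1) and "ticket" is closest to "cinema" (0.4), so the message is as far from the topic as its worst-matching word: 0.4.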

Page 14: Contextual Prediction of Communication Flow in Social Networks

Topic Context: Temporal Coherence

• Determined by the mean and variance of the difference in the time stamps of messages.
• The mean μ_j is

$$\mu_j(\Lambda, t_j, t_i) = \sum_{m \in t_j} \bigl( T(\Lambda, m, t_j) - t_i \bigr) / n(\Lambda, t_j),$$

where
  • m is the index of a message of topic Λ in the time slice t_j,
  • n(Λ, t_j) is the number of messages on topic Λ in the time slice t_j.

Page 15: Contextual Prediction of Communication Flow in Social Networks

Recipient Context

• Reciprocity reflects the symmetry in communication.
• Communication correlation reflects the topical alignment of two individuals.
• Communication significance reflects the importance of communication activity with a particular person with respect to the whole social network.

[Figure: Reciprocity, Communication Correlation, Communication Significance]

Page 16: Contextual Prediction of Communication Flow in Social Networks

Outline

• Introduction / Related work
• Problem Statement
• Communication Context
• SVR Based prediction
  • Sequential SVR approach
• MySpace dataset
• Experimental Results
• Conclusions

Page 17: Contextual Prediction of Communication Flow in Social Networks

The Prediction Algorithm

[Figure: timeline over time slices t, t+1, t+2 showing feature vectors x_i, predicted intent y_i, actual communication y_i’, and error in prediction E]
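The slides name a sequential SVR approach without giving its details. The sketch below shows one plausible reading of the figure labels: fit on the feature vectors of all slices seen so far, predict the intent for the next slice, compare it with the actual communication, then move one slice forward. The regressor is a deliberately simplified stand-in, a linear ε-insensitive regressor trained by subgradient steps with no kernel and no regularization term, so it is not a full SVR. All function names, hyperparameters, and the toy data are assumptions.

```python
def fit_eps_regressor(xs, ys, eps=0.05, lr=0.01, epochs=1000):
    """Fit w, b so that |w.x + b - y| <= eps where possible.

    Simplified stand-in for SVR: linear model, epsilon-insensitive loss,
    plain subgradient steps, no kernel and no regularization.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for x, y in zip(xs, ys):
            residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            if abs(residual) <= eps:
                continue  # inside the epsilon tube: no loss, no update
            step = -lr if residual > 0 else lr
            w = [wi + step * xi for wi, xi in zip(w, x)]
            b += step
            updated = True
        if not updated:
            break  # every training point is inside the tube
    return w, b


def predict(model, x):
    """Evaluate the linear model on one feature vector."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def walk_forward(features, actual, warmup=2):
    """Train on slices [0, t), predict slice t, record the absolute error."""
    errors = []
    for t in range(warmup, len(features)):
        model = fit_eps_regressor(features[:t], actual[:t])
        errors.append(abs(predict(model, features[t]) - actual[t]))
    return errors


# Toy data: one feature per time slice and an intent that is linear in it.
features = [[0.0], [1.0], [0.5], [0.25], [0.75]]
actual = [0.2 + 0.6 * x[0] for x in features]
errors = walk_forward(features, actual, warmup=2)
print(errors)
```

Retraining from scratch at every slice keeps the sketch short; an incremental update would be the natural refinement for longer histories.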

Page 18: Contextual Prediction of Communication Flow in Social Networks

Outline

• Introduction / Related work
• Problem Statement
• Communication Context
• SVM Based prediction
• MySpace dataset
  • Crawling Details
  • Topology of the crawled network
• Experimental Results
• Conclusions

Page 19: Contextual Prediction of Communication Flow in Social Networks

Crawling Statistics

• MySpace: the world’s largest social networking site, with over 108 million users.
• Crawling using a DFS (Depth First Strategy).

Some statistics of crawled data:

Users       20,000
Messages    1,425,010
Time-span   Sept 2005 - Apr 2007

[Figure: a snapshot of MySpace; crawling illustration with Tom]
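A depth-first crawl of a contact graph can be sketched as follows. This is a generic iterative depth-first traversal over an in-memory dictionary, not the crawler used for the MySpace data; `get_contacts`, the user cap, and the toy graph are all hypothetical.

```python
def dfs_crawl(start_user, get_contacts, max_users):
    """Visit users depth-first from start_user, stopping at max_users."""
    visited = set()
    order = []
    stack = [start_user]
    while stack and len(order) < max_users:
        user = stack.pop()
        if user in visited:
            continue
        visited.add(user)
        order.append(user)
        # Push contacts in reverse so the first-listed contact is explored first.
        for contact in reversed(get_contacts(user)):
            if contact not in visited:
                stack.append(contact)
    return order


# Toy contact graph standing in for profile pages (made up for illustration).
TOY_GRAPH = {
    "Tom": ["Alice", "Bob"],
    "Alice": ["Charlie"],
    "Bob": [],
    "Charlie": [],
}

crawl_order = dfs_crawl("Tom", lambda user: TOY_GRAPH.get(user, []), max_users=10)
print(crawl_order)
```

The explicit stack avoids recursion limits on deep contact chains, and the user cap bounds the crawl.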

Page 20: Contextual Prediction of Communication Flow in Social Networks

Topology Characteristics

Topology Statistic             Measure
Average Shortest Path Length   5.952
Average Degree per node        215.27 (γ = 2.01)
Mean Clustering Coefficient    0.79

[Figures: average path length distribution for MySpace crawled data; topic histogram]

Page 21: Contextual Prediction of Communication Flow in Social Networks

Outline

• Introduction / Related work
• Problem Statement
• Communication Context
• SVM Based prediction
• MySpace dataset
• Experimental Results
  • Baseline heuristics for validation
  • Prediction of intent and delay
  • Feature evaluation
  • Network scalability
• Conclusions

Page 22: Contextual Prediction of Communication Flow in Social Networks

Baseline Techniques

• For intent to communicate:
  • The ratio of the number of messages sent by u to v on topic Λ to the total number of messages sent by u to v in the past on all topics.
• For estimate of delay:
  • The mean delay between two contacts u and v on topic Λ is the mean delay between all pairs of corresponding messages on the same topic.
  • ConceptNet is used to compute message correspondence.
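Both baselines are simple enough to state as code. In the sketch below, messages are hypothetical `(sender, recipient, topic)` tuples and the delay baseline takes already-matched `(received_at, replied_at)` pairs; how messages are matched (the slides use ConceptNet for this) is outside the sketch.

```python
def baseline_intent(messages, u, v, topic):
    """Share of u's past messages to v that were on `topic`."""
    sent = [m for m in messages if m[0] == u and m[1] == v]
    if not sent:
        return 0.0  # assumption: no history means zero estimated intent
    on_topic = sum(1 for m in sent if m[2] == topic)
    return on_topic / len(sent)


def baseline_delay(matched_pairs):
    """Mean of (replied_at - received_at) over matched message pairs."""
    delays = [replied_at - received_at for received_at, replied_at in matched_pairs]
    return sum(delays) / len(delays)


# Toy history (made up): (sender, recipient, topic).
history = [
    ("Alice", "Bob", "movie"),
    ("Alice", "Bob", "movie"),
    ("Alice", "Bob", "sports"),
    ("Alice", "Bob", "dinner"),
    ("Alice", "Ann", "movie"),
]

intent = baseline_intent(history, "Alice", "Bob", "movie")
delay = baseline_delay([(0, 30), (10, 20)])  # times in minutes
print(intent, delay)
```

Two of Alice's four messages to Bob are about movies, so the baseline intent is 0.5; the two matched pairs have delays of 30 and 10 minutes, giving a mean of 20.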

Page 23: Contextual Prediction of Communication Flow in Social Networks

Experimental Setup

• A user u is randomly sampled from the set of contacts of Tom (the super-user).
• The top eight contacts (v) of u are selected, determined by high message density.
• Recipient variability:
  • Prediction of communication flow averaged over five weeks for each contact.
• Temporal variability:
  • Prediction of communication flow averaged over all eight contacts for each of the five weeks.

Page 24: Contextual Prediction of Communication Flow in Social Networks

Predicted Intent

• The communication intent depends on a wide variety of contextual factors (neighborhood, topic, and recipient), not just on the prior probability of communication.

Page 25: Contextual Prediction of Communication Flow in Social Networks

Predicted Estimate of Delay

• Delay may be strongly influenced by factors other than the social network interaction (e.g. it may be habitual).

Page 26: Contextual Prediction of Communication Flow in Social Networks

Evaluation of Features

• A person’s neighboring social network indeed affects whether or not she will quickly engage in a particular communication.

[Figure: Errors in L-O-O Procedure. Error (%), on an axis from 0 to 35, for intent and delay with one feature left out at a time: No Susceptibility, No Backscatter, No Message Coherence, No Temporal Coherence, No Topic Quantity, No Topic Relevance, No Reciprocity, No Communication Correlation, No Communication Significance]

Page 27: Contextual Prediction of Communication Flow in Social Networks

Scaling Experiment Details

• An exponential function f(n) = exp(n/k), where k = 4.6 and n = 1, 2, 3, 4, …, 35, is used to choose networks with node out-degree values f(n).
• Select the top three users corresponding to each f(n) based on high message density.
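To get a feel for the range of network sizes this exponential schedule covers, the target out-degrees can be tabulated directly. The function name is hypothetical; k and the range of n are taken from the slide.

```python
import math


def target_out_degrees(k=4.6, n_max=35):
    """Out-degree targets f(n) = exp(n / k) for n = 1, ..., n_max."""
    return [math.exp(n / k) for n in range(1, n_max + 1)]


degrees = target_out_degrees()
print(f"smallest target: {degrees[0]:.2f}")
print(f"largest target:  {degrees[-1]:.0f}")
```

The schedule starts just above 1 and ends at roughly 2,000, so the 35 steps are evenly spaced on a logarithmic scale of out-degree.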

Page 28: Contextual Prediction of Communication Flow in Social Networks

Scalability of Intent

• With an increase in network size, the user is in regular correspondence with only a small fraction of the network.

[Figure: panels for Topic A and Topic B]

Page 29: Contextual Prediction of Communication Flow in Social Networks

Scalability of Delay

• Delay is influenced by a majority with whom the user is not in active communication.
• Delay may be affected by intrinsic factors (e.g. habit) and less by the contextual factors.

[Figure: panels for Topic A and Topic B]

Page 30: Contextual Prediction of Communication Flow in Social Networks

Outline

• Introduction / Related work
• Problem Statement
• Communication Context
• SVM Based prediction
• MySpace dataset
• Experimental Results
• Conclusions
  • Summary
  • Contributions and Future Work

Page 31: Contextual Prediction of Communication Flow in Social Networks

Summary

• Predict communication flow in large-scale social networks based on communication context.
  • Identified three aspects: neighborhood, topic, and recipient context.
• Intent to communicate and delay predicted using SVR.
• Excellent results on a real-world dataset, MySpace.com:
  • for a single user
  • for networks of different sizes.

[Figure: Neighborhood, Topic, Recipient]

Page 32: Contextual Prediction of Communication Flow in Social Networks

Conclusions

• Consequences:
  • Intent to communicate is strongly affected by contextual factors.
  • Delay is less affected.
• Modeling communication context is essential.
• Future work:
  • Comparison against a standardized flow model, e.g. an epidemic disease propagation model.
  • Prediction, given a pair of users who are separated by n different people in the social network.

Page 33: Contextual Prediction of Communication Flow in Social Networks

Thanks!