1
Contextual Prediction of Communication
Flow in Social Networks
Munmun De Choudhury
Hari Sundaram
Ajita John
Dorée Duncan Seligmann
@IEEE Web Intelligence 2007 November 5, 2008
Arts, Media & Engineering
Arizona State University, Tempe
Collaborative Applications Research
Avaya Labs, New Jersey
2
Introduction
• Why is the problem important?
  • Determine information propagation and the roles of people in the process.
  • Targeted advertising, spread of fashions and fads, innovations, consumer interests, etc.
  • Determine community evolution.
A context-based framework to predict communication flow in large-scale social networks.
[Figure: communication flow between Alice and Bob; spread of innovations]
3
Our Approach
• Computation of intent to communicate and delay between two individuals on a particular topic.
  • Communication context: neighborhood, topic, and recipient context.
  • A set of features capturing communication semantics.
  • An SVM regression method for prediction.
• Experimental results on a MySpace dataset show effective prediction (error ~15-20%).
[Figure: error in prediction of intent to communicate, baseline vs. our approach, showing the improvement in predicted error]
4
Related Work
• Work on information diffusion [Gruhl, Tomkins ’04].
• Early-adoption-based flow model for recommendation systems [Song ’06].
• Analysis of emails of software developers [Bird ’06].
• But in web-based analysis, information flow is estimated from indirect evidence, not from evidence of communication.
  • e.g. a topic appears on a blog several days after it appeared on another blog.
• Context has not been considered.
[Figure: Temporal Pattern of Blog Posts [Gruhl et al. 2004]]
5
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
Outline
• Two sub-problems:
  • Intent to communicate
  • Communication delay
• A physics metaphor
6
What is Intent to Communicate?
• The probability that a person will engage in some communication with another person, given a particular topic and a certain point in time.
  • It is contingent upon several factors or features defined by the communication context.
[Figure: Alice, Bob, and Ann with example intents per topic. Movie: 40%, Sports: 40%; Movie: 80%, Dinner: 20%]
7
What is Delay in Propagation?
• The amount of time passed between the reception of a message (on a certain topic) and the corresponding response by a person.
[Figure: Alice, Bob, and Ann with example delays per topic. Movie: 4 hours, Sports: 25 mins; Movie: 2 days, Dinner: 15 hours]
8
Wavefront Metaphor
• Thomas Young’s experiments on the wave theory of light.
• Three concepts:
  • Ann and Alice’s messages: primary wavefronts.
  • When Bob receives and responds: secondary wavefronts.
  • Some of the secondary wavefronts travel back to Ann and Alice: backscatter.
[Figure: Young’s double-slit experiment, with Alice, Bob, and Ann illustrating the wavefront metaphor]
9
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
Outline
• What is communication context?
• Role of context
• Neighborhood context
• Topic context
• Recipient context
10
Communication Context
• Communication context [Mani and Sundaram ’07] is the set of attributes that affect communication between two individuals.
• Contextual attributes are dynamic [Dourish ’02].
  • relationship between messages
  • past communication behavior of a person
  • response patterns from others
[Figure: Mani and Sundaram ’07]
11
Neighborhood Context: Susceptibility
• The susceptibility due to a contact v to her entire social network in time slice t_i is given by

  $$\theta_{v \mid u}(\Lambda, t_i) = \sum_{w} \sum_{j=1}^{n_{v \to w \mid u}} \varphi(\Lambda, t_j, t_i),$$

  where
  • φ(Λ, t_j, t_i): an indicator function, 1 if t_j lies in time slice t_i and 0 otherwise;
  • t_j: time-stamp of the jth message on topic Λ from v to u.
[Figure: susceptibility, with Alice, Bob, Emily, Charlie, and Donny]
12
Neighborhood Context: Backscatter
• The backscatter of u due to a contact v in time slice t_i is given by

  $$\theta_{v \to u \mid u}(\Lambda, t_i) = \sum_{j=1}^{n_{v \to u \mid u}} \varphi(\Lambda, t_j, t_i),$$

  where
  • φ(Λ, t_j, t_i): an indicator function, 1 if t_j lies in time slice t_i and 0 otherwise;
  • t_j: time-stamp of the jth message on topic Λ from v to u.
[Figure: backscatter, with Bob, Emily, Charlie, and Alice]
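The two neighborhood counts can be sketched as simple tallies over a message log. This is a minimal illustration, not the authors' implementation: the `Message` record, the half-open time-slice convention, and the reading of susceptibility as "messages from v to any contact" are assumptions made here.

```python
from typing import NamedTuple


class Message(NamedTuple):
    # Hypothetical record layout; the slides do not specify one.
    sender: str
    recipient: str
    topic: str
    timestamp: float


def in_slice(t_j: float, t_i: tuple[float, float]) -> int:
    """Indicator phi: 1 if timestamp t_j lies in slice t_i = [start, end), else 0."""
    start, end = t_i
    return 1 if start <= t_j < end else 0


def susceptibility(log, v, topic, t_i):
    """Count messages on `topic` that v sent to any contact during slice t_i."""
    return sum(in_slice(m.timestamp, t_i)
               for m in log
               if m.sender == v and m.topic == topic)


def backscatter(log, v, u, topic, t_i):
    """Count messages on `topic` that v sent to u during slice t_i."""
    return sum(in_slice(m.timestamp, t_i)
               for m in log
               if m.sender == v and m.recipient == u and m.topic == topic)


# Toy log with made-up timestamps.
log = [
    Message("bob", "alice", "movie", 1.0),
    Message("bob", "emily", "movie", 2.0),
    Message("bob", "alice", "sports", 3.0),
    Message("bob", "alice", "movie", 12.0),  # falls outside the slice below
]
week = (0.0, 10.0)
print(susceptibility(log, "bob", "movie", week))        # 2
print(backscatter(log, "bob", "alice", "movie", week))  # 1
```

Both features reduce to the same indicator sum; they differ only in which sender and recipient pairs are included.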
13
Topic Context: Message Coherence
• ConceptNet is used to compute distances between messages.
• Why ConceptNet?
  • Expands on pure lexical terms to compound terms, e.g. “buy food”.
  • Contains practical knowledge, e.g. we can infer that a student is near a library.
• The distance between a message m and a topic Λ is given as

  $$d_c(m, \Lambda) = \max_{q} \min_{k} d(w_q, w_k),$$

  where
  • w_k: a word corresponding to topic Λ;
  • w_q: a word in message m.
[Figure: message coherence]
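The max-min distance can be sketched as follows. The word-level distance is passed in as a function; `toy_distance` and its `RELATED` table are hypothetical stand-ins for a ConceptNet-derived distance, which is not reproduced here.

```python
from typing import Callable, Iterable


def message_topic_distance(
    message_words: Iterable[str],
    topic_words: Iterable[str],
    word_distance: Callable[[str, str], float],
) -> float:
    """d_c(m, topic): for each message word find the closest topic word,
    then return the largest of those closest distances."""
    topic_words = list(topic_words)
    return max(
        min(word_distance(w_q, w_k) for w_k in topic_words)
        for w_q in message_words
    )


# Toy stand-in for a ConceptNet-based distance (assumption): 0 for identical
# words, 0.5 for pairs listed as related, 1 otherwise.
RELATED = {frozenset(("film", "movie")), frozenset(("popcorn", "movie"))}


def toy_distance(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in RELATED else 1.0


on_topic = message_topic_distance(["film", "popcorn"], ["movie", "cinema"], toy_distance)
off_topic = message_topic_distance(["film", "homework"], ["movie", "cinema"], toy_distance)
print(on_topic, off_topic)  # 0.5 1.0
```

Because of the outer maximum, a single word that is far from every topic word is enough to make the whole message count as distant from the topic.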
14
Topic Context: Temporal Coherence
• Determined by the mean and variance of the difference in the time stamps of messages.
• The mean μ_j is

  $$\mu_j(\Lambda, t_j, t_i) = \sum_{m \in t_j} \big( T(m, \Lambda, t_j) - t_i \big) / n(\Lambda, t_j),$$

  where
  • n(Λ, t_j): the number of messages on topic Λ in the time slice t_j;
  • m: the index of a message of topic Λ in the time slice t_j.
[Figure: temporal coherence]
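A minimal sketch of the mean and variance of time-stamp differences for one topic in one time slice. The choice of reference time and the use of the population variance are assumptions; the slide only states that a mean and a variance are used.

```python
from statistics import fmean, pvariance


def temporal_coherence(timestamps, reference):
    """Mean and population variance of message time offsets from `reference`."""
    offsets = [t - reference for t in timestamps]
    if not offsets:
        raise ValueError("no messages on this topic in the time slice")
    return fmean(offsets), pvariance(offsets)


# Made-up timestamps (in hours) for three messages on one topic.
mean, var = temporal_coherence([2.0, 4.0, 6.0], reference=1.0)
print(mean, var)
```

A small variance means the messages on the topic arrive at a regular offset; a large one means they are scattered across the slice.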
15
Recipient Context
• Reciprocity reflects the symmetry in communication.
• Communication correlation reflects the topical alignment of two individuals.
• Communication significance reflects the importance of communication activity with a particular person with respect to the whole social network.
[Figure: reciprocity, communication correlation, and communication significance]
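The slide describes these three features only qualitatively. The sketch below shows one plausible way to turn each description into a number; all three formulas (min/max ratio, cosine similarity of topic counts, share of outgoing messages) are assumptions chosen for illustration, not the definitions used by the authors.

```python
from collections import Counter
from math import sqrt


def reciprocity(n_uv: int, n_vu: int) -> float:
    """Assumed form: 1.0 when u and v message each other equally, 0.0 when one-sided."""
    hi = max(n_uv, n_vu)
    return min(n_uv, n_vu) / hi if hi else 0.0


def topic_correlation(topics_u: Counter, topics_v: Counter) -> float:
    """Assumed form: cosine similarity between the two users' topic-count vectors."""
    dot = sum(topics_u[t] * topics_v[t] for t in topics_u)
    norm = (sqrt(sum(c * c for c in topics_u.values()))
            * sqrt(sum(c * c for c in topics_v.values())))
    return dot / norm if norm else 0.0


def significance(n_uv: int, n_u_total: int) -> float:
    """Assumed form: share of all of u's outgoing messages that go to v."""
    return n_uv / n_u_total if n_u_total else 0.0


# Made-up counts.
print(reciprocity(6, 3))                                                       # 0.5
print(topic_correlation(Counter(movie=3, sports=4), Counter(movie=3, sports=4)))  # 1.0
print(significance(6, 24))                                                     # 0.25
```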
16
Introduction / Related work
Problem Statement
Communication Context
SVR Based prediction
MySpace dataset
Experimental Results
Conclusions
Outline
• Sequential SVR approach
17
The Prediction Algorithm
[Figure: sequential prediction over time slices t, t+1, t+2, showing feature vectors x_i, predicted intent y_i, actual communication y_i’, and error in prediction E]
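The sequential scheme can be sketched as a walk-forward loop: train on the slices seen so far, predict the next slice, then compare with the actual communication. The regressor is pluggable; `fit_mean` below is a deliberately trivial stand-in and is not SVR. The names `sequential_predict`, `fit_mean`, and `warmup` are hypothetical.

```python
from statistics import fmean
from typing import Callable, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]


def fit_mean(xs: Sequence[Features], ys: Sequence[float]) -> Model:
    """Stand-in regressor (not SVR): always predicts the mean training target."""
    mean = fmean(ys)
    return lambda x: mean


def sequential_predict(xs, ys, fit=fit_mean, warmup=1):
    """Walk forward through time slices: train on slices before t, predict slice t.

    Returns a list of (t, predicted, actual, absolute_error) tuples.
    """
    results = []
    for t in range(warmup, len(xs)):
        model = fit(xs[:t], ys[:t])
        predicted = model(xs[t])
        results.append((t, predicted, ys[t], abs(predicted - ys[t])))
    return results


# Made-up feature vectors and intents for four consecutive time slices.
xs = [[0.2, 1.0], [0.4, 0.8], [0.6, 0.9], [0.8, 0.7]]
ys = [0.2, 0.4, 0.6, 0.8]
results = sequential_predict(xs, ys)
for t, predicted, actual, error in results:
    print(t, round(predicted, 3), actual, round(error, 3))
```

If scikit-learn is available, `fit` could instead wrap `sklearn.svm.SVR` so that the loop trains a support vector regressor at each step.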
18
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
Outline
• Crawling Details
• Topology of the crawled network
19
Crawling Statistics
• World’s largest social networking site, with over 108 million users.
• Crawling using a depth-first search (DFS) strategy.
Some statistics of crawled data:
Users        20,000
Messages     1,425,010
Time-span    Sept 2005 - Apr 2007
[Figure: a snapshot of MySpace; crawling diagram with Tom]
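A depth-first crawl can be sketched with an explicit stack. The `get_contacts` callable is a hypothetical placeholder for whatever fetches a profile's contact list; the toy graph stands in for the real site.

```python
from typing import Callable, Iterable


def dfs_crawl(
    seed: str,
    get_contacts: Callable[[str], Iterable[str]],
    max_users: int,
) -> list[str]:
    """Visit profiles depth-first from `seed`, stopping after `max_users` profiles."""
    visited: list[str] = []
    seen = {seed}
    stack = [seed]
    while stack and len(visited) < max_users:
        user = stack.pop()
        visited.append(user)
        # Push in reverse so the first-listed contact is explored first.
        for contact in reversed(list(get_contacts(user))):
            if contact not in seen:
                seen.add(contact)
                stack.append(contact)
    return visited


# Toy contact graph (made up); a real crawler would fetch these lists remotely.
toy_graph = {
    "tom": ["alice", "bob"],
    "alice": ["emily"],
    "bob": ["charlie"],
    "emily": [],
    "charlie": [],
}
order = dfs_crawl("tom", lambda u: toy_graph.get(u, []), max_users=4)
print(order)  # ['tom', 'alice', 'emily', 'bob']
```

The `max_users` cap plays the role of a crawl budget, so the traversal stops once enough profiles have been collected.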
20
Topology Characteristics
Topology Statistic              Measure
Average Shortest Path Length    5.952
Average Degree per node         215.27 (γ = 2.01)
Mean Clustering Coefficient     0.79
[Figures: average path length distribution for MySpace crawled data; topic histogram]
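For reference, two of the statistics in the table can be computed on a small graph as sketched below. The graph is a made-up toy, it is treated as undirected and connected, and nodes with fewer than two neighbors are given a clustering coefficient of 0; these are assumptions of the sketch, not details from the slides.

```python
from collections import deque
from itertools import combinations


def mean_clustering(adj):
    """Average over nodes of (links among neighbors) / (possible links among neighbors)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient taken as 0 for degree < 2
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_shortest_path(adj):
    """Mean BFS distance over all reachable ordered pairs of distinct nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy undirected graph: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(mean_clustering(adj), average_shortest_path(adj))
```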
21
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
Outline
• Baseline heuristics for validation
• Prediction of intent and delay
• Feature evaluation
• Network Scalability
22
Baseline Techniques
• For intent to communicate:
  • The ratio of the number of messages sent by u to v on topic Λ to the total number of messages sent by u to v in the past on all topics.
• For estimate of delay:
  • The mean delay between two contacts u and v on topic Λ is the mean delay between all pairs of corresponding messages on the same topic.
  • ConceptNet is used to compute message correspondence.
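Both baselines are simple enough to sketch directly. The input shapes are assumptions: a list of topic labels for the past messages from u to v, and a list of (received, responded) time pairs whose correspondence has already been established (the slides use ConceptNet for that step, which is not reproduced here).

```python
from collections import Counter
from statistics import fmean


def baseline_intent(topics_sent_u_to_v, topic):
    """Fraction of u's past messages to v that were on `topic`."""
    counts = Counter(topics_sent_u_to_v)
    total = sum(counts.values())
    return counts[topic] / total if total else 0.0


def baseline_delay(message_pairs):
    """Mean delay over (received_at, responded_at) pairs of corresponding messages."""
    delays = [responded - received for received, responded in message_pairs]
    return fmean(delays) if delays else None


# Made-up history: three of five past messages were about movies.
print(baseline_intent(["movie", "movie", "sports", "movie", "dinner"], "movie"))  # 0.6
# Made-up response times in hours.
print(baseline_delay([(0.0, 4.0), (10.0, 12.0)]))  # 3.0
```

The intent baseline is just the prior probability of the topic between the two users, which is what the contextual features are meant to improve upon.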
23
Experimental Setup
• A user u is randomly sampled from the contacts of Tom (the super-user).
• A set of the top eight contacts (v) of u, determined by high message density.
• Recipient variability:
  • Prediction of communication flow averaged over five weeks for each contact.
• Temporal variability:
  • Prediction of communication flow averaged over all eight contacts for each of the five weeks.
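The two kinds of variability are averages of the same error table along different axes. A small sketch, using a made-up 2-contact by 3-week table of error percentages in place of the 8-contact by 5-week setup:

```python
from statistics import fmean

# errors[c][w]: hypothetical prediction error (%) for contact c in week w.
errors = [
    [10, 20, 30],
    [30, 20, 10],
]

# Recipient variability: one number per contact, averaged over weeks.
per_contact = [fmean(row) for row in errors]

# Temporal variability: one number per week, averaged over contacts.
per_week = [fmean(col) for col in zip(*errors)]

print(per_contact)  # [20.0, 20.0]
print(per_week)     # [20.0, 20.0, 20.0]
```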
24
Predicted Intent
• The communication intent depends on a wide variety of contextual factors (neighborhood, topic, and recipient), not just on the prior probability of communication.
25
Predicted Estimate of Delay
• Delay may be strongly influenced by factors other than the social network interaction (e.g. it may be habitual).
26
Evaluation of Features
• A person’s neighboring social network indeed affects whether or not she will engage in a particular communication quickly.
[Figure: errors in the leave-one-out (L-O-O) procedure. Error (%) in predicted intent and delay, on a 0-35 scale, when each feature is removed: No Susceptibility, No Backscatter, No Message Coherence, No Temporal Coherence, No Topic Quantity, No Topic Relevance, No Reciprocity, No Communication Correlation, No Communication Significance]
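The leave-one-out feature evaluation can be sketched as a loop that removes one feature at a time and re-measures the error. The `evaluate` callable is a placeholder for retraining and scoring the predictor; `toy_error` and its weights are made up purely so the example runs.

```python
def leave_one_feature_out(feature_names, evaluate):
    """Return {dropped_feature: error} by re-evaluating with each feature removed in turn."""
    results = {}
    for dropped in feature_names:
        kept = [name for name in feature_names if name != dropped]
        results[dropped] = evaluate(kept)
    return results


# Hypothetical evaluator: a base error plus a penalty for every missing feature.
# The penalties are invented numbers, not results from the slides.
PENALTY = {"susceptibility": 3.0, "backscatter": 2.0, "reciprocity": 1.0}


def toy_error(kept):
    return 10.0 + sum(p for name, p in PENALTY.items() if name not in kept)


ablation = leave_one_feature_out(list(PENALTY), toy_error)
print(ablation)
```

The feature whose removal raises the error the most is the one the predictor relies on most heavily.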
27
Scaling Experiment Details
• An exponential function f(n) = exp(n/k), where k = 4.6 and n = 1, 2, 3, 4, …, 35, is used to choose networks with node out-degree values f(n).
• Select the top three users corresponding to each f(n), based on high message density.
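The out-degree targets follow directly from the formula. Rounding f(n) to the nearest integer is an assumption of this sketch, since the slides do not say how fractional values are matched to actual out-degrees.

```python
import math

K = 4.6  # scale constant from the slide

# f(n) = exp(n / k) for n = 1, ..., 35
out_degrees = [math.exp(n / K) for n in range(1, 36)]

# Assumed step: round to whole-number out-degree targets.
targets = [round(d) for d in out_degrees]

print(targets[0], targets[-1])
```

Exponential spacing gives many small networks and progressively fewer very large ones, covering roughly three orders of magnitude of out-degree.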
28
Scalability of Intent
• With an increase in network size, the user is in regular correspondence with only a small fraction of the network.
[Figure: scalability of intent for Topic A and Topic B]
29
Scalability of Delay
• Delay is influenced by a majority with whom the user is not in active communication.
• Delay may be affected by intrinsic factors (e.g. habit) and less by the contextual factors.
[Figure: scalability of delay for Topic A and Topic B]
30
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
Outline
• Summary
• Contributions and Future Work
31
Summary
• Predict communication flow in large-scale social networks based on communication context.
  • Identified three aspects: neighborhood, topic, and recipient context.
• Intent to communicate and delay predicted using SVR.
• Excellent results on a real-world dataset, MySpace.com:
  • for a single user
  • for networks of different sizes.
32
Conclusions
• Consequences:
  • Intent to communicate is strongly affected by contextual factors.
  • Delay is less affected.
• Modeling communication context is essential.
• Future work:
  • Comparison against a standardized flow model, e.g. an epidemic disease propagation model.
  • Prediction given a pair of users who are separated by n different people in the social network.
33
Thanks!