Nonparametric Link Prediction in Dynamic Graphs
Purnamrita Sarkar (UC Berkeley)
Deepayan Chakrabarti (Facebook)
Michael Jordan (UC Berkeley)
Link Prediction
Who is most likely to interact with a given node?
Friend suggestion in Facebook: should Facebook suggest Alice as a friend for Bob?
Link Prediction
Movie recommendation in Netflix: should Netflix suggest this movie to Alice?
Link Prediction
Prediction using simple features (a small sketch of these follows below):
- degree of a node
- number of common neighbors
- last time a link appeared
What if the graph is dynamic?
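To make these concrete, here is a minimal sketch of the three features on a static snapshot, assuming a plain edge-list input and an adjacency-dict representation (build_adjacency, edge_times, and all names here are illustrative, not from the talk):

    from collections import defaultdict

    def build_adjacency(edges):
        """Adjacency as a dict mapping each node to its set of neighbors."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        return adj

    def degree(adj, i):
        return len(adj[i])

    def common_neighbors(adj, i, j):
        return len(adj[i] & adj[j])

    def last_link(edge_times, i, j):
        """Most recent time an (i, j) edge appeared; None if it never did."""
        times = edge_times.get(frozenset((i, j)), [])
        return max(times) if times else None

    adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(degree(adj, 3), common_neighbors(adj, 1, 3))  # 3 1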
Related Work
Generative models:
- Exponential family random graph models [Hanneke+/'06]
- Dynamics in latent space [Sarkar+/'05]
- Extension of mixed membership block models [Fu+/'10]
Other approaches:
- Autoregressive models for links [Huang+/'09]
- Extensions of static features [Tylenda+/'09]
Goal
Link prediction incorporating graph dynamics, requiring weak modeling assumptions, allowing fast predictions, and offering consistency guarantees.
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
The Link Prediction Problem in Dynamic Graphs
Observe a sequence of graphs G1, G2, …, GT+1 with edge indicators Y1(i,j) = 1, Y2(i,j) = 0, …, YT+1(i,j) = ?
YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( gG1,G2,…,GT(i,j) )
Left side: the edge indicator at time T+1. Right side: g is a function of features of the previous graphs and of this pair of nodes.
Including graph-based features
Example set of features for pair (i,j): cn(i,j) (common neighbors), ℓℓ(i,j) (last time a link was formed), deg(j).
Represent dynamics using "datacubes" of these features ≈ a multi-dimensional histogram on binned feature values (a construction sketch follows below).
Example cell: 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2
ηt = #pairs in Gt with these features
ηt+ = #pairs in Gt with these features, which had an edge in Gt+1
A high ηt+/ηt means this feature combination is more likely to create a new edge at time t+1.
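A minimal sketch of one such datacube, stored as a dictionary from binned feature cells to the pair (ηt, ηt+); the bin edges and function names are illustrative assumptions:

    import bisect
    from collections import defaultdict

    # Illustrative bin edges, not the paper's choices
    CN_BINS, DEG_BINS, LL_BINS = [0, 1, 4, 10], [0, 3, 7, 20], [0, 1, 3, 10]

    def bin_features(cn, deg, ll):
        """Map raw feature values to a binned cell such as (1, 0, 2)."""
        b = lambda edges, x: bisect.bisect_right(edges, x) - 1
        return (b(CN_BINS, cn), b(DEG_BINS, deg), b(LL_BINS, ll))

    def build_datacube(pairs_t, edges_next):
        """pairs_t: iterable of ((i, j), (cn, deg, ll)) at time t.
        edges_next: set of frozenset pairs with an edge at time t+1.
        Returns: cell -> [eta_t, eta_t_plus]."""
        cube = defaultdict(lambda: [0, 0])
        for (i, j), feats in pairs_t:
            cell = bin_features(*feats)
            cube[cell][0] += 1                     # eta_t
            if frozenset((i, j)) in edges_next:
                cube[cell][1] += 1                 # eta_t+
        return cube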
Including graph-based features
How do we form these datacubes?
Vanilla idea: one datacube for the transition Gt → Gt+1, aggregated over all pairs (i,j).
Problem: does not allow for differently evolving communities.
Our Model
How do we form these datacubes? Our model: one datacube for each neighborhood.
This captures local evolution.
Our Model
Neighborhood: Nt(i) = nodes within 2 hops of i (a computation sketch follows below).
Datacube: for each cell of binned features (e.g., 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2), store
- η: the number of node pairs with feature s in the neighborhood of i at time t
- η+: the number of node pairs with feature s in the neighborhood of i at time t which got connected at time t+1
Features are extracted from (Nt-p, …, Nt).
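A minimal sketch of the 2-hop neighborhood, reusing the adjacency-dict representation from the earlier feature sketch (the function name is an assumption):

    def two_hop_neighborhood(adj, i):
        """Nt(i): all nodes within 2 hops of i, excluding i itself."""
        one_hop = set(adj[i])
        reached = set(one_hop)
        for u in one_hop:
            reached |= adj[u]
        return reached - {i}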
Our Model
Datacube dt(i) captures graph evolution in the local neighborhood of a node in the recent past.
Model: YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( g(dt(i), st(i,j)) )
where st(i,j) are the features of the pair and dt(i) encodes the local evolution patterns.
What is g(.)?
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
Kernel Estimator for g
Query: the datacube at T-1 and the feature vector at time T.
Compute similarities between the query and every historical (datacube, feature) pair at t = 1, 2, 3, …
[Figure: timeline G1, G2, …, GT-2, GT-1, GT, with the query compared against the (datacube, feature) pairs at each earlier time step]
Kernel Estimator for g
Factorize the similarity function: similarity( (datacube, feature), (datacube′, feature′) ) = K(datacube, datacube′) · I{feature == feature′}
This allows computation of g(.) via simple lookups.
Kernel Estimator for g
Compute similarities only between datacubes: weight wi = K(query datacube, datacube i), with the counts (ηi, ηi+) looked up from each matching datacube cell.
ĝ = (w1 η1+ + w2 η2+ + w3 η3+ + w4 η4+) / (w1 η1 + w2 η2 + w3 η3 + w4 η4)
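A minimal sketch of this weighted lookup, assuming the kernel weights have already been computed; the function name and input format are illustrative:

    def kernel_estimate(matches):
        """matches: (w_i, eta_i, eta_i_plus) triples, one per historical
        datacube whose cell matches the query feature vector.
        Returns g_hat, the estimated edge-creation probability."""
        num = sum(w * ep for w, e, ep in matches)
        den = sum(w * e for w, e, ep in matches)
        return num / den if den > 0 else 0.0

    # Four historical datacubes with weights w_1..w_4:
    print(kernel_estimate([(0.9, 10, 3), (0.5, 100, 20), (0.2, 8, 1), (0.1, 50, 5)]))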
Kernel Estimator for g
Factorizing the similarity as K(datacube, datacube′) · I{feature == feature′} allows computation of g(.) via simple lookups.
What is K( , )?
Similarity between two datacubes
Idea 1: for each cell s, take (η1+/η1 − η2+/η2)² and sum over cells.
Problem: the magnitude of η is ignored; 5/10 and 50/100 are treated equally.
Instead, consider the distribution behind each cell's (η, η+).
Similarity between two datacubes
Idea 2: for each cell s, compute the posterior distribution of the edge-creation probability given (η, η+).
dist( , ) = total variation distance between these distributions, summed over all cells.
K( , ) = b^dist( , ), with 0 < b < 1.
As b → 0, K( , ) → 0 unless dist( , ) = 0.
[Figure: posterior distributions of the edge-creation probability for two datacube cells]
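A minimal sketch of this kernel. It assumes a Beta(1 + η+, 1 + η − η+) posterior per cell (a uniform prior chosen here for illustration; the slides do not specify one) and discretizes [0, 1] to compute total variation:

    from math import lgamma, log, exp

    def beta_pmf(eta, eta_plus, n_bins=50):
        """Discretized Beta(1 + eta_plus, 1 + eta - eta_plus) posterior."""
        a, b = 1 + eta_plus, 1 + (eta - eta_plus)
        ln = lgamma(a + b) - lgamma(a) - lgamma(b)
        pmf = [exp(ln + (a - 1) * log((k + 0.5) / n_bins)
                      + (b - 1) * log(1 - (k + 0.5) / n_bins))
               for k in range(n_bins)]
        z = sum(pmf)
        return [p / z for p in pmf]

    def tv_distance(p, q):
        """Total variation distance between two discretized distributions."""
        return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

    def datacube_kernel(cube1, cube2, b=0.3):
        """K(cube1, cube2) = b ** (sum over cells of TV distances), 0 < b < 1."""
        cells = set(cube1) | set(cube2)
        dist = sum(tv_distance(beta_pmf(*cube1.get(c, (0, 0))),
                               beta_pmf(*cube2.get(c, (0, 0))))
                   for c in cells)
        return b ** dist

Note how 5/10 and 50/100 now differ: they share a ratio but not a spread, so their posteriors have nonzero total variation distance, which is exactly what Idea 1 misses.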
Kernel Estimator for g
f̂(d, s) ∝ Σt K(d, dt) · ηt(s)
ĥ(d, s) ∝ Σt K(d, dt) · ηt+(s)
ĝ(d, s) = ĥ(d, s) / f̂(d, s)
Want to show: ĝ → g
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
Consistency of Estimator
Lemma 1: As T → ∞, for some R > 0, [equation not preserved in the transcript].
Proof uses ĝ( , ) = ĥ( , ) / f̂( , ) and the limiting behavior of f̂ and ĥ as T → ∞.
Consistency of Estimator
Lemma 2: As T → ∞, [equation not preserved in the transcript], where ĝ( , ) = ĥ( , ) / f̂( , ).
Consistency of Estimator
Assumption: finite graph.
Proof sketch:
- Dynamics are Markovian with finite state space
- ⇒ the chain must eventually enter a closed, irreducible communication class
- ⇒ geometric ergodicity if the class is aperiodic (if not, more complicated…)
- ⇒ strong mixing with exponential decay
- ⇒ variances decay as o(1/T), as sketched below
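The last step is standard mixing arithmetic (a sketch under the assumption of bounded, stationary summands, not reproduced from the paper): the variance of a time average is a sum of covariances, and exponentially decaying mixing coefficients keep that sum finite.

    % For bounded, stationary X_t with strong mixing coefficients \alpha(k):
    \operatorname{Var}\Big(\frac{1}{T}\sum_{t=1}^{T} X_t\Big)
      = \frac{1}{T^2}\sum_{t,t'}\operatorname{Cov}(X_t, X_{t'})
      \le \frac{1}{T}\sum_{k=-\infty}^{\infty}\big|\operatorname{Cov}(X_0, X_k)\big|,
    \qquad
    \big|\operatorname{Cov}(X_0, X_k)\big| \le 4\,\|X\|_\infty^2\,\alpha(|k|) \le C\,\rho^{|k|},\ 0 < \rho < 1.

Since the geometric sum over ρ^|k| is finite, the variance of the time average vanishes as T grows.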
Consistency of Estimator
Theorem: ĝ → g as T → ∞ (full statement not preserved in the transcript).
Proof sketch: combine Lemmas 1 and 2; for some R > 0, both error terms vanish, so the estimator is consistent.
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
Scalability
Full solution: summing over all n datacubes for all T timesteps is infeasible.
Approximate solution: sum over the nearest neighbors of the query datacube.
How do we find nearest neighbors? Locality Sensitive Hashing (LSH) [Indyk+/'98, Broder+/'98]
Using LSH
Devise a hashing function for datacubes such that "similar" datacubes tend to be hashed to the same bucket, where "similar" means small total variation distance between the cells of the datacubes.
Using LSH
Step 1: Map datacubes to bit vectors (a sketch of this encoding follows below).
- Use B1 buckets to discretize [0, 1]
- Use B2 bits for each bucket; for probability mass p, the first ⌈p·B2⌉ bits are set to 1
- Total M·B1·B2 bits, where M = max number of occupied cells << total number of cells
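A minimal sketch of Step 1 under this reading (per-bucket unary encoding with ⌈p·B2⌉ leading ones); parameter names and values are illustrative:

    import math

    def distribution_to_bits(pmf, b1=10, b2=8):
        """Encode one cell's discretized distribution as B1 buckets of B2
        unary bits each; small L1 distance between distributions then maps
        to small Hamming distance between bit vectors."""
        step = len(pmf) / b1
        buckets = [sum(pmf[int(k * step):int((k + 1) * step)]) for k in range(b1)]
        bits = []
        for p in buckets:
            ones = min(b2, math.ceil(p * b2))
            bits.extend([1] * ones + [0] * (b2 - ones))
        return bits

A datacube's full vector concatenates these encodings over its M occupied cells, giving M·B1·B2 bits.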
Using LSH
Step 1 gives: total variation distance ∝ L1 distance between distributions ≈ (scaled) Hamming distance between bit vectors.
Step 2: Hash function = k bits chosen out of the M·B1·B2 bits.
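A minimal sketch of Step 2: sample k fixed bit positions as the hash, so vectors at small Hamming distance agree on all k positions with high probability (the bucket-table wrapper is illustrative):

    import random
    from collections import defaultdict

    def make_hash(total_bits, k, seed=0):
        """Hash = the values of k randomly chosen bit positions."""
        positions = random.Random(seed).sample(range(total_bits), k)
        return lambda bits: tuple(bits[p] for p in positions)

    h = make_hash(total_bits=5 * 10 * 8, k=16)     # M=5, B1=10, B2=8
    table = defaultdict(list)

    def insert(cube_id, bits):
        table[h(bits)].append(cube_id)

    def candidates(bits):
        """Datacubes colliding with the query: the near-neighbor pool."""
        return table[h(bits)]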
Fast Search Using LSH
[Figure: long bit vectors, one per datacube, hashed down to short bit strings; similar vectors share a hash bucket, so only that bucket is searched]
Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments
Experiments
Baselines (a scoring sketch follows the list):
- LL: last link (time of last occurrence of a pair)
- CN: rank by number of common neighbors in GT
- AA: like CN, but with more weight to low-degree common neighbors
- Katz: accounts for longer paths
- CN-all: apply CN to the union of G1, …, GT; AA-all, Katz-all: similar
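For reference, a minimal sketch of the CN and AA scores on one snapshot, reusing the adjacency dict from the earlier sketches (an illustration of the baselines, not the talk's implementation):

    import math

    def cn_score(adj, i, j):
        """Common neighbors."""
        return len(adj[i] & adj[j])

    def aa_score(adj, i, j):
        """Adamic-Adar: common neighbors down-weighted by log-degree."""
        return sum(1.0 / math.log(len(adj[u]))
                   for u in adj[i] & adj[j] if len(adj[u]) > 1)

    def rank_candidates(adj, i, score=cn_score):
        """Rank non-neighbors of i as potential new links."""
        cands = [v for v in adj if v != i and v not in adj[i]]
        return sorted(cands, key=lambda v: score(adj, i, v), reverse=True)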
Setup
Pick a random subset S from the nodes with degree > 0 in GT+1.
For each s ∈ S, predict a ranked list of nodes likely to link to s.
Report mean AUC; higher is better (a computation sketch follows below).
Training data: G1, G2, …, GT. Test data: GT+1.
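A minimal sketch of the mean-AUC computation from ranked lists, using the rank-sum (Mann-Whitney) form of AUC; the input containers are illustrative:

    def auc_from_ranking(ranked, positives):
        """Probability that a random true link outranks a random non-link."""
        pos_ranks = [r for r, v in enumerate(ranked) if v in positives]
        n_pos, n_neg = len(pos_ranks), len(ranked) - len(pos_ranks)
        if n_pos == 0 or n_neg == 0:
            return None
        # Count (positive, negative) pairs ranked in the correct order
        correct = sum(n_neg - (r - k) for k, r in enumerate(pos_ranks))
        return correct / (n_pos * n_neg)

    def mean_auc(predictions, truth):
        """predictions: {s: ranked candidate list}; truth: {s: set of new links}."""
        scores = [auc_from_ranking(rl, truth[s]) for s, rl in predictions.items()]
        scores = [x for x in scores if x is not None]
        return sum(scores) / len(scores)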
Simulations
Social network model of Hoff et al. (a simulation sketch follows below):
- Each node has an independently drawn feature vector
- Edge (i,j) depends on the features of i and j
- Seasonality effect: feature importance varies with season ⇒ different communities in each season
- Feature vectors evolve smoothly over time ⇒ evolving community structures
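A minimal sketch of a generator in this spirit, assuming a logistic link and sinusoidal seasonal feature weights; every constant here is an illustrative choice, not the paper's simulation settings:

    import math, random

    def simulate(n=50, T=20, d=4, seed=0):
        """Edge sets G_1..G_T from smoothly drifting latent features with
        season-dependent feature weights (Hoff-style latent space)."""
        rng = random.Random(seed)
        x = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
        graphs = []
        for t in range(T):
            # Seasonality: each feature's importance oscillates over time
            w = [1 + math.sin(2 * math.pi * (t / 10 + k / d)) for k in range(d)]
            edges = set()
            for i in range(n):
                for j in range(i + 1, n):
                    s = sum(wk * x[i][k] * x[j][k] for k, wk in enumerate(w))
                    p = 1 / (1 + math.exp(min(60.0, 2 - s)))  # clipped logistic, sparse offset
                    if rng.random() < p:
                        edges.add((i, j))
            graphs.append(edges)
            x = [[xk + rng.gauss(0, 0.1) for xk in xi] for xi in x]  # smooth drift
        return graphs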
Simulations
NonParam is much better than the others in the presence of seasonality.
CN, AA, and Katz implicitly assume smooth evolution.
Sensor Network*
* www.select.cs.cmu.edu/data
Summary
- Link formation is assumed to depend on the neighborhood's evolution over a time window
- Admits a kernel-based estimator
- Consistency guarantees
- Scalability via LSH
- Works particularly well for seasonal effects and differently evolving communities