Prediction in Dynamic Graph Sequences
Emile Richard
CMLA-ENS Cachan & 1000mercis
Supervisors: Th. Evgeniou (INSEAD) and N. Vayatis (CMLA-ENS Cachan)
January 20, 2012
Table of contents
Context: Motivation; Data Description
Problem Formulation: Random Graph Models; Link Prediction Heuristics; Framework
Algorithms: Two-stage Optimization; Joint Optimization in W and S; Variants
Discussion
References
Context
Motivation
From Big Data to Business Decisions
1000mercis: interactive marketing and advertising (emailing, mobile, viral games)
1. Send fewer ads: email is free, so it is easy to overwhelm consumers
2. Make consumers happy: serendipity
3. Act sustainably: avoid long-term fatigue
4. Earn more: up to 5 times as much!
Prediction in Relational Databases?
- Recommender systems
- Links: to select recommendations, offline fine-tuning
- Sales volumes: prepare for or push trends
- Resource allocation: consumers and contributors in UGC [Zhang11], stock management
- Understanding of the data through relevant feature extraction
[Figures: weekly log-volumes over ~300 weeks for "Returning" and "New" users (x-axis: Time (week), y-axis: Log); schematic of the marketplace linking Sellers, Products, Buyers, and Commission]
Similar Problems
- The Netflix Prize: $1M for a 10% improvement in accuracy
- Amazon: 35% of sales generated by recommendations [Linden03]
- CRM optimization: acquisition, cross-selling, churn management, prediction of top-selling items, etc.
Other Web Applications
Similar Problems in Computational Biology¹
- Understanding the underlying mechanisms of biological systems
- Inference procedures for analyzing the effects of biological pathways in cancer progression
- Studying the effect of potential drugs/treatments on gene regulatory networks in cancer cells
¹ After a discussion with Ali Shohaie
Data Description
Case Study
- Data: C-to-C website
- Recommendation newsletters and banners
- Management of promotional assets and pressure on users
Domain      | users | products | daily sales
Music       | 0.4M  | 60K      | 2K
Books       | 1.2M  | 1.7M     | 18K
Electronics | 0.5M  | 60K      | 2K
Video Games | 0.9M  | 0.2M     | 9K
Heterogeneous Domains
[Figures: per-domain density plots (Video Games, Music, Electronic Devices, Books) of log(Clustering Coefficient), log(degree), log(d(2)/degree), and log(d(3)/d(2)), on both the user and product sides; joint User x Product degree distributions for Books and Music, with users and products sorted by decreasing degree]
Problem Formulation
Dynamic Graphs
- Nodes linked by edges that appear over time
- Web applications, economics, biology, drug discovery
- (Social network users, friendship)
- (Users and products, purchases or clicks)
- (Websites, hyperlinks)
- (Proteins, interaction)
Prediction at Descriptor (macro) and Edge (micro) Levels
- Network effect: both cause and symptom of the evolution of node features, e.g. popularity, homophily, centrality, diffusion level
- Simultaneously predict node features and future links
Complex Networks?
- Degrees of freedom ∼ n², n: # nodes
- Latent factors: r ≪ n, r: # latent factors
- Intrinsic dimensionality reduced to ∼ rn ≪ n²
- Kepler's Laws of networks
Random Graph Models
- Erdős–Rényi [Bollobas01]: nodes connected with uniform probability. No prediction chance
- Preferential Attachment [Albert02]: reproduces power-law degree distributions. Rich-get-richer
- Block models [Nowicki01]: k blocks or clusters form the structure of the graph. Community structure
- Latent factor model [Hoff02, Krivitsky10]: node latent factors zᵢ, zⱼ and pairwise covariate descriptors xᵢ,ⱼ, with

  P(Y | X, Z, θ) = ∏_{i≠j} P(Yᵢ,ⱼ | Xᵢ,ⱼ, Zᵢ, Zⱼ, θ)

  log-odds(yᵢ,ⱼ = 1 | xᵢ,ⱼ, zᵢ, zⱼ, α, β) ∝ α − β xᵢ,ⱼ + ‖zᵢ − zⱼ‖²
Parameter Estimation
Exponential Random Graph Families [Wasserman96]
- Graph z: realization of a random variable Z
- P_θ(Z = z) = exp(θ⊤ω(z) − Ψ(θ))
- θ ∈ R^Q: vector of parameters
- ω: sufficient statistics on the graph z, with ω(z) ∈ R^Q
- Ψ: a normalization factor
- Parameter estimation by maximizing the log-likelihood
Link Prediction Heuristics
Nearest Neighbors and Walks
Hypothesis: a graph G is partially observed; we aim to find the hidden edges [Kleinberg07]
- Friends of my friends are likely to be my friends.
- A ∈ {0, 1}^{n×n}: the social adjacency matrix
- (A²)ᵢ,ⱼ = ∑_{k=1}^n Aᵢ,ₖ Aₖ,ⱼ = # paths of length 2 from i to j = # common friends of i and j
- Random walks: take W = D⁻¹A, where D is the diagonal matrix of degrees
- Katz = ∑_{k=1}^∞ βᵏ Wᵏ = (Iₙ − βW)⁻¹ − Iₙ
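The two heuristics above can be sketched numerically in a few lines of numpy: common-neighbor counts via A², and the Katz closed form on W = D⁻¹A. The 4-node path graph and the value of β below are illustrative choices, not from the slides.

```python
import numpy as np

def katz_scores(A, beta=0.1):
    """Katz index on the degree-normalized walk matrix W = D^{-1} A:
    sum_{k>=1} beta^k W^k = (I - beta*W)^{-1} - I, valid for beta < 1
    since W is row-stochastic (spectral radius at most 1)."""
    n = A.shape[0]
    W = A / A.sum(axis=1, keepdims=True)  # random-walk transition matrix
    return np.linalg.inv(np.eye(n) - beta * W) - np.eye(n)

# Tiny example: the path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

common = A @ A          # (A^2)_{i,j} = number of common neighbors of i and j
K = katz_scores(A, beta=0.1)
```

Nodes 0 and 2 share one common neighbor (node 1), while the Katz score also gives nonzero weight to the longer path from 0 to 3.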
Bipartite Graphs of Marketplaces
[Figure: bipartite graph with users u1–u4 on one side and products p1–p5 on the other]
- Who bought this also bought that.
- M ∈ {0, 1}^{#users×#products}: transactions
- (M M⊤ M)ᵢ,ⱼ: number of times product j was purchased by users having purchased the same products as a given user i
- Random walks: apply the unipartite formulas to the block matrix

  ( 0   M )
  ( M⊤  0 )
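The co-purchase score and the block-matrix trick can be checked numerically. The 4×5 transaction matrix below is a hypothetical example mirroring the u1–u4 / p1–p5 sketch above.

```python
import numpy as np

# Hypothetical transactions: M[i, j] = 1 if user i bought product j.
M = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# "Who bought this also bought that": (M M^T M)_{i,j} counts purchases of
# product j by users who share purchases with user i.
scores = M @ M.T @ M

# Unipartite trick: embed M in the symmetric block matrix [[0, M], [M^T, 0]]
# and reuse any unipartite walk-based formula on it.
n_u, n_p = M.shape
B = np.zeros((n_u + n_p, n_u + n_p))
B[:n_u, n_u:] = M
B[n_u:, :n_u] = M.T
```

Odd powers of the block matrix connect users to products: the upper-right block of B³ is exactly M M⊤ M.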
Low Rank
A = U diag(σᵢ) V⊤ (SVD). Define ‖X‖∗ = ∑ᵢ σᵢ(X) and Dτ(A) = U diag(max(σᵢ − τ, 0)) V⊤, the shrinkage operator.
- The rank-r matrix closest to A is U diag(σ₁, …, σᵣ, 0, …, 0) V⊤
- Fact: argmin_X ½‖X − A‖²_F + τ‖X‖∗ = Dτ(A)
[Figure: block-wise adjacency matrix, 60×60, nz = 1400]
- Matrix completion [Srebro05, Candes08, Koltchinskii11] estimates A by minimizing ½‖ω(A) − ω(X)‖²₂ + τ‖X‖∗ for a linear mapping ω: R^{n×n} → R^Q
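The shrinkage operator Dτ is a few lines of numpy; the random test matrix and the value of τ below are arbitrary.

```python
import numpy as np

def shrink(A, tau):
    """Shrinkage operator D_tau: soft-threshold the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
tau = 1.0
D = shrink(A, tau)
```

By the fact above, D is also the unique minimizer of ½‖X − A‖²_F + τ‖X‖∗.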
Link Prediction: Statistical and Spectral Properties
- Statistics on the number of triangles and the lengths of paths in the graph are stable
- Spectral functions [Kunegis09] of the adjacency and stochastic matrices kill the low eigenvalues
If A = U diag(σᵢ) V⊤ is the SVD, U diag(f(σᵢ)) V⊤ is called a spectral function.
[Figure: spectral functions f(σ) on σ ∈ [0, 1]: σ², ∝ (1 − βσ)⁻¹ − 1, and max(σ − τ, 0)]
Leading Insight
Link prediction heuristics implicitly suggest that:
1. The graph sequence fits some slowly varying feature map
2. The spectrum of the graphs is regular
Define a regularization formulation of the problem in order to leverage the trade-offs and select the best features.
Obstacle to matrix completion: ω(A) is to be predicted.
Framework
Notation
- Time steps t ∈ {1, 2, …, T}
- Adjacency matrices A_t ∈ {0, 1}^{n×n}: the graph sequence
- Feature map ω: R^{n×n} → R^Q, linear (e.g. degrees, clusters)
- Q ≪ n²
- Prediction of A_{T+1} via a score matrix S ∈ R^{n×n}
Assumptions
1. Stationarity of successive feature vectors:

   ∃f: R^Q → R^Q, ∀t, ω(A_{t+1}) = f(ω(A_t)) + ε_t

2. Simplicity of S:
   - S low rank [Srebro05]
   - Penalize the trace norm ‖S‖∗
Quantities to Control
1. Features predictor:

   J₁(f) = ∑_{t=1}^{T−1} ℓ(ω(A_{t+1}), f(ω(A_t))) + κ‖f‖_H

2. Predicted features matching the features of the predicted graph (coupling term):

   J₂(f, S) = ℓ(ω(S), f(ω(A_T)))

3. Penalty on S:

   J₃(S) = τ‖S‖∗
Convex Optimization Problem
Let

  X = [ω(A₁), …, ω(A_{T−1})]⊤, Y = [ω(A₂), …, ω(A_T)]⊤ ∈ R^{(T−1)×Q}

We take linear predictors, f(ω) = ω⊤W, and define the convex objective

  L := J₁ + J₂ + J₃ = ½‖XW − Y‖²_F + (κ/2)‖W‖²_F + ½‖ω(A_T)⊤W − ω(S)⊤‖²₂ + τ‖S‖∗
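As a sketch, the objective L can be evaluated on toy data. The slides leave ω generic; here node out-degrees are used as one hypothetical linear feature map, and the identity predictor and score matrix are arbitrary test values.

```python
import numpy as np

def omega(A):
    """Hypothetical linear feature map: node out-degrees (Q = n)."""
    return A.sum(axis=1)

def objective(S, W, A_seq, kappa=0.1, tau=0.1):
    """L = 1/2 ||XW - Y||_F^2 + kappa/2 ||W||_F^2
           + 1/2 ||omega(A_T)^T W - omega(S)^T||_2^2 + tau ||S||_*"""
    X = np.stack([omega(A) for A in A_seq[:-1]])   # (T-1) x Q
    Y = np.stack([omega(A) for A in A_seq[1:]])    # (T-1) x Q
    fit = 0.5 * np.linalg.norm(X @ W - Y) ** 2
    ridge = 0.5 * kappa * np.linalg.norm(W) ** 2
    couple = 0.5 * np.linalg.norm(omega(A_seq[-1]) @ W - omega(S)) ** 2
    nuclear = tau * np.linalg.svd(S, compute_uv=False).sum()
    return fit + ridge + couple + nuclear

rng = np.random.default_rng(1)
n, T = 6, 5
A_seq = [(rng.random((n, n)) < 0.3).astype(float) for _ in range(T)]
W = np.eye(n)          # identity predictor: "features stay put"
S = A_seq[-1].copy()   # candidate score matrix
val = objective(S, W, A_seq)
```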
Algorithms
Optimization Strategies
Goal: minimize L(S, W)
1. Two-stage optimization
2. Joint optimization in W and S
3. Variant 1: graph regularization
4. Variant 2: sparsity constraint
Two-stage Optimization [Richard10]
- Solve Ŵ := argmin_{W ∈ R^{Q×Q}} J₁(W) (regression)
- Minimize J₂(Ŵ, S) + J₃(S)
- Optimal algorithms due to Nesterov
- ε-optimal solution after O(1/√ε) iterations instead of O(1/ε²) [Goldfarb09]

(r, noise) \ alg. | Proposed      | Static        | P. A.         | Katz
(5, 0.000)        | 0.671 ± 0.008 | 0.648 ± 0.008 | 0.627 ± 0.015 | 0.616 ± 0.015
(5, 0.250)        | 0.675 ± 0.009 | 0.642 ± 0.007 | 0.602 ± 0.016 | 0.592 ± 0.016
(5, 0.750)        | 0.519 ± 0.007 | 0.525 ± 0.005 | 0.497 ± 0.007 | 0.491 ± 0.007
(500, 0.000)      | 0.592 ± 0.008 | 0.587 ± 0.007 | 0.671 ± 0.010 | 0.667 ± 0.009
(500, 0.250)      | 0.607 ± 0.011 | 0.588 ± 0.009 | 0.649 ± 0.009 | 0.643 ± 0.009
(500, 0.750)      | 0.601 ± 0.010 | 0.583 ± 0.007 | 0.645 ± 0.017 | 0.641 ± 0.017
Split and Alternately Minimize
- Splitting: L_η(S, S̃) := τ‖S‖∗ + h(S̃, ν), subject to S = S̃
- Alternately minimize in S and S̃:
  - m_G(S̃) = argmin_S { τ‖S‖∗ + ⟨∇h(S̃), S − S̃⟩ + (1/2μ)‖S − S̃‖²_F }
  - m_H(S) = argmin_{S̃} { h(S̃, ν) + ⟨∇(τ‖S‖∗), S̃ − S⟩ + (1/2μ)‖S̃ − S‖²_F }

Algorithm 1: Link Discovery Algorithm
Parameters: τ, ν, η
Initialization: W₀ = Z₁ = A_T, α₁ = 0
for k = 1, 2, … do
  S_k ← m_G(Z_k) and S̃_k ← m_H(S_k)
  W_k ← ½(S_k + S̃_k)
  α_{k+1} ← ½(1 + √(1 + 4α²_k))
  Z_{k+1} ← W_k + (1/α_{k+1}) (α_k(S̃_k − W_{k−1}) − (W_k − W_{k−1}))
end for
Minimization of L by Proximal Gradient Descent

L(S, W) = g(S, W) + Γ(S, W)

- g(S, W) := ½‖XW − Y‖²_F + ½‖ω(A_T)⊤W − ω(S)⊤‖²₂: smoothly differentiable fit term
- Γ(S, W) := (κ/2)‖W‖²_F + τ‖S‖∗: convex penalty
- Explicit proximal operator:

  prox_{θΓ}(S, W) := argmin_{(Z,V)} θΓ(Z, V) + ½‖S − Z‖²_F + ½‖W − V‖²_F = (D_{θτ}(S), W/(1 + θκ))

- (S_{k+1}, W_{k+1}) = prox_{θₖΓ}((Sₖ, Wₖ) − θₖ ∇g(Sₖ, Wₖ))
- FISTA [Beck09] for an optimal convergence rate
Variant 1: Graph Regularization Constraint
- Want i ∼_S j ⇒ f(i) ∼_H f(j)
- Control the Laplacian-like [Chen10] inner product

  J₄(f, S) = ∑_{i,j} Sᵢ,ⱼ ‖f(i) − f(j)‖²_H = ⟨S, (‖f(i) − f(j)‖²_H)ᵢ,ⱼ⟩

  (Sᵢ,ⱼ encodes i ∼ j; ‖f(i) − f(j)‖²_H measures f(i) ∼ f(j))

- Other possibility: J₄(f, S) = ⟨S, Gram(f)⟩
- L_graph regularization = L + λJ₄
- Issue: non-convex regularizers
- Algorithms:
  1. Gradient descent with hyper-parameters that keep the objective inside the convexity domain
  2. Projected gradient descent inside the convexity domain
Gradient Descent Convergence Area
Empirical Results
Data                 | Marketing       | Synthetic
Method \ Error       | ΔSales | ΔGraph | ΔSales      | ΔGraph
Our solution         | 0.62   | 0.28   | 0.13 ± .002 | 0.21 ± .003
Rank-free prediction | 0.64   | 0.31   | 0.19 ± .008 | 0.24 ± .01
AR                   | 0.80   | -      | 0.66 ± .007 | -
ARIMA                | 0.78   | -      | 0.17 ± .02  | -
VAR                  | 1.02   | -      | 0.42 ± .09  | -
MC with shrinkage    | -      | 0.38   | -           | 0.22 ± .003

- Sales prediction metric (to be minimized): ΔSales = ‖ω(A_{T+1}) − f(ω(A_T))‖₂ / ‖ω(A_{T+1})‖₂
- Graph completion metric (to be minimized): ΔGraph = ‖A_{T+1} − S‖_F / ‖A_{T+1}‖_F
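The two error metrics are one-liners; the sketch below uses degree features and an identity predictor as hypothetical stand-ins for ω and f.

```python
import numpy as np

def delta_sales(A_next, A_last, f, omega):
    """Relative feature-prediction error (to be minimized)."""
    return (np.linalg.norm(omega(A_next) - f(omega(A_last)))
            / np.linalg.norm(omega(A_next)))

def delta_graph(A_next, S):
    """Relative graph-completion error (to be minimized)."""
    return np.linalg.norm(A_next - S) / np.linalg.norm(A_next)

# Toy check: degree features, identity predictor, unchanged graph.
omega = lambda A: A.sum(axis=1)
f = lambda x: x
A_last = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]], dtype=float)
A_next = A_last.copy()
```

A perfect prediction scores 0 on both metrics; predicting the empty graph scores 1 on ΔGraph.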
Convexity Domain
[Figures: surfaces over (s, w) of J₄ = sw², of κ|f|² + ν|S − A_T|² = s² + w², and of λJ₄ + κ|f|² + ν|S − A_T|² = sw² + s² + w²]

- J₄ is not jointly convex in (S, f)
- λJ₄ + κ‖W‖²_F + ν‖S − A_T‖²_F is convex inside

  E = { S ∈ R₊^{n×n}, W ∈ R^{n×d} : ‖W‖²_F ≤ √(νκ)/(2λ) }
Empirical Results
[Figure: relative errors as a function of log(ν) for HYBRID (Regression), HYBRID (Graph Completion), Rank Free Regression, Rank Free Graph Completion, Regression Only, and Graph Only]
Variant 2: Sparsity Constraint
- L_sparse(S, W) := L(S, W) + γ‖S‖₁,₁ (lasso)
- Split S into S and S̃ and add an equality constraint
- Synthetic data: n = 100, Q = 15, T = 200
- 10 runs for cross-validation, 10 runs for test
- AUC on S reported

Nearest Neighbors | Static Low Rank | L_sparse        | L
0.9767 ± 0.0076   | 0.9751 ± 0.0362 | 0.9812 ± 0.0008 | 0.9778 ± 0.0071
Discussion
Synthetic Data Generation
Let, for all k ∈ {1, …, r},

  U_t^{(i,k)} = (1/(√(2π) σᵢ,ₖ)) exp(−(t − μᵢ,ₖ)² / (2σ²ᵢ,ₖ)) + εᵢ,ₖ

quantify the taste of user i for feature k at time t, let V_t^{(j,k)} be the weight of feature k for item j, and take

  A_t^{(i,j)} = 1{U^{(i)}(t) > θ} 1{V^{(j)}(t) > θ}⊤

A_t is:
1. Sparse
2. Of rank at most r
3. Its latent factors evolve slowly, provided the σ's are not too small.
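A possible implementation of this generator (the noise terms ε are omitted in this sketch, and all parameter ranges are arbitrary illustrative choices). Rank at most r follows because A_t is a product of n×r and r×m binary matrices.

```python
import numpy as np

def synthetic_graph(t, n_users=30, n_items=20, r=3, theta=0.02, seed=0):
    """One snapshot A_t: user tastes U_t and item weights V_t are Gaussian
    bumps in time; an edge score counts the factors on which both the user
    taste and the item weight exceed the threshold theta."""
    rng = np.random.default_rng(seed)
    mu_u = rng.uniform(0, 100, (n_users, r))    # bump centers (illustrative)
    sig_u = rng.uniform(10, 30, (n_users, r))   # bump widths (illustrative)
    mu_v = rng.uniform(0, 100, (n_items, r))
    sig_v = rng.uniform(10, 30, (n_items, r))
    U = np.exp(-(t - mu_u) ** 2 / (2 * sig_u ** 2)) / (np.sqrt(2 * np.pi) * sig_u)
    V = np.exp(-(t - mu_v) ** 2 / (2 * sig_v ** 2)) / (np.sqrt(2 * np.pi) * sig_v)
    # A_t = 1{U > theta} 1{V > theta}^T: product of n x r and r x m matrices
    return (U > theta).astype(float) @ (V > theta).astype(float).T

A50 = synthetic_graph(50)
```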
Scalability
- Dτ(A) is dense, even for sparse A
- Fact [Srebro05]: ‖S‖∗ = ½ min_{UV⊤ = S} (‖U‖²_F + ‖V‖²_F)
- Instead of fixing τ, fix r and take U, V ∈ R^{n×r}
- Define

  J(U, V, W) := ‖XW − Y‖²_F + ‖ω(A_T)⊤W − ω(UV⊤)⊤‖²₂ + (κ/2)‖W‖²_F + (λ/2)(‖U‖²_F + ‖V‖²_F)

- Parallel stochastic gradient algorithms [Recht11]
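The [Srebro05] fact can be verified numerically: the balanced factorization U = U_s diag(√σ), V = V_s diag(√σ) from the SVD attains the bound, while any other factorization of S can only be larger. The test matrix and the mixing matrix R are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.standard_normal((7, 5))

# SVD S = U_s diag(sigma) V_s^T; split sqrt(sigma) between the two factors.
Us, sig, Vst = np.linalg.svd(S, full_matrices=False)
U = Us @ np.diag(np.sqrt(sig))
V = Vst.T @ np.diag(np.sqrt(sig))

nuclear = sig.sum()                                       # ||S||_*
bound = 0.5 * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)

# Another factorization S = (U R)(V R^{-T})^T can only increase the bound.
R = rng.standard_normal((5, 5)) + 2.0 * np.eye(5)
U2, V2 = U @ R, V @ np.linalg.inv(R).T
bound2 = 0.5 * (np.linalg.norm(U2) ** 2 + np.linalg.norm(V2) ** 2)
```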
Store Recommendation Lists
- Each feature leads to a specific list of recommendations
- Store the top-k lists
- Learn optimal combinations / aggregations
... work in progress
Conclusion
- Introduction of a regularization-based formulation for link prediction in graph sequences
- Several variants detailed and empirically tested
- Perspectives for scalable algorithms
- Perspectives for theoretical analysis and understanding of the problem
Thanks
Mercis !
References
Reka Albert and Albert-Laszlo Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74:47–97, 2002.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
B. Bollobas. Random Graphs, volume 73 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2nd edition, 2001.
Jian-Feng Cai, Emmanuel J. Candes, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
Xi Chen, Seyoung Kim, Qihang Lin, Jaime G. Carbonell, and Eric P. Xing. Graph-structured multi-task regression and an efficient optimization method for general fused lasso. arXiv, 2010.
Donald Goldfarb and Shiqian Ma. Fast alternating linearization methods for minimizing the sum of two convex functions. Technical report, Department of IEOR, Columbia University, 2009.
P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97:1090–1098, 2002.
David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
Vladimir Koltchinskii, Karim Lounici, and Alexandre Tsybakov. Nuclear norm penalization and optimal rates for noisy matrix completion. Annals of Statistics, 2011.
P. N. Krivitsky and M. S. Handcock.A Separable Model for Dynamic Networks.ArXiv e-prints, November 2010.
Jerome Kunegis and Andreas Lommatzsch. Learning spectral graph transformations for link prediction. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pages 561–568, New York, NY, USA, 2009. ACM.
G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 2003.
K. Nowicki and T. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077–1087, 2001.
Benjamin Recht and Christopher Re. Parallel stochastic gradient algorithms for large-scale matrix completion. Submitted for publication, 2011.
Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, and Nicolas Vayatis. Link discovery using graph feature tracking. In Proceedings of Neural Information Processing Systems (NIPS), 2010.
Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems 17, pages 1329–1336. MIT Press, Cambridge, MA, 2005.
Stanley Wasserman and Philippa Pattison. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61(3):401–425, September 1996.
K. Zhang, Th. Evgeniou, V. Padmanabhan, and E. Richard. Content contributor management and network effects in a UGC environment. Marketing Science, 2011.