Estimating PageRank on Graph Streams Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

Upload: elroy

Post on 12-Jan-2016


TRANSCRIPT

Page 1: Estimating PageRank on Graph Streams

Estimating PageRank on Graph Streams

Atish Das Sarma (Georgia Tech) Sreenivas Gollapudi,

Rina Panigrahy (Microsoft Research)

Page 2: Estimating PageRank on Graph Streams

PageRank

• PageRank – Determine Ranking of nodes in graphs

• Typically large graphs - WWW, Social Networks

• Run daily by commercial search engines

Page 3: Estimating PageRank on Graph Streams

PageRank computation

[Figure: graph with nodes u, a, b, c]

Page 4: Estimating PageRank on Graph Streams

PageRank Computation

Our Approach: No Matrix-Vector Multiplication!

[Figure: graph with nodes u, a, b, c]

Page 5: Estimating PageRank on Graph Streams

Our Result

Many Random Walk Samples, Efficiently.

Approximate PageRank

[Figure: random walks from node u]
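As a rough illustration of how random walk samples can stand in for matrix-vector multiplication, the sketch below uses one standard Monte Carlo formulation: start a walk at a uniformly random node, stop with a fixed reset probability at each step, and treat the stopping node as one PageRank sample. This is not necessarily the authors' estimator, and it ignores the streaming constraint entirely. The function name `estimate_pagerank`, the `reset_prob` default of 0.15, and the in-memory `out_edges` dictionary are assumptions made for the example.

```python
import random
from collections import Counter

def estimate_pagerank(out_edges, num_walks, reset_prob=0.15, seed=0):
    """Estimate PageRank as the empirical distribution of walk end points.

    out_edges: dict mapping every node to a list of its out-neighbours
    (hypothetical in-memory representation, not a stream).
    """
    rng = random.Random(seed)
    nodes = sorted(out_edges)
    hits = Counter()
    for _ in range(num_walks):
        node = rng.choice(nodes)  # uniform start, matching a uniform reset vector
        # Continue with probability 1 - reset_prob; the stopping node is one sample.
        while rng.random() > reset_prob:
            nbrs = out_edges[node]
            # Assumed convention: a dangling node jumps to a uniform random node.
            node = rng.choice(nbrs) if nbrs else rng.choice(nodes)
        hits[node] += 1
    return {v: hits[v] / num_walks for v in nodes}

if __name__ == "__main__":
    graph = {0: [1], 1: [0], 2: [0], 3: [0]}
    print(estimate_pagerank(graph, num_walks=20000))
```

More walks give a tighter estimate; the point of the talk is that such walks can be generated with few passes over an edge stream.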

Page 6: Estimating PageRank on Graph Streams

Other results from Random Walks

We can estimate, using streams:

• Mixing Time

• Conductance

[Figure: graph G with node u]

Page 7: Estimating PageRank on Graph Streams

Streaming


e1, e2, e3, e4, e5, e6, e7, ….

Input is a “stream”

Small RAM working memory

Few Passes

Frequency moments, quantiles

Graphs: Edges, arbitrary order


Page 8: Estimating PageRank on Graph Streams

Related Work

• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  – Given an undirected graph, produces a sparse one
  – Approximately preserves x’Lx
  – Can be used to compute sparse cuts

• Streaming version of BK96 (Ahn, Guha 09)
  – Sparse cuts in 1 pass and $\tilde{O}(n)$ space

• Accelerated PageRank (McSherry 08)
  – Heuristics

Page 9: Estimating PageRank on Graph Streams

Key Idea

One walk from u of length l, efficiently

Later extend to many walks

[Figure: walk of length l starting at u]

Page 10: Estimating PageRank on Graph Streams

Single Random Walk - Naive Algo.

One step with every pass!

Constant space, l passes

[Figure: walk starting at s]
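The one-step-per-pass idea can be sketched as follows. Each pass scans the whole edge stream and uses size-1 reservoir sampling to pick a uniformly random out-edge of the current node, so the working memory is constant apart from the returned path. The function name `walk_one_step_per_pass` and the convention that `edge_stream` is a zero-argument callable yielding `(src, dst)` pairs are assumptions for this example.

```python
import random

def walk_one_step_per_pass(edge_stream, start, length, seed=0):
    """Random walk of `length` steps that takes one step per stream pass.

    edge_stream: zero-argument callable returning a fresh iterator over
    (src, dst) edges in arbitrary order (hypothetical interface).
    """
    rng = random.Random(seed)
    path = [start]
    current = start
    for _ in range(length):  # one full pass over the stream per step
        chosen, seen = None, 0
        for src, dst in edge_stream():
            if src == current:
                seen += 1
                # Reservoir sampling of size 1: keep the j-th out-edge with prob. 1/j.
                if rng.randrange(seen) == 0:
                    chosen = dst
        if chosen is None:  # current node has no out-edges; stop early
            break
        current = chosen
        path.append(current)
    return path

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
    print(walk_one_step_per_pass(lambda: iter(edges), start=0, length=5))
```

The number of passes equals the walk length, which is the cost the rest of the talk sets out to reduce.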

Page 11: Estimating PageRank on Graph Streams

Second Naive Algo

Single pass: sample sufficient edges!

If [condition missing], then sample 2 out-edges from each node (store order).

[Figure: walk starting at s]
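A minimal sketch of the single-pass alternative, assuming that a walk of a given length needs at most that many out-edge samples per node: during one pass, keep that many independent reservoir samples for every node, stored in order, and let the i-th visit to a node consume its i-th sample. The names `sample_out_edges_single_pass` and `walk_from_samples` are hypothetical. The memory cost is roughly the number of nodes times the walk length, which is why this approach is also called naive.

```python
import random
from collections import defaultdict

def sample_out_edges_single_pass(edges, length, seed=0):
    """One pass: `length` independent uniform out-edge samples per node."""
    rng = random.Random(seed)
    seen = defaultdict(int)                       # out-edges seen so far per node
    slots = defaultdict(lambda: [None] * length)  # ordered samples per node
    for src, dst in edges:                        # the single pass over the stream
        seen[src] += 1
        for k in range(length):
            # Each slot is its own size-1 reservoir, so slots are independent.
            if rng.randrange(seen[src]) == 0:
                slots[src][k] = dst
    return slots

def walk_from_samples(slots, start, length):
    """Replay a walk from stored samples; the i-th visit uses the i-th sample."""
    used = defaultdict(int)
    path, current = [start], start
    for _ in range(length):
        if current not in slots:                  # node has no out-edges
            break
        nxt = slots[current][used[current]]
        used[current] += 1
        current = nxt
        path.append(current)
    return path

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
    slots = sample_out_edges_single_pass(edges, length=4)
    print(walk_from_samples(slots, start=0, length=4))
```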

Page 12: Estimating PageRank on Graph Streams

Comparison

Naive (single walk):

Our Result:

In fact walks!


Automatically:

Page 13: Estimating PageRank on Graph Streams

Insight: Merge Short Walks

Sample fraction of nodes (centers)

passes - length walks

Merge and extend short walks!

Two problems:

• End up at node second time

• End up at non-sampled node

[Figure: walk from s stitched from segments labeled w, with nodes a and b]

Page 14: Estimating PageRank on Graph Streams

Stuck Nodes

Sample an edge from stuck.

Again. And again...

Slow?

If new nodes, good in passes!

[Figure: walk from s stitched from segments labeled w]

Page 15: Estimating PageRank on Graph Streams

Stuck nodes

Stuck on same nodes?

Sample s edges from each

s progress OR new node!

Must include to set previous seen centers

[Figure: walk from s stitched from segments labeled w, with stuck nodes marked s]

Page 16: Estimating PageRank on Graph Streams

Summary

[Figure: walk from s stitched from segments labeled w, with stuck nodes marked s]

• Perform short walks from sampled centers

• Concatenate walks until stuck

• Sample edges from stuck

• Make local progress until new node

• Local progress = s

• New node : center with prob

• Amortized progress, every pass
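The stitching idea in this summary can be illustrated with a simplified, in-memory sketch. It samples centers, precomputes one short walk per center, and concatenates short walks for as long as the walk lands on a center whose short walk is unused. When the walk is stuck, this sketch simply takes one fresh step, which is a deliberate simplification: the talk instead samples several edges from each stuck node per pass to amortize progress. The function name `stitched_walk`, its parameters, and the assumption that every node has at least one out-edge are all made up for the example.

```python
import random

def stitched_walk(out_edges, start, length, short_len, center_prob, seed=0):
    """Build a walk of `length` steps by concatenating precomputed short walks.

    Returns (path, fallback_steps). Assumes every node has an out-edge.
    """
    rng = random.Random(seed)
    # Sample centers; the start node is always treated as a center.
    centers = {v for v in out_edges if rng.random() < center_prob} | {start}
    # One short walk per center (in a stream: short_len passes, all in parallel).
    short = {}
    for c in sorted(centers):
        walk, node = [], c
        for _ in range(short_len):
            node = rng.choice(out_edges[node])
            walk.append(node)
        short[c] = walk
    path, current, fallback_steps = [start], start, 0
    while len(path) - 1 < length:
        if current in short:
            # Each short walk is consumed at most once to keep steps independent.
            path.extend(short.pop(current))
        else:
            # Stuck: non-center, or a center whose short walk is already used.
            path.append(rng.choice(out_edges[current]))
            fallback_steps += 1
        current = path[-1]
    return path[:length + 1], fallback_steps

if __name__ == "__main__":
    graph = {v: [(v + 1) % 6, (v + 2) % 6] for v in range(6)}
    print(stitched_walk(graph, start=0, length=20, short_len=3, center_prob=0.5))
```

The returned `fallback_steps` count shows how often the walk was stuck; handling those steps in few passes is what the stuck-node technique addresses.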

Page 17: Estimating PageRank on Graph Streams

Summary

[Figure: walk from s stitched from segments labeled w, with stuck nodes marked s]

Total number of passes :

Total Space :

Page 18: Estimating PageRank on Graph Streams

Summary

[Figure: walk from s stitched from segments labeled w, with stuck nodes marked s]

Set

Number of passes =

Space =

Page 19: Estimating PageRank on Graph Streams

Many Walks

Naive Space Bound:

Observation: Many short walks not used in single RW.

[Figure: walk from s stitched from segments labeled w, with stuck nodes marked s]

We show: $\tilde{O}(n)$ for $K \le n/l$

Page 20: Estimating PageRank on Graph Streams

Many Random Walks

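[Figure: short walks of length w, each node labeled with $r_i$]

• $r_i$: probability node $i$’s short walk used in single RW.

• If known: save lot of space!

• Perform K random walks

• Total number of short walks required is about [formula not recoverable; involves $K$, $r_i$, $l$]

• Don’t know $r_i$. But can estimate.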

Page 21: Estimating PageRank on Graph Streams

Estimating

• Run K = (log n) walks of length $l$

• Gives a crude estimate of $r_i$

• Sufficient to double K

• Continue doubling K

• Gives K walks in space [formula not recoverable; involves $K$, $l$, $n$]

• Passes [formula not recoverable]

[Figure: walk of length l from u]

Page 22: Estimating PageRank on Graph Streams

Distributions

samples

Distribution: u

Space

Passes

Page 23: Estimating PageRank on Graph Streams

Mixing Time, Conductance

• Undirected graphs: Compare distribution with steady state.

• Estimating difference: samples. [Batu et al. ’01] – approximate mixing time.

• Directed, till distribution “stabilizes”: samples.

• Conductance:

• Recall space for walks: $\tilde{O}(n)$ for $K \le n/l$
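To make "compare distribution with steady state" concrete, here is a small sketch for undirected graphs. It estimates the end-point distribution of walks of a given length from one start node and measures its L1 distance to the stationary distribution, which for a connected undirected graph assigns each node probability proportional to its degree. This is a plain plug-in estimate, not the sample-efficient tester of Batu et al., and it works in memory rather than on a stream. The names `endpoint_distribution` and `l1_distance_to_stationary` are hypothetical.

```python
import random
from collections import Counter

def endpoint_distribution(adj, start, length, num_samples, seed=0):
    """Empirical distribution of end points of `num_samples` walks from `start`."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_samples):
        node = start
        for _ in range(length):
            node = rng.choice(adj[node])
        counts[node] += 1
    return {v: counts[v] / num_samples for v in adj}

def l1_distance_to_stationary(adj, dist):
    """L1 distance between `dist` and the degree-proportional steady state."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    # Connected undirected graph: stationary probability of v is deg(v) / (2m).
    return sum(abs(dist[v] - len(adj[v]) / total_degree) for v in adj)

if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    for length in (0, 2, 8, 30):
        d = l1_distance_to_stationary(adj, endpoint_distribution(adj, 0, length, 5000))
        print(length, round(d, 3))
```

The smallest walk length at which the distance falls below a chosen threshold gives a rough, sampling-noise-limited proxy for the mixing time from that start node.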

Page 24: Estimating PageRank on Graph Streams

Results recap

• - Mixing Time for Undirected Graphs :

• Quadratic Approximation to Conductance

• PageRank to accuracy

Space: $\tilde{O}(n)$

Page 25: Estimating PageRank on Graph Streams

Open Questions?

• Improve passes for random walks. In particular, sub-linear space and constant passes.

• Graph Cuts and Graph Sparsification for directed graphs

• Better (streaming) algorithms for computing eigenvectors

Page 26: Estimating PageRank on Graph Streams

Thank You!

Page 27: Estimating PageRank on Graph Streams

Summary

• Perform short walks from sampled centers

• Concatenate walks until stuck

• Sample edges from stuck

• Make local progress until new node

• Local progress = s

• New node = nodes gives center

• Amortized, every pass -


Page 29: Estimating PageRank on Graph Streams

Analysis

• Total number of passes :

• Total Space :

• Set

• Number of passes =

• Space =