![Page 1: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/1.jpg)
FAST KATZ AND COMMUTERS Efficient Estimation of Social Relatedness
David F. Gleich
Sandia National Labs
WAW2010
16 December 2010
Palo Alto, CA
With Pooya Esfandiar, Francesco Bonchi, Chen Greif,
Laks V. S. Lakshmanan, and Byung-Won On
David F. Gleich (Sandia) ICME la/opt seminar
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin
Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
![Page 2: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/2.jpg)
MAIN RESULTS
A – adjacency matrix
L – Laplacian matrix
Katz score :
Commute time:
For Katz
Compute one fast
Compute top fast
For Commute
Compute one fast
![Page 3: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/3.jpg)
OUTLINE
Katz Rank and Commute Time, then Why
Matrices, moments, and quadrature rules for pairwise scores
Sparse linear system solves for top-k
Some results
![Page 4: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/4.jpg)
NOTE – EVERYTHING IS SIMPLE
All graphs are undirected
All graphs are connected
All graphs are unweighted
![Page 5: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/5.jpg)
KATZ SCORES
The Katz score is
Carl Neumann
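The formula itself is missing from this slide's text. A standard way to write the Katz score, using the adjacency matrix A from the main results slide, is sketched below. The damping parameter name $\alpha$ and the path-length index $\ell$ are assumptions rather than notation recovered from the slide; the second form is the Neumann series summed in closed form.

```latex
K_{i,j} \;=\; \sum_{\ell=1}^{\infty} \alpha^{\ell}\,\bigl(A^{\ell}\bigr)_{i,j}
\qquad\Longrightarrow\qquad
K \;=\; (I - \alpha A)^{-1} - I
\quad\text{when } \alpha\,\|A\|_2 < 1 .
```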
![Page 6: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/6.jpg)
COMMUTE TIME
Consider a uniform random walk on a graph
The expected time to walk from node i to node j is also called the hitting time from node i to j, or the first transition time
L: graph Laplacian
The all-ones vector is the only null-vector of L (up to scaling)
Fouss et al. TKDE 2007
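As a point of reference for what "commute time" means numerically, here is a minimal sketch that uses the well-known pseudoinverse formula, commute(i, j) = vol(G) (L⁺ᵢᵢ + L⁺ⱼⱼ − 2 L⁺ᵢⱼ), where vol(G) is the sum of the degrees. The function name `commute_time` and the tiny path graph are illustrative assumptions. A dense pseudoinverse is exactly the kind of all-pairs computation the talk tries to avoid, so this is only a correctness baseline for small graphs.

```python
import numpy as np

def commute_time(A, i, j):
    """Commute time between nodes i and j of an undirected, connected graph."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A      # graph Laplacian
    Lp = np.linalg.pinv(L)        # dense pseudoinverse: tiny graphs only
    vol = degrees.sum()           # sum of degrees = 2 * (number of edges)
    return vol * (Lp[i, i] + Lp[j, j] - 2.0 * Lp[i, j])

# Path graph 0 - 1 - 2 - 3 (hypothetical example data)
A = np.zeros((4, 4))
for a, b in [(0, 1), (1, 2), (2, 3)]:
    A[a, b] = A[b, a] = 1.0

c_ends = commute_time(A, 0, 3)    # the two ends of the path
c_adjacent = commute_time(A, 0, 1)
```

On a path, the effective resistance between two nodes equals the number of edges between them, which gives an easy hand check of the two values.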
![Page 7: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/7.jpg)
WHAT DO OTHER PEOPLE DO?
1) Just work with the linear algebra formulations
2) For Katz, truncate the Neumann series to a few (3-5) terms
3) Use low-rank approximations from EVD(A) or EVD(L)
4) For commute, use Johnson-Lindenstrauss inspired random sampling
5) Approximately decompose into smaller problems
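To make approach 2 concrete, the sketch below compares a truncated Neumann series against the exact Katz matrix on a small cycle graph. The function names, the graph, and the value of `alpha` are illustrative assumptions; the dense inverse is used only as a reference on a toy problem.

```python
import numpy as np

def katz_exact(A, alpha):
    """Exact Katz matrix (I - alpha*A)^{-1} - I via a dense inverse."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * A) - np.eye(n)

def katz_truncated(A, alpha, terms):
    """Sum of the first `terms` terms alpha^l * A^l, l = 1..terms."""
    K = np.zeros_like(A, dtype=float)
    term = np.eye(A.shape[0])
    for _ in range(terms):
        term = alpha * (term @ A)   # next power: alpha^l * A^l
        K += term
    return K

# 6-node cycle (hypothetical example data); max degree is 2
n = 6
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0

alpha = 0.25
K = katz_exact(A, alpha)
err3 = np.linalg.norm(K - katz_truncated(A, alpha, 3), 2)
err5 = np.linalg.norm(K - katz_truncated(A, alpha, 5), 2)
```

The truncation error shrinks geometrically at a rate set by `alpha` times the largest eigenvalue of A, so a short truncation is only accurate when that product is well below 1.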
Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,
Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007
![Page 8: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/8.jpg)
THE PROBLEM
All of these techniques are
preprocessing based because
most people’s goal is to compute
all the scores.
We want to avoid
preprocessing the graph.
There are a few caveats here! For example, one could solve the linear system instead of forming the matrix inverse.
![Page 9: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/9.jpg)
WHY? LINK PREDICTION
Liben-Nowell and Kleinberg 2003, 2006 found that path-based link prediction was more effective than neighborhood-based prediction
Neighborhood based
Path based
![Page 10: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/10.jpg)
WHY NO PREPROCESSING?
The graph is constantly changing
as I rate new movies.
![Page 11: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/11.jpg)
WHY NO PREPROCESSING?
Top-k predicted “links”
are movies to watch!
Pairwise scores give
user similarity
![Page 12: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/12.jpg)
PAIRWISE ALGORITHMS
Katz
Commute
Golub and Meurant
to the rescue!
![Page 13: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/13.jpg)
MMQ - THE BIG IDEA
Quadratic form
Weighted sum
Stieltjes integral
Quadrature approximation
Matrix equation
Think
A is s.p.d. use EVD
“A tautology”
Lanczos
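The five labels on this slide name the links in a chain of equalities whose formulas are missing from the text. A reconstruction in the style of Golub and Meurant is sketched below. The symbols $f$, $Q$, $\Lambda$, $\tilde{u}$, $\omega$, $\theta_j$, $w_j$ and $T_k$ are assumed notation; $f(\lambda) = 1/\lambda$ is the case relevant to Katz and commute time.

```latex
\begin{aligned}
u^{T} f(A)\, u
  &= u^{T} Q\, f(\Lambda)\, Q^{T} u
     && \text{quadratic form, } A = Q \Lambda Q^{T} \text{ (EVD)} \\
  &= \sum_{i=1}^{n} f(\lambda_i)\, \tilde{u}_i^{2},
     \quad \tilde{u} = Q^{T} u
     && \text{weighted sum} \\
  &= \int f(\lambda)\, d\omega(\lambda)
     && \text{Stieltjes integral, piecewise-constant } \omega \\
  &\approx \sum_{j=1}^{k} w_j\, f(\theta_j)
     && \text{quadrature approximation} \\
  &= \|u\|_2^{2}\; e_1^{T} f(T_k)\, e_1
     && \text{matrix equation, } T_k \text{ from Lanczos}
\end{aligned}
```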
![Page 14: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/14.jpg)
MMQ PROCEDURE SKETCH
Goal
Given
1. Run k-steps of Lanczos on starting with
2. Compute , with an additional eigenvalue at , set
3. Compute , with an additional eigenvalue at , set
4. Output as lower and upper bounds on b
Corresponds to a Gauss-Radau rule, with u as a prescribed node
Corresponds to a Gauss-Radau rule, with l as a prescribed node
Bad bounds give worse
performance!
Larger k gives better results;
k steps is k matvecs
Easy to “update” this inverse as k increases.
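The procedure above can be sketched in a few dozen lines for the case f(λ) = 1/λ. This is an illustration under stated assumptions, not the authors' code: the names `lanczos`, `radau_estimate`, `mmq_bounds`, the start vector `z`, and the example graph are all hypothetical, and the Gauss-Radau modification follows the textbook recipe (solve a shifted tridiagonal system to place one eigenvalue at the prescribed node). The sketch does no reorthogonalization and assumes Lanczos does not break down early.

```python
import numpy as np

def lanczos(M, z, k):
    """k Lanczos steps on symmetric M from z; returns diagonals and off-diagonals."""
    alphas, betas = [], []
    q_prev = np.zeros_like(z, dtype=float)
    q = z / np.linalg.norm(z)
    beta = 0.0
    for _ in range(k):
        w = M @ q - beta * q_prev          # one matvec per step
        alpha = q @ w
        w = w - alpha * q
        beta = np.linalg.norm(w)           # assumed nonzero (no breakdown)
        alphas.append(alpha)
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas)

def radau_estimate(alphas, betas, node, z_norm2):
    """Gauss-Radau estimate of z^T M^{-1} z with one prescribed quadrature node."""
    k = len(alphas)
    J = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    rhs = np.zeros(k)
    rhs[-1] = betas[-1] ** 2
    delta = np.linalg.solve(J - node * np.eye(k), rhs)
    J_hat = np.zeros((k + 1, k + 1))       # J extended by one row and column
    J_hat[:k, :k] = J
    J_hat[k - 1, k] = J_hat[k, k - 1] = betas[-1]
    J_hat[k, k] = node + delta[-1]         # forces an eigenvalue at `node`
    e1 = np.zeros(k + 1)
    e1[0] = 1.0
    return z_norm2 * np.linalg.solve(J_hat, e1)[0]

def mmq_bounds(M, z, k, lam_lo, lam_hi):
    """Lower and upper bounds on z^T M^{-1} z, given lam_lo <= eig(M) <= lam_hi."""
    alphas, betas = lanczos(M, z, k)
    z_norm2 = float(z @ z)
    lower = radau_estimate(alphas, betas, lam_hi, z_norm2)  # node at upper bound
    upper = radau_estimate(alphas, betas, lam_lo, z_norm2)  # node at lower bound
    return lower, upper

# Hypothetical example: 12-node cycle with two chords, M = I - alpha*A
n = 12
A = np.zeros((n, n))
for a, b in [(v, (v + 1) % n) for v in range(n)] + [(0, 5), (2, 9)]:
    A[a, b] = A[b, a] = 1.0
alpha = 0.25                               # below 1/max-degree = 1/3
d_max = A.sum(axis=1).max()
M = np.eye(n) - alpha * A
z = np.zeros(n)
z[0] = 1.0
lower, upper = mmq_bounds(M, z, 4, 1 - alpha * d_max, 1 + alpha * d_max)
exact = z @ np.linalg.solve(M, z)          # dense reference value
```

The eigenvalue bounds here come from the max-degree bound on the spectrum of A; looser bounds remain valid but, as the slide notes, give worse estimates.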
![Page 15: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/15.jpg)
PRACTICAL MMQ
![Page 16: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/16.jpg)
ONE LAST STEP FOR KATZ
Katz
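The quadrature machinery bounds quadratic forms, whereas a pairwise Katz score is a bilinear form between two different indicator vectors. One standard bridge between the two, which may be what this slide showed, is the polarization identity. Here $M = I - \alpha A$ and $e_i$ is the $i$-th unit vector, both assumed notation.

```latex
e_i^{T} M^{-1} e_j
  \;=\; \tfrac{1}{4}\Bigl[(e_i + e_j)^{T} M^{-1} (e_i + e_j)
        \;-\; (e_i - e_j)^{T} M^{-1} (e_i - e_j)\Bigr],
\qquad M = I - \alpha A .
```

Bounding each quadratic form separately and subtracting an upper bound on the second from a lower bound on the first (and vice versa) then brackets the pairwise score.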
![Page 17: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/17.jpg)
TOP-K ALGORITHM FOR KATZ
Approximate
where is sparse
Keep sparse too
Ideally, don’t “touch” all of
For PageRank, we can do this!
![Page 18: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/18.jpg)
THE ALGORITHM - MCSHERRY
For
Start with the Richardson iteration
Rewrite
Note is sparse
If , then is sparse.
Idea Only add one component of to
McSherry WWW2005
Richardson converges if
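The update formulas for this slide are missing from the text. For a generic system $Mx = b$, the Richardson iteration and its one-component variant can be written as below. The symbols $M$, $b$, $x^{(k)}$, $r^{(k)}$ and $j$ are assumed notation; for Katz one would take $M = I - \alpha A$ and $b = e_i$.

```latex
\begin{aligned}
x^{(k+1)} &= x^{(k)} + r^{(k)}, \qquad r^{(k)} = b - M x^{(k)}
   && \text{Richardson} \\
r^{(k+1)} &= r^{(k)} - M r^{(k)} = (I - M)\, r^{(k)}
   && \text{residual form; converges if } \rho(I - M) < 1 \\
x^{(k+1)} &= x^{(k)} + r^{(k)}_{j}\, e_j, \qquad
r^{(k+1)} = r^{(k)} - r^{(k)}_{j}\, M e_j
   && \text{add one component } j \text{ only}
\end{aligned}
```

With $M = I - \alpha A$ and a sparse right-hand side, each one-component step touches only node $j$ and its neighbors, so both $x$ and $r$ stay sparse.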
![Page 19: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/19.jpg)
THE ALGORITHM FOR KATZ
For
Init:
Pick as max
Storing the non-zeros of the residual in a heap makes picking the max take O(log n) time. See Andersen et al. FOCS2008 for more.
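A minimal sketch of this kind of one-component iteration for the system (I − αA) x = eᵢ follows. The function name `katz_column_push`, the adjacency-list input, the tolerance, and the example graph are assumptions. For brevity it finds the largest residual entry with a linear scan over a dictionary instead of the heap mentioned above, which changes the cost per step but not the iterates.

```python
def katz_column_push(neighbors, alpha, i, tol=1e-8, max_steps=100000):
    """Approximately solve (I - alpha*A) x = e_i, relaxing one residual entry per step."""
    x = {}              # sparse approximate solution
    r = {i: 1.0}        # sparse residual r = e_i - (I - alpha*A) x
    for _ in range(max_steps):
        j = max(r, key=r.get)      # largest residual entry (a heap would be cheaper)
        rj = r[j]
        if rj < tol:
            break
        x[j] = x.get(j, 0.0) + rj  # move the residual mass into the solution
        r[j] = 0.0
        for v in neighbors[j]:     # r <- r + alpha * rj * A[:, j]
            r[v] = r.get(v, 0.0) + alpha * rj
    return x

# Hypothetical 5-node graph; max degree is 3, so alpha = 0.2 < 1/3
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
x = katz_column_push(neighbors, alpha=0.2, i=0, tol=1e-12)

# Rank the other nodes by their approximate score relative to node 0
ranked = sorted((v for v in x if v != 0), key=x.get, reverse=True)
```

Only nodes reached by the residual ever appear in `x`, which is what lets a top-k query stop long before the whole graph has been touched.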
![Page 20: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/20.jpg)
CONVERGENCE?
The “standard” proof works for 1/max-degree.
If you pick as the maximum element, we can show this is convergent if Richardson converges. This proof requires to be symmetric positive definite.
![Page 21: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/21.jpg)
RESULTS – DATA, PARAMETERS
All unweighted, connected graphs
Easy :
Hard :
![Page 22: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/22.jpg)
KATZ BOUND CONVERGENCE
![Page 23: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/23.jpg)
COMMUTE BOUND CONVERGENCE
![Page 24: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/24.jpg)
KATZ SET CONVERGENCE
For arXiv graph.
![Page 25: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/25.jpg)
TIMING
![Page 26: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/26.jpg)
CONCLUSIONS
These algorithms are faster than many alternatives in these special cases.
For pairwise commute, stopping criteria are simpler with bounds.
For top-k problems, we often need less than 1 matvec for good enough results
![Page 27: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/27.jpg)
WARTS AND TODOS
Stopping criteria on our top-k algorithm can be a bit hairy… we should refine it.
The top-k approach doesn’t work right for commute time… we have an alternative.
Take advantage of new research to “seed” commute time better!
Evaluate more datasets, like Netflix.
von Luxburg et al. NIPS2010
![Page 28: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/28.jpg)
By AngryDogDesign on DeviantArt
More details in the paper
Slides should be online soon
Code is online already
stanford.edu/~dgleich/publications/2010/codes/fast-katz
![Page 29: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/29.jpg)
LEO KATZ
![Page 30: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/30.jpg)
NOT QUITE, WIKIPEDIA
Adjacency matrix versus random-walk matrix
PageRank
Katz
These are equivalent if the graph has constant degree
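The formulas comparing the two methods are missing from the slide text. One way to see the claimed equivalence is sketched below, assuming the random-walk matrix is $P = A D^{-1}$ with $D$ the diagonal degree matrix, and that PageRank and Katz use damping parameters $\alpha$ and $\alpha'$ with right-hand sides $v$ and $b$ (all assumed notation).

```latex
\begin{gathered}
\text{PageRank: } (I - \alpha P)\,x = (1-\alpha)\,v,
\qquad
\text{Katz: } (I - \alpha' A)\,k = b \\
\text{If every node has degree } d:\quad
D = dI \;\Rightarrow\; P = \tfrac{1}{d}A
\;\Rightarrow\;
(I - \alpha P)^{-1} = (I - \alpha' A)^{-1}
\quad\text{with } \alpha' = \alpha / d
\end{gathered}
```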
![Page 31: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/31.jpg)
WHAT KATZ ACTUALLY SAID
Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometrika 18(1):39-43
“we assume that each link independently has the
same probability of being effective” …
“we conceive a constant , depending
on the group and the context of the particular
investigation, which has the force of a probability
of effectiveness of a single link. A k-step chain
then, has probability of being effective.”
“We wish to find the column sums of the matrix”
![Page 32: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/32.jpg)
RETURNING TO THE MATRIX
Carl Neumann
![Page 33: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/33.jpg)
PROPERTIES OF KATZ’S MATRIX
is symmetric
exists when
is sym. pos. def. when
Note that 1/max-degree suffices
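The matrix names and conditions are missing from this slide's text. Assuming the matrix in question is $I - \alpha A$ with $\alpha > 0$, for a symmetric unweighted adjacency matrix $A$ with eigenvalues $\lambda_i(A)$ and maximum degree $d_{\max}$ (assumed notation), the standard facts read:

```latex
\begin{aligned}
&I - \alpha A \text{ is symmetric because } A = A^{T}, \\
&(I - \alpha A)^{-1} \text{ exists when } \alpha\,\lambda_i(A) \neq 1 \text{ for all } i, \\
&I - \alpha A \text{ is symmetric positive definite when } \alpha < 1/\lambda_{\max}(A), \\
&\lambda_{\max}(A) \le \|A\|_{\infty} = d_{\max}
  \;\Longrightarrow\; \alpha < 1/d_{\max} \text{ suffices.}
\end{aligned}
```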
![Page 34: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/34.jpg)
SKIPPING DETAILS
L: graph Laplacian
The all-ones vector is the only null-vector of L (up to scaling)
![Page 35: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/35.jpg)
Carl Neumann
I’ve heard the Neumann series called the “von Neumann”
series more than I’d like! In fact, the von Neumann kernel
of a graph should be named the “Neumann” kernel!
Wikipedia page
![Page 36: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/36.jpg)
LANCZOS
k steps of the Lanczos method produce a matrix of basis vectors and a k-by-k symmetric tridiagonal matrix
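The relation this slide displayed is missing from the text. The textbook Lanczos relation, in exact arithmetic, is sketched below. The symbols $V_k$, $T_k$, $v_{k+1}$, $\alpha_j$, $\beta_j$ and the start vector $z$ are assumed notation, and $\alpha_j$ here denotes a Lanczos coefficient, not the Katz damping parameter.

```latex
A V_k = V_k T_k + \beta_{k+1}\, v_{k+1}\, e_k^{T},
\qquad
V_k^{T} V_k = I,
\qquad
V_k e_1 = z / \|z\|_2,
\qquad
T_k = V_k^{T} A V_k =
\begin{bmatrix}
\alpha_1 & \beta_2  &         &          \\
\beta_2  & \alpha_2 & \ddots  &          \\
         & \ddots   & \ddots  & \beta_k  \\
         &          & \beta_k & \alpha_k
\end{bmatrix}
```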
![Page 37: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/37.jpg)
PRACTICAL LANCZOS
Only need to store the last 2 vectors in
Updating requires O(matvec) work
is not orthogonal
![Page 38: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/38.jpg)
PRACTICAL MMQ
Increase k to become more accurate
Bad eigenvalue bounds yield worse results
and are easy to compute
not required, we can iteratively
update its LU factorization
![Page 39: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/39.jpg)
THE ALGORITHM
For
Init:
How to pick ?
![Page 40: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/40.jpg)
MAIN RESULTS – SLIDE ONE
A – adjacency matrix
L – Laplacian matrix
Katz score :
Commute time:
![Page 41: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/41.jpg)
TOP-K INSPIRATION: PAGERANK
Approximate
where is sparse
Keep sparse too? YES!
Ideally, don’t “touch” all of ? YES!
McSherry WWW2005, Berkhin 2007, Andersen et al. FOCS2008. Thanks to Reid Andersen for telling me McSherry did this too.
![Page 42: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/42.jpg)
THE ALGORITHM
Note is sparse.
If , then is sparse.
Idea
only add one component of to
![Page 43: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/43.jpg)
MAIN RESULTS – SLIDE TWO
For Katz Compute one fast
Compute top fast
For Commute
Compute one fast
For almost commute
Compute top fast
![Page 44: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/44.jpg)
MAIN RESULTS – SLIDE THREE
![Page 45: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/45.jpg)
F-MEASURE
![Page 46: Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks](https://reader034.vdocuments.us/reader034/viewer/2022051817/548f7720b4795927058b4ee0/html5/thumbnails/46.jpg)
PAIRWISE RESULTS
Katz upper and lower bounds
Katz error convergence
Commute-time upper and lower bounds
Commute-time error convergence
For the arXiv graph here