fast matrix computations for pair-wise and column-wise katz scores and commute times

47
FAST MATRIX COMPUTATIONS FOR PAIRWISE AND COLUMN-WISE KATZ SCORES AND COMMUTE TIMES David F. Gleich Purdue University University of Chicago Statistical and Scientific Computing Seminar October 6 th , 2011 With Pooya Esfandiar, Francesco Bonchi, Chen Grief, Laks V. S. Lakshmanan, and Byung-Won On David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 1 / 47

Upload: david-gleich

Post on 15-Jan-2015

703 views

Category:

Education


2 download

DESCRIPTION

A seminar I gave at the University of Chicago about the ideas to compute the Katz matrices and commute times quickly.

TRANSCRIPT

Page 1: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

FAST MATRIX COMPUTATIONS FOR

PAIRWISE AND COLUMN-WISE KATZ

SCORES AND COMMUTE TIMES

David F. Gleich

Purdue University

University of Chicago

Statistical and Scientific Computing Seminar

October 6th, 2011

With Pooya Esfandiar, Francesco Bonchi, Chen Grief,

Laks V. S. Lakshmanan, and Byung-Won On

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 1 / 47

Page 2: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

MAIN RESULTS

A – adjacency matrix

L – Laplacian matrix

Katz score :

                                               

Commute time:

                             

For Commute

Compute one       fast

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

For Katz Compute one       fast Compute top       fast

2 of 47

Page 3: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

OUTLINE

Why study these measures?

Katz Rank and Commute Time

How else do people compute them?

Quadrature rules for pairwise scores

Sparse linear systems solves for top-k

As many results as we have time for…

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 3 of 47

Page 4: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

WHY? LINK PREDICTION

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Liben-Nowell and Kleinberg 2003, 2006 found that path based link prediction was more efficient

Neighborhood based

Path based

4 of 47

Page 5: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

NOTE

All graphs are undirected

All graphs are connected

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 5 of 47

Page 6: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

LEO KATZ

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 6 of 47

Page 7: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

NOT QUITE, WIKIPEDIA

    : adjacency,                     : random walk

PageRank                          

Katz                          

These are equivalent if     has constant degree

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 7 of 47

Page 8: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

WHAT KATZ ACTUALLY SAID

Leo Katz 1953, A New Status Index Derived from Sociometric Analysis, Psychometria 18(1):39-43

“we assume that each link independently has the

same probability of being effective” …

“we conceive a constant     , depending

on the group and the context of the particular

investigation, which has the force of a probability

of effectiveness of a single link. A k-step chain

then, has probability       of being effective.”

“We wish to find the column sums of the matrix”

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 8 of 47

Page 9: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

A MODERN TAKE

The Katz score (node-based) is

                   

The Katz score (edge-based) is

                   

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 9 of 47

Page 10: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

RETURNING TO THE MATRIX

                     

                                                     

Carl Neumann                

                                    David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 10 of 47

Page 11: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

Carl Neumann

I’ve heard the Neumann series called the “von Neumann”

series more than I’d like! In fact, the von Neumann kernel

of a graph should be named the “Neumann” kernel!

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Wikipedia page

11 / 47

Page 12: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PROPERTIES OF KATZ’S MATRIX

    is symmetric

    exists when                      

                is spd when                      

Note that         1/max-degree suffices

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 12 of 47

Page 13: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

COMMUTE TIME

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Picture taken from Google images, seems to be

Bay Bridge Traffic by Jim M. Goldstein.

13 / 47

Page 14: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

COMMUTE TIME

Consider a uniform random walk on a graph                    

                                                       

                                               

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Also called the hitting

time from node i to j, or

the first transition time

14 of 47

Page 15: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

SKIPPING DETAILS

                    : graph Laplacian

                                                             

              is the only null-vector

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 15 of 47

Page 16: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

WHAT DO OTHER PEOPLE DO?

1) Just work with the linear algebra formulations

2) For Katz, Truncate the Neumann series as a few (3-5) terms

3) Use low-rank approximations from EVD(A) or EVD(L)

4) For commute, use Johnson-Lindenstrauss inspired random sampling

5) Approximately decompose into smaller problems

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Liben-Nowell and Kleinberg CIKM2003, Acar et al. ICDM2009,

Spielman and Srivastava STOC2008, Sarkar and Moore UAI2007, Wang et al. ICDM2007 16 of 47

Page 17: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

THE PROBLEM

All of these techniques are

preprocessing based because

most people’s goal is to compute

all the scores.

We want to avoid

preprocessing the graph.

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

There are a few caveats here! i.e. one could solve the system instead of looking for the matrix inverse

17 of 47

Page 18: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

WHY NO PREPROCESSING?

The graph is constantly changing

as I rate new movies.

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 18 of 47

Page 19: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

WHY NO PREPROCESSING?

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Top-k predicted “links”

are movies to watch!

Pairwise scores give

user similarity

19 of 47

Page 20: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PAIR-WISE ALGORITHMS

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 20 / 47

Page 21: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PAIRWISE ALGORITHMS

Katz                      

Commute                                        

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Golub and Meurant

to the rescue!

21 of 47

Page 22: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

MMQ - THE BIG IDEA

Quadratic form                    

Weighted sum                      

Stieltjes integral                      

Quadrature approximation                      

Matrix equation               David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Think                    

A is s.p.d. use EVD

“A tautology”

Lanczos

22 of 47

Page 23: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

LANCZOS

        , k-steps of the Lanczos method produce

                                    and                      

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

                    =

             

23 of 47

Page 24: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PRACTICAL LANCZOS

Only need to store the last 2 vectors in      

Updating requires O(matvec) work

      is not orthogonal

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 24 of 47

Page 25: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

MMQ PROCEDURE

Goal                        

Given                        

1. Run k-steps of Lanczos on     starting with    

2. Compute       ,     with an additional eigenvalue at     ,

set                     3. Compute     ,     with an additional eigenvalue at   , set

                  4. Output               as lower and upper bounds on    

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Correspond to a Gauss-Radau rule, with

u as a prescribed node

Correspond to a Gauss-Radau rule, with

l as a prescribed node

25 of 47

Page 26: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PRACTICAL MMQ

Increase k to become more accurate

Bad eigenvalue bounds yield worse results

      and     are easy to compute

      not required, we can iteratively

update it’s LU factorization

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 26 of 47

Page 27: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PRACTICAL MMQ

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 27 of 47

Page 28: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

ONE LAST STEP FOR KATZ

Katz                      

           

                                                                   

                                         

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 28 of 47

Page 29: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

COLUMN-WISE

ALGORITHMS

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 29 / 47

Page 30: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

COLUMN-WISE COMMUTE-TIME

                               

      requires entire diagonal of    

                                           

                                           

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Paige and Saunders, 1975

                    =

             

Each vector     is computed by the a Lanczos based CG algorithm.

30 of 47

Page 31: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

COLUMN-WISE COMMUTE TIME

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

  following Hein et al. 2010 is a MUCH better and faster approximation.

31

Page 32: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

KATZ SCORES ARE LOCALIZED

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Up to 50 neighbors is 99.65% of the total mass

32 of 47

Page 33: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

PARTICIPATION RATIOS

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Participation

Ratios

“effective non-zeros” in a vector

           

33 of 47

Page 34: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

TOP-K ALGORITHM FOR KATZ

Approximate    

                           

where     is sparse

Keep     sparse too

Ideally, don’t “touch” all of    

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 34 of 47

Page 35: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

INSPIRATION - PAGERANK

Approximate    

                           

where     is sparse

Keep     sparse too? YES!

Ideally, don’t “touch” all of     ? YES!

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

McSherry WWW2005, Berkhin 2007, Anderson et al. FOCS2008 – Thanks to Reid Anderson for telling me McSherry did this too.

35 of 47

Page 36: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

THE ALGORITHM - MCSHERRY

For              

Start with the Richardson iteration

                                                     

Rewrite

                                     

Richardson converges if                        

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 36 of 47

Page 37: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

THE ALGORITHM

Note     is sparse.

If                 , then         is sparse.

Idea

only add one component of         to        

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 37 of 47

Page 38: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

THE ALGORITHM

For              

Init:                                

                               

                               

How to pick   ?

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 38 of 47

Page 39: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

THE ALGORITHM FOR KATZ

For                            

Init:                                  

                             

                                           

Pick   as max       David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Storing the non-zeros of the residual in a heap makes picking the max log(n) time. See Anderson et al. FOCS2008 for more

39 of 47

Page 40: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

CONVERGENCE?

If         1/max-degree then       is sub-stochastic and the PageRank based proof applies because the matrix is diagonally dominant

For                       , then for symmetric     , this algorithm is the Gauss-Southwell procedure and it still converges.

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

40 of 47

Page 41: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

RESULTS – DATA, PARAMETERS

All unweighted, connected graphs

Easy                                    

Hard                            

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 41 of 47

Page 42: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

KATZ BOUND CONVERGENCE

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 42 of 47

Page 43: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

COMMUTE BOUND CONVERG.

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 43 of 47

Page 44: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

KATZ SET CONVERGENCE

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

For arXiv graph.

44 of 47

Page 45: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

TIMING

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 45 of 47

Page 46: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

CONCLUSIONS

These algorithms are faster than many alternatives.

For pairwise commute, stopping criteria are simpler with bounds

For top-k problems, we often need less than 1 matvec for good enough results

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 46 of 47

Page 47: Fast matrix computations for pair-wise and column-wise Katz scores and commute times

By AngryDogDesign on DeviantArt

Paper at WAW2010, J. Internet Mathematics

Slides should be online soon

Code is online already

www.cs.purdue.edu/homes/dgleich/

/codes/fast-katz-2011

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar 47