Extrapolation Methods for Accelerating PageRank Computations Sepandar D. Kamvar Taher H. Haveliwala Christopher D. Manning Gene H. Golub Stanford University


Page 1: Extrapolation

Extrapolation Methods for Accelerating PageRank Computations

Sepandar D. Kamvar

Taher H. Haveliwala

Christopher D. Manning

Gene H. Golub

Stanford University

Page 2: Extrapolation

Motivation

Problem: Speed up PageRank.

Motivation: Personalization, "Freshness".

Example: two personalized searches for the same query return different top results:

Search: Giants
Results: 1. The Official Site of the San Francisco Giants

Search: Giants
Results: 1. The Official Site of the New York Giants

Note: PageRank computations don't get faster as computers do.

Page 3: Extrapolation

Outline

Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results

Page 4: Extrapolation

Link Counts

[Figure: Sep's Home Page and Taher's Home Page, with inlinks from Yahoo!, CNN, DB Pub Server, and CS361 — one home page is linked by 2 important pages, the other by 2 unimportant pages.]

Page 5: Extrapolation

Definition of PageRank

The importance of a page is given by the importance of the pages that link to it:

x_i = Σ_{j ∈ B_i} x_j / N_j

where x_i is the importance of page i, B_i is the set of pages j that link to page i, N_j is the number of outlinks from page j, and x_j is the importance of page j.

Page 6: Extrapolation

Definition of PageRank

[Figure: example graph over Yahoo!, CNN, DB Pub Server, Taher, and Sep, with link weights 1/2, 1/2, 1, 1 and ranks 0.05, 0.1, 0.1, 0.1, 0.25 flowing along the links.]

Page 7: Extrapolation

PageRank Diagram

Initialize all nodes to rank x_i^(0) = 1/n (here n = 3, so each node starts at 0.333).

Page 8: Extrapolation

PageRank Diagram

Propagate ranks across links (multiplying by link weights): a rank of 0.333 splits into 0.167 + 0.167 along two outlinks, while a single outlink carries the full 0.333.

Page 9: Extrapolation

PageRank Diagram

x_i^(1) = Σ_{j ∈ B_i} x_j^(0) / N_j

(node ranks after one iteration: 0.333, 0.5, 0.167)

Page 10: Extrapolation

PageRank Diagram

Propagate ranks across links again (edge values: 0.167, 0.167, 0.5, 0.167).

Page 11: Extrapolation

PageRank Diagram

x_i^(2) = Σ_{j ∈ B_i} x_j^(1) / N_j

(node ranks after two iterations: 0.5, 0.333, 0.167)

Page 12: Extrapolation

PageRank Diagram

After a while the ranks converge (here to 0.4, 0.4, 0.2), satisfying

x_i = Σ_{j ∈ B_i} x_j / N_j

Page 13: Extrapolation

Computing PageRank

Initialize:

x_i^(0) = 1/n

Repeat until convergence:

x_i^(k+1) = Σ_{j ∈ B_i} x_j^(k) / N_j

where x_i is the importance of page i, B_i is the set of pages j that link to page i, and N_j is the number of outlinks from page j.
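The iteration above can be sketched directly in code. This is a minimal illustration, not the paper's implementation; the three-page graph is a hypothetical example, chosen so the ranks settle near the 0.4 / 0.2 / 0.4 values pictured on the diagram slides.

```python
# Sketch of the iterative update x_i^(k+1) = sum_{j in B_i} x_j^(k) / N_j.
# Hypothetical three-page graph: 1 -> {2, 3}, 2 -> {3}, 3 -> {1}.
outlinks = {1: [2, 3], 2: [3], 3: [1]}
n = len(outlinks)

x = {i: 1.0 / n for i in outlinks}           # x_i^(0) = 1/n
for _ in range(100):                         # "repeat until convergence"
    new = {i: 0.0 for i in outlinks}
    for j, targets in outlinks.items():
        for i in targets:
            new[i] += x[j] / len(targets)    # page j sends x_j / N_j to each target
    x = new

print({i: round(v, 3) for i, v in x.items()})
```

On this graph the ranks converge to roughly 0.4, 0.2, 0.4, and the total rank stays 1 throughout, since each page redistributes its full rank every step.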

Page 14: Extrapolation

Matrix Notation

x = P^T x

[Figure: the update written as a matrix-vector product — the transposed link matrix P^T (entries like .1, .2, .3, .4, with P_{ji} = 1/N_j when page j links to page i) times the rank vector x.]

Page 15: Extrapolation

Matrix Notation

Find x that satisfies: x = P^T x

[Figure: the same numeric matrix-vector product as the previous slide.]
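In matrix form, one iteration is just a matrix-vector product. A small sketch on a hypothetical three-page graph (not the numeric example on the slide), building P^T entry by entry and checking that the converged vector is a fixed point of x = P^T x:

```python
# Build P from outlinks: P[j][i] = 1/N_j when page j links to page i,
# so applying P^T implements x_i <- sum_j P[j][i] * x_j.
outlinks = {0: [1, 2], 1: [2], 2: [0]}       # hypothetical graph
n = 3

P = [[0.0] * n for _ in range(n)]
for j, targets in outlinks.items():
    for i in targets:
        P[j][i] = 1.0 / len(targets)

def pt_times(x):
    # (P^T x)_i = sum_j P[j][i] * x_j
    return [sum(P[j][i] * x[j] for j in range(n)) for i in range(n)]

x = [1.0 / n] * n
for _ in range(200):
    x = pt_times(x)

# At the fixed point, x satisfies x = P^T x.
residual = max(abs(a - b) for a, b in zip(x, pt_times(x)))
print(x, residual)
```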

Page 16: Extrapolation

Power Method

Initialize:

x^(0) = [1/n … 1/n]^T

Repeat until convergence:

x^(k+1) = P^T x^(k)

Page 17: Extrapolation

A side note

PageRank doesn't actually use P^T. Instead, it uses A = cP^T + (1-c)E^T.

So the PageRank problem is really:

Find x that satisfies: x = Ax

not:

Find x that satisfies: x = P^T x

Page 18: Extrapolation

Power Method

And the algorithm is really . . .

Initialize:

x^(0) = [1/n … 1/n]^T

Repeat until convergence:

x^(k+1) = A x^(k)
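A sketch of the damped iteration x ← Ax. When E is the uniform teleport matrix and x sums to 1, the (1-c)E^T term reduces to adding (1-c)/n to every entry, so A never needs to be formed explicitly. The graph and c = 0.85 are illustrative choices, not values from the slides.

```python
# One step of x <- A x with A = c P^T + (1-c) E^T, uniform E.
c = 0.85                                     # illustrative damping factor
outlinks = {0: [1, 2], 1: [2], 2: [0]}       # hypothetical graph
n = 3

def step(x):
    new = [(1 - c) / n] * n                  # teleport mass: (1-c) E^T x
    for j, targets in outlinks.items():
        for i in targets:
            new[i] += c * x[j] / len(targets)  # link-following mass: c P^T x
    return new

x = [1.0 / n] * n
for _ in range(100):
    x = step(x)

print([round(v, 4) for v in x])
```

Because each step is one sparse pass over the outlinks plus a constant shift, this keeps the per-iteration cost linear in the number of links.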

Page 19: Extrapolation

Outline

Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results

Page 20: Extrapolation

Power Method

Express x^(0) in terms of the eigenvectors of A:

x^(0) = u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5

Page 21: Extrapolation

Power Method

x^(1) = u1 + α2 λ2 u2 + α3 λ3 u3 + α4 λ4 u4 + α5 λ5 u5

Page 22: Extrapolation

Power Method

x^(2) = u1 + α2 λ2^2 u2 + α3 λ3^2 u3 + α4 λ4^2 u4 + α5 λ5^2 u5

Page 23: Extrapolation

Power Method

x^(k) = u1 + α2 λ2^k u2 + α3 λ3^k u3 + α4 λ4^k u4 + α5 λ5^k u5

Page 24: Extrapolation

Power Method

x^(∞) = u1

Page 25: Extrapolation

Why does it work?

Imagine our n x n matrix A has n distinct eigenvectors ui, so that

A ui = λi ui

Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A:

x^(0) = u1 + α2 u2 + … + αn un

Page 26: Extrapolation

Why does it work?

From the last slide:

x^(0) = u1 + α2 u2 + … + αn un

To get the first iterate, multiply x^(0) by A. The first eigenvalue is 1:

λ1 = 1; |λ2|, …, |λn| < 1

Therefore:

x^(1) = A x^(0) = A u1 + α2 A u2 + … + αn A un
      = u1 + α2 λ2 u2 + … + αn λn un

where λ2, …, λn are all less than 1 in magnitude.

Page 27: Extrapolation

Power Method

x^(0) = u1 + α2 u2 + … + αn un

x^(1) = u1 + α2 λ2 u2 + … + αn λn un

x^(2) = u1 + α2 λ2^2 u2 + … + αn λn^2 un

Page 28: Extrapolation

Convergence

x^(k) = u1 + α2 λ2^k u2 + … + αn λn^k un

The smaller λ2, the faster the convergence of the Power Method.
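The λ2^k decay can be checked numerically. The 2×2 column-stochastic matrix below is a toy example (not from the slides) with eigenvalues 1 and 0.7 and dominant eigenvector (2/3, 1/3): the power-iterate error shrinks by a factor of λ2 = 0.7 each step.

```python
# Toy matrix: eigenvalues 1 and 0.7 (trace 1.7 minus the leading 1),
# dominant eigenvector u1 = (2/3, 1/3) since A u1 = u1.
A = [[0.9, 0.2],
     [0.1, 0.8]]
u1 = [2 / 3, 1 / 3]
lam2 = 0.7

def apply(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

x = [0.5, 0.5]
errors = []
for k in range(10):
    errors.append(max(abs(x[i] - u1[i]) for i in range(2)))
    x = apply(A, x)

# Successive error ratios approach (here, equal) lambda_2 = 0.7.
ratios = [errors[k + 1] / errors[k] for k in range(9)]
print(ratios)
```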

Page 29: Extrapolation

Our Approach

Estimate the components of the current iterate in the directions of the second two eigenvectors (u2 and u3), and eliminate them.

Page 30: Extrapolation

Why this approach?

For traditional problems: A is smaller, and often dense; λ2 is often close to λ1, making the power method slow.

In our problem, A is huge and sparse. More importantly, λ2 is small¹.

Therefore, the Power Method is actually much faster than other methods.

¹ "The Second Eigenvalue of the Google Matrix" (dbpubs.stanford.edu/pub/2003-20).

Page 31: Extrapolation

Using Successive Iterates

[Figure: the iterate x^(0) and the target u1, plotted by their components along u1 … u5.]

Page 32: Extrapolation

Using Successive Iterates

[Figure: x^(0) and x^(1), moving toward u1.]

Page 33: Extrapolation

Using Successive Iterates

[Figure: x^(0), x^(1), and x^(2), moving toward u1.]

Page 34: Extrapolation

Using Successive Iterates

[Figure: the trajectory through x^(0), x^(1), x^(2) points toward u1.]

Page 35: Extrapolation

Using Successive Iterates

[Figure: extrapolating from x^(0) and x^(1) directly to x' = u1.]

Page 36: Extrapolation

How do we do this?

Assume x^(k) can be written as a linear combination of the first three eigenvectors (u1, u2, u3) of A.

Compute an approximation to the {u2, u3} components, and subtract it from x^(k) to get x^(k)'.

Page 37: Extrapolation

Assume

Assume x^(k) can be represented by the first 3 eigenvectors of A:

x^(k) = α1 u1 + α2 u2 + α3 u3

Multiplying by A (and using λ1 = 1):

x^(k+1) = A x^(k) = α1 u1 + α2 λ2 u2 + α3 λ3 u3

x^(k+2) = α1 u1 + α2 λ2^2 u2 + α3 λ3^2 u3

x^(k+3) = α1 u1 + α2 λ2^3 u2 + α3 λ3^3 u3

Page 38: Extrapolation

Linear Combination

Let's take some linear combination of these 3 iterates:

β1 x^(k+1) + β2 x^(k+2) + β3 x^(k+3)
= β1 (α1 u1 + α2 λ2 u2 + α3 λ3 u3)
+ β2 (α1 u1 + α2 λ2^2 u2 + α3 λ3^2 u3)
+ β3 (α1 u1 + α2 λ2^3 u2 + α3 λ3^3 u3)

Page 39: Extrapolation

Rearranging Terms

We can rearrange the terms to get:

β1 x^(k+1) + β2 x^(k+2) + β3 x^(k+3)
= α1 (β1 + β2 + β3) u1
+ α2 (β1 λ2 + β2 λ2^2 + β3 λ2^3) u2
+ α3 (β1 λ3 + β2 λ3^2 + β3 λ3^3) u3

Goal: find β1, β2, β3 so that the coefficients of u2 and u3 are 0, and the coefficient of u1 is 1.

Page 40: Extrapolation

Summary

We make an assumption about the current iterate, and solve for the dominant eigenvector as a linear combination of the next three iterates.

We use a few iterations of the Power Method to "clean it up".
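As an illustration of the extrapolation step, here is the paper's simpler Aitken variant: assume the iterate has components along only two eigenvectors, so three successive iterates determine u1 componentwise via the classic Δ² formula (Quadratic Extrapolation extends this to the three-eigenvector assumption above). The 2×2 matrix is a toy example, not from the slides; its exact dominant eigenvector is (2/3, 1/3).

```python
# Aitken extrapolation: under x^(k) = u1 + a2 * lam2^k * u2, each component
# is a geometric sequence, so u1_i = x0_i - (x1_i - x0_i)^2 / (x2_i - 2 x1_i + x0_i).
A = [[0.9, 0.2],
     [0.1, 0.8]]

def apply(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

x0 = [0.5, 0.5]
x1 = apply(A, x0)          # one power-method step
x2 = apply(A, x1)          # two power-method steps

u1_est = [x0[i] - (x1[i] - x0[i]) ** 2 / (x2[i] - 2 * x1[i] + x0[i])
          for i in range(2)]
print(u1_est)
```

Because this toy matrix has exactly two eigenvectors, the two-eigenvector assumption holds exactly and the extrapolation recovers u1 from just three iterates; on the real PageRank matrix the assumption only holds approximately, which is why a few power-method iterations are used afterward to "clean it up".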

Page 41: Extrapolation

Outline

Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results

Page 42: Extrapolation

Results

Quadratic Extrapolation speeds up convergence; extrapolation was only used 5 times!

Page 43: Extrapolation

Results

Extrapolation dramatically speeds up convergence for high values of c (c = .99).

Page 44: Extrapolation

Take-home message

Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.

The ideas are useful for further speedup algorithms.

Quadratic Extrapolation can be used for a whole class of problems.

Page 45: Extrapolation

The End

Paper available at http://dbpubs.stanford.edu/pub/2003-16