
Google's PageRank Algorithm

Probability, linear algebra, and numerical analysis: the mathematics behind Google's™ PageRank™

Grady Wright
Department of Mathematics
Boise State University

A Google search

● Two-step process

1) Text processing

2) Ranking

● Information Retrieval score

● PageRank™ score: “Heart of Google software”

● Brin and Page (1998)

● Kleinberg: HITS (www.teoma.com)

Outline

● Heuristic interpretation of PageRank

● PageRank as a random walk (surf)

● Linear algebra formulation

● Computing PageRank

● Example and tools

● Advanced topics

A tiny web example

● Pages of the web W:

[Figure: the six pages of W, labeled one through six]


A tiny web example

● Pages of the web W as a directed graph:

[Figure: directed graph of W. Links: one → two, three; three → one, two, four; four → five, six; five → six; six → four, five; page two has no outlinks.]

● Interpretation of PageRank:

  ● A page is important if an important page has a link to it.

● “Democracy of the web”: a link from page A to page B is a vote from A to B.

● The web according to Google has about 4.43 billion pages (4,430,000,000) (estimated October 17, 2018 by http://www.worldwidewebsize.com)

PageRank: a random walk (or surf)

● Example:

[Figure: directed graph of W on pages one through six]

● Infinitely dedicated random surfer: she never stops clicking.

● Outlinks: from the current page she follows one of its outgoing links, chosen at random.

● Dangling node: a page with no outlinks (page two in the example); from there she jumps to a page of W chosen at random.

● Markov chain

● Probabilistic interpretation of PageRank:

A webpage's PageRank is the probability that, at any particular time, the infinitely dedicated random surfer is visiting that page.
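The probabilistic interpretation can be tried out with a short simulation. The following is a minimal Python sketch, not part of the original talk. It assumes the link structure read off the connectivity matrix G of the example web, a fixed random seed, and an illustrative helper name `random_surf`. It models the plain surfer described above: follow a random outlink, and jump to any page from a dangling node.

```python
import random

# Outlinks of the six-page example web (page -> pages it links to).
# Assumption: read off the connectivity matrix G used in this talk.
LINKS = {1: [2, 3], 2: [], 3: [1, 2, 4], 4: [5, 6], 5: [6], 6: [4, 5]}


def random_surf(links, steps, seed=0):
    """Simulate the random surfer and return the visit frequency of each page."""
    rng = random.Random(seed)      # fixed seed, an illustrative choice
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    page = pages[0]                # start on page one (arbitrary)
    for _ in range(steps):
        visits[page] += 1
        out = links[page]
        # Follow a random outlink; from a dangling node jump to any page.
        page = rng.choice(out) if out else rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


freq = random_surf(LINKS, steps=100_000)
for p, f in freq.items():
    print(p, round(f, 3))
```

Because this plain surfer has no teleportation, almost all visits end up on pages four, five, and six. That is the clique problem the talk takes up under "Avoiding cycles around cliques".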

Linear algebra formulation

● Example:

[Figure: directed graph of W on pages one through six]

● A directed graph can be represented using a connectivity matrix G.

$$
G = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 0
\end{bmatrix}
$$

Entries of G:

$$
g_{i,j} = \begin{cases} 1 & \text{if page } j \text{ has a link to page } i \\ 0 & \text{otherwise} \end{cases}
\qquad i,j = 1,2,\ldots,n
$$
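As a small illustration of this definition, the sketch below builds G from a list of outlinks. It is an assumption-laden example rather than code from the talk: the dictionary `LINKS` encodes the example graph as read off the matrix above, and `connectivity_matrix` is a hypothetical helper name.

```python
# Outlinks of the six-page example web (page -> pages it links to).
# Assumption: this matches the directed graph shown on the slide.
LINKS = {1: [2, 3], 2: [], 3: [1, 2, 4], 4: [5, 6], 5: [6], 6: [4, 5]}


def connectivity_matrix(links):
    """Return G with G[i][j] = 1 if page j+1 has a link to page i+1, else 0."""
    n = len(links)
    G = [[0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            G[i - 1][j - 1] = 1    # column = source page, row = target page
    return G


G = connectivity_matrix(LINKS)
for row in G:
    print(row)
```

Note the convention: column j lists where page j points, so the column sums count outgoing links.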

Linear algebra formulation

● Add the random surf info to G:

Directed graph of W:

[Figure: directed graph of W on pages one through six]

Connectivity matrix:

$$
G = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 0
\end{bmatrix}
$$

$$
c = [\,2\ 0\ 3\ 2\ 1\ 2\,]^T, \qquad d = [\,0\ 1\ 0\ 0\ 0\ 0\,]^T, \qquad e = [\,1\ 1\ 1\ 1\ 1\ 1\,]^T
$$

Transition probability matrix:

$$
A = P + \frac{1}{6}\, e\, d^T
= \begin{bmatrix}
0 & 0 & 1/3 & 0 & 0 & 0 \\
1/2 & 0 & 1/3 & 0 & 0 & 0 \\
1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/3 & 0 & 0 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1/2 & 1 & 0
\end{bmatrix}
+ \begin{bmatrix}
0 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 0 & 0 & 0 & 0
\end{bmatrix}
= \begin{bmatrix}
0 & 1/6 & 1/3 & 0 & 0 & 0 \\
1/2 & 1/6 & 1/3 & 0 & 0 & 0 \\
1/2 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 1/3 & 0 & 0 & 1/2 \\
0 & 1/6 & 0 & 1/2 & 0 & 1/2 \\
0 & 1/6 & 0 & 1/2 & 1 & 0
\end{bmatrix}
$$


Linear algebra formulation

● For a general web W, define:

$$c_j = \sum_{i=1}^{n} g_{i,j}, \qquad j = 1,2,\ldots,n$$

(number of outgoing links from page j)

$$
p_{i,j} = \begin{cases} g_{i,j}/c_j & \text{if } c_j \neq 0 \\ 0 & \text{otherwise} \end{cases}
\qquad i,j = 1,2,\ldots,n
$$

(probability of visiting page i based on a random choice from the links on page j)

$$
d_j = \begin{cases} 1 & \text{if } c_j = 0 \\ 0 & \text{otherwise} \end{cases}
\qquad j = 1,2,\ldots,n
$$

(tracks dangling pages)

● Transition probability matrix A:

$$A = P + \frac{1}{n}\, e\, d^T, \qquad \text{where } e = [\,1\ 1\ \cdots\ 1\,]^T$$
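These definitions translate almost line by line into code. The following is a minimal sketch in Python using exact fractions. The function name `transition_matrix` is an illustrative choice, and G is the connectivity matrix of the six-page example.

```python
from fractions import Fraction

# Connectivity matrix of the six-page example (G[i][j] = 1 if page j+1 links to page i+1).
G = [
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def transition_matrix(G):
    """Return (c, d, P, A) with A = P + (1/n) e d^T as defined above."""
    n = len(G)
    # c_j: number of outgoing links from page j (column sums of G).
    c = [sum(G[i][j] for i in range(n)) for j in range(n)]
    # d_j: 1 for dangling pages, 0 otherwise.
    d = [1 if cj == 0 else 0 for cj in c]
    # p_ij = g_ij / c_j when c_j != 0, else 0.
    P = [[Fraction(G[i][j], c[j]) if c[j] else Fraction(0) for j in range(n)]
         for i in range(n)]
    # Each dangling column is replaced by the uniform distribution 1/n.
    A = [[P[i][j] + Fraction(d[j], n) for j in range(n)] for i in range(n)]
    return c, d, P, A


c, d, P, A = transition_matrix(G)
print(c)  # [2, 0, 3, 2, 1, 2]
print(d)  # [0, 1, 0, 0, 0, 0]
```

Every column of the resulting A sums to 1, which is what makes it a transition probability matrix.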


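Avoiding cycles around cliques

● Problem: once the random surfer reaches page four, five, or six, no link leads back to the rest of the web, so she cycles around that clique forever.

[Figure: directed graph of W on pages one through six]

● Solution: random teleportation. With probability 1 − α the surfer ignores the links and jumps to a page of W chosen uniformly at random.

● Modification to the transition probability matrix:

$$A = \alpha\left(P + \frac{1}{n}\, e\, d^T\right) + (1-\alpha)\,\frac{1}{n}\, e\, e^T, \qquad 0 < \alpha < 1$$

(Google originally set α = 0.85)

● Matrix for above example:

$$
A = \alpha \begin{bmatrix}
0 & 1/6 & 1/3 & 0 & 0 & 0 \\
1/2 & 1/6 & 1/3 & 0 & 0 & 0 \\
1/2 & 1/6 & 0 & 0 & 0 & 0 \\
0 & 1/6 & 1/3 & 0 & 0 & 1/2 \\
0 & 1/6 & 0 & 1/2 & 0 & 1/2 \\
0 & 1/6 & 0 & 1/2 & 1 & 0
\end{bmatrix}
+ (1-\alpha) \begin{bmatrix}
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6
\end{bmatrix}
$$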
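To see what the teleportation term does to the entries, the sketch below assembles the modified matrix for the example with α = 0.85 in exact arithmetic. It is an illustrative Python example, not code from the talk. `LINKS` encodes the example graph and `google_matrix` is a hypothetical helper name.

```python
from fractions import Fraction

# Outlinks of the six-page example web (page -> pages it links to).
LINKS = {1: [2, 3], 2: [], 3: [1, 2, 4], 4: [5, 6], 5: [6], 6: [4, 5]}


def google_matrix(links, alpha):
    """Return A = alpha*(P + e d^T / n) + (1 - alpha) * e e^T / n as a list of rows."""
    n = len(links)
    # Teleportation term: every entry starts at (1 - alpha)/n.
    A = [[(1 - alpha) / n for _ in range(n)] for _ in range(n)]
    for j, targets in links.items():
        if targets:
            # Column j of alpha*P: equal weight on each outlink of page j.
            for i in targets:
                A[i - 1][j - 1] += alpha / len(targets)
        else:
            # Dangling page: column j of alpha * e d^T / n is uniform.
            for i in range(n):
                A[i][j - 1] += alpha / n
    return A


A = google_matrix(LINKS, Fraction(85, 100))
for row in A:
    print("  ".join(str(a) for a in row))
```

With α = 0.85 the zero entries become 1/40, the 1/2 entries become 9/20, the 1/3 entries become 37/120, and the single 1 becomes 7/8, so every entry of A is now positive.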

Importance of the transition prob. matrix

● Probability distribution vector $x$:

$x_j$ = prob. the random surfer is currently visiting page j, with

$$\sum_{j=1}^{n} x_j = 1$$

$(Ax)_j$ = prob. the random surfer will be visiting page j after leaving her current location.

● Example: α = 0.85

[Figure: directed graph of W on pages one through six]

$$
A = \begin{bmatrix}
1/40 & 1/6 & 37/120 & 1/40 & 1/40 & 1/40 \\
9/20 & 1/6 & 37/120 & 1/40 & 1/40 & 1/40 \\
9/20 & 1/6 & 1/40 & 1/40 & 1/40 & 1/40 \\
1/40 & 1/6 & 37/120 & 1/40 & 1/40 & 9/20 \\
1/40 & 1/6 & 1/40 & 9/20 & 1/40 & 9/20 \\
1/40 & 1/6 & 1/40 & 9/20 & 7/8 & 1/40
\end{bmatrix}
$$

Starting on page one:

$$
A \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0.025 \\ 0.45 \\ 0.45 \\ 0.025 \\ 0.025 \\ 0.025 \end{bmatrix}
$$

One step later:

$$
A \begin{bmatrix} 0.025 \\ 0.45 \\ 0.45 \\ 0.025 \\ 0.025 \\ 0.025 \end{bmatrix}
= \begin{bmatrix} 0.21625 \\ 0.226875 \\ 0.099375 \\ 0.226875 \\ 0.11 \\ 0.120625 \end{bmatrix}
$$
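The two products in this example can be checked with a few lines of exact arithmetic. The sketch below is illustrative only: the helper name `step` is an assumption, and the matrix is the α = 0.85 matrix of the example written with fractions.

```python
from fractions import Fraction as F

# Transition probability matrix of the example for alpha = 0.85.
A = [
    [F(1, 40), F(1, 6), F(37, 120), F(1, 40), F(1, 40), F(1, 40)],
    [F(9, 20), F(1, 6), F(37, 120), F(1, 40), F(1, 40), F(1, 40)],
    [F(9, 20), F(1, 6), F(1, 40), F(1, 40), F(1, 40), F(1, 40)],
    [F(1, 40), F(1, 6), F(37, 120), F(1, 40), F(1, 40), F(9, 20)],
    [F(1, 40), F(1, 6), F(1, 40), F(9, 20), F(1, 40), F(9, 20)],
    [F(1, 40), F(1, 6), F(1, 40), F(9, 20), F(7, 8), F(1, 40)],
]


def step(A, x):
    """One move of the random surfer: return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


x0 = [F(1), F(0), F(0), F(0), F(0), F(0)]   # surfer starts on page one
x1 = step(A, x0)
x2 = step(A, x1)

print([float(v) for v in x1])
print([float(v) for v in x2])
print(sum(x1), sum(x2))   # both remain probability distributions
```

Working with fractions makes it visible that the entries of each new vector still sum to exactly 1.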

PageRank defined

● Page j's PageRank: jth entry of the probability distribution vector (PDV) $v$ satisfying

$$v = A v$$

$v$ = stationary distribution vector of A.

● What is the mathematical name for $v$?

● Three concerns:

1. Existence

2. Uniqueness

3. Computation

Perron-Frobenius Theorem

Theorem: If A is an n-by-n matrix with positive entries then

1) One of its eigenvalues λ is positive and dominant.

2) There exists a unique (up to scaling) positive eigenvector corresponding to the dominant eigenvalue.

3) The dominant eigenvalue is simple.

Corollary: If the sum of each column of A equals 1 then λ = 1 is the dominant eigenvalue.

Recall:

$$A = \alpha\left(P + \frac{1}{n}\, e\, d^T\right) + (1-\alpha)\,\frac{1}{n}\, e\, e^T, \qquad 0 < \alpha < 1$$

PageRank vector is the dominant eigenvector of A.
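A brief sketch of why the corollary holds may help here. It is not taken from the slides and uses only the assumptions that A has nonnegative entries and unit column sums, with e the vector of ones and w any eigenvector with eigenvalue λ:

```latex
% Unit column sums mean e^T A = e^T, so 1 is an eigenvalue of A^T and hence of A.
e^T A = e^T \quad\Longleftrightarrow\quad A^T e = e .

% No eigenvalue can be larger in modulus: if A w = \lambda w, then
|\lambda| \, \|w\|_1 = \|A w\|_1
  = \sum_{i=1}^{n} \Bigl| \sum_{j=1}^{n} a_{i,j} w_j \Bigr|
  \le \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{n} a_{i,j} \Bigr) |w_j|
  = \|w\|_1 ,
% so |\lambda| \le 1, and the dominant eigenvalue of the theorem must be 1.
```

Together with the theorem, this gives existence and uniqueness of the PageRank vector once it is scaled so that its entries sum to 1.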


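Computing PageRank: Power Method

● All we need is the dominant eigenvector!

● An idea:

Define: the eigenvalues $\{1, \lambda_2, \lambda_3, \ldots, \lambda_m\}$ and eigenvectors $\{v, v_2, v_3, \ldots, v_m\}$ of A, ordered so that

$$1 > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_m|$$

Suppose:

$$x^{(0)} = v + \beta_2 v_2 + \beta_3 v_3 + \cdots + \beta_m v_m, \qquad \sum_{j=1}^{n} x^{(0)}_j = 1$$

Consider:

$$
\begin{aligned}
x^{(1)} = A x^{(0)} &= A v + \beta_2 A v_2 + \beta_3 A v_3 + \cdots + \beta_m A v_m \\
&= v + \beta_2 \lambda_2 v_2 + \beta_3 \lambda_3 v_3 + \cdots + \beta_m \lambda_m v_m \\
x^{(2)} = A x^{(1)} = A^2 x^{(0)} &= v + \beta_2 \lambda_2^2 v_2 + \beta_3 \lambda_3^2 v_3 + \cdots + \beta_m \lambda_m^2 v_m \\
&\;\;\vdots \\
x^{(k)} = A x^{(k-1)} = A^k x^{(0)} &= v + \beta_2 \lambda_2^k v_2 + \beta_3 \lambda_3^k v_3 + \cdots + \beta_m \lambda_m^k v_m
\end{aligned}
$$

Thus: $x^{(k)}$ converges to the PageRank vector $v$ as $k \to \infty$, because $|\lambda_i| < 1$ forces every $\lambda_i^k \to 0$.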
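The iteration is short enough to write out in full. The following Python sketch applies it to the six-page example with α = 0.85. The helper names `google_matrix` and `power_method`, the tolerance, and the iteration cap are illustrative assumptions, not the author's code.

```python
# Outlinks of the six-page example web (page -> pages it links to).
LINKS = {1: [2, 3], 2: [], 3: [1, 2, 4], 4: [5, 6], 5: [6], 6: [4, 5]}


def google_matrix(links, alpha):
    """Dense A = alpha*(P + e d^T / n) + (1 - alpha) * e e^T / n."""
    n = len(links)
    A = [[(1.0 - alpha) / n for _ in range(n)] for _ in range(n)]
    for j, targets in links.items():
        if targets:
            for i in targets:
                A[i - 1][j - 1] += alpha / len(targets)
        else:                        # dangling page: uniform column
            for i in range(n):
                A[i][j - 1] += alpha / n
    return A


def power_method(A, tol=1e-9, max_iter=1000):
    """Iterate x <- A x from the uniform vector until the max-norm change is <= tol."""
    n = len(A)
    x = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        x_new = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        delta = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if delta <= tol:
            return x, k
    raise RuntimeError("power method did not converge")


v, iterations = power_method(google_matrix(LINKS, 0.85))
print(iterations)
print([round(p, 8) for p in v])
```

No normalization step is needed inside the loop: the columns of A sum to 1, so every iterate keeps entries that sum to 1.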

Example of power method

● Example: α = 0.85

[Figure: directed graph of W on pages one through six]

$$
A = \begin{bmatrix}
1/40 & 1/6 & 37/120 & 1/40 & 1/40 & 1/40 \\
9/20 & 1/6 & 37/120 & 1/40 & 1/40 & 1/40 \\
9/20 & 1/6 & 1/40 & 1/40 & 1/40 & 1/40 \\
1/40 & 1/6 & 37/120 & 1/40 & 1/40 & 9/20 \\
1/40 & 1/6 & 1/40 & 9/20 & 1/40 & 9/20 \\
1/40 & 1/6 & 1/40 & 9/20 & 7/8 & 1/40
\end{bmatrix}
$$

● Initial guess: $x^{(0)} = [\,1\ 1\ \cdots\ 1\,]^T / n$

● Iterates $x^{(k)} = A x^{(k-1)}$ and the change $\|x^{(k)} - x^{(k-1)}\|_\infty = \max_{1 \le j \le n} |x^{(k)}_j - x^{(k-1)}_j|$:

$$
\begin{array}{c|cccccc|c}
k & x^{(k)}_1 & x^{(k)}_2 & x^{(k)}_3 & x^{(k)}_4 & x^{(k)}_5 & x^{(k)}_6 & \|x^{(k)} - x^{(k-1)}\|_\infty \\ \hline
0 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & \\
1 & 0.095833333 & 0.16666667 & 0.11944444 & 0.16666667 & 0.19027778 & 0.26111111 & 9.4 \cdot 10^{-2} \\
2 & 0.082453704 & 0.12318287 & 0.089340278 & 0.19342593 & 0.23041667 & 0.28118056 & 4.3 \cdot 10^{-2} \\
3 & 0.067763985 & 0.10280681 & 0.077493731 & 0.18726572 & 0.24415866 & 0.32051109 & 3.9 \cdot 10^{-2} \\
4 & 0.061520855 & 0.090320549 & 0.068363992 & 0.19773807 & 0.25536944 & 0.32668709 & 1.2 \cdot 10^{-2} \\
5 & 0.057165209 & 0.083311572 & 0.063941774 & 0.19600722 & 0.26067610 & 0.33889812 & 1.2 \cdot 10^{-2} \\
6 & 0.054919309 & 0.079214523 & 0.061097686 & 0.19895101 & 0.26413724 & 0.34168023 & 4.1 \cdot 10^{-3} \\
7 & 0.053533069 & 0.076873775 & 0.059562764 & 0.19874717 & 0.26599033 & 0.34529289 & 3.6 \cdot 10^{-3} \\
\vdots & & & & & & & \\
34 & 0.051704746 & 0.073679264 & 0.057412413 & 0.19990381 & 0.26859608 & 0.34870368 & \\
35 & 0.051704746 & 0.073679263 & 0.057412413 & 0.19990381 & 0.26859608 & 0.34870368 & 1.0 \cdot 10^{-9}
\end{array}
$$

● PageRank: the entries of $x^{(35)}$.

Power method algorithm and efficiency

● Power method algorithm:

$x^{(0)} = [\,1\ 1\ \cdots\ 1\,]^T / n$
$k = 1$
$\delta = \infty$
while $\delta > \epsilon$
    $x^{(k)} = A x^{(k-1)}$
    $\delta = \|x^{(k)} - x^{(k-1)}\|_\infty$
    $k = k + 1$
end

Operation                    # FLOP
$A x^{(k-1)}$                $2n^2 - n$
$x^{(k)} - x^{(k-1)}$        $n$
Total                        $2n^2 = O(n^2)$

Cocktail napkin computational cost analysis for Google

● n = 45 ∙ 10^9

● FLOP ≈ 4.05 ∙ 10^21 per iteration

● Sunway TaihuLight: 125.435 ∙ 10^15 FLOP/sec

● 1 iteration: 9 hours

● 50 – 100 iterations: 19 – 37 days!

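More efficient power method for PageRank

● Idea: exploit the structure of the transition probability matrix

Recall:

$$A = \alpha\left(P + \frac{1}{n}\, e\, d^T\right) + (1-\alpha)\,\frac{1}{n}\, e\, e^T, \qquad 0 < \alpha < 1$$

Thus:

$$x^{(k)} = A x^{(k-1)} = \alpha P x^{(k-1)} + \alpha\,\frac{1}{n}\, e\, d^T x^{(k-1)} + (1-\alpha)\,\frac{1}{n}\, e\, e^T x^{(k-1)}$$

Since the entries of $x^{(k-1)}$ sum to 1, $e^T x^{(k-1)} = 1$ and

$$x^{(k)} = \alpha P x^{(k-1)} + \frac{e}{n}\left(\alpha\, d^T x^{(k-1)} + 1 - \alpha\right)$$

● Example:

$$
x^{(k)} = \alpha \begin{bmatrix}
0 & 0 & 1/3 & 0 & 0 & 0 \\
1/2 & 0 & 1/3 & 0 & 0 & 0 \\
1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/3 & 0 & 0 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
0 & 0 & 0 & 1/2 & 1 & 0
\end{bmatrix} x^{(k-1)}
+ \frac{1}{6} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\left( \alpha\, [\,0\ 1\ 0\ 0\ 0\ 0\,]\, x^{(k-1)} + 1 - \alpha \right)
$$

Dense way: 47 FLOPs

Sparse way: 30 FLOPs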
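The structured update can be coded without ever forming the dense matrix A. The sketch below is one possible Python rendering: it stores only the outlinks of each page (the nonzeros of P). The function name `sparse_pagerank`, the dictionary representation, and the default tolerance are illustrative assumptions.

```python
# Outlinks of the six-page example web (page -> pages it links to).
LINKS = {1: [2, 3], 2: [], 3: [1, 2, 4], 4: [5, 6], 5: [6], 6: [4, 5]}


def sparse_pagerank(links, alpha=0.85, tol=1e-9, max_iter=1000):
    """Power method using x <- alpha*P*x + (e/n)*(alpha*d^T x + 1 - alpha)."""
    pages = sorted(links)
    n = len(pages)
    x = {p: 1.0 / n for p in pages}
    for k in range(1, max_iter + 1):
        # Scalar shared by every page: (alpha * d^T x + 1 - alpha) / n.
        dangling_mass = sum(x[p] for p in pages if not links[p])
        base = (alpha * dangling_mass + 1.0 - alpha) / n
        x_new = {p: base for p in pages}
        # alpha * P * x, touching only the nonzero entries of P.
        for j in pages:
            out = links[j]
            if out:
                share = alpha * x[j] / len(out)
                for i in out:
                    x_new[i] += share
        delta = max(abs(x_new[p] - x[p]) for p in pages)
        x = x_new
        if delta <= tol:
            return x, k
    raise RuntimeError("power method did not converge")


v, iterations = sparse_pagerank(LINKS)
for page, rank in v.items():
    print(page, round(rank, 8))
```

The work per iteration is proportional to the number of links plus the number of pages, rather than to n², which is the point of the slide.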

More efficient power method

● Power method algorithm with sparse matrix

$x^{(0)} = [\,1\ 1\ \cdots\ 1\,]^T / n$
$k = 1$
$\delta = \infty$
while $\delta > \epsilon$
    $x^{(k)} = \alpha P x^{(k-1)} + \frac{e}{n}\left(\alpha\, d^T x^{(k-1)} + 1 - \alpha\right)$
    $\delta = \|x^{(k)} - x^{(k-1)}\|_\infty$
    $k = k + 1$
end

Operation                              Approx. # FLOP
(1) $\alpha P x^{(k-1)}$               $14n$
(2) $\alpha\, d^T x^{(k-1)}$           $n$
(1) + (2)                              $n$
$x^{(k)} - x^{(k-1)}$                  $n$
Total                                  $17n = O(n)$

Cocktail napkin computational cost analysis for Google

● P averages 7 nonzero entries per row.

● FLOP ≈ 7.65 ∙ 10^11 per iteration

● Sunway TaihuLight: 125.435 ∙ 10^15 FLOP/sec

● 1 iteration: 6.1 ∙ 10^-6 seconds

● 50 – 100 iterations: < 0.00061 seconds!

● In actuality it takes a couple of days.
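The napkin arithmetic for the dense and sparse variants is easy to reproduce. The following Python sketch uses the figures quoted on the slides (n = 45 ∙ 10^9 pages, a peak rate of 125.435 ∙ 10^15 FLOP/sec, about 2n² versus 17n FLOP per iteration). The helper name `iteration_cost` is an illustrative assumption, and the estimate ignores memory traffic and communication entirely.

```python
N_PAGES = 45e9             # n, as quoted on the slides
PEAK_FLOPS = 125.435e15    # Sunway TaihuLight peak rate in FLOP/sec, as quoted


def iteration_cost(flop):
    """Return (FLOP per iteration, seconds per iteration at the peak rate)."""
    return flop, flop / PEAK_FLOPS


dense_flop, dense_sec = iteration_cost(2 * N_PAGES ** 2)   # dense: about 2n^2
sparse_flop, sparse_sec = iteration_cost(17 * N_PAGES)     # sparse: about 17n

print(f"dense : {dense_flop:.3g} FLOP, {dense_sec / 3600:.1f} hours per iteration")
print(f"        {50 * dense_sec / 86400:.0f} to {100 * dense_sec / 86400:.0f} days "
      "for 50 to 100 iterations")
print(f"sparse: {sparse_flop:.3g} FLOP, {sparse_sec:.2g} seconds per iteration")
print(f"        {100 * sparse_sec:.2g} seconds for 100 iterations")
```

The gap between the sparse estimate and the "couple of days" quoted on the slide is a reminder that peak FLOP rates say little about a computation dominated by data movement.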

Example: Boise State University

● Cleve Moler's (2004) surfer.m (http://www.mathworks.com/moler)

● Modifications for BSU: bsusurfer.m (See course website)

Connectivity matrix for www.boisestate.edu:

Number of pages = 20000
nnz(P) = 1529518
Sparsity ratio = 99.6%

● α = 0.85

● Repeat until $\delta = \|x^{(k)} - x^{(k-1)}\|_\infty \le 10^{-7}$

● Number of iterations = 58

Algorithm        Time (sec.)
Dense matrix     8.28
Sparse matrix    0.08

PageRank results for Boise State

#   PageRank     Webpage

01. 3.67059e-02 http://www.boisestate.edu

02. 8.84482e-03 http://template.boisestate.edu/feed

03. 7.61458e-03 http://template.boisestate.edu/comments/feed

04. 6.98594e-03 http://www.boisestate.edu/index.html

05. 6.58044e-03 http://my.boisestate.edu

06. 6.56559e-03 http://index.boisestate.edu

07. 6.34645e-03 http://directory.boisestate.edu

08. 6.15998e-03 http://maps.boisestate.edu

09. 5.87635e-03 http://news.boisestate.edu/update

10. 5.78294e-03 http://events.boisestate.edu

11. 5.77344e-03 http://go.boisestate.edu

12. 5.67036e-03 http://go.boisestate.edu/about

13. 5.61788e-03 http://go.boisestate.edu/boise-beyond

14. 5.49048e-03 http://go.boisestate.edu/year-in-photos

15. 5.48960e-03 http://news.boisestate.edu/facts

PageRank results for Boise State

#   PageRank     Webpage

28 5.45711e-03 http://cobe.boisestate.edu

29 5.45711e-03 http://coas.boisestate.edu

43 5.45703e-03 http://coen.boisestate.edu

48 5.42033e-03 http://president.boisestate.edu

116 7.59747e-04 http://biology.boisestate.edu

117 7.56711e-04 http://biomolecularsciences.boisestate.edu

157 5.11459e-04 http://coen.boisestate.edu/ce

164 5.03192e-04 http://coen.boisestate.edu/cs

188 4.58116e-04 http://cobe.boisestate.edu/economics

192 4.26967e-04 http://coen.boisestate.edu/ece

265 3.36702e-04 http://coen.boisestate.edu/mse

267 3.36702e-04 http://coen.boisestate.edu/mbe

269 3.36702e-04 http://math.boisestate.edu

12539 9.81223e-06 http://math.boisestate.edu/~wright

Exercises

1. What is the PageRank vector for the following web:

[Figure: directed graph on five pages, labeled one through five]

2. What effect does decreasing α have on the PageRank model?

Recall:

$$A = \alpha\left(P + \frac{1}{n}\, e\, d^T\right) + (1-\alpha)\,\frac{1}{n}\, e\, e^T, \qquad 0 < \alpha < 1$$

3. Let the web W consist of n pages and suppose $x$ satisfies $\sum_{j=1}^{n} x_j = 1$. If $y = Ax$, show $\sum_{j=1}^{n} y_j = 1$, where A is given above.

More advanced PageRank topics

1. Effect of changing the teleport probability α.

$$x^{(k+1)} = \alpha P x^{(k)} + \frac{e}{n}\left(\alpha\, d^T x^{(k)} + 1 - \alpha\right) = v + O(\alpha^k)$$

2. Faster algorithms.

$$A v = v \quad\Longrightarrow\quad \left[\, I - \alpha\left(P + \frac{1}{n}\, e\, d^T\right) \right] v = \frac{1-\alpha}{n}\, e$$

3. Personalizing PageRank.

$$A = \alpha\left(P + \frac{1}{n}\, e\, d^T\right) + (1-\alpha)\,\frac{1}{n}\, e\, e^T \quad\longrightarrow\quad A_u = \alpha\left(P + u\, d^T\right) + (1-\alpha)\, u\, e^T$$

4. Search engine optimization (SEO).

5. Updating the PageRank vector.
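The linear-system form of the PageRank equation can be solved directly for a web as small as the six-page example. The sketch below does so by Gauss-Jordan elimination in exact rational arithmetic. It is an illustration under stated assumptions, not a method from the talk: `LINKS` encodes the example graph, `pagerank_linear_solve` is a hypothetical name, and dense elimination would not scale to a real web.

```python
from fractions import Fraction

# Outlinks of the six-page example web (page -> pages it links to).
LINKS = {1: [2, 3], 2: [], 3: [1, 2, 4], 4: [5, 6], 5: [6], 6: [4, 5]}


def pagerank_linear_solve(links, alpha):
    """Solve [I - alpha*(P + e d^T / n)] v = ((1 - alpha)/n) e exactly."""
    n = len(links)
    # Augmented matrix: identity on the left, right-hand side in the last column.
    M = [[Fraction(int(i == j)) for j in range(n)] + [(1 - alpha) / n]
         for i in range(n)]
    for j, targets in links.items():
        if targets:
            for i in targets:
                M[i - 1][j - 1] -= alpha / len(targets)
        else:                          # dangling page: uniform column
            for i in range(n):
                M[i][j - 1] -= alpha / n
    # Gauss-Jordan elimination with row swaps for a nonzero pivot.
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [m / M[col][col] for m in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


v = pagerank_linear_solve(LINKS, Fraction(85, 100))
print([float(p) for p in v])
print(sum(v))
```

The solution agrees with the vector the power method converges to, and its entries sum to exactly 1 without any normalization.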

Search engine optimization

● Idea

[Figure: directed graph of W on pages one through six, annotated “SEO”]

Before SEO:

$$v = [\,0.051704746\ \ 0.073679263\ \ 0.057412413\ \ 0.19990381\ \ 0.26859608\ \ 0.34870368\,]^T$$

After SEO:

$$[\,0.12742415\ \ 0.061268076\ \ 0.050530372\ \ 0.11365423\ \ 0.26626095\ \ 0.17423120\ \ 0.20663101\,]^T$$

Concluding remarks

● PageRank is the “Heart of Google software”

● Use random walk (surf) to formulate PageRank problem.

● Use linear algebra to define PageRank.

● Can use the simple power method to compute PageRank.

● PageRank idea has been applied in many different areas:

For more details see: David F. Gleich, PageRank Beyond the Web, SIAM Review, 57 (2015), 321-363.