the linear algebra aspects of pagerank
TRANSCRIPT
![Page 1: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/1.jpg)
The Linear Algebra Aspectsof PageRank
Ilse Ipsen
Thanks to Teresa Selee and Rebecca Wills
NA – p.1
![Page 2: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/2.jpg)
More PageRank More Visitors
NA – p.2
![Page 3: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/3.jpg)
Two Factors
Determine where Google displays a web page onthe Search Engine Results Page:
1. PageRank (links)
A page has high PageRankif many pages with high PageRank link to it
2. Hypertext Analysis (page contents)
Text, fonts, subdivisions, location of words,contents of neighbouring pages
NA – p.3
![Page 4: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/4.jpg)
PageRank
An objective measure of the citation importance ofa web page [Brin & Page 1998]
• Assigns a rank to every web page• Influences the order in which Google displays
search results• Based on link structure of the web graph• Does not depend on contents of web pages• Does not depend on query
NA – p.4
![Page 5: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/5.jpg)
PageRank
. . . continues to provide the basis for all of our websearch tools http://www.google.com/technology/
• “Links are the currency of the web”• Exchanging & buying of links• BO (backlink obsession)• Search engine optimization
NA – p.5
![Page 6: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/6.jpg)
Overview
• Mathematical Model of Internet• Computation of PageRank• Sensitivity of PageRank to Rounding Errors• Addition & Deletion of Links• Web Pages that have no Outlinks• Is the Ranking Correct?
NA – p.6
![Page 7: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/7.jpg)
Mathematical Model of Internet
1. Represent internet as graph
2. Represent graph as stochastic matrix
3. Make stochastic matrix more convenient=⇒ Google matrix
4. dominant eigenvector of Google matrix=⇒ PageRank
NA – p.7
![Page 8: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/8.jpg)
The Internet as a Graph
Link from one web page to another web page
Web graph:
Web pages = nodesLinks = edges
NA – p.8
![Page 9: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/9.jpg)
The Web Graph as a Matrix
3
2
55
4
1
S =
0 12
0 12
0
0 0 13
13
13
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0
Links = nonzero elements in matrix
NA – p.9
![Page 10: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/10.jpg)
Elements of Matrix S
Assume: every page i has li ≥ 1 outlinks
If page i has link to page j then sij = 1/lielse sij = 0
Probability that surfer moves from page i to page j
NA – p.10
![Page 11: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/11.jpg)
Properties of Matrix S
• Stochastic: 0 ≤ sij ≤ 1 S11 = 11• Dominant left eigenvector:
ωT S = ωT ω ≥ 0 ‖ω‖1 = 1
• ωi is probability that surfer visits page i
But: ω not uniqueif S has several eigenvalues equal to 1
Remedy: Make the matrix more convenient
NA – p.11
![Page 12: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/12.jpg)
Google Matrix
Convex combination
G = αS + (1 − α)11vT
︸ ︷︷ ︸
rank 1
• Stochastic matrix S
• Damping factor 0 ≤ α < 1e.g. α = .85
• Column vector of all ones 11• Personalization vector v ≥ 0 ‖v‖1 = 1
Models teleportation
NA – p.12
![Page 13: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/13.jpg)
Properties of Google Matrix G
G = αS + (1 − α)11vT
• Stochastic, reducible• Eigenvalues of G:
1 > αλ2(S) ≥ αλ3(S) ≥ . . .
• Unique dominant left eigenvector:
πT G = πT π ≥ 0 ‖π‖1 = 1
NA – p.13
![Page 14: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/14.jpg)
PageRank
Google Matrix
G = αS︸︷︷︸
Links
+ (1 − α)11vT
︸ ︷︷ ︸
Personalization
πT G = πT π ≥ 0 ‖π‖1 = 1
πi is PageRank of web page i
PageRank.= dominant left eigenvector of G
NA – p.14
![Page 15: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/15.jpg)
How Google Ranks Web Pages
• Model:Internet → web graph → stochastic matrix G
• Computation:PageRank π is eigenvector of G
πi is PageRank of page i
• Display:If πi > πk thenpage i may∗ be displayed before page k
∗ depending on hypertext analysis
NA – p.15
![Page 16: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/16.jpg)
History
• The anatomy of a large-scale hypertextual websearch engineBrin & Page 1998
• US patent for PageRank granted in 2001• Eigenstructure of the Google Matrix
Haveliwala & Kamvar 2003Eldén 2003Serra-Capizzano 2005
NA – p.16
![Page 17: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/17.jpg)
Statistics
• Google indexes 10s of billions of web pages• “3 times more than any competitor”• Google serves ≥ 200 million queries per day• Each query processed by ≥ 1000 machines• All search engines combined serve a total of
≥ 500 million queries per day
[Desikan, 26 October 2006]
NA – p.17
![Page 18: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/18.jpg)
Computation of PageRank
The world’s largest matrix computation[Moler 2002]
• Eigenvector• Matrix dimension is 10s of billions• The matrix changes often
250,000 new domain names every day• Fortunately: Matrix is sparse
NA – p.18
![Page 19: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/19.jpg)
Power Method
Want: π such that πT G = πT
Power method:
Pick an initial guess x(0)
Repeat[x(k+1)]T := [x(k)]T G
Each iteration is a matrix vector multiply
NA – p.19
![Page 20: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/20.jpg)
Matrix Vector Multiply
xT G = xT[αS + (1 − α)11vT
]
NA – p.20
![Page 21: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/21.jpg)
An Iteration is Cheap
Google matrix G = αS + (1 − α)11vT
Vector x ≥ 0 ‖x‖1 = 1
xT G = xT[αS + (1 − α)11vT
]
= α xT S + (1 − α) xT 11︸︷︷︸=1
vT
= α xT S + (1 − α)vT
Cost: # non-zero elements in SNA – p.21
![Page 22: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/22.jpg)
Error in Power Method
πT G = πT G = αS + (1 − α)11vT
[x(k+1) − π]T = [x(k)]T G − πT G
= α[x(k)]T S − απT S
= α[x(k) − π]T S
‖x(k+1) − π‖︸ ︷︷ ︸
iteration k+1
≤ α ‖x(k) − π‖︸ ︷︷ ︸
iteration k
Norms: 1, ∞NA – p.22
![Page 23: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/23.jpg)
Error in Power Method
πT G = πT G = αS + (1 − α)11vT
Error after k iterations:
‖x(k) − π‖ ≤ αk ‖x(0) − π‖︸ ︷︷ ︸
≤2
Norms: 1, ∞ [Bianchini, Gori & Scarselli 2003]
Error bound does not depend on matrix dimension
NA – p.23
![Page 24: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/24.jpg)
Iteration Counts for Different α
bound: k such that 2 αk ≤ 10−8
Termination based on residual norms vs bound
α n = 281903 n = 683446 bound
.85 69 65 119
.90 107 102 166
.95 219 220 415
.99 1114 1208 2075
Fewer iterations than predicted by bound
NA – p.24
![Page 25: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/25.jpg)
Advantages of Power Method
• Converges to unique vector• Convergence rate α
• Convergence independent of matrix dimension• Vectorizes• Storage for only a single vector• Sparse matrix operations• Accurate (no subtractions)• Simple (few decisions)
But: can be slow NA – p.25
![Page 26: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/26.jpg)
PageRank Computation
• Power methodPage, Brin, Motwani & Winograd 1999Bianchini, Gori & Scarselli 2003
• Acceleration of power methodKamvar, Haveliwala, Manning & Golub 2003Haveliwala, Kamvar, Klein, Manning & Golub 2003Brezinski & Redivo-Zaglia 2004, 2006Brezinski, Redivo-Zaglia & Serra-Capizzano 2005
• Aggregation/DisaggregationLangville & Meyer 2002, 2003, 2006Ipsen & Kirkland 2006
NA – p.26
![Page 27: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/27.jpg)
PageRank Computation
• Methods that adapt to web graphBroder, Lempel, Maghoul & Pedersen 2004 Kamvar,Haveliwala & Golub 2004Haveliwala, Kamvar, Manning & Golub 2003Lee, Golub & Zenios 2003Lu, Zhang, Xi, Chen, Liu, Lyu & Ma 2004Ipsen & Selee 2006
• Krylov methodsGolub & Greif 2004Del Corso, Gullí, Romani 2006
NA – p.27
![Page 28: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/28.jpg)
PageRank Computation
• Schwarz & asynchronous methodsBru, Pedroche & Szyld 2005Kollias, Gallopoulos & Szyld 2006
• Linear system solutionArasu, Novak, Tomkins & Tomlin 2002Arasu, Novak & Tomkins 2003Bianchini, Gori & Scarselli 2003Gleich, Zukov & Berkin 2004Del Corso, Gullí & Romani 2004Langville & Meyer 2006
NA – p.28
![Page 29: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/29.jpg)
PageRank Computation
• Surveys of numerical methods:Langville & Meyer 2004Berkhin 2005Langville & Meyer 2006 (book)
NA – p.29
![Page 30: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/30.jpg)
Sensitivity of PageRank
How sensitive is PageRank π to smallperturbations, e.g. rounding errors
• Changes in matrix S
• Changes in damping factor α
• Changes in personalization vector v
NA – p.30
![Page 31: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/31.jpg)
Perturbation Theory
For Markov chains
Schweizer 1968, Meyer 1980Haviv & van Heyden 1984Funderlic & Meyer 1986Golub & Meyer 1986Seneta 1988, 1991Ipsen & Meyer 1994Kirkland, Neumann & Shader 1998Cho & Meyer 2000, 2001Kirkland 2003, 2004
NA – p.31
![Page 32: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/32.jpg)
Perturbation Theory
For Google matrix
Chien, Dwork, Kumar & Sivakumar 2001Ng, Zheng & Jordan 2001Bianchini, Gori & Scarselli 2003Boldi, Santini & Vigna 2004, 2005Langville & Meyer 2004Golub & Greif 2004Kirkland 2005, 2006Chien, Dwork, Kumar, Simon & Sivakumar 2005Avrechenkov & Litvak 2006
NA – p.32
![Page 33: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/33.jpg)
Changes in the Matrix S
Exact:
πT G = πT G = αS + (1 − α)11vT
Perturbed:
π̃T G̃ = π̃T G̃ = α(S + E) + (1 − α)11vT
Error:
π̃T − πT = απ̃T E(I − αS)−1
‖π̃ − π‖1 ≤α
1 − α‖E‖∞
NA – p.33
![Page 34: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/34.jpg)
Changes in α and v
• Change in amplification factor:
G̃ = (α + µ)S + (1 − (α + µ)) 11vT
Error: ‖π̃ − π‖1 ≤ 21−α
|µ|
[Langville & Meyer 2004]
• Change in personalization vector:
G̃ = αS + (1 − α)11(v + f)T
Error: ‖π̃ − π‖1 ≤ ‖f‖1
NA – p.34
![Page 35: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/35.jpg)
Sensitivity of PageRank π
πT G = πT G = αS + (1 − α)11vT
Changes in• S: condition number α/(1 − α)
• α: condition number 2/(1 − α)
• v: condition number 1
α = .85: condition numbers ≤ 14α = .99: condition numbers ≤ 200
PageRank insensitive to rounding errorsNA – p.35
![Page 36: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/36.jpg)
Adding an In-Link
����
����
i-
π̃i > πi
Adding an in-link increases PageRank(monotonicity)
Removing an in-link decreases PageRank
[Chien, Dwork, Kumar & Sivakumar 2001]
[Chien, Dwork, Kumar, Simon & Sivakumar 2005]NA – p.36
![Page 37: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/37.jpg)
Adding an Out-Link
����
1
����
2
����
� 3
?�
��
����
��
�
π̃3 =1 + α + α2
3(1 + α + α2/2)< π3 =
1 + α + α2
3(1 + α)
Adding an out-link may decrease PageRank
NA – p.37
![Page 38: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/38.jpg)
Justification for TrustRank
Adjust personalization vector to combat web spam[Gyöngyi, Garcia-Molina, Pedersen 2004]
Increase v for page i: vi := vi + φDecrease v for page j: vj := vj − φ
PageRank of page i increases: π̃i > πi
PageRank of page j decreases: π̃j < πj
Total change in PageRank ‖π̃ − π‖1 ≤ 2φ
NA – p.38
![Page 39: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/39.jpg)
Web Pages that have no Outlinks
• Technical term: Dangling Nodes• Examples:
Image filesPDF and PS filesPages whose links have not yet been crawledProtected web pages
• 50%-80% of all web pages• Problem: zero rows in matrix• Popular fix: Insert artificial links
NA – p.39
![Page 40: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/40.jpg)
Dangling Node Fix
1 2
4 3
1 2
4 3
0 13
13
13
12
0 12
0
0 0 0 1
0 0 0 0
0 13
13
13
12
0 12
0
0 0 0 1
w1 w2 w3 w4
NA – p.40
![Page 41: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/41.jpg)
Inside the Stochastic Matrix S
Number pages so that dangling nodes are last
S =
(H
0
)
+
(0
11wT
)
︸ ︷︷ ︸
rank 1
Links from nondangling nodes: HDangling node vector w ≥ 0 ‖w‖1 = 1
Google matrix G = α
(H
11wT
)
+ (1 − α)11vT
NA – p.41
![Page 42: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/42.jpg)
Partitioning the Google Matrix
G =
(G11 G12
11uT1 11uT
2
)
G12
G11 u2
u1
(uT
1 uT2
)= αwT
︸ ︷︷ ︸
dangling nodes
+ (1 − α)vT
︸ ︷︷ ︸
personalization NA – p.42
![Page 43: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/43.jpg)
Lumping
Separate dangling and nondangling nodes“Lump” all dangling nodes into single node
• Stochastic matrices:Kemeny & Snell 1960Dayar & Stewart 1997Jernigan & Baran 2003Gurvits & Ledoux 2005
• Google matrix:Lee, Golub & Zenios 2003Ipsen & Selee 2006
NA – p.43
![Page 44: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/44.jpg)
Example
−→: real links
−→: artificial links NA – p.44
![Page 45: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/45.jpg)
Lumped Example
NA – p.45
![Page 46: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/46.jpg)
Google Lumping
1. “Lump” all dangling nodes into a single node
2. Compute dominant eigenvector ofsmaller, lumped matrix=⇒ PageRank of nondangling nodes
3. Determine PageRank of dangling nodeswith one matrix vector multiply
NA – p.46
![Page 47: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/47.jpg)
1. Lump Dangling Nodes
NA – p.47
![Page 48: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/48.jpg)
1. Lump Dangling Nodes
G =
(G11 G12
11uT1 11uT
2
)
Lump n − d dangling nodes into a single node
=⇒ Lumped matrix has dimension d + 1
L =
(G11 G1211uT
1 uT2 11
)
Stochastic, same nonzero eigenvalues as G
NA – p.48
![Page 49: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/49.jpg)
2. Eigenvector of Lumped Matrix
L =
(G11 G1211uT
1 uT2 11
)
Lumped matrix with d nondangling nodes
Compute eigenvector of lumped matrix
σT L = σT σ ≥ 0 ‖σ‖1 = 1
PageRank of nondangling nodes: σ1:d
NA – p.49
![Page 50: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/50.jpg)
3. Dangling Nodes
G =
(G11 G12
11uT1 11uT
2
)
L =
(G11 G1211uT
1 uT2 11
)
Eigenvector of lumped matrix: σT L = σT
PageRank of dangling nodes:
σT
(G12
uT2
)
One matrix vector multiply
NA – p.50
![Page 51: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/51.jpg)
Summary: Dangling Nodes
n web pages with n − d dangling nodes• PageRank σ1:d of d nondangling nodes:
from lumped matrix L of dimension d + 1
• PageRank of dangling nodes:one matrix vector multiply
• Total PageRank
πT =
σT1:d︸︷︷︸
nondangling
σT
(G12
uT2
)
︸ ︷︷ ︸
dangling
NA – p.51
![Page 52: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/52.jpg)
Summary: Dangling Nodes, ctd.
• PageRank of nondangling nodes isindependent of PageRank of dangling nodes
• PageRank of nondangling nodes can becomputed separately
• Power method on lumped matrix L:same convergence rate as for Gbut L much smaller than G
speed increases with # dangling nodes
NA – p.52
![Page 53: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/53.jpg)
Is the Ranking Correct?
πT =(.23 .24 .26 .27
)
• [x(k)]T =(.27 .26 .24 .23
)
‖x(k) − π‖∞ = .04
Small error, but incorrect ranking
• [x(k)]T =(0 .001 .002 .997
)
‖x(k) − π‖∞ = .727
Large error, but correct rankingNA – p.53
![Page 54: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/54.jpg)
Is the Ranking Correct?
After k iterations of power method:
Error: ‖x(k) − π‖ ≤ 2 αk
But: Do the components of x(k) have the sameranking as those of π?
Rank-stability, rank-similarity: [Lempel & Moran, 2005]
[Borodin, Roberts, Rosenthal & Tsaparas 2005]
NA – p.54
![Page 55: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/55.jpg)
Web Graph is a Ring
S =
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0
[Ipsen & Wills]
NA – p.55
![Page 56: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/56.jpg)
All Pages are Trusted
S is circulant of order n, v = 1n
11
• PageRank: π = 1n
11All pages have same PageRank
• Power methodx(0) = v: x(0) = π correct ranking
x(0) 6= v: [x(k)]T ∼ 1n
11T
+ αk([x(0)]TSk − 1
n11
)
Ranking does not converge (in exact arithmetic)
NA – p.56
![Page 57: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/57.jpg)
Only One Page is Trusted
vT =(1 0 0 0 0
)
NA – p.57
![Page 58: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/58.jpg)
Only One Page is Trusted
πT ∼(1 α α2 α3 α4
)
PageRank decreases with distance from page 1NA – p.58
![Page 59: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/59.jpg)
Only One Page is Trusted
S is circulant of order n, v = e1
• PageRank: πT ∼(1 α . . . αn−1
)
• Power method with x(0) = v:
[x(k)]T ∼(
1 α . . . αk−1 αk
1−α0 . . . 0
)
[x(n)]T ∼(
1 + αn
1−αα α2 . . . αn−1
)
Rank convergence in n iterations
NA – p.59
![Page 60: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/60.jpg)
Too Many Iterations
Power method with x(0) = v = e1:
• After n iterations:[x(n)]T ∼
(
1 + αn
1−αα α2 . . . αn−1
)
• After n + 1 iterations:[x(n+1)]T ∼
(
1 + αn α + αn+1
1−αα2 . . . αn−1
)
If α = .85, n = 10: α + αn+1
1−α> 1 + αn
Additional iterations can destroya converged ranking
NA – p.60
![Page 61: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/61.jpg)
Recovery of Ranking
S is circulant of order n
• After k iterations:
[x(k)]T = αk [x(0)]T Sk + (1 − α)vTk−1∑
j=0
αjSj
• After k + n iterations:
[x(k+n)]T = αn [x(k)]T + (1 − αn)πT
If x(k) has correct ranking, so does x(k+n)
NA – p.61
![Page 62: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/62.jpg)
Any Personalization Vector
S is circulant of order n
• PageRank: πT ∼ vT∑n−1
j=0 αjSj
• Power method with x(0) = 1n
11
[x(n)]T = (1 − αn)︸ ︷︷ ︸
scalar
πT +αn
n11T
︸ ︷︷ ︸constant vector
For any v: rank convergence after n iterations
NA – p.62
![Page 63: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/63.jpg)
Problems with Ranking
• Ranking may never converge• Additional iterations can destroy ranking• Small error does not imply correct ranking• Rank convergence depends on:
α, v, initial guess, matrix dimension,structure of web graph
• How do we know when the ranking is correct?• Even if successive iterates have the same
ranking, their ranking may not be correct
NA – p.63
![Page 64: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/64.jpg)
Summary
• Google orders web pages according to:PageRank and hypertext analysis
• PageRank = left eigenvector of G
G = αS + (1 − α)11vT
• Power method: simple and robust
• Error in iteration k bounded by αk
• Convergence rate largely independent ofdimension and eigenvalues of G
NA – p.64
![Page 65: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/65.jpg)
Summary, ctd
• PageRank insensitive to rounding errors• Adding in-links increases PageRank• Adding out-links may decrease PageRank• Dangling nodes = pages w/o outlinks
Rank one change to hyperlink matrix• Lumping:
PageRank of nondangling nodes computedseparately from PageRank of dangling nodes
• Ranking problem: DIFFICULT
NA – p.65
![Page 66: The Linear Algebra Aspects of PageRank](https://reader034.vdocuments.us/reader034/viewer/2022051522/588c70c01a28ab4e218b88f2/html5/thumbnails/66.jpg)
User-Friendly Resources
• Rebecca Wills:Google’s PageRank: The Math Behind theSearch EngineMathematical Intelligencer, 2006
• Amy Langville & Carl Meyer:Google’s PageRank and BeyondThe Science of Search Engine RankingsPrinceton University Press, 2006
• Amy Langville & Carl Meyer:Broadcast of On-Air Interview, November 2006Carl Meyer’s web page
NA – p.66