page-rank algorithm final

Page-Rank Algorithm Brandon B, Abbie H, Billy K, Hannah S

Upload: william-keene

Post on 17-Jan-2017


Page 1: Page-Rank Algorithm Final

Page-Rank Algorithm
Brandon B, Abbie H, Billy K, Hannah S

Page 2: Page-Rank Algorithm Final

Overview
• History and Background
• Importance and Problems
• The Algorithm and Its Variations
• Our Application of PageRank

Page 3: Page-Rank Algorithm Final

Background
• Mathematical algorithm designed to measure the importance of web pages
• Developed in 1996 at Stanford University by Larry Page and Sergey Brin, the founders of Google
• Started as a research project

Page 4: Page-Rank Algorithm Final

Background
• PageRank is a trademark of Google
• The PageRank algorithm has been patented
• The exact algorithm used today is unknown
• The algorithm is still tweaked each day to improve search results

Page 5: Page-Rank Algorithm Final

Background
• The algorithm is one of ~200 factors that determine the order in which web pages are reported to users
• Results are reported based on relevance to a search and overall importance
• Previously, internet search engines ranked pages by keyword density, listing the pages with the highest density first

Page 6: Page-Rank Algorithm Final

Problem with Previously Used Method
• Websites could easily increase their rank in search results
• It did not take into account the relevancy of the results to the search, so it was not very useful

Page 7: Page-Rank Algorithm Final

The Algorithm
• Measures a web page’s overall importance
• Importance of a page is based on the number of links into it from other web pages
• Links can be viewed as votes
• The quality of the web pages that link into the page is taken into consideration too

Page 8: Page-Rank Algorithm Final

Web Pages as a Digraph
• Web pages and links can be viewed as a digraph
• Web pages are represented by vertices
• Links into a page are represented by arcs directed into its vertex

[Figure: example digraph with vertices Videos, Books, Home, Football]

Page 9: Page-Rank Algorithm Final

Adjacency Matrix of Digraph
• After creating a digraph with the web pages and links, an adjacency matrix is constructed
• The algorithm is then applied using this matrix and PageRank values are determined
• Google's publicly displayed (toolbar) PageRank scored pages from 0 to 10
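The construction described above can be sketched in a few lines of Python. This is only an illustration: the page names are taken from the digraph figure, but the specific links and the helper name `adjacency_matrix` are made up for the example.

```python
# Hypothetical link structure among four pages; the page names come from the
# slide's figure, but these particular links are invented for illustration.
pages = ["Home", "Videos", "Books", "Football"]
links = [
    ("Home", "Videos"),
    ("Home", "Books"),
    ("Home", "Football"),
    ("Videos", "Home"),
    ("Books", "Home"),
    ("Football", "Videos"),
]

def adjacency_matrix(pages, links):
    """Return W where W[i][j] = 1 if page i links to page j, else 0."""
    index = {name: k for k, name in enumerate(pages)}
    n = len(pages)
    W = [[0] * n for _ in range(n)]
    for source, target in links:
        W[index[source]][index[target]] = 1
    return W

W = adjacency_matrix(pages, links)
out_degree = [sum(row) for row in W]         # links leaving each page
in_degree = [sum(col) for col in zip(*W)]    # links ("votes") into each page
```

Row sums give the number of links leaving a page, and column sums give the number of votes it receives.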

Page 10: Page-Rank Algorithm Final

Importance of PageRank
• PageRank is a major factor used daily by Google to deliver the best results for a Google search
• Can also be applied to other sets of data
• Many real-life applications of the algorithm are possible (e.g. ranking NFL teams based on wins and losses)

Page 11: Page-Rank Algorithm Final

Problems with the Algorithm
• Some websites look for a way to increase their own PageRank
• Called search engine optimization (SEO)
• Two specific examples of cheating:
  • Google Bomb
  • Link Farming

Page 12: Page-Rank Algorithm Final

Google Bomb
• Occurs when a group of people conspire to increase a website's PageRank artificially by linking a particular word or phrase to the website
• Prevention: alter the algorithm to rank pages by relevancy

Page 13: Page-Rank Algorithm Final

Link Farming
• Linking without regard to the relevance of the pages being linked
• e.g. a website with a collection of random links to other websites
• Prevention: alter calculations to filter out possible link farms

Page 14: Page-Rank Algorithm Final

Real Life Examples of Cheating
• JCPenney (furniture)
• BMW German car sales website
• Bing uses Google’s search engine ranking system to improve its own

Page 15: Page-Rank Algorithm Final

The Algorithm
• Construct a digraph with nodes representing pages
• Number of nodes = $N$
• $W$ = $N \times N$ adjacency matrix, where $w_{ij} = 1$ if there is a link from page $i$ to page $j$
• $w_{ij} = 0$ if there is no link from $i$ to $j$
• $\deg_i$ is the out-degree of node $i$, and $D$ is the $N \times N$ diagonal matrix of the $\deg_i$
• $s_0 = \frac{1}{N}\mathbf{1} = (1/N, \dots, 1/N)^T$ is the starting vector, with equal probability for each vertex
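Read together with the worked example on the following slides (transpose the adjacency matrix, then divide by the out-degrees), these definitions suggest the basic update below. It is a sketch of the undamped iteration only. The symbol $M$ is introduced here for convenience and does not appear on the slides, and the formula assumes every $\deg_i > 0$.

```latex
M = W^{T} D^{-1}, \qquad
M_{ji} = \frac{w_{ij}}{\deg_i}, \qquad
s_{k+1} = M\, s_k, \qquad
s_0 = \frac{1}{N}\,(1, \dots, 1)^{T}
```

Each column of $M$ sums to 1, so the entries of $s_k$ keep the same total from one iteration to the next.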

Page 16: Page-Rank Algorithm Final

Example

[Figure: digraph on vertices v1, v2, v3, v4]

Page 17: Page-Rank Algorithm Final

Example: Adjacency Matrix

      V1    V2    V3    V4
V1    0     1     1     1
V2    0     0     1     0
V3    1     1     0     0
V4    0     1     0     0

Page 18: Page-Rank Algorithm Final

Example: Adjacency Matrix Transpose

      V1    V2    V3    V4
V1    0     0     1     0
V2    1     0     1     1
V3    1     1     0     0
V4    1     0     0     0

Page 19: Page-Rank Algorithm Final

Example: Adjacency Matrix Transpose / Out-Degrees

      V1    V2    V3    V4
V1    0     0     1/2   0
V2    1/3   0     1/2   1
V3    1/3   1     0     0
V4    1/3   0     0     0
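The step from the adjacency matrix to this matrix can be reproduced with a short Python sketch. The function name `transition_matrix` is chosen here for illustration. Exact fractions are used so the output can be compared entry by entry with the table.

```python
from fractions import Fraction

# Adjacency matrix from the example: W[i][j] = 1 if v(i+1) links to v(j+1).
W = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

def transition_matrix(W):
    """Transpose W and divide each column by the out-degree of its source vertex."""
    n = len(W)
    out_degree = [sum(row) for row in W]
    if 0 in out_degree:
        raise ValueError("a vertex with no outgoing links needs separate handling")
    # Entry [j][i] is the chance of moving from vertex i to vertex j.
    return [[Fraction(W[i][j], out_degree[i]) for i in range(n)] for j in range(n)]

M = transition_matrix(W)
for row in M:
    print("  ".join(str(x) for x in row))
```

Every column of the result sums to 1, because each column spreads one vertex's vote evenly over the pages it links to.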

Page 20: Page-Rank Algorithm Final

Example: Start Vector

      Start   5 iterations   iterations → ∞
v1    1       0.86           0.36
v2    0       1.43           0.59
v3    0       1.48           0.71
v4    0       0.23           0.12
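The iteration itself is repeated matrix-vector multiplication. The sketch below is a minimal version with two assumptions that are not stated on the slide. First, the five-iteration values listed are reproduced by starting from the all-ones vector, so that start is used here. Second, the limiting values match the steady-state vector scaled to unit Euclidean length, so the result is normalized that way. The helper names `step` and `iterate` are illustrative.

```python
import math

# Column-stochastic matrix from the example (transposed adjacency matrix
# divided by out-degrees); column i holds the probabilities of leaving v(i+1).
M = [
    [0,   0, 0.5, 0],
    [1/3, 0, 0.5, 1],
    [1/3, 1, 0,   0],
    [1/3, 0, 0,   0],
]

def step(M, s):
    """One iteration: return the matrix-vector product M * s."""
    return [sum(m * x for m, x in zip(row, s)) for row in M]

def iterate(M, s, k):
    """Apply `step` k times, starting from s."""
    for _ in range(k):
        s = step(M, s)
    return s

# Assumption: all-ones start vector (see the note above).
after_5 = iterate(M, [1.0, 1.0, 1.0, 1.0], 5)
print([round(x, 2) for x in after_5])        # [0.86, 1.43, 1.48, 0.23]

# Many iterations approximate the limit; scale it to unit Euclidean length.
limit = iterate(M, [1.0, 1.0, 1.0, 1.0], 200)
norm = math.sqrt(sum(x * x for x in limit))
print([round(x / norm, 2) for x in limit])   # [0.36, 0.59, 0.71, 0.12]
```

Either scaling gives the same ranking (v3, v2, v1, v4), since only the relative sizes of the entries matter.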

Page 21: Page-Rank Algorithm Final

PageRank as a Stochastic Process
• Markov Chain
• Discrete-time stochastic process
• Consisting of $N$ states and a transition probability matrix $P \in \mathbb{R}^{N \times N}$
• At each step, we are in exactly one of the states

Page 22: Page-Rank Algorithm Final

PageRank as a Stochastic Process
• Markov Chain - Transition Probability Matrix
• Each entry is in the interval [0, 1]
• $P_{ij}$ = probability of $j$ being the next state, given we are currently in state $i$, for $1 \le i, j \le N$
• A stochastic matrix has non-negative entries and satisfies $\sum_{j=1}^{N} P_{ij} = 1$ for every $i$
• Each entry is known as a transition probability and depends only on the current state $i$

Page 23: Page-Rank Algorithm Final

PageRank as a Stochastic Process
• Markov Chain – Example

Page 24: Page-Rank Algorithm Final

PageRank as a Stochastic Process
• Markov Chain – Example

Transition Probability Matrix $P \in \mathbb{R}^{4 \times 4}$

      v1    v2    v3    v4
v1    0     0     1/2   0
v2    1/3   0     1/2   1
v3    1/3   1     0     0
v4    1/3   0     0     0
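A quick check of which way this example matrix is normalized may help when comparing it with the definition of $P_{ij}$. The sketch below enters the matrix exactly as listed and tests whether its rows or its columns form probability vectors. The helper name `sums_to_one` is illustrative.

```python
from fractions import Fraction as F

# The example matrix, entered exactly as listed.
P = [
    [0,       0, F(1, 2), 0],
    [F(1, 3), 0, F(1, 2), 1],
    [F(1, 3), 1, 0,       0],
    [F(1, 3), 0, 0,       0],
]

def sums_to_one(vectors):
    """True if every vector has non-negative entries that sum to exactly 1."""
    return all(min(v) >= 0 and sum(v) == 1 for v in vectors)

rows_ok = sums_to_one(P)                  # rows as probability vectors
columns_ok = sums_to_one(list(zip(*P)))   # columns as probability vectors
print(rows_ok, columns_ok)                # False True
```

As listed, it is the columns that sum to 1, so column $i$ holds the probabilities of leaving $v_i$. Under the convention that $P_{ij}$ is the probability of moving from $i$ to $j$, the table would therefore be the transpose of $P$.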

Page 25: Page-Rank Algorithm Final

PageRank as a Stochastic Process
• Markov Chain – Example

Page 26: Page-Rank Algorithm Final

PageRank as a Stochastic Process
• Markov Chain
• $s_k = P^k \cdot s_0$, where $s$ is the state vector
• Does the process “settle down” and converge to a certain vector?

Page 27: Page-Rank Algorithm Final

Linear Algebra
• What state vector should it converge to?
• Want a state vector, $\pi$, that satisfies $P^T \pi = \pi$ (i.e. $P^T \pi = 1 \cdot \pi$)
• Recall the definition: $\pi$ is an eigenvector for eigenvalue $\lambda = 1$
• A stochastic matrix has 1 as its maximum eigenvalue
• This is the “long term” or “steady state” vector
• This vector exists if the stochastic matrix is regular
• Regular: some power of $P$ has all non-zero entries
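Regularity can be tested directly by raising the matrix to successive powers until every entry is positive. The sketch below does this for the four-vertex example matrix. The function names are illustrative. The search is capped at $(n-1)^2 + 1$ powers, a known bound (Wielandt's) beyond which no new all-positive power can first appear for an $n \times n$ non-negative matrix.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(P, max_power):
    """Return the smallest k <= max_power with every entry of P^k > 0, else None."""
    Q = P
    for k in range(1, max_power + 1):
        if all(x > 0 for row in Q for x in row):
            return k
        Q = matmul(Q, P)
    return None

# Example matrix from the earlier slides.
P = [
    [0,   0, 0.5, 0],
    [1/3, 0, 0.5, 1],
    [1/3, 1, 0,   0],
    [1/3, 0, 0,   0],
]
n = len(P)
# None would mean no power up to the bound is all-positive, i.e. not regular.
k = first_positive_power(P, (n - 1) ** 2 + 1)
print(k)
```

A return value of `None` signals a non-regular matrix, which is the case the teleporting variant later in the deck is meant to avoid.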

Page 28: Page-Rank Algorithm Final

Random Walk
• Suppose you are at vertex (page) $v_i$
• Randomly choose a vertex $v_j$ that $v_i$ is directed out to
• Transition to that vertex
• Probability of being at $v_j$ given at $v_i$:
  • 0 if $w_{ij} = 0$ (no link from $i$ to $j$)
  • $w_{ij}/\deg_i$ if $w_{ij} = 1$ (link exists from $i$ to $j$)
• Problems?
  • Getting stuck at a vertex with out-degree 0
  • More generally: getting caught in an isolated cycle
  • Non-regular matrix
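The walk can also be simulated directly. The sketch below uses the four-vertex example graph and counts how often each vertex is visited. The function name, step count, and seed are arbitrary choices for illustration. For this graph the visit frequencies should come out close to the steady-state probabilities (about 0.20, 0.33, 0.40, 0.07 when scaled to sum to 1).

```python
import random

# Out-links of the four-vertex example: vertex -> vertices it links to.
out_links = {1: [2, 3, 4], 2: [3], 3: [1, 2], 4: [2]}

def random_walk(out_links, start, steps, seed=0):
    """Follow uniformly random out-links and return visit frequencies per vertex."""
    rng = random.Random(seed)
    visits = {v: 0 for v in out_links}
    current = start
    for _ in range(steps):
        choices = out_links[current]
        if not choices:
            # Stuck: a vertex with out-degree 0 has nowhere to go.
            break
        current = rng.choice(choices)
        visits[current] += 1
    total = sum(visits.values()) or 1   # avoid dividing by zero if no move was made
    return {v: count / total for v, count in visits.items()}

freq = random_walk(out_links, start=1, steps=100_000)
print({v: round(f, 3) for v, f in freq.items()})
```

The early `break` shows the first problem on the slide: once the walk reaches a vertex with no outgoing links it cannot continue.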

Page 29: Page-Rank Algorithm Final

Teleporting Random Walk
• Teleporting operation:
• The surfer jumps from a node to any other node in the Web graph, e.g. by typing an address into the URL bar
• The destination of a teleport operation is chosen uniformly at random from all Web pages, each with probability $1/N$
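One common way to combine link-following with teleporting is sketched below. It rests on assumptions the slides do not spell out: `d` is taken to be the probability of following a link, so the surfer teleports with probability `1 - d`; a surfer at a page with no out-links is assumed to teleport always; and the default `d=0.85` is just a commonly quoted value. The function name `pagerank` is illustrative.

```python
def pagerank(out_links, d=0.85, iterations=100):
    """Power iteration for a walk that follows a link with probability d
    and teleports to a uniformly random page with probability 1 - d."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every page receives an equal share of the teleport probability.
        new_rank = {v: (1.0 - d) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes   # assumption: dead ends always teleport
            share = d * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Four-vertex example graph from the earlier slides.
out_links = {1: [2, 3, 4], 2: [3], 3: [1, 2], 4: [2]}
for d in (1.0, 0.8, 0.5):
    ranks = pagerank(out_links, d=d)
    print(d, {v: round(r, 3) for v, r in ranks.items()})
```

With `d = 1` this reduces to the plain random walk, and with `d = 0` every page gets the same score, so `d` controls how much the link structure influences the ranking.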

Page 30: Page-Rank Algorithm Final

Teleporting Random Walk

Page 31: Page-Rank Algorithm Final

Application of PageRank to the NFL
• Apply the algorithm to last year’s NFL regular season to produce a ranking of the NFL teams based on importance
• Vary the algorithm to see how the rankings would change
• Compare our results to the actual results of the season

Page 32: Page-Rank Algorithm Final

Compiling the Data
• View each matchup week by week and record the outcome of each game
• Teams = vertices, games played = directed arcs
• For teams A and B, if team A lost to team B, an arc is directed from A to B in the digraph
• The in-degree of each vertex is the number of games that team won, and the out-degree is the number of games it lost
• Example of a subgraph of the digraph
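The bookkeeping described above can be sketched as follows. The team names and results are placeholders rather than real game data, the helper name `build_arcs` is made up for the example, and ties are left out for simplicity.

```python
from collections import Counter

# Hypothetical results as (winner, loser) pairs; not real game data.
games = [
    ("Team A", "Team B"),
    ("Team B", "Team C"),
    ("Team A", "Team C"),
    ("Team C", "Team D"),
]

def build_arcs(games):
    """Return one arc (loser, winner) per game, directed toward the winner."""
    return [(loser, winner) for winner, loser in games]

arcs = build_arcs(games)
wins = Counter(winner for _, winner in arcs)    # in-degree of each vertex
losses = Counter(loser for loser, _ in arcs)    # out-degree of each vertex
print(wins["Team A"], losses["Team C"])         # 2 2
```

Directing each arc toward the winner means a loss acts as a vote for the team that won, which matches the view of links as votes.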

Page 33: Page-Rank Algorithm Final

Compiling the Data
• Adjacency matrix constructed from the results of each game
• Record a 1 in a cell if the team in that row beat the team in that column, or ½ if a tie; otherwise a 0 is recorded

Columns 1 to 8: 1. Arizona Cardinals, 2. Atlanta Falcons, 3. Baltimore Ravens, 4. Buffalo Bills, 5. Carolina Panthers, 6. Chicago Bears, 7. Cincinnati Bengals, 8. Cleveland Browns

                         1   2   3   4   5   6   7   8
1. Arizona Cardinals     0   1   0   0   1   0   0   0
2. Atlanta Falcons       0   0   0   1   0   0   0   0
3. Baltimore Ravens      0   0   0   0   0   0   1   1
4. Buffalo Bills         0   0   1   0   1   0   0   0
5. Carolina Panthers     0   2   0   0   0   0   0   0

Page 34: Page-Rank Algorithm Final

Actual NFL Results
1. Denver Broncos
2. Seattle Seahawks
3. Carolina Panthers
4. New England Patriots
5. San Francisco 49ers
6. Cincinnati Bengals
7. Indianapolis Colts
8. Kansas City Chiefs
9. New Orleans Saints
10. Arizona Cardinals

Page 35: Page-Rank Algorithm Final

PageRank with d = 1
1. Seattle Seahawks
2. San Francisco 49ers
3. Arizona Cardinals
4. New Orleans Saints
5. Carolina Panthers
6. Denver Broncos
7. New England Patriots
8. Saint Louis Rams
9. Kansas City Chiefs
10. Indianapolis Colts

Page 36: Page-Rank Algorithm Final

PageRank with d = 0.8
1. Seattle Seahawks
2. San Francisco 49ers
3. Arizona Cardinals
4. Denver Broncos
5. Carolina Panthers
6. New Orleans Saints
7. New England Patriots
8. Kansas City Chiefs
9. Indianapolis Colts
10. Saint Louis Rams

Page 37: Page-Rank Algorithm Final

PageRank with d = 0.5
1. Seattle Seahawks
2. San Francisco 49ers
3. Denver Broncos
4. Carolina Panthers
5. New England Patriots
6. New Orleans Saints
7. Arizona Cardinals
8. Kansas City Chiefs
9. Indianapolis Colts
10. Philadelphia Eagles

Page 38: Page-Rank Algorithm Final

PageRank with d = 0.2
1. Seattle Seahawks
2. San Francisco 49ers
3. Denver Broncos
4. Carolina Panthers
5. New England Patriots
6. New Orleans Saints
7. Arizona Cardinals
8. Kansas City Chiefs
9. Indianapolis Colts
10. Philadelphia Eagles