9 Algorithms: PageRank

Post on 24-Feb-2016


Ranking

• After matching, have to rank:

Index Based Ranking

• Strategies we could (do) use:
  – Frequency
  – Position
  – Metadata

Missing Ingredient

• Index lacks inter-page information (links between pages)

Link Quality

• Not all links are equal
• Who do you trust?
  – CS Prof
  – World Famous Chef

Identifying Authority

• Links into a page give it authority
• Page value = sum of authorities of pages linking to it
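
A minimal sketch of the "sum of authorities" rule, assuming a tiny hypothetical link graph and made-up authority scores for the two linking pages (the names `links`, `authority`, and `page_value` are illustrative, not from the slides):

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "prof": ["recipe_a"],
    "chef": ["recipe_a", "recipe_b"],
    "recipe_a": [],
    "recipe_b": [],
}

# Assumed authority scores for the linking pages (made up for illustration).
authority = {"prof": 1.0, "chef": 5.0}


def page_value(page, links, authority):
    """Sum the authority of every page that links to `page`."""
    return sum(
        authority.get(source, 0.0)
        for source, targets in links.items()
        if page in targets
    )


value_a = page_value("recipe_a", links, authority)  # linked from prof and chef
value_b = page_value("recipe_b", links, authority)  # linked from chef only
print(value_a, value_b)
```

Here `recipe_a` outranks `recipe_b` because it collects authority from both linking pages. The sketch takes the linking pages' authority as given; where that authority comes from is left open.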

Link Quality

• Rewarding more links is easy to abuse: spam link pages

Issues

• Spam Links
  – Discourage with negative weight

Spam Link Pages

[Figure: links from spam link pages, each weighted -1]

Issues

• Cycles:

Random Surfer

• Simulating a web surfing session
  – Start at random page
  – At each page have a chance to
    • Pick a random link to go to
    • Jump to a completely random page
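
The surfing session described above could be sketched roughly as follows. The four-page graph, the function name `random_surfer`, and the 15% jump chance are assumptions for illustration; the slides do not give a specific value.

```python
import random


def random_surfer(links, steps, jump_chance=0.15, seed=0):
    """Count page visits during one simulated surfing session.

    `jump_chance` is an assumed probability of jumping to a completely
    random page instead of following a link.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)  # start at a random page
    for _ in range(steps):
        visits[page] += 1
        outgoing = links[page]
        if not outgoing or rng.random() < jump_chance:
            page = rng.choice(pages)     # jump to a completely random page
        else:
            page = rng.choice(outgoing)  # pick a random link to go to
    return visits


# Hypothetical graph containing a cycle A -> B -> C -> A; D only links in.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
visits = random_surfer(links, steps=10_000)
print(visits)
```

Pages inside the cycle collect most of the visits, while `D`, which no page links to, is reached only through random jumps.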

Results

• Results of many random sessions:

Results

• Expressed as percentages, results stabilize
  – Law of large numbers
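
One way to see the percentages settle is to record each page's share of visits at a few checkpoints during a long session. This is only a sketch: the three-page graph, the 15% jump chance, and the checkpoint values are assumptions.

```python
import random

# Hypothetical three-page graph; every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pages = sorted(links)

rng = random.Random(1)
visits = dict.fromkeys(pages, 0)
page = rng.choice(pages)
shares = {}  # checkpoint step -> visit share of each page, in percent

for step in range(1, 100_001):
    visits[page] += 1
    if rng.random() < 0.15:              # assumed jump chance
        page = rng.choice(pages)
    else:
        page = rng.choice(links[page])
    if step in (100, 10_000, 100_000):
        shares[step] = {p: round(100 * visits[p] / step, 1) for p in pages}

for step, share in shares.items():
    print(step, share)
```

The shares after 100 steps can still swing noticeably, while the later checkpoints should agree closely with each other.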

Cycle Buster

• Random surfer not fazed by cycles:

Random Surfer In Use

• The recipe pages visited by random surfers:

Simulator

• PageRank Simulator: http://caccio.blogdns.net/software/pagerank-simulator

The Real Math

• Markov Chains
  – Set of states
  – Each state has probability of leading to other states
  – Represent as matrix
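
As a rough illustration of the matrix view, the sketch below encodes a hypothetical three-page web as a transition matrix and repeatedly applies it to a starting distribution. The link structure is assumed, and the random-jump option is left out to keep the matrix small.

```python
# Transition matrix for a hypothetical three-page web (pages A, B, C).
# Row i holds the probabilities of moving from page i to each page.
# Assumed links: A -> B, A -> C, B -> C, C -> A.
P = [
    [0.0, 0.5, 0.5],  # from A: half the time to B, half to C
    [0.0, 0.0, 1.0],  # from B: always to C
    [1.0, 0.0, 0.0],  # from C: always to A
]


def step(dist, P):
    """Advance the probability distribution by one click."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


dist = [1 / 3, 1 / 3, 1 / 3]  # surfer equally likely to start anywhere
for _ in range(100):
    dist = step(dist, P)

print([round(x, 3) for x in dist])  # approximately [0.4, 0.2, 0.4]
```

The distribution stops changing after enough steps, which is the matrix counterpart of the random-surfer percentages stabilizing.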

Excel Simulation

• Three pages:

Limitations

• Still have issues/room for growth
  – Link Spam
  – Context of link
    • Where link is on page
    • "Bob's recipe is terrible" vs "Bob's recipe is great"
  – Lack of semantic knowledge
    • Page's Authority should not be the same for all domains

Power

• Controlling search is power: http://www.bitsbook.com/

"If you're not paying for the product, you are the product."
