Search Engine Page Rank Demystification

Post on 24-Apr-2015


Category:

Technology


DESCRIPTION

Hi all, this presentation explains how a search engine works and what happens inside it. The second half explains Page Rank in depth: how it is calculated, how the actual PR differs from the PR that Google displays, how frequently Google updates the PR value, and more, with calculations and a few examples.

TRANSCRIPT

By Rajanagan R, Web Analyst

Search Engines

What Is a Search Engine?

A search engine is an information retrieval system designed to help find information stored on a computer system, such as on the World Wide Web.

It is a web search tool that automatically visits websites (using crawlers), records and indexes them in its database, and generates results based on a user's search criteria.

Unlike web directories, which are maintained by human editors, search engines operate algorithmically or use a mixture of algorithmic and human input.

History of Search Engines

1990: First tool for searching the Internet, Archie (Alan Emtage, student at McGill University in Montreal). Objective: a tool for indexing FTP archives, allowing people to find specific files.

1993: First web robot, the World Wide Web Wanderer (Matthew Gray, physics student at MIT). Objective: track all pages on the web to monitor the growth of the web.

1994: First search engine, WebCrawler (Brian Pinkerton, CS student at the University of Washington). Objective: download web pages and store links to them in a keyword-searchable database.

1994: Jerry and David's Guide to the World Wide Web (Jerry Yang and David Filo, Stanford University). Objective: collect web pages and organize them by content into hierarchies. Later renamed Yahoo (Yet Another Hierarchical Officious Oracle).

1994-97: Infoseek, AltaVista, Excite, Lycos, LookSmart (meta engine). Ranking based on content and structure.

1998: Google (Sergey Brin and Larry Page, CS students at Stanford University). Ranking based on content, structure and value.

How Does a Search Engine Work?

Step 1: Crawling

Example slide: what a crawler looks at when it visits a website ("This is what I look at in a website!").
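As a rough illustration, crawling can be sketched as a breadth-first walk over links. The in-memory `WEB` dictionary below is a hypothetical stand-in for fetching pages over HTTP and extracting their links, and `URL1` to `URL4` are placeholder names. A real crawler would also obey robots.txt, normalize URLs and schedule revisits, none of which is modelled here.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
WEB = {
    "URL1": ["URL2", "URL3"],
    "URL2": ["URL4"],
    "URL3": ["URL1", "URL4"],
    "URL4": [],
}

def crawl(seed, web):
    """Breadth-first crawl: visit every reachable URL once, in discovery order."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)              # "fetch" the page
        for link in web.get(url, []):    # follow its outgoing links
            if link not in seen:         # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("URL1", WEB))  # ['URL1', 'URL2', 'URL3', 'URL4']
```

The `seen` set is what stops the crawler from looping forever on cycles such as URL1 → URL3 → URL1.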

Step 2: Indexing

Example slide: the indexed database.

Step 3: Query Processing

Step 4: Ranking
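Steps 2 to 4 can be sketched together in a few lines, assuming a toy setup: the hypothetical `PAGES` dictionary plays the role of the crawled pages, an inverted index maps each word to the pages that contain it, and pages are ranked simply by how often they contain the query words. Ranking by term frequency is an assumption made for illustration only; real search engines combine many more signals, including link-based scores such as Page Rank.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages (URL -> page text), standing in for Step 1's output.
PAGES = {
    "URL1": "All about eggs: boiled eggs, fried eggs",
    "URL2": "Eggo waffles for breakfast",
    "URL3": "Ego and the self",
    "URL4": "Eggs for breakfast",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Step 2: inverted index mapping word -> {url: occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(query, index):
    """Steps 3 and 4: tokenize the query, then rank pages by term frequency."""
    scores = defaultdict(int)
    for term in tokenize(query):                    # Step 3: process the query
        for url, count in index.get(term, {}).items():
            scores[url] += count                    # Step 4: score each match
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_index(PAGES)
print(search("Eggs?", index))           # [('URL1', 3), ('URL4', 1)]
print(search("eggs breakfast", index))  # [('URL1', 3), ('URL4', 2), ('URL2', 1)]
```

This sketch only matches exact words, so "Eggo" and "Ego" score nothing for the query "Eggs?"; the partial-match percentages shown in the search-engine diagram imply fuzzier matching than this.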

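Overall Functioning of Search Engines

Diagram: the crawler visits pages on the web (URL1 to URL4) and the indexer stores them in the search engine's database. When your browser sends the query "Eggs?", the search engine matches it against the database and, in a fraction of a second, returns a SERP of results ranked by match score (Eggs 90%, Eggo 81%, Ego 40%, Huh? 10%), such as a page titled "All About Eggs".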

What Is Page Rank?

Google Page Rank Algorithm

The backbone of Google's technology, developed by Larry Page and Sergey Brin in 1998.

Ranks pages based on the number of other pages that link to them.

Calculated from the number and the quality of backlinks, and used in ordering the SERP listing.

The Google Toolbar shows Page Rank as a value on a scale from 0 to 10 (see www.toolbar.google.com). However, this is only a rough guide, not the actual PR value. Nevertheless, it can be a good indication for SEO practitioners of whether a website is moving in the right (or wrong) direction.

Definition of Page Rank

Page Rank was proposed to measure the relative importance of web pages. It is a method for computing a ranking for every web page based on the graph (links) of the web.

We assume that page A has pages T1 ... Tn which point to it (i.e., are citations). d is a damping factor which can be set between 0 and 1; it is usually set to 0.85. C(A) is the number of links going out of page A (its outgoing links); C(T1) ... C(Tn) are defined the same way for each citing page.

The Page Rank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
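To make the formula concrete, here is a direct Python transcription of its right-hand side, evaluated once for a single page. The three-page site in `LINKS` and the helper name `pr_of` are hypothetical; links are stored as a dictionary from each page to the pages it links to, so C(T) is simply the length of T's list.

```python
# Hypothetical three-page site: each page maps to the pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def pr_of(page, pr, links, d=0.85):
    """Evaluate PR(page) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    The Ti are the pages whose link lists contain `page`; C(Ti) is the
    number of outgoing links on Ti. `pr` holds the current PR guesses.
    """
    inbound = [t for t, outs in links.items() if page in outs]
    return (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound)

current = {"A": 1.0, "B": 1.0, "C": 1.0}   # initial guess: PR = 1 everywhere
# C is cited by A (2 outgoing links) and B (1 outgoing link):
# 0.15 + 0.85 * (1/2 + 1/1) = 1.425
print(round(pr_of("C", current, LINKS), 3))  # 1.425
```

One evaluation is not the answer yet: the result depends on the guesses in `current`, which is exactly the problem the next slides address.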

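Note: Strictly, Page Ranks form a probability distribution (summing to one) only when the (1 - d) term is divided by the number of pages N. With the formula above, the Page Ranks of all N pages add up to at most N, so the average Page Rank is at most one; it is exactly one when every page has at least one outgoing link.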

Calculating Page Rank

The PR of each page depends on the PR of the pages pointing to it. But we won't know what PR those pages have until the pages pointing to them have had their PR calculated, and so on.

This makes calculating PR seem impossible, but there is a solution.

Page Rank can be calculated using a simple iterative algorithm, and the result corresponds to the principal eigenvector of the normalized link matrix of the web.

This means we can calculate a page's PR without knowing the final PR values of the other pages. All we need to do is remember each value we calculate and repeat the calculations many times, until the numbers stop changing much.
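The iterative procedure can be sketched as follows, assuming the same dictionary-of-outgoing-links representation and a hypothetical three-page site. This version recomputes every page from the previous round's values and stops once the largest change in a round falls below `tol`; the tolerance and the iteration cap are arbitrary choices for illustration, and updating values in place also converges.

```python
def page_rank(links, d=0.85, tol=1e-6, max_iter=1000):
    """Iteratively solve PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)).

    `links` maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pr = {page: 1.0 for page in links}            # initial guess: PR = 1
    for _ in range(max_iter):
        new = {page: 1 - d for page in links}     # the (1 - d) term
        for src, outs in links.items():
            for dst in outs:                      # src passes PR(src) / C(src)
                new[dst] += d * pr[src] / len(outs)
        change = max(abs(new[p] - pr[p]) for p in links)
        pr = new                                  # remember the new values
        if change < tol:                          # numbers stopped changing
            break
    return pr

# Hypothetical three-page site: A links to B and C, B to C, C back to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
result = page_rank(LINKS)
print({page: round(value, 3) for page, value in result.items()})
# roughly {'A': 1.163, 'B': 0.644, 'C': 1.192}
```

Every page here has an outgoing link, so the three values add up to 3 and the average is 1. C ends up highest because it collects votes from both other pages.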

Simple Hierarchy

Two pages, A and B, link to each other, so each page has one outgoing link: C(A) = 1 and C(B) = 1.

We don't know the PR of either page yet, so let's assume each has PR = 1.00, with d = 0.85.

PR(A) = (1 - d) + d (PR(B)/1)
PR(B) = (1 - d) + d (PR(A)/1)

i.e.

PR(A) = 0.15 + 0.85 * 1 = 1
PR(B) = 0.15 + 0.85 * 1 = 1

We started out with a lucky guess: the numbers aren't changing at all.
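A short sketch of this two-page example, updating PR(A) and then PR(B) in place each round (the helper `iterate` is hypothetical). Starting from the lucky guess of 1.0, nothing changes. As an extra check that is not on the slide, starting from 0.0 the values still creep towards 1.0, because each round shrinks the remaining error by a factor of 0.85 × 0.85 = 0.7225.

```python
d = 0.85  # damping factor

def iterate(pr_a, pr_b, rounds):
    """Apply the two-page equations repeatedly, updating A then B in place."""
    history = []
    for _ in range(rounds):
        pr_a = (1 - d) + d * pr_b / 1   # C(B) = 1
        pr_b = (1 - d) + d * pr_a / 1   # C(A) = 1
        history.append((round(pr_a, 4), round(pr_b, 4)))
    return history

print(iterate(1.0, 1.0, 3))       # [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
print(iterate(0.0, 0.0, 40)[-1])  # (1.0, 1.0)
```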

Complex Hierarchy

Average PR: 0.378. PR loss: 8 - (0.92 + 0.41 + 0.41 + 0.41 + 0.22 + 0.22 + 0.22 + 0.22) = 8 - 3.03 ≈ 4.97.


Complex Hierarchy with Avg PR = 1.0000

Average PR: 1.0000. PR loss: 8 - (3.35 + 1.1 + 1.1 + 1.1 + 0.34 + 0.34 + 0.34 + 0.34) ≈ 0 (the displayed values are rounded, so they add up to 8.01 rather than exactly 8).
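The slide's own diagrams are not included in this transcript, so the sketch below uses a hypothetical 8-page hierarchy (Home, three section pages `S1` to `S3`, four leaf pages `L1` to `L4`). Its numbers differ from the slide's, but it shows the same two effects: when leaf pages link nowhere, PR leaks out and the average falls well below 1; when every page links back to Home, nothing leaks, the average is 1, and PR concentrates on Home.

```python
def page_rank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)), starting from PR = 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(max_iter):
        new = {page: 1 - d for page in links}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += d * pr[src] / len(outs)
        done = max(abs(new[p] - pr[p]) for p in links) < tol
        pr = new
        if done:
            break
    return pr

def average(pr):
    """Average PR over all pages in the site."""
    return sum(pr.values()) / len(pr)

# Hypothetical hierarchy whose leaf pages link nowhere, so their PR leaks away.
leaky = {
    "Home": ["S1", "S2", "S3"],
    "S1": ["L1", "L2"], "S2": ["L3"], "S3": ["L4"],
    "L1": [], "L2": [], "L3": [], "L4": [],
}

# Same hierarchy, but every page below Home also links back to Home.
linked = {
    "Home": ["S1", "S2", "S3"],
    "S1": ["L1", "L2", "Home"], "S2": ["L3", "Home"], "S3": ["L4", "Home"],
    "L1": ["Home"], "L2": ["Home"], "L3": ["Home"], "L4": ["Home"],
}

pr_leaky, pr_linked = page_rank(leaky), page_rank(linked)
print(round(average(pr_leaky), 3), round(average(pr_linked), 3))  # 0.227 1.0
print(max(pr_linked, key=pr_linked.get))                          # Home
```

In the linked version Home ends up close to 3 while each leaf stays well below 1, which is the "concentrated votes" effect described in the observations.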

Final Observations:

No matter how many pages your site has, its average PR will be 1.0 at best. However, a hierarchical layout can strongly concentrate votes, and therefore PR.

Page Rank is, in fact, very simple (apart from one scary-looking formula). But when a simple calculation is applied hundreds (or billions) of times over, the results can seem complicated.

Page Rank is also only part of the story about which results are displayed high up in a Google listing. Google also pays attention to the anchor text of links pointing to a page when deciding the relevance of that target page, perhaps more than to the page's PR.

Page Rank is still part of the listings story, though, so as a good designer it is worth making sure you understand it correctly.


References

The PageRank paper by Google's founders, Sergey Brin and Lawrence Page: http://www-db.stanford.edu/~backrub/google.html

Chris Ridings' "PageRank Explained" paper (as of April 2002): http://web.archive.org/web/*/http://www.goodlookingcooking.co.uk/PageRank.pdf

An excellent discussion by Douglas W. Jones: http://www.cs.uiowa.edu/~jones/cards/chad.html

http://www.sirgroane.net/google-page-rank/

http://www.youtube.com/watch?feature=player_embedded&v=h3Jup5R1MGY#!

http://www.searchnerd.com/pagerank/

Thank you!

Questions? Please reach me at rajanagan.tpgit@gmail.com.
