page ranking


Upload: pradiprahul

Post on 06-Aug-2015


Category:

Education



TRANSCRIPT

Page 1: PAGE RANKING

WELCOME TO ALL!

Page 2: PAGE RANKING

RANKING IN WEB SERVICES FOR CONTENT SEARCH OPTIMIZED IN SPIDER CRAWLING COMB PAGE RANKING

Page 3: PAGE RANKING

SPIDER CRAWLING COMB PAGE RANKING

Page 4: PAGE RANKING

NAME: S. THARABAI
REGISTER NUMBER: 121322201011
DEPARTMENT: M.TECH (CSE) PT
GUIDE NAME: Dr. V. CYRIL RAJ

Page 5: PAGE RANKING

TITLE

This PPT focuses on page ranking in web services. The Spider Crawling Comb Page Ranking approach is studied in detail.

Page 6: PAGE RANKING

ABSTRACT

Page 7: PAGE RANKING

This report explores the filtering, ranking and selection algorithms used to select the best web service for a requester in line with the requester's preferences. Experiments are conducted using real web service datasets, and the outcomes confirm an improvement over existing methods in page ranking.

Page 8: PAGE RANKING

KEYWORDS

Page Ranking, Service Filtering, Web Service, Web Service Selection

Page 9: PAGE RANKING

LITERATURE REVIEW

• Al-Masri & Mahmoud proposed a solution by introducing the Web Service Relevancy Function (WsRF), which measures the relevancy ranking of a specific Web service using the parameters and preferences of the requester.

• Zheng et al. proposed a Web service recommender system (WSRec), which combines a user-contribution mechanism for gathering Web service information with a hybrid collaborative filtering algorithm.

Page 10: PAGE RANKING

INTRODUCTION

Page 11: PAGE RANKING

WEB SERVICES ARCHITECTURE

Page 12: PAGE RANKING

WEB SERVICE

Publishing, binding and discovering web services are the three major tasks in web service architecture.

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. Web services exchange SOAP messages, typically conveyed over HTTP with an XML serialization.

Page 13: PAGE RANKING

COMPONENTS OF WEB SERVICES

Service providers build web services that offer specified functions for users. The web service requester is any user of the web service who submits requests for the purpose of finding a service. Universal Description, Discovery and Integration (UDDI) is the registry standard for Web services.

Page 14: PAGE RANKING

CONTENT SEARCH

As the number of Web service providers grows, redundancy becomes prevalent, with many providers offering the same or similar services. We try to find an automatic and objective way to recommend a Web service. The ranking process reduces the correlation degree among candidate services and extracts the user's preferences.

Page 15: PAGE RANKING

Web Services using Filtering, Ranking and Selection

Service filtering is one of the methods used to reduce redundant services.

Web service selection refers to the process by which a service implementation is chosen for a request.

Qualified, Filtering, Ranking and Selection Algorithm (QFRSA)
Web Service Selection and Ranking Model (WSSRM)

Page 16: PAGE RANKING

RANKING

Ranking uses a reputation-enhanced service discovery algorithm.

When multiple services provide similar functionality, ranking provides a reliable means of differentiating between them.

Ranking is an essential factor in choosing the optimal service for requesters.

Page 17: PAGE RANKING
Page 18: PAGE RANKING

Google Architecture

Page 19: PAGE RANKING

Google Architecture

1. In Google, web crawling (the downloading of web pages) is done by several distributed crawlers.

2. A URLserver sends lists of URLs to be fetched to the crawlers.

3. The web pages that are fetched are then sent to the storeserver.

4. The storeserver then compresses and stores the web pages into a repository. Every web page has an associated ID number called a docID, which is assigned whenever a new URL is parsed out of a web page.

Page 20: PAGE RANKING

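5. The indexer reads the repository, parses each document and converts it into a set of word occurrences called hits. It distributes these hits into a set of "barrels", creating a partially sorted forward index.

6. The sorter re-sorts the barrels by wordID to generate the inverted index and produces a list of wordIDs and offsets into that index. A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher.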

7. The searcher is run by a web server and uses the lexicon built by DumpLexicon together with the inverted index and the PageRanks to answer queries.
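The relationship between the forward index and the inverted index mentioned in these steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: hits are reduced to bare wordIDs, everything fits in memory, and the names `invertIndex` and `search` are hypothetical, not taken from Google's system.

```javascript
// Forward index (assumed shape): an array of [docID, [wordID, ...]] pairs,
// already sorted by docID.
function invertIndex(forwardIndex) {
  const inverted = new Map(); // wordID -> list of docIDs containing it
  for (const [docID, wordIDs] of forwardIndex) {
    for (const wordID of wordIDs) {
      if (!inverted.has(wordID)) inverted.set(wordID, []);
      const docs = inverted.get(wordID);
      // Documents arrive in docID order, so a repeat can only be the last entry.
      if (docs[docs.length - 1] !== docID) docs.push(docID);
    }
  }
  return inverted;
}

// Answer a query by intersecting the document lists of all query words.
function search(inverted, wordIDs) {
  if (wordIDs.length === 0) return [];
  let result = inverted.get(wordIDs[0]) || [];
  for (const wordID of wordIDs.slice(1)) {
    const docs = new Set(inverted.get(wordID) || []);
    result = result.filter((docID) => docs.has(docID));
  }
  return result;
}
```

A real searcher would also weigh each matching document by its hits and its PageRank; this sketch stops at finding the candidate documents.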

Page 21: PAGE RANKING
Page 22: PAGE RANKING

GOOGLE PAGE RANKING

Resources for Google Page Ranking

Google Page Ranking takes many factors into account, such as:
• Hits
• Backlinks
• Citation Graph
• Keywords, Candidates
• Metadata Keywords
• Damping factor (d) obtained from random surfing
• Outgoing links
• Anchor Text
• Repository of web sources for more web sources
• Indexing or sorting of documents based on docIDs or wordIDs
• Font type and Format
• Internet Ranking
• Final Page Ranking

Page 23: PAGE RANKING

If your site doesn't show up on Google or other popular search engines, no one except those you tell about your site will find it. For example, if we type the words "school of public health" into Google, it displays the following "hit list":

• school of public health
• graduate school public health
• public health school
• masters public health

The higher a website's PageRank, the higher it shows up in search results. Google and other search engines use undisclosed algorithms that weigh dozens of factors to determine ranking and select an optimal website.

Page 24: PAGE RANKING

The Ranking System

Google maintains much more information about web documents than typical search engines. Every hit list includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence.

Page 25: PAGE RANKING

Single- and multi-word hit lists

Single-word query: First, Google looks at the document's hit list for the given word. Hits are classified by type: title, anchor, URL, plain text large font, plain text small font, etc. Each type has its own type-weight, and the type-weights form a vector indexed by type. Google counts the number of hits of each type in the hit list and converts each count into a count-weight. The dot product of the vector of count-weights with the vector of type-weights gives an IR score for the document. Finally, the IR score is combined with PageRank to give the document its final rank.
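The single-word scoring described above can be sketched as follows. The slide gives no numbers, so every value here is an assumption made for illustration: the type-weights, the cap used to turn counts into count-weights, and the way the IR score is mixed with PageRank (`alpha`) are all hypothetical.

```javascript
// Hypothetical type-weights: one weight per hit type.
const TYPE_WEIGHTS = { title: 10, anchor: 8, url: 6, plainLarge: 3, plainSmall: 1 };
// Hypothetical cap: counts beyond this value stop adding to the count-weight.
const COUNT_CAP = 5;

// hitList: array of { type } objects for one word in one document.
function irScore(hitList) {
  const counts = {};
  for (const hit of hitList) {
    counts[hit.type] = (counts[hit.type] || 0) + 1;
  }
  // Dot product of the count-weight vector with the type-weight vector.
  let score = 0;
  for (const type of Object.keys(TYPE_WEIGHTS)) {
    const countWeight = Math.min(counts[type] || 0, COUNT_CAP);
    score += countWeight * TYPE_WEIGHTS[type];
  }
  return score;
}

// Assumed combination rule: a simple weighted sum of IR score and PageRank.
function finalRank(hitList, pageRank, alpha = 0.5) {
  return alpha * irScore(hitList) + (1 - alpha) * pageRank;
}
```

Capping the count is one simple way to keep a page from winning merely by repeating a word many times, in line with the goal that no single factor should have too much influence.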

Page 26: PAGE RANKING

MULTI-WORD SEARCH

For a multi-word query, multiple hit lists must be scanned at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. Google's hit lists are stored in a hand-optimized compact encoding rather than Huffman coding, which would need far more bit manipulation. For example, in a web site containing 200 pages, the pages nearest the home page are selected first for ranking.

Page 27: PAGE RANKING

Fancy hits and plain hits

Our compact encoding uses two bytes for every hit. There are two types of hits: fancy hits and plain hits. Fancy hits include hits occurring in a URL, title, anchor text, or meta tag. A plain hit consists of a capitalization bit, font size, and 12 bits of word position in a document (all positions higher than 4095 are labeled 4096). Font size is represented relative to the rest of the document using three bits. A fancy hit consists of a capitalization bit, the font size set to 7 to mark the hit as fancy, 4 bits for the type of fancy hit, and 8 bits of position. For anchor hits, the 8 bits of position are split into 4 bits for position in anchor and 4 bits for a hash of the docID the anchor occurs in.
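A plain hit's two-byte layout can be illustrated with a little bit packing. The field widths (1 + 3 + 12 bits) come from the description above; the order of the fields inside the 16 bits is an assumption, and so is clamping large positions to 4095, the largest value that 12 bits can hold. The function names are hypothetical.

```javascript
// Pack a plain hit into 16 bits: [capitalization:1][fontSize:3][position:12].
function packPlainHit(capitalized, fontSize, position) {
  if (fontSize < 0 || fontSize > 7) {
    throw new RangeError("font size must fit in three bits (0-7)");
  }
  const pos = Math.min(position, 4095); // assumed clamp for far-away positions
  return ((capitalized ? 1 : 0) << 15) | (fontSize << 12) | pos;
}

// Reverse the packing to recover the three fields.
function unpackPlainHit(hit) {
  return {
    capitalized: ((hit >> 15) & 0x1) === 1,
    fontSize: (hit >> 12) & 0x7,
    position: hit & 0xfff,
  };
}
```

For instance, `packPlainHit(true, 3, 100)` yields 45156, and unpacking that number returns the original three fields.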

Page 28: PAGE RANKING

EXISTING SYSTEM

According to W3C [4], Web service quality (QoS) denotes properties of a web service such as performance, reliability, scalability, availability, etc. When multiple services provide similar functionality, it provides a reliable means of differentiating between the services. However, the existing system does not provide the optimal service for requesters.

Page 29: PAGE RANKING

The higher a website's PageRank, the higher it shows up in search results. In the existing system, the PageRank of any web page can be found as below:

Check Page Rank of any web site pages instantly:


This free page rank checking tool is powered by

Page Rank Checker service

Page Rank Checker

http:// Check PR

Page 30: PAGE RANKING

How Search Engines Work

In general:
• Search engines send out "spiders" or "robots" that comb through web pages, recording URLs, page titles, content and metadata. They move from a page to every page it links to, and from those pages to every page they link to, in a spider-web-like fashion.
• A count is kept of how many times the robot comes across each page.
• They use information from internet directories.
• They use information submitted by webmasters.
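The spider behaviour on this slide amounts to a breadth-first walk over links. The sketch below runs on a small in-memory object instead of the real web, so no network access is involved; the `web` data shape and the function name `crawl` are assumptions made for illustration.

```javascript
// web (assumed shape): { url: { title: string, links: [url, ...] } }
function crawl(web, seedUrl) {
  const seen = new Set([seedUrl]);
  const queue = [seedUrl];
  const records = [];                   // URL and title of each page visited
  const encounters = { [seedUrl]: 1 };  // how often each URL was come across

  while (queue.length > 0) {
    const url = queue.shift();
    const page = web[url];
    if (!page) continue; // link points outside the toy web; nothing to record
    records.push({ url: url, title: page.title });

    for (const next of page.links) {
      encounters[next] = (encounters[next] || 0) + 1;
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next); // visit every linked page exactly once
      }
    }
  }
  return { records: records, encounters: encounters };
}
```

The `encounters` counts correspond to the tally of how many times the robot comes across each page; a production crawler would add politeness delays, robots.txt handling and URL normalisation, none of which are shown here.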

Page 31: PAGE RANKING

LIMITATIONS OF EXISTING SYSTEM

• Less available data: for example, a requester can request a weather information service with 96% data availability alone.
• No optimal service for the user's request: inadequate for selecting an optimal service that would satisfy users' expectations
• Higher response time

Page 32: PAGE RANKING

PROPOSED SYSTEM

Page 33: PAGE RANKING

AIM

Optimal selection of web services is the aim of the proposed system. The system examines various page ranking methods by which optimal web services can be identified from a set of candidates offering similar functionality, using the performance of the candidates and the preferences of web service requesters.

Page 34: PAGE RANKING

OBJECTIVE

• The number of sites that link to your site is the number one determinant.
• Target appropriate sites, such as affiliates'/partners' web sites, business/trade web sites and related sites.
• Best results come from having the keywords as part of the domain name (e.g., www.diabetes.org).
• Use short, descriptive page titles.
• The URL is the most important factor for search engines.

Page 35: PAGE RANKING

Provide Good Content

• The first 200 words on a web page are crucial. The first 2 or 3 sentences may be used in search engine result listings.

• A well-written first paragraph, packed with keywords, can do wonders for your search engine ranking.

• Make sure that there is text on your site's homepage describing your site and its purpose.

Page 36: PAGE RANKING

Provide Good Meta Data

Metadata is defined by the meta tags you use in the head section of your HTML document. The important ones are:

• Content-Type
• author
• title
• copyright
• description
• keywords

Page 37: PAGE RANKING

ADVANTAGES OF THE PROPOSED SYSTEM

• Knowledge-based services
• Quality of a web service, such as availability, response time, reliability and scalability
• Cost-beneficial for business people due to increased visibility
• Reputation-enhanced service discovery algorithm
• The higher the page ranking, the lower the response time.

Page 38: PAGE RANKING

PROBLEM DOMAIN

AREAS UNDER STUDY
• Web service ranking
• Content searching
• Search engine optimization
• PageRank algorithm

Page 39: PAGE RANKING

PAGE RANKING ALGORITHM

• PageRank is defined like this:
• We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
• PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
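As a rough illustration, the formula can be applied repeatedly to a small link graph until the values settle. The graph format (an object mapping each page to the pages it links to), the function name `pageRank` and the fixed iteration count are assumptions for this sketch; it also assumes every linked page appears as a key of the graph.

```javascript
// links (assumed shape): { page: [pages it links to] }
function pageRank(links, d = 0.85, iterations = 50) {
  const pages = Object.keys(links);
  let pr = {};
  for (const p of pages) pr[p] = 1.0; // initial guess for every page

  for (let i = 0; i < iterations; i++) {
    const next = {};
    for (const p of pages) next[p] = 1 - d; // the (1-d) term
    for (const t of pages) {
      const outgoing = links[t]; // C(T) is outgoing.length
      for (const a of outgoing) {
        next[a] += d * (pr[t] / outgoing.length); // d * PR(T)/C(T)
      }
    }
    pr = next;
  }
  return pr;
}
```

With `pageRank({ X: ["A"], A: ["B"], B: ["A"] })`, page X has no backlinks and therefore ends at 1 - d = 0.15, while the ranks of all three pages still add up to 3, an average of 1 per page.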

Page 40: PAGE RANKING

TECHNICAL TERMS IN PAGE RANKING

• PR: Shorthand for PageRank: the actual, real, page rank for each page as calculated by Google. As we'll see later this can range from 0.15 to billions.

• Toolbar: The PageRank displayed in the Google toolbar in your browser. This ranges from 0 to 10.

• Backlink: If page A links out to page B, then page B is said to have a "backlink" from page A.

Page 41: PAGE RANKING

Page Ranking Essentials

• In short, PageRank is a "vote", by all the other pages on the Web, about how important a page is. A link to a page counts as a vote of support.

• We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows: PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Page 42: PAGE RANKING

• (1 – d) – The (1 – d) bit at the beginning is a bit of probability math magic so that the "sum of all web pages' PageRanks will be one": it adds back the bit lost by the d(…) factor. It also means that a page with no links to it (no backlinks) still gets a small PR of 0.15 (i.e. 1 – 0.85). (Aside: the Google paper says "the sum of all pages", but it means "the normalised sum", otherwise known as "the average" to you and me.)

Page 43: PAGE RANKING

How is Page Rank Calculated?

• PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

• Let's take the simplest example network: two pages, each pointing to the other:

Each page has one outgoing link (the outgoing count is 1, i.e. C(A) = 1 and C(B) = 1).

Page 44: PAGE RANKING

PAGE RANK CALCULATOR

Page 45: PAGE RANKING

Guess 1

We don't know what their PR should be to begin with, so let's take a guess at 1.0 and do some calculations:

d = 0.85
PR(A) = (1 – d) + d(PR(B)/1)
PR(B) = (1 – d) + d(PR(A)/1)
i.e.
PR(A) = 0.15 + 0.85 * 1 = 1
PR(B) = 0.15 + 0.85 * 1 = 1

Page 46: PAGE RANKING

GUESS 2

What if the first guess is far off? Let's start the guess at 40 each and do a few cycles:

PR(A) = 40
PR(B) = 40

First calculation
PR(A) = 0.15 + 0.85 * 40 = 34.15
PR(B) = 0.15 + 0.85 * 34.15 = 29.1775

And again
PR(A) = 0.15 + 0.85 * 29.1775 = 24.950875
PR(B) = 0.15 + 0.85 * 24.950875 = 21.35824375

Each cycle moves both values closer to 1.0, the same answer as Guess 1.
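The cycles on this slide can be replayed in code to see where the numbers head. This is a minimal sketch of the two-page example only; the function name `iterateTwoPages` is hypothetical, and PR(B) is computed from the freshly updated PR(A), exactly as in the slide's arithmetic.

```javascript
// Two pages, A and B, each linking only to the other, so C(A) = C(B) = 1.
function iterateTwoPages(startGuess, cycles, d = 0.85) {
  let prA = startGuess;
  let prB = startGuess;
  const history = [];
  for (let i = 0; i < cycles; i++) {
    prA = (1 - d) + d * (prB / 1);
    prB = (1 - d) + d * (prA / 1); // uses the PR(A) just calculated
    history.push([prA, prB]);
  }
  return history;
}

const cyclesFrom40 = iterateTwoPages(40, 200);
console.log(cyclesFrom40[0]);   // first cycle: about 34.15 and 29.1775
console.log(cyclesFrom40[199]); // after many cycles: both very close to 1
```

The starting guess affects only how many cycles are needed, not the final answer: the update x = 0.15 + 0.85x has the single fixed point x = 1.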

Page 47: PAGE RANKING

PAGE RANK 0 - 10

1 Page Rank (PR)
• The principle of PR is that sites are divided into 11 categories with ranks from 0 to 10, respectively. The concept is that the higher the PR, the better the site.
• Sites that have a PR of 10 are very rare.
• Sites with a PR of 7-9 are more common, but they are still a minority.
• If a site has a PR of 5 or 6, the site is viewed by Google as a quality site.
• A PR of 3 or 4 is for sites that are about average.
• A PR of 0 to 2 is for sites that are below average and therefore aren't top backlinking candidates.

Page 48: PAGE RANKING

OTHER PAGE RANKING ALGORITHMS

2 Alexa
• Unlike PR, Alexa doesn't divide sites into groups. Rather, it arranges them in a list. The most popular sites, such as Google, Facebook, or Twitter, are at the top.

3 Compete
• When you analyze Compete data, you will notice that frequently sites with good PR

4 Quantcast
• Quantcast is also a service targeted mainly at the US market. It gathers data from a sample, ISP and ad.

Page 49: PAGE RANKING

5 CustomRank
• CustomRank.com provides a service that combines several metrics at once to offer a joint ranking. The services it aggregates are MozTrust, MozRank, PageAuthority, DomainAuthority etc.

6 MozTrust and MozRank
• MozTrust measures the global link trust score, while MozRank measures link popularity. The more reputable a site's backlinks are, the higher the MozTrust score.

Page 50: PAGE RANKING

7 ComScore
• ComScore is another company that uses a sample of 2 million users to provide rankings.

8 Google Trends
• Google Trends is mainly about search volume of keywords, but one of its lesser-known uses is to compare how two sites fare over time or in different regions.

9 Ranking
• Ranking.com is one more service to consider if you are dissatisfied with the rest.

Page 51: PAGE RANKING
Page 52: PAGE RANKING

TOOLS REQUIRED
• MS Office for documentation and flowcharting
• JSP.NET and XML to create forms
• NetBeans and DOM Web Server for intermediate storage
• World Wide Web and internet libraries
• Google Chrome

Page 53: PAGE RANKING

METHOD

The proposed system is designed to carry out the process of selecting the optimal service for a requester using four service attributes: improved response time, reliability, availability and successability. These are provided in this project by ranking the page.

Page 54: PAGE RANKING

ALEXA PAGE RANKING

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Enter your Website here</title>
<script language="javascript">

// Check the form fields in order; stop at the first one that is empty or mismatched.
function verify() {
  if (document.form1.u_name.value == "") {
    alert("Please give username");
    document.form1.u_name.focus();
    return false;
  }

  if (document.form1.pass.value == "") {
    alert("Please give a password ");
    document.form1.pass.focus();
    return false;
  }

Page 55: PAGE RANKING

  if (document.form1.r_pass.value == "") {
    alert("Please retype your password");
    document.form1.r_pass.focus();
    return false;
  }
  if (document.form1.pass.value != document.form1.r_pass.value) {
    alert("Your password does not match");
    document.form1.r_pass.value = ""; // clear the mismatched retype field
    document.form1.r_pass.focus();
    return false;
  }
  if (document.form1.country.value == "") {
    alert("Please enter country 'India or Global'");
    document.form1.country.focus();
    return false;
  }
  if (document.form1.website.value == "") {
    alert("Please enter your website name");
    document.form1.website.focus();
    return false;
  }
  return true;
}

Page 56: PAGE RANKING

// Print a score that averages a country value (r1) and a
// domain-extension value (rank1); both come from fixed lookups.
function Rank() {
  var r1, e1, e2, e3, rank1;
  if (document.form1.country.value == "India") {
    r1 = 40.0;
  } else {
    r1 = 35.0;
  }
  e1 = new String(document.form1.website.value);
  e2 = e1.lastIndexOf(".");
  e3 = e1.substr(e2); // text from the last dot onward, e.g. ".com"
  if (e3 == ".com") {
    rank1 = 32.0;
    document.write("<p>The PageRank is :" + ((r1 + rank1) / 2) + "%" + "</p>");
  }
  if (e3 == ".org") {
    rank1 = 34.0;
    document.write("<p>The PageRank is :" + ((r1 + rank1) / 2) + "%" + "</p>");
  }
  if (e3 == ".in") {
    rank1 = 36.0;
    document.write("<p>The PageRank is :" + ((r1 + rank1) / 2) + "%" + "</p>");
  }
  if (e3 == ".edu") {
    rank1 = 38.0;
    document.write("<p>The PageRank is :" + ((r1 + rank1) / 2) + "%" + "</p>");
  }

Page 57: PAGE RANKING

  if (e3 == ".net") {
    rank1 = 39.0;
    document.write("<p>The PageRank is :" + ((r1 + rank1) / 2) + "%" + "</p>");
  }
  return true;
}
</script>
</head>
<body>
<!--Enter your Website name-->
<pre><form method="POST" action="" name="form1"><table border="2" align="center" cellpadding="7">
<tr><td><strong>Username:</strong></td><td><input type="text" name="u_name"/></td></tr>
<tr><td><strong>Password:</strong></td><td><input type="password" name="pass"/></td></tr>
<tr><td><strong>Retype Password:</strong></td><td><input type="password" name="r_pass"/></td></tr>

Page 58: PAGE RANKING

<tr><td><strong>Country:</strong></td><td><p> <select name="country"><option value="" selected>--select--</option><option value="India">India</option><option value="Global">Global</option></select></td></tr>
<tr><td><strong>Website:</strong></td><td><input type="text" value="http://" name="website"/></td></tr>
<tr align="center"><td><input type="button" value="Verify" onClick="return (verify());"/></td><td><input type="button" value="pageRank" onClick="return (Rank());"/></td></tr>
</table></form></pre>
</body>
</html>

Page 59: PAGE RANKING

PAGE RANKING RESULT

Result: The PageRank is: 37%

Page 60: PAGE RANKING

PAGE RANKING USING MACHINE LEARNING

• K-NEAREST NEIGHBOUR FOR RANKING
• CLUSTERING TO DISPLAY RESULTS

Page 61: PAGE RANKING

THANK YOU!