search engine

Search Engine Submitted by Swaraj Kumar Dehuri CSE,1201109060

Upload: swaraj27

Post on 16-Jul-2015


Category:

Internet



TRANSCRIPT

Page 1: Search engine

Search Engine

Submitted by

Swaraj Kumar Dehuri

CSE,1201109060

Page 2: Search engine

Contents

What is a search engine?

Why search engine???

History of SE

How search engine works

o Web crawling

o Indexing

o Searching

Search Engine Optimization

PageRank Algorithm

Types of SE

Conclusion

Page 3: Search engine

What is a search engine?

A search engine is a software program that searches a database and gathers and reports information that contains or is related to specified terms.

Or,

It is a website whose primary function is to search for, gather, and report information available on the internet or a portion of the internet.

Page 4: Search engine

Examples of Search engine:

Page 5: Search engine

Why Search Engine???

In today’s world, millions and billions of pieces of information are available on the vast WWW. Searching for a particular piece of information by hand would cost the user a great deal of time. We therefore need tools that make this search automatic, quick, and effortless. To reduce the problem to a more or less manageable one, web search engines were introduced in the 1990s.

Page 6: Search engine

History of search engine

In 1990, the first search engine, ARCHIE, was released. At that time there was no World Wide Web. Data resided on defence contractor, university, and government computers, and techies were the only people accessing the data.

Computers were interconnected by Telnet.

File Transfer Protocol (FTP) was used for transferring files from computer to computer.

In 1994, WebCrawler, a new type of search engine that indexed the entire contents of a webpage, was introduced.

Around 1998, search engine algorithms were introduced to optimize searching.

Page 7: Search engine

How search engine works

Web crawling

Indexing

Searching

Page 8: Search engine

Web Crawling

Spiders: To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites.

Crawling: When a spider is building its lists, the process is called Web crawling. In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.
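The crawling loop described above can be sketched in Python. This is a minimal illustration, not how any production spider works: the `PAGES` dictionary is a hypothetical in-memory stand-in for the Web (a real spider would fetch pages over HTTP), and the names `LinkAndTextParser` and `crawl` are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
import re

# Hypothetical in-memory "web": URL -> HTML. Assumed data for illustration only.
PAGES = {
    "http://example.test/": '<a href="http://example.test/a">A</a> search engines crawl',
    "http://example.test/a": '<a href="http://example.test/">home</a> '
                             '<a href="http://example.test/b">B</a> spiders build word lists',
    "http://example.test/b": "word lists feed the index",
}


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and the lowercase words found on one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(start_url, pages):
    """Breadth-first crawl from start_url; returns {url: list of words}."""
    seen = {start_url}
    queue = deque([start_url])
    word_lists = {}
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # unknown page: nothing to read
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()
        word_lists[url] = parser.words
        for link in parser.links:
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return word_lists


word_lists = crawl("http://example.test/", PAGES)
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other.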

Page 9: Search engine

Indexing

The indexer then reads the crawled documents and creates an index based on the words contained in each document.

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.
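One common way to build such an index is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal Python illustration under that assumption; the document IDs, the sample texts, and the function names `tokenize` and `build_index` are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Return {word: {doc_id: number of occurrences}} for all documents."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return dict(index)


# Assumed sample corpus, for illustration only.
DOCS = {
    "d1": "Search engines index the web",
    "d2": "Spiders crawl the web",
}
index = build_index(DOCS)
```

Looking up a word is now a single dictionary access instead of a scan over every document, which is the speed-up the index exists to provide.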

Page 10: Search engine

Searching

Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.

Search algorithm: Unique to every search engine, and just as important as keywords, search engine algorithms are the why and the how of search engine rankings. Basically, a search engine algorithm is a set of rules, or a unique formula, that the search engine uses to determine the significance of a web page, and each search engine has its own set of rules.

E.g. Google uses the PageRank algorithm.
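Real ranking algorithms are proprietary, so the following Python sketch only illustrates the general shape of a search step: look up each query term in an index, keep the documents that contain every term, and order them by a score. The term-count score used here is a simple stand-in, and the `INDEX` data and the `search` function are hypothetical.

```python
# Assumed toy index: word -> {doc_id: number of occurrences}.
INDEX = {
    "search": {"d1": 2, "d2": 1},
    "engine": {"d1": 1, "d2": 1, "d3": 1},
    "spider": {"d3": 2},
}


def search(query, index):
    """Return doc IDs containing every query term, highest score first."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)  # keep documents that contain all terms
    # Stand-in score: total occurrences of the query terms in the document.
    scores = {doc: sum(p[doc] for p in postings) for doc in matching}
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


results = search("search engine", INDEX)
```

Swapping the scoring line for a different rule changes the ranking without touching the lookup, which is where engines differ from one another.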

Page 11: Search engine

Search Engine Indexing process

Page 12: Search engine
Page 13: Search engine

Search Engine Optimization

It is the process of improving the volume and quality of traffic to a website from search engines.

As a marketing strategy for increasing a site's relevance, SEO considers how search algorithms work and what people search for.

It is used by companies to rank higher in search engine results.

Page 14: Search engine

PageRank Algorithm

PageRank was once the most important part of Google’s ranking system and of search engine optimization. It is a link analysis algorithm applied by Google.com that assigns a number, or rank, to each hyperlinked web page within the World Wide Web.

The original PageRank algorithm, as described by Larry Page and Sergey Brin, is given by

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where,

PR(A) – Page Rank of page A

PR(Ti) – Page Rank of pages Ti which link to page A

C(Ti) - number of outbound links on page Ti

d - damping factor which can be set between 0 and 1
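To make the formula concrete, here is a minimal Python sketch that evaluates it once for a single page. The three-page link graph, the starting ranks of 1, and d = 0.5 are assumptions chosen for illustration, and the function name `page_rank_of` is hypothetical.

```python
def page_rank_of(page, ranks, links, d):
    """Evaluate PR(page) = (1 - d) + d * sum(PR(T) / C(T)) once.

    ranks: current PageRank of every page
    links: {page: list of pages it links to}, so C(T) == len(links[T])
    """
    inbound = [t for t, targets in links.items() if page in targets]
    return (1 - d) + d * sum(ranks[t] / len(links[t]) for t in inbound)


# Assumed example graph: A links to B and C, B links to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
RANKS = {"A": 1.0, "B": 1.0, "C": 1.0}

pr_c = page_rank_of("C", RANKS, LINKS, d=0.5)  # 0.5 + 0.5 * (1/2 + 1/1)
```

Each linking page T contributes its own rank divided by its number of outbound links, so a link from a page with many links is worth less than a link from a page with few.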

Page 15: Search engine

Simplified Algorithm(PR)

Assume four web pages: A, B, C and D. PageRank is initialized to the same value for all pages. Assume a probability distribution between 0 and 1. Hence the initial value for each page is 0.25.

At iteration 0, the PageRank of A is 0.25.

If the only links in the system were from pages B, C, and D to A, each link would transfer 0.25 PageRank to A upon the next iteration, for a total of 0.75.

PR(A) = PR(B) + PR(C) + PR(D)

[Figure: two link graphs of pages A, B, C and D. First graph: A 0.75, B 0.25, C 0.25, D 0.25. Second graph: A 0.25, B 0.25, C 0.25, D 0.25.]

Page 16: Search engine

(cont.)

Suppose instead that page B had links to pages C and A, page C had a link to page A, and page D had links to all three other pages.

Page B would transfer half of its existing value, or 0.125, to page A and the other half, or 0.125, to page C.

Page C would transfer all of its existing value, 0.25, to the only page it links to, page A.

Since page D has three outbound links, it would transfer one third of its existing value, or approximately 0.083, to page A.

PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3

So the PageRank of A is approximately 0.458.

[Figure: link graph labelled A 0.458, B 0.125, C 0.25, D 0.083]
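The arithmetic of this simplified step can be checked with a short Python sketch. It applies only the undamped rule used on this slide, where each page splits its current value evenly among its outbound links; the function name `simplified_rank` is hypothetical.

```python
def simplified_rank(page, ranks, links):
    """Sum the shares that every linking page passes to `page` (no damping)."""
    return sum(
        ranks[src] / len(targets)
        for src, targets in links.items()
        if page in targets
    )


# All four pages start at 0.25.
ranks = dict.fromkeys("ABCD", 0.25)

# First scenario: B, C and D each link only to A.
first = simplified_rank("A", ranks, {"B": ["A"], "C": ["A"], "D": ["A"]})

# Second scenario: B links to A and C, C links to A, D links to A, B and C.
links = {"B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
second = simplified_rank("A", ranks, links)

print(first)             # 0.75
print(round(second, 3))  # 0.458
```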

Page 17: Search engine

Example

Damping factor (d) = 0.5

PR(A) = 0.5 + 0.5 PR(C)

PR(B) = 0.5 + 0.5 (PR(A) / 2)

PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))

Solving these equations gives PR(A) = 1.07692308, PR(B) = 0.76923077 and PR(C) = 1.15384615.

Because of the size of the actual web, the Google search engine uses an approximate, iterative computation of PageRank values. This means that each page is assigned an initial starting value, and the PageRank values of all pages are then calculated over several computation cycles based on the equations determined by the PageRank algorithm.

[Figure: link graph of the three pages A, B and C]

Page 18: Search engine

The Iterative Computation of PageRank

Iteration 0:

Initially, the PageRank of each web page is 1.

Iteration 10:

The sum of all pages' PageRank values still converges to the total number of web pages, so the average PageRank of a web page is 1.

Iteration   PR(A)        PR(B)        PR(C)
0           1            1            1
1           1            0.75         1.125
2           1.0625       0.765625     1.1484375
3           1.07421875   0.76855469   1.15283203
4           1.07641602   0.76910400   1.15365601
5           1.07682800   0.76920700   1.15381050
6           1.07690525   0.76922631   1.15383947
7           1.07691973   0.76922993   1.15384490
8           1.07692245   0.76923061   1.15384592
9           1.07692296   0.76923074   1.15384611
10          1.07692305   0.76923076   1.15384615
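The iteration can be sketched in Python as follows. Two things here are assumptions inferred from the slides rather than stated on them: the link graph (A links to B and C, B links to C, C links to A) is read off the three example equations, and each new value is used immediately within the same iteration, because that update order matches the first rows of the table. The function name `iterate_pagerank` is hypothetical.

```python
def iterate_pagerank(links, d=0.5, iterations=10):
    """Return (pages, history) where history[i] holds the ranks after iteration i."""
    pages = sorted(links)
    inbound = {p: [q for q in pages if p in links[q]] for p in pages}
    ranks = {p: 1.0 for p in pages}  # iteration 0: every page starts at 1
    history = [tuple(ranks[p] for p in pages)]
    for _ in range(iterations):
        for p in pages:
            # In-place update: pages later in the order already see the
            # values computed earlier in this same iteration.
            ranks[p] = (1 - d) + d * sum(ranks[q] / len(links[q]) for q in inbound[p])
        history.append(tuple(ranks[p] for p in pages))
    return pages, history


# Assumed graph, inferred from the example equations.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pages, history = iterate_pagerank(LINKS, d=0.5, iterations=10)

for i, row in enumerate(history):
    print(i, *(f"{value:.8f}" for value in row))
```

After ten iterations the values agree with the exact solution of the three equations (14/13, 10/13 and 15/13) to about seven decimal places, and their sum stays close to 3, the number of pages.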

Page 19: Search engine

How to Increase PageRank

Add spam pages.

Join forums.

Submit the site to search engine directories.

Exchange reciprocal links.

Publish relevant content. Quality content is the number one driver of your search engine rankings, and there is no substitute for great content.

[Figure: PageRank values in a link graph with 1000 spam pages: A 331.0, B 281.6, and Spam 1 through Spam 1000 at 0.39 each]

Page 20: Search engine

Types of search engine

Crawler-based search engines

These search engines use a "spider" or "crawler" to search the Internet. The crawler digs through individual web pages, pulls out keywords, and then adds the pages to the search engine's database. Google and Yahoo are examples of crawler-based search engines.

Directories

Directories depend on human editors to create their listings, or database. Yahoo Directory, Open Directory and LookSmart are a few examples.

Page 21: Search engine

(cont.)

Hybrid search engines

Hybrid search engines use both crawler-based searches and directory searches to obtain their results. Examples: Google, Yahoo

Meta search engines

These transmit user-supplied keywords simultaneously to several individual search engines, which actually carry out the search.

Meta-search engines can integrate the results returned from all the search engines, eliminate duplicates, and implement additional features such as clustering by subject within the search results.

Examples: Dogpile, Metacrawler
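The merge-and-deduplicate step of a meta search engine can be sketched in Python. Everything named here is hypothetical: `engine_a` and `engine_b` are stub functions standing in for real search engines, the round-robin interleaving is just one possible way to integrate the lists, and the sketch queries the engines one after another rather than simultaneously.

```python
def meta_search(query, engines):
    """Query every engine, interleave the result lists, and drop duplicates."""
    result_lists = [engine(query) for engine in engines]
    longest = max((len(results) for results in result_lists), default=0)
    merged = []
    seen = set()
    for rank in range(longest):
        # Take the rank-th result from each engine in turn.
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


# Stub engines with assumed results, for illustration only.
def engine_a(query):
    return ["u1", "u2", "u3"]


def engine_b(query):
    return ["u2", "u4"]


merged = meta_search("search engine", [engine_a, engine_b])
print(merged)  # ['u1', 'u2', 'u4', 'u3']
```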

Page 22: Search engine

Conclusion

A search engine is designed to return relevant results. The primary goal is to provide high-quality search results over a rapidly growing World Wide Web. For example, Google employs a number of techniques to improve search quality, including PageRank, anchor text, and proximity information. Furthermore, Google is a complete architecture for gathering web pages, indexing them, and performing search queries over them.

Page 23: Search engine

Reference

Search Engine Basics & types of SE

en.wikipedia.org/wiki/web_search_engine.html

www.howstuffsworks.com/howsearchworks.html

PageRank

en.wikipedia.org/wiki/PageRank.html

www.webworkshop.net/pagerank.html

Page 24: Search engine

Thank You…