Search Engine
Submitted by
Swaraj Kumar Dehuri
CSE, 1201109060
Contents
What is a search engine?
Why search engine?
History of SE
How search engine works
o Web crawling
o Indexing
o Searching
Search Engine Optimization
PageRank Algorithm
Types of SE
Conclusion
What is a search engine?
A search engine is a software program that searches a
database and gathers and reports information that contains or
is related to specified terms.
Or,
It is a website whose primary function is to provide a search
facility for gathering and reporting information available on
the internet or a portion of the internet.
Examples of Search engine:
Why Search Engine?
Today, millions and billions of pieces of information are
available on the vast WWW. Searching for a specific piece of
information by hand would cost the user a great deal of time.
We therefore need tools that make this search automatic,
quick, and effortless. To reduce the problem to a more or
less manageable one, web search engines were introduced in
the 1990s.
History of search engine
In 1990, the first search engine, ARCHIE, was released. At
that time there was no World Wide Web. Data resided on
defence contractor, university, and government computers,
and techies were the only people accessing the data.
Computers were interconnected by Telnet.
File Transfer Protocol (FTP) was used for transferring files
from computer to computer.
In 1994, WebCrawler, a new type of search engine that
indexed the entire contents of a webpage, was introduced.
Around 1998, search engine algorithms were introduced
to optimize searching.
How search engine works
Web crawling
Indexing
Searching
Web Crawling
Spiders: To find information on the hundreds of millions of
Web pages that exist, a search engine employs special
software robots, called spiders, to build lists of the words
found on Web sites.
Crawling: When a spider is building its lists, the process is
called Web Crawling. In order to build and maintain a
useful list of words, a search engine's spiders have to look
at a lot of pages.
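The crawling process described above can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: the in-memory `PAGES` dictionary stands in for the real Web, and the page names, page text, and the `crawl` function are all hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("spiders build lists of words", ["c.example"]),
    "c.example": ("the index stores words", ["a.example"]),
}

def crawl(seed):
    """Visit every page reachable from seed and return {url: list of words}."""
    word_lists = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in word_lists or url not in PAGES:
            continue  # already visited, or a dead link
        text, links = PAGES[url]
        word_lists[url] = text.split()  # the spider's list of words for this page
        queue.extend(links)             # follow the links found on the page
    return word_lists

word_lists = crawl("a.example")
for url, words in word_lists.items():
    print(url, words)
```

A real spider would fetch pages over the network and revisit them periodically; the sketch only shows how following links lets one seed page lead to many others.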
Indexing
The indexer then reads the crawled documents and creates
an index based on the words contained in each document.
Search engine indexing collects, parses, and stores data to
facilitate fast and accurate information retrieval.
The purpose of storing an index is to optimize speed and
performance in finding relevant documents for a search
query. Without an index, the search engine would have to scan
every document in the corpus, which would require
considerable time and computing power.
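One common way to build such an index is an inverted index, which maps each word to the documents that contain it. The sketch below assumes a tiny hypothetical corpus (`DOCS`) and a hypothetical helper `build_index`; it is not how any particular search engine stores its data.

```python
from collections import defaultdict

# Hypothetical documents, keyed by document id.
DOCS = {
    1: "search engines crawl the web",
    2: "spiders crawl web pages",
    3: "an index speeds up search",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_index(DOCS)
print(sorted(index["crawl"]))   # ids of documents containing "crawl"
print(sorted(index["search"]))  # ids of documents containing "search"
```

Looking up a word is now a single dictionary access instead of a scan over every document.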
Searching
Each search engine uses a proprietary algorithm to create
its indices such that, ideally, only meaningful results are
returned for each query.
Search algorithm: Unique to every search engine, and
just as important as keywords, search engine algorithms
are the why and the how of search engine rankings.
Basically, a search engine algorithm is a set of rules, or a
unique formula, that the search engine uses to determine
the significance of a web page, and each search engine
has its own set of rules.
E.g., Google uses the PageRank algorithm.
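To make the idea of a ranking rule concrete, here is a toy sketch. The scoring rule (count how often the query words occur) is an assumption chosen for simplicity and is not the formula of any real search engine; `DOCS` and `search` are hypothetical names.

```python
# Hypothetical documents, keyed by document id.
DOCS = {
    1: "search engines rank web pages",
    2: "web pages link to other web pages",
    3: "cooking recipes",
}

def search(query, docs):
    """Return ids of matching documents, highest score first.

    The score is the number of times the query words occur in the document,
    a toy rule standing in for a real engine's proprietary formula.
    """
    terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score; break ties by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

results = search("web pages", DOCS)
print(results)
```

Document 2 mentions both query words twice, so it is listed before document 1; document 3 does not match and is left out.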
Search Engine Indexing process
Search Engine Optimization
It is the process of improving the volume and quality of
traffic to a website from search engines.
As a marketing strategy for increasing a site's relevance,
SEO considers how search algorithms work and what people
search for.
Companies use it to rank higher in search engine results.
PageRank Algorithm
PageRank was once the most important part of Google’s ranking system
and of search engine optimization. It is a link analysis algorithm applied
by Google.com that assigns a number, or rank, to each hyperlinked web
page within the World Wide Web.
The original PageRank algorithm, which was described by Larry Page
and Sergey Brin, is given by
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where,
PR(A) - PageRank of page A
PR(Ti) - PageRank of pages Ti which link to page A
C(Ti) - number of outbound links on page Ti
d - damping factor, which can be set between 0 and 1
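The formula translates almost directly into code. The sketch below applies it once for a single page; the function name `pagerank_of`, the two-page web, and the choice d = 0.85 are illustrative assumptions rather than part of the original description.

```python
def pagerank_of(page, ranks, links, d):
    """Apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    ranks: current PageRank of every page
    links: page -> list of pages it links to (its outbound links)
    d:     damping factor
    """
    total = 0.0
    for source, targets in links.items():
        if page in targets:                        # source is one of the Ti
            total += ranks[source] / len(targets)  # PR(Ti) / C(Ti)
    return (1 - d) + d * total

# Hypothetical two-page web: X and Y link to each other.
links = {"X": ["Y"], "Y": ["X"]}
ranks = {"X": 1.0, "Y": 1.0}
pr_x = pagerank_of("X", ranks, links, d=0.85)  # d = 0.85 is an illustrative value
print(pr_x)
```

Because X receives all of Y's PageRank and nothing else, its value stays at the starting value of 1 in this symmetric case.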
Simplified Algorithm(PR)
Assume four web pages: A, B, C, and D. PageRank is initialized to the same
value for all pages and is treated as a probability distribution between 0 and 1.
Hence the initial value for each page is 0.25.
At iteration 0, the PageRank of A is 0.25.
If the only links in the system were from pages B, C, and D to A, each link
would transfer 0.25 PageRank to A upon the next iteration, for a total of
0.75.
PR(A)=PR(B)+PR(C)+PR(D)
[Figure: pages B, C, and D link to page A. Before the iteration every page has PageRank 0.25; after it, A has 0.75 and B, C, and D are shown with 0.25.]
(cont.)
Suppose instead that page B had links to pages C and A, page C
had a link to page A, and page D had links to all three
pages.
Page B would transfer half of its existing value, or
0.125, to Page A and the other half, or 0.125, to Page
C.
Page C would transfer all of its existing value, 0.25, to
the only page it links to, Page A.
Since Page D had three outbound links, it would transfer one
third of its existing value, or approximately 0.083, to
Page A.
PR(A)=PR(B)/2+PR(C)/1+PR(D)/3
So, the PageRank of A is approximately 0.458.
[Figure: A receives 0.125 from B, 0.25 from C, and 0.083 from D, giving 0.458.]
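The arithmetic of this simplified step can be checked with a few lines of code. The link structure is taken from the example; page A's own outbound links are not given in the text, so they are left empty here as an assumption, and the helper name `received` is hypothetical.

```python
# Outbound links from the example: B -> C, A; C -> A; D -> A, B, C.
# A's own links are not stated in the example, so none are assumed.
links = {"A": [], "B": ["C", "A"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = {page: 0.25 for page in links}  # initial value for each page

def received(page):
    """Sum the shares that every linking page passes to `page`."""
    return sum(ranks[src] / len(out) for src, out in links.items() if page in out)

pr_a = received("A")
print(round(pr_a, 3))  # 0.458
```

Each page splits its current value evenly over its outbound links, which is why D contributes only a third of 0.25.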
Example
Damping factor (d) = 0.5
PR(A) = 0.5 + 0.5 PR(C) = 1.07692308
PR(B) = 0.5 + 0.5 (PR(A) / 2) = 0.76923077
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B)) = 1.15384615
Because of the size of the actual web, the Google search engine uses an
approximate, iterative computation of PageRank values. This means that
each page is assigned an initial starting value, and the PageRank of all pages
is then calculated over several computation cycles based on the equations
determined by the PageRank algorithm.
[Figure: a three-page web in which A links to B and C, B links to C, and C links to A.]
The Iterative Computation of PageRank
Iteration 0:
Initially, the PageRank of each web page is 1.
Iteration 10:
The sum of all pages' PageRank values still
converges to the total number of web
pages, so the average PageRank of a
web page is 1.
Iteration  PR(A)       PR(B)       PR(C)
0          1           1           1
1          1           0.75        1.125
2          1.0625      0.765625    1.1484375
3          1.07421875  0.76855469  1.15283203
4          1.07641602  0.76910400  1.15365601
5          1.07682800  0.76920700  1.15381050
6          1.07690525  0.76922631  1.15383947
7          1.07691973  0.76922993  1.15384490
8          1.07692245  0.76923061  1.15384592
9          1.07692296  0.76923074  1.15384611
10         1.07692305  0.76923076  1.15384615
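The iterative computation can be reproduced with a short loop. The sketch below uses the three example equations with d = 0.5 and assumes that each newly computed value is used immediately within the same iteration, which is what the numbers in the table suggest; the function name `iterate` is hypothetical.

```python
D = 0.5  # damping factor from the example

def iterate(n):
    """Run n rounds, using each freshly updated value straight away."""
    a = b = c = 1.0
    rows = [(0, a, b, c)]
    for i in range(1, n + 1):
        a = (1 - D) + D * c            # PR(A) = 0.5 + 0.5 PR(C)
        b = (1 - D) + D * (a / 2)      # PR(B) = 0.5 + 0.5 (PR(A) / 2)
        c = (1 - D) + D * (a / 2 + b)  # PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))
        rows.append((i, a, b, c))
    return rows

rows = iterate(10)
for i, a, b, c in rows:
    print(f"{i:2d}  {a:.8f}  {b:.8f}  {c:.8f}")
```

After ten rounds the values agree with the exact solution of the three equations (14/13, 10/13, and 15/13) to about seven decimal places.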
How to Increase PageRank
Add spam pages.
Join forums.
Submit to search engine directories.
Exchange reciprocal links.
Publish relevant content. Quality content
is the number one driver of your search
engine rankings, and there is no substitute
for great content.
[Figure: pages A (331.0) and B (281.6), with spam pages Spam 1 through Spam 1000 at 0.39 each.]
Types of search engine
Crawler based search engine
These types of search engines use a "spider" or a "crawler" to search the
Internet. The crawler digs through individual web pages, pulls out keywords and
then adds the pages to the search engine's database. Google and Yahoo are
examples of crawler search engines.
Directories
Directories depend on human editors to create their listings or the database.
Yahoo Directory, Open Directory, and LookSmart are a few examples.
(cont.)
Hybrid search engines
Hybrid search engines are search engines that use both crawler based searches
and directory searches to obtain their results. Example: Google, Yahoo
Meta search engine
These transmit user-supplied keywords simultaneously to several individual
search engines to actually carry out the search.
Search results returned from all the search engines can be integrated, duplicates
can be eliminated and additional features such as clustering by subjects within
the search results can be implemented by meta-search engines.
Example: Dogpile, MetaCrawler
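The merge-and-deduplicate step of a meta-search engine can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed result lists, and the sketch queries them one after another rather than simultaneously; a real meta-search engine would send the query over the network to actual search engines.

```python
# Hypothetical stand-ins for individual search engines; each returns a
# ranked list of URLs for a query.
def engine_one(query):
    return ["a.example", "b.example", "c.example"]

def engine_two(query):
    return ["b.example", "d.example", "a.example"]

def meta_search(query, engines):
    """Send the query to every engine and merge results, dropping duplicates."""
    merged = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:  # keep only the first occurrence of each URL
                seen.add(url)
                merged.append(url)
    return merged

merged = meta_search("search engine", [engine_one, engine_two])
print(merged)
```

Keeping the first occurrence preserves the order of the earlier engine; other merging policies, such as interleaving or re-scoring, are equally possible.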
Conclusion
A search engine is designed to return relevant results. The primary goal is to
provide high-quality search results over a rapidly growing World Wide Web.
For example, Google employs a number of techniques to improve search quality,
including PageRank, anchor text, and proximity information. Furthermore,
Google is a complete architecture for gathering web pages, indexing them,
and performing search queries over them.
Reference
Search Engine Basics & types of SE
en.wikipedia.org/wiki/web_search_engine.html
www.howstuffsworks.com/howsearchworks.html
PageRank-
en.wikipedia.org/wiki/PageRank.html
www.webworkshop.net/pagerank.html
Thank You…