DIFFERENT COMPONENTS OF A CRAWLABLE SEARCH ENGINE
IMPORTANCE OF SEARCH ENGINE
In recent years, search engines have dramatically increased the
usability of the Web.
DIFFERENT TYPES OF SEARCH ENGINES
TYPES
❏ CRAWLER BASED SEARCH ENGINE
❏ HUMAN POWERED DIRECTORIES
❏ HYBRID SEARCH ENGINES
❏ META SEARCH ENGINES
EXAMPLE
➢ YAHOO
➢ GOOGLE AND YAHOO
➢ DOGPILE
REAL FACTS ABOUT CRAWLABLE SEARCH ENGINES
1. Before September 1993, the World Wide Web was indexed entirely by hand.
2. The first list of web servers was edited by Tim Berners-Lee and hosted on the CERN web server.
3. In 1993, Matthew Gray produced the first web robot, the World Wide Web Wanderer, and used it to generate the first ever index, called ‘Wandex’.
Image Credit: Agronet
DIFFERENT COMPONENTS OF A CRAWLABLE SEARCH ENGINE
❏ PHYSICAL ARCHITECTURAL COMPONENTS
❏ MAJOR DATA STRUCTURAL COMPONENTS
Image Credit: Iconfinder, Stack4Things
PHYSICAL ARCHITECTURAL COMPONENTS
❏ URL SERVER: It provides the crawler with a list of URLs to fetch.
❏ CRAWLER: It automatically traverses the web, downloads web pages and follows links from page to page.
❏ STORE SERVER: It stores the downloaded web pages.
❏ BARREL: It stores the documents processed by the indexer, in fine detail.
❏ SORTER: It re-sorts the barrels' contents to generate the inverted index.
❏ ANCHOR FILE: It holds the source, destination and text of each link.
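The way the URL server, crawler, store server and anchor file interact can be sketched as a small loop. This is only an illustrative sketch, not the architecture's actual code: `FAKE_WEB`, `LinkParser` and `crawl` are hypothetical names, the "web" is an in-memory dictionary so that the example runs offline, and link extraction is done inside the crawl loop for brevity.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">to b</a> '
                         '<a href="http://c.example/">to c</a>',
    "http://b.example/": '<a href="http://a.example/">back to a</a>',
    "http://c.example/": "no links here",
}


class LinkParser(HTMLParser):
    """Collects (destination, anchor text) pairs from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl(seed):
    url_server = deque([seed])  # URL server: queue of URLs still to fetch
    seen = {seed}               # avoids fetching the same URL twice
    store = {}                  # store server: URL -> downloaded page
    anchors = []                # anchor file: (source, destination, text)
    while url_server:
        url = url_server.popleft()
        page = FAKE_WEB.get(url)  # stands in for an HTTP download
        if page is None:
            continue
        store[url] = page
        parser = LinkParser()
        parser.feed(page)
        for dest, text in parser.links:
            anchors.append((url, dest, text))
            if dest not in seen:
                seen.add(dest)
                url_server.append(dest)
    return store, anchors


store, anchors = crawl("http://a.example/")
```

Starting from a single seed URL, the loop visits all three pages and records one anchor entry per link it follows or sees.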
MAJOR DATA STRUCTURAL COMPONENTS -1
❏ BIG FILES: These are virtual files spanning multiple file systems.
❏ REPOSITORY: It contains the full HTML of every page in a compressed format.
❏ DOCUMENT INDEX: A simple index, sorted by Doc ID, that helps to create the forward index and the anchor file.
❏ LEXICON: It is the search engine's dictionary and contains the word list.
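As a rough illustration of these structures, the sketch below keeps a compressed repository, a document index keyed by Doc ID and a lexicon that maps each word to a Word ID. The names `add_document` and `word_id`, the use of `zlib`, and the plain dictionaries are assumptions made for the example, not details taken from the slides.

```python
import zlib

repository = {}      # Doc ID -> full HTML, compressed
document_index = {}  # Doc ID -> URL, filled in Doc ID order
lexicon = {}         # word -> Word ID


def add_document(url, html):
    """Assign the next Doc ID and store the page compressed."""
    doc_id = len(document_index) + 1
    document_index[doc_id] = url
    repository[doc_id] = zlib.compress(html.encode("utf-8"))
    return doc_id


def word_id(word):
    """Return the Word ID for a word, adding it to the lexicon if new."""
    return lexicon.setdefault(word.lower(), len(lexicon) + 1)


page = "<html><body>hello web</body></html>"
doc_id = add_document("http://a.example/", page)
restored = zlib.decompress(repository[doc_id]).decode("utf-8")
ids = [word_id(w) for w in ("Hello", "web", "hello")]
```

Decompressing a repository entry gives back the original HTML, and a repeated word always maps to the same Word ID.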
MAJOR DATA STRUCTURAL COMPONENTS -2
❏ HIT LIST: It records the occurrences of a particular word in a document, including each position.
❏ FORWARD INDEX: It stores the words of each document in partially sorted form, together with the anchor text for the corresponding Doc ID.
❏ INVERTED INDEX: The Sorter rearranges the entries from Doc ID order into Word ID order.
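The relationship between hit lists, the forward index and the inverted index can be shown with a toy example. This is a minimal sketch under simplifying assumptions: the two documents in `docs` are made up, a hit list is just a list of word positions, and the "sorter" is a plain loop that regroups entries by Word ID.

```python
from collections import defaultdict

# Hypothetical documents, keyed by Doc ID.
docs = {
    1: "web crawler downloads web pages",
    2: "crawler follows links",
}

lexicon = {}  # word -> Word ID


def word_id(word):
    return lexicon.setdefault(word, len(lexicon) + 1)


# Forward index: Doc ID -> {Word ID: hit list of positions in that document}
forward_index = {}
for doc_id, text in docs.items():
    hits = defaultdict(list)
    for position, word in enumerate(text.split()):
        hits[word_id(word)].append(position)
    forward_index[doc_id] = dict(hits)

# Sorter: regroup the same hit lists by Word ID instead of Doc ID.
inverted_index = defaultdict(list)
for doc_id in sorted(forward_index):
    for wid, positions in forward_index[doc_id].items():
        inverted_index[wid].append((doc_id, positions))


def search(word):
    """Return (Doc ID, hit list) pairs for a word, or [] if it is unknown."""
    wid = lexicon.get(word)
    if wid is None:
        return []
    return inverted_index.get(wid, [])
```

With this layout a query needs only one lexicon lookup and one inverted-index lookup: `search("crawler")` returns an entry for each of the two documents, while `search("web")` returns a single entry with two positions.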
LOOK, HOW THEY WORK TOGETHER...
Image credit: Stanford
AND WHAT WE SEE……
Image Credit: Slideshare
Feel free to bug us with your queries at:
www.promptcloud.com
email: [email protected]
call: +1-6507310002 (Skype) +91-8041216038