Types of search engines and how they work

Prepared by: Hetal Dodia (8), Asif Kureshi (17), Tejas Patel (27), Nidhi Trivedi (37)

Upload: ratan-gohel

Post on 22-May-2017


Internet
Searching
Search Engine
History
Examples
Types of Search Engine
How It Works

Internet

An interconnected network of thousands of networks and millions of computers, linking businesses, educational institutions, government agencies, and individuals.

Searching

A large amount of information makes a site huge and complex, and makes navigation difficult. Search is the user's lifeline for mastering complex websites. A search feature is essential for users who revisit a site looking for specific information.

Types of Searching

A search can be of various types:

Internet search: Search engines such as Yahoo and Infoseek crawl the web, gathering web pages or information about web pages, index them, and retrieve them when a specific term is searched for.

Database search: Databases store their information neatly organized into fields. A search interface is provided for querying them.
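To make the database case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `books`, its fields, and the sample rows are invented for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical in-memory database; the table and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Web Search Basics", "A. Writer", 2001),
        ("Database Design", "B. Author", 1998),
        ("Searching the Web", "A. Writer", 2005),
    ],
)

def search_by_field(field, value):
    """Return the titles of rows whose given field equals value."""
    # Column names cannot be bound as parameters, so only known names are allowed.
    if field not in {"title", "author", "year"}:
        raise ValueError(f"unknown field: {field}")
    rows = conn.execute(
        f"SELECT title FROM books WHERE {field} = ? ORDER BY title", (value,)
    )
    return [title for (title,) in rows]

print(search_by_field("author", "A. Writer"))
```

Because the data is already split into fields, the search interface only has to compare one field against the value the user typed, with no crawling or text analysis involved.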

Search Engine

A search engine is a tool designed to search for information on the World Wide Web. The information may consist of web pages, images, and other types of files. More generally, a search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits.

Every Internet user benefits from a good knowledge of search engines and searching, in order to explore the Internet more fully.
Search engines help to minimize the time required to find information and the amount of information that must be consulted.
Searching is one of the most common activities on the Internet.
Search engines, as instruments of searching, are special sites on the Web designed to help people find information stored on other sites.

Examples include external engines such as Google, Yahoo, MSN, AOL, and Live.

History of Search Engines

At first there was only a list of web servers. New servers were announced under the title "What's New".
The first tool for searching was Archie.
The rise of Gopher then led to two new search programs, Veronica and Jughead (1991).
Until 1993, no search engine existed for the web.
The web's first primitive search engine was W3Catalog (1993).

History (cont.)

The first full-text, crawler-based search engine was WebCrawler (1994).
Google adopted the idea of selling search terms in 1998 from a small company named goto.com.
Search engines were among the brightest stars of the Internet investing frenzy.
Google rose to prominence (2000).
Microsoft's first search engine, MSN Search, used search results from Inktomi.

History (cont.)

Microsoft rebranded its search engine as Bing, which launched on June 1, 2009.
On July 29, 2009, Yahoo and Microsoft announced a deal under which Yahoo search would be powered by Bing.
In 2012, Google released the beta version of Open Drive, available as a Chrome application.

Market of Search Engines

The best and most popular search engine

The 4th most visited website in the world

Ask question search engine

Microsoft Bing search engine

Types Of Search Engines

Crawler-Based Search Engines
Human-Powered Directories
Hybrid Search Engines or Mixed Results
Meta Search Engines

Crawler-Based Search Engines

Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found.

If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.

When the search topic is general, crawler-based search engines may return hundreds of thousands of irrelevant responses to simple search requests, including lengthy documents in which your keyword appears only once.

Crawler-based search engines work well when you have a specific search topic in mind, and can be very efficient at finding relevant information in that situation.

Examples: Google, AllTheWeb and AltaVista.

Human-Powered Directories

A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description of your entire site to the directory, or editors write one for the sites they review. A search looks for matches only in the descriptions submitted.

Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.

Human-powered directories are good when you are interested in a general topic. In this situation, a directory can guide you, help you narrow your search, and return refined results. Search results found in a human-powered directory are therefore usually more relevant to the search topic and more accurate. However, a directory is not an efficient way to find information when you have a specific search topic in mind.

Examples: Yahoo Directory, Open Directory and LookSmart.

Pros of Human-Powered Directories

Fast answers (sometimes).

Answers are sent directly to your phone or email. This is especially beneficial if you are on the go and using a service such as ChaCha or KGB, which lets you ask your question and receive the answer via text message.

Sometimes standard search engines don't understand what you are asking, and that is where dealing with an actual human helps.

Cons of Human-Powered Directories

Lengthy search time: You may have to wait for what seems like forever before receiving an answer.

Unanswered questions: While some sites may take days, other sites may never have an answer for your question.

Human error: We all know and trust Google to deliver our answers, but we have no idea who is answering our questions on human-powered sites, or what their qualifications are. Would you trust just anyone? Because I certainly don't.

Annoying categorization: Many sites ask you to categorize, sub-categorize, and sub-sub-categorize your questions, which takes the simplicity out of these human-powered search engines.

Hybrid Search Engines or Mixed Results

Hybrid search engines use a combination of crawler-based results and directory results. More and more search engines these days are moving to a hybrid model.

It is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other.

For example, MSN Search is more likely to present human-powered listings from LookSmart. Examples: Yahoo, Google.

Meta Search Engine

A meta search engine transmits user-supplied keywords simultaneously to several individual search engines, which actually carry out the search.

The search results returned from all the search engines can be integrated, duplicates can be eliminated, and additional features, such as clustering by subject within the search results, can be implemented by the meta-search engine.
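The integration and duplicate-elimination step can be sketched as follows. The scoring rule (summing 1/rank across engines) and the example URLs are assumptions chosen for illustration; the slides do not say how a real meta-search engine combines its lists.

```python
def merge_results(result_lists):
    """Merge ranked URL lists from several engines, dropping duplicates.

    Each URL is scored by summing 1/rank over every list it appears in, so a
    page returned by several engines moves up. This rule is illustrative only.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from two primary engines.
engine_a = ["a.example/page", "b.example/page", "c.example/page"]
engine_b = ["b.example/page", "d.example/page"]
print(merge_results([engine_a, engine_b]))
```

Here `b.example/page` appears in both lists, so it is listed once and ranked first in the merged output.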

Meta Search Engine (cont.)

Meta-search engines save time by letting you search in one place, sparing you the need to use and learn several separate search engines.

But since meta-search engines do not allow for input of many search variables, their best use is to find hits on obscure items or to see whether something can be found on the Internet at all.

Pros of meta search engines

1. Searching with many primary search engines often finds results missed by a single primary engine.

2. Requesting results from many primary engines in parallel saves time.

3. Eliminating duplicate results also saves time.

4. Getting results from many different primary engines provides opportunities to explore how best to combine the separate result lists.

Cons of meta search engines

1. Timeouts or long waits may occur if the meta search engine is having difficulty contacting the primary engine.

2. Many meta search engines only get the top 10 to 50 results per primary engine.

3. Some advanced features (ex. phrase searching) may not be available.

4. Many meta search engines exclude one or more of the major primary search engines (Google, Microsoft, or Yahoo).

How it works

Index ahead of time
Find files or records
Open each one and read it
Store each word in a searchable index
Provide search forms
Match the query terms with words in the index
Sort documents by relevance
Display results

1. Index ahead of time by spiders

To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites.

A spider is a program that automatically fetches Web pages. Spiders are used to feed pages to search engines. It is called a spider because it crawls over the Web. Another term for these programs is web crawler.
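As a rough sketch of what a spider does, the example below follows links breadth-first and collects the words on each page. To stay self-contained it reads pages from an invented in-memory dictionary (`FAKE_WEB`) instead of fetching them over HTTP; the class and function names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# A tiny pretend web: URL -> HTML. These pages are invented for the example.
FAKE_WEB = {
    "http://site.example/": '<a href="http://site.example/a">A</a> home page',
    "http://site.example/a": '<a href="http://site.example/">back</a> about spiders',
}

class PageParser(HTMLParser):
    """Collects outgoing links and visible words from one page."""

    def __init__(self):
        super().__init__()
        self.links, self.words = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl(start_url):
    """Breadth-first crawl; returns {url: list of words found on the page}."""
    seen, queue, found = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(FAKE_WEB.get(url, ""))  # a real spider would fetch over HTTP
        parser.close()                      # flush any buffered trailing text
        found[url] = parser.words
        for link in parser.links:
            if link not in seen:            # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return found

pages = crawl("http://site.example/")
print(sorted(pages))
```

The `seen` set is what keeps the spider from looping forever when two pages link to each other, as they do here.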

Cont.

Spiders store the lists in the engine's database.

The engine's indexing software builds an index of words.

Information is matched against the query input and retrieved (processing algorithm).
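A minimal sketch of these three steps, assuming the spider has already produced a mapping from URL to page text. The sample pages and the function names `build_index` and `lookup` are made up for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, word):
    """Return the URLs matching a single-word query, sorted for display."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical crawl output: URL -> page text.
pages = {
    "http://one.example": "search engines index the web",
    "http://two.example": "spiders crawl the web",
}
index = build_index(pages)
print(lookup(index, "web"))
print(lookup(index, "spiders"))
```

Because the index is built ahead of time, answering a query is a single dictionary lookup rather than a scan of every page.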

What the Index Needs

Basic information for the document or record:
  File name / URL / record ID
  Title or equivalent
  Size, date, MIME type
Full text of item
More metadata:
  Product name, picture ID
  Category, topic, or subject
  Other attributes, for relevance ranking and display
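One way to picture such an index entry is as a small record type. The field names below are assumptions that mirror the list above, and the sample values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class IndexRecord:
    """One indexed document; field names are illustrative, not a standard."""
    url: str            # file name / URL / record ID
    title: str          # title or equivalent
    size_bytes: int
    date: str
    mime_type: str
    full_text: str
    # Extra attributes such as product name, category, or subject.
    metadata: dict = field(default_factory=dict)

record = IndexRecord(
    url="http://site.example/guide",
    title="Search guide",
    size_bytes=2048,
    date="2017-05-22",
    mime_type="text/html",
    full_text="how search engines work",
    metadata={"category": "tutorial"},
)
print(record.title, record.metadata["category"])
```

Keeping the title, date, and metadata next to the full text lets the engine both rank a hit and display a useful listing without reopening the original file.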

Simple Index Diagram

Cont.

Once the spiders have completed the task of finding information on Web pages, the search engine must store the information in a way that makes it useful.

A search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page.

Cont.

Ranking: the engine builds a list that tries to present the most useful pages at the top of the search results.

The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
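The weighting idea can be sketched like this. The weight values in `FIELD_WEIGHTS` and the two sample documents are invented; real engines keep their weighting formulas private.

```python
# Hypothetical weights: the numbers are illustrative, not from any real engine.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "link": 2.0, "meta": 2.0, "body": 1.0}

def score(document, term):
    """Sum the field weight for every occurrence of term in the document."""
    term = term.lower()
    total = 0.0
    for field_name, text in document.items():
        count = text.lower().split().count(term)
        total += FIELD_WEIGHTS.get(field_name, 1.0) * count
    return total

doc_a = {"title": "Search engines", "body": "How search works"}
doc_b = {"title": "Cooking", "body": "search for recipes and search again"}

print(score(doc_a, "search"))  # one title hit and one body hit
print(score(doc_b, "search"))  # two body hits
```

Both documents contain the word twice, but `doc_a` scores higher because one of its occurrences is in the title.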

An index has a single purpose: It allows information to be found as quickly as possible. There are quite a few ways for an index to be built, but one of the most effective ways is to build a hash table. In hashing, a formula is applied to attach a numerical value to each word.
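A toy version of hashing, assuming a deliberately simple formula (the sum of character codes modulo the number of buckets). Production hash functions are more elaborate, and Python's own `dict` is already a hash table; this sketch only exposes the mechanism.

```python
NUM_BUCKETS = 8  # small on purpose so the example is easy to follow

def word_hash(word):
    """Toy formula: sum of character codes, reduced to a bucket number."""
    return sum(ord(ch) for ch in word) % NUM_BUCKETS

def build_table(entries):
    """Place (word, url) pairs into buckets chosen by the hash value."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for word, url in entries:
        buckets[word_hash(word)].append((word, url))
    return buckets

def find(buckets, word):
    """Look only in the one bucket the word hashes to."""
    return [url for w, url in buckets[word_hash(word)] if w == word]

# Hypothetical (word, URL) pairs.
table = build_table([("web", "u1"), ("search", "u2"), ("web", "u3")])
print(find(table, "web"))
```

A lookup touches a single bucket instead of the whole list, which is why hashing keeps retrieval fast as the index grows.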

2. Provide search form

Searching through an index involves a user building a query and submitting it through the search engine.

The query can be quite simple, a single word at minimum. Building a more complex query requires the use of Boolean operators that allow you to refine and extend the terms of the search.

Boolean operators: AND, OR, NOT, FOLLOWED BY, NEAR, etc.
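If the index maps each word to a set of document IDs, the three basic operators correspond directly to set operations. The small index below is invented for illustration.

```python
# Hypothetical index: word -> set of document IDs containing it.
INDEX = {
    "search": {1, 2, 3},
    "engine": {1, 3},
    "spider": {2, 3},
}

def docs(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return INDEX.get(term, set())

search_and_engine = docs("search") & docs("engine")  # AND: intersection
engine_or_spider = docs("engine") | docs("spider")   # OR: union
search_not_spider = docs("search") - docs("spider")  # NOT: difference

print(search_and_engine, engine_or_spider, search_not_spider)
```

FOLLOWED BY and NEAR cannot be answered from sets of document IDs alone; they would need the index to also record the position of each word within a document.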

Cont.

Most search engines support caching, which dramatically reduces the time cost of searching for common terms such as "Amazon". If the site receives a query whose result is stored in the cache, it returns the result from the cache without sending a query request to the main database.
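A minimal sketch of query caching using `functools.lru_cache` from the Python standard library. The function `query_main_database` is a hypothetical stand-in for the expensive lookup, and the counter exists only to show when the slow path is taken.

```python
from functools import lru_cache

database_hits = 0  # counts how often the slow path is taken

def query_main_database(query):
    """Stand-in for the expensive lookup; a hypothetical placeholder."""
    global database_hits
    database_hits += 1
    return (f"results for {query}",)

@lru_cache(maxsize=1024)
def search(query):
    """Return cached results when available, else ask the main database."""
    return query_main_database(query)

search("amazon")  # first time: goes to the main database
search("amazon")  # repeated query: served from the cache
search("books")   # new query: goes to the main database
print(database_hits)
```

Three searches cause only two database lookups, because the repeated query is answered from the cache.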

3. Display result

After the search engine has received the result from the main database or the cache, the site has to display it to the user. The listing is usually quite simple: a list of the web pages that were hit, each with a description of the site. However, the order of the list is important, yet difficult to judge by pure computation.

Page rank

Once the search engine has found web pages for the given query, in what order should the links be presented?
Google researchers invented PageRank.
Some pages are found to be more important than others, so if two pages match a query, they are ordered so that the more important page's link comes first.
Ordering is based on the page rank, which primarily looks at whether a page is an authoritative page, meaning that a lot of other pages link to it.
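The sketch below shows the commonly published simplified form of the idea: each page repeatedly passes a share of its rank to the pages it links to. The damping factor of 0.85, the iteration count, and the four-page link graph are assumptions for illustration; this is not Google's production algorithm.

```python
def page_rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Every link target must also appear as a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # a page with no links shares with all
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = page_rank(links)
best = max(ranks, key=ranks.get)
print(best)
```

Page C comes out on top because three other pages link to it, which matches the description of an authoritative page.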

Cont.

Similarly, a hub is a page which has a lot of outgoing links and may represent a good starting point.
Advertising can also affect the order in which pages are offered.
Advertisers will pay search engine sites to place their links before others, or in special areas of the web page.
If you go to Google and search for "computers", you get links for Dell, Apple, Staples, and others near the top and to the right of the page. Why? They paid to be there!
Best Buy didn't pay as much, so they are located lower down!
This is a consequence of commercializing the web: money talks.

Search Will Never Be Perfect

Search engines can't read minds.
User queries are short and ambiguous.
Some things will help:
  Design a usable interface
  Show match words in context
  Keep index current and complete
  Adjust heuristic weighting
  Maintain suggestions and synonyms
  Consider faceted metadata search
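As one example of these aids, showing match words in context can be sketched as a small snippet function. The bracket highlighting, the window of three words, and the sample sentence are arbitrary choices for illustration.

```python
def snippet(text, term, width=3):
    """Return the first occurrence of term with `width` words on each side."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower().strip(".,!?") == term.lower():
            start, end = max(0, i - width), i + width + 1
            shown = words[start:i] + [f"[{word}]"] + words[i + 1:end]
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(shown) + suffix
    return ""  # term not found

text = "A spider is a program that automatically fetches Web pages for the index."
print(snippet(text, "program"))
```

Seeing the matched word with its neighbors lets the user judge a hit's relevance without opening the page.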

Thank you