Internet Exploration: Search Engines
Computer Information Technology – Section 3-2

Upload: audi

Posted on 24-Feb-2016




The Internet
Objectives:

The student will:
1. Understand search engines and how they work.
2. Understand the pros and cons of various popular search engines.
3. Understand the definitions of terms associated with search engines.
4. Perform a basic search and compare results from different search engines.


How does a search work?
Google gives a quick tour of how a search works: http://www.google.com/intl/en/insidesearch/howsearchworks/thestory/index.html


Search Engines
Search Engine: A program that searches documents for specified keywords and returns a list of the documents where the keywords were found. Without search engines, finding anything on the web would be extremely difficult.

Typically, a search engine works by sending out a spider to fetch as many documents as possible.

Spider: A program that automatically fetches Web pages. Spiders are used to feed pages to search engines. It's called a spider because it crawls over the Web; another term for these programs is Web crawler. Because most Web pages contain links to other pages, a spider can start almost anywhere: as soon as it sees a link to another page, it goes off and fetches it.
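The spider behavior described above (fetch a page, find its links, go fetch those) can be sketched as a breadth-first walk over the link graph. This is a minimal sketch, not how any real search engine's crawler is built: it assumes a tiny hypothetical "web" stored in a Python dictionary (page name to list of linked pages) instead of real HTTP downloads, so it runs offline. `WEB`, `crawl`, and `max_pages` are illustrative names.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
# A real spider would download each page over HTTP and parse out its links.
WEB = {
    "home": ["news", "sports"],
    "news": ["home", "weather"],
    "sports": ["scores"],
    "weather": [],
    "scores": ["news"],
    "orphan": ["home"],  # links out, but no page links TO it
}


def crawl(seed, web, max_pages=100):
    """Breadth-first crawl starting at `seed`; returns pages in visit order."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)            # "fetch" the page
        for link in web.get(page, []):  # follow every link found on it
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    print(crawl("home", WEB))
```

Starting from "home", this visits home, news, sports, weather, and scores. The page "orphan" is never fetched because nothing links to it, which is the same limitation the slides describe for pages that are never linked to from any other page.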


Search Engines
Spiders, or crawlers, visit a Web site, read the information on the site itself, read the site's meta tags, and follow the links on the site, indexing every linked Web site as well.

Meta tag: A special HTML tag that provides information about a Web page. Meta tags are not displayed on the page itself; they provide information such as who created the page, how often it is updated, what the page is about, and which keywords represent the page's content.
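To make "reading the meta tags" concrete, here is a small sketch that pulls `name`/`content` pairs out of a page's `<meta>` tags using Python's standard `html.parser` module. The sample page and its tag values are made up for illustration; `MetaTagReader` and `read_meta_tags` are hypothetical names, and a real crawler would handle many more tag types.

```python
from html.parser import HTMLParser

# Made-up page: the <meta> tags sit in <head> and are not shown to visitors.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Example School Home Page</title>
    <meta name="author" content="Web Team">
    <meta name="description" content="News and events for students">
    <meta name="keywords" content="school, sports, clubs">
  </head>
  <body><p>Welcome!</p></body>
</html>
"""


class MetaTagReader(HTMLParser):
    """Collects name/content pairs from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name is not None:
                self.meta[name.lower()] = attributes.get("content", "")


def read_meta_tags(html):
    """Return a dict of meta tag names to their content values."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta


if __name__ == "__main__":
    for name, content in read_meta_tags(SAMPLE_PAGE).items():
        print(f"{name}: {content}")
```

None of this text appears when the page is viewed in a browser, but a spider can read it directly from the HTML source.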

The crawler returns all that information to a central repository, where the data is indexed.

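This is the data the search engine searches! Because the index is a stored copy of what the crawler found on its last visit, pages that have since been moved or deleted can still appear in the results. This is why search engines sometimes return links that are no longer valid.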
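One common way to picture that central index is an inverted index: a table from each word to the pages that contained it when they were crawled. The sketch below is a simplified illustration with made-up page text; `CRAWLED`, `build_index`, and `search` are hypothetical names. Because `search` only consults the stored index, a page removed from the live web after the crawl is still returned.

```python
import re
from collections import defaultdict

# Made-up snapshot the crawler brought back (page URL -> page text).
CRAWLED = {
    "example.com/a": "School sports schedule and scores",
    "example.com/b": "School lunch menu",
    "example.com/c": "Local sports news",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of pages that contained it at crawl time."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return pages containing every query word, using only the stored index."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    index = build_index(CRAWLED)       # snapshot taken at crawl time
    live_web = dict(CRAWLED)
    del live_web["example.com/a"]      # the page is later removed from the web
    for url in sorted(search(index, "school sports")):
        status = "live" if url in live_web else "dead link"
        print(url, status)
```

The query still finds "example.com/a" and reports it as a dead link, because the index was never updated after the page disappeared.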


Search Engines
Crawlers find pages by following links from other web pages, so if a web page is never linked to from any other page (and no one submits it to the search engine), search engine spiders cannot find it.

Crawlers return to web pages periodically to update the database.
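A simple way to picture these periodic return visits is to record when each page was last crawled and re-fetch any page older than a fixed interval. This is a hypothetical sketch: real search engines decide how often to revisit pages in their own ways, and the seven-day `REVISIT_INTERVAL`, the sample dates, and the `pages_due` function are assumptions for illustration only.

```python
from datetime import datetime, timedelta

REVISIT_INTERVAL = timedelta(days=7)  # assumed interval, for illustration only


def pages_due(last_crawled, now, interval=REVISIT_INTERVAL):
    """Return pages whose last crawl was at least `interval` before `now`."""
    return sorted(url for url, when in last_crawled.items()
                  if now - when >= interval)


if __name__ == "__main__":
    now = datetime(2016, 2, 24)
    last_crawled = {  # made-up crawl history
        "example.com/news": datetime(2016, 2, 23),
        "example.com/about": datetime(2016, 1, 10),
        "example.com/events": datetime(2016, 2, 17),
    }
    print(pages_due(last_crawled, now))
```

Until a page comes due and is fetched again, the index keeps whatever version the crawler saw last.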


Search Engines – Why they give different results
Not all indices are going to be exactly the same. It depends on what the spiders find (or what humans submitted).

Not every search engine uses the same algorithm to search through its index. The algorithm is what a search engine uses to determine how relevant the information in its index is to what the user is searching for.

Algorithm: A formula or set of steps for solving a particular problem.
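To see how two engines can rank the same index differently, here is a sketch with two made-up scoring rules applied to the same three pages. Rule A counts how often the query words appear in the page text; rule B counts how many different query words appear and adds a point for each one found in the title. Neither rule is any real search engine's algorithm; the pages, `score_by_frequency`, `score_by_coverage`, and `rank` are all hypothetical.

```python
# Made-up pages: URL -> (title, body text).
PAGES = {
    "page1": ("Cooking tips", "pasta pasta pasta recipes"),
    "page2": ("Pasta sauce", "tomato sauce for pasta"),
    "page3": ("Dinner ideas", "quick pasta and sauce dinner"),
}


def score_by_frequency(title, body, terms):
    """Rule A: total times the query terms appear in the body (title ignored)."""
    words = body.lower().split()
    return sum(words.count(t) for t in terms)


def score_by_coverage(title, body, terms):
    """Rule B: one point per term found in the body, plus one per term in the title."""
    body_words = set(body.lower().split())
    title_words = set(title.lower().split())
    return sum((t in body_words) + (t in title_words) for t in terms)


def rank(pages, query, scorer):
    """Order pages by score, highest first; ties broken alphabetically."""
    terms = query.lower().split()
    scores = {url: scorer(title, body, terms) for url, (title, body) in pages.items()}
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    print("Rule A:", rank(PAGES, "pasta sauce", score_by_frequency))
    print("Rule B:", rank(PAGES, "pasta sauce", score_by_coverage))
```

For the query "pasta sauce", rule A puts page1 first (it repeats "pasta" three times), while rule B puts page2 first (both words appear in its title and body). Same index, different algorithm, different results.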


Search Engines – Why they give different results
Google has one of the largest databases, but studies indicate that less than half of the searchable web is searchable in Google.

Studies also show that more than 80% of the pages in a major search engine's database exist only in that database.

When doing research try different search engines!
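The "exists only in that database" figure is a set calculation: take one engine's pages and remove every page that any other engine also has. The sketch below uses three tiny invented indexes; the engine names, page IDs, and resulting percentages are made up and say nothing about real search engines. `unique_share` is a hypothetical helper.

```python
# Invented example indexes: engine name -> set of pages it has indexed.
INDEXES = {
    "Engine A": {"p1", "p2", "p3", "p4", "p5"},
    "Engine B": {"p4", "p5", "p6"},
    "Engine C": {"p5", "p7"},
}


def unique_share(engine, indexes):
    """Fraction of `engine`'s pages that appear in no other engine's index."""
    own = indexes[engine]
    others = set().union(*(pages for name, pages in indexes.items()
                           if name != engine))
    return len(own - others) / len(own)


if __name__ == "__main__":
    for name in sorted(INDEXES):
        share = unique_share(name, INDEXES)
        print(f"{name}: {share:.0%} of its pages are found nowhere else")
```

Here Engine A has 3 of its 5 pages (60%) that no other engine indexed, so a search that only used the other two engines would never find them.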


Search Engines – Comparison

Size, type
Google: HUGE. Size not disclosed in any way that allows comparison. Probably the biggest.
Yahoo: HUGE. Claims over 20 billion total "web objects."
Ask.com: LARGE. Claims to have 2 billion fully indexed, searchable pages.

Noteworthy features
Google: Many additional databases, including Book Search, Scholar (journal articles), Blog Search, Patents, Images, etc.
Yahoo: Shortcuts give quick access to dictionary, synonyms, patents, traffic, stocks, encyclopedia, and more.

Boolean logic
Google: Partial. AND assumed between words. Capitalize OR. ( ) accepted but not required. In Advanced Search, partial Boolean available in boxes.
Yahoo: Accepts AND, OR, NOT, or AND NOT. Must be capitalized. ( ) accepted but not required.
Ask.com: Partial. AND assumed between words. Capitalize OR. - excludes. No ( ) or nesting.
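The "partial Boolean" rules above (AND assumed between words, a capitalized OR between alternatives, a leading - to exclude a word) can be sketched as a tiny query matcher over a few made-up pages. This parser is a simplification written for this example, not how any of these engines actually parses queries. It has no parentheses and does not handle Yahoo's NOT or AND NOT. It treats OR as joining only its two neighboring words, so `cats OR dogs -fish` means (cats OR dogs) AND NOT fish. `DOCS`, `parse`, `matches`, and `search` are hypothetical names.

```python
# Made-up pages to search.
DOCS = {
    "d1": "cats and dogs as pets",
    "d2": "dogs and fish tanks",
    "d3": "cats only",
}


def parse(query):
    """Split a query into OR-groups (implicitly ANDed) and excluded words."""
    tokens = query.split()
    groups, excluded = [], []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())    # -word excludes pages
        elif token == "OR" and groups and i + 1 < len(tokens):
            i += 1
            groups[-1].append(tokens[i].lower())  # capital OR adds an alternative
        else:
            groups.append([token.lower()])        # AND is assumed; lowercase "or" is just a word
        i += 1
    return groups, excluded


def matches(query, text):
    """True if every group has at least one word present and no excluded word appears."""
    words = set(text.lower().split())
    groups, excluded = parse(query)
    if any(word in words for word in excluded):
        return False
    return all(any(alt in words for alt in group) for group in groups)


def search(query, docs):
    return [name for name in sorted(docs) if matches(query, docs[name])]


if __name__ == "__main__":
    for q in ["cats OR dogs -fish", "cats dogs", "cats or dogs"]:
        print(f"{q!r}: {search(q, DOCS)}")
```

Note the last query: written with a lowercase "or", it is read as three words that must all appear, so nothing matches. That is why these engines ask you to capitalize OR.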


Search Engines – Comparison

+Requires / -Excludes
Google: - excludes. + will allow you to retrieve “stop words” (e.g., +in).
Yahoo: - excludes. + will allow you to search common words: "+in truth"
Ask.com: - excludes. + will allow you to retrieve “stop words” (e.g., +in).

Sub-searching
Google, Yahoo, and Ask.com: The search box at the top of the results page shows your current search. Modify this (e.g., add more terms at the end).

Results ranking
Google: Based on page popularity measured in links to it from other pages: high rank if a lot of other pages link to it. Matching and ranking are based on a "cached" version of pages that may not be the most recent version.
Yahoo: Documents with all terms are ranked first, followed by documents containing any terms. The farther down the list, the fewer the terms, although at least one should always be present.
Ask.com: Based on Subject-Specific Popularity™: links to a page by related pages.
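Google's entry describes ranking by popularity, measured in links from other pages. The simplest form of that idea is counting how many other pages link to each page. The sketch below does only that, on a made-up link graph. It is not Google's actual ranking method, which weighs links (PageRank) and uses many other signals. A subject-specific approach like Ask.com's would additionally count only links coming from related pages. `LINKS` and `rank_by_inbound_links` are hypothetical names.

```python
from collections import Counter

# Made-up link graph: page -> pages it links to.
LINKS = {
    "a": ["c", "d"],
    "b": ["c"],
    "c": ["d"],
    "d": ["c"],
    "e": ["c", "a"],
}


def rank_by_inbound_links(links):
    """Order pages by how many other pages link to them (most-linked first)."""
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore a page linking to itself
                inbound[target] += 1
    pages = set(links) | set(inbound)
    # Counter returns 0 for pages nobody links to; ties broken alphabetically.
    return sorted(pages, key=lambda p: (-inbound[p], p))


if __name__ == "__main__":
    print(rank_by_inbound_links(LINKS))
```

Page "c" comes first because four other pages link to it, while "b" and "e" come last because nothing links to them.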


Search Engines – Comparison

Truncation, stemming
Google: No truncation. Stems some words. Search variant endings and synonyms separately, separating them with OR (capitalized): airline OR airlines
Yahoo: Neither. Search with OR as in Google.
Ask.com: Neither. Search with OR as in Google.
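Here is why stemming matters and what typing `airline OR airlines` does for you. The sketch compares three ways of searching some made-up pages: an exact match on one word, an OR search over listed variants, and a search using a deliberately naive "stemmer" that only drops a trailing s. Real stemmers, such as the Porter algorithm, handle many more endings. The pages and the `naive_stem`, `exact_search`, `or_search`, and `stemmed_search` helpers are all hypothetical.

```python
import re

# Made-up pages: URL -> text.
PAGES = {
    "p1": "Cheap airline tickets",
    "p2": "Compare airlines and fares",
    "p3": "Train schedules",
}


def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Hypothetical stemmer: drop one trailing 's' ('airlines' -> 'airline')."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def exact_search(term, pages):
    """No stemming, no truncation: only the exact word matches."""
    return sorted(url for url, text in pages.items() if term in words(text))


def or_search(terms, pages):
    """Like typing 'airline OR airlines': any listed variant matches."""
    return sorted(url for url, text in pages.items()
                  if set(terms) & set(words(text)))


def stemmed_search(term, pages):
    """Match pages whose words reduce to the same stem as the query term."""
    stem = naive_stem(term)
    return sorted(url for url, text in pages.items()
                  if stem in {naive_stem(w) for w in words(text)})


if __name__ == "__main__":
    print("exact:  ", exact_search("airline", PAGES))
    print("OR:     ", or_search(["airline", "airlines"], PAGES))
    print("stemmed:", stemmed_search("airline", PAGES))
```

The exact search misses the page that only says "airlines". The OR search and the stemmed search both find it, which is why the table advises listing variants with OR on engines that do not stem.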


Search Engines – Search Results
Searching for “Hancock High School”:
Google: About 32,000,000 results
Yahoo: 4,250,000 results
Ask.com: Doesn't tell you.
Bing.com: 4,120,000 results

Search Engines – Search Results (four slides; their images are not included in this transcript)


Search Engines – Wrap-Up
Terms you should know:
1. Search Engine: A program that searches documents for specified keywords.
2. Spider or Crawler: A program that automatically fetches Web pages.
3. Meta tag: A special HTML tag that provides information about a Web page.
4. Algorithm: A formula or set of steps for solving a particular problem.


Search Engines – Assignment
Before you leave today…

Pick a topic of interest to you. IT MUST BE APPROPRIATE FOR SCHOOL!

Pick 3 search engines (Google, Yahoo, Altavista.com, Ask.com, www.alltheweb.com, bing.com, www.askjeeves.com, lycos.com).

Do a search on your topic. On the paper, put:
1. Your name and the period.
2. Your topic.
3. How many web sites each search engine finds.
4. Whether any of the top 10 sites are the same across the different search engines (circle the sites that are on all 3 lists).
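Step 4 is a set intersection: the sites to circle are the ones present in all three top-10 lists. The sketch below uses placeholder site names and made-up lists; replace them with the results you actually record. `TOP10`, `on_all_lists`, and `pairwise_overlap` are hypothetical helpers, and the pairwise count is an optional extra for seeing which two engines agree most.

```python
from itertools import combinations

# Placeholder top-10 lists; replace with the sites you record in class.
TOP10 = {
    "Engine 1": ["site1", "site2", "site3", "site4", "site5",
                 "site6", "site7", "site8", "site9", "site10"],
    "Engine 2": ["site2", "site11", "site1", "site12", "site13",
                 "site5", "site14", "site15", "site16", "site17"],
    "Engine 3": ["site18", "site1", "site19", "site5", "site20",
                 "site21", "site2", "site12", "site23", "site24"],
}


def on_all_lists(top10):
    """Sites appearing in every engine's top 10 (the ones to circle)."""
    lists = list(top10.values())
    shared = set(lists[0]).intersection(*lists[1:])
    return [site for site in lists[0] if site in shared]  # keep first list's order


def pairwise_overlap(top10):
    """Number of sites each pair of engines has in common."""
    return {(a, b): len(set(top10[a]) & set(top10[b]))
            for a, b in combinations(sorted(top10), 2)}


if __name__ == "__main__":
    print("Circle these:", on_all_lists(TOP10))
    for pair, count in pairwise_overlap(TOP10).items():
        print(pair, count)
```

With these placeholder lists, site1, site2, and site5 appear on all three lists, and Engines 2 and 3 share the most sites (four).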