Things You Just Have to Know About Search Engines Ran Hock Online Strategies May 14, 2002 InfoToday 2002


Page 1: Things You Just Have to Know About Search Engines

Things You Just Have to Know About Search Engines

Ran Hock
Online Strategies
May 14, 2002
InfoToday 2002

Page 2: Things You Just Have to Know About Search Engines

Things You Just Have to Know About Search Engines

1 - No Search Engine Covers Everything

2 - Different Engines "Miss" and Find Different Things

3 - Large Numbers Aren’t Necessarily Bad Searches

4 - All Search Engines Have Techniques That Allow You to Improve Results

Page 3: Things You Just Have to Know About Search Engines

Things You Just Have to Know About Search Engines

5 - Metasearch engines are not "search engines"

6 - Google is great, but not the only one you should use.

7 - Some Things Change, Some Don't

Page 4: Things You Just Have to Know About Search Engines

1 - No Search Engine Covers Everything

There are pages no engine covers:
Invisible pages
  Unlinked pages, database pages, password-protected sites, “deep” pages, etc.

Different engines "miss" and find different things (Point #2)

Page 5: Things You Just Have to Know About Search Engines

2 - Different Engines Find and Miss Different Things

Each engine may find something others missed.

Even “2nd tier” engines find things missed by the top 3

Consider the results of the following search on: “erris head” sailing

Page 6: Things You Just Have to Know About Search Engines

2 - Different Engines Find and Miss Different Things

Page 7: Things You Just Have to Know About Search Engines

2 - Different Engines Find and Miss Different Things

Of the 20 different records retrieved by all the engines, Google found (only) 14 (70%)

Google missed 6 (30%)

If you had searched Google, then just one more engine, your retrieval would have increased by 15%

Even HotBot found 2 the other three engines missed.
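
The arithmetic behind these figures can be sketched with set operations. This is only an illustration: the record IDs and the engine names other than Google are hypothetical, chosen so that 20 distinct records exist overall and Google finds 14 of them, as stated above. It does not reproduce the actual search results.

```python
def coverage(results_by_engine):
    """Map each engine to the fraction of all distinct records it found."""
    all_records = set().union(*results_by_engine.values())
    return {
        engine: len(found) / len(all_records)
        for engine, found in results_by_engine.items()
    }


def extra_records(results_by_engine, first, second):
    """Records found by `second` that `first` missed."""
    return results_by_engine[second] - results_by_engine[first]


# Hypothetical record IDs (assumption): 20 distinct records overall,
# of which "Google" finds 14. EngineB..EngineD are made-up names.
results = {
    "Google": set(range(1, 15)),
    "EngineB": {1, 2, 3, 15, 16, 17},
    "EngineC": {4, 5, 18},
    "EngineD": {6, 19, 20},
}

shares = coverage(results)
print(f"Google coverage: {shares['Google']:.0%}")

gained = extra_records(results, "Google", "EngineB")
print(f"Adding EngineB contributes {len(gained)} more records")
```

In this made-up data, the second engine contributes 3 records that Google missed, which is one possible reading of the "15%" figure (3 of 20).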

Page 8: Things You Just Have to Know About Search Engines

2 - Different Engines Find and Miss Different Things - Why?

Indexing "policies"
  What words and other items get indexed
  How those things are "parsed"
Crawling differences
  Starting points
  Depth / breadth of crawling
  etc.
Spam policies
Ranking

Page 9: Things You Just Have to Know About Search Engines

3 - Large Numbers Aren’t Necessarily Bad Searches

Most common complaint
You’re not “obligated”
All use some form of relevance ranking
Relevance ranking does, to some degree at least, the same things we do to find the best items

What relevance ranking uses:

Page 10: Things You Just Have to Know About Search Engines

3 - Large Numbers Aren’t Necessarily Bad Searches

Relevance ranking uses some combination of:
Popularity
Frequency of terms
Weighting by field (e.g., Title counts more than Summary)
Proximity of terms
Weighting by size of the type
Weighting according to the order in which the searcher entered terms
Etc.
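
A toy scoring function can show how a few of these factors might be combined. This is a minimal sketch, not how any particular engine ranks: the weights, the `score` function, and the sample documents are all assumptions made for illustration, and it covers only term frequency, field weighting, and proximity (popularity and the other factors are left out).

```python
def adjacent_pairs(words, terms):
    """Count places where two consecutive query terms sit side by side."""
    wanted = set(zip(terms, terms[1:]))
    return sum(1 for pair in zip(words, words[1:]) if pair in wanted)


def score(doc, terms, title_weight=3.0, body_weight=1.0, proximity_bonus=2.0):
    """Toy relevance score; all weights are hypothetical."""
    terms = [t.lower() for t in terms]
    title = doc["title"].lower().split()
    body = doc["body"].lower().split()
    total = 0.0
    for term in terms:
        # Frequency of terms, with a title hit counting more than a body hit.
        total += title_weight * title.count(term)
        total += body_weight * body.count(term)
    # Proximity: reward query terms that appear next to each other in the body.
    total += proximity_bonus * adjacent_pairs(body, terms)
    return total


# Hypothetical documents (assumption), loosely based on the earlier example query.
doc_a = {"title": "Sailing near Erris Head", "body": "a guide to sailing"}
doc_b = {"title": "Weather notes", "body": "erris head is windy and sailing is hard"}
query = ["erris", "head", "sailing"]

ranked = sorted([doc_a, doc_b], key=lambda d: score(d, query), reverse=True)
for doc in ranked:
    print(score(doc, query), doc["title"])
```

With these made-up weights, the document with all three terms in its title outranks the one that only mentions them in the body, even though the latter earns a proximity bonus.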

Page 11: Things You Just Have to Know About Search Engines

3 - Large Numbers Aren’t Necessarily Bad Searches

Most search engines automatically “enhance” your search

Automatic phrase identification

Word variants (and/or truncation)

Case sensitivity

Analysis of documents in the database (links, term association, associative networks, cluster analysis, co-occurrence, etc.)

Etc.

Page 12: Things You Just Have to Know About Search Engines

Automatic Re-Write - AllTheWeb

Page 13: Things You Just Have to Know About Search Engines

4 - All Search Engines Provide Options for You to Enhance Your Search

Field searching: title, URL, date, language, etc.

Boolean (yes, “Boolean,” which is neither difficult nor bad)
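
To back up the claim that Boolean is not difficult, here is a small sketch of Boolean retrieval over a tiny inverted index. The documents and helper names are hypothetical, and real engines differ in syntax and implementation; the point is only that AND, OR, and NOT correspond to set intersection, union, and difference.

```python
def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


# Hypothetical mini-collection (assumption).
docs = {
    1: "geologic resources of worcester",
    2: "worcester sailing club",
    3: "geologic maps",
}
index = build_index(docs)


def lookup(term):
    """Document IDs containing the term, or an empty set if none do."""
    return index.get(term.lower(), set())


and_hits = lookup("geologic") & lookup("worcester")  # geologic AND worcester
or_hits = lookup("geologic") | lookup("sailing")     # geologic OR sailing
not_hits = lookup("worcester") - lookup("sailing")   # worcester NOT sailing

print(and_hits, or_hits, not_hits)
```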

Page 14: Things You Just Have to Know About Search Engines

4 - All Search Engines Provide Options for You to Enhance Your Search

How do you know about these options?
Use the Advanced Search page
Read the documentation

Page 15: Things You Just Have to Know About Search Engines

4 - All Search Engines Provide Options for You to Enhance Your Search

Use the Advanced Search page

Page 16: Things You Just Have to Know About Search Engines
Page 17: Things You Just Have to Know About Search Engines

5 - Metasearch engines are not “search engines”

Consider the following example of a search done in individual engines, then in metasearch engines

Page 18: Things You Just Have to Know About Search Engines

Engine     Done directly  via vivisimo  via DogPile  via MetaCrawler  via Search.com  via ixquick
AllTheWeb  52             10            0            0                9               0
Google     39             0             0            0                0               0
WiseNut    15             0             0            0                10              0
AltaVista  10             0             0            9                10              0
HotBot     9              0             0            0                10              0
Excite     6              0             0            0                0               1
TOTAL                     48            15           16               61              16

Search done for “geologic resources” worcester

Page 19: Things You Just Have to Know About Search Engines

5 - Metasearch engines are not “search engines”

Most don’t search all of the largest engines
Most don’t give you more than 10 or 20 records from each engine
Most don’t convey your full query syntax to the target engines
Most give “paid sites” first
“Client-side” metasearch programs, e.g., Copernic and Bulls-Eye, do NOT have the above problems.
Even online metasearch engines have occasional socially redeeming features (vivisimo’s clustering).
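
The first two limitations can be sketched in a few lines. This is a hypothetical model, not the behavior of any named metasearch service: `metasearch`, the engine names, and the result lists are assumptions, used only to show how a per-engine cap and a limited engine list shrink what the searcher sees.

```python
def metasearch(results_by_engine, engines_queried, per_engine_cap=10):
    """Merge the top results from each queried engine, dropping duplicates."""
    merged, seen = [], set()
    for engine in engines_queried:
        top = results_by_engine.get(engine, [])[:per_engine_cap]
        for url in top:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked result lists (assumption): what each engine
# would return if searched directly.
direct = {
    "EngineA": [f"a{i}" for i in range(52)],
    "EngineB": [f"b{i}" for i in range(39)],
}

via_meta = metasearch(direct, engines_queried=["EngineA"], per_engine_cap=10)
available = sum(len(urls) for urls in direct.values())
print(f"{len(via_meta)} of {available} records reached via metasearch")
```

Here the metasearch tool queries only one of the two engines and keeps only its top 10 records, so most of what a direct search would retrieve never reaches the searcher.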

Page 20: Things You Just Have to Know About Search Engines

6 - Google is Great, But Not the Only One You Should Use

Points 1 and 2 - No search engine finds everything and different engines find different things

Page 21: Things You Just Have to Know About Search Engines

6 - Google is Great, But Not the Only One You Should Use

Great because of:
Size
Popularity-based ranking
Unique content
  newsgroups
  PDFs and other file types
  largest image collection
Dandy little features like addresses, definitions, etc.
Pretty good search options

Page 22: Things You Just Have to Know About Search Engines

6 - Google is Great, But Not the Only One You Should Use

But doesn’t have:
Everything
Truncation and NEAR that AltaVista has
As much news coverage as AllTheWeb
As much currentness as AllTheWeb (maybe)
Etc.

Page 23: Things You Just Have to Know About Search Engines

7 - Search Engines Change

In some ways a lot, in other ways very little

Page 24: Things You Just Have to Know About Search Engines

7 - Search Engines Change

Areas of little change
For most engines: how they do basic things such as phrases, Boolean, truncation, field searching, etc.

Page 25: Things You Just Have to Know About Search Engines

7 - Search Engines Change

Areas of frequent/considerable change

Some come, some go
  Gone: Go/InfoSeek et al.
  Arrived: WiseNut, Teoma

How things are arranged on the home page (esp. AltaVista)

Partners (which directory they use, featured partners and tools, etc.)

Added content, esp. content types (PDFs, newsgroups, etc. in Google)

Page 26: Things You Just Have to Know About Search Engines

In Summary

1 - No Search Engine Covers Everything

2 - Different Engines "Miss" and Find Different Things

3 - Large Numbers Aren’t Necessarily Bad Searches

4 - All Search Engines Have Techniques That Allow You to Improve Results

5 - Metasearch engines are not "search engines"

6 - Google is great, but not the only one you should use.

7 - Some Things Change, Some Don't

Page 27: Things You Just Have to Know About Search Engines

Ran Hock
Online Strategies