Things You Just Have to Know About Search Engines
Ran Hock, Online Strategies
May 14, 2002, InfoToday 2002
1 - No Search Engine Covers Everything
2 - Different Engines "Miss" and Find Different Things
3 - Large Numbers Aren’t Necessarily Bad Searches
4 - All Search Engines Have Techniques That Allow You to Improve Results
5 - Metasearch engines are not "search engines"
6 - Google is great, but not the only one you should use.
7 - Some Things Change, Some Don't
1 - No Search Engine Covers Everything
There are pages no engine covers:
  Invisible pages
  Un-linked pages, database pages, password-protected sites, “deep” pages, etc.
Different engines “miss” and find different things (Point #2)
2 - Different Engines Find and Miss Different Things
Each engine may find something others missed.
Even “2nd tier” engines find things missed by the top 3
Consider the results of the following search on: “erris head” sailing
2 - Different Engines Find and Miss Different Things
Of the 20 different records retrieved by all the engines, Google found (only) 14 (70%)
Google missed 6 (30%)
If you had searched Google, then just one more engine, your retrieval would have increased by 15%
Even HotBot found 2 the other three engines missed.
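The arithmetic behind these figures can be sketched in a few lines. The per-engine result table did not survive in this transcript, so the record sets below are hypothetical, chosen only so that Google finds 14 of 20 pooled records and HotBot finds 2 that no other engine has. The names `EngineC` and `EngineD` are placeholders, not engines from the original search.

```python
# Hypothetical result sets: record IDs 1..20 stand in for the 20 distinct records.
results = {
    "Google": set(range(1, 15)),                  # 14 records
    "HotBot": {1, 2, 3, 15, 16},
    "EngineC": set(range(1, 11)) | {17, 18, 19},  # placeholder engine
    "EngineD": set(range(5, 13)) | {19, 20},      # placeholder engine
}

def coverage(results):
    """Share of the pooled (union) records that each engine found."""
    pooled = set().union(*results.values())
    return {engine: len(found) / len(pooled) for engine, found in results.items()}

def unique_finds(results, engine):
    """Records that only `engine` retrieved."""
    others = set().union(
        *(found for name, found in results.items() if name != engine)
    )
    return results[engine] - others

def gain_from_adding(results, first, second):
    """Extra share of the pool gained by searching `second` after `first`."""
    pooled = set().union(*results.values())
    return len(results[second] - results[first]) / len(pooled)

for engine, share in coverage(results).items():
    print(f"{engine}: {share:.0%} of pooled records")
print("Only HotBot found:", sorted(unique_finds(results, "HotBot")))
```

With real data, each set would hold the URLs an engine returned for the same query.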
2 - Different Engines Find and Miss Different Things - Why?
Indexing "policies"
  What words and other items get indexed
  How those things are "parsed"
Crawling differences
  Starting points
  Depth / breadth of crawling, etc.
Spam policies
Ranking
3 - Large Numbers Aren’t Necessarily Bad Searches
Most common complaint
You’re not “obligated”
All use some form of relevance ranking
Relevance ranking does, to some degree at least, the same things we do to find the best items
What relevance ranking uses:
3 - Large Numbers Aren’t Necessarily Bad Searches
Relevance ranking uses some combination of:
  Popularity
  Frequency of terms
  Weighting by field (e.g., Title counts more than Summary)
  Proximity of terms
  Weighting by size of the type
  Weighting according to the order in which the searcher entered terms
  Etc.
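A toy scoring function shows how a few of these factors might be combined. This is a minimal sketch, not any engine's actual formula: the field weights, the proximity bonus, and the document layout (`title` and `summary` keys) are all assumptions made for illustration, and popularity, type size, and term order are left out.

```python
import re

# Hypothetical field weights: a title hit counts more than a summary hit.
FIELD_WEIGHTS = {"title": 3.0, "summary": 1.0}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequency_score(doc, terms):
    """Field-weighted count of query-term occurrences."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(doc.get(field, ""))
        score += weight * sum(tokens.count(t) for t in terms)
    return score

def proximity_bonus(doc, terms):
    """1 / (smallest gap between two different query terms in the summary)."""
    tokens = tokenize(doc.get("summary", ""))
    hits = [(i, t) for i, t in enumerate(tokens) if t in terms]
    gaps = [j - i for i, a in hits for j, b in hits if a != b and j > i]
    return 1.0 / min(gaps) if gaps else 0.0

def score(doc, query):
    terms = set(tokenize(query))
    return term_frequency_score(doc, terms) + proximity_bonus(doc, terms)

def rank(docs, query):
    """Return docs ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(d, query), reverse=True)

# Made-up documents for illustration.
d1 = {"title": "Sailing near Erris Head", "summary": "Notes on sailing conditions."}
d2 = {"title": "Travel notes",
      "summary": "We went sailing, then much later walked to Erris Head."}
for doc in rank([d2, d1], "erris head sailing"):
    print(score(doc, "erris head sailing"), doc["title"])
```

The title match dominates here because of the assumed 3:1 field weighting; changing the weights changes the order, which is one reason engines rank the same pages differently.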
3 - Large Numbers Aren’t Necessarily Bad Searches
Most search engines automatically “enhance” your search
Automatic phrase identification
Word variants (and/or truncation)
Case sensitivity
Analysis of documents in the database (links, term association, associative networks, cluster analysis, co-occurrence, etc.)
Etc.
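As a rough illustration of the first three enhancements, the sketch below folds case, joins known two-word phrases, and adds a naive singular/plural variant. The phrase list and the plural rule are hypothetical simplifications; real engines derive this behavior from their own indexes and do not necessarily work this way.

```python
# Hypothetical list of known phrases; an engine would learn these from its index.
KNOWN_PHRASES = {("erris", "head"), ("new", "york")}

def fold_case(query):
    """Ignore case by lowercasing, then split into words."""
    return query.lower().split()

def identify_phrases(tokens):
    """Join adjacent tokens that form a known two-word phrase."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in KNOWN_PHRASES:
            out.append(f'"{tokens[i]} {tokens[i + 1]}"')
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def add_variants(term):
    """Return the term plus a naive singular/plural variant (phrases unchanged)."""
    if term.startswith('"'):
        return [term]
    if term.endswith("s") and len(term) > 3:
        return [term, term[:-1]]
    return [term, term + "s"]

def enhance(query):
    """Each inner list is a group of alternatives the engine would OR together."""
    return [add_variants(t) for t in identify_phrases(fold_case(query))]

print(enhance("Erris Head boats"))
```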
Automatic Re-Write - AllTheWeb
4 - All Search Engines Provide Options for You to Enhance Your Search
Field searching: title, URL, date, language, etc.
Boolean (yes, “Boolean,” which is neither difficult nor bad)
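To show that Boolean and field searching really are simple, here is a minimal sketch over a small in-memory collection. The documents, field names, and the `all_of` / `any_of` / `none_of` parameters are made up for illustration; each engine has its own syntax for the same ideas.

```python
import re

def words(text):
    """Set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(doc, all_of=(), any_of=(), none_of=(), field=None):
    """Boolean match: AND over all_of, OR over any_of, NOT over none_of.

    If `field` is given, only that field is searched; otherwise every field is.
    Query terms are expected in lowercase.
    """
    text = doc.get(field, "") if field else " ".join(doc.values())
    found = words(text)
    return (all(t in found for t in all_of)
            and (not any_of or any(t in found for t in any_of))
            and not any(t in found for t in none_of))

def search(docs, **criteria):
    return [d for d in docs if matches(d, **criteria)]

# Made-up documents for illustration.
docs = [
    {"title": "Sailing around Erris Head", "url": "example.org/sailing"},
    {"title": "Erris Head walking guide", "url": "example.org/walks"},
    {"title": "Sailing lessons", "url": "example.com/lessons"},
]

# erris AND head NOT walking
print(search(docs, all_of=["erris", "head"], none_of=["walking"]))
# title:sailing
print(search(docs, all_of=["sailing"], field="title"))
```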
4 - All Search Engines Provide Options for You to Enhance Your Search
How do you know about these options?
  Use the Advanced Search page
  Read the documentation
4 - All Search Engines Provide Options for You to Enhance Your Search
Use the Advanced Search page
5 - Metasearch engines are not “search engines”
Consider the following example of a search done in individual engines, then in metasearch engines

Engine      Done      via       via      via          via         via
            directly  vivisimo  DogPile  MetaCrawler  Search.com  ixquick
AllTheWeb   52        10        0        0            9           0
Google      39        0         0        0            0           0
WiseNut     15        0         0        0            10          0
AltaVista   10        0         0        9            10          0
HotBot      9         0         0        0            10          0
Excite      6         0         0        0            0           1
TOTAL                 48        15       16           61          16

Search done for “geologic resources” worcester
5 - Metasearch engines are not “search engines”
Most don’t search all of the largest engines
Most don’t give you more than 10 or 20 records from each engine
Most don’t convey your full query syntax to the target engines
Most give “paid sites” first
“Client-side” metasearch programs, e.g., Copernic and Bulls-Eye, do NOT have the above problems.
Even online metasearch engines have occasional socially redeeming features (vivisimo’s clustering).
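The effect of the 10-or-20-record limit alone can be estimated from the "done directly" counts in the example search. The sketch below assumes a best case that real metasearch engines did not reach: every one of the six engines is queried and each passes along its records up to the cap. Overlap between engines is ignored.

```python
# Records each engine returned when searched directly (from the example search).
DIRECT_COUNTS = {
    "AllTheWeb": 52, "Google": 39, "WiseNut": 15,
    "AltaVista": 10, "HotBot": 9, "Excite": 6,
}

def capped_total(counts, cap):
    """Records a metasearch engine could return with at most `cap` per engine."""
    return sum(min(n, cap) for n in counts.values())

def share_passed_along(counts, cap):
    """Fraction of the directly retrievable records that survive the cap."""
    return capped_total(counts, cap) / sum(counts.values())

for cap in (10, 20):
    print(f"cap {cap}: {capped_total(DIRECT_COUNTS, cap)} of "
          f"{sum(DIRECT_COUNTS.values())} records "
          f"({share_passed_along(DIRECT_COUNTS, cap):.0%})")
```

Even in this idealized case, a cap of 10 passes along fewer than half of the records the engines return when searched directly.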
6 - Google is Great, But Not the Only One You Should Use
Points 1 and 2 - No search engine finds everything and different engines find different things
6 - Google is Great, But Not the Only One You Should Use
Great because of:
Size
Popularity-based ranking
Unique content
  newsgroups
  PDFs and other file types
  largest image collection
Dandy little features like addresses, definitions, etc.
Pretty good search options
6 - Google is Great, But Not the Only One You Should Use
But doesn’t have:
  Everything
  Truncation and NEAR that AltaVista has
  As much news coverage as AllTheWeb
  As much currentness as AllTheWeb (maybe)
  Etc.
7 - Search Engines Change
In some ways a lot, in other ways very little
7 - Search Engines Change
Areas of little change
  For most engines: How they do basic things such as phrases, Boolean, truncation, field searching, etc.
7 - Search Engines Change
Areas of frequent/considerable change
  Some come, some go
    Gone: Go/InfoSeek et al.
    Arrived: WiseNut, Teoma
  How things are arranged on the home page (esp. AltaVista)
  Partners (which directory they use, featured partners and tools, etc.)
  Added content, esp. content types (PDFs, newsgroups, etc. in Google)
In Summary
1 - No Search Engine Covers Everything
2 - Different Engines "Miss" and Find Different Things
3 - Large Numbers Aren’t Necessarily Bad Searches
4 - All Search Engines Have Techniques That Allow You to Improve Results
5 - Metasearch engines are not "search engines"
6 - Google is great, but not the only one you should use.
7 - Some Things Change, Some Don't
Ran Hock, Online Strategies