The Invisible Web
Gary Price, MLIS
George Washington University
Chris Sherman
Associate Editor, Search Engine Watch
How Search Engines Work
[Diagram: a crawler fetches pages (URL1 to URL4) from the Web, such as "All About Eggs" by S. I. Am; an indexer adds them to the search engine database; a query for "Eggs?" sent from your browser returns ranked matches: Eggs (90%), Eggo (81%), Ego (40%), Huh? (10%).]
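The crawl, index, and query pipeline in the diagram can be sketched in a few lines of Python. This is a toy illustration, not how any production engine works: the page texts, the URL1 to URL4 keys, and the scoring rule (fraction of query terms a page contains) are assumptions made for the example. The percentages on the slide come from a fuzzier relevance score; this sketch matches exact terms only, so "Eggo" and "Ego" would not match "Eggs".

```python
import re
from collections import defaultdict

# Hypothetical "Web": four pages keyed by the placeholder URLs from the diagram.
WEB = {
    "URL1": "All About Eggs by S. I. Am",
    "URL2": "Eggo waffles and breakfast recipes",
    "URL3": "Ego and the self in psychology",
    "URL4": "Boiled eggs and egg salad",
}


def crawl(web):
    """Crawler: visit each page and yield (url, text) pairs."""
    for url, text in web.items():
        yield url, text


def tokenize(text):
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Indexer: build an inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages:
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Search engine: rank URLs by the fraction of distinct query terms they contain."""
    terms = set(tokenize(query))
    if not terms:
        return []
    hits = defaultdict(int)
    for term in terms:
        for url in index.get(term, ()):
            hits[url] += 1
    ranked = [(url, count / len(terms)) for url, count in hits.items()]
    # Highest score first; ties broken alphabetically by URL.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


INDEX = build_index(crawl(WEB))
print(search(INDEX, "Eggs?"))  # → [('URL1', 1.0), ('URL4', 1.0)]
```

The point the slides go on to make follows directly: a page the crawler never visits never reaches `build_index`, so no query can find it.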
What is the Invisible Web?
• “Stuff” that search engine crawlers (spiders) cannot, or will not, add to their databases
• Estimated at 2 to 50 times larger than the visible Web
• Resources are often of much higher quality than those on the visible Web
What is the Invisible Web?
• Certain file formats (PDF, Flash, Office files, streaming media)
  – Why? They aren’t HTML text
• Most real-time data (stock quotes, weather, airline flight info)
  – Why? Ephemeral and storage-intensive
What is the Invisible Web?
• Dynamically generated pages (CGI, JavaScript, ASP, or most pages with “?” in the URL)
  – Why? Spider traps
• Web-accessible databases
  – Why? Spiders can’t type (they can’t fill in search forms)
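The file-format and dynamic-page reasons on these slides amount to rules a crawler applies before fetching a URL. The sketch below is a hypothetical policy, assuming a crawler that indexes only HTML and avoids server-side scripts and query strings; the extension lists and the `skip_reason` name are illustrative, not any engine's actual rule set. Web-accessible databases need no rule at all: their records are reached only by typing a query into a form, so their URLs never appear as links for the crawler to follow.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical rule sets loosely mirroring the slides; not any engine's real policy.
NON_HTML_EXTENSIONS = {".pdf", ".swf", ".doc", ".xls", ".ppt", ".ram", ".asf"}  # PDF, Flash, Office, streaming
DYNAMIC_EXTENSIONS = {".asp", ".cgi"}  # server-side scripts that build pages on request


def skip_reason(url):
    """Return why a cautious crawler would skip `url`, or None if it would fetch it."""
    parts = urlparse(url)
    ext = posixpath.splitext(parts.path)[1].lower()
    if parts.query:
        # "?" in the URL: endless parameter combinations can trap a spider.
        return "dynamic page: query string"
    if ext in DYNAMIC_EXTENSIONS:
        return "dynamic page: " + ext
    if ext in NON_HTML_EXTENSIONS:
        return "non-HTML format: " + ext
    return None


for url in [
    "http://www.lii.org",
    "http://www.example.com/annual-report.pdf",
    "http://www.example.com/search?q=eggs",
    "http://www1.worldbank.org/tobacco/database.asp",
]:
    print(url, "->", skip_reason(url) or "fetch")
```

Checking the query string first means a URL such as `page.html?id=7` is treated as dynamic even though its extension looks like plain HTML.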
Invisible Web Gateways
• Intelliseek
  – http://www.invisibleweb.com
  – http://beta.profusion.com
• Complete Planet
  – http://www.completeplanet.com/
• Librarians’ Index to the Internet
  – http://www.lii.org
The Invisible Web & The Librarian
The Need for Knowledge!
• Awareness That the IW Exists
  – Maybe the IW holds the content your users can’t find! What is the cost in wasted time, effort, and frustration?
• Let Others Know About the IW
• Awareness of the Synonyms
  – Invisible Web
  – Deep Web
  – Hidden Web
• Let the Content Be Your Calling Card
  – Focus Less on the Amount of IW Data
The Invisible Web & The Librarian
Why Is the IW Useful to the Librarian and the End User?
• Quality of Content (Authority)
• Deep Content on a Subject Area (Comprehensiveness)
• Focused Databases (Limited Scope)
  – Smaller Universe of Documents to Search (Maximize Precision/Recall)
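The “Maximize Precision/Recall” point rests on the standard information-retrieval definitions. Writing R for the set of documents a search retrieves and D for the set of documents actually relevant to the question (notation introduced here for illustration):

```latex
\text{precision} = \frac{|R \cap D|}{|R|}, \qquad
\text{recall} = \frac{|R \cap D|}{|D|}
```

As a hypothetical illustration: if a focused database returns 20 records, 15 of them relevant, out of 30 relevant records in total, precision is 15/20 = 0.75 and recall is 15/30 = 0.5. Searching a smaller, on-topic universe tends to shrink the irrelevant part of R, which raises precision.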
The Invisible Web & The Librarian
Why Is the IW Useful to the Librarian and the End User?
• Material Unavailable Elsewhere on the Web (Uniqueness)
• Many Options to Limit, Sort, and Interact with the Data (Maximize Precision)
• Timeliness vs. Time Lag of General Search Tools (Currency)
The Invisible Web & The Librarian
The IW, The Librarian, The Future
• What Happens If/When the General Search Tools Crawl IW Material? Good News? Bad News?
• General Search Tools May NOT:
  – Offer Many Interactive/Limiting Tools
  – Be Updated/Refreshed as Frequently (Time Lag)
• Timeliness: making current information available is one of the things the Net does well.
The Invisible Web & The Librarian
The IW, The Librarian, The Future
• The Search Engine Business: Will IW Material Be a Priority?
• Just One Dialog or SilverPlatter Database?
  – No, in Terms of Content!
  – Yes, in Terms of a Common Interface and Syntax; Perhaps XML Will Assist
The Invisible Web & The Librarian
Challenges
• It’s Not the Magic Bullet. It’s a Tool
• We Still Need Traditional Online Databases
• Learning Curve, Sorry!
• Database Selection: When to Use the IW?
• Numerous Interfaces and Syntaxes
• A Non-Stop Flow of New Material
The Invisible Web & The Librarian
Things to Do!
• Build Your Own Collections
  – Internet Resource Collection Development
• Mine Entire Sites: IW Material Often Gets Little or No Notice in Reviews
• When Possible, Create Links DIRECTLY to the Interface
• “Save the Time of the Web Researcher”
• Keep Current
The Invisible Web & The Librarian
Types of IW Content in Librarian Terms
• Bibliographic
  – OPACs
  – Subject Bibliographies
• Non-Bibliographic
  – Full-Text
  – Numeric
  – Graphic
  – Directory
  – Real-Time
Future Trends
• Killer apps will lead the way
  – ResearchIndex (CiteSeer)
• Search engines will work harder to “find” Invisible Web content
  – Inktomi (Index Connect, Ultraseek)
  – WhizBang (“wrappers”)
• No matter what, there will always be a problem!
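A “wrapper” is a program that knows the layout of one database’s result pages and turns them back into structured records that a search engine could index. The sketch below is a generic illustration using Python’s standard `html.parser` module, not WhizBang’s technology; it assumes a hypothetical result page that lists one record per table row, and the class name, `extract_records`, and the sample records are all made up for the example.

```python
from html.parser import HTMLParser


class TableRowWrapper(HTMLParser):
    """Hypothetical wrapper: collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished records, one tuple per data row
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they end up empty and are skipped
                self.rows.append(tuple(self._row))
            self._row = None


def extract_records(html):
    """Run the wrapper over one result page and return its records."""
    wrapper = TableRowWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.rows


# Hypothetical result page from a database search form.
SAMPLE = """
<table>
  <tr><th>Title</th><th>Author</th></tr>
  <tr><td>All About Eggs</td><td>S. I. Am</td></tr>
  <tr><td>Eggo Recipes</td><td>Anon.</td></tr>
</table>
"""

print(extract_records(SAMPLE))
```

The weakness the last bullet hints at is built in: a wrapper is tied to one page layout, so each database needs its own, and a redesign of the result page breaks it.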
Coming Soon
Available July 2001 from CyberAge Books, ISBN 0-910965-51-X
http://www.invisible-web.net
Invisible Web: Computer Science
• McAfee World Virus Map – http://www.mcafee.com
• ResearchIndex – http://www.researchindex.com
Invisible Web: Company Research
• European High-Tech Industry Database – http://www.tornado-insider.com/radar/
• Kompass – http://www.kompass.com
Invisible Web: Intellectual Property
• Delphion Intellectual Property Network – http://www.delphion.com/
• ESP@CENET (European Patent Office) Patent Database – http://ep.espacenet.com/
Invisible Web: Dictionaries & Languages
• EuroDicAutom – http://eurodic.ip.lu
• Verbix – http://www.verbix.com/index.html
Invisible Web: Art & Artists
• ADAM (Art, Design, Architecture & Media Information Gateway) – http://adam.ac.uk/
• Artcyclopedia – http://www.artcyclopedia.com/
Invisible Web: Real-Time Information
• Flight Tracker – http://www.trip.com/ft/home/0,2096,1-1,00.shtml
• J-Track 3-D Satellite Locator – http://liftoff.msfc.nasa.gov/realtime/JTrack/Spacecraft.html
Invisible Web: Maps and Driving Directions
• MapBlast – http://www.mapblast.com
• Streetmap.co.uk – http://www.streetmap.co.uk/
Invisible Web: Government Info
• Parline Database – http://www.ipu.org
• United Nations Daily Press Briefings – http://www.un.org/News/
Invisible Web: Health & Medicine
• Economics of Tobacco Control Database – http://www1.worldbank.org/tobacco/database.asp
• International Digest of Health Legislation – http://www.who.int
Invisible Web: News & Current Events
• Cold North Wind Newspaper Archive Project – http://www.coldnorthwind.com
• Financial Times Global Archive – http://www.globalarchive.ft.com
Invisible Web: Science
• Great Barrier Reef Online Image Catalogue – http://www.gbrmpa.gov.au/corp_site/info_services/library/index.html
• Nuclear Explosions Database – http://www.ausseis.gov.au/databases