![Page 1: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/1.jpg)
Midnight in the Garden of Good and Evil Search Engines
• Presentation by Richard Wiggins
– Technical Advisor, NEM Online, Michigan State University
• www.msu.edu/staff/rww
– Columnist, “Internet Buzz,” webreference.com
• www.webreference.com/outlook
– Co-host, Nothing But Net television program (produced by Media One)
![Page 2: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/2.jpg)
A Parable: The Encounter Between the USS Nimitz and a Canadian Vessel...
![Page 3: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/3.jpg)
![Page 4: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/4.jpg)
A Frequency Analysis of the Appearance of a Critical Search Term Among Major Search Engines...
![Page 5: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/5.jpg)
Frequency of the Search Term “Slavko” Among Major Search Indexes
• AltaVista 5477
• Excite 1160
• Infoseek 1452
• Hotbot 4226
![Page 6: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/6.jpg)
[Chart: hit counts for Slavko versus "Celtic Music" on AltaVista, Excite, Infoseek, and Hotbot; vertical axis from 0 to 14000]
![Page 7: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/7.jpg)
Come Join Our Tour of
...a place millions want to visit...
…where a cast of characters stands ready to help you find exactly what you’re looking for...
![Page 8: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/8.jpg)
SearchVannah’s Tour Guides
• …a relatively new town
• …only existed since 1993
• With so many visitors, lots of tour guides have set up shop
– They tend to have funny names
– They compete fiercely
– They’re all trying to make money helping visitors find their way
![Page 9: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/9.jpg)
The Tour Guides
• AltaVista
– Fast, lots of memory, knows a lot
– But people sometimes complain that results are inconsistent
• InfoSeek
– Claims answers are more relevant
• MetaCrawler
– Doesn’t know anything at all! Just asks the other tour guides!
![Page 10: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/10.jpg)
HotBot: This tour guide wears the ugliest clothes!
![Page 11: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/11.jpg)
The Tour Guides...
• Inktomi: other tour guides hire Inktomi to answer their questions
• One guide knows a LOT less than all the others…
– But it’s the most popular by far!
– The smarter tour guides think of it as just a dumb Yahoo…
• But maybe tourists want to know where the B&B is, not a list of all the towels and dishes
![Page 12: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/12.jpg)
Definitions
• Crawler: automated tool to discover new and changed pages, feeds data to…
• Indexer: builds and maintains an index, concordance-style
• Search engine: the actual tool end-users employ when searching
• …but in popular usage, all together = “search engine”
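The three roles defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary is a made-up in-memory "web" (the URLs and text are hypothetical), and real crawlers and indexers are far more elaborate.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("celtic music festival", ["a.example/bands", "b.example/news"]),
    "a.example/bands": ("celtic bands and fiddle music", ["a.example/home"]),
    "b.example/news": ("search engine news", []),
}

def crawl(start_url, pages):
    """Crawler: follow links breadth-first, yielding (url, text) for each page found."""
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        yield url, text
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(documents):
    """Indexer: map each word to the set of URLs containing it (a concordance)."""
    index = defaultdict(set)
    for url, text in documents:
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search engine: return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(crawl("a.example/home", PAGES))
print(sorted(search(index, "celtic music")))
```

The crawler feeds the indexer, and the search function only ever consults the index, which is why a page the crawler never reached can never appear in a hit list.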
![Page 13: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/13.jpg)
Leveraging 30 Years of Information Retrieval (IR)
• Most new ideas we see in Web engines were thought of long ago...
– Stemming
– Controlled vocabulary
– Text analytics
– Knowledge Bases
– Personalization (by observing user usage patterns)
– Natural language
![Page 14: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/14.jpg)
How Do People Search?
“Honestly, tourists are the dumbest people” -- anonymous Tour Guide
![Page 15: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/15.jpg)
What Do People Search For?
• Major search services say people look for...
– Sex sites
– One’s own name
– Friends’ and colleagues’ Web sites (also by name)
– Items in the news
– Company / product information
– Etc.
![Page 16: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/16.jpg)
Metaspy: Window into Real User Queries
![Page 17: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/17.jpg)
![Page 18: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/18.jpg)
One user view of search.msu.edu: Academics
• application for graduation
• overseas study
• ordering catalog
• School of Music
• Computer Science
• human ecology department
• psychology 101
![Page 19: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/19.jpg)
Another user view of search.msu.edu: Virtual Library
• DNA sequencing
• climate change
• beam theory
• feline brain tumor
• PRL and sequencing
![Page 20: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/20.jpg)
Another user view of search.msu.edu: Extension
• livestock pavilion
• wildlife fisheries
• bathtub removal and installation
• Round Bale Storage
![Page 21: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/21.jpg)
Another user view of search.msu.edu: Conversational
• I would like to know if you offer a workshop on “International Law”
![Page 22: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/22.jpg)
What Do People Search For? Matt Koll’s Formulation
• “finding a needle in a haystack”
• a known needle in a known haystack
• a known needle in an unknown haystack,
• to any needle in a haystack
• Where are the haystacks?
• GenX rendition: Needles? Haystacks? Whatever!
![Page 23: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/23.jpg)
Typical User Search Strategy
• Type in a one-word search term
• Maybe two words
• Seldom exploit advanced options
– Capitalization
– Quoting phrases (e.g. “climate change”)
– Date restrictions
– Host:, URL: parameters
• Seldom use iterative refinement
![Page 24: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/24.jpg)
Users Make “Wrong” Choices
• Picking the right database is confusing
– Reference librarians, experienced users learn brand names
– Inexperienced users do not
• Lycos example: “Small” versus “Large” catalog
– “Small” catalog was faster, more precise
– Virtually no one used it, thinking “Large” meant “better”
![Page 25: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/25.jpg)
A Route 128 Story
• Engineering firm on Route 128
• Engineers new products
• Has constant need for specialized information
• Uses traditional sources, and the Web
• “Joe down the hall” does the Internet searches
• Joe is a reference librarian with an engineering degree (and no training in online searching!)
![Page 26: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/26.jpg)
Prospects for Training are Dismal!
• We don’t know the users, so we can’t hope to train them
• Users won’t read documentation or help notes
• If engine doesn’t deliver, users react viscerally
– “This engine is useless” or
– “The Internet has nothing useful”
– “The Internet has too much information!”
![Page 27: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/27.jpg)
How Well Do Today’s Engines Meet Real Users’ Needs?
• Most engines cannot yield a high-precision, high-recall hit list from only one search term
• But most users don’t compose or refine their searches carefully
• Boolean operators virtually unused
• Therefore most users probably fail to get desired results
• Many sample searches from MSU example would not yield desired information
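For readers new to the two measures named above: precision is the share of returned hits that are relevant, and recall is the share of all relevant documents that were returned. A minimal sketch, with made-up document IDs standing in for a real hit list:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a hit list against a set of relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical one-word search: ten hits come back, only two of them useful,
# and two other useful documents are missed entirely.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = ["d1", "d2", "d11", "d12"]
print(precision_recall(retrieved, relevant))
```

A broad one-word query tends to pull precision down, because most of what comes back is noise.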
![Page 28: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/28.jpg)
AltaVista “Intelligent” Case Matching Example
• Looking for information on “TREC” search engine testing at NIST
![Page 29: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/29.jpg)
Scale Issues
“This town is growing so fast, and there’s too many tourists!” -- a 3rd generation resident
![Page 30: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/30.jpg)
The Problem of Scale
• No one knows exact size of Web
– Databases, intranets complicate issue
– “Dark matter” -- Vint Cerf
• Probably 250 to 500 million pages publicly accessible
• Recent Science article claims most spider coverage is incomplete
• AltaVista claims 140 million pages in index
![Page 31: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/31.jpg)
1 Billion URLs -- and Beyond
1996 1997 1998
30M 140M 1000M
![Page 32: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/32.jpg)
Problem of Scale: Transaction Load
• AltaVista handles 30 million searches per day
• Inktomi is “back-end” for numerous sites
– HotBot, N2H2 (Japan), Australian news service
– Soon, the “find a Web site” function in Windows 98
• No popular service has melted down yet
![Page 33: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/33.jpg)
Inktomi’s “Network of Workstations” Model
• Eric Brewer, co-founder and Chief Scientist, claims centralized high-speed servers cannot scale
• Developed new clustering scheme: dozens or hundreds of low-cost servers on high-speed network
• But centralized engines have not broken down yet
• 64-bit processors @ 300-450 MHz, gigabytes of RAM, fast paths to disk
![Page 34: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/34.jpg)
Trends
“We have a forward-looking sense of fashion!” -- one of the tour guides
![Page 35: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/35.jpg)
Trends Among Search Engines
• Observations of Dr. Susan Feldman, Cornell:
• More professional look, feel than a couple years ago
• Common syntax evolving:
– Plus sign prefix for required term, minus for excluded term
– Quotes signify phrases, capitals signify a case-sensitive match
• Unique “personalities” evolving
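The common syntax described above (plus for required, minus for excluded, quotes for phrases, capitals for case-sensitive matching) can be sketched as a small parser. This is an illustration, not any particular engine's implementation: `parse_query` and `matches` are hypothetical names, and matching is plain substring containment, which is a deliberate simplification.

```python
import re

def parse_query(query):
    """Split a query into (required, excluded, optional) terms.

    +term is required, -term is excluded, "a phrase" is kept together.
    """
    required, excluded, optional = [], [], []
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = phrase or word
        if not term:
            continue
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(text, query):
    """Decide whether a page's text satisfies the query (substring matching only)."""
    required, excluded, optional = parse_query(query)

    def contains(term):
        # Capitals in the term request a case-sensitive match; otherwise ignore case.
        if term != term.lower():
            return term in text
        return term in text.lower()

    if any(contains(t) for t in excluded):
        return False
    if not all(contains(t) for t in required):
        return False
    if required or not optional:
        return True
    return any(contains(t) for t in optional)

print(parse_query('+celtic -irish "folk music" harp'))
```

With this sketch, a lowercase query for trec matches any capitalization, while a query for TREC matches only the upper-case acronym.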
![Page 36: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/36.jpg)
The Role of Meta-Crawlers
• Experts agree that spider coverage varies across services
• No two services cover the same sites for a given search
• Therefore searching across multiple indexes yields more results
• Therefore metacrawlers can be useful
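The reasoning above can be sketched as a toy metacrawler that forwards one query to several engines and merges the hit lists. The two engines here are hypothetical stubs returning fixed lists; a real metacrawler would call each service over the network. The ranking rule (pages found by more engines first, then best rank) is one plausible choice, not a description of any actual product.

```python
def metasearch(query, engines):
    """Send the query to every engine and merge the ranked URL lists, removing duplicates."""
    merged = {}
    for name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            entry = merged.setdefault(url, {"engines": [], "best_rank": rank})
            entry["engines"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Pages reported by more engines come first; ties go to the better rank, then URL.
    return sorted(
        merged,
        key=lambda u: (-len(merged[u]["engines"]), merged[u]["best_rank"], u),
    )

# Hypothetical stub engines with partly overlapping coverage.
def engine_a(query):
    return ["x.example/1", "y.example/2"]

def engine_b(query):
    return ["y.example/2", "z.example/3"]

print(metasearch("celtic music", {"engine_a": engine_a, "engine_b": engine_b}))
```

Because coverage differs between the two stubs, the merged list is longer than either engine's own list, which is the whole case for metacrawlers.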
![Page 37: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/37.jpg)
Targeted Spiders
• Train the spider to crawl only sites that fit a certain subject domain
• InfoSeek News Index
– Death of a Princess example
• Internet.com’s “vertical” index
• LawCrawler
• NEM Online
– Research project at Michigan State University
– Harnessing information of use to manufacturers
![Page 38: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/38.jpg)
“death of Princess Diana” Search on Infoseek, 8/31/97 1:00 pm
![Page 39: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/39.jpg)
AskJeeves: Question-oriented Knowledge Base
![Page 40: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/40.jpg)
A Better AskJeeves Question
![Page 41: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/41.jpg)
Northern Light
![Page 42: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/42.jpg)
Traditional Model: First, Pick a Database, Then Do Your Search
![Page 43: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/43.jpg)
Why Northern Light is a Breakthrough
• Delivering quality sources alongside Web resources
– As Web becomes more cluttered, advantage grows
• Database search paradigm inverted: First do your search, then pick your source
• Automatic categorization yields manageable hit lists
– Advantage also grows as Web grows
![Page 44: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/44.jpg)
Real Name System
![Page 45: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/45.jpg)
Specialized Engines: Serving Specific Geographic Areas
![Page 46: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/46.jpg)
Search for “Intel” on Excite
![Page 47: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/47.jpg)
Alexa: Group Experience
![Page 48: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/48.jpg)
Beyond Text: Still Images, Digitized Speech, Video
• We tend to think of search engines as limited to text
• But increasingly we will face digital content
• Thanks to scanners, digital cameras, digital sound cards, digital video cameras
• These digital collections will be corporate assets
• But to use, and re-purpose, these assets, we will need search engines
![Page 49: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/49.jpg)
IBM Almaden’s Image Search Software
• Able to index a large collection of still images
• Able to find similar images
– User selects image, asks for similar shapes
– User draws shapes
– User filters by color, textual metadata
• Samples available online:
– Searchable digital postage stamp archive
• www.qbic.almaden.ibm.com/cgi-bin/stamps-demo
– Searchable archive of trademarks (logos)
![Page 50: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/50.jpg)
![Page 51: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/51.jpg)
Magnifi: Multimedia Search Engine
![Page 52: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/52.jpg)
AltaVista Keyword Index into Clinton Testimony Video
![Page 53: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/53.jpg)
Cross-Language Searching
• Internet is biased towards English
• But it is a World Wide Web
• Tools to allow searching in one language, against a universe in other languages, are evolving
• Challenges of understanding meaning, resolving ambiguities multiply
• But effective tools are coming
![Page 54: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/54.jpg)
The AltaVista Translation Service: Extending Search Engines into New Areas
• Translates to/from English, Spanish, French, Italian, German
• Try translating “Are you having a bad hair day?” to another language and back...
![Page 55: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/55.jpg)
Translation Result:
• “Are you having a bad hair day?” ...becomes…
• “It is for you defective day of hats, no?”
![Page 56: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/56.jpg)
Any Portal in a Storm
• Search engine services becoming portals
• Non-index services
– Browsing view
– Stock quotes
– Pager services
– Personalization (“My Yahoo,” “My AltaVista,” “My Foot”)
• The linear search engine result set can’t compete without added components
![Page 57: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/57.jpg)
Evaluating the Engines
“You just can’t trust some of the other tour guides!” -- every tour guide
![Page 58: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/58.jpg)
Evaluating Search Engines
• Searchenginewatch.com
– Part of internet.com family
– “Search Engine EKG”
– Measures rate of crawling, other metrics, for leading Web engines
• National Institute of Standards and Technology TREC Series
– Rigorous annual “bakeoff” conducted by Donna Harman
– Leading technology firms, university researchers compete
![Page 59: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/59.jpg)
SearchEngineWatch.com: Search Engine EKG
![Page 60: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/60.jpg)
AltaVista vs Infoseek: An Accidental Bakeoff
• Michigan State University was first university to acquire AltaVista Intranet product (1996)
• Used for campus-wide spider as well as subject-specific index (manufacturing)
– search.msu.edu
– www.nemonline.org
• Infoseek on its own initiative set up an index of msu.edu
![Page 61: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/61.jpg)
AltaVista vs Infoseek: Preliminary Observations
• In many cases, AltaVista and Infoseek return very similar results
• Using actual searches typed by users, in some cases Infoseek shows superior relevancy ranking
– Word proximity has more weight
• Infoseek also appears to offer superior duplicate detection
• “Find similar” in Infoseek works very well
![Page 62: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/62.jpg)
Decentralized Searching: The Infoseek Experiment
• Steve Kirsch (CEO of Infoseek) offers this experiment:
• “Name a movie by James Cameron”
![Page 63: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/63.jpg)
What This Experiment Shows…
• Some servers are louder than others
• Several servers know recent, highly-publicized information
• Some pieces of information are known only to one server
• Some servers give out wrong information
• Some servers never answer any question
![Page 64: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/64.jpg)
Decentralization Trend
• We’ve tried decentralized indexes with little success
– WAIS
– Harvest
• But scale of single central indexes may force new attempts
• Infoseek intends major push
– Network of “Ultraseek” intranet sites
– “Use other people’s servers to do the hard work”
![Page 65: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/65.jpg)
Ethics of Engines
“What does ethics have to do with helping people find things??!” -- every tour guide
![Page 66: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/66.jpg)
The Ethics of Search Engines
• Gaining value from freely-available content
• Yahoo, AskJeeves advertise themselves as reference sources
• They make money on answers that others provide for free
• Are they a bibliography, which has always been legitimate?
• Or, thanks to the hyperlink, are they exploiting those who provide the real value?
![Page 67: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/67.jpg)
Ethics: Index Spamming
• People learned to spam the index early on
– Overload your page with terms people use in searching
– Some sites present a different page to the spider than the end user sees
– One church asked a Web developer to put in meta tags with obscene words
• Is spamming unethical?
– Seems to be, but why exactly?
– Sears catalog vs Montgomery Ward
![Page 68: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/68.jpg)
Ethics: The Search Services’ Incentives
• Most make money from banner ads
• They want to maximize page impressions and clickthroughs
• The ideal user would search forever!
• Banner ads adapt to the search based on keyword
– Banner ad technology is better than result set technology!
![Page 69: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/69.jpg)
Ethics: Editorial Copy Versus Advertising-Influenced
• In the print world, it’s pretty obvious what’s an advertisement
– Yellow Pages
– New York Times
– Thomas Register
• To avoid confusion, some ads are labeled
• In the online world, it’s not always clear
• If companies sold better search positions, how would we know?
![Page 70: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/70.jpg)
Ethics: Buy First Place on the Hit List -- and Tell How Much You Paid!
![Page 71: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/71.jpg)
Paying for Position
“We tried the editorial system of rating pages, and we found that it wasn't scalable...but the market is infinitely scalable.”
– Jeffrey Brewer, CEO, GoTo.com
![Page 72: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/72.jpg)
The Future
“I don’t know what the future is, but we’ll be number one!!!” -- every tour guide
![Page 73: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/73.jpg)
The Future: Promises and Limits
• IR scientists say engines may be approaching fundamental limit
• Koll: typical gigabyte of searchable space holds 25,000 occurrences of typical search term
• “With a lot of work, maybe we can get to 50% recall and 50% precision”
• But combination of approaches can yield greater power
![Page 74: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/74.jpg)
The Search Engine Industry
• Analysts generally agree that “Yahoo wins!”
• Claims ~100 million transactions per day
• Also claims 30 million unique users
• Also claims more “viewership” per day than most specialty cable TV channels (e.g. MTV)
• And it’s a catalog, not a full-text engine!
![Page 75: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/75.jpg)
Search Engine Companies’ “Value Per User” (Mecklermedia)
| Value Index (sorted by value per user) | Users (millions) | Market Value of Company (millions) | Value Per User |
|---|---|---|---|
| Yahoo | 32.5 | $5,273 | $162.38 |
| Microsoft.com | 18.0 | $1,850 | $102.68 |
| Excite | 19.3 | $1,488 | $76.96 |
| Lycos – Tripod | 15.1 | $992 | $65.58 |
| Netscape.com | 23.4 | $1,500 | $64.09 |
| Infoseek – WBS | 16.2 | $964 | $59.50 |
| AltaVista | 7.5 | $260 | $34.75 |
| TOTAL | $179 | $14,982 | $730 |
| AVERAGE | $18 | $1,498 | $73 |
![Page 76: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/76.jpg)
Changes and Alliances
• All search sites now offer browsing views
• All services offering free e-mail
• Yahoo offering news
• Alliances
– AltaVista plus Real Name System
– AltaVista plus Amazon.com
– Lycos plus Barnes and Noble online
– Yahoo drops AltaVista when AltaVista adds browsing view
![Page 77: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/77.jpg)
Combining Best Features
• Of Yahoo, Infoseek, AskJeeves
• Build a knowledge base
– Leverage the actual queries people issue
– An FAQ
• Offer a blend of drill-down hierarchy, knowledge base, full-text
• Search for one word yields rich result set
– E.g. “Intel”
• Example: Verity’s new Knowledge Organizer
![Page 78: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/78.jpg)
Verity’s Knowledge Organizer Product
• A tool to capture and organize an organization’s online information
– Build your own Yahoo and AltaVista-style search service
• Site builds its own topical taxonomy
– Using a graphical user interface
• Tool indexes within categories and across them
• End user can
– drill down within topics
– search within and across topics
![Page 79: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/79.jpg)
A Modest Proposal: The Accidental Thesaurus
• For intranet, online product catalog, newspaper, campus sites
• Build a thesaurus based on what people look for
• Don’t even try to be comprehensive
• Use your search logs to find what people look for -- and how they actually search
• Fuzzy matching of user searches against thesaurus, a la AskJeeves
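A rough sketch of the proposal, assuming a plain-text search log with one query per line. The thesaurus entries and URLs are hypothetical, and `difflib.get_close_matches` from the Python standard library stands in for whatever fuzzy matcher a real site would choose.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical hand-built thesaurus: preferred term -> best page for that term.
THESAURUS = {
    "application for graduation": "registrar.example/graduation",
    "overseas study": "studyabroad.example/",
    "school of music": "music.example/",
}

def top_queries(log_lines, n=3):
    """Count normalized queries in a search log and return the n most frequent."""
    counts = Counter(line.strip().lower() for line in log_lines if line.strip())
    return counts.most_common(n)

def lookup(query, thesaurus, cutoff=0.6):
    """Fuzzy-match a user query against thesaurus terms; return the page or None."""
    candidates = get_close_matches(
        query.strip().lower(), list(thesaurus), n=1, cutoff=cutoff
    )
    return thesaurus[candidates[0]] if candidates else None

log = ["School of Music", "school of music ", "overseas study"]
print(top_queries(log, n=1))
print(lookup("overseas studies", THESAURUS))
```

The log tells you which terms deserve an entry; the fuzzy lookup lets slightly different wordings land on the same hand-picked answer.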
![Page 80: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/80.jpg)
New Job Title: The Info Snout
• Like an Info Scout...only nosier
• Similar job to a cataloging librarian...more like a pathfinder builder
• Daily routine:
– Look at search logs
– Find new terms, add to thesaurus
– Also look at company newsletters, newspaper, trade journals, etc.
![Page 81: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/81.jpg)
Lack of Structure
• Today’s spiders effectively index every page as a separate document
• What if an OPAC did that?
• The atom in a hit list should be a document, not a page
• With XML, one could define structure for documents
• But will we have one definition, or many?
![Page 82: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/82.jpg)
The Future
• Much more intelligent engines
• Not much more intelligence in users
• The linear, undifferentiated hit list will die
• Cross-language
• Text, image, sound, video
• The “Star Trek” computer model of searching
![Page 83: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/83.jpg)
A Comment from the PR Person at a Major Internet Search Service...
• “I hope you are aware of our product, and I hope your remarks will show that our product is one of the good ones, not one of the evil ones…”
• We will not name the company, but its name evokes “aurora borealis”….
![Page 84: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/84.jpg)
Infonortics Search Engines Conference
• Outstanding two-day conference with leading search engine experts
– From academe and from search industry
• Held April 1 in Boston; two previous conferences
• Scheduled for April 19-20, 1999
– Back Bay Hilton, Boston
• See www.infonortics.com
![Page 85: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/85.jpg)
Special Thanks To...
• Judy Matthews, Michigan State University Libraries
• Lou Rosenfeld, Argus Associates
• Sue Davidsen, Michigan Electronic Library
• Julie Long, Advanced Information Consultants
![Page 86: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/86.jpg)
See Related Articles in June 1998 issue of Searcher
• “Infonortics '98 Search Engines Conference” article by Judy Matthews and Rich Wiggins: http://www.infotoday.com/searcher/jun/story4.htm
• Article & chart covering search engine trends by Susan Feldman: http://www.infotoday.com/searcher/jun/story2.htm
![Page 87: Midnight in the Garden of Good and Evil Search Engines](https://reader035.vdocuments.us/reader035/viewer/2022081506/56814dd4550346895dbb36c6/html5/thumbnails/87.jpg)
These slides will appear...
• www.nemonline.org/present/rww