search engines jan damsgaard dept. of informatics copenhagen business school
Post on 20-Dec-2015
Search Engines
Jan Damsgaard
Dept. of Informatics
Copenhagen Business School
http://www.cbs.dk/staff/damsgaard
EBUSS Jan Damsgaard, 2004
Introduction
How to find relevant information on the web is a major problem
Size, growth, and the lack of a universal semantic organization are major impediments
Two major strategies:
1. Improve users' search capability by using raw computer power: search engines
2. Help organize user-relevant information into meaningful categories and bundles of services: portals
Definitions
Search engine
– Specific information retrieval software that returns URLs and descriptions of web pages as results
Portal
– Site that serves as a major starting point for users when they connect to the web; portals combine directories, services, search capabilities and personalization
Search Engines
Technical and business solutions that provide these services on a mass scale are important internet phenomena for two reasons:
– 1) they obtain immense hit rates and are therefore major points of origin for any internet activity
– 2) they are the most important means to channel user search and retrieval
– Therefore they are strategically important, as reflected in the market valuations of the search engine companies
www.mediametrix.com
www.nielsen-netratings.com

FDIM (top ten)
msn.dk               1.365.657
dr.dk                  863.095
krak.dk                794.017
tv2.dk                 540.977
eniro.dk               496.780
ekstrabladet.dk        480.470
ofir.dk                454.639
tdconline.dk           412.923
bt.dk                  336.704
sol.dk                 317.125
netdoktor.dk (26)      103.669
Look at the stickiness
Top 10 sites in November 2000 in terms of minutes spent per month
Where Do Search Engines Develop Market Value?
Market recognition, leading to
– popular use and adoption
– selling ad impressions
– long-term contracts for search engine functionality
Market assessment of real options associated with the recognition of the tool in the marketplace
– future value-added alliances and spin-offs
Search engine basics
Basic information retrieval techniques
Market trends and capabilities
Awareness of popular assessment metrics for search engine performance
Search engine business models
How Search Engines Work
Three components:
– spider or link crawler software agent
– index or catalog database of content
– search engine software or combined meta-search engine
Require significant hardware horsepower, server connectivity and database capabilities
If your pages are not linked from elsewhere, you submit your links yourself
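The index and search components can be illustrated with a minimal sketch. It assumes a spider has already fetched some pages; the `PAGES` dictionary, its URLs, and the function names `build_index` and `search` are hypothetical and not taken from any real search engine.

```python
from collections import defaultdict

# Hypothetical stand-in for pages a spider has already fetched (URL -> text).
PAGES = {
    "http://example.com/a": "search engines index the web",
    "http://example.com/b": "portals organize web services",
    "http://example.com/c": "search the index quickly",
}

def build_index(pages):
    """Index component: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Search component: return URLs containing every query word (AND)."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

index = build_index(PAGES)
print(search(index, "search index"))
```

A real engine would add ranking, persistent storage and distributed hardware on top of this; the sketch only shows why an index makes lookups fast once the crawl is done.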
How do search engines work
Add keywords to text fields
Critical is the choice of the keywords, the possibilities of their combination, and how the search engine exploits the results
Multilingual support
Another issue is how it organizes search results
The most popular search engines

Search Engine   Total from Dec. 2002   Total from March 2002   Total from Aug. 2001
Google          9,732                  8,371                   6,567
AlltheWeb       6,757                  4,388                   4,969
AltaVista       5,419                  3,432                   3,112
WiseNut         4,664                  5,009                   4,587
HotBot          3,680                  2,869                   3,277
MSN Search      3,267                  2,523                   3,005
Teoma           3,259                  1,839                   2,219
NLResearch      2,352                  3,610                   3,321
Gigablast       2,352                  NA                      NA
Popularity over time
March 2002: Google, WiseNut, AlltheWeb
August 2001: Google, Fast, WiseNut
April 2001: Google, Fast, MSN (Inktomi)
Oct. 2000: Fast, Google, Northern Light
July 2000: iWon, Google, AltaVista
April 2000: Fast, AltaVista, Northern Light
Feb. 2000: Fast, Northern Light, AltaVista
Jan. 2000: Fast, Northern Light, AltaVista
Nov. 1999: Northern Light, Fast, AltaVista
Sept. 1999: Fast, Northern Light, AltaVista
Aug. 1999: Fast, Northern Light, AltaVista
May 1999: Northern Light, AltaVista, Anzwers
March 1999: Northern Light, AltaVista, HotBot
January 1999: Northern Light, AltaVista, HotBot
August 1998: AltaVista, Northern Light, HotBot
May 1998: AltaVista, HotBot, Northern Light
February 1998: HotBot, AltaVista, Northern Light
October 1997: AltaVista, HotBot, Northern Light
September 1997: Northern Light, Excite, HotBot
June 1997: HotBot, AltaVista, Infoseek
October 1996: HotBot, Excite, AltaVista
http://searchengineshowdown.com/stats/size.shtml
Also specific services
E.g. Google provides
– Find PDF files
– Stock quotes
– Cached links
– Similar pages
– Who links to you
– Specific site
– Dictionary definitions
– Find maps
Major design issues: completeness and relevance
The set of relevant replies
The set of obtained results
The larger the overlap between the two sets, the better in terms of completeness
The smaller the set of non-relevant replies, the more relevant the search
How to organize the results for fast reviewing
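The two measures can be made concrete with a small sketch. In information retrieval, completeness is usually called recall and relevance is usually called precision; the function name and the page identifiers below are hypothetical, chosen only for illustration.

```python
def completeness_and_relevance(relevant, obtained):
    """Return (recall, precision) for one query.

    recall    = overlap / relevant replies  (completeness)
    precision = overlap / obtained results  (relevance)
    """
    relevant, obtained = set(relevant), set(obtained)
    overlap = relevant & obtained
    recall = len(overlap) / len(relevant) if relevant else 0.0
    precision = len(overlap) / len(obtained) if obtained else 0.0
    return recall, precision

# Hypothetical example: four pages are truly relevant, the engine returns three.
relevant_pages = {"p1", "p2", "p3", "p4"}
obtained_pages = {"p3", "p4", "p5"}
recall, precision = completeness_and_relevance(relevant_pages, obtained_pages)
print(f"completeness (recall): {recall:.2f}, relevance (precision): {precision:.2f}")
```

Here the overlap is two pages, so half of the relevant pages were found and two thirds of the returned pages were relevant.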
Page Ranking for Relevance
Biased or unbiased by search engine?
The size of the search space (pages; e.g. Google currently addresses 1,346,966,000 pages)
Use of keywords: in title, meta-tag information in HTML code, or near top of the page
Use of other facilities like semantic nets or reliability indices (e.g. Google uses page ranks and filtering)
Daily, weekly, or monthly refresh by web crawler software
For an analysis see http://www.notess.com/search/
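How keyword placement might feed into ranking can be sketched as follows. This is an illustrative toy, not how any particular search engine scores pages: the weights (3, 2, 1), the 50-word cutoff for "near top of the page", and the function name `keyword_score` are all assumptions.

```python
def keyword_score(keyword, title, meta_keywords, body, top_words=50):
    """Score a page by where the keyword appears (weights are assumptions)."""
    kw = keyword.lower()
    score = 0.0
    if kw in title.lower().split():
        score += 3.0  # keyword in the title
    if kw in [m.lower() for m in meta_keywords]:
        score += 2.0  # keyword in the meta-tag information
    if kw in body.lower().split()[:top_words]:
        score += 1.0  # keyword near the top of the page
    return score

# Hypothetical pages: (title, meta keywords, body text).
pages = {
    "page_a": ("Search engines explained", ["search", "web"],
               "A search engine crawls and indexes pages"),
    "page_b": ("Company news", ["news"],
               "Our new search box is live"),
}

scores = {name: keyword_score("search", *fields) for name, fields in pages.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Link-based measures such as page ranks would be combined with a placement score like this one; that combination is outside the scope of the sketch.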
Special features of search engines
Multilingual searches
Natural language interfaces
Image searches
Agents (specific crawlers and service providers, e-mail, news agents, shopping and trading agents)
Search Assistance Features
Phrase Searching
– finds terms you enter into the search box as a phrase; tells you in results whether any full or partial matches were found
Stemming
– Ability of the search engine to search for variations of a word based on its stem
– Entering "swim" might also find "swims" and maybe "swimming," depending on the search engine; more important in some other languages
– Some search engines have stemming switched on by default
Clustering
– Allows only one page per site to be represented in the results
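Stemming and clustering can both be sketched in a few lines. The stemmer below is deliberately crude (it only strips "ing" and "s") and is far simpler than the stemmers real engines use; the function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse

def naive_stem(word):
    """Reduce a word to a rough stem by stripping 'ing' or a final 's'."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling, e.g. "swimm" -> "swim".
        if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    elif word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word

def cluster_by_site(ranked_urls):
    """Keep only the first (best-ranked) page from each site."""
    seen_sites = set()
    kept = []
    for url in ranked_urls:
        site = urlparse(url).netloc
        if site not in seen_sites:
            seen_sites.add(site)
            kept.append(url)
    return kept

stems = [naive_stem(w) for w in ("swim", "swims", "swimming")]
results = cluster_by_site([
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
])
print(stems, results)
```

With stemming applied to both the query and the indexed words, "swim", "swims" and "swimming" all map to the same index entry, which is why a search for one can find the others.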