Evolving Internet Technologies: Web Search Engines

Danny Sullivan, Editor, SearchEngineWatch.com

http://searchenginewatch.com/

Overview

Key “technology” in 2001 was survival

Crawlers replacing humans

New & old players to watch

11 September & Mindreading

Other Things

RIP 2001

Go.com (Infoseek): You were one of the first web-wide spiders and later added your own human directory of sites

NBCi (Snap): You provided your own human-compiled guide to the web

Excite: You were another of the oldest web spiders to finally cease crawling

Search Economics

Economics is boring but important!

Makes search engines viable; may impact results

Banner ads no longer sell

Listing services new way to make money

Allow much needed “conversation” between search engines and site owners…

But more interactive with results than banners, so searchers and site owners have new fears

What’s offered & should you worry?

Paid Placement

Buy your way to the top

All sell it, even Google

Overture (GoTo) sells for AOL, AltaVista, Ask, HotBot, Lycos, Yahoo and others

In Europe, Espotting sells for Yahoo, Lycos, others

Paid Placement Concerns

Users don’t really seem to mind -- yet

Similarity to “editorial” may cause distrust

Main reason behind FTC complaint last July

Ask Jeeves, Lycos recently improved labels

Why deny users top sites, if they don’t pay???

Heavy “ad break” might drive users away…

Meta Search or Meta Ads?

Meta Ads

Dogpile, http://www.dogpile.com

Search.com, http://www.search.com

Mamma, Metacrawler like above

Meta Search

Vivisimo, http://vivisimo.com

IxQuick, http://www.ixquick.com

qbSearch, http://www.qbsearch.com

SurfWax, http://www.surfwax.com

Paid Submission

Pay to get your site reviewed quickly

No guaranteed ranking – no guarantee to even be included!

Yahoo and LookSmart both offer

Mandatory for business categories

Annual charge at Yahoo: Yellow Pages

Paid Submission Concerns

Is it fair to miss some businesses?

How many florists do you want? 100, 1000?

What about non-profits, hobbyists?

Non-commercial categories exempt at Yahoo

LookSmart’s use of Zeal.com feeds its non-commercial listings, giving good balance

Paid Inclusion

Get deeper representation in listings and, with crawlers, faster revisits

Usually doesn’t guarantee rankings, but…

Like having more tickets in the lottery – more chances to win

Every major crawler but Google sells this, as does LookSmart

Paid Inclusion: Example

Inktomi: $39 gets first URL listed in 2 days, revisited each week

Want more, $12-15 each, or CPC pricing

No pay? Still might get included, anyway

Program has provisions for non-profits

No rank boost
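
The flat-fee pricing above can be turned into a quick back-of-envelope estimate. The sketch below is an illustration only: it assumes the $39 first-URL fee and the $12-15 range for each additional URL quoted on this slide, ignores the CPC option, and says nothing about the billing period, which the slide does not state. The function name `inclusion_cost_range` is made up for this example.

```python
def inclusion_cost_range(num_urls, first_url_fee=39, extra_low=12, extra_high=15):
    """Return (lowest, highest) flat-fee cost in dollars for num_urls pages.

    Assumes one first-URL fee plus a per-URL fee for every further URL,
    where the per-URL fee falls somewhere between extra_low and extra_high.
    """
    if num_urls < 0:
        raise ValueError("num_urls must not be negative")
    if num_urls == 0:
        return (0, 0)
    extra_urls = num_urls - 1
    lowest = first_url_fee + extra_urls * extra_low
    highest = first_url_fee + extra_urls * extra_high
    return (lowest, highest)


if __name__ == "__main__":
    for count in (1, 10, 100):
        low, high = inclusion_cost_range(count)
        print(f"{count} URLs: ${low} to ${high}")
```

Under these assumptions the cost grows roughly linearly with the number of pages, which is the "more tickets in the lottery" trade-off: each extra page is one more paid chance to be found, with no rank boost attached.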

Paid Inclusion Concerns

Will we see important sites / pages dropped just because they don’t pay?

That works against users and site owners

Is it fair that those who pay are better represented?

The “real” world works this way

Northern Light worked this way for years

May depend on a case-by-case basis

Humans Were Supreme

From start of popular use of the web, human-powered Yahoo has been top search site

Why? It helped you refine. Search for “travel” gave 10 categories rather than 10 million results

Yahoo “seemed” to find things when it actually gave you less but forced you to be more specific

Others followed Yahoo’s lead…

Rise & Fall Of Humans

By 2000, 5 major human directories “powered” 6 of top 10 search engines

But now, 3 directories power 4 of top 10: Yahoo, MSN, Netscape, LookSmart

AOL, Lycos, Ask Jeeves abandoned humans in 2001/2002

Why the change?

Human Weaknesses

Editors cost money

Go and NBCi ran out of this in 2001

Ask also scaled back on human answers

Machines can now do some of what humans originally did…

“Related Searches” refine queries in the way categories did

“Autocategorization” also refines…

Auto-Categorization

Group pages into categories, on the fly

One reason why Teoma, Wisenut and Vivisimo get good reviews

Not news to Northern Light!

Google says not necessary, but we’ll see

They find human categorization better (directory tab)
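
To make "group pages into categories, on the fly" concrete, here is a toy sketch. It is not how Teoma, Wisenut, Vivisimo or Northern Light actually work; it simply assumes that result titles sharing a non-query word belong together. The names `autocategorize`, `STOPWORDS` and the sample titles are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Tiny stopword list, an assumption for this sketch only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "with"}


def tokens(text):
    """Lowercased set of words in text, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def autocategorize(query, titles):
    """Group result titles under the most widely shared non-query word."""
    query_terms = tokens(query)
    title_terms = [tokens(title) - query_terms for title in titles]
    # Count in how many titles each term appears.
    doc_freq = Counter(term for terms in title_terms for term in terms)
    groups = defaultdict(list)
    for title, terms in zip(titles, title_terms):
        shared = [term for term in terms if doc_freq[term] >= 2]
        if shared:
            # Most common term wins; ties broken alphabetically.
            label = min(shared, key=lambda term: (-doc_freq[term], term))
        else:
            label = "(other)"
        groups[label].append(title)
    return dict(groups)


if __name__ == "__main__":
    results = [
        "Jaguar cars for sale",
        "Jaguar cars dealers",
        "Jaguar cat habitat",
        "Jaguar cat facts",
        "Jaguar operating system",
    ]
    for label, members in autocategorize("jaguar", results).items():
        print(label, "->", members)
```

Even this crude grouping lets a searcher refine an ambiguous query the way a directory category once did, which is the effect the slide describes.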

But Humans Still Involved

Crawlers better at being “human” because they leverage human work more than in the past

Human-made links used to determine importance

Links used to determine context of pages

Links used to autocategorize into “communities”

Crawlers also dependent on directories, giving them great weight in considering how to rank

So what happens now that Yahoo & LookSmart are more commercial?

What happens if the Open Directory fails?
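
"Human-made links used to determine importance" can be illustrated with a simplified, PageRank-style iteration. This is a textbook-style sketch, not any particular engine's ranking code; the damping value, the iteration count, the function name `link_scores` and the sample graph are all assumptions made for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; scores sum to about 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    count = len(pages)
    scores = {page: 1.0 / count for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score.
        updated = {page: (1.0 - damping) / count for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / count
                for other in pages:
                    updated[other] += share
        scores = updated
    return scores


if __name__ == "__main__":
    graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
    for page, score in sorted(link_scores(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In the sample graph the page everyone links to comes out on top, and the one page it links to in turn inherits much of that weight. The same mechanism explains why a heavily linked directory carries so much influence, and why its becoming more commercial matters.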

Who’s New: Crawlers

Teoma.com

Potential there, but will Ask have the funds and know-how (this time) to make it happen?

Coverage is set to grow; of course, paid inclusion was first “improvement” shown

WiseNut.com

LookSmart set to buy it; will this solve the freshness issue?

Who’s Still Hot

Google (and Google Toolbar)

Everything they do is magic

Good: finally, a tool you can learn and depend on

AllTheWeb.com

Big improvements recently; take another look

Don’t forget to visit Yahoo’s categories or surprise yourself with a search at MSN

Lessons Of 11 September

People hit Google & others for news

Terms included: cnn, news, world trade center, bbc, reuters, msnbc, sky news, new york times, pentagon, bin laden, american airlines, united airlines

What did they get?…

Google results, 4 hours after attack

FAST/AllTheWeb.com results, 4 hours after attack

Ask Jeeves results, 4 hours after attack

But at AltaVista, less than 2 hours after the attack…

“Blended” results mixed in news content, even if news option on home page had been ignored

Why Did AltaVista Succeed?

We know historically that home page options DO get ignored, but we learn this again on Sept. 11, by watching Google and others

Read My Mind

Sept. 11 dramatically illustrates the main search challenge – the need to somehow automatically hit the correct dataset

Images for search on “pictures of spain”

MP3 files for search on “madonna”

Movie info for search on Harry Potter this month

How NOT to do it, then good examples…

Examples Of Mind Reading

Smart query analysis, then suggestions or insertions of non-web material

Products at AltaVista

Sidebar results at AllTheWeb

News, dictionary, stock & more at Google

Encyclopedia at MSN

Careful not to take away all control

Power search for few who want to drive
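
A minimal sketch of "smart query analysis" follows, assuming nothing more than keyword hints. Real engines use far richer signals; a query like “madonna” needs knowledge of names, not just keywords, so this toy version would send it to the plain web index. The `HINTS` table and the function `pick_datasets` are hypothetical.

```python
# Hypothetical hint words for each specialty dataset.
HINTS = {
    "images": {"pictures", "photos", "images"},
    "news": {"news", "cnn", "bbc", "reuters"},
}


def pick_datasets(query):
    """Return the datasets to search, specialty matches first, web last.

    The web index is always included so the searcher keeps control.
    """
    words = set(query.lower().split())
    matched = [name for name, hints in HINTS.items() if words & hints]
    return matched + ["web"]


if __name__ == "__main__":
    for query in ("pictures of spain", "bbc news", "madonna"):
        print(query, "->", pick_datasets(query))
```

Keeping the web results in every answer reflects the caution on this slide: blend specialty material in, but do not take away all control.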

Specialty Search / Vortals

To mind read, you need specialty datasets

Among the majors, Google & AllTheWeb pushing here, & I think they’ll keep going

Also think (and hope) we’ll still see more “vortals” or “vertical portals”

Moreover, http://www.moreover.com

MessageKing, http://www.messageking.com

xrefer, http://www.xrefer.com

LawCrawler, http://www.lawcrawler.com

Other Issues & Trends

Freshness

Size

“Off the page” ranking criteria

Feeling For Freshness

AllTheWeb pushed end of last year to be 9-12 days old at most – now more likely to be a month, like others

Google & others aiming to be less than a month old or fresher for key documents

Just show dates when pages were visited!

Still Growing, But Still Missing

The leaders?

Google, 1.1 to 1.6 billion documents

AllTheWeb, 625 million

Large index probably more comprehensive

We do want more index growth!

However, don’t judge a search engine only based on its index size…

Does Size Matter?

To professionals, yes.

Coverage helps them find unusual or obscure material

What good is half a haystack?

To average users, not really.

They desperately need better relevancy.

How about I dump a haystack on your head?

100 million extra pages makes no difference to best matches for “horoscopes” or “britney spears”

“Off The Page” Ranking

Looking beyond content of the page, since webmasters can’t easily control this

Link analysis still going strong

But can produce oddities, like infamous Bush result

Under new pressure from link spammers

Clickthrough measurements not as hot

Personalization might get revived with Google

Past fears that it would limit results, rather than help

Some Closing Thoughts

Search engines are the top resource for Americans seeking answers, used 32% of the time

--Consumer Daily Question Study, Fall 2000

Yet we’ve had them less than 10 years!

Answers to everything weren’t on web before, aren’t now and never will be, so…

Just One Of Many Tools

Don’t expect miracles from search engines

They’re great “Swiss Army Knives,” but you’ll still want an entire toolbox

My hot search tools? Telephone & Email!

Use them to avoid “search rage”

Stop searching after 10 minutes and try other means. Also…

Be Non-Traditional

Forget Boolean, please

Don’t cast your net wide

You don’t need every synonym in your query…

Instead, explore what’s in the first catch!

Unlike traditional tools, web documents LINK

A few good pages usually lead you to more good pages – your answer may be a few clicks away

You’ll also find links bring you to documents that contain the synonyms you would have tried

http://searchenginewatch.com

This Presentation - http://calafia.com/presentations

Search Engine Watch - http://searchenginewatch.com

Web Searching Tips – Search Engine Listings

Free Search Engine Newsletters (SearchDay – Search Engine Report)

Become A Member – You Support Me & Chris Sherman! (and get some extra benefits)