
© Tefko Saracevic, Rutgers University 1

Web sources and library & information services

Finding, evaluating and using a variety of Web sources for searching and reference


Similarities between Web searching & IR & reference
• Basic principles of approach are the same
  – human-human interaction: the interview
    • social, organizational, cognitive, affective aspects to explore, including task, need …
  – preparation of search concepts, terms, logic
  – determination of range, restrictions
  – estimation of relevance


Differences
• Vastly different sources
  – as to contents, authority, reliability, persistence
  – variation in amounts, depth, breadth
• Very different organization
  – little standardization, few if any fields
• Quite different search engines & capabilities, basic & advanced
  – also different from engine to engine
• Differing search strategies needed


Also: invisible Web
• Materials that general search engines cannot or WILL not include in their collections of Web pages (indexes)
• You cannot find them through general search engines
• Contains a vast amount of information
  – much of it authoritative, of high quality


Why do search engines miss?
• Size: the Web is huge; no engine can cover all of it
• Economics: associated costs are high
  – also pay per crawl & rank
• Technical: capabilities are still limited
• Spam: eliminating the bad also loses some good
• Restrictions: some sites do not let crawlers in
• Deep structure: some sites are too complex to crawl

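One common mechanism behind the "Restrictions" point is the Robots Exclusion Protocol: a site publishes a robots.txt file, and well-behaved crawlers skip the paths it disallows, so those pages never reach the engine's index. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content, the crawler names and the example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (real sites publish theirs at /robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search

User-agent: BadBot
Disallow: /
"""


def crawl_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a crawler named `agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical crawler names and URLs.
    for agent, url in [
        ("GenericCrawler", "http://example.org/index.html"),
        ("GenericCrawler", "http://example.org/private/report.html"),
        ("BadBot", "http://example.org/index.html"),
    ]:
        print(agent, url, crawl_allowed(ROBOTS_TXT, agent, url))
```

The file is only advisory: it keeps out crawlers that choose to honor it, which is why pages behind logins or search forms stay invisible for other reasons as well.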

Needed for Web searching
• Knowledge & competencies
  – variety of Web sources
  – their organization
  – search engines
  – Web search strategies
  – search dynamics, feedback
• Keeping up & up & up
  – constant updates, changes, innovations
  – many domain/subject specific


Web size - who knows?
• Estimated over 16 million web servers (Lawrence & Giles, 1999)
  – but only a fraction are of direct relevance to searching
• Domains of sites
  • 83% commercial, 6% scientific or educational, 3% health
  • 2.5% personal, 2% societies, 1.5% government
  • about 1% each community, religion
  • 1.5% pornographic
• Web Characterization Project - OCLC
  – statistics, trends, reports, links …; for 2001 it reports 8.5 million web sites
  – http://wcp.oclc.org/


Organization of sources
• No standardization across sources
• Major approaches in search engines
  – classification: many directory types used
  – statistical analyses of terms, links
• Metatags in sources
  – to enable retrieval by fields
  – HTML “keywords”, “description”
    • 34% of sites use them
  – Dublin Core: 0.3% of sites use it
• Organization: a hindrance to retrieval
  – also faked contents to force retrieval

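To show what "retrieval by fields" can draw on, the sketch below pulls the HTML “keywords” and “description” metatags, plus any Dublin Core elements, out of a page using Python's standard html.parser. It assumes the common convention of writing Dublin Core in HTML as meta elements named "DC.title", "DC.creator" and so on. The sample page is hypothetical. Because metatag content is supplied by the page author, it is also where the faked contents mentioned above get planted.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Authors mix case ("Keywords", "DC.Title"), so normalize the names.
            self.meta[name.lower()] = content.strip()


def extract_fields(html: str) -> Dict[str, Dict[str, str]]:
    """Split a page's metatags into plain HTML fields and Dublin Core (DC.*) fields."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    html_fields = {k: parser.meta[k] for k in ("keywords", "description") if k in parser.meta}
    dublin_core = {k: v for k, v in parser.meta.items() if k.startswith("dc.")}
    return {"html": html_fields, "dublin_core": dublin_core}


# Hypothetical sample page.
SAMPLE_PAGE = """
<html><head>
<title>Sample page</title>
<meta name="keywords" content="web searching, reference">
<meta name="description" content="A hypothetical page about Web reference sources.">
<meta name="DC.title" content="Web reference sources">
<meta name="DC.creator" content="Example Author">
</head><body><p>Body text.</p></body></html>
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_PAGE))
```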

Sources & search engines
• Indexed by search engines (publicly indexed)
  – by terms, selection, links, registration
• Not publicly indexed
  – many domain sources will not be found, e.g. digital libraries, online journals, reference
  – many commercial sites will hardly be found
• Differing approaches to inclusion/selection
  – mostly automatic; also generic source providers
  – increasingly, added human evaluation & selection


Search engine coverage
• No engine covers more than 16% of the WWW
• With respect to the combined coverage of the top 11:
  – Northern Light 38.3%; Snap 37.1%; AltaVista 37.1%; HotBot 27.1%; MS 20.3%; Infoseek 19.2%; Google 18.6%; Yahoo 17.6%; Excite 13.5%; Lycos 5.9%; EuroSeek 5.2%
  – HotBot, MS, Snap & Yahoo use Inktomi as search provider, but have different filtering & Inktomi databases
• Northern Light has a ‘special collection’: documents not part of the publicly indexable web
• Hard to discern & compare coverage
• Many national search engines, each with its own coverage

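The coverage figures above are relative: each engine's share of everything the 11 engines found together. That is why they add up to well over 100%, since the engines overlap. Read together with the 16% ceiling, and assuming the 16% engine is the top-ranked Northern Light, they also give a rough size check: the engines combined reach roughly 16 / 0.383 ≈ 42% of the Web. A minimal sketch of the relative-coverage calculation; the engine names and result sets are hypothetical toy data, not the study's data.

```python
from typing import Dict, Set


def relative_coverage(results: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percent of the combined (union) result set that each engine returned."""
    combined: Set[str] = set().union(*results.values())
    if not combined:
        return {engine: 0.0 for engine in results}
    return {engine: 100.0 * len(urls) / len(combined) for engine, urls in results.items()}


# Hypothetical toy data: distinct URLs each engine returned for the same test queries.
RESULTS = {
    "EngineA": {"u1", "u2", "u3", "u4"},
    "EngineB": {"u3", "u4", "u5"},
    "EngineC": {"u1", "u6"},
}

if __name__ == "__main__":
    for engine, share in sorted(relative_coverage(RESULTS).items(), key=lambda kv: -kv[1]):
        print(f"{engine}: {share:.1f}% of combined coverage")
```

With six distinct URLs in the union, the three shares come to about 66.7%, 50.0% and 33.3%, summing to 150% because of the overlap.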

Search features among engines
• Some search features are the same across all engines, but details differ, particularly in advanced search
  – Boolean available
    • but sometimes AND, sometimes OR is the default
  – Differences may be found in:
    • phrases, proximity, truncation, case sensitivity, relevance feedback, field searching, special features
    • term expansion to concepts (latent semantic indexing)

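Why the default operator matters: the same bare query can return very different result sets depending on whether the engine silently joins the terms with AND or with OR. A minimal sketch over a tiny inverted index; the documents are hypothetical, and the trailing * for right truncation is an assumed syntax, since engines differed in how, and whether, they supported truncation.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: each lower-cased term maps to the IDs of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def postings(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Documents matching one query term; a trailing '*' (assumed syntax) means right truncation."""
    if term.endswith("*"):
        prefix = term[:-1]
        return set().union(*(ids for t, ids in index.items() if t.startswith(prefix)))
    return index.get(term, set())


def search(index: Dict[str, Set[str]], query: str, default_op: str = "AND") -> List[str]:
    """Join bare query terms with the engine's default operator (AND or OR)."""
    term_sets = [postings(index, t) for t in query.lower().split()]
    if not term_sets:
        return []
    if default_op == "AND":
        hits = set.intersection(*term_sets)
    elif default_op == "OR":
        hits = set.union(*term_sets)
    else:
        raise ValueError("default_op must be 'AND' or 'OR'")
    return sorted(hits)


# Hypothetical mini-collection.
DOCS = {
    "d1": "web search engines and reference services",
    "d2": "library reference desk",
    "d3": "web directories for browsing",
    "d4": "libraries on the web",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "web reference", "AND"))  # only documents with both terms
    print(search(idx, "web reference", "OR"))   # documents with either term
    print(search(idx, "librar*"))               # truncation matches "library" and "libraries"
```

Under AND the two-term query returns one document; under OR it returns all four, which is the kind of difference a searcher has to check for each engine.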

Search strategies & outputs
• Geared toward very short searches
  – the big majority of searches have 2-3 terms (av. 2.5)
    • in IR, av. 7-14 terms, which makes a big difference
• Directory browsing a big component; not so in IR
• Geared toward limited top outputs
• Ranking output by relevance predominates
  – relevance calculations differ & are proprietary (secret)
  – except Google, which published its method
  – affects search strategy: you guess how it is done

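Google's published method was PageRank (Brin & Page, 1998): a page ranks higher when highly ranked pages link to it. The sketch below is a minimal textbook power-iteration version on a hypothetical four-page graph. The damping factor of 0.85 is the value commonly cited from the original paper; spreading the rank of pages without out-links evenly over all pages is one common convention, assumed here. It shows only the link-analysis idea, not how any engine combined it with term matching and other signals.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Textbook PageRank by power iteration; the returned ranks sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

Here C, which three pages point to, comes out on top, while D, which nothing links to, keeps only the baseline (1 - 0.85) / 4 = 0.0375.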

Meta search engines
• Search engines that cover search engines; many around, e.g.
  – All4one http://all4one.com/
    • four windows, good for comparison
  – CNET Search.com http://www.search.com/
    • meta engine of meta engines; customization
• Search Engines Worldwide http://www.twics.com/~takakuwa/search/search.html
  – 174 countries, over 1300 engines
• More on the horizon & differing

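When a meta engine presents one merged list rather than side-by-side windows (as All4one does), it has to fuse the ranked lists returned by several engines. The slide does not say how any of these services did this. The sketch below swaps in a Borda count, one simple and well-known fusion method, purely as an illustration: each engine awards points by rank, so URLs found high up by several engines rise to the top. Engine names and URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def borda_merge(ranked_lists: Dict[str, List[str]], depth: int = 10) -> List[str]:
    """Merge several engines' ranked result lists into one list by Borda count."""
    scores: Dict[str, int] = defaultdict(int)
    for results in ranked_lists.values():
        # Only the top `depth` results count; rank 1 earns `depth` points, rank 2 earns depth - 1, ...
        for position, url in enumerate(results[:depth]):
            scores[url] += depth - position
    # Highest total first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical top results from two engines for the same query.
RESULTS = {
    "EngineA": ["u1", "u2", "u3"],
    "EngineB": ["u2", "u4", "u1"],
}

if __name__ == "__main__":
    print(borda_merge(RESULTS, depth=3))
```

With depth=3, u2 (3 + 2 = 5 points) and u1 (3 + 1 = 4 points), each returned by both engines, lead the merged list ahead of u4 and u3.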

Specialized meta engines
• Selective, with directories & a large number of databases & search engines
  – Complete Planet http://completeplanet.com
  – Invisible Web http://invisibleweb.com
• U.S. federal information via Government Printing Office Access http://www.gpo.gov/gpoaccess
  – Federal Bulletin Board (file libraries for download from many agencies): http://fedbbs.access.gpo.gov


Reference (expert) services
• Reference services: several models
  – Q&A, directories, email answers etc., e.g.
    • Martindale’s Reference Desk, comprehensive http://www-sci.lib.uci.edu/~martindale/Ref.html
    • Ask Jeeves!, most popular http://www.ask.com/
    • Ask ERIC, education questions, email answers http://www.askeric.org/Qa/
    • Information Please, almanac-type questions http://www.infoplease.com/
• Academic libraries developing reference models: a new service area


Libraries as Web sources
• Academic libraries providing open collections & services; models vary
  – Rutgers libraries: big long-term effort http://www.libraries.rutgers.edu/
    • various sources & links involved
    • for domain information & sources go to: Electronic Reference Sources; Subject Research Guides: Social Sciences & Law; Library & Information Science
  – University of California, Berkeley: a most elaborate effort together with Sun Microsystems http://sunsite.berkeley.edu/


Virtual libraries on the Web
• Libraries emerging only on the Web
  – more & more libraries & organizations involved
• Examples of academic & public libraries
  – Virtual Library: Switzerland, US, UK & other countries; ‘oldest virtual library on the Web’
    • http://vlib.org
  – Toronto Public Library
    • http://vrl.tpl.toronto.on.ca/
  – Internet Public Library, Michigan
    • http://www.ipl.org/


Domain sites
• Many domain/issue-specific sites
  – rich & often unique coverage & services
  – different approaches & requirements
• Examples in health-related domains:
  – Medscape, registration required http://www.medscape.com/
  – Rxlist, The Internet Drug Index http://www.rxlist.com/
  – Mayo Clinic HealthOasis http://www.mayohealth.org/


Societies, organizations, publishers
• Great many rich sources for searching
  – differences in requirements, depth, richness
• Examples from a variety of organizations:
  – Assoc. for Computing Machinery http://www.acm.org/
    • Digital Library; subscription or registration
  – State Department http://www.state.gov/
    • about the U.S. & other countries
  – R.R. Bowker http://www.bowker.com/
    • Free Resources from Bowker; Library Resource Guide
  – Genealogy: http://www.familysearch.org/


Language barriers on the Web
• English still the major language
  – but declining, now slightly over 50%
• Multilingual retrieval search engines
  – Euroseek: searches 40 languages http://www.euroseek.com/
  – All the Web: 45 languages http://www.alltheweb.com/
  – in both, a search in a given language covers primarily sources in that language


Language barriers: translations
• A number of translation sites, machine aided: i.e. plug in terms, phrases, sentences in one language & review them in the other, but effectiveness???
  – Free Translations http://www.freetranslations.com
  – Babel Fish http://babelfish.altavista.com/tr
  – Travlang: great for travelers, phrases http://www.travlang.com


Key professional competencies
• Knowledge of SOURCES in area of interest
  • search engines not enough
  • not too helpful in finding these other sources; structure hard to discern
• Evaluation of sources: a key professional skill!
  • standard criteria: quality, veracity, coverage etc.
  • plus Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability
    – http://www.otterbein.edu/learning/libpages/subeval.htm


competencies …
• Knowledge of users & use
• Knowledge of searching
• Use of technology
• Adaptability, flexibility
• Integration with other resources
• Teaching others
• Constant learning & update
