Post on 21-Dec-2015
© Tefko Saracevic, Rutgers University 1
Web sources and library & information services
Finding, evaluating and using a variety of Web sources for searching and reference

Similarities between Web searching & IR & reference
• Basic principles of approach are the same
  – human-human interaction - interview
    • social, organizational, cognitive, affective aspects to explore, including task, need …
  – preparation of search concepts, terms, logic
  – determination of range, restrictions
  – estimation of relevance

Differences
• Vastly different sources
  – as to contents, authority, reliability, persistence
  – variation in amounts, depth, breadth
• Very different organization
  – little standardization, few if any fields
• Quite different search engines & capabilities - basic & advanced
  – also different from engine to engine
• Differing search strategies needed

Also: invisible Web
• Materials that general search engines cannot or WILL not include in their collections of Web pages (indexes)
• You cannot find them through general search engines
• Contains a vast amount of information
  – much of it authoritative, qualitative

Why do search engines miss?
• Size: Web is huge, cannot cover all
• Economics: associated costs are high
  – also pay per crawl & rank
• Technical: still limited capabilities
• Spam: eliminating bad also loses good
• Restrictions: some sites do not let engines in
• Deep structure: some sites are complex
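The "Restrictions" point can be illustrated with the robots exclusion convention, one common way a site keeps crawlers out. The slide does not name a mechanism, so treating robots.txt as the example is an assumption, and the rules, crawler name and URLs below are hypothetical. A minimal sketch using Python's standard library:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and example.org are placeholder names.
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/index.html"))
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/private/report.html"))
```

A crawler that honors these rules never sees anything under /private/, so those pages stay out of its index however useful they are.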

Needed for Web searching
• Knowledge & competencies
  – variety of Web sources
  – their organization
  – search engines
  – Web search strategies
  – search dynamics, feedback
• Keeping up & up & up
  – constant updates, changes, innovations
  – many domain/subject specific

Web size - who knows?
• Estimated over 16 million web servers (Lawrence & Giles, 1999)
  – but only a fraction of direct search relevance
• Domains of sites
    • 83% commercial; 6% scientific or educational; 3% health
    • 2.5% personal; 2% societies; 1.5% government
    • about 1% each community, religion
    • 1.5% pornographic
• Web Characterization Project - OCLC
  – statistics, trends, reports, links …; for 2001 reports 8.5 million web sites
  – http://wcp.oclc.org/

Organization of sources
• No standardization across sources
• Major approaches in search engines
  – classification: many directory types used
  – statistical analyses of terms, links
• Metatags in sources
  – to enable retrieval by fields
  – HTML “keywords”, “description”
    • 34% of sites use them
  – Dublin Core - 0.3% of sites use it
• Organization: hindrance to retrieval
  – also faked contents to force retrieval
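To make "retrieval by fields" concrete, here is a minimal sketch of how an indexer might pull the HTML “keywords” and “description” metatags (and a Dublin Core style tag) out of a page. The sample page, the class name `MetaTagParser` and the helper `extract_meta` are hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            # Field names are lowercased so "Keywords" and "keywords" match.
            self.meta[name.lower()] = content


def extract_meta(html: str) -> dict:
    """Return a dict of metatag name -> content for one page."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical page for illustration.
PAGE = """<html><head>
<title>Reference desk</title>
<meta name="keywords" content="reference, libraries, searching">
<meta name="description" content="A sample reference page.">
<meta name="DC.creator" content="Example Author">
</head><body></body></html>"""

fields = extract_meta(PAGE)
print(fields["keywords"])
print(fields["description"])
print(fields["dc.creator"])
```

The same fields are what "faked contents" abuse: nothing stops a page author from stuffing the keywords tag with terms unrelated to the page.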

Sources & search engines
• Indexed by search engines (publicly indexed)
  – by terms, selection, links, registration
• Not publicly indexed
  – many domain sources will not be found, e.g. digital libraries, online journals, reference
  – many commercial sites will hardly be found
• Differing approaches to inclusion/selection
  – mostly automatic; also generic source providers
  – increasingly added human evaluation & selection

Search engine coverage
• No engine covers more than 16% of WWW
• In respect to combined coverage of 11 top (%):
  – Northern Light 38.3; Snap 37.1; AltaVista 37.1; HotBot 27.1; MS 20.3; Infoseek 19.2; Google 18.6; Yahoo 17.6; Excite 13.5; Lycos 5.9; EuroSeek 5.2
  – HotBot, MS, Snap & Yahoo use Inktomi as search provider, but have different filtering & Inktomi databases
• Northern Light has ‘special collection’ - documents not part of publicly indexable web
• Hard to discern & compare coverage
• Many national search engines - own coverage
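The two kinds of figures on this slide can be tied together with a little arithmetic. The sketch below assumes that the 16% ceiling belongs to the engine with the largest relative share (Northern Light); the slide does not say so explicitly. Under that assumption, each engine's share of the whole Web and the combined coverage of all 11 engines follow by scaling. The function names are hypothetical.

```python
# Relative coverage: percent of the combined coverage of the 11 engines (from the slide).
RELATIVE = {
    "Northern Light": 38.3, "Snap": 37.1, "AltaVista": 37.1, "HotBot": 27.1,
    "MS": 20.3, "Infoseek": 19.2, "Google": 18.6, "Yahoo": 17.6,
    "Excite": 13.5, "Lycos": 5.9, "EuroSeek": 5.2,
}

# Assumption: the "no more than 16% of WWW" figure is the top engine's coverage.
MAX_ABSOLUTE = 16.0


def absolute_coverage(relative: dict, max_absolute: float) -> dict:
    """Scale relative shares so that the top engine equals max_absolute percent."""
    scale = max_absolute / max(relative.values())
    return {name: share * scale for name, share in relative.items()}


def combined_coverage(relative: dict, max_absolute: float) -> float:
    """Percent of the Web covered by all engines together, under the same assumption."""
    return 100.0 * max_absolute / max(relative.values())


estimates = absolute_coverage(RELATIVE, MAX_ABSOLUTE)
for name, pct in estimates.items():
    print(f"{name:15s} {pct:5.1f}% of the Web")
print(f"Combined: about {combined_coverage(RELATIVE, MAX_ABSOLUTE):.0f}% of the Web")
```

On these numbers even the 11 engines together reach well under half of the Web, which is the practical argument for using more than one engine.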

Search features among engines
• Some search features are the same across all, but details differ - particularly in advanced
  – Boolean available
    • but sometimes AND, sometimes OR is the default
  – Differences may be found in:
    • phrases, proximity, truncation, case sensitivity, relevance feedback, field searching, special features
    • term expansion to concepts (latent semantic indexing)
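Why the default operator matters can be seen on a toy inverted index. This is a minimal sketch, assuming a hypothetical three-document collection and whitespace tokenization; real engines differ in many more details.

```python
from collections import defaultdict

# Hypothetical three-document collection for illustration.
DOCS = {
    1: "web searching and reference services",
    2: "library reference desk",
    3: "web search engines",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query, default="AND"):
    """Combine the query terms with the engine's default Boolean operator."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if default == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


INDEX = build_index(DOCS)
print(sorted(search(INDEX, "web reference", default="AND")))  # [1]
print(sorted(search(INDEX, "web reference", default="OR")))   # [1, 2, 3]
```

The same two-word query returns one document under an AND default and all three under an OR default, so a searcher has to know which default an engine applies.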

Search strategies & outputs
• Geared toward very short searches
  – big majority of searches 2-3 terms (av. 2.5)
    • in IR av. 7-14 - making a big difference
• Directory browsing a big component - not in IR
• Geared toward limited top outputs
• Ranking output by relevance predominates
  – relevance calculations differ & are proprietary (secret)
  – except Google - they published their method
  – affects search strategy - you guess how it is done
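The method Google published is generally known as PageRank, a link-based score; the slide does not name it, and an operational engine combines it with other signals. The following is a minimal sketch of the basic idea by power iteration, assuming a hypothetical four-page Web, a damping factor of 0.85 and that every linked page also appears as a key in the link table.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank. `links` maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure for illustration.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C, which three pages link to, comes out on top, and page D, which nothing links to, comes out last. The query terms play no part in this score, which is one reason ranked output can surprise a searcher.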

Meta search engines
• Search engines that cover search engines – many around, e.g.
  – All4one http://all4one.com/
    • four windows - good for comparison
  – CNET Search.com http://www.search.com/
    • meta engine of meta engines - customization
  – Search Engines Worldwide http://www.twics.com/~takakuwa/search/search.html
    • 174 countries, over 1300 engines
• More on the horizon & differing
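A meta engine has to combine the ranked lists that come back from the engines it covers. The slide does not describe how any particular meta engine does this; the sketch below shows one simple possibility, round-robin interleaving with duplicate removal. The result lists and the function name `merge_results` are hypothetical.

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Interleave ranked URL lists round-robin, dropping duplicate URLs."""
    merged, seen = [], set()
    # zip_longest yields the 1st hit of every engine, then the 2nd, and so on,
    # padding shorter lists with None.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical result lists from two engines.
ENGINE_A = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
ENGINE_B = ["http://example.org/b", "http://example.org/d"]

print(merge_results(ENGINE_A, ENGINE_B))
```

Interleaving treats the engines' rankings as equally trustworthy, which is a design choice; showing each engine in its own window, as the four-window approach does, avoids merging altogether.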

Specialized meta engines
• Selective, with directories & large number of databases & search engines
  – Complete Planet http://completeplanet.com
  – Invisible Web http://invisibleweb.com
• U.S. federal information via Government Printing Office Access http://www.gpo.gov/gpoaccess
  – Federal Bulletin Board (file libraries for download from many agencies): http://fedbbs.access.gpo.gov

Reference (expert) services
• Reference services - several models – Q&A, directories, email answers etc. – e.g.
  – Martindale’s Reference Desk - comprehensive http://www-sci.lib.uci.edu/~martindale/Ref.html
  – Ask Jeeves! – most popular http://www.ask.com/
  – Ask ERIC – education questions - email answers http://www.askeric.org/Qa/
  – Information Please - almanac type questions http://www.infoplease.com/
• Academic libraries developing reference models - new service area

Libraries as Web sources
• Academic libraries providing open collections & services; models vary
  – Rutgers libraries - big long term effort http://www.libraries.rutgers.edu/
  – various sources & links involved
    • for domain information & sources go to:
      – Electronic Reference Sources; Subject Research Guides: Social Sciences & Law; Library & Information Science
  – University of California, Berkeley - a most elaborate effort together with Sun Corporation http://sunsite.berkeley.edu/

Virtual libraries on the Web
• Libraries emerging only on the Web
  – More & more libraries & organizations involved
Examples of academic & public libraries
  – Virtual Library - Switzerland, US, UK & other countries – ‘oldest virtual library on the Web’
    • http://vlib.org
  – Toronto Public Library
    • http://vrl.tpl.toronto.on.ca/
  – Internet Public Library, Michigan
    • http://www.ipl.org/

Domain sites
• Many domain/issue specific sites
  – rich & often unique coverage & services
  – different approaches & requirements
• Examples in health related domains:
  – Medscape - registration required http://www.medscape.com/
  – Rxlist - The Internet Drug Index http://www.rxlist.com/
  – Mayo Clinic HealthOasis http://www.mayohealth.org/

Societies, organizations, publishers
• Great many rich sources for searching
  – differences in requirements, depth, richness
Examples from a variety of organizations:
  – Assoc. for Computing Machinery http://www.acm.org/
    • Digital Library; subscription or registration
  – State Department http://www.state.gov/
    • about the U.S. & other countries
  – R.R. Bowker http://www.bowker.com/
    • Free Resources from Bowker; Library Resource Guide
  – Genealogy: http://www.familysearch.org/

Language barriers on the Web
• English still the major language
  – but declining, now slightly over 50%
• Multilingual retrieval search engines
  – Euroseek – searches 40 languages http://www.euroseek.com/
  – All the Web – 45 languages http://www.alltheweb.com/
  – in both, search in different languages covers primarily their language sources

Language barriers: translations
• A number of translation sites
  – machine aided – i.e. plug in terms, phrases, sentences in one language & review in the other, but effectiveness???
  – Free Translations http://www.freetranslations.com
  – Babel Fish http://babelfish.altavista.com/tr
  – Travlang – great for travelers – phrases http://www.travlang.com

Key professional competencies
• Knowledge of SOURCES in area of interest
    • search engines not enough
    • not too helpful in finding these other sources; structure hard to discern
• Evaluation of sources – a key professional skill!
    • standard criteria: quality, veracity, coverage etc.
    • plus Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability
      – http://www.otterbein.edu/learning/libpages/subeval.htm

Competencies …
• Knowledge of users & use
• Knowledge of searching
• Use of technology
• Adaptability, flexibility
• Integration with other resources
• Teaching others
• Constant learning & update