
The Future of the Online Catalog

Andrew K. Pace

NCSU Libraries

July 28, 2006

Library Automation: Yesterday’s Technology, Tomorrow

What I will cover:
Online catalog: the problem
Brief environmental scan
Endeca: team, timeline, technology
Usability, statistical results, relevance study
Dis-integrated systems / Future Catalogs

What ILS Catalogs Do Well… (liberally stolen from Roy Tennant)
Inventory control: What and where
Known item searching

What ILS Catalogs Don’t Do Well… (liberally stolen from Roy Tennant, and augmented by me)
Any search other than known item
Most anything other than books (serials, e-resources, articles, digital objects)
Logical groupings of results (e.g. FRBR)
Faceted browsing
Relevance ranking
Sideways searching (suggestions, expansion of searches and search targets)

“OPAC Complainers”

“There is certainly no dearth of OPAC complainers. You have Andrew Pace (OPACs suck), and Roy Tennant (You Can’t Put Lipstick on a Pig) writing and presenting about the need for change (more simplicity) in the OPAC world. I can appreciate their arguments for a simpler OPAC (not to mention the rest of the system) but other then [sic] present their arguments, neither has much in the way of suggestions nor have they sparked a movement among librarians or the automation vendors to do anything about the situation.”

- ACRL Blog entry, Oct. 13 2005

NextGen Library Search Tools
RedLightGreen (RLG)
OCLC Fictionfinder
Vivisimo clustered search (Ex Libris, Serials Solutions)
Grokker (EBSCO)
Aquabrowser visual context
Endeca Information Access Platform
OCLC Custom Worldcat and OpenWorldCat
Innovative Interfaces OPAC Pro & Encore
Ex Libris Primo
Polaris, AJAX-Enabled OPAC
SirsiDynix Enterprise Portal System, FAST
Talis, et al. Web Services
Georgia Pines and the Library 2.0 Bandwagon

Endeca purchase decision
Lots of topical searches and poor subject access
– Keyword gives too many or too few results – leads to general distrust
– Misunderstanding of authority headings
No relevancy ranking of results
Needed more responsiveness (speed)

Implementation Team
7 representative team members
– Andrew Pace, IT, Chair
– Emily Lynema, IT, ex officio (tech lead)
– Cindy Levine, Research and Information Services
– Erik Moore, IT, ex officio (ILS librarian)
– Charley Pennell, Metadata and Cataloging
– Shirley Rodgers, IT
– Tito Sierra, Digital Library Initiatives
Timeline
– License / negotiation: Spring 2005
– Acquire: Summer 2005
– Implementation: August 2005 – January 12, 2006

Technical Overview
Endeca ProFind co-exists with SirsiDynix Unicorn ILS and Web2 online catalog.
Endeca indexes MARC records exported from Unicorn.
Index is refreshed nightly with records added/updated during previous day.

Endeca ProFind Overview

[Diagram, shown in three builds: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> Navigation Engine -> HTTP -> NCSU Web Application -> HTTP -> Client browser. One box groups the Endeca ProFind components. The second build adds the label "Offline - Nightly"; the third adds "Always Online".]

Integrating Endeca
Endeca doesn’t understand MARC data / MARC-8 character encoding – translate to UTF-8 text files
Each night a script updates the data indexed by Endeca:
– Exports updated or new MARC records from Unicorn.
– Reformats and merges these records with those already indexed.
– Starts Endeca re-index – completely rebuilding index for the catalog.
Process requires about 4 hours.
Retain Web2 OPAC for some functionality
– Authority searching - known items and cross-references
– Detailed record pages – how to make Endeca -> Web2 link?
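The nightly merge step might look roughly like the sketch below. This is an illustration only, not the NCSU script: the record layout, the pipe-delimited line format, the `EOR` marker, and all function names are assumptions, and the MARC-8 to UTF-8 conversion is taken to have happened before these functions are called.

```python
from typing import Dict, Iterator, List

# Hypothetical flat layout: record ID -> {field name -> list of values}.
Record = Dict[str, List[str]]


def merge_records(indexed: Dict[str, Record],
                  updates: Dict[str, Record]) -> Dict[str, Record]:
    """Return the full record set after applying one night's export."""
    merged = dict(indexed)   # start from everything already indexed
    merged.update(updates)   # same ID: record replaced; new ID: record added
    return merged


def to_flat_lines(records: Dict[str, Record]) -> Iterator[str]:
    """Serialize records as pipe-delimited text lines (an assumed format)."""
    for rec_id in sorted(records):
        yield f"ID|{rec_id}"
        for field in sorted(records[rec_id]):
            for value in records[rec_id][field]:
                yield f"{field}|{value}"
        yield "EOR"  # hypothetical end-of-record marker


def write_flat_file(records: Dict[str, Record], path: str) -> None:
    """Write the merged set as UTF-8 text, ready for a full re-index."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in to_flat_lines(records):
            fh.write(line + "\n")
```

Because the index is rebuilt completely each night, the sketch writes out the whole merged set rather than a delta.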

Quick Demo
http://catalog.lib.ncsu.edu

Some User Reaction

“This is absolutely the coolest thing I've seen all century.”
- Will Owen, Head of Systems (UNC Libraries)

“Also, I'm really digging the new NCSU library catalog. Very nice.”
- Educause staff (non-librarian)

“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.”
- NCSU Undergrad, Statistics

Basic statistics (March – May 2006)

Requests by Search Type
Search 51%
Search -> Navigation 29%
Navigation 20%

Navigation statistics (March – May 2006)

Navigation Requests by Dimension (number of requests)
Author 70,516
Language 38,074
Subject: Era 38,605
Subject: Region 59,248
Library 87,221
Format 74,985
Subject: Genre 65,545
Subject: Topic 155,856
LC Classification 169,249
Availability 23,848

Navigation statistics (March – May 2006)

Navigation by Dimensions
Subject: Topic 19%
Library 11%
Format 9%
Author 9%
Subject: Genre 8%
Subject: Region 7%
Subject: Era 5%
Language 5%
New 4%
LC Classification 20%
Availability 3%

Sorting statistics (March – May 2006)

Sorting Requests
Most Popular 19%
Title A-Z 13%
Pub Date 53%
Author A-Z 9%
Call Number 6%

Other interesting tidbits… (March 2006)
Authority searching decreased 45%
Keyword searching increased 230%
– Caveat: default catalog search changed from title authority to keyword
~ 5% of keyword searches offered spelling correction or suggestion
– 3.1% - automatic spell correction
– 2.3% - “Did you mean…” suggestion
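The split between automatic correction and a “Did you mean…” suggestion can be pictured as two similarity thresholds. The sketch below uses Python's standard `difflib` and is not Endeca's algorithm; the function name, the cutoff values, and the idea of checking against a vocabulary of indexed terms are all assumptions made for illustration.

```python
import difflib
from typing import Optional, Set, Tuple


def check_spelling(term: str, vocabulary: Set[str],
                   auto_cutoff: float = 0.85,
                   suggest_cutoff: float = 0.6) -> Tuple[str, Optional[str]]:
    """Return (action, replacement); action is 'ok', 'auto', 'suggest' or 'none'."""
    term = term.lower()
    if term in vocabulary:
        return ("ok", None)              # the term is indexed, nothing to do
    # Find the closest indexed term above the looser threshold.
    matches = difflib.get_close_matches(term, sorted(vocabulary),
                                        n=1, cutoff=suggest_cutoff)
    if not matches:
        return ("none", None)            # nothing close enough to offer
    best = matches[0]
    score = difflib.SequenceMatcher(None, term, best).ratio()
    if score >= auto_cutoff:
        return ("auto", best)            # very close: correct silently
    return ("suggest", best)             # fairly close: "Did you mean…"


if __name__ == "__main__":
    vocab = {"catalog", "library", "metadata"}
    for word in ("catalog", "catalg", "katalogue", "zzzz"):
        print(word, check_spelling(word, vocab))
```

Raising or lowering the two cutoffs shifts queries between the automatic and the suggested bucket.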

Usability Testing Trends
10 undergraduate students
– 5 with Endeca catalog
– 5 with old Web2 OPAC
Endeca performed as well as OPAC for known-item searching
– 89% Endeca tasks completed ‘easily’ (8/9)
– 71% OPAC tasks completed ‘easily’ (15/21)
Endeca performs better than OPAC for topical searching
– 61% Endeca tasks completed ‘easily’ (19/31)
– 3% Endeca tasks completed as ‘hard’ (1/31)
– 33% OPAC tasks completed ‘easily’ (13/39)
– 26% OPAC tasks completed as ‘hard’ (10/39)

A study in relevance
Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
100 topical user searches from 1 month in fall 2005
How many of top 5 results relevant?
– 40% relevant in Web2 OPAC
– 68% relevant in Endeca catalog
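A figure like “68% of the top 5 relevant” can be computed from per-search relevance judgments as sketched below. The slides do not say whether results were pooled across all searches or averaged per search; this sketch assumes pooling, and the function name and input layout are hypothetical.

```python
from typing import List


def top_k_relevant_share(judgments: List[List[bool]], k: int = 5) -> float:
    """Share of relevant results among the top k, pooled across all searches.

    judgments[i][j] is True when result j of search i was judged relevant;
    each inner list is in ranked order.
    """
    relevant = 0
    total = 0
    for ranked in judgments:
        top = ranked[:k]        # a search may return fewer than k results
        relevant += sum(top)    # True counts as 1
        total += len(top)
    return relevant / total if total else 0.0
```

For example, two searches judged `[True, False, True, True, False]` and `[False] * 5` pool to 3 relevant results out of 10.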

Relevance defined
Relevance ranking in Endeca – select from a variety of modules and order them based on importance.
Relevance most important in Keyword Anywhere - searches all fields.
At NCSU…
1. Original query term(s) (no thesaurus, stemming, spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author higher than Table of Contents)
4. Number of fields that contain term(s) …
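Ordering modules by importance behaves like a lexicographic sort: a later module only breaks ties left by the earlier ones. The sketch below mimics the four NCSU criteria with a tuple sort key. It is not Endeca's implementation; the field names, the weights, and the whitespace tokenization (which ignores punctuation) are simplifying assumptions.

```python
from typing import Dict, List, Tuple

# Assumed field weights: Title higher than Author higher than Table of Contents.
FIELD_WEIGHTS = {"title": 3, "author": 2, "toc": 1}

Record = Dict[str, str]


def rank_key(query: str, record: Record) -> Tuple[int, int, int, int]:
    """Build a sort key with one slot per ranking module, most important first."""
    terms = query.lower().split()
    phrase = " ".join(terms)
    texts = {f: record.get(f, "").lower() for f in FIELD_WEIGHTS}

    # 1. Every original query term appears exactly as typed (no stemming etc.).
    all_words = {w for text in texts.values() for w in text.split()}
    original = int(all(t in all_words for t in terms))

    # 2. The query appears as an exact phrase in some field.
    exact = int(any(f" {phrase} " in f" {text} " for text in texts.values()))

    # 3 and 4. Best-weighted matching field, then number of matching fields.
    matching = [f for f, text in texts.items()
                if any(t in text.split() for t in terms)]
    best_field = max((FIELD_WEIGHTS[f] for f in matching), default=0)

    return (original, exact, best_field, len(matching))


def rank(query: str, records: List[Record]) -> List[Record]:
    """Sort records so the highest-ranked come first."""
    return sorted(records, key=lambda r: rank_key(query, r), reverse=True)
```

With this ordering, a record whose table of contents contains the exact phrase outranks one whose title merely contains both words, because the phrase module sits above field ranking.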

Future Plans
Ongoing tweaks:
– Continued usability testing
– Relevance ranking algorithms & spell correction thresholds
– Additional browsing options
Endeca 2.0 ideas
– FRBR-ized display
– Discussions with OCLC regarding FAST (Faceted Application of Subject Terminology) and FRBR
– Patron-generated refinements (folksonomies?)
– Enrich records with supplemental Web Services content – more usable TOCs, book reviews, etc.
– The death of authority searching (?)
– More integration with QuickSearch, other data repositories, and third-party discovery tools

Stuff to read…
Rethinking how we provide bibliographic services for the University of California by the Bibliographic Services Task Force
http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf
The Changing nature of the catalog and its integration with other discovery tools by Karen Calhoun
http://www.loc.gov/catdir/calhoun-report-final.pdf
The Changing nature of the catalog and its integration with other discovery tools. Final report. March 17, 2006. Prepared for the Library of Congress by Karen Calhoun: A Critical review by Thomas Mann
http://www.guild2910.org/AFSCMECalhounReviewREV.pdf
A “Next Generation” Catalog, Eric Morgan
http://dewey.library.nd.edu/morgan/ngc/
Metadata Research Center, SILS
http://ils.unc.edu/mrc/
University of Rochester eXtensible Catalog
Toward a 21st Century Catalog, ITAL, Sept. 2006, by Antelman, Lynema, and Pace

From the Calhoun Report

"If one accepts the premise that library collections have value, then library leaders must move swiftly to establish the catalog within the framework of online information discovery systems of all kinds. Because it is catalog data that has made collections accessible over time, to fail to define a strategic future for library catalogs places in jeopardy the legacy of the world's library collections themselves. For this reason, the option of rejecting library catalogs is not considered in this report."

The library system pile

“Seams serve as perceptible boundaries that provide points of reference; without such boundaries readers get ‘lost at sea’ and don’t know where they are in relation to anything else; they can’t perceive either the extent of what they have or what they don’t have.”

- Thomas Mann

Wither or Whither the Catalog?

Reversal of fortune

OLD SEARCH MODEL

NEW SEARCH MODEL

The library system puzzle

[Diagram, shown in two builds. First build: Catalog, Serials, A&I / FT DBs, Web. Second build adds: Digital Repositories, ERM Systems, Guided Navigation, Legacy ILS, Metasearch, IR, GS.]

Thank you.

http://www.lib.ncsu.edu/endeca

Andrew Pace, Head, IT

[email protected]