Maintaining the integrity of e-book titles in CityU library catalogue

7th HKIUG, 12 Dec 2006, HKUST. Joanna Pong, Philip Wong, Run Run Shaw Library, City University of Hong Kong



Page 1: Maintaining the integrity of e-book titles  in CityU library catalogue


Maintaining the integrity of e-book titles in CityU library catalogue

7th HKIUG, 12 Dec 2006, HKUST

Joanna Pong, Philip Wong
Run Run Shaw Library

City University of Hong Kong

Page 2: Maintaining the integrity of e-book titles  in CityU library catalogue

Maintaining the integrity of e-book titles in CityU library catalogue, 7th HKIUG, 2006

Table of Contents

1. Growth of e-books in CityU
2. Duplication problems
3. Attempted solutions
4. Effective Solutions
5. De-duplication jobs
6. Benefits and limitations

Page 3: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU

E-book collection contains English e-books, Chinese e-books & e-theses

From 2001: NetLibrary (around 200 titles)
To Oct 2006: > 200,000 titles
  - English e-books: > 87,000 titles
  - Chinese e-books: > 45,000 titles
  - e-theses: > 70,000 titles

Page 4: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU (cont’d)

Acquisition of e-books from 2001 onwards

Year                   English ebooks   Chinese ebooks   eTheses   Total
2001-02                           200                0         0   >200
2002-03                           100                0       200   >300
2003-04                           200                0       100   >400
2004-05                         1,300            1,400    39,000   >40,000
2005-06                        77,000           44,000    31,000   >150,000
2006-07 (Jul-Oct 06)            8,000                0       100   >8,100
Total                         >87,000          >45,000   >70,000   >200,000

Page 5: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU (cont’d)

Acquisition of eBooks

[Chart: number of titles (0 to 90,000) acquired per year, 2001-02 to 2006-07, for English ebooks, Chinese ebooks and e-theses. Total > 200,000 titles]

Page 6: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU (cont’d)

Major e-book collections

Collection    No. of Titles   Share
Apabi                46,000     36%
Books24x7             5,000      4%
Ebrary               27,000     22%
NetLibrary           43,000     35%
Safari                1,000      1%
Springer              2,000      2%

Page 7: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU (cont’d)

E-theses

Collection                        No. of Titles   Share
UMI pdf files                             2,000      3%
ProQuest ABI/Inform                      15,000     21%
Digital Dissertation Consortium          54,000     76%

Page 8: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU (cont’d)

Consortial acquisition of e-books
  - Digital Dissertation Consortium (since 2005)
  - Apabi D-Lib Consortium (since 2006)
  - NetLibrary Super E-book Consortium (since 2006)

New consortia
  - Electronic Resources Academic Library Link (ERALL), a JULAC project on collective e-book collection development

Page 9: Maintaining the integrity of e-book titles  in CityU library catalogue


1. Growth of e-books in CityU (cont’d)

Growth of e-book usage (from CGI logs) shows an upward trend

eBooks                  Yr 2004   Yr 2005   Yr 2006   % Growth 05 to 06
Apabi                       588      5196      8047                 55%
ebrary                        -      5922     18467                212%
netLibrary                 1928      2563     14753                476%
Safari                        -      1488      1768                 19%
Wiley InterScience            -       302      1291                327%
Digital Dissert. Con.         -      9881     11485                 16%
ProQuest Dissert.             -      1594      3171                 99%

Page 10: Maintaining the integrity of e-book titles  in CityU library catalogue


2. Duplication problems

The variety of e-book collections and the high number of titles created problems in cataloguing

A major problem: title duplication

Loading records supplied by different vendors resulted in title duplication

The more e-book titles, the more title duplication
  - same title from different collections
  - same title from the same collection

Page 11: Maintaining the integrity of e-book titles  in CityU library catalogue


2. Duplication problems (cont’d)

Duplication from different collections

Page 12: Maintaining the integrity of e-book titles  in CityU library catalogue


2. Duplication problems (cont’d)

Duplication from the same collection
NetLibrary collection
  - Titles purchased by CityU since 2001
  - Titles acquired via Super E-book Consortium

Page 13: Maintaining the integrity of e-book titles  in CityU library catalogue


2. Duplication problems (cont’d)

Same title from NetLibrary acquired in different periods

Page 14: Maintaining the integrity of e-book titles  in CityU library catalogue


2. Duplication problems (cont’d)

Duplication from the same collection (cont’d)
UMI e-theses
  - Titles purchased by CityU since 2002
  - Titles acquired via Digital Dissertation Consortium
  - Titles in ProQuest Database

Page 15: Maintaining the integrity of e-book titles  in CityU library catalogue


2. Duplication problems (cont’d)

Same UMI e-thesis title acquired in different periods

Page 16: Maintaining the integrity of e-book titles  in CityU library catalogue


3. Attempted solutions

Single record approach in cataloguing
  - We apply a single record approach for all e-versions of the same title
  - Applied to e-books and e-journals

Page 17: Maintaining the integrity of e-book titles  in CityU library catalogue


3. Attempted solutions (cont’d)

Duplication control in e-journals
  - CityU applied and modified BU’s program to merge e-journal titles from aggregator databases

Page 18: Maintaining the integrity of e-book titles  in CityU library catalogue


3. Attempted solutions (cont’d)

Duplication control through manual methods
For e-books, our previous solutions were:
1. Manual checking
2. Headings reports: duplicate call numbers
3. Loading through match field 001: identify duplicate records
4. Encounter basis

Workable only while the number of titles remains small

Page 19: Maintaining the integrity of e-book titles  in CityU library catalogue


3. Attempted solutions (cont’d)

Duplication control through customized load profiles
  - The first attempt to automate the procedure
  - Utilized the local load profiles and translation table in INNOPAC to merge 2 sets of NetLibrary titles
      - Super E-book Consortium titles purchased in 2006
      - NetLibrary titles purchased since 2001
  - 2,206 titles were found duplicated

Page 20: Maintaining the integrity of e-book titles  in CityU library catalogue


3. Attempted solutions (cont’d)

Duplication control through customized load profiles (cont’d)
Using load profiles is not a complete solution
  - Cannot match multiple tags (cannot match tag 020 against tag 024)
  - Cannot match selected sets (cannot exclude print titles)
  - Cannot merge multiple records automatically; must output for manual checking to decide the master record

Page 21: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions

Cataloguing worked with Systems to run de-duplication and merging of records

Prerequisites
  - easy to apply
  - able to fit in the existing workflow
  - flexible enough to handle different sizes of e-book batches
  - allow prompt or ad hoc loading of records if necessary

Page 22: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

Scope of de-duplication
Include English e-books and e-theses
  - e-books: 88,000 records
  - e-theses: 70,000 records

Page 23: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

Scope of de-duplication (cont’d)
Exclude Chinese e-books because
  - CityU so far has only one Chinese e-book collection, Apabi.
  - The vendor supplied unique records when we joined the Apabi D-Lib consortium (no duplication with previously purchased titles)
  - We will also handle Chinese e-books if we acquire other Chinese e-book collections in the future

Page 24: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

What fields to match?
E-books
  - Match ISBN: a relatively reliable tag
  - Match major MARC tags: 110 match key
UMI e-theses
  - Use UMI number for matching

Page 25: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

How to merge?
  - Set the record with the earliest Create Date as the master record
  - Add the reproduction note (tag 533), name of book collection (tag 773) and URL link (tag 856) of the duplicate record(s) to the master record
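The merge rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a dict with a `created` date and a list of `(tag, value)` pairs) and the function name `merge_duplicates` are hypothetical, and the actual CityU program is not shown in the slides.

```python
from datetime import date

# Tags copied from duplicate records to the master record (per the slide).
COPIED_TAGS = ("533", "773", "856")

def merge_duplicates(records):
    """Pick the earliest-created record as master and append the 533/773/856
    fields of the other records. Record layout is a hypothetical stand-in
    for real MARC records."""
    ordered = sorted(records, key=lambda r: r["created"])
    master = {"created": ordered[0]["created"],
              "fields": list(ordered[0]["fields"])}
    for dup in ordered[1:]:
        for tag, value in dup["fields"]:
            # Skip fields that are not copied, and avoid exact repeats.
            if tag in COPIED_TAGS and (tag, value) not in master["fields"]:
                master["fields"].append((tag, value))
    return master

older = {"created": date(2003, 5, 1),
         "fields": [("245", "Some title"), ("856", "http://example.org/a")]}
newer = {"created": date(2006, 3, 1),
         "fields": [("245", "Some title"), ("773", "Collection B"),
                    ("856", "http://example.org/b")]}
merged = merge_duplicates([newer, older])
```

Here `merged` keeps the 2003 record as master and gains the 773 and 856 fields of the 2006 record, so both collection links stay reachable from one catalogue entry.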

Page 26: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

Matching algorithm of ISBN
Print ISBN vs. e-book ISBN
  - Some records come with print ISBN, some with e-book ISBN, some with both
  - Both types are used for matching
Different tags to store ISBN
  - 020 $a, $z
  - 024 (1st indicator 3) $a, $z
  - 776 $z
  - All the above are used for matching
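As an illustration of pooling ISBNs from all of these tags, the sketch below collects every candidate ISBN of a record into one set and treats two records as ISBN matches when the sets overlap. The field layout (dicts with `tag`, `ind1` and `subfields`) and the function names are assumptions made for this example, not the program used at CityU.

```python
def candidate_isbns(fields):
    """Collect ISBN strings from 020 $a/$z, 024 (1st indicator 3) $a/$z
    and 776 $z. `fields` is a hypothetical list of dicts."""
    found = set()
    for field in fields:
        tag = field["tag"]
        if tag == "020":
            codes = ("a", "z")
        elif tag == "024" and field.get("ind1") == "3":
            codes = ("a", "z")
        elif tag == "776":
            codes = ("z",)
        else:
            continue
        for code, value in field["subfields"]:
            if code in codes:
                found.add(value.strip())
    return found

def share_isbn(fields_a, fields_b):
    """True when the two records have at least one candidate ISBN in common."""
    return bool(candidate_isbns(fields_a) & candidate_isbns(fields_b))

print_record = [{"tag": "020", "ind1": " ", "subfields": [("a", "0415191327")]}]
ebook_record = [{"tag": "020", "ind1": " ", "subfields": [("a", "0203000000")]},
                {"tag": "776", "ind1": "1", "subfields": [("z", "0415191327")]}]
```

With these two sample records, `share_isbn(print_record, ebook_record)` is true because the e-book record carries the print ISBN in 776 $z. The second ISBN in the sample is a made-up placeholder.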

Page 27: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

Matching algorithm of ISBN (cont’d)
13-digit ISBN vs. 10-digit ISBN
  - Starting on 1 Jan 2007, the ISBN is 13-digit
  - Some publishers already used 13-digit ISBN before that
  - Starting from 12 Nov 06, OCLC moves 13-digit ISBN to tag 020
  - 13-digit ISBNs with prefix “978” may have 10-digit equivalents; they are converted to 10-digit for matching
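The conversion from a “978” ISBN-13 to its ISBN-10 equivalent follows the standard ISBN rules: drop the prefix and the old check digit, then recompute the modulus-11 check digit. A minimal sketch follows; the function name `to_isbn10` is hypothetical, and the sketch does not validate the incoming ISBN-13 check digit.

```python
def to_isbn10(isbn):
    """Return the 10-digit form of a '978' ISBN-13; other input is returned
    with hyphens and spaces removed (e.g. '979' ISBNs have no 10-digit form)."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit() and digits.startswith("978"):
        core = digits[3:12]  # the nine digits shared by both forms
        # ISBN-10 check digit: weights 10 down to 2, modulus 11.
        total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
        check = (11 - total % 11) % 11
        return core + ("X" if check == 10 else str(check))
    return digits

converted = to_isbn10("978-0-415-19132-6")
```

Here `converted` is `"0415191327"`, the ISBN quoted elsewhere in the slides, so a record holding only the 13-digit form can still match one holding the 10-digit form.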

Page 28: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

Matching algorithm of ISBN (cont’d)
ISBN with “noise”
  - Some ISBNs include a note enclosed in parentheses
  - Do not use an ISBN for matching if the text inside the parentheses indicates that the ISBN is for a set, a series, a volume, etc.,
    e.g. “0415191327 (series : International library of psychology)”
  - Hints: look for the keywords “set” and “series”, and compare with Tag 440 and Tag 830
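One possible way to apply these hints is sketched below. The keyword list, the regular expressions and the function name `usable_isbn` are assumptions for illustration; the slides do not give the exact rules. Series titles taken from tags 440 and 830 are passed in as plain strings.

```python
import re

# An ISBN optionally followed by a parenthesised note.
ISBN_WITH_NOTE = re.compile(r"^\s*([0-9Xx-]+)\s*(?:\((.*)\))?\s*$")
# Whole-word keywords suggesting the ISBN covers more than one title.
SET_KEYWORDS = re.compile(r"\b(set|series|volume|vol)\b", re.IGNORECASE)

def usable_isbn(raw, series_titles=()):
    """Return the bare ISBN if it looks safe for matching, else None."""
    match = ISBN_WITH_NOTE.match(raw)
    if not match:
        return None
    number, note = match.group(1), match.group(2)
    if note:
        if SET_KEYWORDS.search(note):
            return None
        # The note repeats a series title from 440/830: treat as a series ISBN.
        lowered = note.lower()
        if any(title and title.lower() in lowered for title in series_titles):
            return None
    return number.replace("-", "").upper()

rejected = usable_isbn("0415191327 (series : International library of psychology)")
accepted = usable_isbn("0415191327 (alk. paper)")
```

`rejected` is `None` because of the keyword “series”, while `accepted` is `"0415191327"` because the note describes only the physical format.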

Page 29: Maintaining the integrity of e-book titles  in CityU library catalogue


4. Effective Solutions (cont’d)

Matching algorithm of the 110 Match Key
  - To ensure there is no mismatch by ISBN, construct an additional match key based on the INN-Reach 110 Match Key
  - Title + Gen. Media + Pub. Year + Pagination + Edition + Publisher + Type of Record + Title Part + Title Number
  - The key is constructed and then normalized
  - Refer to INN-Reach documentation for details
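The general idea of a normalized composite key can be sketched as follows. This does not reproduce the INN-Reach 110 Match Key rules, which are defined in the INN-Reach documentation; the field names, the separator and the normalization steps here are all assumptions chosen for illustration.

```python
import re
import unicodedata

# Key components in the order listed on the slide (names are hypothetical).
KEY_PARTS = ("title", "gen_media", "pub_year", "pagination", "edition",
             "publisher", "record_type", "title_part", "title_number")

def normalize(value):
    """Lower-case, strip diacritics and punctuation, collapse whitespace.
    Note: this simple rule discards non-Latin text such as Chinese titles."""
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def match_key(record):
    """Join the normalized components into one comparable string."""
    return "|".join(normalize(record.get(part, "")) for part in KEY_PARTS)

a = {"title": "Café: A History!", "pub_year": "2004", "publisher": "Routledge"}
b = {"title": "cafe a history", "pub_year": "2004", "publisher": "ROUTLEDGE."}
c = dict(b, edition="2nd ed.")
```

Records `a` and `b` produce the same key despite differences in case, accents and punctuation, while `c` differs because of its edition statement. Two records would then be merged only when both the ISBN and this key agree.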

Page 30: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs

  - Initial clean-up
  - Regular de-duplication

Page 31: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

Initial clean-up
  - One-time job to de-duplicate records that had already been loaded
  - 6,063 (7.2%) duplicate records were found, out of 84,756 English e-book titles
  - Fine-tune the program after the initial clean-up

Page 32: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

Regular de-duplication
  - Once every month
  - Flexibility
      - Depends on the number of titles loaded and the urgency to load the records
      - Clean-up before loading vs. clean-up after loading

Page 33: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

Regular de-duplication (cont’d)
Procedures
  - Output e-book records from catalogue
  - Run de-duplication program to match with vendor records
  - Overlay records in catalogue with merged records
  - If vendor records have been loaded
      delete duplicate vendor records from catalogue
    Else
      insert new vendor records into catalogue

Page 34: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

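Flow chart

[Flow chart: master records output from INNOPAC and vendor records feed into Match & Merge. Merged records are overlaid on INNOPAC, duplicated vendor records are deleted from INNOPAC, and new vendor records are inserted into INNOPAC.]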

Page 35: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

De-duplication results
Initial clean-up of e-books

Total English e-book records        84756   100.0%
Records duplicated                   6063     7.2%

Titles merged from 2 records         3024    99.8%
Titles merged from 3 records            5     0.2%
Titles merged from >= 4 records         0     0.0%
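The figures in this table are internally consistent, which a short calculation makes explicit. The variable names are illustrative, and the number of records eliminated is derived here from the table rather than stated on the slide.

```python
# Number of merged titles, keyed by how many records each title was merged from.
merged_titles = {2: 3024, 3: 5}
total_records = 84756

# Records involved in duplication: 2 * 3024 + 3 * 5.
duplicated = sum(n * titles for n, titles in merged_titles.items())
share = duplicated / total_records

# Each merge of n records leaves one master, so n - 1 records disappear.
removed = sum((n - 1) * titles for n, titles in merged_titles.items())

# Share of merged titles that came from exactly two records.
two_record_share = merged_titles[2] / sum(merged_titles.values())
```

`duplicated` comes to 6063 and `share` to about 7.2%, matching the table, and `two_record_share` is about 99.8%. Under the stated reasoning, `removed` is 3034 records.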

Page 36: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

De-duplication results
Initial clean-up of e-books (cont’d)

Distribution of titles merged from 2 records

             Books24x7   ebrary   netLibrary   Safari   Springer   Wiley   Total
Books24x7            7
ebrary               0       14
netLibrary           4     2842           10
Safari              10       30           51        2
Springer             0       41            0        0          0
Wiley                0       11            0        0          0       0
Total               21     2938           61        2          0       0    3022
(Misc)                                                                         2

Page 37: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

De-duplication results
Initial clean-up of e-books (cont’d)

We found that some duplicated titles within the same collection direct users to different e-books. This problem is more serious in ebrary.

Fine-tune the program by adding the condition:

When two matched records have the same CGI script (i.e. belong to the same collection) but different book IDs, do not merge them, but flag them for review
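A minimal sketch of this guard, assuming that the CGI script is identified by the host and path of the 856 URL and that the book ID is carried in the query string. Both assumptions, the function name `merge_decision` and the example URLs are hypothetical; the slides do not show the actual URL patterns.

```python
from urllib.parse import urlsplit

def merge_decision(url_a, url_b):
    """Return 'review' when two matched records point to the same script
    with different book IDs, otherwise 'merge'."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_script = (a.netloc.lower(), a.path) == (b.netloc.lower(), b.path)
    if same_script and a.query != b.query:
        return "review"  # same collection, different books: needs a human check
    return "merge"

same_collection = merge_decision(
    "http://example.org/cgi-bin/ebook.cgi?id=1001",
    "http://example.org/cgi-bin/ebook.cgi?id=2002")
cross_collection = merge_decision(
    "http://example.org/cgi-bin/ebook.cgi?id=1001",
    "http://example.net/cgi-bin/reader.cgi?book=77")
```

`same_collection` is `"review"` and `cross_collection` is `"merge"`, so only pairs from different collections, or exact repeats within one collection, are merged automatically.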

Page 38: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

De-duplication results (cont’d)
Initial clean-up of e-theses

Total UMI e-thesis records          66358   100.0%
Records duplicated                    502    0.76%

Titles merged from 2 records          251     100%
Titles merged from 3 records            0     0.0%
Titles merged from >= 4 records         0     0.0%

Page 39: Maintaining the integrity of e-book titles  in CityU library catalogue


5. De-duplication jobs (cont’d)

De-duplication results
Initial clean-up of e-theses (cont’d)

Distribution of titles merged from 2 records
(DDC = Digital Dissertation Consortium)

             UMI (pdf)   DDC   ProQuest   Total
UMI (pdf)            0
DDC                226     0
ProQuest            23     2          0
Total              249     2          0     251

More than 4,000 DDC & ProQuest records had been de-duplicated manually (using the 001 field) before the initial clean-up process.

Page 40: Maintaining the integrity of e-book titles  in CityU library catalogue


6. Benefits and limitations

Benefits
  - Single record for all versions of the same e-book or e-thesis title, maintaining integrity in the library catalogue
  - Saves much staff time and manual effort
  - Method applicable to other e-resources
  - Management need: generate duplication statistics
  - Can be applied to match existing e-book collections with e-book titles supplied by potential vendors (e-book collection development)

Page 41: Maintaining the integrity of e-book titles  in CityU library catalogue


6. Benefits and limitations (cont’d)

Limitations
  - Depends on data in vendor-supplied records
      - Incorrect match and merge in case of incorrect or incomplete data
  - Chinese e-book records
      - Brief bibliographic data
      - Lack of standardization in transcription
      - Difficult to construct a reliable match key
      - Sometimes lack ISBNs

Page 42: Maintaining the integrity of e-book titles  in CityU library catalogue


Maintaining the integrity of e-book titles in CityU library catalogue

Thank You!

Joanna Pong, E-mail: [email protected]

Philip Wong, E-mail: [email protected]