Shelf packing? : the role of cataloguers in a world of automatic metadata extraction Claudia Reynolds (University of Johannesburg) [email protected] IGBIS Workshop 29-30 August 2019, Pretoria



“Bibliographic records” created outside of libraries


Where does bibliographic metadata come from?

1) External metadata (separate from the resource itself)

Libraries: MARC records
Publishers: ONIX and similar
Vendors: Variety of formats and quality, from full MARC to Excel lists
Authors: Self-submission of metadata into research data systems like Figshare/DSpace (including ETDs) or book trade retailers (e.g. via Kindle Direct Publishing for Amazon)


A closer look at ONIX-Library partnerships

ONIX = Online Information Exchange

XML metadata files sent by publishers to retailers and distributors, who use them to create listings in their catalogues or websites.

An ONIX record

ISBN

TITLE


An ONIX record (cont.)

SUMMARY

CONTENTS

IMPRINT

AUTHORS

LANGUAGE

SUBJECTS
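To make the labelled fields concrete, here is a minimal sketch of reading them out of an ONIX-style XML record with Python's standard library. The sample record is made up and simplified: element names follow ONIX 3.0 reference tags, but the namespace and most of the required structure are omitted, and the function name `read_onix_product` is only illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, invented ONIX 3.0-style record (namespace omitted for brevity).
ONIX_SAMPLE = """
<Product>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <DescriptiveDetail>
    <TitleDetail>
      <TitleElement>
        <TitleText>An Example Title</TitleText>
      </TitleElement>
    </TitleDetail>
    <Contributor>
      <ContributorRole>A01</ContributorRole>
      <PersonName>Jane Example</PersonName>
    </Contributor>
    <Language>
      <LanguageRole>01</LanguageRole>
      <LanguageCode>eng</LanguageCode>
    </Language>
    <Subject>
      <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
      <SubjectCode>LAN025000</SubjectCode>
    </Subject>
  </DescriptiveDetail>
  <CollateralDetail>
    <TextContent>
      <TextType>03</TextType>
      <Text>A short summary of the book.</Text>
    </TextContent>
  </CollateralDetail>
</Product>
"""


def read_onix_product(xml_text):
    """Pull the fields shown on the slide out of one <Product> element."""
    product = ET.fromstring(xml_text.strip())

    def first(path):
        node = product.find(path)
        return node.text.strip() if node is not None and node.text else None

    def every(path):
        return [n.text.strip() for n in product.findall(path) if n.text]

    return {
        "isbn": first("ProductIdentifier/IDValue"),
        "title": first("DescriptiveDetail/TitleDetail/TitleElement/TitleText"),
        "authors": every("DescriptiveDetail/Contributor/PersonName"),
        "language": first("DescriptiveDetail/Language/LanguageCode"),
        "subjects": every("DescriptiveDetail/Subject/SubjectCode"),
        "summary": first("CollateralDetail/TextContent/Text"),
    }


record = read_onix_product(ONIX_SAMPLE)
print(record["isbn"], "|", record["title"])
```

A real feed wraps many such products in one message and uses a namespace, so the paths would need adjusting before this could be pointed at publisher data.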


Library of Congress
• Developed ONIX-MARC conversion software in 2009
• Weekly ONIX feeds as part of CIP program
• If no ISBN match - new CIP record is created. Imported fields: 100, 245, 250, 264b, 264c, 490, 520, 505, 650 (BISAC), 700
• Cataloguers “proof-read” – check errors, capitalization issues, add fields
• If ISBN match – existing record is enhanced with TOCs, BISAC headings and summaries from ONIX.
• 100 000+ CIP records created annually
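The match-or-create rule in the Library of Congress workflow can be sketched roughly as follows. This is not the Library of Congress software: the dictionary-based catalogue, the field names, the `needs_review` flag and the choice never to overwrite existing values are all assumptions made for illustration.

```python
# Fields an existing record may be enhanced with (TOC, BISAC headings, summary).
ENHANCEMENT_FIELDS = ("contents", "subjects", "summary")


def apply_onix_record(catalogue, onix_record):
    """Create a new record, or enhance the existing one with the same ISBN.

    catalogue: dict mapping ISBN -> record dict (a stand-in for a real system).
    onix_record: dict with at least an "isbn" key.
    """
    isbn = onix_record["isbn"]
    existing = catalogue.get(isbn)

    if existing is None:
        # No ISBN match: create a record and queue it for a cataloguer
        # to "proof-read".
        catalogue[isbn] = {**onix_record, "needs_review": True}
        return "created"

    # ISBN match: fill in only the enhancement fields that are still empty.
    for field in ENHANCEMENT_FIELDS:
        if onix_record.get(field) and not existing.get(field):
            existing[field] = onix_record[field]
    return "enhanced"


catalogue = {"9780000000002": {"isbn": "9780000000002", "title": "Existing title"}}

outcome_match = apply_onix_record(
    catalogue,
    {"isbn": "9780000000002", "title": "Feed title", "summary": "A summary."},
)
outcome_new = apply_onix_record(
    catalogue,
    {"isbn": "9780000000019", "title": "Brand new book"},
)
print(outcome_match, outcome_new)
```

Keeping the cataloguer's existing title while accepting the publisher's summary reflects the slide's split between enhancing matched records and reviewing newly created ones.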

British Library
• ONIX feeds – standardized via Ingram – compared against a number of data sets (including Nielsen and OCLC) to find and select the best record
• Ebook batch ingest process: 1500+ records per week, 70% require NO cataloguer input at all


2) Embedded metadata in digital resources

In META tags in the source code

These meta tags can be used by metadata extractors like Apache Tika, ExifTool, etc. to create records.
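As a rough illustration of what such an extractor does with META tags, the sketch below collects them from a page using only Python's standard library. It does not use Apache Tika or ExifTool; the class name `MetaTagCollector` and the sample HTML are invented for the example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Some vocabularies use "property" instead of "name".
        name = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if name and content:
            # A field such as DC.creator can repeat, so keep a list per name.
            self.meta.setdefault(name.lower(), []).append(content)


SAMPLE_HTML = """
<html><head>
<meta name="DC.title" content="An Example Report">
<meta name="DC.creator" content="Example, Jane">
<meta name="DC.creator" content="Sample, John">
<meta name="keywords" content="metadata, cataloguing">
</head><body></body></html>
"""

collector = MetaTagCollector()
collector.feed(SAMPLE_HTML)
embedded = collector.meta
print(embedded)
```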

Material Type | Standards
Ebooks | ePUB3 standards, including compulsory fields like identifier & title
Audiovisual | e.g. id3 standards for digital music
Open Access | OAI-PMH = XML in Dublin Core format
eJournals | PRISM standards
Other (websites, digital photos, etc.) | Technical metadata at least
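For the Open Access row, a harvested OAI-PMH record typically carries its description as an `oai_dc` block. The sketch below, which assumes a made-up sample record and an illustrative function name, shows how the Dublin Core elements could be gathered into a simple dictionary.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used inside oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Invented metadata block of the kind found inside an OAI-PMH <record>.
SAMPLE_OAI_DC = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Thesis</dc:title>
  <dc:creator>Example, Jane</dc:creator>
  <dc:subject>Cataloguing</dc:subject>
  <dc:subject>Metadata</dc:subject>
  <dc:identifier>https://repository.example.org/handle/1234/5678</dc:identifier>
</oai_dc:dc>
"""


def dublin_core_fields(xml_text):
    """Return {element name: [values]} for the Dublin Core children."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for element in root:
        if element.tag.startswith(DC):
            name = element.tag[len(DC):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


dc_record = dublin_core_fields(SAMPLE_OAI_DC)
print(dc_record["title"], dc_record["subject"])
```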


3) AI-identified metadata

Machine learning, algorithms, document image analysis

• identify metadata from title-page, t.p. verso, table of contents
• autogenerate keywords

Some examples
• 1989 : Automated title page cataloging : a feasibility study / S. Weibel, M. Oskins, D. Vizine-Goetz. Information processing & management v. 25, issue 2
• 2001 : MARS2 Automating the production of bibliographic records for MEDLINE / George R. Thoma.
• 2009 : DCEditor Digital Collections Production Center, Washington Research Library Consortium (WRLC)
• 2018 : BL OCR -> MARC conversion tool - The paper museum of Cassiano dal Pozzo : a catalogue raisonné
• 2018 : GitHub code A Metadata Extractor for Books in a Digital Library / S.S. Akhtar & others. In: Maturity and Innovation in Digital Libraries / Dobreva M., Hinze A., Žumer M. (eds).


• More and more data
• Automated metadata extraction is becoming more efficient and consistent
• As libraries adopt Open Web standards (BIBFRAME, XML, Dublin Core) automatic record creation will become seamless

How the role of the cataloguer will change

Focus of cataloguing will shift from data creation to data management.


“Shelf packing”


1. Unpacking

• Getting the metadata into your library catalogue/discovery tool/webpage

* Recognize file formats and schemas
XML ; Dublin Core ; BIBFRAME ; EPUB ; HTML

* Know crosswalks / conversion programmes
Load tables ; MarcEdit ; Zepheira ; XC Metadata Toolkit
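At its simplest, a crosswalk is a lookup table from one schema's elements to another's. The sketch below maps a few Dublin Core elements to MARC tags and subfields. The four mappings loosely follow the Library of Congress unqualified Dublin Core to MARC crosswalk, but treat them as an assumption to be checked against your own load table; this is not how MarcEdit or any of the listed tools is implemented.

```python
# Dublin Core element -> (MARC tag, subfield code). Illustrative subset only.
CROSSWALK = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
}


def crosswalk_to_marc(dc_record):
    """Convert {element: [values]} into (tag, subfield, value) tuples.

    Elements with no mapping are returned separately so that a cataloguer
    can decide what to do with them.
    """
    marc_fields = []
    unmapped = []
    for element, values in dc_record.items():
        target = CROSSWALK.get(element)
        if target is None:
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            marc_fields.append((tag, code, value))
    # Sort by tag; the sort is stable, so repeated fields keep their order.
    marc_fields.sort(key=lambda field: field[0])
    return marc_fields, unmapped


sample_dc = {
    "title": ["An Example Thesis"],
    "creator": ["Example, Jane"],
    "subject": ["Cataloguing", "Metadata"],
    "rights": ["Open access"],
}
marc_fields, unmapped = crosswalk_to_marc(sample_dc)
for tag, code, value in marc_fields:
    print(f"{tag} ${code} {value}")
```

Reporting the unmapped elements instead of silently dropping them is a deliberate choice here: lost data is usually where a crosswalk needs human attention.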


2. LINK creation and maintenance

• Connect creators, works, subjects
• Inserted by?
  Bib agencies (OCLC/ONIX)
  Systems (SirsiDynix BLUECloud)
  Processing software (MarcEdit)
  You! (new links/NACO)

* What are URIs & where do they come from?
VIAF ; ORCID ; ISNI ; id.loc.gov

* How do links in RDA, BIBFRAME and LRM (Library Reference Model) work within the Web RDF format?

https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf
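As one small example of link maintenance, an ORCID URI is the identifier appended to `https://orcid.org/`, and the final character of the identifier is a checksum (ISO 7064 MOD 11-2). The sketch below validates an identifier before turning it into a URI. The function names are illustrative, and the identifier used is the sample one ORCID publishes for its fictitious researcher.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for character in base_digits:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_uri(orcid_id):
    """Return the ORCID URI for an identifier, or raise ValueError if invalid."""
    compact = orcid_id.replace("-", "").strip().upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        raise ValueError(f"not a 16-character ORCID iD: {orcid_id!r}")
    if orcid_check_character(compact[:15]) != compact[15]:
        raise ValueError(f"checksum mismatch in ORCID iD: {orcid_id!r}")
    grouped = "-".join(compact[i:i + 4] for i in range(0, 16, 4))
    return "https://orcid.org/" + grouped


uri = orcid_uri("0000-0002-1825-0097")
print(uri)
```

A check like this only catches typing errors; it does not confirm that the identifier belongs to the right person, which still needs authority work.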


3. Quality control

• Find and fix blank fields, errors, diacritics
• Remove duplicates
• Add local data or enhancements

* Machine identification of errors & duplicates
* Batch processing tools
Library system functions (global updates, bad codes, etc.) ; MarcEdit ; OpenRefine ; Regular expressions
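A minimal sketch of machine identification of blank fields, duplicates and broken diacritics, using a regular expression and Unicode normalisation. The required-field list, the record layout and the rule that records sharing a normalised ISBN are duplicates are all assumptions for the example; real de-duplication usually weighs more than one field.

```python
import re
import unicodedata

# Assumed minimum set of fields every record should carry.
REQUIRED_FIELDS = ("title", "author", "isbn")


def normalise_isbn(raw):
    """Strip hyphens, spaces and other punctuation from an ISBN."""
    return re.sub(r"[^0-9Xx]", "", raw or "").upper()


def quality_report(records):
    """Return (blank_fields, duplicates) for a list of record dicts."""
    blanks = []      # (record index, [missing field names])
    duplicates = []  # (index of first occurrence, index of duplicate)
    seen = {}
    for index, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]
        if missing:
            blanks.append((index, missing))
        isbn = normalise_isbn(record.get("isbn"))
        if isbn:
            if isbn in seen:
                duplicates.append((seen[isbn], index))
            else:
                seen[isbn] = index
    return blanks, duplicates


def fix_diacritics(text):
    """Compose base letter + combining mark into one character (NFC)."""
    return unicodedata.normalize("NFC", text)


records = [
    {"title": "A", "author": "B", "isbn": "978-0-00-000000-2"},
    {"title": "", "author": "B", "isbn": "9780000000002"},
    {"title": "C", "author": "D", "isbn": ""},
]
blanks, duplicates = quality_report(records)
print(blanks, duplicates)
```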


4. Access

• Print:
  Weeding / withdrawals
  Stocktake

• E-items
  Database management
  URL checking
  Fix broken links reported by users

* Your library system’s inventory control process

* Automatic URL checkers, including CMS (Content Management Systems)
Limited functionality with proxy servers. Still need human checking.
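An automatic URL checker can be sketched as below. It is a simplified illustration, not any particular product: the three report categories are invented, and treating 401, 403 and 405 responses as "needs human check" is an assumption meant to reflect the point that proxied or authenticated resources cannot be judged reliably by a script.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts and bad URLs.
        return None


def check_links(urls, fetch_status=http_status):
    """Sort URLs into ok / broken / needs_human_check."""
    report = {"ok": [], "broken": [], "needs_human_check": []}
    for url in urls:
        status = fetch_status(url)
        if status is not None and 200 <= status < 400:
            report["ok"].append(url)
        elif status in (401, 403, 405):
            # Login walls, proxies or servers that refuse HEAD requests.
            report["needs_human_check"].append(url)
        else:
            report["broken"].append(url)
    return report


# Offline demonstration with a stand-in for the network call.
fake_statuses = {
    "https://example.org/ok": 200,
    "https://example.org/gone": 404,
    "https://example.org/proxied": 403,
    "https://example.org/unreachable": None,
}
report = check_links(list(fake_statuses), fetch_status=fake_statuses.get)
print(report)
```

Passing the fetch function as a parameter keeps the sorting logic testable without touching the network; in real use, `check_links(urls)` would call `http_status` for each link.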


5. Collection analysis (by subject)

• Automatic tools use a “conspectus” of call numbers

• You can refine the conspectus to suit own collections, or use subject headings as well

• Work with collection development staff / subject librarians / educators

* Ensure all records have call numbers & controlled subjects
Dewey ; LC classification ; LCSH ; MESH

* Know how to query that data
Library system functions (e.g. Sierra Create Lists/Decision Center) ; WorldShare Analytics ; SQL ; SPARQL ; JSON
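The idea of a conspectus can be sketched as a table of call-number ranges with subject labels, against which each record's call number is counted. The ranges below are a tiny, hypothetical subset based on Dewey main classes; a working conspectus would be far more detailed and tuned to the local collection.

```python
import re
from collections import Counter

# Hypothetical conspectus: (lowest number, upper bound, subject label).
CONSPECTUS = [
    (0, 100, "Computer science, information & general works"),
    (300, 400, "Social sciences"),
    (500, 600, "Science"),
    (600, 700, "Technology"),
]


def conspectus_division(call_number):
    """Return the conspectus label for a Dewey-style call number."""
    match = re.match(r"\s*(\d{3})", call_number or "")
    if not match:
        return "Unclassified"  # record has no usable call number
    number = int(match.group(1))
    for low, high, label in CONSPECTUS:
        if low <= number < high:
            return label
    return "Other"  # classified, but outside the ranges listed above


def collection_profile(call_numbers):
    """Count holdings per conspectus division."""
    return Counter(conspectus_division(c) for c in call_numbers)


profile = collection_profile(["025.3 REY", "004.6", "510 ABC", "", "823.9"])
for label, count in profile.most_common():
    print(f"{count:3d}  {label}")
```

The "Unclassified" count is often the most useful figure for cataloguers, because it shows how many records need call numbers before any analysis can be trusted.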


6. Original cataloguing

• Unique materials
• Realia & museum artifacts
• Devices (phone chargers, ebook readers, aids for disabled, etc.)

* You already have the skills!!


Make a start

• Library Carpentry https://librarycarpentry.org/

An introductory software skills training programme with a focus on the needs and requirements of library and information professionals.
Lessons include:
Introduction to data ; Regular expressions ; Shell commands ; OpenRefine ; GitHub

• MarcEdit https://marcedit.reeset.net/downloads

A metadata editing software suite
Functions include:
Crosswalks ; Batch processing ; Bibframe ; OpenRefine ; Linked Data ; SPARQL ; OAI-PMH

***** Subscribe to the JOURNAL OF LIBRARY METADATA *****