Shelf packing? : the role of cataloguers in a world of
automatic metadata extraction
Claudia Reynolds (University of Johannesburg) [email protected]
IGBIS Workshop 29-30 August 2019, Pretoria
“Bibliographic records” created outside of libraries
Where does bibliographic metadata come from?
1) External metadata (separate from the resource itself)
Libraries: MARC records
Publishers: ONIX and similar
Vendors: Variety of formats and quality, from full MARC to Excel lists
Authors: Self-submission of metadata into research data systems like Figshare/DSpace (including ETDs) or book trade retailers (e.g. via Kindle Direct Publishing for Amazon)
A closer look at ONIX-Library partnerships
ONIX = Online Information Exchange
XML metadata files sent by publishers to retailers and distributors, who use them to create listings in their catalogues or websites.
An ONIX record
ISBN
TITLE
An ONIX record (cont.)
SUMMARY
CONTENTS
IMPRINT
AUTHORS
LANGUAGE
SUBJECTS
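A minimal sketch of how the labelled fields above could be read out of an ONIX file. The sample record is invented and heavily simplified: it omits the XML namespace that real ONIX 3.0 feeds normally declare, and the element names and code values (15 for ISBN-13, 03 for description, A01 for author, 10 for BISAC) are given as I understand the ONIX 3.0 reference tags, so check them against the EDItEUR specification before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ONIX 3.0-style message (namespace omitted).
SAMPLE_ONIX = """\
<ONIXMessage release="3.0">
  <Product>
    <RecordReference>example.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
      <Contributor>
        <ContributorRole>A01</ContributorRole>
        <PersonName>Jane Example</PersonName>
      </Contributor>
      <Language>
        <LanguageRole>01</LanguageRole>
        <LanguageCode>eng</LanguageCode>
      </Language>
      <Subject>
        <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
        <SubjectCode>LAN025000</SubjectCode>
      </Subject>
    </DescriptiveDetail>
    <CollateralDetail>
      <TextContent>
        <TextType>03</TextType>
        <Text>A short made-up summary.</Text>
      </TextContent>
    </CollateralDetail>
    <PublishingDetail>
      <Imprint><ImprintName>Example Press</ImprintName></Imprint>
    </PublishingDetail>
  </Product>
</ONIXMessage>
"""


def parse_product(product):
    """Pull the fields labelled on the slide out of one <Product> element."""
    isbn = None
    for identifier in product.findall("ProductIdentifier"):
        if identifier.findtext("ProductIDType") == "15":  # 15 = ISBN-13
            isbn = identifier.findtext("IDValue")

    summary = None
    for text_content in product.findall("CollateralDetail/TextContent"):
        if text_content.findtext("TextType") == "03":  # 03 = description
            summary = text_content.findtext("Text")

    detail = "DescriptiveDetail/"
    return {
        "isbn": isbn,
        "title": product.findtext(detail + "TitleDetail/TitleElement/TitleText"),
        "authors": [el.text for el in product.findall(detail + "Contributor/PersonName")],
        "language": product.findtext(detail + "Language/LanguageCode"),
        "subjects": [el.text for el in product.findall(detail + "Subject/SubjectCode")],
        "summary": summary,
        "imprint": product.findtext("PublishingDetail/Imprint/ImprintName"),
    }


def parse_onix(xml_text):
    """Return one dictionary per <Product> in the message."""
    root = ET.fromstring(xml_text)
    return [parse_product(product) for product in root.findall("Product")]


if __name__ == "__main__":
    for record in parse_onix(SAMPLE_ONIX):
        print(record)
```

A production feed would also need namespace handling and the many optional composites this sketch ignores.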
Library of Congress
• Developed ONIX-MARC conversion software in 2009
• Weekly ONIX feeds as part of CIP program
• If no ISBN match: new CIP record is created. Imported fields: 100, 245, 250, 264b, 264c, 490, 520, 505, 650 (BISAC), 700
• Cataloguers “proof-read”: check errors, capitalization issues, add fields
• If ISBN match: existing record is enhanced with TOCs, BISAC headings and summaries from ONIX.
• 100 000+ CIP records created annually
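The match-or-create logic described for the Library of Congress feed can be sketched roughly as follows. This is not the Library's actual software: the catalogue is a plain dictionary keyed by ISBN, records are dictionaries of MARC tag to list of strings, and the names `apply_onix`, `NEW_RECORD_MAP` and `ENHANCE_ONLY` are invented for the illustration. Only a few of the imported fields are mapped.

```python
# Assumed mapping from simplified ONIX keys to MARC tags named on the slide.
NEW_RECORD_MAP = {
    "author": "100",
    "title": "245",
    "imprint": "264b",
    "summary": "520",
    "contents": "505",
    "bisac": "650",
}
# On an ISBN match only TOCs, BISAC headings and summaries are added.
ENHANCE_ONLY = {"summary": "520", "contents": "505", "bisac": "650"}


def apply_onix(catalogue, onix):
    """Create a new record or enhance an existing one; return what happened."""
    isbn = onix["isbn"]
    existing = catalogue.get(isbn)
    mapping = ENHANCE_ONLY if existing is not None else NEW_RECORD_MAP
    record = existing if existing is not None else {}

    for key, tag in mapping.items():
        value = onix.get(key)
        # Skip empty values and avoid adding the same text twice.
        if value and value not in record.get(tag, []):
            record.setdefault(tag, []).append(value)

    catalogue[isbn] = record
    return "enhanced" if existing is not None else "created"


if __name__ == "__main__":
    catalogue = {"9780000000002": {"245": ["An Example Title"]}}
    feed = [
        {"isbn": "9780000000002", "title": "Publisher Title", "summary": "A made-up summary."},
        {"isbn": "9780000000019", "title": "A Second Title", "author": "Example, Jane"},
    ]
    for item in feed:
        print(item["isbn"], apply_onix(catalogue, item))
```

The cataloguer's "proof-reading" step would follow this: the sketch leaves the existing 245 untouched on a match precisely because the human-checked data is assumed to be the better version.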
British Library
• ONIX feeds, standardized via Ingram, are compared against a number of data sets (including Nielsen and OCLC) to find and select the best record
• Ebook batch ingest process: 1500+ records per week, 70% require NO cataloguer input at all
2) Embedded metadata in digital resources
In META tags in the source code
These meta tags can be used by metadata extractors like Apache Tika, ExifTool, etc. to create records.
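To show what such an extractor does at its simplest, here is a sketch that reads META tags from a page using only the Python standard library. It is not how Tika or ExifTool work internally; the HTML sample and the `MetaTagExtractor` class are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page source with embedded metadata.
SAMPLE_HTML = """
<html><head>
  <title>An Example Thesis</title>
  <meta name="DC.title" content="An Example Thesis">
  <meta name="DC.creator" content="Example, Jane">
  <meta name="keywords" content="cataloguing, metadata">
  <meta charset="utf-8">
</head><body></body></html>
"""


class MetaTagExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Some pages use property="..." instead of name="...".
        name = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if name and content:
            self.metadata.setdefault(name.lower(), []).append(content)


def extract_meta(html_text):
    """Return a dictionary of lower-cased meta names to lists of values."""
    extractor = MetaTagExtractor()
    extractor.feed(html_text)
    return extractor.metadata


if __name__ == "__main__":
    for name, values in extract_meta(SAMPLE_HTML).items():
        print(name, values)
```

Tags without both a name and a content value, such as the charset declaration, are skipped.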
Material type: Standards
Ebooks: EPUB 3 standards, including compulsory fields like identifier & title
Audiovisual: e.g. ID3 standards for digital music
Open Access: OAI-PMH = XML in Dublin Core format
eJournals: PRISM standards
Other (websites, digital photos, etc.): Technical metadata at least
3) AI-identified metadata
Machine learning, algorithms, document image analysis
• identify metadata from title-page, t.p. verso, table of contents
• autogenerate keywords
Some examples
• 1989 : Automated title page cataloging : a feasibility study / S. Weibel, M. Oskins, D. Vizine-Goetz. Information processing & management v. 25, issue 2
• 2001 : MARS2 Automating the production of bibliographic records for MEDLINE / George R. Thoma.
• 2009 : DCEditor Digital Collections Production Center, Washington Research Library Consortium (WRLC)
• 2018 : BL OCR -> MARC conversion tool - The paper museum of Cassiano dal Pozzo : a catalogue raisonne
• 2018 : GitHub code A Metadata Extractor for Books in a Digital Library / S.S. Akhtar & others. In: Maturity and Innovation in Digital Libraries / Dobreva M., Hinze A., Žumer M. (eds).
• More and more data
• Automated metadata extraction is becoming more efficient and consistent
• As libraries adopt Open Web standards (BIBFRAME, XML, Dublin Core) automatic record creation will become seamless
How the role of the cataloguer will change
Focus of cataloguing will shift from data creation to data management.
“Shelf packing”
1. Unpacking
• Getting the metadata into your library catalogue/discovery tool/webpage
* Recognize file formats and schemas
XML ; Dublin Core ; BIBFRAME ; EPUB ; HTML
* Know crosswalks / conversion programmes
Load tables ; MarcEdit ; Zepheira ; XC Metadata Toolkit
2. LINK creation and maintenance
• Connect creators, works, subjects
• Inserted by?
Bib agencies (OCLC/ONIX)
Systems (SirsiDynix BLUEcloud)
Processing software (MarcEdit)
You! (new links/NACO)
* What are URIs & where do they come from?
VIAF ; ORCID ; ISNI ; id.loc.gov
* How do links in RDA, BIBFRAME and LRM (Library Reference Model) work within the Web RDF format?
https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf
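As one concrete case of where a URI comes from, an ORCID iD becomes a URI by prefixing it with https://orcid.org/. The sketch below checks the iD's final check character first, using the ISO 7064 MOD 11-2 calculation as I understand ORCID documents it; the function names are invented, and the iD used is ORCID's published sample iD for the fictitious researcher Josiah Carberry.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits):
    """Compute the check character for the first 15 digits (MOD 11-2)."""
    total = 0
    for character in base_digits:
        total = (total + int(character)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_uri(orcid_id):
    """Return the ORCID URI, or None if the iD is malformed or fails its check."""
    if not ORCID_PATTERN.match(orcid_id):
        return None
    compact = orcid_id.replace("-", "")
    if orcid_check_character(compact[:15]) != compact[15]:
        return None
    return "https://orcid.org/" + orcid_id


if __name__ == "__main__":
    print(orcid_uri("0000-0002-1825-0097"))  # sample iD, passes the check
    print(orcid_uri("0000-0002-1825-0096"))  # last character altered, fails
```

A check like this only catches typing errors. It does not confirm that the iD belongs to the person named in the record, which is still a cataloguer's judgement.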
3. Quality control
• Find and fix blank fields, errors, diacritics
• Remove duplicates
• Add local data or enhancements
* Machine identification of errors & duplicates
* Batch processing tools
Library system functions (global updates, bad codes, etc.) ; MarcEdit ; OpenRefine ; Regular expressions
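A small sketch of what machine identification of blanks and duplicates can look like, using a regular expression to normalise ISBNs before comparing them. The record layout (dictionaries with title, author and isbn keys) and the choice of required fields are assumptions for the example, not a feature of any particular library system.

```python
import re
from collections import defaultdict

# Assumption: these are the fields a record must not leave blank.
REQUIRED_FIELDS = ("title", "author", "isbn")


def normalise_isbn(raw):
    """Strip hyphens, spaces and other punctuation so ISBNs compare equal."""
    return re.sub(r"[^0-9Xx]", "", raw or "").upper()


def blank_fields(record):
    """List the required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not (record.get(field) or "").strip()]


def find_duplicates(records):
    """Group record positions by normalised ISBN; keep groups of two or more."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        isbn = normalise_isbn(record.get("isbn"))
        if isbn:
            groups[isbn].append(index)
    return {isbn: indexes for isbn, indexes in groups.items() if len(indexes) > 1}


if __name__ == "__main__":
    records = [
        {"title": "An Example Title", "author": "Example, Jane", "isbn": "978-0-00-000000-2"},
        {"title": "An example title", "author": "", "isbn": "9780000000002"},
        {"title": "Another Title", "author": "Sample, Sam", "isbn": ""},
    ]
    print(find_duplicates(records))
    for index, record in enumerate(records):
        print(index, blank_fields(record))
```

Matching on ISBN alone is deliberately crude. Records without an ISBN, or different editions sharing one, still need a cataloguer's eye.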
4. Access
• Print:
Weeding / withdrawals
Stocktake
• E-items
Database management
URL checking
Fix broken links reported by users
* Your library system’s inventory control process
* Automatic URL checkers
including CMS (Content Management Systems)
Limited functionality with proxy servers. Still need human checking.
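A bare-bones automatic URL check might look like the sketch below, which sends a HEAD request and sorts the outcome into three rough categories. The function name, the category labels and the `opener` parameter (there so the check can be exercised without a network) are choices made for this example.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return a (status, HTTP code) pair for one URL."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return ("ok", response.status)
    except urllib.error.HTTPError as error:
        # The server answered, but with an error code such as 404.
        return ("broken", error.code)
    except (urllib.error.URLError, TimeoutError):
        # No usable answer at all: DNS failure, refused connection, timeout.
        return ("unreachable", None)


if __name__ == "__main__":
    # Replace with URLs exported from your own catalogue.
    for link in ["https://librarycarpentry.org/"]:
        print(link, check_url(link))
```

An "ok" here only means the server answered. A proxy login page or a publisher's "content moved" page can also answer successfully, and some servers refuse HEAD requests altogether, which is why human checking is still needed.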
5. Collection analysis (by subject)
• Automatic tools use a “conspectus” of call numbers
• You can refine the conspectus to suit your own collections, or use subject headings as well
• Work with collection development staff / subject librarians / educators
* Ensure all records have call numbers & controlled subjects
Dewey ; LC classification ; LCSH ; MeSH
* Know how to query that data
Library system functions (e.g. Sierra Create Lists/Decision Center) ; WorldShare Analytics ; SQL ; SPARQL ; JSON
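As a taste of querying that data with SQL, the sketch below counts titles per Dewey hundreds class, a very coarse stand-in for a conspectus. The table layout, the sample call numbers and the function name are invented; a real library system will have its own schema and export format.

```python
import sqlite3


def titles_per_dewey_class(rows):
    """Count items per Dewey hundreds class from (title, call_number) rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE items (title TEXT, call_number TEXT)")
    connection.executemany("INSERT INTO items VALUES (?, ?)", rows)
    query = """
        SELECT substr(call_number, 1, 1) || '00' AS dewey_class,
               COUNT(*) AS titles
        FROM items
        WHERE call_number GLOB '[0-9][0-9][0-9]*'  -- skip blank or non-Dewey numbers
        GROUP BY dewey_class
        ORDER BY dewey_class
    """
    result = connection.execute(query).fetchall()
    connection.close()
    return result


if __name__ == "__main__":
    sample_rows = [
        ("Title A", "025.3 REY"),
        ("Title B", "025.4 SMI"),
        ("Title C", "510 JON"),
        ("Title D", ""),  # no call number: excluded from the analysis
    ]
    for dewey_class, titles in titles_per_dewey_class(sample_rows):
        print(dewey_class, titles)
```

The record with no call number silently drops out of the count, which is the practical reason for ensuring every record has one before running collection analysis.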
6. Original cataloguing
• Unique materials
• Realia & museum artifacts
• Devices (phone chargers, ebook readers, aids for disabled, etc.)
* You already have the skills!!
• Library Carpentry https://librarycarpentry.org/
An introductory software skills training programme with a focus on the needs and requirements of library and information professionals.
Lessons include:
Introduction to data
Regular expressions
Shell commands
OpenRefine
GitHub
Make a start
• MarcEdit https://marcedit.reeset.net/downloads
A metadata editing software suite
Functions include:
Crosswalks
Batch processing
BIBFRAME
OpenRefine
Linked Data
SPARQL
OAI-PMH
***** Subscribe to the JOURNAL OF LIBRARY METADATA *****