![Page 1: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/1.jpg)
Data Bricolage
Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS
Kristina Spurgin
E-Resources Cataloger - UNC-Chapel Hill
![Page 2: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/2.jpg)
Photo by dannybirchall
BRICOLAGE
“construction (as of a sculpture or a structure of ideas) achieved by using whatever comes to hand; also : something constructed in this way” –m-w.com
![Page 3: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/3.jpg)
A map
A bit of context – my institution and my role in it
My favorite load table – gathering bib records
Extended example of “data bricolage” for cleaning/enhancing bib records
Script to verify full text access to ebooks
Script/program to summarize data exported from Millennium
![Page 4: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/4.jpg)
Photo of Davis Library @ UNC by benuski
- University of NC @ Chapel Hill
- Large institution, ARL member
- 6,048,337 catalog results (not exactly what’s in our III backend, but gives an idea of scale)
- 3 administrative units
- +/- 30 branches and specialized collection locations
- >1060 item locations
- Part of Triangle Research Libraries Network, sharing:
  - Endeca OPAC
  - Physical storage space
  - Some MARC records
  - Some acquisitions
- 1 staff member with load table training
“It’s complicated…”
![Page 5: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/5.jpg)
My official job – E-resources cataloger
• Managing & loading batches of MARC records for ebooks
• Individual cataloging of Web sites, online databases, and some ebooks
• Oversee maintenance of URLs in catalog records
• (new!) Extraction of our catalog data from Millennium for use in our Endeca OPAC
![Page 6: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/6.jpg)
My official job – Tools of the data bricoleur
![Page 7: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/7.jpg)
My unofficial job – “Fixer”
So, HathiTrust requires very specific info in their metadata for an ingest…
Image from QuotesPics.com
Oops, a lot of titles in that big ebook package we just cancelled were on EReserve. How can we identify them?
This branch library has an old Access database of items they want to put in the catalog…
We need a way to easily work with payment data outside Millennium for a serials review!
![Page 8: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/8.jpg)
MY FAVORITE LOAD TABLE
gathering, cleaning, & enhancing records
![Page 9: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/9.jpg)
BACKGROUND: a pre-existing workflow
Spreadsheet from Internet Archive Scribe manager:
Spreadsheet >> MarcEdit Delimited Text Translator:
![Page 10: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/10.jpg)
BACKGROUND: A pre-existing workflow
Compiled to .mrc and loaded with locally-created load table that:
• Matches on bnum (907) for overlay
• Protects ALL fields in existing record (LDR, Cat Date, etc… everything)
• Inserts any fields from the new stub record (will create dupe fields)
• Creates new item
![Page 11: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/11.jpg)
“How can I get these back into a review file?”
b29786551 b30718326 b31024907 b31024932 b31351463 b32383137 b32568149 b32594124 b32874492 b32921342 b32935602 b33764037 …
“You can’t, really.”
(me)
Why am I telling you about this old thing?
![Page 12: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/12.jpg)
What if I loaded stub records containing nothing but the bnum?
(me)
• On load, check “Use Review Files” box
• It works!
• We toggle item creation in the load table as needed (trivial tweak)
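As a minimal sketch of preparing bnum-only input (Python, not part of the original workflow), the snippet below pulls record numbers out of pasted text and writes one per line, giving a single-column file that could be mapped to the 907 field when building stub records. The pattern is an assumption: a bnum is taken to be `b` followed by seven digits and one check character (a digit or `x`). The function names are hypothetical.

```python
import re

# Assumption: a bnum is "b" + 7 digits + a check character (digit or "x").
BNUM_PATTERN = re.compile(r"b\d{7}[\dx]")


def extract_bnums(text):
    """Return the unique bnums found in text, in order of first appearance."""
    seen = set()
    bnums = []
    for match in BNUM_PATTERN.findall(text):
        if match not in seen:
            seen.add(match)
            bnums.append(match)
    return bnums


def to_stub_lines(bnums):
    """Format bnums one per line: a single-column, delimited-text input."""
    return "\n".join(bnums) + "\n"


# Works even when the pasted bnums are run together or repeated.
pasted = "b29786551b30718326 b31024907\nb31024907"
print(to_stub_lines(extract_bnums(pasted)), end="")
```

De-duplicating before the load keeps the same bib record from being touched twice.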
![Page 13: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/13.jpg)
THE SAVINE SAGA
cleaning & maintaining catalog records
![Page 14: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/14.jpg)
Savine Digital Library home: http://dc.lib.unc.edu/cdm/customhome/collection/rbr/
![Page 15: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/15.jpg)
Local Millennium Record
OCLC Master Record
![Page 16: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/16.jpg)
+3600 local records
+3600 OCLC records
http://rbr.lib.unc.edu/cm/card.html?source_id=00664
became
http://dc.lib.unc.edu/cdm/item/collection/rbr/?id=32017
![Page 17: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/17.jpg)
list of bnums not associated with new URLs
new URLs manually identified for each bnum
initial list of catalog bnums for Savine records (but for print only… oops)
new URL for each bnum
![Page 18: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/18.jpg)
Local Record Strategy
- Create review file of all bib records with 856 matching old db URL
- Export data from Millennium/open in Excel… (table name = mill)
- New worksheet w/new DB info (table name = contdm)
Hmm… these bnums won’t match…
![Page 19: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/19.jpg)
Local Record Strategy
- Add 8-character bnum to mill table
- Copy entire bnum8 column
- “Paste special > Values” back in the same place
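The slide does not say how the 8-character bnum is derived. The sketch below assumes the simplest case: the exported bnum carries one extra trailing character (such as a check digit), so the first eight characters form the matching key. `bnum8` is a hypothetical helper name borrowed from the column name above. In a script, the "Paste special > Values" step is unnecessary because the result is already a plain string, not a formula.

```python
def bnum8(bnum):
    """Return the first 8 characters of a record number, ignoring stray whitespace.

    Assumption: the two tables differ only by a trailing check character.
    """
    return bnum.strip()[:8]


mill_bnums = ["b29786551", "b30718326 "]
print([bnum8(b) for b in mill_bnums])
```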
![Page 20: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/20.jpg)
Local Record Strategy
- VLOOKUP formula to grab new URLs from contdm table
contdm table
mill table (some columns hidden)
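For readers who would rather script the lookup, an exact-match VLOOKUP behaves like a dictionary lookup. The Python sketch below is illustrative only: the column names `bnum8` and `new_url` and the sample values are made up, not taken from the actual tables. It also collects the keys that found no match, which is the list needed for the next step (spotting a pattern in the missing URLs).

```python
def add_new_urls(mill_rows, contdm_rows):
    """Attach a new URL to each mill row by exact match on the bnum8 key.

    Returns (rows_with_urls, unmatched_keys). Rows without a match get
    new_url = None, much as a failed VLOOKUP leaves #N/A in the cell.
    """
    url_by_bnum = {row["bnum8"]: row["new_url"] for row in contdm_rows}
    matched, unmatched = [], []
    for row in mill_rows:
        url = url_by_bnum.get(row["bnum8"])
        if url is None:
            unmatched.append(row["bnum8"])
        matched.append({**row, "new_url": url})
    return matched, unmatched


# Made-up sample data for illustration only.
mill = [{"bnum8": "b1000001"}, {"bnum8": "b1000002"}]
contdm = [{"bnum8": "b1000001", "new_url": "http://example.org/item/1"}]
rows, missing = add_new_urls(mill, contdm)
print(rows)
print(missing)
```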
![Page 21: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/21.jpg)
Local Record Strategy
- Identify pattern in missing new URLs
- Create new table (name = urlmatch)
![Page 22: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/22.jpg)
Local Record Strategy
- In mill table, clear out NEW URL column
![Page 23: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/23.jpg)
Local Record Strategy
- In mill table, repopulate NEW URL with VLOOKUP from urlmatch
![Page 24: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/24.jpg)
Local Record Strategy
- Use MarcEdit Delimited Text Translator to create “stub records”
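The input to that step is just a delimited text file. As a sketch under stated assumptions (Python, tab as the delimiter, hypothetical function name and sample values), the snippet below writes one bnum/new-URL pair per line. In the translator, the first column would then be mapped to the 907 and the second to the |u of a new 856.

```python
import csv
import io


def write_stub_table(rows, out):
    """Write (bnum, new_url) pairs as tab-delimited text, one record per line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for bnum, url in rows:
        if url:  # skip records that still have no new URL
            writer.writerow([bnum, url])


# Made-up sample data; a real run would write to a file opened with newline="".
buffer = io.StringIO()
write_stub_table(
    [("b1000001", "http://example.org/item/1"), ("b1000002", None)],
    buffer,
)
print(buffer.getvalue(), end="")
```

Skipping rows without a URL keeps empty 856 fields out of the stub records.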
![Page 25: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/25.jpg)
Local Record Strategy
- Global update on review file of Savine records
- Delete all old 856s containing |uhttp://rbr.lib.unc.edu
- Load stub records with my favorite load table
- New URLs added
![Page 26: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/26.jpg)
OCLC Record Strategy
• Batch search OCLC#s into local OCLC save file
• Validate/correct as necessary
• Use MarcEdit/OCLC plugin to open local save file in MarcEdit
• Copy all to new MarcEdit file
• Delete old URLs, Save
• Merge in new URLs from “stub” record file created w/OCLC# and new URLs
• Copy merged records back into file created by plugin
• Save records from plugin MarcEdit file back to local OCLC save file
• Batch replace records in OCLC Connexion
![Page 27: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/27.jpg)
Other bricolage projects using my favorite load table
- SpringerLink ebook records
  - 950s (subject module) were deleted from many records
  - In SpringerLink title list: DOI url, Subject module
  - In Millennium: bnum, DOI url
  - Stub records with bnum (907) and new 950
- Alexander Street Press (ASP) records released without OCLC nums
  - From ASP: ASP record ID, OCLC num
  - From Mill: bnum, ASP record ID
  - Stub records with bnum (907) and new 035
![Page 28: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/28.jpg)
BEYOND THE URL CHECKER
A script to verify full-text access to ebooks
![Page 29: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/29.jpg)
Access checker:
The problem addressed
• Ideally, vendors would provide us with:
  – MARC records for ALL items to which we have full access
  – NO MARC record for items to which we have restricted access
• Reality is not ideal.
• Example: SpringerLink e-books
  • 250-560 new MARC records a month
![Page 33: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/33.jpg)
URL CHECKER != ACCESS CHECKER
![Page 34: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/34.jpg)
Access checker:
Script use: input
• Data sources:
  – Extract from MARC file pre-load using MarcEdit
  – Export from Millennium Create Lists post-load
• URL must be final column
  – One URL per row
• Any number of columns can be included before the URL
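The input rule (URL last, anything before it) is easy to honor in code. The Python sketch below is not the access checker itself, only an illustration of reading such a file: it treats the final column as the URL and carries the other columns along untouched. The tab delimiter and the function name are assumptions.

```python
import csv
import io


def read_url_rows(handle, delimiter="\t"):
    """Yield (other_columns, url) per row; the URL is always the last column."""
    for row in csv.reader(handle, delimiter=delimiter):
        if not row or not row[-1].strip():
            continue  # skip blank lines and rows with an empty URL cell
        yield row[:-1], row[-1].strip()


# Made-up sample row: bnum, title, URL.
sample = io.StringIO("b1000001\tExample title\thttp://example.org/book/1\n")
for other_columns, url in read_url_rows(sample):
    print(other_columns, url)
```

Because the leading columns pass through unchanged, results can be matched back to bnums or titles after checking.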
![Page 35: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/35.jpg)
Access checker:
Script use: running the script
In Windows Powershell:
![Page 36: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/36.jpg)
Access checker:
Script use: running the script
In Windows Powershell:
![Page 37: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/37.jpg)
Access checker:
Script use: running the script
In Windows Powershell:
![Page 38: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/38.jpg)
Access checker:
Script use: output
![Page 39: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/39.jpg)
Access checker:
Other info
• Looks at the “landing page” for each URL – does not download or harvest any full text content
• Written in JRuby
• Open source – Code available from GitHub
• Instructions for use also at GitHub – I tried to write them for people not familiar with using scripts
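To make the difference between URL checking and access checking concrete: a URL checker only asks whether the page loads, while an access check has to look at what the landing page says. The Python sketch below shows that second idea in its simplest form. It is not the logic of the actual JRuby script, and the marker phrases are hypothetical. Each platform would need its own phrases, found by inspecting real landing pages.

```python
def classify_landing_page(html, full_markers, restricted_markers):
    """Return 'full', 'restricted', or 'unknown' based on marker text in the page.

    Restricted markers are checked first, so a page showing both kinds of
    text is reported as restricted and gets a human look.
    """
    text = html.lower()
    if any(marker.lower() in text for marker in restricted_markers):
        return "restricted"
    if any(marker.lower() in text for marker in full_markers):
        return "full"
    return "unknown"


# Hypothetical marker phrases and page text, for illustration only.
FULL = ["Download PDF"]
RESTRICTED = ["Purchase this chapter"]
print(classify_landing_page("<a>Download PDF</a>", FULL, RESTRICTED))
```

Only the landing page text is examined. Nothing here downloads full-text content.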
![Page 40: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/40.jpg)
DEALING WITH PAYMENT DATA
A script to summarize PAID data from order records
![Page 41: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/41.jpg)
Payment data processor:
The problem addressed
• Millennium will export payment data from Create Lists of order records
• BUT the format of the exported data makes it virtually unusable.
  – 9 payment field columns, repeated
• One row in the output below had data all the way to column ST!
![Page 42: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/42.jpg)
Payment data processor:
The solution
• Script outputs either:
  – One payment per line
  – Payments summarized by fiscal year
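Both outputs come down to one reshaping idea: cut each wide row into groups of nine payment columns, then total by fiscal year if wanted. The Python sketch below illustrates that idea only and is not the author's Ruby script. Several things are assumptions: the number of leading (non-payment) columns is passed in by the caller, the fiscal year is taken to run July 1 to June 30 and is named for the year in which it ends, and payment dates and amounts are assumed to be already parsed.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

GROUP_SIZE = 9  # per the slide: 9 payment field columns, repeated


def split_payments(row, n_leading):
    """Split one wide export row into (leading_fields, [payment_group, ...])."""
    leading = row[:n_leading]
    rest = row[n_leading:]
    groups = [rest[i:i + GROUP_SIZE] for i in range(0, len(rest), GROUP_SIZE)]
    # Drop groups that are entirely empty padding.
    return leading, [g for g in groups if any(cell.strip() for cell in g)]


def fiscal_year(paid):
    """Assumption: FY runs July 1 - June 30 and is named for its ending year."""
    return paid.year + 1 if paid.month >= 7 else paid.year


def totals_by_fiscal_year(payments):
    """Sum (paid_date, amount) pairs by fiscal year."""
    totals = defaultdict(Decimal)
    for paid, amount in payments:
        totals[fiscal_year(paid)] += amount
    return dict(totals)


# Made-up payments for illustration.
payments = [(date(2013, 7, 1), Decimal("10.50")), (date(2014, 1, 5), Decimal("2.00"))]
print(totals_by_fiscal_year(payments))
```

`Decimal` is used instead of floats so that currency totals do not pick up rounding noise.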
![Page 43: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/43.jpg)
Payment data processor:
Script use: input
• Exported .txt file from Millennium Create Lists
![Page 44: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/44.jpg)
Payment data processor:
Script use: running the script
• You can run the Ruby (.rb) script from the command line
• BUT
• Everyone using this at UNC just double-clicks on the .exe
![Page 45: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/45.jpg)
Payment data processor:
Script use: running the script
![Page 46: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/46.jpg)
Payment data processor:
Script use: running the script
![Page 47: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/47.jpg)
Payment data processor:
Script use: running the script
![Page 48: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/48.jpg)
Payment data processor:
Script use: output
![Page 49: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/49.jpg)
Payment data processor:
Script use: running the script
![Page 50: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/50.jpg)
Payment data processor:
Script use: output
![Page 51: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/51.jpg)
Payment data processor:
Other info
• Written in Ruby
• Open source – Code available from GitHub
• Instructions for use also at GitHub – I tried to write them for people not familiar with using scripts
![Page 52: Data Bricolage Mixed methods to verify, summarize, clean, and enhance data in and out of the ILS Kristina Spurgin E-Resources Cataloger - UNC-Chapel Hill](https://reader031.vdocuments.us/reader031/viewer/2022032723/56649cff5503460f949d060b/html5/thumbnails/52.jpg)
Questions?
Photo by theunquietlibrarian on Flickr