A Peek Inside the Carolina Digital Repository
Michael Daines
Digital Repository Analyst
UNC – Chapel Hill
Goals
What’s in the repository?
• 41158 images
• 18671 texts (PDF, Microsoft Word, text files)
• 11856 audio files
• 1438 datasets
• 54 video files
(As of July 17, 2013)
What’s in the repository?
• Research Laboratories of Archaeology: 35502 images (photographs and scans)
• Electronic Theses and Dissertations: 4035 PDFs
• BioMed Central: 1777 PDFs (articles)
(As of July 17, 2013)
How to show what we have?
“Peek”
https://github.com/UNC-Libraries/peek
How do we find interesting images?
Cover pages?
Random pages?
How do we find interesting images?
Query → Download → Split → Resize → Choose
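The five stages above can be read as a chain in which each step consumes the previous step's output. The following is a minimal sketch of that shape only. Every function body is a hypothetical stand-in working on made-up strings, not the code in the peek repository, and the `choose` step here is an arbitrary placeholder for what may well have been a manual selection.

```python
from functools import reduce


def run_pipeline(stages, initial):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda data, stage: stage(data), stages, initial)


# Hypothetical stand-ins for the five stages; they only manipulate strings.
def query(_):
    return ["object-1", "object-2"]          # identifiers returned by a search


def download(ids):
    return [f"{i}.pdf" for i in ids]         # one file per identifier


def split(files):
    return [f"{f}#page{n}" for f in files for n in (1, 2)]  # two pages per file


def resize(pages):
    return [f"{p}@thumb" for p in pages]     # one thumbnail per page


def choose(thumbs):
    return thumbs[::2]                       # placeholder for a selection step


result = run_pipeline([query, download, split, resize, choose], None)
print(result)
```

Keeping each stage as a separate function makes it easy to rerun a single step, for example resizing again at a different size, without repeating the download.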
Query, Download
Solr query
Download public datastreams
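As an illustration of the query step, here is a sketch that builds a Solr select URL with the standard `q`, `fl`, `rows`, `start`, and `wt` parameters. The base URL, the core name `example`, and the field names `contentType`, `id`, and `title` are assumptions made for the example; the repository's actual Solr schema is not given in the slides.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, fields, rows=100, start=0):
    """Return a Solr /select URL asking for JSON results.

    base_url is the URL of a Solr core (hypothetical in this example).
    """
    params = {
        "q": query,                 # main query string
        "fl": ",".join(fields),     # fields to return
        "rows": rows,               # page size
        "start": start,             # offset for paging
        "wt": "json",               # response format
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core and field names, for illustration only.
url = build_solr_query_url(
    "http://localhost:8983/solr/example",
    "contentType:pdf",
    ["id", "title"],
    rows=10,
)
print(url)
```

Paging through a large result set would mean calling the function repeatedly with an increasing `start` value.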
Split, Resize
CoreGraphics
ImageMagick
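A resize step usually means fitting each page image inside a bounding box while keeping its aspect ratio. The sketch below computes the fitted dimensions and builds an argument list for ImageMagick's `convert` using its `-resize WxH>` geometry, which only shrinks images larger than the box. The file names and the 300-pixel box are assumptions, and the slides do not say which tool handled which step, so this is not a description of how peek itself invokes either tool.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the box, never enlarging."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_command(src, dst, max_width, max_height):
    """Build an ImageMagick argument list; the '>' flag means shrink only."""
    return ["convert", src, "-resize", f"{max_width}x{max_height}>", dst]


# Hypothetical file names and box size.
print(fit_within(2000, 1000, 500, 500))
print(resize_command("page-001.png", "thumb-001.png", 300, 300))
```

The list returned by `resize_command` could be passed to `subprocess.run` on a machine where ImageMagick is installed.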
Choose
Initial set
2000 objects
35855 images split
425 images for homepage
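To put the initial set in proportion, the following short calculation uses only the three figures above: about 18 page images came out of each object on average, and roughly 1.2% of the split images were chosen for the homepage.

```python
objects = 2000
images_split = 35855
images_chosen = 425

# Average number of page images produced per object.
images_per_object = images_split / objects

# Share of split images that made it to the homepage.
chosen_fraction = images_chosen / images_split

print(f"{images_per_object:.1f} images per object")
print(f"{chosen_fraction:.2%} of split images chosen")
```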
Further work
• Larger sample?
• Automation?
• Integration with repository?
• Collaborative filtering?
• Image classification?
• No processing step?
• A/V objects?
• Bias?
Try it!
https://cdr.lib.unc.edu/
https://github.com/UNC-Libraries/peek
https://github.com/UNC-Libraries/peek-data