Digital Collections: Storage and Access
Jon Dunn, Assistant Director for Technology
IU Digital Library [email protected]
October 2, 2003 ALI Digital Library Workshop
Storage
Why is storage an issue?
  Space requirements
  Persistence
  Accessibility
Needs depend on purpose of storage
  Capture/encoding
  Access/delivery
  Preservation
Storage: Working Space
Space for storage of digital files during the capture/encoding/quality control process
Possibilities
  PC hard drive
  File server / LAN
Issues
  Capacity, backup, speed, accessibility
Storage: Access/Delivery
Storage of derivative files for web delivery
  Image, audio, video, text files, etc.
Possibilities
  Local web server
  Commercially-hosted web site
  Consortial service provider
Issues: capacity, backup, performance, software integration, maintenance/migration
Storage: Preservation
Much harder problem
Longer term
  Issues of longevity of media, hardware, file format
  “Where did we put the files?”
Larger files
  Hard disk storage, traditional backup methods not cost-effective
Infrequency of access
  Problems do not become immediately evident
Long-Term Storage Options
Removable media stored offline
  Optical
    CD-R (CD-Recordable)
    DVD-R (DVD-Recordable), DVD+R, DVD+RW, DVD-RW, …
  Tape
    DLT, 8mm, DAT, …
  Pros: cheap, easy, produces tangible item
  Cons: low capacity, physical space requirements, unknown longevity, migration, potential format obsolescence
Online/nearline storage systems
  HSM: Hierarchical Storage Management
    Combine disk and automated tape storage with software to keep track of where files are located
  Locally managed or remote provider
  Pros: high capacity, migration can be handled by software
  Cons: expensive, complex, network bandwidth issues, must trust service provider, potential single point of failure
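The HSM idea of combining disk and tape with software that tracks where each file lives can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name ToyHSM, the least-recently-used migration rule, and the two tier labels are hypothetical and are not taken from any real HSM product.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHSM:
    """Toy two-tier store: a small disk cache in front of 'tape' (illustrative only)."""
    disk_capacity: int                               # bytes of disk cache available
    sizes: dict = field(default_factory=dict)        # file name -> size in bytes
    tier: dict = field(default_factory=dict)         # file name -> "disk" or "tape"
    last_used: dict = field(default_factory=dict)    # file name -> logical clock value
    clock: int = 0

    def _touch(self, name):
        self.clock += 1
        self.last_used[name] = self.clock

    def _disk_used(self):
        return sum(s for n, s in self.sizes.items() if self.tier[n] == "disk")

    def _make_room(self, keep):
        # Assumed policy: migrate least recently used files to tape until disk fits.
        while self._disk_used() > self.disk_capacity:
            candidates = [n for n in self.sizes
                          if self.tier[n] == "disk" and n != keep]
            if not candidates:
                break  # the file being kept is itself larger than the cache
            victim = min(candidates, key=self.last_used.get)
            self.tier[victim] = "tape"

    def store(self, name, size):
        """Write a new file to disk, migrating older files to tape if needed."""
        self.sizes[name] = size
        self.tier[name] = "disk"
        self._touch(name)
        self._make_room(keep=name)

    def locate(self, name):
        """The catalog lookup: which tier currently holds this file?"""
        return self.tier[name]

    def read(self, name):
        """Reading a tape-resident file recalls it to disk first."""
        if self.tier[name] == "tape":
            self.tier[name] = "disk"
        self._touch(name)
        self._make_room(keep=name)
        return self.tier[name]
```

With a 100-byte cache, storing two 60-byte files pushes the older one to tape, and reading it back recalls it to disk while the other file migrates out. The caller never needs to know which tier a file is on, which is the point of the catalog.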
HSM Example: IU’s Massive Data Storage Service (MDSS)
HPSS (High Performance Storage System) software
  Developed as collaboration of IBM and US national labs
Four tape robots
  2 in Bloomington, 2 in Indianapolis
  Data can be mirrored
540 terabytes (TB) total storage
  ~75 TB used as of April 2001
A digital object is more than just a file!
Metadata
Delivery page image files (JPEG)
Hi-res page image files (TIFF)
Text file (TEI/XML)
A digital object is more than just a file!
EAD Finding Aid
DL Objects
Digital library “objects” have many parts
  Metadata
  Preservation/archival files
  Delivery files
How do we keep them connected?
  Now: Good practice in file naming, directory organization, project documentation - not scalable!
  Future: Digital object repository
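The "good practice in file naming" approach can be sketched as code that reassembles objects from file names alone. The naming convention here is purely hypothetical (an object ID, an underscore, and a four-digit page number for page images, plus one XML text file per object); the slides do not specify one. TIFF is treated as the preservation file and JPEG as the delivery file.

```python
import re
from collections import defaultdict

# Hypothetical convention (not from the slides):
#   <object id>_<4-digit page>.tif|jpg  for page images
#   <object id>.xml                     for the encoded text
PAGE_RE = re.compile(r"^(?P<obj>[a-z0-9]+)_(?P<page>\d{4})\.(?P<ext>tif|jpg)$")
TEXT_RE = re.compile(r"^(?P<obj>[a-z0-9]+)\.xml$")
ROLE = {"tif": "preservation", "jpg": "delivery"}


def group_files(names):
    """Group file names into objects; return (objects, unmatched names)."""
    objects = defaultdict(lambda: {"preservation": [], "delivery": [], "text": []})
    unmatched = []
    for name in sorted(names):
        page = PAGE_RE.match(name)
        text = TEXT_RE.match(name)
        if page:
            objects[page["obj"]][ROLE[page["ext"]]].append(name)
        elif text:
            objects[text["obj"]]["text"].append(name)
        else:
            # Files that break the convention cannot be tied to any object.
            unmatched.append(name)
    return dict(objects), unmatched
```

Any file that does not follow the convention lands in the unmatched list, which hints at why this approach does not scale: the connections between parts exist only as long as every contributor follows the naming rules.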
Data Persistence
Key is migration
Keeping the bits alive
  Physical media
  Logical media format
Keeping the bits understandable
  File format
  Metadata
Small “pockets” of digital content pose a problem for migration
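One common way to confirm that the bits stayed alive through a migration is to compare checksums taken before and after the copy. The slides do not name a technique, so the following is an assumption: a minimal sketch using SHA-256 manifests, with hypothetical function names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def compare(old, new):
    """Report files lost, altered, or newly appearing after a migration."""
    return {
        "missing": sorted(set(old) - set(new)),
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
        "added": sorted(set(new) - set(old)),
    }
```

Building a manifest on the old medium and another on the new one, then calling compare on the two, would flag any file that was dropped or corrupted during the copy. This covers only the "bits alive" half of the problem; keeping the bits understandable still depends on file formats and metadata.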
DL Object Repository
Preservation version in HSM
Delivery version(s) on web server
Metadata records
Repository System
Users and applications
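The repository diagram can be read as a lookup service: users and applications ask for an object by identifier, and the repository knows where the preservation version, delivery versions, and metadata are held. The sketch below is a hypothetical in-memory illustration of that idea; the class names, fields, and part labels are assumptions and do not describe any particular repository system.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    identifier: str
    metadata: dict
    preservation_path: str                         # e.g. a location inside the HSM
    delivery_urls: list = field(default_factory=list)  # versions on the web server


class Repository:
    """Hypothetical registry that keeps the parts of each object connected."""

    def __init__(self):
        self._objects = {}

    def ingest(self, obj):
        if obj.identifier in self._objects:
            raise ValueError(f"duplicate identifier: {obj.identifier}")
        self._objects[obj.identifier] = obj

    def resolve(self, identifier, part="delivery"):
        """Return the requested part of an object, wherever it is stored."""
        obj = self._objects[identifier]
        if part == "metadata":
            return obj.metadata
        if part == "preservation":
            return obj.preservation_path
        if part == "delivery":
            return obj.delivery_urls
        raise ValueError(f"unknown part: {part}")
```

The design choice illustrated is that the connection between parts is recorded explicitly under one identifier, rather than being implied by file names and directory layout.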
Web Delivery Functions
Searching
  Metadata
  Full text
Browsing
  By subject, date, author, …
Navigation
  Page turning, image panning/zooming, …
Streaming
  For audio/video
Reuse
  Downloading, format conversion
  Linking, persistent naming
Access control
  If necessary
Digital Collection Delivery Software
Very complex systems
  Need to integrate data from databases, full-text search engines, file systems, and other sources
  Cross-collection searching
Commercial
  ContentDM, Luna Insight, various library management system add-ons
Open source
  UMich DLXS, Greenstone, Eprints, MIT DSpace, …
Homegrown
Demonstration
Hoagy Carmichael Collection, IU Digital Library Program
http://www.dlib.indiana.edu/collections/hoagy/
Exposing Digital Resources Broadly
Pay services
  RLG Cultural Materials, Archival Resources
Free services
  University of Michigan OAIster
    www.oaister.org
  UIUC Digital Gateway to Cultural Heritage Materials
    oai.grainger.uiuc.edu
OAI-PMH
  Open Archives Initiative Protocol for Metadata Harvesting
  www.openarchives.org
OAI Metadata Harvesting
Extract metadata from various sources
Build services on local copies of metadata
[Diagram: a user sends a search for “Indiana” to a service provider. The service provider holds a local copy of metadata harvested offline from multiple data providers; all searching, browsing, etc. is performed on that local copy.]
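The harvesting model can be sketched in a few functions: build an OAI-PMH ListRecords request, parse the XML response into a local copy, and search that copy without contacting the data provider again. This is a minimal sketch, not a complete harvester. The function names are hypothetical, only Dublin Core titles are extracted, and the HTTP fetch, error responses, and deleted records are left out. The verb, the metadataPrefix and resumptionToken arguments, and the namespaces follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for a data provider's base URL."""
    if resumption_token:
        # In OAI-PMH 2.0 a resumptionToken is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption token or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS) if t.text]
        records.append({"identifier": ident, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def search(local_copy, term):
    """Service-provider side: search harvested titles in the local copy only."""
    term = term.lower()
    return [r for r in local_copy
            if any(term in t.lower() for t in r["titles"])]
```

A harvester would fetch list_records_url(base_url), parse the response, and repeat with the returned token until none remains, appending each batch to the local copy. A user's search for "Indiana" is then answered by search alone, which matches the diagram: harvesting happens offline, and searching touches only the service provider's copy.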
More Information
Bibliography to be made available at: http://www.dlib.indiana.edu/workshops/alioct03/