Digital Collections: Storage and Access

Jon Dunn
Assistant Director for Technology
IU Digital Library Program
[email protected]

October 2, 2003 ALI Digital Library Workshop

Storage
Why is storage an issue?
  Space requirements
  Persistence
  Accessibility
Needs depend on purpose of storage
  Capture/encoding
  Access/delivery
  Preservation

Storage: Working Space
Space for storage of digital files during capture/encoding/quality control process
Possibilities
  PC hard drive
  File server / LAN
Issues
  Capacity, backup, speed, accessibility

Storage: Access/Delivery
Storage of derivative files for web delivery
  Image, audio, video, text files, etc.
Possibilities
  Local web server
  Commercially-hosted web site
  Consortial service provider
Issues: capacity, backup, performance, software integration, maintenance/migration

Storage: Preservation
Much harder problem
Longer term
  Issues of longevity of media, hardware, file format
  “Where did we put the files?”
Larger files
  Hard disk storage, traditional backup methods not cost-effective
Infrequency of access
  Problems do not become immediately evident
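
To make the "larger files" point concrete, the following sketch estimates how much space uncompressed master images take. The page size, scanning resolution, bit depth, and page count are illustrative assumptions, not figures from the workshop.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Size in bytes of an uncompressed raster image (pixel data only, no header)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed example: a letter-size page scanned at 600 dpi in 24-bit color.
page_bytes = uncompressed_image_bytes(8.5, 11, 600, 24)

# Assumed example: a 300-page volume scanned the same way.
volume_bytes = page_bytes * 300

print(f"one page:  {page_bytes / 1e6:.1f} MB")
print(f"300 pages: {volume_bytes / 1e9:.1f} GB")
```

Under these assumptions a single volume already runs to tens of gigabytes, which suggests why disk storage and conventional backup were hard to justify for preservation masters.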

Long-Term Storage Options
Removable media stored offline
  Optical
    CD-R (CD-Recordable)
    DVD-R (DVD-Recordable), DVD+R, DVD+RW, DVD-RW, …
  Tape
    DLT, 8mm, DAT, …
  Pros: cheap, easy, produces tangible item
  Cons: low capacity, physical space requirements, unknown longevity, migration, potential format obsolescence
Online/nearline storage systems
  HSM: Hierarchical Storage Management
    Combine disk and automated tape storage with software to keep track of where files are located
  Locally managed or remote provider
  Pros: high capacity, migration can be handled by software
  Cons: expensive, complex, network bandwidth issues, must trust service provider, potential single point of failure

HSM Example: IU’s Massive Data Storage Service (MDSS)

HPSS (High Performance Storage System) software
  Developed as collaboration of IBM and US national labs
Four tape robots
  2 in Bloomington, 2 in Indianapolis
  Data can be mirrored
540 terabytes (TB) total storage
  ~75 TB used as of April 2001

A digital object is more than just a file!

Metadata

Delivery page image files (JPEG)

Hi-res page image files (TIFF)

Text file (TEI/XML)

A digital object is more than just a file!

EAD Finding Aid

DL Objects
Digital library “objects” have many parts
  Metadata
  Preservation/archival files
  Delivery files
How do we keep them connected?
  Now: Good practice in file naming, directory organization, project documentation - not scalable!
  Future: Digital object repository
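
As an illustration of the "file naming" approach, the sketch below groups the parts of each object by an identifier embedded in the file name. The naming pattern (object ID before the first underscore), the directory names, and the extension-to-role mapping are hypothetical conventions chosen for the example, not ones described in the talk.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the role a file plays in an object.
ROLE_BY_EXTENSION = {".tif": "preservation", ".jpg": "delivery", ".xml": "metadata"}

def group_by_object(paths):
    """Group file paths by the object ID encoded before the first underscore."""
    objects = defaultdict(lambda: defaultdict(list))
    for path in paths:
        p = PurePosixPath(path)
        object_id = p.stem.split("_")[0]
        role = ROLE_BY_EXTENSION.get(p.suffix.lower(), "unknown")
        objects[object_id][role].append(path)
    return objects

# Hypothetical file listing spread across three directories.
files = [
    "masters/book0001_001.tif",
    "masters/book0001_002.tif",
    "web/book0001_001.jpg",
    "web/book0001_002.jpg",
    "text/book0001.xml",
    "masters/book0002_001.tif",
]

objects = group_by_object(files)
for object_id in sorted(objects):
    for role, members in sorted(objects[object_id].items()):
        print(object_id, role, len(members))
```

The grouping only works while every file follows the convention exactly, which is one way to read the "not scalable" caveat.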

Data Persistence
Key is migration
Keeping the bits alive
  Physical media
  Logical media format
Keeping the bits understandable
  File format
  Metadata

Small “pockets” of digital content pose a problem for migration
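
One common way to check that the bits stay alive across a migration is to compare checksums before and after the copy. The slides do not name a technique, so treat the sketch below as an assumption: it uses SHA-256, and the file and directory names are hypothetical stand-ins for old and new storage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def migrate_file(source, destination):
    """Copy source to destination and confirm the copy is bit-identical."""
    expected = sha256_of(source)
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch migrating {source}")
    return expected

# Temporary directories stand in for the old and new storage media.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_media" / "page001.tif"
    new = Path(tmp) / "new_media" / "page001.tif"
    old.parent.mkdir(parents=True)
    old.write_bytes(b"example image bytes")
    checksum = migrate_file(old, new)
    migrated_ok = new.read_bytes() == old.read_bytes()
    print(checksum, migrated_ok)
```

Recording the checksum alongside the object's metadata would let later migrations verify against the same value.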

DL Object Repository

Preservation version in HSM

Delivery version(s) on web server

Metadata records

Repository System

Users and applications
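
The diagram can be read as a data model: one object identifier that resolves to a preservation version, delivery versions, and metadata. The sketch below is a minimal in-memory illustration of that idea; the class names, field names, identifiers, and locations are all hypothetical, and a real repository system would persist this information rather than hold it in a dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    """One logical object and the parts that belong to it (hypothetical model)."""
    identifier: str
    metadata: dict
    preservation_files: list = field(default_factory=list)  # e.g. masters in HSM
    delivery_files: list = field(default_factory=list)      # e.g. files on a web server

class Repository:
    """In-memory stand-in for a repository system."""

    def __init__(self):
        self._objects = {}

    def add(self, obj):
        if obj.identifier in self._objects:
            raise ValueError(f"duplicate identifier: {obj.identifier}")
        self._objects[obj.identifier] = obj

    def files_for(self, identifier, purpose):
        """Return the file locations of one object for a given purpose."""
        obj = self._objects[identifier]
        parts = {"preservation": obj.preservation_files, "delivery": obj.delivery_files}
        if purpose not in parts:
            raise ValueError(f"unknown purpose: {purpose}")
        return parts[purpose]

repo = Repository()
repo.add(DigitalObject(
    identifier="book0001",
    metadata={"title": "Example volume"},
    preservation_files=["hsm:/masters/book0001_001.tif"],
    delivery_files=["https://www.example.org/book0001/001.jpg"],
))

delivery = repo.files_for("book0001", "delivery")
print(delivery)
```

Users and applications ask the repository for an object by identifier and never need to know where each version is stored.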

Web Delivery Functions
Searching
  Metadata
  Full text
Browsing
  By subject, date, author, …
Navigation
  Page turning, image panning/zooming, …
Streaming
  For audio/video
Reuse
  Downloading, format conversion
  Linking, persistent naming
Access control
  If necessary

Digital Collection Delivery Software
Very complex systems
  Need to integrate data from databases, full-text search engines, file systems, and other sources
  Cross-collection searching
Commercial
  ContentDM, Luna Insight, various library management system addons
Open source
  UMich DLXS, Greenstone, Eprints, MIT DSpace, …
Homegrown

Demonstration
Hoagy Carmichael Collection, IU Digital Library Program
http://www.dlib.indiana.edu/collections/hoagy/

Exposing Digital Resources Broadly
Pay services
  RLG Cultural Materials, Archival Resources
Free services
  University of Michigan OAIster
    www.oaister.org
  UIUC Digital Gateway to Cultural Heritage Materials
    oai.grainger.uiuc.edu
OAI-PMH
  Open Archives Initiative Protocol for Metadata Harvesting
  www.openarchives.org
Google

OAI Metadata Harvesting
Extract metadata from various sources
Build services on local copies of metadata

[Diagram: a user searches for “Indiana” at a service provider. The service provider holds a local copy of metadata harvested offline from several data providers; all searching, browsing, etc. is performed on the metadata at the service provider.]
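
The harvesting flow can be sketched in a few lines. OAI-PMH requests are ordinary HTTP requests with a `verb` parameter, and `oai_dc` (unqualified Dublin Core) is the metadata format every data provider must support. In the sketch below the base URL, record identifiers, and titles are made up, and the response is an inline sample rather than a live request, so it should be read as an illustration and not as a complete harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical data provider endpoint; substitute a real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

# Request a harvester would send to fetch records in Dublin Core.
request_url = BASE_URL + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

# Trimmed, made-up sample of what a ListRecords response looks like.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Indiana postcards</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sheet music collection</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_records(xml_text):
    """Extract identifier and title from each record in a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
        })
    return records

# The service provider keeps a local copy and searches it, not the data providers.
local_copy = parse_records(SAMPLE_RESPONSE)
hits = [r for r in local_copy if "indiana" in r["title"].lower()]
print(request_url)
print(hits)
```

A production harvester would also follow `resumptionToken` values to page through large result sets and would harvest incrementally by date.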

More Information

Bibliography to be made available at:
http://www.dlib.indiana.edu/workshops/alioct03/