Digital Preservation at HUL & DRS 2
HMS Countway Library
Andrea Goethals
July 20, 2009
Agenda
1. The problem
2. What are we doing about it?
3. DRS 2
4. Open for questions
1. The problem …
The problem is twofold
1. Keeping the bits safe
2. Keeping the bits useful to people
Keeping the bits safe
- Digital things are amazingly easy to destroy
  - Bad people
  - Software or hardware failure
  - Human mistakes
- Destruction is not always apparent
  - Data not used frequently is at risk of unnoticed damage
  - Some damage is not noticeable to human eyes and ears
Keeping the bits useful to people
- Digital material is fragile
- Humans are dependent on technology to interpret the content...
- Technologies must understand the format of the content
- Technologies age and disappear!
Using information content
[Figure: two layered stacks compared.
 Analog book (unmediated use): information content, language, symbols, HW (paper).
 Digital book (technology-mediated use): information content, formats, bits, SW, HW.]
Formats are key to determining usability
[Figure: layered stack of information content, formats, bits, SW and HW, with the labels "digital content" and "supporting technologies".]
Formats are the bridge between the content we want to preserve and supporting technologies
2. What are we doing about it?
Keeping the bits safe
- Store the bits in multiple copies, in multiple places
- Make sure the bits are not corrupt
- Replace media periodically
- Restrict who can access the bits
- Be able to recover the bits!
Keeping the bits safe at HUL
- 3-4 copies of each file, 2 different media
  - 1-2 (tape and sometimes disk): 60 Oxford Street, Cambridge
  - 1 (disk): Summer Street, Boston
  - 1 (tape): Southborough
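The copy policy on this slide (at least 3 copies, on 2 different media) can be expressed as a small check. This is only an illustrative sketch: the `Copy` class and `meets_copy_policy` function are hypothetical names, not part of any HUL system, and the thresholds are simply the figures quoted on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a file (hypothetical structure for illustration)."""
    location: str
    medium: str  # e.g. "tape" or "disk"


def meets_copy_policy(copies, min_copies=3, min_media=2):
    """Return True if there are enough copies on enough distinct media."""
    media = {copy.medium for copy in copies}
    return len(copies) >= min_copies and len(media) >= min_media


# The layout described on the slide, with the optional second Cambridge copy omitted.
slide_layout = [
    Copy("60 Oxford Street, Cambridge", "tape"),
    Copy("Summer Street, Boston", "disk"),
    Copy("Southborough", "tape"),
]
print(meets_copy_policy(slide_layout))
```

A layout with only two disk copies would fail both conditions, which is the situation the policy is meant to rule out.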
Keeping the bits safe at HUL
- Automated integrity monitoring
  - Drscheck script
    - Compares the MD5 of each file at the Summer Street location to the MD5 stored in a database
    - Also checks the 60 Oxford Street disk copy
  - A copy of each file checked ~every 2 weeks
  - Recent enhancement: Trigger on database update of MD5
- Storage media replaced every 4-5 years
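The integrity monitoring described on this slide amounts to recomputing each file's MD5 and comparing it with the recorded value. The following is a minimal sketch of that idea, not the Drscheck script itself: `md5_of` and `find_corrupt_files` are hypothetical names, and the `expected` dictionary stands in for the database of stored MD5 values.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_files(expected):
    """Return the sorted paths that are missing or whose MD5 differs.

    `expected` maps a file path to the MD5 hex digest recorded for it
    (an assumed stand-in for the database mentioned on the slide).
    """
    problems = []
    for path, recorded in expected.items():
        file_path = Path(path)
        if not file_path.is_file() or md5_of(file_path) != recorded.lower():
            problems.append(str(path))
    return sorted(problems)
```

Reading in chunks keeps memory use flat for large files. A scheduled job could call `find_corrupt_files` over the whole inventory and report any paths it returns.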
Keeping the bits safe at HUL
- Overseen by OIS and UIS IT staff
- Just-in-case plans
  - Disaster recovery
  - Server fail-overs
  - Software failure
  - Tape libraries
  - Fabric switches
  - Lost or damaged tapes
  - Data recovery (corruption)
It’s safe - but is it usable???
- It’s not enough to preserve the bits if the format of the bits is obsolete!
  - WordStar? AppleWorks? Excel 1.0?
- For digital content we are dependent on software that can understand the format…
The importance of format
- Understanding formats is fundamental to preservation
ffd8ffe000104a46494600010201008300830000ffed0fb050686f746f73686f7020332e30003842494d03e90a5072696e7420496e666f000000007800000000004800480000000002f40240ffeeffee030602520347052803fc00020000004800480000000002d80228000100000064000000010003030300000001270f0001000100000000000000000000000060080019019000000000000000000000000000000000000000000000000000000000000000003842494d03ed0a5265736f6c7574696f6e0000000010008313a3000200 ...
SOI, APP0 JFIF 1.2, APP13 IPTC, APP2 ICC, DQT, SOF0 183x512, DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...
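To illustrate how knowledge of the format turns the raw bytes above into the marker list, here is a small sketch that walks the leading JPEG marker segments. It uses only the first 38 bytes of the dump shown on the slide. The function name `describe_leading_segments` and the `MARKER_NAMES` table are introduced here for illustration, and the sketch handles only length-prefixed segments before the scan data, so it is not a full JPEG parser.

```python
import struct

# Standard JPEG marker codes (second byte after 0xFF) for the segments named on the slide.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}


def describe_leading_segments(data: bytes):
    """List the marker segments found at the start of a JPEG byte stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    found = ["SOI"]
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        label = MARKER_NAMES.get(marker, f"0x{marker:02X}")
        if marker == 0xE0 and payload[:5] == b"JFIF\x00" and len(payload) >= 7:
            label += f" JFIF {payload[5]}.{payload[6]}"
        found.append(label)
        pos += 2 + length
    return found


# First 38 bytes of the hex dump from the slide.
prefix = bytes.fromhex(
    "ffd8ffe000104a46494600010201008300830000"
    "ffed0fb050686f746f73686f7020332e3000"
)
print(describe_leading_segments(prefix))
```

Without the format specification the same bytes are just a string of hex digits, which is the point the slide makes.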
Keeping the bits useful to people
- Know what formats you have
- Make sure there’s technology to support the formats!
- Provide ways for people to find it
- Provide ways for curators to manage it
- Keep records of significant events
- Repair, replace
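"Know what formats you have" can start with something as simple as a signature-based inventory. The sketch below identifies a handful of formats from their well-known leading bytes and tallies them. The names `SIGNATURES`, `identify` and `inventory` are hypothetical, and the signature table is deliberately tiny. Dedicated characterization tools go much further than this.

```python
from collections import Counter

# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"PK\x03\x04", "ZIP"),
]


def identify(header: bytes) -> str:
    """Guess a format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def inventory(headers):
    """Count how many of the given file headers fall into each format."""
    return Counter(identify(header) for header in headers)
```

Files that come back as "unknown" are the ones that need a closer look, since a repository cannot plan for formats it has not identified.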
Can we approach the problem differently?
- In a way that’s more proactive?
- And more efficient?
- And less expensive?
Yes…
The content production matters!
- The least expensive and most effective preservation measure is to think about the future when digital content is created!
- It makes good sense to try to influence the content creation process
Preservation lifecycle
- Create digital content
- Ingest into a preservation repository
- Continuous cycle of:
  - Monitoring
  - Planning
  - Intervention
- Subject to collection management decisions
- Transfer to next generation of the repository or to a different repository
Keeping the bits useful to people at HUL
- Guidelines
  - More ‘preservable’ file formats: standard, well-understood, well-supported, open
  - Recommended supplementary documentation (metadata)
- Tools
  - FITS, JHOVE: check quality of files, automated metadata extraction
- Staff available to consult
Keeping the bits useful to people at HUL
- Collection management applications
- Discoverable content
  - Catalogs
  - Persistent names
  - Search engines
- Extensive metadata
  - Administrative, Technical, Structural, Provenance
- Suite of delivery applications…
Keeping the bits useful to people at HUL
- Suite of delivery services
  - Delivery applications created and maintained at OIS
    - IDS, PDS, SDS, ADS, FTS
  - Third party middle-ware maintained at OIS
    - RealServer, Luratech JPEG 2000 Server
  - Third party rendering applications on users’ desktops
    - Web browsers, RealAudio Players, TIFF viewers, ZIP utilities
Involvement in broader preservation community efforts
- E-journal archiving
- Technical metadata
  - Still images, audio, documents
- METS (package for metadata and digital objects)
- PDF/A
- PREMIS (preservation metadata)
- AIHT (repository interaction demonstration)
- Registry of digital masters
- Repository certification
- Formats registry (UDFR)
3. DRS 2 …
DRS 2 changes
Why?
1. To better support digital preservation
2. To better support needs of DRS depositors, curators and collection managers
DRS 2 changes
1. New conceptual foundation
   - Objects, content models
2. User improvements
   - Opaque objects, new file formats, tools, guidance
3. A new approach to metadata
4. Increased preservation planning and activities
Objects
- Currently only a file level in the DRS
  - All management has to be done at the individual file level
- Objects are aggregations of files
  - Page-turned object
  - Still image object
- More intuitive unit for management, reporting and searching
  - Example: How many Page-turned objects do I have in the DRS?
Content models
- Types of objects
- Example: audio content model
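The object and content model ideas can be pictured as a small data structure: an object aggregates files and carries a content model, which makes questions such as "how many page-turned objects do I have?" a one-line count. This is a hedged sketch only. `DRSFile`, `DRSObject`, `count_by_content_model` and the sample identifiers are hypothetical and do not reflect the actual DRS 2 data model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DRSFile:
    """A single stored file (hypothetical structure for illustration)."""
    name: str
    file_format: str


@dataclass
class DRSObject:
    """An aggregation of files, typed by a content model."""
    object_id: str
    content_model: str
    files: list = field(default_factory=list)


def count_by_content_model(objects):
    """Count objects per content model instead of counting individual files."""
    return Counter(obj.content_model for obj in objects)


# Made-up sample data.
objects = [
    DRSObject("obj-1", "page-turned", [DRSFile("p1.jp2", "JPEG 2000"), DRSFile("p2.jp2", "JPEG 2000")]),
    DRSObject("obj-2", "page-turned", [DRSFile("p1.tif", "TIFF")]),
    DRSObject("obj-3", "still image", [DRSFile("photo.jpg", "JPEG")]),
]
print(count_by_content_model(objects)["page-turned"])
```

With only a file level, the same question would require inferring groupings from file names or external records.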
Support for opaque objects
- A special content model
- Allows files in any format
- Digital equivalent of buying time at HD
- Content can be minimally processed, or can be fully processed by depositors but not yet supported by the DRS
- Must be intended for long-term preservation
- Will receive some preservation services
- Will be on a path to fuller DRS preservation
Support for new file formats
- PDF
- Audio
  - MP3, MP4/AAC
- Drawings
  - AutoCAD
  - Adobe Illustrator
- Video
- What’s next?
Deposit, management & delivery tools
- Enhanced Batch Builder
  - Integrated with File Information Tool Set (FITS)
- Enhanced DRS Web Admin
  - Better searching
  - Richer management and reporting
  - Ability to perform batch updates
- File Delivery Service (FDS)
  - Created for PDF delivery
  - Delivers a file to user’s web browser
Future of http://hul.harvard.edu/ois/
Guidance & user community
- New website for digital preservation
  - Formats central
  - Content models
  - DRS practices
  - HUL digital preservation projects
  - Emerging standards and best practices
  - Tools, services, registries
  - Resources & Experts
A new approach to metadata
- Moving towards community-standard schemas
  - PREMIS, MODS, MIX, textMD, etc.
- Metadata files on the file system alongside content files
  - “object descriptor files”
  - Preservation, rights, descriptive metadata
- More reliance on embedded metadata
  - Automatic extraction at deposit time by FITS
  - Third party delivery applications are becoming aware of file-embedded metadata
Increased preservation planning and activities
- More granular format identification
- Sub-file characterization
- Preservation plans per content model
- Digital first aid (content & metadata)
  - “Localization,” migrations, normalizations
- Technology watch
- Virus checking
4. Open questions …