Digital Preservation at HUL & DRS 2 HMS Countway Library Andrea Goethals July 20, 2009

Page 1: Digital Preservation at HUL & DRS 2

Digital Preservation at HUL & DRS 2

HMS Countway Library
Andrea Goethals

July 20, 2009

Page 2: Digital Preservation at HUL & DRS 2

Agenda
1. The problem
2. What are we doing about it?
3. DRS 2
4. Open for questions

Page 3: Digital Preservation at HUL & DRS 2

1. The problem …

Page 4: Digital Preservation at HUL & DRS 2

The problem is twofold
1. Keeping the bits safe
2. Keeping the bits useful to people

Page 5: Digital Preservation at HUL & DRS 2

Keeping the bits safe
- Digital things are amazingly easy to destroy
  - Bad people
  - Software or hardware failure
  - Human mistakes
- Destruction is not always apparent
  - Data not used frequently is at risk of unnoticed damage
  - Some damage is not noticeable to human eyes and ears

Page 6: Digital Preservation at HUL & DRS 2

Keeping the bits useful to people
- Digital material is fragile
- Humans are dependent on technology to interpret the content...
- Technologies must understand the format of the content
- Technologies age and disappear!

Page 7: Digital Preservation at HUL & DRS 2

Using information content

[Diagram comparing two stacks. Analog book (unmediated use): layers labeled information content, language, symbols, HW (paper). Digital book (technology-mediated use): layers labeled information content, formats, bits, SW, HW.]

Page 8: Digital Preservation at HUL & DRS 2

Formats are key to determining usability

[Diagram: stack of information content, bits, formats, SW, HW, with the groupings "digital content" and "supporting technologies" marked.]

Formats are the bridge between the content we want to preserve and supporting technologies

Page 9: Digital Preservation at HUL & DRS 2

2. What are we doing about it?

Page 10: Digital Preservation at HUL & DRS 2

Keeping the bits safe
- Store the bits in multiple copies, in multiple places
- Make sure the bits are not corrupt
- Replace media periodically
- Restrict who can access the bits
- Be able to recover the bits!

Page 11: Digital Preservation at HUL & DRS 2

Keeping the bits safe at HUL
- 3-4 copies of each file, 2 different media
  - 1-2 (tape and sometimes disk): 60 Oxford Street, Cambridge
  - 1 (disk): Summer Street, Boston
  - 1 (tape): Southborough

Page 12: Digital Preservation at HUL & DRS 2

Keeping the bits safe at HUL
- Automated integrity monitoring
  - Drscheck script
    - Compares the MD5 of each file at the Summer Street location to the MD5 stored in a database
    - Also checks the 60 Oxford Street disk copy
  - A copy of each file checked ~every 2 weeks
  - Recent enhancement: Trigger on database update of MD5
- Storage media replaced every 4-5 years
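The integrity check described on this slide can be illustrated with a minimal Python sketch. This is not the Drscheck script itself, whose implementation is not shown in the talk. The function names, the plain dictionary standing in for the database of stored MD5 values, and the directory layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(recorded: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing or whose MD5 differs.

    `recorded` is a hypothetical stand-in for the database of stored
    MD5 values: it maps a relative file path to the expected digest.
    """
    failures = []
    for relative_path, expected in recorded.items():
        candidate = root / relative_path
        if not candidate.is_file() or md5_of(candidate) != expected:
            failures.append(relative_path)
    return failures
```

Running `find_mismatches` on a schedule against each storage copy would flag both silent corruption and missing files. Repairing a flagged file from another copy is a separate step that this sketch does not cover.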

Page 13: Digital Preservation at HUL & DRS 2

Keeping the bits safe at HUL
- Overseen by OIS and UIS IT staff
- Just-in-case plans
  - Disaster recovery
  - Server fail-overs
  - Software failure
  - Tape libraries
  - Fabric switches
  - Lost or damaged tapes
  - Data recovery (corruption)

Page 14: Digital Preservation at HUL & DRS 2

It’s safe - but is it usable???
- It’s not enough to preserve the bits if the format of the bits is obsolete!
  - WordStar? AppleWorks? Excel 1.0?
- For digital content we are dependent on software that can understand the format…

Page 15: Digital Preservation at HUL & DRS 2

The importance of format
- Understanding formats is fundamental to preservation

ffd8ffe000104a46494600010201008300830000ffed0fb050686f746f73686f7020332e30003842494d03e90a5072696e7420496e666f000000007800000000004800480000000002f40240ffeeffee030602520347052803fc00020000004800480000000002d80228000100000064000000010003030300000001270f0001000100000000000000000000000060080019019000000000000000000000000000000000000000000000000000000000000000003842494d03ed0a5265736f6c7574696f6e0000000010008313a3000200 ...

Page 16: Digital Preservation at HUL & DRS 2

The importance of format
- Understanding formats is fundamental to preservation

ffd8ffe000104a46494600010201008300830000ffed0fb050686f746f73686f7020332e30003842494d03e90a5072696e7420496e666f000000007800000000004800480000000002f40240ffeeffee030602520347052803fc00020000004800480000000002d80228000100000064000000010003030300000001270f0001000100000000000000000000000060080019019000000000000000000000000000000000000000000000000000000000000000003842494d03ed0a5265736f6c7574696f6e0000000010008313a3000200 ...

SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...
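To make the step from raw bytes to named segments concrete, here is a small Python sketch that walks the marker segments at the start of a JPEG stream. It is an illustration only, not the tool used to produce this slide. The function name and the short marker table are assumptions, and the sample input is just the first 24 bytes of the hex dump above.

```python
import struct

# Names for a few common JPEG markers (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_segments(data: bytes) -> list[tuple[str, int]]:
    """Return (marker name, declared length) pairs for header segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    segments = [("SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not positioned at a marker; stop rather than guess
        marker = data[pos + 1]
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((MARKER_NAMES.get(marker, f"0x{marker:02X}"), length))
        if marker == 0xDA:
            break  # entropy-coded image data follows SOS
        pos += 2 + length
    return segments


header = bytes.fromhex("ffd8ffe000104a46494600010201008300830000ffed0fb0")
print(list_segments(header))
# [('SOI', 0), ('APP0', 16), ('APP13', 4016)]
```

Without knowledge of the JPEG marker layout, the same 24 bytes are only an opaque run of hexadecimal digits, which is the point the slide makes.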

Page 18: Digital Preservation at HUL & DRS 2

Keeping the bits useful to people
- Know what formats you have
- Make sure there’s technology to support the formats!
- Provide ways for people to find it
- Provide ways for curators to manage it
- Keep records of significant events
- Repair, replace
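As a rough illustration of "know what formats you have", the sketch below guesses a format from a file's leading bytes. The signature table and function name are assumptions for illustration. Production identification tools consult much larger signature registries and validate far more than the first few bytes.

```python
# Hypothetical, deliberately tiny signature table: (leading bytes, label).
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(first_bytes: bytes) -> str:
    """Guess a format from the first bytes of a file, or return 'unknown'."""
    for magic, name in SIGNATURES:
        if first_bytes.startswith(magic):
            return name
    return "unknown"


print(identify(bytes.fromhex("ffd8ffe000104a464946")))  # JPEG
print(identify(b"%PDF-1.4\n"))                          # PDF
```

A label of "unknown" is itself useful to a curator: it marks content whose usability cannot yet be assessed.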

Page 19: Digital Preservation at HUL & DRS 2

Can we approach the problem differently?
- In a way that’s more proactive?
- And more efficient?
- And less expensive?

Yes…

Page 20: Digital Preservation at HUL & DRS 2

The content production matters!
- The least expensive, and most effective preservation measure is to think about the future when digital content is created!
- It makes good sense to try to influence the content creation process

Page 21: Digital Preservation at HUL & DRS 2

Preservation lifecycle
- Create digital content
- Ingest into a preservation repository
- Continuous cycle of:
  - Monitoring
  - Planning
  - Intervention
- Subject to collection management decisions
- Transfer to next generation of the repository or to a different repository

Page 22: Digital Preservation at HUL & DRS 2

Keeping the bits useful to people at HUL
- Guidelines
  - More ‘preservable’ file formats: standard, well-understood, well-supported, open
  - Recommended supplementary documentation (metadata)
- Tools
  - FITS, JHOVE: check quality of files, automated metadata extraction
- Staff available to consult

Page 23: Digital Preservation at HUL & DRS 2

Keeping the bits useful to people at HUL
- Collection management applications
- Discoverable content
  - Catalogs
  - Persistent names
  - Search engines
- Extensive metadata
  - Administrative, Technical, Structural, Provenance
- Suite of delivery applications…

Page 24: Digital Preservation at HUL & DRS 2

Keeping the bits useful to people at HUL
- Suite of delivery services
  - Delivery applications created and maintained at OIS
    - IDS, PDS, SDS, ADS, FTS
  - Third party middle-ware maintained at OIS
    - RealServer, Luratech JPEG 2000 Server
  - Third party rendering applications on users’ desktops
    - Web browsers, RealAudio Players, TIFF viewers, ZIP utilities

Page 25: Digital Preservation at HUL & DRS 2

Involvement in broader preservation community efforts
- E-journal archiving
- Technical metadata
  - Still images, audio, documents
- METS (package for metadata and digital objects)
- PDF/A
- PREMIS (preservation metadata)
- AIHT (repository interaction demonstration)
- Registry of digital masters
- Repository certification
- Formats registry (UDFR)

Page 26: Digital Preservation at HUL & DRS 2

3. DRS 2 …

Page 27: Digital Preservation at HUL & DRS 2

DRS 2 changes
Why?
1. To better support digital preservation
2. To better support needs of DRS depositors, curators and collection managers

Page 28: Digital Preservation at HUL & DRS 2

DRS 2 changes
1. New conceptual foundation
   - Objects, content models
2. User improvements
   - Opaque objects, new file formats, tools, guidance
3. A new approach to metadata
4. Increased preservation planning and activities

Page 29: Digital Preservation at HUL & DRS 2

Objects
- Currently only a file level in the DRS
  - All management has to be done at the individual file level
- Objects are aggregations of files
  - Page-turned object
  - Still image object
- More intuitive unit for management, reporting and searching
  - Example: How many Page-turned objects do I have in the DRS?
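The shift from file-level to object-level management can be sketched as a small data model. The class names, field names, and content model labels below are hypothetical and are not the DRS 2 schema. The sketch only shows why a question such as "how many page-turned objects do I have?" becomes a simple query once files are grouped into objects.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DrsFile:
    name: str
    file_format: str


@dataclass
class DrsObject:
    """An aggregation of files that is managed as one unit."""
    object_id: str
    content_model: str  # hypothetical labels, e.g. "page-turned"
    files: list[DrsFile] = field(default_factory=list)


def count_by_content_model(objects: list[DrsObject]) -> Counter:
    """Count objects per content model, regardless of how many files each has."""
    return Counter(obj.content_model for obj in objects)


holdings = [
    DrsObject("obj-1", "page-turned",
              [DrsFile("p1.tif", "TIFF"), DrsFile("p2.tif", "TIFF")]),
    DrsObject("obj-2", "page-turned", [DrsFile("p1.jp2", "JPEG 2000")]),
    DrsObject("obj-3", "still image", [DrsFile("photo.jp2", "JPEG 2000")]),
]
print(count_by_content_model(holdings)["page-turned"])  # 2
```

With only a file level, the same question would require inferring groupings from four separate file records.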

Page 30: Digital Preservation at HUL & DRS 2

Content models
- Types of objects
- Example: audio content model

Page 31: Digital Preservation at HUL & DRS 2

Support for opaque objects
- A special content model
- Allows files in any format
- Digital equivalent of buying time at HD
- Content can be minimally processed, or can be fully processed by depositors but not yet supported by the DRS
- Must be intended for long-term preservation
- Will receive some preservation services
- Will be on a path to fuller DRS preservation

Page 32: Digital Preservation at HUL & DRS 2

Support for new file formats
- PDF
- Audio
  - MP3, MP4/AAC
- Drawings
  - AutoCAD
  - Adobe Illustrator
- Video
- What’s next?

Page 33: Digital Preservation at HUL & DRS 2

Deposit, management & delivery tools
- Enhanced Batch Builder
  - Integrated with File Information Tool Set (FITS)
- Enhanced DRS Web Admin
  - Better searching
  - Richer management and reporting
  - Ability to perform batch updates
- File Delivery Service (FDS)
  - Created for PDF delivery
  - Delivers a file to user’s web browser

Page 34: Digital Preservation at HUL & DRS 2

Future of http://hul.harvard.edu/ois/

Page 35: Digital Preservation at HUL & DRS 2

Guidance & user community
- New website for digital preservation
  - Formats central
  - Content models
  - DRS practices
  - HUL digital preservation projects
  - Emerging standards and best practices
  - Tools, services, registries
  - Resources & Experts

Page 36: Digital Preservation at HUL & DRS 2

A new approach to metadata
- Moving towards community-standard schemas
  - PREMIS, MODS, MIX, textMD, etc.
- Metadata files on the file system alongside content files
  - “object descriptor files”
  - Preservation, rights, descriptive metadata
- More reliance on embedded metadata
  - Automatic extraction at deposit time by FITS
  - Third party delivery applications are becoming aware of file-embedded metadata

Page 37: Digital Preservation at HUL & DRS 2

Increased preservation planning and activities
- More granular format identification
- Sub-file characterization
- Preservation plans per content model
- Digital first aid (content & metadata)
  - “Localization,” migrations, normalizations
- Technology watch
- Virus checking

Page 38: Digital Preservation at HUL & DRS 2

4. Open questions …