Post on 23-Dec-2015


A Preservation Repository in Prose
Being a Story of the DRS Past, Present and Future

By Andrea Goethals, Wendy Gogel
In Cambridge, Massachusetts 2009

Today’s Agenda

DRS 1: Being a Story of the Past

A Transition: Being a Story of the Present

DRS 2 and You!: Being a Story of the Future

Questions?

DRS 1: Being a Story of the Past

1997-2007

The Formative years - LDI

• November 1997 Proposal for the Library Digital Initiative

“…create the first-generation technical infrastructure to support storage of and access to digital library materials.”

• In July 1998, LDI was approved and funded

• In December 1998, planning for DRS began

Digital Repository Service (DRS)

• provides a set of professionally managed services to ensure the usability of securely stored digital objects over time.
• is both a preservation and an access repository
• includes the bundled delivery services

October 2000 Launch

LDI Grant projects

49 Grants were awarded 1999-2006

• Digitizing analog collections
  • Images
  • Text
  • Audio
  • Music scores
• Born Digital
  • Biomedical images
  • Geospatial data
  • Web sites
• Online cataloging projects

Digitizing Facilities

June 1999
• Harvard College Library Imaging Services

2001 - 2002
• HCL Fine Arts Library Digital Imaging Lab (FAL DIL)
• Harvard Art Museum Digital Imaging and Visual Resources (DIVR)
• Harvard College Library Audio Preservation Services (HCL APS)
• Peabody Museum of Archaeology and Ethnology

The first Deposit

• and the first object was deposited on October 23, 2000…

w/ Metadata
• Administrative
  • Stewardship, contacts (e.g., HCL Harvard-Yenching Library, Ray Lum, etc.)
  • Billing account (e.g., 33-digit account number)
  • Access flag (e.g., open to the public, restricted to the Harvard community, no access)
• Technical
  • Physical characteristics (e.g., for images, x and y resolution, MD5 signature, pixel width and height, compression, bit sample rate, etc.)
  • Production methods (e.g., for images, Scitex; Leaf Volare; Leaf Colorshop 5.x)
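The technical metadata listed above includes an MD5 signature for each file. As a rough illustration only (this is not the DRS's own code; the function name `technical_metadata` and the record layout are assumptions made for this sketch), a file's size and MD5 digest can be computed with Python's standard library like this:

```python
import hashlib
import os
import tempfile


def technical_metadata(path, chunk_size=65536):
    """Return the size in bytes and the MD5 hex digest of the file at `path`.

    Hypothetical helper: the field names are illustrative, not a DRS schema.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large master files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return {"size_bytes": os.path.getsize(path), "md5": md5.hexdigest()}


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_record = technical_metadata(demo_path)
os.remove(demo_path)
```

Recomputing the digest later and comparing it with the stored value is one common way to detect that a stored file has changed.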

The first Book was deposited on June 29, 2001

The first Audio was deposited on January 28, 2003

• Matins for Sunday after the Elevation of the Holy Cross

• Laura Boulton (1899-1980) Collection of Byzantine and Orthodox Musics, Archive of World Music

• One of a series of Byzantine hymns and liturgies recorded in a monastery on Patmos, 1960.

• Logbook (Part I, p. 1-10)

The first georeferenced map was deposited on January 14, 2005

• Barnstable, Massachusetts 15 Minute Digital Raster Graphic

• From an 1893 Historic USGS map reprinted in 1907

Systems and Services

1985
• HOLLIS – our OPAC

1998 - 1999
• VIA Visual Information Access – union catalog
• OASIS Online Archival Search Information System – union catalog

1999-2000
• OLIVIA – image cataloging tool

Systems and Services

2000-2001
• DRS Digital Repository Service – preservation and access repository
• NRS Name Resolution Service – to resolve persistent identifiers
• AMS Access Management Service – to provide access controls
• IDS Image Delivery Service
• PDS Page Delivery Service
• FTS Full-text Search Service
• NRS Web Admin
• Policy Web Admin

Systems and Services

2001-2002
• DRS Web Admin – staff interface to DRS
• PDS Maint
• Harvard Geospatial Library – union catalog

2002-2003
• TED TEmplated Database – collection building tool
• SDS Streaming Delivery Service – for audio delivery
• ADS Asynchronous Delivery – for large files
• Cross-catalog search – for federated searching

Systems and Services

2003-2004
• Dynamic IDS – for zoom and pan features w/ JP2
• DMART – Audio deposit tool

2004-2005
• RList – Course reserves tool

2005-2006
• Virtual Collections

2006 - 2007
• Batch Builder

2008 - 2009
• Google data loading
• WAX

A Transition: Being a Story of the Present

2008-2009

2008: new DRS storage system
• New servers, new storage arrays, new tape library, new storage software
• Increased storage capacity
• Less complex - DRS loader doesn’t need to know the details of storage system anymore
• Higher availability for deliverable content
• Copies stored in 3 different geographic locations
• 3 “low use” copies, 4 “high use” copies

Cumulative file count per format type

[Charts, 2000-2008: Image, Text and Container files on an axis of 0 to 9,000,000; Audio files on an axis of 0 to 20,000]

Annual file size per Harvard unit (GB)

[Chart, 2000-2008, axis 0 to 14,000 GB. Legend: Arn. Arb., Divinity, FAS Mus/Spec. Libs, HCL, GSD, GSE, HBS, CHS, Law, Countway, HAM, HU Archives, KSG, Radcliffe. Labeled on the chart: HCL, Art Museums]

Cumulative non-Google file size per use (GB)

• April 2009: 45,742 GB

[Chart, Oct-00 to Feb-09, axis 0 to 50,000 GB. Series: Low use, High use]

Cumulative file size (GB)

• April 2009: 105,652 GB

[Chart, Oct-00 to Feb-09, axis 0 to 120,000 GB. Series: Google, Non-Google]
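The two April 2009 totals on the size charts can be combined to estimate how much of the repository was Google-scanned content. The short calculation below is only a sketch: it assumes both figures measure the same thing (cumulative stored file size in GB) and that the overall total is simply Google plus non-Google content, which the slides imply but do not state outright.

```python
# April 2009 figures taken from the two charts.
total_gb = 105_652        # cumulative file size, all content
non_google_gb = 45_742    # cumulative file size, non-Google content

# Assumption: total = Google + non-Google.
google_gb = total_gb - non_google_gb
google_share = 100 * google_gb / total_gb

print(f"Google content: {google_gb:,} GB ({google_share:.1f}% of total)")
```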

DIY -- http://hul.harvard.edu/ois/reporting/

2008: new program, new position

• HUL takes next step in its commitment to digital preservation and establishes:
  1. Digital Preservation and Repository Manager Position
     • March 2008
     • Andrea Goethals
  2. Digital Preservation Program
     • June 2008
     • Established within OIS

2008/9 priorities of new digital preservation program

1. Define additional infrastructure requirements to support digital preservation
   • DRS enhancements
   • Global digital format registry (GDFR)
2. Identify and analyze new formats for the DRS to support
   • PDF, email, audio, architectural drawings, etc.
3. Establish communication network with the 2 communities we inhabit
   • Broader digital preservation community
   • Harvard community

Avenues of communication

• Broader digital preservation community
  • Conferences and meetings
  • Collaborative projects
  • Email conversations, blogs, newsgroups
• Harvard community
  • Committees (ULC, CCCC, DMCC, DCSWG, etc.)
  • Digital project librarians
  • Ad-hoc focus groups, meetings and email with stakeholders (depositors, curators and collection managers)
  • Customer surveys

These communities inform our thinking about:

• Concepts and terms
• Metadata
• Data models
• Content
• Recommended & supported formats
• Best practices
• Preservation planning and actions
• Storage, management and monitoring
• Certifications
• Registries
• Tools and services

DRS customer survey 2008

• August - September 2008
• Users of DRS tools or services
• To evaluate the level of satisfaction with DRS tools, services, and websites
• To understand any unmet needs

Survey findings

• Question 1: What word or phrase best describes the DRS?

• In general the DRS is valued for its preservation services and perceived as stable, secure and trusted.

Other key findings of survey

DRS Customers want:
• Support for more formats
• Guidance on preservation formats and content creation
• Better search and editing management tools
• Delivery services that use common or popular third-party applications

Trends in DRS customer needs

1. Problem of abundance
2. Remote creators
3. Diversity of formats

1. Problem of abundance

DRS owners and depositors:

• Are increasingly overwhelmed by the amount of digital content to preserve

• Can’t fully process the material they want to deposit into the DRS

• Can’t go through a deposit process that is time-consuming

2. Remote creators

• Increasingly DRS owners and depositors are acquiring content they did not create

• DRS staff cannot influence the formats or technical properties of this content during creation

3. Diversity of formats

• DRS owners and depositors increasingly need to preserve formats and genres that aren’t currently supported by the DRS
  • CAD formats
  • Spreadsheet formats
  • 3D visualization formats
  • Presentation formats
  • Additional audio formats
  • Databases
  • Video formats
  • Locally archived websites
  • Executable file formats
  • Raw survey data
  • Word processing formats
  • Raw camera files

Implications of these trends

The DRS needs to:
• accept and preserve minimally processed content
• provide a time-efficient deposit process
• support a broad range of formats and genres

And:
• can’t rely on the content being in “preservable” formats prior to deposit into the DRS

DRS 2 and You!: Being a Story of the Future

2009 -

DRS 2 changes

Why?
1. To better support digital preservation
2. To better support needs of DRS depositors, curators and collection managers

DRS 2 changes

1. New conceptual foundation
   • Objects
   • Content models
2. User improvements
   • Support for opaque objects
   • Support for new file formats
   • Deposit, management & delivery tools
   • Guidance & user community
3. A new approach to metadata
4. Increased preservation planning and activities

Objects

• Currently only a file level in the DRS
  • All management has to be done at the individual file level
• Objects are aggregations of files
  • Page-turned object
  • Still image object
• More intuitive unit for management, reporting and searching
  • Example: How many Page-turned objects do I have in the DRS?
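The idea of objects as aggregations of files, each with a content model, can be sketched as a small data structure. Everything named below (`DrsFile`, `DrsObject`, `count_by_content_model`, the sample identifiers and sizes) is hypothetical and invented for illustration; it is not the DRS 2 data model itself. It only shows why an object level makes a question like "How many Page-turned objects do I have?" easy to answer.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DrsFile:
    """A single stored file (hypothetical fields)."""
    name: str
    size_bytes: int


@dataclass
class DrsObject:
    """An aggregation of files that is managed as one unit."""
    object_id: str
    content_model: str
    files: list = field(default_factory=list)

    def total_size(self):
        # Report at the object level instead of file by file.
        return sum(f.size_bytes for f in self.files)


def count_by_content_model(objects):
    """Count objects per content model."""
    return Counter(obj.content_model for obj in objects)


# Made-up sample holdings.
holdings = [
    DrsObject("obj-1", "page-turned", [DrsFile("p1.tif", 100), DrsFile("p2.tif", 120)]),
    DrsObject("obj-2", "still image", [DrsFile("img.jp2", 300)]),
    DrsObject("obj-3", "page-turned", [DrsFile("p1.tif", 90)]),
]
counts = count_by_content_model(holdings)
```

With only a file level, the same question would require grouping individual files by some naming convention or external record first.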

Content models

• Types of objects
  • Example: audio content model

Support for opaque objects

• A special content model
• Allows files in any format
• The digital equivalent of buying time at HD
• Content can be minimally processed
• Must be intended for long-term preservation
• The content could be fully processed by depositors but not supported yet by DRS
• Will receive some preservation services
• Will be on a path to fuller DRS preservation

Support for new file formats

• PDF
• Audio
  • MP3, MP4/AAC
• Drawings
  • AutoCAD
  • Adobe Illustrator
• Video
• What’s next?

Deposit, management & delivery tools

• Enhanced Batch Builder
  • Integrated with File Information Tool Set (FITS)
• Enhanced DRS Web Admin
  • Better searching
  • Richer management and reporting
  • Ability to perform batch updates
• File Delivery Service (FDS)
  • Created for PDF delivery
  • Delivers a file to user’s web browser

Future of http://hul.harvard.edu/ois/

Guidance & user community

New website for digital preservation

• Formats central
• Content models
• DRS practices
• HUL digital preservation projects
• Emerging standards and best practices
• Tools, services, registries
• Resources & Experts

A new approach to metadata

• Moving towards community-standard schemas
  • PREMIS, MODS, MIX, textMD, etc.
• Metadata files on the file system alongside content files
  • “object descriptors”
  • Preservation, rights, descriptive metadata
• More reliance on embedded metadata
  • Automatic extraction at deposit time by FITS
  • Third party delivery applications are becoming aware of file-embedded metadata

Increased preservation planning and activities

• More granular format identification
• Sub-file characterization
• Preservation plans per content model
• Digital first aid (content & metadata)
• “Localization,” migrations, normalizations
• Technology watch
• Virus checking
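Format identification, the first item on the list above, is often done by inspecting a file's leading bytes rather than trusting its extension. The sketch below is a simplified stand-in, not how FITS or the DRS works internally. The function name `identify_format` and the returned labels are assumptions. The byte signatures are the published ones for each format. Reading the version out of a PDF header hints at what "more granular" identification means.

```python
def identify_format(data):
    """Guess a format label from the leading bytes of a file (illustrative only)."""
    if data.startswith(b"%PDF-"):
        # PDF headers look like "%PDF-1.4"; keep the version for finer granularity.
        version = data[5:8].decode("ascii", errors="replace")
        return f"PDF {version}"
    if data.startswith(b"II*\x00") or data.startswith(b"MM\x00*"):
        return "TIFF"  # little-endian or big-endian byte order
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data.startswith(b"\x00\x00\x00\x0cjP  \r\n\x87\n"):
        return "JPEG 2000 (JP2)"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAVE audio"
    return "unknown"


# Usage on in-memory byte strings standing in for file headers.
pdf_label = identify_format(b"%PDF-1.4\n%...")
tiff_label = identify_format(b"MM\x00*\x00\x00\x00\x08")
```

A file that falls through to "unknown" is the kind of content the opaque-object model described earlier is meant to accommodate.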

DRS 2 process

• Phases of work
  • DRS 2.1, 2.2, 2.3, etc.
• Themed phases
  • DRS 2.1: “Object Security and Integrity”
  • DRS 2.2: “Management and Monitoring”
• Includes support for new formats
  • DRS 2.1: PDFs, opaque objects
  • DRS 2.2: more audio formats (MP3, MP4/AAC)

http://hul.harvard.edu/ois/systems/drs/enhancements.html

Questions?

Image credits
• Future ghost
  • http://www.animationartgallery.com/images/OSC/OSCJL2.gif
• Marley’s ghost
  • http://cueballcol.files.wordpress.com/2007/12/435px-a_christmas_carol_-_marley27s_ghost.jpg
• Ghost of the past
  • https://www.1st-art-gallery.com/thumbnail/202533/1/Scrooge-And-The-Ghost-Of-Marley,-From-Dickens-A-Christmas-Carol.jpg
• Ignorance and want
  • http://doxoblogy.files.wordpress.com/2007/03/a_christmas_carol_02.jpg
• Weight of wikipedia
  • http://images.theage.com.au/ftage/ffximage/2008/05/26/300_wikipedia1.jpg
• Lots of people
  • http://repairstemcell.files.wordpress.com/2009/02/lotsa-people.jpg
• Ghost of the future
  • http://www.ibiblio.org/ebooks/Dickens/Carol/4.jpg
• Mr. Magoo
  • http://www.affordablehousinginstitute.org/blogs/us/Magoo_christmas_future_small.jpg
