A Preservation Repository in Prose
Being a Story of the DRS Past, Present and Future
By Andrea Goethals, Wendy Gogel
In Cambridge, Massachusetts 2009
Today’s Agenda
DRS 1: Being a Story of the Past
A Transition: Being a Story of the Present
DRS 2 and You!: Being a Story of the Future
Questions?
DRS 1: Being a Story of the Past
1997-2007
The Formative years - LDI
• November 1997 Proposal for the Library Digital Initiative
“…create the first-generation technical infrastructure to support storage of and access to digital library materials.”
• In July 1998, LDI was approved and funded
• In December 1998, planning for DRS began
Digital Repository Service (DRS)
• provides a set of professionally managed services to ensure the usability of securely stored digital objects over time
• is both a preservation and an access repository
• includes the bundled delivery services
October 2000 Launch
LDI Grant projects
49 Grants were awarded 1999-2006
• Digitizing analog collections
  • Images
  • Text
  • Audio
  • Music scores
• Born Digital
  • Biomedical images
  • Geospatial data
  • Web sites
• Online cataloging projects
Digitizing Facilities
June 1999
• Harvard College Library Imaging Services
2001 - 2002
• HCL Fine Arts Library Digital Imaging Lab (FAL DIL)
• Harvard Art Museum Digital Imaging and Visual Resources (DIVR)
• Harvard College Library Audio Preservation Services (HCL APS)
• Peabody Museum of Archaeology and Ethnology
The first Deposit
• The first object was deposited on October 23, 2000…
…with metadata
• Administrative
  • Stewardship, contacts (e.g., HCL Harvard-Yenching Library, Ray Lum, etc.)
  • Billing account (e.g., 33-digit account number)
  • Access flag (e.g., open to the public, restricted to the Harvard community, no access)
• Technical
  • Physical characteristics (e.g., for images: x and y resolution, MD5 signature, pixel width and height, compression, bit sample rate, etc.)
  • Production methods (e.g., for images: Scitex; Leaf Volare; Leaf Colorshop 5.x)
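The technical metadata listed above includes an MD5 signature. As a rough illustration of how such a signature might be computed for a file at deposit time, here is a minimal Python sketch. The names `md5_signature` and `technical_metadata`, the chunk size and the record layout are assumptions made for illustration; the slides do not describe how the DRS actually computes or stores this value.

```python
import hashlib
from pathlib import Path


def md5_signature(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large image or audio files
        # never need to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def technical_metadata(path):
    """Build a hypothetical technical-metadata record for one file."""
    path = Path(path)
    return {
        "file_name": path.name,
        "size_bytes": path.stat().st_size,
        "md5": md5_signature(path),
    }
```

Recomputing the signature later and comparing it with the stored value is one common way to check that a stored copy has not changed.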
The first Book was deposited on June 29, 2001
The first Audio was deposited on January 28, 2003
• Matins for Sunday after the Elevation of the Holy Cross
• Laura Boulton (1899-1980) Collection of Byzantine and Orthodox Musics, Archive of World Music
• One of a series of Byzantine hymns and liturgies recorded in a monastery on Patmos, 1960.
• Logbook (Part I, p. 1-10)
The first georeferenced map was deposited on January 14, 2005
• Barnstable, Massachusetts 15 Minute Digital Raster Graphic
• From an 1893 Historic USGS map reprinted in 1907
Systems and Services
1985
• HOLLIS – our OPAC
1998 - 1999
• VIA Visual Information Access – union catalog
• OASIS Online Archival Search Information System – union catalog
1999-2000
• OLIVIA – image cataloging tool
Systems and Services
2000-2001
• DRS Digital Repository Service – preservation and access repository
• NRS Name Resolution Service – to resolve persistent identifiers
• AMS Access Management Service – to provide access controls
• IDS Image Delivery Service
• PDS Page Delivery Service
• FTS Full-text Search Service
• NRS Web Admin
• Policy Web Admin
Systems and Services
2001-2002
• DRS Web Admin – staff interface to DRS
• PDS Maint
• Harvard Geospatial Library – union catalog
2002-2003
• TED TEmplated Database – collection building tool
• SDS Streaming Delivery Service – for audio delivery
• ADS Asynchronous Delivery – for large files
• Cross-catalog search – for federated searching
Systems and Services
2003-2004
• Dynamic IDS – for zoom and pan features w/ JP2
• DMART – Audio deposit tool
2004-2005
• RList – Course reserves tool
2005-2006
• Virtual Collections
2006 - 2007
• Batch Builder
2008 - 2009
• Google data loading
• WAX
A Transition: Being a Story of the Present
2008-2009
2008: new DRS storage system
• New servers, new storage arrays, new tape library, new storage software
• Increased storage capacity
• Less complex - DRS loader doesn’t need to know the details of storage system anymore
• Higher availability for deliverable content
• Copies stored in 3 different geographic locations
• 3 “low use” copies, 4 “high use” copies
Cumulative file count per format type
[Chart, 2000-2008. One panel plots Image, Text and Container files (axis 0 to 9,000,000); a second panel plots Audio files (axis 0 to 20,000).]
Annual file size per Harvard unit (GB)
[Chart, 2000-2008; axis 0 to 14,000 GB. Units in legend: Arn. Arb., Divinity, FAS Mus/Spec. Libs, HCL, GSD, GSE, HBS, CHS, Law, Countway, HAM, HU Archives, KSG, Radcliffe. Callouts: HCL, Art Museums.]
Cumulative non-Google file size per use (GB)
• April 2009: 45,742 GB
[Chart, October 2000 to February 2009; axis 0 to 50,000 GB. Series: Low use, High use.]
Cumulative file size (GB)
• April 2009: 105,652 GB
[Chart, October 2000 to February 2009; axis 0 to 120,000 GB. Legend: Non-Google.]
DIY -- http://hul.harvard.edu/ois/reporting/
2008: new program, new position
• HUL takes next step in its commitment to digital preservation and establishes:
1. Digital Preservation and Repository Manager Position
  • March 2008
  • Andrea Goethals
2. Digital Preservation Program
  • June 2008
  • Established within OIS
2008/9 priorities of new digital preservation program
1. Define additional infrastructure requirements to support digital preservation
  • DRS enhancements
  • Global Digital Format Registry (GDFR)
2. Identify and analyze new formats for the DRS to support
  • PDF, email, audio, architectural drawings, etc.
3. Establish communication network with the 2 communities we inhabit
  • Broader digital preservation community
  • Harvard community
Avenues of communication
• Broader digital preservation community
  • Conferences and meetings
  • Collaborative projects
  • Email conversations, blogs, newsgroups
• Harvard community
  • Committees (ULC, CCCC, DMCC, DCSWG, etc.)
  • Digital project librarians
  • Ad-hoc focus groups, meetings and email with stakeholders (depositors, curators and collection managers)
  • Customer surveys
These communities inform our thinking about:
• Concepts and terms
• Metadata
• Data models
• Content
• Recommended & supported formats
• Best practices
• Preservation planning and actions
• Storage, management and monitoring
• Certifications
• Registries
• Tools and services
DRS customer survey 2008
• August - September 2008
• Users of DRS tools or services
• To evaluate the level of satisfaction with DRS tools, services, and websites
• To understand any unmet needs
Survey findings
• Question 1: What word or phrase best describes the DRS?
• In general the DRS is valued for its preservation services and perceived as stable, secure and trusted.
Other key findings of survey
DRS Customers want:
• Support for more formats
• Guidance on preservation formats and content creation
• Better search and editing management tools
• Delivery services that use common or popular third-party applications
Trends in DRS customer needs
1. Problem of abundance
2. Remote creators
3. Diversity of formats
1. Problem of abundance
DRS owners and depositors:
• Are increasingly overwhelmed by the amount of digital content to preserve
• Can’t fully process the material they want to deposit into the DRS
• Can’t go through a deposit process that is time-consuming
2. Remote creators
• Increasingly DRS owners and depositors are acquiring content they did not create
• DRS staff cannot influence the formats or technical properties of this content during creation
3. Diversity of formats
• DRS owners and depositors increasingly need to preserve formats and genres that aren’t currently supported by the DRS
CAD formats
Spreadsheet formats
3D visualization formats
Presentation formats
Additional audio formats
Databases
Video formats
Locally archived websites
Executable file formats
Raw survey data
Word processing formats
Raw camera files
Implications of these trends
The DRS needs to:
• accept and preserve minimally-processed content
• provide a time-efficient deposit process
• support a broad range of formats and genres
And:
• can’t rely on the content being in “preservable” formats prior to deposit into the DRS
DRS 2 and You!: Being a Story of the Future
2009 -
DRS 2 changes
Why?
1. To better support digital preservation
2. To better support needs of DRS depositors, curators and collection managers
DRS 2 changes
1. New conceptual foundation
  • Objects
  • Content models
2. User improvements
  • Support for opaque objects
  • Support for new file formats
  • Deposit, management & delivery tools
  • Guidance & user community
3. A new approach to metadata
4. Increased preservation planning and activities
Objects
• Currently only a file level in the DRS
  • All management has to be done at the individual file level
• Objects are aggregations of files
  • Page-turned object
  • Still image object
• More intuitive unit for management, reporting and searching
  • Example: How many Page-turned objects do I have in the DRS?
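To make the idea of objects as aggregations of files concrete, here is a minimal Python sketch of such a data model, with a helper that answers the example question of how many objects of each type exist. The class names `DrsFile` and `DrsObject`, their fields and the content model labels are hypothetical and are not taken from the actual DRS 2 design.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DrsFile:
    """One stored file (hypothetical fields)."""
    name: str
    size_bytes: int


@dataclass
class DrsObject:
    """An object: an aggregation of files with a content model."""
    object_id: str
    content_model: str
    files: list = field(default_factory=list)

    def total_size(self):
        """Sum the sizes of all files that make up this object."""
        return sum(f.size_bytes for f in self.files)


def count_by_content_model(objects):
    """Count objects per content model, e.g. 'page-turned'."""
    return Counter(obj.content_model for obj in objects)


# Illustrative data only.
book = DrsObject("obj-1", "page-turned",
                 [DrsFile("p001.jp2", 2_000_000), DrsFile("p002.jp2", 2_100_000)])
photo = DrsObject("obj-2", "still image", [DrsFile("photo.tif", 30_000_000)])
counts = count_by_content_model([book, photo])
print(counts["page-turned"], book.total_size())
```

With the object as the unit, management and reporting questions become a single lookup over objects instead of a scan over individual files.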
Content models
• Types of objects
  • Example: audio content model
Support for opaque objects
• A special content model
• Allows files in any format
• The digital equivalent of buying time at HD
• Content can be minimally processed
• Must be intended for long-term preservation
• The content could be fully processed by depositors but not supported yet by DRS
• Will receive some preservation services
• Will be on a path to fuller DRS preservation
Support for new file formats
• PDF
• Audio
  • MP3, MP4/AAC
• Drawings
  • AutoCAD
  • Adobe Illustrator
• Video
• What’s next?
Deposit, management & delivery tools
• Enhanced Batch Builder
  • Integrated with File Information Tool Set (FITS)
• Enhanced DRS Web Admin
  • Better searching
  • Richer management and reporting
  • Ability to perform batch updates
• File Delivery Service (FDS)
  • Created for PDF delivery
  • Delivers a file to user’s web browser
Future of http://hul.harvard.edu/ois/
Guidance & user community
New website for digital preservation
• Formats central
• Content models
• DRS practices
• HUL digital preservation projects
• Emerging standards and best practices
• Tools, services, registries
• Resources & Experts
A new approach to metadata
• Moving towards community-standard schemas
  • PREMIS, MODS, MIX, textMD, etc.
• Metadata files on the file system alongside content files
  • “object descriptors”
  • Preservation, rights, descriptive metadata
• More reliance on embedded metadata
  • Automatic extraction at deposit time by FITS
  • Third party delivery applications are becoming aware of file-embedded metadata
Increased preservation planning and activities
• More granular format identification
  • Sub-file characterization
• Preservation plans per content model
• Digital first aid (content & metadata)
• “Localization,” migrations, normalizations
• Technology watch
• Virus checking
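As a rough illustration of what more granular format identification can mean, the Python sketch below identifies a few formats from their leading bytes and, for PDF, also reads the version from the header. It is a simplified stand-in and not how FITS or the DRS works. The `identify` function and the signature table are assumptions for illustration, and real identification tools check far more than the first few bytes.

```python
import re

# Well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
]

# A PDF header looks like "%PDF-1.4"; capture the version digits.
PDF_VERSION = re.compile(rb"%PDF-(\d\.\d)")


def identify(data):
    """Return (format, version or None) for the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            version = None
            if name == "PDF":
                match = PDF_VERSION.match(data)
                if match:
                    version = match.group(1).decode("ascii")
            return name, version
    return "unknown", None


print(identify(b"%PDF-1.4\n%..."))
```

Knowing the version as well as the format is the kind of extra detail that lets a preservation plan treat, say, different PDF versions differently.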
DRS 2 process
• Phases of work
  • DRS 2.1, 2.2, 2.3, etc.
• Themed phases
  • DRS 2.1: “Object Security and Integrity”
  • DRS 2.2: “Management and Monitoring”
• Includes support for new formats
  • DRS 2.1: PDFs, opaque objects
  • DRS 2.2: more audio formats (MP3, MP4/AAC)
http://hul.harvard.edu/ois/systems/drs/enhancements.html
Questions?
Image credits
• Future ghost
  • http://www.animationartgallery.com/images/OSC/OSCJL2.gif
• Marley’s ghost
  • http://cueballcol.files.wordpress.com/2007/12/435px-a_christmas_carol_-_marley27s_ghost.jpg
• Ghost of the past
  • https://www.1st-art-gallery.com/thumbnail/202533/1/Scrooge-And-The-Ghost-Of-Marley,-From-Dickens-A-Christmas-Carol.jpg
• Ignorance and want
  • http://doxoblogy.files.wordpress.com/2007/03/a_christmas_carol_02.jpg
• Weight of wikipedia
  • http://images.theage.com.au/ftage/ffximage/2008/05/26/300_wikipedia1.jpg
• Lots of people
  • http://repairstemcell.files.wordpress.com/2009/02/lotsa-people.jpg
• Ghost of the future
  • http://www.ibiblio.org/ebooks/Dickens/Carol/4.jpg
• Mr. Magoo
  • http://www.affordablehousinginstitute.org/blogs/us/Magoo_christmas_future_small.jpg