Digital Preservation Panel
Medusa at the University of Illinois at Urbana-Champaign: A Digital Preservation Service Based on PREMIS
Kyle Rimkus, Preservation Librarian
Tom Habing, Head of Software Development
University of Illinois Urbana-Champaign Library
Short Paper Presentation
July 23, 2013
2013 Joint Conference on Digital Libraries
Indianapolis, Indiana
OAIS Management
• SIP: Submission Information Package
• AIP: Archival Information Package
• DIP: Dissemination Information Package
Access Systems
• ContentDM: digitized unique and rare materials from locally held collections
• DSpace: IDEALS institutional repository and electronic theses and dissertations
• ARTStor: image collections used for teaching and learning
• Internet Archive: books digitized through participation in the Open Content Alliance
• HathiTrust: digitized books
• Voyager catalog and locally-developed web pages: digitized books and book-like content
• Archon: digitized manuscripts and archival content
• Olive ActivePaper: digitized newspapers
• Ensemble Video: streaming video content, primarily digitized from the University Archives
• DLXS: XML-driven text-analysis research projects
Medusa Digital Preservation Service
• https://medusa.library.illinois.edu/
• Documentation: https://wiki.cites.illinois.edu/wiki/display/LibraryDigitalPreservation/Home
• Enduring storage and management environment for digital content at the University of Illinois at Urbana-Champaign Library
• Infrastructure based on the Open Archival Information System (OAIS) reference model
• Policies and underlying technological architecture designed in keeping with the model of a Trustworthy Digital Repository (TDR)
National Digital Information Infrastructure and Preservation Program (NDIIPP) grants
• Phase I: 2004-2007
• Phase II: 2007-2010
“Hub and Spoke” (HandS tool suite): http://dli.grainger.uiuc.edu/echodep/hands/index.html
Metadata for Digital Preservation
[Diagram: PREMIS data model entities: Intellectual Entities, Objects, Events, Agents, Rights]
Metadata standards in Medusa:
• PREMIS (digital preservation metadata)
• MODS (descriptive metadata)
Part 2
• Quick Overview of Architecture
• Modeling Complex/Compound Objects in PREMIS
• Fedora Content Models for PREMIS
Medusa Technical Architecture
[Diagram: Medusa technical architecture. Labels recoverable from the figure:]
• Primary and Disaster Recovery
• Bit-level Preservation and Object-level Preservation
• Collections Registry, Fedora, Hydra, Blacklight, SOLR, SOLRizer, SQL Database, Akubra, File Spool, Content File Server
• Content Storage (shown twice)
• Two DX6000 Storage Clusters, each with multiple Storage Nodes, with a Replication link
Modeling Complex/Compound Objects with PREMIS
[Diagram: PREMIS data model entities: Intellectual Entities, Objects, Events, Agents, Rights]
PREMIS Relationship Type & Subtype
• Collection
  – Is Member Of
• Metadata
  – MODS
  – MARC
  – …
  – Parent
• Basic Image Asset
  – Production Master
  – Archival Master
  – Screen Size
  – …
  – Parent
• Paged Text Asset
  – Pages
  – LoRes PDF
  – HiRes PDF
  – …
  – Parent
• Basic Compound Asset
  – First Child
  – Child
  – Parent
  – Next Sibling
  – Previous Sibling
• Derivation
  – Has Source
  – Is Source Of
• …
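The type and subtype pairs above behave like a small controlled vocabulary. As a minimal sketch, they could be held in a lookup table and checked before a relationship is written. The names RELATIONSHIP_VOCABULARY and is_known_relationship are hypothetical and not part of Medusa, and treating each "Parent" entry as a subtype of the type listed before it is an assumption about how the slide's list nests.

```python
# Hypothetical encoding of the relationship types and subtypes on the slide.
# Only values named on the slide are included; its "…" entries are left out
# rather than guessed.
RELATIONSHIP_VOCABULARY = {
    "Collection": {"Is Member Of"},
    "Metadata": {"MODS", "MARC", "Parent"},
    "Basic Image Asset": {
        "Production Master", "Archival Master", "Screen Size", "Parent",
    },
    "Paged Text Asset": {"Pages", "LoRes PDF", "HiRes PDF", "Parent"},
    "Basic Compound Asset": {
        "First Child", "Child", "Parent", "Next Sibling", "Previous Sibling",
    },
    "Derivation": {"Has Source", "Is Source Of"},
}


def is_known_relationship(rel_type: str, subtype: str) -> bool:
    """Return True if the (type, subtype) pair appears in the table above."""
    return subtype in RELATIONSHIP_VOCABULARY.get(rel_type, set())
```

For example, is_known_relationship("Paged Text Asset", "Pages") is true, while a subtype paired with the wrong type is rejected.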
Example
<relationship>
  <relationshipType>Paged_Text_Asset</relationshipType>
  <relationshipSubType>Pages</relationshipSubType>
  <relatedObjectIdentification>
    <relatedObjectIdentifierType>LOCAL</relatedObjectIdentifierType>
    <relatedObjectIdentifierValue>MEDUSA:xxx.2</relatedObjectIdentifierValue>
  </relatedObjectIdentification>
</relationship>
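A relationship element like the one in this example can be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree. The function name build_relationship is hypothetical, and XML namespaces are omitted because the slide's example omits them; a real PREMIS document would declare the PREMIS namespace.

```python
import xml.etree.ElementTree as ET


def build_relationship(rel_type, subtype, related_id, id_type="LOCAL"):
    """Build a PREMIS-style <relationship> element (no namespaces, as on the slide)."""
    rel = ET.Element("relationship")
    ET.SubElement(rel, "relationshipType").text = rel_type
    ET.SubElement(rel, "relationshipSubType").text = subtype
    ident = ET.SubElement(rel, "relatedObjectIdentification")
    ET.SubElement(ident, "relatedObjectIdentifierType").text = id_type
    ET.SubElement(ident, "relatedObjectIdentifierValue").text = related_id
    return rel


# Reproduce the slide's example values.
element = build_relationship("Paged_Text_Asset", "Pages", "MEDUSA:xxx.2")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```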
Page Images
[Diagram: a paged-text object and its page images. Labels recoverable from the figure:]
• Representations: MEDUSA:xxx, MEDUSA:xxx.1, MEDUSA:xxx.2, ...
• Files: xxx_1.jpg, xxx_1.jp2, xxx_2.jpg, xxx_2.jp2
• Relationship labels (type / subtype):
  – Collection / Is Member Of
  – Paged Text Asset / Pages
  – Basic Compound Asset / First Child, Parent, Next Sibling, Previous Sibling
  – Basic Image Asset / Production Master, Screen Size
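The Basic Compound Asset links in the page-image diagram (First Child, Parent, Next Sibling, Previous Sibling) form a linked-list style structure over the pages. The sketch below shows one plausible way to derive those links from an ordered list of page identifiers. The function link_pages and the tuple layout are hypothetical, and the direction of each link is an assumption, since the diagram's arrows did not survive extraction.

```python
from typing import List, Tuple

# (source id, relationship type, relationship subtype, target id)
Relationship = Tuple[str, str, str, str]


def link_pages(parent_id: str, page_ids: List[str]) -> List[Relationship]:
    """Derive compound-asset links for an ordered list of page representations."""
    rel_type = "Basic Compound Asset"
    links: List[Relationship] = []
    if page_ids:
        # Assumed direction: the parent points at its first page.
        links.append((parent_id, rel_type, "First Child", page_ids[0]))
    for i, page in enumerate(page_ids):
        # Assumed direction: every page points back at the parent.
        links.append((page, rel_type, "Parent", parent_id))
        if i + 1 < len(page_ids):
            links.append((page, rel_type, "Next Sibling", page_ids[i + 1]))
        if i > 0:
            links.append((page, rel_type, "Previous Sibling", page_ids[i - 1]))
    return links


links = link_pages("MEDUSA:xxx", ["MEDUSA:xxx.1", "MEDUSA:xxx.2"])
for link in links:
    print(link)
```

Storing both Next Sibling and Previous Sibling lets a viewer page in either direction without re-reading the parent object.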
PREMIS to Fedora Content Model
• Separate Fedora object and content model for each PREMIS entity
• All PREMIS relationships are converted to Fedora RELS-EXT
[Diagram: four example Fedora objects and their datastreams]
• PID: EVENT_1234: PREMIS_EVENT, DC, RELS-EXT
• PID: AGENT_1235: PREMIS_AGENT, DC, RELS-EXT
• PID: REPRESENTATION_OBJECT_1236: PREMIS_REPRESENTATION_OBJECT, DC, RELS-EXT
• PID: FILE_OBJECT_1237: PREMIS_FILE_OBJECT, CONTENT, DC, RELS-EXT
• Link labels between objects: hasAgent, hasObject, relationshipType.Subtype
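Converting a PREMIS relationship to RELS-EXT amounts to turning a (type, subtype) pair into an RDF predicate between two Fedora objects. The sketch below illustrates the idea. The namespace is the one shown in the slides and info:fedora/ is Fedora's standard URI prefix for objects, but the camel-case naming rule, the function names, and the direction of the example triple are assumptions, since the slides show only the placeholder relationshipTypeRelationshipSubtype.

```python
import re

MEDUSA_NS = "http://medusa.library.illinois.edu/ns#"  # namespace shown in the slides


def camel(words: str, lower_first: bool) -> str:
    """Join space- or underscore-separated words into CamelCase."""
    parts = [p for p in re.split(r"[\s_]+", words.strip()) if p]
    joined = "".join(p[:1].upper() + p[1:] for p in parts)
    return joined[:1].lower() + joined[1:] if lower_first else joined


def predicate_for(rel_type: str, subtype: str) -> str:
    # ASSUMPTION: predicate local name = lowerCamel(type) + UpperCamel(subtype).
    return MEDUSA_NS + camel(rel_type, True) + camel(subtype, False)


def to_triple(subject_pid: str, rel_type: str, subtype: str, object_pid: str):
    """Return a (subject, predicate, object) triple of URIs for RELS-EXT."""
    return (
        f"info:fedora/{subject_pid}",
        predicate_for(rel_type, subtype),
        f"info:fedora/{object_pid}",
    )


# PIDs are the schematic ones from the diagram; the direction is assumed.
triple = to_triple(
    "FILE_OBJECT_1237", "Basic Image Asset", "Production Master",
    "REPRESENTATION_OBJECT_1236",
)
print(triple)
```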
Fedora CModel for PREMIS File Object
• info:fedora/afmodel:MedusaPremis_FileObject
  – DC
  – RELS-EXT
    • http://medusa.library.illinois.edu/ns#relationshipTypeRelationshipSubtype
    • http://medusa.library.illinois.edu/ns#relationshipTypeRelationshipSubtypeLinkingEvent
    • http://www.loc.gov/premis/rdf/v1#hasEvent
    • http://www.loc.gov/premis/rdf/v1#hasRightsStatement
    • http://www.loc.gov/premis/rdf/v1#hasIntellectualEntity
  – PREMIS-FILE-OBJECT (inline PREMIS XML)
  – CONTENT (managed content)
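A content model like this one can be read as a declaration of which datastreams an object carries. The sketch below represents it as a small data structure and reports any datastreams an object lacks. The class ContentModel and the function missing_datastreams are hypothetical, and treating all four datastreams as required is an assumption; the slide only lists them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ContentModel:
    pid: str
    required_datastreams: FrozenSet[str]


# Datastream IDs as listed on the slide; "required" is an assumption.
FILE_OBJECT_CMODEL = ContentModel(
    pid="afmodel:MedusaPremis_FileObject",
    required_datastreams=frozenset(
        {"DC", "RELS-EXT", "PREMIS-FILE-OBJECT", "CONTENT"}
    ),
)


def missing_datastreams(model: ContentModel, datastreams: Dict[str, bytes]) -> List[str]:
    """Return the sorted IDs of datastreams the model lists but the object lacks."""
    return sorted(model.required_datastreams - set(datastreams))


# An object holding only DC and CONTENT lacks the other two datastreams.
print(missing_datastreams(FILE_OBJECT_CMODEL, {"DC": b"", "CONTENT": b""}))
```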
Questions?
• Medusa documentation (more coming soon):
  – https://medusa.library.illinois.edu
  – https://wiki.cites.uiuc.edu/wiki/display/LibraryDigitalPreservation/Home
• More on digital preservation at UIUC:
  – http://www.library.illinois.edu/prescons/services/digital_preservation/digital_preservation.html
• Contact: [email protected]