Bring out yer SIPs: An Introduction to Digital Preservation with Archivematica
iSkills Workshop
February 9, 2018
Grant Hurley, Digital Preservation Librarian, Scholars Portal
Agenda
- Basic concepts in digital preservation
- Introduction to Archivematica
- Preparing transfers + Demo
- Processing transfers + Demo
- Looking at AIPs
- Thinking about DIPs
- Processing activity
What’s this “digital preservation” thing?
Uh oh
● Digital objects (both born digital and digitized) need active management to ensure ongoing access
● Quickly-changing technological norms create risks that must be managed from the object’s creation
● Digital preservation is a set of theories and practices that work to keep digital objects authentic, available and reliable over time.
Identity: what it is; format identification, descriptive information, provenance, etc.
Integrity: establishing that a file remains unaltered over time
Identity: File formats
filename : '/Users/hurleyg/Documents/Teaching/iSkills/CheckYourBits.jpg'
filesize : 582231
modified : 2018-01-24T15:50:08-05:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/43'
    format  : 'JPEG File Interchange Format'
    version : '1.01'
    mime    : 'image/jpeg'
    basis   : 'extension match jpg; byte match at [[[0 14]] [[582229 2]]]'
    warning :
File format identifications/descriptions in PRONOM (UK National Archives)
- ID = PRONOM identifier
Archivematica uses Siegfried or FIDO
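The "byte match" basis in the output above can be illustrated with a minimal sketch. It assumes a single hand-written signature check for JFIF files. The function name `identify_jfif` is hypothetical, and real tools such as Siegfried work from the full PRONOM signature registry rather than a check like this.

```python
from typing import Optional


def identify_jfif(data: bytes) -> Optional[dict]:
    """Return a small identification record if data looks like JFIF, else None."""
    if len(data) < 13:
        return None
    starts_like_jpeg = data[:4] == b"\xff\xd8\xff\xe0"  # SOI marker + APP0 marker
    has_jfif_tag = data[6:11] == b"JFIF\x00"            # identifier inside APP0
    ends_like_jpeg = data[-2:] == b"\xff\xd9"           # EOI marker at end of file
    if not (starts_like_jpeg and has_jfif_tag and ends_like_jpeg):
        return None
    major, minor = data[11], data[12]                   # JFIF version bytes
    return {
        "format": "JPEG File Interchange Format",
        "version": f"{major}.{minor:02d}",
        "mime": "image/jpeg",
        "basis": "byte match at start and end of file",
    }
```

Checking bytes at both the start and the end of the file mirrors the two offsets reported in the `basis` line, and it is more reliable than trusting the `.jpg` extension alone.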
Integrity: The almighty checksum
md5 checksum = 2c93b97c3d7e53dab9161e389c98465c
md5 checksum = 1148058955697062ca583d0cc0474322
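A checksum comparison like the one above can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `hashlib`. The function names `md5_checksum` and `is_unaltered` are hypothetical and not part of Archivematica.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path: str, recorded_checksum: str) -> bool:
    """Compare a file's current checksum with one recorded earlier."""
    return md5_checksum(path) == recorded_checksum.lower()
```

If even a single byte of the file changes, the computed value no longer matches the recorded one, which is how two otherwise similar-looking files end up with different checksums.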
The even more almighty OAIS
Other important concepts
Identification: determining a particular file’s format and version
Characterization: extracting metadata related to the file’s intrinsic properties. For example, audio sample rate, channels, etc. for an MP3 file.
Validation: determining if a file is well-formed and valid according to its specification.
Normalization: converting a file from a source format to a standardized format.
What is Archivematica?
What it does
- Creates well-formed data packages for long-term preservation and access
- Takes a pre-structured transfer from a data source
- Makes a Submission Information Package (SIP)
- Transforms the SIP into an Archival Information Package (AIP)
- Can also create a Dissemination Information Package (DIP) for access
- Each of these functions has configurable tasks associated
What it does
- Stores and applies preservation policies for normalization, access copies, etc.
- Allows access to, and deletion of, AIPs
- Assists in ingest of descriptive metadata, rights information
- Manages data flows in and out of system through separate Storage Service module
- Can connect to access systems for DIP deposit (mostly just AtoM)
- Can be fully automated
Where it came from
- Standards for digital preservation developed in late 1990s and early 2000s, but no easy way of applying them
- UNESCO released 2007 report advocating for open source digital preservation system
- Artefactual Systems started up by creating Access to Memory (AtoM) system for archival description
- Various small open source tools were also being developed by others for particular tasks
- Artefactual developed Archivematica beginning in 2008
- Beta release in 2012; current release is 1.6.1 (2017)
What it is
- Modular workflow created using a microservices design pattern
  - Data follows a structured, chained pathway, where the result of one step triggers the next step.
- Components can be replaced or turned off/on.
- Accessible through the browser
- Requires a virtual machine to run on (Ubuntu or CentOS)
- Runs in LAMP environment (Linux, Apache, MySQL, PHP)
- Open source, developed by Artefactual Systems staff
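The chained, switchable design described above can be sketched as follows. This is only an illustration of the pattern, not Archivematica's code. The step functions, the `STEPS` list and `run_chain` are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Package = Dict[str, list]
Step = Callable[[Package], Package]


def virus_scan(package: Package) -> Package:
    package["log"].append("virus scan")
    return package


def identify_formats(package: Package) -> Package:
    package["log"].append("identify formats")
    return package


def normalize(package: Package) -> Package:
    package["log"].append("normalize")
    return package


# Hypothetical chain: each step receives the result of the previous one.
STEPS: List[Tuple[str, Step]] = [
    ("virus_scan", virus_scan),
    ("identify_formats", identify_formats),
    ("normalize", normalize),
]


def run_chain(package: Package, steps: List[Tuple[str, Step]],
              disabled: Optional[Set[str]] = None) -> Package:
    """Run each enabled step in order, feeding its output to the next step."""
    disabled = disabled or set()
    for name, step in steps:
        if name in disabled:
            continue  # components can be turned off without changing the others
        package = step(package)
    return package
```

Calling `run_chain({"log": []}, STEPS, disabled={"normalize"})` would run only the first two steps, which is the sense in which components can be replaced or turned off and on.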
What it isn’t
- A storage system
- An access system
- Easy to install or maintain in production
- User friendly
- A complete digital archives workflow
Who uses it
Largely, memory institutions (libraries, archives, galleries, museums) with digital collections that need preserving
- Libraries:
  - Digitized/born-digital content in institutional repositories
  - Research data management (several current projects trying to develop Archivematica’s capacity in this domain)
  - Digital collections (books, journals, maps, etc.)
- Archives
  - Digitized collections (photographs, audio-visual materials, etc.)
  - Born digital donations (all sorts of stuff)
    - Private papers/collections
    - Records from corporate bodies, institutions, etc.
The Workflow
Pre-Transfer*
- Selection of objects to preserve
- Metadata preparation
- Packaging for transfer
Transfer
- Generates METS file to be written to
- Virus scan
- File ID, characterization, validation
Backlog
- You can send something here if you don’t want to continue processing it
Appraisal
- File format view/analysis
- Selection for retention
- ID sensitive data
Ingest
- Normalize files
- Create & store AIP/DIP
Storage & Access*
- Store in location
- Send access copies to other systems
*Not in Archivematica
*Linked to by Archivematica
Preparing transfers
Steps
- Determining content and structure (1 SIP = 1 AIP = fonds, series, item? Or section of one of these?)
- Gather and structure metadata (next slide)
- Gather submission documentation (not in demo)
- Package and structure for ingest
  - All data needs to be in a directory, at minimum
Metadata
Descriptive metadata
- Uses simple Dublin Core as key standard; other information is recorded as ‘Custom’
- Transfer level can be added through interface or imported
- Item level must be imported via CSV file
Rights metadata
- Mapped to PREMIS
- Same import structure as above
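The item-level descriptive metadata import can be sketched as a small CSV writer. The column names here (`filename`, `dc.title`, `dc.creator`, `dc.date`) and the `objects/` path prefix are assumptions based on the Dublin Core convention mentioned above. The helper name and sample values are invented for illustration, so check the Archivematica documentation for the exact layout your version expects.

```python
import csv
from typing import Dict, List

# Assumed column layout: one row per item, Dublin Core fields prefixed "dc.".
FIELDNAMES = ["filename", "dc.title", "dc.creator", "dc.date"]


def write_metadata_csv(path: str, rows: List[Dict[str, str]]) -> None:
    """Write one metadata row per object to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Invented sample values, for illustration only.
EXAMPLE_ROWS = [
    {"filename": "objects/photo1.jpg", "dc.title": "Front entrance",
     "dc.creator": "Unknown", "dc.date": "1975"},
    {"filename": "objects/photo2.jpg", "dc.title": "Reading room",
     "dc.creator": "Unknown", "dc.date": "1976"},
]
```

Using `csv.DictWriter` keeps the header and the row values aligned, which matters because the import relies on the column names.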
Demo
- Set of photos + metadata csv file
- Bagging using Python script
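The script used in the demo is not reproduced in these slides. As a rough idea of what bagging involves, here is a minimal standard-library sketch that assumes a BagIt 0.97-style layout with an MD5 manifest. The function name `make_minimal_bag` is hypothetical.

```python
import hashlib
import os
import shutil


def make_minimal_bag(directory: str) -> None:
    """Turn a directory into a minimal bag: data/, bagit.txt, manifest-md5.txt."""
    # Move every existing item into a new payload directory called data/.
    payload = os.path.join(directory, "data")
    items = os.listdir(directory)
    os.mkdir(payload)
    for name in items:
        shutil.move(os.path.join(directory, name), os.path.join(payload, name))

    # Bag declaration file.
    with open(os.path.join(directory, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

    # Payload manifest: one "<checksum>  <relative path>" line per file.
    lines = []
    for root, _dirs, files in os.walk(payload):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            with open(full, "rb") as fh:
                checksum = hashlib.md5(fh.read()).hexdigest()
            lines.append(f"{checksum}  {rel}")
    with open(os.path.join(directory, "manifest-md5.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(lines)) + "\n")
```

In practice a maintained library, such as the Library of Congress bagit-python package, would likely be a safer choice than a hand-rolled script, since it also handles validation and bag-info metadata.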
Processing transfers
Demo
- Same materials as before
- Uploaded to transfer source on Ontario Library Research Cloud
- Process using standard workflow and settings
- Briefly demo backlog/appraisal tabs
- Store AIP on OLRC
- No DIP
Looking at AIPs
AIP Contents
- METS file
- Originals + normalized copies in ‘objects’ folder
- Materials that made up original transfer
- Logs
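To get a feel for the METS file listed above, the sketch below reads the file section of a tiny, invented METS document and groups file paths by their `USE` attribute. The sample XML, the `USE` values (`original`, `preservation`) and the function name `files_by_use` are assumptions for illustration. A real AIP's METS file is much larger and richer.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

METS_NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Invented sample; real METS files also carry PREMIS and descriptive metadata.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1">
        <mets:FLocat LOCTYPE="OTHER" xlink:href="objects/photo.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="preservation">
      <mets:file ID="file-2">
        <mets:FLocat LOCTYPE="OTHER" xlink:href="objects/photo-1234.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def files_by_use(mets_xml: str) -> Dict[str, List[str]]:
    """Group file locations in a METS fileSec by the USE of their fileGrp."""
    root = ET.fromstring(mets_xml)
    result: Dict[str, List[str]] = {}
    for group in root.findall("mets:fileSec/mets:fileGrp", METS_NS):
        use = group.get("USE", "unknown")
        for locat in group.findall("mets:file/mets:FLocat", METS_NS):
            result.setdefault(use, []).append(locat.get(XLINK_HREF))
    return result
```

Grouping by `USE` shows at a glance which files in the ‘objects’ folder are originals and which are normalized copies.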
Thinking about DIPs
DIPs
- Set of normalized files for access, created with access policies in preservation planning module
- Archivematica can connect to AtoM for DIP deposit to existing description
- Can transfer over some metadata, so description work can be lessened, but only at transfer/item level
Activity time!