Digital Preservation with Archivematica: An Introduction
Digital Preservation with Archivematica: An Introduction
Dan Gillean – Artefactual Systems
Western New York Library Resources Council webinar, February 15th, 2017
Today’s Agenda
1. A brief introduction to Digital Preservation
2. A brief introduction to the Archivematica project
3. A brief hands-on demo with Archivematica
https://pixabay.com/en/notebook-plan-dates-coffee-cup-1895164/
Let’s talk digital preservation
ISO 14721:2002 ISO 16363:2012
Standards allow us to communicate across space and time
https://pixabay.com/p-624054
A reference model – not a systems architecture! https://wiki.archivematica.org/
Overview
ISO 14721
Designated community:
• An identified group of potential Consumers who should be able to understand the preserved information
“Since a key purpose of an OAIS is to preserve information for a Designated Community, the OAIS must understand the Knowledge Base of its Designated Community to understand the minimum Representation Information that must be maintained.” (p. 2-4)
Information Packages in OAIS
Content Information
• Data Object
• Representation Information
Preservation Description Information
• Provenance
• Context
• Reference
• Fixity
• Access Rights
Packaging Information
Descriptive Information
Information Packages in OAIS
Content Information: the bits, plus what we need to interpret the bits
Preservation Description Information: what we need to manage the bits
Packaging Information: what we need to know about how it’s all put together
Descriptive Information: what we need to know to discover content in the package
Information Packages in OAIS
SIP: Submission Information Package
AIP: Archival Information Package
DIP: Dissemination Information Package
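Because OAIS is a reference model rather than a systems architecture, it prescribes no code. Still, the package structure above can be sketched as Python data classes to show how the parts relate and how an AIP is derived from a SIP. This is a minimal illustration only: the class names, field names and the `sip_to_aip` helper are hypothetical, and SHA-256 is an assumed choice of fixity algorithm.

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import Dict, List

# Hypothetical class names: OAIS is a conceptual model and defines no code or API.

@dataclass
class ContentInformation:
    data_object: bytes              # the bits
    representation_info: str        # what we need to interpret the bits

@dataclass
class PreservationDescriptionInformation:
    # What we need to manage the bits.
    provenance: List[str] = field(default_factory=list)
    context: str = ""
    reference: str = ""             # e.g. a persistent identifier
    fixity: Dict[str, str] = field(default_factory=dict)  # algorithm -> checksum
    access_rights: str = ""

@dataclass
class InformationPackage:
    kind: str                         # "SIP", "AIP" or "DIP"
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_info: str               # how it is all put together
    descriptive_info: Dict[str, str]  # what we need to discover the content

def sip_to_aip(sip: InformationPackage) -> InformationPackage:
    """Derive an AIP from a SIP by recording fixity and a provenance entry.

    Returns a new package; the SIP itself is left unchanged.
    """
    checksum = hashlib.sha256(sip.content.data_object).hexdigest()
    pdi = replace(
        sip.pdi,
        fixity={**sip.pdi.fixity, "sha256": checksum},
        provenance=sip.pdi.provenance + ["ingested into archival storage"],
    )
    return replace(sip, kind="AIP", pdi=pdi)

if __name__ == "__main__":
    sip = InformationPackage(
        kind="SIP",
        content=ContentInformation(b"Hello, archive", "Plain text, UTF-8"),
        pdi=PreservationDescriptionInformation(provenance=["received from donor"]),
        packaging_info="BagIt",
        descriptive_info={"title": "Example record"},
    )
    aip = sip_to_aip(sip)
    print(aip.kind, aip.pdi.provenance)
```

Returning a new package instead of mutating the SIP mirrors the idea that each information package is a distinct object with its own role.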
Okay… but what are we actually DOING when we “do” digital preservation?
City Light Substation located at 7th and Yesler. Item 3889, Engineering Department Photographic Negatives (Record Series 2613-07), Seattle Municipal Archives via https://www.flickr.com/photos/seattlemunicipalarchives/3749710573
• Governance
• Organizational structure
• Staffing
• Procedural accountability
• Preservation policy framework
• Documentation
• Financial sustainability
• Security
ISO 16363 reminds us that much of digital preservation readiness is not technical – it’s organizational
ISO 16363 is divided into 3 main sections:
1. Organizational Infrastructure
2. Digital Object Management
3. Infrastructure and Security Risk Management
• Preservation Planning: monitoring, risk analysis, planning for obsolescence of preservation formats
• Ingest: QA of SIPs, preparation of AIPs
• Data Management: managing the descriptive information that allows for management and retrieval of content
• Archival Storage: storage, maintenance, and retrieval of AIPs
• Administration: overall operation, configuration of hardware/software, standards compliance, policy and procedure
• Access: managing DIP generation and supply to consumers, reports, etc.
• Appraisal / selection
• Transfer / upload
• Virus scanning
• Checksum generation
• File identification
• Validation and characterization
• Metadata extraction
• SIP creation
• Ingest
• Normalization/migration or emulation planning
• AIP preparation
• DIP preparation
• Preservation
• AIP completion and validation
• Storage and integrity checking
• Administration and management
• Preservation planning
• Auditing
• Rights management
• Storage
• Geo-redundancy
• Fixity checking
• Arrangement / description / cataloguing
• Publishing / display / exhibition
• Discovery / retrieval
Your digital preservation activities might include…
Photograph of Women Working at a Bell System international Telephone Switchboard, https://catalog.archives.gov/id/1633445
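Two of the early activities above, checksum generation and file identification, can be sketched together. Real workflows identify formats against the PRONOM registry with tools such as DROID, Fido or Siegfried; as a much-simplified stand-in, this sketch only compares a file's leading "magic" bytes against a tiny signature table. The table, labels and function names are hypothetical illustrations, not PRONOM data.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical, tiny signature table: (leading bytes, label).
# Real identification tools use PRONOM signatures, which are far richer.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP-based container"),  # could also be DOCX, ODT, EPUB, etc.
]

def identify_format(header: bytes) -> str:
    """Return the label of the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

def characterize(path: Path) -> dict:
    """Collect basic facts about one file: size, SHA-256 checksum, format guess."""
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        header = f.read(16)              # enough bytes for every signature above
        sha256.update(header)
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)         # hash the rest of the file in chunks
    return {
        "name": path.name,
        "size": path.stat().st_size,
        "sha256": sha256.hexdigest(),
        "format": identify_format(header),
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.pdf"
        sample.write_bytes(b"%PDF-1.4\n% minimal example\n")
        print(characterize(sample))
```

Magic bytes alone cannot tell a DOCX from any other ZIP file, which is one reason registry-backed tools matter in practice.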
??????
Standards, Tools, and Services: A (highly selective) timeline, 1998 to 2016
Timeline entries: Fedora, PRONOM, DSpace, DROID, PREMIS, PAIMAS, Archivist’s Toolkit, Archon, Hydra, Islandora, Archivematica, BitCurator, Blacklight, DPN, Exactly, veraPDF, Siegfried, Brunnhilde, CollectiveAccess, WARCBase, Avalon, METS, OAIS, LOCKSS, JHOVE, Archive-It, TRAC, AtoM, ArchivesSpace, BagIt, PCDM, DuraCloud, ArchivesDirect, Perpetua, ACDPS, Rosetta, Dataverse, PBCore, iRODS, OwnCloud, RODA, Preservica, ePADD, QCTools, EAD, Webrecorder, MediaConch, ISO 16363
COPTR: Community Owned digital Preservation Tool Registry
http://coptr.digipres.org/Category:Tools
How do we know where to start???
http://www.desk7.net/wallpapers.aspx?typeid=8589
Know what you have
https://commons.wikimedia.org/wiki/File:Magnifying_glass_-_Faberge.jpg
Embrace Openness
Open Source, Open Standards, Open Formats, Open Documentation
https://pixabay.com/en/key-keychain-close-up-123554/
Explore what others are doing and connect with them
https://commons.wikimedia.org/wiki/File:Tower_Optical_Binoculars.jpg
Start an internal audit
Bryan Mason, “Monthly Check up.” https://www.flickr.com/photos/b-may/361018310
ISO 16363 is the gold standard for auditing a trustworthy digital repository… but don’t be intimidated, or feel like you need certification to be doing something useful.
For a more accessible breakdown of 16363, see Kara Van Malssen’s slides from PASIG NYC 2016:
https://figshare.com/articles/How_I_learned_to_stop_worrying_and_love_ISO_16363/4055661
NDSA Levels of Preservation
Adapted from: http://ndsa.org/activities/levels-of-digital-preservation/
Levels: Level 1 (Protect), Level 2 (Know), Level 3 (Monitor), Level 4 (Repair)
Storage and Geographic Location
• Level 1: 2 complete copies, not collocated; get content off diverse storage media and into a system
• Level 2: At least 3 complete copies; at least 1 in a different geographic location; document storage system, media, and what’s needed to use them
• Level 3: At least 1 copy in a location with a different disaster threat; obsolescence monitoring process for storage system and media
• Level 4: At least 3 copies in locations with different disaster threats; comprehensive plan to keep files and metadata on currently accessible media or systems
File Fixity and Data Integrity
• Level 1: Fixity check on ingest if checksum provided with content; create fixity info if not provided on transfer
• Level 2: Check fixity on all ingests; use write-blockers with original media; virus check high-risk content
• Level 3: Fixity checks at regular intervals; maintain fixity logs and supply audit on demand; ability to detect corrupt data; virus check all content
• Level 4: Check fixity in response to specific events/activities; ability to replace/repair corrupted data; ensure no one has write access to all copies
Information Security
• Level 1: Identify who has read, write, move, and delete authorizations; restrict who has those authorizations to individual files
• Level 2: Document access restrictions for content
• Level 3: Maintain logs of who performed what actions on files, including deletions and preservation actions
• Level 4: Perform audit of logs
Metadata
• Level 1: Inventory of content and its storage locations; ensure backup and non-collocation of inventory
• Level 2: Store administrative metadata; store transformative metadata and log events
• Level 3: Store standard technical and descriptive metadata
• Level 4: Store standard preservation metadata
File Formats
• Level 1: Encourage creators to use open formats and codecs when possible
• Level 2: Inventory of file formats in use
• Level 3: Monitor file format obsolescence issues
• Level 4: Perform format migrations, emulation, etc. as needed
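Several File Fixity and Data Integrity criteria (create fixity info on ingest, check fixity at regular intervals, maintain fixity logs, detect corrupt data) can be sketched in a few lines of Python. This minimal example assumes SHA-256 and keeps the manifest in memory; the function names are hypothetical, and a real system would store the manifest and the log durably and apart from the content they describe.

```python
import hashlib
import logging
import tempfile
from datetime import datetime, timezone
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("fixity")  # hypothetical fixity log

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict:
    """Create fixity info at ingest: relative path -> SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def audit_fixity(root: Path, manifest: dict) -> dict:
    """Re-check each manifest entry, log a timestamped result, return the statuses."""
    results = {}
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            status = "missing"
        elif sha256_of(path) != expected:
            status = "corrupt"
        else:
            status = "ok"
        results[rel] = status
        log.info("%s\t%s\t%s", datetime.now(timezone.utc).isoformat(), status, rel)
    return results

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "report.txt").write_text("original text")
        manifest = build_manifest(root)                # at ingest
        (root / "report.txt").write_text("bit rot!")   # simulate damage
        print(audit_fixity(root, manifest))            # later, on a schedule
```

This audit only checks files listed in the manifest, so files added after ingest are not flagged; a fuller check would also compare the current file list against the manifest.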
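NDSA Levels of Preservation – Categories
Number of NDSA Levels of Preservation criteria and number of related ISO 16363 criteria, per category:
• Storage and Geographic Location: 9 NDSA criteria; 34 related ISO 16363 criteria
• File Fixity and Data Integrity: 12 NDSA criteria; 29 related ISO 16363 criteria
• Information Security: 5 NDSA criteria; 22 related ISO 16363 criteria
• Metadata: 6 NDSA criteria; 50 related ISO 16363 criteria
• File Formats: 4 NDSA criteria; 32 related ISO 16363 criteria
• (Unmappable from ISO 16363): no NDSA criteria; 23 ISO 16363 criteria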
Blog post: https://www.avpreserve.com/papers-and-presentations/mapping-standards-for-richer-assessments-ndsa-levels-of-digital-preservation-and-iso-163632012/
Mappings: https://www.avpreserve.com/wp-content/uploads/2016/05/ISO-Requirements-by-NDSA-LoDP-Categories.xlsx
Slides: http://www.avpreserve.com/wp-content/uploads/2014/07/NDSA_ISO_Presentation_2014.pdf
AVPreserve – 16363/NDSA mappings
Drupal TRAC Review tool
https://wiki.archivematica.org/Internal_audit_tool
http://dx.doi.org/10.3998/spobooks.bbv9812.0001.001
http://www.dpworkshop.org/
DPM Workshop’s 5 Organizational Stages
1. Acknowledge: Understanding that digital preservation is a local concern;
2. Act: Initiating digital preservation projects;
3. Consolidate: Seguing from projects to programs;
4. Institutionalize: Incorporating the larger environment; and
5. Externalize: Embracing inter-institutional collaboration and dependency.
Digital Preservation Capability Maturity Model
http://www.securelyrooted.com/dpcmm
Pick a(ny) tool and play with it
Biser Todorov, “Tools.” https://commons.wikimedia.org/wiki/File:Rusty_tools.JPG
POWRR Tool Grid on COPTR
http://www.digipres.org/tools/
https://en.wikipedia.org/wiki/Monolith_(Space_Odyssey)#/media/File:African_monolith_2001.jpg
http://it.harrypotter.wikia.com/wiki/Incantesimo_di_Disarmo
The Black Box vs. The Magic Wand
Understand what software can and can’t do for you
Participate and contribute
https://en.wikipedia.org/wiki/File:Sheridan_classroom.jpg
Seek out stakeholders and build your case
Unique Hotels, “Board Room - Vihula Manor Country Club & Spa.” https://www.flickr.com/photos/62485988@N05/5692789910
DPC Digital Preservation Business Case Toolkit
http://wiki.dpconline.org/index.php?title=Digital_Preservation_Business_Case_Toolkit
DPC Digital Preservation Business Case Toolkit
http://wiki.dpconline.org/index.php?title=Additional_resources
Share your successes… and your failures
https://commons.wikimedia.org/w/index.php?curid=31154812
QUESTIONS?
Part 1: Intro to Digital Preservation
Meet Archivematica (hello world!)
What is Archivematica?
Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content.
• Standards based
• Open source
• Customizable
• Integrated with 3rd party systems
• Active community
Archivematica’s development (timeline, 2007 to 2014)
2007: UNESCO report. Bradley, K., Lei, J., Blackall, C. Towards An Open Source Archival Repository and Preservation System (2007)
Planning and development begin. Initial funding via UNESCO MotW Subcommittee, IMF Archives, City of Vancouver Archives
Releases shown: 0.1-alpha, 0.6-alpha, 0.7, 0.8, 0.9, 0.10 (April 2013), Storage Service 0.2 (January 2014), 1.0 released!
Other milestones shown: Dashboard introduced; PREMIS in METS
Other dates shown on the timeline: February 2010, May 2010, February 2011, February 2012, August 2012
PREMIS
PREMIS, or Preservation Metadata Implementation Strategies, is the recognized standard for metadata about objects in a digital preservation system.
• It captures technical information about an object in order to support the implementation of preservation strategies such as normalization, migration or emulation (PREMIS Object)
• It describes relationships between digital objects (PREMIS Object)
• It provides an audit trail of actions taken by the digital preservation repository to preserve the object (PREMIS Event)
• It names the individuals, organizations and software tools responsible for taking actions to preserve digital objects (PREMIS Agent)
• It specifies the actions a repository is allowed to take to preserve digital objects (PREMIS Rights)
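To make the Event and Agent bullets concrete, here is a minimal sketch that builds one PREMIS Event with Python's standard-library ElementTree. It assumes the PREMIS 3 namespace (Archivematica releases of this period may have used an earlier PREMIS version); the function names, the "local" agent identifier type and the example values are illustrative assumptions, not Archivematica's own output. The event type "message digest calculation" is a term from the Library of Congress preservation event-type vocabulary.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: PREMIS version 3 namespace; element details may differ from real AIPs.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)

def _sub(parent, tag, text=None):
    """Append a PREMIS-namespaced child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        el.text = text
    return el

def premis_event(event_type, outcome, object_uuid, agent_name, detail=""):
    """Build one PREMIS Event linking the Object acted on to the Agent that acted."""
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = _sub(event, "eventIdentifier")
    _sub(ident, "eventIdentifierType", "UUID")
    _sub(ident, "eventIdentifierValue", str(uuid.uuid4()))
    _sub(event, "eventType", event_type)
    _sub(event, "eventDateTime", datetime.now(timezone.utc).isoformat())
    if detail:
        _sub(_sub(event, "eventDetailInformation"), "eventDetail", detail)
    _sub(_sub(event, "eventOutcomeInformation"), "eventOutcome", outcome)
    agent = _sub(event, "linkingAgentIdentifier")
    _sub(agent, "linkingAgentIdentifierType", "local")  # illustrative value
    _sub(agent, "linkingAgentIdentifierValue", agent_name)
    obj = _sub(event, "linkingObjectIdentifier")
    _sub(obj, "linkingObjectIdentifierType", "UUID")
    _sub(obj, "linkingObjectIdentifierValue", object_uuid)
    return event

if __name__ == "__main__":
    ev = premis_event("message digest calculation", "pass", str(uuid.uuid4()),
                      "Python hashlib", detail="algorithm=sha256")
    print(ET.tostring(ev, encoding="unicode"))
```

Each preservation action in the audit trail becomes one such event, which is why an AIP's metadata can grow to contain many of them.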
METS
METS, or Metadata Encoding and Transmission Standard, was designed to support inter-repository data exchange.
• It provides a wrapper for other metadata, such as PREMIS and Dublin Core.
• It defines relationships between digital objects and other digital objects, and between digital objects and their metadata.
• It can be used to provide technical metadata about digital objects (although Archivematica doesn’t implement it that way: we wrap PREMIS in it instead)
BagIt
BagIt is a hierarchical file packaging format designed to support disk-based or network-based storage and transfer of arbitrary digital content.
• Originally developed for exchange between California Digital Library and Library of Congress; specifications written up by IETF in 2008
• System agnostic, interoperable format for storage and exchange
• “Bag and tag” approach: mandatory tag file contains a manifest listing every file in the payload together with its corresponding checksum
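The "bag and tag" idea can be sketched with only the Python standard library: copy the payload into `data/`, write `bagit.txt`, write `manifest-sha256.txt`, then validate by recomputing checksums. The function names are hypothetical, and the sketch deliberately omits parts of the specification (optional tag files such as `bag-info.txt`, tag manifests, filename encoding rules); for real bags, the Library of Congress bagit-python library is the usual choice.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def _sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def make_bag(source: Path, bag_dir: Path) -> Path:
    """Copy source into bag_dir/data and write the two required tag files."""
    payload = bag_dir / "data"
    shutil.copytree(source, payload)  # also creates bag_dir if needed
    # "1.0" is the version in RFC 8493 (2018); older bags often declare 0.97.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = [f"{_sha256(p)}  {p.relative_to(bag_dir).as_posix()}\n"
             for p in sorted(payload.rglob("*")) if p.is_file()]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir

def validate_bag(bag_dir: Path) -> list:
    """Return a list of problems; an empty list means payload and manifest agree."""
    problems, listed = [], set()
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel = line.split(maxsplit=1)
        listed.add(rel)
        path = bag_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif _sha256(path) != checksum:
            problems.append(f"checksum mismatch: {rel}")
    for p in sorted((bag_dir / "data").rglob("*")):
        rel = p.relative_to(bag_dir).as_posix()
        if p.is_file() and rel not in listed:
            problems.append(f"not in manifest: {rel}")
    return problems

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "transfer"
        src.mkdir()
        (src / "letter.txt").write_text("Dear archivist...")
        bag = make_bag(src, Path(tmp) / "bag")
        print(validate_bag(bag))
```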
Archivematica AIP structure: packaged according to BagIt specifications
• PREMIS in METS XML
• Virus scan, normalization report, extraction log, etc.
• For browsing in Archivematica
• Original + normalized objects, submission docs, original metadata included at SIP creation
PREMIS in METS
METS sections:
<metsHdr> METS header
<dmdSec> Descriptive metadata
<amdSec> Administrative metadata
<fileSec> File section
<structMap> Structural map
Within <mets:amdSec>:
<mets:techMD> PREMIS: OBJECT
<mets:rightsMD> PREMIS: RIGHTS
<mets:digiprovMD> PREMIS: EVENT
<mets:digiprovMD> PREMIS: AGENT
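The section layout above can be sketched as a METS skeleton built with ElementTree, with PREMIS Object, Rights, Event and Agent metadata wrapped in techMD, rightsMD and digiprovMD sections via mdWrap/xmlData. This is a sketch under stated assumptions: the IDs, file name, `objects/` path, PREMIS 3 namespace and the empty PREMIS placeholder elements are hypothetical, and a real Archivematica METS file carries far more detail.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
PREMIS = "http://www.loc.gov/premis/v3"  # assumption: PREMIS 3
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in [("mets", METS), ("xlink", XLINK), ("premis", PREMIS), ("dc", DC)]:
    ET.register_namespace(prefix, uri)

def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"

def wrap(parent, section, sec_id, mdtype, payload):
    """Add a metadata section whose mdWrap/xmlData embeds an XML payload."""
    sec = ET.SubElement(parent, m(section), ID=sec_id)
    md = ET.SubElement(sec, m("mdWrap"), MDTYPE=mdtype)
    ET.SubElement(md, m("xmlData")).append(payload)
    return sec

def premis_placeholder(tag):
    # Empty placeholder; a real AIP fills these with full PREMIS records.
    return ET.Element(f"{{{PREMIS}}}{tag}")

def build_mets(file_name, title):
    root = ET.Element(m("mets"))
    ET.SubElement(root, m("metsHdr"),
                  CREATEDATE=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"))
    dc_title = ET.Element(f"{{{DC}}}title")
    dc_title.text = title
    wrap(root, "dmdSec", "dmdSec_1", "DC", dc_title)
    amd = ET.SubElement(root, m("amdSec"), ID="amdSec_1")
    wrap(amd, "techMD", "techMD_1", "PREMIS:OBJECT", premis_placeholder("object"))
    wrap(amd, "rightsMD", "rightsMD_1", "PREMIS:RIGHTS", premis_placeholder("rightsStatement"))
    wrap(amd, "digiprovMD", "digiprovMD_1", "PREMIS:EVENT", premis_placeholder("event"))
    wrap(amd, "digiprovMD", "digiprovMD_2", "PREMIS:AGENT", premis_placeholder("agent"))
    grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"), USE="original")
    f = ET.SubElement(grp, m("file"), ID="file_1",
                      ADMID="techMD_1 rightsMD_1 digiprovMD_1 digiprovMD_2")
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM",
                                   f"{{{XLINK}}}href": f"objects/{file_name}"})
    smap = ET.SubElement(root, m("structMap"), TYPE="physical")
    div = ET.SubElement(smap, m("div"), TYPE="Directory", LABEL="objects", DMDID="dmdSec_1")
    ET.SubElement(div, m("fptr"), FILEID="file_1")
    return root

if __name__ == "__main__":
    print(ET.tostring(build_mets("photo.tif", "Example photograph"), encoding="unicode"))
```

The ADMID and FILEID attributes are what tie the file, its administrative metadata and its place in the structural map together.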
Embracing Openness
Open Source, Open Standards, Open Formats, Open Documentation
https://commons.wikimedia.org/wiki/File:Postcards_and_magnifying_glass.jpg
A program is free software if the program's users have the four essential freedoms:
1. The freedom to run the program as you wish, for any purpose (freedom 0).
2. The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
3. The freedom to redistribute copies so you can help your neighbor (freedom 2).
4. The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
Free Software Foundation
Free Software Definition
https://www.fsf.org/licensing/essays/free-sw.html
What is Free Software?
What is Freedom?
https://commons.wikimedia.org/wiki/File:Beer_mug_transparent.png
“Free as in…”
Beer? Speech? Kitten?
http://www.freestockphotos.biz/stockphoto/9343
What is Freedom?
“It’s all 3… and that’s not a bad thing”
Community-based development
…and many others
Broad and active user community
Development Philosophy
Community-based development
• Standards-based
• Open source / Creative Commons
• Generalize specific use cases
• Include all features in public release
• Accept community improvements
• Iterative development via multiple contributions over subsequent releases
Bounty model of business
• Maintain: documentation, software, wiki
• Produce additional resources (e.g. videos, presentations, webinars)
• Participate in user forum
• Offer additional paid services
• Always include development in public project
Do one thing well…
Micro-services
Handshakes
Partnerships
Gears – Joe DeSousa. https://www.flickr.com/photos/mustangjoe/22711070429
Metal Handshake – Grey Geezer. https://commons.wikimedia.org/wiki/File:Metal_Handshake.jpg
Hands Passing Baton - tableatny, https://www.flickr.com/photos/53370644@N06/4976497160
Micro-services Architecture
Micro-services: Tools Used
bulk_extractor, ClamAV, Elasticsearch, ExifTool, FFmpeg, Fido, FITS, Gearman, gzip, ImageMagick, Inkscape, JHOVE, md5deep, MediaInfo, nfs-common, python-lxml, Siegfried, unar
…and more…
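The "do one thing well" design behind micro-services can be illustrated with a toy pipeline: each step is a small function with a single job, the steps run in a fixed order, every result is recorded, and processing stops at the first failure. All names here are hypothetical; in Archivematica itself, tasks are distributed through Gearman and the real micro-services call tools like those listed above, so treat this only as a model of the design, not of Archivematica's code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Transfer:
    name: str
    files: Dict[str, bytes]          # file name -> content (stand-in for real files)
    checksums: Dict[str, str] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

# Each hypothetical micro-service does one thing and reports success or failure.
def assign_checksums(t: Transfer) -> bool:
    t.checksums = {n: hashlib.sha256(b).hexdigest() for n, b in t.files.items()}
    return True

def reject_empty_files(t: Transfer) -> bool:
    return all(len(b) > 0 for b in t.files.values())

def verify_checksums(t: Transfer) -> bool:
    return all(hashlib.sha256(t.files[n]).hexdigest() == c
               for n, c in t.checksums.items())

PIPELINE: List[Callable[[Transfer], bool]] = [
    assign_checksums, reject_empty_files, verify_checksums,
]

def run(t: Transfer, pipeline: List[Callable[[Transfer], bool]] = PIPELINE) -> bool:
    """Run micro-services in order, recording each result; stop at the first failure."""
    for service in pipeline:
        ok = service(t)
        t.events.append(f"{service.__name__}: {'completed' if ok else 'failed'}")
        if not ok:
            return False
    return True

if __name__ == "__main__":
    t = Transfer("demo", {"a.txt": b"alpha", "b.txt": b""})
    print(run(t), t.events)
```

Because each step is independent, a step can be swapped for one that wraps a different tool without touching the others, which is the practical payoff of the micro-services approach.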
Handshakes: Integration, not duplication
AtoM, ArchivesSpace, Archivist’s Toolkit, Arkivum, Binder, CONTENTdm, Dataverse, DSpace, DuraCloud, Fedora, HPTrim, Hydra, Islandora, LOCKSS, OpenStack
Still from the film Color Blind - Photograph by Pui Shan Chan February 2009. https://en.wikipedia.org/wiki/File:Black_%26_White_Handshake_-_Still_from_the_film_Colour_Blind_(2009).JPG
Partnerships
Building Community Services Together
archivesDIRECT
• Partnership with DuraSpace
• U.S. based
• Launched August 2014
• Secure storage and monitoring via DuraCloud
• Artefactual provides AM technical support
http://archivesdirect.org/
Perpetua
• Partnership with Arkivum
• U.K. based
• Launched July 2016
• Secure storage and monitoring via Arkivum
• Artefactual provides AM technical support
http://arkivum.com/perpetua/
ArchivesCANADA Digital Preservation Service
• Partnership with The Canadian Council of Archives (CCA)
• Canada based
• Launched September 2016
• Artefactual provides AM technical support, storage, monitoring
http://archivescanada.ca/ACDPS
QUESTIONS?
Part 2: Intro to Archivematica
RESOURCES
AM homepage: https://www.archivematica.org
AM demo: http://sandbox.archivematica.org
Wiki: https://wiki.archivematica.org
Documentation: https://www.archivematica.org/docs/
Roadmap: https://wiki.archivematica.org/Development_roadmap:_Archivematica
Issue tracker: https://projects.artefactual.com/projects/Archivematica
Code repo: https://github.com/artefactual/Archivematica
Forum: https://groups.google.com/forum/#!forum/archivematica