Digital Preservation with Archivematica: An Introduction

Digital Preservation with Archivematica: An Introduction. Dan Gillean, Artefactual Systems. Western New York Library Resources Council webinar, February 15th, 2017.

Upload: artefactual-systems-archivematica

Post on 11-Apr-2017


TRANSCRIPT

Page 1: Digital Preservation with Archivematica: An Introduction

Digital Preservation with Archivematica: An Introduction

Dan Gillean, Artefactual Systems

Western New York Library Resources Council webinar

February 15th, 2017

Page 2: Digital Preservation with Archivematica: An Introduction

Today’s Agenda

1. A brief introduction to Digital Preservation
2. A brief introduction to the Archivematica project
3. A brief hands-on demo with Archivematica

https://pixabay.com/en/notebook-plan-dates-coffee-cup-1895164/

Page 3: Digital Preservation with Archivematica: An Introduction

Let’s talk digital preservation

ISO 14721:2002 ISO 16363:2012

Page 4: Digital Preservation with Archivematica: An Introduction

Standards allow us to communicate across space and time

https://pixabay.com/p-624054

Page 5: Digital Preservation with Archivematica: An Introduction

ISO 14721

A reference model – not a systems architecture!

https://wiki.archivematica.org/Overview

Page 6: Digital Preservation with Archivematica: An Introduction

Designated community:

• An identified group of potential Consumers who should be able to understand the preserved information

“Since a key purpose of an OAIS is to preserve information for a Designated Community, the OAIS must understand the Knowledge Base of its Designated Community to understand the minimum Representation Information that must be maintained.” (p. 2-4)

Page 7: Digital Preservation with Archivematica: An Introduction

Information Packages in OAIS

Content information
• Data object
• Representation Information

Preservation Description Information
• Provenance
• Context
• Reference
• Fixity
• Access Rights

Packaging information

Descriptive information

Page 8: Digital Preservation with Archivematica: An Introduction

Information Packages in OAIS

Content information
• The bits
• What we need to interpret the bits

Preservation Description Information
• What we need to manage the bits

Packaging information
• What we need to know about how it’s all put together

Descriptive information
• What we need to know to discover content in the package
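The four parts of an information package can be pictured as a small data structure. This is only an illustrative sketch: OAIS is a reference model and does not prescribe any serialization, so every class and field name below is an assumption made for this example, not something defined by ISO 14721 or by Archivematica.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    data_object: bytes                # "the bits"
    representation_information: str   # what we need to interpret the bits


@dataclass
class PreservationDescriptionInformation:
    # What we need to manage the bits (all names are hypothetical).
    provenance: str = ""
    context: str = ""
    reference: str = ""
    fixity: str = ""
    access_rights: str = ""


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescriptionInformation
    packaging_information: str = ""   # how it is all put together
    descriptive_information: dict = field(default_factory=dict)  # for discovery


bits = b"example payload"
package = InformationPackage(
    content=ContentInformation(bits, "plain text, ASCII encoded"),
    pdi=PreservationDescriptionInformation(
        reference="example-identifier-001",
        fixity=hashlib.sha256(bits).hexdigest(),
    ),
    packaging_information="single file, no container",
    descriptive_information={"title": "Example object"},
)
print(package.pdi.fixity)
```

The same shape applies whether the package is a SIP, an AIP, or a DIP; what changes is how complete each part is at that stage.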

Page 9: Digital Preservation with Archivematica: An Introduction

Information Packages in OAIS

SIP: Submission Information Package

AIP: Archival Information Package

DIP: Dissemination Information Package

Page 10: Digital Preservation with Archivematica: An Introduction

Okay… But what are we actually DOING when we “do” digital preservation?

City Light Substation located at 7th and Yesler. Item 3889, Engineering Department Photographic Negatives (Record Series 2613-07), Seattle Municipal Archives via https://www.flickr.com/photos/seattlemunicipalarchives/3749710573

Page 11: Digital Preservation with Archivematica: An Introduction

ISO 16363 reminds us that much of digital preservation readiness is not technical – it’s organizational

• Governance
• Organizational structure
• Staffing
• Procedural accountability
• Preservation policy framework
• Documentation
• Financial sustainability
• Security

Page 12: Digital Preservation with Archivematica: An Introduction

ISO 16363

ISO 16363 is divided into 3 main sections:

1. Organizational Infrastructure

2. Digital Object Management

3. Infrastructure and Security Risk Management

Page 13: Digital Preservation with Archivematica: An Introduction

• Preservation Planning: monitoring, risk analysis, planning for obsolescence of preservation formats
• Ingest: Q/A of SIPs, prep of AIPs
• Data Management: managing the descriptive info allowing for management and retrieval of content
• Archival Storage: storage, maintenance, and retrieval of AIPs
• Administration: overall operation, configuration of hard/software, standards compliance, policy and procedure
• Access: managing DIP generation and supply to consumers, reports, etc.

Page 14: Digital Preservation with Archivematica: An Introduction

Your digital preservation activities might include…

• Appraisal / selection
• Transfer / upload
• Virus scanning
• Checksum generation
• File identification
• Validation and characterization
• Metadata extraction
• SIP creation
• Ingest
• Normalization/migration or emulation planning
• AIP preparation
• DIP preparation
• Preservation
• AIP completion and validation
• Storage and integrity checking
• Administration and Management
• Preservation planning
• Auditing
• Rights management
• Storage
• Geo-redundancy
• Fixity checking
• Arrangement / description / cataloguing
• Publishing / display / exhibition
• Discovery / retrieval

Photograph of Women Working at a Bell System international Telephone Switchboard, https://catalog.archives.gov/id/1633445

Page 15: Digital Preservation with Archivematica: An Introduction

??????

ISO 16363

Page 16: Digital Preservation with Archivematica: An Introduction

Standards, Tools, and Services: A (highly selective) timeline

Timeline axis: 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016

Items shown: Fedora, PRONOM, DSpace, DROID, PREMIS, PAIMAS, Archivist’s Toolkit, Archon, Hydra, Islandora, Archivematica, BitCurator, Blacklight, DPN, Exactly, veraPDF, Siegfried, Brunnhilde, CollectiveAccess, WARCBase, Avalon, METS, OAIS, LOCKSS, JHOVE, Archive-It, TRAC, AtoM, ArchivesSpace, BagIt, PCDM, DuraCloud, ArchivesDirect, Perpetua, ACDPS, Rosetta, Dataverse, PBCore, iRODS, OwnCloud, RODA, Preservica, ePADD, QCTools, EAD, Webrecorder, MediaConch

Page 17: Digital Preservation with Archivematica: An Introduction

COPTR: Community Owned digital Preservation Tool Registry

http://coptr.digipres.org/Category:Tools

Page 18: Digital Preservation with Archivematica: An Introduction

How do we know where to start???

http://www.desk7.net/wallpapers.aspx?typeid=8589

Page 19: Digital Preservation with Archivematica: An Introduction

Know what you have

https://commons.wikimedia.org/wiki/File:Magnifying_glass_-_Faberge.jpg

Page 20: Digital Preservation with Archivematica: An Introduction

Embrace Openness

• Open Source
• Open Standards
• Open Formats
• Open documentation

https://pixabay.com/en/key-keychain-close-up-123554/

Page 21: Digital Preservation with Archivematica: An Introduction

Explore what others are doing and connect with them

https://commons.wikimedia.org/wiki/File:Tower_Optical_Binoculars.jpg

Page 22: Digital Preservation with Archivematica: An Introduction

Start an internal audit

Bryan Mason, “Monthly Check up.” https://www.flickr.com/photos/b-may/361018310

Page 23: Digital Preservation with Archivematica: An Introduction

ISO 16363 is the gold standard for auditing a trustworthy digital repository… but don’t be intimidated, or feel like you need certification to be doing something useful.

For a more accessible breakdown of 16363, see Kara Van Malssen’s slides from PASIG NYC 2016:

https://figshare.com/articles/How_I_learned_to_stop_worrying_and_love_ISO_16363/4055661

Page 24: Digital Preservation with Archivematica: An Introduction

NDSA Levels of Preservation

Storage and Geographic Location
Level 1 (Protect):
• 2 complete copies not collocated
• Get media off diverse storage media and into a system
Level 2 (Know):
• At least 3 complete copies
• At least 1 in different geographic location
• Document storage system, media, and what’s needed to use them
Level 3 (Monitor):
• At least 1 copy in location with different disaster threat
• Obsolescence monitoring process for storage system and media
Level 4 (Repair):
• At least 3 copies in locations with different disaster threats
• Comprehensive plan to keep files and metadata on currently accessible media or systems

File Fixity and Data Integrity
Level 1 (Protect):
• Fixity check on ingest if checksum provided with content
• Create fixity info if not provided on transfer
Level 2 (Know):
• Check fixity on all ingests
• Use write-blockers with original media
• Virus check high-risk content
Level 3 (Monitor):
• Fixity checks at regular intervals
• Maintain fixity logs and supply audit on demand
• Virus check all content
• Ability to detect corrupt data
Level 4 (Repair):
• Check fixity in response to specific events/activities
• Ability to replace/repair corrupted data
• Ensure no one has write access to all copies

Information Security
Level 1 (Protect):
• Identify who has read, write, move, and delete authorizations
• Restrict who has those authorizations to individual files
Level 2 (Know):
• Document access restrictions for content
Level 3 (Monitor):
• Maintain logs of who performed what actions on files, incl. deletions and preservation actions
Level 4 (Repair):
• Perform audit of logs

Metadata
Level 1 (Protect):
• Inventory of content and its storage locations
• Ensure backup and non-collocation of inventory
Level 2 (Know):
• Store admin metadata
• Store transformative metadata and log events
Level 3 (Monitor):
• Store standard technical and descriptive metadata
Level 4 (Repair):
• Store standard preservation metadata

File Formats
Level 1 (Protect):
• Encourage creators to use open formats and codecs when possible
Level 2 (Know):
• Inventory of file formats in use
Level 3 (Monitor):
• Monitor file format obsolescence issues
Level 4 (Repair):
• Perform format migrations, emulation, etc. as needed

Adapted from: http://ndsa.org/activities/levels-of-digital-preservation/
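The File Fixity and Data Integrity row can be made concrete with a short sketch: create fixity information when none was supplied, then re-check it later. This is a minimal illustration using only the Python standard library, not how Archivematica itself implements fixity checking. The function names and the choice of SHA-256 are assumptions for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_fixity_info(root: Path) -> dict[str, str]:
    """Level 1: create fixity info for every file under root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, recorded: dict[str, str]) -> list[str]:
    """Levels 2-3: return relative paths that are missing or have changed."""
    problems = []
    for rel_path, expected in recorded.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Running `check_fixity` on a schedule and keeping its results as a log would cover the "regular intervals" and "maintain fixity logs" criteria. Repairing a reported file from another copy is the Level 4 step and is out of scope for this sketch.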

Page 25: Digital Preservation with Archivematica: An Introduction

AVPreserve – 16363/NDSA mappings

NDSA Levels of Preservation – Categories | Quantity of NDSA Levels of Preservation Criteria | Quantity of related ISO 16363 Criteria
Storage and Geographic Location | 9 | 34
File Fixity and Data Integrity | 12 | 29
Information Security | 5 | 22
Metadata | 6 | 50
File Formats | 4 | 32
(Unmappable from ISO 16363) | - | 23

Blog post: https://www.avpreserve.com/papers-and-presentations/mapping-standards-for-richer-assessments-ndsa-levels-of-digital-preservation-and-iso-163632012/

Mappings: https://www.avpreserve.com/wp-content/uploads/2016/05/ISO-Requirements-by-NDSA-LoDP-Categories.xlsx

Slides: http://www.avpreserve.com/wp-content/uploads/2014/07/NDSA_ISO_Presentation_2014.pdf

Page 26: Digital Preservation with Archivematica: An Introduction

Drupal TRAC Review tool

https://wiki.archivematica.org/Internal_audit_tool

Page 27: Digital Preservation with Archivematica: An Introduction

Drupal TRAC Review tool

Page 28: Digital Preservation with Archivematica: An Introduction

Drupal TRAC Review tool

https://wiki.archivematica.org/Internal_audit_tool

Page 29: Digital Preservation with Archivematica: An Introduction

Drupal TRAC Review tool

https://wiki.archivematica.org/Internal_audit_tool

Page 30: Digital Preservation with Archivematica: An Introduction

Drupal TRAC Review tool

https://wiki.archivematica.org/Internal_audit_tool

Page 31: Digital Preservation with Archivematica: An Introduction

DPM Workshop’s 5 Organizational Stages

1. Acknowledge: Understanding that digital preservation is a local concern;
2. Act: Initiating digital preservation projects;
3. Consolidate: Seguing from projects to programs;
4. Institutionalize: Incorporating the larger environment; and
5. Externalize: Embracing inter-institutional collaboration and dependency.

http://dx.doi.org/10.3998/spobooks.bbv9812.0001.001

http://www.dpworkshop.org/

Page 32: Digital Preservation with Archivematica: An Introduction

Digital Preservation Capability Maturity Model

http://www.securelyrooted.com/dpcmm

Page 33: Digital Preservation with Archivematica: An Introduction

Pick a(ny) tool and play with it

Biser Todorov, “Tools.” https://commons.wikimedia.org/wiki/File:Rusty_tools.JPG

Page 34: Digital Preservation with Archivematica: An Introduction

POWRR Tool Grid on COPTR

http://www.digipres.org/tools/

Page 35: Digital Preservation with Archivematica: An Introduction

The Magic Wand VS The Black Box

Understand what software can and can’t do for you

https://en.wikipedia.org/wiki/Monolith_(Space_Odyssey)#/media/File:African_monolith_2001.jpg

http://it.harrypotter.wikia.com/wiki/Incantesimo_di_Disarmo

Page 36: Digital Preservation with Archivematica: An Introduction

Participate and contribute

https://en.wikipedia.org/wiki/File:Sheridan_classroom.jpg

Page 37: Digital Preservation with Archivematica: An Introduction

Seek out stakeholders and build your case

Unique Hotels, “Board Room - Vihula Manor Country Club & Spa.” https://www.flickr.com/photos/62485988@N05/5692789910

Page 38: Digital Preservation with Archivematica: An Introduction

DPC Digital Preservation Business Case Toolkit

http://wiki.dpconline.org/index.php?title=Digital_Preservation_Business_Case_Toolkit

Page 39: Digital Preservation with Archivematica: An Introduction

DPC Digital Preservation Business Case Toolkit

http://wiki.dpconline.org/index.php?title=Additional_resources

Page 40: Digital Preservation with Archivematica: An Introduction

Share your successes… and your failures

https://commons.wikimedia.org/w/index.php?curid=31154812

Page 41: Digital Preservation with Archivematica: An Introduction

QUESTIONS?

Part 1: Intro to Digital Preservation

Page 42: Digital Preservation with Archivematica: An Introduction

Meet Archivematica (hello world!)

Page 43: Digital Preservation with Archivematica: An Introduction

What is Archivematica?

Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content.

• Standards based
• Open source
• Customizable
• Integrated with 3rd party systems
• Active community

Page 44: Digital Preservation with Archivematica: An Introduction

Archivematica’s development

Timeline, 2007 to 2014

2007: UNESCO report. Bradley, K., Lei, J., Blackall, C. Towards An Open Source Archival Repository and Preservation System (2007)

Planning and development begin. Initial funding via UNESCO MotW Subcommittee, IMF Archives, City of Vancouver Archives

Releases shown: 0.1-alpha, 0.6-alpha, 0.7, 0.8, 0.9, 0.10, 1.0 (released!)

Milestones shown: Dashboard introduced, PREMIS in METS, Storage Service 0.2

Dates shown: February 2010, May 2010, February 2011, February 2012, August 2012, April 2013, January 2014

Page 45: Digital Preservation with Archivematica: An Introduction

Standards-based

ISO 14721 (OAIS)

PREMIS METS BagIt

Page 46: Digital Preservation with Archivematica: An Introduction

• It captures technical information about an object in order to support the implementation of preservation strategies such as normalization, migration or emulation (PREMIS Object)

• It describes relationships between digital objects (PREMIS Object)

• It provides an audit trail of actions taken by the digital preservation repository to preserve the object (PREMIS Event)

• It names the individuals, organizations and software tools responsible for taking actions to preserve digital objects (PREMIS Agent)

• It specifies the actions a repository is allowed to take to preserve digital objects (PREMIS Rights)

PREMIS

PREMIS, or Preservation Metadata Implementation Strategies, is the recognized standard for metadata about objects in a digital preservation system.
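To make the audit-trail idea concrete, here is a rough sketch of a PREMIS Event built with Python's standard library. It is a simplified illustration under several assumptions: the PREMIS 3 namespace is used (the version in use at a given repository may differ), the output is not validated against the PREMIS schema, and the helper names and the example values are made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from uuid import uuid4

# Assumption: PREMIS version 3 namespace.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_type: str, outcome: str, agent_name: str) -> ET.Element:
    """Build a minimal PREMIS-style event: what happened, when, result, by whom."""
    event = ET.Element(q("event"))
    identifier = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(identifier, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(identifier, q("eventIdentifierValue")).text = str(uuid4())
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = (
        datetime.now(timezone.utc).isoformat()
    )
    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome
    # Link to the PREMIS Agent responsible for the action.
    agent = ET.SubElement(event, q("linkingAgentIdentifier"))
    ET.SubElement(agent, q("linkingAgentIdentifierType")).text = "software name"
    ET.SubElement(agent, q("linkingAgentIdentifierValue")).text = agent_name
    return event


event = build_event("fixity check", "pass", "example-checksum-tool")
print(ET.tostring(event, encoding="unicode"))
```

One such event per action (virus scan, fixity check, normalization, and so on) is what builds up the audit trail described in the bullets above.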

Page 47: Digital Preservation with Archivematica: An Introduction

• It provides a wrapper for other metadata, such as PREMIS and Dublin Core.

• It defines relationships between digital objects and other digital objects, and between digital objects and their metadata.

• It can be used to provide technical metadata about digital objects (although Archivematica doesn’t implement it that way: we wrap PREMIS in it instead)

METS, or Metadata Encoding and Transmission Standard, was designed to support inter-repository data exchange.

METS

Page 48: Digital Preservation with Archivematica: An Introduction

• Originally developed for exchange between California Digital Library and Library of Congress; specifications written up by IETF in 2008

• System agnostic, interoperable format for storage and exchange

• “Bag and tag” approach: mandatory tag file contains a manifest listing every file in the payload together with its corresponding checksum

BagIt

BagIt is a hierarchical file packaging format designed to support disk-based or network-based storage and transfer of arbitrary digital content.
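The “bag and tag” idea is small enough to sketch by hand: a `data/` payload directory, a `bagit.txt` declaration, and a manifest listing each payload file with its checksum. The sketch below is an assumption-laden illustration and not a complete implementation of the specification. The function names are made up, SHA-256 and version 0.97 are choices for this example, and an established BagIt library is the better option in practice.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under data/ plus bagit.txt and a SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        checksum = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{rel_path}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )


def validate_bag(bag_dir: Path) -> bool:
    """Return True if every file listed in the manifest exists and matches."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        checksum, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            return False
        if hashlib.sha256(target.read_bytes()).hexdigest() != checksum:
            return False
    return True
```

Note that this check only covers files named in the manifest. A fuller validator would also flag payload files that the manifest does not mention.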

Page 49: Digital Preservation with Archivematica: An Introduction
Page 50: Digital Preservation with Archivematica: An Introduction

Archivematica AIP structure

• Packaged according to BagIt specifications
• PREMIS in METS XML
• Virus scan, normalization report, extraction log, etc.
• For browsing in Archivematica
• Original + normalized objects, submission docs, original metadata included at SIP creation

Page 51: Digital Preservation with Archivematica: An Introduction

PREMIS in METS

METS SECTIONS

<metsHdr>  METS header
<dmdSec>  Descriptive metadata
<amdSec>  Administrative metadata
<fileSec>  File section
<structMap>  Structural Map

<mets:amdSec>
  <mets:techMD>      PREMIS: OBJECT
  <mets:rightsMD>    PREMIS: RIGHTS
  <mets:digiprovMD>  PREMIS: EVENT
  <mets:digiprovMD>  PREMIS: AGENT
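The amdSec layout on this slide can be read back out of a METS file with a few lines of Python. The sample document below is a hand-written, heavily trimmed stand-in and not real Archivematica output: its IDs are hypothetical and its `xmlData` elements are left empty where the PREMIS content would go. The function name is likewise an assumption for this sketch.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, trimmed sample: real files carry PREMIS XML inside xmlData.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec ID="amdSec_1">
    <mets:techMD ID="techMD_1">
      <mets:mdWrap MDTYPE="PREMIS:OBJECT"><mets:xmlData/></mets:mdWrap>
    </mets:techMD>
    <mets:rightsMD ID="rightsMD_1">
      <mets:mdWrap MDTYPE="PREMIS:RIGHTS"><mets:xmlData/></mets:mdWrap>
    </mets:rightsMD>
    <mets:digiprovMD ID="digiprovMD_1">
      <mets:mdWrap MDTYPE="PREMIS:EVENT"><mets:xmlData/></mets:mdWrap>
    </mets:digiprovMD>
    <mets:digiprovMD ID="digiprovMD_2">
      <mets:mdWrap MDTYPE="PREMIS:AGENT"><mets:xmlData/></mets:mdWrap>
    </mets:digiprovMD>
  </mets:amdSec>
</mets:mets>"""


def premis_sections(mets_xml: str) -> list[tuple[str, str]]:
    """List (METS section name, wrapped metadata type) for each amdSec child."""
    root = ET.fromstring(mets_xml)
    found = []
    for amd_sec in root.findall("mets:amdSec", METS_NS):
        for section in amd_sec:
            wrap = section.find("mets:mdWrap", METS_NS)
            if wrap is not None:
                local_name = section.tag.split("}", 1)[1]
                found.append((local_name, wrap.get("MDTYPE")))
    return found


for section_name, md_type in premis_sections(SAMPLE_METS):
    print(section_name, md_type)
```

The point of the exercise is the mapping itself: METS supplies the containers (`techMD`, `rightsMD`, `digiprovMD`) and PREMIS supplies what goes inside them.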

Page 52: Digital Preservation with Archivematica: An Introduction

Embracing Openness

• Open Source
• Open Standards
• Open Formats
• Open documentation

https://commons.wikimedia.org/wiki/File:Postcards_and_magnifying_glass.jpg

Page 53: Digital Preservation with Archivematica: An Introduction

What is Free Software?

A program is free software if the program's users have the four essential freedoms:

1. The freedom to run the program as you wish, for any purpose (freedom 0).
2. The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
3. The freedom to redistribute copies so you can help your neighbor (freedom 2).
4. The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

Free Software Foundation, Free Software Definition

https://www.fsf.org/licensing/essays/free-sw.html

Page 54: Digital Preservation with Archivematica: An Introduction

What is Freedom?

https://commons.wikimedia.org/wiki/File:Beer_mug_transparent.png

“Free” as in…

Beer? Speech?

Kitten?

http://www.freestockphotos.biz/stockphoto/9343

Page 55: Digital Preservation with Archivematica: An Introduction

What is Freedom?

It’s all 3… and that’s not a bad thing

Page 56: Digital Preservation with Archivematica: An Introduction

Community-based development

…and many others

Page 57: Digital Preservation with Archivematica: An Introduction

Broad and active user community

Page 58: Digital Preservation with Archivematica: An Introduction

Development Philosophy

Community-based development
Bounty model of business

• Standards-based
• Open source / Creative Commons
• Generalize specific use cases
• Include all features in public release
• Accept community improvements
• Iterative development via multiple contributions over subsequent releases
• Maintain: documentation, software, wiki
• Produce additional resources (e.g. videos, presentations, webinars)
• Participate in user forum
• Offer additional paid services
• Always include development in public project

Page 59: Digital Preservation with Archivematica: An Introduction

Do one thing well…

• Micro-services
• Handshakes
• Partnerships

Gears – Joe DeSousa. https://www.flickr.com/photos/mustangjoe/22711070429

Metal Handshake – Grey Geezer. https://commons.wikimedia.org/wiki/File:Metal_Handshake.jpg

Hands Passing Baton - tableatny, https://www.flickr.com/photos/53370644@N06/4976497160

Page 60: Digital Preservation with Archivematica: An Introduction

Micro-services Architecture

Page 61: Digital Preservation with Archivematica: An Introduction

Micro-services Architecture

Page 62: Digital Preservation with Archivematica: An Introduction

Micro-services: Tools Used

• Bulk_extractor
• Clam AV
• Elasticsearch
• ExifTool
• Ffmpeg
• Fido
• FITS
• Gearman
• gzip
• Imagemagick
• Inkscape
• JHOVE
• md5deep
• MediaInfo
• NFS-common
• Python-lxml
• Siegfried
• unar

…and more…

Page 63: Digital Preservation with Archivematica: An Introduction

Handshakes:

Integration not duplication

Page 64: Digital Preservation with Archivematica: An Introduction

Handshakes:

• AtoM
• ArchivesSpace
• Archivist’s Toolkit
• Arkivum
• Binder
• CONTENTdm
• Dataverse
• DSpace
• DuraCloud
• Fedora
• HPTrim
• Hydra
• Islandora
• LOCKSS
• OpenStack

Still from the film Color Blind - Photograph by Pui Shan Chan February 2009. https://en.wikipedia.org/wiki/File:Black_%26_White_Handshake_-_Still_from_the_film_Colour_Blind_(2009).JPG

Page 65: Digital Preservation with Archivematica: An Introduction

Partnerships

Building Community Services Together

Page 66: Digital Preservation with Archivematica: An Introduction

archivesDIRECT

• Partnership with DuraSpace
• U.S. Based
• Launched August 2014
• Secure storage and monitoring via DuraCloud
• Artefactual provides AM technical support

http://archivesdirect.org/

Page 67: Digital Preservation with Archivematica: An Introduction

Perpetua

• Partnership with Arkivum
• U.K. Based
• Launched July 2016
• Secure storage and monitoring via Arkivum
• Artefactual provides AM technical support

http://arkivum.com/perpetua/

Page 68: Digital Preservation with Archivematica: An Introduction

ArchivesCANADA

Digital Preservation Service

• Partnership with The Canadian Council of Archives (CCA)
• Canada Based
• Launched September 2016
• Artefactual provides AM technical support, storage, monitoring

http://archivescanada.ca/ACDPS

Page 69: Digital Preservation with Archivematica: An Introduction

QUESTIONS?

Part 2: Intro to Archivematica

Page 70: Digital Preservation with Archivematica: An Introduction

RESOURCES

AM homepage: https://www.archivematica.org

AM demo: http://sandbox.archivematica.org

Wiki: https://wiki.archivematica.org

Documentation: https://www.archivematica.org/docs/

Page 71: Digital Preservation with Archivematica: An Introduction

RESOURCES

Roadmap: https://wiki.archivematica.org/Development_roadmap:_Archivematica

Issue tracker: https://projects.artefactual.com/projects/Archivematica

Code repo: https://github.com/artefactual/Archivematica

Forum: https://groups.google.com/forum/#!forum/archivematica