Technical appraisal and change impact analysis - IDCC17 workshop
GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]
Simon Waddington (King’s College London)
Technical appraisal and change impact analysis
Appraisal
◦ Aims to determine which data should be kept by an organisation
◦ Traditionally performed prior to transfer to an archive
◦ Guided by policies based on defined criteria
Technical appraisal
◦ Evaluation of the (ongoing) feasibility of preserving the digital objects
◦ Answers the question “can we preserve?”
Technical appraisal
Simple digital objects
◦ E.g. files, software applications, operating systems
◦ Include hardware specification
Complex digital objects
◦ Digital objects made by combining a number of simple digital objects
Dependency
◦ Relationships between components of a complex digital object
◦ Functional relationship
Complex digital objects
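A minimal sketch of the idea above, assuming a complex digital object can be written down as a map from each component to the components it functionally depends on. The component names and the direction of the edges are illustrative assumptions, not taken from the PERICLES models.

```python
# Hypothetical dependency map: each key functionally depends on the listed components.
DEPENDS_ON = {
    "digital video artwork": ["digital video", "media player"],
    "digital video": ["video codec", "container"],
    "media player": ["operating system"],
    "operating system": ["computer"],
    "video codec": [],
    "container": [],
    "computer": [],
}


def all_dependencies(component, graph):
    """Return every component reachable from `component` via dependency edges."""
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    print(sorted(all_dependencies("digital video artwork", DEPENDS_ON)))
```

Walking the map this way lists everything whose loss could affect the object, which is the starting point for asking “can we preserve?”.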
Examples of complex digital objects
[Figure: two example dependency diagrams. Digital video artwork: digital video, video codec, container, media player, operating system, computer. Science experiment object: database, scripting language, image file, image viewer, document file, document viewer.]
Complex digital objects subject to changing external environment
◦ Technical appraisal required on an ongoing basis to support long term reuse
Reusability implies complex digital objects may need to be adapted
◦ Potential adaptations termed recovery options
◦ Significant properties – specify what features should be maintained
Main risk considered is availability
◦ Obsolescence
◦ Hardware failure
Change and reuse
Is this the Flying Scotsman?
◦ Cost of the restoration £4.5 million from 2006–2016
Authenticity
Digital video artwork
◦ Comprises videos and their surrounding technical environment
◦ Video codec, audio codec, subtitles, container, media player, operating system, computer, display
Mary – digital art conservator
◦ Supports acquisition decisions
◦ Maintains artworks for exhibition
◦ Has limited technical knowledge of video
◦ Has no control over the technologies used by artists
Artworks are required for ongoing display
◦ Adapt artwork to current technical environment
◦ Maintain viewing experience rather than use of specific technologies
◦ Potentially exist in multiple versions
Artworks may be maintained indefinitely
Media case study
Sow Farm by John Gerrard
Space science experiment
◦ Raw data captured by instrument, stored in database
◦ Scripts written by scientists to process raw data
◦ Image files and documents generated by scripts
Steve – space science data manager
◦ Responsible for maintaining data from multiple experiments
◦ Little or no control over the technologies used by scientists
◦ Large volumes of experiments to deal with
Examples
◦ Earth observation, solar measurements, material science, cell biology
◦ Often time-related and expensive/impossible to replicate
Reuse – continuing over long timeframes
◦ Compare performance of different instruments
◦ Compare processing techniques
◦ Determine long term trends e.g. in solar activity
◦ Deal with errors and anomalies
Science case study
What are the external risks to a complex digital object?
What are the proximity and impact of those risks and what are the recovery options?
Implementation of the chosen recovery option
Risk assessment process
Maintain inventory of artworks and components
◦ Video formats, players, operating systems etc.
Monitoring the external environment
◦ Aka preservation watch
◦ Monitors websites and external news sources
◦ Networks with fellow conservators
Technical analysis
◦ Records technical specifications of components
◦ Learns from practical experience of testing
Mary’s manual approach
External monitoring is time-consuming and unreliable
◦ E.g. QuickTime formats
Hard to plan forward
◦ Sudden unavailability of a component hard to predict rigorously
◦ May imply a large amount of work if a technology is used in many artworks
Compatibility of components
◦ Based on human experience rather than a systematic model
Difficulty in determining recovery options
◦ Time-consuming analysis and testing of many options
Problems for Mary
Large variety of scripting languages and formats used by scientists
◦ No control of the technologies used
Unable to warn scientists that their experiments may need to be updated to maintain reusability
Can’t support scientists who want to rerun a particular experiment
◦ E.g. provide information on website
Unfamiliar with older technologies
Problems for Steve
Normalisation
◦ Convert objects to one or more “long-lived” formats
◦ Performed systematically on all objects at acquisition
Problems
◦ Objects may be discarded before they require any adaptation
◦ Objects may already be sufficiently “future proof”
◦ May imply major re-engineering, whereas only minor changes are sufficient
◦ Could increase risks if wrong choices are made
Freezing
◦ E.g. virtualisation
◦ Software licensing, security and compliance issues
◦ May be impossible to source suitable hardware
◦ May not be acceptable to users e.g. scientists
Normalisation and freezing
Automated tool to assist in appraisal
Main features
◦ Automated harvesting of environmental data and trend analysis
◦ Pre-built domain models for digital video and space science experiments
◦ Collection-level risk, proximity and impact analysis
◦ Component-level risk, proximity and impact analysis
◦ Object-level analysis and determination of recovery options
Storage
◦ Tool creates a registry of objects
◦ Objects themselves are not stored in the tool
PERICLES Appraisal Tool
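A rough sketch of what collection-level proximity and impact analysis could look like over a registry of objects. It assumes proximity is the number of years until a component's predicted obsolescence and impact is the number of registered objects using that component. The registry contents, the dates and the function name are hypothetical and do not reflect the tool's actual data model.

```python
from collections import Counter

# Hypothetical registry: object name -> components it uses (objects themselves are not stored).
REGISTRY = {
    "artwork 1": ["codec A", "player X", "os 1"],
    "artwork 2": ["codec A", "player Y", "os 1"],
    "artwork 3": ["codec B", "player X", "os 2"],
}

# Hypothetical predicted obsolescence years per component.
PREDICTED_OBSOLESCENCE = {
    "codec A": 2019, "codec B": 2024,
    "player X": 2018, "player Y": 2022,
    "os 1": 2020, "os 2": 2025,
}


def collection_risk(registry, predicted, now):
    """Return (component, proximity in years, impact in objects), most urgent first."""
    impact = Counter(c for components in registry.values() for c in components)
    rows = [(c, predicted[c] - now, impact[c]) for c in impact]
    # Sort by nearest obsolescence first, then by the largest number of affected objects.
    return sorted(rows, key=lambda row: (row[1], -row[2]))


if __name__ == "__main__":
    for component, proximity, affected in collection_risk(REGISTRY, PREDICTED_OBSOLESCENCE, 2017):
        print(f"{component}: {proximity} years, {affected} objects affected")
```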
Applied in industries such as aviation
Determine availability of hardware components
Reliability engineering approach
Standardised lifecycle model for a technology
◦ Units shipped against time
Compute lifecycle curve from harvested data
◦ Software repositories e.g. commits and downloads
◦ Search engines
◦ Wikipedia
◦ Usage tracking data
◦ Social networks
Confidence measure
◦ Correlate results across different data sources
Calibration
◦ Compare results with known dates e.g. operating systems
Validation
◦ Operating systems have known end of support dates
◦ Predict start date from incomplete time series
Analysis of external environment
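A minimal sketch of the two calculations mentioned above: fitting a lifecycle curve to activity-over-time data and correlating two data sources as a confidence measure. It assumes a bell-shaped (Gaussian) lifecycle fitted by its mean and standard deviation, and treats the mean plus 2.5 standard deviations as an illustrative obsolescence threshold. Both the curve shape and the threshold are assumptions, not the tool's documented method, and the sample series are made up.

```python
import math


def fit_lifecycle(years, activity):
    """Estimate the peak year (mean) and spread (sigma) of a bell-shaped lifecycle."""
    total = sum(activity)
    mean = sum(y * a for y, a in zip(years, activity)) / total
    variance = sum(a * (y - mean) ** 2 for y, a in zip(years, activity)) / total
    return mean, math.sqrt(variance)


def obsolescence_year(mean, sigma, k=2.5):
    """Illustrative threshold: k standard deviations after the peak (assumed value)."""
    return mean + k * sigma


def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    years = [2008, 2009, 2010, 2011, 2012, 2013, 2014]
    downloads = [5, 20, 60, 90, 55, 25, 8]        # hypothetical repository data
    search_hits = [8, 25, 70, 85, 60, 20, 10]     # hypothetical search engine data
    mean, sigma = fit_lifecycle(years, downloads)
    print("predicted obsolescence:", round(obsolescence_year(mean, sigma), 1))
    print("agreement between sources:", round(pearson(downloads, search_hits), 3))
```

A high correlation between independent sources would raise confidence in the fitted curve; calibrating against known end-of-support dates would show whether the chosen threshold is reasonable.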
“Push forward” principle
[Figure: timeline chart, 2012 to 2024. Rows: video codec, container, media player, operating system, computer. Legend: current obsolescence, recovery option 1, recovery option 2, recovery option 3.]
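One possible reading of the “push forward” principle, sketched under the assumption that a complex object becomes at risk when its earliest component becomes obsolete, and that a recovery option replaces one or more components with longer-lived ones. All dates and option contents are hypothetical.

```python
def object_obsolescence(dates):
    """The object is at risk as soon as its earliest component becomes obsolete."""
    return min(dates.values())


def apply_recovery_option(dates, replacements):
    """Return a copy of the component dates with the replaced components updated."""
    updated = dict(dates)
    updated.update(replacements)
    return updated


# Hypothetical predicted obsolescence year per component.
CURRENT = {
    "video codec": 2018,
    "container": 2020,
    "media player": 2016,
    "operating system": 2017,
    "computer": 2019,
}

# Hypothetical recovery options: component -> obsolescence year of its replacement.
OPTION_1 = {"media player": 2021}
OPTION_2 = {"media player": 2021, "operating system": 2023}

if __name__ == "__main__":
    print("current:", object_obsolescence(CURRENT))
    for name, option in [("option 1", OPTION_1), ("option 2", OPTION_2)]:
        print(name, "->", object_obsolescence(apply_recovery_option(CURRENT, option)))
```

Different options push the date forward by different amounts, so each one corresponds to a different time horizon.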
Representation of the entities and dependencies
◦ OWL ontology
◦ Scope - decision about what to leave in and what to leave out
Layered model
◦ Domain-independent ontology (Linked Resource Model) to describe change
◦ Domain-dependent ontology – describes e.g. video components
Inherits from existing domain ontologies (e.g. CIDOC-CRM)
Modular
◦ Supports reuse in different applications
◦ Ontology design patterns
Ecosystem model
Describes the compatibility between instances
◦ E.g. media player X and video codec Y
Does not guarantee compatibility
◦ Recoverability options require testing and validation
◦ Enables alternatives to be excluded
Features
◦ Supports full and partial compatibility
◦ Instances added by hand – currently command line tool
◦ Needs to be updated over time
◦ Two prebuilt ontologies provided
Compatibility relations
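A small sketch of how recorded compatibility relations could be used to exclude alternatives. It assumes relations are stored as (player, codec) pairs marked full or partial. The table layout, the names and the function are hypothetical; in the tool itself these relations live in the ontology.

```python
from enum import Enum


class Compat(Enum):
    FULL = "full"
    PARTIAL = "partial"


# Hypothetical hand-entered relations between media players and video codecs.
COMPATIBILITY = {
    ("player X", "codec Y"): Compat.FULL,
    ("player X", "codec Z"): Compat.PARTIAL,
    ("player W", "codec Y"): Compat.FULL,
}


def candidates(players, codec, table):
    """Keep players with a recorded relation to the codec, full compatibility first.

    Players with no recorded relation are excluded; the rest still need testing.
    """
    found = [(p, table[(p, codec)]) for p in players if (p, codec) in table]
    return sorted(found, key=lambda item: item[1] is not Compat.FULL)


if __name__ == "__main__":
    print(candidates(["player W", "player X"], "codec Z", COMPATIBILITY))
```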
Reflects the cost of transforming entities of the same type
◦ E.g. change media player from MPlayer to Xine
Currently built by hand using command line tool
Needs to be adapted to specific context and updated over time
Transformation relations
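A sketch of how transformation costs might feed the choice of a replacement, assuming each relation is a (source, target) pair with a numeric cost. The cost values, the "player Z" entry and the function name are hypothetical.

```python
# Hypothetical hand-entered costs of switching between entities of the same type.
TRANSFORMATION_COST = {
    ("MPlayer", "Xine"): 2.0,
    ("MPlayer", "player Z"): 5.0,
}


def cheapest_replacement(current, costs):
    """Return (target, cost) for the cheapest recorded transformation, or None."""
    options = {target: cost for (source, target), cost in costs.items() if source == current}
    if not options:
        return None
    return min(options.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(cheapest_replacement("MPlayer", TRANSFORMATION_COST))
```

Because costs depend on local skills and infrastructure, the table would need to be adapted to each context and revisited over time.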
Use ontology to populate a probabilistic graphical model
◦ States are components in complex digital object
Exhaustive analysis very costly
◦ Apply a variation of Pearl’s Belief Propagation Algorithm
◦ Based on efficient message passing
Generate recovery options
◦ Correspond to different temporal constraints
Bayesian networks
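To illustrate why message passing is cheaper than exhaustive analysis, here is a sketch of a forward sum-product pass on a chain of binary variables, the simplest tree-shaped case of belief propagation. The variation used in the tool is not described in these slides, so this is not its algorithm. The chain (computer, operating system, media player), the meaning of the states and all probabilities are illustrative assumptions.

```python
from itertools import product


def chain_marginal(prior, cpts):
    """Forward sum-product pass along a chain; returns the marginal of the last node.

    prior[x] is P(first node = x); each cpt[x][y] is P(next node = y | current = x).
    """
    message = list(prior)
    for cpt in cpts:
        message = [sum(message[x] * cpt[x][y] for x in range(2)) for y in range(2)]
    return message


def brute_force_marginal(prior, cpts):
    """Same marginal by enumerating every joint state (exponential in chain length)."""
    result = [0.0, 0.0]
    for states in product(range(2), repeat=len(cpts) + 1):
        p = prior[states[0]]
        for i, cpt in enumerate(cpts):
            p *= cpt[states[i]][states[i + 1]]
        result[states[-1]] += p
    return result


# State 0 = unavailable, state 1 = available. All numbers are made up.
PRIOR_COMPUTER = [0.1, 0.9]
CPT_OS = [[1.0, 0.0], [0.2, 0.8]]        # OS given computer
CPT_PLAYER = [[1.0, 0.0], [0.1, 0.9]]    # media player given OS

if __name__ == "__main__":
    print(chain_marginal(PRIOR_COMPUTER, [CPT_OS, CPT_PLAYER]))
    print(brute_force_marginal(PRIOR_COMPUTER, [CPT_OS, CPT_PLAYER]))
```

Both functions give the same answer, but the message-passing version does work proportional to the chain length rather than to the number of joint states.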
Architecture of tool
Based on web services
Java – UI framework
Analysis components in Python and R
Triple store
◦ Fuseki or PERICLES ERMR
The technical appraisal tool is not a repository or archive
Central point is the ERMR (Entity Registry Model Repository)
Objects (composed of files, software, hardware descriptions)
◦ Retained across multiple storage systems
◦ Those storage systems may or may not be repositories or archives
Distributed storage
Model Impact Change Explorer (MICE)
◦ Visualisation tool using D3 JavaScript library
◦ Enables users to evaluate how a potential change to a resource will impact the overall ecosystem
◦ Changes described via “deltas”
◦ Uses PERSiST, an intermediate component for semantic interpretation of the DVA ontology
MICE Tool
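A sketch of the general idea of describing a change as a delta and working out what it impacts. It assumes the ecosystem is a set of subject-predicate-object triples and that a delta lists triples to remove and to add. This dictionary shape, the "dependsOn" predicate and the resource names are hypothetical and are not the LRM delta format.

```python
def apply_delta(triples, delta):
    """Return a new triple set with the delta's removals and additions applied."""
    return (set(triples) - set(delta["remove"])) | set(delta["add"])


def impacted(triples, changed, predicate="dependsOn"):
    """Resources that depend, directly or transitively, on any changed resource."""
    result = set()
    frontier = set(changed)
    while frontier:
        # Everything pointing at the current frontier that has not been seen yet.
        frontier = {s for (s, p, o) in triples if p == predicate and o in frontier} - result
        result |= frontier
    return result


# Hypothetical ecosystem and change.
ECOSYSTEM = {
    ("artwork", "dependsOn", "video file"),
    ("artwork", "dependsOn", "player X"),
    ("video file", "dependsOn", "codec A"),
}
DELTA = {
    "remove": [("video file", "dependsOn", "codec A")],
    "add": [("video file", "dependsOn", "codec B")],
}

if __name__ == "__main__":
    print("impacted by codec A change:", sorted(impacted(ECOSYSTEM, {"codec A"})))
    print("after accepting the change:", sorted(apply_delta(ECOSYSTEM, DELTA)))
```

Computing the impact before applying the delta mirrors the accept or reject step: the user sees what would be affected and only then is the change saved.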
MICE GUI
MICE-Appraisal Tool Integration
[Figure: workflow diagram. Components: Workflow Engine, PERSIsT API, Entity Registry Model Repository (ERMR), Technical Appraisal Tool. Arrow labels: retrieves dependencies and impact; forwards change (LRM delta); visualises impact; accepts / rejects change; saves change; recovery options; inserts new media / selects recovery option; returns user’s decision; sends change (RDF triples); retrieves dependencies and costs; writes recovery options.]
PERICLES Appraisal Tool
◦ Due for release in March 2017
◦ Release on GitHub
PERICLES MICE tool
◦ Available on GitHub at https://github.com/pericles-project/MICE
Licences
◦ Apache License Version 2.0, January 2004
◦ http://www.apache.org/licenses/
Availability and licences
Demonstrates automated decision support for technical appraisal
Data-driven approach to monitor environmental trends
Ecosystem model to capture technical information on dependencies
Integrated tools for presenting risk-impact analysis, impact visualisation and recoverability options
Conclusions