D.3.1: State of the Art - Linked Data and Digital Preservation

DESCRIPTION

By D. Giaretta (APARSEN), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu

TRANSCRIPT

Page 1: D.3.1: State of the Art - Linked Data and Digital Preservation

State of the Art

SUMMARY OF D3.1 STATE OF THE ART

D GIARETTA

Page 2: D.3.1: State of the Art - Linked Data and Digital Preservation

Outline
◦ Preservation – State of the Art
◦ Challenges for Linked Data
◦ Options
◦ Conclusions

Page 3: D.3.1: State of the Art - Linked Data and Digital Preservation

EC policy – a brief history – a personal view

EC support for DP research for creating digital objects

Data Digitisation

e-Infrastructure to Digital Agenda

National funding
◦ Significantly more than EC funding
◦ What is the EC role?

Page 4: D.3.1: State of the Art - Linked Data and Digital Preservation

DP research: approx 100M€ from EC

From “Research on Digital Preservation within projects co-funded by the European Union in the ICT programme”, 2011, Stephan Strodl et al. http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf

Page 5: D.3.1: State of the Art - Linked Data and Digital Preservation

Situation now

The digital preservation community has failed to persuade the EC that there is a need for more funding for DP research
◦ We do not have a consistent story about:
◦ Costs
◦ Rights
◦ Methods etc.
◦ “Emulate or Migrate” inadequate!
◦ Who is doing it right

Luxembourg unit which previously funded DP research – name changed to “Creativity” - now shows no funding for digital preservation research

EC expects results from the previous 100 M € research by deploying solutions

Page 6: D.3.1: State of the Art - Linked Data and Digital Preservation

Digital Preservation – some quotes:

Head of unit funding the Digital Preservation projects asked repeatedly:
◦ “Who pays and why?”

NSF colleague:
◦ “Digital preservation is like VAT – people don’t like it”

Page 7: D.3.1: State of the Art - Linked Data and Digital Preservation

Value pyramid

From Riding the Wave

Page 8: D.3.1: State of the Art - Linked Data and Digital Preservation

“The Digital Agenda for Europe outlines policies and actions to maximise the benefit of the digital revolution for all. Supporting research and innovation is a key priority of the Agenda, essential if we want to establish a flourishing digital economy.”

Neelie Kroes,

Vice-President of the EC, responsible for the Digital Agenda

Data is the new gold. “We have a huge goldmine… Let’s start mining it.” Neelie Kroes

That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.

Page 9: D.3.1: State of the Art - Linked Data and Digital Preservation

…but

Gold is precious because
◦ it is rare
◦ it does not combine with other elements
◦ it does not perish

…but…

Data is valuable because
◦ there is so much of it
◦ it is more valuable when it is combined together
◦ BUT it is far from imperishable

Role for Linked Data

Page 10: D.3.1: State of the Art - Linked Data and Digital Preservation

OR

Page 11: D.3.1: State of the Art - Linked Data and Digital Preservation

Preservation – State of the Art

Page 12: D.3.1: State of the Art - Linked Data and Digital Preservation

Problems when preserving data

Preserve?

Preserve what?

For how long?

How to test?

Which people?

Which organisations?

How well?

• Metadata? – What kind? How much?

Page 13: D.3.1: State of the Art - Linked Data and Digital Preservation

Difficulties in digital preservation

Many different terminologies

Many different views of preservation

Many different kinds of digital objects
◦ Documents
◦ Data
◦ …and new types of objects

Tools and Services
◦ Which ones work for which digital objects?
◦ Which tools/techniques fit together?
◦ How to integrate new tools?

Consistent training needed

Risks vs Cost

Who can you trust?

Need a consistent, coherent approach to digital preservation – APARSEN.

Need an Audit and Certification system – ISO 16363

OAIS – ISO 14721

Page 14: D.3.1: State of the Art - Linked Data and Digital Preservation

Preservation techniques

For each technique

look for evidence – what evidence?

must at least make sure we consider different types of data
◦ rendered vs non-rendered
◦ composite vs simple
◦ dynamic vs static
◦ active vs passive

must look at all types of threats

Page 15: D.3.1: State of the Art - Linked Data and Digital Preservation

Basic preservation activities

Libraries say:

“Emulate or migrate”

◦ Works well with data only in special cases
◦ Can repeat what was done before instead of new things
◦ Does not help with building cross-disciplinary communities

Emulate
• Can repeat what has been done before
BUT
• Cannot use new applications

Migrate
• Convert to format which new software can use
BUT
• What if there are many software systems?

Page 16: D.3.1: State of the Art - Linked Data and Digital Preservation

Contains numbers – need meaning


Page 17: D.3.1: State of the Art - Linked Data and Digital Preservation

...to be combined and processed to get this

[Figure: data at Level 0, Level 1 and Level 2, linked by processing and processing/combining steps]

Page 18: D.3.1: State of the Art - Linked Data and Digital Preservation

...or this


Page 19: D.3.1: State of the Art - Linked Data and Digital Preservation

OAIS Information model: Representation Information

The Information Model is key

Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)

Does not demand that ALL Representation Information be collected at once.

A process which can be tested
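The recursion described on this slide can be sketched in a few lines of Python. This is only an illustration: the node names are borrowed from the Rep Info Network slide that follows, but the edges between them, the function name `rep_info_needed` and the example knowledge bases are assumptions made for the example, not something stated in the talk.

```python
from typing import Dict, List, Set

# Hypothetical dependency network: each item maps to the Representation
# Information needed to understand it. The edges are illustrative only.
REP_INFO_NETWORK: Dict[str, List[str]] = {
    "FITS file": ["FITS standard", "FITS dictionary"],
    "FITS standard": ["PDF standard"],
    "FITS dictionary": ["Dictionary specification"],
    "Dictionary specification": ["XML specification"],
    "XML specification": ["Unicode specification"],
}


def rep_info_needed(item: str, knowledge_base: Set[str],
                    network: Dict[str, List[str]]) -> List[str]:
    """Return the Representation Information to collect for `item`.

    The recursion stops at anything the Designated Community already
    knows, so a larger knowledge base means less to collect.
    """
    needed: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        for dep in network.get(node, []):
            if dep in knowledge_base or dep in seen:
                continue  # recursion ends at the knowledge base
            seen.add(dep)
            needed.append(dep)
            visit(dep)

    visit(item)
    return needed


# Two hypothetical Designated Communities with different knowledge bases.
astronomers = {"FITS standard", "FITS dictionary"}
general_users = {"Unicode specification"}

print(rep_info_needed("FITS file", astronomers, REP_INFO_NETWORK))
print(rep_info_needed("FITS file", general_users, REP_INFO_NETWORK))
```

Because the result depends on the knowledge base passed in, the same function can be re-run when the community's knowledge changes, which is one way to read "a process which can be tested".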

Page 20: D.3.1: State of the Art - Linked Data and Digital Preservation

Rep Info Network

[Figure: network of Representation Information linking the following nodes]
◦ FITS FILE
◦ FITS STANDARD
◦ FITS DICTIONARY
◦ FITS JAVA SOFTWARE
◦ JAVA VM
◦ PDF STANDARD
◦ PDF SOFTWARE
◦ DICTIONARY SPECIFICATION
◦ XML SPECIFICATION
◦ UNICODE SPECIFICATION

Page 21: D.3.1: State of the Art - Linked Data and Digital Preservation

Additional technique: add Representation Information

Descriptions of the digitally encoded object

Ideal description allows a machine to extract information

Page 22: D.3.1: State of the Art - Linked Data and Digital Preservation

Migration

OAIS defines various types of Migration:

◦ Do not change the bits
  ◦ Refresh
  ◦ Replicate

◦ Change the packaging but not the content
  ◦ Repackage

◦ Change the content
  ◦ Transform (usually non-reversible)
  ◦ Need to consider “Transformational Information Properties” – important for AUTHENTICITY
  ◦ Related to “Significant properties”
  ◦ Add appropriate Representation Information for the new format
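The distinction between these migration types can be illustrated with a checksum comparison. The sketch below is an assumption-laden example, not part of OAIS or of the talk: the function name `verify_content`, the string labels and the three outcomes are invented here. It shows that a fixity check settles the question when the content bits are meant to stay the same, but says nothing useful after a Transform, where authenticity has to be argued from the agreed properties instead.

```python
import hashlib

# Migration types whose content bits are expected to stay identical.
CONTENT_PRESERVING = {"refresh", "replicate", "repackage"}


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest used here as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def verify_content(kind: str, before: bytes, after: bytes) -> str:
    """Classify the outcome of a migration of the content bits.

    Returns "ok" or "corrupted" for content-preserving migrations, and
    "review" for a transform, where a checksum cannot show authenticity.
    """
    if kind in CONTENT_PRESERVING:
        return "ok" if sha256_hex(before) == sha256_hex(after) else "corrupted"
    if kind == "transform":
        return "review"  # needs Transformational Information Properties
    raise ValueError(f"unknown migration type: {kind}")


original = b"1.0,2.5,3.7\n"
print(verify_content("replicate", original, bytes(original)))
print(verify_content("refresh", original, b"1.0,2.5,3.8\n"))
print(verify_content("transform", original, b"<row>1.0 2.5 3.7</row>"))
```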


Page 23: D.3.1: State of the Art - Linked Data and Digital Preservation

AND – be prepared to hand over

Preservation requires funding

Funding for a dataset (or a repository) may stop

Need to be ready to hand over everything needed for preservation
◦ OAIS (ISO 14721) defines “Archival Information Package” (AIP).
◦ Issues:
  ◦ Storage naming conventions
  ◦ Representation Information
  ◦ Provenance
  ◦ …
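As a rough illustration of what "everything needed" might mean, the sketch below models an AIP with the main parts named in the OAIS information model (Content Data Object, Representation Information, Preservation Description Information, Packaging Information) and lists what is still missing before a hand-over. The class, field and function names are assumptions chosen for this example; OAIS defines the concepts, not this data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Preservation Description Information categories named in OAIS.
PDI_CATEGORIES = ("reference", "provenance", "context", "fixity", "access_rights")


@dataclass
class ArchivalInformationPackage:
    """Simplified, hypothetical stand-in for an OAIS AIP."""
    content_data_object: bytes = b""
    representation_information: List[str] = field(default_factory=list)
    pdi: Dict[str, str] = field(default_factory=dict)
    packaging_information: str = ""


def handover_gaps(aip: ArchivalInformationPackage) -> List[str]:
    """List the parts that are empty and would block a clean hand-over."""
    gaps: List[str] = []
    if not aip.content_data_object:
        gaps.append("content data object")
    if not aip.representation_information:
        gaps.append("representation information")
    for category in PDI_CATEGORIES:
        if not aip.pdi.get(category):
            gaps.append(f"PDI: {category}")
    if not aip.packaging_information:
        gaps.append("packaging information")
    return gaps


aip = ArchivalInformationPackage(
    content_data_object=b"\x00\x01",
    representation_information=["FITS standard"],
    pdi={"provenance": "created 2014 by instrument team", "fixity": "sha256:..."},
)
print(handover_gaps(aip))
```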

Page 24: D.3.1: State of the Art - Linked Data and Digital Preservation

Preserving digitally encoded information

Ensure that digitally encoded information is understandable and usable over the long term

Long term could start at just a few years

Chain of preservation

Need to do something because things become “unfamiliar” over time

But the same techniques enable use of data which is “unfamiliar” right now

Page 25: D.3.1: State of the Art - Linked Data and Digital Preservation

When things change

We need to:

◦Know something has changed

◦ Identify the implications of that change

◦Decide on the best course of action for preservation

◦What RepInfo we need to fill the gaps

◦ Created by someone else or creating a new one

◦ If transformed: how to maintain data authenticity

◦Alternatively: hand it over to another repository

◦Make sure data continues to be usable

Supporting services and toolkits shown alongside these steps:
◦ Orchestration Service
◦ Gap Identification Service
◦ Preservation Strategy Toolkit
◦ RepInfo Registry Service
◦ Authenticity Toolkit
◦ Packaging Toolkit
◦ Data Virtualisation Toolkit
◦ Process Virtualisation Toolkit
◦ RepInfo Toolkit

Page 26: D.3.1: State of the Art - Linked Data and Digital Preservation

SCIDIP-ES

Services (run on remote servers):
◦ Storage Service
◦ Gap Identification Service
◦ Orchestration Service
◦ RepInfo Registry Service
◦ Persistent ID i/f Service

Toolkits (run on local machines):
◦ Preservation Strategy Toolkit
◦ Data Virtualisation Toolkit
◦ Process Virtualisation Toolkit
◦ Authenticity Toolkit
◦ Packaging Toolkit
◦ RepInfo Toolkit
◦ Finding Aid Toolkit
◦ Certification Toolkit

Also shown in the diagram:
◦ Cloud Storage
◦ External Access/Use Services
◦ External PI services
◦ ISO Certification Organisation

• These SUPPLEMENT what repositories do (customised for repositories)

• Make it easier for repositories to do preservation – share the effort

Page 27: D.3.1: State of the Art - Linked Data and Digital Preservation
Page 28: D.3.1: State of the Art - Linked Data and Digital Preservation

Preservation objectives

The same digital object may be preserved with different aims in mind by different repositories:

For a digital document
◦ Re-print the pages?
◦ To understand the numbers printed in the page to do further research

For a piece of performance art
◦ Replay a recording of a particular performance?
◦ Re-perform the work?

For a scientific data file
◦ Understand the numbers?
◦ Understand the numbers in the context of a particular theory?

Page 29: D.3.1: State of the Art - Linked Data and Digital Preservation

Preservation, Value and Re-use

(Re-)usability is the essential test for success of preservation
◦ Usability usually essential for justifying cost of preservation

Impossible to insist on common formats, semantics or software
◦ How to avoid the N² problem?

Impossible to know what formats, semantics or software will be used in future

Needs appropriate Representation Information
◦ for preservation (use in the future when things have become unfamiliar)
◦ for use now (use of unfamiliar data, i.e. most of it!)
◦ automated (re-)use as far as possible

APARSEN is bringing together a coherent, consistent, evidence-based approach to digital preservation involving tools, services, consultancy and training.
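The "N2" (N squared) problem mentioned on this slide can be made concrete with a little arithmetic. The sketch below assumes one common reading of the term: if every format must be converted directly into every other, the number of converters grows roughly with the square of the number of formats, whereas describing each format once grows linearly. The function names are hypothetical, and the comparison is an illustration rather than a claim made in the talk.

```python
def pairwise_converters(n: int) -> int:
    """Directed format-to-format converters needed among n formats."""
    return n * (n - 1)


def per_format_descriptions(n: int) -> int:
    """One description (e.g. Representation Information) per format."""
    return n


for n in (5, 20, 100):
    print(n, pairwise_converters(n), per_format_descriptions(n))
```

For 100 formats this gives 9900 converters against 100 descriptions, which is why the question of avoiding pairwise solutions matters.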

Page 30: D.3.1: State of the Art - Linked Data and Digital Preservation

Classification of objects

must at least make sure we consider different types of data
◦ rendered vs non-rendered
◦ composite vs simple
◦ dynamic vs static
◦ active vs passive

RDF Triple: dynamic/complex/non-rendered/passive

Page 31: D.3.1: State of the Art - Linked Data and Digital Preservation

Key questions about what is to be preserved

What is the object to be preserved?
◦ The specific piece of RDF?
◦ The specific RDF plus data pointed to?
◦ The underlying database (if any)?
◦ The whole linked “world”?

What are the preservation objectives?
◦ The RDF and whole inference system?
◦ Just the RDF?
◦ Just the underlying database (if any)?

Page 32: D.3.1: State of the Art - Linked Data and Digital Preservation

Key questions about RDF

What Representation Information is needed for the LD?
◦ Schema?
◦ Additional semantics?
◦ Evolution of links (e.g. replace this host by a new one)?
◦ Snapshots?

What Transformation?
◦ One version of RDF to another?
◦ Move to replacement for RDF?
◦ Change of underlying database?
◦ Authenticity?

Who to hand over to?
◦ What to do with the URIs? – maintain or change?
◦ What to do with the underlying database (if any)?
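The "replace this host by a new one" case can be sketched with the Python standard library. The example treats triples as plain tuples of strings and uses made-up hosts (`old.example.org`, `new.example.org`); the function names are hypothetical. It rewrites only URIs on the old host and returns the old-to-new mapping, on the assumption that such a record would be needed as evidence when authenticity is questioned after a hand-over.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, urlunsplit

Triple = Tuple[str, str, str]


def replace_host(term: str, old_host: str, new_host: str) -> str:
    """Rewrite the host of a URI; leave literals and other hosts alone."""
    parts = urlsplit(term)
    if parts.netloc != old_host:
        return term
    return urlunsplit(parts._replace(netloc=new_host))


def rehost_triples(triples: List[Triple], old_host: str,
                   new_host: str) -> Tuple[List[Triple], Dict[str, str]]:
    """Return rewritten triples plus a record of every URI that changed."""
    rewritten: List[Triple] = []
    mapping: Dict[str, str] = {}
    for triple in triples:
        new_triple = tuple(replace_host(t, old_host, new_host) for t in triple)
        for old, new in zip(triple, new_triple):
            if old != new:
                mapping[old] = new  # kept as evidence of the change
        rewritten.append(new_triple)
    return rewritten, mapping


triples = [
    ("http://old.example.org/dataset/1",
     "http://purl.org/dc/terms/title",
     "Sea surface temperature"),
]
new_triples, changes = rehost_triples(triples, "old.example.org", "new.example.org")
print(new_triples)
print(changes)
```

Whether the URIs should be changed at all, rather than maintained through redirects or persistent identifiers, is exactly the open question the slide raises; the sketch only shows what the "change" option involves.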

Page 33: D.3.1: State of the Art - Linked Data and Digital Preservation

Key questions about the things the RDF points to
◦ Will they be preserved?
◦ How to find the Representation Information?
◦ Will the Persistent Identifiers change?

Page 34: D.3.1: State of the Art - Linked Data and Digital Preservation

Joint Key Questions

Who will pay, and why?

For which things?

Are some things more valuable – and therefore more likely to be preserved?

What happens when some things disappear?

Page 35: D.3.1: State of the Art - Linked Data and Digital Preservation

Options
◦ Be clear about what is meant
◦ Understand what is possible
◦ Start with what is agreed as valuable
◦ Don’t promise too much

Page 36: D.3.1: State of the Art - Linked Data and Digital Preservation

Input to standards

See http://www.iso16363.org

Audit and Certification of Trustworthy repositories

Forum: OAIS Futures

Page 37: D.3.1: State of the Art - Linked Data and Digital Preservation

Conclusions

A great deal of funding (€100M) has been invested in digital preservation research by the EU

EC is not putting further funding into digital preservation research

There are technical challenges

The biggest challenge is to be clear about what the preservation aims are for Linked Data