Towards a Common Approach for Access to Digital Archival Records in Europe. Alex Thirifays and...

iPRES, 12th International Conference on Digital Preservation, University of North Carolina at Chapel Hill. Alex Thirifays, Danish National Archives (DNA). E-ARK: European Archival Records and Knowledge Preservation. Towards a Common Approach for Access to Digital Archival Records in Europe

TRANSCRIPT


THE E-ARK PROJECT IS CO-FUNDED BY THE EUROPEAN COMMISSION UNDER THE ICT-PSP PROGRAMME

www.eark-project.eu

What’s the ambition of E-ARK?

Overall goal: Create an open-source, full-fledged digital archive with
• Common workflows and terminology
• Common formats (SAD-IP: SIP, AIP and DIP)
• Common tools
• Solution will be: scalable, computational, modular, robust, and adaptable

Common methods
• Common framework using international standards, e.g. OAIS, PREMIS, METS, PAIS…
• Reuse of existing software (e.g. ICA-AtoM) and formats (e.g. SIARD)
• Open Source, GitHub, etc.

Different content types
• Databases, geodata, Electronic Records Management Systems (ERMS), individual computer files, and Online Analytical Processing (OLAP)

Who and what?

These designated communities…
• Producers
• Archives
• Consumers

Need…
• Everything but images (e.g. database archiving, geodata)
• User friendliness
• Uniformity; reduction of number of tools. Savings!
• Exchange. Is E-ARK the first step of a common European infrastructure? What’s next?

Get…
• The Reference Implementation, which is

Reference implementation (diagram)

Stages: Ingest; Archival Storage; Access

Labels: Archival records; Content and Records Management Systems; SIP Creation Tools; E-ARK SIP; SIP-AIP Conversion; E-ARK AIP; Digital preservation systems; Scalable Computation; AIP-DIP Conversion; E-ARK DIP; CMIS Interface; Data Mining Interface; Archival Search, Access and Display Tools; Data Mining Showcase

Scope
• SIP: Package prepared by Pre-Ingest (WP3)
• AIP: Package created for long-term archive (WP4)
• DIP: Package created for access (WP5, Danish National Archives)

‘Access’ work package: main working areas and method (diagram)

Method: User needs; Requirements specification & DIP format; Best practices

Access Tools: Search Interface; Order Management; AIP-DIP transformation; DIP modification; End-user access to requested archives
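The AIP-DIP transformation and DIP modification steps named on this slide could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the E-ARK tooling: the `AIP` and `DIP` classes, their fields, and the functions `aip_to_dip` and `modify_dip` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AIP:
    """Hypothetical, heavily simplified archival information package."""
    identifier: str
    descriptive_metadata: dict
    files: dict  # relative path -> content bytes


@dataclass
class DIP:
    """Hypothetical, heavily simplified dissemination information package."""
    source_aip: str
    created: datetime
    descriptive_metadata: dict
    files: dict
    status: str = "in preparation"
    change_log: list = field(default_factory=list)


def aip_to_dip(aip: AIP, requested_paths: list) -> DIP:
    """Derive a DIP that holds only the files a user ordered from an AIP."""
    missing = [p for p in requested_paths if p not in aip.files]
    if missing:
        raise KeyError(f"not in AIP {aip.identifier}: {missing}")
    dip = DIP(
        source_aip=aip.identifier,
        created=datetime.now(timezone.utc),
        # Copy the metadata so later DIP edits never touch the AIP.
        descriptive_metadata=dict(aip.descriptive_metadata),
        files={p: aip.files[p] for p in requested_paths},
    )
    dip.change_log.append(f"derived from {aip.identifier}")
    return dip


def modify_dip(dip: DIP, drop_path: str, reason: str) -> None:
    """DIP modification: remove one file (e.g. restricted) and log the change."""
    del dip.files[drop_path]
    dip.change_log.append(f"removed {drop_path}: {reason}")


aip = AIP(
    "aip-001",
    {"title": "Example series"},
    {"data/a.xml": b"<a/>", "data/b.xml": b"<b/>", "data/c.xml": b"<c/>"},
)
dip = aip_to_dip(aip, ["data/a.xml", "data/b.xml"])
modify_dip(dip, "data/b.xml", "access restriction")
dip.status = "ready for delivery"
print(sorted(dip.files), dip.status)
```

The AIP itself is never changed here: the DIP is a derived copy, so the preserved package stays intact while the access copy is trimmed for delivery.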

The GAP analysis
• Examine landscape of current access solutions
• Examine user needs for access solutions
• Compare those and create a GAP analysis

Findings from GAP analysis: User requirements

Overall, users’ needs are not met very well!

• Content data type coverage (databases!): Must bridge!
• Integration of Access services: Must bridge!
• Metadata and search quality: Must bridge!
• Usability (& exploitation): Must bridge!

The Access process

The use cases

DIP & tool requirements (MoSCoW: M = must have, S = should have)

Req. no | Requirement description | Use case | MoSCoW
23 | The DIP must allow for the inclusion of any descriptive metadata from the AIP | UC4.2 | M
24 | The DIP must allow for the inclusion of any relevant descriptions of access conditions and restrictions | UC4.2 | M
25 | The DIP must allow for the inclusion of any relevant technical metadata about its content | UC4.2 | M
26 | The DIP must allow the use of any relevant metadata standards within it | UC4.2 | M
27 | The DIP must include the date and time of its creation | UC4.2 | M
28 | The DIP must allow the inclusion of data of any type or format | UC4.2 | M
29 | The DIP must include information which allows its validation and authentication by the user | UC4.2 | M
30 | The DIP should include relevant information about the context and provenance of the package (i.e. the position in the archival hierarchy, reference to the creator and archives) | UC4.2 | S
31 | The DIP should allow for including / logging information about any changes done to the IP during ingest (SIP), preservation (AIP) or access preparation (DIP) | UC4.3 | S
32 | The DIP should include information about its current status in the DIP preparation workflow (as an example, whether the DIP is ready for delivery or still being modified) | UC4.3 | S
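As a rough illustration of how such requirements might be checked mechanically, the sketch below tests a DIP manifest against the four requirements that demand that something is actually present (27, 29, 30, 32). The "must allow" requirements constrain what the format can express and are not checked here. The manifest layout, its key names, and the `check_dip` function are assumptions made for this example; they are not part of the E-ARK DIP format.

```python
import hashlib

# (requirement number, MoSCoW priority, hypothetical manifest key)
REQUIREMENTS = [
    (27, "M", "created"),     # date and time of creation
    (29, "M", "checksums"),   # information for validation by the user
    (30, "S", "provenance"),  # context and provenance of the package
    (32, "S", "status"),      # status in the DIP preparation workflow
]


def check_dip(manifest: dict) -> dict:
    """Return the unmet requirement numbers, grouped by MoSCoW priority."""
    unmet = {"M": [], "S": []}
    for number, priority, key in REQUIREMENTS:
        # A missing key or an empty value both count as "not included".
        if not manifest.get(key):
            unmet[priority].append(number)
    return unmet


content = b"<a/>"
manifest = {
    "created": "2016-01-15T10:30:00Z",
    "checksums": {"data/a.xml": hashlib.sha256(content).hexdigest()},
    "status": "in preparation",
    # "provenance" is deliberately left out
}

unmet = check_dip(manifest)
deliverable = not unmet["M"]  # only unmet "must" requirements block delivery
print(unmet, deliverable)
```

Grouping by priority mirrors the MoSCoW column: an unmet "should" is reported but does not block delivery, whereas an unmet "must" does.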

Adaptation to local contexts: Pilot requirements

Metadata requirements

Examination of metadata standards:

• Categorization of metadata elements to enable comparison of different standards

• Quantification of elements to produce a detailed impression of the coverage of each standard

Result: METS, PREMIS, EAD, EAC-CPF, INSPIRE, SIARD, MoReq
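The categorize-then-quantify method described above can be sketched as follows. The category names, the two placeholder standards ("StandardA", "StandardB"), and their elements are purely illustrative assumptions; they do not reproduce the project's actual categorization or any figures for the standards listed.

```python
from collections import Counter

# Hypothetical comparison categories.
CATEGORIES = ["descriptive", "technical", "rights", "provenance"]

# Illustrative placeholders only: element name -> assigned category.
ELEMENT_CATEGORIES = {
    "StandardA": {
        "title": "descriptive",
        "creator": "descriptive",
        "format": "technical",
        "accessRule": "rights",
    },
    "StandardB": {
        "checksum": "technical",
        "fileSize": "technical",
        "event": "provenance",
    },
}


def coverage(elements: dict) -> dict:
    """Count how many elements of one standard fall into each category."""
    counts = Counter(elements.values())
    return {category: counts.get(category, 0) for category in CATEGORIES}


# One row per standard, one column per category.
table = {name: coverage(elements) for name, elements in ELEMENT_CATEGORIES.items()}
for name, row in table.items():
    print(name, row)
```

Because every standard is reduced to counts over the same categories, the rows become directly comparable, and a zero marks a category the standard does not cover.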

E-ARK DIP data model

E-ARK DIP folder structure

DIP Software component overview

Take-up and sustainability…

• Access attracts increasing attention and funding. For example, public authorities need access to their own records. This is why the national archives of Sweden and Norway are in the process of creating so-called ‘middle archives’ that cater for these needs.

• Archives need database archiving. Over the coming 5 years, the Danish National Archives will ingest around 100 TB of data per year, most of which are databases. There is no reason to believe that public authorities in other countries generate less data.

• Data mining. Exploitation of data is sought after. However, there’s a conflict between confidentiality and access. Will it be solved by EU initiatives like Scrutiny?

Take-up and sustainability…

• The common IP format will
– facilitate the exchange of information packages and standardize the search for them and within them
– reduce the number of tools needed in the archival community, and thus their development and maintenance costs

• Pilots will prove the concept, which is the main strength of the E-ARK project regarding take-up

• Flexible workflows, micro-services and open source will cater for adaptability to local needs and longevity

…and a glimpse into the future

• Finalisation of the DIP format (January 2016)
• Pilot release of E-ARK Access tools (April 2016)
• Functionality show-off at iPRES, Bern (2016)
• Final release of E-ARK Access tools (January 2017)

Beyond the E-ARK project there’s the possibility of building a common, international archival infrastructure based on the E-ARK IP formats. It would allow people everywhere to search in all kinds of archival repositories, exploiting data in new ways and opening the doors for new research, clever journalism, more efficient public administration, and, not least, new business possibilities for private companies.

No, it’s not Linked Open Data, and never will be, but it smells like it

Thank you, iPRES!

Questions?

Alex Thirifays