PREMIS at the British Library. Markus Enders, The British Library. PREMIS Implementation Fair, San Francisco, CA, 07 October 2009

Page 1: PREMIS at the British Library

PREMIS at the British Library

Markus Enders, The British Library

PREMIS Implementation Fair, San Francisco, CA

07 October 2009

Page 2: PREMIS at the British Library

General

• Archival Information Package (AIP)
  • The AIP is just a conceptual entity
  • Conceptual (generic) data model
• Content files are stored on write-once media
• Content files may be containerized (stored in ZIP or WARC files)
  • One or more containers per AIP; files in a container may belong to various AIPs
• AIP descriptor: a METS file describes the content of the AIP
  • Structure, files, descriptive metadata, preservation metadata
• Different METS profiles for different content streams
  • eJournals, newspapers (born digital and digitized), web archiving
• Common underlying document model for all AIPs

Page 3: PREMIS at the British Library

METS Descriptor

What is stored in the METS descriptor?
• Structure of the document (logical and physical, in different structMaps)
  • Not all content streams have two structMaps (born-digital streams have only one)
• Descriptive metadata
• File section
  • Defines container files as well as content files (nested <file> elements)
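As an illustration of the nested <file> idea, here is a minimal Python sketch that builds a file section with one container file wrapping two content files. The IDs, file names, storage URL, and the use of OWNERID for the original file name are hypothetical choices, not taken from the British Library profiles.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_file_sec():
    """Return a fileSec holding one container file with two nested content files."""
    file_sec = ET.Element(f"{{{METS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp")
    # Outer <file>: the container (a hypothetical ZIP) with its storage location.
    container = ET.SubElement(
        grp, f"{{{METS}}}file", ID="FILE_CONTAINER_1", MIMETYPE="application/zip"
    )
    ET.SubElement(
        container,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK}}}href": "file:///store/container1.zip"},
    )
    # Nested <file> elements: the content files held inside the container.
    for n, name in enumerate(["article.pdf", "article.xml"], start=1):
        ET.SubElement(container, f"{{{METS}}}file", ID=f"FILE_CONTENT_{n}", OWNERID=name)
    return file_sec


print(ET.tostring(build_file_sec(), encoding="unicode"))
```

Nesting keeps the container and its members in one place in the descriptor, so a content file can be traced to the container that physically holds it.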

Page 4: PREMIS at the British Library

METS Descriptor

What is stored in the METS descriptor?
• Structure of the document (logical and physical, in different structMaps)
  • Not all content streams have two structMaps (born-digital streams have only one)
• Descriptive metadata
• File section
  • Defines container files as well as content files (nested <file> elements)
• Preservation metadata
  • Preservation metadata for files and representations

Page 5: PREMIS at the British Library

METS Descriptor

What is stored in the METS descriptor?
• Preservation metadata
  • Preservation metadata for files and representations (<mets:file> elements and <mets:div> elements)
  • Focuses on:
    • Audit trail: events and agents
    • Technical metadata: basic technical metadata in METS and PREMIS
  • Assumption: future migrations of files will be necessary
  • No emulation considered; no environment information stored

Page 6: PREMIS at the British Library

Preservation Metadata (PREMIS) in METS

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output
• Newspapers: uses PREMIS 2.0; MODS 3.3; METS 1.8
• Web archiving: uses PREMIS 2.0; MODS 3.3; DC; METS 1.8

Page 7: PREMIS at the British Library

Preservation Metadata (PREMIS): eJournal content stream

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output

AIP model:
• One AIP per article, issue, journal, digital manifestation
• Any change leads to a new AIP; the old version of the AIP is referenced

Page 8: PREMIS at the British Library

Preservation Metadata (PREMIS): eJournal content stream

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output

AIP model:
• One AIP per article, issue, journal, digital manifestation
• Journal, issue, article: the AIP consists just of a METS descriptor (mainly embedded descriptive metadata (MODS) and preservation metadata)
  • PREMIS: regarded as representations of intellectual entities
  • Relationships between representations are recorded in the MODS record

Page 9: PREMIS at the British Library

Preservation Metadata (PREMIS): eJournal content stream

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE DTD

AIP model:
• One AIP per article, issue, journal, manifestation
• Digital manifestation: the AIP consists of content files and a METS descriptor. The METS descriptor contains PREMIS records for the files and one for the digital manifestation itself
  • The relationship to the article is recorded in the PREMIS record (manifestationOf)
  • The relationship to the submission is recorded in PREMIS (containedInSubmission)
• Submission: received content files in a ZIP (one AIP)

Page 10: PREMIS at the British Library

Preservation Metadata (PREMIS) and METS: eJournal content stream

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output

amdSec:
• One amdSec per PREMIS record; referenced from <mets:file> and <mets:div> elements
• Use of <premis:object>, <premis:agent>, <premis:event> elements

techMD:
• Extracted data from JHOVE (files)
• PREMIS record of a file

digiprovMD:
• PREMIS record of representations (journal, issue, article)
• PREMIS record of a file
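The link from a <mets:file> or <mets:div> element to its amdSec can be followed through the METS ADMID attribute, which holds a space-separated list of IDs. A minimal Python sketch of that lookup follows. The function name is hypothetical, and it assumes references point at the amdSec itself or at one of its direct children.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"


def resolve_admid(mets_root, element):
    """Return the administrative metadata elements referenced by element's ADMID."""
    by_id = {}
    for sec in mets_root.iter(f"{{{METS}}}amdSec"):
        # Index the amdSec and its direct children (techMD, digiprovMD, ...).
        for candidate in [sec, *list(sec)]:
            if candidate.get("ID"):
                by_id[candidate.get("ID")] = candidate
    # ADMID is a whitespace-separated list of ID references; unknown IDs are skipped.
    return [by_id[ref] for ref in element.get("ADMID", "").split() if ref in by_id]
```

The same function works for <mets:div> elements, since they carry ADMID in the same way.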

Page 11: PREMIS at the British Library

Preservation Metadata (PREMIS) and METS: eJournal content stream

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output

PREMIS elements used:
• objectIdentifier
• objectCategory
• preservationLevel
• size
• fixity (MD5, SHA-512)
• format (PRONOM)
• Relationships, events and agents where necessary

Page 12: PREMIS at the British Library

Preservation Metadata (PREMIS) and METS: eJournal content stream

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output

PREMIS elements used:
• objectIdentifier
• objectCategory
• preservationLevel
• size
• fixity (MD5, SHA-512)
• format (PRONOM)
• Relationships, events and agents where necessary

Some of these are stored redundantly in the METS <file> element
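The size and fixity values listed above can be gathered in a single pass over a file. The following is a minimal Python sketch using the standard library, not the British Library's ingest code. The function name and the shape of the returned dictionary are assumptions made for illustration.

```python
import hashlib
import os


def describe_file(path, algorithms=("md5", "sha512")):
    """Return the size in bytes and one hex digest per algorithm for a file."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so large content files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {
        "size": os.path.getsize(path),
        "fixity": {name: digest.hexdigest() for name, digest in digests.items()},
    }
```

The same values could feed both the PREMIS size and fixity elements and the SIZE, CHECKSUM and CHECKSUMTYPE attributes of the METS <file> element, which is one way the redundancy mentioned on the slide could arise.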

Page 13: PREMIS at the British Library

Preservation Metadata (PREMIS): relationships

PREMIS relationships:
• manifestationOf (between manifestation and article)
• containedInSubmission (between manifestation and submission)

PREMIS relationships (between files: m-n relationships):
• migration
• uncompression
• modification

Relationships are always stored in <digiprovMD>
PREMIS records for files will have techMD and digiprovMD

Page 14: PREMIS at the British Library

Preservation Metadata (PREMIS): events

PREMIS events (on file level):
• integrityCheck
• formatIdentification
• validation
• wellformness
• propertyExtraction

PREMIS events (on representation level):
• metadataUpdate

Relationships are always stored in <digiprovMD>
PREMIS records for files will have techMD and digiprovMD

Page 15: PREMIS at the British Library

Preservation Metadata (PREMIS): events

• PREMIS events always have an agent
• Events and agents are stored in each PREMIS record:
  • If an event affects more than one object, it must be repeated in each object's PREMIS record, using the same identifier to indicate that it is the same event.
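The rule that a shared event is repeated in every affected record under one identifier can be sketched as a small data model. The Python below is illustrative only. The class names, the UUID-based identifier, and the agent string are assumptions, not the British Library's implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Event:
    identifier: str
    event_type: str
    agent: str  # every event carries an agent


@dataclass
class PremisRecord:
    object_id: str
    events: list = field(default_factory=list)


def record_event(records, event_type, agent):
    """Store a copy of one event in each affected record, sharing one identifier."""
    identifier = str(uuid.uuid4())
    for record in records:
        # Each record gets its own copy, so it stays self-contained.
        record.events.append(Event(identifier, event_type, agent))
    return identifier


first, second = PremisRecord("file-1"), PremisRecord("file-2")
shared_id = record_event([first, second], "integrityCheck", "hypothetical-ingest-tool")
```

Because the copies share an identifier, a consumer reading several records can recognise them as one event without any cross-record pointer.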

Page 16: PREMIS at the British Library

Preservation Metadata (PREMIS) in METS

Content streams:
• eJournals: uses PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE DTD
• Newspapers: uses PREMIS 2.0; MODS 3.3; METS 1.8
• Web archiving: uses PREMIS 2.0; MODS 3.3; DC; METS 1.8

• Move to PREMIS 2.0
• Changes to AIP model

Page 17: PREMIS at the British Library

AIPs and PREMIS 2.0

Change of AIP:
• Newspapers need a second structMap (and structLink)
  • Hierarchy of AIPs no longer possible
  • Instead: one AIP per issue
• Manifestations are modelled as a <fileGrp> (various manifestations per AIP possible)
• Support of container files (ZIP, WARC)
  • Modelled as nested <file> elements; no PREMIS record for container files
• No file-format-specific technical metadata is captured

Page 18: PREMIS at the British Library

METS and PREMIS 2.0

METS and PREMIS 2.0:
• Use of new METS schema versions: <mets:mdWrap MDTYPE="PREMIS:OBJECT">
• <premis:object xsi:type="premis:file"> instead of objectCategory
• Just use <digiprovMD>
  • Agent, object, event in separate <digiprovMD> elements within the same <amdSec>
  • The PREMIS record should be self-contained
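The layout described on this slide, with object, event and agent each in its own <digiprovMD> inside one <amdSec>, can be sketched as a skeleton. The Python below only builds the wrapper structure: the PREMIS elements are left empty and would not validate as they stand, and the ID scheme is a hypothetical choice.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def build_amd_sec(amd_id):
    """Return an amdSec with separate digiprovMD wrappers for object, event, agent."""
    amd = ET.Element(f"{{{METS}}}amdSec", ID=amd_id)
    for kind in ("object", "event", "agent"):
        prov = ET.SubElement(amd, f"{{{METS}}}digiprovMD", ID=f"{amd_id}_{kind}")
        # MDTYPE names the PREMIS entity held in this wrapper, e.g. PREMIS:OBJECT.
        wrap = ET.SubElement(prov, f"{{{METS}}}mdWrap", MDTYPE=f"PREMIS:{kind.upper()}")
        data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
        node = ET.SubElement(data, f"{{{PREMIS}}}{kind}")  # empty placeholder element
        if kind == "object":
            # PREMIS 2.0: the object category is expressed through xsi:type.
            node.set(f"{{{XSI}}}type", "premis:file")
    return amd


print(ET.tostring(build_amd_sec("AMD_FILE_1"), encoding="unicode"))
```

Keeping all three wrappers in the same amdSec is what lets a single ADMID reference from a file pull in the complete record.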

Page 19: PREMIS at the British Library

METS and PREMIS 2.0

Extended list of event types:
• deselection: files which are defined in the AIP descriptor but never ingested (no FLocat element)
• metadataExtraction vs. propertyExtraction

Extended list of relationship types (relationshipSubType):
• modification vs. manipulation
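Files that are described but have no FLocat element can be found mechanically in a descriptor. The Python sketch below makes one assumption that the slides do not confirm: a file nested inside a container counts as located when the enclosing container has an FLocat. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def deselection_candidates(mets_root):
    """Return IDs of <mets:file> elements with no FLocat on themselves or an ancestor file."""
    candidates = []

    def walk(file_el, located):
        # A file is located if it, or any enclosing container file, has an FLocat.
        located = located or file_el.find("mets:FLocat", NS) is not None
        if not located:
            candidates.append(file_el.get("ID"))
        for child in file_el.findall("mets:file", NS):
            walk(child, located)

    for grp in mets_root.iter(f"{{{METS}}}fileGrp"):
        for file_el in grp.findall("mets:file", NS):
            walk(file_el, False)
    return candidates
```

A report like this could be compared against the recorded deselection events to check that every unlocated file is accounted for.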

Page 21: PREMIS at the British Library

METS and PREMIS 2.0

Problems:
• Validation
  • Using controlled vocabularies
  • Considering dependencies between METS and PREMIS
• Standardized workflow for creating METS and PREMIS for all content streams
  • Currently specific implementations for each content stream
• Extending the AIP model
  • Preservation metadata for metadata records

Page 22: PREMIS at the British Library

Thanks

Markus Enders

The British Library

[email protected]