DRS 2 Metadata Migration

June 25, 2013

Agenda

• Introduction
• Preliminary results - content analysis
• Metadata options
• Next steps
• Questions

INTRODUCTION

Reason for metadata migration

• Different data model
  – File -> Object (a coherent set of content that is considered a single intellectual unit for purposes of description, use and/or management: for example a particular book, web harvest, serial or photograph.)
• Different metadata schemas
  – Many locally-defined -> community-standard
• Different packaging of metadata
  – Use of METS in some cases -> consistent use of METS

Path to metadata migration

Stages, in order:

• Analysis
  – Metadata
  – Content
  – Users
• Prototype
  – Proof-of-concept
  – Time estimates
• Migration plan
  – Sequence
  – Schedule
• Develop tools
  – Dashboard
  – Object builders
• Metadata migration

(A "We are here" marker is shown on the diagram.)

Key feedback points

(Path diagram repeated: Analysis -> Prototype -> Migration plan -> Develop tools -> Metadata migration, annotated with the feedback points below.)

• Technical options
• Process options
• Timing

Next 3 months

(Path diagram repeated with a "Next 3 months" marker.)

What does it involve?

• Aggregate DRS1 files into objects
  – Different object types = content models
• Generate an object descriptor per object

Document example

• PDF file
• Descriptor file

-> New object (content model = DOCUMENT)

Still image example

• Archival master image file
• Production master image file
• Deliverable image file
• Descriptor file

-> New object (content model = STILL IMAGE)

Aggregate DRS1 files into objects

• One content file per object
  – Color profile
  – Document
  – Google document container 1
  – Google document container 2
  – Google document container 3
  – Opaque container
  – Text
• Multiple content files per object
  – Audio
  – Web harvest
  – Biomedical image
  – PDS document
  – Target image
  – MOA2
  – Still image
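The split between single-file and multi-file content models can be sketched as a grouping step. This is only an illustration, not the DRS tooling: the `Drs1File` record, its `group_key` field (whatever links related files, such as a shared master) and the upper-case model names are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Content models from the slides that hold exactly one content file per object.
SINGLE_FILE_MODELS = {
    "COLOR PROFILE", "DOCUMENT", "GOOGLE DOCUMENT CONTAINER 1",
    "GOOGLE DOCUMENT CONTAINER 2", "GOOGLE DOCUMENT CONTAINER 3",
    "OPAQUE CONTAINER", "TEXT",
}


@dataclass(frozen=True)
class Drs1File:
    """Hypothetical record for one DRS1 file; field names are assumptions."""
    file_id: int
    content_model: str
    group_key: str  # assumed link between related files (e.g. a shared master)


def aggregate(files):
    """Group files into objects keyed by (content model, object key)."""
    objects = defaultdict(list)
    for f in files:
        if f.content_model in SINGLE_FILE_MODELS:
            key = (f.content_model, f"file-{f.file_id}")  # one object per file
        else:
            key = (f.content_model, f.group_key)  # related files share an object
        objects[key].append(f)
    return dict(objects)


files = [
    Drs1File(1, "DOCUMENT", "a"),
    Drs1File(2, "STILL IMAGE", "img-7"),  # archival master
    Drs1File(3, "STILL IMAGE", "img-7"),  # production master
    Drs1File(4, "STILL IMAGE", "img-7"),  # deliverable
]
objects = aggregate(files)
```

Here the PDF becomes its own DOCUMENT object, while the three image files collapse into a single STILL IMAGE object.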

Generate object descriptors

• METS format
  – Embedded schemas (PREMIS, MODS, MIX, etc.)
• Metadata sources
  – DRS1 database
  – DRS1 METS files where they exist
  – Examining the content files
  – Catalog records?
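As a rough idea of what a generated descriptor could look like, the sketch below builds a bare METS skeleton with Python's standard library. It is an assumption-laden illustration rather than the DRS2 descriptor profile: the helper name `build_descriptor`, the identifier scheme and the choice of attributes are hypothetical, and the embedded PREMIS, MODS and MIX sections are omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_descriptor(object_id, label, content_model, files):
    """Build a minimal METS tree; `files` is a list of (href, mimetype) pairs.

    Hypothetical helper: real descriptors would also embed PREMIS, MODS,
    MIX, etc., which this sketch leaves out.
    """
    root = ET.Element(
        f"{{{METS_NS}}}mets",
        {"OBJID": object_id, "LABEL": label, "TYPE": content_model},
    )
    file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": content_model})
    for i, (href, mimetype) in enumerate(files, start=1):
        file_id = f"FILE{i}"
        file_el = ET.SubElement(
            file_grp, f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        # Point the structural map at the file just declared.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return root


descriptor = build_descriptor(
    "obj-1", "Sample report", "DOCUMENT", [("report.pdf", "application/pdf")]
)
xml_text = ET.tostring(descriptor, encoding="unicode")
```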

PRELIMINARY RESULTS: CONTENT ANALYSIS

Preliminary content analysis

• Conceptually “built” objects for 13 of 14 content models (~36 million of 44 million files)
  – All but still image
  – Order helps!

(Chart labels: Still Image, MOA2, Biomedical Image, PDS Document)

Preliminary content analysis

• 1,091,670 objects from 36,190,120 files
  – ~33 files per object

• Relatively few surprises but content analysis is not complete
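The figures above can be checked with a few lines of arithmetic. The 44 million total is the rounded number from the earlier slide, so the coverage share is approximate.

```python
objects_built = 1_091_670
files_in_objects = 36_190_120
total_files_approx = 44_000_000  # rounded total quoted on the earlier slide

# Average number of DRS1 files folded into each conceptual object.
files_per_object = files_in_objects / objects_built

# Approximate share of all DRS1 files covered by the 13 analyzed content models.
share_covered = files_in_objects / total_files_approx

print(f"{files_per_object:.1f} files per object")
print(f"{share_covered:.0%} of files covered")
```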

Content cleanup

• MOA2 files (8,024)
• Index maps (2,686)
• Entity files (1)
• Merged PDS descriptors (22,203)
• Orphaned target image (5), target description files (4)
• Orphaned audio files (71)

METADATA OPTIONS

DRS1 -> DRS2 metadata placement (diagram)

DRS1
• FILE INFO: billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
• METS: Object Label, MODS, PDS info, etc.

DRS2
• FILE INFO: accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
• OBJECT INFO: billingCode, ownerCode, owner-suppliedName, caption unit name, view text
• DESCRIPTOR: Object Label, object-level MODS

Objects

• Owner supplied name is required
• Need to generate during migration
• Four cases
  – A METS file exists
  – New object will be built from a single content file
  – New object will be built from multiple content files
  – No OSN (potential case)
• Proposal for most cases:
  – add prefix or suffix to METS or content file owner supplied name
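The four cases and the suffix proposal can be sketched as a small function. Everything specific here is an assumption for illustration: the `_object` suffix, the choice of the first content file's name in the multi-file case, and returning `None` to flag the no-OSN case for manual handling.

```python
from typing import Optional, Sequence

OBJECT_SUFFIX = "_object"  # hypothetical suffix; the slides do not fix a value


def object_osn(mets_osn: Optional[str], content_osns: Sequence[str]) -> Optional[str]:
    """Derive an owner supplied name (OSN) for a migrated object."""
    if mets_osn:
        # Case 1: a METS file exists, so reuse its OSN.
        return mets_osn + OBJECT_SUFFIX
    names = [n for n in content_osns if n]
    if names:
        # Cases 2 and 3: single or multiple content files.
        # Assumption: with multiple files, the first file's OSN is used.
        return names[0] + OBJECT_SUFFIX
    # Case 4: no OSN available; leave it for manual resolution.
    return None


from_mets = object_osn("hollis_12345_METS", [])
from_single = object_osn(None, ["report_2013"])
from_many = object_osn(None, ["img_am", "img_pm", "img_dl"])
missing = object_osn(None, [])
```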

Objects

• Other required object elements
  – insertionDate
    • date of earliest file?
  – captionBehavior
    • for existing objects, set based on billing code
    • prospectively, set by depositor
  – viewText
    • available for all objects, not just PDS
    • default to off
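A minimal sketch of filling these required elements during migration, assuming the open question on insertionDate is answered with "earliest file date". The billing code and caption values below are placeholders, not real DRS values.

```python
from datetime import date
from typing import Dict, Iterable


def required_object_elements(
    file_dates: Iterable[date],
    billing_code: str,
    caption_by_billing_code: Dict[str, str],
    default_caption: str,
) -> dict:
    """Return defaults for the required object elements of a migrated object."""
    return {
        # Assumption: use the earliest insertion date among the object's files.
        "insertionDate": min(file_dates),
        # Existing objects: caption behavior is looked up from the billing code.
        "captionBehavior": caption_by_billing_code.get(billing_code, default_caption),
        # viewText applies to every object and defaults to off.
        "viewText": False,
    }


elements = required_object_elements(
    [date(2009, 5, 1), date(2007, 3, 14)],
    "BILLING.A",                          # placeholder billing code
    {"BILLING.A": "caption-setting-a"},   # placeholder mapping
    default_caption="caption-default",
)
```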

Objects

• Descriptive metadata
  – Take MODS from existing METS as is or import new
    • From Aleph
    • From Finding Aid
  – If re-imported, update METS label or not?
  – Import from OLIVIA based on owner supplied name for the file?

Objects from existing METS

• Identifiers for Harvard metadata
  – Identify finding aid identifiers
  – Convert “Old HOLLIS” numbers
  – Aleph IDs: include check digit or not?
  – Convert to URIs or actionable URNs from plain IDs
    • Could DRS format such URIs for new DRS2 input?

Objects from existing METS

• PDS elements
  – PDF owner text becomes caption unit name
  – viewOcr function becomes viewText
  – goto function will be automatically determined by presence of structMap/div attributes
• Caption behavior
  – for existing objects, set by billing code

Files

• Run automated processes to identify, validate and characterize file technical characteristics

• Extract technical metadata

Files

• isFirstGenerationinDrs
  – Values: yes, no, unspecified
  – Should we supply “yes” for archival masters and/or top of derivation chain?
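One possible answer to the question above can be sketched as a rule. This shows only one option under discussion, not a decision: the role value `ARCHIVAL_MASTER` and the `derived_from` field are hypothetical names.

```python
from typing import Optional

ARCHIVAL_MASTER = "ARCHIVAL_MASTER"  # hypothetical role value


def first_generation_value(role: str, derived_from: Optional[str]) -> str:
    """Pick one of the allowed values: "yes", "no" or "unspecified".

    Assumed rule: archival masters and files at the top of a derivation
    chain (no recorded source file) get "yes"; everything else stays
    "unspecified" because migration cannot tell where it was generated.
    """
    if role == ARCHIVAL_MASTER or derived_from is None:
        return "yes"
    return "unspecified"


master_value = first_generation_value(ARCHIVAL_MASTER, None)
derivative_value = first_generation_value("DELIVERABLE", "file-42")
```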

Image Files

• Converting from local scheme to MIX
• Local field questions
  – Methodology
  – History
  – Source
  – Enhancements

Text files

• Converting from local scheme to textMD
• Descriptor_type will be absorbed into different places in DRS2
• Extracted metadata can supply
  – markup_basis
  – markup_language for specific schemas
  – possibly other elements

Audio files

• Moving from local schema to AES57-2011: Audio object structures for preservation and restoration

Versioned metadata

• History will be tracked for key administrative elements:
  – Access flag
  – Admin flag (new)
  – Billing code
  – Owner code

• What values to assign for required creation date and agent for migrated content?
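History tracking for these elements could be modeled roughly as below. This is a sketch under assumptions, not the DRS2 design: the class names, the `adminFlag` spelling and the `drs2-migration` agent are hypothetical, and the creation date and agent for migrated content remain the open question noted above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Elements whose changes are recorded; "adminFlag" is an assumed spelling.
VERSIONED_ELEMENTS = {"accessFlag", "adminFlag", "billingCode", "ownerCode"}


@dataclass
class Change:
    element: str
    old: Optional[str]
    new: str
    agent: str
    when: datetime


@dataclass
class AdminMetadata:
    values: Dict[str, str] = field(default_factory=dict)
    history: List[Change] = field(default_factory=list)

    def set(self, element: str, value: str, agent: str,
            when: Optional[datetime] = None) -> None:
        """Set a value, recording a history entry for versioned elements."""
        if element in VERSIONED_ELEMENTS:
            stamp = when or datetime.now(timezone.utc)
            self.history.append(
                Change(element, self.values.get(element), value, agent, stamp)
            )
        self.values[element] = value


admin = AdminMetadata()
admin.set("accessFlag", "P", agent="drs2-migration")  # hypothetical agent name
admin.set("accessFlag", "R", agent="curator")
admin.set("quality", "5", agent="curator")            # not versioned, no history
```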

NEXT STEPS

Next steps

• Continue analysis and development of technical requirements

• Build prototype
• September check-in on progress
• Create metadata migration plan
• Open meeting to review plan

OPEN FOR QUESTIONS
