automated archiving of dvd content esteva, vega, nieto, scott, gunnels, kumar, lamphear, henriksen,...

20
Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Upload: claribel-banks

Post on 27-Dec-2015

221 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Automated Archiving of DVD Content

Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin

TCDL 2013

Page 2: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Motivation and perspective

• Find a file based preservation solution for video art in DVD media

• Create a SIP including files to fulfill museum functions• Availability of high performance distributed storage • Next generation display systems

Page 3: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Considerations

• Study the role of the DVD in the works technical history

• Options for conversion

• Usage of the DVD in the museum

Page 4: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Considerations

• UT Research Storage Infrastructure and Services

• Example of next generation display systems

Page 5: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Workflow requirements

• Only for DVD-based works• Transcoding should not involve further

compression• Fixity to establish authenticity of disk image

and production quality/preservation master• Quality metric to assess the quality of the

transcoding • Metadata and provenance documentation• Easy to implement• SIP contents should fulfill preservation and

access functions

Page 6: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Preservation roadmap

• Best practices in video and digital preservation– Few references for DVD preservation

• ISO disk images as identical copy of the DVD– Preservation master– Base for metadata extraction and further conversions – Not playable

• High quality access file (also preservation file)– Matroska container and ffv1 codec

• Quality metric – Structural Similarity Index (SSIM)

• Documentation – metadata and processing provenance

Page 7: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Workflow step by step

• Metadata extraction• ISO disk image• Checksum• Transcoding• Checksum• Conversion to other

access files• Quality metric • Visual evaluation

Page 8: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Quality Control

• Full reference quality metric

• Structural Similarity Index

• Aim for result of 1• Algorithm available in

different programming languages

• Modeled closely to human perception

• Visual inspection of the Matroska file

Page 9: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Software choices• Informed by testing• Open source, command line tools• Support technical documentation

Page 10: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Testing and Implementing the Automated Workflow at

UT Libraries

Page 11: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

• Audiovisual Digitization Unit• DVD Archiving– Preservation Reformatting– Digital Preservation

• Example DVD archiving projects– Benson Latin American Collection– Fine Arts Library Streaming Video Project

UT Libraries and Audiovisual Digitization

Page 12: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

DVD Archiving Requirements

ISO image of original

Streaming MP4

DVD access copy

2 checksums

Descriptive metadata

Source metadata

Technical metadata

Page 13: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Current Workflow

Create decrypted ISO disc

image using

DVDShrink

Quality control of ISO

file

Move and delete file

from local

workstation to

remote server

Update Project Record

with transfer

notes and status

Compile ISO file, MP4 file, and all XML files into a single directory folder

Run

BagIt script

to contribute verifyvalid

data, checksum data,

manifest and tagmanifest data, as well as well baginfo

data.

Systems

confederates XML

into single UTVideoMD XML

document

Compile descriptive metadata as MODS XML from catalog MARC record using

MARCedit

Compile source

metadata as XML

from Digitization record

using Oxygen

Compile technical metadata as XML from ISO and MP4

using MediaInf

o

Move all XML documents to remote server

Create streaming MP4 from main ISO

media stream using

Handbrake

Quality

control of MP4 file

Move and

delete file

from local

workstation

to remote server

Create physic

al derivatives from ISO

Quality

control of

physical

derivatives

Label physi

cal derivative

s

Send physical

derivatives and

originals to cataloging

or return to branch and

denote return in Project Record

SIP

BagIt

Copy

Page 14: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Partnering with TACC

• TACC approached UT Libraries to expand testing and implementation of automated workflow

• Benefits of automation for the Libraries:– Streamline and unify current workflow– Less staff-intensive– Reduce opportunities for error

Page 15: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Testing and Results

• Test DVDs– 18 single-stream, unencrypted discs– 3 multi-stream, encrypted discs

• Workstation– Mac Pro running OS 10.8.3 Mountain Lion

• Results– Encryption errors– MP4 streaming compatibility– Saving files to host directory– SSIM and Python Library limitations– Checksums– Enthusiasm for the next phase!

Page 16: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

UT Libraries Modifications

• Addition of drag and drop functionality– DVD– Project Folder

• Addition of web-optimized MP4• Elimination of Matroska file

Page 17: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Future Modifications

• Decryption• 2 Checksums• SSIM quality measure• Expand metadata collection• UTVideoMD• Integrate BagIt

Page 18: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Manual Workflow

Create decrypted ISO disc

image using

DVDShrink

Quality control of ISO

file

Move and delete file

from local

workstation to

remote server

Update Project Record

with transfer

notes and status

Compile ISO file, MP4 file, and all XML files into a single directory folder

Run

BagIt script

to contribute verifyvalid

data, checksum data,

manifest and tagmanifest data, as well as well baginfo

data.

Systems

confederates XML

into single UTVideoMD XML

document

Compile descriptive metadata as MODS XML from catalog MARC record using

MARCedit

Compile source

metadata as XML

from Digitization record

using Oxygen

Compile technical metadata as XML from ISO and MP4

using MediaInf

o

Move all XML documents to remote server

Create streaming MP4 from main ISO

media stream using

Handbrake

Quality

control of MP4 file

Move and

delete file

from local

workstation

to remote server

Create physic

al derivatives from ISO

Quality

control of

physical

derivatives

Label physi

cal derivative

s

Send physical

derivatives and

originals to cataloging

or return to branch and

denote return in Project Record

SIP

BagIt

Copy

Page 19: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

SIP

BagIt

Automated Workflow

Page 20: Automated Archiving of DVD Content Esteva, Vega, Nieto, Scott, Gunnels, Kumar, Lamphear, Henriksen, Lee, Martin TCDL 2013

Thank YouMaria Esteva and Karla Vega

Texas Advanced Computing Center

Vandy Henriksen, Jennifer Lee, Wendy MartinUniversity of Texas Libraries

Sue Ellen Jeffers and Meredith SuttonBlanton Museum of Art

Bethany ScottCharlotte Mecklenburg Library

Kertana KumarUniversity of Texas College of Natural Sciences

Summer GunnelsUniversity of Texas Cockrell School of Engineering