american archives horror story

ARCHIVE LTO FAILURE AND DATA LOSS

Upload: rebecca-fraimow

Post on 18-Jul-2015


TRANSCRIPT

Page 1: American Archives Horror Story

ARCHIVE

LTO FAILURE AND DATA LOSS

Page 2: American Archives Horror Story

Who we are: WGBH MLA

Page 3: American Archives Horror Story

Who We Are: AAPB

...and more than 120 public radio and television stations and archives nationwide

Page 4: American Archives Horror Story

Digitization recently completed

WGBH’s 7,010 tapes that were sent for digitization

Page 5: American Archives Horror Story

Returned on 17 LTO-6 tapes

Page 6: American Archives Horror Story

• 5,000 hours of digitized and born digital media

• Up to 59,000 files

• Not to exceed 5.24 terabytes after transcoding has occurred

The Born Digital Deliverable

Page 7: American Archives Horror Story

• Lack of staff resources at stations

• Absence of existing metadata

• Unique identifiers ≠ actual names of files

• Limitations of our metadata management system

• Bicycling hard drives

• Access quality vs. preservation quality

• 5.24 terabytes became 300+ terabytes

We had some challenges

Page 8: American Archives Horror Story

• Send multiple batches totaling 13,500 video and audio files

• Pull 300TB of files over our network and place on 76 3TB hard drives

– Stored on LTO-4 robotic machine in IT

– Checksums for most files did not exist

– Many files up to 100GB each

The Plan at WGBH

Page 9: American Archives Horror Story
Page 10: American Archives Horror Story

THE PROBLEM

Out of a set of 2069 files pulled for Batch 3 part 1, 1195 proved to have failed on reaching Crawford

693 failed initial analysis

394 failed QC

108 failed transcode

= 57% failure rate

The next batch had 1310 failures out of 2826 files
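The percentages follow directly from the counts on this slide. As a quick check (the second batch's rate is not stated on the slide and is computed here from the figures given; the slide's 57% corresponds to roughly 57.8% truncated):

```python
# Counts as stated on the slide for Batch 3 part 1.
stage_failures = {"initial analysis": 693, "QC": 394, "transcode": 108}
batch_3_part_1_total = 2069


def failure_rate(failed: int, total: int) -> float:
    """Return the failure rate as a percentage of the files pulled."""
    return 100.0 * failed / total


failed = sum(stage_failures.values())  # 693 + 394 + 108
print(failed, round(failure_rate(failed, batch_3_part_1_total), 1))

# Next batch: 1310 failures out of 2826 files.
print(round(failure_rate(1310, 2826), 1))
```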

Page 11: American Archives Horror Story

THE PROCESS

start with csv file containing final name of file at receiving end, full path to file on source end, ID value of offline storage tape

shell script:

- sorts files by # of storage tape

- logs into DAM using ssh

- transfers file using scp through Artesia from LTO 4 tape (stored as tarball) onto 3 TB hard drive

later versions used tar rather than scp
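The sorting step can be sketched as follows. This is not the original shell script, only a minimal Python illustration of grouping a manifest by storage tape so that all files on one tape can be requested together. The column names (`final_name`, `source_path`, `tape_id`) and the sample rows are hypothetical; the slide only says which three values the CSV contains.

```python
import csv
import io
from collections import defaultdict


def group_by_tape(manifest_text: str) -> dict:
    """Group (source_path, final_name) pairs by storage tape ID, sorted by tape."""
    by_tape = defaultdict(list)
    for row in csv.DictReader(io.StringIO(manifest_text)):
        by_tape[row["tape_id"]].append((row["source_path"], row["final_name"]))
    return dict(sorted(by_tape.items()))


# Hypothetical manifest rows, for illustration only.
sample = """final_name,source_path,tape_id
clip_b.mov,/dam/path/b.mov,T002
clip_a.mov,/dam/path/a.mov,T001
clip_c.mov,/dam/path/c.mov,T002
"""

for tape_id, files in group_by_tape(sample).items():
    print(tape_id, [final_name for _, final_name in files])
```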

Page 12: American Archives Horror Story

THE PROCESS (REVISED)

post-transfer, compare the megabyte block counts of source and destination products

(no checksum – took too much time to perform on such large files while under time pressure)

failed items automatically removed from drive

transfer script re-run until all files download successfully

if files fail repeatedly, assume they have failed on LTO; the backup tape is recalled from Iron Mountain and staging is attempted from there
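The compare, remove, and retry loop described above might look like the sketch below. It is an illustration under stated assumptions, not the script that was used: `fetch` stands in for whatever performs the real transfer, the 1 MiB block size is assumed (the slide only says "megabyte block counts"), and `max_attempts` is a hypothetical cutoff for deciding a file has "failed repeatedly".

```python
import math
import os
import tempfile
from typing import Callable

MB = 1024 * 1024  # assumed block size


def mb_blocks(size_bytes: int) -> int:
    """Number of 1 MiB blocks needed to hold size_bytes."""
    return math.ceil(size_bytes / MB)


def transfer_with_retries(source_bytes: int, dest_path: str,
                          fetch: Callable[[str], None],
                          max_attempts: int = 3) -> int:
    """Call fetch until block counts match; return attempts used, or 0 on failure."""
    for attempt in range(1, max_attempts + 1):
        fetch(dest_path)
        if os.path.exists(dest_path):
            if mb_blocks(os.path.getsize(dest_path)) == mb_blocks(source_bytes):
                return attempt
            os.remove(dest_path)  # failed items are removed from the drive
    return 0  # caller would escalate to the backup tape at this point


# Stand-in transfer: the first attempt writes a short file, the second a full one.
payloads = iter([b"x" * 10, b"x" * (2 * MB)])


def fake_fetch(dest_path: str) -> None:
    with open(dest_path, "wb") as handle:
        handle.write(next(payloads))


with tempfile.TemporaryDirectory() as tmp:
    attempts_used = transfer_with_retries(
        2 * MB, os.path.join(tmp, "file.mov"), fake_fetch)
print(attempts_used)
```

A block-count match only shows that the sizes agree to within one block. It cannot detect a file of the right length whose contents are damaged, which is the trade-off accepted by skipping checksums.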

Page 13: American Archives Horror Story

THE PROGRESS

Many files that initially failed eventually transferred successfully, either from the initial tape or from a backup, after multiple attempts

Others were never successfully transferred

Out of a planned 10,648 files in the batch, 2173 were never successfully downloaded – a 20% failure rate

Page 14: American Archives Horror Story

BREAKING DOWN THE FAILURES

ffmpeg -i ${filename}

mediainfo -f ${filename}

“moov atom not found”
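ffmpeg reports "moov atom not found" when a QuickTime/MP4 file lacks the `moov` atom that indexes the media data, which typically happens when a file has been cut short. As a rough illustration of what that check amounts to, the sketch below walks the top-level atoms of a stream and lists their types. It is a simplified reader written for this example, not how ffmpeg or mediainfo work internally, and the sample bytes are synthetic.

```python
import io
import struct
from typing import BinaryIO


def top_level_atoms(stream: BinaryIO) -> list:
    """List the four-character types of the top-level atoms in a stream."""
    atoms = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of data, or a header cut off mid-way
        size, kind = struct.unpack(">I4s", header)
        atoms.append(kind.decode("latin-1"))
        if size == 0:
            break  # atom extends to the end of the file
        if size == 1:
            extended = stream.read(8)  # 64-bit size follows the type
            if len(extended) < 8:
                break
            payload = struct.unpack(">Q", extended)[0] - 16
        else:
            payload = size - 8
        if payload < 0:
            break  # nonsensical size: treat as corrupt
        stream.seek(payload, io.SEEK_CUR)
    return atoms


def atom(kind: bytes, payload: bytes) -> bytes:
    """Build a synthetic atom: 4-byte size, 4-byte type, payload."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


# Synthetic file with the index at the end, and a copy truncated inside mdat.
complete = atom(b"ftyp", b"qt  " + bytes(4)) + atom(b"mdat", bytes(32)) + atom(b"moov", bytes(8))
truncated = complete[:40]

print("moov" in top_level_atoms(io.BytesIO(complete)))
print("moov" in top_level_atoms(io.BytesIO(truncated)))
```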

Page 15: American Archives Horror Story
Page 16: American Archives Horror Story

QC FAILURE

Playable files with evidence of corruption defined by Crawford as “issues that would make the file unusable,” for example:

a green screen with no audio

a video that plays for two seconds before the screen goes black or grey

pixels shift out of place in zigzag pattern

audio is digital noise only

Page 17: American Archives Horror Story
Page 18: American Archives Horror Story
Page 19: American Archives Horror Story
Page 20: American Archives Horror Story
Page 21: American Archives Horror Story

THE PROGNOSIS

Sample data: 5000 files with checksums generated at creation

1012 of those files could not be transferred from LTO, after multiple attempts

However, MD5s on LTO show the files are unchanged

So the files are good – but can’t be reached?
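Comparing a stored MD5 against a freshly computed one can be done without holding a whole file in memory, which matters for files of up to 100GB. The sketch below shows one way to do that with Python's standard `hashlib`. It illustrates the kind of check described here, not the tooling actually used, and the function names and sample data are made up for the example.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Compute an MD5 hex digest by reading the stream in fixed-size chunks."""
    digest = hashlib.md5()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(stream: BinaryIO, recorded_md5: str) -> bool:
    """True if the stream's current MD5 matches the one recorded at creation."""
    return md5_of_stream(stream) == recorded_md5.lower()


# Synthetic data standing in for a media file and its checksum at creation.
data = b"archive" * 1000
recorded = hashlib.md5(data).hexdigest()

unchanged = is_unchanged(io.BytesIO(data), recorded)
damaged = is_unchanged(io.BytesIO(data[:-1]), recorded)
print(unchanged, damaged)
```

For a file on disk, the same function can be called on `open(path, "rb")`.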

Page 22: American Archives Horror Story

THE POSSIBILITIES

Files were bad before they went onto LTO – production environment provides little opportunity for QC

Files are good, but inaccessible on LTO because of problems with the way the data is stored on the tape or the interaction of the different technologies used to get it out

Page 23: American Archives Horror Story

THE PROBLEMS NOW

Administrative distance between institutional IT and archival needs makes it difficult to get clear answers about the technology we’re using

Staff turnover means information about original systems/data transfer processes is lost

Local LTO systems incompatible with older tapes, making direct testing currently impossible

Page 24: American Archives Horror Story

NEXT STEPS

Acquire Linux machine for direct testing of LTO 4 tapes

Test different transfer protocols

More investigation into the SL8500 SAMFS/QFS

Look for patterns in inaccessible files (file size, date uploaded, system architecture on storage tape)
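Looking for patterns could start with something as simple as failure rates grouped by one attribute at a time. The sketch below assumes each file is described by a record with a `transferred` flag plus whatever attributes are available. The field names and the three sample records are invented purely to show the shape of the calculation and are not project data.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def failure_rate_by(records: Iterable[dict], key: Callable[[dict], Hashable]) -> dict:
    """Fraction of records that failed to transfer, per bucket returned by key."""
    totals, failures = Counter(), Counter()
    for record in records:
        bucket = key(record)
        totals[bucket] += 1
        if not record["transferred"]:
            failures[bucket] += 1
    return {bucket: failures[bucket] / totals[bucket] for bucket in totals}


# Invented records, for illustration only.
sample_records = [
    {"tape_id": "T001", "size_gb": 80, "transferred": False},
    {"tape_id": "T001", "size_gb": 2, "transferred": True},
    {"tape_id": "T002", "size_gb": 5, "transferred": True},
]

by_tape = failure_rate_by(sample_records, key=lambda r: r["tape_id"])
by_size = failure_rate_by(
    sample_records, key=lambda r: "large" if r["size_gb"] >= 50 else "small")
print(by_tape)
print(by_size)
```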

Page 25: American Archives Horror Story

Rebecca Fraimow & Casey Davis

@rhfraim

@[email protected]

[email protected]