fraimow curatecamp 2015

Post on 12-Apr-2017


Mass Migration: Building a Bulk Hard Drive-to-LTO Workflow From Scratch

Rebecca Fraimow, National Digital Stewardship Resident at WGBH

@rhfraim

80 hard drives
11,561 audiovisual files
300 TB of data
1 dedicated LTO workstation
1 dedicated archivist

. . .

Required Scripts & Documents (Initial)

AA_PBCorescript.sh: generates checksums and metadata for each file on drive
AA_LTO_checksum.sh: generates checksums for each file on LTO
WGBH_Batch1_LimitedCSV_final.csv, WGBH-Batch2-140211.csv, WGBH_batch3.csv, WGBH-Batch4-LimitedCSV.csv: GUID mapping documents
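As a rough illustration of the "generates checksums for each file" step, here is a minimal Python sketch that walks a directory tree and writes a CSV manifest of MD5 values. It is not the code from the WGBH scripts, which are shell scripts; the function names `md5_of_file` and `write_manifest` and the two-column CSV layout are assumptions made for this example.

```python
import csv
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write one (relative path, MD5) row per file under root; return the file count.

    The CSV layout (path, md5) is a hypothetical format chosen for this sketch.
    """
    root = Path(root)
    # Checksums are computed before the manifest is opened, so the manifest
    # itself is never hashed even if it is written inside root.
    rows = sorted(
        (p.relative_to(root).as_posix(), md5_of_file(p))
        for p in root.rglob("*")
        if p.is_file()
    )
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "md5"])
        writer.writerows(rows)
    return len(rows)
```

Reading in chunks matters for large audiovisual files, which would otherwise have to fit in memory to be hashed.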

Some drives didn’t perform correctly when removed from their cases
Some drives had too much content to fit on one LTO tape
Some drives had known failed files on them that were not separated out or identified
Some of the content turned out to be derivative material
Some of the content had been pulled twice
Some drives turned out to have failed files that could only be detected by manual QC
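One of the problems above, content that had been pulled twice, can in principle be spotted by grouping files on their checksums. The sketch below shows that idea in Python under stated assumptions: the input is a list of (path, MD5) pairs, and the function name `find_duplicates` is hypothetical. It does not describe how the duplicates were actually found in this project.

```python
from collections import defaultdict


def find_duplicates(manifest_rows):
    """Group file paths by checksum and keep only checksums seen more than once.

    manifest_rows is an iterable of (path, md5) pairs (an assumed format).
    """
    by_checksum = defaultdict(list)
    for path, checksum in manifest_rows:
        # Normalize case so "ABC" and "abc" count as the same digest.
        by_checksum[checksum.lower()].append(path)
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```

Identical checksums only show byte-identical copies; two different transfers or encodes of the same program would not be caught this way.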

Required Scripts & Documents (Revised)

AA_PBCorescript_with_checks.sh: restructures drive, checks for bad files and derivatives, generates checksums and metadata for each file on drive
AA_LTO_checksum.sh: generates checksums for each file on LTO
WGBH_Batch1_LimitedCSV_final.csv, WGBH-Batch2-140211.csv, WGBH_batch3.csv, WGBH-Batch4-LimitedCSV.csv: GUID mapping documents
AA_LTO_checksum_second_tape.sh: creates a second checksum list for overflow files
batch_qt_proofsheet.sh: creates QT_proofs for each file
proof_check.sh: QC to identify files incompatible with QT_proofs
aapb_MD5_total.csv: list of all files transferred, with checksums
corrupted_files.csv: list of files that did not pass MD5 checksum validation
derivatives.csv: list of derivative files to be removed from inclusion in the repository
md5_original_values.csv: list of all documented MD5s from before files went into Artesia DAM
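The validation idea behind a list like corrupted_files.csv can be sketched as a comparison between two sets of checksums, for example the documented original values against values computed after transfer. The Python below is an illustrative sketch only: the function name `compare_manifests` and the dictionary input format are assumptions, and the project's own scripts are shell scripts that may work differently.

```python
def compare_manifests(source, copy):
    """Compare two {path: md5} mappings (an assumed format).

    Returns (mismatched, missing): paths whose checksums differ between the
    two mappings, and paths present in source but absent from copy.
    """
    mismatched = sorted(
        path
        for path in source
        if path in copy and source[path].lower() != copy[path].lower()
    )
    missing = sorted(path for path in source if path not in copy)
    return mismatched, missing
```

Keeping mismatched and missing files in separate lists is a design choice for this sketch: a missing file suggests an incomplete copy, while a mismatch suggests corruption, and the two usually call for different follow-up.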

QT_Proofsheets

Probably OK! NOT OK

SHARE DRIVE

HARD DRIVE

LTO

Contact: rebecca_fraimow@wgbh.org rebeccafraimow@gmail.com

@rhfraim

Code: https://github.com/WGBH/ltoscripts
