Fraimow CURATEcamp 2015


Upload: juliaykim

Post on 12-Apr-2017

TRANSCRIPT

Page 1: Fraimow CURATEcamp 2015

Mass Migration: Building a Bulk Hard Drive-to-LTO Workflow From Scratch

Rebecca Fraimow, National Digital Stewardship Resident at WGBH

@rhfraim

Page 2: Fraimow CURATEcamp 2015
Page 3: Fraimow CURATEcamp 2015

80 hard drives
11,561 audiovisual files
300 TB of data
1 dedicated LTO workstation
1 dedicated archivist

Page 4: Fraimow CURATEcamp 2015

. . .

Page 5: Fraimow CURATEcamp 2015

Required Scripts & Documents (Initial)

AA_PBCorescript.sh: generates checksums and metadata for each file on drive
AA_LTO_checksum.sh: generates checksums for each file on LTO
WGBH_Batch1_LimitedCSV_final.csv, WGBH-Batch2-140211.csv, WGBH_batch3.csv, WGBH-Batch4-LimitedCSV.csv: GUID mapping documents
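The per-file checksum step that both scripts are described as performing can be illustrated with a minimal sketch. This is not the WGBH code (the actual scripts are the shell scripts in the repository linked on the last slide). The function names `md5_of_file` and `build_manifest` and the two-column CSV layout are assumptions made for illustration only.

```python
import csv
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, read in chunks so that large
    video files are never loaded into memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_path):
    """Walk `root` and write one (relative path, MD5) row per file.

    The "path,md5" column layout is hypothetical, not the WGBH format.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rows.append((os.path.relpath(full, root), md5_of_file(full)))
    rows.sort()  # stable ordering makes two manifests easy to diff
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "md5"])
        writer.writerows(rows)
    return rows
```

Running the same routine once against the source drive and once against the mounted tape would give two manifests that can be compared row by row, which is presumably the purpose of having a drive-side and an LTO-side script.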

Page 6: Fraimow CURATEcamp 2015
Page 7: Fraimow CURATEcamp 2015

Some drives didn’t perform correctly when removed from their cases
Some drives had too much content to fit on one LTO tape
Some drives had known failed files on them that were not separated out or identified
Some of the content turned out to be derivative material
Some of the content had been pulled twice
Some drives turned out to have failed files that could only be detected by manual QC
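The tape-overflow problem in the list above amounts to partitioning a drive's files by size. A minimal sketch, assuming a simple in-order fill: the function name `split_across_tapes` and the sizes and capacity in the example are hypothetical, and nothing here reflects the LTO generation or the logic WGBH actually used.

```python
def split_across_tapes(files, capacity_bytes):
    """Assign (name, size) pairs to tapes in order, starting a new
    tape whenever the next file would not fit on the current one."""
    tapes = [[]]
    used = 0
    for name, size in files:
        if size > capacity_bytes:
            raise ValueError(f"{name} is larger than a single tape")
        if used + size > capacity_bytes:
            tapes.append([])  # current tape is full; open the next one
            used = 0
        tapes[-1].append(name)
        used += size
    return tapes


# Made-up sizes and a made-up capacity of 100 units.
example = [("a.mov", 60), ("b.mov", 30), ("c.mov", 40), ("d.mov", 50)]
print(split_across_tapes(example, 100))
```

An in-order fill keeps files from the same drive folder together on one tape, at the cost of possibly leaving some space unused at the end of each tape.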

Page 8: Fraimow CURATEcamp 2015

Required Scripts & Documents (Revised)

AA_PBCorescript_with_checks.sh: restructures drive, checks for bad files and derivatives, generates checksums and metadata for each file on drive
AA_LTO_checksum.sh: generates checksums for each file on LTO
WGBH_Batch1_LimitedCSV_final.csv, WGBH-Batch2-140211.csv, WGBH_batch3.csv, WGBH-Batch4-LimitedCSV.csv: GUID mapping documents
AA_LTO_checksum_second_tape.sh: creates a second checksum list for overflow files
batch_qt_proofsheet.sh: creates QT_proofs for each file
proof_check.sh: QC to identify files incompatible with QT_proofs
aapb_MD5_total.csv: list of all files transferred, with checksums
corrupted_files.csv: list of files that did not pass MD5 checksum validation
derivatives.csv: list of derivative files to be removed from inclusion in the repository
md5_original_values.csv: list of all documented MD5s from before files went into Artesia DAM
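The revised document set implies a comparison step: checksums documented before the files went into the DAM are checked against freshly generated ones, and mismatches end up in a corrupted-files list. A minimal sketch of that comparison, assuming both lists have already been loaded as filename-to-MD5 mappings. The function name `find_mismatches` and the dictionary form are assumptions; the slides do not give the column layout of md5_original_values.csv or aapb_MD5_total.csv.

```python
def find_mismatches(original, current):
    """Compare two {filename: md5} mappings.

    Returns (corrupted, undocumented): files whose current checksum
    differs from the documented one, and files that have no
    documented checksum to compare against.
    """
    corrupted = []
    undocumented = []
    for name, md5 in sorted(current.items()):
        expected = original.get(name)
        if expected is None:
            undocumented.append(name)
        elif expected.lower() != md5.lower():  # hex case may differ by tool
            corrupted.append(name)
    return corrupted, undocumented
```

Keeping undocumented files separate from corrupted ones matters here: a file with no earlier checksum cannot be validated this way and would need a different check, such as the proof-sheet QC listed above.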

Page 9: Fraimow CURATEcamp 2015

QT_Proofsheets

Probably OK!
NOT OK

Page 10: Fraimow CURATEcamp 2015

SHARE DRIVE

HARD DRIVE

LTO

Page 11: Fraimow CURATEcamp 2015

Contact: [email protected] [email protected]

@rhfraim

Code: https://github.com/WGBH/ltoscripts