Time Traveling Analyst: The Things Only a Time Machine Can Tell Me
Ross Spencer - @beet_keeper
Archives New Zealand
#ARANZ2015
Tuesday September 7 2015
Sun image, R24685027, E4, Archway, Archives New Zealand. http://www.archway.archives.govt.nz/ViewFullItem.do?code=24685027&digital=yes
Background
Two sets of born-digital ingest from Minister's Papers, 'code-named' E1 and E4, then E2 and E3.
First sets selected for simplicity.
Second sets followed numerical sequence and were used as a learning exercise.
Complexity grew.
First sets enabled creation of CSV ingest mechanism, configuration of Rosetta, creation of process.
Second sets enabled the proof of that method.
E1~
175 Files
10 Directories
0 Unidentified Objects
0 Unidentified Extensions
7 Known Formats
E4~
1295 Files
6 Directories
2 Unidentified Objects
1 Unidentified Extensions
12 Known Formats
N.B. E4 also contained two identification false positives.
Approximate collection breakdowns at the beginning of the process
E2~
2519 Files
177 Directories
5 Unidentified Objects
4 Unidentified Extensions
22 Known Formats
25 Extension Mismatches
E3~
1748 Files
144 Directories
8 Unidentified Objects
5 Unidentified Extensions
12 Known Formats
37 Extension Mismatches
N.B. Both collections contained empty folders, empty files, and multiple-id formats.
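A minimal sketch of how the simpler parts of such a breakdown (files, directories, empty files, empty folders) could be counted. It is an illustration only, not the tool used for the figures above: the function name and the dictionary keys are hypothetical, and it does no format identification, so it cannot report unidentified objects, known formats, or extension mismatches.

```python
import os


def collection_breakdown(root):
    """Count files, directories, empty files and empty directories under root."""
    counts = {
        "files": 0,
        "directories": 0,
        "empty_files": 0,
        "empty_directories": 0,
    }
    # os.walk yields the top-level folder first; it is not counted as a directory.
    for index, (dirpath, dirnames, filenames) in enumerate(os.walk(root)):
        counts["directories"] += len(dirnames)
        if index > 0 and not dirnames and not filenames:
            counts["empty_directories"] += 1
        for name in filenames:
            counts["files"] += 1
            if os.path.getsize(os.path.join(dirpath, name)) == 0:
                counts["empty_files"] += 1
    return counts
```

Calling collection_breakdown on the top-level folder of a transfer returns a dictionary of the four counts.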
Let's begin with a story...
E1, the simplest... Enabled us to develop an ingest mechanism for heterogeneous collections and it worked!
E4, not that different, slightly larger, about as 'known', but!
An unexpected exception discovered in the relationship between the preservation system and some of the filenames in the collection...
Where do astronauts go for a beer?
The...
We had filenames with multiple spaces in them...
E.g. 'A [space] [space] Filename.docx'
An innocuous enough looking problem... Our digital preservation system couldn't handle them...
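A check like the following could flag the affected files before ingest. This is a sketch under assumptions: the function name is hypothetical, and "multiple spaces" is taken to mean two or more consecutive space characters in a file or folder name.

```python
import re
from pathlib import Path

# Two or more consecutive space characters anywhere in a name.
MULTI_SPACE = re.compile(r" {2,}")


def find_multi_space_names(root):
    """Return sorted paths under root whose own name contains consecutive spaces."""
    return sorted(
        path for path in Path(root).rglob("*") if MULTI_SPACE.search(path.name)
    )
```

Only the final component of each path is tested, so a file inside a badly named folder is not reported twice.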
Investigate the system...
Confirm it's the system...
Ask vendor to fix the problem...
No fix forthcoming for next release...
What now...?
Change filenames?...
Serious change, this is how we received them!
Record provenance...
Mechanisms in METS metadata schema [EVENT]
How to implement?
We continue...
Configure CSV to handle EVENT fields...
Modify CSV generation tool to output blank EVENT fields...
Test ingest in system until configuration is perfected
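The step of outputting blank EVENT fields might look something like this. It is a sketch only: the column names are hypothetical placeholders, since the slides do not give the actual CSV template configured in the preservation system.

```python
import csv
import io

# Hypothetical column names; the real ingest CSV template is not shown in the slides.
BASE_FIELDS = ["r_number", "filename"]
EVENT_FIELDS = ["event_type", "event_description", "event_datetime"]


def write_ingest_csv(rows, out):
    """Write rows to out, leaving any EVENT column a row does not supply blank."""
    writer = csv.DictWriter(
        out,
        fieldnames=BASE_FIELDS + EVENT_FIELDS,
        restval="",  # value written for columns missing from a row
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

Passing an io.StringIO() as out makes it easy to inspect the result before writing a real file. Rows that carry only the base fields come out with empty EVENT columns, ready to be filled in later.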
Mechanism works so pre-condition filenames...
Record R-Numbers* and design provenance note controlled list...
Add data to CSV
DONE!!!!
*Dependency on listing being fixed in Archway
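Pre-conditioning a filename and recording the change could be sketched as below. Everything named here is an assumption for illustration: the slides do not say how the filenames were altered (collapsing runs of spaces to one is a guess), and the note wording, dictionary keys, and function names are hypothetical.

```python
import re

# Hypothetical controlled list of provenance notes, keyed by the kind of change.
PROVENANCE_NOTES = {
    "multi_space": (
        "Consecutive spaces in the original filename were collapsed "
        "to a single space before ingest."
    ),
}


def precondition_filename(name):
    """Collapse runs of two or more spaces into a single space."""
    return re.sub(r" {2,}", " ", name)


def provenance_record(r_number, original_name):
    """Return a provenance record for a renamed file, or None if unchanged."""
    new_name = precondition_filename(original_name)
    if new_name == original_name:
        return None
    return {
        "r_number": r_number,
        "original_filename": original_name,
        "ingested_filename": new_name,
        "event_description": PROVENANCE_NOTES["multi_space"],
    }
```

Keeping the original filename in the record is the point of the exercise: the file is ingested under a changed name, but the name as received stays recoverable from the metadata.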
Test in digital preservation system fails...
UTF-8 character encoding... How to preserve in Excel?
Import using special ribbon in Excel...
Add notes to sheet...
DONE?!
Not even now... >.
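One common way to keep UTF-8 intact when a CSV has to pass through Excel is to write it with a byte order mark, which Excel generally uses to detect the encoding on open. The sketch below shows that approach with Python's "utf-8-sig" codec. It is not the method described in the slides, which used Excel's import feature, and the function names are hypothetical.

```python
import csv


def write_csv_for_excel(path, header, rows):
    """Write a CSV as UTF-8 with a leading byte order mark (BOM)."""
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_utf8(path):
    """Read a UTF-8 CSV, dropping the BOM if one is present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        return list(csv.reader(handle))
```

Whether the preservation system itself accepts a BOM at the start of the file would need testing. If it does not, read_csv_utf8 followed by a plain "utf-8" write strips it again.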