Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me


Upload: ross-spencer

Post on 13-Apr-2017


Time Traveling Analyst: The Things Only a Time Machine Can Tell Me

Ross Spencer - @beet_keeper

Archives New Zealand

#ARANZ2015

Tuesday September 7 2015

Sun image, R24685027, E4, Archway, Archives New Zealand. http://www.archway.archives.govt.nz/ViewFullItem.do?code=24685027&digital=yes

Background

Two sets of born-digital ingests of Minister's Papers, 'code-named' E1 and E4, and E2 and E3.

First sets selected for simplicity.

Second sets followed numerical sequence and were used as a learning exercise.

Complexity grew.

First sets enabled creation of CSV ingest mechanism, configuration of Rosetta, creation of process.

Second sets enabled the proof of that method.

E1~

175 Files

10 Directories

0 Unidentified Objects

0 Unidentified Extensions

7 Known Formats

E4~

1295 Files

6 Directories

2 Unidentified Objects

1 Unidentified Extensions

12 Known Formats

N.B. E4 also contained two identification false positives.

Approximate collection breakdowns at the beginning of the process


E2~

2519 Files

177 Directories

5 Unidentified Objects

4 Unidentified Extensions

22 Known Formats

25 Extension Mismatches

E3~

1748 Files

144 Directories

8 Unidentified Objects

5 Unidentified Extensions

12 Known Formats

37 Extension Mismatches

N.B. Both collections contained empty folders, empty files, and multiple-id formats.
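
Some of the figures in these breakdowns (files, directories, empty folders, empty files) can be approximated with a simple walk of the collection on disk. The following is a minimal sketch only; the function name and the dictionary keys are illustrative assumptions, and it does not attempt format identification, which needs a dedicated identification tool.

```python
import os


def summarise_collection(root):
    """Count files, directories, empty files and empty folders under root."""
    summary = {"files": 0, "directories": 0, "empty_files": 0, "empty_folders": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        # Each sub-directory is counted once, when its parent is visited.
        summary["directories"] += len(dirnames)
        summary["files"] += len(filenames)
        # A folder with no files and no sub-folders is treated as empty.
        if not dirnames and not filenames:
            summary["empty_folders"] += 1
        for name in filenames:
            if os.path.getsize(os.path.join(dirpath, name)) == 0:
                summary["empty_files"] += 1
    return summary
```

Calling `summarise_collection("path/to/collection")` returns a dictionary of counts that can be compared with the numbers reported by other tools.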

Let's begin with a story...

E1, the simplest... Enabled us to develop an ingest mechanism for heterogeneous collections and it worked!

E4, not that different, slightly larger, about as 'known', but!

An unexpected exception discovered in the relationship between the preservation system and some of the filenames in the collection...

Where do astronauts go for a beer?

The...

We had filenames with multiple spaces in them...

E.g. 'A [space] [space] Filename.docx'

An innocuous enough looking problem... Our digital preservation system couldn't handle them...
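
Filenames like the example above can be found before ingest with a short scan. This is a sketch under assumptions: the function names are hypothetical, and "multiple spaces" is taken to mean two or more consecutive space characters in the filename itself, not in the folder path.

```python
import re
from pathlib import Path

# Two or more consecutive space characters.
MULTI_SPACE = re.compile(r" {2,}")


def has_multiple_spaces(filename):
    """Return True if the filename contains a run of two or more spaces."""
    return MULTI_SPACE.search(filename) is not None


def find_problem_filenames(root):
    """List the paths of files under root whose names contain multiple spaces."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and has_multiple_spaces(path.name)
    )
```

Running `find_problem_filenames` over a collection gives a list to review before anything is renamed.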

Investigate the system...

Confirm it's the system...

Ask vendor to fix the problem...

No fix forthcoming for next release...

What now...?

Change filenames?...

Serious change, this is how we received them!

Record provenance...

Mechanisms in METS metadata schema [EVENT]

How to implement?

We continue...

Configure CSV to handle EVENT fields...

Modify CSV generation tool to output blank EVENT fields...

Test ingest in system until configuration is perfected

Mechanism works so pre-condition filenames...

Record R-Numbers* and design provenance note controlled list...

Add data to CSV

DONE!!!!

*Dependency on listing being fixed in Archway
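
The pre-conditioning and provenance steps might be sketched as follows. This is not the tool described in the slides: the column names, the event type and the note wording are hypothetical stand-ins for the real CSV layout and controlled list, and the rule applied (reduce each run of spaces to a single space) is an assumption about how the filenames were changed.

```python
import csv
import io
import re

MULTI_SPACE = re.compile(r" {2,}")

# Hypothetical column names; the real ingest CSV layout is not given here.
FIELDS = ["original_filename", "ingest_filename", "event_type", "event_note"]

# Hypothetical entry from a provenance note controlled list.
NOTE = "Multiple consecutive spaces reduced to one before ingest"


def precondition(filename):
    """Reduce each run of two or more spaces to a single space."""
    return MULTI_SPACE.sub(" ", filename)


def provenance_rows(filenames):
    """Build one row per file; EVENT fields stay blank if nothing changed."""
    rows = []
    for name in filenames:
        new_name = precondition(name)
        changed = new_name != name
        rows.append({
            "original_filename": name,
            "ingest_filename": new_name,
            "event_type": "filename change" if changed else "",
            "event_note": NOTE if changed else "",
        })
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the original filename in its own column means the received name is still on record next to the name used for ingest.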

Test in digital preservation system fails...

UTF-8 character encoding... How to preserve in Excel?

Import using special ribbon in Excel...

Add notes to sheet...

DONE?!

Not even now... >.
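
One alternative to importing through Excel's ribbon, offered here as an assumption and not as what was done in this project, is to write the CSV with a UTF-8 byte order mark, which Excel commonly uses to recognise the encoding when a file is opened directly. The function names below are hypothetical, and whether the encoding survives a save from Excel should be tested against the version in use.

```python
import csv


def write_csv_for_excel(path, rows):
    """Write rows as UTF-8 CSV prefixed with a byte order mark."""
    # "utf-8-sig" makes Python emit the BOM bytes EF BB BF at the start.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_csv_utf8(path):
    """Read a UTF-8 CSV, dropping a leading byte order mark if present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        return list(csv.reader(handle))
```

Reading the file back with `read_csv_utf8` after editing is a quick way to check that non-ASCII filenames have not been altered.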