Megan Dirickson, Kristin Law, Nora Winslow, INF 392K, Spring 2013

Archiving the Digital Records of the SAA-UT Student Chapter

Megan Dirickson, Kristin Law, Nora Winslow

INF 392K, Spring 2013

Overview

Previous Work
Determining Scope
Gathering & Assessing Records
Appraisal & Arrangement
o Creating the DSpace Collections
o Privacy
Processing
o Descriptive Metadata Spreadsheet
o Creation of the SIPs
o Batch Ingest
o Shell Scripting
o Batch Metadata Editing
Twitter
Future Work
Self-Archiving Guidelines

Previous Work

In 2011, Wendy Hagenmaier and Rachel Appel digitized SAA paper records for the Survey of Digitization class.

They digitized 221 objects.

They set up a basic schema in DSpace, which we used as a jumping-off point.

Existing Schema

Community: School of Information Student Organizations

Sub-community: Society of American Archivists UT Chapter

Collections: Administrative Records, Archives Week, Correspondence, Events, Financial Records, Marketing, Meeting Minutes, Website

Our Goal

Archive all the existing born-digital records, especially the records from the past year.

More importantly, set up a self-archiving workflow that lets future SAA members easily archive their own records in DSpace.

First Things First

We wanted to gain intellectual control over the materials. We asked: “What exists and where is it? What should be included for the future?”

Used Megan and Kristin’s expertise as previous officers

Consulted Rachel and Wendy’s previous documentation

Actually Getting the Records

We asked previous SAA board members to send us anything they had.

Gleaned materials from the SAA’s two websites: the general website and the Archives Week website

Types of Records

Images, documents, recordings, presentations and spreadsheets

Files that made up the websites, mostly HTML and CSS files

Twitter and Facebook accounts

Listserv emails

Types of Records

SAA Websites

Narrowing the Scope

Over 600 discrete files

Experimented with archiving Twitter and Facebook, with mixed results

Looked into previous attempts to archive listserv emails

Facebook and the emails proved too complicated and time-consuming for the scope of this project

Appraisal

Appraisal consisted mainly of weeding out duplicates, of which there were many.

Kristin managed the files that were sent to us from previous members.

Megan gleaned the general SAA website.

Nora worked with the Archives Week website.

Over 900 files

Appraisal

SAA Website Structure

Restructuring DSpace

Large number of files

Time span of over 10 years

We wanted to maintain the arrangement, but the current structure was too restrictive.
o We moved everything up a level in order to create collections for each year

New Structure

Community
o School of Information Student Organizations

Sub-community
o Society of American Archivists: UT Student Chapter

Sub-sub-communities
o Administrative Records
o Archives Week
o Correspondence
o Events
o Financial Records
o Marketing
o Meeting Minutes
o Website and Social Media

Collections
o Calendar year


Privacy

All Financial Records collections have been set to be private. These collections contain budgets, potential account information, and information about donations to Archives Week that the donors may wish to keep private. All financial documents from Archives Week planning have intentionally been included with Financial Records in order to keep the Archives Week collections open to the public.

The most recent years (2010-2013) of Administrative Records are currently closed. Sensitive documents in these collections include membership rosters (with emails) and mentorship program information. EIDs have been redacted from the 2010-2011 membership rosters.

Tips to ensure privacy

EIDs are not to be kept in the digital archive, and documents should be reviewed to be sure that none are included.

Other sensitive information may be included in the archive, but kept in a private collection. All sensitive documents have been included only in Financial Records and Administrative Records, allowing the remaining collections to be open. Titles of private items will be viewable to the public, but the contents of the items will not be.

It is up to the discretion of the future board to determine when the closed collections may be made publicly available. The Treasurer is responsible for reviewing current and previously deposited records for privacy issues, as the Treasurer will be most cognizant of sensitive information contained in financial and membership records.

Processing—metadata gathering

Kept archival copy of records safe on a flash drive

Made other ‘processing’ copies for determining content and gathering metadata

Created spreadsheet for entering descriptive metadata

This is also when we determined intellectual arrangement of records and spotted duplicates

Creation of SIPs

Create an extracted-metadata XML file using the National Library of New Zealand’s Metadata Extractor

Perl script to create dublin_core-formatted XML from the extracted XML and to create a new directory for each item

Manually add the original bitstream to each directory

Perl script to create the ‘contents’ text file

Perl script to change directory names to item_001, item_002, etc.

This had to be done separately for each collection (about 30 collections)
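As a rough illustration of the layout these steps produce, the sketch below builds the directory structure that the DSpace batch importer reads (the Simple Archive Format): one item_NNN directory per file, holding the bitstream, a contents file, and a dublin_core.xml. It is not the Perl workflow from the slides. The function names make_sips and xml_escape and both directory arguments are hypothetical, and the title is simply the filename, on the assumption that descriptive titles are added later through batch metadata editing.

```shell
# Sketch: build one Simple Archive Format item directory per source file.
# Assumption: the ingest title is the filename; the Metadata Extractor and
# Perl steps from the slides are not reproduced here.

xml_escape() {
  # Escape the characters that are unsafe inside XML element text.
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

make_sips() {
  # $1: directory of files for one collection (hypothetical)
  # $2: output directory for the item_NNN directories (hypothetical)
  src_dir=$1
  out_dir=$2
  n=0
  mkdir -p "$out_dir"
  for file in "$src_dir"/*; do
    [ -f "$file" ] || continue
    n=$((n + 1))
    name=$(basename "$file")
    item_dir=$(printf '%s/item_%03d' "$out_dir" "$n")
    mkdir -p "$item_dir"
    # Copy the original bitstream into the item directory.
    cp "$file" "$item_dir/$name"
    # The contents file lists one bitstream filename per line.
    printf '%s\n' "$name" > "$item_dir/contents"
    title=$(printf '%s' "$name" | xml_escape)
    {
      printf '<?xml version="1.0" encoding="UTF-8"?>\n'
      printf '<dublin_core>\n'
      printf '  <dcvalue element="title" qualifier="none">%s</dcvalue>\n' "$title"
      printf '</dublin_core>\n'
    } > "$item_dir/dublin_core.xml"
  done
}
```

A call such as make_sips ./processing/minutes_2012 ./sips/minutes_2012 (both paths hypothetical) would be repeated once per collection.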

Batch Ingest

Staged SIPs on Vauxhall in a structure mirroring the DSpace structure, and wrote batch ingest command lines before meeting with Sam

Change in command line:
o /opt/dspace/bin/dspace import org.dspace.itemimport.ItemImport --add [email protected] --collection=2081/29160 --

Problems with dublin_core files—junk!

Shell Scripting

Since we had so many collections, we bundled the command lines into shell scripts

The idea was to save time… but…
o The script didn’t leave time to check for errors before moving on to the next collection

Added: echo sleep 5
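One way to bundle the imports while still leaving time to read each result is sketched below. It is an illustration under stated assumptions, not the script used in the project. The function name ingest_collections, the submitter address, the staging layout (one directory per handle, with the slash replaced by a hyphen), and the map file name are all hypothetical. Stopping on a non-zero exit status is an extra safeguard not mentioned in the slides, and it assumes the import command reports failure that way.

```shell
# Sketch: run one item import per collection and pause between runs.
# DSPACE, EPERSON, STAGING and PAUSE are placeholders that can be overridden.
DSPACE=${DSPACE:-/opt/dspace/bin/dspace}
EPERSON=${EPERSON:-submitter@example.edu}   # hypothetical submitter account
STAGING=${STAGING:-/opt/batch_ingests}      # assumed staging area for SIPs
PAUSE=${PAUSE:-5}                           # seconds to wait between collections

ingest_collections() {
  # Each argument is a collection handle such as 2081/29160.
  for handle in "$@"; do
    # Hypothetical layout: handle 2081/29160 is staged in $STAGING/2081-29160.
    dir="$STAGING/$(printf '%s' "$handle" | tr '/' '-')"
    echo "Importing $dir into collection $handle"
    if ! "$DSPACE" import --add --eperson="$EPERSON" \
         --collection="$handle" --source="$dir" --mapfile="$dir.map"; then
      echo "Import failed for $handle; stopping so the error can be checked" >&2
      return 1
    fi
    # Leave time to read the output before the next collection starts.
    sleep "$PAUSE"
  done
}
```

For example, ingest_collections 2081/29160 would import a single collection; listing several handles imports them in order.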

Batch Metadata Editing

Exported metadata from each sub-community:
o id
o collection
o dc.contributor.author
o dc.date.created
o dc.date.issued
o dc.identifier.uri
o dc.language.iso
o dc.publisher
o dc.subject
o dc.title

Merged with our descriptive metadata files by matching on ID numbers, then adding or changing Dublin Core fields and data:
o id
o collection
o dc.contributor.author: SAA-UT
o dc.date.created: changed from the ingest date to the date of creation/use of the document
o dc.date.issued
o dc.identifier.uri
o dc.language.iso
o dc.publisher
o dc.subject
o dc.title.alternative: moved filename here
o dc.contributor: if an individual author was known
o dc.title: changed from filename to descriptive title
o dc.coverage.spatial
o dc.description
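The merge was done in spreadsheets, but the matching logic for one field can be sketched in a few lines. The sketch below handles only the title: for each exported row whose id appears in the descriptive file, it moves the filename into a new dc.title.alternative column and puts the descriptive title in dc.title. The function name merge_metadata and the two-column descriptive layout (id,title) are hypothetical. It also assumes simple CSV with no quoted fields or embedded commas, so real exports with such values would need a proper CSV parser.

```shell
# Sketch: merge descriptive titles into an exported metadata CSV by id.
# Assumption: plain CSV, no quoted fields and no commas inside values.
merge_metadata() {
  # $1: descriptive CSV with header "id,title" (hypothetical layout)
  # $2: exported CSV whose header includes "id" and "dc.title"
  awk -F',' -v OFS=',' '
    # First file: remember the descriptive title for each id.
    NR == FNR { if (FNR > 1) title[$1] = $2; next }
    # Header of the export: map column names to positions, add the new column.
    FNR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      print $0, "dc.title.alternative"
      next
    }
    {
      id = $(col["id"])
      old = $(col["dc.title"])
      if (id in title) {
        # Descriptive title replaces the filename; the filename moves over.
        $(col["dc.title"]) = title[id]
        print $0, old
      } else {
        # No match: leave the row as exported, with an empty new column.
        print $0, ""
      }
    }
  ' "$1" "$2"
}
```

Rows without a match are passed through unchanged, which makes missing descriptive entries easy to spot before the import.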

Batch Metadata Editing

Once the spreadsheets were completely edited, we saved them as CSVs and met with Sam again to import the metadata

Each sub-community had to be imported individually (much faster than each collection!)

Command line: /opt/dspace/bin/dspace metadata-import -f /opt/batch_ingests/2081-29125.csv

Weird things happened with the ingest date…

Batch Metadata Editing

Yay, Metadata!!!

Social Media

Twitter provides a simple means for downloading Tweets

We felt that the tweets, especially from 2012, were valuable records. The Archives Week lectures were live-tweeted, providing rich documentation for the events.

The DSpace bundle includes:
o Zip file including CSV of tweets (with time/date stamps)
o Screenshot for added visual context

Future Work

Follow workflow and continue archiving records!

Website—too complicated for a simple ingest

Listserv emails

Facebook

Continued digitization

Self-Archiving Guidelines/Workflow

Naming Conventions and Standards

Roles & Responsibilities

Basic workflow for importing items individually to DSpace, including adding descriptive metadata

Security/Access and Privacy Issues

Community and Collection structure; arrangement guidelines for consistency

Appraisal/Selection Policies and record priorities
