

July 20 2007 NAGARA 1

Producer-Archive Workflow Network

Mike Smorul, Mike McGann, Joseph JaJa

Institute for Advanced Computer Studies
University of Maryland, College Park

Sponsored by National Archives and Records Administration, Library of Congress and NSF

July 20 2007 NAGARA 2

Problems Facing Ingestion

• Ensure integrity of data ingestion
• Each producer-archive interaction is unique
• Final destination for items in an archive is unique
• Differing roles between producer and archive
• Translating record schedules for end users
• Hostile producers
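The first point, integrity of ingestion, is commonly handled with a fixity manifest: the producer records a digest per file at assembly time and the archive recomputes it on receipt. The following is a minimal sketch of that idea, assuming SHA-256 and in-memory file contents; the function names are illustrative and are not PAWN's API.

```python
import hashlib


def build_manifest(files):
    """Map each file name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def verify_manifest(files, manifest):
    """Return the sorted names that are missing or whose digest differs."""
    failed = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            failed.append(name)
    return sorted(failed)
```

An empty result from `verify_manifest` means every file listed in the manifest arrived unchanged; any returned name would be a reason to reject the transfer.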

July 20 2007 NAGARA 3

What is PAWN?

• Software that provides an ingestion framework

• Distributed and secure ingestion of digital objects into an archive.

• Handles the process
– From package assembly
– To archival storage

• Simple, customizable interface for end-users

• Flexible interface for archive publication

July 20 2007 NAGARA 4

Package Workflow

1. Create Producer-Archive Agreement
2. Client package template
3. Create package based on template
4. Once approved, packages can be archived
5. Rejected packages can be held until rectified or deleted for resubmission

[Figure: Package Lifecycle. Labels: Producer Agreement, Template, Package Builder, Review, Archive Gateway, Archive, Activity Log. Actions: Create Template, Create Package, Audit Package.]

Example producer agreement shown in the figure:
· Administrative: Strategic and Performance Plans, Appointment and Promotion, Policies and Committees, Alumni Affairs
· Financial: Contracts and Grants, Payroll, Donations
· Publication Reports: Technical Reports, Presentations, Posters, Outreach

Example template shown in the figure:
Template Name: Research Results
Notes: Published results and conference presentations
Contents:
· Presentations
· Technical Reports
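The workflow steps above can be read as a small state machine with an activity log. The sketch below is one possible model, not PAWN's implementation: the state names, the `Package` class, and the transition table are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Allowed transitions (hypothetical names, chosen to mirror steps 3 to 5).
TRANSITIONS = {
    State.DRAFT: {State.SUBMITTED},
    State.SUBMITTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.ARCHIVED},
    # A rejected package is either rectified and resubmitted, or deleted.
    State.REJECTED: {State.SUBMITTED, State.DELETED},
    State.ARCHIVED: set(),
    State.DELETED: set(),
}


class Package:
    def __init__(self, template_name):
        self.template_name = template_name
        self.state = State.DRAFT
        self.log = []  # activity log of (old_state, new_state) pairs

    def move(self, new_state):
        """Apply a transition, refusing any that the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.log.append((self.state, new_state))
        self.state = new_state
```

Keeping the transitions in a table rather than in scattered conditionals makes it straightforward to define a different table per workflow.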

July 20 2007 NAGARA 5

Expanding a Simple Workflow

• Support for multiple workflows
– Grouped into logical domains
• Definable roles per workflow
• Pluggable components for assembly and archival publishing
• Distributed components
– Web-service based components

July 20 2007 NAGARA 6

Data Organization

• Initial attempt
– Direct presentation of record schedules to end users
• Why it didn’t work
– Record schedules are great for archivists, not end users
– Information overload of end users
– Schedules alone may not capture enough context

July 20 2007 NAGARA 7

Domain Organization

• Track both record schedules and record context while simplifying the mess.
– Allow archivists to model record schedules
– Allow local managers to create their own organizational structure (offices, departments, etc.)
– Create locally named package templates that map to schedule items
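The three points above suggest a data model in which a domain holds the archivist's schedule, the local organizational units, and templates that refer to schedule items by code. The following is a minimal sketch under that reading; the class names, fields, and schedule codes are hypothetical and do not come from PAWN.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScheduleItem:
    code: str   # archivist-facing identifier (hypothetical format)
    title: str


@dataclass
class Domain:
    name: str
    schedule: dict = field(default_factory=dict)    # code -> ScheduleItem
    org_units: list = field(default_factory=list)   # offices, departments, ...
    templates: dict = field(default_factory=dict)   # local name -> list of codes

    def add_item(self, code, title):
        self.schedule[code] = ScheduleItem(code, title)

    def add_template(self, local_name, codes):
        """Register a locally named template; every code must be scheduled."""
        unknown = [c for c in codes if c not in self.schedule]
        if unknown:
            raise KeyError(f"unknown schedule codes: {unknown}")
        self.templates[local_name] = list(codes)

    def items_for(self, local_name):
        """Resolve a template name back to the schedule items it maps to."""
        return [self.schedule[c] for c in self.templates[local_name]]
```

End users would see only the template name (for example "Research Results"), while the archive can still resolve each package to its schedule items.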

July 20 2007 NAGARA 8

Domain Example

July 20 2007 NAGARA 9

Custom Roles

• Actions in PAWN can be grouped together to create roles.
– There are no common roles between archives, so allow custom ones.
• Default roles
– Producer – Individual data supplier
– Records Manager – Oversight of producers
– Archive Manager – Final review and archive publishing
– Global Administrator – Creates domain, sysadmin-like account
• Sample Actions
– Setting permissions on record sets
– Record Schedule creation and modification
– Add or delete whole packages
– Modify items in a package…
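Grouping actions into roles can be sketched as a table from role name to a set of actions, with a check that passes if any of a user's roles grants the action. The action identifiers below paraphrase the sample actions on this slide; which role receives which action is an assumption for illustration only, not PAWN's default configuration.

```python
# Role name -> set of permitted actions (assignments are hypothetical).
ROLE_ACTIONS = {
    "producer": {"modify_package_items"},
    "records_manager": {"modify_package_items", "add_package", "delete_package"},
    "archive_manager": {
        "add_package",
        "delete_package",
        "set_permissions",
        "edit_record_schedule",
    },
}


def define_role(name, actions, table=ROLE_ACTIONS):
    """Create or replace a custom role as a group of actions."""
    table[name] = set(actions)


def is_allowed(user_roles, action, table=ROLE_ACTIONS):
    """True if at least one of the user's roles grants the action."""
    return any(action in table.get(role, set()) for role in user_roles)
```

Because a role is only a named set, an archive with unusual staffing can add its own roles with `define_role` without touching the permission check.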

July 20 2007 NAGARA 10

Data Ingest and Publishing

• Ingest
– API for creating custom package builders
• Archival Publishing
– Pluggable component that provides an API for developing gateways into various services.
– Each gateway may have multiple instances, each configured differently
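A pluggable gateway with several differently configured instances might look like the sketch below. The `ArchiveGateway` interface, the in-memory implementation, and the `publish_to_all` helper are all hypothetical; they show the plug-in pattern, not PAWN's actual gateway API.

```python
from abc import ABC, abstractmethod


class ArchiveGateway(ABC):
    """Hypothetical plug-in interface: one method that stores a package."""

    @abstractmethod
    def publish(self, package_id, files):
        """Store the files and return the identifiers they were stored under."""


class InMemoryGateway(ArchiveGateway):
    """Stand-in for a real storage service; `prefix` is its configuration."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.store = {}

    def publish(self, package_id, files):
        keys = []
        for name in sorted(files):
            key = f"{self.prefix}/{package_id}/{name}"
            self.store[key] = files[name]
            keys.append(key)
        return keys


def publish_to_all(gateways, package_id, files):
    """Publish one package through every configured gateway instance, in order."""
    keys = []
    for gateway in gateways:
        keys.extend(gateway.publish(package_id, files))
    return keys
```

Two instances of the same gateway class, such as `InMemoryGateway("primary")` and `InMemoryGateway("mirror")`, correspond to the "multiple instances, each configured differently" point.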

July 20 2007 NAGARA 11

Components

July 20 2007 NAGARA 12

Case Studies

• ICDL Book Builder
– Custom package builder
– Multiple data sources
– Model logical books
• SLAC Record Ingestion
– Sample NARA ingestion
– Model government roles
– DOE Record Schedule
• 10,000 CD-ROMs
– Remote ingestion
– Unskilled labor
– Custom hardware

July 20 2007 NAGARA 13

PAWN Summary

• Platform for ingestion
• Customizable components
– Roles, ingest and publishing
• Distributed architecture

July 20 2007 NAGARA 14

More information

• Web site:
– http://www.umiacs.umd.edu/research/adapt
• Wiki link for technical details.
• Or “I’m feeling lucky” Google keywords:
– ADAPT UMIACS