10041267M-1
ISO “Reference Model for an Open Archival Information System (OAIS)”
Tutorial Presentation
Don Sawyer / NASA/NSSDC
Lou Reich / CSC
October 2002
Outline of Talk
History
Reference Model overview
Some Applications
Follow-on Activities
NASA Role
National Space Science Data Center
– NASA’s first digital archive
– Experienced many technology changes since 1966
Consultative Committee for Space Data Systems
– International group of space agencies
– Developed a variety of science discipline-independent standards
– Became the working body for ISO TC 20/SC 13 about 1990
TC 20: Aircraft and Space Vehicles
SC 13: Space Data and Information Transfer Systems
Initial Archive Standards Proposal
ISO suggested that SC 13 should develop archive standards
– Address data used in conjunction with space missions
– Address intermediate and indefinite long term storage of digital data
Response
Response to Consultative Committee for Space Data Systems (CCSDS) and ISO TC 20/SC 13
– No framework widely recognized for developing specific digital archive standards
– Begin by developing a ‘Reference Model’ to establish common terms and concepts
– Ensure broad participation, including traditional archives
(Not restricted to space communities; all participation is welcome!)
– Focus on data in electronic forms, but recognize that other forms exist in most archives
– Follow up with additional archive standards efforts as appropriate
What is a Reference Model?
A framework
– for understanding significant relationships among the entities of some environment, and
– for the development of consistent standards or specifications supporting that environment.
A reference model
– is based on a small number of unifying concepts
– is an abstraction of the key concepts, their relationships, and their interfaces both to each other and to the external environment
– may be used as a basis for education and explaining standards to a non-specialist.
Organizational Approach
Organize US contribution under a framework with NASA lead
– Establish liaison with Federal Geographic Data Committee (FGDC) and National Archives and Records Administration (NARA)
– Agency archives and users must be represented in this process
An “Open” process
– Important to stimulate dialogue with broad archive/user communities
– Results of US and international workshops put on the Web
– Support e-mail comments/critiques
Broad international workshops also held
– UK and France
– Issue resolution at ISO/Consultative Committee for Space Data Systems international workshops
Technical Approach
Investigate other Reference Models
– ISO “Seven Layer” Communications Reference Model
– ISO Reference Model for Open Distributed Processing
– ISO TC211 Reference Model for Geomatics
Define what is meant by ‘archiving of data’
Break ‘archiving’ into a few functional areas (e.g., ingest, storage, access, and preservation planning)
Define a set of interfaces between the functional areas
Define a set of data classes for use in archiving
Choose formal specification techniques
– Data flow diagrams for functional models and interfaces
– Unified Modeling Language (UML) for data classes
Results
Reference Model targeted to several categories of reader
– Archive designers
– Archive users
– Archive managers, to clarify digital preservation issues and assist in securing appropriate resources
– Standards developers
Adopted terminology that crosses various disciplines
– Traditional archivists
– Scientific data centers
– Digital libraries
Reference Model Status
Already widely adopted as starting point in digital preservation efforts
– Digital libraries (e.g., Netherlands National Library)
– Traditional archives (e.g., US National Archives)
– Scientific data centers (e.g., National Space Science Data Center)
– Commercial organizations (e.g., Aerospace Industries Association preservation working team)
Recently approved for publication as final CCSDS and ISO (14721:2002) standards
CCSDS version is available at:
– http://www.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf
Reference Model for an Open Archival Information System
Technical Overview
Open Archival Information System (OAIS)
Open
– Reference Model standard(s) are developed using a public process and are freely available
Information
– Any type of knowledge that can be exchanged
– Independent of the forms (i.e., physical or digital) used to represent the information
– Data are the representation forms of information
Archival Information System
– Hardware, software, and people who are responsible for the acquisition, preservation and dissemination of the information
Document Organization
Introduction
– Purpose and Scope, Applicability, Rationale, Road Map for Future Work, Document Structure, and Definitions of Terms
OAIS Concepts and Responsibilities
– High level view of OAIS functionality and information models
– OAIS external environment
– Minimum responsibilities to become an “OAIS”
Detailed Models
– Functional model descriptions and information model perspectives
Preservation Perspectives
– Media migration, compression, format conversions, and access service preservation
Archive Interoperability
– Criteria to distinguish types of cooperation among archives
Annexes
– Scenarios of existing archives, compatibility with other standards
Purpose, Scope, and Applicability
Framework for understanding and applying concepts needed for long-term digital information preservation
– Long-term is long enough to be concerned about changing technologies
– Starting point for model addressing non-digital information
Provides set of minimal responsibilities to distinguish an OAIS from other uses of ‘archive’
Framework for comparing architectures and operations of existing and future archives
Basis for development of additional related standards
Addresses a full range of archival functions
Applicable to all long-term archives and those organizations and individuals dealing with information that may need long-term preservation
Does NOT specify any implementation
Model View of an OAIS Environment
[Figure: the OAIS (archive) and the three roles in its environment: Producer, Management, and Consumer]
Producer is the role played by those persons, or client systems, who provide the information to be preserved
Management is the role played by those who set overall OAIS policy as one component in a broader policy domain
Consumer is the role played by those persons, or client systems, who interact with OAIS services to find and acquire preserved information of interest
OAIS Responsibilities
Negotiates and accepts information from information Producers
Obtains sufficient control to ensure long-term preservation
Determines which communities (designated) need to be able to understand the preserved information
Ensures the information to be preserved is independently understandable to the Designated Communities
Follows documented policies and procedures which ensure the information is preserved against all reasonable contingencies
Makes the preserved information available to the Designated Communities in forms understandable to those communities
OAIS Information Definition
Information is always expressed (i.e., represented) by some type of data
Data interpreted using its Representation Information yields Information
Information Object preservation requires clear identification and understanding of the Data Object and its associated Representation Information
[Figure: a Data Object, interpreted using its Representation Information, yields an Information Object]
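The relationship between Data Object, Representation Information, and Information Object can be illustrated with a minimal Python sketch. It is not part of the OAIS standard: the class names mirror the slide's terms, while the `struct` format string, the field names, and the sample values are assumptions chosen only for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInformation:
    # Hypothetical encoding of Representation Information:
    # a struct format string (structure) plus field names (semantics).
    struct_format: str
    field_names: tuple


@dataclass(frozen=True)
class InformationObject:
    data_object: bytes                        # the Data Object (a bit sequence)
    representation: RepresentationInformation

    def interpret(self) -> dict:
        """Interpret the Data Object using its Representation Information."""
        values = struct.unpack(self.representation.struct_format, self.data_object)
        return dict(zip(self.representation.field_names, values))


# Illustrative values only: a big-endian 16-bit integer followed by a 32-bit float.
bits = struct.pack(">hf", 42, 1.5)
rep_info = RepresentationInformation(">hf", ("sensor_id", "temperature"))
info = InformationObject(bits, rep_info)
print(info.interpret())
```

Without `rep_info`, the six bytes in `bits` carry no recoverable meaning, which is why the two must be identified and preserved together.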
Information Package Definition
An Information Package is a conceptual container holding two types of information
– Content Information
– Preservation Description Information (PDI)
[Figure: an Information Package containing Content Information and Preservation Description Information]
Information Package Variants
Submission Information Package
– Negotiated between Producer and OAIS
– Sent to OAIS by a Producer
Archival Information Package
– Information Package used for preservation
– Includes complete set of Preservation Description Information for the Content Information
Dissemination Information Package
– Includes part or all of one or more Archival Information Packages
– Sent to a Consumer by the OAIS
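The three variants can be sketched as one container type passing through two conversions. This is an illustrative sketch under stated assumptions, not a prescribed implementation: `InformationPackage`, `make_aip`, and `make_dip` are hypothetical names, and the completeness check simply requires the four PDI categories named in this tutorial (Reference, Provenance, Context, Fixity).

```python
from dataclasses import dataclass, field

# The four PDI categories described in this tutorial.
PDI_CATEGORIES = ("reference", "provenance", "context", "fixity")


@dataclass
class InformationPackage:
    kind: str                                 # "SIP", "AIP", or "DIP"
    content: bytes                            # Content Information (simplified to bytes)
    pdi: dict = field(default_factory=dict)   # Preservation Description Information


def make_aip(sip: InformationPackage, extra_pdi: dict) -> InformationPackage:
    """Build an AIP from a SIP; the AIP must carry a complete set of PDI."""
    pdi = {**sip.pdi, **extra_pdi}
    missing = [c for c in PDI_CATEGORIES if c not in pdi]
    if missing:
        raise ValueError(f"incomplete PDI, missing: {missing}")
    return InformationPackage("AIP", sip.content, pdi)


def make_dip(aips: list) -> InformationPackage:
    """Build a DIP from one or more AIPs (here: all of each AIP's content)."""
    content = b"".join(a.content for a in aips)
    return InformationPackage("DIP", content, {"sources": [a.pdi["reference"] for a in aips]})


# Hypothetical usage with made-up values.
sip = InformationPackage("SIP", b"observations", {"provenance": "Producer A"})
aip = make_aip(sip, {"reference": "id-001", "context": "mission X", "fixity": "crc:1234"})
dip = make_dip([aip])
print(dip.kind, dip.pdi)
```

A SIP may arrive with only part of the PDI; in this sketch the archive supplies the rest before the package qualifies as an AIP.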
External Data Flow View
[Figure: the Producer sends Submission Information Packages to the OAIS; the OAIS holds Archival Information Packages; the Consumer sends queries and orders to the OAIS and receives result sets and Dissemination Information Packages]
Detailed Models
Overview
Overview of Detailed Models
It was decided to do both a functional and an information model of the OAIS
Both models were tasked to:
– Use the models to better communicate OAIS concepts
– Use a well established, formal modeling technique
– Stay as implementation independent as possible
– Avoid detailed designs
Detailed Models
Information Model
General Principles
Define classes of “information objects” that illustrate the information necessary to enable long-term storage of, and access to, archives
The class definitions should be implementation independent
Use a subset of Unified Modeling Language (UML)
UML Notation
[Figure: legend of the UML notation used in the information model]
– Class: a box containing the Class Name
– Association: a line labeled with the Association Name joining Class-1 and Class-2
– Association as a class: a class attached to the named association between Class-1 and Class-2
– Aggregation: an Assembly Class composed of Part-1 Class and Part-2 Class
– Specialization: a Parent Class with Child-1 Class and Child-2 Class
– Multiplicity of associations: 1 = exactly one; * = many (zero or more); 0..1 = optional (zero or one); 1..* = one or more
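Information Objects
[Figure: UML class diagram. An Information Object consists of a Data Object interpreted using one or more (1+) Representation Information objects; Representation Information is itself interpreted using further Representation Information. A Data Object is either a Physical Object or a Digital Object; a Digital Object consists of one or more (1+) Bit Sequences.]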
Representation Information
The Representation Information accompanying a physical object, like a moon rock, may give additional meaning
– It typically is a result of some analysis of the physically observable attributes of the rock
The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning.
– It typically maps the bits into commonly recognized data types such as character, integer, and real and into groups of these data types.
– It associates these with higher level meanings which can have complex inter-relationships that are also described
10041267M-28
Recursive Nature ofRecursive Nature ofRepresentation InformationRepresentation Information
Structure Information Semantic Information Other Representation
Information
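The recursion can be sketched as a walk over a dependency graph: each piece of Representation Information may itself need further Representation Information, and the walk stops at whatever the Designated Community is assumed to already understand. The function name, the registry contents, and the `knowledge_base` set below are all hypothetical and serve only as an illustration.

```python
def representation_network(name, registry, knowledge_base):
    """List the Representation Information needed to interpret `name`,
    dependencies first, stopping at what is already understood."""
    ordered, seen = [], set()

    def visit(item):
        if item in knowledge_base or item in seen:
            return
        seen.add(item)
        for dependency in registry[item]:   # a KeyError means the network is incomplete
            visit(dependency)
        ordered.append(item)

    visit(name)
    return ordered


# Hypothetical registry: each entry names the further Representation
# Information required to understand it.
registry = {
    "data set format": ["format specification document"],
    "format specification document": ["document file format", "English"],
    "document file format": ["ASCII"],
}
knowledge_base = {"ASCII", "English"}   # assumed already understood by the community

print(representation_network("data set format", registry, knowledge_base))
```

The smaller the assumed knowledge base, the more Representation Information the archive has to hold.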
Types of Information Used in OAIS
[Figure: Information Object specialized into Content Information, Packaging Information, Preservation Description Information, Descriptive Information, and others]
Content Information
The information which is the primary object of preservation
An instance of Content Information is the information that an archive is tasked to preserve.
Deciding what is the Content Information may not be obvious and may need to be negotiated with the Producer
The Data Object in the Content Information may be either a Digital Object or a Physical Object (e.g., a physical sample, microfilm)
Preservation Description Information
Provenance Information
– Describes the source of Content Information, who has had custody of it, what is its history
Context Information
– Describes how the Content Information relates to other information outside the Information Package
Reference Information
– Provides one or more identifiers, or systems of identifiers, by which the Content Information may be uniquely identified
Fixity Information
– Protects the Content Information from undocumented alteration
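Fixity Information is the PDI category that maps most directly onto code. The sketch below records a digest when content is archived and re-checks it later. The choice of SHA-256 and the names `make_fixity` and `verify_fixity` are assumptions for illustration; an archive might equally use a CRC or a digital signature.

```python
import hashlib


def make_fixity(content: bytes, algorithm: str = "sha256") -> dict:
    """Record Fixity Information for some Content Information."""
    return {"algorithm": algorithm, "digest": hashlib.new(algorithm, content).hexdigest()}


def verify_fixity(content: bytes, fixity: dict) -> bool:
    """Return True if the content still matches its recorded digest."""
    current = hashlib.new(fixity["algorithm"], content).hexdigest()
    return current == fixity["digest"]


original = b"calibrated sensor readings"    # made-up content
fixity = make_fixity(original)

print(verify_fixity(original, fixity))                      # unchanged content
print(verify_fixity(b"calibrated sensor reading5", fixity)) # altered content
```

A digest detects undocumented alteration but does not repair it; repair needs a second copy or an error-correcting code.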
Preservation Description Information
Examples by Content Information Type
Space Science Data
– Reference: object identifier; journal reference; mission, instrument, title, attribute set
– Provenance: instrument description; processing history; sensor description; instrument; instrument mode; decommutation map; software interface specification
– Context: calibration history; related data sets; mission; funding history
– Fixity: CRC; checksum; Reed-Solomon coding
Digital Library Collections
– Reference: bibliographic description; persistent identifier
– Provenance: for scanned collections, metadata about the digitisation process and pointer to master version; for born-digital publications, pointer to the digital original; metadata about the preservation process, including pointers to earlier versions of the collection item and change history
– Context: pointers to related documents in original environment at the time of publication
– Fixity: digital signature; checksum; authenticity indicator
Software Package
– Reference: name; author/originator; version number; serial number
– Provenance: revision history; license holder; registration; copyright
– Context: help file; user guide; related software; language
– Fixity: certificate; checksum; encryption; CRC
Descriptive Information
Contains the data that serve as input to documents or applications called Access Aids.
Access Aids can be used by a consumer to locate, analyze, retrieve, or order information from the OAIS.
Packaging Information
Information which, either actually or logically, binds and relates the components of the package into an identifiable entity on specific media
Examples of Packaging Information include tape marks, directory structures and filenames
OAIS Archival Information Package
[Figure: structure of the Archival Information Package (AIP)]
– Content Information, e.g.:
  • Hardcopy document
  • Document as an electronic file together with its format description
  • Scientific data set consisting of image file, text file, and format descriptions file describing the other files
– Preservation Description Information (PDI), e.g., how the Content Information came into being, who has held it, how it relates to other information, and how its integrity is assured
– Packaging Information, e.g., how to find Content Information and PDI on some medium
– Package Description, e.g., information supporting customer searches for AIP
– Relationships shown: Content Information is further described by PDI; the AIP is delimited by Packaging Information; the Package Description is derived from the AIP
AIP Types
Based on the difference in Content Object complexity
Archival Information Units (AIUs) contain a single Data Object as the Content Object
Archival Information Collections (AICs) contain multiple AIPs in their Content Objects
– Each member of an AIC is an AIP containing Content Information and PDI
– The AIC contains unique PDI on the collection process
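The AIU/AIC distinction is a composite structure: a collection holds packages that may themselves be units or collections. A minimal sketch, assuming hypothetical class and field names and made-up sample values:

```python
from dataclasses import dataclass, field


@dataclass
class AIU:
    """Archival Information Unit: a single Data Object plus its PDI."""
    data_object: bytes
    pdi: dict = field(default_factory=dict)


@dataclass
class AIC:
    """Archival Information Collection: member AIPs plus PDI about the collection."""
    members: list                             # each member is an AIU or another AIC
    pdi: dict = field(default_factory=dict)   # PDI on the collection process


def count_data_objects(aip) -> int:
    """Count the Data Objects reachable from an AIP, recursing through collections."""
    if isinstance(aip, AIU):
        return 1
    return sum(count_data_objects(member) for member in aip.members)


# Made-up example: a collection of two units and one nested collection.
orbit_1 = AIU(b"orbit 1 data", {"reference": "orbit-1"})
orbit_2 = AIU(b"orbit 2 data", {"reference": "orbit-2"})
nested = AIC([AIU(b"calibration", {"reference": "cal-1"})], {"provenance": "calibration set"})
mission = AIC([orbit_1, orbit_2, nested], {"provenance": "assembled per mission"})

print(count_data_objects(mission))
```

Each member keeps its own PDI, while the collection adds PDI that applies only at the collection level.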
Package Descriptors and Access Aids
Package Descriptors are needed by an OAIS to provide visibility and access to the OAIS holdings
Package Descriptors contain one or more Associated Descriptions which describe the AIP Content Information from the point of view of a single Access Aid
Some examples of Access Aids include:
– Finding Aids: assist the consumer in locating information of interest
– Ordering Aids: allow the consumer to discover the cost of and order AIUs of interest
– Retrieval Aids: enable authorized users to retrieve the AIU described by the Unit Descriptor from Archival Storage
Information Model Summary
Presented a model of information objects as containing data objects and representation objects
Classified information required for Long-term archiving into 4 classes: Content Information, PDI, Packaging Information and Descriptive Information
Described how these classes would be aggregated and related in an AIP to fully describe an instance of Content Information
Presented information needed for Access, in addition to that needed for Long-term Preservation
Put the Access oriented structures in the context of the other data needed to operate an OAIS
Detailed Models
Functional View
General Principles
Highlight the major functional areas important to digital archiving
Use functional decomposition to clarify the range of functionality that might be encountered
– Don't decompose beyond two levels to avoid becoming too implementation dependent
– Provide a useful set of terms and concepts
– Do not imply that all archives need to implement all the sub-functions
Identify some common services which are likely to be needed, and are assumed to be available, as underlying support
Common Services
Modern, distributed computing applications assume a number of supporting services
Examples of Common Services include:
– inter-process communication
– name services
– temporary storage allocation
– exception handling
– security
– file and directory services
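OAIS Functional Entities
SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package
[Figure: OAIS functional entities. The PRODUCER submits SIPs to Ingest. Ingest passes AIPs to Archival Storage and Descriptive Info. to Data Management. Access obtains AIPs from Archival Storage and Descriptive Info. from Data Management, receives queries and orders from the CONSUMER, and returns result sets and DIPs. Preservation Planning and Administration span these entities, and Administration interacts with MANAGEMENT.]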
Functional Entities in an OAIS
Ingest: This entity provides the services and functions to accept Submission Information Packages (SIPs) from Producers and prepare the contents for storage and management within the archive
Archival Storage: This entity provides the services and functions for the storage, maintenance and retrieval of Archival Information Packages
Data Management: This entity provides the services and functions for populating, maintaining, and accessing both descriptive information which identifies and documents archive holdings and internal archive administrative data.
Administration: This entity manages the overall operation of the archive system
Preservation Planning: This entity monitors the environment of the OAIS and provides recommendations to ensure that the information stored in the OAIS remains accessible to the Designated User Community over the long term, even if the original computing environment becomes obsolete.
Access: This entity supports Consumers in determining the existence, description, location and availability of information stored in the OAIS, and allows Consumers to request and receive information products
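How three of these entities cooperate can be shown with a toy in-memory archive. This is a sketch only: the `Archive` class, its method names, and the use of a truncated SHA-256 digest as an AIP identifier are assumptions, and Administration and Preservation Planning are left out for brevity.

```python
import hashlib


class Archive:
    """Toy OAIS: Ingest feeds Archival Storage and Data Management; Access reads both."""

    def __init__(self):
        self.archival_storage = {}   # AIP identifier -> stored content
        self.data_management = {}    # AIP identifier -> descriptive information

    def ingest(self, sip_content: bytes, description: str) -> str:
        """Ingest: accept a SIP, store the AIP, and record its description."""
        aip_id = hashlib.sha256(sip_content).hexdigest()[:12]   # hypothetical ID scheme
        self.archival_storage[aip_id] = sip_content
        self.data_management[aip_id] = description
        return aip_id

    def query(self, term: str) -> list:
        """Access: return a result set of AIP identifiers whose description matches."""
        term = term.lower()
        return sorted(i for i, d in self.data_management.items() if term in d.lower())

    def order(self, aip_id: str) -> bytes:
        """Access: deliver the content of an ordered AIP as a (trivial) DIP."""
        return self.archival_storage[aip_id]


archive = Archive()
aip_id = archive.ingest(b"plasma wave spectra", "Plasma wave data, 1998")
print(archive.query("plasma"))
print(archive.order(aip_id))
```

The Consumer never touches Archival Storage directly; queries go against descriptive information, and only an order retrieves content.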
Ingest Data Flow Diagram
[Figure: Ingest data flow diagram (NESTOR/Grunberger, Rev f, 2-25-99). Functions shown: Receive Submission, Quality Assurance, Generate AIP, Generate Descriptive Info, and Co-ordinate Updates. The PRODUCER supplies SIPs; Quality Assurance returns QA results; AIPs are sent to Archival Storage, which returns a storage confirmation; Descriptive info. is sent to Data Management; Administration supplies format and documentation standards, issues report requests, and receives reports.]
Preservation Planning
Preservation Perspectives
Migration Context
[Figure: migration context. Archival Storage view: AIP Identifier, Archival Storage Mapping, Packaging Information, Content Information, and Preservation Description Information. Data Management and Access view: Content Information Identifier, Descriptive Information Mapping, and AIP Identifier.]
Digital Migration
Digital Migration is defined to be the transfer of digital information, while intending to preserve it, within the OAIS.
– Focus on preservation of the full information content
– New information implementation replaces the old
– OAIS has full control and responsibility over all aspects of the transfer
Three major motivators are seen to drive Digital Migrations of Archival Information Packages within an OAIS:
– Media Decay
– Increased Cost Effectiveness
– New Consumer Service Requirements
Digital Migration Approaches
Four primary types of digital migration in response to motivators, ordered by increasing risk of information loss:
– Refreshment
  • Media replacement with no bit changes
– Replication
  • No change to Packaging Information or Content Information bits
– Repackaging
  • Some bit changes in Packaging Information
– Transformation
  • Reversible: Bit changes in Content Information are reversible by an algorithm
  • Non-reversible: Bit changes in Content Information are not reversible by an algorithm
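The four migration types can be told apart by asking which bits changed. The sketch below is a simplified reading of the bullet definitions, not the standard's full wording: `StoredAip`, its three byte fields, and the `reversible` flag supplied by the caller are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Migration(Enum):
    # Values increase with the risk of information loss.
    REFRESHMENT = 1
    REPLICATION = 2
    REPACKAGING = 3
    TRANSFORMATION_REVERSIBLE = 4
    TRANSFORMATION_NON_REVERSIBLE = 5


@dataclass(frozen=True)
class StoredAip:
    medium_image: bytes   # every bit written to the medium (simplifying assumption)
    packaging: bytes      # bits of the Packaging Information
    content: bytes        # bits of the Content Information


def classify(old: StoredAip, new: StoredAip, reversible: bool = False) -> Migration:
    """Classify a migration by which bits differ between old and new copies."""
    if old.medium_image == new.medium_image:
        return Migration.REFRESHMENT          # no bit changes at all
    if old.packaging == new.packaging and old.content == new.content:
        return Migration.REPLICATION          # packaging and content bits unchanged
    if old.content == new.content:
        return Migration.REPACKAGING          # only packaging bits changed
    if reversible:
        return Migration.TRANSFORMATION_REVERSIBLE
    return Migration.TRANSFORMATION_NON_REVERSIBLE


# Made-up example: same content moved from a tar-style to a zip-style package.
old = StoredAip(b"tar|data", b"tar", b"data")
new = StoredAip(b"zip|data", b"zip", b"data")
print(classify(old, new))
```

Whether a transformation is reversible cannot be read off the bits, so the sketch takes it as an input from whoever performs the conversion.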
Access Preservation
Effective access to digital information requires the use of software
Application Programming Interfaces (APIs) may be cost-effectively maintained across time by an OAIS when:
– API is not too complex
– API is applicable to a wide variety of AIUs
API source code may be ported to new environments
– Extensive testing is needed to ensure against information loss
Preservation of executables by full emulation of underlying hardware is problematic
– Hard to know what is the information being preserved
– May not be possible to fully emulate associated devices
Archive Interoperability
Archive Interoperability Motivators
Users of multiple OAIS archives have reasons to wish for some interoperability or cooperation among the OAISs.
Consumers
– Common finding aids to aid in locating information over several OAIS archives
– Common Package Descriptor schema for access
– Common DIP schema for dissemination, or a single global access site
Producers
– Common SIP schema for submission to different archives
– A single depository for all their products
Managers
– Cost reduction through sharing of expensive hardware
– Increasing the uniformity and quality of user interactions with the OAIS
Categories of Archive Interactions
Independent: no knowledge by one OAIS of standards implemented at another
Cooperating: Potentially common submission standards, and common dissemination standards, but no common access. One archive may make subscription requests for key data at the cooperating archive
Federated: Access to all federated OAISs is provided through a common set of access aids that provide visibility into all participating OAISs. Global dissemination and Ingest are options
Shared resources: An OAIS in which Management has entered into agreements with other OAISs to share resources to reduce cost. This requires various standards internal to the archive (such as ingest-storage and access-storage interface standards), but does not alter the community’s view of the archive
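The federated category can be sketched as a common finding aid that fans a query out to each participating archive and merges the results. The `Oais` class, its `find` method, and the holdings shown are hypothetical; real federations would agree on a shared descriptor schema rather than plain strings.

```python
class Oais:
    """Toy archive exposing a local finding aid over its own holdings."""

    def __init__(self, name: str, holdings: dict):
        self.name = name
        self.holdings = holdings   # AIP identifier -> description

    def find(self, term: str) -> list:
        term = term.lower()
        return [(self.name, aip_id)
                for aip_id, description in self.holdings.items()
                if term in description.lower()]


def federated_find(archives: list, term: str) -> list:
    """Common access aid: one query gives visibility into all participating OAISs."""
    results = []
    for archive in archives:
        results.extend(archive.find(term))
    return sorted(results)


# Made-up holdings for two federated archives.
oais_1 = Oais("OAIS 1", {"a1": "Solar wind plasma data", "a2": "Lunar sample catalog"})
oais_2 = Oais("OAIS 2", {"b1": "Plasma physics survey"})

print(federated_find([oais_1, oais_2], "plasma"))
```

Each archive keeps serving its local Consumers through its own `find`; the federation only adds the merged view on top.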
Federated Archives
[Figure: federated archives. OAIS 1 and OAIS 2 each have their own Ingest, Access, and Administration entities and serve their own Local Consumers. A Common Catalog, connected to each archive's Access and Administration, serves a Global Consumer. Delivery of Dissemination Information Packages to the Global Consumer is optional.]
Levels of Autonomy in Associated Archives
No interactions and therefore no association
Associations that maintain your autonomy. You have to do certain things to participate, but you can leave the association without notice or impact to you.
Associations that bind you by contract. To change the nature of this association you will have to re-negotiate the contract. The amount of autonomy retained depends on how difficult it is to negotiate the changes.
Reference Model Summary
Reference model is to be applicable to all digital archives, and their Producers and Consumers
Identifies a minimum set of responsibilities for an archive to claim it is an OAIS
Establishes common terms and concepts for comparing implementations, but does not specify an implementation
Provides detailed models of both archival functions and archival information
Discusses OAIS information migration and interoperability among OAISs
Some Applications
Basis of Systems Architectures
NEDLIB (Networked European Deposit Library) effort used OAIS Reference Model as a basis for the design and architecture of Deposit System for Electronic Publications (DSEP)
National Library of Australia used it as basis for their implementation
CEDARS: A multi-site UK project to create exemplars in Digital Archiving is using OAIS representation data as the basis for research into long term preservation
NSSDC (National Space Science Data Center) is evolving its archive using the OAIS RM as a basis for a new architecture
SIPAD: French space agency plasma physics archive used the OAIS as a basis for design
METS (Metadata Encoding and Transmission Standard) is using OAIS concepts in an implementation of types of Submission, Archival, and Dissemination Information Packages.
InterPARES, a body of National Archives from many countries, adopted OAIS as a starting point for their modeling work
Enhanced Communications and Productivity among Varied Communities
National Archives and Records Administration contracted some work on long term preservation of collections to the San Diego Supercomputer Center. Both parties claimed use of the OAIS RM saved several weeks of effort in the specification of the task
Similar experiences between:
– National Library of France and French space agency (CNES) representatives
– National Center for Supercomputing Applications HDF format developers and DNA researchers
– Life Sciences Archive developer and micro-gravity researchers
– United States Department of Agriculture and digital preservation experts
More OAIS Accomplishments
Royal Library of the Netherlands (RLN)
– OAIS mandated in their implementation RFP
– IBM implementing OAIS-based system for RLN (£5M project)
British National Library is following suit
France setting up a working group within ARISTOTE
– Interested in archive of digital information, including libraries and Dept of Justice
  • http://www.aristote.asso.fr/ (in French)
  • “astonishing unifying role” from OAIS reference model
OAIS likely to be used by CODATA archive task group in study on long-term preservation
Playing significant role in Research Libraries Group and OCLC (Online Computer Library Center) digital preservation work
Follow-on Activities
Research Libraries Group has established a web page to track OAIS implementation efforts and issues
– http://www.rlg.org/longterm/oais.html
CCSDS/ISO Producer-Archive Interface Methodology Standard
– Provides framework for Producer/Archive interactions
– Identifies steps and types of information exchanged during the ‘negotiation’
– May be used as a checklist by archives
CCSDS Certification Coordination Function
– Will track and summarize various archive certification efforts
– Will attempt to extract high-level model/checklist
– RLG is organizing a group to establish certification approaches