

A hybrid approach of digital long-term preservation to institutional repositories - A case study of DSpace/SRB Integration

Ya-ning Arthur Chen, Feng-chien Chung

Computing Centre, Academia Sinica

11 April, ISGC 2008


Outline

• Background of MAAT

• From Website to Institutional Repository

• Long-Term Preservation & OAIS

• The Hybrid Approach

• Future

MAAT – Background

• The Metadata Architecture & Application Team (MAAT) was established in 2002 to conduct metadata research and provide support services for the National Digital Archives Program (NDAP) in Taiwan.

• To date, MAAT has supported over 80 digital library projects of the Taiwan E-Learning & Digital Archives Program (TELDAP, formerly NDAP).

MAAT – Motivation

• MAAT has created a large number of documents, which can be categorized into:

– questionnaires,

– work sheets,

– meeting records,

– metadata mapping tables,

– system specifications,

– best practices of metadata standards,

– technical reports,

– research papers,

– briefings, and

– tutorial materials.

• Most documents on the MAAT website are arranged statically.

MAAT Website

[Figure: the MAAT website]

MAAT – Consideration 1

• Document management and repository
– Over 1,000 documents and URL links have been arranged and served on the MAAT website.
– The MAAT website needs an effective document management system.

• Access control
– The MAAT website still lacks access control for its documents.

MAAT – Consideration 2

• Workflow reengineering
– The MAAT website adopts a centralized model to maintain documents and the website arrangement.
– This model is very complicated and labor-intensive, and its overhead cost is very high.

• Usage statistics report

MAAT – Challenge

• Too many publications,
• Too much change (i.e., many document versions),
• Too many contributors, and
• Too many institutions.

Implementation Level

• Phase 1: from website to IR (Static Website → Institutional Repository)

DSpace – Features

• Captures
– Digital research material in any format
– Directly from creators (e.g., faculty)
– Large-scale, stable, managed long-term storage

• Describes
– Descriptive metadata (Dublin Core)
– Technical metadata (file size, format…)
– Rights metadata (licenses, Creative Commons…)

• Distributes
– Via the WWW, with necessary access control

• Preserves
– Persistent IDs via the Handle System
– Bitstream format registry
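DSpace gives every community, collection, and item a persistent identifier from the Handle System, written prefix/suffix and resolvable through the public proxy at hdl.handle.net. The minimal sketch below parses such a handle and builds its resolver URL; the class name `Handle` and the example handle `123456789/42` are hypothetical illustrations, not taken from the MAAT repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote

# Public proxy of the global Handle System: "prefix/suffix" resolves at
# https://hdl.handle.net/prefix/suffix
HANDLE_PROXY = "https://hdl.handle.net/"
_LEADS = ("hdl:", HANDLE_PROXY, "http://hdl.handle.net/")


@dataclass(frozen=True)
class Handle:
    prefix: str  # naming authority assigned to the repository
    suffix: str  # local identifier minted by the repository

    @classmethod
    def parse(cls, text: str) -> Handle:
        """Parse 'prefix/suffix', tolerating an 'hdl:' scheme or a proxy URL."""
        for lead in _LEADS:
            if text.startswith(lead):
                text = text[len(lead):]
                break
        prefix, sep, suffix = text.partition("/")
        if not (sep and prefix and suffix):
            raise ValueError(f"not a handle: {text!r}")
        return cls(prefix, suffix)

    def __str__(self) -> str:
        return f"{self.prefix}/{self.suffix}"

    def resolver_url(self) -> str:
        # Percent-encode unsafe characters; "/" inside the suffix is kept as-is.
        return HANDLE_PROXY + quote(self.prefix) + "/" + quote(self.suffix)


h = Handle.parse("hdl:123456789/42")  # hypothetical item handle
print(h.resolver_url())  # https://hdl.handle.net/123456789/42
```

Because the handle, not the server URL, is what gets cited, the repository can move hosts without breaking links; only the handle server's records need updating.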

DSpace – Data Model
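The DSpace data model nests content as community → collection → item → bundle → bitstream: communities may contain sub-communities and collections, collections hold items (not other collections), and an item carries Dublin Core metadata and groups its files into bundles such as ORIGINAL and LICENSE. The dataclasses below are a simplified Python sketch of that hierarchy, not DSpace's actual Java classes; the community, collection, and file names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bitstream:
    name: str
    mime_type: str
    size_bytes: int


@dataclass
class Bundle:
    name: str  # e.g. "ORIGINAL" (uploaded files) or "LICENSE" (deposit license)
    bitstreams: list[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    metadata: dict[str, list[str]]  # Dublin Core field, e.g. "dc.title" -> values
    bundles: list[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: list[Item] = field(default_factory=list)  # items only; no nested collections


@dataclass
class Community:
    name: str
    subcommunities: list[Community] = field(default_factory=list)  # any depth
    collections: list[Collection] = field(default_factory=list)

    def count(self) -> tuple[int, int, int]:
        """Return (communities, collections, items) in this subtree, self included."""
        communities, collections = 1, len(self.collections)
        items = sum(len(c.items) for c in self.collections)
        for sub in self.subcommunities:
            c, k, i = sub.count()
            communities, collections, items = communities + c, collections + k, items + i
        return communities, collections, items


# Hypothetical two-level tree with one collection and one item.
guide = Item({"dc.title": ["Metadata mapping table"]},
             [Bundle("ORIGINAL", [Bitstream("mapping.pdf", "application/pdf", 120_000)])])
projects = Community("Supported Projects", collections=[Collection("Project A", [guide])])
root = Community("MAAT", subcommunities=[projects])
print(root.count())  # (2, 1, 1)
```

A recursive `count` like this is one way to produce the community/collection/item totals reported for the MAAT repository.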

MAAT – Content 1

• Content types
– Supported Projects (支援計畫): documents from the projects we support
– Publications and Activities (出版與活動): publication and activity documents
– Project Management (計畫管理): project management documents (restricted)
– Research & Development (研究發展): R&D documents (restricted)
– 48 communities, 110 collections, 783 items

• Document formats
– User uploaded: 794 PDF files, 446 MS Word files, 59 MS PowerPoint slide decks, 27 XML files, 17 JPEG images, 16 HTML files, 7 MS Excel files… and others
– System generated: over 1,900 plain-text files (mainly DSpace license files)…
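DSpace's bitstream format registry records, for each format, a MIME type, file extensions, and a support level (unknown, known, or supported), and uploads are typically identified by file extension. The sketch below imitates that with a small hypothetical registry, then tallies files per format, the kind of count a content report like the one above needs. The entries and the support levels assigned to them are illustrative assumptions, not DSpace's shipped defaults.

```python
from __future__ import annotations

from collections import Counter
from enum import IntEnum
from pathlib import PurePath


class SupportLevel(IntEnum):
    UNKNOWN = 0
    KNOWN = 1
    SUPPORTED = 2


# Hypothetical registry: format name -> (MIME type, extensions, support level).
# Support levels here are illustrative, not DSpace's shipped defaults.
REGISTRY = {
    "Adobe PDF": ("application/pdf", {"pdf"}, SupportLevel.SUPPORTED),
    "Microsoft Word": ("application/msword", {"doc"}, SupportLevel.KNOWN),
    "Microsoft PowerPoint": ("application/vnd.ms-powerpoint", {"ppt"}, SupportLevel.KNOWN),
    "Microsoft Excel": ("application/vnd.ms-excel", {"xls"}, SupportLevel.KNOWN),
    "XML": ("text/xml", {"xml"}, SupportLevel.SUPPORTED),
    "HTML": ("text/html", {"htm", "html"}, SupportLevel.SUPPORTED),
    "JPEG": ("image/jpeg", {"jpg", "jpeg"}, SupportLevel.SUPPORTED),
    "Text": ("text/plain", {"txt"}, SupportLevel.SUPPORTED),
}

# Invert the table once so identification is a single dictionary lookup.
_BY_EXTENSION = {
    ext: name for name, (_mime, exts, _level) in REGISTRY.items() for ext in exts
}


def identify(filename: str) -> tuple[str, SupportLevel]:
    """Guess a file's format from its extension; unmatched files are 'Unknown'."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    name = _BY_EXTENSION.get(ext)
    if name is None:
        return "Unknown", SupportLevel.UNKNOWN
    return name, REGISTRY[name][2]


def tally(filenames: list[str]) -> Counter[str]:
    """Count files per format, e.g. for a repository content report."""
    return Counter(identify(f)[0] for f in filenames)


print(tally(["guide.pdf", "minutes.DOC", "map.xml", "scan.tiff"]))
```

Files that fall through to "Unknown" are the ones a preservation plan should look at first, since the repository can store them but promises nothing about rendering them later.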

MAAT – Content 2

• Access methods
– DSpace browse and search interface
– Search engines (Google, Yahoo, etc.)
– OAI-PMH harvesting
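OAI-PMH harvesting runs over plain HTTP: a harvester sends a `ListRecords` request naming a metadata format (`oai_dc`, unqualified Dublin Core, is the format every OAI-PMH repository must support), then keeps re-requesting with the `resumptionToken` from each response until the token comes back empty. A minimal sketch follows; the base URL is hypothetical, and `fetch` is injectable so the paging logic can be exercised without a live server.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """First request names the metadata format; later ones send only the token,
    because resumptionToken is an exclusive argument in OAI-PMH."""
    if token is None:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    else:
        params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_bytes: bytes) -> tuple[list[str], Optional[str]]:
    """Return (record identifiers, next resumptionToken or None) for one response."""
    root = ET.fromstring(xml_bytes)
    ids = [el.text or "" for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    tok = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page.
    token = (tok.text or "").strip() if tok is not None else ""
    return ids, token or None


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def harvest(base_url: str, fetch: Callable[[str], bytes] = http_get) -> Iterator[str]:
    """Yield every record identifier, following resumptionTokens page by page."""
    token: Optional[str] = None
    while True:
        ids, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from ids
        if token is None:
            return


# Usage against a live endpoint (hypothetical URL):
# for oai_id in harvest("https://repository.example.org/oai/request"):
#     print(oai_id)
```

The sketch ignores OAI-PMH error responses (for example `noRecordsMatch`) and deleted-record headers; a production harvester would handle both and would also pass `from`/`until` dates for incremental harvesting.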



DSpace – Considerations

• The need to extend DSpace storage capabilities
– The number of documents grows so fast that a very large-scale storage solution is required.

• The lack of a risk management mechanism
– Reliable backup and disaster recovery systems are not included in the default DSpace installation.
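Backups and disaster-recovery copies are only trustworthy if they are verified. DSpace records an MD5 checksum for each bitstream at ingest; the sketch below is a hypothetical helper in that spirit, not DSpace's own checksum checker. It builds a manifest of relative path → MD5 for a storage directory and later audits any copy (primary or backup) against it, reporting missing and corrupt files.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large bitstreams never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(store_root: Path) -> dict[str, str]:
    """Record relative path -> MD5 for every file under a storage directory."""
    return {
        p.relative_to(store_root).as_posix(): md5_of(p)
        for p in sorted(store_root.rglob("*"))
        if p.is_file()
    }


def audit(store_root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Check a copy (primary or backup) against a manifest recorded earlier."""
    report: dict[str, list[str]] = {"missing": [], "corrupt": []}
    for rel, expected in sorted(manifest.items()):
        path = store_root / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif md5_of(path) != expected.lower():
            report["corrupt"].append(rel)
    return report


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for an asset store.
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        (store / "report.pdf").write_bytes(b"%PDF-1.4 ...")
        manifest = build_manifest(store)
        (store / "report.pdf").write_bytes(b"tampered")
        print(audit(store, manifest))  # {'missing': [], 'corrupt': ['report.pdf']}
```

Keeping the manifest separate from the files it describes means the same manifest can audit every replica, which is what makes a corrupt backup detectable before it is needed for recovery.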

Implementation Level

• Phase 1: from website to IR (Static Website → Institutional Repository)

• Phase 2: from IR to long-term preservation (Institutional Repository → Institutional Repository + Grid)

DSpace/SRB Approach 1

• In 2004, NARA (with NSF/NPACI) funded a project aimed at integrating DSpace and SRB to:
– allow DSpace to use the data grid as a storage layer
– permit the exchange of authentic documents between them

• NARA proposal & participants
– San Diego Supercomputer Center (SDSC), a member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program
– MIT Libraries
– UC San Diego Libraries (UCSD)
– Hewlett-Packard Laboratories (HP)
– National Archives and Records Administration (NARA)

DSpace/SRB Approach 2

• DSpace can have multiple bitstream stores; each can be either traditional (local file system) storage or SRB storage.

• Both traditional and SRB bitstream stores are specified by configuration parameters in dspace.cfg.
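With several bitstream stores, DSpace must know where each store lives and which one receives new uploads. The sketch below parses a properties-style fragment and resolves the incoming store. The key names (`assetstore.dir`, `assetstore.dir.N`, `srb.host.N`, `assetstore.incoming`) are modelled on the numbered-key style of DSpace 1.x `dspace.cfg` but should be treated as assumptions; check the `dspace.cfg` shipped with your DSpace version for the exact SRB parameters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    index: int
    kind: str      # "local" (traditional file system) or "srb"
    location: str  # directory path, or SRB host name for an SRB store


def parse_properties(text: str) -> dict[str, str]:
    """Minimal 'key = value' parser that skips blank lines and '#' comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def configured_stores(props: dict[str, str]) -> dict[int, Store]:
    """Store 0 comes from 'assetstore.dir'; store N from keys ending in '.N'."""
    stores: dict[int, Store] = {}
    if "assetstore.dir" in props:
        stores[0] = Store(0, "local", props["assetstore.dir"])
    for key, value in props.items():
        base, _, suffix = key.rpartition(".")
        if not suffix.isdigit():
            continue
        n = int(suffix)
        if base == "assetstore.dir":
            stores[n] = Store(n, "local", value)
        elif base == "srb.host":
            stores[n] = Store(n, "srb", value)
    return stores


def incoming_store(props: dict[str, str]) -> Store:
    """Return the store that receives new bitstreams (store 0 if unspecified)."""
    stores = configured_stores(props)
    n = int(props.get("assetstore.incoming", "0"))
    if n not in stores:
        raise KeyError(f"assetstore.incoming refers to undefined store {n}")
    return stores[n]


SAMPLE = """
assetstore.dir = /dspace/assetstore
# hypothetical SRB-backed store number 1
srb.host.1 = srb.example.org
assetstore.incoming = 1
"""

print(incoming_store(parse_properties(SAMPLE)))
# Store(index=1, kind='srb', location='srb.example.org')
```

The sketch assumes each stored bitstream remembers the number of the store that holds it, so pointing the incoming store at a new SRB store redirects only future uploads and leaves existing files readable in place.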

Examination of DSpace/SRB

• An Open Archival Information System (OAIS) is an archive that aims to preserve information for access and use by a Designated Community.

OAIS Functional Model

1. Common service (Network, OS…)

2. DSpace Ingest (Submit Interface & Batch Import)

3. SRB Storage

5. DSpace & SRB Admin

6. DSpace User Interface

OAIS Functional Model… Again

[Figure: OAIS functional model mapped to DSpace & SRB Administration, Submit Interface, DSpace Ingest, DSpace Batch Import, SRB Mass Storage, and DSpace User Interface]

Producer, Management and Consumer

• Producer
– DSpace may play the role of ingesting SIPs (Submission Information Packages) from the Producer and generating AIPs (Archival Information Packages) for Management & Storage.

• Management
– SRB may play the role of receiving AIPs, storing & managing the data, and generating AIPs for Access.

• Consumer
– DSpace may play the role of processing access requests and generating the proper DIPs (Dissemination Information Packages) for dissemination.
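The division of roles above can be sketched as a toy pipeline: an ingest step (the DSpace role) validates a SIP and turns it into an AIP with recorded fixity values, a storage layer (the SRB role, here just an in-memory dictionary) keeps AIPs, and an access step (again DSpace) checks fixity and assembles a DIP. All class names, the `dc.title` validation rule, and the `aip-N` identifiers are hypothetical simplifications, not DSpace or SRB APIs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SIP:
    """Submission Information Package handed over by the Producer."""
    files: dict[str, bytes]   # file name -> content
    metadata: dict[str, str]  # descriptive metadata, e.g. Dublin Core


@dataclass(frozen=True)
class AIP:
    """Archival Information Package held in archival storage."""
    identifier: str
    files: dict[str, bytes]
    metadata: dict[str, str]
    fixity: dict[str, str]    # file name -> MD5 recorded at ingest


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package returned to a Consumer."""
    identifier: str
    metadata: dict[str, str]
    files: dict[str, bytes]


class ToyArchive:
    """Ingest (DSpace role) -> archival storage (SRB role) -> access (DSpace role)."""

    def __init__(self) -> None:
        self._storage: dict[str, AIP] = {}  # stands in for SRB mass storage
        self._next_id = 1

    def ingest(self, sip: SIP) -> str:
        """Validate a SIP, record fixity, and store the resulting AIP."""
        if "dc.title" not in sip.metadata:  # hypothetical validation rule
            raise ValueError("SIP rejected: missing dc.title")
        fixity = {name: hashlib.md5(data).hexdigest() for name, data in sip.files.items()}
        aip_id = f"aip-{self._next_id}"
        self._next_id += 1
        self._storage[aip_id] = AIP(aip_id, dict(sip.files), dict(sip.metadata), fixity)
        return aip_id

    def access(self, aip_id: str, wanted: set[str] | None = None) -> DIP:
        """Verify fixity of the requested files and package them as a DIP."""
        aip = self._storage[aip_id]
        names = sorted(set(aip.files) if wanted is None else set(wanted) & set(aip.files))
        for name in names:
            if hashlib.md5(aip.files[name]).hexdigest() != aip.fixity[name]:
                raise RuntimeError(f"fixity check failed for {name}")
        return DIP(aip_id, dict(aip.metadata), {n: aip.files[n] for n in names})


archive = ToyArchive()
aip_id = archive.ingest(SIP({"guide.pdf": b"%PDF-1.4"}, {"dc.title": "Metadata guide"}))
print(archive.access(aip_id).files)  # {'guide.pdf': b'%PDF-1.4'}
```

Keeping the three package types distinct makes the hand-off points explicit: only `ingest` may create AIPs, and consumers only ever see DIPs.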


Archives arrangement

• Logical archives structure:
– DSpace allows multi-level communities and one level of collections
– Archival principles: the principle of provenance and the principle of respect des fonds

• Physical file arrangement:
– SRB mass storage technology


Future

• Best practices & SOPs for DSpace/SRB integration

• Deeper check against OAIS activities

• Preservation planning and policy
– Monitor the service requirements of Producer/Management/Consumer and emerging technology; develop an archival strategy & migration plan


• Feasibility evaluation
– Migrate from SRB to other advanced technologies, such as SRM, iRODS…
– Adopt a metadata approach to enhance digital preservation, such as PREMIS and METS (e.g., structural map, behavior section…)
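METS wraps an item's files and their organization in one XML document. Its structural map (`structMap`), the only section METS requires, arranges files into a hierarchy of `div` elements that point at entries in the file section (`fileSec`). The sketch below builds a minimal METS document with Python's standard ElementTree; the item label and file URL are hypothetical, and descriptive (`dmdSec`), PREMIS-based administrative (`amdSec`), and behavior (`behaviorSec`) sections are left out for brevity.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's '{namespace}tag' notation."""
    return f"{{{ns}}}{tag}"


def build_mets(label: str, files: list[tuple[str, str]]) -> ET.Element:
    """Build a minimal METS document for one item from (URL, MIME type) pairs."""
    root = ET.Element(q(METS_NS, "mets"), {"LABEL": label})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "ORIGINAL"})
    # structMap must follow fileSec in a METS document.
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "LOGICAL"})
    item_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "item", "LABEL": label})
    for order, (url, mime) in enumerate(files, start=1):
        file_id = f"FILE{order}"
        f = ET.SubElement(file_grp, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": mime})
        ET.SubElement(f, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): url})
        # Each file gets its own ordered div that points back at the fileSec entry.
        part = ET.SubElement(item_div, q(METS_NS, "div"), {"ORDER": str(order)})
        ET.SubElement(part, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


doc = build_mets("Metadata mapping table",
                 [("https://repository.example.org/bitstream/mapping.pdf", "application/pdf")])
print(ET.tostring(doc, encoding="unicode"))
```

Because the structMap refers to files only by `FILEID`, the physical location in each `FLocat` can change during a storage migration without touching the logical arrangement.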

Thank You
