
National Partnership for Advanced Computational Infrastructure

Digital Library Architecture

Reagan Moore, Chaitan Baru, Amarnath Gupta, George Kremenek, Bertram Ludaescher, Richard Marciano, Arcot Rajasekar, Wayne Schroeder, Michael Wan, Ilya Zaslavsky, Bing Zhu

(http://www.npaci.edu/DICE/)


What Types of Management Systems are Required?

• Data management
  • Ability to access multiple types of storage systems, across separate administration domains

• Information management
  • Ability to migrate collection onto new information repository

• Knowledge management
  • Rule-based ontology mapping
  • Characterization of rules under which collection is formed
  • Management of knowledge bases - Topic Maps
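
The rule-based mapping item can be illustrated with a small sketch. The slides do not describe how the rules are expressed, so everything below is an assumption made for illustration: the rule table, the attribute names, and the map_record helper are hypothetical and are not part of any NPACI software. Each rule names a source attribute, the target attribute it maps to, and a value conversion.

```python
from typing import Callable, Dict, Tuple

# Hypothetical rule format: source attribute -> (target attribute, converter).
Rule = Tuple[str, Callable[[str], object]]

RULES: Dict[str, Rule] = {
    "ra_deg": ("right_ascension", float),
    "dec_deg": ("declination", float),
    "obj_name": ("title", str.strip),
}


def map_record(record: Dict[str, str], rules: Dict[str, Rule]) -> Dict[str, object]:
    """Translate a record from a source vocabulary into a target vocabulary."""
    mapped: Dict[str, object] = {}
    for source_attr, value in record.items():
        if source_attr not in rules:
            continue  # attributes without a rule are dropped
        target_attr, convert = rules[source_attr]
        mapped[target_attr] = convert(value)
    return mapped


# Illustrative input only; the attribute names are invented for this sketch.
source = {"ra_deg": "10.68", "dec_deg": "41.27", "obj_name": " M31 ", "note": "x"}
mapped = map_record(source, RULES)
print(mapped)
```

Keeping the rules in a table rather than in code is one way to record the conditions under which a collection was formed alongside the collection itself.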


Information Management Hierarchy

• Persistent Archives
  • Storage of information model, data model, along with data

• Data Grid
  • Access to data in a different administration domain

• Digital Library - Presentation / Information Discovery
  • Interlib - ADEPT, UC Berkeley Digital Library

• Data Collection
  • Extensible Meta-data catalog - EMCAT

• Data handling
  • SDSC Storage Resource Broker - SRB

• Archival Storage
  • High performance storage system - HPSS


Digital Library Data Management

• Persistent identifiers
  • Ability to move a data set without the name changing

• Data set replicas
  • Management of multiple copies of a data set

• Archival backup of data sets
  • Integration of disk data caches with archival storage

• Persistent archives
  • Management of a collection through multiple cycles of technology evolution
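
The first two items (persistent identifiers and replicas) can be sketched as a catalog that maps one logical name to several physical copies. This is a minimal illustration under stated assumptions, not the SRB or MCAT interface: the ReplicaCatalog class, its methods, and the location strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicaCatalog:
    """Hypothetical catalog: persistent logical name -> physical copies."""

    replicas: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, logical_name: str, location: str) -> None:
        """Record one more physical copy of a data set."""
        self.replicas.setdefault(logical_name, []).append(location)

    def move(self, logical_name: str, old: str, new: str) -> None:
        """Relocate one copy; the logical name does not change."""
        copies = self.replicas[logical_name]
        copies[copies.index(old)] = new

    def locate(self, logical_name: str) -> List[str]:
        """Return all known physical locations for a logical name."""
        return list(self.replicas.get(logical_name, []))


# Invented names and locations, for illustration only.
name = "/collection/sky/image-001"
catalog = ReplicaCatalog()
catalog.register(name, "disk-cache://siteA/img001")
catalog.register(name, "archive://siteB/tape/img001")
catalog.move(name, "disk-cache://siteA/img001", "disk-cache://siteC/img001")
locations = catalog.locate(name)
print(locations)
```

Because users refer only to the logical name, a copy can be migrated to new storage, or a disk cache copy can sit next to an archival copy, without any reference to the data set changing.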


SDSC Storage Resource Broker & Meta-data Catalog

[Figure: architecture diagram. Labels: Application; SRB; MCAT; Dublin Core; Resource; User; Application Meta-data; Remote Proxies; DataCutter; Third-party copy; File SID; DBLobj SID; Obj SID; storage systems: ADSM, HPSS, DB2, Oracle, Unix]
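
The general idea of a broker that sits between an application and several kinds of storage, with a catalog recording where each object lives, can be sketched as follows. This is an assumption-laden illustration and does not reproduce the actual SRB or MCAT interfaces: StorageDriver, InMemoryDriver, Broker, and the resource names are hypothetical, and the in-memory driver merely stands in for a real archive, database, or file system.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class StorageDriver(ABC):
    """Hypothetical uniform interface over one kind of storage system."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryDriver(StorageDriver):
    """Stand-in for a real back end (archive, database, file system)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._objects[path]

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data


class Broker:
    """Routes requests by logical name using a small catalog."""

    def __init__(self) -> None:
        self._drivers: Dict[str, StorageDriver] = {}
        # Catalog: logical name -> (resource name, physical path).
        self._catalog: Dict[str, Tuple[str, str]] = {}

    def add_resource(self, name: str, driver: StorageDriver) -> None:
        self._drivers[name] = driver

    def put(self, logical_name: str, resource: str, path: str, data: bytes) -> None:
        self._drivers[resource].write(path, data)
        self._catalog[logical_name] = (resource, path)

    def get(self, logical_name: str) -> bytes:
        resource, path = self._catalog[logical_name]
        return self._drivers[resource].read(path)


broker = Broker()
broker.add_resource("archive", InMemoryDriver())
broker.add_resource("filesystem", InMemoryDriver())
broker.put("survey/readme", "filesystem", "/data/readme.txt", b"hello")
broker.put("survey/image-001", "archive", "/tape/0001", b"\x00\x01")
print(broker.get("survey/readme"))
```

The application only ever supplies a logical name; which storage system holds the bytes is a catalog detail that can change without touching application code.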


Common Information Model

• eXtensible Markup Language (XML)
  • Use tags to define semantic context for components of the data set

• Document Type Definition (DTD)
  • Provides semi-structured representation for organizing tags that can be applied to groups of digital objects

• Development of standards for tags
  • Digital sky, Protein Data Bank, Neuroscience brain images
  • California Digital Library - Art Museum Image Consortium
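
As a minimal sketch of how tags give semantic context to the components of a data set, the example below parses a small XML record with Python's standard library. The element names and values are invented for illustration and do not come from any of the tag standards listed above. Note that xml.etree.ElementTree does not validate against a DTD, so this shows only the tagging idea, not DTD enforcement.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag names and values are illustrative only.
DOCUMENT = """<dataset>
  <title>Example image</title>
  <creator>Unknown</creator>
  <format>image/tiff</format>
</dataset>"""

root = ET.fromstring(DOCUMENT)

# Each child tag names the meaning of the value it encloses.
attributes = {child.tag: child.text for child in root}

for tag, value in attributes.items():
    print(f"{tag}: {value}")
```

Once every record in a collection uses the same tags, the same extraction step applies to all of them, which is what makes a shared tag set useful across groups of digital objects.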


Applications

• Support for distributed data collections
• Federation of data collections to form digital library
• Integration of digital libraries with archives
• Finding aids for federation of digital libraries through mediation of information
• Data grids for data access
• Persistent archives


Electronic Records Archive (ERA)

[Figure: ERA architecture diagram. Stages: Accession, Archives, Reference, Transfer. Labels: Media Handlers (Tape, Disk, CD, FTP); Wrapper; Unwrapper; Metadata Repository; Records Repository; Accessioning Work Bench (snap-in); Reference Workbench (snap-in); record types: Text, Image, Photo, Video, Audio, Geographical Information System, Compound Records, Web, Database; Metadata wrapper; record; Arrangement; ARC; Catalog; Order Fulfillment; Retrieve Records; Query & Reference Tools; Internet; Intranet; Presentation]


More Information

http://www.npaci.edu/DICE