An Overview of Open Digital Archive Architecture

Jan-Ming Ho, PhD
Research Fellow and Deputy Director
Institute of Information Science, Academia Sinica


Post on 25-Dec-2015



The Problem

Digital Archive Model

[Figure: digital archive model. Labels shown: Collection Management, Digitization, Proofreading, Production, Preservation, Dissemination, Front-end Presentation, Workflow Management, AAA, User Services and Value-Added Services, Knowledge Discovery, Catalog Service, Other archive systems, Multimedia raw data and metadata.]

Requirements for NDAE

- Digital Archive Working Environment
  - Collection, digitization workflow, and storage
  - Metadata, indexing, and digital object management
  - Discovery and dissemination
  - Content distribution
  - Retrieval and presentation
  - Models the requirements of content holders and users
- Scalability and Interoperability
- Multimedia Processing and Presentation
  - Retrieval, watermark, summarization, virtual reality, etc.
- Multilingual Requirements
  - Unicode and Han variants
  - Missing Han characters
  - Thesaurus
- AAA – Authentication, Authorization, and Accounting
- Union Catalog and Value-added Services

Sample Content Projects in NDAP

- Rubbings of Bronze, Stones, and Bamboo Slips
  - Holomorphic rubbings
- Archaeological Excavations
- Seal Database of Rare Books
- Archives of Specimens of Insects, Fish, and Shell, etc.
- Old Chinese Paintings
- Engravings on Bronze Wares made in Chin Dynasty (265-289 A.D.)


Management of Holomorphic Rubbings


Directory of Species


Specimen Information System


Metadata Design

- Domain-specific and internationalization
- Standardizing metadata to facilitate preservation and dissemination of digital objects, and their applications

A Service Infrastructure

[Figure: service infrastructure. Labels shown: Dark Archive, Content Creation and Management (two instances), Union Catalog, Centralized Hosting, Domain Catalog, Access, Value Added (three instances), Education Service.]

An Educators’ Platform

[Figure: Education Resource Exchange Platform with a front-end and a back-end. Labels shown: Educational Resources, Online Journals, Education Material, Textbook, Reading, Government Institutes, Non-governmental, Consulting Teams, Seeding Schools, Online Counseling.]

Educators’ Activities

1. Retrieval of lesson plan and other educational resources

2. Community Interaction

3. Teaching Activity

4. Experience Sharing

5. Journal submission


A Survey of Related Standards


OAIS Preservation Metadata

- Open Archival Information System (OAIS) Preservation Metadata
  - Preservation metadata is the information infrastructure that supports the processes associated with digital preservation: the information necessary to maintain the viability, renderability, and understandability of digital resources over the long term.
- An OAIS has three basic functions: ingest, storage, and dissemination.
- In the ERA (Electronic Records Archives) concept, these functions are executed in three virtual workspaces: the Accession, Archival, and Reference workbenches.
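As a rough illustration of the three OAIS functions named above, the following Python sketch models ingest, storage, and dissemination with an in-memory store. The names `MiniArchive`, `ingest`, and `disseminate`, and the use of a SHA-256 checksum as preservation metadata, are assumptions made for this example; they are not taken from OAIS or from the ERA design.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchivalPackage:
    """Stored form of a submission: content plus preservation metadata."""
    identifier: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class MiniArchive:
    """Hypothetical in-memory archive exposing ingest, storage, dissemination."""

    def __init__(self):
        self._store = {}  # storage: identifier -> ArchivalPackage

    def ingest(self, identifier, content, metadata):
        """Accept a submission and record a checksum for later fixity checks."""
        meta = dict(metadata)
        meta["sha256"] = hashlib.sha256(content).hexdigest()
        self._store[identifier] = ArchivalPackage(identifier, content, meta)
        return identifier

    def disseminate(self, identifier):
        """Return content and metadata after verifying the stored checksum."""
        pkg = self._store[identifier]
        if hashlib.sha256(pkg.content).hexdigest() != pkg.metadata["sha256"]:
            raise ValueError("fixity check failed for " + identifier)
        return pkg.content, pkg.metadata


archive = MiniArchive()
archive.ingest("rubbing-001", b"image bytes", {"title": "Sample rubbing"})
content, meta = archive.disseminate("rubbing-001")
```

The checksum stands in for the kind of information that keeps a stored object verifiable over time; a real archive would record far more (formats, provenance, rights).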


ERA Block Diagram from [1]

[1] Kenneth Thibodeau, “Building the Archives of the Future, Advances in Preserving Electronic Records at the National Archives and Records Administration,” D-Lib Mag., vol. 7, no. 2, Feb. 2001.

OAI-PMH and Dublin Core

- OAI Protocol for Metadata Harvesting
  - The Open Archives Initiative Protocol for Metadata Harvesting provides an application-independent interoperability framework based on metadata harvesting
- Dublin Core
  - Addresses the problem of resource discovery for networked resources
  - 15-element set of descriptors
  - Interdisciplinary and international consensus reached on the semantics of each of the 15 elements
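To make the 15-element set concrete, the sketch below lists the element names of the Dublin Core Metadata Element Set and checks a record against them. The helper `unknown_elements` and the sample record (including its `dynasty` field) are hypothetical and only serve the illustration.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record):
    """Return the keys of a record that are not Dublin Core elements."""
    return sorted(key for key in record if key not in DC_ELEMENTS)


# Hypothetical record: "dynasty" is a domain-specific field, not a DC element.
record = {"title": "Sample rubbing", "creator": "Unknown", "dynasty": "Han"}
extra = unknown_elements(record)
```

Fields reported by `unknown_elements` are the ones a content holder would have to map onto a Dublin Core element, or leave out, before exposing the record for harvesting.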

A Typical OAI-PMH Architecture

[Figure: two Data Providers (Web Server + OAI + Metadata cache) and a Service Provider (PKC + OAI Harvester). Link labels shown: OAI-PMH, Private protocol.]
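A minimal sketch of the harvester side of this architecture, assuming a data provider that answers the standard OAI-PMH `ListRecords` verb with `oai_dc` records. The base URL `http://example.org/oai` and the sample response are made up for illustration; no network request is made, and the PKC component from the diagram is not modeled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the Dublin Core elements inside an oai_dc record.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a harvester would request from a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Pull every dc:title out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [node.text for node in root.iterfind(".//dc:title", NS)]


# Hand-written sample response, not output from a real repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample rubbing</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
```

A service provider would repeat this request per data provider and merge the harvested records into its own index.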

Name Space

- DOI
  - The Digital Object Identifier (DOI®) is a system for identifying and exchanging intellectual property in the digital environment.
- URI
  - A URI can be further classified as a locator, a name, or both.
  - "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource.
  - "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.
  - <scheme>:<scheme-specific-part>
- URN
  - Uniform Resource Names (URNs) are intended to serve as persistent, location-independent, resource identifiers and are designed to make it easy to map other namespaces into URN-space.
  - "urn:" <namespace-identifier> ":" <namespace-specific-string>

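The URN syntax quoted above can be checked mechanically. The sketch below splits a URN into its namespace identifier and namespace-specific string, loosely following RFC 2141; the pattern is a simplification (it does not validate the characters of the namespace-specific string), and `parse_urn` is a name chosen for this example.

```python
import re

# "urn:" <namespace-identifier> ":" <namespace-specific-string>
# The identifier is 1 to 32 letters, digits, or hyphens and must not
# start with a hyphen. The "urn" prefix is case-insensitive.
URN_PATTERN = re.compile(
    r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE
)


def parse_urn(urn):
    """Return (namespace_identifier, namespace_specific_string)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError("not a URN: {!r}".format(urn))
    nid, nss = match.groups()
    # Namespace identifiers are case-insensitive, so normalize them.
    return nid.lower(), nss


parsed = parse_urn("urn:isbn:0451450523")
```

Because the name carries no location, resolving it to a copy of the object is left to a separate service.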
Descriptive Metadata

- METS
  - The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library
- EAD
  - The EAD Document Type Definition (DTD) is a standard for encoding archival finding aids using the Standard Generalized Markup Language (SGML).

More on Descriptive Metadata used in NDAP

- MARC
- TEI
- CDWA
- Species 2000 Data Standard
- ECHO
- OLACMS
- CSDGM
- MARC 21 Concise Format for Authority Data
- ADL Gazetteer Content Standard


Our Approach

Architecture of ODAE

[Figure: ODAE architecture. Labels shown: user#1, user#2, user#3, Remote systems, Union Catalog (Discovery Engine), Data Provider, Metadata Server, Metadata & Workflow Server, Missing-Character Server, Media Center, Repository Manager, Video, Audio, Image, Media Production, Streaming Server, SSO Server, AAA Server, Doc Center, Backend Production, Client.]


Missing Character Server

Number of Hanzi Characters

- BIG5: 13,051
- GB 2312: 6,763
- GBK: 21,003
- GB 18030-2000: 27,000+
- Unicode 2.1: 20,902
- Unicode 3.0: 27,484
- Unicode 3.1: 70,195
- Estimated number of characters: 50,000+
- Estimated number of glyphs: 100,000+
- In common use: 8,000 – 9,000
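The gap between these repertoires is easy to observe in practice. The sketch below uses Python's built-in codecs to test whether a character can be encoded in each character set; the codec tables in a given Python release may differ slightly from the counts on the slide, so treat the result as indicative only. `can_encode` and `coverage` are names chosen for this example.

```python
ENCODINGS = ("big5", "gb2312", "gbk", "gb18030", "utf-8")


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def coverage(char):
    """Map each encoding name to whether it can represent the character."""
    return {enc: can_encode(char, enc) for enc in ENCODINGS}


common = coverage("\u4e2d")       # a character in everyday use
rare = coverage("\U00020000")     # first character of CJK Extension B
```

A character that fails in the encoding used by a content holder's system is exactly the kind of "missing character" the following slides deal with.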

Missing Character Problem

- C.C. Hsieh et al.
  - Glyph Expression
  - Maintains a Hanzi Glyph Database
- Preparation
  - Heavy users, e.g., content holders
  - Occasional users
- Network Presentation
- Retrieval of documents containing missing characters

Preparing Missing Characters by Content Holders

- Installing Hanzi glyph database at the client
  - URL: http://ckip.iis.sinica.edu.tw/CKIP/tool/
  - It also contains MS Office document templates for preparing glyph expressions
- Inserting glyph expression wherever needed in a document or database

Presenting Missing Characters

[Figure: presentation flow in four numbered steps. Labels shown: Content Holder, Web Server, Presentation module, Client, Glyph Image Server, Java Applet. Documents are shown containing glyph expressions, and the final document contains img elements in their place.]

Glyph Image Server

- Accepts a glyph expression encoded in the form of a CGI query
- Returns a glyph image
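A sketch of what encoding a glyph expression as a CGI query could look like. The server address, the parameter names `expr` and `size`, and the sample expression are all hypothetical; the slides do not give the real query format or the glyph-expression syntax.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; the real glyph image server address is not given.
GLYPH_SERVER = "http://glyph.example.org/cgi-bin/glyph"


def glyph_image_url(expression, size=24):
    """Encode a glyph expression as a CGI query (client side)."""
    return GLYPH_SERVER + "?" + urlencode({"expr": expression, "size": size})


def parse_glyph_query(url):
    """Recover the expression and size from the query (server side)."""
    query = parse_qs(urlsplit(url).query)
    return query["expr"][0], int(query["size"][0])


# Placeholder expression; the actual glyph-expression syntax may differ.
url = glyph_image_url("\u6728+\u6728", size=32)
decoded = parse_glyph_query(url)
```

Percent-encoding matters here because a glyph expression is likely to contain characters, such as `+` or Han components, that are not safe in a raw query string.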

Missing Character Presentation

- The web server automatically inserts a presentation applet into each outgoing web page
  - The author can also choose to insert the applet into the HTML document
- The presentation applet retrieves the same HTML document from the server
  - Netscape 4.x compatibility
- The web server extracts the glyph expression from the document, converts it into a CGI query for the glyph image server, and writes it back to the browser’s cache
- The web browser renders the new web page with the glyph image retrieved from the glyph image server
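The extract-and-convert step described above can be sketched as a text substitution. The delimiter `{{glyph:...}}`, the server address, and the `expr` parameter are assumptions made for this example; the slides do not show how glyph expressions are marked in a document, and the applet and browser-cache mechanics are not modeled.

```python
import re
from html import escape
from urllib.parse import urlencode

# Hypothetical endpoint and delimiter, chosen only for this illustration.
GLYPH_SERVER = "http://glyph.example.org/cgi-bin/glyph"
GLYPH_MARK = re.compile(r"\{\{glyph:(.+?)\}\}")


def render_missing_characters(html_text):
    """Replace each marked glyph expression with an img element."""

    def to_img(match):
        expression = match.group(1)
        src = GLYPH_SERVER + "?" + urlencode({"expr": expression})
        return '<img src="{}" alt="{}">'.format(escape(src), escape(expression))

    return GLYPH_MARK.sub(to_img, html_text)


page = "<p>Name: {{glyph:A+B}} (see rubbing)</p>"
rendered = render_missing_characters(page)
```

After the substitution the browser needs no special font: it simply fetches one small image per missing character.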


Network-based Input Method for Missing Characters


Retrieving Documents with Missing Characters


ODAE Content Management Architecture

[Figure: ODAE content management architecture. Labels shown: user#1, user#2, user#3, Remote systems, Union Catalog (Discovery Engine), Data Provider, Metadata Server, Metadata & Workflow Server, Missing-Character Server, Media Center, Repository Manager, Video, Audio, Image, Media Production, Streaming Server, SSO Server, AAA Server, Doc Center, Backend Production, Client.]


Metadata Server

Goals

The metadata group interacts closely with content holders to look into existing international metadata activities and to define domain-specific metadata and workflows for managing the digital archive.

Metadata Server Design

[Figure: metadata server design. Labels shown: Content Holders, Web Surfers, Data Flow Engine, Index Engine, Presentation Engine, Preservation Engine, Metadata Store, Media Center, Data Provider of Union Catalog.]


Media Center

Major Functions

- A repository of multimedia objects
- Media Processing
  - Rotation, Creating Thumbnails
  - Adding Watermark
- Registering a unique name from Local Name Authority
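A toy model of the last item, registering a unique name for each stored object. The class `LocalNameAuthority`, the `x-example` namespace, the zero-padded counter, and the file paths are all hypothetical; the slides do not describe how the actual name authority assigns names.

```python
import itertools


class LocalNameAuthority:
    """Hypothetical authority that mints URNs and remembers locations."""

    def __init__(self, namespace):
        self.namespace = namespace
        self._counter = itertools.count(1)
        self._registry = {}  # name -> current storage location

    def register(self, location):
        """Mint a new unique name and bind it to a storage location."""
        name = "urn:{}:{:08d}".format(self.namespace, next(self._counter))
        self._registry[name] = location
        return name

    def resolve(self, name):
        """Look up where the named object is currently stored."""
        return self._registry[name]


authority = LocalNameAuthority("x-example")
first = authority.register("/media/image/0001.tif")
second = authority.register("/media/image/0002.tif")
```

Keeping the name separate from the location means the Media Center can move a file and update only the registry entry, while metadata records keep citing the same name.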


Integration with Local Name Authority

[Figure: integration with the Local Name Authority. Labels shown: Content Holders, Media Center, Local Name Authority, Digital Object Repository, (URN Handle System).]


Union Catalog and Data Provider

Union Catalog Services

- Goals: Archive, Commerce, and Public Access
- Functional Requirements
  - Full-text Search
    - Using character strings as query to retrieve documents containing one or all of the strings
  - Dublin Core Search
    - Search for documents containing a query string in one of the 15 Dublin Core elements
    - To increase the precision of search results
  - Catalog
    - Advanced users can make better use of the above two search functions.
    - However, it is essential for general users to use a hierarchical catalog to get familiar with the archive of digital objects.
    - For Discovery Purposes
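The two search functions can be contrasted with a small in-memory sketch. The document store, its layout (`text` plus a `dc` dictionary), and the function names are assumptions for illustration; a real union catalog would use an index engine rather than scanning every record.

```python
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical catalog entries: full text plus Dublin Core metadata.
DOCS = {
    "r1": {"text": "bronze rubbing with inscription",
           "dc": {"title": "Bronze rubbing", "type": "image"}},
    "s1": {"text": "fish specimen record",
           "dc": {"title": "Fish specimen", "type": "text"}},
}


def full_text_search(documents, terms, match_all=False):
    """Return ids of documents containing one (or all) of the strings."""
    combine = all if match_all else any
    return [doc_id for doc_id, doc in documents.items()
            if combine(term in doc["text"] for term in terms)]


def dublin_core_search(documents, element, query):
    """Return ids of documents whose given DC element contains the query."""
    if element not in DC_ELEMENTS:
        raise ValueError("not a Dublin Core element: " + element)
    return [doc_id for doc_id, doc in documents.items()
            if query in doc["dc"].get(element, "")]


any_hits = full_text_search(DOCS, ["bronze", "fish"])
all_hits = full_text_search(DOCS, ["bronze", "fish"], match_all=True)
title_hits = dublin_core_search(DOCS, "title", "Fish")
```

Restricting the match to one element is what raises precision: a query for "Fish" in `title` ignores documents that merely mention fish somewhere in their text.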

Building an Inter-Agent Union Catalog

[Figure: building an inter-agent union catalog. Labels shown: Domain metadata, Archive of Digital Objects, Union Catalog, Catalog Mapping, Metadata-DC mapping, DC metadata, OAI.]

Individual Content Holder

[Figure: individual content holder. Labels shown: Domain metadata, Archive of Digital Objects, For Individual Project.]

Union Catalog and the Mappings

[Figure: union catalog and the mappings. Labels shown: Domain metadata, Archive of Digital Objects, Union Catalog, Catalog Mapping, Metadata-DC mapping, DC metadata, For Union Catalog.]

Defining a Union Catalog

- Domain Catalog and Union Catalog
  - Their mapping
- Metadata Mapping
  - Mapping essential archive metadata elements to DC elements
  - One-way mapping
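A minimal sketch of such a one-way mapping, assuming a simple field-to-element table. The domain field names (`rubbing_title`, `collector`, and so on) are invented for this example and do not come from any NDAP metadata schema.

```python
# Hypothetical mapping from domain-specific fields to Dublin Core elements.
DOMAIN_TO_DC = {
    "rubbing_title": "title",
    "collector": "contributor",
    "excavation_site": "coverage",
    "accession_number": "identifier",
}


def to_dublin_core(domain_record, mapping=DOMAIN_TO_DC):
    """Project a domain record onto Dublin Core; unmapped fields are dropped."""
    dc_record = {}
    for field_name, value in domain_record.items():
        dc_element = mapping.get(field_name)
        if dc_element is not None:
            # DC elements are repeatable, so collect values in a list.
            dc_record.setdefault(dc_element, []).append(value)
    return dc_record


record = {
    "rubbing_title": "Stele rubbing",
    "accession_number": "R-0001",
    "ink_type": "black",  # no DC counterpart in this mapping
}
dc_record = to_dublin_core(record)
```

The mapping is one-way because information is lost: `ink_type` has no place in the Dublin Core record, so the domain record cannot be rebuilt from it.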

Technical Support for a Union Catalog

[Figure: technical support for a union catalog. Labels shown: Archive Storage Server (three instances), DP-DB protocol, OAI Data Providers (GKC + KPDB + OAI extension), OAI-PMH, Service Provider (GKC + OAI Extension), Master, Slave.]

Technical Supports

- Tools for Transferring Metadata to OAI Data Provider
- Two additional servers
  - Data provider and service provider
- Data transfer protocol from metadata database to OAI data provider
- Server authentication


An OAI Service Provider


Document Center

http://pkc.iis.sinica.edu.tw/user/ndap/

Conclusions

- Union Category
- Digital Object Model
  - Hierarchical data model is assumed in METS, OAI-PMH, etc.
  - Relational Model
- Workflow
  - NARA/ERA and ISO OAIS
- Impacts on Education
- AAA and E-Commerce
- Modularity and Scalability