Posted on 25-Dec-2015
An Overview of Open Digital Archive Architecture
Jan-Ming Ho, PhD
Research Fellow and Deputy Director
Inst. of Info. Sci., Academia Sinica
The Problem
[Figure: functional components of a digital archive: collection management, digitization, proofreading, front-end production, preservation, dissemination, presentation, workflow, AAA, user services and management, value-added services, knowledge discovery, and a catalog service, together with other archive systems and multimedia raw data and metadata.]
Digital Archive Model
Requirements for NDAE
Digital Archive Working Environment
- Collection, digitization workflow, and storage
- Metadata, indexing, and digital object management
- Discovery and dissemination
- Content distribution
- Retrieval and presentation
- Models the requirements of content holders and users
Scalability and Interoperability
Multimedia Processing and Presentation
- Retrieval, watermarking, summarization, virtual reality, etc.
Multilingual Requirements
- Unicode and Han variants
- Missing Han characters
- Thesaurus
AAA: Authentication, Authorization, and Accounting
Union Catalog and Value-added Services
Sample Content Projects in NDAP
- Rubbings of Bronze, Stones, and Bamboo Slips
  - Holomorphic rubbings
- Archaeological Excavations
- Seal Database of Rare Books
- Archives of Specimens of Insects, Fish, and Shells, etc.
- Old Chinese Paintings
- Engravings on Bronze Wares Made in the Chin Dynasty (265-289 A.D.)
Management of Holomorphic Rubbings
Directory of Species
Specimen Information System
Metadata Design
- Domain specificity and internationalization
- Standardizing metadata to facilitate the preservation and dissemination of digital objects, and their applications
A Service Infrastructure
[Figure: service infrastructure linking content creation and management, a dark archive, centralized hosting, domain catalogs, the union catalog, access, value-added services, and an education service.]
An Educators’ Platform
[Figure: the Education Resource Exchange Platform, with front-end and back-end components. Educational resources include online journals and education material (textbooks, readings); other elements are government institutes, non-governmental organizations, consulting teams, seeding schools, and online counseling.]
Educators’ Activities
1. Retrieval of lesson plan and other educational resources
2. Community Interaction
3. Teaching Activity
4. Experience Sharing
5. Journal submission
A Survey of Related Standards
OAIS Preservation Metadata
Open Archival Information System (OAIS) Preservation Metadata
- Preservation metadata is the information infrastructure that supports the processes associated with digital preservation: the information necessary to maintain the viability, renderability, and understandability of digital resources over the long term.
- An OAIS has three basic functions: ingest, storage, and dissemination. In the ERA concept, these functions are executed in three virtual workspaces: the Accession, Archival, and Reference workbenches.
ERA Block Diagram from [1]
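To make the role of preservation metadata concrete, here is a minimal Python sketch, assuming that a SHA-256 fixity value plus an event log is the preservation metadata worth keeping. The class and method names are hypothetical, and this is not the ERA or OAIS reference design; each method simply stands for one of the three basic functions (roughly the Accession, Archival, and Reference workbenches in ERA terms).

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservationRecord:
    """Hypothetical preservation metadata kept alongside each stored object."""
    identifier: str
    sha256: str           # fixity value computed at ingest
    size: int
    ingested_at: str
    events: list = field(default_factory=list)  # audit trail of preservation events


class MiniArchive:
    """Toy model of the three OAIS functions: ingest, storage, and dissemination."""

    def __init__(self):
        self._store = {}      # identifier -> bytes (archival storage)
        self._metadata = {}   # identifier -> PreservationRecord

    def ingest(self, identifier: str, content: bytes) -> PreservationRecord:
        """Accept a new object and record its fixity and the ingest event."""
        if identifier in self._store:
            raise KeyError(f"{identifier} already ingested")
        record = PreservationRecord(
            identifier=identifier,
            sha256=hashlib.sha256(content).hexdigest(),
            size=len(content),
            ingested_at=datetime.now(timezone.utc).isoformat(),
        )
        record.events.append("ingest")
        self._store[identifier] = content
        self._metadata[identifier] = record
        return record

    def verify(self, identifier: str) -> bool:
        """Storage-side fixity check: recompute the digest and compare."""
        record = self._metadata[identifier]
        ok = hashlib.sha256(self._store[identifier]).hexdigest() == record.sha256
        record.events.append("fixity-check:" + ("pass" if ok else "fail"))
        return ok

    def disseminate(self, identifier: str) -> bytes:
        """Deliver content only if it still matches its ingest-time fixity."""
        if not self.verify(identifier):
            raise ValueError(f"fixity failure for {identifier}")
        self._metadata[identifier].events.append("disseminate")
        return self._store[identifier]
```

A real archive would persist the objects and their metadata separately and run `verify` on a schedule, not only at dissemination time.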
[1] Kenneth Thibodeau, "Building the Archives of the Future: Advances in Preserving Electronic Records at the National Archives and Records Administration," D-Lib Magazine, vol. 7, no. 2, Feb. 2001.
OAI-PMH and Dublin Core
OAI Protocol for Metadata Harvesting (OAI-PMH)
- The Open Archives Initiative Protocol for Metadata Harvesting provides an application-independent interoperability framework based on metadata harvesting.
Dublin Core
- Addresses the problem of resource discovery for networked resources
- A 15-element set of descriptors
- Interdisciplinary and international consensus reached on the semantics of each of the 15 elements
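As an illustration of how the 15-element set is used on the wire, the sketch below serializes a record into the `oai_dc` format that OAI-PMH requires every data provider to support. It assumes a record is a dict mapping element names to lists of values (every Dublin Core element is optional and repeatable); the helper name `to_oai_dc` is hypothetical, and the `xsi:schemaLocation` attribute of a full `oai_dc` record is omitted for brevity.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_oai_dc(record: dict) -> str:
    """Illustrative helper: serialize {element: [values]} as an oai_dc fragment.

    Each value becomes its own child element; unknown names are rejected.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DC_ELEMENTS:              # emit in canonical element order
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```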
A Typical OAI-PMH Architecture
[Figure: Data Providers (web server + OAI + metadata cache) serve metadata over OAI-PMH to a Service Provider (PKC + OAI harvester); a private protocol is also shown.]
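The service provider's side of this architecture is a harvesting loop. The sketch below shows one against the OAI-PMH 2.0 `ListRecords` verb, following `resumptionToken` until the provider returns an empty token. It is a generic illustration, not the PKC harvester; the `fetch` parameter is an assumption added so the loop can be exercised without network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_records(base_url: str, metadata_prefix: str = "oai_dc",
                 fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> from a data provider, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            # e.g. noRecordsMatch; a real harvester would treat that one as empty
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token: Optional[ET.Element] = root.find(
            "oai:ListRecords/oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break  # an absent or empty token marks the end of the list
        # resumptionToken is an exclusive argument: only verb may accompany it
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Because `list_records` is a generator, records can be written into the union catalog as they arrive instead of after the whole list has been downloaded.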
Name Space
DOI
- The Digital Object Identifier (DOI®) is a system for identifying and exchanging intellectual property in the digital environment.
URI
- A URI can be further classified as a locator, a name, or both. "Uniform Resource Locator" (URL) refers to the subset of URIs that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource.
- "Uniform Resource Name" (URN) refers to the subset of URIs that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.
- Syntax: <scheme>:<scheme-specific-part>
URN
- Uniform Resource Names (URNs) are intended to serve as persistent, location-independent resource identifiers and are designed to make it easy to map other namespaces into URN-space.
- Syntax: "urn:" <namespace-identifier> ":" <namespace-specific-string>
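The two syntaxes above can be checked mechanically. This sketch splits a URI into scheme and scheme-specific part, then parses a URN into its namespace identifier (NID) and namespace-specific string (NSS). The NID rule follows RFC 2141 (1 to 32 letters, digits, or hyphens, starting with a letter or digit, and never "urn" itself); the function names are illustrative, and NSS character validation is deliberately left out.

```python
import re
from typing import NamedTuple

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)
# RFC 2141: NID = let-num [ 1,31let-num-hyp ]
_NID = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]{0,31}$")


class URN(NamedTuple):
    nid: str   # namespace identifier (case-insensitive, normalized to lowercase)
    nss: str   # namespace-specific string (kept as-is)


def split_uri(uri: str) -> tuple[str, str]:
    """Split <scheme>:<scheme-specific-part>; the scheme is case-insensitive."""
    m = _SCHEME.match(uri)
    if not m:
        raise ValueError(f"no scheme in {uri!r}")
    return m.group(1).lower(), m.group(2)


def parse_urn(uri: str) -> URN:
    """Parse "urn:" <NID> ":" <NSS>; NSS character rules are not checked here."""
    scheme, rest = split_uri(uri)
    if scheme != "urn":
        raise ValueError(f"not a URN: {uri!r}")
    nid, sep, nss = rest.partition(":")
    if not sep or not nss or not _NID.match(nid) or nid.lower() == "urn":
        raise ValueError(f"malformed URN: {uri!r}")
    return URN(nid.lower(), nss)
```

For example, `parse_urn("urn:isbn:0451450523")` yields `URN(nid='isbn', nss='0451450523')`, while a URL such as the CKIP tool address splits into the `http` scheme and its `//host/path` part.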
Descriptive Metadata
METS
- The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library.
EAD
- The EAD Document Type Definition (DTD) is a standard for encoding archival finding aids using the Standard Generalized Markup Language (SGML).
More on Descriptive Metadata used in NDAP
MARC, TEI, CDWA, Species 2000 Data Standard, ECHO, OLACMS, CSDGM, MARC 21 Concise Format for Authority Data, ADL Gazetteer Content Standard
Our Approach
Architecture of ODAE
[Figure: ODAE components: users and remote systems; the Union Catalog (discovery engine) and its data provider; the Metadata Server; the Metadata & Workflow Server; the Missing-Character Server; the Media Center with its repository manager (video, audio, image), media production, and streaming server; the SSO and AAA servers; the Doc Center; and the back-end production client.]
Missing Character Server
Number of Hanzi Characters
BIG5: 13,051
GB 2312: 6,763
GBK: 21,003
GB 18030-2000: 27,000+
Unicode 2.1: 20,902
Unicode 3.0: 27,484
Unicode 3.1: 70,195
Estimated number of characters: 50,000+
Estimated number of glyphs: 100,000+
In common use: 8,000 to 9,000
Missing Character Problem
- Glyph expressions (C.C. Hsieh et al.)
  - Maintains a Hanzi glyph database
- Preparation
  - Heavy users, e.g., content holders
  - Occasional users
- Network presentation
- Retrieval of documents containing missing characters
Preparing Missing Characters by Content Holders
- Installing the Hanzi glyph database at the client (URL: http://ckip.iis.sinica.edu.tw/CKIP/tool/). The package also contains MS Office document templates for preparing glyph expressions.
- Inserting glyph expressions wherever needed in a document or database
Presenting Missing Characters
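[Figure: four-step presentation flow among the content holder, the web server (presentation module), the client's Java applet, and the glyph image server: a page containing glyph expressions is delivered with a Java applet, and each glyph expression is replaced by an <img> element served by the glyph image server.]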
Glyph Image Server
- Accepts a glyph expression encoded as a CGI query
- Returns a glyph image
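A CGI query has to carry the glyph expression as percent-encoded UTF-8. The sketch below shows both directions with Python's standard URL helpers. The endpoint URL and the `expr`, `size`, and `format` parameter names are hypothetical, since the slides do not give them, and the example expression uses a Unicode Ideographic Description Character (⿰) as a stand-in for Hsieh's glyph-expression notation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter names; the slides do not specify them.
GLYPH_SERVER = "http://glyph.example.org/cgi-bin/glyph"


def glyph_query_url(expression: str, size: int = 24, fmt: str = "png") -> str:
    """Client side: encode a glyph expression as a CGI query (UTF-8, percent-encoded)."""
    query = urlencode({"expr": expression, "size": size, "format": fmt})
    return f"{GLYPH_SERVER}?{query}"


def decode_glyph_query(url: str) -> str:
    """Server side: recover the glyph expression from the query string."""
    params = parse_qs(urlsplit(url).query)
    return params["expr"][0]
```

For instance, `glyph_query_url("⿰木土")` produces a URL whose `expr` value begins with `%E2%BF%B0`, the UTF-8 bytes of ⿰, and `decode_glyph_query` turns it back into the original expression.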
Missing Character Presentation
- The web server automatically inserts a presentation applet into each outgoing web page.
- Authors can also choose to insert the applet into the HTML document themselves.
- The presentation applet retrieves the same HTML document from the server.
- Netscape 4.x compatibility
- The web server extracts each glyph expression from the document, converts it into a CGI query for the glyph image server, and writes the result back to the browser's cache.
- The web browser renders the new web page with the glyph images retrieved from the glyph image server.
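The essential rewrite step, turning each glyph expression into an `<img>` element that points at the glyph image server, can be sketched as a single substitution pass. This is a simplification that performs the whole rewrite on the server rather than through the Java applet described above, and both the `{{glyph:...}}` markup and the endpoint URL are hypothetical, since the slides do not say how glyph expressions are delimited in a document.

```python
import html
import re
from urllib.parse import urlencode

# Hypothetical markup: the slides do not say how glyph expressions are delimited.
GLYPH_MARK = re.compile(r"\{\{glyph:(.+?)\}\}")
GLYPH_SERVER = "http://glyph.example.org/cgi-bin/glyph"  # hypothetical endpoint


def render_missing_characters(page: str) -> str:
    """Replace each glyph expression with an <img> served by the glyph server."""
    def to_img(match: re.Match) -> str:
        expr = match.group(1)
        src = f"{GLYPH_SERVER}?{urlencode({'expr': expr})}"
        # alt text keeps the expression readable if the image cannot be loaded
        return f'<img src="{html.escape(src)}" alt="{html.escape(expr)}" class="glyph">'
    return GLYPH_MARK.sub(to_img, page)
```

Keeping the expression in the `alt` attribute also leaves it available to text-only clients and to any indexer that processes the rendered page.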
Network-based Input Method for Missing Characters
Retrieving Documents with Missing Characters
ODAE Content Management Architecture
Metadata Server
Goals
The metadata group works closely with content holders, drawing on existing international metadata activities, to define domain-specific metadata and the workflow for managing the digital archive.
Metadata Server Design
[Figure: Metadata Server design: a data flow engine, index engine, presentation engine, and preservation engine around the metadata store, connected to content holders, web surfers, the Media Center, and the data provider of the union catalog.]
Media Center
Major Functions
- A repository of multimedia objects
- Media processing
  - Rotation
  - Creating thumbnails
  - Adding watermarks
- Registering a unique name from the Local Name Authority
Integration with Local Name Authority
[Figure: content holders, the Media Center, the Local Name Authority, and the Digital Object Repository (URN, Handle System).]
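What a local name authority buys is a stable name that survives moves of the underlying file. The sketch below mints Handle-style names of the form `<prefix>/<collection>.<serial>` and keeps a name-to-location binding that can be updated without changing the name. The class, the naming pattern, and the `ndap.example` prefix are all hypothetical illustrations, not the NDAP naming scheme.

```python
import threading
from itertools import count


class LocalNameAuthority:
    """Hypothetical local name authority issuing names "<prefix>/<collection>.<serial>".

    Names are never reused; relocate() changes where a name points, not the name.
    """

    def __init__(self, prefix: str):
        self.prefix = prefix
        self._lock = threading.Lock()
        self._counters = {}   # collection -> itertools.count serial generator
        self._bindings = {}   # name -> current location (e.g. a Media Center path)

    def register(self, collection: str, location: str) -> str:
        """Mint a new unique name for an object and bind it to its location."""
        with self._lock:   # Media Center requests may arrive concurrently
            counter = self._counters.setdefault(collection, count(1))
            name = f"{self.prefix}/{collection}.{next(counter):06d}"
            self._bindings[name] = location
            return name

    def resolve(self, name: str) -> str:
        return self._bindings[name]

    def relocate(self, name: str, new_location: str) -> None:
        """Move an object without changing its name."""
        with self._lock:
            if name not in self._bindings:
                raise KeyError(name)
            self._bindings[name] = new_location
```

In a deployment this binding table would live in a resolver such as the Handle System rather than in process memory.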
Union Catalog and Data Provider
Union Catalog Services
- Goals: archive, commerce, and public access
Functional Requirements
- Full-text search: use character strings as a query to retrieve documents containing one or all of the strings.
- Dublin Core search: search for documents containing a query string in one of the 15 Dublin Core elements, to increase the precision of search results.
- Catalog: advanced users can make better use of the two search functions above; general users, however, need a hierarchical catalog to become familiar with the archive of digital objects.
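The difference between the two search functions is easiest to see side by side. The sketch below assumes each catalog record is a dict with an `id`, its full `text`, and a `dc` dict of element lists (a hypothetical record shape). It uses plain substring matching because Chinese text has no word delimiters; a production union catalog would query an index rather than scan every record.

```python
from typing import Iterable

# Hypothetical record shape: {"id": str, "text": str, "dc": {element: [values]}}


def full_text_search(records: Iterable[dict], terms: list[str],
                     match_all: bool = False) -> list[str]:
    """Ids of records whose full text contains any (or all) of the terms."""
    test = all if match_all else any
    return [r["id"] for r in records if test(t in r["text"] for t in terms)]


def dc_search(records: Iterable[dict], element: str, term: str) -> list[str]:
    """Ids of records where a single Dublin Core element contains the term."""
    return [r["id"] for r in records
            if any(term in v for v in r["dc"].get(element, []))]
```

With a record about a Jin bronze and another about an insect specimen whose description mentions a bronze color, a full-text search for "bronze" returns both, while a `title` search for "bronze" returns only the first. That narrowing is the precision gain the Dublin Core search is meant to provide.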
For Discovery Purposes
Building an Inter-Agent Union Catalog
[Figure: for an individual project, each content holder keeps domain metadata and an archive of digital objects; a metadata-to-DC mapping produces DC metadata that is exposed via OAI to the union catalog, alongside a catalog mapping.]
Union Catalog and the Mappings
[Figure: for the union catalog, domain metadata from each archive of digital objects reaches the union catalog through the catalog mapping and the metadata-to-DC mapping (DC metadata).]
Defining a Union Catalog
- Domain catalog and union catalog
  - Their mapping
- Metadata mapping
  - Mapping essential archive metadata elements to DC elements
  - One-way mapping
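A crosswalk table shows why the mapping is one-way. In the sketch below, several domain fields collapse into one DC element and fields without a crosswalk entry are dropped, so the original record cannot be reconstructed from the DC output. The specimen field names and the `SPECIMEN_TO_DC` table are hypothetical examples, not the NDAP crosswalk.

```python
# Hypothetical crosswalk for a specimen record; field names are illustrative.
SPECIMEN_TO_DC = {
    "scientific_name": "title",
    "common_name": "title",
    "collector": "creator",
    "collection_date": "date",
    "locality": "coverage",
    "catalog_no": "identifier",
    "habitat_notes": "description",
}


def map_to_dc(record: dict, crosswalk: dict) -> dict:
    """Map a domain record to Dublin Core {element: [values]}.

    Several domain fields may collapse into one DC element, and fields with no
    crosswalk entry (or empty values) are dropped, so the mapping is one-way.
    """
    dc = {}
    for name, value in record.items():
        element = crosswalk.get(name)
        if element is None or value in (None, ""):
            continue
        dc.setdefault(element, []).append(str(value))
    return dc
```

The domain record itself stays in the content holder's archive; only the DC projection travels to the union catalog, which is why the mapping never needs to run in reverse.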
Technical Support for a Union Catalog
[Figure: Archive Storage Servers connect through a DP-DB protocol to OAI Data Providers (GKC + KPDB + OAI extension), which a Service Provider (GKC + OAI extension) harvests over OAI-PMH; master and slave roles are also shown.]
Technical Support
- Tools for transferring metadata to the OAI data provider
- Two additional servers: a data provider and a service provider
- Data transfer protocol from the metadata database to the OAI data provider
- Server authentication
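The slides do not say how server authentication is done in the transfer protocol. One possible scheme, shown here purely as an assumption, is a shared secret between the metadata database and the OAI data provider, with each batch signed by HMAC-SHA256 and stamped with a time so that tampered or stale batches are rejected. The function names and the 300-second window are hypothetical; TLS with client certificates would be an equally reasonable choice.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # hypothetical freshness window


def _tag(key: bytes, body: str) -> str:
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_batch(records: list, key: bytes, now: Optional[float] = None) -> dict:
    """Metadata-database side: wrap records with a timestamp and an HMAC tag."""
    ts = int(time.time() if now is None else now)
    body = json.dumps({"ts": ts, "records": records}, sort_keys=True, ensure_ascii=False)
    return {"body": body, "tag": _tag(key, body)}


def verify_batch(message: dict, key: bytes, now: Optional[float] = None) -> list:
    """OAI data-provider side: reject forged, tampered, or stale batches."""
    if not hmac.compare_digest(_tag(key, message["body"]), message["tag"]):
        raise PermissionError("bad signature")
    payload = json.loads(message["body"])
    current = time.time() if now is None else now
    if abs(current - payload["ts"]) > MAX_SKEW_SECONDS:
        raise PermissionError("stale batch")
    return payload["records"]
```

The timestamp only narrows the replay window; a full protocol would also track batch sequence numbers on the receiving side.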
An OAI Service Provider
Document Center
http://pkc.iis.sinica.edu.tw/user/ndap/
Conclusions
- Union Catalog
- Digital Object Model
  - A hierarchical data model is assumed in METS, OAI-PMH, etc.
  - Relational model
- Workflow
  - NARA/ERA and ISO OAIS
- Impacts on Education
- AAA and E-Commerce
- Modularity and Scalability