Architecting an Extensible Digital Repository. Anoop Kumar, Ranjani Saigal, Rob Chavez, Nikolai Schwertner. Tufts University, Medford, MA.

Page 1: Architecting an Extensible Digital Repository

Architecting an Extensible Digital Repository

Anoop Kumar, Ranjani Saigal, Rob Chavez, Nikolai Schwertner

Tufts University, Medford, MA

Page 2: Architecting an Extensible Digital Repository

Overview

Background information on the evolution of TDL
Design Requirements
TDL Architecture
Applications that interface with TDL
– Tufts DL Search
– VUE

Page 3: Architecting an Extensible Digital Repository

History of Digital Collections at Tufts

About Tufts
– Interdisciplinary
– Focus on teaching and learning

Digital Collections at Tufts
– Perseus (Classics)
– Tufts University Science Knowledgebase (TUSK, Medicine)
– Artifact (Art History)
– Digital Collections and Archives (DCA): Bolles, etc.
– Other (Crime and Punishment)

Page 4: Architecting an Extensible Digital Repository

Projects: Materials and Tools

Perseus DL
– Materials: 50 million words, highly structured TEI-encoded XML texts of many types; 50,000 images
– Tools: Perseus document management system and tools

DCA
– Materials: 13 million words; 35,000 images; geospatial datasets; multimedia objects
– Tools: Perseus document management system and tools

TUSK
– Materials: 15,000 documents, including full-text syllabi, digital slide images, lecture recordings (audio and video), text notes, exam questions, evaluation forms, and bibliographies linked to full-text articles
– Tools: Networked course management system interface

Artifact
– Materials: 2,500 images; links to the Art History slide collection database containing 120,000 entries
– Tools: On-demand viewing and searching with Internet-based adaptations of traditional learning aids, such as flashcards, for review and study

Page 5: Architecting an Extensible Digital Repository

Why TDL? (Tufts Digital Library)

The collections were continuously expanding, adding content in a variety of formats. The architecture of these libraries was not built to accommodate such expansion.

Tufts needed a university-wide digital repository that could manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.

Page 6: Architecting an Extensible Digital Repository

Designing TDL

Digital Collections and Archives partnered with Academic Technology to create a digital library that can manage the content while supporting teaching and learning.

Commitment to comply with the standards of the library and open-source communities.

Ensure Scalability, Flexibility, Reusability, Extensibility and Interoperability

Page 7: Architecting an Extensible Digital Repository

Design Requirements

Ingest:
– Ability to enforce archival standards

Management:
– Use of information packages to facilitate storage and dissemination
– Ability to incorporate content models

Persistence:
– Use of persistent identifiers
– Mapped URNs

Requirements: System Services

– Unique and persistent identification of materials: Naming Service
– Use of Archival Information Packages (AIP): Digital Object Provider (DOP) Service (Fedora)
– Use of Submission Information Packages (SIP): Drop Box, Ingestion Service
– Use of Dissemination Information Packages (DIP): DOP Service
– Authentication and integrity checking: DOP Service
– Dissemination: Disseminators, Caching Service, Digital Library Application, Search Service
– Access: Search Service and other applications

Page 8: Architecting an Extensible Digital Repository

Tufts DL Architecture

[Figure: Tufts DL architecture diagram. Labeled components: Fedora, Drop Box, Fedora Ingestion Service, Application Creation Service, Search Indexing Service, Naming Service, Search Index, Search Interface, Application Data, Application Interface, Fedora Client. Actors: U = Users, M = Manager, A = Administrators.]

Page 9: Architecting an Extensible Digital Repository

Components of TDL

Component: Role

– Drop Box and Ingestion Service: Validation, Tagging, Preprocessing, Ingestion
– Naming Service: Unique persistent identifiers mapped to objects (“tufts:dca:central:MS102.33.1345”)
– Fedora Repository: Management and access framework for digital objects
– Search and Indexing Service: Provides search mechanism
– Application Creation Service: Provides mechanism for external applications to interface with repository
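
The four roles listed for the Drop Box and Ingestion Service (Validation, Tagging, Preprocessing, Ingestion) can be read as a pipeline. The slides do not describe how TDL implements these stages, so the following is only a minimal sketch: the `Submission` class, the stage functions, the validation rule and the dictionary standing in for the repository are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """A hypothetical submission package sitting in the drop box."""
    filename: str
    payload: bytes
    metadata: dict = field(default_factory=dict)


def validate(sub: Submission) -> Submission:
    # Assumed rule for illustration: reject empty files and missing titles.
    if not sub.payload:
        raise ValueError(f"{sub.filename}: empty payload")
    if "title" not in sub.metadata:
        raise ValueError(f"{sub.filename}: missing title")
    return sub


def tag(sub: Submission) -> Submission:
    # Assumed tagging step: record a coarse type from the file extension.
    ext = sub.filename.rsplit(".", 1)[-1].lower() if "." in sub.filename else ""
    known = {"xml": "text", "tif": "image", "tiff": "image"}
    sub.metadata["type"] = known.get(ext, "binary")
    return sub


def preprocess(sub: Submission) -> Submission:
    # Assumed preprocessing step: store the size for later integrity checks.
    sub.metadata["size"] = len(sub.payload)
    return sub


def ingest(sub: Submission, repository: dict) -> str:
    # Stand-in for the repository: a dict keyed by file name.
    repository[sub.filename] = sub
    return sub.filename


def run_pipeline(sub: Submission, repository: dict) -> str:
    """Run the four stages in the order the slide lists them."""
    for stage in (validate, tag, preprocess):
        sub = stage(sub)
    return ingest(sub, repository)


repo = {}
key = run_pipeline(Submission("letter.xml", b"<TEI/>", {"title": "Letter"}), repo)
print(key, repo[key].metadata)
```

Keeping each stage as a separate function means a failing validation stops the submission before anything reaches the repository.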

Page 10: Architecting an Extensible Digital Repository

TDL Architecture

Drop Box and Ingestion Service
Naming Service
Fedora Repository Service at Tufts
Indexing Service and Search Engine
Application Creation Service

Page 11: Architecting an Extensible Digital Repository

Drop Box and Ingestion Service

Page 12: Architecting an Extensible Digital Repository

TDL Architecture

Drop Box and Ingestion Service
Naming Service
Fedora Repository Service at Tufts
Indexing Service and Search Engine
Application Creation Service

Page 13: Architecting an Extensible Digital Repository

Naming Service

Assigns, reserves and resolves URNs

URN Format
– tufts:school name:owner:[collection:]item name
– Example: tufts:dca:central:MS102.33.1345

URN Properties
– Provides unique ID to objects deposited into repository
– Service assures resolution to unique resource.
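
The URN format and the three operations named on this slide (assign, reserve, resolve) can be illustrated with a short sketch. It assumes the optional collection segment is simply absent when a URN has four colon-separated parts, as in the example URN; `TuftsURN`, `parse_urn` and the in-memory `NamingService` class are hypothetical names, not the TDL implementation.

```python
from typing import NamedTuple, Optional


class TuftsURN(NamedTuple):
    school: str
    owner: str
    collection: Optional[str]
    item: str


def parse_urn(urn: str) -> TuftsURN:
    """Split a URN of the form tufts:school:owner:[collection:]item."""
    parts = urn.split(":")
    if parts[0] != "tufts" or len(parts) not in (4, 5) or not all(parts):
        raise ValueError(f"not a valid TDL URN: {urn!r}")
    if len(parts) == 4:
        _, school, owner, item = parts
        return TuftsURN(school, owner, None, item)
    _, school, owner, collection, item = parts
    return TuftsURN(school, owner, collection, item)


class NamingService:
    """In-memory stand-in that assigns, reserves and resolves URNs."""

    def __init__(self):
        self._locations = {}  # URN -> location, or None while only reserved

    def reserve(self, urn: str) -> None:
        parse_urn(urn)  # reject malformed names early
        if urn in self._locations:
            raise KeyError(f"URN already taken: {urn}")
        self._locations[urn] = None

    def assign(self, urn: str, location: str) -> None:
        if urn not in self._locations:
            self.reserve(urn)
        elif self._locations[urn] is not None:
            raise KeyError(f"URN already assigned: {urn}")
        self._locations[urn] = location

    def resolve(self, urn: str) -> str:
        location = self._locations.get(urn)
        if location is None:
            raise LookupError(f"URN not assigned: {urn}")
        return location


names = NamingService()
names.assign("tufts:dca:central:MS102.33.1345", "repo-object-1")
print(parse_urn("tufts:dca:central:MS102.33.1345"))
print(names.resolve("tufts:dca:central:MS102.33.1345"))
```

Refusing to assign a URN twice is what keeps each name resolving to exactly one resource in this sketch.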

Page 14: Architecting an Extensible Digital Repository

TDL Architecture

Drop Box and Ingestion Service
Naming Service
Fedora Repository Service at Tufts
Indexing Service and Search Engine
Application Creation Service

Page 15: Architecting an Extensible Digital Repository

Fedora Repository Service at Tufts

Fedora: Key Features
Repository at Tufts
Content Models at Tufts
– Objects, Behaviors and Disseminators
Implementation Challenges

Page 16: Architecting an Extensible Digital Repository

Flexible Extensible Digital Object Repository Architecture (Fedora)

– Support for heterogeneous data types
– Accommodation of new types as they emerge
– Aggregation of mixed, possibly distributed, data into complex objects
– The ability to specify multiple content disseminations of these objects
– The ability to associate rights management schemes with these disseminations

Page 17: Architecting an Extensible Digital Repository

[Figure: Repository Model. Labeled elements: Storage Device, Fedora, Processing Service, Caching Service (stores URLs for user applications), HTTP Server, User. Labeled links: high bandwidth (20Mb TIFF), medium bandwidth (20Mb TIFF and 200Kb JPEG), Internet bandwidth (200Kb JPEG), HTTP requests.]
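
One reading of the Repository Model figure is that large masters (the 20Mb TIFF) are transformed into small derivatives (the 200Kb JPEG) and that the Caching Service spares repeat requests from triggering the transformation again. The sketch below illustrates that idea only. The `DisseminationCache` class and `fake_transform` function are hypothetical, and the sketch caches the derived bytes directly, whereas the figure says the real service stores URLs.

```python
from typing import Callable, Dict, Tuple


class DisseminationCache:
    """Cache derived content keyed by (object id, dissemination name)."""

    def __init__(self, produce: Callable[[str, str], bytes]):
        self._produce = produce  # the expensive transformation
        self._store: Dict[Tuple[str, str], bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, object_id: str, dissemination: str) -> bytes:
        key = (object_id, dissemination)
        if key in self._store:
            self.hits += 1
        else:
            # First request: run the transformation and remember the result.
            self.misses += 1
            self._store[key] = self._produce(object_id, dissemination)
        return self._store[key]


def fake_transform(object_id: str, dissemination: str) -> bytes:
    # Placeholder for a real TIFF-to-JPEG conversion.
    return f"{dissemination} of {object_id}".encode()


cache = DisseminationCache(fake_transform)
first = cache.get("tufts:dca:central:MS102.33.1345", "getThumbnail")
second = cache.get("tufts:dca:central:MS102.33.1345", "getThumbnail")
print(cache.misses, cache.hits)
```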

Page 18: Architecting an Extensible Digital Repository

Content Model (CM) Hierarchy

Specific Implementations (TEI text, EAD text, Encyclopedia, Directory, TIFF image, etc.)

Text CM
• getTOC
• getChunksList
• getChunk
• etc.

Image CM
• getThumbnail
• getAccessHigh
• getImageStats
• etc.

Binary CM
• getObject
• getMIME
• etc.

Collection CM
• getObjects
• getInfo
• etc.

VUE CM
• getConceptMap
• getResource
• etc.

Indexing Disseminators
• getIndexTerms
• getForIndexing
• etc.

Repository-Level Disseminators
• getArchivalCopy
• getPreview
• getClass
• etc.
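
The hierarchy can be pictured as a class hierarchy in which repository-level and indexing behaviors are shared by every object and each content model adds its own methods. This is an analogy in plain Python, not Fedora's behavior-definition mechanism. Method names mirror the slide (hence the camelCase), while the class names, constructor arguments and return values are hypothetical.

```python
from abc import ABC, abstractmethod


class DigitalObject(ABC):
    """Repository-level and indexing behaviors shared by all content models."""

    def __init__(self, urn: str, archival_copy: bytes):
        self.urn = urn
        self._archival_copy = archival_copy

    def getArchivalCopy(self) -> bytes:
        return self._archival_copy

    def getClass(self) -> str:
        return type(self).__name__

    @abstractmethod
    def getPreview(self) -> str:
        """Each content model decides what a preview looks like."""

    @abstractmethod
    def getIndexTerms(self) -> list:
        """Each content model decides which terms it exposes for indexing."""


class TextObject(DigitalObject):
    def __init__(self, urn, archival_copy, chunks):
        super().__init__(urn, archival_copy)
        self._chunks = chunks  # ordered mapping: chunk title -> chunk text

    def getTOC(self) -> list:
        return list(self._chunks)

    def getChunk(self, title: str) -> str:
        return self._chunks[title]

    def getPreview(self) -> str:
        return next(iter(self._chunks.values()), "")[:40]

    def getIndexTerms(self) -> list:
        words = {w.lower() for text in self._chunks.values() for w in text.split()}
        return sorted(words)


class ImageObject(DigitalObject):
    def __init__(self, urn, archival_copy, width, height):
        super().__init__(urn, archival_copy)
        self._size = (width, height)

    def getImageStats(self) -> dict:
        return {"width": self._size[0], "height": self._size[1],
                "bytes": len(self._archival_copy)}

    def getPreview(self) -> str:
        return f"{self._size[0]}x{self._size[1]} image"

    def getIndexTerms(self) -> list:
        return []  # this sketch indexes no terms for images


objects = [
    TextObject("tufts:x:y:text1", b"<TEI/>", {"Intro": "Hello world", "End": "Bye"}),
    ImageObject("tufts:x:y:img1", b"\x00" * 16, 640, 480),
]
for obj in objects:
    # The caller uses the shared interface without knowing the content model.
    print(obj.getClass(), obj.getPreview(), obj.getIndexTerms())
```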

Page 19: Architecting an Extensible Digital Repository

Implementation Challenges

– Processing Large XML Documents
– Transforming Large Images
– Modeling Collections
– Advanced Search
– Customized Search
– Caching Disseminations

Page 20: Architecting an Extensible Digital Repository

TDL Architecture

Drop Box and Ingestion Service
Naming Service
Fedora Repository Service at Tufts
Indexing Service and Search Engine
Application Creation Service

Page 21: Architecting an Extensible Digital Repository

Indexing Service and Search Engine

Indexing
– Specialized Polymorphic Disseminators

Implementation
– Lucene

Supported Types of Search
– Basic Keyword
– Advanced metadata-based

Accessing the service
– HTTP GET/POST
– SOAP
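
The two supported search types can be illustrated with a toy inverted index. TDL uses Lucene for this; the sketch below is plain Python and does not use or imitate the Lucene API. The `SearchIndex` class, its methods and the sample records are hypothetical, and the index terms are assumed to come from indexing disseminators such as getIndexTerms.

```python
from collections import defaultdict


class SearchIndex:
    """Toy index supporting keyword search and exact-match metadata search."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of objects containing it
        self._metadata = {}                # object id -> metadata dict

    def add(self, object_id: str, terms: list, metadata: dict) -> None:
        for term in terms:
            self._postings[term.lower()].add(object_id)
        self._metadata[object_id] = dict(metadata)

    def keyword_search(self, query: str) -> set:
        """Return ids of objects containing every word of the query."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result

    def metadata_search(self, **criteria) -> set:
        """Return ids of objects whose metadata matches every criterion."""
        return {
            oid for oid, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        }


index = SearchIndex()
index.add("obj-1", ["greek", "tragedy"], {"collection": "perseus", "type": "text"})
index.add("obj-2", ["greek", "vase"], {"collection": "artifact", "type": "image"})

print(sorted(index.keyword_search("greek")))
print(sorted(index.keyword_search("greek vase")))
print(sorted(index.metadata_search(type="text")))
```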

Page 22: Architecting an Extensible Digital Repository

TDL Architecture

Drop Box and Ingestion Service
Naming Service
Fedora Repository Service at Tufts
Indexing Service and Search Engine
Application Creation Service

Page 23: Architecting an Extensible Digital Repository

Application Creation Service

An important design requirement for TDL was to let existing digital library applications interface with TDL easily and provide seamless access to its content from within their own environments.

Current applications like Perseus can interface with this service to allow their tools to disseminate the content that resides in TDL.

The service has been designed not only to support current applications but also to accommodate future, yet-to-be-defined applications such as course management systems, learning tools and portals.

Page 24: Architecting an Extensible Digital Repository

Applications Accessing TDL Content

Tufts DL Search
Visual Understanding Environment (VUE)

Page 25: Architecting an Extensible Digital Repository
Page 26: Architecting an Extensible Digital Repository
Page 27: Architecting an Extensible Digital Repository

Visual Understanding Environment (VUE)

[Figure: VUE Architecture. Labeled elements: VUE, OKI, DR API (Digital Repository), OKI-FEDORA Bridge, FEDORA, DR Implementations, Technical Infrastructure.]

Page 28: Architecting an Extensible Digital Repository

Why TDL? (Tufts Digital Library)

The collections are continuously expanding, adding content in a variety of formats. The current architecture of these libraries is not built to accommodate such expansion.

Tufts needs a university-wide digital repository that can manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.

Page 29: Architecting an Extensible Digital Repository

Future Direction

– Authentication and authorization service
– Customization and enhancement of Fedora@Tufts to address a wide variety of needs
– Automated browsing service for the repository