Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University

TRANSCRIPT

Page 1: Dienst Distributed Networked Publishing

Dienst: Distributed Networked Publishing

Carl Lagoze, Digital Library Scientist

Cornell University

Page 2: Dienst Distributed Networked Publishing


Cornell Digital Library Research Group (CDLRG)

• Research and Development of Component-Ware Digital Library Infrastructure

• Developed out of DARPA-funded Computer Science Technical Reports Projects (CS-TR)

Page 3: Dienst Distributed Networked Publishing


Component-Ware Digital Libraries

• Service-based infrastructure
  – Interface (protocol) of each service
  – Interactions between services
  – Aggregations into logical collections and libraries

• Layered approach accommodates requirements of varying clientele
  – research libraries - high-integrity, quality of service, security
  – informal collections - e.g., web

Page 4: Dienst Distributed Networked Publishing


CDLRG Research Projects

• FEDORA

• Distributed Searching and Resource Discovery

• Digital Library Collection Definition

• Metadata (Dublin Core and Warwick Framework)

• Networked Computer Science Technical Reports Project (www.ncstrl.org)

Page 5: Dienst Distributed Networked Publishing


What is NCSTRL?

A Vehicle and Testbed for Digital Library Interoperability

A Vehicle for Exploring Policy and Organization

A Production Digital Collection

Page 6: Dienst Distributed Networked Publishing

A Production Digital Collection

• A growing collection of CS research reports

• A service relied on by users and publishers

• Motivates solving hard, real-world problems: IPR, quality of service, federation of publishers

Page 7: Dienst Distributed Networked Publishing

A Testbed for Technology

• Create a modular system based on a standard open architecture

• Provide a testbed for demonstrating and testing new digital library components

• Work with a variety of researchers: DLI, ERCIM, Los Alamos

Page 8: Dienst Distributed Networked Publishing


A Vehicle for Exploring Policy and Organization

• Creating a self-sustaining international federated digital collection

• Extending the domain and scope while maintaining a coherent collection

• Policy issues: charging, IPR, liability, technical quality, relationship to other DL organizations

Page 9: Dienst Distributed Networked Publishing


Origins of NCSTRL

• DARPA-funded CS-TR Project
  – CNRI, Berkeley, CMU, Cornell, MIT, Stanford

• NSF-funded WATERS Project
  – Old Dominion, SUNY Buffalo, Virginia, Virginia Tech

• Other CS Tech Reports Efforts
  – Harvest, UCSTRI, NZDL

Page 10: Dienst Distributed Networked Publishing


NCSTRL Project Participants

• NCSTRL Steering Committee

• NCSTRL Working Group

• Cornell Digital Library Research Group

• The Collection

Page 11: Dienst Distributed Networked Publishing


NCSTRL Steering Committee

• Responsible for policy direction, oversight

• How to broaden interoperability efforts into the broader community

Page 12: Dienst Distributed Networked Publishing


NCSTRL Working Group

• Responsible for operational oversight of the current system

• Membership from CS-TR and WATERS projects

Page 13: Dienst Distributed Networked Publishing


Cornell Digital Library Research Group

• Responsible for day-to-day support and maintenance of existing system

• Clearing house for technical collaborations

• Evolution and Research Directions

Page 14: Dienst Distributed Networked Publishing


Contributing Institutions

105 Institutions in US, Europe, and Asia

Page 15: Dienst Distributed Networked Publishing


Dienst

• is a protocol and reference implementation of a distributed digital library service

• where a network of services provides

• World Wide Web browser access,

• uniform search over distributed indexes,

• and multi-formatted documents.

Page 16: Dienst Distributed Networked Publishing

Dienst document model

[Figure: Dienst document model. Labels: Document Handle (URN); metadata; decompositions (physical, logical); representations (ASCII, TIFF, PostScript).]

Page 17: Dienst Distributed Networked Publishing


Exposing the Model through the Protocol

• Documents addressable through their URNs

• Document service requests
  – get document metadata
  – get document formats
  – get document in format
  – get document partition (page) in format
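The four document requests above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names, method names, format names, and the example handle are hypothetical and are not taken from the Dienst protocol specification. It shows a document addressed by its handle, carrying metadata and several formats, each of which can be fetched whole or by page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for one document held by a repository."""
    handle: str                       # URN-style document handle
    metadata: dict[str, str]
    # format name -> list of pages (each entry is the bytes of one page)
    formats: dict[str, list[bytes]] = field(default_factory=dict)


class Repository:
    """Toy repository answering the four document requests listed above."""

    def __init__(self) -> None:
        self._docs: dict[str, Document] = {}

    def add(self, doc: Document) -> None:
        self._docs[doc.handle] = doc

    def get_metadata(self, handle: str) -> dict[str, str]:
        # "get document metadata"
        return dict(self._docs[handle].metadata)

    def get_formats(self, handle: str) -> list[str]:
        # "get document formats"
        return sorted(self._docs[handle].formats)

    def get_document(self, handle: str, fmt: str) -> bytes:
        # "get document in format": all pages concatenated
        return b"".join(self._docs[handle].formats[fmt])

    def get_page(self, handle: str, fmt: str, page: int) -> bytes:
        # "get document partition (page) in format"; pages count from 1
        return self._docs[handle].formats[fmt][page - 1]


# Example data; the handle and contents are invented for illustration.
HANDLE = "example.site/TR-1"
repo = Repository()
repo.add(Document(
    handle=HANDLE,
    metadata={"title": "An Example Report"},
    formats={
        "ascii": [b"page one\n", b"page two\n"],
        "postscript": [b"%!PS page 1\n"],
    },
))
print(repo.get_formats(HANDLE))
print(repo.get_page(HANDLE, "ascii", 2))
```

In a real deployment these calls would be protocol requests sent to a repository service over the network; the sketch keeps everything in memory so the shape of the four requests is easy to see.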

Page 18: Dienst Distributed Networked Publishing

Dienst Services

[Figure: Dienst services. Components: WWW browser; Dienst User Interface; Index (three shown); Repository (three shown). Labeled flows: send search request, receive unified hit list; send site-specific search request, receive hit list; send document request, receive MIME-typed document.]
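The search flow on this slide (one search request fanned out to several index services, with the per-site hit lists merged into a unified hit list) can be sketched as follows. The function names, the hit representation, and the substring-matching rule are assumptions made for illustration; the slides do not describe how the reference implementation matches or merges.

```python
from typing import Callable

Hit = tuple[str, str]                      # (document handle, title)
IndexSearch = Callable[[str], list[Hit]]   # one index service's search entry point


def make_index(records: list[Hit]) -> IndexSearch:
    """Build a toy index service: case-insensitive substring match on titles."""
    def search(query: str) -> list[Hit]:
        q = query.lower()
        return [hit for hit in records if q in hit[1].lower()]
    return search


def unified_search(query: str, indexes: list[IndexSearch]) -> list[Hit]:
    """Send the query to every index and merge the hit lists.

    A handle reported by more than one index appears once, at its first position.
    """
    seen: set[str] = set()
    merged: list[Hit] = []
    for search in indexes:
        for handle, title in search(query):
            if handle not in seen:
                seen.add(handle)
                merged.append((handle, title))
    return merged


# Invented example data: two sites, one record indexed at both.
site_a = make_index([
    ("site-a/TR-1", "Distributed Indexing"),
    ("site-a/TR-2", "Type Systems"),
])
site_b = make_index([
    ("site-b/TR-7", "Distributed Digital Libraries"),
    ("site-a/TR-1", "Distributed Indexing"),
])
hits = unified_search("distributed", [site_a, site_b])
print(hits)
```

The user interface service plays the role of `unified_search` here: the browser sees one hit list, and each hit carries the handle needed to send a document request to the right repository.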

Page 19: Dienst Distributed Networked Publishing


Exposing the Services through the Protocol

• All protocol requests are service specific,

• so the functionality of any service can be accessed by another service or a new service.

Page 20: Dienst Distributed Networked Publishing

Gateways to Non-Conforming Sites

[Figure: gateway architecture. Labels: FTP/HTTP “Repositories”; Standard Servers; User Interface Gateway Server.]

Page 21: Dienst Distributed Networked Publishing

Use by External Services

[Figure: use by external services. Labels: User Interface; Search Engine (Z39.50).]

Page 22: Dienst Distributed Networked Publishing

Publishing Using Dienst: Retrospective Conversion

• Scanning of legacy documents
  – Cornell
  – MIT
  – Stanford

• Conversion to common formats
  – GIFs
  – thumbnails
  – PostScript

Page 23: Dienst Distributed Networked Publishing

Publishing with Dienst: Digital Originals

• PostScript as lingua franca
  – “thanks Microsoft”

• Form submission
  – author-generated descriptive metadata

• Clerical clearing-house

• Automatic format conversion

Page 24: Dienst Distributed Networked Publishing


Collection Definition in Digital Libraries

• Multiple levels of selection
  – authors “publish”
  – repositories have submission policies
  – search engines index
  – objects in search engines aggregated into collections
  – user interface gateways provide access to multiple collections

• What is “in” a digital library is defined by what can be found using its resource discovery tools

Page 25: Dienst Distributed Networked Publishing

Defining the Collection - Collection Service

Page 26: Dienst Distributed Networked Publishing

Regional Structure

[Figure: regional structure. Label: central collection server.]

Page 27: Dienst Distributed Networked Publishing


Connectivity Regions and Collection Views

Page 28: Dienst Distributed Networked Publishing


Improvements to the Protocol - Dienst 5

• Incremental enhancement to existing interoperability framework

• Improved document model
  – versions
  – hierarchical part specification
  – binders (multi-part documents)

• Implementation currently under development

Page 29: Dienst Distributed Networked Publishing


Dienst 5 Document Structure

• Structure Request
  – Reveal, in XML, full or collapsed structure of a document
    • e.g., chapters, sections, figures, etc.
  – Describe multiple views of a document
    • e.g., bibliography, content, thumbnails

Page 30: Dienst Distributed Networked Publishing

Dienst 5 Document Dissemination

• Disseminate Request
  – Access to component(s) described by Structure
  – e.g., disseminate chapter 2 page 5 in PostScript
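The relationship between the Structure and Disseminate requests can be sketched with a small XML structure and a lookup against it. Everything concrete here is an assumption for illustration: the element and attribute names, the example handle, the page store, and the reading of "chapter 2 page 5" as page number 5 located inside chapter 2. The slides do not give the Dienst 5 XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure response: two views, one of them broken into chapters and pages.
STRUCTURE_XML = """
<document handle="example.site/TR-1">
  <view name="content">
    <chapter n="1"><page n="1"/><page n="2"/></chapter>
    <chapter n="2"><page n="3"/><page n="4"/><page n="5"/></chapter>
  </view>
  <view name="thumbnails"/>
</document>
""".strip()

# Hypothetical page store: (page number, format) -> bytes of that page.
PAGE_STORE = {
    (5, "postscript"): b"%!PS page 5",
    (5, "ascii"): b"page five",
}


def list_views(root: ET.Element) -> list[str]:
    """Return the names of the views the structure describes."""
    return [view.get("name", "") for view in root.findall("view")]


def disseminate(root: ET.Element, chapter: int, page: int, fmt: str,
                store: dict[tuple[int, str], bytes]) -> bytes:
    """Return one page in one format, but only if the structure describes it."""
    path = f"./view[@name='content']/chapter[@n='{chapter}']/page[@n='{page}']"
    if root.find(path) is None:
        raise KeyError(f"chapter {chapter} has no page {page}")
    return store[(page, fmt)]


root = ET.fromstring(STRUCTURE_XML)
print(list_views(root))
print(disseminate(root, chapter=2, page=5, fmt="postscript", store=PAGE_STORE))
```

The point of the sketch is the dependency: a client first learns what components exist from the structure, then asks for one of those components in a chosen format.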

Page 31: Dienst Distributed Networked Publishing


Supporting Multiple Collections

• NCSTRL is currently a single collection

• Other users of Dienst protocol
  – European gray literature, thesis, and dissertation collections
  – NASA space science
  – Mediterranean environment data and software
  – Los Alamos Pre-prints

• Expanding the technology to multiple collections through regions

Page 32: Dienst Distributed Networked Publishing


Lessons Learned and Work to be Done

• Intellectual property

• Quality
  – quality of collection (reviewing)
  – quality of metadata
  – quality of service

• Resisting information entropy

• Richer “documents”

• Archiving and Preservation