lecture 14 interoperability for information discovery

Post on 14-Jan-2016

20 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Lecture 14 Interoperability for information discovery. CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel herbertv@cs.cornell.edu. Acknowledgements: Carl Lagoze. Why interoperability?. - PowerPoint PPT Presentation

TRANSCRIPT

1 herbert van de sompel

Lecture 14 Interoperability for information discovery

CS 502 Computing Methods for Digital Libraries

Cornell University – Computer ScienceHerbert Van de Sompelherbertv@cs.cornell.edu

Acknowledgements: Carl Lagoze

2 herbert van de sompel

Why interoperability?

The distributed information environment creates enormous challenges regarding the provision of coherent services. Addressing these challenges requires some form of interoperability.

3 herbert van de sompel

OPAC FTXT

FTXTe-printA&I

A&I

distributed

herbert van de sompel

Information resources: distributed

4 herbert van de sompel

OPAC FTXT

FTXTe-printA&I

A&I

range of authorities, technologies

herbert van de sompel

Information resources: different authorities, technologies

5 herbert van de sompel

OPAC FTXT

FTXTe-printA&I

A&I

¡¡ challenges re integrated access !!

herbert van de sompel

Information resources: challenges re coherent services

6 herbert van de sompel

Interoperability helps!

• UNICODE• XML• XML Schema, XML Namespaces• HTTP• Metadata formats (MARC, Dublin Core, …)

• This lecture: interoperability for resource discovery (searching)• Next week: interoperability for reference linking

7 herbert van de sompel

• reaching interoperability is an organizational challenge (not only technical)

• the challenge is to create incentives for independent digital libraries to adopt specifications

Interoperability: technical & organizational

8 herbert van de sompel

Functionality versus cost of acceptance

Functionality

Cost of acceptance

Metadata HarvestingOAI

SDLIP

Z39.50

This picture does not show the eventual benefit of a protocol in the creation of coherent services

9 herbert van de sompel

A&I

federated searching

image

FTXT

OPAC

e-print

10 herbert van de sompel

Z39.50

http://www.loc.gov/z3950/agency/

11 herbert van de sompel

• Permits one computer (z3950 client) to search and retrieve information on another computer serving a database (z3950 target)

• Important both technically and for its wide use in library systems, A&I databases

• Most development has concentrated on bibliographic data

• Most implementations emphasize searches that use a bibliographic set of attributes to search databases of MARC records

Aims of z39.50

12 herbert van de sompel

• Developed for X.25 networks (connection orientation), conversion to run over TCP fitted later

• Original concept in days when repeating a search was expensive computation (about 1980)

• NISO standard

Technical history of z39.50

13 herbert van de sompel

Abstract view of database searching.

• Server stores a set of databases with searchable indexes

• Interactions are based on a session

• The client opens a connection with the server, carries out a sequence of interactions and then closes the connection.

• During the course of the session, both the server and the client remember the state of their interaction.

z39.50 - principles

14 herbert van de sompel

• The server carries out the search and builds a results set

• Server saves the results set.

• Subsequent message from the client can reference the result set.

• Thus the client can modify a large set by increasingly precise requests, or can request a presentation of any record in the set, without searching entire database.

z39.50 - state

15 herbert van de sompel

init -- client connects to the server and exchanges initial information, e.g., preferred message size

explain -- client inquires of the server what databases are available for searching, the fields that are available, the syntax and formats supported, etc.

search -- client presents a query to a database; choices of syntax for specifying searches

• Boolean queries widely implemented

z39.50 - services

manipulation of results sets -- e.g., sort or delete

present -- requests the server to send specified records from the results set to the client in a specified format

scan – browse indexes

16 herbert van de sompel

In the database named Books find all records for which the access point title contains the value evangeline and the access point author contains the value longfellow.

Z39.50 defines a rich variety of search access points (use attributes) that can be extended by implementers

z39.50 – sample query

http://lcweb.loc.gov/z3950/agency/defns/bib1.html

17 herbert van de sompel

z39.50 – issues

No general acceptance (for instance not supported by web browsers => httpd/z3950 gateways)

Merging results for distributed search

Semantic interoperability ~ z39.50 profiles

18 herbert van de sompel

Simple Digital Library Interoperability Protocol

http://www-diglib.stanford.edu/~testbed/doc2/SDLIP/

19 herbert van de sompel

• Compromise between a full-scale, all encompassing approach such as Z39.50 and the “anything goes” approach typical for ad-hoc search interface design on web

• Developed jointly by Stanford, Berkeley, and UC Santa Barbara

• Heavily influenced by DASL from IETF

SDLIP

20 herbert van de sompel

SDLIP – search middleware

21 herbert van de sompel

SDLIP – managing complexity via separating interfaces

22 herbert van de sompel

• Search Interface – defines simple query language, protocol can then include other languages

• Result Interface – parking meter metaphor (server decides on length of session)

• Source Metadata Interface – allows to query a library proxy about its capabilities (subcollections, attributes that can be searched, …)

SDLIP - interfaces

23 herbert van de sompel

Open Archives Initiative Metadata Harvesting Protocol

http://www.openarchives.org

24 herbert van de sompel

• Low-barrier framework for repository interoperability• Minimal burden for parties providing databases• Plug-in concept to allow community and service

specialization (placeholders – about, descriptor ; parallel metadata formats; …)

OAMH protocol

25 herbert van de sompel

metadata

A&I

image

OPAC

e-print

FTXT

harvester

FTXT

OAMH - metadata harvesting

26 herbert van de sompel

metadata

A&I

image

FTXT

e-print

AuthorTitleAbstractIdentifer

OPAC

OAMH – federated services

27 herbert van de sompel

service provider data provider

Requests

Replies

repos i tory

harves ter

6

OAMH – protocol

28 herbert van de sompel

• low-barrier interoperability

• data-provider & service-provider model

• metadata harvesting model OAMH protocol

Dublin Core

HTTP basedReply • XML Schema

• Self contained• shared metadata format and parallel, community-

specific metadata formats

OAMH – core concepts

• authentication etc. : on purpose outside of the

protocol

29 herbert van de sompel

service provider data provider

DatestampIdentifierSet

Records

repos i tory

harves ter

OAMH – protocol tools

30 herbert van de sompel

Supporting protocol requests:• Identify• ListMetadataFormats• ListSets

Harvesting protocol requests:• ListRecords• ListIdentifiers• GetRecord

repos i tory

service provider data provider

harves ter

OAMH – protocol tools

31 herbert van de sompel

ListMetadataFormats

ListMetadataFormats / Time / Request REPEAT

• Format prefix• Format XML schema

/REPEAT

repos i tory

service provider data provider

harves ter

OAMH – a supporting protocol request

32 herbert van de sompel

* from=a * until=b * set=klmListRecords * metadataPrefix=dc

ListRecords / Time / Request REPEAT

• Identifier• Datestamp

• Metadata/REPEAT

repos i tory

service provider data provider

harves ter

OAMH – a harvesting protocol request

33 herbert van de sompel

• federated services [S&R, SDI, alerting, linking, ...]• database synchronization• harvesting the deep Web• ...

OAMH – applications?

34 herbert van de sompel

OAMH – issues

• sync between data provider and service provider (cf. Web page harvesters)

• everyone harvesting from everyone? (cf. brokers)

• CAN attractive cross-community services be built?

• gaining critical mass

35 herbert van de sompel

• There is (and will never be) one right solution (technical vs. cost vs. complexity vs. ??) (cf metadata)

• Distributed technical solutions have organizational ramifications

• A revival of the simplicity approach: specifications that require little effort by the digital libraries involved, but that can eventually lead to appealing services (when combined with brute force computing and intelligent tools)

Some thoughts

top related