ist- 2001-320015 tutorial oai and oai-pmh for beginners an introduction to the open archives...

109
Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller Humboldt University Berlin, Germany [email protected] Andy Powell UKOLN, University of Bath [email protected]

Upload: isabel-kirk

Post on 27-Mar-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

Tutorial OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Uwe Müller

Humboldt University Berlin, Germany

[email protected]

Andy Powell

UKOLN, University of Bath

[email protected]

Page 2: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners

Agenda

Part IHistory and overview

Part IITechnical introduction

Coffee/tea break Part III

Implementation issues – data provider and service provider

Part IVImplementation issues – XML schema and supporting

multiple record formats

Page 3: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners

Acknowledgements

Some of the slides presented here are our own! Many of them have been kindly donated by (taken

from!):Herbert Van de Sompel

Carl Lagoze

Michael Nelson

Simeon Warner

(and others probably!)

Page 4: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

Tutorial OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Part I: History and overview Andy Powell

UKOLN, University of Bath

[email protected]

Page 5: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

OAI roots…

the roots of OAI lie in the development of eprint archives…

arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL

each offered Web interface for deposit of articles and for end-user searches

difficult for end-users to work across archives without having to learn multiple different interfaces

recognised need for single search interface to all archives

Universal Pre-print Service (UPS)

Page 6: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Searching vs. harvesting

two possible approaches to building the UPS… cross-searching multiple archives based on

protocol like Z39.50 harvesting metadata into one or more ‘central’

services – bulk move data to the user-interface US digital library experience in this area (e.g.

NCSTRL) indicated that cross-searching not preferred approach - distributed searching of N nodes viable, but only for small values of N

NCSTRL: N > 100; bad

Page 7: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Problems of cross-searching

collection descriptionhow do you know which targets to search?

query-language problemsyntax varies and drifts over time between the various nodes

rank-merging problemhow do you meaningfully merge multiple result sets?

performancetends to be limited by slowest target

difficult to build browse interface

Page 8: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Universal Preprint Service

a cross-archive DL that that provides services on a collection of metadata harvested from multiple archives

based on NCSTRL+; a modified version of Dienst

demonstrated at Santa Fe NM, October 21-22, 1999

http://ups.cs.odu.edu/

D-Lib Magazine, 6(2) 2000 (2 articles)

http://www.dlib.org/dlib/february00/02contents.html

UPS was soon renamed the Open Archives Initiative (OAI) http://www.openarchives.org/

Page 9: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

RDN experience

similar experience within the UK Resource Discovery Network (RDN)

cross-searching of only 5 subject gateways problems with cross-searching approach

performance

central browse interface

looking for metadata harvesting solution

Page 10: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Data and service providers

UPS identified two logical groups of services… data providers

handle deposit/publishing of resources in archive

expose metadata about resources in archive

service providersharvest metadata from data providers

use it to offer single user-interface across all harvested metadata

note:data provider may also be responsible for human-oriented (I.e. Web) interface to archive

both functions may be offered by same ‘service’

Page 11: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Metadata harvesting requirements

in order that harvesting approach can work need agreements about…

transport protocols – HTTP vs. FTP vs. … metadata formats – DC vs. MARC vs. … quality assurance – mandatory elements,

mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice

intellectual property and usage rights – who can do what with the records

work in this area resulted in the “Santa Fe Convention”

Page 12: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

OAI-PMH v 1.0 [01/2001]

goal: optimise discovery of document-like objects

inputs…Santa Fe Conventionvarious DLF meetings on metadata harvestingdeliberations at Cornellalpha-testers of OAI-PMH v 1.0 recognition of DC as ‘best’ core metadata

format for interoperability across multiple archives

Page 13: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

OAI-PMH v 1.0 [01/2001]

low-barrier interoperability specification

metadata harvesting model: data provider / service provider

focus on document-like objects

autonomous protocol

HTTP based

XML responses

unqualified Dublin Core

experimental: 12-18 months

Page 14: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

What’s in a name?

Open Archives Initiative

the protocol is openlydocumented, and metadatais “exposed” to at least somepeer group (note: rights management can still apply!)

archive defined as a“collection of stuff” --not the archivist’s definition of “archive”. “Repository” used in most OAI documents.

OAI is happeningat break-neck speed...

Page 15: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

OAI timeline before v. 2.0

October 21-22, 1999 - initial UPS meeting February 15, 2000 - Santa Fe Convention published in D-Lib Magazine

precursor to the OAI metadata harvesting protocol June 3, 2000 - workshop at ACM DL 2000 (Texas) August 25, 2000 - OAI steering committee formed, DLF/CNI support September 7-8, 2000 - technical meeting at Cornell University

defined the core of the current OAI metadata harvesting protocol September 21, 2000 - workshop at ECDL 2000 (Portugal) November 1, 2000 - Alpha test group announced (~15 organizations) January 23, 2001 - OAI protocol 1.0 announced, OAI Open Day in the U.S.

(Washington DC)purpose: freeze protocol for 12-16 months, generate critical mass

February 26, 2001 - OAI Open Day in Europe (Berlin) July 3, 2001 - OAI protocol 1.1 announced

to reflect changes in the W3C’s XML latest schema recommendation September 8, 2001 - workshop at ECDL 2001 (Darmstadt)

Page 16: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

OAI-PMH v.2.0 [06/2002]

goal: recurrent exchange of metadata about

resources between systems

inputs: OAI-PMH v.1.0 feedback on OAI-implementers deliberations by OAI-tech [09/01 - 06/02]

alpha test group of OAI-PMH v.2.0 [03/02 - 06/02]

officially released June 14, 2002

Page 17: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

OAI-PMH v.2.0 [06/2002]

low-barrier interoperability specification

metadata harvesting model: data provider / service provider

metadata about resources

autonomous protocol

HTTP based

XML responses

unqualified Dublin Core

stable

Page 18: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

about eprintsdocument

like objectsresources

metadata OAMSunqualifiedDublin Core

unqualifiedDublin Core

transport HTTP HTTP HTTP

responses XML XML XML

requests HTTP GET/POST HTTP GET/POST HTTP GET/POST

verbs Dienst OAI-PMH OAI-PMH

nature experimental experimental stable

modelmetadataharvesting

metadataharvesting

metadataharvesting

Santa Feconvention

OAI-PMHv.1.0/1.1

OAI-PMHv.2.0

Page 19: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Flexible deployment

simple protocol based on HTTP and XML allows for rapid deployment

a number of toolkits available – see part III systems can be deployed in variety of

configurations multiple service providers can harvest from

multiple data providers aggregators can sit between data and service

providers harvesting approach can be complemented with

searching based on Z39.50 or SRW

Page 20: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Multiple data and service p’s

Data providers

Service providers

Harvestingbased onOAI-PMH

Page 21: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Aggregators

Data providers

Service providers

Aggregator

Page 22: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Can be mixed with x-searching

Data providers

Service providers

Harvestingbased onOAI-PMH

Searchingbased onZ39.50 orSRW

Page 23: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Summary

OAI-PMH – OAI Protocol for Metadata Harvesting low-cost mechanism for harvesting metadata

records from one system to anotherfrom ‘data providers’ to ‘service providers’

development over last 2-3 years has seen move from specific (discovery of e-prints) to generic (sharing descriptions of any resources)

based on HTTP and XML – Web-friendly allows client to say ‘give me some or all of your

records’ where ‘some’ is based ondate-stamps, sets, metadata formats

Page 24: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Summary (2)

mandates simple DC as record format but extensible to any format encoded in XML

OAI-PMH is not a search protocolbut use can underpin search-based services based on Z39.50 or SRW or …

metadata and full-text typically made freely available – but not a requirement

OAI-PMH can be used between closed groups access-control and compression mechanisms

based on underlying HTTP protocol simple protocol allows easy deployment

systems can be combined in variety of ways

Page 25: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part I

Important resources

OAI Web site:http://www.openarchives.org/

OAI-PMH specification:http://www.openarchives.org/OAI/openarchivesprotocol.html

Implementation guidelines:http://www.openarchives.org/OAI/2.0/guidelines.htm

Discussion lists:http://www.openarchives.org/mailman/listinfo/oai-general

http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers

Repository explorer:http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

Tools: http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

Page 26: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

Tutorial OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Part II: Technical Introduction Uwe Müller

Humboldt University Berlin, Germany

[email protected]

Page 27: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Agenda

1. Protocol Basics

2. Protocol Details

3. Request Types

4. Examples

Page 28: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

The Open Archives Initiative (OAI)

Main ideasworld-wide consolidation of scholarly archives

free access on the archives (at least: metadata)

consistent interfaces for archives and service provider

low barrier protocol / effortless implementation

based on existing standards (e.g. HTTP, XML, DC)

Basic functioning

Harvester Repository

Requests (based on HTTP)

Metadata (encoded in XML)

Metadata(Documents)

Metadata

Service Provider Data Provider

„Service”

Page 29: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

OAI: General Assumptions

two groups of ‘participants’ Data Providers (Open Archives, Repositories)

free access of metadata

not necessarily: free access to full texts / resources

easy to implement, low barriers

Service Providersuse OAI interfaces of the Data Providers

harvest and store metadata (no live requests!)

may select certain subsets from Data Providers(set hierarchy, date stamp)

may enrich metadata

offer (value-added) service on the basis of the metadata

Page 30: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

OAI-PMH: Structure Model

Se

rvic

e P

rovi

der

e-print

Da

ta

Pro

vid

er e-prints

e-print

Da

ta

Pro

vid

er Images

e-print

Da

ta

Pro

vid

er

OPAC

e-print

Da

ta

Pro

vid

er Museum

e-print

Da

ta

Pro

vid

er Archive

Requests:

Identify

ListMetadataformats

ListSets

ListIdentifiers

ListRecords

GetRecord

Responses:

General information

Metadata formats

Set structure

Record identifier

Metadata

Da

ta

Pro

vid

er Harvester

Repository

Repository

Repository

Repository

Repository

Page 31: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

OAI-PMH: Protocol Overview

protocol based on HTTP

request arguments as GET or POST parameters

six request types

e.g. http://archive.org?verb=ListRecords&from=2002-11-01

responses are encoded in XML syntax

supports any metadata format (at least: Dublin Core)

logical set hierarchy (definition: data providers)

date stamps (last change of metadata set)

error messages

flow control

Page 32: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Agenda

1. Protocol Basics

2. Protocol Details

3. Request Types

4. Examples

Page 33: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Definitions

Harvesterclient application issuing OAI-PMH requests

Repositorynetwork accessible server, able to process OAI-PMH requests correctly

Resourceobject the metadata is “about”, nature of resources is not defined in the OAI-PMH

Itemcomponent of an repository from which metadata about a resource can be disseminatedhas an unique identifier

Recordmetadata in a specific metadata format

Identifierunique key for an item in a repository

Setoptional construct for grouping items in a repository

Page 34: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Definitions (2)

resource

all available metadata about David

item

Dublin Coremetadata

MARCmetadata

SPECTRUMmetadata records

item = identifier

Page 35: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Records

metadata of a resource in a specific format three parts

1. header (mandatory)identifier (1)datestamp (1)setSpec elements (*)status attribute for deleted item (?)

2. metadata (mandatory)XML encoded metadata with root tag, namespacerepositories must support Dublin Core

3. about (optional)rights statementsprovenance statements

Page 36: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Datestamps

date of last modification of a metadata set mandatory characteristic of every item two possible granularities:

YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ function: information on metadata, selective

harvesting (from and until arguments) applications: incremental update mechanisms modification, creating, deletion deletion: three support levels

no, persistent, transient

Page 37: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Metadata Schema

OAI-PMH supports dissemination of multiple metadata formats from a repository

properties of metadata formatsid string to specify the format (metadataPrefix)metadata schema URL (XML schema to test validity)XML namespace URI (global identifier for metadata format)

repositories must be able to disseminate unqualified Dublin Core

arbitrary metadata formats can be defined and transported via the OAI-PMH

returned metadata must comply with XML namespace specification

Page 38: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Metadata Schema (2)

minimum standard: unqualified Dublin Corehttp://dublincore.org/

Dublin Core Metadata Element Set contains 15 elements

elements are optional

elements may be repeated

The Dublin Core Metadata Element Set:

Title Contributor Source

Creator Date Language

Subject Type Relation

Description Format Coverage

Publisher Identifier Rights

Page 39: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Sets

logical partitioning of repositories optional – archives do not have to define sets no recommendations not necessarily exhaustive not necessarily strictly hierarchical function: selective harvesting (set parameter) applications:

subject gateways, dissertation search engine, … examples (Germany, see http://www.dini.de)

publication types (thesis, article, …)document types (text, audio, image, …)content sets, according to DNB (medicine, biology, …)

Page 40: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Request Format

requests must be submitted using the GET or POST methods of HTTP

repositories must support both methods at least one key=value pair: verb=[RequestType] additional key=value pairs depend on request type example for GET request: http://archive.org/oai?

verb=ListRecords&metadataPrefix=oai_dc encoding of special characters

e.g. “:” (host port separator) becomes “%3A”

Page 41: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Response

formatted as HTTP responses content type must be text/xml status codes (distinguished from OAI-PMH errors)

e.g. 302 (redirect), 503 (service not available) compression: optional in OAI-PMH,

only identity encoding is mandatory response format: well formed XML with markup:

1. XML declaration (<?xml version="1.0" encoding="UTF-8" ?>)

2. root element named OAI-PMH with three attributes(xmlns, xmlns:xsi, xsi:schemaLocation)

3. three child elements1. responseDate (UTC datetime)2. request (request that generated this response)3. a) error (in case of an error or exception condition)

b) element with the name of the OAI-PMH request

Page 42: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Flow Control

four of the request types return a list of entries three of them may reply ‘large’ lists OAI-PMH supports partitioning decision on partitioning: repository response to a request includes

incomplete listresumption token + expiration date, size of complete list, cursor (optional)

new request with same request type resumption token as parameterall other parameters omitted!

response includesnext (maybe last) section of the listresumption token (empty if last section of list enclosed)

Page 43: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Flow Control (2)

Example

Harvester Repository

Service Provider Data Provider

“want to have all your records”

archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

“have 267, but give you only 100”

100 records + resumptionToken “anyID1”

“want more of this”

archive.org/oai?resumptionToken=anyID1

“have 267, give you another 100”

100 records + resumptionToken “anyID2”

“want more of this”

archive.org/oai?resumptionToken=anyID2

“have 267, give you my last 67”

67 records + resumptionToken “”

Page 44: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Protocol Details: Errors and Exceptions

repositories must indicate OAI-PMH errors inclusion of one or more error elements defined error identifiers

badArgument

badResumptionToken

badVerb

cannotDisseminateFormat

idDoesNotExist

noRecordsMatch

noMetaDataFormats

noSetHierarchy

Page 45: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Agenda

1. Protocol Basics

2. Protocol Details

3. Request Types

4. Examples

Page 46: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Types

six different request types1. Identify

2. ListMetadataFormats

3. ListSets

4. ListIdentifiers

5. ListRecords

6. GetRecord

harvester has not to use all types repository must implement all types required and optional arguments depend on request types

Page 47: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: Identify

functiondescription of an archive

example archive.org/oai-script?verb=Identify

parameters none

errors / exceptionsbadArgument

e.g. archive.org/oai-script?verb=Identify&set=biology

Page 48: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: Identify (2)

Element Example #

repositoryName My Archive 1

baseURL http://archive.org/oai 1

protocolVersion 2.0 1

earliestDatestamp

1999-01-01 1

deleteRecords no, transient, persistent 1

granularity YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ 1

adminEmail [email protected] +

compression deflate, compress, … *

description oai-identifier, eprints, friends, … *

response format

Page 49: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: ListMetadataFormats

functionretrieve available metadata formats from archive

example archive.org/oai-script?verb=ListMetadataFormats&

identifier=oai:HUBerlin.de:3000218parameters

identifier (optional)

errors / exceptionsbadArgumentidDoesNotExist

e.g. archive.org/oai-script?verb=ListMetadataFormats&

identifier=really-wrong-identifier noMetadataFormats

Page 50: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: ListSets

functionretrieve set structure of a repository

example archive.org/oai-script?verb=ListSets

parameters resumptionToken (exclusive)

errors / exceptionsbadArgumentbadResumptionToken

e.g. archive.org/oai-script?verb=ListSets&resumptionToken=any-wrong-token

noSetHierarchy

Page 51: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: ListIdentifiers

functionabbreviated form of ListRecords, retrieving only headers

example archive.org/oai-script?verb=ListIdentifiers&

metadataPrefix=oai_dc&from=2002-12-01parameters

from (optional)until (optional) metadataPrefix (required)set (optional) resumptionToken (exclusive)

errors / exceptionsbadArgument, e.g. …&from=2002-12-01-13:45:00badResumptionTokencannotDisseminateFormatnoRecordsMatchnoSetHierarchy

Page 52: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: ListRecords

functionharvest records from a repository

example archive.org/oai-script?verb=ListRecords&

metadataPrefix=oai_dc&set=biologyparameters

from (optional)until (optional) metadataPrefix (required)set (optional) resumptionToken (exclusive)

errors / exceptionsbadArgumentbadResumptionTokencannotDisseminateFormatnoRecordsMatchnoSetHierarchy

Page 53: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Request Type: GetRecord

functionretrieve individual metadata record from a repository

example archive.org/oai-script?verb=GetRecord&

identifier=oai:HUBerlin.de:3000218&metadataPrefix=oai_dc

parametersidentifier (required)metadataPrefix (required)

errors / exceptionsbadArgumentcannotDisseminateFormatidDoesNotExist

Page 54: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Agenda

1. Protocol Basics

2. Protocol Details

3. Request Types

4. Examples

Page 55: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Example: http://edoc.hu-berlin.de/OAI-2.0?verb=ListIdentifiers&from=2002-01-06&until=2002-01-08&metadataPrefix=oai_dc&set=doctypes:dissertations

<?xml version="1.0" encoding="UTF-8"?> <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"> <responseDate>2002-10-22T17:49:49+01:00</responseDate> <request verb="ListIdentifiers" from="2002-01-03" until="2002-01-08" metadataPrefix="oai_dc" set="doctypes:dissertations">http://edoc.hu-berlin.de/OAI-2.0</request> <ListIdentifiers> <header> <identifier>oai:HUBerlin.de:3000819</identifier> <datestamp>2002-01-08</datestamp> <setSpec>doctypes</setSpec> <setSpec>doctypes:dissertations</setSpec> <setSpec>dnb</setSpec> <setSpec>dnb:dnb33</setSpec> </header> <header> <identifier>oai:HUBerlin.de:3000831</identifier> <datestamp>2002-01-07</datestamp> <setSpec>doctypes</setSpec> <setSpec>doctypes:dissertations</setSpec> <setSpec>dnb</setSpec> <setSpec>dnb:dnb27</setSpec> </header> </ListIdentifiers> </OAI-PMH>

Page 56: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Example: http://edoc.hu-berlin.de/OAI-2.0?verb=GetRecord&identifier=oai:HUBerlin:3000819&metadataPrefix=oai_dc

<?xml version="1.0" encoding="UTF-8"?> <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"> <responseDate>2002-11-27T14:57:01+01:00</responseDate> <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:HUBerlin.de:3000819">http://edoc.hu-berlin.de/OAI-2.0</request> <GetRecord> <record> <header> <identifier>oai:HUBerlin.de:3000819</identifier> […] </header> <metadata> <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"> <dc:title>Einfluß genetischer Variationen im Tumor Nekrose […]</dc:title> <dc:creator>Schüttlöffel, Antje</dc:creator> […] </metadata> </record> </GetRecord></OAI-PMH>

Page 57: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part II

Technical Introduction: Questions?

OAI – official sitehttp://www.openarchives.org/

protocol specificationhttp://www.openarchives.org/OAI/openarchivesprotocol

.html

general mailing listhttp://www.openarchives.org/mailman/listinfo/OAI-general/

implementers mailing listhttp://www.openarchives.org/mailman/listinfo/OAI-implementers/

Page 58: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

Tutorial OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Part III: Implementation Issues

Data Provider and Service Provider Uwe Müller

Humboldt University Berlin, Germany

[email protected]

Page 59: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Agenda

1. General Considerations

2. Data Provider

3. Service Provider

Page 60: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

General: First Questions

Data Provider Which data do I want to deliver?

Which service providers do I want to provide with data?

Service ProviderWhich Service do I want to provide?

From which data providers do I get the metadata?

In which way the metadata have to be processed?

Data Provider & Service ProviderWhich aspects do we have to agree upon?

Page 61: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

General: Metadata Formats / Sets

required: unqualified Dublin Core special subjects / communities: other metadata

specifications may be requireddescribe resources in a specialised way

definition of an XML schema (publicly available for validation)

define set hierarchysensible partitioning for selective harvesting

agreement between data providers and between data and service providers

Page 62: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

General: Organisational Structure

aggregated data providers if harvested by a service provider, “sub data providers” should not be harvested by same SP (duplication ...)

subject gatewaysselective harvesting if corresponding sets have been defined and implemented

Page 63: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Agenda

1. General Considerations

2. Data Provider

3. Service Provider

Page 64: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Prerequisites

metadata on resources (“items”)should be stored in (SQL) database

possible in case of need: file system …

unique identifier for each item

web server, accessible via the internete.g. apache, IIS

programming interface / APIe.g. Perl, PHP, Java-Servlet

web server extension

access to database (or filesystem)

not needed: session management

Page 65: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Prerequisites (2)

archive identifier / base URL unique identifier for items metadata format (at least: unqualified Dublin

Core) datestamps for metadata (created / last modified) logical set hierarchy (may have)

agreement within (subject) communities

flow control / implementation of resumption token (optional, ‘larger’ archives should have that)

Page 66: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Architecture

OAI Data Provider

OAI request(HTTP request)

Web server (e.g. Apache, IIS)

Programming extension (e.g. PHP, Perl,JavaServlets)

DB response

OAI response(XML instance)

Script / Programme- parsing arguments

- creating error messages- creating SQL statements

-creating XML output

SQL-Database

SQL request

Page 67: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: General Structure

Argument Parservalidates OAI requests

Error Generatorcreates XML responses with encoded error messages

Database Query / Local Metadata Extractionretrieves metadata from repository

according to the required metadata format

XML Generator / Response Creationcreates XML responses with encoded metadata information

Flow Controlrealises incomplete list sequences for ‘larger’ repositories

uses resumption token as mechanism

Page 68: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Flow Chart

HTTP request

verb

metadataPrefix

ListIdentifie

rs G

etRecord

Identify

ListMe

tadata

-F

orma

ts

ListSets

ListRecords

elseerror: badVerb

error: cannotDiss-eminateFormat

elseemptyresumption

Token

error: badResumptionToken

unknown

read parametersfrom local system

valid parse the otherparameters

oai_dc

send SQL requestto database

rows>100

store parameters,store and deliver resumptionToken

yes

XML response

error: badArgument

empty

• verb, metadataPrefix, resump-tionToken … OAI arguments

• rows … size of the result list• 100 … here: maximal list size

for responses

deliver min (rows, 100)record headers

no

Page 69: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Resumption Token

should be implemented for “large” lists initiated by data provider store parameters (set, from, …) and number of already

delivered records properties

expiration: expirationDate (optional)completeListSize (optional)already delivered records: cursor (optional)recovery from network errors (possibility to re-issue most recent resumption token)

problemdatabase changestwo possible solutions

duplicate data in a “request table”store date of first request with the other parameters use like additional until argument

Page 70: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Resumption Token (2)

Example

Harvester Repository

Service Provider Data Provider

“want to have all your records”

archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

“have 267, but give you only 100”

100 records + resumptionToken “anyID1”

“want more of this”

archive.org/oai?resumptionToken=anyID1

“have 267, give you another 100”

100 records + resumptionToken “anyID2”

“want more of this”

archive.org/oai?resumptionToken=anyID2

“have 267, give you my last 67”

67 records + resumptionToken “”

Page 71: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Resumption Token (3)

Repository

Data Provider

Example (2)

“want to have all your records”

archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

Database

“have 267, but give you only 100”

100 records + resumptionToken “anyID1”

“want more of this”

archive.org/oai?resumptionToken=anyID1

“have 268, give you another 100”

100 records + resumptionToken “anyID2”

select dc-datafrom metadata-table

1267 records

2

insert,update,delete

3

select dc-datafrom metadata-table

45

268 records

anyID1 = { from=empty, until=empty, set=empty, mdP=oai_dc, date= 2002-12-05T15:00:00Z, delivered=100}

Page 72: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Data Representation

use recommended data representationdates

2002-12-052002-xx-xx, 2002, 05.12.2002

language codeeng, ger, ...en, de, english, german

multi values: use own XML element for each entityauthor

<dc:creator>Smith, Adam</dc:creator><dc:creator>Nash, John</dc:creator><dc:creator>Smith, Adam; Nash, John</dc:creator>

Page 73: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Compression

method to reduce traffic and enhance performance optional for both sides: data and service providers handled on HTTP level harvesters may include an Accept-Encoding header in

their requests –specifying preferences harvesters without Accept-Encoding header always

receive uncompressed data repositories must support HTTP identity encoding repositories should specify supported encodings by

including compression elements in the identify response

Page 74: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data Provider: Test and Registration

create own OAI-PMH requests and send to OAI interface – check results

use the Repository Explorer (VT University)http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/ provide arguments via HTML formsresponses are validated ‘browsing’ to other requestsautomatic conformance tester

official registration sitehttp://www.openarchives.org/data/registerasprovider.html provide base URLextensive conformance test (incl. error conditions …)information on incorrect behaviourin case of conformance – added to the official listregular checks

Page 75: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Agenda

1. General Considerations

2. Data Provider

3. Service Provider

Page 76: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Examples

Repository Explorer: http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/

search engines / subject gatewaysCross Archive Searching Service: http://arc.cs.odu.edu/

MyOAI: http://www.myoai.org/

DINI: http://edoc.hu-berlin.de/oaisearch/

Physnet: http://physnet.uni-oldenburg.de/oai/query.php

internal communicationProPrint: http://edoc.hu-berlin.de/proprint/

library compounds

Page 77: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Prerequisites

internet connected server database system (relational or XML) programming environment

can issue HTTP requests to web servers

can issue database requests

XML parser

Page 78: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Structure (1)

Archive Management

selection of archives to be harvested

enter entries manually or

automatically add / remove archives using the official registry

Request Component

creates HTTP requests and sends them to OAI archives (data provider)

demands metadata using the allowed verbs of the OAI-PMH

possibly selective harvesting (set parameter)

Page 79: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Structure (2)

Scheduler

realises timed and regular retrieval of the associated archives

simplest case: manual initiation of the jobs

else: e.g. cron job …

Flow Control

resumption token: partitioning of the result list into incomplete sections – anew request to retrieve more results

HTTP error 503 (service not available) – analysis of response to extract “retry-after” period

Page 80: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Structure (3)

Update Mechanism

realises consolidation of metadata which have been harvested earlier (merge old and new data)

easiest case: always delete all ‘old’ metadata of an archive before harvesting it

reasonable: incremental update (from parameter) – insert new metadata and overwrite changed / deleted metadata (assignment using the unique identifiers)

XML Parser

analyses the responses received from the archives

validation: using the XML schema

transforms the metadata encoded in XML into the internal data structure

Page 81: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Structure (4)

Normaliser transforms data into a homogenous structure

(different metadata formats) harmonises representation (e.g. date, author,

language code) maps / translates different languages

Database mapping the XML structure of the metadata into a

relational database (multi values …) or: use an XML database

Page 82: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Structure (5)

Duplication Checker

merges identical records from different data providers

possibility: unique identifier for the item (e.g. URN, …)

but: often not easily practicable and not risk / error free

Service Module

provides the actual service to the ‘public’

basis: harvested and stored records of the associated archives

uses only local database for requests etc.

Page 83: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Architecture

Data Provider Data Provider Data Provider

Scheduler

Flow controlXML Parser

Normaliser

Database

Service module

User Harvester User

OAI Service Provider

Dublication checker

Update mechanism

Administrator

Page 84: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Resumption Token

optional from the data provider’s point of view but: mandatory for service providers for complete lists: resume sequences of

incomplete lists1. ‘recognise’ that response contains incomplete list

2. re-issue OAI request to data provider in order to get next part of the list

Page 85: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Service Provider: Test and Registration

harvest registered ( OAI complient!) data providers

test behaviour of service provider official registration site

http://www.openarchives.org/service/registerasprovider.html

provide institutional information

web site, email address, ...

Page 86: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part III

Data & Service Provider: Questions?

Page 87: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

Tutorial OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Part IV: Implementation issues - XML schemas and support for multiple

record formats Andy Powell

UKOLN, University of Bath

[email protected]

Page 88: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Agenda

1. basics

2. XML schema details

3. extending oai_dc for your application

4. using IMS metadata as new record format

Page 89: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Basics

OAI-PMH uses XML Schemas to define record formats

you can exchange any data you like using OAI-PMH as long as you can encode it as XML and define an XML-Schema for it!

OAI-PMH mandates the ‘oai_dc’ XML schema OAI-PMH documentation also describes use of

XML schema to exchangerfc1807: a schema for rfc1807 format metadata; marc21: a recommended schema for MARC21 metadata, provided by the Library of Congress;oai_marc: a schema for MARC format metadata

Page 90: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

A closer look at oai_dc

the simple DC schema used as mandatory record format in OAI-PMH defines a container schema

container schema is OAI-specific container schema is hosted on the OAI Web site imports a generic DCMES schema generic DCMES schema is hosted on the DCMI

Web site same model likely to be used for ‘qualified’ DC

schema – container schema hosted by OAI, generic schema hosted by DCMI

Page 91: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

An oai_dc record…

an example oai_dc record (viewed via the repository explorer)

here’s the full GetRecord response three important things to notice… namespace for the oia_dc format

xmlns:oai_dc=http://www.openarchives.org/OAI/2.0/oai_dc/

namespace for DCMES elementsxmlns:dc=http://purl.org/dc/elements/1.1/

container schema associated with the oai_dc namespace

xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"

Page 92: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

The XML schemas

The oai_dc container schemahttp://www.openarchives.org/OAI/2.0/oai_dc.xsd

imports DCMES schema fromhttp://dublincore.org/schemas/xmls/simpledc20020312.xsd

defines a container element called ‘dc’ lists the allowed elements within the ‘dc’ container

(from the DCMES namespace/schema above)

Page 93: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

When oai_dc isn’t enough

when the 15 DCMES elements are too limited – e.g. adding extra metadata elements

when you need greater precision in your metadata records – e.g. adding ‘encoding schemes’ to existing elements

when you want to exchange other metadata formats

IMS/IEEE LOM – eLearning metadata

ODRL – Open Digital Rights Language

Page 94: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Extending the oai_dc schema

simple scenario… RDN currently uses oai_dc schema to exchange

records but wants to add one additional element called

accessControl

note: this is not a real scenario…RDN really wants to use qualified DC records – but doing qualified DC too complicated for this tutorial!

hope to write-up RDN work on exchanging qualified DC in future issue of Ariadne

Page 95: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 1 – metadata format name

the new metadata format needs a name in this case, we’ve chosen

rdn_dc

following OAI’s naming of ‘oai_dc’ alternative possibilities

rdndc

rdn

etc.

Page 96: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 2 – create namespaces

two namespaces are required… namespace for the rdn_dc format

http://www.rdn.ac.uk/oai/rdn_dc/

namespace for the new metadata elements (properties) that we are going to use in this format

http://purl.org/rdn/terms/

note:use of Purl for the elements namespace follows DCMI usage but is not mandatoryhowever, both these namespace URIs should be under your control to ensure uniqueness and prevent re-use in the futureURIs do not need to resolve to anything

Page 97: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 3 – local copy of DC schema

make local copy of the DCMES schema in this case the copy is at

http://www.rdn.ac.uk/oai/rdn_dc/20021204/dc.xsd

this step isn’t strictly necessary in fact – it is probably bad practice to do this but, currently some minor problems with the

DCMI-hosted copy of the schema …working with local copy is easier

Page 98: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 4 – schema for new terms

create an XML schema for the new ‘rdnterms’ in this case the schema is available at

http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdnterms.xsd

the schema defines the new element/propertyaccessControl

and adds it to the dc:any group also creates a new container type

rdnterms:elementContainer note:

schema URI contains a date-stampthis should make future enhancements to the schema easier to implement

Page 99: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 5 – container schema

create a container schema for the new record format

in this case the schema is available athttp://www.rdn.ac.uk/oai/rdn_dc/20021204/rdn_dc.xsd

this simply imports the rdnterms schema then defines a container element called ‘rdndc’ of

typerdnterms:elementContainer

again, the schema URI contains a date-stamp

Page 100: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 6 – validate, validate, val…

create some test records using your new schemashttp://www.rdn.ac.uk/oai/rdn_dc/20021204/test.xml

http://www.rdn.ac.uk/oai/rdn_dc/20021204/oai-test.xml

use the XML schema validator athttp://www.w3.org/2001/03/webdata/xsv

Page 101: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 7 – ListMetadataFormats

add information about the new format to your repository’s response to the ‘ListMetadataFormats’ request…

<metadataFormat>

<metadataPrefix>rdn_dc</metadataPrefix>

<schema>http://www.rdn.ac.uk/oai/rdn_dc/20021113/rdn_dc.xsd</schema>

<metadataNamespace>http://www.rdn.ac.uk/oai/rdn_dc/</metadataNamespace>

</metadataFormat>

Page 102: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 8 – other verbs

modify your repository’s response to the ‘ListSets’, ‘ListIdentifiers’, ‘ListRecords’ and ‘GetRecord’ requests

accept ‘metadataPrefix’ set to new format name ‘rdn_dc’

return records formatted according to the new schema(s)

Page 103: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Step 9 – validate again

use the Repository Explorer to check that: all requests work with new ‘metadataPrefix’ oai_dc format still works! appropriate records are returned for each format responses validate correctly

Page 104: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Summary

decide on name for your new metadata format and appropriate namespaces

develop XML schemas for container and new elements if appropriate

create test records and validate modify your repository (source code and/or

configuration files) to support the new format validate and test repository

Page 105: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Other record formats

can take similar approach with other metadata record formats

IMS/IEEE LOM

ODRL

in these cases, XML schemas and namespaces have already been agreed

deployment of these formats should be easier because you don’t need to define your own schemas…

BUT… XML schema specs continually undergoing revisions currently so sometimes hard for applications like IMS to keep up!

Page 106: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners - Part IV

Adding support for IMS

modify ‘ListMetadataFormats’ response to include

extend ‘ListSets’, ‘ListIdentifiers’, ‘ListRecords’ and ‘GetRecord’ requests

accept ‘metadataPrefix’ set to ‘ims’ and return records formatted appropriately

<metadataFormat>

<metadataPrefix>ims</metadataPrefix>

<schema>http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd</schema>

<metadataNamespace>

http://www.imsglobal.org/xsd/imsmd_v1p2

</metadataNamespace>

</metadataFormat>

Page 107: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

Tutorial OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Page 108: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners

Summary

during today’s tutorial we hope that you have gained an overview of the history behind the OAI-

PMH and an overview of its key features been given a deeper technical insight into how the

protocol works learned something about some of the main

implementation issues found some useful starting points and hints that

will help you as implementors

Page 109: IST- 2001-320015 Tutorial OAI and OAI-PMH for Beginners An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting Uwe Müller

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners

Questions

now… feel free to tell us what you didn’t understand and ask general questions (of course!)

Uwe Müller

Humboldt University Berlin, Germany

[email protected]

Andy Powell

UKOLN, University of Bath

[email protected]