Open Archives Initiative – Protocol for Metadata Harvesting


Iztok Kavkler, University of Ljubljana

Some slides by
Stefaan Ternier, KUL
Bram Vandenputte, KUL
Joris Klerkx, KUL


What is OAI?

Harvesting standard, documented at
http://www.openarchives.org/OAI/openarchivesprotocol.html

Six service verbs
– Identify
– ListMetadataFormats
– GetRecord
– ListRecords
– ListIdentifiers
– ListSets

Allows multiple metadata formats
– DC (Dublin Core) format is mandatory


How OAI works

OAI “verbs”
– Identify
– ListMetadataFormats
– GetRecord
– ListIdentifiers
– ListRecords
– ListSets

[Diagram: the HARVESTER (Service Provider) sends an HTTP request carrying an OAI verb to the REPOSITORY (Metadata Provider), which replies with an HTTP response containing valid XML.]
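
The request half of this exchange can be sketched in Java. The class OaiRequest and its build method are hypothetical helpers written for this illustration, not part of any library; the base URL is the local demo address used in these slides.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the URL a harvester sends to a repository.
class OaiRequest {
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl).append("?verb=").append(verb);
        for (Map.Entry<String, String> e : args.entrySet()) {
            // Argument values are URL-encoded; the names are plain protocol keywords.
            url.append('&').append(e.getKey()).append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] unused) {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", "oai_lom");
        args.put("set", "exercises");
        System.out.println(build("http://localhost:8080/OAILREApp/oaiHandler", "ListRecords", args));
    }
}
```

The harvester would then issue an ordinary HTTP GET on the resulting URL and parse the XML that comes back.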


Try it

Install Apache Tomcat or any other Java servlet container

Download the WAR file from
http://fire.eun.org/Iztok/OAILREApp.war

Deploy the WAR

Open the demo HTML page at
http://localhost:8080/OAILREApp/

Or type a service verb, e.g.
http://localhost:8080/OAILREApp/oaiHandler?verb=Identify


The raw XML

By default, the resulting XML has a stylesheet attached for pretty rendering

To remove the stylesheet, comment out the line

OAIHandler.styleSheet=testoai/oaicat.xsl

in the file

oaicat.properties (in the WAR file or the web-app directory)
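
As a small illustration of what commenting out means here: in the standard Java properties format a line starting with # is ignored. The sketch below assumes oaicat.properties is read with java.util.Properties (the slides do not say so explicitly); the class StylesheetProperty is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical check: does a properties text still define the stylesheet?
class StylesheetProperty {
    // Returns the configured stylesheet path, or null when the line is commented out.
    static String styleSheet(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("OAIHandler.styleSheet");
    }

    public static void main(String[] args) {
        System.out.println(styleSheet("OAIHandler.styleSheet=testoai/oaicat.xsl\n"));
        System.out.println(styleSheet("#OAIHandler.styleSheet=testoai/oaicat.xsl\n"));
    }
}
```

The first call prints the stylesheet path and the second prints null, because the commented line is never loaded.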


OAI XML example

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <responseDate>2007-06-11T06:48:58Z</responseDate>
  <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
        <datestamp>2007-06-09T22:38:28Z</datestamp>
        <setSpec>exercises</setSpec>
      </header>
      <metadata>
        <lom xmlns=...> ... </lom>
      </metadata>
    </record>
    ....
    <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
  </ListRecords>
</OAI-PMH>
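
A harvester has to pull the identifiers and the resumption token out of such a response. The following is a minimal sketch using the JDK's DOM parser; the class OaiResponseReader and the shortened sample response in main are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical reader for a ListRecords / ListIdentifiers response.
class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed to tell OAI elements from metadata elements
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Collects every <identifier> that lives in the OAI namespace (i.e. in record headers).
    static List<String> identifiers(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    // Returns the token, or null when it is absent or empty (an empty token ends the list).
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) return null;
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }

    public static void main(String[] args) {
        String xml = "<OAI-PMH xmlns=\"" + OAI_NS + "\"><ListRecords>"
                + "<record><header><identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>"
                + "</header></record>"
                + "<resumptionToken cursor=\"10\">1181544538265</resumptionToken>"
                + "</ListRecords></OAI-PMH>";
        Document doc = parse(xml);
        System.out.println(identifiers(doc));
        System.out.println(resumptionToken(doc));
    }
}
```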


OAICat - a Java implementation

OAICat home at
http://www.oclc.org/research/software/oai/cat.htm

Takes care of
– web service details
– OAI XML specification

The implementer has to provide three classes
– RepositoryOAICatalog
– RepositoryRecordFactory
– Repository2oai_dc (lom, ...) – usually more than one


A sample implementation

(Source code and libs in
http://fire.eun.org/Iztok/OAILREApp.zip)

Create a new web module

Add servlet oaiHandler to web.xml
  <servlet>
    <servlet-name>LreOAIHandler</servlet-name>
    <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
    <load-on-startup>5</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>LreOAIHandler</servlet-name>
    <url-pattern>/oaiHandler</url-pattern>
  </servlet-mapping>


(cont)

Define the properties file location
  <context-param>
    <param-name>properties</param-name>
    <param-value>oaicat.properties</param-value>
  </context-param>

Welcome file for testing
  <welcome-file-list>
    <welcome-file>testoai/index.html</welcome-file>
  </welcome-file-list>


Sample record

A record with the basic fields id, url, title, descr and date

SampleOAICatalog contains an array with 3 sample records
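
For orientation, such a record might look like the sketch below. The field names come from the slide; the field types, the class SampleData and all three record values are assumptions made up for illustration, not the contents of the actual sample application.

```java
// Hypothetical shape of the sample record: names from the slide, types assumed.
class SampleRecord {
    final String id;
    final String url;
    final String title;
    final String descr;
    final String date; // assumed to be an ISO 8601 string

    SampleRecord(String id, String url, String title, String descr, String date) {
        this.id = id;
        this.url = url;
        this.title = title;
        this.descr = descr;
        this.date = date;
    }
}

class SampleData {
    // Three placeholder records standing in for the array held by SampleOAICatalog.
    static final SampleRecord[] RECORDS = {
        new SampleRecord("id001", "http://example.org/1", "First sample", "Placeholder", "2007-06-09T22:38:28Z"),
        new SampleRecord("id002", "http://example.org/2", "Second sample", "Placeholder", "2007-06-10T08:00:00Z"),
        new SampleRecord("id003", "http://example.org/3", "Third sample", "Placeholder", "2007-06-12T00:00:00Z"),
    };

    public static void main(String[] args) {
        for (SampleRecord r : RECORDS) {
            System.out.println(r.id + " " + r.title + " " + r.date);
        }
    }
}
```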


SampleOAICatalog.listIdentifiers

Parameters
– from – date to harvest from (String in ISO 8601 format)
    date or datetime, depending on granularity
– to – date to harvest to
– set – a set name; list only records from this set (if null, list all records)
    set names classify objects into natural groups
    every record may belong to multiple sets (or none)
– metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
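
How the from, to and set parameters narrow the result can be sketched as follows. This is an illustration under stated assumptions, not the code of SampleOAICatalog: the class HarvestFilter, its Rec type and the sample data are hypothetical, a plain date is taken to cover the whole UTC day, and the metadataPrefix check is left out.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class HarvestFilter {
    // Hypothetical minimal record: identifier, datestamp and set membership.
    static class Rec {
        final String id;
        final Instant datestamp;
        final Set<String> sets;
        Rec(String id, String datestamp, Set<String> sets) {
            this.id = id;
            this.datestamp = Instant.parse(datestamp);
            this.sets = sets;
        }
    }

    // Inclusive lower bound; a 10-character value is a plain date (start of that day, UTC).
    static Instant lower(String from) {
        if (from == null) return Instant.MIN;
        return from.length() == 10
                ? LocalDate.parse(from).atStartOfDay(ZoneOffset.UTC).toInstant()
                : Instant.parse(from);
    }

    // Exclusive upper bound; a plain date covers the whole day, a datetime the whole second.
    static Instant upperExclusive(String to) {
        if (to == null) return Instant.MAX;
        return to.length() == 10
                ? LocalDate.parse(to).plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant()
                : Instant.parse(to).plusSeconds(1);
    }

    static List<String> select(List<Rec> all, String from, String to, String set) {
        Instant lo = lower(from);
        Instant hi = upperExclusive(to);
        List<String> ids = new ArrayList<>();
        for (Rec r : all) {
            boolean inRange = !r.datestamp.isBefore(lo) && r.datestamp.isBefore(hi);
            boolean inSet = set == null || r.sets.contains(set); // null set means "all records"
            if (inRange && inSet) ids.add(r.id);
        }
        return ids;
    }

    static List<Rec> sample() {
        return List.of(
                new Rec("r1", "2007-06-09T22:38:28Z", Set.of("exercises")),
                new Rec("r2", "2007-06-10T08:00:00Z", Set.of()),
                new Rec("r3", "2007-06-12T00:00:00Z", Set.of("exercises")));
    }

    public static void main(String[] args) {
        System.out.println(select(sample(), "2007-06-09", "2007-06-10", null));
        System.out.println(select(sample(), null, null, "exercises"));
    }
}
```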


SampleOAICatalog.listIdentifiers

Must return a map with two fields
– headers – a String iterator of OAI headers
– identifiers – a String iterator of OAI identifiers

Both are created by the call (rec is a SampleRecord)
  String[] header = getRecordFactory().createHeader(rec);
  headers.add(header[0]);
  identifiers.add(header[1]);

Create the result
  Map<String, Object> listIdMap = new HashMap<String, Object>();
  listIdMap.put("headers", headers.iterator());
  listIdMap.put("identifiers", identifiers.iterator());
  return listIdMap;
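
Put together, the fragments above might form a loop like the following. This is a self-contained sketch: the static createHeader here is a stand-in for getRecordFactory().createHeader(rec), the records are reduced to (local id, datestamp) pairs, and the identifier prefix oai:example.org: is invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListIdentifiersSketch {
    // Stand-in for the record factory call: [0] is the header XML, [1] the OAI identifier.
    static String[] createHeader(String localId, String datestamp) {
        String oaiId = "oai:example.org:" + localId; // hypothetical repository prefix
        String xml = "<header><identifier>" + oaiId + "</identifier>"
                + "<datestamp>" + datestamp + "</datestamp></header>";
        return new String[] { xml, oaiId };
    }

    // Each row of recs is {localId, datestamp}.
    static Map<String, Object> listIdentifiers(String[][] recs) {
        List<String> headers = new ArrayList<>();
        List<String> identifiers = new ArrayList<>();
        for (String[] rec : recs) {
            String[] header = createHeader(rec[0], rec[1]);
            headers.add(header[0]);
            identifiers.add(header[1]);
        }
        Map<String, Object> listIdMap = new HashMap<>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }

    public static void main(String[] args) {
        Map<String, Object> result = listIdentifiers(new String[][] {
            { "id001", "2007-06-09T22:38:28Z" },
            { "id002", "2007-06-10T08:00:00Z" },
        });
        System.out.println(result.keySet());
    }
}
```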


getRecordFactory().createHeader(rec)

Creates the header by calling the methods in SampleRecordFactory

String getOAIIdentifier(Object rec)
– returns the full OAI identifier, e.g. “oai:oay.rep.com:id001”

String getDatestamp(Object rec)
– returns the date in ISO 8601 format

Iterator<String> getSetSpecs(Object rec)
  ArrayList<String> list = new ArrayList<String>();
  list.add(...);
  return list.iterator();

Iterator<String> getAbouts(Object rec)

String fromOAIIdentifier(String id)
– helper method – converts the OAI identifier to a local id
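
The two identifier conversions are inverses of each other. A minimal sketch, assuming the OAI identifier is simply a fixed prefix followed by the local id: the prefix value is invented, and the methods take a plain String local id rather than the Object rec used by the record factory.

```java
class IdentifierMapping {
    static final String PREFIX = "oai:example.org:"; // hypothetical repository prefix

    // Local id -> full OAI identifier.
    static String getOAIIdentifier(String localId) {
        return PREFIX + localId;
    }

    // Full OAI identifier -> local id; rejects identifiers from another repository.
    static String fromOAIIdentifier(String oaiId) {
        if (!oaiId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("foreign identifier: " + oaiId);
        }
        return oaiId.substring(PREFIX.length());
    }

    public static void main(String[] args) {
        String oaiId = getOAIIdentifier("id001");
        System.out.println(oaiId);
        System.out.println(fromOAIIdentifier(oaiId));
    }
}
```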


SampleOAICatalog.listSets

Takes no parameters, returns the list of all sets in this repository
– each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set


SampleOAICatalog.getSchemaLocations

Like GetRecord, but returns the Vector of all metadata schema locations the record supports
– to obtain them, just call
    getRecordFactory().getSchemaLocations(rec);


SampleOAICatalog.getRecord

String getRecord(String id, String metadataPrefix)
– find the record and convert it to an XML string (<record> element)
– id is in global format – to get the local value call
    getRecordFactory().fromOAIIdentifier(id)
– throw IdDoesNotExistException if the record is not found
– to generate the XML use constructRecord
    constructRecord(rec, metadataPrefix)
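
The steps above can be sketched as one method. Everything here is a stand-in so that the example compiles on its own: the nested IdDoesNotExistException replaces the exception of the same name from OAICat, constructRecord is a simplified placeholder that does not produce real metadata, and the prefix and the single stored title are invented.

```java
import java.util.HashMap;
import java.util.Map;

class GetRecordSketch {
    // Stand-in for OAICat's exception of the same name.
    static class IdDoesNotExistException extends Exception {
        IdDoesNotExistException(String id) { super(id); }
    }

    static final String PREFIX = "oai:example.org:"; // hypothetical repository prefix
    static final Map<String, String> TITLES = new HashMap<>(); // local id -> title
    static { TITLES.put("id001", "First sample"); }

    static String getRecord(String id, String metadataPrefix) throws IdDoesNotExistException {
        // Global -> local id (the role of fromOAIIdentifier).
        String localId = id.startsWith(PREFIX) ? id.substring(PREFIX.length()) : null;
        String title = localId == null ? null : TITLES.get(localId);
        if (title == null) throw new IdDoesNotExistException(id);
        return constructRecord(id, title, metadataPrefix);
    }

    // Placeholder: only marks where the crosswalk output for metadataPrefix would go.
    static String constructRecord(String id, String title, String metadataPrefix) {
        return "<record><header><identifier>" + id + "</identifier></header>"
                + "<metadata><!-- " + metadataPrefix + " metadata for: " + title + " --></metadata></record>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getRecord("oai:example.org:id001", "oai_dc"));
        try {
            getRecord("oai:example.org:missing", "oai_dc");
        } catch (IdDoesNotExistException e) {
            System.out.println("no such record: " + e.getMessage());
        }
    }
}
```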


SampleOAICatalog.listRecords

Just like listIdentifiers, only it generates a list of XML <record> elements

Return a map with one element
  Map<String, Object> listRecMap = new HashMap<String, Object>();
  listRecMap.put("records", records.iterator());
  return listRecMap;


Crosswalks

Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc

Only two methods per implementation
– boolean isAvailableFor(Object rec)
– String createMetadata(Object rec)
    SampleRecord record = (SampleRecord) rec;
    return LOMFormat.writeStringWithSchema(record.toLOM());

Throw CannotDisseminateFormatException if the metadata is not available in this format
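
For comparison with the LOM case, a Dublin Core crosswalk could be sketched as below. This is not the Sample2oai_dc class from the sample application: the nested exception and SampleRecord are stand-ins, the methods are static for brevity, only title and description are mapped, and the schemaLocation attribute a real response would carry is omitted.

```java
class Sample2DcSketch {
    // Stand-in for OAICat's exception of the same name.
    static class CannotDisseminateFormatException extends Exception {
        CannotDisseminateFormatException(String format) { super(format); }
    }

    // Hypothetical minimal record.
    static class SampleRecord {
        final String title;
        final String descr;
        SampleRecord(String title, String descr) { this.title = title; this.descr = descr; }
    }

    // Assumption for this sketch: oai_dc needs at least a title.
    static boolean isAvailableFor(Object rec) {
        return rec instanceof SampleRecord && ((SampleRecord) rec).title != null;
    }

    static String createMetadata(Object rec) throws CannotDisseminateFormatException {
        if (!isAvailableFor(rec)) throw new CannotDisseminateFormatException("oai_dc");
        SampleRecord r = (SampleRecord) rec;
        StringBuilder sb = new StringBuilder();
        sb.append("<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
          .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">");
        sb.append("<dc:title>").append(escape(r.title)).append("</dc:title>");
        if (r.descr != null) {
            sb.append("<dc:description>").append(escape(r.descr)).append("</dc:description>");
        }
        return sb.append("</oai_dc:dc>").toString();
    }

    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createMetadata(new SampleRecord("Fractions & decimals", null)));
    }
}
```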


SampleRecord.toLOM

Uses the LOM-j library to quickly put together a LOM
http://sourceforge.net/projects/lom-j/
– automatic serialization/deserialization of LOM and DC XML formats

Example
  lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
  lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
  lom.newTechnical().newLocation(-1).setString(url);
  lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
  lom.newGeneral().newTitle().newString(0).setString(title);


Resumption

A repository usually has a fixed limit on the number of records to return in one call
– if more are available, it returns a resumption token that lets the harvester request the next batch
– implemented by the functions
    listIdentifiers(String resumptionToken), listRecords(String resumptionToken)
– see XYZOAICatalog for details
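
The paging idea can be sketched independently of OAICat. In this illustration the token is simply the offset of the next record, which is an assumption chosen for clarity: real tokens are opaque to the harvester (the example response earlier uses 1181544538265). The class ResumptionSketch, its Page type and the page size of 10 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ResumptionSketch {
    static final int PAGE_SIZE = 10; // hypothetical per-call limit

    static class Page {
        final List<String> items;
        final String resumptionToken; // null when this is the last page
        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    // First call: no token yet, start at the beginning.
    static Page firstPage(List<String> all) {
        return page(all, 0);
    }

    // Follow-up call: the token says where to continue.
    static Page nextPage(List<String> all, String token) {
        int cursor = Integer.parseInt(token);
        if (cursor < 0 || cursor > all.size()) {
            // A real repository would answer with the badResumptionToken error instead.
            throw new IllegalArgumentException("bad resumption token: " + token);
        }
        return page(all, cursor);
    }

    private static Page page(List<String> all, int cursor) {
        int end = Math.min(cursor + PAGE_SIZE, all.size());
        String token = end < all.size() ? Integer.toString(end) : null;
        return new Page(new ArrayList<>(all.subList(cursor, end)), token);
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 25; i++) all.add("rec" + i);
        Page p = firstPage(all);
        while (true) {
            System.out.println(p.items.size() + " records, token=" + p.resumptionToken);
            if (p.resumptionToken == null) break;
            p = nextPage(all, p.resumptionToken);
        }
    }
}
```

A harvester keeps calling with the returned token until no token comes back, which is the loop shown in main.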


References

http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.fmf.uni-lj.si/~kavkler/
http://www.oclc.org/research/software/oai/cat.htm
http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
http://sourceforge.net/projects/lom-j/
SIO/Trubar OAI URL: http://sio.edus.si/LreTomcat/