Open Archives Initiative – Protocol for Metadata Harvesting
Iztok Kavkler, University of Ljubljana
Some slides by Stefaan Ternier, KUL; Bram Vandenputte, KUL; Joris Klerkx, KUL
What is OAI?
Harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html
Six service verbs
– Identify
– ListMetadataFormats
– GetRecord
– ListRecords
– ListIdentifiers
– ListSets
Allows multiple metadata formats
– DC (Dublin Core) format is mandatory (metadata prefix oai_dc)
How OAI works
OAI “verbs”
– Identify
– ListMetadataFormats
– GetRecord
– ListIdentifiers
– ListRecords
– ListSets
Diagram: the harvester (OAI service provider) sends an HTTP request carrying an OAI verb to the repository (OAI metadata provider), which answers with an HTTP response containing valid XML.
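As a minimal sketch of the request side, assuming plain HTTP GET: an OAI request is the repository's base URL plus a verb argument, followed by that verb's own arguments as query parameters. The class OaiRequest and its build method are hypothetical helpers, not part of OAICat; the base URL in main is the demo handler used later in these slides.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

public class OaiRequest {
    // The six verbs defined by OAI-PMH 2.0.
    static final Set<String> VERBS = Set.of(
            "Identify", "ListMetadataFormats", "ListSets",
            "ListIdentifiers", "ListRecords", "GetRecord");

    // Builds baseUrl?verb=...&key=value..., percent-encoding every argument value.
    static String build(String baseUrl, String verb, Map<String, String> args) {
        if (!VERBS.contains(verb)) {
            throw new IllegalArgumentException("Not an OAI-PMH verb: " + verb);
        }
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        query.add("verb=" + verb);
        for (Map.Entry<String, String> e : args.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] argv) {
        Map<String, String> args = new LinkedHashMap<>(); // keeps argument order stable
        args.put("metadataPrefix", "oai_dc");
        args.put("set", "exercises");
        System.out.println(build("http://localhost:8080/OAILREApp/oaiHandler", "ListRecords", args));
    }
}
```

Because values go through URLEncoder, an identifier such as oai:oai.xyz-repository.com:exercises/112553 is sent as oai%3Aoai.xyz-repository.com%3Aexercises%2F112553.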
Try it
Install Apache Tomcat or any other Java servlet container
Download the WAR file from http://fire.eun.org/Iztok/OAILREApp.war
Deploy the WAR
Open the demo HTML page at
http://localhost:8080/OAILREApp/
Or type a service verb directly, e.g. http://localhost:8080/OAILREApp/oaiHandler?verb=Identify
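The same Identify request can also be sent from Java instead of a browser. This is a sketch assuming Java 11 or later (the java.net.http client) and the demo WAR deployed on Tomcat's default port 8080; IdentifyClient is a hypothetical name. A browser applies the attached stylesheet, whereas this client just prints the response text.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class IdentifyClient {
    // Builds a GET request for the Identify verb against the given base URL.
    static HttpRequest identifyRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "?verb=Identify"))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed default location of the demo handler; pass another base URL as args[0].
        String base = args.length > 0 ? args[0] : "http://localhost:8080/OAILREApp/oaiHandler";
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(identifyRequest(base), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // the XML as text, no stylesheet applied
    }
}
```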
The raw XML
By default, the resulting XML has a stylesheet attached for pretty rendering
To remove the stylesheet, comment out (prefix with #) the line
OAIHandler.styleSheet=testoai/oaicat.xsl
in the file
oaicat.properties (in the WAR file or the web-app directory)
OAI XML example
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <responseDate>2007-06-11T06:48:58Z</responseDate>
  <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
        <datestamp>2007-06-09T22:38:28Z</datestamp>
        <setSpec>exercises</setSpec>
      </header>
      <metadata>
        <lom xmlns=...> ... </lom>
      </metadata>
    </record>
    ....
    <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
  </ListRecords>
</OAI-PMH>
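A harvester has to pull two things out of a response like this: the record identifiers and the resumption token. This is a minimal DOM-based sketch that uses only the JDK's javax.xml.parsers and org.w3c.dom packages. The class OaiResponseReader is hypothetical, and a production harvester would more likely stream large responses with StAX.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Parses namespace-aware, so elements can be matched by namespace URI.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed OAI-PMH XML", e);
        }
    }

    // OAI identifiers from the record headers. LOM or DC identifiers inside
    // <metadata> live in other namespaces and are therefore not picked up.
    static List<String> identifiers(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    // The token to send with the next request, or null when the list is
    // complete (no token at all, or an empty token on the last batch).
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) {
            return null;
        }
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }
}
```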
OAICat - a Java implementation
OAICat home at http://www.oclc.org/research/software/oai/cat.htm
Takes care of
– web service details
– the OAI XML specification
The implementer has to provide three classes
– RepositoryOAICatalog
– RepositoryRecordFactory
– Repository2oai_dc (oai_lom, ...): one crosswalk per metadata format, so usually more than one
A sample implementation
(Source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)
Create a new web module
Add the servlet oaiHandler to web.xml
<servlet>
<servlet-name>LreOAIHandler</servlet-name>
<servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
<load-on-startup>5</load-on-startup>
</servlet>
<servlet-mapping>
<servlet-name>LreOAIHandler</servlet-name>
<url-pattern>/oaiHandler</url-pattern>
</servlet-mapping>
(cont)
Define the properties file location
<context-param>
<param-name>properties</param-name>
<param-value>oaicat.properties</param-value>
</context-param>
Welcome file for testing
<welcome-file-list>
<welcome-file>testoai/index.html</welcome-file>
</welcome-file-list>
Sample record
A record with the basic fields id, url, title, descr and date
SampleOAICatalog contains an array with 3 sample records
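A sample record of this shape could be modelled as a Java record (Java 16 or later) holding the five fields, with a fixed list standing in for SampleOAICatalog's array. The field types, the record values and the findById helper below are assumptions for illustration, not the contents of the sample project.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

public class SampleData {
    // The five basic fields named on the slide; the types are assumed.
    record SampleRecord(String id, String url, String title, String descr, Instant date) {}

    // Three made-up records standing in for SampleOAICatalog's array.
    static final List<SampleRecord> RECORDS = List.of(
            new SampleRecord("id001", "http://example.org/r1", "Fractions",
                    "Exercises on fractions", Instant.parse("2007-06-01T10:00:00Z")),
            new SampleRecord("id002", "http://example.org/r2", "Triangles",
                    "Interactive geometry applet", Instant.parse("2007-06-05T12:30:00Z")),
            new SampleRecord("id003", "http://example.org/r3", "Photosynthesis",
                    "Biology animation", Instant.parse("2007-06-09T22:38:28Z")));

    // Looks a record up by its local (not OAI) identifier.
    static Optional<SampleRecord> findById(String localId) {
        return RECORDS.stream().filter(r -> r.id().equals(localId)).findFirst();
    }
}
```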
SampleOAICatalog.listIdentifiers
Parameters
– from – date to harvest from (a String in ISO 8601 format); a date or a datetime, depending on the repository's granularity
– until – date to harvest up to (same format; both bounds are inclusive)
– set – a set name; list only records from this set (if null, list all records). Set names classify objects into natural groups; every record may belong to multiple sets (or none)
– metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
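A sketch of how these four arguments could select records, assuming each record carries a datestamp, its set names and the formats it supports; the Rec type and the select method are hypothetical. Dates are accepted in either granularity, and a day-only until is widened to the end of that day so that both bounds stay inclusive, as OAI-PMH requires.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SelectiveHarvest {
    record Rec(String id, Instant datestamp, Set<String> sets, Set<String> formats) {}

    // Parses YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (UTC).
    static Instant parse(String s, boolean upperBound) {
        if (s.length() == 10) { // day granularity
            LocalDate day = LocalDate.parse(s);
            return upperBound
                    ? day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().minusNanos(1)
                    : day.atStartOfDay(ZoneOffset.UTC).toInstant();
        }
        return Instant.parse(s); // seconds granularity
    }

    // A null from, until or set means "no restriction on this argument".
    static List<String> select(List<Rec> recs, String from, String until,
                               String set, String metadataPrefix) {
        Instant lo = from == null ? null : parse(from, false);
        Instant hi = until == null ? null : parse(until, true);
        return recs.stream()
                .filter(r -> lo == null || !r.datestamp().isBefore(lo))
                .filter(r -> hi == null || !r.datestamp().isAfter(hi))
                .filter(r -> set == null || r.sets().contains(set))
                .filter(r -> r.formats().contains(metadataPrefix))
                .map(Rec::id)
                .collect(Collectors.toList());
    }
}
```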
SampleOAICatalog.listIdentifiers
Must return a map with two fields
– headers – an Iterator over the OAI header strings
– identifiers – an Iterator over the OAI identifier strings
Both are created by the call (rec is a SampleRecord)
String[] header = getRecordFactory().createHeader(rec);
headers.add(header[0]);
identifiers.add(header[1]);
Create the result
Map<String, Object> listIdMap = new HashMap<String, Object>();
listIdMap.put("headers", headers.iterator());
listIdMap.put("identifiers", identifiers.iterator());
return listIdMap;
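Put together, the loop around the createHeader call might look like the self-contained sketch below. Here createHeader is a hypothetical stand-in for getRecordFactory().createHeader(rec) that returns the header XML and the OAI identifier in that order, matching how the slide's code indexes the array; the Rec type is also assumed, and no XML escaping is done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListIdentifiersSketch {
    record Rec(String oaiId, String datestamp, List<String> setSpecs) {}

    // Stand-in for getRecordFactory().createHeader(rec): {header XML, OAI identifier}.
    // Assumes the values contain no XML markup characters.
    static String[] createHeader(Rec rec) {
        StringBuilder xml = new StringBuilder("<header><identifier>")
                .append(rec.oaiId()).append("</identifier><datestamp>")
                .append(rec.datestamp()).append("</datestamp>");
        for (String spec : rec.setSpecs()) {
            xml.append("<setSpec>").append(spec).append("</setSpec>");
        }
        return new String[] { xml.append("</header>").toString(), rec.oaiId() };
    }

    // Builds the two parallel lists and returns them as iterators under fixed keys.
    static Map<String, Object> listIdentifiers(List<Rec> selected) {
        List<String> headers = new ArrayList<>();
        List<String> identifiers = new ArrayList<>();
        for (Rec rec : selected) {
            String[] header = createHeader(rec);
            headers.add(header[0]);
            identifiers.add(header[1]);
        }
        Map<String, Object> listIdMap = new HashMap<>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }
}
```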
getRecordFactory().createHeader(rec)
Creates the header by calling these methods of SampleRecordFactory
String getOAIIdentifier(Object rec)
– returns the full OAI identifier, e.g. “oai:oay.rep.com:id001”
String getDatestamp(Object rec)
– returns the date in ISO 8601 format
Iterator<String> getSetSpecs(Object rec)
– e.g.
ArrayList<String> list = new ArrayList<String>();
list.add(...);
return list.iterator();
Iterator<String> getAbouts(Object rec)
String fromOAIIdentifier(String id)
– helper method: converts a global OAI id to a local id
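A minimal sketch of these factory methods, assuming a hypothetical fixed prefix oai:oai.rep.com: and a hypothetical record type Rec with a local id, a modification time and a list of set names. Here fromOAIIdentifier is written as the inverse of getOAIIdentifier, and datestamps are produced with seconds granularity in UTC.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Iterator;
import java.util.List;

public class RecordFactorySketch {
    // Assumed repository prefix; a real one would come from configuration.
    static final String PREFIX = "oai:oai.rep.com:";

    record Rec(String localId, Instant modified, List<String> sets) {}

    static String getOAIIdentifier(Rec rec) {
        return PREFIX + rec.localId();
    }

    // ISO 8601 in UTC with seconds granularity, e.g. 2007-06-09T22:38:28Z.
    static String getDatestamp(Rec rec) {
        return rec.modified().truncatedTo(ChronoUnit.SECONDS).toString();
    }

    static Iterator<String> getSetSpecs(Rec rec) {
        return rec.sets().iterator();
    }

    // Inverse of getOAIIdentifier; null for identifiers of another repository.
    static String fromOAIIdentifier(String id) {
        return id.startsWith(PREFIX) ? id.substring(PREFIX.length()) : null;
    }
}
```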
SampleOAICatalog.listSets
Takes no parameters; returns the list of all sets in this repository
– each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set
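One way to build this list, sketched under the assumption that each record knows its set names: take the union over all records (a record may sit in several sets or none) and render one OAI-PMH <set> element per set. ListSetsSketch and the fallback of reusing the setSpec as the setName are illustrative choices, not OAICat behaviour.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ListSetsSketch {
    // Union of the sets of all records; sorted so the output order is stable.
    static Set<String> allSets(List<Set<String>> setsPerRecord) {
        Set<String> all = new TreeSet<>();
        setsPerRecord.forEach(all::addAll);
        return all;
    }

    // One <set> element per set; the setName falls back to the setSpec itself.
    static List<String> toSetXml(Set<String> specs, Map<String, String> names) {
        return specs.stream()
                .map(s -> "<set><setSpec>" + s + "</setSpec><setName>"
                        + names.getOrDefault(s, s) + "</setName></set>")
                .collect(Collectors.toList());
    }
}
```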
SampleOAICatalog.getSchemaLocations
Like GetRecord, but returns a Vector of all metadata schema locations the record supports
– to obtain them, just call
getRecordFactory().getSchemaLocations(rec);
SampleOAICatalog.getRecord
String getRecord(String id, String metadataPrefix)
– finds the record and converts it to an XML string (a <record> element)
– id is in global format; to get the local value, call
getRecordFactory().fromOAIIdentifier(id)
– throws IdDoesNotExistException if the record is not found
– to generate the XML, use constructRecord:
constructRecord(rec, metadataPrefix)
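The control flow of getRecord, sketched without OAICat so it runs on its own. The two nested exception classes are hypothetical stand-ins for IdDoesNotExistException and CannotDisseminateFormatException, the repository is a toy map from local id to an Item, and the prefix is an assumed value. A real implementation would delegate the XML to constructRecord instead of concatenating strings.

```java
import java.util.Map;

public class GetRecordSketch {
    // Hypothetical stand-ins for OAICat's exception classes.
    static class IdDoesNotExist extends Exception {
        IdDoesNotExist(String id) { super("idDoesNotExist: " + id); }
    }

    static class CannotDisseminateFormat extends Exception {
        CannotDisseminateFormat(String prefix) { super("cannotDisseminateFormat: " + prefix); }
    }

    // A toy stored item: its datestamp and ready-made metadata XML per format.
    record Item(String datestamp, Map<String, String> metadataByPrefix) {}

    static final String PREFIX = "oai:oai.rep.com:"; // assumed repository prefix

    static String getRecord(Map<String, Item> repo, String id, String metadataPrefix)
            throws IdDoesNotExist, CannotDisseminateFormat {
        // Global OAI id -> local id (the role of fromOAIIdentifier).
        String localId = id.startsWith(PREFIX) ? id.substring(PREFIX.length()) : null;
        Item item = localId == null ? null : repo.get(localId);
        if (item == null) {
            throw new IdDoesNotExist(id);
        }
        String metadata = item.metadataByPrefix().get(metadataPrefix);
        if (metadata == null) {
            throw new CannotDisseminateFormat(metadataPrefix);
        }
        // Header plus metadata wrapped in <record> (the role of constructRecord).
        return "<record><header><identifier>" + id + "</identifier><datestamp>"
                + item.datestamp() + "</datestamp></header><metadata>"
                + metadata + "</metadata></record>";
    }
}
```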
SampleOAICatalog.listRecords
Just like ListIdentifiers, but generates a list of XML <record> elements
Returns a map with one element
Map<String, Object> listRecMap = new HashMap<String, Object>();
listRecMap.put("records", records.iterator());
return listRecMap;
Crosswalks
Conversions of the native record type to XML, such as Sample2oai_lom or Sample2oai_dc
Only two methods per implementation
– boolean isAvailableFor(Object rec)
– String createMetadata(Object rec), e.g. for LOM:
SampleRecord record = (SampleRecord) rec;
return LOMFormat.writeStringWithSchema(record.toLOM());
– throws CannotDisseminateFormatException if the metadata is not available in this format
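For comparison with the LOM crosswalk, a Dublin Core crosswalk can also be written by hand. In this sketch the SampleRecord type is an assumed local stand-in, IllegalArgumentException replaces CannotDisseminateFormatException so the code runs without OAICat, and the mapping of fields to DC elements (url to dc:identifier, descr to dc:description) is an illustrative choice. The oai_dc and dc namespace URIs are the standard ones; the xsi:schemaLocation attribute is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Sample2oaiDcSketch {
    record SampleRecord(String id, String url, String title, String descr, String date) {}

    // Assumed rule: only records with a title can be disseminated as oai_dc.
    static boolean isAvailableFor(Object rec) {
        return rec instanceof SampleRecord r && r.title() != null;
    }

    static String createMetadata(Object rec) {
        if (!isAvailableFor(rec)) {
            // Stand-in for CannotDisseminateFormatException.
            throw new IllegalArgumentException("cannotDisseminateFormat: oai_dc");
        }
        SampleRecord r = (SampleRecord) rec;
        Map<String, String> dc = new LinkedHashMap<>(); // keeps element order
        dc.put("title", r.title());
        dc.put("description", r.descr());
        dc.put("identifier", r.url());
        dc.put("date", r.date());
        StringBuilder sb = new StringBuilder(
                "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">");
        dc.forEach((name, value) -> {
            if (value != null) { // missing fields are simply omitted
                sb.append("<dc:").append(name).append('>').append(escape(value))
                  .append("</dc:").append(name).append('>');
            }
        });
        return sb.append("</oai_dc:dc>").toString();
    }

    // Minimal escaping for XML text content ('&' must be replaced first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```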
SampleRecord.toLOM
Uses the LOM-j library to quickly hack together a LOM record: http://sourceforge.net/projects/lom-j/
– automatic serialization/deserialization of LOM and DC XML formats
Example
lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
lom.newTechnical().newLocation(-1).setString(url);
lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
lom.newGeneral().newTitle().newString(0).setString(title);
Resumption
A repository usually has a fixed limit on the number of records returned in one call
– if more are available, it returns a resumption token that lets the harvester request the next batch
– implemented by the functions
listIdentifiers(String resumptionToken), listRecords(String resumptionToken)
– see XYZOAICatalog for details
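A sketch of one possible token scheme, assuming the complete list is available as an ordered List<String>: the token is simply the offset of the next item, and cursor reports how many items earlier responses returned. Real catalogs typically also encode the original query arguments and an expiry in the token; ResumptionSketch and the offset format are assumptions, not taken from XYZOAICatalog.

```java
import java.util.ArrayList;
import java.util.List;

public class ResumptionSketch {
    // One response: the items, the token for the next request, and the
    // counters that appear as attributes of <resumptionToken>.
    record Page(List<String> items, String resumptionToken, int cursor, int completeListSize) {}

    // token == null starts a new list; otherwise it is the offset of the next item.
    static Page list(List<String> all, String token, int pageSize) {
        int start = token == null ? 0 : Integer.parseInt(token);
        if (start < 0 || start > all.size()) {
            throw new IllegalArgumentException("badResumptionToken: " + token);
        }
        int end = Math.min(start + pageSize, all.size());
        String next;
        if (end < all.size()) {
            next = Integer.toString(end); // more to come
        } else if (start > 0) {
            next = "";                    // last batch of an incomplete list: empty token
        } else {
            next = null;                  // everything fit in one response: no token
        }
        return new Page(new ArrayList<>(all.subList(start, end)), next, start, all.size());
    }

    // Harvester side: keep requesting until the token is missing or empty.
    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 42; i++) {
            all.add("id" + i);
        }
        String token = null;
        do {
            Page page = list(all, token, 10);
            System.out.println("cursor=" + page.cursor() + " items=" + page.items().size());
            token = page.resumptionToken();
        } while (token != null && !token.isEmpty());
    }
}
```

With 42 items and a page size of 10, main needs five requests and prints five lines, the last being cursor=40 items=2.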
References
http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.fmf.uni-lj.si/~kavkler/
http://www.oclc.org/research/software/oai/cat.htm
http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
http://sourceforge.net/projects/lom-j/
SIO/Trubar OAI URL: http://sio.edus.si/LreTomcat/