Excellent XML – systems interoperability at the Wellcome Library EIUG 11th Conference, Stirling University 1 & 2 September 2005 Margaret Savage-Jones [email protected]

Page 1: Excellent XML – systems interoperability at the Wellcome Library

Excellent XML – systems interoperability at the Wellcome Library

EIUG 11th Conference, Stirling University

1 & 2 September 2005

Margaret Savage-Jones

[email protected]

Page 2: Excellent XML – systems interoperability at the Wellcome Library

Wellcome Library Systems

Millennium – Innovative Interfaces Inc.

http://catalogue.wellcome.ac.uk

Includes online requesting from closed stack since mid 2003

CALM – archive system – DS Ltd

http://archives.wellcome.ac.uk

Online access to archive & MSS holdings

Miro/MedPhoto image system – System Simulation Ltd

http://medphoto.wellcome.ac.uk

Online access to over 100,000 images, image retrieval & delivery

Page 3: Excellent XML – systems interoperability at the Wellcome Library

Underlying protocol: OAI-PMH

Open Archives Initiative Protocol for Metadata Harvesting - protocol for sharing and harvesting metadata between different OAI-compliant systems

Based on XML and HTTP

One system (CALM or MedPhoto) exposes metadata via an OAI repository. This metadata is harvested by the other system (Millennium) and then loaded
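As a rough illustration of the harvesting side, the sketch below parses a small hand-written OAI-PMH `ListRecords` response and pulls out each record's identifier and datestamp. The sample identifiers and datestamps are invented for illustration; only the OAI-PMH 2.0 namespace and element names come from the protocol itself. This is not a description of how XML Harvester works internally, which the slides do not cover.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; identifiers and dates are hypothetical.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:record-1</identifier>
        <datestamp>2005-05-03</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example:record-2</identifier>
        <datestamp>2005-05-17</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_headers(response_xml):
    """Return (identifier, datestamp) pairs for every record header."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        headers.append((identifier, datestamp))
    return headers


if __name__ == "__main__":
    for identifier, datestamp in list_headers(SAMPLE_RESPONSE):
        print(identifier, datestamp)
```

The datestamp in each header is what makes date-limited reharvesting possible: a harvester can ask only for records whose datestamp falls in a given range.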

Page 4: Excellent XML – systems interoperability at the Wellcome Library

Motivation

With a MARC21 library system, an ISAD(G) archive system and a bespoke image repository, it was a strategic objective to make these systems interoperate

Phase II of the Closed Stack project - Western Manuscripts and Archives had to be requestable online by summer 2004

XML Harvester development by Innovative with Michigan State University 2001-02. Wellcome placed an order for XML Harvester in January 2003

With CALM ver 4 it was possible to export EAD XML

Page 5: Excellent XML – systems interoperability at the Wellcome Library

Benefits

Online requesting - Western MSS & Archives collections

One circulation system to manage and one set of circ stats

Same interface for all online requests from stack

Archives & manuscripts like other collections

Image sets for library objects displayed in Web OPAC

User can jump from one system to another

No need to rekey user search in other system

Selective harvesting for onward record updating

Page 6: Excellent XML – systems interoperability at the Wellcome Library

Example: archive record (from Crick Coll.)

Page 7: Excellent XML – systems interoperability at the Wellcome Library

Harvested archive record in Web OPAC

Page 8: Excellent XML – systems interoperability at the Wellcome Library

Image of the archive item

Page 9: Excellent XML – systems interoperability at the Wellcome Library

Encoded Archival Description (EAD)

Initially XML Harvester dealt only with EAD and needed encodinganalogs for parsing. It was developed with Michigan State University (MSU), whose EAD finding aids had MARC encodinganalogs. The Harvester parser read these tags.

Encodinganalogs are attributes in XML records indicating the field, subfield, indicators etc. in another descriptive encoding system, e.g. the MARC21 equivalent of an EAD-tagged element

Page 10: Excellent XML – systems interoperability at the Wellcome Library

Archive system metadata

Hierarchical tree structure with collection-level and component (item-level) records, catalogued in General International Standard Archival Description, ISAD(G)

Exporting fields from CALM with the default subset of the EAD DTD left some fields empty, so we had to export as “DServe Natural” XML, which includes field tags. Output was Catalog.xml with catalog.DTD

Page 11: Excellent XML – systems interoperability at the Wellcome Library

Pilot – used “Haddad” catalogue XML

Used a small set of 87 Arabic XML records (a local variant of the ‘MASTER’ XML DTD) as a pilot to test XML Harvester

Used stylesheets to filter unwanted fields, add encodinganalogs and put 87 .xml files in a web server directory ready to be harvested

Page 13: Excellent XML – systems interoperability at the Wellcome Library

Web crawler

Harvester reaches the XML files through port 80.

We added a page to the Millennium screens directory listing files with redirections to the web server folder.

Harvester opened the page and scanned for ‘HREF’ strings, which directed it to the XML records (file.xml)

The XML Harvester parser read tags from encodinganalogs to create MARC21 records, writing to a file for loading

Page 14: Excellent XML – systems interoperability at the Wellcome Library

Redirection screen

<html>
<head>
<title>Harvester Test</title>
</head>
<body>
<em>Mss Files</em><br>
<strong>Sample Screen # 2</strong>
<pre>
Test to confirm if harvester can crawl files deposited on wtcalm01
</pre>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml>002</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/82.xml>82</A>
</body>
</html>
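The crawl step described on the previous slide (open the page, find the `HREF` values, follow them to the XML files) can be sketched in a few lines of Python. This is only an illustration using the standard library's `html.parser`, not the Harvester's own code; the class and function names are hypothetical. The page text is a shortened copy of the redirection screen above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every anchor tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so <A HREF=...>
        # arrives here as tag "a" with attribute "href".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def xml_links(page_html):
    """Return the links on the page that point at .xml files."""
    collector = HrefCollector()
    collector.feed(page_html)
    collector.close()
    return [h for h in collector.hrefs if h.lower().endswith(".xml")]


PAGE = """<html><body>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml>002</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/82.xml>82</A>
</body></html>"""

if __name__ == "__main__":
    for link in xml_links(PAGE):
        print(link)
```

A list of static links like this has no notion of which records changed, which is one reason the later slides move to an OAI repository.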

Page 15: Excellent XML – systems interoperability at the Wellcome Library

Example – encodinganalogs for 856

<hyperlink>
  <url ENCODINGANALOG="85607$u">
    <xsl:text>http://wisdom.welcome.ac.uk/xml/</xsl:text>
    <xsl:value-of select="substring-after(//idno,'WMS Arabic')"/>
    <xsl:text>.html</xsl:text>
  </url>
  <text ENCODINGANALOG="85607$z">View full manuscript record</text>
</hyperlink>
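To make the role of the attribute concrete, here is a minimal Python sketch that reads `ENCODINGANALOG` values from a fragment like the stylesheet output above and turns them into MARC tag, indicators and subfield code. It assumes the value is laid out as a three-digit tag, two indicator characters, then `$` and a subfield code; that layout is inferred from the `85607$u` example and is not confirmed by the slides. The sample URL and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed layout: 3-digit tag, optional 2 indicator characters, "$", subfield.
ANALOG_PATTERN = re.compile(r"^(\d{3})(.{2})?\$(\w)$")

# Hypothetical output of the stylesheet for one manuscript record.
SAMPLE = """<hyperlink>
  <url ENCODINGANALOG="85607$u">http://example.org/xml/401.html</url>
  <text ENCODINGANALOG="85607$z">View full manuscript record</text>
</hyperlink>"""


def parse_analog(value):
    """Split an encodinganalog value into (tag, indicators, subfield code)."""
    match = ANALOG_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised encodinganalog: {value!r}")
    tag, indicators, subfield = match.groups()
    return tag, indicators or "  ", subfield


def collect_subfields(fragment):
    """Return (tag, indicators, code, text) for each tagged element."""
    root = ET.fromstring(fragment)
    found = []
    for element in root.iter():
        analog = element.get("ENCODINGANALOG")
        if analog:
            tag, indicators, code = parse_analog(analog)
            found.append((tag, indicators, code, (element.text or "").strip()))
    return found


if __name__ == "__main__":
    for row in collect_subfields(SAMPLE):
        print(row)
```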

Page 16: Excellent XML – systems interoperability at the Wellcome Library

Harvested MARC21 “Haddad” record

Page 17: Excellent XML – systems interoperability at the Wellcome Library

Links: to PDF and Request button

Page 18: Excellent XML – systems interoperability at the Wellcome Library

Lessons

Arabic records would be loaded only once, but records from CALM would need regular reharvesting/overlay

Need a more sophisticated approach than crawling a web directory: XML Harvester can harvest from an OAI repository and use OAI datestamps to harvest records created or modified in a specified date range

XSLT could be used to transform records to MARC21 OAI without using encodinganalogs.

Page 19: Excellent XML – systems interoperability at the Wellcome Library

Archives OAI repository

Built on the CALM server using the freeware University of Illinois Provider service tool (runs under Windows IIS)

Other requirements:

Microsoft Windows 2000 Server

Microsoft IIS ver 4 or higher

Microsoft ASP

Microsoft XML Parser (MSXML) 4.0

Microsoft ActiveX Data Objects and an ODBC-compliant data source, i.e. MS Access 97+ database

Firewall access on port 80

Page 20: Excellent XML – systems interoperability at the Wellcome Library

Key decisions

Metadata export – chose full CALM record XML DTD (not EAD)

Matchpoint – decided to load contents of CALM RefNo field to Millennium 001, indexed in ‘o’

Also had to consider:

Hierarchical record level to harvest

Navigation between the two systems

Millennium parameters

Page 21: Excellent XML – systems interoperability at the Wellcome Library

Decision: Record level to harvest

A “Collection” could consist of more than 40 boxes. Must have a 1:1 record relationship to make requesting and retrieval work

Decision to exclude archives Collection records and use Component level records. Each of these represents 1 item (box, folder, piece) and links to a single bib record with an attached item for circulation in Millennium

Page 22: Excellent XML – systems interoperability at the Wellcome Library

Decision: Navigation

Archivists wanted the archives (CALM) interface to offer the main search route for Western Archives & MSS

User is taken from the CALM record into Millennium to place their request, then back to their CALM record to continue browsing their hit list – two links were needed

Forward: runs cgi script to search Millennium for corresponding bib record

Back: 856 with URL link (can be inserted by Harvester)

Page 23: Excellent XML – systems interoperability at the Wellcome Library

Example: Links

Forward: cgi script runs search of Millennium ‘o’ index for match on CALM RefNo value

http://catalogue.wellcome.ac.uk/search/o?SEARCH=PPCRI%2FA%2F1%2F2%2F8

Back: RefNo PP/CRI/A/1/2/8 built into OAI record URL linking to CALM web front end - RefNo value built into search string

http://archives.wellcome.ac.uk/DServe/dserve.exe?&dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=show.tcl&dsqDb=Catalog&dsqPos=0&dsqSearch=((text)='PP/CRI/A/1/2/8')
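The two links can be assembled mechanically once the RefNo is known. The Python sketch below reproduces the two example URLs from this slide; the function names are hypothetical. Note that the forward example searches on `PPCRI/A/1/2/8` while the RefNo is `PP/CRI/A/1/2/8`. The slides do not say how that search key is derived, so the sketch takes it as an input and does not guess at the rule.

```python
from urllib.parse import quote

CATALOGUE_SEARCH = "http://catalogue.wellcome.ac.uk/search/o?SEARCH="
CALM_SHOW = (
    "http://archives.wellcome.ac.uk/DServe/dserve.exe?"
    "&dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=show.tcl"
    "&dsqDb=Catalog&dsqPos=0&dsqSearch="
)


def forward_link(search_key):
    """CALM -> Millennium: search the 'o' index for the given key."""
    # safe="" makes quote() percent-encode the slashes as %2F.
    return CATALOGUE_SEARCH + quote(search_key, safe="")


def back_link(ref_no):
    """Millennium -> CALM: show the archive record with this RefNo."""
    return CALM_SHOW + "((text)='" + ref_no + "')"


if __name__ == "__main__":
    print(forward_link("PPCRI/A/1/2/8"))
    print(back_link("PP/CRI/A/1/2/8"))
```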

Page 24: Excellent XML – systems interoperability at the Wellcome Library

Calm XML export file

<?xml version="1.0" encoding="utf-8" ?>
<record>
<DScribeRecord>
  <RecordType>Component</RecordType>
  <IDENTITY />
  <RefNo>MS4385/4404</RefNo>
  <AltRefNo>MS.4404</AltRefNo>
  <PreviousNumbers />
  <Title>Notes and extracts on Chemistry, Volumetric Analysis, (etc.)</Title>
  <Date>c. 1865</Date>
  <Level>Item</Level>
  <Extent>1 volume</Extent>
  <UserText5>Bentley House</UserText5>
  <Location />
  <UserText3>Western MSS series 3 - Requestable</UserText3>
  <UserWrapped9 />
  <UserText6 />
  <UserText7 />

Page 25: Excellent XML – systems interoperability at the Wellcome Library

Mapping Calm XML to Marc21

Field tags used: 001, 008, 245, 260, 500, 506, 655, 856, and 949 to make the item. Harvester inserts a 99x tag with a load identification code, e.g. CALM20040820225128

Found that Component records do not have ‘author’, which is only held at Collection level – but not a problem

‘Mock’ bib and item records keyed to Millennium to:

- demonstrate navigation & agree content with team

- act as a benchmark when harvested records loaded

Page 26: Excellent XML – systems interoperability at the Wellcome Library

XSLT – Extensible Stylesheet Language Transformations

Used XSLT to split the single XML output file into 48,000 component .xml records, using <DScribeRecord> as the record delimiter, and then transform them to MARC21 OAI records listed to XML Harvester by our OAI repository

The OAI repository installed on the CALM staging server uses the University of Illinois Provider service tool - freeware
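The split-and-map step can be pictured with a short Python sketch. It is not the XSLT that was used; it only shows the idea of treating each `DScribeRecord` as one record, skipping Collection-level records, and mapping a few fields to MARC tags. Only RefNo to 001 is stated on the slides; the Title to 245 and Date to 260 mappings, the `<records>` wrapper element and the second sample record are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper and second record; the first record uses values
# from the export example on the earlier slide.
EXPORT = """<records>
  <DScribeRecord>
    <RecordType>Component</RecordType>
    <RefNo>MS4385/4404</RefNo>
    <Title>Notes and extracts on Chemistry, Volumetric Analysis, (etc.)</Title>
    <Date>c. 1865</Date>
  </DScribeRecord>
  <DScribeRecord>
    <RecordType>Collection</RecordType>
    <RefNo>MS4385</RefNo>
    <Title>Hypothetical collection-level record</Title>
    <Date />
  </DScribeRecord>
</records>"""

# Assumed mapping: only RefNo -> 001 is confirmed by the slides.
FIELD_MAP = {"RefNo": "001", "Title": "245", "Date": "260"}


def component_records(export_xml):
    """Yield one {marc_tag: value} dict per Component-level record."""
    root = ET.fromstring(export_xml)
    for record in root.iter("DScribeRecord"):
        if record.findtext("RecordType") != "Component":
            continue  # Collection records were excluded from the harvest
        fields = {}
        for source, marc_tag in FIELD_MAP.items():
            value = (record.findtext(source) or "").strip()
            if value:
                fields[marc_tag] = value
        yield fields


if __name__ == "__main__":
    for fields in component_records(EXPORT):
        print(fields)
```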

Page 27: Excellent XML – systems interoperability at the Wellcome Library

Millennium parameters

To cope with ‘open’ v ‘closed’ archive collections – new codes were added to archives records and mapped to new Millennium branch codes which would trigger Millcirc rules

New branch codes added to Request Rules, Determiner Table, WWWOPTIONS, Locations served

New MATTYPE to exclude Western MSS and archives from the Asian MSS scope

Page 28: Excellent XML – systems interoperability at the Wellcome Library

Config file for archives record harvest

@LOGLEVEL=CONFIG

@DBNAME=CALM

@URL=http://wtcalm02/oai/oai.asp

@CREATEOVERLAYFROMURI=true

@9XXMARCTAG=991

@USEOAI=true

@DATE=20000606000000

@[email protected]

@SHOWMETADATA=true
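The config file is a flat list of `@KEY=value` lines, so it is easy to read programmatically, for example to check settings before a harvest. The Python sketch below is a generic reader written for illustration, not Innovative's parser, and the function name is hypothetical. It makes no claim about what each key means, since the slides do not document them. The sample repeats the readable lines from the slide above.

```python
def parse_harvester_config(text):
    """Read '@KEY=value' lines into a dict, ignoring anything else."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("@") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line[1:].split("=", 1)
        settings[key] = value
    return settings


CONFIG = """@LOGLEVEL=CONFIG
@DBNAME=CALM
@URL=http://wtcalm02/oai/oai.asp
@CREATEOVERLAYFROMURI=true
@9XXMARCTAG=991
@USEOAI=true
@DATE=20000606000000
@SHOWMETADATA=true
"""

if __name__ == "__main__":
    for key, value in parse_harvester_config(CONFIG).items():
        print(f"{key} = {value}")
```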

Page 29: Excellent XML – systems interoperability at the Wellcome Library

Management interface for XML Harvester

Page 30: Excellent XML – systems interoperability at the Wellcome Library

Archive record: Request link to Web OPAC

Page 31: Excellent XML – systems interoperability at the Wellcome Library

Harvested archive record in Millennium

Page 32: Excellent XML – systems interoperability at the Wellcome Library

Patron login screen to place request

Page 33: Excellent XML – systems interoperability at the Wellcome Library

Confirmation of request

Page 34: Excellent XML – systems interoperability at the Wellcome Library

Interoperation sought with image system

To integrate MedPhoto, a bespoke photo library system, and Millennium for seamless display and ordering of images

MedPhoto holds images and records for more than 60,000 items catalogued in Millennium – Iconographic collection, archives & manuscripts, rare books etc.

Specific need for Millennium user to see images associated with library objects

Page 35: Excellent XML – systems interoperability at the Wellcome Library

Media management interface

Page 36: Excellent XML – systems interoperability at the Wellcome Library

Config file for image URL harvest

@LOGLEVEL=CONFIG

@DBNAME=MEDPHOTO

@URL=http://aquarius.wellcome.ac.uk:6969/ixbin/hixserv

@RECID_MARCTAG=001

@CREATEOVERLAYFROMURI=true

@9XXMARCTAG=991

@USEOAI=true

@REQUIRE_EADID=false

@DATE=20000606000000

@OAIFROMDATE=20050701000000

@OAIUNTILDATE=20050731000000

@[email protected]

@OAISET=bib

Page 37: Excellent XML – systems interoperability at the Wellcome Library

Selective Harvesting – images

Harvest full “bib” set and load to Millennium, populating 962s; then each month request a list of all new image URLs created since the last harvest with a Millennium .b number in their record.

<http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-05-01&until=2005-05-31>

(for records in May)

<http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-06-01&until=2005-06-30>

(for records in June and so on)
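The monthly request differs only in its `from` and `until` dates, so it lends itself to being generated. The Python sketch below builds the same `ListRecords` URL for any month; the function name is hypothetical, and the base URL, `metadataPrefix` and `set` values are taken from the examples above. It is an illustration of the request pattern, not part of XML Harvester.

```python
import calendar
from urllib.parse import urlencode

BASE_URL = "http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv"


def monthly_list_records_url(year, month, base_url=BASE_URL):
    """Build a selective-harvest URL covering one calendar month."""
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "marc21",
        "set": "bib",
        "from": f"{year:04d}-{month:02d}-01",
        "until": f"{year:04d}-{month:02d}-{last_day:02d}",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(monthly_list_records_url(2005, 5))
    print(monthly_list_records_url(2005, 6))
```

Using the calendar to find the last day avoids hard-coding 30 or 31 and handles February in leap years.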

Page 38: Excellent XML – systems interoperability at the Wellcome Library

Harvesting: Image OAI repository

OAI repository built by SSL on MedPhoto server

Metadata matchpoint: the .b bib record no. is the common element between Millennium and MedPhoto

XML Harvester selectively requests record set “bib”, whose records all have .b nos, parses the returned list of MARC21 OAI records and creates a file of MARC records for loading

Matches on .b and overlays, inserting a 962 for each image

962 |u holds the URL for the thumbnail and |e holds the “launch pad” URL

Page 39: Excellent XML – systems interoperability at the Wellcome Library

MARC21 record ready to load

File Name: DONE-MEDPHOTO_20050601192747.marc (411,392 bytes) Offset: 256 Blocks: 1 - 2

LEADER 00403nam a2200085uu 4500

DIRECTORY

001000900000 035001500009 856008000024 962018500104 991002800289

TAGS

1 000 00403nam a2200085uu 4500@

2 001 L0027751@

3 035 |a.b12857890@

4 856 4 1 |uhttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIDMIRO=L0027751|zView image@

5 962 |a000:000:URL:b0000000:000000:0:0:0:0:0:0|tImage|vn|uhttp://medphoto.wellcome.ac.uk/ixbin/hixclient.exe?MIROPAC=L0027751|ehttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIRO=L0027751@

6 991 |aMEDPHOTO{228}20050601192747@
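The LEADER and DIRECTORY lines in this dump follow the standard MARC 21 record structure: each 12-character directory entry is a 3-character tag, a 4-digit field length and a 5-digit starting position. As a worked check, the Python sketch below parses the directory shown above and confirms that the offsets, the base address (leader positions 12-16) and the record length (leader positions 0-4) agree with each other. The function names are hypothetical, and the dump's space-separated display of entries is assumed.

```python
LEADER = "00403nam a2200085uu 4500"
DIRECTORY = "001000900000 035001500009 856008000024 962018500104 991002800289"


def parse_directory(directory_text):
    """Return (tag, field_length, start_position) for each entry."""
    entries = []
    for chunk in directory_text.split():
        if len(chunk) != 12 or not chunk[3:].isdigit():
            raise ValueError(f"not a 12-character directory entry: {chunk!r}")
        entries.append((chunk[0:3], int(chunk[3:7]), int(chunk[7:12])))
    return entries


def directory_is_consistent(leader, directory_text):
    """Check offsets, base address and record length against each other."""
    entries = parse_directory(directory_text)
    record_length = int(leader[0:5])
    base_address = int(leader[12:17])
    # Base address = 24-byte leader + 12 bytes per entry + 1 terminator.
    if base_address != 24 + 12 * len(entries) + 1:
        return False
    expected_start = 0
    for _tag, length, start in entries:
        if start != expected_start:
            return False
        expected_start += length
    # Record length = base address + field data + 1 record terminator.
    return record_length == base_address + expected_start + 1


if __name__ == "__main__":
    for tag, length, start in parse_directory(DIRECTORY):
        print(tag, length, start)
    print(directory_is_consistent(LEADER, DIRECTORY))
```

For this record the arithmetic works out as 24 + 5 × 12 + 1 = 85 for the base address and 85 + 317 + 1 = 403 for the record length, matching the leader.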

Page 41: Excellent XML – systems interoperability at the Wellcome Library

Example: with |t default

Page 43: Excellent XML – systems interoperability at the Wellcome Library

“Launch pad”

We saw an opportunity for further integration and used an intermediate screen, with its URL delivered by the MedPhoto repository and loaded to 962 |e

User can hotlink from this “launch pad” into the image system to register, use a light box, email, download or order the image online before returning to the Web OPAC

Page 46: Excellent XML – systems interoperability at the Wellcome Library

What we used

XML Harvester product (III)

OAI repository software

VBScript – for file splitting operation

Instant Saxon (command line XSLT processor)

Microsoft MSXML core services (e.g. ver 5)

Media Management for 962 (or load URLs to 856)

Three OAI-PMH compliant library systems

Shared Record IDs as matchpoints

Some experience of working with stylesheets

Some experience of load tables and record loading

Page 47: Excellent XML – systems interoperability at the Wellcome Library

Work in progress

Harvesting legacy catalogues/XML for other Asian MSS, e.g. Iskander and Jain project (with Oxford University)

Complete testing and batch loading of 60,000 thumbnail and “launch pad” URLs to 962s

Establish routines to manage updates for new, deleted or amended records – utilise OAI-PMH selective harvesting

Further automation of routines where practicable

Page 48: Excellent XML – systems interoperability at the Wellcome Library

Wish List/Enhancements

Global edit for 962 tag

More documentation for XML Harvester

Access to underlying harvester parameters, e.g. for XSLT processor and XML parser

Automation of selective harvesting for maintenance

Page 49: Excellent XML – systems interoperability at the Wellcome Library

Useful links

XML http://www.w3.org/XML

EAD http://www.loc.gov/ead/

OAI software http://oai.grainger.uiuc.edu/projectinfo.htm

XSLT http://saxon.sourceforge.net/saxon6.4.3/instant.html

http://www.openarchives.org/OAI/openarchivesprotocol.html

http://www.openarchives.org/OAI/2.0/guidelines-marcxml.htm

OAI tutorial http://www.oaiforum.org/tutorial

OAI repository testing http://re.cs.uct.ac.za/

Page 50: Excellent XML – systems interoperability at the Wellcome Library

Some example records

http://catalogue.wellcome.ac.uk/record=b1465521

http://catalogue.wellcome.ac.uk/record=b1580232

http://catalogue.wellcome.ac.uk/record=b1313568 

http://catalogue.wellcome.ac.uk/record=b1613633

http://catalogue.wellcome.ac.uk/search/o?SEARCH=PPCRI%2FA%2F1%2F2%2F8

Page 51: Excellent XML – systems interoperability at the Wellcome Library

Excellent XML: systems interoperability at the Wellcome Library

Thanks for your attention

Margaret Savage-Jones

Library Systems Administrator

[email protected]