The Open Archives Initiative and OAIster:
Past, Present and Future
Kat Hagedorn
University of Michigan Libraries
April 6, 2006
The oy(ai)ster and the hare
  Well, if oysters had feet…
  Other projects move faster (think Google)
  OAI still building speed
  Follows the punctuated equilibrium model…
* © Johnny Hart!
[Figure: OAIster records over time. x-axis: months, Jun-02 through Feb-06; y-axis: # records, 0 to 8,000,000]
[Figure: OAIster repositories over time. x-axis: months, Jun-02 through Feb-06; y-axis: # repositories, 0 to 700]
Why OAIster?
And why the silly name?
  Initially, wanted to build the Academic HotBot (yup, you read that right)
  Essentially, a union catalog of those “objects” that couldn’t easily be spidered
  Currently, have more records that link to “objects” than there are records in our OPAC
What does OAIster contain?
  Harvest everything available
    except obvious test repositories
  Keep nearly everything
    must have a digital object link
    must have decent metadata
    must be scholarly or informational
  For example…
Why do (should) people use it?
  It’s big-- over 7 million records last month
  It’s varied-- contains articles, books, images of artwork, datasets, videos, audios, finding aids, manuscripts
  It keeps growing-- as long as they keep paying my salary
One interface to rule them all?
  If you don’t know this…
    www.oaister.org
    www.oaister.umdl.umich.edu/o/oaister
  …how do you get to the content?
  We consider part of our mission making this metadata as widely available as possible, so…
  Approached us as part of a big content appropriation push
  Send them our metadata monthly-- takes them about a week to include it in the search index
  For example--
SRU interface
  Federated search engines are “it” now-- trying to solve problem of how to search simultaneously
  Perfect place for OAIster
  Built SRU interface (Z39.50 deemed older tech at this point)
  ExLibris building connector for MetaLib tool
  For example--
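As a rough sketch of what a request to an SRU interface looks like (this is not the presentation's own example), the snippet below builds an SRU 1.1 searchRetrieve URL from a CQL query. The endpoint `http://example.org/sru` and the helper name `build_sru_url` are assumptions made for illustration; the slides do not give the address of OAIster's SRU service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OAIster SRU address is not given in the slides.
SRU_BASE_URL = "http://example.org/sru"


def build_sru_url(cql_query, start_record=1, maximum_records=10,
                  base_url=SRU_BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",   # the SRU search operation
        "version": "1.1",
        "query": cql_query,              # CQL, e.g. dc.title = "oysters"
        "startRecord": start_record,     # 1-based position of first result
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_sru_url('dc.title = "punctuated equilibrium"'))
```

Because the whole search is expressed as one URL, a federated search tool can send the same query to many targets at once and merge the XML responses.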
OAI: what it is (finally)
  Stands for Open Archives Initiative
    “…develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.”
  Includes a Protocol for Metadata Harvesting (PMH), i.e., what we use to fill OAIster
  Consists of data providers and service providers
OAI: what it is not
  OAI ≠ open access
    “…defining and promoting machine interfaces that facilitate the availability of content from a variety of providers. Openness does not mean ‘free’ or ‘unlimited’ access to the information repositories that conform to the OAI-PMH.”
  However, a large majority of OAIster records are available to all and sundry
  Perfect opportunity-- freely sharing free stuff
OAIster and open access
  We harvest a large number of open access “self-publishing” repositories, e.g.,
    DSpace: 68
    EPrints: 113
    OJS: 21
  Plus green and gold standard peer-reviewed digital object records from repositories like PLOS and arXiv
OAI-PMH model
  Data providers:
    XML UTF-8 metadata records
    hosted by shareware software
  Service providers:
    discover the data provider
    harvest that metadata
    transform it…
    index it and make it searchable
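The harvest step on the service-provider side can be sketched roughly as follows. This is a minimal illustration, not the UM harvester: it builds an OAI-PMH `ListRecords` request URL and parses a made-up response for Simple Dublin Core fields and the resumption token that signals another page of results. The base URL, the sample record, and the function names are all assumptions; no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response in the shape a data provider returns for ListRecords.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
          <dc:identifier>http://example.org/objects/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier",
                                   default="", namespaces=NS),
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
            "identifiers": [e.text for e in rec.findall(".//dc:identifier", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


if __name__ == "__main__":
    records, token = parse_list_records(SAMPLE_RESPONSE)
    print(records, token)
    print(list_records_url("http://example.org/oai", token))
```

A real harvester would fetch each URL, repeat until no resumption token comes back, and store the records for the transform and index steps.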
Transformation tool
  Remove “no digital object” records
  Add normalized fields for limiting search
    currently resource type normalized to 5 values: text, image, audio, video, dataset
    planning on date normalization
  Maps Simple Dublin Core to our own DLXS Bibliographic Class for indexing
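The first two transformation steps can be illustrated with a small sketch. The slides describe an XSLT-based tool; Python is used here only for illustration, and the rules below are assumptions: a record counts as having a digital object if any identifier looks like a URL, and resource types are matched by whole-word hints (including DCMI Type terms such as `MovingImage`). The actual OAIster rules are not given in the slides.

```python
import re

# Hypothetical whole-word hints for each of the five normalized values.
TYPE_HINTS = {
    "text": {"text", "article", "book", "thesis"},
    "image": {"image", "stillimage", "photograph"},
    "audio": {"audio", "sound"},
    "video": {"video", "movingimage"},
    "dataset": {"dataset"},
}


def has_digital_object(record):
    """Assumed test: at least one dc:identifier value looks like a URL."""
    return any(
        value.strip().lower().startswith(("http://", "https://", "ftp://"))
        for value in record.get("identifier", [])
    )


def normalize_type(record):
    """Map free-text dc:type values to one of five values, or None."""
    raw = " ".join(record.get("type", [])).lower()
    words = set(re.findall(r"[a-z0-9]+", raw))
    for normalized, hints in TYPE_HINTS.items():
        if words & hints:
            return normalized
    return None


def transform(records):
    """Drop records without a digital object; add a normalized type field."""
    kept = []
    for record in records:
        if not has_digital_object(record):
            continue
        out = dict(record)
        out["normalized_type"] = normalize_type(record)
        kept.append(out)
    return kept


if __name__ == "__main__":
    sample = [
        {"identifier": ["http://example.org/1.pdf"], "type": ["Text"]},
        {"identifier": ["ISBN 0-00-000000-0"], "type": ["Text"]},
    ]
    print(transform(sample))
```

Keeping the original `type` values alongside the normalized field lets the search interface offer a limit by resource type without losing what the data provider supplied.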
System design
[Diagram components: OAI-enabled DC records; UM harvester; Record storage; XSLT transformation tool; XSL stylesheets (per source type); BibClass indexes; Search interface (XPAT)]
MODS / Aquifer portals
  Only harvest Simple Dublin Core for OAIster
  Experimenting with harvesting MODS
Why MODS?
  Is the metadata standard of choice among richer, enhanced formats
  Offers more focused ability to search and retrieve records
  Based on MARC, but human-readable
  Digital Library Federation (we’re members) is pushing for its use
What’d we do with MODS?
  Mapping MODS to DLXS Bibliographic Class with many modifications
    adding attributes-- handle display title (The quick fox…) vs. sort title (quick fox…, The)
    merging fields-- nameParts
    splitting out subject fields-- topical, name, geographical, hierarchical
  Not all that perfect
    merged fields don’t always make sense
    not fully leveraging the richer fields in search
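Two of the modifications above, display versus sort titles and merged nameParts, can be sketched like this. It is a guess at the general idea rather than the DLXS mapping itself: it relies on the MODS `nonSort` element to hold the leading article and joins `namePart` values family-first. The sample record and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up MODS fragment; given name deliberately precedes family name.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The </nonSort><title>quick fox</title></titleInfo>
  <name type="personal">
    <namePart type="given">Jane</namePart>
    <namePart type="family">Doe</namePart>
  </name>
</mods>"""


def titles(mods):
    """Return (display_title, sort_title) from the first titleInfo."""
    info = mods.find("mods:titleInfo", MODS_NS)
    if info is None:
        return "", ""
    non_sort = info.findtext("mods:nonSort", default="",
                             namespaces=MODS_NS).strip()
    title = info.findtext("mods:title", default="",
                          namespaces=MODS_NS).strip()
    display = f"{non_sort} {title}".strip()
    sort = f"{title}, {non_sort}" if non_sort else title
    return display, sort


def merged_names(mods):
    """Merge each name's nameParts into one string, family name first."""
    rank = {"family": 0, "given": 1}   # anything else keeps document order
    names = []
    for name in mods.findall("mods:name", MODS_NS):
        parts = sorted(name.findall("mods:namePart", MODS_NS),
                       key=lambda p: rank.get(p.get("type"), 2))
        texts = [(p.text or "").strip() for p in parts]
        names.append(", ".join(t for t in texts if t))
    return names


if __name__ == "__main__":
    record = ET.fromstring(SAMPLE_MODS)
    print(titles(record))
    print(merged_names(record))
```

Merging is lossy: once the parts are joined into one string, the index can no longer tell a family name from a date or a corporate name, which is one way merged fields can stop making sense.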
What else?
  Added bookbag functions
  Added thumbnails
  Created better search interface
  Next…
    tackle date normalization
    downloading of MODS directly from interface
    port useful features and widgets to OAIster
Onwards…
  Receive grant to work on metadata remediation…
  …meaning ways to cluster and classify metadata so it is more easily searchable and browseable
  And continue to work on best practices for data providers
Questions?
  Kat Hagedorn
  University of Michigan Libraries
  Digital Library Production Service
  www.oaister.org
  [email protected]@umich.edu