oaister kat hagedorn university of michigan libraries september 12, 2007

OAIster

Kat Hagedorn

University of Michigan Libraries

September 12, 2007

Outline

Brief history/overview of OAI

Why OAIster was created

OAIster: digital union catalog

EPrints, open access and OAIster

Google (and others) and OAIster

What is OAI?

OAI stands for Open Archives Initiative“…develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.”

Probably should have been called SAI: Shared Archives Initiative

Includes a Protocol for Metadata Harvesting (PMH), i.e., what we use to fill OAIster

Consists of data providers and service providers

Metadata records

Data providers use protocol to share their metadata records

Service providers harvest the metadata so they can provide a service using them

Metadata needs to beXML1.1 compliantUTF-8 enabledSufficient for discovery

OAI: what it is not

OAI ≠ open access “…defining and promoting machine interfaces that facilitate the

availability of content from a variety of providers. Openness does not mean ‘free’ or ‘unlimited’ access to the information repositories that conform to the OAI-PMH.”

However, a large majority of OAIster records are available to all and sundry

Perfect opportunity-- freely sharing free stuff

Why OAIster?

Initially, wanted to build the Academic HotBot (now we would say the Academic Google)

Essentially, a union catalog of digital objects that often can’t be roboted or spidered

Currently, have more records that link to “objects” than there are records in our OPAC: 13+ million

What does OAIster contain?

Pre-prints, post-prints, published articles, grey literature, scanned images, archival videos…

Harvest everything available except obvious test repositories

Keep nearly everything must have a valid digital object link must have decent metadata must be scholarly or informational

http://memory.loc.gov/mbrs/varsmp/0526.mpgLibrary of Congress Digitized Historical Collections

http://name.umdl.umich.edu/ADM0370.0002.001University of Michigan Digital Collections

Why do (should) people use it?

It’s big-- will pass 14 million shortly

It’s varied-- besides articles, photos, and videos, it contains datasets, audio files, finding aids, manuscripts…

It keeps growing-- as long as they keep paying my salary

EPrints in OAIster

Kept pace with EPrints movement

Currently actively harvest 138 EPrints repositories (another 18 are inactive)

All told, 190K+ records

A word about repository discovery…

EPrints in OAIster

Not a drop in the bucket, when consider that EPrints is less than a decade old

Don’t forget that OAIster contains more than records pointing to full-text

And it also doesn’t point to only open access materials

Restricted materials

Effort needed to partition restricted from freely accessible

Often, full-text objects are embargoed, but not reflected in metadata

Also felt not providing restricted materials could be a disservice

Benefit of EPrints in OAIster

All valid records in one place

Actively checked and re-harvested

Inclusive search, so end-users retrieve associated and serendipitous materials

When use OAIster, when Google?

Google vs. everything elseGoogle can’t get at everything, until it

starts using OAI itself (DSpace aside)Google contains junk, OAIster rarely

does (anymore)Most importantly, we use metadata,

Google doesn’t

Questions?

Kat Hagedorn

University of Michigan Libraries

Digital Library Production Service

www.oaister.org

[email protected]

oaister kat hagedorn university of michigan libraries september 12, 2007

Documents