Building a Fedora Architecture to Support Diverse Collections
Jon Dunn, Ryan Scherle
Digital Library Program, Indiana University

Indiana University Digital Library Program

Joint venture of Libraries and University Information Technology Services (UITS), formed in 1997
Bloomington-based; supporting 8 campuses
Engaged in digital collection building, infrastructure design/management, and research activities
Supporting library, archive, museum, academic department, and faculty-based digital collections projects

Digital Library Content Types at IU

Books
Manuscripts
Photographs
Art images
Music audio
Video
Sheet music
Musical score images
Music notation files
…and more

Current DLP Technical Environment: Access Systems

DLXS (University of Michigan)
  Text
  Finding Aids
  Bibliographic information
IBM Content Manager
Locally-developed systems
  Cushman Photograph Collection
  DIDO: Digital Images Delivered Online
  Variations2
  Page turners (sheet music, METS Navigator)

Current DLP Technical Environment: Storage

DLP server disk storage
Tivoli Storage Manager
IU Massive Data Storage System (MDSS)
  HPSS software
  1.6 petabytes of StorageTek and IBM automated tape
  Access via FTP, PFTP, HSI

Motivations for a repository

Centralize access and preservation functions for IU’s digital collections

Reduce DLP staff time and attention needed to create and maintain collections

Enable librarians, curators, archivists to digitize new collections

Enable digital preservation

DL Infrastructure Project

Proposal funded by University Information Technology Services to reengineer digital library infrastructure around Fedora

Builds on experience with Fedora in context of EVIA Digital Archive (ethnomusicology video)

Building services and tools around Fedora
  Searching/browsing of metadata and content
  End-user UI for display/navigation of metadata and content
  Cataloging and ingest tools
  Preservation services

IU Content Models

Defining a content model

Focus on what you can do with an object
  Behaviors are primary
  Behaviors are the way all external processes will interact with the object
  Keep datastreams “private”
Diversity
  Multiple media types
  Multiple brands
  Multiple tools

Standard disseminators

All objects subscribe to the default disseminator
Most objects subscribe to the metadata disseminator
Most objects subscribe to type-specific disseminators

Metadata dissem: getMetadata(type)
Default dissem: getDefaultView, getLabel, getFullView, getPreview, getAssetDefinition

Simple images

Each image is a single Fedora object
Images are available in a variety of sizes
Each image belongs to one or more collections

[Diagram: one Collection obj (Default dissem, Metadata dissem, Collection dissem) and two Image objs (each with Default dissem, Metadata dissem, Image dissem)]

[Diagram: one Collection obj (Default dissem, Metadata dissem, Collection dissem), two Book objs (each with Default dissem, Metadata dissem, Paged dissem), and four Page objs (each with Default dissem, Metadata dissem, Image dissem)]

Object-level disseminators

Image: getThumbnail, getScreenSize, getLarge, getMaster
Video: getSmilFile, playSmilFile, getStructMap, getActionObject, getObjectID
PagedImage: getNumChildren, getChildren
PagedText: getSummary, getChunkList, getChunk(label), getRawText, getFriendlyText, getTextPage(num)
Printable: getPrintableVersion
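The disseminator methods listed above are what external processes call instead of reading datastreams directly. As a rough sketch, a client could build a request URL for one of them using the Fedora API-A-LITE pattern `get/{PID}/{bDefPID}/{methodName}`. The host, the object PID, and the behavior definition PIDs below are hypothetical placeholders, not identifiers from the IU repository.

```python
from urllib.parse import quote, urlencode


def dissemination_url(base_url, pid, bdef_pid, method, params=None):
    """Build a URL of the form {base}/get/{pid}/{bdef_pid}/{method}[?params].

    base_url, pid and bdef_pid are supplied by the caller; nothing here
    contacts a repository.
    """
    # Keep the colon that separates a PID namespace from its local id.
    path = "/".join(quote(part, safe=":") for part in (pid, bdef_pid, method))
    url = f"{base_url.rstrip('/')}/get/{path}"
    if params:
        # Method parameters such as getChunk(label) travel in the query string.
        url += "?" + urlencode(params)
    return url


# Hypothetical identifiers, for illustration only.
print(dissemination_url("http://localhost:8080/fedora",
                        "demo:image1", "demo:bdef-image", "getThumbnail"))
```

Because callers only ever name a behavior and a method, the datastreams behind `getThumbnail` can change without breaking them, which is the point of keeping datastreams "private".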

Collection-level disseminators

Collection: getSize, listMembers(start, max)
CollectionRender: renderItemPreview(pid), renderItemFullView(pid)
CollectionPagedImage: viewPageTurner(pid, pagenum)
CollectionPagedText: viewText(pid, pagenum, style), viewChunk(pid, label, style), viewPage(pid, num, style)
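The `listMembers(start, max)` signature suggests that clients page through large collections rather than fetching every member at once. The sketch below shows one way a client might do that. It assumes `start` is a zero-based offset and `max` is the largest number of members returned per call, neither of which the slides state; `FakeCollection` is a hypothetical in-memory stand-in for a collection object.

```python
from typing import Callable, Iterator, List


def iter_members(list_members: Callable[[int, int], List[str]],
                 page_size: int = 50) -> Iterator[str]:
    """Yield every member PID by calling list_members(start, max) repeatedly."""
    start = 0
    while True:
        page = list_members(start, page_size)
        if not page:
            return  # nothing left
        yield from page
        if len(page) < page_size:
            return  # a short page means this was the last one
        start += len(page)


class FakeCollection:
    """Hypothetical stand-in for a collection object's disseminator."""

    def __init__(self, pids: List[str]) -> None:
        self._pids = pids

    def get_size(self) -> int:
        return len(self._pids)

    def list_members(self, start: int, max_items: int) -> List[str]:
        return self._pids[start:start + max_items]


collection = FakeCollection([f"demo:image{i}" for i in range(5)])
for pid in iter_members(collection.list_members, page_size=2):
    print(pid)
```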

Image Demos

Sample Image
Frank M. Hohenberger Collection
U.S. Steel Collection

But what about the metadata?

Different content types have different types of metadata
  MARC for general library holdings
  MODS for collections we catalog
  TEI for textual collections
  EAD for archival collections
  Combinations: Some items need METS for structure, TEI for text, MODS for description, etc.

The solution: METS

No, not the Fedora METS
METS within a datastream, and everything else within the METS
A standard way of dealing with DC, MODS, technical, structural, provenance, process, etc.
Sample Image
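To make "everything else within the METS" concrete, here is a minimal sketch of a METS document that wraps a MODS record in a `dmdSec` and carries a structural map, built with Python's standard library. The element and attribute names come from the METS and MODS schemas; the IDs, the title, and the choice of sections are illustrative assumptions, and the output is not validated against either schema or against IU's actual METADATA datastream profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods_in_mets(title: str) -> ET.Element:
    """Return a METS element holding a one-field MODS record and a structMap."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata: MODS embedded via mdWrap/xmlData.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # Structural metadata: a single div that points back at the description.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="physical")
    ET.SubElement(struct_map, f"{{{METS_NS}}}div", DMDID="DMD1")
    return mets


# The serialized result is what would be stored as one datastream.
print(ET.tostring(wrap_mods_in_mets("Sample image"), encoding="unicode"))
```

Technical, provenance, and process metadata would go into `amdSec` sections of the same document in a similar way, so a disseminator such as getMetadata(type) has a single place to look.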

Implementing the disseminators

Simple Image: DC, THUMBNAIL, SCREEN, LARGE, METADATA, RELS-EXT
Paged Object: DC, METADATA, RELS-EXT
Collection: DC, METADATA, INGEST_CONFIG
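Since each content model expects a fixed set of datastreams, a conformance check can be as simple as a set difference. The sketch below uses the datastream IDs from the list above; the model keys (`simple_image`, `paged_object`, `collection`) and the function name are hypothetical labels chosen for this example.

```python
# Datastream IDs are taken from the slide; the dictionary keys are
# hypothetical names for the three content models.
REQUIRED_DATASTREAMS = {
    "simple_image": {"DC", "THUMBNAIL", "SCREEN", "LARGE", "METADATA", "RELS-EXT"},
    "paged_object": {"DC", "METADATA", "RELS-EXT"},
    "collection": {"DC", "METADATA", "INGEST_CONFIG"},
}


def missing_datastreams(model, present):
    """Return the datastream IDs the model requires but the object lacks."""
    return sorted(REQUIRED_DATASTREAMS[model] - set(present))


# An image object that has not yet had its large derivative generated.
print(missing_datastreams(
    "simple_image", ["DC", "THUMBNAIL", "SCREEN", "METADATA", "RELS-EXT"]))
```

A check like this could run before ingest, so that an object missing a derivative is rejected before a disseminator call fails at display time.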

Want more info?

More detailed content model pages are available on our project wiki.

IU Fedora Tools

Ingest Tool

The Ingest Tool transforms raw metadata and media files into Fedora objects that conform to our content models.

[Diagram: the Ingest Tool takes MODS, EAD, JPG, and PDF files and produces FOXML and datastreams for Fedora]

METS Navigator

METS Navigator is a METS-based system for displaying and navigating multi-image digital objects.

It was built to be extensible and configurable.

Web pages with navigational structure are built from metadata in the repository.

Available from http://metsnav.sourceforge.net

Demos

Default METS Navigator Collection

Jane Johnson Collection

Using METS Navigator with Fedora

METS document must meet minimal format requirements
  Logical and physical structMap
  Files marked with USE and GROUPID attributes
  Files are URLs that point to Fedora

METS Navigator may be called from a disseminator, but it is better if called separately.

Full integration instructions
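The format requirements above can be checked mechanically before a METS document is handed to METS Navigator. The following is a sketch under stated assumptions, not METS Navigator's own validation: it accepts USE on either a `file` or its enclosing `fileGrp`, treats any http(s) `xlink:href` as a repository URL, and uses a hypothetical Fedora datastream URL in the sample.

```python
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def check_mets(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    problems = []

    # Both a logical and a physical structMap are expected.
    types = {sm.get("TYPE", "").lower() for sm in root.iter(M + "structMap")}
    for needed in ("logical", "physical"):
        if needed not in types:
            problems.append(f"missing {needed} structMap")

    for grp in root.iter(M + "fileGrp"):
        for f in grp.findall(M + "file"):
            fid = f.get("ID", "?")
            if not (f.get("USE") or grp.get("USE")):
                problems.append(f"file {fid}: no USE")
            if not f.get("GROUPID"):
                problems.append(f"file {fid}: no GROUPID")
            hrefs = [loc.get(XLINK_HREF, "") for loc in f.findall(M + "FLocat")]
            if not hrefs or not all(h.startswith(("http://", "https://")) for h in hrefs):
                problems.append(f"file {fid}: FLocat must be a URL")
    return problems


# Hypothetical sample; the URL is a placeholder, not a real object.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="screen">
      <file ID="F1" GROUPID="G1">
        <FLocat LOCTYPE="URL" xlink:href="http://localhost:8080/fedora/get/demo:page1/SCREEN"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="logical"><div/></structMap>
  <structMap TYPE="physical"><div/></structMap>
</mets>"""

print(check_mets(SAMPLE))
```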

Cataloging tools

No good solutions for non-MARC descriptive/structural metadata creation
  Some exist for specific domains: e.g. art image cataloging
Need content- or collection-appropriate interfaces
Catalog directly into Fedora or into database?
  Data synchronization issues
Common framework or separate tools?
Starting to investigate

Delivery tools

Right now: collection-specific web sites
Moving towards: generic applications appropriate to content models
  Examples: documentary photos, art images, books, sheet music…
  May integrate components from other places (e.g. Virginia collector tool)
Exposing metadata to external services via OAI-PMH, SRU (for Metasearch)

Other tools and services via Fedora Service Framework

Search tool
  Expanded, with thesaurus support
Preservation integrity services

Infrastructure Project Challenges

Time and resources vs. scope of work
Sorting out old collections – digital archeology
Implementing new infrastructure while continuing to do new projects
Maintaining current functionality

Infrastructure Project Challenges

Metadata entry / cataloging tool design
Integration with MDSS/HPSS - classes of storage
Art images
Searching system
Preservation system

Thank You!

Contact info:
Jon Dunn [email protected]
Ryan Scherle [email protected]

Infrastructure project wiki: http://wiki.dlib.indiana.edu/confluence/display/INF