object reuse and exchange (ore) : experience in the open language archives community

Object Reuse and Exchange (ORE) : Practice and Experience in the Open Language Archives Community (OLAC) Baden Hughes Information Services The University of Melbourne

DESCRIPTION

Talk at APSR Clever Collections 2007 (http://www.apsr.edu.au/clevercollections/index.htm)

TRANSCRIPT

Page 1: Object Reuse and Exchange (ORE) : Experience in the Open Language Archives Community

Object Reuse and Exchange (ORE) :

Practice and Experience in the Open Language Archives Community (OLAC)

Baden Hughes

Information Services

The University of Melbourne


Page 2

Presentation Overview
• OAI/OAI-PMH introduction
• ORE introduction
• OAI and ORE Compared
• OLAC introduction
• Compounds in OLAC – Use Cases
• OLAC Pre-ORE Compound Implementations
• OLAC and ORE Implementation
• OLAC and Other Compound Options
• Future Work
• Conclusion

Page 3

OAI Overview and Key Concepts
• Developed over the period 2001-2004
• The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability
• Two components
  – Data providers are repositories that expose structured metadata via OAI-PMH
  – Service providers make OAI-PMH service requests to harvest that metadata (a set of six services invoked over HTTP/S)
• Now pervasive as an interchange/interoperability mesh between repository and service platforms, both commercially developed and community driven
• Fundamental focus is on singular objects or object descriptions, with basic support for ‘sets’ of objects grouped in a single repository
  – In OAI terms a URI identifies a single resource, and identifiers are data provider based

OAI and OAI-PMH
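
To make the "six services invoked over HTTP/S" concrete, here is a minimal sketch of how a service provider might assemble an OAI-PMH request URL. The six verb names come from the OAI-PMH protocol itself; the base URL, the set name, and the `build_request` helper are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # The verb and its arguments travel as ordinary query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical data provider and set name.
url = build_request(
    "http://archive.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    set="linguistics",
)
print(url)
```

A harvester would issue such a request per data provider and parse the XML response; that part is omitted here.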

Page 4

ORE Overview
• Currently in development (2006-2008) under the auspices of OAI

• Specifications that allow distributed repositories to exchange information about their constituent digital objects

• Enabling cross-repository services and aggregate collection services as a key outcome

• Motivated by scholarly developments in the use and production of aggregations of digital objects across media types, semantic types, locations, and relationships

• Aims to deliver standardized approaches to identifying, describing, and exchanging these new outputs of scholarship

Object Re-Use and Exchange (ORE)

Page 5

The ORE Data Model
• The ORE data model defines how to associate an identifier, a URI, with an aggregation of web resources
• By reference to these identifiers, aggregations of resources can be linked to, cited, and described with metadata, in the same manner as any web resource
• The ORE data model also allows description of the structure and semantics of these aggregations
• The ORE specifications define how these descriptions can be packaged in the XML-based Atom syndication format or in RDF/XML, making them available to a variety of applications
• Represents a fundamental change from the classic web architecture model: in ORE terms a URI is not only a resource, but also a compound object entry point

Object Re-Use and Exchange (ORE)
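
The data model above can be sketched as a handful of RDF triples. This is an illustration only: the term names (`ore:describes`, `ore:aggregates`, `ore:Aggregation`) are taken from the ORE vocabulary as it was later published, which may differ from the draft current at the time of this talk, and every `example.org` URI is hypothetical.

```python
# ORE vocabulary namespace (as published after this talk) and rdf:type.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, member_uris):
    """Build the triples of a minimal resource map describing one aggregation."""
    triples = [
        # The resource map is the document that describes the aggregation.
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One triple per aggregated web resource.
    triples += [(aggregation_uri, ORE + "aggregates", m) for m in member_uris]
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical URIs for a compound of two language resources.
triples = resource_map_triples(
    "http://example.org/rem/1",
    "http://example.org/aggregation/1",
    ["http://example.org/recording.wav", "http://example.org/transcript.xml"],
)
print(to_ntriples(triples))
```

The aggregation URI is what gets cited and linked to; the resource map URI is where a client fetches the description.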

Page 6

OLAC Overview
• Developed 2000 - present

• International partnership of institutions and individuals who are creating a worldwide virtual library of language resources by:

(i) developing consensus on best current practice for the digital archiving of language resources, and

(ii) developing a network of interoperating repositories and services for housing and accessing such resources.

• Currently 37 archives, 30K objects, ~30% have a URI
• Specialised OAI sub-community with extensions to the oai_dc schema to account for language-centric objects
  – Language
  – Linguistic subject
  – Linguistic data type
  – Participant role
  – Discourse type

• Existing OLAC infrastructure: static and dynamic data providers, harvest/aggregation suite, search and APIs, quality metrics and evaluation

Open Language Archives Community (OLAC)

Page 7

Natural Groupings for OLAC Items
• Three OLAC extensions lend themselves to a natural implementation of compounds for collecting together sets of related materials

• OLAC Language
  – Unique identification of languages and language groups, using ISO 639-3 as the controlled vocabulary
  – Use Case: collection of objects which share the same language focus

• OLAC Linguistic Data Type
  – Extension of DCMI Type to describe the content of a given resource from the perspective of recognised structural types of linguistic information
  – Use Case: collection of objects which share the same linguistic data type

• OLAC Linguistic Subject
  – Extension of DCMI Subject to describe the content of a given resource as being about a particular subfield of linguistic science
  – Use Case: collection of objects which share the same linguistic subject

Compounds and OLAC : Use Cases (1)
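
A minimal sketch of these use cases: given records harvested from several archives, a compound is simply the set of items sharing one value of an OLAC extension field. The record layout, archive identifiers, and the `group_by` helper are hypothetical; the language and data type values are illustrative only.

```python
from collections import defaultdict

# Hypothetical harvested records from two different data providers.
records = [
    {"id": "oai:archive-a.example.org:1", "language": "gup", "type": "lexicon"},
    {"id": "oai:archive-b.example.org:7", "language": "gup", "type": "primary_text"},
    {"id": "oai:archive-b.example.org:9", "language": "wbp", "type": "lexicon"},
]


def group_by(records, field):
    """Map each value of `field` to the identifiers of the records that carry it."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(field)
        if value is not None:  # skip records that lack this descriptive field
            groups[value].append(record["id"])
    return dict(groups)


by_language = group_by(records, "language")
by_type = group_by(records, "type")
print(by_language["gup"])
print(by_type["lexicon"])
```

Each resulting group cuts across repositories, which is exactly what a single-repository OAI ‘set’ cannot express.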

Page 8

Extended Groupings for OLAC Items
• Other thematic groupings within the OLAC context which are of potential interest

• Primary Documentation Sets and Derivatives
  – Identification of related and derivative linguistic documentation
  – Use Case: collection of objects which share the same documentary heritage

• Areal/Linguistic Groupings
  – Identification of language materials of similar geographic extent or genealogy
  – Use Case: collection of objects which share the same geographic area or language family

Compounds and OLAC : Use Cases (2)

Page 9

Existing Support (1)

Page 10

Existing Support (2)

Page 11

A Close Approximation …

• Courtesy of extensions to the DC schema, some inherent support for groupings based on element/attribute properties, e.g.

  – can find all objects with the same descriptive property, regardless of data provider

• Significant engineering investment in service providers
• Community-based search infrastructure, leveraging the ‘natural grain’ of data holdings
• Resolver service: identifier to object

But OLAC specific …

• In the generic OAI world, these types of finer grained distinctions are not consistently used across repository communities

OLAC Pre-ORE Compound Implementations

Page 12

Expressing Basic Compounds Using TRIX

• Theory: express compounds across OLAC data providers along the dimensions of language, linguistic subject and linguistic data type using TriX, a low-overhead XML serialization of RDF, and generate these from an SRU-type service (search service API)

• Practice:
  – Not all objects have URIs, requiring some URI surrogates to be created
  – SRU service requires additional functionality to handle URI surrogates in strings
  – Resolver service (identifier to record) requires additional functionality to handle URI surrogates

• Still basically successful from a technical perspective

Implementation Experiments (1)
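
A rough sketch of this experiment, not the code used in it: each member of a compound becomes one TriX triple, and an object without a URI gets a surrogate built from its OAI identifier. The resolver base URL, the compound URI, the record layout, and the choice of `dcterms:hasPart` as the linking predicate are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

TRIX_NS = "http://www.w3.org/2004/03/trix/trix-1/"
# Hypothetical resolver used to mint URI surrogates.
RESOLVER_BASE = "http://resolver.example.org/item/"
# Assumed linking predicate; the talk does not say which one was used.
HAS_PART = "http://purl.org/dc/terms/hasPart"


def uri_for(record):
    """Return the record's own URI, or a resolver-based surrogate if it has none."""
    return record.get("uri") or RESOLVER_BASE + quote(record["id"], safe="")


def compound_as_trix(compound_uri, records):
    """Serialize 'compound hasPart member' statements as a TriX document."""
    ET.register_namespace("", TRIX_NS)  # emit TriX as the default namespace
    root = ET.Element(f"{{{TRIX_NS}}}TriX")
    graph = ET.SubElement(root, f"{{{TRIX_NS}}}graph")
    for record in records:
        triple = ET.SubElement(graph, f"{{{TRIX_NS}}}triple")
        # Subject, predicate, object, each expressed as a <uri> element.
        for value in (compound_uri, HAS_PART, uri_for(record)):
            ET.SubElement(triple, f"{{{TRIX_NS}}}uri").text = value
    return ET.tostring(root, encoding="unicode")


members = [
    {"id": "oai:archive-a.example.org:1", "uri": "http://archive-a.example.org/1"},
    {"id": "oai:archive-b.example.org:2"},  # no URI, so a surrogate is minted
]
trix = compound_as_trix("http://example.org/compound/language/gup", members)
print(trix)
```

The surrogate only works if the resolver can map it back to a record, which is the extra resolver functionality noted in the practice points above.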

Page 13

Support for Compound Object Data Providers
• Theory: add a new type of data provider schema, allowing an object to have multiple components including URIs (since all fields are optional and optionally repeatable)

• Practice:

  – Requires extending the existing static repository creation toolchain, which assumes that fields are optional but not unboundedly repeatable

  – Requires implementing an XSL transform that dumbs these rich compound descriptions down for service providers which are not appropriately equipped (of the 4 main OLAC service providers, only one has support)

• Still basically successful, but the jury is still out on usage ‘in the wild’

Implementation Experiments (2)
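
The "dumbing down" step was implemented as an XSL transform; the sketch below restates the idea in Python purely for illustration. The record layout and the rule of keeping only the first value of each repeated field are assumptions, not a description of the actual transform.

```python
def flatten_compound(record):
    """Collapse repeated fields so a simple service provider can consume the record.

    Assumption: a consumer without compound support keeps the first value
    of any repeated field and ignores the rest.
    """
    flat = {}
    for field, values in record.items():
        if isinstance(values, list):
            if values:  # drop fields whose list of values is empty
                flat[field] = values[0]
        else:
            flat[field] = values
    return flat


# Hypothetical compound record with repeated component URIs.
compound = {
    "title": "Field recordings, session 3",
    "language": ["gup"],
    "identifier": [
        "http://archive-a.example.org/session3/audio.wav",
        "http://archive-a.example.org/session3/transcript.xml",
    ],
    "relation": [],
}
simple = flatten_compound(compound)
print(simple)
```

The flattening is lossy by design: the second component URI disappears, which is the price of serving consumers that cannot handle compounds.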

Page 14

Where could OLAC go with ORE ?
• Retreat from the bleeding edge to the leading edge: wait until the ORE specification is finalised (March 2008) ☺

• Implement OLAC data provider support for the XML-based Atom syndication format and RDF/XML expression
• Completely re-architect OLAC search to support and use ORE-style compounds
• Implement OLAC search results as a compound data provider ?
• Develop virtual data providers using ORE to link non-OLAC items into logical sets with OLAC items
  – starting with WorldCat holdings, identified by LCSH-ISO639 language mappings

• Extend OLAC-ORE support in other toolchains which are OLAC enabled

Future Directions

Page 15

The Cost of Cleverness ?

• Development of ORE challenges the notion of ‘collection’ as a singular repository of objects, clearly in response to the evolution of scholarly practice in both content development and consumption
• For OLAC, as a specialised OAI sub-community with extensions to core OAI that allow for natural grouping of similarly themed objects across repositories, ORE offers interesting potential
• However, there is a notable contrast between communities which use specialised nomenclature and controlled vocabularies, and which can largely already handle compounding or surrogates locally, and more generic resource communities which cannot; ORE will bring benefits to both

Conclusion

Page 16

© Copyright The University of Melbourne 2006