ResourceSync Overview

Post on 23-Aug-2014

DESCRIPTION

The slides were used to accompany an overview of the outcomes of the ResourceSync project at the 2014 Spring Membership Meeting of the Coalition for Networked Information (CNI). The launch of ResourceSync, a joint project of the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) funded by the Alfred P. Sloan Foundation, was motivated by the ubiquitous need to synchronize resources for applications in the realm of cultural heritage and research communication. After an initial problem definition and scoping phase, the project designed, specified, and tested a framework for web-based synchronization that is based on Sitemaps, a protocol widely used by web servers to advertise the resources they make available to search engines for indexing. This choice allows repositories to address both search engine optimization and resource synchronization needs using the same technology. The ResourceSync framework specifies various modular capabilities that a repository can support in order to allow third-party systems to remain synchronized with its evolving resources. For example, a Resource List provides an inventory of resources, whereas a Change List details resources that were created, deleted, or updated during a given temporal interval. Support for capabilities can be combined in order to meet local or community requirements. The framework specifies both capabilities that require a third party to recurrently poll for up-to-date information about a repository's resources and publish/subscribe capabilities that keep third parties informed about changes through notifications, thereby significantly reducing synchronization latency.
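To make the Sitemaps basis concrete, the following is a minimal sketch of how a third party might read a Change List document. The sample XML, the `example.com` URIs, and the helper name `parse_sitemap` are illustrative assumptions. The element and attribute names (`rs:md`, `capability`, `change`) reflect one reading of the ResourceSync specification and should be checked against the specification itself before use.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: the Sitemaps one is standard; the rs one is assumed from
# the ResourceSync specification and should be verified there.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical Change List; URIs and datetimes are made up for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-03T00:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-03T11:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-03T13:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return (capability, entries) for a Sitemap-based document."""
    root = ET.fromstring(xml_text)
    # Document-level metadata names the capability (e.g. a Change List).
    doc_md = root.find("rs:md", NS)
    capability = doc_md.get("capability") if doc_md is not None else None
    entries = []
    for url in root.findall("sm:url", NS):
        item_md = url.find("rs:md", NS)
        entries.append({
            "uri": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            # A Resource List entry would carry no change type.
            "change": item_md.get("change") if item_md is not None else None,
        })
    return capability, entries


capability, entries = parse_sitemap(SAMPLE)
for entry in entries:
    print(capability, entry["uri"], entry["change"])
```

Because the document is an ordinary Sitemap with extra metadata, a consumer that only understands plain Sitemaps can still read the `loc` and `lastmod` values and ignore the rest.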

TRANSCRIPT

  • ResourceSync: A Modular Framework for Web-Based Resource Synchronization. Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp. Martin Klein, Los Alamos National Laboratory, @mart1nkle1n. http://www.openarchives.org/rs #resourcesync. ResourceSync was funded by the Sloan Foundation & JISC.
  • Herbert Van de Sompel, Martin Klein - ResourceSync, CNI Spring 2014, St. Louis, MO, March 31 2014. ResourceSync: Collaboration between NISO and the Open Archives Initiative. Funded by the Sloan Foundation and JISC. Goal: Devise a specification for web-based resource synchronization.
  • This ResourceSync Presentation: Problem Domain; Scope; Framework - Overview; Framework - Technology; Demonstration; Status.
  • Background - OAI-PMH: Recurrent metadata exchange from a Data Provider to Service Providers. XML metadata only. Repository centric. Devised 1999-2002, prior to REST, prior to dominance of web search engines.
  • Revisit the Problem Domain - ResourceSync: Synchronization of resources from a Source to Destinations. Web resources, anything with an HTTP URI & representation. Resource centric. Devised 2012-2013; leverages key ingredients of web interoperability, existing specifications, existing Search Engine Optimization practice.
  • Problem Statement. Consideration: Source (server) A has resources that change over time: they get created, modified, deleted. Destinations (servers) X, Y, and Z leverage (some) resources of Source A. Problem: Destinations want to keep in step with the resource changes at Source A.
  • A Source's Resources
  • A Source's Resources Evolve over Time
  • Problem Statement. Consideration: Source (server) A has resources that change over time: they get created, modified, deleted. Destinations (servers) X, Y, and Z leverage (some) resources of Source A. Problem: Destinations want to keep in step with the resource changes at Source A. Goal: Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities.
  • This ResourceSync Presentation: Problem Domain; Scope; Framework - Overview; Framework - Technology; Demonstration; Status.
  • Scope - Collection Size. Size of a Source's resource collection: a few resources (small web sites, repositories); millions of resources (large repositories, datasets, linked data collections).
  • Scope - Change Frequency. Change frequency of a Source's resources: low (daily, weekly, monthly); high (seconds, minutes).
  • Scope - Synchronization Latency. Destination's requirements regarding synchronization latency: high latency acceptable; low latency essential.
  • Scope - Collection Coverage. Destination's requirements regarding the coverage of a Source's resources: partial coverage of the Source's resources acceptable; full coverage of the Source's resources verifiable.
  • Scope - Bitstream Accuracy. Destination's requirements regarding bitstream accuracy: unverifiable bitstream accuracy acceptable; verifiable bitstream accuracy essential.
  • One to One Synchronization
  • One to Many - Master Copy
  • Many to One - Aggregator
  • Selective Synchronization
  • Metadata Harvesting
  • This ResourceSync Presentation: Problem Domain; Scope; Framework - Overview; Framework - Technology; Demonstration; Status.
  • A Source's Resources Evolve over Time
  • Solution Perspective - Destination. Destination needs regarding synchronization: Baseline synchronization: initial catch-up operation to align with the Source's resources. Incremental synchronization: remain synchronized as the Source's resources evolve. Audit: Destination determines whether it effectively is in sync with the Source (bitstream accuracy; coverage of resources).
  • Solution Perspective - Source. Source communicates about the state of its resources: Publish inventory: snapshot of the state of resources at a moment in time. Publish changes: enumeration of resource changes that occurred during a temporal interval. Notify about changes: send notifications as changes occur. Communication payload: minimal, e.g. HTTP URI of resource; additional, e.g. content-based hash of resource.
  • Resource List. In order to meet a Destination's need for baseline synchronization, the Source may publish a Resource List. A Resource List is an inventory, a snapshot of existing resources. Per resource, it minimally provides the resource's URI. Process: (1) Destination obtains the Resource List; (2) Destination obtains listed resources by their URI. Optimization: Resource Dump, a list pointing to ZIP files that contain resource representations.
  • Publish Resource List: Inventory at Tx Resource List @Tx = { A ; B ; C }
  • Change List. In order to meet a Destination's need for incremental synchronization, the Source may publish a Change List. A Change List enumerates resource change events that occurred in a temporal interval. For each event, it minimally lists the datetime, the URI of the resource, and the nature of the change. Process: (1) Destination obtains the Change List; (2) Destination obtains created/updated resources, removes deleted resources. Optimization: Change Dump.
  • Publish Change List: Resource Changes During Interval Ty-Tz Change List [Ty,Tz] = { A updated @Tc ; B updated @Tc ; C created @Td ; D deleted @Te ; C updated @Tf }
  • Change Notification. In order to meet a Destination's need for incremental synchronization and low latency, the Source may send Change Notifications. A Change Notification conveys resource change events as they occur. For each event, it minimally lists the datetime, the URI of the resource, and the nature of the change. Process: (1) Destination receives Change Notification; (2) Destination obtains created/updated resources, removes deleted resources.
  • Send
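The Destination-side processes described in the slides above (baseline synchronization from a Resource List, incremental synchronization from a Change List, and an audit against content-based hashes) can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `source` dictionary stands in for HTTP retrieval, the function names are hypothetical, and SHA-256 is an arbitrary choice of hash rather than something the slides prescribe.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Content-based hash of a representation (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(body).hexdigest()


def baseline_sync(resource_uris, fetch):
    """Initial catch-up: obtain every resource named in the Resource List."""
    return {uri: fetch(uri) for uri in resource_uris}


def apply_changes(local, events, fetch):
    """Incremental sync: replay (datetime, uri, change) events in datetime order.

    Note that fetch returns the resource's current state, not its state at
    the time of the event.
    """
    for _when, uri, change in sorted(events, key=lambda e: e[0]):
        if change == "deleted":
            local.pop(uri, None)
        elif change in ("created", "updated"):
            local[uri] = fetch(uri)
        else:
            raise ValueError(f"unknown change type: {change}")
    return local


def audit(local, published_hashes):
    """Compare a local copy with hashes published by the Source.

    Returns (missing, extra, mismatched) URI lists, covering both coverage
    of resources and bitstream accuracy.
    """
    missing = sorted(set(published_hashes) - set(local))
    extra = sorted(set(local) - set(published_hashes))
    mismatched = sorted(
        uri for uri in set(local) & set(published_hashes)
        if content_hash(local[uri]) != published_hashes[uri]
    )
    return missing, extra, mismatched


# Hypothetical Source: a dict stands in for resources served over HTTP.
source = {"A": b"a-v1", "B": b"b-v1", "D": b"d-v1"}


def fetch(uri):
    return source[uri]


# Baseline synchronization from a Resource List {A, B, D}.
local = baseline_sync(["A", "B", "D"], fetch)

# The Source evolves, then publishes a Change List for the interval.
source["A"] = b"a-v2"
source["C"] = b"c-v1"
del source["D"]
events = [
    ("2014-03-31T10:02:00Z", "C", "created"),
    ("2014-03-31T10:01:00Z", "A", "updated"),
    ("2014-03-31T10:03:00Z", "D", "deleted"),
]
apply_changes(local, events, fetch)

# Audit against hashes the Source publishes alongside its inventory.
published = {uri: content_hash(body) for uri, body in source.items()}
report = audit(local, published)

# Simulate a damaged local copy to show what the audit reports.
local["B"] = b"corrupted"
report_after = audit(local, published)
print(report, report_after)
```

Publishing only URIs is enough for the first two functions; the audit step is what needs the additional payload, which is why the framework treats hashes as optional extra information rather than a requirement.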