resourcesync - an introduction

24
ResourceSync - An Introduction Todd Carpenter Executive Director, NISO ALCTS Continuing Resources Standards Forum Sunday, June 24, 2012 With thanks to Herbert Van de Sompel and Robert Sanderson (LANL)

Upload: national-information-standards-organization-niso

Post on 08-May-2015

482 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: ResourceSync - An Introduction

ResourceSync - An Introduction

Todd CarpenterExecutive Director, NISO

ALCTS Continuing Resources Standards ForumSunday, June 24, 2012

With thanks to Herbert Van de Sompel and Robert Sanderson (LANL)

Page 2: ResourceSync - An Introduction

@TAC_NISO Twitter Highlights• Presenting this morning on the ResourceSync project at ALCTS Continuing Resources Standards

Forum #ALCTSCRS #ala12

• I’m pre-tweeing my slides during #rsync presentation. Slides here:  _________ #ala12

• NISO mission is to develop and maintain technical standards related to information, documentation, discovery and distribution of content #ala12

• Standards are all around us, even if we don't notice them, especially in books.  Things like page numbers, paper, binding, even spelling is standardized. #NISO #ala12

• Machines don’t talk like people do.  Then again some people don’t talk like other people do, particularly teenagers #ala12

• So where did the ResourceSync project start?  #NISO approached OAI about updating the PMH protocol. #ala12

• The #NISO / OAI ResourceSync project was possible through the generous support of the Alfred P. Sloan Foundation.  Thank you!  #ala12

• What is RSync trying to solve: Source Server has resources that change.  Destination servers want to leverage some or all of Source on regular ongoing basis in near-real-time & at web scale. #ala12

• Syntonization can be good enough or perfect and synchronization can be fast or fast enough.  #ala12

• RSync is studying a number of existing protocols to determine which (or combination of) protocols can best meet needs.  We have an bias against developing new spec from scratch. #ala12

• There are several models for synchronizing content: pull, push, conditional pull, mediated feed and pull, and a mix of feed/push/pull/service models. #ala12

• The goal of ResourceSync is to find the model that most efficiently distributes the content, while limiting the tax on the source system. #ala12

• This is very early days in the process of standards development. We’re still in the incubation stage. Consensus and adoption phases will come in 2013 and beyond. #ala12

• We hope to have a beta specification available by the end of 2012 of ResourceSync #ala12

Page 3: ResourceSync - An Introduction

Non-profit industry trade association accredited by ANSI

Mission of developing and maintaining technical standards related to information, documentation, discovery and distribution of published materials and media

Volunteer driven organization: 400+ spread out across the world

About

Page 4: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Standards  are  familiar,  even  if  you  don’t  no4ce

4

Page 5: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 5

Machines don’t talk like people do

Page 6: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 6

Machines talk like this

Page 7: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

How  did  we  get  here?

• OAI-­‐PMH  Protocol– Developed  in  200X

– Developed  by  Herbert  van  de  Sompel,  Carl  Lagoze  and  the  OAI  team

– Fairly  wide  adopQon  in  scholarly  community

• In  spring  2011,  NISO  approached  OAI  to  discuss  updaQng  PMH  Protocol

• Response  was  “Let’s  try  something  else  more  in  line  with  more  modern  technology”  

7

Page 8: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 8

A partnership is born

• Agreement to launch RSync as a NISO standards process

• Partnership on grant application

• OAI team comprised core technology team

• Partnership on grant application

Page 9: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Special  thanks  are  due  to...  

9

Page 10: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

What  we  trying  to  solve?

Consideration:Source (server) A has resources that change over time: they get

created, modified, deleted, moved, …Destination (servers) X, Y, and Z leverage (some) resources of

Source A.Problem:

Destinations want to keep in step with the resource changes at Source A: resource synchronization.

Task of ResourceSync effort:Design an approach for resource synchronization aligned with the

Web Architecture that has a fair chance of adoption by different communities.The approach must scale better than recurrent HTTP HEAD/

GET on resources.

10

Page 11: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Use  cases  differ

How good is the synchronization?

How fast is the synchronization?

11

Perfect Good  enough

Fast Fast  enough

Page 12: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

3  disQnct  needs  regarding  resource  synchronizaQon

Baseline  matching:  An  approach  to  allow  a  DesQnaQon  that  wants  to  start  synchronizing  with  a  Source  to  perform  an  iniQal  catch  up  –  Dump.

Incremental  resource  synchronizaQon:  An  approach  to  allow  a  DesQnaQon  to  remain  up-­‐to-­‐date  regarding  changes  at  the  Source.

Audit:  An  approach  to  allow  checking  whether  a  DesQnaQon  is  in  sync  with  a  Source    –  Inventory.

=>  All  3  are  considered  in  scope  for  the  effort

12

Page 13: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

ResourceSync  Working  GroupHerbert Van de Sompel (Chair)Los Alamos National Laboratory

Todd Carpenter (Co-Chair)National Information Standards Organization (NISO)

Nettie LagaceNational Information Standards Organization (NISO)

Manuel BernhardtDelving B.V.

Kevin FordLibrary of Congress

Bernhard HaslhoferCornell University

Richard JonesJoint Information Systems Committee (JISC)

Martin KleinLos Alamos National Laboratory

Graham KlyneJoint Information Systems Committee (JISC)

Carl LagozeCornell University

Stuart LewisJoint Information Systems Committee (JISC)

Peter MurrayLyrasis

Michael NelsonOld Dominion UniversityDavid RosenthalStanford University

Christian SadilekRed Hat

Shlomo SandersEx Libris, Inc.

Robert SandersonLos Alamos National Laboratory

Sjoerd SiebingaDelving B.V.

Ed SummersLibrary of Congress

Simeon WarnerCornell University

Jeff YoungOCLC Online Computer Library Center

13

Page 14: ResourceSync - An Introduction

8/23/11 Data  AeribuQon  and  CitaQon  Workshop14

hep://imgs.xkcd.com/comics/standards.png/

Page 15: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Change  NoQficaQon  -­‐  Protocols

Atom PubSubHubbub (PuSH)XMPP

PubSub extensionBoSH (XMPP over HTTP)

Comet / HTTP StreamingOpen an HTTP connection and keep reading from itBayeux Protocol

Long PollingKeep HTTP connection open until a message, then reopenBoSH, Bayeux option

WebSocketsNullMQ / ZeroMQXMPP over WebSockets?

15

Page 16: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Incremental  Synchroniza9on  

Change  NoQficaQon  (CN)Alert  that  something  happened  (create,update,delete)

Content  Transfer  (CT)Transfer  of  just  the  change  or  the  full  resource

16

Page 17: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Trivial  versus  OpQmal  Approaches

• Trivial  Approach  -­‐  Retrieve  &  Compare

• OpQmal  Approach  -­‐  push only the change to only the destinations monitoring the resource

17

Page 18: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

More  advanced  opQons

• Trivial  Approach  plus  CondiQonal  GET:– Retrieve  every  resource  if  it  has  changed

– EssenQally  this  is  a  Change  NoQficaQon  Pull

– Not  scalable,  strain  on  Source  Systems,  no  way  to  know  of  new  resources

18

Page 19: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

More  advanced  opQonsSimplest  Workable  Model:

Introduce  a  Feed  of  change  noQficaQons  for  all  resources

Atom,  RSS,  OAI-­‐PMH,  SiteMaps,  etc

=>SQll  not  efficient,  no  way  to  know  when  to  pull

19

Page 20: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

More  advanced  opQonsFeed  Extension  SoluQon:

ConQnue  the  Feed  paradigm,  but  introduce  aggregaQng  service  and  ping  noQficaQon  to  re-­‐pull  (simulated  push)

Only  advantageous  if  Source  already  supports  a  Feed

20

Page 21: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

The  lifecycle  of  standards  

21

You  are  here

Page 22: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Ongoing  Research

• Change  NoQficaQon  -­‐  XMPP  &  XMPP  PubSub  &  bleeps

– LANL

– Ongoing  Experiment  with  Live  DBPedia

• Change  NoQficaQon  -­‐  Comet  /  HTTP  Streaming  &  bleeps

– ODU

– Bayeux  Protocol  via  Faye  ImplementaQon

• Change  NoQficaQon  -­‐  Change  Simulator

– Cornell  U

– Generate  configurable  change  noQficaQons

– Use  as  standardized  input  to  different  systems  for  tesQng

• Baseline  Matching  &  Audit

– Cornell  U

22

Page 23: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012

Timeline

• Project  Launch  =  November  2011

• Approved  work  item  =  December  2011

• Working  Group  formed  =  February  2012

• Webinar  on  project  =  March  2012

• JCDL  meeQng,  Washington  DC  =  June  2012

• Alpha  =  ??  September  2012

• Beta/Dran  for  trail  use  =  ??  December  2012

• Comment  period  =  ??  December  2012  -­‐  March  2012

• Training  =  ??  May  -­‐  July  2013

• Approval  =  ??  December  2013

23

Page 24: ResourceSync - An Introduction

June  23,  2012 ALCTS  CRS  Standards  IG  -­‐  ALA  Annual  2012 24

Thank you!

Todd Carpenter, Executive [email protected]

National Information Standards Organization (NISO)One North Charles Street, Suite 1905Baltimore, MD 21201 USA+1 (301) 654-2512www.niso.org

NOTE  =>NISO  IS  MOVING  IN  JULY  2012  <=