ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, and Synchronization of Resources on the Web

An overview of capabilities and real-world use cases for discovery, harvesting, and synchronization of resources on the web http://www.openarchives.org/rs #resourcesync ResourceSync ANSI/NISO Z39.99-2017 Martin Klein Gretchen Gueguen Mark Matienzo Petr Knoth

Upload: martin-klein

Post on 21-Apr-2017

TRANSCRIPT

Page 1: ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, and Synchronization of Resources on the Web

An overview of capabilities and real-world use cases for discovery, harvesting, and synchronization of resources on the web

http://www.openarchives.org/rs #resourcesync

ResourceSync ANSI/NISO Z39.99-2017

Martin Klein

Gretchen Gueguen

Mark Matienzo

Petr Knoth

Page 2

ResourceSync was funded by the Sloan Foundation & JISC

Martin Klein Los Alamos National Laboratory

@mart1nkle1n

Page 3

ResourceSync - @mart1nkle1n DPLAfest, Chicago, April 20 2017

Background - OAI-PMH

•  Recurrent metadata exchange from a Data Provider to Service Providers

•  XML metadata only

•  Repository centric

•  Devised 1999-2002, prior to REST, prior to dominance of web search engines

Page 4

Revisit the Problem Domain - ResourceSync

•  Synchronization of resources from a Source to Destinations

•  Web resources, anything with an HTTP URI & representation

•  Resource centric

•  Devised 2012-2013, leverages key ingredients of web interoperability, existing specifications

•  Updated in 2017 to v1.1

Page 5

One to One Synchronization

Page 6

One to Many – Master Copy

Page 7

Many to One - Aggregator

Page 8

Selective Synchronization

Page 9

Metadata Harvesting

Page 10

ResourceSync Capabilities

•  Resource List
    •  Inventory, baseline synchronization
•  Change List
    •  Resource change events that occurred in a temporal interval, incremental synchronization
•  Resource Dump
•  Change Dump
•  Notifications (separate specification)
•  Archives (beta draft)

Page 11

Sitemap

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2017-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2017-01-02T14:00:00Z</lastmod>
    <changefreq>daily</changefreq>
  </url>
  …
</urlset>

Page 12

Resource List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-01-03T09:00:00Z" />
  <url>
    <loc>http://example.com/res1</loc>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           type="application/pdf" />
    <rs:ln rel="describedby"
           href="http://example.com/res1_dublin_core_md.xml"
           type="application/xml" />
  </url>
  <url> … </url>
</urlset>
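A Destination doing a baseline synchronization would typically start by reading such a Resource List. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `parse_resource_list` and the returned dictionary layout are illustrative choices, not part of the specification or of any existing ResourceSync library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the Resource List on the slide.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

RESOURCE_LIST = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-01-03T09:00:00Z" />
  <url>
    <loc>http://example.com/res1</loc>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" type="application/pdf" />
    <rs:ln rel="describedby"
           href="http://example.com/res1_dublin_core_md.xml"
           type="application/xml" />
  </url>
</urlset>
"""


def parse_resource_list(xml_text):
    """Return the list timestamp and one dict per listed resource (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())
    list_md = root.find("rs:md", NS)
    resources = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        resources.append({
            "uri": url.findtext("sm:loc", namespaces=NS),
            "hash": md.get("hash") if md is not None else None,
            "type": md.get("type") if md is not None else None,
            # (rel, href) pairs, e.g. a link to a metadata record describing the resource
            "links": [(ln.get("rel"), ln.get("href")) for ln in url.findall("rs:ln", NS)],
        })
    return {"at": list_md.get("at") if list_md is not None else None,
            "resources": resources}


if __name__ == "__main__":
    for res in parse_resource_list(RESOURCE_LIST)["resources"]:
        print(res["uri"], res["hash"], res["links"])
```

The `hash` value could then be compared against a digest of the downloaded bytes to verify the copy, and the `describedby` link followed to harvest the metadata record.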

Page 13

Change List

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2017-01-02T09:00:00Z"
         until="2017-01-03T09:00:00Z" />
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2017-01-02T13:00:00Z</lastmod>
    <rs:md change="created" datetime="2017-01-02T13:00Z" />
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <rs:md change="updated" datetime="2017-01-02T15:00Z" />
  </url>
</urlset>
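For incremental synchronization, a Destination can turn a Change List into a work plan. This sketch assumes the entries are listed in chronological order, so the last change seen for a URI wins, and that the change types are `created`, `updated`, and `deleted`; the function name `plan_sync` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CHANGE_LIST = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2017-01-02T09:00:00Z" until="2017-01-03T09:00:00Z" />
  <url>
    <loc>http://example.com/res2</loc>
    <rs:md change="created" datetime="2017-01-02T13:00Z" />
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <rs:md change="updated" datetime="2017-01-02T15:00Z" />
  </url>
</urlset>
"""


def plan_sync(xml_text):
    """Return (to_fetch, to_delete): URIs to download again and URIs to remove locally."""
    root = ET.fromstring(xml_text.strip())
    latest = {}  # URI -> most recent change type; dicts keep first-seen order
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        if loc is None or md is None:
            continue  # skip entries without a URI or change metadata
        latest[loc] = md.get("change")
    to_fetch = [u for u, c in latest.items() if c in ("created", "updated")]
    to_delete = [u for u, c in latest.items() if c == "deleted"]
    return to_fetch, to_delete


if __name__ == "__main__":
    fetch, delete = plan_sync(CHANGE_LIST)
    print("fetch:", fetch)
    print("delete:", delete)
```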

Page 14

ResourceSync Change Notifications

•  Notifications about change events to resources
•  Source notifies subscribed Destinations (cf. recurrent pull)
•  Push-based approach via WebSub
•  Similar, sitemap-based payload
•  Decrease synchronization latency between Source and Destination
•  Change Notification Specification v1.0
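In WebSub, a subscriber registers with a hub by POSTing a form-encoded body that carries `hub.mode`, `hub.topic`, and `hub.callback`. The sketch below only builds that body and sends nothing; the topic and callback URLs are placeholders, and which URI a given Source advertises as its topic is an assumption to check against the Source's own discovery links.

```python
from urllib.parse import urlencode


def build_subscription_body(topic_url, callback_url, lease_seconds=None):
    """Build the form-encoded body of a WebSub subscription request."""
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # what the Destination wants to hear about
        "hub.callback": callback_url,  # where the hub should deliver notifications
    }
    if lease_seconds is not None:
        params["hub.lease_seconds"] = str(lease_seconds)
    return urlencode(params)


if __name__ == "__main__":
    body = build_subscription_body(
        "http://example.com/changelist.xml",      # placeholder topic
        "http://destination.example.org/notify",  # placeholder callback
        lease_seconds=86400,
    )
    print(body)
```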

Page 15

EHRI Use Case

•  Aggregation of information about Holocaust collections
    •  held by 1,800+ organizations worldwide
    •  into a central service
    •  EAD as exchange format
•  Diversity of data sources and locations
    •  databases, spreadsheets (“home collections”)

https://ehri-project.eu/ http://portal.ehri-project.eu

https://twitter.com/EHRIproject

Page 16

EHRI Use Case

•  Special ResourceSync implementation
    •  Bridges gap between local systems and ResourceSync capability documents on a web server
    •  Filters local resources by subject, time period, etc.
    •  Set up by EHRI technical staff, run by contributing party
•  Baseline synchronization: Resource Lists
•  Incremental synchronization: Change Lists
•  Together with EAD files moved from local system to web server
    •  Dropbox, FTP, USB stick
•  Service: partners expose EADs, server collects and offers value-added services, e.g., graph database

https://ehri-project.eu/ http://portal.ehri-project.eu

https://twitter.com/EHRIproject

Page 17

CLARIAH Use Case

•  Various institutions host evolving collections
•  Make collection items uniformly available via RDF graph
•  Central registry holds description of all collections
•  Researchers use Virtual Research Environment to
    •  Discover collections (via registry)
    •  Collect graphs from respective institution
    •  Keep graphs up to date

https://www.clariah.nl/ https://twitter.com/CLARIAH_NL

Page 18

CLARIAH Use Case

•  Baseline synchronization
    •  Download graph from DB
    •  Serialized as one or more files, one RDF triple per line (+ s p o graph_name)
    •  “+” stands for “add”
    •  URIs of files listed in Resource List
•  Incremental synchronization
    •  Changes logged in one or more files, one change per line (+/- s p o graph_name)
    •  “+” stands for “add”, “-” for “delete”
    •  URIs of files listed in Change List
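The "+/-" line format lends itself to a very small patching routine on the Destination side. The sketch below is an illustration only: it assumes each line is a sign, a single space, and then the statement, and it keeps the statement as an opaque string rather than parsing RDF terms. The function name `apply_changes` and the sample statements are hypothetical.

```python
def apply_changes(graph, lines):
    """Apply '+ s p o graph_name' / '- s p o graph_name' lines to a set of statements."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        sign, _, statement = line.partition(" ")
        if sign == "+":
            graph.add(statement)
        elif sign == "-":
            graph.discard(statement)  # deleting an absent statement is a no-op
        else:
            raise ValueError("unexpected change marker: %r" % sign)
    return graph


if __name__ == "__main__":
    # Baseline: every line is an addition.
    graph = apply_changes(set(), [
        "+ <http://example.com/s1> <http://example.com/p> <http://example.com/o1> <http://example.com/g>",
        "+ <http://example.com/s2> <http://example.com/p> <http://example.com/o2> <http://example.com/g>",
    ])
    # Incremental: one deletion and one addition.
    apply_changes(graph, [
        "- <http://example.com/s1> <http://example.com/p> <http://example.com/o1> <http://example.com/g>",
        "+ <http://example.com/s3> <http://example.com/p> <http://example.com/o3> <http://example.com/g>",
    ])
    print(sorted(graph))
```

Because the baseline files use only "+" lines, the same routine serves both the baseline and the incremental case.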

https://www.clariah.nl/ https://twitter.com/CLARIAH_NL

Page 19

ResourceSync Tools

•  Source implementation
    •  Python
    •  DANS & LANL & CORE
    •  Connectors to file system, Solr index
    •  OAI-PMH converter (planned)
    •  https://github.com/resourcesync/py-resourcesync
•  Client implementation
    •  Python
    •  https://github.com/resync/resync
•  Notification implementation
    •  PubSubHubbub
    •  https://github.com/resync/resourcesync_push

Page 20

Hyku & DPLA ResourceSync Implementations

Gretchen Gueguen, Data Services Coordinator, Digital Public Library of America

Page 21

Project Background

● IMLS National Leadership Grant (30 months)
● Foster a national digital platform through community-based repository infrastructure
● Leverage & contribute to Hydra, both in code and community

Page 22

Primary Project Goals

1. Develop turnkey (“easy to install, easy to maintain”) Hydra-based application that leverages and improves on core code components

2. Develop metadata aggregation & enrichment tools

3. Work toward a hosted service in the cloud

Page 23

Metadata Aggregation @DPLA

Page 24

Metadata Aggregation @DPLA

Methods for Data Aggregation:

● OAI-PMH (21 providers)
● Custom APIs/other (8 providers)
● Direct file transfer (3 providers)

Biggest Drawbacks:

● Re-synchronizing entire data sets
● Relying on HTTP requests

Page 25

ResourceSync and Hyku

● ResourceSync publishing support built into MVP
● Test application with 50,000 records to start
    ○ Limit for a single list. To add more, we would need to make a list of lists.
● Resource lists and change lists are supported
● Resource or change dumps not currently supported
● Content negotiation for JSON-LD, N-Triples, and Turtle
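The "list of lists" mentioned above corresponds to a Resource List Index: a `<sitemapindex>` document that points at several Resource Lists, each holding at most 50,000 entries. The sketch below shows one way to split a set of URIs and build such an index with the standard library; the function names and the `resourcelist<N>.xml` file naming are hypothetical, and this is not how Hyku implements it.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
MAX_ENTRIES = 50000  # per-list limit mentioned on the slide


def chunk(uris, size=MAX_ENTRIES):
    """Split a list of URIs into consecutive chunks of at most `size` items."""
    return [uris[i:i + size] for i in range(0, len(uris), size)]


def build_index(list_uris, at):
    """Serialize an index document that points at the individual Resource Lists."""
    ET.register_namespace("", SM)    # emit sitemap elements without a prefix
    ET.register_namespace("rs", RS)
    root = ET.Element("{%s}sitemapindex" % SM)
    ET.SubElement(root, "{%s}md" % RS, {"capability": "resourcelist", "at": at})
    for uri in list_uris:
        sitemap = ET.SubElement(root, "{%s}sitemap" % SM)
        ET.SubElement(sitemap, "{%s}loc" % SM).text = uri
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    uris = ["http://example.com/res%d" % i for i in range(120000)]
    chunks = chunk(uris)
    # Hypothetical naming scheme for the individual Resource Lists.
    list_uris = ["http://example.com/resourcelist%d.xml" % n
                 for n in range(1, len(chunks) + 1)]
    print(build_index(list_uris, "2017-01-03T09:00:00Z"))
```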

Page 26

ResourceSync and DPLA

Harvester developed for Hyku endpoint

● Development for this specific endpoint means that it’s not a full test of all ResourceSync capabilities
● We suspect that we will prefer the Dump to the List
    ○ Using the List means making HTTP calls for each item in order to do the content negotiation
    ○ Dump allows us to just download specifically what we need
    ○ We will still be downloading records that weren’t updated but given the typical size of the diff for each provider this single download may still be preferable to 100,000 HTTP requests
● Future implementations may require us to build on this initial harvester if the specifics are different
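To make the per-item cost concrete: harvesting from a List means preparing one content-negotiated request per listed URI. The sketch below only constructs the requests with `urllib.request.Request` and sends nothing. The media types are the registered ones for JSON-LD, N-Triples, and Turtle; whether a given endpoint honours exactly these `Accept` values is an assumption, and `build_requests` is a hypothetical helper.

```python
from urllib.request import Request

# Registered media types for the three serializations named on the slides.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
}


def build_requests(uris, serialization="json-ld"):
    """Prepare one GET request per URI, asking for the chosen serialization."""
    return [Request(uri, headers={"Accept": ACCEPT[serialization]}) for uri in uris]


if __name__ == "__main__":
    uris = ["http://example.com/res%d" % i for i in range(1, 4)]
    for req in build_requests(uris, "turtle"):
        print(req.full_url, req.get_header("Accept"))
    # A List with 100,000 entries would mean 100,000 such requests,
    # whereas a Dump is retrieved as packaged content in far fewer downloads.
```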

Page 27

Next Steps

Hyku:

● Possibly support Dump
● Increase test set over 50K

DPLA:

● Harvest from 3 DPLA providers implementing ResourceSync by end of year

Page 28

IIIF & ResourceSync: Supporting discovery

Mark A. Matienzo, Stanford University Libraries
@anarchivist / https://orcid.org/0000-0003-3270-1306
DPLAfest, Chicago, Illinois, April 20, 2017

Page 29

International Image Interoperability Framework

A community that develops Shared APIs, implements them in Software, and exposes interoperable Content

http://iiif.io/

Page 30

IIIF Community
http://iiif.io/community

● IIIF Consortium
    ○ Currently 38 state/national libraries, universities, museums, tech firms
    ○ Provides sustainability and steering for the initiative
● Wider community
    ○ 80+ cultural heritage institutions, companies, and projects using IIIF standards
    ○ iiif-discuss list = 670+ members
    ○ IIIF Slack = 300+ members
● Community & Technical Specification Groups

Page 31

Shared APIs
http://iiif.io/api/

● Image API
    ○ Transfer image pixels, regions, etc.
    ○ Image manipulation
● Presentation API
    ○ Presentation of an object (pixels + navigation and metadata)
    ○ Easily share and re-use, mix and match content
    ○ Annotate content
● Search API
    ○ Search annotations
● Authentication API
    ○ Provide interoperability for access-restricted content

Page 32

Software Implementations

https://github.com/IIIF/awesome-iiif

Page 33

IIIF Content

All kinds of image resources: artworks, photographs, manuscripts, newspapers

Investigating AV and 3D

Page 34

“Discovery” in IIIF

Finding interoperable resources

Two main concerns:

● How can users find IIIF resources?

● How can users then get those resources into an environment where they can use them?

Page 35

Scoping the problem

What resources can be discovered?

Types of resources in IIIF:

● Content (Image API)
● Description (Presentation API)

The Image API does not provide description of image content, just technical and rights metadata.

Discovery requires Description resources to provide information about Content resources.

Page 36

Presentation API

A Manifest provides just enough metadata (descriptive, structural, etc.) to drive a viewer.

A Collection groups Manifests or other Collections.

http://iiif.io/api/presentation/2.1/

Page 37

Community work

IIIF Discovery Technical Specification Group

iiif.io/community/groups/discovery/

IIIF Discovery TSG scope:

● Crawling and harvesting
● Content indexing
● Change notification
● Import to viewers

Page 38

Presentation API constraints

Informing decisions

The Presentation API does not include semantic descriptions, but can reference them using seeAlso.

IIIF (including the Presentation API) has a resource-centric view of the web, not a service-centric view (cf Sitemaps/ResourceSync vs OAI-PMH).

Page 39

Examples

Page 40

Basic Sitemaps at NC State

● Example demonstrates use of simple sitemaps without any extensions (not even ResourceSync)

● Intended to expand upon existing practice of publishing sitemaps from digital collections

Page 41

Sitemap entry for manifests

<url>
  <loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004/manifest</loc>
  <lastmod>2016-12-13T15:38:19Z</lastmod>
</url>

Sitemap entry for landing page

<url>
  <loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004</loc>
  <lastmod>2017-03-27T19:33:52Z</lastmod>
</url>

Sample of NCSU Sitemaps

Courtesy Jason Ronallo, North Carolina State University

Page 42

Prototyping at Europeana

Exploring Sitemaps and extensions for discovery of IIIF resources for harvesting

● Partnership with University College Dublin and National Library of Wales

● ResourceSync satisfied key needs identified within requirements

● ResourceSync accommodated additional metadata prototyped in an IIIF Sitemap Extension

● Follows several synchronization paradigms

Page 43

Uses Sitemaps and IIIF Extension

<url>
  <loc>http://newspapers.library.wales/view/3320640</loc>
  <iiif:Manifest xmlns:iiif="http://iiif.io/api/presentation/2/">
    http://dams.llgc.org.uk/iiif/newspaper/issue/3320640/manifest.json
  </iiif:Manifest>
  <dct:isPartOf>http://dams.llgc.org.uk/iiif/newspapers/3320639.json</dct:isPartOf>
  <lastmod>2014-11-08</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>

Example of NLW Sitemap Entry

Courtesy Nuno Freire, Europeana

Page 44

Uses Sitemaps and ResourceSync and DCMES as Extensions

<url>
  <loc>https://digital.ucd.ie/view/ucdlib:38491</loc>
  <rs:ln rel="alternate"
         href="https://digital.ucd.ie/view/ucdlib:38491"
         type="application/json"
         dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
  <rs:ln rel="collection"
         href="https://digital.ucd.ie/view/ucdlib:38488"
         type="application/json"
         dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
  <lastmod>2014-08-24T04:09:09.716Z</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>

Example of UCD Resource List Entry

Courtesy Nuno Freire, Europeana
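A harvester could pick out the IIIF-related links in an entry like the UCD one by checking the `dcterms:conformsTo` attribute on each `rs:ln` element. The sketch below wraps the entry in a `<urlset>` with namespace declarations so that it parses on its own; the wrapper, the `dcterms` namespace URI binding (`http://purl.org/dc/terms/`), and the function name `iiif_links` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
DCTERMS = "http://purl.org/dc/terms/"  # assumed binding for the dcterms prefix
IIIF_PRESENTATION = "http://iiif.io/api/presentation/"

ENTRY = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <url>
    <loc>https://digital.ucd.ie/view/ucdlib:38491</loc>
    <rs:ln rel="alternate" href="https://digital.ucd.ie/view/ucdlib:38491"
           type="application/json"
           dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
    <rs:ln rel="collection" href="https://digital.ucd.ie/view/ucdlib:38488"
           type="application/json"
           dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
    <lastmod>2014-08-24T04:09:09.716Z</lastmod>
  </url>
</urlset>
"""


def iiif_links(xml_text):
    """Return (loc, rel, href) for every link that declares IIIF Presentation API conformance."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for url in root.findall("{%s}url" % SM):
        loc = url.findtext("{%s}loc" % SM)
        for ln in url.findall("{%s}ln" % RS):
            conforms = ln.get("{%s}conformsTo" % DCTERMS, "")
            if conforms.startswith(IIIF_PRESENTATION):
                found.append((loc, ln.get("rel"), ln.get("href")))
    return found


if __name__ == "__main__":
    for loc, rel, href in iiif_links(ENTRY):
        print(loc, rel, href)
```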

Page 45

Uses Sitemaps, ResourceSync, and Sitemap Image Extension

Sample of UCD Resource List

Courtesy John Howard, University College Dublin

Page 46

Conclusions

Strengths

● ResourceSync addresses core requirements for exposing IIIF resources for harvesting

● Can build on publication of existing sitemaps easily

● Leverages Many-to-One, Selective Synchronization, and Metadata Harvesting paradigms

● Can adopt additional extensions to implement needed features

● Plenty of opportunity to contribute; need more prototypes

Challenges

● IIIF community’s needs for discovery are not necessarily what other sitemap consumers want (e.g. Google)

● Identifying the primary resource influences structure

● Unclear whether search engines support custom extensions, and what ranking impact would be

Page 47

Thank You!

Mark A. Matienzo, Stanford University Libraries
@anarchivist / https://orcid.org/0000-0003-3270-1306
DPLAfest, Chicago, Illinois, April 20, 2017

Page 48

Seamless access to the world’s open access research papers via ResourceSync

Petr Knoth

Page 49

Use Case 1: ResourceSync as a seamless layer over heterogeneous APIs

Page 50

Use Case 1: What is CORE?

CORE aggregates and provides free access to millions of research articles aggregated from thousands of OA repositories and journals.

[Diagram labels: OA Repositories; OA Journals; Mostly OAI-PMH]

Page 51

Use Case 1: What is CORE?

» Enrichment and harmonisation of aggregated data
» Products/services:
    › Portal
    › API
    › Data dumps
    › Recommendation system for libraries
    › Repository dashboard
    › B2B and analytical services

Page 52

Use Case 1: What is CORE?

» 70 million+ metadata records
» Over 6 million full texts hosted on CORE
» ~1.5 million monthly active users
» Aggregating from 2,500 repositories and 10k OA journals

Page 53

Use Case 1: Key issue

Key players do not provide interoperability for machine access to metadata and content of research papers.

[Figure: six pie charts. Chart titles, legend entries, and slice values are listed in the order they appear in the transcript; the pairing of values with legend entries is not recoverable.]

» Accessing full-text by harvesting the website: Major search engines; Recognised services upon approval (35%, 23%, 18%, 12%, 12%)
» Restricting access to full-text: Don't restrict access in any way; Specify a crawl delay; Allow access to specific robots (75%, 12%, 13%)
» Reference of an article’s full-text on metadata: Direct link to full-text; Interface supporting full-text transfer (39%, 11%, 39%, 11%)
» Accessing content standards: OAI; Own API; Z39.50 (50%, 42%, 8%)
» Files format: PDF; HTML; Plain text; HTML; JSON (36%, 24%, 4%, 32%, 4%)
» Automated downloads of OA full-text: Website; API; FTP (54%, 31%, 15%)

Page 54

Use Case 1: Approach

Provide seamless access over non-standardised APIs.

What protocol?

[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; Mostly OAI-PMH; A range of bespoke APIs; + many others]

Page 55

Use Case 1: Approach

What protocol?

» Why not OAI-PMH?
    › Slow and very inefficient for big repositories.
    › Standardised for metadata transfer but not for content transfer.
    › Very difficult to represent the richness of metadata from a broad range of data providers.

Page 56

Use Case 1: ResourceSync as a seamless access layer

» Very scalable implementation on both the server and client side
» Interpretation of metadata happens using existing pipeline at the aggregator.
» 1.5 million OA publications from Elsevier, Springer and others already exposed.
» Available at: https://publisher-connector.core.ac.uk/resourcesync

[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; Mostly OAI-PMH; A range of bespoke APIs; + many others; ResourceSync]

Page 57

Use Case 2: Exposing enriched data for Text and Data Mining (TDM) via ResourceSync

Page 58

Use Case 2: Subscribing to ResourceSync

» Other aggregators can subscribe to the Publisher connector to make use of their ingestion pipelines and enrichment technologies

[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; Mostly OAI-PMH; A range of bespoke APIs; ResourceSync; + many others]

Page 59

Use Case 2: Content ingestion in OpenMinTeD

» CORE and OpenAIRE are content sources in the OpenMinTeD TDM platform (EU infrastructure project) being developed to enable the mining of scholarly literature.

[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; OMTD-SHARE (over REST); A range of bespoke APIs; + many others]

Page 60

Use Case 2: Exposing enriched data for TDM

» But others want similar solutions… typically, they want to be able to sync and host the data.

[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; A range of bespoke APIs; + many others; ResourceSync]

Page 61

Use Case 3: Make repositories and journals adopt ResourceSync

Page 62

Use Case 3: Replace OAI-PMH with ResourceSync

» Will be a game changer…
» Advocated by COAR Next Generation Repositories WG

[Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; OMTD-SHARE (over REST); A range of bespoke APIs; + many others; ResourceSync; ResourceSync]

Page 63

Key contributions and considerations

Page 64

What’s new about our implementation of ResourceSync?

» Scales to many millions of resources as required by aggregators (as opposed to existing implementations for repositories that are scalable for tens of thousands of resources)
» Real-time updating of Resource Lists and Change Lists (avoiding unnecessary batch processes).
» Combination of real-time updates and scalability

Page 65

Architectural choices

» Based on the principle of changes being communicated to a controller as they happen (rather than having to be detected prior to ResourceList/ChangeList updates)
» Uses Elasticsearch as a database
» Hashing mechanism to distribute size of each ResourceList link and a clever mechanism for iterative updating of ResourceLists
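The slides do not describe the hashing mechanism in detail. One common way to keep each Resource List at a similar size, and to know which list to rewrite when a single resource changes, is to assign every URI to a list by hashing it. The sketch below illustrates that general idea only; it is not CORE's implementation, and the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def bucket_for(uri, num_lists):
    """Map a URI to a stable Resource List number in the range 0 .. num_lists - 1."""
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_lists


def distribute(uris, num_lists):
    """Group URIs into num_lists buckets; the same URI always lands in the same bucket."""
    lists = [[] for _ in range(num_lists)]
    for uri in uris:
        lists[bucket_for(uri, num_lists)].append(uri)
    return lists


if __name__ == "__main__":
    uris = ["http://example.com/res%d" % i for i in range(1000)]
    for number, members in enumerate(distribute(uris, 4)):
        print("list", number, "holds", len(members), "resources")
```

With a stable mapping, a change to one resource only requires regenerating the one list that contains it, rather than recomputing every list in a batch.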

Page 66

Conclusions

» ResourceSync:
    › broad range of uses in scholarly communication.
    › solves problems with aggregating content over OAI-PMH, faster & more efficient aggregation => fresher data in aggregators compared to OAI-PMH
» We used ResourceSync to “liberate” over 1.5 million OA papers (and growing) from key publishers
» CORE soon to provide access to over 8 million OA full texts via ResourceSync.
» CORE actively contributes to the adoption of ResourceSync in the repositories community (as part of OpenMinTeD and COAR NGR)

Page 67

@mart1nkle1n @G_AmSpinnrade @anarchivist @petrknoth