

Page 1: Building an open platform across diverse content and technologies 2017 Platform.pdf · Building an open platform across diverse content and technologies Stephen Davison, Betsy Coles,

Building an open platform across diverse content and technologies

Stephen Davison, Betsy Coles, R. S. Doiel, Tommy Keswick, and Thomas Morrell

California Institute of Technology Library, Pasadena, CA, USA

Page 2

Themes

Constraints and Opportunities
Metadata management and expression
Data flow and workflow
Connections: APIs and identifiers

Outline

Introduction
• Institutional context
• Repository diversity
Seed problems
Tools
Future directions
Principles/lessons learned


Page 3

Challenges

Working with limited resources (human, financial)
Matching needs across systems with staff skills
Maintaining multiple workflows and metadata standards

Responses

Focused activities
Staff specialization
Repository specialization
• Institutional repository
• Digital Library
• Research data
• Born digital


Page 4

Our choices

• Institutional repository: EPrints
• Digital Library: Islandora (Drupal, Fedora Commons)
• Research Data repository: Invenio (TIND RDM)
• Born digital collections: ArchivesSpace, ePADD, etc.

Page 5

“Building at the edge”: the seed problems ...

EPrints
• List of most recently published articles
• “Expensive” query in EPrints
• Decided not to develop plugin but to leverage API and create a data “feed”

Caltech Archives
• Migration from ColdFusion & FileMaker Pro to ArchivesSpace
• Needed to replicate existing website functionality using AS data
• Feed of AS data drives website and populates generic “feeds”
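
The EPrints seed problem can be illustrated with a minimal sketch: rather than running the expensive "most recent articles" query inside the repository on every page view, records harvested through the API are sorted once and written out as a static feed. The record fields (`type`, `date`, `title`) and the sample data are assumptions for illustration, not the actual EPrints schema or the authors' code.

```python
import json
from datetime import date


def recent_articles(records, limit=25):
    """Return up to `limit` article records, newest publication date first."""
    articles = [r for r in records if r.get("type") == "article" and r.get("date")]
    articles.sort(key=lambda r: date.fromisoformat(r["date"]), reverse=True)
    return articles[:limit]


def write_feed(records, path, limit=25):
    """Write the feed as a static JSON file that a web page can fetch cheaply."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(recent_articles(records, limit), fh, indent=2)


# Hypothetical harvested records; a real harvest would come from the repository API.
harvested = [
    {"id": 1, "type": "article", "title": "A", "date": "2017-03-01"},
    {"id": 2, "type": "article", "title": "B", "date": "2017-06-15"},
    {"id": 3, "type": "book", "title": "C", "date": "2017-07-01"},
]
```

Calling `write_feed(harvested, "recent.json", limit=2)` would store the two newest articles; the repository itself is never queried when the page is displayed.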

Page 6

Command line tools

• dataset - a JSON document manager, on disc or in S3 storage
• datatools - utilities for working with JSON, XLS and CSV data

[Diagram: dataset, dsindexer, dsfind, dsws; Amazon S3]
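
To make the idea of a "JSON document manager" concrete, here is a minimal sketch of a collection that stores one JSON file per key in a directory. The class name, method names and on-disc layout are assumptions chosen for illustration; they are not the interface or storage format of the dataset tool itself.

```python
import json
from pathlib import Path


class JsonCollection:
    """A toy collection: one JSON document per key, stored as <key>.json."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Keys are assumed to be safe to use as file names.
        return self.root / f"{key}.json"

    def create(self, key, doc):
        path = self._path(key)
        if path.exists():
            raise KeyError(f"{key} already exists")
        path.write_text(json.dumps(doc, indent=2), encoding="utf-8")

    def read(self, key):
        return json.loads(self._path(key).read_text(encoding="utf-8"))

    def keys(self):
        return sorted(p.stem for p in self.root.glob("*.json"))
```

Because every document is a plain file, the same layout could in principle be mirrored to object storage, which is the property the slide highlights.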

Page 7

Middleware and systems integration

• cait - A set of utilities that augment the ArchivesSpace API
• ep - EPrints REST API harvest and client tools
• caltechdata_feeds - Use Invenio Read API to harvest metadata
• OAI-PMH - Will be used to harvest Islandora

[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3]
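
As an illustration of the OAI-PMH route mentioned for Islandora, the sketch below parses one `ListRecords` response and returns the record identifiers plus the resumption token a harvester would send to fetch the next page. The namespace is the standard OAI-PMH 2.0 one; the sample response and its identifiers are invented, and the HTTP request itself is left out.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would repeat the request with `resumptionToken=<token>` until the token comes back empty.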

Page 8

Web applications

• mkpage - lightweight tool for generating static web pages
• feeds.library.caltech.edu
• Content for archives.caltech.edu
• library.caltech.edu runs on Drupal and pulls data from Feeds

[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
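
Static page generation from harvested data can be sketched in a few lines. This example fills a fixed HTML template from a list of records; the template, the function name and the record fields (`title`, `url`) are assumptions for illustration and do not reflect how mkpage is actually invoked.

```python
from html import escape
from string import Template

PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
$items
</ul>
</body>
</html>
""")


def render_page(title, records):
    """Render a static HTML list page; all values are HTML-escaped."""
    items = "\n".join(
        f'<li><a href="{escape(r["url"], quote=True)}">{escape(r["title"])}</a></li>'
        for r in records
    )
    return PAGE.substitute(title=escape(title), items=items)
```

The rendered string can be written to disc and served by any web server, so the public site keeps working even when a repository is offline.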

Page 9

External Data

• ot – Collect info from ORCID API

[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID]

Page 10

Backup and Preservation

• Harvest content for preservation
• eprints_bagit
• caltechdata_read
• islandora_bagit

[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); External data sources: CrossRef, FundRef, ORCID]
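
The tool names above suggest BagIt packaging. As a rough sketch of what a bag contains, the function below copies files into a `data/` directory and writes the `bagit.txt` declaration and a SHA-256 payload manifest. It assumes flat, non-colliding file names and is not the implementation of eprints_bagit or islandora_bagit.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_files, bag_dir):
    """Create a minimal BagIt bag holding copies of `source_files`."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for src in map(Path, source_files):
        dest = data / src.name  # assumes file names do not collide
        shutil.copyfile(src, dest)
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{src.name}")

    # Bag declaration and payload manifest.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

The manifest lets a later preservation check recompute each checksum and detect files that have changed or gone missing.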

Page 11

Tools/Demos

Feeds - feeds.library.caltech.edu
• Centralised publications list for Caltech, groups, and researchers

Dataset and search
• Combine records from CaltechAUTHORS and CaltechDATA
• Cross repository search
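
Combining records and searching across repositories can be sketched as follows. Each record is tagged with its source repository and a naive title match stands in for a real index; the record fields and sample titles are invented, and this is not how dsindexer or dsfind work internally.

```python
def combine(*sources):
    """Merge (repository_name, records) pairs into one list, tagging each record."""
    combined = []
    for repository, records in sources:
        for record in records:
            combined.append({"repository": repository, **record})
    return combined


def search(records, term):
    """Case-insensitive substring match on the title field."""
    term = term.lower()
    return [r for r in records if term in r.get("title", "").lower()]


# Invented sample records for the two repositories named on the slide.
authors_records = [{"id": "a1", "title": "Seismic noise study"}]
data_records = [
    {"id": "d1", "title": "Seismic noise dataset"},
    {"id": "d2", "title": "Ocean model output"},
]

merged = combine(("CaltechAUTHORS", authors_records), ("CaltechDATA", data_records))
hits = search(merged, "seismic")
```

Here `hits` contains one publication and one dataset, which is the cross-repository view a single system could not provide on its own.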

Page 12

Exposing Services to Users

• Provide content to various audiences
• Library website as starting point for all
• Website can display feeds from many sources

[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID]

Page 13

Future Work: Data Integration

• Connect publication and data records by “IsSupplement” tags
• Links added in one repository are automatically reflected in the other

[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID]
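
One way to read "links added in one repository are automatically reflected in the other" is as reciprocal relation types. The sketch below assumes the DataCite-style pair `IsSupplementTo` / `IsSupplementedBy` and a simple `identifier` / `relationType` record shape; the identifiers are hypothetical placeholders and the planned implementation may differ.

```python
# Assumed DataCite-style reciprocal relation types.
RECIPROCAL = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


def reciprocal_links(record_id, related):
    """For each link on a record, return (target_id, link_to_add_on_target)."""
    updates = []
    for link in related:
        inverse = RECIPROCAL.get(link["relationType"])
        if inverse is None:
            continue  # relation types without a known inverse are skipped
        updates.append(
            (link["identifier"], {"identifier": record_id, "relationType": inverse})
        )
    return updates
```

For a data record that supplements a publication, the function yields the `IsSupplementedBy` link that a batch update would then write to the publication record.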

Page 14

Future Work: Adding Metadata

• Standardize funding information and add identifiers
• Collect and Display CrossRef Event Data for items in repositories

[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID]

Page 15

Future Work: Author Identifiers

• Identify and centralize individuals in all repositories
• Determine Caltech affiliation
• Update other repositories with author identifiers

[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID]
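
Before author identifiers are pushed into other repositories, it helps to reject mistyped ones. As a small illustration, assuming the identifiers in question are ORCID iDs, the sketch below checks the format and the final check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The function names are illustrative and are not taken from the authors' tools.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if `orcid` is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]
```

A batch update could skip, and report, any record whose identifier fails this check rather than propagating the error to every repository.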

Page 16

Prefer API over direct DB access

Page 17

Develop small; iterate frequently

[Diagram: dataset, dsindexer, dsfind, dsws]

Page 18

Keep structures simple

[Diagram: Batch update tools; Repository management tools and widgets (e.g. type ahead data entry)]

Page 19

Data flows in one direction

[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); External data sources: CrossRef, FundRef, ORCID]

Page 20

Develop at the edges

[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints]

Page 21

Develop at the edges
Ongoing harvesting, rather than one-off migrations
Develop small; iterate frequently
Keep structures simple
Prefer API over direct DB access

Page 22

In the abstract…

[Diagram: Archives management, Data repository, Digital object repository, Publications repository; External data sources; Lightweight tools; Storage; Web publication; Feeds]

Page 23

Users and use cases

[Diagram: Archives management, Data repository, Digital object repository, Publications repository; External data sources; Lightweight tools; Storage]

Researchers/groups

Collaborators

Harvesters/aggregators

Analysts

Identity management

Compliance assessment

Value assessment/statistics

IR/Archives integration

Page 24

Why not Hydra/Samvera [or …]?

There will always be many specialized systems

Systems will come and go

Migration will be inevitable

Continuity at the core; change only when necessary

Continuous change “at the edges”

Concentrate on building user-oriented tools, services, feeds, web sites

Page 25

Thank you

Stephen Davison [email protected]
Betsy Coles [email protected]
R. S. Doiel [email protected]
Tommy Keswick [email protected]
Thomas Morrell [email protected]

Linde+Robinson Building at Caltech
https://tccon-wiki.caltech.edu/@api/deki/files/1602/=L%252bR.jpg

Credit: David Wakely Photography

http://feeds.library.caltech.edu

https://github.com/caltechlibrary/dataset

https://github.com/caltechlibrary/dataset-demo