![Page 1: Building an open platform across diverse content and technologies 2017 Platform.pdf · Building an open platform across diverse content and technologies Stephen Davison, Betsy Coles,](https://reader036.vdocuments.us/reader036/viewer/2022071214/6042723446e07319577424e2/html5/thumbnails/1.jpg)
Building an open platform across diverse content and technologies
Stephen Davison, Betsy Coles, R. S. Doiel, Tommy Keswick, and Thomas Morrell
California Institute of Technology Library, Pasadena, CA, USA
Themes
Constraints and Opportunities
Metadata management and expression
Data flow and workflow
Connections: APIs and identifiers
Outline
Introduction
• Institutional context
• Repository diversity
Seed problems
Tools
Future directions
Principles/lessons learned
Challenges
Working with limited resources (human, financial)
Matching needs across systems with staff skills
Maintaining multiple workflows and metadata standards
Responses
Focused activities
Staff specialization
Repository specialization
• Institutional repository
• Digital Library
• Research data
• Born digital
Our choices
• Institutional repository: EPrints
• Digital Library: Islandora (Drupal, Fedora Commons)
• Research Data repository: Invenio (TIND RDM)
• Born digital collections: ArchivesSpace, ePADD, etc.
“Building at the edge”: the seed problems ...
EPrints
• List of most recently published articles
• “Expensive” query in EPrints
• Decided not to develop plugin but to leverage API and create a data “feed”
Caltech Archives
• Migration from ColdFusion & FileMaker Pro to ArchivesSpace
• Needed to replicate existing website functionality using AS data
• Feed of AS data drives website and populates generic “feeds”
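The EPrints seed problem can be sketched as a small feed builder. This is a minimal illustration, not the authors' tooling: it assumes records have already been harvested through the repository API into dictionaries with hypothetical `title` and `date` keys (ISO 8601 dates), and it produces a static JSON document so that a web page never has to run the expensive query itself.

```python
import json

def recent_feed(records, limit=25):
    """Return the `limit` most recently published records, newest first."""
    dated = [r for r in records if r.get("date")]
    # ISO 8601 date strings sort chronologically as plain strings.
    return sorted(dated, key=lambda r: r["date"], reverse=True)[:limit]

# Hypothetical harvested records; the field names are assumptions.
harvested = [
    {"title": "Paper A", "date": "2017-03-01"},
    {"title": "Paper B", "date": "2017-09-15"},
    {"title": "Paper C", "date": "2016-11-30"},
    {"title": "Undated draft"},
]

feed_json = json.dumps(recent_feed(harvested, limit=2), indent=2)
print(feed_json)
```

Regenerating the document on a schedule moves the query cost from every page view to one harvest run.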
Command line tools
• dataset - a JSON document manager, on disc or in S3 storage
• datatools - utilities for working with JSON, XLS and CSV data
[Diagram: dataset, dsindexer, dsfind, dsws; Amazon S3]
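The idea of a JSON document manager on disk can be illustrated with a few lines of Python. This is a sketch of the concept only: `JSONCollection` and its methods are hypothetical names and do not reproduce the interface of the `dataset` tool. It assumes keys are safe to use as file names.

```python
import json
import tempfile
from pathlib import Path

class JSONCollection:
    """Hypothetical key -> JSON document store backed by a directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Assumes `key` contains no path separators.
        return self.root / f"{key}.json"

    def create(self, key, doc):
        self._path(key).write_text(json.dumps(doc), encoding="utf-8")

    def read(self, key):
        return json.loads(self._path(key).read_text(encoding="utf-8"))

    def keys(self):
        return sorted(p.stem for p in self.root.glob("*.json"))

workdir = tempfile.mkdtemp()
collection = JSONCollection(Path(workdir) / "publications")
collection.create("eprint-1", {"title": "Paper A", "year": 2017})
collection.create("eprint-2", {"title": "Paper B", "year": 2016})
print(collection.keys())
```

One file per record keeps the structure inspectable with ordinary file tools, which fits the "keep structures simple" principle stated later in the deck.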
Middleware and systems integration
• cait - A set of utilities that augment the ArchivesSpace API
• ep - EPrints REST API harvest and client tools
• caltechdata_feeds - Use Invenio Read API to harvest metadata
• OAI-PMH - Will be used to harvest Islandora
[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3]
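For the OAI-PMH case, a harvester mostly needs to read record headers and the resumption token from each `ListRecords` response. The sketch below parses a hand-written sample response with the standard library; the identifiers and token are made up, and fetching pages over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; identifiers and token are illustrative.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2017-05-01</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:2</identifier>
        <datestamp>2017-05-03</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

def parse_list_records(xml_text):
    """Return (list of header dicts, resumption token or None)."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    # An empty or missing token means there are no further pages.
    return headers, (token or None)

headers, token = parse_list_records(SAMPLE_RESPONSE)
print(headers, token)
```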
Web applications
• mkpage - lightweight tool for generating static web pages
• feeds.library.caltech.edu
• Content for archives.caltech.edu
• library.caltech.edu runs on Drupal and pulls data from Feeds
[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
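Generating a static page from feed data needs very little machinery. The following sketch is not `mkpage`; it only illustrates the approach, with a hypothetical `render_page` function and made-up feed items, and it escapes all values before writing them into HTML.

```python
import html

def render_page(title, items):
    """Render a feed (dicts with 'title' and 'url') as a static HTML page."""
    rows = "\n".join(
        '    <li><a href="{}">{}</a></li>'.format(
            html.escape(item["url"], quote=True), html.escape(item["title"])
        )
        for item in items
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head><title>{0}</title></head>\n"
        "<body>\n  <h1>{0}</h1>\n  <ul>\n{1}\n  </ul>\n</body>\n</html>\n"
    ).format(html.escape(title), rows)

# Hypothetical feed items.
feed = [
    {"title": "Optics & Photonics", "url": "https://example.org/1"},
    {"title": "Seismology", "url": "https://example.org/2"},
]
page = render_page("Recent publications", feed)
print(page)
```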
External Data
• ot – Collect info from ORCID API
[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
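When identifiers are collected from external sources such as ORCID, it helps to validate them before storing them. ORCID iDs end in a check character computed with ISO 7064 MOD 11-2. The sketch below implements that check; it is independent of the `ot` tool, and the function names are illustrative.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Check the length, digits, and check character of an ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact.isascii() or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()

# 0000-0002-1825-0097 is the iD ORCID uses for its fictitious sample researcher.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```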
Backup and Preservation
• Harvest content for preservation
• eprints_bagit
• caltechdata_read
• islandora_bagit
[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; External data sources; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
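Packaging harvested content as a BagIt bag can be sketched with the standard library alone. The example below is not the code of the tools named on this slide; it assumes the version 1.0 layout described in RFC 8493 (a `bagit.txt` declaration, a `data/` payload directory, and a SHA-256 manifest), and the payload is made up.

```python
import hashlib
import tempfile
from pathlib import Path

def make_bag(bag_dir, payload):
    """Write `payload` (relative path -> bytes) as a minimal BagIt bag."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = bag / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{rel_path}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag

def verify_bag(bag_dir):
    """Recompute payload checksums and compare them with the manifest."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((bag / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True

# Hypothetical harvested payload.
bag = make_bag(
    Path(tempfile.mkdtemp()) / "eprint-1",
    {"record.json": b'{"title": "Paper A"}', "files/paper.txt": b"full text"},
)
print(verify_bag(bag))
```

Because the manifest travels with the payload, a later fixity check needs nothing from the source repository.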
Tools/Demos
Feeds - feeds.library.caltech.edu
• Centralised publications list for Caltech, groups, and researchers
Dataset and search
• Combine records from CaltechAUTHORS and CaltechDATA
• Cross repository search
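Cross repository search amounts to indexing records from several sources together while remembering where each one came from. The sketch below builds a tiny inverted index over titles; it does not reproduce `dsindexer` or `dsfind`, and the records and field names are hypothetical.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each lowercase title word to the (repository, id) pairs using it."""
    index = defaultdict(set)
    for rec in records:
        for word in re.findall(r"\w+", rec["title"].lower()):
            index[word].add((rec["repository"], rec["id"]))
    return index

def search(index, query):
    """Return records whose titles contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))

# Hypothetical records combined from two repositories.
combined = [
    {"repository": "CaltechAUTHORS", "id": "a1", "title": "Seismic imaging of faults"},
    {"repository": "CaltechDATA", "id": "d7", "title": "Seismic waveform dataset"},
    {"repository": "CaltechAUTHORS", "id": "a2", "title": "Photonic crystal cavities"},
]
index = build_index(combined)
print(sorted(search(index, "seismic")))
```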
Exposing Services to Users
• Provide content to various audiences
• Library website as starting point for all
• Website can display feeds from many sources
[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
Future Work: Data Integration
• Connect publication and data records by “IsSupplement” tags
• Links added in one repository are automatically reflected in the other
[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
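Keeping links reflected in both repositories comes down to finding relations that lack their mirror image. The sketch below assumes the tags correspond to the DataCite relation types `IsSupplementTo` and `IsSupplementedBy`, which is an interpretation of the slide rather than something it states; the record identifiers are made up.

```python
# Assumed pairing of DataCite relation types.
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}

def reciprocal_links(links):
    """Given (source, relation, target) triples, return the missing mirrors."""
    known = set(links)
    missing = set()
    for source, relation, target in known:
        mirrored = (target, INVERSE[relation], source)
        if mirrored not in known:
            missing.add(mirrored)
    return sorted(missing)

# Hypothetical links harvested from both repositories.
links = [
    ("data:1", "IsSupplementTo", "pub:9"),      # recorded only on the data side
    ("data:2", "IsSupplementTo", "pub:4"),
    ("pub:4", "IsSupplementedBy", "data:2"),    # already mirrored
]
to_add = reciprocal_links(links)
print(to_add)
```

The returned triples are the updates a batch tool would need to apply to the other repository.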
Future Work: Adding Metadata
• Standardize funding information and add identifiers
• Collect and display CrossRef Event Data for items in repositories
[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
Future Work: Author Identifiers
• Identify and centralize individuals in all repositories
• Determine Caltech affiliation
• Update other repositories with author identifiers
[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
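Centralizing individuals can start from a simple grouping step: collect every name variant and source repository under a shared identifier, and set aside records that have none for manual review. The sketch below assumes ORCID iDs serve as the shared identifier; the field names and records are hypothetical.

```python
from collections import defaultdict

def merge_people(records):
    """Group author records by ORCID; return (people, records lacking an ORCID)."""
    people = defaultdict(lambda: {"names": set(), "sources": set()})
    unmatched = []
    for rec in records:
        orcid = rec.get("orcid")
        if not orcid:
            unmatched.append(rec)
            continue
        people[orcid]["names"].add(rec["name"])
        people[orcid]["sources"].add(rec["repository"])
    return dict(people), unmatched

# Hypothetical author records from two repositories.
author_records = [
    {"name": "Doe, J.", "orcid": "0000-0002-1825-0097", "repository": "EPrints"},
    {"name": "Jane Doe", "orcid": "0000-0002-1825-0097", "repository": "Invenio"},
    {"name": "Roe, R.", "repository": "EPrints"},
]
people, unmatched = merge_people(author_records)
print(people, unmatched)
```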
Prefer API over direct DB access
Develop small; iterate frequently
Prefer API over direct DB access
[Diagram: dataset, dsindexer, dsfind, dsws]
Develop small; iterate frequently
Keep structures simple
Prefer API over direct DB access
[Diagram: Batch update tools; Repository management tools and widgets (e.g. type ahead data entry)]
Data flows in one direction
[Diagram: Preservation/Backup Storage; ArchivesSpace, Invenio (TIND), Islandora, EPrints; Batch update tools; Repository management tools and widgets (e.g. type ahead data entry); CrossRef, FundRef, ORCID; External data sources; dataset, dsindexer, dsfind, dsws; Amazon S3; Archives Web Site, Feeds, Library Web Site, IIIF]
Develop at the edges
Develop small; iterate frequently
Keep structures simple
Prefer API over direct DB access
[Diagram: ArchivesSpace, Invenio (TIND), Islandora, EPrints]
Develop at the edges
Ongoing harvesting, rather than one-off migrations
Develop small; iterate frequently
Keep structures simple
Prefer API over direct DB access
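Ongoing harvesting, as opposed to a one-off migration, can be sketched as a loop that remembers a high-water mark and copies only what changed since the previous run. The example assumes each record exposes a hypothetical `modified` timestamp in ISO 8601 form. Data moves in one direction only, from the source into the local store.

```python
def harvest_since(source_records, store, last_harvest):
    """Copy records modified after `last_harvest` into `store`.

    Returns the newest modification timestamp seen, to be used as the
    starting point of the next run.
    """
    newest = last_harvest
    for rec in source_records:
        # ISO 8601 UTC timestamps compare correctly as strings.
        if rec["modified"] > last_harvest:
            store[rec["id"]] = rec
            newest = max(newest, rec["modified"])
    return newest

# Hypothetical source repository contents.
source = [
    {"id": "rec-1", "modified": "2017-06-01T10:00:00Z", "title": "First"},
    {"id": "rec-2", "modified": "2017-06-02T09:30:00Z", "title": "Second"},
]
store = {}
mark = harvest_since(source, store, "")

# Later, one record changes upstream; only that record is copied again.
source[0] = {"id": "rec-1", "modified": "2017-06-05T08:00:00Z", "title": "First (revised)"}
second_pass = {}
mark = harvest_since(source, second_pass, mark)
store.update(second_pass)
print(sorted(second_pass), mark)
```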
In the abstract…
[Diagram: Archives management, Data repository, Digital object repository, Publications repository; External data sources; Lightweight tools; Storage; Feeds; Web publication]
Users and use cases
[Diagram: Archives management, Data repository, Digital object repository, Publications repository; External data sources; Lightweight tools; Storage]
Researchers/groups
Collaborators
Harvesters/aggregators
Analysts
Identity management
Compliance assessment
Value assessment/statistics
IR/Archives integration
Why not Hydra/Samvera [or …]?
There will always be many specialized systems
Systems will come and go
Migration will be inevitable
Continuity at the core; change only when necessary
Continuous change “at the edges”
Concentrate on building user-oriented tools, services, feeds, web sites
Thank you
Stephen Davison [email protected]
Betsy Coles [email protected]
R. S. Doiel [email protected]
Tommy Keswick [email protected]
Thomas Morrell [email protected]
Linde+Robinson Building at Caltech
https://tccon-wiki.caltech.edu/@api/deki/files/1602/=L%252bR.jpg
Credit: David Wakely Photography
http://feeds.library.caltech.edu
https://github.com/caltechlibrary/dataset
https://github.com/caltechlibrary/dataset-demo