A Micro-Services-Based Approach for Curation and Preservation Solutions
Stephen Abrams, Patricia Cruse, John Kunze, Perry Willett
University of California Curation Center, California Digital Library
Global Oracle PASIG User Group Meeting, Redwood Shores, May 10-12, 2011
Who are we?
What do we do?
• Innovative curation solutions (preservation and access) for the UC community and external partners
– EZID, Merritt repository, Web Archiving Service (WAS)
– Guidelines and best practices
– Integration with other CDL programs (e.g., OA publishing, special collections) and external initiatives (e.g., DataCite, DataONE, HathiTrust)
[Diagram: the information lifecycle and the scholarly lifecycle. Labels: Create, Gather, Collect, Publish, Preserve, Discover, Access, Share; Curation, Research, Teaching, Learning]
That’s simple, right?
• Ever increasing number, size, and diversity of content
• Expanded use by new constituencies
Information Center for the Environment
Minnesota Digital Library
• Ongoing (potentially disruptive) changes in technology and user expectation
• Increasing obligations, shrinking budgets
http://www.flickr.com/photos/krbuchholz/3278516200/ http://www.flickr.com/photos/mildlydiverting/32286893
Merritt repository
“How can I meet the data management requirements of my grant?”
“I know my desktop content is at risk; what should I do?”
“What’s a good way to share the data underlying a recent publication?”
“How can I ensure persistent availability?”
Merritt repository
Preservation back-end for existing discovery services
Dark archive for preservation masters
Integration with distributed data grids
Bright archive for preservation and end-user access
Managerially friendly
Model free
Strongly versioned
Easy submission
Merritt micro-services
• Merritt is built from a micro-services toolkit
– IdM/Authn/Authz: LDAP
– Persistent identifiers: EZID
– Persistent storage: CAN/Pairtree/Dflat/Checkm/ReDD
– Fixity: Fixity
– Replication: Replication
– Catalog: Inventory/4store
– Ingest: Ingest/Zookeeper
– Characterization: JHOVE2
– Discovery: XTF
– Transformation
– Notification
– Annotation
Version 2
GhOST/Shibboleth
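Pairtree, listed above under persistent storage, maps an object identifier onto a directory path. The following is a minimal sketch of that mapping as we read the Pairtree specification; the function names `clean_id` and `id_to_ppath` are our own, and nothing here is taken from Merritt's code.

```python
# Sketch of the Pairtree identifier-to-path mapping (our reading of the spec,
# not Merritt's implementation). Function names are hypothetical.

# Characters that are hex-encoded as ^hh before the single-character mapping.
HEX_ENCODED = set('"*+,<=>?\\^|')

# Single-character substitutions applied to the remaining characters.
SINGLE_CHAR_MAP = {"/": "=", ":": "+", ".": ","}


def clean_id(identifier: str) -> str:
    """Make an identifier safe for use in file system path components."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        # Hex-encode anything outside visible ASCII, plus the reserved set.
        if byte < 0x21 or byte > 0x7E or ch in HEX_ENCODED:
            out.append(f"^{byte:02x}")
        else:
            out.append(SINGLE_CHAR_MAP.get(ch, ch))
    return "".join(out)


def id_to_ppath(identifier: str) -> str:
    """Split the cleaned identifier into two-character directory names."""
    cleaned = clean_id(identifier)
    parts = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(parts) + "/"


print(id_to_ppath("ark:/13030/xt12t3"))  # ar/k+/=1/30/30/=x/t1/2t/3/
```

Because the path is derived from the identifier alone, an object can be located with ordinary file system tools and no separate index.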
Micro-service choreography
[Diagram: choreography among User agent, Ingest, Inventory, Fixity, EZID, Store (with Nodes), and Notification. Flow labels: SIP, AIP, DIP, Identifier]
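One way to picture this choreography is as a chain of small, independent functions, each taking a package and returning an enriched one. The sketch below is illustrative only: the service functions, the `example:/` identifier scheme, and the dictionary-based package are all assumptions, not Merritt's actual interfaces.

```python
# Illustrative sketch of chaining independent services; every name here is
# hypothetical and stands in for a real micro-service.
import hashlib
from typing import Any, Callable, Dict

Package = Dict[str, Any]
Service = Callable[[Package], Package]


def mint_identifier(pkg: Package) -> Package:
    # Placeholder for an identifier service such as EZID: derives a fake
    # identifier from the content purely so the example is deterministic.
    digest = hashlib.sha256(pkg["content"]).hexdigest()
    return {**pkg, "id": f"example:/{digest[:8]}"}


def record_fixity(pkg: Package) -> Package:
    # Placeholder for a fixity service: records a SHA-256 digest.
    return {**pkg, "sha256": hashlib.sha256(pkg["content"]).hexdigest()}


def make_inventory(catalog: Dict[str, Dict[str, str]]) -> Service:
    # Placeholder for an inventory service backed by a plain dict.
    def add_to_inventory(pkg: Package) -> Package:
        catalog[pkg["id"]] = {"name": pkg["name"], "sha256": pkg["sha256"]}
        return pkg
    return add_to_inventory


def compose(*services: Service) -> Service:
    """Run services in order; each one's output is the next one's input."""
    def run(pkg: Package) -> Package:
        for service in services:
            pkg = service(pkg)
        return pkg
    return run


catalog: Dict[str, Dict[str, str]] = {}
ingest = compose(mint_identifier, record_fixity, make_inventory(catalog))
aip = ingest({"name": "a.txt", "content": b"hello"})
print(aip["id"], aip["id"] in catalog)
```

Each function knows nothing about the others, so a service can be replaced or a new one added by changing only the `compose` call.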
Micro-services
The Unix philosophy
• “Make each program do one thing well”
• “To do a new job, build afresh rather than complicate old programs by adding new features”
• “Expect the output of every program to become the input to another, as yet unknown, program”
• “Design and build software … to be tried early”
• “Don't hesitate to throw away the clumsy parts and rebuild them”
M. D. McIlroy et al., “UNIX Time-Sharing System: Foreword,” Bell System Technical Journal 57:6, part 2 (1978): 1902
• Complex emergent behavior
• Low barrier, low maintenance, low commitment
• Policy neutral, protocol/platform independent
• The file system is the database
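"The file system is the database" can be illustrated with a fixity check that keeps its state in a plain text file next to the content. This is a minimal sketch under our own assumptions: the manifest name `manifest-sha256.txt` and its two-column layout are hypothetical and are not the Checkm format or Merritt's Fixity service.

```python
# Sketch: fixity state stored as a plain text manifest beside the content.
# The manifest name and layout are hypothetical, chosen for illustration.
import hashlib
import tempfile
from pathlib import Path
from typing import List

MANIFEST = "manifest-sha256.txt"


def sha256_of(path: Path) -> str:
    """Digest a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path) -> None:
    """Record one 'digest  relative/path' line per file under root."""
    lines = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.name != MANIFEST:
            lines.append(f"{sha256_of(p)}  {p.relative_to(root).as_posix()}")
    (root / MANIFEST).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify(root: Path) -> List[str]:
    """Return relative paths that are missing or no longer match."""
    failed = []
    text = (root / MANIFEST).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_of(target) != digest:
            failed.append(rel)
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.txt").write_text("hello", encoding="utf-8")
        write_manifest(root)
        print(verify(root))  # no failures expected right after writing
```

The manifest can be read, copied, and checked with any tool that understands text files, which is the low-barrier property the slide points to.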
http://www.flickr.com/photos/oskay/265899811
Curious Oysters, http://www.flickr.com/photos/thecuriousoysters/4458657148/
Questions?
For more information
UC Curation Center: http://www.cdlib.org/[email protected]
Merritt repository: http://merritt.cdlib.org/
UC3: Stephen Abrams, Lisa Colvin, Patricia Cruse, Scott Fisher, Alex Genadinik, Erik Hetzner, Greg Janée, John Kunze, Margaret Low, David Loy, Mark Reyes, Tracy Seneca, Joan Starr, Marisa Strong, Perry Willett
• Abrams, Cruse, Kunze, & Minor, “Curation micro-services: A pipeline metaphor for repositories,” Journal of Digital Information 12:2 (2011)
• Abrams, Kunze, & Loy, “An emergent micro-services approach to digital curation infrastructure,” International Journal of Digital Curation 5:1 (2010): 172-186