Presentations
• Introduction
• Case Studies:
– Policies, Services, Interoperability, Mashups: BNF, DCAPE, PoDRI, e-Legacy
– RENCI Federated Data Projects: NARA TPAP, RENCI VO, TIP
– Interfaces: Islandora, Jargon, CDR
A unified web interface for browsing or searching
Flickr file system: /flickr/commons/, using the Flickr API, a RESTful web API
Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the Flickr API, presented to iRODS as if it were a file system.
To be integrated, a collection needs a remote API that we can write a driver for, and one or more ways to map that collection into a tree.
Each mountable service is made into a resource with all relevant info (location, resource type, etc.).
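The driver idea above can be sketched as a small adapter that maps a remote API onto a directory tree. This is a minimal illustration under stated assumptions, not the SHAMAN or iRODS driver code: `RemoteCollectionDriver`, `CommonsDriver`, `FakeCommonsAPI`, and the institution and photo names are all hypothetical, and the fake API stands in for real Flickr API calls (no network access).

```python
from abc import ABC, abstractmethod


class RemoteCollectionDriver(ABC):
    """Hypothetical interface: expose a remote API as a read-only tree."""

    @abstractmethod
    def list_dir(self, path: str) -> list[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class FakeCommonsAPI:
    """In-memory stand-in for a remote web API; makes no network calls."""

    def __init__(self) -> None:
        self._photos = {
            "LibraryA": {"photo1.jpg": b"photo1-bytes", "photo2.jpg": b"photo2-bytes"},
            "MuseumB": {"photo3.jpg": b"photo3-bytes"},
        }

    def institutions(self) -> list[str]:
        return sorted(self._photos)

    def photos_for(self, institution: str) -> dict[str, bytes]:
        return self._photos[institution]


class CommonsDriver(RemoteCollectionDriver):
    """Maps /flickr/commons/<Institution>/<photo> onto API calls."""

    ROOT = "/flickr/commons"

    def __init__(self, api: FakeCommonsAPI) -> None:
        self.api = api

    def _parts(self, path: str) -> list[str]:
        path = path.rstrip("/")
        if path != self.ROOT and not path.startswith(self.ROOT + "/"):
            raise FileNotFoundError(path)
        return [p for p in path[len(self.ROOT):].split("/") if p]

    def _photos(self, institution: str) -> dict[str, bytes]:
        try:
            return self.api.photos_for(institution)  # one API call per "folder"
        except KeyError:
            raise FileNotFoundError(institution) from None

    def list_dir(self, path: str) -> list[str]:
        parts = self._parts(path)
        if not parts:
            return self.api.institutions()
        if len(parts) == 1:
            return sorted(self._photos(parts[0]))
        raise NotADirectoryError(path)

    def read(self, path: str) -> bytes:
        parts = self._parts(path)
        if len(parts) != 2:
            raise IsADirectoryError(path)
        photos = self._photos(parts[0])
        if parts[1] not in photos:
            raise FileNotFoundError(path)
        return photos[parts[1]]
```

Swapping `FakeCommonsAPI` for a client of a real service would leave the tree mapping unchanged, which is the point of keeping the driver interface separate from the remote API.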
iRODS federates major collections (from Ken Arnold, SHAMAN project)
YouTube: media accessible through API
New service: mountable file system (Hulu, Photobucket, etc.)
My data: disk, tape, database, filesystem, etc.
User with client views & manages data
User sees single hierarchy
The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.
iRODS Shows Unified “Virtual Collection”
My data: disk, tape, database, filesystem, etc.
Partner’s data: remote disk, tape, filesystem, etc.
User with iRODS client searches catalog to find and get data
User sees single “virtual collection”
Users can search for, access, add/extract metadata, annotate, analyze & process, replicate, copy, share data, manage & track access, subscribe, and more.
Accessing Data in the iRODS System
“I need data!” → “Finds the data.” → “Gets data to user.”
User interface: web or GUI client to access and manage data & metadata*
iRODS metadata catalog: keeps track of data
Data server: disk, tape, database, filesystem, etc.
iRODS Data System
Overview of iRODS Components
iRODS server: data on disk
iRODS metadata catalog: database tracks state of data
iRODS rule engine: implements policies
*Access data with: web-based browser, iRODS GUI, command-line clients, DSpace, Fedora, Kepler workflow, WebDAV, user-level file system, etc.
"Layers" in iRODS: From Users to Storage
Community: decides how to manage shared collection(s)
Policies: express goals for data access, sharing, preservation, etc.
Rules: implement policies in computer-actionable form
Micro-services: operate on remote data
iRODS server executes micro-services
Under the hood: a glimpse
[Diagram: three federated iRODS servers, each with a rule engine, at NC State, Duke, and Chapel Hill; metadata catalog (DB)]
• User asks for data (using logical properties)
• Data request goes to 1st server
• Server looks up information in catalog
• Catalog tells it that the 2nd federated server has the data
• 1st server asks 2nd server for data
• 2nd server applies rules and serves data
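The request flow above can be simulated in a few lines. This is a toy model, not iRODS server code: it assumes each federated server is a `Zone` object holding a catalog (logical path to owning zone), local storage, and its own rules, and `Zone`, `federated_get`, `staff_only`, and the sample paths are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical rule signature: inspect (user, logical_path), raise to deny.
Rule = Callable[[str, str], None]


@dataclass
class Zone:
    """Toy model of one federated iRODS server and its local state."""

    name: str
    catalog: dict[str, str] = field(default_factory=dict)   # logical path -> zone holding it
    storage: dict[str, bytes] = field(default_factory=dict)  # data stored in this zone
    rules: list[Rule] = field(default_factory=list)          # this zone's own policies

    def serve(self, user: str, path: str) -> bytes:
        """Apply this zone's rules, then return the locally stored data."""
        for rule in self.rules:
            rule(user, path)
        return self.storage[path]


def federated_get(first: Zone, federation: dict[str, Zone], user: str, path: str) -> bytes:
    """User asks the first server, which consults the catalog and forwards if needed."""
    owner_name = first.catalog.get(path)
    if owner_name is None:
        raise FileNotFoundError(path)
    owner = federation[owner_name]  # may be `first` itself
    return owner.serve(user, path)


def staff_only(user: str, path: str) -> None:
    """Example rule for the owning zone: only listed users may read."""
    if user not in {"alice"}:
        raise PermissionError(f"{user} may not read {path}")


ncsu = Zone("ncsu", catalog={"/duke/report.pdf": "duke"})
duke = Zone("duke", storage={"/duke/report.pdf": b"%PDF-1.4"}, rules=[staff_only])
federation = {"ncsu": ncsu, "duke": duke}
```

The design choice worth noting is that the rule check runs in `duke.serve`, not in the first server: the zone that owns the data enforces its own policies, as in the last step of the slide.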
Policies in iRODS
• Policies: express community goals for data access and sharing, management, long-term preservation, uses, etc.
• Policy examples:
– Run a particular workflow when a “set of files” is ingested into a collection (e.g. make thumbnails of images, post to website).
– Automatically replicate a file added to a collection into 3 geographically distributed sites.
– Automatically extract metadata for a file of a certain type and store it in the metadata catalog.
– Periodically check integrity of files in a collection and repair/replace them if needed/possible.
– Automatically pick a certain storage location based on user, collection, size, or type.
– Let a user access a collection only if using certificate-based login.
– Send a notification when a certain file is ingested.
– etc.
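Several of the example policies above can be read as event-driven rules. The sketch below is a hypothetical Python model, not the iRODS rule language: it binds an "on_ingest" event to actions that pick a storage resource by size or type, replicate to three sites, and extract metadata for images. The resource names, site names, and 1 GB threshold are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical resource names and threshold, for illustration only.
GEO_SITES = ["site-east", "site-central", "site-west"]
LARGE_FILE_BYTES = 1_000_000_000


@dataclass
class DataObject:
    path: str
    size: int
    mime_type: str
    replicas: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


def choose_resource(obj: DataObject) -> str:
    """Pick a storage location based on size or type."""
    if obj.size >= LARGE_FILE_BYTES:
        return "tape-archive"
    if obj.mime_type.startswith("image/"):
        return "image-disk"
    return "default-disk"


def replicate_to_sites(obj: DataObject, sites: list[str]) -> None:
    """Keep one replica at each geographically distributed site."""
    for site in sites:
        if site not in obj.replicas:
            obj.replicas.append(site)


def extract_metadata(obj: DataObject) -> None:
    """Extract metadata only for files of a certain type (here: images)."""
    if obj.mime_type.startswith("image/"):
        obj.metadata["format"] = obj.mime_type
        obj.metadata["size_bytes"] = str(obj.size)


# Each event name maps to the ordered actions that enforce policy.
RULES = {
    "on_ingest": [
        lambda o: o.replicas.append(choose_resource(o)),
        lambda o: replicate_to_sites(o, GEO_SITES),
        extract_metadata,
    ],
}


def fire(event: str, obj: DataObject) -> DataObject:
    """Run every action bound to an event, in order."""
    for action in RULES.get(event, []):
        action(obj)
    return obj
```

For example, `fire("on_ingest", DataObject("/zone/home/a.jpg", 2048, "image/jpeg"))` places the file on `image-disk`, adds the three site replicas, and records its format and size.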
Policies, Services, Interoperability, Mashups:
Richard Marciano, SILS
e-Legacy Mashup
[Diagram: RSS feed, feed reader, and data grid (SRB/iRODS), supporting appraisal, description, arrangement, and preservation]
e-Legacy Demo
Subscribe to RSS → Review received entry → Share and tag → Meets preservation criteria? Yes → Preserve to iRODS
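The demo workflow can be sketched as a small pipeline over an RSS feed. This assumes an RSS 2.0 document held as a string, a hypothetical `meets_criteria` check (here: the entry has a title, a link, and at least one tag), and an in-memory dict standing in for the iRODS collection; `read_entries`, `run_pipeline`, and the sample feed are illustrative names, not e-Legacy project code.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Assumed sample feed; real entries would come from a subscribed RSS URL.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Budget memo</title><link>http://example.org/memo</link></item>
<item><title>Untitled draft</title></item>
</channel></rss>"""


def read_entries(feed_xml: str) -> list[dict]:
    """Subscribe / review: turn each RSS <item> into a simple record."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link"), "tags": []}
        for item in root.iter("item")
    ]


def meets_criteria(entry: dict) -> bool:
    """Hypothetical preservation criteria: titled, linked, and tagged."""
    return bool(entry["title"] and entry["link"] and entry["tags"])


def run_pipeline(feed_xml: str, tagger: Callable[[dict], list[str]], archive: dict) -> dict:
    """Review each entry, share/tag it, and preserve it if it meets the criteria."""
    for entry in read_entries(feed_xml):
        entry["tags"] = tagger(entry)
        if meets_criteria(entry):
            archive[entry["link"]] = entry  # stand-in for "preserve to iRODS"
    return archive
```

With a tagger such as `lambda e: ["records"] if "memo" in e["title"].lower() else []`, only the linked, tagged memo is preserved; the untitled draft has no link and is skipped.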
National Library of France: Distributed Archiving & Preservation System (SPAR)
BNF: French National Library
• Three rules:
– Import
  • Import an input document into iRODS
  • Add import date and checksum as AVU-triplet metadata
  • Replicate to other resources
– Get
  • Locate a copy of the record
  • Return it if the physical checksum equals the stored checksum
  • If not, delete the replica and copy a good one over it
– Audit
  • Locate all replicas of a data object
  • Compute a physical checksum using the system’s MD5
  • Compare the result with the checksum stored in user metadata
  • Remove all stale copies, then replicate them from another good copy
  • When all copies are audited, stage a clean copy onto a specific FS directory
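The three rules can be modeled in plain Python to show their logic. This is a hypothetical in-memory model, not the BnF’s iRODS rules or micro-services: replicas are byte strings keyed by resource name, the checksum and import date are stored as attribute-value-unit (AVU) triples, and MD5 is computed with Python’s `hashlib`. `ReplicatedObject`, `import_rule`, `get_rule`, `audit_rule`, and `stage_clean_copy` are illustrative names.

```python
import hashlib
from datetime import date
from pathlib import Path


class ReplicatedObject:
    """Toy data object: replicas keyed by resource name plus AVU metadata."""

    def __init__(self) -> None:
        self.replicas: dict[str, bytes] = {}
        self.avus: list[tuple[str, str, str]] = []  # (attribute, value, unit)

    def stored_checksum(self) -> str:
        return next(value for attr, value, _ in self.avus if attr == "MD5SUM")


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def import_rule(content: bytes, resources: list[str]) -> ReplicatedObject:
    """Import: store the document, add date and checksum AVUs, replicate."""
    obj = ReplicatedObject()
    obj.avus.append(("IMPORT_DATE", date.today().isoformat(), "ISO 8601"))
    obj.avus.append(("MD5SUM", md5(content), "hex"))
    for resource in resources:
        obj.replicas[resource] = content
    return obj


def _good_copy(obj: ReplicatedObject) -> bytes:
    """Return the first replica whose checksum matches the stored one."""
    expected = obj.stored_checksum()
    for data in obj.replicas.values():
        if md5(data) == expected:
            return data
    raise OSError("no replica matches the stored checksum")


def get_rule(obj: ReplicatedObject, resource: str) -> bytes:
    """Get: return the copy if its checksum matches, else repair it first."""
    data = obj.replicas[resource]
    if md5(data) == obj.stored_checksum():
        return data
    good = _good_copy(obj)
    del obj.replicas[resource]      # delete the stale replica...
    obj.replicas[resource] = good   # ...and copy a good one over it
    return good


def audit_rule(obj: ReplicatedObject) -> list[str]:
    """Audit: checksum every replica, repair stale ones, return their names."""
    expected = obj.stored_checksum()
    stale = [r for r, data in obj.replicas.items() if md5(data) != expected]
    if stale:
        good = _good_copy(obj)
        for resource in stale:
            obj.replicas[resource] = good
    return stale


def stage_clean_copy(obj: ReplicatedObject, directory: Path, name: str) -> Path:
    """After an audit, write a verified copy to a staging directory."""
    target = directory / name
    target.write_bytes(_good_copy(obj))
    return target
```

Keeping the stored checksum in metadata, rather than recomputing it from any one replica, is what lets the audit tell which copies are stale.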
BNF: French National Library
• Micro-services:
– Add metadata to an iRODS object
– Import an object into iRODS, compute its MD5 checksum, and validate it against the supplied one. Once validated, add MD5SUM and import date as metadata. If invalid, the content is removed from iRODS
– Return the value of an iRODS object metadata attribute
– Prepare to retrieve a metadata attribute for a resource
– Prepare to retrieve a metadata attribute for an object
– Get the input resources belonging to a zone name
– Get iCAT results regarding location info for a record
– Execute MD5SUM on the physical content and return the value
– Return a pseudo-random string of specified length
– Delete a stale replica and replicate over it from another fresh copy
– Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution)
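The last micro-service above distinguishes eager (synchronous) from lazy (delayed) replacement of stale replicas. A minimal sketch of that distinction, assuming replicas are held in a dict and deferred work sits in a queue that a later job drains; `ReplicaManager` and its methods are hypothetical. The pseudo-random string helper uses Python’s `secrets` module, a swapped-in choice that may differ from what the BnF micro-service uses.

```python
import secrets
import string
from collections import deque


def random_string(length: int) -> str:
    """Return a random alphanumeric string of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


class ReplicaManager:
    """Hypothetical holder for replicas plus a queue of deferred repairs."""

    def __init__(self, replicas: dict[str, bytes]) -> None:
        self.replicas = replicas
        self.pending: deque[tuple[str, str]] = deque()  # (stale, fresh source)

    def _replace(self, stale: str, source: str) -> None:
        fresh = self.replicas[source]
        del self.replicas[stale]        # delete the stale replica
        self.replicas[stale] = fresh    # replicate over it from the fresh copy

    def replace_stale(self, stale: str, source: str, eager: bool = True) -> None:
        if eager:
            self._replace(stale, source)  # synchronous: done before returning
        else:
            self.pending.append((stale, source))  # lazy: deferred

    def run_pending(self) -> int:
        """Drain deferred repairs (e.g. from a periodic job); return how many ran."""
        count = 0
        while self.pending:
            self._replace(*self.pending.popleft())
            count += 1
        return count
```

Eager replacement makes the caller wait for the copy; lazy replacement returns at once and leaves the stale replica in place until `run_pending` is called.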
DCAPE
PoDRI: Policy-Driven Repository Interoperability
RENCI Federated Data Projects
Leesa Brieger, RENCI
RENCI VO Data Grid
[Diagram: iRODS server with the metadata catalog (iCAT) database at RENCI, Europa Center; iRODS servers at UNC-A, UNC-CH, NCSU, Duke, and ECU]
• Client asks for data
• Data request goes to iRODS server
• Server looks up information in iCAT
• iCAT tells which iRODS server has data
• Data is retrieved from physical location and delivered to client
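The five steps above amount to a catalog lookup followed by a fetch from whichever server holds the data. The sketch assumes a single catalog mapping logical paths to (server, physical path) pairs, with servers stubbed as dicts; `ICat`, `Location`, `client_get`, and the sample paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    server: str         # which iRODS server holds the data
    physical_path: str  # where that server stores it


class ICat:
    """Single metadata catalog: logical path -> physical location."""

    def __init__(self) -> None:
        self._entries: dict[str, Location] = {}

    def register(self, logical: str, location: Location) -> None:
        self._entries[logical] = location

    def locate(self, logical: str) -> Location:
        try:
            return self._entries[logical]
        except KeyError:
            raise FileNotFoundError(logical) from None


def client_get(icat: ICat, servers: dict[str, dict[str, bytes]], logical: str) -> bytes:
    """Client asks; server consults the iCAT; data comes from the physical location."""
    loc = icat.locate(logical)
    return servers[loc.server][loc.physical_path]
```

The client only ever names the logical path; moving data between servers would mean updating the catalog entry, not changing client code.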
National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)
Federation of Seven Independent Data Grids
[Diagram: seven independent data grids, each with its own iCAT, at UMD, UCSD, Georgia Tech, NARA I, NARA II, Rocket Center, and UNC]
• Extensible environment: can federate with additional research and education sites.
• Each data grid uses different vendor products.
Federated Repositories
TUCASI Infrastructure Project (TIP)
Goals
• Leverage data resources for competitive research and leadership
• Support research and education efforts in a wide range of disciplines and domains
• National leadership in next-generation data management
• Model for long-term campus storage
• Architecture and design; hardware, software
• Operations and support
• Data policies: selection and retention; ingest, curation, and preservation; collections and repository management
A Test: classroom content on a DICE/RENCI data grid (Panopto, Elluminate)
Interfaces: Jargon, Web, REST, SOAP
Mike Conway, DICE Center: Jargon, Java, interface developer
Goals
• Make integration simple by creating a clear, familiar service API.
• Make iRODS a familiar, easy-to-use resource for mid-tier Java developers.
• Develop a REST/SOAP service model for common use cases using mature tools.
• Create an out-of-the-box web interface that makes iRODS easy for administrators and archivists.
Currently...
• Jargon is a pure-Java API that talks to iRODS over Java sockets.
• Jargon is fairly low-level and can be tricky at first.
• It is used in multiple projects, including the WebDAV interface and integration with the Fedora repository via the irodsfedora library.
Jargon (next...)
• Jargon-core: Jargon re-factored; high-level service API, POJOs, Spring-friendly, emphasis on testability
• Jargon-akubra: implementation of an Akubra module for iRODS via Jargon
• Jargon-lingo: application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and web interfaces to iRODS
Conceptual Diagram
[Diagram components: iRODS grid; Jargon-core; Jargon-lingo; Jargon-akubra; custom code (Java, Groovy, Jython, JRuby, etc.); DuraSpace frameworks; Web; SOAP/REST; iRODS service model]
TRLN Partners Questionnaire
Respondents: NC State (Jim Tuttle); Duke (Seth Shaw); Duke (Winston Atkins); Duke (Russell Koonts); UNC (Will Owen)
1. Preservation projects
• Geo NDIIPP; images; e-theses; dissertations
• Records; TRAC; 30 criteria
• Fedora, iRODS; checksum; 2 copies
• CDR
2. Status
• Planned; planned; production
• ½ way; testing phase; near production
3. Preservation challenges
• Permission; auditing; replication
• Search/browse; version control
• Policies; tiered storage
• Getting the backlog
• Generating metadata; consolidating metadata; preservation planning; system reliability
4. iRODS: no; no; no; yes; yes
5. iRODS challenges: NA; NA; NA; none; rules syntax, documentation, production configuration, stable release
6. Questions: none; none; none; working with archivists; maintenance releases; iRODS book