Presentations • Introduction • Case Studies: – Policies, Services, Interoperability, Mashups: • BNF, DCAPE, PoDRI, e-Legacy – RENCI Federated Data Projects: • NARA TPAP, RENCI VO, TIP – Interfaces: • Islandora, Jargon, CDR

Upload: estefany-landry

Post on 28-Mar-2015


Page 1: Presentations Introduction Case Studies: – Policies, Services, Interoperability, Mashups: BNF, DCAPE, PoDRI, e-Legacy – RENCI Federated Data Projects:

Presentations

• Introduction
• Case Studies:
  – Policies, Services, Interoperability, Mashups:
    • BNF, DCAPE, PoDRI, e-Legacy
  – RENCI Federated Data Projects:
    • NARA TPAP, RENCI VO, TIP
  – Interfaces:
    • Islandora, Jargon, CDR

Page 2

A Unified Web Interface for Browsing or Searching

Flickr file system: /flickr/commons/

Uses the Flickr API, a RESTful web API

Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the flickr API, presented to iRODS as if it were a file system

For a collection to integrate, it would need to have some remote API that we could write a driver for and one or more ways to map that collection into a tree

Each mountable service is made into a resource with all relevant info (location, resource type, etc.).

iRODS federates major collections (from Ken Arnold, SHAMAN project)

YouTube: media accessible through API

User Sees Single Hierarchy

New Service: mountable file system (Hulu, Photobucket, etc.)
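The mapping described on this slide, where a path under a mount point translates into a call to a remote API, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual iRODS driver: `FakePhotoApi`, `MountedService`, and their methods are hypothetical names, and the remote API is replaced by an in-memory stand-in so the example runs offline.

```python
from dataclasses import dataclass


class FakePhotoApi:
    """Hypothetical stand-in for a remote RESTful photo API (no network calls)."""

    def __init__(self, photos_by_institution):
        self._photos = photos_by_institution

    def list_institutions(self):
        return sorted(self._photos)

    def list_photos(self, institution):
        return list(self._photos.get(institution, []))


@dataclass
class MountedService:
    """Presents a remote API as a tree rooted at a mount point."""

    mount_point: str     # e.g. "/flickr/commons"
    resource_type: str   # descriptive info kept with the resource
    api: FakePhotoApi

    def listdir(self, path):
        """Translate a path under the mount point into one API call."""
        inside = path == self.mount_point or path.startswith(self.mount_point + "/")
        if not inside:
            raise FileNotFoundError(path)
        rest = path[len(self.mount_point):].strip("/")
        if rest == "":
            # The mount root lists one "folder" per institution.
            return self.api.list_institutions()
        if "/" not in rest:
            # An institution "folder" lists that institution's photos.
            return self.api.list_photos(rest)
        raise NotADirectoryError(path)


api = FakePhotoApi({"Library A": ["photo1.jpg", "photo2.jpg"], "Museum B": ["map.png"]})
service = MountedService("/flickr/commons", "mounted web service", api)
print(service.listdir("/flickr/commons"))
print(service.listdir("/flickr/commons/Museum B"))
```

Integrating another collection would, in this sketch, mean supplying a different API object and a different rule for mapping paths to calls.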

Page 3

iRODS Shows Unified “Virtual Collection”

User: with a client, views and manages data

My Data: disk, tape, database, filesystem, etc.

Partner’s Data: remote disk, tape, filesystem, etc.

User Sees Single “Virtual Collection”

The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.

Page 4

Accessing Data in the iRODS System

User with iRODS client: searches the catalog to find and get data

“I need data!”

“Finds the data.”

“Gets data to user.”

Data Server: disk, tape, database, filesystem, etc.

iRODS Metadata Catalog: keeps track of data

iRODS Data System

Users can search for, access, add/extract metadata, annotate, analyze & process, replicate, copy, share data, manage & track access, subscribe, and more.

Page 5

Overview of iRODS Components

User Interface: web or GUI client to access and manage data & metadata*

iRODS Server: data on disk

iRODS Metadata Catalog: database that tracks the state of data

iRODS Rule Engine: implements policies

*Access data with: web-based browser, iRODS GUI, command-line clients, DSpace, Fedora, Kepler workflow, WebDAV, user-level file system, etc.

Page 6

"Layers" in iRODS: From Users to Storage

Community: decides how to manage shared Collection(s)

Policies: express goals for data access, sharing, preservation, etc.

Rules: implement Policies in computer-actionable form

Micro-services: operate on remote data; the iRODS Server executes Micro-services

Page 7

Under the hood - a glimpse

Diagram: iRODS Servers, each with a Rule Engine, at NC State, Duke, and Chapel Hill; Metadata Catalog (DB)

• User asks for data (using logical properties)
• Data request goes to 1st server
• Server looks up information in catalog
• Catalog reports that a 2nd federated server has the data
• 1st server asks 2nd server for data
• 2nd server applies Rules and serves data
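The request flow on this slide can be illustrated with a small in-memory model. It is only a sketch of the sequence of steps: the `Catalog` and `Server` classes, their methods, and the sample file name are hypothetical and do not reflect iRODS internals.

```python
class Catalog:
    """Tracks which server holds each logical name (stand-in for the metadata catalog)."""

    def __init__(self):
        self._holder = {}

    def register(self, logical_name, server_name):
        self._holder[logical_name] = server_name

    def locate(self, logical_name):
        return self._holder[logical_name]


class Server:
    def __init__(self, name, catalog, grid):
        self.name = name
        self.catalog = catalog
        self.grid = grid      # shared dict: server name -> Server
        self.storage = {}     # logical name -> bytes held at this site
        self.rules = []       # callables run before data is served
        grid[name] = self

    def put(self, logical_name, data):
        self.storage[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def serve(self, logical_name, user):
        # The server that holds the data applies its own rules, then serves it.
        for rule in self.rules:
            rule(logical_name, user)
        return self.storage[logical_name]

    def request(self, logical_name, user):
        # The first server consults the catalog, then asks the holding server.
        holder = self.catalog.locate(logical_name)
        return self.grid[holder].serve(logical_name, user)


catalog, grid = Catalog(), {}
ncstate = Server("NC State", catalog, grid)
duke = Server("Duke", catalog, grid)

audit_log = []
duke.rules.append(lambda name, user: audit_log.append((user, name)))
duke.put("survey/2010.csv", b"id,value\n1,42\n")

# The user asks the NC State server by logical name; Duke ends up serving the bytes.
data = ncstate.request("survey/2010.csv", user="alice")
```

The user never names a physical location: the logical name is resolved through the catalog, and the rules that run are those of the server holding the data.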

Page 8

Policies in iRODS

• Policies: express community goals for data access and sharing, management, long-term preservation, uses, etc.
• Policy Examples
  – Run a particular workflow when a “set of files” is ingested into a collection (e.g. make thumbnails of images, post to website).
  – Automatically replicate a file added to a collection into 3 geographically distributed sites.
  – Automatically extract metadata for a file of a certain type and store in metadata catalog.
  – Periodically check integrity of files in a Collection and repair/replace if needed/possible.
  – Automatically pick a certain storage location based on user or collection or size or type.
  – Let a user access a collection only if using certificate-based login.
  – Send a notification when a certain file is ingested.
  – etc.
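Two of the policy examples above (replicate on ingest, notify on ingest) can be sketched as event-triggered rules. iRODS expresses rules in its own rule language; the Python below only illustrates the idea of attaching actions to an event, and the `RuleEngine` class, the event name `"ingest"`, and the site names are all hypothetical.

```python
from collections import defaultdict


class RuleEngine:
    """Toy rule engine: rules are functions registered against a named event."""

    def __init__(self):
        self._rules = defaultdict(list)

    def on(self, event):
        def decorator(func):
            self._rules[event].append(func)
            return func
        return decorator

    def fire(self, event, **context):
        # Run every rule registered for this event, in registration order.
        for rule in self._rules[event]:
            rule(**context)


engine = RuleEngine()
sites = {"site-a": {}, "site-b": {}, "site-c": {}}   # three storage locations
notifications = []


@engine.on("ingest")
def replicate_to_all_sites(collection, name, data):
    # Policy: a file added to a collection is replicated to 3 sites.
    for store in sites.values():
        store[(collection, name)] = data


@engine.on("ingest")
def notify(collection, name, data):
    # Policy: send a notification when a file is ingested.
    notifications.append(f"ingested {collection}/{name}")


engine.fire("ingest", collection="images", name="a.png", data=b"\x89PNG")
```

Adding another policy in this sketch means registering one more function for the event, without changing the code that performs the ingest.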

Page 9

Policies, Services, Interoperability, Mashups:

Richard Marciano, SILS

Page 10

e-Legacy Mashup

RSS, Feed Reader, Data Grid (SRB/iRODS)

Page 11

e-Legacy Demo

Appraisal, Description, Arrangement, Preservation

Subscribe to RSS

Review Received Entry

Share and Tag

Meet Preservation Criteria

Yes: Preserve to iRODS
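The demo flow on this slide (subscribe to a feed, review each entry, preserve the ones that meet the criteria) might look roughly like the following. Everything specific here is an assumption for illustration: the inline feed, the rule that an entry tagged `records` meets the preservation criteria, and the dictionary standing in for the data grid.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs offline.
FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Budget report</title><link>http://example.org/budget</link><category>records</category></item>
<item><title>Lunch menu</title><link>http://example.org/menu</link></item>
</channel></rss>"""


def read_entries(feed_xml):
    """Parse the feed into one dict per item (title, link, tags)."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "tags": [c.text for c in item.findall("category")],
        }
        for item in root.iter("item")
    ]


def meets_preservation_criteria(entry):
    # Hypothetical criterion: only entries tagged "records" are preserved.
    return "records" in entry["tags"]


def preserve(entries, grid):
    """Store qualifying entries in the grid (a plain dict in this sketch)."""
    for entry in entries:
        if meets_preservation_criteria(entry):
            grid[entry["link"]] = entry
    return grid


grid = preserve(read_entries(FEED), {})
```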

Page 12

National Library of France: Distributed Archiving & Preservation System (SPAR)

Page 13

BNF: French National Library

• Three rules:
  – Import
    • Import an input document into iRODS
    • Add import date and checksum as AVU-triplet metadata
    • Replicate to other resources
  – Get
    • Locate a copy of the record
    • Return it if the physical checksum equals the stored checksum
    • If not, delete the replica and copy a good one over it
  – Audit
    • Locate all replicas of a data object
    • Compute a physical checksum using the system’s MD5
    • Compare the result with the checksum stored in user metadata
    • All stale copies are removed and then replicated from another good copy
    • When all copies are audited, a clean copy is staged onto a specific FS directory
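The three rules above can be sketched with Python's standard `hashlib`. This is a minimal in-memory model, not the BNF implementation: the `Archive` class, the resource names, and the sample document are hypothetical, replicas are plain byte strings, and the final staging step of the audit rule is left out.

```python
import hashlib
from datetime import date


def md5sum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class Archive:
    def __init__(self, resources):
        # resource name -> {object name -> bytes}; each entry is one replica
        self.replicas = {r: {} for r in resources}
        self.metadata = {}  # object name -> {"checksum": ..., "import_date": ...}

    def import_object(self, name, data):
        """Import: record checksum and import date, then replicate everywhere."""
        self.metadata[name] = {
            "checksum": md5sum(data),
            "import_date": date.today().isoformat(),
        }
        for store in self.replicas.values():
            store[name] = data

    def _is_good(self, name, store):
        expected = self.metadata[name]["checksum"]
        return name in store and md5sum(store[name]) == expected

    def _good_copy(self, name):
        for store in self.replicas.values():
            if self._is_good(name, store):
                return store[name]
        raise IOError(f"no valid replica of {name}")

    def get(self, name, resource):
        """Get: return the replica if its checksum matches, else repair it first."""
        store = self.replicas[resource]
        if not self._is_good(name, store):
            store[name] = self._good_copy(name)
        return store[name]

    def audit(self, name):
        """Audit: overwrite every stale replica; return the repaired resources."""
        good = self._good_copy(name)
        repaired = []
        for resource, store in self.replicas.items():
            if not self._is_good(name, store):
                store[name] = good
                repaired.append(resource)
        return repaired


archive = Archive(["disk1", "disk2", "tape"])
archive.import_object("doc.xml", b"<doc>hello</doc>")

archive.replicas["disk2"]["doc.xml"] = b"<doc>hellX</doc>"  # simulate corruption
repaired = archive.audit("doc.xml")

archive.replicas["tape"]["doc.xml"] = b"garbage"            # corrupt another replica
restored = archive.get("doc.xml", "tape")
```

In this model the stored checksum in the metadata is the reference point: a replica is "stale" whenever its freshly computed MD5 differs from it.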


Page 15

BNF: French National Library

• Micro-Services
  – Add metadata to an iRODS object
  – Import an object into iRODS, compute its MD5 checksum, and validate it against the supplied one. Once validated, add MD5SUM and import date as metadata. If invalid, the content is removed from iRODS
  – Return the value of an iRODS object metadata attribute
  – Prepare to retrieve a metadata attribute for a resource
  – Prepare to retrieve a metadata attribute for an object
  – Get the input resources belonging to a zone name
  – Get iCAT results regarding location info for a record
  – Execute MD5SUM on the physical content and return the value
  – Return a pseudo-random string of specified length
  – Delete a stale replica and replicate over it from another fresh copy
  – Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution)
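The difference between eager and lazy replacement in the last item can be shown with a small queue. This is only a sketch of the two execution modes; the `Repairer` class and its methods are hypothetical and not part of the BNF micro-services.

```python
from collections import deque


class Repairer:
    """Runs a repair immediately (eager) or queues it for later (lazy)."""

    def __init__(self):
        self.pending = deque()

    def replace_stale(self, repair, eager=True):
        if eager:
            repair()                      # synchronous execution
        else:
            self.pending.append(repair)   # delayed execution

    def run_pending(self):
        """Execute all queued repairs; return how many were run."""
        count = 0
        while self.pending:
            self.pending.popleft()()
            count += 1
        return count


log = []
repairer = Repairer()
repairer.replace_stale(lambda: log.append("replica A repaired"), eager=True)
repairer.replace_stale(lambda: log.append("replica B repaired"), eager=False)
after_submit = list(log)          # only the eager repair has happened so far
ran = repairer.run_pending()      # the lazy repair happens here
```

Eager replacement keeps every replica valid before the caller continues, while lazy replacement lets the caller proceed and defers the copy to a later pass.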

Pages 16-18

DCAPE

Page 19

PoDRI: Policy-Driven Repository Interoperability

Page 20

RENCI Federated Data Projects

Leesa Brieger, RENCI

Page 21

RENCI VO Data Grid

iRODS Server with Metadata Catalog (iCAT) DB: RENCI, Europa Center

iRODS Servers: UNC-A, UNC-CH, NCSU, Duke, ECU

• Client asks for data
• Data request goes to iRODS server
• Server looks up information in iCAT
• iCAT tells which iRODS server has data
• Data is retrieved from physical location and delivered to client

Page 22

National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)

Federation of Seven Independent Data Grids

Sites, each with its own iCAT: UMD, UCSD, Georgia Tech, NARA II, NARA I, Rocket Center, UNC

• Extensible Environment: can federate with additional research and education sites.
• Each data grid uses different vendor products.

Page 23

Federated Repositories

TUCASI Infrastructure Project (TIP)

Page 24

TUCASI Infrastructure Project (TIP)

Goals

• Leverage data resources for competitive research and leadership
• Support research and education efforts in a wide range of disciplines and domains
• National leadership in next-generation data management
• Model for long term campus storage
• Architecture and design; hardware, software
• Operations and support
• Data policies
  – Selection and retention
  – Ingest, curation and preservation
  – Collections and repository management

Page 25

A Test

Classroom content on a DICE/RENCI data grid

Panopto, Elluminate

Page 26

Interfaces: Jargon, Web, REST, SOAP

Mike Conway, DICE Center

Jargon, Java, Interface Developer

Page 27

Goals

• Make integration simple by creating a clear, familiar service API.
• Make iRODS a familiar, easy-to-use resource to mid-tier Java developers.
• Develop a REST/SOAP service model for common use-cases using mature tools.
• Create an out-of-the-box web interface that makes iRODS easy for administrators and archivists.

Page 28

Currently...

• Jargon is a pure-Java API that talks to iRODS over Java sockets.
• Jargon is fairly low-level and can be tricky at first.
• Used in multiple projects, including the WebDAV interface, as well as integration with the Fedora repository via the irodsfedora library.

Page 29

Jargon (next...)

• Jargon-core: Jargon re-factored
  – High-level service API, POJOs, Spring-friendly
  – Emphasis on testability
• Jargon-akubra: Implementation of an Akubra module for iRODS via Jargon
• Jargon-lingo: Application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and Web interfaces to iRODS.

Page 30

Conceptual Diagram

iRODS Grid

Jargon-core

Jargon-lingo, Jargon-akubra

Custom code (Java, Groovy, Jython, JRuby, etc.)

DuraSpace Frameworks

Web

SOAP/REST

iRODS Service Model

Page 31

TRLN Partners Questionnaire

Respondents: NC State, Jim Tuttle; Duke, Seth Shaw; Duke, Winston Atkins; Duke, Russell Koonts; UNC, Will Owen

1. Preservation Projects
• Geo NDIIPP • Images • e-Theses • Dissertations
• records • TRAC • 30 criteria
• Fedora iRODS • checksum • 2 copies
• CDR

2. Status
• planned • planned • production
• ½ way • testing phase • near production

3. Preservation Challenges
• permission • auditing • replication
• search/browse • version control
• policies • tiered storage
• getting the backlog
• generating metadata • consolidating metadata • preservation planning • system reliability

4. iRODS
• no • no • no • yes • yes

5. iRODS Challenges
• NA • NA • NA • none • rules syntax • documentation • production configuration • stable release

6. Questions
• None • None • None • working with archivists • maintenance releases • iRODS book