San Diego Supercomputer Center
www.iRODS.org
Self-organizing Smart Namespaces : Next Generation Data Grid Systems
Arun Jagatheesan
iRODS.org
Content Outline
• State of the art
  • Where we stand
  • Concepts
• What is next, new, hot and exciting?
  • Yesterday’s research - now
  • Today’s research - future?
• What could be done from OGF, SNIA, IETF?
  • Standard for distributed data management
  • Risks, rewards
State of the art - where we are now (Shameless self-promotion or fact!)
• Estimated 2 petabytes of data brokerage
• Multiple agencies - DoD, NARA, NSF, NIH, …
• Multiple countries - US, UK, Japan, France, …, Antarctica
• Spun off a private company …
We don’t live in the past anyways…
Concepts and Lessons (Current understanding - looking back)
• Don’t hide distributed computing
  • Let users “enjoy” a distributed namespace rather than cheating them with a “location opaque” namespace (unlike traditional file systems)
  • Human-readable, or enjoyable, names (no URLs, UUIDs, etc.)
• Logical mappings to physical heterogeneities
  • Data (files), storage resources, metadata, user groups, policies, and even file systems become logical entities in data grids
  • Hide every physical detail behind logical, human-friendly names
• Keep it simple and scalable (it’s the data model and design)
  • Not a layer on top of another layer. A finished product, not Lego blocks.
  • Hybrid approach - neither too much P2P nor too much centralization. Just the right level of distributed computing with some TLC for users
Content Outline
• State of the art
  • Where we stand
  • Concepts
• What is next, new, hot and exciting?
  • A use case - LSST
  • Yesterday’s research - now
  • Today’s research - future?
• What could be done from OGF, SNIA, IETF?
  • Standard for distributed data management
  • Risks, rewards
Motivational Use Case
• LSST = Large Synoptic Survey Telescope
  • 150+ petabytes
  • Multiple countries, multiple data centers
  • Multiple heterogeneous file systems (high performance, high distribution, interoperability, P2P, …)
  • Multiple heterogeneous hardware
Yesterday’s research
• Data grid workflows and policies
  • Some concepts prototyped in SRB Matrix
  • Event, Condition, Action (ECA) based “data grid flows”
  • Control structures: if, for, for-each, if-else, switch-case
• Server-side workflows on data grids
  • Use a separate language, the Data Grid Language, to capture the recipe of a workflow and execute it as an action
  • Let the flow be with you (a Flow data type was introduced)
Today’s research = future
• Now = lessons learnt + yesterday’s research
• Allow the logical namespace to reflect the local namespace (local file system logically mounted on the global namespace)
• Allow users to define their own policies and workflows (services, rules)
• iRODS.org - open source platform - world’s first open source Data Grid Management System (DGMS)
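The idea of a local file system logically mounted on the global namespace can be sketched in a few lines. This is a minimal illustration and not the iRODS implementation: the class `LogicalNamespace`, its methods, the resource name `sdsc-disk` and all paths are assumptions invented for the example.

```python
import posixpath


class LogicalNamespace:
    """Toy global namespace onto which local directory trees are mounted."""

    def __init__(self):
        # logical mount point -> (resource name, physical root); hypothetical data
        self._mounts = {}

    def mount(self, logical_prefix, resource, physical_root):
        """Attach a local directory tree under a logical prefix."""
        self._mounts[logical_prefix.rstrip("/")] = (resource, physical_root.rstrip("/"))

    def resolve(self, logical_path):
        """Return (resource, physical_path) using the longest matching mount point."""
        path = posixpath.normpath(logical_path)
        best = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount covers {logical_path!r}")
        resource, root = self._mounts[best]
        return resource, root + path[len(best):]


ns = LogicalNamespace()
ns.mount("/grid/home/alice", "sdsc-disk", "/export/users/alice")
print(ns.resolve("/grid/home/alice/run1/img.fits"))
```

Users only ever see the logical path; the resource name and physical root can change without renaming anything in the global namespace.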
iRODS.org
• It’s all about the namespace and how users or applications interact with it
• What if we made this namespace “smart”?
  • ECA rules + machine learning or bootstrapped learning
  • Event: anything, as simple as a file upload
  • Condition: based on system or user metadata
  • Action: any system-defined or user-defined service
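The Event, Condition, Action pattern above might look like the following sketch. It is a hedged illustration and not the iRODS rule engine or its rule syntax: `Rule`, `RuleEngine`, the event name `file_upload` and the metadata keys are all hypothetical, and the machine learning part is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    event: str                                   # e.g. a file upload
    condition: Callable[[Dict[str, str]], bool]  # test on system or user metadata
    action: Callable[[Dict[str, str]], str]      # any system- or user-defined service


class RuleEngine:
    def __init__(self):
        self.rules: List[Rule] = []

    def register(self, rule):
        self.rules.append(rule)

    def fire(self, event, metadata):
        """Run every action whose rule matches the event and whose condition holds."""
        return [r.action(metadata) for r in self.rules
                if r.event == event and r.condition(metadata)]


engine = RuleEngine()
engine.register(Rule(
    event="file_upload",
    condition=lambda md: md.get("project") == "LSST",
    action=lambda md: f"replicate {md['name']} to archive",  # stand-in for a real service
))
print(engine.fire("file_upload", {"name": "img.fits", "project": "LSST"}))
```

Here the action only returns a description of what it would do; in a real data grid it would invoke a service such as replication or checksum verification.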
iRODS
• Namespace #1 (data)
  • Human-readable data names mapped to data (or virtual data)
• Namespace #2 (resource)
  • Human-readable resource names mapped to storage resources (allows distributed computing)
• Namespace #3 (policies)
  • Human-readable policy namespace describing how data needs to be managed
• Again, everything can be accessed and controlled by end users (not just system admins)
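One way to picture the three namespaces is as three lookup tables that refer to each other only by human-readable names. The sketch below is an assumption made for illustration, not the iRODS catalog schema; every name, host and path in it is hypothetical.

```python
# All names and values here are hypothetical, chosen only for illustration.

# Namespace #1: data name -> names of the resources holding a replica
data_ns = {"/lsst/raw/night001.fits": ["tape-archive", "fast-disk"]}

# Namespace #2: resource name -> physical storage location
resource_ns = {"tape-archive": "host-a:/tape/vol7", "fast-disk": "host-b:/scratch"}

# Namespace #3: policy name -> how data under a prefix must be managed
policy_ns = {"keep-two-copies": {"applies_to": "/lsst/raw", "min_replicas": 2}}


def locate(data_name):
    """Return the physical location of every replica of a logical data name."""
    return [resource_ns[r] for r in data_ns[data_name]]


def violations(policy_name):
    """List data names covered by the policy that have too few replicas."""
    policy = policy_ns[policy_name]
    prefix = policy["applies_to"].rstrip("/") + "/"
    return [name for name, replicas in data_ns.items()
            if name.startswith(prefix) and len(replicas) < policy["min_replicas"]]


print(locate("/lsst/raw/night001.fits"))
print(violations("keep-two-copies"))
```

Because each table is keyed by a readable name, an end user can inspect or change a policy without knowing which host or volume actually stores the data.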
Content Outline
• State of the art
  • Where we stand
  • Concepts
• What is next, new, hot and exciting?
  • A use case - LSST
  • Yesterday’s research - now
  • Today’s research - future?
• What could be done from OGF, SNIA, IETF?
  • Standard for distributed data management
  • Risks, rewards
OGF, SNIA and iRODS.org
• Collaborative data management
  • FAN / data grid??? - either way, it is still distributed data management
  • It still needs a simple, standard API
• Data grid namespace on XAM resources
  • Standardize a simple API (Java, C/C++) to provide data grid concepts on top of existing SNIA XAM or products
• Open source data grid software
  • Involve engineers from different participating member organizations
• Multi-institutional participation
  • Multiple countries, multiple companies, academic and commercial participants
Enthusiasm is contagious
http://www.iRODS.org