Self-organizing Smart Namespaces: Next Generation Data Grid Systems
Arun Jagatheesan, San Diego Supercomputer Center, www.iRODS.org


Post on 15-Jan-2016


Page 1

San Diego Supercomputer Center, www.iRODS.org

Self-organizing Smart Namespaces : Next Generation Data Grid Systems

Arun Jagatheesan

iRODS.org

Page 2

Content Outline

• State of the art
  • Where we stand
  • Concepts
• What is next, new, hot and exciting?
  • Yesterday’s research - now
  • Today’s research - future?
• What could be done from OGF, SNIA, IETF??
  • Standard for distributed data management
  • Risks, rewards

Page 3

State of the art - where we are now
(Shameless self promotion or fact!)

• Estimated 2 petabytes of data brokerage
• Multiple agencies - DoD, NARA, NSF, NIH, …
• Multiple countries - US, UK, Japan, France, …, Antarctica
• Spun off a private company …

We don’t live in the past anyways…

Page 4

Concepts and Lessons
(Current understanding - looking back)

• Don’t hide distributed computing
  • Allows users to “enjoy” a distributed namespace rather than cheat them with a “location opaque” namespace (unlike traditional file systems)
  • Human readable or enjoy-able (no URLs, UUIDs, etc.)
• Logical mappings to physical heterogeneities
  • Data (files), storage resources, metadata, user groups, policies, and even file systems become logical entities in data grids
  • Hide everything behind logical, human-friendly names
• Keep it simple and scalable (It’s the data model & design)
  • Not a layer on top of another layer. Finished product, not lego blocks.
  • Hybrid approach - Neither too much P2P nor too much centralization. Just the right level of distributed computing with some TLC for users
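
Speaker-note sketch: one way to picture "logical mappings to physical heterogeneities" is a catalog that maps a human-readable logical name to one or more physical replicas. This is a minimal illustration, not the SRB or iRODS catalog; the class names, resource names, and paths below are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """One physical copy of a data object (all fields are illustrative)."""
    resource: str       # logical name of a storage resource
    physical_path: str  # where the bytes live on that resource


@dataclass
class LogicalNamespace:
    """Maps human-readable logical names to their physical replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        # A logical name may have several replicas on different resources.
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> List[Replica]:
        # Users work with the logical name; the catalog does the lookup.
        return list(self.entries.get(logical_name, []))


# Hypothetical data: one logical file with two physical copies.
ns = LogicalNamespace()
ns.register("/home/alice/survey/image-001.fits",
            Replica("sdsc-disk", "/vault/a7/00041.dat"))
ns.register("/home/alice/survey/image-001.fits",
            Replica("uk-tape", "/archive/2007/x/9912"))
```

The user-facing name stays the same regardless of how many replicas exist or where they are stored, which is the point of the logical layer.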

Page 5

Content Outline

• State of the art
  • Where we stand
  • Concepts
• What is next, new, hot and exciting?
  • A use case - LSST
  • Yesterday’s research - now
  • Today’s research - future?
• What could be done from OGF, SNIA, IETF??
  • Standard for distributed data management
  • Risks, rewards

Page 6

Motivational Use Case

• LSST = Large Synoptic Survey Telescope
• 150+ Petabytes
• Multiple countries, multiple data centers
• Multiple heterogeneous file systems (high performance, high distribution, interoperability, P2P, …)
• Multiple heterogeneous hardware

Page 7

Yesterday’s research

• Data Grid Workflows and policies
  • Some concepts prototyped in SRB Matrix
  • Event, Condition, Action (ECA) based “data grid flows”
    • If, for, for-each, if-else, switch-case
• Server-side workflows on data grids
  • Use a separate language to capture the recipe of workflow and execute it as action - Data Grid Language
  • Let the flow be with you (Flow data type was introduced)
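
Speaker-note sketch: the idea of a "flow" as a data type with nested control structures can be illustrated roughly as below. This is not Data Grid Language syntax and not the SRB Matrix implementation; `Flow`, `Step`, `ForEach`, and `IfElse` are hypothetical names chosen for the illustration, and only two of the listed control structures are shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]  # variables shared by every step in a flow


@dataclass
class Step:
    """A leaf of the flow: one action executed against the context."""
    action: Callable[[Context], None]

    def run(self, ctx: Context) -> None:
        self.action(ctx)


@dataclass
class Flow:
    """A sequence of children (steps or control structures) run in order."""
    children: List[Any]

    def run(self, ctx: Context) -> None:
        for child in self.children:
            child.run(ctx)


@dataclass
class ForEach:
    """Runs the body once per item, binding each item to ctx[var]."""
    items_key: str
    var: str
    body: Flow

    def run(self, ctx: Context) -> None:
        for item in ctx[self.items_key]:
            ctx[self.var] = item
            self.body.run(ctx)


@dataclass
class IfElse:
    """Picks one of two sub-flows based on a condition over the context."""
    condition: Callable[[Context], bool]
    then: Flow
    otherwise: Flow

    def run(self, ctx: Context) -> None:
        (self.then if self.condition(ctx) else self.otherwise).run(ctx)


# Hypothetical recipe: sort uploaded files into "replicate" and "skipped".
classify = Flow([
    ForEach("files", "current", Flow([
        IfElse(
            condition=lambda ctx: ctx["current"].endswith(".fits"),
            then=Flow([Step(lambda ctx: ctx["replicate"].append(ctx["current"]))]),
            otherwise=Flow([Step(lambda ctx: ctx["skipped"].append(ctx["current"]))]),
        ),
    ])),
])

ctx: Context = {"files": ["a.fits", "notes.txt", "b.fits"],
                "replicate": [], "skipped": []}
classify.run(ctx)
```

Because the recipe is plain data, it could in principle be sent to a server and executed there, which is the server-side workflow idea on the slide.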

Page 8

Today’s research = future

• Now = Lessons learnt + yesterday’s research
• Allow logical namespace to reflect local namespace (local file system logically mounted on global namespace)
• Allow users to define their own policies and workflows (Services, rules)
• iRODS.org - Open source platform - world’s first open source Data Grid Management System (DGMS).
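
Speaker-note sketch: "local file system logically mounted on global namespace" can be pictured as a mount table that translates a global logical path into a resource plus a local path. This is an assumption-laden illustration, not how iRODS implements mounted collections; `MountTable`, the resource names, and all paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple


class MountTable:
    """Maps logical prefixes in the global namespace to local file systems."""

    def __init__(self) -> None:
        # logical prefix -> (resource name, local root directory)
        self._mounts: Dict[PurePosixPath, Tuple[str, PurePosixPath]] = {}

    def mount(self, logical_prefix: str, resource: str, local_root: str) -> None:
        self._mounts[PurePosixPath(logical_prefix)] = (
            resource, PurePosixPath(local_root))

    def resolve(self, logical_path: str) -> Tuple[str, str]:
        path = PurePosixPath(logical_path)
        # Collect every mount whose prefix covers this path.
        candidates = [p for p in self._mounts
                      if p == path or p in path.parents]
        if not candidates:
            raise KeyError(f"no mount covers {logical_path}")
        # The longest prefix wins, so nested mounts override outer ones.
        prefix = max(candidates, key=lambda p: len(p.parts))
        resource, local_root = self._mounts[prefix]
        return resource, str(local_root / path.relative_to(prefix))


# Hypothetical mounts: two sites exposed under one global tree.
mounts = MountTable()
mounts.mount("/grid/lsst", "chile-fs", "/data/raw")
mounts.mount("/grid/lsst/calib", "sdsc-fs", "/export/calib")
```

The local directory layout shows through under the mount point, so the logical namespace reflects the local one instead of replacing it.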

Page 9

iRODS.org

• It’s all about the namespace and how users or applications interact with it
• What if we made this namespace “smart”?
  • ECA Rules + Machine Learning or bootstrapped learning
  • Event: anything, as simple as a file upload
  • Condition: based on system or user metadata
  • Action: any system-defined or user-defined service
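
Speaker-note sketch: the Event, Condition, Action pattern above could look roughly like this in code. It is not the iRODS rule language or rule engine, and it leaves out the learning part entirely; `Rule`, `SmartNamespace`, the event names, and the metadata keys are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]  # system or user metadata attached to an event


@dataclass
class Rule:
    event: str                             # e.g. "upload"
    condition: Callable[[Metadata], bool]  # test on the metadata
    action: Callable[[Metadata], str]      # service to invoke; returns a log line


class SmartNamespace:
    """A namespace that reacts to events by running matching rules."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, metadata: Metadata) -> List[str]:
        # Run every rule registered for this event whose condition holds.
        return [rule.action(metadata)
                for rule in self.rules
                if rule.event == event and rule.condition(metadata)]


# Hypothetical policies: large uploads go to tape; alice gets notified.
smart = SmartNamespace()
smart.add_rule(Rule(
    event="upload",
    condition=lambda md: md.get("size_bytes", 0) > 10**9,
    action=lambda md: f"replicate {md['name']} to tape",
))
smart.add_rule(Rule(
    event="upload",
    condition=lambda md: md.get("owner") == "alice",
    action=lambda md: f"notify alice about {md['name']}",
))

results = smart.fire("upload", {"name": "/grid/lsst/img.fits",
                                "size_bytes": 2 * 10**9,
                                "owner": "bob"})
```

Here only the first rule fires, since the file is large but not owned by alice. Rules are ordinary values, so end users could add their own without touching the server code.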

Page 10

iRODS

• Namespace #1 (data)
  • Human readable data names to data (or virtual data)
• Namespace #2 (resource)
  • Human readable resource names to storage resource (allows distributed computing)
• Namespace #3 (policies)
  • Human readable policy namespace of how data needs to be managed
• Again, everything can be accessed and controlled by end-users (not just system admins)

Page 11

Content Outline

• State of the art
  • Where we stand
  • Concepts
• What is next, new, hot and exciting?
  • A use case - LSST
  • Yesterday’s research - now
  • Today’s research - future?
• What could be done from OGF, SNIA, IETF??
  • Standard for distributed data management
  • Risks, rewards

Page 12

OGF, SNIA and iRODS.org

• Collaborative data management
  • FAN / Data grid??? - but still distributed data management
  • But still needs a simple API as a standard
• Data grid namespace on XAM resources
  • Standardize a simple API (Java, C/C++) to provide data grid concepts on top of existing SNIA XAM or products
• Open source data grid software
  • Involve engineers from different participating member organizations
• Multi-institutional participation
  • Multiple countries, multiple companies, academic and commercial participants

Page 13

Enthusiasm is contagious

http://www.iRODS.org