Science for the Future: Strategies for Moving and Sharing Data


Upload: ian-foster

Post on 10-May-2015

DESCRIPTION

A talk at the National User Facility Organization (NUFO) 2013 meeting at LBNL, where the theme this year is "the future of scientific data."

TRANSCRIPT

Page 1: Science for the Future: Strategies for Moving and Sharing Data

globus online

Science for the Future

Strategies for distributing and sharing data

www.globusonline.org

Ian Foster [email protected]

Page 2: Science for the Future: Strategies for Moving and Sharing Data

Big science data should be easy

[Diagram of storage systems: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror]

Page 3: Science for the Future: Strategies for Moving and Sharing Data

… but it’s hard and frustrating!

[Diagram of storage systems: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror, annotated with errors:]

• Quota exceeded!

• Expired credentials!

• Network failed. Retry.

• Permission denied!

Page 4: Science for the Future: Strategies for Moving and Sharing Data

Excerpts from ESnet reports

• “Transfers often take longer than expected based on available network capacities”

• “Lack of an easy to use interface to some of the high-performance tools”

• “Tools [are] too difficult to install and use”

• “Time and interruption to other work required to supervise large data transfers”

• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”

Page 5: Science for the Future: Strategies for Moving and Sharing Data

We envisage a world where data …

… flows rapidly, reliably, and securely among:

experimental facilities, online and archival storage, computing facilities, and remote institutions

Page 6: Science for the Future: Strategies for Moving and Sharing Data

We envisage a world where data …

… is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it

Page 7: Science for the Future: Strategies for Moving and Sharing Data

We envisage a world where data …

… is readily discoverable and accessible to collaborators, regardless of their and the data’s location

Page 8: Science for the Future: Strategies for Moving and Sharing Data

We believe a new approach is needed to deliver data management infrastructure

• Frictionless

• Affordable

• Sustainable

Like … but for science!

Page 9: Science for the Future: Strategies for Moving and Sharing Data

Focusing on “frictionless”, we’ve started to do this with the Globus Online service …

Transfer and sharing of large data sets …

… with dropbox-like characteristics …

… directly from your own storage systems

Page 10: Science for the Future: Strategies for Moving and Sharing Data

We started with reliable, secure, high-performance file transfer …

Data Source → Data Destination

1. User initiates transfer request

2. Globus Online moves and syncs files

3. Globus Online notifies user
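
The three steps on this slide (request, move and sync, notify) can be illustrated with a small local sketch. This is not how Globus Online is implemented; it only shows the "sync" idea of copying files that are missing or different at the destination. The function names `sync_tree`, `notify`, and `demo`, and the use of SHA-256 checksums to detect changes, are assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_tree(source: Path, destination: Path) -> list[str]:
    """Step 2: copy files that are missing or differ at the destination.

    Returns the relative paths (POSIX style) that were copied.
    """
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # already in sync, nothing to do
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied


def notify(user: str, copied: list[str]) -> str:
    """Step 3: build a completion message (a real service might send email)."""
    return f"{user}: transfer complete, {len(copied)} file(s) copied"


def demo() -> tuple[list[str], list[str], str]:
    """Step 1: a hypothetical user 'alice' requests a transfer, twice."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "source"), Path(tmp, "destination")
        (src / "run1").mkdir(parents=True)
        (src / "run1" / "a.dat").write_bytes(b"alpha")
        (src / "b.dat").write_bytes(b"beta")
        first = sync_tree(src, dst)   # copies both files
        second = sync_tree(src, dst)  # nothing left to copy
        return first, second, notify("alice", first)
```

Running the sync a second time copies nothing, which is the property that makes an interrupted transfer safe to repeat.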

Page 11: Science for the Future: Strategies for Moving and Sharing Data

… and then made it simple to share big data off existing storage systems

Data Source

1. User A selects file(s) to share, selects user or group, and sets permissions

2. Globus Online tracks shared files; no need to move files to cloud storage!

3. User B logs in to Globus Online and accesses shared file
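
The sharing model on this slide keeps the data where it is and records only who may access which paths. A minimal sketch of that bookkeeping follows, assuming a simple rule list of (path prefix, user or group, permissions). The class `SharedEndpoint`, its methods, and the `user:` / `group:` naming are hypothetical and are not the Globus Online API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEndpoint:
    """Hypothetical record of sharing rules; the files themselves never move."""

    # Group name -> set of member user names.
    groups: dict[str, set[str]] = field(default_factory=dict)
    # Each rule is (path prefix, principal, permissions),
    # e.g. ("/data/run1/", "group:tomography", "r").
    rules: list[tuple[str, str, str]] = field(default_factory=list)

    def share(self, prefix: str, principal: str, permissions: str) -> None:
        """Step 1: User A selects a path, a user or group, and permissions."""
        self.rules.append((prefix, principal, permissions))

    def principals_for(self, user: str) -> set[str]:
        """All names a user can act as: themselves plus their groups."""
        names = {f"user:{user}"}
        names.update(f"group:{g}" for g, members in self.groups.items() if user in members)
        return names

    def can(self, user: str, path: str, permission: str) -> bool:
        """Step 3: check whether a logged-in user may access a shared path."""
        who = self.principals_for(user)
        return any(
            path.startswith(prefix) and principal in who and permission in perms
            for prefix, principal, perms in self.rules
        )
```

For example, after `ep = SharedEndpoint(groups={"tomography": {"bob"}})` and `ep.share("/data/run1/", "group:tomography", "r")`, the call `ep.can("bob", "/data/run1/scan.h5", "r")` is true while a write check for the same path is false. Prefixes end with a slash in this sketch so that `/data/run1/` does not also match `/data/run10/`.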

Page 12: Science for the Future: Strategies for Moving and Sharing Data

Early adoption is encouraging

Page 13: Science for the Future: Strategies for Moving and Sharing Data

Early adoption is encouraging

~18 PB and 1B files moved

10x (or better) performance vs. scp

99.9% availability
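
To put these figures in perspective, a short calculation gives the implied average file size and the downtime that 99.9% availability allows over a year. It assumes decimal units (1 PB = 10^15 bytes) and reads "1B" as one billion files; the slide states neither convention, and the function names are illustrative.

```python
PB = 10**15  # decimal petabyte (assumption; the slide does not say which convention)
HOURS_PER_YEAR = 365 * 24


def average_file_bytes(total_bytes: float, total_files: int) -> float:
    """Mean file size implied by a total volume and a file count."""
    return total_bytes / total_files


def allowed_downtime_hours(availability: float) -> float:
    """Hours per (non-leap) year a service may be down at the given availability."""
    return (1 - availability) * HOURS_PER_YEAR


avg = average_file_bytes(18 * PB, 10**9)  # 18 million bytes, i.e. about 18 MB per file
downtime = allowed_downtime_hours(0.999)  # roughly 8.76 hours per year
```

An average near 18 MB suggests a workload dominated by many modest files rather than a few huge ones, although the slide gives no distribution.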

Page 14: Science for the Future: Strategies for Moving and Sharing Data
Page 15: Science for the Future: Strategies for Moving and Sharing Data

B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC

Page 16: Science for the Future: Strategies for Moving and Sharing Data

Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience

Page 17: Science for the Future: Strategies for Moving and Sharing Data

Exemplar: APS Beamline 2-BM

X-Ray imaging, tomography, ~few µm to 30nm resolution

Currently can generate >100TB per day

<1GB/s data rate; ~3-5GB/s in 5-10 years
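
The daily volume and the data rate quoted here can be related by a unit conversion. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a rate sustained for a full 24 hours, neither of which the slide specifies; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day_to_gb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in TB to an equivalent sustained rate in GB/s."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e9


def gb_per_s_to_tb_per_day(gb_per_s: float) -> float:
    """Convert a sustained rate in GB/s to a daily volume in TB."""
    return gb_per_s * 1e9 * SECONDS_PER_DAY / 1e12


rate_for_100_tb = tb_per_day_to_gb_per_s(100)  # about 1.16 GB/s
volume_at_1_gb_s = gb_per_s_to_tb_per_day(1)   # about 86.4 TB/day
volume_at_5_gb_s = gb_per_s_to_tb_per_day(5)   # about 432 TB/day
```

Under these assumptions 100 TB per day corresponds to roughly 1.16 GB/s sustained, so the slide's two figures are of the same order; they are rounded and may describe different operating conditions. At the projected 3-5 GB/s, a beamline running continuously would produce a few hundred TB per day.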

Page 18: Science for the Future: Strategies for Moving and Sharing Data

Transforming data acquisition

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Page 19: Science for the Future: Strategies for Moving and Sharing Data

Transforming data acquisition

Envisaged

• Experimental parameters optimized automatically

• Collected data available to optimization programs

• Data are automatically reconstructed, reduced, and shared with local and remote participants

• User team leaves the APS with reduced data

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Page 20: Science for the Future: Strategies for Moving and Sharing Data

Globus Online as enabler

[Diagram labels: Facility data acquisition; Reduced data; Analysis/Sharing; Globus Online transfer service; Globus Online sharing service; Globus Online dataset service*]

* In development

Page 21: Science for the Future: Strategies for Moving and Sharing Data

Credit: Kerstin Kleese-van Dam

Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL

Page 22: Science for the Future: Strategies for Moving and Sharing Data

We believe a new approach is needed to deliver data management infrastructure

• Frictionless

• Affordable

• Sustainable

Page 23: Science for the Future: Strategies for Moving and Sharing Data

We’ve got a handle on “frictionless”

• Web interface, REST API, command line

• InCommon, OAuth, OpenID, X.509, …

• Credential management

• Group definition and management

• Transfer management and optimization

• Reliability via transfer retries

• Integration with ESnet “Science DMZs”

• One-click “Globus Connect” install

• 5-minute Globus Connect Multi User install
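
"Reliability via transfer retries" in the list above can be illustrated with a generic retry loop. This is a sketch of the general technique, not Globus Online's actual retry policy: the exponential backoff schedule, the `TransientError` type, and the names `transfer_with_retries` and `make_flaky` are all assumptions made for illustration.

```python
import time


class TransientError(Exception):
    """A failure worth retrying, such as a dropped network connection."""


def transfer_with_retries(attempt_transfer, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call attempt_transfer() until it succeeds or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries.
    The last TransientError is re-raised if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_transfer()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures):
    """Build a stand-in transfer that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def attempt():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("network failed")
        return f"done after {state['calls']} attempt(s)"

    return attempt
```

Passing `sleep` as a parameter lets the loop be exercised without real delays, for example `transfer_with_retries(make_flaky(2), sleep=lambda s: None)`. The point of the pattern is that the user does not have to supervise the transfer and restart it by hand.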

Page 24: Science for the Future: Strategies for Moving and Sharing Data

“Affordable” and “sustainable”?

Common expectation is either:

• High-priced commercial software (with generally higher levels of quality)

Or:

• Free, open source software (with generally lower levels of quality)

We aim to offer the best of all worlds!

Page 25: Science for the Future: Strategies for Moving and Sharing Data

We are a non-profit service provider to the non-profit research community

Page 26: Science for the Future: Strategies for Moving and Sharing Data

Our challenge: Sustainability

We are a non-profit service provider to the non-profit research community

Page 27: Science for the Future: Strategies for Moving and Sharing Data

Globus Online Provider Plans

Starting at $20k per year

• Managed endpoints with sharing

• Multiple GridFTP servers per endpoint

• Branded web sites

• Alternate identity provider

• Usage reporting

• Mass storage system (MSS) optimizations

• Operations monitoring and management

• Input into and access to product roadmap

Page 28: Science for the Future: Strategies for Moving and Sharing Data

Provider Plan not required to get started

Use Globus Connect Multi User to easily connect your resources with Globus

Go to: globusonline.org/gcmu

[Diagram of storage systems: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror]

Page 29: Science for the Future: Strategies for Moving and Sharing Data

We hope you will join us

Page 30: Science for the Future: Strategies for Moving and Sharing Data

Providers are also using Globus Online as a platform

[Diagram components:]

• Globus Nexus (Identity, Group, Profile)

• Sharing Service

• Transfer Service

• Dataset Services

• Globus Toolkit

• Globus Online APIs

• Globus Connect

Page 31: Science for the Future: Strategies for Moving and Sharing Data

Early platform adopters

Page 32: Science for the Future: Strategies for Moving and Sharing Data

Our research is supported by:

U.S. Department of Energy

Page 33: Science for the Future: Strategies for Moving and Sharing Data

Questions

Contact: [email protected]

Providers: globusonline.org/provider-plans

Researchers: globusonline.org/plus

www.globusonline.org