
Page 1:

The DOE Science Grid

Computing and Data Infrastructure for Large-Scale Science

William Johnston, Lawrence Berkeley National Lab

Ray Bair, Pacific Northwest National Lab

Ian Foster, Argonne National Lab

Al Geist, Oak Ridge National Lab

Bill Kramer, National Energy Research Scientific Computing Center

and the DOE Science Grid Engineering Team

http://doesciencegrid.org

Page 2:


The Need for Science Grids

The nature of how large-scale science is done is changing:
distributed data, computing, people, and instruments
instruments integrated with large-scale computing

“Grids” are middleware designed to facilitate the routine interactions of all of these resources in order to support widely distributed, multi-institutional science and engineering.

Page 3:


Distributed Science Example: Supernova Cosmology

“Supernova cosmology” is cosmology based on finding and observing special types of supernovae during the few weeks of their observable life

It has led to some remarkable science (Science magazine’s 1998 “Breakthrough of the Year”: supernova cosmology indicates the universe will expand forever); however, it is rapidly becoming limited by the researchers’ ability to manage the complex data-computing-instrument interactions

Page 4:

Supernova Cosmology Requires Complex, Widely Distributed Workflow Management

Page 5:


Supernova Cosmology

This is one of the class of problems that Grids are focused on. It involves:
management of complex workflow
reliable, wide-area, high-volume data management
inclusion of supercomputers in time-constrained scenarios
easily accessible pools of computing resources
eventual inclusion of instruments that will be semi-automatically retargeted based on data analysis and simulation
the next generation will generate vastly more data (from SNAP, satellite-based observation)

Page 6:


What are Grids?

Middleware for uniform, secure, and highly capable access to large and small scale computing, data, and instrument systems, all of which are distributed across organizations

Services supporting construction of application frameworks and science portals

Persistent infrastructure for distributed applications (e.g. security services and resource discovery)

200 people working on standards at the IETF-like Global Grid Forum (www.gridforum.org)

Page 7:


Grids

There are several different types of users of Grid services:
discipline scientists
problem solving system / framework / science portal developers
computational tool / application writers
Grid system managers
Grid service builders

Each of these user communities has somewhat different requirements for Grids, and the Grid services available or under development are trying to address all of these groups

Page 8:

[Figure: Architecture of a Grid (layered view; shaded boxes = operational services: Globus, SRB)]
Science Portals and Scientific Workflow Management Systems
Applications (Simulations, Data Analysis, etc.)
Application Toolkits (Visualization, Data Publication/Subscription, etc.); Web Services and Portal Toolkits
Execution Support and Frameworks (Globus MPI, Condor-G, CORBA-G)
Grid Common Services (standardized services and resource interfaces): Grid Information Service, Uniform Resource Access, Brokering, Global Queuing, Global Event Services, Co-Scheduling, Data Management, Uniform Data Access, Collaboration and Remote Instrument Services, Network Cache, Communication Services, Authentication, Authorization, Security Services, Auditing, Fault Management, Monitoring
High Speed Communication Services
Distributed Resources: Condor pools of workstations, network caches, tertiary storage, scientific instruments, clusters, national supercomputer facilities

Page 9:


State of Grids

Grids are real, and they are useful now

Basic Grid services are being deployed to support uniform and secure access to computing, data, and instrument systems that are distributed across organizations

Grid execution management tools (e.g. Condor-G) are being deployed

Data services, such as uniform access to tertiary storage systems and global metadata catalogues, are being deployed (e.g. GridFTP and Storage Resource Broker; see the data-staging sketch below)

Web services supporting application frameworks and science portals are being prototyped
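To make the data-services bullet concrete, here is a minimal sketch of staging a file off a GridFTP server. It assumes the Globus Toolkit's globus-url-copy client is installed and a valid proxy credential already exists; the host and file names are hypothetical.

```python
import subprocess

def stage_file(remote_url: str, local_path: str) -> None:
    """Copy one file from a GridFTP server to local disk using the
    Globus Toolkit's globus-url-copy client (assumes the client is
    installed and a valid proxy credential already exists)."""
    subprocess.run(
        ["globus-url-copy", remote_url, f"file://{local_path}"],
        check=True,  # raise CalledProcessError if the transfer fails
    )

# Hypothetical example: pull a search image from a (made-up) archive host.
if __name__ == "__main__":
    stage_file(
        "gsiftp://datagrid.example-lab.gov/archive/supernova/image0001.fits",
        "/tmp/image0001.fits",
    )
```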

Page 10:


State of Grids

Persistent infrastructure is being built:

Grid services are being maintained on the compute and data systems of interest (Grid sysadmin)

cryptographic authentication supporting single sign-on is provided through Public Key Infrastructure

resource discovery services are being maintained (Grid Information Service, a distributed directory service; see the query sketch below)

This is happening, e.g., in the DOE Science Grid, EU Data Grid, UK eScience Grid, NASA's IPG, etc.

For DOE science projects, ESnet is running a PKI Certification Authority and assisting with policy issues among DOE Labs and their collaborators
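Since the Grid Information Service of this era (Globus MDS) is an LDAP directory, a resource-discovery query can be sketched as below. This is illustrative only: the host name and mds-vo-name are hypothetical, port 2135 was the MDS default, and the object class and attribute names follow the MDS 2.x schema as commonly documented and should be checked against the deployed schema. The ldap3 Python package supplies the LDAP client.

```python
from ldap3 import ALL, Connection, Server

# Hypothetical GIIS endpoint; a real deployment publishes its own host,
# port (2135 was the MDS default), and mds-vo-name.
GIIS_HOST = "giis.example-lab.gov"
BASE_DN = "mds-vo-name=local, o=grid"

def list_advertised_hosts() -> None:
    """Ask a Grid Information Service (LDAP-based MDS) which hosts it
    advertises and print their names."""
    server = Server(GIIS_HOST, port=2135, get_info=ALL)
    conn = Connection(server, auto_bind=True)  # anonymous bind, as MDS allowed
    conn.search(BASE_DN, "(objectclass=MdsHost)", attributes=["Mds-Host-hn"])
    for entry in conn.entries:
        print(entry["Mds-Host-hn"])
    conn.unbind()

if __name__ == "__main__":
    list_advertised_hosts()
```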

Page 11:


DOE Science Grid

SciDAC project to explore the issues for providing persistent, operational Grid support in the DOE environment: LBNL, NERSC, PNNL, ANL, and ORNL

Initial computing resources: 10 small, medium, and large clusters

High bandwidth connectivity end-to-end (high-speed links from site systems to ESnet gateways)

Storage resources: four tertiary storage systems (NERSC, PNNL, ANL, and ORNL)

Globus providing the Grid Common Services (see the job-submission sketch below)

Collaboration with ESnet for security and directory services
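As an illustration of what "Globus providing the Grid Common Services" looks like from a user's point of view, here is a minimal sketch of submitting a simple job to a remote GRAM gatekeeper. It assumes Globus Toolkit 2-style clients (globusrun) are installed and a proxy credential exists; the gatekeeper contact string is hypothetical.

```python
import subprocess

# Hypothetical gatekeeper contact string; each SciGrid site publishes its own.
GATEKEEPER = "compute.example-lab.gov/jobmanager-pbs"

# A minimal RSL (Resource Specification Language) job description.
RSL = "&(executable=/bin/hostname)(count=1)"

def run_remote_job() -> None:
    """Submit a trivial job to a remote GRAM gatekeeper with globusrun;
    -o streams the job's stdout back to the submitting terminal."""
    subprocess.run(["globusrun", "-o", "-r", GATEKEEPER, RSL], check=True)

if __name__ == "__main__":
    run_remote_job()
```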

Page 12:

Initial Science Grid Configuration

[Figure: Initial Science Grid configuration — Grid-managed resources at NERSC (supercomputing and large-scale storage), LBNL, PNNL, ANL, and ORNL, interconnected by ESnet (which also hosts the MDS directory and the Certification Authority) with peerings to Europe and Asia-Pacific. Applications and user interfaces / application frameworks reach the Grid-managed resources through the Grid Services layer (uniform access to distributed resources; the same common services as in the architecture figure, here with Data Cataloguing).]

Funded by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research,

Mathematical, Information, and Computational Sciences Division

Page 13:

[Figure: DOE Science Grid and the DOE Science Environment — a science portal and application framework issues compute and data management requests through the Grid Services layer (uniform access to distributed resources, as in the architecture figure) to Grid-managed resources: NERSC supercomputing and large-scale storage, PNNL, LBNL, ANL, ORNL, and instruments such as SNAP, with PPDG and Europe / Asia-Pacific collaborators reached via ESnet.]

Page 14:


The DOE Science Grid Program: Three Strongly Linked Challenges

How do we reliably and effectively deploy and operate a DOE Science Grid?
Requires coordinated effort by multiple labs
ESnet for directory and certificate services
Manage basic software plus import other R&D work
What else? We will see.

Application partnerships linking computer scientists and application groups: how do we exploit Grid infrastructure to facilitate DOE applications?

Enabling R&D:
Extending the technology base for Grids
Packaging Grid software for deployment
Developing application toolkits
Web services for science portals

Page 15:


Roadmap for the Science Grid (CY2001 through 2004)

Grid compute and data resources federated across partner sites using Globus
Pilot simulation and collaboratory users
Production Grid Information Services and Certificate Authority
Auditing and monitoring services
Grid system administration
Help desk, tutorials, application integration support
Integrate R&D from other projects
leading to a scalable Science Grid

Page 16:


SciDAC Applications and the DOE Science Grid

SciGrid has some computing and storage resources that can be made available to other SciDAC projects

By "some" we mean that usage authorization models do not change by incorporating a system into the Grid

To compute on individual SciGrid systems you have to negotiate with the owners of those systems (see the authorization sketch below)

However, all of the SciGrid sites have committed to provide some computing and data resources to SciGrid users
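In Globus-based Grids, per-site authorization of the kind described above typically comes down to an entry in the site's grid-mapfile, which maps a certificate subject DN to a local account. The sketch below checks that mapping; the file path follows the usual Globus convention and the subject DN is purely hypothetical.

```python
import shlex

GRIDMAP = "/etc/grid-security/grid-mapfile"  # conventional Globus location

def local_account_for(subject_dn: str, gridmap_path: str = GRIDMAP):
    """Return the local account a certificate subject is mapped to, or
    None if this site has not authorized that subject.
    Each non-comment line looks like:  "<subject DN>" <local account>"""
    with open(gridmap_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = shlex.split(line)  # handles the quoted, space-containing DN
            if len(parts) >= 2 and parts[0] == subject_dn:
                return parts[1]
    return None

# Hypothetical subject DN, for illustration only.
print(local_account_for("/O=Grid/O=ESnet/OU=example-lab.gov/CN=Jane Scientist"))
```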

Page 17:


SciDAC Applications and the DOE Science Grid

There are several ways to "join" the SciGrid:
as a user
as a new SciGrid site (incorporating your resources)

There are different issues for users and new SciGrid sites

Users:
Users will get instruction on how to access Grid services
Users must obtain a SciGrid PKI identity certificate
There is some client software that must run on the user's system (see the proxy sketch below)
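The client software mentioned above is, at a minimum, the proxy-credential tools that implement single sign-on. A minimal sketch, assuming the Globus Toolkit's grid-proxy-init and grid-proxy-info commands are installed on the user's system:

```python
import subprocess

def ensure_proxy(min_hours: int = 2) -> None:
    """Create a short-lived proxy credential (the single sign-on step) if
    none exists or the current one has less than min_hours of life left."""
    # grid-proxy-info -timeleft prints the remaining lifetime in seconds,
    # and fails if there is no valid proxy at all.
    check = subprocess.run(
        ["grid-proxy-info", "-timeleft"], capture_output=True, text=True
    )
    seconds_left = int(check.stdout.strip() or 0) if check.returncode == 0 else 0
    if seconds_left < min_hours * 3600:
        # Prompts once for the private-key pass phrase; subsequent Grid
        # operations authenticate with the proxy instead of the long-term key.
        subprocess.run(["grid-proxy-init"], check=True)

if __name__ == "__main__":
    ensure_proxy()
```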

Page 18:


SciDAC Applications and the DOE Science Grid

New SciGrid sites:
New SciGrid sites (where you wish to incorporate your resources into the SciGrid) need to join the Engineering Working Group
This is where the joint system admin issues are worked out
This is where Grid software issues are worked out
Keith Jackson chairs the WG

Page 19:


SciDAC Applications and the DOE Science Grid

New SciGrid sites may use the Grid Information Services (resource directory) of an existing site, or may set up their own

New SciGrid sites may also use their own PKI Certification Authorities; however, the issuing CAs must have published policy compatible with the ESnet CA (see the issuer-check sketch below)

Entrust CAs will work in principle; however, there is very little practical experience with this, and a little additional software may be necessary
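Because a new site's CA must publish policy compatible with the ESnet CA, it is sometimes useful to confirm which CA actually issued a given user certificate. A small sketch using the Python cryptography package; the certificate path is only the usual Globus convention and may differ per site.

```python
from cryptography import x509

# Conventional location of a Globus user certificate; adjust per site.
CERT_PATH = "/home/jane/.globus/usercert.pem"

def issuer_of(cert_path: str) -> str:
    """Return the issuer distinguished name of a PEM-encoded certificate,
    i.e. the CA that signed it."""
    with open(cert_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    return cert.issuer.rfc4514_string()

if __name__ == "__main__":
    print(issuer_of(CERT_PATH))  # compare against the expected (e.g. ESnet) CA DN
```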

Page 20:


Science Grid: A New Type of Infrastructure

Grid services providing standardized and highly capable distributed access to resources used by a community

Persistent services for distributed applications

Support for building science portals

Vision: the DOE Science Grid will lay the groundwork to support DOE Science applications that require, e.g., distributed collaborations, very large data volumes, unique instruments, and the incorporation of supercomputing resources into these environments

Page 21:


DOE Science Grid Level of Effort

LBNL/DSD, 2 FTE; NERSC, 1 FTE; ANL, 2 FTE; ORNL, 1.5 FTE; PNNL, 1.0 FTE

Project Leadership (0.35 FTE): Johnston/LBNL, Foster/ANL, Geist/ORNL, Bair/PNNL, Kramer/NERSC
Directory and Certificate Infrastructure (1.0 FTE): ANL and LBNL/ESnet
Grid Deployment, Evaluation, Enhancement (2.7 FTE): ANL, LBNL, ORNL, PNNL, NERSC
Application and User Support (1.5 FTE): ANL, ORNL, PNNL
Monitoring and Auditing Infrastructure and Security Model (2.0 FTE): ANL, LBNL, ORNL