
Page 1: Global Data Grids for 21st Century Science

Paul Avery
University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]

Physics Colloquium
University of Texas at Arlington
Jan. 24, 2002

Page 2: What is a Grid?

Grid: geographically distributed computing resources configured for coordinated use

Physical resources & networks provide raw capability

“Middleware” software ties it together

Page 3: Applications for Grids

Climate modeling: climate scientists visualize, annotate, & analyze Terabytes of simulation data

Biology: a biochemist exploits 10,000 computers to screen 100,000 compounds in an hour

High energy physics: 3,000 physicists worldwide pool Petaflops of CPU resources to analyze Petabytes of data

Engineering: civil engineers collaborate to design, execute, & analyze shake table experiments; a multidisciplinary analysis in aerospace couples code and data in four companies

From Ian Foster

Page 4: Applications for Grids (cont.)

Application Service Providers: a home user invokes architectural design functions at an application service provider; an application service provider purchases cycles from compute cycle providers

Commercial: scientists at a multinational soap company design a new product

Communities: an emergency response team couples real-time data, a weather model, and population data; a community group pools members' PCs to analyze alternative designs for a local road

Health: hospitals and international agencies collaborate on stemming a major disease outbreak

From Ian Foster

Page 5: Proto-Grid: SETI@home

Community: SETI researchers + enthusiasts
Arecibo radio data sent to users (250 KB data chunks)
Over 2M PCs used

Page 6: More Advanced Proto-Grid: Evaluation of AIDS Drugs

Community: 1000s of home computer users; philanthropic computing vendor (Entropia); research group (Scripps)

Common goal: advance AIDS research

Page 7: Early Information Infrastructure

Network-centric: simple, fixed end systems; few embedded capabilities; few services; no user-level quality of service

O(10^8) nodes

[Diagram: end systems connected through the network]

Page 8: Emerging Information Infrastructure

Application-centric: heterogeneous, mobile end systems; many embedded capabilities; rich services; user-level quality of service

O(10^10) nodes

Qualitatively different, not just "faster and more reliable"

[Diagram labels: Grid, QoS, Resource Discovery, Processing, Caching]

Page 9: Why Grids?

Resources for complex problems are distributed: advanced scientific instruments (accelerators, telescopes, …); storage and computing; groups of people

Communities require access to common services: scientific collaborations (physics, astronomy, biology, engineering, …); government agencies; health care organizations, large corporations, …

Goal is to build "Virtual Organizations": make all community resources available to any VO member; leverage strengths at different institutions; add people & resources dynamically

Page 10: Grid Challenges

Overall goal: coordinated sharing of resources

Technical problems to overcome: authentication, authorization, policy, auditing; resource discovery, access, allocation, control; failure detection & recovery; resource brokering

Additional issue: lack of central control & knowledge; preservation of local site autonomy; policy discovery and negotiation are important

Page 11: Layered Grid Architecture (Analogy to Internet Architecture)

Application
Fabric: controlling things locally (accessing, controlling resources)
Connectivity: talking to things (communications, security)
Resource: sharing single resources (negotiating access, controlling use)
Collective: managing multiple resources (ubiquitous infrastructure services)
User: specialized services (application-specific distributed services)

(Internet Protocol Architecture analogy: Application, Transport, Internet, Link)

From Ian Foster

Page 12: Globus Project and Toolkit

Globus Project™ (Argonne + USC/ISI): O(40) researchers & developers; identify and define core protocols and services

Globus Toolkit™: a major product of the Globus Project; reference implementation of core protocols & services; growing open-source developer community

Globus Toolkit used by all Data Grid projects today: US: GriPhyN, PPDG, TeraGrid, iVDGL; EU: EU DataGrid and national projects

Page 13: Globus General Approach

Define Grid protocols & APIs: protocol-mediated access to remote resources; integrate and extend existing standards

Develop reference implementation: open-source Globus Toolkit; client & server SDKs, services, tools, etc.

Grid-enable a wide variety of tools: FTP, SSH, Condor, SRB, MPI, …

Learn about real-world problems: deployment, testing, applications

[Diagram: applications and diverse global services built on core Globus Toolkit services over diverse OS services]

Page 14: Globus Toolkit Protocols

Security (connectivity layer): Grid Security Infrastructure (GSI)

Resource management (resource layer): Grid Resource Allocation Management (GRAM)

Information services (resource layer): Grid Resource Information Protocol (GRIP)

Data transfer (resource layer): Grid File Transfer Protocol (GridFTP) (usage sketch below)
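To make the protocol list above concrete, here is a minimal sketch of driving two of these services from a script, assuming the Globus Toolkit command-line clients (globus-job-run for GRAM, globus-url-copy for GridFTP) are installed and a valid proxy certificate already exists; the host names and file paths are hypothetical placeholders, and exact client options may differ between toolkit releases.

```python
# Minimal sketch: driving GRAM and GridFTP through the Globus Toolkit
# command-line clients from Python.  Assumes the clients are installed and
# a valid proxy exists (e.g. created with grid-proxy-init).  Host names
# and paths are hypothetical placeholders.
import subprocess

def run_remote(host, command):
    """Submit a simple job through GRAM on `host` and return its stdout."""
    result = subprocess.run(["globus-job-run", host] + command,
                            capture_output=True, text=True, check=True)
    return result.stdout

def copy_file(src_url, dst_url):
    """Move a file with GridFTP (gsiftp:// URLs) using globus-url-copy."""
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

if __name__ == "__main__":
    print(run_remote("tier2.example.edu", ["/bin/hostname"]))
    copy_file("gsiftp://tier1.example.org/data/run42.root",
              "file:///scratch/run42.root")
```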

Page 15: Data Grids

Page 16: Data Intensive Science: 2000-2015

Scientific discovery increasingly driven by IT: computationally intensive analyses; massive data collections; data distributed across networks of varying capability; geographically distributed collaboration

Dominant factor: data growth (1 Petabyte = 1000 TB; growth-rate check below)
  2000: ~0.5 Petabyte
  2005: ~10 Petabytes
  2010: ~100 Petabytes
  2015: ~1000 Petabytes?

How to collect, manage, access, and interpret this quantity of data?

Drives demand for "Data Grids" to handle the additional dimension of data access & movement
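As a quick sanity check on the growth figures above, the following few lines compute the implied overall and annual growth rates; the volumes come straight from the table, and the assumption of smooth exponential growth is an illustration of mine, not a claim from the talk.

```python
# Implied growth rate of the data volumes quoted on this slide.
volumes_pb = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}

years = sorted(volumes_pb)
total_factor = volumes_pb[years[-1]] / volumes_pb[years[0]]
annual = total_factor ** (1 / (years[-1] - years[0]))

print(f"overall growth 2000-2015: x{total_factor:.0f}")        # ~x2000
print(f"implied annual growth   : ~{(annual - 1) * 100:.0f}%")  # ~66% per year
```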

Page 17: Global Data Grid Challenge

“Global scientific communities will perform computationally demanding analyses of distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale.”

Page 18: Data Intensive Physical Sciences

High energy & nuclear physics

Gravity wave searches: LIGO, GEO, VIRGO

Astronomy: digital sky surveys; now: Sloan Sky Survey, 2MASS; future: VISTA, other Gigapixel arrays; "virtual" observatories (Global Virtual Observatory)

Time-dependent 3-D systems (simulation & data): Earth observation; climate modeling; geophysics, earthquake modeling; fluids, aerodynamic design; pollutant dispersal scenarios

Page 19: Data Intensive Biology and Medicine

Medical data: X-ray, mammography data, etc. (many petabytes); digitizing patient records (ditto)

X-ray crystallography: bright X-ray sources, e.g. Argonne Advanced Photon Source

Molecular genomics and related disciplines: Human Genome, other genome databases; proteomics (protein structure, activities, …); protein interactions, drug delivery

Brain scans (3-D, time dependent)

Virtual Population Laboratory (proposed): database of populations, geography, transportation corridors; simulate likely spread of disease outbreaks

Craig Venter keynote @ SC2001

Page 20: Data and Corporations

Corporations and Grids: national, international, global; business units, research teams; sales data; transparent access to distributed databases

Corporate issues: short-term and long-term partnerships; overlapping networks; managing and controlling access to data and resources; security

Page 21: Example: High Energy Physics

"Compact" Muon Solenoid at the LHC (CERN)

[Figure: CMS detector, with a "Smithsonian standard man" for scale]

Page 22: LHC Computing Challenges

"Events" resulting from beam-beam collisions: the signal event is obscured by 20 overlapping uninteresting collisions in the same crossing

CPU time does not scale from previous generations

[Figure: event complexity, 2000 vs. 2007]

Page 23: LHC: Higgs Decay into 4 Muons

[Event display: all charged tracks with pT > 2 GeV; reconstructed tracks with pT > 25 GeV (+30 minimum-bias events)]

10^9 events/sec, selectivity: 1 in 10^13

Page 24: LHC Computing Challenges

Complexity of the LHC interaction environment & resulting data

Scale: Petabytes of data per year (100 PB by ~2010-12)

Global distribution of people and resources: 1800 physicists, 150 institutes, 32 countries

Page 25: Global LHC Data Grid

Tier0: CERN
Tier1: National Lab
Tier2: Regional Center (University, etc.)
Tier3: University workgroup
Tier4: Workstation

[Diagram: hierarchical grid with Tier 0 (CERN) feeding Tier 1 centers, each serving multiple Tier 2, Tier 3, and Tier 4 sites]

Key ideas: hierarchical structure; Tier2 centers

Page 26: Global LHC Data Grid

Bunch crossing every 25 ns; 100 triggers per second; each event is ~1 MByte (rate check below)

Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels

[Diagram: the Online System (~PBytes/sec from the detector) feeds the Tier 0+1 CERN Computer Center (> 20 TIPS) at ~100 MBytes/sec; 2.5 Gbits/sec links connect CERN to Tier 1 national centers (USA, France, Italy, UK); ~622 Mbits/sec links reach Tier 2 regional centers; 100-1000 Mbits/sec links serve Tier 3 institutes (~0.25 TIPS each) with physics data caches, plus Tier 4 workstations and other portals]

Experiment CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~ 1:1:1
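The data rates quoted above can be checked with a short back-of-the-envelope calculation; the trigger rate and event size are taken from the slide, while the ~10^7 seconds of effective running time per year is a conventional assumption added here for illustration.

```python
# Back-of-the-envelope check of the LHC data rates quoted on this slide.
trigger_rate_hz = 100          # events written per second
event_size_bytes = 1e6         # ~1 MByte per event
seconds_per_year = 1e7         # assumed effective running time per year

rate = trigger_rate_hz * event_size_bytes          # bytes/second off the online system
yearly = rate * seconds_per_year                   # bytes per year of raw data

print(f"raw rate : {rate / 1e6:.0f} MB/s")         # ~100 MB/s, as on the diagram
print(f"per year : {yearly / 1e15:.1f} PB/year")   # ~1 PB/year: "Petabytes of data per year"
```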

Page 27: Example: Global Virtual Observatory

[Diagram: standards linking the components below]

Source catalogs, image data
Specialized data: spectroscopy, time series, polarization
Information archives: derived & legacy data (NED, Simbad, ADS, etc.)
Discovery tools: visualization, statistics

Multi-wavelength astronomy, multiple surveys

Page 28: GVO Data Challenge

Digital representation of the sky: all-sky + deep fields; integrated catalog and image databases; spectra of selected samples

Size of the archived data (arithmetic check below): 40,000 square degrees; resolution < 0.1 arcsec gives > 50 trillion pixels; one band (2 bytes/pixel): 100 Terabytes; multi-wavelength: 500-1000 Terabytes; time dimension: many Petabytes

Large, globally distributed database engines: integrated catalog and image databases; multi-Petabyte data size; GByte/s aggregate I/O speed per site
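The survey sizing above follows from simple arithmetic; the sketch below reproduces it using the slide's 40,000 square degrees, 0.1 arcsec resolution, and 2 bytes per pixel (treating 0.1 arcsec as the pixel scale is a simplifying assumption of this check).

```python
# Reproduce the slide's sky-survey sizing estimate.
sky_sq_deg = 40_000            # surveyed area
pixel_arcsec = 0.1             # pixel scale (slide: resolution < 0.1 arcsec)
bytes_per_pixel = 2            # one band, 2 bytes/pixel

pixels_per_deg = 3600 / pixel_arcsec          # 36,000 pixels along one degree
pixels = sky_sq_deg * pixels_per_deg ** 2     # total pixels over the survey area

print(f"pixels   : {pixels:.1e}")                              # ~5e13, > 50 trillion
print(f"one band : {pixels * bytes_per_pixel / 1e12:.0f} TB")  # ~100 Terabytes
```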

Page 29: Sloan Digital Sky Survey Data Grid

Page 30: LIGO (Gravity Wave) Data Grid

[Diagram: Hanford and Livingston observatories linked by OC3/OC12/OC48 circuits over Internet2/Abilene to Caltech (Tier1), MIT, and LSC Tier2 sites]

Page 31: Data Grid Projects

Page 32: Large Data Grid Projects

Funded projects:
  GriPhyN     | USA | NSF | $11.9M + $1.6M | 2000-2005
  EU DataGrid | EU  | EC  | €10M           | 2001-2004
  PPDG        | USA | DOE | $9.5M          | 2001-2004
  TeraGrid    | USA | NSF | $53M           | 2001-?
  iVDGL       | USA | NSF | $13.7M + $2M   | 2001-2006
  DataTAG     | EU  | EC  | €4M            | 2002-2004

Proposed projects:
  GridPP      | UK  | PPARC | >$15M?       | 2001-2004

Many national projects: initiatives in US, UK, Italy, France, NL, Germany, Japan, …; EU networking initiatives (Géant, SURFNet)

Page 33: PPDG Middleware Components

[Diagram: object- and file-based application services (request interpreter); file access service (request planner); matchmaking service and cost estimation (toy sketch below); cache manager; file fetching service; file replication index; file movers crossing the site boundary / security domain; end-to-end network services; mass storage manager; resource management]

Future: OO-collection export; cache, state tracking; prediction
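To illustrate how a matchmaking service and cost estimator of the kind listed above might interact, here is a toy sketch that matches a file-staging request against resource descriptions and ranks the matches by estimated transfer time; the site names, attributes, and cost model are hypothetical and are not PPDG interfaces.

```python
# Toy matchmaking: match a request's requirements against resource "ads"
# and rank the matches by an estimated staging cost.  All names and numbers
# are hypothetical illustrations.
resources = [
    {"site": "tier1.example.org", "has_file": True,  "free_slots": 12, "wan_mb_s": 20},
    {"site": "tier2.example.edu", "has_file": True,  "free_slots": 4,  "wan_mb_s": 80},
    {"site": "tier3.example.net", "has_file": False, "free_slots": 30, "wan_mb_s": 5},
]

request = {"file_size_mb": 1000, "needs_slot": True}

def matches(res, req):
    """Requirements: the site must hold the file and have a free slot if needed."""
    return res["has_file"] and (res["free_slots"] > 0 or not req["needs_slot"])

def cost(res, req):
    """Cost estimation: seconds to stage the file over the site's WAN link."""
    return req["file_size_mb"] / res["wan_mb_s"]

candidates = [r for r in resources if matches(r, request)]
best = min(candidates, key=lambda r: cost(r, request))
print(f"matched {len(candidates)} site(s); best: {best['site']} "
      f"(~{cost(best, request):.0f} s to stage)")
```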

Page 34: EU DataGrid Project

Work Package | Title | Lead contractor

WP1  | Grid Workload Management                | INFN
WP2  | Grid Data Management                    | CERN
WP3  | Grid Monitoring Services                | PPARC
WP4  | Fabric Management                       | CERN
WP5  | Mass Storage Management                 | PPARC
WP6  | Integration Testbed                     | CNRS
WP7  | Network Services                        | CNRS
WP8  | High Energy Physics Applications        | CERN
WP9  | Earth Observation Science Applications  | ESA
WP10 | Biology Science Applications            | INFN
WP11 | Dissemination and Exploitation          | INFN
WP12 | Project Management                      | CERN

Page 35: GriPhyN: PetaScale Virtual-Data Grids

[Diagram: production teams, individual investigators, and workgroups use interactive user tools, virtual data tools, request planning & scheduling tools, and request execution & management tools; these sit on resource management, security and policy, and other Grid services; beneath them are transforms, raw data sources, and distributed resources (code, storage, CPUs, networks) at the ~1 Petaflop / ~100 Petabyte scale]

Page 36: GriPhyN Research Agenda

Virtual Data technologies (fig.): derived data, calculable via algorithm; instantiated 0, 1, or many times (e.g., caches); "fetch value" vs. "execute algorithm"; very complex (versions, consistency, cost calculation, etc.)

LIGO example: "Get gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year"

For each requested data value, one must: locate the item and its algorithm; determine the costs of fetching vs. calculating; plan the data movements & computations required to obtain the result; execute the plan (see the sketch below)
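The fetch-vs-execute decision described in the bullets above can be illustrated with a few lines of code; the classes, cost numbers, and site URLs below are hypothetical and stand in for whatever cost estimators a real virtual-data planner would consult.

```python
# Toy virtual-data decision: compare the estimated cost of fetching an
# existing instance of a derived data product against recomputing it,
# then report the cheaper plan.  Everything here is a hypothetical sketch.
from dataclasses import dataclass

@dataclass
class Replica:
    url: str
    size_gb: float
    bandwidth_gb_s: float          # estimated link speed to the requesting site

@dataclass
class Transformation:
    program: str
    est_cpu_seconds: float

def fetch_cost(r):
    return r.size_gb / r.bandwidth_gb_s        # seconds to move the data

def compute_cost(t):
    return t.est_cpu_seconds                   # seconds to rederive the data

def plan(replicas, transform):
    """Return a one-line plan: fetch the cheapest replica or recompute."""
    best = min(replicas, key=fetch_cost, default=None)
    if best is not None and fetch_cost(best) <= compute_cost(transform):
        return f"fetch {best.url} (~{fetch_cost(best):.0f} s)"
    return f"run {transform.program} (~{compute_cost(transform):.0f} s)"

# Hypothetical request: strain data cached at two sites vs. rederiving it.
print(plan(
    [Replica("gsiftp://tier2.example.edu/strain/grb017.dat", 2.0, 0.05),
     Replica("gsiftp://tier1.example.org/strain/grb017.dat", 2.0, 0.2)],
    Transformation("compute_strain", est_cpu_seconds=300),
))
```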

Page 37: Virtual Data in Action

A data request may: compute locally; compute remotely; access local data; access remote data

Scheduling based on: local policies; global policies; cost

[Diagram: fetch item from local facilities & caches, regional facilities & caches, or major facilities & archives]

Page 38: GriPhyN/PPDG Data Grid Architecture

[Diagram: an Application hands a request to the Planner, which produces a DAG; the Executor (DAGMan, Kangaroo) runs the DAG against Compute Resources (GRAM) and Storage Resources (GridFTP; GRAM; SRM), supported by Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Monitoring (MDS), Replica Management (GDMP), a Reliable Transfer Service, and Policy/Security (GSI, CAS), largely built on Globus. Labels mark components where an initial solution is operational. A toy planner/executor sketch follows.]
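The planner/executor split above can be illustrated with a toy workflow: a "planner" emits a DAG of job dependencies and a small "executor" runs the jobs in dependency order, which is the role DAGMan plays for real Condor-G jobs. The job names and echo commands are placeholders, not actual CMS or LIGO workloads.

```python
# Toy planner/executor: run a DAG of jobs in dependency order.
# Requires Python 3.9+ for graphlib.  Jobs and commands are placeholders.
from graphlib import TopologicalSorter
import subprocess

# Planner output: job -> set of jobs it depends on.
dag = {
    "stage_input": set(),
    "simulate":    {"stage_input"},
    "reconstruct": {"simulate"},
    "archive":     {"reconstruct"},
}

commands = {
    "stage_input": ["echo", "fetching input files"],
    "simulate":    ["echo", "running Monte Carlo"],
    "reconstruct": ["echo", "reconstructing events"],
    "archive":     ["echo", "storing output to mass storage"],
}

def execute(dag, commands):
    """Run each job once all of its parents have completed."""
    for job in TopologicalSorter(dag).static_order():
        print(f"[executor] starting {job}")
        subprocess.run(commands[job], check=True)

execute(dag, commands)
```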

Page 39: Catalog Architecture

Transparency with respect to materialization:
  Derived Metadata Catalog: application-specific attributes -> derived data ids (e.g. i2, i10)
  Derived Data Catalog: id, transformation, parameters, name (e.g. i1 F X F.X; i2 F Y F.Y; i10 G Y P G(P).Y); updated upon materialization
  Transformation Catalog: transformation -> program URL and cost (e.g. F URL:f 10; G URL:g 20); URLs point to program storage

Transparency with respect to location:
  Metadata Catalog: object name -> logical object name (e.g. X logO1; Y logO2; F.X logO3; G(1).Y logO4)
  Replica Catalog: logical container name -> physical file names (e.g. logC1 URL1; logC2 URL2, URL3; logC3 URL4; logC4 URL5, URL6); URLs point to physical file storage

[Diagram label: object-name lookups go through GCMS. A toy resolution walk-through follows.]
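A request for a virtual data product walks through the catalogs above roughly as follows: metadata catalog to logical name, replica catalog to physical URLs, and, if nothing is materialized, derived-data and transformation catalogs to a recipe for recomputing it. The sketch below is a toy illustration with hypothetical catalog contents, not the GriPhyN catalog interfaces.

```python
# Toy walk through the catalogs: metadata -> logical name -> replicas, with a
# fallback to the derived-data/transformation catalogs when the product has
# not been materialized.  All catalog contents are hypothetical examples.
metadata_catalog = {"F.X": "logO3"}                 # object name -> logical object name
replica_catalog = {"logO3": ["gsiftp://site-a.example.org/F.X",
                             "gsiftp://site-b.example.edu/F.X"]}
derived_data_catalog = {"F.X": ("F", "X"),          # name -> (transformation, parameters)
                        "G(P).Y": ("G", "P")}
transformation_catalog = {"F": ("http://code.example.org/f", 10),   # -> (program URL, cost)
                          "G": ("http://code.example.org/g", 20)}

def resolve(name):
    """Return the physical replicas for `name`, or a recipe to derive it."""
    logical = metadata_catalog.get(name)
    if logical and replica_catalog.get(logical):
        return {"action": "fetch", "urls": replica_catalog[logical]}
    transform, params = derived_data_catalog[name]
    program, cost = transformation_catalog[transform]
    return {"action": "derive", "program": program, "params": params, "cost": cost}

print(resolve("F.X"))      # already materialized: returns the replica URLs
print(resolve("G(P).Y"))   # not materialized: returns the transformation to run
```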

Page 40: Early GriPhyN Challenge Problem: CMS Data Reconstruction

April 2001. Sites: a Caltech workstation (master), the Wisconsin Condor pool, and the NCSA Linux cluster with NCSA UniTree (a GridFTP-enabled FTP server)

Master Condor job running at Caltech
2) Launch secondary job on Wisconsin pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool (secondary Condor job on UW pool)
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on the cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master

Page 41: Trace of a Condor-G Physics Run

[Plot: pre/simulation/post jobs (UW Condor), ooHits at NCSA, and ooDigis at NCSA over the course of the run (scale 0-120); a delay due to a script error is marked]

Page 42: iVDGL: A World Grid Laboratory

International Virtual-Data Grid Laboratory: a global Grid laboratory (US, EU, Asia, …); a place to conduct Data Grid tests "at scale"; a mechanism to create common Grid infrastructure; a facility to perform production exercises for the LHC experiments; a laboratory for other disciplines to perform Data Grid tests

US part funded by NSF on Sep. 25, 2001: $13.65M + $2M

“We propose to create, operate and evaluate, over asustained period of time, an international researchlaboratory for data-intensive science.”

From NSF proposal, 2001

Page 43: iVDGL Summary Information

Principal components: Tier1 sites (laboratories); Tier2 sites (universities); selected Tier3 sites (universities); fast networks: US, Europe, transatlantic, transpacific; Grid Operations Center (GOC); Computer Science support teams (6 UK Fellows); coordination, management

Proposed international participants: initially US, EU, Japan, Australia; other world regions later; discussions with Russia, China, Pakistan, India, Brazil

Complementary EU project: DataTAG, a transatlantic network from CERN to STAR-TAP (+ people), initially 2.5 Gb/s

Page 44: US iVDGL Proposal Participants

  U Florida: CMS
  Caltech: CMS, LIGO
  UC San Diego: CMS, CS
  Indiana U: ATLAS, iGOC
  Boston U: ATLAS
  U Wisconsin, Milwaukee: LIGO
  Penn State: LIGO
  Johns Hopkins: SDSS, NVO
  U Chicago: CS
  U Southern California: CS
  U Wisconsin, Madison: CS
  Salish Kootenai: Outreach, LIGO
  Hampton U: Outreach, ATLAS
  U Texas, Brownsville: Outreach, LIGO
  Fermilab: CMS, SDSS, NVO
  Brookhaven: ATLAS
  Argonne Lab: ATLAS, CS

Categories: T2 / Software; CS support; T3 / Outreach; T1 / Labs (not funded)

Page 45: Initial US-iVDGL Data Grid

[Map legend: Tier1 (FNAL); Proto-Tier2; Tier3 university]

Sites: Fermilab, BNL, Caltech/UCSD, Florida, Wisconsin, Indiana, BU, Michigan, SKC, Brownsville, Hampton, PSU

Other sites to be added in 2002

Page 46: iVDGL Map (2002-2003)

[Map legend: Tier0/1 facility; Tier2 facility; Tier3 facility; 10 Gbps link; 2.5 Gbps link; 622 Mbps link; other link; DataTAG; SURFnet]

Page 47: "Infrastructure" Data Grid Projects

GriPhyN (US, NSF): Petascale Virtual-Data Grids; http://www.griphyn.org/

Particle Physics Data Grid (US, DOE): Data Grid applications for HENP; http://www.ppdg.net/

European Data Grid (EC, EU): Data Grid technologies, EU deployment; http://www.eu-datagrid.org/

TeraGrid Project (US, NSF): distributed supercomputing resources (13 TFlops); http://www.teragrid.org/

iVDGL + DataTAG (NSF, EC, others): global Grid laboratory & transatlantic network

Collaborations of application scientists & computer scientists

Focus on infrastructure development & deployment

Broad application

Page 48: Data Grid Project Timeline

[Timeline, Q4 2000 through Q1 2002: GriPhyN approved ($11.9M + $1.6M); EU DataGrid approved ($9.3M); 1st Grid coordination meeting; PPDG approved ($9.5M); 2nd Grid coordination meeting; TeraGrid approved ($53M); iVDGL approved ($13.65M + $2M); 3rd Grid coordination meeting; DataTAG approved (€4M); LHC Grid Computing Project; 4th Grid coordination meeting]

Page 49: Need for Common Grid Infrastructure

Grid computing is sometimes compared to the electric grid: you plug in to get a resource (CPU, storage, …); you don't care where the resource is located

The analogy is more appropriate than originally intended: it expresses a USA viewpoint of a uniform power grid. What happens when you travel around the world? Different frequencies (60 Hz, 50 Hz); different voltages (120 V, 220 V); different sockets! (USA 2-pin, France, UK, etc.)

We want to avoid this situation in Grid computing

Page 50: Role of Grid Infrastructure

Provide essential common Grid services: cannot afford to develop separate infrastructures (manpower, timing, immediate needs, etc.)

Meet the needs of high-end scientific & engineering collaborations: HENP, astrophysics, GVO, earthquake, climate, space, biology, …; already international and even global in scope; drive future requirements

Be broadly applicable outside science: government agencies (national, regional (EU), UN); non-governmental organizations (NGOs); corporations, business networks (e.g., suppliers, R&D); other "virtual organizations" (see Anatomy of the Grid)

Be scalable to the global level

Page 51: Grid Coordination Efforts

Global Grid Forum (GGF): www.gridforum.org; international forum for general Grid efforts; many working groups, standards definitions; next one in Toronto, Feb. 17-20

HICB (high energy physics): represents HEP collaborations, primarily the LHC experiments; joint development & deployment of Data Grid middleware (GriPhyN, PPDG, TeraGrid, iVDGL, EU DataGrid, LCG, DataTAG, CrossGrid); common testbed, open-source software model; several meetings so far

New infrastructure Data Grid projects? Fold them into the existing Grid landscape (primarily US + EU)

Page 52: Summary

Data Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing

The iVDGL will provide vast experience for new collaborations

Many challenges lie ahead during the coming transition: new Grid projects will provide rich experience and lessons; it is difficult to predict the situation even 3-5 years ahead

Page 53: Grid References

Grid Book: www.mkp.com/grids
Globus: www.globus.org
Global Grid Forum: www.gridforum.org
TeraGrid: www.teragrid.org
EU DataGrid: www.eu-datagrid.org
PPDG: www.ppdg.net
GriPhyN: www.griphyn.org
iVDGL: www.ivdgl.org