development of the grid computing infrastructure daniel j. harvey acknowledgements nasa ames...

Development of the Development of the Grid Computing Grid Computing InfrastructureInfrastructureDaniel J. HarveyDaniel J. Harvey

AcknowledgementsAcknowledgementsNASA Ames Research Center NASA Ames Research Center

Sunnyvale, CaliforniaSunnyvale, California

Presentation OverviewPresentation Overview

What is grid computing?What is grid computing? Historical preliminariesHistorical preliminaries Grid potentialGrid potential Technological challengesTechnological challenges Current developmentsCurrent developments Future outlookFuture outlook Personal researchPersonal research

What is Grid Computing?What is Grid Computing?

The grid is a hardware/software infrastructure The grid is a hardware/software infrastructure that enables heterogeneous geographically that enables heterogeneous geographically separated clusters of processors to be separated clusters of processors to be connected in a virtual environment.connected in a virtual environment.

Heterogeneous:Heterogeneous: The computer configurations The computer configurations are highly varied.are highly varied.

Clusters:Clusters: Generally high performance computer Generally high performance computer configurations consisting of tens or hundreds of configurations consisting of tens or hundreds of processors that interact through processors that interact through interconnections.interconnections.

Virtual:Virtual: The users operate as if the entire grid The users operate as if the entire grid was a single integrated system.was a single integrated system.

Motivation for GridsMotivation for Grids The needThe need

Solutions to many large-scale computational problems Solutions to many large-scale computational problems are not feasible on existing supercomputer systems.are not feasible on existing supercomputer systems.

Expensive equipment are not always available locally.Expensive equipment are not always available locally. There is a growing demand for applications such as There is a growing demand for applications such as

virtual reality, data mining, and remote collaborationvirtual reality, data mining, and remote collaboration Possible solutionsPossible solutions

Many very powerful systems are not fully utilized.Many very powerful systems are not fully utilized. Pooling resources is cost effective.Pooling resources is cost effective. Communication technology is progressing rapidly.Communication technology is progressing rapidly. The internet has shown global interconnections to be The internet has shown global interconnections to be

effectiveeffective

Historical PreliminariesHistorical Preliminaries

Existing infrastructures in societyExisting infrastructures in society Power gridPower grid Transportation systemTransportation system Railroad systemRailroad system

RequirementsRequirements Universal accessUniversal access Standardized useStandardized use DependableDependable Cost Effective Cost Effective

Early Testbed ProjectEarly Testbed ProjectCorporation for National Research Corporation for National Research InitiativesInitiatives

When:When: Early 1990sEarly 1990s Where:Where: Cross ContinentCross Continent By Who:By Who: National National

Science Foundation (NSF) Science Foundation (NSF) and Defense Advanced and Defense Advanced Research Projects (DARPA)Research Projects (DARPA)

Funding:Funding: 15.8 million 15.8 million Purpose:Purpose: Improved Improved

Supercomputer utilization Supercomputer utilization and investigation of Gigabit and investigation of Gigabit network technologiesnetwork technologies

Early TestBed ExamplesEarly TestBed Examples AURORAAURORA – – Link MIT and University of Pennsylvania at Link MIT and University of Pennsylvania at

622 Mbps using fiber optics622 Mbps using fiber optics BLANCABLANCA – – Link AT&T Bell Labs, Berkeley, University of Link AT&T Bell Labs, Berkeley, University of

Wisconsin, LBL, Cray Research, University of Illinois with the Wisconsin, LBL, Cray Research, University of Illinois with the first nationwide testbed.first nationwide testbed.

CASACASA – – Link Los Alamos, Cal Tech, Jet Propulsion Link Los Alamos, Cal Tech, Jet Propulsion Laboratory, San Diego Supercomputing center. High speed Laboratory, San Diego Supercomputing center. High speed point to point networkpoint to point network..

NECTARNECTAR – – Link Pittsburgh Supercomputer Center and Link Pittsburgh Supercomputer Center and Carnegie Mellon University. Interface LAN’s to high speed Carnegie Mellon University. Interface LAN’s to high speed backbones.backbones.

VISTANETVISTANET – Link– Link University of North Carolina, North University of North Carolina, North Carolina State University, MCNC. Prototype for public Carolina State University, MCNC. Prototype for public switched gigabyte network.switched gigabyte network.

Early Results with CASAEarly Results with CASA

Chemical reaction modelingChemical reaction modeling Achieved super linear speedupAchieved super linear speedup 3.3 faster on two supercomputers 3.3 faster on two supercomputers

computers that speed achieved on either computers that speed achieved on either supercomputer.supercomputer.

Matched application to supercomputer Matched application to supercomputer characteristics characteristics

Modeling of ocean currentsModeling of ocean currents Calculations processed at UCLA, real time Calculations processed at UCLA, real time

visualization at Intel, displayed at the SDSCvisualization at Intel, displayed at the SDSC

Lessons LearnedLessons Learned Synchronization Synchronization

difficult requiring tedious manual interventiondifficult requiring tedious manual intervention HeterogeneityHeterogeneity

varied hardware, software, and operational varied hardware, software, and operational proceduresprocedures

Fault ToleranceFault Tolerance Critical for dependable serviceCritical for dependable service

Poor performancePoor performance Latency tolerance, redundant communication, Latency tolerance, redundant communication,

load-balance, communication protocol, load-balance, communication protocol, contention, routingcontention, routing

Security issuesSecurity issues

The I-WAY Project The I-WAY Project (IEEE Supercomputing ’95)(IEEE Supercomputing ’95)

PurposePurpose Connect the individual testbedsConnect the individual testbeds Exploit available resourcesExploit available resources

NSF, DOE, NASA networking infrastructureNSF, DOE, NASA networking infrastructure Supercomputers at dozens of laboratoriesSupercomputers at dozens of laboratories Broadband national networksBroadband national networks High-end graphics workstations High-end graphics workstations Virtual Reality (VR) and collaborative environmentsVirtual Reality (VR) and collaborative environments

Demonstrate innovative approaches to Demonstrate innovative approaches to scientific applicationsscientific applications

Examples of I-Way Examples of I-Way DemonstrationsDemonstrations

N-body galaxy simulationN-body galaxy simulation Coupled Cray, SGI, Thinking Machines, IBMCoupled Cray, SGI, Thinking Machines, IBM Virtual reality results displayed in the CAVE Virtual reality results displayed in the CAVE

at Supercomputing 95.at Supercomputing 95. Cave is a room-sized virtual environment.Cave is a room-sized virtual environment.

Industrial emission control systemIndustrial emission control system Coupled Chicago supercomputer with Cave Coupled Chicago supercomputer with Cave

in San Diego and Washington, D.C.in San Diego and Washington, D.C. Teleimmersion, colaborative environmentTeleimmersion, colaborative environment

Grid PotentialGrid PotentialApplications identified by I-WayApplications identified by I-Way

Large Scale Scientific ProblemsLarge Scale Scientific Problems simulations, data analysissimulations, data analysis

Data Intensive ApplicationsData Intensive Applications Data mining, distributed data storageData mining, distributed data storage

Teleimmersion applicationsTeleimmersion applications Virtual Reality, collaborationVirtual Reality, collaboration

Remote instrument controlRemote instrument control Particle accelerators, medical imaging, Particle accelerators, medical imaging,

feedback controlfeedback control

Current Grid NetworksCurrent Grid NetworksNational Technology Grid NASA Information Power Grid

Globus Ubiquitous Testbed Organization (GUSTO)

Atmospheric Ozone Atmospheric Ozone ObservationObservationLarge scale data collectionLarge scale data collection

DataGrid Testbed

Fundamental Particle Fundamental Particle ResearchResearchRemote instrumentationRemote instrumentation

DataGrid Testbed

Genome ResearchGenome ResearchData mining, code management, remote GUI Data mining, code management, remote GUI interfacesinterfaces

DataGrid Testbed

Teleimmersive Teleimmersive ApplicationsApplications

Electronic Visualization Laboratory University of Illinois at Chicago

Supercomputer SimulationsSupercomputer SimulationsParallel execution and visualizationParallel execution and visualization

Cactus Computational ToolkitUniversity of Illinois at Urbana-Champaign

Couple Photon Source to Couple Photon Source to SupercomputersSupercomputersAdvanced visualization and computationAdvanced visualization and computation

Argonne’s National Laboratory

Technological ChallengesTechnological ChallengesOperationOperation AuthenticationAuthentication AuthorizationAuthorization Resource discoveryResource discovery Resource allocationResource allocation Fault recoveryFault recovery Staging of executablesStaging of executables HeterogeneityHeterogeneity InstrumentationInstrumentation Auditing and Auditing and

accountingaccounting

Application SoftwareApplication Software Data storageData storage Load balancingLoad balancing Latency toleranceLatency tolerance Redundant Redundant

communicationcommunication Existing softwareExisting software

Connectivity

• Latency and bandwidth

• Communication protocol

OperationOperation Authentication and AuthorizationAuthentication and Authorization

Verify who is attempting to use available resourcesVerify who is attempting to use available resources Verify that access to a particular resource is acceptableVerify that access to a particular resource is acceptable

Resource discovery and allocationResource discovery and allocation Published list of resourcesPublished list of resources Updated resource statusUpdated resource status

Fault recovery, synchronization and staging Fault recovery, synchronization and staging executionexecution Dynamic reallocationDynamic reallocation Run time dependenciesRun time dependencies

Heterogeneity and portabilityHeterogeneity and portability Different policies and resources on different systemsDifferent policies and resources on different systems

Application SoftwareApplication Software Data storageData storage

Access to distributed data basesAccess to distributed data bases Load balancingLoad balancing

Workload needs to be balanced across all Workload needs to be balanced across all systemssystems

Latency toleranceLatency tolerance Streaming, overlap processing and Streaming, overlap processing and

communicationcommunication Redundant communicationRedundant communication

Implementation of staged communicationImplementation of staged communication Existing softwareExisting software

Preserving existing software as much as possiblePreserving existing software as much as possible

ConnectivityConnectivity

LatencyLatency Internet based connections vary by 1,000 %Internet based connections vary by 1,000 %

BandwidthBandwidth Mixture of high speed gigabyte connections Mixture of high speed gigabyte connections

and slow speed local connectionsand slow speed local connections ProtocolProtocol

Minimize communication hopsMinimize communication hops ““guarantee of service” instead of “best effort” guarantee of service” instead of “best effort” Adapted to application requirementsAdapted to application requirements

Maintenance and costMaintenance and cost

The Grid environmentThe Grid environment TransparentTransparent

Present the look and feel of a single coupled systemPresent the look and feel of a single coupled system IncrementalIncremental

Minimal initial programming modifications requiredMinimal initial programming modifications required Additional grid benefits achieved over timeAdditional grid benefits achieved over time

PortablePortable Available and easy to install on all popular platformsAvailable and easy to install on all popular platforms

Grid based toolsGrid based tools Minimize effort to implement grid aware Minimize effort to implement grid aware

applicationsapplications

Current DevelopmentsCurrent Developments Models for a grid computing Models for a grid computing

infrastructureinfrastructure Corba (extending “off the shelf” technology)Corba (extending “off the shelf” technology) Legion (object oriented)Legion (object oriented) Globus (toolkit approach)Globus (toolkit approach)

Commercial interestCommercial interest IBM has invested four billion dollars to utilize IBM has invested four billion dollars to utilize

Globus for implementation of grid-based Globus for implementation of grid-based computer farms around the world.computer farms around the world.

NASA is developing its Information Power Grid NASA is developing its Information Power Grid using Globus for the framework.using Globus for the framework.

The European Union is developing the Data The European Union is developing the Data GridGrid

CorbaCorbaA set of facilities linked through “off-the-shelf” A set of facilities linked through “off-the-shelf” packagespackages

Client Server ModelClient Server Model Web based technologyWeb based technology Wrap existing code in Java objectsWrap existing code in Java objects Exploit Object level and thread level Exploit Object level and thread level

parallelismparallelism Utilize Current public-key security Utilize Current public-key security

techniquestechniques

LegionLegionVision:Vision:A single unified virtual machine that ties A single unified virtual machine that ties together millions of processors and objectstogether millions of processors and objects

Object OrientedObject Oriented Each object defines the rules for accessEach object defines the rules for access Well defined object interfacesWell defined object interfaces A core set of objects provide basic servicesA core set of objects provide basic services Users can define and create their own objectsUsers can define and create their own objects Users and executing objects manipulate remote objectsUsers and executing objects manipulate remote objects

High PerformanceHigh Performance Users select hosts based on load and job affinityUsers select hosts based on load and job affinity Object wrapping for support for parallel programming.Object wrapping for support for parallel programming.

User AutonomyUser Autonomy Users choose scheduling policies and security Users choose scheduling policies and security

arrangements.arrangements.

GlobusGlobusAn An open-architectureopen-architecture integrated “bag” of basic grid integrated “bag” of basic grid servicesservices

Middleware layered architectureMiddleware layered architecture builds global services on top of core local services.builds global services on top of core local services.

Translucent interfaces to core Globus Translucent interfaces to core Globus servicesservices Well-defined interfaces that can be accessed Well-defined interfaces that can be accessed

directly by applications.directly by applications. Interfaces that are indirectly accessed through Interfaces that are indirectly accessed through

augmented software tools provided by developersaugmented software tools provided by developers Coexistence with existing applicationsCoexistence with existing applications

System EvolutionSystem Evolution Incremental implementationIncremental implementation Existing tools can be enhanced or replaced as Existing tools can be enhanced or replaced as

neededneeded

DataGrid ProjectDataGrid ProjectBased on GlobusBased on GlobusFunded by the European UnionFunded by the European Union

Globus Core ServicesGlobus Core Services

Meta-computing directory service (MDS)Meta-computing directory service (MDS) Status of global resourcesStatus of global resources

Global security infrastructure (GSI)Global security infrastructure (GSI) Authentication and authorizationAuthentication and authorization

Global resource allocation manager (GRAM)Global resource allocation manager (GRAM) Allocation of processors, instrumentation, memory, etc.Allocation of processors, instrumentation, memory, etc.

Global execution management (GEM) - Global execution management (GEM) - Staging Staging executablesexecutables

Heartbeat monitor (HBM) Heartbeat monitor (HBM) - - Fault detection and recoveryFault detection and recovery

Global access to secondary storage (GASS)Global access to secondary storage (GASS) Access to distributed data storage facilitiesAccess to distributed data storage facilities

Portable heterogeneous communication library Portable heterogeneous communication library (Nexus)(Nexus) Communication between processors, Parallel programmingCommunication between processors, Parallel programming

Future OutlookFuture Outlook

SuccessesSuccesses The basic Globus approach has proven The basic Globus approach has proven

successfulsuccessful Acceptable bandwidths are within reachAcceptable bandwidths are within reach

ChallengesChallenges ““speed of light” latency limitationsspeed of light” latency limitations Improved protocol schemesImproved protocol schemes

Less overhead, fewer hops, less susceptible to loadLess overhead, fewer hops, less susceptible to load Application specificApplication specific

Additional grid-based tools are neededAdditional grid-based tools are needed Open AreasOpen Areas

Which applications are applicable for grid Which applications are applicable for grid implementations?implementations?

Personal ResearchPersonal Research

What applications are suitable for grid solutionsWhat applications are suitable for grid solutions MinEX: A latency-tolerant dynamic partitioner for grid computing MinEX: A latency-tolerant dynamic partitioner for grid computing

applications, applications, Special Issue Journal of the FGCSSpecial Issue Journal of the FGCS, 2002, 2002 A Latency-tolerant partitioner for distributed computing on the A Latency-tolerant partitioner for distributed computing on the

Information Power Grid, Information Power Grid, IPDPS proceedings, 2001IPDPS proceedings, 2001 Latency hiding in partitioning and load balancing of grid Latency hiding in partitioning and load balancing of grid

computing applications, computing applications, IEEE International Symposium CCGridIEEE International Symposium CCGrid, , 2001, Finalist for best paper2001, Finalist for best paper

Partitioning schemePartitioning scheme to maximize grid-based to maximize grid-based performance performance

Performance of a heterogeneous grid partitioner for N-body Performance of a heterogeneous grid partitioner for N-body Applications, Applications, ICPP proceedingsICPP proceedings, 2002, under review, 2002, under review

Helicopter rotor testHelicopter rotor test

Three adaptationsThree adaptations

13,967 - 137,474 vertices13,967 - 137,474 vertices

60,968 - 137,414 60,968 - 137,414 tetrahedratetrahedra

74,343 - 913412 edges74,343 - 913412 edges

Time dependent shock waveTime dependent shock wave

Nine adaptationsNine adaptations

50,000 - 1,833,730 elements50,000 - 1,833,730 elements

Adaptive Mesh Adaptive Mesh ExperimentsExperiments

1) Latency tolerance is more critical as clusters 1) Latency tolerance is more critical as clusters increaseincrease2) New grid-based approaches are needed2) New grid-based approaches are needed

Expected runtimesExpected runtimes(no latency tolerance(no latency tolerance))

INTERCONNECT SLOWDOWNS

Expected runtimes Expected runtimes ((maximum latency tolerance)maximum latency tolerance)

Runtimes in thousands of unitsRuntimes in thousands of units

Experimental ResultsExperimental ResultsSimulated Grids of 32 ProcessorsSimulated Grids of 32 ProcessorsVarying clusters and interconnect speedsVarying clusters and interconnect speeds

Clusters 3 10 100 1000 1 473 473 473 473 2 728 863 1228 4102 3 755 1168 2783 18512 4 791 1361 3667 25040 5 854 1649 5677 53912 6 915 1717 8512 76169 7 956 1915 10958 80568 8 968 2178 11492 93566

Clusters 3 10 100 1000 1 287 287 287 287 2 298 469 763 3941 3 322 548 2386 12705 4 328 680 3297 21888 5 336 768 4369 33092 6 345 856 5044 52668 7 352 893 5480 61079 8 357 1048 5721 61321


PartitioningPartitioning MotivationMotivation

Avoid the possibilities that some processors Avoid the possibilities that some processors are overloaded while others are idle.are overloaded while others are idle.

ImplementationImplementation Suspend the applicationSuspend the application Model the problem using a dual graph Model the problem using a dual graph

approachapproach Utilize an available partitioner to balance loadUtilize an available partitioner to balance load Move data sets between processorsMove data sets between processors Resume the applicationResume the application

Overlap processing, communication, Overlap processing, communication, and data movementand data movement

Minimize communication costsMinimize communication costs Eliminate redundanciesEliminate redundancies RegionalizeRegionalize

Latency ToleranceLatency ToleranceMinimize the effect of high grid bandwidths and Minimize the effect of high grid bandwidths and latencieslatencies

Dual Graph ModelingDual Graph ModelingExample: Adaptive MeshesExample: Adaptive Meshes

Partitioner DeficienciesPartitioner Deficiencies

Partitioning and data set redistribution Partitioning and data set redistribution are executed in separate steps. are executed in separate steps. It is possible to achieve load balance and It is possible to achieve load balance and

still require incur communication coststill require incur communication cost Data set redistribution is not minimized Data set redistribution is not minimized

during partitioningduring partitioning Application latency tolerance algorithms Application latency tolerance algorithms

are not consideredare not considered The grid configuration is not consideredThe grid configuration is not considered

Research GoalsResearch Goals Create a partitioning tool for the GridCreate a partitioning tool for the Grid Objective goal to minimize runtime Objective goal to minimize runtime

instead of achieving processing a instead of achieving processing a balanced workloadbalanced workload

Anticipate latency tolerance Anticipate latency tolerance techniques employed by solverstechniques employed by solvers

Map the partitioning graph onto the Map the partitioning graph onto the grid configurationgrid configuration

Combine partitioning and data-set Combine partitioning and data-set movement into one partitioning stepmovement into one partitioning step

The N-body problemThe N-body problem

Simulating movement of a N-bodiesSimulating movement of a N-bodies Based on gravitational or electrostatic forcesBased on gravitational or electrostatic forces Applicable to many scientific problemsApplicable to many scientific problems

Implementation of SolverImplementation of Solver Modified Barnes & Hut (Treat far clusters as single bodies)Modified Barnes & Hut (Treat far clusters as single bodies) Present dual-graph to MinEX and MeTiS partitioners at Present dual-graph to MinEX and MeTiS partitioners at

each time stepeach time step

Simulated Grid environmentSimulated Grid environment More practical than actually building gridsMore practical than actually building grids Various configurations simulated by changing parametersVarious configurations simulated by changing parameters

- Small advantage at fast interconnects- Small advantage at fast interconnects- Effect of slow bandwidth completely hidden - Effect of slow bandwidth completely hidden - Huge advantage at the slower speeds- Huge advantage at the slower speeds- Improved load balance achieved (Not Shown)- Improved load balance achieved (Not Shown)- Increased MinEX advantage on heterogeneous - Increased MinEX advantage on heterogeneous configurationsconfigurations

MeTisMeTis

Clusters 3 10 100 1000 1 15 15 15 15 2 15 15 40 304 3 15 15 38 331 4 16 15 51 372 5 15 15 52 396 6 16 16 74 391 7 15 16 46 393 8 16 15 70 405

INTERCONNECT SLOWDOWNSMinEXMinEX

Clusters 3 10 100 1000 1 16 16 16 16 2 16 23 91 825 3 16 23 109 1017 4 16 23 119 1115 5 16 23 123 1161 6 16 23 128 1205 7 16 23 131 1244 8 16 23 132 1253


Runtimes in thousands of unitsRuntimes in thousands of units

Experimental ResultsExperimental ResultsComparisions: MinEX to MeTiSComparisions: MinEX to MeTiS

ConclusionsConclusions

Additional InformationAdditional Information The Grid Blueprint for a new Computing The Grid Blueprint for a new Computing

InfrastructureInfrastructure Ian Foster and Carl Kesselman, ISBN 1-55860-475-8Ian Foster and Carl Kesselman, ISBN 1-55860-475-8 Morgan Kaufmann Publishers, 1999Morgan Kaufmann Publishers, 1999 Semi-technical overview of grid computingSemi-technical overview of grid computing 589 references to publications589 references to publications

http://www.http://www.globusglobus.org.org Downloadable softwareDownloadable software Detailed description of GlobusDetailed description of Globus

Personal Web Site (Personal Web Site (http://www.http://www.sousou..eduedu//cscs//harveyharvey))

development of the grid computing infrastructure daniel j. harvey acknowledgements nasa ames...

Documents

entire grid

california slide

existing supercomputer

virtual reality

national research initiatives

single integrated system

need solutions

global interconnections