development of the grid computing infrastructure daniel j. harvey acknowledgements nasa ames...
Post on 19-Dec-2015
213 views
TRANSCRIPT
Development of the Development of the Grid Computing Grid Computing InfrastructureInfrastructureDaniel J. HarveyDaniel J. Harvey
AcknowledgementsAcknowledgementsNASA Ames Research Center NASA Ames Research Center
Sunnyvale, CaliforniaSunnyvale, California
Presentation OverviewPresentation Overview
What is grid computing?What is grid computing? Historical preliminariesHistorical preliminaries Grid potentialGrid potential Technological challengesTechnological challenges Current developmentsCurrent developments Future outlookFuture outlook Personal researchPersonal research
What is Grid Computing?What is Grid Computing?
The grid is a hardware/software infrastructure The grid is a hardware/software infrastructure that enables heterogeneous geographically that enables heterogeneous geographically separated clusters of processors to be separated clusters of processors to be connected in a virtual environment.connected in a virtual environment.
Heterogeneous:Heterogeneous: The computer configurations The computer configurations are highly varied.are highly varied.
Clusters:Clusters: Generally high performance computer Generally high performance computer configurations consisting of tens or hundreds of configurations consisting of tens or hundreds of processors that interact through processors that interact through interconnections.interconnections.
Virtual:Virtual: The users operate as if the entire grid The users operate as if the entire grid was a single integrated system.was a single integrated system.
Motivation for GridsMotivation for Grids The needThe need
Solutions to many large-scale computational problems Solutions to many large-scale computational problems are not feasible on existing supercomputer systems.are not feasible on existing supercomputer systems.
Expensive equipment are not always available locally.Expensive equipment are not always available locally. There is a growing demand for applications such as There is a growing demand for applications such as
virtual reality, data mining, and remote collaborationvirtual reality, data mining, and remote collaboration Possible solutionsPossible solutions
Many very powerful systems are not fully utilized.Many very powerful systems are not fully utilized. Pooling resources is cost effective.Pooling resources is cost effective. Communication technology is progressing rapidly.Communication technology is progressing rapidly. The internet has shown global interconnections to be The internet has shown global interconnections to be
effectiveeffective
Historical PreliminariesHistorical Preliminaries
Existing infrastructures in societyExisting infrastructures in society Power gridPower grid Transportation systemTransportation system Railroad systemRailroad system
RequirementsRequirements Universal accessUniversal access Standardized useStandardized use DependableDependable Cost Effective Cost Effective
Early Testbed ProjectEarly Testbed ProjectCorporation for National Research Corporation for National Research InitiativesInitiatives
When:When: Early 1990sEarly 1990s Where:Where: Cross ContinentCross Continent By Who:By Who: National National
Science Foundation (NSF) Science Foundation (NSF) and Defense Advanced and Defense Advanced Research Projects (DARPA)Research Projects (DARPA)
Funding:Funding: 15.8 million 15.8 million Purpose:Purpose: Improved Improved
Supercomputer utilization Supercomputer utilization and investigation of Gigabit and investigation of Gigabit network technologiesnetwork technologies
Early TestBed ExamplesEarly TestBed Examples AURORAAURORA – – Link MIT and University of Pennsylvania at Link MIT and University of Pennsylvania at
622 Mbps using fiber optics622 Mbps using fiber optics BLANCABLANCA – – Link AT&T Bell Labs, Berkeley, University of Link AT&T Bell Labs, Berkeley, University of
Wisconsin, LBL, Cray Research, University of Illinois with the Wisconsin, LBL, Cray Research, University of Illinois with the first nationwide testbed.first nationwide testbed.
CASACASA – – Link Los Alamos, Cal Tech, Jet Propulsion Link Los Alamos, Cal Tech, Jet Propulsion Laboratory, San Diego Supercomputing center. High speed Laboratory, San Diego Supercomputing center. High speed point to point networkpoint to point network..
NECTARNECTAR – – Link Pittsburgh Supercomputer Center and Link Pittsburgh Supercomputer Center and Carnegie Mellon University. Interface LAN’s to high speed Carnegie Mellon University. Interface LAN’s to high speed backbones.backbones.
VISTANETVISTANET – Link– Link University of North Carolina, North University of North Carolina, North Carolina State University, MCNC. Prototype for public Carolina State University, MCNC. Prototype for public switched gigabyte network.switched gigabyte network.
Early Results with CASAEarly Results with CASA
Chemical reaction modelingChemical reaction modeling Achieved super linear speedupAchieved super linear speedup 3.3 faster on two supercomputers 3.3 faster on two supercomputers
computers that speed achieved on either computers that speed achieved on either supercomputer.supercomputer.
Matched application to supercomputer Matched application to supercomputer characteristics characteristics
Modeling of ocean currentsModeling of ocean currents Calculations processed at UCLA, real time Calculations processed at UCLA, real time
visualization at Intel, displayed at the SDSCvisualization at Intel, displayed at the SDSC
Lessons LearnedLessons Learned Synchronization Synchronization
difficult requiring tedious manual interventiondifficult requiring tedious manual intervention HeterogeneityHeterogeneity
varied hardware, software, and operational varied hardware, software, and operational proceduresprocedures
Fault ToleranceFault Tolerance Critical for dependable serviceCritical for dependable service
Poor performancePoor performance Latency tolerance, redundant communication, Latency tolerance, redundant communication,
load-balance, communication protocol, load-balance, communication protocol, contention, routingcontention, routing
Security issuesSecurity issues
The I-WAY Project The I-WAY Project (IEEE Supercomputing ’95)(IEEE Supercomputing ’95)
PurposePurpose Connect the individual testbedsConnect the individual testbeds Exploit available resourcesExploit available resources
NSF, DOE, NASA networking infrastructureNSF, DOE, NASA networking infrastructure Supercomputers at dozens of laboratoriesSupercomputers at dozens of laboratories Broadband national networksBroadband national networks High-end graphics workstations High-end graphics workstations Virtual Reality (VR) and collaborative environmentsVirtual Reality (VR) and collaborative environments
Demonstrate innovative approaches to Demonstrate innovative approaches to scientific applicationsscientific applications
Examples of I-Way Examples of I-Way DemonstrationsDemonstrations
N-body galaxy simulationN-body galaxy simulation Coupled Cray, SGI, Thinking Machines, IBMCoupled Cray, SGI, Thinking Machines, IBM Virtual reality results displayed in the CAVE Virtual reality results displayed in the CAVE
at Supercomputing 95.at Supercomputing 95. Cave is a room-sized virtual environment.Cave is a room-sized virtual environment.
Industrial emission control systemIndustrial emission control system Coupled Chicago supercomputer with Cave Coupled Chicago supercomputer with Cave
in San Diego and Washington, D.C.in San Diego and Washington, D.C. Teleimmersion, colaborative environmentTeleimmersion, colaborative environment
Grid PotentialGrid PotentialApplications identified by I-WayApplications identified by I-Way
Large Scale Scientific ProblemsLarge Scale Scientific Problems simulations, data analysissimulations, data analysis
Data Intensive ApplicationsData Intensive Applications Data mining, distributed data storageData mining, distributed data storage
Teleimmersion applicationsTeleimmersion applications Virtual Reality, collaborationVirtual Reality, collaboration
Remote instrument controlRemote instrument control Particle accelerators, medical imaging, Particle accelerators, medical imaging,
feedback controlfeedback control
Current Grid NetworksCurrent Grid NetworksNational Technology Grid NASA Information Power Grid
Globus Ubiquitous Testbed Organization (GUSTO)
Atmospheric Ozone Atmospheric Ozone ObservationObservationLarge scale data collectionLarge scale data collection
DataGrid Testbed
Fundamental Particle Fundamental Particle ResearchResearchRemote instrumentationRemote instrumentation
DataGrid Testbed
Genome ResearchGenome ResearchData mining, code management, remote GUI Data mining, code management, remote GUI interfacesinterfaces
DataGrid Testbed
Teleimmersive Teleimmersive ApplicationsApplications
Electronic Visualization Laboratory University of Illinois at Chicago
Supercomputer SimulationsSupercomputer SimulationsParallel execution and visualizationParallel execution and visualization
Cactus Computational ToolkitUniversity of Illinois at Urbana-Champaign
Couple Photon Source to Couple Photon Source to SupercomputersSupercomputersAdvanced visualization and computationAdvanced visualization and computation
Argonne’s National Laboratory
Technological ChallengesTechnological ChallengesOperationOperation AuthenticationAuthentication AuthorizationAuthorization Resource discoveryResource discovery Resource allocationResource allocation Fault recoveryFault recovery Staging of executablesStaging of executables HeterogeneityHeterogeneity InstrumentationInstrumentation Auditing and Auditing and
accountingaccounting
Application SoftwareApplication Software Data storageData storage Load balancingLoad balancing Latency toleranceLatency tolerance Redundant Redundant
communicationcommunication Existing softwareExisting software
Connectivity
• Latency and bandwidth
• Communication protocol
OperationOperation Authentication and AuthorizationAuthentication and Authorization
Verify who is attempting to use available resourcesVerify who is attempting to use available resources Verify that access to a particular resource is acceptableVerify that access to a particular resource is acceptable
Resource discovery and allocationResource discovery and allocation Published list of resourcesPublished list of resources Updated resource statusUpdated resource status
Fault recovery, synchronization and staging Fault recovery, synchronization and staging executionexecution Dynamic reallocationDynamic reallocation Run time dependenciesRun time dependencies
Heterogeneity and portabilityHeterogeneity and portability Different policies and resources on different systemsDifferent policies and resources on different systems
Application SoftwareApplication Software Data storageData storage
Access to distributed data basesAccess to distributed data bases Load balancingLoad balancing
Workload needs to be balanced across all Workload needs to be balanced across all systemssystems
Latency toleranceLatency tolerance Streaming, overlap processing and Streaming, overlap processing and
communicationcommunication Redundant communicationRedundant communication
Implementation of staged communicationImplementation of staged communication Existing softwareExisting software
Preserving existing software as much as possiblePreserving existing software as much as possible
ConnectivityConnectivity
LatencyLatency Internet based connections vary by 1,000 %Internet based connections vary by 1,000 %
BandwidthBandwidth Mixture of high speed gigabyte connections Mixture of high speed gigabyte connections
and slow speed local connectionsand slow speed local connections ProtocolProtocol
Minimize communication hopsMinimize communication hops ““guarantee of service” instead of “best effort” guarantee of service” instead of “best effort” Adapted to application requirementsAdapted to application requirements
Maintenance and costMaintenance and cost
The Grid environmentThe Grid environment TransparentTransparent
Present the look and feel of a single coupled systemPresent the look and feel of a single coupled system IncrementalIncremental
Minimal initial programming modifications requiredMinimal initial programming modifications required Additional grid benefits achieved over timeAdditional grid benefits achieved over time
PortablePortable Available and easy to install on all popular platformsAvailable and easy to install on all popular platforms
Grid based toolsGrid based tools Minimize effort to implement grid aware Minimize effort to implement grid aware
applicationsapplications
Current DevelopmentsCurrent Developments Models for a grid computing Models for a grid computing
infrastructureinfrastructure Corba (extending “off the shelf” technology)Corba (extending “off the shelf” technology) Legion (object oriented)Legion (object oriented) Globus (toolkit approach)Globus (toolkit approach)
Commercial interestCommercial interest IBM has invested four billion dollars to utilize IBM has invested four billion dollars to utilize
Globus for implementation of grid-based Globus for implementation of grid-based computer farms around the world.computer farms around the world.
NASA is developing its Information Power Grid NASA is developing its Information Power Grid using Globus for the framework.using Globus for the framework.
The European Union is developing the Data The European Union is developing the Data GridGrid
CorbaCorbaA set of facilities linked through “off-the-shelf” A set of facilities linked through “off-the-shelf” packagespackages
Client Server ModelClient Server Model Web based technologyWeb based technology Wrap existing code in Java objectsWrap existing code in Java objects Exploit Object level and thread level Exploit Object level and thread level
parallelismparallelism Utilize Current public-key security Utilize Current public-key security
techniquestechniques
LegionLegionVision:Vision:A single unified virtual machine that ties A single unified virtual machine that ties together millions of processors and objectstogether millions of processors and objects
Object OrientedObject Oriented Each object defines the rules for accessEach object defines the rules for access Well defined object interfacesWell defined object interfaces A core set of objects provide basic servicesA core set of objects provide basic services Users can define and create their own objectsUsers can define and create their own objects Users and executing objects manipulate remote objectsUsers and executing objects manipulate remote objects
High PerformanceHigh Performance Users select hosts based on load and job affinityUsers select hosts based on load and job affinity Object wrapping for support for parallel programming.Object wrapping for support for parallel programming.
User AutonomyUser Autonomy Users choose scheduling policies and security Users choose scheduling policies and security
arrangements.arrangements.
GlobusGlobusAn An open-architectureopen-architecture integrated “bag” of basic grid integrated “bag” of basic grid servicesservices
Middleware layered architectureMiddleware layered architecture builds global services on top of core local services.builds global services on top of core local services.
Translucent interfaces to core Globus Translucent interfaces to core Globus servicesservices Well-defined interfaces that can be accessed Well-defined interfaces that can be accessed
directly by applications.directly by applications. Interfaces that are indirectly accessed through Interfaces that are indirectly accessed through
augmented software tools provided by developersaugmented software tools provided by developers Coexistence with existing applicationsCoexistence with existing applications
System EvolutionSystem Evolution Incremental implementationIncremental implementation Existing tools can be enhanced or replaced as Existing tools can be enhanced or replaced as
neededneeded
DataGrid ProjectDataGrid ProjectBased on GlobusBased on GlobusFunded by the European UnionFunded by the European Union
Globus Core ServicesGlobus Core Services
Meta-computing directory service (MDS)Meta-computing directory service (MDS) Status of global resourcesStatus of global resources
Global security infrastructure (GSI)Global security infrastructure (GSI) Authentication and authorizationAuthentication and authorization
Global resource allocation manager (GRAM)Global resource allocation manager (GRAM) Allocation of processors, instrumentation, memory, etc.Allocation of processors, instrumentation, memory, etc.
Global execution management (GEM) - Global execution management (GEM) - Staging Staging executablesexecutables
Heartbeat monitor (HBM) Heartbeat monitor (HBM) - - Fault detection and recoveryFault detection and recovery
Global access to secondary storage (GASS)Global access to secondary storage (GASS) Access to distributed data storage facilitiesAccess to distributed data storage facilities
Portable heterogeneous communication library Portable heterogeneous communication library (Nexus)(Nexus) Communication between processors, Parallel programmingCommunication between processors, Parallel programming
Future OutlookFuture Outlook
SuccessesSuccesses The basic Globus approach has proven The basic Globus approach has proven
successfulsuccessful Acceptable bandwidths are within reachAcceptable bandwidths are within reach
ChallengesChallenges ““speed of light” latency limitationsspeed of light” latency limitations Improved protocol schemesImproved protocol schemes
Less overhead, fewer hops, less susceptible to loadLess overhead, fewer hops, less susceptible to load Application specificApplication specific
Additional grid-based tools are neededAdditional grid-based tools are needed Open AreasOpen Areas
Which applications are applicable for grid Which applications are applicable for grid implementations?implementations?
Personal ResearchPersonal Research
What applications are suitable for grid solutionsWhat applications are suitable for grid solutions MinEX: A latency-tolerant dynamic partitioner for grid computing MinEX: A latency-tolerant dynamic partitioner for grid computing
applications, applications, Special Issue Journal of the FGCSSpecial Issue Journal of the FGCS, 2002, 2002 A Latency-tolerant partitioner for distributed computing on the A Latency-tolerant partitioner for distributed computing on the
Information Power Grid, Information Power Grid, IPDPS proceedings, 2001IPDPS proceedings, 2001 Latency hiding in partitioning and load balancing of grid Latency hiding in partitioning and load balancing of grid
computing applications, computing applications, IEEE International Symposium CCGridIEEE International Symposium CCGrid, , 2001, Finalist for best paper2001, Finalist for best paper
Partitioning schemePartitioning scheme to maximize grid-based to maximize grid-based performance performance
Performance of a heterogeneous grid partitioner for N-body Performance of a heterogeneous grid partitioner for N-body Applications, Applications, ICPP proceedingsICPP proceedings, 2002, under review, 2002, under review
Helicopter rotor testHelicopter rotor test
Three adaptationsThree adaptations
13,967 - 137,474 vertices13,967 - 137,474 vertices
60,968 - 137,414 60,968 - 137,414 tetrahedratetrahedra
74,343 - 913412 edges74,343 - 913412 edges
Time dependent shock waveTime dependent shock wave
Nine adaptationsNine adaptations
50,000 - 1,833,730 elements50,000 - 1,833,730 elements
Adaptive Mesh Adaptive Mesh ExperimentsExperiments
1) Latency tolerance is more critical as clusters 1) Latency tolerance is more critical as clusters increaseincrease2) New grid-based approaches are needed2) New grid-based approaches are needed
Expected runtimesExpected runtimes(no latency tolerance(no latency tolerance))
INTERCONNECT SLOWDOWNS
Expected runtimes Expected runtimes ((maximum latency tolerance)maximum latency tolerance)
Runtimes in thousands of unitsRuntimes in thousands of units
Experimental ResultsExperimental ResultsSimulated Grids of 32 ProcessorsSimulated Grids of 32 ProcessorsVarying clusters and interconnect speedsVarying clusters and interconnect speeds
Clusters 3 10 100 1000 1 473 473 473 473 2 728 863 1228 4102 3 755 1168 2783 18512 4 791 1361 3667 25040 5 854 1649 5677 53912 6 915 1717 8512 76169 7 956 1915 10958 80568 8 968 2178 11492 93566
Clusters 3 10 100 1000 1 287 287 287 287 2 298 469 763 3941 3 322 548 2386 12705 4 328 680 3297 21888 5 336 768 4369 33092 6 345 856 5044 52668 7 352 893 5480 61079 8 357 1048 5721 61321
INTERCONNECT SLOWDOWNS
PartitioningPartitioning MotivationMotivation
Avoid the possibilities that some processors Avoid the possibilities that some processors are overloaded while others are idle.are overloaded while others are idle.
ImplementationImplementation Suspend the applicationSuspend the application Model the problem using a dual graph Model the problem using a dual graph
approachapproach Utilize an available partitioner to balance loadUtilize an available partitioner to balance load Move data sets between processorsMove data sets between processors Resume the applicationResume the application
Overlap processing, communication, Overlap processing, communication, and data movementand data movement
Minimize communication costsMinimize communication costs Eliminate redundanciesEliminate redundancies RegionalizeRegionalize
Latency ToleranceLatency ToleranceMinimize the effect of high grid bandwidths and Minimize the effect of high grid bandwidths and latencieslatencies
Dual Graph ModelingDual Graph ModelingExample: Adaptive MeshesExample: Adaptive Meshes
Partitioner DeficienciesPartitioner Deficiencies
Partitioning and data set redistribution Partitioning and data set redistribution are executed in separate steps. are executed in separate steps. It is possible to achieve load balance and It is possible to achieve load balance and
still require incur communication coststill require incur communication cost Data set redistribution is not minimized Data set redistribution is not minimized
during partitioningduring partitioning Application latency tolerance algorithms Application latency tolerance algorithms
are not consideredare not considered The grid configuration is not consideredThe grid configuration is not considered
Research GoalsResearch Goals Create a partitioning tool for the GridCreate a partitioning tool for the Grid Objective goal to minimize runtime Objective goal to minimize runtime
instead of achieving processing a instead of achieving processing a balanced workloadbalanced workload
Anticipate latency tolerance Anticipate latency tolerance techniques employed by solverstechniques employed by solvers
Map the partitioning graph onto the Map the partitioning graph onto the grid configurationgrid configuration
Combine partitioning and data-set Combine partitioning and data-set movement into one partitioning stepmovement into one partitioning step
The N-body problemThe N-body problem
Simulating movement of a N-bodiesSimulating movement of a N-bodies Based on gravitational or electrostatic forcesBased on gravitational or electrostatic forces Applicable to many scientific problemsApplicable to many scientific problems
Implementation of SolverImplementation of Solver Modified Barnes & Hut (Treat far clusters as single bodies)Modified Barnes & Hut (Treat far clusters as single bodies) Present dual-graph to MinEX and MeTiS partitioners at Present dual-graph to MinEX and MeTiS partitioners at
each time stepeach time step
Simulated Grid environmentSimulated Grid environment More practical than actually building gridsMore practical than actually building grids Various configurations simulated by changing parametersVarious configurations simulated by changing parameters
- Small advantage at fast interconnects- Small advantage at fast interconnects- Effect of slow bandwidth completely hidden - Effect of slow bandwidth completely hidden - Huge advantage at the slower speeds- Huge advantage at the slower speeds- Improved load balance achieved (Not Shown)- Improved load balance achieved (Not Shown)- Increased MinEX advantage on heterogeneous - Increased MinEX advantage on heterogeneous configurationsconfigurations
MeTisMeTis
Clusters 3 10 100 1000 1 15 15 15 15 2 15 15 40 304 3 15 15 38 331 4 16 15 51 372 5 15 15 52 396 6 16 16 74 391 7 15 16 46 393 8 16 15 70 405
INTERCONNECT SLOWDOWNSMinEXMinEX
Clusters 3 10 100 1000 1 16 16 16 16 2 16 23 91 825 3 16 23 109 1017 4 16 23 119 1115 5 16 23 123 1161 6 16 23 128 1205 7 16 23 131 1244 8 16 23 132 1253
INTERCONNECT SLOWDOWNS
Runtimes in thousands of unitsRuntimes in thousands of units
Experimental ResultsExperimental ResultsComparisions: MinEX to MeTiSComparisions: MinEX to MeTiS
ConclusionsConclusions
Additional InformationAdditional Information The Grid Blueprint for a new Computing The Grid Blueprint for a new Computing
InfrastructureInfrastructure Ian Foster and Carl Kesselman, ISBN 1-55860-475-8Ian Foster and Carl Kesselman, ISBN 1-55860-475-8 Morgan Kaufmann Publishers, 1999Morgan Kaufmann Publishers, 1999 Semi-technical overview of grid computingSemi-technical overview of grid computing 589 references to publications589 references to publications
http://www.http://www.globusglobus.org.org Downloadable softwareDownloadable software Detailed description of GlobusDetailed description of Globus
Personal Web Site (Personal Web Site (http://www.http://www.sousou..eduedu//cscs//harveyharvey))