Resource and Test Management in Grids

June 21, 2022
Resource and Test Management in Grids
Rapid Prototyping in e-Science VL-e Workshop, Amsterdam, NL
Dick Epema, Catalin Dumitrescu, Hashim Mohamed, Alexandru Iosup, Ozan Sonmez
Parallel and Distributed Systems Group, Delft University of Technology




Page 1: Resource and Test Management in Grids

Resource and Test Management in Grids

Rapid Prototyping in e-Science VL-e Workshop, Amsterdam, NL

Dick Epema, Catalin Dumitrescu, Hashim Mohamed, Alexandru Iosup, Ozan Sonmez

Parallel and Distributed Systems Group, Delft University of Technology

Page 2: Resource and Test Management in Grids

A Brief Introduction to Grid Computing

• Typical grid environment, e.g., the DAS
  • Applications [!]
  • Resources
    • Compute (clusters)
    • Storage
    • (Dedicated) network
  • Virtual Organizations, Projects (e.g., VL-e), Groups, Users
• Grids vs. (traditional) parallel production environments
  • Dynamic
  • Heterogeneous
  • Very large-scale (world)
  • No central administration

→ Most problems are NP-hard, need experimental validation

Page 3: Resource and Test Management in Grids

Outline

• A Brief Introduction to Grid Computing
• Koala: Processor and Data Co-Allocation in Grids
  • The Co-Allocation Problem in Grids
  • The Koala Design
  • Koala and the DAS Community
  • The Future of Koala
• GrenchMark: Analyzing, Testing, and Comparing Grids
  • Grid Performance Evaluation Issues
  • The GrenchMark Architecture
  • Experience with GrenchMark
• Take home message

Page 4: Resource and Test Management in Grids

The Co-allocation Problem in Grids (1)
Motivation

• Co-allocation = the simultaneous allocation of resources in multiple clusters to single applications which consist of multiple components
• Reasons
  • Use more resources than available at a single cluster at a given time
  • Create a specific virtual environment (e.g., visualization cluster, geographically spread data)
  • Achieve reliability through replication on multiple clusters
  • Avoid resource contention on the same site (e.g., batches)

Page 5: Resource and Test Management in Grids

The Co-allocation Problem in Grids (2)
Overall Example

[Figure: global jobs enter the global queue of KOALA; local jobs enter the local queues of the clusters, each with its own local scheduler (LS). Diagram labels: load sharing, co-allocation.]

Source: Dick Epema

Page 6: Resource and Test Management in Grids

The Co-allocation Problem in Grids (3)
Details: Processors and Data Co-Allocation

• Jobs have access to processors and data from many sites
• Files stored at different file sites, replicas may exist
• Scheduler decides on job component placement at execution sites
• Jobs can be of high or low priority

Source: Hashim Mohamed
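Because a file may have replicas at several file sites, placing a job component also means choosing which replica to fetch. The sketch below picks the replica with the lowest estimated transfer time to a candidate execution site. The function name, the site names, and the bandwidth table are illustrative assumptions; they are not taken from Koala.

```python
from typing import Dict, Iterable, Tuple


def best_replica(file_size_mb: float,
                 replica_sites: Iterable[str],
                 exec_site: str,
                 bandwidth_mb_s: Dict[Tuple[str, str], float]) -> Tuple[str, float]:
    """Return (replica site, estimated transfer seconds) for the fastest source.

    bandwidth_mb_s maps (source site, destination site) to MB/s and is a
    hypothetical input; a real scheduler would measure or look this up.
    """
    best_site, best_time = None, float("inf")
    for site in replica_sites:
        if site == exec_site:
            return site, 0.0  # replica is already local, nothing to transfer
        seconds = file_size_mb / bandwidth_mb_s[(site, exec_site)]
        if seconds < best_time:
            best_site, best_time = site, seconds
    if best_site is None:
        raise ValueError("file has no replicas")
    return best_site, best_time


# Hypothetical three-site example.
links = {("site_a", "site_c"): 10.0, ("site_b", "site_c"): 50.0}
choice = best_replica(500.0, ["site_a", "site_b"], "site_c", links)
```

A scheduler could repeat this estimate for every candidate execution site and prefer the site with the smallest total transfer time.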

Page 7: Resource and Test Management in Grids

The Co-allocation Problem in Grids (4)
Details: Co-Allocated Job Types

• fixed jobs: job component size and placement fixed by user
• non-fixed jobs: job component size fixed by user, placement by scheduler decision
• semi-fixed jobs: job component size and placement by scheduler decision / fixed by user
• flexible jobs: job component size and placement by scheduler decision
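The four job types differ only in who decides the size and the placement of each component. A minimal sketch of that distinction as a lookup table follows. All class and field names are hypothetical, and the `MIXED` value for semi-fixed jobs is one reading of the slide's "scheduler decision / fixed by user".

```python
from dataclasses import dataclass
from enum import Enum


class Decider(Enum):
    USER = "user"
    SCHEDULER = "scheduler"
    MIXED = "scheduler or user"  # assumption: varies per component


@dataclass(frozen=True)
class JobType:
    name: str
    size_decided_by: Decider
    placement_decided_by: Decider


# One entry per job type listed on the slide.
JOB_TYPES = {
    "fixed": JobType("fixed", Decider.USER, Decider.USER),
    "non-fixed": JobType("non-fixed", Decider.USER, Decider.SCHEDULER),
    "semi-fixed": JobType("semi-fixed", Decider.MIXED, Decider.MIXED),
    "flexible": JobType("flexible", Decider.SCHEDULER, Decider.SCHEDULER),
}


def scheduler_may_place(job_type: str) -> bool:
    """True if the scheduler chooses the site for at least some components."""
    return JOB_TYPES[job_type].placement_decided_by is not Decider.USER
```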

Page 8: Resource and Test Management in Grids

The Koala Design

• Selection: placing job components
• Control: transfer executable and input files
• Instantiation: claiming resources selected for each job component
• Run: submit, then monitor job execution (fault-tolerance)

Source: Hashim Mohamed

Page 9: Resource and Test Management in Grids

The Koala Selection Step
Many Placement Policies

• Originally supported co-allocation policies:
  • Worst-Fit: balance job components across sites
  • Close-to-Files: take into account the locations of input files to minimize transfer times
  • (Flexible) Cluster Minimization: mitigate inter-cluster communication; can also split the job automatically
• But, different application types require different ways of component placement
• So:
  • Modular structure with pluggable policies
  • Take into account internal structure of applications
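To make the Worst-Fit idea concrete, here is a minimal sketch: components are handled largest first, and each goes to the site that currently has the most idle processors, which tends to balance load across sites. This is the textbook worst-fit heuristic under simplifying assumptions (a static snapshot of idle processors, hypothetical site names, ties broken by site name); it is not Koala's actual code.

```python
from typing import Dict, List, Optional


def worst_fit(component_sizes: List[int],
              idle: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Place each component on the site with the most idle processors.

    Returns {component index: site name}, or None if some component does
    not fit on any site (the job would then have to wait).
    Assumes `idle` is non-empty.
    """
    remaining = dict(idle)  # work on a copy, leave the caller's view intact
    placement: Dict[int, str] = {}
    # Largest components first, so that big ones are not squeezed out.
    order = sorted(range(len(component_sizes)),
                   key=lambda i: component_sizes[i], reverse=True)
    for i in order:
        # Site with the most idle processors; ties go to the larger name.
        site = max(remaining, key=lambda s: (remaining[s], s))
        if remaining[site] < component_sizes[i]:
            return None  # even the emptiest site is too small
        placement[i] = site
        remaining[site] -= component_sizes[i]
    return placement


# Hypothetical snapshot: three sites, one job with three components.
plan = worst_fit([8, 4, 4], {"A": 10, "B": 12, "C": 6})
```

Other policies could share this signature, which is one way to get the pluggable structure the slide mentions.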

Page 10: Resource and Test Management in Grids

The Koala Selection Step
HOCs: Exploiting Application Structure

• Higher-Order Components:
  • Pre-packaged software components with generic patterns of parallel behavior
  • Patterns: master-worker, pipelines, wavefront
• Benefits:
  • Facilitate parallel programming in grids
  • Enable user-transparent scheduling in grids
• Most important additional middleware:
  • Translation layer that builds a performance model from the HOC patterns and the user-supplied application parameters
• Supported by KOALA (with Univ. of Münster)
• Initial results: up to 50% reduction in runtimes

Page 11: Resource and Test Management in Grids

The Koala Instantiation Step
The Runners

• Problem: How to support many application types, each with specific (and difficult) requirements?
• Solution: runners (= interface modules)
• Currently supported:
  • Any type of single-component job
  • MPI/DUROC jobs
  • Ibis jobs
  • HOC applications
• API for extensions: write your own!
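One way to picture runners as interface modules is a small plug-in registry: each application type registers a class that knows how to start its jobs, and the scheduler looks the class up by job type. The sketch below shows that pattern only. The `Runner` class, the `register` decorator, and the returned identifier format are hypothetical and do not describe the actual Koala extension API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class Runner(ABC):
    """Hypothetical interface module between the scheduler and one job type."""

    @abstractmethod
    def submit(self, components: List[str]) -> str:
        """Start the job's components and return a job identifier."""


_RUNNERS: Dict[str, Type[Runner]] = {}


def register(job_type: str) -> Callable[[Type[Runner]], Type[Runner]]:
    """Class decorator that makes a runner available under a job-type name."""
    def wrap(cls: Type[Runner]) -> Type[Runner]:
        _RUNNERS[job_type] = cls
        return cls
    return wrap


@register("single")
class SingleComponentRunner(Runner):
    """Runner for jobs that consist of exactly one component."""

    def submit(self, components: List[str]) -> str:
        if len(components) != 1:
            raise ValueError("expected exactly one component")
        return f"single:{components[0]}"  # placeholder identifier


def runner_for(job_type: str) -> Runner:
    """Look up and instantiate the runner registered for a job type."""
    return _RUNNERS[job_type]()


job_id = runner_for("single").submit(["a.out"])
```

Supporting a new application type then means adding one decorated class, without touching the lookup code.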

Page 12: Resource and Test Management in Grids

Koala and the DAS Community

• Extensive experience gathered while assessing various co-allocation policies: over 25,000 completed jobs!
• Koala was released on the DAS in Sep 2005 [ www.st.ewi.tudelft.nl/koala/ ]
  • Hands-on tutorials (last in Spring 2006)
  • Documentation (web-site)
  • Papers
    • IEEE Cluster’04, Dagstuhl FGG’04, EGC’05, IEEE CCGrid’05, IEEE Cluster’06, etc.
• Koala helps you get results:
  • IEEE CCGrid’06, others submitted

Page 13: Resource and Test Management in Grids

The Future of Koala

• Support for more application types, e.g.,
  • Workflows, parameter sweep applications
  • Scheduling your application?
• Communication-aware and application-aware scheduling policies:
  • Take into account the communication pattern of applications when co-allocating
  • Also schedule bandwidth (in DAS3)
• Support heterogeneity
  • DAS3
  • DAS2 + DAS3
  • DAS3 + Grid’5000 + RoGRID
• Peer-to-peer structure instead of hierarchical grid scheduler

Page 14: Resource and Test Management in Grids

Outline

• A Brief Introduction to Grid Computing
• Koala: Processor and Data Co-Allocation in Grids
  • The Co-Allocation Problem in Grids
  • The Koala Design
  • Koala and the DAS Community
  • The Future of Koala
• GrenchMark: Analyzing, Testing, and Comparing Grids
  • Grid Performance Evaluation Issues
  • The GrenchMark Architecture
  • GrenchMark and the DAS Community
• Take home message

Page 15: Resource and Test Management in Grids

Grid Performance Evaluation
Current Practice

• Performance indicators
  • Define my own metrics, or use U and AWT/ART, or both
• Workload structure
  • Run my own workload; mostly under the (unrealistic) "all users are created equal" assumption
  • Do not make comparisons (incompatible workloads)
  • No repeatability of results (e.g., background load)

Need a common performance evaluation framework for grids: GrenchMark
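The indicators named above are read here as utilization (U), average wait time (AWT), and average response time (ART). As a worked illustration, they can be computed from per-job timestamps as follows. The record layout and the definition of utilization over the span from first submission to last completion are assumptions of this sketch, not a definition taken from GrenchMark.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JobRecord:
    submit: float     # time the job was submitted (s)
    start: float      # time it started running (s)
    finish: float     # time it finished (s)
    processors: int   # processors held while running


def average_wait_time(jobs: Sequence[JobRecord]) -> float:
    """AWT: mean time between submission and start."""
    return sum(j.start - j.submit for j in jobs) / len(jobs)


def average_response_time(jobs: Sequence[JobRecord]) -> float:
    """ART: mean time between submission and completion."""
    return sum(j.finish - j.submit for j in jobs) / len(jobs)


def utilization(jobs: Sequence[JobRecord], total_processors: int) -> float:
    """U: busy processor-seconds divided by available processor-seconds."""
    span = max(j.finish for j in jobs) - min(j.submit for j in jobs)
    busy = sum((j.finish - j.start) * j.processors for j in jobs)
    return busy / (span * total_processors)


# Two 4-processor jobs run back to back on an 8-processor system.
trace = [JobRecord(0.0, 0.0, 10.0, 4), JobRecord(0.0, 10.0, 20.0, 4)]
awt = average_wait_time(trace)
art = average_response_time(trace)
u = utilization(trace, total_processors=8)
```

In this trace AWT is 5 s, ART is 15 s, and U is 0.5, since half of the 160 available processor-seconds are used.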

Page 16: Resource and Test Management in Grids

GrenchMark: a Framework for Analyzing, Testing, and Comparing Grids

• What’s in a name?
  grid benchmark → working towards a generic tool for the whole community: help standardize the testing procedures, but it is too early for benchmarks; we use synthetic grid workloads instead
• What’s it about?
  A systematic approach to analyzing, testing, and comparing grid settings, based on synthetic workloads
  • A set of metrics and workload units for analyzing grid settings [JSSPP’06]
  • A set of representative grid applications
    • Both real and synthetic
  • Easy-to-use tools to create synthetic grid workloads
  • Flexible, extensible framework

Page 17: Resource and Test Management in Grids


GrenchMark Overview: Easy to Generate and Run Synthetic Workloads

Page 18: Resource and Test Management in Grids

… but More Complicated Than You Think

• Workload structure
  • User-defined and statistical models
  • Dynamic job arrival
  • Burstiness and self-similarity
  • Feedback, background load
  • Machine usage assumptions
  • Users, VOs
• Metrics
  • A(W) Run/Wait/Resp. Time
  • Efficiency, MakeSpan
  • Failure rate [!]
• (Grid) notions
  • Co-allocation, interactive jobs, malleable, moldable, …
• Measurement methods
  • Long workloads
  • Saturated / non-saturated system
  • Start-up, production, and cool-down scenarios
  • Scaling workload to system
• Applications
  • Synthetic
  • Real
• Workload definition language
  • Base language layer
  • Extended language layer
• Other
  • Can use the same workload for both simulations and real environments
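As a small illustration of a statistical workload model with dynamic job arrival, the sketch below draws exponentially distributed inter-arrival times (a Poisson arrival process) and a job size for each arrival. Fixing the random seed makes the workload repeatable, so the same job list can be replayed in a simulator and on a real system. The function and its parameters are hypothetical and do not reproduce GrenchMark's workload definition language.

```python
import random
from typing import List, Sequence, Tuple


def poisson_workload(n_jobs: int,
                     mean_interarrival_s: float,
                     sizes: Sequence[int],
                     seed: int = 0) -> List[Tuple[float, int]]:
    """Return a list of (arrival time in seconds, processors requested)."""
    rng = random.Random(seed)  # private generator keeps runs repeatable
    now = 0.0
    workload = []
    for _ in range(n_jobs):
        # expovariate takes the rate, i.e. 1 / mean inter-arrival time.
        now += rng.expovariate(1.0 / mean_interarrival_s)
        workload.append((now, rng.choice(sizes)))
    return workload


jobs = poisson_workload(100, mean_interarrival_s=30.0, sizes=[1, 2, 4, 8], seed=42)
```

Burstiness, self-similarity, and per-user or per-VO behavior would need richer models than this single arrival process.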

Page 19: Resource and Test Management in Grids

GrenchMark and the DAS Community

• Generic Performance Evaluation [IEEE CCGrid’06]
• Grid System Analysis
  • Performance testing, what-if analysis
• Functionality Testing in Grid Environments
  • System functionality testing, periodic testing
• Comparing Grid Settings
  • Single site vs. co-allocated jobs
• Releasing the Koala Grid Scheduler on the DAS
  • 5,000+ jobs successfully run (in all workloads)
  • Functionality tests for 3 different job submission modules
• GrenchMark was released in Nov 2005 [ grenchmark.st.ewi.tudelft.nl ]

Page 20: Resource and Test Management in Grids

Take home message

• PDS Group/TU Delft: resource and test management in Grid systems
• Koala: Processor and Data Co-Allocation in Grids [ www.st.ewi.tudelft.nl/koala/ ]
  - Grid scheduling with co-allocation and fault-tolerance
  - many placement policies available
  - extensible runners system
  - easy-to-use, flexible
  - tutorials, on-line documentation, papers
• GrenchMark: Analyzing, Testing, and Comparing Grids [ grenchmark.st.ewi.tudelft.nl ]
  - generic tool for the whole community
  - generates diverse grid workloads
  - easy-to-use, flexible, portable, extensible, …

Page 21: Resource and Test Management in Grids

Thank you!

Questions? Remarks? Observations? All welcome!

grenchmark.st.ewi.tudelft.nl/
www.st.ewi.tudelft.nl/koala