DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation

Page 1: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

Euro-Par 2008, Las Palmas, 27 August 2008

DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation

Alexandru Iosup, Ozan Sonmez, and Dick Epema

PDS Group, Delft University of Technology, The Netherlands

Page 2: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

A Grid Research Toolbox

• Hypothesis: (a) is better than (b).

[Figure: toolbox diagram with numbered components 1, 2, 3, including DGSim]

For scenario 1, …

Page 4: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

The Problem with Grid Simulations

• Three decades of writing simulators in computer science → writing the simulator is not the problem
• The problem: getting from solution design to experimental results with an automated simulation tool
  • Experimental setup
    • Tool to generate realistic experimental setups
  • Experiment support for grid resource management
    • Tool to manage large numbers of related simulations
  • Performance
    • Not the simulation time (decades of optimizations there)
    • Tool proved to work with large simulations (number of resources, workload size, etc.)

Page 5: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

Outline

1. Problem Statement
2. The DGSim Framework
3. DGSim Validation
4. DGSim Examples
5. Future Work

Page 6: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

2. The DGSim Framework
Name, Goal, and Challenges

• DGSim = Delft Grid Simulator
• Simulate various grid resource management architectures
  • Multi-cluster grids
  • Grids of grids (THE grid)
• Challenges
  • Many types of architectures
  • Generating and replaying grid workloads
  • Management of the simulations
    • Many repetitions of a simulation for statistical relevance
    • Simulations with many parameters
    • Managing results (e.g., analysis tools)
    • Enabling collaborative experiments

[Figure: two GRM architectures]
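The simulation-management challenge (many parameters, many repetitions, aggregated results) can be illustrated with a small parameter sweep. This is only a sketch under stated assumptions: the parameter names, the values, and the `run_simulation` stand-in are hypothetical and are not taken from DGSim itself.

```python
import itertools
import random
import statistics

# Hypothetical parameter space; names and values are illustrative only.
PARAMS = {
    "architecture": ["centralized", "hierarchical", "decentralized"],
    "load": [0.5, 0.7, 0.9],
}
REPETITIONS = 5  # repeated runs per design point, for statistical relevance


def run_simulation(architecture, load, seed):
    """Stand-in for one simulation run; returns a fake metric, not a real result."""
    rng = random.Random(f"{architecture}-{load}-{seed}")
    return load * 100 + rng.uniform(-1, 1)


def sweep(params, repetitions):
    """Run every combination of parameter values `repetitions` times.

    Returns {(value, ...): (mean, standard deviation)} keyed by the values
    of the parameters in sorted-name order.
    """
    names = sorted(params)
    results = {}
    for values in itertools.product(*(params[n] for n in names)):
        point = dict(zip(names, values))
        samples = [run_simulation(seed=s, **point) for s in range(repetitions)]
        results[values] = (statistics.mean(samples), statistics.stdev(samples))
    return results


results = sweep(PARAMS, REPETITIONS)
```

With three architectures and three load levels, the sweep covers 9 design points and 45 runs; the design space grows multiplicatively with every added parameter, which is why the management of simulations is listed as a challenge in its own right.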

Page 7: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

2. The DGSim Framework
Overview

[Figure: overview of the DGSim framework, including a discrete-event simulator]

Page 8: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

2. The DGSim Framework
Model Details: Inter-Operation Architectures

[Figure: inter-operation architectures]
• Hybrid hierarchical/decentralized
• Decentralized
• Hierarchical
• Independent
• Centralized

Page 9: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

2. The DGSim Framework
Model Details: Resource Dynamics & Evolution

• Resource dynamics
  • Short-term changes in resource availability status
• Resource evolution
  • Long-term changes in number & … of resources

A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
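One way to picture the difference between dynamics and evolution is as two kinds of events applied to a processor count over time. The sketch below is an assumption made for illustration: the `Event` type, its fields, and the numbers are hypothetical and do not describe DGSim's internal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float  # seconds since the start of the trace
    delta: int   # change in the number of usable processors
    kind: str    # "dynamics" (short-term) or "evolution" (long-term)


def available_processors(initial, events, t):
    """Number of processors usable at time t, after applying all events up to t."""
    count = initial
    for ev in sorted(events, key=lambda e: e.time):
        if ev.time > t:
            break
        count += ev.delta
    return count


DAY = 86_400
events = [
    Event(3_600, -8, "dynamics"),      # 8 processors become unavailable after one hour
    Event(7_200, +8, "dynamics"),      # ... and come back an hour later
    Event(30 * DAY, +32, "evolution"), # the cluster is extended after 30 days
]
```

Starting from 64 processors, `available_processors(64, events, 5_000)` reflects the short outage, while a query after day 30 reflects the long-term growth.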

Page 10: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

2. The DGSim Framework
Workloads: Generation and Model(s)

• Parallel jobs
  • Adapting the Lublin-Feitelson model to grids
• Bags-of-Tasks: groups of independent single-processor tasks
  • Validated with seven long-term grid traces
• Workload Generation
  • Generate synthetic workload with realistic characteristics
  • Iterative workload generation: incur specified load on a grid

A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.

A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
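A minimal sketch of generating a workload that incurs a specified load, assuming load is defined as requested processor time divided by the processor time the grid can deliver. The exponential runtimes, uniform arrivals, and all function names are assumptions for illustration; the slides do not describe how DGSim's generator works internally.

```python
import random


def generate_workload(target_load, capacity, duration, seed=0, mean_runtime=600.0):
    """Add single-processor jobs until the offered load reaches target_load.

    load = sum(job runtimes) / (capacity * duration)
    capacity is the number of processors, duration the trace length in seconds.
    """
    rng = random.Random(seed)
    budget = target_load * capacity * duration  # processor-seconds to request
    jobs, work = [], 0.0
    while work < budget:
        runtime = rng.expovariate(1.0 / mean_runtime)  # assumed runtime distribution
        arrival = rng.uniform(0.0, duration)           # assumed arrival process
        jobs.append((arrival, runtime))
        work += runtime
    jobs.sort()  # replay order: by arrival time
    return jobs


def offered_load(jobs, capacity, duration):
    """Fraction of the grid's processor time that the workload asks for."""
    return sum(runtime for _, runtime in jobs) / (capacity * duration)


jobs = generate_workload(target_load=0.7, capacity=100, duration=86_400)
load = offered_load(jobs, capacity=100, duration=86_400)
```

The loop overshoots the target by at most one job's runtime, so `load` lands just above 0.7 for a 100-processor grid observed over one day.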

Page 12: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

3. DGSim Validation
Functional Validation

• Functional validation (simple scenario)
  • Workload: 100 jobs of constant size 10,000 (work units), all arriving at t=0
  • System: grid scheduler over one 10-resource cluster; each resource processes 1 work unit/second; information delay = 0-3600s
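The expected outcome of this scenario can be checked by hand, which is what makes it useful for functional validation. The sketch below computes the ideal baseline under assumptions that the slide does not state: each job occupies exactly one resource, jobs are served first-come first-served, and the information delay is zero.

```python
import heapq


def simulate_fcfs(num_jobs, job_size, num_resources, speed=1.0):
    """Return (makespan, mean_wait) for identical jobs that all arrive at t=0.

    Assumes one resource per job and FCFS order; speed is in work units/second.
    """
    runtime = job_size / speed
    free_at = [0.0] * num_resources  # min-heap of times at which resources free up
    heapq.heapify(free_at)
    waits = []
    makespan = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)  # earliest available resource
        finish = start + runtime
        waits.append(start)             # arrival time is 0, so wait == start
        makespan = max(makespan, finish)
        heapq.heappush(free_at, finish)
    return makespan, sum(waits) / len(waits)


makespan, mean_wait = simulate_fcfs(num_jobs=100, job_size=10_000, num_resources=10)
```

Under these assumptions the jobs run in 10 waves of 10 jobs, each wave lasting 10,000 s, so the makespan is 100,000 s and the mean wait is 45,000 s. A simulator run with zero information delay should reproduce these numbers; runs with non-zero delay can then be compared against this baseline.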

Page 13: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

3. DGSim Validation
Real vs. Simulated DAS-3 Multi-Cluster Grid

• Simulator setup
  • Application: synthetic parallel, communication-intensive (all-gather)
    Measured: runtime for various configurations (co-allocation)
  • System: heterogeneous clusters, Koala co-allocating scheduler
  • Workload: 300 jobs, submitted over a period of 6 hours
  • All jobs submitted through central cluster gateways
• Results
  • Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends
  • Under-estimation of waiting time (failures lead to more contention)

Page 15: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

4. DGSim Examples
Sample 1/3

• Investigate mechanisms for inter-operating grids
  • New mechanism: DMM (Delegated MatchMaking)
• Trace-based performance evaluation through simulations
  • Real and model-based traces
  • Largest trace: 1.4M jobs
  • Simulate Grid’5000+DAS-2
  • Explored a design space of over 1 million design points

A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.

Page 16: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

4. DGSim Examples
Sample 2/3

• What is the performance impact of the dynamic grid resource availability?
  • Four models for grid resource availability information
• Trace-based performance evaluation through simulations
  • Real traces
  • Simulate Grid’5000
  • KA = AMA > HMA >> SA

A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.

Resource availability models:

Availability information delay   Static   Dynamic
On-Time (0)                      SA       KA
Short period                              AMA
Long period                               HMA

[Figure: average normalized goodput [cpuseconds/day/proc], axis from 0 to 15,000, per model: SA, KA, AMA 60s, AMA 1h, HMA 1w, HMA 1mo, HMA Never. Goodput decreases with intervention delay.]
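The effect of an availability information delay can be sketched as a scheduler that only sees the resource status from the most recent refresh. This is an illustrative assumption: the slides do not say how the delay is modelled, and `observed_status`, `true_status`, and the outage window are hypothetical.

```python
def observed_status(true_status, now, refresh_period):
    """Status seen by a scheduler whose information is refreshed periodically.

    true_status: function mapping a time (seconds) to True (up) or False (down).
    refresh_period: seconds between refreshes; 0 means on-time information.
    """
    if refresh_period <= 0:
        return true_status(now)
    last_refresh = (now // refresh_period) * refresh_period
    return true_status(last_refresh)


def true_status(t):
    """Hypothetical resource that is down between t=100 s and t=400 s."""
    return not (100 <= t < 400)


on_time = observed_status(true_status, now=150, refresh_period=0)
short = observed_status(true_status, now=150, refresh_period=60)
long_ = observed_status(true_status, now=150, refresh_period=3_600)
```

At t=150 s the resource is down. The on-time and 60-second views report it as down, while the hourly view still reports the stale "up" status from t=0, so a scheduler relying on it would send work to an unavailable resource.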

Page 17: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

4. DGSim Examples
Sample 3/3

• Analyze performance of bag-of-tasks scheduling algorithms
  • Information availability framework: Known, Unknown, Historical records
• Trace-based performance evaluation through simulations
  • Real and model-based traces
  • Simulate Grid’5000+DAS
  • Evaluated 8 scheduling algorithms
  • Explored a design space of over 2 million design points

A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.

[Table: scheduling algorithms classified by task information (K, H, U) and resource information (K, H, U). Algorithms shown: ECT, FPLT, FPF, ECT-P, DFPLT, MQD, STFR, RR, WQR.]

Page 19: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

Conclusion and Future Work

• The DGSim framework
  • Tool to generate realistic experimental setups
  • Tool to manage large numbers of grouped simulations
  • Tool proved to work with large simulations
• Validated underlying models and assumptions
  • Resource dynamics and evolution model
  • Workload model
• Comparing grid resource management architectures
  • Proven in various settings
• Future work
  • More scenarios
  • Library of ready-to-use scenarios

Page 20: DGSim : Comparing Grid Resource Management Architectures  Through Trace-Based Simulation

Thank you! Questions? Remarks? Observations?

• Contact: [email protected] [google "Iosup"]
• Web sites:
  o http://www.vl-e.nl : VL-e project
  o http://www.pds.ewi.tudelft.nl : PDS group articles & software