Euro-Par 2008, Las Palmas, 27 August 2008
DGSim: Comparing Grid Resource Management
Architectures Through Trace-Based Simulation
Alexandru Iosup, Ozan Sonmez, and Dick Epema
PDS Group, Delft University of Technology
The Netherlands
A Grid Research Toolbox
• Hypothesis: (a) is better than (b).
[Figure: DGSim, with steps 1, 2, 3; label "For scenario 1, …"]
The Problem with Grid Simulations
• Three decades of writing simulators in computer science → writing the simulator is not the problem
• The problem: getting from solution design to experimental results with an automated simulation tool
• Experimental setup
  • Tool to generate realistic experimental setups
• Experiment support for grid resource management
  • Tool to manage large numbers of related simulations
• Performance
  • Not the simulation time (decades of optimizations there)
  • Tool proved to work with large simulations (number of resources, workload size, etc.)
Outline
1. Problem Statement
2. The DGSim Framework
3. DGSim Validation
4. DGSim Examples
5. Future Work
2. The DGSim Framework: Name, Goal, and Challenges
• DGSim = Delft Grid Simulator
• Simulate various grid resource management architectures
  • Multi-cluster grids
  • Grids of grids (THE grid)
• Challenges
  • Many types of architectures
  • Generating and replaying grid workloads
  • Management of the simulations
    • Many repetitions of a simulation for statistical relevance
    • Simulations with many parameters
    • Managing results (e.g., analysis tools)
    • Enabling collaborative experiments
Two GRM architectures
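The simulation-management challenges listed on this slide (many repetitions for statistical relevance, many parameters, managing results) can be illustrated with a minimal Python sketch. This is not DGSim code: `run_simulation`, the parameter names `architecture` and `load`, and the toy response-time model are hypothetical placeholders chosen only for illustration.

```python
import itertools
import random
import statistics

def run_simulation(architecture, load, seed):
    """Hypothetical stand-in for one simulation run; returns a made-up metric."""
    rng = random.Random(seed)
    base = {"centralized": 100.0, "decentralized": 120.0}[architecture]
    return base * (1.0 + load) + rng.gauss(0.0, 5.0)

def sweep(param_grid, repetitions):
    """Run every parameter combination `repetitions` times and summarize it."""
    names = sorted(param_grid)
    results = {}
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        # The same seeds are reused for every design point, so the points
        # are compared under identical random inputs.
        samples = [run_simulation(seed=rep, **params) for rep in range(repetitions)]
        results[values] = (statistics.mean(samples), statistics.stdev(samples))
    return results

grid = {"architecture": ["centralized", "decentralized"], "load": [0.5, 0.9]}
summary = sweep(grid, repetitions=10)
for point, (mean, stdev) in sorted(summary.items()):
    print(point, f"mean={mean:.1f}", f"stdev={stdev:.1f}")
```

The number of runs grows as the product of the parameter value counts times the number of repetitions, which is why the slide treats the management of simulations as a challenge in its own right.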
2. The DGSim Framework: Overview
[Figure: framework overview; labeled component: Discrete-Event Simulator]
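For readers unfamiliar with the term, the core of a discrete-event simulator can be sketched in a few lines of Python. This is a generic illustration of the technique, not the DGSim implementation; the class name `EventQueueSimulator` and the two event handlers are hypothetical.

```python
import heapq

class EventQueueSimulator:
    """Minimal discrete-event loop: always process the earliest pending event."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = 0  # tie-breaker so simultaneous events keep insertion order

    def schedule(self, delay, action):
        """Register `action` to fire `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._counter, action))
        self._counter += 1

    def run(self):
        """Advance simulated time from event to event until none remain."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)

log = []

def job_arrival(sim):
    log.append((sim.now, "arrival"))
    sim.schedule(10.0, job_finish)  # assumed runtime of 10 time units

def job_finish(sim):
    log.append((sim.now, "finish"))

sim = EventQueueSimulator()
sim.schedule(5.0, job_arrival)
sim.run()
print(log)
```

Simulated time jumps directly from one event to the next, so the cost of a run depends on the number of events rather than on the length of the simulated period.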
2. The DGSim Framework: Model Details: Inter-Operation Architectures
[Figure panels: Independent; Centralized; Hierarchical; Decentralized; Hybrid hierarchical/decentralized]
2. The DGSim Framework: Model Details: Resource Dynamics & Evolution
• Resource dynamics
  • Short-term changes in resource availability status
• Resource evolution
  • Long-term changes in number & … of resources
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
2. The DGSim Framework: Workloads: Generation and Model(s)
• Parallel jobs
  • Adapting the Lublin-Feitelson model to grids
• Bags-of-Tasks: groups of independent single-processor tasks
  • Validated with seven long-term grid traces
A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
• Workload Generation
  • Generate synthetic workload with realistic characteristics
  • Iterative workload generation: incur specified load on a grid
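One possible reading of "iterative workload generation" is sketched below in Python: jobs are added one at a time until the requested processor demand reaches a target fraction of the grid's capacity. This is an assumption about the idea, not the DGSim algorithm. The function name, the job-size choices, and the exponential runtime distribution are placeholders; in particular, they do not implement the Lublin-Feitelson model.

```python
import random

def generate_workload(target_load, total_processors, duration_s, seed=0):
    """Add synthetic jobs until the offered load reaches `target_load`.

    Load is defined here as requested processor-seconds divided by the
    processor-seconds available over `duration_s` (an assumed definition).
    """
    rng = random.Random(seed)
    capacity = total_processors * duration_s
    jobs, demand = [], 0.0
    while demand < target_load * capacity:
        procs = rng.choice([1, 2, 4, 8])        # placeholder size distribution
        runtime = rng.expovariate(1.0 / 600.0)  # placeholder: mean runtime 600 s
        submit = rng.uniform(0.0, duration_s)   # placeholder arrival process
        jobs.append((submit, procs, runtime))
        demand += procs * runtime
    jobs.sort()  # replay order: by submission time
    return jobs, demand / capacity

jobs, achieved = generate_workload(target_load=0.7, total_processors=64,
                                   duration_s=86_400)
print(len(jobs), "jobs, achieved load", round(achieved, 3))
```

Because the loop stops as soon as the target is crossed, the achieved load overshoots the target by at most one job's demand.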
3. DGSim Validation: Functional Validation
• Functional validation (simple scenario)
  • Workload: 100 jobs of constant size 10,000 work units, all arriving at t=0
  • System: grid scheduler over one 10-resource cluster; each resource processes 1 work unit/second; information delay = 0-3600 s
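The expected outcome of this simple scenario can be checked by hand. The Python sketch below does so under two assumptions that the slide does not state: each job runs on a single resource, and the information delay is 0. The function name `makespan` is hypothetical.

```python
import heapq

def makespan(num_jobs, job_size, num_resources, speed):
    """First-come-first-served on identical resources; returns the finish time."""
    free_at = [0.0] * num_resources  # time at which each resource becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)   # earliest idle resource
        end = start + job_size / speed   # assumed: one job occupies one resource
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

# 100 jobs / 10 resources = 10 rounds of 10,000 s each.
print(makespan(num_jobs=100, job_size=10_000, num_resources=10, speed=1.0))
# 100000.0
```

A simulator that reports a different completion time for the zero-delay case would fail this check; non-zero information delays can then be compared against this baseline.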
3. DGSim Validation: Real vs. Simulated DAS-3 Multi-Cluster Grid
• Simulator setup
  • Application: synthetic parallel, communication-intensive (all-gather); measured: runtime for various configurations (co-allocation)
  • System: heterogeneous clusters, Koala co-allocating scheduler
  • Workload: 300 jobs, submitted over a period of 6 hours
  • All jobs submitted through central cluster gateways
• Results
  • Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends
  • Under-estimation of waiting time (failures lead to more contention)
4. DGSim Examples: Sample 1/3
• Investigate mechanisms for inter-operating grids
  • New mechanism: DMM (Delegated MatchMaking)
• Trace-based performance evaluation through simulations
  • Real and model-based traces
  • Largest trace: 1.4M jobs
  • Simulate Grid’5000+DAS-2
  • Explored a design space of over 1 million design points
A. Iosup, D.H.J.Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
4. DGSim Examples: Sample 2/3
• What is the performance impact of the dynamic grid resource availability?
  • Four models for grid resource availability information
• Trace-based performance evaluation through simulations
  • Real traces
  • Simulate Grid’5000
  • KA = AMA > HMA >> SA
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
[Table: resource availability (Static, Dynamic) vs. availability information delay (On-Time (0), Short period, Long period); models placed in the table: SA, KA, AMA, HMA]
[Figure: Avg. Norm. Goodput [cpuseconds/day/proc] per model (SA, KA, AMA 60s, AMA 1h, HMA 1wk, HMA 1mo, HMA Never); y-axis ticks up to 15,000. Annotation: goodput decreases with intervention delay.]
4. DGSim Examples: Sample 3/3
• Analyze performance of bag-of-tasks scheduling algorithms
  • Information availability framework: Known, Unknown, Historical records
• Trace-based performance evaluation through simulations
  • Real and model-based traces
  • Simulate Grid’5000+DAS
  • Evaluated 8 scheduling algorithms
  • Explored a design space of over 2 million design points
A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
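[Table: task information (K, H, U) vs. resource information (K, H, U), where K = Known, H = Historical records, U = Unknown; algorithms placed in the table: ECT, FPLT, FPF, ECT-P, DFPLT, MQD, STFR, RR, WQR]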
Conclusion and Future Work
• The DGSim framework
  • Tool to generate realistic experimental setups
  • Tool to manage large numbers of grouped simulations
  • Tool proved to work with large simulations
• Validated underlying models and assumptions
  • Resource dynamics and evolution model
  • Workload model
• Comparing grid resource management architectures
  • Proven in various settings
• Future work
  • More scenarios
  • Library of ready-to-use scenarios
Thank you! Questions? Remarks? Observations?
• Contact: [email protected] [google "Iosup"]
• Web sites:
  o http://www.vl-e.nl : VL-e project
  o http://www.pds.ewi.tudelft.nl : PDS group articles & software