DynamicCloudSim: Simulating Heterogeneity in
Computational Clouds
Marc Bux, Ulf Leser
{bux|leser}@informatik.hu-berlin.de
The 2nd international workshop on Scalable Workflow
Enactment Engines and Technologies (SWEET'13)
Meet Sandra
Meet Paul
• Small Instance: 1.7 GB RAM, 1 EC2 Compute Unit, 160 GB local storage
• Compute Unit: equiv. CPU capacity of a 1.0-1.2 GHz Opteron or Xeon
• No guarantees wrt. I/O throughput and network delay / bandwidth
Any one cloud instance is unlike another.
Heterogeneity in EC2 Cloud Instances
• Different CPUs on physical host systems [Jackson10, Schad10]
– Intel Xeon E5430 (2.66 GHz quad)
– AMD Opteron 270 (2 GHz dual)
– AMD Opteron 2218 HE (2.6 GHz dual)
• I/O throughput varies as well [Dejun10]
– No correlation between CPU and I/O performance
Figure: Amazon EC2 Performance [Schad10]
Figure source: [Dejun10]
Dynamic Changes of Performance
• Occasional CPU performance slumps and failures during task execution [Dejun10, Jackson10]
• Variance in I/O and network throughput [Zaharia08, Jackson10]
• Performance depends on hour of day and day of week [Schad10]
Figure: EC2 disk performance vs. VM co-allocation [Zaharia08]
Figure: CPU performance slumps [Dejun10]
Vision
Adaptive scheduling of scientific workflows
• Exploit heterogeneous resources
• Exhibit robustness to instability
Vision
• The standard approach for evaluation is simulation [Braun01, Blythe05]
• Cloud simulation toolkits do not model instability
Agenda
1) Simulating Heterogeneity in Computational Clouds
2) Evaluating Established Workflow Schedulers
3) Summary and Outlook
CloudSim
Datacenter
Host
VM
Task
• R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya (2011),
CloudSim: a toolkit for modeling and simulation of cloud computing
environments and evaluation of resource provisioning algorithms,
Software - Practice and Experience 41(1):23-50.
• More than 250 citations in Google Scholar
• https://code.google.com/p/cloudsim/
DynamicCloudSim
Datacenter
Heterogeneous Host
Dynamic VM
Error-prone Task
• Extend CloudSim with models for
1. Heterogeneous computational resources (Het)
2. Dynamic changes of performance at runtime (DCR)
3. Straggler VMs and failed task executions (SaF)
• More fine-grained representation of computational resources
• https://code.google.com/p/dynamiccloudsim/
Realism – can we ever get there?
• Simulation can never perfectly resemble reality
• We model inhomogeneity and dynamic changes by
sampling from normal distributions
• Default mean and STD/RSD Parameters are obtained
from [Zaharia08, Dejun10, Jackson10, Schad10, Iosup11]
Many performance characteristics in EC2 follow a normal distribution [Schad10]
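The sampling idea on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not DynamicCloudSim's actual (Java) code: the function names, the baseline value of 1000, and the clamping to a small positive floor are all assumptions made for the example.

```python
import random

def sample_vm_performance(baseline, rsd, rng):
    """Draw one VM's performance from a normal distribution.

    baseline: mean performance (arbitrary units, e.g. MIPS)
    rsd:      relative standard deviation (std dev = rsd * baseline)
    """
    value = rng.gauss(baseline, rsd * baseline)
    # Assumption: clamp so that no VM ends up with zero or negative speed.
    return max(value, 0.01 * baseline)

def provision_vms(n, baseline, rsd, seed=0):
    """Sample performance values for n VMs with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_vm_performance(baseline, rsd, rng) for _ in range(n)]

# Heterogeneous pool (RSD 0.25) versus a homogeneous pool (RSD 0).
vms = provision_vms(n=8, baseline=1000.0, rsd=0.25, seed=42)
homogeneous = provision_vms(n=8, baseline=1000.0, rsd=0.0, seed=42)
```

With an RSD of 0 every VM gets exactly the baseline, which corresponds to the homogeneous case; raising the RSD widens the spread between VMs.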
Simulating VM Performance: DCS vs CS
1. Heterogeneous computational resources (Het)
2. Dynamic changes of performance at runtime (DCR)
3. Straggler VMs and failed task executions (SaF)
Agenda
1) Simulating Heterogeneity in Computational Clouds
2) Evaluating Established Workflow Schedulers
a) Scheduling Scientific Workflows
b) Evaluation Workflows
c) Evaluation Results
3) Summary and Outlook
Scheduling of Scientific Workflows
• Scheduling:
– Mapping tasks to the available physical resources
– Usual goal: minimize overall execution time
• Static Scheduling:
– Schedule is assembled prior to workflow execution
– Schedule is strictly adhered to at runtime
• Adaptive Scheduling:
– Monitor computational infrastructure
– Adjust workflow execution at runtime
Static Schedulers
• Baseline: Round Robin
– Assign tasks to resources in turn
– Equal number of tasks per resource
• Elaborate: HEFT (Het. Earliest Finish Time) [Topcuoglu02]
– Implemented in SWfMS Pegasus
– Requires runtime estimates for each task on each resource
– Assign the tasks with the longest time to finish to a fixed timeslot on a suitable (well-performing) resource
– Exploits heterogeneity in computational infrastructure (Het)
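The difference between the two static schedulers can be illustrated with a small Python sketch. It is a simplification under stated assumptions: tasks are independent, data transfer is ignored, and `heft_like` is only a HEFT-style list scheduler (rank tasks, then place each on the resource with the earliest finish time), not the full algorithm from [Topcuoglu02]. The task sizes and VM speeds are hypothetical.

```python
def round_robin(task_work, vm_speeds):
    """Assign tasks to VMs in turn; return the finish time of each VM."""
    finish = [0.0] * len(vm_speeds)
    for i, work in enumerate(task_work):
        vm = i % len(vm_speeds)
        finish[vm] += work / vm_speeds[vm]
    return finish

def heft_like(task_work, vm_speeds):
    """Simplified HEFT-style scheduling for independent tasks.

    Tasks are ranked by decreasing work; each one is placed on the VM
    that would finish it earliest, given runtime estimates work / speed.
    """
    finish = [0.0] * len(vm_speeds)
    for work in sorted(task_work, reverse=True):
        vm = min(range(len(vm_speeds)),
                 key=lambda v: finish[v] + work / vm_speeds[v])
        finish[vm] += work / vm_speeds[vm]
    return finish

tasks = [100.0] * 12        # hypothetical work units per task
speeds = [1.0, 1.0, 0.25]   # hypothetical VM speeds; the third VM is slow

rr_makespan = max(round_robin(tasks, speeds))    # 1600.0
heft_makespan = max(heft_like(tasks, speeds))    # 600.0
```

Round robin gives the slow VM the same number of tasks as the fast ones, so the slow VM dominates the makespan; the estimate-driven scheduler shifts most of the work to the fast VMs.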
Adaptive Schedulers
• Baseline: Greedy Task Queue
– Assign tasks to resources at runtime in first-come-first-served manner
– Adapts to changes of performance at runtime (DCR)
• Elaborate: LATE (Longest Approx. Time to End) [Zaharia08]
– Developed for Hadoop to increase robustness to instability
– 10% of tasks progressing at a below-average rate are replicated and speculatively executed
– Exploits dynamic changes of performance
– Robust to straggler VMs and failed task executions (SaF)
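Both adaptive strategies can be sketched in Python as follows. This is a hedged illustration, not the Hadoop or DynamicCloudSim implementation: `greedy_queue_makespan` and `late_candidates` are hypothetical names, the selection rule follows the wording of this slide (below-average progress rate, longest approximate time to end, capped at 10% of running tasks), and the rule that at least one task is always eligible is an assumption added for small inputs.

```python
import heapq

def greedy_queue_makespan(task_work, vm_speeds):
    """First-come-first-served: the next task goes to the VM that is idle first."""
    idle = [(0.0, vm) for vm in range(len(vm_speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    for work in task_work:
        free_at, vm = heapq.heappop(idle)
        done = free_at + work / vm_speeds[vm]
        makespan = max(makespan, done)
        heapq.heappush(idle, (done, vm))
    return makespan

def late_candidates(running, cap_fraction=0.10):
    """Pick running tasks to replicate speculatively.

    running: dict task_id -> (progress, elapsed), with progress in (0, 1]
             and elapsed > 0.
    """
    rates = {t: p / e for t, (p, e) in running.items()}
    mean_rate = sum(rates.values()) / len(rates)
    # Approximate time to end = remaining progress / progress rate.
    time_left = {t: (1.0 - running[t][0]) / rates[t] for t in running}
    slow = [t for t in running if rates[t] < mean_rate]
    slow.sort(key=lambda t: time_left[t], reverse=True)
    cap = max(1, int(cap_fraction * len(running)))  # assumption: at least one
    return slow[:cap]

# Same hypothetical workload as a static scheduler would face.
greedy_makespan = greedy_queue_makespan([100.0] * 12, [1.0, 1.0, 0.25])  # 800.0

running = {"a": (0.9, 90.0), "b": (0.8, 100.0), "c": (0.1, 100.0)}
to_replicate = late_candidates(running)  # ["c"]
```

The greedy queue adapts automatically because fast VMs come back for new tasks more often, but a task that lands on a slow VM still runs to completion there. The LATE-style selection targets exactly that case by replicating the task expected to finish last.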
Evaluation Workflow: Montage [Berriman04]
Abstract Montage Workflow
One task can have many task instances.
Concrete Montage Workflow
• 43,318 tasks reading and writing 534 GB of data
• 10 GB input files which have to be uploaded to the cloud
• Determine avg. runtime over 100 simulations of workflow execution
Eval. Workflow: Comparative Genomics
Concrete Genomics Workflow
• Align 10% of the reads produced in a sequencing experiment
against the smallest of human chromosomes (chr22)
– Use about 0.2% of the available data
• 4,266 tasks reading and writing 436 GB of data (2.3 GB upload)
Indexing (bowtie, SHRiMP, PerM)
Alignment (bowtie, SHRiMP, PerM)
Convert (samtools view)
Sort (samtools sort)
Merge (merge)
Preprocess (samtools mpileup)
Variant calling (VarScan)
“Sense-Making” (VCFTools)
Upload to cloud
Download from cloud
Runtime depending on Heterogeneity (Het)
Average runtime in minutes, by RSD parameter for heterogeneous resources (Het)

First chart:
RSD     Static RoundRobin   HEFT   GreedyQueue   LATE
0       368                 304    296           311
0.125   371                 301    300           308
0.25    450                 296    303           315
0.375   715                 296    308           313
0.5     1314                286    300           300

Second chart:
RSD     Static RoundRobin   HEFT   GreedyQueue   LATE
0       203                 143    163           178
0.125   220                 148    163           179
0.25    275                 150    166           177
0.375   602                 152    187           182
0.5     747                 149    195           185
Runtime depending on Dynamic Changes (DCR)
Average runtime in minutes, by RSD parameter for dynamic changes at runtime (DCR)

First chart:
RSD     Static RoundRobin   HEFT   GreedyQueue   LATE
0       368                 304    296           311
0.125   352                 301    296           317
0.25    394                 357    299           308
0.375   465                 439    311           299
0.5     574                 530    307           289

Second chart:
RSD     Static RoundRobin   HEFT   GreedyQueue   LATE
0       203                 143    163           178
0.125   216                 165    166           176
0.25    241                 190    165           179
0.375   295                 255    170           180
0.5     393                 314    207           177
Runtime with Stragglers and Failures (SaF)
Average runtime in minutes, by likelihood of straggler VMs and failed tasks (SaF)

First chart:
Likelihood   Static RoundRobin   HEFT   GreedyQueue   LATE
0            368                 304    296           311
0.00625      598                 405    396           316
0.0125       876                 659    586           317
0.01875      1365                962    790           316
0.025        2559                1291   1137          321

Second chart:
Likelihood   Static RoundRobin   HEFT   GreedyQueue   LATE
0            203                 143    163           178
0.00625      352                 262    237           180
0.0125       617                 411    444           187
0.01875      1025                604    635           188
0.025        1990                984    1125          195
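To put the SaF numbers in perspective, the slowdown of each scheduler between the lowest and highest likelihood can be computed directly. The sketch below uses the data labels of the first chart on this slide and assumes they are listed in legend order (Static RoundRobin, HEFT, GreedyQueue, LATE); the variable names are illustrative.

```python
schedulers = ["Static RoundRobin", "HEFT", "GreedyQueue", "LATE"]

# Average runtimes in minutes, read from the first SaF chart.
# Assumption: data labels appear in legend order.
runtime_no_saf = [368, 304, 296, 311]       # likelihood 0
runtime_max_saf = [2559, 1291, 1137, 321]   # likelihood 0.025

slowdown = {
    name: round(after / before, 2)
    for name, before, after in zip(schedulers, runtime_no_saf, runtime_max_saf)
}
# {'Static RoundRobin': 6.95, 'HEFT': 4.25, 'GreedyQueue': 3.84, 'LATE': 1.03}
```

Under that reading, the three schedulers without speculative execution slow down by a factor of roughly 4 to 7, while LATE stays within about 3% of its runtime without stragglers and failures.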
That’s all well and good, but…
• Scheduling in SWfMS: Static or Greedy Task Queue
• HEFT and LATE have a computational overhead and
require information not available in real scenarios:
– HEFT: runtime estimates of each task on each machine
– LATE: progress rate of each running task
• Untapped optimization potential:
multiple resource scheduling
– Find appropriate matches between tasks and machines
Summary and Outlook
• EC2: Heterogeneity and instability in VM performance
• DynamicCloudSim introduces several factors of
instability into CloudSim
• Simulation experiments reproduce known strengths
and shortcomings of established schedulers
• Outlook: Comparative evaluation on real hardware
Thanks for your attention!
https://code.google.com/p/dynamiccloudsim/
Questions
Literature
• [Braun01] T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M.
Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B.
Yao, D. Hensgen, R. F. Freund (2001), A Comparison Study of
Eleven Static Heuristics for Mapping a Class of Independent
Tasks onto Heterogeneous Distributed Computing Systems,
Journal of Parallel and Distributed Computing 61:810–837.
• [Blythe05] J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A.
Mandal, K. Kennedy (2005), Task Scheduling Strategies for
Workflow-based Applications in Grids, in: Proceedings of the
5th IEEE International Symposium on Cluster Computing and
the Grid, volume 2, Cardiff, UK, pp. 759–767.
• [Jackson10] K. R. Jackson, et al. (2010), Performance Analysis
of High Performance Computing Applications on the Amazon
Web Services Cloud, in: Proceedings of the 2nd International
Conference on Cloud Computing Technology and Science,
Indianapolis, USA, pp. 159-168.
• [Dejun09] J. Dejun, et al. (2009), EC2 Performance Analysis for
Resource Provisioning of Service-Oriented Applications, in:
Proceedings of the 7th International Conference on Service
Oriented Computing, Stockholm, Sweden, pp. 197-207.
• [Zaharia08] M. Zaharia, et al. (2008), Improving MapReduce
Performance in Heterogeneous Environments, in: Proceedings
of the 8th USENIX Symposium on Operating Systems Design
and Implementation, San Diego, USA, pp. 29-42.
• [Schad10] J. Schad, J. Dittrich, J.-A. Quiané-Ruiz (2010),
Runtime Measurements in the Cloud: Observing, Analyzing,
and Reducing Variance, Proceedings of the VLDB Endowment
3(1):460–471.
• [Iosup11] A. Iosup, N. Yigitbasi, D. Epema (2011), On the
Performance Variability of Production Cloud Services, in:
Proceedings of the 2011 11th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing, Newport
Beach, California, USA, pp. 104–113.
• [Topcuoglu02] H. Topcuoglu, S. Hariri, M.-Y. Wu (2002),
Performance-Effective and Low-Complexity Task Scheduling
for Heterogeneous Computing, IEEE Transactions on Parallel
and Distributed Systems 13(3):260-274.
• [Berriman04] G. B. Berriman, et al. (2004), Montage: a grid-
enabled engine for delivering custom science-grade mosaics
on demand, in: Proceedings of the SPIE Conference on
Astronomical Telescopes and Instrumentation, volume 5493,
Glasgow, Scotland, pp. 221-232.