DynamicCloudSim: Simulating Heterogeneity in Computational Clouds Marc Bux, Ulf Leser {bux|leser}@informatik.hu-berlin.de The 2nd international workshop on Scalable Workflow Enactment Engines and Technologies (SWEET'13)




Meet Sandra


Meet Paul

• Small Instance: 1.7 GB RAM, 1 EC2 Compute Unit, 160 GB local storage

• Compute Unit: equivalent CPU capacity of a 1.0-1.2 GHz Opteron or Xeon

• No guarantees with respect to I/O throughput and network delay / bandwidth


Any one cloud instance is unlike another.


Heterogeneity in EC2 Cloud Instances

• Different CPUs on physical host systems [Jackson10, Schad10]

– Intel Xeon E5430 (2.66 GHz quad)

– AMD Opteron 270 (2 GHz dual)

– AMD Opteron 2218 HE (2.6 GHz dual)

• I/O throughput varies as well [Dejun10]

– No correlation between CPU and I/O performance

[Figure: Amazon EC2 Performance [Schad10]]

[Figure; source: [Dejun10]]


Dynamic Changes of Performance

• Occasional CPU performance slumps and failures during task execution [Dejun10, Jackson10]

• Variance in I/O and network throughput [Zaharia08, Jackson10]

• Performance depends on hour of day and day of week [Schad10]

[Figure: EC2 disk performance vs. VM co-allocation [Zaharia08]]

[Figure: CPU performance slumps [Dejun10]]


Vision

Adaptive scheduling of scientific workflows

• Exploit heterogeneous resources

• Exhibit robustness to instability


• The standard approach for evaluation is simulation [Braun01, Blythe05]

• Cloud simulation toolkits do not model instability


Agenda

1) Simulating Heterogeneity in Computational Clouds

2) Evaluating Established Workflow Schedulers

3) Summary and Outlook


CloudSim

[Diagram: Datacenter, Host, VM, Task]

• R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya (2011), CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software - Practice and Experience 41(1):23-50.

• More than 250 citations in Google Scholar

• https://code.google.com/p/cloudsim/


DynamicCloudSim

[Diagram: Datacenter, Heterogeneous Host, Dynamic VM, Error-prone Task]

• Extend CloudSim with models for

1. Heterogeneous computational resources (Het)

2. Dynamic changes of performance at runtime (DCR)

3. Straggler VMs and failed task executions (SaF)

• More fine-grained representation of computational resources

• https://code.google.com/p/dynamiccloudsim/
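
The third model, stragglers and failures (SaF), can be illustrated with a minimal Python sketch. This is not DynamicCloudSim's code: the function names, the `straggler_coefficient` parameter, and the rule that a failed attempt costs a full task runtime are assumptions made for illustration only.

```python
import random


def sample_vm_speed(base_speed, straggler_likelihood, straggler_coefficient, rng):
    """Return the VM's speed; with the given likelihood the VM is a straggler.

    straggler_coefficient is a hypothetical factor (e.g. 0.5) that scales
    the speed of a straggler VM.
    """
    if rng.random() < straggler_likelihood:
        return base_speed * straggler_coefficient
    return base_speed


def run_until_success(work, vm_speed, failure_likelihood, rng):
    """Return the total time spent on a task, re-running it after each failure."""
    if not 0.0 <= failure_likelihood < 1.0:
        raise ValueError("failure_likelihood must be in [0, 1)")
    total = 0.0
    while True:
        # Assumption: a failed attempt costs a full run before it is detected.
        total += work / vm_speed
        if rng.random() >= failure_likelihood:
            return total
```

Passing a seeded `random.Random` instance as `rng` keeps repeated simulation runs reproducible.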


Realism – can we ever get there?

• Simulation can never perfectly resemble reality

• We model inhomogeneity and dynamic changes by sampling from normal distributions

• Default mean and standard deviation (STD) / relative standard deviation (RSD) parameters are obtained from [Zaharia08, Dejun10, Jackson10, Schad10, Iosup11]

Many performance characteristics in EC2 follow a normal distribution [Schad10]
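
Sampling performance from a normal distribution parameterized by a mean and an RSD can be sketched as follows. This is a hedged illustration rather than the simulator's implementation: the names `sample_performance` and `simulate_vm`, the fixed number of update steps, and the clipping of negative samples at zero are all assumptions.

```python
import random


def sample_performance(mean, rsd, rng, floor=0.0):
    """Draw one value from N(mean, (rsd * mean)^2), clipped at `floor`."""
    std = rsd * mean  # RSD is the standard deviation relative to the mean
    return max(floor, rng.gauss(mean, std))


def simulate_vm(mean_speed, het_rsd, dcr_rsd, steps, seed=0):
    """Return (baseline, trace) for one hypothetical VM.

    Het: the baseline is sampled once, when the VM is allocated.
    DCR: at every step the current speed is re-sampled around that baseline.
    """
    rng = random.Random(seed)
    baseline = sample_performance(mean_speed, het_rsd, rng)
    trace = [sample_performance(baseline, dcr_rsd, rng) for _ in range(steps)]
    return baseline, trace
```

With both RSD parameters set to 0, every VM runs at exactly the mean speed, which corresponds to a homogeneous and stable setting.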


Simulating VM Performance: DynamicCloudSim (DCS) vs. CloudSim (CS)

1. Heterogeneous computational resources (Het)

2. Dynamic changes of performance at runtime (DCR)

3. Straggler VMs and failed task executions (SaF)


Agenda

1) Simulating Heterogeneity in Computational Clouds

2) Evaluating Established Workflow Schedulers

a) Scheduling Scientific Workflows

b) Evaluation Workflows

c) Evaluation Results

3) Summary and Outlook


Scheduling of Scientific Workflows

• Scheduling:

– Mapping tasks to the available physical resources

– Usual goal: minimize overall execution time

• Static Scheduling:

– Schedule is assembled prior to workflow execution

– Schedule is strictly adhered to at runtime

• Adaptive Scheduling:

– Monitor computational infrastructure

– Adjust workflow execution at runtime


Static Schedulers

• Baseline: Round Robin

– Assign tasks to resources in turn

– Equal number of tasks per resource

• Elaborate: HEFT (Heterogeneous Earliest Finish Time) [Topcuoglu02]

– Implemented in the SWfMS Pegasus

– Requires runtime estimates for each task on each resource

– Assign the tasks with the longest time to finish to a fixed timeslot on a suitable (well-performing) resource

– Exploits heterogeneity in computational infrastructure (Het)
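
The difference between the two static schedulers can be made concrete with a small Python sketch. Note the simplification: `eft_schedule` is not HEFT itself but a reduced earliest-finish-time heuristic for independent tasks, without task dependencies, communication costs, or insertion-based slot search. All function names and the `runtime[task][resource]` layout are assumptions.

```python
def round_robin(num_tasks, num_resources):
    """Assign tasks to resources in turn."""
    return [task % num_resources for task in range(num_tasks)]


def eft_schedule(runtime):
    """Simplified earliest-finish-time heuristic for independent tasks.

    runtime[t][r] is the estimated runtime of task t on resource r.
    Tasks are ranked by descending average runtime; each task is then
    placed on the resource where it would finish earliest.
    """
    num_tasks, num_resources = len(runtime), len(runtime[0])
    order = sorted(range(num_tasks),
                   key=lambda t: -sum(runtime[t]) / num_resources)
    ready = [0.0] * num_resources  # time at which each resource becomes free
    assignment = [0] * num_tasks
    for t in order:
        best = min(range(num_resources), key=lambda r: ready[r] + runtime[t][r])
        assignment[t] = best
        ready[best] += runtime[t][best]
    return assignment


def makespan(assignment, runtime):
    """Overall execution time of a static schedule of independent tasks."""
    finish = [0.0] * len(runtime[0])
    for t, r in enumerate(assignment):
        finish[r] += runtime[t][r]
    return max(finish)
```

On a toy input with one resource four times slower than the other, round robin sends half of the tasks to the slow resource, while the runtime-aware heuristic avoids it.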


Adaptive Schedulers

• Baseline: Greedy Task Queue

– Assign tasks to resources at runtime in first-come-first-served manner

– Adapts to changes of performance at runtime (DCR)

• Elaborate: LATE (Longest Approximate Time to End) [Zaharia08]

– Developed for Hadoop to increase robustness to instability

– 10% of tasks progressing at a below-average rate are replicated and speculatively executed

– Exploits dynamic changes of performance

– Robust to straggler VMs and failed task executions (SaF)
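
Both adaptive strategies can be sketched in Python. The greedy queue hands the next task to whichever resource becomes free first. For LATE, the time left is estimated as (1 - progress) / progress rate, as in [Zaharia08]; the below-average threshold and the 10% cap follow the bullets above. Function names and data layouts are assumptions, and this is a sketch, not the Hadoop or DynamicCloudSim implementation.

```python
import heapq
import math


def greedy_queue_makespan(task_work, speeds):
    """First-come-first-served: each task goes to the next free resource."""
    free_at = [(0.0, r) for r in range(len(speeds))]
    heapq.heapify(free_at)
    end = 0.0
    for work in task_work:
        now, r = heapq.heappop(free_at)  # resource that becomes free first
        finish = now + work / speeds[r]
        end = max(end, finish)
        heapq.heappush(free_at, (finish, r))
    return end


def estimated_time_left(progress, elapsed):
    """LATE estimate; progress is in [0, 1], elapsed must be > 0."""
    if progress <= 0.0:
        return math.inf
    rate = progress / elapsed
    return (1.0 - progress) / rate


def pick_speculative(tasks, cap_fraction=0.10):
    """Choose running tasks to replicate.

    tasks maps a task name to (progress, elapsed). Candidates are tasks
    whose progress rate is below average, ordered by longest estimated
    time to end and capped at cap_fraction of all running tasks.
    """
    rates = {name: p / e for name, (p, e) in tasks.items()}
    average = sum(rates.values()) / len(rates)
    slow = [name for name in tasks if rates[name] < average]
    slow.sort(key=lambda name: estimated_time_left(*tasks[name]), reverse=True)
    cap = max(1, math.floor(cap_fraction * len(tasks)))
    return slow[:cap]
```

Unlike the static schedulers, neither function needs runtime estimates in advance. The greedy queue needs no monitoring at all, while LATE relies on the progress each running task reports.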


Evaluation Workflow: Montage [Berriman04]


Abstract Montage Workflow

One task can have many task instances.


Concrete Montage Workflow

• 43,318 tasks reading and writing 534 GB of data

• 10 GB input files which have to be uploaded to the cloud

• Determine average runtime over 100 simulations of workflow execution
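
Averaging over repeated simulation runs can be sketched as below. The `simulate` callable is a hypothetical stand-in for one stochastic workflow simulation that takes a seed and returns a runtime in minutes. Using the run index as the seed is an assumption.

```python
import statistics


def average_runtime(simulate, runs=100):
    """Run `simulate(seed)` once per seed and summarize the runtimes.

    Returns (mean, sample standard deviation); requires runs >= 2.
    """
    results = [simulate(seed) for seed in range(runs)]
    return statistics.mean(results), statistics.stdev(results)
```

Reporting the standard deviation next to the mean shows how strongly the sampled instability affects individual runs.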


Evaluation Workflow: Comparative Genomics


Concrete Genomics Workflow


• Align 10% of the reads produced in a sequencing experiment against one of the smallest human chromosomes (chr22)

– Use about 0.2% of the available data

• 4,266 tasks reading and writing 436 GB of data (2.3 GB upload)

Indexing (bowtie, SHRiMP, PerM)

Alignment (bowtie, SHRiMP, PerM)

Convert (samtools view)

Sort (samtools sort)

Merge (merge)

Preprocess (samtools mpileup)

Variant calling (VarScan)

“Sense-Making” (VCFTools)

Upload to cloud

Download from cloud


Runtime depending on Heterogeneity (Het)

[Two bar charts: average runtime in minutes by RSD parameter for heterogeneous resources (Het)]

Chart 1:

RSD (Het)   Static RoundRobin   HEFT   GreedyQueue   LATE
0           368                 304    296           311
0.125       371                 301    300           308
0.25        450                 296    303           315
0.375       715                 296    308           313
0.5         1314                286    300           300

Chart 2:

RSD (Het)   Static RoundRobin   HEFT   GreedyQueue   LATE
0           203                 143    163           178
0.125       220                 148    163           179
0.25        275                 150    166           177
0.375       602                 152    187           182
0.5         747                 149    195           185


Runtime depending on Dynamic Changes (DCR)

[Two bar charts: average runtime in minutes by RSD parameter for dynamic changes at runtime (DCR)]

Chart 1:

RSD (DCR)   Static RoundRobin   HEFT   GreedyQueue   LATE
0           368                 304    296           311
0.125       352                 301    296           317
0.25        394                 357    299           308
0.375       465                 439    311           299
0.5         574                 530    307           289

Chart 2:

RSD (DCR)   Static RoundRobin   HEFT   GreedyQueue   LATE
0           203                 143    163           178
0.125       216                 165    166           176
0.25        241                 190    165           179
0.375       295                 255    170           180
0.5         393                 314    207           177


Runtime with Stragglers and Failures (SaF)

[Two bar charts: average runtime in minutes by likelihood of straggler VMs and failed tasks (SaF)]

Chart 1:

Likelihood (SaF)   Static RoundRobin   HEFT   GreedyQueue   LATE
0                  368                 304    296           311
0.00625            598                 405    396           316
0.0125             876                 659    586           317
0.01875            1365                962    790           316
0.025              2559                1291   1137          321

Chart 2:

Likelihood (SaF)   Static RoundRobin   HEFT   GreedyQueue   LATE
0                  203                 143    163           178
0.00625            352                 262    237           180
0.0125             617                 411    444           187
0.01875            1025                604    635           188
0.025              1990                984    1125          195


That’s all well and good, but…

• Scheduling in SWfMS: Static or Greedy Task Queue

• HEFT and LATE have a computational overhead and require information not available in real scenarios:

– HEFT: runtime estimates of each task on each machine

– LATE: progress rate of each running task

• Untapped optimization potential: multiple resource scheduling

– Find appropriate matches between tasks and machines


Summary and Outlook

• EC2: Heterogeneity and instability in VM performance

• DynamicCloudSim introduces several factors of instability into CloudSim

• Simulation experiments reproduce known strengths and shortcomings of established schedulers

• Outlook: Comparative evaluation on real hardware


Thanks for your attention!

https://code.google.com/p/dynamiccloudsim/


Questions


Literature

• [Braun01] T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, R. F. Freund (2001), A Comparison Study of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems, Journal of Parallel and Distributed Computing 61:810–837.

• [Blythe05] J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A. Mandal, K. Kennedy (2005), Task Scheduling Strategies for Workflow-based Applications in Grids, in: Proceedings of the 5th IEEE International Symposium on Cluster Computing and the Grid, volume 2, Cardiff, UK, pp. 759–767.


• [Jackson10] K. R. Jackson, et al. (2010), Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud, in: Proceedings of the 2nd International Conference on Cloud Computing Technology and Science, Indianapolis, USA, pp. 159-168.

• [Dejun09] J. Dejun, et al. (2009), EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications, in: Proceedings of the 7th International Conference on Service Oriented Computing, Stockholm, Sweden, pp. 197-207.

• [Zaharia08] M. Zaharia, et al. (2008), Improving MapReduce Performance in Heterogeneous Environments, in: Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, San Diego, USA, pp. 29-42.


• [Schad10] J. Schad, J. Dittrich, J.-A. Quiané-Ruiz (2010), Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance, Proceedings of the VLDB Endowment 3(1):460–471.

• [Iosup11] A. Iosup, N. Yigitbasi, D. Epema (2011), On the Performance Variability of Production Cloud Services, in: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Newport Beach, California, USA, pp. 104–113.


• [Topcuoglu02] H. Topcuoglu, S. Hariri, M.-Y. Wu (2002), Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, IEEE Transactions on Parallel and Distributed Systems 13(3):260-274.

• [Berriman04] G. B. Berriman, et al. (2004), Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand, in: Proceedings of the SPIE Conference on Astronomical Telescopes and Instrumentation, volume 5493, Glasgow, Scotland, pp. 221-232.