Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures

Maciej Malawski 1,2, Piotr Nowakowski 1, Tomasz Gubała 1, Marek Kasztelnik 1, Marian Bubak 1,2, Rafael Ferreira da Silva 3, Ewa Deelman 3, Jarek Nabrzyski 4

NSFCloud Workshop on Experimental Support for Cloud Computing
December 11-12, 2014, Arlington, VA

AGH University of Science and Technology:
1 ACC Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków, Poland
2 Department of Computer Science, al. Mickiewicza 30, 30-095 Kraków, Poland
3 University of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA
4 Center for Research Computing, University of Notre Dame, IN, USA



Research Challenges

•  Execution of complex scientific applications on clouds: workflows and their ensembles
    •  Pegasus Workflow Management System (OCI SI2-SSI #1148515)
    •  HyperFlow Workflow Engine

•  Platform for deployment and sharing of scientific applications on hybrid clouds
    •  Atmosphere Framework

•  Algorithms for scheduling, provisioning and cost optimization:
    •  Dynamic and Static Algorithms
    •  Mathematical Programming
    •  Cloud Workflow Simulator


Research: The Atmosphere Framework


Hybrid cloud as a means of provisioning computing power for virtual experiments

Cloud Management Portlets
•  GUI host (provisions end-user features and access options)
•  Provide GUI elements which enable service developers and end users to interact with the Atmosphere platform and create/deploy services on the available cloud resources

Atmosphere Core Services Host
•  Atmosphere Registry (AIR): user accounts, available cloud sites, services and templates
•  Atmosphere Core
•  Secure RESTful API (Cloud Facade)
    •  Authentication and authorization logic
    •  Communication with underlying computational clouds
    •  Launching and monitoring service instances
    •  Creating new service templates
    •  Billing and accounting
    •  Logging and administrative services

Cloud sites
•  OpenStack cloud site at ACC CYFRONET AGH: head node, worker nodes, worker node with large resource pool ("fat node"), image store; 96 CPU cores, 184 GB RAM, 4 TB storage, private IP space
•  VPH-Share cloud site at UNIVIE: head node, worker node with large resource pool ("fat node"), image store; 128 CPU cores, 256 GB RAM, 4 TB storage, private IP space
•  Amazon Elastic Compute Cloud (EC2), European availability zone: API host, worker nodes, image store; massive (functionally limitless) hardware resource pool, public IP space


Research: Simulation and Scheduling of Large-Scale Scientific Workflows on IaaS Clouds

•  Large-scale scientific workflows from Pegasus WMS
    •  Workflows of 100,000 tasks

•  Workflow Ensembles
    •  Schedule as many workflows as possible within a budget and deadline
    •  Uses a Cloud Workflow Simulator

[Figure: schedule diagram with axes labeled Time and VM]

M. Malawski, G. Juve, E. Deelman, J. Nabrzyski: Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. SC 2012: 22
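The ensemble goal stated above (complete as many workflows as possible within a budget and a deadline) can be illustrated with a minimal sketch. This is not one of the algorithms from the cited paper; it is a simple greedy admission rule under loud assumptions: every name here (`Workflow`, `admit_workflows`, the priority field, a single flat price per core-hour) is hypothetical, and a workflow is treated as feasible whenever its critical path fits within the deadline.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    priority: int               # assumption: lower number means more important
    core_hours: float           # total work in the workflow
    critical_path_hours: float  # longest chain of dependent tasks


def admit_workflows(workflows, budget, deadline_hours, price_per_core_hour):
    """Greedily admit workflows in priority order while budget remains.

    A workflow is admitted only if its critical path fits the deadline
    and its estimated cost fits the remaining budget.
    """
    admitted = []
    remaining = budget
    for wf in sorted(workflows, key=lambda w: w.priority):
        cost = wf.core_hours * price_per_core_hour
        if wf.critical_path_hours <= deadline_hours and cost <= remaining:
            admitted.append(wf.name)
            remaining -= cost
    return admitted, remaining


ensemble = [
    Workflow("A", priority=1, core_hours=100, critical_path_hours=5),
    Workflow("B", priority=2, core_hours=300, critical_path_hours=4),
    Workflow("C", priority=3, core_hours=50, critical_path_hours=12),
    Workflow("D", priority=4, core_hours=80, critical_path_hours=2),
]
print(admit_workflows(ensemble, budget=100, deadline_hours=10, price_per_core_hour=0.5))
```

In this toy ensemble, B is rejected because it no longer fits the remaining budget and C because its critical path exceeds the deadline. A real provisioning algorithm would also have to decide how many VMs to run and when, which this sketch ignores.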


Research: Cost Optimization of Applications on Clouds

•  Infrastructure model
    •  Multiple compute and storage clouds
    •  Heterogeneous instance types

•  Application model
    •  Bag of tasks
    •  Multi-level workflows

•  Modeling with AMPL and CMPL
    •  Modeling Language for Mathematical Programming

•  Cost optimization
    •  Under deadline constraints

•  Mixed integer programming
    •  Bonmin, CPLEX solvers

M. Malawski, K. Figiela, J. Nabrzyski, Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, 29(7), 2013, pp. 1786-1794, http://dx.doi.org/10.1016/j.future.2013.01.004

M. Malawski, K. Figiela, M. Bubak, E. Deelman, J. Nabrzyski, Cost Optimization of Execution of Multi-level Deadline-Constrained Scientific Workflows on Clouds, PPAM, 2013, pp. 251-260, http://dx.doi.org/10.1007/978-3-642-55224-3_24
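To make the idea of deadline-constrained cost minimization for a bag of tasks concrete, here is a small exhaustive-search sketch. It is a stand-in for illustration only, not the mixed integer programming model from the cited papers: it considers a single instance type at a time, assumes whole-hour billing, and uses a hypothetical catalogue whose names, throughputs, prices and instance limits are invented for the example. The only figure borrowed from the slide is that a task takes 0.1 h on a 1 CCU machine, that is, 10 tasks per hour.

```python
def cheapest_plan(n_tasks, deadline_hours, catalogue):
    """Return (type, instances, hours, cost_in_cents) of the cheapest feasible plan.

    catalogue maps a type name to (tasks_per_hour, cents_per_hour, max_instances).
    Every instance of the chosen type is billed for the same number of whole hours.
    Returns None when no single instance type can meet the deadline.
    """
    best = None
    for name, (tasks_per_hour, cents_per_hour, max_instances) in catalogue.items():
        for n in range(1, max_instances + 1):
            capacity_per_hour = n * tasks_per_hour
            hours = -(-n_tasks // capacity_per_hour)  # integer ceiling division
            if hours > deadline_hours:
                continue  # this fleet size misses the deadline
            cost = n * hours * cents_per_hour
            if best is None or cost < best[3]:
                best = (name, n, hours, cost)
    return best


# Hypothetical catalogue: throughput assumes linear scaling with CCU.
CATALOGUE = {
    "small_1ccu": (10, 10, 200),
    "large_4ccu": (40, 45, 50),
}

print(cheapest_plan(20000, deadline_hours=20, catalogue=CATALOGUE))
```

Tightening the deadline forces more instances to run in parallel; once the required fleet exceeds the per-type limit, the function returns `None`, which is the point where a model spanning multiple providers becomes necessary.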

[Figure: cost ($, 0 to 3000) versus time limit (hours, 0 to 100) for 20000 tasks, 512 MiB input and 512 MiB output, task execution time 0.1 h on a 1 CCU machine. Labels: Rackspace instances; Rackspace and private instances; Amazon's and private instances; Multiple providers; Optimal. Storage labels: Amazon S3, Rackspace Cloud Files.]

[Figure: multi-level workflow with Layer 1 (A), Layer 2 (B, B, B, C), Layer 3 (D), Layer 4 (E), Layer 5 (F); time labels 1 h, 2.5 h, 0.5 h, 0.3 h, 2 h, 6 h.]

[Figure: infrastructure and application model. Application: tasks with input and output. Private cloud: compute (private). Amazon: storage; compute (m1.small, m1.large, t1.micro, m2.xlarge). Rackspace: storage; compute (rs.1gb, rs.2gb, rs.4gb, rs.16gb).]

Research: Cloud Performance Evaluation

•  Performance of VM deployment times

•  Virtualization overhead

•  Evaluation of open source cloud stacks
    •  Eucalyptus, OpenNebula, OpenStack

•  Survey of European public cloud providers

•  Performance evaluation of top cloud providers
    •  EC2, RackSpace, SoftLayer
    •  A grant from Amazon has been obtained

M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski, S. Varma, Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013, Delft, the Netherlands, pp.13-16, 2013

Criteria and weights: EEA = EEA Zoning (20); jClouds = jClouds API Support (20); BLOB = BLOB storage support (10); Hourly = Per-hour instance billing (5); API = API Access (5); Price = Published price (5); Image = VM Image Import / Export (3); RDB = Relational DB support (2)

 #  IaaS Provider       EEA  jClouds  BLOB  Hourly  API  Price  Image  RDB  Score
    Weight              20   20       10    5       5    5      3      2
 1  Amazon AWS          1    1        1     1       1    1      0      1    27
 2  Rackspace           1    1        1     1       1    1      0      1    27
 3  SoftLayer           1    1        1     1       1    1      0      0    25
 4  CloudSigma          1    1        0     1       1    1      1      0    18
 5  ElasticHosts        1    1        0     1       1    1      1      0    18
 6  Serverlove          1    1        0     1       1    1      1      0    18
 7  GoGrid              1    1        0     1       1    1      0      0    15
 8  Terremark ecloud    1    1        0     1       1    0      1      0    13
 9  RimuHosting         1    1        0     0       1    1      0      1    12
10  Stratogen           1    1        0     0       1    0      1      0    8
11  Bluelock            1    1        0     0       1    0      0      0    5
12  Fujitsu GCP         1    1        0     0       1    0      0      0    5
13  BitRefinery         0    0        0     0       0    1      0      1    0
14  BrightBox           1    0        0     1       1    1      1      0    0
15  BT Global Services  1    0        0     0       1    0      1      0    0
16  Carpathia Hosting   1    0        0     0       0    0      1      0    0
17  City Cloud          1    0        0     1       1    1      0      0    0
18  Claris Networks     0    0        0     1       0    0      0      0    0
19  Codero              0    0        0     1       1    1      0      0    0
20  CSC                 1    0        0     0       0    0      1      0    0
21  Datapipe            1    0        0     1       1    0      0      0    0
22  e24cloud            1    0        0     1       0    1      0      0    0
23  eApps               0    0        0     0       0    1      0      0    0
24  FlexiScale          1    0        0     1       1    1      1      0    0
25  Google GCE          1    0        1     1       1    1      0      1    0
26  Green House Data    0    0        0     0       1    0      1      0    0
27  Hosting.com         0    0        0     0       0    1      1      1    0
28  HP Cloud            0    1        1     1       1    1      1      1    0
29  IBM SmartCloud      0    0        1     1       1    1      0      1    0
30  IIJ GIO             0    0        0     0       0    0      0      0    0
31  iland cloud         1    0        0     1       0    1      1      0    0
32  Internap            0    0        1     1       1    1      0      0    0
33  Joyent              0    0        0     1       1    1      0      0    0
34  LunaCloud           1    0        1     1       1    1      0      0    0
35  Oktawave            1    0        1     1       1    1      0      1    0
36  Openhosting.co.uk   1    0        0     0       0    1      0      0    0
37  Openhosting.com     0    1        0     1       1    1      1      0    0
38  OpSource            1    0        1     1       1    1      1      0    0
39  ProfitBricks        1    0        0     1       1    1      0      0    0
40  Qube                1    0        0     0       0    1      0      0    0
41  ReliaCloud          0    0        0     0       0    0      0      0    0
42  SaavisDirect        0    0        1     1       0    1      0      0    0
43  SkaliCloud          0    1        0     1       1    1      1      0    0
44  Teklinks            0    0        0     0       0    0      0      0    0
45  Terremark vcloud    0    1        0     1       1    1      1      0    0
46  Tier 3              0    0        0     0       1    0      0      0    0
47  Umbee               1    0        0     1       1    1      1      0    0
48  VPS.net             1    0        0     0       1    1      0      0    0
49  Windows Azure       1    0        1     1       1    1      0      1    0
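The slide does not say how the Score column is derived from the weights. The numbers in the provider survey appear consistent with one reading, sketched below as an assumption rather than the authors' stated method: EEA Zoning and jClouds API Support act as mandatory criteria (a provider missing either scores 0), and the score is the sum of the remaining weights for the criteria that are met. The column keys and the helper names `row` and `score` are hypothetical.

```python
# Weights as listed on the slide, in column order.
WEIGHTS = {
    "EEA": 20, "jClouds": 20, "BLOB": 10, "Hourly": 5,
    "API": 5, "Price": 5, "Image": 3, "RDB": 2,
}

# Assumption inferred from the survey rows, not stated in the source:
# these two criteria gate the score instead of contributing to it.
MANDATORY = ("EEA", "jClouds")


def row(*flags):
    """Pair eight 0/1 flags with the criterion names, in column order."""
    return dict(zip(WEIGHTS, flags))


def score(flags):
    """Weighted score under the gating assumption described above."""
    if not all(flags[c] for c in MANDATORY):
        return 0
    return sum(w for c, w in WEIGHTS.items() if c not in MANDATORY and flags[c])


print(score(row(1, 1, 1, 1, 1, 1, 0, 1)))  # flags listed for Amazon AWS
print(score(row(0, 1, 1, 1, 1, 1, 1, 1)))  # flags listed for HP Cloud
```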


Experiment: Evaluation of autoscaling techniques for Atmosphere cloud platform

•  Challenges
    •  Requires repeated tests under varying workloads
    •  Experiments in an isolated environment

•  Goals
    •  Perform autoscaling based on:
        •  Complex event processing
        •  Time series database
    •  Build an isolated environment on NSFCloud
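As an illustration of what an autoscaling rule driven by time-series data might look like, the following is a minimal threshold-based sketch. It is not the Atmosphere implementation: the class name, the thresholds, the window length and the use of an in-memory window in place of a time series database are all assumptions made for the example.

```python
from collections import deque
from statistics import mean


class ThresholdAutoscaler:
    """Scale by one VM when the average load over a sliding window crosses a threshold."""

    def __init__(self, window=3, scale_up_at=0.8, scale_down_at=0.3,
                 min_vms=1, max_vms=10):
        self.samples = deque(maxlen=window)  # stand-in for a time series query
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at
        self.min_vms = min_vms
        self.max_vms = max_vms

    def observe(self, load, current_vms):
        """Record one load sample (0.0 to 1.0) and return the desired VM count."""
        self.samples.append(load)
        if len(self.samples) < self.samples.maxlen:
            return current_vms  # not enough history yet
        average = mean(self.samples)
        if average > self.scale_up_at:
            return min(current_vms + 1, self.max_vms)
        if average < self.scale_down_at:
            return max(current_vms - 1, self.min_vms)
        return current_vms


scaler = ThresholdAutoscaler()
vms = 2
for load in [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]:
    vms = scaler.observe(load, vms)
    print(load, vms)
```

Averaging over a window rather than reacting to single samples is one simple way to avoid oscillation; evaluating how such rules behave under varying workloads is the kind of repeated test the challenges above call for.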


Experiment: Scalability of Scientific Workflows in HyperFlow Model

•  Challenges
    •  Issues with data transfers and data locality
    •  Calibrate the performance models of applications

•  Goals
    •  Execute large-scale deployments on multi-site NSFCloud facilities
    •  Assess the impact of network latency and bandwidth limitations
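A first-order way to reason about the impact of latency and bandwidth on a task is the common approximation that a transfer takes one latency period plus the data size divided by the bandwidth. The sketch below applies that approximation; the function names and the numbers in the example are illustrative assumptions, not measurements or a calibrated model from the source.

```python
def transfer_seconds(size_mib, bandwidth_mib_per_s, latency_s, round_trips=1):
    """Approximate transfer time: latency per round trip plus size over bandwidth."""
    return round_trips * latency_s + size_mib / bandwidth_mib_per_s


def task_seconds(compute_s, input_mib, output_mib, bandwidth_mib_per_s, latency_s):
    """Total task time when input is staged in and output staged out sequentially."""
    stage_in = transfer_seconds(input_mib, bandwidth_mib_per_s, latency_s)
    stage_out = transfer_seconds(output_mib, bandwidth_mib_per_s, latency_s)
    return stage_in + compute_s + stage_out


# Illustrative values: a 6-minute task moving 512 MiB in and out.
local = task_seconds(360, 512, 512, bandwidth_mib_per_s=64, latency_s=0.5)
remote = task_seconds(360, 512, 512, bandwidth_mib_per_s=8, latency_s=0.5)
print(local, remote)
```

With these made-up numbers the low-bandwidth case spends more time on staging data than the 360 s of computation, which shows why data locality matters for multi-site deployments.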


[Figure: HyperFlow workflow execution on the PaaSage platform. Labels: PaaSage application; HyperFlow engine; Task scheduler; Cloud; VMs; Executor; Workflow components; RabbitMQ; Job queue; Results; Monitoring; Metrics; Ready tasks; Scheduled tasks; Redis; CAMEL model; Workflow graph; Workflow CAMEL generator; Workflow generator; PaaSage platform (Upperware, Executionware); Deploy & scale infrastructure.]
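The diagram labels suggest a pattern in which an engine places ready tasks on a job queue, executors on VMs consume them, and results flow back. The sketch below mimics that pattern with Python's standard `queue` and `threading` modules only. It is an analogy under stated assumptions: it does not use RabbitMQ, Redis or any HyperFlow API, and `run_workflow` is a hypothetical name.

```python
import queue
import threading


def run_workflow(tasks, n_executors=2):
    """Run (name, function, argument) tasks on a pool of executor threads.

    An in-process queue stands in for the message broker; each thread
    plays the role of an executor running on a VM.
    """
    job_queue = queue.Queue()
    results = queue.Queue()

    def executor():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this executor
                break
            name, func, arg = job
            results.put((name, func(arg)))

    workers = [threading.Thread(target=executor) for _ in range(n_executors)]
    for worker in workers:
        worker.start()
    for task in tasks:
        job_queue.put(task)
    for _ in workers:
        job_queue.put(None)  # one sentinel per executor
    for worker in workers:
        worker.join()

    collected = {}
    while not results.empty():
        name, value = results.get()
        collected[name] = value
    return collected


print(run_workflow([("square", lambda x: x * x, 4), ("double", lambda x: 2 * x, 5)]))
```

Decoupling the engine from the executors through a queue is what lets the number of executors change while a workflow is running, which is the property a scalability experiment would exercise.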


Experiment: Influence of Variability of Clouds on the Quality of Algorithms

•  Challenges
    •  Static scheduling methods assume that estimates of task runtimes are available
    •  Runtime variations and various uncertainties influence the actual execution

•  Goals
    •  A large-scale experimental testbed will allow investigating the influence of the uncertainties
    •  Development of new models to mitigate the negative effects of uncertainties
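The effect described above can be reproduced in miniature: build a static plan from runtime estimates, then replay it with perturbed runtimes and compare the resulting makespan with the deadline. The sketch below does this under simplifying assumptions that are mine, not the authors': independent tasks, a longest-task-first greedy plan, and errors drawn uniformly from plus or minus the given fraction. The function names are hypothetical.

```python
import random


def static_plan(estimates, n_vms):
    """Assign tasks to VMs using estimates only (longest task first, least-loaded VM)."""
    loads = [0.0] * n_vms
    plan = []
    for task, estimate in sorted(enumerate(estimates), key=lambda p: -p[1]):
        vm = loads.index(min(loads))
        loads[vm] += estimate
        plan.append((task, vm))
    return plan


def makespan_over_deadline(estimates, n_vms, deadline, error, seed=0):
    """Replay the static plan with runtimes off by up to +/- error (a fraction)."""
    rng = random.Random(seed)
    loads = [0.0] * n_vms
    for task, vm in static_plan(estimates, n_vms):
        actual = estimates[task] * (1.0 + rng.uniform(-error, error))
        loads[vm] += actual
    return max(loads) / deadline


estimates = [4.0, 3.0, 3.0, 2.0]
for error in [0.0, 0.01, 0.02, 0.05, 0.10, 0.20, 0.50]:
    print(error, makespan_over_deadline(estimates, n_vms=2, deadline=6.0, error=error))
```

A ratio above 1 means the plan missed its deadline once the real runtimes deviated from the estimates. In this toy case the plan has no slack at zero error, so any unfavorable deviation pushes it over.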

[Figure: makespan / deadline (0.0 to 2.0) versus runtime estimate error (0 %, 1 %, 2 %, 5 %, 10 %, 20 %, 50 %) for the DPDS, WA-DPDS and SPSS algorithms.]

Experiment: Interoperation of Cloud Testbed of PL-Grid Infrastructure with NSFCloud

•  PL-Grid
    •  One of the largest national grid infrastructures in Europe (2500+ users, 500+ teams)
    •  Cloud testbed based on OpenNebula and OpenStack

•  Goals
    •  Possibility to run transatlantic and global-scale experiments
    •  Evaluation of the impact of wide-area and high-latency networks


Thank you.

DICE Team at AGH: http://dice.cyfronet.pl
Center for Research Computing at Notre Dame: https://crc.nd.edu
Pegasus Team at USC: http://pegasus.isi.edu