TRANSCRIPT
Simulation of heterogeneous cloud
infrastructures
Konstantinos Giannoutakis
Information Technologies Institute/
Centre for Research and Technology
Hellas
ITI/CERTH
Overview
• Introduction
• Simulation
• Towards a framework for simulating heterogeneous clouds
• Conclusions
Cloud
Environments
• Cloud environments have become increasingly popular over the past
decades.
• This popularity is due to the flexibility of Cloud environments in resource
allocation, as well as their resiliency in both software and the underlying
hardware.
• According to Forbes, 92% of the total workload will be executed in
Cloud environments by 2020.
• Moreover, by 2020 there will be 485 hyper-scale data-centers (up from
259 in 2015), accounting for 47% of all installed data-center servers
(21% in 2015). (Cisco Global Cloud Index)
• Three major Cloud providers (Microsoft, Amazon, Google) have
almost 1.5 million data-centers.
Cloud
Environments
• Traditional Cloud environments are formed from CPU-based data-
centers, and their architecture is based on the Warehouse Scale
Computer (WSC) (Barroso and Holzle, 2009).
• Recently, heterogeneous hardware such as:
- GPUs
- Intel MICs
- FPGAs
- High Performance Clusters
has started to be integrated into the Cloud in order to process
more demanding and specialized workloads (e.g. HPC
applications), while simultaneously decreasing energy
consumption.
• However, the addition of such hardware substantially increases the
complexity of monitoring it, provisioning it, and developing
software that can take full advantage of it.
Cloud
Environments
Despite the pros of modern Cloud environments, there are also cons,
which include:
• Overprovisioning: more resources are installed than actually
required in order to match user requests.
• Underutilization: utilization of modern Clouds is very low (20%-
30%), resulting in increased power consumption.
• Management issues: due to the continuously growing scale and
heterogeneity of modern Cloud environments, centralized
management is not effective, since more and more outdated data is
used for resource provisioning.
• Organization issues: organizing resources to maximize
utilization and "sharpen" the choice of adequate hardware to match
end-user needs accurately is based on global decisions.
Cloud
Environments
• Most of these problems can be tackled effectively by local
decisions based on a hierarchical self-organization and self-
management system.
• The CloudLightning project aims to use self-organization and
self-management strategies to effectively manage heterogeneous
resources at hyper-scale.
• However, one question remains: how do we evaluate, study or
improve hyper-scale Cloud environments, especially since most
hyper-scale environments belong to private companies?
• The answer is: simulation.
Requirements
for hyper-scale
simulations
What are the key requirements for hyper-scale simulations that limit
existing DES (Discrete Event Simulation) simulators?
• A very large amount of computation.
• Accurate models for power consumption based on adequate
interpolating models.
• A natively parallel design, in order to be able to execute in HPC
environments.
• Support for tasks that can span multiple Virtual Machines
(VMs).
• Support for accelerators (GPU, MIC, DFE).
• The simulator should be written in a language that is built for
high-performance computation (e.g. C or C++).
With the above in mind, a new simulator has been built
(the CloudLightning simulator).
Architecture
• In order to design a simulator for large-scale phenomena, we can
borrow the design of large-scale Engineering and Physics
simulations.
• These simulations are based on a time-advancing loop with a
prescribed time granularity.
• The time advances from t0 = 0 to tend with a prescribed sampling
step tstep (in seconds, milliseconds, etc.).
• This design enables the integration of dynamical components,
since the state of these components can be updated with respect
to tstep.
• This time-stepping approach allows for a dynamic resolution of the
results: a large time-step will only reveal a coarse picture of
the system, while a small time-step will reveal finer
interactions.
Architecture
Abstract Cloud architecture with one Cell containing one data-center
Architecture
Note: a Cell can contain multiple data-centers (WSCs) but only one broker
(Cell Manager).
Algorithm 1 Driver for the hyper-scale Cloud Simulator
1: Initialize data-centers, network, storage
2: for t = t0 = 0 to tmax with step tstep do
3: Create task queue in Gateway Service at time t
4: Send tasks to Broker
5: Receive tasks from Broker and find adequate resources
6: Assign tasks to the resources
7: Perform update on the affected components from the task-assignment
8: end for
Parallelization
• The parallelization of the simulator is a two-stage process (coarse-
and fine-grain parallelization).
• The Gateway Service resides in the Head Node and is
responsible for creating the task queue and sending tasks to the
Cells.
• Each Cell resides in one multi-core compute node.
• Coarse-grain parallelization can be performed via the Message
Passing Interface (MPI).
• The communications between the Gateway Service and the Cells
are minimal, so even for a large number of incoming tasks the
overall running time is not affected significantly.
• These communications are limited to sending the task resource
requirements and parameters: (1) number of VMs, (2) number of
vCPUs per VM, (3) memory per VM, (4) storage per VM and (5)
network bandwidth.
Parallelization
• The most computationally intensive tasks are the search for adequate
resources for a task and the update of the state of all the
components inside a Cell.
• However, these actions can be performed in parallel using the
multiple cores in each compute node. This fine-grain
parallelization substantially accelerates the simulation locally.
• This inherently parallel design scales both in the number of
Cells (horizontally) and in the number of resources per Cell (vertically).
• Moreover, this design enables the use of dynamic components
(dynamic Brokers), which change their logical architecture based
on the characteristics of the underlying resources.
Parallelization
Algorithm 2 Parallel driver for the hyper-scale Cloud Simulator
1: Initialize local data-centers, network, storage in each Cell
2: for t = t0 = 0 to tmax with step tstep do
3: if Head Node then
4: Create task queue in Gateway Service at time t
5: Send tasks to Broker
6: else
7: Receive tasks from Broker and find adequate resources in parallel
8: Assign tasks to the resources
9: Perform update in parallel on the affected components
10: end if
11: Barrier synchronization of distributed threads
12: end for
Power
consumption
models
The power consumption models used for servers in Cloud simulators
are of two kinds:
• Global models based on minimum and maximum power
consumption. For example:
P(u) = P_min + (P_max - P_min)·u, u ∈ [0, 1]
where u is the utilization of the server.
• Piecewise linear interpolation from data obtained from
organizations such as spec.org. This data gives the measured power
consumption at certain utilization levels, e.g. at u_i = i/10. For example:
P(u) = P(u_i) + (P(u_{i+1}) - P(u_i))·(u - u_i)/(u_{i+1} - u_i), u_i ≤ u ≤ u_{i+1}, i ∈ {0, ..., 9}
These two examples are the usual practice for computing power
consumption in present simulators.
For piecewise interpolation models, "not-a-knot" piecewise cubic
interpolation can also be used.
Support for
accelerators
• Available simulators do not support accelerators such as GPUs, MICs
and FPGAs.
• The execution model of these devices is similar to that of
CPUs; however, accelerators cannot be shared by multiple users
in the Cloud.
• Thus, if a user acquires an accelerator, its computational power is
totally utilized by that instance.
• The power consumption of these devices is either minimum (idle
state) or maximum. Thus, the power consumption of a server with
accelerators is:
P(u) = P_cpu(u) + Σ_{i=1}^{n_acc} a_i·P_max,acc_i + Σ_{i=1}^{n_acc} (1 - a_i)·P_idle,acc_i
where a_i ∈ [0, 1] is the average utilization of the i-th accelerator
and n_acc is the number of accelerators.
Execution
models
There are three basic types of scheduling execution for the VMs residing on a
server: space shared, time shared and space-time shared.
• Modern execution models are
primarily based on Space-
Time sharing.
• Gang scheduling is applied by
modern operating systems. In
Gang scheduling, threads
belonging to the same
application are scheduled
together.
• Bag-of-Gangs scheduling is
used when multiple multi-
threaded applications are
scheduled together
simultaneously.
[Figures: Space shared, Time shared and Space-Time shared scheduling diagrams]
Design and
Extensibility
• Based on the presented analysis, the simulator has been designed
and implemented in C++.
• The Message Passing Interface (MPI), for distributed-memory
parallel systems, as well as Open Multi-Processing (OpenMP),
for shared-memory parallel systems, are supported in C++ as
libraries and language extensions.
• The C++ STL includes all the libraries required to build the necessary
lists, queues and maps.
• C++ is also a compiled language and offers fine-grain control over
memory and threads.
Design and
Extensibility
• The selected decomposed approach enables easy extension of
the simulator.
• The extension procedure requires only inserting methods into the
appropriate class. For example, a new power consumption model
can be inserted into the Power Consumption component.
• Adding models can be performed with minimal interaction with the
source code.
• The addition of a new component, for example a second statistics
engine, requires designing the new class, updating the Cell class
to include it, and adding the update procedure to the Update and
Statistics Engine.
• Finally, MPI is responsible for scaling across compute nodes,
while OpenMP is responsible for scaling the update and search
procedures across the available cores of a compute node.
Conclusions
• A new hybrid parallel framework for hyper-scale Cloud simulations
has been presented that takes advantage of HPC clusters.
• The new framework is extensible, in terms of new models and
components, with small software additions.
• Improved power consumption models have been considered that
are closer to the actual power consumption of modern
CPUs.
• Execution and power consumption models for accelerators have
been given.
References
• L.A. Barroso and U. Holzle, The Datacenter as a Computer: An
Introduction to the Design of Warehouse-Scale Machines, Morgan
& Claypool Publishers, 2009.
• Cisco, Cisco Global Cloud Index: Forecast and Methodology,
2015-2020, 2016.
• CloudLightning, http://www.cloudlightning.eu, 2016.