dynamic placement of virtual machines for managing sla violations

29
Dynamic Placement of Virtual Machines for Managing SLA Violations NORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATY SOME SLIDE CONTENT ADAPTED FROM ALEXANDER NUS PRESENTED BY JON LOGAN

Upload: gerodi

Post on 23-Feb-2016

83 views

Category:

Documents


0 download

DESCRIPTION

Dynamic Placement of Virtual Machines for Managing SLA Violations. Norman Bobroff , Andrzej Kochut , Kirk Beaty Some slide content adapted from Alexander Nus Presented by Jon Logan. Virtual machines are becoming more and more popular throughout our datacenters Servers use electricity - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Dynamic Placement of Virtual Machines for Managing SLA Violations

Dynamic Placement of Virtual Machines for Managing SLA ViolationsNORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATYSOME SLIDE CONTENT ADAPTED FROM ALEXANDER NUSPRESENTED BY JON LOGAN

Page 2: Dynamic Placement of Virtual Machines for Managing SLA Violations

Motivation

Virtual machines are becoming more and more popular throughout our datacenters

Servers use electricity Electricity can be expensive!

How do we minimize the number of utilized machines, while meeting our SLA obligations?

Usage patterns of machines are NOT static, and generally change dynamically

Page 3: Dynamic Placement of Virtual Machines for Managing SLA Violations

Goals

Maximize utilization of active machines Minimize Service Level Agreement (SLA) violations Minimize number of active machines

Power off unused machines to conserve cost (electricity)

Essentially, minimize cost while meeting SLA guarantees

Page 4: Dynamic Placement of Virtual Machines for Managing SLA Violations

Static Allocation

All machines are taken offline, and historical usage is used to determine ideal placement

Happens very infrequently (~weeks or months) Must interrupt service to relocate Utilization is not consistent in many cases!

Demand may vary significantly within the period between allocations

Page 5: Dynamic Placement of Virtual Machines for Managing SLA Violations

Dynamic Allocation

VMs are seamlessly migrated between machines based on predicted demand

Is done rather frequently (~minutes, hours) Live migration

Minimal (~ms) service disruptions during migration Allows for allocations to more closely follow

demand

Page 6: Dynamic Placement of Virtual Machines for Managing SLA Violations

Live Migration

Moves a VM image between machines without service interruption

The paper cites a ~45 second transition time VM must be serialized and transferred over the

network Artificially limits our reallocation period

Can’t reallocate faster than we can migrate!

Page 7: Dynamic Placement of Virtual Machines for Managing SLA Violations

Service Level Agreement

Essentially is a contract between the provider and the customer that states that resources R will be available X% of the time Violations cost money! X is usually high (ex. 95%)

VMs do not necessarily use this entire resource allocation at all times, but it must be available should they choose to use it Ex. VM may be doing batch processing, and only do substantial work

between 12:00AM and 1:00AM

Page 8: Dynamic Placement of Virtual Machines for Managing SLA Violations

Static vs Dynamic Usages

Workloads are not static! Try to predict the usage of the VM in a

time T Reallocate machines to be able to meet

that predicted usage Need to be within a certain percentile to

meet SLA requirements Capacity savings is simply

Static Allocation - (Predicted Usage + Error Factor)

Repeat this process every time T

Page 9: Dynamic Placement of Virtual Machines for Managing SLA Violations

What Workloads Are Best For Dynamic Allocation?

Not all Workloads are created equal Some tend to be better than others

Constant workloads = bad! A workload is an ideal candidate for dynamic

allocation if It has strong variability AND It has strong autocorrelation combined with periodic

behavior Essentially, you need to have a decent degree of

variability, and be able to reasonably predict its usage

Page 10: Dynamic Placement of Virtual Machines for Managing SLA Violations

Workload 3a

Strongly variable – good Autocorrelation ~0.8 – good Weak periodic behavior – bad

Verdict – Good Large variability offers significant

potential for optimization Strong autocorrelation makes it possible

to obtain a low-error predication

Page 11: Dynamic Placement of Virtual Machines for Managing SLA Violations

Workload 3b

Weakly variable - bad Decaying autocorrelation - bad Weak periodic behavior – bad

Verdict – Bad Low variability makes potential gain low Weak autocorrelation and no periodic

component make it difficult to predict demand

Page 12: Dynamic Placement of Virtual Machines for Managing SLA Violations

Workload 3c

Strongly variable – good Strong Autocorrelation– good Strong periodic behavior – good

Verdict – Very Good An ideal case for dynamic

allocation

Page 13: Dynamic Placement of Virtual Machines for Managing SLA Violations

Potential Gain

Page 14: Dynamic Placement of Virtual Machines for Managing SLA Violations
Page 15: Dynamic Placement of Virtual Machines for Managing SLA Violations

Demand forecast algorithm

Determine the periods in demand using ‘common sense’ aided by periodogram (e.g.time-of-day,day of week,…)

Decompose the process into deterministic periodic and residual components Di + ri

Estimate the deterministic part using averaging of multiple smoothed historical periods

Fit Auto Regressive Moving Average (ARMA) model to the residual process

Use the combined components for demand predictionUi = Di + ri

Page 16: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm

Goal is to minimize time averaged number of active servers without violating the SLA agreement

Machines that are not utilized to handle VMs are powered off or put in a low power state Will be reactivated if/when required (minimally, the next period)

The time to power on & migrate must be less than the period T Responsible for actual migrations of machines Placing of VMs is essentially a version of the bin packing problem

NP hard! We use an approximation, using first-fit

Page 17: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm

Measure – Measure usage Forecast – Predict usage for the next window Remap – Relocate machines if necessary

Preform this (MFR) at regular intervals Designed to try to predict the “best we can do”

Page 18: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm Overview

Page 19: Dynamic Placement of Virtual Machines for Managing SLA Violations

Key Terms

N – virtual machines M – physical machines Cm – Maximum capacity of physical machine fn

i, k – forcast value for resource demand of VM n at interval i+k

R – migration interval Cp(u, o2) – (1-p)-percentile of Gaussian distribution with

mean u and variance o2

Page 20: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm

Page 21: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm (2)

Page 22: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm (3)

Page 23: Dynamic Placement of Virtual Machines for Managing SLA Violations

Management Algorithm (4)

Page 24: Dynamic Placement of Virtual Machines for Managing SLA Violations

Simulations

Simulated using traces gathered from hundreds of production servers using various applications

Traces contain CPU, memory, storage, and network We are only focusing on CPU usage

Samples were collected every 15 minutes The simulated study

Verifies that the MFR meets SLA targets Quantifies the reduction of SLA violations Quantifies the number of saved machines Explores the relationship between the remapping interval and the gain from

dynamic management Performs measurements to determine properties of a practical infrastructure with

respect to migration of VMs

Page 25: Dynamic Placement of Virtual Machines for Managing SLA Violations

Overflows vs Number of PMs

Page 26: Dynamic Placement of Virtual Machines for Managing SLA Violations

Number of Machines vs Overflow Desired

Significantly reduces number of machines active

Page 27: Dynamic Placement of Virtual Machines for Managing SLA Violations

Performance degrades as the migration interval increases

Essentially, the prediction is the max usage predicted within the range

Page 28: Dynamic Placement of Virtual Machines for Managing SLA Violations

Limitations

The paper only looks at one resource utilization In this case, CPU utilization In the real world, you have numerous resources to handle

allocations for Memory, CPU, IO, Network, etc.

Assumes bandwidth between machines is free & unrestricted Relocating some VMs in some cases may not be worth the cost of

relocating the image Their study size is small

Only 6 physical machines What if different VMs have different SLA requirements? What if your PMs had differing hardware?

Page 29: Dynamic Placement of Virtual Machines for Managing SLA Violations

Conclusion

Based on the simulated data, it significantly reduces cost to execute virtual machines

Relies on an ideal case of VMs Predictable and volatile usage

Algorithm could be optimized to reduce the number of VM relocations, or to more optimally schedule

Simulation is too small The paper claims a 44% average savings in the number of

active PMs