Virtualizing Mission-Critical Apps 1PM EST, 3/29/2011 Ilya Mirman Philip Thomas

Upload: mariel

Post on 21-Feb-2016



TRANSCRIPT

Page 1:

Virtualizing Mission-Critical Apps

1PM EST, 3/29/2011
Ilya Mirman, Philip Thomas

Page 2:

Agenda

• The Rise of “The Virtualization Chasm”
• 3 Fundamental inefficiencies
• Best practices
• Live demonstration

Page 3:

Background

Page 4:

Before Virtualization

• Traditional IT guarantees apps’ performance by
  – Dedicating physical machines (PMs) to apps
  – Provisioning sufficient capacity to service peak loads

• Consider an app requiring 16 cores, 8 GB of memory and 10k IOPS (IO per second) of IO bandwidth to service its peaks

[Figure: bar chart of one PM's capacity for CPU (16 cores), memory (8 GB) and IO (10k IOPS). The peak CPU workload sits below the CPU capacity; the excess capacity keeps utilization under 80%.]
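The sizing rule on this slide is simple arithmetic and can be sketched as follows. This is an illustration only: the function name `capacity_for_peak` and the sample peak of 12.8 cores are assumptions, chosen so that the result lands on the 16-core PM from the slide. Only the 80% ceiling comes from the slide.

```python
def capacity_for_peak(peak_demand: float, max_utilization: float = 0.80) -> float:
    """Capacity needed so that peak demand stays at or below max_utilization."""
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    return peak_demand / max_utilization


# Hypothetical peak of 12.8 cores (not a figure taken from the slides).
needed_cores = capacity_for_peak(12.8)
print(f"cores to provision: {needed_cores:.1f}")
```

The same function applies to memory or IOPS; only the unit of `peak_demand` changes.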

Page 5:

Over-Provisioning Waste

• Workloads are ‘bursty’: the average-to-peak ratio is often under 10%

• Dedicating hardware wastes the slack capacity between average and peak

[Figure: bar chart of one PM's capacity, over-provisioned for peak demands. Average utilization is 10%; the remainder is wasted capacity.]
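As a worked example of the waste described on this slide, the sketch below computes the idle share of a dedicated PM. It assumes the 16-core PM from the "Before Virtualization" slide running at the 10% average utilization shown here; the helper name `wasted_fraction` is hypothetical.

```python
def wasted_fraction(average_demand: float, capacity: float) -> float:
    """Fraction of capacity that sits idle on average."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 1.0 - average_demand / capacity


# Assumed setup: a 16-core PM running at a 10% average.
capacity_cores = 16
average_cores = 0.10 * capacity_cores
idle = wasted_fraction(average_cores, capacity_cores)
print(f"idle on average: {idle:.0%} ({idle * capacity_cores:.1f} of {capacity_cores} cores)")
```

Under these assumptions about nine tenths of the machine is idle on average, which is the slack the slide calls wasted capacity.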

Page 6:

Virtualization is Set to Resolve This Waste

• Consolidate workloads into shared PMs
• This increases average utilization additively
• But it also increases interference among VMs
  – E.g., peak traffic of VM1 can interfere with CPU availability for other VMs

[Figure: peak workloads of VM1 through VM10, consolidated into shared PMs.]
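Both effects of consolidation, additive utilization and potential interference, can be seen in a small calculation. The VM demands below are hypothetical (four identical VMs averaging 0.8 cores with 8-core peaks); the slides give no per-VM numbers, and `consolidate` is an illustrative helper.

```python
def consolidate(vms, pm_capacity):
    """Return (average utilization, summed peaks / capacity) for VMs sharing one PM."""
    avg_util = sum(vm["avg"] for vm in vms) / pm_capacity
    peak_ratio = sum(vm["peak"] for vm in vms) / pm_capacity
    return avg_util, peak_ratio


# Hypothetical VMs; demands are in cores.
vms = [{"name": f"VM{i}", "avg": 0.8, "peak": 8.0} for i in range(1, 5)]
avg_util, peak_ratio = consolidate(vms, pm_capacity=16)

print(f"average utilization: {avg_util:.0%}")
print(f"demand if all peaks coincide: {peak_ratio:.1f}x capacity")
```

A ratio above 1.0 in the second line means that coinciding peaks cannot all be served, which is the interference the slide warns about.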

Page 7:

VMs Compete for Resources

• Best-effort resource allocations (vs. dedicated)
  – VMs get their allocations if capacity is available
  – VMs experience interference when capacity is insufficient
• Interference can create congestion, bottlenecks and delays
• Performance-insensitive apps can tolerate interference
  – Permit simple, risk-free virtualization

But mission-critical apps are highly vulnerable to interference!

Page 8:

The Rise of “The Virtualization Chasm”

[Figure: ROI versus percentage of apps virtualized (x-axis marks at 20%, 80% and 100%; a 40% label also appears). Virtualization 1.0 covers performance-insensitive apps and Virtualization 2.0 covers production apps; “The Virtualization Chasm” lies between them.]

• Virtualization 1.0: Virtualize performance-insensitive apps
  – E.g., print servers, non-critical web apps (the low-hanging fruit)
  – 20%-30% of enterprise apps

• Virtualization 2.0: Virtualize production apps
  – The remaining 70%-80%: important/critical production apps

Page 9:

Virtualizing Mission-Critical Apps

Page 10:

The Key Challenge: Ensuring That Production Apps Get Their Resources

• Interference results from statistical over-commitment
  – Apps’ demands can exceed capacity momentarily
• Interference may be controlled by two mechanisms
  – Resource allocation: protect apps against over-commitment
  – Workload placement: move workloads to minimize interference

Let’s take a look at recommendations from the hypervisor vendors…

Page 11:

VMware Best Practices: Managing Production Apps Performance

Best Practice Guide to Exchange Server Virtualization: http://www.vmware.com/files/pdf/Exchange_2010_on_VMware_-_Best_Practices_Guide.pdf

Assure Peak Utilization: “It is recommended that standalone servers…be designed to not exceed 70% utilization during peak period.”

Avoid Over-Commitment: “For performance-critical Exchange virtual machines (i.e., production systems), try to ensure the total number of vCPUs assigned to all the virtual machines is equal to or less than the total number of cores on the host machine.”

Page 12:

VMware Best Practices: Managing Production Apps Performance

VMware Production Apps Strategy Rests on 2 Rules:

VMs running production apps should ensure that:

R-I: “Resource allocations are sufficient to serve peak demands.”

R-II: “Aggregate allocations do not exceed the PM capacity.”

R-I guarantees that an app may get its peak demands served, if capacity is available.

R-II guarantees that the capacity allocation will be available.

I.e., if VM1 and VM2 each need 4 vCPUs, we need a PM with ≥8 CPUs!
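The two rules can be expressed as a pair of checks. The sketch below is a minimal illustration, not VMware tooling: the `VM` class and the function names are hypothetical, and only vCPUs are modeled. It reproduces the slide's example of two VMs that each need 4 vCPUs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    peak_vcpus: int       # peak demand, in vCPUs
    allocated_vcpus: int  # allocation, in vCPUs


def satisfies_r1(vm: VM) -> bool:
    """R-I: the allocation is sufficient to serve the VM's peak demand."""
    return vm.allocated_vcpus >= vm.peak_vcpus


def satisfies_r2(vms: List[VM], pm_cores: int) -> bool:
    """R-II: aggregate allocations do not exceed the PM capacity."""
    return sum(vm.allocated_vcpus for vm in vms) <= pm_cores


# The slide's example: VM1 and VM2 each need 4 vCPUs.
vms = [VM("VM1", 4, 4), VM("VM2", 4, 4)]
for pm_cores in (6, 8):
    ok = all(satisfies_r1(vm) for vm in vms) and satisfies_r2(vms, pm_cores)
    print(f"PM with {pm_cores} cores: {'OK' if ok else 'rules violated'}")
```

With both rules enforced, a 6-core PM is rejected and an 8-core PM is accepted, so the PM is sized for the sum of the peaks.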

Page 13:

Wait… Really? Then why virtualize?

• Though there’s no sharing of resources, we still enjoy the other benefits of virtualization (app isolation, VM set-up, back-up, etc.)

R-I: “Resource allocations are sufficient to serve peak demands.”

R-II: “Aggregate allocations do not exceed the PM capacity.”

R-I guarantees that an app may get its peak demands served, if capacity is available.

R-II guarantees that the capacity allocation will be available.

Page 14:

Virtualization Can Result in 3 Fundamental Inefficiencies

1. Over-provisioning inefficiency
2. Workload packing inefficiency
3. Non-adaptive control inefficiency

These fundamental inefficiencies are considered next…

Page 15:

Over-provisioning Inefficiency

Page 16:

How to Avoid Over-Provisioning Waste?

• To avoid waste: increase average workload without increasing reservations
  – Add performance-insensitive apps with high average workload
  – E.g., consolidate spam-filter apps and email archival apps alongside mission-critical apps
• Need an additional best-practice rule: smart consolidation

Best Practice #1: Maintain a consolidation balance between performance-sensitive and performance-insensitive workloads
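A rough illustration of Best Practice #1: adding best-effort workloads raises average utilization while the total reservation stays the same. All numbers below are hypothetical (a 16-core PM, one mission-critical app that reserves its 16-core peak, and two performance-insensitive apps with no reservation); `summarize` is an illustrative helper.

```python
PM_CORES = 16

# Hypothetical workloads, in cores. Only the mission-critical app holds a reservation.
workloads = [
    {"name": "mission-critical app", "reserved": 16.0, "avg": 1.6},
    {"name": "spam filter", "reserved": 0.0, "avg": 4.0},
    {"name": "email archival", "reserved": 0.0, "avg": 4.0},
]


def summarize(items, capacity):
    """Return (total reserved cores, average utilization) for the given workloads."""
    reserved = sum(w["reserved"] for w in items)
    avg_util = sum(w["avg"] for w in items) / capacity
    return reserved, avg_util


before = summarize(workloads[:1], PM_CORES)  # mission-critical app alone
after = summarize(workloads, PM_CORES)       # with best-effort apps added

print(f"before: reserved {before[0]:.0f} cores, average utilization {before[1]:.0%}")
print(f"after:  reserved {after[0]:.0f} cores, average utilization {after[1]:.0%}")
```

The best-effort apps absorb the slack between the mission-critical app's average and its reserved peak, and they yield whenever that peak arrives.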

Page 17:

Workload-Packing Inefficiency

Page 18:

A Greatly Simplified Example

[Figure: virtualized workloads VM1 through VM6 placed on PM1, PM2 and PM3 by manual ad hoc workload assignment. Capacities shown: CPU 16 cores; memory 8 GB; IO 10k IOPS.]

Page 19:

What If We Get New VMs?

[Figure: new workloads VM7 through VM10 arrive. Ad hoc assignment spreads the VMs over PM1 through PM5; an optimized assignment uses PM1 through PM3.]

• Can we do better?
• Optimized assignment uses 40% fewer resources (3 PMs vs. 5)

Page 20:

What Can We Learn from This Example?

• Changes may require (re-)assignment of workloads
• Even a trivialized example can be very complex
• Complexity and waste can grow dramatically
  – When the number of VMs increases
  – When physical machines vary
  – When there are constraints (e.g., storage access, security policies)
  – When the rate of change is high
• Ad hoc processes can lead to costly inefficiencies
• Planning and workload placement must consider all workload types (not just CPU)

Page 21:

Overcoming the Packing Inefficiency

• Use improved workload placement algorithms
  – Look holistically at all workloads and resources
  – Exploit the flexibility of performance-insensitive workloads
  – Exploit the dynamics of workload peaks & troughs

Best Practice #2: Use improved workload placement algorithms
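The slides do not name a specific placement algorithm. As one possible illustration, the sketch below uses first-fit decreasing, a common bin-packing heuristic, extended to check CPU, memory and IO together. The VM demands are hypothetical; the PM capacity (16 cores, 8 GB, 10k IOPS) is the one shown in the simplified example.

```python
from typing import Dict, List, Tuple

Demand = Tuple[float, float, float]  # (cores, GB of memory, k-IOPS)

PM_CAPACITY: Demand = (16.0, 8.0, 10.0)  # capacity shown in the simplified example


def fits(load: Demand, demand: Demand, capacity: Demand) -> bool:
    """True if adding demand to the current load stays within every resource limit."""
    return all(l + d <= c for l, d, c in zip(load, demand, capacity))


def size(demand: Demand, capacity: Demand) -> float:
    """Largest fraction of any single resource that the VM needs."""
    return max(d / c for d, c in zip(demand, capacity))


def first_fit_decreasing(vms: Dict[str, Demand],
                         capacity: Demand = PM_CAPACITY) -> List[List[str]]:
    """Place VMs, largest first, on the first PM with room; open a new PM if none has."""
    loads: List[Demand] = []
    placement: List[List[str]] = []
    for name in sorted(vms, key=lambda n: size(vms[n], capacity), reverse=True):
        demand = vms[name]
        if any(d > c for d, c in zip(demand, capacity)):
            raise ValueError(f"{name} does not fit on an empty PM")
        for i, load in enumerate(loads):
            if fits(load, demand, capacity):
                loads[i] = tuple(l + d for l, d in zip(load, demand))
                placement[i].append(name)
                break
        else:  # no existing PM had room
            loads.append(demand)
            placement.append([name])
    return placement


# Hypothetical peak demands; the slides give no per-VM figures.
vms = {
    "VM1": (8, 2, 2), "VM2": (4, 4, 1), "VM3": (4, 1, 6),
    "VM4": (8, 2, 3), "VM5": (2, 4, 2), "VM6": (6, 1, 4),
}
placement = first_fit_decreasing(vms)
for i, names in enumerate(placement, start=1):
    print(f"PM{i}: {', '.join(names)}")
```

First-fit decreasing is only a heuristic and does not guarantee the minimum number of PMs. It also ignores the constraints, workload flexibility and time-of-day dynamics that the bullets above call out, so it should be read as a starting point rather than the placement method the presenters have in mind.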

Page 22:

Non-adaptive Control Inefficiency

Page 23:

Mission-Critical App Example

[Figure: IOPS rate (k-IOPS, axis from 1 to 10) versus time of day, hours 15 through 24 followed by 01 through 14.]

• Virtualized MS Exchange app
• High IOPS during the night (2AM-5AM)
  – Peak: 10 k-IOPS
  – <1 k-IOPS during the rest of the time
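To put a number on this profile, the sketch below compares a static reservation sized for the peak with what the app actually uses over a day. The hourly profile is an approximation built from the bullets above: 10 k-IOPS for the three hours from 2AM to 5AM, and an assumed 1 k-IOPS otherwise (the slide only says "<1 k-IOPS", so the real idle share would be somewhat higher).

```python
PEAK_KIOPS = 10.0
OFF_PEAK_KIOPS = 1.0  # assumed upper bound; the slide says "<1 k-IOPS"

# Hourly demand for hours 0..23: peak from 2AM up to (not including) 5AM.
demand = [PEAK_KIOPS if 2 <= hour < 5 else OFF_PEAK_KIOPS for hour in range(24)]

# A static policy reserves the peak around the clock.
static_reservation = [max(demand)] * 24

static_total = sum(static_reservation)  # reserved k-IOPS-hours per day
used_total = sum(demand)                # k-IOPS-hours actually needed per day
idle_share = 1 - used_total / static_total

print(f"reserved: {static_total:.0f} k-IOPS-hours, used: {used_total:.0f}")
print(f"share of the static reservation left idle: {idle_share:.0%}")
```

Under these assumptions more than three quarters of the reserved IO capacity goes unused each day, which is the gap a time-aware reservation could reclaim.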

Page 24:

What If Workloads Grow?

What if VM1 needs more memory & storage?

[Figure: VM1 through VM6 placed on PM1 through PM4, compared with an assignment that uses PM1 through PM3.]

• Can we do better?
• Optimized assignment uses 25% fewer resources

Page 25:

Adaptive vs. Non-Adaptive Workload Control

• Workload demands (and interference) change over time
  – E.g., the Exchange server is active through the night
  – Why keep its reservation during the day?
• Static workload mgmt is limited in handling emergent problems
  – App profiles reflect long-term statistics; fluctuations can cause interference
• Adaptive workload control offers superior mgmt
  – Exploit workload dynamics to reduce the waste of static policies
  – Eliminate emergent interference

Best Practice #3: Provide adaptive control to optimize resource use & avoid interference

Best Practice #4: Use forward-looking workload projection
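The slides do not say how workloads should be projected. As one simple possibility, the sketch below projects each hour of the next day as the mean of that hour over past days, then adds headroom to obtain an hourly reservation plan. The three-day history, the 25% headroom and both function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def project_hourly_demand(history: Sequence[Sequence[float]]) -> List[float]:
    """Per-hour mean over past days; each day is a sequence of 24 hourly demands."""
    if not history or any(len(day) != 24 for day in history):
        raise ValueError("history must contain days of 24 hourly values")
    return [mean(day[hour] for day in history) for hour in range(24)]


def plan_reservations(projection: Sequence[float], headroom: float = 0.25) -> List[float]:
    """Reserve the projected demand plus a safety margin for each hour."""
    return [demand * (1 + headroom) for demand in projection]


def day(peak_kiops: float) -> List[float]:
    """Hypothetical Exchange-like day: peak from 2AM to 5AM, 1 k-IOPS otherwise."""
    return [peak_kiops if 2 <= hour < 5 else 1.0 for hour in range(24)]


history = [day(9.0), day(10.0), day(11.0)]  # three past days, in k-IOPS
projection = project_hourly_demand(history)
plan = plan_reservations(projection)

print(f"3AM: projected {projection[3]:.1f}, reserve {plan[3]:.2f} k-IOPS")
print(f"noon: projected {projection[12]:.1f}, reserve {plan[12]:.2f} k-IOPS")
```

A per-hour mean captures only a repeating daily pattern. Growth trends or weekly cycles would need a richer model, and the reservation would still have to be raised before the projected peak begins rather than in reaction to it.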

Page 26:

Adaptive Control: Too Complex for Manual Management

• Manual management requires administrators to:
  – Master voluminous details of hypervisor and application internals
  – Manage interference and waste problems manually
  – Manage resource allocations and move applications as workloads change
  – Maintain tight coordination between virtualization & app administrators

This complexity is a central barrier for Virtualization 2.0!

Page 27:

Virtualizing Production Apps: Improved Best Practices

Page 28:

Conclusions

• Workload placement can be very inefficient
  – Over-provisioning waste; workload-packing waste; non-adaptive inefficiencies
• Virtualization is much too complex for manual administration
• Must be augmented by workload management:
  – Eliminate the over-provisioning waste through balanced consolidation
  – Minimize the workload-packing waste by exploiting workload features
  – Support adaptive control to optimize resource use & avoid interference

Virtualization 2.0 Strategy: Replace manual mgmt with automated, optimized workload management

Page 29:

Live Demonstration

Page 30:

Thank you!
www.vmturbo.com