virtualisation oversubscription - what's so scary?

Commercial in Confidence www.metron-athene.com Virtualisation Oversubscription (What’s so scary?) [email protected]

Upload: metron

Posted on 15-Apr-2017

TRANSCRIPT

Page 1: Virtualisation Oversubscription - What's so scary?

Virtualisation Oversubscription (What’s so scary?)

[email protected]

Page 2

Topics

• What led me here
• Oversubscription Overview
• CPU Oversubscription
• Memory Oversubscription
• What’s the worst that can happen? (Queueing theory, the simple version)

Page 3

Overcommit vs Oversubscribe

• Overcommit = Oversubscribe

Page 4

What led me here

• Clients
  – “Oh, we don’t oversubscribe”

• Fear

• Misunderstanding

Page 5

Flying Navigation by Dead Reckoning

• You know where you started
• You know how long you flew for
• You know your air speed
• You know what direction you flew in

• What if the wind changed in the last 8 hours?

• In WW2 bombing, only 1 in 5 bomb loads landed within 5 miles of the target.

Page 6

Virtualisation Used Capacity by Dead Reckoning

• You know what you started with
• You know what you provisioned
• You know how much is left

• Not especially efficient

Page 7

Oversubscription

• Allocating more than you have
  – Thin Provisioning
  – Deduplication & Compression

[Diagram: Allocated vs Exists; Allocated vs Exists; Allocated vs Used]
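The dead-reckoning view of capacity (started with, provisioned, left) and the Allocated / Exists / Used comparison can be put into numbers. A minimal sketch, assuming a hypothetical thin-provisioned datastore; the `Resource` class and all figures are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """Hypothetical bookkeeping record for one pooled resource (e.g. TB of disk)."""
    name: str
    exists: float      # physical capacity that really exists
    allocated: float   # total promised/provisioned to VMs
    used: float        # what the VMs actually consume

    def oversubscription_ratio(self) -> float:
        # Above 1.0 means more has been allocated than exists
        return self.allocated / self.exists

    def left_by_dead_reckoning(self) -> float:
        # "What you started with minus what you provisioned": ignores real usage
        return self.exists - self.allocated

    def left_by_measurement(self) -> float:
        # Capacity that is genuinely still free
        return self.exists - self.used

# Hypothetical datastore: 10 TB exists, 15 TB promised, 6 TB actually written
ds = Resource("datastore", exists=10.0, allocated=15.0, used=6.0)
print(ds.oversubscription_ratio())   # 1.5
print(ds.left_by_dead_reckoning())   # -5.0
print(ds.left_by_measurement())      # 4.0
```

By dead reckoning the datastore is 5 TB "overdrawn", yet measurement shows 4 TB still free; oversubscription only becomes a problem when `used` approaches `exists`.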

Page 8

What can be oversubscribed?

• CPUs
• Memory
• Disk
• NICs
  – Nobody ever seems to think about that one
  – VMs on a single host = no NIC involved
  – Otherwise…

Page 9

CPU VMware Maximums

• Virtual Machine Maximum
  – 128 vCPUs per VM

• Host CPU maximums
  – Logical CPUs per host: 480
  – Virtual machines per host: 1024
  – Virtual CPUs per host: 4096
  – Virtual CPUs per core: 32

• The achievable number of vCPUs per core depends on the workload and specifics of the hardware. For more information, see the latest version of Performance Best Practices for VMware vSphere.

https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf

Page 10

Memory VMware Maximums

• 6TB per Host
  – Well, 12TB on specific hardware

• 4TB per VM
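The CPU and memory maximums quoted on the last two slides can be checked against a planned host layout. A sketch only: the `HostPlan` record and `check_maximums` helper are hypothetical, not a VMware API, and the constants are the vSphere 6.0 values listed above (6 TB host RAM assumed, not the 12 TB special-hardware case). Passing this check says nothing about performance, which is the point of the "depends on the workload" note.

```python
from dataclasses import dataclass

# vSphere 6.0 configuration maximums as listed on the slides
MAX_VCPUS_PER_VM = 128
MAX_LOGICAL_CPUS_PER_HOST = 480
MAX_VMS_PER_HOST = 1024
MAX_VCPUS_PER_HOST = 4096
MAX_VCPUS_PER_CORE = 32
MAX_RAM_PER_HOST_TB = 6     # 12 TB only on specific hardware
MAX_RAM_PER_VM_TB = 4

@dataclass
class HostPlan:
    """Hypothetical planned host layout (not a VMware object)."""
    cores: int
    logical_cpus: int
    ram_tb: float
    vm_vcpus: list[int]      # vCPU count of each VM
    vm_ram_tb: list[float]   # configured RAM of each VM

def check_maximums(plan: HostPlan) -> list[str]:
    """Return the names of any maximums the plan exceeds (empty = within limits)."""
    problems = []
    total_vcpus = sum(plan.vm_vcpus)
    if plan.logical_cpus > MAX_LOGICAL_CPUS_PER_HOST:
        problems.append("logical CPUs per host")
    if len(plan.vm_vcpus) > MAX_VMS_PER_HOST:
        problems.append("VMs per host")
    if total_vcpus > MAX_VCPUS_PER_HOST:
        problems.append("vCPUs per host")
    if total_vcpus / plan.cores > MAX_VCPUS_PER_CORE:
        problems.append("vCPUs per core")
    if any(v > MAX_VCPUS_PER_VM for v in plan.vm_vcpus):
        problems.append("vCPUs per VM")
    if plan.ram_tb > MAX_RAM_PER_HOST_TB:
        problems.append("RAM per host")
    if any(r > MAX_RAM_PER_VM_TB for r in plan.vm_ram_tb):
        problems.append("RAM per VM")
    return problems

# 100 x 4-vCPU VMs on 16 cores = 25 vCPUs per core: oversubscribed, but allowed
print(check_maximums(HostPlan(16, 32, 0.5, [4] * 100, [0.004] * 100)))
# 150 x 4-vCPU VMs = 37.5 vCPUs per core: over the per-core maximum
print(check_maximums(HostPlan(16, 32, 0.5, [4] * 150, [0.004] * 150)))
```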

Page 11

Memory Oversubscription

• How?
  – Free Space
  – Page Sharing
  – Balloon Driver (VMware)
  – Reservations
  – Shares

Page 12

Memory

• Transparent Page Sharing
  – Deduplication in memory

• Balloon Driver
  – The vmmemctl process “steals” memory inside the VM, allowing that memory to be used by other VMs. This may cause the guest OS to page.

• VMkernel Swap
  – The VM thinks its pages are in memory, but ESX has put that memory on disk in a VMkernel swap file.
  – “Performance is NOT optimal”
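Transparent page sharing is deduplication applied to memory pages. A toy sketch of the idea, assuming 4 KB pages and using SHA-256 as the content hash; the `share_pages` function and the page contents are hypothetical and this is not how ESX is implemented internally.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; assumes small 4 KB pages

def share_pages(vms: dict[str, list[bytes]]) -> dict[str, int]:
    """Toy transparent page sharing: pages with identical contents are kept
    once on the host, no matter which VM they belong to."""
    host_copies: dict[bytes, bytes] = {}   # content hash -> single shared copy
    guest_pages = 0
    for pages in vms.values():
        for page in pages:
            guest_pages += 1
            digest = hashlib.sha256(page).digest()
            # A real hypervisor also compares the bytes to rule out a hash
            # collision before sharing; this sketch trusts the hash.
            host_copies.setdefault(digest, page)
    host_pages = len(host_copies)
    return {
        "guest_pages": guest_pages,
        "host_pages": host_pages,
        "saved_bytes": (guest_pages - host_pages) * PAGE_SIZE,
    }

zero_page = bytes(PAGE_SIZE)          # e.g. zeroed memory in both guests
shared_code = b"\x90" * PAGE_SIZE     # e.g. the same OS code page in both guests
vms = {
    "VM1": [zero_page, shared_code, b"a" * PAGE_SIZE],
    "VM2": [zero_page, shared_code, b"b" * PAGE_SIZE],
}
print(share_pages(vms))
```

Six guest pages need only four host pages here, because the zero page and the code page are stored once for both VMs.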

Page 13

Transparent Page Sharing

[Diagram: Transparent Page Sharing between VM1 and VM2 on ESX]

Page 14

Balloon Driver (vmmemctl)

[Diagram: Balloon driver (vmmemctl) in VM1 and VM2 on ESX]

Page 15

Memory test

• Memory vs. disk speed is…?
  – A) Memory is 100x faster than disk
  – B) Memory is 1,000x faster than disk
  – C) Memory is 10,000x faster than disk
  – D) Memory is 100,000x faster than disk
  – E) Memory is 1,000,000x faster than disk
  – F) I have no memory of the event, your honour
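The quiz comes down to a ratio of access latencies. The figures below are assumed, order-of-magnitude values and are NOT from the slide; real numbers vary a lot by hardware. Under these assumptions a spinning disk lands on answer D and an SSD on answer B, which is why a VM whose memory sits in a VMkernel swap file slows down so badly.

```python
# Assumed order-of-magnitude access latencies in nanoseconds (not from the slide)
LATENCY_NS = {
    "DRAM": 100,          # ~100 ns
    "SSD": 100_000,       # ~100 us random read
    "HDD": 10_000_000,    # ~10 ms seek plus rotation
}

def times_faster(fast: str, slow: str) -> int:
    """How many times faster `fast` is than `slow` under the assumed latencies."""
    return LATENCY_NS[slow] // LATENCY_NS[fast]

print(times_faster("DRAM", "HDD"))  # 100000
print(times_faster("DRAM", "SSD"))  # 1000
```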

Page 16

VMkernel Swap

[Chart: VM memory from 0% to 100%, split into Balloon, Swap File and Reservation (MB)]

Example:
• Assume maximum memory contention
• By default, up to 65% can be reclaimed by the balloon driver
• Example reservation is 30%
• The remaining 5% is in the VMkernel swap (.vswp) file
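The example split can be computed for any VM size. A minimal sketch, assuming a hypothetical 4000 MB VM, the 30% reservation and 65% balloon limit from the example, and that the .vswp file is sized as configured memory minus the reservation; `memory_under_contention` is a made-up helper, not a VMware tool.

```python
def memory_under_contention(configured_mb: int, reservation_pct: int,
                            balloon_max_pct: int = 65) -> dict[str, int]:
    """Split a VM's configured memory as in the slide's example, assuming
    maximum host memory contention."""
    reserved = configured_mb * reservation_pct // 100
    # The balloon can reclaim up to its limit, but never reserved memory
    balloon = min(configured_mb * balloon_max_pct // 100, configured_mb - reserved)
    # Whatever is neither reserved nor ballooned ends up in VMkernel swap
    swapped = configured_mb - reserved - balloon
    return {
        "reservation_mb": reserved,
        "balloon_mb": balloon,
        "vmkernel_swap_mb": swapped,
        # The .vswp file must be able to hold everything that is not reserved
        "vswp_file_size_mb": configured_mb - reserved,
    }

print(memory_under_contention(4000, 30))
```

For the 4000 MB VM this gives 1200 MB reserved, 2600 MB ballooned and 200 MB (5%) in VMkernel swap, matching the percentages above; the .vswp file itself is 2800 MB because it must cover all unreserved memory.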

Page 17

Memory

• 433MB Active Memory
• 2.6GB Unique Memory
• 1.4GB Shared Memory
• 50MB Balloon Driver Memory
• 150MB ESX Overhead for the VM

Page 18

Reservations

• Set on Resource Pools or VMs
• If they want it, they get it
• If they don’t want it, it’s available to all
• Cannot reserve more than exists

• Oversubscribe
  – Protect core VMs with a reservation

Page 19

Memory Idle Tax

• Memory has Shares
• Memory Tax associates a value with each page used
• Default Idle Tax rate is 75%
• This makes idle memory cost 4 times as many shares as active memory
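The "4 times" follows from the idle tax formula published by Carl Waldspurger in "Memory Resource Management in VMware ESX Server" (OSDI 2002): an idle page costs k = 1 / (1 - tax rate), and a VM's adjusted shares-per-page ratio is shares / (pages × (f + k × (1 - f))), where f is the fraction of its memory that is active. The slide does not give the formula, so treat this as an illustrative sketch; the function names and VM figures are hypothetical.

```python
def idle_page_cost(tax_rate: float) -> float:
    """Cost of an idle page relative to an active page: k = 1 / (1 - tax_rate)."""
    return 1.0 / (1.0 - tax_rate)

def shares_per_page(shares: float, pages: int, active_fraction: float,
                    tax_rate: float = 0.75) -> float:
    """Adjusted shares-per-page ratio. Under memory pressure, memory is
    reclaimed first from the VM with the LOWEST ratio."""
    k = idle_page_cost(tax_rate)
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))

print(idle_page_cost(0.75))   # 4.0
# Two hypothetical VMs with equal shares and equal memory
busy = shares_per_page(1000, 1000, active_fraction=0.9)
idle = shares_per_page(1000, 1000, active_fraction=0.1)
print(busy > idle)            # True: the mostly idle VM gives memory up first
```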

Page 20

CPU Oversubscription

• How?
  – Time slicing
  – Co-Scheduling
  – Reservations
  – Shares
  – Limits

Page 21

Time Slicing

• Cores are shared between vCPUs in time slices
  – 1 vCPU to 1 core at any point in time

• More vCPUs = More time slicing
• Processes do this on CPUs all the time
  – So why is it so scary?
  – Over 100 processes on my laptop share 4 CPUs

[Diagram: VM1 time slices, Running vs Dormant/Idle]
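Time slicing can be illustrated with a plain round-robin loop. This is a sketch under simplifying assumptions: every vCPU is always busy, each core runs one vCPU per slice, and the scheduler is round-robin (ESXi actually uses a proportional-share scheduler). The `time_slice` function is hypothetical.

```python
from collections import deque

def time_slice(vcpus: list[str], cores: int, slices: int) -> dict[str, int]:
    """Round-robin sketch: every vCPU always wants to run, but each core runs
    only one vCPU per time slice. Returns how many slices each vCPU ran."""
    queue = deque(vcpus)
    ran = {v: 0 for v in vcpus}
    for _ in range(slices):
        # The next `cores` vCPUs at the front of the queue run in this slice
        running = [queue.popleft() for _ in range(min(cores, len(queue)))]
        for v in running:
            ran[v] += 1
        # They rejoin at the back; the others spent this slice waiting (ready)
        queue.extend(running)
    return ran

print(time_slice([f"vCPU{i}" for i in range(6)], cores=4, slices=30))
```

With six always-busy vCPUs on four cores, each vCPU runs 20 of the 30 slices and spends the other 10 ready to run but waiting for a core.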

Page 22

VMware Processor Scheduling: vCPU Co-Scheduling & Ready Time

[Diagram: vCPUs of several VMs scheduled across physical CPUs 1 to 4; legend: Idle, Ready, Threads]
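Co-scheduling is why wide VMs collect more ready time. The sketch below models strict co-scheduling, where a VM runs only if all of its vCPUs get a core at the same moment; modern ESXi uses a relaxed form of co-scheduling, so this strict model overstates the effect. The VM names, sizes and `strict_coschedule` function are hypothetical, and every VM is assumed to be busy all the time.

```python
def strict_coschedule(vms: dict[str, int], cores: int, ticks: int) -> dict[str, int]:
    """Strict co-scheduling sketch: a VM runs in a tick only if ALL of its
    vCPUs can get a free core at once. Returns ready ticks per VM, i.e.
    ticks in which it wanted to run but could not."""
    order = list(vms)
    ready = {name: 0 for name in vms}
    for _ in range(ticks):
        free = cores
        for name in order:
            if vms[name] <= free:
                free -= vms[name]   # all of this VM's vCPUs run together
            else:
                ready[name] += 1    # not enough free cores: the whole VM waits
        # Rotate so a different VM gets first pick in the next tick
        order = order[1:] + order[:1]
    return ready

print(strict_coschedule({"big": 4, "small1": 1, "small2": 1}, cores=4, ticks=30))
```

Over 30 ticks the 4-vCPU VM is ready but waiting for 20 ticks, while each 1-vCPU VM waits for only 10: it is much easier to find one free core than four at once.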

Page 23

Reservations / Shares / Limits

Page 24

Reservations

[Chart: CPU Used by Production VM (with Prod VM Reservation) and CPU Used by Test VM, 0% to 100% CPU]

1) The Production VM wants to use all the CPU available.
2) The Test VM starts and also wants to use all the CPU available.
3) Each uses 50% CPU.
4) The Production VM wants 250MHz CPU while Test wants to use 4000MHz CPU. Production gets 100% of its request. Test does not.

Page 25

Reservations & Shares

[Chart: CPU Used by Production VM (with Prod VM Reservation) and CPU Used by Test VM, 0% to 100% CPU]

1) The Production VM (2000 Shares) wants to use all the CPU available.
2) The Test VM (1000 Shares) also wants to use all the CPU available.
3) Production gets 66% CPU, Test gets 33% CPU.
4) The Production VM wants 250MHz CPU while Test could still use 4000MHz CPU. Production gets 100% of its request. Test does not.
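The Production/Test scenarios on the last two slides can be reproduced with a simple proportional-share model: each VM gets CPU in proportion to its shares, never less than its reservation and never more than it asks for. This is a textbook-style sketch, not ESXi's scheduler; the `allocate` function, the 4000 MHz host and the 1000 MHz Production reservation are assumptions for illustration.

```python
def allocate(capacity: float, vms: dict[str, dict]) -> dict[str, float]:
    """Share CPU capacity (MHz) between VMs. Each VM dict has 'demand' (MHz),
    'shares' (> 0) and optionally 'reservation' (MHz).

    Model: VM i gets clamp(level * shares_i, floor_i, demand_i), where
    floor_i = min(reservation_i, demand_i) and `level` is chosen so the
    allocations add up to the capacity. Assumes the reservations fit."""
    demand = {n: v["demand"] for n, v in vms.items()}
    if sum(demand.values()) <= capacity:
        return demand                   # no contention: everyone is satisfied
    floor = {n: min(v.get("reservation", 0.0), v["demand"]) for n, v in vms.items()}

    def at_level(level: float) -> dict[str, float]:
        return {n: min(max(level * v["shares"], floor[n]), v["demand"])
                for n, v in vms.items()}

    # Bisection on the per-share level: total allocation grows with the level
    lo, hi = 0.0, max(v["demand"] / v["shares"] for v in vms.values())
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(at_level(mid).values()) > capacity:
            hi = mid
        else:
            lo = mid
    return at_level(lo)

HOST_MHZ = 4000  # hypothetical host capacity
# Equal shares, both want everything: about 2000 MHz (50%) each
print(allocate(HOST_MHZ, {"prod": {"demand": 4000, "shares": 1000, "reservation": 1000},
                          "test": {"demand": 4000, "shares": 1000}}))
# 2000 vs 1000 shares: about 2667 / 1333 MHz (66% / 33%)
print(allocate(HOST_MHZ, {"prod": {"demand": 4000, "shares": 2000, "reservation": 1000},
                          "test": {"demand": 4000, "shares": 1000}}))
# Production only wants 250 MHz: it gets all of it, Test gets the remaining 3750
print(allocate(HOST_MHZ, {"prod": {"demand": 250, "shares": 2000, "reservation": 1000},
                          "test": {"demand": 4000, "shares": 1000}}))
```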

Page 26

Expandable Reservation 1

• Root (RP): Total CPU 10200 MHz
  – Software (RP): Reservation 3000 MHz, Expandable: Yes
    – Production (RP): Reservation 1200 MHz, Expandable: Yes
    – Test (RP): Reservation 1000 MHz, Expandable: No
      – VM1: Res 400 MHz
      – VM2: Res 300 MHz
      – VM7: Res 500 MHz

Why can't VM7 start?

1200 MHz required. 1000 MHz available.

Page 27

Expandable Reservation 2

• Root (RP): Total CPU 10200 MHz
  – Software (RP): Reservation 3000 MHz, Expandable: Yes
    (3200 MHz requested, 3000 MHz reservation; 200 MHz used by Test (RP))
    – Production (RP): Reservation 1200 MHz, Expandable: Yes
      (2000 MHz requested, 1200 MHz reservation, 2000 MHz of parent used)
      – VM3: Res 500 MHz
      – VM4: Res 500 MHz
      – VM5: Res 500 MHz
      – VM6: Res 500 MHz
    – Test (RP): Reservation 1000 MHz, Expandable: Yes
      (1200 MHz requested, 1000 MHz available in parent. Where is the “extra” taken from?)
      – VM1: Res 400 MHz
      – VM2: Res 300 MHz
      – VM7: Res 500 MHz
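The two expandable-reservation slides can be replayed with a small admission-control model: a pool first uses its own unreserved capacity, and if it is expandable it borrows the shortfall from its parent, recursively up to the root. This is a simplified sketch, not VMware's implementation; the `Pool` class, its `borrowed` field and the `build` helper are hypothetical, and Production's VMs are assumed to power on before Test's.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Pool:
    """Simplified resource pool holding CPU reservations in MHz."""
    name: str
    reservation: int
    expandable: bool = False
    parent: Pool | None = None
    used: int = 0       # reservation already handed to child pools and VMs
    borrowed: int = 0   # extra reservation obtained from the parent

    def __post_init__(self) -> None:
        # A child pool's own reservation is carved out of its parent
        if self.parent is not None and not self.parent.reserve(self.reservation):
            raise ValueError(f"{self.name}: parent cannot cover the reservation")

    def reserve(self, mhz: int) -> bool:
        """Try to reserve `mhz`; expandable pools borrow any shortfall upward."""
        free = self.reservation + self.borrowed - self.used
        if mhz <= free:
            self.used += mhz
            return True
        shortfall = mhz - free
        if self.expandable and self.parent is not None and self.parent.reserve(shortfall):
            self.borrowed += shortfall
            self.used += mhz
            return True
        return False    # admission control refuses; nothing was changed

def build(test_expandable: bool) -> tuple[Pool, Pool, Pool, Pool]:
    root = Pool("Root", 10200)
    software = Pool("Software", 3000, expandable=True, parent=root)
    production = Pool("Production", 1200, expandable=True, parent=software)
    test = Pool("Test", 1000, expandable=test_expandable, parent=software)
    return root, software, production, test

# Expandable Reservation 1: Test is not expandable, so VM7 (500 MHz) is refused
root, software, production, test = build(test_expandable=False)
print([test.reserve(m) for m in (400, 300, 500)])        # VM1, VM2, VM7

# Expandable Reservation 2: Production powers on VM3-VM6, then Test powers on VM1, VM2, VM7
root, software, production, test = build(test_expandable=True)
print([production.reserve(500) for _ in range(4)])
print([test.reserve(m) for m in (400, 300, 500)])
print(production.borrowed, test.borrowed, software.borrowed, software.used)
```

In the second run Production borrows 800 MHz from Software (2000 MHz of its parent used), which leaves Test only 1000 MHz in Software; Test's extra 200 MHz therefore comes from Root via the expandable Software pool, so Software ends up with 3200 MHz reserved against its 3000 MHz reservation.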

Page 28

What’s the worst that can happen?

• Memory
  – It fills up
  – Then bad things happen

• CPU
  – Bad things happen
  – Then it’s full/maxed
  – Queueing Theory

Page 29

Contention and Queuing

• Finite system resources
• Single workstation = no contention (usually)
• More than One User = Possible Contention
• Contention = Queuing
  – This is COMPLETELY NORMAL
  – It’s how operating systems work.

• Excessive Queuing = Poor Performance and Long Response Times

Page 30

Basic Ideas of Queuing

[Diagram: Queue → Server. Arriving customers, transactions (A) wait in the queue, are served, and leave (L).]

• Queuing Time (Q)
• Service Time (S)
• Response Time = Q + S
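Queuing time, service time and response time can be worked out by hand for a single server that serves requests in arrival order. A minimal sketch with a hypothetical workload: four transactions arrive one second apart and each needs two seconds of service, so the server is overloaded and every new arrival waits longer than the last.

```python
def fifo_queue(arrivals: list[float], service: list[float]) -> list[tuple[float, float, float]]:
    """Single-server first-in-first-out queue. For each customer return
    (queuing time Q, service time S, response time R = Q + S)."""
    results = []
    server_free_at = 0.0
    for a, s in zip(arrivals, service):
        start = max(a, server_free_at)   # wait if the server is still busy
        q = start - a
        server_free_at = start + s
        results.append((q, s, q + s))
    return results

# Hypothetical workload: arrivals at t = 0, 1, 2, 3 s, each needing 2 s of service
print(fifo_queue([0, 1, 2, 3], [2, 2, 2, 2]))
```

Service time stays at 2 s throughout, but the queuing time grows by 1 s per customer, so response times run 2, 3, 4, 5 s.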

Page 31

Utilization and Response Time

[Chart: Response Time vs Utilization (0 to 1.0), starting at the Service Time]

R = S / (1 - U)

(R = response time, S = service time, U = utilization)
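The formula is the classic single-server (M/M/1) result, and tabulating it shows the shape of the curve: response time is double the service time at 50% busy, five times at 80% and ten times at 90%. A minimal sketch; `response_time` is a hypothetical helper.

```python
def response_time(service_time: float, utilization: float) -> float:
    """Single-server queue (M/M/1): R = S / (1 - U). Undefined at U >= 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)

for u in (0.0, 0.5, 0.8, 0.9, 0.95):
    print(f"U={u:.2f}  R={response_time(1.0, u):.1f} x service time")
```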

Page 32

Benefits of Multiple Servers

[Chart: Response Time vs Utilization (0 to 1.0) relative to Service Time, for a Single CPU, Dual CPU and 16-way CPU]
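The multi-server curves can be modelled with the Erlang C formula for an M/M/c queue, which reduces to R = S / (1 - U) when there is a single server. A sketch under the usual M/M/c assumptions (random arrivals, exponential service, one shared queue); the function names are hypothetical.

```python
def erlang_c(servers: int, utilization: float) -> float:
    """Probability that an arriving request has to queue in an M/M/c system."""
    a = servers * utilization            # offered load in Erlangs
    b = 1.0                              # Erlang B, via the stable recurrence
    for k in range(1, servers + 1):
        b = a * b / (k + a * b)
    return servers * b / (servers - a * (1 - b))

def mmc_response_time(service_time: float, servers: int, utilization: float) -> float:
    """M/M/c response time: service time plus expected queuing time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    wait = erlang_c(servers, utilization) * service_time / (servers * (1 - utilization))
    return service_time + wait

for c in (1, 2, 16):
    print(c, [round(mmc_response_time(1.0, c, u), 2) for u in (0.5, 0.8, 0.9)])
```

At the same utilisation, more servers sharing one queue means a much smaller wait, which is why a 16-way host can run far busier than a single CPU before response times climb.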

Page 33

Why are we interested in this queue stuff again?

• VMs Queue for free CPUs
  – Ready Time
  – Co-Stop time
  – Higher utilisation = higher contention
  – More concerned about CPU busy than vCPU to logical CPU ratio
  – Because it’s maths, you can model it
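Ready time is the queue a VM sees. vCenter reports CPU Ready as a summation (milliseconds of ready time per sample), which is usually converted to a percentage of the sample interval. A minimal sketch; the interval values are vCenter's default chart intervals and should be treated as assumptions to check against your own statistics settings, and `cpu_ready_percent` is a hypothetical helper.

```python
# Assumed default vCenter performance chart sample intervals, in seconds
SAMPLE_INTERVAL_S = {
    "realtime": 20,
    "past_day": 300,
    "past_week": 1800,
    "past_month": 7200,
    "past_year": 86400,
}

def cpu_ready_percent(ready_ms: float, chart: str = "realtime", vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms of ready time in one sample) to % ready.
    Dividing by vcpus gives the average per vCPU for a VM-level summation."""
    interval_ms = SAMPLE_INTERVAL_S[chart] * 1000
    return ready_ms / interval_ms * 100 / vcpus

print(cpu_ready_percent(1000))            # one real-time sample, 1 vCPU
print(cpu_ready_percent(1000, vcpus=2))   # the same summation spread over 2 vCPUs
```

1000 ms of ready time in a 20-second real-time sample is 5% ready; for a 2-vCPU VM that averages 2.5% per vCPU.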

Page 34

Roundup

• Oversubscription does not equal unacceptable performance

• Virtualisation is expecting you to oversubscribe
  – It’s the reason it exists

• Take the fear out of oversubscription through proper planning
  – Plan for performance, not ratios

Page 35

Thank You

[email protected]