Debbie Sheetz, BMC Software

Capacity Analysis Techniques Applied to VMware VMs (aka When is a Server not really a Server?)

La Jolla, CA November 5th 2013




© Copyright 1/22/2016 BMC Software, Inc 2

Presentation Overview

How to Approach Performance/Capacity Evaluation for a Virtualized Application

- How is this like evaluating performance of a non-virtualized application?
- What kinds of measurements are necessary?

Methodology
- Understanding Virtualization Layers
  - Which layers matter
  - How layers relate
  - Where layer measurements come from
- Identifying Metric Clusters
  - CPU: capacity utilization and performance stress
  - Memory: capacity utilization, stress, shortage
- Application performance is the sum of all its parts

Case Studies
1. Right-sizing VMware Linux guest memory
2. VMware Linux guest memory health assessment

Conclusions


How to Approach Performance/Capacity Evaluation for a Virtualized Application

Computer performance analysis and prediction depends on having cause and effect relationships

- High CPU queue = poor response time
- Memory shortage = degraded response time

Need to identify groups of related metrics, i.e. “metric clusters”
- CPU: CPU capacity utilization and queue length
- Memory: capacity utilization, pressure, shortage

So far, all of this applies to physical or virtual servers …

Virtualization introduces layers
- Relationship of virtualization “layers” to the application
- Different layers have different measurements available
- Paper “Modeling/Sizing Techniques for Different Virtualization Strategies” from CMG 2008 outlines this approach


Performance Evaluation for a Virtualized Application: Layers

What’s between the application and the hardware resources it uses?

[Figure: physical server compared with virtualized applications]


Performance Evaluation for a Virtualized Application: ESX Layers

What are the most important ESX server components for our analysis?

- VM – the virtual machine
  - Contains the operating system and the applications running on it
- Cluster – a set of physical hosts
  - A VM is assigned to a cluster
  - At any given moment the VM is running on one of the physical hosts owned by that cluster
  - If VMotion is enabled, the VM can be automatically moved from one host to another to achieve balanced hosts
- Host – a physical host
  - Owns hardware resources such as CPU, physical memory, disks, and network interfaces


Performance Evaluation for a Virtualized Application: ESX Layers

How are the “layers” for ESX Server related to each other?

1. the application
2. the operating system hosting the application
3. the virtual machine hosting the operating system
4. the physical host on which the virtual machine runs
5. the cluster which owns a number of physical hosts and runs a group of virtual machines


Performance Evaluation for a Virtualized Application: Measuring ESX Layers

Where are performance metrics for each ESX “layer” obtained?
- Layers 1 and 2 are reported from the guest operating system (the operating system hosting the application)
- Layers 3, 4, and 5 are reported from ESX (vCenter)
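As a rough illustration of the layer-to-source mapping, the sketch below records which data source reports each of the five layers. The names `LAYERS` and `sources_needed` are hypothetical and not part of any BMC or VMware API; "guest OS" here means the operating system hosting the application inside the VM.

```python
# Hypothetical lookup table: layer number -> (layer name, data source).
# Layer numbering follows the slides; the source labels are illustrative.
LAYERS = {
    1: ("application", "guest OS"),
    2: ("operating system", "guest OS"),
    3: ("virtual machine", "vCenter"),
    4: ("physical host", "vCenter"),
    5: ("cluster", "vCenter"),
}


def sources_needed(layer_numbers):
    """Return the set of data sources required to cover the given layers."""
    return {LAYERS[n][1] for n in layer_numbers}


# Analysing an application together with its VM needs both collectors.
print(sources_needed([1, 3]))
```

Any analysis that spans layer 2 and layer 3 therefore needs two collectors, which is why the case studies pair guest data with vCenter data.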


Performance Evaluation for a Virtualized Application: Metric Clusters/CPU

What affects the setting of a CPU capacity threshold?
- Why not set it at 100%?
  - For interactive workloads, CPU queueing causes poor performance
  - For non-interactive workloads, 100% can be perfect! (see paper “Analytic Modeling Techniques for Predicting Batch Window Elapsed Time” from CMG 2009)
  - Interactive workloads can be spiky – need headroom
  - Workload forecasting can be inexact – need headroom, too
  - Failover planning
- After taking into account non-performance constraints, then need to observe the CPU queue length
  - May need to further reduce the CPU capacity threshold

So the metric cluster is CPU CAPACITY UTILIZATION and CPU QUEUE LENGTH


Performance Evaluation for a Virtualized Application: Metric Clusters/Memory

What affects the setting of a memory capacity threshold?
- The philosophy is similar to CPU, but the metrics are not as simple
  - CPU usage is in direct proportion to the workload; for memory that’s not always true
  - Need a combination of capacity usage and performance “warning” metrics
  - Memory metrics differ by operating system

So the metric cluster is MEMORY CAPACITY UTILIZATION, MEMORY PRESSURE, and MEMORY SHORTAGE


Performance Evaluation for a Virtualized Application: Metric Clusters/CPU Metrics

CPU Capacity Utilization metrics
- For ESX, CPU utilization is MHz Used divided by MHz Available
  - Cluster: sum of all hosts’ MHz
  - Host: number of physical processors * MHz per processor
  - VM: number of virtual processors (vCPUs) configured * MHz per physical processor
- For Linux and Windows, CPU Utilization is CPU Seconds Used (User CPU + System CPU) divided by CPU Seconds Available (number of processors seen by the OS * seconds)

CPU Queue Length metrics
- For ESX, CPU Queue Length is CPU Ready time divided by interval seconds for each VM
  - Cluster/Host: sum of all VMs’ CPU queue length (see paper “Virtualization Performance and Capacity Data Classification Schema” from CMG 2010)
- For Linux, Run Queue Depth is available (sampled metric)
- For Windows, Processor Queue Length is available (sampled metric)
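The CPU formulas can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample numbers are assumptions, and the inputs are taken to be already converted to MHz and seconds (collectors may report other units).

```python
def esx_cpu_utilization(mhz_used, processors, mhz_per_processor):
    """ESX view: MHz Used divided by MHz Available."""
    return mhz_used / (processors * mhz_per_processor)


def os_cpu_utilization(user_cpu_s, system_cpu_s, processors, interval_s):
    """Guest OS view: CPU seconds used divided by CPU seconds available."""
    return (user_cpu_s + system_cpu_s) / (processors * interval_s)


def esx_cpu_queue_length(cpu_ready_s, interval_s):
    """ESX view: CPU Ready time divided by the length of the interval."""
    return cpu_ready_s / interval_s


# Hypothetical VM: 4 processors at 2600 MHz, 5200 MHz used,
# 30 s of CPU Ready accumulated over a 300 s interval.
print(esx_cpu_utilization(5200, 4, 2600))
print(esx_cpu_queue_length(30, 300))
```

Both results are ratios, so they can be compared across VMs, hosts, and clusters of different sizes.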


Performance Evaluation for a Virtualized Application: Metric Clusters/Memory Metrics

Memory Capacity Utilization metrics
- For ESX, Memory utilization is either Consumed Memory divided by Configured Memory or Active Memory divided by Configured Memory
  - Cluster: sum of all hosts’ configured memory
  - Host/VM: physical memory configured
- For Linux and Windows, Memory utilization is Memory Used divided by Memory Available
  - Also require breakdown of physical memory by usage type: Free, Files Cache, Process, and System memory


Performance Evaluation for a Virtualized Application: Metric Clusters/Memory Metrics

Memory Pressure metrics
- For ESX VMs, Hosts, and Clusters
  - Balloon Memory is available
  - Ratio of Active Memory to Consumed Memory can be calculated
- For Linux, Page Scans is available
- For Windows, no equivalent metric

Memory Shortage metrics
- For ESX VMs, Hosts, and Clusters
  - Swapping (Paging) is available
  - Memory Swapped is available
- For Linux, Paging (to disk) is available
- For Windows, Paging (to disk) is available, but includes File Cache support
- For Linux and Windows, Virtual Memory Utilization approaching 100%
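To make the ESX memory metric cluster concrete, the following sketch derives capacity utilization, pressure, and shortage indicators from one sample. The `EsxMemorySample` class, its field names, and the sample values are hypothetical; a real collector would supply its own names and units.

```python
from dataclasses import dataclass


@dataclass
class EsxMemorySample:
    """One ESX memory sample for a VM, host, or cluster (all values in MB)."""
    configured_mb: float
    consumed_mb: float
    active_mb: float
    balloon_mb: float
    swapped_mb: float


def memory_cluster(sample):
    """Derive capacity, pressure, and shortage indicators from one sample."""
    ratio = sample.active_mb / sample.consumed_mb if sample.consumed_mb else 0.0
    return {
        # Capacity utilization, conservative (Consumed) and aggressive (Active)
        "utilization_consumed": sample.consumed_mb / sample.configured_mb,
        "utilization_active": sample.active_mb / sample.configured_mb,
        # Pressure: Active-to-Consumed ratio and any ballooning
        "active_to_consumed": ratio,
        "ballooning": sample.balloon_mb > 0,
        # Shortage: any swapped memory
        "swapped": sample.swapped_mb > 0,
    }


print(memory_cluster(EsxMemorySample(16384, 15000, 1000, 0, 0)))
```

Keeping the three kinds of indicator in one record reflects the idea that a capacity number alone is not enough for memory.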


Performance Evaluation for a Virtualized Application: Application Performance is the Sum of Its Parts

Application Resource Demand is a function of
- Workload Volume
  - How many transactions does the application need to support
    – Time of day
    – Day of the week
    – Time of the year, etc.
- Workload Resource Profile
  - CPU, Memory, I/O, and Network required per transaction

Application Performance = Resource Demand + Queueing
- Demand = Workload Volume * Workload Resource Profile
  - This is called service time in an analytic model
- Queueing occurs when demand can’t be met immediately by available hardware resources
  - This is called wait time or queueing delay in an analytic model

RESPONSE TIME = SERVICE TIME + WAIT TIME
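The relationship can be illustrated numerically. The sketch below assumes a single-server M/M/1 queue to estimate wait time; that queueing model is an assumption made here for illustration and is not stated in the presentation, and the function name and sample values are hypothetical.

```python
def response_time(arrival_rate_per_s, service_time_s):
    """Response time = service time + wait time.

    Wait time uses the M/M/1 formula S * U / (1 - U), an illustrative
    assumption; U is the utilization (arrival rate * service time).
    """
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        raise ValueError("demand exceeds capacity; the queue grows without bound")
    wait_time_s = service_time_s * utilization / (1.0 - utilization)
    return service_time_s + wait_time_s


# 5 transactions/s at 0.1 s of service each gives 50% utilization,
# so wait time equals service time and response time is about 0.2 s.
print(response_time(5, 0.1))
```

Under this assumption wait time grows sharply as utilization approaches 100%, which is why interactive workloads need a capacity threshold below 100%.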


Capacity Evaluation Techniques for VMware VMs: Case Studies Overview

Case Studies
- Demonstrate selected aspects of the capacity analysis methodology
- Show VMware ESX-hosted Linux guests
  - Methodology applies to any virtualized platform
- Two Case Studies
  1. Right-sizing VMware Linux guest memory
  2. VMware Linux guest memory health assessment


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

VMware provides two measurements of VM memory usage: Consumed and Active Memory
- Which one should be used for capacity planning?
  - Consumed Memory is often almost as large as Configured Memory
  - Active is usually much smaller than Consumed, often near a factor of 10
- So Consumed is quite conservative and Active Memory much less so

[Chart: ESX Cluster, Layer 5]
- Using Active, you can run about 750 and 250 more VMs on each cluster
- Using Consumed, you can run about 100 and 15 more VMs on each cluster


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

For capacity planning, some recommend using Active + a buffer (such as 70% above Active)
- Much less conservative than Consumed, so more VMs could be run on the current hardware
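A small sketch of the “Active + buffer” arithmetic follows. The function names, the cluster figures in the example, and the use of an 80% capacity threshold as a default are assumptions for illustration; only the 70% buffer comes from the slide.

```python
import math


def active_plus_buffer_gb(active_gb, buffer_fraction=0.70):
    """Sizing target: Active Memory plus a buffer given as a fraction of Active."""
    return active_gb * (1.0 + buffer_fraction)


def additional_vms(capacity_gb, used_gb, per_vm_gb, threshold=0.80):
    """Whole VMs of size per_vm_gb that fit before use reaches threshold * capacity."""
    headroom_gb = capacity_gb * threshold - used_gb
    return max(0, math.floor(headroom_gb / per_vm_gb))


# A VM with 1 GB Active would be planned at 1.7 GB under a 70% buffer.
per_vm_gb = active_plus_buffer_gb(1.0)
# Hypothetical cluster: 1000 GB of memory with 500 GB already planned for.
print(per_vm_gb, additional_vms(1000, 500, per_vm_gb))
```

The same `additional_vms` call with a Consumed-based `per_vm_gb` gives a much smaller count, which is the contrast between the conservative and aggressive approaches.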


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

SOLUTION: Choose conservative or aggressive approach depending on corporate philosophy
- Consumed Memory: Configure each VM with the “Consumed” amount of memory; allow for memory over-commitment on the host/cluster
- Active Memory +: Configure each VM with less memory than the current Consumed (but more than Active) and carefully monitor for memory stress; if there is stress, increase the Configured Memory

[Chart: ESX VM, Layer 3]
- Active Memory is rarely over 1 GB, and the original VM Configured Memory is 16 GB
- It’s decided to try 4 GB as the new Configured Memory


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

SOLUTION: Need to monitor from both ESX and VM perspectives
- ESX perspective: Memory utilization reduces overall; no paging, no ballooning, no swapping

[Charts: ESX VM, Layer 3; each chart marks the point “Restored to 16 GB”]
- Consumed Memory % is 100% of Configured, then reduces to < 50%
- Configured Memory is reduced to 4 GB, then is restored to 16 GB


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

SOLUTION: Need to monitor from both ESX and VM perspectives
- Guest perspective: Crisis! Virtual memory runs out, memory utilization is 100%, paging occurs, no process memory left, applications stop

[Charts: Linux, Layer 2; each chart marks the point “Restored to 16 GB”]
- Paging rate spikes to 0.3 MB/sec until Configured Memory is restored, then is 0
- Physical Memory utilization is 80-100% until the Configured Memory is restored


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

VM and its applications are suffering badly

[Charts: Linux, Layer 2; each chart marks the point “Restored to 16 GB”]
- Virtual (swap) Memory utilization rises to 100%; when the system is rebooted to restore the Configured Memory, utilization is 0% again
- Processes consume most of memory, but it’s not enough (see the paging, swapping problems); when memory is restored the “Good” memory profile returns: plenty of free memory, more file system cache, and more process memory


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

EVEN BETTER SOLUTION: Be very careful when “right-sizing”
- Confirm application memory requirements before downsizing VM
- Consider using VMware over-commitment instead of manually reconfiguring individual VMs
  - Need to monitor ESX-measured paging, ballooning, swapping

[Charts: ESX Host, Layer 4; the point of change is marked “Reconfiguration”]
- All cluster hosts are under-committed
- Density of 1 indicates physical=virtual; <1 is under-committed; >1 is over-committed
- Specific host is well under the memory threshold of 80%
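The density figure can be read as the sum of VM Configured Memory divided by host physical memory. That reading, the function names, and the sample host are assumptions for illustration; the slide gives only the interpretation of the values 1, <1, and >1.

```python
def memory_density(vm_configured_gb, host_physical_gb):
    """Ratio of virtual (configured) memory to physical memory on a host."""
    return sum(vm_configured_gb) / host_physical_gb


def commitment(density):
    """Interpret density: 1 is physical=virtual, <1 is under, >1 is over."""
    if density > 1.0:
        return "over-committed"
    if density < 1.0:
        return "under-committed"
    return "physical=virtual"


# Hypothetical host: 64 GB physical memory running VMs of 16, 8, and 8 GB.
density = memory_density([16, 8, 8], 64)
print(density, commitment(density))
```

A host can be over-committed by this measure and still healthy, which is why ESX-measured paging, ballooning, and swapping still need to be monitored.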


Case Study: Right-sizing VMware Virtual Machine (VM) Memory

EVEN BETTER SOLUTION: Be very careful when “right-sizing”
- If using manual reconfiguration, must screen for guest-measured
  - Paging (and/or scanning) increase
  - Physical memory utilization increase and/or changes in profile
  - Virtual (swap) memory utilization increase
- Recommend screening all important guests for Memory or CPU stress

Virtualized guest measurements can’t always be taken literally, but are absolutely necessary for capacity analysis!

More detail in the paper about this
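One way to screen a guest after a reconfiguration is to compare guest-measured metrics before and after the change. The sketch below is a hypothetical helper: the metric names, the allowed increases, and the "before" values are assumptions, while the "after" values echo the first case study (paging at 0.3 MB/sec, physical and swap utilization at 100%).

```python
def screen_guest(before, after, allowed_increase):
    """Return the guest metrics that rose by more than their allowed increase."""
    flagged = []
    for name, allowed in allowed_increase.items():
        if after[name] - before[name] > allowed:
            flagged.append(name)
    return sorted(flagged)


before = {"paging_mb_per_s": 0.0, "physical_mem_util": 0.50, "swap_util": 0.05}
after = {"paging_mb_per_s": 0.3, "physical_mem_util": 1.00, "swap_util": 1.00}
allowed = {"paging_mb_per_s": 0.1, "physical_mem_util": 0.20, "swap_util": 0.20}

# All three metrics are flagged for this guest.
print(screen_guest(before, after, allowed))
```

A guest flagged on several metrics at once, as here, is a candidate for having its Configured Memory increased again.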


Case Study: VMware Virtual Machine (VM) Memory Health

Changes have been seen in vCenter VM Memory metrics. Are these changes impacting application performance? Is there an ESX capacity shortfall?
- Many memory metrics available from ESX for a VM

[Chart: ESX VM, Layer 3]
- VM Used (same as Active Memory and Memory Usage from vCenter) has a clear daily pattern
- VM Configured Memory is steady at 7.8 GB
- There are large shifts between Granted/Shared/Zero and Balloon Memory


Case Study: VMware Virtual Machine (VM) Memory Health

Additional drill down on the ESX VM memory characteristics
- Specific memory pressure metrics

[Charts: ESX VM, Layer 3]
- The ratio of Active to Consumed is changing – higher ratio indicates memory pressure
- Ballooning also indicates memory pressure
- Granted and Balloon show their relationship: when there is memory pressure, Granted decreases


Case Study: VMware Virtual Machine (VM) Memory Health

Is this affecting the application’s performance?
- CPU and Memory patterns for the application don’t change despite the changes seen at the VM level

[Charts: Linux, Layer 1]
- CPU usage by process shows a very consistent daily pattern of workload volume and workload profile
- Memory usage of active processes shows a consistent daily pattern, correlated with CPU usage


Case Study: VMware Virtual Machine (VM) Memory Health

Is this affecting the application’s performance?
- What about memory pressure or shortage metrics?
  - Paging and scanning are zero
  - Virtual memory utilization is very low

[Chart: Linux, Layer 2]
- Memory usage breakdown shows Process memory increases as a percentage and Files Cache and Free decrease when ballooning (memory pressure) occurs
- In the first case study, that was correlated with a memory shortage, but not here


Case Study: VMware Virtual Machine (VM) Memory Health

Application is OK, but why are these changes occurring in ESX?
- What about memory capacity utilization?

[Chart: ESX Host, Layer 4]
- It’s consistently around 93%
- Definitely over the capacity threshold of 80%
- Need to check for memory pressure and shortage metrics next


Case Study: VMware Virtual Machine (VM) Memory Health

Why are these changes occurring in ESX?
- What do the memory pressure and shortage metrics show?

[Charts: ESX Host, Layer 4]
- Ballooning indicates pressure and it’s occurring consistently
- Swapped memory indicates a shortage and it’s occurring consistently
- Paging (Swapping) indicates a shortage and it’s occurring pretty consistently


Case Study: VMware Virtual Machine (VM) Memory Health

So the ESX host is definitely experiencing a memory shortage
- The cluster containing this host is showing around 70% memory capacity utilization
  - Even with DRS enabled, this host is quite “worse than average” for both memory and CPU capacity utilization

Possible solutions
- Investigate why cluster isn’t better balanced
  - Didn’t have the data for the other hosts to do this analysis
- Investigate moving one or more VMs to less utilized clusters
- Upgrade memory on the cluster hosts
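The host-versus-cluster comparison in this case study can be expressed as a small check. The function name and the result keys are hypothetical; the values 93%, 70%, and 80% are the host utilization, cluster utilization, and capacity threshold quoted in the slides.

```python
def assess_host(host_util, cluster_util, threshold=0.80):
    """Compare one host's memory utilization with its cluster and the threshold."""
    return {
        "over_threshold": host_util > threshold,
        # Positive gap: the host is busier than the cluster as a whole
        "gap_vs_cluster": host_util - cluster_util,
    }


# Host at about 93%, cluster at about 70%, capacity threshold of 80%.
print(assess_host(0.93, 0.70))
```

A host that is over the threshold while its cluster is under it points to a balance problem rather than a cluster-wide shortage.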


Case Study: VMware Virtual Machine (VM) Memory Health

Memory pressure metric: Ratio of Active to Consumed Memory compared across layers
- Individual VM experience is much different than the “average” experience

[Chart: ESX Cluster/Host/VM, Layers 5/4/3]
- Cluster and Host ratio is rising, which shows memory pressure is increasing overall
- Our VM is much worse than average

[Chart: ESX VM, Layer 3]
- Active VMs are the only VMs we need to know about
- Ratio is approaching 60% for some


Conclusions

Virtualization is simple for the application, not so easy for the capacity planner/performance analyst
- Must identify the relevant “layers” between the application and the hardware resources it uses
- Need appropriate measurements from every layer
  - Often requires multiple data sources
  - Apparently similar metrics can mean entirely different things
- Need to perform analysis on several layers at the same time to get a complete picture


Conclusions

Need to use the same techniques as for a physical server
- Set hardware capacity resource utilization thresholds according to both performance and other constraints
- Understand that “high” for one resource can produce “low” for another resource
- Identify workload demand patterns within servers/guests
- Metric clusters (capacity and performance) are needed for each resource


Conclusions

Use higher layer metrics carefully
- Averages (or other summarizations) can obscure exactly what you need to see
  - Focus on application and VM layers for actual performance
  - Focus on active VMs
- Threshold hardware resource layer only
  - Threshold interpretation for VMs requires multi-layer analysis
  - Threshold interpretation for cluster requires host analysis, too
- Use compatible metric comparison units such as percentage of total capacity or queue length per processor rather than MHz, GB, total queue length, etc.
- Higher layers provide essential overall capacity planning projections and trends
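The point about compatible comparison units can be shown with two small helpers. The function names and the host and VM figures are hypothetical; the sketch only demonstrates why normalized values compare more fairly than raw MHz or total queue length.

```python
def percent_of_capacity(used, capacity):
    """Express usage as a percentage of total capacity."""
    return 100.0 * used / capacity


def queue_per_processor(total_queue_length, processors):
    """Normalize a total queue length by the number of processors."""
    return total_queue_length / processors


# Hypothetical host (16 processors) and VM (2 processors).
print(percent_of_capacity(20000, 40000), percent_of_capacity(2600, 5200))
print(queue_per_processor(8, 16), queue_per_processor(2, 2))
```

In this example the host and the VM are equally utilized as a percentage, and the VM has the longer queue per processor even though its total queue length is smaller.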


Q&A


Learn more at www.bmc.com