TRANSCRIPT
Debbie Sheetz, BMC Software
Capacity Analysis Techniques Applied to VMware VMs (aka When is a Server not really a Server?)
La Jolla, CA November 5th 2013
© Copyright 1/22/2016 BMC Software, Inc 2
Presentation Overview
How to Approach Performance/Capacity Evaluation for a Virtualized Application
- How is this like evaluating performance of a non-virtualized application?
- What kinds of measurements are necessary?
Methodology
- Understanding Virtualization Layers
    Which layers matter
    How layers relate
    Where layer measurements come from
- Identifying Metric Clusters
    CPU: capacity utilization and performance stress
    Memory: capacity utilization, stress, shortage
- Application performance is the sum of all its parts
Case Studies
1. Right-sizing VMware Linux guest memory
2. VMware Linux guest memory health assessment
Conclusions
How to Approach Performance/Capacity Evaluation for a Virtualized Application
Computer performance analysis and prediction depend on having cause-and-effect relationships
- High CPU queue = poor response time
- Memory shortage = degraded response time
Need to identify groups of related metrics, i.e. “metric clusters”
- CPU: CPU capacity utilization and queue length
- Memory: capacity utilization, pressure, shortage
So far, all of this applies to physical or virtual servers …
Virtualization introduces layers
- Relationship of virtualization “layers” to the application
- Different layers have different measurements available
- Paper “Modeling/Sizing Techniques for Different Virtualization Strategies” from CMG 2008 outlines this approach
Performance Evaluation for a Virtualized Application: Layers
What’s between the application and the hardware resources it uses?
[Figure panels: PHYSICAL SERVER; VIRTUALIZED APPLICATIONS]
Performance Evaluation for a Virtualized Application: ESX Layers
What are the most important ESX server components for our analysis?
- VM – the virtual machine
    Contains the operating system and the applications running on it
- Cluster – a set of physical hosts
    A VM is assigned to a cluster
    At any given moment the VM is running on one of the physical hosts owned by that cluster
    If VMotion is enabled, the VM can be automatically moved from one host to another to achieve balanced hosts
- Host – a physical host
    Owns hardware resources such as CPU, physical memory, disks, and network interfaces
Performance Evaluation for a Virtualized Application: ESX Layers
How are the “layers” for ESX Server related to each other?
1. the application
2. the operating system hosting the application
3. the virtual machine hosting the operating system
4. the physical host on which the virtual machine runs
5. the cluster which owns a number of physical hosts and runs a group of virtual machines
Performance Evaluation for a Virtualized Application: Measuring ESX Layers
Where are performance metrics for each ESX “layer” obtained?
- Layers 1 and 2 are reported from the guest operating system (the operating system hosting the application)
- Layers 3, 4, and 5 are reported from ESX (vCenter)
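The layer-to-source relationship on this slide can be written down as a small lookup table. This is only a sketch: the layer numbers and names come from the slides, while the dictionary layout and the function name `source_for` are hypothetical.

```python
# Layer numbers and names follow the slides; the table layout and the
# function name are illustrative assumptions.
LAYERS = {
    1: ("application", "operating system running in the VM"),
    2: ("operating system", "operating system running in the VM"),
    3: ("virtual machine", "ESX (vCenter)"),
    4: ("physical host", "ESX (vCenter)"),
    5: ("cluster", "ESX (vCenter)"),
}


def source_for(layer: int) -> str:
    """Describe where metrics for the given layer number are collected."""
    name, source = LAYERS[layer]
    return f"Layer {layer} ({name}): metrics from {source}"


for number in sorted(LAYERS):
    print(source_for(number))
```

A table like this is mostly useful as a reminder that a complete analysis needs at least two collectors: one inside the guest and one at vCenter.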
Performance Evaluation for a Virtualized Application: Metric Clusters/CPU
What affects the setting of a CPU capacity threshold?
- Why not set it at 100%?
    For interactive workloads, CPU queueing causes poor performance
    For non-interactive workloads, 100% can be perfect! (see paper “Analytic Modeling Techniques for Predicting Batch Window Elapsed Time” from CMG 2009)
    Interactive workloads can be spiky – need headroom
    Workload forecasting can be inexact – need headroom, too
    Failover planning
- After taking into account non-performance constraints, then need to observe the CPU queue length
    May need to further reduce the CPU capacity threshold
So the metric cluster is CPU CAPACITY UTILIZATION and CPU QUEUE LENGTH
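One way to read the CPU metric cluster is as a two-metric check: utilization against the capacity threshold, and queue length as the performance signal. The sketch below assumes a 70% utilization threshold and a queue limit of 1.0 per processor; both defaults, and the function name `cpu_status`, are illustrative assumptions rather than values from the talk.

```python
def cpu_status(util_pct: float, queue_per_cpu: float,
               util_threshold: float = 70.0,
               queue_threshold: float = 1.0) -> str:
    """Classify CPU health from utilization and queue length together.

    The default thresholds are illustrative assumptions only.
    """
    over_capacity = util_pct > util_threshold
    queueing = queue_per_cpu > queue_threshold
    if over_capacity and queueing:
        return "over threshold and queueing"
    if queueing:
        # Queueing before the threshold is reached suggests the
        # threshold itself is set too high for this workload.
        return "queueing below threshold: consider lowering the threshold"
    if over_capacity:
        return "over threshold, no queueing yet"
    return "ok"


print(cpu_status(65.0, 1.8))
print(cpu_status(50.0, 0.2))
```

The second branch is the case the slide calls out: queueing is observed even though utilization is under the threshold, so the threshold may need to be reduced further.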
Performance Evaluation for a Virtualized Application: Metric Clusters/Memory
What affects the setting of a memory capacity threshold?
- The philosophy is similar to CPU, but the metrics are not as simple
    CPU usage is in direct proportion to the workload; for memory that’s not always true
    Need a combination of capacity usage and performance “warning” metrics
    Memory metrics differ by operating system
So the metric cluster is MEMORY CAPACITY UTILIZATION, MEMORY PRESSURE, and MEMORY SHORTAGE
Performance Evaluation for a Virtualized Application: Metric Clusters/CPU Metrics
CPU Capacity Utilization metrics
- For ESX, CPU utilization is MHz Used divided by MHz Available
    Cluster: sum of all hosts’ MHz
    Host: number of physical processors * MHz per processor
    VM: number of virtual processors (vCPUs) configured * MHz per processor
- For Linux and Windows, CPU Utilization is CPU Seconds Used (User CPU + System CPU) divided by CPU Seconds Available (number of processors seen by the OS * seconds)
CPU Queue Length metrics
- For ESX, CPU Queue Length is CPU Ready time divided by the seconds in the interval, for each VM
    Cluster/Host: sum of all VMs’ CPU queue length (see paper “Virtualization Performance and Capacity Data Classification Schema”, from CMG 2010)
- For Linux, Run Queue Depth is available (sampled metric)
- For Windows, Processor Queue Length is available (sampled metric)
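The CPU formulas on this slide are simple ratios, sketched below. The function names and sample numbers are hypothetical. The sketch also assumes CPU Ready has already been converted to seconds; if your collector reports it in another unit (milliseconds, for example), convert it first.

```python
def esx_cpu_utilization(mhz_used: float, processors: int,
                        mhz_per_processor: float) -> float:
    """ESX view: MHz Used / MHz Available, as a percentage."""
    return 100.0 * mhz_used / (processors * mhz_per_processor)


def os_cpu_utilization(user_s: float, system_s: float,
                       processors: int, interval_s: float) -> float:
    """Linux/Windows view: CPU seconds used / CPU seconds available."""
    return 100.0 * (user_s + system_s) / (processors * interval_s)


def esx_cpu_queue_length(ready_s: float, interval_s: float) -> float:
    """Average queue length for one VM: CPU Ready seconds / interval seconds."""
    return ready_s / interval_s


# Hypothetical 5-minute (300 s) sample for a 4-processor, 2500 MHz system.
print(esx_cpu_utilization(4000.0, 4, 2500.0))
print(os_cpu_utilization(90.0, 30.0, 2, 300.0))
print(esx_cpu_queue_length(150.0, 300.0))
```

For a host or cluster, the per-VM queue lengths would then be summed, as the slide describes.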
Performance Evaluation for a Virtualized Application: Metric Clusters/Memory Metrics
Memory Capacity Utilization metrics
- For ESX, Memory utilization is either Consumed Memory divided by Configured Memory or Active Memory divided by Configured Memory
    Cluster: sum of all hosts’ configured memory
    Host/VM: physical memory configured
- For Linux and Windows, Memory utilization is Memory Used divided by Memory Available
    Also require breakdown of physical memory by usage type: Free, Files Cache, Process, and System memory
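The two ESX utilization views can give very different answers for the same VM. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def esx_memory_utilization(consumed_gb: float, active_gb: float,
                           configured_gb: float) -> dict:
    """Return both ESX views of memory utilization as percentages."""
    return {
        "consumed_pct": 100.0 * consumed_gb / configured_gb,
        "active_pct": 100.0 * active_gb / configured_gb,
    }


# Hypothetical VM: 16 GB configured, 15 GB consumed, 1 GB active.
views = esx_memory_utilization(consumed_gb=15.0, active_gb=1.0,
                               configured_gb=16.0)
print(views)
```

Which of the two to plan with is exactly the question the first case study takes up.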
Performance Evaluation for a Virtualized Application: Metric Clusters/Memory Metrics
Memory Pressure metrics
- For ESX VMs, Hosts, and Clusters
    Balloon Memory is available
    Ratio of Active Memory to Consumed Memory can be calculated
- For Linux, Page Scans is available
- For Windows, no equivalent metric
Memory Shortage metrics
- For ESX VMs, Hosts, and Clusters
    Swapping (Paging) is available
    Memory Swapped is available
- For Linux, Paging (to disk) is available
- For Windows, Paging (to disk) is available, but includes File Cache support
- For Linux and Windows, Virtual Memory Utilization approaching 100%
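The distinction between pressure and shortage for the ESX metrics can be sketched as a small classifier. The record type, field names, and the rule that any nonzero value counts are all assumptions made for illustration; a real screen would likely use thresholds tuned to the environment.

```python
from dataclasses import dataclass


@dataclass
class EsxMemorySample:
    """Hypothetical record of ESX-reported memory metrics for one interval."""
    balloon_mb: float      # Balloon Memory
    swapped_mb: float      # Memory Swapped
    swap_rate_kbps: float  # Swapping (Paging) rate


def classify(sample: EsxMemorySample) -> str:
    """Shortage outranks pressure; any nonzero value counts (an assumption)."""
    if sample.swapped_mb > 0 or sample.swap_rate_kbps > 0:
        return "shortage"
    if sample.balloon_mb > 0:
        return "pressure"
    return "healthy"


print(classify(EsxMemorySample(balloon_mb=512.0, swapped_mb=0.0, swap_rate_kbps=0.0)))
print(classify(EsxMemorySample(balloon_mb=512.0, swapped_mb=64.0, swap_rate_kbps=20.0)))
```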
Performance Evaluation for a Virtualized Application: Application Performance is the Sum of Its Parts
Application Resource Demand is a function of
- Workload Volume
    How many transactions does the application need to support
    – Time of day
    – Day of the week
    – Time of the year, etc.
- Workload Resource Profile
    CPU, Memory, I/O, and Network required per transaction
Application Performance = Resource Demand + Queueing
- Demand = Workload Volume * Workload Resource Profile
    This is called service time in an analytic model
- Queueing occurs when demand can’t be met immediately by available hardware resources
    This is called wait time or queueing delay in an analytic model
RESPONSE TIME = SERVICE TIME + WAIT TIME
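The two relationships on this slide can be worked through with a few lines of arithmetic. The demand formula is taken directly from the slide. The wait-time formula is not: it is the textbook single-server (M/M/1) approximation, swapped in here only to show how wait time grows with utilization. Function names and sample values are hypothetical.

```python
def resource_demand(volume_txn: float, cpu_s_per_txn: float) -> float:
    """Demand = Workload Volume * Workload Resource Profile (CPU seconds)."""
    return volume_txn * cpu_s_per_txn


def response_time(service_s: float, utilization: float) -> float:
    """Response time = service time + wait time.

    Wait time uses the single-server M/M/1 approximation, which is an
    assumption of this sketch and not a formula from the talk.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    wait_s = service_s * utilization / (1.0 - utilization)
    return service_s + wait_s


# Hypothetical workload: 3600 transactions/hour at 0.05 CPU seconds each.
print(resource_demand(3600, 0.05))
for u in (0.25, 0.5, 0.75, 0.9):
    print(u, round(response_time(0.05, u), 3))
```

Under this approximation the service time stays fixed while the wait time climbs steeply as utilization approaches 100%, which is the reason interactive workloads need headroom.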
Capacity Evaluation Techniques for VMware VMs: Case Studies Overview
Case Studies
- Demonstrate selected aspects of the capacity analysis methodology
- Show VMware ESX-hosted Linux guests
    Methodology applies to any virtualized platform
- Two Case Studies
    1. Right-sizing VMware Linux guest memory
    2. VMware Linux guest memory health assessment
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
VMware provides two measurements of VM memory usage: Consumed and Active Memory
- Which one should be used for capacity planning?
    Consumed Memory is often almost as large as Configured Memory
    Active is usually much smaller than Consumed, often near a factor of 10
- So Consumed is quite conservative and Active Memory much less so
[Chart: ESX Cluster, Layer 5]
- Using Active, you can run about 750 and 250 more VMs on each cluster
- Using Consumed, you can run about 100 and 15 more VMs on each cluster
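The "how many more VMs" estimate can be sketched as a headroom calculation: usable capacity under a threshold, minus what is already in use, divided by the memory each additional VM is expected to need. The function name, the threshold, and every number below are hypothetical; they are not the clusters from the chart.

```python
def additional_vms(cluster_memory_gb: float, threshold_pct: float,
                   used_gb: float, per_vm_gb: float) -> int:
    """Estimate how many more average-sized VMs fit under the threshold."""
    usable_gb = cluster_memory_gb * threshold_pct / 100.0
    headroom_gb = max(usable_gb - used_gb, 0.0)
    return int(headroom_gb // per_vm_gb)


# Hypothetical 1000 GB cluster with an 80% memory threshold.
# "Consumed" view: 700 GB in use, 10 GB per additional VM.
by_consumed = additional_vms(1000.0, 80.0, used_gb=700.0, per_vm_gb=10.0)
# "Active" view: 70 GB in use, 1 GB per additional VM.
by_active = additional_vms(1000.0, 80.0, used_gb=70.0, per_vm_gb=1.0)
print(by_consumed, by_active)
```

Because Active shrinks both the memory counted as in use and the per-VM requirement, the estimate grows far faster than the ratio between the two metrics alone would suggest.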
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
For capacity planning, some recommend using Active + a buffer (such as 70% above Active)
- Much less conservative than Consumed, so more VMs could be run on the current hardware
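The "Active + a buffer" rule is a one-line calculation. The 70% buffer comes from the slide; basing it on the peak of the observed Active samples, and the function name, are assumptions of this sketch.

```python
def proposed_configured_gb(active_samples_gb, buffer_pct: float = 70.0) -> float:
    """Size a VM at peak observed Active Memory plus a percentage buffer.

    Using the peak (rather than an average or percentile) is an assumption.
    """
    peak_gb = max(active_samples_gb)
    return peak_gb * (1.0 + buffer_pct / 100.0)


# Hypothetical Active Memory samples (GB) for one VM.
print(proposed_configured_gb([0.6, 0.8, 1.0]))
```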
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
SOLUTION: Choose conservative or aggressive approach depending on corporate philosophy
- Consumed Memory: Configure each VM with the “Consumed” amount of memory; allow for memory over-commitment on the host/cluster
- Active Memory +: Configure each VM with less memory than the current Consumed (but more than Active) and carefully monitor for memory stress; if there is stress, increase the Configured Memory
[Chart: ESX VM, Layer 3]
- Active Memory is rarely over 1 GB, and the original VM Configured Memory is 16 GB
- It’s decided to try 4 GB as the new Configured Memory
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
SOLUTION: Need to monitor from both ESX and VM perspectives
- ESX perspective: Memory utilization reduces overall; no paging, no ballooning, no swapping
[Chart: ESX VM, Layer 3]
- Consumed Memory % is 100% of Configured, then reduces to < 50%
- Configured Memory reduced to 4 GB, then is restored to 16 GB
- Chart marker: Restored to 16 GB
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
SOLUTION: Need to monitor from both ESX and VM perspectives
- Guest perspective: Crisis! Virtual memory runs out, memory utilization is 100%, paging occurs, no process memory left, applications stop
[Chart: Linux, Layer 2]
- Paging rate spikes to 0.3 MB/sec until Configured Memory is restored, then is 0
- Physical Memory utilization is 80-100% until the Configured Memory is restored
- Chart marker: Restored to 16 GB
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
VM and its applications are suffering badly
[Chart: Linux, Layer 2]
- Virtual (swap) Memory utilization rises to 100%; when the system is rebooted to restore the Configured Memory, utilization is 0% again
- Processes consume most of memory, but it’s not enough (see the paging, swapping problems); when memory is restored the “Good” memory profile returns: plenty of free memory, more file system cache, and more process memory
- Chart marker: Restored to 16 GB
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
EVEN BETTER SOLUTION: Be very careful when “right-sizing”
- Confirm application memory requirements before downsizing VM
- Consider using VMware over-commitment instead of manually reconfiguring individual VMs
    Need to monitor ESX-measured paging, ballooning, swapping
[Chart: ESX Host, Layer 4]
- All cluster hosts are under-committed
- Density of 1 indicates physical=virtual; <1 is under; >1 is over-committed
- Chart marker: Reconfiguration
- Specific host is well under the memory threshold of 80%
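The density figure on the chart can be sketched as a ratio of virtual to physical memory. This sketch assumes density means total VM Configured Memory divided by host physical memory, which matches the chart's reading of 1 as physical=virtual; the function names and sample values are hypothetical.

```python
from typing import Sequence


def memory_density(vm_configured_gb: Sequence[float],
                   host_physical_gb: float) -> float:
    """Total VM configured memory / host physical memory (assumed definition)."""
    return sum(vm_configured_gb) / host_physical_gb


def commitment(density: float) -> str:
    """Interpret density the way the chart annotation does."""
    if density > 1.0:
        return "over-committed"
    if density < 1.0:
        return "under-committed"
    return "physical=virtual"


# Hypothetical host: 64 GB physical, four VMs configured.
density = memory_density([16.0, 16.0, 8.0, 8.0], host_physical_gb=64.0)
print(density, commitment(density))
```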
Case Study: Right-sizing VMware Virtual Machine (VM) Memory
EVEN BETTER SOLUTION: Be very careful when “right-sizing”
- If using manual reconfiguration, must screen for guest-measured
    Paging (and/or scanning) increase
    Physical memory utilization increase and/or changes in profile
    Virtual (swap) memory utilization increase
- Recommend screening all important guests for Memory or CPU stress
Virtualized guest measurements can’t always be taken literally, but are absolutely necessary for capacity analysis!
More detail in the paper about this
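The guest-side screen after a manual reconfiguration can be sketched as a before/after comparison of the three guest-measured metrics listed on this slide. The dictionary keys and the function name are hypothetical. The "after" values echo the first case study (0.3 MB/sec paging, 100% physical and swap utilization); the "before" values are made up.

```python
from typing import Dict, List


def screen_guest(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Flag guest-measured metrics that increased after reconfiguration."""
    checks = [
        ("paging_mb_s", "paging increase"),
        ("physical_util_pct", "physical memory utilization increase"),
        ("swap_util_pct", "virtual (swap) memory utilization increase"),
    ]
    return [label for key, label in checks if after[key] > before[key]]


before = {"paging_mb_s": 0.0, "physical_util_pct": 40.0, "swap_util_pct": 0.0}
after = {"paging_mb_s": 0.3, "physical_util_pct": 100.0, "swap_util_pct": 100.0}
for flag in screen_guest(before, after):
    print(flag)
```

A simple "any increase" rule is noisy in practice; a production screen would probably compare against a tolerance or a baseline window rather than a single sample.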
Case Study: VMware Virtual Machine (VM) Memory Health
Changes have been seen in vCenter VM Memory metrics. Are these changes impacting application performance? Is there an ESX capacity shortfall?
- Many memory metrics available from ESX for a VM
[Chart: ESX VM, Layer 3]
- VM Used (same as Active Memory and Memory Usage from vCenter) has a clear daily pattern
- VM Configured Memory is steady at 7.8 GB
- There are large shifts between Granted/Shared/Zero and Balloon Memory
Case Study: VMware Virtual Machine (VM) Memory Health
Additional drill down on the ESX VM memory characteristics
- Specific memory pressure metrics
[Chart: ESX VM, Layer 3]
- The ratio of Active to Consumed is changing – higher ratio indicates memory pressure
- Ballooning also indicates memory pressure
- Granted and Balloon show their relationship, which is that when there is memory pressure, Granted reduces
Case Study: VMware Virtual Machine (VM) Memory Health
Is this affecting the application’s performance?
- CPU and Memory patterns for the application don’t change despite the changes seen at the VM level
[Chart: Linux, Layer 1]
- CPU usage by process shows a very consistent daily pattern of workload volume and workload profile
- Memory usage of active processes shows a consistent daily pattern, correlated with CPU usage
Case Study: VMware Virtual Machine (VM) Memory Health
Is this affecting the application’s performance?
- What about memory pressure or shortage metrics?
    Paging and scanning are zero
    Virtual memory utilization is very low
[Chart: Linux, Layer 2]
- Memory usage breakdown shows Process memory increases as a percentage and Files Cache and Free decrease when ballooning (memory pressure) occurs
- In the first case study, that was correlated with a memory shortage, but not here
Case Study: VMware Virtual Machine (VM) Memory Health
Application is OK, but why are these changes occurring in ESX?
- What about memory capacity utilization?
[Chart: ESX Host, Layer 4]
- It’s consistently around 93%
- Definitely over the capacity threshold of 80%
- Need to check for memory pressure and shortage metrics next
Case Study: VMware Virtual Machine (VM) Memory Health
Why are these changes occurring in ESX?
- What do the memory pressure and shortage metrics show?
[Chart: ESX Host, Layer 4]
- Ballooning indicates pressure and it’s occurring consistently
- Swapped memory indicates a shortage and it’s occurring consistently
- Paging (Swapping) indicates a shortage and it’s occurring pretty consistently
Case Study: VMware Virtual Machine (VM) Memory Health
So the ESX host is definitely experiencing a memory shortage
- The cluster containing this host is showing around 70% memory capacity utilization
    Even with DRS enabled, this host is quite “worse than average” for both memory and CPU capacity utilization
Possible solutions
- Investigate why cluster isn’t better balanced
    Didn’t have the data for the other hosts to do this analysis
- Investigate moving one or more VMs to less utilized clusters
- Upgrade memory on the cluster hosts
Case Study: VMware Virtual Machine (VM) Memory Health
Memory pressure metric (Ratio of Active to Consumed Memory) compared across layers
- Individual VM experience is much different than the “average” experience
[Chart: ESX Cluster/Host/VM, Layers 5/4/3]
- Cluster and Host ratio is rising, which shows memory pressure is increasing overall
- Our VM is much worse than average
[Chart: ESX VM, Layer 3]
- Active VMs are the only VMs we need to know about
- Ratio is approaching 60% for some
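Comparing the Active-to-Consumed ratio across layers is the same ratio computed at each level. The sketch below uses made-up Active and Consumed values (in GB) for a cluster, a host, and one VM; none of them are the measurements behind the chart, and the function name is hypothetical.

```python
def active_to_consumed_pct(active_gb: float, consumed_gb: float) -> float:
    """Memory pressure indicator: Active Memory as a percentage of Consumed."""
    return 100.0 * active_gb / consumed_gb


# Hypothetical (active, consumed) pairs in GB for each layer.
samples = {
    "cluster": (300.0, 1000.0),
    "host": (35.0, 100.0),
    "vm": (3.0, 5.0),
}
ratios = {layer: active_to_consumed_pct(a, c) for layer, (a, c) in samples.items()}
worst = max(ratios, key=ratios.get)
print(ratios)
print("highest pressure:", worst)
```

The cluster and host figures are averages over many VMs, so a single VM can sit far above them, which is the point the slide makes about individual versus "average" experience.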
Conclusions
Virtualization is simple for the application, not so easy for the capacity planner/performance analyst
- Must identify the relevant “layers” between the application and the hardware resources it uses
- Need appropriate measurements from every layer
    Often requires multiple data sources
    Apparently similar metrics can mean entirely different things
- Need to perform analysis on several layers at the same time to get a complete picture
Conclusions
Need to use the same techniques as for a physical server
- Set hardware capacity resource utilization thresholds according to both performance and other constraints
- Understand that “high” for one resource can produce “low” for another resource
- Identify workload demand patterns within servers/guests
- Metric clusters (capacity and performance) are needed for each resource
Conclusions
Use higher layer metrics carefully
- Averages (or other summarizations) can obscure exactly what you need to see
    Focus on application and VM layers for actual performance
    Focus on active VMs
- Threshold hardware resource layer only
    Threshold interpretation for VMs requires multi-layer analysis
    Threshold interpretation for cluster requires host analysis, too
- Use compatible metric comparison units such as percentage of total capacity or queue length per processor rather than MHz, GB, total queue length, etc.
- Higher layers provide essential overall capacity planning projections and trends
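The advice about compatible comparison units can be illustrated by normalizing a host and a VM before comparing them. All names and numbers below are hypothetical.

```python
def pct_of_capacity(used: float, capacity: float) -> float:
    """Express usage as a percentage of total capacity."""
    return 100.0 * used / capacity


def queue_per_processor(total_queue: float, processors: int) -> float:
    """Express queue length per processor instead of as a raw total."""
    return total_queue / processors


# Hypothetical measurements for one host and one VM.
host = {"mhz_used": 24000.0, "mhz_capacity": 40000.0, "queue": 8.0, "processors": 16}
vm = {"mhz_used": 3000.0, "mhz_capacity": 5000.0, "queue": 2.0, "processors": 2}

for name, m in (("host", host), ("vm", vm)):
    print(name,
          pct_of_capacity(m["mhz_used"], m["mhz_capacity"]),
          queue_per_processor(m["queue"], m["processors"]))
```

In raw units the host looks busier (24000 MHz and a queue of 8 against 3000 MHz and a queue of 2). Normalized, both sit at the same percentage of capacity and the VM has the longer queue per processor.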
Q&A
Learn more at www.bmc.com