Computer Measurement Group, India 1 Presented at CMG India 2nd Annual Conference 2015. Copyright © VMware Inc, 2015
www.cmgindia.org
Chasing Performance
Understand, Troubleshoot and Optimize Workload performance on vSphere Platform
Adarsh Jagadeeshwaran, Ramprasad K S, Sai Inabattini
VMware India
What we refine
• Introduction
• Virtualization Basics
• vSphere Platform Basics
• Scoping
• Approach
• Data Collection and Analysis
• IO Profiling
• Optimizing
Introduction
Virtualization Basics
Virtualizing x86…
• Non-virtualized (“native”) system
– OS is designed to run on bare metal hardware
• Assumption of full ownership with full privileges
– CPU privilege levels are used to restrict access
– Operating system code runs with higher privilege
– Application code has less privilege
– Privilege escalations are either faulted or ignored
• Challenges (Virtualization)
– Hardware is shared among multiple VMs
– Cannot let the guest run with the highest privilege
– Virtual machines need to be isolated and contained
• Classical “ring compression” or “de-privileging”
– Run guest OS kernel in Ring 1
– Privileged instructions trap and are emulated by the VMM
• System calls, exceptions
• Privileged resources (IOAPIC)
– Not enough for x86, as not all privilege escalations can be trapped
Binary Translation
• Combination of Direct Execution and Binary Translation (BT)
– Virtual Machine Monitor (VMM) tracks the execution from the guest
– Just-in-time translation of privileged execution
• Translate the code that can cause privilege escalation
• Use a translation cache for reusable translations
– Direct execution of user-level code (Ring 3)
• Impacts
– Overhead of translation and cache
– Translated code is not as optimal as the original code
• VMM needs to run in the same address space as the guest
– VMM data region needs protection from guest access
– Use segmentation on 32-bit platforms to protect the VMM region
– Segmentation is not available on all 64-bit platforms
– BT is inefficient for running 64-bit guests on these platforms
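The translation cache idea above can be sketched as a toy model. This is only an illustration, not how any real VMM is implemented: the "instruction set" (strings, with a trailing `!` marking a privileged instruction), `translate_block` and `TranslationCache` are all hypothetical names invented for the example.

```python
from typing import Dict, List


def translate_block(block: List[str]) -> List[str]:
    """Rewrite privileged instructions (marked with '!') into emulation calls.

    Unprivileged instructions are passed through unchanged, which mirrors
    the idea of leaving safe code as it is.
    """
    return [f"emulate({ins[:-1]})" if ins.endswith("!") else ins for ins in block]


class TranslationCache:
    """Caches translated blocks keyed by guest address (toy model)."""

    def __init__(self, guest_code: Dict[int, List[str]]) -> None:
        self.guest_code = guest_code
        self.cache: Dict[int, List[str]] = {}
        self.misses = 0  # number of times a block had to be translated

    def fetch(self, addr: int) -> List[str]:
        if addr not in self.cache:
            self.misses += 1  # pay the translation cost once
            self.cache[addr] = translate_block(self.guest_code[addr])
        return self.cache[addr]  # later fetches reuse the translation


guest = {0x1000: ["mov", "cli!", "add"]}
tc = TranslationCache(guest)
first = tc.fetch(0x1000)
second = tc.fetch(0x1000)
```

Fetching the same block twice translates it only once, which is the reuse the slide refers to; the translated block is longer than the original, which hints at why translated code is not as optimal.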
Hardware Assist - CPU
• New execution modes for privileged execution
– Root mode for the VMM
– Non-root mode with Ring 0 – 3 for guest execution
– Additional control structures (VMCS/VMCB)
– Guest execution need not be monitored
• Preprogram the execution to trap privilege escalations and pass control to the VMM
• On exit to the VMM, the VMM inspects the guest execution and completes it on behalf of the guest
• Impacts
– Exit to the VMM can be expensive
– In-guest updates (page tables) are costlier to track
– IN/OUT instructions and memory-mapped I/O have higher overhead
Memory Virtualization
• Virtual Memory
– Applications see a contiguous, fixed address space: Virtual Address (VA)
– OS maps Virtual Address to Physical Address (PA)
• Uses page tables to store this VA to PA mapping (normal page size: 4K)
– CPU walks the page table for any translation
• Address in %CR3 = page table location
• Uses TLBs (Translation Look-aside Buffers) to cache the translations
• In a Virtual Machine
– Additional address translation is required
• VA -> PA -> MA (Machine Address)
– VMM uses a shadow page table to map guest VA to MA and replaces the guest page tables
– CPU uses the VA -> MA mapping (no two-level walks)
• Reduced cost at execution
[Figure: 0GB to 4GB address spaces of App 1 and App 2; diagram labels: Physical, Machine, VM - Physical]
Memory Virtualization
• Impact of using Shadow Page Tables
– Tracing guest page table writes is expensive
– Cost of propagating the changes to the shadow page table
– Page faults need to be intercepted
• Hidden page faults (missing in the shadow page table)
• True page faults (forwarded to the guest)
– Context switches need to be monitored
• CR3 updates need to be intercepted to replace the guest page table with the shadow page table
– Resources (memory) are needed for maintaining shadow page tables
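The shadow page table mechanism described above can be sketched with plain dictionaries. This is a simplified model under stated assumptions: page numbers stand in for addresses, and `build_shadow`, `translate`, `guest_pt` and `pmap` are hypothetical names used only for illustration.

```python
from typing import Dict, Optional, Tuple


def build_shadow(guest_pt: Dict[int, int], pmap: Dict[int, int]) -> Dict[int, int]:
    """Compose guest VA -> PA with hypervisor PA -> MA into a VA -> MA table."""
    return {va: pmap[pa] for va, pa in guest_pt.items() if pa in pmap}


def translate(shadow: Dict[int, int], guest_pt: Dict[int, int],
              pmap: Dict[int, int], va: int) -> Tuple[Optional[int], Optional[str]]:
    """Return (machine_page, fault_kind) for a guest virtual page."""
    if va in shadow:
        return shadow[va], None  # fast path: single-level lookup
    if va in guest_pt and guest_pt[va] in pmap:
        # Hidden fault: the guest mapping exists but the shadow entry is
        # missing, so the VMM fills it in and the guest never sees a fault.
        shadow[va] = pmap[guest_pt[va]]
        return shadow[va], "hidden"
    return None, "true"  # true fault: must be forwarded to the guest


guest_pt = {0x0: 0x10, 0x1: 0x11}   # maintained by the guest OS
pmap = {0x10: 0x7A, 0x11: 0x3C}     # maintained by the hypervisor
shadow = build_shadow(guest_pt, pmap)

guest_pt[0x2] = 0x10                # guest adds a mapping; shadow is now stale
```

The stale entry for page `0x2` shows why guest page table writes have to be traced or repaired on a fault: the hardware only ever consults the shadow table.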
Hardware Assist - MMU
• Nested or Extended Page Tables
– Hardware assistance for two levels of address translation
• VA – PA mapping is set up by the guest as usual
• PA – MA mapping is set up by the hypervisor
– CPU walks the guest page tables and then the hypervisor page tables to arrive at the VA to MA mapping during TLB fill
• Benefits
– No traces required (guest page table modifications are as fast as native)
– No exits on guest page faults (true faults) or context switches
– No memory overhead for shadow page tables
– Scalable across SMP
• Impact
– TLB fill is expensive and the page walk is costlier
– Using large pages will absorb some of the cost here (2MB vs. 4KB)
• Smaller page tables – fewer levels to traverse
• TLB reach increases, and thus fewer TLB misses
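The cost of a nested page walk and the benefit of large pages can be put into rough numbers. The sketch below assumes a worst-case walk with no paging-structure caches, and the TLB entry count of 1024 is a hypothetical figure chosen only for the arithmetic.

```python
def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for one two-dimensional page walk.

    Each guest page-table pointer, plus the final guest physical address,
    is translated through host_levels nested levels; each guest entry is
    then read once.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Address space covered by a TLB with the given number of entries."""
    return entries * page_size


KIB, MIB = 1024, 1024 * 1024

refs_4k = nested_walk_refs(4, 4)   # 4-level guest and host tables (4KB pages)
refs_2m = nested_walk_refs(3, 3)   # one level fewer on each side (2MB pages)
reach_4k = tlb_reach_bytes(1024, 4 * KIB)
reach_2m = tlb_reach_bytes(1024, 2 * MIB)
```

Under these assumptions a native 4-level walk needs 4 references, the nested walk needs up to 24, and 2MB pages on both sides bring that down to 15 while the same number of TLB entries covers 512 times more memory.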
I/O Virtualization
• Software-based I/O virtualization
– I/O devices are virtual
– IN/OUT instructions are monitored and completed by the VMM
– Traps induce additional latency
• Hardware assistance
– PCI passthrough
• A physical I/O device is dedicated to a guest
• Depends on availability of hardware support (IO-MMU)
• Guest has control over the I/O device and transacts directly
– SR-IOV
• A single device can be shared across multiple guests
• Multiple Virtual Functions appear as if they are physical devices
• Needs hardware support and capable devices
– Good for latency-sensitive applications
vSphere Platform Basics
vSphere ESXi
vSphere ESXi is a Type 1 hypervisor that runs directly on bare metal
• VMkernel is the purpose-built kernel which manages and schedules resources
• Resources are allocated based on entitlements and availability
• Guests run in isolation, controlled by a Virtual Machine Monitor (VMM)
• Device drivers load as VMkernel modules
• Other worlds include processes related to management of the server (hostd, vpxa, dcui, etc.)
CPU Scheduler
• Designed around fairness, with emphasis on responsiveness and resource utilization
– Share-based algorithm with option to guarantee resources
– Highly scalable
• Supports 128 vCPUs / 4 TB RAM per virtual machine
• 480 logical CPUs on the host
• 1024 virtual machines / 4096 vCPUs per host
• 1:32 overcommitment ratio
• CPU time is distributed to virtual machines and other worlds (helpers, hostd, vpxa, sfcbd, etc.)
– Scheduler's default quantum is 50 ms
– A world can be de-scheduled before this quantum expires if it becomes idle
– A VM can overrun the quantum if it does not yield
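A share-based allocation with guaranteed minimums can be sketched as follows. This is a deliberately simplified model and not the ESXi scheduler's actual algorithm: it assumes positive shares, treats a reservation as a floor, ignores limits and idleness, and the function name `entitlements` and the MHz figures are hypothetical.

```python
from typing import Dict, Tuple


def entitlements(capacity: float,
                 worlds: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Split capacity by shares, never giving a world less than its reservation.

    worlds maps a name to (shares, reservation). Worlds whose proportional
    slice falls below their reservation are pinned to the reservation, and
    the remainder is re-divided among the others by shares.
    """
    pinned: Dict[str, float] = {}
    active = dict(worlds)
    while active:
        remaining = capacity - sum(pinned.values())
        total_shares = sum(shares for shares, _ in active.values())
        fair = {n: remaining * s / total_shares for n, (s, _) in active.items()}
        below = [n for n in active if fair[n] < active[n][1]]
        if not below:
            return {**pinned, **fair}
        for n in below:
            pinned[n] = active.pop(n)[1]
    return pinned


# Three worlds with equal shares; C also has a 4000 MHz reservation.
result = entitlements(9000, {"A": (1000, 0), "B": (1000, 0), "C": (1000, 4000)})
```

With equal shares each world would get 3000 MHz, but the reservation lifts C to 4000 MHz and the other two split the remaining 5000 MHz.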
Co-Scheduling
• SMP virtual machines (vSMP) pose a unique challenge for optimized resource usage
– Guest expects all CPUs to be scheduled at the same time (co-scheduling)
• But not all CPUs may have enough work (wasted CPU time?)
• Optimization
– Do not schedule siblings if they are idle
• Provides an illusion of synchronous progress
• Skew in execution can be dangerous (no vCPU may run ahead forever)
– Keep a check on skew and keep it bounded
– Strict co-scheduling
• If the skew is beyond the threshold, stop all sibling CPUs and schedule them together
• Can cause CPU fragmentation
– Relaxed co-scheduling
• Decision to stop or proceed is made for each sibling
– Not all siblings need to be stopped (only those that are far ahead should stop)
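The difference between the strict and relaxed policies above can be sketched as two small decision functions. This is an illustrative model only: progress is expressed as milliseconds of guest execution per vCPU, the 3 ms threshold is a hypothetical value, and neither function is taken from the ESXi scheduler.

```python
from typing import List, Set


def strict_costop(progress_ms: List[float], threshold_ms: float) -> Set[int]:
    """Strict policy: if skew exceeds the threshold, stop every sibling."""
    skew = max(progress_ms) - min(progress_ms)
    return set(range(len(progress_ms))) if skew > threshold_ms else set()


def relaxed_costop(progress_ms: List[float], threshold_ms: float) -> Set[int]:
    """Relaxed policy: stop only the vCPUs that are too far ahead of the slowest."""
    slowest = min(progress_ms)
    return {i for i, p in enumerate(progress_ms) if p - slowest > threshold_ms}


progress = [10.0, 10.0, 2.0, 9.0]   # vCPU 2 is lagging
stopped_strict = strict_costop(progress, threshold_ms=3.0)
stopped_relaxed = relaxed_costop(progress, threshold_ms=3.0)
```

In this example the strict policy stops all four vCPUs, while the relaxed policy leaves the lagging vCPU 2 running so that it can catch up.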
Load Balancing
• Running worlds need to be load balanced across PCPUs for proper resource utilization and responsiveness
– Worlds can be migrated across PCPUs (pull or push)
– Not all migrations are beneficial (cache/NUMA advantages are at risk)
• Virtual machines are scheduled across the NUMA nodes
• Based on CPU load, memory load or round robin
• Can trigger sub-optimal performance if the guest is not aware of the topology
• Rebalancing of the workload considers
• CPU load (short-term imbalance or long-term fairness)
• Effectiveness of the Last Level Cache
• CPU and memory topology
• Action affinity
• Cost of migration
• Migrations across NUMA nodes are carefully evaluated
Memory Scheduler
• Proportional share-based allocation (as with CPU)
– Will reclaim the allocation if it is found idle
– Overcommitment-friendly memory conservation algorithms
• Transparent Page Sharing
• Memory Ballooning
• Memory Compression
• Swap to SSD (Low Latency Swap)
• Swap to File
• Large Page Support (2MB)
– Lower overhead for page tables
– TLB can cover more address space
– Amortizes the cost of Nested Page Table walks
Storage I/O Path
[Figure: storage I/O path across three layers (Virtual Machine / VMM, VMkernel, Physical Infra). Components shown: Guest OS; SCSI HBA Emulation; SCSI Command Emulation; Virtual Disk (Virtual Mode RDM, VMDK, Snapshot); Physical Mode RDM; File System Switch (VMFS, Virtual SAN, NFS); TCP/IP; Block Device; Logical Device I/O Scheduler; Device and Path Management; Adapter I/O Scheduler; Device Drivers; I/O Adapter]
Network I/O Path
[Figure: network I/O path across three layers (Virtual Machine / VMM, VMkernel, Physical Infra). Components shown: Guest OS; NIC Emulation; I/O Filters/Chain (Pre); Virtual Switch (VLAN, Team, Mirror, Offloads); I/O Filters/Chain (Post); Adapter I/O Scheduler; Device Drivers; I/O Adapter]
Scoping
Resource utilization is high, possibly impacting performance
Process takes more time in a VM when compared to Physical
Application startup time is too long
Application is performing poorly in VM
File copy is too slow
[Figure: Physical Machine vs. Virtual Machine]
Network access is too slow (packet drops?)
What and How
• Understanding the Issue
– Clearly define and scope the problem
– Collect details about the scenario in which it manifests
– Have we collected the evidence (statistics, results)?
– Do we see any specific pattern?
– Is it only one VM/application having the issue?
• Redefine
– What are the expectations (SLA…)?
– Is it a comparison against a baseline? If yes, is it apples to apples?
– Eliminate obvious resource-related mismatches
– Do we have enough resources allocated?
– Redefine the issue after factoring in all of the above
– Tools and measures, exit criteria
Required…..
• Good to have
– Platform-level benchmarks
– I/O benchmarks are handy
– Application profile
– SLA and comparison details
– More than one iteration of test and data collection
• Tools/Metrics
– vCenter Data Collection, ESXTOP, vSCSI Stats, Net Stats, vRealize Operations Manager
– Guest OS level statistics
• Perfmon, SAR, vmstat, iostat
– Benchmarking tools
• IOMeter, Netperf, Uperf, database stress tools
Approach
Where to look
• Application Level
– App-specific performance data
– Guest OS: CPU/memory, I/O statistics
• Virtualization Layer
– vCenter performance charts, limits, shares and other contentions, ESXTOP
• Physical Server Level
– CPU and memory saturation
– Power management
– I/O bandwidth
• Connectivity Layer
– Network/FC switches and data paths
– Packet loss, bandwidth utilization
• Peripheral Devices
– Utilization, latency, throughput
Methodology
Test or Monitor -> Collect Data -> Validate Data -> Analyze Data -> Tune
Benchmarking
• Real Program
– Run the program and measure the performance with respect to response/completion times
• Micro Benchmark
– Performance of a specific activity of a program is tested
– Short run, involving no or only a small set of I/O
– Ex: memory allocation/de-allocation, program loop
• Component Benchmark
– Measure raw performance of components
– CPU/memory, network I/O, storage I/O
• Kernel Benchmarking
– Targeted towards profiling kernel performance
• Synthetic Benchmarks
– Specifically written tests to mimic the operations of a program/platform
• I/O, Database Benchmarks
– Measure system throughput
Data Collection
Application and OS Stats
• Benchmark tools
– IOMeter, Netperf, uperf, sysbench, DB hammer…
• CPU Utilization
– Used by application
– Privileged (system) time vs. user time
– Load averages or processor queue length
• Memory Utilization
– Available (free)
– Pages/sec (swap I/O), page faults
• I/O
– Reads/writes, throughput
– Queue lengths
– Latency/wait times
– Network packets received, errors, discarded, resets
vSphere Layer
• vCenter real-time/historic performance charts
– Start here, as it provides historical data for comparison
– Minimum collection interval: 20 seconds (refresh: 1 min)
– Sufficient if the problem window is longer (> ~3 minutes)
• Host-level statistics
– Host does not store much historic data (real time only)
– esxtop provides real-time statistics (minimum interval: 2 sec)
• Provides CPU, memory, storage, network and other system-level statistics
• Can be intimidating for live analysis
• Collect the data using the batch mode option and perform offline analysis (remember the -a switch to include all stats)
– vscsiStats, net-stats: I/O statistics
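Batch-mode collection is typically started with something like `esxtop -b -a -d 2 -n 150 > esxtop.csv` (batch mode, all counters, 2 second delay, 150 samples), and the resulting CSV can then be analyzed offline. The sketch below shows one possible way to average selected counters; the sample data is synthetic, the perfmon-style header layout is an assumption about the file format, and `column_means` is a hypothetical helper.

```python
import csv
import io
from statistics import mean
from typing import Dict

# Synthetic two-sample extract; real files have thousands of columns.
SAMPLE = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1001:web01)\% Ready","\\esx01\Group Cpu(1001:web01)\% Used"
"05/01/2015 10:00:00","4.10","55.00"
"05/01/2015 10:00:02","12.30","61.00"
'''


def column_means(csv_text: str, needle: str) -> Dict[str, float]:
    """Average every counter column whose header contains `needle`."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, data = rows[0], rows[1:]
    return {
        name: mean(float(row[idx]) for row in data)
        for idx, name in enumerate(header)
        if needle in name
    }


ready = column_means(SAMPLE, "% Ready")
used = column_means(SAMPLE, "% Used")
```

For a file on disk, the same function can be fed with `open("esxtop.csv").read()`; filtering by substring keeps the analysis manageable when all counters were collected.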
Analysis
CPU Resources
• Availability of CPU time is key for any application's performance
– ESXi CPU scheduler distributes CPU time to all running worlds based on entitlements and reservations, using a share-based algorithm
– Several factors influence the availability and scheduling of CPU
– Contention can result in degraded performance
CPU Statistics
• What to look at
– USED (Used): CPU time consumed by the world
– RUN (Run): Time for which the world was scheduled on the CPU
– SYS (System): CPU time used by the VMkernel on behalf of the world
– WAIT (Wait): CPU time spent idle, busy-waiting or waiting for a resource
– IDLE (Idle): World had no work to do
– RDY (Ready): World is waiting for availability of CPU resources
– CSTP (Co-Stop): For VMs with multiple vCPUs; suggests vCPUs are waiting to be scheduled together
– SWPWT (Swap Wait): World is waiting for swap I/O completion
– DMD (Demand): Moving average of CPU demand from the world
– LAT_C (Latency): Overall latency due to CPU contention
CPU Usage
• Used and Run
– A high value for Used suggests a CPU-bound workload
– Compare with Demand
– Will adding more CPU to the VM help?
• Used differs substantially from Run
– Scheduled CPU is not consumed by the VM
• Usually latency is associated with this scenario
– Check the power management settings for the active policy
• Too many transitions into C1 and deeper C-states can be a problem
• Look at the page showing Power/CPU state data (“p”)
CPU Ready and Co-Stop
• Ready: Time spent waiting for the availability of a CPU resource
– Guest wants to execute but no free CPU slot is available
• Co-Stop: Sibling vCPUs are not all making progress at the same rate
– Stop the vCPU which is ahead until slower vCPUs can catch up
• Check the physical CPU vs. virtual CPU ratio
– Higher consolidation ratio can result in higher ready time for the VM
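vCenter charts report CPU ready as a summation in milliseconds per sample, whereas esxtop shows a percentage, so a small conversion helps when comparing the two. The sketch below assumes the 20 second real-time interval and averages a VM-level value across its vCPUs; both function names are hypothetical helpers.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into an average percentage per vCPU."""
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0


def consolidation_ratio(total_vcpus: int, physical_cores: int) -> float:
    """Number of vCPUs provisioned per physical core on a host."""
    return total_vcpus / physical_cores


single = ready_percent(2000)           # 1 vCPU VM, 2000 ms ready in 20 s
smp = ready_percent(4000, vcpus=4)     # 4 vCPU VM, 4000 ms summed over vCPUs
ratio = consolidation_ratio(64, 16)    # 64 vCPUs on a 16-core host
```

Here 2000 ms of ready time in a 20 second sample means the vCPU spent 10% of the interval waiting for a physical CPU; whether that is acceptable depends on the workload and the SLA.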
Memory Stats
• Memory State
– Indicator of memory pressure
– High: Sufficient memory available to satisfy current demand
• No memory conservation activities necessary
– Soft (< 4%): Software-based conservation/sharing techniques used
• Balloon, Memory Compression, Low Latency Swap
– Hard (< 2%): Swap to disk (swap file of the VM)
– Low (< 1%): Allocations would be delayed/stunned
• Not a workable scenario. VMs will be stunned (applications fail/disconnect)
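The memory states above map directly onto a threshold check on the host's free memory percentage. The sketch uses only the static thresholds quoted on the slide (4%, 2%, 1%); actual thresholds can vary with host memory size and ESXi version, and `memory_state` is a hypothetical helper.

```python
def memory_state(free_pct: float) -> str:
    """Classify host memory pressure from the percentage of free memory."""
    if free_pct < 1.0:
        return "low"    # allocations delayed, VMs may be stunned
    if free_pct < 2.0:
        return "hard"   # swapping to the VM swap file
    if free_pct < 4.0:
        return "soft"   # ballooning, compression, low latency swap
    return "high"       # no reclamation needed


states = [memory_state(p) for p in (12.0, 3.5, 1.5, 0.4)]
```

A host that moves from "high" to "soft" is the early signal to look at ballooning and swap counters before performance degrades further.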
Key Memory Stats
• NUMA-related statistics
– NRMEM (not in vCenter): Memory allocated on remote NUMA node
– NLMEM (not in vCenter): Memory allocated on local NUMA node
– N%L (not in vCenter): Percentage of memory on local node
• Utilization-related statistics
– GRANT (Granted): Physical memory allocated
– TCHD (Active): Memory recently touched by the guest
– MCTLSZ (Balloon): Memory reclaimed using the memory control (balloon) driver
– SWCUR (Swapped): Memory that is currently swapped out
– SWW/s, SWR/s (Swap out/in): Current swap activity
VM Stats in UI
[Figures: Memory Usage – No Pressure; Memory Usage – Under Pressure]
Storage Stats
• esxtop storage views
– Adapter (d)
– Disk Device (u)
– VM (v)
• What should be looked at
– Queue Statistics
– Latency Statistics
– Error Statistics
Latency – Why?
• The majority of device latencies are caused by external infrastructure
– Busy storage device
– Load factor
– Burst I/O resulting from backup, OS/antivirus updates, replication, boot storm
– Noisy neighbor issues
• Unoptimized access to the device (path policies and configuration)
• Unreliable fabric/network
– Faulty cables, adapters
• I/O to a disk with snapshots will see inconsistent latency
Network Stats
• Network issues generally manifest as failed connections, a high number of retransmissions and packet drops
– TCP will recover, but the result is poor throughput; the application may not sustain it
• Network I/O can be more CPU intensive than storage I/O
– Focus should be on drop-related stats
– In the majority of cases, receive drops are associated with CPU contention
– Transmit drops are associated with overload of the physical network adapters
– Are we seeing too much broadcast traffic on the links?
IO Profiling
vSCSI Stats
• Data collection tool which provides insight into the I/O pattern of a virtual machine
– Can be used to profile the I/O requirements of an application
• I/O size (length)
• Queue size (outstanding I/O)
• Read/write ratio
• How random (or how sequential)
• Inter-arrival rate (how demanding)
• Latency (read/write)
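The profile dimensions listed above can be illustrated by computing them from a small I/O trace. This is a sketch of the idea only, not the tool's implementation: the trace is synthetic, the tuple layout and the 512-byte block size are assumptions, and `profile` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Synthetic trace: (timestamp_us, op, start_block, length_blocks)
TRACE: List[Tuple[int, str, int, int]] = [
    (0,   "R", 1000, 8),
    (200, "R", 1008, 8),
    (450, "W", 5000, 16),
    (600, "R", 1016, 8),
]


def profile(trace: List[Tuple[int, str, int, int]], block_size: int = 512) -> Dict:
    """Summarize I/O size, read ratio, sequentiality and inter-arrival time."""
    sizes = Counter(length * block_size for _, _, _, length in trace)
    reads = sum(1 for _, op, _, _ in trace if op == "R")
    pairs = list(zip(trace, trace[1:]))
    # Seek distance: start of this I/O minus end of the previous one (0 = sequential)
    seeks = [cur[2] - (prev[2] + prev[3]) for prev, cur in pairs]
    gaps = [cur[0] - prev[0] for prev, cur in pairs]
    return {
        "size_histogram": dict(sizes),
        "read_pct": 100.0 * reads / len(trace),
        "sequential_pct": 100.0 * sum(1 for s in seeks if s == 0) / len(seeks),
        "mean_interarrival_us": sum(gaps) / len(gaps),
    }


summary = profile(TRACE)
```

A profile like this (mostly 4KB reads, largely random, a few hundred microseconds apart) is what would then be fed into a benchmark tool to reproduce the application's I/O behavior.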
Net Stats
• Can be used to profile Network I/O
– Type of Traffic (TCP/UDP, IPV4/IPV6)
– Transmit and Receive Rates (PPS)
– Transmit and Receive Packet Sizes
– Burst or flat (Cluster of Packets)
– Inter-arrival Time (How active)
– CPU Usage
Optimization
Sizing and Configuration
• Virtual Machine Sizing and Configuration
– Pick the right sizing for CPU and memory
• Over-allocation of CPU can result in scheduling conflicts
• Under-allocation of memory results in guest paging
• Over-allocation of memory will result in higher overhead
• Ballooning away memory might be bad for memory-intensive applications
– Use pre-allocated, eager zeroed thick disks for I/O-intensive workloads
– pvSCSI and vmxnet3 have lower CPU requirements compared to other emulations
– Are we running with unwarranted/unnoticed limits or reservations?
• Server Hardware Considerations
– Make sure hardware assists for CPU/memory and I/O devices are available
– Are the power management settings optimal? (Default: Balanced)
• Can result in too conservative CPU usage in low-load scenarios
• I/O-bound workloads perform poorly in these conditions
• Disable C-states to turn off any power management (check the BIOS)
Workload Placement
• Planning should consider average as well as peak utilization
• Consolidation ratios
– Make sure sufficient resources are available for CPU-bound critical workloads
– Higher consolidation ratios are acceptable for desktop/test workloads
• Size the VM with attention to the NUMA layout of the server
– VMs spreading across multiple NUMA nodes can be a problem
– vNUMA will help to avoid this penalty
• Don't place all heavy hitters on the same host
– Use affinity/anti-affinity rules to guarantee placement
• DRS can help to keep the VMs happy
– Make sure vMotion works (vMotion normally needs some CPU resources)
• Use resource pools (with reservations) over VM-level reservations
– Resource pools also help to implement priority-based SLAs
I/O Optimization
• Storage
– Optimal path policies to be used (vendor involvement might be required)
– Keep an eye on storage utilization and load
– Pick the right RAID level based on the usage pattern
– Storage I/O Control can help to reduce the impact of busy conditions
– Storage DRS can keep the load balanced
• Network
– Use hardware with offload features (savings on CPU time)
• Segmentation, checksum, etc.
– Jumbo frames have lower CPU overhead
• May not help if the average size of I/O transactions is small (~1500 bytes)
– Network I/O Control can help to prioritize the workload
• BIOS/Drivers
– Recommended to keep the driver and BIOS levels up to date
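The jumbo frame point above comes down to how many frames, and therefore how many per-packet processing steps, a transfer needs. The sketch below assumes TCP over IPv4 with 40 bytes of headers and no options; `frames_needed` is a hypothetical helper.

```python
import math


def frames_needed(payload_bytes: int, mtu: int, header_bytes: int = 40) -> int:
    """Frames required to carry a payload, given the MTU and per-frame headers."""
    return math.ceil(payload_bytes / (mtu - header_bytes))


ONE_MIB = 1024 * 1024

large_std = frames_needed(ONE_MIB, mtu=1500)     # standard frames
large_jumbo = frames_needed(ONE_MIB, mtu=9000)   # jumbo frames
small_std = frames_needed(1000, mtu=1500)
small_jumbo = frames_needed(1000, mtu=9000)
```

A 1 MiB transfer drops from 719 frames to 118 with jumbo frames, while a 1000-byte transaction needs a single frame either way, which is why small I/O sees little benefit.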
Latency Sensitive Application
• Sources of Latency
– Virtualization overhead
– CPU contention (consolidation ratio)
• Additional contention when using Hyper-Threading
– Techniques used for optimizing the CPU usage for network handling
• Interrupt coalescing
• Large Receive Offload (LRO)
– CPU power management
• C-state transitions
• Optimization
– Use virtual machine settings to mark the latency sensitivity of the VM
• Exclusive access to CPU resources
• Limits the de-scheduling of the VM
• Use of coalescing and other optimization methods is limited
– Use SR-IOV based passthrough for further reduction in latency
Questions?
Thank You
Appendix
• Performance Monitoring Utilities: resxtop and esxtop
– https://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.monitoring.doc/GUID-A31249BF-B5DC-455B-AFC7-7D0BBD6E37B6.html
• vScsiStats
– http://cormachogan.com/2013/07/10/getting-started-with-vscsistats/
• NetStats – command line tool
– From a shell session run ‘net-stats -h’ for details
• Latency Sensitive Applications
– http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
• vCenter Performance Data Overview
– http://pubs.vmware.com/vsphere-60/topic/com.vmware.vsphere.monitoring.doc/GUID-AA1F733C-1450-4437-AB0A-E5FA24CC386E.html