Mastering Performance Monitoring and Capacity Planning using vRealize Operations Manager
Reghuram Vasanthakumari, Staff Engineer, VMware
Mohit Kataria, Product Owner, VMware
Disclaimer
• This presentation may contain product features that are currently under development
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind
• Technical feasibility and market demand will affect final delivery
• Pricing and packaging for any new technologies or features discussed or presented have not been determined
Agenda
1 Introduction to vRealize Operations Suite
2 Operations Management Goals
3 Real World Troubleshooting Scenarios
4 Q&A
Today’s Reality in Operations Management
Monitoring Data Overload Alert Storms
Finger Pointing
DBA
VI Storage
Over-provisioning
Volume of Monitoring Data is Exploding
Metrics & Data Volume
Traditional Stack (Server, Storage, Networking, Web, App Server and DB)
Virtualized Infrastructure (incl. Storage and Network Virtualization)
Distributed & Mobile Apps (incl. Public cloud, SaaS, Mash-ups, …)
Alert Volume
“Operations Gap”
“Commit to a comprehensive IT Operations Analytics strategy to optimize today's operations and support future I&O work” – Gartner
Evolution of Operations Analytics Technology
Axes: Reactive to Proactive, Manual to Automated
Traditional Monitoring (Hyperic, SCOM, Nagios, …)
• Data collection (Metrics, logs, …)
• Static thresholds
• Alerts
Event Correlation (BMC, HP, CA, IBM, …)
• Data collection
• Aggregation
• Masking & filtering
• Rules-based alert suppression
Performance Analytics (VR Ops 1.0-5.x, Netuitive, …)
• Data collection
• Self-learning
• Dynamic thresholds
• Super metrics
Predictive Analytics (vRealize Operations 6.0)
• Data collection
• Detect complex issues from multiple symptoms
• Remediation and automation engine
• Scale-out, data-agnostic platform
10x Alert Reduction
VMware’s Approach to Operations Analytics
Operations Analytics & Automation
• Performance & Availability
• Logs & Unstructured Data
• Topology Analysis
• Configuration Health
• Capacity Planning
vRealize Operations
• Operations Console
• Extensibility
Integrated Management Disciplines
• Performance
• Compliance
• Configuration
• Capacity
• Availability
Resilient, Scale-Out Platform
• App Visibility
• Logs*
• Analytics
• Reporting/Alerting
• Automation
• SDK
• Management Packs
• APIs
vRealize Operations Overview
Quality of Service
Operational Efficiency
Control and Compliance
*vRealize Log Insight is not part of vRealize Operations but included with vRealize Operations Insight and vRealize Suite
Agenda
1 Introduction to vRealize Operations Suite
2 Operations Management Goals
3 Real World Troubleshooting Scenarios
4 Q&A
Status Quo Goal
• Are you able to meet or exceed service level expectations?
• Can you remediate issues before end users are impacted?
• How many monitoring tools are you using?
Quality of
Service
• What is your average Mean Time to Incident & Resolution?
• Do you manage your infrastructure capacity?
• How do you plan for future needs?
Operational
Efficiency
• Is your IT infrastructure compliant to regulatory standards?
• Can you proactively enforce IT standards in your organization?
Control
and
Compliance
Operations Management Goals
Status Quo
• Are you able to meet or exceed service level agreements?
• Can you remediate issues before end users are impacted?
• How many monitoring tools are you using?
• What is your average Mean Time to Remediate (MTTR)?
• Do you leverage automated capacity optimization to improve
resource utilization?
• Are you able to accurately forecast your future capacity needs?
• Is your IT infrastructure compliant to regulatory standards?
• Do you have the capability to create flexible groups and policies
for different resource types and teams?
• Can you proactively enforce IT standards in your organization?
Goal
What Operations Management Teams are Looking For?
Quality of
Service
Operational
Efficiency
Control
and
Compliance
How VMware Helps in Delivering Quality of Service
Improve performance and avoid disruption with self-learning management tools
Key Capabilities
Benefits
90% reduction in alert volume
Proactively detect & avoid
incidents early-on
Quality of Service
Self-learning predictive analytics
Smart alerts identify problems
based on multiple symptoms
No new monitoring tools or point
products needed
Domain-specific management
packs for MS, SAP, NSX etc.
• Dynamic Thresholds
• Problem Based
• 10x Alert Reduction
• Static Thresholds
• Symptom Focused
• 100s of Alerts
Traditional Monitoring, Predictive Analytics
Evolution of Traditional Monitoring towards Operational Analytics
Smart Alert 1
Smart Alert 2
Smart Alert 3
Smart Alert 4
Alert Storms
Problem Based Alerts combine multiple
symptoms
Predictive Analytics
Problem Detection from
multiple symptoms drives
recommendation and
proactive action
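The idea of a problem-based alert can be sketched in a few lines. This is only an illustration, not how vRealize Operations implements alert definitions: the symptom names, the `DEFINITIONS` table and the `problem_alerts` helper are all hypothetical. One alert is raised per problem, and only when every symptom it depends on is active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    name: str
    active: bool


def problem_alerts(symptoms, definitions):
    """Return the problems whose required symptoms are ALL currently active."""
    active = {s.name for s in symptoms if s.active}
    return [problem for problem, required in definitions.items()
            if required <= active]  # subset test: every required symptom is active


# Hypothetical problem definitions: problem name -> set of symptoms it needs.
DEFINITIONS = {
    "VM starved for CPU": {"high_cpu_contention", "high_cpu_demand"},
    "Datastore saturated": {"high_disk_latency", "high_iops"},
}

symptoms = [
    Symptom("high_cpu_contention", True),
    Symptom("high_cpu_demand", True),
    Symptom("high_disk_latency", True),
    Symptom("high_iops", False),
]

# Three active symptoms collapse into a single problem-based alert.
alerts = problem_alerts(symptoms, DEFINITIONS)
print(alerts)
```

Symptom-focused monitoring would have raised three alerts here. Grouping by problem raises one.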
Health Risk Efficiency
Dynamic Thresholds
How is VMware Self-learning Analytics Different?
Super Metrics
Dynamic Thresholds adapt
to workload changes and
eliminate alert storms and
false positives
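A minimal sketch of why a learned threshold produces fewer false positives than a static one. It assumes a simple mean plus or minus k standard deviations band over recent history. That is a stand-in chosen for illustration, not the algorithm vRealize Operations uses, and `STATIC_THRESHOLD`, `batch_history` and `k` are made-up values.

```python
from statistics import mean, stdev


def dynamic_bounds(history, k=3.0):
    """Learn a normal range from history: mean +/- k standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """True when the value falls outside the learned normal range."""
    low, high = dynamic_bounds(history, k)
    return value < low or value > high


STATIC_THRESHOLD = 80.0  # hypothetical "CPU above 80%" static rule

# A batch server that normally runs hot (CPU %, hypothetical samples).
batch_history = [85, 88, 90, 87, 86, 89, 91, 88]

# 89% is normal for this workload: the static rule fires, the learned band does not.
print(89 > STATIC_THRESHOLD, is_anomalous(89, batch_history))
# A drop to 40% is abnormal for this workload, even though no static rule would fire.
print(40 > STATIC_THRESHOLD, is_anomalous(40, batch_history))
```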
Immediate Issues
Future Issues
Optimization Opportunities
Super Metrics combine
hundreds of KPIs into
health, risk and efficiency
scores
1 1 2 2 3 3
Applying Analytics to the Past, Present and Future Infrastructure and Application Behavior
Learned Behavior Expected Demand Real-time Events
Historical Data Planned Projects Predicted Behavior
Automate Workflows
Improve Analytics & Avoid Risk
Identify Stress & Improve Efficiency
vRealize
Operations
Adopting an Analytics Based Process
1. Identify key metrics to measure – do not focus on the UI!
2. Start with vSphere and gradually broaden scope
3. Build a library of best practices and repeatable workflows
4. Incentivize the team to focus on issue prevention
5. Share your success with other teams
5 Steps to an Analytics Based Process
Health Alert – “Performance” Troubleshooting
Performance alert contributing to degraded health. Let’s click to see details …
Smart Alerts deliver Insight and Information
Correlate symptoms across the stack
Customize Alerts to Your Needs
Add remediation actions from vCenter, vRealize Orchestrator or Python scripts
Combine Analytics with symptoms and recommendations
Status Quo
• Are you able to meet or exceed service level agreements?
• Do you use point products to manage your IT infrastructure?
• Can you remediate issues before end users are impacted?
• What is your average Mean Time to Incident & Resolution?
• Do you manage your infrastructure capacity?
• How do you plan for future needs?
• Is your IT infrastructure compliant to regulatory standards?
• Do you have the capability to create flexible groups and policies
for different resource types and teams?
• Can you proactively enforce IT standards in your organization?
Goal
What Operations Management Teams are Looking For?
Quality of
Service
Operational
Efficiency
Control
and
Compliance
Performance
Higher utilization
Ignore Waste
Higher density
safe
Production, Test-Dev
How would you like to manage capacity risk?
What are your goals to optimize your environment?
How Do You Model Your Capacity Needs?
Identify the Right Controls
• Allocation and Demand Model
• Over-commit ratios
• Thresholds for capacity risk
• Buffers
• Business Hours
Compute Storage
70% Utilized (Just right)
90% Utilized (Danger)
Network
35% Utilized (Over Provisioned)
• Capacity Monitoring and Analytics
– Capacity modeling for heterogeneous environments
– Out-of-the-box default policy configuration flow
– Enhanced forecasting functions and granular data
How VMware simplifies Capacity Management
• Project Planning
– Enhanced “What-If Scenarios”
– Plan projects, visualize changes and reserve capacity for future projects
– Extensible views, reports and alert definitions for capacity
Right-size environment
Run What-If Scenarios based on business needs
Capacity Analytics
CONFIDENTIAL 24
Capacity Analytics to inform when, why, what and where
Granular breakdown of capacity metrics for Compute, Memory, Network and Storage
Capacity Planning – New Project
Add new VMs to deploy SharePoint app into Cluster
Use existing profile of VMs to calculate capacity needs
Based on this new project, Cluster will need more capacity
Planning – Add Capacity
Capacity plan is good!
Plan another project to see how many ESXi hosts are needed to meet capacity shortfall
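The arithmetic behind "how many ESXi hosts are needed" can be sketched as below. This is a back-of-the-envelope illustration, not the what-if engine in the product. The function name, the host sizes and the 20% buffer are assumptions made up for the example.

```python
import math


def hosts_needed(shortfall_ghz, shortfall_gb, host_ghz, host_gb, buffer_pct=0.0):
    """Hosts required to cover a CPU and RAM shortfall.

    buffer_pct is the fraction of each new host held back as headroom
    (hypothetical policy setting, e.g. 0.2 keeps 20% unused).
    """
    usable_ghz = host_ghz * (1 - buffer_pct)
    usable_gb = host_gb * (1 - buffer_pct)
    by_cpu = math.ceil(max(shortfall_ghz, 0) / usable_ghz)
    by_ram = math.ceil(max(shortfall_gb, 0) / usable_gb)
    # The scarcer resource decides how many hosts to add.
    return max(by_cpu, by_ram)


# Hypothetical project: short by 50 GHz and 600 GB, new hosts are 48 GHz / 512 GB.
print(hosts_needed(50, 600, host_ghz=48, host_gb=512, buffer_pct=0.2))
```

Sizing on both CPU and RAM matters because a project that is CPU-light can still force a purchase through memory alone.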
Optimization – Identify Overprovisioned Resources
CONFIDENTIAL – Shared under NDA ONLY 27
Breakdown of reclaimable capacity
Automation – Take Action to Reclaim Capacity
One-click action to optimize your capacity
Status Quo
• Are you able to meet or exceed service level agreements?
• Do you use point products to manage your IT infrastructure?
• Can you remediate issues before end users are impacted?
• What is your average Mean Time to Remediate (MTTR)?
• Do you leverage automated capacity optimization to improve
resource utilization?
• Are you able to accurately forecast your future capacity needs?
• Is your IT infrastructure compliant to regulatory standards?
• Can you proactively enforce IT standards in your organization?
Goal
What Operations Management Teams are Looking For?
Quality of
Service
Operational
Efficiency
Control
and
Compliance
How VMware Helps in Enabling More Compliance and Control
Get continuous compliance and proactive management across apps and infrastructure
Key Capabilities
Benefits
Control and Compliance
Proactive management via
flexible groups and policies
Adhere to vendor guidelines, security best practices and regulatory standards.
45% reduction in time spent on
ensuring compliance
Complete control with no need for
manual processes
IT Compliance Challenges
Siloed Monitoring and Compliance
Monitoring Compliance
Not integrated
No Performance Correlation to Changes
Performance Changes
Managing Users and Access Controls
Need to have tight controls in place
Missing insights
Multitude of Requirements
Security Best Practices
Vendor Hardening Guidelines
Regulatory Standards
VMware Covers the Spectrum of IT Compliance
• Achieve compliance to regulatory standards such as PCI, HIPAA etc.
• Ensure compliance with internal IT policies and security best practices.
• Adopt latest guidelines from vendors such as Microsoft, Cisco etc.
• Deploy and operate VMware products in a secure manner.
vSphere Security Hardening
Vendor Best Practices
Regulatory Compliance
Custom IT Policies
Flexible Groups and Policies
• Proactive Management
– Prioritize critical workloads by defining thresholds, alerts and configuration settings for specific resource groups
– Define custom policies for specific workload types, applications or clusters.
– Apply to both vSphere and non-vSphere object types
– Example: Production resources vs. development resources
Monitor compliance to standards
PCI DSS Standard
Continuous Compliance Monitoring & Enforcement
Take action on non-compliant items by launching Configuration Manager
Operations Management in the Cloud Era
1 Purpose built for mobile/cloud era
• Self-learning predictive analytics and smart alerts
• Capacity optimization across virtual and physical stack
2 Policy based automation
• Automated root cause analysis with compliance visibility
• Granular access control and orchestrated workflows
3 Fast time to value
• Fast and easy deployment as a virtual appliance
• Best for vSphere and supports multi hypervisors
4 From the trusted market leader
• Virtualization and cloud systems management leader
• The only integrated, open and comprehensive solution
START TODAY!
“Intelligent Operations from Apps to Storage”
Agenda
1 Introduction to vRealize Operations Suite
2 Operations Management Goals
3 Real World Troubleshooting Scenarios
4 Q&A
How do Customers find problems in their infrastructure?
Search for problem
Phone call / support ticket
Big Visual
Blind Luck!
Start By
vR Ops God!
Alerts/Notifications
One day in the life of VMware Admin…
• A VM Owner complains to IaaS Team that her VM is slow.
• Her application architect has verified that:
– The VM CPU and RAM utilization is good.
– The disk latency is good.
– There are no dropped network packets.
– No change in the application settings
– No recent patch to Windows
What do you do?
• A: Check ESXi utilization. If it’s low, tell her to doubt no more.
• B: Buy her a nice lunch + flower. Ask her to forget about it
• C: Call your VMware TAM & MCS. That’s why you pay them right?
• D: Roll up your sleeve. You are born for this!
What’s wrong with these statements?
• Cluster CPU
– CPU Ratio is high at 1:5 times on cluster “XYZ”
– Rest all other cluster overcommit ratio looks good around 1:3
– Keep the over commitment ratio to 1:4.
– CPU usage is around 50% on cluster “ABCDE”. Since they are UAT servers, don’t worry.
– Rest other cluster CPU utilization is around 25%. This is good!
• Cluster RAM
– We recommend 1:2 overcommit ratio between physical RAM and virtual RAM.
– Memory Usage on most of the cluster is high around 60%
– Cluster “ABCD” is running peak at around 75%. CPU utilization should be less than 70%
– If we see that Active Mem% is also high than we should add more RAM to cluster
– % Active should not exceed 50-60% and Memory should be running at high state on each host
Monitoring
• There are 2 levels to monitor in VMware:
– The VM.
• The VM is the most important, as that’s all customers care about.
• They do not care about your infrastructure. It is a Service. IaaS.
– The Infra.
• Software: NSX, vCenter, VSAN, vRealize, Distributed vSwitch, Datastore
• ESXi + hardware
• Storage & Fabric
• Network
• There are 4 areas to monitor
• The 4 areas above impact one another
2 distinct layers
Consumer Layer (VM)
1 Performance: We check if the VM is being served well by the platform. Other VMs are irrelevant from the VM Owner’s point of view.
2 Capacity: We check if the VM is right-sized. If too small, increase its configuration. If too big, right-size it for better performance.
Provider Layer (SDDC)
1 Performance: We check if IaaS is serving everyone well. Make sure there is no contention for resources among all the VMs.
2 Capacity: Check utilization. Too low, we spent too much on hardware. Too high, we need to buy more hardware.
3 Configuration: Check for Compliance and Config Drift.
Availability: Get alerts for hardware faults or software that stops working.
Performance
How do you know your IaaS is performing fast?
ESXi utilization at 10% means your ESXi is fast?
ESXi utilization at 90% means your ESXi is fast?
Storage is doing 10K IOPS?
Network is processing 8 Gbps?
What counter do you use as proof to your customers (VM Owners)?
Utilization?
Performance is measured by how well your IaaS serves the VMs.
Fast is relative to your customer. Use SLA as your defense line.
Capacity
Performance and Capacity Management
Performance
• Focus is on the VM. In most cases, does not apply to IaaS.
• Primary counter: Contention or Latency. Utilization is largely irrelevant.
• Does not take into account Availability SLA.
Capacity
• Focus is on the IaaS. VM Capacity Management is just right-sizing.
• Primary counter: Contention or Latency. Secondary counter: Utilization.
• Takes into account Availability SLA. Tier 1 is in fact Availability-driven.
The Consumer Layer The “dining area”
How a VM gets its resource
Provisioned
Limit
Reservation
Entitlement
0 vCPU or 0 GB
Contention
Usage
Demand This is the counter
we need to measure
4 vCPU or 16 GB
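How the counters in this picture relate can be sketched with a small data class. It is a simplification for illustration only: contention is approximated here as demand minus usage, whereas vSphere and vRealize Operations report their own contention counters. The class name, field names and sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmCpu:
    provisioned_mhz: float  # what the VM is configured with
    entitlement_mhz: float  # what the VM may get, given limit/reservation/shares
    demand_mhz: float       # what the VM wants
    usage_mhz: float        # what the VM actually got

    @property
    def contention_mhz(self) -> float:
        # Simplifying assumption: whatever was wanted but not received.
        return max(self.demand_mhz - self.usage_mhz, 0.0)

    @property
    def contention_pct(self) -> float:
        return 100.0 * self.contention_mhz / self.provisioned_mhz

    @property
    def within_entitlement(self) -> bool:
        return self.demand_mhz <= self.entitlement_mhz


vm = VmCpu(provisioned_mhz=8000, entitlement_mhz=6000,
           demand_mhz=5000, usage_mhz=4200)
print(vm.contention_mhz, vm.contention_pct, vm.within_entitlement)
```

Utilization alone would show this VM at roughly half of its provisioned CPU and hide the fact that it did not get everything it asked for.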
Dashboards
• Detail monitoring of a single VM
– When a customer complains that his VM is slow, can the help desk assess it right away?
• Large VMs Monitoring
– Because they are actually hurting your IaaS business
– This impacts both Performance and Capacity
• VM Right Sizing
• Excessive Usage
– Excessive Usage by 1-2 VM can impact the overall IaaS performance.
– VMs with excessive usage hurt the business if we do not charge for Network and Disk IOPS
Single VM Monitoring
• A VM Owner complains that his VM is slow.
– It was okay the day before
– How does Help Desk quickly determine where the issue is?
• How well does Infra serve the VM?
– VM CPU Contention
– VM RAM contention
– VM Disk latency. For each virtual disk, not average.
• Is VM undersized?
– VM CPU Utilisation
– VM RAM Consumed (not Usage)
– VM RAM Usage
– VM Disk IOPS
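The two questions above (is Infra serving the VM well, and is the VM undersized) can be sketched as a help desk triage function. The thresholds, counter names and return strings are hypothetical; the one point taken directly from the list is that disk latency is checked per virtual disk, using the worst disk rather than the average.

```python
def triage_vm(cpu_contention_pct, ram_contention_pct, disk_latency_ms_by_vdisk,
              cpu_util_pct, ram_usage_pct,
              contention_limit_pct=5.0, latency_limit_ms=20.0, util_limit_pct=90.0):
    """Return a list of findings; all limits are illustrative defaults."""
    findings = []

    # 1. How well does Infra serve the VM?
    if cpu_contention_pct > contention_limit_pct:
        findings.append("infra: CPU contention")
    if ram_contention_pct > contention_limit_pct:
        findings.append("infra: RAM contention")
    # Worst virtual disk, not the average (assumes at least one disk).
    worst_disk, worst_latency = max(disk_latency_ms_by_vdisk.items(),
                                    key=lambda kv: kv[1])
    if worst_latency > latency_limit_ms:
        findings.append(f"infra: disk latency on {worst_disk}")

    # 2. Is the VM undersized?
    if cpu_util_pct > util_limit_pct:
        findings.append("vm: CPU undersized")
    if ram_usage_pct > util_limit_pct:
        findings.append("vm: RAM undersized")

    return findings or ["no issue found at VM level"]


# The average latency here is 19.5 ms and would pass; the second disk does not.
disks = {"scsi0:0": 4.0, "scsi0:1": 35.0}
print(triage_vm(1.0, 0.5, disks, cpu_util_pct=40, ram_usage_pct=50))
```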
Dashboard 1: Single VM Monitoring
Are the Large VMs oversized?
• They cause performance issues
– They impact others, and also themselves!
– The ESXi vmkernel scheduler has to find available cores for all the vCPUs, even though they are idle.
– Other VMs may be migrated from core to core. The counter at esxtop tracks this migration.
• They tend to have slower performance
– ESXi may not have all the available vCPUs for them.
• Reduces consolidation ratio
– You can pack more vCPU with smaller VM than with big VM.
– Unless you have progressive pricing, you make more money with smaller VM as you sell more vCPU.
Dashboard of Large VMs
• Overall Picture
– A line chart showing Max CPU Demand among all the Large VMs
• If this is low, they are way oversubscribed. Remember, it only takes 1 VM to make this number high.
• This number should be 80% most of the time, indicating right sizing.
– A line chart showing Average CPU Demand
• If this chart is below 25% all the time for the entire month, then the large VMs are oversized.
• Heat Map of Large VMs
– Size by vCPU config, so it’s easy to see which are the biggest among these large VMs.
– Color by CPU Workload. Both high and low are bad. You want to see ~50% CPU utilisation
• To differentiate between the 2 ends, choose Black and Red. Expect to see mostly green.
• Top-N CPU Demand
– Allows us to zoom into specific time to see the past
• Line chart of a selected VM (automatically plotted)
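The two line charts in this dashboard can be sketched as follows. This is an illustration with made-up samples, not a vRealize Operations super metric definition. `large_vm_summary` and `looks_oversized` are hypothetical helper names, and the 25% limit is the one quoted above.

```python
def large_vm_summary(cpu_demand_pct_by_vm):
    """Build the Max and Average lines across all large VMs.

    cpu_demand_pct_by_vm maps VM name -> list of CPU demand samples (%),
    one sample per collection interval; all lists must have the same length.
    """
    series = list(cpu_demand_pct_by_vm.values())
    max_line = [max(sample) for sample in zip(*series)]
    avg_line = [sum(sample) / len(sample) for sample in zip(*series)]
    return max_line, avg_line


def looks_oversized(avg_line, limit_pct=25.0):
    """Oversized if the average stays below the limit for the whole period."""
    return all(value < limit_pct for value in avg_line)


# Hypothetical samples for three large VMs over three intervals.
demand = {
    "vm-a": [10, 20, 15],
    "vm-b": [5, 10, 20],
    "vm-c": [30, 15, 10],
}
max_line, avg_line = large_vm_summary(demand)
print(max_line, avg_line, looks_oversized(avg_line))
```

It only takes one busy VM to lift the Max line, which is why the Average line is the one used for the oversizing test.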
As expected, the Max of All VMs is low. We can go back in time and see over 3 months.
As expected, they are mostly Black. This means they are over provisioned.
This shows the Top 15 VM. You can change the period to any time.
This is auto shown. We are showing CPU and RAM. You expect 70% range, not 20% like this example.
Dashboard 2: Large VM Monitoring
Any Excessive Utilization in our DC?
• A VM consumes 5 resources:
1. vCPU
2. vRAM (GB)
3. Disk Space
4. Disk IOPS
5. Network (Mbps)
• The first 3 you can bound and control
• The last 2 you can, but normally you don’t do it. You should.
– Application Team does not normally know how much IOPS or Network they need.
– Do you allow any VM to generate 100K IOPS?
– Do you allow any VM to saturate 1Gb link?
• Need a dashboard to track excessive usage
– Disk IOPS
– Network throughput
Dashboard for Excessive Utilisation
• Excessive Storage consumption
– Line Chart:
• Max VM Disk IOPS among all VMs
• Average VM Disk IOPS
– Heat Map
• Size by IOPS. Color by Latency
• If you see a big box, that means you have a VM dominating your storage IOPS.
• Excessive Network consumption
– Similar concept as above
This tracks the IOPS from VMs. From here we can tell there is a distinct peak. It looks like it’s coming from
1 VM, as the average is far lower. This is a cluster of 500 VMs, so even if 1 VM hits 13,200 IOPS, the
average did not even pass 15 IOPS.
Let’s zoom into the peak.
Excessive Storage Dashboard
The peak was 13,212 IOPS on 24 May, around 3:16 am. Let’s find out
which VM.
Excessive Storage Dashboard
• We can list the Top VMs generating the IOPS on any given period.
Bingo, it was VM 63ee that did that 13212 IOPS.
Gotcha!
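The drill-down from "there is a peak" to "which VM did it" can be sketched as two small helpers. The VM names and IOPS values are hypothetical sample data, not figures from this environment, and the helpers are not part of any vRealize Operations API.

```python
def peak_and_average(iops_by_vm):
    """Max and average IOPS across all VMs at one point in time."""
    values = list(iops_by_vm.values())
    return max(values), sum(values) / len(values)


def top_n(iops_by_vm, n=15):
    """The n VMs generating the most IOPS, highest first."""
    return sorted(iops_by_vm.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical snapshot taken at the time of the peak.
snapshot = {"vm-01": 12, "vm-02": 9, "vm-03": 4800, "vm-04": 15}

peak, average = peak_and_average(snapshot)
print(peak, average)       # a Max far above the Average points at one noisy VM
print(top_n(snapshot, 2))  # the Top-N list names it
```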
The dashboards are great.
But they do not tell you how the IOPS are distributed among all the VMs. They also do not tell you if the VMs are experiencing high latency.
You need a Heat Map for this.
At a glance, we can tell the IOPS distribution among the VMs. We can also tell if they are getting low latency or not.
Dashboard 3: Excessive DC Utilization
And that’s it! You “passed” those dashboards, you’re done with the “dining area”!
The Provider Layer The “kitchen”
Performance Management
• Overall Performance Monitoring
– Is any of our customers experiencing bad performance?
– CPU, RAM, Disk, Network
• If yes, who is affected?
– Different VMs may be impacted differently.
– VM 007 may get hit on CPU, while VM 747 may get hit on Storage.
Performance SLA Monitoring
• How do we prove that not a single VM, in any service tier, failed the SLA threshold we agreed for that tier in the past month?
• Since VMs move around in a cluster due to DRS and HA, we need to track at Cluster level.
• If you oversubscribe, there is a risk of Contention.
– For Tier 1, do not overcommit.
– For Tier 2 and 3, do overcommit.
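Tracking the SLA at cluster level can be sketched as below: take the worst contention seen by any VM in each cluster over the period and compare it with the threshold for that tier. The thresholds, cluster names and samples are hypothetical, and the helper is an illustration rather than a product feature.

```python
# Hypothetical CPU contention SLA per tier, in percent.
SLA_CPU_CONTENTION_PCT = {"tier1": 1.0, "tier2": 3.0, "tier3": 6.0}


def sla_breaches(samples, sla=SLA_CPU_CONTENTION_PCT):
    """Return {(cluster, tier): (vm, pct)} for clusters whose worst VM failed the SLA.

    samples is an iterable of (cluster, tier, vm, contention_pct) covering
    the reporting period, e.g. the past month.
    """
    worst = {}
    for cluster, tier, vm, pct in samples:
        key = (cluster, tier)
        if key not in worst or pct > worst[key][1]:
            worst[key] = (vm, pct)  # keep the worst-served VM per cluster
    return {key: hit for key, hit in worst.items() if hit[1] > sla[key[1]]}


samples = [
    ("cl-A", "tier1", "vm-1", 0.4),
    ("cl-A", "tier1", "vm-2", 1.6),
    ("cl-B", "tier2", "vm-3", 2.1),
    ("cl-B", "tier2", "vm-4", 0.9),
]
breaches = sla_breaches(samples)
print(breaches)
```

An empty result is the proof asked for: no VM in any tier exceeded the threshold agreed for that tier during the period.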
Using Max and Average to determine how VMs are served
If the Max is:
• Below what you think your customers can tolerate, then you are good.
• Near the threshold, then your capacity is full. Do not add more VMs.
• Above the threshold, move a few VMs out, preferably the large ones.
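The three cases can be written as one small function. What counts as "near" is not defined on the slide, so the 10% margin below is an assumption, as are the function name and the sample values.

```python
def capacity_verdict(max_contention_pct, threshold_pct, near_margin_pct=10.0):
    """Classify a cluster from the worst contention any of its VMs experienced.

    near_margin_pct is a hypothetical setting: within this percentage
    of the threshold counts as "near".
    """
    if max_contention_pct > threshold_pct:
        return "above: move a few VMs out, preferably the large ones"
    if max_contention_pct >= threshold_pct * (1 - near_margin_pct / 100):
        return "near: capacity is full, do not add more VMs"
    return "below: good"


for observed in (2.0, 4.8, 6.0):
    print(observed, capacity_verdict(observed, threshold_pct=5.0))
```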
This dashboard is good as summary. You stop here if there is no issue.
Yes, 1 dashboard!
Which VMs are affected?
• The previous slides give us info at Cluster level.
– If there is no VM affected, it’s good. No need to analyse further.
– If there are VMs affected, we want to know which ones.
• We can address the above by listing the Top 30 VM
– CPU Contention
– RAM Contention
– Disk Latency
– Network dropped packets (ensure it is 0)
– Network latency (this needs NetFlow)
These are the top 40 VMs which
experienced the worst CPU
Contention.
These are the top 40 VMs which
experienced the worst RAM
Contention.
These are the top 40 VMs which
experienced the worst Disk
Latency.
And that’s it! If Performance is ok, it’s time to review Capacity
Capacity Management based on Business Policy
http://virtual-red-dot.info/capacity-management-based-on-business-policy/
Performance Policy
Group Discussion: What should your Performance Policy be?
Capacity Management: Tier 1
5 line charts showing these in the past 3 months
• Number of vCPU left in the cluster.
• Number of vRAM left in the cluster.
• Number of VM left in the cluster.
• Maximum & Average storage latency experienced by any VM in the cluster
• “Usable” space left in the datastore cluster.
If the number is approaching a low number (your threshold), it’s time to
increase supply (e.g. IOPS, Cluster)
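With no overcommit in Tier 1, "vCPU left" and "VM left" reduce to simple subtraction and division. The sketch below assumes one host is held back for HA and uses a made-up standard VM size. Both are illustrative choices, not values from this deck.

```python
def tier1_vcpu_left(hosts, cores_per_host, allocated_vcpu, ha_hosts=1):
    """vCPUs still available when 1 vCPU maps to 1 physical core (no overcommit).

    ha_hosts is the number of hosts reserved for failover (assumption).
    """
    usable_cores = (hosts - ha_hosts) * cores_per_host
    return usable_cores - allocated_vcpu


def tier1_vm_left(vcpu_left, vram_left_gb, vm_vcpu, vm_vram_gb):
    """How many more VMs of a standard size fit, limited by the scarcer resource."""
    return max(min(vcpu_left // vm_vcpu, vram_left_gb // vm_vram_gb), 0)


# Hypothetical cluster: 8 hosts x 24 cores, 120 vCPUs already allocated.
vcpu_left = tier1_vcpu_left(hosts=8, cores_per_host=24, allocated_vcpu=120)
print(vcpu_left)
# Standard VM of 4 vCPU / 16 GB with 512 GB of vRAM still unallocated.
print(tier1_vm_left(vcpu_left, vram_left_gb=512, vm_vcpu=4, vm_vram_gb=16))
```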
Capacity Management: Tier 2 or 3
5 line charts showing data in the past 3 months
• The Maximum CPU Contention experienced by any VM in the cluster.
– This number has to be lower than the SLA we promise.
• The Maximum RAM Contention experienced by any VM in the cluster.
– This number has to be lower than the SLA we promise.
• The total number of VMs left in the cluster.
• The Maximum & Average storage latency experienced by any VM in the cluster
• The disk capacity left in the datastore cluster.
Key Takeaways
Agree on a Performance SLA.
Contention, not Utilization.
Capacity is defined by Performance.
Thank you
Appendix
Understanding VM CPU Demand vs Usage
vSphere Reported
• CPU Usage: What VM Got Right Now
• Contention: What VM Could Not Get
vROps Reported
• CPU Demand: What VM Wants
If CPU Demand (What VM Wants) > CPU Usage (What VM Got Right Now)
• VM Has: Performance Impact
• VM Needs: Troubleshooting
Troubleshooting Population Pressure
If Entitlement (What VM Can Ever Get) < CPU Usage (What VM Got Right Now) + Contention (What VM Could Not Get)
• VM Has: Population Pressure
• VM Needs: Move VM or Add Physical Capacity
vR Ops 6.0 Out of Box
Troubleshooting Memory
Allocation (No Overcommit)
• Most Conservative
• Configured Memory
• Wasteful in non-Production Env
Usage (Active)
• Most Aggressive
• Current Active Demand
Consumed (All Touched Bits)
• vSphere reported
• Moderate Approach
• Java, SQL, Exchange
Oracle VM
• Memory Configured: 1GB
• Memory Consumed: 721MB
• Memory Demand: 292MB
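Using the Oracle VM figures above, the three models give very different answers to "how much memory could be reclaimed". This is a worked example only. It treats 1GB as 1024MB and assumes that the reported Memory Demand corresponds to the Active model, and `reclaimable_mb` is a hypothetical helper.

```python
def reclaimable_mb(configured_mb, basis_mb):
    """Memory that could be taken back if the VM were sized to basis_mb."""
    return max(configured_mb - basis_mb, 0)


configured = 1024  # 1GB configured, taken as 1024MB
oracle_vm = {
    "allocation": configured,  # most conservative: size by configured memory
    "consumed": 721,           # moderate: all touched memory
    "active": 292,             # most aggressive: current active demand
}

reclaimable = {model: reclaimable_mb(configured, basis)
               for model, basis in oracle_vm.items()}
share = {model: round(100.0 * basis / configured, 1)
         for model, basis in oracle_vm.items()}

print(reclaimable)
print(share)
```

The gap between the Consumed and Active answers is the reason the choice of model has to be a deliberate policy decision for workloads such as Java, SQL and Exchange.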
Our Philosophy Is Not Your Philosophy: Mem Consumed in 6.1
Total Memory Touched by VM
vSphere vROps 6.1