How Are My VMs Doing? Managing for Performance
DESCRIPTION
How Are My VMs Doing? Managing for Performance. Mike Matchett, Dir. Product Management, [email protected]. Akorri Customers & Awards: any size enterprise across multiple industries (Healthcare, Education, Legal; Manufacturing, High Tech, Pharma; Financial / Insurance; Online Services).
TRANSCRIPT
Akorri Copyright © 2008
How Are My VMs Doing?
Managing for Performance
Mike Matchett, Dir. Product Management
[email protected]
Akorri © 2008 www.akorri.com
Akorri Customers & Awards
Any size enterprise across multiple industries
• Financial / Insurance
• Healthcare, Education, Legal
• Manufacturing, High Tech, Pharma
• Online Services
Virtualization Decouples Apps & Resources
[Diagram: Physical Infrastructure Model versus Virtual Infrastructure Model. Labels: Network, Server Pool, SAN Layer, Storage Pool (Tier 1, Tier 2, Archive)]
Management of IT Virtualization
• Good
– Sharing resource “pools” means less dedicated waste
– Normalized resource units lower administrative costs
– Explicit “entitlements” with “unused” capacity available at peaks
• Bad
– Hard to see deep physical resource sharing by application
– Hard to tell if the whole pool is shared efficiently
– When contention happens, it’s bad for everyone at once
• Ugly
– Whose 100% is really 100%?
– ESX knobs and switches control capacity, not performance
Akorri
Cross-domain Management for Virtual Infrastructure
• Agent-less Collection Across Databases, Servers, Storage, VMware & Storage Virtualization
• Advanced Analytics & Modeling
• Performance and Utilization
• Troubleshooting & Root Cause
• Optimization and Planning
• Rapidly Delivers ROI:
– Faster Problem Resolution
– Avoid Performance Problems
– Planning and Optimization
V2.0
Akorri BalancePoint’s Model
Includes Server and Storage Virtualization
[Diagram: model layers labeled Server, Server Virtualization, Storage Virtualization, Storage]
Troubleshooting Performance Issues is Difficult in Virtualized Data Centers
[Diagram: troubleshooting workflows with and without BalancePoint]
With BalancePoint (X-Domain Analytics):
• Recognize Problem → Resolve Problem
– Map virtual topology
– Identify faults
– Identify bottlenecks
– Identify contention
– Make recommendations
• Proactive Analysis → No Problem
– IRT™
– Performance Index™
– Utilization Analysis
– Management Reporting
Manual steps:
• Recognize Problem → Track Down Dependencies → Interrogate Components → Isolate Faults → Find “Root Cause” → Resolve Problem
Example: Topology
• VMware ESX
• Netapp iSCSI
• CPU problem
Same Example – Storage View
KPIs and Metrics - Example
• Infrastructure Response Time
• Usage Index
• IOPS
• Capacity
Understand Resource Contention
Application Contention for a RAID group
VMware ESX Server CPU Contention
Dynamic Thresholds and Prediction
• Thresholds can be dynamically set based on historical behavior
• Predicts performance for the next 48 hours
• Helps to manage seasonality and identify spikes in future activity
Identify Problems Before They Happen
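One simple way to set a threshold from historical behavior is a per-hour-of-day baseline. The sketch below is an illustration only and is not taken from the slides: the function name `hourly_thresholds`, the mean-plus-k-standard-deviations rule, and the sample data are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_thresholds(samples, k=3.0):
    """Return {hour_of_day: threshold} from (hour_of_day, value) samples.

    Assumed rule (not from the slides): threshold = mean + k * std dev
    of the values historically seen in that hour of the day.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: mean(v) + k * pstdev(v) for h, v in by_hour.items()}


# Hypothetical response times (ms): quiet at 03:00, busy at 14:00.
history = [(3, 4.0), (3, 6.0), (14, 18.0), (14, 22.0)]
thresholds = hourly_thresholds(history, k=2.0)
```

With this data, a 10 ms reading would breach the 03:00 threshold but not the 14:00 one, which is the kind of seasonality a single static threshold cannot express.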
IT Service Management
For Effectiveness (Performance Analysis):
• Load/Throughput – Number of Transactions
• Response Time – Time it takes a Transaction to complete
And for Efficiency (Capacity Management):
• Utilization – How busy is the service?
– How much of the available service capacity is being used?
– How many transactions can it handle at good performance levels?
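Throughput and utilization are tied together by the Utilization Law from operational analysis (utilization = throughput × service time). A minimal sketch with hypothetical measurements follows. Note that it computes service time (time spent being served), which is only the no-queuing floor of the response time named above.

```python
def service_metrics(transactions, busy_seconds, interval_seconds):
    """Throughput, mean service time, and utilization for one interval."""
    throughput = transactions / interval_seconds      # trans/sec
    service_time = busy_seconds / transactions        # sec/trans
    utilization = busy_seconds / interval_seconds     # fraction of time busy
    return throughput, service_time, utilization


# Hypothetical measurement: 1200 transactions, 30 s busy, in a 60 s window.
x, s, u = service_metrics(1200, 30.0, 60.0)
# Utilization Law: u equals x * s (here 20 trans/sec * 0.025 sec = 0.5).
```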
Response Time is Non-Linear
• Max Capacity happens when system is 100% utilized
• Service Level is set to a performance threshold
• Optimal Capacity happens at less than 100% utilization
[Chart: Response Time (sec/tran) versus Arrival Rate (trans/sec), with Utilization from 0% to 100%. Marked: Service Level, Service Time, Max Throughput, Optimal Throughput]
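The shape of this curve can be reproduced with the simplest textbook queue. The sketch below assumes an M/M/1 model (the slides do not say which model is used), and the 10 ms service time and 50 ms service level are hypothetical numbers.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - U), U = rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")  # at or past max capacity the queue grows without bound
    return service_time / (1.0 - utilization)


def optimal_arrival_rate(service_time, service_level):
    """Highest arrival rate whose mean response time still meets the service level."""
    # Solve S / (1 - rate * S) = service_level for rate.
    return 1.0 / service_time - 1.0 / service_level


S = 0.010            # hypothetical service time: 10 ms per transaction
SLA = 0.050          # hypothetical service level: 50 ms response time
max_rate = 1.0 / S   # 100 trans/sec at 100% utilization
opt_rate = optimal_arrival_rate(S, SLA)  # about 80 trans/sec, i.e. 80% utilization
```

Under these assumptions the service level is reached at roughly 80% utilization, well short of the 100% point where response time becomes unbounded.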
Queuing Theory to The Rescue…
• Queuing Models create Response Time curves
– Based on established mathematics
– Useful analytically (historically) as well as predictively
– A simple queuing model can represent a check-out line at the grocery store
• Complex Queuing Network Models can represent nested IT domains
– Advanced cross-domain solutions model IT virtualization
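A very small queuing network can be sketched as M/M/1 stations in series, where each transaction visits every station once and the total response time is the sum of the per-station times. This is a textbook simplification, not the model used by any product. The station names and service times are hypothetical.

```python
def network_response_time(arrival_rate, service_times):
    """Per-station and total mean response time for M/M/1 stations in series.

    service_times maps a station name to seconds of service per transaction.
    Every transaction is assumed to visit every station exactly once.
    """
    per_station = {}
    for name, s in service_times.items():
        u = arrival_rate * s
        if u >= 1.0:
            raise ValueError(f"{name} is saturated at {arrival_rate} trans/sec")
        per_station[name] = s / (1.0 - u)
    return per_station, sum(per_station.values())


# Hypothetical I/O path: ESX host CPU, HBA, storage array.
stations = {"esx_cpu": 0.002, "hba": 0.001, "array": 0.008}
per_station, total = network_response_time(100.0, stations)
bottleneck = max(per_station, key=per_station.get)
```

At 100 trans/sec the array runs at 80% utilization and contributes most of the total response time, so it is reported as the bottleneck.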
Infrastructure Response Time
Are we giving good performance?
• Infrastructure Efficiency - How long to service each transaction?
• Can be scored for how much of the time good service is provided…
– But requires a known Service Level
[Chart: Performance versus Response Time, with the Service Level marked]
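Scoring "how much of the time good service is provided" can be sketched as the share of response-time samples that meet the service level. The function name, the sample values, and the 20 ms service level are hypothetical.

```python
def service_score(response_times, service_level):
    """Percentage of samples whose response time met the service level."""
    if not response_times:
        raise ValueError("need at least one sample")
    good = sum(1 for r in response_times if r <= service_level)
    return 100.0 * good / len(response_times)


# Hypothetical response-time samples in milliseconds, 20 ms service level.
samples_ms = [12, 15, 9, 31, 18, 22, 14, 16]
score = service_score(samples_ms, service_level=20)  # 6 of 8 samples are good
```

Without a known service level there is nothing to compare each sample against, which is why the score depends on one.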
Akorri Performance Index
A better 100%...
Is Infrastructure Over- or Under-Utilized?
• = 100 Optimal Utilization
– Optimal Point is based on modeling for performance
• > 100 OVER
– Performance is in jeopardy
– Infrastructure over-utilized
• < 100 UNDER
– Performance is stable
– Infrastructure has headroom
[Chart: Performance curve with PI = 0, PI = 100 (Optimal Point), and PI > 100 marked]
Practical Examples with BalancePoint
• Operations Management and P2V Planning
• Justifying Additional Physical Servers for Virtualized Server Clusters
• Trigger/measure IT optimization projects
• CIO Investment Planning
Scorecard Reporting
Key Performance and Utilization Information for ESX and VMs, Physical Servers, Application Service, Storage Usage
Do I Need More ESX Hosts?
Can My Current Servers Support More Virtual Machines?
• E.g. VMware ESX Servers
• Model for PI factors in:
– Server capability
– Storage capability
– Other apps (contention)
• Easily rolls up to cluster, domain, and datacenter scores
[Chart: Performance (Response Time per Transaction) versus Workload (Transactions per Sec), with 100 marked. Below 100: Yes, More VMs. Above 100: No, Over Utilized]
Example: VMware Status Report
Key Performance and Utilization Information for ESX and VMs
The Business of IT
Trigger and Measure IT Optimization Projects
For Example -
• If PI is always low (<20%)
– Server Consolidation
– Storage Tiering
• If PI is often high (>120%)
– Infrastructure Upgrades
– Application Tuning
• If PI varies high and low
– Load Re-balancing
– Server and Storage Virtualization
[Charts: PI over Time]
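The three rules can be sketched as a small classifier over a PI history. The 20 and 120 cut-offs come from the slide. The function name and the fractions standing in for "always" and "often" are assumptions chosen only for illustration.

```python
def suggest_projects(pi_series, low=20, high=120, always=0.95, often=0.25):
    """Map a PI history to the project types listed on the slide.

    low/high come from the slide; `always` and `often` are assumed
    fractions of samples, not values from the source.
    """
    if not pi_series:
        raise ValueError("need at least one PI sample")
    n = len(pi_series)
    frac_low = sum(1 for p in pi_series if p < low) / n
    frac_high = sum(1 for p in pi_series if p > high) / n
    if frac_low >= always:
        return ["Server Consolidation", "Storage Tiering"]
    if frac_low >= often and frac_high >= often:
        return ["Load Re-balancing", "Server and Storage Virtualization"]
    if frac_high >= often:
        return ["Infrastructure Upgrades", "Application Tuning"]
    return []


# Hypothetical PI history that swings between idle and overloaded.
projects = suggest_projects([10, 150, 12, 160])
```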
CIO Reviews IT KPIs for Every Application/VM Each Quarter…
The PI Scale
[Scale: 20, 40, 60, 80, 100, 125, 150, 190, 240, 325, 405, 480, 570, 690, 900, 1000]
0-100: SHOWS HEADROOM
• PI is “linear” up to 100
• A score of 100 = “Optimal”
• Example: an ESX server with 5 VMs and a PI score of “50” could handle 5 more similar VMs
100+: INDICATES RISK
• PI is “non-linear” over 100
• Escalates rapidly with poor performance
• High penalty for poor service levels
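Because PI is described as linear up to 100, the ESX example reduces to simple proportion. A minimal sketch, with a hypothetical function name, that applies only to the 0-100 range (the slides give no formula for the non-linear range above 100):

```python
def additional_similar_vms(current_vms, pi_score):
    """Extra similar VMs a host could take before PI reaches 100.

    Uses the slide's statement that PI is linear up to 100; it does not
    apply above 100, where PI is non-linear. Rounds down to whole VMs.
    """
    if not 0 < pi_score <= 100:
        raise ValueError("linear headroom estimate needs 0 < PI <= 100")
    capacity_at_optimal = current_vms * 100.0 / pi_score
    return int(capacity_at_optimal) - current_vms


extra = additional_similar_vms(current_vms=5, pi_score=50)  # 5, as in the slide's example
```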
For Akorri VMware Customers
What We Do:
• Provide single view of VMware infrastructure
• Alert on current and future performance problems, and identify the source of the problem
• Help troubleshoot performance problems through advanced analytics and predictive modeling
• Optimize server/storage utilization
• Drive IT alignment across virtual infrastructure
BalancePoint helps ensure the success of virtualization projects in production environments.
Thanks!
Mike Matchett
Director Product Management
Akorri
http://www.akorri.com
Live BalancePoint webex demos every Wed – check website for details…
Additional Slides
Availability v. Performance
• Availability
– Relatively easy to monitor and measure inside and out
– ROI is limited to minimizing amount of downtime (100% uptime is the best you can do)
• Performance
– Hard to measure internally, calibrate externally
– ROI is theoretically unbounded (can always try to improve performance another 10%...)
Improving Availability from 99.99% to 99.999% buys about 47 minutes of uptime/yr (downtime drops from about 53 to about 5 minutes) - (100% * 47 min).
Improving Performance by 10% can buy continuing productivity - (+10% * 365*24*60 min)
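The availability arithmetic can be checked directly. A minimal sketch, assuming a 365-day year of round-the-clock operation:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes(availability_pct):
    """Minutes of downtime per year allowed at a given availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


four_nines = downtime_minutes(99.99)     # about 52.6 minutes
five_nines = downtime_minutes(99.999)    # about 5.3 minutes
gained = four_nines - five_nines         # about 47.3 minutes
```

At 99.999% roughly five minutes of downtime per year remain, so the ceiling on further availability gains is small compared with the 525,600 minutes over which a performance improvement applies.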
Manage Availability or Performance?
• Availability
– Under-performing systems don’t meet service levels, and are therefore not considered available…
• Performance
– Un-available systems are just performing very very badly…
At a service level the all-or-nothing Availability definition works. However, IT must use performance to manage, optimize, and plan.
Infrastructure Agility
Prove Virtualization Works…
• An analysis of variance of infrastructure efficiency over time
– Lower variance means higher agility
• Resources dedicated to single applications will usually show low agility
• Shared resource pools are dynamically assigned to applications, demonstrating high agility
[Charts: PI over Time. High Variability = Low Agility]
Agile Datacenters automatically handle large changes in application usage while also optimizing IT investment!
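The variance comparison can be sketched with the standard library. The two PI histories are hypothetical, and using plain population variance as the agility measure is an assumption; the slides say only that lower variance means higher agility.

```python
from statistics import pvariance


def pi_variance(pi_series):
    """Population variance of a PI history; lower means higher agility."""
    return pvariance(pi_series)


# Hypothetical weekly PI samples.
dedicated = [20, 140, 30, 150, 25]   # dedicated resources: PI swings widely
pooled = [70, 80, 75, 85, 78]        # shared pool: PI stays in a narrow band
more_agile = "pooled" if pi_variance(pooled) < pi_variance(dedicated) else "dedicated"
```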
How?
• BalancePoint discovers and collects performance and utilization data directly from VirtualCenter and also from:
– VM OS
– ESX Server OS
– Database
– Server components
– Storage systems
• Collection is done without any software agents
• BalancePoint uses advanced analytical techniques to correlate across the I/O stack:
– Queue depth analysis
– Infrastructure response time and throughput
– Historical / time series analysis
– Storage and server capacity utilization analysis and trending
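Queue depth, response time, and throughput are linked by Little's Law (average number in the system = throughput × response time). The sketch below shows only that general relationship with hypothetical numbers. The slides do not say how BalancePoint itself performs queue depth analysis.

```python
def mean_queue_depth(throughput_iops, response_time_s):
    """Little's Law: average outstanding requests = throughput * response time."""
    return throughput_iops * response_time_s


def response_time_from_depth(queue_depth, throughput_iops):
    """Little's Law rearranged: response time = queue depth / throughput."""
    return queue_depth / throughput_iops


# Hypothetical LUN: 2,000 IOPS at 8 ms average response time.
depth = mean_queue_depth(2000, 0.008)           # about 16 outstanding I/Os
rt = response_time_from_depth(depth, 2000)      # back to about 0.008 s
```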
BalancePoint VMware Advantages
• Multiple points of deep data collection and analysis across domains
– DB, VM, CPU, memory, HBA, array
– Not simply collecting and presenting VirtualCenter stats
• Heterogeneous storage array support and drill down
– Other VMware management tools have little/no storage insight
• Akorri performance analytics & metrics (IRT, UI, PI)
– Not simply reporting raw stats
• Rapid installation due to agent-less design
– No heavy agent infrastructure
BalancePoint shows exactly what is happening across the server and storage infrastructure.
What Else?
• Akorri is a VMware Technology Alliance partner
• BalancePoint is VMotion/DRS “aware”
– Identifies when a VM has moved & tracks performance changes
• BalancePoint supports all major storage array vendors
– EMC, IBM, HP, HDS, Netapp, Dell, Engenio, Dot Hill, etc.
• BalancePoint supports all major server OSes
– VMware, Linux, Windows, Solaris, HPUX, AIX, etc.
Managing Virtual Infrastructure
BalancePoint Produces Results
• An internet business avoided purchasing unnecessary storage hardware worth $350K.
• A financial firm found a bottleneck in HBA settings that was slowing down millions of dollars worth of storage.
• An insurance company realized $271K in year-one ROI.
• A healthcare company cut troubleshooting time for application performance events in half.
• A financial company avoided buying more software that could only manage vendor-specific platforms.
• A service provider used BalancePoint to ensure the success of a business-critical VMware project.
Oracle and SQL
• Automatically maps database to storage infrastructure
– Oracle instances
– Oracle schema elements
• Creates ViewPoint Topology
• Provides visibility into complex Oracle configurations
• Improves troubleshooting of Oracle issues, performance and capacity problems
Deep Storage Insight for Database Applications