Speaker(s)
SER2849BU
Sai Inabattini
Extreme Performance Series: Predictive DRS - Performance and Best Practices
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
#SER2849B CONFIDENTIAL 2
Problem: Occasional Resource Contention
Case 1
• Some VMs suffer briefly because of periodic resource usage surges in other VMs in my cluster – application performance drop
Case 2
• I tend to reserve capacity for VMs based on their peak load, even when their average loads are much lower – inefficient resource usage
Addressing Resource Contention
1. Reactive
– Move VMs when the contention happens
– Benefits: Minimal overhead, only move VMs that must be moved
– Performance impact: VMs may suffer briefly, since remediation happens after contention starts
2. Proactive
a) Statically reserve more resources
• Performance impact: No impact, but overprovisioning wastes resources
b) Learn workload pattern, move VMs before resource demand spike
• Performance impact: No impact for regular, periodic workloads
• Cannot handle unexpected short term resource demand spikes
• Cannot predict events that trigger load imbalance
What Is the Best Solution?
Good balance of both approaches:
Predicting future demands + Reacting to current and future demands
This is Predictive DRS (pDRS)
• New in vSphere 6.5 and vROps 6.4
Agenda
1 Introduction to pDRS
2 Software requirements and Configuration
3 Application performance case studies
4 Frequently Asked Questions
5 Conclusion
Introduction: What Is pDRS?
What Is Predictive DRS?
• DRS enabled with predictions
• Powerful resource scheduling of DRS + Predictive analytics of vROPs
• Uses both reactive and proactive approaches to balance the workload distribution
[Figure: vSphere DRS combined with vRealize Operations]
How Does pDRS Work?
[Figure: resource usage data flows from vCenter to vROps; vROps produces predictions; DRS/vCenter turns them into recommendations]
vROps Dynamic Thresholds (DT)
• Sophisticated Analytics – 10 different algorithms
• Learns Normal Behavior for every metric for every object
• Detects hourly, daily, and monthly patterns
• Generates upper and lower bounds of “normal”, called Dynamic Thresholds (DT)
• The DT is then refined and filtered to generate predictions
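The idea of an upper and lower bound of "normal" per time slot can be illustrated with a small sketch. This is not how vROps computes Dynamic Thresholds (the slide mentions 10 algorithms that are not described here); it is a simplified stand-in that assumes a mean plus or minus two standard deviations per hour of day. The function name `hourly_bounds` and the `width` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_bounds(samples, width=2.0):
    """Return {hour_of_day: (lower, upper)} from (hour_of_day, value) samples.

    Assumption: "normal" is modelled as mean +/- width * standard deviation
    per hour of day. This is an illustration only, not the vROps algorithm.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    bounds = {}
    for hour, values in by_hour.items():
        centre = mean(values)
        spread = pstdev(values)
        # Usage values cannot be negative, so clamp the lower bound at zero.
        bounds[hour] = (max(0.0, centre - width * spread), centre + width * spread)
    return bounds


# Hypothetical CPU usage history (percent): busy at 09:00, quiet at 03:00.
history = [(9, 70.0), (9, 74.0), (9, 72.0), (3, 10.0), (3, 12.0), (3, 11.0)]
bounds = hourly_bounds(history)
for hour in sorted(bounds):
    print(hour, bounds[hour])
```

With regular, periodic workloads the band for each hour stays narrow, which is the situation in which a forecast derived from it is most useful.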
Software Requirements and Configuration
Software Requirements
• vSphere 6.5 (Enterprise+)
• vRealize Operations Manager (vROps) 6.4 and newer
– vROps 6.4: Supports up to 4000 VMs in a cluster
– vROps 6.5 and above: No limit on the vROps side
vROps + vSphere 6.5 (Enterprise+) = pDRS is free
Configuration - vSphere
• Enable Predictive DRS in vCenter Server
– Cluster → Configure → vSphere DRS
• Make sure that the clocks on vCenter Server and vROps are synced to within a few minutes
Note: If the clock skew is more than 5 minutes, vCenter Server discards the predictions
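The clock skew rule can be expressed as a simple check. The sketch below is only an illustration of the 5-minute threshold from the note; the function name `predictions_accepted` is hypothetical, and how the two timestamps are actually obtained from vCenter Server and vROps is not covered by the slides.

```python
from datetime import datetime, timedelta, timezone

# From the slide: predictions are discarded when the skew exceeds 5 minutes.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def predictions_accepted(vcenter_time: datetime, vrops_time: datetime) -> bool:
    """Return True when the two clocks are within the tolerated skew."""
    skew = abs(vcenter_time - vrops_time)
    return skew <= MAX_CLOCK_SKEW


# Hypothetical timestamps, both in UTC so the comparison is unambiguous.
vc_now = datetime(2017, 8, 28, 12, 0, 0, tzinfo=timezone.utc)
print(predictions_accepted(vc_now, vc_now + timedelta(minutes=2)))
print(predictions_accepted(vc_now, vc_now - timedelta(minutes=7)))
```

Comparing timezone-aware UTC timestamps avoids mistaking a timezone difference for clock skew.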
Configuration – vROps
Enable vCenter adapter to provide stats to pDRS
Application Performance Case Studies
Test Scenario
• “Follow the sun” model
– Type A – 8 hours
– Type B – 8 hours
– Type C – 8 hours
[Figure: timeline of the three 8-hour workload types, marking where Type A ends and Type B starts]
Detecting Workload Surges
• Two cases
– Impact of predictions on the Type A workloads (already running)
– Impact of predictions on the Type B workloads (about to start)
• Test Benchmark: DVDStore
– Benchmark tool that simulates an online store that sells DVDs
– OLTP Database test workload
– Workload is CPU intensive
– Throughput is recorded in transactions per minute
• Test VMs
– Windows VMs
– DVDStore VMs
Impact on Type A Workloads
• Initial State
– DVDStore VMs (Type A)
– Idle Windows VMs (Type B)
Impact on Type A (contd.)
Predictions from vROps for Type B VMs
Impact on Type A (contd.)
• DRS Recommendations (due to predictions for Type B VMs)
[Figure: DRS recommendations with pDRS enabled and with pDRS disabled]
Impact on Type A (contd.)
• Final state after workload surge
[Figure: final cluster state with pDRS disabled and with pDRS enabled]
Impact on Type A (contd.)
• Application performance
[Figure: application performance over time, marking where the Type B workload starts and where DRS remediates the imbalance]
Impact on Type B (Newly Started Workloads)
• Initial State
Idle DVDStore VMs (Type B)
Windows VMs (Type A)
Impact on Type B (contd.)
• Predictions from vROps for Type B VMs
Impact on Type B (contd.)
• DRS Recommendations
[Figure: DRS recommendations with pDRS enabled and with pDRS disabled]
Impact on Type B (contd.)
• Final state after workload surge
[Figure: final cluster state with pDRS disabled and with pDRS enabled]
Impact on Type B (contd.)
• Application Performance
[Figure: DVDStore, effect of load during application startup, marking where the application starts, where DRS remediates the load imbalance, and where the workload stabilizes]
II. Distributed Power Management (DPM) with Predictions
• DPM is the cluster level power management engine that provides additional power savings
• Dynamically consolidates workloads during periods of low resource utilization
• Migrates virtual machines onto fewer hosts and powers off the unneeded ESX hosts
• When workload demand increases, the ESX hosts are powered back on
DPM with Predictions (contd.)
Distributed Power Management with Predictions (contd.)
Frequently Asked Questions
Workloads that pDRS Can Predict
• Any type of workload with periodic usage pattern
• Short spikes on the order of minutes will not be predicted
• The more consistent the workload, the more accurate the predictions
Learning Period
• By default, the learning period needed to generate predictions is a minimum of 14 days
• The longer the learning period, the better the accuracy of predictions
• Predictions will be available only after 14 days of data
Current Demand vs Future Demand
• pDRS always ensures that current VM demand is not compromised by predicted future demand
• VM demand = Max(Current demand, Future demand)
• Current demand of VMs will never be clipped in favor of future demand
Tuning
• Compute Dynamic Thresholds
– You can manually force vROps to collect data for calculating Dynamic Thresholds (DT)
– “Administration” → “Support” → “Dynamic Thresholds”
• Look ahead interval
– Amount of time DRS looks ahead when accounting for predictions
– Default value is 1 hour
– Use the DRS advanced option ProactiveDrsLookaheadIntervalSecs to change it
– Max allowed value is 12 hours
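Because the option name ends in "Secs", the value is presumably given in seconds, so 1 hour corresponds to 3600 and 12 hours to 43200. The sketch below converts hours to a value for ProactiveDrsLookaheadIntervalSecs and rejects values above the stated maximum. The helper name `lookahead_option` is hypothetical, the lower bound of "greater than zero" is an assumption, and the sketch does not show how the option is applied in vCenter.

```python
# Values from the slide, converted to seconds.
DEFAULT_LOOKAHEAD_SECS = 1 * 60 * 60   # default: 1 hour
MAX_LOOKAHEAD_SECS = 12 * 60 * 60      # maximum: 12 hours


def lookahead_option(hours: float) -> dict:
    """Build the advanced-option key/value pair for a look-ahead given in hours.

    Assumption: any positive value up to the 12-hour maximum is acceptable.
    """
    secs = int(hours * 3600)
    if not 0 < secs <= MAX_LOOKAHEAD_SECS:
        raise ValueError(
            f"look-ahead must be greater than 0 and at most {MAX_LOOKAHEAD_SECS} seconds"
        )
    return {"ProactiveDrsLookaheadIntervalSecs": secs}


print(lookahead_option(2))
```

A longer look-ahead lets DRS react earlier to a predicted surge, at the cost of acting on demand that is still several hours away.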
VMs with Predictions vs VMs without Predictions
• How will pDRS behave when I have a mix of VMs (some with predictions, and some without) in the same cluster?
• VMs with predictions:
– VM demand = Max(Current demand, Future demand)
• VMs without predictions:
– VM demand = Current demand
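The two rules above can be sketched for a mixed cluster as follows. The VM names, the MHz figures, and the function name `effective_demand` are hypothetical; only the Max(current, future) rule and the fallback to current demand come from the slide.

```python
from typing import Optional


def effective_demand(current: float, predicted: Optional[float] = None) -> float:
    """Demand used for placement: current demand is never reduced by a prediction."""
    if predicted is None:
        # VM without predictions: only the current demand counts.
        return current
    # VM with predictions: take the larger of current and future demand.
    return max(current, predicted)


# Hypothetical VMs on one host: (current demand, predicted demand) in MHz.
vms = {
    "vm-a": (2000.0, 3500.0),  # prediction higher than current demand
    "vm-b": (1800.0, None),    # no prediction available
    "vm-c": (2500.0, 1000.0),  # prediction lower than current demand
}

host_demand = sum(effective_demand(cur, pred) for cur, pred in vms.values())
print(host_demand)  # 7800.0
```

Note that vm-c keeps its current demand of 2500 even though its prediction is lower, which is the "never clipped" guarantee in code form.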
Filtering Predictions in vROps
• Once a day, vROps sends the next 26 hours of predictions to vCenter (VC)
• The 26 hours are covered by 52 samples, one sample for every 30 minutes
• Prediction samples that do not meet the accuracy criteria are discarded and set to a value of -1
• Consecutive identical samples are merged and sent as a single multi-hour sample
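The filtering steps above can be sketched as a small pipeline. The accuracy criteria themselves are not described in the slides, so the sketch takes them as a precomputed list of flags; the function names and the sample values are hypothetical, and whether discarded samples are merged and sent in the same way is an assumption.

```python
from itertools import groupby

SAMPLE_MINUTES = 30   # from the slide: one sample every 30 minutes
HORIZON_HOURS = 26    # from the slide: 26 hours of predictions per day
DISCARDED = -1        # from the slide: value given to discarded samples

# 26 hours at 30-minute resolution gives 52 samples.
SAMPLES_PER_BATCH = HORIZON_HOURS * 60 // SAMPLE_MINUTES


def filter_predictions(samples, is_accurate):
    """Replace samples that fail the (externally supplied) accuracy check with -1."""
    return [value if ok else DISCARDED for value, ok in zip(samples, is_accurate)]


def merge_runs(samples):
    """Collapse consecutive identical samples into (value, duration_minutes) pairs."""
    return [(value, len(list(run)) * SAMPLE_MINUTES) for value, run in groupby(samples)]


# Hypothetical excerpt of one batch: predicted CPU usage in percent.
raw = [40, 40, 40, 40, 55, 55, 70]
accurate = [True, True, True, True, True, False, True]

merged = merge_runs(filter_predictions(raw, accurate))
print(merged)  # [(40, 120), (55, 30), (-1, 30), (70, 30)]
```

In this excerpt the four identical 30-minute samples collapse into one two-hour sample, which is how a steady workload reduces the amount of data sent per day.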
Identify vMotions Due to Predictions
• How can I differentiate DRS vMotions due to predictions?
• Resource demand on Host A increased due to increased demand in the green and red VMs
• DRS moved the blue VMs to Host B to balance the cluster load
• In this case, DRS chose the blue VMs because moving them balances the cluster faster
[Figure: Host A and Host B; legend: VMs without predictions, VMs with predictions, VMs with predictions dropped]
Conclusion
• pDRS can help avoid contention before the performance of a VM degrades
• Forecasting in pDRS works best for VMs with periodic workload patterns
• Current demand will never be clipped to favor future demand
• Combining the reactive and predictive approaches provides the best solution
DRS Flings
• DRS Lens
– Provides a simple, yet powerful interface to highlight the value proposition of vSphere DRS
– https://labs.vmware.com/flings/drs-lens
• DRS Dump Insight
– Service portal where users can upload drmdump files to get a summary of the DRS run
– https://labs.vmware.com/flings/drs-dump-insight
Other Performance Sessions
• vCenter Performance Deep Dive [SER1504BU]
• Extreme Performance Series: Performance Best Practices [SER2724BU]
• Extreme Performance Series: vSAN Performance Troubleshooting [STO1515BU]
• Maximum Performance with Mark Achtemichuk [VIRT2368GU]