extreme performance series: sql server, oracle, and sap
TRANSCRIPT
#vmworld
BCA1482BU
Extreme Performance Series: SQL Server, Oracle, and SAP Monster DB VMs
Todd Muirhead, VMware, Inc.
#BCA1482BU
VMworld 2019 Content: Not for publication or distribution
©2019 VMware, Inc.
Disclaimer
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined.
The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.
Agenda
Monster Virtual Machines
History of Monsters
vSphere 6.7u2 – 256 vCPUs, EFI Firmware
Hyper-Threading vs Cores
Oracle
Scale Up and Generational
Persistent Memory (PMEM)
SQL Server
Scale Up
AMD EPYC Rome
SAP HANA
Native vs Virtual Performance
4 Generations of Server Performance
Best Practices for Monster VMs
History of Monster VMs: Marveling at the Passage of Time
Marvel release (year)                   vSphere version   Year   Max vCPUs   Max memory
Iron Man and Incredible Hulk (2008)     vSphere 3.5       2008   4           64 GB
No Marvel Movies… (2009)                vSphere 4.0       2009   8           256 GB
Iron Man 2 (2010)                       vSphere 4.1       2010   8           256 GB
Captain America and Thor (2011)         vSphere 5.0       2011   32          1 TB
The Avengers (2012)                     vSphere 5.1       2012   64          1 TB
Iron Man 3 and Thor 2 (2013)            vSphere 5.5       2013   64          1 TB
Avengers 2 – Age of Ultron (2015)       vSphere 6.0       2015   128         4 TB
Captain America Civil War (2016)        vSphere 6.5       2016   128         6 TB
Avengers 4 – Endgame (2019)             vSphere 6.7u2     2019   256         6 TB
vSphere 6.7u2 – More than an Update
vSphere 6.7u2 is a very big “update” release
Maximum number of vCPUs for a VM increased from 128 to 256
Specifically pulled in for support of large SAP HANA instances
Enables a single VM to use an entire current-generation 4-socket Skylake or Cascade Lake host, with 112 cores / 224 threads
Virtual Machine EFI Firmware
EFI type firmware must be specified to be able to use more than 128 vCPUs
Existing VMs cannot be switched to EFI firmware type – a reinstall of the OS is required to switch the Firmware type from BIOS to EFI
We had to reinstall all of the test VMs to be able to take existing Monster VMs beyond 128 vCPUs
Located in Edit Settings -> VM Options Tab -> Boot Options -> Firmware
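The two limits on these slides (a 256 vCPU ceiling per VM in vSphere 6.7u2, and EFI firmware required above 128 vCPUs) can be sketched as a small sizing check. The function and constant names below are hypothetical and chosen for illustration only; this is not a VMware API.

```python
# Hypothetical sizing helper; the names are illustrative, not a VMware API.
MAX_VCPUS_67U2 = 256   # per-VM vCPU maximum in vSphere 6.7u2 (from the slides)
BIOS_VCPU_LIMIT = 128  # above this, the VM must use EFI firmware (from the slides)


def host_threads(sockets: int, cores_per_socket: int, hyperthreading: bool = True) -> int:
    """Logical processors on a host: cores, doubled when Hyper-Threading is on."""
    cores = sockets * cores_per_socket
    return cores * 2 if hyperthreading else cores


def required_firmware(vcpus: int) -> str:
    """Return the firmware type a VM of this size needs."""
    if vcpus < 1 or vcpus > MAX_VCPUS_67U2:
        raise ValueError(f"vCPU count must be between 1 and {MAX_VCPUS_67U2}")
    return "efi" if vcpus > BIOS_VCPU_LIMIT else "bios-or-efi"


# The 4-socket, 28-cores-per-socket host described on the slides.
threads = host_threads(sockets=4, cores_per_socket=28)
print(threads, required_firmware(threads))
```

For the 4-socket host in the talk, a VM that uses every thread is above the 128 vCPU threshold, so it would have to be built with EFI firmware from the start.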
Sockets, Cores, Logical Processors
Processor counts continue to increase with each generation
• Hyper-Threading doubles the number of Logical Processors, but doesn't double performance
When sizing your VMs, CPU Cores is the most relevant value
• Hyper-Threading typically provides 15-20% more performance
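As a rough illustration of why cores, not logical processors, are the sizing unit, the sketch below converts a core count into "core equivalents" using the 15-20% Hyper-Threading uplift quoted above. The function name and the idea of a single uplift factor are assumptions made for illustration; real gains depend on the workload.

```python
def estimated_capacity(cores: int, ht_uplift: float = 0.15) -> dict:
    """Compare logical processors with estimated core equivalents.

    ht_uplift is the assumed fractional gain from Hyper-Threading; the slides
    quote a typical range of 0.15 to 0.20.
    """
    if cores < 1:
        raise ValueError("cores must be positive")
    if not 0.0 <= ht_uplift <= 1.0:
        raise ValueError("ht_uplift must be a fraction between 0 and 1")
    return {
        "logical_processors": cores * 2,
        "core_equivalents": cores * (1.0 + ht_uplift),
    }


low = estimated_capacity(112, 0.15)
high = estimated_capacity(112, 0.20)
print(low["logical_processors"],
      round(low["core_equivalents"], 1),
      round(high["core_equivalents"], 1))
```

Under these assumptions, doubling the logical processor count adds far less than double the capacity, which is why sizing by logical processors overstates what a host can deliver.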
Know Your NUMA: Right-Size VMs to Your Host Architecture
• NUMA = Non-Uniform Memory Access
• Physical hosts are divided into NUMA nodes = 1 CPU and its local memory
• NUMA allows CPUs to access local memory faster than non-local (remote) memory
• NUMA layout for a typical 4-socket Intel server:
– An example host with: 4 CPUs, 112 cores, 6 TB of RAM
– Each NUMA node has: 1 CPU, 28 cores, 1.5 TB of RAM
• If possible, right-size VMs (vCPU and vMem) to “fit” your underlying host architecture
– For Monster VMs, ESXi automatically creates virtual NUMA nodes for the VM and spans physical NUMA nodes
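The right-sizing advice above comes down to simple arithmetic: a VM spans as many NUMA nodes as its larger requirement (CPU or memory) demands. A minimal sketch, using the example host from the slide; the class and function names are hypothetical, and ESXi's actual NUMA placement logic is more involved than this.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaNode:
    cores: int
    memory_gb: float


def numa_nodes_needed(vcpus: int, vm_memory_gb: float, node: NumaNode) -> int:
    """Smallest number of physical NUMA nodes that can hold the VM.

    Sizes by cores (not hyper-threads), following the advice on the slides.
    """
    if vcpus < 1 or vm_memory_gb <= 0:
        raise ValueError("vcpus and vm_memory_gb must be positive")
    by_cpu = math.ceil(vcpus / node.cores)
    by_mem = math.ceil(vm_memory_gb / node.memory_gb)
    return max(by_cpu, by_mem)


# Example host from the slide: 4 nodes of 28 cores and 1.5 TB each.
node = NumaNode(cores=28, memory_gb=1536)
for vcpus, mem_gb in [(24, 512), (28, 2048), (112, 2048)]:
    print(vcpus, mem_gb, numa_nodes_needed(vcpus, mem_gb, node))
```

Note the middle case: a VM that fits one node by CPU can still spill onto a second node because of its memory size.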
Oracle Monster VM Testing
Testing Configuration
• Driver Server: 4 socket, VMware vSphere 6.7
• Ivy Bridge-EX Server: Xeon E7-4890 v2, VMware vSphere 6.5
• Haswell/Broadwell-EX Servers: Xeon E7-8890 v3, v4, VMware vSphere 6.7u2
• Skylake-SP Server: Xeon Platinum 8180, VMware vSphere 6.7u2
• Cascade Lake Server: Xeon Platinum 8280L, VMware vSphere 6.7u2
• Storage Array: DELL|EMC Unity Array
• Ethernet and Fibre Channel connectivity
Oracle Testing Details
vSphere 6.7u2
RHEL 7.5
Oracle 12c 12.1.0.2
256 GB of RAM
200 GB SGA
200 GB of Large Pages
DVD Store 3 with ~200 GB of data on disk
EFI type Firmware
All Flash Storage on EMC Unity Array – separate LUNs for log and data
DVD Store 3
Test workload was open source DVD Store 3 (github.com/dvdstore/ds3)
OLTP workload simulating an online store selling “DVDs”
Utilizes many database features including stored procedures, transactions, triggers, foreign keys, and full-text indexes
Version 3 includes customer reviews with intelligent review rankings
Measured in terms of Orders Per Minute
Each order is made up of a series of steps – login, browse for products, browse reviews, purchase products
Supports Oracle, SQL Server, and MySQL
Workload was run at increasing levels of load to find the highest performing test configuration
New and more complex queries make results not comparable with previous DVD Store versions
Cool esxtop Screenshot
Virtual Performance Grows with Hardware Performance Increases
[Chart: Generational Monster Oracle DB VM Performance; y-axis: Relative Orders Per Minute (OPM), 0 to 4.5; x-axis: five categories (1 to 5)]
Performance of Oracle Skylake vs Cascade Lake Monster VM
• Same number of cores - 112 vCPU VMs
• 12% more throughput for Cascade Lake
• Continued performance gain with new generation of servers – with same number of cores
[Chart: Oracle DVD Store 3 Skylake vs Cascade Lake Scale Up Single VM Performance; y-axis: Relative Orders Per Minute (OPM), 0 to 1.2]
Cascade Lake Oracle DVD Store 3 Single VM Scale Up
Scalability is good up to 112 vCPUs
Max Size VM for this host – 224 vCPUs
208 vCPUs outperforms 112 vCPUs by 3%
[Chart: Oracle DVD Store 3 Cascade Lake Scale Up Single VM Performance; y-axis: Orders Per Minute (OPM), 0 to 300,000; x-axis: six VM sizes (1 to 6)]
Persistent Memory - PMEM
• Persistent memory
• Has the characteristics of memory
– DRAM-like latency and bandwidth
– CPU can use regular load/store byte-addressable instructions
• Fully ACID Compliant - maintains data during power cycles
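To make "byte-addressable load/store" concrete, the sketch below memory-maps a file and updates a few bytes in place, rather than reading and writing whole blocks. This is only an analogy under stated assumptions: it uses an ordinary temporary file, not real persistent memory. On actual PMEM hardware, a DAX-mounted filesystem lets a similar mapping reach the NVDIMM directly; that is not demonstrated here.

```python
import mmap
import os
import tempfile

# Stand-in for a PMEM-backed file: an ordinary temp file, NOT real persistent memory.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * mmap.PAGESIZE)
    path = f.name

try:
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), mmap.PAGESIZE) as region:
            # Byte-addressable update: touch 5 bytes at offset 100 through the
            # mapping, with no explicit read()/write() of a whole block.
            region[100:105] = b"hello"
            region.flush()  # ask the OS to write the change back to the file

    # Re-open the file normally to confirm the bytes landed where expected.
    with open(path, "rb") as f:
        f.seek(100)
        data = f.read(5)
    print(data)
finally:
    os.remove(path)
```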
PMEM Access Models: Exposing PMEM to Virtual Machines
External Block Storage Model (vPMEMDISK)
• PMEM backed Datastore
• No modifications to application or database
• Existing applications or migrations to PMEM
Byte-Addressable Model (vPMEM)
• VM direct access to NVDIMMs
• OS requirements: Windows Server 2016, RHEL 7.5, CentOS 7.4
• Most performant model
[Diagram: the two access models side by side. External Device Block Storage Model: a vDISK on a VMDK datastore and a vPMEMDISK on a PMEM datastore, presented through vSCSI, with the NVDIMM on the memory bus. Byte-Addressable PMEM Model: vPMEM on a PMEM datastore (NVDIMM) on the memory bus.]
PMEM – Upgrade via Storage vMotion
Oracle Monster VM and vPMEM Disk
Moved Database Storage to vPMEM Disk
• Simple Storage vMotion
• 10% gain in throughput
• CPU Utilization and IOPS increased
• Seamless to OS and Application
• Increases performance gain for Cascade Lake to over 20% vs previous generation
[Chart: Performance with Oracle Database, Flash Storage vs vPMEM Disk; y-axis: Orders Per Minute (OPM)]
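The "over 20%" figure is consistent with compounding the two gains reported earlier: 12% from the Cascade Lake generation and 10% from moving storage to vPMEM disk. The sketch below shows that arithmetic; treating the two gains as multiplicative is an assumption, not something the slides state.

```python
def combined_gain(*gains: float) -> float:
    """Compound fractional gains, e.g. 0.12 and 0.10, into one overall gain."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


generation_gain = 0.12  # Cascade Lake vs Skylake at 112 vCPUs (from the slides)
pmem_disk_gain = 0.10   # flash storage vs vPMEM disk (from the slides)
print(f"{combined_gain(generation_gain, pmem_disk_gain):.1%}")
```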
SQL Server Monster VM Testing
Four Socket Skylake / Cascade Lake Server
• 28 cores per socket / 112 cores per host / 224 threads
Windows Server 2016
SQL Server 2017
Test Workload – Cloud Database Benchmark
• Transactional workload
• Output measured in terms of Transactions Per Second (TPS)
• Test database size was 1.5TB
Virtual Machine Configuration
• 2 TB of RAM
• 4 pvSCSI Adapters
• DELL|EMC Unity Array Flash Storage
Cool CDB / Matrix Screenshot
SQL Server CDB Test Results
[Chart: Monster SQL Server VM Scalability on vSphere 6.7u2 to 224 vCPUs; y-axis: Throughput (Transactions Per Second, TPS), 0 to 30,000; x-axis: Users, 1200 to 3000; series for 28, 56, 112, and 224 vCPUs; Cascade Lake = green, Skylake = blue]
Cascade Lake outperforms Skylake at all sizes tested
224 vCPUs is 17% better than 112 vCPUs
SQL Server Monster VM on AMD EPYC Rome
2 Socket AMD EPYC 7742
• 64 cores / 128 threads per socket
• 2.25GHz (Turbo Boost up to 3.4GHz)
• Each CCD has 8 cores
1 TB Micron Memory
• DDR4
• 3200MHz
Windows Server 2016 / SQL Server 2017
• 1 TB of vRAM
• DVD Store 3 workload
SQL Server Monster VM on AMD EPYC Rome
AMD EPYC Rome VM testing config
• 64 Cores per socket
• Tested 64 vCPUs and 128 vCPUs
Testing Results
• 86% Gain from 64 to 128 vCPUs
• NUMA nodes increased from 1 to 2
Observations
• Increasing the virtual NUMA to match EPYC CCDs – did not improve performance
• vSphere 6.7u3 introduces support for Rome
• 128 vCPUs is max for AMD as of 6.7u3
• There are a lot of cores!
[Chart: SQL Server with DVD Store 3 on AMD EPYC Rome Dual Socket Server; y-axis: Relative Orders Per Minute (OPM), 0 to 2; x-axis: the two VM sizes tested]
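One way to read the scale-up numbers in this talk is as scaling efficiency: the measured throughput ratio divided by the ideal linear ratio. The helper below is a hypothetical sketch of that calculation, fed with the relative gains quoted on the slides (86% for AMD EPYC Rome from 64 to 128 vCPUs, and 17% for SQL Server on Intel from 112 to 224 vCPUs).

```python
def scaling_efficiency(base_vcpus: int, base_throughput: float,
                       scaled_vcpus: int, scaled_throughput: float) -> float:
    """Fraction of ideal linear scaling achieved between two VM sizes."""
    if min(base_vcpus, scaled_vcpus) < 1:
        raise ValueError("vCPU counts must be positive")
    if min(base_throughput, scaled_throughput) <= 0:
        raise ValueError("throughput values must be positive")
    ideal = scaled_vcpus / base_vcpus
    actual = scaled_throughput / base_throughput
    return actual / ideal


# Relative throughput, with the smaller VM normalized to 1.00.
rome = scaling_efficiency(64, 1.00, 128, 1.86)    # second socket of real cores
intel = scaling_efficiency(112, 1.00, 224, 1.17)  # extra vCPUs are hyper-threads
print(f"EPYC Rome 64 -> 128 vCPUs: {rome:.1%}")
print(f"Intel 112 -> 224 vCPUs:    {intel:.1%}")
```

The gap between the two cases fits the earlier point about cores versus threads: going from 64 to 128 vCPUs on the EPYC host adds a second socket of physical cores, while going from 112 to 224 vCPUs on the Intel host adds only hyper-threads.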
SAP HANA Monster VMs
SAP HANA Overview
SAP HANA is an in-memory database
• Both real-time analytics & transaction workloads
• Large Configurations
• Database for next generation of SAP ERP
• Certified Solutions
– Partners work closely with SAP
– Certification includes Performance KPIs
– vSphere has been certified since 5.5 (2013)
• SAP HANA has been a key driver for increasing VM Size limits
SAP HANA Performance: Virtual vs Native
[Chart: SAP HANA Mixed Workload on 192 vCPUs; y-axis: Transactions/Hour; bars: Virtual (192 vCPUs) and Native (HT on, 192 threads)]
SAP HANA Mixed Workload Test
• OLTP and OLAP concurrently
• One Large 192 vCPU VM vs Native
• Both tests used full 4-socket host
Delta between virtual and native was ~5%
Tested with vSphere 6.7u2 on Broadwell four socket host with HT enabled
SAP HANA Generational Performance Gains
Virtual SAP HANA Performance continues to increase with new generations
• Haswell was limited to 128 vCPUs
• Cascade Lake outperformed Skylake by 6% with same number of vCPUs
[Chart: Server Generational SAP HANA Monster VM Performance Gains; y-axis: Transactions Per Hour; x-axis: four categories (1 to 4)]
Best Practices – Monster VMs
CPU Best Practices
First, size the CPU resources a VM needs – then look at NUMA node sizes
• Monster DB VMs that need to span NUMA nodes (aka “wide” VMs) perform great, so if you need more vCPUs than one socket provides, increase the vCPU count!
• If you don’t need that much CPU, then it can be more efficient to keep things within a NUMA node
Run monster DB VMs on the newest servers for best performance
Do not pin VMs (CPU affinity)
Enable Hyper-Threading (usually BIOS default)
Leave Latency Sensitivity at default value of Normal
Use High Performance Power Policy for Best Performance
Memory Best Practices
Use large memory pages
• Virtual databases benefit from large pages more than most applications due to memory usage patterns
Set memory reservation
• Equal to Oracle SGA / SQL Server active memory
• Full reservation for SAP HANA
Size VM memory based on NUMA node memory size
Storage Best Practices
Configure storage carefully and properly
• Database VMs need dedicated storage LUNs
• Work with the storage administrator closely on storage configuration
Use flash or SSDs if possible - vSAN
Virtual SCSI adapters should always be paravirtual SCSI (pvSCSI)
Use multiple pvSCSI adapters – spread data disks across them
If using iSCSI or NFS based storage, enable jumbo frames
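The advice to spread data across multiple pvSCSI adapters can be pictured as a round-robin assignment of virtual disks to adapters. The sketch below assumes four adapters, matching the SQL Server test VM earlier in the talk; the adapter labels, disk names, and function are hypothetical and do not correspond to any vSphere API.

```python
from collections import defaultdict


def spread_disks(disks: list, adapters: int = 4) -> dict:
    """Assign virtual disks to pvSCSI adapters in round-robin order."""
    if adapters < 1:
        raise ValueError("need at least one adapter")
    layout = defaultdict(list)
    for i, disk in enumerate(disks):
        layout[f"pvscsi{i % adapters}"].append(disk)
    return dict(layout)


# Hypothetical disk names for a database VM.
disks = ["data1", "data2", "data3", "data4", "log1", "tempdb1"]
for adapter, assigned in spread_disks(disks).items():
    print(adapter, assigned)
```

A real layout would likely also weigh I/O patterns, for example keeping log and data disks on different adapters, rather than using pure round-robin.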
Oracle Best Practices
Use Oracle Best Practices within the VM
• Use the Oracle installer RPM to prep the VM with Oracle's best practices
• Evaluate the use of NUMA before enabling (Default is off)
• Use Linux Hugepages (sometimes called large pages)
– Enable at the OS level
– Oracle memory management parameters
Enable virtual NUMA for the VM if it is wide
• Even though Oracle is not using NUMA, performance was still better with vNUMA than without
Refer to Oracle Databases on VMware Best Practices Guide for more information
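Enabling Hugepages at the OS level starts with working out how many pages the SGA needs. A minimal sketch of that arithmetic, assuming the common 2 MB huge page size on x86-64 Linux (check `Hugepagesize` in `/proc/meminfo` on the actual system); the function name is hypothetical, and in practice a small margin above the SGA size is often added.

```python
import math


def hugepages_for_sga(sga_gb: float, hugepage_mb: int = 2) -> int:
    """Number of huge pages needed to back an SGA of the given size."""
    if sga_gb <= 0 or hugepage_mb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(sga_gb * 1024 / hugepage_mb)


# The test VM in this talk used a 200 GB SGA backed by 200 GB of large pages.
pages = hugepages_for_sga(200)
print(f"vm.nr_hugepages = {pages}")
```

The printed value is the setting for the Linux `vm.nr_hugepages` sysctl; the Oracle-side memory parameters mentioned above still need to be set separately.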
SQL Server Best Practices
SQL Server is largely auto-tuning; newer releases better utilize NUMA
• “Apply SQL Server 2016 and SQL Server internally leverages SOFT NUMA partitioning to achieve double digit performance gains.”
Source: https://blogs.msdn.microsoft.com/psssql/2016/03/30/sql-2016-it-just-runs-faster-automatic-soft-numa/
• However, some tuning is still advisable:
– Max Degree of Parallelism – MAXDOP
– Cost Threshold for Parallelism – Increase from default of 5
– Enable Large Pages – Trace Flag 834, plus the Lock Pages in Memory right for the SQL Server account
• 5 SQL Server Settings to Change: https://www.brentozar.com/archive/2013/09/five-sql-server-settings-to-change/
• Microsoft SQL Server on VMware vSphere Best Practices Guide: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf
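The MAXDOP and Cost Threshold for Parallelism settings are changed with the `sp_configure` system stored procedure. The sketch below only builds the T-SQL text; it does not connect to a server. The helper name and the example values (8 and 50) are placeholders for illustration, not recommendations from this talk, so pick values based on the guides linked above.

```python
def sp_configure_script(maxdop: int, cost_threshold: int) -> str:
    """Build a T-SQL script that sets MAXDOP and cost threshold for parallelism."""
    if maxdop < 0 or cost_threshold < 0:
        raise ValueError("settings must be non-negative")
    lines = [
        # Both options are advanced options, so expose them first.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max degree of parallelism', {maxdop};",
        f"EXEC sp_configure 'cost threshold for parallelism', {cost_threshold};",
        "RECONFIGURE;",
    ]
    return "\n".join(lines)


# Placeholder values only; tune for the actual workload and NUMA layout.
script = sp_configure_script(maxdop=8, cost_threshold=50)
print(script)
```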
Ensure your SQL Server license allows for the scale you need
• Web/Express/Standard have limitations vs. Developer/Enterprise Editions:
Source: https://www.microsoft.com/en-us/cloud-platform/sql-server-editions
Key Points / Conclusions
Monster Database VMs (Oracle, SQL Server, and SAP HANA) can be run successfully with high performance using the latest server technologies on vSphere.
vSphere 6.7u2 provides support for up to 256 vCPUs, allowing a single VM to be configured to use all CPU resources of current-generation 4-socket servers.
The Intel Cascade Lake platform provides a performance boost over the previous generation, with more performance per core and PMEM.
Support for AMD EPYC Rome is introduced in vSphere 6.7u3, and it provides good performance for databases.
Extreme Performance Series: Sessions
HBI2526BU Performance Best Practices
BCA1482BU SQL Server, Oracle, and SAP Monster Database VMs
HBI2090BU vSphere Compute & Memory Schedulers
HBI2880BU DRS 2.0 Performance Deep Dive
HBI2090BU vSphere & Intel Optane DC PMEM=Max Performance
BCA1430BU Accelerating Application/Database Performance In the Self-Learning SDDC
HBI1421BU Innovations in vMotion: Features, Performance and Best Practices
BCA1393BU SAP HANA on vSphere 6.7u2 and Intel Cascade Lake Best Practices
MLA1594BU Optimize Virtualized Deep Learning Performance with New Intel Architectures
BCA1332BU The Different Forms of Machine Learning: How They Fit with VMware
BCA2551BU Low Latency Media & Entertainment Workloads
HCI1606BU SAP HANA on vSAN: Best Practice Recommendations and Lessons Learned
HCI1619BU Troubleshooting performance issues with vSAN Performance Diagnostics
BCA1563BU High Performance Virtualized Spark Clusters on Kubernetes for Deep Learning
Extreme Performance Series: Hands On Labs
SPL-2004-01-SDC Mastering vSphere Performance
SPL-2004-02-CHG vSphere Challenge Lab
ELW-2004-01-SDC Expert Led Workshop: Mastering vSphere Performance
SPL-2047-01-EMT Accelerate Machine Learning in vSphere Using GPUs
SPL-2048-01-EMT Launch Your Machine Learning Workloads in Minutes on vSphere
ELW-2048-02-EMT Expert Led Workshop: Launch Your Machine Learning Workloads in Minutes on vSphere & Accelerate them using GPUs
SURVEY: The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools and interfaces, and where we can improve:
www.vmware.com/go/perf