KVM Forum 2011: Performance Improvements and Optimizations
TRANSCRIPT
-
7/31/2019 Kvm Forum 2011 Performance Improvements Optimizations D
1/41
1
KVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS
Mark Wagner, Principal SW Engineer, Red Hat
August 14, 2011
Overview
Discuss a range of topics about KVM performance
How to improve the out-of-the-box experience, crammed into 30 minutes
Use libvirt where possible
Note that not all features are in all releases
Before we dive in...
[Chart: Guest NFS Write Performance - are we sure? Is this really a 10Gbit line? Throughput (MBytes/second) for the RHEL6 default; by default the rtl8139 device is chosen, and an arrow shows the improvement available.]
Agenda
Low-hanging fruit
Memory
Networking
Block I/O basics
NUMA and affinity settings
CPU settings
Wrap up
Recent Performance Improvements
Performance enhancements in every component

Component  | Feature
CPU/Kernel | NUMA; ticketed spinlocks; Completely Fair Scheduler; extensive use of Read-Copy-Update (RCU); scales up to 64 vCPUs per guest
Memory     | Large-memory optimizations: Transparent Huge Pages are ideal for hardware-based virtualization
Networking | vhost-net, a kernel-based virtio with better throughput and latency; SR-IOV for near-native performance
Block      | AIO, MSI, scatter-gather
Remember this ?
[Chart: Guest NFS Write Performance, RHEL6 default. Throughput (MBytes/second); impact of not specifying the OS at guest creation.]
Be Specific !
virt-manager will:
Make sure the guest will function
Optimize as it can
The more info you provide, the more tailoring will happen
Specify the OS details
Specify OS + flavor
Specifying Linux will get you:
The virtio driver
If the kernel is recent enough, the vhost_net drivers
I Like This Much Better
[Chart: Guest NFS Write Performance, impact of specifying OS type at creation. Throughput (MBytes/second) for default vs vhost/virtio: a 12.5x improvement.]
Memory Tuning - Huge Pages
2M pages vs. the standard 4K Linux page
The virtual-to-physical page map is 512 times smaller
The TLB can map more physical memory, resulting in fewer misses
Traditional huge pages are always pinned
We now have Transparent Huge Pages (THP)
Most databases support huge pages
Benefits not only the host but also guests
Try them in a guest too!
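The 512x figure above is just the ratio of the two page sizes. A quick host-side sketch; the sysfs path is the standard THP location on kernels of that era (some RHEL 6 builds use redhat_transparent_hugepage instead), and the root-only steps are shown commented:

```shell
# A 2M huge page covers 512 standard 4K pages, so the
# virtual-to-physical map needs 512x fewer entries:
ratio=$((2 * 1024 * 1024 / 4096))
echo "page map shrinks by ${ratio}x"

# Inspect / enable Transparent Huge Pages (run as root):
#   cat /sys/kernel/mm/transparent_hugepage/enabled
#   echo always > /sys/kernel/mm/transparent_hugepage/enabled
```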
Transparent Huge Pages
[Chart: SPECjbb workload, 24-CPU / 24-vCPU Westmere EP, 24 GB guest. Transactions per minute for No-THP, THP, and bare metal; annotated gains of 30% and 25%.]
Network Tuning Tips
Separate networks for different functions
Use arp_filter to prevent ARP flux:
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
Use /etc/sysctl.conf to make it permanent
Packet size - MTU:
Make sure it is set consistently across all components
Intra-box communication doesn't need the hardware to bridge it
VM traffic never hits the hardware on the same box
So you can really kick up the MTU as needed
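The tips above as a host-side sketch; the bridge name br0 and the 9000-byte MTU are illustrative choices, not from the slides:

```shell
# Prevent ARP flux on multi-homed hosts (runtime setting):
echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

# Make it permanent across reboots:
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.conf.all.arp_filter = 1
EOF

# Raise the MTU for guest-to-guest traffic that never leaves the box.
# Every component on the path (bridge, tap, guest NIC) must agree:
ip link set dev br0 mtu 9000
```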
KVM Network Architecture - VirtIO
The virtual machine sees a paravirtualized network device (VirtIO)
VirtIO drivers are included in the Linux kernel
VirtIO drivers are available for Windows
The network stack is implemented in userspace
KVM Network Architecture - VirtIO
[Diagram: packet path with a context switch between host kernel and userspace]
Network Latency - virtio
[Chart: latency comparison, guest receive (lower is better). Latency (usecs) vs message size (bytes, 1 to 65536) for virtio vs host; a 4x gap in latency.]
KVM Network Architecture - vhost_net
Moves the QEMU network stack from userspace into the kernel
Improved performance:
Lower latency
Reduced context switching
One less copy
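A minimal sketch of a guest NIC definition that gets the vhost-net path; the bridge name is illustrative, and libvirt normally selects vhost automatically when /dev/vhost-net exists and the model is virtio, so the explicit driver element is belt-and-braces:

```shell
# Fragment for the guest XML (edit with: virsh edit <guest>):
cat <<'EOF'
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
  <driver name='vhost'/>
</interface>
EOF
```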
Network Latency - vhost_net
[Chart: latency comparison, guest receive (lower is better). Latency (usecs) vs message size (bytes) for virtio, vhost_net, and host; vhost_net latency is much closer to bare metal.]
Host CPU Consumption - virtio vs vhost_net
[Chart: host CPU consumption, virtio vs vhost_net, 8 guests, TCP receive. % total host CPU (lower is better) by message size (32 to 32768 bytes), broken into %usr, %soft, %guest, %sys; each pair of columns is one data set, and the major difference is usr time.]
vhost_net Efficiency
[Chart: 8-guest scale-out RX, vhost_net vs virtio. Mbit per % host CPU (bigger is better), netperf TCP_STREAM, by message size (32 to 65507 bytes).]
KVM Architecture - Device Assignment vs SR-IOV
[Diagram: device assignment and SR-IOV data paths]
KVM Network Architecture - PCI Device Assignment
The physical NIC is passed directly to the guest
The device is not available to anything else on the host
The guest sees the real physical device
So it needs the physical device driver
Requires hardware support:
Intel VT-d or AMD IOMMU
You lose hardware independence
1:1 mapping of NIC to guest
BTW, this also works on some I/O controllers
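A sketch of handing a NIC to a guest with libvirt; the PCI address 0000:01:00.0 is illustrative (find yours with lspci):

```shell
# Detach the device from the host driver:
virsh nodedev-dettach pci_0000_01_00_0

# Hostdev fragment for the guest XML:
cat <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
EOF
```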
KVM Network Architecture - SR-IOV
Single Root I/O Virtualization
A new class of PCI devices that present multiple virtual devices, each appearing as a regular PCI device
The guest sees a real physical device
It needs the physical (virtual function) device driver
Requires hardware support
The actual device can still be shared
Low overhead, high throughput
No live migration - well, it's difficult
You lose hardware independence
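A sketch of creating virtual functions on an Intel 10Gb (ixgbe) adapter of that era; the max_vfs module parameter and the count of 7 are illustrative and driver-specific:

```shell
# Load the physical-function driver with virtual functions enabled:
modprobe ixgbe max_vfs=7

# Each VF appears as its own PCI device, assignable to a guest
# like any other passthrough device:
lspci | grep -i "Virtual Function"
```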
Latency comparison
[Chart: network latency by guest interface method, guest receive (lower is better). Latency (usecs) vs message size (bytes) for virtio, vhost_net, SR-IOV, and host; SR-IOV latency is close to bare metal.]
KVM w/ SR-IOV: Intel Niantic 10Gb, Postgres DB
[Chart: DVD Store Version 2 results, total orders per minute (OPM). Bridged KVM guest: 69,984 (76% of bare metal); SR-IOV KVM guest: 86,469 (93% of bare metal); one database instance on bare metal: 92,680.]
I/O Tuning - Hardware
Know your storage:
SAS or SATA? Fibre Channel, Ethernet, or SSD? Bandwidth limits?
Multiple HBAs:
device-mapper-multipath provides multipathing capabilities and LUN persistence
How to test:
low-level I/O tools such as dd, iozone, dt
I/O Tuning - Understanding I/O Elevators
Deadline
Two queues per device, one for reads and one for writes
I/Os dispatched based on time spent in the queue
CFQ
Per-process queues
Each process's queue gets a fixed time slice (based on process priority)
Noop
FIFO with simple I/O merging
Lowest CPU cost
Can be set at boot time:
grub command line: elevator=deadline|cfq|noop
Or dynamically, per device:
echo deadline > /sys/class/block/sda/queue/scheduler
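A sketch of checking and switching the elevator per device; sda is illustrative, and the currently active elevator is the bracketed entry in the sysfs file:

```shell
# Show the available schedulers; the active one is in brackets:
cat /sys/class/block/sda/queue/scheduler

# Switch at runtime (takes effect immediately, lost on reboot):
echo deadline > /sys/class/block/sda/queue/scheduler

# Or set it for all devices at boot by appending to the kernel
# line in grub.conf:   elevator=deadline
```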
Virtualization Tuning - I/O Elevators, OLTP
[Chart: performance impact of I/O elevators on an OLTP workload with 1, 2, and 4 guests, host running the deadline scheduler. Transactions per minute for noop, CFQ, and deadline.]
Virtualization Tuning - Caching
cache=none
I/O from the guest is not cached on the host
cache=writethrough
I/O from the guest is cached and written through on the host
Potential scaling problems with this option with multiple guests (host CPU is used to maintain the cache)
cache=writeback - not supported
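The cache mode is set per disk in the guest XML; a minimal sketch, where the file path and target device name are illustrative:

```shell
# Disk fragment for the guest XML with the cache mode made explicit:
cat <<'EOF'
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
EOF
```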
Effect of I/O Cache settings on Guest performance
[Chart: OLTP-like workload on FusionIO storage, 1 vs 4 guests. Transactions per minute for cache=writethrough vs cache=none.]
I/O Tuning - Filesystems
Configure read-ahead:
databases (parameters to configure read-ahead)
block devices (getra, setra)
Asynchronous I/O:
eliminates synchronous I/O stalls
critical for I/O-intensive applications
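A sketch of the block-device read-ahead tuning mentioned above; /dev/sdb and the 8192-sector value are illustrative (units are 512-byte sectors):

```shell
blockdev --getra /dev/sdb        # current read-ahead, in sectors
blockdev --setra 8192 /dev/sdb   # raise it to ~4 MB
```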
AIO Native vs Threaded (default)
[Chart: impact of AIO selection on an OLTP workload, cache=none (threaded is the default), 10 vs 20 users (x 100). Transactions per minute for threaded AIO vs native AIO.]
Configurable per device (only via the libvirt XML configuration file):
<driver name='qemu' type='raw' cache='none' io='native'/>
Remember Network Device Assignment ?
Device assignment works for block too!
Device specific
Similar benefits - and drawbacks...
Block Device Passthrough - SAS Workload
[Chart: SAS mixed analytics workload, bare metal vs KVM. Intel Westmere EP 12-core, 24 GB memory, LSI 16 SAS. Time to complete (secs) for SAS system and SAS total: KVM VirtIO 25% longer than bare metal, KVM PCI passthrough 6% longer.]
NUMA (Non-Uniform Memory Access)
Multi-socket, multi-core architectures
NUMA is needed for scaling
Keep memory latencies low
Linux is completely NUMA-aware
Additional performance gains by enforcing NUMA placement
Still, some out-of-the-box work is needed
How to enforce NUMA placement:
numactl - CPU and memory pinning
One way to test whether you get a gain is to deliberately mistune it
libvirt now supports some NUMA placement
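A sketch of numactl pinning, including the deliberate-mistuning trick mentioned above; the node numbers and the workload name are illustrative:

```shell
# Pin a workload (or a qemu-kvm process) to one NUMA node so its
# CPUs and its memory stay local:
numactl --cpunodebind=0 --membind=0 ./my_workload

# To deliberately mistune it for comparison, put CPUs and memory on
# distant nodes and watch the throughput drop:
numactl --cpunodebind=0 --membind=7 ./my_workload
```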
Memory Tuning - NUMA

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 8189 MB
node 0 free: 7220 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 8192 MB
...
node 7 cpus: 42 43 44 45 46 47
node 7 size: 8192 MB
node 7 free: 7816 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  16  22  22  16
  2:  16  22  10  16  16  16  16  16
  3:  22  16  16  10  16  16  22  22
  4:  16  16  16  16  10  16  16  22
  5:  22  22  16  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  16  22  22  16  16  10

Internode memory distances come from the SLIT table; note the variation in internode distances.
Virtualization Tuning - Using NUMA
[Chart: impact of NUMA in multi-guest OLTP ("location, location, location"). Transactions per second, stacked by guest, for 4 guests (24 vCPU, 56 GB each) with and without NUMA pinning.]
Specifying Processor Details
Mixed results with CPU type and topology
The Red Hat team is still exploring some topology performance quirks
Both model and topology
Experiment and see what works best in your case
CPU Pinning - Affinity
virt-manager allows CPU selection based on NUMA topology
True NUMA support is in libvirt
virsh pinning allows finer-grained control
1:1 pinning
Good gains with pinning
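A sketch of 1:1 pinning with virsh; the guest name myguest and the CPU numbers are illustrative:

```shell
# Pin guest vCPUs 1:1 onto host CPUs (ideally within one NUMA node):
virsh vcpupin myguest 0 6
virsh vcpupin myguest 1 7

# Verify the placement:
virsh vcpuinfo myguest
```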
Performance monitoring tools
Monitoring tools: top, vmstat, ps, iostat, netstat, sar, perf
Kernel tools: /proc, sysctl, AltSysrq
Networking: ethtool, ifconfig
Profiling: oprofile, strace, ltrace, systemtap, perf
Wrap up
KVM can be tuned effectively
Understand what is going on under the covers
Turn off stuff you don't need
Be specific when you create your guest
Look at using NUMA or affinity
Choose appropriate elevators (deadline vs CFQ)
Choose your cache wisely
For More Information
KVM Wiki
http://www.linux-kvm.org/page/Main_Page
IRC, email lists, etc.
http://www.linux-kvm.org/page/Lists%2C_IRC
libvirt Wiki
http://libvirt.org/
New, revamped edition of the Virtualization Guide
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/index.html
Should be available soon!