Red Hat Enterprise Virtualization Performance
Mark Wagner, Senior Principal Engineer, Red Hat
June 13, 2013
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
COMPLETE DATACENTER VIRTUALIZATION SOLUTION
● Leading performance: Top virtualization benchmarks for performance and scalability
● Affordable: Lower TCO and higher ROI than competitive platforms
● Enterprise-ready: Powerful mix of enterprise features and a rich set of partners
● Open: Offers choice and interoperability with no proprietary lock-in
● Cross-platform: Optimized for Microsoft Windows and Linux guests
ENTERPRISE VIRTUALIZATION FROM THE PEOPLE WHO BROUGHT YOU RED HAT ENTERPRISE LINUX
RED HAT ENTERPRISE VIRTUALIZATION ARCHITECTURE
● Inherits performance, scalability, security and supportability of Red Hat Enterprise Linux
● Shares Red Hat Enterprise Linux hardware and software ecosystem
● Host: 160 logical CPU (4,096 theoretical max), 2 TB RAM (64TB theoretical max)
● Guest: 160 vCPU, 2 TB RAM
● Supports latest silicon virtualization technology
● Microsoft certified for Windows guests
SMALL-FORM-FACTOR, SCALABLE, HIGH-PERFORMANCE HYPERVISOR BASED ON RED HAT ENTERPRISE LINUX
RED HAT ENTERPRISE VIRTUALIZATION HYPERVISOR/KVM OVERVIEW
INDUSTRY LEADERS IN INFRASTRUCTURE, NETWORKING, AND STORAGE ARE BACKING RED HAT ENTERPRISE VIRTUALIZATION
INDUSTRY LEADERSHIP: THE ONLY END-TO-END OPEN VIRTUALIZATION INFRASTRUCTURE
SPECvirt2010: RHEL 6 KVM Posts Industry-Leading Results
http://www.spec.org/virt_sc2010/results/
[Diagram: SPECvirt_sc2010 test setup, with client hardware driving the system under test (SUT) through the virtualization layer and hardware; blue = disk I/O, green = network I/O. The SUT sustains more than 1 SPECvirt tile per core.]
Key Enablers:
● SR-IOV
● Huge Pages
● NUMA
● Node Binding
Best SPECvirt_sc2010 Scores by CPU Cores (as of May 30, 2013)

[Bar chart, grouped by socket/core count (2-socket 12, 2-socket 16, 2-socket 20, 4-socket 40, 8-socket 64/80); systems and SPECvirt_sc2010 scores in chart order:]
● VMware ESX 4.1, HP DL380 G7 (12 cores, 78 VMs): 1,221
● RHEL 6 (KVM), IBM HS22V (12 cores, 84 VMs): 1,367
● VMware ESXi 5.0, HP DL385 G7 (16 cores, 102 VMs): 1,570
● RHEV 3.1, HP DL380p Gen8 (16 cores, 150 VMs): 2,442
● VMware ESXi 4.1, HP BL620c G7 (20 cores, 120 VMs): 1,878
● RHEL 6 (KVM), IBM HX5 w/ MAX5 (20 cores, 132 VMs): 2,144
● VMware ESXi 4.1, HP DL380 G7 (12 cores, 168 VMs): 2,742
● VMware ESXi 4.1, IBM x3850 X5 (40 cores, 234 VMs): 3,824
● RHEL 6 (KVM), HP DL580 G7 (40 cores, 288 VMs): 4,682
● RHEL 6 (KVM), IBM x3850 X5 (64 cores, 336 VMs): 5,467
● RHEL 6 (KVM), HP DL980 G7 (80 cores, 552 VMs): 8,956

Comparison based on the best-performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the benchmark name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.
SPECvirt2010: Red Hat Owns Industry Leading Results
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Features that help with Performance
Use these features to help improve Guest Performance
● Host CPU
● CPU Pin
● Hooks
● Direct LUN
● Huge Pages
● Migration
● Numad
● MTU
Features that help with Performance
Use Host CPU
● Pros
  ● Allows the guest to use hardware features of the CPU
  ● Can provide good performance gains
● Cons
  ● Prevents migration
(A rough libvirt equivalent is sketched below.)
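For reference, outside the RHEV UI this option roughly corresponds to libvirt CPU passthrough on the KVM host. A minimal sketch, with a hypothetical guest name:

    # virsh edit myguest, then in the domain XML:
    #   <cpu mode='host-passthrough'/>
    # The guest then sees the host CPU's model and feature flags, at the cost
    # of tying the guest to hosts with identical CPUs (hence no migration).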
RHEV – CPU Pinning
[Chart: RHEL 6.4 single large guest running a parallel OpenMP benchmark, Linpack (Intel N×N @ 20000) in a 2-node KVM guest on Intel Sandy Bridge (8 cores / 16 CPUs). Gflops vs. Linpack thread count (1, 2, 4, 8, 16) for KVM 6.4, KVM 6.4 with host CPU type, and 16-CPU bare metal.]
Features that help with Performance
CPU Pin
● Helps keep data cache lines hot
● Keeps the host scheduler from moving guests around
● Improved NUMA locality
  ● If you pin correctly... (see the sketch below)
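A minimal sketch of pinning from the host with virsh; the guest name and CPU numbers are assumptions for illustration (RHEV exposes the same thing through the VM's CPU-pinning topology setting):

    virsh vcpupin rhel6vm 0 2    # pin vCPU 0 to host CPU 2
    virsh vcpupin rhel6vm 1 3    # pin vCPU 1 to host CPU 3
    virsh vcpuinfo rhel6vm       # verify placement and CPU affinity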
RHEV – CPU Pinning
[Chart: total transactions per minute vs. user-set scaling (20U, 60U, 100U) for 4 guests on 2 hosts, out of the box vs. manual pinning.]
Features that help with Performance
A few others
● Hooks
  ● The hook mechanism has been around for a long time
  ● Some items move from hook to feature
● Direct LUN
● SR-IOV is currently one of the more important hooks
Features that help with Performance
A few others
● Direct LUN
  ● Allows you to use directly attached storage
  ● Typically higher performance
● Standard huge pages (2 MB)
  ● Reserve/free via:
    ● /proc/sys/vm/nr_hugepages
    ● /sys/devices/system/node/*/hugepages/*/nr_hugepages
  ● Used via hugetlbfs
● GB huge pages (1 GB)
  ● Reserved at boot time; cannot be freed
  ● Used via hugetlbfs
● Transparent huge pages (2 MB)
  ● On by default; controlled via boot args or /sys
  ● Used for anonymous memory
(A short sketch of reserving huge pages follows.)
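A minimal sketch of the 2 MB case on the host; the page count is an assumption, sized for roughly one 4 GB guest:

    echo 2048 > /proc/sys/vm/nr_hugepages      # reserve 2048 x 2 MB pages
    grep HugePages /proc/meminfo               # verify the reservation
    mount -t hugetlbfs none /dev/hugepages     # expose them via hugetlbfs
    # 1 GB pages must instead be reserved on the kernel command line, e.g.:
    #   hugepagesz=1G hugepages=4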
Features that help with Performance
[Diagram: virtual address space translated to physical memory through the TLB, which holds on the order of 128 data and 128 instruction entries; huge pages stretch how much memory those entries can cover.]
RHEV – Huge Pages in Guest
[Chart: impact of huge pages in the guest; total transactions per minute vs. user-set scaling (20U, 60U, 100U), regular vs. huge pages. Roughly 10-15% improvement with huge pages.]
Features that help with Performance
Migration support
● Under the Cluster -> Policy settings
  ● Can set duration and CPU load thresholds
  ● Moves VMs when limits are hit
● Useful for:
  ● Maintenance
  ● Power savings
  ● Load balancing
Migration for Power Savings
Migration for Performance
Tuning for Migration
[Chart: live migration without tuning; transactions per minute (TPM) over time for the TPM-RR and TPM – LM 32 runs. Note: due to the high load, the migration did not finish.]
Tuning for Migration
Check vdsm defaults
● /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample
    # Maximum bandwidth for migration, in MiBps, 0 means
    # libvirt's default, since 0.10.x default in libvirt is unlimited
    # migration_max_bandwidth = 32
● Edit /etc/vdsm/vdsm.conf
  ● Verify parameters are in the correct section
● Restart the vdsm daemon for the changes to take effect:
  ● service vdsmd restart
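A minimal sketch of lifting the cap; the [vars] section name is taken from vdsm.conf.sample and may differ in your vdsm version:

    # /etc/vdsm/vdsm.conf
    [vars]
    migration_max_bandwidth = 0    # 0 = libvirt's default (unlimited since libvirt 0.10.x)

    # then apply:
    service vdsmd restart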
Tuning for Migration
[Chart: impact of tuning on live migration; TPM over time for the TPM-RR, TPM – LM 32, and TPM – UL runs. With tuning, the migration completes in approximately 1 minute.]
RHEV – Migration for Even Distribution
[Chart: transactions per minute for Guest 1 through Guest 4 and the aggregate of all 4 guests, without migration vs. auto migration. The host policy was set to 51%; guest migration started automatically, resulting in overall higher performance as both hosts were utilized. The single guest migration completed in approximately one minute.]
Four-NUMA-node system, fully connected topology
[Diagram: four NUMA nodes (0-3), each with four cores, an L3 cache, local RAM, and QPI links/IO; every node is connected to every other node.]
Sample remote access latencies (relative to local access):
● 4 socket / 4 node: 1.5x
● 4 socket / 8 node: 2.7x
● 8 socket / 8 node: 2.8x
● 32-node system: 5.5x
  ● 30 of 32 inter-node latencies are >= 4x
  ● Relative latencies across the 32x32 node pairs: 10 (32/1024: 3.1%), 13 (32/1024: 3.1%), 40 (64/1024: 6.2%), 48 (448/1024: 43.8%), 55 (448/1024: 43.8%)
You can inspect these distances on your own hardware as shown below.
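A quick check (output shape varies by platform):

    numactl --hardware    # prints per-node CPUs, memory sizes, and the node distance matrix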
So, what's the NUMA problem?
● The Linux system scheduler is very good at maintaining responsiveness and optimizing for CPU utilization
● It tries to use idle CPUs regardless of where process memory is located, but using remote memory degrades performance!
● Red Hat is working with the upstream community to increase NUMA awareness of the scheduler and to implement automatic NUMA balancing.
● Remote memory latency matters most for long-running, significant processes, e.g., HPTC, VMs, etc.
numad can help improve NUMA performance
● New RHEL 6.4 user-level daemon that automatically improves out-of-the-box NUMA system performance and balances NUMA usage in dynamic workload environments
● Was a tech preview in RHEL 6.3
● Not enabled by default
● See numad(8)
(A minimal sketch of enabling it follows.)
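A minimal sketch on a RHEL 6.4 host:

    yum install numad
    service numad start    # start the daemon now
    chkconfig numad on     # keep it enabled across reboots
    numastat qemu-kvm      # optional: check per-node memory placement of guests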
numad aligns process memory and CPU threads within nodes
[Diagram: before numad, processes 19, 29, 37, and 61 each have threads and memory spread across nodes 0-3; after numad, each process is consolidated onto a single node.]
RHEV – hand tuning vs numad
[Chart: total transactions per minute vs. user-set scaling (20U, 60U, 100U) for untuned, manual pin, and numad configurations.]
numad gives the same performance improvements as manual pinning and also allows migration.
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Tuning
Tuning for both the hypervisor and guest
● Already covered vdsm
● tuned
● Kernel
● MTU
tuned Profile Comparison Matrix
Profiles: default, enterprise-storage, virtual-host, virtual-guest, latency-performance, throughput-performance
● kernel.sched_min_granularity_ns: 4 ms default; 10 ms in the tuned profiles
● kernel.sched_wakeup_granularity_ns: 4 ms default; 15 ms in the tuned profiles
● vm.dirty_ratio: 20% of RAM default; 40% in most profiles, lowered to 10% by virtual-host
● vm.dirty_background_ratio: 10% of RAM default; 5% in virtual-host
● vm.swappiness: 60 default; 10 in virtual-host, 30 in virtual-guest
● I/O scheduler (elevator): CFQ default; deadline in all tuned profiles
● Filesystem barriers: on by default; off in the storage- and throughput-oriented profiles
● CPU governor: ondemand default; performance in the performance-oriented profiles
● Disk read-ahead: raised 4x in the storage-oriented profiles
● Disable THP: yes, in one profile
● Disable C-states: yes (latency-performance)
https://access.redhat.com/site/solutions/369093
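Applying and checking a profile is one command each:

    tuned-adm list                  # show available profiles
    tuned-adm profile virtual-host  # e.g., on a RHEV hypervisor
    tuned-adm active                # confirm which profile is in effect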
Load Balancing
● The RHEL scheduler tries to keep all CPUs busy by moving tasks from overloaded CPUs to idle CPUs
● You can detect this using "perf stat"; look for excessive "migrations"
● Issues arise on larger systems where the scheduler is a bit too active
● Can tune sched_migration_cost to help calm the scheduler down
● This is especially effective on multi-socket systems
Load Balancing
● /proc/sys/kernel/sched_migration_cost
  ● Amount of time after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated, so increasing this variable reduces task migrations. The default value is 500000 (ns).
  ● If the CPU idle time is higher than expected when there are runnable processes, try reducing this value. If tasks bounce between CPUs or nodes too often, try increasing it.
● Rule of thumb: increase by 2-10x to reduce load balancing
● Increase by 10x on large systems when many cgroups are actively used (e.g., RHEV/KVM/RHOS), as in the sketch below
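A minimal sketch of the 10x rule of thumb:

    cat /proc/sys/kernel/sched_migration_cost              # 500000 ns (500 µs) by default
    echo 5000000 > /proc/sys/kernel/sched_migration_cost   # raise to 5 ms
    perf stat -e migrations -a sleep 10                    # re-check CPU-migration counts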
sched_migration_cost
[Chart: RHEL 6.3 effect of sched_migration_cost on fork/exit microbenchmarks (exit_10, exit_100, exit_1000, fork_10, fork_100, fork_1000) on Intel Westmere EP (24 CPUs / 12 cores, 24 GB memory); µsec/call with the 500 µs default vs. tuned to 4 ms, plus percent improvement.]
MTU
● Improved interface allows for setting the MTU
● On faster networks this can be a big win
  ● Of course, it depends on the data patterns
  ● Assumes the switch is set correctly
(A short sketch follows.)
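A minimal sketch of enabling jumbo frames on a host NIC; the device name is an assumption, and the switch ports must be configured to match:

    ip link set dev eth0 mtu 9000
    ip link show eth0 | grep mtu    # verify
    # In RHEV, set the MTU on the logical network so hosts and guests agree.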
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
RHEV + RHS
Integration has been underway
● Scale testing a single volume over 8 RHS servers
  ● Not necessarily what we would recommend
  ● 1024 guests with RHEV
  ● 2048 guests with KVM
● Another internal group had 2250 guests with RHEV
● 512 guests all driving I/O
  ● Sum of guest memory sized to fit in host memory
  ● No swapping
Software layers in Virtual Block Storage
[Diagram of the stack: in the guest (VM), an ext4 filesystem on /mnt/test sits on the LVM volume /dev/vg_guest/test, backed by the virtio-block device /dev/vda. On the host (hypervisor), the qemu-kvm process opens the image at /mnt/your-gluster-volume/guest-image-pathname through the kernel FUSE module and the glusterfs client, which reaches the RHS servers over the network.]
Scaling RHEV / KVM / RHS
[Diagram: 128 VMs all performing I/O simultaneously; RHEV 3.1 hosts on RHEL 6.3z against RHS 2.0U4 servers running RHEL 6.2z and Gluster 3.3.]
RHEV / KVM / RHS Tuning
● gluster volume set <volume> group virt
● RHS server: tuned-adm profile rhs-virtualization
● KVM host: tuned-adm profile virtual-host
● Ideally, use separate gluster volumes for application files and disk images
● For better response time, shrink the guest block device queue:
  ● /sys/block/vda/queue/nr_requests (8)
● For best sequential read throughput, raise VM readahead:
  ● /sys/block/vda/queue/read_ahead_kb (2048)
(These steps are collected in the sketch below.)
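The same steps as commands; the volume name is an assumption:

    gluster volume set myvol group virt              # apply the virt tuning group
    tuned-adm profile rhs-virtualization             # on each RHS server
    tuned-adm profile virtual-host                   # on each KVM/RHEV host
    # inside each guest:
    echo 8 > /sys/block/vda/queue/nr_requests        # smaller queue, better response time
    echo 2048 > /sys/block/vda/queue/read_ahead_kb   # larger readahead, better sequential reads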
Impact of Tuning Gluster and Kernel Alone
[Chart: effect of tuning on large-file virtio-block I/O; throughput in MB per second for random write, random read, sequential write, and sequential read, untuned vs. tuned. Configuration: 2 replicas, 8 servers, 16 hosts, 128 VMs, 32 GB per server, 64 KB record size.]
For sequential I/O, the RHEV host utilizes the 10 GbE network
[Charts: VM sequential write and sequential read throughput (transfer rate in MB/s) for 1, 2, 4, and 8 KVM guests per host. Configuration: 1 RHEV host, 8 RHS servers, 2-replica volume, 1 thread per VM, 16 GB files, 4 KB transfer size. The read chart compares guest readahead settings of 128 KB and 2048 KB; Red Hat recommends the larger value.]
RHEV/RHS Scales as Hardware is Added
[Charts: (1) Scaling sequential I/O of 128 VMs with one host per gluster server, virtio-block, 64 KB transfer size, one thread per guest; sequential read and write throughput in MB/s (2-replica) vs. number of RHS servers. (2) Scaling random IOPS with 128 guests, 1 RHS server per RHEV host, 64 KB transfer size, 1 thread per guest; random-write and random-read throughput (IOPS) vs. number of RHS servers.]
For More Information
● RHEV/RHS with 128 guests: https://access.redhat.com/site/articles/393123
● RHEV/RHS single-host performance: https://access.redhat.com/site/articles/313973
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Migrating to RHEV
Several detailed Reference Architecture papers on this
● Red Hat customer portal: https://access.redhat.com
  ● Requires a user account
  ● Scripts and configuration files provided
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Reference Architectures
Two places to get Red Hat reference architectures
● Red Hat resource library: www.redhat.com
  ● Free
● Red Hat customer portal: https://access.redhat.com
  ● Requires a user account
  ● Scripts and configuration files provided
● RHEV/RHS with 128 guests: https://access.redhat.com/site/articles/393123
● RHEV/RHS single-host performance: https://access.redhat.com/site/articles/313973
06/12 Sessions
Time Title
10:40 AM – 11:40 AM Introduction to Red Hat OpenStack
2:30 PM - 3:30 PM Introduction & Overview of OpenStack for IaaS Clouds
3:40 PM - 4:40 PM Red Hat IaaS Overview & Roadmap
3:40 PM - 4:40 PM Integration of Storage, OpenStack & Virtualization
06/13 Sessions
Time Title
10:40 AM – 11:40 AM KVM Hypervisor Roadmap & Technology Update
2:30 PM - 3:30 PM Migrating 1,000 VMs from VMware to Red Hat Enterprise Virtualization: A Case Study
3:40 PM - 4:40 PM War Stories from the Cloud: Lessons from US Defense Agencies
4:50 PM - 5:50 PM Red Hat Virtualization Deep Dive
4:50 PM - 5:50 PM Red Hat Enterprise Virtualization Performance
4:50 PM - 5:50 PM Real world perspectives: Gaining Competitive Advantages with Red Hat Solutions
06/14 Sessions
Time Title
9:45 AM - 10:45 AM Hypervisor Technology Comparison & Migration
11:00 AM - 12:00 PM Network Virtualization & Software-defined Networking
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions