Red Hat Enterprise Virtualization Performance
Mark Wagner, Senior Principal Engineer, Red Hat
June 13, 2013
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
COMPLETE DATACENTER VIRTUALIZATION SOLUTION
● Leading performance: Top virtualization benchmarks for performance and scalability
● Affordable: Lower TCO and higher ROI than competitive platforms
● Enterprise-ready: Powerful mix of enterprise features and a rich set of partners
● Open: Offers choice and interoperability with no proprietary lock-in
● Cross-platform: Optimized for Microsoft Windows and Linux guests
ENTERPRISE VIRTUALIZATION FROM THE PEOPLE WHO BROUGHT YOU RED HAT ENTERPRISE LINUX
RED HAT ENTERPRISE VIRTUALIZATION ARCHITECTURE
● Inherits performance, scalability, security and supportability of Red Hat Enterprise Linux
● Shares Red Hat Enterprise Linux hardware and software ecosystem
● Host: 160 logical CPU (4,096 theoretical max), 2 TB RAM (64TB theoretical max)
● Guest: 160 vCPU, 2 TB RAM
● Supports latest silicon virtualization technology
● Microsoft certified for Windows guests
SMALL-FORM-FACTOR, SCALABLE, HIGH-PERFORMANCE HYPERVISOR BASED ON RED HAT ENTERPRISE LINUX
RED HAT ENTERPRISE VIRTUALIZATION HYPERVISOR/KVM OVERVIEW
INDUSTRY LEADERS IN INFRASTRUCTURE, NETWORKING, AND STORAGE ARE BACKING RED HAT ENTERPRISE VIRTUALIZATION
INDUSTRY LEADERSHIP: THE ONLY END-TO-END OPEN VIRTUALIZATION INFRASTRUCTURE
SPECvirt2010: RHEL 6 KVM Posts Industry-Leading Results
http://www.spec.org/virt_sc2010/results/
[Diagram: SPECvirt_sc2010 test setup, with client hardware driving the system under test (SUT) through the virtualization layer and hardware; blue = disk I/O, green = network I/O. The SUT sustains more than 1 SPECvirt tile per core.]
Key Enablers:
● SR-IOV
● Huge Pages
● NUMA
● Node Binding
Best SPECvirt_sc2010 Scores by CPU Cores (as of May 30, 2013)

[Bar chart, grouped by socket/core count (2-socket 12, 2-socket 16, 2-socket 20, 4-socket 40, 8-socket 64/80); systems and SPECvirt_sc2010 scores in chart order:]
● VMware ESX 4.1, HP DL380 G7 (12 cores, 78 VMs): 1,221
● RHEL 6 (KVM), IBM HS22V (12 cores, 84 VMs): 1,367
● VMware ESXi 5.0, HP DL385 G7 (16 cores, 102 VMs): 1,570
● RHEV 3.1, HP DL380p Gen8 (16 cores, 150 VMs): 2,442
● VMware ESXi 4.1, HP BL620c G7 (20 cores, 120 VMs): 1,878
● RHEL 6 (KVM), IBM HX5 w/ MAX5 (20 cores, 132 VMs): 2,144
● VMware ESXi 4.1, HP DL380 G7 (12 cores, 168 VMs): 2,742
● VMware ESXi 4.1, IBM x3850 X5 (40 cores, 234 VMs): 3,824
● RHEL 6 (KVM), HP DL580 G7 (40 cores, 288 VMs): 4,682
● RHEL 6 (KVM), IBM x3850 X5 (64 cores, 336 VMs): 5,467
● RHEL 6 (KVM), HP DL980 G7 (80 cores, 552 VMs): 8,956

Comparison based on the best-performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the benchmark name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.
SPECvirt2010: Red Hat Owns Industry Leading Results
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Features that help with Performance
Use these features to help improve Guest Performance
● Host CPU
● CPU Pin
● Hooks
● Direct LUN
● Huge Pages
● Migration
● Numad
● MTU
Features that help with Performance
Use Host CPU
● Pros
  ● Allows the guest to use hardware features of the CPU
  ● Can provide good performance gains
● Cons
  ● Prevents migration
(A rough libvirt equivalent is sketched below.)
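For reference, outside the RHEV UI this option roughly corresponds to libvirt CPU passthrough on the KVM host. A minimal sketch, with a hypothetical guest name:

    # virsh edit myguest, then in the domain XML:
    #   <cpu mode='host-passthrough'/>
    # The guest then sees the host CPU's model and feature flags, at the cost
    # of tying the guest to hosts with identical CPUs (hence no migration).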
RHEV – CPU Pinning
[Chart: RHEL 6.4 single large guest running a parallel OpenMP benchmark, Linpack (Intel N×N @ 20000) in a 2-node KVM guest on Intel Sandy Bridge (8 cores / 16 CPUs). Gflops vs. Linpack thread count (1, 2, 4, 8, 16) for KVM 6.4, KVM 6.4 with host CPU type, and 16-CPU bare metal.]
Features that help with Performance
CPU Pin
● Helps keep data cache lines hot
● Keeps the host scheduler from moving guests around
● Improved NUMA locality
  ● If you pin correctly... (see the sketch below)
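A minimal sketch of pinning from the host with virsh; the guest name and CPU numbers are assumptions for illustration (RHEV exposes the same thing through the VM's CPU-pinning topology setting):

    virsh vcpupin rhel6vm 0 2    # pin vCPU 0 to host CPU 2
    virsh vcpupin rhel6vm 1 3    # pin vCPU 1 to host CPU 3
    virsh vcpuinfo rhel6vm       # verify placement and CPU affinity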
RHEV – CPU Pinning
[Chart: total transactions per minute vs. user-set scaling (20U, 60U, 100U) for 4 guests on 2 hosts, out of the box vs. manual pinning.]
Features that help with Performance
A few others
● Hooks
  ● The hook mechanism has been around for a long time
  ● Some items move from hook to feature
● Direct LUN
● SR-IOV is currently one of the more important hooks
Features that help with Performance
A few others
● Direct LUN
  ● Allows you to use directly attached storage
  ● Typically higher performance
● Standard huge pages (2 MB)
  ● Reserve/free via:
    ● /proc/sys/vm/nr_hugepages
    ● /sys/devices/system/node/*/hugepages/*/nr_hugepages
  ● Used via hugetlbfs
● GB huge pages (1 GB)
  ● Reserved at boot time; cannot be freed
  ● Used via hugetlbfs
● Transparent huge pages (2 MB)
  ● On by default; controlled via boot args or /sys
  ● Used for anonymous memory
(A short sketch of reserving huge pages follows.)
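A minimal sketch of the 2 MB case on the host; the page count is an assumption, sized for roughly one 4 GB guest:

    echo 2048 > /proc/sys/vm/nr_hugepages      # reserve 2048 x 2 MB pages
    grep HugePages /proc/meminfo               # verify the reservation
    mount -t hugetlbfs none /dev/hugepages     # expose them via hugetlbfs
    # 1 GB pages must instead be reserved on the kernel command line, e.g.:
    #   hugepagesz=1G hugepages=4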
Features that help with Performance
[Diagram: virtual address space translated to physical memory through the TLB, which holds on the order of 128 data and 128 instruction entries; huge pages stretch how much memory those entries can cover.]
RHEV – Huge Pages in Guest
[Chart: impact of huge pages in the guest; total transactions per minute vs. user-set scaling (20U, 60U, 100U), regular vs. huge pages. Roughly 10-15% improvement with huge pages.]
Features that help with Performance
Migration support
● Under the Cluster -> Policy settings
  ● Can set duration and CPU load thresholds
  ● Moves VMs when limits are hit
● Useful for:
  ● Maintenance
  ● Power savings
  ● Load balancing
Migration for Power Savings
Migration for Performance
Tuning for Migration
[Chart: live migration without tuning; transactions per minute (TPM) over time for the TPM-RR and TPM – LM 32 runs. Note: due to the high load, the migration did not finish.]
Tuning for Migration
Check vdsm defaults
● /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample
    # Maximum bandwidth for migration, in MiBps, 0 means
    # libvirt's default, since 0.10.x default in libvirt is unlimited
    # migration_max_bandwidth = 32
● Edit /etc/vdsm/vdsm.conf
  ● Verify parameters are in the correct section
● Restart the vdsm daemon for the changes to take effect:
  ● service vdsmd restart
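A minimal sketch of lifting the cap; the [vars] section name is taken from vdsm.conf.sample and may differ in your vdsm version:

    # /etc/vdsm/vdsm.conf
    [vars]
    migration_max_bandwidth = 0    # 0 = libvirt's default (unlimited since libvirt 0.10.x)

    # then apply:
    service vdsmd restart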
Tuning for Migration
[Chart: impact of tuning on live migration; TPM over time for the TPM-RR, TPM – LM 32, and TPM – UL runs. With tuning, the migration completes in approximately 1 minute.]
RHEV – Migration for Even Distribution
[Chart: transactions per minute for Guest 1 through Guest 4 and the aggregate of all 4 guests, without migration vs. auto migration. The host policy was set to 51%; guest migration started automatically, resulting in overall higher performance as both hosts were utilized. The single guest migration completed in approximately one minute.]
Four-NUMA-node system, fully connected topology
[Diagram: four NUMA nodes (0-3), each with four cores, an L3 cache, local RAM, and QPI links/IO; every node is connected to every other node.]
Sample remote access latencies (relative to local access):
● 4 socket / 4 node: 1.5x
● 4 socket / 8 node: 2.7x
● 8 socket / 8 node: 2.8x
● 32-node system: 5.5x
  ● 30 of 32 inter-node latencies are >= 4x
  ● Relative latencies across the 32x32 node pairs: 10 (32/1024: 3.1%), 13 (32/1024: 3.1%), 40 (64/1024: 6.2%), 48 (448/1024: 43.8%), 55 (448/1024: 43.8%)
You can inspect these distances on your own hardware as shown below.
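A quick check (output shape varies by platform):

    numactl --hardware    # prints per-node CPUs, memory sizes, and the node distance matrix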
So, what's the NUMA problem?
● The Linux system scheduler is very good at maintaining responsiveness and optimizing for CPU utilization
● It tries to use idle CPUs regardless of where process memory is located, but using remote memory degrades performance!
● Red Hat is working with the upstream community to increase NUMA awareness of the scheduler and to implement automatic NUMA balancing.
● Remote memory latency matters most for long-running, significant processes, e.g., HPTC, VMs, etc.
numad can help improve NUMA performance
● New RHEL 6.4 user-level daemon that automatically improves out-of-the-box NUMA system performance and balances NUMA usage in dynamic workload environments
● Was a tech preview in RHEL 6.3
● Not enabled by default
● See numad(8)
(A minimal sketch of enabling it follows.)
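A minimal sketch on a RHEL 6.4 host:

    yum install numad
    service numad start    # start the daemon now
    chkconfig numad on     # keep it enabled across reboots
    numastat qemu-kvm      # optional: check per-node memory placement of guests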
numad aligns process memory and CPU threads within nodes
[Diagram: before numad, processes 19, 29, 37, and 61 each have threads and memory spread across nodes 0-3; after numad, each process is consolidated onto a single node.]
RHEV – hand tuning vs numad
[Chart: total transactions per minute vs. user-set scaling (20U, 60U, 100U) for untuned, manual pin, and numad configurations.]
numad gives the same performance improvements as manual pinning and also allows migration.
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Tuning
Tuning for both the hypervisor and guest
● Already covered vdsm
● tuned
● Kernel
● MTU
tuned Profile Comparison Matrix
Profiles: default, enterprise-storage, virtual-host, virtual-guest, latency-performance, throughput-performance
● kernel.sched_min_granularity_ns: 4 ms default; 10 ms in the tuned profiles
● kernel.sched_wakeup_granularity_ns: 4 ms default; 15 ms in the tuned profiles
● vm.dirty_ratio: 20% of RAM default; 40% in most profiles, lowered to 10% by virtual-host
● vm.dirty_background_ratio: 10% of RAM default; 5% in virtual-host
● vm.swappiness: 60 default; 10 in virtual-host, 30 in virtual-guest
● I/O scheduler (elevator): CFQ default; deadline in all tuned profiles
● Filesystem barriers: on by default; off in the storage- and throughput-oriented profiles
● CPU governor: ondemand default; performance in the performance-oriented profiles
● Disk read-ahead: raised 4x in the storage-oriented profiles
● Disable THP: yes, in one profile
● Disable C-states: yes (latency-performance)
https://access.redhat.com/site/solutions/369093
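Applying and checking a profile is one command each:

    tuned-adm list                  # show available profiles
    tuned-adm profile virtual-host  # e.g., on a RHEV hypervisor
    tuned-adm active                # confirm which profile is in effect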
Load Balancing
● The RHEL scheduler tries to keep all CPUs busy by moving tasks from overloaded CPUs to idle CPUs
● You can detect this using "perf stat"; look for excessive "migrations"
● Issues arise on larger systems where the scheduler is a bit too active
● Can tune sched_migration_cost to help calm the scheduler down
● This is especially effective on multi-socket systems
Load Balancing
● /proc/sys/kernel/sched_migration_cost
  ● Amount of time after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated, so increasing this variable reduces task migrations. The default value is 500000 (ns).
  ● If the CPU idle time is higher than expected when there are runnable processes, try reducing this value. If tasks bounce between CPUs or nodes too often, try increasing it.
● Rule of thumb: increase by 2-10x to reduce load balancing
● Increase by 10x on large systems when many cgroups are actively used (e.g., RHEV/KVM/RHOS), as in the sketch below
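A minimal sketch of the 10x rule of thumb:

    cat /proc/sys/kernel/sched_migration_cost              # 500000 ns (500 µs) by default
    echo 5000000 > /proc/sys/kernel/sched_migration_cost   # raise to 5 ms
    perf stat -e migrations -a sleep 10                    # re-check CPU-migration counts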
sched_migration_cost
[Chart: RHEL 6.3 effect of sched_migration_cost on fork/exit microbenchmarks (exit_10, exit_100, exit_1000, fork_10, fork_100, fork_1000) on Intel Westmere EP (24 CPUs / 12 cores, 24 GB memory); µsec/call with the 500 µs default vs. tuned to 4 ms, plus percent improvement.]
MTU
● Improved interface allows for setting the MTU
● On faster networks this can be a big win
  ● Of course, it depends on the data patterns
  ● Assumes the switch is set correctly
(A short sketch follows.)
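A minimal sketch of enabling jumbo frames on a host NIC; the device name is an assumption, and the switch ports must be configured to match:

    ip link set dev eth0 mtu 9000
    ip link show eth0 | grep mtu    # verify
    # In RHEV, set the MTU on the logical network so hosts and guests agree.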
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
RHEV + RHS
Integration has been underway
● Scale testing a single volume over 8 RHS servers
  ● Not necessarily what we would recommend
  ● 1024 guests with RHEV
  ● 2048 guests with KVM
● Another internal group had 2250 guests with RHEV
● 512 guests all driving I/O
  ● Sum of guest memory sized to fit in host memory
  ● No swapping
Software layers in Virtual Block Storage
[Diagram of the stack: in the guest (VM), an ext4 filesystem on /mnt/test sits on the LVM volume /dev/vg_guest/test, backed by the virtio-block device /dev/vda. On the host (hypervisor), the qemu-kvm process opens the image at /mnt/your-gluster-volume/guest-image-pathname through the kernel FUSE module and the glusterfs client, which reaches the RHS servers over the network.]
Scaling RHEV / KVM / RHS
[Diagram: 128 VMs all performing I/O simultaneously; RHEV 3.1 hosts on RHEL 6.3z against RHS 2.0U4 servers running RHEL 6.2z and Gluster 3.3.]
RHEV / KVM / RHS Tuning
● gluster volume set <volume> group virt
● RHS server: tuned-adm profile rhs-virtualization
● KVM host: tuned-adm profile virtual-host
● Ideally, use separate gluster volumes for application files and disk images
● For better response time, shrink the guest block device queue:
  ● /sys/block/vda/queue/nr_requests (8)
● For best sequential read throughput, raise VM readahead:
  ● /sys/block/vda/queue/read_ahead_kb (2048)
(These steps are collected in the sketch below.)
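The same steps as commands; the volume name is an assumption:

    gluster volume set myvol group virt              # apply the virt tuning group
    tuned-adm profile rhs-virtualization             # on each RHS server
    tuned-adm profile virtual-host                   # on each KVM/RHEV host
    # inside each guest:
    echo 8 > /sys/block/vda/queue/nr_requests        # smaller queue, better response time
    echo 2048 > /sys/block/vda/queue/read_ahead_kb   # larger readahead, better sequential reads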
Impact of Tuning Gluster and Kernel Alone
[Chart: effect of tuning on large-file virtio-block I/O; throughput in MB per second for random write, random read, sequential write, and sequential read, untuned vs. tuned. Configuration: 2 replicas, 8 servers, 16 hosts, 128 VMs, 32 GB per server, 64 KB record size.]
For sequential I/O, the RHEV host utilizes the 10 GbE network
[Charts: VM sequential write and sequential read throughput (transfer rate in MB/s) for 1, 2, 4, and 8 KVM guests per host. Configuration: 1 RHEV host, 8 RHS servers, 2-replica volume, 1 thread per VM, 16 GB files, 4 KB transfer size. The read chart compares guest readahead settings of 128 KB and 2048 KB; Red Hat recommends the larger value.]
RHEV/RHS Scales as Hardware is Added
[Charts: (1) Scaling sequential I/O of 128 VMs with one host per gluster server, virtio-block, 64 KB transfer size, one thread per guest; sequential read and write throughput in MB/s (2-replica) vs. number of RHS servers. (2) Scaling random IOPS with 128 guests, 1 RHS server per RHEV host, 64 KB transfer size, 1 thread per guest; random-write and random-read throughput (IOPS) vs. number of RHS servers.]
For More Information
● RHEV/RHS with 128 guests: https://access.redhat.com/site/articles/393123
● RHEV/RHS single-host performance: https://access.redhat.com/site/articles/313973
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Migrating to RHEV
Several detailed Reference Architecture papers on this
● Red Hat customer portal: https://access.redhat.com
  ● Requires a user account
  ● Scripts and configuration files provided
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions
Reference Architectures
Two places to get Red Hat reference architectures
● Red Hat resource library: www.redhat.com
  ● Free
● Red Hat customer portal: https://access.redhat.com
  ● Requires a user account
  ● Scripts and configuration files provided
● RHEV/RHS with 128 guests: https://access.redhat.com/site/articles/393123
● RHEV/RHS single-host performance: https://access.redhat.com/site/articles/313973
06/12 Sessions
Time Title
10:40 AM – 11:40 AM Introduction to Red Hat OpenStack
2:30 PM - 3:30 PM Introduction & Overview of OpenStack for IaaS Clouds
3:40 PM - 4:40 PM Red Hat IaaS Overview & Roadmap
3:40 PM - 4:40 PM Integration of Storage, OpenStack & Virtualization
06/13 Sessions
Time Title
10:40 AM – 11:40 AM KVM Hypervisor Roadmap & Technology Update
2:30 PM - 3:30 PM Migrating 1,000 VMs from VMware to Red Hat Enterprise Virtualization: A Case Study
3:40 PM - 4:40 PM War Stories from the Cloud: Lessons from US Defense Agencies
4:50 PM - 5:50 PM Red Hat Virtualization Deep Dive
4:50 PM - 5:50 PM Red Hat Enterprise Virtualization Performance
4:50 PM - 5:50 PM Real world perspectives: Gaining Competitive Advantages with Red Hat Solutions
06/14 Sessions
Time Title
9:45 AM - 10:45 AM Hypervisor Technology Comparison & Migration
11:00 AM - 12:00 PM Network Virtualization & Software-defined Networking
Agenda
● Overview
● Features that help with Performance
● Tuning
● RHEV + RHS
● Migration to RHEV
● Wrap Up
● Questions