KVM Forum 2011 - Performance Improvements and Optimizations


TRANSCRIPT

Slide 1/41

KVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS

Mark Wagner, Principal SW Engineer, Red Hat

    August 14, 2011

Slide 2/41

    Overview

Discuss a range of topics about KVM performance

How to improve the out-of-the-box experience - but crammed into 30 minutes

Use libvirt where possible

Note that not all features are in all releases

Slide 3/41

    Before we dive in...

[Chart: Guest NFS Write Performance - are we sure? Is this really a 10Gbit line? Throughput (MBytes/second) for the RHEL6 default guest; by default the rtl8139 device is chosen. Arrow shows improvement.]

Slide 4/41

    Agenda

    Low hanging fruit

Memory

Networking

    Block I/O basics

    NUMA and affinity settings

    CPU Settings

    Wrap up

Slide 5/41

    Recent Performance Improvements

    Performance enhancements in every component

Component - Feature

CPU/Kernel - NUMA; ticketed spinlocks; Completely Fair Scheduler; extensive use of Read-Copy-Update (RCU). Scales up to 64 vcpus per guest.

Memory - Large memory optimizations: Transparent Huge Pages are ideal for hardware-based virtualization.

Networking - vhost-net, a kernel-based virtio backend with better throughput and latency; SR-IOV for ~native performance.

Block - AIO, MSI, scatter-gather.

Slide 6/41

    Remember this ?

[Chart: Guest NFS Write Performance, throughput (MBytes/second), RHEL6 default. Impact of not specifying the OS at guest creation.]

Slide 7/41

    Be Specific !

virt-manager will:

Make sure the guest will function

Optimize as it can

The more info you provide, the more tailoring will happen

Specify the OS details

Slide 8/41

    Specify OS + flavor

Specifying Linux will get you:

The virtio driver

If the kernel is recent enough, the vhost_net drivers
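The same tailoring can be requested from the command line with virt-install's --os-variant flag. A minimal sketch - the guest name, sizes, disk path, bridge, and install URL are placeholders:

virt-install --name guest1 --ram 4096 --vcpus 4 \
    --os-type linux --os-variant rhel6 \
    --disk path=/var/lib/libvirt/images/guest1.img,size=20 \
    --network bridge=br0 \
    --location http://example.com/rhel6/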

Slide 9/41

    I Like This Much Better

[Chart: Guest NFS Write Performance, throughput (MBytes/second): default vs vhost vs virtio. Impact of specifying OS type at creation: a 12.5x improvement.]

Slide 10/41

Memory Tuning - Huge Pages

2MB pages vs the standard 4KB Linux page

The virtual-to-physical page map is 512 times smaller

The TLB can map more physical memory, resulting in fewer misses

Traditional huge pages are always pinned

We now have Transparent Huge Pages (THP)

Most databases support huge pages

Benefits not only the host but also the guests

Try them in a guest too !
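As a rough sketch, THP can be checked and enabled through sysfs. The path below is the upstream one; on RHEL 6 it is /sys/kernel/mm/redhat_transparent_hugepage/enabled:

# show the current THP mode - the bracketed value is active
cat /sys/kernel/mm/transparent_hugepage/enabled
# enable THP system-wide
echo always > /sys/kernel/mm/transparent_hugepage/enabled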

Slide 11/41

    Transparent Huge Pages

[Chart: SPECjbb workload, 24-cpu host, 24-vcpu guest, Westmere EP, 24GB. Transactions per minute, No-THP vs THP, for guest and bare metal; THP gains of roughly 25-30%.]

Slide 12/41

    Network Tuning Tips

    Separate networks for different functions

    Use arp_filter to prevent ARP Flux

    echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

Use /etc/sysctl.conf to make it permanent

Packet size - MTU

Make sure it is set consistently across all components

No HW is needed to bridge intra-box communications

VM traffic between guests on the same box never hits the HW

So you can really kick up the MTU as needed
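For example - br0 and the 9000-byte MTU are assumptions for an intra-box setup - the permanent arp_filter setting and a raised bridge MTU might look like:

# /etc/sysctl.conf - make arp_filter permanent, apply with sysctl -p
net.ipv4.conf.all.arp_filter = 1

# raise the MTU on the bridge; tap devices and guest NICs must match
ip link set dev br0 mtu 9000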

Slide 13/41

    KVM Network Architecture - VirtIO

The virtual machine sees a paravirtualized network device (VirtIO)

VirtIO drivers are included in the Linux kernel

VirtIO drivers are available for Windows

The network stack is implemented in userspace
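In libvirt guest XML, the paravirtualized NIC is requested with model type='virtio'; the bridge name br0 is an assumption:

<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>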

Slide 14/41

KVM Network Architecture - VirtIO

[Diagram: the VirtIO data path context-switches between host kernel and userspace]

Slide 15/41

[Chart: Network Latency, virtio - Guest Receive (lower is better). Latency (usecs) vs message size (1 to 65536 bytes), virtio guest vs host. The latency comparison shows a 4x gap.]

Slide 16/41

KVM Network Architecture - vhost_net

Moves the QEMU network stack from userspace into the kernel

    Improved performance

    Lower Latency

    Reduced context switching

    One less copy
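With a recent enough host kernel, libvirt normally picks vhost-net automatically for virtio interfaces, but it can be requested explicitly via the driver element; a sketch (br0 is an assumption):

<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
  <driver name='vhost'/>  <!-- force the in-kernel vhost-net backend -->
</interface>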

Slide 17/41

[Chart: Network Latency - vhost_net, Guest Receive (lower is better). Latency (usecs) vs message size (1 to 65539 bytes): virtio vs vhost_net vs host. vhost_net latency is much closer to bare metal.]

Slide 18/41

Host CPU Consumption - virtio vs vhost_net

[Chart: host CPU consumption, virtio vs vhost_net, 8 guests, TCP receive. %total host CPU (lower is better), broken into %usr, %soft, %guest, %sys, by message size (32 to 32768 bytes); two columns per data set. The major difference is usr time.]

Slide 19/41

    vhost_net Efficiency

[Chart: 8-guest scale-out RX, vhost vs virtio: Mbit per %host CPU (bigger is better), netperf TCP_STREAM, by message size (32 to 65507 bytes).]

Slide 20/41

KVM Architecture - Device Assignment vs SR-IOV

[Diagram: device assignment vs SR-IOV]

Slide 21/41

KVM Network Architecture - PCI Device Assignment

    Physical NIC is passed directly to guest

    Device is not available to anything else on the host

    Guest sees real physical device

    Needs physical device driver

    Requires hardware support

Intel VT-d or AMD IOMMU

    Lose hardware independence

    1:1 mapping of NIC to Guest

    BTW - This also works on some I/O controllers
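In libvirt XML the NIC is handed over as a hostdev; the PCI address below is a placeholder, taken from lspci or virsh nodedev-list on the host:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x07' slot='0x10' function='0x0'/>
  </source>
</hostdev>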

Slide 22/41

KVM Network Architecture - SR-IOV

    Single Root I/O Virtualization

A new class of PCI devices that present multiple virtual devices that appear as regular PCI devices

    Guest sees real physical device

    Needs physical (virtual) device driver

    Requires hardware support

    Actual device can still be shared

    Low overhead, high throughput

No live migration (well, it's difficult)

    Lose hardware independence
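How the virtual functions are created is driver-specific; for Intel's ixgbe driver of that era it was a module parameter (the count of 7 is just an example). Each VF then shows up as its own PCI device and can be assigned like any other:

# create 7 virtual functions per port (ixgbe-specific parameter)
modprobe ixgbe max_vfs=7
# the VFs appear as regular PCI devices
lspci | grep -i 'Virtual Function'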

Slide 23/41

    Latency comparison

[Chart: network latency by guest interface method, Guest Receive (lower is better). Latency (usecs) vs message size (1 to 65539 bytes): virtio vs vhost_net vs SR-IOV vs host. SR-IOV latency is close to bare metal.]

Slide 24/41

KVM w/ SR-IOV - Intel Niantic 10Gb, Postgres DB

[Chart: DVD Store Version 2 results, total throughput in orders/minute (OPM): 1 Red Hat KVM bridged guest, 69,984 (76% of bare metal); 1 Red Hat KVM SR-IOV guest, 86,469 (93% of bare metal); 1 database instance on bare metal, 92,680.]

Slide 25/41

    I/O Tuning - Hardware

    Know your Storage

SAS or SATA? Fibre Channel, Ethernet, or SSD? Bandwidth limits

Multiple HBAs

Device-mapper-multipath provides multipathing capabilities and LUN persistence

How to test

Low-level I/O tools: dd, iozone, dt, etc.

Slide 26/41

I/O Tuning - Understanding I/O Elevators

Deadline

Two queues per device, one for reads and one for writes

I/Os dispatched based on time spent in queue

CFQ

Per-process queue

Each process queue gets a fixed time slice (based on process priority)

Noop

FIFO

Simple I/O merging

Lowest CPU cost

Can be set at boot time via the grub command line: elevator=deadline/cfq/noop

Or dynamically per device:

    echo deadline > /sys/class/block/sda/queue/scheduler
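To make the choice permanent on RHEL 6, append the parameter to the kernel line in /boot/grub/grub.conf; the kernel version and root device here are placeholders:

kernel /vmlinuz-2.6.32-131.el6.x86_64 ro root=/dev/mapper/vg-root elevator=deadline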

Slide 27/41

Virtualization Tuning - I/O Elevators - OLTP

[Chart: performance impact of I/O elevators on an OLTP workload, transactions per minute for 1, 2, and 4 guests; guest elevator Noop vs CFQ vs Deadline, host running the Deadline scheduler.]

Slide 28/41

    Virtualization Tuning - Caching

cache=none - I/O from the guest is not cached on the host

cache=writethrough - I/O from the guest is cached and written through on the host

Potential scaling problems with this option with multiple guests (host CPU is used to maintain the cache)

cache=writeback - not supported
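The cache mode is set per disk in the guest XML via the driver element; a sketch with an assumed image path:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/guest1.img'/>
  <target dev='vda' bus='virtio'/>
</disk>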

Slide 29/41

    Effect of I/O Cache settings on Guest performance

[Chart: OLTP-like workload on FusionIO storage, transactions per minute for 1 and 4 guests, cache=writethrough vs cache=none.]

Slide 30/41

    I/O Tuning - Filesystems

Configure read ahead

Databases (parameters to configure read ahead)

Block devices (blockdev --getra / --setra)

Asynchronous I/O

Eliminates synchronous I/O stalls

Critical for I/O-intensive applications
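For block devices, read ahead can be inspected and raised with blockdev; the device name and the value of 4096 are examples, not recommendations:

# read-ahead is reported in 512-byte sectors
blockdev --getra /dev/sdb
blockdev --setra 4096 /dev/sdb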

Slide 31/41

    AIO Native vs Threaded (default)

[Chart: impact of AIO selection on an OLTP workload, transactions per minute vs number of users (x100, 10 and 20); AIO threaded (the default) vs AIO native; cache=none used.]

Configurable per device (only via the libvirt XML configuration file):

<driver name='qemu' type='raw' cache='none' io='native'/>
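For example, with an assumed guest name:

virsh edit guest1
# then add io='native' to the disk's <driver> element and restart the guest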

Slide 32/41

    Remember Network Device Assignment ?

Device assignment - it works for block too !

Device specific

Similar benefits and drawbacks...

Slide 33/41

Block Device Passthrough - SAS Workload

[Chart: SAS Mixed Analytics Workload, bare metal vs KVM; Intel Westmere EP 12-core, 24GB memory, LSI 16 SAS. Time to complete (secs), SAS system and SAS total, for KVM VirtIO, KVM/PCI passthrough, and bare metal: PCI passthrough takes 6% longer than bare metal, VirtIO 25% longer.]

Slide 34/41

NUMA (Non-Uniform Memory Access)

Multi-socket, multi-core architectures

NUMA is needed for scaling

Keeps memory latencies low

Linux is completely NUMA-aware

Additional performance gains by enforcing NUMA placement

Still, some out-of-the-box work is needed

How to enforce NUMA placement

numactl - CPU and memory pinning

One way to test whether you get a gain is to mis-tune it

    Libvirt now supports some NUMA placement
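numactl --cpunodebind=0 --membind=0 <cmd> pins a hand-launched process; the libvirt equivalent, sketched below for a 4-vcpu guest pinned to node 0 of the host shown on the next slide, uses cpuset and numatune (numatune needs a recent libvirt):

<vcpu cpuset='0-5'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>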

Slide 35/41

Memory Tuning - NUMA

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 8189 MB
node 0 free: 7220 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 8192 MB
...
node 7 cpus: 42 43 44 45 46 47
node 7 size: 8192 MB
node 7 free: 7816 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  16  22  22  16
  2:  16  22  10  16  16  16  16  16
  3:  22  16  16  10  16  16  22  22
  4:  16  16  16  16  10  16  16  22
  5:  22  22  16  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  16  22  22  16  16  10

Internode memory distance, from the SLIT table. Note the variation in internode distances.

Slide 36/41

Virtualization Tuning - Using NUMA

[Chart: impact of NUMA in multi-guest OLTP - location, location, location. Transactions per second, stacked for guests 1-4: four 24-vcpu, 56GB guests, with and without NUMA pinning.]

Slide 37/41

    Specifying Processor Details

Mixed results with CPU type and topology

The Red Hat team is still exploring some topology performance quirks

Both model and topology

Experiment and see what works best in your case


Slide 38/41

    CPU Pinning - Affinity

Virt-manager allows CPU selection based on NUMA topology

True NUMA support in libvirt

Virsh pinning allows finer-grained control

1:1 pinning

Good gains with pinning

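For example, 1:1 pinning of a 4-vcpu guest; the guest name and host CPU numbers are placeholders:

# pin vcpu N to host pcpu N
virsh vcpupin guest1 0 0
virsh vcpupin guest1 1 1
virsh vcpupin guest1 2 2
virsh vcpupin guest1 3 3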

Slide 39/41

    Performance monitoring tools

Monitoring tools: top, vmstat, ps, iostat, netstat, sar, perf

Kernel tools: /proc, sysctl, AltSysrq

Networking: ethtool, ifconfig

Profiling: oprofile, strace, ltrace, systemtap, perf
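A few starting points with the listed tools; the intervals and counts are arbitrary:

sar -u 1 5     # host CPU utilization, five 1-second samples
iostat -x 2    # extended per-device I/O statistics every 2 seconds
perf top       # live view of the hottest kernel and user functions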

Slide 40/41

    Wrap up

KVM can be tuned effectively

Understand what is going on under the covers

Turn off stuff you don't need

Be specific when you create your guest

Look at using NUMA or affinity

Choose appropriate elevators (Deadline vs CFQ)

Choose your cache wisely

Slide 41/41

    For More Information

    KVM Wiki

    http://www.linux-kvm.org/page/Main_Page

    irc, email lists, etc

http://www.linux-kvm.org/page/Lists%2C_IRC

libvirt Wiki

http://libvirt.org/

New, revamped edition of the Virtualization Guide

    http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/index.html

    Should be available soon !