HIGH-PERFORMANCE VMS
USING OPENSTACK NOVA
by Nikola Đipanov
$ WHOAMI
- Software engineer @ Red Hat
- Working on OpenStack Nova since 2012
- Nova core developer since 2013
THIS TALK
- OpenStack - the elastic cloud
- High-perf requirements in the cloud
- NUMA
- Large pages
- CPU pinning
- IO devices
- Challenges with exposing low-level details in the cloud
OPENSTACK
- Cloud infrastructure
- Open-source (98.76% Python)
- Multiple projects (compute, network, block storage, image storage, messaging, ...)
- Self-service user API and dashboard (*aaS)
OPENSTACK NOVA
THE NOVA "ELASTIC CLOUD" APPROACH
- Allows quick provisioning of new (commodity) hardware
- Additional cloud resources (handled by other components) - VM images, block storage, networks...
- Concept of flavors - combinations of VM resources (CPU, RAM, disk...)
- Simple scheduling - focus on scale
- Users have no visibility into hardware
NOVA ARCHITECTURE
NOVA SCHEDULING (IN MORE DETAIL) 1/2
- The flavor (admin controlled) holds the basic information about resources assigned to an instance
- Limited policy can be overridden through image metadata (mostly for OS/app-related settings)
- Each compute host periodically exposes its view of resources to the scheduler
- For each instance request, the scheduler runs every host's set of resources through a set of filters
- Only hosts that pass all filters are considered (optionally in a particular order)
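The filtering step above can be sketched in a few lines of Python. The filter names and host dictionaries here are hypothetical, not Nova's actual scheduler classes; this is only meant to show the "run every host through every filter" shape:

```python
# Hedged sketch of the filter-scheduler idea: each host's resource
# view is run through a chain of filters, and only hosts that pass
# all of them remain scheduling candidates.

def ram_filter(host, request):
    # Reject hosts without enough free RAM for the requested flavor.
    return host["free_ram_mb"] >= request["ram_mb"]

def cpu_filter(host, request):
    # Reject hosts without enough free vCPUs.
    return host["free_vcpus"] >= request["vcpus"]

def schedule(hosts, request, filters):
    """Return the hosts that pass every filter for this request."""
    return [h for h in hosts if all(f(h, request) for f in filters)]

hosts = [
    {"name": "node1", "free_ram_mb": 2048, "free_vcpus": 2},
    {"name": "node2", "free_ram_mb": 8192, "free_vcpus": 8},
]
request = {"ram_mb": 4096, "vcpus": 4}
candidates = schedule(hosts, request, [ram_filter, cpu_filter])
print([h["name"] for h in candidates])  # ['node2']
```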
NOVA SCHEDULING (IN MORE DETAIL) 2/2
- Default filters allow overcommit of CPU/RAM (tunable)
- Basic placement does not dictate how resources are used at host granularity (apart from PCI devices, which are special-cased)
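The overcommit ratios mentioned above are tunable in nova.conf; the values below are the historical defaults used by the CPU and RAM filters:

```ini
[DEFAULT]
# Virtual-to-physical allocation ratios applied by the default
# CPU/RAM scheduler filters (historical defaults shown).
cpu_allocation_ratio = 16.0
ram_allocation_ratio = 1.5
```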
HIGH-PERF REQUIREMENTS - MOTIVATION
- Allow performance-sensitive apps to run in the cloud
- Example use-case: Network Function Virtualization
  - Cloud instances with dedicated resources (a bit of an oxymoron)
  - The key is to allow for low (or at least predictable) latency
- Better HW utilization on modern machines
  - Have a way to take into account NUMA effects on modern hardware
  - Make this info available to the guest application/OS
HIGH-PERF REQUIREMENTS - THE CLOUD WAY
- Relying on users having knowledge about the hardware they are running on goes against the cloud paradigm
- Need a way to allow users to request high-performance features without the need to understand HW specifics
NUMA AWARENESS
- Modern HW increasingly provides NUMA
- Benefits of the IaaS controller being NUMA-aware:
  - Memory bandwidth & access latency
  - Cache efficiency
- Some workloads can benefit from NUMA guarantees too (especially combined with IO device pass-through)
- Allow users to define a virtual NUMA topology
  - Make sure it maps to the actual host topology
NUMA - LIBVIRT SUPPORT (HOST CAPABILITIES)

<capabilities>
  <host>
    <topology>
      <cells num="2">
        <cell id="0">
          <memory unit="KiB">4047764</memory>
          <pages unit="KiB" size="4">999141</pages>
          <pages unit="KiB" size="2048">25</pages>
          <distances>
            <sibling id="0" value="10"/>
            <sibling id="1" value="20"/>
          </distances>
          <cpus num="4">
            <cpu id="0" socket_id="0" core_id="0" siblings="0"/>
            <cpu id="1" socket_id="0" core_id="1" siblings="1"/>
            <cpu id="2" socket_id="0" core_id="2" siblings="2"/>
            <cpu id="3" socket_id="0" core_id="3" siblings="3"/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
REQUESTING NUMA FOR AN OPENSTACK VM
- Set on the flavor (admin only)
- Default - no NUMA awareness
- Simple case:
    hw:numa_nodes=2
- Specifying more details:
    hw:numa_cpus.0=0,1
    hw:numa_cpus.1=2,3,4,5
    hw:numa_mem.0=500
    hw:numa_mem.1=1500
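These extra specs can be set with the nova CLI of the time; the flavor name m1.numa here is only an illustration:

```shell
$ nova flavor-key m1.numa set hw:numa_nodes=2
$ nova flavor-key m1.numa set hw:numa_cpus.0=0,1 hw:numa_cpus.1=2,3,4,5
$ nova flavor-key m1.numa set hw:numa_mem.0=500 hw:numa_mem.1=1500
```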
NUMA AWARENESS - IMPLEMENTATION DETAILS
- Compute host NUMA topology is exposed to the scheduler
- The requested instance topology is persisted for the instance (NO mapping to host cells)
- A filter runs a placement algorithm for each host
- Once on the compute host - re-calculate the placement, assign the host<->instance node mapping and persist it
- The libvirt driver implements the requested policy
- NB: Users cannot influence the final host node placement - it is decided by the fitting algorithm
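The fitting algorithm is essentially a greedy first-fit of instance cells onto host cells. A hedged Python sketch (not Nova's actual code, which lives in its NUMA/hardware helpers; the dictionary shapes are invented for illustration):

```python
# Greedy first-fit sketch: map each requested instance NUMA cell onto
# the first host cell with enough free CPUs and memory, using each
# host cell at most once.

def fit_numa(host_cells, instance_cells):
    """Return (instance_cell_id, host_cell_id) pairs, or None if no fit."""
    mapping = []
    used = set()
    for icell in instance_cells:
        for hcell in host_cells:
            if (hcell["id"] not in used
                    and hcell["free_cpus"] >= icell["cpus"]
                    and hcell["free_mem_mb"] >= icell["mem_mb"]):
                mapping.append((icell["id"], hcell["id"]))
                used.add(hcell["id"])
                break
        else:
            return None  # this instance cell fits on no remaining host cell
    return mapping

host = [{"id": 0, "free_cpus": 2, "free_mem_mb": 1024},
        {"id": 1, "free_cpus": 4, "free_mem_mb": 2048}]
inst = [{"id": 0, "cpus": 2, "mem_mb": 500},
        {"id": 1, "cpus": 4, "mem_mb": 1500}]
print(fit_numa(host, inst))  # [(0, 0), (1, 1)]
```

The scheduler filter runs this per host to pick candidates; the compute host then re-runs it to produce the mapping it persists.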
NUMA LIBVIRT CONFIG - CPU PLACEMENT

<vcpu placement="static">6</vcpu>
<cputune>
  <vcpupin vcpu="0" cpuset="0-1"/>
  <vcpupin vcpu="1" cpuset="0-1"/>
  <vcpupin vcpu="2" cpuset="4-7"/>
  <vcpupin vcpu="3" cpuset="4-7"/>
  <vcpupin vcpu="4" cpuset="4-7"/>
  <vcpupin vcpu="5" cpuset="4-7"/>
  <emulatorpin cpuset="0-1,4-7"/>
</cputune>
NUMA LIBVIRT CONFIG - MEMORY AND TOPO

<memory>2048000</memory>
<numatune>
  <memory mode="strict" nodeset="0-1"/>
  <memnode cellid="0" mode="strict" nodeset="0"/>
  <memnode cellid="1" mode="strict" nodeset="1"/>
</numatune>
<cpu>
  <numa>
    <cell id="0" cpus="0,1" memory="512000"/>
    <cell id="1" cpus="2,3,4,5" memory="1536000"/>
  </numa>
</cpu>
HUGE PAGES
- Modern architectures support several page sizes
- Provide dedicated RAM to VM processes
- Maximize TLB efficiency
HUGE PAGES - SOME CAVEATS
- Need to be set up on the host separately (outside the scope of Nova)
  - This breaks the "commodity hardware, easily deployable" promise a bit
- VM RAM has to be a multiple of the page size
- No possibility for overcommit
  - Also interferes with the cloud promise of better utilization
REQUESTING HUGE PAGES FOR AN OPENSTACK VM
- Set on the flavor (admin only)
- Default - no huge pages

    hw:mem_page_size=large|small|any|2MB|1GB
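Huge pages must first be reserved on the host (outside Nova's scope, as the caveats slide notes), then requested on the flavor. The page count and flavor name below are illustrative:

```shell
# Reserve 1024 x 2 MiB pages on the host via sysfs
$ echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# Request 2 MB pages on a (hypothetical) flavor
$ nova flavor-key m1.hugepages set hw:mem_page_size=2MB
```

1 GiB pages typically have to be reserved at boot via kernel parameters instead, since they are hard to allocate once memory is fragmented.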
HUGE PAGES - IMPLEMENTATION DETAILS
- Each compute host exposes data about its huge pages to the scheduler, per NUMA node
- Filters run the same placement algorithm as for NUMA, but now consider huge page availability as well
- Once on the compute host - re-calculate the placement, assign the host<->instance node mapping and persist it
- The libvirt driver implements the requested policy
HUGE PAGES LIBVIRT CONFIG
(Can be per node, but Nova does not allow that granularity)

<memoryBacking>
  <hugepages>
    <page size="2" unit="MiB" nodeset="0-1"/>
    <page size="1" unit="GiB" nodeset="2"/>
  </hugepages>
</memoryBacking>
CPU PINNING
- The VM gets dedicated CPUs for deterministic performance
- Improve performance of different workloads by avoiding/preferring hyperthreads
CPU PINNING - SOME CAVEATS
- Requires a dedicated set of hosts (simple scheduling, no automatic VM reconfiguration)
  - This breaks the "commodity hardware, easily deployable" promise a bit too
- No possibility for overcommit (by design, of course)
  - Trades off maximizing utilization for performance of specific workloads
REQUESTING CPU PINNING FOR AN OPENSTACK VM
- Set on the flavor (admin only)
- Default - no CPU pinning

    hw:cpu_policy=shared|dedicated
    hw:cpu_threads_policy=avoid|separate|isolate|prefer (proposed but not merged at this point)
CPU PINNING - IMPLEMENTATION DETAILS
- Compute nodes expose available CPUs per NUMA node
- Filters run the same placement algorithm as for NUMA, but now consider CPU availability
- Flavors need to be set up to request a specific set of hosts (an aggregate) in addition to the CPU pinning constraint
- Everything else is the same as for NUMA/huge pages
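The aggregate setup mentioned above might look like this with the nova CLI of the time. The aggregate name, flavor name, host name, and the pinned=true metadata key are conventions chosen for illustration, not mandated names:

```shell
# Group the dedicated hosts into an aggregate
$ nova aggregate-create pinned-hosts
$ nova aggregate-set-metadata pinned-hosts pinned=true
$ nova aggregate-add-host pinned-hosts compute1
# The flavor requests both pinning and matching aggregate metadata
$ nova flavor-key m1.pinned set hw:cpu_policy=dedicated
$ nova flavor-key m1.pinned set aggregate_instance_extra_specs:pinned=true
```

The aggregate metadata is matched by the AggregateInstanceExtraSpecsFilter, which keeps pinned instances off the general-purpose hosts.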
CPU PINNING LIBVIRT CONFIG
(memory is handled the same as for NUMA/huge pages if requested)

<cputune>
  <vcpupin vcpu="0" cpuset="0"/>
  <vcpupin vcpu="1" cpuset="1"/>
  <vcpupin vcpu="2" cpuset="4"/>
  <vcpupin vcpu="3" cpuset="5"/>
  <vcpupin vcpu="4" cpuset="6"/>
  <vcpupin vcpu="5" cpuset="7"/>
  <emulatorpin cpuset="0-1,4-7"/>
</cputune>
PCI PASS-THROUGH DEVICE LOCALITY
- Pass-through of PCI devices (not developed as part of this effort)
- Make sure that PCI devices are local to the NUMA node the VM is pinned to
PCI DEVICE LOCALITY - IMPLEMENTATION DETAILS
- Compute nodes expose the NUMA node each device is local to (libvirt has this info)
- Make sure the NUMA placement algorithm also considers requested PCI devices
- Current limitation - no matching of devices to guest nodes
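For context, the pre-existing pass-through mechanism this builds on is configured in nova.conf: devices are whitelisted on the compute node and given an alias that flavors can request. The vendor/product IDs and alias name below are illustrative:

```ini
# nova.conf on the compute node: which devices may be passed through
pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10fb"}
# nova.conf for the scheduler/API: alias usable in flavor extra specs
pci_alias = {"vendor_id": "8086", "product_id": "10fb", "name": "nic1"}
```

A flavor then requests one such device with the extra spec pci_passthrough:alias=nic1:1, and the NUMA fitting described above additionally prefers host cells local to a matching device.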
HIGH-PERF VMS IN OPENSTACK - THE GOOD PARTS
- Enable a major open source cloud solution to be used by a whole new class of users
- Expands the ecosystem, fosters innovation...
CHALLENGES WITH EXPOSING LOW-LEVEL DETAILS IN THE CLOUD
- We cannot expose low-level details to the user, so the API needs to hide them while still being useful
- Complicates scheduling (SW) and hardware management (Ops)
- Nova-specific challenges:
  - Not used by a big chunk of users - off by default
  - Internals (esp. scheduler code) not up to the complexity needed for this to work properly
QUESTIONS?

THANK YOU!