Windsor: Domain 0 disaggregation for XenServer and XCP
DESCRIPTION
In a traditional Xen configuration, domain 0 is used for a large number of different functions, including running the toolstack(s), hosting the backends for network and disk I/O, running the QEMU device model instances, driving the physical devices in the system, handling guest console/framebuffer I/O, and performing miscellaneous monitoring and management functions. Having all these functions in one domain produces a complex environment that is susceptible to shared fate on the failure of any one function, has complex interactions between functions (including resource contention) that make performance difficult to predict, and has limited flexibility (such as requiring the same kernel for all device drivers). “Domain 0 disaggregation” has been discussed for some time as a way to break out domain 0's functions into separate domains. Doing this enables each domain to be tailored to its function, for example by using a different kernel or operating system to drive different physical devices. Splitting functions into separate domains removes some of the unintentional interactions, such as in-domain resource contention, and reduces the system impact of the failure of a single function, such as a device driver crash. Although domain 0 disaggregation is not new, it is seldom used in practice and much of its use is focussed on providing enhanced security. Citrix XenServer will be moving towards a disaggregated domain 0 in order to provide better security, scalability, performance, reliability, supportability and flexibility. This talk will describe XenServer's “Windsor” architecture and explain how it will provide the above benefits to customers and users. We will present an overview of the architecture and some early experimental measurements showing the benefits.
TRANSCRIPT
August 28, 2012
Windsor Domain 0 disaggregation for XenServer
James Bulpin, Director of Technology, XenServer, Citrix
Introduction
• XenServer/XenCloudPlatform (XCP): a distribution of Xen, a domain 0 and everything else needed to create a platform for virtualisation
• A platform for server virtualisation
• A platform for virtual desktops (e.g. XenDesktop)
• A platform for the cloud (e.g. OpenStack and CloudStack)
• A platform for virtualised networking (e.g. NetScaler)
• All use cases are tending towards much higher numbers of guest VMs per host
• Current architecture works now but will hit bottlenecks as servers get bigger
and more powerful
• Want a flexible, modular solution to scalability
Overview
• XenServer architecture evolution – a brief history
• Limitations of the current architecture
• Windsor: 3rd generation XenServer architecture
ᵒ Using domain 0 disaggregation for scalability and performance
This presentation is about the internal technology of
XenServer/XCP. It is not a statement about the future
feature set or capabilities of any particular XenServer
release.
XenServer Architecture – a brief history
XenServer Architecture – a brief history (1)
• First generation architecture in Burbank – XenEnterprise 3.0.0
• Single host virtualisation: no resource pools, no XenMotion
• Based on open-source xend/xm toolstack
• Basic C host agent driving xend
• XenAdmin Java management application
• Initially used a small, read-only dom0
ᵒ Moved to writable environment in 3.1
XenServer Architecture – a brief history (2)
• Second generation architecture in Rio – XenServer 4.0.0
• Current releases, including XenServer 6.1 coming soon, based on this
• All current versions of XCP based on this
• Sophisticated OCaml xapi toolstack implementing the XenAPI
ᵒ Uses only the lowest-level part of the open-source Xen toolstack (libxc)
• Clustered architecture for resource pools, XenMotion and master mobility
• Domain 0 evolved from 1st generation
ᵒ Fairly full Linux distribution based on CentOS
ᵒ Writable environment using RPMs for package management
Limitations of the current architecture
XenServer 2nd generation host architecture
[Figure: XenServer 2nd generation host architecture. Labels recovered from the diagram: dom0; xapi (Storage, Network, Domains, db); SM i/f scripts; xenstore; blktap/blkback; netback; tapdisk; qemu-dm; VM consoles; vswitch; ovs-vswitchd; kernel; Linux storage devices; Linux network devices; XenAPI and streaming services over XML/RPC and HTTP; other pool hosts; DVS Controller; VM1; shared memory pages; Xen Hypervisor; hardware devices.]
Limitations of the current architecture
• With future larger, powerful servers domain 0 will become a bottleneck
ᵒ Will limit scalability and performance
• Lack of failure containment
• Hard to reason about dom0 security
• Non-optimal use of modern NUMA architectures
• Limited third party extensibility
“Windsor” – XenServer’s new architecture
Goals for the new architecture
• Improved scalability on single XenServer host
• Remove performance bottlenecks
• Cloud scale horizontal scaling of XenServer hosts
• Better isolation for multi-tenancy
• Increased availability and quality of service
• Higher Trusted Computing Base measurement coverage
Design principles and overview
• Scale-out, not scale-up – exploit parallelism and locality
• Disaggregate Domain-0 functionality to other domains
• Encapsulate and simplify
• Clean APIs between components
• Flexibility
• Design for scalability and security
The approach
• Can we improve performance and scalability with a bigger domain 0? – Yes
• Can we tune domain 0 to better utilise hardware resource? – Yes
• But... this makes domain 0 a bigger and more complex environment
ᵒ Easy to change one thing and have a negative effect elsewhere
ᵒ Multiple individuals and organisations working in the same complex environment
ᵒ Still don’t get containment of failures
• Instead disaggregate domain 0 functionality into multiple system domains
ᵒ Can be thought of as “virtualising dom0” – after all we tell users that it’s better to have one workload per VM and use a hypervisor to run multiple VMs per server
ᵒ Helps disentangle complex behaviour
ᵒ Easier to provision suitable resources, allows for clear parallelism
ᵒ Contains failures
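The failure-containment argument above can be illustrated with a small model. This is only a sketch under a simplifying assumption that is not from the talk: a crash of any one function takes down every function hosted in the same domain, and dependencies between domains (for example on xenstore) are ignored. The domain and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A domain and the virtualisation functions it hosts (illustrative model)."""
    name: str
    functions: set = field(default_factory=set)


def functions_lost(domains, failed_function):
    """Return every function that goes down when the domain hosting
    `failed_function` crashes. Assumes a crash kills the whole domain."""
    for domain in domains:
        if failed_function in domain.functions:
            return set(domain.functions)
    return set()


# Hypothetical function names, loosely following the slides.
FUNCTIONS = {"toolstack", "netback", "blkback", "qemu", "xenstore"}

# Traditional layout: everything shares fate inside dom0.
monolithic = [Domain("dom0", set(FUNCTIONS))]

# Disaggregated layout: one function per system domain.
disaggregated = [
    Domain("xapi-domain", {"toolstack"}),
    Domain("network-driver-domain", {"netback"}),
    Domain("storage-driver-domain", {"blkback"}),
    Domain("qemu-domain", {"qemu"}),
    Domain("xenstored-domain", {"xenstore"}),
]

if __name__ == "__main__":
    print(sorted(functions_lost(monolithic, "netback")))
    print(sorted(functions_lost(disaggregated, "netback")))
```

In the monolithic layout a netback crash takes every other function with it; in the disaggregated layout only netback is lost.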
Domain 0 disaggregation
• Moving virtualisation functions out of domain 0 into other domains
ᵒ Driver domains contain backend drivers (netback, blkback) and physical device drivers
• Use PCI pass-through to give access to physical device to be virtualised
ᵒ System domains to provide shared services such as logging
ᵒ Domains for running qemu(s) – per VM stub domains or shared qemu domains
ᵒ Monitoring, management and interface domains
• Has been around for a while, mostly with a security focus
ᵒ Improving Xen Security through Disaggregation (Murray et al., VEE08) [http://www.cl.cam.ac.uk/~dgm36/publications/2008-murray2008improving.pdf]
ᵒ Breaking up is hard to do: Xoar (Colp et al., SOSP11) [http://www.cs.ubc.ca/~andy/papers/xoar-sosp-final.pdf]
ᵒ Qubes (driver domains) [http://qubes-os.org]
ᵒ Citrix XenClient XT (network driver domains) [http://www.citrix.com/xenclient]
Targets for disaggregation
• Storage driver domains – storage stack (e.g. VHD), ring backends, device drivers
• Network driver domains – network stack (e.g. OVS), ring backends, device drivers
• xenstored domain – busy, central control database for low level functionality
• xapi management domain
• qemu domain (per node, per tenant or per-VM) – emulated BIOS etc.
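The three granularities listed for the qemu domain (per node, per tenant, per VM) trade isolation against the number of system domains. The sketch below shows how a placement rule might map a VM to a qemu domain under each choice. The names `QemuScope` and `qemu_domain_for`, and the naming scheme for the domains, are hypothetical and not part of XenServer; what counts as a "node" is left open, as in the slide.

```python
from enum import Enum


class QemuScope(Enum):
    """Granularity at which qemu domains are shared (from the slide)."""
    PER_NODE = "per node"
    PER_TENANT = "per tenant"
    PER_VM = "per VM"


def qemu_domain_for(vm_name, tenant, node, scope):
    """Return the (hypothetical) name of the qemu domain serving a VM."""
    if scope is QemuScope.PER_VM:
        return f"qemu-vm-{vm_name}"        # one stub domain per VM
    if scope is QemuScope.PER_TENANT:
        return f"qemu-tenant-{tenant}"     # shared by one tenant's VMs
    return f"qemu-node-{node}"             # shared by all VMs on a node


def count_qemu_domains(vms, scope):
    """Count distinct qemu domains needed for (vm_name, tenant, node) tuples."""
    return len({qemu_domain_for(name, tenant, node, scope)
                for name, tenant, node in vms})


if __name__ == "__main__":
    vms = [("vm1", "acme", 0), ("vm2", "acme", 1), ("vm3", "globex", 0)]
    for scope in QemuScope:
        print(scope.value, count_qemu_domains(vms, scope))
```

Per-VM placement gives the strongest isolation at the cost of one extra domain per guest; the shared options need fewer domains but widen the impact of a qemu failure.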
Benefits
• Better VM density
• Better use of scale-out hardware (NUMA, many cores, balanced I/O)
• Improved stability
• Improved availability (fault isolation)
• Opportunity for secure boot etc.
• More extensible, future value-add opportunities
[Figure: disaggregated host diagram. Labels recovered from the diagram: hardware (CPU, RAM, NICs or SR-IOV VFs, RAID); Xen; Dom0; network driver domains; NFS/iSCSI driver domains; local storage driver domain; qemu domains; xapi domain; logging domain; user VMs. Components shown: domain manager, healthd, xapi, xenopsd, libxl, qemu, vswitch, networkd, tapdisk, blktap3, storaged, syslogd, gntdev, eth and scsi device drivers, NF/BF and NB ring endpoints, dbus over v4v.]
[Figure: system domains (NIC, Storage, Logging, XAPI, qemu, console) running on the XenServer Hypervisor above the hardware.]
Replication to scale-out on big servers
• Scalability: more VMs, more system domains
• Isolation for cloud environments
[Figure: Dom0 and a xenstored domain alongside replicated groups of NIC, Storage and qemu system domains on the XenServer Hypervisor above the hardware; each group serves its own set of user VMs. One domain is labelled ISV.]
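Replicating system domains only helps if guests are spread across the replicas. The talk does not say how VMs are assigned, so the following is a sketch of one possible policy (least-loaded replica first), with hypothetical names throughout.

```python
def assign_vms(vms, replicas):
    """Assign each VM to the replica currently serving the fewest VMs.

    `vms` and `replicas` are lists of names. Ties go to the replica
    listed first. This balancing policy is an assumption made for
    illustration, not something described in the talk.
    """
    if not replicas:
        raise ValueError("at least one system domain replica is required")
    load = {replica: 0 for replica in replicas}
    placement = {}
    for vm in vms:
        target = min(replicas, key=lambda r: load[r])
        placement[vm] = target
        load[target] += 1
    return placement


if __name__ == "__main__":
    vms = [f"vm{i}" for i in range(6)]
    placement = assign_vms(vms, ["net-driver-0", "net-driver-1"])
    for vm, replica in placement.items():
        print(vm, "->", replica)
```

Adding a replica to the list spreads newly placed VMs over more system domains, which is the "more VMs, more system domains" idea in its simplest form.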
Balanced I/O
[Figure: two-socket server. Socket 0 and Socket 1 each show four cores, DIMMs and a PCIe I/O hub with two NICs, and are linked by QPI. Each socket hosts a network driver domain and user VMs; the networks are labelled Net 0 and Net 1.]
• NUMA affinity and memory placement
• Use driver domain on this node
• Redundant failover to other node
• All IO processing, DMA, etc. kept within NUMA node
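The placement rule on this slide (prefer the driver domain on the VM's own NUMA node, fail over to the other node if it is unavailable) can be sketched as follows. The `DriverDomain` type, the `healthy` flag and the function name are hypothetical, introduced only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverDomain:
    """A network driver domain pinned to one NUMA node (illustrative)."""
    name: str
    numa_node: int
    healthy: bool = True


def pick_driver_domain(vm_node, driver_domains):
    """Prefer a healthy driver domain on the VM's NUMA node so that I/O
    processing and DMA stay local; otherwise fail over to any healthy
    driver domain on another node."""
    healthy = [d for d in driver_domains if d.healthy]
    if not healthy:
        raise RuntimeError("no healthy driver domain available")
    local = [d for d in healthy if d.numa_node == vm_node]
    return (local or healthy)[0]


if __name__ == "__main__":
    domains = [DriverDomain("netdd-0", 0), DriverDomain("netdd-1", 1)]
    print(pick_driver_domain(1, domains).name)

    # Simulate the node-1 driver domain failing: VMs on node 1 fail over.
    degraded = [DriverDomain("netdd-0", 0), DriverDomain("netdd-1", 1, healthy=False)]
    print(pick_driver_domain(1, degraded).name)
```

The failover path crosses the inter-socket link, so it trades locality for availability.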
Flexibility – value-add storage appliances (today)
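[Figure: a Xen host with Dom0, a storage appliance VM (“Clever stuff”) and a user VM, connected through ring endpoints labelled BF, BB, NF and NB; Dom0 also shows iSCSI/NFS and the local SR.]
• Appliance exposes its storage to Dom0 via NFS/iSCSI over local networking
• VM block accesses traverse several rings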
Flexibility – value-add storage appliances (Windsor)
[Figure: two Xen hosts, each with Dom0, a driver domain (blktap3, “Clever stuff”, scsi) and a user VM (BF).]
• Network link between cluster nodes (PV VIF or PCI passthrough NIC/SR-IOV VF)
• Access to local disks/SSDs
• Local blk ring, faster than NFS/iSCSI via dom0
Summary
• Next generation XenServer architecture built to scale out using domain 0 disaggregation techniques
ᵒ Scale with the workload and server size
ᵒ Expect to significantly enhance scalability and aggregate performance
• Well-defined APIs between components will allow better extensibility
• Avoids complexity of single, large, multi-function domain 0
ᵒ Easier to reason about
ᵒ Easier to maintain and debug
ᵒ Containment of failures