Which Hypervisor is Best? MySQL on Ceph
3:30pm – 4:20pm, Room 203
WHOIS
Kyle Bader, Storage Solution Architectures, Red Hat
Yves Trudeau, Principal Architect, Percona
AGENDA
• Ceph Architecture Elevator Pitch
• Tuning Ceph Block (RBD)
• Tuning QEMU Block Virtualization
• Benchmarks
Ceph Architecture
ARCHITECTURAL COMPONENTS
• RGW: A web services gateway for object storage, compatible with S3 and Swift
• LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
• RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
• RBD: A reliable, fully-distributed block device with cloud platform integration
• CEPHFS: A distributed file system with POSIX semantics and scale-out metadata
Linux Containers vs Virtual Machines
KVM/QEMU RBD BACKEND
[Diagram: a VM on a hypervisor using librbd to access a RADOS cluster with monitors]
PERCONA ON KRBD
M M
RADOS CLUSTER
CONTAINER HOSTKRBD
TUNING CEPH BLOCK
• Format
• Order
• Fancy Striping
• TCP_NODELAY
RBD FORMAT
• Format 1
  • Deprecated
  • Supported by all versions of Ceph
  • No reason to use it in a greenfield environment
• Format 2
  • New, default format
  • Supports snapshots and clones
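As a sketch (the pool and image names here are invented for illustration), creating a format 2 image and exercising its snapshot/clone support with the rbd CLI might look like:

```shell
# Create a 100GB format 2 image (format 2 is the default in recent Ceph releases)
rbd create mysql-pool/db01 --size 102400 --image-format 2

# Format 2 enables snapshots and copy-on-write clones
rbd snap create mysql-pool/db01@base
rbd snap protect mysql-pool/db01@base
rbd clone mysql-pool/db01@base mysql-pool/db01-clone
```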
RBD ORDER
• The chunk / striping boundary for the block device
• Default is 4MB, i.e. order 22 (4MB = 2^22 bytes)
• We used the default during our testing
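The object size follows directly from the order as 2^order bytes, which a quick shell sanity check confirms:

```shell
# RBD object size = 2^order bytes; order 22 gives the 4MB default
for order in 22 23 25; do
  printf 'order %s -> %s bytes (%s MiB)\n' \
    "$order" "$((1 << order))" "$(((1 << order) / 1048576))"
done
# → order 22 -> 4194304 bytes (4 MiB)
# → order 23 -> 8388608 bytes (8 MiB)
# → order 25 -> 33554432 bytes (32 MiB)
```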
RBD: Fancy Striping
• Only available with QEMU / librbd
• Finer striping for parallelization of small writes across objects
• Helps with some HDD workloads
• We used the default during our testing
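The rbd CLI exposes fancy striping through the --stripe-unit and --stripe-count flags; a hedged sketch (pool/image names and sizes are illustrative, not from the talk):

```shell
# Stripe writes in 64K units round-robin across 16 objects at a time,
# so small sequential writes fan out instead of hitting one 4MB object
rbd create mysql-pool/db02 --size 102400 --image-format 2 \
    --order 22 --stripe-unit 65536 --stripe-count 16
```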
TCP_NODELAY
• Disables Nagle's algorithm
• Important for latency-sensitive workloads
• Good for maximizing IOPS -> MySQL
• Default in QEMU
• Default in KRBD
  • Added in mainline 4.2
  • Backported to RHEL 7.2 (3.10-236+)
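On the Ceph side this behavior maps to the messenger option ms tcp nodelay, which already defaults to true; a ceph.conf fragment making it explicit (shown only as a sketch):

```ini
[global]
# ms tcp nodelay defaults to true; set explicitly here for clarity
ms tcp nodelay = true
```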
TUNING QEMU BLOCK VIRTUALIZATION
TUNING QEMU BLOCK
• Paravirtual Devices
• AIO Mode
• Caching
• x-data-plane
• num_queues
QEMU: PARAVIRTUAL DEVICES
• virtio-blk
• virtio-scsi
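A minimal sketch of how the two paravirtual device models are wired up on the QEMU command line (image name and memory size are placeholders, not the talk's setup):

```shell
# virtio-blk: one paravirtual block device per drive
qemu-system-x86_64 -enable-kvm -m 4096 \
  -drive file=rbd:mysql-pool/db01,format=raw,if=none,id=drive0,cache=none \
  -device virtio-blk-pci,drive=drive0

# virtio-scsi: a paravirtual SCSI controller; disks attach behind it,
# which is what later enables num_queues tuning
qemu-system-x86_64 -enable-kvm -m 4096 \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=rbd:mysql-pool/db01,format=raw,if=none,id=drive0,cache=none \
  -device scsi-hd,drive=drive0
```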
QEMU: AIO MODE
• Threads
  • Software implementation of AIO using a thread pool
• Native
  • Uses kernel AIO
  • The way to go in the future
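The AIO mode is selected per drive with the aio= option; illustrative -drive fragments (the /dev/rbd0 path stands in for a mapped krbd device):

```shell
# aio=threads: userspace thread-pool AIO, works with any cache mode
-drive file=/dev/rbd0,format=raw,if=virtio,aio=threads,cache=writeback

# aio=native: kernel AIO; requires O_DIRECT, i.e. cache=none or cache=directsync
-drive file=/dev/rbd0,format=raw,if=virtio,aio=native,cache=none
```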
QEMU: CACHING
                       Writeback   None      Writethrough   Directsync
Uses host page cache   Yes         No        Yes            No
Guest disk WCE         Enabled     Enabled   Disabled       Disabled
rbd_cache              True        False     True           False
rbd_max_dirty          25165824    0         0              0
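The rbd_cache rows above correspond to client-side librbd options in ceph.conf; a sketch for the writeback column (the canonical option name for the dirty limit is rbd cache max dirty):

```ini
[client]
# Writeback column: librbd caching with up to ~24MB of dirty data
rbd cache = true
rbd cache max dirty = 25165824
# Setting rbd cache max dirty = 0 instead gives writethrough behavior
```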
QEMU: Timers
• Block storage benchmark tool – fio
• Very frequent access to CPU timing registers
• Accesses need to be emulated
• Can block the main QEMU event loop under concurrent high IO load
BENCHMARKS
BENCHMARKS
• Sysbench OLTP, 32 tables of 28M rows each, ~200GB
• MySQL config: 50GB buffer pool, 8MB log file size, ACID
• Filesystem: XFS with noatime, nodiratime, nobarrier
• Data reloaded before each test
• 100% reads: --oltp-point-select=100
• 100% writes: --oltp-index-updates=100
• 70%/30% reads/writes: --oltp-index-updates=28 --oltp-point-select=70 --rand-type=uniform
• 20 minute run time per test, iterations averaged
• 64 threads, 8 cores
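Assembled from the flags above, a 100%-reads run would look roughly like the following (sysbench 0.5 style; the lua script path, host, and credentials are placeholders):

```shell
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
  --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=secret \
  --oltp-tables-count=32 --oltp-table-size=28000000 \
  --num-threads=64 --max-time=1200 --max-requests=0 \
  --oltp-point-select=100 --oltp-index-updates=0 \
  --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
  --oltp-order-ranges=0 --oltp-distinct-ranges=0 \
  --rand-type=uniform run
```

For the write and mixed workloads, only the --oltp-point-select / --oltp-index-updates proportions change, per the bullets above.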
BASIC QEMU PERFORMANCE
[Chart: IOPS (0–35,000) for Reads, Writes, and R/W 70/30 under qemu tcg, qemu-kvm default, io=threads cache=none, and io=native cache=none]
THREAD CACHING MODES
[Chart: IOPS (0–30,000) for Reads, Writes, and R/W 70/30 under io=threads with cache=none, cache=writethrough, and cache=writeback]
DEDICATED DISPATCH THREADS
[Chart: IOPS (0–35,000) for Reads, Writes, and R/W 70/30 under io=native cache=none, io=native cache=directsync, io=native cache=directsync iothread=1, and io=native cache=directsync iothread=2]
DATA PLANE AND VIRTIO-SCSI QUEUES
[Chart: IOPS (0–40,000) for Reads, Writes, and R/W 70/30 under x-data-plane; virtio-scsi num-queues=4; virtio-scsi num-queues=2, vectors=3; and virtio-scsi num-queues=4, vectors=5]
CONTAINERS AND METAL
[Chart: IOPS (0–60,000) for Reads, Writes, and R/W 70/30 on metal (taskset -c 10-17), lxc (cgroup cpu 10-17), io=threads cache=none, io=native cache=none, and virtio-scsi num-queues=2, vectors=3]
THANK YOU