MySQL on Ceph

Page 1: MySQL on Ceph

MySQL and Ceph

2:20pm – 3:10pm

Room 203

MySQL in the Cloud: Head-to-Head Performance Lab

1:20pm – 2:10pm

Room 203

Page 2: MySQL on Ceph

WHOIS

Brent Compton and Kyle Bader

Storage Solution Architectures

Red Hat

Yves Trudeau

Principal Architect

Percona

Page 3: MySQL on Ceph

AGENDA

MySQL on Ceph

• Why MySQL on Ceph

• Ceph Architecture

• Tuning: MySQL on Ceph

• HW Architectural Considerations

MySQL in the Cloud: Head-to-Head Performance Lab

• MySQL on Ceph vs. AWS

• Head-to-head: Performance

• Head-to-head: Price/performance

• IOPS performance nodes for Ceph

Page 4: MySQL on Ceph

Why MySQL on Ceph

Page 5: MySQL on Ceph

WHY MYSQL ON CEPH? MARKET DRIVERS

• Ceph: #1 block storage for OpenStack clouds

• MySQL: #4 workload on OpenStack (#1–3 often use databases too!)

• 70% of apps on OpenStack use LAMP

• Ceph: leading open-source SDS

• MySQL: leading open-source RDBMS

Page 6: MySQL on Ceph

WHY MYSQL ON CEPH? OPS EFFICIENCY

• Shared, elastic storage pool

• Dynamic DB placement

• Flexible volume resizing

• Live instance migration

• Backup to object pool

• Read replicas via copy-on-write snapshots (see the sketch below)
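The last bullet can be made concrete. Below is a minimal sketch of cloning a copy-on-write read-replica volume with the python-rbd bindings; the pool name (mysql) and image names are hypothetical, and a real replica would still need its own MySQL replication setup on top of the cloned volume.

```python
import rados
import rbd

# Connect and open the (hypothetical) pool that holds the MySQL RBD volumes.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("mysql")

try:
    # Snapshot the master's volume and protect the snapshot so it can be cloned.
    # (Quiesce or back up the master first so the snapshot is consistent.)
    master = rbd.Image(ioctx, "mysql-master")
    try:
        master.create_snap("replica-base")
        master.protect_snap("replica-base")
    finally:
        master.close()

    # Copy-on-write clone: the replica volume shares all unmodified blocks with
    # the snapshot, so it is cheap to create and to store.
    rbd.RBD().clone(ioctx, "mysql-master", "replica-base",
                    ioctx, "mysql-replica-01",
                    features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()
```

The clone is then attached to a new database instance, which catches up from the master as an ordinary replica.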

Page 7: MySQL on Ceph

WHY MYSQL ON CEPH? PUBLIC CLOUD FIDELITY

• Hybrid Cloud requires familiar platforms

• Developers want platform consistency

• Block storage, like the big kids

• Object storage, like the big kids

• Your hardware, datacenter, staff

Page 8: MySQL on Ceph

WHY MYSQL ON CEPH? HYBRID CLOUD REQUIRES HIGH IOPS

Ceph Provides

• Spinning Block – General Purpose

• Object Storage – Capacity

• SSD Block – High IOPS

Page 9: MySQL on Ceph

CEPH ARCHITECTURE

Page 10: MySQL on Ceph

ARCHITECTURAL COMPONENTS

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

(Diagram: APP, HOST/VM, and CLIENT access paths layered over RADOS)

Page 11: MySQL on Ceph

CEPH OSD

Page 12: MySQL on Ceph

RADOS CLUSTER

Page 13: MySQL on Ceph

RADOS COMPONENTS

OSDs

• 10s to 10000s in a cluster

• Typically one per disk

• Serve stored objects to clients

• Intelligently peer for replication & recovery

Monitors

• Maintain cluster membership and state

• Provide consensus for distributed decision-making

• Deployed in small, odd numbers (e.g., 3 or 5)

• Do not serve stored objects to clients; clients query them for cluster state (see the sketch below)
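A minimal sketch of that division of labor, using the python-rados bindings: the status query below is answered by the monitor quorum, while object reads and writes (shown on a later slide) go straight to the OSDs. The ceph.conf path is the usual default and assumes a client keyring with monitor access.

```python
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    # "status" is served by the monitors: membership, health, OSD/PG counts.
    # No stored object data flows through them.
    ret, outbuf, outs = cluster.mon_command(
        json.dumps({"prefix": "status", "format": "json"}), b"")
    status = json.loads(outbuf)
    print(status["fsid"], status["health"])
finally:
    cluster.shutdown()
```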

Page 14: MySQL on Ceph

WHERE DO OBJECTS LIVE?

Page 15: MySQL on Ceph

A METADATA SERVER?

Page 16: MySQL on Ceph

CALCULATED PLACEMENT

Page 17: MySQL on Ceph

EVEN BETTER: CRUSH

(Diagram: objects hash into PLACEMENT GROUPS (PGs), which CRUSH maps onto the CLUSTER)

Page 18: MySQL on Ceph

CRUSH IS A QUICK CALCULATION

Page 19: MySQL on Ceph

DYNAMIC DATA PLACEMENT

CRUSH:

• Pseudo-random placement algorithm (see the sketch after this list)

• Fast calculation, no lookup

• Repeatable, deterministic

• Statistically uniform distribution

• Stable mapping

• Limited data migration on change

• Rule-based configuration

• Infrastructure topology aware

• Adjustable replication

• Weighting
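A minimal sketch of those properties (not Ceph's actual CRUSH code): placement is a pure function of the object name, the pool's PG count, and the set of OSDs, so any client can recompute it deterministically with no lookup table. Real CRUSH additionally walks the cluster topology described in the CRUSH map and honors per-device weights, which is what keeps data migration limited when the map changes.

```python
import hashlib

def place(object_name: str, pg_num: int, osd_ids: list, replicas: int = 3) -> list:
    """Toy stand-in for CRUSH: hash the object to a placement group (PG), then
    rank OSDs by a hash of (PG, OSD) and keep the top `replicas`. Deterministic,
    repeatable, and lookup-free, but it ignores real topology rules and weights."""
    pg = int(hashlib.md5(object_name.encode()).hexdigest(), 16) % pg_num
    ranked = sorted(osd_ids,
                    key=lambda osd: hashlib.md5(f"{pg}-{osd}".encode()).hexdigest())
    return ranked[:replicas]

# Any client computes the same answer for the same inputs, with no central lookup:
print(place("rbd_data.1234.0000000000000001", pg_num=128, osd_ids=list(range(12))))
```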

Page 20: MySQL on Ceph

DATA IS ORGANIZED INTO POOLS

(Diagram: cluster-wide pools containing PGs: POOL A, POOL B, POOL C, POOL D)
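Pools are an administrative boundary (replica count, PG count, CRUSH rule, quotas). A minimal sketch of creating and listing a pool from a client with the python-rados bindings; the pool name mysql-data is hypothetical, and create_pool uses the cluster's defaults, which you would normally tune afterwards:

```python
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    if not cluster.pool_exists("mysql-data"):
        # Created with the cluster's default pg_num, replica count, and CRUSH rule.
        cluster.create_pool("mysql-data")
    print(cluster.list_pools())
finally:
    cluster.shutdown()
```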

Page 21: MySQL on Ceph

ACCESS METHODS

Page 22: MySQL on Ceph

ARCHITECTURAL COMPONENTS

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

(Diagram: APP, HOST/VM, and CLIENT access paths layered over RADOS)

Page 23: MySQL on Ceph

ARCHITECTURAL COMPONENTS

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

(Diagram: APP, HOST/VM, and CLIENT access paths layered over RADOS)

Page 24: MySQL on Ceph

ACCESSING A RADOS CLUSTER

(Diagram: a client connects to the RADOS CLUSTER over a native socket)

Page 25: MySQL on Ceph

RADOS ACCESS FOR APPLICATIONS

LIBRADOS

• Direct access to RADOS for applications (see the sketch below)

• C, C++, Python, PHP, Java, Erlang

• Direct access to storage nodes

• No HTTP overhead
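A minimal sketch of that direct path using the python-rados bindings; the pool name mysql-backups and the object name are hypothetical. Reads and writes go straight to the OSDs responsible for the object, with no gateway or HTTP in between:

```python
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("mysql-backups")
try:
    # The client computes the object's location (via CRUSH) and talks directly
    # to the primary OSD for its placement group.
    ioctx.write_full("mysql-backup-0001", b"...backup chunk...")
    data = ioctx.read("mysql-backup-0001")
    print(len(data), "bytes read back")
finally:
    ioctx.close()
    cluster.shutdown()
```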

Page 26: MySQL on Ceph

ARCHITECTURAL COMPONENTS

RGW: A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD: A reliable, fully-distributed block device with cloud platform integration

CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

(Diagram: APP, HOST/VM, and CLIENT access paths layered over RADOS)

Page 27: MySQL on Ceph

STORING VIRTUAL DISKS

RADOS CLUSTER

Page 28: MySQL on Ceph

STORING VIRTUAL DISKS

RADOS CLUSTER

Page 29: MySQL on Ceph

STORING VIRTUAL DISKS

RADOS CLUSTER

Page 30: MySQL on Ceph

PERCONA ON KRBD

RADOS CLUSTER

Page 31: MySQL on Ceph

TUNING MYSQL ON CEPH

Page 32: MySQL on Ceph

TUNING FOR HARMONY: OVERVIEW

Tuning MySQL

• Buffer pool > 20%

• Flush each Tx or batch? (see the sketch below)

• Parallel double write-buffer flush

Tuning Ceph

• RHCS 1.3.2, tcmalloc 2.4

• 128M thread cache

• Co-resident journals

• 2-4 OSDs per SSD
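A minimal sketch (illustrative defaults, not the presenters' exact settings) of how the MySQL-side knobs above end up in a my.cnf fragment. The 25% buffer-pool sizing, the flush-mode choice, and O_DIRECT are assumptions to adapt to your dataset, RAM, and durability requirements; the Ceph-side knobs (tcmalloc version, thread cache, journals) are configured on the OSD hosts, not here.

```python
import os

def innodb_fragment(total_ram_bytes: int, flush_per_tx: bool = True,
                    buffer_pool_fraction: float = 0.25) -> str:
    """Render a my.cnf [mysqld] fragment for the knobs discussed on this slide."""
    pool_bytes = int(total_ram_bytes * buffer_pool_fraction)   # "buffer pool > 20%"
    # 1 = flush the redo log on every commit; 2 = write on commit, flush ~once/sec (batched)
    flush_mode = 1 if flush_per_tx else 2
    return "\n".join([
        "[mysqld]",
        f"innodb_buffer_pool_size = {pool_bytes}",
        f"innodb_flush_log_at_trx_commit = {flush_mode}",
        "innodb_flush_method = O_DIRECT  # avoid double-caching below the RBD volume",
    ])

if __name__ == "__main__":
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # Linux-specific
    print(innodb_fragment(ram))
```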

Page 33: MySQL on Ceph

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC

(Chart: tpmC over time in seconds, one data point per minute; 64x MySQL instances on a Ceph cluster, each with 25x TPC-C warehouses; series for 1%, 5%, 25%, 50%, and 75% buffer pool sizes.)

Page 34: MySQL on Ceph

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC

(Chart: tpmC over time in seconds, one data point per minute; 64x MySQL instances on a Ceph cluster, each with 25x TPC-C warehouses; series for batch Tx flush (1 sec) and per-Tx flush.)

Page 35: MySQL on Ceph

TUNING FOR HARMONY: SAMPLE EFFECT OF CEPH TCMALLOC VERSION ON TpmC

(Chart: tpmC over time in seconds, one data point per minute; 64x MySQL instances on a Ceph cluster, each with 25x TPC-C warehouses; series for per-Tx flush and per-Tx flush with tcmalloc v2.4.)

Page 36: MySQL on Ceph

TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS

Creating multiple pools in the CRUSH map

• Distinct branch in OSD tree

• Edit CRUSH map, add SSD rules

• Create pool, set crush_ruleset to the SSD rule (see the sketch below)

• Add Volume Type to Cinder
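A minimal sketch of the middle two steps using python-rados mon_command calls (the equivalent of the ceph CLI). It assumes an SSD CRUSH rule already exists in the edited CRUSH map, here as rule id 1, and the pool name and pg_num are illustrative; on the RHCS 1.3 era releases discussed here the pool variable is crush_ruleset, while newer releases call it crush_rule.

```python
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    # Create a pool dedicated to IOPS-oriented (MySQL) workloads.
    cluster.mon_command(json.dumps(
        {"prefix": "osd pool create", "pool": "mysql-ssd", "pg_num": 128}), b"")

    # Point the pool at the SSD rule so its PGs land only on the SSD branch
    # of the OSD tree.
    cluster.mon_command(json.dumps(
        {"prefix": "osd pool set", "pool": "mysql-ssd",
         "var": "crush_ruleset", "val": "1"}), b"")
finally:
    cluster.shutdown()
```

The new pool is then exposed to OpenStack as a Cinder volume type, which is the last bullet above.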

Page 37: MySQL on Ceph

TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA

Reducing seeks on magnetic pools

• RBD cache is safe

• RAID Controllers with write-back cache

• SSD Journals

• Software caches

Page 38: MySQL on Ceph

HW ARCHITECTURE CONSIDERATIONS

Page 39: MySQL on Ceph

ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD

Traditional Ceph Workload

• $/GB

• PBs

• Unstructured data

• MB/sec

MySQL Ceph Workload

• $/IOP

• TBs

• Structured data

• IOPS

Page 40: MySQL on Ceph

NEXT UP

2:20pm – 3:10pm

Room 203

MySQL in the Cloud: Head-to-Head Performance Lab