MySQL on Ceph


MySQL and Ceph
1:20pm – 2:10pm, Room 203

MySQL in the Cloud: Head-to-Head Performance Lab
2:20pm – 3:10pm, Room 203

WHOIS

Brent Compton and Kyle Bader
Storage Solution Architectures, Red Hat

Yves Trudeau
Principal Architect, Percona

AGENDA

MySQL on Ceph
• Why MySQL on Ceph
• Ceph Architecture
• Tuning: MySQL on Ceph
• HW Architectural Considerations

MySQL in the Cloud: Head-to-Head Performance Lab
• MySQL on Ceph vs. AWS
• Head-to-head: Performance
• Head-to-head: Price/performance
• IOPS performance nodes for Ceph

WHY MYSQL ON CEPH

WHY MYSQL ON CEPH? MARKET DRIVERS

• Ceph is the #1 block storage for OpenStack clouds
• MySQL is the #4 workload on OpenStack (#1-3 often use databases too!)
• 70% of apps on OpenStack use the LAMP stack
• Ceph is the leading open-source SDS
• MySQL is the leading open-source RDBMS

WHY MYSQL ON CEPH? OPS EFFICIENCY

• Shared, elastic storage pool
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup to object pool
• Read replicas via copy-on-write snapshots

WHY MYSQL ON CEPH? PUBLIC CLOUD FIDELITY

• Hybrid Cloud requires familiar platforms

• Developers want platform consistency

• Block storage, like the big kids

• Object storage, like the big kids

• Your hardware, datacenter, staff

WHY MYSQL ON CEPH? HYBRID CLOUD REQUIRES HIGH IOPS

Ceph provides:

• Spinning Block – General Purpose
• Object Storage – Capacity
• SSD Block – High IOPS

CEPH ARCHITECTURE

ARCHITECTURAL COMPONENTS

• RGW – A web services gateway for object storage, compatible with S3 and Swift
• LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
• RADOS – A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
• RBD – A reliable, fully-distributed block device with cloud platform integration
• CEPHFS – A distributed file system with POSIX semantics and scale-out metadata

(Diagram: these components are consumed by APP, HOST/VM, and CLIENT layers.)

(Diagram: Ceph OSDs forming a RADOS cluster.)

RADOS COMPONENTS

OSDs

• 10s to 10000s in a cluster

• Typically one per disk

• Serve stored objects to clients

• Intelligently peer for replication & recovery

Monitors

• Maintain cluster membership and state

• Provide consensus for distributed decision-making

• Deployed as a small, odd number

• Do not serve stored objects to clients (see the status sketch below)
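
A brief aside on that division of labor: the minimal python-rados sketch below (assuming the python-rados bindings, a readable /etc/ceph/ceph.conf, and a client keyring) asks the monitors for cluster state the same way `ceph status` does; actual object reads and writes go to the OSDs.

    import json
    import rados

    # Connect using the cluster maps the monitors maintain.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Capacity and object counts, derived from monitor-maintained maps.
    print(cluster.get_cluster_stats())   # {'kb': ..., 'kb_used': ..., 'kb_avail': ..., 'num_objects': ...}

    # The same information `ceph status` prints, via the monitor command interface.
    ret, outbuf, errs = cluster.mon_command(json.dumps({"prefix": "status"}), b'')
    print(outbuf.decode())

    cluster.shutdown()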

WHERE DO OBJECTS LIVE?

1. A metadata server?
2. Calculated placement
Even better: CRUSH

(Diagrams: placement groups (PGs) spread across the cluster; CRUSH is a quick calculation.)

DYNAMIC DATA PLACEMENT

CRUSH (see the toy placement sketch after this list):

• Pseudo-random placement algorithm

• Fast calculation, no lookup

• Repeatable, deterministic

• Statistically uniform distribution

• Stable mapping

• Limited data migration on change

• Rule-based configuration

• Infrastructure topology aware

• Adjustable replication

• Weighting
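
The properties above boil down to calculated, lookup-free placement that any client can repeat. The toy Python sketch below illustrates the idea only; it is not the real CRUSH algorithm (no weights, no topology awareness, no straw buckets), and the PG count, OSD count, and object name are invented. On a live cluster, `ceph osd map <pool> <object>` shows the actual mapping.

    import hashlib

    PG_NUM = 128                                  # placement groups in the pool (assumed)
    OSDS = ["osd.%d" % i for i in range(12)]      # pretend cluster of 12 OSDs
    REPLICAS = 3

    def stable_hash(s):
        # Deterministic hash (unlike Python's per-process salted hash()).
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def object_to_pg(obj_name):
        # Objects hash into a fixed number of placement groups.
        return stable_hash(obj_name) % PG_NUM

    def pg_to_osds(pg):
        # Rank OSDs by a per-PG pseudo-random score and keep the top REPLICAS:
        # a crude stand-in for CRUSH's weighted, topology-aware selection.
        ranked = sorted(OSDS, key=lambda osd: stable_hash("%d:%s" % (pg, osd)))
        return ranked[:REPLICAS]

    obj = "rbd_data.mysqlvol.0000000000000123"    # hypothetical RBD object name
    pg = object_to_pg(obj)
    print(obj, "-> pg", pg, "-> osds", pg_to_osds(pg))

Because the mapping is a pure function of the object name and the cluster map, there is no central lookup table to query or keep consistent.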

DATA IS ORGANIZED INTO POOLS

(Diagram: the cluster's placement groups are grouped into pools, e.g. pools A, B, C, and D.)

ACCESS METHODS


ACCESSING A RADOS CLUSTER

(Diagram: a client application connects to the RADOS cluster over a socket.)

RADOS ACCESS FOR APPLICATIONS

LIBRADOS

• Direct access to RADOS for applications

• C, C++, Python, PHP, Java, Erlang

• Direct access to storage nodes

• No HTTP overhead (see the example below)
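
A minimal python-rados sketch of that direct path (assumptions: the python-rados bindings are installed, /etc/ceph/ceph.conf plus a client keyring are readable, and a pool named mysql-backups already exists; pool and object names are placeholders):

    import rados

    # Talk straight to the storage nodes -- no gateway, no HTTP.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    print("pools:", cluster.list_pools())

    # Write and read back an object in a pool.
    ioctx = cluster.open_ioctx('mysql-backups')
    ioctx.write_full('backup-0001.xbstream', b'... backup bytes ...')
    print(ioctx.read('backup-0001.xbstream'))
    ioctx.close()
    cluster.shutdown()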


STORING VIRTUAL DISKS

(Diagram sequence: virtual machine disks stored as RBD images in the RADOS cluster.)

PERCONA ON KRBD

(Diagram: Percona Server hosts using kernel RBD (krbd) volumes backed by the RADOS cluster.)
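
A short python-rbd sketch of provisioning such a volume (assumptions: python-rados and python-rbd are installed and a pool named mysql-vols exists; the image name and size are placeholders, and in an OpenStack deployment Cinder would normally do this step):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('mysql-vols')      # pool name is a placeholder

    # Create a 100 GiB image to back a MySQL data volume.
    rbd.RBD().create(ioctx, 'mysql01-data', 100 * 1024**3)

    image = rbd.Image(ioctx, 'mysql01-data')
    print("size bytes:", image.size())
    image.close()

    # On the database host the image is then mapped with the kernel client,
    # e.g.: rbd map mysql-vols/mysql01-data   (shell command, not Python).

    ioctx.close()
    cluster.shutdown()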

TUNING MYSQL ON CEPH

TUNING FOR HARMONY: OVERVIEW

Tuning MySQL (see the sketch below)
• Buffer pool > 20%
• Flush each Tx or batch?
• Parallel doublewrite-buffer flush

Tuning Ceph
• RHCS 1.3.2, tcmalloc 2.4
• 128M thread cache
• Co-resident journals
• 2-4 OSDs per SSD
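
The MySQL side of that checklist can be inspected and applied at runtime. The sketch below uses PyMySQL (assumed installed) with placeholder credentials, sizes the buffer pool against host RAM purely to illustrate the >20% rule of thumb, and notes that Percona Server's parallel doublewrite buffer is a server option set in my.cnf rather than a dynamic variable.

    import pymysql

    TOTAL_RAM_BYTES = 64 * 1024**3      # assume a 64 GiB database host
    conn = pymysql.connect(host='127.0.0.1', user='root', password='secret')

    with conn.cursor() as cur:
        # Buffer pool > 20% (resizable online on MySQL / Percona Server 5.7+).
        cur.execute("SET GLOBAL innodb_buffer_pool_size = %d"
                    % int(TOTAL_RAM_BYTES * 0.25))

        # "Flush each Tx or batch?": 1 flushes the redo log at every commit
        # (durable, more IOPS); 2 flushes roughly once per second (the batched
        # trade-off shown in the charts that follow).
        cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 1")

        cur.execute("SHOW VARIABLES LIKE 'innodb_%flush%'")
        for name, value in cur.fetchall():
            print(name, "=", value)

    conn.close()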

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC

(Chart: TpmC over roughly 8,000 seconds, one data point per minute, for 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses; series compare 1%, 5%, 25%, 50%, and 75% buffer pool sizes.)

TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC

(Chart: TpmC over roughly 8,000 seconds, one data point per minute, for 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses; series compare batched Tx flush (1 sec) against per-Tx flush.)

TUNING FOR HARMONY: SAMPLE EFFECT OF CEPH TCMALLOC VERSION ON TpmC

(Chart: TpmC over roughly 8,000 seconds, one data point per minute, for 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses; series compare per-Tx flush against per-Tx flush with tcmalloc v2.4.)

TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS

Creating multiple pools in the CRUSH map (see the sketch below):

• Distinct branch in the OSD tree

• Edit the CRUSH map, add SSD rules

• Create a pool, set crush_ruleset to the SSD rule

• Add a Volume Type to Cinder
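
A hedged sketch of that sequence driven from Python through the ceph and cinder CLIs (the pool name, PG count, rule id, volume-type name, and backend name are all placeholders; it assumes an SSD rule already exists in the edited CRUSH map, and on Luminous and later the pool property is crush_rule rather than crush_ruleset):

    import subprocess

    def run(cmd):
        # Echo and execute one CLI step; raises if the command fails.
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Create a dedicated pool for high-IOPS MySQL volumes (128 PGs is illustrative).
    run(["ceph", "osd", "pool", "create", "mysql-ssd", "128", "128"])

    # 2. Bind the pool to the SSD CRUSH rule (rule id 1 is an assumption).
    run(["ceph", "osd", "pool", "set", "mysql-ssd", "crush_ruleset", "1"])

    # 3. Expose it to OpenStack as a Cinder volume type backed by the SSD pool.
    run(["cinder", "type-create", "high-iops"])
    run(["cinder", "type-key", "high-iops", "set", "volume_backend_name=ceph-ssd"])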

TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA

Reducing seeks on magnetic pools:

• RBD cache is safe (see the sketch below)

• RAID controllers with write-back cache

• SSD journals

• Software caches
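
On the RBD cache point, a minimal sketch of enabling it for a librbd client from Python (the option names are standard Ceph client options that normally live in the [client] section of ceph.conf; the 64 MiB size is an assumption, and the kernel RBD client used in the KRBD setup earlier relies on its own caching path rather than this librbd cache):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    # Writeback caching that stays writethrough until the client issues a flush,
    # which keeps it safe for databases that flush on commit.
    cluster.conf_set('rbd_cache', 'true')
    cluster.conf_set('rbd_cache_writethrough_until_flush', 'true')
    cluster.conf_set('rbd_cache_size', str(64 * 1024 * 1024))
    cluster.connect()
    print("rbd_cache =", cluster.conf_get('rbd_cache'))
    cluster.shutdown()

These settings only affect librbd images opened through this client (e.g. QEMU/KVM guests), one of the seek-reducing options listed above.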

HW ARCHITECTURE CONSIDERATIONS

ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD

Traditional Ceph Workload

• $/GB

• PBs

• Unstructured data

• MB/sec

MySQL Ceph Workload

• $/IOP

• TBs

• Structured data

• IOPS

NEXT UP

MySQL in the Cloud: Head-to-Head Performance Lab
2:20pm – 3:10pm, Room 203
