Ceph on RDMA

Uploaded by somnath-roy, 16-Jul-2015

TRANSCRIPT

CEPH Performance on XIO

Setup

4 OSDs, one per SSD (4TB)

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 (iodepth) = 256 QD; a fio job sketch follows this list

Block size = 4K, 100% RR

Working set ~4 TB

Code base is latest ceph master

Server has 40 cores and 64 GB RAM

Shards : thread_per_shard = 25:1 (i.e., 25 OSD op shards, 1 worker thread per shard)
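For reference, a minimal fio job sketch matching the client description above; the pool, image, and cephx client names are placeholders rather than values from the deck:

    [global]
    ioengine=rbd      ; fio's librbd engine, as used by the fio_rbd clients
    clientname=admin  ; cephx user (placeholder)
    pool=rbdpool0     ; placeholder pool name, one job file per pool
    rbdname=image0    ; placeholder image name
    rw=randread       ; 100% random read
    bs=4k
    direct=1
    numjobs=8         ; 8 jobs ...
    iodepth=32        ; ... x 32 outstanding IOs each = 256 QD per client

    [rbd-test]

Running one such job file per pool from the single client box reproduces the 4-client, 256-QD-per-client load described above.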

Result

Transport | IOPS  | BW    | % of reads served from disk | User %cpu | Sys %cpu | %idle
TCP       | ~50K  | ~200M | ~99                         | ~15       | ~12      | ~55
RDMA      | ~130K | ~520M | ~99                         | ~40       | ~19      | ~11

Summary:
• ~1.5X performance gain
• TCP iops/core = 2777, XIO iops/core = 3651
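The iops/core figures quoted here and on the later slides are consistent with dividing total IOPS by the number of busy cores, i.e. cores * (1 - %idle). A worked check against the table above:

    TCP:  50,000 / (40 * (1 - 0.55)) = 50,000 / 18.0 ≈ 2,777 iops/core
    RDMA: 130,000 / (40 * (1 - 0.11)) = 130,000 / 35.6 ≈ 3,651 iops/core

The BW column likewise follows from IOPS * 4 KB block size (e.g., 50K * 4 KB ≈ 200 MB/s).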

Setup

16 OSDs, one per SSD (4TB)

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD

Block size = 4K, 100% RR

Working set ~4 TB

Code base is latest ceph master

Server has 40 cores and 64 GB RAM

Shards : thread_per_shard = 25:1, 10:1 (a ceph.conf sketch of these knobs follows)
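A sketch of the ceph.conf settings these runs presumably vary. The option names below follow the sharded op-queue and XIO messenger prototype of this era, so treat them as assumptions rather than values confirmed by the deck:

    [global]
    # Transport selection: the default simple messenger for the TCP runs,
    # the Accelio-based XIO messenger for the RDMA runs
    ms_type = xio

    [osd]
    # 25:1 shard configuration: 25 op shards, 1 worker thread per shard
    osd_op_num_shards = 25
    osd_op_num_threads_per_shard = 1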

Result

Transport | IOPS  | BW    | Disk read (%) | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~118K | ~470M | ~99           | ~3                        | ~26                      | ~16
RDMA      | ~120K | ~480M | ~99           | ~7                        | ~25                      | ~28

Summary:
• TCP is catching up; TCP iops/core = 3041, XIO iops/core = 3225 in cluster nodes
• More memory consumed by XIO

Setup

16 OSDs, one per SSD (4TB)

2 hosts, 8 OSDs each

4 pools, 4 rbd images (one per pool); a provisioning sketch follows this list

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD

Block size = 4K, 100% RR

Working set ~6 TB

Code base is latest ceph master

Server has 40 cores and 64 GB RAM

Shards : thread_per_shard = 25:1, 10:1
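One way the pools and images could have been provisioned; pool/image names, PG counts, and image size are illustrative, not from the deck:

    # 4 pools, one ~1 TB rbd image in each (rbd sizes are in MB here)
    for i in 0 1 2 3; do
        ceph osd pool create rbdpool$i 128 128
        rbd create --pool rbdpool$i --size 1048576 image$i
    done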

Result

Transport | IOPS  | BW    | Disk read (%) | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~175K | ~700M | ~99           | ~8                        | ~18                      | ~16
RDMA      | ~238K | ~952M | ~99           | ~14                       | ~20                      | ~28

Summary:
• ~36% performance gain
• TCP iops/core = 4755, XIO iops/core = 6918 in cluster nodes
• More than 10 percentage points higher memory usage with RDMA

Setup

32 OSDs, one per SSD (4TB)

2 hosts, 16 OSDs each

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD

Block size = 4K, 100% RR

Working set ~6 TB

Code base is latest ceph master

Server has 40 cores and 64 GB RAM

Shards : thread_per_shard = 25:1, 10:1, 15:1, 5:2

Result

Transport | IOPS  | BW    | Disk read (%) | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~214K | ~775M | ~99           | ~9                        | ~12                      | ~16
RDMA      | ~230K | ~870M | ~99           | ~12                       | ~18                      | ~28

Summary:
• TCP is catching up again; not much of a gain
• TCP iops/core = 2939, XIO iops/core = 3267 in cluster nodes
• More memory usage per cluster node

Did some testing with a more powerful setup

8 OSDs, one per SSD (4TB)

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD

Block size = 4K, 100% RR

Working set ~4 TB

Code base is latest ceph master

Server has 56 cores (Xeon E5-2697 v3 @ 2.60 GHz) and 64 GB RAM

Shards : thread_per_shard = 25:1

Result

Transport | IOPS  | BW    | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~148K | ~505M | ~99                         | ~15                       | ~68                      | ~11
RDMA      | ~166K | ~665M | ~99                         | ~18                       | ~73                      | ~19

Summary:
• ~12% performance gain
• TCP iops/core = 3109, XIO iops/core = 3616 in cluster nodes
• For client nodes, TCP iops/core = 8258, XIO iops/core = 10978
• More than 8 percentage points higher memory usage with RDMA

Result (no disk hit)

Transport | IOPS  | BW     | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~265K | ~1037M | ~0                          | ~35                       | ~40                      | ~11
RDMA      | ~276K | ~1084M | ~0                          | ~60                       | ~63                      | ~19

Summary:
• Not much difference throughput-wise
• But a significant difference here: TCP iops/core = 7280, XIO iops/core = 12,321 in cluster nodes
• More than 8 percentage points higher memory usage with RDMA

Bumping up OSDs on the same setup

16 OSDs, one per SSD (4TB)

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD

Block size = 4K, 100% RR

Working set ~4 TB

Code base is latest ceph master

Server has 56 cores Xeon E5-2697 v3 @ 2.60GHz and 64 GB RAM

Shards : thread_per_shard = 10:1, 4:2, 25:1

A little experimentation with the xio_portal_threads feature

Result

Transport | IOPS  | BW    | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~142K | ~505M | ~99                         | ~18                       | ~68                      | ~18
RDMA      | ~166K | ~665M | ~99                         | ~18                       | ~73                      | ~38

Summary:
• TCP iops/core = 3092, XIO iops/core = 3614 in cluster nodes
• In client nodes, TCP iops/core = 7924, XIO iops/core = 10978
• More than 2X memory usage by RDMA
• Not much scaling between 8 and 16 OSDs for either TCP or RDMA, even though nothing is saturated at this point!

Result (no disk hit)

Transport | IOPS | BW | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~268K | ~1049M | ~0 | ~37 | ~37 | ~17
RDMA      | ~400K (with OSD-side portal threads = 2, client side = 8) | ~1600M | ~0 | ~40 | ~42 | ~40

Summary:
• Suspecting some lock contention in the OSD layer, started playing with XIO portal threads
• With a lower number of portal threads (2) on the OSD node, the no-disk-hit performance jumped to 400K!
• Increasing the XIO portal threads in the OSD layer decreases performance in this case
• Tried some shard options, but TCP remains almost the same as in the 8-OSD case. Seems like this is a limit.
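A sketch of the asymmetric portal-thread configuration described above, assuming the XIO prototype's xio_portal_threads option (the exact option name is an assumption):

    [osd]
    # fewer Accelio portal (event-loop) threads on the OSD side
    xio_portal_threads = 2

    [client]
    # more portal threads on the client side
    xio_portal_threads = 8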

Checking the scale-out nature

32 OSDs, one per SSD (4TB)

2 nodes with 16 OSDs each

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 4 fio_rbd clients, each with 8 (num_jobs) * 32 = 256 QD

Block size = 4K, 100% RR

Working set ~4 TB

Code base is latest ceph master

Server has 56 cores (Xeon E5-2697 v3 @ 2.60 GHz) and 64 GB RAM

Shards : thread_per_shard = 10:1, 4:2, 25:1

A little experimentation with the xio_portal_threads feature

Result (no disk hit)

Transport | IOPS  | BW     | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~323K | ~1263M | ~0                          | ~40                       | ~12                      | ~18.7
RDMA      | ~343K | ~1339M | ~0                          | ~55                       | ~30                      | ~37.5

Summary:
• TCP is scaling, but XIO is not!
• In fact, XIO gives less throughput here than in the 16-OSD setup!
• TCP iops/core = 4806, XIO iops/core = 6805 in cluster nodes
• TCP iops/core = 6565, XIO iops/core = 8750 in client nodes; the gap is even more significant there
• XIO memory usage per node is again ~2X

Result

Transport | IOPS  | BW     | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~249K | ~973M  | ~99                         | ~22                       | ~18                      | ~15.5
RDMA      | ~258K | ~1006M | ~99                         | ~24                       | ~40                      | ~38

Summary:
• TCP/XIO similar throughput
• TCP iops/core = 5422, XIO iops/core = 7678; significant gain with XIO on the client side
• XIO memory usage per node is again more than 2X

Trying out bigger block sizes

32 OSDs, one per SSD (4TB)

2 nodes with 16 OSDs each

4 pools, 4 rbd images (one per pool)

1 physical client box. Total 1 fio_rbd client, with 8 (num_jobs) * 32 = 256 QD

Couldn't run 4 clients in parallel in the XIO case

Block size = 16K/64K, 100% RR (only the block size line of the fio sketch changes; see after this list)

Working set ~4 TB

Code base is latest ceph master

Server has 56 cores (Xeon E5-2697 v3 @ 2.60 GHz) and 64 GB RAM

Shards : thread_per_shard = 10:1, 4:2, 25:1

A little experimentation with the xio_portal_threads feature
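Relative to the earlier fio job sketch, only the block size line changes for these runs:

    bs=16k    ; or bs=64k for the 64K runs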

Result (32 OSDs, 16K, 1 client)

Transport | IOPS          | BW     | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~150K         | ~2354M | ~99                         | ~35                       | ~48                      | ~15.5
RDMA      | ~152K (spiky) | ~2355M | ~99                         | ~40                       | ~60                      | ~38

Summary:
• TCP/XIO similar throughput
• XIO is very spiky
• Couldn't run more than 1 client (8 num_jobs) with XIO
• But the CPU gain is visible

Result (32 OSDs, 64K, 1 client)

Transport | IOPS              | BW     | % of reads served from disk | Cluster nodes CPU (%idle) | Client nodes CPU (%idle) | Cluster nodes mem (%)
TCP       | ~53K              | ~3312M | ~99                         | ~57                       | ~74                      | ~15.5
RDMA      | ~55K (but spiky)  | ~3625M | ~99                         | ~57                       | ~82                      | ~39

Summary:
• TCP/XIO similar throughput
• XIO is very spiky
• Couldn't run more than 1 client (8 num_jobs) with XIO
• But the CPU gain is visible, especially on the client side

Summary

Highlights:

– Definite improvement in iops/core

– Single client is much more efficient with XIO messenger

– A lower number of OSDs can deliver high throughput

– If we can fix the internal XIO messenger contention, it has the potential to outperform TCP in a big way

Lowlights:

– TCP is catching up fast with increasing OSDs

– TCP also seems to scale out better than XIO

– XIO's present state is *unstable*: some crash/peering problems

– Startup time for a connection is much higher for XIO

– An XIO connection takes time to stabilize to a fixed throughput

– Memory requirement is considerably higher