ssd deployment strategies for mysql

52
1 Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database SSD Deployment Strategies for MySQL Yoshinori Matsunobu Lead of MySQL Professional Services APAC Sun Microsystems [email protected]

Upload: yoshinori-matsunobu

Post on 15-Jan-2015

25.594 views

Category:

Technology


1 download

DESCRIPTION

Slides for MySQL Conference & Expo 2010: http://en.oreilly.com/mysql2010/public/schedule/detail/13519

TRANSCRIPT

Page 1: SSD Deployment Strategies for MySQL

1Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

SSD Deployment Strategies for MySQL

Yoshinori Matsunobu

Lead of MySQL Professional Services APACSun Microsystems

[email protected]

Page 2: SSD Deployment Strategies for MySQL

2Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

What do you need to consider? (H/W layer)

• SSD or HDD?• Interface

– SATA/SAS or PCI-Express?• RAID

– H/W RAID, S/W RAID or JBOD? • Network

– Is 1GbE enough? • Memory

– Is 2GB RAM + PCI-E SSD faster than 64GB RAM + 8HDDs?

• CPU– Nehalem or older Xeon?

Page 3: SSD Deployment Strategies for MySQL

3Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

What do you need to consider?

• Redundancy– RAID– DRBD (network mirroring)– Semi-Sync MySQL Replication– Async MySQL Replication

• Filesystem– ext3, xfs, raw device ?

• File location– Data file, Redo log file, etc

• SSD specific issues– Write performance deterioration– Write endurance

Page 4: SSD Deployment Strategies for MySQL

4Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Why SSD? IOPS!

• IOPS: Number of (random) disk i/o operations per second

• Almost all database operations require random access– Selecting records by index scan– Updating records– Deleting records– Modifying indexes

• Regular SAS HDD : 200 iops per drive (disk seek & rotation is slow)

• SSD : 2,000+ (writes) / 5,000+ (reads) per drive– highly depending on SSDs and device drivers

• Let’s start from basic benchmarks

Page 5: SSD Deployment Strategies for MySQL

5Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Tested HDD/SSD for this session

• SSD– Intel X25-E (SATA, 30GB, SLC)– Fusion I/O (PCI-Express, 160GB, SLC)

• HDD– Seagate 160GB SAS 15000RPM

Page 6: SSD Deployment Strategies for MySQL

6Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Table of contents

• Basic Performance on SSD/HDD– Random Reads– Random Writes– Sequential Reads– Sequential Writes– fsync() speed– Filesystem difference– IOPS and I/O unit size

• MySQL Deployments

Page 7: SSD Deployment Strategies for MySQL

7Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Random Read benchmark

• HDD: 196 reads/s at 1 i/o thread, 443 reads/s at 100 i/o threads• Intel : 3508 reads/s at 1 i/o thread, 14538 reads/s at 100 i/o threads• Fusion I/O : 10526 reads/s at 1 i/o thread, 41379 reads/s at 100 i/o threads• Single thread throughput on Intel is 16x better than on HDD, Fusion is 25x better• SSD’s concurrency (4x) is much better than HDD’s (2.2x)• Very strong reason to use SSD

Direct Random Read IOPS (Single Drive, 16KB, xfs)

05000

100001500020000

250003000035000

4000045000

1 2 3 4 5 6 8 10 15 20 30 40 50 100 200

# of I/O threads

IOP

S HDD

Intel SSD

Fusion I/O

Page 8: SSD Deployment Strategies for MySQL

8Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

High Concurrency

• Single SSD drive has multiple NAND Flash Memory chips (i.e. 40 x 4GB Flash Memory = 160GB)

• Highly depending on I/O controller and Applications– Single threaded application can not gain concurrency advantage

Page 9: SSD Deployment Strategies for MySQL

9Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

PCI-Express SSD

• Advantage– PCI-Express is much faster interface than SAS/SATA

• (current) Disadvantages– Most motherboards have limited # of PCI-E slots– No hot swap mechanism

North BridgePCI-Express Controller

SSD I/O Controller

Flash

CPU

South BridgeSAS/SATA Controller

2GB/s (PCI-Express x 8) 300MB/s

SSD I/O Controller

Flash

Page 10: SSD Deployment Strategies for MySQL

10Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Write performance on SSD

• Very strong reason to use SSD• But wait.. Can we get a high write throughput *anytime*?

– Not always.. Let’s check how data is written to Flash Memory

Random Write IOPS (16KB Blocks)

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

HDD(4 RAID10 xfs) Intel(xfs) Fusion (xfs)

1 i/o thread

100 i/o threads

Page 11: SSD Deployment Strategies for MySQL

11Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Understanding how data is written to SSD (1)

• Single SSD drive consists of many flash memory chips (i.e. 2GB)• A flash memory chip internally consists of many blocks (i.e. 512KB)• A block internally consists of many pages (i.e. 4KB)• It is *not* possible to overwrite to a non-empty block

– Reading from pages is possible– Writing to pages in an empty block is possible– Appending is possible– Overwriting to pages in a non-empty block is *not* possible

Flash memory chips

Block (empty)

Page

Page

Block (empty)

Block (empty) Block

….

Page 12: SSD Deployment Strategies for MySQL

12Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Understanding how data is written to SSD (2)

• Overwriting to a non-empty block is not possible• Writing new data to an empty block instead• Writing to a non-empty block is fast (-200 microseconds)• Even though applications write to same positions in same files (i.e. InnoDB Log

File), written pages/blocks are distributed (Wear-Leveling)

Block (empty)

Page

Block (empty)

Block (empty)

….

Page

New data×Block

Page 13: SSD Deployment Strategies for MySQL

13Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Understanding how data is written to SSD (3)

• In the long run, almost all blocks will be fully used– i.e. Allocating 158GB files on 160GB SSD

• New empty block must be allocated on writes• Basic steps to write new data:

– 1. Reading all pages from a block– 2. ERASE the block– 3. Writing all data w/ new data into the block

• ERASE is very expensive operation (takes a few milliseconds)• At this stage, write performance becomes very slow because of

massive ERASE operations

New

PBlock

P

1. Reading all pages

2. Erasing the block

P

P

3. Writing all data

PBlock

P

PBlock

P

PBlock

P

PBlock

P

Block

P

PBlock

P

PBlock

P

PBlock

P

New

P

P

P

Block

Page 14: SSD Deployment Strategies for MySQL

14Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Reserved Space

• To keep high enough write performance, SSDs have a feature of “reserved space”

• Data size visible to applications is limited to the size of data space– i.e. 160GB SSD, 120GB data space, 40GB reserved space

• Fusion I/O has a functionality to change reserved space size– # fio-format -s 96G /dev/fct0

PBlock

P

PBlock

P

PBlock

P

PBlock

P

Block

P

PBlock

P

PBlock

P

PBlock

P

PBlock

P

Data Space

New data

Block (empty)

Block (empty)

1. Reading pages

Background jobs ERASE unused blocks

2. Writing data

P

P

Reserved Space

Block

Page 15: SSD Deployment Strategies for MySQL

15Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Write performance deterioration

• At the beginning, write IOPS was close to “Fastest” line• When massive writes happened, write IOPS gradually deteriorated toward

“Slowest” line (because massive ERASE happened)• Increasing reserved space improves steady-state write throughput• Write IOPS recovered to “Fastest” when stopping writes for a long time

(Many blocks were ERASEd by background job)• Highly depending on Flash memory and I/O controller (TRIM support,

ERASE scheduling, etc)

Write IOPS deterioration (16KB random write)

0

5000

10000

15000

20000

25000

30000

Intel Fusion(150G) Fusion(120G) Fusion(96G) Fusion(80G)

IOP

S Fastest

Slowest

Continuous write-intensive workloads

Stopping writing for a while

Page 16: SSD Deployment Strategies for MySQL

16Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Sequential I/O

• Typical scenario: Full table scan (read), logging/journaling (write)• SSD outperforms HDD for sequential reads, but less significant• HDD (4 RAID10) is fast enough for sequential i/o• Data transfer size by sequential writes tends to be huge, so you

need to care about write deterioration on SSD• No strong reason to use SSD for sequential writes

Sequential Read/Write throughput (1MB consecutive reads/writes)

0

100

200

300

400

500

600

4 HDD(raid10, xfs) Intel(xfs) Fusion(xfs)

MB

/s

Seq read

Seq write

Page 17: SSD Deployment Strategies for MySQL

17Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

fsync() speed

• 10,000+ fsync/sec is fine in most cases• Fusion I/O was CPU bound (%system), not I/O bound

(%iowait).

fsync speed

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

HDD(xfs) Intel (xfs) Fusion I/O(xfs)

fsync/se

c

1KB

8KB

16KB

Page 18: SSD Deployment Strategies for MySQL

18Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

HDD is fast for sequential writes / fsync

• Best Practice: Writes can be boosted by using BBWC (Battery Backed up Write Cache), especially for REDO Logs (because it’s sequentially written)

• No strong reason to use SSDs here

disk

Write cache

disk

seek & rotation time

seek & rotation time

Page 19: SSD Deployment Strategies for MySQL

19Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Filesystem matters

• On xfs, multiple threads can write to the same file if opened with O_DIRECT, but can not on ext*

• Good concurrency on xfs, close to raw device• ext3 is less optimized for Fusion I/O

Random write iops (16KB Blocks)

02000400060008000

100001200014000160001800020000

Fusion(ext3) Fusion (xfs) Fusion (raw)

Filesystem

iops 1 thread

16 thread

Page 20: SSD Deployment Strategies for MySQL

20Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Changing I/O unit size

• On HDD, maximum 22% performance difference was found between 1KB and 16KB

• No big difference when concurrency < 10

Read IOPS and I/O unit size (4 HDD RAID10)

0

500

1000

1500

2000

2500

1 2 3 4 5 6 8 10 15 20 30 40 50 100 200

concurrency

IOPS

1KB

4KB

16KB

Page 21: SSD Deployment Strategies for MySQL

21Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Changing I/O unit size on SSD

• Huge difference• On SSDs, not only IOPS, but also I/O transfer size matters• It’s worth considering that Storage Engines support

“configurable block size” functionality

Read IOPS and I/O unit size (Fusion I/O)

0

50000

100000

150000

200000

1 2 3 4 5 6 8 10 15 20 30 40 50 100 200

concurrency

IOP

S 1KB

4KB

16KB

Page 22: SSD Deployment Strategies for MySQL

22Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Let’s start MySQL benchmarking• Base: Disk-bound application (DBT-2) running on:

– Sun Fire X4270– Nehalem 8 Core– 4 HDD– RAID1+0, Write Cache with Battery

• What will happen if …– Replacing HDD with Intel SSD (SATA)– Replacing HDD with Fusion I/O (PCI-E)– Moving log files and ibdata to HDD– Not using Nehalem– Using two Fusion I/O drives with Software RAID1– Deploying DRBD protocol B or C

• Replacing 1GbE with 10GbE– Using MySQL 5.5.4

Page 23: SSD Deployment Strategies for MySQL

23Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

DBT-2 condition

• SuSE Enterprise Linux 11, xfs• MySQL 5.5.2M2 (InnoDB Plugin 1.0.6)• 200 Warehouses (20GB – 25GB hot data)• Buffer pool size

– 1GB– 2GB– 5GB– 30GB (large enough to cache all data)

• 1000 seconds warm up time• Running 3600 seconds (1 hour)• Fusion I/O: 96GB data space, 64GB reserved space

Page 24: SSD Deployment Strategies for MySQL

24Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

HDD vs Intel SSD

• Storing all data on HDD or Intel SSD• Massive disk i/o happens

– Random reads for all accesses– Random writes for updating rows and indexes– Sequential writes for REDO log files, etc

• SSD is very good at these kinds of workloads• 5.5 times performance improvement, without any application

change!

HDD Intel

Buffer pool 1G 1125.44 5709.06 (NOTPM: Transactions per minute)

Page 25: SSD Deployment Strategies for MySQL

25Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

HDD vs Intel SSD vs Fusion I/O

• Fusion I/O is a PCI-E based SSD• PCI-E is much faster than SAS/SATA• 14x improvement compared to 4HDDs

HDD Intel Fusion I/O

Buffer pool 1G 1125.44 5709.06 15122.75

Page 26: SSD Deployment Strategies for MySQL

26Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Which should we spend money, RAM or SSD?

• Increasing RAM (buffer pool size) reduces random disk reads– Because more data are cached in the buffer pool

• If all data are cached, only disk writes (both random and sequential) happen

• Disk writes happen asynchronously, so application queries can be much faster

• Large enough RAM + HDD outperforms too small RAM + SSD

HDD Intel Fusion I/O

Buffer pool 1G 1125.44 5709.06 15122.75Buffer pool 2G 1863.19

Buffer pool 5G 4385.18

Buffer pool 30G(Caching all hot data)

36784.76

Page 27: SSD Deployment Strategies for MySQL

27Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Which should we spend money, RAM or SSD?

• It is not always possible to cache all hot data• Fusion I/O + good amount of memory (5GB) was pretty good

• Basic rule can be: – If you can cache all active data, large enough RAM + HDD– If you can’t, or if you need extremely high throughput, spending on

both RAM and SSD

HDD Intel Fusion I/O

Buffer pool 1G 1125.44 5709.06 15122.75Buffer pool 2G 1863.19 7536.55 20096.33Buffer pool 5G 4385.18 12892.56 30846.34Buffer pool 30G(Caching all hot data)

36784.76 - 57441.64

Page 28: SSD Deployment Strategies for MySQL

28Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Let’s think about MySQL file location• SSD is extremely good at random reads• SSD is very good at random writes• HDD is good enough at sequential reads/writes• No strong reason to use SSD for sequential writes

• Random I/O oriented:– Data Files (*.ibd)

• Sequential reads if doing full table scan– Undo Log, Insert Buffer (ibdata)

• UNDO tablespace (small in most cases, except for running long-running batch)• On-disk insert buffer space (small in most cases, except that InnoDB can not

catch up with updating indexes)

• Sequential Write oriented:– Doublewrite Buffer (ibdata)

• Write volume is equal to *ibd files. Huge– Binary log (mysql-bin.XXXXXX)– Redo log (ib_logfile)– Backup files

Page 29: SSD Deployment Strategies for MySQL

29Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Moving sequentially written files into HDD

• Moving ibdata, ib_logfile, (+binary logs) into HDD• High impact on performance

– Write volume to SSD becomes half because doublewrite area is allocated in HDD

– %iowait was significantly reduced– You can delay write performance deterioration

Fusion I/O Fusion I/O + HDD Up

Buffer pool 1G 15122.75 (us=25%, wa=15%)

19295.94(us=32%, wa=10%)

+28%

Buffer pool 2G 20096.33(us=30%, wa=12.5%)

25627.49(us=36%, wa=8%)

+28%

Buffer pool 5G 30846.34(us=39%, wa=10%)

39435.25(us=49%, wa=6%)

+28%

Buffer pool 30G 57441.64(us=70%, wa=3.5%)

66053.68(us=77%, wa=1%)

+15%

Page 30: SSD Deployment Strategies for MySQL

30Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Does CPU matter?

• Nehalem has two big advantages1. Memory is directly attached to CPU : Faster for in-memory workloads2. Interface speed between CPU and North Bridge is 2.5x higher, and

interface traffics do not conflict with CPU<->Memory workloads : Faster for disk i/o workloads when using PCI-Express SSDs

Nehalem Older Xeon

CPUs Memory

North Bridge(IOH)

PCI-Express

QPI: 25.6GB/s

CPUs

MemoryNorth Bridge

(MCH)

PCI-Express

FSB: 10.6GB/s

Page 31: SSD Deployment Strategies for MySQL

31Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Harpertown X5470 (older Xeon) vs Nehalem X5570 (HDD)

• CPU difference matters on CPU bound workloads

HDD Harpertown X5470, 3.33GHz

Nehalem(X5570, 2.93GHz)

Up

Buffer pool 1G 1135.37 (us=1%) 1125.44 (us=1%) -1%Buffer pool 2G 1922.23 (us=2%) 1863.19 (us=2%) -3%Buffer pool 5G 4176.51 (us=7%) 4385.18(us=7%) +5%Buffer pool 30G 30903.4 (us=40%) 36784.76 (us=40%) +19%

us: userland CPU utilization

Page 32: SSD Deployment Strategies for MySQL

32Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Harpertown X5470 vs Nehalem X5570 (Fusion)

• TPM difference was much higher than HDD• For disk i/o bound workloads (buffer pool 1G/2G), CPU utilizations

on Nehalem were smaller, but TPM were much higher– Verified that Nehalem is much more efficient for PCI-E workloads

• Benefit from high interface speed between CPU and PCI-Express• Fusion I/O fits with Nehalem much better than with traditional CPUs

Fusion I/O+HDD Harportown X5470, 3.33GHz

Nehalem(X5570, 2.93GHz)

Up

Buffer pool 1G 13534.06 (user=35%) 19295.94 (user=32%) +43%Buffer pool 2G 19026.64 (user=40%) 25627.49 (user=37%) +35%Buffer pool 5G 30058.48 (user=50%) 39435.25 (user=50%) +31%Buffer pool 30G 52582.71 (user=76%) 66053.68 (user=76%) +26%

Page 33: SSD Deployment Strategies for MySQL

33Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

We need to think about redundancy overhead

• Single server + No RAID is meaningless in the real database world

• Redundancy– RAID 1 / 5 / 10– Network mirroring (DRBD)– Replication (Sync / Async)

• Relative overhead for redundancy will be (much) higher than on traditional HDD environment

Page 34: SSD Deployment Strategies for MySQL

34Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Fusion I/O + Software RAID1

• Fusion I/O itself has RAID5 feature– Writing parity bits into Flash Memory– Flash Chips are not Single Point of Failure – Controller / PCI-E Board is Single Point of Failure

• Right now no H/W RAID controller is provided for PCI-E SSDs

• Using Software RAID1 (or RAID10)– Two Fusion I/O drives in the same machine

Page 35: SSD Deployment Strategies for MySQL

35Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Understanding how software RAID1 works

• Response time on Software RAID1 is max(time-to-write-to-disk1, time-to-write-to-disk2)

• If either of the two takes time for ERASE, response time will be longer

• On faster storages / faster writes (i.e. sequential write + fsync), relative overheads of the software raid process are higher

Software RAID daemon“md0_raid1” process

Disk1 Disk2

H/W RAID1

Write cache with battery

Disk1 Disk2

Writing to files on /dev/sdX Response

RAID controllerBackground writes

(in parallel)

Writing to files on /dev/md0

Response

Writing to disks (in parallel)

S/W RAID1App/DB App/DB

Page 36: SSD Deployment Strategies for MySQL

36Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Random Write IOPS, S/W RAID1 vs No-RAID

• 120GB data space = 40GB additional reserved space• 96GB data space = 64GB additional reserved space• On S/W RAID1, IOPS deteriorated more quickly than on No-RAID• On S/W RAID1 with 96GB data space, the slowest line was smaller than No-RAID• 20-25% performance drop can be expected on disk write bound workloads

Random Write IOPS (Fusion I/O 160GB SLC, 16KB I/O unit, XFS)

05000

100001500020000250003000035000400004500050000

1 61 121 181 241 301 361 421 481

Running time (minutes)

IOP

S

No-RAID (120G)

S/W RAID1 (120G)

No-RAID (96G)

S/W RAID1 (96G)

Page 37: SSD Deployment Strategies for MySQL

37Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

What about Reads?

• Theoretically reads IOPS can be twice by RAID1• Peak IOPS was 43636 on No-RAID, 75627 on RAID, 73% up• Good scalability

Read IOPS (16KB Blocks)

0

10000

20000

30000

40000

50000

60000

70000

80000

1 2 3 4 5 6 8 10 15 20 30 40 50 100 200

concurrency

IOP

S No-RAID

S/W RAID1

Page 38: SSD Deployment Strategies for MySQL

38Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

DBT-2, No-RAID vs S/W RAID on Fusion I/O

Fusion I/O+HDD RAID 1 Fusion I/O+HDD

%iowait Down

Buffer pool 1G 19295.94 15468.81 10% -19.8%Buffer pool 2G 25627.49 21405.23 8% -16.5%Buffer pool 5G 39435.25 35086.21 6-7% -11.0%Buffer pool 30G 66053.68 66426.52 0-1% +0.56%

Page 39: SSD Deployment Strategies for MySQL

39Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Intel SSDs with a traditional H/W raid controller

• Raw SSD drives performed much better than using a traditional H/W raid controller

– Even on RAID10 performance was worse than single raw drive– H/W Raid controller seemed serious bottleneck– Make sure SSD drives have write cache and capacitor itself (Intel X25-

V/M/E doesn’t have capacitor)• Use JBOD + write cache + capacitor• Research appliances such as Schooner, Gear6, etc• Wait until H/W vendors release great H/R raid controllers that work well

with SSDs

Single raw Intel Four RAID5 Intel Down

Buffer pool 1G 5709.06 2975.04 -48%Buffer pool 2G 7536.55 4763.60 -37%Buffer pool 5G 12892.56 11739.27 -9%

Page 40: SSD Deployment Strategies for MySQL

40Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

What about DRBD? • Single server is not Highly Available

– Mother Board/RAID Controller/etc are Single Point of Failure• Heartbeat + DRBD + MySQL is one of the most

common HA (Active/Passive) solutions• Network might be a bottleneck

– 1GbE -> 10GbE, InfiniBand, Dolphin Interconnect, etc• Replication level

– Protocol A (async)– Protocol B (sync to remote drbd receiver process)– Protocol C (sync to remote disk)

• Network channel is single threaded– Storing all data under /data (single DRBD partition) => single

thread– Storing log/ibdata under /hdd, *ibd under /ssd => two

threads

Page 41: SSD Deployment Strategies for MySQL

41Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

DRBD Overheads on HDD

• DRBD 8.3.7• DRBD overhead (protocol B) was not big on disk i/o bound

workloads• Network bandwidth difference was not big on disk i/o bound

workloads

HDD No DRBD DRBD Protocol B, 1GbE

DRBD Protocol B, 10GbE

Buffer pool 1G 1125.44 1080.8 1101.63Buffer pool 2G 1863.19 1824.75 1811.95Buffer pool 5G 4385.18 4285.22 4326.22Buffer pool 30G 36784.76 32862.81 35689.67

Page 42: SSD Deployment Strategies for MySQL

42Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

DRBD Overheads on Fusion I/O

• DRBD overhead was not negligible• 10GbE performed much better than 1GbE• Still 6-10 times faster than HDD• Note: DRBD supports faster interface such as InfiniBand SDP

and Dolphin Interconnect

Fusion I/O+HDD No DRBD DRBD Protocol B, 1GbE

Down DRBD Protocol B, 10GbE

Down

Buffer pool 1G 19295.94 5976.18 -69.0% 12107.88 -37.3%

Buffer pool 2G 25627.49 8100.5 -68.4% 16776.19 -34.5%

Buffer pool 5G 39435.25 16073.9 -59.2% 30288.63 -23.2%

Buffer pool 30G 66053.68 37974 -42.5% 62024.68 -6.1%

Page 43: SSD Deployment Strategies for MySQL

43Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Misc topic: Insert performance on InnoDB vs MyISAM (HDD)

• MyISAM doesn’t do any special i/o optimization like “Insert Buffering” so a lot of random reads/writes happen, and highly depending on OS

• Disk seek & rotation overhead is really serious on HDD

Time to insert 1 million records (HDD)

0

1000

2000

3000

4000

5000

1 10 19 28 37 46 55 64 73 82 91 100 109 118 127 136 145

Existing records (millions)

Sec

onds

innodb

myisam

250 rows/s

Page 44: SSD Deployment Strategies for MySQL

44Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Note: Insert Buffering (InnoDB feature)• If non-unique, secondary index blocks are not in

memory, InnoDB inserts entries to a special buffer(“insert buffer”) to avoid random disk i/o operations

– Insert buffer is allocated on both memory and innodb SYSTEM tablespace

• Periodically, the insert buffer is merged into the secondary index trees in the database (“merge”)

• Pros: Reducing I/O overhead– Reducing the number of disk i/o operations by merging i/o

requests to the same block– Some random i/o operations can be sequential

• Cons: Additional operations are added Merging might take a very long time

– when many secondary indexes must be updated and many rows have been inserted.

– it may continue to happen after a server shutdown and restart

Insert buffer

Optimized i/o

Page 45: SSD Deployment Strategies for MySQL

45Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Insert performance: InnoDB vs MyISAM (SSD)

• MyISAM got much faster by just replacing HDD with SSD !

Time to insert 1million records (SSD)

0

100

200

300

400

500

600

1 7 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97 103

Existing records (millions)

Secon

ds

InnoDB

MyISAM

Index size exceeded buffer pool sizeFilesystem cache was fully used, disk reads began

5,000 rows/s

2,000 rows/s

Page 46: SSD Deployment Strategies for MySQL

46Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Try MySQL 5.5.4 !

• Got 20-26% improvements for disk i/o bound workloads on Fusion I/O– Both CPU %user and %iowait were improved

• %user: 36% (5.5.2) to 44% (5.5.4) when buf pool = 2g• %iowait: 8% (5.5.2) to 5.5% (5.5.4) when buf pool = 2g, but iops was

20% higher– Could handle a lot more concurrent i/o requests in 5.5.4 !– No big difference was found on 4 HDDs

• Works very well on faster storages such as Fusion I/O, lots of disks

Fusion I/O + HDD MySQL5.5.2 MySQL5.5.4 UpBuffer pool 1G 19295.94 24019.32 +24%Buffer pool 2G 25627.49 32325.76 +26%Buffer pool 5G 39435.25 47296.12 +20Buffer pool 30G 66053.68 67253.45 +1.8%

Page 47: SSD Deployment Strategies for MySQL

47Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Conclusion for choosing H/W• Disks

– PCI-E SSDs (i.e. Fusion I/O) perform very well– SAS/SATA SSDs (i.e. Intel X25) – Carefully research RAID controller. Many controllers do not

scale with SSD drives– Keep enough reserved space if you need to handle massive

write traffics– HDD is good at sequential writes

• Use fast network adapter– 1GbE will be saturated on DRBD– 10GbE or Infiniband

• Use Nahalem CPU– Especially when using PCI-Express SSDs

Page 48: SSD Deployment Strategies for MySQL

48Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Conclusion for database deployments

• Put sequentially written files on HDD– ibdata, ib_logfile, binary log files– HDD is fast enough for sequential writes– Write performance deterioration can be mitigated– Life expectancy of SSD will be longer

• Put randomly accessed files on SSD– *ibd files, index files(MYI), data files(MYD)– SSD is 10x -100x faster for random reads than HDD

• Archive less active tables/records to HDD– SSD is still much expensive than HDD

• Use InnoDB Plugin– Higher scalability & concurrency matters on faster storage

Page 49: SSD Deployment Strategies for MySQL

49Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

What will happen in the real database world?• These are just my thoughts..

• Less demand for NoSQL– Isn’t it enough for many applications just to replace HDD with Fusion I/O? – Importance on functionality will be relatively stronger

• Stronger demand for Virtualization– Single server will have enough capacity to run two or more mysqld

instances

• I/O volume matters– Not just IOPS– Block size, disabling doublewrite, etc

• Concurrency matters– Single SSD scales as well as 8-16 HDDs– Concurrent ALTER TABLE, parallel query

Page 50: SSD Deployment Strategies for MySQL

50Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Special Thanks To

• Koji Watanabe – Fusion I/O Japan• Hideki Endo – Sumisho Computer Systems, Japan

– Rent me two Fusion I/O 160GB SLC drives

• Daisuke Homma, Masashi Hasegawa - Sun Japan– Did benchmarks together

Page 51: SSD Deployment Strategies for MySQL

51Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database

Thanks for attending!

• Contact:– E-mail: [email protected]– Blog http://yoshinorimatsunobu.blogspot.com– @matsunobu on Twitter

Page 52: SSD Deployment Strategies for MySQL

52Copyright 2010 Sun Microsystems inc The World’s Most Popular Open Source Database