Bring Intelligence to Your Database Storage

Tong Zhang

Chief Scientist, ScaleFlux Inc.

Professor, Rensselaer Polytechnic Institute (RPI)

Flash Memory Summit 2019

Santa Clara, CA

Computational Storage

❑ A very simple and intuitive idea

➢ One option to architect a heterogeneous computing fabric

➢ Explored by academia over the past 20 years: “Intelligent RAM”, “Active Disk”, …

[Figure: benefit vs. cost of computational storage. Labels: CPUs, DRAM, NVMe SSDs; truly optimized SW (multi-threading, SIMD, …); healthy CMOS scaling in the Good Old Days]

Computational Storage

❑ Reduce cost:

➢ Make in-storage computation as transparent as possible

➢ Avoid any changes to the core structure/algorithm of applications

❑ Improve benefit:

➢ Focus on mainstream, compute/IO-intensive applications

[Figure: system stack. Labels: database (SQL engine, storage engine), OS, computational storage]

Computational Storage for Database

1. Computational storage with in-line transparent compression

[Figure: drive and host stack. Hardware (HW): FPGA with flash control and in-line zlib compression & decompression, NAND flash. Software (SW): Linux FS (ext4, XFS, …), CSD driver.]

[Figure: two before/after panels. I/O-bound database processing becomes less I/O-bound when the flash memory adds in-line compression (labels: 300K IOPS, 500K IOPS). CPU-bound database processing that runs compression on the CPU becomes less CPU-bound when compression moves in-line to the flash memory.]
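The idea of transparent compression can be illustrated with a minimal sketch. It uses Python's standard `zlib` module as a software stand-in for the drive's in-line zlib engine; the 4KB block size, the sample payload, and the helper names `write_block` and `read_block` are assumptions for illustration and do not describe the actual device firmware or driver.

```python
import zlib

BLOCK_SIZE = 4096  # logical block size seen by the host (assumed for illustration)


def write_block(logical: bytes) -> bytes:
    """Compress a logical block before it is stored (stand-in for the drive's write path)."""
    return zlib.compress(logical)


def read_block(physical: bytes) -> bytes:
    """Decompress a stored block on the way back to the host (stand-in for the read path)."""
    return zlib.decompress(physical)


# Hypothetical payload: repetitive database rows packed into one 4KB block.
row = b"id=000123,name=alice,balance=0000100.00;"
logical = (row * (BLOCK_SIZE // len(row) + 1))[:BLOCK_SIZE]

physical = write_block(logical)

# The host always sees the original bytes, so the compression is transparent to it.
assert read_block(physical) == logical
print(f"logical bytes: {len(logical)}, bytes stored: {len(physical)}")
```

Because the host reads back exactly what it wrote, neither the file system nor the database needs to change; only the number of bytes stored on flash differs.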

Computational Storage for Database

1. Computational storage with in-line transparent compression

Read/Write Sweep: SFX vs NVMe SSDs (random, 2.5:1 compressible, FIO, 4KB I/Os, 8 jobs, 32 QD)

[Chart: IOPS (k) across the read/write mix, from 100% reads to 100% writes. Series: CSD 2000, Drive A, Drive B]

✓ Consistent throughput across R/W mix

✓ No burden on CPU for running compression

✓ Reduced Write Amplification

➢ Better endurance

✓ Performance leadership for key applications

➢ OLTP (65-95% Reads)

➢ Mixed Read/Write (50-90% Reads)

➢ Write-Intensive (50%+ Writes)
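A read/write sweep with the stated parameters (random, 4KB I/Os, 8 jobs, queue depth 32, 2.5:1 compressible data) could be scripted roughly as follows. This is a sketch under assumptions, not the configuration used for the slide: the device path, the 10% step between mixes, the `libaio` engine, direct I/O, and the 60-second runtime are all chosen here for illustration. A 2.5:1 ratio is translated to fio's `buffer_compress_percentage` as 1 - 1/2.5 = 60%.

```python
from typing import List


def compress_percentage(ratio: float) -> int:
    """Convert an N:1 compression ratio into the share of each buffer that compresses away."""
    return round((1 - 1 / ratio) * 100)


def fio_command(read_pct: int, device: str = "/dev/nvme0n1") -> List[str]:
    """Build one fio invocation for a given read percentage.

    WARNING: running this against a raw device overwrites its contents.
    The device path is a hypothetical placeholder.
    """
    return [
        "fio",
        f"--name=sweep-r{read_pct}",
        f"--filename={device}",
        "--rw=randrw",                    # random mixed reads and writes
        f"--rwmixread={read_pct}",        # share of reads in the mix
        "--bs=4k",                        # 4KB I/Os
        "--numjobs=8",                    # 8 jobs
        "--iodepth=32",                   # queue depth 32
        f"--buffer_compress_percentage={compress_percentage(2.5)}",
        "--ioengine=libaio",              # assumption: not stated on the slide
        "--direct=1",                     # assumption: bypass the page cache
        "--time_based",
        "--runtime=60",                   # assumption: 60 s per point
        "--group_reporting",
    ]


# Sweep from 100% reads down to 100% writes in 10% steps (step size assumed).
sweep = [fio_command(pct) for pct in range(100, -1, -10)]

for cmd in sweep:
    print(" ".join(cmd))
```

The script only prints the commands; executing them is left to the operator so that the target device can be checked first.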

Computational Storage for Database

2. Computational storage with in-line data filtering

[Figure: hardware/software (HW/SW) partitioning of flash control, NAND flash, decompression, data filtering, and truly intelligent SQL processing]

Computational Storage for Database

2. Computational storage with in-line data filtering

Empower OLTP-oriented databases with more efficient analytics support

[Figure: MyRocks on a customized RocksDB (SW) issues I/O through a Linux FS (ext4, XFS, …) and the CSD driver to the drive (HW), where an FPGA provides flash control plus decompression & data filtering in front of NAND flash]

Computational Storage for Database

MyRocks appears to be an ideal first target

✓ MySQL has built-in support for Engine Condition Pushdown & Index Condition Pushdown

✓ Elegant RocksDB data structure/format ➔ simplifies hardware implementation

[Figure: call path. Engine Condition Pushdown and Index Condition Pushdown enter storage::rocksdb. In the file access layer, RandomAccessFileReader.PushdownRead carries filtering (compute) requests to the SFX compute interface, and RandomAccessFileReader.Read carries ordinary reads/writes to the SFX NVMe block device in the kernel]

Computational Storage for Database

[Figure: drive architecture. A mid-range Xilinx KU15P 16nm FPGA hosts flash control (soft LDPC) and four processing engines (Proc. Engine #1 to #4). It connects to TBs of 3D TLC NAND flash over 16 channels and to the host over PCIe Gen3 x4. Throughput labels: 2.4GB/s; 3.2GB/s, 750K IOPS]

➢ Implementation: MySQL 5.6 & RocksDB 5.18

➢ Supports in-line Snappy decompression and the filter conditions =, !=, >, <, >=, <=, !Null, Null

➢ Engine condition pushdown (ECP): Direct comparison between non-indexed columns and constants inside computational storage devices

➢ Index condition pushdown (ICP): Condition evaluation on the index inside computational storage devices
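The listed filter conditions can be modeled in a few lines. The sketch below evaluates column-versus-constant comparisons on each record and returns only the survivors, which is the essence of engine condition pushdown. The record layout (Python dicts), the function names `matches` and `device_filter`, and the AND-only combination of conditions are assumptions for illustration; the real device works on RocksDB blocks in FPGA logic.

```python
import operator

# The comparison operators listed above, mapped to Python callables.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def matches(record, column, op, constant=None):
    """Evaluate one column-vs-constant condition on a record."""
    value = record.get(column)
    if op == "Null":
        return value is None
    if op == "!Null":
        return value is not None
    # SQL-style semantics: a comparison against NULL is never true.
    return value is not None and OPS[op](value, constant)


def device_filter(block, conditions):
    """Return only the records that satisfy every condition (AND-combined)."""
    return [r for r in block if all(matches(r, *c) for c in conditions)]


# Hypothetical records standing in for rows decoded from one data block.
block = [
    {"id": 1, "qty": 10, "note": None},
    {"id": 2, "qty": 30, "note": "rush"},
    {"id": 3, "qty": 5, "note": "gift"},
]

survivors = device_filter(block, [("qty", "<", 24), ("note", "!Null")])
print([r["id"] for r in survivors])  # prints [3]
```

Only the surviving records would cross the PCIe link, so the host CPU never touches rows that fail the predicate.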

Computational Storage for Database

[Figure: performance benefit and hardware utilization across three processing modes: block-by-block iterative processing, look-ahead batch processing, and parallel & batch processing. Call styles shown: sync call, mixed sync/async call. Timelines show compressed blocks, each followed by a 1-byte type and a 4-byte CRC. The drive architecture diagram (Xilinx KU15P FPGA, flash control with soft LDPC, four processing engines, 16 channels of 3D TLC NAND flash, PCIe Gen3 x4; 2.4GB/s; 3.2GB/s, 750K IOPS) is repeated]
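The block layout in the timeline (a compressed block followed by a 1-byte type and a 4-byte CRC) can be parsed as sketched below. Several details are assumptions made so the example runs with the standard library alone: the trailer is read as little-endian, `zlib.crc32` stands in for whatever checksum the storage engine actually uses, and the helper names and the type value `1` are hypothetical.

```python
import struct
import zlib

# 1-byte type followed by a 4-byte checksum; little-endian is an assumption.
TRAILER = struct.Struct("<BI")


def checksum(payload: bytes, block_type: int) -> int:
    """Stand-in checksum over payload + type (plain CRC-32, used only for illustration)."""
    return zlib.crc32(payload + bytes([block_type]))


def make_block(payload: bytes, block_type: int) -> bytes:
    """Append the 5-byte trailer to a (compressed) payload."""
    return payload + TRAILER.pack(block_type, checksum(payload, block_type))


def verify(stored: bytes):
    """Split off the trailer, check the checksum, and return (payload, type)."""
    payload, trailer = stored[:-TRAILER.size], stored[-TRAILER.size:]
    block_type, crc = TRAILER.unpack(trailer)
    if checksum(payload, block_type) != crc:
        raise ValueError("checksum mismatch")
    return payload, block_type


# Hypothetical compressed payload; zlib stands in for Snappy here.
payload = zlib.compress(b"key1=value1;key2=value2;" * 16)
stored = make_block(payload, block_type=1)

recovered, kind = verify(stored)
assert recovered == payload and kind == 1
print(f"payload bytes: {len(payload)}, trailer bytes: {TRAILER.size}")
```

A fixed-size trailer like this lets a hardware engine find the type and checksum without parsing the payload first, which helps when many blocks are batched into one request.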

Computational Storage for Database

[Chart: query latency (seconds, axis 0 to 3000) by TPC-H query number (1, 2, 4, 6, 5, 7, 10, 12, 14, 15, 19, 20). Series: original MyRocks, MyRocks+CSD]

Test setup: MySQL server, dual E5 2620 V3 @ 2.2GHz, 128GB DRAM, CentOS v. 7.2.1511, MySQL v5.6, TPC-H benchmark, 100GB database, 3.2TB

Massive Data Movement Savings

Massive CPU Processing Savings

Computational Storage for Database

[Figure: a 32-core CPU with DRAM serving drives 1 … 8, with one 100GB instance per drive and 8 client threads per instance. Annotations: CPU vs. FPGA fixed engine; parallelize DB record pre-processing across CSDs; + CPU resource bottlenecks. Percentages shown: 60%, 64%, 75%]

Computational Storage for Database

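Empower OLTP-oriented databases with more efficient analytical processing support

[Figure: two deployments. In one, OLTP and analytics share a single OLTP database server (CPU, DRAM, SSD), which causes memory pollution and CPU contention. In the other, OLTP runs on the OLTP database server while analytics runs on a separate AP database server (CPU, DRAM, SSD), fed by ETL (extraction, transformation, load)]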

Computational Storage for Database

[Figure: OLTP and analytics share one OLTP database server (CPU, DRAM) backed by a CSD with in-line data filtering]

Much less CPU contention & memory pollution ➔ much less impact on OLTP performance

[Chart: Sysbench OLTP read-only + TPC-H Q6. Bars: TP+AP on SSD, TP+AP on CSD, TP only. Annotation: >2x]
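TPC-H Q6 suits in-line filtering because its WHERE clause consists only of column-versus-constant comparisons. The sketch below writes the standard Q6 predicate (default substitution parameters) as five such comparisons, filters a handful of rows "in the device", and leaves only the final SUM to the host. The sample rows and the helper `device_scan` are hypothetical, and the row counts illustrate the idea rather than any measured result.

```python
import operator
from datetime import date
from decimal import Decimal

OPS = {">=": operator.ge, "<=": operator.le, "<": operator.lt}

# TPC-H Q6 predicate with BETWEEN split into >= and <= comparisons.
Q6_CONDITIONS = [
    ("l_shipdate", ">=", date(1994, 1, 1)),
    ("l_shipdate", "<", date(1995, 1, 1)),
    ("l_discount", ">=", Decimal("0.05")),
    ("l_discount", "<=", Decimal("0.07")),
    ("l_quantity", "<", 24),
]


def device_scan(rows, conditions):
    """Stand-in for the in-storage filter: return only rows that pass every condition."""
    return [r for r in rows if all(OPS[op](r[col], const) for col, op, const in conditions)]


def row(shipdate, discount, quantity, price):
    return {
        "l_shipdate": shipdate,
        "l_discount": Decimal(discount),
        "l_quantity": quantity,
        "l_extendedprice": Decimal(price),
    }


# Hypothetical lineitem rows.
lineitem = [
    row(date(1994, 3, 15), "0.06", 10, "1000.00"),   # passes
    row(date(1994, 7, 1), "0.10", 5, "2000.00"),     # discount out of range
    row(date(1995, 2, 1), "0.06", 10, "500.00"),     # ship date out of range
    row(date(1994, 11, 30), "0.05", 30, "800.00"),   # quantity too large
    row(date(1994, 12, 31), "0.07", 23, "300.00"),   # passes
]

survivors = device_scan(lineitem, Q6_CONDITIONS)

# The host only aggregates the rows that reached it.
revenue = sum(r["l_extendedprice"] * r["l_discount"] for r in survivors)
print(f"rows scanned: {len(lineitem)}, rows sent to host: {len(survivors)}, revenue: {revenue}")
```

With the filter running in the drive, the rejected rows never enter host DRAM or consume host CPU cycles, which is the mechanism behind the reduced interference with the OLTP workload.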

Summary & Call to Action

Computational Storage for Data-Driven Applications

▪ Computational storage is finally coming!

▪ Database: One ideal target of computational storage
• General-purpose: In-line transparent compression
• Database-specific: In-line predicate pushdown (successful demonstration with MyRocks on computational storage)

▪ Very large design space to be explored
• Cross-layer innovation across software and hardware
• Close collaboration across industry sectors