title of presentation - snia...o olap (online analytical processing) not possible to do analytical...

25
2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 1 Computational Storage Make Relational Databases Efficiently Support Data Analytics Yang Liu Chief Architect, ScaleFlux Inc.

Upload: others

Post on 23-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 1

Computational StorageMake Relational Databases Efficiently Support Data Analytics

Yang LiuChief Architect, ScaleFlux Inc.

Page 2: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 2

Agenda

Computational Storage Analytical Workloads with Computational

Storage

Page 3: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 3

The Rise of Computational Storage

Domain Specific Compute

Compute

FPGA/GPU/TPU

End of Moore’s Law

Networking

SmartNICs

10 → 100-400Gb/s

Storage

Computational Storage

Fast & Big Data Growth

Tremendous Value-Add to Applications & Infrastructure

Scale performance with increasing workload capacities

Optimize total infrastructure footprint

Highly adaptable to evolving modern applications

Page 4: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 4

Benefits of Computational Storage

By putting computational engines near storageo More efficient heterogeneous computing o Better Performance (lower latency and higher throughput)o Reduced IO traffico Offload CPUo Scalable performance with increasing workload capacitieso Optimize total infrastructure footprint (TCO)o A lot more!

Page 5: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 5

Computational Storage Unifies Fast & Big Data Low-Latency FAST Data

Transactions Scale BIG Data analytics

workloads with capacity Commodity infrastructure

Intensive ComputeExtract/TransformRecord Filtering

Page 6: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 6

Computational Storage for Database

6

FlashControl

NAND Flash

FPGA

In-line zlibcompression & decompression

Linux FS (ext4, XFS, …)

CSD Driver

HW →← SW

Database Processing(I/O-bound)

Flash Memory

Database Processing(less I/O-bound)

Flash Memory

In-line Compression

CPU

300K IOPS

500K IOPS

Flash Memory

Database Processing(less CPU-bound)

Flash Memory

In-line Compression

CPUDatabase Processing

(CPU-bound)Compression

1. Computational storage with in-line transparent compression

Page 7: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 7

Computational Storage for Database2. Computational storage with in-line data filtering

FlashControl NAND Flash

HW →← SW

DecompressionData filteringTruly intelligent SQL processing

FlashControl NAND Flash

HW →← SW

DecompressionData filteringTruly intelligent SQL processing

Page 8: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 8

Accelerating Analytical Workloads with Computational Storage

Page 9: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 9

OLTP Vs. OLAP

Database Workloado OLTP (Online Transactional Processing)o OLAP (Online Analytical Processing)

NOT possible to do analytical processing on TP database, even for moderate level analytical queryo Cache pollution / CPU contention / High system utilizationo Unacceptable Quality of Service (QoS)!

Page 10: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 10

ETL Complications

ETL (Extraction, Transformation, Load)o Data formats changing over time

row based Vs. column basedo Complicated to learn/maintain/integrationo Cannot query on real time datao Concurrency (Isolation control)o Etc…

Page 11: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 11

Database Pushdown

Common database query

o Projection: SELECTo Predicate (Filter): WHERE conditions

Database pushdown:o Pushdown projection, predication or other query operations

from SW query engine to SW storage engine Storage Node in distributed database system Storage controller for computational storage system

SELECT col1, col2, ... FROM table WHERE conditions

Page 12: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 12

MyRocks Pushdown MySQL built-in support of Engine Condition Pushdown & Index Condition Pushdown

Elegant RocksDB data structure/format Simplify hardware implementation

Engine Condition Pushdown Index Condition Pushdown

RandomAccessFileReader.PushdownRead

File Access LayerRandomAccessFileReader.Read

Kernel

SFX NVMe block device

SFX compute interface

storage::rocksdb

Compute

Read/Write

Filtering

R/W

Page 13: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 13

HW Database Pushdown

16 channels

Middle-range Xilinx KU15P 16nm FPGA

Flash Control(soft LDPC)

TBs 3D TLC NAND Flash

Proc. Engine #1

Proc. Engine #3

Proc. Engine #2

Proc. Engine #4

2.4GB/s

3.2GB/s, 750K IOPS

PCIe

Implementation: MySQL 5.6 & RocksDB 5.18

Support in-line Snappy decompression, filter conditions: =, !=, >, <, >=, <=, !Null, Null

Engine condition pushdown (ECP): Direct comparison between non-indexed columns and constants inside computational storage devices

Index condition pushdown (ICP): Condition evaluation on the index inside computational storage devices

Page 14: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 14

HW Database Pushdown

Performance benefit Hardware utilization Parallel & batch processing

16 channels

Middle-range Xilinx KU15P 16nm FPGA

Flash Control(soft LDPC)

TBs 3D TLC NAND Flash

Proc. Engine #1

Proc. Engine #3

Proc. Engine #2

Proc. Engine #4

2.4GB/s

3.2GB/s, 750K IOPS

PCIe

Page 15: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 15

Implementation Considerations

Block-by-block iterative processing Look-ahead batch processing Sync call Mixed sync/async call Error tolerance Recovery from computational errors and no impacts normal storage IO

Page 16: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 16

Data Block Format

Compressed block

Type: 1-byteCRC: 4-byte

time

Compressed block

Type: 1-byteCRC: 4-byte

# of Restarts

time

# of KeysType: 1-byte

Streaming process any size of data blocks without DDRs

Page 17: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 17

OLAP

12345678

12345678

12345678

12345678

Compute

I/O

CPU Computational Storage

Page 18: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 182019 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.

ScalabilityMyRocks

768GB DRAM8 host CPU cores, 100GB, 1 drive per DB instance

TPC-H Benchmark Q6

32 cores

Seco

nds

← L

ower

is B

ette

r

…Faster FPGA

engine vs. CPU 60% Faster!

75% Faster!

Scale Performance with Capacity

Page 19: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 19

HTAP: Sysbench (TP) + TPC-H (AP)# TPC-H Analytical Processing Threads

MyRocks SFXDB ImprovementSysbench TPS TPS % Sysbench TPS TPS %

0 16890.9 100.00% 16890.9 100.00% 1.0x4 8932.0 52.88% 15038.81 89.03% 1.7x8 6778.63 40.13% 14929.94 88.39% 2.2x16 5687.97 33.67% 13619.02 80.63% 2.4x24 4605.07 27.26% 12675.46 75.04% 2.8x32 3637.12 21.53% 13300.13 78.74% 3.7x

When there is no TPC-H AP workload on the system, the TPS performance is the same w or w/ computational storage

As the TPC-H AP workload starts to increase, the Sysbench TPS benchmark is negatively impacted in the case where analytical TPC-H queries are executed on the host x86 CPU.

Page 20: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 20

12345678

Compute

I/O

CPU Computational StorageHTAP

Page 21: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 21

Summary & Call to Action

Computational Storage is in production! Database: One ideal target of computational storage

• Successful demonstration with MyRocks on computational storage

Very large design space to be explored• Cross-layer innovation across software and hardware• Close collaboration across industry sectors

Page 22: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 22

Thank You!

Deploying Computational Storage at Scale

Page 23: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 23

Page 24: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 242019 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.

Bullet one Bullet two

Bullet 3 Bullet 4

Page 25: Title of Presentation - SNIA...o OLAP (Online Analytical Processing) NOT possible to do analytical processing on TP database, even for moderate level analytical query o Cache pollution

2019 Storage Developer Conference. © ScaleFlux Inc. All Rights Reserved. 25