Post on 26-Jul-2020

Larger-than-Memory Data Management on Modern Storage Hardware for In-Memory OLTP Database Systems

Lin Ma, Joy Arulraj, Sam Zhao, Andrew Pavlo, Subramanya R. Dulloor, Michael J. Giardino, Jeff Parkhurst, Jason L. Gardner, Kshitij Doshi, Col. Stanley Zdonik

MOTIVATION

• Allow an in-memory DBMS to store/access data on disk without bringing back all the slow parts of a disk-oriented DBMS.

• Different properties of storage devices may affect important design decisions.

STORAGE TECHNOLOGIES

• 10m Tuples – 1KB each
• Synchronization Enabled

[Figure: latency in nanoseconds (axis ticks from 0.01 to 1000000) of 1KB Read, 1KB Write, 64KB Read, and 64KB Write on HDD, SMR, SSD, 3D XPoint, NVRAM, and DRAM; annotations: random, sequential.]

DESIGN DECISIONS

• Hardware independent policies
  – Cold Tuple Identification
  – Evicted Tuple Meta-data

• Hardware dependent policies
  – Cold Tuple Retrieval
  – Merging Threshold
  – Access Methods

HARDWARE INDEPENDENT POLICIES

INDEPENDENT POLICIES

• Cold Tuple Identification
  – Option #1: On-line identification
  – Option #2: Off-line identification

• Evicted Tuple Meta-data
  – Option #1: Marker to represent the on-disk position
  – Option #2: Bloom filter
  – Option #3: Rely on virtual paging
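A minimal sketch of on-line cold tuple identification (Option #1) follows. The slides do not say which on-line policy is used, so least-recently-used ordering is assumed here purely for illustration, and the `LRUTracker` class and its method names are hypothetical.

```python
from collections import OrderedDict


class LRUTracker:
    """Hypothetical on-line cold tuple tracker (LRU order is an assumption)."""

    def __init__(self):
        self._order = OrderedDict()  # tuple_id -> None, least recent first

    def touch(self, tuple_id):
        # Record an access: move the tuple to the most-recently-used end.
        self._order.pop(tuple_id, None)
        self._order[tuple_id] = None

    def coldest(self, n):
        # Return up to n least-recently-used tuple ids, coldest first.
        return list(self._order)[:n]


tracker = LRUTracker()
for tid in [0, 1, 2, 3, 4, 0, 2]:
    tracker.touch(tid)

# Tuples 0 and 2 were touched again, so 1, 3, and 4 are the eviction candidates.
cold = tracker.coldest(3)
```

An off-line variant would derive the same kind of ordering from a recorded access log instead of updating it on every access.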

EVICTED TUPLE META-DATA

[Figure: tuples move from the In-Memory Table Heap (Tuples #00 to #04) to Cold-Data Storage (header, Tuples #01, #03, #04). Other labels: In-Memory Index, Access Frequency (Tuples #00 to #05), three <Block,Offset> markers, Index, Bloom Filter.]
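The first two meta-data options can be sketched as below. This is an illustration under stated assumptions, not the system's implementation: the `Marker` and `BloomFilter` classes, the filter size, and the use of SHA-256 as the hash are all hypothetical choices.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical index entry recording an evicted tuple's on-disk position."""
    block: int
    offset: int


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Option #1: the index keeps an entry for the evicted tuple that holds
# its <Block,Offset> position instead of an in-memory pointer.
index = {0: "tuple-0 data", 1: Marker(block=0, offset=0), 2: "tuple-2 data"}
is_evicted = isinstance(index[1], Marker)

# Option #2: evicted keys leave the index; a Bloom filter remembers that
# they may exist in cold storage.
evicted = BloomFilter()
for key in (1, 3, 4):
    evicted.add(key)
```

The trade-off the sketch hints at: markers cost index space per evicted tuple but give an exact location, while the filter is compact but can answer "maybe" for a key that was never evicted.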

HARDWARE DEPENDENT POLICIES

COLD TUPLE RETRIEVAL

• Option #1: Abort-and-Restart

[Figure: a Transaction issues reads of Tuple #00, Tuple #01, and Tuple #02 against the In-Memory Table Heap (Tuples #00, #02) and Cold-Data Storage (header, Tuples #01, #03, #04). The read of the evicted Tuple #01 is labeled Abort.]

COLD TUPLE RETRIEVAL

• Option #2: Synchronous Retrieval

[Figure: a Transaction issues reads of Tuple #00, Tuple #01, and Tuple #02 against the In-Memory Table Heap (Tuples #00, #02) and Cold-Data Storage (header, Tuples #01, #03, #04). The read of the evicted Tuple #01 is labeled Stall.]
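The two retrieval options can be contrasted with a small sketch. The `Store` class, the `EvictedAccess` exception, and the two driver functions are hypothetical names, and plain dictionaries stand in for the table heap and the cold-data storage.

```python
class EvictedAccess(Exception):
    """Raised when a transaction touches a tuple that lives in cold storage."""

    def __init__(self, key):
        super().__init__(key)
        self.key = key


class Store:
    def __init__(self, hot, cold):
        self.hot = dict(hot)    # in-memory table heap
        self.cold = dict(cold)  # stand-in for cold-data storage

    def fetch(self, key):
        # Bring one evicted tuple back into memory.
        self.hot[key] = self.cold.pop(key)

    def read_or_abort(self, key):
        if key not in self.hot:
            raise EvictedAccess(key)
        return self.hot[key]

    def read_sync(self, key):
        if key not in self.hot:
            self.fetch(key)  # the transaction stalls here until the read completes
        return self.hot[key]


def run_abort_and_restart(store, keys):
    """Option #1: abort on a cold access, fetch the tuple, restart from scratch."""
    aborts = 0
    while True:
        try:
            return [store.read_or_abort(k) for k in keys], aborts
        except EvictedAccess as miss:
            aborts += 1
            store.fetch(miss.key)


def run_synchronous(store, keys):
    """Option #2: stall on each cold access; the transaction never aborts."""
    return [store.read_sync(k) for k in keys]


hot = {0: "t0", 2: "t2"}
cold = {1: "t1", 3: "t3", 4: "t4"}
values_ar, aborts = run_abort_and_restart(Store(hot, cold), [0, 1, 2])
values_sr = run_synchronous(Store(hot, cold), [0, 1, 2])
```

Both runs return the same values; the difference is that the first one repeats the read of Tuple #00 after one abort, while the second one waits in place.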

MERGING THRESHOLD

• Option #1: Always Merge
• Option #2: Merge Only on Update
• Option #3: Selective Merge

[Figure: a Transaction reads Tuple #00, Tuple #01, and Tuple #02. The In-Memory Table Heap holds Tuples #00 and #02; Cold-Data Storage holds a header and Tuples #01, #03, and #04. Other label: Sketch.]
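One way to read the three merging options is as a single decision function, sketched below. The policy names, the `should_merge` signature, and the `hot_fraction` parameter are hypothetical. An exact `Counter` stands in for whatever frequency structure the "Sketch" label in the figure refers to, and treating selective merge as "merge only the most frequently accessed keys" is an assumption.

```python
from collections import Counter


def should_merge(policy, key, is_update, access_counts, hot_fraction=0.05):
    """Decide whether a retrieved cold tuple is merged back into the table heap.

    policy is one of "always", "update_only", "selective" (hypothetical names).
    """
    if policy == "always":
        return True
    if policy == "update_only":
        return is_update
    if policy == "selective":
        # Merge only if the key ranks among the most frequently accessed
        # keys (the top hot_fraction of all tracked keys, at least one).
        top_n = max(1, int(len(access_counts) * hot_fraction))
        hottest = {k for k, _ in access_counts.most_common(top_n)}
        return key in hottest
    raise ValueError(f"unknown policy: {policy}")


# Twenty tracked keys with one access each, except key 7, which is hot.
counts = Counter({k: 1 for k in range(20)})
counts[7] = 50

merge_hot = should_merge("selective", 7, False, counts)
merge_cold = should_merge("selective", 3, False, counts)
merge_read = should_merge("update_only", 3, False, counts)
```

Tuples that are not merged would be served to the transaction from a temporary copy and left in cold storage.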

ACCESS METHODS

• Option #1: Block-addressable
  – Block-level access through file system

• Option #2: Byte-addressable (NVRAM)
  – Use mmap through a filesystem designed for byte-addressable NVRAM (PMFS)
  – Directly operate on NVRAM-resident data as if it existed in DRAM
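The difference between the two access methods can be sketched with Python's standard `mmap` module. This is an illustration only: a temporary regular file stands in for the storage device (no PMFS or NVRAM is involved), and `BLOCK_SIZE`, `TUPLE_SIZE`, and both helper functions are hypothetical.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096   # hypothetical block size for this sketch
TUPLE_SIZE = 64     # hypothetical fixed tuple size


def read_tuple_block(path, block_no, offset):
    """Option #1: read the whole block through the file system, then slice."""
    with open(path, "rb") as f:
        f.seek(block_no * BLOCK_SIZE)
        block = f.read(BLOCK_SIZE)
    return block[offset:offset + TUPLE_SIZE]


def read_tuple_mapped(mapping, block_no, offset):
    """Option #2: address the tuple's bytes directly in the mapped region."""
    start = block_no * BLOCK_SIZE + offset
    return mapping[start:start + TUPLE_SIZE]


# Build a two-block file with one recognizable tuple in block 1.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    data = bytearray(2 * BLOCK_SIZE)
    data[BLOCK_SIZE + 128:BLOCK_SIZE + 128 + TUPLE_SIZE] = b"x" * TUPLE_SIZE
    f.write(data)

via_block = read_tuple_block(path, block_no=1, offset=128)
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mapping:
        via_mmap = read_tuple_mapped(mapping, block_no=1, offset=128)
os.remove(path)
```

The block path copies 4096 bytes to return 64; the mapped path touches only the bytes it needs, which is the property the byte-addressable option relies on.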

EVALUATION

• Compare design decisions in H-Store with anti-caching.

• Storage Devices:
  – Hard-Disk Drive (HDD)
  – Shingled Magnetic Recording Drive (SMR)
  – Solid-State Drive (SSD)
  – 3D XPoint (3DX)
  – Non-volatile Memory (NVRAM)

COLD TUPLE RETRIEVAL

• YCSB Workload – 90% Reads / 10% Writes
• 10GB Database using 1.25GB Memory

[Figure: throughput (txn/sec, 0 to 200000) of Abort and Restart versus Synchronous Retrieval on HDD, SMR, SSD, 3DX, and NVM; annotations: large blocks, small blocks.]

MERGING THRESHOLD

• YCSB Workload – 90% Reads / 10% Writes
• 10GB Database using 1.25GB Memory

[Figure: throughput (txn/sec, 0 to 200000) of Merge (Update-Only), Merge (Top-5%), Merge (Top-20%), and Merge (All) on HDD (AR) and HDD (SR).]

Improvement:
• SMR: 30%
• SSD: 17%
• 3DX: 21%
• NVRAM: 10%

CONFIGURATION COMPARISON

• Generic Configuration (2013 Anti-caching)
  – Abort-and-Restart Retrieval
  – Merge (All) Threshold
  – 1024 KB Block Size

• Optimized Configuration
  – Synchronous Retrieval
  – Top-5% Merge Threshold
  – Block Sizes: 1024 KB (HDD/SMR), 16 KB (SSD/3DX)
  – Byte-addressable access for NVRAM
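The two configurations can be written down as data, as in the sketch below. The `AntiCacheConfig` class, its field names, and the `optimized_for` helper are hypothetical; the values come from the slide. The slide gives no block size for NVRAM, so that field is left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AntiCacheConfig:
    """Hypothetical container for the policy choices listed on the slide."""
    retrieval: str
    merge: str
    block_size_kb: Optional[int]
    byte_addressable: bool = False


# Generic configuration (2013 anti-caching), the same for every device.
GENERIC = AntiCacheConfig("abort-and-restart", "merge-all", 1024)


def optimized_for(device):
    """Return the optimized configuration for a device name from the slides."""
    if device in ("HDD", "SMR"):
        return AntiCacheConfig("synchronous", "top-5%", 1024)
    if device in ("SSD", "3DX"):
        return AntiCacheConfig("synchronous", "top-5%", 16)
    if device == "NVRAM":
        # Block size is not stated on the slide, so it stays None here.
        return AntiCacheConfig("synchronous", "top-5%", None,
                               byte_addressable=True)
    raise ValueError(f"unknown device: {device}")


ssd = optimized_for("SSD")
nvram = optimized_for("NVRAM")
```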

GENERIC VS OPTIMIZED

[Figure: two panels, VOTER (0 to 200000 txn/sec) and TATP (0 to 400000 txn/sec), comparing the Generic and Optimized configurations on HDD, SMR, SSD, 3D XPoint, and NVRAM; DRAM is also labeled.]

CONCLUSION

• Low-latency storage devices: Smaller block sizes and synchronous retrieval

• Constraints on merge frequency improve performance

• The performance of NVRAM is as good as pure DRAM if treated correctly

1-844-88-CMUDB

END

REAL-WORLD IMPLEMENTATIONS

• H-Store – Anti-Caching
• Microsoft Hekaton – Project Siberia
• EPFL’s VoltDB Prototype
• Apache Geode – Overflow Tables
• MemSQL – Columnar Tables
• SolidDB
• P*TIME
