A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores

Ting Yao 1, Jiguang Wan 1, Ping Huang 2, Xubin He 2, Qingxin Gui 1, Fei Wu 1, and Changsheng Xie 1

1 Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology; 2 Temple University

Post on 21-May-2020

Page 1: A Light-weight Compaction Tree to Reduce I/O Amplification ...storageconference.us/2017/Presentations/CompactionTreeToReduce… · A Light-weight Compaction Tree to Reduce I/O Amplification

A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores

Ting Yao 1, Jiguang Wan 1, Ping Huang 2, Xubin He 2, Qingxin Gui 1, Fei Wu 1, and Changsheng Xie 1

1 Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology

2 Temple University

Page 2:

Outline

➢Background

➢LWC-tree (Light-Weight Compaction tree)

➢LWC-store on SMR drives

➢Experiment Result

➢Summary

Page 3:

Background

Key-value stores are widespread in modern data centers.

Better service quality

Responsive user experience

The log-structured merge tree (LSM-tree) is widely deployed.

RocksDB, Cassandra, HBase, PNUTS, and LevelDB

High write throughput

Fast read performance

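[Figure: LSM-tree structure. Labels: Write (key, value); Log, MemTable, and Immutable MemTable (memory); SSTables in levels L0, L1, L2, ..., Ln (disk); compaction between levels; numbered steps 1 to 4.]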
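The write path named in the figure labels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LevelDB's implementation: the class `TinyLSM`, its `memtable_limit` parameter, and the in-memory lists standing in for the log and for L0 are all hypothetical.

```python
class TinyLSM:
    """Toy LSM-tree write path: log -> MemTable -> Immutable MemTable -> SSTable in L0."""

    def __init__(self, memtable_limit=2):
        self.log = []                 # stands in for the on-disk write-ahead log
        self.memtable = {}            # in-memory table that absorbs writes
        self.levels = [[]]            # levels[0] is L0: a list of SSTables, newest last
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)

    def put(self, key, value):
        self.log.append((key, value))     # record the write in the log first
        self.memtable[key] = value        # then insert it into the MemTable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        immutable = self.memtable         # freeze the full MemTable
        self.memtable = {}                # new writes go to a fresh MemTable
        sstable = sorted(immutable.items())   # an SSTable is a sorted run of entries
        self.levels[0].append(sstable)    # dump it to L0
        self.log.clear()                  # flushed entries no longer need the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.levels[0]):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


db = TinyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)   # second entry fills the MemTable and triggers a flush
db.put("c", 3)
print(db.levels[0], db.get("a"), db.get("c"))
```

Compaction between levels is deliberately left out of this sketch.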

Page 4:

Background · LSM-tree

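[Figure: a conventional LSM-tree compaction across memory and disk (levels L0, L1, L2, ..., Ln). The victim table a-c and the overlapped tables a, b, c are read into memory, sorted, and written back (step ③ write).]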

The overall read and write data size for a compaction: 8 tables

[Chart annotations: 13x; above 10x]

This serious I/O amplification of compactions motivates our design!
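The "8 tables" figure follows from simple counting, sketched below. The sketch assumes equally sized tables and a merged output as large as its inputs; the function name `conventional_compaction_io` is hypothetical.

```python
def conventional_compaction_io(overlapped_tables: int) -> int:
    """Tables read plus tables written by one conventional compaction.

    Assumes equally sized tables and that no entries are dropped by the merge.
    """
    tables_read = 1 + overlapped_tables      # the victim plus every overlapped table
    tables_written = 1 + overlapped_tables   # the merged result is written back in full
    return tables_read + tables_written


# The slide's example: victim a-c overlaps tables a, b, and c.
print(conventional_compaction_io(3))
```

With three overlapped tables, four tables are read and four are written, so moving one table's worth of data costs eight tables of I/O.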

Page 6:

LWC-tree

Aim

Alleviate the I/O amplification

Achieve high write throughput

No sacrifice of read performance

How

Keep the basic components of the LSM-tree

Keep tables sorted

Keep the multilevel structure

➢Light-weight compaction – reduce I/O amplification

➢Metadata aggregation – reduce random reads in a compaction

➢New table structure, DTable – improve lookup efficiency within a table

➢Workload balance – keep the LWC-tree balanced

Page 7:

LWC-tree · Light-weight compaction

Aim

Reduce I/O amplification

How

Append the data and merge only the metadata

➢Read the victim table

➢Sort and divide the data, merge the metadata

➢Overwrite and append the segment

Reduces amplification by 10x theoretically (AF = 10). (In an LSM-tree, the overall data size for a conventional compaction is 8 tables.)

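[Figure: a light-weight compaction across memory and disk (levels L0, L1, L2, ..., Ln). ① The victim table a-c in L1 is read into memory and sorted into parts a', b', c'; ③ the parts are written to the overlapped tables a, b, c in L2 by overwrite and append.]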

The overall read and write data size for a light-weight compaction: 2 tables.
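The three steps listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `DTable` dataclass, its integer key ranges, and the list-of-segments layout are assumptions, and the metadata merge is omitted. It also assumes the overlapped tables' ranges together cover the victim's range.

```python
from dataclasses import dataclass, field


@dataclass
class DTable:
    lo: int  # smallest key in this table's range
    hi: int  # largest key in this table's range (inclusive)
    # Each segment is a sorted list of (key, value) pairs; segment 0 is the origin data.
    segments: list = field(default_factory=list)


def light_weight_compaction(victim, overlapped):
    """Push the victim's data down by appending; return the number of entries appended."""
    # Step 1: read the victim table (the only table whose data is read).
    merged = {}
    for segment in victim.segments:      # later segments hold newer versions
        merged.update(segment)
    # Step 2: sort the data and divide it by the key ranges of the overlapped tables.
    ordered = sorted(merged.items())
    appended = 0
    for table in overlapped:
        part = [(k, v) for k, v in ordered if table.lo <= k <= table.hi]
        if part:
            # Step 3: append the part as a new segment; existing segments stay untouched.
            table.segments.append(part)
            appended += len(part)
    victim.segments.clear()
    return appended


victim = DTable(0, 29, [[(5, "v5"), (15, "v15"), (25, "v25")]])
kids = [
    DTable(0, 9, [[(1, "old")]]),
    DTable(10, 19, [[(11, "old")]]),
    DTable(20, 29, [[(21, "old")]]),
]
appended = light_weight_compaction(victim, kids)
print(appended, kids[0].segments)
```

Only the victim's entries are read and only the same entries are written, which is where the "2 tables" count comes from under the equal-size assumption.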

Page 8:

LWC-tree · Metadata Aggregation

Aim

Reduce random reads in a compaction

Efficiently obtain the metadata from overlapped DTables

How

Cluster the metadata of the overlapped DTables into their corresponding victim DTable after each compaction

[Figure: a light-weight compaction. Labels: victim DTable a-c in Li; overlapped DTables a, b, c in Li+1 with appended parts a', b', c'.]

Page 9:

LWC-tree · Metadata Aggregation

[Figure: metadata aggregation after light-weight compaction. Labels: victim DTable a-c in Li; overlapped DTables a, b, c in Li+1 with appended parts a', b', c'.]

Page 10:

LWC-tree · DTable

Aim

Support light-weight compaction

Keep the lookup efficiency within a DTable

How

Store the metadata of its corresponding overlapped DTables

Manage the data and block index in segments

[Figure: DTable layout. A DTable holds origin data, Segment 1 (append data), Segment 2 (append data), and metadata, plus the metadata of its overlapped DTables. Metadata labels: Data_block i, Data_block i+1, ..., filter blocks, Meta_index block, Overlapped Meta_index block, Index block, footer, magic data. Index block in segment: origin index, Segment 1 index, Segment 2 index.]
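One way to keep lookups efficient inside a table that grows by appended segments is to give every segment its own sorted index and search the newest segment first. The sketch below illustrates that idea only; the class `SegmentedTable` and its in-memory key lists (standing in for per-segment index blocks) are assumptions, not the DTable on-disk format.

```python
from bisect import bisect_left


class SegmentedTable:
    """Toy table made of an origin segment plus appended segments, each with its own index."""

    def __init__(self, origin):
        self.segments = []
        self.append_segment(origin)       # segment 0 is the origin data

    def append_segment(self, entries):
        ordered = sorted(entries)
        keys = [k for k, _ in ordered]    # plays the role of the segment's index
        values = [v for _, v in ordered]
        self.segments.append((keys, values))

    def get(self, key):
        # Newer segments shadow older ones, so search from the newest backwards.
        for keys, values in reversed(self.segments):
            i = bisect_left(keys, key)    # binary search within one segment
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


table = SegmentedTable([(1, "a"), (5, "b")])
table.append_segment([(5, "b2"), (7, "c")])
print(table.get(5), table.get(1), table.get(9))
```

Each lookup costs one binary search per segment, so the number of segments per table bounds the extra work compared with a single sorted table.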

Page 11:

LWC-Tree · Workload Balance

Aim

Keep the LWC-tree balanced

Improve the operation efficiency

How

Deliver part of the key range of an overly full table to its siblings after a light-weight compaction

Advantage

No data movement and no extra overhead

[Figure, left: data volume (in data blocks) per DTable number (1, 2, ..., n) in level Li. Right: Li holds DTables a-c and d above Li+1 DTables a, b, c, d; after (1) light-weight compaction and (2) range adjustment, the Li key ranges are a-b and c-d.]
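Range adjustment changes only key-range bookkeeping, which is why it needs no data movement. The sketch below mirrors the figure's example (a-c and d becoming a-b and c-d) with integer keys. The function `deliver_range`, the tuple representation of ranges, and the choice of the right-hand sibling are illustrative assumptions; the slides do not say how the split key is picked.

```python
def deliver_range(level, i, split_key):
    """Hand keys >= split_key of table i to its right sibling by editing ranges only.

    `level` is a list of (lo, hi) inclusive key ranges, sorted and non-overlapping.
    """
    if i + 1 >= len(level):
        raise ValueError("table has no right sibling")
    lo, hi = level[i]
    _, sibling_hi = level[i + 1]
    if not (lo < split_key <= hi):
        raise ValueError("split key must fall inside the table's range")
    level[i] = (lo, split_key - 1)          # the overly full table keeps the lower part
    level[i + 1] = (split_key, sibling_hi)  # the sibling's range grows downwards
    return level


# Ranges 0-29 and 30-39 play the roles of "a-c" and "d" in the figure.
level = [(0, 29), (30, 39)]
print(deliver_range(level, 0, 20))
```

Because the overly full table has just pushed its data down in a light-weight compaction, only the range boundaries change here and no entries are copied.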

Page 13:

LevelDB on SMR Drives

SMR (Shingled Magnetic Recording)

Overlapped tracks

Bands and guard regions

Random write constraint

LevelDB on SMR

Multiplicative I/O amplification

Figure from FAST 2015, “Skylight – A Window on Shingled Disk Operation”

Page 14:

LevelDB on SMR Drives

[Chart: amplification ratio of LevelDB on SMR versus band size (MB); series WA and MWA]

Band size (MB)    20      30      40      50      60
WA                9.73    10.07   9.83    9.72    9.86
MWA               25.22   39.89   52.85   62.14   76.59

Band size: 40 MB

WA (write amplification of LevelDB): 9.83x

AWA (auxiliary write amplification of SMR): 5.39x

MWA (multiplicative write amplification of LevelDB on SMR): 52.58x

This auxiliary I/O amplification of SMR motivates our implementation!
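The three metrics on this slide can be related by a short calculation. The sketch assumes MWA = WA × AWA, which the word "multiplicative" suggests but the slide does not state as a formula, and it uses the WA and MWA bar labels from the chart (20 to 60 MB bands) as inputs.

```python
# Bar labels for LevelDB on an SMR drive, as read off the chart.
band_sizes_mb = [20, 30, 40, 50, 60]
wa = [9.73, 10.07, 9.83, 9.72, 9.86]
mwa = [25.22, 39.89, 52.85, 62.14, 76.59]


def auxiliary_wa(mwa_value, wa_value):
    """AWA implied by the assumed relation MWA = WA * AWA."""
    return mwa_value / wa_value


awa = [round(auxiliary_wa(m, w), 2) for m, w in zip(mwa, wa)]
for size, value in zip(band_sizes_mb, awa):
    print(f"band {size} MB: implied AWA is about {value}x")
```

For the 40 MB band this gives roughly 5.38x, close to but not identical to the 5.39x and 52.58x quoted in the summary lines, so the slide's figures appear to be rounded or taken from slightly different runs.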

Page 15:

LWC-store on SMR drive

Aim

Eliminate the auxiliary I/O amplification of SMR

Improve the overall performance

How

A DTable is mapped to a band in the SMR drive

A segment is appended to the band and overwrites the out-of-date metadata

Equal division: divide a DTable that overflows a band into several sub-tables in the same level
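Equal division can be sketched as splitting a sorted run of entries into the smallest number of near-equal sub-tables that each fit in one band. The function `equal_division` and the fixed `entry_size` are simplifying assumptions made for illustration; the slides do not describe how sub-table boundaries are chosen.

```python
import math


def equal_division(entries, entry_size, band_size):
    """Split sorted entries into near-equal sub-tables that each fit in one band."""
    per_band = band_size // entry_size        # entries that fit in one band
    if per_band < 1:
        raise ValueError("a single entry does not fit in a band")
    parts = max(1, math.ceil(len(entries) / per_band))
    base, extra = divmod(len(entries), parts)
    sub_tables, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # spread the remainder over the first parts
        sub_tables.append(entries[start:start + size])
        start += size
    return sub_tables


# Ten unit-sized entries against a band that holds four of them.
sub_tables = equal_division(list(range(10)), entry_size=1, band_size=4)
print(sub_tables)
```

Splitting into three parts of 4, 3, and 3 entries, rather than 4, 4, and 2, leaves similar free space in every band for later appended segments.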

Page 17:

Configuration

1. LevelDB on HDDs (LDB-hdd)

2. LevelDB on SMR drives (LDB-smr)

3. SMRDB*

• An SMR-drive-optimized key-value store

• Reduces the LSM-tree to only two levels (i.e., L0 and L1)

• The key ranges of the tables at the same level overlap

• Matches the SSTable size to the band size

4. LWC-store on SMR drives (LWC-smr)

Experiment parameters

Test machine: 16 Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz processors

SMR drive: Seagate ST5000AS0011

CMR HDD: Seagate ST1000DM003

SSD: Intel P3700

Page 18:

Experiment · Load (100GB data)

Random load

A large number of compactions

LWC-store is 9.80x better than LDB-SMR

LWC-store is 4.67x better than LDB-HDD

LWC-store is 5.76x better than SMRDB

Sequential load

No compaction

Similar sequential load performance

[Chart: Sequential put. Sequential load throughput (MB/s). Systems on the x-axis: LDB-SMR, LDB-HDD, SMRDB, LWC-SMR. Bar labels as extracted: 12.20, 42.90, 45.00, 45.50.]

[Chart: Random put. Random load throughput (MB/s). Systems on the x-axis: LDB-SMR, LDB-HDD, SMRDB, LWC-SMR. Bar labels as extracted: 1.00, 2.10, 1.70, 9.80. Annotation: 4.67x.]
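The speedups quoted on this slide can be reproduced from the random-put bar labels. The sketch assumes the labels 1.00, 2.10, 1.70, and 9.80 MB/s belong to LDB-SMR, LDB-HDD, SMRDB, and LWC-SMR in that order, an ordering that is consistent with the quoted 9.80x, 4.67x, and 5.76x ratios.

```python
# Random load throughput in MB/s (assumed mapping of bar labels to systems).
random_put_mb_s = {
    "LDB-SMR": 1.00,
    "LDB-HDD": 2.10,
    "SMRDB": 1.70,
    "LWC-SMR": 9.80,
}


def speedup(baseline, system="LWC-SMR"):
    """Throughput of `system` divided by throughput of `baseline`, to two decimals."""
    return round(random_put_mb_s[system] / random_put_mb_s[baseline], 2)


speedups = {name: speedup(name) for name in ("LDB-SMR", "LDB-HDD", "SMRDB")}
print(speedups)
```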

Page 19:

Experiment · read (100K entries)

[Chart: Random get. Random read latency (ms). Systems on the x-axis: LDB-SMR, LDB-HDD, SMRDB, LWC-SMR. Bar labels as extracted: 30.58, 28.91, 28.88, 28.65.]

[Chart: Sequential get. Sequential read throughput (MB/s). Systems on the x-axis: LDB-SMR, LDB-HDD, SMRDB, LWC-SMR. Bar labels as extracted: 15.50, 12.60, 20.50, 26.90.]

Look up 100K KV entries in a database randomly loaded with 100 GB of data

Page 20:

Experiment · compaction (randomly load 40GB data)

Compaction performance under the microscope

LevelDB: the number of compactions is large

SMRDB: the data size of each compaction is large

LWC-tree: a small number of compactions and a small data size

Overall compaction time

LWC-smr achieves the highest efficiency

Page 21:

Experiment · compaction (randomly load 40GB data)

[Chart: Overall compaction time (s). Systems on the x-axis: LDB-SMR, LDB-HDD, SMRDB, LWC-SMR. Bar labels as extracted: 35640, 19227, 49298, 5128.]

Page 22:

Experiment · Write amplification

Competitors

• LWC-SMR

• LDB-SMR

Write amplification (WA)

• Write amplification of the KV store

Auxiliary write amplification (AWA)

• Auxiliary write amplification of SMR

Multiplicative write amplification (MWA)

• Multiplicative write amplification of KV stores on SMR

[Chart: multiplicative write amplification versus band size. Annotation: 38.12x.]

Band size       20MB    30MB    40MB    50MB    60MB
LWC-SMR.MWA     2.28    1.68    1.68    1.63    1.63
LDB-SMR.MWA     25.22   39.89   52.85   62.14   76.59

[Chart: write amplification versus band size.]

Band size       20MB    30MB    40MB    50MB    60MB
LWC-SMR.WA      2.24    1.56    1.47    1.39    1.39
LDB-SMR.WA      9.73    10.07   9.83    9.72    9.86
LWC-SMR.AWA     1.02    1.08    1.14    1.17    1.17
LDB-SMR.AWA     2.59    3.96    5.38    6.39    7.77
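The 38.12x annotation can be checked against the MWA bar labels. The sketch assumes the extracted labels map to the 20 to 60 MB bands in order (LWC-SMR: 2.28, 1.68, 1.68, 1.63, 1.63; LDB-SMR: 25.22, 39.89, 52.85, 62.14, 76.59); under that assumption the annotation matches the 50 MB band.

```python
band_sizes_mb = [20, 30, 40, 50, 60]
lwc_mwa = [2.28, 1.68, 1.68, 1.63, 1.63]       # LWC-SMR.MWA bar labels
ldb_mwa = [25.22, 39.89, 52.85, 62.14, 76.59]  # LDB-SMR.MWA bar labels

# How many times lower the multiplicative write amplification of LWC-SMR is.
reduction = {
    size: round(ldb / lwc, 2)
    for size, ldb, lwc in zip(band_sizes_mb, ldb_mwa, lwc_mwa)
}
for size, ratio in reduction.items():
    print(f"band {size} MB: LDB-SMR MWA is {ratio}x that of LWC-SMR")
```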

Page 23:

Experiment · LWC-store on HDD and SSD

Page 25:

Summary

LWC-tree: a variant of the LSM-tree

Light-weight compaction – significantly reduces the I/O amplification of compaction

LWC-store on SMR drives

Data management in SMR drives – eliminates the auxiliary I/O amplification from the SMR drive

Experiment results

High compaction efficiency

High write efficiency

Fast read performance, same as the LSM-tree

Wide applicability

Page 26:

Thank you!

Questions?

Email: [email protected]