A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores
Ting Yao¹, Jiguang Wan¹, Ping Huang², Xubin He², Qingxin Gui¹, Fei Wu¹, and Changsheng Xie¹
¹Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology
²Temple University
Outline
➢Background
➢LWC-tree (Light-Weight Compaction tree)
➢LWC-store on SMR drives
➢Experiment Result
➢Summary
Background
Key-value stores are widespread in modern data centers.
Better service quality
Responsive user experience
The log-structured merge tree (LSM-tree) is widely deployed: RocksDB, Cassandra, HBase, PNUTS, and LevelDB.
High write throughput
Fast read performance
Background · LSM-tree
[Figure: LSM-tree architecture – Write(key, value) goes to the MemTable in memory; it becomes an Immutable MemTable and is flushed to an SSTable in L0 on disk; levels L0, L1, L2, …, Ln grow through compaction.]
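As a rough illustration of this write path, a minimal in-memory sketch (illustrative, not LevelDB's code):

```python
# Minimal in-memory sketch of the LSM-tree write path (illustrative,
# not LevelDB's code): puts go to a MemTable; when it fills up, it is
# frozen and flushed to disk as a sorted SSTable in level L0.

MEMTABLE_LIMIT = 4                                 # tiny, for demonstration

class LSMTree:
    def __init__(self, levels=7):
        self.memtable = {}                         # in-memory buffer
        self.levels = [[] for _ in range(levels)]  # L0..Ln: lists of SSTables

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        sstable = sorted(self.memtable.items())    # an SSTable is a sorted run
        self.levels[0].append(sstable)             # L0 grows; compaction later
        self.memtable = {}                         # pushes data down to L1..Ln

tree = LSMTree()
for i in range(8):
    tree.put(f"key{i}", f"val{i}")
print(len(tree.levels[0]))                         # 2 SSTables flushed to L0
```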
[Figure: a conventional LSM-tree compaction – ① read the victim table (a-c) from Li and the overlapped tables a, b, c from Li+1, ② sort and merge them, ③ write the merged tables back to Li+1.]
The overall read and write data size for one compaction: 8 tables; the resulting I/O amplification is above 10x.
This serious I/O amplification of compactions motivates our design!
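A back-of-the-envelope model of that cost (the table counts follow the example above; the function and parameter names are ours):

```python
# Back-of-the-envelope model of conventional compaction cost. A victim
# table in Li and all tables it overlaps in Li+1 are read, merge-sorted,
# and written back in full.

def conventional_compaction_io(victims=1, overlapped=3):
    tables_read = victims + overlapped       # (1) read phase
    tables_written = victims + overlapped    # (3) write phase
    return tables_read + tables_written

print(conventional_compaction_io())          # 8 tables, as in the slide
print(conventional_compaction_io(1, 10))     # 22 tables when AF = 10
```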
➢Background
➢LWC-tree (Light-Weight Compaction tree)
➢LWC-store on SMR drives
➢Experiment Result
➢Summary
LWC-tree
Aim
Alleviate the I/O amplification
Achieve high write throughput
No sacrifice in read performance
How
Keep the basic components of the LSM-tree:
Keep tables sorted
Keep the multilevel structure
➢Light-weight compaction – reduce I/O amplification
➢Metadata aggregation – reduce random reads in a compaction
➢New table structure, DTable – improve lookup efficiency within a table
➢Workload balance – keep the LWC-tree balanced
LWC-tree · Light-weight Compaction
Aim
Reduce I/O amplification
How
Append the data and merge only the metadata:
➢Read the victim table
➢Sort and divide the data, merge the metadata
➢Overwrite the metadata and append the segments
Reduces amplification roughly 10x in theory (AF = 10). (In the LSM-tree, the overall data size for a conventional compaction: 8 tables.)
[Figure: a light-weight compaction – ① read the victim table (a-c) from Li, ② sort and divide its data by the key ranges of the overlapped DTables a, b, c in Li+1, ③ overwrite the metadata and append segments a', b', c' to the overlapped DTables.]
The overall read and write data size for a light-weight compaction: 2 tables.
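A minimal Python sketch of these three steps, using in-memory dicts as stand-ins for DTables (the names are illustrative, not the paper's API):

```python
# Minimal sketch of a light-weight compaction. The victim table is read
# and sorted, its data is divided by the key ranges of the overlapped
# DTables, and each share is appended to its DTable as a new segment;
# the overlapped tables' existing data is never rewritten.

from bisect import bisect_left

def light_weight_compaction(victim_entries, overlapped):
    """victim_entries: list of (key, value) pairs from the victim table.
    overlapped: DTables in the next level, each a dict with 'max_key',
    'segments', and 'metadata', ordered by key range."""
    bounds = [t['max_key'] for t in overlapped]
    buckets = [[] for _ in overlapped]
    # (1) read the victim table, (2) sort and divide its data
    for key, value in sorted(victim_entries):
        i = min(bisect_left(bounds, key), len(overlapped) - 1)
        buckets[i].append((key, value))
    # (3) append each share as a segment and merge only the metadata
    for table, bucket in zip(overlapped, buckets):
        if bucket:
            table['segments'].append(bucket)
            table['metadata'].append({'min_key': bucket[0][0],
                                      'max_key': bucket[-1][0],
                                      'segment': len(table['segments']) - 1})

tables = [{'max_key': k, 'segments': [], 'metadata': []} for k in 'adg']
light_weight_compaction([('b', 1), ('e', 2), ('h', 3)], tables)
```

The overlapped tables' data never moves; each gains one appended segment, which is why the compaction touches only about two tables' worth of data.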
LWC-tree · Metadata Aggregation
Aim
Reduce random reads in a compaction
Efficiently obtain the metadata of the overlapped DTables
How
Cluster the metadata of the overlapped DTables into their corresponding victim DTable after each compaction
[Figure: during a light-weight compaction, the victim (a-c) in Li appends segments a', b', c' to the overlapped DTables a, b, c in Li+1; afterwards, the updated metadata of a, b, and c is aggregated into the new victim DTable.]
Metadata aggregation after light-weight compaction
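Continuing with the same in-memory stand-ins, the aggregation step itself is tiny (a hypothetical helper, not the paper's code):

```python
# Cluster the updated metadata of the overlapped DTables into the new
# victim DTable in Li, so the next compaction of that range reads all
# the indexes it needs with one sequential I/O instead of one random
# read per overlapped DTable.

def aggregate_metadata(victim_dtable, overlapped):
    victim_dtable['overlapped_metadata'] = [list(t['metadata'])
                                            for t in overlapped]
```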
LWC-tree · DTable
Aim
Support light-weight compaction
Keep the lookup efficiency within a DTable
How
Store the metadata of its corresponding overlapped DTables
Manage the data and the block index per segment
[Figure: DTable layout – origin data, Segment 1 (appended data), Segment 2 (appended data), and metadata. The metadata holds the data blocks' filter blocks, a meta_index block, an overlapped meta_index block (for the overlapped DTables' metadata), an index block composed of the origin index plus one index per segment, and a footer ending with the magic data.]
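The layout above can be summarized with a small, assumed-field sketch (descriptive names, not the actual on-disk format):

```python
# Assumed-field sketch of the DTable layout. Appended segments carry
# their own indexes; the footer locates the index blocks and ends with
# the magic data.

from dataclasses import dataclass, field

@dataclass
class Segment:
    data_blocks: list                  # KV data appended by a compaction
    index_block: bytes                 # block index covering this segment only

@dataclass
class DTable:
    origin_data: list                              # data blocks written at creation
    segments: list = field(default_factory=list)   # Segment 1, Segment 2, ...
    filter_blocks: bytes = b""                     # e.g., Bloom filters
    meta_index_block: bytes = b""                  # locates the filter blocks
    overlapped_meta_index: bytes = b""             # metadata of overlapped DTables
    index_blocks: list = field(default_factory=list)  # origin + per-segment indexes
    footer: bytes = b""                            # index-block offsets + magic data
```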
LWC-tree · Workload Balance
Aim
Keep the LWC-tree balanced
Improve the operation efficiency
How
Deliver part of the key range of an overly-full DTable to its siblings after a light-weight compaction
Advantage
No data movement and no extra overhead
[Figure: data volume per DTable (1, 2, …, n) in level Li – one DTable grows far larger than its siblings.]
[Figure: range adjustment – after a light-weight compaction of the victim (a-c) in Li over the DTables a, b, c, d in Li+1, the key ranges in Li are re-split into a-b and c-d so the overloaded range is shared between siblings.]
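A minimal sketch of the range delivery, again with assumed in-memory structures; note that only range boundaries move:

```python
# When a DTable's data volume exceeds a threshold after a light-weight
# compaction, part of its key RANGE is handed to a sibling; no data
# blocks are copied, so the adjustment itself costs no extra I/O.

def rebalance_ranges(tables, threshold):
    """tables: DTables of one level, sorted by key range; each a dict
    with 'min_key', 'max_key', 'mid_key' (known from its block index),
    and 'size'."""
    for left, right in zip(tables, tables[1:]):
        if left['size'] > threshold:
            split = left['mid_key']            # read from metadata only
            left['max_key'] = split            # shrink the overly-full range
            right['min_key'] = split           # the sibling absorbs the rest
```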
➢Background
➢LWC-tree (Light-Weight Compaction tree)
➢LWC-store on SMR drives
➢Experiment Result
➢Summary
LevelDB on SMR Drives
SMR (Shingled Magnetic Recording):
Overlapped tracks
Bands & guard regions
Random-write constraint
LevelDB on SMR:
Multiplicative I/O amplification
Figure from FAST 2015, "Skylight – A Window on Shingled Disk Operation"
LevelDB on SMR Drives
[Figure: amplification ratio vs. band size (20-60 MB) – the WA of LevelDB stays near 10x (9.73, 10.07, 9.83, 9.72, 9.86), while the MWA on SMR grows with the band size (25.22, 39.89, 52.85, 62.14, 76.59).]
At a band size of 40 MB:
WA (write amplification of LevelDB): 9.83x
AWA (auxiliary write amplification of SMR): 5.38x
MWA (multiplicative write amplification of LevelDB on SMR): 52.85x
This auxiliary I/O amplification of SMR motivates our implementation!
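The three metrics combine multiplicatively; a quick check with the 40 MB figures above (the small discrepancy comes from rounding of the reported values):

$$\mathrm{MWA} = \mathrm{WA} \times \mathrm{AWA} = 9.83 \times 5.38 \approx 52.9$$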
LWC-store on SMR Drives
Aim
Eliminate the auxiliary I/O amplification of SMR
Improve the overall performance
How
A DTable is mapped to a band in the SMR drive
Segments are appended to the band and overwrite the out-of-date metadata
Equal division: divide a DTable that overflows a band into several sub-tables in the same level
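A minimal sketch of this mapping, assuming a fixed band size and byte strings as stand-ins for DTable contents (constants and names are illustrative):

```python
# One DTable lives in one band, so every compaction write is a
# sequential append within its own band and never triggers SMR's
# read-modify-write of neighboring tracks.

BAND_SIZE = 40 * 1024 * 1024                   # example: 40 MB bands

class Band:
    def __init__(self):
        self.data = bytearray()

    def append_segment(self, segment: bytes):
        assert len(self.data) + len(segment) <= BAND_SIZE, "band overflow"
        self.data += segment                   # strictly sequential write

def equal_division(dtable_bytes: bytes):
    """Split a DTable that overflows one band into equally sized
    sub-tables kept in the same level, each mapped to its own band."""
    if not dtable_bytes:
        return []
    n = -(-len(dtable_bytes) // BAND_SIZE)     # ceil: number of sub-tables
    step = -(-len(dtable_bytes) // n)
    return [dtable_bytes[i:i + step]
            for i in range(0, len(dtable_bytes), step)]
```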
➢Background
➢LWC-tree (Light-Weight Compaction tree)
➢LWC-store on SMR drives
➢Experiment Result
➢Summary
Configuration
1. LevelDB on HDDs (LDB-hdd)
2. LevelDB on SMR drives (LDB-smr)
3. SMRDB*
• An SMR-drive-optimized key-value store
• Reduces the LSM-tree to only two levels (i.e., L0 and L1)
• Key ranges of tables at the same level overlap
• Matches the SSTable size to the band size
4. LWC-store on SMR drives (LWC-smr)
Experiment Platform
Test machine: 16 Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz processors
SMR drive: Seagate ST5000AS0011
CMR HDD: Seagate ST1000DM003
SSD: Intel P3700
Experiment · Load (100 GB data)
Random load: a large number of compactions
LWC-store is 9.80x better than LDB-SMR
LWC-store is 4.67x better than LDB-HDD
LWC-store is 5.76x better than SMRDB
Sequential load: no compactions; LDB-HDD, SMRDB, and LWC-SMR perform similarly.
[Figure: sequential put throughput (MB/s) – LDB-SMR 12.20, LDB-HDD 42.90, SMRDB 45.00, LWC-SMR 45.50.]
[Figure: random put throughput (MB/s) – LDB-SMR 1.00, LDB-HDD 2.10, SMRDB 1.70, LWC-SMR 9.80 (4.67x over LDB-HDD).]
Experiment · Read (100K entries)
[Figure: random get latency (ms) – LDB-SMR 30.58, LDB-HDD 28.91, SMRDB 28.88, LWC-SMR 28.65.]
[Figure: sequential get throughput (MB/s) – LDB-SMR 15.50, LDB-HDD 12.60, SMRDB 20.50, LWC-SMR 26.90.]
Look up 100K KV entries against a database randomly loaded with 100 GB.
Experiment · Compaction (randomly load 40 GB data)
Compaction performance under the microscope:
LevelDB: the number of compactions is large
SMRDB: the data size of each compaction is large
LWC-tree: both the number of compactions and the data size per compaction are small
Overall compaction time: LWC-SMR achieves the highest efficiency.
[Figure: overall compaction time (s) – LDB-SMR 35640, LDB-HDD 19227, SMRDB 49298, LWC-SMR 5128.]
Experiment · Write Amplification
Competitors:
• LWC-SMR
• LDB-SMR
Metrics:
• Write amplification (WA): write amplification of the KV store
• Auxiliary write amplification (AWA): auxiliary write amplification of SMR
• Multiplicative write amplification (MWA): multiplicative write amplification of the KV store on SMR
Band size       20 MB   30 MB   40 MB   50 MB   60 MB
LWC-SMR WA       2.24    1.56    1.47    1.39    1.39
LDB-SMR WA       9.73   10.07    9.83    9.72    9.86
LWC-SMR AWA      1.02    1.08    1.14    1.17    1.17
LDB-SMR AWA      2.59    3.96    5.38    6.39    7.77
LWC-SMR MWA      2.28    1.68    1.68    1.63    1.63
LDB-SMR MWA     25.22   39.89   52.85   62.14   76.59

LWC-SMR reduces the multiplicative write amplification by up to 38.12x (62.14 vs. 1.63 at a 50 MB band size).
Experiment · LWC-store on HDD and SSD
➢Background
➢LWC-tree (Light-Weight Compaction tree)
➢LWC-store on SMR drives
➢Experiment Result
➢Summary
Summary
LWC-tree: a variant of the LSM-tree
Light-weight compaction – significantly reduces the I/O amplification of compaction
LWC-store on SMR drives
Data management on the SMR drive – eliminates the auxiliary I/O amplification of the SMR drive
Experiment results
High compaction efficiency
High write efficiency
Read performance on par with the LSM-tree
Wide applicability
Thank you! Questions?
Email: tingyao@hust.edu.cn