Leveraging Value Locality in
Optimizing NAND Flash-based SSDs
Aayush Gupta, Raghav Pisolkar,
Bhuvan Urgaonkar and Anand Sivasubramaniam
Computer Systems Lab
The Pennsylvania State University
1
Agenda
Relook at Locality
Another Dimension of Locality: Value Locality
• Value Locality and SSDs
CA-SSD Design
• Mapping Structures
• Metadata Management
Evaluation
• CA-SSD vs Traditional SSD
2
Locality: The pillar of storage
Temporal Locality
• If a logical address is accessed now, it is likely to
be accessed again in the near future
Spatial Locality
• If a logical address is accessed now, there is a high
likelihood that its neighboring addresses will be
accessed in the near future
Pervasive: L1/L2 cache, TLB, Buffer Cache,
Virtual Memory, Disk Cache, Web Cache …
3
Another Dimension of Locality
Value Locality
• Certain content is accessed preferentially
Data deduplication using Content Addressable
Storage (CAS)
Use cases of Value Locality (VL)
• Network traffic reduction
• Content based Caching
• Efficient data storage (archival/backup)
• E.g.: Venti, Foundation, EMC Centera, Data Domain Storage Systems
4
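As a concrete illustration of CAS-style deduplication, here is a minimal sketch (class and names are invented for illustration, not taken from Venti or the other systems above) that stores each value exactly once under its content fingerprint:

```python
import hashlib

# Minimal sketch of a content-addressable store (CAS), assuming a simple
# in-memory dict; real systems index fingerprints on disk.
class ContentStore:
    def __init__(self):
        self.blocks = {}          # fingerprint -> value
        self.writes_saved = 0     # duplicate writes absorbed by dedup

    def put(self, value: bytes) -> str:
        """Store a value under its fingerprint; duplicates cost nothing."""
        fp = hashlib.sha1(value).hexdigest()
        if fp in self.blocks:
            self.writes_saved += 1   # value already stored: dedup hit
        else:
            self.blocks[fp] = value
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]

store = ContentStore()
a = store.put(b"ABC")
store.put(b"DEF")
c = store.put(b"ABC")     # duplicate content -> same fingerprint, no new block
assert a == c and store.writes_saved == 1
```

Two writes of identical content map to the same fingerprint, so only one copy is ever stored; this is the property the talk exploits on SSDs.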
Can we use Value Locality to
address the idiosyncrasies of SSDs?
CAS suits SSD
5
SSD: Writes are a bottleneck; Read/Write asymmetry; Block Erases
CAS: Provides Deduplication
Out of Place Updates in CAS
7
[Diagram: a translation layer maps logical addresses 120-124 to physical addresses 120-124 holding values ABC, DEF, PQR, ...; Write(123, 'XYZ') and a subsequent Write(123, 'ABC') each go to a fresh physical page instead of updating in place]
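The out-of-place behaviour in the diagram can be sketched as follows (a toy model with invented names; a real FTL would also track invalid pages for garbage collection):

```python
# Sketch of out-of-place updates: a rewrite of logical address 123 with new
# content lands on a fresh physical page instead of overwriting in place.
class OutOfPlaceStore:
    def __init__(self, first_ppn=120):
        self.l2p = {}            # logical address -> physical address
        self.pages = {}          # physical address -> value
        self.next_ppn = first_ppn

    def write(self, lpn, value):
        ppn = self.next_ppn      # always allocate a fresh physical page
        self.next_ppn += 1
        self.pages[ppn] = value
        old = self.l2p.get(lpn)  # old page becomes stale, not erased yet
        self.l2p[lpn] = ppn
        return ppn, old

    def read(self, lpn):
        return self.pages[self.l2p[lpn]]

s = OutOfPlaceStore()
s.write(123, "ABC")
new_ppn, old_ppn = s.write(123, "XYZ")   # update goes out of place
assert s.read(123) == "XYZ" and new_ppn != old_ppn
```

The stale copy lingers until an erase reclaims it, which is exactly the erase-before-write friction the next slides connect to SSDs.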
CAS and SSD: Made for each other?
8
CAS: Out-of-Place Updates
SSD: Erase before Write
Problem with CAS
9
CAS: Loss of Sequentiality
SSD: Fast Random Reads
Do real workloads exhibit
Value Locality?
Workloads [Koller10]
10
Workload   Writes (%)   Total Requests (Millions)   Unique Write Requests (%)   Unique Read Requests (%)
web        77.0         3.8                         42.35                       32.05
mail       77.3         3.6                         7.83                        80.85
homes      96.7         4.4                         66.37                       80.75
[Koller10] Koller, R., and Rangaswami, R. “I/O Deduplication: Utilizing Content Similarity to
Improve I/O Performance.” (FAST’10)
The workloads are write-dominant, and unique write requests well below 100% indicate substantial duplication.
Value Popularity
VP represents the number of occurrences of
each unique value in a workload
• Signifies the deduplication potential of a workload
11
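A sketch of how value popularity could be computed from a write trace (the trace contents and function name are illustrative, not the paper's tooling):

```python
from collections import Counter

# Sketch: measuring value popularity (VP) over a write trace. Each write
# carries a value; VP is how often each unique value recurs, which bounds
# the savings deduplication can achieve.
def value_popularity(write_values):
    counts = Counter(write_values)               # occurrences per unique value
    total = len(write_values)
    unique_fraction = len(counts) / total        # "unique write requests"
    dedup_potential = 1.0 - unique_fraction      # writes that could be absorbed
    return counts, unique_fraction, dedup_potential

trace = ["A", "B", "A", "A", "C", "B"]
counts, uniq, dedup = value_popularity(trace)
assert counts["A"] == 3
assert round(dedup, 2) == 0.5   # half the writes carry already-seen values
```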
Some Values are Very Popular
12
[Figure: cumulative fraction of write accesses vs. number of unique values (x 10^5); the most popular 24K values (8.8% of unique values) account for 0.5 of all write accesses]
Some Values are Very Popular
13
[Figure: cumulative fraction of write accesses vs. number of unique values (x 10^5) for web and homes; the most popular 24K and 84K values (8.8% and 30% of unique values, respectively) account for 0.5 of write accesses]
CA-SSD Design
16
[Diagram: CA-SSD architecture: device driver, hash co-processor, SSD controller (FTL), and battery-backed RAM (BB-RAM) holding the mapping structures. On Write(LPN, Data), the hash co-processor computes H(Data); the controller looks up H(Data), updates the mapping structures, and either returns immediately (the content is already stored at some PPN) or writes (PPN, Data) to flash]
Mapping Structures: LPT & HPT
17
LPT
LPN   PPN
L1    P1
L2    P2
L3    P1
L4    P4

HPT
Hash   PPN
H1     P1
H2     P2
H3     P3
H4     P4
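A minimal sketch of the write path implied by the two forward tables, assuming in-memory dicts and an SHA-1 hash (the class and counters are invented for illustration):

```python
import hashlib

# Sketch of CA-SSD's forward maps: the LPT (LPN -> PPN) and the HPT
# (content hash -> PPN). A write whose hash already sits in the HPT only
# updates the LPT; no flash page is written.
class CASSD:
    def __init__(self):
        self.lpt = {}       # LPN -> PPN
        self.hpt = {}       # hash -> PPN
        self.next_ppn = 0
        self.flash_writes = 0

    def write(self, lpn, data: bytes):
        h = hashlib.sha1(data).hexdigest()
        ppn = self.hpt.get(h)
        if ppn is None:               # new content: write a flash page
            ppn = self.next_ppn
            self.next_ppn += 1
            self.flash_writes += 1
            self.hpt[h] = ppn
        self.lpt[lpn] = ppn           # duplicate content: remap only
        return ppn

ssd = CASSD()
ssd.write("L1", b"ABC")
ssd.write("L3", b"ABC")               # dedup: L1 and L3 share one PPN
assert ssd.lpt["L1"] == ssd.lpt["L3"]
assert ssd.flash_writes == 1
```

This is why L3 maps to P1 in the LPT above: both logical pages carry the same content, so the HPT steers the second write to the existing physical page.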
Mapping Structures: iLPT
18
iLPT
PPN   LPN
P1    L1, L3
P2    L2
P3    INV
P4    L4

HPT
Hash   PPN
H1     P1
H2     P2
H3     P3
H4     P4

LPT
LPN   PPN
L1    P1
L2    P2
L3    P1
L4    P4
Mapping Structures: iLPT & iHPT
19
iLPT
PPN   LPN
P1    L1, L3
P2    L2
P3    INV
P4    L4

HPT
Hash   PPN
H1     P1
H2     P2
H3     P3
H4     P4

iHPT
PPN   Hash
P1    H1
P2    H2
P3    H3
P4    H4

LPT
LPN   PPN
L1    P1
L2    P2
L3    P1
L4    P4

Remove
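The role of the two inverse tables can be sketched as follows (class and method names are hypothetical): the iLPT tells the garbage collector which pages still hold live data (an empty LPN set corresponds to INV, i.e. reclaimable), and the iHPT locates the hash entry to remove once a page goes away.

```python
# Sketch of the inverse maps: iLPT (PPN -> set of referencing LPNs) and
# iHPT (PPN -> content hash stored there).
class ReverseMaps:
    def __init__(self):
        self.ilpt = {}   # PPN -> set of LPNs referencing it
        self.ihpt = {}   # PPN -> content hash stored there

    def map(self, lpn, ppn, h, old_ppn=None):
        if old_ppn is not None:               # LPN rewritten: drop old ref
            self.ilpt[old_ppn].discard(lpn)
        self.ilpt.setdefault(ppn, set()).add(lpn)
        self.ihpt[ppn] = h

    def reclaimable(self, ppn):
        return not self.ilpt.get(ppn)         # no LPNs -> INV, safe to GC

rm = ReverseMaps()
rm.map("L1", "P1", "H1")
rm.map("L3", "P1", "H1")                 # two LPNs share P1 (deduplicated)
rm.map("L3", "P2", "H2", old_ppn="P1")   # L3 rewritten with new content
assert not rm.reclaimable("P1")          # L1 still references P1
rm.map("L1", "P2", "H2", old_ppn="P1")
assert rm.reclaimable("P1")              # P1 now INV; its hash is rm.ihpt["P1"]
```

Note that with deduplication a page cannot be reclaimed just because one LPN moved away; the iLPT's reference set is what makes that check cheap.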
Metadata: Traditional SSD
20
SSD Controller with RAM holding the LPT:

LPT
LPN   PPN
L1    P1
L2    P2
L3    P3
L4    P4
Metadata: CA-SSD
21
SSD Controller, Hash Co-processor, and BB-RAM holding four tables:

LPT
LPN   PPN
L1    P1
L2    P2
L3    P1
L4    P4

HPT
Hash   PPN
H1     P1
H2     P2
H3     P3
H4     P4

iLPT
PPN   LPN
P1    L1, L3
P2    L2
P3    INV
P4    L4

iHPT
PPN   Hash
P1    H1
P2    H2
P3    H3
P4    H4

How do we fit the metadata in CA-SSD's RAM?
Option 1: Larger RAM. Not Scalable!!
Option 2: Shrink Metadata
22
SSD Controller, Hash Co-processor, and BB-RAM holding:

LPT
LPN   PPN
L1    P1
L2    P2
L3    P1
L4    P4

iHPT
PPN   Hash
P1    H1
P2    H2
P3    H3
P4    H4

HPT
Hash   PPN
H1     P1
H2     P2
H3     P3
H4     P4

iLPT
PPN   LPN
P1    L1, L3
P2    L2
P3    INV
P4    L4
Temporal Value Locality
TVL implies that if a certain value is accessed now, it is likely to be accessed again in the near future, though not necessarily from the same logical address.
23
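One way to quantify this, sketched below with an illustrative trace: record, for each write, how recently its value (versus its LPN) was last seen in an LRU queue. If values recur at small stack depths while LPNs do not, a small LRU-managed hash store will capture most writes. The function name and trace are invented for this sketch.

```python
# Sketch: quantifying temporal value locality via LRU stack positions.
def lru_hits(keys, capacity):
    """Fraction of accesses whose key sits within the top-`capacity`
    positions of an LRU queue (i.e. hits in an LRU store of that size)."""
    queue, hits = [], 0
    for k in keys:
        if k in queue:
            if queue.index(k) < capacity:
                hits += 1
            queue.remove(k)
        queue.insert(0, k)        # move/insert at MRU position
    return hits / len(keys)

# Same value rewritten under different LPNs: high TVL, low traditional TL.
writes = [("L1", "V"), ("L2", "V"), ("L3", "V"), ("L4", "V")]
value_hits = lru_hits([v for _, v in writes], capacity=1)
lpn_hits = lru_hits([l for l, _ in writes], capacity=1)
assert value_hits > lpn_hits
```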
Temporal Value Locality: Writes
24
[Figure (web): cumulative fraction of write requests vs. position in an LRU queue (x 10^5), tracked per Value and per LPN; 1.3K values vs. 141K LPNs cover 0.9 of write requests. Callouts: higher TVL than traditional TL; shrink metadata using TVL]
Metadata Management: TVL
25
SSD Controller, Hash Co-processor, and BB-RAM holding:

HPT
Hash   PPN
H1     P1
H2     P2
H3     P3
H4     P4

iHPT
PPN   Hash
P1    H1
P2    H2
P3    H3
P4    H4

LPT
LPN   PPN
L1    P1
L2    P2
L3    P1
L4    P4

iLPT
PPN   LPN
P1    L1, L3
P2    L2
P3    INV
P4    L4
Metadata Management: TVL
26
SSD
Controller
Hash
Co-processor
BB-RAM
iLPT
PPN LPN
P1 L1,L3
P2 L2
P3 INV
P4 L4
HPT
Hash PPN
H1 P1
H2 P2
H3 P3
iHPT
PPN Hash
P1 H1
P2 H2
P3 H3
LPT
LPN PPN
L1 P1
L2 P2
L3 P1
L4 P4
The HPT and iHPT are kept in MRU-to-LRU order; the LRU hash entries (H4/P4 above) are discarded.

How does CA-SSD perform compared to Traditional SSDs?
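The TVL-based metadata shrinking on the preceding slides can be sketched as a fixed-capacity, LRU-ordered hash store (an OrderedDict here; the capacity and class name are illustrative, not the paper's exact structures):

```python
from collections import OrderedDict

# Sketch: the HPT capped at a fixed number of entries, managed in LRU
# order, so the least-recently-used hash entries are discarded first.
class BoundedHPT:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # hash -> PPN, LRU entry first

    def lookup(self, h):
        ppn = self.entries.get(h)
        if ppn is not None:
            self.entries.move_to_end(h)        # refresh to MRU position
        return ppn

    def insert(self, h, ppn):
        self.entries[h] = ppn
        self.entries.move_to_end(h)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # discard the LRU hash entry

hpt = BoundedHPT(capacity=3)
for i in range(4):
    hpt.insert(f"H{i}", f"P{i}")
assert hpt.lookup("H0") is None     # H0 was LRU and got discarded
assert hpt.lookup("H3") == "P3"
```

Because of TVL, even a small capacity catches most duplicate writes; a discarded entry only costs a missed dedup opportunity, not correctness.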
Evaluation: Response Time
27
[Figure: average response time (ms) for web, mail, and homes under NON-CAS, CAS, and hash stores of 16K, 64K, and 128K entries; improvements of 65% and 84% are marked; mail stays around 7 ms because it shows lower TVL]
Evaluation: Response Time
28
[Figure: response time (ms) for web, mail, and homes under NON-CAS and CAS; with 128K entries, performance is similar to infinite RAM]
Total Writes: web
29
[Figure: total writes (millions), split into workload writes and GC writes, for Non-CAS, CAS, 16K, 64K, and 128K; 94% reduction in total writes. Dedup reduces valid content, yielding a 75% reduction in valid pages copied during GC]
Total Erases: web
30
[Figure: block erases (thousands) for NON-CAS, CAS, 16K, 64K, and 128K; 77% reduction. Fewer total writes lead to reduced GC invocation]
Conclusions
Workloads exhibit significant value locality
• Characterization of Value Popularity and Temporal
Value Locality
CAS and SSDs complement each other
Certain implementation challenges need to be
addressed
• Mapping structures
• Metadata Management
31