TRANSCRIPT
FTL Design Exploration in Reconfigurable High-Performance SSD (RHPSSD) for Server Applications
International Conference on Supercomputing
June 12, 2009
Ji-Yong Shin¹,², Zeng-Lin Xia¹, Ning-Yi Xu¹, Rui Gao¹,
Xiong-Fei Cai¹, Seungryoul Maeng², and Feng-Hsiung Hsu¹
¹Microsoft Research Asia  ²Korea Advanced Institute of Science and Technology
Introduction and Background (1/3)

Growing popularity of flash memory and SSD
- Low latency
- Low power
- Solid-state reliability

SSD widening its range of application
- Embedded devices
- Desktop and laptop PCs
- Servers and supercomputers

SSD expected to revolutionize the storage subsystem
Introduction and Background (2/3)

Flash memory
- Erase needed before write
- Unit of read/write and erase differs
  - Read/write: page (typically 2 to 4KB)
  - Erase: block (typically 64 pages)
- Latency for read, write, and erase differs
  - Read (25us) < write (250us) < erase (500us)
- Erase carried out on demand: cleaning or garbage collection
- Wear leveling necessary
  - Memory cells wear out when erased
  - Typically a block endures 100K erase operations
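As a rough illustration of these constraints (a minimal sketch, not the simulator used in the talk; all names are invented), a block must be erased before its pages can be rewritten, and each operation pays its own latency:

```python
# Toy flash block obeying the erase-before-write rule with the
# asymmetric latencies quoted above (read 25us < write 250us < erase 500us).
READ_US, WRITE_US, ERASE_US = 25, 250, 500
PAGES_PER_BLOCK = 64

class Block:
    def __init__(self):
        self.next_free = 0       # pages are written in order within a block
        self.erase_count = 0     # wear: a block endures ~100K erases
        self.elapsed_us = 0

    def read_page(self):
        self.elapsed_us += READ_US

    def write_page(self):
        """Write the next free page; erase the whole block first if full."""
        if self.next_free == PAGES_PER_BLOCK:
            self.erase()
        self.next_free += 1
        self.elapsed_us += WRITE_US

    def erase(self):
        self.next_free = 0
        self.erase_count += 1
        self.elapsed_us += ERASE_US

blk = Block()
for _ in range(65):              # the 65th write forces an erase
    blk.write_page()
print(blk.erase_count)           # 1
print(blk.elapsed_us)            # 65 * 250 + 500 = 16750
```

The single on-demand erase already costs twice a page write, which is why cleaning frequency matters so much below.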
[Figure: Flash memory]
Introduction and Background (3/3)

Flash translation layer (FTL)
- Provides abstraction of flash memory characteristics
- Maintains logical-to-physical address mapping
- Carries out cleaning operations
- Conducts wear leveling

FTL in a multiple flash chip environment
- Manages parallelism and wear level among chips
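The mapping and cleaning duties above can be sketched as a toy page-level FTL (an illustrative model, not the talk's implementation; all names are hypothetical):

```python
# Hypothetical page-level FTL: logical pages map to (plane, physical page);
# updates go out-of-place, invalidating the old copy for later cleaning.
class PageFTL:
    def __init__(self, num_planes, pages_per_plane):
        self.pages_per_plane = pages_per_plane
        self.mapping = {}                  # logical page -> (plane, page)
        self.next_free = [0] * num_planes  # append-only write point per plane
        self.invalid = set()               # stale physical pages awaiting cleaning

    def write(self, lpn, plane):
        assert self.next_free[plane] < self.pages_per_plane, "plane full: clean first"
        if lpn in self.mapping:
            self.invalid.add(self.mapping[lpn])  # old copy becomes garbage
        ppn = (plane, self.next_free[plane])
        self.next_free[plane] += 1
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.mapping[lpn]

ftl = PageFTL(num_planes=4, pages_per_plane=1024)
ftl.write(lpn=7, plane=0)
ftl.write(lpn=7, plane=1)    # update: old copy on plane 0 is invalidated
print(ftl.read(7))           # (1, 0)
print(ftl.invalid)           # {(0, 0)}
```

With many chips, the FTL's choice of `plane` for each write is exactly the allocation question explored later in the talk.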
[Diagram: The host machine issues IO requests to the FTL, which dispatches flash requests to multiple flash memory modules]
Motivation (1/2)

Servers and supercomputing environments
- High-performance storage subsystem required
- Applications are usually fixed

SSD performance characteristics
- Highly dependent on FTL design and workloads

A customized SSD can boost servers and supercomputers
Motivation (2/2)

Our focus
- High-performance SSD with abundant resources
- FTL design tradeoffs using different algorithms in each functionality
- Customizing the FTL considering flash memory and workload characteristics

Based on the Reconfigurable High-Performance SSD architecture, we will explore FTL design considerations and tradeoffs and propose guidelines for customizing the FTL

Related work
- Flash memory for embedded systems or generic SSDs
- Internal hardware organizational tradeoffs of SSDs [Agrawal et al., USENIX 08]
- Configuring RAID systems considering disk and workload characteristics
Reconfigurable High-Performance SSD (RHPSSD)

RHPSSD architecture
- High performance
  - 36 independent flash channels
  - 4GB/s PCI Express host-to-SSD interface
- Flexibility from an FPGA for reconfiguring the FTL
[Diagram: A 4GB/s PCI Express link connects the host to an FPGA containing the FTL/flash controller and a controller for each flash channel, backed by random access memory; flash daughter boards hold many flash chips with independent channels, each chip comprising dies with multiple planes]
1. Maintaining high parallelism for performance
2. Wear leveling for endurance
   a. Among all blocks
   b. Among chips, dies, and planes
FTL Design Exploration and Analysis

Simulation-based method to discover:
1. Logical page to physical flash plane allocation
2. Effect of hot/cold data separation
3. Wear leveling and cleaning
   1. Cleaning analysis for different allocations
   2. Wear leveling in different clusters
Simulation Environment and Workloads (1/2)

Simulation environment
- Modified DiskSim 4.0 and the SSD plug-in of MSR SVC
- Various FTL algorithms implemented

Basic configurations
- RHPSSD architecture
- Flash chip
  - Latencies (read: 25us, write: 250us, erase: 500us)
  - Two types of chip for different SSD capacities
    - 4GB chip (2 dies with 2 planes)
    - 8GB chip (4 dies with 2 planes)
Simulation Environment and Workloads (2/2)

Traces used for simulation (each workload is characterized along four axes: Sequential, Random, Highly IO intensive, High data locality):

  Workload   Characteristics   SSD Setting
  Postmark   O O               144GB SSD
  IOzone     O O O             144GB SSD
  WebDC      O O               288GB SSD
  TPC-C      O O               288GB SSD
  SQL        O O               288GB SSD
  Exchange   O                 2 x 288GB SSD
Logical Page to Physical Plane Allocation (1/2)

Allocation is directly related to parallelism

Static allocation
- Binds a logical page address to a specific plane
- Striping methods
  - Wide striping, page striping unit: high parallelism, more cleaning
  - Narrow striping, block striping unit: low parallelism, less cleaning

Dynamic allocation
- Allocates a page request to an idle plane at runtime
- Binding logical address to
  - Chip: less degree of freedom
  - SSD: maximum degree of freedom

[Figure: Wide striping vs. narrow striping]
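The two static schemes can be sketched as simple address-to-plane functions (an illustrative sketch with assumed parameters, not the talk's implementation):

```python
# Static allocation sketch: where a logical page lands under page-striped
# (wide) vs block-striped (narrow) placement across planes.
PAGES_PER_BLOCK = 64

def plane_page_striped(lpn, num_planes):
    # Consecutive pages round-robin across planes -> high parallelism,
    # but a logical block's pages scatter, so cleaning touches more planes.
    return lpn % num_planes

def plane_block_striped(lpn, num_planes):
    # Consecutive blocks round-robin; all pages of a block stay on one
    # plane -> less parallelism, but cleaning stays local.
    return (lpn // PAGES_PER_BLOCK) % num_planes

planes = 4
seq = list(range(8))
print([plane_page_striped(p, planes) for p in seq])   # [0, 1, 2, 3, 0, 1, 2, 3]
print([plane_block_striped(p, planes) for p in seq])  # [0, 0, 0, 0, 0, 0, 0, 0]
```

A sequential run of 8 pages engages all 4 planes under page striping but only one plane under block striping, which is the parallelism-vs-cleaning tradeoff the slide describes.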
Logical Page to Physical Plane Allocation (2/2)

[Chart: Response time normalized to STATIC W-PAGE for each workload and allocation scheme]
Hot/Cold Data Separation (1/2)

Separating pages according to temperature in each plane
- Blocks with hot data are likely to become full of invalid pages
- Blocks with cold data are likely to maintain their condition

Known to reduce erase operations and valid page migration
- Also leads to smaller response times
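One simple way to realize the separation (a hypothetical sketch; the talk does not prescribe this particular classifier) is to rank logical pages by recent write counts and treat the top fraction as hot:

```python
# Classify logical pages as hot or cold by write frequency; directing hot
# writes to their own blocks concentrates invalid pages there, making those
# blocks cheap to clean.
from collections import Counter

def split_hot_cold(write_trace, hot_fraction=0.2):
    """write_trace: sequence of logical page numbers that were written.
    Returns (hot pages, cold pages); hot_fraction is an assumed threshold."""
    counts = Counter(write_trace)
    ranked = [lpn for lpn, _ in counts.most_common()]
    cutoff = max(1, int(len(ranked) * hot_fraction))
    return set(ranked[:cutoff]), set(ranked[cutoff:])

trace = [1, 1, 1, 1, 2, 3, 4, 5, 1, 2]
hot, cold = split_hot_cold(trace)
print(hot)    # {1}
print(cold)   # {2, 3, 4, 5}
```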
Hot/Cold Data Separation (2/2)

[Chart: Improvement after applying the separation (%)]
Wear Leveling and Cleaning

High performance and the wear level of the SSD are a different story

Static allocation
- Logical addresses are bound to a plane, so no page migration can take place outside the dedicated plane (only local wear leveling)
- Selecting an allocation that evenly wears out each plane is important

Dynamic allocation
- Wear leveling can be carried out in different clusters (chip, SSD)
- A cluster is the scope within which the lifetime of blocks is kept even
- The larger the cluster, the more even the wear level of the SSD as a whole
- The larger the cluster, the greater the overhead
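Cleaning cost is dominated by migrating the victim block's remaining valid pages, which a greedy victim picker makes concrete (an illustrative sketch using the latencies quoted earlier, not the talk's algorithm):

```python
# Greedy cleaning sketch: pick the block with the most invalid pages,
# migrate its valid pages (read + rewrite each), then erase it. The
# migration term is why hot/cold separation pays off: hot blocks hold
# few valid pages when cleaned.
READ_US, WRITE_US, ERASE_US = 25, 250, 500
PAGES_PER_BLOCK = 64

def clean(invalid_counts):
    """invalid_counts: invalid-page count per 64-page block.
    Returns (victim block index, cleaning cost in us)."""
    victim = max(range(len(invalid_counts)), key=lambda i: invalid_counts[i])
    valid = PAGES_PER_BLOCK - invalid_counts[victim]
    cost = valid * (READ_US + WRITE_US) + ERASE_US
    return victim, cost

print(clean([10, 60, 30]))   # (1, 4 * 275 + 500) = (1, 1600)
print(clean([64, 0, 0]))     # fully invalid victim: erase only, (0, 500)
```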
[Chart: Number of cleaning operations and erase distribution without wear leveling; number of operations normalized to W-Page]
Wear Leveling in Different Clusters

Wear leveling cluster
- Group of blocks within which the wear leveling algorithm keeps the age even
- The larger the cluster, the worse the performance becomes
- The larger the cluster, the more even the ages of blocks are
[Charts: Response time, average lifetime, and standard deviation of wear at overall, chip, die, and plane scope, for chip-level clusters (Chip P, Chip C) and SSD-level clusters (SSD P, SSD S)]
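The cluster-size effect on wear evenness can be illustrated with a toy model (hypothetical numbers and names; it assumes perfect leveling inside each cluster and measures what unevenness remains SSD-wide):

```python
# Toy wear-leveling-cluster model: leveling inside a cluster equalizes its
# blocks to the cluster mean; the SSD-wide stddev of erase counts that
# remains shrinks as the cluster grows (at the cost of more migration).
import statistics

def overall_stddev_after_leveling(erase_counts, cluster_size):
    leveled = []
    for i in range(0, len(erase_counts), cluster_size):
        cluster = erase_counts[i:i + cluster_size]
        leveled += [statistics.mean(cluster)] * len(cluster)
    return statistics.pstdev(leveled)

wear = [10, 50, 20, 40, 30, 70, 60, 80]           # hypothetical per-block erase counts
print(overall_stddev_after_leveling(wear, 2))      # small clusters: unevenness remains
print(overall_stddev_after_leveling(wear, 8))      # whole-SSD cluster: 0.0
```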
Summary

Static vs. dynamic allocation
- Static wide striping: dominant sequential IO workloads
  - Page striping unit: small response time, more cleaning
  - Block striping unit: large response time, less cleaning
  - Trade-off between response time and cleaning operations
- Dynamic: dominant random IO workloads

Hot/cold data separation
- Effective for evenly distributed IO

Wear leveling cluster
- Large cluster: large overhead, even distribution of wear level
- Small cluster: small overhead, uneven distribution of wear level
- Trade-off between response time and even wear level
Conclusion

Algorithms in each FTL functionality were studied for the high-performance SSD

Tradeoffs and simple guidelines were presented for designing a customized FTL under different workloads and SSD lifetime requirements

Please read the paper for more details
Thank you. Questions?