FTL (Flash Translation Layer)
Yoon Jae Seong
School of Computer Science and Engineering
Seoul National University
Contents
NAND flash memory characteristics
Flash Translation Layer
Mapping algorithms
Summary
NAND flash memory characteristics
Two faces of NAND flash memory
Advantages
- High density
- Low power consumption
- Low access latency
- High shock/vibration resistance
- Small form factor
- …
Limitations
- No in-place updating
- Limited endurance
- Bad blocks
- Worsening reliability
- …
Layered approach: abstraction
Applications
Flash memory software/hardware: overcomes the limitations of the underlying flash memory to provide the illusion of a fast, reliable storage system
NAND flash memory
Flash memory software
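Flash memory software is organized around two interfaces:
- Block device interface (sector write, sector read): legacy file systems (EXT2, VFAT, …) run on top of the Flash Translation Layer, which maps sector operations onto NAND flash memory
- NAND flash interface (block erase, page program, page read): flash file systems (JFFS2, YAFFS, …) and flash-aware system software (virtual memory, DBMS) use NAND flash memory directly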
Flash Translation Layer (FTL)
Definition: a software layer that emulates the functionality of an HDD while hiding the peculiarities of flash memory
Roles: re-mapping, wear-leveling, bad block management, …
Wear-leveling
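Limited endurance of NAND flash memory: the number of P/E (program/erase) cycles for each block is limited to 100,000 for SLC and fewer than 10,000 for MLC
Wear-leveling spreads P/E cycles evenly across blocks so that no single block wears out prematurely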
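As an illustration, the Python sketch below implements a simple form of dynamic wear-leveling: it always allocates the free block with the fewest erases. The class `WearLeveler`, its methods, and the rule that retires a block at the P/E limit are hypothetical; real FTLs typically also relocate cold data (static wear-leveling).

```python
class WearLeveler:
    """Hypothetical dynamic wear-leveler: allocate the least-worn free block."""

    def __init__(self, num_blocks, max_pe_cycles):
        self.erase_count = [0] * num_blocks       # P/E cycles consumed per block
        self.free_blocks = set(range(num_blocks))
        self.max_pe_cycles = max_pe_cycles        # e.g. 100,000 (SLC), <10,000 (MLC)

    def allocate(self):
        """Return the free block with the fewest erases (ties go to the lowest id)."""
        block = min(self.free_blocks, key=lambda b: (self.erase_count[b], b))
        self.free_blocks.remove(block)
        return block

    def erase(self, block):
        """Erase a block and return it to the free pool unless it is worn out."""
        self.erase_count[block] += 1
        if self.erase_count[block] < self.max_pe_cycles:
            self.free_blocks.add(block)
        # Otherwise the block is retired: it has become a run-time bad block.


wl = WearLeveler(num_blocks=4, max_pe_cycles=10_000)
for _ in range(8):
    wl.erase(wl.allocate())
print(wl.erase_count)  # [2, 2, 2, 2]
```

Allocating by erase count spreads the eight erases evenly over the four blocks; always picking the lowest-numbered free block would instead put all eight erases on block 0.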
Bad block management
Types of bad blocks
- Initial bad blocks: identified by a special mark at a designated location in each block
- Run-time bad blocks: worn-out blocks (i.e., blocks for which a program or erase operation returns an error)
A bad block among the data blocks is swapped with a block from the pool of reserved blocks
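The swap can be sketched as a small remapping table, assuming the reserved blocks sit after the data blocks in physical order. The class `BadBlockManager` and its methods are hypothetical; the same `mark_bad` path handles factory-marked initial bad blocks and run-time failures.

```python
class BadBlockManager:
    """Hypothetical bad block table: bad blocks are swapped with reserved blocks."""

    def __init__(self, num_data_blocks, num_reserved_blocks, initial_bad=()):
        # Assumption: reserved blocks follow the data blocks in physical order.
        self.reserved = list(range(num_data_blocks,
                                   num_data_blocks + num_reserved_blocks))
        self.remap = {}                     # bad block -> replacement block
        for block in initial_bad:           # initial (factory-marked) bad blocks
            self.mark_bad(block)

    def mark_bad(self, block):
        """Swap a bad block (initial or run-time) with a reserved block."""
        if not self.reserved:
            raise RuntimeError("reserved block pool exhausted")
        self.remap[block] = self.reserved.pop(0)

    def resolve(self, block):
        """Follow the swap chain to the physical block actually in use."""
        while block in self.remap:
            block = self.remap[block]
        return block


bbm = BadBlockManager(num_data_blocks=8, num_reserved_blocks=2, initial_bad=[3])
bbm.mark_bad(5)  # e.g. a program or erase on block 5 returned an error
print(bbm.resolve(3), bbm.resolve(5), bbm.resolve(0))  # 8 9 0
```

If a replacement block later fails too, calling `mark_bad(bbm.resolve(n))` extends the chain, and `resolve` still finds the block in use.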
Re-mapping
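On a write, the mapping table redirects the logical address in the LBA address space (as seen by the host) to a new location in NAND flash memory: the new data is written there, and the old data is left behind as a stale copy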
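A minimal sketch of this out-of-place update, assuming a flat array of physical pages and no garbage collection (the device simply runs out of pages). The class `RemappingFTL` and its fields are hypothetical.

```python
class RemappingFTL:
    """Hypothetical out-of-place update: writes go to a new page, old copies go stale."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                           # logical page -> physical page
        self.next_free = 0                          # next never-written physical page
        self.state = ["free"] * num_physical_pages  # free / valid / invalid
        self.flash = [None] * num_physical_pages

    def write(self, lpn, data):
        old = self.mapping.get(lpn)
        if old is not None:
            self.state[old] = "invalid"     # old data stays in flash but is now stale
        ppn = self.next_free                # no GC here: IndexError once flash is full
        self.next_free += 1
        self.flash[ppn] = data
        self.state[ppn] = "valid"
        self.mapping[lpn] = ppn             # the host keeps using the same LBA

    def read(self, lpn):
        return self.flash[self.mapping[lpn]]


ftl = RemappingFTL(num_physical_pages=8)
ftl.write(7, "old data")
ftl.write(7, "new data")
print(ftl.read(7), ftl.mapping[7], ftl.state[:2])  # new data 1 ['invalid', 'valid']
```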
Plethora of FTLs
AFTL, CNFTL, JFTL, CFTL, µ-FTL, super-block scheme, log block scheme, replacement block scheme, DFTL, LAST, FAST, Reconfigurable FTL, Vanilla FTL, MS FTL, LazyFTL, SFTL, … and so on
Mapping granularity
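Block-level mapping: a logical sector address is split into a logical block # and a sector # within the block; the block-level mapping table translates the logical block # into a physical block #, and the sector keeps the same offset within the NAND flash block
Page-level mapping: a logical sector address is split into a logical page # and a sector # within the page; the page-level mapping table translates the logical page # into a physical block # and a physical page #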
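Both translations can be sketched with `divmod`, assuming 512B sectors, 2KB pages and 128KB blocks (so 4 sectors per page and 64 pages per block). The function names and the dictionary-based tables are hypothetical.

```python
SECTORS_PER_PAGE = 4                     # 2KB page / 512B sector (assumed)
PAGES_PER_BLOCK = 64                     # 128KB block / 2KB page (assumed)
SECTORS_PER_BLOCK = SECTORS_PER_PAGE * PAGES_PER_BLOCK


def block_level_translate(lsn, block_map):
    """Logical sector -> (physical block #, sector offset within the block)."""
    logical_block, offset = divmod(lsn, SECTORS_PER_BLOCK)
    return block_map[logical_block], offset      # offset is kept unchanged


def page_level_translate(lsn, page_map):
    """Logical sector -> (physical block #, physical page #, sector offset in page)."""
    logical_page, offset = divmod(lsn, SECTORS_PER_PAGE)
    physical_block, physical_page = page_map[logical_page]
    return physical_block, physical_page, offset


print(block_level_translate(300, {1: 17}))       # (17, 44)
print(page_level_translate(300, {75: (3, 12)}))  # (3, 12, 0)
```

Sector 300 falls in logical block 1 (300 = 1 × 256 + 44) and in logical page 75 (300 = 75 × 4 + 0).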
Block-mapping vs. Page-mapping
Block-level mapping
- Requires a much smaller mapping table
- But handles small writes inefficiently
Page-level mapping
- Allows more flexible management
- Efficient for small writes
- But requires a larger mapping table
Block-mapping vs. Page-mapping
In terms of mapping table size (assuming 32GB flash storage with 128KB blocks and 2KB pages)
- Block mapping requires 256K mapping entries (32GB / 128KB): 256K × 4B per entry = 1MB mapping table
- Page mapping requires 16M mapping entries (32GB / 2KB): 16M × 4B per entry = 64MB mapping table
In terms of flexibility of management
- Block mapping requires all the data in a block to be moved together
- Page mapping requires only the data in a page to be moved together
What is the implication?
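The table-size arithmetic above can be checked directly, using binary units (1K = 1024) and 4-byte entries as on the slide. The helper `table_size` is a hypothetical name.

```python
KiB, MiB, GiB = 2**10, 2**20, 2**30
CAPACITY = 32 * GiB
BLOCK_SIZE = 128 * KiB
PAGE_SIZE = 2 * KiB
ENTRY_SIZE = 4                            # bytes per mapping entry


def table_size(mapping_unit):
    """Return (number of entries, table size in bytes) for one mapping unit size."""
    entries = CAPACITY // mapping_unit
    return entries, entries * ENTRY_SIZE


block_entries, block_bytes = table_size(BLOCK_SIZE)
page_entries, page_bytes = table_size(PAGE_SIZE)
print(block_entries // KiB, "K entries,", block_bytes // MiB, "MB")  # 256 K entries, 1 MB
print(page_entries // MiB, "M entries,", page_bytes // MiB, "MB")    # 16 M entries, 64 MB
```

The page-level table is 64 times larger because a block holds 64 pages (128KB / 2KB).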
A simple page-mapped FTL
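Writes are handled in an append-only manner: each write from the LBA address space goes to the next free page, and the page-level mapping table records the new physical location of its logical page #. Earlier copies of rewritten pages remain in the data blocks as invalid pages until garbage collection reclaims those blocks as free blocks.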
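The scheme can be sketched as a toy page-mapped FTL with append-only writes and garbage collection. The greedy victim policy (fewest valid pages), the rule of keeping one spare block for GC copies, and all names are assumptions for illustration; the sketch needs at least three blocks and some over-provisioning.

```python
class PageMappedFTL:
    """Toy page-mapped FTL: append-only writes plus greedy garbage collection."""

    def __init__(self, num_blocks, pages_per_block):
        self.ppb = pages_per_block
        self.l2p = {}                                  # LPN -> (block, page)
        self.valid = [{} for _ in range(num_blocks)]   # block -> {page: LPN}
        self.written = [0] * num_blocks                # programmed pages per block
        self.free = list(range(1, num_blocks))         # erased blocks
        self.active = 0                                # block being appended to
        self.gc_copies = 0                             # valid pages moved by GC

    def write(self, lpn):
        self._invalidate(lpn)
        self._program(lpn)
        while len(self.free) < 2:          # keep a spare block for GC copies
            self._collect()

    def read(self, lpn):
        return self.l2p[lpn]               # physical (block, page) of the latest copy

    def _invalidate(self, lpn):
        if lpn in self.l2p:
            block, page = self.l2p.pop(lpn)
            del self.valid[block][page]

    def _program(self, lpn):
        if self.written[self.active] == self.ppb:      # active block is full
            self.active = self.free.pop(0)
        page = self.written[self.active]
        self.written[self.active] += 1
        self.valid[self.active][page] = lpn
        self.l2p[lpn] = (self.active, page)

    def _collect(self):
        # Greedy policy: reclaim the full block with the fewest valid pages.
        candidates = [b for b in range(len(self.valid))
                      if b != self.active and b not in self.free]
        victim = min(candidates, key=lambda b: len(self.valid[b]))
        if len(self.valid[victim]) == self.ppb:
            raise RuntimeError("no invalid pages to reclaim; device is full")
        for lpn in list(self.valid[victim].values()):
            self._invalidate(lpn)
            self._program(lpn)             # copy the valid page to the active block
            self.gc_copies += 1
        self.written[victim] = 0           # erase the victim block
        self.free.append(victim)


ftl = PageMappedFTL(num_blocks=4, pages_per_block=4)
for i in range(40):
    ftl.write(i % 6)                       # 6 logical pages on 16 physical pages
print(ftl.read(0), ftl.gc_copies)
```

The number of pages copied by `_collect` is the garbage collection overhead listed among the challenges of page-mapped FTLs.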
Challenges in page-mapped FTLs
- Mapping information management
  - Long reconstruction time: Birrell et al., “A design for high-performance flash disks”
  - Large memory requirement: Gupta et al., “DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings”
- Garbage collection overhead
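Reconstruction (Birrell et al.): when a data block is filled, it is sealed with a summary page recording its sequence # and the LPNs of its pages, so the mapping can be rebuilt from the summary pages instead of scanning every page
Memory (DFTL): the global mapping table is kept in flash, and only a cached mapping table is kept in SRAM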
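Rebuilding the mapping from summary pages can be sketched as replaying the sealed blocks in sequence-number order, so that newer copies override older ones. The input layout `{physical_block: (sequence_no, [lpn per page])}` and the function name are hypothetical, and the sketch ignores the unsealed block that is still being written.

```python
def rebuild_mapping(summaries):
    """Rebuild LPN -> (block, page) from per-block summary pages.

    summaries: {physical_block: (sequence_no, [LPN of page 0, LPN of page 1, ...])},
    where None marks a page that holds no data.
    """
    l2p = {}
    # Replay blocks from oldest to newest sequence number.
    for block, (_seq, lpns) in sorted(summaries.items(), key=lambda kv: kv[1][0]):
        for page, lpn in enumerate(lpns):
            if lpn is not None:
                l2p[lpn] = (block, page)   # a later page or block overrides older copies
    return l2p


summaries = {5: (2, [10, 11, 10]), 3: (1, [10, 12, None])}
print(rebuild_mapping(summaries))  # {10: (5, 2), 12: (3, 1), 11: (5, 1)}
```

Only one summary page per block has to be read, rather than the spare area of every page.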
DFTL
Architecture
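Flash: translation blocks, made of translation pages that store the page-level mapping, and data blocks, made of data pages
Volatile memory:
- Cached mapping table (map entries): stores active address mappings
- Global translation directory (directory entries): tracks the location of the translation pages in flash
On a cache miss, DFTL consults the global translation directory to locate the translation page in flash, fetches the mapping entry into the cached mapping table, and evicts a mapping entry if the cache is full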
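The lookup path can be sketched with an LRU-ordered `collections.OrderedDict` as the cached mapping table and a list as the global translation directory. The LRU policy, the dirty flag, the out-of-place write-back of translation pages, and all names are assumptions of this toy model, not DFTL's actual code.

```python
from collections import OrderedDict


class DFTL:
    """Toy DFTL mapping layer (hypothetical structure, not the paper's code)."""

    def __init__(self, global_table, entries_per_tpage, cache_size):
        self.epp = entries_per_tpage
        self.cache_size = cache_size
        self.tpages = {}   # stands in for translation pages in flash: location -> entries
        self.gtd = []      # global translation directory: translation page # -> location
        for start in range(0, len(global_table), entries_per_tpage):
            location = len(self.tpages)
            self.tpages[location] = list(global_table[start:start + entries_per_tpage])
            self.gtd.append(location)
        self.cmt = OrderedDict()   # cached mapping table: LPN -> [PPN, dirty], LRU order
        self.tpage_writes = 0

    def _fetch(self, lpn):
        if lpn in self.cmt:                       # hit: refresh the LRU position
            self.cmt.move_to_end(lpn)
            return self.cmt[lpn]
        if len(self.cmt) >= self.cache_size:      # miss with a full cache: evict first
            self._evict()
        tpage, offset = divmod(lpn, self.epp)
        ppn = self.tpages[self.gtd[tpage]][offset]  # consult GTD, read translation page
        self.cmt[lpn] = [ppn, False]
        return self.cmt[lpn]

    def _evict(self):
        lpn, (ppn, dirty) = self.cmt.popitem(last=False)  # least recently used entry
        if dirty:                                          # write back out of place
            tpage, offset = divmod(lpn, self.epp)
            entries = list(self.tpages[self.gtd[tpage]])
            entries[offset] = ppn
            new_location = max(self.tpages) + 1
            self.tpages[new_location] = entries
            self.gtd[tpage] = new_location
            self.tpage_writes += 1

    def lookup(self, lpn):
        return self._fetch(lpn)[0]

    def update(self, lpn, new_ppn):
        entry = self._fetch(lpn)
        entry[0], entry[1] = new_ppn, True


d = DFTL(list(range(100, 116)), entries_per_tpage=4, cache_size=2)
d.update(5, 999)      # miss, then the cached entry is marked dirty
d.lookup(9)
d.lookup(1)           # cache full: the dirty entry for LPN 5 is written back
print(d.lookup(5), d.tpage_writes)  # 999 1
```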
DFTL variants (1)
Convertible FTL (CFTL): Park et al., “A convertible flash translation layer adaptive to data access pattern”
- Goal: improve the read performance of DFTL
- (1) Groups logically consecutive “cold” pages into a physical block during garbage collection and provides a cached block-mapping table
- (2) Exploits spatial locality by adding a consecutive field to each cached (page) map entry
  - Only benefits reads of sequentially written pages
During garbage collection, the logically consecutive pages LPN i, LPN i+1, …, LPN i+(n-1) of logical block K, scattered across physical blocks X, Y and Z, are gathered into physical block W, and the cached block-mapping table records K → W. The cached page-mapping table stores LPN → PPN entries with an additional consecutive field.
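A sketch of how a consecutive field lets one cached entry serve several LPNs, assuming the field counts how many pages starting at the entry's LPN are stored in consecutive physical pages. The dictionary layout and function name are hypothetical.

```python
def cftl_lookup(lpn, cached_entries):
    """Translate an LPN using cached page-map entries with a consecutive field.

    cached_entries: {start_lpn: (start_ppn, consecutive)}, meaning LPNs
    start_lpn .. start_lpn + consecutive - 1 map to consecutive PPNs.
    Returns the PPN, or None on a cache miss.
    """
    for start_lpn, (start_ppn, consecutive) in cached_entries.items():
        if start_lpn <= lpn < start_lpn + consecutive:
            return start_ppn + (lpn - start_lpn)
    return None


cache = {40: (1000, 8)}          # LPNs 40..47 were written sequentially to PPNs 1000..1007
print(cftl_lookup(43, cache))    # 1003
print(cftl_lookup(48, cache))    # None
```

This only helps when pages were written sequentially, which matches the limitation noted above.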
DFTL variants (2)
LazyFTL: Ma et al., “LazyFTL: A Page-level Flash Translation Layer Optimized for NAND Flash Memory”
- Goal: improve the crash recovery time of DFTL
- In DFTL, the cached portion of the mapping table is lost when a crash occurs
- A consistent mapping table can be recovered using the logical page numbers recorded in the spare areas of data pages, but this is too slow
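Flash is divided into a data region, a map region that holds the global mapping table, and an update region
- Host writes and GC writes go to the update region, which contains the most recent data and is maintained with a small number of blocks
- The update mapping table holds the most up-to-date mapping information (at time T)
- The global mapping table is updated lazily and holds consistent mapping information (as of time T-K)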
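The two-table idea can be sketched as follows, assuming the global mapping table survives a crash and that recovery only needs to replay the (LPN, PPN) pairs found in the spare areas of the small update region. The class `LazyMapping` and its methods are hypothetical simplifications of LazyFTL.

```python
class LazyMapping:
    """Hypothetical two-table model of LazyFTL's lazy mapping update."""

    def __init__(self, gmt):
        self.gmt = dict(gmt)   # global mapping table: consistent as of time T-K
        self.umt = {}          # update mapping table: mappings changed since T-K

    def write(self, lpn, ppn):
        # Host and GC writes land in the update region; only the UMT changes.
        self.umt[lpn] = ppn

    def lookup(self, lpn):
        # The UMT holds the most up-to-date mapping; fall back to the GMT.
        return self.umt.get(lpn, self.gmt.get(lpn))

    def lazy_update(self):
        # Fold the recent mappings into the global table in one batch.
        self.gmt.update(self.umt)
        self.umt.clear()

    def recover(self, update_region_spare):
        """Rebuild the UMT after a crash by scanning only the update region.

        update_region_spare: (lpn, ppn) pairs read from the spare areas of the
        update-region pages, in the order they were written.
        """
        self.umt = {}
        for lpn, ppn in update_region_spare:
            self.umt[lpn] = ppn


m = LazyMapping({1: 10, 2: 20})
m.write(1, 11)
m.umt = {}                 # simulate a crash: the in-memory UMT is lost
m.recover([(1, 11)])       # scan only the few update-region blocks
print(m.lookup(1), m.gmt[1])  # 11 10
```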
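Vanilla FTL
A simplistic block-mapping FTL: to update data in a block, the new data and the remaining old data of the data block are copied to a free block, and the block-level mapping table is updated so that the logical block (in the LBA address space) points to the new block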
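A toy model of this copy-on-update behaviour, assuming that a page which is still erased can be programmed in place and ignoring the sequential programming constraint. The class `VanillaFTL` and its counters are hypothetical.

```python
class VanillaFTL:
    """Toy block-mapped FTL: updating a written page copies the whole block."""

    def __init__(self, num_blocks, pages_per_block):
        self.ppb = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.block_map = {}                 # logical block -> physical block
        self.free = list(range(num_blocks))
        self.page_copies = 0

    def write(self, lpn, data):
        lbn, offset = divmod(lpn, self.ppb)
        old = self.block_map.get(lbn)
        if old is not None and self.flash[old][offset] is None:
            self.flash[old][offset] = data  # page still erased: program in place
            return
        new = self.free.pop(0)
        if old is not None:
            for page in range(self.ppb):    # copy every other written page
                if page != offset and self.flash[old][page] is not None:
                    self.flash[new][page] = self.flash[old][page]
                    self.page_copies += 1
            self.flash[old] = [None] * self.ppb   # erase the old block
            self.free.append(old)
        self.flash[new][offset] = data
        self.block_map[lbn] = new

    def read(self, lpn):
        lbn, offset = divmod(lpn, self.ppb)
        pbn = self.block_map.get(lbn)
        return None if pbn is None else self.flash[pbn][offset]


v = VanillaFTL(num_blocks=3, pages_per_block=4)
for lpn, data in enumerate("abcd"):
    v.write(lpn, data)
v.write(1, "B")                      # a one-page update
print(v.read(1), v.page_copies, v.block_map[0])  # B 3 1
```

A single-page update here costs three page copies and one erase; with 64-page blocks it would cost 63 copies, which is the small random write problem described next.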
A challenge in block-mapped FTLs
Achilles’ heel of block-mapped FTLs: poor small random write performance
- Due to expensive copy operations when only part of a block is modified
Write buffering schemes
- Temporarily store data from the host in write buffer blocks before block remapping is performed
- Various schemes have been introduced: replacement block scheme, log block scheme, super-block scheme, FAST and LAST, …
Replacement block scheme (1)
Key idea: each data block has a chain of write buffer blocks, called replacement blocks; mapping within a replacement block is managed at block level, so a page keeps the same offset it has in the data block
Replacement block scheme (2)
Merge operation
- Triggered when there is no free block available for a new replacement block
- Gathers the valid pages in a data block and its write buffer blocks (replacement blocks) to form a single complete data block
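The chain and its merge can be sketched as follows, assuming each block is a list of page slots and that a write goes to the first block in the chain whose slot at that offset is still free. The class `ReplacementBlockFTL` is hypothetical, and the merge is called explicitly rather than on free-block exhaustion.

```python
class ReplacementBlockFTL:
    """Toy replacement block scheme: each logical block owns a chain of blocks."""

    def __init__(self, pages_per_block):
        self.ppb = pages_per_block
        self.chains = {}   # logical block -> [data block, replacement block 1, ...]

    def write(self, lpn, data):
        lbn, offset = divmod(lpn, self.ppb)
        chain = self.chains.setdefault(lbn, [])
        for block in chain:                 # sequential traversal of the chain
            if block[offset] is None:       # same offset still free in this block
                block[offset] = data
                return
        new_block = [None] * self.ppb       # append another replacement block
        new_block[offset] = data
        chain.append(new_block)

    def read(self, lpn):
        lbn, offset = divmod(lpn, self.ppb)
        latest = None
        for block in self.chains.get(lbn, []):   # the deepest copy is the newest
            if block[offset] is not None:
                latest = block[offset]
        return latest

    def merge(self, lbn):
        """Gather the newest copy of every page into one complete data block."""
        merged = [None] * self.ppb
        for block in self.chains[lbn]:
            for offset, data in enumerate(block):
                if data is not None:
                    merged[offset] = data
        self.chains[lbn] = [merged]
        return merged


r = ReplacementBlockFTL(pages_per_block=4)
for lpn, data in [(1, "x"), (1, "y"), (2, "z"), (1, "w")]:
    r.write(lpn, data)
print(len(r.chains[0]), r.read(1))   # 3 w
print(r.merge(0))                    # [None, 'w', 'z', None]
```

The two replacement blocks each use one page out of four, which illustrates the low utilization problem; writing offset 1 into an empty block also ignores the sequential programming constraint.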
Replacement block scheme (3)
Problems
- Low utilization of replacement blocks
- Sequential traversal of the replacement block chain during reads and writes
- No consideration for the sequential programming constraint (pages within a block must be programmed in order)
Log block scheme (1)
Key idea: one dedicated log block per data block; mapping within a log block is managed at page level
- Updates are written in an append-only manner, starting from the first page
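Three write requests to the data block are appended to its log block: the overwritten pages in the data block become invalid, and their new copies in the log block are valid.
Volatile memory holds the block-mapping table plus a page-mapping table for each log block; the logical address of each page is also recorded in its spare area.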
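One data block and its log block can be sketched as below, assuming the spare area of each log page is modelled as the logical offset stored next to the data. The class `BASTLogBlock` and its methods are hypothetical.

```python
class BASTLogBlock:
    """One data block plus its dedicated log block (hypothetical toy model)."""

    def __init__(self, data_pages):
        self.data = list(data_pages)   # data block contents, indexed by offset
        self.log = []                  # log block pages: (offset kept in spare area, data)
        self.log_map = {}              # page-level map inside the log: offset -> log page

    def update(self, offset, value):
        if len(self.log) == len(self.data):
            raise RuntimeError("log block full: a merge is required")
        self.log.append((offset, value))          # append-only, from the first page
        self.log_map[offset] = len(self.log) - 1  # newest copy wins

    def read(self, offset):
        if offset in self.log_map:     # a newer copy lives in the log block
            return self.log[self.log_map[offset]][1]
        return self.data[offset]

    def rebuild_map(self):
        """Recreate the in-memory map from the offsets stored in the spare areas."""
        self.log_map = {off: i for i, (off, _) in enumerate(self.log)}


b = BASTLogBlock(["a", "b", "c", "d"])
b.update(1, "B")
b.update(2, "C")
b.update(1, "B2")
print(b.read(0), b.read(1), b.log_map)   # a B2 {1: 2, 2: 1}
```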
Log block scheme (2)
Merge operation is triggered when
- All the pages in a log block have been consumed, or
- The number of free blocks falls below a certain threshold
Three types of merge
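- Full merge: the log block holds pages out of order (e.g., 1, 1, 3, 2), so the valid pages of the data block and the log block are copied to a new block from the free block pool, and both old blocks are erased
- Partial merge: the log block holds the first pages in order (e.g., 0, 1), so the remaining valid pages (2, 3) are copied from the data block into the log block, which then becomes the new data block
- Switch merge: the log block holds all pages in order (0, 1, 2, 3), so it simply replaces the data block, which is erased and returned to the free block pool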
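Choosing among the three merges can be sketched by inspecting the order of the logical offsets in the log block. The function names are hypothetical, and the cost model assumes every logical page of the block holds valid data.

```python
def merge_type(log_offsets, pages_per_block):
    """Classify the merge for a log block whose pages hold these logical offsets."""
    if not log_offsets:
        return "none"
    if log_offsets == list(range(pages_per_block)):
        return "switch"                  # complete and in order
    if log_offsets == list(range(len(log_offsets))):
        return "partial"                 # in order from offset 0, but incomplete
    return "full"                        # out of order or rewritten pages


def merge_cost(log_offsets, pages_per_block):
    """Return (merge type, page copies, block erases), assuming all pages are valid."""
    kind = merge_type(log_offsets, pages_per_block)
    if kind == "none":
        return kind, 0, 0
    if kind == "switch":
        return kind, 0, 1                # erase the old data block only
    if kind == "partial":
        return kind, pages_per_block - len(log_offsets), 1
    return kind, pages_per_block, 2      # copy everything, erase both blocks


for offsets in ([1, 1, 3, 2], [0, 1], [0, 1, 2, 3]):
    print(offsets, merge_cost(offsets, 4))
```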
Log block scheme (3)
Log block thrashing: occurs when the number of log blocks is not enough to cover the write working set
Example scenario
- There are only two log blocks in the system
- Workload: P1 P5 P9 P1 P5 P9 P1 P5 P9 …
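Data blocks hold p0-p3, p4-p7 and p8-p11. After P1 and P5 occupy the two log blocks, writing P9 finds no free log block, so a log block must be merged even though it holds only one page; the same happens on every following write.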
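The scenario can be simulated by counting merges in a BAST-style scheme, assuming 4 pages per block and that the least recently used log block is reclaimed when none is free. The function `bast_merges` is hypothetical and only counts merges, without modelling their type.

```python
from collections import OrderedDict


def bast_merges(workload, pages_per_block, num_log_blocks):
    """Count merges when each data block may own at most one log block."""
    logs = OrderedDict()   # data block -> pages used in its log block (LRU order)
    merges = 0
    for lpn in workload:
        lbn = lpn // pages_per_block
        if lbn in logs:
            logs.move_to_end(lbn)
            if logs[lbn] == pages_per_block:   # log block full: merge, start afresh
                merges += 1
                logs[lbn] = 0
        else:
            if len(logs) == num_log_blocks:    # no free log block: reclaim LRU one
                logs.popitem(last=False)
                merges += 1
            logs[lbn] = 0
        logs[lbn] += 1
    return merges


workload = [1, 5, 9] * 3                                          # P1 P5 P9 P1 P5 P9 P1 P5 P9
print(bast_merges(workload, pages_per_block=4, num_log_blocks=2))  # 7
print(bast_merges(workload, pages_per_block=4, num_log_blocks=3))  # 0
```

With two log blocks, every write after the second forces a merge of a log block holding a single page; one more log block covers the working set and no merge is needed.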
FAST (1)
Key idea: FAST (Fully Associative Sector Translation)
- Fully associative mapping between data blocks and log blocks
- Mapping within a log block is managed at page level, as in the log block scheme (referred to as BAST: Block Associative Sector Translation)
Writes 1-5, destined for data blocks X, Y and Z, are appended to shared log blocks: writes 1-3 go to log block A and writes 4-5 go to log block B.
FAST (2)
Two different types of log block: a sequential log block and random log blocks
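Example: Task 1 issues a sequential write request (LPN 0, 1, 2, 3, 4, 5, 6, 7) and Task 2 issues random write requests (LPN 21), (LPN 145). The O/S (file system) interleaves them into a mixed write request (LPN 0, 1, 2, 3, 21, 4, 5, 6, 145, 7). A sequentiality detection algorithm sends LPNs 0-7 to the sequential log block, which can later be reclaimed by a switch merge with its data block, while LPN 21 and LPN 145 go to random log blocks, whose reclamation requires full merges (e.g., LPN 21 belongs to the data block holding LPNs 16-23).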
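The routing decision can be sketched as follows, assuming 8 pages per block: a block-aligned write starts a new sequential log block, a write that continues the current sequential run is appended to it, and anything else goes to the random log blocks. The function `fast_route` is hypothetical and simply discards an old sequential log block instead of modelling its merge.

```python
def fast_route(workload, pages_per_block):
    """Split writes between FAST's sequential and random log blocks (toy model)."""
    sequential_log = []   # LPNs in the single sequential (SW) log block
    random_log = []       # LPNs appended to the shared random (RW) log blocks
    for lpn in workload:
        if lpn % pages_per_block == 0:
            # Block-aligned write: start a new sequential log block. FAST would
            # first reclaim any existing one (switch or partial merge).
            sequential_log = [lpn]
        elif sequential_log and lpn == sequential_log[-1] + 1:
            sequential_log.append(lpn)   # continues the run within the same block
        else:
            random_log.append(lpn)
    return sequential_log, random_log


mixed = [0, 1, 2, 3, 21, 4, 5, 6, 145, 7]
print(fast_route(mixed, pages_per_block=8))  # ([0, 1, 2, 3, 4, 5, 6, 7], [21, 145])
```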
FAST (3)
Pros
- Higher utilization of log blocks when they are merged
- Delayed merge operations increase the probability that pages are invalidated before merging
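Workload: P1 P5 P9 P1 P5 P9 P1 P5
With data blocks holding p0-p3, p4-p7 and p8-p11, the writes fill the shared log blocks in order (p1 p5 p9 p1, then p5 p9 p1 p5) without any merge, and earlier copies of p1, p5 and p9 are invalidated along the way.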
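A sketch of shared random log blocks, assuming 4 pages per block and that the oldest log block is reclaimed when all are full, with one full merge per data block that still has a valid page in it. The function `fast_log_usage` is hypothetical and ignores the sequential log block.

```python
def fast_log_usage(workload, pages_per_block, num_log_blocks):
    """Return (full merges, log block contents) for shared random log blocks."""
    logs = [[]]                    # shared random log blocks, oldest first
    full_merges = 0
    for lpn in workload:
        if len(logs[-1]) == pages_per_block:      # current log block is full
            if len(logs) == num_log_blocks:       # no free log block: reclaim oldest
                victim = logs.pop(0)
                newer = {p for log in logs for p in log}
                # A page is still valid if no later copy exists anywhere.
                valid = {p for i, p in enumerate(victim)
                         if p not in victim[i + 1:] and p not in newer}
                # One full merge per data block that still has a valid page here.
                full_merges += len({p // pages_per_block for p in valid})
            logs.append([])
        logs[-1].append(lpn)
    return full_merges, logs


print(fast_log_usage([1, 5, 9, 1, 5, 9, 1, 5], 4, 2))  # (0, [[1, 5, 9, 1], [5, 9, 1, 5]])
print(fast_log_usage([1, 5, 9] * 3, 4, 2))              # (0, [[5, 9, 1, 5], [9]])
```

In the second call the oldest log block is reclaimed, but every page in it has already been superseded, so no full merge is needed. Conversely, `fast_log_usage([0, 4, 8, 12, 1], 4, 1)` returns four full merges for one reclamation, illustrating the skewed cost discussed next.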
FAST (4)
Cons
- Slower address translation, due to a full scan of the page (sector)-level mapping table
- Excessive overhead for reclaiming a single log block
  - Performance is severely skewed depending on the number of data blocks involved in a log block
Reconfigurable FTL
Key idea: a log block is shared only by a set of adjacent data blocks, called a super-block
Each super-block of N data blocks shares its own group of log blocks (e.g., K log blocks for one super-block, M log blocks for another)
Chameleon
Key idea: garbage collection among random log blocks as an alternative to the full merge operation
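Instead of merging random log blocks with data blocks, garbage collection reclaims random log blocks by copying their valid pages among the random log blocks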
LAST: Locality-Aware Sector Translation
Key idea: hot/cold partitioning of random log blocks
- Increases the probability of page invalidation in the hot partition
Figures from LAST : Locality-Aware Sector Translation for NAND Flash Memory-Based Storage Systems by Lee et al.
Effect of hot/cold partitioning in LAST
Evolution of block-mapped FTLs
Vanilla FTL → replacement block scheme → log block scheme (BAST) → FAST → Reconfigurable FTL, Chameleon, LAST, AFTL [non-volatile write buffering]
BPLRU, HYDRA [volatile write buffering]
Summary
Three primary roles of the FTL: re-mapping, wear-leveling, bad block management
Mapping algorithms
- Page-level mapping
- Block-level mapping, plus various write buffering schemes (hybrid mapping)
To provide fast and reliable flash storage