Flash Correct-and-Refresh
Retention-Aware Error Management for Increased Flash Memory Lifetime
Yu Cai1 Gulay Yalcin2 Onur Mutlu1 Erich F. Haratsch3 Adrian Cristal2 Osman S. Unsal2 Ken Mai1
1 Carnegie Mellon University 2 Barcelona Supercomputing Center 3 LSI Corporation

Executive Summary
- NAND flash memory has low endurance: a flash cell dies after 3k P/E cycles vs. 50k desired → major scaling challenge for flash memory
- Flash error rate increases exponentially over flash lifetime
- Problem: Stronger error correction codes (ECC) are ineffective and undesirable for improving flash lifetime due to
  - diminishing returns on lifetime with increased correction strength
  - prohibitively high power, area, and latency overheads
- Our Goal: Develop techniques to tolerate high error rates without strong ECC
- Observation: Retention errors are the dominant errors in MLC NAND flash
  - A flash cell loses charge over time; retention errors increase as the cell gets worn out
- Solution: Flash Correct-and-Refresh (FCR)
  - Periodically read, correct, and reprogram (in place) or remap each flash page before it accumulates more errors than can be corrected by simple ECC
  - Adapt the “refresh” rate to the severity of retention errors (i.e., # of P/E cycles)
- Results: FCR improves flash memory lifetime by 46X with no hardware changes and low energy overhead; outperforms strong ECCs

Outline
- Executive Summary
- The Problem: Limited Flash Memory Endurance/Lifetime
- Error and ECC Analysis for Flash Memory
- Flash Correct and Refresh Techniques (FCR)
- Evaluation
- Conclusions

Problem: Limited Endurance of Flash Memory
- NAND flash has limited endurance
  - A cell can tolerate only a small number of Program/Erase (P/E) cycles
  - 3x-nm flash with 2 bits/cell → 3K P/E cycles
- Enterprise data storage requirements demand very high endurance
  - >50K P/E cycles (10 full disk writes per day for 3-5 years)
- Continued process scaling and more bits per cell will reduce flash endurance
- One potential solution: stronger error correction codes (ECC)
  - Stronger ECC is not effective enough and is inefficient

Decreasing Endurance with Flash Scaling
- Endurance of flash memory is decreasing with scaling and multi-level cells
- The error correction capability required to guarantee storage-class reliability (UBER < 10^-15) is increasing exponentially, while the endurance it reaches keeps decreasing
- UBER: uncorrectable bit error rate, the fraction of erroneous bits after error correction
[Figure: P/E cycle endurance (y-axis, 0 to 100,000) for SLC, 5x-nm MLC, 3x-nm MLC, 2x-nm MLC, and 3-bit MLC; data labels 100k, 10k, 5k, 3k, 1k. Annotated with error correction capability per 1 kB of data: 4-bit, 8-bit, 15-bit, and 24-bit ECC.]
Source: Ariel Maislos, “A New Era in Embedded Flash Memory”, Flash Summit 2011 (Anobit)

The Problem with Stronger Error Correction
- Stronger ECC detects and corrects more raw bit errors → increases P/E cycles endured
- Two shortcomings of stronger ECC:
  1. High implementation complexity → power and area overheads increase super-linearly, but correction capability increases sub-linearly with ECC strength
  2. Diminishing returns on flash lifetime improvement → raw bit error rate increases exponentially with P/E cycles, but correction capability increases sub-linearly with ECC strength

Methodology: Error and ECC Analysis
- Characterized errors and error rates of 3x-nm MLC NAND flash using an experimental FPGA-based flash platform
  - Cai et al., “Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis,” DATE 2012.
- Quantified Raw Bit Error Rate (RBER) at a given P/E cycle
  - Raw Bit Error Rate: fraction of erroneous bits without any correction
- Quantified error correction capability (and area and power consumption) of various BCH-code implementations
  - Identified how much RBER each code can tolerate → how many P/E cycles (flash lifetime) each code can sustain

NAND Flash Error Types
- Four types of errors [Cai+, DATE 2012]
- Caused by common flash operations
  - Read errors
  - Erase errors
  - Program (interference) errors
- Caused by flash cell losing charge over time
  - Retention errors
- Whether an error happens depends on required retention time
- Especially problematic in MLC flash because the threshold voltage window used to determine the stored value is smaller

Observations: Flash Error Analysis
- Raw bit error rate increases exponentially with P/E cycles
- Retention errors are dominant (>99% for 1-year retention time)
- Retention errors increase with retention time requirement
[Figure: plot over P/E cycles (x-axis), with the retention errors labeled.]

ECC Strength Analysis
- Examined characteristics of various-strength BCH codes with the following criteria
  - Storage efficiency: >89% coding rate (user data/total storage)
  - Reliability: <10^-15 uncorrectable bit error rate
  - Code length: segment of one flash page (e.g., 4kB)

| Code length (n) | Correctable errors (t) | Acceptable raw BER | Norm. power | Norm. area |
|---|---|---|---|---|
| 512 | 7 | 1.0x10^-4 (1x) | 1 | 1 |
| 1024 | 12 | 4.0x10^-4 (4x) | 2 | 2.1 |
| 2048 | 22 | 1.0x10^-3 (10x) | 4.1 | 3.9 |
| 4096 | 40 | 1.7x10^-3 (17x) | 8.6 | 10.3 |
| 8192 | 74 | 2.2x10^-3 (22x) | 17.8 | 21.3 |
| 32768 | 259 | 2.6x10^-3 (26x) | 71 | 85 |

- Error correction capability increases sub-linearly
- Power and area overheads increase super-linearly
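
The two trends in the table can be checked numerically. The Python sketch below copies the table values and divides each code's gain in acceptable raw BER (relative to the 512-bit code) by its normalized power. The helper name `rber_gain_per_power` is illustrative and does not come from the talk.

```python
# BCH code data copied from the ECC strength table:
# (code length n, correctable errors t, acceptable raw BER, norm. power, norm. area)
BCH_CODES = [
    (512, 7, 1.0e-4, 1.0, 1.0),
    (1024, 12, 4.0e-4, 2.0, 2.1),
    (2048, 22, 1.0e-3, 4.1, 3.9),
    (4096, 40, 1.7e-3, 8.6, 10.3),
    (8192, 74, 2.2e-3, 17.8, 21.3),
    (32768, 259, 2.6e-3, 71.0, 85.0),
]


def rber_gain_per_power(codes):
    """Return (n, RBER gain over the weakest code, gain per unit of normalized power)."""
    base_rber = codes[0][2]
    rows = []
    for n, _t, rber, power, _area in codes:
        gain = rber / base_rber
        rows.append((n, gain, gain / power))
    return rows


for n, gain, per_power in rber_gain_per_power(BCH_CODES):
    print(f"n={n:6d}  RBER gain={gain:5.1f}x  gain per unit power={per_power:.2f}")
```

For the strongest code the tolerable error rate grows by about 26x while power grows by 71x, so the gain per unit of power ends up below that of the weakest code.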

Resulting Flash Lifetime with Strong ECC
- Lifetime improvement comparison of various BCH codes
[Figure: P/E cycle endurance (y-axis, 0 to 14,000) for 512b-BCH, 1k-BCH, 2k-BCH, 4k-BCH, 8k-BCH, and 32k-BCH. Annotations: 4X lifetime improvement, 71X power consumption, 85X area consumption.]
- Strong ECC is very inefficient at improving lifetime

Our Goal
Develop new techniques to improve flash lifetime without relying on stronger ECC

Flash Correct-and-Refresh (FCR)
- Key Observations:
  - Retention errors are the dominant source of errors in flash memory [Cai+ DATE 2012][Tanakamaru+ ISSCC 2011] → limit flash lifetime as they increase over time
  - Retention errors can be corrected by “refreshing” each flash page periodically
- Key Idea:
  - Periodically read each flash page,
  - Correct its errors using “weak” ECC, and
  - Either remap it to a new physical page or reprogram it in-place,
  - Before the page accumulates more errors than ECC-correctable
  - Optimization: Adapt refresh rate to endured P/E cycles
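
The key idea can be sketched as a toy simulation. Everything here is an assumption for illustration: the `Page` class, the `ECC_CORRECTABLE` limit of 2 bits, and the made-up rate of one new retention error per time step. Real ECC decoding is replaced by a stored copy of the correct data.

```python
from dataclasses import dataclass

ECC_CORRECTABLE = 2  # hypothetical "weak" ECC strength, in bits per page


@dataclass
class Page:
    data: list    # bits as currently stored in the simulated flash cells
    golden: list  # stand-in for what ECC decoding would recover


def count_errors(page):
    """Number of stored bits that differ from the intended data."""
    return sum(a != b for a, b in zip(page.data, page.golden))


def refresh(page):
    """One FCR step: read, correct, write back. Returns False if it came too late."""
    if count_errors(page) > ECC_CORRECTABLE:
        return False               # more errors than ECC can correct
    page.data = list(page.golden)  # corrected data is reprogrammed or remapped
    return True


def simulate(steps, refresh_every):
    """Inject one retention error per step; refresh_every=0 means never refresh."""
    page = Page(data=[0] * 16, golden=[0] * 16)
    for step in range(1, steps + 1):
        page.data[step % 16] ^= 1  # made-up error rate: one bit flip per step
        if refresh_every and step % refresh_every == 0:
            if not refresh(page):
                return False
    return count_errors(page) <= ECC_CORRECTABLE


print(simulate(6, refresh_every=2))  # refresh arrives before errors exceed the limit
print(simulate(6, refresh_every=0))  # no refresh: errors keep accumulating
```

In this toy model the page stays readable only when the refresh period is short enough that errors never exceed what the weak ECC can correct.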

FCR Intuition
[Figure: errors in a page at four points in time (program page, after time T, after time 2T, after time 3T), comparing "Errors with no refresh" against "Errors with periodic refresh". Each × marks either a retention error or a program error.]

FCR: Two Key Questions
- How to refresh?
  - Remap a page to another one
  - Reprogram a page (in-place)
  - Hybrid of remap and reprogram
- When to refresh?
  - Fixed period
  - Adapt the period to retention error severity

Outline
- Executive Summary
- The Problem: Limited Flash Memory Endurance/Lifetime
- Error and ECC Analysis for Flash Memory
- Flash Correct and Refresh Techniques (FCR)
  1. Remapping based FCR
  2. Hybrid Reprogramming and Remapping based FCR
  3. Adaptive-Rate FCR
- Evaluation
- Conclusions

Remapping Based FCR
- Idea: Periodically remap each page to a different physical page (after correcting errors)
  - Also [Pan et al., HPCA 2012]
  - FTL already has support for changing logical → physical flash block/page mappings
  - Deallocated block is erased by garbage collector
- Problem: Causes additional erase operations → more wearout
  - Bad for read-intensive workloads (few erases really needed)
  - Lifetime degrades for such workloads (see paper)

In-Place Reprogramming Based FCR
- Idea: Periodically reprogram (in-place) each physical page (after correcting errors)
  - Flash programming techniques (ISPP) can correct retention errors in-place by recharging flash cells
- Problem: Program errors accumulate on the same page → may not be correctable by ECC after some time
[Figure: reprogram corrected data.]

In-Place Reprogramming of Flash Cells
- Pro: No remapping needed → no additional erase operations
- Con: Increases the occurrence of program errors
[Figure: floating gate voltage distribution for each stored value. Retention errors are caused by cell voltage shifting to the left; ISPP moves cell voltage to the right and fixes retention errors.]

Program Errors in Flash Memory
- When a cell is being programmed, the voltage level of a neighboring cell changes (unintentionally) due to parasitic capacitance coupling → can change the data value stored
- Also called program interference error
- Program interference causes neighboring cell voltage to shift to the right

Problem with In-Place Reprogramming
[Figure: floating gate voltage distribution along VT for stored values 11, 10, 01, 00, separated by reference voltages REF1, REF2, REF3; programming injects additional electrons into the floating gate.]
- Original data to be programmed: … 11 01 00 10 11 00 00 …
- Program errors after initial programming: … 10 01 00 10 11 00 00 …
- Retention errors after some time: … 10 10 00 11 11 01 01 …
- Errors after in-place reprogramming: … 10 01 00 10 10 00 00 …
- In-place refresh steps: 1. Read data 2. Correct errors 3. Reprogram back
- Problem: Program errors can accumulate over time
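
The cell values on this slide can be compared mechanically. The sketch below assumes the four states sit in the order 11, 10, 01, 00 from lowest to highest threshold voltage, as drawn on the slide, and counts how many cells moved right (program errors) or left (retention errors). The function name `shifts` is illustrative.

```python
# Order of the four MLC states along VT, lowest threshold voltage first.
STATES = ["11", "10", "01", "00"]

# Cell values copied from the slide.
original     = "11 01 00 10 11 00 00".split()
programmed   = "10 01 00 10 11 00 00".split()
retained     = "10 10 00 11 11 01 01".split()
reprogrammed = "10 01 00 10 10 00 00".split()


def shifts(stored, intended):
    """Count cells whose state moved right (higher VT) or left (lower VT)."""
    right = left = 0
    for s, i in zip(stored, intended):
        delta = STATES.index(s) - STATES.index(i)
        if delta > 0:
            right += 1
        elif delta < 0:
            left += 1
    return right, left


print(shifts(programmed, original))    # errors right after initial programming
print(shifts(retained, programmed))    # additional errors after retention time
print(shifts(reprogrammed, original))  # errors left after in-place reprogramming
```

With these values, reprogramming removes every left shift but leaves the original right-shifted cell in place and adds one more, which is the accumulation the slide warns about.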

Hybrid Reprogramming/Remapping Based FCR
- Idea:
  - Monitor the count of right-shift errors (after error correction)
  - If count < threshold, in-place reprogram the page
  - Else, remap the page to a new page
- Observation:
  - Program errors are much less frequent than retention errors → remapping happens only infrequently
- Benefit:
  - Hybrid FCR greatly reduces erase operations due to remapping
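
The hybrid decision rule is small enough to write down directly. This is a minimal sketch: the function name and the threshold value used in the example are hypothetical, since the slide does not give a concrete threshold.

```python
def hybrid_refresh_action(right_shift_errors, threshold):
    """Choose the refresh method for one page.

    right_shift_errors: number of right-shift (program) errors found by ECC.
    threshold: hypothetical limit below which in-place reprogramming is still safe.
    """
    if right_shift_errors < threshold:
        return "reprogram"  # in place, no extra erase needed
    return "remap"          # move to a new physical page


# Example with an assumed threshold of 3 right-shift errors per page.
for count in (0, 2, 3, 5):
    print(count, hybrid_refresh_action(count, threshold=3))
```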

Adaptive-Rate FCR
- Observation:
  - Retention error rate strongly depends on the P/E cycles a flash page has endured so far
  - No need to refresh frequently (at all) early in flash lifetime
- Idea:
  - Adapt the refresh rate to the P/E cycles endured by each page
  - Increase refresh rate gradually with increasing P/E cycles
- Benefits:
  - Reduces overhead of refresh operations
  - Can use existing FTL mechanisms that keep track of P/E cycles

Adaptive-Rate FCR (Example)
[Figure: curves for 3-year, 3-month, 3-week, and 3-day FCR plotted against P/E cycles (x-axis), together with the acceptable raw BER for 512b-BCH.]
- Select refresh frequency such that error rate is below acceptable rate
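
The selection rule can be sketched as a search over candidate intervals, longest first. The error model `toy_rber` below is entirely made up for illustration (exponential in P/E cycles, linear in retention time, with arbitrary constants); the talk's real curves come from measurements. Only the acceptable raw BER of 1.0x10^-4 for 512b-BCH and the four interval names are taken from the slides.

```python
import math

ACCEPTABLE_RBER = 1.0e-4  # acceptable raw BER for 512b-BCH, from the ECC table

# Candidate refresh intervals from the slide, longest (cheapest) first.
INTERVALS_DAYS = [("3 years", 3 * 365), ("3 months", 90), ("3 weeks", 21), ("3 days", 3)]


def toy_rber(pe_cycles, interval_days):
    """Made-up model: grows exponentially with P/E cycles, linearly with time."""
    return 1e-8 * math.exp(pe_cycles / 2000.0) * interval_days


def pick_interval(pe_cycles, rber_model=toy_rber):
    """Return the longest interval whose predicted RBER stays acceptable, else None."""
    for name, days in INTERVALS_DAYS:
        if rber_model(pe_cycles, days) < ACCEPTABLE_RBER:
            return name
    return None  # even the shortest candidate interval is not enough


for pe in (0, 6000, 10000, 20000):
    print(pe, pick_interval(pe))
```

Trying the longest interval first keeps refresh overhead low early in life and shortens the interval only as wear increases.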

FCR: Other Considerations
- Implementation cost
  - No hardware changes
  - FTL software/firmware needs modification
- Response time impact
  - FCR not as frequent as DRAM refresh; low impact
- Adaptation to variations in retention error rate
  - Adapt refresh rate based on, e.g., temperature [Liu+ ISCA 2012]
- FCR requires power
  - Enterprise storage systems typically powered on

Evaluation Methodology
- Experimental flash platform to obtain error rates at different P/E cycles [Cai+ DATE 2012]
- Simulation framework to obtain P/E cycles of real workloads: DiskSim with SSD extensions
- Simulated system: 256GB flash, 4 channels, 8 chips/channel, 8K blocks/chip, 128 pages/block, 8KB pages
- Workloads
  - File system applications, databases, web search
  - Categories: Write-heavy, read-heavy, balanced
- Evaluation metrics
  - Lifetime (extrapolated)
  - Energy overhead, P/E cycle overhead
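
As a sanity check, the simulated system's geometry multiplies out to the stated 256GB. The sketch assumes binary units, that is, "8K blocks" means 8 x 1024 blocks and "8KB" means 8 x 1024 bytes.

```python
# Geometry of the simulated system, as listed on the slide.
channels = 4
chips_per_channel = 8
blocks_per_chip = 8 * 1024   # "8K blocks/chip", assuming K = 1024
pages_per_block = 128
page_size_kib = 8            # "8KB pages", assuming binary kilobytes

total_pages = channels * chips_per_channel * blocks_per_chip * pages_per_block
capacity_gib = total_pages * page_size_kib / (1024 * 1024)

print(total_pages, capacity_gib)
```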
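
Extrapolated Lifetime
$$\text{Lifetime} = \frac{\text{Maximum full disk P/E cycles for a technique}}{\text{Total full disk P/E cycles for a workload}} \times \text{\# of days of given application}$$
- Maximum full disk P/E cycles for a technique: obtained from experimental platform data
- Total full disk P/E cycles for a workload: obtained from workload simulation
- # of days of given application: real length (in time) of each workload trace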
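
The extrapolation is a single ratio scaled by the trace length. In the sketch below the function and parameter names are illustrative. The example plugs in the 3K P/E cycle endurance quoted earlier and the 20 P/E cycles per day listed for IO-ZONE in the backup slides; the one-day trace length is an assumption made only to keep the arithmetic simple.

```python
def extrapolated_lifetime_days(max_pe_cycles, workload_pe_cycles, trace_days):
    """Lifetime = (max P/E cycles for a technique / P/E cycles used by the trace) * trace length."""
    return max_pe_cycles / workload_pe_cycles * trace_days


# 3,000 P/E cycles of endurance, a workload consuming 20 P/E cycles
# over an assumed 1-day trace.
print(extrapolated_lifetime_days(3000, 20, 1))
```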

Normalized Flash Memory Lifetime
[Figure: normalized lifetime (y-axis, 0 to 200) for 512b-BCH, 1k-BCH, 2k-BCH, 4k-BCH, 8k-BCH, and 32k-BCH under Base (No-Refresh), Remapping-Based FCR, Hybrid FCR, and Adaptive FCR. Annotations: 46x and 4x.]
- Adaptive-rate FCR provides the highest lifetime
- Lifetime of FCR is much higher than lifetime of stronger ECC

Lifetime Evaluation Takeaways
- Significant average lifetime improvement over no refresh
  - Adaptive-rate FCR: 46X
  - Hybrid reprogramming/remapping based FCR: 31X
  - Remapping based FCR: 9X
- FCR lifetime improvement larger than that of stronger ECC
  - 46X vs. 4X with 32-kbit ECC (over 512-bit ECC)
  - FCR is less complex and less costly than stronger ECC
- Lifetime on all workloads improves with Hybrid FCR
  - Remapping based FCR can degrade lifetime on read-heavy workloads
  - Lifetime improvement highest in write-heavy workloads

Energy Overhead
- Adaptive-rate refresh: <1.8% energy increase until daily refresh is triggered
[Figure: energy overhead (y-axis, 0% to 10%) vs. refresh interval (1 Year, 3 Months, 3 Weeks, 3 Days, 1 Day) for Remapping-based Refresh and Hybrid Refresh. Data labels: 7.8%, 5.5%, 2.6%, 1.8%, 0.4%, 0.3%.]

Overhead of Additional Erases
- Additional erases happen due to remapping of pages
- Low (2%-20%) for write-intensive workloads
- High (up to 10X) for read-intensive workloads
- Improved P/E cycle lifetime of all workloads largely outweighs the additional P/E cycles due to remapping

More Results in the Paper
- Detailed workload analysis
- Effect of refresh rate

Conclusion
- NAND flash memory lifetime is limited due to uncorrectable errors, which increase over lifetime (P/E cycles)
- Observation: Dominant source of errors in flash memory is retention errors → retention error rate limits lifetime
- Flash Correct-and-Refresh (FCR) techniques reduce retention error rate to improve flash lifetime
  - Periodically read, correct, and remap or reprogram each page before it accumulates more errors than can be corrected
  - Adapt refresh period to the severity of errors
- FCR improves flash lifetime by 46X at no hardware cost
  - More effective and efficient than stronger ECC
  - Can enable better flash memory scaling

Thank You.
Backup Slides

Effect of Refresh Rate on Lifetime
[Figure: normalized lifetime (y-axis, 0 to 40) for 512b-BCH, 1k-BCH, 2k-BCH, 4k-BCH, 8k-BCH, and 32k-BCH. Series: Baseline (No-refresh), 1 Year, 3 Months, 3 Weeks, 3 Days, 1 Day.]

Lifetime: Remapping vs. Hybrid FCR
[Figure: normalized lifetime (y-axis, 0 to 35) vs. refresh interval (1 Year, 3 Months, 3 Weeks, 3 Days, 1 Day). Series: Base (No Refresh), Remapping-based FCR, Hybrid FCR.]

Energy Overhead
[Figure: energy overhead (y-axis, 0 to 0.08) vs. refresh interval (1 Year, 3 Months, 3 Weeks, 3 Days, 1 Day) for Remapping-based FCR and Hybrid FCR. Data labels: 7.8%, 5.5%, 2.6%, 1.8%, 0.37%, 0.26%.]

Average Lifetime Improvement
[Figure: average lifetime improvement, normalized to no refresh (y-axis, 0 to 50), for Base (no-refresh), Remapping-based FCR, Hybrid FCR, and Adaptive FCR. Data labels: 46x, 31x, 9.7x.]
Individual Workloads: Remapping-Based FCR
[Figure: lifetime in days (log scale, 1.E+02 to 1.E+07) for six workloads, each evaluated with 512b-, 1k-, 2k-, 4k-, 8k-, and 32k-BCH codes. Workloads: IO-ZONE (20 P/E per day), CELLO99 (5.5 P/E per day), POSTMARK (2.8 P/E per day), OLTP (0.14 P/E per day), MSR-Cambridge (0.005 P/E per day), WEB-SEARCH (0.001 P/E per day). Series: Baseline (no refresh) and refresh intervals of 1 year, 3 months, 3 weeks, 3 days, 1 day.]
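To give a feel for the lifetime axis, the following minimal sketch converts a P/E budget into days for the per-workload write rates listed on this slide. It assumes a fixed endurance of 3k P/E cycles (the figure quoted in the executive summary) and a constant write rate; it does not model ECC strength, refresh overhead, or retention errors, so it is not the authors' evaluation method. The names `ASSUMED_ENDURANCE_PE` and `lifetime_days` are hypothetical.

```python
# Illustrative arithmetic only: lifetime in days if a block tolerates a fixed
# number of P/E cycles and the workload consumes them at a constant daily rate.
ASSUMED_ENDURANCE_PE = 3_000  # assumption: the 3k P/E figure from the executive summary

# P/E cycles per day, as listed on the slide.
WORKLOAD_PE_PER_DAY = {
    "IO-ZONE": 20,
    "CELLO99": 5.5,
    "POSTMARK": 2.8,
    "OLTP": 0.14,
    "MSR-Cambridge": 0.005,
    "WEB-SEARCH": 0.001,
}


def lifetime_days(endurance_pe, pe_per_day):
    """Days until the P/E budget is exhausted at a constant write rate."""
    if pe_per_day <= 0:
        raise ValueError("pe_per_day must be positive")
    return endurance_pe / pe_per_day


if __name__ == "__main__":
    for name, rate in WORKLOAD_PE_PER_DAY.items():
        days = lifetime_days(ASSUMED_ENDURANCE_PE, rate)
        print(f"{name:14s} {days:12.0f} days")
```

Under these assumptions the write-intensive workloads exhaust the budget in hundreds of days while the read-intensive ones last millions of days, which is why the plot needs a log scale.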
Individual Workloads: Hybrid FCR
[Figure: lifetime in days (log scale, 1.E+02 to 1.E+07) for IO-ZONE, CELLO99, POSTMARK, OLTP, MSR-Cambridge, and WEB-SEARCH, each at refresh intervals of 1 year, 3 months, 3 weeks, 3 days, and 1 day. Series: Base (no refresh), remapping-based FCR, hybrid FCR.]
Individual Workloads: Adaptive-Rate FCR
[Figure: lifetime in days (log scale, 1.E+02 to 1.E+08) for IO-ZONE, CELLO99, POSTMARK, OLTP, MSR-Cambridge, and WEB-SEARCH, each evaluated with 512b-, 1k-, 2k-, 4k-, 8k-, and 32k-BCH codes. Series: Base (no refresh), remapping-based FCR, hybrid FCR, adaptive-rate FCR.]
P/E Cycle Overhead
- P/E cycle overhead of hybrid FCR is lower than that of remapping-based FCR
- P/E cycle overhead for write-intensive applications is low
  - Remapping-based FCR (20%), Hybrid FCR (2%)
- Read-intensive applications have higher P/E cycle overhead
[Figure: ratio of additional erase operations with FCR over all erases without FCR (log scale, 1.0E-5 to 1.0E+4) for IO-ZONE, CELLO99, POSTMARK, OLTP, MSR-Cambridge, and WEB-SEARCH. Panels: remapping-based FCR, hybrid FCR. Series: refresh intervals of 1 year, 3 months, 3 weeks, 3 days, 1 day.]
Motivation for Refresh: A Different Way
- NAND flash endurance can be increased via
  - Stronger error correction codes (4x)
  - Trading off the guaranteed storage time of a single write for higher endurance (> 50x)
[Figure annotations: acceptable raw BER for 512b-BCH; acceptable raw BER for 32k-BCH; 50x higher endurance (relax required storage time); 4x higher endurance (stronger ECC); enterprise servers need > 50k P/E cycles.]
FTL Implementation
- FCR can be implemented as just a module in FTL software
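As a rough illustration of what such a module might look like, here is a minimal sketch of a time-driven refresh pass: when the refresh interval has elapsed, each page is read, corrected if its error count is within the ECC limit, and rewritten. The class `FcrModule`, the `Page` fields, and the `tick` entry point are all hypothetical and do not correspond to any real FTL interface; the choice between in-place reprogramming and remapping is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    raw_bit_errors: int = 0   # retention errors accumulated since the last program
    extra_programs: int = 0   # rewrites caused by refresh


class FcrModule:
    """Toy refresh scheduler; every name here is hypothetical."""

    def __init__(self, pages, interval_days, ecc_limit):
        self.pages = pages
        self.interval_days = interval_days
        self.ecc_limit = ecc_limit          # max correctable bit errors per page
        self.last_refresh_day = 0.0

    def tick(self, today):
        """Called periodically by the FTL; returns the number of unrecoverable pages."""
        if today - self.last_refresh_day < self.interval_days:
            return 0                        # refresh not due yet
        self.last_refresh_day = today
        lost = 0
        for page in self.pages:
            if page.raw_bit_errors > self.ecc_limit:
                lost += 1                   # beyond ECC capability: cannot be corrected
                continue
            page.raw_bit_errors = 0         # read + ECC correct + rewrite clears the errors
            page.extra_programs += 1
        return lost
```

The point of the sketch is that a page must be refreshed before its error count exceeds what the ECC can correct; a page that has already crossed that limit is counted as lost rather than repaired.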
Flash Cells Can Be Reprogrammed In-Place
- Observations:
  - Retention errors occur due to loss of charge
  - Simply recharging the cells can correct the retention errors
  - Flash programming mechanisms can accomplish this recharging
- ISPP (Incremental Step Pulse Programming)
  - Iterative programming mechanism that increases the voltage level of a flash cell step by step
  - After each step, the voltage level is compared to the desired voltage threshold
  - Can inject more electrons but cannot remove electrons
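The ISPP loop described above can be sketched as a simple pulse-and-verify iteration. This is a toy model under stated assumptions: voltages are unitless numbers, every pulse adds a fixed `step`, and the function name `ispp_program` and its parameters are hypothetical rather than taken from any device datasheet.

```python
def ispp_program(cell_voltage, target_voltage, step=0.25, max_pulses=100):
    """Raise cell_voltage in fixed steps until it reaches target_voltage.

    Returns (final_voltage, pulses_used). The voltage can only increase,
    mirroring the fact that ISPP injects electrons but cannot remove them.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    pulses = 0
    # Verify before each pulse: stop as soon as the target is met.
    while cell_voltage < target_voltage and pulses < max_pulses:
        cell_voltage += step   # one programming pulse injects more charge
        pulses += 1
    return cell_voltage, pulses


if __name__ == "__main__":
    # A cell that leaked charge is brought back up to its target level.
    print(ispp_program(1.0, 2.0))
    # A cell already above its target is left unchanged: no pulse can lower it.
    print(ispp_program(3.0, 2.0))
```

The second call shows the limitation noted in the last bullet: reprogramming can restore lost charge, but it cannot pull down a cell whose voltage is already too high.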