data retention in mlc nand flash memory: …omutlu/pub/flash-memory-data-retention... · data...

53
Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai, Yixin Luo , Erich F. Haratsch*, Ken Mai, Onur Mutlu Carnegie Mellon University, *LSI Corporation 1

Upload: lamphuc

Post on 15-Apr-2018

234 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Data Retention in MLC NAND Flash Memory: Characterization,

Optimization, and RecoveryYu Cai, Yixin Luo, Erich F. Haratsch*,

Ken Mai, Onur Mutlu

Carnegie Mellon University, *LSI Corporation

1

Page 2: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

You Probably Know

•Many use cases:

+ High performance, low energy consumption

2

Page 3: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

NAND Flash Memory Challenges

– Requires erase before program (write)

– High raw bit error rate

3

CPUFl

ash

C

on

tro

ller

ECC Controller

Raw Flash Memory

Chips

Page 4: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Limited Flash Memory Lifetime

4

Program/Erase (P/E) Cycles (or Writes Per Cell)

Raw

bit

err

or

rate

(R

BER

)

ECC-correctable RBER

~30

00

~20

00

Goal: Extend flash memory lifetime at low cost

P/E Cycle Lifetime

Page 5: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Retention Loss

5

Charge leakage over time

One dominant source of flash memory errors [DATE ‘12, ICCD ‘12]

10 0

Retention errorFlash cell

Page 6: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

NAND Flash 101

6

Before I show youhow we extend flash lifetime …

Page 7: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Threshold Voltage (Vth)

7

Normalized Vth

01

Flash cell Flash cell

Page 8: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Threshold Voltage (Vth) Distribution

8

Normalized Vth

01

Probability Density Function (PDF)

Page 9: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Read Reference Voltage (Vref)

9

Normalized Vth

PDF

01

Vref

Page 10: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Multi-Level Cell (MLC)

10

Normalized Vth

Erased(11)

P1(10)

P2(00)

P3(01)

PDFER

-P1

Vre

f

P1

-P2

Vre

f

P2

-P3

Vre

f

Page 11: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

11

Normalized Vth

PDFP1

(10)P2

(00)P3

(01)

Before retention loss:After some retention loss:Threshold Voltage Reduces Over Time

Page 12: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Fixed Read Reference Voltage Becomes Suboptimal

12

Normalized Vth

P1-P

2 V re

f

P2-P

3 V re

fNormalized Vth

PDFP1

(10)P2

(00)P3

(01)

Raw bit errors

Before retention loss:After some retention loss:

Page 13: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Optimal Read Reference Voltage (OPT)

13

Normalized Vth

PDFP1

(10)P2

(00)P3

(01)

P1-P

2 V re

f

P2-P

3 V re

f

P1-P

2 O

PT

P2-P

3 O

PT

Minimal raw bit errors

After some retention loss:

Page 14: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

14

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Page 15: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Correctable errors

Retention Failure

15

Normalized Vth

PDFP1

(10)P2

(00)P3

(01)

P1-P

2 V re

f

P2-P

3 V re

fUncorrectable errors

After some retention loss:After significant retention loss:

Page 16: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

16

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

Page 17: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

17

To understand the effects of retention loss:- Characterize retention loss using real chips

Page 18: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

18

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Page 19: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Characterization Methodology

19FPGA-based flash memory testing platform [Cai+,FCCM ‘11]

Page 20: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Characterization Methodology

•FPGA-based flash memory testing platform

•Real 20- to 24-nm MLC NAND flash chips

•0- to 40-day worth of retention loss

•Room temperature (20⁰C)

•0 to 50k P/E Cycles

20

Page 21: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

21

Characterize the effects of retention loss

1. Threshold Voltage Distribution

2. Optimal Read Reference Voltage

3. RBER and P/E Cycle Lifetime

Page 22: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

1. Threshold Voltage (Vth) Distribution

22

Normalized Vth

PD

F

P1 P2 P3

Page 23: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

1. Threshold Voltage (Vth) Distribution

23

Finding: Cell’s threshold voltage decreases over time

P1 P2 P3

0-day

40-day

0-day40-day

Page 24: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

2. Optimal Read Reference Voltage (OPT)

24

P1 P2 P3

Finding: OPT decreases over time

0-day OPT

40-day OPT

0-day OPT

40-day OPT

Page 25: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

3. RBER and P/E Cycle Lifetime

25

P/E Cycles

RB

ER

Page 26: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Actual OPT

Reading data with 7-day worth of retention loss.

3. RBER and P/E Cycle Lifetime

26

ECC-correctable RBER

Finding: Using actual OPT achieves the longest lifetime

Vref closer to actual OPT

No

min

al

Life

tim

e

Exte

nd

ed

Life

tim

e

Page 27: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Characterization Summary

Due to retention loss‐ Cell’s threshold voltage (Vth) decreases over time‐ Optimal read reference voltage (OPT) decreases over time

Using the actual OPT for reading‐ Achieves the longest lifetime

27

Page 28: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

28

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Page 29: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Naïve Solution: Sweeping Vref

Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC

Finds the optimal read reference voltage

Requires many read-retries higher read latency

29

Page 30: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Comparison of Flash Read Techniques

30

Flash Read Techniques

Lifetime(P/E Cycle)

Performance(Read Latency)

Fixed Vref

SweepingVref

Our Goal

Page 31: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

1. The optimal read reference voltage gradually decreases over time

Key idea: Record the old OPT as a prediction (Vpred) of the actual OPT

Benefit: Close to actual OPT Fewer read retries

2. The amount of retention loss is similar across pages within a flash block

Key idea: Record only one Vpred for each block

Benefit: Small storage overhead (768KB out of 512GB)

Observations

31

Page 32: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Retention Optimized Reading (ROR)

Components:

1. Online pre-optimization algorithm‐ Periodically records a Vpred for each block

2. Improved read-retry technique‐ Utilizes the recorded Vpred to minimize read-retry count

32

Page 33: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

1. Online Pre-Optimization Algorithm•Triggered periodically (e.g., per day)

•Find and record an OPT as per-block Vpred

•Performed in background

•Small storage overhead

33

Normalized Vth

PDFNew Vpred

Old Vpred

Page 34: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

2. Improved Read-Retry Technique

•Performed as normal read

•Vpred already close to actual OPT

•Decrease Vref if Vpred fails, and retry

34

Normalized Vth

PDF OPT Vpred

Very close

Page 35: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Retention Optimized Reading: Summary

35

Flash Read Techniques

Lifetime(P/E Cycle)

Performance(Read Latency)

Fixed Vref

Sweeping Vref

64% ↑

ROR 64% ↑ _____Nom. Life: 2.4% ↓

Ext. Life: 70.4% ↓

Page 36: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

36

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Page 37: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Correctable errors

Retention Failure

37

Normalized Vth

PDFP1

(10)P2

(00)P3

(01)

P1-P

2 V re

f

P2-P

3 V re

fUncorrectable errors

After some retention loss:After significant retention loss:

Page 38: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Leakage Speed Variation

38

Normalized Vth

PDF

S

F

low-leaking cell

ast-leaking cell

S

F

Page 39: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Initially, Right After Programming

39

Normalized Vth

PDF

S

F

SF

S

F

SF

P2 P3

Page 40: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

P2 P3

F

F

F

F

After Some Retention Loss

40

Normalized Vth

PDF

S

F

SF

S

F

SF

Fast-leaking cells have lower Vth

Slow-leaking cells have higher Vth

Page 41: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Eventually: Retention Failure

41

Normalized Vth

PDF

S

F

SF

S

F

SF

OPTP2 P3

Page 42: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Retention Failure Recovery (RFR)

Key idea: Guess original state of the cell from its leakage speed property

Three steps

1. Identify risky cells

2. Identify fast-/slow-leaking cells

3. Guess original states

42

Page 43: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

1. Identify Risky Cells

43

Normalized Vth

PDF

S

S

F

F

OP

T+σ

OP

T

OP

T–σ

Risky cells

P2

P3

+ S =

+ F =

Key Formula

Page 44: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

2. Identifying Fast- vs. Slow-Leaking Cells

44

Normalized Vth

PDF

OP

T+σ

OP

T

OP

T–σ

Risky cells

P2

P3

+ S =

+ F =

Key Formula

?

?

?

?

?

?

Page 45: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

2. Identifying Fast- vs. Slow-Leaking Cells

45

Normalized Vth

PDF

OP

T+σ

OP

T

OP

T–σ

Risky cells

P2

P3

+ S =

+ F =

Key Formula

?

?

?

?

S F

F S

?

?

Page 46: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

3. Guess Original States

46

Normalized Vth

PDF

S F

F S

Risky cells

P2

P3

+ S =

+ F =

Key Formula

Page 47: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

RFR Evaluation

•Expect to eliminate 50% of raw bit errors

•ECC can correct remaining errors

47

Program with random data

Detect failure, backup data

Recover data

28 days

12 addt’l. days

Page 48: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

48

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Page 49: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

ConclusionProblem: Retention loss reduces flash lifetime

Overall Goal: Extend flash lifetime at low cost

Flash Characterization: Developed an understanding of the effects of retention loss in real chips

Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage

‐ 64% lifetime ↑, 70.4% read latency ↓

Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors

‐ Raw bit error rate 50% ↓, reduces data loss49

Page 50: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Data Retention in MLC NAND Flash Memory: Characterization,

Optimization, and RecoveryYu Cai, Yixin Luo, Erich F. Haratsch*,

Ken Mai, Onur Mutlu

Carnegie Mellon University, *LSI Corporation

50

Page 51: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

Backup Slides

51

Page 52: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

RFR Motivation

Data loss can happen in many ways

1. High P/E cycle

2. High temperature accelerates retention loss

3. High retention age (lost power for a long time)

52

Page 53: Data Retention in MLC NAND Flash Memory: …omutlu/pub/flash-memory-data-retention... · Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai,

What if there are other errors?

Key: RFR does not have to correct all errors

Example:

•ECC can correct 40 errors in a page

•Corrupted page has 20 retention errors, 25 other errors (45 total errors)

•After RFR: 10 retention errors, 30 other errors (40 total errors ECC correctable)

53