

Source: pages.cs.wisc.edu › ~kadav › new2 › pdfs › eurosys2010-mahesh.pdf

Differential RAID: Rethinking RAID for SSD Reliability

Mahesh Balakrishnan, Asim Kadav¹, Vijayan Prabhakaran, Dahlia Malkhi

Microsoft Research Silicon Valley
¹The University of Wisconsin-Madison


Solid State Storage

• Flash storage is now mainstream

• Commodity SSDs
  – Thousands of IOPS
  – Low power consumption

• How do we protect data on SSDs?
  – Device-level redundancy: RAID levels


Primer on SSDs

• Smallest unit of read/write is a page (e.g., 4 KB)
• Pages must be erased before they are overwritten
  – More writes → more erasures
• MTTF, Bit Error Rate (BER)
  – More erasures → higher BER
  – More writes → higher BER


The BER Curve

Can we use RAID to protect data on SSDs?


RAID-5 and SSDs

[Figure: four drives (Drive 1 to Drive 4), each holding mostly data blocks, with parity blocks rotated evenly across the drives]

• In RAID-5, parity blocks are evenly distributed across all drives
• Due to random writes, parity blocks are updated more often than data blocks
• In RAID-5, all drives age at the same rate
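The observations on this slide can be checked with a small simulation. This is a minimal sketch, not the authors' simulator: it assumes a simple rotating layout (`stripe % n_drives` picks the parity drive) and uniformly random single-block writes, each of which updates one data block and that stripe's parity block. The function names are hypothetical.

```python
import random

def raid5_parity_drive(stripe, n_drives):
    """Drive holding parity for a stripe under simple rotation (assumed layout)."""
    return stripe % n_drives

def simulate_random_writes(n_drives=4, n_stripes=1000, n_writes=200_000, seed=0):
    """Count block writes per drive for uniformly random single-block writes."""
    rng = random.Random(seed)
    writes = [0] * n_drives
    for _ in range(n_writes):
        stripe = rng.randrange(n_stripes)
        parity = raid5_parity_drive(stripe, n_drives)
        data_drives = [d for d in range(n_drives) if d != parity]
        writes[rng.choice(data_drives)] += 1  # the data block being overwritten
        writes[parity] += 1                   # the stripe's parity block is updated too
    return writes

if __name__ == "__main__":
    counts = simulate_random_writes()
    total = sum(counts)
    # Share of all block writes absorbed by each drive.
    print([round(c / total, 3) for c in counts])
```

Within a stripe the parity block is rewritten on every write while any given data block is rewritten only some of the time, yet because parity rotates evenly, each drive ends up with about the same share of the total writes.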


Correlated Failures in RAID-5

[Figure: three bar charts of used erasures (0% to 100%) for Drive 1 to Drive 4, labeled with a parity distribution whose values were not preserved in this transcript]

Conventional RAID can induce correlated failures with SSDs


RAID-5 Reliability

Data Loss Probability: the probability that, when a drive fails, its contents cannot be completely reconstructed.

Drives are replaced when they reach the erasure limit simultaneously
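One simple way to turn this definition into a number, offered as a sketch and not as the model used in the talk: assume every bit on the surviving drives must be read to rebuild the failed drive, and that each bit is independently unreadable with that drive's uncorrectable bit error rate. Reconstruction then fails if any single bit is lost. The function name, the independence assumption, and the sample rates are all illustrative assumptions.

```python
import math

def data_loss_probability(uncorrectable_bers, bits_per_drive):
    """P(at least one unreadable bit on any surviving drive during a rebuild).

    uncorrectable_bers: one post-ECC bit error rate per surviving drive,
    assumed independent across bits and drives (an assumption of this sketch).
    """
    # Sum of log(1 - u) over every bit read; log1p keeps precision for tiny rates.
    log_success = sum(bits_per_drive * math.log1p(-u) for u in uncorrectable_bers)
    return -math.expm1(log_success)  # equals 1 - exp(log_success)

if __name__ == "__main__":
    bits = 80 * 10**9 * 8  # an 80 GB drive, chosen only for illustration
    for rate in (1e-14, 1e-13, 1e-12):
        print(rate, data_loss_probability([rate] * 3, bits))
```

In this model, the data loss probability climbs quickly once the surviving drives are old enough for their error rates to rise, which is why drives that age together are a problem.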


Solution: Differential RAID

• Goal: Age SSDs at different rates

• Uneven Parity Assignment
  – Example: (70, 10, 10, 10): 70% of parity on Device 1
  – Any possible configuration between RAID-4 and RAID-5

• Drive Replacement
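A parity assignment such as (70, 10, 10, 10) can be realized with a repeating table that maps each stripe to the device holding its parity. The sketch below is one hypothetical way to build such a table; the slides do not specify the actual mapping, and `parity_table` and `parity_device` are names introduced here. Devices are numbered from 0, so Device 1 on the slide is index 0.

```python
from functools import reduce
from math import gcd

def parity_table(percentages):
    """Shortest repeating table naming the parity device for each stripe."""
    if sum(percentages) != 100:
        raise ValueError("parity percentages must sum to 100")
    step = reduce(gcd, percentages)
    table = []
    for device, pct in enumerate(percentages):
        table.extend([device] * (pct // step))
    return table

def parity_device(stripe, table):
    """Device that stores parity for the given stripe number."""
    return table[stripe % len(table)]

if __name__ == "__main__":
    print(parity_table([70, 10, 10, 10]))  # uneven assignment from the slide
    print(parity_table([25, 25, 25, 25]))  # even spread, RAID-5 style
    print(parity_table([100, 0, 0, 0]))    # all parity on one device, RAID-4 style
```

The two extremes show why the slide describes the design space as lying between RAID-4 and RAID-5: all parity on one device at one end, an even rotation at the other.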


Uneven Parity Distribution

[Figure: four drives with ten blocks each; Drive 1 holds seven parity blocks, while Drives 2 to 4 hold one parity block each]

Drive 1: more parity → more writes

Ages 2x faster than other drives
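The 2x figure follows from a short calculation, sketched here under the assumption of uniformly random single-block writes that each update one data block and the stripe's parity block. A device holding a fraction p of the parity in an n-device array is then written with probability p (as the parity device) plus (1 - p)/(n - 1) (as one of the n - 1 data devices). `relative_aging_rates` is a hypothetical helper, not something named in the talk.

```python
from fractions import Fraction

def relative_aging_rates(percentages):
    """Write rate of each device relative to the slowest-aging device."""
    n = len(percentages)
    rates = []
    for pct in percentages:
        p = Fraction(pct, 100)
        # Probability this device is written by one random block write.
        rates.append(p + (1 - p) / (n - 1))
    slowest = min(rates)
    return [r / slowest for r in rates]

if __name__ == "__main__":
    print([float(r) for r in relative_aging_rates([70, 10, 10, 10])])
    # prints [2.0, 1.0, 1.0, 1.0]
```

For (70, 10, 10, 10) the per-write probabilities are 0.8 for Drive 1 and 0.4 for each of the others, which gives the 2:1 ratio stated on the slide.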


Uneven Parity Distribution

[Figure: three bar charts of used erasures (0% to 100%) for Drive 1 to Drive 4 under an uneven parity distribution; the distribution values were not preserved in this transcript]

Aging SSDs at different rates helps


Drive Replacement

[Figure: a sequence of bar charts of used erasures (0% to 100%) per drive across drive replacements; some panels show Drive 1 to Drive 5 and others Drive 1 to Drive 4; the parity distribution values were not preserved in this transcript]

Naïve drive replacement can still lead to correlated failures!


Convergence

[Figure: a sequence of bar charts of used erasures (0% to 100%) for Drive 1 to Drive 4 across successive drive replacements; the parity distribution values were not preserved in this transcript]

Age distribution provably converges for any parity assignment!
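The convergence claim can be explored numerically with a toy model. The slides do not spell out the replacement policies, so both policies below are assumptions made for illustration: `"fixed"` leaves each parity share with its slot when a drive is swapped, and `"oldest_first"` hands the largest share to the oldest remaining drive after every replacement. Ages are percentages of the erase limit, and writes are assumed to be uniformly random single-block writes.

```python
LIMIT = 100.0  # age, as a percentage of the erase limit, at which a drive is replaced

def aging_rates(shares):
    """Per-device write probability for one random block write."""
    n = len(shares)
    return [s / 100 + (1 - s / 100) / (n - 1) for s in shares]

def simulate_replacements(shares, policy, n_rounds):
    """Return the drive ages (oldest first) observed at each replacement round."""
    if policy not in ("fixed", "oldest_first"):
        raise ValueError("unknown policy: " + policy)
    ages = [0.0] * len(shares)
    assignment = list(shares)
    history = []
    for _ in range(n_rounds):
        rates = aging_rates(assignment)
        # Advance time until the first drive reaches its erase limit.
        dt = min((LIMIT - a) / r for a, r in zip(ages, rates))
        ages = [a + r * dt for a, r in zip(ages, rates)]
        history.append(sorted(ages, reverse=True))
        # Swap every drive that has reached the limit for a fresh one.
        ages = [0.0 if a >= LIMIT - 1e-9 else a for a in ages]
        if policy == "oldest_first":
            by_age = sorted(range(len(ages)), key=lambda i: ages[i], reverse=True)
            for share, device in zip(sorted(shares, reverse=True), by_age):
                assignment[device] = share
    return history

if __name__ == "__main__":
    naive = simulate_replacements([70, 10, 10, 10], "fixed", 2)
    shifted = simulate_replacements([70, 10, 10, 10], "oldest_first", 200)
    print([round(a) for a in naive[1]])
    print([round(a, 1) for a in shifted[-1]])
```

With these assumptions, the fixed policy brings all four drives to their erase limit together by the second round, while the oldest-first policy settles into staggered ages of roughly 100%, 60%, 40%, and 20% at each replacement.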


Evaluation

• Simulation for reliability
  – Real BER data for 12 chips from two studies¹
  – 5-10K erase cycles
  – Assumed 4-bit ECC per 512-byte sector
  – Metric: Data Loss Probability (DLP)

• Implementation for performance

¹ N. Mielke et al., Bit Error Rate in NAND Flash Memories. International Reliability Physics Symposium, 2008.
L. M. Grupp et al., Characterizing Flash Memory: Anomalies, Observations, and Applications. Micro 2009.
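To see how a raw bit error rate and an ECC strength combine into a sector failure probability, here is a minimal sketch. It assumes independent bit errors and treats a 512-byte sector as 4096 bits, ignoring the extra ECC bits stored alongside it; a sector counts as uncorrectable when it contains more than `ecc_bits` errors. This is a textbook binomial-tail model and not necessarily the method used in the simulator.

```python
import math

def uncorrectable_sector_probability(raw_ber, ecc_bits=4, sector_bits=512 * 8):
    """P(more than ecc_bits errors among sector_bits independent bits)."""
    if raw_ber <= 0.0:
        return 0.0
    if raw_ber >= 1.0:
        return 1.0 if sector_bits > ecc_bits else 0.0
    n = sector_bits

    def log_pmf(k):
        # Log of the binomial probability of exactly k bit errors.
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(raw_ber) + (n - k) * math.log1p(-raw_ber))

    # Sum the tail directly to avoid cancellation in 1 - P(correctable).
    return math.fsum(math.exp(log_pmf(k)) for k in range(ecc_bits + 1, n + 1))

if __name__ == "__main__":
    for ber in (1e-7, 1e-6, 1e-5):
        print(ber, uncorrectable_sector_probability(ber))
```

Because the tail probability scales roughly with the fifth power of the raw BER under 4-bit ECC, modest increases in BER from wear translate into large increases in uncorrectable sectors.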


Evaluation

• Simulation for reliability
• Implementation for performance
  – 5 x Intel X25-M MLC SSDs
    • 80 GB each
    • Random write: 3.3K IOPS, Random read: 35K IOPS
    • Sequential write: 70 MB/s, Sequential read: 250 MB/s


Diff-RAID Reliability

[Figure: reliability of RAID-5 versus Diff-RAID across device replacements]


Diff-RAID Reliability

Diff-RAID is more reliable than RAID-5 by at least 10x for 7 of the 12 chips


Reliability vs Throughput

[Figure: reliability and throughput for parity assignments ranging from (80,5,5,5,5) to (20,20,20,20,20), which is RAID-5]

Reliability decreases as parity is less concentrated

Throughput increases as parity is less concentrated
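The throughput side of this trade-off can be illustrated with a deliberately simple bottleneck model. It is an assumption of this sketch, not the measurement behind the slide: each random write costs one block write on the parity device and one on a data device, reads are ignored, and array throughput is limited by the most heavily written device. Both function names are hypothetical.

```python
def busiest_device_load(shares):
    """Expected block writes per random write on the most heavily written device."""
    n = len(shares)
    return max(s / 100 + (1 - s / 100) / (n - 1) for s in shares)

def relative_write_throughput(shares):
    """Random-write throughput relative to an even parity spread (RAID-5)."""
    even = [100 / len(shares)] * len(shares)
    return busiest_device_load(even) / busiest_device_load(shares)

if __name__ == "__main__":
    for shares in ([20, 20, 20, 20, 20], [50, 12.5, 12.5, 12.5, 12.5], [80, 5, 5, 5, 5]):
        print(shares, round(relative_write_throughput(shares), 3))
```

In this model, concentrating parity makes one device absorb most of the write load, so the estimate falls as the largest share grows, which matches the direction of the trend on the slide.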


Conclusion

• RAID can cause correlated failures with Flash
  – Not just RAID-5; not just SSDs

• Differential RAID:
  – Key idea: Age SSDs at different rates
  – Same space overhead as RAID-5
  – Trade-off between reliability and throughput


Other Results

• Diff-RAID works on real workloads

• Diff-RAID can be used to extend SSD lifetime
  – Replace drives at 13K cycles → DLP < 0.001
  – Replace drives at 15K cycles → DLP < 0.01

• Diff-RAID can lower ECC requirements
  – Diff-RAID on 3-bit ECC == RAID-5 on 5-bit ECC


SSDs and RAID

• RAID-5 is a bad idea for Hard Disks:
  – Performance: Slow random writes
  – Cost: Storage is cheap → Just use RAID-1/10
  – Reliability: High probability of data loss in large arrays

• RAID-5 is a great idea for SSDs!
  – Performance: Fast random writes (5 SSDs = 14K/sec)
  – Cost: Storage is expensive → Can't use RAID-1/10
  – Reliability: Correlated failures! (Not just RAID-5: RAID-1, RAID-4, RAID-10, RAID-6…)

SSDs are not hard disks!


Other Benefits

Diff-RAID allows SSDs to be used past the erasure limit:
  – 30% more lifetime with DLP 0.001
  – 50% more lifetime with DLP 0.01

Diff-RAID allows SSDs to be used with less ECC:
  – Diff-RAID with 3-bit ECC is equal to RAID-5 with 5-bit ECC


Real Workloads

RAID stripe size matters: larger stripe size → more random writes

Larger stripe → more reliable

[Figure: reliability versus throughput on real workloads]


Reliability

Random device fails: Diff-RAID is at least 2x more reliable for 9/12 chips.

Oldest device fails: Diff-RAID is at least 10x more reliable for 7/12 chips.

[Figure: reliability comparison per flash chip; x-axis: Flash Chips]