cisc 3320 mass storage: reliability and efficiency · •snapshot is a view of file system before a...

25
CISC 3320 Mass Storage: Reliability and Efficiency Hui Chen Department of Computer & Information Science CUNY Brooklyn College 12/4/2019 1 CUNY | Brooklyn College

Upload: others

Post on 08-May-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

CISC 3320

Mass Storage: Reliability

and EfficiencyHui Chen

Department of Computer & Information Science

CUNY Brooklyn College

12/4/2019 1CUNY | Brooklyn College

Page 2: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Acknowledgement

• These slides are a revision of the slides

provided by the authors of the textbook

via the publisher of the textbook

12/4/2019 CUNY | Brooklyn College 2

Page 3: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Memory Hierarchy

12/4/2019 CUNY | Brooklyn College 3

Secondary Memory (e.g., Magnetic Disk)

Main Memory

Cache

Registers

~10ms

~10ns

~2ns

~1ns

~TB

~GB

~MB

<KB

Tertiary Memory (e.g., Magnetic Tapes)

Page 4: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Outline

• Reliable and Efficiency

• Redundancy and parallelism

• RAID Structure

12/4/2019 CUNY | Brooklyn College 4

Page 5: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Mass Storage: Design Goals

• Function

• They need to work, read & write

• Reliable

• Murphy’s law

• "Anything that can go wrong will go wrong“

• How do we make it appearing reliable?

• Efficient

• I/O Efficiency (performance)

• We need it to be “fast” (read & write)

12/4/2019 CUNY | Brooklyn College 5

Page 6: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Disk Failures and Data Loss

• Mean time between failures (MTBF)

• The statistical mean time that a device is expected to work correctly before failing, seean example.

• Mean time to repair

• Exposure time when another failure could cause data loss

• Mean time to data loss based on above factors (Why? See discussion)

12/4/2019 CUNY | Brooklyn College 6

Page 7: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Redundancy for Reliability

• Mirroring: duplicate disk drive

• Example: two physical disk drives are presented as a logical disk drive (mirrored volume)

• Disk failure

• Failure of one physical disk does not result in data loss when the failed physical disk is replaced in time

• Data loss

• Failure of two physical disk drives (at the same time, or before replacement of the first failed disk)

• Redundancy can reduce chances of data loss (greatly)

12/4/2019 CUNY | Brooklyn College 7

Page 8: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Redundancy and Data Loss:

Factors• Mean time between failure (MTBF) of a single disk drive and many disk drives

• Example

• MTBF of a single disk drive: 1,000,000 hours

• 100 disk drives: 1,000,000/100 = 10,000 hours = 416.7 days

• Mean time to repair (MTTR): time required to replace the failure disk drive

• Mean time to data loss: time required to have a data loss (the second disk also failed before the failed one is repaired)

12/4/2019 CUNY | Brooklyn College 8

Page 9: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Redundancy and Data Loss:

Example• For two-disk mirroring case (Disk A and Disk B)

• MTBF = 1,000,000 hours

• MTTR = 10 hours

• Data loss

• Disk A failed first, and then disk B failed

• Disk B failed first, and them Disk A failed

• Mean time to data loss: failure of the mirrored volume (the 2nd disk drive also failed before the 1st could be replaced) : 1,000,0002 / (2 x 10) =5 x 1010 hours = 5.7 x 106 years!

12/4/2019 CUNY | Brooklyn College 9

Page 10: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Practical Consideration

• Disk failures are not independent

• Example

• Large number of disk drive failures can be the result of a power failure and a tremor of a minor earthquake

• Manufacturing defects in a batch of disk drives can lead to correlated failures

• The probability of failure grows as disk drives age

• Mean time to data loss is smaller

12/4/2019 CUNY | Brooklyn College 10

Page 11: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Parallelism for Efficiency

• Observation

• Duplicating disk drives doubles the rate at which read requests can be handled

• Data stripping

• Splitting data across multiple disk drives

• Bit-level stripping

• Splitting the bits of each byte across multiple disk drives

• Example: using an array of 8 disks (or a factor of 8 or a multiple of 8)

• Block-level stripping

• Splitting blocks of a file across multiple disk drives

• Benefits

• Increase the throughput of multiple small accesses (page accesses)

• Reduce the response time of large accesses

12/4/2019 CUNY | Brooklyn College 11

Page 12: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

RAID Structure

• RAID – redundant array of inexpensive disks

• multiple disk drives provides reliability via redundancy

• Increases the mean time to data loss

• Frequently combined with NVRAM to improve write performance

• Several improvements in disk-use techniques involve the use of multiple disks working cooperatively

12/4/2019 CUNY | Brooklyn College 12

Page 13: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

RAID Levels

• Disk striping uses a group of disks as one storage unit

• RAID is arranged into six different levels

• RAID schemes improve performance and improve the reliability of the storage system by storing redundant data

• Mirroring or shadowing (RAID 1) keeps duplicate of each disk

• Striped mirrors (RAID 1+0) or mirrored stripes (RAID 0+1) provides high performance and high reliability

• Block interleaved parity (RAID 4, 5, 6) uses much less redundancy

• RAID within a storage array can still fail if the array fails, so automatic replication of the data between arrays is common

• Frequently, a small number of hot-spare disks are left unallocated, automatically replacing a failed disk and having data rebuilt onto them

12/4/2019 CUNY | Brooklyn College 13

Page 14: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

12/4/2019 CUNY | Brooklyn College 14

Page 15: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

RAID Level 0 - 3

• Level 0 (not a true RAID): Stripping only

• Level 1: Mirroring only

• Level 2: Memory-style error-correction-code (ECC) organization

• Example

• Split a byte into 4-bit nibbles

• Add Hamming code (3 bits) to each

• One bit per drive onto 7 disk drives

• Level 3: Bit-interleaved parity organization

• Compute a parity bit

• Driver detects error, a parity bit is sufficient to correct error (unlike memory)

• Backup and parity drives are shown shaded.

12/4/2019 CUNY | Brooklyn College 15

Page 16: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

RAID Level 4 - 6

• Level 4: block-interleaved parity organization

• Stripping

• Compute parity for blocks

• One disk for parities

• Level 5: block-interleaved distributed parity

• Stripping

• Compute parity for blocks

• Parities are distributed

• Level 6: P+Q redundancy scheme

• Extra redundant information to guard multiple disk failures

• Backup and parity drives are shown shaded.

12/4/2019 CUNY | Brooklyn College 16

Page 17: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

RAID 0+1 and 1+0

• Combination of RAID levels 0 and 1

• RAID 0+1: stripping and then mirroring

• RAID 1+0: mirroring and then stripping

12/4/2019 CUNY | Brooklyn College 17

Page 18: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Selecting RAID Level

• RAID 0: high performance, data loss is not critical

• RAID 1: high reliability with fast recovery

• RAID 0+1 & 1+0: both performance and reliability

• Expense: 2 for 1

• RAID 5:

• Often preferred for large volumes of data

• Required number of disks?

12/4/2019 CUNY | Brooklyn College 18

Page 19: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Other Features

• Regardless of where RAID implemented, other useful features can be added

• Snapshot is a view of file system before a set of changes take place (i.e. at a point in time)

• More in Ch 12

• Replication is automatic duplication of writes between separate sites

• For redundancy and disaster recovery

• Can be synchronous or asynchronous

• Hot spare disk is unused, automatically used by RAID production if a disk fails to replace the failed disk and rebuild the RAID set if possible

• Decreases mean time to repair

12/4/2019 CUNY | Brooklyn College 19

Page 20: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Questions

• Reliability and performance

• Performance via parallelism

• Reliability via redundancy

• RAID

• Which level to use? How many disks?

12/4/2019 CUNY | Brooklyn College 20

Page 21: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Limitation of RAID

• RAID alone does not prevent or detect data

corruption or other errors, just disk failures

• RAID is not flexible

• Present a disk array as a volume

• What if a file system is small, or large, or change over

time?

12/4/2019 CUNY | Brooklyn College 21

volume volume volume

FS FS FS

Page 22: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Extensions: Solaris ZFS

• Solaris ZFS adds checksums of all data and metadata

• Checksums kept with pointer to object, to detect if object is the right one and whether it changed

• Can detect and correct data and metadata corruption

• ZFS also removes volumes, partitions

• Disks allocated in pools

• Filesystems with a pool share that pool, use and release space like malloc() and free() memory allocate / release calls

12/4/2019 CUNY | Brooklyn College 22

Page 23: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

12/4/2019 CUNY | Brooklyn College 23

ZFS checksums all metadata and data

Page 24: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Traditional and Pooled

Storage

12/4/2019 CUNY | Brooklyn College 24

Page 25: CISC 3320 Mass Storage: Reliability and Efficiency · •Snapshot is a view of file system before a set of changes take place (i.e. at a point in time) •More in Ch 12 •Replication

Questions?

• RAID?

• When to use different levels?

• Limitations?

• ZFS?

• Why (vs RAID)?

12/4/2019 CUNY | Brooklyn College 25