An Introduction to RAID



    An Introduction to RAID

    The Need for RAID

    Data Striping & Redundancy

    Different Types of RAID

    Cost & Performance Issues

    Reliability Issues in RAID

    Implementations

    Problems

RAID (redundant array of independent disks, originally redundant array of inexpensive disks[1][2]) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called "RAID levels", depending on what level of redundancy and performance (via parallel communication) is required.

RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.[3] Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks as a means of dissociating a low-cost expectation from RAID technology.[4]

RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be in a RAID array,[5] which is accessed by the operating system as one single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between two key goals: increased data reliability and increased input/output performance.


There are a number of different RAID levels:

Level 0 -- Striped Disk Array without Fault Tolerance: Provides data striping (spreading out blocks of each file across multiple disk drives) but no redundancy. This improves performance but does not deliver fault tolerance. If one drive fails then all data in the array is lost.

Level 1 -- Mirroring and Duplexing: Provides disk mirroring. Level 1 provides twice the read transaction rate of single disks and the same write transaction rate as single disks.

Level 2 -- Error-Correcting Coding: Not a typical implementation and rarely used, Level 2 stripes data at the bit level rather than the block level.

Level 3 -- Bit-Interleaved Parity: Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service simultaneous multiple requests, is also rarely used.

Level 4 -- Dedicated Parity Drive: A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks.

Level 5 -- Block-Interleaved Distributed Parity: Provides block-level data striping with parity information distributed across all disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.

Level 6 -- Independent Data Disks with Double Parity: Provides block-level striping with two independent sets of parity data distributed across all disks.

Level 0+1 -- A Mirror of Stripes: Not one of the original RAID levels, two RAID 0 stripes are created, and a RAID 1 mirror is created over them. Used for both replicating and sharing data among disks.

Level 10 -- A Stripe of Mirrors: Not one of the original RAID levels, multiple RAID 1 mirrors are created, and a RAID 0 stripe is created over these.

Level 7: A trademark of Storage Computer Corporation that adds caching to Levels 3 or 4.

RAID S: (also called Parity RAID) EMC Corporation's proprietary striped-parity RAID system used in its Symmetrix storage systems.

New RAID classification

In 1996, the RAID Advisory Board introduced an improved classification of RAID systems. It divides RAID into three types:

Failure-resistant (systems that protect against loss of data due to drive failure).

Failure-tolerant (systems that protect against loss of data access due to failure of any single component).

Disaster-tolerant (systems that consist of two or more independent zones, either of which provides access to stored data).

The original "Berkeley" RAID classifications are still kept as an important historical reference point and also to recognize that RAID levels 0-6 successfully define all known data mapping and protection schemes for disk-based storage systems. Unfortunately, the original classification


caused some confusion due to the assumption that higher RAID levels imply higher redundancy and performance; this confusion has been exploited by RAID system manufacturers, and it has given birth to products with such names as RAID-7, RAID-10, RAID-30, RAID-S, etc. Consequently, the new classification describes the data availability characteristics of a RAID system, leaving the details of its implementation to system manufacturers.

Failure-resistant disk systems (FRDS) (meets a minimum of criteria 1-6)

1. Protection against data loss and loss of access to data due to drive failure
2. Reconstruction of failed drive content to a replacement drive
3. Protection against data loss due to a "write hole"
4. Protection against data loss due to host and host I/O bus failure
5. Protection against data loss due to replaceable unit failure
6. Replaceable unit monitoring and failure indication

Failure-tolerant disk systems (FTDS) (meets a minimum of criteria 1-15)

7. Disk automatic swap and hot swap
8. Protection against data loss due to cache failure
9. Protection against data loss due to external power failure
10. Protection against data loss due to a temperature out of operating range
11. Replaceable unit and environmental failure warning
12. Protection against loss of access to data due to device channel failure
13. Protection against loss of access to data due to controller module failure
14. Protection against loss of access to data due to cache failure
15. Protection against loss of access to data due to power supply failure

Disaster-tolerant disk systems (DTDS) (meets a minimum of criteria 1-21)

16. Protection against loss of access to data due to host and host I/O bus failure
17. Protection against loss of access to data due to external power failure
18. Protection against loss of access to data due to component replacement
19. Protection against loss of data and loss of access to data due to multiple drive failures
20. Protection against loss of access to data due to zone failure
21. Long-distance protection against loss of data due to zone failure


NEED

The need for RAID can be summarized in two points given below. The two keywords are Redundant and Array.

An array of multiple disks accessed in parallel will give greater throughput than a single disk.

Redundant data on multiple disks provides fault tolerance.

Provided that the RAID hardware and software perform true parallel accesses on multiple drives, there will be a performance improvement over a single disk.

With a single hard disk, you cannot protect yourself against the costs of a disk failure: the time required to obtain and install a replacement disk, reinstall the operating system, restore files from backup tapes, and repeat all the data entry performed since the last backup was made.

With multiple disks and a suitable redundancy scheme, your system can stay up and running when a disk fails, and even while the replacement disk is being installed and its data restored.

To create an optimal cost-effective RAID configuration, we need to simultaneously achieve the following goals:

Maximize the number of disks being accessed in parallel.

Minimize the amount of disk space being used for redundant data.

Minimize the overhead required to achieve the above goals.

    Basic RAID Organizations


There are many types of RAID and some of the important ones are introduced below:

Non-Redundant (RAID Level 0)

A non-redundant disk array, or RAID level 0, has the lowest cost of any RAID organization because it does not employ redundancy at all. This scheme offers the best write performance since it never needs to update redundant information. Surprisingly, it does not have the best read performance. Redundancy schemes that duplicate data, such as mirroring, can perform better on reads by selectively scheduling requests on the disk with the shortest expected seek and rotational delays. Without redundancy, any single disk failure will result in data loss. Non-redundant disk arrays are widely used in supercomputing environments where performance and capacity, rather than reliability, are the primary concerns.

Sequential blocks of data are written across multiple disks in stripes, as follows:

[Figure: RAID 0 striping across disks; source: Reference 2]

The size of a data block, which is known as the "stripe width", varies with the implementation, but is always at least as large as a disk's sector size. When it comes time to read back this sequential data, all disks can be read in parallel. In a multi-tasking operating system, there is a high probability that even non-sequential disk accesses will keep all of the disks working in parallel.
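A minimal sketch (not from the original text) of how a RAID 0 array maps a logical block onto a physical disk and an offset; the function and parameter names are illustrative assumptions. Sequential blocks land on consecutive disks, which is why a large sequential read can keep every disk busy in parallel.

def raid0_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk_index, block_within_disk) for a striped (RAID 0) array."""
    disk = logical_block % num_disks      # round-robin across the disks
    offset = logical_block // num_disks   # position of the block on that disk
    return disk, offset

if __name__ == "__main__":
    # Blocks 0..7 on a 4-disk stripe land on disks 0,1,2,3,0,1,2,3.
    for b in range(8):
        print(b, raid0_map(b, 4))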

Mirrored (RAID Level 1)

The traditional solution, called mirroring or shadowing, uses twice as many disks as a non-redundant disk array. Whenever data is written to a disk, the same data is also written to a redundant disk, so that there are always two copies of the information. When data is read, it can be retrieved from the disk with the shorter queuing, seek and rotational delays. If a disk fails, the other copy is used to service requests. Mirroring is frequently used in database applications where availability and transaction time are more important than storage efficiency.


[Figure: RAID 1 mirroring; source: Reference 2]
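A minimal sketch (illustrative, not from the original text) of the mirrored behaviour just described: every write is duplicated on both copies, and a read can be served by whichever disk is currently less busy, which is where RAID 1's read advantage comes from.

class MirroredPair:
    def __init__(self):
        self.disks = [{}, {}]        # two identical copies, block -> data
        self.queue_len = [0, 0]      # stand-in for per-disk queue depths

    def write(self, block, data):
        for d in self.disks:         # every write goes to both disks
            d[block] = data

    def read(self, block):
        i = 0 if self.queue_len[0] <= self.queue_len[1] else 1
        return self.disks[i][block]

    def read_after_failure(self, block, failed_disk):
        # If one disk fails, the surviving copy services all requests.
        return self.disks[1 - failed_disk][block]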

Memory-Style (RAID Level 2)

Memory systems have provided recovery from failed components with much less cost than mirroring by using Hamming codes. Hamming codes contain parity for distinct overlapping subsets of components. In one version of this scheme, four data disks require three redundant disks, one less than mirroring. Since the number of redundant disks is proportional to the log of the total number of disks in the system, storage efficiency increases as the number of data disks increases.

If a single component fails, several of the parity components will have inconsistent values, and the failed component is the one held in common by each incorrect subset. The lost information is recovered by reading the other components in a subset, including the parity component, and setting the missing bit to 0 or 1 to create the proper parity value for that subset. Thus, multiple redundant disks are needed to identify the failed disk, but only one is needed to recover the lost information.

If you are unaware of parity, you can think of the redundant disk as holding the sum of all data on the other disks. When a disk fails, you can subtract all the data on the good disks from the parity disk; the remaining information must be the missing information. Parity is simply this sum modulo 2.

A RAID 2 system would normally have as many data disks as the word size of the computer, typically 32. In addition, RAID 2 requires the use of extra disks to store an error-correcting code for redundancy. With 32 data disks, a RAID 2 system would require 7 additional disks for a Hamming-code ECC. Such an array of 39 disks was the subject of a U.S. patent granted to Unisys Corporation in 1988, but no commercial product was ever released.

For a number of reasons, including the fact that modern disk drives contain their own internal ECC, RAID 2 is not a practical disk array scheme.


[Figure: memory-style ECC (RAID 2) layout; source: Reference 2]

Bit-Interleaved Parity (RAID Level 3)

One can improve upon memory-style ECC disk arrays by noting that, unlike memory component failures, disk controllers can easily identify which disk has failed. Thus, one can use a single parity disk rather than a set of parity disks to recover lost information.

In a bit-interleaved parity disk array, data is conceptually interleaved bit-wise over the data disks, and a single parity disk is added to tolerate any single disk failure. Each read request accesses all data disks and each write request accesses all data disks and the parity disk. Thus, only one request can be serviced at a time. Because the parity disk contains only parity and no data, the parity disk cannot participate in reads, resulting in slightly lower read performance than for redundancy schemes that distribute the parity and data over all disks. Bit-interleaved parity disk arrays are frequently used in applications that require high bandwidth but not high I/O rates. They are also simpler to implement than RAID levels 4, 5, and 6.

Here, the parity disk is written in the same way as the parity bit in normal Random Access Memory (RAM), where it is the Exclusive Or of the 8, 16 or 32 data bits. In RAM, parity is used to detect single-bit data errors, but it cannot correct them because there is no information available to determine which bit is incorrect. With disk drives, however, we rely on the disk controller to report a data read error. Knowing which disk's data is missing, we can reconstruct it as the Exclusive Or (XOR) of all remaining data disks plus the parity disk.

[Figure: bit-interleaved parity layout; source: Reference 2]

As a simple example, suppose we have 4 data disks and one parity disk. The sample bits are:


Disk 0   Disk 1   Disk 2   Disk 3   Parity
  0        1        1        1        1

The parity bit is the XOR of these four data bits, which can be calculated by adding them up and writing a 0 if the sum is even and a 1 if it is odd. Here the sum of Disk 0 through Disk 3 is 3, so the parity is 1. Now if we attempt to read back this data and find that Disk 2 gives a read error, we can reconstruct Disk 2 as the XOR of all the other disks, including the parity. In the example, the sum of Disk 0, 1, 3 and Parity is 3, so the data on Disk 2 must be 1.
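The same example can be checked mechanically. A minimal sketch (illustrative only): parity is the XOR of the data bits, and a "failed" disk is rebuilt by XOR-ing everything that survives.

from functools import reduce

data = {"Disk 0": 0, "Disk 1": 1, "Disk 2": 1, "Disk 3": 1}

parity = reduce(lambda a, b: a ^ b, data.values())        # 0^1^1^1 = 1
print("parity =", parity)

# Pretend Disk 2 reports a read error; rebuild it from the other disks
# plus the parity disk.
survivors = [v for k, v in data.items() if k != "Disk 2"]
rebuilt = reduce(lambda a, b: a ^ b, survivors + [parity])
print("reconstructed Disk 2 =", rebuilt)                   # -> 1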

Block-Interleaved Parity (RAID Level 4)

The block-interleaved parity disk array is similar to the bit-interleaved parity disk array except that data is interleaved across disks in blocks of arbitrary size rather than in bits. The size of these blocks is called the striping unit. Read requests smaller than the striping unit access only a single data disk. Write requests must update the requested data blocks and must also compute and update the parity block. For large writes that touch blocks on all disks, parity is easily computed by exclusive-or'ing the new data for each disk. For small write requests that update only one data disk, parity is computed by noting how the new data differs from the old data and applying those differences to the parity block. Small write requests thus require four disk I/Os: one to write the new data, two to read the old data and old parity for computing the new parity, and one to write the new parity. This is referred to as a read-modify-write procedure. Because a block-interleaved parity disk array has only one parity disk, which must be updated on all write operations, the parity disk can easily become a bottleneck. Because of this limitation, the block-interleaved distributed-parity disk array is universally preferred over the block-interleaved parity disk array.

[Figure: block-interleaved parity (RAID 4) layout; source: Reference 2]
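A minimal sketch (illustrative assumption, not the original text's code) of the small-write read-modify-write parity update described above: the new parity is the old parity XOR the old data XOR the new data, computed byte by byte, and the whole operation costs four disk I/Os (read old data, read old parity, write new data, write new parity).

def small_write_update(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    assert len(old_data) == len(new_data) == len(old_parity)
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# Example: three data blocks and their parity, then block 1 is overwritten.
blocks = [b"\x0f", b"\xf0", b"\x55"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*blocks))      # full-stripe parity
new_block1 = b"\xaa"
parity = small_write_update(blocks[1], new_block1, parity)
blocks[1] = new_block1
assert parity == bytes(a ^ b ^ c for a, b, c in zip(*blocks))  # still consistent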

    Block-Interleaved Distributed-Parity (RAID Level 5)


The block-interleaved distributed-parity disk array eliminates the parity disk bottleneck present in the block-interleaved parity disk array by distributing the parity uniformly over all of the disks. An additional, frequently overlooked advantage of distributing the parity is that it also distributes data over all of the disks rather than over all but one. This allows all disks to participate in servicing read operations, in contrast to redundancy schemes with dedicated parity disks in which the parity disk cannot participate in servicing read requests. Block-interleaved distributed-parity disk arrays have the best small read and large write performance of any redundant disk array. Small write requests are somewhat inefficient compared with redundancy schemes such as mirroring, however, due to the need to perform read-modify-write operations to update parity. This is the major performance weakness of RAID level 5 disk arrays.

The exact method used to distribute parity in block-interleaved distributed-parity disk arrays can affect performance. The following figure illustrates left-symmetric parity distribution.

[Figure: left-symmetric parity distribution. Each square corresponds to a stripe unit and each column of squares corresponds to a disk. P0 computes the parity over stripe units 0, 1, 2 and 3; P1 computes parity over stripe units 4, 5, 6, and 7; etc. (source: Reference 1)]

A useful property of the left-symmetric parity distribution is that whenever you traverse the striping units sequentially, you will access each disk once before accessing any disk twice. This property reduces disk conflicts when servicing large requests.

[Figure: RAID 5 data and parity layout; source: Reference 2]
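A minimal sketch (illustrative) of a left-symmetric layout for an array of n disks: in stripe s the parity block sits on disk (n-1-s) mod n, and the data blocks fill the remaining disks starting just after the parity disk. Printing a few stripes reproduces the rotation shown in the figure.

def left_symmetric_layout(num_disks: int, num_stripes: int):
    layout = []
    block = 0
    for s in range(num_stripes):
        parity_disk = (num_disks - 1 - s) % num_disks
        row = [None] * num_disks
        row[parity_disk] = f"P{s}"
        for i in range(num_disks - 1):
            disk = (parity_disk + 1 + i) % num_disks
            row[disk] = str(block)
            block += 1
        layout.append(row)
    return layout

for row in left_symmetric_layout(5, 5):
    print(row)
# Traversing stripe units 0, 1, 2, ... in order touches every disk once
# before touching any disk twice.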


P+Q Redundancy (RAID Level 6)

Parity is a redundancy code capable of correcting any single, self-identifying failure. As larger disk arrays are considered, multiple failures become possible and stronger codes are needed. Moreover, when a disk fails in a parity-protected disk array, recovering the contents of the failed disk requires successfully reading the contents of all non-failed disks. The probability of encountering an uncorrectable read error during recovery can be significant. Thus, applications with more stringent reliability requirements require stronger error-correcting codes.

One such scheme, called P+Q redundancy, uses Reed-Solomon codes to protect against up to two disk failures using the bare minimum of two redundant disks. P+Q redundant disk arrays are structurally very similar to block-interleaved distributed-parity disk arrays and operate in much the same manner. In particular, P+Q redundant disk arrays also perform small write operations using a read-modify-write procedure, except that instead of four disk accesses per write request, P+Q redundant disk arrays require six disk accesses due to the need to update both the 'P' and 'Q' information.
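The text does not spell out the code construction, but a common way to realize P+Q (a Reed-Solomon-style code, and an assumption here rather than the original's exact formulation) is P = XOR of the data blocks and Q = XOR of the data blocks each multiplied by a distinct power of a generator in GF(2^8). A minimal sketch:

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return p

def p_and_q(data: list[int]) -> tuple[int, int]:
    p = q = 0
    g_i = 1                       # g^i with generator g = 2
    for d in data:
        p ^= d
        q ^= gf_mul(g_i, d)
        g_i = gf_mul(g_i, 2)
    return p, q

data = [0x12, 0x34, 0x56, 0x78]
P, Q = p_and_q(data)
# With both P and Q, two lost blocks give two independent equations over
# GF(2^8) and can be solved for; a single lost block can be rebuilt from P alone.
rebuilt = P ^ data[0] ^ data[1] ^ data[3]
assert rebuilt == data[2]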

Striped Mirrors (RAID Level 10)

RAID 10 was not mentioned in the original 1988 article that defined RAID 1 through RAID 5. The term is now used to mean the combination of RAID 0 (striping) and RAID 1 (mirroring). Disks are mirrored in pairs for redundancy and improved performance, then data is striped across multiple disks for maximum performance. In the diagram below, Disks 0 & 2 and Disks 1 & 3 are mirrored pairs.

Obviously, RAID 10 uses more disk space to provide redundant data than RAID 5. However, it also provides a performance advantage by reading from all disks in parallel while eliminating the write penalty of RAID 5. In addition, RAID 10 gives better performance than RAID 5 while a failed drive remains unreplaced. Under RAID 5, each attempted read of the failed drive can be performed only by reading all of the other disks. On RAID 10, a failed disk can be recovered by a single read of its mirrored pair.

[Figure: RAID 10 striped mirrors; source: Reference 2]
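A minimal sketch (illustrative) of the RAID 10 mapping just described: logical blocks are striped across mirrored pairs, every write hits both members of a pair, and a read can be served by either member. The pairing below follows the text (Disks 0 & 2 and Disks 1 & 3 are mirrored pairs).

PAIRS = [(0, 2), (1, 3)]

def raid10_map(logical_block: int):
    pair = PAIRS[logical_block % len(PAIRS)]   # stripe across the mirrored pairs
    offset = logical_block // len(PAIRS)       # block position within the pair
    return pair, offset

for b in range(4):
    (primary, mirror), off = raid10_map(b)
    print(f"block {b}: written to disks {primary} and {mirror}, offset {off}")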


Tool to calculate storage efficiency given the number of disks and the RAID level (source: Reference 3)

    RAID Systems Need Tape Backups

    It is worth remembering an important point about RAID systems. Even when you use a

    redundancy scheme like mirroring or RAID 5 or RAID 10, you must still do regular tape

    backups of your system. There are several reasons for insisting on this, among them:

    RAID does not protect you from multiple disk failures. While one disk is off line for any reason,

    your disk array is not fully redundant.

    Regular tape backups allow you to recover from data loss that is not related to a disk failure.

    This includes human errors, hardware errors, and software errors.


There are three important considerations when making a selection as to which RAID level is to be used for a system, viz. cost, performance and reliability.

There are many different ways to measure these parameters; for example, performance could be measured as I/Os per second per dollar, bytes per second, or response time. We could also compare systems at the same cost, the same total user capacity, the same performance or the same reliability. The method used largely depends on the application and the reason to compare. For example, in transaction processing applications the primary base for comparison would be I/Os per second per dollar, while in scientific applications we would be more interested in bytes per second per dollar. In some heterogeneous systems, like file servers, both I/Os per second and bytes per second may be important. Sometimes it is important to consider reliability as the base for comparison.

Taking a closer look at the RAID levels, we observe that most of the levels are similar to each other. RAID level 1 and RAID level 3 disk arrays can be viewed as a subclass of RAID level 5 disk arrays. Also, RAID level 2 and RAID level 4 disk arrays are generally found to be inferior to RAID level 5 disk arrays. Hence the problem of selecting among RAID levels 1 through 5 is a subset of the more general problem of choosing an appropriate parity group size and striping unit for RAID level 5 disk arrays.

    Some Comparisons

Given below is a table that compares the throughput of various redundancy schemes for four types of I/O requests. The I/O requests are basically reads and writes, which are divided into small and large ones. Remembering the fact that our data has been spread over multiple disks (data striping), a small request refers to an I/O request of one striping unit, while a large I/O request refers to a request of one full stripe (one stripe unit from each disk in an error-correcting group).

RAID Type       Small Read   Small Write      Large Read   Large Write   Storage Efficiency
RAID Level 0        1            1                1            1              1
RAID Level 1        1            1/2              1            1/2            1/2
RAID Level 3        1/G          1/G              (G-1)/G      (G-1)/G        (G-1)/G
RAID Level 5        1            max(1/G, 1/4)    1            (G-1)/G        (G-1)/G
RAID Level 6        1            max(1/G, 1/6)    1            (G-2)/G        (G-2)/G

G: the number of disks in an error correction group.

The table above tabulates the maximum throughput per dollar relative to RAID level 0 for RAID levels 0, 1, 3, 5 and 6. For practical purposes we consider RAID levels 2 & 4 inferior to RAID level 5 disk arrays, so we don't show the comparisons. The cost of a system is directly proportional to the number of disks it uses in the disk array. Thus the table shows us that, given equivalent-cost RAID level 0 and RAID level 1 systems, the RAID level 1 system can sustain half the number of small writes per second that a RAID level 0 system can sustain. Equivalently, small writes are twice as expensive in a RAID level 1 system as in a RAID level 0 system.

The table also shows the storage efficiency of each RAID level. The storage efficiency is approximately the inverse of the cost of each unit of user capacity relative to a RAID level 0 system. The storage efficiency is equal to the performance/cost metric for large writes.
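The table can be expressed directly as functions of the parity group size G. A minimal sketch (illustrative) that reproduces the throughput-per-dollar and storage-efficiency figures above, relative to RAID level 0:

def relative_throughput(level: int, G: int) -> dict:
    table = {
        0: dict(small_read=1,   small_write=1,             large_read=1,       large_write=1,       storage_eff=1),
        1: dict(small_read=1,   small_write=0.5,           large_read=1,       large_write=0.5,     storage_eff=0.5),
        3: dict(small_read=1/G, small_write=1/G,           large_read=(G-1)/G, large_write=(G-1)/G, storage_eff=(G-1)/G),
        5: dict(small_read=1,   small_write=max(1/G, 1/4), large_read=1,       large_write=(G-1)/G, storage_eff=(G-1)/G),
        6: dict(small_read=1,   small_write=max(1/G, 1/6), large_read=1,       large_write=(G-2)/G, storage_eff=(G-2)/G),
    }
    return table[level]

print(relative_throughput(5, G=8))   # e.g. storage efficiency 7/8 with an 8-disk parity group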


[Figure: performance/cost of RAID levels 1, 3, 5 and 6 over a range of parity group sizes; source: Reference 1]

The figures above graph the performance/cost metrics from the table above for RAID levels 1, 3, 5 and 6 over a range of parity group sizes. The performance/cost of RAID level 1 systems is equivalent to the performance/cost of RAID level 5 systems when the parity group size is equal to 2. The performance/cost of RAID level 3 systems is always less than or equal to the performance/cost of RAID level 5 systems. This is expected, given that a RAID level 3 system is a subclass of RAID level 5 systems derived by restricting the striping unit size such that all requests access exactly a parity stripe of data. Since the configuration of RAID level 5 systems is not subject to such a restriction, the performance/cost of RAID level 5 systems can never be less than that of an equivalent RAID level 3 system. Of course, such generalizations are specific to the models of disk arrays used in the above experiments. In reality, a specific implementation of a


    RAID level 3 system can have better performance/cost than a specific implementation of a RAID

    level 5 system.

The question of which RAID level to use is better expressed as more general configuration questions concerning the size of the parity group and striping unit. For a parity group size of 2, mirroring is desirable, while for a very small striping unit RAID level 3 would be well suited.

    The figure below plots the performance/cost metrics from the table above for RAID levels 3, 5 &

    6.

[Figure: performance/cost of RAID levels 3, 5 and 6; source: Reference 1]


Reliability

Reliability of any I/O system has become as important as its performance and cost. This part of the tutorial:

Reviews the basic reliability provided by a block-interleaved parity disk array.

Lists and discusses three factors that can determine the potential reliability of disk arrays.

Redundancy in disk arrays is motivated by the need to fight disk failures. Two key factors, MTTF (Mean Time To Failure) and MTTR (Mean Time To Repair), are of primary concern in estimating the reliability of any disk. Following are some formulae for the mean time between failures:

RAID level 5

               MTTF(disk)^2
    ------------------------------
        N * (G-1) * MTTR(disk)

Disk array with two redundant disks per parity group (e.g. P+Q redundancy)

               MTTF(disk)^3
    ------------------------------------
     N * (G-1) * (G-2) * MTTR(disk)^2

N - total number of disks in the system
G - number of disks in the parity group
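A minimal sketch (illustrative) of the two formulas above; the disk MTTF/MTTR numbers in the example are assumptions, not values from the text.

def mttf_raid5(mttf_disk, mttr_disk, N, G):
    return mttf_disk**2 / (N * (G - 1) * mttr_disk)

def mttf_pq(mttf_disk, mttr_disk, N, G):
    return mttf_disk**3 / (N * (G - 1) * (G - 2) * mttr_disk**2)

HOURS_PER_YEAR = 24 * 365
print(mttf_raid5(200_000, 24, N=16, G=8) / HOURS_PER_YEAR, "years (RAID 5)")
print(mttf_pq(200_000, 24, N=16, G=8) / HOURS_PER_YEAR, "years (P+Q)")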

    Factors affecting Reliability


    Three factors that can dramatically affect the reliability of disk arrays are:

    System crashes

    Uncorrectable bit-errors

    Correlated disk failures

System crashes

A system crash refers to any event such as a power failure, operator error, hardware breakdown, or software crash that can interrupt an I/O operation to a disk array.

Such crashes can interrupt write operations, resulting in states where the data is updated and the parity is not updated, or vice versa. In either case, parity is inconsistent and cannot be used in the event of a disk failure. Techniques such as redundant hardware and power supplies can be applied to make such crashes less frequent.

System crashes can cause parity inconsistencies in both bit-interleaved and block-interleaved disk arrays, but the problem is of practical concern only in block-interleaved disk arrays. For reliability purposes, system crashes in block-interleaved disk arrays are similar to disk failures in that they may result in the loss of the correct parity for stripes that were modified during the crash.

Uncorrectable bit-errors

Most uncorrectable bit errors are generated because data is incorrectly written or gradually damaged as the magnetic media ages. These errors are detected only when we attempt to read the data. Our interpretation of uncorrectable bit error rates is that they represent the rate at which errors are detected during reads from the disk during the normal operation of the disk drive.

One approach that can be used with or without redundancy is to try to protect against bit errors by predicting when a disk is about to fail. VAXsimPLUS, a product from DEC, monitors the warnings issued by disks and notifies an operator when it feels the disk is about to fail.


Correlated disk failures

Causes: common environmental and manufacturing factors. For example, an accident might sharply increase the failure rate for all disks in a disk array for a short period of time. In general, power surges, power failures and simply switching the disks on and off can place stress on the electrical components of all affected disks. Disks also share common support hardware; when this hardware fails, it can lead to multiple, simultaneous disk failures.

Disks are generally more likely to fail either very early or very late in their lifetimes. Early failures are frequently caused by transient defects which may not have been detected during the manufacturer's burn-in process. Late failures occur when a disk wears out. Correlated disk failures greatly reduce the reliability of disk arrays by making it much more likely that an initial disk failure will be closely followed by additional disk failures before the failed disk can be reconstructed.

Mean-Time-To-Data-Loss (MTTDL)

Following are some formulae to calculate the mean time to data loss (MTTDL). In a block-interleaved parity-protected disk array, data loss is possible through the following three common ways:

double disk failures

system crash followed by a disk failure

disk failure followed by an uncorrectable bit error during reconstruction

The above three failure modes are the hardest failure combinations, in that we currently do not have any techniques to protect against them without sacrificing performance.

RAID Level 5

Double disk failure:

               MTTF(disk)^2
    ------------------------------
        N * (G-1) * MTTR(disk)

System crash + disk failure:

      MTTF(system) * MTTF(disk)
    ------------------------------
          N * MTTR(system)

Disk failure + bit error:

               MTTF(disk)
    ------------------------------
      N * (1 - p(disk)^(G-1))

Software RAID: harmonic sum of the above
Hardware RAID: harmonic sum of the above, excluding system crash + disk failure

Failure characteristics for RAID Level 5 disk arrays (source: Reference 1)

P+Q disk array

Triple disk failure:

                MTTF(disk)^3
    --------------------------------------
     N * (G-1) * (G-2) * MTTR(disk)^2

System crash + disk failure:

      MTTF(system) * MTTF(disk)
    ------------------------------
          N * MTTR(system)

Double disk failure + bit error:

                MTTF(disk)^2
    ----------------------------------------------
     N * (G-1) * (1 - p(disk)^(G-2)) * MTTR(disk)

Software RAID: harmonic sum of the above
Hardware RAID: harmonic sum of the above, excluding system crash + disk failure

Failure characteristics for a P+Q disk array (source: Reference 1)

p(disk) = the probability of reading all sectors on a disk (derived from disk size, sector size, and BER)

Tool for reliability using the above equations (source: Reference 3)
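A minimal sketch (illustrative) that combines the RAID level 5 failure modes above into a single mean time to data loss via the harmonic sum the table prescribes; all of the input numbers are assumptions.

def mttdl_raid5(mttf_disk, mttr_disk, mttf_sys, mttr_sys, p_disk, N, G,
                hardware_raid=False):
    double_disk = mttf_disk**2 / (N * (G - 1) * mttr_disk)
    crash_plus_disk = (mttf_sys * mttf_disk) / (N * mttr_sys)
    disk_plus_bit_error = mttf_disk / (N * (1 - p_disk**(G - 1)))
    modes = [double_disk, disk_plus_bit_error]
    if not hardware_raid:            # hardware RAID excludes system crash + disk failure
        modes.append(crash_plus_disk)
    return 1 / sum(1 / m for m in modes)     # harmonic sum of the individual terms

print(mttdl_raid5(mttf_disk=200_000, mttr_disk=24, mttf_sys=720, mttr_sys=1,
                  p_disk=0.999, N=16, G=8) / (24 * 365), "years")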

Implementations

The distribution of data across multiple drives can be managed either by dedicated computer hardware or by software. A software solution may be part of the operating system, or it may be part of the firmware and drivers supplied with a hardware RAID controller.

Software-based RAID

Software RAID implementations are now provided by many operating systems. Software RAID can be implemented as:

a layer that abstracts multiple devices, thereby providing a single virtual device (e.g. Linux's md).

a more generic logical volume manager (provided with most server-class operating systems, e.g. Veritas or LVM).

a component of the file system (e.g. ZFS or Btrfs).

Volume manager support

Server-class operating systems typically provide logical volume management, which allows a system to use logical volumes which can be resized or moved. Often, features like RAID or snapshots are also supported.

Vinum is a logical volume manager supporting RAID-0, RAID-1, and RAID-5. Vinum is part of the base distribution of the FreeBSD operating system, and versions exist for NetBSD, OpenBSD, and DragonFly BSD.


    Solaris SVM supports RAID 1 for the boot filesystem, and adds RAID 0 and RAID 5 support (and

    various nested combinations) for data drives.

Linux LVM supports RAID 0 and RAID 1.

HP's OpenVMS provides a form of RAID 1 called "volume shadowing", giving the possibility to mirror data locally and at remote cluster systems.

File system support

Some advanced file systems are designed to organize data across multiple storage devices directly (without needing the help of a third-party logical volume manager).

ZFS supports equivalents of RAID 0, RAID 1, RAID 5 (RAID-Z), RAID 6 (RAID-Z2), and a triple-parity version, RAID-Z3, as well as any nested combination of those, like 1+0. ZFS is the native file system on Solaris and is also available on FreeBSD.

Btrfs supports RAID 0, RAID 1, and RAID 10 (RAID 5 and 6 are under development).

Other support

Many operating systems provide basic RAID functionality independently of volume management.

Apple's Mac OS X Server[19] and Mac OS X[20] support RAID 0, RAID 1, and RAID 1+0.

FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all nestings, via GEOM modules[21][22] and ccd.[23]

Linux's md supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, and all nestings.[24][25] Certain reshaping/resizing/expanding operations are also supported.[26]

Microsoft's server operating systems support RAID 0, RAID 1, and RAID 5. Some of the Microsoft desktop operating systems also support RAID; for example, Windows XP Professional supports RAID level 0, in addition to spanning multiple drives, but only if using dynamic disks and volumes. Windows XP can be modified to support RAID 0, 1, and 5.[27]

NetBSD supports RAID 0, RAID 1, RAID 4, and RAID 5, and all nestings, via its software implementation, named RAIDframe.

OpenBSD aims to support RAID 0, RAID 1, RAID 4, and RAID 5 via its software implementation, softraid.

FlexRAID (for Linux and Windows) is a snapshot RAID implementation.

Software RAID has advantages and disadvantages compared to hardware RAID. The software must run on a host server attached to the storage, and the server's processor must dedicate processing time to run the RAID software; the additional processing capacity required for RAID 0 and RAID 1 is low, but parity-based arrays require more complex data processing during write or integrity-checking operations. As the rate of data processing increases with the number of drives in the array, so does the processing requirement. Furthermore, all the buses between the processor and the drive controller must carry the extra data required by RAID, which may cause congestion.

Fortunately, over time, the increase in commodity CPU speed has been consistently greater than the increase in drive throughput;[28] the percentage of host CPU time required to saturate a given

  • 7/28/2019 An Introduction to RAID

    21/30

    number of drives has decreased. For instance, under 100% usage of a single core on a 2.1 GHz
    Intel "Core 2" CPU, the Linux software RAID subsystem (md), as of kernel version 2.6.26, is
    capable of calculating parity information at 6 GB/s; a three-drive RAID 5 array built from drives
    that can each sustain a 100 MB/s write only requires parity to be calculated at 200 MB/s, which
    consumes just over 3% of a single CPU core.

    Furthermore, software RAID implementations may employ more sophisticated algorithms than
    hardware RAID implementations (e.g. drive scheduling and command queueing), and thus may
    be capable of better performance.
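    The parity work itself is just an XOR across the data blocks of each stripe, which is why the CPU
    cost above is so small (200 MB/s of XOR against a roughly 6 GB/s capability is about 3%). The
    following Python sketch is purely illustrative -- the function names are mine, and the real md
    driver is heavily optimized kernel code -- but it shows the calculation being discussed:

        from functools import reduce

        def raid5_parity(data_blocks):
            """XOR the data blocks of a stripe together to produce the parity block.

            In a three-drive RAID 5, each stripe holds two data blocks and one parity
            block, so sustaining 100 MB/s of writes per drive means XOR-ing roughly
            200 MB/s of data -- a small fraction of what one modern CPU core can do.
            """
            assert data_blocks and all(len(b) == len(data_blocks[0]) for b in data_blocks)
            return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*data_blocks))

        def recover_block(surviving_blocks, parity):
            """Rebuild a missing data block by XOR-ing the parity with the surviving blocks."""
            return raid5_parity(surviving_blocks + [parity])

        # Example: one 4-byte stripe across two data drives plus one parity drive.
        d0, d1 = b"\x0f\x00\xff\x12", b"\xf0\x0f\x0f\x21"
        p = raid5_parity([d0, d1])
        assert recover_block([d1], p) == d0   # the contents of a failed drive 0 are recoverable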

    Another concern with software implementations is the process of booting the associated
    operating system. For instance, consider a computer booted from a RAID 1 (mirrored drives):
    if the first drive in the RAID 1 fails, a first-stage boot loader might not be sophisticated enough
    to attempt loading the second-stage boot loader from the second drive as a fallback. In contrast,
    a RAID 1 hardware controller typically has explicit programming to decide that a drive has
    malfunctioned and that the next drive should be used. At least the following second-stage boot
    loaders are capable of loading a kernel from a RAID 1:

    LILO (for Linux).

    Some configurations of GRUB.

    The boot loader for FreeBSD.[29]

    The boot loader for NetBSD.

    For data safety, the write-back cache of an operating system or individual drive might need to be
    turned off in order to ensure that as much data as possible is actually written to secondary
    storage before some failure (such as a loss of power); unfortunately, turning off the write-back
    cache has a performance penalty that can be significant depending on the workload and on
    command-queuing support. In contrast, a hardware RAID controller may carry a dedicated
    battery-powered write-back cache of its own, thereby allowing for efficient operation that is also
    relatively safe. Fortunately, it is possible to avoid such problems with a software controller by
    constructing the RAID from safer components; for instance, each drive could have its own
    battery or capacitor on its own write-back cache, the drives could implement atomicity in
    various ways, and the entire RAID or computing system could be powered by a UPS.

    Finally, a software RAID controller that is built into an operating system usually uses
    proprietary data formats and RAID levels, so an associated RAID usually cannot be shared
    between operating systems as part of a multi-boot setup. However, such a RAID may be moved
    between computers that share the same operating system; in contrast, such mobility is more
    difficult with a hardware RAID controller, because both computers must provide compatible
    hardware controllers. Also, if the hardware controller fails, data could become unrecoverable
    unless a hardware controller of the same type is obtained.

    Most software implementations allow a RAID to be created from partitions rather than entire
    physical drives. For instance, an administrator could divide each drive of an odd number of
    drives into two partitions, and then mirror partitions across drives and stripe a volume across the
    mirrored partitions to emulate IBM's RAID 1E configuration (a rough address-mapping sketch
    of such a layout appears after the examples below). Using partitions in this way also

    allows for constructing multiple RAIDs in various RAID levels from the same set of drives. For
    example, one could have a very robust RAID 1 for important files and a less robust RAID 5 or
    RAID 0 for less important data, all using the same set of underlying drives. (Some BIOS-based
    controllers offer similar features, e.g. Intel Matrix RAID.) Using two partitions from the same
    drive in the same RAID puts data at risk if the drive fails; for instance:

    A RAID 1 across partitions from the same drive makes all the data inaccessible if the single drive

    fails.

    Consider a RAID 5 composed of 4 drives, 3 of which are 250 GB and one of which is 500 GB; the

    500 GB drive is split into 2 partitions, each of which is 250 GB. Then, a failure of the 500 GB drive

    would remove 2 underlying 'drives' from the array, causing a failure of the entire array.
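    As a rough illustration of the partition-based mirroring described above, the Python sketch below
    maps each logical block to two different drives across an odd number of drives, so every block is
    mirrored even though no two whole drives are identical. The function name and the exact
    interleaving are assumptions chosen for illustration, not IBM's actual RAID 1E on-disk format:

        def raid1e_locations(logical_block, num_drives):
            """Return (drive, physical_block) pairs for the two copies of a logical block.

            Consecutive copies go to consecutive drives, wrapping around, so an odd
            number of drives can still mirror every block: no two whole drives are
            identical, but every block exists twice, on two different drives.
            """
            assert num_drives >= 3 and num_drives % 2 == 1
            slot = 2 * logical_block                      # each logical block occupies two slots
            primary = (slot % num_drives, slot // num_drives)
            mirror = ((slot + 1) % num_drives, (slot + 1) // num_drives)
            return [primary, mirror]

        # With 3 drives: block 0 -> drives 0 and 1, block 1 -> drives 2 and 0, block 2 -> drives 1 and 2, ...
        for blk in range(4):
            print(blk, raid1e_locations(blk, 3))

    Losing any single drive leaves at least one copy of every block intact, which is exactly the
    property the mirrored-partition arrangement is trying to achieve.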

    Hardware-based RAID

    Hardware RAID controllers use proprietary data layouts, so it is not usually possible to span

    controllers from different manufacturers. They do not require processor resources, the BIOS can

    boot from them, and tighter integration with the device driver may offer better error handling.

    On a desktop system, a hardware RAID controller may be an expansion card connected to a bus
    (e.g., PCI or PCIe) or a component integrated into the motherboard; there are controllers
    supporting most types of drive technology, such as IDE/ATA, SATA, SCSI, SSA, Fibre Channel,
    and sometimes even a combination. The controller and drives may be in a stand-alone enclosure
    rather than inside a computer, and the enclosure may be directly attached to a computer or
    connected via a SAN.

    Most hardware implementations provide a read/write cache which, depending on the I/O
    workload, improves performance. In most systems the write cache is non-volatile (i.e. battery-
    protected), so pending writes are not lost in the event of a power failure.

    Hardware implementations provide guaranteed performance, add no computational overhead to
    the host computer, and can support many operating systems; the controller simply presents the
    RAID as another logical drive.

    Firmware/driver-based RAID

    A RAID implemented at the level of an operating system is not always compatible with the

    system's boot process, and it is generally impractical for desktop versions of Windows (as

    described above). However, hardware RAID controllers are expensive and proprietary. To fill

    this gap, cheap "RAID controllers" were introduced that do not contain a dedicated RAID
    controller chip, but simply a standard drive controller chip with special firmware and drivers:
    during early-stage bootup the RAID is implemented by the firmware, and once the operating
    system has been more completely loaded, the drivers take over control. Consequently, such
    controllers may not work when driver support is not available for the host operating system.[30]

    Initially, the term "RAID controller" implied that the controller does the processing. However,

    while a controller without a dedicated RAID chip is often described by a manufacturer as a

    "RAID controller", it is rarely made clear that the burden of RAID processing is borne by a host

    computer's central processing unit rather than the RAID controller itself. Thus, this new type is

    sometimes called "fake" RAID;Adapteccalls it a "HostRAID".

    Moreover, a firmware controller can often only support certain types of hard drive to form the
    RAID that it manages (e.g. SATA for an Intel Matrix RAID, as there is neither SCSI nor PATA
    support in modern Intel ICH southbridges; however, motherboard makers implement RAID
    controllers outside of the southbridge on some motherboards).

    Hot spares

    Both hardware and software RAIDs with redundancy may support the use of a hot spare drive:
    a drive physically installed in the array that remains inactive until an active drive fails, at which
    point the system automatically replaces the failed drive with the spare and rebuilds the array
    with the spare drive included. This reduces the mean time to recovery (MTTR), but does not
    completely eliminate it. As with non-hot-spare systems, subsequent additional failure(s) in the
    same RAID redundancy group before the array is fully rebuilt can cause data loss. Rebuilding
    can take several hours, especially on busy systems.

    It is sometimes considered that if drives are procured and installed at the same time, several
    drives are more likely to fail at about the same time than unrelated drives would be, so rapid
    replacement of a failed drive is important. RAID 6 without a spare uses the same number of
    drives as RAID 5 with a hot spare and protects data against failure of up to two drives, but
    requires a more advanced RAID controller. Further, a hot spare can be shared by multiple
    RAID sets.

    Data scrubbing / Patrol read

    Data scrubbing is the periodic reading and checking, by the RAID controller, of all the blocks in
    a RAID, including those not otherwise accessed. This allows bad blocks to be detected before
    they are used.[31] An alternate name for this is patrol read: a check for bad blocks on each
    storage device in an array that also uses the redundancy of the array to recover bad blocks on a
    single drive and to reassign the recovered data to spare blocks elsewhere on the drive.[32]
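    Conceptually a scrub pass is just "read everything, and repair anything unreadable from the
    redundant copy". The Python sketch below illustrates this for a two-way mirror; the block size,
    function name, and file-backed "devices" are assumptions for illustration -- a real controller
    works on raw devices, throttles itself against foreground I/O, and lets the drive remap sectors:

        import os

        BLOCK = 64 * 1024  # scrub granularity (illustrative; real controllers use their own stripe size)

        def scrub_mirror(primary_path, mirror_path):
            """Patrol-read a two-way mirror: read every block from both members and
            rewrite any block one member fails to return, using the other member's copy.
            Returns the number of blocks repaired."""
            repaired = 0
            size = os.path.getsize(primary_path)
            with open(primary_path, "r+b") as a, open(mirror_path, "r+b") as b:
                for offset in range(0, size, BLOCK):
                    copies = []
                    for member in (a, b):
                        member.seek(offset)
                        try:
                            copies.append(member.read(BLOCK))
                        except OSError:          # an unreadable (bad) block was detected
                            copies.append(None)
                    if copies[0] is None and copies[1] is not None:
                        a.seek(offset); a.write(copies[1]); repaired += 1
                    elif copies[1] is None and copies[0] is not None:
                        b.seek(offset); b.write(copies[0]); repaired += 1
            return repaired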

    Reliability terms

    Failure rate

    Two different kinds of failure rates are applicable to RAID systems. Logical failure is defined as

    the loss of a single drive and its rate is equal to the sum of individual drives' failure rates. System

    failure is defined as loss of data and its rate will depend on the type of RAID. For RAID 0 this is

    equal to the logical failure rate, as there is no redundancy. For other types of RAID, it will be less

    than the logical failure rate, potentially very small, and its exact value will depend on the type of

    RAID, the number of drives employed, the vigilance and alacrity of its human administrators,

    and chance (improbable events do occur, though infrequently).
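    A minimal formulation of the two rates, assuming n identical drives each failing at a constant
    rate \lambda (the notation here is an addition; the text above states the relationship only in words):

        \lambda_{\text{logical}} = \sum_{i=1}^{n} \lambda_i = n\lambda ,
        \qquad
        \lambda_{\text{system}} \le \lambda_{\text{logical}} ,

    with equality for RAID 0, which has no redundancy, and with redundant levels pushing the
    system failure rate far below n\lambda.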

    Mean time to data loss (MTTDL)

    In this context, the average time before a loss of data in a given array.[33] The mean time to data
    loss of a given RAID may be higher or lower than that of its constituent hard drives, depending
    upon what type of RAID is employed. The referenced report assumes times to data loss are
    exponentially distributed, so that 63.2% of all data loss will occur between time 0 and the
    MTTDL.
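    The 63.2% figure follows directly from the exponential assumption: if the time to data loss T is
    exponentially distributed with mean MTTDL, then

        P(T \le \mathrm{MTTDL}) = 1 - e^{-\mathrm{MTTDL}/\mathrm{MTTDL}} = 1 - e^{-1} \approx 0.632 .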

    Mean time to recovery (MTTR)

    In arrays that include redundancy for reliability, this is the time following a failure to restore an
    array to its normal failure-tolerant mode of operation. This includes the time to replace a failed
    drive mechanism and the time to rebuild the array (i.e., to replicate data for redundancy).

    Unrecoverable bit error rate (UBE)

    This is the rate at which a drive will be unable to recover data after application of cyclic

    redundancy check (CRC) codes and multiple retries.

    Write cache reliability

    Some RAID systems use a RAM write cache to increase performance. A power failure can result
    in data loss unless this sort of drive buffer has a supplementary battery to ensure that the buffer
    has time to write from RAM to secondary storage before the drive powers down.

    Atomic write failure

    Also known by various terms such as torn writes, torn pages, incomplete writes, interrupted
    writes, and non-transactional writes.

    Problems with RAID

    Correlated failures

    The theory behind the error correction in RAID assumes that failures of drives are independent.
    Given that assumption, it is possible to calculate how often the drives can fail and to arrange the
    array to make data loss arbitrarily improbable. There is also an assumption that motherboard
    failures will not damage the hard drives, and that hard drive failures occur more often than
    motherboard failures.

    In practice, the drives are often the same age (with similar wear) and subject to the same
    environment. Since many drive failures are due to mechanical issues (which are more likely on
    older drives), this violates those assumptions; failures are in fact statistically correlated.

    In practice, the chance of a second failure before the first has been recovered (causing data loss)
    is not as low as it would be for independent random failures. In a study covering about 100,000
    drives, the probability of two drives in the same cluster failing within one hour was observed to
    be four times larger than predicted by the exponential statistical distribution, which characterizes
    processes in which events occur continuously and independently at a constant average rate. The
    probability of two failures within the same 10-hour period was twice as large as predicted by an
    exponential distribution.[34]
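    For reference, the baseline those observations were compared against is simply the exponential
    (memoryless) model: for events occurring independently at a constant average rate \lambda, the
    probability of the next failure arriving within a window t is

        P(\text{next failure within } t) = 1 - e^{-\lambda t} \approx \lambda t
        \quad \text{for } \lambda t \ll 1 ,

    and the measured clustering of failures exceeded this prediction by the factors quoted above.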

    A common assumption is that "server-grade" drives fail less frequently than consumer-grade
    drives. Two independent studies (one by Carnegie Mellon University and the other by Google)
    have shown that the "grade" of a drive does not relate to the drive's failure rate.[35][36]

    In addition, there is no protection circuitry between the motherboard and the hard drive
    electronics, so a catastrophic failure of the motherboard can cause the hard drive electronics to
    fail as well. Taking elaborate precautions via RAID setups therefore ignores the comparable risk
    of electronics failures elsewhere, which can cascade into a hard drive failure. For a robust
    critical-data system, no one risk can be allowed to outweigh another, as the consequence of any
    data loss is unacceptable.

    Atomicity

    This is a little-understood and rarely mentioned failure mode for redundant storage systems that
    do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a
    Poison Apple"[37] during the early days of relational database commercialization. However,
    this warning largely went unheeded and fell by the wayside upon the advent of RAID, which
    many software engineers mistook as solving all data storage integrity and reliability problems.
    Many software programs update a storage object "in place"; that is, they write a new version of
    the object onto the same secondary storage addresses as the old version. While the software may
    also log some delta information elsewhere, it expects the storage to present "atomic write
    semantics", meaning that the write of the data either occurred in its entirety or did not occur at
    all.

    However, very few storage systems provide support for atomic writes, and even fewer specify
    their rate of failure in providing this semantic. Note that during the act of writing an object, a
    RAID storage device will usually be writing all redundant copies of the object in parallel,
    although overlapped or staggered writes are more common when a single RAID processor is
    responsible for multiple drives. Hence an error that occurs during the process of writing may
    leave the redundant copies in different states, and furthermore may leave the copies in neither
    the old nor the new state. The little-known failure mode is that delta logging relies on the
    original data being either in the old or the new state so as to enable backing out the logical
    change, yet few storage systems provide an atomic write semantic for a RAID.

    While a battery-backed write cache may partially solve the problem, it is applicable only to a
    power-failure scenario.

    Since transactional support is not universally present in hardware RAID, many operating
    systems include transactional support to protect against data loss during an interrupted write.

    Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft
    introduced transaction tracking via the journaling feature in NTFS. ext4 has journaling with
    checksums; ext3 has journaling without checksums but offers an "append-only" option, or
    ext3cow (copy-on-write). If the journal itself in a file system is corrupted, though, this can be
    problematic. The journaling in NetApp's WAFL file system provides atomicity by never
    updating the data in place, as does ZFS. An alternative method to journaling is soft updates,
    which are used in some BSD-derived systems' implementations of UFS.

    Unrecoverable data

    An unrecoverable sector can present as a sector read failure. Some RAID implementations
    protect against this failure mode by remapping the bad sector, using the redundant data to
    retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement
    sector. The UBE (unrecoverable bit error) rate is typically specified as 1 bit in 10^15 for
    enterprise-class drives (SCSI, FC, SAS) and 1 bit in 10^14 for desktop-class drives
    (IDE/ATA/PATA, SATA). Increasing drive capacities and large RAID 5 redundancy groups
    have led to an increasing inability to successfully rebuild a RAID group after a drive failure,
    because an unrecoverable sector is found on the remaining drives. Double-protection schemes
    such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
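    To see why large arrays of high-capacity drives run into this, a back-of-the-envelope estimate
    (the drive sizes below are assumptions chosen for illustration): if a rebuild must successfully
    read B bits from the surviving drives and each bit is unreadable independently with probability
    p, the chance that the rebuild hits at least one unrecoverable error is

        P(\text{URE during rebuild}) = 1 - (1 - p)^{B} \approx 1 - e^{-pB} .

    For example, rebuilding a four-drive RAID 5 built from 2 TB desktop-class drives (p = 10^-14)
    requires reading the three surviving drives in full, B ≈ 3 × 2 × 10^12 × 8 = 4.8 × 10^13 bits,
    giving roughly 1 - e^-0.48 ≈ 38% -- a substantial chance that the rebuild cannot complete
    cleanly, which is part of the motivation for double-parity schemes such as RAID 6.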

    Write cache reliability

    The drive system can acknowledge a write operation as soon as the data is in the cache, without
    waiting for the data to be physically written. This typically occurs in old, non-journaled systems
    such as FAT32, or if the Linux/Unix "writeback" option is chosen without protections such as
    the "soft updates" option (to promote I/O speed while trading away data reliability). A power
    outage or system hang, such as a BSOD, can mean a significant loss of any data queued in such
    a cache.

    Often the write cache is protected by a battery, which mostly solves the problem: if a write fails
    because of a power failure, the controller may complete the pending writes as soon as it is
    restarted. This solution still has potential failure cases: the battery may have worn out, the power
    may be off for too long, the drives could be moved to another controller, or the controller itself
    could fail. Some systems can test the battery periodically, but this leaves the system without a
    fully charged battery for several hours.

    An additional concern about write cache reliability exists, specifically regarding devices
    equipped with a write-back cache: a caching system that reports data as written as soon as it is
    written to the cache, as opposed to the non-volatile medium.[38] The safer cache technique is
    write-through, which reports transactions as written when they are written to the non-volatile
    medium.

    Equipment compatibility