An Introduction to RAID



    An Introduction to RAID

    The Need for RAID

    Data Striping & Redundancy

    Different Types of RAID

    Cost & Performance Issues

    Reliability Issues in RAID

    Implementations

    Problems

RAID (redundant array of independent disks, originally redundant array of inexpensive disks[1][2]) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called "RAID levels", depending on what level of redundancy and performance (via parallel communication) is required.

RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.[3] Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks as a means of dissociating a low-cost expectation from RAID technology.[4]

RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be in a RAID array,[5] which is accessed by the operating system as one single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between two key goals: increased data reliability and increased input/output performance.


There are a number of different RAID levels:

Level 0 -- Striped Disk Array without Fault Tolerance: Provides data striping (spreading out blocks of each file across multiple disk drives) but no redundancy. This improves performance but does not deliver fault tolerance. If one drive fails then all data in the array is lost.

Level 1 -- Mirroring and Duplexing: Provides disk mirroring. Level 1 provides twice the read transaction rate of single disks and the same write transaction rate as single disks.

Level 2 -- Error-Correcting Coding: Not a typical implementation and rarely used, Level 2 stripes data at the bit level rather than the block level.

Level 3 -- Bit-Interleaved Parity: Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service simultaneous multiple requests, is also rarely used.

Level 4 -- Dedicated Parity Drive: A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage of Level 4 is that the parity disk can create write bottlenecks.

Level 5 -- Block-Interleaved Distributed Parity: Provides block-level data striping with parity information distributed across all disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.

Level 6 -- Independent Data Disks with Double Parity: Provides block-level striping with two independent sets of parity data distributed across all disks.

Level 0+1 -- A Mirror of Stripes: Not one of the original RAID levels, two RAID 0 stripes are created, and a RAID 1 mirror is created over them. Used for both replicating and sharing data among disks.

Level 10 -- A Stripe of Mirrors: Not one of the original RAID levels, multiple RAID 1 mirrors are created, and a RAID 0 stripe is created over these.

Level 7: A trademark of Storage Computer Corporation that adds caching to Levels 3 or 4.

RAID S: (also called Parity RAID) EMC Corporation's proprietary striped-parity RAID system used in its Symmetrix storage systems.

New RAID classification

In 1996, the RAID Advisory Board introduced an improved classification of RAID systems. It divides RAID into three types:

Failure-resistant (systems that protect against loss of data due to drive failure).

Failure-tolerant (systems that protect against loss of data access due to failure of any single component).

Disaster-tolerant (systems that consist of two or more independent zones, either of which provides access to stored data).

The original "Berkeley" RAID classifications are still kept as an important historical reference point and also to recognize that RAID levels 0-6 successfully define all known data mapping and protection schemes for disk-based storage systems. Unfortunately, the original classification


caused some confusion due to the assumption that higher RAID levels imply higher redundancy and performance; this confusion has been exploited by RAID system manufacturers, and it has given birth to products with such names as RAID-7, RAID-10, RAID-30, RAID-S, etc. Consequently, the new classification describes the data availability characteristics of a RAID system, leaving the details of its implementation to system manufacturers.

Failure-resistant disk systems (FRDS) (meets a minimum of criteria 1-6)

1. Protection against data loss and loss of access to data due to drive failure
2. Reconstruction of failed drive content to a replacement drive
3. Protection against data loss due to a "write hole"
4. Protection against data loss due to host and host I/O bus failure
5. Protection against data loss due to replaceable unit failure
6. Replaceable unit monitoring and failure indication

Failure-tolerant disk systems (FTDS) (meets a minimum of criteria 1-15)

7. Disk automatic swap and hot swap
8. Protection against data loss due to cache failure
9. Protection against data loss due to external power failure
10. Protection against data loss due to a temperature out of operating range
11. Replaceable unit and environmental failure warning
12. Protection against loss of access to data due to device channel failure
13. Protection against loss of access to data due to controller module failure
14. Protection against loss of access to data due to cache failure
15. Protection against loss of access to data due to power supply failure

Disaster-tolerant disk systems (DTDS) (meets a minimum of criteria 1-21)

16. Protection against loss of access to data due to host and host I/O bus failure
17. Protection against loss of access to data due to external power failure
18. Protection against loss of access to data due to component replacement
19. Protection against loss of data and loss of access to data due to multiple drive failures
20. Protection against loss of access to data due to zone failure
21. Long-distance protection against loss of data due to zone failure


NEED

The need for RAID can be summarized in two points given below. The two keywords are Redundant and Array.

An array of multiple disks accessed in parallel will give greater throughput than a single disk.

Redundant data on multiple disks provides fault tolerance.

Provided that the RAID hardware and software perform true parallel accesses on multiple drives, there will be a performance improvement over a single disk.

With a single hard disk, you cannot protect yourself against the costs of a disk failure: the time required to obtain and install a replacement disk, reinstall the operating system, restore files from backup tapes, and repeat all the data entry performed since the last backup was made.

With multiple disks and a suitable redundancy scheme, your system can stay up and running when a disk fails, and even while the replacement disk is being installed and its data restored.

To create an optimal cost-effective RAID configuration, we need to simultaneously achieve the following goals:

Maximize the number of disks being accessed in parallel.

Minimize the amount of disk space being used for redundant data.

Minimize the overhead required to achieve the above goals.

    Basic RAID Organizations


There are many types of RAID and some of the important ones are introduced below:

Non-Redundant (RAID Level 0)

A non-redundant disk array, or RAID level 0, has the lowest cost of any RAID organization because it does not employ redundancy at all. This scheme offers the best write performance since it never needs to update redundant information. Surprisingly, it does not have the best read performance. Redundancy schemes that duplicate data, such as mirroring, can perform better on reads by selectively scheduling requests on the disk with the shortest expected seek and rotational delays. Without redundancy, any single disk failure will result in data loss. Non-redundant disk arrays are widely used in supercomputing environments where performance and capacity, rather than reliability, are the primary concerns.

Sequential blocks of data are written across multiple disks in stripes, as follows:

[Figure: RAID 0 striping across disks; source: Reference 2]

The size of a data block, which is known as the "stripe width", varies with the implementation, but is always at least as large as a disk's sector size. When it comes time to read back this sequential data, all disks can be read in parallel. In a multi-tasking operating system, there is a high probability that even non-sequential disk accesses will keep all of the disks working in parallel.
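A minimal sketch (not from the original text) of how a RAID 0 array maps a logical block onto a physical disk and an offset; the function and parameter names are illustrative assumptions. Sequential blocks land on consecutive disks, which is why a large sequential read can keep every disk busy in parallel.

def raid0_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk_index, block_within_disk) for a striped (RAID 0) array."""
    disk = logical_block % num_disks      # round-robin across the disks
    offset = logical_block // num_disks   # position of the block on that disk
    return disk, offset

if __name__ == "__main__":
    # Blocks 0..7 on a 4-disk stripe land on disks 0,1,2,3,0,1,2,3.
    for b in range(8):
        print(b, raid0_map(b, 4))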

Mirrored (RAID Level 1)

The traditional solution, called mirroring or shadowing, uses twice as many disks as a non-redundant disk array. Whenever data is written to a disk, the same data is also written to a redundant disk, so that there are always two copies of the information. When data is read, it can be retrieved from the disk with the shorter queuing, seek and rotational delays. If a disk fails, the other copy is used to service requests. Mirroring is frequently used in database applications where availability and transaction time are more important than storage efficiency.


[Figure: RAID 1 mirroring; source: Reference 2]
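A minimal sketch (illustrative, not from the original text) of the mirrored behaviour just described: every write is duplicated on both copies, and a read can be served by whichever disk is currently less busy, which is where RAID 1's read advantage comes from.

class MirroredPair:
    def __init__(self):
        self.disks = [{}, {}]        # two identical copies, block -> data
        self.queue_len = [0, 0]      # stand-in for per-disk queue depths

    def write(self, block, data):
        for d in self.disks:         # every write goes to both disks
            d[block] = data

    def read(self, block):
        i = 0 if self.queue_len[0] <= self.queue_len[1] else 1
        return self.disks[i][block]

    def read_after_failure(self, block, failed_disk):
        # If one disk fails, the surviving copy services all requests.
        return self.disks[1 - failed_disk][block]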

Memory-Style (RAID Level 2)

Memory systems have provided recovery from failed components with much less cost than mirroring by using Hamming codes. Hamming codes contain parity for distinct overlapping subsets of components. In one version of this scheme, four data disks require three redundant disks, one less than mirroring. Since the number of redundant disks is proportional to the log of the total number of disks in the system, storage efficiency increases as the number of data disks increases.

If a single component fails, several of the parity components will have inconsistent values, and the failed component is the one held in common by each incorrect subset. The lost information is recovered by reading the other components in a subset, including the parity component, and setting the missing bit to 0 or 1 to create the proper parity value for that subset. Thus, multiple redundant disks are needed to identify the failed disk, but only one is needed to recover the lost information.

If you are unaware of parity, you can think of the redundant disk as holding the sum of all data on the other disks. When a disk fails, you can subtract all the data on the good disks from the parity disk; the remaining information must be the missing information. Parity is simply this sum modulo 2.

A RAID 2 system would normally have as many data disks as the word size of the computer, typically 32. In addition, RAID 2 requires the use of extra disks to store an error-correcting code for redundancy. With 32 data disks, a RAID 2 system would require 7 additional disks for a Hamming-code ECC. Such an array of 39 disks was the subject of a U.S. patent granted to Unisys Corporation in 1988, but no commercial product was ever released.

For a number of reasons, including the fact that modern disk drives contain their own internal ECC, RAID 2 is not a practical disk array scheme.


[Figure: memory-style ECC (RAID 2) layout; source: Reference 2]

Bit-Interleaved Parity (RAID Level 3)

One can improve upon memory-style ECC disk arrays by noting that, unlike memory component failures, disk controllers can easily identify which disk has failed. Thus, one can use a single parity disk rather than a set of parity disks to recover lost information.

In a bit-interleaved parity disk array, data is conceptually interleaved bit-wise over the data disks, and a single parity disk is added to tolerate any single disk failure. Each read request accesses all data disks and each write request accesses all data disks and the parity disk. Thus, only one request can be serviced at a time. Because the parity disk contains only parity and no data, the parity disk cannot participate in reads, resulting in slightly lower read performance than for redundancy schemes that distribute the parity and data over all disks. Bit-interleaved parity disk arrays are frequently used in applications that require high bandwidth but not high I/O rates. They are also simpler to implement than RAID levels 4, 5, and 6.

Here, the parity disk is written in the same way as the parity bit in normal Random Access Memory (RAM), where it is the Exclusive Or of the 8, 16 or 32 data bits. In RAM, parity is used to detect single-bit data errors, but it cannot correct them because there is no information available to determine which bit is incorrect. With disk drives, however, we rely on the disk controller to report a data read error. Knowing which disk's data is missing, we can reconstruct it as the Exclusive Or (XOR) of all remaining data disks plus the parity disk.

[Figure: bit-interleaved parity layout; source: Reference 2]

As a simple example, suppose we have 4 data disks and one parity disk. The sample bits are:


Disk 0   Disk 1   Disk 2   Disk 3   Parity
  0        1        1        1        1

The parity bit is the XOR of these four data bits, which can be calculated by adding them up and writing a 0 if the sum is even and a 1 if it is odd. Here the sum of Disk 0 through Disk 3 is 3, so the parity is 1. Now if we attempt to read back this data and find that Disk 2 gives a read error, we can reconstruct Disk 2 as the XOR of all the other disks, including the parity. In the example, the sum of Disk 0, 1, 3 and Parity is 3, so the data on Disk 2 must be 1.
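The same example can be checked mechanically. A minimal sketch (illustrative only): parity is the XOR of the data bits, and a "failed" disk is rebuilt by XOR-ing everything that survives.

from functools import reduce

data = {"Disk 0": 0, "Disk 1": 1, "Disk 2": 1, "Disk 3": 1}

parity = reduce(lambda a, b: a ^ b, data.values())        # 0^1^1^1 = 1
print("parity =", parity)

# Pretend Disk 2 reports a read error; rebuild it from the other disks
# plus the parity disk.
survivors = [v for k, v in data.items() if k != "Disk 2"]
rebuilt = reduce(lambda a, b: a ^ b, survivors + [parity])
print("reconstructed Disk 2 =", rebuilt)                   # -> 1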

Block-Interleaved Parity (RAID Level 4)

The block-interleaved parity disk array is similar to the bit-interleaved parity disk array except that data is interleaved across disks in blocks of arbitrary size rather than in bits. The size of these blocks is called the striping unit. Read requests smaller than the striping unit access only a single data disk. Write requests must update the requested data blocks and must also compute and update the parity block. For large writes that touch blocks on all disks, parity is easily computed by exclusive-or'ing the new data for each disk. For small write requests that update only one data disk, parity is computed by noting how the new data differs from the old data and applying those differences to the parity block. Small write requests thus require four disk I/Os: one to write the new data, two to read the old data and old parity for computing the new parity, and one to write the new parity. This is referred to as a read-modify-write procedure. Because a block-interleaved parity disk array has only one parity disk, which must be updated on all write operations, the parity disk can easily become a bottleneck. Because of this limitation, the block-interleaved distributed-parity disk array is universally preferred over the block-interleaved parity disk array.

[Figure: block-interleaved parity (RAID 4) layout; source: Reference 2]
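A minimal sketch (illustrative assumption, not the original text's code) of the small-write read-modify-write parity update described above: the new parity is the old parity XOR the old data XOR the new data, computed byte by byte, and the whole operation costs four disk I/Os (read old data, read old parity, write new data, write new parity).

def small_write_update(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    assert len(old_data) == len(new_data) == len(old_parity)
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# Example: three data blocks and their parity, then block 1 is overwritten.
blocks = [b"\x0f", b"\xf0", b"\x55"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*blocks))      # full-stripe parity
new_block1 = b"\xaa"
parity = small_write_update(blocks[1], new_block1, parity)
blocks[1] = new_block1
assert parity == bytes(a ^ b ^ c for a, b, c in zip(*blocks))  # still consistent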

    Block-Interleaved Distributed-Parity (RAID Level 5)


The block-interleaved distributed-parity disk array eliminates the parity disk bottleneck present in the block-interleaved parity disk array by distributing the parity uniformly over all of the disks. An additional, frequently overlooked advantage of distributing the parity is that it also distributes data over all of the disks rather than over all but one. This allows all disks to participate in servicing read operations, in contrast to redundancy schemes with dedicated parity disks in which the parity disk cannot participate in servicing read requests. Block-interleaved distributed-parity disk arrays have the best small read and large write performance of any redundant disk array. Small write requests are somewhat inefficient compared with redundancy schemes such as mirroring, however, due to the need to perform read-modify-write operations to update parity. This is the major performance weakness of RAID level 5 disk arrays.

The exact method used to distribute parity in block-interleaved distributed-parity disk arrays can affect performance. The following figure illustrates left-symmetric parity distribution.

[Figure: left-symmetric parity distribution. Each square corresponds to a stripe unit and each column of squares corresponds to a disk. P0 computes the parity over stripe units 0, 1, 2 and 3; P1 computes parity over stripe units 4, 5, 6, and 7; etc. (source: Reference 1)]

A useful property of the left-symmetric parity distribution is that whenever you traverse the striping units sequentially, you will access each disk once before accessing any disk twice. This property reduces disk conflicts when servicing large requests.

[Figure: RAID 5 data and parity layout; source: Reference 2]
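A minimal sketch (illustrative) of a left-symmetric layout for an array of n disks: in stripe s the parity block sits on disk (n-1-s) mod n, and the data blocks fill the remaining disks starting just after the parity disk. Printing a few stripes reproduces the rotation shown in the figure.

def left_symmetric_layout(num_disks: int, num_stripes: int):
    layout = []
    block = 0
    for s in range(num_stripes):
        parity_disk = (num_disks - 1 - s) % num_disks
        row = [None] * num_disks
        row[parity_disk] = f"P{s}"
        for i in range(num_disks - 1):
            disk = (parity_disk + 1 + i) % num_disks
            row[disk] = str(block)
            block += 1
        layout.append(row)
    return layout

for row in left_symmetric_layout(5, 5):
    print(row)
# Traversing stripe units 0, 1, 2, ... in order touches every disk once
# before touching any disk twice.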


P+Q Redundancy (RAID Level 6)

Parity is a redundancy code capable of correcting any single, self-identifying failure. As larger disk arrays are considered, multiple failures become possible and stronger codes are needed. Moreover, when a disk fails in a parity-protected disk array, recovering the contents of the failed disk requires successfully reading the contents of all non-failed disks. The probability of encountering an uncorrectable read error during recovery can be significant. Thus, applications with more stringent reliability requirements require stronger error-correcting codes.

One such scheme, called P+Q redundancy, uses Reed-Solomon codes to protect against up to two disk failures using the bare minimum of two redundant disks. P+Q redundant disk arrays are structurally very similar to block-interleaved distributed-parity disk arrays and operate in much the same manner. In particular, P+Q redundant disk arrays also perform small write operations using a read-modify-write procedure, except that instead of four disk accesses per write request, P+Q redundant disk arrays require six disk accesses due to the need to update both the 'P' and 'Q' information.
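The text does not spell out the code construction, but a common way to realize P+Q (a Reed-Solomon-style code, and an assumption here rather than the original's exact formulation) is P = XOR of the data blocks and Q = XOR of the data blocks each multiplied by a distinct power of a generator in GF(2^8). A minimal sketch:

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return p

def p_and_q(data: list[int]) -> tuple[int, int]:
    p = q = 0
    g_i = 1                       # g^i with generator g = 2
    for d in data:
        p ^= d
        q ^= gf_mul(g_i, d)
        g_i = gf_mul(g_i, 2)
    return p, q

data = [0x12, 0x34, 0x56, 0x78]
P, Q = p_and_q(data)
# With both P and Q, two lost blocks give two independent equations over
# GF(2^8) and can be solved for; a single lost block can be rebuilt from P alone.
rebuilt = P ^ data[0] ^ data[1] ^ data[3]
assert rebuilt == data[2]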

Striped Mirrors (RAID Level 10)

RAID 10 was not mentioned in the original 1988 article that defined RAID 1 through RAID 5. The term is now used to mean the combination of RAID 0 (striping) and RAID 1 (mirroring). Disks are mirrored in pairs for redundancy and improved performance, then data is striped across multiple disks for maximum performance. In the diagram below, Disks 0 & 2 and Disks 1 & 3 are mirrored pairs.

Obviously, RAID 10 uses more disk space to provide redundant data than RAID 5. However, it also provides a performance advantage by reading from all disks in parallel while eliminating the write penalty of RAID 5. In addition, RAID 10 gives better performance than RAID 5 while a failed drive remains unreplaced. Under RAID 5, each attempted read of the failed drive can be performed only by reading all of the other disks. On RAID 10, a failed disk can be recovered by a single read of its mirrored pair.

[Figure: RAID 10 striped mirrors; source: Reference 2]
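A minimal sketch (illustrative) of the RAID 10 mapping just described: logical blocks are striped across mirrored pairs, every write hits both members of a pair, and a read can be served by either member. The pairing below follows the text (Disks 0 & 2 and Disks 1 & 3 are mirrored pairs).

PAIRS = [(0, 2), (1, 3)]

def raid10_map(logical_block: int):
    pair = PAIRS[logical_block % len(PAIRS)]   # stripe across the mirrored pairs
    offset = logical_block // len(PAIRS)       # block position within the pair
    return pair, offset

for b in range(4):
    (primary, mirror), off = raid10_map(b)
    print(f"block {b}: written to disks {primary} and {mirror}, offset {off}")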


Tool to calculate storage efficiency given the number of disks and the RAID level (source: Reference 3)

    RAID Systems Need Tape Backups

    It is worth remembering an important point about RAID systems. Even when you use a

    redundancy scheme like mirroring or RAID 5 or RAID 10, you must still do regular tape

    backups of your system. There are several reasons for insisting on this, among them:

    RAID does not protect you from multiple disk failures. While one disk is off line for any reason,

    your disk array is not fully redundant.

    Regular tape backups allow you to recover from data loss that is not related to a disk failure.

    This includes human errors, hardware errors, and software errors.


There are three important considerations when making a selection as to which RAID level is to be used for a system, viz. cost, performance and reliability.

There are many different ways to measure these parameters; for example, performance could be measured as I/Os per second per dollar, bytes per second, or response time. We could also compare systems at the same cost, the same total user capacity, the same performance or the same reliability. The method used largely depends on the application and the reason to compare. For example, in transaction processing applications the primary base for comparison would be I/Os per second per dollar, while in scientific applications we would be more interested in bytes per second per dollar. In some heterogeneous systems, like file servers, both I/Os per second and bytes per second may be important. Sometimes it is important to consider reliability as the base for comparison.

Taking a closer look at the RAID levels, we observe that most of the levels are similar to each other. RAID level 1 and RAID level 3 disk arrays can be viewed as a subclass of RAID level 5 disk arrays. Also, RAID level 2 and RAID level 4 disk arrays are generally found to be inferior to RAID level 5 disk arrays. Hence the problem of selecting among RAID levels 1 through 5 is a subset of the more general problem of choosing an appropriate parity group size and striping unit for RAID level 5 disk arrays.

    Some Comparisons

Given below is a table that compares the throughput of various redundancy schemes for four types of I/O requests. The I/O requests are basically reads and writes, which are divided into small and large ones. Remembering the fact that our data has been spread over multiple disks (data striping), a small request refers to an I/O request of one striping unit, while a large I/O request refers to a request of one full stripe (one stripe unit from each disk in an error-correcting group).

RAID Type       Small Read   Small Write      Large Read   Large Write   Storage Efficiency
RAID Level 0        1            1                1            1              1
RAID Level 1        1            1/2              1            1/2            1/2
RAID Level 3        1/G          1/G              (G-1)/G      (G-1)/G        (G-1)/G
RAID Level 5        1            max(1/G, 1/4)    1            (G-1)/G        (G-1)/G
RAID Level 6        1            max(1/G, 1/6)    1            (G-2)/G        (G-2)/G

G: the number of disks in an error correction group.

The table above tabulates the maximum throughput per dollar relative to RAID level 0 for RAID levels 0, 1, 3, 5 and 6. For practical purposes we consider RAID levels 2 & 4 inferior to RAID level 5 disk arrays, so we don't show the comparisons. The cost of a system is directly proportional to the number of disks it uses in the disk array. Thus the table shows us that, given equivalent-cost RAID level 0 and RAID level 1 systems, the RAID level 1 system can sustain half the number of small writes per second that a RAID level 0 system can sustain. Equivalently, small writes are twice as expensive in a RAID level 1 system as in a RAID level 0 system.

The table also shows the storage efficiency of each RAID level. The storage efficiency is approximately the inverse of the cost of each unit of user capacity relative to a RAID level 0 system. The storage efficiency is equal to the performance/cost metric for large writes.
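The table can be expressed directly as functions of the parity group size G. A minimal sketch (illustrative) that reproduces the throughput-per-dollar and storage-efficiency figures above, relative to RAID level 0:

def relative_throughput(level: int, G: int) -> dict:
    table = {
        0: dict(small_read=1,   small_write=1,             large_read=1,       large_write=1,       storage_eff=1),
        1: dict(small_read=1,   small_write=0.5,           large_read=1,       large_write=0.5,     storage_eff=0.5),
        3: dict(small_read=1/G, small_write=1/G,           large_read=(G-1)/G, large_write=(G-1)/G, storage_eff=(G-1)/G),
        5: dict(small_read=1,   small_write=max(1/G, 1/4), large_read=1,       large_write=(G-1)/G, storage_eff=(G-1)/G),
        6: dict(small_read=1,   small_write=max(1/G, 1/6), large_read=1,       large_write=(G-2)/G, storage_eff=(G-2)/G),
    }
    return table[level]

print(relative_throughput(5, G=8))   # e.g. storage efficiency 7/8 with an 8-disk parity group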


[Figure: performance/cost of RAID levels 1, 3, 5 and 6 over a range of parity group sizes; source: Reference 1]

The figures above graph the performance/cost metrics from the table above for RAID levels 1, 3, 5 and 6 over a range of parity group sizes. The performance/cost of RAID level 1 systems is equivalent to the performance/cost of RAID level 5 systems when the parity group size is equal to 2. The performance/cost of RAID level 3 systems is always less than or equal to the performance/cost of RAID level 5 systems. This is expected, given that a RAID level 3 system is a subclass of RAID level 5 systems derived by restricting the striping unit size such that all requests access exactly a parity stripe of data. Since the configuration of RAID level 5 systems is not subject to such a restriction, the performance/cost of RAID level 5 systems can never be less than that of an equivalent RAID level 3 system. Of course, such generalizations are specific to the models of disk arrays used in the above experiments. In reality, a specific implementation of a


    RAID level 3 system can have better performance/cost than a specific implementation of a RAID

    level 5 system.

The question of which RAID level to use is better expressed as more general configuration questions concerning the size of the parity group and striping unit. For a parity group size of 2, mirroring is desirable, while for a very small striping unit RAID level 3 would be well suited.

    The figure below plots the performance/cost metrics from the table above for RAID levels 3, 5 &

    6.

[Figure: performance/cost of RAID levels 3, 5 and 6; source: Reference 1]


Reliability

Reliability of any I/O system has become as important as its performance and cost. This part of the tutorial:

Reviews the basic reliability provided by a block-interleaved parity disk array.

Lists and discusses three factors that can determine the potential reliability of disk arrays.

Redundancy in disk arrays is motivated by the need to fight disk failures. Two key factors, MTTF (Mean Time To Failure) and MTTR (Mean Time To Repair), are of primary concern in estimating the reliability of any disk. Following are some formulae for the mean time between failures:

RAID level 5

               MTTF(disk)^2
    ------------------------------
        N * (G-1) * MTTR(disk)

Disk array with two redundant disks per parity group (e.g. P+Q redundancy)

               MTTF(disk)^3
    ------------------------------------
     N * (G-1) * (G-2) * MTTR(disk)^2

N - total number of disks in the system
G - number of disks in the parity group
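A minimal sketch (illustrative) of the two formulas above; the disk MTTF/MTTR numbers in the example are assumptions, not values from the text.

def mttf_raid5(mttf_disk, mttr_disk, N, G):
    return mttf_disk**2 / (N * (G - 1) * mttr_disk)

def mttf_pq(mttf_disk, mttr_disk, N, G):
    return mttf_disk**3 / (N * (G - 1) * (G - 2) * mttr_disk**2)

HOURS_PER_YEAR = 24 * 365
print(mttf_raid5(200_000, 24, N=16, G=8) / HOURS_PER_YEAR, "years (RAID 5)")
print(mttf_pq(200_000, 24, N=16, G=8) / HOURS_PER_YEAR, "years (P+Q)")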

    Factors affecting Reliability


    Three factors that can dramatically affect the reliability of disk arrays are:

    System crashes

    Uncorrectable bit-errors

    Correlated disk failures

System crashes

A system crash refers to any event such as a power failure, operator error, hardware breakdown, or software crash that can interrupt an I/O operation to a disk array.

Such crashes can interrupt write operations, resulting in states where the data is updated and the parity is not updated, or vice versa. In either case, parity is inconsistent and cannot be used in the event of a disk failure. Techniques such as redundant hardware and power supplies can be applied to make such crashes less frequent.

System crashes can cause parity inconsistencies in both bit-interleaved and block-interleaved disk arrays, but the problem is of practical concern only in block-interleaved disk arrays. For reliability purposes, system crashes in block-interleaved disk arrays are similar to disk failures in that they may result in the loss of the correct parity for stripes that were modified during the crash.

Uncorrectable bit-errors

Most uncorrectable bit errors are generated because data is incorrectly written or gradually damaged as the magnetic media ages. These errors are detected only when we attempt to read the data. Our interpretation of uncorrectable bit error rates is that they represent the rate at which errors are detected during reads from the disk during the normal operation of the disk drive.

One approach that can be used with or without redundancy is to try to protect against bit errors by predicting when a disk is about to fail. VAXsimPLUS, a product from DEC, monitors the warnings issued by disks and notifies an operator when it feels the disk is about to fail.


Correlated disk failures

Causes: common environmental and manufacturing factors. For example, an accident might sharply increase the failure rate for all disks in a disk array for a short period of time. In general, power surges, power failures and simply switching the disks on and off can place stress on the electrical components of all affected disks. Disks also share common support hardware; when this hardware fails, it can lead to multiple, simultaneous disk failures.

Disks are generally more likely to fail either very early or very late in their lifetimes. Early failures are frequently caused by transient defects which may not have been detected during the manufacturer's burn-in process. Late failures occur when a disk wears out. Correlated disk failures greatly reduce the reliability of disk arrays by making it much more likely that an initial disk failure will be closely followed by additional disk failures before the failed disk can be reconstructed.

Mean-Time-To-Data-Loss (MTTDL)

Following are some formulae to calculate the mean time to data loss (MTTDL). In a block-interleaved parity-protected disk array, data loss is possible through the following three common ways:

double disk failures

system crash followed by a disk failure

disk failure followed by an uncorrectable bit error during reconstruction

The above three failure modes are the hardest failure combinations, in that we currently do not have any techniques to protect against them without sacrificing performance.

RAID Level 5

Double disk failure:

               MTTF(disk)^2
    ------------------------------
        N * (G-1) * MTTR(disk)

System crash + disk failure:

      MTTF(system) * MTTF(disk)
    ------------------------------
          N * MTTR(system)

Disk failure + bit error:

               MTTF(disk)
    ------------------------------
      N * (1 - p(disk)^(G-1))

Software RAID: harmonic sum of the above
Hardware RAID: harmonic sum of the above, excluding system crash + disk failure

Failure characteristics for RAID Level 5 disk arrays (source: Reference 1)

P+Q disk array

Triple disk failure:

                MTTF(disk)^3
    --------------------------------------
     N * (G-1) * (G-2) * MTTR(disk)^2

System crash + disk failure:

      MTTF(system) * MTTF(disk)
    ------------------------------
          N * MTTR(system)

Double disk failure + bit error:

                MTTF(disk)^2
    ----------------------------------------------
     N * (G-1) * (1 - p(disk)^(G-2)) * MTTR(disk)

Software RAID: harmonic sum of the above
Hardware RAID: harmonic sum of the above, excluding system crash + disk failure

Failure characteristics for a P+Q disk array (source: Reference 1)

p(disk) = the probability of reading all sectors on a disk (derived from disk size, sector size, and BER)

Tool for reliability using the above equations (source: Reference 3)
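A minimal sketch (illustrative) that combines the RAID level 5 failure modes above into a single mean time to data loss via the harmonic sum the table prescribes; all of the input numbers are assumptions.

def mttdl_raid5(mttf_disk, mttr_disk, mttf_sys, mttr_sys, p_disk, N, G,
                hardware_raid=False):
    double_disk = mttf_disk**2 / (N * (G - 1) * mttr_disk)
    crash_plus_disk = (mttf_sys * mttf_disk) / (N * mttr_sys)
    disk_plus_bit_error = mttf_disk / (N * (1 - p_disk**(G - 1)))
    modes = [double_disk, disk_plus_bit_error]
    if not hardware_raid:            # hardware RAID excludes system crash + disk failure
        modes.append(crash_plus_disk)
    return 1 / sum(1 / m for m in modes)     # harmonic sum of the individual terms

print(mttdl_raid5(mttf_disk=200_000, mttr_disk=24, mttf_sys=720, mttr_sys=1,
                  p_disk=0.999, N=16, G=8) / (24 * 365), "years")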

Implementations

The distribution of data across multiple drives can be managed either by dedicated computer hardware or by software. A software solution may be part of the operating system, or it may be part of the firmware and drivers supplied with a hardware RAID controller.

Software-based RAID

Software RAID implementations are now provided by many operating systems. Software RAID can be implemented as:

a layer that abstracts multiple devices, thereby providing a single virtual device (e.g. Linux's md).

a more generic logical volume manager (provided with most server-class operating systems, e.g. Veritas or LVM).

a component of the file system (e.g. ZFS or Btrfs).

Volume manager support

Server-class operating systems typically provide logical volume management, which allows a system to use logical volumes which can be resized or moved. Often, features like RAID or snapshots are also supported.

Vinum is a logical volume manager supporting RAID-0, RAID-1, and RAID-5. Vinum is part of the base distribution of the FreeBSD operating system, and versions exist for NetBSD, OpenBSD, and DragonFly BSD.


    Solaris SVM supports RAID 1 for the boot filesystem, and adds RAID 0 and RAID 5 support (and

    various nested combinations) for data drives.

Linux LVM supports RAID 0 and RAID 1.

HP's OpenVMS provides a form of RAID 1 called "volume shadowing", giving the possibility to mirror data locally and at remote cluster systems.

File system support

Some advanced file systems are designed to organize data across multiple storage devices directly (without needing the help of a third-party logical volume manager).

ZFS supports equivalents of RAID 0, RAID 1, RAID 5 (RAID-Z), RAID 6 (RAID-Z2), and a triple-parity version, RAID-Z3, as well as any nested combination of those, like 1+0. ZFS is the native file system on Solaris and is also available on FreeBSD.

Btrfs supports RAID 0, RAID 1, and RAID 10 (RAID 5 and 6 are under development).

Other support

Many operating systems provide basic RAID functionality independently of volume management.

Apple's Mac OS X Server[19] and Mac OS X[20] support RAID 0, RAID 1, and RAID 1+0.

FreeBSD supports RAID 0, RAID 1, RAID 3, and RAID 5, and all nestings, via GEOM modules[21][22] and ccd.[23]

Linux's md supports RAID 0, RAID 1, RAID 4, RAID 5, RAID 6, and all nestings.[24][25] Certain reshaping/resizing/expanding operations are also supported.[26]

Microsoft's server operating systems support RAID 0, RAID 1, and RAID 5. Some of the Microsoft desktop operating systems also support RAID; for example, Windows XP Professional supports RAID level 0, in addition to spanning multiple drives, but only if using dynamic disks and volumes. Windows XP can be modified to support RAID 0, 1, and 5.[27]

NetBSD supports RAID 0, RAID 1, RAID 4, and RAID 5, and all nestings, via its software implementation, named RAIDframe.

OpenBSD aims to support RAID 0, RAID 1, RAID 4, and RAID 5 via its software implementation, softraid.

FlexRAID (for Linux and Windows) is a snapshot RAID implementation.

Software RAID has advantages and disadvantages compared to hardware RAID. The software must run on a host server attached to the storage, and the server's processor must dedicate processing time to run the RAID software; the additional processing capacity required for RAID 0 and RAID 1 is low, but parity-based arrays require more complex data processing during write or integrity-checking operations. As the rate of data processing increases with the number of drives in the array, so does the processing requirement. Furthermore, all the buses between the processor and the drive controller must carry the extra data required by RAID, which may cause congestion.

Fortunately, over time, the increase in commodity CPU speed has been consistently greater than the increase in drive throughput;[28] the percentage of host CPU time required to saturate a given

  • 7/28/2019 An Introduction to RAID

    21/30

    number of drives has decreased. For instance, under 100% usage of a single core on a 2.1 GHz
    Intel "Core 2" CPU, the Linux software RAID subsystem (md), as of kernel version 2.6.26, is
    capable of calculating parity information at 6 GB/s; a three-drive RAID 5 array built from drives
    that can each sustain a 100 MB/s write only requires parity to be calculated at 200 MB/s, which
    consumes just over 3% of a single CPU core.

    Furthermore, software RAID implementations may employ more sophisticated algorithms than
    hardware RAID implementations (e.g. drive scheduling and command queueing), and thus may
    be capable of better performance.
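    The parity work itself is just an XOR across the data blocks of each stripe, which is why the CPU
    cost above is so small (200 MB/s of XOR against a roughly 6 GB/s capability is about 3%). The
    following Python sketch is purely illustrative -- the function names are mine, and the real md
    driver is heavily optimized kernel code -- but it shows the calculation being discussed:

        from functools import reduce

        def raid5_parity(data_blocks):
            """XOR the data blocks of a stripe together to produce the parity block.

            In a three-drive RAID 5, each stripe holds two data blocks and one parity
            block, so sustaining 100 MB/s of writes per drive means XOR-ing roughly
            200 MB/s of data -- a small fraction of what one modern CPU core can do.
            """
            assert data_blocks and all(len(b) == len(data_blocks[0]) for b in data_blocks)
            return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*data_blocks))

        def recover_block(surviving_blocks, parity):
            """Rebuild a missing data block by XOR-ing the parity with the surviving blocks."""
            return raid5_parity(surviving_blocks + [parity])

        # Example: one 4-byte stripe across two data drives plus one parity drive.
        d0, d1 = b"\x0f\x00\xff\x12", b"\xf0\x0f\x0f\x21"
        p = raid5_parity([d0, d1])
        assert recover_block([d1], p) == d0   # the contents of a failed drive 0 are recoverable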

    Another concern with software implementations is the process of booting the associated
    operating system. For instance, consider a computer booted from a RAID 1 (mirrored drives):
    if the first drive in the RAID 1 fails, a first-stage boot loader might not be sophisticated enough
    to attempt loading the second-stage boot loader from the second drive as a fallback. In contrast,
    a RAID 1 hardware controller typically has explicit programming to decide that a drive has
    malfunctioned and that the next drive should be used. At least the following second-stage boot
    loaders are capable of loading a kernel from a RAID 1:

    LILO (for Linux).

    Some configurations of GRUB.

    The boot loader for FreeBSD.[29]

    The boot loader for NetBSD.

    For data safety, the write-back cache of an operating system or individual drive might need to be
    turned off in order to ensure that as much data as possible is actually written to secondary
    storage before some failure (such as a loss of power); unfortunately, turning off the write-back
    cache has a performance penalty that can be significant depending on the workload and on
    command-queuing support. In contrast, a hardware RAID controller may carry a dedicated
    battery-powered write-back cache of its own, thereby allowing for efficient operation that is also
    relatively safe. Fortunately, it is possible to avoid such problems with a software controller by
    constructing the RAID from safer components; for instance, each drive could have its own
    battery or capacitor on its own write-back cache, the drives could implement atomicity in
    various ways, and the entire RAID or computing system could be powered by a UPS.

    Finally, a software RAID controller that is built into an operating system usually uses
    proprietary data formats and RAID levels, so an associated RAID usually cannot be shared
    between operating systems as part of a multi-boot setup. However, such a RAID may be moved
    between computers that share the same operating system; in contrast, such mobility is more
    difficult with a hardware RAID controller, because both computers must provide compatible
    hardware controllers. Also, if the hardware controller fails, data could become unrecoverable
    unless a hardware controller of the same type is obtained.

    Most software implementations allow a RAID to be created from partitions rather than entire
    physical drives. For instance, an administrator could divide each drive of an odd number of
    drives into two partitions, and then mirror partitions across drives and stripe a volume across the
    mirrored partitions to emulate IBM's RAID 1E configuration (a rough address-mapping sketch
    of such a layout appears after the examples below). Using partitions in this way also

    allows for constructing multiple RAIDs in various RAID levels from the same set of drives. For
    example, one could have a very robust RAID 1 for important files and a less robust RAID 5 or
    RAID 0 for less important data, all using the same set of underlying drives. (Some BIOS-based
    controllers offer similar features, e.g. Intel Matrix RAID.) Using two partitions from the same
    drive in the same RAID puts data at risk if the drive fails; for instance:

    A RAID 1 across partitions from the same drive makes all the data inaccessible if the single drive

    fails.

    Consider a RAID 5 composed of 4 drives, 3 of which are 250 GB and one of which is 500 GB; the

    500 GB drive is split into 2 partitions, each of which is 250 GB. Then, a failure of the 500 GB drive

    would remove 2 underlying 'drives' from the array, causing a failure of the entire array.
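    As a rough illustration of the partition-based mirroring described above, the Python sketch below
    maps each logical block to two different drives across an odd number of drives, so every block is
    mirrored even though no two whole drives are identical. The function name and the exact
    interleaving are assumptions chosen for illustration, not IBM's actual RAID 1E on-disk format:

        def raid1e_locations(logical_block, num_drives):
            """Return (drive, physical_block) pairs for the two copies of a logical block.

            Consecutive copies go to consecutive drives, wrapping around, so an odd
            number of drives can still mirror every block: no two whole drives are
            identical, but every block exists twice, on two different drives.
            """
            assert num_drives >= 3 and num_drives % 2 == 1
            slot = 2 * logical_block                      # each logical block occupies two slots
            primary = (slot % num_drives, slot // num_drives)
            mirror = ((slot + 1) % num_drives, (slot + 1) // num_drives)
            return [primary, mirror]

        # With 3 drives: block 0 -> drives 0 and 1, block 1 -> drives 2 and 0, block 2 -> drives 1 and 2, ...
        for blk in range(4):
            print(blk, raid1e_locations(blk, 3))

    Losing any single drive leaves at least one copy of every block intact, which is exactly the
    property the mirrored-partition arrangement is trying to achieve.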

    Hardware-based RAID

    Hardware RAID controllers use proprietary data layouts, so it is not usually possible to span

    controllers from different manufacturers. They do not require processor resources, the BIOS can

    boot from them, and tighter integration with the device driver may offer better error handling.

    On a desktop system, a hardware RAID controller may be an expansion card connected to a bus
    (e.g., PCI or PCIe) or a component integrated into the motherboard; there are controllers
    supporting most types of drive technology, such as IDE/ATA, SATA, SCSI, SSA, Fibre Channel,
    and sometimes even a combination. The controller and drives may be in a stand-alone enclosure
    rather than inside a computer, and the enclosure may be directly attached to a computer or
    connected via a SAN.

    Most hardware implementations provide a read/write cache which, depending on the I/O
    workload, improves performance. In most systems the write cache is non-volatile (i.e. battery-
    protected), so pending writes are not lost in the event of a power failure.

    Hardware implementations provide guaranteed performance, add no computational overhead to
    the host computer, and can support many operating systems; the controller simply presents the
    RAID as another logical drive.

    Firmware/driver-based RAID

    A RAID implemented at the level of an operating system is not always compatible with the

    system's boot process, and it is generally impractical for desktop versions of Windows (as

    described above). However, hardware RAID controllers are expensive and proprietary. To fill

    this gap, cheap "RAID controllers" were introduced that do not contain a dedicated RAID
    controller chip, but simply a standard drive controller chip with special firmware and drivers:
    during early-stage bootup the RAID is implemented by the firmware, and once the operating
    system has been more completely loaded, the drivers take over control. Consequently, such
    controllers may not work when driver support is not available for the host operating system.[30]

    Initially, the term "RAID controller" implied that the controller does the processing. However,

    while a controller without a dedicated RAID chip is often described by a manufacturer as a

    "RAID controller", it is rarely made clear that the burden of RAID processing is borne by a host

    computer's central processing unit rather than the RAID controller itself. Thus, this new type is

    sometimes called "fake" RAID;Adapteccalls it a "HostRAID".

    Moreover, a firmware controller can often only support certain types of hard drive to form the
    RAID that it manages (e.g. SATA for an Intel Matrix RAID, as there is neither SCSI nor PATA
    support in modern Intel ICH southbridges; however, motherboard makers implement RAID
    controllers outside of the southbridge on some motherboards).

    Hot spares

    Both hardware and software RAIDs with redundancy may support the use of a hot spare drive:
    a drive physically installed in the array that remains inactive until an active drive fails, at which
    point the system automatically replaces the failed drive with the spare and rebuilds the array
    with the spare drive included. This reduces the mean time to recovery (MTTR), but does not
    completely eliminate it. As with non-hot-spare systems, subsequent additional failure(s) in the
    same RAID redundancy group before the array is fully rebuilt can cause data loss. Rebuilding
    can take several hours, especially on busy systems.

    It is sometimes considered that if drives are procured and installed at the same time, several
    drives are more likely to fail at about the same time than unrelated drives would be, so rapid
    replacement of a failed drive is important. RAID 6 without a spare uses the same number of
    drives as RAID 5 with a hot spare and protects data against failure of up to two drives, but
    requires a more advanced RAID controller. Further, a hot spare can be shared by multiple
    RAID sets.

    Data scrubbing / Patrol read

    Data scrubbing is the periodic reading and checking, by the RAID controller, of all the blocks in
    a RAID, including those not otherwise accessed. This allows bad blocks to be detected before
    they are used.[31] An alternate name for this is patrol read: a check for bad blocks on each
    storage device in an array that also uses the redundancy of the array to recover bad blocks on a
    single drive and to reassign the recovered data to spare blocks elsewhere on the drive.[32]
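    Conceptually a scrub pass is just "read everything, and repair anything unreadable from the
    redundant copy". The Python sketch below illustrates this for a two-way mirror; the block size,
    function name, and file-backed "devices" are assumptions for illustration -- a real controller
    works on raw devices, throttles itself against foreground I/O, and lets the drive remap sectors:

        import os

        BLOCK = 64 * 1024  # scrub granularity (illustrative; real controllers use their own stripe size)

        def scrub_mirror(primary_path, mirror_path):
            """Patrol-read a two-way mirror: read every block from both members and
            rewrite any block one member fails to return, using the other member's copy.
            Returns the number of blocks repaired."""
            repaired = 0
            size = os.path.getsize(primary_path)
            with open(primary_path, "r+b") as a, open(mirror_path, "r+b") as b:
                for offset in range(0, size, BLOCK):
                    copies = []
                    for member in (a, b):
                        member.seek(offset)
                        try:
                            copies.append(member.read(BLOCK))
                        except OSError:          # an unreadable (bad) block was detected
                            copies.append(None)
                    if copies[0] is None and copies[1] is not None:
                        a.seek(offset); a.write(copies[1]); repaired += 1
                    elif copies[1] is None and copies[0] is not None:
                        b.seek(offset); b.write(copies[0]); repaired += 1
            return repaired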

    Reliability terms

    Failure rate

    Two different kinds of failure rates are applicable to RAID systems. Logical failure is defined as

    the loss of a single drive and its rate is equal to the sum of individual drives' failure rates. System

    failure is defined as loss of data and its rate will depend on the type of RAID. For RAID 0 this is

    equal to the logical failure rate, as there is no redundancy. For other types of RAID, it will be less

    than the logical failure rate, potentially very small, and its exact value will depend on the type of

    RAID, the number of drives employed, the vigilance and alacrity of its human administrators,

    and chance (improbable events do occur, though infrequently).
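    A minimal formulation of the two rates, assuming n identical drives each failing at a constant
    rate \lambda (the notation here is an addition; the text above states the relationship only in words):

        \lambda_{\text{logical}} = \sum_{i=1}^{n} \lambda_i = n\lambda ,
        \qquad
        \lambda_{\text{system}} \le \lambda_{\text{logical}} ,

    with equality for RAID 0, which has no redundancy, and with redundant levels pushing the
    system failure rate far below n\lambda.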

    Mean time to data loss (MTTDL)

    In this context, the average time before a loss of data in a given array.[33] The mean time to data
    loss of a given RAID may be higher or lower than that of its constituent hard drives, depending
    upon what type of RAID is employed. The referenced report assumes times to data loss are
    exponentially distributed, so that 63.2% of all data loss will occur between time 0 and the
    MTTDL.
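    The 63.2% figure follows directly from the exponential assumption: if the time to data loss T is
    exponentially distributed with mean MTTDL, then

        P(T \le \mathrm{MTTDL}) = 1 - e^{-\mathrm{MTTDL}/\mathrm{MTTDL}} = 1 - e^{-1} \approx 0.632 .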

    Mean time to recovery (MTTR)

    In arrays that include redundancy for reliability, this is the time following a failure to restore an
    array to its normal failure-tolerant mode of operation. This includes the time to replace a failed
    drive mechanism and the time to rebuild the array (i.e., to replicate data for redundancy).

    Unrecoverable bit error rate (UBE)

    This is the rate at which a drive will be unable to recover data after application of cyclic

    redundancy check (CRC) codes and multiple retries.

    Write cache reliability

    Some RAID systems use a RAM write cache to increase performance. A power failure can result
    in data loss unless this sort of drive buffer has a supplementary battery to ensure that the buffer
    has time to write from RAM to secondary storage before the drive powers down.

    Atomic write failure

    Also known by various terms such as torn writes, torn pages, incomplete writes, interrupted
    writes, and non-transactional writes.

    Problems with RAID

    Correlated failures

    The theory behind the error correction in RAID assumes that failures of drives are independent.
    Given that assumption, it is possible to calculate how often the drives can fail and to arrange the
    array to make data loss arbitrarily improbable. There is also an assumption that motherboard
    failures will not damage the hard drives, and that hard drive failures occur more often than
    motherboard failures.

    In practice, the drives are often the same age (with similar wear) and subject to the same
    environment. Since many drive failures are due to mechanical issues (which are more likely on
    older drives), this violates those assumptions; failures are in fact statistically correlated.

    In practice, the chance of a second failure before the first has been recovered (causing data loss)
    is not as low as it would be for independent random failures. In a study covering about 100,000
    drives, the probability of two drives in the same cluster failing within one hour was observed to
    be four times larger than predicted by the exponential statistical distribution, which characterizes
    processes in which events occur continuously and independently at a constant average rate. The
    probability of two failures within the same 10-hour period was twice as large as predicted by an
    exponential distribution.[34]
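    For reference, the baseline those observations were compared against is simply the exponential
    (memoryless) model: for events occurring independently at a constant average rate \lambda, the
    probability of the next failure arriving within a window t is

        P(\text{next failure within } t) = 1 - e^{-\lambda t} \approx \lambda t
        \quad \text{for } \lambda t \ll 1 ,

    and the measured clustering of failures exceeded this prediction by the factors quoted above.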

    A common assumption is that "server-grade" drives fail less frequently than consumer-grade
    drives. Two independent studies (one by Carnegie Mellon University and the other by Google)
    have shown that the "grade" of a drive does not relate to the drive's failure rate.[35][36]

    In addition, there is no protection circuitry between the motherboard and the hard drive
    electronics, so a catastrophic failure of the motherboard can cause the hard drive electronics to
    fail as well. Taking elaborate precautions via RAID setups therefore ignores the comparable risk
    of electronics failures elsewhere, which can cascade into a hard drive failure. For a robust
    critical-data system, no one risk can be allowed to outweigh another, as the consequence of any
    data loss is unacceptable.

    Atomicity

    This is a little-understood and rarely mentioned failure mode for redundant storage systems that
    do not utilize transactional features. Database researcher Jim Gray wrote "Update in Place is a
    Poison Apple"[37] during the early days of relational database commercialization. However,
    this warning largely went unheeded and fell by the wayside upon the advent of RAID, which
    many software engineers mistook as solving all data storage integrity and reliability problems.
    Many software programs update a storage object "in place"; that is, they write a new version of
    the object onto the same secondary storage addresses as the old version. While the software may
    also log some delta information elsewhere, it expects the storage to present "atomic write
    semantics", meaning that the write of the data either occurred in its entirety or did not occur at
    all.

    However, very few storage systems provide support for atomic writes, and even fewer specify
    their rate of failure in providing this semantic. Note that during the act of writing an object, a
    RAID storage device will usually be writing all redundant copies of the object in parallel,
    although overlapped or staggered writes are more common when a single RAID processor is
    responsible for multiple drives. Hence an error that occurs during the process of writing may
    leave the redundant copies in different states, and furthermore may leave the copies in neither
    the old nor the new state. The little-known failure mode is that delta logging relies on the
    original data being either in the old or the new state so as to enable backing out the logical
    change, yet few storage systems provide an atomic write semantic for a RAID.

    While a battery-backed write cache may partially solve the problem, it is applicable only to a
    power-failure scenario.

    Since transactional support is not universally present in hardware RAID, many operating
    systems include transactional support to protect against data loss during an interrupted write.

    Novell NetWare, starting with version 3.x, included a transaction tracking system. Microsoft
    introduced transaction tracking via the journaling feature in NTFS. ext4 has journaling with
    checksums; ext3 has journaling without checksums but offers an "append-only" option, or
    ext3cow (copy-on-write). If the journal itself in a file system is corrupted, though, this can be
    problematic. The journaling in NetApp's WAFL file system provides atomicity by never
    updating the data in place, as does ZFS. An alternative method to journaling is soft updates,
    which are used in some BSD-derived systems' implementations of UFS.

    Unrecoverable data

    An unrecoverable sector can present as a sector read failure. Some RAID implementations
    protect against this failure mode by remapping the bad sector, using the redundant data to
    retrieve a good copy of the data, and rewriting that good data to the newly mapped replacement
    sector. The UBE (unrecoverable bit error) rate is typically specified as 1 bit in 10^15 for
    enterprise-class drives (SCSI, FC, SAS) and 1 bit in 10^14 for desktop-class drives
    (IDE/ATA/PATA, SATA). Increasing drive capacities and large RAID 5 redundancy groups
    have led to an increasing inability to successfully rebuild a RAID group after a drive failure,
    because an unrecoverable sector is found on the remaining drives. Double-protection schemes
    such as RAID 6 attempt to address this issue, but suffer from a very high write penalty.
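    To see why large arrays of high-capacity drives run into this, a back-of-the-envelope estimate
    (the drive sizes below are assumptions chosen for illustration): if a rebuild must successfully
    read B bits from the surviving drives and each bit is unreadable independently with probability
    p, the chance that the rebuild hits at least one unrecoverable error is

        P(\text{URE during rebuild}) = 1 - (1 - p)^{B} \approx 1 - e^{-pB} .

    For example, rebuilding a four-drive RAID 5 built from 2 TB desktop-class drives (p = 10^-14)
    requires reading the three surviving drives in full, B ≈ 3 × 2 × 10^12 × 8 = 4.8 × 10^13 bits,
    giving roughly 1 - e^-0.48 ≈ 38% -- a substantial chance that the rebuild cannot complete
    cleanly, which is part of the motivation for double-parity schemes such as RAID 6.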

    Write cache reliability

    The drive system can acknowledge a write operation as soon as the data is in the cache, without
    waiting for the data to be physically written. This typically occurs in old, non-journaled systems
    such as FAT32, or if the Linux/Unix "writeback" option is chosen without protections such as
    the "soft updates" option (to promote I/O speed while trading away data reliability). A power
    outage or system hang, such as a BSOD, can mean a significant loss of any data queued in such
    a cache.

    Often the write cache is protected by a battery, which mostly solves the problem: if a write fails
    because of a power failure, the controller may complete the pending writes as soon as it is
    restarted. This solution still has potential failure cases: the battery may have worn out, the power
    may be off for too long, the drives could be moved to another controller, or the controller itself
    could fail. Some systems can test the battery periodically, but this leaves the system without a
    fully charged battery for several hours.

    An additional concern about write cache reliability exists, specifically regarding devices
    equipped with a write-back cache: a caching system that reports data as written as soon as it is
    written to the cache, as opposed to the non-volatile medium.[38] The safer cache technique is
    write-through, which reports transactions as written when they are written to the non-volatile
    medium.

    Equipment compatibility