part8 ch3 raid

7/31/2019 Part8 Ch3 Raid


Copyright 2007 Koren & Krishna, Morgan-Kaufman

FAULT TOLERANT SYSTEMS

http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems

Part 8 RAID Systems

Chapter 3 Information Redundancy


RAID - Redundant Arrays of Inexpensive (Independent) Disks

RAID1 - two mirrored disks

If one disk fails, the other can continue

If both work:

    Read accesses are sped up - they are divided between the two disks

    Write accesses are slowed down

Computing reliability, availability, and MTTDL (mean time to data loss) of RAID1

[Figure: two mirrored disks, each holding the same bit pattern 1 0 0 1 0 0 1 1]




RAID1 - Reliability Calculation

Assumptions:

    disks fail independently

    failure process - Poisson process with rate $\lambda$

    repair time - exponential with mean time $1/\mu$

Markov chain: state - number of good disks

$$\frac{dP_2(t)}{dt} = -2\lambda P_2(t) + \mu P_1(t)$$

$$\frac{dP_1(t)}{dt} = -(\lambda+\mu) P_1(t) + 2\lambda P_2(t)$$

$$P_0(t) = 1 - P_1(t) - P_2(t); \quad P_2(0) = 1; \quad P_0(0) = P_1(0) = 0$$

Reliability at time t -

$$R(t) = P_1(t) + P_2(t) = 1 - P_0(t)$$
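As a rough numerical check, the three-state chain above can be integrated directly. The sketch below is illustrative only: it writes lam for the disk failure rate and mu for the repair rate (1 / mean repair time), and the function name, the Runge-Kutta step count, and the sample rates are assumptions rather than part of the slides.

```python
def raid1_reliability(lam, mu, t_end, steps=10000):
    """Integrate the RAID1 chain with classical RK4 and return R(t_end).

    p2 = probability both disks are good, p1 = probability one disk is good.
    Start in state 2 (p2 = 1, p1 = 0); reliability is p1 + p2 = 1 - p0.
    """
    def deriv(p2, p1):
        dp2 = -2.0 * lam * p2 + mu * p1
        dp1 = 2.0 * lam * p2 - (lam + mu) * p1
        return dp2, dp1

    h = t_end / steps
    p2, p1 = 1.0, 0.0
    for _ in range(steps):
        a2, a1 = deriv(p2, p1)
        b2, b1 = deriv(p2 + 0.5 * h * a2, p1 + 0.5 * h * a1)
        c2, c1 = deriv(p2 + 0.5 * h * b2, p1 + 0.5 * h * b1)
        d2, d1 = deriv(p2 + h * c2, p1 + h * c1)
        p2 += h * (a2 + 2.0 * b2 + 2.0 * c2 + d2) / 6.0
        p1 += h * (a1 + 2.0 * b1 + 2.0 * c1 + d1) / 6.0
    return p1 + p2


# Hypothetical rates: one failure per 100,000 hours, 10-hour mean repair.
print(raid1_reliability(lam=1e-5, mu=0.1, t_end=10000.0))
```

The step size must stay small relative to 1/(lam + mu) for the integration to remain accurate; the defaults above satisfy that for the sample rates only.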


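RAID1 - MTTDL Calculation

Starting in state 2 at t=0 - mean time before entering state 1 = $1/(2\lambda)$

Mean time spent in state 1 is $1/(\lambda+\mu)$

Go back to state 2 with probability $q = \mu/(\lambda+\mu)$

or to state 0 with probability $p = \lambda/(\lambda+\mu)$

Probability of n visits to state 1 before transition to state 0 is $q^{n-1}p$

Mean time to enter state 0 with n visits to state 1:

$$T_{2\to 0}(n) = n\left(\frac{1}{2\lambda} + \frac{1}{\lambda+\mu}\right) = n\,\frac{3\lambda+\mu}{2\lambda(\lambda+\mu)}$$

$$MTTDL = \sum_{n=1}^{\infty} q^{n-1} p\, T_{2\to 0}(n) = \sum_{n=1}^{\infty} n q^{n-1} p\, T_{2\to 0}(1) = \frac{T_{2\to 0}(1)}{p} = \frac{3\lambda+\mu}{2\lambda^2}$$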
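The closed form can be cross-checked against the series it was derived from. This is a minimal sketch with hypothetical function names; lam is the disk failure rate and mu the repair rate. The truncated series is only practical when p = lam / (lam + mu) is not tiny, so the usage line uses artificial rates.

```python
def raid1_mttdl(lam, mu):
    """Closed form: (3*lam + mu) / (2*lam^2)."""
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def raid1_mttdl_by_series(lam, mu, terms=2000):
    """Truncated sum over n of q^(n-1) * p * n * T(1)."""
    p = lam / (lam + mu)            # state 1 -> state 0 (data loss)
    q = mu / (lam + mu)             # state 1 -> state 2 (repair)
    t_one_visit = 1.0 / (2.0 * lam) + 1.0 / (lam + mu)
    total = 0.0
    weight = p                      # q^(n-1) * p for n = 1
    for n in range(1, terms + 1):
        total += weight * n * t_one_visit
        weight *= q
    return total


# Artificial rates chosen so the series converges quickly.
print(raid1_mttdl(1.0, 9.0), raid1_mttdl_by_series(1.0, 9.0))
```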




Approximate Reliability of RAID1

If $\mu \gg \lambda$, the transition rate into state 0 from the aggregate of states 1 and 2 is 1/MTTDL

Approximate reliability:

$$R(t) \approx e^{-t/MTTDL}$$

[Figures: Impact of Disk Lifetime; Impact of Disk Repair Time]
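To get a feel for how disk lifetime and repair time enter the approximation R(t) = exp(-t / MTTDL), with the RAID1 MTTDL equal to (3*lam + mu) / (2*lam^2), one can sweep the two parameters. The sketch below is an illustration under assumed values: the function name, the ten-year horizon, and the repair times are hypothetical.

```python
import math


def raid1_approx_reliability(t, mean_disk_life, mean_repair_time):
    """exp(-t / MTTDL), with all times in the same unit (hours here)."""
    lam = 1.0 / mean_disk_life      # disk failure rate
    mu = 1.0 / mean_repair_time     # disk repair rate
    mttdl = (3.0 * lam + mu) / (2.0 * lam ** 2)
    return math.exp(-t / mttdl)


ten_years = 10 * 365 * 24           # hours
for repair_hours in (1.0, 24.0, 168.0):
    r = raid1_approx_reliability(ten_years, 500000.0, repair_hours)
    print(repair_hours, 1.0 - r)    # unreliability grows with repair time
```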


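RAID1 - Availability Calculation

Markov chain: state - number of good disks

The solution to the differential equations:

$$P_2(t) = \frac{\mu^2}{(\lambda+\mu)^2} + \frac{2\lambda\mu}{(\lambda+\mu)^2} e^{-(\lambda+\mu)t} + \frac{\lambda^2}{(\lambda+\mu)^2} e^{-2(\lambda+\mu)t}$$

$$P_1(t) = \frac{2\lambda\mu}{(\lambda+\mu)^2} + \frac{2\lambda(\lambda-\mu)}{(\lambda+\mu)^2} e^{-(\lambda+\mu)t} - \frac{2\lambda^2}{(\lambda+\mu)^2} e^{-2(\lambda+\mu)t}$$

$$P_0(t) = 1 - P_1(t) - P_2(t)$$

Long-term availability -

$$A = P_2 + P_1 = 1 - P_0 = \frac{\mu(\mu+2\lambda)}{(\lambda+\mu)^2} = 1 - \frac{\lambda^2}{(\lambda+\mu)^2}$$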
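The state probabilities can be evaluated directly to see how quickly they settle to the long-term availability. The sketch assumes the closed-form solution in which each failed disk is repaired independently at rate mu, so that P2(t) is the square of a single disk's availability; lam is the failure rate, and the function names and rates are hypothetical.

```python
import math


def raid1_state_probs(lam, mu, t):
    """Return (P2, P1, P0) at time t, starting with both disks good."""
    s = lam + mu
    e1 = math.exp(-s * t)
    e2 = math.exp(-2.0 * s * t)
    p2 = (mu ** 2 + 2.0 * lam * mu * e1 + lam ** 2 * e2) / s ** 2
    p1 = (2.0 * lam * mu + 2.0 * lam * (lam - mu) * e1
          - 2.0 * lam ** 2 * e2) / s ** 2
    return p2, p1, 1.0 - p1 - p2


def raid1_long_term_availability(lam, mu):
    """A = 1 - P0 in steady state = 1 - (lam / (lam + mu))^2."""
    return 1.0 - (lam / (lam + mu)) ** 2


# Artificial rates so that the transient is visible.
for t in (0.0, 0.1, 1.0, 10.0):
    p2, p1, p0 = raid1_state_probs(1.0, 3.0, t)
    print(t, p2 + p1)
print(raid1_long_term_availability(1.0, 3.0))
```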




RAID2

A bank of data disks in parallel with Hamming-coded disks

d data disks and c code disks

i-th bit of each disk - a bit of a (c+d)-bit word

From Hamming code theory - to permit the correction of one bit per word -

$$2^c \ge c + d + 1$$

We will not spend more time on RAID2 because other RAID designs impose much less overhead
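The Hamming condition, 2^c >= c + d + 1 for d data disks and c code disks, fixes the smallest number of code disks for a given number of data disks. A minimal sketch (the function name is hypothetical) that searches for it:

```python
def min_code_disks(d):
    """Smallest c with 2**c >= c + d + 1 (single-bit correction per word)."""
    c = 1
    while 2 ** c < c + d + 1:
        c += 1
    return c


for d in (1, 4, 8, 11, 32):
    c = min_code_disks(d)
    print(d, c, c / d)   # data disks, code disks, overhead ratio
```

The overhead ratio c / d shrinks as d grows, but it is always at least as large as the single parity disk used by the RAID levels that follow.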


RAID3

Modification of RAID2

Observation - each disk has error-detection coding per sector - a bad sector can be identified

Bank of d data disks together with one parity disk

Data are bit-interleaved across the data disks

The i-th position of the parity disk contains the parity bit associated with the bits in the i-th position of each of the data disks




Error Detection and Correction in RAID3

i-th bits of each disk form a (d+1)-bit word - d data bits and 1 parity bit

If the j-th bit in a word is incorrect - the sector error-detecting code in the j-th disk will indicate a failure - the fault is located - the remaining bits can be used to restore the faulty bit

Example: word - 11100; data bits - 1110; parity bit - 0

If even parity is being used - a bit is in error

The third disk indicates an error in the relevant sector and the other disks show no such errors

The correct word is 11000
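The recovery step in the example can be sketched in a few lines. Under even parity the XOR of all d+1 bits is 0, so the bit on the flagged disk equals the XOR of the remaining bits. The function name and the 0-based disk index are assumptions made for illustration.

```python
def restore_bit(word, bad_disk):
    """Rebuild the bit on the disk flagged by its sector error-detecting code.

    word: list of bits, data bits followed by the even-parity bit.
    bad_disk: 0-based index of the disk that reported the sector error.
    """
    rebuilt = 0
    for j, bit in enumerate(word):
        if j != bad_disk:
            rebuilt ^= bit          # XOR of all bits except the flagged one
    corrected = list(word)
    corrected[bad_disk] = rebuilt
    return corrected


# Word 11100 with the third disk (index 2) flagged.
print(restore_bit([1, 1, 1, 0, 0], bad_disk=2))
```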


Reliability of RAID3

Similar analysis to RAID1: (d+1) disks instead of 2

System fails (data loss) if two or more disks fail

[Figure: Markov chain with states d+1, d, and F]

Mean time to data loss for this group is

$$MTTDL = \frac{(2d+1)\lambda + \mu}{d(d+1)\lambda^2}$$

The reliability is given approximately by

$$R(t) \approx e^{-t/MTTDL}$$




Comparing Different RAID3 Systems

Unreliability of RAID3 systems for different values of d - mean lifetime of a single disk is 500,000 hours

The d=1 case - identical to RAID1

Reliability goes down as d increases
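The trend with d can be reproduced from the RAID3 result MTTDL = ((2d+1)*lam + mu) / (d*(d+1)*lam^2), where lam is the disk failure rate and mu the repair rate. In the sketch below the 500,000-hour disk lifetime comes from the slide, while the 24-hour mean repair time, the ten-year horizon, and the function names are assumptions chosen for illustration.

```python
import math


def raid3_mttdl(d, lam, mu):
    """Mean time to data loss for d data disks plus one parity disk."""
    return ((2 * d + 1) * lam + mu) / (d * (d + 1) * lam ** 2)


def raid3_unreliability(t, d, mean_disk_life, mean_repair_time):
    """1 - exp(-t / MTTDL), computed with expm1 to keep small values accurate."""
    lam = 1.0 / mean_disk_life
    mu = 1.0 / mean_repair_time
    return -math.expm1(-t / raid3_mttdl(d, lam, mu))


for d in (1, 2, 4, 8):
    print(d, raid3_unreliability(87600.0, d, 500000.0, 24.0))
```

With d = 1 the MTTDL reduces to (3*lam + mu) / (2*lam^2), the RAID1 value.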


RAID4

Similar to RAID3 - but the unit of interleaving is a block of arbitrary size - a stripe

Advantage - a small read may be contained in one single data disk, rather than interleaved over all disks

Small read operations are faster in RAID4

Similarly for small write operations

Write - the affected data disk and the parity disk must be updated

Parity update is simple - a parity bit toggles if the data bit being written is different from the one being overwritten

Reliability model for RAID4 - identical to RAID3
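The small-write parity rule can be written as a bitwise XOR: every bit position where the old and new data differ toggles the corresponding parity bit. A minimal sketch, with blocks modeled as Python integers and hypothetical function names:

```python
def updated_parity(old_parity, old_data, new_data):
    """Toggle exactly those parity bits where the data block changes."""
    return old_parity ^ (old_data ^ new_data)


def full_parity(blocks):
    """Parity recomputed from scratch over all data blocks (for comparison)."""
    parity = 0
    for block in blocks:
        parity ^= block
    return parity


stripe = [0b1010, 0b0110, 0b1111]
parity = full_parity(stripe)
new_block = 0b0001
parity_after = updated_parity(parity, stripe[1], new_block)
stripe[1] = new_block
print(bin(parity_after), bin(full_parity(stripe)))   # the two values agree
```

Only the affected data disk and the parity disk are touched; the other data disks do not need to be read.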




RAID5

Observation - the parity disk can be a system bottleneck

In RAID4 - the parity disk is accessed in each write

In RAID5 - parity blocks are interleaved among the disks

Every disk has some data blocks and some parity blocks

Reliability model for RAID5 - same as for RAID4

Only the performance model is different
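One way to picture interleaved parity is to rotate the parity block by one disk per stripe. The slides do not specify a placement rule, so the rotation used below is an assumed, illustrative layout, and the function names are hypothetical.

```python
def parity_disk(stripe, n_disks):
    """Assumed rotation: parity moves one disk to the left on each stripe."""
    return (n_disks - 1 - stripe) % n_disks


def layout(n_stripes, n_disks):
    """Rows are stripes; 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_disk(stripe, n_disks)
        rows.append(["P" if disk == p else "D" for disk in range(n_disks)])
    return rows


for row in layout(4, 4):
    print(" ".join(row))
```

Over any run of n_disks consecutive stripes each disk holds exactly one parity block, so parity writes are spread across all disks instead of landing on a single one.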


Modeling Correlated Failures

We assumed until now that disks are independent with respect to failures

Disk failures may be correlated - power supply and control are typically shared among multiple disks

Disk systems are usually made up of strings - consisting of disks that share power supply, cabling, cooling, and a controller

If any of these shared support items fails, the entire string can fail

If the string constitutes the RAID group - data loss can occur




Approximate Reliability of String

$\lambda_{str}$ - failure rate of the support elements (power, cabling, cooling, control) of a string

$\lambda_{indep}$ - approximate failure rate due to independent failures

If a RAID group is controlled by a single string - the aggregate failure rate of the group is

$$\lambda_{total} = \lambda_{indep} + \lambda_{str}$$

And the reliability is

$$R_{total}(t) = e^{-\lambda_{total} t}$$


Impact of String Failures on RAID1

[Figure: impact of string failures on RAID1; label: Mean String Lifetime]

Similar results for RAID3 and higher levels

Figures of 150,000 hours for the mean string lifetime have been quoted in the literature

At least one manufacturer claims mean disk lifetimes of 1,000,000 hours

Grouping an entire RAID array as a single string increases unreliability by orders of magnitude
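The orders-of-magnitude claim can be checked with the aggregate rate, total rate = independent-failure rate + string failure rate. The sketch below takes the independent-failure rate as 1 / MTTDL of a RAID1 pair, uses the quoted 150,000-hour string lifetime and 1,000,000-hour disk lifetime, and assumes a 24-hour mean disk repair time and a one-year horizon; the repair time, the horizon, and the function names are hypothetical.

```python
import math


def raid1_mttdl(mean_disk_life, mean_repair_time):
    lam = 1.0 / mean_disk_life
    mu = 1.0 / mean_repair_time
    return (3.0 * lam + mu) / (2.0 * lam ** 2)


def group_unreliability(t, mttdl_indep, mean_string_life):
    """1 - exp(-(lam_indep + lam_str) * t) for a group on a single string."""
    lam_indep = 1.0 / mttdl_indep        # assumption: rate = 1 / MTTDL
    lam_str = 1.0 / mean_string_life     # 0.0 when the lifetime is infinite
    return -math.expm1(-(lam_indep + lam_str) * t)


one_year = 8760.0                         # hours
mttdl = raid1_mttdl(1000000.0, 24.0)
without_string = group_unreliability(one_year, mttdl, float("inf"))
with_string = group_unreliability(one_year, mttdl, 150000.0)
print(without_string, with_string, with_string / without_string)
```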




Orthogonal Arrangement of Strings and RAID Groups

Failure of a string affects only one disk in each RAID group

Since each RAID group can tolerate the failure of up to one disk, this reduces the impact of string failures

Data loss will happen only if any RAID group has at least two disks down at the same time

[Figure: orthogonal arrangement; labels: String, RAID group]


Approximate Modeling of Orthogonal Systems

Data loss is caused by a sequence of events

A failure can be triggered by an individual disk failure or by a string failure - very low failure rates

We will find the (approximate) failure rate due to each - $\Lambda_{indiv}$; $\Lambda_{str}$

Sum of these two failure rates - the approximate overall failure rate -

$$\Lambda_{data\_loss} = \Lambda_{indiv} + \Lambda_{str}$$

It can then be used to approximately determine MTTDL - mean time to data loss, and reliability - probability of no data loss over any given period of time




Orthogonal Arrangement - Notations

d+1 strings, g RAID groups - total of (d+1)g disks

$f_{disk}(t)$ - density function of the disk repair time

$\lambda_{disk}$ - failure rate of a single disk

$\pi_{indiv}$ - probability that a given individual failure triggers data loss

Approximate rate (per disk) at which individual failures trigger data loss - $\lambda_{disk}\pi_{indiv}$

$\pi_{indiv}$ = probability that a second disk fails in the affected RAID group while the first failure is not yet repaired

This second failure has the rate $d(\lambda_{disk} + \lambda_{str})$ - the second disk failure can happen either due to an individual disk or string failure


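Calculating Failure Rates - $\Lambda_{indiv}$

Conditioning on $\tau$ - repair time of first disk failure

$$Prob\{\text{Data Loss} \mid \text{repair takes } \tau\} = 1 - e^{-d(\lambda_{disk}+\lambda_{str})\tau}$$

Unconditional probability of data loss -

$$\pi_{indiv} = \int_0^\infty f_{disk}(\tau)\left(1 - e^{-d(\lambda_{disk}+\lambda_{str})\tau}\right)d\tau = 1 - F^*_{disk}\big(d(\lambda_{disk}+\lambda_{str})\big)$$

$F^*_{disk}(\cdot)$ - the Laplace transform of $f_{disk}(\cdot)$

Approximate rate at which data loss is triggered by individual disk failure -

$$\Lambda_{indiv} = (d+1)\,g\,\lambda_{disk}\,\pi_{indiv}$$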
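For a concrete repair-time distribution the Laplace transform has a closed form. The sketch below assumes, purely for illustration, that the disk repair time is exponential with rate mu_disk, so its transform at s is mu_disk / (mu_disk + s); the slides leave the repair density general. The second failure is taken to arrive at rate d * (lam_disk + lam_str), and there are (d+1) * g disks in total. All function and parameter names are hypothetical.

```python
def pi_indiv(d, lam_disk, lam_str, mu_disk):
    """Probability that a second failure hits the group before repair ends.

    Assumes exponential repair: 1 - mu_disk / (mu_disk + s) = s / (mu_disk + s),
    with s = d * (lam_disk + lam_str).
    """
    s = d * (lam_disk + lam_str)
    return s / (mu_disk + s)


def big_lambda_indiv(d, g, lam_disk, lam_str, mu_disk):
    """Rate at which individual disk failures trigger data loss."""
    return (d + 1) * g * lam_disk * pi_indiv(d, lam_disk, lam_str, mu_disk)


# Hypothetical rates per hour: 1e-6 per disk, 1/150000 per string,
# 24-hour mean disk repair; 4 data disks per group, 10 groups.
print(big_lambda_indiv(d=4, g=10, lam_disk=1e-6,
                       lam_str=1.0 / 150000.0, mu_disk=1.0 / 24.0))
```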




Calculating Failure Rates - $\Lambda_{str}$

Total rate of string failures: $(d+1)\lambda_{str}$

When a string fails - we repair the string, and any individual disks affected by this string failure

Pessimistic assumption - a second failure can happen at any group or disk before all groups are fully restored

Example: arrival of a second string failure to the same string before the first failure has gone

Optimistic assumption - disks affected by a string failure are immune to further failures before the string and affected disks are fully restored

The difference between the two failure rates determines how tight the bounds are


Pessimistic Calculation

$\tau$ - (random) time taken to repair the failed string and all disks affected by it

$f_{str}(\tau)$ - probability density function of $\tau$

$F^*_{str}(\cdot)$ - Laplace transform of $f_{str}(\cdot)$

Pessimistic assumption - rate of additional failures

$$\lambda_{pess} = (d+1)\lambda_{str} + (d+1)\,g\,\lambda_{disk}$$

Conditioning upon $\tau$ - the probability of data loss

$$p_{pess} = 1 - e^{-\lambda_{pess}\tau}$$

Integrating on $\tau$ - unconditional pessimistic probability of data loss

$$\pi_{pess} = 1 - F^*_{str}(\lambda_{pess})$$




Optimistic Calculation

Optimistic assumption - rate of additional failures

$$\lambda_{opt} = d\,\lambda_{str} + d\,g\,\lambda_{disk}$$

Conditioning upon $\tau$ - the probability of data loss is

$$p_{opt} = 1 - e^{-\lambda_{opt}\tau}$$

Integrating on $\tau$ - unconditional optimistic probability of data loss

$$\pi_{opt} = 1 - F^*_{str}(\lambda_{opt})$$
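The two assumptions bracket the probability that a string failure leads to data loss. The sketch below evaluates both bounds, taking the pessimistic rate of additional failures as (d+1)*lam_str + (d+1)*g*lam_disk and the optimistic one as d*lam_str + d*g*lam_disk. It assumes, for illustration only, that the string repair time is exponential with rate mu_str, whose Laplace transform at s is mu_str / (mu_str + s). Function and parameter names are hypothetical.

```python
def pi_string(lam_extra, mu_str):
    """1 - F*(lam_extra) for an assumed exponential repair time, rate mu_str."""
    return lam_extra / (mu_str + lam_extra)


def string_loss_bounds(d, g, lam_disk, lam_str, mu_str):
    """Return (optimistic, pessimistic) data-loss probabilities."""
    lam_pess = (d + 1) * lam_str + (d + 1) * g * lam_disk
    lam_opt = d * lam_str + d * g * lam_disk
    return pi_string(lam_opt, mu_str), pi_string(lam_pess, mu_str)


# Hypothetical rates per hour; 48-hour mean string repair.
opt, pess = string_loss_bounds(d=4, g=10, lam_disk=1e-6,
                               lam_str=1.0 / 150000.0, mu_str=1.0 / 48.0)
print(opt, pess, pess / opt)
```

A ratio close to 1 means the two bounds are tight for the chosen parameters.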


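Reliability of Orthogonal System

Rate of string failures triggering data loss

$$\Lambda_{str} = (d+1)\,\lambda_{str}\,\pi_{pess} \quad (\text{or } \pi_{opt})$$

Approximate rate of data loss in the system -

$$\Lambda_{data\_loss} \approx \Lambda_{indiv} + \Lambda_{str}$$

Mean Time To Data Loss -

$$MTTDL \approx \frac{1}{\Lambda_{data\_loss}}$$

System reliability -

$$R(t) \approx e^{-\Lambda_{data\_loss}\, t}$$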
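Putting the pieces together is a matter of a few multiplications. The sketch below takes the individual-failure data-loss rate and the string data-loss probability (pessimistic or optimistic) as inputs and returns the approximate MTTDL and reliability. The function name, argument names, and sample numbers are assumptions for illustration.

```python
import math


def orthogonal_summary(d, lam_str, big_lambda_indiv, pi_str, t):
    """Return (MTTDL, R(t)) for d+1 strings.

    lam_str: failure rate of one string.
    big_lambda_indiv: rate at which individual disk failures cause data loss.
    pi_str: probability that a string failure leads to data loss
            (use the pessimistic or the optimistic value).
    """
    big_lambda_str = (d + 1) * lam_str * pi_str
    rate = big_lambda_indiv + big_lambda_str
    return 1.0 / rate, math.exp(-rate * t)


# Hypothetical inputs, all rates per hour.
for label, pi_str in (("pessimistic", 2.0e-3), ("optimistic", 1.6e-3)):
    mttdl, r = orthogonal_summary(d=4, lam_str=1.0 / 150000.0,
                                  big_lambda_indiv=3.7e-8, pi_str=pi_str,
                                  t=87600.0)
    print(label, mttdl, r)
```

The pessimistic probability gives the lower MTTDL, so the two runs bound the true value from below and above.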
