csce 5760: design for fault tolerance

9/24/19

1

CSCE5760 September 24, 2019 1

CSCE 5760: Design For Fault ToleranceHW #1. Take a look at the solutionsI will quickly summarize how to approach the problems

2.1. We are told that the system failed between 4 and 8 years. We need to find the probability that it failed before 5 years. That we need the conditional probability

P(T<5 | 4<T<8) à failed between 4 and 5 years

Prob{[T < 5]∩[4 ≤ T ≤ 8]}Prob{4 ≤ T ≤ 8}

= Prob[4 ≤ T < 5]Prob{4 ≤ T ≤ 8}

= F(5)− F(4)F(8)− F(4)

= (1− e−05*5) )− (1− e−0.5*4) )

(1− e−05*8) )− (1− e−0.5*4) )= 0.455

Second problem is straightforward --> use Weibull distribution

Third problem involves converting the system into series-parallel combinations

1


CSCE 5760: Design For Fault Tolerance

Let us consider the parallel set in the top middle (I will label the units as C and D)RCD = 1 – (1-RC)(1-RD)

Now let us add the unit in front (say A)RACD = RA*RCD = RA*[1 – (1-RC)(1-RD)]

Now we have two parallel paths (with the bottom left, say B)RBACD = 1- [(1-RB)*{RA*(1 – (1-RC)(1-RD))}]

Finally we add the unit on the right (say ERABCDE = RBACD*RE

Solutions to Chapter 2 Exercises 3

Figure 2.2: A 5-module series-parallel system.

4 blocks and the second unit with the rightmost block. If the reliability of the leftmost 4blocks is RA(t), the system reliability is RA(t)R(t).

Now, we calculate RA(t). This subsystem consists of a parallel arrangement of one unitconsisting of the bottom block and another consisting of the other 3 blocks. If RB(t) is thereliability of the top 3 blocks, RA(t) = 1 − (1 − RB(t))(1 − R(t)).

Next, we calculate RB(t): this subsystem consists of a series arrangement of one block withanother consisting of two blocks in parallel. Hence, we have

RB(t) = R(t)(1 − (1 − R(t))2)

5. The lifetime of each of the seven blocks in Figure 2.3 is exponentially distributed withparameter λ. Derive an expression for the reliability function of the system, Rsystem(t),and plot it over the range t = [0, 100] for λ = 0.02.

Figure 2.3: A 7-module series-parallel system.

Solution:

As before, we decompose this structure into two substructures connected in series. Theleft substructure is four blocks in parallel; the right substructure consists of a series ar-rangement of two blocks in parallel with one block.

The reliability of this system is thus given by:

Rsystem(t) =[

1 − (1 − R(t))4] [

1 − (1 − R(t))(1 − R2(t))]

where R(t) = e−λt.

Solutions for Fault-Tolerant Systems, by Koren and Krishna c⃝2007 Elsevier Science (USA) – Do not copy

2

9/24/19

2


CSCE 5760: Design For Fault ToleranceAnother problem àrelated HW #2

Suppose that the reliability of a system consisting of 4 blocks, two of which are identical, is given by the following equation:

Rsystem = R1R2R3 +R12 - R12R2R3

Draw the reliability block diagram representing the system.

Rsystem = R1R2R3 +R12 - R12R2R3 = R1 [R2R3 + R1 - R1R2R3]

This means module M1 is in series with some other network [R2R3 + R1 - R1R2R3] = = 1 – [(1-R2R3)(1-R1)

We have M2 and M3 in series and together in parallel with M1

4.3. Draw a Markov chain for reliability evaluation of the TMR with three voters shown inFig. 4.6. Assume that the failure rate of each module is �m, and the failure rate of each voteris �v. No repairs are allowed.Solution

Part-II:

Suppose that the reliability of a system consisting of 4 blocks, two of which are identical, isgiven by the following equation:Rsystem = R1R2R3 +R2

1 �R21R2R3 Draw the reliability block diagram representing the system

Solution

4

3


CSCE 5760: Design For Fault ToleranceReview:

Markov Chains and Markov ProcessesDeriving state probabilities

Steady stateAbsorbing states

Deriving differential equations

An example: Duplex with repair

åå¹¹

+-=ij

jjiijijii tPtPdttdP )()()( ll Our general equation

)()(2)( 122 tPtPdttdP µl +-=)()()(2)(2)( 1021 tPtPtPdttdP µlµl +-+=

)(2)()( 010 tPtPdttdP µl -=Initial conditionP2(0)=1, P1(0)=P0(0)=0

4

9/24/19

3


CSCE 5760: Design For Fault ToleranceSolving the differential equations we get

tetP )(2222 )(2)()( µlµllµµlµ +-+++=

te )(222 )( µlµll +-++

tetP )(221 )()(2)(2)( µlµlµllµllµ +-+-++=

te )(222 )(2 µlµll +-+-

)()(1)( 120 tPtPtP --=

Note that this Markov process is irreducible à only recurrent states and no absorbing states

If we have irreducible process, we can derive the steady state behaviorThat is when time t à ¥

5


CSCE 5760: Design For Fault Tolerance)()(2)( 122 tPtPdttdP µl +-=

)()()(2)(2)( 1021 tPtPtPdttdP µlµl +-+=

)(2)()( 010 tPtPdttdP µl -=

Setting dPi(t)/dt=0 we get −2λP2(t)+ 2µP1(t) = 02λP2(t)+ 2µP0(t)− (λ + µ)P1(t)= 0λP1(t)− 2µP0(t)= 0Po(t)+ P1(t)+ P2(t) = 1

And solving these linear equations we get(we can drop t since we are looking at steady state)

22 )(2 µlµ +=P2)(21 µllµ +=P

22 )(0 µll +=P

6

9/24/19

4



So, the long term availability is given by 1- P0 = P1 + P2

2222 )(1)()2( µllµllµµ +-=+= +

Another way of looking at modeling continuous time Markov processes.Let us construct a transition matrix (instantaneous transition probability)Consider a system with two states: for example the following systems

M = m11 m21m12 m22

⎡

⎣⎢

⎤

⎦⎥

p. 27 - Design of Fault Tolerant Systems - Elena Dubrova, ESDlab

Single-component system, no repair

• Only two states – one operational (state 1) and one failed (state 2) – if no repair is allow, there is a single, non-reversible

transition between the states (used in availability analysis)

– label l corresponds to the failure rate of the component

1 2 l

Note how the rows and columns are numbered: columns should add to zeroIf we have absorbing states, diagonal elements of those states will be zero

7



M =−2λ µ 02λ −λ − µ 2µ0 λ −2µ

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

Using our example with 3 states we have

ddt

P2(t)

P1(t)

P0(t)

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=−2λ µ 02λ −λ − µ 2µ0 λ −2µ

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

*

P2(t)

P1(t)

P0(t)

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

)()(2)( 122 tPtPdttdP µl +-=

)()()(2)(2)( 1021 tPtPtPdttdP µlµl +-+=

)(2)()( 010 tPtPdttdP µl -=

These are the same equations we had before

8

9/24/19

5


CSCE 5760: Design For Fault ToleranceWhat are watchdog processor?there are also watchdog timers

Watchdog processors only check for correct execution flow Check the control flow of a program

Signatures with instructions executed by each basic blockSo if some instructions are incorrectly sequenced signature differsComputed vs assigned

Can we use similar idea for detecting security violations?

9


CSCE 5760: Design For Fault ToleranceMalicious failures à not benign

The approach to solve such problems is called Byzantine algorithm– related Byzantine generals stories

Byz(N, m) – N nodes with up to m failed nodes

Step 1: Original source sends data to each of the N-1 receiversStep 2: If m>0, each of the N-1 receivers become sources. They distribute the values received in the previous step to other nodes.

In other words, each of the N-1 nodes apply Byz (N-1, m-1) algorithm-- step 2 is recursive

Step 3: At the end of communications, each node has a “vector” of values.Each node looks for a majority value in this vectorIf no majority, need to a default value

To correctly work, N >= 3m+1

10

9/24/19

6


CSCE 5760: Design For Fault ToleranceSome assumptions

Non faulty unit is truthful about all its messagesFaulty unit may send contradictory or even no messagesTime out mechanism is available to detect “no message”When no message, assume a default value

Chapter 3. Information redundancy à coding theory

Vector space – N dimensional space over a q elements (that is each dimension can take q values) can have Nq vectors.If binary space, q=2

If we select a subspace – with fewer than the maximum vectors we defined a code

The idea is: the only legal vectors are those in the code space.if you see a vector outside the code space – an error.


Error detection

• We can define a code so that errors introduced in a codeword force it to lie outside the range of codewords

– basic principle of error detection


Error detection

all possible words

code words


Error correction

• We can define a code so that it is possible to determine the correct code word from the erroneous codeword

– basic principle of error correction


Error correction

all possible codewords

code words

11


CSCE 5760: Design For Fault ToleranceSimple parity (only for binary data)

one extra bit p =1 or c = d+1– only half of all possible values are used

Distance (Hamming distance) between two codewords - the number of bit positions in which the two words differ

Minimum distance is the minimum of all distances between pairs of codewordsFor simple parity, minimum distance = 2

Distance can be defined for other bases (say decimal)

Simple introduction firstTo detect k bit errors, minimum distance must be >= k+1To detect 1 bit error minimum distance must be at least 2

Simple parity works

To correct (implies detect first) k errors, minimum distance must >= 2K+1

12

9/24/19

7



P1 P2 D1 P3 D2 D3 D4P1 is a parity on D1, D2 and D4

P2 is a parity on D1, D3 and D4P3 is a parity on D2, D3 and D4

Consider data value of 0010P1= 0; P2 = 1 and P3 = 1Codeword = 0101010

How do we detect and correct errors

Compute parities from received data – if all parities check, no errorIf parity Pi does not match correct value, we assign a Si =1So, we have 3 bit Syndrome S1S2S3 which ranges between 000 and 111

This number locates the error bit

Consider a minimum distance 3 Hamming code with 4 bit binary numbers. We need to add 3 parity bits. So we have 7-bit code words for 4-bit data word.

13


CSCE 5760: Design For Fault ToleranceIn general if we have N = 2n data bits, we need n+1 parity bits (distance 3 code)For 32 bit data we need 6 parity bits

Parity bits are located at bit positions corresponding powers of 2

Pi is at bit position 2i

P1 at bit 1; P2 at bit 2, P3 at bit 4; P4 at bit 8; P5 at bit 16; P6 at bit 32

P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8 D9 D10 D11 P5

Parity Pi is a parity on “alternating 2i bits, starting Pi

P1 is parity on P1, D1, D2,….P2 is parity on P2, D1, D3, D4, D6, D7,

P3 is parity on P3, D2, D3, D4, D8, D9, D10, D11,….

P4 is parity on P4, D5, D6, D7, D8, D9, D10, D11, ….

14

9/24/19

8


CSCE 5760: Design For Fault ToleranceSeparable codes

data and parity bits are separate Hamming code is a separable code

Non separable codesdata and parity cannot be separatemore complex to decode

Groups, Rings and Fields

A group [S, +] is a non-empty set with a binary operation (operation on two values):closure (a+b is also a member of the group)associative (a+b)+c = a+(b+a)an identity element exist for the operation (a+0 = a)each element has an inverse under the operation (a+(-a)) =0commutative group if the operation is commutative (a+b=b+a)

Examples: Set of all integers with addition (not just positive integers)Polynomials in x. Addition and multiplication of polynomials.Modulo addition and multiplication.

15


CSCE 5760: Design For Fault ToleranceRing. [R, +, *] is a ring on a set of elements R defined with two operations, + and *.

[R, +] must be a commutative group.

and the * operation must satisfy: closure, associative and distributive operation (over +)Distributive: a*(b+c) = a*b + a*c à no need for an inverse

Examples: Set of Integers (both positive and negative) with addition and multiplication is a Ring.

Polynomials over real (or integers) is a ring under addition and multiplication.

Another example:Consider the set of n * n matrices over integers. Define matrix addition and Matrix multiplication

à not regular matrix multiplication but multiplication of respective elements

We have ring here.

16

9/24/19

9


CSCE 5760: Design For Fault ToleranceYet more complex structure is called a field. [F, +, *] is a field if [F,+, *] is a ring

that is, the following holdcommutative a+b = b+a and a*b = b*aAssociative (a+b)+c = a + (b+c) and (a*b)*c = a* (b*c)Distributive a*(b+c) = a*b + a*c

andField has both additive and multiplicative identity elements

a+0 =0; a*1 = a (0 and 1 are identify elements)

Fields have additive and multiplicative inverses (except for 0)

a+ b =0 then b is the inverse of a under additiona*b =1 then b is the inverse of a under multiplication

17


CSCE 5760: Design For Fault ToleranceExamples: The set of real numbers under addition and multiplication is a field

(note set of integers is not – why not?)

Consider Modulo addition and division. [Z5, +5, *5]. Is this a field?

Important: In a field, a*b = 0 if and only if either a or b is 0 (0 is the additive identity).

18

9/24/19

10


CSCE 5760: Design For Fault ToleranceWhat is the inverse of 2? 2 has no inverse. So this is not a field. Why is Z5 a field but not Z4? Zp is a field if p is a prime number.

Now we have a field. This because, we are not looking at Z4, we are looking at Zx where x = 22

Or in other words, a filed over x = py where p is a prime number.

We need to be careful in defining these tables for + and * to make sure we have a fieldSuch fields are called Galois fields.

We will write them as GFq or GF(q) where q = ph and p is a prime number

19


CSCE 5760: Design For Fault ToleranceAnother interesting factor. For any prime power field, there is a non-zero element say g that generates all the other members: that is, all the other number can be written as gi. Such an element is called the primitive element (or generator element).In the above example (with 4 elements 0, 1, a, b), a can be used as the primitive number since

a0 = 0, a= a1, a2= a*a = b, a3= b*a = 1.

More complex structures. Vector Spaces. We have to first start with a field, that is we will start with say [F, +, *] that gives us a field (the set F contains values taken by field elements).We will define vector of dimension n as a n tuple.

v = (v1,v2,....,vn) à all vi elements belong to F

We can now define vector addition as a binary operation.u+v = (u1+v1,u2+v2,....,un+vn)

Is this operation closed? Since the tuple elements are from the field and addition is closed on these elements, vector addition is also closed.

20

9/24/19

11


CSCE 5760: Design For Fault ToleranceLikewise this operation is associative.

The vector 0 = (0,0,.....,0) the zero vector is the identity for the vector addition operation.

What about inverse of a vector?

V-1 = (v1-1,v2-1,....,vn-1) à this vector is in the same field since vi-1 belongs to F

We can also define a scalar multiplication on vectors.

a*V = (a*v1,a*v2,....,a*vn)

Since a*vi is defined on the underlying field elements, this scalar multiplication is closed, and associative.

1 (the multiplication identity of the underlying field) is the identity for scalar multiplication

Scalar multiplication is associative: (c*d)*V = c*(d*V)

21


CSCE 5760: Design For Fault ToleranceDistributive Laws

c*(U+V) = c*U + c*V

(c+d)*V = (c*V + d*V)

For scalar multiplication we cannot define inverses, since the identity is not a vector.

We can sometimes define vector multiplication (this is different from dot product)U*V = (u1*v1, u2*v2,......, un*vn)

Now can define an identity vector (1,1,...,1)

This vector multiplication is closed, associative

U*(c*V + d*W) = c*(U*V) + d*(U*W)

Then we have a linear associative algebra.

22

9/24/19

12


CSCE 5760: Design For Fault ToleranceLinear combination of vectors.

V = a*A + b*B +… where A and B are vectors themselves.

In other words, we can express one vector as a combination of other vectors.

If we start with say n vectors (V1, V2, ..., Vn). we can define a vector space by taking all possible linear combination of these vectors.

The set of vectors (V1, V2, ..., Vk) are known as linearly independent if we cannot express any one of the vectors as a linear combination of the others vectors. à basis vectors

We can talk about Subgroup, we can talk of subspace, subfields etc.

Subgroup. [H, +] is a subgroup of [G,+] if H is a subset of G and H is a group under +.

That is, + must be closed within H, must be associative, must have an identity and must define inverses for the elements of H. But, we need only to check for closure and inverses (the other properties follow).

Examples. Consider the set of {0,1,a,b} with the addition defined – here we have q=22

23



{0,1} forms a subgroup under the same operations. But {a,b} does not.--- does not contain the additive identify

In general for subgroup H = {h0, h1, ...., hm-1}, m is the cardinaility of the set (also known as the rank or order of the group)

The identity element of G is also the identity element of H.

What is the cardinality of group G? Must be an integer multiple of m. Why?

24

9/24/19

13


CSCE 5760: Design For Fault ToleranceConsider writing as follows (each gi is a member of the group but not in H)

h0 h1 h2 ...... hm-1

g1+h0 g1+h1 g1+h2 g1+hm-1

g2+h0 g2+h1 g2+h2 g2+hm-1........................................................................

gk+h0 gk+h1 gk+h2 gk+hm-1

Now we have rows – all of which are members of the group – but not of the subgroup H (except first row).

So, the cardinality of G is an integral multiple of the cardinality of H

Each row of elements {gj+hi | for hi in H} is called a co-set of H, and gj is called the co-set leader

Co-sets are useful in detecting and correcting errors.

25


CSCE 5760: Design For Fault ToleranceHow do we know if two elements g and g' are in the same co-set? If they are, then we can use the co-set leader say gj and express g and g’ as

g = gj+hi and g' = gj+hk

Consider the element g-1+g' = ( gj+hi)-1+ (gj+hk) = ( hi-1+gj -1)+ (gj+hk) = (hi-1+hk)

Since hi and hk are element of H, ( hi-1+hk) must also be an element of H.

In other words, if g-1+g' belongs to H then g and g' belong to the same co-set.

Normal Subgroup. H is a normal subgroup of G if and only if for every element g of G and any element of h, g-1+h+g is in H.

Vector Subspaces. Any linear combination of vectors, say V1, V2, ..., Vn forms a subspace.

26

9/24/19

14


CSCE 5760: Design For Fault ToleranceIf we can obtain the same vector space with k linearly independent vectors U1, U2,.., Uk then we call these k vectors as the basis and the vector subspace describes a k dimensional space.

We can represent any vector in the k-dimensional space as a linear combination of the basis vectors.

V = (a1*U1 + a2*U2 + ....+ak*Uk )

We can also represent this linear combination as(a1, a2, ..., an) [U] where [U] is a matrix written as

U1

U2U3

….Uk

We should know that two Matrices M and N are equivalent if one can be obtained from the other by using 1)row transposition, 2)row arithmetic, 3) column transposition and 4) column arithmetic.

27


CSCE 5760: Design For Fault ToleranceFor example consider the following matrices in binary field

1 0 0 1 1 1 0 0 1 10 1 0 1 0 1 1 0 0 10 0 1 0 1 1 1 1 0 0

These matrices are equivalent (add row 1 to row 2; add row 1 and row 2 to row 3)

For any k-dimensional vector space, we can use the following basis

(1,0,0....0); (0, 1,...,0)....(0,0,...1).In matrix representation this looks like the Identity matrix.

However, we can also represent these vectors as matrices. Consider the matrices aboveWe have 3 independent rows and thus 3 independent vectors

We can generate vectors inn 3 dimensional space (subspace of 4 dimensions)

28

9/24/19

15


CSCE 5760: Design For Fault ToleranceIntroduction to Linear Codes.

A linear code is nothing but a subspace. Every element (or codeword) of a linear code can be expressed as a linear combination of the basis vectors or basis codewords.

The basis codewords can be written as a matrix.

Consider the following Hamming code. p1 p2 m1 p4 m2 m3 m4 à as we have seen before

1 0 0 0 1 0 1 1 0 1 0 1 0 10 1 0 0 1 1 1 0 1 1 0 0 1 10 0 1 0 1 1 0 0 0 0 1 1 1 10 0 0 1 0 1 1 1 1 1 1 1 1 1

Either of these matrices form the basis for the 4-dimensional vector subspace in a 7-dimensional space à And give the code we are looking for.

The left matrix looks like {I | P} where I is the identify matrix.

29


CSCE 5760: Design For Fault ToleranceIn order to construct a linear code, we need to consider a vector space (actually a subspace) over a finite Galois Field (filed over set containing a prime number of elements or power of a prime set)

If we are looking at a k dimensional subspace of a n-dimensional vectors, we will call the code [n, k] code over GFq.

Sometimes we would also describe the "minimum distance" d; [n, k, d] code.

Note since the linear code C is a subspace, for any two vector U and V in C,U+V is also in C (since the vector addition must be closed).

And for any scalar a in GFq, a*V is also in C (scalar multiplication is closed).

The zero vector (0,0,..,0) always belongs to any linear code -- the identity over vector addition.

Thus the codewords (elements of C) are n-dimensional vectors forming a k dimensional vector subspace.

In terms of codes, n-k is the number of parity bits or redundancy, k is the number of data bits.n-k is the redundancy

30

9/24/19

16


CSCE 5760: Design For Fault ToleranceWeight of a vector or a codeword is the number of non-zero elements of the vector (represented as a tuple).

For example, consider a 4 dimensional vector over GF3 –(1, 0, 2, 0) weight is 2 (2, 2, 2, 1) weight is 4.

Distance between two vectors: is the weight of the difference vector.(1,0,2,0) - (2,2,2,1) = (1,0,2,0) + (-2, -2, -2, -1) = (1,0,2,0) + (1, 1,1, 2) = (2, 1, 0, 2)

Difference = weight of (2,1,0,2) = 3.

Note. Weight of a binary vector is the number of 1's in the vectorDistance between 2 binary vectors is the number places they differ

1 0 0 0 1 0 1 weight = 30 1 0 0 1 1 1 weight = 40 0 1 0 1 1 0 weight = 30 0 0 1 0 1 1 weight = 3

Distance [(1000101), (0100111)] = weight (1100010) = 3

31


CSCE 5760: Design For Fault ToleranceMinimum distance of a vector space (or a code -- vector subspace)The minimum of all distances between any pair of vectors or codewords.

Note, the difference between any two codewords is a codeword since vector addition/subtraction is closed.

So, minimum distance of a code is the minimum weight of a codeword in the code.

How to generate codewords. A code is a vector space. We can think of basis vectors which can be written as generator matrix using the basis vectors.

For a [n,k] code -- k-dimensional subspace of n-dimensional space, we need k basis vectors; each vector is a n tuple. That is we have k * n matrix.

Since we are dealing with k-dimensional subspace, we can have only qk codewords (vectors) in our code.

Since we can manipulate matrices and obtain a new equivalent matrix, it will be useful to convert a generator matrix of a code to look like [Ik | Ak x n-k] matrix. (Ik is identity matrix)

32

9/24/19

17


CSCE 5760: Design For Fault ToleranceConsider the Hamming code example

p1 p2 m1 p4 m2 m3 m4 à this is what we want to generate

1 0 0 0 0 1 1 1 0 1 0 1 0 10 1 0 0 1 0 1 0 1 1 0 0 1 10 0 1 0 1 1 0 0 0 0 1 1 1 10 0 0 1 1 1 1 1 1 1 1 1 1 1

We can use either of these generator matrices. The one on the right is how I have shown previously when we generated parity bits for Hamming code.

This is a [7, 4] code. We can have only 24 = 16 codewords. We can encode 4 bit numbers since we have 16 different possible numbers here. We can encode 4-bit numbers to get 7-bit codewords --- 3 parity bits. To do this all we have to do is multiply the 4 bit number by the generator matrix.

The same applies for any [n,k] code over any GFq field.

33


CSCE 5760: Design For Fault ToleranceHow to decode?

Note that Vector spaces forms a group under vector addition. Since a Code is a subspace, it forms a subgroup under vector addition.

It means we can construct co-sets using the elements of the code and the vectors which are not part of the code.

These vectors which are not part of the code indicate errors.

The co-set leader (the vector with the smallest weight in a co-set) indicates the error or syndrome.

We could create an array.

For example with 7-bit binary numbers and 16 code words, we would have 7 co-sets in addition to the code – note the 7 co-sets indicate the 7 possible error states we discussed

So, we can construct an array 8 x 16 (7 co-sets and the codewords)

We can then locate where the received (code with error) falls and appropriately decode it.

34

9/24/19

18


CSCE 5760: Design For Fault ToleranceIn general, we should compute for each syndrome, the coset leader.

How to find co-sets?We can write the co-sest as

h0 h1 h2 ...... hm

g1+h0 g1+h1 g1+h2 g1+hmg2+h0 g2+h1 g2+h2 g2+hm

........................................................................

gk+h0 gk+h1 gk+h2 gk+hm

That is, the first row is our code wordsFor every gi not in the code, we create a new row and gi is the co-set leader

For gi compute syndrome.For any y, compute syndrome – then use the co-set array to locate the row corresponding to the syndrome – find y in that row.

The correct code is found in the first row for that column

35


CSCE 5760: Design For Fault ToleranceHowever, this is too slow or takes of lot of memory. Alternative?

Dual Code space. For each linear code, we can construct another code that is an orthogonal space.

That is, if we have a code space C, we can build another code space D which is perpendicular to C.

Or, if a code (vector) is in C, it is not in D, and vice versa. But D is a linear code by itself.

If a code C is [n,k] subspace, its orthogonal dual will be [n, n-k] subspace.

How do we find this orthogonal space?

If G is the generator matrix for C, the generator matrix for D (say H) satisfies

G x HT = 0; If G = [I | A] then H = [AT | I].

We call H the parity check matrix.

36

9/24/19

19


CSCE 5760: Design For Fault ToleranceAssume we received a data item y (n bit value). If this is correct value, then it must belong C and if not in D.

If we multiply y by HT, we should get a zero vector if y is correct. If we get a not zero, then there is an error and y * HT is called a syndrome.

For each syndrome, we can identify a co-set in which y belongs (hence we can decode).

Consider the [7,4,3] (dimension of vector space, dimension of code space and minimum distance) Hamming code – use the following generator matrix

1 0 0 0 0 1 10 1 0 0 1 0 10 0 1 0 1 1 00 0 0 1 1 1 1

37


CSCE 5760: Design For Fault ToleranceThe following are all the possible codes.

data code

0000 0000 0000001 0001 1110010 0010 1100011 0011 0010100 0100 1010101 0101 0100110 0110 0110111 0111 1001000 1000 0111001 1001 1001010 1010 1011011 1011 0101100 1100 1101101 1101 0011110 1110 0001111 1111 111

What is the parity check Matrix H?

G = [I | A] 1 0 0 0 0 1 10 1 0 0 1 0 10 0 1 0 1 1 00 0 0 1 1 1 1

H = [AT | I]0 1 1 1 1 0 01 0 1 1 0 1 01 1 0 1 0 0 1

38

9/24/19

20


CSCE 5760: Design For Fault ToleranceConsider our Hamming code [7,4,3] – 3 parity bitsWhat are the possible syndromes?

0000 000 0001 111 0010 110 0011 001 0100 101 0101 010 0110 011 0111 100 syndrome

0000 001 0001 110 0010 111 0011 000 0100 100 0101 011 0110 010 0111 101 111

0000 010 0001 101 0010 100 0011 011 0100 111 0101 000 0110 001 0111 110 110

Note the syndrome indicates the error state as represented by the paritiesAnd the co-set leader indicates which bit is in error (or just subtract this co-set from the received value)

consider the correct codeword [1010101] but we received [1010111] y*HT = [001] give us the syndrome

Remember co-sets and co-set leaders?We can think of creating a matrix as we did before.

39


CSCE 5760: Design For Fault Toleranceh0 h1 h2 ...... hm Syndromeg1+h0 g1+h1 g1+h2 g1+hmg2+h0 g2+h1 g2+h2 g2+hm........................................................................

gk+h0 gk+h1 gk+h2 gk+hm

Now we can use this matrix to find the co-set leader based on the syndrome.Then we can subtract the co-set leader from received value to get original codeword

Let us see how we can construct H matrix for any number of data bits Note that if we can construct the H matrix, we can get G matrix for any code

Suppose you want to create a code for d data bitsnote that we need r parity bits where 2r >= d+r+1 à number of statesand n =d+r

Here we are looking at distance 3 (or only one bit is in error hence the number of states is d+r+1)

40

9/24/19

21


CSCE 5760: Design For Fault ToleranceH will have r rows and n columns.We can start with H by using all non zero values for columns Then rearrange H to look like [AT | I] and then obtain G

Consider r= 3 (n=7 and d=4)Let us start H as

0 0 0 1 1 1 10 1 1 0 0 1 11 0 1 0 1 0 1

We can rewrite this into 0 1 1 1 1 0 01 0 1 1 0 1 01 1 0 1 0 0 1

Then G = 1 0 0 0 0 1 10 1 0 0 1 0 10 0 1 0 1 1 00 0 0 1 1 1 1

Each column is a non-zerovalue with 3 bits

41


CSCE 5760: Design For Fault ToleranceFor any field over q, we need to create H using non zero values possible as columns. For example consider the filed over (0, 1, 2) or q=3Let us use 2 parity digits (r =2)

So 32 >= d + r+1 Now we can have 6 data digits

One example H matrix can be (2*8)

0 0 1 1 1 2 2 21 2 0 1 2 0 1 2

We can rewrite the H to be in the standard form [AT | I ]and then get G matrix

42

csce 5760: design for fault tolerance

Documents