
Error-Correcting Codes for Concurrent Error Correction in Bit-parallel Systolic and Scalable Multipliers for Shifted Dual Basis of $GF(2^m)$

Chiou-Yng Lee1, *, Pramod Kumar Meher2, and Yung-Hui Chen1

1 Department of Computer Information and Network Engineering,

Lunghwa University of Science and Technology,

Taoyuan 333, Taiwan

[email protected]

2 Department of Embedded Systems,

Institute for Infocomm Research,

Singapore, 138632

[email protected]

Received 15 May 2011; Revised 20 July 2011; Accepted 30 August 2011

Abstract. This work presents a novel bit-parallel systolic multiplier for the shifted dual basis (SDB) of $GF(2^m)$. SDB multiplication for all trinomials can be represented as the sum of two Hankel matrix-vector multiplications. The proposed multiplier architecture comprises one Hankel multiplier and one $(2m-1)$-bit adder. An algebraic encoding scheme based on linear cyclic codes is adopted to implement the multiplications with concurrent error correction (CEC). The latency overhead is shown analytically to be four extra clock cycles compared with the multiplier without CEC. The block Hankel matrix-vector representation is used to derive a CEC scalable SDB multiplier. In the binary field $GF(2^{84})$, the space overhead of the proposed bit-parallel architecture using a cyclic code is around 22.8%. When seven or fewer errors are injected, the proposed CEC scalable multiplier corrects them with a probability of nearly 99.6%. Unlike existing concurrent error detection multipliers that apply the parity prediction scheme, the proposed architectures have multiple error-detection capabilities.

Keywords: Fault-based attack, finite field multiplication, linear cyclic code, concurrent error correction

1 Introduction


Hardware architectures used in cryptographic applications require large numbers of circuits to perform the basic arithmetic operations, especially multiplication. In cryptographic applications [1], the field size can range from 160 to 2048 bits. Bit-parallel multipliers in such applications require more than a million transistors for high-speed implementation. Consequently, one or more of these transistors are likely to become faulty during operation, potentially producing an incorrect output of the field multiplication. Several works [2]-[5] have, therefore, addressed concurrent error detection (CED) and concurrent error correction (CEC) for digital electronic circuits. Additionally, in various digital signature and identification schemes an attacker can inject faults into the hardware, such that the incorrect outputs completely expose the secret signatures, as seen in [6], [7]. The design of efficient multipliers with CEC is therefore highly desirable, as it promotes reliable operation in cryptographic hardware.

In the finite field $GF(2^m)$, a field element is typically represented using one of three bases: polynomial basis (PB), normal basis (NB) and dual basis (DB). Various $GF(2^m)$ multipliers [8]-[13], including bit-serial, bit-parallel and digit-serial designs, have been investigated. Most $GF(2^m)$ multipliers seek to minimize time and space complexity. However, the major shortcoming of these circuits is that they cannot detect or correct errors in their results.

Parity codes have proven useful in designing concurrent error detection (CED) finite field arithmetic units using a design technique called parity prediction. This approach pre-calculates the parity of the arithmetic operation and compares it with the actual parity of the output to detect errors in the results. In recent years, a single parity code has been applied to design CED multipliers [14], [15]. However, the disadvantage of these architectures is that they need a long execution time for computing the parity. Multiple parity prediction in finite field multipliers [16], [17] has recently been shown to achieve an error detection capability of almost 100%.

* Correspondence author

Journal of Computers Vol. 22, No. 3, October 2011

Linear block codes are well established in the theory and practice of error control coding. In particular, a $(2^m-1, 2^m-m-1)$ Hamming code [18] is a perfect linear block code, which requires only $m$ parity check digits. The principal techniques associated with linear codes support two approaches: (1) error detection and retrial, and (2) error correction. Error detection and retrial uses CED circuitry to monitor the outputs of a circuit. If an error in the system is detected, rollback and retrial can be applied to prevent a failure. Error correction uses a decoding procedure to correct the errors. Popular decoding procedures, such as those for BCH codes [19], [20], have three major steps: (a) calculate the syndrome values from the received word; (b) determine the error location polynomial $\sigma(x)$; and (c) find the roots of $\sigma(x)$ to correct the errors. CEC PB multipliers that use a single error-correcting (SEC) linear code have recently been developed [4], [21]. However, these multipliers support only single error correction.

Using a shifted polynomial basis representation [22], this work describes a new multiplication scheme over $GF(2^m)$, called shifted dual basis (SDB) multiplication. If the field is constructed from an irreducible trinomial, then the SDB multiplication can be represented by the sum of two Hankel matrix-vector multiplications. The Hankel matrix-vector representation is utilized in the development of bit-parallel and scalable SDB multipliers. To develop a CEC architecture, an error decoding approach is employed to correct errors in bit-parallel systolic and scalable multipliers. A CEC scalable multiplier can be designed with multiple error-correcting capabilities. The proposed scheme can be used effectively to produce a fault-tolerant cryptographic architecture, even for a large field. This method considerably reduces the space overhead of the proposed architectures. For the bit-parallel architecture, the space overhead is approximately 22.8%, whereas for the traditional CEC multiplier [4] it ranges from 50% to 112%.

The remainder of the investigation is organized as follows. Section 2 addresses the Hankel matrix-vector representation and a new SDB multiplication architecture. Section 3 presents a CEC SDB multiplier using a single error decoding approach. Section 4 develops a CEC scalable SDB multiplier based on the structure of the CEC multiplier. Section 5 estimates the probability of a corrected error and the complexity of the proposed architectures. Finally, Section 6 offers concluding remarks.

2 Preliminaries

This section briefly reviews Hankel matrix-vector multiplication and its associated bit-parallel systolic architecture. The Hankel matrix-vector representation is employed here to develop a new shifted dual basis multiplication.

2.1 Hankel Matrix-Vector Multiplication

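Definition 1. An $m \times m$ matrix $H$ is called a Hankel matrix if it satisfies the relation $H(p,q)=H(p-1,q+1)$ for $1 \le p,\ q \le m-1$, where $H(p,q)$ represents the $(p,q)$-entry of $H$. Such a matrix is determined by the $2m-1$ entries in its first row and last column:

$$H = \begin{bmatrix} h_0 & h_1 & \cdots & h_{m-1} \\ h_1 & h_2 & \cdots & h_m \\ \vdots & \vdots & \ddots & \vdots \\ h_{m-1} & h_m & \cdots & h_{2m-2} \end{bmatrix} \tag{1}$$

By Definition 1, an $m \times m$ Hankel matrix $H$ is defined in terms of the vector $H=[h_0, h_1, \ldots, h_{2m-2}]$ over $GF(2)$. Let $A=[a_0, a_1, \ldots, a_{m-1}]$ be a vector and $C=HA$, where $H$ is an $m \times m$ Hankel matrix given by the vector $H=[h_0, h_1, \ldots, h_{2m-2}]$. The product is given by $c_i = \sum_{j=0}^{m-1} h_{i+j}a_j$ for $0 \le i \le m-1$, as can be confirmed.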

Example 1. The $5 \times 5$ Hankel matrix given by the vector $H=[h_0, h_1, h_2, h_3, h_4, h_5, h_6, h_7, h_8]$ is

$$H = \begin{bmatrix} h_0 & h_1 & h_2 & h_3 & h_4 \\ h_1 & h_2 & h_3 & h_4 & h_5 \\ h_2 & h_3 & h_4 & h_5 & h_6 \\ h_3 & h_4 & h_5 & h_6 & h_7 \\ h_4 & h_5 & h_6 & h_7 & h_8 \end{bmatrix}.$$

The components of the vector $C=HA$ are thus given by

$$\begin{aligned}
c_0 &= h_0a_0 + h_1a_1 + h_2a_2 + h_3a_3 + h_4a_4,\\
c_1 &= h_1a_0 + h_2a_1 + h_3a_2 + h_4a_3 + h_5a_4,\\
c_2 &= h_2a_0 + h_3a_1 + h_4a_2 + h_5a_3 + h_6a_4,\\
c_3 &= h_3a_0 + h_4a_1 + h_5a_2 + h_6a_3 + h_7a_4,\\
c_4 &= h_4a_0 + h_5a_1 + h_6a_2 + h_7a_3 + h_8a_4.
\end{aligned}$$

Figure 1 presents the Hankel multiplier of Example 1, which comprises $5 \times 5$ U-cells for $m=5$. Each U-cell consists of one 2-input AND gate, one 2-input XOR gate and two 1-bit latches, as shown in Figure 2. The proposed systolic architecture is a regular array of simple processors.
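As a software illustration of the product computed by the array of U-cells, the following minimal Python sketch evaluates $c_i=\sum_j h_{i+j}a_j$ over $GF(2)$. The function name `hankel_mul` and the sample vectors are illustrative assumptions and do not come from the paper; bits are stored as 0/1 integers.

```python
def hankel_mul(h, a):
    """Return C = H*A over GF(2), where H(i, j) = h[i + j] (Definition 1)."""
    m = len(a)
    assert len(h) == 2 * m - 1, "a Hankel matrix is defined by 2m-1 entries"
    # AND models the GF(2) product, and the sum modulo 2 models the XOR chain.
    return [sum(h[i + j] & a[j] for j in range(m)) % 2 for i in range(m)]


h = [1, 0, 1, 1, 0, 0, 1, 0, 1]   # h0..h8 for m = 5 (arbitrary sample values)
a = [1, 1, 0, 1, 0]               # a0..a4
print(hankel_mul(h, a))
```

Each output bit corresponds to one column of U-cells: the AND gate forms $h_{i+j}a_j$ and the XOR gate accumulates it.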

Fig. 1. The bit-parallel systolic Hankel multiplier

Fig. 2. The detailed circuit of the U-cell

2.2 New Shifted Dual Basis Multiplication over $GF(2^m)$

The binary field $GF(2^m)$ is composed of $2^m$ field elements. Each $GF(2^m)$ is constructed with an irreducible polynomial $P(x)=p_0+p_1x+\cdots+p_{m-1}x^{m-1}+x^m$ over $GF(2)$. Assume that $\alpha$ is a root of the irreducible polynomial $P(x)$, such that $P(\alpha)=0$; the set $\{1,\alpha,\alpha^2,\ldots,\alpha^{m-1}\}$ is called the polynomial basis of $GF(2^m)$. Every element $A$ in $GF(2^m)$ is represented by

$$A = \sum_{i=0}^{m-1} a_i\alpha^i,$$

where $a_i \in GF(2)$ ($0 \le i \le m-1$) is the $i$th coordinate of $A$. Each element in $GF(2^m)$ has a unique representation as a linear combination of the polynomial basis elements. The multiplication of two elements in $GF(2^m)$ is uniquely determined by $\alpha^m=\sum_{j=0}^{m-1}p_j\alpha^j$, $p_j \in GF(2)$, since $P(\alpha)=0$. Multiplication in $GF(2^m)$ can be performed by polynomial multiplication modulo $P(x)$ on the field elements, which must be represented as polynomials of degree $m-1$ or less.

Definition 2. Let $\{\lambda_i\}$ and $\{\mu_j\}$ be two bases of $GF(2^m)$, let $f: GF(2^m) \to GF(2)$ be a linear function, and let $\beta$ be in $GF(2^m)$, $\beta \ne 0$. Then, the bases $\{\lambda_i\}$ and $\{\mu_j\}$ are said to be dual with respect to $f$ and $\beta$ if

$$f(\beta\lambda_i\mu_j) = \begin{cases} 1, & i=j \\ 0, & i \ne j \end{cases} \tag{2}$$

Accordingly, the set $\{\mu_j\}$ is called the dual basis of $\{\lambda_i\}$.

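For $A \in GF(2^m)$, we have $A=\sum_{i=0}^{m-1}a_i\alpha^i=\sum_{i=0}^{m-1}a_i^{*}\mu_i$, where $a_i$ and $a_i^{*}$ are the coordinates of $A$ with respect to the polynomial basis and its dual basis, respectively. Let two elements $A$ and $B$ in $GF(2^m)$ be represented by $A=\sum_{i=0}^{m-1}a_i\alpha^i$ and $B=\sum_{i=0}^{m-1}b_i\mu_i$, respectively. Based on the assumption that the element $C=\sum_{i=0}^{m-1}c_i\mu_i$ is the product of $A$ and $B$, the product yields the following coefficient $c_i$ of $C$:

$$c_i = f(\beta\alpha^iC) = f(\beta\alpha^iAB) = \sum_{j=0}^{m-1}a_jf(\beta\alpha^{i+j}B) = \sum_{j=0}^{m-1}a_jb_{i+j}, \tag{3}$$

where $b_{i+j}=f(\beta\alpha^{i+j}B)$. Therefore, the product $C$ in the matrix-vector representation is given by

$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{bmatrix} = \begin{bmatrix} b_0 & b_1 & \cdots & b_{m-1} \\ b_1 & b_2 & \cdots & b_m \\ \vdots & \vdots & \ddots & \vdots \\ b_{m-1} & b_m & \cdots & b_{2m-2} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{bmatrix} \tag{4}$$

Since $P(\alpha)=0$, we have

$$\alpha^{m+i} = \sum_{j=0}^{m-1}p_j\alpha^{i+j}.$$

From the above equations, we obtain

$$b_{m+i} = \sum_{j=0}^{m-1}p_jb_{i+j}, \quad 0 \le i \le m-2. \tag{5}$$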
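The Hankel structure of dual basis multiplication can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: field elements are bitmask integers in the polynomial basis, the linear function `f` is assumed to be "coefficient of $\alpha^0$", $\beta=1$, and the values $f(\beta\alpha^kX)$ play the role of the dual-basis coordinates of $X$. The names `gf_mul`, `f` and `db_coords` are hypothetical.

```python
def gf_mul(x, y, P, m):
    """Multiply two polynomial-basis elements (bitmask ints) modulo P(x)."""
    r = 0
    for i in range(m):
        if (y >> i) & 1:
            r ^= x << i
    for d in range(2 * m - 2, m - 1, -1):      # reduce degrees 2m-2 down to m
        if (r >> d) & 1:
            r ^= P << (d - m)
    return r


def f(x):
    """Assumed linear function GF(2^m) -> GF(2): the coefficient of alpha^0."""
    return x & 1


def db_coords(X, beta, P, m, count):
    """Return f(beta * alpha^k * X) for k = 0..count-1."""
    out, t = [], gf_mul(beta, X, P, m)
    for _ in range(count):
        out.append(f(t))
        t = gf_mul(t, 0b10, P, m)              # multiply by alpha
    return out


m, P, beta = 5, 0b101001, 1                    # P(x) = x^5 + x^3 + 1
A, B = 0b10110, 0b01101                        # bit i of A is a_i
b = db_coords(B, beta, P, m, 2 * m - 1)        # b_0 .. b_{2m-2}
c_hankel = [sum(((A >> j) & 1) & b[i + j] for j in range(m)) % 2 for i in range(m)]
c_direct = db_coords(gf_mul(A, B, P, m), beta, P, m, m)
assert c_hankel == c_direct                    # c_i = sum_j a_j b_{i+j}
assert all(b[m + i] == b[i] ^ b[i + 3] for i in range(m - 1))   # b_{m+i} = sum_j p_j b_{i+j}
```

The two assertions mirror the Hankel product and the recurrence for $b_{m+i}$; both hold by the linearity of `f`, whatever nonzero `beta` is chosen.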

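From another work (Fan and Dai, 2005), the shifted polynomial basis (SPB) of $GF(2^m)$ is defined as follows:

Definition 3. Let $v$ be an integer and the ordered set $M=\{\alpha^i \mid 0 \le i \le m-1\}$ be a polynomial basis of $GF(2^m)$ over $GF(2)$. The ordered set $\alpha^{-v}M=\{\alpha^{i-v} \mid 0 \le i \le m-1\}$ is called the shifted polynomial basis (SPB) with respect to $M$.

This SPB is similar to the polynomial basis representation. For example, an element $A \in GF(2^m)$ can be represented as

$$A = \sum_{i=0}^{m-1}a_i\alpha^{i-v}. \tag{6}$$

Now, let $A=\sum_{i=0}^{m-1}a_i\alpha^{i-v}$ and $B=\sum_{i=0}^{m-1}b_i\mu_i$ be two elements of $GF(2^m)$. Assume that the element $C=\sum_{i=0}^{m-1}c_i\mu_i$ is the product of $A$ and $B$. This multiplication is called the shifted dual basis (SDB) multiplication over $GF(2^m)$. The property in Definition 2 is used to compute the coefficient $c_i$ of $C$:

$$c_i = f(\beta\alpha^iAB) = \sum_{j=0}^{m-1}a_jf(\beta\alpha^{i+j-v}B) = \sum_{j=0}^{m-1}a_jb_{i+j-v}. \tag{7}$$

Given the matrix-vector representation, the SDB multiplication can be represented as follows:

$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{bmatrix} = \begin{bmatrix} b_{-v} & b_{1-v} & \cdots & b_{m-1-v} \\ b_{1-v} & b_{2-v} & \cdots & b_{m-v} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m-1-v} & b_{m-v} & \cdots & b_{2m-2-v} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{bmatrix} \tag{8}$$

Assume that $v=n$. The SDB multiplication then yields the following matrix-vector form:

$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{m-1} \end{bmatrix} = \begin{bmatrix} b_{-n} & b_{1-n} & \cdots & b_{m-1-n} \\ b_{1-n} & b_{2-n} & \cdots & b_{m-n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m-1-n} & b_{m-n} & \cdots & b_{2m-2-n} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{bmatrix} \tag{9}$$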

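Assume that the field is constructed from an irreducible trinomial $P(x)=x^m+x^n+1$. Hence, in computing the product in (9), each term $b_i$ for $-n \le i \le 2m-2-n$ can be computed as follows:

$$b_i = b_{i+m} + b_{i+n}, \quad \text{for } -n \le i \le -1 \tag{10a}$$

$$b_i = b_{i-m} + b_{i-m+n}, \quad \text{for } m \le i \le 2m-2-n \tag{10b}$$

$$b_i = b_i, \quad \text{for } 0 \le i \le m-1 \tag{10c}$$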

Each term $b_i$ in (10) involves at most two coefficients of $B$ because $v=n$ is selected. Substituting (10) into (9) yields the product $C$:

$$C = B_0A + B_1A = (B_0+B_1)A, \tag{11}$$

where $B_0$ and $B_1$ are both Hankel matrices, each defined by a vector of $2m-1$ entries obtained from (10).

For clarity, the finite field $GF(2^5)$ generated by $P(x)=x^5+x^3+1$ is used as an example of the SDB multiplication, where $\alpha$ is a root of the irreducible $P(x)$ and a basis and its dual basis over $GF(2^5)$ are given. For two field elements $A, B \in GF(2^5)$ and their product $C$, applying (10) yields $C$ as the sum of two Hankel matrix-vector products. In this matrix-vector representation, $B_0$ and $B_1$ are Hankel matrices, each defined by one vector. Therefore, Figure 3 presents the SDB multiplication architecture, which is based on the structure of the Hankel multiplier in Figure 1.

Fig. 3. The proposed SDB multiplier for all trinomials
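The trinomial reduction used for SDB multiplication can be explored with a short Python sketch. It rests on one assumption that is stated here rather than taken from the paper: $b_i$ denotes $f(\beta\alpha^iB)$ for every integer $i$, so by linearity $b_i$ is the sum of those $b_j$ ($0 \le j \le m-1$) for which $\alpha^j$ appears in the polynomial-basis expansion of $\alpha^i$. The function names are hypothetical.

```python
def alpha_power(i, m, n):
    """Polynomial-basis bitmask of alpha**i for P(x) = x^m + x^n + 1 (i may be negative)."""
    P = (1 << m) | (1 << n) | 1
    v = 1
    for _ in range(abs(i)):
        if i > 0:
            v <<= 1                  # multiply by alpha
            if (v >> m) & 1:
                v ^= P               # alpha^m = alpha^n + 1
        else:
            if v & 1:
                v ^= P               # make the constant term zero before dividing by alpha
            v >>= 1
    return v


def sdb_hankel_vector(m, n):
    """For i = -n .. 2m-2-n, list the indices j whose b_j sum to b_i."""
    return [[j for j in range(m) if (alpha_power(i, m, n) >> j) & 1]
            for i in range(-n, 2 * m - 1 - n)]


# GF(2^5) with P(x) = x^5 + x^3 + 1, i.e. m = 5 and n = 3
for i, support in zip(range(-3, 6), sdb_hankel_vector(5, 3)):
    print(f"b[{i}] =", " + ".join(f"b{j}" for j in support))
```

Every entry of the resulting vector is a sum of at most two coefficients of $B$, which is what allows the Hankel matrix to be split into two Hankel matrices whose defining vectors contain single coefficients or zeros.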

3 Proposed CEC SDB Multiplier

In this section, the cyclic code is used to develop a new CEC SDB multiplier.

3.1 Strategy of CEC Multiplication Architecture

A cyclic code is an error-correcting code that both detects and corrects errors. In contrast, a simple parity code cannot correct errors, nor can it reliably detect more than a single error. In the $(n,m)$ cyclic code, the binary information sequence is segmented into message blocks of fixed length $m$, denoted $U=(u_0,u_1,\ldots,u_{m-1})$.

Definition 4. Let the $(n,m)$ cyclic code be constructed from an irreducible $G(x)$ of degree $n-m$, and let $P_U$ denote the parity digits of the message $U$:

$$P_U = x^{n-m}U \bmod G(x) \tag{12}$$

Thus, the systematic structure of the codeword $V$ is given as

$$V = P_U + x^{n-m}U \tag{13}$$

Based on (12) and (13), each message $U$ can be translated into the binary $n$-tuple $V=(v_0,v_1,\ldots,v_{n-1})$ with $n > m$. A one-to-one correspondence should exist between a message $U$ and its codeword $V$, and all the $2^m$ codewords must be distinct.
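The systematic encoding of (12) and (13) can be sketched in Python as follows. This is an illustration only: polynomials over $GF(2)$ are stored as bitmask integers (bit $i$ is the coefficient of $x^i$), the helper names `poly_mod` and `encode` are assumptions, and the $(7,4)$ code with $G=x^3+x+1$ is used as the sample code.

```python
def poly_mod(a, g):
    """Remainder of a(x) divided by g(x) over GF(2); polynomials are bitmask ints."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def encode(u, g, n, m):
    """Systematic codeword V = P_U + x^(n-m) U, with P_U = x^(n-m) U mod G."""
    shifted = u << (n - m)
    # The parity occupies the low n-m bits, which are zero in `shifted`.
    return poly_mod(shifted, g) | shifted


G = 0b1011                                      # G(x) = x^3 + x + 1
codewords = [encode(u, G, 7, 4) for u in range(16)]
assert all(poly_mod(v, G) == 0 for v in codewords)   # every codeword is divisible by G
assert len(set(codewords)) == 16                     # the 2^m codewords are distinct
```

The message bits appear unchanged in the upper $m$ positions of each codeword, so the multiplier output can be used directly while the parity digits travel alongside it.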

In the following sections, the $(n,m)$ cyclic code is used to develop the CEC SDB multiplier. Figure 4 displays the functional block of the CEC multiplier. It comprises three major modules: (i) a combinational module, (ii) a parity-check generator (PCG), and (iii) an error corrector (EC). The PCG module uses the two signals $A$ and $B$ to yield the predicted parity-check bits $\hat{P}_C$ of $C=AB$. The values $C$ and $\hat{P}_C$ are combined to form the codeword $V=(v_0,v_1,\ldots,v_{n-1})=\hat{P}_C+x^{n-m}C$, where $C$ is the output of the combinational circuit and $\hat{P}_C$ is generated by the PCG module. The EC module corrects errors in $C$ in two steps: first, it uses $C$ and $\hat{P}_C$ to compute the syndrome $S=\hat{P}_C+x^{n-m}C \bmod G(x)$; second, it uses the syndrome $S$ to correct the errors in the multiplication result.

Fig. 4. The functional block of CEC SDB multiplication architecture

3.2 Parity Check Generator (PCG)

Using the polynomial representation, assume that the $i$th column of the Hankel matrix in (1) is transformed into

$$H_i = \sum_{j=0}^{m-1}h_{i+j}x^j, \quad 0 \le i \le m-1. \tag{14}$$

Therefore, the Hankel matrix-vector multiplication can be represented as

$$C = \sum_{i=0}^{m-1}a_iH_i. \tag{15}$$

Let the polynomial $H_0$ in (14) be encoded as a systematic codeword. The parity check digits of $H_0$ are then given by

$$\hat{P}_{H_0} = x^{n-m}H_0 \bmod G. \tag{16}$$

Next, we partition the polynomial H0 into two terms, i.e.,

(17)

where

Accordingly, computing the parity check digits can be obtained as


(18)

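A comparison of the two polynomials $H_0$ and $H_1$ reveals that

$$H_1 = x^{-1}(H_0+h_0) + h_mx^{m-1}.$$

The parity check digits of $H_1$ are calculated using

$$\hat{P}_{H_1} = x^{-1}\hat{P}_{H_0} + h_0\hat{P}_{x^{-1}} + h_m\hat{P}_{x^{m-1}} \bmod G.$$

Generally, the parity check digits of $H_i$ are computed as

$$\hat{P}_{H_i} = x^{-1}\hat{P}_{H_{i-1}} + h_{i-1}\hat{P}_{x^{-1}} + h_{m-1+i}\hat{P}_{x^{m-1}} \bmod G, \tag{19}$$

where $\hat{P}_{x^{-1}} = x^{n-m-1} \bmod G$ and $\hat{P}_{x^{m-1}} = x^{n-1} \bmod G$.

Notably, once the cyclic code is chosen, $\hat{P}_{x^{-1}}$ and $\hat{P}_{x^{m-1}}$ in (19) are constant polynomials. For the example of the (7,4) code generated using $G=x^3+x+1$, we obtain $\hat{P}_{x^{-1}}=x^2$ and $\hat{P}_{x^{m-1}}=x^6 \bmod G=x^2+1$. Therefore, $\hat{P}_{H_i}$ in (19) is easily computed if $\hat{P}_{H_{i-1}}$ has been previously determined.

Based on (15) and (19), the parity check digits of the Hankel multiplication are expressed as

$$\hat{P}_C = \sum_{i=0}^{m-1}a_i\hat{P}_{H_i}. \tag{20}$$

According to the structure of the proposed Hankel multiplier in Figure 1, assume that the output of the $i$th row of cells is $C_i=C_{i-1}+a_iH_i$, and the codeword $V_i$ is encoded through $C_i$. We obtain

$$V_i = \hat{P}_{C_i} + x^{n-m}C_i, \tag{21a}$$

where

$$\hat{P}_{C_i} = \hat{P}_{C_{i-1}} + a_i\hat{P}_{H_i}. \tag{21b}$$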

Equations (19) and (21b) are used to define the PCG$_i$ cell that calculates $\hat{P}_{C_i}$, as shown in Figure 5. When the cyclic code is generated by an irreducible trinomial of the form $G=x^{n-m}+x^k+1$, the $1/x$ module in Figure 5 is performed by $x^{-1}=x^{n-m-1}+x^{k-1} \bmod G$, which includes two XOR gates. "⊗" and "⊕" denote the $(n-m)$-bit multiplier and adder, respectively. Thus, the time and space complexity of the PCG cell is:

Space complexity:

XOR gates: $2n-2m+2$

AND gates: $2n-2m+1$

1-bit latches: $3n-3m+1$

Time delay: $T_A+T_X+T_L$


Fig.5. The detailed circuit of the PCGi cell
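The idea behind the PCG cells, predicting the parity of the product from the operands instead of from the product itself, can be sketched in Python. The recurrence used below is derived here from the column polynomials $H_i=\sum_j h_{i+j}x^j$ and the parity definition $x^{n-m}H_i \bmod G$; it is an assumption about the cell's behaviour, not a transcription of the paper's equations, and all function names are hypothetical. The generator $G$ must have constant term 1.

```python
def poly_mod(a, g):
    """Remainder of a(x) divided by g(x) over GF(2); polynomials are bitmask ints."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def times_x_inv(p, g):
    """Multiply p by x^(-1) modulo g (g must have constant term 1)."""
    if p & 1:
        p ^= g
    return p >> 1


def predicted_parity(h, a, g, r):
    """Parity of C = sum_i a_i H_i, accumulated row by row; r = n - m = deg(g)."""
    m = len(a)
    p_xinv = times_x_inv(poly_mod(1 << r, g), g)          # x^(r-1) mod g
    p_top = poly_mod(1 << (r + m - 1), g)                 # x^(r+m-1) mod g
    p_h = poly_mod(sum(h[j] << j for j in range(m)) << r, g)   # parity of H_0
    p_c = 0
    for i in range(m):
        if i > 0:   # parity of H_i from the parity of H_{i-1}
            p_h = times_x_inv(p_h, g) ^ (h[i - 1] * p_xinv) ^ (h[i + m - 1] * p_top)
        if a[i]:
            p_c ^= p_h
    return p_c


h = [1, 0, 1, 1, 0, 0, 1]            # h0..h6 for m = 4 (arbitrary sample values)
a = [1, 1, 0, 1]
c = sum((sum(h[i + j] & a[j] for j in range(4)) % 2) << i for i in range(4))
assert predicted_parity(h, a, 0b1011, 3) == poly_mod(c << 3, 0b1011)
```

The final assertion compares the predicted parity with the parity computed directly from the product; in hardware the two values are produced by separate cells, so a fault in the product array makes them disagree.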

3.3 Error Corrector (EC) Module

Assume that an (n,m) cyclic code with generator polynomial G of degree n-m is an error correcting code. This subsection derives the structure of the EC module based on this code.

Let $V=v_0+v_1x+\cdots+v_{n-1}x^{n-1}$ be a codeword encoded as $V=P_C+x^{n-m}C$, where $P_C$ represents the parity-check digits of $C$. Let $R=r_0+r_1x+\cdots+r_{n-1}x^{n-1}$ be the received word at the output of the combinational logic circuit and the parity-check generator in Fig. 4. When an attacker uses a fault-based attack to disturb the combinational logic circuit, $R$ may differ from $V$. The polynomial sum

$$E = R+V = e_0+e_1x+\cdots+e_{n-1}x^{n-1} \tag{22}$$

is an $n$-tuple polynomial, where $e_i=1$ for $r_i \ne v_i$ and $e_i=0$ for $r_i=v_i$. This polynomial $E$ is called the error pattern. Equation (22) implies that the received polynomial $R$ can be represented as the sum of the codeword $V$ and the error pattern $E$, given by

$$R = V+E \tag{23}$$

Since the codeword $V$ is constructed from $G$, every codeword $V$ of the cyclic code is divisible by $G$, i.e., $V=QG$. Dividing $R$ by the generator polynomial $G$ yields

$$R = QG+S \tag{24}$$

The remainder $S=s_0+s_1x+\cdots+s_{n-m-1}x^{n-m-1}$ is called the syndrome polynomial, which is a polynomial of degree $n-m-1$ or lower.

Example 3: Let $G=1+x+x^3$ be used to construct the (7,4) cyclic code, and let $R$ be the received word. The components of the syndrome polynomial $S=s_0+s_1x+s_2x^2$ can then be computed as

$$s_0 = r_0+r_3+r_5+r_6, \quad s_1 = r_1+r_3+r_4+r_5, \quad s_2 = r_2+r_4+r_5+r_6.$$
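The syndrome expressions of Example 3 can be cross-checked against polynomial division. In this hedged Python sketch, received words are bitmask integers (bit $i$ is $r_i$) and the helper names are illustrative assumptions.

```python
def poly_mod(a, g):
    """Remainder of a(x) divided by g(x) over GF(2); polynomials are bitmask ints."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


G = 0b1011                                   # G(x) = 1 + x + x^3


def syndrome(r):
    """Syndrome S = R mod G of a 7-bit received word."""
    return poly_mod(r, G)


def syndrome_by_formula(r):
    """The three XOR expressions of Example 3, packed as s0 + 2*s1 + 4*s2."""
    b = [(r >> i) & 1 for i in range(7)]
    s0 = b[0] ^ b[3] ^ b[5] ^ b[6]
    s1 = b[1] ^ b[3] ^ b[4] ^ b[5]
    s2 = b[2] ^ b[4] ^ b[5] ^ b[6]
    return s0 | (s1 << 1) | (s2 << 2)


assert all(syndrome(r) == syndrome_by_formula(r) for r in range(128))
```

Each XOR expression simply collects the positions $i$ for which $x^i \bmod G$ contains the corresponding power of $x$, which is why the division and the formulas agree for all 128 received words.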

As stated above, the received codeword divided by G yields the syndrome value S, i.e.,

(25a)

where

(25b)

, for $0 \le y \le m-1$ (25c)


$P_C$ denotes the actual parity check digits. As the product $C$ is produced by the multiplier in Figure 1, Equation (25b) can be used to define $m$ W-cells that compute $P_C$. Figure 6 presents in detail the circuit of the W-cell, which consists of one $x$ module, $(n-m)$ AND gates, $(n-m)$ XOR gates, and $(2n-2m+1)$ 1-bit latches. Figure 7 shows the computation of the syndrome $S$, which is given by (25a). In computing the syndrome polynomial $S$, the following properties are obtained.

Proposition 1. If the designed code is a $t$-error-correcting code, then the proposed CEC multiplier can correct up to $t$ errors.

Proposition 2. Based on Proposition 1, assume that the number of errors in the proposed CEC multiplier is at most one; the computed syndrome value $S$ has the following properties: (i) if $S=0$, then the multiplier has not incurred an injected fault; (ii) if $S \ne 0$, then the multiplier has suffered an injected fault.

Next, if the error occurs at the $i$th bit of $R$, such that $E=x^i$, then the corresponding syndrome polynomial equals $x^i \bmod G$. Hence, the following propositions are obtained:

Proposition 3. If the $i$th bit is correct, such that $e_i=0$, then $S \ne x^i \bmod G$.

Proposition 4. If an error has occurred at the $i$th bit, such that $e_i=1$, then $S = x^i \bmod G$.

From the structure of $R=\hat{P}_C+x^{n-m}C$, where $C=c_0+c_1x+\cdots+c_{m-1}x^{m-1}$, Propositions 3 and 4 are adopted to detect the error in $C$. To correct the $i$th bit of $C$, the EC$_i$-cell is defined, as shown in Fig. 8. The EC$_i$-cell corrects errors in the multiplier output. The zero counter (ZC) examines the sum of the syndrome $S$ and the syndrome of a single error at the $i$th bit of $C$, $x^{n-m+i} \bmod G$. If this sum is 0, then the output of ZC is one; otherwise, the output of ZC is zero. In Fig. 8, XOR-1 is used to perform the error correction, i.e., $c_i$ is complemented when the output of ZC is one.

Fig. 6. The W-cell Fig. 7. The S-cell

Fig. 8. The detailed circuit of EC-cell
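The behaviour of the error corrector can be sketched in Python for a single-error-correcting code. This is an illustration under assumptions: at most one bit of the received word is faulty, bit $j$ of $C$ sits at position $n-m+j$ of the received word, the comparison inside the loop stands in for the zero counter, and the names `correct` and `p_hat` are hypothetical.

```python
def poly_mod(a, g):
    """Remainder of a(x) divided by g(x) over GF(2); polynomials are bitmask ints."""
    dg = g.bit_length() - 1
    while a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def correct(c, p_hat, g, n, m):
    """Repair at most one flipped bit of the m-bit product c using the predicted parity p_hat."""
    r = n - m
    s = poly_mod((c << r) ^ p_hat, g)            # syndrome of R = p_hat + x^(n-m) c
    for j in range(m):
        if s == poly_mod(1 << (r + j), g):       # zero-counter test for bit j
            c ^= 1 << j                          # XOR complements the faulty bit
    return c


G, n, m = 0b1011, 7, 4                           # the (7,4) code with G = x^3 + x + 1
c = 0b1010
p_hat = poly_mod(c << (n - m), G)                # parity predicted from fault-free operands
assert correct(c ^ 0b0100, p_hat, G, n, m) == c  # one flipped product bit is repaired
assert correct(c, p_hat, G, n, m) == c           # a fault-free product is left unchanged
```

Because the $n$ single-bit error patterns of this code have distinct nonzero syndromes, at most one comparison in the loop can succeed, so the $m$ corrector cells never interfere with one another.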


3.4 Proposed CEC Bit-Parallel Systolic SDB Multiplier

The individual modules were derived in detail in the previous subsections. Figure 9 presents the proposed bit-parallel multiplier with CEC capability. This subsection discusses the integration of the individual modules to provide the desired functionality in the proposed structure. In the initial step, the input $x^{n-m} \bmod G$ to be used in the W$_0$-cell is pre-computed based on the $(n,m)$ cyclic code. $\hat{P}_{H_0}$, $\hat{P}_{x^{m-1}}$ and $\hat{P}_{x^{-1}}$ are input to the PCG$_0$-cell for computing $\hat{P}_{C_0}$. At the $i$th computational loop, the PCG-cell yields $\hat{P}_{C_i}$ according to (21b). The W-cell generates the actual parity check digits $P_C$. After $2m$ recursive operations, the W$_{m-1}$ and PCG$_{m-1}$ cells produce $P_C$ and $\hat{P}_C$, respectively. The output of the S-cell is $S=P_C+\hat{P}_C$. Therefore, the EC-cells can be used to correct the multiplication result. The proposed CEC Hankel multiplier has the space overhead of $m$ PCG cells, $m$ W cells, $m$ EC cells, and one S-cell.

According to Figure 3, the SDB multiplier includes one Hankel multiplier and one $(2m-1)$-bit adder. Figure 10 presents the proposed CEC SDB multiplier based on the structure of the CEC Hankel multiplier.

Fig. 9. The proposed bit-parallel systolic Hankel multiplier with CEC capability


Fig. 10. The proposed CEC SDB multiplier for all trinomials

4 Proposed Scalable SDB Multiplier with CEC Capability

The previous section demonstrated that the computation of the SDB multiplication for all trinomials can be decomposed into two Hankel matrix-vector multiplications. The structure of the proposed SDB multiplier comprises one field adder and one CEC Hankel multiplier. The circuit complexity of the proposed CEC multiplier is thus $O(m^2)$ for large values of $m$. One shortcoming of this CEC multiplier is that it can perform only single error correction on the result. To solve this problem, this section derives another design in which a cyclic code is used to correct multiple errors in a scalable SDB multiplier.

The proposed scalable SDB multiplication for all trinomials first constructs a smaller-scale SDB multiplier of $n$ data bits (i.e., an $n \times n$ Hankel multiplier) using the architecture proposed in the previous section. A scalable architecture is then derived to obtain the complete $m$-bit SDB multiplication by iteratively applying this $n \times n$ Hankel multiplication $k^2$ times, where $k=m/n$. The derivation is given in detail below.

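Definition 5 (block Hankel matrix-vector). Assume that $H$ is an $m \times m$ Hankel matrix and $V$ is an $m \times 1$ column vector. If $m=nk$, then matrix $H$ and vector $V$ can be split as follows:

$$H = \begin{bmatrix} H_0 & H_1 & \cdots & H_{k-1} \\ H_1 & H_2 & \cdots & H_k \\ \vdots & \vdots & \ddots & \vdots \\ H_{k-1} & H_k & \cdots & H_{2k-2} \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} V_0 \\ V_1 \\ \vdots \\ V_{k-1} \end{bmatrix},$$

where each $H_i$ (for $0 \le i \le 2k-2$) is an $n \times n$ matrix in Hankel form and each $V_j$ (for $0 \le j \le k-1$) is an $n \times 1$ column vector.

Assume that $C=[C_0,C_1,\ldots,C_{k-1}]^T$ is the result of $HV$. The vector $C_i$ can be calculated as follows:

$$C_i = \sum_{j=0}^{k-1}H_{i+j}V_j, \quad 0 \le i \le k-1.$$

Therefore, the SDB multiplication in (11) can also be expressed by

$$C_i = \sum_{j=0}^{k-1}(B_{0,i+j}+B_{1,i+j})A_j, \quad 0 \le i \le k-1. \tag{26}$$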

The proposed CEC Hankel multiplier in Figure 9 is employed and Eq. (26) is applied to construct the new CEC scalable SDB multiplier, as shown in Fig. 11. The circuit includes one register, one matrix addition circuit, and one $n \times n$ CEC Hankel multiplier. The CEC Hankel multiplier can correct a single error. In the initial step, the register $C$ is set to zero. In the first round, $k$ Hankel multiplications are applied to compute $[C_0,C_1,\ldots,C_{k-1}]=[C_0+(B_{0,0}+B_{1,0})A_0,\ C_1+(B_{0,1}+B_{1,1})A_0,\ \ldots,\ C_{k-1}+(B_{0,k-1}+B_{1,k-1})A_0]$, and the result is stored in the register $C$. Similarly, in the second round, $[C_0,C_1,\ldots,C_{k-1}]=[C_0+(B_{0,1}+B_{1,1})A_1,\ C_1+(B_{0,2}+B_{1,2})A_1,\ \ldots,\ C_{k-1}+(B_{0,k}+B_{1,k})A_1]$ is computed. The register $C$ outputs the result of the SDB multiplication after $k$ rounds.
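The round-by-round accumulation described above can be sketched in Python. The sketch makes several assumptions that are not from the paper: the vector `h` stands for the already added vector of $B_0+B_1$, the block size is called `d`, the small Hankel multiplier is modelled by `hankel_mul` without any error correction, and all names are hypothetical.

```python
def hankel_mul(h, a):
    """C = H*A over GF(2), where H(i, j) = h[i + j]."""
    m = len(a)
    return [sum(h[i + j] & a[j] for j in range(m)) % 2 for i in range(m)]


def scalable_hankel_mul(h, a, d):
    """C = H*A using k*k products with d x d Hankel blocks, where m = k*d."""
    m = len(a)
    k = m // d
    assert m == k * d and len(h) == 2 * m - 1
    c = [[0] * d for _ in range(k)]              # register C, cleared in the initial step
    for j in range(k):                           # round j uses the operand block A_j
        a_j = a[j * d:(j + 1) * d]
        for i in range(k):                       # block H_{i+j} is defined by 2d-1 entries of h
            start = (i + j) * d
            part = hankel_mul(h[start:start + 2 * d - 1], a_j)
            c[i] = [x ^ y for x, y in zip(c[i], part)]
    return [bit for block in c for bit in block]


h = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]            # 2m-1 = 11 entries for m = 6 (arbitrary)
a = [1, 1, 0, 1, 0, 1]
assert scalable_hankel_mul(h, a, 2) == hankel_mul(h, a)
assert scalable_hankel_mul(h, a, 3) == hankel_mul(h, a)
```

Only one $d \times d$ multiplier is active at a time, so the hardware cost depends on $d$ rather than on $m$, at the price of $k^2$ sequential block products.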

5 Simulation Results

Section 3 proposed the CEC SDB multiplier using a cyclic code. Its architecture comprises one CEC multiplier, one $(2m-1)$-bit adder and one converter module. Section 4 presented the CEC scalable SDB multiplier, which is based on the block Hankel matrix-vector representation. Table 1 compares them. The proposed SDB multiplier in Fig. 3 has a lower latency than the existing DB multiplier [12]. To evaluate the time and space overhead, the proposed CEC SDB multiplier in Fig. 9 was simulated in the Altera FPGA environment with a Cyclone-III device. The code generator polynomial $x^6+x+1$ is used to construct the CEC multiplier in Fig. 9. Table 2 presents the simulated results for three fields, $GF(2^{84})$, $GF(2^{60})$ and $GF(2^{36})$. In $GF(2^{84})$, the proposed CEC multiplier has a logic element (LE) overhead of approximately 22.8%, a frequency of 220.41 MHz and a thermal power dissipation (TPD) of 139.7 mW. The latency overhead of the proposed CEC multiplier is four clock cycles. Table 3 compares the complexity of the proposed architecture with those described in [3], [23]. In the field $GF(2^{84})$, the proposed CEC multiplier has about 22.8% area overhead, whereas the CED multiplier of [23] requires approximately 52.9% area overhead. For large fields, the designed CEC multiplier therefore yields a low space overhead. To estimate the probability of corrected errors, the proposed CEC scalable multiplier (Fig. 11) over $GF(2^{84})$ was simulated with 10000 random inputs. Table 4 presents the simulation results as the probability of correction versus the number of injected faults. Based on the simulation results, the proposed CEC scalable multiplier corrects the errors with a probability of nearly 99.6% when seven or fewer faults are injected.

Fig.11. The proposed scalable SDB multiplier with CEC capability

Table 1. Comparison of SDB multipliers over GF(2^m) with and without concurrent error correction

Multiplier | DB multiplier [12] | SDB multiplier without CEC (Fig. 3) | SDB multiplier with CEC (Fig. 9) | SDB multiplier with CEC (Fig. 10)
Structure | Bit-parallel systolic | Bit-parallel systolic | Bit-parallel systolic | Scalable
Latency | 3m | 2m-1 | 2m+3 | k^2+2d+3
Propagation delay per cell | TA+2TX+TL | TA+TX+TL | TA+TX+TL | TA+TX+TL
Space complexity: #AND | 2m^2 | m^2 | m^2+3(n-m)m+m | d^2+3(n-d)d+d
Space complexity: #XOR | 2m^2 | m^2+m-1 | m^2+4m+(n-m)(4m+1) | d^2+4d+(n-d)(4d+1)
Space complexity: #latch | 7m^2 | 2m^2 | 2m^2+5(n-m)m+2m | 2d^2+5(n-d)d+2d
Space complexity: #Zero counter | | | m | d
Error correction | no | no | yes | yes

Note: (1) k = m/d, where d is the selected digit size. (2) The proposed multipliers in Figs. 9 and 10 use the code generator polynomial x^(n-m)+x^q+1 with q ≤ (n-m)/2 to construct the CEC multiplier architecture, where n is the code length.

Journal of Computers Vol. 22, No. 3, October 2011
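The latency row of Table 1 can be evaluated for a concrete field to see the cost of adding CEC. The sketch below assumes that the scalable entry reads k^2+2d+3 with k = m/d. The digit size d = 12 and the function name are hypothetical choices for illustration only.

```python
def latencies(m: int, d: int) -> dict:
    """Evaluate the latency expressions of Table 1 for GF(2^m).

    d is the digit size of the scalable design; k = m/d must be an integer.
    """
    if m % d != 0:
        raise ValueError("d must divide m so that k = m/d is an integer")
    k = m // d
    return {
        "DB multiplier [12]": 3 * m,
        "SDB without CEC (Fig. 3)": 2 * m - 1,
        "SDB with CEC (Fig. 9)": 2 * m + 3,
        "Scalable SDB with CEC (Fig. 10)": k * k + 2 * d + 3,
    }


result = latencies(m=84, d=12)           # d = 12 is an assumed digit size
cec_overhead = result["SDB with CEC (Fig. 9)"] - result["SDB without CEC (Fig. 3)"]
for name, cycles in result.items():
    print(f"{name}: {cycles}")
print(f"CEC latency overhead: {cec_overhead} cycles")
```

The difference between the Fig. 9 and Fig. 3 expressions is (2m+3)-(2m-1) = 4 for every m, which agrees with the four-cycle latency overhead stated in the text.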

Table 2. Area overhead and thermal power dissipation of the CEC multiplier for some fields

Table 3. Comparison of related CED/CEC multipliers over GF(2^m)

Multipliers | [23] | [3] | Fig. 9
Basis | Polynomial basis | Dual basis | Shifted dual basis
Structure | Bit-serial | Bit-parallel systolic | Bit-parallel systolic
Area overhead | 52.9% | 46% | 22.8%
Latency overhead | 1.84% | Two cycles | Four cycles
Error detection/correction | Error detection | Error detection | Error correction

Table 4. The probability of error correction vs. the number of injected faults for the CEC scalable multiplier over GF(2^84)

Number of injected faults | 1 | 2 | 3 | 4 | 5 | 6 | 7
Probability of error correction | 100% | 100% | 99.95% | 99.88% | 99.84% | 99.67% | 99.56%

6 Conclusions

This investigation develops a new basis representation of GF(2^m), called the shifted dual basis (SDB). For all trinomials, multiplication in this basis can be represented by the sum of two Hankel matrix-vector multiplications. The CEC bit-parallel SDB multiplier is derived using error-correcting codes. The latency overhead of the proposed CEC multiplier is only four extra clock cycles. According to simulation, the proposed CEC SDB multiplier over GF(2^84) has an LE overhead of around 22.8%. The block Hankel matrix-vector representation is used to derive the CEC scalable SDB multiplier. The experimental results suggest that the proposed CEC scalable multiplier corrects errors with a probability of about 99.6% for seven or fewer injected faults. The proposed CEC architectures can therefore be used effectively in fault-tolerant cryptosystems.
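A Hankel matrix-vector multiplication over GF(2) can be sketched as follows. An m-by-m Hankel matrix has entries H[i][j] = h[i+j], so it is fully described by a vector h of 2m-1 bits. The example is a generic software illustration with made-up inputs. It does not reproduce how the two Hankel matrices of the SDB multiplication are formed from the operands, and the function names are assumptions.

```python
from functools import reduce
from operator import xor


def hankel_matvec_gf2(h: list, b: list) -> list:
    """Return H*b over GF(2), where H[i][j] = h[i + j] and len(h) = 2m - 1."""
    m = len(b)
    if len(h) != 2 * m - 1:
        raise ValueError("a Hankel matrix of size m needs 2m-1 defining bits")
    # AND acts as multiplication and XOR as addition in GF(2).
    return [reduce(xor, (h[i + j] & b[j] for j in range(m))) for i in range(m)]


def sum_of_two_hankel(h1: list, b1: list, h2: list, b2: list) -> list:
    """Bitwise XOR (GF(2) sum) of two Hankel matrix-vector products."""
    c1 = hankel_matvec_gf2(h1, b1)
    c2 = hankel_matvec_gf2(h2, b2)
    return [x ^ y for x, y in zip(c1, c2)]


# Made-up 3-bit example: h defines H = [[1,0,1],[0,1,1],[1,1,0]].
h = [1, 0, 1, 1, 0]
b = [1, 1, 0]
product = hankel_matvec_gf2(h, b)
total = sum_of_two_hankel(h, b, h, b)    # any vector XORed with itself is zero
```

Since every anti-diagonal of H carries a single value, only 2m-1 bits need to be supplied to form all m^2 matrix entries.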

References

[1] IEEE Standard 1363-2000, IEEE Standard Specifications for Public-Key Cryptography.

[1] C.Y. Lee, “Concurrent Error Detection in Digit-Serial Normal Basis Multiplication over GF(2^m),” in Proceedings of International Conference Advanced Information Networking and Applications Workshops (AINA2008), Okinawa, Japan, pp. 1499-1504, 2008.

[2] C.Y. Lee, C.W. Chiou, J.M. Lin, “Concurrent Error Detection in a Bit-parallel Systolic Multiplier for Dual Basis of GF(2^m),” Journal of Electronic Testing: Theory and Applications, Vol. 21, No. 5, pp. 539-549, 2005.


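Table 2 (data).

Fields | SDB multiplier in Fig. 3: LEs | SDB multiplier in Fig. 3: TPD | CEC multiplier in Fig. 9: LEs | CEC multiplier in Fig. 9: TPD | LE overhead
GF(2^84) | 15298 | 139.7 mW | 18879 | 139.7 mW | 22.8%
GF(2^60) | 8427 | 132.6 mW | 11238 | 132.7 mW | 33.35%
GF(2^36) | 3466 | 125.7 mW | 6409 | 125.4 mW | 83.48%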


Lee et al: Error-Correcting Codes for Concurrent Error Correction and Scalable Multipliers for Shifted Dual Basis of GF(2m)

[3] J. Mathew, A. Costas, A. M. Jabir, M. Rahaman, H. D. K. Pradhan, “Single error correcting finite field multipliers over GF(2^m),” in Proceedings of 21st International Conference VLSI Design, Hyderabad, India, pp. 33-38, 2008.

[4] R. Karri, G. Kuznetsov, M. Goessel, “Parity-based concurrent error detection of substitution-permutation network block ciphers,” in Proceedings of CHES 2003, Springer LNCS 2779, Cologne, Germany, pp. 113-124, 2003.

[5] M. Joye, A. K. Lenstra, J. J. Quisquater, “Chinese Remaindering Based Cryptosystems in the Presence of Faults,” Journal of Cryptology, Vol. 12, pp. 241-245, 1999.

[6] D. Boneh, R. A. DeMillo, R. J. Lipton, “On the Importance of Eliminating Errors in Cryptographic Computations,” Journal of Cryptology, Vol. 14, pp. 101-119, 2001.

[7] C.Y. Lee, “Low-complexity Parallel Systolic Montgomery Multipliers Over GF(2^m) using Toeplitz Matrix-vector Representation,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E91-A, No. 6, pp. 1470-1477, 2008.

[8] C.Y. Lee and C.W. Chiou, “New Bit-parallel Systolic Architectures for Computing Multiplication, Multiplicative Inversion and Division in GF(2^m) under the Polynomial Basis and Normal Basis Representations,” Journal of VLSI Signal Processing Systems, Vol. 52, No. 3, pp. 313-324, 2008.

[9] C.Y. Lee, “Low-complexity Bit-parallel Systolic Multipliers Over GF(2^m),” Integration – The VLSI Journal, Vol. 41, No. 1, pp. 106-112, 2008.

[10] C.Y. Lee, C.W. Chiou, J.M. Lin, C.C. Chang, “Scalable and Systolic Montgomery Multiplier over GF(2^m) Generated by Trinomials,” IET Circuits, Devices & Systems, Vol. 1, No. 6, pp. 377-484, 2007.

[11] S. T. J. Fenn, M. Benaissa, O. Taylor, “Dual Basis Systolic Multipliers for GF(2^m),” IEE Computers and Digital Techniques, Vol. 144, No. 1, pp. 43-46, 1997.

[12] M. Diab and A. Poli, “New Bit-serial Systolic Multiplier for GF(2^m) using Irreducible Trinomials,” Electronics Letters, Vol. 27, No. 13, pp. 1183-1184, 1991.

[13] C.Y. Lee, “Concurrent Error Detection Architectures for Gaussian Normal Basis Multiplication over GF(2^m),” Integration – The VLSI Journal, Vol. 43, No. 1, pp. 113-123, 2010.

[14] A. Reyhani-Masoleh and M. A. Hasan, “Fault Detection Architectures for Field Multiplication using Polynomial Bases,” IEEE Transactions on Computers, Vol. 55, No. 9, pp. 1089-1103, 2006.

[15] S. Bayat-Sarmadi and M. A. Hasan, “On Concurrent Detection of Errors in Polynomial Basis Multiplication,” IEEE Transactions on VLSI Systems, Vol. 15, No. 1, pp. 413-426, 2007.

[16] C.Y. Lee, P. K. Meher, J.C. Patra, “Concurrent Error Detection in Bit-serial Normal Basis Multipliers Over GF(2^m),” IEEE Transactions on VLSI Systems, Vol. 18, No. 8, pp. 1234-1238, 2010.

[17] R. W. Hamming, “Error Detecting and Error Correcting Codes,” Bell System Technical Journal, pp. 147-160, 1950.

[18] K. K. Parhi, “Eliminating the Fanout Bottleneck in Parallel Long BCH Encoders,” IEEE Transactions on Circuits and Systems-I, Vol. 51, No. 3, pp. 512-516, 2004.

[19] T. Kasami, T. Takata, T. Fujiwara, S. Lin, “Trellis Diagram Construction for Some BCH Codes,” in Proceedings of IEEE International Conference on Information Theory and Applications, Honolulu, HI, 1990.


[20] C.Y. Lee, P.K. Meher, W.Y. Lee, “Fault-Tolerant Bit-Parallel Multiplier for Polynomial Basis of GF(2^m),” in Proceedings of IEEE Circuits and Systems International Conference on Testing and Diagnosis (ICTD'09), Chengdu, China, pp. 1-4, 2009.

[21] H. Fan and Y. Dai, “Fast Bit-parallel GF(2^n) Multiplier for All Trinomials,” IEEE Transactions on Computers, Vol. 54, No. 4, pp. 485-490, 2005.

[22] S. Bayat-Sarmadi and M. A. Hasan, “Run-time Error Detection in Polynomial Basis Multiplication using Linear Codes,” in Proceedings of IEEE International Conference on Application-specific Systems, Architectures and Processors, Québec, Canada, pp. 204-209, 2007.
