reliable inversion in gf 2 with redundant arithmetic for secure...

696 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO. 3, MARCH 2018

Reliable Inversion in GF(28) With RedundantArithmetic for Secure Error Detection of

Cryptographic ArchitecturesMehran Mozaffari Kermani, Senior Member, IEEE, Amir Jalali, Reza Azarderakhsh, Member, IEEE,

Jiafeng Xie, Member, IEEE, and Kim-Kwang Raymond Choo, Senior Member, IEEE

Abstract—In secure cryptographic primitives, such as blockciphers, the reliability of hardware implementations needs tobe closely considered because faults in the hardware imple-mentations can potentially reduce or impact on the underlyingsecurity. In this paper, we present approaches to detect errorsin hardware implementations of the inversion in GF(28). Theproposed approaches are based on both nonredundant andredundant arithmetic, utilizing normal basis (nonredundant)and two redundant Galois field representations, i.e., polynomialring representation and redundantly represented basis throughtower fields. To the best of our knowledge, this is the firstwork focusing on the error detection architectures for redundantarithmetic-based inversion in GF(28). The presented signature-based schemes in this paper are general and can be appliedto block ciphers with 8-bit S-boxes, such as Camellia, SMS4,the advanced encryption standard, and CLEFIA. We presentthe results of error simulations and application-specific inte-grated circuit implementations to demonstrate the utility of thepresented schemes. Based on the specific implementation’s secu-rity/reliability objectives and the overhead/degradation tolerancefor implementation/performance metrics, one can fine-tune andtailor the proposed work to achieve more reliable inversions inGF(28).

Index Terms—Application-specific integrated circuit(ASIC), block cipher, error detection, inversion in GF(28),reliability.

I. INTRODUCTION

CRYPTOGRAPHIC block ciphers are effective solu-tions for ensuring confidentiality in resource-constrained

Manuscript received January 1, 2017; revised March 23, 2017 andMay 27, 2017; accepted June 9, 2017. Date of publication June 20, 2017;date of current version February 16, 2018. This work was supported by theNational Institute of Standards and Technology through the U.S. FederalAgency under Award 60NANB16D245. This paper was recommended byAssociate Editor Q. Xu. (Corresponding author: Mehran Mozaffari Kermani.)

M. Mozaffari Kermani is with the Department of Computer Science andEngineering, University of South Florida, Tampa, FL 33620 USA (e-mail:[email protected]).

A. Jalali and R. Azarderakhsh are with the Department of Computerand Electrical Engineering and Computer Science, Florida AtlanticUniversity, Boca Raton, FL 33431 USA (e-mail: [email protected];[email protected]).

J. Xie is with the Department of Electrical Engineering, Wright StateUniversity, Dayton, OH 45435 USA (e-mail: [email protected]).

K.-K. R. Choo is with the Department of Information Systems and CyberSecurity, University of Texas at San Antonio, San Antonio, TX 78249 USA(e-mail: [email protected]).

Digital Object Identifier 10.1109/TCAD.2017.2717791

applications, e.g., deeply embedded systems in humanbody, such as sensitive implantable and wearable medicaldevices [1]. In lightweight cryptography, the S-boxes are usu-ally small whose implementations can be achieved efficiently.The S-box sizes in lightweight block ciphers are usuallywith the size of 4-input, 4-output and can be implementedefficiently through logic gates or look-up tables. In applica-tions, where memory macros on application-specific integratedcircuit (ASIC) and block memories on field-programmablegate array (FPGA) are utilized, one can realize such look-uptables efficiently. Such variants of S-boxes typically have highperformance but also require a larger area and a higher powerconsumption. On the other hand, there have also been exten-sive work on the realization of S-boxes based on multiplicativeinversion in GF(28) used for cryptographic algorithms, suchas the advanced encryption standard (AES) [2], Camellia [3],SMS4 [4], and CLEFIA [5].

S-boxes need to be implemented efficiently and can berealized through two different (and diverse) approaches inhardware. To alleviate such limitations, on the other hand,one can utilize composite field-based architectures. The latterhave low complexity, but require subpipelining to achieve highthroughput and efficiency. For high-performance applications,e.g., server or game console security, the implementations ofthe S-boxes can be done on FPGAs and through look-up table-based approaches to achieve high speed and performance.Nevertheless, such schemes can be replaced with compositefield implementations to achieve low area and power con-sumption for deeply embedded systems. However, for eitherof these implementations, there might exist naturally occurringand malicious faults which undermine the reliability of blockciphers; consequently, impacting on their capability to assureconfidentiality. A number of concurrent error detection tech-niques have been proposed to account for reliable hardwarearchitectures (including cryptographic block ciphers) [6]–[19].Such schemes are based on hardware/information/time/hybridredundancy. Nevertheless, it is possible to leverage the errordetection schemes which are tailored for specific implemen-tations of the S-boxes to strike a balance between reliabilityand overhead tolerance.

The multiplicative inversion in GF(28) through compos-ite fields can be realized using polynomial basis, normalbasis, or mixed basis [20]–[22]. Although mixed basis can

0278-0070 c© 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

mailto:[email protected]





http://www.ieee.org/publications_standards/publications/rights/index.html

MOZAFFARI KERMANI et al.: RELIABLE INVERSION IN GF(28) WITH REDUNDANT ARITHMETIC FOR SECURE ERROR DETECTION 697

achieve high-performance implementations for the S-boxes,in [23], redundant basis is used for the S-boxes in compositefields to ameliorate the efficient realization of the architec-tures (40% higher efficiency in terms of area-time productcompared to other bases). In this paper, we propose diagno-sis approaches for the multiplicative inversion in GF(28). Wenote that permanent faults are caused by very-large-scale inte-gration manufacturing defects (and of course if the intentionis to break the entire device, such faults can be injected atrun-time). On the other hand, “long transient faults” can leadto information leakage [24]. Simple time redundancy cannotdetect long transient faults that last for the normal computa-tion and recomputation, and it has been shown that one couldsuccessfully inject long transient faults to circumvent thiscountermeasure [24]. Our main contributions are summarizedas follows.

1) For the first time, we propose error detection approachesfor the multiplicative inversion in GF(28) utilizing com-bined normal basis (nonredundant) and two redundantGalois field representations, i.e., polynomial ring rep-resentation and redundantly represented basis throughtower fields. One contribution of this paper is that theproposed approaches can be tailored based on the reli-ability requirements and overhead tolerance of sensitiveusage models (not just confined to parities, for instance,or not getting confined to a certain number of blocks forderiving the signatures).

2) The focus of this paper is on presenting error detec-tion architectures for multiplicative inversion, applicableto the S-boxes in a number of cryptographic algo-rithms, i.e., not only the AES [2] can benefit from thepresented approaches, but Camellia [3], SMS4 [4], andCLEFIA [5] can also utilize them. We also present for-mulations for the transformation matrices to detect errorsin their respective structures.

3) We benchmark the proposed architectures to evaluatetheir capability to detect transient and permanent faultsusing fault injection simulations. Findings demon-strate that the proposed architectures have acceptableerror detection capabilities, i.e., reliability of proposedapproaches.

4) Another contribution of this paper is to assess the falsealarm resiliency of the proposed approaches. Using theproposed approaches, the error detection structures arecapable of detecting the injected faults with high cover-age (transient and permanent as well as single, multiple,and adjacent faults) with low false alarms.

5) We implement the proposed error detection architec-tures on an ASIC platform using a 65-nm standard-celllibrary. Our results show that the proposed efficienterror detection architectures can be practically utilized,and is suitable for deployment on resource-constrainedapplications.

The organization of this paper is as follows. Section IIpresents background materials and our proposed errordetection schemes. In Section III, we present concreteconstructions. We then evaluate the performance of theproposed scheme followed by fault-injection simulations

in Section IV. Finally, the conclusions are presentedin Section V.

II. PROPOSED ERROR DETECTION APPROACHES

In what follows, we present the background materials andthen the proposed error detection schemes.

A. Background Concepts

We will now briefly describe the background materi-als required in the understanding of the design of themultiplicative inversion in GF(28) based on tower fieldarithmetic.

Limitations of look-up table-based realization of theS-boxes include high power consumption, large area require-ment, and high energy usage. On the other hand, compositefield-based architectures have low complexity but pipelining isrequired to achieve high throughput and efficiency. The towerfield approach for inversion in GF(28) is efficient because thesubfields GF((22)2)2 and GF(24)2 operations are realized effi-ciently. Polynomial/normal/mixed bases can be used for suchtower fields (using m = 8 bits in a nonredundant manner).

For the nonzero element a ∈ GF(28), we have theinverse a−1= a254. The scheme underlying the tower fieldapproach is to utilize smaller arithmetic operations over sub-field GF((22)2)2 or GF(24)2 instead of GF(28). There is anisomorphism between the elements of GF(28) and those of thetower field. Such a multiplicative inversion in GF(28) can beefficiently implemented using the Itoh–Tsujii algorithm [25].For instance, for the subfield GF((22)2)2 with normal basiscomposite field, let a field element a in GF((22)2)2 be theinput given by hα16 + lα in normal basis with h and l beingthe upper and lower 4-bit parts of a, respectively. We derivethe inversion of a by the calculation of the 16th and 17thpowers, subfield inversion, and the final multiplication.

In this paper, we use redundant-based representation, whichuses n(>m = 8) bits to represent each element of GF(28). Theintention here is to show that we derive efficient schemes forerror detection in such tower fields. We calculate the inver-sion of the tower field GF(24)2 by the normal basis becausethe squaring operation is performed solely by wiring and tworedundant-based representations combination [23].

B. Error Detection for Redundantly Represented Basis

Each element in such basis is represented by a root of anmth degree irreducible polynomial. Similar to normal basis,redundantly represented basis-based GF(28) squaring can beperformed by bit-wise permutation. We note that such basiscan be obtained by adding a base, β0, to optimal normal basis.For squaring, we propose using signatures, e.g., parities orinterleaved parities, as they remain intact for the actual andpredicted variants, leading to very efficient error detection.

We present Theorems 1 and 2 for the error detection of themultiplication in GF(24) in redundantly represented basis.

Theorem 1: The predicted parity for the multiplicationin GF(24) in redundantly represented basis is as follows(s and t are 5-bit entries and the result u is also 5 bits;


s = s4β4 + s3β

3 + s2β2 + s1β + s0, t = t4β4 + t3β3 + t2β2 +

t1β + t0, u = u4β4 + u3β

3 + u2β2 + u1β + u0)

P̂ =∑

0≤i,j≤4,i �=j

si.tj (1)

where the “sum”∑

symbol represents XOR operation, β isthe root of the fourth degree all-one polynomial, and si, tj ∈GF(2).

Proof: We have the following for the result of the multipli-cation in GF(24) in redundantly represented basis:

u0 = (s1 + s4)(t1 + t4) + (s2 + s3)(t2 + t3) (2)

u1 = (s0 + s1)(t0 + t1) + (s2 + s4)(t2 + t4) (3)

u2 = (s0 + s2)(t0 + t2) + (s3 + s4)(t3 + t4) (4)

u3 = (s0 + s3)(t0 + t3) + (s1 + s2)(t1 + t2) (5)

u4 = (s0 + s4)(t0 + t4) + (s1 + s3)(t1 + t3). (6)

Modulo-2 addition of the above results in the predicted par-ity of P̂ = (s1+s4)(t1+t4)+(s2+s3)(t2+t3) +(s0+s1)(t0+t1)+(s2+s4)(t2+t4)+(s0 +s2)(t0+t2)+(s3+s4)(t3+t4)+(s0+s3)

(t0+t3)+(s1+s2)(t1+t2)+(s0+s4)(t0 +t4)+(s1+s3)(t1+t3) =s0(t1 + t2 + t3 +t4) + s1(t0 + t2 + t3 + t4) + s2(t1 + t0 + t3+t4) + s3(t1 + t2 + t0 + t4) + s4(t1 + t2 + t3 + t0) which isshown for the sake of simplicity as (1).

Remark 1: The predicted parity P̂ = s0(t1+t2+t3+t4)+s1(t0+t2 + t3 + t4) + s2(t1 + t0 + t3 + t4) + s3(t1 + t2 + t0 + t4) +s4(t1 + t2 + t3 + t0) can be implemented using 13 XOR gatesand 5 AND gates with the critical path delay of 5 TX and1 TA, where TX and TA are the delays for XOR and ANDgates, respectively.

Theorem 2 describes the interleaved parities for burst errordetection.

Theorem 2: The predicted interleaved parities for the mul-tiplication in GF(24) in redundantly represented basis areas follows (P̂1 and P̂2 represent, respectively, u0 + u2 + u4and u1 + u3) : P̂1 = s0(t2 + t4) + s1(t3 + t4) +s2(t0 +t3) + s3(t1 + t2 + t3 + t4) + s4 (t0 + t1 + t3 + t4), P̂2 =s0(t1 + t3) + s1(t0 + t2)+ s2(t1 + t4)+ s3(t0 + t3) + s4(t2 + t4).

Proof: Considering the proof of Theorem 1, we derive theinterleaved parities for P̂1 and P̂2, respectively, which can beefficiently implemented through subexpression sharing.

Remark 2: The predicted interleaved parities P̂1 = s0(t2 +t4)+ s1(t3 + t4)+ s2(t0 + t3)+ s3(t1 + t2 + t3 + t4)+ s4(t0 + t1 +t3 + t4), P̂2 = s0(t1 + t3) + s1(t0 + t2) + s2(t1 + t4) + s3(t0 +t3) + s4(t2 + t4) can be implemented for the former using 10XOR gates and 5 AND gates with the critical path delay of5 TX and 1 TA, and for the latter using 9 XOR gates and 5AND gates with the critical path delay of 4 TX and 1 TA. Onecan save 2 XOR gates with further sharing the architecturesof the two interleaved signatures.

C. Error Detection for Subfield Inversion in GF(24)

The inversion in GF(24) is constructed using the inputs inpolynomial ring representation and the outputs in redundantlyrepresented basis. We now present fault diagnosis schemesfor the inversion unit noting that the transformation matrixused for basis conversion is implemented without incurringadditional cost.

We present Theorem 3 for fault diagnosis of the 5-bit input(d = d4β

4 + d3β3 + d2β

2 + d1β + d0) and its inversion, i.e.,e = e4β

4 + e3β3 + e2β

2 + e1β + e0.Theorem 3: The predicted parity and the interleaved parities

for the inversion in GF(24) are derived as follows:

P̂ = d1d2d3(d0 + d4

) + d0(d2d3d4 + d1d3d4 + d1d2d4)

(7)

P̂1 = d0d3d4d2 + d2d1d0d4 + (d2 + d3)d1 + (d2 + d1)d4

(8)

P̂2 = d0d1d2d3 + d2d3d4d1 + d3d1d4d0 + d1d4d2 + d2 ∨ d3.

(9)

Proof: We have the following for the result of the inversionin GF(24):

e0 = (d1 ∨ d4)(d2 ∨ d3) (10)

e1 = d4(d1 + d2) ∨ (d0d4(d2 ∨ d3)) (11)

e2 = d3(d2 + d4) ∨ (d0d3(d1 ∨ d4)) (12)

e3 = d2(d1 + d3) ∨ (d0d2(d1 ∨ d4)) (13)

e4 = d1(d3 + d4) ∨ (d0d1(d2 ∨ d3)). (14)

Modulo-2 addition of the above formulas as well asits interleaved modulo-2 addition results in (7)–(9) aftersimplifications.

Remark 3: The predicted (interleaved) parities P̂ =d1d2d3(d0 + d4) + d0(d2d3d4 + d1d3d4 + d1d2d4), P̂1 =d0d3d4d2 + d2d1d0d4 + (d2 + d3)d1 + (d2 + d1)d4, P̂2 =d0d1d2d3 + d2d3d4d1 + d3d1d4d0 + d1d4d2 + d2 ∨ d3 canbe implemented, respectively, excluding the inverters, using3 XOR gates and 9 AND gates with the critical path delay of3 TX and 3 TA, using 5 XOR gates and 8 AND gates with thecritical path delay of 2 TX and 2 TA, and finally utilizing 4XOR gates and 12 AND/OR gates with the critical path delayof 3 TX and 2 TA.

D. Error Detection for Other Stages

The other stages construct the beginning of the architecture(assuming h and l as the 4-bit entries to the entire inver-sion in GF((24)2) and d as the 5-bit input of the inversionin GF(24)). Considering the mappings from normal basis andpolynomial ring representation, we present Theorem 4 for theparity prediction of the stage which performs φ′hl+φ′′(h+l)2,

where we have

φ′ →

⎛

⎜⎜⎜⎜⎝

0 1 1 00 0 1 10 0 0 11 0 0 01 1 0 0

⎞

⎟⎟⎟⎟⎠and φ′′ →

⎛

⎜⎜⎜⎜⎝

1 1 1 01 1 1 10 1 1 11 0 1 11 1 0 1

⎞

⎟⎟⎟⎟⎠.

Theorem 4: The predicted parity for φ′hl + φ′′(h + l)2 isefficiently derived with no cost as P̂ = 0. This yields to verylightweight architectures using such derivation and thus usinginterleaved parities is not recommended for this stage unlessa specific error coverage is aimed.

Proof: Due to page limitations, we do not present the for-mulations for the result of φ′hl +φ′′(h + l)2, whose modulo-2addition of the output bits results in P̂ = 0.


Fig. 1. First proposed choice in error detection of the inversion in GF((24)2).

Fig. 2. Second proposed choice in error detection of the inversion inGF((24)2).

To finalize this section, we would like to emphasize thatthe utilized signatures in this paper are a subset of pos-sible ones that can be used. For instance, one may use athree-flag, three-bit signature for different subparts of theS-boxes in redundant-basis, such as the inversion in GF(24)

as S0 = e0 + e1, S1 = e2 + e3, and S1 = e4. Such derivation,and other similar ones, would add to the overhead of the errordetection architecture, with the advantage of higher error cov-erage; nevertheless, based on the reliability requirements andoverhead tolerance, such schemes can also be tailored for errordetection (this also depends on the fault model used).

III. EFFICIENT DETECTION CONSTRUCTION

In this section, using the derived formulations for the signa-tures in this paper, we present the error detection constructionsand based on four different considerations, namely; complex-ity (area/delay and, thus, the respective power/energy), errorcoverage, false alarms, and real attacker resiliency. We thenchoose an efficient construction from the pool of designs inFigs. 1–3. Error simulations and ASIC implementations arepresented for these constructions in the next section.

A. Three Choices to Consider

Based on the derived signatures, we examine three choicesfor fault diagnosis as shown in Figs. 1–3. The proposed

schemes are not confined to specific number of blocks(although three of such choices are shown in Figs. 1–3). Wealso remark that for different reliability requirements and over-head tolerance, one can fine-tune such choices. There are tworeasons for choosing these three constructions as examplesfor presenting the proposed error detection schemes. First, thechoices are made to preserve the inner operations in com-posite field and having them intact, allowing them to beimplemented through different inner-structures. Second, not-ing that different usage models and contexts constrain theerror detection architectures in terms of the required errorcoverage and overhead tolerance, we have made sure thatthese examples cover three cases to have such respective com-promise. In addition to these three examples, we have alsosimulated for assessing the error coverage and implementedon ASIC another case study (with four blocks by modifyingBlock 1 in Fig. 3 to two blocks, one being the transforma-tion matrix). As it is shown in our simulations and throughour implementations, the error coverages of Figs. 1 and 2and the four-block construction are higher, in the order speci-fied, compared to the construction in Fig. 3. The false alarms,although negligible, decrease in the same order. However,the overhead and performance degradation of the construc-tion in Fig. 3 are the lowest. Finally, we note that havingmany blocks increases the error detection overhead unreason-ably and, thus, might not be preferred. On the other hand,having only one block would result in low error coverage,and, in addition, it is interesting that the resulting formula-tions for signatures would be costly causing high overheadwhich is not preferred. Thus, the eventual choice is dependenton the reliability requirements and overhead tolerance of thearchitectures.

As observed in Fig. 1, in the first choice, seven sub-blocks(blocks 1–7) are used for error detection whose predicted sig-natures are based on the formulations in the preceding section.Let us choose the parity for finding an error detection construc-tion, noting that this does not confine us to this signature only.


Fig. 3. Third proposed choice in error detection of the inversion in GF((24)2).

In Fig. 1, the predicted signatures (parities for example) areshown as Sig. 1–7.

The derivation of the multiplicative inversion is performedas follows.

1) The 8-bit input a is divided into lower and higherhalves (4-bit l and h, respectively) and as entries to Hand L units (which derive the 6-bit outputs by havingHi,j = hi + hj, Li,j = li + lj(1 ≤ i < j ≤ 4)). Similardefinition is used for the unit F. Stage 1 performs the16th and 17th powers of input, where input a is givenby normal basis, and outputs a16 and a17 are given byredundantly represented basis and polynomial ring rep-resentation, respectively. For α as the root of a seconddegree irreducible polynomial over GF(24), with respec-tive irreducible polynomials, we get the 16th and 17thpowers of input (a16 = lα16+hα, a17 = hlμ2+(h+l)2υ).Stage 1 performs φ′hl + φ′′(h + l)2, where the linearmatrices used, φ′ and φ′′, are explained in Theorem 4.

2) The transformation matrices, as well as the “free” nor-mal basis to redundant basis conversion (because thenormal basis can be considered as the reduced versionof redundantly represented basis with the same root ofthe fourth degree all-one polynomial), are also shown inFig. 1. Stage 2 performs the subfield inversion (inputsand outputs are in redundant basis), where the input andoutput are given by polynomial ring representation andredundantly represented basis, respectively. More detailsare presented in Theorem 3.

3) In Stage 3 (two subparts as seen in Fig. 1), twomultiplications in redundant basis (redundantly repre-sented basis) are computed. More details are presentedin Theorem 1. Figs. 2 and 3 show the same structurebut with different number of check points, i.e., 5 and 3,respectively.

A single gate, e.g., the XOR gate used for comparing thepredicted/actual signatures, can be separately hardened usinglogical or circuit-level fault tolerance techniques. Finally, wenote that one can use a combination of the proposed methodsto achieve the intended error coverage, alleviate the complica-tions, such as detecting the faults that occur right at the outputof blocks, and fine-tuning the false alarm ratios.

B. False Alarms

False-alarms could impact on the utilization of crypto-graphic solutions. Specifically, if such alarms become repeti-tive, they might hinder the normal operations of cryptographicalgorithms by inducing distrust in users who may eventuallyabandon the particular security solutions. Constructions in

Figs. 1–3 have been simulated for errors (specifically for falsealarms). Although the number of such alarms is not high,different constructions have different (and somewhat compa-rable) characteristics. For example, our error simulations showthat Fig. 1 has the highest number of such alarms (0.05%)and Fig. 3 has the lowest number (0.03%). Because we aredealing with multiple signatures, such alarms are due to thecases in which we detect faults in an inner sub-block whichwill not be eventually translated into errors in the output.

C. Linear Transformations Within Ciphers

In this paper, we have focused on the 8-bit S-boxes whichare the only nonlinear transformations within the ciphers andconsume much of the area and constitute high power con-sumption of the ciphers. Moreover, a major motivation hasbeen that the key schedule unit constitutes the S-boxes as themain transformation. Nevertheless, we briefly present here theerror detection schemes of the linear transformations withinciphers as well.

Focusing on the AES, three transformations other thanthe S-boxes (also known as SubBytes) are: 1) ShiftRows;2) MixColumns; and 3) AddRoundKey. The first one is justrewiring in hardware, and the derivation of any signatureis straightforward. The second one is a linear transforma-tion (multiplication with a constant matrix in finite field), forwhich, based on the error detection requirements and over-head tolerance, column signatures can be derived. Finally,AddRoundKey is implemented through XOR gates; thus,signature derivation is performed accordingly.

IV. ERROR SIMULATIONS AND ASIC SYNTHESIS

As seen in Figs. 1–3, one needs to derive the signaturesfor the transformation and inverse transformation matrices(corresponding to Sig. 1 and the signature of last block inthese figures, respectively). We combine the last block withthe affine transformation matrix of the AES to derive twomatrices [23] to apply error detection according to Theorem 5.

Theorem 5: The predicted parity and interleaved parity forBlock 1 ( ˆPb1) and “combined” last block (lb) with affine trans-formation (P̂lb) are derived as below, considering S as theinput to Block 1, A as the output of Block 1, A−1 as theinput to the last block, and O as the output of last block:ˆPb1 = s1 + s2 + s3 + s4 + s5 + s6, ˆP1b1 = s1, ˆP2b1 =

s2 + s3 + s4 + s5 + s6, P̂lb = a−11 + a−1

2 , ˆP1lb = a−10 + a−1

1 +a−1

2 + a−13 + a−1

6 + a−17 , ˆP2lb = a−1

0 + a−13 + a−1

6 + a−17 .

Proof: The transformation matrix (M1 below) is an 8 × 8one, and the merged inverse and affine transformation (M2below) is an 8 × 10 matrix (see Figs. 1–3), through which theabove is derived, noting that the least significant bit is on topand also the input to Block 1 is in normal basis

M1 →

⎛

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 1 1 1 0 01 0 1 0 0 0 1 11 0 0 1 0 0 0 10 0 0 0 0 1 0 00 1 1 0 1 1 0 01 0 1 0 1 0 0 01 1 1 0 0 0 0 10 0 1 1 0 0 0 1

⎞

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠


Fig. 4. Error detection ratios for interleaved parity (gray) and parity (black)for the proposed construction in Fig. 1 for up to 35 000 fault injections.


M2 →

⎛

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 1 1 0 1 0 0 1 00 0 1 1 0 1 1 1 0 11 1 1 1 0 0 1 0 1 01 0 0 0 1 1 1 1 0 11 1 0 1 1 1 0 0 0 11 0 0 0 1 1 0 1 1 10 0 1 0 1 0 0 1 0 11 0 1 0 0 1 1 0 1 1

⎞

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

.

The proposed error detection architectures (applied to theAES) have been simulated after injecting faults. The proposedarchitectures have the capability of detecting both permanentand transient faults (this covers both natural and maliciousfaults). The approach in the proposed fault diagnosis schemesis designed to inject faults and then observe the error indica-tion flags. For simulations, Verilog HDL is used. We considerall sub-blocks of the original architecture to induce faultsby flipping one or more bits and then inspect the generatedoutputs.

We consider a particular fault scenario and apply differentinputs to assert a subset of entries while injecting faults. Wethen observe all detected errors for the inputs. The fault modelused to test the proposed architectures is created using externalfeedback linear feedback shift registers to generate pseudo-random fault vectors that can flip random bits in the output ofthe gates and at random intervals. For the three architecturespresented in Figs. 1–3, we inject 35 000 faults and record thenumber of errors as seen in Figs. 4–6. Moreover, in Fig. 7,the false alarm ratios are presented. As seen in Figs. 4–6, the


Fig. 7. False alarms ratios for the proposed three constructions for up to35 000 fault injections.

error coverage in all the cases is more than 99% (for Fig. 1, weslightly have higher coverage as seen in Fig. 4; nevertheless,because we consider all the S-boxes in the SubBytes transfor-mation, the detection ratios of the architectures presented areclose). We note that in these figures, error coverage ratios areshown which is the number of detected faults over the totalinjections. The difference between the error detection resultsis, comparably, not high. In addition, the results of our sim-ulations for 80 000 injected faults with the ratios of detectedfaults obtained as 0.9998, 0.9989, and 0.9991 for Figs. 1–3,respectively, have been obtained for parity as signature. Theseshow that the slight fluctuations in Figs. 4–6 do not followa trend. Finally, the false alarm ratios are typically low (seeFig. 7); however, if that is the usage model concern, then thearchitecture in Fig. 3 is preferable.

A. Proposed Approaches in Presence of Fault Attacks

We note that the fundamental difference between securityattacks and random faults is the intelligent-attacker assump-tion. Injection of random faults mimics errors happening dueto natural causes. In contrast, the intelligent adversary run-ning fault-based cryptanalysis will carefully determine thefault (s)he is going to inject and perform injection right at thecalculated position and point of time. Consequently, we notethat just trying random faults will not be helpful in break-ing most ciphers. As such, we emphasize that the signatureswe use for error simulations and implementations are generaland, for instance, for the case of parities, we detect all theodd errors (and single errors). Ideally, single stuck-at faults


are better for real attackers to obtain side-channel informa-tion. In reality, however, this might not happen due to lackof technological advancements. Differential fault analysis usestransient and mostly multiple bit and byte faults. In the caseof interleaved parities, we detect burst faults which are morerealistic to consider for the attackers. We note that a variantof fault attacks, i.e., fault sensitivity analysis, can be thwarted,for instance, through eliminating fault sensitivity based on adelay insertion algorithm.

B. Differential Fault Intensity Analysis

A subset of fault attacks, differential fault intensity analy-sis; see [26]–[28], combines differential power analysis withfault injection principles to obtain biased fault models. Theadvantage in such an approach is that the same fault in boththe original and redundant computations can be injected, butnot all faults occur with equal probability. In practice, theattacker is interested in using as few faults as possible (prefer-ably single faults with different intensities) to minimize theeffort. Previous work argue that the single-bit (more likelyin low fault intensity), two-bit, three-bit, and four-bit (morelikely in higher intensities) biased fault models can be usedto simulate variation of fault intensity. In addition, fault cate-gories presented in [29] include: single bit upset (SBU), singlebyte double bit upset, single byte triple bit upset (SBTBU),single byte quadruple bit upset (SBQBU), other single bytefaults (OSB), and multiple byte faults (MB); the former fourcorresponding to single/two/three/four-bit models.

C. Flexibility in Choosing the Signatures

Compared to the approaches based on using 1-bit parityfor the entire 16 S-boxes or those using 1-bit parities forone S-box (information redundancy), the proposed work givesflexibility to the designers in order to choose three instancesof dividing the architectures into blocks (Figs. 1–3) and alsohave the flexibility of using single of interleaved parities. Theproposed approaches in this paper based on error detectingcodes are able to thwart a number of such fault models.Specifically, SBUs and SBTBUs are detected fully throughthe approaches based on error detecting codes and using pari-ties. Moreover, through interleaved parities, in addition to burstfaults, a number of SBTBUs, SBQBUs, OSBs, and MBs aredetected. Error detecting codes and column signatures (pari-ties) can also detect OSBs and MBs, based on our simulations.The proposed methods, being for reliability, can deal with per-manent and transient faults and compound the challenge oflaunching fault analysis attacks. The signature-based diagno-sis approach, which uses linear codes that can (always) detectrandom errors of small multiplicity (and can never detect someother errors), differs from an architecture based on robustcodes which can detect (with probability) any error. We alsonote that plain time/hardware redundancy has the disadvan-tage of high overhead; however, the potential remedy for timeredundancy, i.e., recomputing with encoded operands (hybridredundancy), while being expensive, is able to detect per-manent and transient faults similar to the proposed approachhere.

D. Making the Attacks Difficult

We note that having the flexibility in choosing the signa-tures for the three architectures in Figs. 1–3 is an advantagefor our schemes. It has been shown that it is practical to attackparity-based schemes that use merely 1-bit parity for SubBytesor even one S-box; however, the number of injections neededwould be high if composite fields are used (which allow incor-porating the proposed work). Multibyte faults cannot be usedto realistically attack time redundancy countermeasure imple-mentations, e.g., recomputing with permuted operands, andsingle-byte fault models are the only viable option for theattackers [29]. We remark that, however, unlike our proposedschemes, recomputing with permuted operands could fail todetect the occurrence of a fault as long as the adversary couldinject the same fault in both the original and redundant compu-tations (biased fault model makes it easier). Such recomputingwith permuted operands can be used in conjunction with cod-ing schemes to nullify the effect of the bias in the fault modelby fault space transformation (if two equivalent faults f0 andf1 are injected in the output registers, we use a mapping thattransforms the fault space); thus thwarting both these attackschemes. This is similar to the schemes used in [29], whichincurs additional overhead.

E. ASIC Implementations and Comparisons

Through ASIC and for the constructions in Figs. 1–3 andthe previous work using look-up table (LUT)-based and com-posite field-based S-boxes, we also present the performanceand implementation metrics for the multiplicative inversion inGF(28) utilizing different structures as well as normal basis(nonredundant) and two redundant Galois field representations.The benchmarking is performed for the error detection archi-tectures using TSMC 65 nm library and Synopsys DesignCompiler (shown in Table I for area, frequency, power con-sumption at the respective working frequencies, and efficiency[throughput over gate equivalents (GE)]). There are two notesto consider regarding the synthesis.

1) We synthesized the multiplicative inversion in GF(28)

which can be used for cryptographic algorithms, suchas the AES [2], Camellia [3], SMS4 [4], CLEFIA [5],and the like.

2) We mainly compare signature-based approaches; nev-ertheless, approaches based on recomputations, e.g.,NREPO (normal basis recomputing with encodedoperands which has been proven to be resilient againstfault analysis attacks at the cost of higher overhead), canbe combined with the proposed architectures.

In Table I, the first two schemes are based on LUTs andare expensive on ASIC and the overhead difference com-pared to the other schemes is high. Moreover, the schemebased on double-data-rate using both clock edges of regis-ters is presented. This is achieved at the expense of sufferingfrom high throughput degradation. We note that in Table I,in order to make the area results meaningful when switchingtechnologies, we have also provided the NAND-gate equiva-lency (GE). This is performed using the area of a NAND gatein the utilized TSMC 65-nm CMOS library which is 1.41 μm2.As reported in the table, our proposed work outperforms other


TABLE IOVERHEADS AND PERFORMANCE DEGRADATION COMPARISON FOR THE MULTIPLICATIVE INVERSION IN GF(28)

OF THE S-BOXES INCLUDING 7/5/3-BLOCK ERROR DETECTION (ED) IN FIGS. 1–3 ON 65-nm ASIC

approaches, in terms of performance and implementation met-rics, making it suitable for efficient error detection of themultiplicative inversion in GF(28). For the entire AES encryp-tion (which includes 16 S-boxes in each of its ten rounds),the ASIC synthesis GE area results for redundant basis (thispaper/Figs. 1–3) are 61 243 GE, 57 300 GE, and 56 008 GE,respectively. One future research direction is to investigatecombined fault and power analysis attacks countermeasures forthe multiplicative inversion in GF(28) using redundant basis(related prior work, for instance, [14], have not consideredsuch basis).

V. CONCLUSION

In this paper, signature-based error detection approacheswere presented for the multiplicative inversion in GF(28)

utilizing normal basis (nonredundant) and two redundantGalois field representations, i.e., polynomial ring representa-tion and redundantly represented basis through tower fields.In our approaches, we considered both the reliability andperformance metrics objectives. Signature-based approachesare used for such nonlinear blocks to achieve high efficiency,while maintaining high error coverage. Our results demon-strated that the proposed efficient error detection architecturescan be feasibly utilized, suitable for the required performance,reliability, and implementation metrics for constrained appli-cations. As observed in Figs. 4–6, the error coverage in allthe cases is more than 99%. Moreover, as reported in Table I,better performance and implementation metrics were achievedin the proposed work. This makes it suitable for efficient errordetection of the multiplicative inversion in GF(28).

REFERENCES

[1] M. Mozaffari Kermani, M. Zhang, A. Raghunathan, and N. K. Jha,“Emerging frontiers in embedded security,” in Proc. Conf. VLSI Design,Pune, India, Jan. 2013, pp. 203–208.

[2] Specification of the Advanced Encryption Standard. Accessed onJul. 2017. [Online]. Available: http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

[3] Specification of Camellia, a 128-bit Block Cipher. Accessedon Jul. 2017. [Online]. Available: https://info.isl.ntt.co.jp/crypt/eng/camellia/dl/01espec.pdf

[4] Specification of SMS4. Accessed on Jul. 2017. [Online]. Available:http://eprint.iacr.org/2008/329.pdf

[5] Specification of CLEFIA. Accessed on Jul. 2017. [Online]. Available:http://www.sony.net/Products/cryptography/clefia/index.html

[6] C. H. Yen and B. F. Wu, “Simple error detection methods for hard-ware implementation of Advanced Encryption Standard,” IEEE Trans.Comput., vol. 55, no. 6, pp. 720–731, Jun. 2006.

[7] G. Di Natale, M. Doulcier, M. L. Flottes, and B. Rouzeyre, “A reliablearchitecture for parallel implementations of the Advanced EncryptionStandard,” J. Electron. Test. Theory Appl., vol. 25, no. 4, pp. 269–278,Aug. 2009.

[8] M. Mozaffari-Kermani and A. Reyhani-Masoleh, “Concurrent structure-independent fault detection schemes for the Advanced EncryptionStandard,” IEEE Trans. Comput., vol. 59, no. 5, pp. 608–622, May 2010.

[9] P. Maistri and R. Leveugle, “Double-data-rate computation as a counter-measure against fault analysis,” IEEE Trans. Comput., vol. 57, no. 11,pp. 1528–1539, Nov. 2008.

[10] X. Guo, D. Mukhopadhyay, C. Jin, and R. Karri, “Security analy-sis of concurrent error detection against differential fault analysis,”J. Cryptograph. Eng., vol. 5, no. 3, pp. 153–169, 2015.

[11] M. Yasin, B. Mazumdar, S. S. Ali, and O. Sinanoglu, “Security analysisof logic encryption against the most effective side-channel attack: DPA,”in Proc. Defect Fault Tolerance VLSI Syst., Amherst, MA, USA, 2015,pp. 97–102.

[12] M. Mozaffari-Kermani, R. Azarderakhsh, and A. Aghaie, “Reliableand error detection architectures of Pomaranch for false-alarm-sensitivecryptographic applications,” IEEE Trans. Very Large Scale Integr. (VLSI)Syst., vol. 23, no. 12, pp. 2804–2812, Dec. 2015.

[13] M. Karpovsky, K. J. Kulikowski, and A. Taubin, “Robust protec-tion against fault-injection attacks on smart cards implementing theAdvanced Encryption Standard,” in Proc. Depend. Syst. Netw., 2004,pp. 93–101.

[14] H. Pahlevanzadeh, J. Dofe, and Q. Yu, “Assessing CPA resistance ofAES with different fault tolerance mechanisms,” in Proc. Asia SouthPac. Design Autom. Conf. (ASP DAC), 2016, pp. 661–666.

[15] T. Schneider, A. Moradi, and T. Güneysu, “ParTI—Towards com-bined hardware countermeasures against side-channel and fault-injectionattacks,” in Proc. CRYPTO, 2016, pp. 302–332.

[16] X. Guo and R. Karri, “Recomputing with permuted operands: A con-current error detection approach,” IEEE Trans. Comput.-Aided DesignIntegr. Circuits Syst., vol. 32, no. 10, pp. 1595–1608, Oct. 2013.

[17] R. Karri, K. Wu, P. Mishra, and K. Yongkook, “Fault-based side-channelcryptanalysis tolerant Rijndael symmetric block cipher architecture,” inProc. DFT, 2001, pp. 418–426.

[18] X. Guo, D. Mukhopadhyay, C. Jin, and R. Karri, “NREPO: Normalbasis recomputing with permuted operands,” in Proc. HOST, Arlington,VA, USA, 2014, pp. 118–123.

[19] K. Wu, R. Karri, G. Kuznetsov, and M. Goessel, “Low cost concurrenterror detection for the Advanced Encryption Standard,” in Proc. TestConf., Charlotte, NC, USA, 2004, pp. 1242–1248.

[20] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A compact Rijndaelhardware architecture with S-box optimization,” in Proc. ASIACRYPT,2001, pp. 239–254.

http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

https://info.isl.ntt.co.jp/crypt/eng/camellia/dl/01espec.pdf

https://info.isl.ntt.co.jp/crypt/eng/camellia/dl/01espec.pdf

http://eprint.iacr.org/2008/329.pdf

http://www.sony.net/Products/cryptography/clefia/index.html


[21] D. Canright, “A very compact S-box for AES,” in Proc. CHES, 2005,pp. 441–455.

[22] Y. Nogami, K. Nekado, T. Toyota, N. Hongo, and Y. Morikawa, “Mixedbases for efficient inversion in F((22)2)2 and conversion matrices ofSubBytes of AES,” in Proc. CHES, 2010, pp. 234–247.

[23] R. Ueno, N. Homma, Y. Sugawara, Y. Nogami, and T. Aoki, “Highlyefficient GF(28) inversion circuit based on redundant GF arithmetic andits application to AES design,” in Proc. Cryptograph. Hardw. EmbeddedSyst., 2015, pp. 63–80.

[24] G. Canivet et al., “Glitch and laser fault attacks onto a secure AESimplementation on a SRAM-based FPGA,” J. Cryptol., vol. 24, no. 2,pp. 247–268, 2011.

[25] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicativeinverses in GF(2m) using normal bases,” Inf. Comput., vol. 78, no. 3,pp. 171–177, 1988.

[26] N. F. Ghalaty, B. Yuce, and P. Schaumont, “Analyzing the efficiencyof biased-fault based attacks,” Embedded Syst. Lett., vol. 8, no. 2,pp. 33–36, Jun. 2016.

[27] N. F. Ghalaty, B. Yuce, M. M. I. Taha, and P. Schaumont, “Differentialfault intensity analysis,” in Proc. FDTC, 2014, pp. 49–58.

[28] S. Patranabis, A. Chakraborty, P. H. Nguyen, and D. Mukhopadhyay,“A biased fault attack on the time redundancy countermeasure for AES,”in Proc. COSADE, 2015, pp. 189–203.

[29] S. Patranabis, A. Chakraborty, D. Mukhopadhyay, and P. Ha Nguyen.(2015). Using State Space Encoding to Counter Biased Fault Attackson AES Countermeasures. [Online]. Available: https://eprint.iacr.org/2015/806.pdf

Mehran Mozaffari Kermani (S’00–M’11–SM’16)received the B.Sc. degree from the Universityof Tehran, Tehran, Iran, and the M.E.Sc. andPh.D. degrees from the University of WesternOntario, London, ON, Canada, in 2007 and 2011,respectively.

He joined the Advanced Micro Devices,Markham, ON, Canada, as a Senior ASIC/LayoutDesigner, integrating sophisticated secu-rity/cryptographic capabilities into acceleratedprocessing. In 2012, he joined the Electrical

Engineering Department, Princeton University, Princeton, NJ, USA, as anNSERC Post-Doctoral Research Fellow. From 2013 to 2017, he was anAssistant Professor with the Rochester Institute of Technology, Rochester,NY, USA. In 2017, he has joined the Computer Science and EngineeringDepartment, University of South Florida, Tampa, FL, USA.

Dr. Mozaffari Kermani was a recipient of the prestigious Natural Sciencesand Engineering Research Council of Canada Post-Doctoral ResearchFellowship in 2011 and the Texas Instruments Faculty Award (DouglasHarvey) in 2014. He is currently serving as an Associate Editor for theIEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI)SYSTEMS, the ACM Transactions on Embedded Computing Systems, theIEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I. He has been theGuest Editor for the IEEE TRANSACTIONS ON DEPENDABLE AND SECURE

COMPUTING, the IEEE/ACM TRANSACTIONS ON COMPUTATIONAL

BIOLOGY AND BIOINFORMATICS, and the IEEE TRANSACTIONS ON

EMERGING TOPICS IN COMPUTING for special issues on security. Hehas been the TPC member for a number of conferences including HOST(Publications Chair), Design Automation Conference, Design Automationand Test in Europe Conference, RFIDSec, LightSec, International Workshopon the Arithmetic of Finite Fields, Fault Diagnosis and Tolerance inCryptography Workshop, and International Symposium on Defect and FaultTolerance in VLSI and Nanotechnology Systems.

Amir Jalali received the B.Sc. degree in computerengineering from Shahid Beheshti University,Tehran, Iran, in 2009 and the M.Sc. degree incomputer engineering from the Department ofComputer Engineering and Information Technology,Amirkabir University of Technology, Tehran, in2012. He is currently pursuing the Ph.D. degreein computer engineering with the Department ofComputer, Electrical Engineering and ComputerScience, Florida Atlantic University, Boca Raton,FL, USA.

He is an I-SENSE Researcher with Florida Atlantic University. His currentresearch interests include efficient software implementation of elliptic curvecryptography and post-quantum cryptography.

Reza Azarderakhsh (M’12) received the B.Sc.degree in electrical and electronic engineering andthe M.Sc. degree in computer engineering fromthe Sharif University of Technology, Tehran, Iran,in 2002 and 2005, respectively, and the Ph.D.degree in electrical and computer engineering fromthe University of Western Ontario, London, ON,Canada, in 2011.

He joined the Department of Electrical andComputer Engineering, University of WesternOntario, as a Limited Duties Instructor, in 2011,

where he has been an Natural Sciences and Engineering ResearchCouncil (NSERC) Post-Doctoral Research Fellow with the Center forApplied Cryptographic Research and the Department of Combinatoricsand Optimization. He is currently with the Department of Computer andElectrical Engineering and Computer Science and is an I-SENSE Fellow,Florida Atlantic University, Boca Raton, FL, USA. His current researchinterests include finite field and its application, elliptic curve cryptography,and pairing-based cryptography.

Dr. Azarderakhsh was a recipient of the prestigious NSERC of Canada Post-Doctoral Research Fellowship in 2012. He is currently serving as an AssociateEditor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I. Hewas the Guest Editor for the IEEE TRANSACTIONS ON DEPENDABLE AND

SECURE COMPUTING for the special issue of Emerging Embedded andCyber Physical System Security Challenges and Innovations, from 2016 to2017. He was also the Guest Editor for the IEEE TRANSACTIONS ON

COMPUTATIONAL BIOLOGY AND BIOINFORMATICS for the special issueof Emerging Security Trends for Biomedical Computations, Devices, andInfrastructures, from 2015 to 2016.

Jiafeng Xie (M’15) received the B.E. degree in mea-surement and control technology and instrumenta-tion from Yanshan University, Qinhuangdao, China,in 2006, the M.E. degree in control science and engi-neering from Central South University, Changsha,China, in 2010, and the Ph.D. degree in electri-cal engineering from the University of Pittsburgh,Pittsburgh, PA, USA, in 2014.

He is currently an Assistant Professor withDepartment of Electrical Engineering, Wright StateUniversity, Dayton, OH, USA. His current research

interests include very large scale integration (VLSI) cryptographic circuitsdesign, intelligent system fault detection, hardware security, VLSI, andsignal/image processing systems.

Kim-Kwang Raymond Choo (SM’15) receivedthe Ph.D. degree in information security from theQueensland University of Technology, Brisbane,QLD, Australia, in 2006.

He holds the Cloud Technology EndowedProfessorship with the University of Texas atSan Antonio, San Antonio, TX, USA. He is anAssociate Professor with the University of SouthAustralia, Adelaide, SA, Australia, and a GuestProfessor with the China University of Geosciences,Wuhan, China.

Dr. Choo was a recipient of various awards, including the ESORICS2015 Best Paper Award, the Winning Team of the Germany’s University ofErlangen-Nuremberg (FAU) Digital Forensics Research Challenge in 2015,the 2014 Highly Commended Award by the Australia New Zealand PolicingAdvisory Agency, the Fulbright Scholarship in 2009, the 2008 Australia DayAchievement Medallion, and the British Computer Society’s Wilkes Award in2008. He is a fellow of the Australian Computer Society.

https://eprint.iacr.org/2015/806.pdf

https://eprint.iacr.org/2015/806.pdf

reliable inversion in gf 2 with redundant arithmetic for secure...

Documents