Smashing the Implementation Records of AES S-box
Arash Reyhani-Masoleh, Mostafa Taha, and Doaa Ashmawy
Western University, London, Ontario, Canada
CHES-2018


Outline

• Introduction.
• Proposed AES S-box Architecture.
• New Logic-Minimization Algorithms.
• New GF((2^4)^2) Inversion.
  • New Exponentiation Stage.
  • New Representation of Subfield Inversion.
  • New Output Multipliers.
• Comparisons and Concluding Remarks.

Introduction

Timeline (axis: 1998, 2001, 2005, 2010, 2015, 2016, 2018):
• First introduction of Rijndael (Rijmen & Daemen).
• Standardization of Rijndael as the AES.
• First implementation using tower fields (Satoh et al.).

Target small area:
• Most compact S-box (Canright).
• Reduced the number of gates in Canright's design to 115 (Boyar and Peralta).
• Then to 113 (CMT).

Target small delay / high efficiency:
• Most efficient S-box (Ueno et al.).
• Reduced the depth of the S-box to 16 gates (Boyar, Find and Peralta).

In this paper, we propose:
1. The most compact S-box to date.
2. The most efficient S-box to date.

Implementation Pitfalls

1. Using AND gates, although NAND gates have smaller area and delay in all technology libraries.
2. Using only simple gates, although compound gates (AND-OR-Invert, OR-AND-Invert) may be more efficient.

• We converted previous designs that use AND gates into improved versions that use NAND/NOR gates (targeting the STM 65-nm CMOS standard library):

S-box                      Area (GEs)            Delay (ns)
                           Original  Improved    Original  Improved
Canright [Can05b]          200       -           1.253     -
113-gates [Boy16]          202       194         1.523     1.346
Depth-16 (2012) [BP12]     230.5     222         0.960     0.906
Depth-16 (2017) [BFP17]    224.5     216         0.957     0.912
Ueno et al. [UHS+15]       256.5     238         0.831     0.772

Smallest original: Canright (200 GEs). Smallest improved: 113-gates (194 GEs). Fastest original and fastest improved: Ueno et al. (0.831 ns and 0.772 ns).

• At the end, we compare only against the improved versions. Formulations of the improved designs are included in the paper.
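A small illustration of why swapping AND for NAND can cost nothing functionally: when two AND outputs feed the same XOR, complementing both leaves the XOR output unchanged. This is a general Boolean identity checked exhaustively below; it is only a sketch of the idea, not the formulation used in the paper, and the function names are made up for the example.

```python
from itertools import product

def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)

def xor_of_ands(a, b, c, d):
    """(a AND b) XOR (c AND d)."""
    return (a & b) ^ (c & d)

def xor_of_nands(a, b, c, d):
    """(a NAND b) XOR (c NAND d): both XOR inputs are complemented."""
    return nand(a, b) ^ nand(c, d)

# Exhaustive check over all 16 input combinations.
all_equal = all(
    xor_of_ands(*bits) == xor_of_nands(*bits)
    for bits in product((0, 1), repeat=4)
)
print(all_equal)
```

When only one of the two XOR inputs is complemented, the XOR would have to become an XNOR (or the inversion pushed elsewhere), which is why such rewrites are done over the whole circuit rather than gate by gate.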

AES S-box

• Original S-box
[Figure: g → inversion in GF(2^8) → × M → + h → s]

• Typical implementation using composite fields in normal basis
[Figure: the same datapath with the GF(2^8) inversion replaced by blocks labelled X, ( )^2, composite field inversion and X^-1, followed by × M and + h.]
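For reference, the original (non-composite) datapath can be sketched in a few lines: inversion in GF(2^8), then the affine step written on the slide as × M and + h. The sketch below assumes the standard AES parameters (reduction polynomial 0x11B, affine matrix expressed as four rotations, constant 0x63); the function names are illustrative and this is a software model, not a hardware design.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Return a^254, which is a^-1 for a != 0 and 0 for a == 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox(g):
    """s = M * g^-1 + h, with the AES matrix M written as rotations and h = 0x63."""
    x = gf_inv(g)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63

print(hex(sbox(0x00)), hex(sbox(0x53)))
```

The composite-field implementations discussed in the talk compute the same function; they only change how the inversion is carried out.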

Proposed AES S-box Architecture

• 12 terms are shared between the Exponentiation and Multipliers.
• Everything is optimized by hand and by CAD tools at various abstraction levels (promoting the use of NAND/NOR and compound gates).

[Figure: block diagram from input g through Tin, the composite field inversion and Tout to output s, with bus widths labelled 12, 5, 10, 5, 6 and 6. Callouts on the blocks: New Logic-Minimization Algorithms (twice), New Formulations (twice), New, Improved Representations, New Multipliers.]


Logic-Minimization Algorithms

• Implement isomorphic transformation matrices using the smallest number of gates.
• NP-hard problem [BMP08].

[Figure: Tin takes g and produces the input representation in GF((2^4)^2) together with the 12 shared terms.]

Logic-Minimization Algorithms (cont.)

• Previous work
  • Cancellation-free search: gates are never used to cancel out common terms, Canright [Can05b] and Paar [Paa94].
  • Heuristics (with cancellation): Normal-BP (Boyar and Peralta [BP10])
    1. Test adding one gate.
    2. Compute the Distance to each target (assuming no sharing).
    3. Select a gate leading to the minimum average Distance. Resolve ties using different methods.
    Add the selected gate and repeat.

[Figure: first 8 rows of Tin, annotated with steps 1 to 3 and the computed Dist vector.]
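The Normal-BP loop described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: signals are bitmasks over the inputs, `distance` is a brute-force search that only suits small matrices, ties are broken by the largest Euclidean norm of Dist (one of the criteria mentioned in the talk's backup slide), and the 3 x 4 target matrix is a toy example that does not come from the paper.

```python
from itertools import combinations

def distance(base, target):
    """Fewest XOR gates needed to build `target` from the signals in `base`."""
    if target in base:
        return 0
    for k in range(2, len(base) + 1):
        for combo in combinations(base, k):
            acc = 0
            for signal in combo:
                acc ^= signal
            if acc == target:
                return k - 1
    raise ValueError("target is not in the span of the base")

def greedy_bp(n_inputs, targets):
    """Greedy search: repeatedly add the XOR gate that minimizes Sum(Dist)."""
    base = [1 << i for i in range(n_inputs)]  # the primary inputs
    gates = []                                # (operand, operand, output) triples
    while sum(distance(base, t) for t in targets) > 0:
        best = None
        for a, b in combinations(base, 2):    # step 1: test adding one gate
            cand = a ^ b
            if cand in base:
                continue
            dist = [distance(base + [cand], t) for t in targets]  # step 2
            # Step 3: minimum sum; ties go to the largest Euclidean norm.
            key = (sum(dist), -sum(d * d for d in dist))
            if best is None or key < best[0]:
                best = (key, a, b, cand)
        _, a, b, cand = best
        gates.append((a, b, cand))            # add the selected gate and repeat
        base.append(cand)
    return gates

# Toy targets: x0+x1+x2, x1+x2+x3, x0+x1+x2+x3 (bit i stands for input xi).
targets = [0b0111, 0b1110, 0b1111]
gates = greedy_bp(4, targets)
print(len(gates))
```

Computed row by row without sharing, the three toy targets would need 2 + 2 + 3 = 7 XOR gates; the greedy search shares x1+x2 and needs fewer. Because `distance` considers every subset of the current base, gates that cancel common terms are allowed, which is what distinguishes this family of heuristics from cancellation-free search.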

Logic-Minimization Algorithms (cont.)

• Proposed Logic-Minimization Algorithms
  • Improved-BP:
    • Test all the ties.
    • Monitor progress of the delay.
  • Shortest-Dist-First:
    • Select a gate leading to many small (short) Distances (prioritize small Distances, not the average).
    • Test all the ties and monitor the delay.
  • Focused-Search:
    • Select a gate leading to any small (short) Distance (ignore the count and search through more cases; close to exhaustive search).
    • Test all the ties and monitor the delay.

[Figure: first 8 rows of Tin, annotated with steps 1 to 3 and the computed Dist vector.]

Logic-Minimization Algorithms (cont.)

• Studied Tin and Tout for all possible isomorphic transformations (a total of 96 matrices).
• The proposed algorithms consistently lead to equal or better implementations.

• Lightweight Implementation

                 Optimized by   Normal-BP   Improved-BP   Shortest-     Focused-
                 CAD tools                                Dist-First    Search
Tin (#gates)     29             19          19            19            19
Tout (#gates)    23             19          17            17            16

• Fast Implementation

                 Area (# XOR gates)   Delay (levels of XOR gates)
Tin              24                   3
Tout             21                   3


New Exponentiation Stage

• Express as one operation with closed-form equations (allows for maximum sharing).
• Two designs, Lightweight and Fast (optimized by hand).
• One design optimized by CAD tools.

[Figure: exponentiation stage, with a block labelled ( )^2.]

New Exponentiation Stage (cont.)

Design                                 Area (GEs)   Delay (ns)
1. Lightweight (optimized by hand)     30           0.103
2. Fast (optimized by hand)            30           0.091
3. Optimized by CAD tool               29.25        0.100

[Figures: circuit schematics of the three designs. Design 3, optimized by CAD tool, uses XOR3 gates.]

New Subfield Inversion

• Express in closed-form equations.
• Derive 12 equivalent functions using Karnaugh maps, and optimize by hand.
• Optimize using CAD tools.

Design                                     Area (GEs)   Delay (ns)
Lightweight and fast (optimized by hand)   36           0.121
Optimized by CAD tools                     31           0.102

[Figures: circuit schematics of the two designs. The hand-optimized design uses NAND3 gates; the CAD-optimized design uses OR-AND-Invert gates.]

New Output Multipliers

• Two multipliers with a common input: W = B × E and Z = A × E.
• Input and output terms are represented as 4 bits × 4 bits → 5 bits. Reduction from 5 bits back to 4 bits is part of Tout.
• Previous work: 4×4 → 4 [Can05b], 5×5 → 5 [NNI12], 4×5 → 5 [UHS+15].

[Figure: two multipliers with inputs A and B and the common input E, producing the 5-bit outputs Z and W.]

New Output Multipliers (cont.)

• Focus on the combined cost of the two multipliers (deploy maximum sharing).
• Some multipliers do not allow sharing ([Mas91], [RDJ+01] and [GM16]).

[Figure: the two multipliers Z = A × E and W = B × E, shown first as separate blocks with 5-bit outputs and then expanded to expose the pairwise input sums ai + aj, bi + bj and ei + ej. Callouts: "Used NAND3 gates", "Part of Tin", "Implemented once (shared)". Bus widths labelled 4, 5 and 6.]
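The pairwise sums ai + aj and ei + ej in the figure suggest the usual identity ai·ej + aj·ei = (ai + aj)(ei + ej) + ai·ei + aj·ej over GF(2): four 1-bit products plus six products of pairwise sums give 10 AND-type gates instead of 16, and the sums of E can be computed once for both multipliers. The sketch below checks that identity on an unreduced 4-coefficient polynomial product. It assumes a plain polynomial representation purely for illustration; the paper's multipliers use their own basis and a 5-bit redundant output, and the function names here are invented.

```python
from itertools import combinations, product

def schoolbook(a, e):
    """Unreduced product of two 4-coefficient GF(2) polynomials (16 ANDs)."""
    c = [0] * 7
    for i in range(4):
        for j in range(4):
            c[i + j] ^= a[i] & e[j]
    return c

def shared_sum_product(a, e):
    """Same product from 4 + 6 = 10 ANDs, using pairwise sums of the inputs."""
    diag = [a[i] & e[i] for i in range(4)]            # 4 ANDs: a_i * e_i
    c = [0] * 7
    for i in range(4):
        c[2 * i] ^= diag[i]
    for i, j in combinations(range(4), 2):            # 6 index pairs
        # (a_i + a_j)(e_i + e_j) + a_i e_i + a_j e_j = a_i e_j + a_j e_i
        cross = ((a[i] ^ a[j]) & (e[i] ^ e[j])) ^ diag[i] ^ diag[j]
        c[i + j] ^= cross
    return c

# Exhaustive comparison over all 16 x 16 operand pairs.
vectors = list(product((0, 1), repeat=4))
identical = all(
    schoolbook(a, e) == shared_sum_product(a, e)
    for a in vectors for e in vectors
)
print(identical)
```

In a two-multiplier setting such as W = B × E and Z = A × E, the six sums ei + ej appear in both products, so computing them once is where the sharing comes from.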

New Output Multipliers (cont.)

Space and time complexities of a single multiplier

Field             Multiplier used in        Space complexity     Time complexity
GF(((2^2)^2)^2)   Satoh et al. [SMTM01]     21 XOR + 9 AND       4 D_X + D_AD
GF(((2^2)^2)^2)   Canright [Can05b]         20 XOR + 9 NAND      4 D_X + D_ND
GF(((2^2)^2)^2)   Nogami et al. [NNT+10]    21 XOR + 9 AND       4 D_X + D_AD
GF((2^4)^2)       Rudra et al. [RDJ+01]     15 XOR + 16 AND      3 D_X + D_AD
GF((2^4)^2)       Gueron et al. [GM16]      15 XOR + 16 AND      3 D_X + D_ND
GF((2^4)^2)       Nekado et al. [NNI12]     25 XOR + 10 AND      2 D_X + D_AD
GF((2^4)^2)       Ueno et al. [UHS+15]      21 XOR + 10 AND      2 D_X + D_AD
GF((2^4)^2)       This work                 17 XOR + 10 NAND     2 D_X + D_ND

D_X, D_AD and D_ND denote the delay of one XOR, AND and NAND gate, respectively.

The smallest and fastest 4-bit multiplier to date among all the GF((2^4)^2) and GF(((2^2)^2)^2) multipliers.

New Output Multipliers (cont.)

Additional area and delay required for the multipliers

Design                    Area (GEs)   Delay (ns)
Optimized by hand         52           0.099
Optimized by CAD tools    53.5         0.121

[Figure: schematic of the hand-optimized multipliers. Tin supplies the bits ai and bi together with the sums aij = ai + aj and bij = bi + bj; E enters together with the sums ei + ej; the outputs are Z and W. Bus widths labelled 4, 5 and 6.]


Comparisons

At STM 65-nm CMOS standard technology library.

• Targeting Lightweight Implementation

S-box                       Area (GEs)   Delay (ns)   Area-Time Product
Canright [Can05b]           200          1.25         250
Improved 113-gates          194          1.35         261.9
This work (Lightweight)     182.25       1.20         218.7

The smallest, fastest and most efficient Lightweight S-box.

• Targeting Fast Implementation

S-box                       Area (GEs)   Delay (ns)   Area-Time Product
Improved Depth-16 (2012)    222          0.91         202.02
Improved Depth-16 (2017)    216          0.91         196.56
Improved Ueno et al.        238          0.77         183.26
This work (Fast)            208          0.78         162.24

The smallest, fastest and most efficient Fast S-box.

These results are compared against the improved versions proposed in this paper, and follow from testing more than 46 pieces of VHDL code at various abstraction levels of the designs.
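The Area-Time Product column is simply area multiplied by delay. A short sketch that recomputes it from the area and delay figures quoted in the comparison tables (the variable and function names are illustrative):

```python
# (name, area in GEs, delay in ns), copied from the comparison tables.
designs = [
    ("Canright [Can05b]", 200.0, 1.25),
    ("Improved 113-gates", 194.0, 1.35),
    ("This work (Lightweight)", 182.25, 1.20),
    ("Improved Depth-16 (2012)", 222.0, 0.91),
    ("Improved Depth-16 (2017)", 216.0, 0.91),
    ("Improved Ueno et al.", 238.0, 0.77),
    ("This work (Fast)", 208.0, 0.78),
]

def area_time_product(area_ge, delay_ns):
    """Efficiency metric used in the tables: smaller is better."""
    return area_ge * delay_ns

products = {
    name: round(area_time_product(area, delay), 2)
    for name, area, delay in designs
}

for name, value in products.items():
    print(f"{name:28s} {value}")
```

Under this metric a design can rank best even when another design has a slightly lower delay, because the area reduction outweighs the difference.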

Effect of Target Library

• Industrial technology libraries (e.g., STM and TSMC):
  • Lightweight: used XOR3 and OAI32 gates, 182.25 GEs.
  • Fast: used NAND3 gates, 208 GEs.
• NanGate 45 nm:
  • Lightweight: used AOI12 and OAI12 gates, 186 GEs.
  • Fast: used NAND3 gates, 208 GEs (no change).
• Without using any compound gate:
  • Lightweight: 191 GEs (best previous work: 194 GEs).
  • Fast: 211 GEs (best previous work: 216 GEs).

The proposed designs are superior under any restriction by the target library.

Concluding Remarks

• In this paper, we proposed:
  • Two new designs for the AES S-box: Lightweight and Fast.
  • New logic-minimization heuristics.
  • New formulations for each stage of the S-box.
  • New output multipliers.
  • A design methodology for an optimum synergy between theoretical analysis and technology-assisted CAD tools.

References

• [Can05b] David Canright. A very compact S-box for AES. CHES 2005.
• [Boy16] CMT: Circuit minimization team, 2016. http://www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html
• [BP12] Joan Boyar and René Peralta. A small depth-16 circuit for the AES S-box. Information Security and Privacy Conference, SEC 2012.
• [BFP17] Joan Boyar, Magnus Find, and René Peralta. Low-depth, low-size circuits for cryptographic applications. Boolean Functions and their Applications, BFA 2017.
• [UHS+15] Rei Ueno, Naofumi Homma, Yukihiro Sugawara, Yasuyuki Nogami, and Takafumi Aoki. Highly efficient GF(2^8) inversion circuit based on redundant GF arithmetic and its application to AES design. CHES 2015.
• [BMP08] Joan Boyar, Philip Matthews, and René Peralta. On the shortest linear straight-line program for computing linear forms. Mathematical Foundations of Computer Science, MFCS 2008.
• [Paa94] Christof Paar. Efficient VLSI architectures for bit parallel computation in Galois fields. PhD thesis, University of Duisburg-Essen, Germany, 1994.
• [BP10] Joan Boyar and René Peralta. A new combinational logic minimization technique with applications to cryptology. Symposium on Experimental Algorithms, SEA 2010.
• [NNI12] Kenta Nekado, Yasuyuki Nogami, and Kengo Iokibe. Very short critical path implementation of AES with direct logic gates. International Workshop on Security, IWSEC 2012.
• [Mas91] E. D. Mastrovito. VLSI Architectures for Computation in Galois Fields. PhD thesis, Linköping University, Linköping, Sweden, 1991.
• [RDJ+01] Atri Rudra, Pradeep K. Dubey, Charanjit S. Jutla, Vijay Kumar, Josyula R. Rao, and Pankaj Rohatgi. Efficient Rijndael encryption implementation with composite field arithmetic. CHES 2001.
• [GM16] Shay Gueron and Sanu Mathew. Hardware implementation of AES using area-optimal polynomials for composite-field representation GF((2^4)^2) of GF(2^8). ARITH 2016.
• [SMTM01] Akashi Satoh, Sumio Morioka, Kohji Takano, and Seiji Munetoh. A compact Rijndael hardware architecture with S-box optimization. ASIACRYPT 2001.
• [NNT+10] Yasuyuki Nogami, Kenta Nekado, Tetsumi Toyota, Naoto Hongo, and Yoshitaka Morikawa. Mixed bases for efficient inversion in F((2^2)^2)^2 and conversion matrices of SubBytes of AES. CHES 2010.


Thank You,

Questions?

Logic-Minimization Algorithms (backup slide): Tout

Dist vectors (one entry per target):
• Inputs and Dist, using the original inputs: Dist = (3, 7, 7, 5, 3, 3, 1, 5).
• First, add all gates with Dist = 1: Dist = (3, 6, 7, 5, 3, 2, 1, 5).
• Dist, assuming w0+w1 is added: (3, 5, 6, 4, 3, 2, 1, 5), Sum(Dist) = 29.
• Dist, assuming w0+w2 is added: (3, 6, 7, 5, 3, 2, 1, 5), Sum(Dist) = 32.
• Dist, assuming w0+w3 is added: (3, 6, 6, 5, 3, 2, 1, 5), Sum(Dist) = 31.
• Dist, assuming w0+w4 is added: (3, 6, 6, 5, 3, 2, 1, 5), Sum(Dist) = 31.

• Normal-BP:
  1. Test all the possible XOR gates that can use the previous-level gates (the inputs and (w2+w4)), that is, from (w0+w1) all the way to (z4 + (w2+w4)).
  2. Select one gate that leads to min(Sum(Dist)). In case of ties, select one gate based on a tie-breaking criterion. For example, among the best gates, select the one that maximizes the Euclidean norm of Dist.

• Improved-BP: Similar to Normal-BP, but try all the ties and monitor the progress of the delay.

• Shortest-Dist-First: Similar to Normal-BP, but select all the gates that lead to as many small numbers in Dist as possible. Considering the four cases above, we select all of them, because the smallest number is 2 (excluding ones) and it appears once in each case. If it appeared twice in any one case, we would select that case. If the smallest number were 3, then 3 would be the smallest Dist, and we would select the cases that lead to as many (Dist = 3) entries as possible.

• Focused-Search: Similar to Shortest-Dist-First, but we ignore the count of (Dist = 2) or (Dist = 3). Here, we select all the gates that include (Dist = 2) within the vector of Distances, without differentiating based on the count. If no gate leads to Dist = 2, select all the gates that include Dist = 3, and so on.
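The three selection rules can be compared directly on the four candidate gates of this slide. The sketch below reads the slide's digits as eight-entry Dist vectors (a reading that reproduces the quoted Sum(Dist) values 29, 32, 31 and 31); the function names and the choice to ignore entries of 0 or 1 when looking for the "smallest" distance are assumptions made for the example.

```python
# Dist vectors after tentatively adding each candidate gate.
candidates = {
    "w0+w1": [3, 5, 6, 4, 3, 2, 1, 5],
    "w0+w2": [3, 6, 7, 5, 3, 2, 1, 5],
    "w0+w3": [3, 6, 6, 5, 3, 2, 1, 5],
    "w0+w4": [3, 6, 6, 5, 3, 2, 1, 5],
}

def normal_bp(cands):
    """Minimum Sum(Dist); ties broken by the largest Euclidean norm."""
    best_sum = min(sum(d) for d in cands.values())
    tied = [g for g, d in cands.items() if sum(d) == best_sum]
    return max(tied, key=lambda g: sum(x * x for x in cands[g]))

def smallest_open(dist):
    """Smallest distance above 1 (the slide excludes ones)."""
    return min(x for x in dist if x > 1)

def shortest_dist_first(cands):
    """Keep the gates with the most entries equal to the smallest distance."""
    s = min(smallest_open(d) for d in cands.values())
    best_count = max(d.count(s) for d in cands.values())
    return [g for g, d in cands.items() if d.count(s) == best_count]

def focused_search(cands):
    """Keep every gate whose Dist contains the smallest distance at all."""
    s = min(smallest_open(d) for d in cands.values())
    return [g for g, d in cands.items() if s in d]

print(normal_bp(candidates))
print(shortest_dist_first(candidates))
print(focused_search(candidates))
```

On these four cases Normal-BP commits to a single gate, while the other two rules keep all four candidates alive, which matches the slide's remark that all of them would be selected and explains why those searches explore more cases.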