AES [and other block ciphers] implementation...
Useful Properties for Implementing Block Ciphers
Bit-wise operations (XOR, AND, OR, etc.)
X = a + b
Y = X + c
Z = Y + d
Z = a + b + c + d (LUT4 x 1)
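A rough software model of this slide, assuming "+" denotes XOR and that a LUT4 is a 16-entry truth table (the helper name `lut4_init` is ours, not from the slides). The chained form and the flattened form produce the same table, which is why a single LUT4 is enough:

```python
def lut4_init(func):
    """Return the 16-bit truth table of a 4-input boolean function.

    Bit i of the result is func(a, b, c, d), where a is bit 0 of i,
    b is bit 1, c is bit 2 and d is bit 3.
    """
    init = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        init |= func(a, b, c, d) << i
    return init

def chained(a, b, c, d):
    # Three dependent operations, as in X, Y, Z on the slide.
    x = a ^ b
    y = x ^ c
    return y ^ d

def flat(a, b, c, d):
    # One 4-input function.
    return a ^ b ^ c ^ d

print(hex(lut4_init(chained)), hex(lut4_init(flat)))
```

Both calls print the same table value, so the three-step chain collapses into one look-up.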
Useful Properties for Implementing Block Ciphers
Permutation
Permutation = [1, 5, 4, 3, 2, 6]
Change of wires
Free of cost
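In software the same permutation is an index look-up. The sketch below assumes the list is 1-based and that output position i takes input wire `perm[i]`; the slide does not state the convention, but for this particular permutation both readings give the same result because it is its own inverse.

```python
def permute(wires, perm):
    """Output position i takes the input wire numbered perm[i] (1-based)."""
    return [wires[p - 1] for p in perm]

PERM = [1, 5, 4, 3, 2, 6]  # from the slide
print(permute(list("abcdef"), PERM))
```

In hardware no logic is generated for this: the synthesis tool only connects each output pin to a different input pin.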
Useful Properties for Implementing Block Ciphers
Shift & rotation
[Figure: a 32-bit word IN[31:0], split into bytes A[31:24], B[23:16], C[15:8], D[7:0], is mapped to OUT[31:0] by an 8-bit shift and by an 8-bit rotation]
• Cost-free operations
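A small software model of the two operations, assuming a left shift and a left rotation by 8 bits on a 32-bit word (the direction is not recoverable from the slide text, and the function names are ours):

```python
MASK32 = 0xFFFFFFFF

def shl8(word):
    """Shift left by 8 bits: byte A is dropped, zeros enter at the bottom."""
    return (word << 8) & MASK32

def rotl8(word):
    """Rotate left by 8 bits: byte A re-enters at the bottom."""
    return ((word << 8) | (word >> 24)) & MASK32

w = 0xAABBCCDD  # A=0xAA, B=0xBB, C=0xCC, D=0xDD
print(hex(shl8(w)), hex(rotl8(w)))
```

On an FPGA neither operation needs logic when the amount is fixed; the bytes are simply wired to new positions.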
Useful Properties for Implementing Block Ciphers
Iterative nature
[Figure: three ways to organize the rounds (1st Round, 2nd Round, ..., nth Round) with latches driven by CE and CLK between IN and Out:
Iterative: one round block with a Select input and an output latch, reused for every round
Pipeline: each round followed by its own latch
Sub-Pipeline: additional latches, using clocks CLK1 and CLK2]
Useful Properties for Implementing Block Ciphers
Parallelism
Three cycles (X, then Y, then Z):
X = a + b
Y = X + c
Z = Y + d
One cycle (X, Y, Z together):
X = a + b
Y = a + b + c
Z = a + b + c + d
• Lots of permutation operations. Is there any difficulty?
• Is substitution a problem?
Example for DES Implementation on FPGA
Permutations in Hardware (FPGA)
right <= ip(56) & ip(48) & ip(40) & ip(32) & ip(24) & ip(16) & ip(8)  & ip(0) &
         ip(58) & ip(50) & ip(42) & ip(34) & ip(26) & ip(18) & ip(10) & ip(2) &
         ip(60) & ip(52) & ip(44) & ip(36) & ip(28) & ip(20) & ip(12) & ip(4) &
         ip(62) & ip(54) & ip(46) & ip(38) & ip(30) & ip(22) & ip(14) & ip(6);
"&" is the concatenation operator; the bits of ip[63:0] are rewired into right.
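The tap indices in the assignment above follow a regular pattern: four groups starting at 56, 58, 60 and 62, each stepping down by 8. A short sketch that regenerates them (the names `RIGHT_TAPS` and `right_half` are ours):

```python
# Tap order used by the concatenation: 56, 48, ..., 0, then 58, 50, ..., 2, and so on.
RIGHT_TAPS = [start - 8 * k for start in (56, 58, 60, 62) for k in range(8)]

def right_half(ip_bits):
    """ip_bits is indexed like ip(0)..ip(63); returns the 32 bits in concatenation order."""
    return [ip_bits[i] for i in RIGHT_TAPS]

print(RIGHT_TAPS)
```

Feeding `list(range(64))` to `right_half` returns the tap list itself, which is a quick way to compare it with the VHDL line.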
Substitution in Hardware (FPGA)
S1: 64 x 4 = 256 bits
S2: 64 x 4 = 256 bits
S3: 64 x 4 = 256 bits
S4: 64 x 4 = 256 bits
S5: 64 x 4 = 256 bits
S6: 64 x 4 = 256 bits
S7: 64 x 4 = 256 bits
S8: 64 x 4 = 256 bits
Total: 2048 bits = 2K
• CLB slices in memory mode = 4 x 8 = 32 CLB slices
• Using selected BRAMs: Virtex series devices contain more than 280 BRAMs of 4K each
DES implementation in Hardware (FPGA)
Author | Device | CLB Slices | Allowed Freq. (MHz) | Throughput (Mbits/s)
Biham (software) 1997 | Alpha 8400 | - | 300 | 127
Wong et al 1998 | XC4020E | 438 | 10 | 26.7
Kaps and Paar 1998 | XC4028EX | 741 | 25.18 | 402.7
Free-DES 2000 | XCV400 | 5263 | 47.7 | 3052
McLoony 2003 | XCV1000 | 6446 | 59.5 | 3808
Sandia Laboratories 1999 | ASIC | - | - | 9280
Patterson 2000 (Jbits) | XCV150 | 1584 | 168 | 10752
This work | XCV312 | 165 | 68.05 | 274
Rijndael: Advanced Encryption Standard
• The Rijndael block cipher algorithm has been chosen by NIST as the Advanced Encryption Standard
– 128, 192 and 256 bit block-length
– When it is called AES, it means block length of 128 bits only
• FPGA AES implementations:
– Single encryptor: Dandalis, … , Elbirt, … & Gaj, … : 2000
– Full encryptor/decryptor: McLoone & McCanny 2001, CHES 2001 (3.2 Gbps)
AES Encryption Algorithm Flow
BS: Byte Substitution
SR: Shift Rows
MC: Mix Column
ARK: Add Round Key
[Figure: IN → ARK (USER KEY) → BS, SR, MC, ARK (SUB KEY), repeated (ROUND-1) times → BS, SR, ARK (SUB KEY) → OUT; the repetition count follows from the selection of rounds]
AES
Input = 128 bits = 16 bytes: b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0
Both plaintext and key are arranged into a 4 x 4 matrix
State Matrix:
b15 b14 b13 b12
b11 b10 b9  b8
b7  b6  b5  b4
b3  b2  b1  b0
Key Scheduling
The User-key is RoundKey 0; the Generated-keys are RoundKey 1, ..., RoundKey 10.
User-key (RoundKey 0):
k15 k14 k13 k12
k11 k10 k9  k8
k7  k6  k5  k4
k3  k2  k1  k0
Generated-keys, first matrix:
k31 k30 k29 k28
k27 k26 k25 k24
k23 k22 k21 k20
k19 k18 k17 k16
…………………………..
Generated-keys, last matrix:
k175 k174 k173 k172
k171 k170 k169 k168
k167 k166 k165 k164
k163 k162 k161 k160
1. Byte Substitution
[Figure: each byte a_{i,j} (i, j = 0..3) of the State Matrix passes through the 16x16 S-BOX and becomes b_{i,j}; round diagram: BS, SR, MC, ARK with SUB KEY]
Byte Substitution
[Figure: 16 BRAMs of 256 x 8 placed side by side between IN and OUT]
2. ShiftRow (SR)
Row r of the state is rotated by offset r:
a b c d   Offset 0   a b c d
e f g h   Offset 1   f g h e
i j k l   Offset 2   k l i j
m n o p   Offset 3   p m n o
[Figure: the diagram is shown once feeding MC and once feeding IMC; a further view draws the shift as a rewiring of bytes a to p between IN and OUT; round diagram: BS, SR, MC, ARK with SUB KEY]
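A minimal sketch of the row shift shown on the ShiftRow slide, assuming the state is held as four rows of four entries and that offset r means a left rotation of row r by r positions (function names are ours):

```python
def shift_rows(state):
    """Rotate row r left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state):
    """Rotate row r right by r positions, undoing shift_rows."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]

state = [list("abcd"), list("efgh"), list("ijkl"), list("mnop")]
for row in shift_rows(state):
    print("".join(row))
```

As with the permutations earlier in the deck, a fixed shift like this costs only routing in hardware.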
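3. MixColumn (MC) & Inv MixColumn (IMC)
Every entry is represented in GF(2^8); i = 0, 1, 2, 3 selects the state column.
MC:
\[
\begin{bmatrix} c'_{0,i} \\ c'_{1,i} \\ c'_{2,i} \\ c'_{3,i} \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} c_{0,i} \\ c_{1,i} \\ c_{2,i} \\ c_{3,i} \end{bmatrix}
\]
IMC:
\[
\begin{bmatrix} c'_{0,i} \\ c'_{1,i} \\ c'_{2,i} \\ c'_{3,i} \end{bmatrix}
=
\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}
\begin{bmatrix} c_{0,i} \\ c_{1,i} \\ c_{2,i} \\ c_{3,i} \end{bmatrix}
\]
[Round diagram: BS, SR, MC, ARK with SUB KEY]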
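The two matrix products can be checked with a direct, unoptimized software model. This is only a sketch: it assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), and the helper names `xtime`, `gmul` and `mat_vec` are ours.

```python
def xtime(x):
    """Multiply by 02 in GF(2^8), reducing by 0x11B."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def gmul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result

MC = [[0x02, 0x03, 0x01, 0x01],
      [0x01, 0x02, 0x03, 0x01],
      [0x01, 0x01, 0x02, 0x03],
      [0x03, 0x01, 0x01, 0x02]]

IMC = [[0x0E, 0x0B, 0x0D, 0x09],
       [0x09, 0x0E, 0x0B, 0x0D],
       [0x0D, 0x09, 0x0E, 0x0B],
       [0x0B, 0x0D, 0x09, 0x0E]]

def mat_vec(matrix, column):
    """Multiply a 4x4 matrix by one state column over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coef, byte in zip(row, column):
            acc ^= gmul(coef, byte)
        out.append(acc)
    return out

col = [0xDB, 0x13, 0x53, 0x45]
mixed = mat_vec(MC, col)
print([hex(b) for b in mixed], mat_vec(IMC, mixed) == col)
```

Applying IMC after MC returns the original column, which confirms that the two matrices are inverses of each other.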
4. AddRoundKey (ARK)
Each state byte a_{i,j} is XORed with the key byte k_{i,j}:
\[
b_{i,j} = a_{i,j} \oplus k_{i,j}, \qquad i, j = 0, 1, 2, 3
\]
[Round diagram: BS, SR, MC, ARK with SUB KEY]
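AddRoundKey is a byte-wise XOR of two 4 x 4 matrices. A minimal sketch, assuming both are stored as lists of rows (the sample byte values are illustrative only):

```python
def add_round_key(state, key):
    """XOR every state byte with the key byte in the same position."""
    return [[a ^ k for a, k in zip(state_row, key_row)]
            for state_row, key_row in zip(state, key)]

state = [[0x32, 0x88, 0x31, 0xE0]] * 4   # illustrative values
key = [[0x2B, 0x28, 0xAB, 0x09]] * 4     # illustrative values
print(add_round_key(state, key)[0])
```

Because XOR is its own inverse, applying the same key a second time restores the state, so the decryption path can reuse the same circuit.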
Novel techniques for implementing AES round transformation
Steps
• Key schedule
• S-Box & Inv. S-Box
• MC & Inv. MC
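Key Schedule
Current round key, arranged column by column:
k0 k4 k8  k12
k1 k5 k9  k13
k2 k6 k10 k14
k3 k7 k11 k15
The next round key k'0 ... k'15 has the same layout.
Four-step version (each step waits for the previous one):
\[
\begin{aligned}
\text{Step 1:}\quad k'_0 &= k_0 \oplus \mathrm{Sbox}(k_{13}) \oplus \mathrm{rcon} \\
\text{Step 2:}\quad k'_4 &= k_4 \oplus k'_0 \\
\text{Step 3:}\quad k'_8 &= k_8 \oplus k'_4 \\
\text{Step 4:}\quad k'_{12} &= k_{12} \oplus k'_8
\end{aligned}
\]
Two-step version (the three terms of Step 2 are computed in parallel):
\[
\begin{aligned}
\text{Step 1:}\quad k'_0 &= k_0 \oplus \mathrm{Sbox}(k_{13}) \oplus \mathrm{rcon} \\
\text{Step 2:}\quad k'_4 &= k_4 \oplus k'_0, \\
k'_8 &= k_8 \oplus k_4 \oplus k'_0, \\
k'_{12} &= k_{12} \oplus k_8 \oplus k_4 \oplus k'_0
\end{aligned}
\]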
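The sketch below models both versions of the key schedule for AES-128 and checks that they produce the same round keys. It works on whole 4-byte columns rather than the single row shown on the slide, builds the S-box from its definition so that the block is self-contained, and assumes the standard AES polynomial 0x11B. All function names are ours.

```python
def xtime(x):
    """Multiply by 02 in GF(2^8), reducing by 0x11B."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def gmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r

def build_sbox():
    """S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv ^ 0x63
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s)
    return sbox

SBOX = build_sbox()

def xor_words(*words):
    out = [0, 0, 0, 0]
    for w in words:
        out = [o ^ b for o, b in zip(out, w)]
    return out

def g(word, rcon):
    """Rotate the last column, substitute its bytes, add rcon to the first byte."""
    t = [SBOX[b] for b in word[1:] + word[:1]]
    t[0] ^= rcon
    return t

def next_key_serial(w, rcon):
    """Four dependent steps: every new column needs the previous new column."""
    n0 = xor_words(w[0], g(w[3], rcon))
    n1 = xor_words(w[1], n0)
    n2 = xor_words(w[2], n1)
    n3 = xor_words(w[3], n2)
    return [n0, n1, n2, n3]

def next_key_parallel(w, rcon):
    """Two steps: after n0, the rest depends only on old columns and n0."""
    n0 = xor_words(w[0], g(w[3], rcon))
    return [n0,
            xor_words(w[1], n0),
            xor_words(w[2], w[1], n0),
            xor_words(w[3], w[2], w[1], n0)]

def expand_key(key, step=next_key_parallel):
    """Return the 11 round keys (RoundKey 0..10) of a 16-byte user key."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    keys, rcon = [w], 1
    for _ in range(10):
        w = step(w, rcon)
        keys.append(w)
        rcon = xtime(rcon)
    return [bytes(sum(k, [])) for k in keys]

user_key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(expand_key(user_key)[1].hex())
```

The parallel form uses a few more XOR gates but shortens the dependency chain from four steps to two, which is the trade the slide describes.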
Byte Substitution (BS)
• Look-up table method
• Composite Field approach
[Figure, in GF(2^8): S-BOX = MI followed by AF; INV S-BOX = IAF followed by MI. With an E/D select on IN, a single MI block is shared by both paths]
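A sketch of the decomposition in the figure, with MI as the multiplicative inverse in GF(2^8), AF as the affine transform and IAF as its inverse. The affine constants are the standard AES ones, the inverse is found by plain search for clarity, and every function name here is ours:

```python
def gmul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

# MI table, shared by the S-box and the inverse S-box; 0 maps to 0.
MI = [next((y for y in range(1, 256) if gmul(x, y) == 1), 0) for x in range(256)]

def rotl(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def af(x):
    """Affine transform used after MI when encrypting."""
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63

def iaf(x):
    """Inverse affine transform used before MI when decrypting."""
    return rotl(x, 1) ^ rotl(x, 3) ^ rotl(x, 6) ^ 0x05

def sbox(x):
    return af(MI[x])

def inv_sbox(x):
    return MI[iaf(x)]

print(hex(sbox(0x53)), hex(inv_sbox(sbox(0x53))))
```

Only the `MI` table needs memory; `af` and `iaf` are XOR networks, which is why sharing MI between encryption and decryption saves storage.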
Byte Substitution (BS) (MI manipulation)
• Look-up table method
Two methods to construct the S-Box using the look-up table method:
1. Using distributed memory
2. Using built-in memories called BRAMs
• Composite Field GF(((2^2)^2)^2) (S. Morioka and A. Satoh, CHES 2002)
1. Map the element A ∈ GF(2^8) to a composite field F
2. Compute the Multiplicative Inverse over the field F
3. Map back from field F to GF(2^8)
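MixColumn (MC), in GF(2^8)
\[
\begin{bmatrix} a'[0] \\ a'[1] \\ a'[2] \\ a'[3] \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} a[0] \\ a[1] \\ a[2] \\ a[3] \end{bmatrix}
=
\begin{bmatrix}
02\,a[0] \oplus 03\,a[1] \oplus 01\,a[2] \oplus 01\,a[3] \\
01\,a[0] \oplus 02\,a[1] \oplus 03\,a[2] \oplus 01\,a[3] \\
01\,a[0] \oplus 01\,a[1] \oplus 02\,a[2] \oplus 03\,a[3] \\
03\,a[0] \oplus 01\,a[1] \oplus 01\,a[2] \oplus 02\,a[3]
\end{bmatrix}
\]
With xtime(v) = 02 × v:
\[
\begin{aligned}
t &= a[0] \oplus a[1] \oplus a[2] \oplus a[3] \\
v &= a[0] \oplus a[1], \quad v = \mathrm{xtime}(v), \quad a'[0] = a[0] \oplus v \oplus t \\
v &= a[1] \oplus a[2], \quad v = \mathrm{xtime}(v), \quad a'[1] = a[1] \oplus v \oplus t \\
v &= a[2] \oplus a[3], \quad v = \mathrm{xtime}(v), \quad a'[2] = a[2] \oplus v \oplus t \\
v &= a[3] \oplus a[0], \quad v = \mathrm{xtime}(v), \quad a'[3] = a[3] \oplus v \oplus t
\end{aligned}
\]
For the first row this expands to
\[
a'[0] = a[0] \oplus 02\,(a[0] \oplus a[1]) \oplus (a[0] \oplus a[1] \oplus a[2] \oplus a[3])
\]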
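The t/v formulation can be written down directly. This is a sketch, assuming the AES polynomial 0x11B for `xtime`; the function name `mix_column_tv` is ours.

```python
def xtime(x):
    """Multiply by 02 in GF(2^8), reducing by 0x11B."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def mix_column_tv(a):
    """MixColumn of one column using one shared sum t and one xtime per output byte."""
    t = a[0] ^ a[1] ^ a[2] ^ a[3]
    out = []
    for i in range(4):
        v = xtime(a[i] ^ a[(i + 1) % 4])
        out.append(a[i] ^ v ^ t)
    return out

print([hex(b) for b in mix_column_tv([0xDB, 0x13, 0x53, 0x45])])
```

Each output byte costs one `xtime` and a handful of XORs, with no general multiplier.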
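MixColumn (MC)
Alternative form with xt_i = xtime(a[i]) = 02 × a[i]:
\[
\begin{aligned}
v &= a[1] \oplus a[2] \oplus a[3], \quad xt_0 = \mathrm{xtime}(a[0]), \quad a'[0] = v \oplus xt_0 \oplus xt_1 \\
v &= a[0] \oplus a[2] \oplus a[3], \quad xt_1 = \mathrm{xtime}(a[1]), \quad a'[1] = v \oplus xt_1 \oplus xt_2 \\
v &= a[0] \oplus a[1] \oplus a[3], \quad xt_2 = \mathrm{xtime}(a[2]), \quad a'[2] = v \oplus xt_2 \oplus xt_3 \\
v &= a[0] \oplus a[1] \oplus a[2], \quad xt_3 = \mathrm{xtime}(a[3]), \quad a'[3] = v \oplus xt_3 \oplus xt_0
\end{aligned}
\]
With the round key k added in the same step (Key):
\[
\begin{aligned}
a'[0] &= k[0] \oplus v \oplus xt_0 \oplus xt_1 \\
a'[1] &= k[1] \oplus v \oplus xt_1 \oplus xt_2 \\
a'[2] &= k[2] \oplus v \oplus xt_2 \oplus xt_3 \\
a'[3] &= k[3] \oplus v \oplus xt_3 \oplus xt_0
\end{aligned}
\]
where v is the value defined for the same row above.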
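A sketch of the merged MixColumn and AddRoundKey step, assuming the AES polynomial 0x11B. The function name `mix_column_add_key` is ours, and the key bytes in the example are illustrative values only.

```python
def xtime(x):
    """Multiply by 02 in GF(2^8), reducing by 0x11B."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def mix_column_add_key(a, k):
    """Compute MixColumn of column a and XOR the key column k in one pass."""
    xt = [xtime(b) for b in a]
    total = a[0] ^ a[1] ^ a[2] ^ a[3]
    out = []
    for i in range(4):
        v = total ^ a[i]  # XOR of the three bytes other than a[i]
        out.append(k[i] ^ v ^ xt[i] ^ xt[(i + 1) % 4])
    return out

column = [0xDB, 0x13, 0x53, 0x45]
key = [0x10, 0x20, 0x30, 0x40]  # illustrative key bytes
print([hex(b) for b in mix_column_add_key(column, key)])
```

With an all-zero key the function reduces to plain MixColumn, which makes it easy to compare against the matrix definition.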
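Inv MixColumn (IMC)
\[
\begin{bmatrix} a'[0] \\ a'[1] \\ a'[2] \\ a'[3] \end{bmatrix}
=
\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}
\begin{bmatrix} a[0] \\ a[1] \\ a[2] \\ a[3] \end{bmatrix}
=
\begin{bmatrix}
0E\,a[0] \oplus 0B\,a[1] \oplus 0D\,a[2] \oplus 09\,a[3] \\
09\,a[0] \oplus 0E\,a[1] \oplus 0B\,a[2] \oplus 0D\,a[3] \\
0D\,a[0] \oplus 09\,a[1] \oplus 0E\,a[2] \oplus 0B\,a[3] \\
0B\,a[0] \oplus 0D\,a[1] \oplus 09\,a[2] \oplus 0E\,a[3]
\end{bmatrix}
\]
The larger coefficients need chained xtime operations:
\[
\begin{aligned}
02 \cdot x &= \mathrm{xtime}(x) \\
03 \cdot x &= \mathrm{xtime}(x) \oplus x \\
0E \cdot x &= \underbrace{\mathrm{xtime}(\mathrm{xtime}(\mathrm{xtime}(x)))}_{08(x)} \oplus \underbrace{\mathrm{xtime}(\mathrm{xtime}(x))}_{04(x)} \oplus \underbrace{\mathrm{xtime}(x)}_{02(x)}
\end{aligned}
\]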
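The same chaining covers all four IMC coefficients. A sketch assuming the AES polynomial 0x11B; only the 0E decomposition appears on the slide, and the ones for 09, 0B and 0D are the analogous expansions of their binary forms. Function names are ours.

```python
def xtime(x):
    """Multiply by 02 in GF(2^8), reducing by 0x11B."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def imc_products(x):
    """Return (09*x, 0B*x, 0D*x, 0E*x) from three chained xtime calls."""
    x2 = xtime(x)
    x4 = xtime(x2)
    x8 = xtime(x4)
    return (x8 ^ x,            # 09 = 08 + 01
            x8 ^ x2 ^ x,       # 0B = 08 + 02 + 01
            x8 ^ x4 ^ x,       # 0D = 08 + 04 + 01
            x8 ^ x4 ^ x2)      # 0E = 08 + 04 + 02

def inv_mix_column(a):
    p = [imc_products(b) for b in a]  # p[i] = (09, 0B, 0D, 0E) times a[i]
    return [p[i][3] ^ p[(i + 1) % 4][1] ^ p[(i + 2) % 4][2] ^ p[(i + 3) % 4][0]
            for i in range(4)]

print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
```

Three levels of `xtime` per byte is what makes IMC noticeably deeper than MC, where a single level suffices.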
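Inv MixColumn (IMC)
Now compare MC & IMC:
\[
\underbrace{\begin{bmatrix} 0E & 0B & 0D & 09 \\ 09 & 0E & 0B & 0D \\ 0D & 09 & 0E & 0B \\ 0B & 0D & 09 & 0E \end{bmatrix}}_{\text{IMC}}
=
\underbrace{\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}}_{\text{MC}\ (1)}
\times
\underbrace{\begin{bmatrix} 05 & 00 & 04 & 00 \\ 00 & 05 & 00 & 04 \\ 04 & 00 & 05 & 00 \\ 00 & 04 & 00 & 05 \end{bmatrix}}_{(2)}
\]
We observe that
\[
05 \cdot x = \mathrm{xtime}(\mathrm{xtime}(x)) \oplus x
\]
• The biggest coefficient in Eq. 2 is 05
• Eq. 1 is already available (MC); the Eq. 2 calculation can be made before Eq. 1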
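The factorization can be checked in software by applying the 05/04 matrix first and then reusing MixColumn. A sketch assuming the AES polynomial 0x11B; the names `pre_transform`, `mix_column` and `inv_mix_column` are ours.

```python
def xtime(x):
    """Multiply by 02 in GF(2^8), reducing by 0x11B."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def pre_transform(a):
    """Multiply a column by the matrix of Eq. 2: out[i] = 05*a[i] xor 04*a[i+2]."""
    x4 = [xtime(xtime(b)) for b in a]
    return [x4[i] ^ a[i] ^ x4[(i + 2) % 4] for i in range(4)]

def mix_column(a):
    """Plain MixColumn (Eq. 1)."""
    t = a[0] ^ a[1] ^ a[2] ^ a[3]
    return [a[i] ^ xtime(a[i] ^ a[(i + 1) % 4]) ^ t for i in range(4)]

def inv_mix_column(a):
    """IMC computed as MC applied to the pre-transformed column."""
    return mix_column(pre_transform(a))

print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
```

The decryption path therefore only adds the small pre-transform in front of the MC circuit that encryption already needs.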
Implementing AES on FPGAs
Architecture 1: Encryptor core
• Sequential approach
Architecture 2: Encryptor core
• Pipeline approach
Architecture 3: Encryptor/decryptor core
• MC/IMC modified approach
Architecture 4: Encryptor/decryptor core
• Using look-up table method
Architecture 5: Encryptor/decryptor core
• Using composite field approach
AES Implementation Strategies
The commonly used architectures are:
• Iterative looping: one round repeated n times
• Loop unrolling: n rounds (round 1, round 2, ..., round n)
• Inner-round pipelining: one round split into Stage 1 ... Stage k, with Register 1, Register 2, ..., Register k between the stages
Architecture 1: Sequential Approach
[Figure: key path USER KEY → KGEN → LATCH → ROUND KEY, with inputs S, RCON and CLK; data path PLAIN TEXT → RND 0 (USER-KEY) → RND 1-9 with LATCH (inputs S, ROUND-KEY, CLK) → RND 10 (ROUND-KEY) → CIPHER TEXT]
Architecture 2: Pipelined Approach
[Figure: data path IN → IN REG → RND 0 → RND 1 → ... → RND 10 → OUT; key path USER-KEY → IN REG → a chain of KGEN blocks delivering RK 0, RK 1, ..., RK 10 to the matching rounds]
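Architecture 3: Encryption/Decryption
Encryption: MI + AF + SR + MC + ARK
Decryption: ISR + IAF + MI + ModM + MC + ARK
[Figure, modified data path: IN → E/D select between ISR, IAF (DEC) and the direct path (ENC) → MI → E/D select between AF, SR (ENC) and ModM (DEC) → MC → ARK → OUT]
[Figure, data path with separate inverse blocks: IN → ISR, IAF (DEC) → MI → AF, SR (ENC) → E/D select between MC, ARK (ENC) and IMC, IARK (DEC) → OUT]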
Architecture 4: Encryptor/decryptor core using look-up table method
• Same S-Box (MI) for encryption/decryption
• Memory requirements become half
• BRAMs are used for storing MI values
• No initial time to prepare them
[Figure: IN → ISR, IAF (decryption) → MI → AF, SR (encryption) → E/D select → MC, ARK or IMC, IARK → E/D select → OUT]
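Architecture 5: Encryptor/decryptor core using composite field for MI
MI manipulation:
• 1st transformation (M): GF(2^8) to composite field F
• MI in field F
• 2nd transformation (M^{-1}): field F to GF(2^8)
Let A ∈ F and A = A_H y + A_L, then it can be shown that:
\[
\begin{aligned}
A^{16} &= A_H\,y + (A_H + A_L) \\
A^{17} &= A \times A^{16} = 0 \cdot y + \left(\lambda A_H^2 + (A_H + A_L)\,A_L\right) \\
A^{-1} &= \left(A^{17}\right)^{-1} A^{16}
\end{aligned}
\]
[Figure: the 8-bit input A is converted from GF(2^8) to GF(2^4) and split into 4-bit halves A_H and A_L; a squarer (X^2), a multiplication by λ and 4x4 multipliers (Mul 4x4) form A^17; an inverter (X^-1) and two more Mul 4x4 blocks combine it with A^16; the result is converted from GF(2^4) back to GF(2^8) as the 8-bit output A^-1]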
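The inversion formula can be exercised on its own, without the mapping to and from GF(2^8). The sketch below makes several assumptions that are not in the slides: GF(2^4) is built with x^4 + x + 1, the extension uses y^2 + y + λ, and λ is simply the first constant that makes that polynomial irreducible. It is a check of the algebra, not the field or constant used by the authors.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def gf16_inv(a):
    """Inverse in GF(2^4) by search; 0 maps to 0."""
    return next((x for x in range(1, 16) if gf16_mul(a, x) == 1), 0)

# Assumed constant: first lam for which y^2 + y + lam has no root in GF(2^4).
LAMBDA = next(lam for lam in range(16)
              if all(gf16_mul(y, y) ^ y ^ lam for y in range(16)))

def mul(a, b):
    """(aH y + aL)(bH y + bL) with y^2 = y + LAMBDA; elements are (H, L) pairs."""
    (ah, al), (bh, bl) = a, b
    hh = gf16_mul(ah, bh)
    return (hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh),
            gf16_mul(al, bl) ^ gf16_mul(LAMBDA, hh))

def inverse(a):
    """A^-1 = (A^17)^-1 * A^16, where A^17 lies in GF(2^4)."""
    ah, al = a
    a17 = gf16_mul(LAMBDA, gf16_mul(ah, ah)) ^ gf16_mul(ah ^ al, al)
    d = gf16_inv(a17)
    return (gf16_mul(d, ah), gf16_mul(d, ah ^ al))  # d * A^16

elem = (0x3, 0xA)
print(inverse(elem), mul(elem, inverse(elem)))
```

The only inversion left is in GF(2^4), which is small enough for a 16-entry table or a few gates; that is where the memory saving over a 256-entry table comes from.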
Metrics to measure performance
1. Throughput = (Clock frequency x No. of bits) / No. of rounds
2. Area: CLB slices, BRAMs, etc.
3. Ratio = Throughput / Area
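The metrics are simple ratios. In the sketch below the 100 MHz clock and 10 rounds are hypothetical inputs chosen only to show the throughput formula; the throughput and slice figures used for the ratio are the ones quoted for Gaj et al [1] and Dandalis et al [2] in the results of this deck. Function names are ours.

```python
def throughput_mbps(freq_mhz, block_bits, rounds):
    """Throughput in Mbits/s for a design that needs `rounds` clock cycles per block."""
    return freq_mhz * block_bits / rounds

def throughput_per_area(throughput, slices):
    """Ratio of throughput (Mbits/s) to area (CLB slices)."""
    return throughput / slices

print(throughput_mbps(100, 128, 10))                 # hypothetical clock and round count
print(round(throughput_per_area(331.5, 2902), 2))    # Gaj et al [1]
print(round(throughput_per_area(353, 5673), 2))      # Dandalis et al [2]
```

A fully pipelined design delivers one block per clock cycle, so the divisor effectively becomes 1 and throughput scales with the clock frequency alone.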
Architecture 1: AES encryptor core using sequential approach
Design | Device (XCV) | Area (CLB slices) | Throughput (Mbits/s) | Throughput/Area
Gaj et al [1] | 1000 | 2902 | 331.5 | 0.11
Dandalis et al [2] | 1000 | 5673 | 353 | 0.06
Nazar et al | 812 | 2744 | 258.5 | 0.09
Compared with [1] and [2]: 5% and 51% less area, 22% and 26% lower throughput.
Architecture 2: AES encryptor core using pipeline approach
Design | Device (XCV) | Area (CLB slices) | Throughput (Mbits/s) | Throughput/Area
Elbirt et al [3] | 1000 | 9004 | 1940 | 0.22
Nazar et al | 2600 | 2136 | 2868 | 1.29
Compared with [3]: 76% less area, 47% higher throughput.
Architecture 3: AES encryptor/decryptor core using MC/IMC modified approach
Design | Device | BRAMs | CLB Slices (S) | Throughput (Mbits/s) (T) | T/S
This design | XCV2600E | 80 | 5677 | 4121 | 0.73
McLoone et al | XCV3200E | 102 | 7576 | 3239 | 0.43
• Two approaches for MC/IMC
• Fewer BRAMs
• Fewer slices
• Highest throughput reported to date
Improvement over McLoone et al: 27.03% (throughput), 25.06% (slices)
Architecture 4 & 5: AES encryptor/decryptor core using MI look-up table and composite field approach
Design | Device | BRAMs | CLB Slices (S) | Throughput (Mbits/s) (T) | T/S
E/D GF(2^4) | XCV2600E | No BRAMs | 13416 | 3136 | 0.24
E/D GF(2^8) | XCV2600E | 80 | 6676 | 3840 | 0.58
McLoone | XCV3200E | 102 | 7576 | 3239 | 0.43
• Two approaches for MI
• Key scheduling included
• No initial delay
• The first design uses a look-up table for MI: fast, but with high memory requirements
• The second design uses the composite field approach for MI: slower, with lower memory requirements
• Both are efficient compared with the reported design
11%, 77%   25%, 3%
Related Publications
1. Nazar A. Saqib, Francisco Rodriguez-Henriquez, and Arturo Diaz-Perez, “Sequential and pipelined architectures for AES implementation,” Proceedings of IASTED International Conference Computer Science and Technology, pp. 159-163, May 19-21, 2003, Cancun, Mexico.
2. F. Rodriguez-Henriquez, N.A. Saqib, and A. Diaz-Perez, “4.2 Gbit/s single-chip FPGA implementation of AES algorithm,” Electronics Letters, Vol. 39, No. 15, July 24, 2003.
3. Nazar A. Saqib, Francisco Rodriguez-Henriquez, and Arturo Diaz-Perez, “Two Approaches for a Single-Chip FPGA Implementation of an Encryptor/Decryptor AES Core,” FPL 2003, Lecture Notes in Computer Science 2778, pp. 303-312, 2003 (FPL 2003, Sep 1-3, Lisbon, Portugal).
4. Nazar A. Saqib, Francisco Rodriguez-Henriquez, and Arturo Diaz-Perez, “AES Algorithm Implementation-An efficient approach for Sequential and Pipeline architectures,” Fourth Mexican International Conference on Computer Science, ENC’03, pp. 126-130, Sep. 8-12, 2003, Tlaxcala, Mexico.
5. Nazar A. Saqib, Arturo Diaz-Perez, and Francisco Rodriguez-Henriquez, “Highly Optimized Single-Chip FPGA Implementations of AES Encryption and Decryption Cores,” accepted for Iberchip 2004.
Conclusions
• A promising AES encryptor/decryptor core (contributions for AES S-Box/Inv S-Box)
– Using look-up table for S-Box
– Using composite fields GF(2^4)
• An optimized AES encryptor/decryptor core (contributions for AES MC/IMC)
– Using modified version for IMC
• A sequential and pipeline encryptor core (tradeoff between speed and area)
• Future work: completion of ECC scalar multiplication; thesis writing and defense