Lightweight TBC-Based Modes for Small Hardware Implementations
Mustafa Khairallah¹
NTU, Singapore
December 15, 2019
¹ Joint work with Tetsu Iwata, Kazuhiko Minematsu and Thomas Peyrin
Overview
Lightweight Cryptography Design Space
Our Goals
State of the art
Romulus-N
Misuse Resistance
FSM Optimization
Results
What’s next for Romulus?
Lightweight Cryptography Design Space
[Figure: design-space axes: Security, Area, Throughput, Energy, Power.]
What went wrong?
1. Broken
2. Too slow
3. Too much energy
4. Too complex
5. Too big?!
Security Goals
1. Urgent: Provable 128-bit security in the standard TBC model.
2. Urgent: Easy to mask for side channel protection.
3. Optional: Misuse resistance.
Area Goals
1. Urgent: No extra state beyond the TBC.
2. Urgent: No feed-forward.
3. Urgent: No key/nonce storage.
4. Urgent: Around 6000 GEs for round based implementations.
5. Urgent: Below 10,000 GEs for threshold round based implementations.
6. Urgent: Strictly inverse free.
7. Urgent: Decryption is almost free.
8. Important: Minimal use of multiplexers.
9. Important: Around 3000 GEs for byte serial implementations.
10. Important: A wide variety of trade-offs.
Performance Goals
1. Urgent: Fast for short messages and authentication.
2. Important: Fast enough (≫ 1 Gbps on modern ASIC technologies) even with first order masking.
3. Important: Wide performance range.
4. Important: Competitive with AES-based designs and the CAESAR lightweight portfolio (namely, Ascon).
Where to look?
▶ ΘCB3 has great features:
  ▶ Standard TPRP assumption.
  ▶ The security bound is independent of the length.
  ▶ Parallelizable.
  ▶ Not inverse free, needs n extra flip flops, needs an extra call.
▶ ΘCB3 needs n extra flip flops at least.
Where to look?
1. iCOFB is an interesting starting point: lightweight.
2. ZAE has a higher rate for authentication compared to encryption.
Romulus-N
[Figure: Romulus-N mode diagram. Associated-data phase: the state S starts from 0^n and absorbs A[1], A[2], ..., A[a−1], pad(A[a]) through ρ, with TBC calls E_K^{8,1}, ..., E_K^{8,a−2}, E_K^{ω_A,a}. Message phase: the nonce N and the blocks M[1], ..., pad(M[m]) pass through ρ to give C[1], ..., C[m] (with lsb truncation) and the tag T, with TBC calls E_K^{4,1}, ..., E_K^{ω_M,a−2}. Wire widths are n and t; ω_A ∈ {24, 26}, ω_M ∈ {20, 21}.]
Rx Architecture
[Figure: Rx architecture block diagram. Labels: state, Skinny, lt, input, output.]
Sx Architecture
[Figure: Sx architecture block diagram. A 4×4 array of state bytes S0 to Sf feeds an SBox, with RC, RTK and ρ blocks. Signals: input, len, output, and the constant 0x00.]
Why Serial?
Parallelization is not that efficient in hardware.
Table: Synthesis results of the Deoxys-I-128 implementation using TSMC 65nm technology with x4 parallelization

| Impl. | Area (KGE) | Max. Freq. (MHz) | Throughput (Mbps) | Efficiency (Mbps/KGE) |
|---|---|---|---|---|
| [KCP17] | 59.53 | 847 | 7,227 | 121.40 |
| ATHENa | 53.37 | 549 | 4,684 | 87.76 |
Why Serial?
▶ PFB: designed independently by Naito and Sugawara [NS19] around the same time as our work.
▶ It has partial parallelization for encryption only.
▶ Doesn’t work for authentication and decryption.
▶ Makes the feedback function and padding complicated.
Why Skinny?
▶ Cheap.
▶ Wide variety of trade-offs.
▶ Easy to mask.
▶ Well analyzed with large security margin.
▶ Large tweakey space.
▶ Designed and supported by a strong team of researchers providing different implementations.
Skinny SBox
[Figure: Skinny S-box circuit, with input and output bits ordered from MSB to LSB.]
Skinny Tweakey Schedule: Where the magic happens
Utilize the fully linear tweakey scheduling, which is mostly routing and renaming bytes:
▶ Reverse the tweakey schedule at the end of every TBC call, instead of keeping the input.
▶ Very low area: only 67 XOR gates, including both key correction and block counter.
▶ If we were to maintain the tweakey state (due to modes/TBC), we would need at least 320 FFs.
Performance Trade-offs
The lightweight core is suitable to unroll, which gives an excellent trade-off:
▶ Speeding up ×2 by two-round unrolling: ≈ +1,000 GEs, +20% of total area.
Why LFSR Counter?
▶ Counters are the bottleneck of TBC-based designs.
▶ A 50-bit arithmetic counter costs ∼600 GEs, with depth = 100.
▶ An LFSR counter costs 7 to 15 GEs.
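To make the cost argument concrete, here is a minimal sketch of an LFSR used as a block counter. It is a toy 8-bit register with the polynomial x^8 + x^4 + x^3 + x^2 + 1; both the width and the polynomial are assumptions chosen for illustration and are not taken from the slides. Each step is a shift plus a conditional XOR into three tap positions, with no carry chain, yet the register still visits every nonzero value before repeating.

```python
# Toy Galois LFSR used as a block counter.
# WIDTH and POLY are illustrative assumptions, not the Romulus parameters.
WIDTH = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (primitive over GF(2))


def lfsr_step(state: int) -> int:
    """Advance the counter: multiply the state by x modulo POLY."""
    state <<= 1
    if state >> WIDTH:   # the bit shifted out of the register...
        state ^= POLY    # ...is folded back into the tap positions
    return state


def lfsr_period(start: int = 1) -> int:
    """Count the steps until the register returns to its starting value."""
    state, steps = lfsr_step(start), 1
    while state != start:
        state, steps = lfsr_step(state), steps + 1
    return steps


if __name__ == "__main__":
    print(lfsr_period())
```

In hardware terms the update logic of this toy register is three XOR gates (bits 2, 3 and 4), while bit 0 is a plain wire from the shifted-out bit; an arithmetic counter of the same width needs an adder with carry propagation.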
Why ρ?
Possible feedback types:
▶ Plaintext Feedback.
▶ Combined Feedback.
▶ Hybrid Feedback.
Is it all about the gate count?
Hybrid Feedback (e.g. HyENA, mixFeed): n XORs.
[Figure: hybrid feedback example. Message blocks M0 to M3 and ciphertext blocks C0 to C3 are chained through E_K calls, with δM and the tag T.]
128 XORs. 32 XORs and 32 MUXes for a 32-bit bus.
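Is it all about the gate count?
▶ COFB Feedback: 288 XORs
▶ In order to serialize, we need 32 Flip Flops and 32 MUXes.
[Figure: ρ block diagrams with state S, input X, output Y, next state S′ and the map G.]
G on the four state words, as listed:
1. S0 = S1
2. S1 = S2
3. S2 = S3
4. S3 = S3 ⊕ S0
The same update with a temporary register T:
1. T ← S0, S0 ← S1
2. S1 = S2
3. S2 = S3
4. S3 = S3 ⊕ T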
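The need for the extra 32 flip-flops can be seen by running the listed steps one word at a time. The sketch below is an illustration only: it models the four 32-bit words as Python integers, and the function names are hypothetical. It compares the intended simultaneous update with a naive in-place execution and with the version that first saves S0 in a temporary register T.

```python
def g_parallel(s):
    """Intended result: every word is computed from the old state at once."""
    s0, s1, s2, s3 = s
    return [s1, s2, s3, s3 ^ s0]


def g_in_place_naive(s):
    """Steps 1 to 4 of the first list, executed one after another in place."""
    s = list(s)
    s[0] = s[1]
    s[1] = s[2]
    s[2] = s[3]
    s[3] = s[3] ^ s[0]  # s[0] already holds the old S1, so this is wrong
    return s


def g_in_place_with_temp(s):
    """Steps 1 to 4 of the second list: the old S0 is kept in T."""
    s = list(s)
    t = s[0]            # the extra 32-bit register
    s[0] = s[1]
    s[1] = s[2]
    s[2] = s[3]
    s[3] = s[3] ^ t
    return s


if __name__ == "__main__":
    state = [0x11111111, 0x22222222, 0x44444444, 0x88888888]
    print([hex(w) for w in g_parallel(state)])
    print([hex(w) for w in g_in_place_naive(state)])
    print([hex(w) for w in g_in_place_with_temp(state)])
```

Only the variant with T reproduces the simultaneous update, which is where the additional flip-flops and multiplexers of a word-serial datapath come from.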
G goal: 0 XORs
▶ Impossible!!
▶ What about 1? Possible, yet needs extra flip flops and MUXes to serialize.
▶ What about 1 XOR for byte serial, 4 for a 32-bit bus, 16 in total? Romulus Feedback.
ρ Feedback Function
Simple operation defined over bytes
▶ Each input byte affects one and only one byte.
▶ Rotation happens within the same byte.
▶ Computation is on the fly even for 8-bit buses.
Choice of G
1. S0 = G4(S0)
2. S1 = G4(S1)
3. S2 = G4(S2)
4. S3 = G4(S3)
[Figure: ρ block diagrams with state S, input X, output Y, next state S′ and the map G.]
$$G_4 = \begin{pmatrix} G_s & 0 & 0 & 0 \\ 0 & G_s & 0 & 0 \\ 0 & 0 & G_s & 0 \\ 0 & 0 & 0 & G_s \end{pmatrix}$$
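A byte-wise feedback of this kind can be sketched in a few lines. The slides do not spell out G_s or ρ, so the definitions below are assumptions taken from the public Romulus specification: G_s maps the byte (x7, ..., x0) to (x0 ⊕ x7, x7, x6, ..., x1), and ρ(S, M) outputs C = M ⊕ G(S) with next state S′ = S ⊕ M. The function names are hypothetical.

```python
def g_s(x: int) -> int:
    """Assumed byte map G_s: (x7..x0) -> (x0 ^ x7, x7, x6, ..., x1)."""
    top = ((x >> 7) ^ x) & 1          # the single XOR per byte
    return (x >> 1) | (top << 7)      # the rest is a shift inside the byte


def g(state: bytes) -> bytes:
    """Apply G_s to every state byte independently (block-diagonal G)."""
    return bytes(g_s(b) for b in state)


def rho(state: bytes, m: bytes):
    """Encryption direction: C = M xor G(S), S' = S xor M."""
    c = bytes(mi ^ gi for mi, gi in zip(m, g(state)))
    next_state = bytes(si ^ mi for si, mi in zip(state, m))
    return next_state, c


def rho_inv(state: bytes, c: bytes):
    """Decryption direction: M = C xor G(S), S' = S xor M."""
    m = bytes(ci ^ gi for ci, gi in zip(c, g(state)))
    next_state = bytes(si ^ mi for si, mi in zip(state, m))
    return next_state, m


if __name__ == "__main__":
    s = bytes(range(16))
    m = bytes(range(100, 116))
    s_enc, c = rho(s, m)
    s_dec, m_back = rho_inv(s, c)
    print(m_back == m, s_dec == s_enc)
```

Because each output byte depends only on the matching state and message bytes, the same logic serves an 8-bit, 32-bit or 128-bit bus: one XOR for G_s per byte on the bus, which matches the 1 / 4 / 16 XOR counts quoted on the previous slide. Decryption reuses the same G, so no inverse map is needed.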
Romulus M-variants
[Figure: Romulus-M structure. A Romulus-AD pass with inputs labeled A, M and N produces the tag T; a Romulus-ENC pass with inputs labeled M and N produces the ciphertext C.]
▶ (Fully) Nonce-misuse-resistance via SIV [RS06].
▶ Shares most Romulus-N components (easy to implement both using the same circuit).
▶ Rate of only 1.5 TBC calls per block (3 TBC calls to process two blocks).
Last pieces of the puzzle
▶ Cryptographers often neglect the cost of the control logic and padding, which is drastic for LWC.
▶ Kumar et al. [KHK+17] showed that a naive/reference implementation of the CAESAR Hardware API requires 4 KGE, almost as much as the lightweight scheme itself.
▶ Our goals for the FSM: simple logic, simple padding, low area and low power.
Padding.
For $X \in \{0,1\}^{\le 128}$ let

$$\mathrm{pad}_l(X) = \begin{cases} X & \text{if } |X| = 128, \\ X \,\|\, 0^{l-|X|-8} \,\|\, \mathrm{len}(X) & \text{if } 0 \le |X| < 128. \end{cases}$$

▶ A bus of width w needs only w + 4 MUXes for padding; the length is already stored in the FSM, so no decoding is required.
▶ 10* padding requires an n/8-bit decoder and 9w/8 MUXes.
▶ Security is exactly the same.
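The padding rule is simple enough to write out directly. The sketch below works on whole bytes with l = 128 and assumes that len(X) is the byte length of X encoded in a single byte; the slide does not state the encoding, so treat that as an assumption. The names `BLOCK_BYTES` and `pad` are hypothetical.

```python
BLOCK_BYTES = 16  # l = 128 bits


def pad(x: bytes) -> bytes:
    """Length padding: a full block is unchanged, a partial block is
    zero-filled and its last byte carries the original byte length."""
    if len(x) > BLOCK_BYTES:
        raise ValueError("input longer than one block")
    if len(x) == BLOCK_BYTES:
        return x
    zeros = bytes(BLOCK_BYTES - len(x) - 1)   # 0^(l - |X| - 8) in bits
    return x + zeros + bytes([len(x)])        # assumed one-byte len(X)


if __name__ == "__main__":
    print(pad(b"\x01\x02").hex())
    print(pad(b"").hex())
```

The position of the length byte is fixed, so the hardware only has to select between data, zero and the length value that the FSM already tracks; 10* padding instead needs to locate the boundary byte, which is where the decoder cost comes from.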
FSM Optimization
[Figure: first FSM state diagram, with nine states q_a (start), q_b, q_c, q_d, q_e, q_f, q_g, q_h, q_i.]
[Figure: second FSM state diagram, with five states q_a (start), q_b, q_c, q_d, q_e and a transition labeled X = αi.]
Current Design Corners for Romulus-N1 on TSMC 65nm
| Arch. | Area (GE) | Power (mW) | Energy, Enc/Auth (pJ) | Throughput, Enc/Auth only (Gbps) |
|---|---|---|---|---|
| R1 | 5772 | 0.2 | 24/13 | 4.2/8 |
| R2 | 6635 | 0.25 | 16/9 | 8/14.2 |
| R4 | 8740 | 0.32 | 13.8/8.5 | 8.8/14.5 |
| R8 | 12990 | 0.45 | 21.1/14.4 | 7.3/10.6 |
| S1 | 3318 | 0.15 | 489/247.5 | 0.131/0.259 |
| P1 | 8048 | 0.28 | 36/19.2 | 4.2/8 |
| PS | 5154 | 0.21 | 772/391 | 0.131/0.259 |
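One way to compare these design corners is throughput per unit area, the same efficiency metric used in the Deoxys-I-128 table earlier. The sketch below derives it from the area and encryption-throughput columns above; the metric and the names `CORNERS` and `efficiency` are additions for illustration and do not appear in the slides.

```python
# (area in GE, encryption throughput in Gbps), copied from the table above.
CORNERS = {
    "R1": (5772, 4.2),
    "R2": (6635, 8.0),
    "R4": (8740, 8.8),
    "R8": (12990, 7.3),
    "S1": (3318, 0.131),
    "P1": (8048, 4.2),
    "PS": (5154, 0.131),
}


def efficiency(area_ge: float, gbps: float) -> float:
    """Encryption throughput per area, in Mbps per KGE."""
    return (gbps * 1000.0) / (area_ge / 1000.0)


def ranked():
    """Design corners sorted from most to least area-efficient."""
    return sorted(CORNERS, key=lambda k: efficiency(*CORNERS[k]), reverse=True)


if __name__ == "__main__":
    for name in ranked():
        print(f"{name}: {efficiency(*CORNERS[name]):8.1f} Mbps/KGE")
```

Under this metric the two-round unrolled corner comes out ahead of the single-round one, in line with the unrolling trade-off noted earlier, while the serial corners trade efficiency for the smallest area.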
Comments and future work
▶ Romulus is not the limit: Remus-N2 achieves the same bit-security with a smaller state and a smaller TBC, but under different assumptions (ICM instead of TPRP).
▶ Focus on low power/energy implementations.
▶ Study the design space of the threshold implementations more closely.
▶ Experiments on Romulus-M.
▶ Architectures to combine Romulus with TBC-based hash functions.
▶ Higher level Side-Channel protection solutions.
Thank you!