RNS Modular Computations for Cryptographic Applications
Karim Bigou
CNRS – IRISA – CAIRN
RAIM 2015: April 7 – 9
Karim Bigou RNS for Asymmetric Cryptography RAIM 2015: April 7 – 9 1 / 23
Context
One objective of our research group:
Design efficient implementations of asymmetric cryptography using fast arithmetic techniques
Examples of targeted cryptosystems:
RSA [RSA78]
Discrete Logarithm Cryptosystems: Diffie-Hellman [DH76] (DH), ElGamal [Elg85]
Elliptic Curve Cryptography (ECC) [Mil85] [Kob87]
The residue number system (RNS) is a representation which enables fast computations for cryptosystems requiring large integers (or $\mathbb{F}_P$ elements)
Objective of my PhD: exploit RNS properties to speed up cryptographic computations
Residue Number System (RNS) [SV55] [Gar59]
$X$, a large integer of $\ell$ bits ($\ell \approx 160$–$4096$), is represented by:
$\overrightarrow{X} = (x_1, \ldots, x_n) = (X \bmod m_1, \ldots, X \bmod m_n)$
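[Figure: $n$ independent channels; channel $i$ takes the $w$-bit residues $x_i$ and $y_i$ and computes $z_i$ with $\pm$, $\times$ modulo $m_i$, so that $\overrightarrow{Z}$ is obtained from $\overrightarrow{X}$ and $\overrightarrow{Y}$.]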
RNS base $B = (m_1, \ldots, m_n)$: $n$ pairwise coprime moduli of $w$ bits, with $n \times w > \ell$
The Chinese remainder theorem (CRT) is the basis of RNS
Note: an EMM is a $w$-bit elementary modular multiplication (one channel)
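The representation above can be illustrated with a minimal Python sketch. The toy base `[13, 17, 19, 23]` and the helper names `to_rns`, `from_rns`, `rns_add` and `rns_mul` are assumptions made for this example; real bases use $w$-bit moduli.

```python
from math import prod

def to_rns(x, base):
    """Residues of x, one per channel."""
    return [x % m for m in base]

def from_rns(residues, base):
    """CRT reconstruction of the integer in [0, M), M = product of the moduli."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m): modular inverse (Python 3.8+)
    return total % M

def rns_add(xs, ys, base):
    """Channel-wise addition: no carry crosses from one channel to another."""
    return [(x + y) % m for x, y, m in zip(xs, ys, base)]

def rns_mul(xs, ys, base):
    """Channel-wise multiplication: one EMM per channel, n EMMs in total."""
    return [(x * y) % m for x, y, m in zip(xs, ys, base)]

base = [13, 17, 19, 23]          # toy pairwise coprime moduli, M = 96577
X, Y = 1234, 56
x_rns, y_rns = to_rns(X, base), to_rns(Y, base)
product = from_rns(rns_mul(x_rns, y_rns, base), base)
total = from_rns(rns_add(x_rns, y_rns, base), base)
print(x_rns, product, total)
```

The results are only correct while the true value stays below $M$, which is why the base must be large enough for the operands.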
RNS Properties
Pros:
Carry free between channels
each channel is independent
Fast parallel +, −, × and some exact divisions
computations over all channels can be performed in parallel
an RNS multiplication requires n EMMs
Non-positional number system
randomization of internal computations (SCA countermeasures)
Flexibility for hardware implementations
the number of hardware channels and theoretical channels can be different
various area/time trade-offs and multi-size support
Cons:
comparison, modular reduction and division are much harder
RNS Uses in Cryptography
In cryptography, RNS has been used for implementations of:
RSA and Discrete Logarithm (DL, Diffie-Hellman and ElGamal) e.g. [NMSK01, Gui11, BEG13, PITM13, SS14]
Elliptic curve cryptography (ECC) e.g. [SG08, Gui10, ESJ+13, BM14]
Pairings in large characteristic e.g. [CDF+11, YFCV12]
Lattice based cryptography e.g. [BEMP14]
On various platforms, such as:
FPGA circuits (Xilinx and Altera) e.g. [Gui10, CDF+11, ESJ+13, BM14]
ASIC circuits e.g. [GLP+12, BEG13]
GPU e.g. [SG08, ABS12]
CPU e.g. [LP07, LPL09]
RNS Montgomery Reduction (MR) [PP95]
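Input: $\overrightarrow{X}$, $\overrightarrow{X'}$ with $X < \alpha P^2 < PM$ and $3P < M'$
Output: $(\overrightarrow{\omega}, \overrightarrow{\omega'})$ with $\omega \equiv X \times M^{-1} \bmod P$ and $0 \leq \omega < 3P$
$\overrightarrow{Q} \leftarrow \overrightarrow{X} \times (-\overrightarrow{P^{-1}})$ (in base $B$)
$\overrightarrow{Q'} \leftarrow \mathrm{BE}(\overrightarrow{Q}, B, B')$
$\overrightarrow{S'} \leftarrow \overrightarrow{X'} + \overrightarrow{Q'} \times \overrightarrow{P'}$ (in base $B'$)
$\overrightarrow{\omega'} \leftarrow \overrightarrow{S'} \times \overrightarrow{M^{-1}}$ (in base $B'$)
$\overrightarrow{\omega} \leftarrow \mathrm{BE}(\overrightarrow{\omega'}, B', B)$
[Figure: data flow of MR between bases $B$ and $B'$, with two base extensions (BE).]
$\alpha$ is a parameter chosen to speed up some computations, $M = \prod_{i=1}^{n} m_i$
MR cost: $2n^2 + O(n)$ EMMs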
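The five steps of MR can be followed on toy values with the sketch below. It is an illustration under stated assumptions, not the hardware algorithm: the base extension is done exactly through CRT (real implementations use cheaper approximate BEs), and the bases, the modulus `P = 251` and all function names are chosen for this example only.

```python
from math import prod

def crt(residues, base):
    """Rebuild the integer in [0, prod(base)) from its residues."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)
    return total % M

def base_extend(residues, src, dst):
    """Exact base extension (assumption: CRT stand-in for a real BE)."""
    value = crt(residues, src)
    return [value % m for m in dst]

def rns_montgomery_reduce(x_b, x_bp, P, B, Bp):
    M = prod(B)
    # Q = X * (-P^-1) in base B
    q_b = [(-x * pow(P, -1, m)) % m for x, m in zip(x_b, B)]
    # Q' = BE(Q, B, B')
    q_bp = base_extend(q_b, B, Bp)
    # S' = X' + Q' * P' in base B'
    s_bp = [(x + q * P) % m for x, q, m in zip(x_bp, q_bp, Bp)]
    # omega' = S' * M^-1 in base B'
    w_bp = [(s * pow(M, -1, m)) % m for s, m in zip(s_bp, Bp)]
    # omega = BE(omega', B', B)
    w_b = base_extend(w_bp, Bp, B)
    return w_b, w_bp

B = [13, 17, 19, 23]      # toy base, M = 96577
Bp = [29, 31, 37, 41]     # toy second base
P = 251                   # toy prime modulus, coprime to every channel modulus
X = 200 * 123             # a product of two values below P, so X < P * M
w_b, w_bp = rns_montgomery_reduce([X % m for m in B], [X % m for m in Bp], P, B, Bp)
omega = crt(w_bp, Bp)
print(omega, omega % P)
```

With an exact extension, $\omega = (X + QP)/M$ stays below $2P$ here, which is inside the $3P$ bound stated on the slide.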
Typical RNS Computation Flow
[Figure: typical RNS computation flow, with channels $1$ to $n$ on one axis and time on the other. Legend: $\pm$, $\times$ over one channel; $\pm$, $\times$ over one RNS vector (i.e. $n$ channels); base extension modulo $P$ in RNS.]
Selection of State-of-art Cryptography with RNS
ref.        conf./journ.   yy   usage      implem.
[BDK98]     IEEE TC        98   RSA        N
[KKSS00]    EuroCrypt      00   RSA        N
[NMSK01]    CHES           01   RSA        A 250 nm
[CNPQ03]    MWSCAS         03   RSA        F Virtex 2
[BI04]      IEEE TC        04   RSA        N
[MPS07]     IMA CC         07   RSA        G 7800GTX
[LP07]      ASSC           07   RSA        P Xtensa
[SG08]      CHES           08   RSA, ECC   G 8800GTS
[SFM+09]    IEEE TCAS I    09   ECC        F Virtex E
[LPL09]     TENCON         09   ECC        P Xtensa
[Gui10]     CHES           10   ECC        F Stratix I & II
[CDF+11]    CHES           11   Pair.      F Virtex 6, Stratix 3, Cyclone 2
[GLP+12]    IEEE TC        12   RSA        A 45 nm
[ABS12]     Comp. J.       12   ECC        G 285GTX
[BEG13]     ARITH          13   RSA        A 250 nm
[PITM13]    DSD            13   RSA        F Spartan 3
[ESJ+13]    IEEE TVLSI     13   ECC        F VirtexE, Virtex 2 Pro, Stratix II
[SS14]      IEEE TCAS I    14   RSA        F Virtex 6, Virtex 2
[BM14]      CARDIS         14   ECC        F Kintex 7
[SGXYC14]   ISIC           14   RSA        F Virtex 5
Cox-Rower RNS Architecture [KKSS00, Gui10]
[Figure: Cox-Rower architecture: $n$ rowers (rower $i$ handles channel $i$ on $w$-bit datapaths), one cox unit, a control block (CTRL), and $n \times w$-bit input and output.]
How to Speed Up RNS Computations for Cryptography?
Two main ideas to reduce the impact of modular reductions:
Reduce the cost of modular reduction in specific contexts, for instance:
rearranging computations in an ECC context [Gui10]
rearranging computations in an RSA exponentiation context [GLP+12]
using optimizations for several usual computation patterns [BT14]
Reduce the number of modular reductions, for instance:
computing patterns of the form $AB + CD \bmod P$ in ECC formulas [BDE13]
computing the modular inversion with our PM-MI algorithm in an ECC context [BT13]
Fast Patterns for RNS Computations
Improving Modular Computations
RNS modular multiplication (MM) is the most costly operation in RNS cryptographic applications (ECC, RSA, DL)
Two different multiplications:
simple RNS multiplication: $n$ EMMs
MM = simple RNS multiplication + MR: $2n^2 + O(n)$ EMMs
Goal: accelerate some specific, but usual, computation patterns which use RNS modular multiplications
Examples:
modular squares
modular multiplication by constants
more complex patterns with operand reuse
In the state of the art, RNS does not support accelerations for these patterns (except accelerations inside channels)
Proposed Modular Multiplication
A first approach has been proposed in our work [BT14] with a new modular multiplication algorithm
Idea:
Split operands into 2 parts and introduce partial reductions
only $\frac{3}{2}n$ moduli required vs $2n$ (3 half-bases $B_a$, $B_b$, $B_c$ of $n/2$ moduli each)
the Split of $X$ can be performed once for all reuses of $X$
Constraint:
It requires a hypothesis on $P$: not possible for RSA but possible for ECC and discrete logarithm
The constraint is $\mu P + 1 = M_a \times D$ with $\mu$ small
$\mu P - 1 = M_a \times D$ is also possible (for discrete logarithm)
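The constraint $\mu P + 1 = M_a \times D$ means that $D$ is the inverse of $M_a$ modulo $P$. The short sketch below checks this on toy values; the half-base `[13, 17]`, the prime `P = 44641`, $\mu = 1$ and the helper names are assumptions picked for illustration and do not come from the slides.

```python
def is_prime(n):
    """Trial division, sufficient for toy sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def constraint_d(P, Ma, mu):
    """Return D such that mu*P + 1 == Ma*D, or None if no integer D exists."""
    numerator = mu * P + 1
    if numerator % Ma != 0:
        return None
    return numerator // Ma

Ma = 13 * 17        # product of the toy half-base B_a, Ma = 221
P = 44641           # toy prime with P + 1 = 221 * 202
mu = 1
D = constraint_d(P, Ma, mu)
print(D, (Ma * D) % P)   # Ma * D is congruent to 1 modulo P
```

Since $M_a \times D \equiv 1 \bmod P$, multiplying by $D$ cancels one factor $M_a$ modulo $P$, which is what the partial reduction relies on.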
Decomposition with Split Algorithm
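Input: $\overrightarrow{X_{a|b|c}}$
Precomp.: $\overrightarrow{(M_a^{-1})_{b|c}}$
Output: $\overrightarrow{(K_x)_{a|b|c}}$, $\overrightarrow{(R_x)_{a|b|c}}$ with $\overrightarrow{X_{a|b|c}} = \overrightarrow{(K_x)_{a|b|c}} \times \overrightarrow{(M_a)_{a|b|c}} + \overrightarrow{(R_x)_{a|b|c}}$
$\overrightarrow{(R_x)_{b|c}} \leftarrow \mathrm{BE}\left(\overrightarrow{(R_x)_a}, B_a, B_{b|c}\right)$
$\overrightarrow{(K_x)_{b|c}} \leftarrow \left(\overrightarrow{X_{b|c}} - \overrightarrow{(R_x)_{b|c}}\right) \times \overrightarrow{(M_a^{-1})_{b|c}}$
if $\overrightarrow{(K_x)_{b|c}} = \overrightarrow{-1}$ then
    $\overrightarrow{(K_x)_{b|c}} \leftarrow \overrightarrow{0}$ /* Kawamura BE correction */
    $\overrightarrow{(R_x)_{b|c}} \leftarrow \overrightarrow{(R_x)_{b|c}} - \overrightarrow{(M_a)_{b|c}}$
$\overrightarrow{(K_x)_a} \leftarrow \mathrm{BE}\left(\overrightarrow{(K_x)_b}, B_b, B_a\right)$
return $\overrightarrow{(K_x)_{a|b|c}}$, $\overrightarrow{(R_x)_{a|b|c}}$
Note: the cost of Split is dominated by the 2 BEs (the first one is larger than the second one)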
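A hedged Python sketch of Split on residues follows. It assumes toy half-bases and replaces both BEs by an exact CRT reconstruction, so the Kawamura correction branch (needed only when the BE is approximate) does not appear; `crt`, `residues` and `split` are names introduced for this example.

```python
from math import prod

BA, BB, BC = [13, 17], [19, 23], [29, 31]   # toy half-bases B_a, B_b, B_c
MA = prod(BA)

def crt(res, base):
    """Rebuild the integer in [0, prod(base)) from its residues."""
    M = prod(base)
    total = 0
    for r, m in zip(res, base):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)
    return total % M

def residues(x, base):
    return [x % m for m in base]

def split(x_a, x_b, x_c):
    """Return residues of (Kx, Rx) with X = Kx * Ma + Rx and 0 <= Rx < Ma."""
    r_a = list(x_a)                          # (Rx)_a = X_a because Rx = X mod Ma
    r_val = crt(r_a, BA)                     # exact BE from B_a (assumption)
    r_b, r_c = residues(r_val, BB), residues(r_val, BC)
    # Kx = (X - Rx) * Ma^-1, computed channel by channel in B_b and B_c
    k_b = [((x - r) * pow(MA, -1, m)) % m for x, r, m in zip(x_b, r_b, BB)]
    k_c = [((x - r) * pow(MA, -1, m)) % m for x, r, m in zip(x_c, r_c, BC)]
    k_val = crt(k_b, BB)                     # exact BE from B_b, valid while Kx < Mb
    k_a = residues(k_val, BA)
    return (k_a, k_b, k_c), (r_a, r_b, r_c)

X = 30000
K, R = split(residues(X, BA), residues(X, BB), residues(X, BC))
print(crt(K[1], BB), crt(R[0], BA))          # quotient and remainder of X by Ma
```

The first extension targets two half-bases and the second only one, which matches the note that the first BE is the larger of the two.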
Proposed Modular Multiplication Algorithm
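Input: $X, Y < \alpha P$
Precomp.: $D = |M_a^{-1}|_P$
Output: $\overrightarrow{V_{a|b|c}}$ with $V \equiv |X Y M_a^{-1} M_b^{-1}|_P$ and $V < \alpha P$
$(\overrightarrow{(K_x)_{a|b|c}}, \overrightarrow{(R_x)_{a|b|c}}) \leftarrow \mathrm{Split}(\overrightarrow{X_{a|b|c}})$
$(\overrightarrow{(K_y)_{a|b|c}}, \overrightarrow{(R_y)_{a|b|c}}) \leftarrow \mathrm{Split}(\overrightarrow{Y_{a|b|c}})$
$\overrightarrow{U_{a|b|c}} \leftarrow \mathrm{PR}\left(\overrightarrow{(K_x)_{a|b|c}}, \overrightarrow{(R_x)_{a|b|c}}, \overrightarrow{(K_y)_{a|b|c}}, \overrightarrow{(R_y)_{a|b|c}}, D\right)$
$\overrightarrow{V_{a|b|c}} \leftarrow \mathrm{MR}(\overrightarrow{U_b}, \overrightarrow{U_{a|c}})$
return $\overrightarrow{V_{a|b|c}}$
PR computes $U = K_x K_y M_a + (K_x R_y + K_y R_x) + R_x R_y D \equiv |X Y D|_P$
The final MR is performed on $3n/2$ moduli instead of $2n$.
Example: inputs of 258 bits → 516 bits product → 388 bits partial reduction → 258 bits output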
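The identity behind PR can be checked on plain integers. The sketch below is an illustration only: it works on integers rather than residues, uses assumed toy parameters (`MA = 221`, `MB = 437`, `P = 44641`, `D = 202`, so that $P + 1 = M_a \times D$), and replaces the RNS MR by an integer Montgomery reduction with radix $M_b$.

```python
MA, MB = 13 * 17, 19 * 23     # toy products of half-bases B_a and B_b
P, D = 44641, 202             # toy values with P + 1 == MA * D

def split(x):
    """X = Kx * Ma + Rx."""
    return divmod(x, MA)

def partial_reduction(kx, rx, ky, ry):
    """U = Kx*Ky*Ma + (Kx*Ry + Ky*Rx) + Rx*Ry*D, congruent to X*Y*D mod P."""
    return kx * ky * MA + (kx * ry + ky * rx) + rx * ry * D

def montgomery_reduce(u, modulus, radix):
    """Integer Montgomery reduction: returns (u + q*modulus) / radix."""
    q = (-u * pow(modulus, -1, radix)) % radix
    return (u + q * modulus) // radix

def modmul(x, y):
    kx, rx = split(x)
    ky, ry = split(y)
    u = partial_reduction(kx, rx, ky, ry)
    return montgomery_reduce(u, P, MB)

x, y = 30000, 41234
u = partial_reduction(*split(x), *split(y))
v = modmul(x, y)
print((x * y).bit_length(), u.bit_length(), v.bit_length())
```

As in the 516 → 388 → 258 bit example, $U$ is shorter than the full product $XY$ but longer than $P$, and the final MR brings the size back to that of the inputs.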
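[Figure: data flow of the proposed modular multiplication across the half-bases $B_a$, $B_b$ and $B_c$ through the SPLIT, PR and MR stages, distinguishing base extensions (BE) from computations in one base. In base $B_a$, $R_x = X_a$ and $R_y = Y_a$; PR produces $U_a$, $U_b$, $U_c$ and MR uses the intermediate values $Q$ and $S$ in each base.]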
Theoretical Performance Comparison
For different small and usual patterns we have:
Operations    $AB \bmod P$       $A^2 \bmod P$        $\mathrm{Cst} \times A \bmod P$
MM [EMM]      $2n^2 + 4n$        $2n^2 + 4n$          $2n^2 + 4n$
SPRR [EMM]    $2.5n^2 + 12.5n$   $1.75n^2 + 10.5n$    $1.75n^2 + 7n$
Moreover, precomputations are reduced by 25 %
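The cost formulas in the table can be evaluated directly to see where SPRR pays off. The sketch below only encodes the table; the dictionary keys and function names are assumptions for this example.

```python
# (quadratic, linear) coefficients of the SPRR EMM counts from the table
SPRR = {"AB": (2.5, 12.5), "A2": (1.75, 10.5), "CstA": (1.75, 7.0)}

def mm_cost(n):
    """State-of-the-art MM: 2n^2 + 4n EMMs for every pattern."""
    return 2 * n * n + 4 * n

def sprr_cost(pattern, n):
    a, b = SPRR[pattern]
    return a * n * n + b * n

def ratio(pattern, n):
    """SPRR / MM; a value below 1 means SPRR needs fewer EMMs."""
    return sprr_cost(pattern, n) / mm_cost(n)

for n in (12, 26, 64):
    print(n, {p: round(ratio(p, n), 3) for p in SPRR})
```

From these formulas, a plain product $AB$ is always more expensive with SPRR, while squares break even at $n = 26$ and multiplications by a constant at $n = 12$; the gain comes from patterns where a Split is reused or precomputed.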
Results for exponentiation for discrete logarithm (e.g. Diffie-Hellman)
[Figure: ratio Our / Ref of the EMM count for Montgomery exponentiation, as a function of $n$ (from 10 to 70); vertical axis from 0.7 to 1.2.]
State-of-the-art reference (Ref): [GLP+12]
A Specific Fast Pattern
The cost of some patterns can be reduced without constraint on the field characteristic, for instance in the following algorithm [Gor98]:
Input: $k = (k_{\ell-1}, \ldots, k_1, k_0)_2$, $G \in \mathbb{Z}/P\mathbb{Z}$
Output: $G^k \bmod P$
$S \leftarrow 1$
for $i$ from $\ell - 1$ to $0$ do
    $S \leftarrow S^2 \bmod P$
    if $k_i = 1$ then $S \leftarrow S \cdot G \bmod P$
return $S$
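For reference, the left-to-right square-and-multiply loop above can be written in a few lines of Python. This is a plain integer sketch; the function name is an assumption and no RNS arithmetic is involved.

```python
def square_and_multiply(G, k, P):
    """Left-to-right binary exponentiation: returns G**k mod P."""
    S = 1
    for bit in bin(k)[2:]:        # bits of k, most significant first
        S = (S * S) % P           # one modular square per key bit
        if bit == "1":
            S = (S * G) % P       # one extra modular multiplication when k_i = 1
    return S

print(square_and_multiply(5, 117, 44641))
```

Every key bit equal to 1 triggers the pattern $S^2 G \bmod P$, which is the pattern targeted by the optimization.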
One can observe:
$S^2 G \equiv \left(K_s^2 M_a^2 + 2 K_s R_s M_a + R_s^2\right) G \bmod P$
$\equiv K_s^2\, |M_a^2 G|_P + K_s R_s\, |2 M_a G|_P + R_s^2\, |G|_P \bmod P$
$\equiv K_s \left(K_s\, |M_a^2 G|_P + R_s\, |2 M_a G|_P\right) + R_s^2\, |G|_P \bmod P$
A Specific Fast Pattern
Values $|M_a^2 G|_P$, $|2 M_a G|_P$ and $|G|_P$ can be precomputed
We choose $B_a$ with $n/2$ moduli of $w$ bits, so $K_s$ and $R_s$ are $\ell/2$-bit values (i.e. the same size as $\sqrt{P}$)
If $U_2 = K_s \left(K_s\, |M_a^2 G|_P + R_s\, |2 M_a G|_P\right) + R_s^2\, |G|_P$ then $\log_2 U_2 \approx 2\ell$, i.e. $U_2$ is a partially reduced value
Finally, we use the state-of-the-art MR to finish the modular reduction
The total cost of $|S^2 G|_P$:
the Split mainly costs $n^2$ EMMs . . .
and the final MR mainly costs $2n^2$ EMMs . . .
leading to $3n^2$ EMMs
The same pattern is computed with $4n^2$ EMMs in state-of-the-art RNS
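The $S^2 G$ pattern can be checked on integers with the sketch below. It is only an illustration: the toy values `MA = 221` and `P = 44641` and the function names are assumptions, and the final MR is replaced by a plain `% P`, so the result is not in Montgomery form.

```python
MA = 13 * 17      # toy product of half-base B_a
P = 44641         # toy modulus, used here only as an example

def precompute(G):
    """The three constants |Ma^2 G|_P, |2 Ma G|_P and |G|_P."""
    return (MA * MA * G) % P, (2 * MA * G) % P, G % P

def square_times_g(S, consts):
    """U2 = Ks*(Ks*c2 + Rs*c1) + Rs^2*c0, congruent to S^2 * G mod P."""
    c2, c1, c0 = consts
    Ks, Rs = divmod(S, MA)            # Split of S
    return Ks * (Ks * c2 + Rs * c1) + Rs * Rs * c0

def exponentiate(G, k):
    """Square-and-multiply where each (square, multiply) pair uses one reduction."""
    consts = precompute(G)
    S = 1
    for bit in bin(k)[2:]:
        if bit == "1":
            S = square_times_g(S, consts) % P   # stand-in for the final MR
        else:
            S = (S * S) % P
    return S

u2 = square_times_g(30000, precompute(5))
print(u2.bit_length(), (30000 * 30000 * 5).bit_length())
```

$U_2$ is shorter than the full product $S^2 G$, so a single reduction is enough where the usual flow needs two.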
Exponentiation Cost
Average values for 2 bits of key (one 1 and one 0):
Algo.                       EMM             EMW
Our algorithm               $5n^2 + 17n$    $3n^2 + 20n$
RNS-ME [GLP+12]             $6n^2 + 12n$    $2n^2 + 10n$
Our algorithm (regular)     $6n^2 + 26n$    $3n^2 + 26n$
Regular RNS-ME [GLP+12]     $8n^2 + 16n$    $2n^2 + 10n$
[Figure: EMM ratios Our / RNS-ME and Our (Regular) / Regular RNS-ME as functions of $n$ (from 10 to 70); vertical axis from 0.7 to 1.2.]
Note: Our regular algorithm involves only 3.4 % overhead for RSA 2048 compared to RNS-ME [GLP+12] in EMMs
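The EMM column of the table can be turned into ratios with the sketch below. It encodes only the formulas above (the EMW column is not modelled); the dictionary keys are assumptions, and `n = 66` is an assumed number of channels chosen for illustration, since the slides do not state the value of $n$ used for RSA 2048.

```python
# (quadratic, linear) EMM coefficients for 2 key bits, from the table
EMM = {
    "our": (5, 17),
    "rns_me": (6, 12),
    "our_regular": (6, 26),
    "rns_me_regular": (8, 16),
}

def emm(algo, n):
    a, b = EMM[algo]
    return a * n * n + b * n

def emm_ratio(ours, ref, n):
    """EMM count of `ours` divided by the EMM count of `ref`."""
    return emm(ours, n) / emm(ref, n)

n = 66  # assumed channel count, not given in the slides
print(round(emm_ratio("our_regular", "rns_me", n), 4))          # regular vs. non-regular reference
print(round(emm_ratio("our", "rns_me", 70), 4))                 # non-regular comparison
print(round(emm_ratio("our_regular", "rns_me_regular", 70), 4)) # regular comparison
```

With that assumed $n$, the regular algorithm costs about 3.4 % more EMMs than the non-regular RNS-ME, and at $n = 70$ the two ratios are close to 0.85 and 0.77.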
Conclusion
Our proposed RNS modular multiplication:
reduces by 25 % the number of precomputations stored
reduces the number of EMMs by up to 10 % for large cryptographic parameters
Our proposed modular exponentiation:
reduces the number of EMMs by up to 15 % for the non-regular algorithm
reduces the number of EMMs by up to 22 % for the regular version
Future work:
implementations of the propositions in full cryptosystems
time×area trade-off explorations
analysis of other patterns
analysis of the use of this pattern in other cryptosystems (e.g. ECC)
General Conclusion
RNS is interesting thanks to several natural properties (e.g. parallelism)
for new applications? (e.g. lattice based cryptography [BEMP14])
the relative costs between the different operations are not the same in RNS compared to the usual binary system
We have to count differently: e.g. in [BDE13] one has point ADD faster than DBL!
there is still a lot of work to reach the maximum benefit of RNS for cryptographic applications
choice of parameters (e.g. moduli, curve parameters . . . ) (CHES 2015 submission)
new algorithms
new architectures
Thank you for your attention
References
[ABS12] S. Antão, J.-C. Bajard, and L. Sousa. RNS-based elliptic curve point multiplication for massive parallel architectures. The Computer Journal, 55(5):629–647, May 2012.
[BDE13] J.-C. Bajard, S. Duquesne, and M. D. Ercegovac. Combining leak-resistant arithmetic for elliptic curves defined over $\mathbb{F}_p$ and RNS representation. Publications Mathématiques UFR Sciences Techniques Besançon, pages 67–87, 2013.
[BDK98] J.-C. Bajard, L.-S. Didier, and P. Kornerup. An RNS Montgomery modular multiplication algorithm. IEEE Transactions on Computers, 47(7):766–776, July 1998.
[BEG13] J.-C. Bajard, J. Eynard, and F. Gandino. Fault detection in RNS Montgomery modular multiplication. In Proc. 21st Symposium on Computer Arithmetic (ARITH), pages 119–126. IEEE, April 2013.
[BEMP14] J.-C. Bajard, J. Eynard, N. Merkiche, and T. Plantard. Babaï round-off CVP method in RNS: Application to lattice based cryptographic protocols. In Proc. 14th International Symposium on Integrated Circuits (ISIC), pages 440–443. IEEE, December 2014.
[BI04] J.-C. Bajard and L. Imbert. A full RNS implementation of RSA. IEEE Transactions on Computers, 53(6):769–774, June 2004.
[BM14] J.-C. Bajard and N. Merkiche. Double level Montgomery Cox-Rower architecture, new bounds. In Proc. 13th Smart Card Research and Advanced Application Conference (CARDIS), LNCS. Springer, November 2014.
[BT13] K. Bigou and A. Tisserand. Improving modular inversion in RNS using the plus-minus method. In Proc. 15th Cryptographic Hardware and Embedded Systems (CHES), volume 8086 of LNCS, pages 233–249. Springer, August 2013.
[BT14] K. Bigou and A. Tisserand. RNS modular multiplication through reduced base extensions. In Proc. 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 57–62. IEEE, June 2014.
[CDF+11] R. C. C. Cheung, S. Duquesne, J. Fan, N. Guillermin, I. Verbauwhede, and G. X. Yao. FPGA implementation of pairings using residue number system and lazy reduction. In Proc. 13th Cryptographic Hardware and Embedded Systems (CHES), volume 6917 of LNCS, pages 421–441. Springer, September 2011.
[CNPQ03] M. Ciet, M. Neve, E. Peeters, and J.-J. Quisquater. Parallel FPGA implementation of RSA with residue number systems - can side-channel threats be avoided? In Proc. 46th Midwest Symposium on Circuits and Systems (MWSCAS), volume 2, pages 806–810. IEEE, December 2003.
[DH76] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, November 1976.
[Elg85] T. Elgamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4):469–472, July 1985.
[ESJ+13] M. Esmaeildoust, D. Schinianakis, H. Javashi, T. Stouraitis, and K. Navi. Efficient RNS implementation of elliptic curve point multiplication over GF(p). IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(8):1545–1549, August 2013.
[Gar59] H. L. Garner. The residue number system. IRE Transactions on Electronic Computers, EC-8(2):140–147, June 1959.
[GLP+12] F. Gandino, F. Lamberti, G. Paravati, J.-C. Bajard, and P. Montuschi. An algorithmic and architectural study on Montgomery exponentiation in RNS. IEEE Transactions on Computers, 61(8):1071–1083, August 2012.
[Gor98] D. M. Gordon. A survey of fast exponentiation methods. Journal of Algorithms, 27(1):129–146, 1998.
[Gui10] N. Guillermin. A high speed coprocessor for elliptic curve scalar multiplications over $\mathbb{F}_p$. In Proc. 12th Cryptographic Hardware and Embedded Systems (CHES), volume 6225 of LNCS, pages 48–64. Springer, August 2010.
[Gui11] N. Guillermin. A coprocessor for secure and high speed modular arithmetic. Technical Report 354, IACR Cryptology ePrint Archive, 2011.
[KKSS00] S. Kawamura, M. Koike, F. Sano, and A. Shimbo. Cox-Rower architecture for fast parallel Montgomery multiplication. In Proc. 19th International Conference on the Theory and Application of Cryptographic Techniques (EUROCRYPT), volume 1807 of LNCS, pages 523–538. Springer, May 2000.
[Kob87] N. Koblitz. Elliptic curve cryptosystems. Mathematics of Computation, 48(177):203–209, 1987.
[LP07] Z. Lim and B. J. Phillips. An RNS-enhanced microprocessor implementation of public key cryptography. In Proc. 41st Asilomar Conference on Signals, Systems and Computers, pages 1430–1434. IEEE, November 2007.
[LPL09] Z. Lim, B. J. Phillips, and M. Liebelt. Elliptic curve digital signature algorithm over GF(p) on a residue number system enabled microprocessor. In Proc. IEEE Region 10 Conference (TENCON), pages 1–6, January 2009.
[Mil85] V. Miller. Use of elliptic curves in cryptography. In Proc. 5th International Cryptology Conference (CRYPTO), volume 218 of LNCS, pages 417–426. Springer, 1985.
[MPS07] A. Moss, D. Page, and N. P. Smart. Toward acceleration of RSA using 3D graphics hardware. In Proc. 11th IMA International Conference on Cryptography and Coding, pages 364–383. Springer, December 2007.
[NMSK01] H. Nozaki, M. Motoyama, A. Shimbo, and S. Kawamura. Implementation of RSA algorithm based on RNS Montgomery multiplication. In Proc. 3rd Cryptographic Hardware and Embedded Systems (CHES), volume 2162 of LNCS, pages 364–376. Springer, May 2001.
[PITM13] G. Perin, L. Imbert, L. Torres, and P. Maurine. Electromagnetic analysis on RSA algorithm based on RNS. In Proc. 16th Euromicro Conference on Digital System Design (DSD), pages 345–352. IEEE, September 2013.
[PP95] K. C. Posch and R. Posch. Modulo reduction in residue number systems. IEEE Transactions on Parallel and Distributed Systems, 6(5):449–454, May 1995.
[RSA78] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, February 1978.
[SFM+09] D. M. Schinianakis, A. P. Fournaris, H. E. Michail, A. P. Kakarountas, and T. Stouraitis. An RNS implementation of an $\mathbb{F}_p$ elliptic curve point multiplier. IEEE Transactions on Circuits and Systems I: Regular Papers, 56(6):1202–1213, June 2009.
[SG08] R. Szerwinski and T. Güneysu. Exploiting the power of GPUs for asymmetric cryptography. In Proc. 10th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), volume 5154 of LNCS, pages 79–99. Springer, August 2008.
[SGXYC14] M. Stöttinger, Gavin G. X. Yao, and R. C. C. Cheung. Zero collision attack and its countermeasures on residue number system multipliers. In 14th International Symposium on Integrated Circuits (ISIC), pages 30–33. IEEE, 2014.
[SS14] D. Schinianakis and T. Stouraitis. Multifunction residue architectures for cryptography. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(4):1156–1169, April 2014.
[SV55] A. Svoboda and M. Valach. Operátorové obvody (operator circuits, in Czech). Stroje na Zpracování Informací (Information Processing Machines), 3:247–296, 1955.
[YFCV12] G. X. Yao, J. Fan, R. C. C. Cheung, and I. Verbauwhede. Faster pairing coprocessor architecture. In Proc. 5th Pairing-Based Cryptography (Pairing), volume 7708 of LNCS, pages 160–176. Springer, May 2012.