lecture 7 montgomery multipliers & exponentiation units
![Page 1: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/1.jpg)
Lecture 7
Montgomery Multipliers & Exponentiation Units
![Page 2: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/2.jpg)
Motivation:
Public-key ciphers
![Page 3: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/3.jpg)
Secret-key (Symmetric) Cryptosystems
Key of Alice and Bob: $K_{AB}$ (the same key is held by both Alice and Bob)
Alice Bob
Network
Encryption Decryption
![Page 4: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/4.jpg)
Key Distribution Problem
$N$ users require $\frac{N \cdot (N-1)}{2}$ keys.
Users    Keys
100      ≈ 5,000
1000     ≈ 500,000
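As a quick check of the key-count formula, the following minimal sketch computes the exact number of pairwise keys. The function name `pairwise_keys` is an illustrative choice, not something from the slides; the slide's figures are rounded versions of these values.

```python
def pairwise_keys(n_users: int) -> int:
    """Number of distinct secret keys when every pair of users shares one key."""
    return n_users * (n_users - 1) // 2


for n in (100, 1000):
    # Exact values: 4950 and 499500, which the slide rounds to 5,000 and 500,000.
    print(n, pairwise_keys(n))
```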
![Page 5: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/5.jpg)
Digital Signature Problem
Both corresponding sides have the same information and are able to generate a signature.
There is a possibility of
• the receiver falsifying the message
• the sender denying that he/she sent the message
![Page 6: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/6.jpg)
Public Key (Asymmetric) Cryptosystems
Public key of Bob: $K_B$
Private key of Bob: $k_B$
Alice Bob
Network
Encryption Decryption
![Page 7: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/7.jpg)
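Non-repudiation
[Figure: digital signature scheme. Alice: Message → Hash function → Hash value → Public key cipher (Alice's private key) → Signature; the Message and the Signature are sent to Bob. Bob: Message → Hash function → Hash value 1; Signature → Public key cipher (Alice's public key) → Hash value 2; the two hash values are compared (yes / no).]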
![Page 8: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/8.jpg)
RSA as a trap-door one-way function
Message $M$ → ciphertext $C$
PUBLIC KEY: $C = f(M) = M^e \bmod N$
PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
$N = P \cdot Q$, where $P$, $Q$ are large prime numbers
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
![Page 9: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/9.jpg)
RSA keys
PUBLIC KEY: $\{e, N\}$
PRIVATE KEY: $\{d, P, Q\}$
$P$, $Q$ are large prime numbers
$N = P \cdot Q$
$\gcd(e, P-1) = 1$ and $\gcd(e, Q-1) = 1$
$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
d:
P, Q:
N:
e:
![Page 10: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/10.jpg)
Mini-RSA keys
PUBLIC KEY: $\{e, N\}$
PRIVATE KEY: $\{d, P, Q\}$
P, Q: $P = 5$, $Q = 11$
N: $N = P \cdot Q = 55$
e: $\gcd(e, 5-1) = 1$ and $\gcd(e, 11-1) = 1$; $e = 3$
d: $3 \cdot d \equiv 1 \pmod{40}$; $d = 27$
![Page 11: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/11.jpg)
Mini-RSA as a trap-door one-way function
Message $M = 2$ → ciphertext $C = 8$
PUBLIC KEY: $C = f(2) = 2^3 \bmod 55 = 8$
PRIVATE KEY: $M = f^{-1}(C) = 8^{27} \bmod 55 = 2$
$N = 5 \cdot 11$, where 5, 11 are prime numbers
$3 \cdot 27 \equiv 1 \pmod{(5-1)(11-1)}$
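The Mini-RSA numbers can be reproduced with a few lines of Python. This is only a sketch using the toy parameters from the slide; it relies on the built-in three-argument `pow`, including the modular inverse form `pow(e, -1, phi)` available since Python 3.8. The variable names are illustrative.

```python
from math import gcd

P, Q = 5, 11                 # toy primes from the slide
N = P * Q                    # modulus, 55
phi = (P - 1) * (Q - 1)      # (P-1)(Q-1) = 40

e = 3
# e must be coprime to P-1 and Q-1 for a private exponent to exist
assert gcd(e, P - 1) == 1 and gcd(e, Q - 1) == 1

d = pow(e, -1, phi)          # private exponent: e*d = 1 mod 40

M = 2
C = pow(M, e, N)             # encryption with the public key {e, N}
M_recovered = pow(C, d, N)   # decryption with the private exponent d

print(d, C, M_recovered)
```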
![Page 12: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/12.jpg)
Basic Operations of RSA
Encryption
$C = M^e \bmod N$
ciphertext = plaintext ^ (public key exponent) mod (public key modulus)
$C$, $M$, $N$: k bits each; exponent $e$: $L$ bits, $L < k$
Decryption
$M = C^d \bmod N$
plaintext = ciphertext ^ (private key exponent) mod (private key modulus)
$M$, $C$, $N$: k bits each; exponent $d$: $L$ bits, $L = k$
![Page 13: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/13.jpg)
Modular arithmetic
![Page 14: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/14.jpg)
Quotient and remainder
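Given integers $a$ and $n$, $n > 0$,
$\exists!\; q, r \in \mathbb{Z}$ such that
$a = q \cdot n + r$ and $0 \le r < n$
$q$ – quotient
$r$ – remainder (of $a$ divided by $n$)
$q = \lfloor a/n \rfloor = a \;\mathrm{div}\; n$
$r = a - q \cdot n = a - \lfloor a/n \rfloor \cdot n = a \bmod n$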
![Page 15: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/15.jpg)
32 mod 5 =
-32 mod 5 =
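The two exercises can be checked with Python's built-in `divmod`, which for a positive modulus follows the same convention as the definition of quotient and remainder ($0 \le r < n$). The helper name `quotient_remainder` is an illustrative choice.

```python
def quotient_remainder(a: int, n: int) -> tuple[int, int]:
    """Return (q, r) with a = q*n + r and 0 <= r < n, for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    q, r = divmod(a, n)      # floor division, so r is never negative for n > 0
    assert a == q * n + r and 0 <= r < n
    return q, r


print(quotient_remainder(32, 5))    # quotient 6, remainder 2
print(quotient_remainder(-32, 5))   # quotient -7, remainder 3
```

Note that the remainder of a negative number is still non-negative: $-32 = (-7) \cdot 5 + 3$.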
![Page 16: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/16.jpg)
Integers congruent modulo n
Two integers $a$ and $b$ are congruent modulo $n$
(equivalent modulo $n$)
written $a \equiv b \pmod{n}$
iff
$a \bmod n = b \bmod n$
or
$a = b + k \cdot n$, $k \in \mathbb{Z}$
or
$n \mid a - b$
![Page 17: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/17.jpg)
Rules of addition, subtraction and multiplication modulo n
$(a + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$
$(a - b) \bmod n = ((a \bmod n) - (b \bmod n)) \bmod n$
$(a \cdot b) \bmod n = ((a \bmod n) \cdot (b \bmod n)) \bmod n$
![Page 18: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/18.jpg)
9 · 13 mod 5 =
25 · 25 mod 26 =
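A small sketch of the multiplication rule applied to the two exercises: reducing the operands first keeps the intermediate product small and gives the same result as reducing only at the end. The function name `mul_mod` is an illustrative choice.

```python
def mul_mod(a: int, b: int, n: int) -> int:
    """(a * b) mod n, computed by reducing both operands first."""
    return ((a % n) * (b % n)) % n


# 9 * 13 mod 5: operands reduce to 4 and 3, and 12 mod 5 = 2
print(mul_mod(9, 13, 5))
# 25 * 25 mod 26: 625 = 24 * 26 + 1
print(mul_mod(25, 25, 26))
```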
![Page 19: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/19.jpg)
Laws of modular arithmetic
Regular addition:
$a + b = a + c$ iff $b = c$
Modular addition:
$a + b \equiv a + c \pmod{n}$ iff $b \equiv c \pmod{n}$
Regular multiplication:
If $a \cdot b = a \cdot c$ and $a \ne 0$ then $b = c$
Modular multiplication:
If $a \cdot b \equiv a \cdot c \pmod{n}$ and $\gcd(a, n) = 1$ then $b \equiv c \pmod{n}$
![Page 20: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/20.jpg)
Modular Multiplication: Example
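$18 \equiv 42 \pmod{8}$, i.e. $6 \cdot 3 \equiv 6 \cdot 7 \pmod{8}$,
but $3 \not\equiv 7 \pmod{8}$: the factor 6 cannot be cancelled because $\gcd(6, 8) \ne 1$.
x:            0 1 2 3 4 5 6 7
6·x mod 8:    0 6 4 2 0 6 4 2
5·x mod 8:    0 5 2 7 4 1 6 3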
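The two multiplication tables can be regenerated with a short sketch. It illustrates why cancellation works only when $\gcd(a, n) = 1$: in that case multiplication by $a$ permutes the residues, otherwise several values of $x$ collide. The helper names are illustrative.

```python
from math import gcd


def times_table(a: int, n: int) -> list[int]:
    """Values of a*x mod n for x = 0 .. n-1."""
    return [(a * x) % n for x in range(n)]


def is_cancellable(a: int, n: int) -> bool:
    """True when a*b = a*c (mod n) implies b = c (mod n)."""
    return gcd(a, n) == 1


for a in (6, 5):
    row = times_table(a, 8)
    # The row is a permutation of 0..7 exactly when a is cancellable modulo 8.
    print(a, row, is_cancellable(a, 8), len(set(row)) == 8)
```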
![Page 21: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/21.jpg)
Basic Modular Exponentiation
![Page 22: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/22.jpg)
How to perform exponentiation efficiently?
$Y = X^E \bmod N = X \cdot X \cdot X \cdots X \bmod N$ ($E$ times)
$E$ may be in the range of $2^{1024} \approx 10^{308}$
Problems:
1. huge storage necessary to store $X^E$ before reduction
2. amount of computations infeasible to perform
Solutions:
1. modulo reduction after each multiplication
2. clever algorithms (200 BC, India, “Chandah-Sûtra”)
![Page 23: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/23.jpg)
Exponentiation: $Y = X^E \bmod N$
$E = (e_{L-1}, e_{L-2}, \ldots, e_1, e_0)_2$

Right-to-left binary exponentiation
    Y = 1; S = X;
    for i = 0 to L-1 {
        if (e_i == 1) Y = Y · S mod N;
        S = S^2 mod N;
    }

Left-to-right binary exponentiation
    Y = 1;
    for i = L-1 downto 0 {
        Y = Y^2 mod N;
        if (e_i == 1) Y = Y · X mod N;
    }
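Both scanning orders can be written as runnable Python. This is a sketch that mirrors the two loops of the slide; the function names are illustrative, and the results are compared against Python's built-in `pow` as a reference.

```python
def exp_right_to_left(x: int, e: int, n: int) -> int:
    """Scan exponent bits from e_0 upward; S holds x^(2^i) mod n."""
    y = 1 % n
    s = x % n
    while e > 0:
        if e & 1:                 # e_i == 1: multiply the result by the current power
            y = (y * s) % n
        s = (s * s) % n           # square for the next bit position
        e >>= 1
    return y


def exp_left_to_right(x: int, e: int, n: int) -> int:
    """Scan exponent bits from e_(L-1) downward; square first, then multiply."""
    y = 1 % n
    for i in range(e.bit_length() - 1, -1, -1):
        y = (y * y) % n
        if (e >> i) & 1:          # e_i == 1: multiply by the base
            y = (y * x) % n
    return y


# Mini-RSA decryption from earlier in the lecture: 8^27 mod 55
print(exp_right_to_left(8, 27, 55), exp_left_to_right(8, 27, 55))
```

The right-to-left form needs two registers (Y and S) but its multiplication and squaring are independent; the left-to-right form needs only Y but performs the two operations in sequence.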
![Page 24: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/24.jpg)
Right-to-Left Binary Exponentiation in Hardware
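[Figure: datapath with a multiplier (MUL) and a squarer (SQR), registers Y (initialized to 1) and S (initialized to X), the exponent register E driving an enable signal, and the output taken from Y.]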
![Page 25: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/25.jpg)
Left-to-Right Binary Exponentiation in Hardware
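[Figure: datapath with a single multiplier (MUL), register Y (initialized to 1), operand X, exponent register E, control logic, and the output taken from Y.]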
![Page 26: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/26.jpg)
Modular Multiplication
![Page 27: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/27.jpg)
Algorithms for Modular Multiplication
Multiplication
• Classical: $\Theta(k^2)$
• Karatsuba: $\Theta(k^{\lg 3})$
• Schönhage-Strassen (FFT): $\Theta(k \ln(k))$
Modular Reduction
• Classical: $\Theta(k^2)$
• Barrett: complexity same as multiplication used
• Selby-Mitchell: $\Theta(k^2)$
Multiplication combined with modular reduction
• Montgomery algorithm: $\Theta(k^2)$
![Page 28: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/28.jpg)
Montgomery Multiplication
![Page 29: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/29.jpg)
Montgomery Modular Multiplication (1)
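$Z = X \cdot Y \bmod M$
Integer domain → Montgomery domain
$X \rightarrow X' = X \cdot 2^n \bmod M$
$Y \rightarrow Y' = Y \cdot 2^n \bmod M$
$Z' = MP(X', Y', M) = X' \cdot Y' \cdot 2^{-n} \bmod M = (X \cdot 2^n) \cdot (Y \cdot 2^n) \cdot 2^{-n} \bmod M = X \cdot Y \cdot 2^n \bmod M$
$Z' = Z \cdot 2^n \bmod M \;\leftrightarrow\; Z = X \cdot Y \bmod M$
$X$, $Y$, $M$ – $(n-1)$-bit numbers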
![Page 30: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/30.jpg)
Montgomery Modular Multiplication (2)
$X \rightarrow X'$:
$X' = MP(X, 2^{2n} \bmod M, M) = X \cdot 2^{2n} \cdot 2^{-n} \bmod M = X \cdot 2^n \bmod M$
$Z' \rightarrow Z$:
$Z = MP(Z', 1, M) = (Z \cdot 2^n) \cdot 1 \cdot 2^{-n} \bmod M = Z \bmod M = Z$
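The domain conversions can be checked numerically. In the sketch below the Montgomery product `mp` is defined purely arithmetically as $a \cdot b \cdot 2^{-n} \bmod M$ using Python's modular inverse (`pow(2, -n, m)`, Python 3.8+), which is an assumption made for illustration and not how a hardware multiplier computes it. The function names and the toy modulus are also illustrative.

```python
def mp(a: int, b: int, m: int, n: int) -> int:
    """Montgomery product by definition: a * b * 2^(-n) mod m (m must be odd)."""
    return (a * b * pow(2, -n, m)) % m


def to_mont(x: int, m: int, n: int) -> int:
    """X -> X' = X * 2^n mod M, obtained as MP(X, 2^(2n) mod M, M)."""
    return mp(x, pow(2, 2 * n, m), m, n)


def from_mont(z_prime: int, m: int, n: int) -> int:
    """Z' -> Z, obtained as MP(Z', 1, M)."""
    return mp(z_prime, 1, m, n)


m, n = 55, 6                      # toy odd modulus and its bit length
x, y = 7, 9
x_p, y_p = to_mont(x, m, n), to_mont(y, m, n)
z_p = mp(x_p, y_p, m, n)          # product stays in the Montgomery domain
print(from_mont(z_p, m, n), (x * y) % m)
```

Because the product of two Montgomery-domain values is again in the Montgomery domain, the two conversions are paid only once per exponentiation, not once per multiplication.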
![Page 31: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/31.jpg)
Basic version of the Radix-2 Montgomery Multiplication Algorithm
![Page 32: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/32.jpg)
![Page 33: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/33.jpg)
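Montgomery Product
$S[0] = 0$
for $i = 0$ to $n-1$:
$$S[i+1] = \begin{cases} \dfrac{S[i] + x_i Y}{2} & \text{if } q_i = (S[i] + x_i Y) \bmod 2 = 0 \\[2mm] \dfrac{S[i] + x_i Y + M}{2} & \text{if } q_i = (S[i] + x_i Y) \bmod 2 = 1 \end{cases}$$
$Z = S[n]$
$M$ assumed to be odd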
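The recurrence above translates almost line by line into Python. This is a software sketch, not a model of any particular hardware architecture. One assumption goes beyond the slide: a final conditional subtraction is added so that the result lies in $[0, M)$, since $S[n]$ itself is only guaranteed to be below $2M$ when $Y < M$. The function names are illustrative.

```python
def mont_product(x: int, y: int, m: int, n: int) -> int:
    """Radix-2 Montgomery product: x * y * 2^(-n) mod m, for odd m and x < 2^n."""
    if m % 2 == 0:
        raise ValueError("M is assumed to be odd")
    s = 0                          # S[0] = 0
    for i in range(n):
        x_i = (x >> i) & 1         # scan X bit by bit, least significant first
        t = s + x_i * y
        if t & 1:                  # q_i = 1: adding odd M makes the sum even
            t += m
        s = t >> 1                 # exact division by 2
    if s >= m:                     # assumption: final correction, not shown on the slide
        s -= m
    return s


def mod_mul(x: int, y: int, m: int) -> int:
    """x * y mod m via conversion to and from the Montgomery domain."""
    n = m.bit_length()
    r2 = pow(2, 2 * n, m)          # constant 2^(2n) mod M used for conversion
    x_p = mont_product(x, r2, m, n)
    y_p = mont_product(y, r2, m, n)
    z_p = mont_product(x_p, y_p, m, n)
    return mont_product(z_p, 1, m, n)


print(mod_mul(7, 9, 55), (7 * 9) % 55)
```

Only additions, a parity test, and right shifts appear inside the loop, which is what makes the algorithm attractive for hardware.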
![Page 34: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/34.jpg)
Basic version of the Radix-2 Montgomery Multiplication Algorithm
![Page 35: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/35.jpg)
Project 2 Rules
- Groups consisting of 2 students (preferred) or a single student (if needed)
- Each group works on different architectures
- Each group of two works on two similar architectures. Members of the group can freely exchange VHDL code and ideas with each other.
- Students working individually work on a single architecture. They must not exchange code with other students.
- Members of the group of two are graded jointly, unless they agree to split no later than two weeks before the Project deadline.
![Page 36: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/36.jpg)
Investigated Montgomery Multipliers
Scalable / Non-Scalable
• McIvor, et al.: based on 5-to-2 CSA; based on 4-to-2 CSA
• Koc & Tenca: radix 2; radix 4
• Huang, et al.: Architecture 2
• Huang, et al.: Architecture 1
• Harris, et al.: radix 2; radix 4
• Suzuki: Virtex 5 DSP; Stratix III DSP
• Savas et al.: radix 2; radix 4
Groups: G1, G2, G3, G4, G5, G6
![Page 37: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/37.jpg)
Investigated Montgomery Multipliers
Non-Scalable
• dedicated to one particular operand size
• operand size is described by a generic, and can be changed only after reconfiguration
• size of the circuit varies as a function of the operand size
Scalable
• flexible, can handle multiple operand sizes
• operand size is described by a special input, and can be changed during run-time
• size of the circuit is constant
![Page 38: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/38.jpg)
Assumptions (1)
Operand sizes:
Evaluated parameters:
• Max. Clock Frequency [MHz]
• Min. Latency [clock cycles]
• Min. Latency [μs]
• Resource Utilization (CLB slices/ALUTs, DSP Units, Block Memories)
• Latency x Area [μs x CLB slices/ALUTs]
![Page 39: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/39.jpg)
Project 2 Rules
- Montgomery Multiplier - required
- Montgomery Exponentiation Unit – bonus
- Virtex 5 and Stratix III – required
- Virtex 6 and Stratix IV - bonus
- 1024 and 2048 bit operand sizes required
- 3072 and 4096 bit operand sizes bonus
![Page 40: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/40.jpg)
Assumptions (2)
• Uniform Interface (to be provided, but may need to be tweaked depending on the architecture)
• Test vectors generated using reference software implementation (may need to be extended to generate intermediate results)
• Your own testbench.
![Page 41: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/41.jpg)
Montgomery Multipliers based on Carry Save Adders
![Page 42: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/42.jpg)
Carry Save Adder (CSA)
[Figure: a row of $n$ full adders (FA) with no carry chain between them. Full adder $i$ takes the input bits $a_i$, $b_i$, $c_i$ and produces a sum bit $s_i$ and a carry bit $c_{i+1}$.]
![Page 43: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/43.jpg)
Operation of a Carry Save Adder (CSA)
Example
weight:  2^4 2^3 2^2 2^1 2^0
x:        0   1   0   1   0
y:        1   1   0   1   1
z:        1   0   1   1   1
s:        0   0   1   1   0
c:        1   1   0   1   1   (each carry bit has the weight of the next higher position)
$x + y + z = s + c$
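The example can be reproduced bitwise. In this sketch each bit position acts as an independent full adder: the sum bit is the XOR of the three inputs and the carry bit is their majority, shifted left by one position to account for its weight. The function name `csa` is an illustrative choice.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3-to-2 carry save adder: returns (s, c) with x + y + z == s + c."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, weighted one position higher
    return s, c


x, y, z = 0b01010, 0b11011, 0b10111
s, c = csa(x, y, z)
print(format(s, "05b"), format(c >> 1, "05b"))  # bit patterns as drawn on the slide
print(x + y + z, s + c)
```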
![Page 44: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/44.jpg)
Carry-save adder for four operands
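[Figure: dot-notation diagram for 4-bit operands. The first CSA level reduces $x$, $y$, $z$ to $s = (s_3 s_2 s_1 s_0)$ and $c = (c_4 c_3 c_2 c_1)$. The second CSA level reduces $s$, $c$ and $w$ to $s' = (s'_3 s'_2 s'_1 s'_0)$ and $c' = (c'_4 c'_3 c'_2 c'_1)$. A final carry-propagate addition gives $S = (S_5 S_4 S_3 S_2 S_1 S_0)$.]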
![Page 45: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/45.jpg)
Carry-save adder for four operands
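[Figure: bit-level structure of the two CSA levels, producing $s_0 \ldots s_3$, $c_1 \ldots c_4$ at the first level and $s'_0 \ldots s'_3$, $c'_1 \ldots c'_4$ at the second level.]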
![Page 46: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/46.jpg)
Carry-save adder for four operands
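[Figure: block diagram. The 4-bit inputs $x$, $y$, $z$ enter the first CSA, which produces $s$ and $c$; the second CSA combines $s$, $c$ and the 4-bit input $w$ into $s'$ and $c'$; a carry-propagate adder (CPA) adds $s'$ and $c'$ to give the final sum $S$.]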
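The four-operand structure (two CSA levels followed by one carry-propagate adder) can be sketched as follows. The ordinary integer addition at the end stands in for the CPA; function names are illustrative.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3-to-2 carry save adder: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def add_four(x: int, y: int, z: int, w: int) -> int:
    """Sum four operands with two CSA levels and a single carry-propagate addition."""
    s, c = csa(x, y, z)          # first level: three operands -> two
    s2, c2 = csa(s, c, w)        # second level: absorbs the fourth operand
    return s2 + c2               # CPA: the only step with carry propagation


print(add_four(9, 7, 15, 12), 9 + 7 + 15 + 12)
```

Carry propagation, the slow part of an addition, happens only once regardless of how many operands are accumulated, which is why Montgomery multipliers keep the running sum in carry-save form.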
![Page 47: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/47.jpg)
Radix-2 Montgomery Multiplication with Carry Save Addition
![Page 48: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/48.jpg)
Carry Save Reduction 4-to-2
U+V+W+Y = S+C
![Page 49: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/49.jpg)
Radix-2 Montgomery Multiplier Based on Carry Save Reduction 4-to-2
![Page 50: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/50.jpg)
Montgomery Multipliers and Exponentiation Units
by McIvor, et al.
![Page 51: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/51.jpg)
5-to-2 CSA
X1+X2+X3+X4+X5 = SUM + CARRY
![Page 52: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/52.jpg)
5-to-2 CSA Montgomery Multiplication
![Page 53: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/53.jpg)
On the fly calculation of Ai
based on the Carry Save Representation of A
A = A1 + A2
![Page 54: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/54.jpg)
Montgomery Exponentiation
![Page 55: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/55.jpg)
Montgomery Exponentiation based on the 5-to-2 CSA Montgomery Multiplier
![Page 56: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/56.jpg)
4-to-2 CSA
X1+X2+X3+X4 = SUM + CARRY
![Page 57: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/57.jpg)
4-to-2 CSA Montgomery Multiplication
![Page 58: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/58.jpg)
Scalable Montgomery Multipliers
by Koc & Tenca
![Page 59: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/59.jpg)
Classical Design by Tenca & Koc, CHES 1999
Multiple Word Radix-2 Montgomery Multiplication algorithm (MWR2MM)
Main ideas:
• Use of short precision words (w-bit each):
  – Reduces broadcast problem in circuit implementation
  – Word-oriented algorithm provides the support needed to develop scalable hardware units.
• Operand Y (multiplicand) is scanned word-by-word, operand X (multiplier) is scanned bit-by-bit.
![Page 60: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/60.jpg)
Classical Design by Tenca & Koc, CHES 1999
Each operand has $n$ bits and $e$ words, where $e = \left\lceil \frac{n+1}{w} \right\rceil$.
Each word has $w$ bits.
$X = (x_{n-1}, \ldots, x_1, x_0)$
$Y = (Y^{(e-1)}, \ldots, Y^{(1)}, Y^{(0)})$
$M = (M^{(e-1)}, \ldots, M^{(1)}, M^{(0)})$
The bits are marked with subscripts, and the words are marked with superscripts.
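The word-by-word, bit-by-bit scanning can be illustrated in software. The sketch below is a simplified reading of the multiple-word idea, not a transcription of the published MWR2MM pseudocode: X is scanned one bit per outer iteration, Y and M are consumed one w-bit word per inner iteration, and the right shift by one bit is done word by word as soon as the next word is available. One extra word is kept at the top to absorb the carry, which is an assumption of this sketch. All names are illustrative.

```python
from math import ceil


def split_words(v: int, w: int, count: int) -> list[int]:
    """Split v into `count` words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(v >> (w * j)) & mask for j in range(count)]


def mwr2mm_sketch(x: int, y: int, m: int, n: int, w: int) -> int:
    """Word-serial radix-2 Montgomery product; result is congruent to x*y*2^(-n) mod m."""
    e = ceil((n + 1) / w)                 # number of words per operand
    mask = (1 << w) - 1
    yw = split_words(y, w, e + 1)         # extra zero word on top (sketch assumption)
    mw = split_words(m, w, e + 1)
    s = [0] * (e + 1)
    for i in range(n):                    # X scanned bit by bit
        x_i = (x >> i) & 1
        t = s[0] + x_i * yw[0]
        q = t & 1                         # parity decides whether M is added
        t += q * mw[0]
        carry, s[0] = t >> w, t & mask
        for j in range(1, e + 1):         # Y and M scanned word by word
            t = carry + s[j] + x_i * yw[j] + q * mw[j]
            carry, s[j] = t >> w, t & mask
            # shift right by one: bit 0 of word j becomes the top bit of word j-1
            s[j - 1] = ((s[j] & 1) << (w - 1)) | (s[j - 1] >> 1)
        s[e] >>= 1                        # finish the shift in the top word
    return sum(word << (w * j) for j, word in enumerate(s))


m, n = 55, 6
print(mwr2mm_sketch(7, 9, m, n, w=4) % m, (7 * 9 * pow(2, -n, m)) % m)
```

The returned value is below $2M$ for $Y < M$, so a final conditional subtraction (or the `% m` used above) brings it into $[0, M)$. Changing `w` changes only the word size of the datapath, not the result.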
![Page 61: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/61.jpg)
MWR2MM: Multiple Word Radix-2 Montgomery Multiplication algorithm by Tenca and Koc
[Figure: pseudocode of the algorithm with annotations "Task A", "Task B", "Task C", and "e-1 times".]
![Page 62: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/62.jpg)
Problem
[Figure: bit positions of the words $S^{(0)}$, $S^{(1)}$, $S^{(2)}$, $S^{(3)}$ (bits $w-1 \ldots 0$, $2w-1 \ldots w$, $3w-1 \ldots 2w$, $4w-1 \ldots 3w$) in the iterations for $x_0$ and $x_1$; after each iteration the bit positions are shifted by one.]
Calculation dependent on $x_1$ ($x_{i+1}$ in general) can start only two clock cycles after the calculation dependent on $x_0$ ($x_i$ in general).
![Page 63: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/63.jpg)
Data Dependency Graph by Tenca & Koc
• One PE is in charge of the computation of one column that corresponds to the updating of S with respect to one single bit $x_i$.
• The delay between two adjacent PEs is 2 clock cycles.
• The minimum computation time is $2 \cdot n + e - 1$ clock cycles, given $\left\lceil \frac{e+1}{2} \right\rceil$ PEs working in parallel.
[Figure: data dependency graph with columns $i = 0, 1, 2$ and rows $j = 0, \ldots, 5$.]
![Page 64: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/64.jpg)
Data Dependency Graph by Tenca & Koc
![Page 65: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/65.jpg)
Example of Operation of the Design by Tenca & Koc
Example of the computation executed for 5-bit operands with word-size w = 1 bit
$n = 5$, $w = 1$, $e = 5$
$2n + e - 1 = 2 \cdot 5 + 5 - 1 = 14$ clock cycles
$\lceil (e+1)/2 \rceil = \lceil (5+1)/2 \rceil = 3$ PEs sufficient to perform all computations
![Page 66: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/66.jpg)
Example of Operation of the Design by Tenca & Koc
Example of the computation executed for 5-bit operands with word-size w = 1 bit
$n = 5$, $w = 1$, $e = 5$
2 PEs
![Page 67: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/67.jpg)
Pipelined Organization with Two Processing Elements
![Page 68: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/68.jpg)
Non-Scalable Montgomery Multiplier
by Huang et al.
![Page 69: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/69.jpg)
Main Idea of the New Architecture
• In the architecture of Tenca & Koc
  – $w-1$ least significant bits of partial results $S^{(j)}$ are available one clock cycle before they are used
  – only one (most significant) bit is missing
• Let us compute a new partial result under two assumptions regarding the value of the most significant bit of $S^{(j)}$ and choose the correct value one clock cycle later
![Page 70: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/70.jpg)
Idea for a Speed-up
[Figure: words $S^{(0)}$ (bits $w-1 \ldots 0$) and $S^{(1)}$ (bits $2w-1 \ldots w$) in the iterations for $x_0$ and $x_1$, with the missing most significant bit replaced by the two candidate values 0 and 1.]
• Perform two computations in parallel using two possible values of the most-significant bit.
• Choose between the two possible results using the missing bit computed at the same time.
![Page 71: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/71.jpg)
Primary Advantage of the New Approach
• Reduction in the number of clock cycles from $2n + e - 1$ to $n + e - 1$
• Minimum penalty in terms of the area and clock period
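The two cycle-count formulas can be compared directly. The sketch below simply evaluates them, together with the PE count quoted earlier in the lecture for the Tenca & Koc design; the function names are illustrative, and the 5-bit case reuses the parameters of the earlier example ($n = 5$, $e = 5$).

```python
from math import ceil


def tenca_koc_cycles(n: int, e: int) -> int:
    """Minimum latency of the Tenca & Koc design: 2n + e - 1 clock cycles."""
    return 2 * n + e - 1


def tenca_koc_pes(e: int) -> int:
    """Number of PEs needed to reach that latency: ceil((e + 1) / 2)."""
    return ceil((e + 1) / 2)


def new_architecture_cycles(n: int, e: int) -> int:
    """Latency of the architecture with speculative most significant bit: n + e - 1."""
    return n + e - 1


n, e = 5, 5
print(tenca_koc_cycles(n, e), tenca_koc_pes(e), new_architecture_cycles(n, e))
```

For large $n$ and small $e$ the ratio of the two latencies approaches 2, since the dominant term drops from $2n$ to $n$.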
![Page 72: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/72.jpg)
Pseudocode of the Main Processing Element
![Page 73: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/73.jpg)
Main Processing Element
Type E
![Page 74: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/74.jpg)
The Proposed Optimized Hardware Architecture
![Page 75: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/75.jpg)
The First and the Last Processing Elements
Type D Type F
![Page 76: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/76.jpg)
Data Dependency Graph of the Proposed New Architecture
PE#0 PE#1 PE#2 PE#3
![Page 77: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/77.jpg)
The Overall Computation Pattern
Tenca & Koc, CHES 1999: special state of each PE
vs.
Our new proposed architecture: one special PE type, simpler structure of each PE
![Page 78: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/78.jpg)
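Demonstration of Computations
• Sequential
[Figure: the words $S^{(0)}, S^{(1)}, S^{(2)}, \ldots, S^{(e-1)}$ are updated one after another for bit $X_0$.]
• Tenca & Koç’s proposal
[Figure: PE#0, PE#1 and PE#2 handle bits $X_0$, $X_1$ and $X_2$ respectively; each PE steps through the words $S^{(0)}, S^{(1)}, S^{(2)}, \ldots$ and starts after the previous PE.]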
![Page 79: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/79.jpg)
Demonstration of Computations (cont.)
• The proposed optimized architecture
[Figure: PE#0, PE#1, PE#2, PE#3, …, PE#(e-1); PE#j always handles word $S^{(j)}$ and processes the bits $X_0, X_1, X_2, \ldots$ in successive clock cycles, so that in one cycle PE#0 works on $X_{e-1}$ while PE#(e-1) works on $X_0$.]
![Page 80: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/80.jpg)
Normalized Latency
[Bar chart: vertical axis from 0.00 to 2.00; horizontal axis: operand size (1024, 2048, 3072, 4096); series: Huang et al., Tenca & Koc, McIvor et al. Data labels per operand size: 1024: 1.76, 0.76; 2048: 1.76, 0.85; 3072: 1.76, 0.81; 4096: 1.76, 1.01.]
![Page 81: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/81.jpg)
Normalized Product Latency Times Area
[Bar chart: vertical axis from 0.00 to 2.00; horizontal axis: operand size (1024, 2048, 3072, 4096); series: Huang et al., Tenca & Koc, McIvor et al. Data labels per operand size: 1024: 1.66, 1.14; 2048: 1.64, 1.28; 3072: 1.63, 1.21; 4096: 1.63, 1.55.]
![Page 82: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/82.jpg)
Scalable Montgomery Multiplier
by Huang et al.
![Page 83: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/83.jpg)
![Page 84: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/84.jpg)
![Page 85: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/85.jpg)
![Page 86: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/86.jpg)
![Page 87: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/87.jpg)
Computations for 5-bit operands using a) 3 PEs, b) 2 PEs
![Page 88: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/88.jpg)
Faster Modular Exponentiation
![Page 89: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/89.jpg)
![Page 90: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/90.jpg)
![Page 91: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/91.jpg)
![Page 92: Lecture 7 Montgomery Multipliers & Exponentiation Units](https://reader036.vdocuments.us/reader036/viewer/2022062301/56649e2d5503460f94b1c787/html5/thumbnails/92.jpg)