ECE 645 Spring 2007
Project 2 Specification
Topic Options
Public Key (Asymmetric) Cryptosystems
Figure: Alice encrypts a message with Bob's public key K_B and sends it over the network; Bob decrypts it with his private key k_B.
RSA as a trap-door one-way function
Encryption with the PUBLIC KEY: $C = f(M) = M^e \bmod N$
Decryption with the PRIVATE KEY: $M = f^{-1}(C) = C^d \bmod N$
where $N = P \cdot Q$, with P, Q large prime numbers, and $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
RSA keys
PUBLIC KEY: {e, N}
PRIVATE KEY: {d, P, Q}
where $N = P \cdot Q$, P and Q are large prime numbers, and $e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$
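A toy numeric instance can make the key relations concrete. The Python sketch below assumes the small textbook primes P = 61, Q = 53 and e = 17 (values chosen for illustration only; real RSA primes are hundreds of digits long). The helper names `encrypt` and `decrypt` are hypothetical, and the modular inverse relies on Python 3.8+ `pow(e, -1, phi)`.

```python
# Textbook RSA with toy parameters (assumed values, far too small for real use).
P, Q = 61, 53                 # small primes, illustration only
N = P * Q                     # public modulus
phi = (P - 1) * (Q - 1)       # (P-1)(Q-1)
e = 17                        # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)           # private exponent: e*d = 1 mod (P-1)(Q-1)


def encrypt(m: int) -> int:
    """C = M^e mod N, using the public key {e, N}."""
    return pow(m, e, N)


def decrypt(c: int) -> int:
    """M = C^d mod N, using the private exponent d."""
    return pow(c, d, N)
```

With these values N = 3233 and d = 2753; the message 65 encrypts to 2790 and decrypts back to 65.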
Early Factoring Device – Lehmer Sieve: bicycle chain sieve [D. H. Lehmer, 1928] (Computer Museum, Mountain View, CA)
Supercomputer Cray-1 from the 1980s (Computer Museum, Mountain View, CA)
FPGA-based supercomputers
Machine                            Released
SRC 6 from SRC Computers           2002
Cray XD1 from Cray                 2005
SGI Altix from SGI                 2005
SRC 7 from SRC Computers, Inc.     2006
COPACOBANA (Ruhr University Bochum and University of Kiel, Germany, 2006)
120 Spartan 3 FPGAs, clock frequency 100 MHz
Cost: €8980
Factoring 1024-bit RSA keys using the Number Field Sieve (NFS)
Stages: Polynomial Selection → Relation Collection (Sieving, Cofactoring) → Linear Algebra → Square Root
Cofactoring of 200-bit and 350-bit numbers: trial division; ECM, p-1 method, rho method
Topic 1
Trial Division Sieve
Topic 1: Trial Division Sieve (1)
Given:
Inputs:
Variables:
1. Integers $N_1, N_2, N_3, \ldots$, each of size k bits
Constants:
2. Factor base = set of all primes not exceeding a certain bound B
$= \{p_1 = 2, p_2 = 3, p_3 = 5, \ldots, p_t \le B\}$
Parameters of interest: $4 \le k \le 512$, $3 \le B \le 10^5$
Topic 1: Trial Division Sieve (2)
Required:
Outputs:
For each integer $N_i$:
a list of primes from the factor base that divide $N_i$, and
the number of times each prime divides $N_i$.
For example, if
$N_i = p_1^{e_1} \cdot p_2^{e_2} \cdot p_3^{e_3} \cdot M_i$,
where $M_i$ is not divisible by any prime belonging to the factor base, then the output is
{p_1, e_1}, {p_2, e_2}, {p_3, e_3}
Topic 1: Trial Division Sieve (3)
Example:
Constants: k = 10, B = 5, factor base = {2, 3, 5}
Variables:
$N_1 = 408 = 2^3 \cdot 3 \cdot 17$
$N_2 = 630 = 2 \cdot 3^2 \cdot 5 \cdot 7$
Outputs:
for $N_1$: {2, 3}, {3, 1}
for $N_2$: {2, 1}, {3, 2}, {5, 1}
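A software reference model is handy for generating expected outputs when testing a hardware trial division sieve. The Python sketch below reproduces the example above; the function names `primes_up_to` and `trial_division` are hypothetical, each $N_i$ is assumed to be a positive integer, and the pairs {p, e} are returned as tuples. It does not model the hardware throughput criterion.

```python
from math import isqrt


def primes_up_to(bound: int) -> list[int]:
    """Factor base: all primes p <= bound (sieve of Eratosthenes)."""
    if bound < 2:
        return []
    is_prime = [True] * (bound + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, isqrt(bound) + 1):
        if is_prime[i]:
            for j in range(i * i, bound + 1, i):
                is_prime[j] = False
    return [p for p, flag in enumerate(is_prime) if flag]


def trial_division(n: int, factor_base: list[int]) -> list[tuple[int, int]]:
    """Return [(p, e), ...] for each p in the factor base dividing n exactly e >= 1 times."""
    if n < 1:
        raise ValueError("N_i must be a positive integer")
    result = []
    for p in factor_base:
        e = 0
        while n % p == 0:          # divide p out as long as it divides n
            n //= p
            e += 1
        if e > 0:
            result.append((p, e))
    return result                  # what remains in n is the cofactor M_i
```

For example, `trial_division(630, primes_up_to(5))` gives `[(2, 1), (3, 2), (5, 1)]`.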
Topic 1: Trial Division Sieve (4)
Optimization Criteria:
Maximum number of integers $N_i$ fully processed per unit of time for given k and B.
Topic 2
Greatest Common Divisor & Multiplicative Inverse
Topic 2: Greatest Common Divisor and Multiplicative Inverse (2)
Given:
Inputs: a, N: k-bit integers; a < N
Outputs:
$y = \gcd(a, N)$
$x = a^{-1} \bmod N$, i.e., the integer $1 \le x < N$ such that $a \cdot x \bmod N = 1$ (it exists only when $\gcd(a, N) = 1$)
Parameters of interest: $4 \le k \le 1024$
Greatest common divisor
The greatest common divisor of a and b, denoted by gcd(a, b), is the largest positive integer that divides both a and b.
$d = \gcd(a, b)$ iff
1) $d \mid a$ and $d \mid b$
2) if $c \mid a$ and $c \mid b$, then $c \le d$
gcd (8, 44) =
gcd (-15, 65) =
gcd (45, 30) =
gcd (31, 15) =
gcd (121, 169) =
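Quotient and remainder
Given integers a and n, n > 0,
$\exists!\, q, r \in \mathbb{Z}$ such that $a = q \cdot n + r$ and $0 \le r < n$
q – quotient
r – remainder (of a divided by n)
$q = \lfloor a/n \rfloor = a \operatorname{div} n$
$r = a - q \cdot n = a - \lfloor a/n \rfloor \cdot n = a \bmod n$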
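In software, Python's `divmod` already follows this definition for n > 0 (floor division), which makes it convenient for reference models; C's `/` and `%` truncate toward zero instead, so they differ from this definition for negative a. A minimal sketch (`quotient_remainder` is a hypothetical name):

```python
def quotient_remainder(a: int, n: int) -> tuple[int, int]:
    """Return (q, r) with a = q*n + r and 0 <= r < n, for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    q, r = divmod(a, n)            # Python floors: q = a div n, r = a mod n
    assert a == q * n + r and 0 <= r < n
    return q, r
```

`quotient_remainder(-17, 5)` returns `(-4, 3)`, whereas C99 integer arithmetic gives `-17 / 5 == -3` and `-17 % 5 == -2`.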
Euclid's Algorithm for computing gcd(a, b)
i      r_i                          q_i
-2     $r_{-2} = \max(a, b)$
-1     $r_{-1} = \min(a, b)$        $q_{-1}$
0      $r_0$                        $q_0$
1      $r_1$                        $q_1$
…      …                            …
t-1    $r_{t-1} = \gcd(a, b)$       $q_{t-1}$
t      $r_t = 0$
$q_i = \lfloor r_{i-1} / r_i \rfloor$
$r_{i+1} = r_{i-1} - q_i \cdot r_i = r_{i-1} \bmod r_i$
Euclid's Algorithm, example: gcd(36, 126)
i      r_i                               q_i
-2     $r_{-2} = \max(a, b) = 126$
-1     $r_{-1} = \min(a, b) = 36$        $q_{-1} = 3$
0      $r_0 = 18 = \gcd(36, 126)$         $q_0 = 2$
1      $r_1 = 0$
$q_i = \lfloor r_{i-1} / r_i \rfloor$, $r_{i+1} = r_{i-1} - q_i \cdot r_i = r_{i-1} \bmod r_i$
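The table translates directly into a loop that keeps only the last two remainders. This Python sketch (the name `euclid_gcd` is hypothetical) takes absolute values first, an assumption that extends the table to negative inputs such as gcd(-15, 65); it can be used to check the gcd exercises above.

```python
def euclid_gcd(a: int, b: int) -> int:
    """gcd(a, b): r_{-2} = max, r_{-1} = min, r_{i+1} = r_{i-1} mod r_i."""
    r_prev, r_cur = max(abs(a), abs(b)), min(abs(a), abs(b))
    while r_cur != 0:
        q = r_prev // r_cur                          # q_i = floor(r_{i-1} / r_i)
        r_prev, r_cur = r_cur, r_prev - q * r_cur    # r_{i+1} = r_{i-1} - q_i r_i
    return r_prev                                    # r_{t-1}, because r_t = 0
```

`euclid_gcd(36, 126)` walks through exactly the rows of the example: 126, 36, 18, 0.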
Multiplicative inverse modulo n
The multiplicative inverse of a modulo n is an integer x such that
$a \cdot x \equiv 1 \pmod{n}$
The multiplicative inverse of a modulo n is denoted by $a^{-1} \bmod n$ (in some books $\bar{a}$ or $a^*$).
According to this notation:
$a \cdot a^{-1} \equiv 1 \pmod{n}$
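Extended Euclid's Algorithm (1)
i      r_i              x_i              y_i              q_i
-2     $r_{-2} = n$     $x_{-2} = 0$     $y_{-2} = 1$
-1     $r_{-1} = a$     $x_{-1} = 1$     $y_{-1} = 0$     $q_{-1} = \lfloor n/a \rfloor$
0      $r_0$            $x_0$            $y_0$            $q_0$
1      $r_1$            $x_1$            $y_1$            $q_1$
…      …                …                …                …
t-1    $r_{t-1}$        $x_{t-1}$        $y_{t-1}$        $q_{t-1}$
t      $r_t = 0$        $x_t$            $y_t$
$q_i = \lfloor r_{i-1} / r_i \rfloor$
$r_{i+1} = r_{i-1} - q_i \cdot r_i$
$x_{i+1} = x_{i-1} - q_i \cdot x_i$
$y_{i+1} = y_{i-1} - q_i \cdot y_i$
$r_i = x_i \cdot a + y_i \cdot n$; in particular, $r_{t-1} = x_{t-1} \cdot a + y_{t-1} \cdot n$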
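Extended Euclid's Algorithm (2)
$r_{t-1} = x_{t-1} \cdot a + y_{t-1} \cdot n \equiv x_{t-1} \cdot a \pmod{n}$
If $r_{t-1} = \gcd(a, n) = 1$, then
$x_{t-1} \cdot a \equiv 1 \pmod{n}$
and as a result
$x_{t-1} \equiv a^{-1} \pmod{n}$ (reduce $x_{t-1}$ modulo n if it is negative)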
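A Python sketch of the full table, tracking $r_i$, $x_i$ and $y_i$ and checking the relation $r_i = x_i \cdot a + y_i \cdot n$ at every step. The function name `extended_euclid` is hypothetical and the code assumes 0 < a < n. The returned $x_{t-1}$ may be negative; `x % n` maps it to $a^{-1} \bmod n$.

```python
def extended_euclid(a: int, n: int) -> tuple[int, int, int]:
    """Return (r_{t-1}, x_{t-1}, y_{t-1}) with r_{t-1} = gcd(a, n) = x*a + y*n.

    Initial rows: r_{-2} = n, r_{-1} = a; x_{-2} = 0, x_{-1} = 1;
    y_{-2} = 1, y_{-1} = 0. Assumes 0 < a < n.
    """
    r_prev, r_cur = n, a
    x_prev, x_cur = 0, 1
    y_prev, y_cur = 1, 0
    while r_cur != 0:
        q = r_prev // r_cur                          # q_i
        r_prev, r_cur = r_cur, r_prev - q * r_cur    # r_{i+1}
        x_prev, x_cur = x_cur, x_prev - q * x_cur    # x_{i+1}
        y_prev, y_cur = y_cur, y_prev - q * y_cur    # y_{i+1}
        assert r_cur == x_cur * a + y_cur * n        # r_i = x_i a + y_i n holds
    return r_prev, x_prev, y_prev                    # row t-1, since r_t = 0
```

For a = 20, n = 117 it returns (1, 41, -7), i.e. 1 = 41·20 - 7·117.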
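Extended Euclid's Algorithm for computing $z = a^{-1} \bmod n$
i      r_i              x_i                                    q_i
-2     $r_{-2} = n$     $x_{-2} = 0$
-1     $r_{-1} = a$     $x_{-1} = 1$                           $q_{-1} = \lfloor n/a \rfloor$
0      $r_0$            $x_0$                                  $q_0$
1      $r_1$            $x_1$                                  $q_1$
…      …                …                                      …
t-1    $r_{t-1} = 1$    $x_{t-1} \equiv a^{-1} \pmod{n}$       $q_{t-1}$
t      $r_t = 0$        $x_t = \pm n$
$q_i = \lfloor r_{i-1} / r_i \rfloor$
$r_{i+1} = r_{i-1} - q_i \cdot r_i$
$x_{i+1} = x_{i-1} - q_i \cdot x_i$
Note: if $r_{t-1} \ne 1$, the inverse does not exist.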
Extended Euclid's Algorithm, example: $z = 20^{-1} \bmod 117$
i      r_i      q_i      x_i
-2     117               0
-1     20       5        1
0      17       1        -5
1      3        5        6
2      2        1        -35
3      1        2        41 = $20^{-1} \bmod 117$
4      0                 -117
$q_i = \lfloor r_{i-1} / r_i \rfloor$, $r_{i+1} = r_{i-1} - q_i \cdot r_i$, $x_{i+1} = x_{i-1} - q_i \cdot x_i$
Check: $20 \cdot 41 \bmod 117 = 1$
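For Topic 2, a reference model needs only the $r_i$ and $x_i$ columns, because $y_i$ is never used to form the inverse. The Python sketch below (hypothetical name `gcd_and_inverse`, assuming 0 < a < N) returns both required outputs, y = gcd(a, N) and x = a^{-1} mod N, with `None` standing in for x when the inverse does not exist.

```python
from typing import Optional


def gcd_and_inverse(a: int, n: int) -> tuple[int, Optional[int]]:
    """Return (y, x): y = gcd(a, n) and x = a^-1 mod n in [1, n), or x = None if y != 1."""
    r_prev, r_cur = n, a          # r_{-2} = n, r_{-1} = a
    x_prev, x_cur = 0, 1          # x_{-2} = 0, x_{-1} = 1
    while r_cur != 0:
        q = r_prev // r_cur
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        x_prev, x_cur = x_cur, x_prev - q * x_cur
    if r_prev != 1:
        return r_prev, None       # r_{t-1} != 1: the inverse does not exist
    return 1, x_prev % n          # map a possibly negative x_{t-1} into [1, n)
```

`gcd_and_inverse(20, 117)` returns `(1, 41)`, matching the table; `gcd_and_inverse(36, 126)` returns `(18, None)`.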
Topic 3
RSA Encryption & Decryption with Montgomery Multipliers based on Carry Save Adders
Exponentiation: $Y = X^E \bmod N$, where $E = (e_{L-1}, e_{L-2}, \ldots, e_1, e_0)_2$

Right-to-left binary exponentiation:
    Y = 1; S = X;
    for i = 0 to L-1 {
        if (e_i == 1) Y = Y * S mod N;
        S = S^2 mod N;
    }

Left-to-right binary exponentiation:
    Y = 1;
    for i = L-1 downto 0 {
        Y = Y^2 mod N;
        if (e_i == 1) Y = Y * X mod N;
    }
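Both loops can be transcribed into Python and checked against the built-in three-argument `pow`. In this sketch the function names are hypothetical, the exponent bits $e_i$ are read with shifts, and L is taken as `e.bit_length()`; `1 % n` is used so the result is also correct for the degenerate modulus n = 1.

```python
def exp_right_to_left(x: int, e: int, n: int) -> int:
    """Y = X^E mod N, scanning exponent bits e_0 ... e_{L-1}."""
    y, s = 1 % n, x % n
    for i in range(e.bit_length()):      # L = number of bits of E
        if (e >> i) & 1:                 # e_i == 1
            y = (y * s) % n              # Y = Y*S mod N
        s = (s * s) % n                  # S = S^2 mod N
    return y


def exp_left_to_right(x: int, e: int, n: int) -> int:
    """Y = X^E mod N, scanning exponent bits e_{L-1} ... e_0."""
    y = 1 % n
    for i in reversed(range(e.bit_length())):
        y = (y * y) % n                  # Y = Y^2 mod N
        if (e >> i) & 1:                 # e_i == 1
            y = (y * x) % n              # Y = Y*X mod N
    return y
```

One design difference is visible in the code: in the right-to-left version, the multiplication Y·S and the squaring S² within one iteration are independent and could be computed by two multipliers in parallel, while the left-to-right version always multiplies by the fixed base X.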
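Montgomery Modular Multiplication (1)
$C = A \cdot B \bmod M$, where A, B, M are k-bit numbers (M odd, so that $2^{-k} \bmod M$ exists)
Integer domain → Montgomery domain:
$A' = A \cdot 2^k \bmod M$
$B' = B \cdot 2^k \bmod M$
$C' = MP(A', B', M) = A' \cdot B' \cdot 2^{-k} \bmod M = (A \cdot 2^k)(B \cdot 2^k) \cdot 2^{-k} \bmod M = A \cdot B \cdot 2^k \bmod M$
Hence $C' = C \cdot 2^k \bmod M$, where $C = A \cdot B \bmod M$.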
Montgomery Modular Multiplication (2)
Conversion A → A': $A' = MP(A, 2^{2k} \bmod M, M)$
Conversion C' → C: $C = MP(C', 1, M)$
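Montgomery Modular Multiplication (3)
$X = A' \cdot B'$ is a 2k-bit number with digits $x_{2n-1} x_{2n-2} \ldots x_1 x_0$ (radix b, with $b^n = 2^k$).
Adding $q_0 \cdot M$ clears digit $x_0$; adding $q_1 \cdot M \cdot b$ clears the next digit; and so on, until the k low-order bits are all 0. The k high-order bits of the sum form C':
$C' \cdot 2^k = X + z \cdot M$, where $X = A' \cdot B'$ and $z = \sum_i q_i b^i$
$C' \equiv A' \cdot B' \cdot 2^{-k} \pmod{M}$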
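A bit-level model of the reduction pictured above, assuming radix b = 2 (so n = k and each $q_i$ is a single bit). This Python sketch uses hypothetical names and ordinary integer addition, not the carry-save adders that Topic 3 targets. It also adds the usual final conditional subtraction, which the figure does not show, because the shifted sum is only guaranteed to be below 2M. It assumes M odd, M < 2^k, and inputs below M.

```python
def mont_product(a: int, b: int, m: int, k: int) -> int:
    """MP(A, B, M) = A*B*2^(-k) mod M with radix b = 2 (assumption).

    Assumes M odd, M < 2^k, and 0 <= A, B < M.
    """
    x = a * b                        # X = A*B, at most 2k bits
    for i in range(k):
        if (x >> i) & 1:             # q_i = bit i of the running sum
            x += m << i              # add q_i * M * 2^i: M odd, so bit i becomes 0
    c = x >> k                       # low k bits are now zero: C' = (X + z*M) / 2^k
    return c - m if c >= m else c    # C' < 2M, so one subtraction suffices


def to_mont(a: int, m: int, k: int) -> int:
    """A' = MP(A, 2^(2k) mod M, M) = A*2^k mod M."""
    return mont_product(a, pow(2, 2 * k, m), m, k)


def from_mont(c_prime: int, m: int, k: int) -> int:
    """C = MP(C', 1, M) = C'*2^(-k) mod M."""
    return mont_product(c_prime, 1, m, k)
```

With M = 117 and k = 7, `from_mont(mont_product(to_mont(a, 117, 7), to_mont(b, 117, 7), 117, 7), 117, 7)` equals `a * b % 117` for all a, b below 117.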
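Fast modular exponentiation using the Chinese Remainder Theorem
$M = C^d \bmod N$ is computed from two half-size exponentiations:
$M_P = C_P^{d_P} \bmod P$, where $C_P = C \bmod P$, $d_P = d \bmod (P-1)$
$M_Q = C_Q^{d_Q} \bmod Q$, where $C_Q = C \bmod Q$, $d_Q = d \bmod (Q-1)$
$M = (M_P \cdot R_Q + M_Q \cdot R_P) \bmod N$, where
$R_P = (P^{-1} \bmod Q) \cdot P = P^{Q-1} \bmod N$
$R_Q = (Q^{-1} \bmod P) \cdot Q = Q^{P-1} \bmod N$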
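The CRT recombination can be checked on a toy key. This Python sketch assumes the small textbook parameters P = 61, Q = 53, d = 2753 (the private exponent for e = 17), chosen for illustration only, and precomputes $d_P$, $d_Q$, $R_P$ and $R_Q$ once per key; `crt_decrypt` is a hypothetical name. It relies on Python 3.8+ `pow(x, -1, m)`.

```python
# Toy key, assumed for illustration only (not part of the project specification).
P, Q = 61, 53
N = P * Q
d = 2753                               # private exponent matching e = 17

# Precomputed once per key.
D_P, D_Q = d % (P - 1), d % (Q - 1)    # d_P, d_Q
R_P = pow(P, -1, Q) * P                # (P^-1 mod Q)*P: = 1 mod Q, = 0 mod P
R_Q = pow(Q, -1, P) * Q                # (Q^-1 mod P)*Q: = 1 mod P, = 0 mod Q


def crt_decrypt(c: int) -> int:
    """M = C^d mod N via two half-size exponentiations and CRT recombination."""
    m_p = pow(c % P, D_P, P)           # M_P = C_P^(d_P) mod P
    m_q = pow(c % Q, D_Q, Q)           # M_Q = C_Q^(d_Q) mod Q
    return (m_p * R_Q + m_q * R_P) % N
```

Because $R_Q \equiv 1 \pmod P$ and $R_P \equiv 0 \pmod P$, the result reduces to $M_P$ modulo P, and symmetrically to $M_Q$ modulo Q.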
Time of exponentiation without and with the Chinese Remainder Theorem
SOFTWARE
Without CRT: $t_{EXP}(k) = c_s \cdot k^3$
With CRT: $t_{EXP-CRT}(k) \approx 2 \cdot c_s \cdot (k/2)^3 = \frac{1}{4}\, t_{EXP}(k)$
HARDWARE
Without CRT: $t_{EXP}(k) = c_h \cdot k^2$
With CRT: $t_{EXP-CRT}(k) \approx c_h \cdot (k/2)^2 = \frac{1}{4}\, t_{EXP}(k)$
Topic 4
RSA Encryption & Decryption with Word-Based Montgomery Multipliers
Data dependency graph of a classical architecture by Tenca & Koc
Data dependency graph of a new design from GWU & GMU
Block diagram of the new architecture
Block diagram of the main Processing Element
Topic 5
p-1 Method of Factoring
p-1 algorithm
Inputs :
N – number to be factored
a – arbitrary integer such that gcd(a, N)=1
B1 – smoothness bound for Phase1
Outputs:
q - factor of N, 1 < q ≤ N
or FAIL
p-1 algorithm – Phase 1
Precomputations:
1: $k = \prod_{p_i \le B_1} p_i^{e_i}$, where $p_i$ are consecutive primes and $e_i$ is the largest exponent such that $p_i^{e_i} \le B_1$
Main computations:
2: $q_0 = a^k \bmod N$
Postcomputations:
3: $q = \gcd(q_0 - 1, N)$
4: if $q > 1$
5:     return q (factor of N)
6: else
7:     go to Phase 2 (out of scope for this project)
8: end if
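p-1 Phase 1 – Numerical example
$N = 1\,740\,719 = 1279 \cdot 1361$
$a = 2$
$B_1 = 20$, $k = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19 = 232\,792\,560$
$q_0 = a^k \bmod N = 2^{232\,792\,560} \bmod 1\,740\,719 = 1\,003\,058$
$q = \gcd(1\,003\,058 - 1, 1\,740\,719) = 1361$
Why did the method work?
$q - 1 = 1360 = 2^4 \cdot 5 \cdot 17 \mid k$, so $k = (q-1) \cdot m$ for some integer m
$a^k \bmod q = a^{(q-1) \cdot m} \bmod q = 1$ (Fermat's little theorem)
hence $q \mid a^k - 1$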
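Phase 1 as a Python sketch that reproduces the numerical example. The names `smooth_exponent` and `p_minus_1_phase1` are hypothetical; primes up to $B_1$ are found by naive trial division (adequate only for small $B_1$), and `None` stands for the branch that would continue to Phase 2, which is out of scope here.

```python
from math import gcd, isqrt
from typing import Optional


def smooth_exponent(b1: int) -> int:
    """k = prod p_i^e_i over primes p_i <= B1, e_i largest with p_i^e_i <= B1."""
    k = 1
    for p in range(2, b1 + 1):
        if all(p % f for f in range(2, isqrt(p) + 1)):   # p is prime
            pe = p
            while pe * p <= b1:       # largest power of p not above B1
                pe *= p
            k *= pe
    return k


def p_minus_1_phase1(n: int, a: int, b1: int) -> Optional[int]:
    """Return q with 1 < q <= N, or None where Phase 2 would be entered."""
    k = smooth_exponent(b1)           # precomputation
    q0 = pow(a, k, n)                 # main computation: q0 = a^k mod N
    q = gcd(q0 - 1, n)                # postcomputation
    return q if q > 1 else None
```

`p_minus_1_phase1(1740719, 2, 20)` finds 1361; the other factor, 1279, is missed because 1278 = 2·3²·71 contains the prime 71 > B₁.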
Design Methodology Options
by Mike Babst, DSPlogic
Methodology 1
RTL VHDL
Classical VHDL-based Design Methodology
Structure of a Typical Digital System
Figure: Execution Unit (Datapath) with Data Inputs and Data Outputs; Control Unit (Control) with Control Inputs and Control Outputs; the two units are connected by Control Signals.
Hardware Design with RTL VHDL
Figure: from the Pseudocode and the Interface, a block diagram of the Execution Unit and a block diagram or ASM chart of the Control Unit are derived, and each is translated into VHDL code.
Steps of the Design Process
1. Text description
2. Interface
3. Pseudocode
4. Block diagram of the Execution Unit
5. Interface with the division into Execution Unit and Control Unit
6. ASM chart and/or block diagram of the Control Unit
7. RTL VHDL code
8. Testbench
9. Debugging
10. Synthesis and implementation
11. Experimental testing (not required in this course)
Project 2 - Platform & tools
Target devices: Xilinx FPGAs
Tools:
VHDL Simulation: Aldec Active HDL or Xilinx ModelSim
VHDL Synthesis: Synplify Pro or Xilinx XST
Implementation: Xilinx ISE or Xilinx WebPack
All tools available in S&T 2, rooms 203 & 265.
Xilinx tools available for free for home use.
Aldec Active HDL student edition available for home use.
Methodology 2
Graphical Data Flow Language
DSPlogic RCToolbox
See the presentation by Mike Babst, PhD, DSPlogic, available through WebCT
Project 2 - Platform & tools
Target devices: Xilinx FPGAs
Tools:
Design Entry & Debugging: DSPlogic RC Toolbox, MathWorks Simulink, MathWorks MATLAB
Synthesis and Implementation: Xilinx System Generator, Xilinx ISE
All tools available in S&T 2, room 220.
Two hands-on sessions given by Dr. Babst during the first two weeks after the selection of the project.
Reconfigurable computers supported by DSPlogic toolset
Machine               Released
Cray XD1 from Cray    2005
SGI Altix from SGI    2005
What is a Reconfigurable Computer?
Figure: a Microprocessor system (µP processors and µP memory) and a Reconfigurable system (FPGAs and FPGA memory), connected through an I/O interface.
Methodology 3
HLL Compilers
Celoxica Handel-C
Design Flow
Executable Specification (Handel-C) → VHDL → Synthesis → EDIF → Place & Route
or: Executable Specification (Handel-C) → EDIF → Place & Route
Handel-C / ANSI-C Comparisons
ANSI-C only: ANSI-C Standard Library; side effects (i.e., X = Y++); recursion; floating point
Common to ANSI-C and Handel-C: preprocessors (i.e., #define); structures; ANSI-C constructs for, while, if, switch; functions; arrays; pointers; arithmetic operators; bitwise logical operators; logical operators
Handel-C only: Handel-C Standard Library; parallelism; arbitrary width variables; RAM, ROM; signals; channels; interfaces; enhanced bit manipulation
Handel-C Language (1)
• A subset of ANSI-C
• Sequential software style with a “par” construct to implement parallelism
• A channel “chan” statement allows for communication and synchronization between parallel branches
• Level of design abstraction is above RTL but below behavioral
Handel-C Language (2)
• Each assignment and delay statement takes one clock cycle
• Automatic generation of the state machine from an algorithmic description of the circuit in terms of parallel and sequential blocks
• Automatic scheduling of parallel and sequential blocks, that is, the code following a group is scheduled only after that whole group has completed
Handel-C Language (3)
• Automatic generation of clocks, clock enables and resets
• Combinational logic may be implemented using, for example, bus, port, and signal types
• It is possible to design at a level where some Handel-C statements look similar to Verilog, but the overall program structure is different
Platform & tools – HLL Compilers
Target devices: Xilinx FPGAs
Tools:
Design Entry & Debugging: Celoxica DK4 Design Suite (integrated environment providing Handel-C compiler, debugging, simulation, and synthesis to EDIF and VHDL)
Synthesis and Implementation: Xilinx ISE
All tools available in S&T 2, rooms 203 & 265.
VHDL macro declaration in Handel-C
    library ieee;
    use ieee.std_logic_1164.all;

    -- black-box entity: two 8-bit inputs a, b and a 16-bit output q
    ENTITY parmult IS
      port ( clk : IN  std_logic;
             a   : IN  std_logic_vector(7 downto 0);
             b   : IN  std_logic_vector(7 downto 0);
             q   : OUT std_logic_vector(15 downto 0));
    END parmult;

    // Handel-C declaration of parmult; port widths must match the VHDL ports
    interface parmult (unsigned 16 q)
        parmult_instance (unsigned 1 clk, unsigned 8 a, unsigned 8 b)
        with {busformat = "B(I)"};

VHDL macro instantiation in Handel-C
    unsigned 8 x1, x2;
    unsigned resultX;

    // clk is driven by the Handel-C clock; inputs a and b are driven by x1 and x2
    interface parmult (unsigned 16 q)
        parmult_instance1 (unsigned 1 clk = __clock, unsigned 8 a = x1, unsigned 8 b = x2)
        with {busformat = "B(I)"};
Celoxica RC10 board supporting Handel-C libraries, used in the GMU course ECE 448 FPGA and ASIC Design with VHDL
Literature
Additional literature with a detailed description of all algorithms is available for each project.
Project Organization
• 1-3 person teams allowed
• 2 person teams preferred
Please submit your ranking of 4 topics and your ranking of 3 design methodologies by Friday midnight at the latest.