
ECE 645 Spring 2007

PROJECT 2 Specification

Topic Options

Public Key (Asymmetric) Cryptosystems

Public key of Bob: $K_B$

Private key of Bob: $k_B$

[Diagram: Alice (Encryption) → Network → Bob (Decryption)]

RSA as a trap-door one-way function

PUBLIC KEY: $M \rightarrow C = f(M) = M^e \bmod N$

PRIVATE KEY: $C \rightarrow M = f^{-1}(C) = C^d \bmod N$

$N = P \cdot Q$, where $P$, $Q$ are large prime numbers

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

RSA keys

PUBLIC KEY: $\{e, N\}$

PRIVATE KEY: $\{d, P, Q\}$

$N = P \cdot Q$

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

$P$, $Q$ - large prime numbers
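The key relations above can be checked on a toy example. The sketch below is only an illustration: the primes 61 and 53 and the exponent 17 are arbitrary small values chosen here (they do not come from the slides), and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd

# Toy parameters, assumed for illustration only; real keys use primes of 512+ bits.
P, Q = 61, 53
N = P * Q                     # public modulus
phi = (P - 1) * (Q - 1)       # (P-1)(Q-1)
e = 17                        # public exponent, must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)           # private exponent: e*d ≡ 1 mod (P-1)(Q-1)

def encrypt(m):
    """C = M^e mod N, using the public key {e, N}."""
    return pow(m, e, N)

def decrypt(c):
    """M = C^d mod N, using the private exponent d."""
    return pow(c, d, N)

M = 65
C = encrypt(M)
assert decrypt(C) == M        # the trap-door function inverts correctly
```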

Early Factoring Device – Lehmer Sieve

Bicycle chain sieve [D. H. Lehmer, 1928]

Computer Museum, Mountain View, CA

Supercomputer Cray-1 from 1980’s

Computer Museum, Mountain View, CA

FPGA based supercomputers

Machine                          Released
SRC 6 from SRC Computers         2002
Cray XD1 from Cray               2005
SGI Altix from SGI               2005
SRC 7 from SRC Computers, Inc.   2006

COPACOBANA

Ruhr University, Bochum, University of Kiel, Germany, 2006

120 Spartan 3 FPGAs
Clock frequency 100 MHz

Cost: € 8980

Factoring 1024-bit RSA keys using Number Field Sieve (NFS)

1. Polynomial Selection
2. Relation Collection
   - Sieving
   - Cofactoring of 200 bit & 350 bit numbers: trial division, ECM, p-1 method, rho method
3. Linear Algebra
4. Square Root

Topic 1

Trial Division Sieve

Topic 1: Trial Division Sieve (1)

Given:

Inputs:

Variables:
1. Integers $N_1, N_2, N_3, \ldots$, each k bits in size

Constants:
2. Factor base = set of all primes not exceeding a certain bound B
   = { $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, ..., $p_t \le B$ }

Parameters of interest: $4 \le k \le 512$, $3 \le B \le 10^5$

Topic 1: Trial Division Sieve (2)

Required:

Outputs:

For each integer $N_i$:

A list of primes from the factor base that divide $N_i$, and the number of times each prime divides $N_i$.

For example, if

$N_i = p_1^{e_1} \cdot p_2^{e_2} \cdot p_3^{e_3} \cdot M_i$,

where $M_i$ is not divisible by any prime belonging to the factor base, then the output is

$\{p_1, e_1\}, \{p_2, e_2\}, \{p_3, e_3\}$

Topic 1: Trial Division Sieve (3)

Example:

Constants: k = 10, B = 5
Factor base = {2, 3, 5}

Variables:

$N_1 = 408 = 2^3 \cdot 3 \cdot 17$

$N_2 = 630 = 2 \cdot 3^2 \cdot 5 \cdot 7$

Outputs:

For $N_1$: {2, 3}, {3, 1}
For $N_2$: {2, 1}, {3, 2}, {5, 1}
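A software reference model can be useful for generating test vectors for the hardware design. The following is a minimal sketch of the required input/output behavior, not a description of the hardware architecture; the function names `primes_up_to` and `trial_division` are made up for this illustration.

```python
def primes_up_to(bound):
    """Sieve of Eratosthenes: all primes p <= bound (assumes bound >= 2)."""
    is_prime = [True] * (bound + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(bound ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, bound + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def trial_division(n, factor_base):
    """Return ([(p, e), ...], cofactor) for the factor-base primes dividing n."""
    factors = []
    for p in factor_base:
        e = 0
        while n % p == 0:      # divide out p as many times as possible
            n //= p
            e += 1
        if e:
            factors.append((p, e))
    return factors, n          # n is now the cofactor M_i

factor_base = primes_up_to(5)  # B = 5 gives {2, 3, 5}
for N_i in (408, 630):
    print(N_i, trial_division(N_i, factor_base))
```

The second element of the returned pair is the cofactor $M_i$ (17 for 408 and 7 for 630), which is not part of the required output but helps when checking results.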

Topic 1: Trial Division Sieve (4)

Optimization Criteria:

Maximum number of integers $N_i$ fully processed per unit of time for a given k and B.

Topic 2

Greatest Common Divisor & Multiplicative Inverse

Topic 2: Greatest Common Divisor and Multiplicative Inverse (2)

Given:

Inputs: a, N: k-bit integers; a < N

Outputs: $y = \gcd(a, N)$

$x = a^{-1} \bmod N$, i.e., the integer $1 \le x < N$ such that $a \cdot x \bmod N = 1$

Parameters of interest: $4 \le k \le 1024$

Greatest common divisor

Greatest common divisor of a and b, denoted by gcd(a, b),

is the largest positive integer that divides both a and b.

$d = \gcd(a, b)$ iff 1) $d \mid a$ and $d \mid b$, 2) if $c \mid a$ and $c \mid b$ then $c \le d$

gcd (8, 44) =

gcd (-15, 65) =

gcd (45, 30) =

gcd (31, 15) =

gcd (121, 169) =

Quotient and remainder

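Given integers a and n, n > 0

$\exists!\, q, r \in \mathbb{Z}$ such that

$a = q \cdot n + r$ and $0 \le r < n$

q – quotient

r – remainder (of a divided by n)

$q = \lfloor a / n \rfloor = a \text{ div } n$

$r = a - q \cdot n = a - \lfloor a / n \rfloor \cdot n = a \bmod n$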
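The floor-based definition matters for negative a. As a small illustration (the helper name `quotient_remainder` is made up here), Python's built-in `divmod` follows the same convention for n > 0, whereas languages that truncate toward zero would return a negative remainder.

```python
def quotient_remainder(a, n):
    """Return (q, r) with a = q*n + r and 0 <= r < n, for n > 0."""
    q, r = divmod(a, n)                 # q = floor(a / n), r = a mod n
    assert a == q * n + r and 0 <= r < n
    return q, r

print(quotient_remainder(126, 36))      # (3, 18)
print(quotient_remainder(-15, 4))       # (-4, 1), since -15 = (-4)*4 + 1
```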

Euclid’s Algorithm for computing gcd(a, b)

i      r_i                        q_i
-2     r_{-2} = max(a, b)
-1     r_{-1} = min(a, b)         q_{-1}
0      r_0                        q_0
1      r_1                        q_1
...    ...                        ...
t-1    r_{t-1} = gcd(a, b)        q_{t-1}
t      r_t = 0

$q_i = \lfloor r_{i-1} / r_i \rfloor$

$r_{i+1} = r_{i-1} - q_i \cdot r_i$

$r_{i+1} = r_{i-1} \bmod r_i$

Euclid’s Algorithm Example: gcd(36, 126)

i      r_i                            q_i
-2     r_{-2} = max(a, b) = 126
-1     r_{-1} = min(a, b) = 36        q_{-1} = 3
0      r_0 = 18 = gcd(36, 126)        q_0 = 2
1      r_1 = 0

$q_i = \lfloor r_{i-1} / r_i \rfloor$

$r_{i+1} = r_{i-1} - q_i \cdot r_i$

$r_{i+1} = r_{i-1} \bmod r_i$
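The recurrence can be traced in software. The sketch below assumes non-negative inputs that are not both zero; the function name `euclid_gcd` and the trace format are choices made for this illustration.

```python
def euclid_gcd(a, b):
    """Euclid's algorithm; returns (gcd, trace) where trace lists (q_i, r_{i+1})."""
    r_prev, r_cur = max(a, b), min(a, b)    # r_{-2}, r_{-1}
    trace = []
    while r_cur != 0:
        q = r_prev // r_cur                 # q_i = floor(r_{i-1} / r_i)
        r_next = r_prev - q * r_cur         # r_{i+1} = r_{i-1} - q_i * r_i
        trace.append((q, r_next))
        r_prev, r_cur = r_cur, r_next
    return r_prev, trace                    # last non-zero remainder is the gcd

g, trace = euclid_gcd(36, 126)
print(g, trace)                             # 18 [(3, 18), (2, 0)]
```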

Multiplicative inverse modulo n

The multiplicative inverse of a modulo n is an integer (!) x such that

$a \cdot x \equiv 1 \pmod{n}$

The multiplicative inverse of a modulo n is denoted by $a^{-1} \bmod n$ (in some books $\bar{a}$ or $a^*$).

According to this notation:

$a \cdot a^{-1} \equiv 1 \pmod{n}$

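Extended Euclid’s Algorithm (1)

i      r_i           x_i           y_i           q_i
-2     r_{-2} = n    x_{-2} = 0    y_{-2} = 1
-1     r_{-1} = a    x_{-1} = 1    y_{-1} = 0    q_{-1} = ⌊n/a⌋
0      r_0           x_0           y_0           q_0
1      r_1           x_1           y_1           q_1
...    ...           ...           ...           ...
t-1    r_{t-1}       x_{t-1}       y_{t-1}       q_{t-1}
t      r_t = 0       x_t           y_t

$q_i = \lfloor r_{i-1} / r_i \rfloor$

$r_{i+1} = r_{i-1} - q_i \cdot r_i$

$x_{i+1} = x_{i-1} - q_i \cdot x_i$

$y_{i+1} = y_{i-1} - q_i \cdot y_i$

$r_i = x_i \cdot a + y_i \cdot n$

$r_{t-1} = x_{t-1} \cdot a + y_{t-1} \cdot n$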

Extended Euclid’s Algorithm (2)

$r_{t-1} = x_{t-1} \cdot a + y_{t-1} \cdot n$

$r_{t-1} = x_{t-1} \cdot a + y_{t-1} \cdot n \equiv x_{t-1} \cdot a \pmod{n}$

If $r_{t-1} = \gcd(a, n) = 1$ then

$x_{t-1} \cdot a \equiv 1 \pmod{n}$

and as a result

$x_{t-1} = a^{-1} \bmod n$

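Extended Euclid’s Algorithm for computing $z = a^{-1} \bmod n$

i      r_i            x_i                          q_i
-2     r_{-2} = n     x_{-2} = 0
-1     r_{-1} = a     x_{-1} = 1                   q_{-1} = ⌊n/a⌋
0      r_0            x_0                          q_0
1      r_1            x_1                          q_1
...    ...            ...                          ...
t-1    r_{t-1} = 1    x_{t-1} = a^{-1} mod n       q_{t-1}
t      r_t = 0        x_t = ±n

$q_i = \lfloor r_{i-1} / r_i \rfloor$

$r_{i+1} = r_{i-1} - q_i \cdot r_i$

$x_{i+1} = x_{i-1} - q_i \cdot x_i$

Note: If $r_{t-1} \ne 1$ the inverse does not exist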

Extended Euclid’s Algorithm Example: $z = 20^{-1} \bmod 117$

i      r_i             x_i                              q_i
-2     r_{-2} = 117    x_{-2} = 0
-1     r_{-1} = 20     x_{-1} = 1                       q_{-1} = 5
0      r_0 = 17        x_0 = -5                         q_0 = 1
1      r_1 = 3         x_1 = 6                          q_1 = 5
2      r_2 = 2         x_2 = -35                        q_2 = 1
3      r_3 = 1         x_3 = 41 = 20^{-1} mod 117       q_3 = 2
4      r_4 = 0         x_4 = -117

$q_i = \lfloor r_{i-1} / r_i \rfloor$

$r_{i+1} = r_{i-1} - q_i \cdot r_i$

$x_{i+1} = x_{i-1} - q_i \cdot x_i$

Check: $20 \cdot 41 \bmod 117 = 1$
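The table above can be reproduced with a few lines of software, which may serve as a reference model when verifying a hardware implementation. This is a minimal sketch: the name `mod_inverse` and the choice to return `None` when no inverse exists are assumptions of this illustration. Only the x column is tracked, since the y column is not needed for the inverse.

```python
def mod_inverse(a, n):
    """Extended Euclid: return a^(-1) mod n, or None if gcd(a, n) != 1."""
    r_prev, r_cur = n, a          # r_{-2} = n, r_{-1} = a
    x_prev, x_cur = 0, 1          # x_{-2} = 0, x_{-1} = 1
    while r_cur != 0:
        q = r_prev // r_cur       # q_i = floor(r_{i-1} / r_i)
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        x_prev, x_cur = x_cur, x_prev - q * x_cur
    if r_prev != 1:               # r_{t-1} = gcd(a, n) must be 1
        return None
    return x_prev % n             # map a possibly negative x_{t-1} into [0, n)

print(mod_inverse(20, 117))       # 41
```

The final reduction `x_prev % n` handles the case where $x_{t-1}$ comes out negative, so the result always satisfies $1 \le x < n$ when the inverse exists.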

Topic 3

RSA Encryption & Decryption with Montgomery Multipliers based on Carry Save Adders

RSA as a trap-door one-way function

PUBLIC KEY: $M \rightarrow C = f(M) = M^e \bmod N$

PRIVATE KEY: $C \rightarrow M = f^{-1}(C) = C^d \bmod N$

$N = P \cdot Q$, where $P$, $Q$ are large prime numbers

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

Exponentiation: $Y = X^E \bmod N$

$E = (e_{L-1}, e_{L-2}, \ldots, e_1, e_0)_2$

Right-to-left binary exponentiation

    Y = 1;
    S = X;
    for i = 0 to L-1 {
        if (e_i == 1) Y = Y · S mod N;
        S = S^2 mod N;
    }

Left-to-right binary exponentiation

    Y = 1;
    for i = L-1 downto 0 {
        Y = Y^2 mod N;
        if (e_i == 1) Y = Y · X mod N;
    }
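Both variants can be written directly in software and compared against a trusted reference. The following sketch assumes N > 1 and a non-negative exponent; the function names are made up for this illustration, and Python's built-in three-argument `pow` is used only as the reference.

```python
def exp_right_to_left(x, e, n):
    """Scan exponent bits e_0 ... e_{L-1}; S holds x^(2^i) mod n."""
    y, s = 1, x % n
    while e > 0:
        if e & 1:                 # e_i == 1
            y = (y * s) % n
        s = (s * s) % n
        e >>= 1
    return y

def exp_left_to_right(x, e, n):
    """Scan exponent bits e_{L-1} ... e_0; square always, multiply if bit set."""
    y = 1
    for i in range(e.bit_length() - 1, -1, -1):
        y = (y * y) % n
        if (e >> i) & 1:          # e_i == 1
            y = (y * x) % n
    return y

x, e, n = 65, 17, 3233            # arbitrary small test values
assert exp_right_to_left(x, e, n) == exp_left_to_right(x, e, n) == pow(x, e, n)
```

In the right-to-left variant the squaring and the multiplication in one iteration are independent, so they can run in parallel in hardware; the left-to-right variant needs only one running value besides X.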

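Montgomery Modular Multiplication (1)

$C = A \cdot B \bmod M$

A, B, M – k-bit numbers

Integer domain → Montgomery domain

$A \rightarrow A' = A \cdot 2^k \bmod M$

$B \rightarrow B' = B \cdot 2^k \bmod M$

$C = A \cdot B \rightarrow C' = C \cdot 2^k \bmod M$

$C' = MP(A', B', M) = A' \cdot B' \cdot 2^{-k} \bmod M = (A \cdot 2^k) \cdot (B \cdot 2^k) \cdot 2^{-k} \bmod M = A \cdot B \cdot 2^k \bmod M$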

Montgomery Modular Multiplication (2)

$A \rightarrow A'$: $A' = MP(A, 2^{2k} \bmod M, M)$

$C' \rightarrow C$: $C = MP(C', 1, M)$
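The domain conversions can be checked with a simple bit-serial (radix-2) model of the Montgomery product. This is a sketch under stated assumptions: M is odd, both operands are smaller than M, and k is passed as an explicit fourth argument, which the slides' MP(A', B', M) notation leaves implicit. The helper names `to_montgomery`, `from_montgomery`, and `mod_mul` are made up here. The model uses ordinary integer addition and does not reflect a carry-save or word-based architecture.

```python
def mp(a, b, m, k):
    """Montgomery product a*b*2^(-k) mod m, for odd m < 2^k and a, b < m."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b   # add b if bit i of a is set
        if s & 1:
            s += m                # make s even without changing s mod m
        s >>= 1                   # exact division by 2
    return s - m if s >= m else s # s < 2m here, one subtraction suffices

def to_montgomery(a, m, k):
    """A' = MP(A, 2^(2k) mod M, M) = A * 2^k mod M."""
    return mp(a, (1 << (2 * k)) % m, m, k)

def from_montgomery(c_prime, m, k):
    """C = MP(C', 1, M) = C' * 2^(-k) mod M."""
    return mp(c_prime, 1, m, k)

def mod_mul(a, b, m, k):
    """A * B mod M computed through the Montgomery domain."""
    c_prime = mp(to_montgomery(a, m, k), to_montgomery(b, m, k), m, k)
    return from_montgomery(c_prime, m, k)

m, k = 117, 7                     # 117 is odd and fits in 7 bits
assert to_montgomery(20, m, k) == (20 << k) % m
assert mod_mul(20, 41, m, k) == (20 * 41) % m
```

For a single multiplication the two conversions cost more than they save; the benefit appears in exponentiation, where all intermediate values stay in the Montgomery domain and only the input and the final result are converted.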

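Montgomery Modular Multiplication (3)

[Diagram: the 2k-bit product $X = A'B'$ with digits $x_{2n-1} \ldots x_1 x_0$; adding $q_0 M$, then $q_1 M b$, and so on, zeroes the least significant digits one at a time, leaving the k-bit result C' followed by zeros.]

$C' \cdot 2^k = X + z \cdot M$

$C' \cdot 2^k \equiv X = A'B' \pmod{M}$

$C' \equiv A'B' \cdot 2^{-k} \pmod{M}$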

Fast modular exponentiation using Chinese Remainder Theorem

$M_P = C_P^{\,d_P} \bmod P$

$M_Q = C_Q^{\,d_Q} \bmod Q$

$C_P = C \bmod P$, $d_P = d \bmod (P-1)$

$C_Q = C \bmod Q$, $d_Q = d \bmod (Q-1)$

$M = C^d \bmod N$

$M = (M_P \cdot R_Q + M_Q \cdot R_P) \bmod N$

where

$R_P = (P^{-1} \bmod Q) \cdot P = P^{Q-1} \bmod N$

$R_Q = (Q^{-1} \bmod P) \cdot Q = Q^{P-1} \bmod N$
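The recombination formula can be verified numerically. The sketch below uses toy parameters chosen here for illustration (P = 61, Q = 53, e = 17, none of which come from the slides) and the hypothetical function name `rsa_decrypt_crt`; `pow(x, -1, m)` requires Python 3.8 or later.

```python
def rsa_decrypt_crt(c, d, p, q):
    """Compute M = C^d mod (p*q) from two half-size exponentiations."""
    n = p * q
    c_p, d_p = c % p, d % (p - 1)
    c_q, d_q = c % q, d % (q - 1)
    m_p = pow(c_p, d_p, p)            # M_P = C_P^dP mod P
    m_q = pow(c_q, d_q, q)            # M_Q = C_Q^dQ mod Q
    r_p = pow(p, -1, q) * p           # R_P: 0 mod P, 1 mod Q
    r_q = pow(q, -1, p) * q           # R_Q: 1 mod P, 0 mod Q
    return (m_p * r_q + m_q * r_p) % n

# Toy key, assumed for illustration only.
P, Q, e = 61, 53, 17
N = P * Q
d = pow(e, -1, (P - 1) * (Q - 1))

M = 65
C = pow(M, e, N)
assert rsa_decrypt_crt(C, d, P, Q) == pow(C, d, N) == M
```

In practice $d_P$, $d_Q$, $R_P$, and $R_Q$ depend only on the key, so they can be precomputed once and reused for every decryption.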

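Time of exponentiation without and with Chinese Remainder Theorem

SOFTWARE

Without CRT: $t_{EXP}(k) = c_s \cdot k^3$

With CRT: $t_{EXP\text{-}CRT}(k) \approx 2 \cdot c_s \cdot (k/2)^3 = \frac{1}{4}\, t_{EXP}(k)$

HARDWARE

Without CRT: $t_{EXP}(k) = c_h \cdot k^2$

With CRT: $t_{EXP\text{-}CRT}(k) \approx c_h \cdot (k/2)^2 = \frac{1}{4}\, t_{EXP}(k)$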

Topic 4

RSA Encryption & Decryption with Word-Based Montgomery Multipliers

Data dependency graph of a classical architecture by Tenca & Koc

Data dependency graph of a new design from GWU & GMU

Block diagram of the new architecture

Block diagram of the main Processing Element

Topic 5

p-1 Method of Factoring

p-1 algorithm

Inputs :

N – number to be factored

a – arbitrary integer such that gcd(a, N)=1

B1 – smoothness bound for Phase1

Outputs:

q - factor of N, 1 < q ≤ N

or FAIL

p-1 algorithm – Phase 1

1: $k = \prod_{p_i \le B_1} p_i^{e_i}$ such that $p_i$ - consecutive primes, $e_i$ - largest exponent such that $p_i^{e_i} \le B_1$ (precomputations)
2: $q_0 = a^k \bmod N$ (main computations)
3: $q = \gcd(q_0 - 1, N)$ (postcomputations)
4: if $q > 1$
5:     return $q$ (factor of N)
6: else
7:     go to Phase 2 (out of scope for this project)
8: end if

p-1 Phase 1 – Numerical example

$N = 1\,740\,719 = 1279 \cdot 1361$

$a = 2$

$B_1 = 20$

$k = 2^4 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 13 \cdot 17 \cdot 19 = 232\,792\,560$

$q_0 = a^k \bmod N = 2^{232\,792\,560} \bmod 1\,740\,719 = 1\,003\,058$

$q = \gcd(1\,003\,058 - 1;\ 1\,740\,719) = 1361$

Why did the method work?

$q - 1 = 1360 = 2^4 \cdot 5 \cdot 17 \mid k$

$a^k \bmod q = a^{(q-1) \cdot m} \bmod q = 1$

$q \mid a^k - 1$
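Phase 1 is short enough to model in software, which gives a convenient way to generate test cases. The sketch below is a minimal reference model, with function names (`primes_up_to`, `pminus1_phase1`) made up for this illustration; it returns `None` in place of "go to Phase 2".

```python
from math import gcd

def primes_up_to(bound):
    """All primes p <= bound, by trial division (fine for small bounds)."""
    return [p for p in range(2, bound + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def pminus1_phase1(n, a, b1):
    """Return (q, k): q is a factor of n found by Phase 1, or None."""
    k = 1
    for p in primes_up_to(b1):        # precomputations
        pe = p
        while pe * p <= b1:           # largest power of p not exceeding B1
            pe *= p
        k *= pe
    q0 = pow(a, k, n)                 # main computations: q0 = a^k mod N
    q = gcd(q0 - 1, n)                # postcomputations
    return (q if q > 1 else None), k

factor, k = pminus1_phase1(1740719, 2, 20)
print(factor, k)                      # 1361 232792560
```

Note that q can equal N itself (when every prime factor p of N has p-1 dividing k), which matches the output range $1 < q \le N$ stated earlier but yields no useful factorization.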

Design Methodology Options

by Mike Babst, DSPlogic

Methodology 1

RTL VHDL

Classical VHDL-based Design Methodology

Structure of a Typical Digital System

[Block diagram: Execution Unit (Datapath) with Data Inputs and Data Outputs; Control Unit (Control) with Control Inputs and Control Outputs; Control Signals connect the two units.]

Hardware Design with RTL VHDL

[Design flow diagram. Elements: Pseudocode; Interface; Execution Unit: block diagram, VHDL code; Control Unit: block diagram, ASM, VHDL code; VHDL code.]

Steps of the Design Process

1. Text description
2. Interface
3. Pseudocode
4. Block diagram of the Execution Unit
5. Interface with the division into Execution Unit and Control Unit
6. ASM chart and/or block diagram of the Control Unit
7. RTL VHDL code
8. Testbench
9. Debugging
10. Synthesis and implementation
11. Experimental testing (not required in this course)

Project 2 - Platform & tools

Target devices: Xilinx FPGAs

Tools:

VHDL Simulation: Aldec Active HDL or Xilinx ModelSim
VHDL Synthesis: Synplify Pro or Xilinx XST
Implementation: Xilinx ISE or Xilinx WebPack

All tools available in S&T 2, rooms 203 & 265.
Xilinx tools available for free for home use.

Aldec Active HDL student edition available for home use.

Methodology 2

Graphical Data Flow Language

DSPlogic RCToolbox

See the presentation by Mike Babst, PhD, DSPlogic, available through WebCT

Project 2 - Platform & tools

Target devices: Xilinx FPGAs

Tools:

Design Entry & Debugging: DSPlogic RC Toolbox, MathWorks Simulink, MathWorks Matlab

Synthesis and Implementation: Xilinx System Generator, Xilinx ISE

All tools available in S&T 2, room 220.

Two hands-on sessions given by Dr. Babst during the first two weeks after the selection of the project

Reconfigurable computers supported by DSPlogic toolset

Machine                 Released
Cray XD1 from Cray      2005
SGI Altix from SGI      2005

What is a Reconfigurable Computer?

[Block diagram: a microprocessor system (μP units, each with its own μP memory, and an I/O Interface) connected through an Interface to a reconfigurable system (FPGAs, each with its own FPGA memory, and I/O).]

Methodology 3

HLL Compilers

Celoxica Handel C

Design Flow

[Flow diagram: Executable Specification → Handel-C → either VHDL → Synthesis → EDIF, or EDIF directly → Place & Route]

Handel-C / ANSI-C Comparisons

ANSI-C only:
ANSI-C Standard Library; Side Effects, e.g. X = Y++; Recursion; Floating Point

Common to ANSI-C and Handel-C:
Preprocessors, e.g. #define; Structures; ANSI-C Constructs (for, while, if, switch); Functions; Arrays; Pointers; Arithmetic operators; Bitwise logical operators; Logical operators

Handel-C only:
Handel-C Standard Library; Parallelism; Arbitrary width variables; RAM, ROM; Signals; Channels; Interfaces; Enhanced bit manipulation

Handel-C Language (1)

• A subset of ANSI-C

• Sequential software style with a “par” construct to implement parallelism

• A channel “chan” statement allows for communication and synchronization between parallel branches

• Level of design abstraction is above RTL but below behavioral

Handel-C Language (2)

• Each assignment and delay statement takes one clock cycle

• Automatic generation of the state machine from an algorithmic description of the circuit in terms of parallel and sequential blocks

• Automatic scheduling of parallel and sequential blocks; that is, the code following a group is scheduled only after that whole group has completed

Handel-C Language (3)

• Automatic generation of clocks, clock enables and resets

• Combinational logic may be implemented using for example bus, port and signal types

• It is possible to design at a level where some Handel-C statements look similar to Verilog, but the overall program structure is different

Platform & tools – HLL Compilers

Target devices: Xilinx FPGAs

Tools:

Design Entry & Debugging: Celoxica DK4 Design Suite (integrated environment providing Handel C compiler, debugging, simulation, and synthesis to EDIF and VHDL)

Synthesis and Implementation: Xilinx ISE

All tools available in S&T 2, rooms 203 & 265.

VHDL macro declaration in Handel-C

VHDL entity:

    ENTITY parmult IS
      port (
        clk: IN std_logic;
        a: IN std_logic_VECTOR(7 downto 0);
        b: IN std_logic_VECTOR(7 downto 0);
        q: OUT std_logic_VECTOR(15 downto 0));
    END parmult;

Handel-C interface declaration:

    interface parmult (unsigned 16 q)
        parmult_instance (unsigned 1 clk, unsigned 8 a, unsigned 8 b)
        with {busformat = "B(I)"};

VHDL macro instantiation in Handel-C

    unsigned 8 x1, x2;
    unsigned resultX;

    interface parmult (unsigned 16 q)
        parmult_instance1 (unsigned 1 clk = __clock, unsigned 8 a = x1, unsigned 8 b = x2)
        with {busformat = "B(I)"};

Celoxica RC10 board supporting Handel C libraries, used in the GMU ECE 448 FPGA and ASIC Design with VHDL

Literature

Additional literature with the detailed

description of all algorithms available

for each project.

Project Organization

• 1-3 person teams allowed
• 2 person teams preferred

Please submit your
- ranking of 4 topics
- ranking of 3 design methodologies
by Friday midnight at the latest
