IHP
Im Technologiepark 25
15236 Frankfurt (Oder)
Germany
IHP Im Technologiepark 25 15236 Frankfurt (Oder) Germany www.ihp-microelectronics.com © 2007 - All rights reserved
An Efficient Polynomial Multiplier in GF(2^m) and its Application to ECC Designs
Steffen Peter and Peter Langendörfer
Outline
• Motivation and introduction to ECC
• Basic polynomial multiplication approaches
• Combinatorial polynomial multiplier
• Iterative polynomial multiplier
• Implications for the ECC design
Elliptic Curve Cryptography
• Asymmetric cryptography
• Trapdoor: Elliptic Curve Point Multiplication
  – one can compute: Q = kP
  – it is infeasible to determine k for given Q and P
• Higher security with shorter keys than RSA
  – Recommended key lengths in bits [Lenstra & Verheul, “Selecting Cryptographic Key Sizes”]

Year          RSA    ECC
until 2010    1024   160
until 2030    2048   224
after 2030    3072   256
ECC in Software or Hardware?
233 bit ECC on MIPS (software) or ECC hardware accelerator?
• Time for one ECPM:
  – MIPS: 410 ms
  – HW: 0.4 ms
• Energy for one ECPM:
  – MIPS: 16.5 mWs
  – HW: 0.03 mWs
ECC Pyramid
EC Cryptographic Operations
• Cryptographic protocols
  – Signature generation/verification
  – Encryption/decryption
• Executed on a CPU
  – May use ECC accelerator for sub-routines
[Figure: CPU (MIPS, ARM, LEON, …) and ECC co-processor]
EC Point Operations
• Operations on points on the elliptic curve
  – Point addition: Point + Point
  – Point multiplication: integer · Point (Montgomery/Lopez-Dahab point multiplication)
• Executed on the co-processor
[Figure: CPU and ECC co-processor]
Finite Field Operations
• Operations in the finite field
  – Addition/subtraction (m-bit XOR)
  – Multiplication (m-bit · m-bit)
  – Squaring (much faster than multiplication)
  – Division (very expensive)
• Each EC point operation requires operations in the finite field
  – E.g. one 233 bit EC point multiplication:
    1200 additions
    1500 multiplications (233 bit multiplication)
    800 squarings
    1 division
Basic Field Operations
• Prime fields (GF(p))
  – p is a very large prime (about 200 bits)
  – requires carries for additions
  – preferred for software implementations
• Binary extension fields (GF(2^m))
  – m is the bit length of the field (typically 160-283 bit)
  – easy hardware representation (m-bit array)
  – no carries (additions are simple XOR operations)
  – preferred for hardware implementations
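The "no carries" point can be illustrated with a minimal Python sketch. It assumes a field element is stored as an integer whose bit i holds the coefficient of x^i; the name `gf2m_add` is illustrative and does not come from the slides.

```python
def gf2m_add(a: int, b: int) -> int:
    """Add two GF(2^m) elements stored as bit vectors: a carry-free XOR."""
    return a ^ b


a, b = 0b1011, 0b0110
print(bin(gf2m_add(a, b)))  # XOR: every bit position is handled independently
print(bin(a + b))           # integer addition, by contrast, ripples carries
```

Because each coefficient lives in GF(2), subtraction is the same operation as addition, so adding an element to itself gives zero.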
Utilization /Area of Functional Blocks
[Figure: utilization and area share per functional block]
Utilization: 95%, 15%, 50%
Area: 70%, 5%, 20%
Classic (school) Polynomial Multiplication
a(x) ∙ b(x) =
    a(x) & b(x^0)
  + a(x) & b(x^1)
  + a(x) & b(x^2)
  + a(x) & b(x^3)
  ...
  + a(x) & b(x^(m-2))
  + a(x) & b(x^(m-1))
= c(x)
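The school method above can be sketched in Python as a shift-and-XOR loop. This is a minimal illustration, assuming polynomials over GF(2) are stored as integers with bit i holding the coefficient of x^i; `clmul_school` is a hypothetical name, and the result is the unreduced product (no field reduction is applied).

```python
def clmul_school(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2)[x] polynomials given as bit vectors."""
    c = 0
    i = 0
    while b >> i:
        if (b >> i) & 1:   # row i is a(x) if coefficient i of b is set, else zero
            c ^= a << i    # shift the row by x^i and accumulate with XOR
        i += 1
    return c


# (x^2 + x + 1) * (x + 1) = x^3 + 1 over GF(2)
print(bin(clmul_school(0b111, 0b11)))
```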
Classic Polynomial Multiplication
• Gate count:
  – m^2 AND gates
  – (m-1)^2 XOR gates
• Longest path: 1 AND + log2(m) XOR
[Figure: row of AND gates (&) feeding a tree of XOR gates (+)]
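As a quick check of these gate-count formulas, the following Python sketch evaluates m^2 AND gates, (m-1)^2 XOR gates and a log2(m) XOR depth. The function name is an assumption made for illustration, and the depth is only exact for power-of-two sizes such as those used in this talk.

```python
def classic_gate_counts(m: int) -> tuple[int, int, int]:
    """Return (AND gates, XOR gates, XOR gates on the longest path) for an m-bit CPM."""
    and_gates = m * m
    xor_gates = (m - 1) ** 2
    xor_depth = (m - 1).bit_length()  # equals log2(m) when m is a power of two
    return and_gates, xor_gates, xor_depth


for m in (4, 16, 64, 256):
    print(m, classic_gate_counts(m))
```

For 64 and 256 bit this reproduces the classic-polynomial figures quoted in the comparison slide (3969 XOR / 4096 AND and 65025 XOR / 65536 AND).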
Classic Karatsuba Multiplication
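a(x) = A1 A0
b(x) = B1 B0
Partial products: A0∙B0, (A1 + A0) ∙ (B1 + B0), A1∙B1
c(x) = a(x) ∙ b(x)
[Figure: A0∙B0 and A1∙B1 each enter the result twice and are combined by XOR (+) with (A1 + A0) ∙ (B1 + B0)]
• 4 additions (XOR) + 3 multiplications per level (CPM: 3 additions + 4 multiplications)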
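One way to see the three-multiplication split at work is the recursive Python sketch below. It assumes operands are integers holding GF(2) coefficients and that the bit width is a power of two; `clmul_karatsuba` is an illustrative name, not taken from the slides.

```python
def clmul_karatsuba(a: int, b: int, bits: int) -> int:
    """Carry-less product of two `bits`-wide operands (bits a power of two)."""
    if bits == 1:
        return a & b                                   # a single AND gate
    half = bits // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    low = clmul_karatsuba(a0, b0, half)                # A0*B0
    high = clmul_karatsuba(a1, b1, half)               # A1*B1
    mid = clmul_karatsuba(a0 ^ a1, b0 ^ b1, half)      # (A1+A0)*(B1+B0)
    # The middle term is mid + low + high, where + is XOR in GF(2)[x]
    return low ^ ((mid ^ low ^ high) << half) ^ (high << bits)


print(bin(clmul_karatsuba(0b0111, 0b0011, 4)))
```

The last line of the function is where A0∙B0 and A1∙B1 are each used twice: once in their own position and once to correct the middle term.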
Classic Karatsuba Multiplication
• Gate count:
  – 3^(log2 m) AND gates
  – 6 ∙ 3^(log2 m) - 8m + 2 XOR gates
• Longest path: 1 AND + 3 log2(m) XOR
[Figure: row of AND gates (&) followed by three levels of 3 XORs each]
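The closed forms for classic Karatsuba, 3^(log2 m) AND gates, 6 ∙ 3^(log2 m) - 8m + 2 XOR gates and a longest path of 3 log2(m) XORs, can be evaluated with a short Python sketch. The function name is hypothetical and m is assumed to be a power of two.

```python
def karatsuba_gate_counts(m: int) -> tuple[int, int, int]:
    """Return (AND gates, XOR gates, XOR gates on the longest path), m a power of two."""
    levels = m.bit_length() - 1              # log2(m) for a power of two
    and_gates = 3 ** levels
    xor_gates = 6 * 3 ** levels - 8 * m + 2
    xor_depth = 3 * levels
    return and_gates, xor_gates, xor_depth


for m in (2, 4, 16, 64, 128, 256):
    print(m, karatsuba_gate_counts(m))
```

The AND counts and the XOR counts produced here agree with the Karatsuba AND column and the parenthesised XOR values of the comparison slide, for example 729 and 3864 at 64 bit.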
Iterative Karatsuba Multiplication
• Split factors into 4 segments:
  A(x) = a3…a0
  B(x) = b3…b0
• Perform 9 partial multiplications
• Result is 8 segments:
  C(x) = c7…c0

Partial products (each yields a low half [0] and a high half [1], which are aggregated into the segments c7…c0 of C(x)):
  a0 * b0
  a1 * b1
  a2 * b2
  a3 * b3
  (a0 + a1) * (b0 + b1)
  (a0 + a2) * (b0 + b2)
  (a1 + a3) * (b1 + b3)
  (a2 + a3) * (b2 + b3)
  (a0 + a1 + a2 + a3) * (b0 + b1 + b2 + b3)
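The nine operand combinations are what results from nesting two Karatsuba levels. The Python sketch below derives them symbolically: each operand is represented by the set of segment indices that are XORed together. The function name and the set representation are assumptions made for illustration.

```python
def operand_sets(segments):
    """Return, for each partial product, the set of segment indices XORed into its operand."""
    if len(segments) == 1:
        return [segments[0]]
    half = len(segments) // 2
    low, high = segments[:half], segments[half:]
    # Karatsuba middle operand: low + high, segment by segment (symmetric difference = XOR)
    mid = [l ^ h for l, h in zip(low, high)]
    return operand_sets(low) + operand_sets(high) + operand_sets(mid)


sets = operand_sets([frozenset({i}) for i in range(4)])
for s in sets:
    print(sorted(s))
```

The same index sets select the segments of both A(x) and B(x). Starting from 2 or 8 segments, the recursion yields 3 and 27 operand sets respectively.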
Iterative Karatsuba Multiplication (2)
• Optimized aggregation plan
  – Reduces number of XOR operations to 34 (instead of 40 for classic Karatsuba)
• Without additional costs
  – constant number of ANDs
  – constant longest path
• Can be applied recursively
  – 256 bit mul = 9 x 64 bit mul
  – 64 bit mul = 9 x 16 bit mul
  – 16 bit mul = 9 x 4 bit mul
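The effect of the recursion on the AND count can be sketched in a few lines of Python. The sketch assumes the 4-way split is applied all the way down to 1-bit products, each being a single AND gate, and that the bit size is a power of four; `raik_and_gates` is an illustrative name.

```python
def raik_and_gates(bits: int) -> int:
    """AND gates when every 4-way split costs nine quarter-size multiplications."""
    if bits == 1:
        return 1                          # a 1-bit product is a single AND gate
    return 9 * raik_and_gates(bits // 4)


for bits in (4, 16, 64, 256):
    print(bits, raik_and_gates(bits))
```

Under that assumption the count grows by a factor of 9 for every 4x increase in operand size: 9, 81, 729 and 6561 AND gates for 4, 16, 64 and 256 bit.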
Comparison
Gate counts (XOR values in parentheses: classic Karatsuba):

Bit size   Classic Polynomial   RAI Karatsuba           Hybrid RAIK
           XOR      AND         XOR             AND     XOR      AND
2          1        4           4 (4)           3       1        4
4          9        16          23 (24)         9       9        16
16         225      256         332 (360)       81      206      144
64         3969     4096        3521 (3864)     729     2497     1296
128        16129    16384       10959 (12100)   2187    7505     3888
256        65025    65536       33854 (37320)   6561    24649    11664

[Figure annotation: 9x between the 4, 16, 64 and 256 bit rows]

• Hybrid RAIK is the smallest polynomial multiplication unit
• BUT: CPM is faster

XOR gates in longest path:

Bit size   CPM   Hybrid RAIK
64         6     15
128        7     18
256        8     21
Recursive combinatorial multiplication units
• Perform multiplication within one clock cycle
• Do not need state information
• Technically feasible up to 256 bit
  – huge complexity
  – high latency
• Practically questionable
  – Data transport/bus becomes bottleneck
[Figure: 256 bit MUL unit, 16 ns, inputs A and B, output C = A·B]
Iterative multiplication units
• More than one clock cycle per multiplication
• Iterative unit embeds smaller recursive unit
• Highly regular structure
  – flexible
  – little overhead
[Figure: A and B → Selection → Partial Multiplier → Aggregation → C, with bus widths 256 bit, 64 bit, 128 bit and 511 bit along the path; Control, 9 times]
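The selection / partial multiplier / aggregation loop can be modelled in software as follows. This is a behavioural sketch only: the schedule is derived here by nesting two Karatsuba levels and is not the optimized aggregation plan from the talk, the embedded multiplier is replaced by a simple schoolbook stand-in, and all names are hypothetical.

```python
# (segment indices XORed into each operand, result segment offsets) for the nine cycles
SCHEDULE = [
    ((0,), (0, 1, 2, 3)), ((1,), (1, 2, 3, 4)),
    ((2,), (2, 3, 4, 5)), ((3,), (3, 4, 5, 6)),
    ((0, 1), (1, 3)), ((2, 3), (3, 5)),
    ((0, 2), (2, 3)), ((1, 3), (3, 4)),
    ((0, 1, 2, 3), (3,)),
]


def clmul(a: int, b: int) -> int:
    """Stand-in for the embedded partial multiplier (schoolbook, carry-less)."""
    c = 0
    while b:
        if b & 1:
            c ^= a
        a <<= 1
        b >>= 1
    return c


def iterative_mul(a: int, b: int, seg_bits: int = 64) -> int:
    """Multiply two 4*seg_bits-wide polynomials in nine partial multiplications."""
    mask = (1 << seg_bits) - 1
    c = 0
    for indices, offsets in SCHEDULE:            # control: one entry per cycle
        sa = sb = 0
        for i in indices:                        # selection: XOR the chosen segments
            sa ^= (a >> (i * seg_bits)) & mask
            sb ^= (b >> (i * seg_bits)) & mask
        p = clmul(sa, sb)                        # partial multiplier
        for off in offsets:                      # aggregation into the result
            c ^= p << (off * seg_bits)
    return c


a = (1 << 256) - 12345
b = 0xDEADBEEF0123456789ABCDEF0F1E2D3C4B5A69788796A5B4C3D2E1F0AA55AA55
print(iterative_mul(a, b) == clmul(a, b))
```

With `seg_bits=64` the loop handles 256 bit operands; other segment sizes work the same way as long as each operand fits into four segments.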
Iterative multiplication units
• 256 bit polynomial multipliers

Configuration   Cycles per       Size of embedded   Delay   Silicon area   Energy per
                multiplication   multiplier [bit]   [ns]    [mm^2]         multiplication [nWs]
Combinatorial   1                256                16      2.0            5
2 segment       3                128                13      1.2            6
4 segment       9                64                 11      0.6            11
8 segment       27               32                 10      0.4            19
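The cycle counts and embedded multiplier sizes in this table follow a simple pattern, sketched below in Python: each doubling of the segment count triples the number of partial multiplications and halves the embedded multiplier. The function name is an assumption, and the segment count is taken to be a power of two.

```python
def iterative_config(total_bits: int, segments: int) -> tuple[int, int]:
    """Return (cycles per multiplication, embedded multiplier size in bits)."""
    levels = segments.bit_length() - 1       # log2(segments) for a power of two
    return 3 ** levels, total_bits // segments


for segments in (1, 2, 4, 8):
    print(segments, iterative_config(256, segments))
```

Delay, area and energy are measured values from the talk and are not modelled here.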
Set up an ECC accelerator design
• 283 bit
  – Bus
  – Registers
  – ALU
• Speed requirements
  – 4 segment multiplier (72 bit embedded)
• Adapt control logic
ECC designs 163 – 571 bit
• Time per ECPM
ECC designs 163 – 571 bit
• Energy per ECPM and silicon area (IHP 0.25 µm CMOS)
Conclusions
• Polynomial multiplication is the most challenging operation in the finite field:
  – executed 1500 times for one 233 bit ECPM
  – most silicon area (70%)
  – highest utilization (95%)
• Large combinatorial multipliers are feasible
  – hRAIK is the smallest
  – Classic polynomial is the fastest
• For ECC designs, iterative Karatsuba approaches are well suited
  – Adaptable
  – Small
  – Energy efficient
Thank You
Questions?