fast adders: parallel prefix network adders, conditional-sum adders, & carry-skip adders ece...
Post on 04-Jan-2016
232 Views
Preview:
TRANSCRIPT
Fast Adders:Parallel Prefix Network Adders,
Conditional-Sum Adders,& Carry-Skip Adders
ECE 645: Lecture 5
Required Reading
Chapter 6.4, Carry Determination as Prefix ComputationChapter 6.5, Alternative Parallel Prefix Networks
Chapter 7.4, Conditional-Sum AdderChapter 7.5, Hybrid Adder Designs
Chapter 7.1, Simple Carry-Skip Adders
Note errata at:http://www.ece.ucsb.edu/~parhami/text_comp_arit_1ed.htm#errors
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
Recommended Reading
J-P. Deschamps, G. Bioul, G. Sutter, Synthesis of Arithmetic Circuits: FPGA, ASIC and Embedded Systems
Chapter 11.1.9, Prefix Adders
Chapter 11.1.3, Carry-Skip AdderChapter 11.1.4, Optimization of Carry-Skip AddersChapter 11.1.10, FPGA Implementations of AddersChapter 11.1.11, Long-Operand Adders
Parallel Prefix Network Adders
5
Basic component - Carry operator (1)
B” B’
g” p” g’ p’
g p
g = g” + g’p”p = p’p”
(g, p) = (g’, p’) ¢ (g”, p”) = (g” + g’p”, p’p”)
B
Parallel Prefix Network Adders
6
Basic component - Carry operator (2)
B” B’
g” p” g’ p’
g p
g = g” + g’p”p = p’p”
(g, p) = (g’, p’) ¢ (g”, p”) = (g” + g’p”, p’p”)
B
overlap okay!
Parallel Prefix Network Adders
7
Associative
Not commutative
[(g1, p1) ¢ (g2, p2)] ¢ (g3, p3) = (g1, p1) ¢ [(g2, p2) ¢ (g3, p3)]
(g1, p1) ¢ (g2, p2) (g2, p2) ¢ (g1, p1)
Properties of the carry operator ¢
8
Major concept
Given:
Find:
(g0, p0) (g1, p1) (g2, p2) …. (gk-1, pk-1)
(g[0,0], p[0,0]) (g[0,1], p[0,1]) (g[0,2], p[0,2]) … (g[0,k-1], p[0,k-1])
ci = g[0,i-1] + c0p[0,i-1]
Parallel Prefix Network Adders
block generate from index 0 to k-1
9
Parallel Prefix Sum Problem
Given:
Find:
Parallel Prefix Adder Problem
Given:
Find:
x0 x0+x1 x0+x1+x2 … x0+x1+x2+ …+ xk-1
x0 x1 x2 … xk-1
x0 x0 ¢ x1 x0 ¢ x1 ¢ x2 … x0 ¢ x1 ¢ x2 ¢ … ¢ xk-1
x0 x1 x2 … xk-1
where xi = (gi, pi)
Similar to Parallel Prefix Sum Problem
10
Parallel Prefix Sums Network I
11
Cost = C(k) = 2 C(k/2) + k/2 = = 2 [2C(k/4) + k/4] + k/2 = 4 C(k/4) + k/2 + k/2 = = …. = = 2 log k-1C(2) + k/2 (log2k-1) = = k/2 log2k
2
Example:
C(16) = 2 C(8) + 8 = 2[2 C(4) + 4] + 8 = = 4 C(4) + 16 = 4 [2 C(2) + 2] + 16 = = 8 C(2) + 24 = 8 + 24 = 32 = (16/2) log2 16
C(2) = 1
Parallel Prefix Sums Network I – Cost (Area) Analysis
12
Delay = D(k) = D(k/2) + 1 = = [D(k/4) + 1] + 1 = D(k/4) + 1 + 1 = = …. = = log2k
Example:
D(16) = D(8) + 1 = [D(4) + 1] + 1 = = D(4) + 2 = [D(2) + 1] + 2 = = 4 = log2 16
D(2) = 1
Parallel Prefix Sums Network I – Delay Analysis
13
Parallel Prefix Sums Network II (Brent-Kung)
14
Cost = C(k) = C(k/2) + k-1 = = [C(k/4) + k/2-1] + k-1 = C(k/4) + 3k/2 - 2 = = …. = = C(2) + (2k - 2k/2(log k-1)) - (log2k-1) = = 2k - 2 - log2k
Example:
C(16) = C(8) + 16-1 = [C(4) + 8-1] + 16-1 = = C(2) + 4-1 + 24-2 = 1 + 28 - 3 = 26 = 2·16 - 2 - log216
C(2) = 1
2
Parallel Prefix Sums Network II – Cost (Area) Analysis
15
Delay = D(k) = D(k/2) + 2 = = [D(k/4) + 2] + 2 = D(k/4) + 2 + 2 = = …. = = 2 log2k - 1
Example:
D(16) = D(8) + 2 = [D(4) + 2] + 2 = = D(4) + 4 = [D(2) + 2] + 4 = = 7 = 2 log2 16 - 1
D(2) = 1
Parallel Prefix Sums Network II – Delay Analysis
16
x0x1x2x3x4x5x6x7
s0s1s2s3s4s5s6s7
4 –bit B-K PPN
8-bit Brent-Kung Parallel Prefix Network
17
x1’x3’
s1’s3’
2 –bit B-K PPN
x5’x7’
s5’s7’
4-bit Brent-Kung Parallel Prefix Network
18
8-bit Brent-Kung Parallel Prefix Network Critical Path
GP GP GP GP GP GP GPGPGP
x7 y7 x6 y6 x5 y5 x4 y4 x3 y3 x2 y2 x1 y1 x0 y0
g7,p7 g6,p6 g5,p5 g4,p4 g3,p3 g2,p2 g1,p1 g0,p0
ccc ccc ccc ccc
ccc ccc
ccc
ccc
ccc ccc ccc
g[0,0]
p[0,0]
g[0,1]
p[0,1]
g[0,2]
p[0,2]
g[0,3]
p[0,3]
g[0,4]p[0,4]
g[0,5]p[0,5]
g[0,6]p[0,6]
g[0,7]
p[0,7]
CC
c0
C C C C C C C
SS S S S S S S S
c7 p7 c6 p6 c5 p5 c4 p4 c3 p3 c2 p2 c1 p1
p0
s7 s6 s5 s4 s3 s2 s1 s0
c8
criticalpath
19
Critical Path
GP
c
C
S
gi = xi yi
pi = xi yi
g = g” + g’ p”p = p’ p”
ci+1 = g[0,i] + c0 p[0,i]
si = pi ci
1 gate delay
2 gate delays
2 gate delays
1 gate delay
20
Brent-Kung Parallel Prefix Graph for 16 Inputs
21
Kogge-Stone Parallel Prefix Graph for 16 Inputs
22
Comparison of architectures
HybridNetwork 2Brent-Kung
Kogge-Stone
Cost(k)
Delay(k)
Cost(16)
Delay(16)
Cost(32)
Delay(32)
2 log2k - 2
2k - 2 - log2k
log2k+1 log2k
k/2 log2k k log2k - k + 1
6 5 4
26 32 49
8 6 5
57 80 129
Parallel Prefix Network Adders
23
Latency vs. Area Tradeoff
24
Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs
Conditional-SumAdders
One-level k-bit Carry-Select Adder
Two-level k-bit Carry Select Adder
28
Conditional Sum Adder
• Extension of carry-select adder• Carry select adder
• One-level using k/2-bit adders• Two-level using k/4-bit adders• Three-level using k/8-bit adders• Etc.
• Assuming k is a power of two, eventually have an extreme where there are log2k-levels using 1-bit adders• This is a conditional sum adder
29
Conditional Sum Adder:Top-Level Block for One Bit Position
30
Three Levels of a Conditional Sum Adder
2 2
2 2
xi+3 yi+3
c=0c=1
2 2
xi+2 yi+2
c=0c=1
1 1
1
1
3c=0
3c=1
2 2
2 2
xi+1 yi+1
c=0c=1
2 2
1-bit conditional sum block
xi yi
c=0c=1
1 1
1
1
3c=0
3c=1
22
1
1
3 3
5
c=1
5
c=0
1+1
2+1
4+1
branch point
concatenation
block carry-indetermines selection
31
16-Bit Conditional Sum Adder Example
32
Conditional Sum Adder Metrics
Hybrid Adders
A Hybrid Ripple-Carry/Carry-Lookahead Adder
A Hybrid Carry-Lookahead/Carry-Select Adder
Carry-Skip AddersFixed-Block-Size
Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 37
7.1 Simple Carry-Skip Adders
Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.
cc ccc
cc ccc
pppp
SkipSkipSkip
4-Bit Block
Skip logic (2 gates)
1612
8
4
0
0
4
8
1216
[12,15] [8,11] [4,7][0,3]
(a) Ripple-carry adder.
(b) Simple carry-skip adder.
3 2 1 0
Ripple-carry stages
4-Bit Block
4-Bit Block
4-Bit Block
4-Bit Block
4-Bit Block
3 2 1 0
Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 38
Another View of Carry-Skip Addition
Street/freeway analogy for carry-skip adder.
c
g
p
4j+1
4j+1
g
p
4j
4j
g
p
4j+2
4j+2
g
p
4j+3
4j+3
c
4j
4j+4
c
4j+3
c
4j+2
c
4j+1
One-way street
Freeway
Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 39
Mux-Based Skip Carry Logic
The carry-skip adder with “OR combining” works fine if we begin with a clean slate, where all signals are 0s at the outset; otherwise, it will run into problems, which do not exist in mux-based version
c
g
p
4j+1
4j+1
g
p
4j
4j
g
p
4j+2
4j+2
g
p
4j+3
4j+3
c
4j
4j+4
c
4j+3
c
4j+2
c
4j+1
01
p[4j, 4j+3]
c4j+4
c
g
p
4j+1
4j+1
g
p
4j
4j
g
p
4j+2
4j+2
g
p
4j+3
4j+3
c
4j
4j+4
c
4j+3
c
4j+2
c
4j+1
Fig. 10.7 of arch book
Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 40
Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k)
Example: k = 32, b opt = 4, T
opt = 12.5 stages(contrast with 32 stages for a ripple-carry adder)
Tfixed-skip-add = (b – 1) + 0.5 + (k/b – 2) + (b – 1) in block 0 OR gate skips in last block
2b + k/b – 3.5 stages
dT/db = 2 – k/b2 = 0 b opt = k/2
T opt = 22k – 3.5
. . .
1
Fixed-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit
Latency of the carry-skip adder with fixed block width
Fixed block size - b bits
Latencyfixed-carry-skip = ( b - 1 ) + 0.5 + - 2 + ( b - 1 )
in block 0
OR gate skips in last block
Adder size - k-bits
= 2b + - 3.5
kb
kb
Number of stages - t
Fixed-Block-Size Carry-Skip Adder (2)
Optimal fixed block size
dLatencyfixed-carry-skip
db= 2 -
k
b2= 0
bopt=
k2
topt = k
bopt
=
2 k
Latencyfixed-carry-skipopt = 2
k2
+k
k2
- 3.5 =
=
2 k +
2 k - 3.5 =
2 k2 - 3.5
k bopt Latencyfixed-carry-skip
32
16
4
Latencyripple-carry Latencylook-ahead
64
128
Fixed-Block-Size Carry-Skip Adder (3)
8
topt
8
16
12.5
28.5
32
128
6.5
6.5
4.5
8.5
2
3
8
5
8.5 16
5
6
13
11
18.5 64
7.5
18.5
Carry-Chain &Carry-Skip Adders
in Xilinx FPGAs
LUT 0 1
xi yi ci+1
0
0
1
1
0
1
0
1
yi
yi
ci
ci
xi
yi
pi
ci+1
ci
si
Basic Cell of a Carry-Chain Adder in Xilinx FPGAs
LUT 0 1
Carry-Skip Adder
b-bit blockLUT 0 1
LUT 0 1xib
. . . . . . .
yib
pib
cib
sib
cib+1
sib+1
xib+1
yib+1
pib+1
xib+(b-1)
yib+(b-1)
pib+(b-1)
cc(i+1)b
sib+(b-1)
LUT 0 1
Carry-Skip Adder:
Carry SkipMultiplexers
LUT 0 1
LUT 0 1
. . . . . . .
p[0, b-1]
c0
cb
p[b, 2b-1]
p[k-b, k-1]
ck
c0
cb
ck-b
p[k-b, k-1]
p[b, 2b-1]
p[0, b-1]
ccb
cc2b
cck-b
LUT 0 1
Carry-Skip Adder:
Computationof the BlockPropagate
Signals
LUT
LUT 0 1
. . . . . . .
cb
p[ib, ib+(b-1)]
0 1
p[ib, ib+1]
p[ib, ib+3]
p[ib, ib+(b-3)]
0 1
0
0
xib+1
yib+1
xib
yib
xib+3
yib+3
xib+2
yib+2
xib+(b-1)
yib+(b-1)
xib+(b-2)
yib+(b-2)
Complete Carry-Skip Adder
xb-1..0
yb-1..0
x2b-1..b
y2b-1..b
x3b-1..2b
y3b-1..2b
xk-1..k-b
yk-1..k-b. . . . .
cc2b
0
1
cb
0
1
c2b
cc3bccb
0
1
c0
cck
ck-b
. . .
ss-1..0 s2s-1..s s3b-1..2b sk-1..k-b
c0
0
1
ck
xb-1..0
yb-1..0
x2b-1..b
y2b-1..b
x3b-1..2b
y3b-1..2b
xk-1..k-b
yk-1..k-b. . . . .
p[0,b-1] p[b,2b-1] p[2b,3b-1] p[k-b,k-1]
Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter
Delay in ns
b=k b=8 b=16 b=32
Max. Frequency
Increase
64 14 13 12 13%
96 16 14 13 21%
128 23 14 14 63%
256 38 - 16 17 141%
512 77 - 20 20 296%
1024 159 - 28 25 531%
k
Carry-Skip Adders in Xilinx Spartan II FPGAs
by J-P. Deschamps, G. Bioul, G. Sutter
Area in CLB slices
b=k b=8 b=16 b=32 b=8 b=16 b=32
Area Overhead
64 32 47 41 - 47% 28% -
96 48 73 66 - 52% 38% -
128 64 99 91 - 55% 42% -
256 128 - 191 179 - 49% 40%
512 256 - 391 375 - 53% 46%
1024 512 - 791 767 - 54% 50%
k
Carry-Skip AddersVariable-Block-Size
Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 53
Carry-Skip Adder with Variable-Width Blocks
Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.
b b b b. . .
RippleSkip
Carry path (1)
01t–1 t–2 Block widths
Carry path (3)
Carry path (2)
The total number of bits in the t blocks is k:
2[b + (b + 1) + . . . + (b + t/2 – 1)] = t(b + t/4 – 1/2) = k
b = k/t – t/4 + 1/2
Tvar-skip-add = 2(b – 1) + 0.5 + t – 2 = 2k/t + t/2 – 2.5
dT/db = –2k/t 2 + 1/2 = 0 t
opt = 2k
T opt = 2k – 2.5 (a factor of 2 smaller than for fixed-block)
P
xj yj
pj
P
xj+1 yj+1
pj+1
P
pj+bi-2
P
xj+b -2 yj+b -2xj+b -1 yj+b -1
pj+bi-1
i i i i
CP
p[i,i|+bi-1]
cin=0
xj yj
FA
xj+1 yj+1
FA
xj+b -2 yj+b -2
FAFA . . .
bi stages
xj+b -1 yj+b -1i i i i
cout
sj+b -1 sj+b -2 sj+1 sj
Most critical path to produce carry
i i
P
xj yj
pj
P
xj+1 yj+1
pj+1
P
pj+b -2
P
xj+b -2 yj+b -2xj+b -1 yj+b -1
pj+b -1
i i i i
CP
p[i,i|+bi-1]
cin
xj yj
FA
xj+1 yj+1
FA
xj+b -2 yj+b -2
FAFA . . .
bi stages
xj+b -1 yj+b -1i i i
cout
sj+b -1 sj+b -2 sj+1 sj
Most critical path to assimilate carry
i i
i
i i
Variable-Block-Size Carry-Skip Adder (1)
Notation & Assumptions
Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit
Adder size - k-bits
Number of stages - t
Block size - variable
First and last block size - b bits
Variable-Block-Size Carry-Skip Adder (2)
Optimum block sizes
bt-1 bt-2 bt-3 . . . bt/2+1 bt/2-1 . . . b2 b1 b0
b b+1 b+2 . . . b+t
2-1 b+
t
2-1 b+2 b+1 b
Total number of bits
k = 2 [ b + (b+1) + (b+2) + … + (b + + 1 ) ] =
= t ( b + - )
t2
t4
12
Variable-Block-Size Carry-Skip Adder (3)
Number of bits in the first and last block
b = kt
- t4
+ 12
Latency of the carry-skip adder with variable block width
Latencyfixed-carry-skip = ( b - 1 ) + 0.5 + t - 2 + ( b - 1 )
in block 0
OR gate skips in last block
= 2 b + t - 3.5 = 2 kt
- t4
+ 12
+ t - 3.5 =
= 2kt
12
t - 2.5+
Variable-Block-Size Carry-Skip Adder (4)
Optimal number of blocks
dLatencyvariable-carry-skip
dt=
2k
t2
topt=
4 k
- +1
2= 0
=
k2
bopt = ktopt
-topt
4+ 1
2=
= k -4
+ 12
k2
k2
= 12
bopt = 1
Variable-Block-Size Carry-Skip Adder (5)
Optimal latency
= 2kt
12
t - 2.5+Latencyvariable-carry-skipopt =
=2k
+2
k2
k2
= - 2.5
=
k2 - 2.5
Latencyvariable-carry-skipopt
Latencyfixed-carry-skipopt
2
top related