fast adders: parallel prefix network adders, conditional-sum adders, & carry-skip adders ece...

Fast Adders:Parallel Prefix Network Adders,

Conditional-Sum Adders,& Carry-Skip Adders

ECE 645: Lecture 5

Required Reading

Chapter 6.4, Carry Determination as Prefix ComputationChapter 6.5, Alternative Parallel Prefix Networks

Chapter 7.4, Conditional-Sum AdderChapter 7.5, Hybrid Adder Designs

Chapter 7.1, Simple Carry-Skip Adders

Note errata at:http://www.ece.ucsb.edu/~parhami/text_comp_arit_1ed.htm#errors

Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design

Similar to Parallel Prefix Sum Problem

Parallel Prefix Sums Network I

Cost = C(k) = 2 C(k/2) + k/2 = = 2 [2C(k/4) + k/4] + k/2 = 4 C(k/4) + k/2 + k/2 = = …. = = 2 log k-1C(2) + k/2 (log2k-1) = = k/2 log2k

Example:

C(16) = 2 C(8) + 8 = 2[2 C(4) + 4] + 8 = = 4 C(4) + 16 = 4 [2 C(2) + 2] + 16 = = 8 C(2) + 24 = 8 + 24 = 32 = (16/2) log2 16

C(2) = 1

Parallel Prefix Sums Network I – Cost (Area) Analysis

Delay = D(k) = D(k/2) + 1 = = [D(k/4) + 1] + 1 = D(k/4) + 1 + 1 = = …. = = log2k

Example:

D(16) = D(8) + 1 = [D(4) + 1] + 1 = = D(4) + 2 = [D(2) + 1] + 2 = = 4 = log2 16

D(2) = 1

Parallel Prefix Sums Network I – Delay Analysis

Parallel Prefix Sums Network II (Brent-Kung)

Cost = C(k) = C(k/2) + k-1 = = [C(k/4) + k/2-1] + k-1 = C(k/4) + 3k/2 - 2 = = …. = = C(2) + (2k - 2k/2(log k-1)) - (log2k-1) = = 2k - 2 - log2k

Example:

C(16) = C(8) + 16-1 = [C(4) + 8-1] + 16-1 = = C(2) + 4-1 + 24-2 = 1 + 28 - 3 = 26 = 2·16 - 2 - log216

C(2) = 1

Parallel Prefix Sums Network II – Cost (Area) Analysis

Delay = D(k) = D(k/2) + 2 = = [D(k/4) + 2] + 2 = D(k/4) + 2 + 2 = = …. = = 2 log2k - 1

Example:

D(16) = D(8) + 2 = [D(4) + 2] + 2 = = D(4) + 4 = [D(2) + 2] + 4 = = 7 = 2 log2 16 - 1

D(2) = 1

Parallel Prefix Sums Network II – Delay Analysis

x0x1x2x3x4x5x6x7

s0s1s2s3s4s5s6s7

4 –bit B-K PPN

8-bit Brent-Kung Parallel Prefix Network

x1’x3’

s1’s3’

2 –bit B-K PPN

x5’x7’

s5’s7’

4-bit Brent-Kung Parallel Prefix Network

8-bit Brent-Kung Parallel Prefix Network Critical Path

GP GP GP GP GP GP GPGPGP

x7 y7 x6 y6 x5 y5 x4 y4 x3 y3 x2 y2 x1 y1 x0 y0

g7,p7 g6,p6 g5,p5 g4,p4 g3,p3 g2,p2 g1,p1 g0,p0

ccc ccc ccc ccc

ccc ccc

ccc ccc ccc

g[0,0]

p[0,0]

g[0,1]

p[0,1]

g[0,2]

p[0,2]

g[0,3]

p[0,3]

g[0,4]p[0,4]

g[0,5]p[0,5]

g[0,6]p[0,6]

g[0,7]

p[0,7]

C C C C C C C

SS S S S S S S S

c7 p7 c6 p6 c5 p5 c4 p4 c3 p3 c2 p2 c1 p1

s7 s6 s5 s4 s3 s2 s1 s0

criticalpath

Critical Path

gi = xi yi

pi = xi yi

g = g” + g’ p”p = p’ p”

ci+1 = g[0,i] + c0 p[0,i]

si = pi ci

1 gate delay

2 gate delays

1 gate delay

Brent-Kung Parallel Prefix Graph for 16 Inputs

Kogge-Stone Parallel Prefix Graph for 16 Inputs

Comparison of architectures

HybridNetwork 2Brent-Kung

Kogge-Stone

Cost(k)

Delay(k)

Cost(16)

Delay(16)

Cost(32)

Delay(32)

2 log2k - 2

2k - 2 - log2k

log2k+1 log2k

k/2 log2k k log2k - k + 1

26 32 49

57 80 129

Latency vs. Area Tradeoff

Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Graph for 16 Inputs

Conditional-SumAdders

One-level k-bit Carry-Select Adder

Two-level k-bit Carry Select Adder

Conditional Sum Adder

• Extension of carry-select adder• Carry select adder

• One-level using k/2-bit adders• Two-level using k/4-bit adders• Three-level using k/8-bit adders• Etc.

• Assuming k is a power of two, eventually have an extreme where there are log2k-levels using 1-bit adders• This is a conditional sum adder

Conditional Sum Adder:Top-Level Block for One Bit Position

Three Levels of a Conditional Sum Adder

xi+3 yi+3

c=0c=1

xi+2 yi+2

c=0c=1

xi+1 yi+1

c=0c=1

1-bit conditional sum block

c=0c=1

branch point

concatenation

block carry-indetermines selection

16-Bit Conditional Sum Adder Example

Conditional Sum Adder Metrics

Hybrid Adders

A Hybrid Ripple-Carry/Carry-Lookahead Adder

A Hybrid Carry-Lookahead/Carry-Select Adder

Carry-Skip AddersFixed-Block-Size

Apr. 2008 Computer Arithmetic, Addition/Subtraction Slide 37

7.1 Simple Carry-Skip Adders

Fig. 7.1 Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.

cc ccc

SkipSkipSkip

4-Bit Block

Skip logic (2 gates)

[12,15] [8,11] [4,7][0,3]

(a) Ripple-carry adder.

(b) Simple carry-skip adder.

3 2 1 0

Ripple-carry stages

4-Bit Block

3 2 1 0

Another View of Carry-Skip Addition

Street/freeway analogy for carry-skip adder.

One-way street

Freeway

Mux-Based Skip Carry Logic

The carry-skip adder with “OR combining” works fine if we begin with a clean slate, where all signals are 0s at the outset; otherwise, it will run into problems, which do not exist in mux-based version

p[4j, 4j+3]

Fig. 10.7 of arch book

Carry-Skip Adder with Fixed Block Size

Block width b; k/b blocks to form a k-bit adder (assume b divides k)

Example: k = 32, b opt = 4, T

opt = 12.5 stages(contrast with 32 stages for a ripple-carry adder)

Tfixed-skip-add = (b – 1) + 0.5 + (k/b – 2) + (b – 1) in block 0 OR gate skips in last block

2b + k/b – 3.5 stages

dT/db = 2 – k/b2 = 0 b opt = k/2

T opt = 22k – 3.5

Fixed-Block-Size Carry-Skip Adder (1)

Notation & Assumptions

Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit

Latency of the carry-skip adder with fixed block width

Fixed block size - b bits

Latencyfixed-carry-skip = ( b - 1 ) + 0.5 + - 2 + ( b - 1 )

in block 0

OR gate skips in last block

Adder size - k-bits

= 2b + - 3.5

Number of stages - t

Optimal fixed block size

dLatencyfixed-carry-skip

db= 2 -

topt = k

Latencyfixed-carry-skipopt = 2

- 3.5 =

2 k - 3.5 =

2 k2 - 3.5

k bopt Latencyfixed-carry-skip

Latencyripple-carry Latencylook-ahead

8.5 16

18.5 64

Carry-Chain &Carry-Skip Adders

in Xilinx FPGAs

LUT 0 1

xi yi ci+1

Basic Cell of a Carry-Chain Adder in Xilinx FPGAs

LUT 0 1

Carry-Skip Adder

b-bit blockLUT 0 1

LUT 0 1xib

. . . . . . .

xib+(b-1)

yib+(b-1)

pib+(b-1)

cc(i+1)b

sib+(b-1)

LUT 0 1

Carry-Skip Adder:

Carry SkipMultiplexers

LUT 0 1

. . . . . . .

p[0, b-1]

p[b, 2b-1]

p[k-b, k-1]

p[b, 2b-1]

p[0, b-1]

LUT 0 1

Carry-Skip Adder:

Computationof the BlockPropagate

Signals

LUT 0 1

. . . . . . .

p[ib, ib+(b-1)]

p[ib, ib+1]

p[ib, ib+3]

p[ib, ib+(b-3)]

xib+(b-1)

yib+(b-1)

xib+(b-2)

yib+(b-2)

Complete Carry-Skip Adder

xb-1..0

yb-1..0

x2b-1..b

y2b-1..b

x3b-1..2b

y3b-1..2b

xk-1..k-b

yk-1..k-b. . . . .

cc3bccb

ss-1..0 s2s-1..s s3b-1..2b sk-1..k-b

xb-1..0

yb-1..0

x2b-1..b

y2b-1..b

x3b-1..2b

y3b-1..2b

xk-1..k-b

yk-1..k-b. . . . .

p[0,b-1] p[b,2b-1] p[2b,3b-1] p[k-b,k-1]

Carry-Skip Adders in Xilinx Spartan II FPGAs

by J-P. Deschamps, G. Bioul, G. Sutter

Delay in ns

b=k b=8 b=16 b=32

Max. Frequency

Increase

64 14 13 12 13%

96 16 14 13 21%

128 23 14 14 63%

256 38 - 16 17 141%

512 77 - 20 20 296%

1024 159 - 28 25 531%

Carry-Skip Adders in Xilinx Spartan II FPGAs

by J-P. Deschamps, G. Bioul, G. Sutter

Area in CLB slices

b=k b=8 b=16 b=32 b=8 b=16 b=32

Area Overhead

64 32 47 41 - 47% 28% -

96 48 73 66 - 52% 38% -

128 64 99 91 - 55% 42% -

256 128 - 191 179 - 49% 40%

512 256 - 391 375 - 53% 46%

1024 512 - 791 767 - 54% 50%

Carry-Skip AddersVariable-Block-Size

Carry-Skip Adder with Variable-Width Blocks

Fig. 7.2 Carry-skip adder with variable-size blocks and three sample carry paths.

b b b b. . .

RippleSkip

Carry path (1)

01t–1 t–2 Block widths

Carry path (3)

Carry path (2)

The total number of bits in the t blocks is k:

2[b + (b + 1) + . . . + (b + t/2 – 1)] = t(b + t/4 – 1/2) = k

b = k/t – t/4 + 1/2

Tvar-skip-add = 2(b – 1) + 0.5 + t – 2 = 2k/t + t/2 – 2.5

dT/db = –2k/t 2 + 1/2 = 0 t

opt = 2k

T opt = 2k – 2.5 (a factor of 2 smaller than for fixed-block)

xj+1 yj+1

pj+bi-2

xj+b -2 yj+b -2xj+b -1 yj+b -1

pj+bi-1

i i i i

p[i,i|+bi-1]

xj+1 yj+1

xj+b -2 yj+b -2

FAFA . . .

bi stages

xj+b -1 yj+b -1i i i i

sj+b -1 sj+b -2 sj+1 sj

Most critical path to produce carry

xj+1 yj+1

pj+b -2

xj+b -2 yj+b -2xj+b -1 yj+b -1

pj+b -1

i i i i

p[i,i|+bi-1]

xj+1 yj+1

xj+b -2 yj+b -2

FAFA . . .

bi stages

xj+b -1 yj+b -1i i i

sj+b -1 sj+b -2 sj+1 sj

Most critical path to assimilate carry

Variable-Block-Size Carry-Skip Adder (1)

Notation & Assumptions

Delay of skip logic = Delay of one stage of ripple-carry adder = 1 delay unit

Adder size - k-bits

Number of stages - t

Block size - variable

First and last block size - b bits

Optimum block sizes

bt-1 bt-2 bt-3 . . . bt/2+1 bt/2-1 . . . b2 b1 b0

b b+1 b+2 . . . b+t

2-1 b+

2-1 b+2 b+1 b

Total number of bits

k = 2 [ b + (b+1) + (b+2) + … + (b + + 1 ) ] =

= t ( b + - )

Number of bits in the first and last block

b = kt

Latency of the carry-skip adder with variable block width

Latencyfixed-carry-skip = ( b - 1 ) + 0.5 + t - 2 + ( b - 1 )

in block 0

OR gate skips in last block

= 2 b + t - 3.5 = 2 kt

+ t - 3.5 =

t - 2.5+

Optimal number of blocks

dLatencyvariable-carry-skip

bopt = ktopt

= k -4

bopt = 1

Optimal latency

t - 2.5+Latencyvariable-carry-skipopt =

= - 2.5

k2 - 2.5

Latencyvariable-carry-skipopt

Latencyfixed-carry-skipopt

fast adders: parallel prefix network adders, conditional-sum adders, & carry-skip adders ece...

Documents

adders and subtractors discussion d4.1. adders and...

binary adders: half adders and full adders - edward...

parallel adders

parallel prefix adders

half adders and full adders

basic adders and counters implementation of adders in fpgas...

full adders

notes adders

lecture 12b: adders

lecture 17: adders. cmos vlsi designcmos vlsi design 4th ed....

lecture13 adders

fast adders - strony przedmiotów -...

ee241 - spring 2000 - university of california,...

a study of adders

kia adders

design of arithmetic circuits – adders, subtractors, bcd...

16 bit digital adders -...

design of adders,subtractors, bcd adders week6 and 7 -...

binary parallel adders

designing dynamic carry skip adders: analysis and...