VLSI Arithmetic: Adders & Multipliers
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Oklobdzija 2004 Computer Arithmetic 2
Introduction
• Digital computer arithmetic belongs to computer architecture; however, it is also an aspect of logic design.
• The objective of computer arithmetic is to develop algorithms that use the available hardware in the most efficient way.
• Ultimately, speed, power, and chip area are the most commonly used measures, which strongly links the algorithms to the technology of implementation.
Basic Operations
• Addition
• Multiplication
• Multiply-Add
• Division
• Evaluation of Functions
• Multi-Media
Addition of Binary Numbers
Full Adder
The full adder is the fundamental building block of most arithmetic circuits.
The sum and carry outputs are described as:
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$
$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$
[Figure: full adder block with inputs a_i, b_i, C_in and outputs s_i, C_out]
Addition of Binary Numbers
Inputs            Outputs
c_i  a_i  b_i     s_i  c_{i+1}    Carry status
0    0    0       0    0          -
0    0    1       1    0          Propagate
0    1    0       1    0          Propagate
0    1    1       0    1          Generate
1    0    0       1    0          -
1    0    1       0    1          Propagate
1    1    0       0    1          Propagate
1    1    1       1    1          Generate
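The truth table above can be regenerated from the sum and carry equations. The following is a minimal sketch in Python, not part of the original slides; the names `full_adder` and `truth_table` are illustrative assumptions.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (s, c_out) for one-bit inputs a, b and carry-in c."""
    s = a ^ b ^ c                          # sum is the XOR of the three inputs
    c_out = (a & b) | (a & c) | (b & c)    # carry is the majority function
    return s, c_out


def truth_table() -> list[tuple[int, int, int, int, int]]:
    """Rows (c, a, b, s, c_out), in the same column order as the table."""
    return [(c, a, b, *full_adder(a, b, c))
            for c, a, b in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Each row satisfies a + b + c = s + 2·c_out, which is a quick way to check the table by hand.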
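Full-Adder Implementation
Full-adder operation is defined by the equations:
$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$
$c_{i+1} = a_i b_i + \bar{a}_i b_i c_i + a_i \bar{b}_i c_i = g_i + p_i c_i$
Carry-propagate: $p_i = a_i \oplus b_i$, and carry-generate: $g_i = a_i b_i$
A one-bit adder could be implemented as shown:
[Figure: one-bit adder with inputs a_i, b_i, c_in and outputs s_i, c_out]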
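High-Speed Addition
$s_i = p_i \oplus c_i$
$c_{i+1} = g_i + p_i c_i$
where $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$
A one-bit adder could be implemented more efficiently, because a MUX is faster:
[Figure: one-bit adder in which a multiplexer (inputs 0 and 1, select s) produces c_out from a_i, b_i and c_in; the sum output is s_i]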
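One way to read the multiplexer-based cell is that the propagate signal selects between the incoming carry and the generate signal. The sketch below checks that this selection equals $g_i + p_i c_i$ for all inputs. It is an illustration under that assumption, and the function names are hypothetical rather than taken from the slides.

```python
from itertools import product


def mux(sel: int, d0: int, d1: int) -> int:
    """Two-input multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def one_bit_adder_mux(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit adder using p and g, with the carry chosen by a multiplexer."""
    p = a ^ b                 # carry-propagate
    g = a & b                 # carry-generate
    s = p ^ c_in              # sum
    # Assumed wiring: p = 1 passes the incoming carry, p = 0 outputs g.
    c_out = mux(p, g, c_in)
    return s, c_out


def matches_equations() -> bool:
    """Compare the multiplexer form with c_out = g + p*c_in for all inputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = one_bit_adder_mux(a, b, c)
        if s != (a ^ b ^ c) or c_out != ((a & b) | ((a ^ b) & c)):
            return False
    return True
```

The selection works because p and g are never 1 at the same time when p is defined as an XOR.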
The Ripple-Carry Adder
[Figure: four-bit ripple-carry adder built from four full adders (FA), with inputs A0..A3 and B0..B3, carry-in Ci,0, sums S0..S3, and carries Co,0 (= Ci,1), Co,1, Co,2, Co,3]
Worst-case delay is linear in the number of bits:
$t_{adder} = (N-1)\, t_{carry} + t_{sum}$
$t_d = O(N)$
Goal: make the carry path circuit as fast as possible
From Rabaey
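The ripple-carry structure and its linear delay can be sketched as follows. This is a behavioral model for illustration only; the helper names (`to_bits`, `from_bits`, `ripple_carry_add`, `rca_delay`) and the LSB-first bit ordering are assumptions, not part of the slides.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Split x into n bits, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_carry_add(a_bits: list[int], b_bits: list[int], c_in: int = 0):
    """Chain full adders: the carry-out of bit i is the carry-in of bit i+1."""
    sums, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ c)
        c = (a & b) | (a & c) | (b & c)
    return sums, c


def rca_delay(n: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay: (N - 1) carry stages followed by one sum stage."""
    return (n - 1) * t_carry + t_sum
```

For example, `rca_delay(32, 1.0, 1.0)` evaluates the formula for a 32-bit adder with unit stage delays, which makes the O(N) growth easy to see.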
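Inversion Property
[Figure: a full adder (FA) with all inputs inverted is equivalent to a full adder with all outputs inverted]
$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$
$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$
From Rabaey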
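The inversion property states that inverting all inputs of a full adder inverts both outputs. It can be confirmed by enumerating the eight input combinations. The sketch below is illustrative, and its function names are assumptions.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (S, Co) of a full adder."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inversion_property_holds() -> bool:
    """Check S(~A,~B,~Ci) == ~S(A,B,Ci) and Co(~A,~B,~Ci) == ~Co(A,B,Ci)."""
    for a, b, c in product((0, 1), repeat=3):
        s, co = full_adder(a, b, c)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or co_inv != 1 - co:
            return False
    return True
```

This self-duality is what allows alternating inverting cells in the carry chain without extra inverters.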
Minimize Critical Path by Reducing Inverting Stages
[Figure: four-bit ripple-carry adder built from inverting full-adder cells (FA'), alternating even cells and odd cells, with inputs A0..A3 and B0..B3, carry-in Ci,0, sums S0..S3, and carries Co,0..Co,3]
Exploit the inversion property
Note: two different types of cells are needed
From Rabaey
Ripple Carry Adder
Carry chain of an RCA implemented using a multiplexer from the standard cell library:
[Figure: three adder stages with inputs a_i b_i, a_{i+1} b_{i+1}, a_{i+2} b_{i+2}, carries c_in, c_i, c_{i+1}, c_out, and sums s_i, s_{i+1}, s_{i+2}; the critical path runs along the carry chain]
Oklobdzija, ISCAS’88
Manchester Carry-Chain Realization of the Carry Path
• Simple and very popular scheme for implementation of the carry signal path
[Figure: Manchester carry chain between carry-in and carry-out, showing the propagate device, the generate device, and the predischarge & kill device, with V_dd connections at each stage]
Original design: T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.
Manchester Carry Chain (CMOS)
[Figure: CMOS Manchester carry chain with carry-in Ci,0, propagate signals P0..P4, generate signals G0..G4, and VDD. Kilburn, et al, IEE Proc, 1959.]
• Implement P with pass-transistors
• Implement G with pull-up, kill (delete) with pull-down
• Use dynamic logic to reduce the complexity and speed up
Pass-Transistor Realization in DPL
[Figure: DPL full adder built from XOR/XNOR, AND/NAND and OR/NOR gates, multiplexers and buffers, with inputs A, B, C and their complements, supply VCC, and outputs S and CO]
Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61
Carry-Skip Adder
[Figure: two four-bit adders built from full adders (FA) with propagate and generate signals P0..P3 and G0..G3, carry-in Ci,0 and carries Co,0..Co,3; in the second adder a multiplexer controlled by BP selects the bypass path]
BP = P0 P1 P2 P3
Idea: if (P0 and P1 and P2 and P3 = 1) then Co,3 = Ci,0, else "kill" or "generate".
From Rabaey
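The bypass idea can be modeled behaviorally: the block ripples internally, and a multiplexer driven by the block propagate BP forwards the incoming carry directly when every bit propagates. The following sketch is an assumption-laden illustration (LSB-first bit lists, hypothetical function name), not the circuit from the figure.

```python
def carry_skip_block(a_bits: list[int], b_bits: list[int], c_in: int):
    """One carry-skip block; returns (sum_bits, c_out, bypassed)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # bit propagate signals
    g = [a & b for a, b in zip(a_bits, b_bits)]   # bit generate signals
    bp = int(all(p))                              # block propagate BP = P0 P1 ... Pk-1

    sums, c = [], c_in
    for pi, gi in zip(p, g):                      # internal ripple path
        sums.append(pi ^ c)
        c = gi | (pi & c)

    # Bypass multiplexer: when BP = 1 the carry-in is forwarded directly;
    # otherwise the rippled carry (killed or generated inside the block) is used.
    c_out = c_in if bp else c
    return sums, c_out, bool(bp)
```

Both multiplexer inputs carry the same logic value whenever BP = 1; the benefit of the bypass is in delay, not in the function computed.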
Carry-Skip Adder: N bits, k bits/group, r = N/k groups
[Figure: N-bit carry-skip adder divided into k-bit groups, with inputs a_0 b_0 ... a_{N-1} b_{N-1}, sums S_0 ... S_{N-1}, group propagate signals P_0 ... P_{r-1}, and AND/OR skip gates linking C_in to C_out]
Critical path delay = $2(k-1) + (N/k - 2)$
Carry-Skip Adder
$t_d = 2(k-1)\, t_{RCA} + (N/k - 2)\, t_{SKIP}$
[Figure: propagation delay t_p versus N for the ripple adder and the bypass adder, with the range 4..8 marked on the N axis]
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: carry chain with inputs a_0 b_0 ... a_{N-1} b_{N-1}, sums S_0 ... S_{N-1}, groups G_0 ... G_m with group propagate signals P_0 ... P_m, C_in and C_out; the carry signal path alternates between rippling within a group and skipping over groups]
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Block sizes: 1, 3, 4, 5, 6, 5, 4, 3, 1
Any-point-to-any-point delay = 9, as compared to 12 for CSKA
Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: carry chain with carry-in Ci,0, propagate signals P0..P3, generate signals G0..G3, block bypass (BP) paths, and carry-out Co,3]
Delay model: [Figure]
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith’85
$t_d = c_1 + c_2 \sqrt{N + c_3}$
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
• No closed-form solution for delay
• It is a dynamic programming problem
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: delay versus adder size N (N axis from 4 to 60, delay axis from 0 to 16) for CLA, VBA, and multi-level VBA]
VLSI Arithmetic: Lecture 4
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Review
Lecture 3
Delay Comparison: Variable Block Adder
[Figure: delay versus adder size N (N axis from 4 to 60, delay axis from 0 to 16) for CLA, VBA, and multi-level VBA, with the curves annotated "Square Root Dependency" and "Log Dependency"]
Circuit Issues
• Adder speed cannot be estimated based on:
  – logic gates in the critical path
  – number of transistors in the path
  – logic levels in the path
• Estimating adder speed is much more complex, and many of the "fast" schemes may be misleading.
Fan-Out Dependency
Fan-In Dependency
This looks like “Logical Effort” (1985)
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Carry-Lookahead Adder (Weinberger and Smith, 1958)
Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p.3-12, 1958.
ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who invented the CLA adder in 1958)
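CLA Definitions: One-bit adder
$s_i = p_i \oplus c_i$
$c_{i+1} = g_i + p_i c_i$
where $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$
[Figure: one-bit adder in which a multiplexer (inputs 0 and 1, select s) produces c_out from a_i, b_i and c_in; the sum output is s_i]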
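CLA Definitions: 4-bit Adder
[Figure: four one-bit adders with inputs a_i b_i ... a_{i+3} b_{i+3}, signals g_i, p_i ... g_{i+3}, p_{i+3}, and carries C_i ... C_{i+4}]
$c_{i+1} = a_i b_i + \bar{a}_i b_i c_i + a_i \bar{b}_i c_i = g_i + p_i c_i$
$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$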
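Carry-Lookahead Adder: 4 bits
[Figure: four one-bit adders with inputs a_i b_i ... a_{i+3} b_{i+3}, signals g_i, p_i ... g_{i+3}, p_{i+3}, and carries C_i ... C_{i+4}]
$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i) = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$
$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = \underbrace{g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i}_{G_j} + \underbrace{p_{i+3} p_{i+2} p_{i+1} p_i}_{P_j}\, c_i$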
Carry-Lookahead Adder
$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{4(j+1)} = G_j + P_j c_{4j}$
One gate delay to calculate p, g
One to calculate P and two for G
Three gate delays to calculate $C_{4(j+1)}$
Compare that to 8 in RCA!
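The group equations can be checked with a small behavioral model of one 4-bit lookahead group. This is a sketch for illustration: the function name `cla4` and the LSB-first bit lists are assumptions, and every carry is written as a flat expression of p, g and the group carry-in, as in the expanded equations.

```python
def cla4(a_bits: list[int], b_bits: list[int], c0: int):
    """4-bit carry-lookahead group; returns (sum_bits, c4, G, P)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]

    # Each carry depends only on p, g and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)

    # Group generate and propagate.
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    c4 = G | (P & c0)

    sums = [p[0] ^ c0, p[1] ^ c1, p[2] ^ c2, p[3] ^ c3]
    return sums, c4, G, P
```

Because G and P do not depend on the carry-in, they can feed a second lookahead level that computes the carries between groups.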
[Figure: P, G group of four one-bit adders with inputs a_i b_i ... a_{i+3} b_{i+3}, signals g_i, p_i ... g_{i+3}, p_{i+3}, group outputs G_j and P_j, carry-in C_j, and carries C_{4j+1}, C_{4j+2}, C_{4j+3}, C_{4(j+1)}]
Carry-Lookahead Adder (Weinberger and Smith)
$G^* = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$
$P^* = P_{i+3} P_{i+2} P_{i+1} P_i$
$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$
[Figure: second-level lookahead block combining G_j, P_j ... G_{j+3}, P_{j+3} into G* and P*, with carries C_{4j}, C_{4j+1}, C_{4j+2}, C_{4j+3}, C_{4(j+1)}]
Additional two gate delays
C16 will take a total of 5 vs. 32 for RCA!
32-bit Carry Lookahead Adder
[Figure: 32-bit carry-lookahead adder with carries C_in, C4, C8, C12, C16, C20, C24, C28 and C_out, built from:
  – individual adders generating g_i, p_i, and sum S_i
  – carry-lookahead blocks of 4 bits generating G_i, P_i, and C_in for the adders
  – carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i, and C_in for the 4-bit blocks
  – a group producing the final carry C_out and C16]
Critical path delay = 1 (for g_i, p_i) + 2x2 (for G, P) + 3x2 (for C_in) + 1 XOR (for Sum) = approx. 12 units of delay
Carry-Lookahead Adder (Weinberger and Smith: original derivation, 1958)
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with Parallel-Prefix Adders!
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”,
Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
Critical path in Motorola's 64-bit CLA
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
[Figure: 64-bit CLA built from PG blocks and carry blocks, with bit signals G0/P0 ... G63/P63, group signals such as G3:0/P3:0, G15:0/P15:0, G47:0/P47:0 and G63:0/P63:0, and carries C0 ... C64; timing annotations along the path: 1.05 nS, 1.7 nS, 2.0 nS, 2.35 nS, 2.7 nS, 3.75 nS, 4.8 nS]
Motorola's 64-bit CLA
Conventional PG block:
• The carry ripples locally, with 5 transistors in the path
• No better situation here!
Basically, this is MCC (Manchester carry chain) performance with carry-skip. One should not expect any better results than VBA.
Motorola's 64-bit CLA
Modified PG block:
• Intermediate propagate signals Pi:0 are generated to speed up C3
• The critical path still resembles MCC
Motorola's 64-bit CLA
[Figure: timing annotations 1.8 nS, 2.2 nS, 2.9 nS, 3.2 nS, 3.55 nS, 3.9 nS]
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
[Figure: the 64-bit CLA block diagram shown with both sets of timing annotations: 1.05 nS, 1.7 nS, 2.0 nS, 2.35 nS, 2.7 nS, 3.75 nS, 4.8 nS and 1.8 nS, 2.2 nS, 2.9 nS, 3.2 nS, 3.55 nS, 3.9 nS]
Delay Optimized CLA
B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol. 3, No. 4, October 1991
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) Fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
Two-Levels of Logic Implementation of the Carry Block
Two-Levels of Logic Implementation of the Carry-Lookahead Block
Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)
Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)
Delay Optimized CLA: Lee-Oklobdzija ‘91
[Figure: Delay: Two-level BCLA; Delay: Three-level BCLA]
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) 2-level BCLA = 8.5 nS (b.) 3-level BCLA = 8.9 nS
Ling’s Adder
Huey Ling, “High-Speed Binary Adder”
IBM Journal of Research and Development, Vol. 25, No. 3, 1981.
Used in: IBM 3033, IBM 168, Amdahl V6, HP etc.
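Ling’s Derivations
a_i  b_i  p_i  g_i  t_i
0    0    0    0    0
0    1    1    0    1
1    0    1    0    1
1    1    0    1    1
[Figure: one-bit adder with inputs a_i, b_i, carry-in c_i, sum s_i and carry-out c_{i+1}]
$g_i = a_i b_i$
$C_{i+1} = g_i + p_i C_i$
Define: $H_{i+1} = C_{i+1} + C_i$
$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$, thus: $g_i = g_i H_{i+1}$
$p_i H_{i+1} = p_i C_{i+1} + p_i C_i = p_i g_i + p_i p_i C_i + p_i C_i = p_i C_i$
$p_i C_i = p_i H_{i+1}$
$C_{i+1} = g_i + p_i C_i = g_i H_{i+1} + p_i C_i = g_i H_{i+1} + p_i H_{i+1} = t_i H_{i+1}$
$C_{i+1} = t_i H_{i+1}$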
Ling’s Derivations
From: $H_{i+1} = C_{i+1} + C_i$ and $C_{i+1} = g_i + p_i C_i$
$H_{i+1} = C_{i+1} + C_i = g_i + p_i C_i + C_i = g_i + C_i$
$H_{i+1} = g_i + t_{i-1} H_i$ (fundamental expansion), because: $C_{i+1} = t_i H_{i+1}$, that is, $C_i = t_{i-1} H_i$
Now we need to derive the Sum equation
Ling Adder
Variation of CLA (Ling, IBM J. Res. Dev, 5/81):
CLA equations:
$C_{i+1} = g_i + p_i C_i$
$S_i = p_i \oplus C_i$
$p_i = a_i \oplus b_i$
$g_i = a_i b_i$
Ling’s equations:
$H_{i+1} = g_i + t_{i-1} H_i$
$S_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$
$t_i = a_i + b_i$
$g_i = a_i b_i$
Ling Adder
$C_{i+1} = g_i + g_i C_i + p_i C_i = g_i + (g_i + p_i) C_i$
Variation of CLA: $C_{i+1} = g_i + t_i C_i$
Ling’s equation: $H_{i+1} = g_i + t_{i-1} H_i$
Ling uses a different transfer function. Four of those functions have the desired properties (Ling’s is one of them).
see: Doran, IEEE Trans on Comp. Vol 37, No.9 Sept. 1988.
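Ling Adder
Conventional:
$C_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0 + t_3 t_2 t_1 t_0 C_{in}$
Fan-in of 5
Ling:
$H_4 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 + t_2 t_1 t_0 t_{-1} C_{in}$
$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_{in}$
Fan-in of 4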
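The relation between the conventional carry and Ling's pseudo-carry can be verified exhaustively for a 4-bit group. The sketch below, with hypothetical function names, checks that $C_4 = t_3 H_4$ and $H_4 = C_4 + C_3$ for every input combination, using $g_i = a_i b_i$ and $t_i = a_i + b_i$.

```python
from itertools import product


def conventional_c4(g, t, c_in):
    """C4 expanded with t in place of p (largest product has 5 inputs)."""
    return (g[3] | t[3] & g[2] | t[3] & t[2] & g[1]
            | t[3] & t[2] & t[1] & g[0]
            | t[3] & t[2] & t[1] & t[0] & c_in)


def ling_h4(g, t, c_in):
    """Ling pseudo-carry H4 (largest product has 4 inputs)."""
    return (g[3] | g[2] | t[2] & g[1]
            | t[2] & t[1] & g[0]
            | t[2] & t[1] & t[0] & c_in)


def ling_identities_hold() -> bool:
    """Check C4 == t3*H4 and H4 == C4 + C3 over all 4-bit operands."""
    for bits in product((0, 1), repeat=9):
        a, b, c_in = bits[0:4], bits[4:8], bits[8]
        g = [x & y for x, y in zip(a, b)]
        t = [x | y for x, y in zip(a, b)]
        c3 = g[2] | t[2] & g[1] | t[2] & t[1] & g[0] | t[2] & t[1] & t[0] & c_in
        c4 = conventional_c4(g, t, c_in)
        h4 = ling_h4(g, t, c_in)
        if c4 != (t[3] & h4) or h4 != (c4 | c3):
            return False
    return True
```

The single AND with t3 needed to recover C4 is the price paid for the smaller fan-in of H4.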
Advantages of Ling’s Adder
• Uniform loading in fan-in and fan-out
• H16 contains 8 terms, compared to G16, which contains 15
• H16 can be implemented with one level of logic (in ECL), while G16 cannot
(Ling’s adder takes full advantage of wired-OR, which is of special importance when ECL technology is used)
VLSI Arithmetic: Lecture 5
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Review
Lecture 4
Advantages of Ling’s Adder
• Uniform loading in fan-in and fan-out
• H16 contains 8 terms, compared to G16, which contains 15
• H16 can be implemented with one level of logic (in ECL), while G16 cannot (with 8-way wire-OR)
(Ling’s adder takes full advantage of wired-OR, which is of special importance when ECL technology is used; his IBM limitation was a fan-in of 4 and a wire-OR of 8)
Ling: Weinberger Notes
Advantage of Ling’s Adder
• 32-bit adder used in: IBM 3033, IBM S/370 Model 168, Amdahl V6.
• Implements 32-bit addition in 3 levels of logic
• Implements 32-bit AGEN: B+Index+Disp in 4 levels of logic (rather than 6)
• 5 levels of logic for 64-bit adder used in HP processor
Implementation of Ling’s Adder in CMOS
(S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘96)
S. Naffziger, ISSCC’96:
$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
$C_{i+1} = t_i H_{i+1}$
$C_{16} = p_{15} H_{16} = p_{15} (g_{15} + g_{11} + t_{11} g_7 + t_{11} t_7 g_0)$
[Figures: circuit diagrams from S. Naffziger, ISSCC’96]
Ling Adder Critical Path
Ling Adder: Circuits
[Figure: dynamic circuit schematics clocked by CK, with inputs A0..A3 and B0..B3, signals G0..G4, P1, P2, P4, P, K, G, long-carry signals LCH/LCL, C0H/C0L, C1H/C1L, and outputs SumH/SumL]
LCS4 – Critical G Path
[Figure: critical generate path through the 4b, 12b, 16b and 32b stages: in1, G3, P4 (k,p) or (g,p), G4, C15, C31, C47, and sum bits S48, S62, S63]
LCS4 – Logical Effort Delay
Prefix-4 Ling/Conditional-Sum (Dynamic - Long Carry Path)
Stages           Branch   LE     Parasitic
dg3# (dg3)       4.0      0.98   2.97
g4 (NAND2)       2.0      1.11   1.84
C15# (GG4)       1.0      1.01   1.80
C15 (INV)        1.0      1.00   1.00
C47# (LC)        3.0      1.03   3.32
C47 (INV)        1.0      1.00   1.00
C47#b (INV)      1.0      1.00   1.00
C47b (INV)       1.0      1.00   1.00
S63# (SUM)       16.0     0.86   1.36
S63 (INV)        1.0      1.00   1.00
Total Branch: 3.84E+02
Total LE: 9.73E-01
Path Effort: 3.74E+02
fo, opt: 1.81
Effort Delay (ps): 70
Parasitic Delay (ps): 66
Total Delay (ps): 136
Total Delay (FO4): 7.2
Results:
• 0.5u Technology
• Speed: 0.930 nS
• Nominal process, 80C, V=3.3V
See: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘ 96
Prefix Adders and Parallel Prefix Adders
from: Ercegovac-Lang
Prefix Adders
The following recurrence operation is defined:
$(g, p) \circ (g', p') = (g + p g', p p')$
such that:
$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$
$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$
$c_1 = g_0 + p_0 c_{in}$, with $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and adjacent).
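The recurrence can be written down directly. The sketch below implements the operator on (g, p) pairs and a serial prefix computation of all carries; the names `prefix_op` and `carries` and the LSB-first bit lists are assumptions made for the illustration.

```python
def prefix_op(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """(g, p) o (g', p') = (g + p g', p p'); `left` is the more significant pair."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def carries(a_bits: list[int], b_bits: list[int], c_in: int = 0) -> list[int]:
    """Return [c_1, ..., c_n] by applying the recurrence bit by bit."""
    acc = (c_in, c_in)                 # (g_-1, p_-1) = (c_in, c_in)
    out = []
    for a, b in zip(a_bits, b_bits):
        acc = prefix_op((a & b, a ^ b), acc)
        out.append(acc[0])             # c_{i+1} = G_i
    return out
```

Associativity means the pairs may be grouped in any order, which is what parallel prefix adders exploit. The operand order cannot be swapped: for example, `prefix_op((1, 0), (0, 0))` and `prefix_op((0, 0), (1, 0))` differ.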
from: Ercegovac-Lang
Parallel Prefix Adders: variety of possibilities (from: Ercegovac-Lang)
Pyramid Adder: M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units”, IFIP Congress, Munich, Germany, 1962.
Hybrid BK-KS Adder
Parallel Prefix Adders: S. Knowles 1999
operation is associative: h>i≥j≥k
operation is idempotent: h>i≥j≥k
produces carry: cin=0
Parallel Prefix Adders: Ladner-Fischer
Exploits associativity, but not idempotency. Produces minimal logical depth.
Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1)
Two wires at each level. Uniform fan-in of two. Large fan-out (of 16; n/2). Large capacitive loading combined with long wires (in the last stages).
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.
Buffers needed in both cases: K-S, L-F
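A Kogge-Stone style network can be modeled as recursive doubling: at each level every position combines its own (g, p) pair with the pair a fixed distance below it, and the distance doubles from level to level. The sketch below is a behavioral illustration with c_in = 0; the function name and the LSB-first bit lists are assumptions, and it does not model wiring, fan-out or buffers.

```python
def kogge_stone_carries(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Return ([c_1, ..., c_n], number_of_levels) for c_in = 0."""
    n = len(a_bits)
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    dist, levels = 1, 0
    while dist < n:
        nxt = list(gp)                       # positions below `dist` pass through
        for i in range(dist, n):
            g, p = gp[i]
            g_lo, p_lo = gp[i - dist]
            nxt[i] = (g | (p & g_lo), p & p_lo)
        gp = nxt
        dist *= 2
        levels += 1
    return [g for g, _ in gp], levels
```

For 16 bits this model uses 4 levels, which is log2 of the width, and every level updates all positions at or above the current distance in parallel.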
Kogge-Stone Adder
Parallel Prefix Adders: Brent-Kung
• Sets the fan-out to one
• Avoids explosion of wires (as in K-S)
• Makes no sense in CMOS:
  – the fan-out = 1 limit is arbitrary and extreme
  – much of the capacitive load is due to wire (anyway)
• It is more efficient to insert buffers in L-F than to use the B-K scheme
Brent-Kung Adder
Parallel Prefix Adders: Han-Carlson
• Is a hybrid synthesis of L-F and K-S
• Trades an increase in logic depth for a reduction in fan-out:
  – effectively a higher-radix variant of K-S
  – others do it similarly by serializing the prefix computation at the higher fan-out nodes
• Others similarly trade logical depth for a reduction of fan-out and wire.
Parallel Prefix Adders: variety of possibilities (from: Knowles)
bounded by L-F and K-S at the ends
Parallel Prefix Adders: variety of possibilities (Knowles 1999)
The following rules are used:
• Lateral wires at the jth level span 2^j bits
• Lateral fan-out at the jth level is a power of 2 up to 2^j
• Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.
Parallel Prefix Adders: variety of possibilities (Knowles 1999)
• The number of minimal-depth graphs of this type is given in: [Table]
• At 4 bits there are only K-S and L-F; afterwards there are several new possibilities.
Parallel Prefix Adders: variety of possibilities (Knowles 1999)
Example of a new 32-bit adder [4,4,2,2,1]
Parallel Prefix Adders: variety of possibilities (Knowles 1999)
• Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster)
• K-S is the fastest
• K-S adders are wire limited (requiring 80% more area)
• The difference between the examined schemes is less than 15%
Parallel Prefix Adders: variety of possibilities (Knowles 1999)
Conclusion
• Irregular, hybrid schemes are possible
• The speed-up of 15% is achieved at the cost of large wiring, hence area and power
• Circuits close in speed to K-S are available at significantly lower wiring cost
VLSI Arithmetic: Lecture 6
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Review
Lecture 5
Two Parallel Prefix Adder Structures
[Figure: two 16-bit prefix graphs with stages G1,P1 ... G4,P4 and carries C1 ... C15, Cout]
Kogge-Stone:
• log(bits) carry stages
• Extra wiring
Han-Carlson:
• log(bits) + 1 carry stages
• Reduced wiring and gates
Possibilities for Further Research
• The logical depth is important (Knowles was right)
• The fan-out is less important than fan-in (Knowles was wrong):
  – It is possible to examine a variety of topologies with restricted and varied fan-in.
• Driving strength and Logical Effort rules were overlooked, or at least neglected:
  – It is possible to create a number of topologies taking LE rules into account.
  – It is further possible to combine the rules with a compound domino implementation, taking advantage of the two different rules governing “dynamic” and “static”.
• It is still possible to produce a better adder!
Other Types of Adders
Conditional Sum Adder
J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.
Conditional Sum Adder
from: Ercegovac-Lang
Carry-Select Adder
O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June 1962, p.340-34
Carry-Select Sum Adder
from: Ercegovac-Lang
Carry-Select Adder
Addition under assumption of Cin=0 and Cin =1.
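The carry-select principle can be sketched behaviorally: each block computes its result twice, once assuming Cin = 0 and once assuming Cin = 1, and the actual incoming carry selects one of the two. In the sketch below the block adders are modeled with integer arithmetic rather than gates; the function names, the default width of 8 and block size of 4 are assumptions for the example, and the width is assumed to be a multiple of the block size.

```python
def block_add(a: int, b: int, c_in: int, width: int) -> tuple[int, int]:
    """Behavioral block adder: returns (sum modulo 2**width, carry-out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 8, block: int = 4,
                     c_in: int = 0) -> tuple[int, int]:
    """Add two `width`-bit numbers block by block using carry select."""
    mask = (1 << block) - 1
    result, carry = 0, c_in
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block)   # assumes Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block)   # assumes Cin = 1
        # Multiplexer: the real carry picks the precomputed result.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry
```

In hardware both assumptions are evaluated in parallel, so only the multiplexer delay is added per block once the incoming carry is known.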
Carry Select Adder: combining two 32-b VBAs in select mode
Delay = VBA32 + MUX
Carry-Select Adder
O.J. Bedrij, IBM Poughkeepsie, 1962