cmpen 411 vlsi digital circuits spring 2012 lecture 20 ...kxc104/class/cmpen411/14f/lec/c... ·...
TRANSCRIPT
Sp12 CMPEN 411 L20 S.1
CMPEN 411VLSI Digital Circuits
Spring 2012
Lecture 20: Multiplier Design
[Adapted from Rabaey‘s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L20 S.2
Review: Basic Building Blocks
Datapath
Execution units
- Adder, multiplier, divider, shifter, etc.
Register file and pipeline registers
Multiplexers, decoders
Control
Finite state machines (PLA, ROM, random logic)
Interconnect
Switches, arbiters, buses
Memory
Caches (SRAMs), TLBs, DRAMs, buffers
Sp12 CMPEN 411 L20 S.3
The Binary Multiplication
x
+
Partial products
Multiplicand
Multiplier
Result
1 0 1 0 1 0
1 0 1 0 1 0
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
0 0 0 0 0 0
1 0 1 0 1 0
1 0 1 1
Sp12 CMPEN 411 L20 S.4
Multiply Operation
Multiplication is just a a lot of additions
multiplicand
multiplier
partial
product
array
double precision product
N
2N
N can be formed in parallel
Sp12 CMPEN 411 L20 S.5
Multiplication Approaches
Right shift and add
Partial product array rows are accumulated from top to bottom on an N-bit adder
- After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA
Making it faster
Use a faster adder
Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder)
- Use multiplier recoding to simplify multiple formation (booth)
Form the partial product array in parallel and add it in parallel
Making it smaller (i.e., slower)
Use serial-parallel mult
Use an array multiplier
- Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily and efficiently pipelined
Sp12 CMPEN 411 L20 S.6
Serial-parallel multiplier structure
Sp12 CMPEN 411 L20 S.7
The Array Multiplier
Y0
Y1
X3 X2 X1 X0
X3
HA
X2
FA
X1
FA
X0
HA
Y2X3
FA
X2
FA
X1
FA
X0
HA
Z1
Z3Z6Z7 Z5 Z4
Y3X3
FA
X2
FA
X1
FA
X0
HA
Z2
Z0
Sp12 CMPEN 411 L20 S.11
Booth multiplier
Encoding scheme to reduce number of stages in multiplication.
Performs two bits of multiplication at once—requires half the stages.
Each stage is slightly more complex than simple multiplier, but adder/subtracter is almost as small/fast as adder.
Sp12 CMPEN 411 L20 S.12
Booth encoding
Two‘s-complement form of multiplier:
y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit)
(example, y=18=010010 y= -18 = 101110 )
Rewrite using 2a = 2a+1 - 2a:
y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
Sp12 CMPEN 411 L20 S.13
Booth actions
yi yi-1 yi-2 increment
0 0 0 0
0 0 1 x
0 1 0 x
0 1 1 2x
1 0 0 -2x
1 0 1 -x
1 1 0 -x
1 1 1 0
y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
Sp12 CMPEN 411 L20 S.14
Booth example
x = 1001 (910), y = 0111 (710).
P0 = 00000000
y3y2y1=011 y1y0y-1=11(0)
y1y0y-1 = 110, P1 = P0 - (1001) = 11110111
x shift left for 2 bits to be 100100
y3y2y1 = 011, P2 = P1 (10*100100) =
11110111+01001000 = 001111111 (6310)
An array multiplier needs N addtions, booth multiplier needs only N/2 additions
Sp12 CMPEN 411 L20 S.15
Review: A 64-bit Adder/Subtractor
1-bit
FA S0
C0=Cin
C1
1-bit
FA S1
C2
1-bit
FA S2
C3
C64=Cout
1-bit
FA S63
C63
. .
.
Ripple Carry Adder (RCA) built out of 64 FAs
Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in
RCA
advantage: simple logic, so small (low cost)
disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
A0
B0
A1
B1
A2
B2
A63
B63
add/subt
Sp12 CMPEN 411 L20 S.16
Booth structure
Sp12 CMPEN 411 L20 S.17
Wallace-Tree Multiplier
6 5 4 3 2 1 0 6 5 4 3 2 1 0
Partial products First stage
Bit position
6 5 4 3 2 1 0 6 5 4 3 2 1 0
Second stage Final adder
FA HA
(a) (b)
(c) (d)
Sp12 CMPEN 411 L20 S.18
Wallace-Tree Multiplier
Full adder = (3,2) compressor
Sp12 CMPEN 411 L20 S.19
Making it Faster: Tree Multiplier Structure
partial
product
array
reduction
tree
fast carry
propagate
adder (CPA)P (product)
mux
+
reduction
tree (log N)
+
CPA (log N)
Q (‗ier)
D (‗icand)
D
D
D
0
0
0
0
multiple
forming
circuits
inte
rconnect
Sp12 CMPEN 411 L20 S.20
(4,2) Counter
Built out of two (3,2) counters (just FA‘s!)
all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
the internal carry output is fed to the next higher weight position (indicated by the )
(3,2)
(3,2)Note: Two carry outs - one
―internal‖ and one ―external‖
Sp12 CMPEN 411 L20 S.22
Tiling (4,2) Counters
Reduces columns four high to columns only two high
Tiles with neighboring (4,2) counters
Internal carry in at same ―level‖ (i.e., bit position weight) as the internal carry out
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
Sp12 CMPEN 411 L20 S.24
4x4 Partial Product Array Reduction
multiplicand
multiplier
partial
product
array
reduced pp
array (to
CPA)
double
precision
product
Fast 4x4 multiplication using (4,2) counters
How would you lay it out?
five (4,2) counters
5-bit CPA
multiplicand
mu
ltiplie
r
8-bit product
Sp12 CMPEN 411 L20 S.25
8x8 Partial Product Array Reduction
‘icand
‘ier
partial
product
array
reduced
partial
product
array
Wallace tree
multiplier
two rows of
nine (4,2)
counters
one row of
thirteen
(4,2)
counters
to a 13-bit fast CPA
Sp12 CMPEN 411 L20 S.26
An 8x8 Multiplier Layout
How should it be laid out?multiplicand
mu
ltiplie
r
thirteen (4,2) counters
nine (4,2) counters
nine (4,2) counters
13-bit CPA
Sp12 CMPEN 411 L20 S.32
Multipliers —Summary
• Optimization Goals Different Vs Binary Adder
• Once Again: Identify Critical Path
• Other possible techniques
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
- Logarithmic versus Linear (Wallace Tree Mult)
Sp12 CMPEN 411 L20 S.33
Next Lecture and Reminders
Next lecture
Shifters, decoders, and multiplexers
- Reading assignment – Rabaey, et al, 11.5-11.6