cmpen 411 vlsi digital circuits spring 2012 lecture 20 ...kxc104/class/cmpen411/14f/lec/c... ·...

Sp12 CMPEN 411 L20 S.1

CMPEN 411VLSI Digital Circuits

Spring 2012

Lecture 20: Multiplier Design

[Adapted from Rabaey‘s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]


Review: Basic Building Blocks

Datapath

Execution units

- Adder, multiplier, divider, shifter, etc.

Register file and pipeline registers

Multiplexers, decoders

Control

Finite state machines (PLA, ROM, random logic)

Interconnect

Switches, arbiters, buses

Memory

Caches (SRAMs), TLBs, DRAMs, buffers


The Binary Multiplication

x

+

Partial products

Multiplicand

Multiplier

Result

1 0 1 0 1 0

1 0 1 0 1 0

1 0 1 0 1 0

1 1 1 0 0 1 1 1 0

0 0 0 0 0 0

1 0 1 0 1 0

1 0 1 1


Multiply Operation

Multiplication is just a a lot of additions

multiplicand

multiplier

partial

product

array

double precision product

N

2N

N can be formed in parallel


Multiplication Approaches

Right shift and add

Partial product array rows are accumulated from top to bottom on an N-bit adder

- After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add

Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA

Making it faster

Use a faster adder

Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder)

- Use multiplier recoding to simplify multiple formation (booth)

Form the partial product array in parallel and add it in parallel

Making it smaller (i.e., slower)

Use serial-parallel mult

Use an array multiplier

- Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily and efficiently pipelined


Serial-parallel multiplier structure


The Array Multiplier

Y0

Y1

X3 X2 X1 X0

X3

HA

X2

FA

X1

FA

X0

HA

Y2X3

FA

X2

FA

X1

FA

X0

HA

Z1

Z3Z6Z7 Z5 Z4

Y3X3

FA

X2

FA

X1

FA

X0

HA

Z2

Z0

Sp12 CMPEN 411 L20 S.11

Booth multiplier

Encoding scheme to reduce number of stages in multiplication.

Performs two bits of multiplication at once—requires half the stages.

Each stage is slightly more complex than simple multiplier, but adder/subtracter is almost as small/fast as adder.

Sp12 CMPEN 411 L20 S.12

Booth encoding

Two‘s-complement form of multiplier:

y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit)

(example, y=18=010010 y= -18 = 101110 )

Rewrite using 2a = 2a+1 - 2a:

y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...

Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.

Sp12 CMPEN 411 L20 S.13

Booth actions

yi yi-1 yi-2 increment

0 0 0 0

0 0 1 x

0 1 0 x

0 1 1 2x

1 0 0 -2x

1 0 1 -x

1 1 0 -x

1 1 1 0

y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...

Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.

Sp12 CMPEN 411 L20 S.14

Booth example

x = 1001 (910), y = 0111 (710).

P0 = 00000000

y3y2y1=011 y1y0y-1=11(0)

y1y0y-1 = 110, P1 = P0 - (1001) = 11110111

x shift left for 2 bits to be 100100

y3y2y1 = 011, P2 = P1 (10*100100) =

11110111+01001000 = 001111111 (6310)

An array multiplier needs N addtions, booth multiplier needs only N/2 additions

Sp12 CMPEN 411 L20 S.15

Review: A 64-bit Adder/Subtractor

1-bit

FA S0

C0=Cin

C1

1-bit

FA S1

C2

1-bit

FA S2

C3

C64=Cout

1-bit

FA S63

C63

. .

.

Ripple Carry Adder (RCA) built out of 64 FAs

Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in

RCA

advantage: simple logic, so small (low cost)

disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)

A0

B0

A1

B1

A2

B2

A63

B63

add/subt

Sp12 CMPEN 411 L20 S.16

Booth structure

Sp12 CMPEN 411 L20 S.17

Wallace-Tree Multiplier

6 5 4 3 2 1 0 6 5 4 3 2 1 0

Partial products First stage

Bit position

6 5 4 3 2 1 0 6 5 4 3 2 1 0

Second stage Final adder

FA HA

(a) (b)

(c) (d)

Sp12 CMPEN 411 L20 S.18

Wallace-Tree Multiplier

Full adder = (3,2) compressor

Sp12 CMPEN 411 L20 S.19

Making it Faster: Tree Multiplier Structure

partial

product

array

reduction

tree

fast carry

propagate

adder (CPA)P (product)

mux

+

reduction

tree (log N)

+

CPA (log N)

Q (‗ier)

D (‗icand)

D

D

D

0

0

0

0

multiple

forming

circuits

inte

rconnect

Sp12 CMPEN 411 L20 S.20

(4,2) Counter

Built out of two (3,2) counters (just FA‘s!)

all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)

the internal carry output is fed to the next higher weight position (indicated by the )

(3,2)

(3,2)Note: Two carry outs - one

―internal‖ and one ―external‖

Sp12 CMPEN 411 L20 S.22

Tiling (4,2) Counters

Reduces columns four high to columns only two high

Tiles with neighboring (4,2) counters

Internal carry in at same ―level‖ (i.e., bit position weight) as the internal carry out

(3,2)

(3,2)

(3,2)

(3,2)

(3,2)

(3,2)

Sp12 CMPEN 411 L20 S.24

4x4 Partial Product Array Reduction

multiplicand

multiplier

partial

product

array

reduced pp

array (to

CPA)

double

precision

product

Fast 4x4 multiplication using (4,2) counters

How would you lay it out?

five (4,2) counters

5-bit CPA

multiplicand

mu

ltiplie

r

8-bit product

Sp12 CMPEN 411 L20 S.25

8x8 Partial Product Array Reduction

‘icand

‘ier

partial

product

array

reduced

partial

product

array

Wallace tree

multiplier

two rows of

nine (4,2)

counters

one row of

thirteen

(4,2)

counters

to a 13-bit fast CPA

Sp12 CMPEN 411 L20 S.26

An 8x8 Multiplier Layout

How should it be laid out?multiplicand

mu

ltiplie

r

thirteen (4,2) counters

nine (4,2) counters

nine (4,2) counters

13-bit CPA

Sp12 CMPEN 411 L20 S.32

Multipliers —Summary

• Optimization Goals Different Vs Binary Adder

• Once Again: Identify Critical Path

• Other possible techniques

- Data encoding (Booth)

- Pipelining

FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION

- Logarithmic versus Linear (Wallace Tree Mult)

Sp12 CMPEN 411 L20 S.33

Next Lecture and Reminders

Next lecture

Shifters, decoders, and multiplexers

- Reading assignment – Rabaey, et al, 11.5-11.6

cmpen 411 vlsi digital circuits spring 2012 lecture 20 ...kxc104/class/cmpen411/14f/lec/c... ·...

Documents