design of fast and low power multiplier

63
8/13/2019 Design Of Fast and low power Multiplier http://slidepdf.com/reader/full/design-of-fast-and-low-power-multiplier 1/63

Upload: sridhar-pramod

Post on 04-Jun-2018


Department of ECE, JNTUHCEH 2

multipliers have been plagued by complicated switching systems and/or irregularities

in design. 

1.2 Power Optimization 

Power is the rate at which energy is dissipated, that is, the number of Joules dissipated per unit of time, whereas energy is the total number of Joules dissipated by a circuit.

In digital CMOS design, the well-known power-delay product is commonly used to assess the merits of designs. In a sense, power × delay = (energy/delay) × delay = energy, so the power-delay product is an energy figure in which the delay cancels out.

1.3 Low-Power Multiplier Design 

Multiplication consists of three steps: partial product generation (PPG), partial product reduction (PPR), and finally carry-propagate addition (CPA). In general there are sequential and combinational multiplier implementations. Only the combinational case is considered here, because the scale of integration is now large enough to accommodate parallel multiplier implementations in digital VLSI systems.

Different multiplication algorithms vary in their approaches to PPG, PPR, and CPA. For PPG, an AND gate array is the simplest option; an array of multiplexers can also be used. For PPR, two alternatives exist: reduction by rows, performed by an array of adders, and reduction by columns, performed by an array of counters. The final CPA requires a fast adder scheme because it is on the critical path. In some cases, the final CPA is postponed if it is advantageous to keep redundant results from PPG for further arithmetic operations.
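The three steps above can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog used in the project: the function names are hypothetical, PPG uses the AND-array option, PPR reduces by columns with full adders acting as counters, and the final CPA is modelled with ordinary integer addition in place of a fast adder.

```python
def partial_products(a, x, n):
    """PPG: AND every bit of a with every bit of x.
    Column k collects the partial product bits of weight 2**k."""
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((x >> j) & 1))
    return columns


def reduce_columns(columns):
    """PPR by columns: apply full adders (3:2 counters) until
    no column holds more than two bits."""
    columns = [list(col) for col in columns]
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            while len(col) >= 3:
                b0, b1, b2 = col.pop(), col.pop(), col.pop()
                nxt[k].append(b0 ^ b1 ^ b2)                          # sum stays in column k
                nxt[k + 1].append((b0 & b1) | (b1 & b2) | (b0 & b2))  # carry moves up
            nxt[k].extend(col)                                       # leftover bits pass through
        columns = nxt
    return columns


def carry_propagate_add(columns):
    """CPA: add the two remaining rows (integer addition stands in for the adder)."""
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row0 + row1


def multiply(a, x, n):
    return carry_propagate_add(reduce_columns(partial_products(a, x, n)))
```

For example, `multiply(13, 11, 4)` runs the three steps for two 4-bit operands and returns their product.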

1.4 Languages and Tools Used 

Verilog HDL was used as the primary language. Simulation was carried out with the Synopsys VCS compiler, and synthesis with Synopsys Design Compiler using a 90nm process technology.

1.5 Research Approach 

The basic motive of the project was to study and develop an efficient multiplier that is both fast and low in power. As the name suggests, speed and power had to be optimized simultaneously. The basic building block of a multiplier is the adder circuit, so we turned our focus to adders first. We studied the area occupied and the delay of different adders and established the relation between the time and area complexity of all the adders under consideration.

We then turned our focus to multipliers. We studied different multipliers by writing programs, verifying waveforms, and finally calculating the area and the power consumed by each circuit. We also calculated the delay of the different multipliers, which helped us to determine the best multiplier. The HPM multiplier was found to be the best among all, with lower power consumption and a good area-delay trade-off. Future work will be to optimize the power consumed by the different multipliers, thereby reducing the number of gates used and the area occupied by them.


Adder Scheme       Area Complexity (A)   Delay (T)           Product (A x T)
Ripple-Carry       O(n)                  O(n)                O(n^2)
Carry-Select       O(n)                  O(n^{1/(l+1)})      O(n^{(l+2)/(l+1)})
Carry-Look ahead   O(n)                  O(log n)            O(n log n)

Table 2.1 Categorization of adders with respect to delay time and capacity

2.2 Half and Full Adder

The basic building block of a multiplier is the adder circuit. An adder or summer is a digital circuit that performs addition of numbers. In many computers and other kinds of processors, adders are used not only in the arithmetic logic unit, but also in other parts of the processor, where they are used to calculate addresses, table indices, and similar operations.

2.2.1 Half Adder 

A half adder adds two one-bit binary numbers “ain” and “bin”. It has two outputs, “sout” and “cout” (the value theoretically carried on to the next addition); the final sum is “2cout + sout”. The simplest half-adder design, pictured in figure 2.1, incorporates an XOR gate for “sout” and an AND gate for “cout”. Half adders cannot be used compositely, given their incapacity for a carry-in bit. Table 2.2 shows the truth table of the half adder.

Figure 2.1 Logic Diagram of Half Adder


HALF ADDER INPUTS    HALF ADDER OUTPUTS
ain    bin           sout    cout
0      0             0       0
0      1             1       0
1      0             1       0
1      1             0       1

Table 2.2 Truth Table of Half Adder
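The half adder behaviour can be checked with a few lines of Python. This is a behavioural sketch only; the function name and argument names are chosen here for illustration and are not taken from the project's Verilog sources.

```python
def half_adder(a_in, b_in):
    """One-bit half adder: XOR gives the sum, AND gives the carry."""
    s_out = a_in ^ b_in
    c_out = a_in & b_in
    return s_out, c_out


# Print the truth table; every row satisfies a_in + b_in == 2*c_out + s_out.
for a_in in (0, 1):
    for b_in in (0, 1):
        s_out, c_out = half_adder(a_in, b_in)
        print(a_in, b_in, s_out, c_out)
```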

2.2.2 Full Adder 

A full adder adds binary numbers and accounts for values carried in as well as out. A one-bit full adder adds three one-bit numbers, often written as “ain”, “bin”, and “cin”, where “ain” and “bin” are the operands and “cin” is a bit carried in (in theory from a past addition). The circuit produces a two-bit output sum, typically represented by the signals “cout” and “sout”. The truth table of the one-bit full adder is shown in Table 2.3.

FULL ADDER INPUTS        FULL ADDER OUTPUTS
ain    bin    cin        sout    cout
0      0      0          0       0
0      0      1          1       0
0      1      0          1       0
0      1      1          0       1
1      0      0          1       0
1      0      1          0       1
1      1      0          0       1
1      1      1          1       1

Table 2.3 Truth Table of Full Adder

A full adder can be implemented in many different ways, such as with a custom transistor-level circuit or composed of other gates. One implementation is

s_{out} = a_{in} \oplus b_{in} \oplus c_{in}

c_{out} = a_{in} \cdot b_{in} + b_{in} \cdot c_{in} + a_{in} \cdot c_{in}

In this implementation the carry is generated using three AND gates and two OR gates, as shown in figure 2.2. As the logic diagram shows, this adder has a larger number of gates. The results from the synthesis tool are shown in Table 2.4.

Figure 2.2 Logic Diagram of Full Adder1

A full adder can also be implemented using only two types of gates, which is convenient if the circuit is being implemented using simple IC chips that contain only one gate type per chip. In this light, “cout” can be implemented as shown in figure 2.3.

Figure 2.3 Logic Diagram of Full Adder2

A full adder can be constructed from two half adders by connecting “ain” and “bin” to the inputs of one half adder, connecting the sum from that to an input of the second half adder, connecting “cin” to the other input, and ORing the two carry outputs. Equivalently, “sout” could be made the three-bit XOR of “ain”, “bin”, and “cin”, and “cout” could be made the three-bit majority function of “ain”, “bin” and “cin”, as shown in figure 2.4, where h1 and h2 are the half adders shown in figure 2.1.


Figure 2.4 Logic Diagram of Full Adder3 
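The equivalence between the two-half-adder construction and the XOR/majority form can be checked exhaustively. The sketch below is a behavioural Python model with hypothetical function names; it says nothing about gate count, area or delay, which depend on the actual implementation.

```python
def half_adder(a, b):
    return a ^ b, a & b


def full_adder_majority(a_in, b_in, c_in):
    """sout = three-input XOR, cout = three-input majority."""
    s_out = a_in ^ b_in ^ c_in
    c_out = (a_in & b_in) | (b_in & c_in) | (a_in & c_in)
    return s_out, c_out


def full_adder_two_half_adders(a_in, b_in, c_in):
    """h1 adds the operands, h2 adds the carry-in, the carries are ORed."""
    s1, c1 = half_adder(a_in, b_in)   # h1
    s_out, c2 = half_adder(s1, c_in)  # h2
    return s_out, c1 | c2


# Both constructions agree on all eight input combinations.
inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
agree = all(full_adder_majority(*i) == full_adder_two_half_adders(*i) for i in inputs)
```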

The full adder can be viewed as a 3:2 compressor: it sums three one-bit inputs and returns the result as a single two-bit number. Thus, for example, a binary input of 101 results in an output of 1+0+1 = 10 (decimal 2). The carry-out represents bit one of the result, while the sum represents bit zero. Likewise, a half adder can be used as a 2:2 compressor. The results from the synthesis tool for the three full adders are shown in Table 2.4. From the table it is clear that, of all the full adders shown, full adder 1 is the best choice, so full adder 1 was used throughout the project both as adder and as 3:2 compressor.

Type of Full Adder   Area (µm^2)   Delay (ps)   Power (µW)
Full Adder 1         30.1          0.45         6.6
Full Adder 2         41.1          0.45         12.64
Full Adder 3         51.5          0.53         14.72

Table 2.4 Comparisons of Different Implementations of Full Adders

2.3 Ripple Carry Adder 

The ripple carry adder is a well-known adder architecture composed of cascaded full adders, as shown in figure 2.5 for an n-bit adder. It is constructed by cascading full adder blocks in series: the carry-out of one stage is fed directly to the carry-in of the next stage. An n-bit parallel adder requires n full adders.


Figure 2.5 4-Bit Ripple Carry Adder  

Multiple full adder circuits can be cascaded to add N-bit numbers; an N-bit parallel adder requires N full adder circuits. A ripple carry adder is a logic circuit in which the carry-out of each full adder is the carry-in of the next most significant full adder.

It is called a ripple carry adder because each carry bit ripples into the next stage. In a ripple carry adder the sum and carry-out bits of any full adder stage are not valid until the carry-in of that stage has arrived, because of the propagation delays inside the logic circuitry. Propagation delay is the time elapsed between the application of an input and the occurrence of the corresponding output. The ripple carry adder is therefore not very efficient when numbers with a large number of bits are added: the delay increases linearly with the bit length.
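The cascading of full adders can be sketched as a loop in which the carry of one stage feeds the next. The Python below is a behavioural illustration with hypothetical names; the sequential loop mirrors the linear carry chain, but it does not model gate delays.

```python
def full_adder(a, b, c_in):
    s_out = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (a & c_in)
    return s_out, c_out


def ripple_carry_add(a, b, n, c_in=0):
    """Add two n-bit numbers with n cascaded full adders.
    Returns (n-bit sum, carry-out)."""
    carry = c_in
    total = 0
    for i in range(n):  # stage i must wait for the carry of stage i-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For a 4-bit adder, `ripple_carry_add(0b1011, 0b0110, 4)` returns the 4-bit sum together with the carry out of the most significant stage.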

2.3.1 Delay

Delay from Carry-in to Carry-out is more important than from A to carry-out

or carry-in to SUM, because the carry-propagation chain will determine the latency

of the whole circuit for a Ripple-Carry adder.

2.4 Carry Select Adder 

In the carry select adder scheme, blocks of bits are added in two ways: once assuming a carry-in of 0 and once assuming a carry-in of 1.

Because of the multiplexers, a larger area is required. The carry select adder has a lower delay than the ripple carry adder (about half the delay of an RCA). Hence the carry select adder is preferred when working with a smaller number of bits.


Figure 2.6 Carry Select with 1 Level using n/2- bit RCA 


Figure 2.6 shows the basic building block of a carry-select adder, which generally consists of two ripple carry adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two adders (therefore two ripple carry adders) in order to perform the calculation twice, once with the assumption of the carry being zero and once assuming one. After the two results are calculated, the correct sum, as well as the correct carry, is selected with the multiplexer once the correct carry is known.

The block size should be chosen so that the delay from the addition inputs a and b to the carry out equals that of the multiplexer chain leading into the block, so that the carry out is calculated just in time. The resulting carry and sum bits are selected by the carry-in. Since one ripple carry adder assumes a carry-in of 0 and the other assumes a carry-in of 1, selecting which adder had the correct assumption via the actual carry-in yields the desired result.
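A one-level carry-select adder built from n/2-bit blocks can be modelled as follows. This is a behavioural Python sketch under stated assumptions: the block adder is represented by integer addition rather than a gate-level RCA, the operands are assumed to be smaller than 2**n, and all names are hypothetical.

```python
def ripple_block(a, b, width, c_in):
    """Stand-in for a width-bit ripple carry adder block."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n, c_in=0):
    """One-level carry-select addition using n/2-bit blocks."""
    half = n // 2
    mask = (1 << half) - 1
    low_sum, low_carry = ripple_block(a & mask, b & mask, half, c_in)
    hi_a, hi_b = a >> half, b >> half
    # The upper block is computed twice, once for each possible carry-in.
    sum0, carry0 = ripple_block(hi_a, hi_b, n - half, 0)
    sum1, carry1 = ripple_block(hi_a, hi_b, n - half, 1)
    # The multiplexer picks the result whose assumption was correct.
    high_sum, c_out = (sum1, carry1) if low_carry else (sum0, carry0)
    return (high_sum << half) | low_sum, c_out
```

In hardware the two upper-block additions run in parallel with the lower block, which is where the delay saving over a plain RCA comes from; the model only reproduces the selection logic.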

2.5 Carry Look Ahead Adder 

A Carry Look Ahead Adder can produce carries faster because the carry bits are generated in parallel by additional circuitry whenever the inputs change. This technique uses carry bypass logic to speed up the carry propagation. Let a_i and b_i be the augend and addend inputs, c_i the carry input, and s_i and c_{i+1} the sum and carry-out of the i-th bit position. With the auxiliary functions p_i and g_i, called the propagate and generate signals, the sum and carry outputs are defined as follows.

Figure 2.7 4-Bit CLA

Logic equations:

p_i = a_i + b_i                        (2.1)
g_i = a_i \cdot b_i                    (2.2)
s_i = a_i \oplus b_i \oplus c_i        (2.3)
c_{i+1} = g_i + p_i \cdot c_i          (2.4)


A_{CLA} = O(n) = 14n                   (2.5)

T_{CLA} = O(\log n) = 4 \log_2 n       (2.6)
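Equations (2.1) to (2.4) can be exercised with a short behavioural model. In the Python sketch below (hypothetical names, not the project's Verilog), every carry is written in its expanded form, c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_0 c_0, so that it depends only on the propagate and generate signals and on c_0, which is the idea behind generating the carries in parallel.

```python
def cla_add(a, b, n, c0=0):
    """n-bit carry look-ahead addition. Returns (n-bit sum, carry-out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    p = [a_bits[i] | b_bits[i] for i in range(n)]  # propagate, eq. (2.1)
    g = [a_bits[i] & b_bits[i] for i in range(n)]  # generate,  eq. (2.2)

    c = [c0]
    for i in range(n):
        # Expanded form of eq. (2.4): no dependence on the previous carry.
        carry = g[i]
        prefix = p[i]
        for k in range(i - 1, -1, -1):
            carry |= prefix & g[k]
            prefix &= p[k]
        carry |= prefix & c0
        c.append(carry)

    # Sum bits, eq. (2.3)
    s = sum((a_bits[i] ^ b_bits[i] ^ c[i]) << i for i in range(n))
    return s, c[n]
```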

2.6 Binary To Excess - 1 Code Converter (BEC) 

A Binary to Excess-1 Converter (BEC) can also be used as an adder when the result is already available and only the carry from the previous stage is awaited. The BEC calculates the result for a carry of one, and the carry coming from the previous stage is then used as the select signal of a multiplexer that chooses between the original result and the incremented result.

Figure 2.8 5-Bit Binary to Excess-1 Code Converter without carry and with carry

The BEC takes n inputs and generates n outputs; the BECWC (BEC with Carry) takes n inputs and generates n+1 outputs, the extra output being the carry.

The output value is one more than the given input. The detailed structures of the 5-bit BEC without carry (BEC) and with carry (BECWC) are shown in figure 2.8. The function table of the BEC and BECWC is shown in Table 2.5.


Table 2.5 Functional Table of 5 Bit BEC and BECWC

Input     BEC without carry   BEC with carry
b[4:0]    x[4:0]              cy    x[4:0]
00000     00001               0     00001
00001     00010               0     00010
00010     00011               0     00011
00011     00100               0     00100
00100     00101               0     00101
11011     11100               0     11100
11100     11101               0     11101
11101     11110               0     11110
11110     11111               0     11111
11111     00000               1     00000
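The function table can be reproduced with a bit-level model. The gate equations used below (x_0 = NOT b_0, x_i = b_i XOR (b_0 AND ... AND b_{i-1}), carry = AND of all input bits) are the usual way to build an incrementer and are an assumption here, since the structure of figure 2.8 is not reproduced in the text; the function name is hypothetical.

```python
def bec(b, n=5, with_carry=False):
    """Binary to excess-1 conversion of an n-bit input.
    Returns x, or (cy, x) when with_carry is True."""
    x = 0
    lower_all_ones = 1  # AND of all bits below position i (1 for i = 0)
    for i in range(n):
        bit = (b >> i) & 1
        x |= (bit ^ lower_all_ones) << i  # bit flips only if all lower bits are 1
        lower_all_ones &= bit
    if with_carry:
        return lower_all_ones, x          # carry is set only for the all-ones input
    return x
```

For instance, `bec(0b00011)` gives `0b00100`, and `bec(0b11111, with_carry=True)` gives a carry of 1 with `x = 0b00000`, matching the first and last rows of the table.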


Chapter -3

The Multipliers 

3.1 Introduction 

High speed multiplication is a primary requirement of high performance digital

systems. In recent trends the column compression multipliers are popular for high speed

computations due to their higher speeds. The data flow in a column compression

multiplier is shown in figure 3.1.

Figure 3.1 Data flow in a column compression multiplier

Figure 3.1 shows that the total delay of the multiplier can be split into three parts: the delay due to the Partial Product Generation (PPG), the Partial Product Summation Tree (PPST), and the Final Adder. Of these, the dominant components of the multiplier delay are due to the PPST and the final adder; the relative delay due to the PPG is small. A significant improvement in the speed of the multiplier can therefore be achieved by reducing the delay in the PPST and the final adder stage of the multiplier.

The first column compression multiplier was introduced by Wallace in 1964. He reduced the N rows of partial products by grouping them into sets of three rows and sets of two rows, using (3, 2) counters and (2, 2) counters respectively. In 1965, Dadda altered the approach of Wallace by starting with the exact placement of the (3, 2) counters and (2, 2) counters in the maximum critical path delay of the multiplier. Since the 2000s, a closer reconsideration of the Wallace and Dadda multipliers has shown that the Dadda multiplier is slightly faster than the Wallace multiplier and requires less hardware. In 2006, H. Eriksson and his research team presented the HPM reduction tree structure, which is easier to lay out than Dadda's approach. Compared to Dadda, HPM is slightly faster and consumes less power, while the area is the same.

3.2 Wallace Tree Multiplier 

The Wallace tree multiplier is considerably faster than a simple array multiplier because its height is logarithmic in the word size, not linear. However, in addition to the large number of adders required, the Wallace tree's wiring is much less regular and more complicated. As a result, Wallace trees are often avoided by designers when design complexity is a concern.

3.2.1 WALLACE Column Compression Algorithm 

1. The N rows of partial products are grouped together in sets of three rows each. Any additional rows that are not a member of a group of three are transferred to the next level without modification.

2. Within each group of three rows, (3,2) compressors are applied to the columns containing three bits and (2,2) compressors are applied to the columns containing two bits.

3. Columns containing only a single bit are transferred to the next level unchanged.

The number of rows d_j remaining after the j-th reduction stage follows

d_0 = N

d_{j+1} = 2 \lfloor d_j / 3 \rfloor + (d_j \bmod 3)        (3.1)
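Recurrence (3.1) is easy to evaluate numerically. The short Python sketch below (the function name is hypothetical) lists the row counts from d_0 = N down to two rows, which gives the number of reduction stages.

```python
def wallace_heights(n):
    """Row counts d_0, d_1, ... of a Wallace reduction, per eq. (3.1)."""
    heights = [n]
    while heights[-1] > 2:
        d = heights[-1]
        heights.append(2 * (d // 3) + d % 3)
    return heights


heights_8 = wallace_heights(8)   # [8, 6, 4, 3, 2]
stages_8 = len(heights_8) - 1    # 4 reduction stages for an 8 x 8 multiplier
```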


[Diagram: the 64 partial product bits (numbered 0 to 63) formed from inputs x7..x0 and y7..y0 pass through reduction stages 1 to 4, built from full adders and half adders, and a final adder that produces the product bits p15..p0.]

Figure 3.2 8 x 8 Wallace Tree Multiplier Logarithmic Depth Hierarchy


3.2.2 Applying Wallace Compression Algorithm to 8 x 8 multiplier 

Consider an N-bit multiplier X and an N-bit multiplicand Y, represented as

Multiplicand, Y = y_{n-1} y_{n-2} y_{n-3} ... y_3 y_2 y_1 y_0        (3.2)

Multiplier, X = x_{n-1} x_{n-2} x_{n-3} ... x_3 x_2 x_1 x_0          (3.3)

The flow diagram in figure 3.2 shows the intermediate reduction stages of the multiplier, which are performed by full adders and half adders, while the final addition is performed by an RCA. The flow diagram was drawn in a Microsoft Excel sheet. The architecture of the 8 x 8 Wallace multiplier, along with the theoretical delay values, is shown in figure 3.3, where FA is a full adder and HA is a half adder.

Figure 3.3 Architecture of 8 x 8 Wallace Tree Multiplier with RCA as final adder

3.3 DADDA Multiplier 

The Dadda multiplier is faster than the Wallace tree multiplier because the Wallace tree's wiring is much less regular and more complicated. The Dadda multiplier overcomes this disadvantage by placing the 3:2 compressors and 2:2 compressors in the critical path. The disadvantage of the Dadda multiplier is that its layout is not regular.

However, unlike Wallace multipliers, which reduce as much as possible on each layer, Dadda multipliers do as few reductions as possible. Because of this, Dadda multipliers have a less expensive reduction phase, but the numbers may be a few bits longer, thus requiring slightly bigger adders.

Take any three wires with the same weight and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires. If there are two wires of the same weight left, and the current number of output wires with that weight is equal to 2 (modulo 3), input them into a half adder. Otherwise, pass them through to the next layer. If there is just one wire left, connect it to the next layer.

3.3.1 Dadda Column Compression Algorithm 

1. Let d_1 = 2 and d_{j+1} = \lfloor 1.5 d_j \rfloor, where d_j is the height of the matrix for the j-th stage. Repeat until the largest j-th stage is reached in which the original N-height matrix contains at least one column which has more than d_j partial products.

2. In the j-th stage from the end, place (3,2) and (2,2) compressors as required to achieve a reduced matrix. Only columns with more than d_j partial products, counting the carries they receive from less significant (3,2) and (2,2) compressors, are reduced.

3. Let j = j-1 and repeat step 2 until a matrix with a height of two is generated. This should occur when j = 1.
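Step 1 fixes the sequence of target heights before any compressor is placed. A minimal Python sketch of that step (hypothetical function name) generates d_1 = 2, d_{j+1} = floor(1.5 d_j) and keeps the values smaller than N, listed from the first reduction stage to the last.

```python
def dadda_heights(n):
    """Target matrix heights for each Dadda stage, largest first."""
    d = [2]
    while (3 * d[-1]) // 2 < n:      # floor(1.5 * d_j) using integer arithmetic
        d.append((3 * d[-1]) // 2)
    return d[::-1]


targets_8 = dadda_heights(8)   # [6, 4, 3, 2]: four stages for an 8 x 8 multiplier
```

The placement of the individual (3,2) and (2,2) compressors in steps 2 and 3 is not modelled here.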


[Diagram: the 64 partial product bits (numbered 0 to 63) are reduced level by level, with the levels labelled N=8, N=7, N=6, N=5, N=4, N=3 and N=2, using the sum (s) and carry (c) outputs of full adders and half adders.]

Figure 3.6 8 x 8 HPM Multiplier Logarithmic Depth Hierarchy

3.4.2 Applying HPM Compression Algorithm to 8 x 8 multiplier 

The intermediate reduction stages of the multiplier are performed by full adders and half adders, while the final addition is performed by an RCA. The flow diagram is shown in figure 3.6 and the architecture in figure 3.7, where FA is a full adder and HA is a half adder.

Figure 3.7 Architecture of 8 x 8 HPM Multiplier with RCA as final adder

3.5 Analysis of Multipliers 

The theoretical comparison of the number of adders in the Wallace, Dadda and HPM multipliers is shown in Table 3.1.

Compression   Number of             Number of     Carry Propagation
Technique     Full Adders           Half Adders   Adder (CPA) Length
Wallace       N^2 - 4N + 2 + S      > N           2N - 1 - S
Dadda         N^2 - 4N + 3          N - 1         2N - 2
HPM           N^2 - 4N + 3          N - 1         2N - 2

Table 3.1 Theoretical Comparison of Different Multipliers

Where, N = Multiplier and Multiplicand bit length and

S = Number of Reduction stages
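The expressions in Table 3.1 can be evaluated for a given operand width. The Python sketch below uses hypothetical function names; the Wallace half adder count is omitted because the table only gives a bound (> N), and the value S = 4 used for N = 8 is the stage count obtained from equation (3.1).

```python
def wallace_counts(n, s):
    """Full adder count and CPA length for a Wallace tree with s stages."""
    return {"full_adders": n * n - 4 * n + 2 + s, "cpa_length": 2 * n - 1 - s}


def dadda_hpm_counts(n):
    """Counts shared by the Dadda and HPM rows of Table 3.1."""
    return {
        "full_adders": n * n - 4 * n + 3,
        "half_adders": n - 1,
        "cpa_length": 2 * n - 2,
    }


wallace_8 = wallace_counts(8, 4)   # 38 full adders, CPA length 11
dadda_8 = dadda_hpm_counts(8)      # 35 full adders, 7 half adders, CPA length 14
```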

Table 3.2 shows the synthesis results of the three multipliers for area, delay and power. 8 x 8 bit, 16 x 16 bit and 32 x 32 bit multipliers were simulated and synthesized using each column reduction technique. The 2:2 compressor used in the design is the half adder shown in figure 2.1, and the 3:2 compressor is full adder 1 shown in figure 2.2. All the results shown here are for the combinational circuits, i.e., the inputs and outputs are not registered.

Bit width of
Multiplier     PPST      Area (µm^2)   Delay (ns)   Power (mW)
8 x 8          Wallace   2770.4        7.62         1.2
8 x 8          Dadda     2240.6        6.63         0.877
8 x 8          HPM       2052.2        5.81         0.817
16 x 16        Wallace   9273.5        15.46        5.07
16 x 16        Dadda     9116.6        13.94        4.86
16 x 16        HPM       8934.3        12.77        4.74
32 x 32        Dadda     47059.1       25.43        31.09
32 x 32        HPM       42724.7       25.33        20.40

Table 3.2 Comparison of Implementation of Different Multipliers w.r.t area, delay and power

From Table 3.2 it is clear that the HPM multiplier is the best of the three with respect to area, delay and power, so HPM was used as the column reduction technique in the proposed multiplier.
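The power-delay product introduced in section 1.2 can be computed directly from the delay and power columns of Table 3.2. The Python sketch below copies those values; the dictionary layout and function names are illustrative choices, and the product of a delay in ns and a power in mW comes out in pJ.

```python
# (delay in ns, power in mW) copied from Table 3.2
TABLE_3_2 = {
    "8 x 8": {"Wallace": (7.62, 1.2), "Dadda": (6.63, 0.877), "HPM": (5.81, 0.817)},
    "16 x 16": {"Wallace": (15.46, 5.07), "Dadda": (13.94, 4.86), "HPM": (12.77, 4.74)},
    "32 x 32": {"Dadda": (25.43, 31.09), "HPM": (25.33, 20.40)},
}


def power_delay_product(delay_ns, power_mw):
    """Power-delay product in pJ (ns * mW = pJ)."""
    return delay_ns * power_mw


def best_by_pdp(width):
    """Name of the reduction technique with the lowest power-delay product."""
    designs = TABLE_3_2[width]
    return min(designs, key=lambda name: power_delay_product(*designs[name]))


for width in TABLE_3_2:
    print(width, best_by_pdp(width))
```

Because HPM has both the lowest delay and the lowest power at every width in the table, it also has the lowest power-delay product.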


Chapter - 4 

The Proposed Design for High Speed and Low Power

4.1 Recursive Multiplication for High Speed 

The architecture proposed in this project work is centered on a recursive

multiplication algorithm by Danysh and Swartzlander [15]. The authors present a

multiplication algorithm based on divide and conquer methodologies that introduces

greater regularity in design than standard column compression multipliers, while avoiding the

linear latency of array multipliers.

Recent studies have examined the consequences of technology scaling on arithmetic

circuitry. These investigations strongly support the need for the consideration of

interconnect layout as an integral part of future arithmetic circuitry. The predominant

advantage offered by the recursive multiplication scheme is the use of smaller multipliers to

implement a larger operation, which is in direct compliance with the presented results. This

structure promotes the notion of exploiting locally optimized arrays for reduced

interconnect power through shorter local interconnects, and a more regular integration of the

sub-components on a larger scale.

The recursive multiplier scheme works by executing an n-bit multiplication using four n/2-bit multipliers in parallel and adding up the results. The n/2-bit multipliers may be further reduced, where each sub-multiplier carries out four parallel n/4-bit multiplications, and so forth. In this manner a large multiplication is carried out using recursions of simpler base multiplier modules.

Mathematically, the recursive algorithm may be proved by first considering two unsigned n-bit operands, the multiplier X and the multiplicand A:

X = \sum_{i=0}^{n-1} x_i 2^i,    A = \sum_{i=0}^{n-1} a_i 2^i

Splitting each operand into a low part (the n/2 least significant bits) and a high part (the n/2 most significant bits, kept at their original weight), X and A may now be defined as:

X = X_L + X_H                                                    (4.1)
A = A_L + A_H                                                    (4.2)

The overall multiplication of A and X is given by

P = A \cdot X                                                    (4.3)
  = (A_L + A_H) \cdot (X_L + X_H)
  = (A_L \cdot X_L) + (A_L \cdot X_H) + (A_H \cdot X_L) + (A_H \cdot X_H)
  = P_1 + P_2 + P_3 + P_4                                        (4.4)

Figure 4.1 Pictorial Representation of Recursive Multiplication

Therefore, the overall multiplication may be reduced to four smaller multiplications,

and this process may be repeated using even smaller base multipliers. In order to minimize

the delay introduced by subdividing the process, the result of the base multipliers, or the

intermediary products, will be kept in carry save form. Hence only one final fast adder will

 be required to yield the final product. All the four N/2 multiplications derived above are

diagrammatically shown in figure 4.1 (for N = 8). The result from multiplier M1, M2, M3

and M4 are P1, P2, P3 and P4 respectively.
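The decomposition into four half-width products can be sketched recursively. The Python below is a behavioural illustration with hypothetical names: it assumes n is a power of two, uses the built-in multiplication as a stand-in for the base multiplier module, and makes the weights of the high halves explicit as shifts, whereas equations (4.1) to (4.4) absorb them into the high-part terms.

```python
def recursive_multiply(a, x, n, base=2):
    """Multiply two unsigned n-bit numbers with four n/2-bit multiplications."""
    if n <= base:
        return a * x  # base multiplier module
    half = n // 2
    mask = (1 << half) - 1
    a_l, a_h = a & mask, a >> half
    x_l, x_h = x & mask, x >> half
    p1 = recursive_multiply(a_l, x_l, half, base)  # low  x low
    p2 = recursive_multiply(a_l, x_h, half, base)  # low  x high
    p3 = recursive_multiply(a_h, x_l, half, base)  # high x low
    p4 = recursive_multiply(a_h, x_h, half, base)  # high x high
    # Merge the four products at their respective weights.
    return p1 + ((p2 + p3) << half) + (p4 << (2 * half))
```

In hardware the four sub-multipliers operate in parallel and the merge is done by the merging adder; the model only reproduces the arithmetic.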

The architecture of the recursive multiplier for an N x N bit multiplication, with an RCA as merging adder, is shown in figure 4.2. Each N/2 multiplier in figure 4.2 uses the HPM algorithm for the PPST.


Figure 4.2 N –  Bit Recursive Multiplier with RCA as Merging Adder  

Figure 4.2 shows that all the inputs and outputs are registered, so the latency of the multiplier becomes two, i.e., it takes two clock cycles to get the first output, after which an output is obtained every clock cycle.


As mentioned earlier, the partial products that are dependent (marked in black in figure 1(b) and 1(c)) are given to multipliers M2 and M3 respectively. The products obtained from M2 and M3 are given to an N-bit RCA, and the result is given to an (N+1)-bit RCA along with the most significant N/2 bits of the product from M1 and the least significant N/2 bits of the product from M4. The most significant N/2 bits of the M4 product are given to an (N/2-1)-bit RCA with “1” as carry input, so that the result is calculated before the actual carry arrives, and a multiplexer selects the product based on the actual carry generated by the (N+1)-bit RCA. This dependency and flow can be clearly observed in figure 4.2.

To improve the speed further, the (N/2-1)-bit RCA was replaced with a BEC adder. The logical structure of a 7-bit BEC adder is shown in figure 4.3. Speed was thus achieved by using recursive multiplication and a hybrid adder (a combination of RCA and BEC) for merging the products of the four N/2-bit multipliers. To reduce the power, twin precision multiplication was adopted, which is described in the next section.

Figure 4.3 7 –  Bit BEC Adder without Carry

4.2 Twin Precision Multiplication For Low Power

Multiplication is a complex arithmetic operation, which is reflected in its relatively high signal propagation delay, high power dissipation, and large area requirement. When choosing a multiplier for a digital system, the bit-width of the multiplier is required to be at least as wide as the largest operand of the applications that are to be executed on that digital system. The bit-width of the multiplier is therefore often much larger than the data represented inside the operands, which leads to unnecessarily high power dissipation and unnecessarily long delay.

This resource waste could partially be remedied by having several multipliers, each with a specific bit-width, and using the particular multiplier with the smallest bit-width that is large enough to accommodate the current multiplication. Such a scheme would ensure that a multiplication is computed on a multiplier that has been optimized in terms of power and delay for that specific bit-width.

Figure 4.4 (a) Partial Product Array for N = 8 (b) Partial Products showing the dependency. 

In figure 4.4(b) the partial products are partitioned so as to obtain four partial product arrays of N = 4. Of these, the partial products marked in black are dependent, because those arrays share the same output bits, so they cannot be operated on simultaneously. Thus, to increase the throughput, the independent partial products (shaded in grey) are used along with operand guarding. The architecture for N = 8, with the HPM algorithm as the reduction technique, operand guarding, and the independent partial products used for performing two N/2-bit multiplications simultaneously, is shown in figure 4.5.


Figure 4.5 Block Diagram of 8 x 8 Twin Precision Multiplier
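The reason two N/2-bit multiplications can share one N-bit partial product array can be seen in a small model. The sketch below is a Python illustration only; the function name is hypothetical, and forcing the cross partial products to zero is an assumption about how the dependent products are suppressed.

```python
def twin_precision_multiply(a, b, n):
    """Return (high_product, low_product) of two n/2-bit multiplications
    computed in one n x n partial product array."""
    half = n // 2
    total = 0
    for i in range(n):          # bit position in the multiplier b
        for j in range(n):      # bit position in the multiplicand a
            # Keep only partial products whose two bits come from the same
            # half of the operands; the cross products are forced to zero.
            if (i < half) != (j < half):
                continue
            total += (((a >> j) & 1) & ((b >> i) & 1)) << (i + j)
    # Low-half products have weights below 2**n and cannot overflow into
    # the upper n bits; high-half products start at weight 2**n.
    low_product = total & ((1 << n) - 1)
    high_product = total >> n
    return high_product, low_product


if __name__ == "__main__":
    # 0xA7 and 0x3C hold the operand pairs (0xA, 0x3) and (0x7, 0xC).
    print(twin_precision_multiply(0xA7, 0x3C, 8))
```

For N = 8 the low N bits of the array output hold the product of the two low nibbles and the high N bits hold the product of the two high nibbles, so both results are available from a single pass.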

4.3 Clock Gating and Recursive Multiplication

Clock gating is a popular technique used in many synchronous circuits for

reducing dynamic power dissipation. Recursive Multiplication is used to reduce the power.

4.3.1 Clock Gating 

Clock gating saves power by adding more logic to a circuit to prune the clock tree. Pruning the clock disables portions of the circuitry so that the flip-flops in them do not have to switch states. Switching states consumes power; when the flip-flops are not being switched, the switching power consumption goes to zero and only leakage currents are incurred.

Clock gating works by taking the enable conditions attached to registers and using them to gate the clocks. Therefore a design must contain these enable conditions in order to use and benefit from clock gating. Clock gating can also save significant die area as well as power, since it removes large numbers of muxes and replaces them with clock gating logic. This clock gating logic is generally in the form of "Integrated clock gating" (ICG) cells. Note, however, that the clock gating logic changes the clock tree structure, since it sits in the clock tree.
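A behavioural sketch of a clock gating cell may help here. The text does not describe the internals of an ICG cell; the model below assumes the common latch-based arrangement, in which the enable is captured by a latch that is transparent while the clock is low and then ANDed with the clock. All names are hypothetical and the traces are made-up sample values.

```python
class IntegratedClockGate:
    """Latch-based clock gate: low-transparent latch on enable, then AND."""

    def __init__(self):
        self.latched_enable = 0

    def evaluate(self, clk, enable):
        """Return the gated clock level for the current clk and enable."""
        if clk == 0:
            # The latch follows the enable only while the clock is low, so
            # an enable change during the high phase cannot cut a pulse short.
            self.latched_enable = enable
        return clk & self.latched_enable


def gated_clock_trace(clk_trace, enable_trace):
    """Apply the gate sample by sample to two equally long traces."""
    icg = IntegratedClockGate()
    return [icg.evaluate(c, e) for c, e in zip(clk_trace, enable_trace)]


def count_rising_edges(samples):
    """Number of 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1, 0, 1]
    enable = [1, 1, 0, 0, 0, 0, 1, 1]
    gated = gated_clock_trace(clk, enable)
    print(gated, count_rising_edges(clk), count_rising_edges(gated))
```

Registers driven by the gated clock see fewer rising edges than the free-running clock delivers, which is where the saving in switching power comes from.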


Clock gating logic can be added into a design in a variety of ways:

1. Coded into the RTL code as enable conditions that can be automatically translated

into clock gating logic by synthesis tools (fine grain clock gating).

2. Inserted into the design manually by the RTL designers (typically as module level

clock gating) by instantiating library specific ICG (Integrated Clock Gating) cells

to gate the clocks of specific modules or registers.

3. Semi-automatically inserted into the RTL by automated clock gating tools. These

tools either insert ICG cells into the RTL, or add enable conditions into the RTL

code. These typically also offer sequential clock gating optimizations.

Sequential clock gating is the process of extracting/propagating the enable conditions to the upstream/downstream sequential elements, so that additional registers can be clock gated. Although asynchronous circuits by definition do not have a "clock", the term perfect clock gating is used to illustrate how various clock gating techniques are simply approximations of the data-dependent behaviour exhibited by asynchronous circuitry. As the granularity at which the clock of a synchronous circuit is gated approaches zero, the power consumption of that circuit approaches that of an asynchronous circuit: the circuit only generates logic transitions when it is actively computing.

4.3.2 Recursive Multiplication with Clock Gating 

In this project, clock gating is used to achieve operator isolation in the recursive multiplier for power reduction. As mentioned earlier, the recursive multiplier has four N/2-bit multipliers, of which M2 and M3 are dependent and M1 and M4 are independent. To achieve low power and double throughput, the clock gating technique is used to isolate the M2 and M3 multipliers so that no inputs are transferred to them. To perform twin precision multiplication an extra control input is needed. Here a two-bit input "Twin" is used as the control input. "Twin" is passed through a 2:3 decoder which generates T[1], T[2] and T[3] as control signals, and these signals are used for the operator isolation. Table 4.1 shows the truth table of the 2:3 decoder. As shown in Table 4.1, there are four operating modes:


Mode 0 – Both M1 and M4 in operation for Twin Precision

Mode 1 – Only M1 in operation

Mode 2 – Only M4 in operation

Mode 3 – Full Mode operation

Operation Mode                                          T[1]  T[2]  T[3]

00 – Both M1 and M4 in operation for Twin Precision      1     0     1

01 – Only M1 in operation                                1     0     0

10 – Only M4 in operation                                0     0     1

11 – Full Mode operation                                 1     1     1

TABLE 4.1 Decoder Truth Table 

Each output signal of the decoder is given to a 2-input AND gate with the clock as the other input, thus generating three clocks, namely clock1, clock2 and clock3, from T[1], T[2] and T[3] respectively. Clock2 drives the registers of M2 and M3, as shown in figure 4.6. So only in mode 3 are the multipliers M2 and M3 working; in all remaining modes they are off, thus saving switching power.
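The decoder and the three gated clocks can be modelled in a few lines. This is a Python sketch, not the project's Verilog. The decoder values are taken from Table 4.1; the assumption that clock1 drives M1 and clock3 drives M4 is inferred from the operating modes, since the text only states that clock2 drives M2 and M3. The function names are hypothetical.

```python
# Twin value -> (T[1], T[2], T[3]), copied from Table 4.1.
DECODER = {
    0b00: (1, 0, 1),  # both M1 and M4, twin precision
    0b01: (1, 0, 0),  # only M1
    0b10: (0, 0, 1),  # only M4
    0b11: (1, 1, 1),  # full mode
}


def gated_clocks(clk, twin):
    """AND the clock with each decoder output: (clock1, clock2, clock3)."""
    t1, t2, t3 = DECODER[twin]
    return clk & t1, clk & t2, clk & t3


def active_multipliers(twin):
    """List the sub-multipliers whose registers receive a clock pulse."""
    clock1, clock2, clock3 = gated_clocks(1, twin)
    active = []
    if clock1:
        active.append("M1")          # assumed: clock1 drives M1
    if clock2:
        active.extend(["M2", "M3"])  # stated in the text
    if clock3:
        active.append("M4")          # assumed: clock3 drives M4
    return active


if __name__ == "__main__":
    for twin in sorted(DECODER):
        print(format(twin, "02b"), active_multipliers(twin))
```

Only the full mode (Twin = 11) delivers clock pulses to M2 and M3; in the other three modes their registers hold their values and the logic behind them does not toggle.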

The advantage of this design compared to the regular twin precision multiplier is that the operator is isolated instead of using operand guarding. In this design a single multiplier can be used at a time for one N/2-bit multiplication, whereas in the regular twin precision multiplier all zeros have to be given for the MSB N/2 bits of the multiplier and multiplicand in order to perform the same operation. That restriction on the inputs is not always feasible; the control circuit here overcomes it.


The architecture shown in figure 4.6 offers increased speed as well as the flexibility to perform N/2-bit multiplications with lower power consumption and double throughput. This can be clearly observed in the result analysis.


Chapter - 5 

ASIC Implementation of Proposed Design 

5.1 Introduction 

Different types of multipliers were implemented in this project. Of these, the multiplier with the HPM column reduction technique, the recursive multiplier, and the recursive multiplier with clock gating are the most important, and their implementation is described here.

For the VLSI (hardware) implementation, the ASIC design flow was followed, from the RTL description through to GDSII. The architecture is described in Verilog HDL, functional simulation is done in the VCS simulator, and synthesis is carried out in Design Compiler.

5.2 ASIC Design Methodology 

Application Specific Integrated Circuit (ASIC) design, as the name suggests, focuses on the development of a hardware module that is completely dedicated to a particular application or process. This type of design makes economical use of silicon and also offers good speed compared to other implementations such as FPGA and CPLD devices. The development of an ASIC generally follows a flow called the ASIC design flow. This flow can be seen in figure 5.1, and each step is discussed in the following sections.

Figure 5.1 ASIC Design


5.2.1 System Partitioning 

This is the first step of the ASIC design flow; here the complex problem statement

is decomposed into smaller subsystems. The decomposition is carried out hierarchically

until each subsystem is of manageable size.

5.2.2 Design Entry 

In any design, specifications are written first, abstractly describing the functionality, interface and overall architecture of the circuit. A behavioural description is then created to analyze the design in terms of functionality, compliance with standards, and other high-level issues. Typically, behavioural descriptions (simulation models) are created in HDLs. Here Verilog HDL is used for design entry.

5.2.3 Simulation

Simulation is carried out at this stage for the written code; this type of simulation is called behavioural simulation or functional simulation. Here the simulation is carried out with the help of a "testbench", a piece of code which provides the required stimulus (inputs and control signals) to the design. The functionality is confirmed by observing the outputs in a waveform. In this design the VCS simulator is used for functional simulation.

5.2.4 Synthesis

The next stage is synthesis, which means converting the written code into gates and their interconnections. The conversion is done by mapping to a particular technology, e.g. 0.35 µm, 0.18 µm or 90 nm technology. The technology here refers to the gate length of the transistors used in the design. The outputs of the synthesis stage are the "gate level netlist (.vg)" and "design constraints (.sdc)" files. The gate level netlist contains information about the gates and interconnections, and the design constraints contain information such as the clock frequency and the wire-load models used. This is the final stage of the logical or front-end flow. The output files, i.e. .v and .sdc, are taken as input to the physical or back-end flow.

RTL synthesis is an automated design task in which high-level design descriptions written in Hardware Description Languages (such as VHDL, Verilog, or SystemVerilog) are transformed into gate-level netlists. A gate-level netlist is basically a circuit implementation of the design made of library components (both combinational and sequential cells) available in the technology library and their interconnections. The netlist is generated by the synthesis tool according to the constraints set by the designer.

Design Compiler is an RTL synthesis tool from Synopsys. It runs on UNIX platforms and is installed on the Institute's computer systems. Design Compiler is not supported on the Windows platform.


5.3 Synthesis Overview

Synthesis with Design Compiler includes the following main tasks: reading in the design, setting constraints, optimizing the design, analyzing the results and saving the design database. These tasks are described below.

5.3.1 Reading in the Design

The first task in synthesis is to read the design into Design Compiler memory. Reading in an HDL design description consists of two tasks: analyzing and elaborating the description. The analysis command (analyze) performs the following tasks:

1. Reads the HDL source and checks it for syntactical errors.

2. Creates HDL library objects in an HDL-independent intermediate format and saves these intermediate files in a specified location.

5.3.2 Constraining the design

The next task is to set the design constraints. Constraints are the instructions that the

designer gives to Design Compiler. They define what the synthesis tool can or cannot do

with the design or how the tool behaves. Usually this information can be derived from the

various design specifications (e.g. from timing specification).

There are basically two types of design constraints:


5.3.2.1 Design Rule Constraints

Design rule constraints are implicit constraints, which means that they are defined by the ASIC vendor in the technology library. By specifying the technology library that Design Compiler should use, the designer also specifies all design rules in that library; these rules cannot be discarded or overridden.

5.3.2.2 Optimization Constraints

Optimization constraints are explicit constraints (set by the designer). They describe the design goals (area, timing, and so on) the designer has set for the design and work as instructions to Design Compiler on how to perform synthesis.

Design rule constraints comprise:

5.3.2.3 Maximum transition time

Longest time allowed for a driving pin of a net to change its logic value

5.3.2.4 Maximum fanout

Maximum fanout for a driving pin

5.3.2.5 Maximum (and minimum) capacitance

The maximum (and minimum) total capacitive load that an output pin can drive. The total capacitance comprises the load pin capacitance and the interconnect capacitance.

5.3.2.6 Cell degradation

Some technology libraries contain cell degradation tables. The cell degradation

tables list the maximum capacitance that can be driven by a cell as a function of the

transition times at the inputs of the cell.

5.3.2.7 System clock definition and clock delays

Clock constraints are the most important constraints in an ASIC design. The clock signal is the synchronization signal that controls the operation of the system. The clock signal also defines the timing requirements for all paths in the design. Most of the other timing constraints are related to the clock signal.

5.3.2.8 Multicycle paths

A multicycle path is an exception to the default single cycle timing requirement of

 paths. That is, on a multicycle path the signal requires more than a single clock cycle to

 propagate from the path startpoint to the path endpoint.


5.3.2.9 Input and output delays

Input and output delays constrain external path delays at the boundaries of a design. Input delay is used to model the path delay from external inputs to the first registers in the design. Output delay constrains the path from the last register to the outputs of the design.

5.3.2.10 Minimum and maximum path delays

Minimum and maximum path delays allow constraining paths individually and setting specific timing constraints on those paths.

5.3.4 Optimizing the Design 

This section presents the behavior of the Design Compiler optimization step. The optimization step translates the HDL description into a gate-level netlist using the cells available in the technology library. The optimization is done in several phases. In each optimization phase different optimization techniques are applied according to the design constraints.

5.3.4.1 Gate-level Optimizations

Gate-level optimizations work on the technology-independent netlist and map it to the library cells to produce a technology-specific gate-level netlist. Gate-level optimizations include the following processes:

5.3.4.2 Area Optimization

Area optimization is the last step that Design Compiler performs on the design.

During this phase, only those optimizations that don't break design rules or timing

constraints are allowed.

5.3.5 Reporting and Analyzing the Design

Once the synthesis has been completed, the results need to be analyzed. Design Compiler, together with its graphical user interface (Design Vision), provides various means to debug the synthesized design. These include both textual reports that can be generated for different design objects and graphical views that help in inspecting and visualizing the design.


There are basically two types of analysis methods and tools:

5.3.5.1 Generating reports for design object properties

Reporting commands generate textual reports for various design objects:

timing and area, cells, clocks, ports, buses, pins, nets, hierarchy, resources,

constraints in the design, and so on.

5.3.5.2 Visualizing design objects (Design Vision)

Some design objects and their properties can be analyzed graphically. The designer may examine, for example, the design schematic and explore the design structure, visualize critical and other timing paths in the design, generate histograms for various metrics, and so on.

5.3.6 Save Design

The final task in synthesis with Design Compiler is to save the synthesized design. The design can be saved in many formats, but at least the gate-level netlist (usually in Verilog) and/or the design database should be saved. Remember that by default, Design Compiler does not save anything when exiting.


Chapter -6

Results

6.1 ASIC Results 

This chapter presents the simulation and synthesis results of the various multipliers along with the proposed design.

6.1.1 Simulation Results 

Figure 6.1 Simulation Result of 16 x 16 HPM Multiplier

Analysis

Signal          In/Out   Description

clk             input    Input to the multiplier

Rst             input    Input to the multiplier

datain1[15:0]   input    Input to the multiplier

datain2[15:0]   input    Input to the multiplier

dataout[31:0]   output   Output of the multiplier

Table 6.1 Analysis of 16 x 16 HPM Multiplier  


Figure 6.2 Simulation Result of 32 x 32 HPM Multiplier

Analysis

Signal          In/Out   Description

clk             input    Input to the multiplier

rst             input    Input to the multiplier

datain1[31:0]   input    Input to the multiplier

datain2[31:0]   input    Input to the multiplier

dataout[63:0]   output   Output of the multiplier

Table 6.2 Analysis of 32 x 32 HPM Multiplier  


Figure 6.4 Simulation Result of 32 x 32 Recursive Multiplier with clock gating 

Analysis

Signal          In/Out   Description

clk             input    Input to the multiplier

Rst             input    Input to the multiplier

datain1[31:0]   input    Input to the multiplier

datain2[31:0]   input    Input to the multiplier

dataout[63:0]   output   Output of the multiplier

Table 6.4 Analysis of 32 x 32 Recursive Multiplier with clock gating


6.2.2 Power report of 16 x 16 Basic HPM

The above report displays the total power of the design. Dynamic power is the power dissipated when the circuit is active, i.e. performing some function. Dynamic power is further divided into two components: switching power and internal power.

Switching power is dissipated when charging and discharging the load capacitance at the cell output. The amount of switching power depends on the switching activity of the cell, which is related to the operating frequency. The more logic transitions there are on the cell output, the higher the switching power.

Internal power is consumed within a cell for charging and discharging internal cell capacitances. Internal power also includes short-circuit power: during logic transitions, the P-type and N-type transistors are both on simultaneously for a short time, causing a direct connection from the Vdd rail to the ground rail.
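The dependence of switching power on activity, load and frequency is often summarized by the first-order estimate P = activity x C_load x Vdd^2 x f. The sketch below uses that textbook approximation; it is not the calculation performed by the synthesis tool, and every number in it is a hypothetical placeholder rather than a value from the reports.

```python
def switching_power(activity, load_capacitance_f, vdd_v, clock_hz):
    """First-order switching power estimate in watts.

    activity is the assumed probability of a power-consuming output
    transition per clock cycle (0 means the node never toggles).
    """
    return activity * load_capacitance_f * vdd_v ** 2 * clock_hz


if __name__ == "__main__":
    # Hypothetical node: 10 fF load, 1.8 V supply, 100 MHz clock.
    busy = switching_power(0.2, 10e-15, 1.8, 100e6)
    idle = switching_power(0.0, 10e-15, 1.8, 100e6)
    print(busy, idle)
```

A node that never toggles, for example one behind a gated clock, contributes no switching power in this estimate, leaving only leakage.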


6.2.3 Timing report of 16 x 16 Basic HPM

The delay report shows the delay calculation in two sections: the first for the data arrival time and the second for the data required time. The data arrival time is the time required for a signal to travel from the path start point to the path end point. The data required time is the maximum time a signal has for traveling that path. The difference between the data required time and the data arrival time is called the slack or timing margin of the path. If the slack is negative, there is a timing violation on that path.
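The slack rule can be written out directly. The function names and the numbers below are hypothetical and are not taken from the timing reports.

```python
def slack(required_time_ns, arrival_time_ns):
    """Timing margin of a path: data required time minus data arrival time."""
    return required_time_ns - arrival_time_ns


def has_timing_violation(required_time_ns, arrival_time_ns):
    """A path violates timing when its slack is negative."""
    return slack(required_time_ns, arrival_time_ns) < 0


if __name__ == "__main__":
    print(slack(8.0, 7.5), has_timing_violation(8.0, 7.5))
    print(slack(8.0, 9.0), has_timing_violation(8.0, 9.0))
```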


6.2.4 Area Report of 32 x 32 Basic HPM

The above report displays the total area of the design. The total area is the sum of three factors: combinational, noncombinational, and net interconnect area. The total cell area, due to the logic cells in the design, is given by the combinational (basic logic gates like ANDs, ORs, and the like) and noncombinational (registers) factors. The third factor, net interconnect area, is due to the wires connecting these cells.


6.2.5 Power report of 32 x 32 Basic HPM

The above report displays the total power of the design. Dynamic power is the power dissipated when the circuit is active, i.e. performing some function. Dynamic power is further divided into two components: switching power and internal power.

Switching power is dissipated when charging and discharging the load capacitance at the cell output. The amount of switching power depends on the switching activity of the cell, which is related to the operating frequency. The more logic transitions there are on the cell output, the higher the switching power.

Internal power is consumed within a cell for charging and discharging internal cell capacitances. Internal power also includes short-circuit power: during logic transitions, the P-type and N-type transistors are both on simultaneously for a short time, causing a direct connection from the Vdd rail to the ground rail.


6.2.6 Timing report of 32 x 32 Basic HPM

The delay report shows the delay calculation in two sections: the first for the data arrival time and the second for the data required time. The data arrival time is the time required for a signal to travel from the path start point to the path end point. The data required time is the maximum time a signal has for traveling that path. The difference between the data required time and the data arrival time is called the slack or timing margin of the path. If the slack is negative, there is a timing violation on that path.


6.2.7 Area Report of 16 x 16 Recursive Multiplier with Clock Gating

The above report displays the total area of the design. The total area is the sum of three factors: combinational, noncombinational, and net interconnect area. The total cell area, due to the logic cells in the design, is given by the combinational (basic logic gates like ANDs, ORs, and the like) and noncombinational (registers) factors. The third factor, net interconnect area, is due to the wires connecting these cells.


6.2.11 Power Report of 32 x 32 Recursive Multiplier with Clock Gating 

The above report displays the total power of the design. Dynamic power is the power dissipated when the circuit is active, i.e. performing some function. Dynamic power is further divided into two components: switching power and internal power.

Switching power is dissipated when charging and discharging the load capacitance at the cell output. The amount of switching power depends on the switching activity of the cell, which is related to the operating frequency. The more logic transitions there are on the cell output, the higher the switching power.

Internal power is consumed within a cell for charging and discharging internal cell capacitances. Internal power also includes short-circuit power: during logic transitions, the P-type and N-type transistors are both on simultaneously for a short time, causing a direct connection from the Vdd rail to the ground rail.

6.2.12 Timing Report of 32 x 32 Recursive Multiplier with Clock Gating

The delay report shows the delay calculation in two sections: the first for the data arrival time and the second for the data required time. The data arrival time is the time required for a signal to travel from the path start point to the path end point. The data required time is the maximum time a signal has for traveling that path. The difference between the data required time and the data arrival time is called the slack or timing margin of the path. If the slack is negative, there is a timing violation on that path.

6.3 Comparison

The comparison between Table 6.5 (basic HPM multiplier) and Table 6.6 (recursive multiplier with clock gating) summarizes the enhanced performance of the proposed multiplier; the percentage changes are listed in Table 6.7. The area, power and delay comparisons from Tables 6.5 and 6.6 for 16 and 32 bit are plotted in figures 6.5, 6.6 and 6.7 respectively.

Multiplier Word size   Type of Operation                            Area (µm²)   Delay (ns)   Power (µW)

16 x 16                Basic HPM                                    12771.8      7.43         313

32 x 32                Basic HPM                                    44388.3      13.13        815

Table 6.5 HPM Multiplier

Multiplier Word size   Type of Operation                            Area (µm²)   Delay (ns)   Power (µW)

16 x 16                Recursive Multiplication with Clock Gating   14297.7      7.13         161

32 x 32                Recursive Multiplication with Clock Gating   48912.5      12.13        443.7

Table 6.6 Recursive Multiplier with Clock Gating


Area comparison plot for Tables 6.5 and 6.6 (HPM and recursive multiplier with clock gating)

[Bar chart: "Area comparison of 16 and 32 bit Multipliers"; word sizes 16 x 16 and 32 x 32; series HPM and Recursive Multiplication with Clock Gating]

Figure 6.5 Area comparison plot

As shown in figure 6.5, the area occupied by the 16 bit recursive multiplier is more than that of the 16 bit basic HPM multiplier; similarly, the area occupied by the 32 bit recursive multiplier is more than that of the 32 bit basic HPM multiplier.

Multiplier Word size   Area (%)   Delay (%)   Power (%)

16 x 16                11.94      -4.03       -48.5

32 x 32                10.19      -7.16       -45.5

Table 6.7 Percentage Results of Recursive Multiplier with Clock Gating with reference to the HPM
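The percentage figures of Table 6.7 can be re-derived from the values printed in Tables 6.5 and 6.6 as (proposed - HPM) / HPM x 100. The sketch below does this in Python; the dictionary layout and function names are illustrative choices, while the numbers are copied from the two tables. With the figures as printed, the results agree with Table 6.7 to within rounding, except that the 32-bit delay change comes out at about -7.62 % rather than -7.16 %.

```python
# Values copied from Table 6.5 (basic HPM) and Table 6.6 (recursive
# multiplier with clock gating): area in µm², delay in ns, power in µW.
HPM = {
    "16 x 16": {"area": 12771.8, "delay": 7.43, "power": 313.0},
    "32 x 32": {"area": 44388.3, "delay": 13.13, "power": 815.0},
}
RECURSIVE_CG = {
    "16 x 16": {"area": 14297.7, "delay": 7.13, "power": 161.0},
    "32 x 32": {"area": 48912.5, "delay": 12.13, "power": 443.7},
}


def percent_change(reference, value):
    """Relative change of value with respect to reference, in percent."""
    return (value - reference) / reference * 100.0


def comparison():
    """Percentage change of every metric for every word size."""
    return {
        size: {
            metric: percent_change(HPM[size][metric], RECURSIVE_CG[size][metric])
            for metric in HPM[size]
        }
        for size in HPM
    }


if __name__ == "__main__":
    for size, metrics in comparison().items():
        print(size, {name: round(value, 2) for name, value in metrics.items()})
```

A positive value means the proposed multiplier is larger than the HPM reference for that metric, and a negative value means it is smaller.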


Power comparison plot for Tables 6.5 and 6.6 (HPM and recursive multiplier with clock gating)

[Bar chart: "Power comparison of 16 and 32 bit Multipliers"; word sizes 16 x 16 and 32 x 32; series HPM and Recursive Multiplication with Clock Gating]

Figure 6.6 Power Comparison Plot

As shown in figure 6.6, the power consumed by the 16 bit recursive multiplier is less than that of the 16 bit basic HPM multiplier; similarly, the power consumed by the 32 bit recursive multiplier is less than that of the 32 bit basic HPM multiplier.


Delay comparison plot for Tables 6.5 and 6.6 of the HPM and the Recursive Multiplier with clock gating

Figure 6.7 Delay comparison plot: delay comparison of 16 and 32 bit multipliers (word sizes 16 x 16 and 32 x 32; series: HPM, Recursive Multiplication with Clock Gating; vertical axis from 0 to 14)

As shown in Figure 6.7, the delay of the 16 bit recursive multiplier is less than that of the 16 bit basic HPM multiplier, and likewise the delay of the 32 bit recursive multiplier is less than that of the 32 bit basic HPM multiplier.
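The power and delay figures of Tables 6.5 and 6.6 can also be combined into the power-delay product discussed in Section 1.2. The following is a rough sketch, assuming the reported power and the critical-path delay may simply be multiplied as a figure of merit; the helper name pdp_fj and the dictionary layout are illustrative only.

```python
# (power in µW, delay in ns) copied from Tables 6.5 and 6.6.
results = {
    "16 x 16": {"hpm": (313.0, 7.43), "recursive": (161.0, 7.13)},
    "32 x 32": {"hpm": (815.0, 13.13), "recursive": (443.7, 12.13)},
}


def pdp_fj(power_uw, delay_ns):
    """Power-delay product in femtojoules (1 µW x 1 ns = 1e-15 J = 1 fJ)."""
    return power_uw * delay_ns


pdp = {
    size: {design: pdp_fj(*values) for design, values in designs.items()}
    for size, designs in results.items()
}

for size, row in pdp.items():
    # Ratio below 1 means the recursive design has the lower product.
    ratio = row["recursive"] / row["hpm"]
    print(f"{size}: HPM {row['hpm']:.1f} fJ, "
          f"recursive {row['recursive']:.1f} fJ, ratio {ratio:.2f}")
```

With the tabulated values the product for the recursive multiplier with clock gating comes out at about half that of the basic HPM for both word sizes, which is consistent with the power saving dominating the smaller delay improvement.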


Chapter - 7 

Conclusion 

This thesis achieves faster and lower power multiplication by combining three elements: the High Performance Multiplication [HPM] column reduction technique, recursive multiplication to implement an N-bit multiplier, and a hybrid adder (RCA and BEC adder) that accelerates the final addition. Low power is achieved by using the clock gating technique. The result analysis shows that the area overhead (about 10 to 12 percent in Table 6.7) is small compared with the increase in speed and the reduction in power consumption (about 45 to 48 percent in Table 6.7). The proposed multiplier design technique can be applied to any type of parallel multiplier to achieve faster and lower power operation.
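The recursive multiplication idea summarised above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the thesis's Verilog design: it handles unsigned operands only, assumes n is a power of two, splits each operand into halves and combines four half-width products, and uses an ordinary integer multiply as a stand-in for the HPM base block. The names base_multiply, recursive_multiply and base_width are hypothetical.

```python
import random


def base_multiply(a, b, n):
    """Stand-in for an n-bit base multiplier block (for example an HPM tree)."""
    mask = (1 << n) - 1
    return (a & mask) * (b & mask)


def recursive_multiply(a, b, n, base_width=8):
    """Multiply two unsigned n-bit operands from four n/2-bit products."""
    if n <= base_width:
        return base_multiply(a, b, n)
    assert n % 2 == 0, "sketch assumes an even operand width at every split"
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, (a >> half) & mask
    b_lo, b_hi = b & mask, (b >> half) & mask
    # Four half-width partial products.
    ll = recursive_multiply(a_lo, b_lo, half, base_width)
    lh = recursive_multiply(a_lo, b_hi, half, base_width)
    hl = recursive_multiply(a_hi, b_lo, half, base_width)
    hh = recursive_multiply(a_hi, b_hi, half, base_width)
    # Final addition of the weighted products; in hardware this is the
    # step a fast final adder would accelerate.
    return ll + ((lh + hl) << half) + (hh << (2 * half))


if __name__ == "__main__":
    for _ in range(1000):
        x = random.getrandbits(32)
        y = random.getrandbits(32)
        assert recursive_multiply(x, y, 32) == x * y
    print("32-bit recursive model matches x * y on random operands")
```

The model only checks the arithmetic of the decomposition; it says nothing about area, delay or power, which depend on the reduction tree, the final adder and the clock gating used in the actual design.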

The design is implemented in Verilog HDL, simulated with the VCS compiler, and synthesized with Design Compiler. The proposed architecture achieves double throughput, and the results show that the 32-bit proposed multiplier is faster, occupies more area, and consumes less power than the regular HPM multiplier (Table 6.7).


Chapter - 8 

Future Scope

As an attempt to develop a fast and low-power multiplier design, the research presented in this dissertation has achieved good results and demonstrated the efficiency of high level optimization techniques. However, there are limitations in our work and several future research directions are possible.

The result analysis shows an increase in speed and a reduction in power consumption at the synthesis level. Carrying the design through physical design could further improve both speed and power consumption, giving an implementation that requires less power and less time.

