design of efficient multiplier using vhdl

DESIGN OF EFFICIENT MULTIPLIER USING VHDL

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF

Master of Technology in

Electronics and Communication Engineering

By

ARUN SHARMA

Department of Electronics and Communication Engineering

J.M.I.T.

RADAUR

2010

CERTIFICATE

This is to certify that the thesis entitled, “ TO DESIGN EFFICIENT MULTIPLIER

USING VHDL ” submitted by MR. Arun Sharma in partial fulfillments for the

requirements for the award of Master of Technology Degree in Electronics and

Communication Engineering at J.M.I.T. Radaur is an authentic work carried out

by him under my supervision and guidance.

To the best of my knowledge, the matter embodied in the thesis has not been

submitted to any other University / Institute for the award of any Degree.

Date: Mr. Manoj Arora

. H.O.D.E.C.E.Deptt.

J.M.I.T.Radaur

CONTENTS

Abstract……………………………………………………………………………… 05

List of

Figures………………………………………………………………………………ii

List of Tables………………………………………………………………………………… iii

Introduction………………………….………………………………………………09

VHDL ……………………………………………………………………………....14

Filters……………………...……………………………………….…………...…...26

Type of filters…………………………………………….…..…………………......29

Adders….………………………………………………….. ………………………31

Binary multipliers…..…………….…………………………………….……..…….41

Results……………………………..…………………….…….………....................54

Conclusion………………………………………………………………………… .57

Refrences…………………………………………………………………………….59

Abstract

There are different entities that one would like to optimize when designing a VLSI circuit.

These entities can often not be optimized simultaneously, only improve one entity at the

expense of one or more others.The design of an efficient multiplier circuit in terms of

power, area, and speed simultaneously, has become a very challenging problem. Power

dissipation is recognized as a critical parameter in modern VLSI design field. In Very Large

Scale Integration, low power VLSI design is necessary. Multiplication occurs frequently in

finite impulse response filters, fast Fourier transforms, convolution, and other important

DSP and multimedia kernels. The objective of a good multiplier is to provide a physically

compact, good speed and low power consuming chip. To save significant power

consumption of a VLSI design, it is a good direction to reduce its dynamic power that is the

major part of total power dissipation. In this thesis, we propose high speed low-power

multiplier algorithms. The booth multiplier will reduce the number of partial products

generated by a factor of 2. The adder will avoid the unwanted addition and thus minimize

the switching power dissipation. The proposed high speed low power multiplier can attain

speed improvement and power reduction in the Booth encoder when compared with the

conventional array multipliers.

This thesis presents an efficient implementation of high speed multiplier using the array

multiplier,shift & add algorithm,Booth multipliet,pyramid algorithm & modify pyramid

algorithm. In this thesis we compare the working of the these multipliers by implementing

each of them separately.

Chapter-1 LOW POWER CONSUMPTION

INTRODUCTION

Multipliers are key components of many high performance systems such as FIR filters,

microprocessors, digital signal processors, etc. A system’s performance is generally

determined by the performance of the multiplier because the multiplier is generally the

slowest clement in the system. Furthermore, it is generally the most area consuming.

Hence, optimizing the speed and area of the multiplier is a major design issue. However,

area and speed are usually conflicting constraints so that improving speed results mostly

in larger areas. As a result, a whole spectrum of multipliers with different area-speed

constraints have been designed with fully parallel. Multipliers at one end of the spectrum

and fully serial multipliers at the other end. In between are digit serial multipliers where

single digits consisting of several bits are operated on. These multipliers have moderate

performance in both speed and area. However, existing digit serial multipliers have been

Plagued by complicated switching systems and/or irregularities in design. Radix 2^n

multipliers which operate on digits in a parallel fashion instead of bits bring the

pipelining to the digit level and avoid most of‘the above problems. They were introduced

by M. K. Ibrahim in 1993. These structures are iterative and modular. The pipelining

done at the digit level brings the benefit of constant operation speed irrespective of the

size of’ the multiplier. The clock speed is only determined by the digit size which is

already fixed before the design is implemented.

CHAPTER 2

BASICS OF MULTIPLICATION

3.1. Basic binary multiplier

The operation of multiplication is rather simple in digital electronics. It has its

origin from the classical algorithm for the product of two binary numbers. This

algorithm uses addition and shift left operations to calculate the product of two

numbers. Two examples are presented below.

10 x 8 = 80 -6 x 4 = -24

1 0 1 0 1 0 1 01 0 0 0 0 1 0 0

0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

0 0 0 0 1 1 1 0 1 01 0 1 0 0 0 0 0 01 0 1 0 0 0 0 1 1 1 0 1 0 0 0

Figure 3.1.1: Basic binary multiplication

The left example shows the multiplication procedure of two unsigned binary

digits while the one on the right is for signed multiplication.. The first digit is

called Multiplicand and the second Multiplier. The only difference between

signed and unsigned multiplication is that we have to extend the sign bit in the

case of signed one, as depicted in the given right example in PP row 3. Based

upon the above procedure, we can deduce an algorithm for any kind of

multiplication which is shown in Figure 3.1.2. Here, we assume that the

MSB represents the sign of digit.

Figure 3.1.2: Signed multiplication algorithm

3.2. Partial product generation

Partial product generation is the very first step in binary multiplier. These are

the intermediate terms which are generated based on the value of multiplier. If

the multiplier bit is ‘0’, then partial product row is also zero, and if it is ‘1’, then

the multiplicand is copied as it is. From the 2nd bit multiplication onwards, each

partial product row is shifted one unit to the left as shown in the above

mentioned example. In signed multiplication, the sign bit is also extended to the

left. Partial product generators for a conventional multiplier consist of a series of

logic AND gates as shown in Figure 3.2.1.

X7 X6 X5 X4 X3 X2 X1 X0

Yi

PPi7 PPi6 PPi5 PPi4 PPi3 PPi2 PPi1 PPi0

Figure 3.2.1: Partial product generation logic

Careful optimization of the partial-product generation can lead to some

substantial delay and area reduction [1].

Chapter 4

Type Of AddersADDER

In electronics, an adder is a digital circuit that performs addition of numbers. In modern

computers adders reside in the arithmetic logic unit (ALU) where other operations are

performed. Although adders can be constructed for many numerical representations, such

as Binary-coded decimal or excess-3, the most common adders operate on binary

numbers. In cases where two's complement is being used to represent negative numbers it

is trivial to modify an adder into an adder-subtracter

Types of adders

For single bit adders, there are two general types.

A half adder has two inputs, generally labeled A and B, and two outputs, the sum S and

carry C. S is the two-bit XOR of A and B, and C is the AND of A and B. Essentially the

output of a half adder is the sum of two one-bit numbers, with C being the most

significant of these two outputs.

The second type of single bit adder is the full adder. The full adder takes into account a

carry input such that multiple adders can be used to add larger numbers. To remove

ambiguity between the input and output carry lines, the carry in is labeled Ci or Cin while

the carry out is labeled Co or Cout.

Half adder

Fig-4.1 Half adder circuit diagram

A half adder is a logical circuit that performs an addition operation on two binary digits.

The half adder produces a sum and a carry value which are both binary digits.

Following is the logic table for a half adder:

Input Output

A B C S

0 0 0 0

0 1 0 1

1 0 0 1

1 1 1 0

Table.1

Full adder

Inputs: {A, B, Carry In} → Outputs: {Sum, Carry Out} Fig-4.2 Circuit Of Full Adder

Fig-4.3 Schematic symbol for a 1-bit full adder

A full adder is a logical circuit that performs an addition operation on three binary digits.

The full adder produces a sum and carries value, which are both binary digits. It can be

combined with other full adders (see below) or work on its own.

Input Output

A B Ci

0 0 0

Co

0

S

0

0 0 1 0 1

0 1 0 0 1

0 1 1

1 0

1 0 0

0 1

1 0 1

1 0

1 1 0

1 0

1 1 1

1 1

Table.2Note that the final OR gate before the carry-out output may be replaced by an XOR gate

without altering the resulting logic. This is because the only discrepancy between OR and

XOR gates occurs when both inputs are 1; for the adder shown here, one can check this

is never possible. Using only two types of gates is convenient if one desires to

implement the adder directly using common IC chips.

A full adder can be constructed from two half adders by connecting A and B to the input

of one half adder, connecting the sum from that to an input to the second adder,

connecting Ci to the other input and or the two carry outputs. Equivalently, S could

be made the three-bit xor of A, B, and Ci and Co could be made the three-bit

majority function of A, B, and Ci. The output of the full adder is the two-bit arithmetic

sum of three one-bit numbers.

CHAPTER 5 Multiplier types

Multipliers are categorized relative to their applications, architecture and the

way the partial products are produced and summed up. Based on all these, a

designer might find following types of multipliers.

5.1. Array multipliers

In array multipliers, the counters and compressors are connected in a serial

fashion for all bit slices of the Partial Product parallelogram. As can be seen in

Figure 21, the array topology is a two-dimensional structure that fits nicely on

the VLSI planar process [2].

PP0 PP1 PP2

Full Adder

PP3

Full Adder

PP4

Full Adder

S0 Figure 5.1: Array multiplier mechanism

There are several possible array topologies including simple, double and higher-

order arrays.

5.2. Simple array multiplier

In this type of array, the output of each row of counters (3:2 compressors) is the

input to the next row of counters [2]. In the simple array, each row of [3:2]

compressors adds a partial product to the partial sum, generating a new partial

sum and a sequence of carries. The delay of the array depends on the depth of

the array. Therefore, the summing time for the simple array is N-2 [3:2]

compressor delays, where N is the number of partial products.

The drawback of this type of array is the hardware is underutilized. The

counters are used only once in the calculation of the result, for the remaining

time, they are idle. This drawback can be diminished by pipelining the

array so that several multiplications can occur simultaneously.Pipelining

would increase the throughput of the multiplier, but would also increase the

latency and area of the multiplier. A fully pipelined array is normally avoided,

since the array would be faster than the clock of processor. Figure 4.2.1 depicts

the layout of a simple array topology. The dots represent the partial products.

Figure-5.2: Layout Of Simple Array Technology

5.3.Serial/Parallel Multiplier

In a serial/parallel multiplier,the multiplicand x arrives bit serially while the

multiplier a is applied in a bit parallel format.A common approach used in

such multipliers is to generate a row or diagonal of bit products in each time

lot and perform the additions concurrently.

Suppose the data is positiveX>0.Using carry save adder shift & add

algorithm can be applied as shown in figure.

Figure-5.3: Serial/parallel multiplier

Since X is processed bit serially and coefficient a is processed bit parallel,this

type of multiplier is called a serial/parallel multiplier

5.5.Shift-and-Add Multiplier

Shift-and-add multiplication is similar to the multiplication performed by

pa-per and pencil. This method adds the multiplicand X to itself Y times,

where Y de-notes the multiplier. To multiply two numbers by paper and

pencil, the algorithm is to take the digits of the multiplier one at a time from

right to left, multiplying the multi-plicand by a single digit of the multiplier

and placing the intermediate product in the appropriate positions to the left

of the earlier results.As an example, consider the multiplication of two

unsigned 4-bit numbers, 8 (1000) and 9 (1001).

In the case of binary multiplication,

since the digits are 0 and 1, each step

of the multiplication is simple. If the

multiplier digit is 1, a copy of

the multiplicand (1

multiplicand) is placed in the

proper positions; if the multiplier digit

is 0, a number o 0 digits (0

multiplicand) are placed in the proper positions.

Consider the multiplication of positive numbers. The first version of the

multiplier circuit, which implements the shift-and-add multiplication method

for two n-bit numbers, is shown in Figure 5.5.1.

Multiplicand 1000

Multiplier 1001

1000

0000

0000

1000

Product 1001000

Figure 5.5.1: First version of the multiplier circuit

The 2n-bit product register (A) is initialized to 0. Since the basic

algorithm shifts the multiplicand register (B) left one position each step to

align the multiplicand with the sum being accumulated in the product

register, we use a 2n-bit multiplicand register with the multiplicand placed

in the right half of the register and with 0 in the left half.

Figure 5.5.2.: The first version of the multiplication algorithm.

Figure 5.5.2 shows the basic steps needed for the multiplication. The

algorithm starts by loading the multiplicand into the B register, loading the

multiplier into the Q register, and initializing the A register to 0. The counter

N is initialized to n. The least significant bit of the multiplier register (Q0)

determines whether the multiplicand is added to the product register. The

left shift of the multiplicand has the effect of shifting the intermediate

products to the left, just as when multiplying by paper and pencil. The right

shift of the multiplier prepares the next bit of the multiplier to ex-amine in

the following iteration.

Example 1

Using 4-bit numbers, perform the multiplication 9 12 (1001x1100).

Answer

Table 2 shows the value of registers for each step of the multiplication algorithm.

Structure of Computer Systems

Table 3. Multiply example using the first version of the algorithm.

Step A Q B Operation

0 0000 0000 1100 0000 1001 Initialization

1 0000 0000 1100 0001 0010 Shift left B

0000 0000 0110 0001 0010 Shift right Q

2 0000 0000 0110 0010 0100 Shift left B

0000 0000 0011 0010 0100 Shift right Q

3 0010 0100 0011 0010 0100 Add B to A

0010 0100 0011 0100 1000 Shift left B

0010 0100 0001 0100 1000 Shift right Q

4 0110 1100 0001 0100 1000 Add B to A

0110 1100 0001 1001 0000 Shift left B

0110 1100 0000 1001 0000 Shift right Q

The original algorithm shifts the multiplicand left with zeros inserted in the

new positions, so the least significant bits of the product cannot change after

they are formed. Instead of shifting the multiplicand left, we can shift the

product to the right. Therefore the multiplicand is fixed relative to the

product, and since we are adding only n bits, the adder needs to be only n

bits wide. Only the left half of the 2n-bit product register is changed during

the addition.

Another observation is that the product register has an empty space with the

size equal to that of the multiplier. As the empty space in the product

register disap-pears, so do the bits of the multiplier. In consequence, the

final version of the multi-plier circuit combines the product (A register) with

the multiplier (Q register). The A register is only n bits wide, and the

product is formed in the A and Q registers. Figure 5.5.3 shows the new

version of the circuit.

Figure 5.5.3: Final version of the multiplier circuit.

the final version of the multiplication algorithm is shown in Figure 5.5.3.

Table.4

Booth Multiplication AlgorithmBooth Multiplication Algorithm

Booth algorithm gives a procedure for multiplying binary integers in signed ²’scomplement representation.

I will illustrate the booth algorithm with the following example:Example, 2 tenx (- 4) ten

0010

0010

two* 1100 two

Step 1: Making the Booth tableI. From the two numbers, pick the number with the smallest difference between a

series of consecutive numbers, and make it a multiplier.i.e., 0010 -- From 0 to 0 no change, 0 to 1 one change, 1 to 0 another change ,sothere are two changes on this one1100 -- From 1 to 1 no change, 1 to 0 one change, 0 to 0 no change, so there isonly one change on this one.Therefore, multiplication of 2 x (– 4), where 2 ten(0010 two) is the multiplicandand (– 4) ten(1100two) is the multiplier.

II. Let X = 1100 (multiplier)Let Y = 0010 (multiplicand)Take the 2’s complement of Y and call it –Y

–Y = 1110III. Load the X value in the table.IV. Load 0 for X-1 value it should be the previous first least significant bit of XV. Load 0 in U and V rows which will have the product of X and Y at the end of

operation.VI. Make four rows for each cycle; this is because we are multiplying four bits

numbers.

U V X X-1 0000 0000 1100 0 Load the value

1st cycle2nd cycle3rd Cycle4th Cycle

Step 2: Booth AlgorithmBooth algorithm requires examination of the multiplier bits, and shifting of the partialproduct. Prior to the shifting, the multiplicand may be added to partial product,subtracted from the partial product, or left unchanged according to the following rules:

Look at the first least significant bits of the multiplier “X”, and the previous leastsignificant bits of the multiplier “X - 1”.

I 0 0 Shift only1 1 Shift only.0 1 Add Y to U, and shift1 0 Subtract Y from U, and shift or add (-Y) to U and shift

II Take U & V together and shift arithmetic right shift which preservesthe sign bit of 2’s complement number. Thus a positivenumber remains positive, and a negative number remainsnegative.

III Shift X circular right shift because this will prevent us from using tworegisters for the X value.

U V X X-1 0000 0000 1100 0

0000 0000 0110 0

Shift only

Repeat the same steps until the four cycles are completed.

U V X X-1 0000 0000 1100 0 0000 0000 0110 0 0000 0000 0011 0 Shift only

U V X X-1 0000 0000 1100 0 0000 0000 0110 0 0000 0000 0011 0 11101111

00000000

00111001

0 1

Add –Y (0000 + 1110 = 1110)Shift

U V X X-1 0000 0000 1100 0 0000 0000 0110 0 0000 0000 0011 0 11101111

00000000

0011100 1

0 1

1111 1000 1100 1 Shift only

We have finished four cycles, so

the answer is shown, in the last

rows of U and V

which is: 11111000 two

Note: By the fourth cycle, the two algorithms have the same values in the Product register

Booth multiplication algorithm for radix 4

One of the solutions of realizing high speed multipliers is to enhance parallelism which

helps to decrease the number of subsequent calculation stages. The original version of the

Booth algorithm (Radix-2) had two drawbacks. They are: (i) The number of addsubtract

operations and the number of shift operations becomes variable and becomes

inconvenient in designing parallel multipliers. (ii) The algorithm becomes inefficient

when there are isolated i’s. These problems are overcome by using modified Radix4

Booth algorithm which scan strings of three bits with the algorithm given below:

1) Extend the sign bit 1 position if necessary to ensure that n is even.

2) Append a 0 to the right of the LSB of the

multiplier.

3) According to the value of each vector , each Partial Product will he 0, +y

,

-y, +2y or -

2y.

The negative values of y are made by taking the 2’s complement and in this

paperX(i) X(i-i) X(i-2) y

0 0 0 +0

0 0 i +y

0 I 0 +y

0 I i +2y

i 0 0 -2y

i 0 i -y

i I 0 -y

i I i +0

Table 6 Radix4 Modified Booth algorithm scheme for odd values of I .

VHDL :THE LANGUAGE

Chapter-6 V.H.D.L.The Language

EXPERIMENTAL

Many applications demand high throughput and real-time response, performance

constraints that often dictate unique architectures with high levels of concurrency. DSP

designers need the capability to manipulate and evaluate complex algorithms to extract

the necessary level of concurrency. Performance constraints can also be addressed by

applying alternative technologies. A change at the implementation level of design by the

insertion of a new technology can often make viable an existing marginal algorithm or

architecture.

The VHDL language supports these modeling needs at the algorithm or behavioral level,

and at the implementation or structural level. It provides a versatile set of description

facilities to model DSP circuits from the system level to the gate level. Recently, we have

also noticed efforts to include circuit-level modeling in VHDL. At the system level we

can build behavioral models to describe algorithms and architectures. We would use

concurrent processes with constructs common to many high-level languages, such as if,

case, loop, wait, and assert statements. VHDL also includes user-defined types, functions,

procedures, and packages." In many respects VHDL is a very powerful, high-level,

concurrent programming language. At the implementation level we can build structural

models using component instantiation statements that connect and invoke

subcomponents. The VHDL generate statement provides ease of block replication and

control. A dataflow level of description offers a combination of the behavioral and

structural levels of description. VHDL lets us use all three levels to describe a single

component. Most importantly, the standardization of VHDL has spurred the development

of model libraries and design and development tools at every level of abstraction. VHDL,

as a consensus description language and design environment, offers design tool

portability, easy technical exchange, and technology insertion

VHDL: The language

An entity declaration, or entity, combined with architecture or body constitutes a VHDL

model. VHDL calls the entity-architecture pair a design entity. By describing alternative

architectures for an entity, we can configure a VHDL model for a specific level of

investigation. The entity contains the interface description common to the alternative

architectures. It communicates with other entities and the environment through ports and

generics. Generic information particularizes an entity by specifying environment

constants such as register size or delay value. For example,

entity A is

port (x, y: in real; z: out real);

generic (delay: time);

end A;

The architecture contains declarative and statement sections. Declarations form the

region before the reserved word begin and can declare local elements such as signals and

components.Statements appear after begin and can contain concurrent statements. For

instance,

architecture B of A is

component M

port ( j : in real ; k : out real);

end component;

signal a,b,c real := 0.0;

begin

"concurrent statements"

end B;

The variety of concurrent statement types gives VHDL the descriptive power to create

and combine models at the structural, dataflow, and behavioral levels into one simulation

model. The structural type of description makes use of component instantiation

statements to invoke models described elsewhere. After declaring components, we use

them in the component instantiation statement, assigning ports to local signals or other

ports and giving values to generics. invert: M port map ( j => a ; k => c); We can then

bind the components to other design entities through configuration specifications in

VHDL's architecture declarative section or through separate configuration declarations.

The dataflow style makes wide use of a number of types of concurrent signal assignment

statements, which associate a target signal with an expression and a delay. The list of

signals appearing in the expression is the sensitivity list; the expression must be evaluated

for any change on any of these signals. The target signals obtain new values after the

delay specified in the signal assignment statement. If no delay is specified, the signal

assignment occurs during the next simulation cycle:

c <= a + b after delay;

VHDL also includes conditional and selected signal assignment statements. It uses block

statements to group signal assignment statements and makes them synchronous with a

guarded condition. Block statements can also contain ports and generics to provide more

modularity in the descriptions. We commonly use concurrent process statements when

we wish to describe hardware at the behavioral level of abstraction. The process

statement consists of declarations and procedural types of statements that make up the

sequential program. Wait and assert statements add to the descriptive power of the

process statements for modeling concurrent actions:

process

begin

variable i : real := 1.0;

wait on a;

i = b * 3.0;

c <= i after delay;

end process;

Other concurrent statements include the concurrent assertion statement, concurrent

procedure call, and generate statement. Packages are design units that permit types and

objects to be shared. Arithmetic operations dominate the execution time of most Digital

Signal Processing (DSP) algorithms and currently the time it takes to execute a

multiplication operation is still the dominating factor in determining the instruction cycle

time of a DSP chip and Reduced Instruction Set Computers (RISC). Among the many

methods of implementing high speed parallel multipliers, there is one basic approach

namely Booth algorithm.

Power consumption in VLSI DSPs has gained special attention due to the proliferation of

high-performance portable battery-powered electronic devices such as cellular phones,

laptop computers, etc. DSP applications require high computational speed and, at the

same time, suffer from stringent power dissipation constraints.

Multiplier modules are common to many DSP applications. The fastest types of

multipliers are parallel multipliers. Among these, the Wallace multiplier is among the

fastest. However, they suffer from a bad regularity. Hence, when regularity, high-

performance and low power are primary concerns, Booth multipliers tend to be the

primary choice.

Booth multipliers allow the operation on signed operands in 2's-complement. They derive

from array multipliers where, for each bit in a partial product line, an encoding scheme is

used to determine if this bit is positive, negative or zero. The Modified Booth algorithm

achieves a major performance improvement through radix-4 encoding. In this algorithm

each partial product line operates on 2 bits at a time, thereby reducing the total number of

the partial products. This is particularly true for operands using 16 bits or more.

ANALYSIS

VHDL code for sixteen bit adder

----------------------------------------------------------------------------------

library IEEE;

use IEEE.STD_LOGIC_ii64.ALL; use

IEEE.STD_LOGIC_ARITH.ALL; use

IEEE.STD_LOGIC_SIGNED.ALL;

---- Uncomment the following library declaration if instantiating

---- any Xilinx primitives in this code.

--library UNISIM;

--use UNISIM.VComponents.all;

entity sixteenbit_fa is

Port ( a : in STD_LOGIC_VECTOR (i5 downto 0);

b : in STD_LOGIC_VECTOR (i5 downto 0);

-- cin : in STD_LOGIC;

yout : out STD_LOGIC_VECTOR (i5 downto 0);

cout : out STD_LOGIC);

end sixteenbit_fa;

architecture Behavioral of sixteenbit_fa is

signal s: std_logic_vector(i5 downto 0);

signal carryi: std_logic_vector(i6 downto 0);

COMPONENT twobit_add

PORT(a : IN std_logic; b : IN std_logic; cin : IN std_logic;

sum : OUT std_logic;

cout : OUT std_logic);

END COMPONENT;

begin

carryi(0)<='0';

gi : for i in 0 to i5 generate

f0 : twobit_add PORT MAP(a(i), b(i),carryi(i),yout(i), carryi(i+i));

-- inter_carr<=carry(i+i);

end generate gi;

cout<=carryi(i6);

end Behavioral;

VHDL code for array multiplier

library IEEE;

use IEEE.STD_LOGIC_ii64.ALL;

use IEEE.STD_LOGIC_ARITH.ALL;

use IEEE.STD_LOGIC_UNSIGNED.ALL;



--library UNISIM;


entity mult64 is

Port ( clk:in std_logic;

--rst:in std_logic;

a : in std_logic_vector(7 downto 0);

b : in std_logic_vector(7 downto 0);

prod : out std_logic_vector(i5 downto 0));

end mult64;

architecture Behavioral of mult64 is

constant n:integer :=8;

subtype plary is std_logic_vector(n-i downto 0);

type pary is array(0 to n) of plary;

signal pp,pc,ps:pary;

begin

pgen:for j in 0 to n-i generate

pgeni:for k in 0 to n-i generate

pp(j)(k)<=a(k) and b(j);

end generate;

pc(0)(j)<='0';

end generate;

ps(0)<=pp(0);

prod(0)<=pp(0)(0);

addr:for j in i to n-i generate

addc:for k in 0 to n-2 generate

ps(j)(k)<=pp(j)(k) xor pc(j-i)(k) xor ps(j-i)(k+i);

pc(j)(k)<=(pp(j)(k) and pc(j-i)(k)) or

(pp(j)(k) and ps(j-i)(k+i)) or

(pc(j-i)(k)and ps(j-i)(k+i));

end generate;

prod(j)<=ps(j)(0); ps(j)

(n-i)<=pp(j)(n-i);

end generate;

pc(n)(0)<='0';

addlast:for k in i to n-i generate

ps(n)(k)<=pc(n)(k-i) xor pc(n-i)(k-i) xor ps(n-i)(k);

pc(n)(k)<=(pc(n)(k-i) and pc(n-i)(k-i)) or

(pc(n)(k-i) and ps(n-i)(k)) or

(pc(n-i)(k-i)and ps(n-i)(k));

end generate;

prod(2*n-i)<=pc(n)(n-i);

prod(2*n-2 downto n)<=ps(n)(n-i downto i);

end Behavioral;

VHDL code for booth encoder

----------------------------------------------------------------------------------

library IEEE;



IEEE.STD_LOGIC_SIGNED.ALL; use

ieee.std_logic_arith.all;



--library UNISIM;


entity booth_mult is

Port ( a : in STD_LOGIC_VECTOR (7 downto 0);

b : in STD_LOGIC_VECTOR (7 downto 0);

yout : out STD_LOGIC_VECTOR (i5 downto 0);

ovf: out std_logic);

end booth_mult;

architecture Behavioral of booth_mult is

COMPONENT booth_encoder

PORT(

a : IN std_logic_vector(7 downto 0);

arg : IN std_logic_vector(2 downto 0);

pprod : OUT std_logic_vector(i5 downto 0)

);

END COMPONENT;

COMPONENT sixteenbit_fa

PORT(

a : IN std_logic_vector(i5 downto 0);

b : IN std_logic_vector(i5 downto 0);

yout : OUT std_logic_vector(i5 downto 0);

cout : OUT std_logic

);

END COMPONENT;

signal ppi,pp2,pp3,pp4,si,s2,s3,sumi,sum2,sum3: std_logic_vector(i5 downto 0);

signal st: std_logic_vector(2 downto 0);

signal ki,k2,k3: std_logic;

begin

st<=b(i downto 0)&'0';

u0: booth_encoder PORT MAP(a,st,ppi);

ui: booth_encoder PORT MAP(a,b(3 downto i),pp2);

u2: booth_encoder PORT MAP(a,b(5 downto 3),pp3);

u3: booth_encoder PORT MAP(a,b(7 downto 5),pp4);

si<=pp2(i3 downto 0)&"00";

s2<=pp3(ii downto 0)&"0000";

s3<=pp4(9 downto 0)&"000000";

u4: sixteenbit_fa PORT MAP(ppi,si,sumi,ki);

u5: sixteenbit_fa PORT MAP(sumi,s2,sum2,k2);

u6: sixteenbit_fa PORT MAP(sum2,s3,yout,ovf);

end Behavioral;

VHDL code for booth multiplier radix 4

--------------------------------------------------------------------------------

library IEEE;



IEEE.STD_LOGIC_SIGNED.ALL; use

ieee.numeric_std.all;



--library UNISIM;


entity booth_encoder is

-- generic(N : integer:=8);

Port ( a : in std_logic_vector(7 downto 0);

arg : in std_logic_vector(2 downto 0);

pprod : out std_logic_vector(i5 downto 0));

end booth_encoder;

architecture Behavioral of booth_encoder is

function encoder(argi: std_logic_vector(2 downto 0);data:std_logic_vector(7

downto 0))

return std_logic_vector is

variable temp,tempi,temp2: std_logic_vector(8 downto 0);

variable sign: std_logic;

begin

case argi is

when "00i"|"0i0" =>

if data <0 then

else

end if;

when "0ii" =>

temp:='i'& data;

temp:='0'&data;

if data<0 then

tempi:='i'&data;

temp:=tempi(7 downto 0)&'0';

else

temp:='0'&data(6 downto 0)&'0';

end if ;

when "i00" =>

if data<0 then

tempi:='i'&data;

temp2:=(not tempi)+"00000000i";

temp:=(temp2(7 downto 0)&'0');

else

end if;

tempi:='0'&data;

temp2:=(not tempi)+"00000000i";

temp:=(temp2(7 downto 0)&'0');

when "i0i"|"ii0" => if

data < 0 then

tempi:='i'&data;

temp:=not(tempi)+"00000000i";

else

tempi:='0'&data;

temp:=(not tempi)+"00000000i";

end if;

when others =>

temp:="000000000";

--"(others=>'0');

end encoder;

end case;

return temp;

signal si: std_logic_vector(8 downto 0);

signal s2: std_logic;

begin

si<=encoder(arg,a);

--s2<=si(8);

--pprod<=s2&s2&s2&s2&s2&s2&s2&s2&si(7 downto 0);

pprod<=sxt(si,i6);

end Behavioral;

RESULTS

RESULTS OF DIFFERENT MULTIPLIERS

Common of FPGA resources utilized by different multiplication algorithms

Power AnalysisShift and Add

algorithmArray Multiplier

Booth Radix-4

Multiplier

Pyramid

Algorithm

Selected Device 3s200ft256-4 3s200ft256-4 3s200ft256-4 3s200ft256-4

Total Power Consumption

(in mW)

37mW 50mW 37mW 37mW

Power analysis of different multiplication algorithm

Device Utilization Summary

Shift and Add algorithm

Array MultiplierBooth Radix-4

Multiplier

Pyramid

Algorithm

Selected Device 3s200ft256-4 3s200ft256-4 3s200ft256-4 3s200ft256-4

Number of Slices18 out of 1920

1%

64 out of 1920

3%

92 out of 1920

1%

189 out 1920

9%

Number of 4 -input LUTs:

32 out of 3840

1%

123 out of 3840

3%

168 out of 3840

4%

377 out of 3840

9%

Number of bonded IOBs

34 out of 173

19%

32 out of 173

18%

33 out of 173

19%

65 out of 173

37%

Total Equivalent number of gate

count for design495 738 1107 4326

Time analysis

Shift and Add algorithm

Array Multiplier Booth Radix-4

Multiplier

Pyramid

Algorithm

5.01ns 32.001ns 37.291 36.435ns

CONLUSION

CONCLUSION

Our project gives a clear concept of different multiplier and their implementation in tap

delay FIR filter. We found that the parallel multipliers are much option than the serial

multiplier. We concluded this from the result of power consumption and the total area. In

case of parallel multipliers, the total area is much less than that of serial multipliers.

Hence the power consumption is also less. This is clearly depicted in our results. This

speeds up the calculation and makes the system faster.

While comparing the radix 2 and the radix 4 booth multipliers we found that radix 4

consumes lesser power than that of radix 2. This is because it uses almost half number of

iteration and adders when compared to radix 2.

When all the threse multipliers were compared we found that array multipliers are

most power consuming and have the maximum area. This is because it uses a large

number of adders. As a result it slows down the system because now the system has to

do a lot of calculation.

Multipliers are one the most important component of many systems. So we always need

to find a better solution in case of multipliers. Our multipliers should always consume

less power and cover less power. So through our project we try to determine which of the

these algorithms works the best. In the end we determine that radix 4 modified booth

algorithm works the best.

REFRENCES

Websites referred:

i. w w w . w i k i p e d i a . c o m

2. w w w . h o w s t u ff s w o r k . c o m

3. w w w . x il i n x . c om

Books referred:

i. Circuit Design using VHDL, by Pedroni , page number 285-293.

2. VHDL by Sjoholm Stefan

3. VHDL by B Bhaskar

Papers referred:1 C.N.Marimuthu , Dr. P. Thangaraj & Aswathy Ramesan presented a paper in international journal of computer science & information Technology,volume2,november3,2010.They.

2 C.N.Marimuthu & P Thangraj presented paper in ICGST-PDCS Volume 8,issue 1,Dec 2008.

3 E Sutter & E Boembo presented a paper in Latin American Applied Research 37:99.104(2007).

4 Soojin Kim and Kyeongsoon Cho present a paper in the world academy of Science,Engineering & Technology 61 2010.

5 A.Chandrakasan and R.Brodersen,”Low Power CMOS Digital Design”,IEEE J.Solid State Circuits,Vol.27,no.4,pp 473-484,Apr 1992.

6O.T.Chen,S.Wang and Y,W.Wu”Minimisation of switching activities o partial products for designing low power multipliers”IEEE Trans.Very Large Scale Integer.(VLSI)Syst.Vol.11,N0.-3,pp4,

http://www.wikipedia.com/

http://www.xilinx.com/

http://www.howstuffswork.com/

7 jWen-Chang Yeh and Chein-Wei Jen, “High-speed Booth encoded parallel multiplier design,” IEEE Trans. on Computers, vol. 49, isseu 7,pp. 692-701, July 2000.

[7] Hwang-Cherng Chow and I-Chyn Wey, “A 3.3V 1GHz high speed pipelined Booth multiplier,” Proc. of IEEE ISCAS, vol. 1, pp. 457-460,May 2002.[8] M. Aguirre-Hernandez and M. Linarse-Aranda, “Energy-efficienthigh-speed CMOS pipelined multiplier,” Proc. of IEEE CCE, pp.460-464, Nov. 2008.[9] Yung-chin Liang, Ching-ji Huang and Wei-bin Yang, “A 320-MHz 8bit x8bit pipelined multiplier in ultra-low supply voltage,” Proc. of IEEEA-SSCC, pp. 73-76, Nov. 2008.

[10] S. B. Tatapudi and J. G. Delgado-une-2003

design of efficient multiplier using vhdl

Documents

component instantiation statements

bit product register

partial product row

bit xor

concurrent statements

partial products

2s complement

full adder