
A main project on HIGH SPEED PERFORMANCE OF MULTIPLIER

1. CHAPTER

1.1 INTRODUCTION

Multiplication is an important fundamental function in arithmetic operations.

Multiplication based operations such as Multiply and Accumulate (MAC) and inner product are

among some of the frequently used Computation Intensive Arithmetic Functions (CIAF)

currently implemented in many Digital Signal Processing (DSP) applications such as

convolution, Fast Fourier Transform (FFT), filtering, and in the arithmetic and logic units of microprocessors [1]. Since multiplication dominates the execution time of most DSP algorithms, there is a need for a high-speed multiplier. Currently, multiplication time is still the dominant factor

in determining the instruction cycle time of a DSP chip.

The demand for high speed processing has been increasing as a result of expanding computer

and signal processing applications. Higher throughput arithmetic operations are important to

achieve the desired performance in many real-time signal and image processing applications [2].

One of the key arithmetic operations in such applications is multiplication, and the development of fast multiplier circuits has been a subject of interest for decades. Reducing time delay and power consumption are essential requirements for many applications [2, 3]. This work presents different multiplier architectures; the multiplier based on Vedic Mathematics is one of the fastest and lowest-power among them.

Minimizing power consumption for digital systems involves optimization at all levels of

the design. This optimization includes the technology used to implement the digital circuits, the

circuit style and topology, the architecture for implementing the circuits and at the highest level

the algorithms that are being implemented. Digital multipliers are the most commonly used

components in any digital circuit design. They are fast, reliable and efficient components that are

utilized to implement any operation. Depending upon the arrangement of the components, there

are different types of multipliers available. A particular multiplier architecture is chosen based on

the application.

In many DSP algorithms, the multiplier lies in the critical delay path and ultimately

determines the performance of the algorithm. The speed of the multiplication operation is of great importance in DSP as well as in general-purpose processors. In the past, multiplication was generally implemented with a sequence of addition, subtraction, and shift operations. There have been many


algorithm proposals in the literature to perform multiplication, each offering different advantages and involving trade-offs in terms of speed, circuit complexity, area, and power consumption.

The multiplier is a fairly large block of a computing system. The amount of circuitry

involved is directly proportional to the square of its resolution, i.e., a multiplier of size n bits has on the order of n² gates. For multiplication algorithms performed in DSP applications, latency and throughput are the two major concerns from a delay perspective. Latency is the real delay of computing a function: a measure of how long after the inputs to a device are stable the final result is available on the outputs.

Throughput is the measure of how many multiplications can be performed in a given

period of time. The multiplier is not only a high-delay block but also a major source of power dissipation. That is why, if one also aims to minimize power consumption, it is of great interest to

reduce the delay by using various delay optimizations.

Digital multipliers are the core components of all the digital signal processors (DSPs) and

the speed of the DSP is largely determined by the speed of its multipliers [9]. The two most common multiplication algorithms followed in digital hardware are the array multiplication algorithm and

Booth multiplication algorithm. The computation time taken by the array multiplier is

comparatively less because the partial products are calculated independently in parallel. The

delay associated with the array multiplier is the time taken by the signals to propagate through

the gates that form the multiplication array. Booth multiplication is another important

multiplication algorithm. Large booth arrays are required for high speed multiplication and

exponential operations which in turn require large partial sum and partial carry registers.

Multiplication of two n-bit operands using a radix-4 Booth recoding multiplier requires

approximately n / (2m) clock cycles to generate the least significant half of the final product,

where m is the number of Booth recoder adder stages. Thus, a large propagation delay is associated with this case. Due to the importance of digital multipliers in DSP, this has always been

an active area of research and a number of interesting multiplication algorithms have been

reported in the literature [4].
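As a concrete illustration of the radix-4 Booth recoding mentioned above, the following is a minimal combinational Verilog sketch (module and signal names are illustrative and not part of the report's design): each overlapping group of three multiplier bits selects a digit in {-2, -1, 0, +1, +2}, and the weighted partial products are summed.

module booth4_8x8 (
    input  signed [7:0]  a,    // multiplicand
    input  signed [7:0]  b,    // multiplier
    output signed [15:0] p
);
    wire [8:0] be = {b, 1'b0};              // multiplier extended with b[-1] = 0
    reg signed [15:0] acc;
    reg signed [15:0] pp;
    integer i;

    always @* begin
        acc = 0;
        for (i = 0; i < 4; i = i + 1) begin
            case (be[2*i+2 -: 3])           // bits b[2i+1], b[2i], b[2i-1]
                3'b000, 3'b111: pp = 16'sd0;        //  0
                3'b001, 3'b010: pp = a;             // +1 x a
                3'b011:         pp = a <<< 1;       // +2 x a
                3'b100:         pp = -(a <<< 1);    // -2 x a
                default:        pp = -a;            // -1 x a (codes 101, 110)
            endcase
            acc = acc + (pp <<< (2*i));             // weight the digit by 4^i
        end
    end
    assign p = acc;
endmodule

A sequential Booth array of the kind discussed above would instead accumulate these recoded partial products over several clock cycles, which is where the large partial-sum and partial-carry registers come from.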

In this work, the Urdhva Tiryakbhyam Sutra is first applied to the binary number system and is used

to develop digital multiplier architecture. This is shown to be very similar to the popular array

multiplier architecture. This Sutra also shows the effectiveness of reducing the NXN multiplier structure into efficient 4X4 multiplier structures. Nikhilam Sutra is then discussed and is

shown to be much more efficient in the multiplication of large numbers as it reduces the

multiplication of two large numbers to that of two smaller ones. The proposed multiplication

algorithm is then illustrated to show its computational efficiency by taking an example of

reducing a 4X4-bit multiplication to a single 2X2-bit multiplication operation [4]. This work

presents a systematic design methodology for a fast and area-efficient digital multiplier based on Vedic mathematics. The multiplier architecture is based on the Vertical and Crosswise

algorithm of ancient Indian Vedic Mathematics [5].
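As a brief illustration of the Nikhilam ("all from 9 and the last from 10") idea, consider a small decimal example (added here for clarity, not taken from the report): to multiply 96 by 93, both numbers are close to the base 100, with deficiencies 4 and 7.

    96 -> deficiency 4 from 100
    93 -> deficiency 7 from 100
    Left part : 96 - 7 = 93 - 4 = 89   (cross-subtraction)
    Right part: 4 x 7 = 28             (product of the deficiencies)
    Result    : 89 | 28 = 8928 = 96 x 93

The multiplication of two numbers close to a base is thus reduced to one much smaller multiplication and a subtraction.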

We have continued our development of the 12-bit by 11-bit multiplier for our

multiprocessor-ring chip. Since the multiplier's performance is the single most important factor

in determining the maximum speed at which our system can operate, it is essential that this task

be performed in the best possible manner. Our goal was that this multiplier operate in

approximately 30 ns, so that our overall chip speed could reach 30 MHz. The multiplier was

designed to accommodate eleven-bit data and twelve-bit coefficients. In addition to the above-

cited operating speed goal, we needed to make the multiplier small enough that five complete

processors, including five multipliers, can be included on a single IC (which employs 2-µm

CMOS technology). As mentioned in the previous progress report, after completing the

multiplier's layout and verifying its correct functioning via logic simulation, we performed

SPICE simulations on the various multiplier critical-path components. These simulations

indicated an operating speed well under 30 ns. We next fabricated a MOSIS TinyChip (2.22 mm

x 2.25 mm, in 2-µm technology) with this 12-bit by 11-bit multiplier and a small block of

coefficient RAM as well as a small dual-port register block. This allowed us to test the

multiplier's performance in an environment that closely simulates the five-processor-ring system.

During the present quarter the fabricated TinyChip was tested using the Tektronix LV500

logic verification system, and the results verified the multiplier's proper functioning. The overall

delay of the TinyChip was measured to be 36 ns. The critical path of the multiplier is therefore

approximately 26 ns (subtracting off the read time of approximately 10 ns). This is well below the

30 ns goal and corresponds well to the expected (simulation) result of 26.4 ns. Furthermore,

while this design was being fabricated, we laid out a redesigned vector-merge portion of the

multiplier. The simulated critical path of the new design is 22.3 ns. Since the multiplier


determines the operating speed of the overall five-processor ring chip, we now expect the

completed project to operate at approximately 40 MHz, far exceeding the proposed 30 MHz

goal, and a quite good performance level in absolute terms, considering the small size of our

multipliers, and considering that they are fabricated using 2-µm CMOS technology. Another

MOSIS TinyChip will be sent out for fabrication this quarter which employs the multiplier with

the re-designed vector-merge. We have also begun the IC layout of the storage registers for the

program instructions used by our processors. We expect this layout to be completed during the

present quarter. Furthermore, work has been proceeding nicely on the IC layout of the ALU. This

has been approached by starting with the layout of the multiplier and gradually extending it

according to the specified ALU architecture. Another project has been started during the present

quarter. It concerns a novel signed-digit serial multiplier, which should prove useful for new

real-time FFT and adaptive-filtering applications.


2. CHAPTER

2.1 VEDIC MATHEMATICS

Vedic mathematics is part of four Vedas (books of wisdom). It is part of Sthapatya- Veda

(book on civil engineering and architecture), which is an upa-veda (supplement) of Atharva

Veda. It gives explanation of several mathematical terms including arithmetic, geometry (plane,

co-ordinate), trigonometry, quadratic equations, factorization and even calculus. His Holiness

Jagadguru Shankaracharya Bharati Krishna Teerthaji Maharaja (1884-1960) compiled all this

work together and gave its mathematical explanation while discussing it for various applications.

Swamiji constructed 16 sutras (formulae) and 16 Upa sutras (sub formulae) after extensive

research in Atharva Veda. Obviously these formulae are not to be found in present text of

Atharva Veda because these formulae were constructed by Swamiji himself. Vedic mathematics

is not only a mathematical wonder but it is also logical. That is why it has such a degree of eminence which cannot be disproved. Due to these phenomenal characteristics,

Vedic maths has already crossed the boundaries of India and has become an interesting

topic of research abroad. Vedic maths deals with several basic as well as complex mathematical

operations. Especially, methods of basic arithmetic are extremely simple and powerful [2, 3].

The word “Vedic” is derived from the word “Veda” which means the store-house of all

knowledge. Vedic mathematics is mainly based on 16 Sutras (or aphorisms) dealing with various

branches of mathematics like arithmetic, algebra, geometry etc. These methods and ideas can be

directly applied to trigonometry, plane and spherical geometry, conics, calculus (both differential

and integral), and applied mathematics of various kinds. As mentioned earlier, all these Sutras

were reconstructed from ancient Vedic texts early in the last century. Many Sub-sutras were also

discovered at the same time, which are not discussed here.

The beauty of Vedic mathematics lies in the fact that it reduces the otherwise

cumbersome-looking calculations in conventional mathematics to very simple ones. This is so

because the Vedic formulae are claimed to be based on the natural principles on which the

human mind works. This is a very interesting field and presents some effective algorithms which


can be applied to various branches of engineering such as computing and digital signal

processing [1, 4]. Multiplier architectures can be generally classified into three categories. First is the serial multiplier, which emphasizes minimum hardware and chip area. Second is the parallel multiplier (array and tree), which carries out high-speed mathematical operations; its drawback is relatively larger chip area consumption. Third is the serial-parallel multiplier, which serves as a good trade-off between the time-consuming serial multiplier and the area-consuming parallel multipliers.

Below is a list of those sixteen Sutras, translated from Sanskrit into English, followed by a list of the Sub-sutras.

• By one more than the one before

• All from 9 and the last from 10

• Vertically and crosswise

• Transpose and apply

• If the Samuccaya is the same it is zero

• If one is in ratio the other is zero

• By addition and by subtraction

• By the completion or non-completion

• Differential calculus

• By the deficiency

• Specific and general

• The remainders by the last digit

• The ultimate and twice the penultimate

• By one less than the one before

• The product of the sum

• All the multipliers

Below is a list of the Sub sutras or Corollaries:

• Proportionately.

• The remainder remains constant.

• The first by the first and the last by the last.

• For 7 the multiplicand is 143.


• By osculation.

• Lessen by the deficiency.

• Whatever the deficiency lessen by that amount and set up the square of the deficiency.

• Last Totaling 10.

• Only the last terms.
• The sum of the products.

• By alternative elimination and retention.

• By mere observation.

• The product of the sum is the sum of the products.

• On the flag.

TABLE 1
COMPARISON BETWEEN NORMAL METHOD OF MULTIPLICATION AND VEDIC MATHEMATICS MULTIPLICATION


At the algorithmic and structural levels, many multiplication techniques have been developed to enhance the efficiency of the multiplier; these focus on reducing the partial products and/or on the methods for adding the partial products, but the principle behind multiplication remains the same in all cases.

Vedic Mathematics is the ancient system of Indian mathematics which has a unique

technique of calculations based on 16 Sutras (formulae). "Urdhva-tiryakbyham" is a Sanskrit word meaning "vertically and crosswise"; this formula is used for smaller-number multiplication. "Nikhilam Navatascaramam Dasatah" is also a Sanskrit term, indicating "all from 9 and the last from 10"; this formula is used for large-number multiplication and subtraction. All these formulas are

adopted from ancient Indian Vedic Mathematics. In this work we formulate this mathematics for

designing the complex multiplier architecture in transistor level with two clear goals

in mind such as:

i) Simplicity and modularity of multiplications for VLSI implementations, and

ii) The elimination of carry propagation for rapid additions and subtractions.


Mehta et al. [9] have proposed a multiplier design using the "Urdhva-tiryakbyham" Sutra, which was adopted from the Vedas. The formulation using this Sutra is similar to modern array multiplication, and it also exhibits the same carry propagation issues. A multiplier design using the "Nikhilam Navatascaramam Dasatah" Sutra was reported by Tiwari et al. [10] in 2009, but the hardware module for multiplication was not implemented.

Multiplier implementation in the gate level (FPGA) using Vedic Mathematics has already

been reported, but to the best of our knowledge there is to date no report on a transistor-level (ASIC) implementation of such a complex multiplier. By employing Vedic mathematics, an N-bit complex number multiplication is transformed into four multiplications for the real and imaginary terms of the final product. The "Nikhilam Navatascaramam Dasatah" Sutra is used for the multiplication, generating fewer partial products in comparison with array-based multiplication. When compared with existing methods such as the direct method or the strength-reduction technique, this approach results not only in simplified arithmetic operations, but also in a regular, array-like structure.
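For reference, the four real multiplications mentioned above correspond to the direct form (a + jb)(c + jd) = (ac - bd) + j(ad + bc). A minimal Verilog sketch of this structure is given below (module and signal names are assumed for illustration; the Nikhilam-based multiplier core of the reported design is abstracted behind the plain '*' operator):

module cmplx_mult #(parameter W = 16) (
    input  signed [W-1:0] ar, ai,   // real and imaginary parts of operand A
    input  signed [W-1:0] br, bi,   // real and imaginary parts of operand B
    output signed [2*W:0] pr, pi    // one extra bit guards against overflow
);
    // four real multiplications and two additions/subtractions (direct method)
    assign pr = ar * br - ai * bi;  // real part:      ac - bd
    assign pi = ar * bi + ai * br;  // imaginary part: ad + bc
endmodule

The strength-reduction technique mentioned above would instead use three real multiplications at the cost of extra additions.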

The multiplier is fully parameterized, so any configuration of input and output word-lengths can be elaborated. Performance parameters of the transistor-level implementation, such as propagation delay, dynamic leakage power, and dynamic switching power consumption, were calculated with Spectre (SPICE) using 90 nm standard CMOS technology and compared with other designs such as the distributed-arithmetic [3], parallel-adder-based [1], and algebraic-transformation-based [2] implementations. The calculated results revealed that the (16,16)x(16,16) complex multiplier has a propagation delay of only 4 ns with 6.5 mW dynamic switching power.


3. CHAPTER

3.1 PROPOSED MODEL

The multiplier is based on the Urdhava Tiryakbhyam (Vertical & Crosswise) algorithm of

ancient Indian Vedic Mathematics. Urdhava Tiryakbhyam Sutra is a general multiplication

formula applicable to all cases of multiplication. It literally means “Vertically and crosswise”. It

is based on a novel concept through which the generation of all partial products can be done

with the concurrent addition of these partial products. The parallelism in generation of partial

products and their summation is obtained using Urdhava Tiryakbhyam explained in Figure 1.

Figure 1: Multiplication of two decimal numbers by Urdhava Tiryakbhyam

The algorithm can be generalized for n x n bit numbers. Since the partial products and

their sums are calculated in parallel, the multiplier is independent of the clock frequency of the

processor. Thus the multiplier will require the same amount of time to calculate the product and


hence is independent of the clock frequency. The net advantage is that it reduces the need for

microprocessors to operate at increasingly high clock frequencies. While a higher clock

frequency generally results in increased processing power, its disadvantage is that it also

increases power dissipation which results in higher device operating temperatures. By adopting

the Vedic multiplier, microprocessor designers can easily circumvent these problems and avoid

catastrophic device failures.

The processing power of the multiplier can easily be increased by increasing the input and output data bus widths since it has quite a regular structure. Due to its regular structure, it can easily be laid out on a silicon chip. The multiplier has the advantage that as the number of bits increases, gate delay and area increase very slowly as compared to other multipliers. Therefore it is time, space, and power efficient.

To illustrate this multiplication scheme, let us consider the multiplication of two decimal

numbers (325 * 738). Line diagram for the multiplication is shown in Figure 2. The digits on the

both sides of the line are multiplied and added with the carry from the previous step. This

generates one of the bits of the result and a carry. This carry is added in the next step and hence

the process goes on. If more than one line is present in one step, all the results are added to the

previous carry. In each step, least significant bit acts as the result bit and all other bits act as

carry for the next step. Initially the carry is taken to be zero. To make the methodology more

clear, an alternate illustration is given with the help of line diagrams in Figure 3 where the dots

represent bit 0 or 1.
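A step-by-step arithmetic walk-through of this example is added here for clarity (digits are taken from the least significant position first):

    Step 1 (vertical):  5x8                 = 40  -> digit 0, carry 4
    Step 2 (crosswise): 2x8 + 5x3 + 4       = 35  -> digit 5, carry 3
    Step 3 (crosswise): 3x8 + 2x3 + 5x7 + 3 = 68  -> digit 8, carry 6
    Step 4 (crosswise): 3x3 + 2x7 + 6       = 29  -> digit 9, carry 2
    Step 5 (vertical):  3x7 + 2             = 23  -> digits 23

Reading the digits from the last step back to the first gives 239850, which is indeed 325 x 738.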

3.2 Algorithm for 8 X 8 Bit Multiplication

To illustrate the multiplication algorithm, let us consider the multiplication of two binary

numbers a3a2a1a0 and b3b2b1b0. As the result of this multiplication would be more than 4 bits,

we express it as ... r3r2r1r0. The line diagram for multiplication of two 4-bit numbers is shown in Figure 2, which is nothing but the mapping of Figure 1 onto the binary system. For simplicity,

each bit is represented by a circle. Least significant bit r0 is obtained by multiplying the least

significant bits of the multiplicand and the multiplier. The process is followed according to the

steps shown in Figure 2.


Figure 2: Line diagram for multiplication of two 4-bit numbers.

Firstly, the least significant bits are multiplied, which gives the least significant bit of the

product (vertical). Then, the LSB of the multiplicand is multiplied with the next higher bit of the

multiplier and added with the product of LSB of multiplier and next higher bit of the


multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to the output of the next-stage sum, obtained by the crosswise and vertical multiplication and addition of

three bits of the two numbers from least significant position.

Next, all the four bits are processed with crosswise multiplication and addition to give the

sum and carry. The sum is the corresponding bit of the product and the carry is again added to

the next stage multiplication and addition of three bits except the LSB. The same operation

continues until the multiplication of the two MSBs to give the MSB of the product. For example,

if in some intermediate step we get 110, then 0 will act as the result bit (referred to as rn) and 11 as the carry (referred to as cn). It should be clearly noted that cn may be a multi-bit number.

Thus we get the following expressions:

r0 = a0b0 (1)
c1r1 = a1b0 + a0b1 (2)
c2r2 = c1 + a2b0 + a1b1 + a0b2 (3)
c3r3 = c2 + a3b0 + a2b1 + a1b2 + a0b3 (4)
c4r4 = c3 + a3b1 + a2b2 + a1b3 (5)
c5r5 = c4 + a3b2 + a2b3 (6)
c6r6 = c5 + a3b3 (7)

With c6r6r5r4r3r2r1r0 being the final product. Hence this is the general mathematical

formula applicable to all cases of multiplication.

The hardware realization of a 4-bit multiplier is shown in Figure 3. This hardware design

is very similar to that of the famous array multiplier where an array of adders is required to arrive

at the final product. All the partial products are calculated in parallel and the delay associated is

mainly the time taken by the carry to propagate through the adders which form the multiplication

array.


Figure 3: Hardware architecture of the Urdhava tiryakbhyam multiplier.
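A minimal combinational sketch of equations (1)-(7) is given below in Verilog (the report's implementation is in VHDL; the module and signal names here are illustrative only). Each column sum adds its crosswise partial products to the carry from the previous column, exactly as in the expressions above:

module urdhva4x4 (
    input  [3:0] a,
    input  [3:0] b,
    output [7:0] p
);
    // column sums: result bit in bit 0, carry to the next column in bits [3:1]
    wire [3:0] s0 = a[0]&b[0];                                                        // eq (1)
    wire [3:0] s1 = (a[1]&b[0]) + (a[0]&b[1]);                                        // eq (2)
    wire [3:0] s2 = s1[3:1] + (a[2]&b[0]) + (a[1]&b[1]) + (a[0]&b[2]);                // eq (3)
    wire [3:0] s3 = s2[3:1] + (a[3]&b[0]) + (a[2]&b[1]) + (a[1]&b[2]) + (a[0]&b[3]);  // eq (4)
    wire [3:0] s4 = s3[3:1] + (a[3]&b[1]) + (a[2]&b[2]) + (a[1]&b[3]);                // eq (5)
    wire [3:0] s5 = s4[3:1] + (a[3]&b[2]) + (a[2]&b[3]);                              // eq (6)
    wire [3:0] s6 = s5[3:1] + (a[3]&b[3]);                                            // eq (7)
    assign p = {s6[1:0], s5[0], s4[0], s3[0], s2[0], s1[0], s0[0]};                   // {c6, r6, r5, ..., r0}
endmodule

In hardware, each '+' above corresponds to a small adder in the array of Figure 3, and the carry slices such as s3[3:1] are the multi-bit cn terms of the equations.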

3.3 ARCHITECTURE

Our design of a Q-format signed multiplier includes the Urdhava Tiryakbhyam integer

multiplier [4] with certain modifications as follows. This multiplier is faster since all the partial

products are computed concurrently. Considering a 16 bit Q15 multiplier, the product is also a

Q15 number which is 16 bits long. Firstly, if the MSB of an input is 1, then it is a negative number. Therefore the 2's complement of the number is taken before proceeding with multiplication. Since the MSB denotes the sign, it is excluded and a '0' is placed in this position while multiplying. A Q15

format multiplier consists of four 8 x 8 Urdhava multipliers and the resulting product is 32 bits

long, as shown in Fig. 4. But the product of two Q15 numbers is also a Q15 number, which should be 16 bits long.

Therefore the 32-bit product is left shifted by 1 to drop the redundant sign bit, and only the most significant 16 bits of the product are considered, which constitute the Q15 result. An XOR operation is performed on the input sign bits to obtain the sign of the result. If the output sign is 1, a 2's complement conversion of the 16-bit final result gives its Q15 format, indicating a negative product.
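The sign handling described above can be summarized by the following hedged Verilog sketch (the reported design is in VHDL and builds the unsigned core from four 8 x 8 Urdhava multipliers; here that core is abstracted behind a plain '*', and saturation of the -1.0 x -1.0 corner case is not handled):

module q15_mult (
    input  signed [15:0] a,
    input  signed [15:0] b,
    output        [15:0] p
);
    wire sa = a[15];                                        // input signs
    wire sb = b[15];
    // magnitudes: 2's complement of negative inputs, sign position forced to 0
    wire [15:0] ma = sa ? (~a + 16'd1) : a;
    wire [15:0] mb = sb ? (~b + 16'd1) : b;
    wire [31:0] up = {1'b0, ma[14:0]} * {1'b0, mb[14:0]};   // unsigned 16 x 16 core
    wire [31:0] sh = up << 1;                               // drop the redundant sign bit
    wire [15:0] mag = sh[31:16];                            // keep the upper 16 bits (Q15)
    wire        sp  = sa ^ sb;                              // XOR of input signs = result sign
    assign p = sp ? (~mag + 16'd1) : mag;                   // 2's complement if negative
endmodule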


Fig. 4. Architecture of a Q15 format multiplier.


Fig. 5. Architecture of a Q31 format multiplier.


4. CHAPTER

4.1 IMPLEMENTATION

The Vedic multiplier is implemented using VHDL; other multipliers, such as the Booth multiplier and the array multiplier, are also implemented. Functional verification through

simulation of the VHDL code was carried out using ModelSim SE 6.0 simulator. The entire code

is completely synthesizable. The synthesis is done using Xilinx Synthesis Tool (XST) available

with Xilinx ISE 9.1i.

The design is optimized for speed and area using the Xilinx Spartan3 device family, package

pq208, and speed grade -4. Table 1 and Table 2 indicate the device utilization summary of the

array, Booth, and Vedic multipliers for 8-bit and 16-bit operands, respectively. Table 3 gives the timing report of the implementation in terms of maximum combinational path delay. While Figure 4 indicates the RTL schematic of the 8-bit Vedic multiplier, that of the 8-bit normal multiplier is illustrated in Figure 5.
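As an illustration of the kind of functional verification described above, the following is a small self-checking testbench sketch in Verilog (the report's code is in VHDL and was simulated in ModelSim; the module name urdhva4x4 refers to the earlier sketch and is assumed here):

module tb_urdhva4x4;
    reg  [3:0] a, b;
    wire [7:0] p;
    integer i, j, errors;

    urdhva4x4 dut (.a(a), .b(b), .p(p));

    initial begin
        errors = 0;
        // exhaustive check of all 256 input pairs against the '*' reference
        for (i = 0; i < 16; i = i + 1)
            for (j = 0; j < 16; j = j + 1) begin
                a = i;
                b = j;
                #10;
                if (p !== i * j) begin
                    errors = errors + 1;
                    $display("mismatch: %0d * %0d gave %0d", i, j, p);
                end
            end
        $display("simulation finished with %0d errors", errors);
        $finish;
    end
endmodule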

TABLE 1

DEVICE UTILIZATION SUMMARY


TABLE 2

DEVICE UTILIZATION SUMMARY


TABLE 3

MAXIMUM COMBINATIONAL PATH DELAY (ns)


5. CHAPTER

5.1 INTRODUCTION TO VLSI

Very-large-scale integration (VLSI) is the process of creating integrated circuits by

combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s

when complex semiconductor and communication technologies were being developed. The

microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have

increased in complexity into the hundreds of millions of transistors.

5.2 Overview

The first semiconductor chips held one transistor each. Subsequent advances added more

and more transistors, and, as a consequence, more individual functions or systems were

integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten

diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic

gates on a single device. Now known retrospectively as "small-scale integration" (SSI),

improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI), and later to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved

far past this mark and today's microprocessors have many millions of gates and hundreds of

millions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale

integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the huge

number of gates and transistors available on common devices has rendered such fine distinctions

moot.

Terms suggesting greater than VLSI levels of integration are no longer in widespread use.

Even VLSI is now somewhat quaint, given the common assumption that all microprocessors are

VLSI or better.

As of early 2008, billion-transistor processors are commercially available, an example of

which is Intel's Montecito Itanium chip. This is expected to become more commonplace as

semiconductor fabrication moves from the current generation of 65 nm processes to the next 45

nm generations (while experiencing new challenges such as increased variation across process

corners). Another notable example is NVIDIA’s 280 series GPU.


This GPU is unique in the fact that its 1.4 billion transistor count, capable of a

teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor count is largely

due to the 24MB L3 cache). Current designs, as opposed to the earliest devices, use extensive

design automation and automated logic synthesis to lay out the transistors, enabling higher levels

of complexity in the resulting logic functionality. Certain high-performance logic blocks like the

SRAM cell, however, are still designed by hand to ensure the highest efficiency (sometimes by

bending or breaking established design rules to obtain the last bit of performance by trading

stability).

5.3 What is VLSI

VLSI stands for "Very Large Scale Integration". This is the field which involves packing

more and more logic devices into smaller and smaller areas.

VLSI

1. Simply put, an integrated circuit is many transistors on one chip.

2. Design/manufacturing of extremely small, complex circuitry using modified

semiconductor material

3. An integrated circuit (IC) may contain millions of transistors, each a few µm in size

4. Applications wide ranging: most electronic logic devices

5.4 History of Scale Integration

late 40s: Transistor invented at Bell Labs
late 50s: First IC (JK-FF by Jack Kilby at TI)
early 60s: Small Scale Integration (SSI) - 10s of transistors on a chip
late 60s: Medium Scale Integration (MSI) - 100s of transistors on a chip
early 70s: Large Scale Integration (LSI) - 1000s of transistors on a chip
early 80s: VLSI - 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)
Ultra LSI is sometimes used for 1,000,000s


SSI - Small-Scale Integration (0-10²)
MSI - Medium-Scale Integration (10²-10³)
LSI - Large-Scale Integration (10³-10⁵)
VLSI - Very Large-Scale Integration (10⁵-10⁷)
ULSI - Ultra Large-Scale Integration (>= 10⁷)

5.5 Advantages of ICs over discrete components

While we will concentrate on integrated circuits, the properties of integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the architecture

of the entire system. Integrated circuits improve system characteristics in several critical ways.

ICs have three key advantages over digital circuits built from discrete components:

Size. Integrated circuits are much smaller: both transistors and wires are shrunk to

micrometer sizes, compared to the millimeter or centimeter scales of discrete

components. Small size leads to advantages in speed and power consumption, since

smaller components have smaller parasitic resistances, capacitances, and inductances.

Speed. Signals can be switched between logic 0 and logic 1 much quicker within a chip

than they can between chips. Communication within a chip can occur hundreds of times

faster than communication between chips on a printed circuit board. The high speed of

circuits on-chip is due to their small size: smaller components and wires have smaller

parasitic capacitances to slow down the signal.

Power consumption. Logic operations within a chip also take much less power. Once

again, lower power consumption is largely due to the small size of circuits on the chip:

smaller parasitic capacitances and resistances require less power to drive them.

5.6 VLSI and systems

These advantages of integrated circuits translate into advantages at the system level:

Smaller physical size. Smallness is often an advantage in itself: consider portable

televisions or handheld cellular telephones.

Lower power consumption. Replacing a handful of standard parts with a single chip

reduces total power consumption. Reducing power consumption has a ripple effect on the

rest of the system: a smaller, cheaper power supply can be used; since less power


consumption means less heat, a fan may no longer be necessary; a simpler cabinet with

less electromagnetic shielding may be feasible, too.

Reduced cost. Reducing the number of components, the power supply requirements,

cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of

integration is such that the cost of a system built from custom ICs can be less, even

though the individual ICs cost more than the standard parts they replace.

Understanding why integrated circuit technology has such profound influence on the

design of digital systems requires understanding both the technology of IC manufacturing and

the economics of ICs and digital systems.

Applications:
• Electronic systems in cars
• Digital electronics control VCRs
• Transaction processing systems, ATMs
• Personal computers and workstations
• Etc.

5.7 Applications of VLSI

Electronic systems now perform a wide variety of tasks in daily life. Electronic

systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or

by other means; electronics are usually smaller, more flexible, and easier to service. In other

cases electronic systems have created totally new applications. Electronic systems perform a

variety of tasks, some of them visible, some more hidden:

Personal entertainment systems such as portable MP3 players and DVD

players perform sophisticated algorithms with remarkably little energy.

Electronic systems in cars operate stereo systems and displays; they also

control fuel injection systems, adjust suspensions to varying terrain, and

perform the control functions required for anti-lock braking (ABS) systems.

Digital electronics compress and decompress video, even at high-definition

data rates, on-the-fly in consumer electronics.


Low-cost terminals for Web browsing still require sophisticated electronics,

despite their dedicated function.

Personal computers and workstations provide word-processing, financial

analysis, and games. Computers include both central processing units (CPUs)

and special-purpose hardware for disk access, faster screen display, etc.

Medical electronic systems measure bodily functions and perform complex

processing algorithms to warn about unusual conditions. The availability of

these complex systems, far from overwhelming consumers, only creates

demand for even more complex systems.

The growing sophistication of applications continually pushes the design and

manufacturing of integrated circuits and electronic systems to new levels of complexity. And

perhaps the most amazing characteristic of this collection of systems is its variety: as systems

become more complex, we build not a few general-purpose computers but an ever wider range of

special-purpose systems. Our ability to do so is a testament to our growing mastery of both

integrated circuit manufacturing and design, but the increasing demands of customers continue to

test the limits of design and manufacturing.

5.8 ASIC

An Application-Specific Integrated Circuit (ASIC) is an integrated circuit (IC)

customized for a particular use, rather than intended for general-purpose use. For example, a chip

designed solely to run a cell phone is an ASIC. Intermediate between ASICs and industry

standard integrated circuits, like the 7400 or the 4000 series, are application specific standard

products (ASSPs).

As feature sizes have shrunk and design tools improved over the years, the maximum

complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over

100 million. Modern ASICs often include entire 32-bit processors, memory blocks including

ROM, RAM, EEPROM, Flash and other large building blocks. Such an ASIC is often termed a

SoC (system-on-a-chip). Designers of digital ASICs use a hardware description language (HDL),

such as Verilog or VHDL, to describe the functionality of ASICs.

Field-programmable gate arrays (FPGA) are the modern-day technology for building a

breadboard or prototype from standard parts; programmable logic blocks and programmable


interconnects allow the same FPGA to be used in many different applications. For smaller

designs and/or lower production volumes, FPGAs may be more cost effective than an ASIC

design even in production.

1. A Structured ASIC falls between an FPGA and a Standard Cell-based ASIC

2. Structured ASICs are used mainly for mid-volume level designs. The design task for structured ASICs is to map the circuit into a fixed arrangement of known cells.


6. CHAPTER

6.1 INTRODUCTION TO XILINX

Migrating Projects from Previous ISE Software Releases

When you open a project file from a previous release, the ISE® software prompts you to

migrate your project. If you click Backup and Migrate or Migrate Only, the software

automatically converts your project file to the current release. If you click Cancel, the software

does not convert your project and, instead, opens Project Navigator with no project loaded.

Note: After you convert your project, you cannot open it in previous versions of the ISE

software, such as the ISE 11 software. However, you can optionally create a backup of the

original project as part of project migration, as described below.

To Migrate a Project

1. In the ISE 12 Project Navigator, select File > Open Project.

2. In the Open Project dialog box, select the .xise file to migrate.

Note You may need to change the extension in the Files of type field to display .npl

(ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10 software) project files.

3. In the dialog box that appears, select Backup and Migrate or Migrate Only.

4. The ISE software automatically converts your project to an ISE 12 project.

Note If you chose to Backup and Migrate, a backup of the original project is created at

project_name_ise12migration.zip.

5. Implement the design using the new version of the software.

Note Implementation status is not maintained after migration.

6.2 Properties

For information on properties that have changed in the ISE 12 software, see ISE 11 to

ISE 12 Properties Conversion.

6.3 IP Modules

If your design includes IP modules that were created using CORE Generator™ software

or Xilinx® Platform Studio (XPS) and you need to modify these modules, you may be required

to update the core. However, if the core netlist is present and you do not need to modify the

core, updates are not required and the existing netlist is used during implementation.


6.4 Obsolete Source File Types

The ISE 12 software supports all of the source types that were supported in the ISE 11

software.

If you are working with projects from previous releases, state diagram source files (.dia),

ABEL source files (.abl), and test bench waveform source files (.tbw) are no longer supported.

For state diagram and ABEL source files, the software finds an associated HDL file and adds it

to the project, if possible. For test bench waveform files, the software automatically converts the

TBW file to an HDL test bench and adds it to the project. To convert a TBW file after project

migration, see Converting a TBW File to an HDL Test Bench.

6.5 Using ISE Example Projects

To help familiarize you with the ISE® software and with FPGA and CPLD designs, a set

of example designs is provided with Project Navigator. The examples show different design

techniques and source types, such as VHDL, Verilog, schematic, or EDIF, and include different

constraints and IP.

To Open an Example

1. Select File > Open Example.

2. In the Open Example dialog box, select the Sample Project Name.

Note To help you choose an example project, the Project Description field describes

each project. In addition, you can scroll to the right to see additional fields, which

provide details about the project.

3. In the Destination Directory field, enter a directory name or browse to the

directory.

4. Click OK.

5. The example project is extracted to the directory you specified in the Destination

Directory field and is automatically opened in Project Navigator. You can then run

processes on the example project and save any changes.

Note If you modified an example project and want to overwrite it with the original

example project, select File > Open Example, select the Sample Project Name, and specify the


same Destination Directory you originally used. In the dialog box that appears, select Overwrite

the existing project and click OK.

6.6 Creating a Project

Project Navigator allows you to manage your FPGA and CPLD designs using an ISE®

project, which contains all the source files and settings specific to your design. First, you must

create a project and then, add source files, and set process properties. After you create a project,

you can run processes to implement, constrain, and analyze your design. Project Navigator

provides a wizard to help you create a project as follows.

Note If you prefer, you can create a project using the New Project dialog box instead of

the New Project Wizard. To use the New Project dialog box, deselect the Use New Project

wizard option in the ISE General page of the Preferences dialog box.

To Create a Project

1. Select File > New Project to launch the New Project Wizard.

2. In the Create New Project page, set the name, location, and project type, and

click Next.

3. For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page,

select the input and constraint file for the project, and click Next.

4. In the Project Settings page, set the device and project properties, and click

Next.

5. In the Project Summary page, review the information, and click Finish to

create the project

Project Navigator creates the project file (project_name.xise) in the directory you

specified. After you add source files to the project, the files appear in the Hierarchy pane of the Design panel.

6.7 Design panel

Project Navigator manages your project based on the design properties (top-level module

type, device type, synthesis tool, and language) you selected when you created the project. It

organizes all the parts of your design and keeps track of the processes necessary to move the

design from design entry through implementation to programming the targeted Xilinx® device.

Note For information on changing design properties, see Changing Design Properties.


You can now perform any of the following:

• Create new source files for your project.
• Add existing source files to your project.
• Run processes on your source files.
• Modify process properties.

6.8 Creating a Copy of a Project

You can create a copy of a project to experiment with different source options and

implementations. Depending on your needs, the design source files for the copied project and

their location can vary as follows:

• Design source files are left in their existing location, and the copied project points to these files.
• Design source files, including generated files, are copied and placed in a specified directory.
• Design source files, excluding generated files, are copied and placed in a specified directory.

Copied projects are the same as other projects in both form and function. For example, you can

do the following with copied projects:

• Open the copied project using the File > Open Project menu command.
• View, modify, and implement the copied project.
• Use the Project Browser to view key summary data for the copied project and then open the copied project for further analysis and implementation, as described in Using the Project Browser.

6.9 Using the Project Browser

Alternatively, you can create an archive of your project, which puts all of the project

contents into a ZIP file. Archived projects must be unzipped before being opened in Project

Navigator. For information on archiving, see Creating a Project Archive.

To Create a Copy of a Project

1. Select File > Copy Project.


2. In the Copy Project dialog box, enter the Name for the

copy.

Note The name for the copy can be the same as the name for the project, as long as you

specify a different location.

3. Enter a directory Location to store the copied project.

4. Optionally, enter a Working directory.

By default, this is blank, and the working directory is the same as the project directory.

However, you can specify a working directory if you want to keep your ISE® project

file (.xise extension) separate from your working area.

5. Optionally, enter a Description for the copy.

The description can be useful in identifying key traits of the project for reference later.

6. In the Source options area, do the following:

Select one of the following options:

Keep sources in their current locations - to leave the design source files in their

existing location.

If you select this option, the copied project points to the files in their existing location. If

you edit the files in the copied project, the changes also appear in the original project, because

the source files are shared between the two projects.

Copy sources to the new location - to make a copy of all the design source files and

place them in the specified Location directory.

If you select this option, the copied project points to the files in the specified directory. If

you edit the files in the copied project, the changes do not appear in the original project, because

the source files are not shared between the two projects.

Optionally, select Copy files from Macro Search Path directories to copy files from

the directories you specify in the Macro Search Path property in the Translate Properties dialog

box. All files from the specified directories are copied, not just the files used by the design.

Note: If you added a net list source file directly to the project as described in Working

with Net list-Based IP, the file is automatically copied as part of Copy Project because it is a

project source file. Adding net list source files to the project is the preferred method for


incorporating net list modules into your design, because the files are managed automatically by

Project Navigator.

Optionally, click Copy Additional Files to copy files that were not included in the

original project. In the Copy Additional Files dialog box, use the Add Files and Remove Files

buttons to update the list of additional files to copy. Additional files are copied to the copied

project location after all other files are copied. To exclude generated files from the copy, such as

implementation results and reports, select

6.10 Exclude generated files from the copy

When you select this option, the copied project opens in a state in which processes have

not yet been run.

7. To automatically open the copy after creating it, select

Open the copied project.

Note By default, this option is disabled. If you leave this option disabled, the original

project remains open after the copy is made.

Click OK.

6.11 Creating a Project Archive

A project archive is a single, compressed ZIP file with a .zip extension. By default, it

contains all project files, source files, and generated files, including the following:

• User-added sources and associated files
• Remote sources
• Verilog `include files
• Files in the macro search path
• Generated files
• Non-project files

6.12 To Archive a Project

1. Select Project > Archive.

2. In the Project Archive dialog box, specify a file name and directory for the ZIP

file.


3. Optionally, select Exclude generated files from the archive to exclude

generated files and non-project files from the archive.

4. Click OK.

A ZIP file is created in the specified directory. To open the archived project, you must

first unzip the ZIP file, and then, you can open the project.

Note Sources that reside outside of the project directory are copied into a remote_sources

subdirectory in the project archive. When the archive is unzipped and opened, you must either

specify the location of these files in the remote_sources subdirectory for the unzipped project, or

manually copy the sources into their original location.


7. CHAPTER

7.1 INTRODUCTION TO VERILOG

In the semiconductor and electronic design industry, Verilog is a hardware description

language (HDL) used to model electronic systems. Verilog HDL, not to be confused

with VHDL (a competing language), is most commonly used in the design, verification, and

implementation of digital logic chips at the register-transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits.

7.2 Overview

Hardware description languages such as Verilog differ from software programming

languages because they include ways of describing the propagation of time and signal

dependencies (sensitivity). There are two assignment operators, a blocking assignment (=), and a

non-blocking (<=) assignment. The non-blocking assignment allows designers to describe a

state-machine update without needing to declare and use temporary storage variables (in any

general programming language we need to define some temporary storage spaces for the

operands to be operated on subsequently; those are temporary storage variables). Since these

concepts are part of Verilog's language semantics, designers could quickly write descriptions of

large circuits in a relatively compact and concise form. At the time of Verilog's introduction

(1984), Verilog represented a tremendous productivity improvement for circuit designers who

were already using graphical schematic capture software and specially written software programs

to document and simulate electronic circuits.
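A small hedged example of the non-blocking assignment described above (not taken from the report): both registers are updated from the values they held before the clock edge, so no temporary variable is needed to swap them.

module swap_regs (
    input            clk,
    input            load,
    input      [7:0] d,
    output reg [7:0] x,
    output reg [7:0] y
);
    always @(posedge clk) begin
        if (load) begin
            x <= d;        // load an initial value
            y <= 8'd0;
        end else begin
            x <= y;        // non-blocking: reads the old value of y
            y <= x;        // non-blocking: reads the old value of x
        end
    end
endmodule

With blocking assignments (=) the same two statements would instead copy a single value into both registers.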

The designers of Verilog wanted a language with syntax similar to the C programming

language, which was already widely used in engineering software development. Verilog is case-

sensitive, has a basic preprocessor (though less sophisticated than that of ANSI C/C++), and

equivalent control flow keywords (if/else, for, while, case, etc.), and compatible operator

precedence. Syntactic differences include variable declaration (Verilog requires bit-widths on


net/reg types), demarcation of procedural blocks (begin/end instead of curly braces

{}), and many other minor differences.

A Verilog design consists of a hierarchy of modules. Modules encapsulate design

hierarchy, and communicate with other modules through a set of declared input, output, and

bidirectional ports. Internally, a module can contain any combination of the following:

net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement blocks,

and instances of other modules (sub-hierarchies). Sequential statements are placed inside a

begin/end block and executed in sequential order within the block. But the blocks themselves are

executed concurrently, qualifying Verilog as a dataflow language.

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,

undefined") and strengths (strong, weak, etc.). This system allows abstract modeling of shared

signal lines, where multiple sources drive a common net. When a wire has multiple drivers, the

wire's (readable) value is resolved by a function of the source drivers and their strengths.

A subset of statements in the Verilog language is synthesizable. Verilog modules that

conform to a synthesizable coding style, known as RTL (register-transfer level), can be

physically realized by synthesis software. Synthesis software algorithmically transforms the

(abstract) Verilog source into a net list, a logically equivalent description consisting only of

elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a

specific FPGA or VLSI technology. Further manipulations to the net list ultimately lead to a

circuit fabrication blueprint (such as a photo mask set for an ASIC or a bit stream file for

an FPGA).

7.3 History

Beginning

Verilog was the first modern hardware description language to be invented. It was created

by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design Systems (later renamed to Gateway Design Automation in

1985) as a hardware modeling language. Gateway Design Automation was purchased

by Cadence Design Systems in 1990. Cadence now has full proprietary rights to Gateway's

Verilog and the Verilog-XL, the HDL-simulator that would become the de-facto standard (of


Verilog logic simulators) for the next decade. Originally, Verilog was intended to describe and

allow simulation; only afterwards was support for synthesis added.

7.4 Verilog-95

With the increasing success of VHDL at the time, Cadence decided to make the language

available for open standardization. Cadence transferred Verilog into the public domain under

the Open Verilog International  (OVI) (now known as Accellera) organization. Verilog was later

submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as Verilog-95.

In the same time frame Cadence initiated the creation of Verilog-A to put standards

support behind its analog simulator Spectre. Verilog-A was never intended to be a standalone

language and is a subset of Verilog-AMS which encompassed Verilog-95.

7.5 Verilog 2001

Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that

users had found in the original Verilog standard. These extensions became IEEE Standard 1364-

2001 known as Verilog-2001.

Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for

(2's complement) signed nets and variables. Previously, code authors had to perform signed

operations using awkward bit-level manipulations (for example, the carry-out bit of a simple 8-

bit addition required an explicit description of the Boolean algebra to determine its correct

value). The same function under Verilog-2001 can be more succinctly described by one of the

built-in operators: +, -, *, /, >>>. A generate/endgenerate construct (similar to VHDL's generate statement) allows Verilog-2001 to control instance and statement instantiation

through normal decision operators (case/if/else). Using generate/endgenerate, Verilog-2001 can

instantiate an array of instances, with control over the connectivity of the individual instances.

File I/O has been improved by several new system tasks. And finally, a few syntax additions

were introduced to improve code readability (e.g. always @*, named parameter override, C-style

function/task/module header declaration).
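
As a brief sketch of two of these Verilog-2001 additions (the module and signal names below are invented for illustration), the fragment declares signed ports and uses a generate loop to instantiate an array of flip-flops:

// Illustrative Verilog-2001 sketch: signed arithmetic and a generate loop
module v2001_sketch #(parameter N = 4) (
  input  signed [7:0] a, b,
  output signed [8:0] sum,
  input               clk,
  input  [N-1:0]      d,
  output [N-1:0]      q
);
  // Signed 2's-complement addition using the built-in operator
  assign sum = a + b;

  // Instantiate an array of flip-flops under generate control
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : dff_array
      reg q_r;
      always @(posedge clk)
        q_r <= d[i];
      assign q[i] = q_r;
    end
  endgenerate
endmodule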

Verilog-2001 is the dominant flavor of Verilog supported by the majority of

commercial EDA software packages.


7.6 Verilog 2005

Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005)

consists of minor corrections, spec clarifications, and a few new language features (such as the

uwire keyword).

A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and mixed

signal modeling with traditional Verilog.

7.7 SystemVerilog

SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to

aid design verification and design modeling. As of 2009, the SystemVerilog and Verilog

language standards were merged into SystemVerilog 2009 (IEEE Standard 1800-2009).

The advent of hardware verification languages such as OpenVera and Verisity's e

language encouraged the development of Superlog by Co-Design Automation Inc. Co-Design

Automation Inc was later purchased by Synopsys. The foundations of Superlog and Vera were

donated to Accellera, which later became the IEEE standard P1800-2005: SystemVerilog.

In the late 1990s, the Verilog Hardware Description Language (HDL) became the most

widely used language for describing hardware for simulation and synthesis. However, the first

two versions standardized by the IEEE (1364-1995 and 1364-2001) had only simple constructs

for creating tests. As design sizes outgrew the verification capabilities of the language,

commercial Hardware Verification Languages (HVL) such as Open Vera and e were created.

Companies that did not want to pay for these tools instead spent hundreds of man-years creating

their own custom tools. This productivity crisis (along with a similar one on the design side) led

to the creation of Accellera, a consortium of EDA companies and users who wanted to create the

next generation of Verilog. The donation of the Open-Vera language formed the basis for the

HVL features of SystemVerilog. Accellera's goal was met in November 2005 with the adoption

of the IEEE standard P1800-2005 for SystemVerilog, IEEE (2005).

The most valuable benefit of SystemVerilog is that it allows the user to construct reliable,

repeatable verification environments, in a consistent syntax, that can be used across multiple

projects.

Some of the typical features of an HVL that distinguish it from a Hardware Description

Language such as Verilog or VHDL are:


Constrained-random stimulus generation

Functional coverage

Higher-level structures, especially Object Oriented Programming

Multi-threading and interprocess communication

Support for HDL types such as Verilog’s 4-state values

Tight integration with event-simulator for control of the design

There are many other useful features, but these allow you to create test benches at a

higher level of abstraction than you are able to achieve with an HDL or a programming language

such as C.

SystemVerilog provides the best framework to achieve coverage-driven verification (CDV).

CDV combines automatic test generation, self-checking testbenches, and coverage metrics to

significantly reduce the time spent verifying a design. The purpose of CDV is to:

Eliminate the effort and time spent creating hundreds of tests.

Ensure thorough verification using up-front goal setting.

Receive early error notifications and deploy run-time checking and error analysis to

simplify debugging.

Examples

Ex1: A hello world program looks like this:

module main;

initial

begin

$display("Hello world!");

$finish;

end

endmodule

Ex2: A simple example of two flip-flops follows:

module toplevel(clock,reset);

input clock;

input reset;


reg flop1;

reg flop2;

always @ (posedge reset or posedge clock)

if (reset)

begin

flop1 <= 0;

flop2 <= 1;

end

else

begin

flop1 <= flop2;

flop2 <= flop1;

end

endmodule

The "<=" operator in Verilog is another aspect of its being a hardware description

language as opposed to a normal procedural language. This is known as a "non-blocking"

assignment. Its update is scheduled and does not take effect until after all the right-hand sides in the block have been evaluated at the current time step. This means that the order of the assignments is irrelevant and will produce the same result: flop1 and flop2 will swap values every clock.

The other assignment operator, "=", is referred to as a blocking assignment. When "="

assignment is used, for the purposes of logic, the target variable is updated immediately. In the

above example, had the statements used the "=" blocking operator instead of "<=", flop1 and

flop2 would not have been swapped. Instead, as in traditional programming, the compiler would

understand to simply set flop1 equal to flop2 (and subsequently ignore the redundant logic to set

flop2 equal to flop1.)
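
As an illustrative sketch (not part of the project sources), the same two flip-flops rewritten with blocking assignments show the difference:

// Illustrative variant: blocking assignments defeat the swap
module toplevel_blocking(input clock, input reset);
  reg flop1, flop2;
  always @(posedge reset or posedge clock)
    if (reset)
      begin
        flop1 = 0;
        flop2 = 1;
      end
    else
      begin
        flop1 = flop2;   // flop1 is updated immediately...
        flop2 = flop1;   // ...so flop2 re-reads the new flop1: both end up equal, no swap
      end
endmodule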

Ex3: An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);

// TITLE 'Divide-by-20 Counter with enables'


// enable CEP is a clock enable only

// enable CET is a clock enable and

// enables the TC output

// a counter using the Verilog language

parameter size = 5;

parameter length = 20;

input rst; // These inputs/outputs represent

input clk; // connections to the module.

input cet;

input cep;

output [size-1:0] count;

output tc;

reg [size-1:0] count; // Signals assigned

// within an always

// (or initial)block

// must be of type reg

wire tc; // Other signals are of type wire

// The always statement below is a parallel

// execution statement that

// executes any time the signals

// rst or clk transition from low to high

always @ (posedge clk or posedge rst)

if (rst) // This causes reset of the cntr

count <= {size{1'b0}};


else

if (cet && cep) // Enables both true

begin

if (count == length-1)

count <= {size{1'b0}};

else

count <= count + 1'b1;

end

// the value of tc is continuously assigned

// the value of the expression

assign tc = (cet && (count == length-1));

endmodule

Ex4: An example of delays:

...

reg a, b, c, d;

wire e;

...

always @(b or e)

begin

a = b & e;

b = a | b;

#5 c = b;

d = #6 c ^ e;

end

The always clause above illustrates the other method of use: it executes any time one of the entities in its sensitivity list (here b or e) changes. When one of these

changes, immediately a is assigned a new value, and due to the blocking assignment b is


assigned a new value afterward (taking into account the new value of a.) After a delay of 5 time

units, c is assigned the value of b and the value of c ^ e is tucked away in an invisible store. Then

after 6 more time units, d is assigned the value that was tucked away.

Signals that are driven from within a process (an initial or always block) must be of type reg.

Signals that are driven from outside a process must be of type wire. The keyword reg does not

necessarily imply a hardware register.
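
A minimal sketch of this rule (the module and signal names are illustrative):

// Illustrative sketch: wire driven outside a process, reg driven inside one
module reg_vs_wire(input a, b, output y_wire, output reg y_reg);
  assign y_wire = a & b;   // continuous assignment: target must be a net (wire)

  always @(a or b)
    y_reg = a | b;         // procedural assignment: target must be declared reg
endmodule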

7.8 Constants

The definition of constants in Verilog supports the addition of a width parameter. The basic

syntax is:

<Width in bits>'<base letter><number>

Examples:

12'h123 - Hexadecimal 123 (using 12 bits)

20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)

4'b1010 - Binary 1010 (using 4 bits)

6'o77 - Octal 77 (using 6 bits)
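
A small sketch showing the same literals used inside a module (the module name is illustrative):

// Illustrative sketch: the sized literals above used as constant drivers
module constant_demo(output [11:0] h, output [19:0] d,
                     output [3:0]  b, output [5:0]  o);
  assign h = 12'h123;   // hexadecimal 123 in 12 bits
  assign d = 20'd44;    // decimal 44, zero-extended to 20 bits
  assign b = 4'b1010;   // binary 1010 in 4 bits
  assign o = 6'o77;     // octal 77 in 6 bits
endmodule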

7.9 Synthesizable Constructs

There are several statements in Verilog that have no analog in real hardware, e.g.

$display. Consequently, much of the language cannot be used to describe hardware. The

examples presented here are the classic subset of the language that has a direct mapping to real

gates.

// Mux examples - Three ways to do the same thing.

// The first example uses continuous assignment

wire out;

assign out = sel ? a : b;

// the second example uses a procedure

// to accomplish the same thing.

reg out;

always @(a or b or sel)

begin


case(sel)

1'b0: out = b;

1'b1: out = a;

endcase

end

// Finally - you can use if/else in a

// procedural structure.

reg out;

always @(a or b or sel)

if (sel)

out = a;

else

out = b;

The next interesting structure is a transparent latch; it will pass the input to the output

when the gate signal is set for "pass-through", and captures the input and stores it upon transition

of the gate signal to "hold". The output will remain stable regardless of the input signal while the

gate is set to "hold". In the example below the "pass-through" level of the gate would be when

the value of the if clause is true, i.e. gate = 1. This is read "if gate is true, the din is fed to

latch_out continuously." Once the if clause is false, the last value at latch_out will remain and is

independent of the value of din.

EX6: // Transparent latch example

reg out;

always @(gate or din)

if(gate)

out = din; // Pass through state

// Note that the else isn't required here. The variable

// out will follow the value of din while gate is high.

// When gate goes low, out will remain constant.


The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it can be

modeled as:

reg q;

always @(posedge clk)

q <= d;

The significant thing to notice in the example is the use of the non-blocking assignment.

A basic rule of thumb is to use <= when there is a posedge or negedge statement within the

always clause.

A variant of the D-flop is one with an asynchronous reset; there is a convention that the

reset state will be the first if clause within the statement.

reg q;

always @(posedge clk or posedge reset)

if(reset)

q <= 0;

else

q <= d;

The next variant is including both an asynchronous reset and asynchronous set condition;

again the convention comes into play, i.e. the reset term is followed by the set term.

reg q;

always @(posedge clk or posedge reset or posedge set)

if(reset)

q <= 0;

else

if(set)

q <= 1;

else

q <= d;

Note: If this model is used to model a set/reset flip-flop, then the simulation can disagree with real hardware behaviour.

Consider the following test sequence of events. 1) reset goes high 2) clk goes high 3) set goes


high 4) clk goes high again 5) reset goes low followed by 6) set going low. Assume no setup and

hold violations.

In this example the always @ statement would first execute when the rising edge of reset

occurs, which would set q to 0. The next time the always block executes would be

the rising edge of clk which again would keep q at a value of 0. The always block then executes

when set goes high which because reset is high forces q to remain at 0. This condition may or

may not be correct depending on the actual flip flop. However, this is not the main problem with

this model. Notice that when reset goes low, that set is still high. In a real flip flop this will cause

the output to go to a 1. However, in this model it will not occur because the always block is

triggered by rising edges of set and reset - not levels. A different approach may be necessary for

set/reset flip flops.

Note that there are no "initial" blocks mentioned in this description. There is a split

between FPGA and ASIC synthesis tools on this structure. FPGA tools allow initial blocks

where reg values are established instead of using a "reset" signal. ASIC synthesis tools don't

support such a statement. The reason is that an FPGA's initial state is something that is

downloaded into the memory tables of the FPGA. An ASIC is an actual hardware

implementation.
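
A minimal sketch of the FPGA-style alternative (the names are illustrative; whether a given tool accepts it depends on the flow, as noted above):

// Illustrative sketch: initial value instead of a reset (typically FPGA-only)
module fpga_init_demo(input clk, input d, output reg q);
  initial q = 1'b0;        // power-up value loaded from the FPGA bit stream;
                           // an ASIC flow would require an explicit reset instead
  always @(posedge clk)
    q <= d;
endmodule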

7.10 Initial Vs Always

There are two separate ways of declaring a Verilog process. These are the always and

the initial keywords. The always keyword indicates a free-running process. The initial keyword

indicates a process that executes exactly once. Both constructs begin execution at simulator time 0,

and both execute until the end of the block. Once an always block has reached its end, it is

rescheduled (again). It is a common misconception to believe that an initial block will execute

before an always block. In fact, it is better to think of the initial-block as a special-case of

the always-block, one which terminates after it completes for the first time.

//Examples:

initial

begin

a = 1; // Assign a value to reg a at time 0

#1; // Wait 1 time unit


b = a; // Assign the value of reg a to reg b

end

always @(a or b) // Any time a or b CHANGE, run the process

begin

if (a)

c = b;

else

d = ~b;

end // Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a)// Run whenever reg a has a low to high change

a <= b;

These are the classic uses for these two keywords, but there are two significant additional

uses. The most common of these is an always keyword without the @(...) sensitivity list. It is

possible to use always as shown below:

always

begin // Always begins executing at time 0 and NEVER stops

clk = 0; // Set clk to 0

#1; // Wait for 1 time unit

clk = 1; // Set clk to 1

#1; // Wait 1 time unit

end // Keeps executing - so continue back at the top of the begin

The always keyword acts similar to the "C" construct while(1) {..} in the sense that it will

execute forever.

The other interesting exception is the use of the initial keyword with the addition of

the forever keyword.
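
A common use of this combination, shown here as an illustrative sketch, is a free-running clock generator in a test bench:

// Illustrative sketch: free-running test-bench clock using initial + forever
module clock_gen;
  reg clk;
  initial
    begin
      clk = 0;                 // start low at time 0
      forever #1 clk = ~clk;   // toggle every time unit, forever
    end
endmodule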

7.11 Race Condition


The order of execution isn't always guaranteed within Verilog. This can best be

illustrated by a classic example. Consider the code snippet below:

initial

a = 0;

initial

b = a;

initial

begin

#1;

$display("Value a=%b Value of b=%b",a,b);

end

What will be printed out for the values of a and b? Depending on the order of execution of the

initial blocks, it could be zero and zero, or alternately zero and x (b captures the uninitialized value of a if its initial block happens to run first). The $display statement will always execute after both assignment blocks have completed,

due to the #1 delay.
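
One way to remove this particular race, shown as a sketch rather than a prescribed fix, is to keep the dependent assignments in a single initial block so that their order is defined:

// Illustrative sketch: a single initial block gives the assignments a defined order
module race_fixed;
  reg a, b;
  initial
    begin
      a = 0;
      b = a;     // b now reliably sees the value assigned to a above
      #1;
      $display("Value a=%b Value of b=%b", a, b);
      $finish;
    end
endmodule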

7.12 Operators

Verilog provides Bitwise, Logical, Reduction, Arithmetic, Relational, and Shift operators; they are grouped here by category, not in order of precedence.
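
As an illustrative sketch (the module and signal names are invented), one representative operator from each of these categories:

// Illustrative sketch: one representative operator from each category
module operator_demo(input  [3:0] a, b,
                     output [3:0] bw, sh,
                     output [4:0] ar,
                     output       lg, rd, rl);
  assign bw = a & b;                    // bitwise AND
  assign lg = (a != 0) && (b != 0);     // logical AND of two conditions
  assign rd = ^a;                       // reduction XOR (parity of a)
  assign ar = a + b;                    // arithmetic addition
  assign rl = a < b;                    // relational comparison
  assign sh = a << 1;                   // shift left by one bit
endmodule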

7.13 System Tasks

System tasks are available to handle simple I/O and various design measurement functions. All system tasks are prefixed with $ to distinguish them from user tasks and functions. This section presents a short list of the most often used tasks; it is by no means a comprehensive list. A short test-bench sketch using several of them appears after the list.

$display - Print to screen a line followed by an automatic newline.

$write - Write to screen a line without the newline.

$swrite - Print to variable a line without the newline.

$sscanf - Read from variable a format-specified string. (*Verilog-2001)

$fopen - Open a handle to a file (read or write)

$fdisplay - Write to file a line followed by an automatic newline.

$fwrite - Write to file a line without the newline.

$fscanf - Read from file a format-specified string. (*Verilog-2001)

$fclose - Close and release an open file handle.

$readmemh - Read hex file content into a memory array.

$readmemb - Read binary file content into a memory array.

$monitor - Print out all the listed variables when any change value.

$time - Value of current simulation time.


$dumpfile - Declare the VCD (Value Change Dump) format output file name.

$dumpvars - Turn on and dump the variables.

$dumpports - Turn on and dump the variables in Extended-VCD format.

$random - Return a random value
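
The sketch below combines several of these tasks in a small test bench; the file names "waves.vcd" and "log.txt" are placeholders, not files used by the project:

// Illustrative test-bench sketch using common system tasks
module tasks_demo;
  reg [3:0] count;
  integer   fd;

  initial
    begin
      $dumpfile("waves.vcd");                  // VCD waveform output
      $dumpvars(0, tasks_demo);                // dump all signals below this module
      $monitor("t=%0t count=%b", $time, count);

      fd = $fopen("log.txt");                  // open a log file
      for (count = 0; count < 4'd5; count = count + 1)
        #1 $fdisplay(fd, "count=%0d at t=%0t", count, $time);
      $fclose(fd);

      $display("done at t=%0t", $time);
      $finish;
    end
endmodule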

8. CHAPTER

8.1 CONCLUSION

This work proposed a fast multiplier architecture for signed Q-format multiplications using the Urdhava Tiryakbhyam method of Vedic mathematics. Since Q-format representation is widely used in Digital Signal Processors, the proposed multiplier can substantially speed up the multiplication operation, which is the basic hardware block. The proposed multipliers occupy less area and are faster than Booth multipliers. It is also shown that the Urdhava Tiryakbhyam Q-format multiplier is faster than the Xilinx parallel multiplier Intellectual Property (IP), both when implemented in slice LUTs and when using DSP48E blocks, which are meant specifically for digital signal processing operations such as multiply-accumulate and multiply-add. Therefore the Urdhava Tiryakbhyam Q-format multiplier is well suited for signal processing applications requiring faster multiplications.

8.2 FUTURE SCOPE

Future work lies in the direction of introducing pipeline stages in the multiplier

architecture for maximizing throughput.


REFERENCES

[1] Wallace, C.S., “A suggestion for a fast multiplier,” IEEE Trans. Elec. Comput., vol. EC-13,

no. 1, pp. 14-17, Feb. 1964.

[2] Booth, A.D., “A signed binary multiplication technique,” Quarterly Journal of Mechanics and

Applied Mathematics, vol. 4, pt. 2, pp. 236- 240, 1951.

[3] Jagadguru Swami Sri Bharath, Krsna Tirathji, “Vedic Mathematics or Sixteen Simple Sutras

From The Vedas”, Motilal Banarsidas, Varanasi(India),1986.

[4] A.P. Nicholas, K.R Williams, J. Pickles, “Application of Urdhava Sutra”, Spiritual Study

Group, Roorkee (India),1984.

[5] Neil H. E. Weste, David Harris, Ayan Banerjee, "CMOS VLSI Design: A Circuits and Systems Perspective", Third Edition, Pearson Education, pp. 327-328.

[6] Mrs. M. Ramalatha, Prof. D. Sridharan, “VLSI Based High Speed Karatsuba Multiplier for

Cryptographic Applications Using Vedic Mathematics”, IJSCI, 2007

[7] M.C. Hanumantharaju, H. Jayalaxmi, R.K. Renuka, M. Ravishankar, "A High Speed Block

Convolution Using Ancient Indian Vedic Mathematics," ICCIMA, vol. 2, pp.169-173,

International Conference on Computational Intelligence and Multimedia Applications, 2007.

[8] Harpreet Singh Dhilon and Abhijit Mitra, "A Reduced-Bit Multiplication Algorithm for Digital Arithmetic", International Journal of Computational and Mathematical Sciences, WASET, Spring 2008.

[9] S. Kumaravel, Ramalatha Marimuthu, "VLSI Implementation of High Performance RSA

Algorithm Using Vedic Mathematics," ICCIMA, vol. 4,pp.126-128, International Conference on

Computational Intelligence and Multimedia Applications (ICCIMA 2007), 2007.


9. CHAPTER

9.1 SOURCE CODE

`timescale 1ns / 1ps
// Half adder cell: one-bit sum and carry of two inputs
module ha(input a, b, output sum, car);
  assign sum = a ^ b;
  assign car = a & b;
endmodule

`timescale 1ns / 1ps
// Full adder cell: one-bit sum and carry of three inputs
module fa(input a, b, c, output sum, car);
  assign sum = a ^ b ^ c;
  assign car = (a & b) | (b & c) | (c & a);
endmodule

`timescale 1ns / 1ps
// Top-level 16x16 multiplier. Overlapping 3-bit groups of {mr,1'b0} select
// multiples (0, +/-1x, +/-2x) of the zero-extended multiplicand mc, producing
// nine partial products y1..y9 that are summed by the adder array below.
module copy #(parameter n = 16) (input [n-1:0] mr, mc, output [2*n-1:0] y);
  reg [n+1:0]   m;
  reg [2*n-1:0] y1, y2, y3, y4, y5, y6, y7, y8, y9, yb;
  integer i;

  always @(*) begin
    begin
      m  = {mr, 1'b0};        // append a 0 below the LSB for the recoding
      yb = {16'b0, mc};       // zero-extended multiplicand
    end

    // Partial product y1 from bit group m[2:0]
    begin
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y1 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y1 = yb;
      else if (m[2:0]==3'b011)                   y1 = yb + yb;
      else if (m[2:0]==3'b100)                   y1 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y1 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y2 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y2 = yb;
      else if (m[2:0]==3'b011)                   y2 = yb + yb;
      else if (m[2:0]==3'b100)                   y2 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y2 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y3 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y3 = yb;
      else if (m[2:0]==3'b011)                   y3 = yb + yb;
      else if (m[2:0]==3'b100)                   y3 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y3 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y4 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y4 = yb;
      else if (m[2:0]==3'b011)                   y4 = yb + yb;
      else if (m[2:0]==3'b100)                   y4 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y4 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y5 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y5 = yb;
      else if (m[2:0]==3'b011)                   y5 = yb + yb;
      else if (m[2:0]==3'b100)                   y5 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y5 = (~yb+1'b1);
    end


    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y6 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y6 = yb;
      else if (m[2:0]==3'b011)                   y6 = yb + yb;
      else if (m[2:0]==3'b100)                   y6 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y6 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y7 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y7 = yb;
      else if (m[2:0]==3'b011)                   y7 = yb + yb;
      else if (m[2:0]==3'b100)                   y7 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y7 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y8 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y8 = yb;
      else if (m[2:0]==3'b011)                   y8 = yb + yb;
      else if (m[2:0]==3'b100)                   y8 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y8 = (~yb+1'b1);
    end

    begin
      m = m >> 2;
      if      (m[2:0]==3'b000 || m[2:0]==3'b111) y9 = 32'b0;
      else if (m[2:0]==3'b001 || m[2:0]==3'b010) y9 = yb;
      else if (m[2:0]==3'b011)                   y9 = yb + yb;
      else if (m[2:0]==3'b100)                   y9 = (~yb+1'b1) + (~yb+1'b1);
      else if (m[2:0]==3'b101 || m[2:0]==3'b110) y9 = (~yb+1'b1);
    end
  end

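
// The logic below reduces the nine partial products y1..y9 with an array of
// ha (half adder) and fa (full adder) cells, one column of adders per product
// bit; each yN enters the array two bit positions to the left of the previous
// one, and the final 32-bit product appears on y.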

assign y[1:0]=y1[1:0];

ha u1(y1[2],y2[0],y[2],c1);

fa u2(y1[3],y2[1],c1,y[3],c2);

fa u3(y1[4],y2[2],y3[0],s1,c3);ha u4(s1,c2,y[4],c4);

fa u5(y1[5],y2[3],y3[1],s2,c5);fa u6(s2,c3,c4,y[5],c6);

fa u7(y1[6],y2[4],y3[2],s3,c7);fa u8(s3,y4[0],c5,s4,c8);ha u9(s4,c6,y[6],c9);

fa u10(y1[7],y2[5],y3[3],s5,c10);fa u11(s5,y4[1],c7,s6,c11);fa u12(s6,c8,c9,y[7],c12);

fa u13(y1[8],y2[6],y3[4],s7,c13);fa u14(s7,y4[2],y5[0],s8,c14);fa u15(s8,c10,c11,s9,c15);ha u16(s9,c12,y[8],c16);

fa u17(y1[9],y2[7],y3[5],s10,c17);fa u18(s10,y4[3],y5[1],s11,c18);fa u19(s11,c13,c14,s12,c19);fa u20(s12,c15,c16,y[9],c20);

fa u21(y1[10],y2[8],y3[6],s13,c21);fa u22(s13,y4[4],y5[2],s14,c22);fa u23(s14,c17,c18,s15,c23);fa u24(s15,c19,c20,s16,c24);ha u25(s16,y6[0],y[10],c25);

fa u26(y1[11],y2[9],y3[7],s17,c26);fa u27(s17,y4[5],y5[3],s18,c27);fa u28(s18,c21,c22,s19,c28);fa u29(s19,c23,c24,s20,c29);fa u30(s20,c25,y6[1],y[11],c30);

fa u31(y1[12],y2[10],y3[8],s21,c31);fa u32(s21,y4[6],y5[4],s22,c32);fa u33(s22,c26,c27,s23,c33);fa u34(s23,c28,c29,s24,c34);fa u35(s24,c30,y6[2],s25,c35);ha u36(s25,y7[0],y[12],c36);


fa u37(y1[13],y2[11],y3[9],s26,c37);fa u38(s26,y4[7],y5[5],s27,c38);fa u39(s27,c31,c32,s28,c39);fa u40(s28,c33,c34,s29,c40);fa u41(s29,c35,y6[3],s30,c41);fa u42(s30,y7[1],c36,y[13],c42);

fa u43(y1[14],y2[12],y3[10],s31,c43);fa u44(s31,y4[8],y5[6],s32,c44);fa u45(s32,c37,c38,s33,c45);fa u46(s33,c39,c40,s34,c46);fa u47(s34,c41,y6[4],s35,c47);fa u48(s35,y7[2],c42,s36,c48);ha u49(s36,y8[0],y[14],c49);

fa u50(y1[15],y2[13],y3[11],s37,c50);fa u51(s37,y4[9],y5[7],s38,c51);fa u52(s38,c43,c44,s39,c52);fa u53(s39,c45,c46,s40,c53);fa u54(s40,c47,y6[5],s41,c54);fa u55(s41,y7[3],c48,s42,c55);fa u56(s42,y8[1],c49,y[15],c56);

fa u57(y1[16],y2[14],y3[12],s43,c57);fa u58(s43,y4[10],y5[8],s44,c58);fa u59(s44,c50,c51,s45,c59);fa u60(s45,c52,c53,s46,c60);fa u61(s46,c54,y6[6],s47,c61);fa u62(s47,y7[4],c55,s48,c62);fa u63(s48,y8[2],c56,s49,c63);ha u64(s49,y9[0],y[16],c64);

fa u65(y1[17],y2[15],y3[13],s50,c65);fa u66(s50,y4[11],y5[9],s51,c66);fa u67(s51,c57,c58,s52,c67);fa u68(s52,c59,c60,s53,c68);fa u69(s53,c61,y6[7],s54,c69);fa u70(s54,y7[5],c62,s55,c70);fa u71(s55,y8[3],c63,s56,c71);fa u72(s56,y9[1],c64,y[17],c72);

fa u73(y1[18],y2[16],y3[14],s57,c73);fa u74(s57,y4[12],y5[10],s58,c74);fa u75(s58,c65,c66,s59,c75);fa u76(s59,c67,c68,s60,c76);fa u77(s60,c69,y6[8],s61,c77);fa u78(s61,y7[6],c70,s62,c78);fa u79(s62,y8[4],c71,s63,c79);


fa u80(s63,y9[2],c72,y[18],c80);

fa u81(y1[19],y2[17],y3[15],s64,c81);fa u82(s64,y4[13],y5[11],s65,c82);fa u83(s65,c73,c74,s66,c83);fa u84(s66,c75,c76,s67,c84);fa u85(s67,c77,y6[9],s68,c85);fa u86(s68,y7[7],c78,s69,c86);fa u87(s69,y8[5],c79,s70,c87);fa u88(s70,y9[3],c80,y[19],c88);

fa u90(y1[20],y2[18],y3[16],s71,c89);fa u91(s71,y4[14],y5[12],s72,c90);fa u92(s72,c81,c82,s73,c91);fa u93(s73,c83,c84,s74,c92);fa u94(s74,c85,y6[10],s75,c93);fa u95(s75,y7[8],c86,s76,c94);fa u96(s76,y8[6],c87,s77,c95);fa u97(s77,y9[4],c88,y[20],c96);

fa u98(y1[21],y2[19],y3[17],s78,c97);fa u99(s78,y4[15],y5[13],s79,c98);fa u100(s79,c89,c90,s80,c99);fa u101(s80,c91,c92,s81,c100);fa u102(s81,c93,y6[11],s82,c101);fa u103(s82,y7[9],c94,s83,c102);fa u104(s83,y8[7],c95,s84,c103);fa u105(s84,y9[5],c96,y[21],c104);

fa u106(y1[22],y2[20],y3[18],s85,c105);fa u107(s85,y4[16],y5[14],s86,c106);fa u108(s86,c97,c98,s87,c107);fa u109(s87,c99,c100,s88,c108);fa u110(s88,c101,y6[12],s90,c109);fa u111(s90,y7[10],c102,s91,c110);fa u112(s91,y8[8],c103,s92,c111);fa u113(s92,y9[6],c104,y[22],c112);

fa u114(y1[23],y2[21],y3[19],s93,c113);fa u115(s93,y4[17],y5[15],s94,c114);fa u116(s94,c105,c106,s95,c115);fa u117(s95,c107,c108,s96,c116);fa u118(s96,c109,y6[13],s97,c117);fa u119(s97,y7[11],c110,s98,c118);fa u120(s98,y8[9],c111,s99,c119);fa u121(s99,y9[7],c112,y[23],c120);

fa u122(y1[24],y2[22],y3[20],s100,c121);fa u123(s100,y4[18],y5[16],s101,c122);


fa u124(s101,c113,c114,s102,c123);fa u125(s102,c115,c116,s103,c124);fa u126(s103,c117,y6[14],s104,c125);fa u127(s104,y7[12],c118,s105,c126);fa u128(s105,y8[10],c119,s106,c127);fa u129(s106,y9[8],c120,y[24],c128);

fa u130(y1[25],y2[23],y3[21],s107,c129);fa u131(s107,y4[19],y5[17],s108,c130);fa u132(s108,c121,c122,s109,c131);fa u133(s109,c123,c124,s110,c132);fa u134(s110,c125,y6[15],s111,c133);fa u135(s111,y7[13],c126,s112,c134);fa u136(s112,y8[11],c127,s113,c135);fa u137(s113,y9[9],c128,y[25],c136);

fa u138(y1[26],y2[24],y3[22],s114,c137);fa u139(s114,y4[20],y5[18],s115,c138);fa u140(s115,c129,c130,s116,c139);fa u141(s116,c131,c132,s117,c140);fa u142(s117,c133,y6[16],s118,c141);fa u143(s118,y7[14],c134,s119,c142);fa u144(s119,y8[12],c135,s120,c143);fa u145(s120,y9[10],c136,y[26],c144);

fa u146(y1[27],y2[25],y3[23],s121,c145);fa u147(s121,y4[21],y5[19],s122,c146);fa u148(s122,c137,c138,s123,c147);fa u149(s123,c139,c140,s124,c148);fa u150(s124,c141,y6[17],s125,c149);fa u151(s125,y7[15],c142,s126,c150);fa u152(s126,y8[13],c143,s127,c151);fa u153(s127,y9[11],c144,y[27],c152);

fa u154(y1[28],y2[26],y3[24],s128,c153);fa u155(s128,y4[22],y5[20],s129,c154);fa u156(s129,c145,c146,s130,c155);fa u157(s130,c147,c148,s131,c156);fa u158(s131,c149,y6[18],s132,c157);fa u159(s132,y7[16],c150,s133,c158);fa u160(s133,y8[14],c151,s134,c159);fa u161(s134,y9[12],c152,y[28],c160);

fa u162(y1[29],y2[27],y3[25],s135,c161);fa u163(s135,y4[23],y5[21],s136,c162);fa u164(s136,c153,c154,s137,c163);fa u165(s137,c155,c156,s138,c164);fa u166(s138,c157,y6[19],s139,c165);fa u167(s139,y7[17],c158,s140,c166);fa u168(s140,y8[15],c159,s141,c167);


fa u169(s141,y9[13],c160,y[29],c168);

fa u170(y1[30],y2[28],y3[26],s142,c169);fa u171(s142,y4[24],y5[22],s143,c170);fa u172(s143,c161,c162,s144,c171);fa u173(s144,c163,c164,s145,c172);fa u174(s145,c165,y6[20],s146,c173);fa u175(s146,y7[18],c166,s147,c174);fa u176(s147,y8[16],c167,s148,c175);fa u177(s148,y9[14],c168,y[30],c176);

fa u178(y1[31],y2[29],y3[27],s149,c177);fa u179(s149,y4[25],y5[23],s150,c180);fa u180(s150,c169,c170,s151,c181);fa u181(s151,c171,c172,s152,c182);fa u182(s152,c173,y6[21],s153,c183);fa u183(s153,y7[19],c174,s154,c184);fa u184(s154,y8[17],c175,s155,c185);fa u185(s155,y9[15],c176,y[31],c186);

endmodule
