A main project on HIGH SPEED PERFORMANCE OF MULTIPLIER
CHAPTER 1
1.1 INTRODUCTION
Multiplication is an important fundamental function in arithmetic operations.
Multiplication based operations such as Multiply and Accumulate (MAC) and inner product are
among some of the frequently used Computation Intensive Arithmetic Functions (CIAF)
currently implemented in many Digital Signal Processing (DSP) applications such as
convolution, Fast Fourier Transform(FFT), filtering and in microprocessors in its arithmetic and
logic unit [1]. Since multiplication dominates the execution time of most DSP algorithms, there
is a need for a high-speed multiplier. Currently, multiplication time is still the dominant factor
in determining the instruction cycle time of a DSP chip.
The demand for high speed processing has been increasing as a result of expanding computer
and signal processing applications. Higher throughput arithmetic operations are important to
achieve the desired performance in many real-time signal and image processing applications [2].
One of the key arithmetic operations in such applications is multiplication, and the
development of fast multiplier circuits has been a subject of interest for decades. Reducing
time delay and power consumption are essential requirements for many applications [2, 3].
This work presents different multiplier architectures. A multiplier based on Vedic mathematics
is one of the fastest and lowest-power multipliers.
Minimizing power consumption for digital systems involves optimization at all levels of
the design. This optimization includes the technology used to implement the digital circuits, the
circuit style and topology, the architecture for implementing the circuits and at the highest level
the algorithms that are being implemented. Digital multipliers are the most commonly used
components in any digital circuit design. They are fast, reliable and efficient components that are
utilized to implement any operation. Depending upon the arrangement of the components, there
are different types of multipliers available. A particular multiplier architecture is chosen based
on the application.
In many DSP algorithms, the multiplier lies in the critical delay path and ultimately
determines the performance of the algorithm. The speed of the multiplication operation is of
great importance in DSP as well as in general-purpose processors. In the past, multiplication
was generally implemented with a sequence of addition, subtraction and shift operations. There have been many
HELAPURI INSTITUTE OF TECHNOLOGY & SCIENCE, ELURU
algorithms proposed in the literature to perform multiplication, each offering different advantages
and involving tradeoffs in terms of speed, circuit complexity, area and power consumption.
The multiplier is a fairly large block of a computing system. The amount of circuitry
involved is directly proportional to the square of its resolution, i.e. a multiplier of size n bits has
on the order of n² gates. For multiplication algorithms performed in DSP applications, latency
and throughput are the two major delay concerns. Latency is the real delay of computing a
function: a measure of how long after the inputs to a device are stable the final result is
available at the outputs.
Throughput is the measure of how many multiplications can be performed in a given
period of time. The multiplier is not only a high-delay block but also a major source of power
dissipation, so if one aims to minimize power consumption it is of great interest to reduce the
delay through various delay optimizations.
Digital multipliers are the core components of all the digital signal processors (DSPs) and
the speed of the DSP is largely determined by the speed of its multipliers [9]. The two most
common multiplication algorithms followed in digital hardware are the array multiplication
algorithm and the Booth multiplication algorithm. The computation time taken by the array multiplier is
comparatively less because the partial products are calculated independently in parallel. The
delay associated with the array multiplier is the time taken by the signals to propagate through
the gates that form the multiplication array. Booth multiplication is another important
multiplication algorithm. Large Booth arrays are required for high speed multiplication and
exponential operations, which in turn require large partial sum and partial carry registers.
Multiplication of two n-bit operands using a radix-4 Booth recoding multiplier requires
approximately n / (2m) clock cycles to generate the least significant half of the final product,
where m is the number of Booth recoder adder stages. Thus, a large propagation delay is
associated with this case. Due to the importance of digital multipliers in DSP, it has always been
an active area of research and a number of interesting multiplication algorithms have been
reported in the literature [4].
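As a hedged illustration of the radix-4 Booth approach described above (not code from this report), the following Python sketch recodes a two's-complement multiplier into Booth digits from the set {-2, -1, 0, 1, 2}, roughly halving the number of partial products relative to bit-by-bit multiplication:

```python
def booth_radix4_digits(b, n):
    """Recode an n-bit two's-complement multiplier b (n even) into
    radix-4 Booth digits from {-2, -1, 0, 1, 2}, LSB group first."""
    b_ext = b << 1  # append an implicit 0 below the least significant bit
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    # scan overlapping 3-bit groups: bits (i+1, i, i-1) for i = 0, 2, 4, ...
    return [table[(b_ext >> i) & 0b111] for i in range(0, n, 2)]

def booth_multiply(a, b, n=8):
    """Multiply two signed integers via the radix-4 Booth digits of b.
    Each digit selects a partial product of 0, +/-a or +/-2a, weighted by 4**i."""
    return sum(d * a * (1 << (2 * i))
               for i, d in enumerate(booth_radix4_digits(b, n)))
```

Only n/2 partial products are generated, which is why radix-4 Booth recoding shortens the adder chain compared with a plain array of n partial products.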
In this work, the Urdhva Tiryakbhyam Sutra is first applied to the binary number system and is used
to develop a digital multiplier architecture. This is shown to be very similar to the popular array
multiplier architecture. This Sutra is also shown to be effective in reducing an N×N multiplier
structure into efficient 4×4 multiplier structures. The Nikhilam Sutra is then discussed and is
shown to be much more efficient in the multiplication of large numbers as it reduces the
multiplication of two large numbers to that of two smaller ones. The proposed multiplication
algorithm is then illustrated to show its computational efficiency by taking an example of
reducing a 4×4-bit multiplication to a single 2×2-bit multiplication operation [4]. This work
presents a systematic design methodology for a fast and area-efficient digital multiplier based on
Vedic mathematics. The multiplier architecture is based on the Vertical and Crosswise
algorithm of ancient Indian Vedic Mathematics [5].
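The Nikhilam idea can be sketched numerically (an illustrative example, not the circuit from this report): each operand near a convenient base is replaced by its deficiency from the base, and the large multiplication reduces to one small multiplication of the deficiencies plus a cross term.

```python
def nikhilam(a, b, base):
    """Multiply a and b using the Nikhilam sutra ("all from 9 and the
    last from 10"): work with deficiencies from a convenient base."""
    da, db = base - a, base - b          # deficiencies (negative for excesses)
    left = a - db                        # equivalently b - da: the "cross" part
    right = da * db                      # small multiplication of deficiencies
    return left * base + right
```

For 97 × 94 with base 100, the deficiencies are 3 and 6; the left part is 97 - 6 = 91 and the right part is 3 × 6 = 18, giving 9118.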
We have continued our development of the 12-bit by 11-bit multiplier for our
multiprocessor-ring chip. Since the multiplier's performance is the single most important factor
in determining the maximum speed at which our system can operate, it is essential that this task
be performed in the best possible manner. Our goal was that this multiplier operate in
approximately 30 ns, so that our overall chip speed could reach 30 MHz. The multiplier was
designed to accommodate eleven-bit data and twelve-bit coefficients. In addition to the
above-cited operating speed goal, we needed to make the multiplier small enough that five complete
processors, including five multipliers, can be included on a single IC (which employs 2-µm
CMOS technology). As mentioned in the previous progress report, after completing the
multiplier's layout and verifying its correct functioning via logic simulation, we performed
SPICE simulations on the various multiplier critical-path components. These simulations
indicated an operating speed well under 30 ns. We next fabricated a MOSIS TinyChip (2.22 mm
x 2.25 mm, in 2-µm technology) with this 12-bit by 11-bit multiplier and a small block of
coefficient RAM as well as a small dual-port register block. This allowed us to test the
multiplier's performance in an environment that closely simulates the five-processor-ring system.
During the present quarter the fabricated TinyChip was tested using the Tektronix LV500
logic verification system, and the results verified the multiplier's proper functioning. The overall
delay of the TinyChip was measured to be 36 ns. The critical path of the multiplier is therefore
approximately 26 ns (subtracting off the read time of approximately 10 ns). This is well below the
30 ns goal and corresponds well to the expected (simulation) result of 26.4 ns. Furthermore,
while this design was being fabricated, we laid out a redesigned vector-merge portion of the
multiplier. The simulated critical path of the new design is 22.3 ns. Since the multiplier
determines the operating speed of the overall five-processor ring chip, we now expect the
completed project to operate at approximately 40 MHz, far exceeding the proposed 30 MHz
goal; this is quite a good performance level in absolute terms, considering the small size of our
multipliers and that they are fabricated using 2-µm CMOS technology. Another
MOSIS TinyChip, which employs the multiplier with the re-designed vector-merge section, will
be sent out for fabrication this quarter. We have also begun the IC layout of the storage registers for the
program instructions used by our processors. We expect this layout to be completed during the
present quarter. Furthermore, work has been proceeding nicely on the IC layout of the ALU. This
has been approached by starting with the layout of the multiplier and gradually extending it
according to the specified ALU architecture. Another project has been started during the present
quarter. It concerns a novel signed-digit serial multiplier, which should prove useful for new
real-time FFT and adaptive-filtering applications.
CHAPTER 2
2.1 VEDIC MATHEMATICS
Vedic mathematics is part of four Vedas (books of wisdom). It is part of Sthapatya- Veda
(book on civil engineering and architecture), which is an upa-veda (supplement) of Atharva
Veda. It gives explanation of several mathematical terms including arithmetic, geometry (plane,
co-ordinate), trigonometry, quadratic equations, factorization and even calculus. His Holiness
Jagadguru Shankaracharya Bharati Krishna Teerthaji Maharaja (1884-1960) compiled all this
work together and gave its mathematical explanation while discussing it for various applications.
Swamiji constructed 16 sutras (formulae) and 16 Upa sutras (sub-formulae) after extensive
research in Atharva Veda. Obviously these formulae are not to be found in the present text of
Atharva Veda, because these formulae were constructed by Swamiji himself. Vedic mathematics
is not only a mathematical wonder but is also logical. That is why it has such a degree of
eminence which cannot be disputed. Due to these phenomenal characteristics,
Vedic maths has already crossed the boundaries of India and has become an interesting
topic of research abroad. Vedic maths deals with several basic as well as complex mathematical
operations. Especially, methods of basic arithmetic are extremely simple and powerful [2, 3].
The word “Vedic” is derived from the word “Veda” which means the store-house of all
knowledge. Vedic mathematics is mainly based on 16 Sutras (or aphorisms) dealing with various
branches of mathematics like arithmetic, algebra, geometry etc. These methods and ideas can be
directly applied to trigonometry, plane and spherical geometry, conics, calculus (both differential
and integral), and applied mathematics of various kinds. As mentioned earlier, all these Sutras
were reconstructed from ancient Vedic texts early in the last century. Many Sub-sutras were also
discovered at the same time, which are not discussed here.
The beauty of Vedic mathematics lies in the fact that it reduces otherwise
cumbersome-looking calculations in conventional mathematics to very simple ones. This is so
because the Vedic formulae are claimed to be based on the natural principles on which the
human mind works. This is a very interesting field and presents some effective algorithms which
can be applied to various branches of engineering such as computing and digital signal
processing [1, 4]. The multiplier architecture can be generally classified into three categories.
First is the serial multiplier, which emphasizes minimal hardware and chip area. Second is the
parallel multiplier (array and tree), which carries out high speed mathematical operations but has
the drawback of relatively large chip area consumption. Third is the serial-parallel multiplier,
which serves as a good trade-off between the time-consuming serial multiplier and the
area-consuming parallel multipliers.
Below is a list of those sixteen sutras, translated from Sanskrit into English; a list of
sub-sutras follows it.
• By one more than the one before
• All from 9 and the last from 10
• Vertically and crosswise
• Transpose and apply
• If the Samuccaya is the same it is zero
• If one is in ratio the other is zero
• By addition and by subtraction
• By the completion or non-completion
• Differential calculus
• By the deficiency
• Specific and general
• The remainders by the last digit
• The ultimate and twice the penultimate
• By one less than the one before
• The product of the sum
• All the multipliers
Below is a list of the Sub sutras or Corollaries:
• Proportionately.
• The remainder remains constant.
• The first by the first and the last by the last.
• For 7 the multiplicand is 143.
• By osculation.
• Lessen by the deficiency.
• Whatever the deficiency lessen by that amount and set up the square of the deficiency.
• Last Totaling 10.
• Only the last terms.
• The sum of the products.
• By alternative elimination and retention.
• By mere observation.
• The product of the sum is the sum of the products.
• On the flag.
TABLE 1
COMPARISON BETWEEN THE NORMAL METHOD OF MULTIPLICATION AND VEDIC MATHEMATICS MULTIPLICATION
At the algorithmic and structural levels, many multiplication techniques have been developed
to enhance the efficiency of the multiplier; these concern the reduction of the partial products
and/or the methods for their addition, but the principle behind multiplication remains the
same in all cases.
Vedic Mathematics is the ancient system of Indian mathematics which has a unique
technique of calculations based on 16 Sutras (formulae). "Urdhva-tiryakbyham", a Sanskrit
word meaning "vertically and crosswise", is a formula used for smaller-number multiplication.
"Nikhilam Navatascaramam Dasatah", also a Sanskrit term, indicating "all from 9 and the last from
10", is a formula used for large-number multiplication and subtraction. All these formulae are
adopted from ancient Indian Vedic Mathematics. In this work we formulate this mathematics for
designing the complex multiplier architecture at the transistor level with two clear goals
in mind:
i) simplicity and modularity of multiplications for VLSI implementations, and
ii) the elimination of carry propagation for rapid additions and subtractions.
Mehta et al. [9] proposed a multiplier design using the "Urdhva-tiryakbyham" sutra,
which was adopted from the Vedas. The formulation using this sutra is similar to modern
array multiplication and also exhibits carry propagation issues. A multiplier design
using the "Nikhilam Navatascaramam Dasatah" sutra was reported by Tiwari et al. [10] in
2009, but the hardware module for multiplication was not implemented.
Multiplier implementation at the gate level (FPGA) using Vedic Mathematics has already
been reported, but to the best of our knowledge there is to date no report on a transistor level
(ASIC) implementation of such a complex multiplier. By employing Vedic mathematics, an N-bit
complex number multiplication is transformed into four multiplications for the real and
imaginary terms of the final product. The "Nikhilam Navatascaramam Dasatah" sutra is used for the
multiplication, generating fewer partial products than array-based multiplication. When
compared with existing methods such as the direct method or the strength reduction technique,
our approach results not only in simplified arithmetic operations, but also in a regular,
array-like structure.
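The four-multiplication decomposition mentioned above can be written out as a small sketch (a generic illustration, not the transistor-level design of this work): the real and imaginary parts of the product each come from two of the four real multiplications, and any scalar multiplier implementation (for instance a Nikhilam-based one) could be plugged in as `mult`.

```python
def complex_mult(ar, ai, br, bi, mult=lambda x, y: x * y):
    """(ar + j*ai) * (br + j*bi) using four real multiplications.
    `mult` is a stand-in for any scalar multiplier implementation."""
    real = mult(ar, br) - mult(ai, bi)   # two multiplications for the real part
    imag = mult(ar, bi) + mult(ai, br)   # two multiplications for the imaginary part
    return real, imag
```

For example, (3 + 4j)(5 - 2j) yields real part 15 + 8 = 23 and imaginary part -6 + 20 = 14.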
The multiplier is fully parameterized, so any configuration of input and output
word-lengths can be elaborated. Transistor level performance parameters of the proposed method,
such as propagation delay, dynamic leakage power and dynamic switching power consumption,
were calculated with Spectre (SPICE) using 90 nm standard CMOS technology and compared
with other designs such as distributed arithmetic [3], parallel adder based implementation [1]
and algebraic transformation [2] based implementation. The calculated results revealed that the
(16,16)x(16,16) complex multiplier has a propagation delay of only 4 ns with 6.5 mW dynamic
switching power.
CHAPTER 3
3.1 PROPOSED MODEL
The multiplier is based on an algorithm Urdhava Tiryakbhyam (Vertical & Crosswise) of
ancient Indian Vedic Mathematics. Urdhava Tiryakbhyam Sutra is a general multiplication
formula applicable to all cases of multiplication. It literally means "vertically and crosswise". It
is based on a novel concept through which the generation of all partial products can be done
with the concurrent addition of these partial products. The parallelism in generation of partial
products and their summation is obtained using Urdhava Tiryakbhyam, as explained in Figure 1.
Figure 1: Multiplication of two decimal numbers by Urdhava Tiryakbhyam
The algorithm can be generalized for n x n bit numbers. Since the partial products and
their sums are calculated in parallel, the multiplier is independent of the clock frequency of the
processor. Thus the multiplier will require the same amount of time to calculate the product and
hence is independent of the clock frequency. The net advantage is that it reduces the need for
microprocessors to operate at increasingly high clock frequencies. While a higher clock
frequency generally results in increased processing power, its disadvantage is that it also
increases power dissipation, which results in higher device operating temperatures. By adopting
the Vedic multiplier, microprocessor designers can easily circumvent these problems and avoid
catastrophic device failures.
The processing power of the multiplier can easily be increased by increasing the input and
output data bus widths, since it has quite a regular structure. Due to its regular structure, it can
be easily laid out on a silicon chip. The multiplier has the advantage that as the number of bits
increases, gate delay and area increase very slowly as compared to other multipliers. Therefore it
is time, space and power efficient.
To illustrate this multiplication scheme, let us consider the multiplication of two decimal
numbers (325 * 738). Line diagram for the multiplication is shown in Figure 2. The digits on the
both sides of the line are multiplied and added with the carry from the previous step. This
generates one of the bits of the result and a carry. This carry is added in the next step and hence
the process goes on. If there is more than one line in one step, all the results are added to the
previous carry. In each step, the least significant digit acts as the result digit and all other digits
act as carry for the next step. Initially the carry is taken to be zero. To make the methodology
clearer, an alternate illustration is given with the help of line diagrams in Figure 3, where the dots
represent bit 0 or 1.
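The step-by-step carry scheme just described can be sketched in Python (an illustration of the method, not code from this report): at each step all vertical-and-crosswise digit products are summed with the previous carry; the least significant digit of that sum becomes the result digit and the rest is carried to the next step.

```python
def urdhva_decimal(x, y):
    """Multiply two non-negative integers digit-wise by the
    vertical-and-crosswise (Urdhava Tiryakbhyam) scheme."""
    xd = [int(d) for d in str(x)][::-1]   # least significant digit first
    yd = [int(d) for d in str(y)][::-1]
    n = max(len(xd), len(yd))
    xd += [0] * (n - len(xd))
    yd += [0] * (n - len(yd))
    digits, carry = [], 0
    for s in range(2 * n - 1):            # step s gathers products with i + j = s
        total = carry + sum(xd[i] * yd[s - i]
                            for i in range(n) if 0 <= s - i < n)
        digits.append(total % 10)         # least significant digit -> result digit
        carry = total // 10               # remaining digits -> next step's carry
    value = carry                         # the final carry leads the product
    for d in reversed(digits):
        value = value * 10 + d
    return value
```

For 325 × 738 the successive step totals are 40, 35, 68, 29 and 23, yielding the result digits 0, 5, 8, 9, 3 with a final carry of 2, i.e. 239850.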
3.2 Algorithm for 4 x 4 Bit Multiplication
To illustrate the multiplication algorithm, let us consider the multiplication of two binary
numbers a3a2a1a0 and b3b2b1b0. As the result of this multiplication would be more than 4 bits,
we express it as... r3r2r1r0. Line diagram for multiplication of two 4- bit numbers is shown in
Figure 2, which is nothing but the mapping of Figure 1 into the binary system. For simplicity,
each bit is represented by a circle. The least significant bit r0 is obtained by multiplying the least
significant bits of the multiplicand and the multiplier. The process is followed according to the
steps shown in Figure 2.
Figure 2: Line diagram for multiplication of two 4 – bit numbers.
Firstly, least significant bits are multiplied which gives the least significant bit of the
product (vertical). Then, the LSB of the multiplicand is multiplied with the next higher bit of the
multiplier and added with the product of LSB of multiplier and next higher bit of the
multiplicand (crosswise). The sum gives the second bit of the product, and the carry is added to
the output of the next stage sum, obtained by the crosswise and vertical multiplication and
addition of three bits of the two numbers from the least significant position.
Next, all the four bits are processed with crosswise multiplication and addition to give the
sum and carry. The sum is the corresponding bit of the product and the carry is again added to
the next stage multiplication and addition of three bits except the LSB. The same operation
continues until the multiplication of the two MSBs gives the MSB of the product. For example,
if in some intermediate step we get 110, then 0 will act as the result bit (referred to as rn) and 11
as the carry (referred to as cn). It should be clearly noted that cn may be a multi-bit number.
Thus we get the following expressions:
r0 = a0b0 (1)
c1r1 = a1b0 + a0b1 (2)
c2r2 = c1 + a2b0 + a1b1 + a0b2 (3)
c3r3 = c2 + a3b0 + a2b1 + a1b2 + a0b3 (4)
c4r4 = c3 + a3b1 + a2b2 + a1b3 (5)
c5r5 = c4 + a3b2 + a2b3 (6)
c6r6 = c5 + a3b3 (7)
With c6r6r5r4r3r2r1r0 being the final product. Hence this is the general mathematical
formula applicable to all cases of multiplication.
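The seven expressions above can be checked directly with a short Python sketch (illustrative only; in the hardware these sums are formed by adders working in parallel):

```python
def urdhva_4x4(a, b):
    """4-bit x 4-bit vertical-and-crosswise multiply following Eqs. (1)-(7):
    step k sums all partial products a_i * b_(k-i) plus the previous carry."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    r, c = [0] * 7, 0
    for k in range(7):
        t = c + sum(A[i] * B[k - i] for i in range(4) if 0 <= k - i < 4)
        r[k] = t & 1          # the least significant bit is the result bit r_k
        c = t >> 1            # the remaining bits form the (possibly multi-bit) carry c_k
    # the final product is c6 r6 r5 r4 r3 r2 r1 r0
    return (c << 7) | sum(bit << k for k, bit in enumerate(r))
```

For instance, 15 × 15 produces the step totals 1, 2, 4, 6, 6, 5, 3, giving the product 225.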
The hardware realization of a 4-bit multiplier is shown in Figure 3. This hardware design
is very similar to that of the famous array multiplier where an array of adders is required to arrive
at the final product. All the partial products are calculated in parallel and the delay associated is
mainly the time taken by the carry to propagate through the adders which form the multiplication
array.
Figure 3: Hardware architecture of the Urdhava tiryakbhyam multiplier.
3.3 ARCHITECTURE
Our design of a Q-format signed multiplier includes the Urdhava Tiryakbhyam integer
multiplier [4] with certain modifications, as follows. This multiplier is faster since all the partial
products are computed concurrently. Considering a 16 bit Q15 multiplier, the product is also a
Q15 number which is 16 bits long. Firstly, if the MSB of an input is 1 then it is a negative number.
Therefore the 2's complement of the number is taken before proceeding with the multiplication. Since
the MSB denotes the sign, it is excluded and a '0' is placed in this position while multiplying. A Q15
format multiplier consists of four 8 x 8 Urdhava multipliers, and the resulting product is 32 bits
long, as shown in Fig. 4. But the product of two Q15 numbers is also a Q15 number, which should be
16 bits long.
Therefore the 32-bit product is left shifted by 1 bit to discard the redundant sign bit, and
only the 16 most significant bits of the shifted product are considered, which constitute the
result. An XOR operation is performed on the input sign bits to determine the sign of the result. If
the output sign is 1, a 2's complement conversion of the 16-bit final result is performed,
indicating a negative product.
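The sign-handling flow described above can be sketched as follows (an illustrative behavioural model, not the VHDL of this work); Q15 integers are 16-bit two's-complement values representing numbers in [-1, 1):

```python
def q15_mult(a, b):
    """Q15 x Q15 -> Q15 following the flow above: strip signs, multiply
    magnitudes, drop the redundant sign bit, keep the top 16 bits, restore sign.
    a, b: integers in [-32768, 32767]; note -32768 * -32768 would overflow Q15."""
    sign = (a < 0) != (b < 0)        # XOR of the input signs
    ma, mb = abs(a), abs(b)          # 2's complement taken for negative inputs
    p = ma * mb                      # 30-bit product of the 15-bit magnitudes
    p = (p << 1) >> 16               # shift out redundant sign bit, take upper 16 bits
    return -p if sign else p         # 2's complement restores a negative product
```

For example, 0.5 × 0.5: q15_mult(16384, 16384) returns 8192, which is 0.25 in Q15.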
Fig. 4. Architecture of a Q15 format multiplier.
Fig. 5. Architecture of a Q31 format multiplier.
CHAPTER 4
4.1 IMPLEMENTATION
The Vedic multiplier is implemented using VHDL, and other multipliers such as the Booth
multiplier and the array multiplier are implemented as well. Functional verification through
simulation of the VHDL code was carried out using the ModelSim SE 6.0 simulator. The entire
code is completely synthesizable. Synthesis is done using the Xilinx Synthesis Tool (XST)
available with Xilinx ISE 9.1i.
The design is optimized for speed and area using Xilinx, device family Spartan3, package
pq208, and speed grade -4. Table 1 and Table 2 indicate the device utilization summary of the
array, Booth and Vedic multipliers for 8 bit and 16 bit respectively. Table 3 gives the timing
report of the implementation in terms of maximum combinational path delay. Figure 4 shows
the RTL schematic of the 8 bit Vedic multiplier, while that of the 8 bit normal multiplier is
illustrated in Figure 5.
TABLE 1
DEVICE UTILIZATION SUMMARY
TABLE 2
DEVICE UTILIZATION SUMMARY
TABLE 3
MAXIMUM COMBINATIONAL PATH DELAY (ns)
CHAPTER 5
5.1 INTRODUCTION TO VLSI
Very-large-scale integration (VLSI) is the process of creating integrated circuits by
combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s
when complex semiconductor and communication technologies were being developed. The
microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have
increased in complexity into the hundreds of millions of transistors.
5.2 Overview
The first semiconductor chips held one transistor each. Subsequent advances added more
and more transistors, and, as a consequence, more individual functions or systems were
integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten
diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic
gates on a single device. Now known retrospectively as "small-scale integration" (SSI),
improvements in technique led to devices with hundreds of logic gates, known as large-scale
integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved
far past this mark and today's microprocessors have many millions of gates and hundreds of
millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the huge
number of gates and transistors available on common devices has rendered such fine distinctions
moot.
Terms suggesting greater than VLSI levels of integration are no longer in widespread use.
Even VLSI is now somewhat quaint, given the common assumption that all microprocessors are
VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an example of
which is Intel's Montecito Itanium chip. This is expected to become more commonplace as
semiconductor fabrication moves from the current generation of 65 nm processes to the next 45
nm generations (while experiencing new challenges such as increased variation across process
corners). Another notable example is NVIDIA’s 280 series GPU.
This microprocessor is unique in that its 1.4 billion transistors, capable of a
teraflop of performance, are almost entirely dedicated to logic (Itanium's transistor count is largely
due to the 24 MB L3 cache). Current designs, as opposed to the earliest devices, use extensive
design automation and automated logic synthesis to lay out the transistors, enabling higher levels
of complexity in the resulting logic functionality. Certain high-performance logic blocks like the
SRAM cell, however, are still designed by hand to ensure the highest efficiency (sometimes by
bending or breaking established design rules to obtain the last bit of performance by trading
stability).
5.3 What is VLSI
VLSI stands for "Very Large Scale Integration". This is the field which involves packing
more and more logic devices into smaller and smaller areas.
VLSI
1. Simply put, an integrated circuit is many transistors on one chip.
2. Design/manufacturing of extremely small, complex circuitry using modified
semiconductor material.
3. An integrated circuit (IC) may contain millions of transistors, each a few µm in size.
4. Applications are wide ranging: most electronic logic devices.
5.4 History of Scale Integration
• Late 1940s: transistor invented at Bell Labs.
• Late 1950s: first IC (JK-FF by Jack Kilby at TI).
• Early 1960s: Small Scale Integration (SSI), 10s of transistors on a chip.
• Late 1960s: Medium Scale Integration (MSI), 100s of transistors on a chip.
• Early 1970s: Large Scale Integration (LSI), 1000s of transistors on a chip.
• Early 1980s: VLSI, 10,000s of transistors on a chip (later 100,000s and now 1,000,000s).
Ultra LSI is sometimes used for 1,000,000s of transistors.
SSI - Small-Scale Integration (up to 10² transistors)
MSI - Medium-Scale Integration (10² to 10³)
LSI - Large-Scale Integration (10³ to 10⁵)
VLSI - Very Large-Scale Integration (10⁵ to 10⁷)
ULSI - Ultra Large-Scale Integration (≥ 10⁷)
5.5 Advantages of ICs over discrete components
While we will concentrate on integrated circuits, the properties of integrated circuits
(what we can and cannot efficiently put in an integrated circuit) largely determine the architecture
of the entire system. Integrated circuits improve system characteristics in several critical ways.
ICs have three key advantages over digital circuits built from discrete components:

Size. Integrated circuits are much smaller: both transistors and wires are shrunk to
micrometer sizes, compared to the millimeter or centimeter scales of discrete
components. Small size leads to advantages in speed and power consumption, since
smaller components have smaller parasitic resistances, capacitances, and inductances.

Speed. Signals can be switched between logic 0 and logic 1 much more quickly within a chip
than between chips. Communication within a chip can occur hundreds of times
faster than communication between chips on a printed circuit board. The high speed of
circuits on-chip is due to their small size: smaller components and wires have smaller
parasitic capacitances to slow down the signal.

Power consumption. Logic operations within a chip also take much less power. Once
again, lower power consumption is largely due to the small size of circuits on the chip:
smaller parasitic capacitances and resistances require less power to drive them.
5.6 VLSI and systems
These advantages of integrated circuits translate into advantages at the system level:
Smaller physical size. Smallness is often an advantage in itself: consider portable
televisions or handheld cellular telephones.
Lower power consumption. Replacing a handful of standard parts with a single chip
reduces total power consumption. Reducing power consumption has a ripple effect on the
rest of the system: a smaller, cheaper power supply can be used; since less power
consumption means less heat, a fan may no longer be necessary; a simpler cabinet with
less electromagnetic shielding may be feasible, too.
Reduced cost. Reducing the number of components, the power supply requirements, cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that the cost of a system built from custom ICs can be lower, even though the individual ICs cost more than the standard parts they replace.
Understanding why integrated circuit technology has such profound influence on the
design of digital systems requires understanding both the technology of IC manufacturing and
the economics of ICs and digital systems.
Applications
Electronic systems in cars
Digital electronics controlling VCRs
Transaction processing systems, ATMs
Personal computers and workstations
Etc.
5.7 Applications of VLSI
Electronic systems now perform a wide variety of tasks in daily life. Electronic
systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or
by other means; electronics are usually smaller, more flexible, and easier to service. In other
cases electronic systems have created totally new applications. Electronic systems perform a
variety of tasks, some of them visible, some more hidden:
Personal entertainment systems such as portable MP3 players and DVD
players perform sophisticated algorithms with remarkably little energy.
Electronic systems in cars operate stereo systems and displays; they also
control fuel injection systems, adjust suspensions to varying terrain, and
perform the control functions required for anti-lock braking (ABS) systems.
Digital electronics compress and decompress video, even at high-definition
data rates, on-the-fly in consumer electronics.
Low-cost terminals for Web browsing still require sophisticated electronics,
despite their dedicated function.
Personal computers and workstations provide word-processing, financial
analysis, and games. Computers include both central processing units (CPUs)
and special-purpose hardware for disk access, faster screen display, etc.
Medical electronic systems measure bodily functions and perform complex
processing algorithms to warn about unusual conditions. The availability of
these complex systems, far from overwhelming consumers, only creates
demand for even more complex systems.
The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity. And perhaps the most amazing characteristic of this collection of systems is its variety: as systems become more complex, we build not a few general-purpose computers but an ever wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.
5.8 ASIC
An Application-Specific Integrated Circuit (ASIC) is an integrated circuit (IC)
customized for a particular use, rather than intended for general-purpose use. For example, a chip
designed solely to run a cell phone is an ASIC. Intermediate between ASICs and industry-standard integrated circuits, like the 7400 or the 4000 series, are application-specific standard products (ASSPs).
As feature sizes have shrunk and design tools improved over the years, the maximum
complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over
100 million. Modern ASICs often include entire 32-bit processors; memory blocks including ROM, RAM, EEPROM, and Flash; and other large building blocks. Such an ASIC is often termed a
SoC (system-on-a-chip). Designers of digital ASICs use a hardware description language (HDL),
such as Verilog or VHDL, to describe the functionality of ASICs.
Field-programmable gate arrays (FPGA) are the modern-day technology for building a
breadboard or prototype from standard parts; programmable logic blocks and programmable
interconnects allow the same FPGA to be used in many different applications. For smaller designs and/or lower production volumes, FPGAs may be more cost effective than an ASIC design even in production.
1. A structured ASIC falls between an FPGA and a standard-cell-based ASIC.
2. Structured ASICs are used mainly for mid-volume designs. The design task for structured ASICs is to map the circuit onto a fixed arrangement of known cells.
6. CHAPTER
6.1 INTRODUCTION TO XILINX
Migrating Projects from Previous ISE Software Releases
When you open a project file from a previous release, the ISE® software prompts you to
migrate your project. If you click Backup and Migrate or Migrate Only, the software
automatically converts your project file to the current release. If you click Cancel, the software
does not convert your project and, instead, opens Project Navigator with no project loaded.
Note: After you convert your project, you cannot open it in previous versions of the ISE
software, such as the ISE 11 software. However, you can optionally create a backup of the
original project as part of project migration, as described below.
To Migrate a Project
1. In the ISE 12 Project Navigator, select File > Open Project.
2. In the Open Project dialog box, select the .xise file to migrate.
Note You may need to change the extension in the Files of type field to display .npl
(ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10 software) project files.
3. In the dialog box that appears, select Backup and Migrate or Migrate Only.
4. The ISE software automatically converts your project to an ISE 12 project.
Note If you chose to Backup and Migrate, a backup of the original project is created at
project_name_ise12migration.zip.
5. Implement the design using the new version of the software.
Note Implementation status is not maintained after migration.
6.2 Properties
For information on properties that have changed in the ISE 12 software, see ISE 11 to
ISE 12 Properties Conversion.
6.3 IP Modules
If your design includes IP modules that were created using CORE Generator™ software
or Xilinx® Platform Studio (XPS) and you need to modify these modules, you may be required
to update the core. However, if the core netlist is present and you do not need to modify the
core, updates are not required and the existing netlist is used during implementation.
6.4 Obsolete Source File Types
The ISE 12 software supports all of the source types that were supported in the ISE 11
software.
If you are working with projects from previous releases, state diagram source files (.dia),
ABEL source files (.abl), and test bench waveform source files (.tbw) are no longer supported.
For state diagram and ABEL source files, the software finds an associated HDL file and adds it
to the project, if possible. For test bench waveform files, the software automatically converts the
TBW file to an HDL test bench and adds it to the project. To convert a TBW file after project
migration, see Converting a TBW File to an HDL Test Bench
6.5 Using ISE Example Projects
To help familiarize you with the ISE® software and with FPGA and CPLD designs, a set
of example designs is provided with Project Navigator. The examples show different design
techniques and source types, such as VHDL, Verilog, schematic, or EDIF, and include different
constraints and IP.
To Open an Example
1. Select File > Open Example.
2. In the Open Example dialog box, select the Sample Project Name.
Note To help you choose an example project, the Project Description field describes
each project. In addition, you can scroll to the right to see additional fields, which
provide details about the project.
3. In the Destination Directory field, enter a directory name or browse to the
directory.
4. Click OK.
5. The example project is extracted to the directory you specified in the Destination
Directory field and is automatically opened in Project Navigator. You can then run
processes on the example project and save any changes.
Note If you modified an example project and want to overwrite it with the original
example project, select File > Open Example, select the Sample Project Name, and specify the
same Destination Directory you originally used. In the dialog box that appears, select Overwrite
the existing project and click OK.
6.6 Creating a Project
Project Navigator allows you to manage your FPGA and CPLD designs using an ISE®
project, which contains all the source files and settings specific to your design. First, you must create a project, then add source files and set process properties. After you create a project,
you can run processes to implement, constrain, and analyze your design. Project Navigator
provides a wizard to help you create a project as follows.
Note If you prefer, you can create a project using the New Project dialog box instead of
the New Project Wizard. To use the New Project dialog box, deselect the Use New Project
wizard option in the ISE General page of the Preferences dialog box.
To Create a Project
1. Select File > New Project to launch the New Project Wizard.
2. In the Create New Project page, set the name, location, and project type, and
click Next.
3. For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page,
select the input and constraint file for the project, and click Next.
4. In the Project Settings page, set the device and project properties, and click
Next.
5. In the Project Summary page, review the information, and click Finish to
create the project
Project Navigator creates the project file (project_name.xise) in the directory you specified. After you add source files to the project, the files appear in the Hierarchy pane of the Design panel.
6.7 Design panel
Project Navigator manages your project based on the design properties (top-level module
type, device type, synthesis tool, and language) you selected when you created the project. It
organizes all the parts of your design and keeps track of the processes necessary to move the
design from design entry through implementation to programming the targeted Xilinx® device.
Note For information on changing design properties, see Changing Design Properties.
You can now perform any of the following:
Create new source files for your project.
Add existing source files to your project.
Run processes on your source files.
Modify process properties.
6.8 Creating a Copy of a Project
You can create a copy of a project to experiment with different source options and
implementations. Depending on your needs, the design source files for the copied project and
their location can vary as follows:
Design source files are left in their existing location, and the copied project
points to these files.
Design source files, including generated files, are copied and placed in a
specified directory.
Design source files, excluding generated files, are copied and placed in a
specified directory.
Copied projects are the same as other projects in both form and function. For example, you can
do the following with copied projects:
Open the copied project using the File > Open Project menu command.
View, modify, and implement the copied project.
Use the Project Browser to view key summary data for the copied project and then open the copied project for further analysis and implementation, as described in Using the Project Browser.
6.9 Using the Project Browser
Alternatively, you can create an archive of your project, which puts all of the project
contents into a ZIP file. Archived projects must be unzipped before being opened in Project
Navigator. For information on archiving, see Creating a Project Archive.
To Create a Copy of a Project
1. Select File > Copy Project.
2. In the Copy Project dialog box, enter the Name for the
copy.
Note The name for the copy can be the same as the name for the project, as long as you
specify a different location.
3. Enter a directory Location to store the copied project.
4. Optionally, enter a Working directory.
By default, this is blank, and the working directory is the same as the project directory.
However, you can specify a working directory if you want to keep your ISE® project
file (.xise extension) separate from your working area.
5. Optionally, enter a Description for the copy.
The description can be useful in identifying key traits of the project for reference later.
6. In the Source options area, do the following:
Select one of the following options:
Keep sources in their current locations - to leave the design source files in their
existing location.
If you select this option, the copied project points to the files in their existing location. If
you edit the files in the copied project, the changes also appear in the original project, because
the source files are shared between the two projects.
Copy sources to the new location - to make a copy of all the design source files and
place them in the specified Location directory.
If you select this option, the copied project points to the files in the specified directory. If
you edit the files in the copied project, the changes do not appear in the original project, because
the source files are not shared between the two projects.
Optionally, select Copy files from Macro Search Path directories to copy files from
the directories you specify in the Macro Search Path property in the Translate Properties dialog
box. All files from the specified directories are copied, not just the files used by the design.
Note: If you added a net list source file directly to the project as described in Working
with Net list-Based IP, the file is automatically copied as part of Copy Project because it is a
project source file. Adding net list source files to the project is the preferred method for
incorporating net list modules into your design, because the files are managed automatically by
Project Navigator.
Optionally, click Copy Additional Files to copy files that were not included in the
original project. In the Copy Additional Files dialog box, use the Add Files and Remove Files
buttons to update the list of additional files to copy. Additional files are copied to the copied
project location after all other files are copied. To exclude generated files from the copy, such as implementation results and reports, select Exclude generated files from the copy.
6.10 Exclude generated files from the copy
When you select this option, the copied project opens in a state in which processes have
not yet been run.
7. To automatically open the copy after creating it, select
Open the copied project.
Note By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made.
Click OK.
6.11 Creating a Project Archive
A project archive is a single, compressed ZIP file with a .zip extension. By default, it
contains all project files, source files, and generated files, including the following:
User-added sources and associated files
Remote sources
Verilog `include files
Files in the macro search path
Generated files
Non-project files
6.12 To Archive a Project
1. Select Project > Archive.
2. In the Project Archive dialog box, specify a file name and directory for the ZIP
file.
3. Optionally, select Exclude generated files from the archive to exclude
generated files and non-project files from the archive.
4. Click OK.
A ZIP file is created in the specified directory. To open the archived project, you must
first unzip the ZIP file, and then, you can open the project.
Note Sources that reside outside of the project directory are copied into a remote_sources
subdirectory in the project archive. When the archive is unzipped and opened, you must either
specify the location of these files in the remote_sources subdirectory for the unzipped project, or
manually copy the sources into their original location.
7. CHAPTER
7.1 INTRODUCTION TO VERILOG
In the semiconductor and electronic design industry, Verilog is a hardware description language (HDL) used to model electronic systems. Verilog HDL, not to be confused with VHDL (a competing language), is most commonly used in the design, verification, and implementation of digital logic chips at the register-transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits.
7.2 Overview
Hardware description languages such as Verilog differ from software programming
languages because they include ways of describing the propagation of time and signal
dependencies (sensitivity). There are two assignment operators, a blocking assignment (=), and a
non-blocking (<=) assignment. The non-blocking assignment allows designers to describe a
state-machine update without needing to declare and use temporary storage variables (in any
general programming language we need to define some temporary storage spaces for the
operands to be operated on subsequently; those are temporary storage variables). Since these
concepts are part of Verilog's language semantics, designers could quickly write descriptions of
large circuits in a relatively compact and concise form. At the time of Verilog's introduction
(1984), Verilog represented a tremendous productivity improvement for circuit designers who were already using graphical schematic capture software and specially written software programs to document and simulate electronic circuits.
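As a small sketch of the point about non-blocking assignments (module and signal names here are ours, not from the report), two registers can be updated from each other with no explicit temporary, because all right-hand sides are sampled before any left-hand side changes:

```verilog
// Hypothetical sketch: a two-stage shift register using non-blocking
// assignments. No temporary variable is needed; both right-hand sides
// are sampled at the clock edge before either register is updated.
module shift2 (input clk, input d, output reg q1, output reg q2);
  always @(posedge clk) begin
    q1 <= d;    // q1 takes the new input
    q2 <= q1;   // q2 takes q1's old value, sampled at the same edge
  end
endmodule
```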
The designers of Verilog wanted a language with syntax similar to the C programming
language, which was already widely used in engineering software development. Verilog is case-
sensitive, has a basic preprocessor (though less sophisticated than that of ANSI C/C++), and
equivalent control flow keywords (if/else, for, while, case, etc.), and compatible operator
precedence. Syntactic differences include variable declaration (Verilog requires bit-widths on net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and many other minor differences.
A Verilog design consists of a hierarchy of modules. Modules encapsulate design
hierarchy, and communicate with other modules through a set of declared input, output, and
bidirectional ports. Internally, a module can contain any combination of the following:
net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement blocks,
and instances of other modules (sub-hierarchies). Sequential statements are placed inside a
begin/end block and executed in sequential order within the block. But the blocks themselves are
executed concurrently, qualifying Verilog as a dataflow language.
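A minimal sketch of such a hierarchy (module names are illustrative): a parent module declares ports, contains concurrent statements, and instantiates a sub-module through named port connections.

```verilog
// Illustrative two-level hierarchy: a half adder used inside a parent module.
module half_adder (input a, input b, output sum, output carry);
  assign sum   = a ^ b;   // concurrent (continuous) statements
  assign carry = a & b;
endmodule

module top (input x, input y, output s, output c);
  // instance of a sub-module, connected by named ports
  half_adder u0 (.a(x), .b(y), .sum(s), .carry(c));
endmodule
```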
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and strengths (strong, weak, etc.). This system allows abstract modeling of shared
signal lines, where multiple sources drive a common net. When a wire has multiple drivers, the
wire's (readable) value is resolved by a function of the source drivers and their strengths.
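For example, a shared net driven by two tri-state sources might be sketched as below (names are ours). Each driver releases the net with the high-impedance value 'z'; if both drive at once, the simulator resolves the value from the drivers' strengths, and conflicting equal-strength drivers yield 'x'.

```verilog
// Hypothetical sketch of a shared net with two tri-state drivers.
module shared_bus (input en_a, input en_b, input da, input db, output bus);
  assign bus = en_a ? da : 1'bz;  // driver A: drives only when enabled
  assign bus = en_b ? db : 1'bz;  // driver B: otherwise high-impedance
endmodule
```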
A subset of statements in the Verilog language is synthesizable. Verilog modules that
conform to a synthesizable coding style, known as RTL (register-transfer level), can be
physically realized by synthesis software. Synthesis software algorithmically transforms the
(abstract) Verilog source into a net list, a logically equivalent description consisting only of
elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a
specific FPGA or VLSI technology. Further manipulations to the net list ultimately lead to a
circuit fabrication blueprint (such as a photo mask set for an ASIC or a bit stream file for
an FPGA).
7.3 History
Beginning
Verilog was the first modern hardware description language to be invented. It was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design Systems (renamed Gateway Design Automation in 1985) as a hardware modeling language. Gateway Design Automation was purchased by Cadence Design Systems in 1990. Cadence now has full proprietary rights to Gateway's Verilog and to Verilog-XL, the HDL simulator that would become the de facto standard (of Verilog logic simulators) for the next decade. Originally, Verilog was intended to describe and allow simulation; only afterwards was support for synthesis added.
7.4 Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language
available for open standardization. Cadence transferred Verilog into the public domain under
the Open Verilog International (OVI) (now known as Accellera) organization. Verilog was later
submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as Verilog-95.
In the same time frame, Cadence initiated the creation of Verilog-A to put standards support behind its analog simulator, Spectre. Verilog-A was never intended to be a standalone language and is a subset of Verilog-AMS, which encompasses Verilog-95.
7.5 Verilog 2001
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that
users had found in the original Verilog standard. These extensions became IEEE Standard 1364-
2001 known as Verilog-2001.
Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for
(2's complement) signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations (for example, the carry-out bit of a simple 8-
bit addition required an explicit description of the Boolean algebra to determine its correct
value). The same function under Verilog-2001 can be more succinctly described by one of the
built-in operators: +, -, /, *, >>>. A generate/endgenerate construct (similar to VHDL's
generate/endgenerate) allows Verilog-2001 to control instance and statement instantiation
through normal decision operators (case/if/else). Using generate/endgenerate, Verilog-2001 can
instantiate an array of instances, with control over the connectivity of the individual instances.
File I/O has been improved by several new system tasks. And finally, a few syntax additions
were introduced to improve code readability (e.g. always @*, named parameter override, C-style
function/task/module header declaration).
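The generate construct described above might be sketched as follows (module and label names are ours), instantiating one primitive per bit of a parameterized bus:

```verilog
// Sketch of a Verilog-2001 generate loop: an array of instances with
// per-instance connectivity, controlled by the genvar index.
module invert_bus #(parameter N = 4) (input [N-1:0] in, output [N-1:0] out);
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : inv
      not u_not (out[i], in[i]);   // one NOT primitive per generated index
    end
  endgenerate
endmodule
```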
Verilog-2001 is the dominant flavor of Verilog supported by the majority of
commercial EDA software packages.
7.6 Verilog 2005
Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005)
consists of minor corrections, spec clarifications, and a few new language features (such as the
uwire keyword).
A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and mixed-signal modeling with traditional Verilog.
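As one illustration of the uwire keyword mentioned above (a sketch; the module name is ours): a uwire net permits exactly one driver, so an accidental second driver becomes an error rather than a silently resolved conflict.

```verilog
// Sketch of the Verilog-2005 uwire keyword (single-driver net).
module uwire_demo (input a, output y);
  uwire n1;            // a second "assign n1 = ..." here would be rejected
  assign n1 = ~a;
  assign y  = ~n1;
endmodule
```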
7.7 SystemVerilog
SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to
aid design verification and design modeling. As of 2009, the SystemVerilog and Verilog
language standards were merged into SystemVerilog 2009 (IEEE Standard 1800-2009).
The advent of hardware verification languages such as OpenVera, and Verisity's e
language encouraged the development of Superlog by Co-Design Automation Inc. Co-Design
Automation Inc was later purchased by Synopsys. The foundations of Superlog and Vera were
donated to Accellera, which later became the IEEE standard P1800-2005: SystemVerilog.
In the late 1990s, the Verilog Hardware Description Language (HDL) became the most
widely used language for describing hardware for simulation and synthesis. However, the first
two versions standardized by the IEEE (1364-1995 and 1364-2001) had only simple constructs
for creating tests. As design sizes outgrew the verification capabilities of the language,
commercial Hardware Verification Languages (HVL) such as Open Vera and e were created.
Companies that did not want to pay for these tools instead spent hundreds of man-years creating
their own custom tools. This productivity crisis (along with a similar one on the design side) led
to the creation of Accellera, a consortium of EDA companies and users who wanted to create the
next generation of Verilog. The donation of the Open-Vera language formed the basis for the
HVL features of SystemVerilog. Accellera's goal was met in November 2005 with the adoption of the IEEE standard P1800-2005 for SystemVerilog, IEEE (2005).
The most valuable benefit of SystemVerilog is that it allows the user to construct reliable,
repeatable verification environments, in a consistent syntax, that can be used across multiple
projects
Some of the typical features of an HVL that distinguish it from a Hardware Description
Language such as Verilog or VHDL are
Constrained-random stimulus generation
Functional coverage
Higher-level structures, especially Object Oriented Programming
Multi-threading and interprocess communication
Support for HDL types such as Verilog’s 4-state values
Tight integration with event-simulator for control of the design
There are many other useful features, but these allow you to create test benches at a
higher level of abstraction than you are able to achieve with an HDL or a programming language
such as C.
SystemVerilog provides the best framework to achieve coverage-driven verification (CDV).
CDV combines automatic test generation, self-checking testbenches, and coverage metrics to
significantly reduce the time spent verifying a design. The purpose of CDV is to:
Eliminate the effort and time spent creating hundreds of tests.
Ensure thorough verification using up-front goal setting.
Receive early error notifications and deploy run-time checking and error analysis to
simplify debugging.
Examples
Ex1: A hello world program looks like this:
module main;
initial
begin
$display("Hello world!");
$finish;
end
endmodule
Ex2: A simple example of two flip-flops follows:
module toplevel(clock,reset);
input clock;
input reset;
reg flop1;
reg flop2;
always @ (posedge reset or posedge clock)
if (reset)
begin
flop1 <= 0;
flop2 <= 1;
end
else
begin
flop1 <= flop2;
flop2 <= flop1;
end
endmodule
The "<=" operator in Verilog is another aspect of its being a hardware description language as opposed to a normal procedural language. This is known as a "non-blocking" assignment: its action does not register until the next clock cycle. This means that the order of the assignments is irrelevant and will produce the same result: flop1 and flop2 will swap values every clock.
The other assignment operator, "=", is referred to as a blocking assignment. When "="
assignment is used, for the purposes of logic, the target variable is updated immediately. In the
above example, had the statements used the "=" blocking operator instead of "<=", flop1 and
flop2 would not have been swapped. Instead, as in traditional programming, the compiler would
understand to simply set flop1 equal to flop2 (and subsequently ignore the redundant logic to set
flop2 equal to flop1.)
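For comparison, a sketch of Ex2 rewritten with the blocking operator (module name is ours), showing why the swap is lost:

```verilog
// With blocking assignments, statements take effect immediately in order,
// so the intended swap collapses: both registers end up with flop2's value.
module toplevel_blocking (input clock, input reset);
  reg flop1, flop2;
  always @(posedge reset or posedge clock)
    if (reset) begin
      flop1 = 1'b0;
      flop2 = 1'b1;
    end else begin
      flop1 = flop2;  // flop1 is updated right away...
      flop2 = flop1;  // ...so this copies the new flop1 back: no swap
    end
endmodule
```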
Ex3: An example counter circuit follows:
module Div20x (rst, clk, cet, cep, count, tc);
// TITLE 'Divide-by-20 Counter with enables'
// enable CEP is a clock enable only
// enable CET is a clock enable and
// enables the TC output
// a counter using the Verilog language
parameter size = 5;
parameter length = 20;
input rst; // These inputs/outputs represent
input clk; // connections to the module.
input cet;
input cep;
output [size-1:0] count;
output tc;
reg [size-1:0] count; // Signals assigned
// within an always
// (or initial)block
// must be of type reg
wire tc; // Other signals are of type wire
// The always statement below is a parallel
// execution statement that
// executes any time the signals
// rst or clk transition from low to high
always @ (posedge clk or posedge rst)
if (rst) // This causes reset of the cntr
count <= {size{1'b0}};
else
if (cet && cep) // Enables both true
begin
if (count == length-1)
count <= {size{1'b0}};
else
count <= count + 1'b1;
end
// the value of tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));
endmodule
Ex4: An example of delays:
...
reg a, b, c, d;
wire e;
...
always @(b or e)
begin
a = b & e;
b = a | b;
#5 c = b;
d = #6 c ^ e;
end
The always clause above illustrates the other type of use: it executes any time any of the entities in its sensitivity list (here, b or e) change. When one of these changes, a is immediately assigned a new value and, due to the blocking assignment, b is assigned a new value afterward (taking into account the new value of a). After a delay of 5 time units, c is assigned the value of b, and the value of c ^ e is tucked away in an invisible store. Then, after 6 more time units, d is assigned the value that was tucked away.
Signals that are driven from within a process (an initial or always block) must be of type reg.
Signals that are driven from outside a process must be of type wire. The keyword reg does not
necessarily imply a hardware register.
7.8 Constants
The definition of constants in Verilog supports the addition of a width parameter. The basic
syntax is:
<Width in bits>'<base letter><number>
Examples:
12'h123 - Hexadecimal 123 (using 12 bits)
20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)
4'b1010 - Binary 1010 (using 4 bits)
6'o77 - Octal 77 (using 6 bits)
7.9 Synthesizable Constructs
There are several statements in Verilog that have no analog in real hardware, e.g.
$display. Consequently, much of the language can not be used to describe hardware. The
examples presented here are the classic subset of the language that has a direct mapping to real
gates.
// Mux examples - Three ways to do the same thing.
// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;
// the second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
begin
case(sel)
1'b0: out = b;
1'b1: out = a;
endcase
end
// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
if (sel)
out = a;
else
out = b;
The next interesting structure is a transparent latch; it will pass the input to the output
when the gate signal is set for "pass-through", and captures the input and stores it upon transition
of the gate signal to "hold". The output will remain stable regardless of the input signal while the
gate is set to "hold". In the example below the "pass-through" level of the gate would be when
the value of the if clause is true, i.e. gate = 1. This is read "if gate is true, the din is fed to
latch_out continuously." Once the if clause is false, the last value at latch_out will remain and is
independent of the value of din.
EX6: // Transparent latch example
reg latch_out;
always @(gate or din)
if(gate)
latch_out = din; // Pass-through state
// Note that the else isn't required here. The variable
// latch_out will follow the value of din while gate is high.
// When gate goes low, latch_out will remain constant.
The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it can be
modeled as:
reg q;
always @(posedge clk)
q <= d;
The significant thing to notice in the example is the use of the non-blocking assignment.
A basic rule of thumb is to use <= when there is a posedge or negedge statement within the
always clause.
A variant of the D-flop is one with an asynchronous reset; there is a convention that the
reset state will be the first if clause within the statement.
reg q;
always @(posedge clk or posedge reset)
if(reset)
q <= 0;
else
q <= d;
The next variant is including both an asynchronous reset and asynchronous set condition;
again the convention comes into play, i.e. the reset term is followed by the set term.
reg q;
always @(posedge clk or posedge reset or posedge set)
if(reset)
q <= 0;
else
if(set)
q <= 1;
else
q <= d;
Note: If this model is used to model a Set/Reset flip flop then simulation errors can result.
Consider the following test sequence of events. 1) reset goes high 2) clk goes high 3) set goes
high 4) clk goes high again 5) reset goes low followed by 6) set going low. Assume no setup and
hold violations.
In this example the always @ statement would first execute at the rising edge of reset, which would set q to 0. The next execution of the always block would be at the rising edge of clk, which again keeps q at 0. The always block then executes when set goes high, but because reset is still high, q is forced to remain at 0. This behavior may or may not be correct depending on the actual flip flop. However, this is not the main problem with this model. Notice that when reset goes low, set is still high. In a real flip flop this would cause the output to go to 1. However, in this model it will not occur, because the always block is triggered by rising edges of set and reset - not levels. A different approach may be necessary for set/reset flip flops.
Note that there are no "initial" blocks mentioned in this description. There is a split
between FPGA and ASIC synthesis tools on this structure. FPGA tools allow initial blocks
where reg values are established instead of using a "reset" signal. ASIC synthesis tools don't
support such a statement. The reason is that an FPGA's initial state is something that is
downloaded into the memory tables of the FPGA. An ASIC is an actual hardware
implementation.
7.10 Initial Vs Always
There are two separate ways of declaring a Verilog process. These are the always and
the initial keywords. The always keyword indicates a free-running process. The initial keyword indicates a process that executes exactly once. Both constructs begin execution at simulator time 0,
and both execute until the end of the block. Once an always block has reached its end, it is
rescheduled (again). It is a common misconception to believe that an initial block will execute
before an always block. In fact, it is better to think of the initial-block as a special-case of
the always-block, one which terminates after it completes for the first time.
//Examples:
initial
begin
a = 1; // Assign a value to reg a at time 0
#1; // Wait 1 time unit
b = a; // Assign the value of reg a to reg b
end
always @(a or b) // Any time a or b CHANGE, run the process
begin
if (a)
c = b;
else
d = ~b;
end // Done with this block, now return to the top (i.e. the @ event-control)
always @(posedge a)// Run whenever reg a has a low to high change
a <= b;
These are the classic uses for these two keywords, but there are two significant additional
uses. The most common of these is an always keyword without the @(...) sensitivity list. It is
possible to use always as shown below:
always
begin // Always begins executing at time 0 and NEVER stops
clk = 0; // Set clk to 0
#1; // Wait for 1 time unit
clk = 1; // Set clk to 1
#1; // Wait 1 time unit
end // Keeps executing - so continue back at the top of the begin
The always keyword acts similarly to the "C" construct while(1) {..} in the sense that it will
execute forever.
The other interesting exception is the use of the initial keyword with the addition of
the forever keyword.
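A hedged sketch of that combination (the clock generator above rewritten with forever; not taken from the project code):

```verilog
reg clk;
initial
begin
clk = 0; // Establish the starting value once at time 0
forever #1 clk = ~clk; // Then toggle every time unit, without end
end
```

This behaves like the always version above but makes the one-time initialization explicit.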
7.11 Race Condition
The order of execution isn't always guaranteed within Verilog. This can best be
illustrated by a classic example. Consider the code snippet below:
initial
a = 0;
initial
b = a;
initial
begin
#1;
$display("Value a=%b Value of b=%b",a,b);
end
What will be printed out for the values of a and b? Depending on the order of execution of the
initial blocks, it could be zero and zero, or alternately zero and some other arbitrary uninitialized
value. The $display statement will always execute after both assignment blocks have completed,
due to the #1 delay.
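One common way to remove this particular race (a sketch, not the only fix) is to place the order-dependent assignments in a single initial block, where blocking assignments execute sequentially:

```verilog
reg a, b;
initial
begin
a = 0; // Blocking assignments in one block run in order,
b = a; // so b is guaranteed to see the updated value of a
#1;
$display("Value a=%b Value of b=%b", a, b);
end
```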
7.12 Operators
Note: These operators are not shown in order of precedence.
Bitwise: ~ (NOT), & (AND), | (OR), ^ (XOR), ~^ or ^~ (XNOR)
Logical: ! (NOT), && (AND), || (OR)
Reduction: & (AND), ~& (NAND), | (OR), ~| (NOR), ^ (XOR), ~^ or ^~ (XNOR)
Arithmetic: + (add), - (subtract), * (multiply), / (divide), % (modulus)
Relational: > , < , >= , <= , == (equality), != (inequality)
Shift: << (shift left), >> (shift right)
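A short illustrative snippet using one operator from several of the categories above (signal names are made up for this example):

```verilog
wire [3:0] a, b;
wire parity = ^a; // Reduction XOR: parity of the bits of a
wire any_set = |a; // Reduction OR: 1 if any bit of a is set
wire [3:0] sum = a + b; // Arithmetic addition
wire equal = (a == b); // Relational equality
wire [3:0] dbl = a << 1; // Shift left by one bit
```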
7.13 System Tasks
System tasks are available to handle simple I/O, and various design measurement
functions. All system tasks are prefixed with $ to distinguish them from user tasks and functions.
This section presents a short list of the most often used tasks. It is by no means a comprehensive
list.
$display - Print to screen a line followed by an automatic newline.
$write - Write to screen a line without the newline.
$swrite - Print to variable a line without the newline.
$sscanf - Read from variable a format-specified string. (*Verilog-2001)
$fopen - Open a handle to a file (read or write)
$fdisplay - Write to file a line followed by an automatic newline.
$fwrite - Write to file a line without the newline.
$fscanf - Read from file a format-specified string. (*Verilog-2001)
$fclose - Close and release an open file handle.
$readmemh - Read hex file content into a memory array.
$readmemb - Read binary file content into a memory array.
$monitor - Print out all the listed variables when any change value.
$time - Value of current simulation time.
$dumpfile - Declare the VCD (Value Change Dump) format output file name.
$dumpvars - Turn on and dump the variables.
$dumpports - Turn on and dump the variables in Extended-VCD format.
$random - Return a random value.
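A minimal testbench sketch tying a few of these tasks together (file and signal names are illustrative):

```verilog
module tb;
reg [3:0] count = 0;
initial
begin
$dumpfile("tb.vcd"); // Name the VCD output file
$dumpvars(0, tb); // Dump all variables under tb
$monitor("t=%0d count=%b", $time, count); // Print on any value change
repeat (4) #1 count = count + 1;
$display("done at t=%0d", $time);
$finish; // End the simulation
end
endmodule
```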
8. CHAPTER
8.1 CONCLUSION
This project proposed a fast multiplier architecture for signed Q-format multiplications
using the Urdhava Tiryakbhyam method of Vedic mathematics. Since Q-format representation is
widely used in Digital Signal Processors, the proposed multiplier can substantially speed up the
multiplication operation, which is a basic hardware block. It occupies less area and is faster
than the Booth multiplier. It is also shown that the Urdhava Tiryakbhyam Q-format multiplier is
faster than the Xilinx parallel multiplier Intellectual Property (IP) implemented with slice LUTs and also with
DSP48E blocks, which are meant specifically for digital signal processing operations such as
multiply-accumulate and multiply-add. Therefore the Urdhava Tiryakbhyam Q-format multiplier
is well suited for signal processing applications requiring faster multiplications.
8.2 FUTURE SCOPE
Future work lies in the direction of introducing pipeline stages in the multiplier
architecture for maximizing throughput.
REFERENCES
[1] Wallace, C.S., “A suggestion for a fast multiplier,” IEEE Trans. Elec. Comput., vol. EC-13,
no. 1, pp. 14-17, Feb. 1964.
[2] Booth, A.D., “A signed binary multiplication technique,” Quarterly Journal of Mechanics and
Applied Mathematics, vol. 4, pt. 2, pp. 236- 240, 1951.
[3] Jagadguru Swami Sri Bharati Krsna Tirthaji, "Vedic Mathematics or Sixteen Simple Sutras
From The Vedas", Motilal Banarsidass, Varanasi (India), 1986.
[4] A.P. Nicholas, K.R. Williams, J. Pickles, "Application of Urdhava Sutra", Spiritual Study
Group, Roorkee (India), 1984.
[5] Neil H.E. Weste, David Harris, Ayan Banerjee, "CMOS VLSI Design: A Circuits and Systems
Perspective", Third Edition, Pearson Education, pp. 327-328.
[6] Mrs. M. Ramalatha, Prof. D. Sridharan, “VLSI Based High Speed Karatsuba Multiplier for
Cryptographic Applications Using Vedic Mathematics”, IJSCI, 2007
[7] M.C. Hanumantharaju, H. Jayalaxmi, R.K. Renuka, M. Ravishankar, "A High Speed Block
Convolution Using Ancient Indian Vedic Mathematics," ICCIMA, vol. 2, pp.169-173,
International Conference on Computational Intelligence and Multimedia Applications, 2007.
[8] Harpreet Singh Dhillon and Abhijit Mitra, "A Reduced-Bit Multiplication Algorithm for
Digital Arithmetic", International Journal of Computational and Mathematical Sciences,
WASET, Spring 2008.
[9] S. Kumaravel, Ramalatha Marimuthu, "VLSI Implementation of High Performance RSA
Algorithm Using Vedic Mathematics," ICCIMA, vol. 4,pp.126-128, International Conference on
Computational Intelligence and Multimedia Applications (ICCIMA 2007), 2007.
9. CHAPTER
9.1 SOURCE CODE
`timescale 1ns / 1ps
// Half adder: sum and carry of two input bits
module ha(input a, b, output sum, car);
assign sum = a ^ b;
assign car = a & b;
endmodule
`timescale 1ns / 1ps
// Full adder: sum and carry of three input bits
module fa(input a, b, c, output sum, car);
assign sum = a ^ b ^ c;
assign car = (a & b) | (b & c) | (c & a);
endmodule
`timescale 1ns / 1ps
module copy #(parameter n=16)(input [n-1:0] mr, mc, output [2*n-1:0] y);
reg [n+1:0] m;
reg [2*n-1:0] y1,y2,y3,y4,y5,y6,y7,y8,y9,yb;
integer i;
// Recode the multiplier mr (appended with a trailing 0) three bits at a
// time, shifting by two between groups, to select each partial product
// y1..y9 as 0, +-mc, or +-2*mc of the multiplicand mc.
always@(*) begin
begin m={mr,1'b0}; yb={16'b0,mc}; end
begin
if (m[2:0]==3'b000||m[2:0]==3'b111) y1=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y1=yb;
else if (m[2:0]==3'b011) y1=yb+yb;
else if (m[2:0]==3'b100) y1=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y1=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y2=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y2=yb;
else if (m[2:0]==3'b011) y2=yb+yb;
else if (m[2:0]==3'b100) y2=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y2=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y3=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y3=yb;
else if (m[2:0]==3'b011) y3=yb+yb;
else if (m[2:0]==3'b100) y3=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y3=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y4=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y4=yb;
else if (m[2:0]==3'b011) y4=yb+yb;
else if (m[2:0]==3'b100) y4=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y4=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y5=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y5=yb;
else if (m[2:0]==3'b011) y5=yb+yb;
else if (m[2:0]==3'b100) y5=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y5=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y6=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y6=yb;
else if (m[2:0]==3'b011) y6=yb+yb;
else if (m[2:0]==3'b100) y6=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y6=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y7=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y7=yb;
else if (m[2:0]==3'b011) y7=yb+yb;
else if (m[2:0]==3'b100) y7=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y7=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y8=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y8=yb;
else if (m[2:0]==3'b011) y8=yb+yb;
else if (m[2:0]==3'b100) y8=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y8=(~yb+1'b1);
end
begin m=m>>2;
if (m[2:0]==3'b000||m[2:0]==3'b111) y9=32'b0;
else if (m[2:0]==3'b001||m[2:0]==3'b010) y9=yb;
else if (m[2:0]==3'b011) y9=yb+yb;
else if (m[2:0]==3'b100) y9=(~yb+1'b1)+(~yb+1'b1);
else if (m[2:0]==3'b101||m[2:0]==3'b110) y9=(~yb+1'b1);
end
end
assign y[1:0]=y1[1:0];
ha u1(y1[2],y2[0],y[2],c1);
fa u2(y1[3],y2[1],c1,y[3],c2);
fa u3(y1[4],y2[2],y3[0],s1,c3);
ha u4(s1,c2,y[4],c4);
fa u5(y1[5],y2[3],y3[1],s2,c5);
fa u6(s2,c3,c4,y[5],c6);
fa u7(y1[6],y2[4],y3[2],s3,c7);
fa u8(s3,y4[0],c5,s4,c8);
ha u9(s4,c6,y[6],c9);
fa u10(y1[7],y2[5],y3[3],s5,c10);
fa u11(s5,y4[1],c7,s6,c11);
fa u12(s6,c8,c9,y[7],c12);
fa u13(y1[8],y2[6],y3[4],s7,c13);
fa u14(s7,y4[2],y5[0],s8,c14);
fa u15(s8,c10,c11,s9,c15);
ha u16(s9,c12,y[8],c16);
fa u17(y1[9],y2[7],y3[5],s10,c17);
fa u18(s10,y4[3],y5[1],s11,c18);
fa u19(s11,c13,c14,s12,c19);
fa u20(s12,c15,c16,y[9],c20);
fa u21(y1[10],y2[8],y3[6],s13,c21);
fa u22(s13,y4[4],y5[2],s14,c22);
fa u23(s14,c17,c18,s15,c23);
fa u24(s15,c19,c20,s16,c24);
ha u25(s16,y6[0],y[10],c25);
fa u26(y1[11],y2[9],y3[7],s17,c26);
fa u27(s17,y4[5],y5[3],s18,c27);
fa u28(s18,c21,c22,s19,c28);
fa u29(s19,c23,c24,s20,c29);
fa u30(s20,c25,y6[1],y[11],c30);
fa u31(y1[12],y2[10],y3[8],s21,c31);
fa u32(s21,y4[6],y5[4],s22,c32);
fa u33(s22,c26,c27,s23,c33);
fa u34(s23,c28,c29,s24,c34);
fa u35(s24,c30,y6[2],s25,c35);
ha u36(s25,y7[0],y[12],c36);
fa u37(y1[13],y2[11],y3[9],s26,c37);
fa u38(s26,y4[7],y5[5],s27,c38);
fa u39(s27,c31,c32,s28,c39);
fa u40(s28,c33,c34,s29,c40);
fa u41(s29,c35,y6[3],s30,c41);
fa u42(s30,y7[1],c36,y[13],c42);
fa u43(y1[14],y2[12],y3[10],s31,c43);
fa u44(s31,y4[8],y5[6],s32,c44);
fa u45(s32,c37,c38,s33,c45);
fa u46(s33,c39,c40,s34,c46);
fa u47(s34,c41,y6[4],s35,c47);
fa u48(s35,y7[2],c42,s36,c48);
ha u49(s36,y8[0],y[14],c49);
fa u50(y1[15],y2[13],y3[11],s37,c50);
fa u51(s37,y4[9],y5[7],s38,c51);
fa u52(s38,c43,c44,s39,c52);
fa u53(s39,c45,c46,s40,c53);
fa u54(s40,c47,y6[5],s41,c54);
fa u55(s41,y7[3],c48,s42,c55);
fa u56(s42,y8[1],c49,y[15],c56);
fa u57(y1[16],y2[14],y3[12],s43,c57);
fa u58(s43,y4[10],y5[8],s44,c58);
fa u59(s44,c50,c51,s45,c59);
fa u60(s45,c52,c53,s46,c60);
fa u61(s46,c54,y6[6],s47,c61);
fa u62(s47,y7[4],c55,s48,c62);
fa u63(s48,y8[2],c56,s49,c63);
ha u64(s49,y9[0],y[16],c64);
fa u65(y1[17],y2[15],y3[13],s50,c65);
fa u66(s50,y4[11],y5[9],s51,c66);
fa u67(s51,c57,c58,s52,c67);
fa u68(s52,c59,c60,s53,c68);
fa u69(s53,c61,y6[7],s54,c69);
fa u70(s54,y7[5],c62,s55,c70);
fa u71(s55,y8[3],c63,s56,c71);
fa u72(s56,y9[1],c64,y[17],c72);
fa u73(y1[18],y2[16],y3[14],s57,c73);
fa u74(s57,y4[12],y5[10],s58,c74);
fa u75(s58,c65,c66,s59,c75);
fa u76(s59,c67,c68,s60,c76);
fa u77(s60,c69,y6[8],s61,c77);
fa u78(s61,y7[6],c70,s62,c78);
fa u79(s62,y8[4],c71,s63,c79);
fa u80(s63,y9[2],c72,y[18],c80);
fa u81(y1[19],y2[17],y3[15],s64,c81);
fa u82(s64,y4[13],y5[11],s65,c82);
fa u83(s65,c73,c74,s66,c83);
fa u84(s66,c75,c76,s67,c84);
fa u85(s67,c77,y6[9],s68,c85);
fa u86(s68,y7[7],c78,s69,c86);
fa u87(s69,y8[5],c79,s70,c87);
fa u88(s70,y9[3],c80,y[19],c88);
fa u90(y1[20],y2[18],y3[16],s71,c89);
fa u91(s71,y4[14],y5[12],s72,c90);
fa u92(s72,c81,c82,s73,c91);
fa u93(s73,c83,c84,s74,c92);
fa u94(s74,c85,y6[10],s75,c93);
fa u95(s75,y7[8],c86,s76,c94);
fa u96(s76,y8[6],c87,s77,c95);
fa u97(s77,y9[4],c88,y[20],c96);
fa u98(y1[21],y2[19],y3[17],s78,c97);
fa u99(s78,y4[15],y5[13],s79,c98);
fa u100(s79,c89,c90,s80,c99);
fa u101(s80,c91,c92,s81,c100);
fa u102(s81,c93,y6[11],s82,c101);
fa u103(s82,y7[9],c94,s83,c102);
fa u104(s83,y8[7],c95,s84,c103);
fa u105(s84,y9[5],c96,y[21],c104);
fa u106(y1[22],y2[20],y3[18],s85,c105);
fa u107(s85,y4[16],y5[14],s86,c106);
fa u108(s86,c97,c98,s87,c107);
fa u109(s87,c99,c100,s88,c108);
fa u110(s88,c101,y6[12],s90,c109);
fa u111(s90,y7[10],c102,s91,c110);
fa u112(s91,y8[8],c103,s92,c111);
fa u113(s92,y9[6],c104,y[22],c112);
fa u114(y1[23],y2[21],y3[19],s93,c113);
fa u115(s93,y4[17],y5[15],s94,c114);
fa u116(s94,c105,c106,s95,c115);
fa u117(s95,c107,c108,s96,c116);
fa u118(s96,c109,y6[13],s97,c117);
fa u119(s97,y7[11],c110,s98,c118);
fa u120(s98,y8[9],c111,s99,c119);
fa u121(s99,y9[7],c112,y[23],c120);
fa u122(y1[24],y2[22],y3[20],s100,c121);
fa u123(s100,y4[18],y5[16],s101,c122);
fa u124(s101,c113,c114,s102,c123);
fa u125(s102,c115,c116,s103,c124);
fa u126(s103,c117,y6[14],s104,c125);
fa u127(s104,y7[12],c118,s105,c126);
fa u128(s105,y8[10],c119,s106,c127);
fa u129(s106,y9[8],c120,y[24],c128);
fa u130(y1[25],y2[23],y3[21],s107,c129);
fa u131(s107,y4[19],y5[17],s108,c130);
fa u132(s108,c121,c122,s109,c131);
fa u133(s109,c123,c124,s110,c132);
fa u134(s110,c125,y6[15],s111,c133);
fa u135(s111,y7[13],c126,s112,c134);
fa u136(s112,y8[11],c127,s113,c135);
fa u137(s113,y9[9],c128,y[25],c136);
fa u138(y1[26],y2[24],y3[22],s114,c137);
fa u139(s114,y4[20],y5[18],s115,c138);
fa u140(s115,c129,c130,s116,c139);
fa u141(s116,c131,c132,s117,c140);
fa u142(s117,c133,y6[16],s118,c141);
fa u143(s118,y7[14],c134,s119,c142);
fa u144(s119,y8[12],c135,s120,c143);
fa u145(s120,y9[10],c136,y[26],c144);
fa u146(y1[27],y2[25],y3[23],s121,c145);
fa u147(s121,y4[21],y5[19],s122,c146);
fa u148(s122,c137,c138,s123,c147);
fa u149(s123,c139,c140,s124,c148);
fa u150(s124,c141,y6[17],s125,c149);
fa u151(s125,y7[15],c142,s126,c150);
fa u152(s126,y8[13],c143,s127,c151);
fa u153(s127,y9[11],c144,y[27],c152);
fa u154(y1[28],y2[26],y3[24],s128,c153);
fa u155(s128,y4[22],y5[20],s129,c154);
fa u156(s129,c145,c146,s130,c155);
fa u157(s130,c147,c148,s131,c156);
fa u158(s131,c149,y6[18],s132,c157);
fa u159(s132,y7[16],c150,s133,c158);
fa u160(s133,y8[14],c151,s134,c159);
fa u161(s134,y9[12],c152,y[28],c160);
fa u162(y1[29],y2[27],y3[25],s135,c161);
fa u163(s135,y4[23],y5[21],s136,c162);
fa u164(s136,c153,c154,s137,c163);
fa u165(s137,c155,c156,s138,c164);
fa u166(s138,c157,y6[19],s139,c165);
fa u167(s139,y7[17],c158,s140,c166);
fa u168(s140,y8[15],c159,s141,c167);
fa u169(s141,y9[13],c160,y[29],c168);
fa u170(y1[30],y2[28],y3[26],s142,c169);
fa u171(s142,y4[24],y5[22],s143,c170);
fa u172(s143,c161,c162,s144,c171);
fa u173(s144,c163,c164,s145,c172);
fa u174(s145,c165,y6[20],s146,c173);
fa u175(s146,y7[18],c166,s147,c174);
fa u176(s147,y8[16],c167,s148,c175);
fa u177(s148,y9[14],c168,y[30],c176);
fa u178(y1[31],y2[29],y3[27],s149,c177);
fa u179(s149,y4[25],y5[23],s150,c180);
fa u180(s150,c169,c170,s151,c181);
fa u181(s151,c171,c172,s152,c182);
fa u182(s152,c173,y6[21],s153,c183);
fa u183(s153,y7[19],c174,s154,c184);
fa u184(s154,y8[17],c175,s155,c185);
fa u185(s155,y9[15],c176,y[31],c186);
endmodule