
Journal of Scientific & Industrial Research

Vol. 65, November 2006, pp. 900-904

Comparison of pipelined IEEE-754 standard floating point multiplier with unpipelined multiplier

Kavita Khare1,*, R P Singh1 and Nilay Khare2

1Department of Electronics and Communication Engineering, MANIT, Bhopal

2CSE & IT Department, University Institute of Technology, Rajiv Gandhi Prodyougiki Vishvavidyalaya, Bhopal

Received 07 April 2005; revised 21 June 2006; accepted 20 July 2006

The IEEE-754 standard floating point multiplier, which provides highly precise computations, has been improved by the insertion of pipelining to achieve high throughput and low area on the IC. A floating point multiplier using pipelining has been simulated and analyzed, and its superiority over traditional designs is discussed. To achieve pipelining, one must subdivide the input process into a sequence of subtasks, each of which can be executed by a specialized hardware stage that operates concurrently with the other stages in the pipeline, without the need for extra computing units. Detailed synthesis and simulation reports obtained with Xilinx ISE 5.2i and ModelSim software are given. The hardware design is implemented on Virtex FPGA chips.

Keywords: Floating point adder, IEEE floating point standard, Latency, Model-Sim, VHDL, Xilinx ISE 5.2i

Introduction

Until recently, any meaningful floating-point arithmetic (FPA) was virtually impossible to implement on Field Programmable Gate Array (FPGA) based systems due to the limited density and speed of older FPGAs. In addition, mapping difficulties arose from the inherent complexity of FPA. With the introduction of high-level languages such as VHDL, rapid prototyping of floating point units has become possible. Advanced digital signal processing requires FPA to achieve higher accuracy and a high dynamic range for numerical computation. The IEEE has produced a standard for FPA. This standard specifies how single precision (32 bit) and double precision (64 bit) floating point numbers are to be represented, as well as how arithmetic should be carried out on them.

Methodology

In this paper, single precision representation is dealt with1. The IEEE single precision floating point standard representation requires a 32 bit word, whose bits may be numbered from 0 to 31, left to right. The first bit is the sign bit, s, the next eight bits are the exponent bits, E, and the final 23 bits are the mantissa, m. In IEEE-754 format2, the significand always takes on an implied ‘1’ for the most significant digit, assuming the value represented is normalized (Table 1). The essential idea behind floating point number systems is to formulate representations and computation procedures in which the scaling procedures introduced by fixed-point systems are built into the number representation2-4.

Value of number, $N = (-1)^{s} \times 2^{(E-127)} \times (1.m)$, where $0 < E < 255$. The actual exponent is $e = E - 127$.

Magnitude of numbers is in the range $2^{-126} \times (1.0)$ to $2^{127} \times (2 - 2^{-23})$.
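As an illustration of the value formula, the following Python sketch splits a 32 bit word into s, E and m and evaluates N for normalized numbers only. The function names (`decode_single`, `value_of_normalized`, `float_to_word`) are hypothetical and are not part of the reported design. Bit 0 in the left-to-right numbering used above corresponds to the most significant bit of the integer here.

```python
import struct

def decode_single(word):
    """Split a 32-bit word into sign s, biased exponent E and 23-bit mantissa m."""
    s = (word >> 31) & 0x1      # first (leftmost) bit: sign
    E = (word >> 23) & 0xFF     # next eight bits: biased exponent
    m = word & 0x7FFFFF         # final 23 bits: mantissa (fraction)
    return s, E, m

def value_of_normalized(word):
    """Evaluate N = (-1)^s * 2^(E-127) * (1.m) for 0 < E < 255."""
    s, E, m = decode_single(word)
    if not 0 < E < 255:
        raise ValueError("only normalized numbers (0 < E < 255) are handled here")
    significand = 1.0 + m / 2.0**23   # implied leading 1 plus the fraction
    return (-1.0)**s * 2.0**(E - 127) * significand

def float_to_word(x):
    """Helper (assumption): bit pattern of x when stored in single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

For instance, `value_of_normalized(0x7F7FFFFF)` evaluates the upper end of the magnitude range, and `value_of_normalized(0x00800000)` the lower end.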

Table 1. Single precision floating point number

Exponent    Significand    Number represented
0           0              0
0           Non zero       Denormalized number (may be returned as a result of underflow in multiplication)
1 to 254    Anything       Floating point number
255         0              Infinity (a positive number divided by zero yields “infinity”)
255         Non zero       NaN, “Not A Number” (zero divided by zero yields NaN)

___________

*Author for correspondence

Tel: 0755-2420777; Fax: 07552670538

E-mail: [email protected]


Pipelining offers an economical way to realize temporal parallelism in digital systems, achieving faster clock rates at the cost of latency1. Most modern processors, from PCs to supercomputers, rely on pipelining techniques and on floating-point multipliers (FPMs) and adders to achieve high throughput. A new algorithm for pipeline insertion is developed here and used for FP multiplication. Pipeline insertion consists of introducing rows of latches through the multiplier structure, dividing it into rows of cells that operate independently of each other5.

The multiplication operator is expected to produce its result within a single clock cycle, which yields a circuit requiring substantial amounts of CLB resources. Instead, a pipelined approach for the integer multiplier has been examined, which continues to produce a result in each clock cycle. By using a pipelined multiplier, resource consumption decreases and speed increases. The FPMs are designed and synthesized with Xilinx ISE 5.2i for a Virtex device.

Floating Point Multiplier and its VHDL Implementation

Assuming that the operands are already in the IEEE 754 format, performing the floating-point multiplication $R = X \times Y = (-1)^{X_s} (X_m \times 2^{X_e}) \times (-1)^{Y_s} (Y_m \times 2^{Y_e})$ involves the following steps: 1) If one or both operands are equal to zero, return zero as the result; otherwise, 2) Compute the sign of the result, $X_s$ XOR $Y_s$; 3) Compute the mantissa of the result [a) Multiply the mantissas, $X_m \times Y_m$; b) Round the result to the allowed number of mantissa bits]; 4) Compute the exponent of the result [Result exponent = biased exponent (X) + biased exponent (Y) - bias]; 5) Normalize if needed, by shifting the mantissa right and incrementing the result exponent; and 6) Check the result exponent for overflow/underflow [a) If larger than the maximum exponent allowed, return exponent overflow; b) If smaller than the minimum exponent allowed, return exponent underflow].
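The six steps can be sketched as a small software model. This is a minimal Python illustration under stated assumptions, not the VHDL design reported here: operands are zero or normalized, rounding is done by truncation, overflow returns infinity, and underflow (including denormalized operands) is flushed to zero. The names `fp_multiply`, `to_word` and `to_float` are hypothetical.

```python
import struct

BIAS = 127  # exponent bias of the single precision format

def to_word(x):
    """Bit pattern of x when stored as a single precision number."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def to_float(word):
    """Interpret a 32-bit pattern as a single precision number."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def fp_multiply(x, y):
    """Multiply two single precision words (zero or normalized operands only)."""
    xs, xe, xm = x >> 31, (x >> 23) & 0xFF, x & 0x7FFFFF
    ys, ye, ym = y >> 31, (y >> 23) & 0xFF, y & 0x7FFFFF
    if xe == 255 or ye == 255:
        raise ValueError("infinity and NaN operands are outside this sketch")
    # Step 2 (computed first so that a zero result carries the correct sign).
    sign = xs ^ ys
    # Step 1: zero operand gives zero. Assumption: E = 0 is treated as zero,
    # so denormalized operands are flushed.
    if xe == 0 or ye == 0:
        return sign << 31
    # Step 3a: multiply the 24-bit significands (hidden bit restored).
    product = ((1 << 23) | xm) * ((1 << 23) | ym)   # up to 48 bits
    # Step 4: add the biased exponents and remove one bias.
    exp = xe + ye - BIAS
    # Step 5: the product lies in [1, 4); if it is >= 2, shift right once.
    if product & (1 << 47):
        product >>= 1
        exp += 1
    # Step 6: exponent range check (assumed handling, see lead-in).
    if exp >= 255:
        return (sign << 31) | (0xFF << 23)   # overflow: infinity
    if exp <= 0:
        return sign << 31                    # underflow: flushed to zero
    # Step 3b: truncate to 24 bits and drop the hidden bit.
    mantissa = (product >> 23) & 0x7FFFFF
    return (sign << 31) | (exp << 23) | mantissa
```

A call such as `to_float(fp_multiply(to_word(1.5), to_word(2.5)))` exercises the path without normalization, while `3.0 * 3.0` exercises the right shift in step 5.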

These independent operations make the multiplier well suited to pipelining. The multiplication can be divided into three stages: 1) Unpack the operands, re-insert the hidden bit, and check for any exceptions on the operands (such as zeros or NaN); 2) Multiply the significands, compute the sign of the result and add the exponents; and 3) Normalize the result and adjust the exponent5.
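The effect of the three stages on latency and throughput can be illustrated with a cycle-by-cycle software model. The Python sketch below is an assumption-laden illustration rather than the reported hardware: `r1` and `r2` stand for the registers between stages, all function names are hypothetical, operands are zero or normalized, and results are truncated.

```python
def stage1_unpack(x, y):
    """Stage 1: unpack both operands, restore the hidden bit, flag zeros."""
    fields = []
    for word in (x, y):
        sign, exp, frac = word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF
        if exp == 255:
            raise ValueError("infinity/NaN operands are outside this sketch")
        fields.append((sign, exp, (1 << 23) | frac))
    is_zero = fields[0][1] == 0 or fields[1][1] == 0
    return fields[0], fields[1], is_zero

def stage2_multiply(unpacked):
    """Stage 2: result sign, exponent sum and 48-bit significand product."""
    (xs, xe, xsig), (ys, ye, ysig), is_zero = unpacked
    return xs ^ ys, xe + ye - 127, xsig * ysig, is_zero

def stage3_normalize(partial):
    """Stage 3: normalize, adjust the exponent and pack the result word."""
    sign, exp, product, is_zero = partial
    if is_zero:
        return sign << 31
    if product & (1 << 47):          # product >= 2: shift right once
        product >>= 1
        exp += 1
    if exp >= 255:                   # overflow: infinity (assumed handling)
        return (sign << 31) | (0xFF << 23)
    if exp <= 0:                     # underflow: flushed to zero (assumption)
        return sign << 31
    return (sign << 31) | (exp << 23) | ((product >> 23) & 0x7FFFFF)

def run_pipeline(operand_pairs):
    """Feed one operand pair per clock cycle; return (cycle, result) pairs."""
    r1 = r2 = None                   # registers between the stages
    results = []
    stream = list(operand_pairs) + [None, None]   # two extra cycles to drain
    for cycle, pair in enumerate(stream, start=1):
        # Evaluate the last stage first so each stage sees pre-edge values.
        out = stage3_normalize(r2) if r2 is not None else None
        r2 = stage2_multiply(r1) if r1 is not None else None
        r1 = stage1_unpack(*pair) if pair is not None else None
        if out is not None:
            results.append((cycle, out))
    return results
```

In this model the first result appears in cycle 3, and one further result appears in every following cycle, which is the latency and throughput behaviour that pipelining trades on.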

Rounding occurs in floating point multiplication when the mantissa of the product is reduced from 48 bits to 24 bits; the least significant 24 bits are discarded. Overflow occurs when the sum of the exponents exceeds 127, the largest value defined in the bias-127 exponent representation. When this occurs, the exponent is set to 128 (E = 255) and the mantissa is set to zero, indicating + or - infinity. Underflow occurs when the sum of the exponents is more negative than -126, the most negative value defined in the bias-127 exponent representation. When this occurs, the exponent is set to -127 (E = 0). If m = 0, the number is exactly zero. If m is not zero, a denormalized number is indicated, which has an exponent of -127 and a hidden bit of 0. The smallest such nonzero number is $2^{-149}$, which retains only a single bit of precision in the rightmost bit of the mantissa.
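The special encodings mentioned above can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical, and `truncate_product` assumes the 48-bit product is reduced by simply dropping its 24 least significant bits, as described in the text.

```python
import struct

def word_to_float(word):
    """Interpret a 32-bit pattern as a single precision number."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def truncate_product(product48):
    """Keep the most significant 24 bits of a 48-bit significand product."""
    return product48 >> 24

POS_INFINITY = 0x7F800000        # E = 255, m = 0, sign bit 0
NEG_INFINITY = 0xFF800000        # E = 255, m = 0, sign bit 1
SMALLEST_DENORMAL = 0x00000001   # E = 0, only the rightmost mantissa bit set
SMALLEST_NORMAL = 0x00800000     # E = 1, m = 0
```

Here `word_to_float(SMALLEST_DENORMAL)` gives the single-bit value discussed above, and `truncate_product(0xC00000 * 0xC00000)` shows the reduction of a 48-bit product of two 24-bit significands to 24 bits.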

Various VHDL modules developed are7,8:

multiplier_pckg.vhd: declares the various data types, functions and procedures in the design;
multiplier.vhd: consists of the various component instantiations and their port mapping;
flag_check_load.vhd: first stage of the pipeline that performs the function of loading the operands, checking for the exceptional inputs, compares the exponents and generates the exponent difference;
prod_sign.vhd: second stage in the pipeline that shifts the mantissa according to the exponent difference value generated in the previous stage;
speip_flag.vhd: third stage in the pipeline that performs the basic addition or subtraction; and
reg.vhd, reg_bit.vhd, reg_bitvector.vhd, reg_exp.vhd, reg_int.vhd, reg_mantissa.vhd, reg_mnt.vhd: describe the various registers used to interface the various stages.

Field Programmable Gate Arrays (FPGA)

An FPGA can be volatile or non-volatile. It consists of a two-dimensional array of logic blocks. Each logic block is programmable to implement any logic function; thus, they are also called configurable logic blocks (CLBs). Switchboxes or channels contain interconnection resources that can be programmed to connect CLBs to implement more complex logic functions. Designers can use existing CAD tools to convert HDL code in order to program FPGAs. An FPGA contains 2,000-2,000,000 gates (or more). Since an FPGA can be reprogrammed, the turnaround time is only a few minutes. Advantages of FPGAs are lower prototyping costs and shorter production lead times, which advance the time-to-market and in turn increase profitability. It can also ensure the reliability of the design on the board9,10.

The Xilinx Virtex-II FPGA used here has input output blocks (IOBs) in groups of two or four on the perimeter of each device. Each IOB includes 6 storage elements, each of which can be configured as an edge triggered D-type flip flop or a level sensitive latch. The device has CLBs arranged in arrays and tied to a switch matrix. Each CLB has 4 slices.

Table 2. Comparison between pipelined and unpipelined multipliers

Device utilization summary [Selected device: Virtex 2p (2vp50ff1517-6)]

Results                       Unpipelined multiplier     Pipelined multiplier
Number of slices              2222 out of 10304 (21%)    756 out of 24640 (3%)
Number of slice flip-flops    102 out of 20608 (0%)      305 out of 49280 (0%)
Number of 4 input LUTs        4234 out of 20608 (20%)    1316 out of 49280 (2%)
Number of bonded IOBs         102 out of 588 (17%)       100 out of 916 (10%)

Timing summary (Speed Grade: -6)

Results                                     Unpipelined multiplier    Pipelined multiplier
Minimum period                              63.812 ns                 3.070 ns
Maximum frequency                           15.671 MHz                325.733 MHz
Minimum input arrival time before clock     70.262 ns                 1.265 ns
Maximum output required time after clock    5.690 ns                  1.265 ns

Thermal summary of multiplier

Results                           Unpipelined multiplier    Pipelined multiplier
Estimated junction temperature    25                        25
Ambient temp                      25                        25
Case temp                         25                        25
Theta J-A                         0 C/W                     0 C/W

Power summary of multiplier

S No.  Results                              Unpipelined multiplier    Pipelined multiplier
                                            I (mA)    P (mW)          I (mA)    P (mW)
1      Total estimated power consumption              938                       55
2      Vccint 1.5V                          533       933             300       450
3      Vcc.5V                               2         5               2         5
4      Clocks                               33        57              0         0
5      Nets                                 0         0               0         0
6      Logic                                0         0               0         0
7      Inputs                               1         1               0         0
8      Outputs                              0         0
9      Quiescent 1.5V                       500       875             300       450
10     Quiescent 2.5V                       2         5               2         5

Fig. 1. Flow diagram of pipelined multiplier

Fig. 2. Chip schematic of pipelined and unpipelined multipliers

Fig. 3. RTL schematic of: a) Unpipelined multiplier (32 pages); b) Pipelined multiplier

Fig. 4. Simulation results of: a) Unpipelined multiplier; b) Pipelined multiplier



Results and Conclusions

Both unpipelined and pipelined FP multipliers have been implemented in VHDL (Figs 1-5). The device utilization and timing summaries are given in Table 2. Several FP multiplier units were synthesized to quantify the performance and space requirements of the reported approach. Synthesis was carried out from a VHDL source, and the target device was a Xilinx Virtex-II FPGA (2V1000FG456-6)11. Increasing the number of pipeline stages increases the operating frequency. When the pipelined multiplier is used, device utilization and power consumption are reduced; furthermore, the maximum operating frequency increases from 15.671 to 325.733 MHz, and hence throughput increases (Table 2).
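The reported frequencies follow directly from the minimum clock periods in Table 2 (63.812 ns unpipelined, 3.070 ns pipelined). The short Python sketch below reproduces that arithmetic; the three-stage figure used for the pipeline latency is an assumption based on the three pipeline modules described earlier, and all names are hypothetical.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / min_period_ns

UNPIPELINED_PERIOD_NS = 63.812   # Table 2, unpipelined multiplier
PIPELINED_PERIOD_NS = 3.070      # Table 2, pipelined multiplier
ASSUMED_STAGES = 3               # assumption: three pipeline stages

# Ratio of results per second once the pipeline is full.
throughput_gain = UNPIPELINED_PERIOD_NS / PIPELINED_PERIOD_NS

# Time for one operand pair to pass through all assumed stages.
pipelined_latency_ns = ASSUMED_STAGES * PIPELINED_PERIOD_NS

print(f"unpipelined: {max_frequency_mhz(UNPIPELINED_PERIOD_NS):.3f} MHz")
print(f"pipelined:   {max_frequency_mhz(PIPELINED_PERIOD_NS):.3f} MHz")
print(f"throughput gain: about {throughput_gain:.1f}x")
```

Under these figures the pipelined design delivers roughly 20.8 times as many results per second as the unpipelined one.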

References

1 Khare K, Singh R P & Khare N, Comparison of pipelined IEEE-754 standard floating point adder with unpipelined adder, J Sci Ind Res, 64 (2005) 354-357.
2 Shirazi Nabeel & Athanas P, Quantitative analysis of floating point arithmetic based custom computing machines, IEEE Symp on FPGA for Custom Computing Machines (Napa Valley, California) 1995, 333-334.
3 Eldon John A & Robertson Craig, A floating point format for signal processing, IEEE Acoustics, Speech, and Signal Processing Conf (USA) 1992, 717-720.
4 Yalamanchi S & Koltur R, Single Precision Floating-Point Unit, FDU project, 2001.
5 Asato C D, A data-path multiplier with automatic insertion of pipeline stages, IEEE J Solid-State Circuits, 4 (1990) 383-885.
6 Walters A, Scaleable filter implement using 32 bit floating point complex arithmetic on a FPGA based custom computing platform, M S Thesis, Blacksburg, Virginia, 2002.
7 Ashenden Peter J, The Designers Guide to VHDL (Harcourt Asia Pvt Ltd., Singapore) 2000, 53-335.
8 Douglas P, VHDL, 2nd edn (McGraw Hill, Singapore) 1994, 15-165.
9 Armstrong J R & Gray F G, Structured Logic Design with VHDL (Prentice Hall, India) 1993, 15-139.
10 Eshraghian K & Weste Neil H E, Principle of CMOS and VLSI Design: A system perspective, 2nd edn (Addison Wesley Publishing company, Singapore) 1993, 175-459.
11 Puspam Vikram, Miller Andy & Chappman Ken, Xilinx application notes Xapp 219, Oct 2001.

Fig. 5. FPGA Editor of: a) Unpipelined multiplier; b) Pipelined multiplier
