
AICCSA’06, Sharjah

A CAD Tool for Scalable Floating Point Adder Design and Generation

Using C++/VHDL

By

Asim J. Al-Khalili


Overview

•Introduction to Floating Point Addition
•Architecture of Single Path FADD
•Activity Scaling
•Triple Data Path Floating Point Adder
•VHDL Modeling
•Results
•Implementation


•FP Representation

Bit:    31      30 ... 23    22 ... 0
Field:  Sign    Exponent     Significand
Width:  1 bit   8 bits       23 bits

FP representation: 1.XXXXX (base 2) × 2^YYYY

(IEEE 754 floating-point standard, single precision)
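The field layout above can be illustrated with a small C++ sketch that splits a single-precision value into its sign, exponent and significand fields and reassembles it. The names `Fp32Fields`, `unpack_fp32` and `pack_fp32` are hypothetical and not part of the presented tool; the sketch assumes the host `float` is a 32-bit IEEE 754 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three fields of a single-precision number.
struct Fp32Fields {
    std::uint32_t sign;         // bit 31
    std::uint32_t exponent;     // bits 30..23, biased by 127
    std::uint32_t significand;  // bits 22..0, hidden leading 1 is not stored
};

// Split a float into its three fields (assumes a 32-bit IEEE 754 float).
Fp32Fields unpack_fp32(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    Fp32Fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.significand = bits & 0x7FFFFFu;
    return f;
}

// Reassemble the fields into a float; each field must fit its width.
float pack_fp32(const Fp32Fields& f) {
    assert(f.sign <= 1u && f.exponent <= 0xFFu && f.significand <= 0x7FFFFFu);
    std::uint32_t bits = (f.sign << 31) | (f.exponent << 23) | f.significand;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For instance, `unpack_fp32(-2.5f)` gives sign 1, biased exponent 128 and significand field 0x200000, because -2.5 equals -1.01 (base 2) × 2^1.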


Floating Point Addition

Start

1. Compare the exponents of the two numbers.
2. Shift the smaller number to the right until its exponent matches the larger exponent.
3. Add the significands.
4. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.
   Overflow/Underflow? Yes: Exceptions. No: continue with step 5.
5. Round the significand to the appropriate number of bits.
   Still normalized? Yes: Done. No: return to step 4.
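The five steps of the flowchart can be traced in software. The following C++ sketch is a simplified illustration, not the presented hardware: it handles only two positive, normalized single-precision operands whose sum does not overflow, and it assumes round-to-nearest-even with three extra guard, round and sticky bits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

static std::uint32_t to_bits(float f) {
    std::uint32_t b = 0;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

static float from_bits(std::uint32_t b) {
    float f = 0.0f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// Adds two positive, normalized floats following the five flowchart steps.
// Assumptions: no zero, denormal, infinity or NaN inputs; no overflow.
float add_positive_normals(float x, float y) {
    std::uint32_t a = to_bits(x), b = to_bits(y);
    std::uint32_t ea = (a >> 23) & 0xFFu, eb = (b >> 23) & 0xFFu;
    assert((a >> 31) == 0u && (b >> 31) == 0u);
    assert(ea >= 1u && ea <= 254u && eb >= 1u && eb <= 254u);

    // 24-bit significands (hidden 1 restored) with 3 low bits for G, R, S.
    std::uint64_t ma = static_cast<std::uint64_t>((a & 0x7FFFFFu) | 0x800000u) << 3;
    std::uint64_t mb = static_cast<std::uint64_t>((b & 0x7FFFFFu) | 0x800000u) << 3;

    // Step 1: compare the exponents; keep the larger operand in (ea, ma).
    if (ea < eb) { std::swap(ea, eb); std::swap(ma, mb); }
    std::uint32_t d = ea - eb;

    // Step 2: shift the smaller significand right; lost bits feed the sticky bit.
    if (d >= 27u) {
        mb = 1u;  // the whole operand collapses into the sticky bit
    } else if (d > 0u) {
        std::uint64_t lost = mb & ((std::uint64_t(1) << d) - 1u);
        mb = (mb >> d) | (lost != 0u ? 1u : 0u);
    }

    // Step 3: add the significands.
    std::uint64_t sum = ma + mb;
    std::uint32_t e = ea;

    // Step 4: normalize (same-sign addition needs at most a 1-bit right shift).
    if ((sum >> 27) != 0u) { sum = (sum >> 1) | (sum & 1u); ++e; }

    // Step 5: round to nearest even using G, R, S; renormalize on carry out.
    std::uint32_t grs = static_cast<std::uint32_t>(sum & 7u);
    std::uint64_t m = sum >> 3;
    if (grs > 4u || (grs == 4u && (m & 1u) != 0u)) ++m;
    if ((m >> 24) != 0u) { m >>= 1; ++e; }

    assert(e <= 254u);  // overflow handling is outside this sketch
    return from_bits((e << 23) | static_cast<std::uint32_t>(m & 0x7FFFFFu));
}
```

Subtraction, left-shift normalization and the exception branches of the flowchart are deliberately left out to keep the sketch short.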


Architecture Consideration

What’s the best architecture?


•FP adder functions include:

•Sign identification
•Exponent comparison
•Smaller significand right shift
•Significand comparison (if the exponents are equal)
•Significand inverter
•Addition and leading zero anticipation
•Normalization shifting left
•Rounding
•Shift after rounding
•Compensation shifting
•Exception handler


Architecture of TDPFADD

[Figure: TDPFADD block diagram. Blocks: Input Floating Point Numbers, Exponents, Exponent Logic, Control Logic, Exponent Subtractor, Data Selector/Pre-align (0/1 Bit Right Shifter), Adder/Rounding Logic, Leading Zero Counting Logic, Normalization (Left Barrel Shifter), Result Selector, Data Selector, Pre-alignment (Right Barrel Shifter)/Complementer, Adder/Rounding Logic, Normalization (1 bit Right/Left Shifter), Exponent Incr/Decr, Result Selector, Bypass Logic, Result Integration/Flag Logic. Outputs: Flags, IEEE Sum.]

[Figure: state diagram with states I, J and K, labeled BP (bypass), LZA and LZB.]

Transition activity scaling

• State assertion conditions of TDPFADD

State | Active data path | State assertion criterion | Activity scaled blocks
I | Bypass | Either exponent is zero or emax + 1, or edif > p | Entire TDPFADD except the Bypass data path and the Exponent, Control, and Result Integration/Flag units
J | LZA | No Bypass, and subtraction and edif ≤ 1 (LZs ≤ p) | Pre-alignment barrel shifter (large)
K | LZB | No Bypass, and addition or edif > 1 (LZs ≤ 1) | LZA logic and normalization barrel shifter (large)
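The state assertion conditions for states I, J and K can be written as a small decision function. This C++ sketch is an illustration only: it assumes the comparison operators lost from the table are "edif ≤ 1" for the LZA path and "edif > 1" for the LZB path, that exponents are given in biased form, and that `p` is the significand width including the hidden bit. The names `Path` and `select_path` are hypothetical.

```cpp
#include <cassert>

// Hypothetical names for the three data paths (states I, J, K).
enum class Path { Bypass, LZA, LZB };

// e1, e2: biased exponents; emax: largest normal biased exponent
// (254 for single precision); p: significand width (24 for single precision).
Path select_path(unsigned e1, unsigned e2, bool effective_subtraction,
                 unsigned p, unsigned emax) {
    assert(e1 <= emax + 1 && e2 <= emax + 1);
    unsigned edif = e1 > e2 ? e1 - e2 : e2 - e1;
    bool special = e1 == 0 || e2 == 0 || e1 == emax + 1 || e2 == emax + 1;

    // State I: an exponent is zero or emax + 1, or the operands are too far apart.
    if (special || edif > p) return Path::Bypass;

    // State J: subtraction of nearby operands may cancel many leading bits.
    if (effective_subtraction && edif <= 1) return Path::LZA;

    // State K: addition, or subtraction with edif > 1 (at most one leading zero).
    return Path::LZB;
}
```

Only the selected path needs to switch for a given operand pair, which is the basis of the activity scaling listed in the table.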


Probabilities of the Paths

With the IEEE single precision floating point data format, the probabilities that the FADD is in state A, B or C are

P(A) = 0.8177, P(B) = 0.1765 and P(C) = 0.0058.

Here, it is assumed that the exponents are independent, uniformly distributed random variables and that addition and subtraction are equally likely.

With the IEEE double precision floating point format,

P(A) = 0.9484, P(B) = 0.0509 and P(C) = 7 × 10^-4.

The time averaged power consumption (expected value) of a transition activity scaled FADD whose operational states are represented by Fig. 2 is given by

Power = P(A) × PA + P(B) × PB + P(C) × PC

where PA, PB and PC represent the time averaged power consumption of the FADD in states A, B and C respectively.
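The expected power expression is a probability-weighted sum, which can be evaluated as in the following C++ sketch. The function name `expected_power` is hypothetical, and the per-state power values in the usage note are made-up placeholders, not measurements from this work.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Expected (time averaged) power: sum over states of P(state) * power(state).
// prob holds P(A), P(B), P(C); power holds PA, PB, PC in the same unit.
double expected_power(const std::array<double, 3>& prob,
                      const std::array<double, 3>& power) {
    // The state probabilities should sum to one (small tolerance for rounding).
    assert(std::fabs(prob[0] + prob[1] + prob[2] - 1.0) < 1e-3);
    double total = 0.0;
    for (std::size_t i = 0; i < prob.size(); ++i) {
        total += prob[i] * power[i];
    }
    return total;
}
```

With the single precision probabilities 0.8177, 0.1765 and 0.0058 and illustrative state powers of 1, 2 and 3 units, the result is 1.1881 units; the most probable state dominates the average.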


Pipelined TDPFADD

[Figure: pipelined TDPFADD block diagram with five pipeline stages (1st to 5th) and the critical path marked. Blocks: Input Floating Point Numbers, Exponents, Exponent Logic, Control Logic, Bypass Logic, Exponent Subtractor, Data Selector/Pre-align (0/1 Bit Right Shifter), Adder/Rounding Logic, Leading Zero Counter, Normalization (Barrel Shifter Left), Result Selector, Data Selector, Pre-alignment (Barrel Shifter Right)/Complementer, Adder/Rounding Logic, Normalization (1 bit Right/Left Shifter), Exponent Incr/Decr, Result Selector, Result Integration/Flag Logic. Outputs: Flag, IEEE Sum.]


Architecture Consideration

[Figure: single data path floating-point adder with blocks numbered 1 to 7. Inputs opA and opB (S, EXP., SIGNIF.); blocks: exponent difference (d, sign(d)), 0/1 operand MUXes, control (sign opA, sign opB, add/sub), right shifter, bit inverter, 2's complement adder, complement (MSB and Cout), LOP, normalization, rounding.]

Straightforward IEEE floating-point addition algorithm

1. Exponent subtraction.
2. Alignment.
3. Significand addition.
4. Conversion.
5. Leading-one detection.
6. Normalization.
7. Rounding.

Advantages:
1. Positive result, eliminates complement
2. Comparison // alignment
3. Full normalization // rounding


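[Figure: single data path adder with significand comparator. Inputs opA and opB; blocks: exponent difference (d, sign(d)), 0/1 operand MUXes, control (sign opA, sign opB, add/sub), comparator, bit inverters, MUX, right shifter, 2's complement adder (MSB and Cout), 1-bit shifter, LOP, normalization, rounding.]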


Compound Adder

How can a compound adder compute fastest?


Compound Adder

The Compound adder computes simultaneously the sum and the sum plus one, and then the correct rounded result is obtained by selecting according to the requirements of the rounding.

[Equation residue: the sum and sum-plus-one expressions in A and B for effective addition and for effective subtraction (A-B and B-A); the expressions did not survive extraction.]
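The compound adder idea, producing the sum and the sum plus one together and letting the rounding decision pick one, can be sketched in C++ as follows. This is a behavioral illustration on 32-bit words, not the prefix-adder structure of the hardware; the names are hypothetical. The subtraction helper relies on the two's complement identity A - B = A + NOT(B) + 1, so the "plus one" output of the same adder serves effective subtraction.

```cpp
#include <cassert>
#include <cstdint>

// Both outputs of a compound adder (hypothetical type name).
struct CompoundSum {
    std::uint32_t sum;           // A + B
    std::uint32_t sum_plus_one;  // A + B + 1
};

// Behavioral model: hardware shares the carry network between both outputs.
CompoundSum compound_add(std::uint32_t a, std::uint32_t b) {
    CompoundSum out;
    out.sum = a + b;
    out.sum_plus_one = a + b + 1u;
    return out;
}

// Rounding selects one of the two precomputed results instead of adding again.
std::uint32_t rounded_sum(std::uint32_t a, std::uint32_t b, bool round_up) {
    CompoundSum c = compound_add(a, b);
    return round_up ? c.sum_plus_one : c.sum;
}

// Effective subtraction: A - B = A + ~B + 1 (modulo 2^32).
std::uint32_t subtract_via_compound(std::uint32_t a, std::uint32_t b) {
    assert(a >= b);  // the sketch covers the positive-result case only
    return compound_add(a, ~b).sum_plus_one;
}
```

Selecting between two ready results replaces a separate increment after the addition, which is where the latency saving comes from.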


Architecture Consideration (cont.)

[Figure: double data path adder. Inputs opA and opB (S, EXP., SIGNIF.); blocks: exponent difference (d, sign(d)), 0/1 operand MUXes, control (sign opA, sign opB, add/sub), right shifter, bit inverters, two 2's complement adders, complement, 1-bit shifters, LOP, normalization, rounding (one per path), output MUX.]

FAR path:
Effective addition
Effective subtraction with d > 1

CLOSE path:
Effective subtraction with d = 0, 1

Reduced latency (compared to the single path)

FAR data path:
--No conversion
--No full normalization
--No LOP

CLOSE data path:
--No full alignment

The latency of the floating-point addition can be improved if the rounding is combined with the addition/subtraction.


[Figure: double data path adder with compound adders. Inputs opA and opB; blocks: exponent difference (d, sign(d)), 0/1 operand MUXes, control (sign opA, sign opB, add/sub), right shifter, bit inverters, two compound adders, 1-bit shifters, LOP, normalization, output MUX.]

FAR path:
Effective addition
Effective subtraction with d > 1

CLOSE path:
Effective subtraction with d = 0, 1

Reduce total path delay
--eliminate comparator

Increase area
--two 2’s complement adders


[Figure: LZA-based adder block diagram. Inputs: exponents e1, e2, significands s1, s2, sign 1, sign 2. Blocks: control, exponent subtract, exponent difference, 0/1 MUXes, compare, right shifter, bit inverters, LZA logic, LZA counter, 56b adder, left shift, incrementer, rounding control, selector, compensation shifter, exponent incrementer, sign control. Output: sign. Two blocks are marked ABSENT.]


• Comparison of low latency architectures of TDPFADD and single data path FADD using 0.13 micron CMOS technology

Parameters | TDPFADD | Single data path FADD
Maximum Delay, D (ns) | 13.62 | 19.54
Average Power, Pa (mW) at 16.7 MHz | 2.95 | 15.72
Worst case Power, Pw (mW) at 16.7 MHz | 4.21 | 5.13
Power using real data, Preal (mW) at 16.7 MHz | 3.41 | 4.58
Area, A (10^4 cell-area) | 3.62 | 2.24
Power-Delay Product, PD (ns.mW) | 40.18 | 307.16
Area-Power Product, AP (10^4 cell-area.mW) | 10.68 | 35.21
Area-Delay Product, AT (10^4 cell-area.ns) | 49.30 | 43.76
Area-Delay^2 Product, AT^2 (10^4 cell-area.ns^2) | 671.5 | 855.2


• Comparison of low latency architectures of TDPFADD and single data path FADD using FPGA technology

Parameters | TDPFADD | Single data path FADD
Average Power, Pa (W) at 2.38 MHz | 0.113 | 0.204
Worst case Power, Pw (W) at 2.38 MHz | 0.196 | 0.205
Power using real data, Preal (W) at 2.38 MHz | 0.138 | 0.183
Area, A, Total CLBs (#) | 115 | 73.7
Power-Delay Product, PD (ns.10mW) | 8.85 | 22.27
Area-Power Product, AP (10#.10mW) | 12.99 | 15.03
Area-Delay Product, AT (10#.ns) | 8196 | 8048
Area-Delay^2 Product, AT^2 (10#.ns^2) | 58.41 × 10^4 | 87.90 × 10^4


• Comparison of pipelined architectures of TDPFADD and single data path FADD using 0.13 micron CMOS technology

Parameters | TDPFADD | Single data path FADD
Average Power, Pa (mW) at 50 MHz | 3.87 | 6.00
Worst case Power, Pw (mW) at 50 MHz | 4.51 | 5.71
Power using real data, Preal (mW) at 50 MHz | 3.94 | 5.50
Area, A (10^4 cell-area) | 5.46 | 4.44
Power-Delay Product, PD (ns.mW) | 22.36 | 38.1
Area-Power Product, AP (10^4 cell-area.mW) | 21.13 | 26.64
Area-Delay Product, AT (10^4 cell-area.ns) | 31.55 | 28.19
Area-Delay^2 Product, AT^2 (10^4 cell-area.ns^2) | 182.40 | 179.03


• Comparison of pipelined structures of TDPFADD and single data path FADD using FPGA technology

Parameters | TDPFADD | Single data path FADD
Average Power, Pa (W) at 5 MHz | 0.089 | 0.111
Worst case Power, Pw (W) at 5 MHz | 0.1130 | 0.1197
Power using real data, Preal (W) at 5 MHz | 0.096 | 0.1141
Area, A, Total CLBs (#) | 147.11 | 104.66
Power-Delay Product, PD (ns.10mW) | 2.999 | 5.01
Area-Power Product, AP (10#.10mW) | 13.09 | 11.61
Area-Delay Product, AT (10#.ns) | 4957.60 | 4718.07
Area-Delay^2 Product, AT^2 (10#.ns^2) | 1.67 × 10^4 | 21.26 × 10^4


VHDL Modeling

Design Idea:

1. The length and depth parameters needed by some components are defined in the package pkg.vhd.

2. The parameters in pkg.vhd are generated by a C/C++ program from the user-defined exponent and significand lengths.

3. The VHDL components and the generated pkg.vhd together produce the FP adder.


VHDL Generation

1. Get the parameter lengths from the user.
2. C++ program: calculate the needed parameters.
3. Package pkg.vhd (generated VHDL code).
4. Structural VHDL code of the floating point adder.
5. Synthesize the floating point adder hardware.


Calculating the Parameters Using C/C++


Implementation Example 1

Input: Exponent Length = 8, Significand Length = 23

Generated package pkg.vhd:

library ieee;
use ieee.std_logic_1164.all;
package pkg is
  constant Exponent_Length    : positive := 8;
  constant Significand_Length : positive := 23;
  constant HideSig_Length     : positive := 27;
  constant HideSig_Depth      : positive := 5;
  constant LZA_Length         : positive := 28;
  constant LZA_Depth          : positive := 5;
  constant LZA_P2_Length      : positive := 32;
end pkg;


The synthesized FP Adder


•Simulation and Test Result

Test | Operands | Feature tested
Test1 | Two positive operands | Shifting, rounding
Test2 | Two negative operands | Sign
Test3 | Two operands with different sign | Sign
Test4 | Two operands with different sign (different order from Test3) | Sign
Test5 | Two operands with different sign | Leading zero
Test6 | One operand is very large and the other is quite small | Shifting, rounding
Test7 | Two operands with the same sign | Shifting due to carry out
Test8 | Two operands with the same sign | Rounding causes carry out
Test9 | Two operands with different sign | Underflow exception
Test10 | Two operands with the same sign | Overflow exception
Test11 | Two operands with different sign | Zero exception


Implementation Example 2

Input: Exponent Length = 4, Significand Length = 11

Generated package pkg.vhd:

library ieee;
use ieee.std_logic_1164.all;
package pkg is
  constant Exponent_Length    : positive := 4;
  constant Significand_Length : positive := 11;
  constant HideSig_Length     : positive := 15;
  constant HideSig_Depth      : positive := 4;
  constant LZA_Length         : positive := 16;
  constant LZA_Depth          : positive := 4;
  constant LZA_P2_Length      : positive := 16;
end pkg;
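A generator of this kind can be sketched in C++. The derivation rules below are inferred from the two generated packages shown in the examples and are assumptions, not the authors' documented formulas: HideSig_Length = Significand_Length + 4, LZA_Length = HideSig_Length + 1, each depth is the ceiling of log2 of the matching length, and LZA_P2_Length is 2 raised to LZA_Depth. The names `PkgParams`, `derive_params` and `emit_pkg` are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Smallest d such that 2^d >= n.
unsigned ceil_log2(unsigned n) {
    unsigned d = 0;
    while ((1u << d) < n) ++d;
    return d;
}

struct PkgParams {
    unsigned exponent_length, significand_length;
    unsigned hidesig_length, hidesig_depth;
    unsigned lza_length, lza_depth, lza_p2_length;
};

// Derivation rules inferred from the two example packages (assumption).
PkgParams derive_params(unsigned exponent_length, unsigned significand_length) {
    assert(exponent_length > 0 && significand_length > 0 && significand_length < 1000);
    PkgParams p;
    p.exponent_length = exponent_length;
    p.significand_length = significand_length;
    p.hidesig_length = significand_length + 4;
    p.hidesig_depth = ceil_log2(p.hidesig_length);
    p.lza_length = p.hidesig_length + 1;
    p.lza_depth = ceil_log2(p.lza_length);
    p.lza_p2_length = 1u << p.lza_depth;
    return p;
}

// Emit the VHDL package text in the same shape as the generated examples.
std::string emit_pkg(const PkgParams& p) {
    std::ostringstream out;
    out << "library ieee;\nuse ieee.std_logic_1164.all;\npackage pkg is\n"
        << "  constant Exponent_Length : positive := " << p.exponent_length << ";\n"
        << "  constant Significand_Length : positive := " << p.significand_length << ";\n"
        << "  constant HideSig_Length : positive := " << p.hidesig_length << ";\n"
        << "  constant HideSig_Depth : positive := " << p.hidesig_depth << ";\n"
        << "  constant LZA_Length : positive := " << p.lza_length << ";\n"
        << "  constant LZA_Depth : positive := " << p.lza_depth << ";\n"
        << "  constant LZA_P2_Length : positive := " << p.lza_p2_length << ";\n"
        << "end pkg;\n";
    return out.str();
}
```

With inputs (8, 23) and (4, 11) these rules reproduce the constants of both example packages; other input lengths are untested against the real tool.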


The synthesized FP Adder


The Synthesized FADD


Conclusion

•A scalable-length FP adder is generated
•The length of the adder is given by the user through C/C++
•The objective function is also stated
•A structural FP adder is modeled in VHDL
•The adder is synthesizable
•Depending on the power-area-delay requirement, a simple, TDPFADD, pipelined, or pipelined TDPFADD adder is generated
•The adder can also be pipelined


[Figure: LZA-based adder block diagram. Inputs: exponents e1, e2, significands s1, s2, sign 1, sign 2. Blocks: control, exponent subtract, exponent difference, 0/1 MUXes, compare, right shifter, bit inverters, LZA logic, LZA counter, 56b adder, left shift, incrementer, rounding control, selector, compensation shifter, exponent incrementer, sign control. Output: sign. Blocks are annotated with "X" or "S & M".]


•VHDL Modeling

1. Package for Length and Depth Parameters

2. Components of the FP Adder

3. Top Configuration of the FP Adder


1. Package for Length and Depth Parameters

Input parameters:
•Significand length
•Exponent length

Output parameters:
•Significand length for calculation
•Significand length for shifting
•Significand depth for shifting
•Exponent length


•Exponent Difference

Calculates the difference of the two exponents.

[Figure: two subtractors (SUB) operate on EXP1 and EXP2; a MUX selects the positive difference.]
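The exponent difference block, two subtractors followed by a MUX that selects the positive result, can be modeled as in this C++ sketch. The type and function names are hypothetical, and the sketch assumes the MUX select comes from comparing the two exponents (in hardware, typically the borrow of one subtractor).

```cpp
#include <cassert>

// Result of the exponent difference block (hypothetical type name).
struct ExpDiff {
    unsigned difference;      // |exp1 - exp2|
    bool first_not_smaller;   // true when exp1 >= exp2 (MUX select)
};

ExpDiff exponent_difference(unsigned exp1, unsigned exp2, unsigned exponent_length) {
    assert(exponent_length >= 1 && exponent_length <= 16);
    const unsigned mask = (1u << exponent_length) - 1u;
    assert(exp1 <= mask && exp2 <= mask);

    // Both subtractions are computed concurrently, as with two SUB blocks;
    // the negative one wraps around and is simply not selected.
    unsigned d12 = (exp1 - exp2) & mask;
    unsigned d21 = (exp2 - exp1) & mask;

    ExpDiff out;
    out.first_not_smaller = exp1 >= exp2;
    out.difference = out.first_not_smaller ? d12 : d21;
    return out;
}
```

Computing both differences in parallel avoids a negate step after a single subtraction, at the cost of a second subtractor.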


•Significand Comparison

Equations for Comparison

A > B if (a_n > b_n) OR (a_n = b_n AND a_n-1 > b_n-1) OR (a_n = b_n AND a_n-1 = b_n-1 AND a_n-2 > b_n-2) OR …

A = B if a_n = b_n AND a_n-1 = b_n-1 AND a_n-2 = b_n-2 …

A < B if (a_n < b_n) OR (a_n = b_n AND a_n-1 < b_n-1) OR (a_n = b_n AND a_n-1 = b_n-1 AND a_n-2 < b_n-2) OR …
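The comparison equations amount to scanning from the most significant bit and deciding at the first position where the operands differ. A minimal C++ sketch follows; it is sequential, whereas the hardware evaluates all terms in parallel. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Order { Less, Equal, Greater };

// Compares the low `width` bits of a and b, most significant bit first.
Order compare_significands(std::uint32_t a, std::uint32_t b, unsigned width) {
    assert(width >= 1 && width <= 32);
    for (unsigned i = width; i-- > 0;) {
        bool ai = ((a >> i) & 1u) != 0;
        bool bi = ((b >> i) & 1u) != 0;
        // All higher bits were equal, so this bit decides the order.
        if (ai != bi) return ai ? Order::Greater : Order::Less;
    }
    return Order::Equal;  // every bit pair matched
}
```

Each loop iteration corresponds to one OR term of the equations: equality of all higher bits combined with an inequality at the current bit.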


•Right Shifter and GRS-bit Generation

Right shifts the smaller operand according to the exponent difference.

Input (for example):
A: 8-bit operand
S: 5-bit exponent difference

Output:
C7..0 GRS: 11 bits, the shifted significand extended with the G, R and S bits
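The alignment shift with guard (G), round (R) and sticky (S) bits can be modeled as below. This C++ sketch is an assumption-laden illustration: it treats G and R as the first two bits shifted out and S as the OR of all remaining shifted-out bits, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns width + 3 bits: the shifted significand followed by G, R and S.
// a: operand of `width` bits; shift: exponent difference.
std::uint32_t shift_right_grs(std::uint32_t a, unsigned shift, unsigned width) {
    assert(width >= 1 && width <= 28);
    assert(a < (1u << width));

    // Append three zero bits so that G, R and S have a place to land.
    std::uint64_t extended = static_cast<std::uint64_t>(a) << 3;
    unsigned total = width + 3;

    // Shifting past every bit leaves only the sticky information.
    if (shift >= total) return a != 0u ? 1u : 0u;

    // Bits that fall off the right end are ORed into the sticky bit.
    std::uint64_t lost = extended & ((std::uint64_t(1) << shift) - 1u);
    std::uint32_t shifted = static_cast<std::uint32_t>(extended >> shift);
    return shifted | (lost != 0u ? 1u : 0u);
}
```

For the 8-bit operand 10110101 shifted by 2, the result is 00101101 with G R S = 0 1 0, that is 0x16A as an 11-bit word.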


•Right Shifter and GRS-bit Generation: right shift with variable length

[Figure: cascaded MUX stages between "Significand In" and "Right shift Out", producing the G, R and S bits.]


•Manchester Adder/Subtractor

[Figure: Manchester adder/subtractor. Inputs Sig1, Sig2, +/- and Cin; stages: inverter, PG generate, carry (C) chain, sum. Outputs: SUM, Cout.]


•Leading Zero Anticipation Logic

Anticipates the leading zeros of the addition result.

Input: A, B, n-bit operands
Output: E, (n+1)-bit significand with the anticipated leading zeros

The anticipated position may be off by one bit.


•Leading Zero Counter

LZ(X) = LZ(XL), if XL is not all zeros
LZ(X) = n/2 + LZ(XR), if XL is all zeros

where XL and XR are the left and right n/2-bit halves of the n-bit word X.
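The divide-and-conquer leading zero count can be sketched recursively in C++. The sketch assumes the usual reading of the recurrence: count in the left half XL when it contains a one, otherwise add n/2 to the count of the right half XR. The width must be a power of two, which appears to be why the generated package carries a power-of-two LZA length; that link is an inference, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Counts leading zeros in the low n bits of x; n must be a power of two.
unsigned leading_zeros(std::uint32_t x, unsigned n) {
    assert(n >= 1 && n <= 32 && (n & (n - 1)) == 0);
    if (n == 1) return (x & 1u) != 0 ? 0u : 1u;

    unsigned half = n / 2;
    std::uint32_t mask = (1u << half) - 1u;  // half is at most 16 here
    std::uint32_t left = (x >> half) & mask;
    std::uint32_t right = x & mask;

    // A one in the left half decides the count; otherwise the whole left
    // half contributes n/2 zeros and the search continues on the right.
    return left != 0u ? leading_zeros(left, half)
                      : half + leading_zeros(right, half);
}
```

The recursion depth is log2(n), matching the depth parameters that accompany each length in the package.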


•Normalization Shifter (left barrel shifter)


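•Rounding Logic

Round up = G · (M0 + R + S)

where G, R and S are the guard, round and sticky bits, M0 is the least significant bit of the significand, "·" is AND and "+" is OR.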
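The rounding expression G(M0+R+S) can be applied to a significand that still carries its G, R and S bits, as in this C++ sketch. It assumes round-to-nearest-even and renormalizes by one position when the increment carries out; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round-up decision: G AND (M0 OR R OR S).
bool round_up(bool m0, bool g, bool r, bool s) {
    return g && (m0 || r || s);
}

struct Rounded {
    std::uint32_t significand;  // `width` bits after rounding
    bool carry_out;             // true when rounding overflowed the width
};

// with_grs: `width` significand bits followed by G, R and S.
Rounded round_significand(std::uint32_t with_grs, unsigned width) {
    assert(width >= 1 && width <= 28);
    assert((with_grs >> (width + 3)) == 0u);
    bool g = ((with_grs >> 2) & 1u) != 0;
    bool r = ((with_grs >> 1) & 1u) != 0;
    bool s = (with_grs & 1u) != 0;
    std::uint32_t m = with_grs >> 3;

    if (round_up((m & 1u) != 0, g, r, s)) ++m;

    Rounded out;
    out.carry_out = ((m >> width) & 1u) != 0;
    // On carry out the result is 1000...0; shift right by one position and
    // let the exponent logic increment the exponent.
    out.significand = out.carry_out ? (m >> 1) : m;
    return out;
}
```

The carry-out case corresponds to the "rounding causes carry out" situation, where the exponent has to be incremented after rounding.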


•Half/Full Adder

[Figure: chain of half adders (HA) and a full adder (FA) for exponent adjustment. Inputs: Exp in, Rounding Up, Carry out + LZA error. Outputs: Exp out, Over Flow.]


3. Top Configuration of FP Adder

•Significand
•Exponent
•Sign
•Exception Handling


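•Significand

[Figure: significand data path. Inputs Sig1 and Sig2 pass through two MUXes to the adder/subtractor (+/-) and the LZA, followed by the LZ counter, left shift, rounding and exception blocks. Output: Significand Out.]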


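•Exponent

[Figure: exponent data path. Inputs E1 and E2 feed a SUB block and a MUX that selects the larger exponent; a half/full adder adjusts it using the LZA counter value, Rounding up and Carry out + LZA error. Outputs: E3, Under Flow, Over Flow.]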


•Sign

Select logic:
1. Sign of the operand with the larger exponent
2. If the exponents are equal, sign of the operand with the larger significand

[Figure: a MUX driven by the select logic chooses Sign3 from Sign1 and Sign2.]


•Exception Handling

[Figure: control logic drives two 4-input MUXes (select values 00, 01, 10, 11) that choose Exponent_out and Significand_out from Exponent3, Significand3 and the constants "00..0" and "11..1"; a Denor_out flag is also produced.]

Exponent | Significand | Object represented | Control Logic
0 | 0 | 0 (zero) | 11
0 | Nonzero | Denormalized number | 01
1 to 254 | Anything | Floating-point number | 00
255 | 0 | Infinity | 10
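The classification table can be expressed as a short C++ function. The sketch generalizes the single-precision values 0 and 255 to "all zeros" and "all ones" for an arbitrary exponent length, which is an assumption about the scalable design. The NaN case (all-ones exponent with nonzero significand) is not listed in the table and is added here from the IEEE 754 definition. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class FpClass { Zero, Denormal, Normal, Infinity, NaN };

// exponent: biased exponent field; significand: fraction field;
// exponent_length: width of the exponent field in bits.
FpClass classify(unsigned exponent, std::uint32_t significand,
                 unsigned exponent_length) {
    assert(exponent_length >= 2 && exponent_length <= 16);
    const unsigned all_ones = (1u << exponent_length) - 1u;  // 255 for 8 bits
    assert(exponent <= all_ones);

    if (exponent == 0) {
        return significand == 0 ? FpClass::Zero : FpClass::Denormal;
    }
    if (exponent == all_ones) {
        // Nonzero significand is a NaN per IEEE 754 (not in the table above).
        return significand == 0 ? FpClass::Infinity : FpClass::NaN;
    }
    return FpClass::Normal;  // 1 to 254 for single precision
}
```

The resulting class corresponds to the control codes in the table: 11 for zero, 01 for denormalized, 00 for a regular floating-point number and 10 for infinity.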


Comparison of Synthesis Results for IEEE 754 Single Precision FP Addition Using Xilinx Virtex-II FPGA

Parameters | SIMPLE | TDPFADD | PIPE/TDPFADD
Maximum delay, D (ns) | 327.6 | 213.8 | 101.11
Average Power, P (mW) at 2.38 MHz | 1836 | 1024 | 382.4
Area, A, total number of CLBs (#) | 664 | 1035 | 1324
Power-Delay Product (ns.10mW) | 7.7 × 10^4 | 4.31 × 10^4 | 3.82 × 10^4
Area-Delay Product (10#.ns) | 2.18 × 10^4 | 2.21 × 10^4 | 1.34 × 10^4
Area-Delay^2 Product (10#.ns^2) | 7.13 × 10^6 | 4.73 × 10^6 | 1.35 × 10^6


Main Blocks

What blocks are considered?

• Compound Adder with Flagged Prefix Adder (New)
• LOP with Concurrent Position Correction (New)
• Alignment Shifter
• Normalization Shifter

Compound Adder (cont.)

• Round to nearest
  if g = 1
    if (LSB = 1) OR (r + s = 1): add 1 to the result
    else: truncate at LSB
• Round toward zero
  Truncate
• Round toward +Infinity
  if sign = positive
    if any bits to the right of the result LSB = 1: add 1 to the result
    else: truncate at LSB
  if sign = negative: truncate at LSB
• Round toward -Infinity
  if sign = negative
    if any bits to the right of the result LSB = 1: add 1 to the result
    else: truncate at LSB
  if sign = positive: truncate at LSB
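The four rounding rules reduce to a single "add 1 or truncate" decision, which is what selects between the compound adder outputs. The C++ sketch below encodes the rules as listed; it reads "r + s" as a logical OR and "any bits to the right of the result LSB" as g OR r OR s, both of which are assumptions. The names are hypothetical.

```cpp
#include <cassert>

enum class RoundingMode { Nearest, TowardZero, TowardPlusInf, TowardMinusInf };

// Returns true when 1 must be added to the magnitude at the LSB position;
// false means truncate at LSB.
bool needs_increment(RoundingMode mode, bool negative,
                     bool lsb, bool g, bool r, bool s) {
    const bool inexact = g || r || s;  // any one bit to the right of the LSB
    switch (mode) {
        case RoundingMode::Nearest:
            return g && (lsb || r || s);
        case RoundingMode::TowardZero:
            return false;
        case RoundingMode::TowardPlusInf:
            return !negative && inexact;
        case RoundingMode::TowardMinusInf:
            return negative && inexact;
    }
    assert(false && "unknown rounding mode");
    return false;
}
```

Because the decision is a single bit, the rounded result can be picked from precomputed candidates rather than produced by a separate incrementer.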

Rounding Block

Sum, Sum+1

Sum

Sum, Sum+1 and Sum+2

Sum, Sum+1 and Sum+2
