AICCSA’06 Sharja 1
A CAD Tool for Scalable Floating Point Adder Design and Generation
Using C++/VHDL
By
Asim J. Al-Khalili
Overview
•Introduction to Floating point Addition
•Architecture of Single Path FADD
•Activity Scaling
•Triple Data Path Floating Point Adder
•VHDL Modeling
•Results
•Implementation
•FP Representation
Bit position:  31      30 ... 23   22 ... 0
Field:         Sign    Exponent    Significand
Width:         1 bit   8 bits      23 bits
FP representation: ±1.XXXXX (base 2) × 2^YYYY
(IEEE 754 floating-point standard, single precision)
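The field layout above can be checked in software. The following is a minimal C++ sketch, assuming a platform whose `float` is the 32-bit IEEE 754 single format; the names `Fp32Fields`, `unpack_fp32` and `pack_fp32` are illustrative and do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative container for the three fields of a single-precision number.
struct Fp32Fields {
    std::uint32_t sign;         // bit 31
    std::uint32_t exponent;     // bits 30..23 (biased by 127)
    std::uint32_t significand;  // bits 22..0 (fraction, hidden 1 not stored)
};

// Split a float into sign, exponent and significand fields.
inline Fp32Fields unpack_fp32(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern
    return Fp32Fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Reassemble the fields into a float.
inline float pack_fp32(const Fp32Fields& f) {
    assert(f.sign <= 1u && f.exponent <= 0xFFu && f.significand <= 0x7FFFFFu);
    const std::uint32_t bits = (f.sign << 31) | (f.exponent << 23) | f.significand;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, -6.5 is -1.101 (base 2) × 2^2, so `unpack_fp32(-6.5f)` yields sign 1, exponent 129 (2 + 127) and significand 0x500000.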
Floating point Addition
Start
1. Compare the exponents of the two numbers.
2. Shift the smaller number to the right until its exponent matches the larger exponent.
3. Add the significands.
4. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.
   Overflow/Underflow? Yes: Exceptions. No: go to step 5.
5. Round the significand to the appropriate number of bits.
   Still normalized? No: go back to step 4. Yes: Done.
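The five steps can be traced in a small software model. The sketch below is deliberately restricted and is not the adder presented in this talk: it assumes two positive, normalized single-precision operands and round to nearest even, so the sum never needs a left shift or a sign decision. The helper names `shift_right_sticky` and `fadd_positive_normal` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Shift right by d and OR every bit shifted out into bit 0 (the sticky bit).
inline std::uint32_t shift_right_sticky(std::uint32_t x, unsigned d) {
    if (d == 0) return x;
    if (d >= 32) return x != 0 ? 1u : 0u;
    const std::uint32_t lost = x & ((1u << d) - 1u);
    return (x >> d) | (lost != 0 ? 1u : 0u);
}

// Adds two positive, normalized single-precision numbers (round to nearest even).
inline float fadd_positive_normal(float a, float b) {
    std::uint32_t ba = 0, bb = 0;
    std::memcpy(&ba, &a, sizeof ba);
    std::memcpy(&bb, &b, sizeof bb);
    std::uint32_t ea = (ba >> 23) & 0xFFu;
    std::uint32_t eb = (bb >> 23) & 0xFFu;
    assert((ba >> 31) == 0 && (bb >> 31) == 0);            // positive operands only
    assert(ea >= 1 && ea <= 254 && eb >= 1 && eb <= 254);  // normalized operands only

    // 24-bit significands (hidden 1 restored) with three extra low bits G, R, S.
    std::uint32_t sa = ((ba & 0x7FFFFFu) | 0x800000u) << 3;
    std::uint32_t sb = ((bb & 0x7FFFFFu) | 0x800000u) << 3;

    // Step 1: compare exponents so that (ea, sa) is the larger operand.
    if (ea < eb) { std::swap(ea, eb); std::swap(sa, sb); }
    // Step 2: shift the smaller significand right by the exponent difference.
    sb = shift_right_sticky(sb, ea - eb);
    // Step 3: add the significands.
    std::uint32_t sum = sa + sb;
    std::uint32_t e = ea;
    // Step 4: normalize (a carry out needs one right shift and an exponent increment).
    if (sum & (1u << 27)) { sum = shift_right_sticky(sum, 1); ++e; }
    // Step 5: round to nearest even using G, R, S; renormalize if rounding overflows.
    std::uint32_t mant = sum >> 3;
    const bool g = (sum & 4u) != 0, r = (sum & 2u) != 0, s = (sum & 1u) != 0;
    if (g && (r || s || (mant & 1u) != 0)) {
        ++mant;
        if (mant & (1u << 24)) { mant >>= 1; ++e; }
    }
    // Overflow exception: exponent out of range gives +infinity.
    const std::uint32_t out = (e >= 255u) ? 0x7F800000u : ((e << 23) | (mant & 0x7FFFFFu));
    float result = 0.0f;
    std::memcpy(&result, &out, sizeof result);
    return result;
}
```

On hardware with IEEE 754 arithmetic in its default rounding mode, the result can be compared directly with the native `a + b` for operands inside the supported range.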
Architecture Consideration
What’s the best architecture?
FP adder functions include:
•Sign identification
•Exponent comparison
•Right shift of the smaller significand
•Significand comparison (if the exponents are equal)
•Significand inverter
•Addition and leading zero anticipation
•Normalization (left shifting)
•Rounding
•Shift after rounding
•Compensation shifting
•Exception handler
Architecture of TDPFADD
[Figure: block diagram of the triple data path floating point adder. Recoverable block labels: Input Floating Point Numbers; Exponents; Exponent Logic; Control Logic; Bypass Logic; Data Selector/Pre-align (0/1 Bit Right Shifter); Adder/Rounding Logic; Leading Zero Counting Logic; Normalization (Left Barrel Shifter); Result Selector; Exponent Subtractor; Data Selector; Pre-alignment (Right Barrel Shifter)/Complementer; Adder/Rounding Logic; Normalization (1 bit Right/Left Shifter); Exponent Incr/Decr; Result Selector; Result Integration/Flag Logic; outputs Flags and IEEE Sum.]
Transition activity scaling
[Figure: state diagram of the TDPFADD with states I, J and K; transitions are labelled BP (bypass), LZA and LZB.]
• State assertion conditions of TDPFADD
State | Active data path | State assertion criterion | Activity scaled blocks
I | Bypass | Either exponent is zero or emax+1, or edif > p | Entire TDPFADD except Bypass data path and Exponent, Control, and Result Int./Flag units
J | LZA | No Bypass and subtraction and edif ≤ 1 (LZs ≤ p) | Pre-alignment barrel shifter (large)
K | LZB | No Bypass and addition or edif > 1 (LZs ≤ 1) | LZA logic and normalization barrel shifter (large)
Probabilities of the Paths
With IEEE single precision floating point data format, the probability that the FADD is in states A, B or C is given by
P(A) = 0.8177, P(B) = 0.1765 and P(C) = 0.0058.
Here, it is assumed that the exponents are independent, uniformly distributed random variables and the events of addition and subtraction are equally likely.
With IEEE double precision floating point format
P(A) = 0.9484, P(B) = 0.0509 and P(C) = 7 × 10^-4.
The time averaged power consumption (expected value) of a transition activity scaled FADD whose operational states are represented by Fig. 2 is given by
Power = P(A) * PA + P(B) * PB + P(C) * PC
where PA, PB and PC represent the time averaged power consumption of the FADD in states A, B and C respectively.
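The expected-power formula is a probability-weighted sum, which the following C++ sketch evaluates. The state probabilities are the ones quoted on this slide; the per-state powers PA, PB and PC are not given in the transcript, so they are left as caller-supplied parameters, and all identifiers are illustrative.

```cpp
#include <cassert>
#include <cmath>

// State probabilities P(A), P(B), P(C) as quoted on the slide.
struct StateProbabilities {
    double a;
    double b;
    double c;
};

constexpr StateProbabilities kSinglePrecision{0.8177, 0.1765, 0.0058};
constexpr StateProbabilities kDoublePrecision{0.9484, 0.0509, 7e-4};

// True when the three probabilities add up to 1 within a small tolerance.
inline bool sums_to_one(const StateProbabilities& p) {
    return std::fabs(p.a + p.b + p.c - 1.0) < 1e-9;
}

// Power = P(A)*PA + P(B)*PB + P(C)*PC, with PA, PB, PC supplied by the caller.
inline double expected_power(const StateProbabilities& p, double pa, double pb, double pc) {
    assert(pa >= 0.0 && pb >= 0.0 && pc >= 0.0);
    return p.a * pa + p.b * pb + p.c * pc;
}
```

Both probability sets on the slide sum to 1, so a state that is rarely entered (C) contributes little to the average even if its power is high.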
Pipelined TDPFADD
[Figure: the TDPFADD block diagram divided into five pipeline stages (1st to 5th) with the critical path marked. Recoverable block labels: Input Floating Point Numbers; Exponents; Exponent Logic; Control Logic; Bypass Logic; Data Selector/Pre-align (0/1 Bit Right Shifter); Adder/Rounding Logic; Leading Zero Counter; Normalization (Barrel Shifter Left); Result Selector; Exponent Subtractor; Data Selector; Pre-alignment (Barrel Shifter Right)/Complementer; Adder/Rounding Logic; Normalization (1 bit Right/Left Shifter); Exponent Incr/Decr; Result Selector; Result Integration/Flag Logic; outputs Flag and IEEE Sum.]
Architecture Consideration
Straightforward IEEE floating-point addition algorithm:
1. Exponent subtraction.
2. Alignment.
3. Significand addition.
4. Conversion.
5. Leading-one detection.
6. Normalization.
7. Rounding.
[Figure: single path block diagram with steps 1 to 7 marked. Inputs opA and opB (S, EXP., SIGNIF.). Recoverable block labels: exponent difference; 0/1 multiplexers; control; right shifter; bit inverter; 2's complement adder; complement; LOP; normalization; rounding. Control signals: sign(d), sign opA, sign opB, add/sub, d, MSB and Cout.]
Advantages:
1. Positive result, eliminates the complement
2. Comparison in parallel with alignment
3. Full normalization in parallel with rounding
[Figure: modified single path block diagram. Additional block labels: comparator; second bit inverter; MUX; 1-bit shifter.]
Compound Adder
How can a compound adder compute fastest?
The compound adder computes simultaneously the sum and the sum plus one, and then the correct rounded result is obtained by selecting according to the requirements of the rounding.
[Equation: case table giving the compound adder operands for effective addition and for effective subtraction (A-B and B-A); the terms are not recoverable from the transcript.]
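The selection idea can be illustrated with a short C++ sketch. In hardware the two results are produced in parallel by shared carry logic; the software model below simply computes both values and then selects, so it shows the interface rather than the speed advantage. `CompoundSum`, `compound_add` and `select_rounded` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Both outputs of a compound adder for the same pair of operands.
struct CompoundSum {
    std::uint32_t sum;           // A + B
    std::uint32_t sum_plus_one;  // A + B + 1
};

// Adds two operands of `width` bits; the carry out is kept in bit `width`.
inline CompoundSum compound_add(std::uint32_t a, std::uint32_t b, unsigned width) {
    assert(width >= 1 && width <= 30);
    const std::uint32_t mask = (1u << width) - 1u;
    assert((a & ~mask) == 0 && (b & ~mask) == 0);  // operands must fit in `width` bits
    const std::uint32_t sum = a + b;
    return CompoundSum{sum, sum + 1u};
}

// The rounding decision only selects between the two precomputed results.
inline std::uint32_t select_rounded(const CompoundSum& s, bool round_up) {
    return round_up ? s.sum_plus_one : s.sum;
}
```

Because rounding becomes a selection, no second carry-propagate addition is needed after the significand addition.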
Architecture Consideration Cont.
[Figure: double path block diagram with a FAR path and a CLOSE path. Inputs opA and opB (S, EXP., SIGNIF.). Recoverable block labels: exponent difference; 0/1 multiplexers; control; right shifter; bit inverters; 1-bit shifters; two 2's complement adders; complement; LOP; normalization; rounding blocks; MUX. Control signals: sign(d), sign opA, sign opB, add/sub, d.]
FAR path: effective addition, or effective subtraction with d>1
CLOSE path: effective subtraction with d=0,1
Compared to the single path:
Reduce latency
FAR data-path:
--No Conversion
--No Full normalization
--No LOP
CLOSE data-path:
--No Full Alignment
Reduce total path delay
--eliminate Comparator
Increase area
--two 2’s COMP ADDER
The latency of the floating-point addition can be improved if the rounding is combined with the addition/subtraction.
[Figure: double path block diagram in which each path uses a compound adder. Recoverable block labels: exponent difference; 0/1 multiplexers; control; right shifter; bit inverters; 1-bit shifters; two compound adders; LOP; normalization; MUX. FAR path: effective addition, or effective subtraction with d>1. CLOSE path: effective subtraction with d=0,1.]
[Figure: floating point adder block diagram. Inputs: exponents e1, e2; significands s1, s2; sign 1, sign 2. Recoverable block labels: Control; exponent compare; exponent difference; exponent subtract; exponent incrementer; 0/1 multiplexers; right shifter; two bit inverters; LZA logic; LZA counter; 56b adder; left shift; incrementer; rounding control; selector; compensation shifter; sign control; sign. Two blocks are marked ABSENT.]
•Comparison of low latency architectures of TDPFADD and single data path FADD using 0.13 micron CMOS technology
Parameters | TDPFADD | Single data path FADD
Maximum Delay, D (ns) | 13.62 | 19.54
Average Power, Pa (mW) at 16.7 MHz | 2.95 | 15.72
Worst case Power, Pw (mW) at 16.7 MHz | 4.21 | 5.13
Power using real data, Preal (mW) at 16.7 MHz | 3.41 | 4.58
Area, A (10^4 cell-area) | 3.62 | 2.24
Power-Delay Product, PD (ns.mW) | 40.18 | 307.16
Area-Power Product, AP (10^4 cell-area.mW) | 10.68 | 35.21
Area-Delay Product, AT (10^4 cell-area.ns) | 49.30 | 43.76
Area-Delay^2 Product, AT^2 (10^4 cell-area.ns^2) | 671.5 | 855.2
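The last four rows of this table appear to be products of the first rows, using the average power Pa (for example 13.62 ns × 2.95 mW ≈ 40.18 ns.mW). The C++ sketch below recomputes them under that assumption; the struct and function names are illustrative and not part of the tool.

```cpp
#include <cassert>
#include <cmath>

// Primary figures of merit, in the units used by the table:
// delay in ns, average power in mW, area in 10^4 cell-area.
struct AdderMetrics {
    double delay_ns;
    double power_mw;
    double area;
};

inline double power_delay(const AdderMetrics& m) { return m.delay_ns * m.power_mw; }
inline double area_power(const AdderMetrics& m)  { return m.area * m.power_mw; }
inline double area_delay(const AdderMetrics& m)  { return m.area * m.delay_ns; }

inline double area_delay2(const AdderMetrics& m) {
    assert(m.delay_ns > 0.0 && m.area > 0.0);
    return m.area * m.delay_ns * m.delay_ns;
}

// True when a recomputed value agrees with the reported one within `tol`.
inline bool matches(double computed, double reported, double tol) {
    return std::fabs(computed - reported) <= tol;
}
```

With the TDPFADD column (`AdderMetrics{13.62, 2.95, 3.62}`) the recomputed products agree with the reported 40.18, 10.68, 49.30 and 671.5 to within rounding.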
•Comparison of low latency architectures of TDPFADD and single data path FADD using FPGA technology
Parameters | TDPFADD | Single data path FADD
Maximum Delay, D (ns) | 71.27 | 109.21
Average Power, Pa (W) at 2.38 MHz | 0.113 | 0.204
Worst case Power, Pw (W) at 2.38 MHz | 0.196 | 0.205
Power using real data, Preal (W) at 2.38 MHz | 0.138 | 0.183
Area, A, Total CLBs (#) | 115 | 73.7
Power-Delay Product, PD (ns.10mW) | 8.85 | 22.27
Area-Power Product, AP (10#.10mW) | 12.99 | 15.03
Area-Delay Product, AT (10#.ns) | 8196 | 8048
Area-Delay^2 Product, AT^2 (10#.ns^2) | 58.41 × 10^4 | 87.90 × 10^4
•Comparison of pipelined architectures of TDPFADD and single data path FADD using 0.13 micron CMOS technology
Parameters | TDPFADD | Single data path FADD
Maximum Delay, D (ns) | 5.78 | 6.35
Average Power, Pa (mW) at 50 MHz | 3.87 | 6.00
Worst case Power, Pw (mW) at 50 MHz | 4.51 | 5.71
Power using real data, Preal (mW) at 50 MHz | 3.94 | 5.50
Area, A (10^4 cell-area) | 5.46 | 4.44
Power-Delay Product, PD (ns.mW) | 22.36 | 38.1
Area-Power Product, AP (10^4 cell-area.mW) | 21.13 | 26.64
Area-Delay Product, AT (10^4 cell-area.ns) | 31.55 | 28.19
Area-Delay^2 Product, AT^2 (10^4 cell-area.ns^2) | 182.40 | 179.03
•Comparison of pipelined structures of TDPFADD and single data path FADD using FPGA technology
Parameters | TDPFADD | Single data path FADD
Maximum Delay, D (ns) | 33.70 | 45.08
Average Power, Pa (W) at 5 MHz | 0.089 | 0.111
Worst case Power, Pw (W) at 5 MHz | 0.1130 | 0.1197
Power using real data, Preal (W) at 5 MHz | 0.096 | 0.1141
Area, A, Total CLBs (#) | 147.11 | 104.66
Power-Delay Product, PD (ns.10mW) | 2.999 | 5.01
Area-Power Product, AP (10#.10mW) | 13.09 | 11.61
Area-Delay Product, AT (10#.ns) | 4957.60 | 4718.07
Area-Delay^2 Product, AT^2 (10#.ns^2) | 1.67 × 10^4 | 21.26 × 10^4
VHDL Modeling
Design Idea:
1. The length and depth parameters needed by some components are defined in the package pkg.vhd.
2. The parameters in pkg.vhd are generated by a C/C++ program from the user-defined Exponent and Significand lengths.
3. The VHDL components together with the generated pkg.vhd produce the FP Adder.
VHDL Generation
1. Get the parameter lengths from the user.
2. C++ program: calculate the needed parameters.
3. Package pkg.vhd
4. Structural VHDL code of the floating point adder (pkg.vhd combined with the VHDL code of the components)
5. Synthesize the floating point adder hardware.
Calculating the Parameters Using C/C++
Implementation Example 1
Input: Exponent Length = 8, Significand Length = 23
Generated package pkg.vhd :
library ieee;
use ieee.std_logic_1164.all;
package pkg is
  constant Exponent_Length    : positive := 8;
  constant Significand_Length : positive := 23;
  constant HideSig_Length     : positive := 27;
  constant HideSig_Depth      : positive := 5;
  constant LZA_Length         : positive := 28;
  constant LZA_Depth          : positive := 5;
  constant LZA_P2_Length      : positive := 32;
end pkg;
The synthesized FP Adder
•Simulation and Test Result
Test | Operands | Function tested
Test1 | Two positive operands | Shifting, Rounding
Test2 | Two negative operands | Sign
Test3 | Two operands with different sign | Sign
Test4 | Two operands with different sign (different order from Test3) | Sign
Test5 | Two operands with different sign | Leading Zero
Test6 | One operand is very large and the other is quite small | Shifting, Rounding
Test7 | Two operands with the same sign | Shifting due to carry out
Test8 | Two operands with the same sign | Rounding causes carry out
Test9 | Two operands with different sign | Underflow exception
Test10 | Two operands with the same sign | Overflow exception
Test11 | Two operands with different sign | Zero exception
Implementation Example 2
Input: Exponent Length = 4, Significand Length = 11
Generated package pkg.vhd :
library ieee;
use ieee.std_logic_1164.all;
package pkg is
  constant Exponent_Length    : positive := 4;
  constant Significand_Length : positive := 11;
  constant HideSig_Length     : positive := 15;
  constant HideSig_Depth      : positive := 4;
  constant LZA_Length         : positive := 16;
  constant LZA_Depth          : positive := 4;
  constant LZA_P2_Length      : positive := 16;
end pkg;
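The C/C++ parameter calculation itself is not reproduced in this transcript. The sketch below is a hypothetical reconstruction: its formulas are inferred only from the two generated packages (Example 1 and Example 2) and happen to reproduce both, so they should be read as assumptions rather than as the tool's actual code. `ceil_log2`, `PkgParams`, `derive_params` and `make_pkg_vhd` are illustrative names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Smallest depth such that 2^depth >= n.
inline unsigned ceil_log2(unsigned n) {
    assert(n >= 1 && n <= (1u << 30));
    unsigned depth = 0;
    while ((1u << depth) < n) ++depth;
    return depth;
}

struct PkgParams {
    unsigned exponent_length, significand_length;
    unsigned hidesig_length, hidesig_depth;
    unsigned lza_length, lza_depth, lza_p2_length;
};

// Inferred relations (assumptions): HideSig_Length = Significand_Length + 4,
// LZA_Length = HideSig_Length + 1, depths = ceil(log2(length)),
// LZA_P2_Length = 2^LZA_Depth.
inline PkgParams derive_params(unsigned exponent_length, unsigned significand_length) {
    assert(exponent_length >= 1 && significand_length >= 1 && significand_length <= 1024);
    PkgParams p{};
    p.exponent_length = exponent_length;
    p.significand_length = significand_length;
    p.hidesig_length = significand_length + 4;
    p.hidesig_depth = ceil_log2(p.hidesig_length);
    p.lza_length = p.hidesig_length + 1;
    p.lza_depth = ceil_log2(p.lza_length);
    p.lza_p2_length = 1u << p.lza_depth;
    return p;
}

// Emit the package text in the same shape as the generated pkg.vhd shown above.
inline std::string make_pkg_vhd(const PkgParams& p) {
    std::ostringstream out;
    out << "library ieee;\nuse ieee.std_logic_1164.all;\npackage pkg is\n"
        << "  constant Exponent_Length : positive := " << p.exponent_length << ";\n"
        << "  constant Significand_Length : positive := " << p.significand_length << ";\n"
        << "  constant HideSig_Length : positive := " << p.hidesig_length << ";\n"
        << "  constant HideSig_Depth : positive := " << p.hidesig_depth << ";\n"
        << "  constant LZA_Length : positive := " << p.lza_length << ";\n"
        << "  constant LZA_Depth : positive := " << p.lza_depth << ";\n"
        << "  constant LZA_P2_Length : positive := " << p.lza_p2_length << ";\n"
        << "end pkg;\n";
    return out.str();
}
```

Calling `derive_params(8, 23)` gives 27, 5, 28, 5 and 32, and `derive_params(4, 11)` gives 15, 4, 16, 4 and 16, matching the two packages in this deck.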
The synthesized FP Adder
The Synthesized FADD
Conclusion
•A scalable-length FP adder is generated
•The length of the adder is given by the user through C/C++
•The objective function is also stated
•A structural model of the FP adder is written in VHDL
•The adder is synthesizable
•Depending on the Power-Area-Delay requirement, a Simple/TDPFADD/Pipelined/PTDPFADD adder is generated
•The adder can also be pipelined
[Figure: floating point adder block diagram. Inputs: exponents e1, e2; significands s1, s2; sign 1, sign 2. Recoverable block labels: Control; exponent compare; exponent difference; exponent subtract; exponent incrementer; 0/1 multiplexers; right shifter; two bit inverters; LZA logic; LZA counter; 56b adder; left shift; incrementer; rounding control; selector; compensation shifter; sign control; sign. Several blocks are marked X or S & M.]
•VHDL Modeling
1. Package for Length and Depth Parameters
2. Components of the FP Adder
3. Top Configuration of the FP Adder
1. Package for Length and Depth Parameters
Input parameters:
•Significand length
•Exponent length
Output parameters:
•Significand length for calculation
•Significand length for shifting
•Significand depth for shifting
•Exponent length
•Exponent Difference
Calculates the difference of the two exponents.
[Figure: two subtractors (SUB) fed by EXP1 and EXP2; a MUX selects the positive difference.]
•Significand Comparison
Equation for Comparison
A>B if (a[n]>b[n]) OR (a[n]=b[n] AND a[n-1]>b[n-1]) OR (a[n]=b[n] AND a[n-1]=b[n-1] AND a[n-2]>b[n-2]) OR …
A=B if a[n]=b[n] AND a[n-1]=b[n-1] AND a[n-2]=b[n-2] …
A<B if (a[n]<b[n]) OR (a[n]=b[n] AND a[n-1]<b[n-1]) OR (a[n]=b[n] AND a[n-1]=b[n-1] AND a[n-2]<b[n-2]) OR …
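The comparison equations scan the operands from the most significant bit and stop at the first position where the bits differ. A minimal C++ sketch of that scan follows; `Comparison` and `compare_significands` are illustrative names, and the loop is a sequential model of what the hardware evaluates as parallel logic.

```cpp
#include <cassert>
#include <cstdint>

enum class Comparison { Less, Equal, Greater };

// Compare the low n bits of a and b, most significant bit first.
inline Comparison compare_significands(std::uint32_t a, std::uint32_t b, unsigned n) {
    assert(n >= 1 && n <= 32);
    for (unsigned i = n; i-- > 0;) {
        const bool ai = ((a >> i) & 1u) != 0;
        const bool bi = ((b >> i) & 1u) != 0;
        if (ai != bi) {
            // All higher bits were equal, so this bit decides the result.
            return ai ? Comparison::Greater : Comparison::Less;
        }
    }
    return Comparison::Equal;  // every bit pair matched
}
```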
•Right Shifter and GRS-bit Generation
Right shift the smaller operand according to the exponent difference.
Input (for example):
A: 8-bit operand
S: 5-bit exponent difference
Output:
C7..0 GRS: 11 bits, the shifted significand extended with the G, R and S bits
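A behavioural model of the 8-bit example is sketched below in C++. It assumes the usual convention that G and R hold the first two bits shifted out and that S is the OR of all remaining shifted-out bits; the function name `right_shift_grs` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Right shift an 8-bit significand by a 5-bit amount.
// Returns 11 bits: C7..C0 in bits 10..3, then G, R, S in bits 2..0.
inline std::uint16_t right_shift_grs(std::uint8_t a, unsigned shift) {
    assert(shift < 32);  // the shift amount is a 5-bit value
    const std::uint32_t extended = static_cast<std::uint32_t>(a) << 3;  // append G=R=S=0
    if (shift >= 11) {
        // Everything is shifted out; only the sticky bit can remain set.
        return static_cast<std::uint16_t>(a != 0 ? 1u : 0u);
    }
    const std::uint32_t lost = extended & ((1u << shift) - 1u);
    const std::uint32_t sticky = lost != 0 ? 1u : 0u;
    return static_cast<std::uint16_t>((extended >> shift) | sticky);
}
```

For A = 10110101 and a shift of 5, the result is C = 00000101 with GRS = 101: G takes the first bit shifted out, R the second, and S records that the remaining shifted-out bits were not all zero.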
•Right Shifter and GRS-bit Generation
Right Shift with variable length
[Figure: cascade of MUX stages from Significand In to Right shift Out, with the G, R and S bits produced alongside.]
•Manchester Adder/Subtractor
[Figure: inputs Sig1, Sig2 and the +/- control; stages Inverter, PG generate, C chain and Sum; carry input Cin; outputs Cout and SUM.]
•Leading Zero Anticipation Logic
Anticipates the leading zeros of the addition result.
Input: A, B (n-bit operands)
Output: E (n+1 bit significand with the anticipated leading zeros)
The anticipation may be in error by one bit.
•Leading Zero Counter
LZ(X) = LZ(XL), if XL is not all zeros
LZ(X) = LZ(XR) + n/2, if XL is all zeros
where XL and XR are the left and right halves of the n-bit word X.
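The recurrence splits the word in half at every level, which is what makes the counter a logarithmic-depth tree. A recursive C++ sketch is given below, assuming the word length n is a power of two; `leading_zeros` is an illustrative name and the recursion stands in for the hardware tree.

```cpp
#include <cassert>
#include <cstdint>

// Count the leading zeros of the n-bit word held in the low n bits of x.
// n must be a power of two between 1 and 32.
inline unsigned leading_zeros(std::uint32_t x, unsigned n) {
    assert(n >= 1 && n <= 32 && (n & (n - 1)) == 0);
    assert(n == 32 || (x >> n) == 0);  // x must fit in n bits
    if (n == 1) return x == 0 ? 1u : 0u;
    const unsigned half = n / 2;
    const std::uint32_t xl = x >> half;                 // left (upper) half
    const std::uint32_t xr = x & ((1u << half) - 1u);   // right (lower) half
    if (xl != 0) return leading_zeros(xl, half);        // LZ(X) = LZ(XL)
    return half + leading_zeros(xr, half);              // LZ(X) = LZ(XR) + n/2
}
```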
•Normalization Shifter (left barrel shifter)
•Rounding Logic
Rounding up = G(M0+R+S)
where M0 is the least significant bit of the significand, G, R and S are the guard, round and sticky bits, and + denotes OR.
•A Half Full Adder
[Figure: chain of half adders (HA) with one full adder (FA) that adjusts the exponent. Inputs: Exp in, Rounding Up, Carry out + LZA error. Outputs: Exp out, Over Flow.]
3. Top Configuration of FP Adder
•Significand
•Exponent
•Sign
•Exception Handling
•Significand
[Figure: significand data path. Inputs Sig1 and Sig2 pass through two MUXes to the +/- adder with LZA, followed by the LZ Counter, Left Shift, Rounding and Exception blocks; output Significand Out.]
•Exponent
[Figure: exponent data path. Inputs E1 and E2; a MUX selects the larger exponent; SUB subtracts the LZA Counter value; the Half_Full Adder adds Rounding up and Carry out + LZA error; outputs E3, Under Flow and Over Flow.]
•Sign
Select Logic:
1. Sign of the operand with the larger exponent
2. If the exponents are equal, sign of the operand with the larger significand
[Figure: a MUX controlled by the Select Logic chooses Sign3 from Sign1 and Sign2.]
•Exception Handling
[Figure: the Control Logic examines Exponent3 and Significand3 and drives two multiplexers with select codes 00, 01, 10 and 11; these choose Exponent_out and Significand_out from the computed values or the constants "00..0" and "11..1". Denor_out is a further output.]
Exponent | Significand | Object represented | Control Logic
0 | 0 | 0 | 11
0 | Nonzero | Denormalized number | 01
1 to 254 | Anything | Floating-Point number | 00
255 | 0 | Infinity | 10
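The table maps the exponent and significand of the result to a two-bit control code. The C++ sketch below encodes exactly the four rows shown; the case of an all-ones exponent with a nonzero significand is not listed on the slide, so the function reports it as not covered instead of guessing a code. `ControlCode` and `classify_result` are hypothetical names, and the exponent length is a parameter so that the same check works for other formats.

```cpp
#include <cassert>
#include <cstdint>

// Two-bit control codes as listed in the table.
enum class ControlCode : unsigned {
    FloatingPoint = 0,  // 00: exponent 1 to 254
    Denormalized  = 1,  // 01: exponent 0, significand nonzero
    Infinity      = 2,  // 10: exponent 255, significand 0
    Zero          = 3   // 11: exponent 0, significand 0
};

// Returns true and sets `code` when the inputs match a row of the table;
// returns false for the combination the table does not list.
inline bool classify_result(std::uint32_t exponent, std::uint32_t significand,
                            unsigned exponent_length, ControlCode& code) {
    assert(exponent_length >= 2 && exponent_length <= 31);
    const std::uint32_t all_ones = (1u << exponent_length) - 1u;  // 255 for 8 bits
    assert(exponent <= all_ones);
    if (exponent == 0) {
        code = (significand == 0) ? ControlCode::Zero : ControlCode::Denormalized;
        return true;
    }
    if (exponent == all_ones) {
        if (significand != 0) return false;  // not covered by the table
        code = ControlCode::Infinity;
        return true;
    }
    code = ControlCode::FloatingPoint;
    return true;
}
```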
Comparison of Synthesis results for IEEE 754 Single Precision FP addition Using Xilinx Virtex-2 FPGA
Parameters | SIMPLE | TDPFADD | PIPE/TDPFADD
Maximum delay, D (ns) | 327.6 | 213.8 | 101.11
Average Power, P (mW) @ 2.38 MHz | 1836 | 1024 | 382.4
Area A, Total number of CLBs (#) | 664 | 1035 | 1324
Power Delay Product (ns.10mW) | 7.7 × 10^4 | 4.31 × 10^4 | 3.82 × 10^4
Area Delay Product (10#.ns) | 2.18 × 10^4 | 2.21 × 10^4 | 1.34 × 10^4
Area-Delay^2 Product (10#.ns^2) | 7.13 × 10^6 | 4.73 × 10^6 | 1.35 × 10^6
Main Blocks
What blocks are considered?
• Compound Adder with Flagged Prefix Adder (New)
• LOP with Concurrent Position Correction (New)
• Alignment Shifter
• Normalization Shifter
Compound Adder Cont.
Rounding Block
• Round to nearest
    if g=1 and ((LSB=1) OR (r+s=1)): add 1 to the result
    else: truncate at LSB
• Round toward zero
    Truncate
• Round toward +Infinity
    if sign=positive:
        if any bits to the right of the result LSB=1: add 1 to the result
        else: truncate at LSB
    if sign=negative: truncate at LSB
• Round toward -Infinity
    if sign=negative:
        if any bits to the right of the result LSB=1: add 1 to the result
        else: truncate at LSB
    if sign=positive: truncate at LSB
Rounding mode | Compound adder results
Round to nearest | Sum, Sum+1
Round toward zero | Sum
Round toward +Infinity | Sum, Sum+1 and Sum+2
Round toward -Infinity | Sum, Sum+1 and Sum+2
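The four rounding rules reduce to one boolean decision: whether 1 is added to the truncated result. The C++ sketch below is a simplified model of that decision only; it does not model the Sum+2 case listed for the directed modes, and the names `RoundingMode`, `round_increment` and `apply_rounding` are illustrative.

```cpp
#include <cassert>
#include <cstdint>

enum class RoundingMode { Nearest, TowardZero, TowardPlusInf, TowardMinusInf };

// Decide whether 1 must be added to the truncated result.
// lsb is the result LSB; g, r, s are the bits to the right of it.
inline bool round_increment(RoundingMode mode, bool negative,
                            bool lsb, bool g, bool r, bool s) {
    const bool any_discarded = g || r || s;  // any bit right of the LSB is 1
    switch (mode) {
        case RoundingMode::Nearest:        return g && (lsb || r || s);
        case RoundingMode::TowardZero:     return false;  // always truncate
        case RoundingMode::TowardPlusInf:  return !negative && any_discarded;
        case RoundingMode::TowardMinusInf: return negative && any_discarded;
    }
    return false;
}

// Select between the two compound adder outputs instead of adding again.
inline std::uint32_t apply_rounding(std::uint32_t sum, std::uint32_t sum_plus_one,
                                    bool increment) {
    assert(sum_plus_one == sum + 1u);
    return increment ? sum_plus_one : sum;
}
```

In round to nearest, a tie (g=1, r=s=0) increments only when the LSB is 1, which leaves an even result.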