
Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan

EE 5324 – VLSI Design II

Kia Bazargan

University of Minnesota

Part VII: Floating Point Arithmetic


Floating-Point vs. Fixed-Point Numbers

• Fixed point has limitations:
  x = $0000\,0000.0000\,1001_2$
  y = $1001\,0000.0000\,0000_2$
  Rounding? Overflow? ($x^2$ underflows and $y^2$ overflows)
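A minimal Python sketch of the problem, assuming an unsigned 8.8 fixed-point format (8 integer bits and 8 fraction bits, the width of x and y above) that truncates extra fraction bits. The helper names `to_raw` and `fx_mul` are hypothetical.

```python
# Sketch: unsigned fixed point with 8 integer bits and 8 fraction bits (8.8).
# A value v is stored as the raw integer floor(v * 2**FRAC_BITS).
INT_BITS, FRAC_BITS = 8, 8
MAX_RAW = (1 << (INT_BITS + FRAC_BITS)) - 1


def to_raw(bits: str) -> int:
    """Parse a binary string such as '0000 0000.0000 1001' into its raw integer."""
    return int(bits.replace(" ", "").replace(".", ""), 2)


def fx_mul(a: int, b: int) -> int:
    """Multiply two raw 8.8 values, truncating the 8 extra fraction bits.
    Raises OverflowError if the product does not fit in 16 bits."""
    prod = (a * b) >> FRAC_BITS
    if prod > MAX_RAW:
        raise OverflowError(f"raw result {prod} exceeds {MAX_RAW}")
    return prod


x = to_raw("0000 0000.0000 1001")  # 9/256
y = to_raw("1001 0000.0000 0000")  # 144

print(fx_mul(x, x))  # 0: x*x = 81/65536 is below the smallest step 1/256
try:
    fx_mul(y, y)     # 144*144 = 20736 does not fit in 8 integer bits
except OverflowError as err:
    print("overflow:", err)
```

Both failures come from the fixed position of the binary point: there is no way to trade integer bits for fraction bits per value, which is exactly what the exponent field of a floating-point number provides.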

• Floating point: represent numbers with two fixed-width fields, “magnitude” and “exponent”
  Magnitude: more bits = more accuracy
  Exponent: more bits = wider range of numbers

X = [ s | e | m ]   (fields: ± sign, exponent, magnitude)


Floating Point Number Representation

• Sign field:
  When 0: positive number; when 1: negative

• Exponent: usually stored as an unsigned number by adding an offset (bias)
  Example: 4 bits of exponent, offset = 8
    o Exp = $1001_2$: $e = 1001_2 - 1000_2 = 0001_2 = +1$
    o Exp = $0010_2$: $e = 0010_2 - 1000_2 = 1010_2 = -6$ (4-bit two's complement)

• Magnitude (also called significand or mantissa)
  Shift the number to get the form 1.xxxx
  The magnitude field stores only the fractional part (the leading ‘1’ is hidden)
  Example: 6 bits of mantissa
    o Number = 110.0101: shift to $1.100101 \times 2^{2}$, mantissa = 100101
    o Number = 0.0001011: shift to $1.011 \times 2^{-4}$, mantissa = 011000
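The exponent and magnitude rules can be checked with a short Python sketch. It assumes the 4-bit exponent with offset 8 and the 6-bit mantissa of the examples, takes positive numbers written as binary strings, and truncates (rather than rounds) any bits beyond the mantissa width. `encode_exponent`, `decode_exponent` and `normalize` are illustrative names only.

```python
BIAS, EXP_BITS, MAN_BITS = 8, 4, 6


def encode_exponent(e: int) -> str:
    """Store exponent e as an unsigned EXP_BITS-bit field holding e + bias."""
    stored = e + BIAS
    if not 0 <= stored < (1 << EXP_BITS):
        raise ValueError("exponent out of range")
    return format(stored, f"0{EXP_BITS}b")


def decode_exponent(field: str) -> int:
    """Recover the true exponent by subtracting the bias."""
    return int(field, 2) - BIAS


def normalize(bits: str) -> tuple[str, int]:
    """Shift a positive binary number such as '110.0101' to the form 1.xxxx.
    Returns (mantissa field without the hidden 1, exponent).
    Bits beyond MAN_BITS are truncated; no rounding in this sketch."""
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")              # position of the leading 1
    exponent = len(int_part) - 1 - first_one   # how far the point moved
    mantissa = digits[first_one + 1:]          # drop the hidden 1
    mantissa = (mantissa + "0" * MAN_BITS)[:MAN_BITS]
    return mantissa, exponent


print(normalize("110.0101"))    # ('100101', 2)
print(normalize("0.0001011"))   # ('011000', -4)
```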


Floating Point Numbers: Example

$X = \pm 1.m \times 2^{e}$, stored as the fields s | e (+bias) | m (sign, exponent, magnitude)
Example format: 1 sign bit, 4 exponent bits (bias 8), 7 magnitude bits

  Value                               s   exponent   magnitude
  X1 = $+1.0011101 \times 2^{2}$      0   1010       0011101
  X2 = $+1.1 \times 2^{-6}$           0   0010       1000000
  X3 = $-1.0000001 \times 2^{3}$      1   1011       0000001
  X4 = $+1.0000000 \times 2^{-8}$     0   0000       0000000    (= 0)
  X5 = $+1.0000000 \times 2^{7}$      0   1111       0000000    (= +∞)
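The following Python sketch decodes fields of the example format (1 sign bit, 4 exponent bits with bias 8, 7 magnitude bits) back into exact values with `fractions.Fraction`. Treating the all-zeros pattern as 0 and an all-ones exponent with zero magnitude as infinity is an assumption read off X4 and X5; the function name `decode` is hypothetical.

```python
from fractions import Fraction

EXP_BITS, MAN_BITS, BIAS = 4, 7, 8


def decode(s: int, exp_field: str, man_field: str):
    """Value of a (sign, exponent, magnitude) triple in the 1/4/7 example format.

    Assumption (following X4 and X5): all-zeros fields encode 0 and an
    all-ones exponent with zero magnitude encodes infinity. Other special
    cases (e.g. NaN) are not modeled."""
    e_stored = int(exp_field, 2)
    m = int(man_field, 2)
    sign = -1 if s else 1
    if e_stored == 0 and m == 0:
        return 0
    if e_stored == (1 << EXP_BITS) - 1 and m == 0:
        return sign * float("inf")
    significand = 1 + Fraction(m, 1 << MAN_BITS)        # restore the hidden 1
    return sign * significand * Fraction(2) ** (e_stored - BIAS)


print(decode(0, "1010", "0011101"))   # X1 = 157/32
print(decode(1, "1011", "0000001"))   # X3 = -129/16
```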


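[Figure: number line of floating-point values. FLP− covers [−max, −min] (negative numbers) and FLP+ covers [min, max] (positive numbers); beyond ±max lie the overflow regions, and between −min and +min (around 0) lie the underflow regions. Representable numbers are denser near ±min and sparser near ±max.]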

Floating Point Number Range

• Range: $[-\max, -\min] \cup [\min, \max]$
  $\min = \text{smallest magnitude} \times 2^{\text{smallest exponent}}$
  $\max = \text{largest magnitude} \times 2^{\text{largest exponent}}$

• What happens if we:
  Increase the number of bits for the exponent?
  Increase the number of bits for the magnitude?

• Ref: http://steve.hollasch.net/cgindex/coding/ieeefloat.html
  ftp://download.intel.com/technology/itj/q41999/pdf/ia64fpbf.pdf

[© Oxford U Press]
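The "what happens if" questions can be explored with a small Python sketch. It assumes, as in X4 and X5 of the example format, that the all-zeros and all-ones exponent codes are reserved, so the usable stored exponents run from 1 to $2^{k}-2$ for k exponent bits. The bias of 16 for a 5-bit exponent and the name `fp_range` are assumptions for illustration.

```python
from fractions import Fraction


def fp_range(exp_bits: int, man_bits: int, bias: int) -> tuple[Fraction, Fraction]:
    """Smallest and largest positive normalized magnitudes.

    Assumption: the all-zeros and all-ones exponent codes are reserved
    (for zero and infinity), so stored exponents run from 1 to 2**exp_bits - 2."""
    e_min = 1 - bias
    e_max = (1 << exp_bits) - 2 - bias
    smallest_mag = Fraction(1)                    # 1.000...0
    largest_mag = 2 - Fraction(1, 1 << man_bits)  # 1.111...1
    return smallest_mag * Fraction(2) ** e_min, largest_mag * Fraction(2) ** e_max


print(fp_range(4, 7, 8))    # the 1/4/7 example format
print(fp_range(5, 7, 16))   # one more exponent bit (bias assumed 16)
print(fp_range(4, 10, 8))   # three more magnitude bits
```

Under these assumptions, an extra exponent bit pushes min down and max up by many powers of two, while extra magnitude bits leave min unchanged and move max only slightly toward $2 \times 2^{e_{max}}$: they mainly buy precision, not range.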


Floating Point Operations

• Addition/subtraction, multiplication/division, function evaluations, ...

• Basic operations:
  Adding exponents / magnitudes
  Multiplying magnitudes
  Aligning magnitudes (shifting, adjusting the exponent)
  Rounding
  Checking for overflow/underflow
  Normalization (shifting, adjusting the exponent)


Floating Point Addition

• More difficult than multiplication!
• Operations:
  Align magnitudes (so that exponents are equal)
  Add (and round)
  Normalize (result in the form 1.xxx)

  Value                                    s   exponent   magnitude
  x = $+1.0011101 \times 2^{3}$            0   1011       0011101
  y = $+1.1010011 \times 2^{0}$            0   1000       1010011
  y aligned = $+0.0011010 \times 2^{3}$    0   1011       0011010    (bits 011 shifted out)
  x + y = $+1.0110111 \times 2^{3}$        0   1011       0110111

No need to normalize in this case: the sum is already in the form 1.xxx.
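The align / add / normalize sequence can be sketched in Python for two positive operands of the example format. The name `fp_add_positive` is hypothetical, and, like the slide, the sketch simply drops the bits shifted out during alignment; a real adder keeps extra bits for rounding.

```python
MAN_BITS = 7  # fraction bits after the hidden 1


def fp_add_positive(ex: int, mx: int, ey: int, my: int) -> tuple[int, int]:
    """Add 1.mx * 2**ex and 1.my * 2**ey (mx, my: 7-bit magnitude fields).
    Returns (exponent, magnitude field). Bits shifted out during alignment
    are dropped (truncation); no guard/round/sticky bits in this sketch."""
    sx = (1 << MAN_BITS) | mx          # insert the hidden 1
    sy = (1 << MAN_BITS) | my
    if ex < ey:                        # swap so that x has the larger exponent
        ex, sx, ey, sy = ey, sy, ex, sx
    sy >>= ex - ey                     # align magnitudes
    total, e = sx + sy, ex
    if total >> (MAN_BITS + 1):        # carry out: sum in [2, 4), shift right by 1
        total >>= 1
        e += 1
    return e, total & ((1 << MAN_BITS) - 1)   # remove the hidden 1


e, m = fp_add_positive(3, 0b0011101, 0, 0b1010011)
print(e, format(m, "07b"))   # 3 0110111
```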


Floating Point Adder Architecture

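[Figure: floating-point adder block diagram. Unpack → Subtract Exponents → Complement/Swap → Align Magnitudes → Add Magnitudes (carry-in Cin, carry-out Cout, add/subtract control +/− from the Sign Logic) → Normalize (Adjust Exponent) → Round/Complement → Normalize (Adjust Exponent) → Pack.]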

[© Oxford U Press]


Floating Point Adder Components

• Unpacking
  Inserting the “hidden 1”
  Checking for special inputs (NaN, zero)

• Exponent difference
  Used in aligning the magnitudes
  A few bits are enough for the subtraction
    o With a 32-bit magnitude adder and 8-bit exponents, only 5 bits are involved in the subtraction
  If the difference is negative, swap the operands and use the positive difference
    o How to compute the positive difference?

• Pre-shifting and swap
  Shift/complement provided for one operand only
  Swap if needed


Floating Point Adder Components (cont.)

• Rounding
  Three extra bits (guard, round, sticky) used for rounding

• Post-shifting
  Result in the range (−4, 4): $z = c_{out}\, z_1 z_0 . z_{-1} z_{-2} \ldots$
  Right shift: 1 bit max
    o If $c_{out} \neq z_1$: right shift
  Left shift: up to the number of bits in the magnitude
    o Determine the number of consecutive 0s (1s) in z, beginning with $z_1$
  Adjust the exponent accordingly

• Packing
  Check for special results (zero, under-/overflow)
  Remove the hidden 1
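Left post-shifting by a leading-zero count can be sketched in Python for non-negative results only (the leading 1s of negative two's-complement results are not handled). This models the simpler counting approach, not prediction; `normalize_left` is a hypothetical name.

```python
def normalize_left(e: int, sig: int, width: int) -> tuple[int, int]:
    """Left-normalize a non-negative significand held in `width` bits with the
    binary point after the MSB (form b.bbb...). Counts leading zeros, shifts
    them out, and lowers the exponent by the same amount."""
    if sig == 0:
        return e, 0                       # zero: nothing to normalize (special case at packing)
    lzc = width - sig.bit_length()        # number of leading 0s
    return e - lzc, sig << lzc


# 1.0000001 x 2^3 minus 1.0000000 x 2^3 leaves 0.0000001 x 2^3 (8-bit field)
print(normalize_left(3, 0b00000001, 8))  # (-4, 128), i.e. 1.0000000 x 2^-4
```

The count is only known once the adder output is available, which is why counting sits on the critical path.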


Counting vs. Predicting Leading Zeros/Ones

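[Figure: two ways to obtain the post-shift amount. Left: the Magnitude Adder output feeds a Count Leading 0/1 block, which produces the shift amount for the Post-Shifter and the Adjust Exponent block. Right: a Predict Leading 0/1 block works alongside the Magnitude Adder and supplies the shift amount.]
  Counting: simpler, but on the critical path
  Predicting: more complex architecture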

[© Oxford U Press]


Floating Point Multiplication

• Simpler than floating-point addition
• Operation:
  Inputs: $z_1 = \pm 1.m_1 \times 2^{e_1}$, $z_2 = \pm 1.m_2 \times 2^{e_2}$
  Output: $\pm (1.m_1 \times 1.m_2) \times 2^{e_1 + e_2}$
  Sign: XOR
  Exponent:
    o Tentatively computed as $e_1 + e_2$
    o Subtract the bias (= 127 for IEEE single precision), since the sum of two biased exponents contains the bias twice. How?
    o Adjusted after normalization
  Magnitude:
    o Result in the range [1, 4) (inputs in the range [1, 2))
    o Normalization: 1- or 2-bit shift right, depending on rounding
    o Result is 2(1+m) bits, should be rounded to (1+m) bits
    o Rounding can gradually discard bits, instead of in one last stage
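A Python sketch of the multiplication steps for the 1/4/7 example format (bias 8 instead of 127). The name `fp_mul` is hypothetical, and truncation stands in for rounding, so the 2-bit normalization case never arises here.

```python
MAN_BITS, BIAS = 7, 8   # the 1/4/7 example format


def fp_mul(s1: int, e1: int, m1: int, s2: int, e2: int, m2: int) -> tuple[int, int, int]:
    """Multiply two numbers given as (sign, biased exponent, magnitude) fields.

    Truncation stands in for rounding; overflow, underflow and special
    values are not checked in this sketch."""
    sign = s1 ^ s2                              # sign: XOR
    e = e1 + e2 - BIAS                          # the sum holds the bias twice
    p = ((1 << MAN_BITS) | m1) * ((1 << MAN_BITS) | m2)  # 16-bit product in [1, 4)
    if p >> (2 * MAN_BITS + 1):                 # product >= 2: shift right by 1
        p >>= 1
        e += 1
    m = (p >> MAN_BITS) & ((1 << MAN_BITS) - 1)  # keep 7 fraction bits, drop hidden 1
    return sign, e, m


# (+1.1 x 2^1) * (+1.1 x 2^1) = 2.25 x 4 = 9 = +1.001 x 2^3
print(fp_mul(0, 0b1001, 0b1000000, 0, 0b1001, 0b1000000))  # (0, 11, 16)
```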


Floating Point Multiplier Architecture

Note: pipelining is used inside the magnitude multiplier as well as at block boundaries

[Figure: floating-point multiplier block diagram. Floating-point operands → Unpack → XOR (sign), Add Exponents, Multiply Magnitudes → Normalize (Adjust Exponent) → Round → Normalize (Adjust Exponent) → Pack → Product.]

[© Oxford U Press]


Square-Rooting

• Most important elementary function
• Specified in the IEEE standard as a basic operation (alongside +, −, ×, ÷)
• Very similar to division
• Pencil-and-paper method:
  Radicand: $z = z_{2k-1} z_{2k-2} \ldots z_1 z_0$
  Square root: $q = q_{k-1} q_{k-2} \ldots q_1 q_0$
  Remainder ($z - q^2$): $s = s_k s_{k-1} s_{k-2} \ldots s_1 s_0$ (k+1 digits)


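Square Rooting: Example

• Example: sqrt(9 52 41), root digits $q_2 q_1 q_0$, $q^{(0)} = 0$
  At each step the partial root is doubled (×2) and the trial digit is appended: (6q_1) is q_1 appended to 2 × 3, and (60q_0) is q_0 appended to 2 × 30.

  Partial remainder   Trial                     Digit      Partial root
  9                   q_2 × q_2 ≤ 9             q_2 = 3    q(1) = 3
  − 9
  0 52                (6q_1) × q_1 ≤ 52         q_1 = 0    q(2) = 30
  − 0
  52 41               (60q_0) × q_0 ≤ 52 41     q_0 = 8    q(3) = 308
  − 48 64
  3 77                s = 377, q = 308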
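The pencil-and-paper procedure translates directly into Python. This sketch (hypothetical name `isqrt_decimal`) brings down two decimal digits per step and picks the largest digit d with $(10 \times 2q + d) \times d$ not exceeding the partial remainder.

```python
def isqrt_decimal(z: int) -> tuple[int, int]:
    """Pencil-and-paper square root, one decimal digit per step.
    Returns (q, s) with q = floor(sqrt(z)) and remainder s = z - q*q."""
    pairs = []
    while True:                           # split z into two-digit groups
        pairs.append(z % 100)
        z //= 100
        if z == 0:
            break
    q, rem = 0, 0
    for pair in reversed(pairs):          # most significant pair first
        rem = rem * 100 + pair            # bring down the next two digits
        d = 9
        while (20 * q + d) * d > rem:     # (10*(2q) + d) * d <= remainder
            d -= 1
        rem -= (20 * q + d) * d
        q = 10 * q + d                    # append the digit to the partial root
    return q, rem


print(isqrt_decimal(95241))   # (308, 377)
```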


Square Rooting: Example (cont.)

• Why double the partial root?
  Partial root after step 2: $q^{(2)} = 30$
  Appending the next digit $q_0$ gives $10 \times q^{(2)} + q_0$, whose square is $100\,(q^{(2)})^2 + 20\,q^{(2)} q_0 + q_0^2$
  The term $100\,(q^{(2)})^2$ has already been subtracted
  Find the largest $q_0$ such that $(10 \times (2 q^{(2)}) + q_0) \times q_0 \le$ partial remainder
• The binary case: the square of $2 q^{(2)} + q_0$ is
  $4\,(q^{(2)})^2 + 4\,q^{(2)} q_0 + q_0^2$
  Find $q_0$ such that $(4 q^{(2)} + q_0) \times q_0 \le$ partial remainder
  For $q_0 = 1$, the expression becomes $4 q^{(2)} + 1$ (i.e., append “01” to the partial root)


Square Rooting: Example Base 2

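• Example: $\sqrt{01110110_2} = \sqrt{118}$, root bits $q_3 q_2 q_1 q_0$, $q^{(0)} = 0$
  z = 01 11 01 10 = (118)₁₀

  Partial remainder   Trial (4q + 1)   Trial ≤ remainder?   Digit      Partial root
  01                  01               Yes                  q_3 = 1    q(1) = 1
  − 01
  00 11               101              No                   q_2 = 0    q(2) = 10
  − 0
  0 11 01             1001             Yes                  q_1 = 1    q(3) = 101
  − 10 01
  01 00 10            10101            No                   q_0 = 0    q(4) = 1010
  − 0
  1 00 10             s = 10010₂ = 18₁₀, q = 1010₂ = 10₁₀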
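In binary the trial value is always $4q + 1$ (the partial root with "01" appended), so each step is a single compare-and-subtract. A Python sketch with the hypothetical name `isqrt_binary`:

```python
def isqrt_binary(z: int) -> tuple[int, int]:
    """Restoring binary square root, one root bit per pair of radicand bits.
    The trial value is 4*q + 1, i.e. the partial root with '01' appended.
    Returns (q, s) with q = floor(sqrt(z)) and s = z - q*q."""
    nbits = max(2, z.bit_length() + (z.bit_length() & 1))   # round up to even
    q, rem = 0, 0
    for shift in range(nbits - 2, -1, -2):
        rem = (rem << 2) | ((z >> shift) & 0b11)   # bring down the next bit pair
        trial = (q << 2) | 1                        # 4q + 1
        if rem >= trial:                            # "Yes": root bit is 1
            rem -= trial
            q = (q << 1) | 1
        else:                                       # "No": root bit is 0
            q <<= 1
    return q, rem


print(isqrt_binary(0b01110110))   # (10, 18)
```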


Sequential Shift/Subtract Square Rooter Architecture

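[Figure: sequential shift/subtract square rooter. A Square Root register and a Partial Remainder register (loaded with z − 1 at the outset) feed an (l+2)-bit adder with Complement, Cin and Cout inputs that forms the Trial Difference; Select Root Digit logic, using the MSB of $2s^{(j-1)}$, produces $q_{-j}$ and the load/sub controls.]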

[© Oxford U Press]


Other Methods for Square Rooting

• Restoring vs. non-restoring
  We looked at the restoring algorithm (after subtraction, restore the partial remainder if the result is negative)
  Non-restoring: use a different encoding (digits {−1, 1} instead of {0, 1}) to avoid restoring

• High-radix
  Similar to modified Booth encoding in multiplication: handle more bits at a time
  More complex circuit, but faster


Other Methods for Square Rooting (cont.)

• Convergence methods
  Use the Newton method to find the root of a function:
    o $f(x) = x^2 - z$: the root is $x = \sqrt{z}$
    o OR $f(x) = 1/x^2 - z$: the root is $x = 1/\sqrt{z}$; multiply by z to get $\sqrt{z}$
  Iteratively improve the accuracy
  Can use a lookup table for the first iteration
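For the second option, Newton's step on $f(x) = 1/x^2 - z$ simplifies to $x^{(i+1)} = 0.5\,x^{(i)}(3 - z\,(x^{(i)})^2)$, which needs only multiplications. A Python sketch; the function name and the seed 0.65 (standing in for a lookup-table value) are assumptions, and the iteration only converges for seeds in $(0, \sqrt{3/z})$.

```python
def rsqrt_newton(z: float, x0: float, iterations: int = 4) -> float:
    """Newton iteration on f(x) = 1/x**2 - z, whose root is x = 1/sqrt(z).
    x_{i+1} = x_i - f(x_i)/f'(x_i) = 0.5 * x_i * (3 - z * x_i**2):
    only multiplications, no division."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - z * x * x)
    return x


z = 2.4
x0 = 0.65                      # hypothetical seed, e.g. from a small lookup table
r = rsqrt_newton(z, x0)
print(r * z)                   # sqrt(z) = z * (1/sqrt(z))
```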


Square Rooting: Abstract Notation

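$z - q_3\,(q^{(0)}\,0\,q_3)\,2^{6} - q_2\,(q^{(1)}\,0\,q_2)\,2^{4} - q_1\,(q^{(2)}\,0\,q_1)\,2^{2} - q_0\,(q^{(3)}\,0\,q_0)\,2^{0} = s$

where $(q^{(j)}\,0\,q_i)$ denotes the partial root with the bits 0 and $q_i$ appended, i.e. $4q^{(j)} + q_i$.

Floating-point format: shift left (not right); powers of 2 decreasing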
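The shift-left recurrence used in the floating-point calculations can be derived in a few lines. This is a sketch in our own notation (partial root $q^{(j)}$, scaled remainder $s^{(j)}$), assuming $1 \le z < 4$ and $q^{(0)} = 1$; it yields the bracketed terms such as $2 \times (1.01) + 2^{-3}$ that are subtracted (or added) at each step.

```latex
% Partial root after j fraction bits and the scaled remainder
q^{(j)} = q^{(j-1)} + q_{-j}\,2^{-j}, \qquad
s^{(j)} = 2^{j}\bigl(z - (q^{(j)})^{2}\bigr), \qquad
q^{(0)} = 1,\; s^{(0)} = z - 1

% Expand the square of the new partial root
s^{(j)} = 2^{j}\bigl(z - (q^{(j-1)})^{2}\bigr)
        - 2^{j}\bigl(2\,q^{(j-1)} q_{-j}\,2^{-j} + q_{-j}^{2}\,2^{-2j}\bigr)
        = 2\,s^{(j-1)} - q_{-j}\bigl(2\,q^{(j-1)} + q_{-j}\,2^{-j}\bigr)

% Restoring (q_{-j} \in \{0,1\}, so q_{-j}^2 = q_{-j}):
%   trial difference 2 s^{(j-1)} - (2 q^{(j-1)} + 2^{-j})
% Nonrestoring (q_{-j} \in \{-1,1\}, so q_{-j}^2 = 1):
%   q_{-j} = +1:  s^{(j)} = 2 s^{(j-1)} - (2 q^{(j-1)} + 2^{-j})
%   q_{-j} = -1:  s^{(j)} = 2 s^{(j-1)} + (2 q^{(j-1)} - 2^{-j})
% After l steps the true remainder is z - q^2 = 2^{-l} s^{(l)}
```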


Restoring Floating-Point Square Root Calc.

  z                           01.110110    (118/64)
  s(0) = z − 1               000.110110    q_0 = 1       q(0) = 1.
  2s(0)                      001.101100
  −[2 × (1.) + 2^-1]         −10.1
  s(1)                       111.001100    q_-1 = 0      q(1) = 1.0
  s(1) = 2s(0)  (restore)    001.101100
  2s(1)                      011.011000
  −[2 × (1.0) + 2^-2]        −10.01
  s(2)                       001.001000    q_-2 = 1      q(2) = 1.01
  2s(2)                      010.010000
  −[2 × (1.01) + 2^-3]       −10.101
  s(3)                       111.101000    q_-3 = 0      q(3) = 1.010
  s(3) = 2s(2)  (restore)    010.010000
  2s(3)                      100.100000
  −[2 × (1.010) + 2^-4]      −10.1001

[© Oxford U Press]


Restoring Floating-Point Sq. Root Calc. (cont.)

  s(4)                       001.111100      q_-4 = 1      q(4) = 1.0101
  2s(4)                      011.111000
  −[2 × (1.0101) + 2^-5]     −10.10101
  s(5)                       001.001110      q_-5 = 1      q(5) = 1.01011
  2s(5)                      010.011100
  −[2 × (1.01011) + 2^-6]    −10.101101
  s(6)                       111.101111      q_-6 = 0      q(6) = 1.010110
  s(6) = 2s(5)  (restore)    010.011100      (156/64)
  s (true remainder)         0.000010011100  (156/64²)
  q                          1.010110        (86/64)

[© Oxford U Press]
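The restoring calculation can be reproduced exactly with rational arithmetic. This Python sketch (hypothetical name `sqrt_restoring`) applies $s^{(j)} = 2s^{(j-1)} - q_{-j}(2q^{(j-1)} + 2^{-j})$ with digits in {0, 1}, restoring when the trial difference is negative.

```python
from fractions import Fraction


def sqrt_restoring(z: Fraction, l: int) -> tuple[Fraction, Fraction]:
    """Restoring square root of z in [1, 4), producing l fraction bits.
    Returns (q, s_l), where z - q**2 == s_l * 2**-l."""
    q = Fraction(1)              # q0 = 1
    s = z - 1                    # s(0) = z - 1
    for j in range(1, l + 1):
        trial = 2 * s - (2 * q + Fraction(1, 2 ** j))
        if trial >= 0:           # digit 1: keep the trial difference
            s = trial
            q += Fraction(1, 2 ** j)
        else:                    # digit 0: restore, s(j) = 2 s(j-1)
            s = 2 * s
    return q, s


q, s = sqrt_restoring(Fraction(118, 64), 6)
print(q * 64, s * 64)   # 86 156
```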


Nonrestoring Floating-Point Square Root Calc.

  z                           01.110110    (118/64)
  s(0) = z − 1               000.110110    q_0 = 1       q(0) = 1.
  2s(0)                      001.101100    q_-1 = 1      q(1) = 1.1
  −[2 × (1.) + 2^-1]         −10.1
  s(1)                       111.001100    q_-2 = −1     q(2) = 1.01
  2s(1)                      110.011000
  +[2 × (1.1) − 2^-2]        +10.11
  s(2)                       001.001000    q_-3 = 1      q(3) = 1.011
  2s(2)                      010.010000
  −[2 × (1.01) + 2^-3]       −10.101
  s(3)                       111.101000    q_-4 = −1     q(4) = 1.0101
  2s(3)                      111.010000
  +[2 × (1.011) − 2^-4]      +10.1011
  s(4)                       001.111100    q_-5 = 1      q(5) = 1.01011
  2s(4)                      011.111000
  −[2 × (1.0101) + 2^-5]     −10.10101


Nonrestoring FP Square Root Calc. (cont.)

  s(5)                       001.001110      q_-6 = 1      q(6) = 1.010111
  2s(5)                      010.011100
  −[2 × (1.01011) + 2^-6]    −10.101101
  s(6)                       111.101111      Negative (−17/64)
  +[2 × (1.01011) + 2^-6]    +10.101101      Correct
  s(6) (corrected)           010.011100      (156/64), i.e. s(6) = 2s(5)
  s (true remainder)         0.000010011100  (156/64²)
  q (signed-digit)           1.1 −1 1 −1 1 1 (87/64)
  q (corrected binary)       1.010110        (86/64)

If the final s is negative, drop the last ‘1’ in q, and restore the remainder to the last positive value.
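The nonrestoring variant can be checked the same way. In this Python sketch (hypothetical name `sqrt_nonrestoring`), each digit in {−1, 1} is chosen from the sign of the previous remainder, and a single final correction subtracts $2^{-l}$ from q and recomputes the remainder for the corrected root as $s + 2q + 2^{-l}$.

```python
from fractions import Fraction


def sqrt_nonrestoring(z: Fraction, l: int) -> tuple[list[int], Fraction, Fraction]:
    """Nonrestoring square root of z in [1, 4) with root digits in {-1, 1}.

    Returns (digits, q, s): the signed digits q_-1 .. q_-l, the corrected
    binary root q, and the corrected scaled remainder s, so that
    z - q**2 == s * 2**-l."""
    q = Fraction(1)                            # q0 = 1
    s = z - 1                                  # s(0) = z - 1
    digits = []
    for j in range(1, l + 1):
        ulp = Fraction(1, 2 ** j)
        d = 1 if s >= 0 else -1                # digit from the sign of s(j-1)
        s = 2 * s - d * (2 * q + d * ulp)      # subtract 2q + 2**-j or add 2q - 2**-j
        q += d * ulp
        digits.append(d)
    if s < 0:                                  # final correction
        ulp = Fraction(1, 2 ** l)
        q -= ulp                               # drop the last 1 of q
        s += 2 * q + ulp                       # remainder for the corrected root
    return digits, q, s


digits, q, s = sqrt_nonrestoring(Fraction(118, 64), 6)
print(digits, q * 64, s * 64)   # [1, -1, 1, -1, 1, 1] 86 156
```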


Square Root Through Convergence

• Newton-Raphson method: choose $f(x) = x^2 - z$, so $f'(x) = 2x$
  $x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)}) = x^{(i)} - \bigl((x^{(i)})^2 - z\bigr) / (2x^{(i)})$
  $x^{(i+1)} = 0.5\,(x^{(i)} + z / x^{(i)})$
• Example: compute the square root of $z = (2.4)_{10}$
  $x^{(0)}$ read out from table = 1.5, accurate to $10^{-1}$
  $x^{(1)} = 0.5\,(x^{(0)} + 2.4 / x^{(0)}) = 1.550\,000\,000$, accurate to $10^{-2}$
  $x^{(2)} = 0.5\,(x^{(1)} + 2.4 / x^{(1)}) = 1.549\,193\,548$, accurate to $10^{-4}$
  $x^{(3)} = 0.5\,(x^{(2)} + 2.4 / x^{(2)}) = 1.549\,193\,338$, accurate to $10^{-8}$
  [Par00] p354

Non-Restoring Parallel Square Rooter

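[Figure: non-restoring parallel square rooter. An array of cells, each built from a full adder (FA) and an XOR gate, takes radicand bits z−1 … z−8 and produces root bits q−1 … q−4 and remainder bits s−1 … s−8.]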

[© Oxford U Press]


Function Evaluation

• We looked at square root calculation:
  Direct hardware implementation (binary, BSD (binary signed-digit), high-radix)
    o Serial
    o Parallel
  Approximation (Newton method)
• What about other functions?
  Direct implementation
    o Example: $\log_2 x$ can be directly implemented in hardware (using square root as a sub-component)
  Polynomial approximation
  Table look-up
    o Either as part of the calculation or for the full calculation


Table Lookup

[Figure: left, direct table-lookup implementation: u operand bits address a $2^u \times v$ table that outputs v result bits. Right, table lookup with pre- and post-processing: u operand bits pass through pre-processing logic into smaller table(s), whose outputs go through post-processing logic to produce the v result bits.]

[© Oxford U Press]


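Linear Interpolation Using Four Subintervals

[Figure: f(x) on [xmin, xmax] split into four subintervals, each approximated by a straight line a(i) + b(i)x (i = 0 … 3). A 2-bit address taken from x selects one entry from each of two 4-entry tables (a and b); a multiplier (×) and an adder (+) then form a(i) + b(i)x.]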

[© Oxford U Press]
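A Python sketch of four-subinterval linear interpolation. The target function $2^x$ on [0, 1), the 10-bit input, measuring x from the left end of each subinterval (dx), and using chord slopes for b(i) are all assumptions for illustration; better-fitted coefficients would give smaller errors.

```python
FRAC_BITS = 10   # x is a 10-bit fraction in [0, 1) (assumption)
ADDR_BITS = 2    # 2-bit address: 4 subintervals, 4-entry tables


def f(x: float) -> float:
    """Hypothetical target function."""
    return 2.0 ** x


N = 1 << ADDR_BITS
A = [f(i / N) for i in range(N)]                          # a(i): f at left end
B = [(f((i + 1) / N) - f(i / N)) * N for i in range(N)]   # b(i): chord slope


def interp(x_bits: int) -> float:
    """Approximate f(x), x = x_bits / 2**FRAC_BITS, as a(i) + b(i) * dx."""
    low_bits = FRAC_BITS - ADDR_BITS
    i = x_bits >> low_bits                                    # top 2 bits: table address
    dx = (x_bits & ((1 << low_bits) - 1)) / (1 << FRAC_BITS)  # offset inside subinterval
    return A[i] + B[i] * dx


worst = max(abs(interp(k) - f(k / (1 << FRAC_BITS))) for k in range(1 << FRAC_BITS))
print(f"worst-case error: {worst:.2e}")
```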


Piecewise Table Lookup

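[Figure: two table-based designs for computing z mod p from a b-bit input z. One splits the input into b − h and h bits and combines Table 1 and Table 2 lookups with an adder to give the d-bit output. The other splits the input into b − g and g bits; two tables produce d-bit values that are added (d + 1 bits), p is subtracted in a second adder, and the sign of the difference drives a multiplexer that selects the d-bit output z mod p.]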

[© Oxford U Press]
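The second design (two lookups, one add, one subtraction of p, and a sign-controlled mux) can be sketched in Python. The split into a high part and a g-bit low part, the table contents, and the names `make_tables` and `mod_by_tables` are assumptions; the sketch relies on each table entry being below p, so their sum is below 2p and one conditional subtraction suffices.

```python
def make_tables(p: int, b: int, g: int) -> tuple[list[int], list[int]]:
    """Hypothetical split of a b-bit input into a (b-g)-bit high part and a
    g-bit low part. Table 1 holds (high * 2**g) mod p; Table 2 holds low mod p."""
    t1 = [(hi << g) % p for hi in range(1 << (b - g))]
    t2 = [lo % p for lo in range(1 << g)]
    return t1, t2


def mod_by_tables(z: int, p: int, g: int, t1: list[int], t2: list[int]) -> int:
    """z mod p from two lookups, one add, one subtraction of p and a mux."""
    hi, lo = z >> g, z & ((1 << g) - 1)
    total = t1[hi] + t2[lo]             # each entry < p, so total < 2p
    diff = total - p                    # second adder: subtract p
    return total if diff < 0 else diff  # mux controlled by the sign of diff


p, b, g = 13, 8, 4
t1, t2 = make_tables(p, b, g)
print(mod_by_tables(200, p, g, t1, t2))   # 5, since 200 = 15 * 13 + 5
```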


Accuracy vs. Lookup Table Size Trade-off

[Figure: worst-case absolute error (log scale, $10^{-1}$ to $10^{-9}$) versus number of address bits h (0 to 10) for linear, 2nd-degree, and 3rd-degree approximations.]

[© Oxford U Press]
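The linear curve of this trade-off can be approximated numerically. The Python sketch below samples the worst-case error of chord-based linear interpolation of a hypothetical $f(x) = 2^x$ on [0, 1) for $2^h$ subintervals; each extra address bit halves the subinterval width and cuts the error roughly by 4, at the cost of doubling the table size.

```python
def f(x: float) -> float:
    """Hypothetical function being approximated on [0, 1)."""
    return 2.0 ** x


def worst_linear_error(h: int, samples_per_interval: int = 64) -> float:
    """Sampled worst-case absolute error of chord-based linear
    interpolation on 2**h equal subintervals of [0, 1)."""
    n = 1 << h
    worst = 0.0
    for i in range(n):
        x0, x1 = i / n, (i + 1) / n
        slope = (f(x1) - f(x0)) * n
        for k in range(samples_per_interval):
            x = x0 + (x1 - x0) * k / samples_per_interval
            worst = max(worst, abs(f(x0) + slope * (x - x0) - f(x)))
    return worst


for h in range(7):
    print(h, f"{worst_linear_error(h):.2e}")
```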


Useful Links

• M. E. Phair, “Free Floating-Point Madness!”, http://www.hmc.edu/chips/
