chapter - university of isfahanengold.ui.ac.ir/~nikmehr/chapter_03_animated.pdf< 0 ≥ 0 a - b...

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface

5th Edition

Chapter 3 Arithmetic for Computers

Chapter 3 — Arithmetic for Computers — 2

Arithmetic for Computers Operations on integers

Addition and subtraction Multiplication and division Dealing with overflow

Floating-point real numbers Representation and operations

§3.1 Introduction


Integer Addition Example: 7 + 6

§3.2 Addition and Subtraction

Overflow if result out of range Adding +ve and –ve operands, no overflow Adding two +ve operands

Overflow if result sign is 1 Adding two –ve operands

Overflow if result sign is 0


Integer Subtraction Add negation of second operand Example: 7 – 6 = 7 + (–6)

+7: 0000 0000 … 0000 0111 –6: 1111 1111 … 1111 1010 +1: 0000 0000 … 0000 0001

Overflow if result out of range Subtracting two +ve or two –ve operands, no overflow Subtracting +ve from –ve operand

Overflow if result sign is 0 Subtracting –ve from +ve operand

Overflow if result sign is 1

Dealing with Overflow

Operation Operand A Operand B Result indicating overflow

A + B ≥ 0 ≥ 0 < 0 A + B < 0 < 0 ≥ 0 A - B ≥ 0 < 0 < 0 A - B < 0 ≥ 0 ≥ 0

Overflow occurs when the result of an operation cannot be represented in 32-bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit

When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur

Dealing with Overflow

Operation Operand A Operand B Result indicating overflow

A + B ≥ 0 ≥ 0 < 0 A + B < 0 < 0 ≥ 0 A - B ≥ 0 < 0 < 0 A - B < 0 ≥ 0 ≥ 0

Overflow occurs when the result of an operation cannot be represented in 32-bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit

When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur

MIPS signals overflow with an exception (aka interrupt) – an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception

2’s Comp - Detecting Overflow When adding two's complement numbers, overflow will only

occur if the numbers being added have the same sign the sign of the result is different

If we perform the addition an-1 an-2 ... a1 a0

+bn-1bn-2… b1 b0 ----------------------------------

sn-1sn-2… s1 s0

Overflow can be detected as

Overflow can also be detected as

where cn-1and cn are the carry in and carry out of the most significant bit

111111 −⋅−⋅−+−⋅−⋅−= nnnnnn sbasbaV

1−⊗= nn ccV

Unsigned - Detecting Overflow For unsigned numbers, overflow occurs if there is carry out

of the most significant bit. For example, 1001 = 9 + 1000 = 8 0001 = 1 With the MIPS architecture

Overflow exceptions occur for two’s complement arithmetic add, sub, addi

Overflow exceptions do not occur for unsigned arithmetic addu, subu, addiu

ncV =


Dealing with Overflow Some languages (e.g., C) ignore overflow

Use MIPS addu, addui, subu instructions

Other languages (e.g., Ada, Fortran) require raising an exception Use MIPS add, addi, sub instructions On overflow, invoke exception handler

Save PC in exception program counter (EPC) register Jump to predefined handler address mfc0 (move from coprocessor reg) instruction can

retrieve EPC value, to return after corrective action

MIPS Arithmetic Logic Unit (ALU) Must support the Arithmetic/Logic

operations of the ISA add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt and, andi, nor, or, ori, xor, xori beq, bne, slt, slti, sltiu, sltu

32

32

32

m (operation)

result

A

B

ALU

4

zero ovf

1 1

MIPS Arithmetic Logic Unit (ALU) Must support the Arithmetic/Logic

operations of the ISA add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt and, andi, nor, or, ori, xor, xori beq, bne, slt, slti, sltiu, sltu

32

32

32

m (operation)

result

A

B

ALU

4

zero ovf

1 1

With special handling for sign extend – addi, addiu, slti, sltiu zero extend – andi, ori, xori overflow detection – add, addi, sub


Arithmetic for Multimedia Graphics and media processing operates on

vectors of 8-bit and 16-bit data Use 64-bit adder, with partitioned carry chain

Operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors SIMD (single-instruction, multiple-data)

Saturating operations On overflow, result is largest representable value eg., clipping in audio, saturation in video

Multiply Binary multiplication is just a bunch of right

shifts and adds

multiplicand

multiplier

partial product array

double precision product

n

2n

n

Multiply Binary multiplication is just a bunch of right

shifts and adds

multiplicand

multiplier

partial product array

double precision product

n

2n

n can be formed in parallel and added in parallel for faster multiplication


Multiplication Start with long-multiplication approach

1000 × 1001 1000 0000 0000 1000 1001000

Length of product is the sum of operand lengths

multiplicand

multiplier

product

§3.3 Multiplication


Multiplication Hardware

Initially 0

Multiplicand is zero extended (unsigned mul)

Multiplication Example (Version 1) Consider: 11002 × 11012 , Product = 100111002 4-bit multiplicand and multiplier are used in this example Multiplicand is zero extended because it is unsigned

2

00001100 00000000 1101 Initialize 0

1

3

4

Multiplicand Product Multiplier Iteration


00001100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4



00011000 0110 SLL Multiplicand and SRL Multiplier 00001100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4



00001100 Multiplier[0] = 0 => Do Nothing 00011000 0110 SLL Multiplicand and SRL Multiplier

00001100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4





2

00001100 00000000 1101 Initialize 0

1

3

4





00111100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4



0001 01100000 SLL Multiplicand and SRL Multiplier



00111100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4






00111100 Multiplier[0] = 1 => ADD +

10011100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4







00001100 Multiplier[0] = 1 => ADD +

00111100 Multiplier[0] = 1 => ADD +

10011100 Multiplier[0] = 1 => ADD +

2

00001100 00000000 1101 Initialize 0

1

3

4


Observation on Version 1 of Multiply Hardware in version 1 can be optimized Rather than shifting the multiplicand to the

left Instead, shift the product to the right Has same net effect and produces same results

Reduce Hardware Multiplicand register can be reduced to 32 bits We can also reduce the adder size to 32 bits

One cycle per iteration Shifting and addition can be done

simultaneously

Product = HI and LO registers, HI=0 Product is shifted right Reduced 32-bit Multiplicand & Adder

Version 2 of Multiplication Hardware

= 0

Start

Multiplier[0]?

HI = HI + Multiplicand

32nd Repetition?

Done

= 1

No

Yes

Shift Product = (HI,LO) Right 1 bit Shift Multiplier Right 1 bit

32-bit ALU

Control 64 bits

32 bits

write

add

Multiplicand

shift right

32 bits

33 bits

HI LO

32 bits carry

Multiplier

shift right

Multiplier[0] 32 bits

Eliminate Multiplier Register Initialize LO = Multiplier, HI=0 Product = HI and LO registers

Refined Version of Multiply Hardware

= 0

Start

LO[0]?

HI = HI + Multiplicand

32nd Repetition?

Done

= 1

No

Yes

LO=Multiplier, HI=0

Shift Product = (HI,LO) Right 1 bit 32-bit ALU

Control 64 bits

32 bits

write

add

LO[0]

Multiplicand

shift right

32 bits

33 bits

HI LO

32 bits carry

Multiply Example (Refined Version) Consider: 11002 × 11012 , Product = 100111002 4-bit multiplicand and multiplier are used in this example 4-bit adder produces a 5-bit sum (with carry)

2

1 1 0 0 0 0 0 0 1 1 0 1 Initialize (LO = Multiplier, HI=0) 0

1

3

4

Multiplicand Product = HI, LO Carry Iteration


2


1

3

4


0 1 1 0 0 1 1 0 1 LO[0] = 1 => ADD +


1 1 0 0 Shift Right Product = (HI, LO) 0 1 1 0 0 1 1 0

2


1

3

4


0 1 1 0 0 1 1 0 1 LO[0] = 1 => ADD +


LO[0] = 0 => Do Nothing 1 1 0 0 Shift Right Product = (HI, LO) 0 1 1 0 0 1 1 0

2


1

3

4


0 1 1 0 0 1 1 0 1 LO[0] = 1 => ADD +




2


1

3

4


0 1 1 0 0 1 1 0 1 LO[0] = 1 => ADD +




2


1

3

4


0 1 1 0 0 1 1 0 1 LO[0] = 1 => ADD +

0 1 1 1 1 0 0 1 1 LO[0] = 1 => ADD +





2


1

3

4


0 1 1 0 0 1 1 0 1 LO[0] = 1 => ADD +

0 1 1 1 1 0 0 1 1 LO[0] = 1 => ADD +

1 0 0 1 1 1 0 0 1 LO[0] = 1 => ADD +


Faster Multiplier Uses multiple adders

Cost/performance tradeoff

Can be pipelined Several multiplication performed in parallel


MIPS Multiplication Two 32-bit registers for product

HI: most-significant 32 bits LO: least-significant 32-bits

Instructions mult rs, rt / multu rs, rt

64-bit product in HI/LO mfhi rd / mflo rd

Move from HI/LO to rd Can test HI value to see if product overflows 32 bits

mul rd, rs, rt

Least-significant 32 bits of product –> rd

Unsigned Division (Paper & Pencil)

=

11


Division Check for 0 divisor Long division approach

If divisor ≤ dividend bits 1 bit in quotient, subtract

Otherwise 0 bit in quotient, bring down next

dividend bit Restoring division

Do the subtract, and if remainder goes < 0, add divisor back

Signed division Divide using absolute values Adjust sign of quotient and remainder

as required

1001 1000 1001010 -1000 10 101 1010 -1000 10

n-bit operands yield n-bit quotient and remainder

quotient

dividend

remainder

divisor

§3.4 Division


Division Hardware

Initially dividend

Initially divisor in left half

Division Example

Dividend = 0111 Divisor = 0010

Optimized Divider Uses two registers: HI and LO Initialize: HI = Remainder = 0 and LO = Dividend Shift (HI, LO) LEFT by 1 bit (also Shift Quotient LEFT)

Shift the remainder and dividend registers together LEFT

Compute: Difference = Remainder – Divisor If (Difference ≥ 0) then

Remainder = Difference Set Least significant Bit of Quotient

Observation to Reduce Hardware: LO register can be also used to store the computed

Quotient


Optimized Divider Initialize:

HI = 0, LO = Dividend

Results: HI = Remainder LO = Quotient

Unsigned Integer Division Example Example: 11102 / 00112 (4-bit dividend & divisor) Result Quotient = 01002 and Remainder = 00102 4-bit registers for Remainder and Divisor (4-bit ALU)

2

0 0 0 0 1 1 1 0 Initialize 0

1

3

4

HI Difference LO Iteration 0 0 1 1 Divisor


2

0 0 0 0 1 1 1 0 Initialize 0

1

3

4


1 1 1 0 1: Shift Left, Diff = HI - Divisor 0 0 0 1 1 1 0 0 0 0 1 1


2: Diff < 0 => Do Nothing

2

0 0 0 0 1 1 1 0 Initialize 0

1

3

4





2

0 0 0 0 1 1 1 0 Initialize 0

1

3

4





2: Rem = Diff, set lsb of LO 1 0 0 1 0 0 0 0


2

0 0 0 0 1 1 1 0 Initialize 0

1

3

4







2

0 0 0 0 1 1 1 0 Initialize 0

1

3

4






Combined Multiplier and Divider

One cycle per partial-remainder subtraction Looks a lot like a multiplier!

Same hardware can be used for both


Faster Division Can’t use parallel hardware as in multiplier

Subtraction is conditional on sign of remainder Faster dividers (e.g. SRT devision)

generate multiple quotient bits per step Still require multiple steps


MIPS Division Use HI/LO registers for result

HI: 32-bit remainder LO: 32-bit quotient

Instructions div rs, rt / divu rs, rt

No overflow or divide-by-0 checking Software must perform checks if required

Use mfhi, mflo to access result

Representing Big (and Small) Numbers What if we want to encode

approx. age of the earth? 4,600,000,000 or 4.6 x 109

weight in kg of one a.m.u. (atomic mass unit) 0.0000000000000000000000000166 or 1.6 x 10-27

There is no way we can encode either of them in a 32-bit integer

Floating point representation (-1)sign x F x 2E

Still have to fit everything in 32 bits (single precision)


Floating Point Representation for non-integral numbers

Including very small and very large numbers Like scientific notation

–2.34 × 1056 +0.002 × 10–4 +987.02 × 109

In binary ±1.xxxxxxx2 × 2yyyy

Types float and double in C

normalized

not normalized

§3.5 Floating Point


Floating Point Standard Defined by IEEE Std 754-1985 Developed in response to divergence of

representations Portability issues for scientific code

Now almost universally adopted Two representations

Single precision (32-bit) Double precision (64-bit)


IEEE Floating-Point Format

S: sign bit (0 ⇒ non-negative, 1 ⇒ negative) Normalized significand: 1.0 ≤ |significand| < 2.0

Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)

Significand is Fraction with the “1.” restored

Exponent: excess representation: actual exponent + Bias Ensures exponent is unsigned Single: Bias = 127; Double: Bias = 1023

S Exponent Fraction

single: 8 bits double: 11 bits

single: 23 bits double: 52 bits

Bias)(ExponentS 2Fraction)(11)(x −×+×−=


IEEE Floating-Point Format

All 0

All 1

All 0

All 1

All 0

All 1

All 0

All 1


Single-Precision Range Exponents 00000000 and 11111111 reserved Smallest value

Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126

Fraction: 000…00 ⇒ significand = 1.0 ±1.0 × 2–126 ≈ ±1.2 × 10–38

Largest value exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127

Fraction: 111…11 ⇒ significand ≈ 2.0 ±2.0 × 2+127 ≈ ±3.4 × 10+38


Double-Precision Range Exponents 0000…00 and 1111…11 reserved Smallest value

Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022

Fraction: 000…00 ⇒ significand = 1.0 ±1.0 × 2–1022 ≈ ±2.2 × 10–308

Largest value Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023

Fraction: 111…11 ⇒ significand ≈ 2.0 ±2.0 × 2+1023 ≈ ±1.8 × 10+308


Floating-Point Precision Relative precision

all fraction bits are significant Single: approx 2–23

Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6 decimal digits of precision

Double: approx 2–52

Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16 decimal digits of precision


Floating-Point Example Represent –0.75

–0.75 = (–1)1 × 1.12 × 2–1

S = 1 Fraction = 1000…002 Exponent = –1 + Bias

Single: –1 + 127 = 126 = 011111102 Double: –1 + 1023 = 1022 = 011111111102

Single: 1011111101000…00 Double: 1011111111101000…00


Floating-Point Example What number is represented by the single-

precision float 11000000101000…00

S = 1 Fraction = 01000…002 Exponent = 100000012 = 129

x = (–1)1 × (1 + 0.012) × 2(129 – 127) = (–1) × 1.25 × 22 = –5.0


Representable FP Numbers

Denser Denser Sparser Sparser

Negative numbers FLP FLP ±0 +∞

–∞

Overflow region

Overflow region

Underflow regions

Positive numbers

Underflow example

Overflow example

Midway example

Typical example

min max min max + + – – – +

+1.0×2–126 -1.0×2–126 +2.0×2+127 -2.0×2+127


Zero Biased exponent=0 Fraction = 0 Hidden bit = 0.

Two representations for 0 (single precision)

+0.0 0x00000000

-0.0 0x80000000

0.020)(01)(x BiasS ±=×+×−= −


Denormal Numbers Exponent = 000...0 ⇒ hidden bit is 0

Smaller than normal numbers allow for gradual underflow, with diminishing

precision

1)BiasS 2Fraction)(01)(x −−×+×−= (

Denormalized Example Find the decimal equivalent of IEEE 754 number

0 00000000 11011000000000000000000 Determine the sign: 0, the number is positive Calculate the exponent

exponent of all 0s means the number is denormalized exponent is therefore -126

Convert the significand to decimal remember to add the implicit bit (this is 0. because the

number is denormalized) (0.11011)2 = (0.84375)10

0 00000000 11011000000000000000000 = 0.84375 x 2-126



Infinities and NaNs Exponent = 111...1, Fraction = 000...0

±Infinity Can be used in subsequent calculations,

avoiding need for overflow check Exponent = 111...1, Fraction ≠ 000...0

Not-a-Number (NaN) Indicates illegal or undefined result

e.g., 0.0 / 0.0 Can be used in subsequent calculations


NaNs 1. Operations with at least one NaN operand 2. Indeterminate forms

divisions 0/0 and ±∞/±∞ multiplications 0×±∞ and ±∞×0 additions ∞ + (−∞), (−∞) + ∞ and equivalent

subtractions 3. Real operations with complex results

square root of a negative number. logarithm of a negative number inverse sine or cosine of a number that is less than

−1 or greater than +1.


Floating-Point Addition Consider a 4-digit decimal example

9.999 × 101 + 1.610 × 10–1

1. Align decimal points Shift number with smaller exponent 9.999 × 101 + 0.016 × 101

2. Add significands 9.999 × 101 + 0.016 × 101 = 10.015 × 101

3. Normalize result & check for over/underflow 1.0015 × 102

4. Round and renormalize if necessary 1.002 × 102


Floating-Point Addition Now consider a 4-digit binary example

1.0002 × 2–1 + –1.1102 × 2–2 (0.5 + –0.4375) 1. Align binary points

Shift number with smaller exponent 1.0002 × 2–1 + –0.1112 × 2–1

2. Add significands 1.0002 × 2–1 + –0.1112 × 2–1 = 0.0012 × 2–1

3. Normalize result & check for over/underflow 1.0002 × 2–4, with no over/underflow

4. Round and renormalize if necessary 1.0002 × 2–4 (no change) = 0.0625

Floating Point Addition Example Consider adding: (1.111)2 × 2–1 + (1.011)2 × 2–3

For simplicity, we assume 4 bits of precision (or 3 bits of fraction)



Exponents are not equal




Shift the significand of the lesser exponent right until its exponent matches the larger number




Shift the significand of the lesser exponent right until its exponent matches the larger number (1.011)2 × 2–3 = (0.1011)2 × 2–2 = (0.01011)2 × 2–1





Difference between the two exponents = –1 – (–3) = 2






So, shift right by 2 bits






So, shift right by 2 bits Now, add the significands:

Carry

1.111 0.01011

10.00111

+

Addition Example – cont’d So, (1.111)2 × 2–1 + (1.011)2 × 2–3 = (10.00111)2 × 2–1


However, result (10.00111)2 × 2–1 is NOT normalized



Normalize result: (10.00111)2 × 2–1 = (1.000111)2 × 20




In this example, we have a carry

So, shift right by 1 bit and increment the exponent






Round the significand to fit in appropriate number of bits

We assumed 4 bits of precision or 3 bits of fraction








Round to nearest: (1.000111)2 ≈ (1.001)2

1.000 111 1

1.001

+








Round to nearest: (1.000111)2 ≈ (1.001)2

Renormalize if rounding generates a carry

1.000 111 1

1.001

+








Round to nearest: (1.000111)2 ≈ (1.001)2

Renormalize if rounding generates a carry

Detect overflow / underflow

If exponent becomes too large (overflow) or too small (underflow) Represent overflow/underflow accordingly

1.000 111 1

1.001

+

Floating Point Subtraction Example Consider: (1.000)2 × 2–3 – (1.000)2 × 22

We assume again: 4 bits of precision (or 3 bits of fraction)



Shift right significand of the lesser exponent



Shift right significand of the lesser exponent Difference between the two exponents = 2 – (–3) = 5



Shift right significand of the lesser exponent Difference between the two exponents = 2 – (–3) = 5 Shift right by 5 bits: (1.000)2 × 2–3 = (0.00001000)2 × 22




Convert subtraction into addition to 2's complement





+ 0.00001 × 22

– 1.00000 × 22

Sign





+ 0.00001 × 22

– 1.00000 × 22

0 0.00001 × 22

1 1.00000 × 22

Sign

2’s

Com

plem

ent





+ 0.00001 × 22

– 1.00000 × 22

0 0.00001 × 22

1 1.00000 × 22

1 1.00001 × 22

Sign

2’s

Com

plem

ent





+ 0.00001 × 22

– 1.00000 × 22

0 0.00001 × 22

1 1.00000 × 22

1 1.00001 × 22

Sign

Since result is negative, convert result from 2's complement to sign-magnitude

2’s

Com

plem

ent





+ 0.00001 × 22

– 1.00000 × 22

0 0.00001 × 22

1 1.00000 × 22

1 1.00001 × 22

Sign

Since result is negative, convert result from 2's complement to sign-magnitude

2’s

Com

plem

ent

– 0.11111 × 22 2’s Complement

Subtraction Example – cont’d So, (1.000)2 × 2–3 – (1.000)2 × 22 = – 0.111112 × 22

Subtraction Example – cont’d So, (1.000)2 × 2–3 – (1.000)2 × 22 = – 0.111112 × 22 Normalize result: – 0.111112 × 22 = – 1.11112 × 21

Subtraction Example – cont’d So, (1.000)2 × 2–3 – (1.000)2 × 22 = – 0.111112 × 22 Normalize result: – 0.111112 × 22 = – 1.11112 × 21 Round the significand to fit in appropriate number of bits




Round to nearest: (1.1111)2 ≈ (10.000)2

1.111 1 1

10.000

+



Round to nearest: (1.1111)2 ≈ (10.000)2 Renormalize: rounding generated a carry –1.11112 × 21 ≈ –10.0002 × 21 = –1.0002 × 22

1.111 1 1

10.000

+



Round to nearest: (1.1111)2 ≈ (10.000)2 Renormalize: rounding generated a carry –1.11112 × 21 ≈ –10.0002 × 21 = –1.0002 × 22

Result would have been accurate if more fraction bits are used

1.111 1 1

10.000

+

Floating Point Subtraction Example Consider: (1.000)2 × 2–3 – (1.100)2 × 2-4




Shift right significand of the lesser exponent



Shift right significand of the lesser exponent Difference between the two exponents = -3 – (–4) = 1



Shift right significand of the lesser exponent Difference between the two exponents = -3 – (–4) = 1 Shift right by 1 bits: (1.100)2 × 2–4 = (0.1100)2 × 2-3





+ 1.0000 × 2-3

– 0.1100 × 2-3

Sign





+ 1.0000 × 2-3

– 0.1100 × 2-3

0 1.0000 × 2-3

1 1.0100 × 2-3

Sign

2’s

Com

plem

ent





+ 1.0000 × 2-3

– 0.1100 × 2-3

0 1.0000 × 2-3

1 1.0100 × 2-3

0 0.0100 × 2-3

Sign

2’s

Com

plem

ent

Subtraction Example – cont’d So, (1.000)2 × 2–3 – (1.100)2 × 2-4 = 0.01002 × 2-3

Subtraction Example – cont’d So, (1.000)2 × 2–3 – (1.100)2 × 2-4 = 0.01002 × 2-3 Normalize result: 0.01002 × 2-3 = 1.00002 × 2-5


For subtraction, we can have leading zeros


For subtraction, we can have leading zeros

Round the significand to fit in appropriate number of bits We assumed 4 bits of precision or 3 bits of fraction

Answer: 1.0002 × 2-5


FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too

long Much longer than integer operations Slower clock would penalize all instructions

FP adder usually takes several cycles Can be pipelined


FP Adder Hardware

Step 1

Step 2

Step 3

Step 4


FP Adder Hardware


Floating-Point Multiplication Consider a 4-digit decimal example

1.110 × 1010 × 9.200 × 10–5

1. Add exponents For biased exponents, subtract bias from sum New exponent = 10 + –5 = 5

2. Multiply significands 1.110 × 9.200 = 10.212 ⇒ 10.212 × 105

3. Normalize result & check for over/underflow 1.0212 × 106

4. Round and renormalize if necessary 1.021 × 106

5. Determine sign of result from signs of operands +1.021 × 106


Floating-Point Multiplication Now consider a 4-digit binary example

1.0002 × 2–1 × –1.1102 × 2–2 (0.5 × –0.4375) 1. Add exponents

Unbiased: –1 + –2 = –3 Biased: (–1 + 127) + (–2 + 127) = –3 + 254 – 127 = –3 + 127

2. Multiply significands 1.0002 × 1.1102 = 1.1102 ⇒ 1.1102 × 2–3

3. Normalize result & check for over/underflow 1.1102 × 2–3 (no change) with no over/underflow

4. Round and renormalize if necessary 1.1102 × 2–3 (no change)

5. Determine sign: +ve × –ve ⇒ –ve –1.1102 × 2–3 = –0.21875


FP Arithmetic Hardware FP multiplier is of similar complexity to FP

adder But uses a multiplier for significands instead of

an adder FP arithmetic hardware usually does

Addition, subtraction, multiplication, division, reciprocal, square-root

FP ↔ integer conversion Operations usually takes several cycles

Can be pipelined


FP Instructions in MIPS FP hardware is coprocessor 1

Adjunct processor that extends the ISA Separate FP registers

32 single-precision: $f0, $f1, … $f31 Paired for double-precision: $f0/$f1, $f2/$f3, …

Release 2 of MIPs ISA supports 32 × 64-bit FP reg’s

FP instructions operate only on FP registers Programs generally don’t do integer ops on FP data, or

vice versa More registers with minimal code-size impact

FP load and store instructions lwc1, ldc1, swc1, sdc1

e.g., ldc1 $f8, 32($sp)


FP Instructions in MIPS Single-precision arithmetic

add.s, sub.s, mul.s, div.s add.s $f0, $f1, $f6 #$f0 = $f1 + $f6

Double-precision arithmetic add.d, sub.d, mul.d, div.d

add.d $f0, $f2, $f6 #$f0:$f1 = $f2:$f3 + $f6:$f7

mul.d $f4, $f4, $f6

Single- and double-precision comparison c.xx.s, c.xx.d (xx is eq, lt, le, …) Sets or clears FP condition-code bit (0 to 7)

c.lt.s $f3, $f4 #if($f3 < $f4) cond=1; else cond=0

Branch on FP condition code true or false bc1t, bc1f

bc1t TargetLabel #if(cond==1) go to TargetLabel


FP Example: °F to °C C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr - 32.0)); }

fahr in $f12, result in $f0, literals in global memory space

Compiled MIPS code: f2c: lwc1 $f16, const5($gp) lwc1 $f18, const9($gp) div.s $f16, $f16, $f18 lwc1 $f18, const32($gp) sub.s $f18, $f12, $f18 mul.s $f0, $f16, $f18 jr $ra


Accurate Arithmetic IEEE Std 754 specifies additional rounding

control Extra bits of precision (guard, round, sticky) Choice of rounding modes Allows programmer to fine-tune numerical behavior of

a computation Not all FP units implement all options

Most programming languages and FP libraries just use defaults

Trade-off between hardware complexity, performance, and market requirements

Support for Accurate Arithmetic

IEEE 754 FP rounding modes Always round up (toward +∞) Always round down (toward -∞) Truncate (toward zero) Round to nearest even (RTNE)

Support for Accurate Arithmetic

Rounding (except for truncation) requires the hardware to include extra F bits during calculations Guard bit – used to provide one F bit when shifting left to normalize

a result (e.g., when normalizing F after division or subtraction) Round bit – used to improve rounding accuracy Sticky bit – used to support Round to nearest even; is set to a 1

whenever a 1 bit shifts (right) through it (e.g., when aligning F during addition/subtraction) sticky OR of the discarded bits

IEEE 754 FP rounding modes Always round up (toward +∞) Always round down (toward -∞) Truncate (toward zero) Round to nearest even (RTNE)

F = 1 . xxxxxxxxxxxxxxxxxxxxxxx G R S

Rounding Examples


Number To +∞ To −∞ To Zero +1.000101 +1.0010 +1.0001 +1.0001 - 1.000111 - 1.0001 - 1.0010 - 1.0001 +1.010110 +1.0110 +1.0101 +1.0101 +1.010010 +1.0101 +1.0100 +1.0100 - 1.001110 - 1.0011 - 1.0100 - 1.0011

RTNE Examples GRS - Action (after first normalization – if required) 0xx - round down = do nothing (x means any bit value, 0 or 1) 100 - this is a tie: round up if the bit just before G is 1, else

round down = do nothing 101 - round up (increment by 1) 110 - round up (increment by 1) 111 - round up (increment by 1) Sig GRS Increment? Rounded 1.000000 N 1.000 1.101000 N 1.101 1.000100 N 1.000 1.001100 Y 1.010 1.000101 Y 1.001 1.111110 Y 10.000


Guard Bit Guard bit: guards against loss of a significant bit

Only one guard bit is needed to maintain accuracy of result Shifted left (if needed) during normalization as last fraction bit



Example on the need of a guard bit:




1.00000000101100010001101 × 25

– 1.00000000000000011011010 × 2-2 (subtraction)




1.00000000101100010001101 × 25

– 1.00000000000000011011010 × 2-2 (subtraction)

1.00000000101100010001101 × 25

– 0.00000010000000000000001 1011010 × 25 (shift right 7 bits)




1.00000000101100010001101 × 25

– 1.00000000000000011011010 × 2-2 (subtraction)

1.00000000101100010001101 × 25

– 0.00000010000000000000001 1011010 × 25 (shift right 7 bits)

0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 100110 × 25 (2's complement)




1.00000000101100010001101 × 25

– 1.00000000000000011011010 × 2-2 (subtraction)

1.00000000101100010001101 × 25

– 0.00000010000000000000001 1011010 × 25 (shift right 7 bits)

0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 100110 × 25 (2's complement)

Guard bit – do not discard




1.00000000101100010001101 × 25

– 1.00000000000000011011010 × 2-2 (subtraction)

1.00000000101100010001101 × 25

– 0.00000010000000000000001 1011010 × 25 (shift right 7 bits)

0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 100110 × 25 (2's complement)

0 0.11111110101100010001011 0 100110 × 25 (add significands)





1.00000000101100010001101 × 25

– 1.00000000000000011011010 × 2-2 (subtraction)

1.00000000101100010001101 × 25

– 0.00000010000000000000001 1011010 × 25 (shift right 7 bits)

0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 100110 × 25 (2's complement)

0 0.11111110101100010001011 0 100110 × 25 (add significands)

+ 1.11111101011000100010110 1 001100 × 24 (normalized)


Round and Sticky Bits Two extra bits are needed for rounding

Rounding performed after normalizing a result significand Round bit: appears after the guard bit Sticky bit: appears after the round bit (OR of all additional bits)



Consider the same example of previous slide:




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)

0 0.11111110101100010001011 0 1 00110 × 25 (sum)




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)

0 0.11111110101100010001011 0 1 00110 × 25 (sum)

+ 1.11111101011000100010110 1 0 1 × 24 (normalized)




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)

0 0.11111110101100010001011 0 1 00110 × 25 (sum)

+ 1.11111101011000100010110 1 0 1 × 24 (normalized) Round bit




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)

0 0.11111110101100010001011 0 1 00110 × 25 (sum)

+ 1.11111101011000100010110 1 0 1 × 24 (normalized) Round bit Sticky bit




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)

0 0.11111110101100010001011 0 1 00110 × 25 (sum)

+ 1.11111101011000100010110 1 0 1 × 24 (normalized) Round bit Sticky bit

OR-reduce

Guard bit




0 1.00000000101100010001101 × 25

1 1.11111101111111111111110 0 1 00110 × 25 (2's complement)

0 0.11111110101100010001011 0 1 00110 × 25 (sum)

+ 1.11111101011000100010110 1 0 1 × 24 (normalized)

Round bit Sticky bit

OR-reduce

Guard bit

Guard bit

chapter - university of isfahanengold.ui.ac.ir/~nikmehr/chapter_03_animated.pdf< 0 ≥ 0 a - b...

Documents