CPSC 321 Computer Architecture

ALU Design – Integer Addition, Multiplication & Division

Copyright 2002 David H. Albonesi and the University of Rochester. Additional material by Rabi Mahapatra and Hank Walker

Integer multiplication

Pencil and paper binary multiplication

      1000      (multiplicand)
    x 1001      (multiplier)
    ------
      1000
     0000
    0000        (partial products)
 + 1000
 ---------
   1001000      (product)

Key elements
  Examine multiplier bits from right to left
  Shift multiplicand left one position each step
  Simplification: each step, add multiplicand to running product total, but only if multiplier bit = 1


Initialize product register to 0

      1000 (multiplicand)    1001 (multiplier)
  00000000 (running product)

Multiplier bit = 1: add multiplicand to product

  00000000
+     1000 (multiplicand)
  00001000 (new running product)

Shift multiplicand left

     10000 (multiplicand)

Multiplier bit = 0: do nothing

  00001000 (running product)

Shift multiplicand left

    100000 (multiplicand)

Multiplier bit = 0: do nothing

  00001000 (running product)

Shift multiplicand left

   1000000 (multiplicand)

Multiplier bit = 1: add multiplicand to product

  00001000
+  1000000 (multiplicand)
  01001000 (product)

Integer multiplication 32-bit hardware implementation

Multiplicand loaded into right half of multiplicand register
Product register initialized to all 0’s
Repeat the following 32 times
  If multiplier register LSB=1, add multiplicand to product
  Shift multiplicand one bit left
  Shift multiplier one bit right
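The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware itself: the function name `shift_add_multiply` and the `bits` parameter are assumptions introduced here, and Python integers stand in for the registers, with masks modeling their fixed widths.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned shift-and-add multiply that mirrors the register-level steps."""
    half_mask = (1 << bits) - 1
    full_mask = (1 << (2 * bits)) - 1      # multiplicand and product registers are 2*bits wide
    mcand = multiplicand & half_mask       # loaded into the right half of its register
    mplier = multiplier & half_mask
    product = 0                            # product register initialized to all 0's
    for _ in range(bits):
        if mplier & 1:                     # multiplier LSB = 1: add multiplicand to product
            product = (product + mcand) & full_mask
        mcand = (mcand << 1) & full_mask   # shift multiplicand one bit left
        mplier >>= 1                       # shift multiplier one bit right
    return product
```

With `bits=4`, `shift_add_multiply(0b1000, 0b1001, bits=4)` follows the same steps as the pencil and paper example and yields 01001000.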

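[Figure: multiplication hardware; the control logic tests the LSB of the multiplier register]

[Figure: algorithm flowchart]

Drawback: half of the 64-bit multiplicand register is always zeros
  Half of the 64-bit adder is adding zeros

Solution: shift the product right instead of shifting the multiplicand left
  Only the left half of the product register is added to the multiplicand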





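Example: 1000 (multiplicand) x 1001 (multiplier)

  00000000 (running product)

Multiplier bit = 1: add multiplicand to left half of product, then shift product right

  00000000
+ 1000
  10000000 -> 01000000

Multiplier bit = 0: shift product right

  01000000 -> 00100000

Multiplier bit = 0: shift product right

  00100000 -> 00010000

Multiplier bit = 1: add multiplicand to left half of product, then shift product right

  00010000
+ 1000
  10010000 -> 01001000 (product)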


Hardware implementation


Final improvement: use right half of product register for the multiplier
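A sketch of this final arrangement, assuming unsigned operands: the multiplier starts in the right half of the product register, the multiplicand is added only to the left half, and one right shift per step serves both registers. The name `multiply_shared_register` is hypothetical, and a Python integer is used so the adder's carry-out is simply kept as an extra high bit until the shift brings it back in.

```python
def multiply_shared_register(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned multiply with the multiplier held in the right half of the product register."""
    half_mask = (1 << bits) - 1
    mcand = multiplicand & half_mask
    product = multiplier & half_mask       # right half holds the multiplier, left half is 0
    for _ in range(bits):
        if product & 1:                    # LSB of product register is the current multiplier bit
            product += mcand << bits       # add multiplicand to the left half only
        product >>= 1                      # one shift moves both product and multiplier
    return product
```

For the running example, `multiply_shared_register(0b1000, 0b1001, bits=4)` gives 01001000.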


Final algorithm

Multiplication of signed numbers

Naïve approach
  Convert to positive numbers
  Multiply
  Negate product if multiplier and multiplicand signs differ
  Slow and extra hardware

Multiplication of signed numbers

Booth’s algorithm
  Invented for speed
    Shifting was faster than addition at the time
    Objective: reduce the number of additions required
  Fortunately, it works for signed numbers as well

Basic idea: the additions from a string of 1’s in the multiplier can be converted to a single addition and a single subtraction operation

Example: 00111110 is equivalent to 01000000 – 00000010
  00111110 requires additions for each of the five bit positions holding a 1
  01000000 requires an addition for its single 1 bit position
  00000010 requires a subtraction for its single 1 bit position
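The identity behind the example can be written out explicitly. This is a worked restatement for the 8-bit pattern above; M stands for the multiplicand and is a symbol introduced here for illustration.

```latex
\begin{aligned}
00111110_2 &= 2^5 + 2^4 + 2^3 + 2^2 + 2^1 = 62 \\
           &= 2^6 - 2^1 = 01000000_2 - 00000010_2 \\
M \times 00111110_2 &= (M \times 2^6) - (M \times 2^1)
\end{aligned}
```

Five additions collapse into one addition and one subtraction, whatever the length of the string of 1's.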

Booth’s algorithm

Starting from right to left, look at two adjacent bits of the multiplier
  Place a zero at the right of the LSB to start

If bits = 00, do nothing

If bits = 10, subtract the multiplicand from the product
  Beginning of a string of 1’s

If bits = 11, do nothing
  Middle of a string of 1’s

If bits = 01, add the multiplicand to the product
  End of a string of 1’s

Shift product register right one bit
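The rules above can be sketched as follows. This is a minimal model under stated assumptions: `booth_multiply` is a hypothetical name, both operands are signed values that fit in `bits` bits, and a single Python integer plays the role of the combined product, multiplier, and extra-bit register. Python integers are unbounded, so the left half never overflows here, which a fixed-width register would have to guard against.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Signed multiply using Booth's algorithm on a [product | multiplier | extra bit] register."""
    # Multiplier sits just left of the extra bit, which starts at 0.
    reg = (multiplier & ((1 << bits) - 1)) << 1
    # Multiplicand aligned with the left (product) half of the register.
    addend = multiplicand << (bits + 1)
    for _ in range(bits):
        pair = reg & 0b11          # current multiplier bit and the bit to its right
        if pair == 0b10:           # beginning of a string of 1's: subtract
            reg -= addend
        elif pair == 0b01:         # end of a string of 1's: add
            reg += addend
        # 00 and 11: do nothing
        reg >>= 1                  # Python's >> on int is an arithmetic (sign-preserving) shift
    return reg >> 1                # drop the extra bit
```

For example, `booth_multiply(0b0010, -3, bits=4)` multiplies 0010 by 1101 and returns -6, which is 11111010 in 8-bit two's complement.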

Booth recoding

Example: 1101 (multiplier) x 0010 (multiplicand)

Initial state, with the extra bit position to the right of the LSB

  00001101 0 (product + multiplier, extra bit)

Bits = 10: subtract the multiplicand from the left half (add 1110)

  00001101 0
+ 1110
  11101101 0

Shift right

  11110110 1

Bits = 01: add the multiplicand to the left half (add 0010)

  11110110 1
+ 0010
  00010110 1

Shift right

  00001011 0

Bits = 10: subtract the multiplicand from the left half (add 1110)

  00001011 0
+ 1110
  11101011 0

Shift right

  11110101 1

Bits = 11: do nothing, then shift right

  11111010 1

Product is 11111010 (-6) once the extra bit is dropped

Integer division

Pencil and paper binary division

                     1001    (quotient)
(divisor) 1000 ) 01001000    (dividend)
                - 1000
                  0001       (partial remainder)
                  00010
                  000100
                  0001000
                - 0001000
                  0000000    (remainder)

Integer division


Steps in hardware
  Shift the dividend left one position
  Subtract the divisor from the left half of the dividend
  If result positive, shift left a 1 into the quotient
  Else, shift left a 0 into the quotient, and repeat from the beginning
  Once the result is positive, repeat the process for the partial remainder
  Do n iterations where n is the size of the divisor
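The steps above amount to restoring division, sketched below. The function name `restoring_divide` is an assumption made for illustration. The sketch takes an unsigned 2n-bit dividend and an n-bit divisor, and assumes the quotient fits in n bits. The subtraction is done on a trial copy, so a negative result leaves the partial remainder unchanged, which has the same effect as restoring it.

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """Return (quotient, remainder) for an unsigned 2n-bit dividend and n-bit divisor."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = dividend               # 2n-bit register, loaded with the dividend
    quotient = 0
    for _ in range(n):
        remainder <<= 1                # shift (partial) remainder left one position
        trial = (remainder >> n) - divisor   # subtract divisor from the left half
        if trial >= 0:
            remainder -= divisor << n  # keep the difference
            quotient = (quotient << 1) | 1   # shift a 1 into the quotient
        else:
            quotient <<= 1             # shift a 0 into the quotient; remainder is unchanged
    return quotient, remainder >> n    # final remainder sits in the left half
```

`restoring_divide(0b01001000, 0b1000, 4)` returns `(0b1001, 0)`, matching the pencil and paper result.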



Integer division

Initial state

  01001000 (dividend)    1000 (divisor)    0000 (quotient)

Shift dividend left one position

  10010000 (dividend)    1000 (divisor)    0000 (quotient)

Subtract divisor from left half of dividend

  10010000
- 1000
  00010000 (keep these bits)

Result positive, left shift a 1 into the quotient

  00010000 (partial remainder)    0001 (quotient)

Shift partial remainder left one position

  00100000

Subtract divisor from left half of partial remainder

  00100000
- 1000
  10100000

Result negative, left shift 0 into quotient

  0010 (quotient)

Restore original partial remainder (how?)

  00100000

Shift partial remainder left one position

  01000000

Subtract divisor from left half of partial remainder

  01000000
- 1000
  11000000

Result negative, left shift 0 into quotient

  0100 (quotient)

Restore original partial remainder

  01000000

Shift partial remainder left one position

  10000000

Subtract divisor from left half of partial remainder

  10000000
- 1000
  00000000

Result positive, left shift 1 into quotient

  00000000 (remainder)    1001 (quotient)

Integer division

Hardware implementation

[Figure: division hardware, with callouts “What operations do we do here?” and “Load dividend here initially”]

Integer and floating point revisited

Integer ALU handles add, subtract, logical, set less than, equality test, and effective address calculations

Integer multiplier handles multiply and divide
  HI and LO registers hold result of integer multiply and divide

[Figure: datapath block diagram showing PC, instruction memory, integer register file, integer ALU, integer multiplier with HI and LO registers, data memory, floating point register file, floating point adder, and floating point multiplier]

Floating point representation

Floating point (fp) numbers represent reals
  Example reals: 5.6745, $1.23 \times 10^{-19}$, $345.67 \times 10^{6}$
  Floats and doubles in C

Fp numbers are in signed magnitude representation of the form $(-1)^S \times M \times B^E$ where
  S is the sign bit (0=positive, 1=negative)
  M is the mantissa (also called the significand)
  B is the base (implied)
  E is the exponent
  Example: $22.34 \times 10^{-4}$
    S=0, M=22.34, B=10, E=-4

Floating point representation

Fp numbers are normalized in that M has only one digit to the left of the “decimal point”
  Between 1.0 and 9.9999… in decimal
  Between 1.0 and 1.1111… in binary
  Simplifies fp arithmetic and comparisons
  Normalized: $5.6745 \times 10^{2}$, $1.23 \times 10^{-19}$
  Not normalized: $345.67 \times 10^{6}$, $22.34 \times 10^{-4}$, $0.123 \times 10^{-45}$

In binary format, normalized numbers are of the form $(-1)^S \times 1.M \times B^E$
  Leading 1 in 1.M is implied

Floating point representation tradeoffs

Representing a wide enough range of fp values with enough precision (“decimal” places) given limited bits
  More E bits increases the range
  More M bits increases the precision
  A larger B increases the range but decreases the precision
  The distance between consecutive fp numbers is not constant!

[Figure: number line marked at $B^E$, $B^{E+1}$, $B^{E+2}$]

[Figure: 32-bit word for $(-1)^S \times 1.M \times B^E$ divided into S, E, and M fields, with the E and M widths left as open questions]

Floating point representation tradeoffs

Allowing for fast arithmetic implementations
  Different exponents require lining up the significands; a larger base increases the probability of equal exponents

Handling very small and very large numbers

[Figure: number line centered on 0 showing representable negative numbers (S=1) and representable positive numbers (S=0), with an exponent underflow region around 0 and exponent overflow regions at both ends]

Sorting/comparing fp numbers

fp numbers can be treated as integers for sorting and comparing purposes if E is placed to the left

  S | E | M        $(-1)^S \times 1.M \times B^E$

  Bigger E is bigger number
  If E’s are same, bigger M is bigger number

Example: $3.67 \times 10^{6} > 6.34 \times 10^{-4} > 1.23 \times 10^{-4}$
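This property can be checked on a real format. The sketch below assumes IEEE 754 single precision as the concrete S | E | M layout and uses Python's standard `struct` module to reinterpret the bits; `float_bits` is a helper name introduced here. It only covers positive values, since for negative numbers the sign bit reverses the integer ordering.

```python
import struct

def float_bits(x: float) -> int:
    """Return the single precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Positive values from the example, deliberately out of order.
values = [6.34e-4, 3.67e6, 1.23e-4]

# Sorting by the raw bit patterns gives the same order as sorting by value.
sorted_by_bits = sorted(values, key=float_bits)
sorted_by_value = sorted(values)
```

Here `sorted_by_bits` and `sorted_by_value` are the same list, because the exponent field sits to the left of the mantissa field.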

Biased exponent notation

111…111 represents the most positive E and 000…000 represents the most negative E for sorting/comparing purposes

To get correct signed value for E, need to subtract a bias of 011…111

Biased fp numbers are of the form $(-1)^S \times 1.M \times B^{E-\text{bias}}$

Example: assume 8 bits for E
  Bias is 01111111 = 127
  Largest E represented by 11111111 which is 255 – 127 = 128
  Smallest E represented by 00000000 which is 0 – 127 = -127

IEEE 754 floating point standard

Created in 1985 in response to the wide range of fp formats used by different companies
  Has greatly improved portability of scientific applications

B=2

Single precision (sp) format (“float” in C)
  S: 1 bit | E: 8 bits | M: 23 bits

Double precision (dp) format (“double” in C)
  S: 1 bit | E: 11 bits | M: 52 bits

IEEE 754 floating point standard

Exponent bias is 127 for sp and 1023 for dp

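Fp numbers are of the form $(-1)^S \times 1.M \times 2^{E-\text{bias}}$
  1 in mantissa and base of 2 are implied

Sp form is $(-1)^S \times 1.M_{22} M_{21} \ldots M_0 \times 2^{E-127}$
and value is $(-1)^S \times (1 + (M_{22} \times 2^{-1}) + (M_{21} \times 2^{-2}) + \ldots + (M_0 \times 2^{-23})) \times 2^{E-127}$

Sp example
  S=1, E=00000001, M=1000…000
  Number is $-1.1000\ldots000_2 \times 2^{1-127} = -1.5 \times 2^{-126} = -1.763 \times 10^{-38}$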
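The sp value formula can be applied directly to a bit pattern. The sketch below is illustrative only: `decode_single` is a hypothetical helper, it handles normalized numbers only (E between 1 and 254), and the standard `struct` module is used just to cross-check against the machine's own interpretation of the same bits.

```python
import struct

def decode_single(bits: int):
    """Split a 32-bit pattern into S, E, M and evaluate (-1)^S x 1.M x 2^(E-127)."""
    s = bits >> 31                  # 1 sign bit
    e = (bits >> 23) & 0xFF         # 8 exponent bits (biased)
    m = bits & 0x7FFFFF             # 23 mantissa bits
    value = (-1) ** s * (1 + m / 2**23) * 2.0 ** (e - 127)
    return s, e, m, value

# The sp example: S=1, E=00000001, M=1000...000
example_bits = (1 << 31) | (1 << 23) | (1 << 22)
s, e, m, value = decode_single(example_bits)

# Cross-check by letting struct reinterpret the same 32 bits as a float.
hardware_value = struct.unpack(">f", struct.pack(">I", example_bits))[0]
```

Both `value` and `hardware_value` equal -1.5 x 2^-126, roughly -1.763e-38.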

IEEE 754 floating point standard Denormalized numbers

Allow for representation of very small numbers

Identified by E=0 and a non-zero M

Format is $(-1)^S \times 0.M \times 2^{-(\text{bias}-1)}$

Smallest positive dp denormalized number is $0.00\ldots01 \times 2^{-1022} = 2^{-1074}$
  Smallest positive dp normalized number is $1.0 \times 2^{-1022}$

Hardware support is complex, and so often handled by software
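These dp limits can be observed from Python, assuming its `float` is an IEEE 754 double (true on common platforms) and that denormals are not flushed to zero. `math.ldexp(x, k)` computes x times 2^k, and `sys.float_info.min` is the smallest positive normalized double.

```python
import math
import sys

smallest_normal = math.ldexp(1.0, -1022)     # 1.0 x 2^-1022
smallest_denormal = math.ldexp(1.0, -1074)   # 0.00...01 x 2^-1022

# Denormals fill the gap between 0 and the smallest normalized number.
is_between = 0.0 < smallest_denormal < smallest_normal

# Nothing smaller is representable: halving the smallest denormal rounds to 0.
below_smallest = smallest_denormal / 2
```

`smallest_normal` matches `sys.float_info.min`, and `below_smallest` is exactly 0.0.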

[Figure: number line centered on 0 showing representable negative numbers and representable positive numbers, with an exponent underflow region around 0 and exponent overflow regions at both ends]

Floating point addition

Make both exponents the same
  Find the number with the smaller one
  Shift its mantissa to the right until the exponents match
    Must include the implicit 1 (1.M)

Add the mantissas

Choose the larger exponent

Put the result in normalized form
  Shift mantissa left or right until in form 1.M
  Adjust exponent accordingly

Handle overflow or underflow if necessary

Round

Renormalize if rounding produced an unnormalized result
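The steps above can be sketched on (S, E, M) triples. This is a simplified model, not a full IEEE 754 adder: `fp_add`, `FRAC_BITS`, and `BIAS` are names introduced here, inputs are assumed to be normalized single precision numbers, bits shifted out during alignment are simply discarded (truncation stands in for the rounding step), and exponent overflow and underflow are not checked.

```python
FRAC_BITS = 23   # mantissa width for single precision
BIAS = 127       # single precision exponent bias (unused by addition, shown for context)

def fp_add(a, b):
    """Add two normalized numbers given as (sign, biased exponent, mantissa) triples."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    sig_a = (1 << FRAC_BITS) | ma          # restore the implicit 1 (1.M)
    sig_b = (1 << FRAC_BITS) | mb
    # Make both exponents the same: shift the smaller number's significand right.
    if ea < eb:
        sig_a >>= eb - ea
        e = eb
    else:
        sig_b >>= ea - eb
        e = ea                             # result starts with the larger exponent
    # Add the significands, taking the signs into account.
    total = (-sig_a if sa else sig_a) + (-sig_b if sb else sig_b)
    if total == 0:
        return (0, 0, 0)                   # exact zero
    s = 1 if total < 0 else 0
    sig = abs(total)
    # Normalize: shift until the significand is back in 1.M form.
    while sig >= (1 << (FRAC_BITS + 1)):
        sig >>= 1
        e += 1
    while sig < (1 << FRAC_BITS):
        sig <<= 1
        e -= 1
    return (s, e, sig & ((1 << FRAC_BITS) - 1))   # drop the implicit 1 again
```

For instance, `fp_add((0, 127, 1 << 22), (0, 128, 1 << 20))` adds 1.5 and 2.25 and returns the triple for 3.75, that is `(0, 128, 0b111 << 20)`.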


[Figure: floating point addition algorithm flowchart]

Floating point addition example

Initial values

  S  E         M
  1  00000001  0000…01100
  0  00000011  0100…00111

Identify smaller E and calculate E difference

  difference = 2

Shift smaller M right by E difference

  S  E         M
  1  00000001  0100…00011
  0  00000011  0100…00111

Add mantissas

  -0.0100…00011 + 1.0100…00111 = 1.0000…00100
  Result: S=0, M=0000…00100

Choose larger exponent for result

  S  E         M
  0  00000011  0000…00100

Final answer (already normalized)

  S  E         M
  0  00000011  0000…00100

Hardware design

[Figure: floating point adder datapath, highlighted one stage at a time]
  Determine smaller exponent
  Shift mantissa of smaller number right by exponent difference
  Add mantissas
  Normalize result by shifting mantissa of result and adjusting larger exponent
  Round result
  Renormalize if necessary

Floating point multiply

Add the exponents and subtract the bias from the sum
  Example: (5+127) + (2+127) – 127 = 7+127

Multiply the mantissas

Put the result in normalized form
  Shift mantissa left or right until in form 1.M
  Adjust exponent accordingly

Handle overflow or underflow if necessary

Round

Renormalize if rounding produced an unnormalized result

Set S=0 if signs of both operands are the same, S=1 otherwise
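A matching sketch for multiplication, under the same simplifying assumptions as before: `fp_multiply`, `FRAC_BITS`, and `BIAS` are illustrative names, inputs are normalized single precision (S, E, M) triples, low-order product bits are truncated instead of rounded, and exponent overflow and underflow are not checked.

```python
FRAC_BITS = 23   # mantissa width for single precision
BIAS = 127       # single precision exponent bias

def fp_multiply(a, b):
    """Multiply two normalized numbers given as (sign, biased exponent, mantissa) triples."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Add the biased exponents and subtract the bias once.
    e = ea + eb - BIAS
    # Multiply the significands, including the implicit 1 (1.M x 1.M).
    prod = ((1 << FRAC_BITS) | ma) * ((1 << FRAC_BITS) | mb)
    # The product lies in [1, 4): if it is 1x.xxx, shift right and bump the exponent.
    if prod >= (1 << (2 * FRAC_BITS + 1)):
        prod >>= 1
        e += 1
    # Keep the top FRAC_BITS fraction bits (truncation) and drop the implicit 1.
    m = (prod >> FRAC_BITS) & ((1 << FRAC_BITS) - 1)
    s = sa ^ sb                      # S=0 if the signs match, S=1 otherwise
    return (s, e, m)
```

For example, `fp_multiply((1, 7, 1 << 22), (0, 224, 1 << 22))` multiplies -1.5 x 2^(7-127) by 1.5 x 2^(224-127) and returns `(1, 105, 1 << 20)`, which is -1.125 x 2^(105-127).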

[Figure: floating point multiply algorithm flowchart]

Floating point multiply example

Initial values

  S  E         M
  1  00000111  1000…00000     $-1.5 \times 2^{7-127}$
  0  11100000  1000…00000     $1.5 \times 2^{224-127}$

Add exponents

  00000111 + 11100000 = 11100111 (231)

Subtract bias

  11100111 – 01111111 = 11100111 + 10000001 = 01101000 (104)
  Result E = 01101000

Multiply the mantissas

  1.1000… x 1.1000… = 10.01000…

Normalize by shifting 1.M right one position and adding one to E

  10.01000… => 1.001000…
  Result E = 01101001, M = 001000…

Set S=1 since signs are different

  S  E         M
  1  01101001  001000…        $-1.125 \times 2^{105-127}$

Rounding

Fp arithmetic operations may produce a result with more digits than can be represented in 1.M

The result must be rounded to fit into the available number of M positions

Tradeoff of hardware cost (keeping extra bits) and speed versus accumulated rounding error

Rounding

Examples from decimal multiplication

Renormalization is required after rounding in c)

Rounding

Examples from binary multiplication (assuming two bits for M)

1.01 x 1.01 = 1.1001 (1.25 x 1.25 = 1.5625)

1.10 x 1.01 = 1.111 (1.5 x 1.25 = 1.875)

May require renormalization after rounding

1.11 x 1.01 = 10.0011 (1.75 x 1.25 = 2.1875)

Result has twice as many bits

Rounding

In binary, an extra bit of 1 is halfway in between the two possible representations

1.001 (1.125) is halfway between 1.00 (1) and 1.01 (1.25)

1.101 (1.625) is halfway between 1.10 (1.5) and 1.11 (1.75)

IEEE 754 rounding modes

Truncate
  Remove all digits beyond those supported
  1.00100 -> 1.00

Round up to the next value
  1.00100 -> 1.01

Round down to the previous value
  1.00100 -> 1.00
  Differs from Truncate for negative numbers

Round-to-nearest-even
  Rounds to the nearest value; when exactly halfway, rounds to the even value (the one with an LSB of 0)
  1.00100 -> 1.00
  1.01100 -> 1.10
  Produces zero average bias
  Default mode
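The four modes can be compared side by side on small bit patterns. In this sketch, `round_bits` and its mode names are hypothetical. A value is passed as a magnitude (an integer holding all its bits) plus a separate `negative` flag, matching the signed magnitude form used for fp numbers, and `extra` is the number of low-order bits to drop (at least 1).

```python
def round_bits(mag: int, extra: int, mode: str, negative: bool = False) -> int:
    """Drop the low `extra` bits of magnitude `mag` under the given rounding mode."""
    kept = mag >> extra                     # bits that survive
    dropped = mag & ((1 << extra) - 1)      # bits being removed
    half = 1 << (extra - 1)                 # exactly halfway between two representable values
    if dropped == 0 or mode == "truncate":
        return kept
    if mode == "up":                        # toward the next larger value
        return kept if negative else kept + 1
    if mode == "down":                      # toward the previous smaller value
        return kept + 1 if negative else kept
    if mode == "nearest-even":
        if dropped > half or (dropped == half and kept & 1):
            return kept + 1                 # above halfway, or a tie with an odd LSB
        return kept
    raise ValueError(f"unknown mode: {mode}")
```

With 1.00100 written as `0b100100` and `extra=3`, the modes give 1.00 for truncate, 1.01 for up, 1.00 for down, and 1.00 for nearest-even. With `negative=True`, round down returns the larger magnitude 1.01, which is where it differs from truncate.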

Implementing rounding

A product may have twice as many digits as the multiplier and multiplicand
  1.11 x 1.01 = 10.0011

For round-to-nearest-even, we need to know
  The value to the right of the LSB (round bit)
  Whether any other digits to the right of the round digit are 1’s
    The sticky bit is the OR of these digits

1.00101 rounds to 1.01
  Round bit = 1, sticky bit = 0 OR 1 = 1, so the value is above halfway

1.00100 rounds to 1.00
  Round bit = 1, sticky bit = 0, so it is a tie and the LSB of the final rounded result stays 0
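The decision can be expressed with just the LSB, the round bit, and the sticky bit. This is a minimal sketch with an assumed helper name, `round_nearest_even`; the caller passes the kept bits, the round bit, and the remaining low-order bits as separate integers.

```python
def round_nearest_even(kept: int, round_bit: int, sticky_bits: int) -> int:
    """Round-to-nearest-even using the LSB of `kept`, the round bit, and the sticky bit."""
    sticky = 1 if sticky_bits else 0    # OR of all digits to the right of the round bit
    lsb = kept & 1
    # Round up when above halfway (round and sticky), or on a tie when the LSB is odd.
    if round_bit and (sticky or lsb):
        return kept + 1
    return kept
```

For 1.00101, `round_nearest_even(0b100, 1, 0b01)` returns `0b101` (1.01). For 1.00100, `round_nearest_even(0b100, 1, 0b00)` returns `0b100` (1.00).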


The product before normalization may have 2 digits to the left of the binary point

Product register format needs to be

Two possible cases

bb.bbbb…

1b.bbbb… r sssss…

01.bbbb… r sssss…

Need this as a result bit!


The guard bit (g) becomes part of the unrounded result when the MSB = 0

g, r, and s suffice for rounding addition as well

MIPS floating point registers

32 32-bit FPRs
  16 64-bit registers (32-bit register pairs) for dp floating point
  Software conventions for their usage (as with GPRs)

Control/status register
  Status of compare operations, sets rounding mode, exceptions

Implementation/revision register
  Identifies type of CPU and its revision number

[Figure: floating point registers f0, f1, …, f30, f31; control/status register FCR31 (bits 31 to 0); implementation/revision register FCR0 (bits 31 to 0)]

MIPS floating point instruction overview

Operate on single and double precision operands

Computation
  Add, sub, multiply, divide, sqrt, absolute value, negate
  Multiply-add, multiply-subtract
    Added as part of MIPS-IV revision of ISA specification

Load and store
  Integer register read for EA calculation
  Data to be loaded or stored in fp register file

Move between registers

Convert between different formats

Comparison instructions

Branch instructions

MIPS R10000 arithmetic units

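[Figure: R10000 datapath block diagram showing PC, instruction memory, integer register file, integer ALU, integer ALU + multiplier, EA calc, data memory, floating point register file, floating point adder, floating point multiplier, floating point divider, and floating point square root unit]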

MIPS R10000 arithmetic units

Integer ALU + shifter
  All instructions take one cycle

Integer ALU + multiplier
  Booth’s algorithm for multiplication (5-10 cycles)
  Non-restoring division (34-67 cycles)

Floating point adder
  Carry propagate (2 cycles)

Floating point multiplier (3 cycles)
  Booth’s algorithm

Floating point divider (12-19 cycles)

Floating point square root unit

Separate unit for EA calculations

Can start up to 5 instructions in 1 cycle
