chapter 3 arithmetic for computers.ppt

Chapter 3Arithmetic for Computers

Chapter 3 Arithmetic for Computers

Arithmetic for ComputersOperations on integersAddition and subtractionMultiplication and divisionDealing with overflowFloating-point real numbersRepresentation and operations 3.1 Introduction


Integer AdditionExample: 7 + 63.2 Addition and SubtractionOverflow if result out of rangeAdding +ve and ve operands, no overflowAdding two +ve operandsOverflow if result sign is 1Adding two ve operandsOverflow if result sign is 0


Integer SubtractionAdd negation of second operandExample: 7 6 = 7 + (6)+7:0000 0000 0000 0111 6:1111 1111 1111 1010 +1:0000 0000 0000 0001Overflow if result out of rangeSubtracting two +ve or two ve operands, no overflowSubtracting +ve from ve operandOverflow if result sign is 0Subtracting ve from +ve operandOverflow if result sign is 1


Dealing with OverflowSome languages (e.g., C) ignore overflowUse MIPS addu, addui, subu instructionsOther languages (e.g., Ada, Fortran) require raising an exceptionUse MIPS add, addi, sub instructionsOn overflow, invoke exception handlerSave PC in exception program counter (EPC) registerJump to predefined handler addressmfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action


Arithmetic for MultimediaGraphics and media processing operates on vectors of 8-bit and 16-bit dataUse 64-bit adder, with partitioned carry chainOperate on 88-bit, 416-bit, or 232-bit vectorsSIMD (single-instruction, multiple-data)Saturating operationsOn overflow, result is largest representable valuec.f. 2s-complement modulo arithmeticE.g., clipping in audio, saturation in video


MultiplicationStart with long-multiplication approachLength of product is the sum of operand lengthsmultiplicandmultiplierproduct3.3 Multiplication


Multiplication HardwareInitially 0


Optimized MultiplierPerform steps in parallel: add/shiftOne cycle per partial-product additionThats ok, if frequency of multiplications is low


Faster MultiplierUses multiple addersCost/performance tradeoffCan be pipelinedSeveral multiplication performed in parallel


MIPS MultiplicationTwo 32-bit registers for productHI: most-significant 32 bitsLO: least-significant 32-bitsInstructionsmult rs, rt / multu rs, rt64-bit product in HI/LOmfhi rd / mflo rdMove from HI/LO to rdCan test HI value to see if product overflows 32 bitsmul rd, rs, rtLeast-significant 32 bits of product > rd


DivisionCheck for 0 divisorLong division approachIf divisor dividend bits1 bit in quotient, subtractOtherwise0 bit in quotient, bring down next dividend bitRestoring divisionDo the subtract, and if remainder goes < 0, add divisor backSigned divisionDivide using absolute valuesAdjust sign of quotient and remainder as required 10011000 1001010 -1000 10 101 1010 -1000 10n-bit operands yield n-bit quotient and remainderquotientdividendremainderdivisor3.4 Division


Division HardwareInitially dividendInitially divisor in left half


Optimized DividerOne cycle per partial-remainder subtractionLooks a lot like a multiplier!Same hardware can be used for both


Faster DivisionCant use parallel hardware as in multiplierSubtraction is conditional on sign of remainderFaster dividers (e.g. SRT devision) generate multiple quotient bits per stepStill require multiple steps


MIPS DivisionUse HI/LO registers for resultHI: 32-bit remainderLO: 32-bit quotientInstructionsdiv rs, rt / divu rs, rtNo overflow or divide-by-0 checkingSoftware must perform checks if requiredUse mfhi, mflo to access result


Floating PointRepresentation for non-integral numbersIncluding very small and very large numbersLike scientific notation2.34 1056+0.002 104+987.02 109In binary1.xxxxxxx2 2yyyyTypes float and double in Cnormalizednot normalized3.5 Floating Point


Floating Point StandardDefined by IEEE Std 754-1985Developed in response to divergence of representationsPortability issues for scientific codeNow almost universally adoptedTwo representationsSingle precision (32-bit)Double precision (64-bit)


IEEE Floating-Point FormatS: sign bit (0 non-negative, 1 negative)Normalize significand: 1.0 |significand| < 2.0Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)Significand is Fraction with the 1. restoredExponent: excess representation: actual exponent + BiasEnsures exponent is unsignedSingle: Bias = 127; Double: Bias = 1203SExponentFractionsingle: 8 bits double: 11 bitssingle: 23 bits double: 52 bits


Single-Precision RangeExponents 00000000 and 11111111 reservedSmallest valueExponent: 00000001 actual exponent = 1 127 = 126Fraction: 00000 significand = 1.01.0 2126 1.2 1038Largest valueexponent: 11111110 actual exponent = 254 127 = +127Fraction: 11111 significand 2.02.0 2+127 3.4 10+38


Double-Precision RangeExponents 000000 and 111111 reservedSmallest valueExponent: 00000000001 actual exponent = 1 1023 = 1022Fraction: 00000 significand = 1.01.0 21022 2.2 10308Largest valueExponent: 11111111110 actual exponent = 2046 1023 = +1023Fraction: 11111 significand 2.02.0 2+1023 1.8 10+308


Floating-Point PrecisionRelative precisionall fraction bits are significantSingle: approx 223Equivalent to 23 log102 23 0.3 6 decimal digits of precisionDouble: approx 252Equivalent to 52 log102 52 0.3 16 decimal digits of precision


Floating-Point ExampleRepresent 0.750.75 = (1)1 1.12 21S = 1Fraction = 1000002Exponent = 1 + BiasSingle: 1 + 127 = 126 = 011111102Double: 1 + 1023 = 1022 = 011111111102Single: 101111110100000Double: 101111111110100000


Floating-Point ExampleWhat number is represented by the single-precision float1100000010100000S = 1Fraction = 01000002Fxponent = 100000012 = 129x = (1)1 (1 + 012) 2(129 127)= (1) 1.25 22= 5.0


Denormal NumbersExponent = 000...0 hidden bit is 0Smaller than normal numbersallow for gradual underflow, with diminishing precisionDenormal with fraction = 000...0Two representations of 0.0!


Infinities and NaNsExponent = 111...1, Fraction = 000...0InfinityCan be used in subsequent calculations, avoiding need for overflow checkExponent = 111...1, Fraction 000...0Not-a-Number (NaN)Indicates illegal or undefined resulte.g., 0.0 / 0.0Can be used in subsequent calculations


Floating-Point AdditionConsider a 4-digit decimal example9.999 101 + 1.610 1011. Align decimal pointsShift number with smaller exponent9.999 101 + 0.016 1012. Add significands9.999 101 + 0.016 101 = 10.015 1013. Normalize result & check for over/underflow1.0015 1024. Round and renormalize if necessary1.002 102


Floating-Point AdditionNow consider a 4-digit binary example1.0002 21 + 1.1102 22 (0.5 + 0.4375)1. Align binary pointsShift number with smaller exponent1.0002 21 + 0.1112 212. Add significands1.0002 21 + 0.1112 21 = 0.0012 213. Normalize result & check for over/underflow1.0002 24, with no over/underflow4. Round and renormalize if necessary1.0002 24 (no change) = 0.0625


FP Adder HardwareMuch more complex than integer adderDoing it in one clock cycle would take too longMuch longer than integer operationsSlower clock would penalize all instructionsFP adder usually takes several cyclesCan be pipelined


FP Adder HardwareStep 1Step 2Step 3Step 4


Floating-Point MultiplicationConsider a 4-digit decimal example1.110 1010 9.200 1051. Add exponentsFor biased exponents, subtract bias from sumNew exponent = 10 + 5 = 52. Multiply significands1.110 9.200 = 10.212 10.212 1053. Normalize result & check for over/underflow1.0212 1064. Round and renormalize if necessary1.021 1065. Determine sign of result from signs of operands+1.021 106


Floating-Point MultiplicationNow consider a 4-digit binary example1.0002 21 1.1102 22 (0.5 0.4375)1. Add exponentsUnbiased: 1 + 2 = 3Biased: (1 + 127) + (2 + 127) = 3 + 254 127 = 3 + 1272. Multiply significands1.0002 1.1102 = 1.1102 1.1102 233. Normalize result & check for over/underflow1.1102 23 (no change) with no over/underflow4. Round and renormalize if necessary1.1102 23 (no change)5. Determine sign: +ve ve ve1.1102 23 = 0.21875


FP Arithmetic HardwareFP multiplier is of similar complexity to FP adderBut uses a multiplier for significands instead of an adderFP arithmetic hardware usually doesAddition, subtraction, multiplication, division, reciprocal, square-rootFP integer conversionOperations usually takes several cyclesCan be pipelined


FP Instructions in MIPSFP hardware is coprocessor 1Adjunct processor that extends the ISASeparate FP registers32 single-precision: $f0, $f1, $f31Paired for double-precision: $f0/$f1, $f2/$f3, Release 2 of MIPs ISA supports 32 64-bit FP regsFP instructions operate only on FP registersPrograms generally dont do integer ops on FP data, or vice versaMore registers with minimal code-size impactFP load and store instructionslwc1, ldc1, swc1, sdc1e.g., ldc1 $f8, 32($sp)


FP Instructions in MIPSSingle-precision arithmeticadd.s, sub.s, mul.s, div.se.g., add.s $f0, $f1, $f6Double-precision arithmeticadd.d, sub.d, mul.d, div.de.g., mul.d $f4, $f4, $f6Single- and double-precision comparisonc.xx.s, c.xx.d (xx is eq, lt, le, )Sets or clears FP condition-code bite.g. c.lt.s $f3, $f4Branch on FP condition code true or falsebc1t, bc1fe.g., bc1t TargetLabel


FP Example: F to CC code:float f2c (float fahr) { return ((5.0/9.0)*(fahr - 32.0)); }fahr in $f12, result in $f0, literals in global memory spaceCompiled MIPS code:f2c: lwc1 $f16, const5($gp) lwc2 $f18, const9($gp) div.s $f16, $f16, $f18 lwc1 $f18, const32($gp) sub.s $f18, $f12, $f18 mul.s $f0, $f16, $f18 jr $ra


FP Example: Array MultiplicationX = X + Y ZAll 32 32 matrices, 64-bit double-precision elementsC code:void mm (double x[][], double y[][], double z[][]) { int i, j, k; for (i = 0; i! = 32; i = i + 1) for (j = 0; j! = 32; j = j + 1) for (k = 0; k! = 32; k = k + 1) x[i][j] = x[i][j] + y[i][k] * z[k][j]; }Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2


FP Example: Array Multiplication MIPS code: li $t1, 32 # $t1 = 32 (row size/loop end) li $s0, 0 # i = 0; initialize 1st for loop L1: li $s1, 0 # j = 0; restart 2nd for loop L2: li $s2, 0 # k = 0; restart 3rd for loop sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x) addu $t2, $t2, $s1 # $t2 = i * size(row) + j sll $t2, $t2, 3 # $t2 = byte offset of [i][j] addu $t2, $a0, $t2 # $t2 = byte address of x[i][j] l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j] L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z) addu $t0, $t0, $s1 # $t0 = k * size(row) + j sll $t0, $t0, 3 # $t0 = byte offset of [k][j] addu $t0, $a2, $t0 # $t0 = byte address of z[k][j] l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j]


FP Example: Array Multiplication sll $t0, $s0, 5 # $t0 = i*32 (size of row of y) addu $t0, $t0, $s2 # $t0 = i*size(row) + k sll $t0, $t0, 3 # $t0 = byte offset of [i][k] addu $t0, $a1, $t0 # $t0 = byte address of y[i][k] l.d $f18, 0($t0) # $f18 = 8 bytes of y[i][k] mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j] add.d $f4, $f4, $f16 # f4=x[i][j] + y[i][k]*z[k][j] addiu $s2, $s2, 1 # $k k + 1 bne $s2, $t1, L3 # if (k != 32) go to L3 s.d $f4, 0($t2) # x[i][j] = $f4 addiu $s1, $s1, 1 # $j = j + 1 bne $s1, $t1, L2 # if (j != 32) go to L2 addiu $s0, $s0, 1 # $i = i + 1 bne $s0, $t1, L1 # if (i != 32) go to L1


Accurate ArithmeticIEEE Std 754 specifies additional rounding controlExtra bits of precision (guard, round, sticky)Choice of rounding modesAllows programmer to fine-tune numerical behavior of a computationNot all FP units implement all optionsMost programming languages and FP libraries just use defaultsTrade-off between hardware complexity, performance, and market requirements


Interpretation of DataBits have no inherent meaningInterpretation depends on the instructions appliedComputer representations of numbersFinite range and precisionNeed to account for this in programsThe BIG Picture


AssociativityParallel programs may interleave operations in unexpected ordersAssumptions of associativity may fail3.6 Parallelism and Computer Arithmetic: AssociativityNeed to validate parallel programs under varying degrees of parallelism


Sheet1

(x+y)+zx+(y+z)

x-1.50E+380.00E+00-1.50E+38

y1.50E+381.50E+38

z1.01.0

1.00E+000.00E+00

x86 FP ArchitectureOriginally based on 8087 FP coprocessor8 80-bit extended-precision registersUsed as a push-down stackRegisters indexed from TOS: ST(0), ST(1), FP values are 32-bit or 64 in memoryConverted on load/store of memory operandInteger operands can also be converted on load/storeVery difficult to generate and optimize codeResult: poor FP performance3.7 Real Stuff: Floating Point in the x86


x86 FP InstructionsOptional variationsI: integer operandP: pop operand from stackR: reverse operand orderBut not all combinations allowed

Data transferArithmeticCompareTranscendentalFILD mem/ST(i)FISTP mem/ST(i)FLDPIFLD1FLDZFIADDP mem/ST(i)FISUBRP mem/ST(i) FIMULP mem/ST(i) FIDIVRP mem/ST(i)FSQRTFABSFRNDINTFICOMPFIUCOMPFSTSW AX/memFPATANF2XMIFCOSFPTANFPREMFPSINFYL2X


Streaming SIMD Extension 2 (SSE2)Adds 4 128-bit registersExtended to 8 registers in AMD64/EM64TCan be used for multiple FP operands2 64-bit double precision4 32-bit double precisionInstructions operate on them simultaneouslySingle-Instruction Multiple-Data


Right Shift and DivisionLeft shift by i places multiplies an integer by 2iRight shift divides by 2i?Only for unsigned integersFor signed integersArithmetic right shift: replicate the sign bite.g., 5 / 4111110112 >> 2 = 111111102 = 2Rounds toward c.f. 111110112 >>> 2 = 001111102 = +623.8 Fallacies and Pitfalls


Who Cares About FP Accuracy?Important for scientific codeBut for everyday consumer use?My bank balance is out by 0.0002! The Intel Pentium FDIV bugThe market expects accuracySee Colwell, The Pentium Chronicles


Concluding RemarksISAs support arithmeticSigned and unsigned integersFloating-point approximation to realsBounded range and precisionOperations can overflow and underflowMIPS ISACore instructions: 54 most frequently used100% of SPECINT, 97% of SPECFPOther instructions: less frequent3.9 Concluding Remarks


Morgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for ComputersMorgan Kaufmann PublishersChapter 3 Arithmetic for Computers

chapter 3 arithmetic for computers.ppt

Documents