1. 2 12.1 rounding modes 3 rounding: the process to obtain the best possible floating-point...

Upload: joshua-morrison

Post on 18-Jan-2016

12.1 Rounding Modes

Rounding: the process to obtain the best possible floating-point representation for a given real value.

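ANSI/IEEE standard: round to the nearest representable floating-point number; when the real value lies exactly midway between two adjacent floating-point numbers, choose the one whose significand has an LSB of 0 (of two adjacent floating-point numbers, the significand of one must end in 0 and that of the other in 1). This is called round-to-nearest-even.

For example, when rounding to integers, 3.5 and 4.5 are both rounded to 4, the closest even number, under round-to-nearest-even.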

• Other rounding methods
– Round inward (toward 0): choose the nearest value in the same direction as 0.
– Round upward (toward +∞): choose the larger of the two possible values.
– Round downward (toward -∞): choose the smaller of the two possible values.

Example 12.1 Rounding to the nearest integer

a. Consider the round-to-nearest-even integer corresponding to a real signed-magnitude number x as a function rtnei(x). Plot this round-to-nearest-even-integer function for x in the range [-4,4].

b. Repeat part a for the function rtni(x), that is, the round-to-nearest-integer function, where the midway values are always rounded up.
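As a minimal sketch of the two functions in Example 12.1, the Python below evaluates them at a few midway points in [-4,4]. It assumes that "rounded up" means toward +∞, so that rtni(x) = floor(x + 0.5); the function bodies are illustrative and are not taken from the slides.

```python
import math

def rtnei(x):
    """Round to nearest integer; midway values go to the even neighbor."""
    return round(x)  # Python's built-in round() breaks ties toward even

def rtni(x):
    """Round to nearest integer; midway values go up (assumed: toward +inf)."""
    return math.floor(x + 0.5)

samples = [-3.5, -2.5, -0.5, 0.5, 1.5, 2.5, 3.5]
rtnei_values = [rtnei(x) for x in samples]  # [-4, -2, 0, 0, 2, 2, 4]
rtni_values = [rtni(x) for x in samples]    # [-3, -2, 0, 1, 2, 3, 4]
```

The two functions differ only at the midway points: rtnei alternates between rounding down and up, whereas rtni always moves in one direction.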

Example 12.2 Directed rounding

a. Consider the inward-directed round corresponding to a real signed-magnitude number x as a function ritni(x). Plot this round-inward-to-nearest-integer function for x in the range [-4,4].

b. Repeat part a for the round-upward-to-nearest-integer rutni(x).
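The directed functions of Example 12.2 map directly onto standard library calls. The sketch below is one possible rendering; rdtni is a hypothetical name for the round-downward counterpart, which the example itself does not define.

```python
import math

def ritni(x):
    """Round inward (toward 0)."""
    return math.trunc(x)

def rutni(x):
    """Round upward (toward +infinity)."""
    return math.ceil(x)

def rdtni(x):
    """Round downward (toward -infinity); hypothetical name, not in the example."""
    return math.floor(x)

samples = [-2.5, -0.3, 0.3, 2.5]
inward = [ritni(x) for x in samples]    # [-2, 0, 0, 2]
upward = [rutni(x) for x in samples]    # [-2, 0, 1, 3]
downward = [rdtni(x) for x in samples]  # [-3, -1, 0, 2]
```

Inward rounding agrees with downward rounding for positive x and with upward rounding for negative x.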

Figure 12.3 Two directed round-to-nearest-integer functions for x in [– 4, 4].

Figure 12.3 (Continued)

12.2 Special Values and Exceptions

• Five special values in ANSI/IEEE floating-point standard
– ±0: biased exponent = 0, significand = 0 (no hidden 1)
– ±∞: biased exponent = 255 (short) or 2047 (long), significand = 0
– NaN: biased exponent = 255 (short) or 2047 (long), significand ≠ 0
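The encodings of the special values can be checked by unpacking the bits of a short (32-bit) value. The sketch below uses Python's struct module; the helper names fields and classify are illustrative assumptions.

```python
import struct

def fields(x):
    """Return (sign, biased exponent, fraction) of x in the short format."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def classify(x):
    """Name the special value encoded by x, or 'other' if it is not special."""
    _, exponent, fraction = fields(x)
    if exponent == 0 and fraction == 0:
        return "zero"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "other"

plus_zero = fields(0.0)            # (0, 0, 0)
minus_zero = fields(-0.0)          # (1, 0, 0)
plus_inf = fields(float("inf"))    # (0, 255, 0)
```

For the long format, the same idea applies with an 11-bit exponent field (all 1s = 2047) and a 52-bit fraction.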

12.3 Floating-Point Addition

Consider the addition of $\pm 2^{e_1} s_1$ and $\pm 2^{e_2} s_2$, where $e_1 > e_2$:

$(\pm 2^{e_1} s_1) + (\pm 2^{e_2} s_2) = \pm 2^{e_1} \left( s_1 \pm s_2 / 2^{e_1 - e_2} \right)$
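The addition identity, in which the operand with the smaller exponent is shifted right by e1 - e2 positions before the significands are combined, can be sketched with exact rational arithmetic. This is an illustration only: significands carry their own sign, the result is normalized to 1 ≤ |s| < 2, and no rounding step is modeled. The name fp_add is hypothetical.

```python
from fractions import Fraction

def fp_add(e1, s1, e2, s2):
    """Add 2**e1 * s1 and 2**e2 * s2 exactly; s1 and s2 are signed significands."""
    if e1 < e2:  # make the first operand the one with the larger exponent
        e1, s1, e2, s2 = e2, s2, e1, s1
    # Alignment preshift: divide the smaller operand's significand by 2**(e1 - e2)
    s = Fraction(s1) + Fraction(s2) / 2 ** (e1 - e2)
    e = e1
    if s == 0:
        return 0, Fraction(0)
    # Normalize so that 1 <= |s| < 2
    while abs(s) >= 2:
        s /= 2
        e += 1
    while abs(s) < 1:
        s *= 2
        e -= 1
    return e, s

# 2**3 * 1.5 + 2**1 * 1.0 = 12 + 2 = 14 = 2**3 * 1.75
result = fp_add(3, Fraction(3, 2), 1, Fraction(1))
```

When the operands have opposite signs and nearly equal magnitudes, the normalization loop has to shift left by several positions, which is why a hardware adder needs a normalizing shifter after the significand addition.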

Figure 12.6 Simplified schematic of a floating-point adder.

12.4 Other Floating-point Operations

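Multiplication of $\pm 2^{e_1} s_1$ and $\pm 2^{e_2} s_2$:

$(\pm 2^{e_1} s_1) \times (\pm 2^{e_2} s_2) = \pm 2^{e_1 + e_2} (s_1 \times s_2)$

Division of $\pm 2^{e_1} s_1$ and $\pm 2^{e_2} s_2$:

$(\pm 2^{e_1} s_1) / (\pm 2^{e_2} s_2) = \pm 2^{e_1 - e_2} (s_1 / s_2)$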
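For multiplication the exponents add and the significands multiply; for division the exponents subtract and the significands divide. The sketch below, using exact rationals and hypothetical helper names, shows this together with the postnormalization it requires. Rounding is not modeled.

```python
from fractions import Fraction

def normalize(e, s):
    """Shift s into the range 1 <= |s| < 2, adjusting the exponent to match."""
    if s == 0:
        return 0, Fraction(0)
    while abs(s) >= 2:
        s /= 2
        e += 1
    while abs(s) < 1:
        s *= 2
        e -= 1
    return e, s

def fp_mul(e1, s1, e2, s2):
    """Multiply 2**e1 * s1 by 2**e2 * s2: add exponents, multiply significands."""
    return normalize(e1 + e2, Fraction(s1) * Fraction(s2))

def fp_div(e1, s1, e2, s2):
    """Divide 2**e1 * s1 by 2**e2 * s2: subtract exponents, divide significands."""
    return normalize(e1 - e2, Fraction(s1) / Fraction(s2))

product = fp_mul(2, Fraction(3, 2), 1, Fraction(3, 2))   # 6 * 3 = 18 = 2**4 * 1.125
quotient = fp_div(2, Fraction(3, 2), 1, Fraction(7, 4))  # 6 / 3.5 = 12/7 = 2**0 * (12/7)
```

With normalized inputs, the significand product lies in [1, 4) and the quotient in (1/2, 2), so at most a one-position normalizing shift is needed in either case.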

Figure 12.6 Simplified schematic of a floating-point multiply/divide unit.

12.5 Floating-Point Instructions

Figure 12.7 The common floating-point instruction format for MiniMIPS and components for arithmetic instructions. The extension (ex) field distinguishes single (* = s) from double (* = d) operands.

10 floating-point arithmetic instructions (5 different operations: add, subtract, multiply, divide, negate; each in single and double versions)

add.s $f0,$f8,$f10 # set $f0 to ($f8)+($f10)

add.d $f0,$f8,$f10 # set $f0,$f1 to ($f8,$f9)+($f10,$f11)

Single operands can be in any of the floating-point registers. Double operands must be specified in even-numbered registers; each double occupies the named register and the odd-numbered register that follows it.

Figure 12.8 Floating-point instructions for format conversion in MiniMIPS.

6 format conversion instructions: integer to single/double, single to double, double to single, and single/double to integer

cvt.s.w $f0,$f8 # set $f0 to single (integer $f8)

cvt.d.w $f0,$f8 # set $f0 to double (integer $f8)

cvt.d.s $f0,$f8 # set $f0 to double ($f8)

cvt.s.d $f0,$f8 # set $f0 to single ($f8,$f9)

cvt.w.s $f0,$f8 # set $f0 to integer ($f8)

cvt.w.d $f0,$f8 # set $f0 to integer ($f8,$f9)
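Conversions to a narrower format can lose information. The Python sketch below imitates a double-to-single conversion by packing a value into 4 bytes and unpacking it again; it is only an analogy for what a conversion such as cvt.s.d does, not a model of MiniMIPS, and to_single is a hypothetical helper name.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

exact_half = to_single(0.5)                # 0.5 is representable in both formats
tenth = to_single(0.1)                     # differs from the double value of 0.1
big_int = to_single(float(2 ** 24 + 1))    # 16777217 does not fit in 24 bits
```

Here exact_half equals 0.5, tenth differs from the double 0.1, and big_int comes back as 16777216.0, showing that integer-to-single conversion is inexact once the integer needs more than 24 significant bits.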

Figure 12.9 Instructions for floating-point data movement in MiniMIPS.

6 data transfer instructions: load/store word to/from coprocessor1, move single/double from one FP register to another, move (copy) between FP registers and CPU general registers.

lwc1 $f8,40($s3) # load mem[40+($s3)] into $f8

swc1 $f8,A($s3) # store ($f8) into mem[A+($s3)]

mov.s $f0,$f8 # load $f0 with ($f8)

mov.d $f0,$f8 # load $f0,$f1 with ($f8,$f9)

mfc1 $t0,$f12 # load $t0 with ($f12)

mtc1 $f8,$t4 # load $f8 with ($t4)

Figure 12.10 Floating-point branch and comparison instructions in MiniMIPS.

2 branch and 6 comparison instructions. The FP unit has a flag that is set to T or F based on 6 comparisons (equal, less than, or less than or equal, each for single/double data types)

bc1t L # branch on FP flag true

bc1f L # branch on FP flag false

c.eq.* $f0,$f8 # if ($f0)=($f8), set flag to true

c.lt.* $f0,$f8 # if ($f0)<($f8), set flag to true

c.le.* $f0,$f8 # if ($f0)≤($f8), set flag to true

Table 12.1 The 30 MiniMIPS floating-point instructions. Because the op field contains 17 for all but two of the instructions (49 for lwc1 and 50 for swc1), it is not shown.

12.6 Result Precision and Errors

• FP arithmetic can be quite dangerous and must be used with proper care, because results of FP computations are inexact.

• Why?
– Many real numbers do not have an exact binary representation within a finite word format. This is referred to as representation error.
– Even for values that are exactly representable, FP arithmetic may produce inexact results. For example, the product of two short FP numbers has a 48-bit significand that must be rounded to 23 bits (plus hidden 1). This is called computation error.
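Both kinds of error can be observed with exact rational arithmetic. The sketch below uses Python floats, which are long (64-bit) rather than short values, so the significand widths differ from the 48-bit example above, but the effect is the same. Variable names are illustrative.

```python
from fractions import Fraction

# Representation error: 1/10 has no finite binary expansion, so the
# stored value of 0.1 differs slightly from the real number 1/10.
representation_error = Fraction(0.1) - Fraction(1, 10)

# Computation error: a is exactly representable, but its exact square
# needs more significand bits than the format provides, so the computed
# product is rounded.
a = 1.0 + 2.0 ** -30
exact_product = Fraction(a) * Fraction(a)   # 1 + 2**-29 + 2**-60
computed_product = Fraction(a * a)          # rounded to 1 + 2**-29
computation_error = exact_product - computed_product
```

Fraction(x) converts a float to the exact rational it stores, which makes the otherwise invisible error terms explicit.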

Example 12.4

The associative law of addition does not hold in general in FP arithmetic. For example, let

$a = -2^{5} \times (1.10101011)_{\text{two}}$

$b = 2^{5} \times (1.10101110)_{\text{two}}$

$c = -2^{-2} \times (1.01100101)_{\text{two}}$

Is $(a + b) + c = a + (b + c)$?
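The question in Example 12.4 can be settled by simulating addition with an 8-bit fractional significand, matching the operands as written. The sketch below assumes round-to-nearest-even after every addition and an unbounded exponent range; round_fp and fp_add are hypothetical helper names.

```python
from fractions import Fraction

def round_fp(x, frac_bits=8):
    """Round x to a binary significand 1.f with frac_bits fraction bits (ties to even)."""
    x = Fraction(x)
    if x == 0:
        return x
    m, e = abs(x), 0
    while m >= 2:        # normalize the magnitude to 1 <= m < 2
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    q = round(m * 2 ** frac_bits)  # round() on a Fraction breaks ties toward even
    result = Fraction(q, 2 ** frac_bits) * Fraction(2) ** e
    return result if x > 0 else -result

def fp_add(x, y):
    """Add exactly, then round the sum to the working precision."""
    return round_fp(x + y)

a = Fraction(-0b110101011, 2 ** 8) * 2 ** 5   # -2^5 * 1.10101011
b = Fraction(0b110101110, 2 ** 8) * 2 ** 5    #  2^5 * 1.10101110
c = Fraction(-0b101100101, 2 ** 8) / 2 ** 2   # -2^-2 * 1.01100101

left = fp_add(fp_add(a, b), c)    # (a + b) + c
right = fp_add(a, fp_add(b, c))   # a + (b + c)
```

Under these assumptions left is 27/1024, that is 2^-6 × (1.1011)_two, while right is 0: in b + c the small operand c is mostly shifted out, and the rounded sum then cancels a exactly.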

Figure 12.11 Algebraically equivalent computations may yield different results with floating-point arithmetic.

• Using guard digits to avoid excessive error. For example, in a 10-digit calculator, 1/3 is represented as 0.333 333 333 3; multiplying by 3 gives 0.999 999 999 9, not 1.

However, in a calculator with 2 guard digits, 1/3 is held internally as 0.333 333 333 333 but still displayed as 0.333 333 333 3; multiplying by 3 gives 0.999 999 999 999, which rounds to 1 on the 10-digit display.
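The calculator example can be reproduced with Python's decimal module by choosing the internal precision separately from the display precision. This is a sketch of the idea, not a model of any particular calculator; third_times_three is a hypothetical helper name.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

def third_times_three(internal_digits, display_digits=10):
    """Compute (1/3)*3 with internal_digits of precision, then round for display."""
    work = Context(prec=internal_digits, rounding=ROUND_HALF_EVEN)
    third = work.divide(Decimal(1), Decimal(3))
    product = work.multiply(third, Decimal(3))
    # Context.plus() rounds its operand to the context's precision
    display = Context(prec=display_digits, rounding=ROUND_HALF_EVEN)
    return display.plus(product)

no_guard = third_times_three(10)    # 10 internal digits, no guard digits
two_guard = third_times_three(12)   # 10 display digits plus 2 guard digits
```

Without guard digits the displayed result is 0.9999999999; with two guard digits the internal 0.999999999999 rounds to 1 when shown with 10 digits.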

Figure 12.12 Function evaluation by table lookup and linear interpolation.