Integer Operations
Outline
• Arithmetic operations
– Overflow
– Unsigned addition, multiplication
– Signed addition, negation, multiplication
– Using shifts to perform power-of-2 multiply/divide
• Suggested reading
– Chap 2.3
Negation: computing the additive inverse, -x
Unsigned Addition
[Figure: operands u and v are w bits each; the true sum u + v needs w + 1 bits; discarding the carry leaves the w-bit result $\mathrm{UAdd}_w(u, v)$.]
Unsigned Addition
• Standard addition function
– Ignores the carry output
• Implements modular arithmetic
– $s = \mathrm{UAdd}_w(u, v) = (u + v) \bmod 2^w$
P67 (2.9)
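A minimal C sketch of $\mathrm{UAdd}_w$ for w = 32, assuming `uint32_t`; `uadd32` and `uadd32_ok` are hypothetical names. C defines unsigned arithmetic to wrap modulo $2^{32}$, so no masking is needed. The check relies on the fact that the carry was discarded exactly when the w-bit result is smaller than an operand.

```c
#include <assert.h>
#include <stdint.h>

/* UAdd_32: unsigned addition in C already discards the carry,
   so the result is (u + v) mod 2^32. */
static uint32_t uadd32(uint32_t u, uint32_t v) {
    return u + v;
}

/* Returns 1 if no carry was discarded, 0 if the sum wrapped around. */
static int uadd32_ok(uint32_t u, uint32_t v) {
    return uadd32(u, v) >= u;
}
```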
Visualizing Unsigned Addition P68 Figure 2.16
• Wraps around
– If the true sum ≥ $2^w$
– At most once
[Figure: surface plot of $\mathrm{UAdd}_4(u, v)$ over 4-bit u and v; the true sum ranges from 0 up to $2^{w+1}$, and wherever it reaches $2^w$ (overflow) the modular sum wraps back down by $2^w$.]
Modulo: taking the remainder after division
Unsigned Addition Forms an Abelian Group P68
• Closed under addition
– $0 \le \mathrm{UAdd}_w(u, v) \le 2^w - 1$
• Commutative
– $\mathrm{UAdd}_w(u, v) = \mathrm{UAdd}_w(v, u)$
• Associative
– $\mathrm{UAdd}_w(t, \mathrm{UAdd}_w(u, v)) = \mathrm{UAdd}_w(\mathrm{UAdd}_w(t, u), v)$
Unsigned Addition Forms an Abelian Group
• 0 is the additive identity
– $\mathrm{UAdd}_w(u, 0) = u$
• Every element has an additive inverse
– Let $\mathrm{UComp}_w(u) = 2^w - u$ for $u > 0$, and $\mathrm{UComp}_w(0) = 0$
– $\mathrm{UAdd}_w(u, \mathrm{UComp}_w(u)) = 0$
P68 (2.10)
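A sketch of $\mathrm{UComp}_w$ for w = 32 (hypothetical name `ucomp32`, assuming `uint32_t`): subtracting u from 0 in unsigned arithmetic yields $2^{32} - u$ for u > 0 and 0 for u = 0, which matches the definition above.

```c
#include <assert.h>
#include <stdint.h>

/* UComp_32(u): additive inverse modulo 2^32.
   0u - u wraps to 2^32 - u when u > 0, and stays 0 when u == 0. */
static uint32_t ucomp32(uint32_t u) {
    return 0u - u;
}
```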
Signed Addition
• Functionality
– True sum requires w + 1 bits
– Drop off the MSB
– Treat the remaining bits as a two's-complement integer
$$
\mathrm{TAdd}_w(u, v) =
\begin{cases}
u + v + 2^w, & u + v < \mathrm{TMin}_w \quad \text{(NegOver)}\\
u + v, & \mathrm{TMin}_w \le u + v \le \mathrm{TMax}_w\\
u + v - 2^w, & \mathrm{TMax}_w < u + v \quad \text{(PosOver)}
\end{cases}
$$
PosOver: positive overflow; NegOver: negative overflow
P70 (2.12)
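To check the case analysis against the bit-level description (drop the MSB, reinterpret the rest), here is a sketch for a hypothetical w = 8, chosen so the exact sum always fits in an `int`; `tadd_cases`, `tadd_bits`, and the macros are illustrative names, not a library API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit instance (w = 8). */
#define W      8
#define TMIN_W (-(1 << (W - 1)))      /* -128 */
#define TMAX_W ((1 << (W - 1)) - 1)   /*  127 */

/* TAdd_w computed from the three cases of Eq. 2.12. */
static int tadd_cases(int u, int v) {
    int sum = u + v;                          /* exact true sum */
    if (sum < TMIN_W) return sum + (1 << W);  /* NegOver */
    if (sum > TMAX_W) return sum - (1 << W);  /* PosOver */
    return sum;
}

/* TAdd_w computed at the bit level: keep the low w bits,
   then reinterpret them as a two's-complement value. */
static int tadd_bits(int u, int v) {
    uint8_t low = (uint8_t)(u + v);           /* conversion to unsigned is mod 2^8 */
    return low > TMAX_W ? (int)low - (1 << W) : (int)low;
}
```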
Signed Addition P70 Figure 2.17
[Figure: left, the (u, v) plane split by sign, with NegOver possible only when u and v are both negative and PosOver only when both are positive; right, the true sum (from $-2^w$, bits 1 000…0, to $2^w - 1$, bits 0 111…1) mapped onto the TAdd result (from $-2^{w-1}$, bits 100…0, to $2^{w-1} - 1$, bits 011…1), with PosOver and NegOver marking the ranges that wrap.]
Visualizing 2's Comp. Addition
• Values
– 4-bit two's complement
– Range from -8 to +7
• Wraps around
– If sum ≥ $2^{w-1}$
• Becomes negative
– If sum < $-2^{w-1}$
• Becomes positive
Visualizing 2's Comp. Addition P72 Figure 2.19
[Figure: surface plot of $\mathrm{TAdd}_4(u, v)$ over 4-bit u and v (-8 to 7), with the PosOver and NegOver regions where the result wraps around.]
Detecting TAdd Overflow P71
• Task
– Given $s = \mathrm{TAdd}_w(u, v)$
– Determine whether $s = \mathrm{Add}_w(u, v)$, the true sum
• Claim
– Overflow iff either:
• u, v < 0 and s ≥ 0 (NegOver)
• u, v ≥ 0 and s < 0 (PosOver)
– ovf = (u<0 == v<0) && (u<0 != s<0);
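A sketch of the overflow test for a hypothetical w = 8. The sum is formed through an 8-bit unsigned value because overflowing `int` addition is undefined behavior in C, so a 32-bit version should not compute `u + v` on `int` directly. `tadd8` and `tadd8_overflows` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* TAdd_8: keep the low 8 bits of the sum, reinterpret as two's complement. */
static int tadd8(int u, int v) {
    uint8_t low = (uint8_t)(u + v);
    return low >= 128 ? (int)low - 256 : (int)low;
}

/* Overflow test from the slide: the operands share a sign,
   but the result's sign differs from theirs. */
static int tadd8_overflows(int u, int v) {
    int s = tadd8(u, v);
    return ((u < 0) == (v < 0)) && ((u < 0) != (s < 0));
}
```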
Mathematical Properties of TAdd
• Two's complement under TAdd forms a group
– Closed, commutative, associative, 0 is the additive identity
– Every element has an additive inverse
• Let
$$
\mathrm{TComp}_w(u) =
\begin{cases}
-u, & u \ne \mathrm{TMin}_w\\
\mathrm{TMin}_w, & u = \mathrm{TMin}_w
\end{cases}
$$
• $\mathrm{TAdd}_w(u, \mathrm{TComp}_w(u)) = 0$
P73 (2.13)
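A sketch of $\mathrm{TComp}_w$ for a hypothetical w = 8, showing that $\mathrm{TMin}_8 = -128$ is its own additive inverse; `u2t8`, `tcomp8`, and `tadd8` are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret an 8-bit pattern as a two's-complement value. */
static int u2t8(uint8_t bits) {
    return bits >= 128 ? (int)bits - 256 : (int)bits;
}

/* TComp_8(u): -u, except that TMin_8 maps to itself. */
static int tcomp8(int u) {
    return u == -128 ? -128 : -u;
}

/* TAdd_8, used to check TAdd_8(u, TComp_8(u)) == 0. */
static int tadd8(int u, int v) {
    return u2t8((uint8_t)(u + v));
}
```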
Mathematical Properties of TAdd
• Isomorphic algebra to UAdd
– $\mathrm{TAdd}_w(u, v) = \mathrm{U2T}(\mathrm{UAdd}_w(\mathrm{T2U}(u), \mathrm{T2U}(v)))$
• Since both have identical bit patterns
– $\mathrm{T2U}(\mathrm{TAdd}_w(u, v)) = \mathrm{UAdd}_w(\mathrm{T2U}(u), \mathrm{T2U}(v))$
Isomorphic: having the same algebraic structure
Negating with Complement & Increment P73
• In C
– ~x + 1 == -x
• Complement
– Observation: ~x + x == 1111…111 == -1
   x    1 0 0 1 0 1 1 1
+ ~x    0 1 1 0 1 0 0 0
  -1    1 1 1 1 1 1 1 1
~x: complement (bitwise NOT)
Signed Addition
• Increment
– ~x + 1 = ~x + [x + (-x)] + 1
– = (~x + x) + (-x + 1) == -1 + (-x + 1) == -x
– So, ~x + 1 == -x
Multiplication P75
• Computing the exact product of w-bit numbers x, y
– Either signed or unsigned
• Ranges
– Unsigned: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
• Up to 2w bits
– Two's complement min: $x \cdot y \ge (-2^{w-1}) \cdot (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
• Up to 2w - 1 bits
– Two's complement max: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
• Up to 2w bits, but only for $(\mathrm{TMin}_w)^2$
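A sketch of exact products for w = 32 (hypothetical names), widening each operand to 64 bits before multiplying so the 2w-bit result always fits; casting the exact unsigned product back to 32 bits gives the truncated w-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Exact products of 32-bit operands, computed in 64 bits. */
static uint64_t umul_exact(uint32_t x, uint32_t y) { return (uint64_t)x * y; }
static int64_t  tmul_exact(int32_t x, int32_t y)   { return (int64_t)x * y; }

/* w-bit result: keep only the low 32 bits of the exact product. */
static uint32_t umul_trunc(uint32_t x, uint32_t y) {
    return (uint32_t)umul_exact(x, y);
}
```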
Multiplication
• Maintaining exact results
– Would need to keep expanding the word size with each product computed
– Done in software by "arbitrary precision" arithmetic packages
Power-of-2 Multiply with Shift
[Figure: a w-bit operand u multiplied by $2^k$ (a 1 followed by k zeros); the true product $u \cdot 2^k$ needs w + k bits; discarding the high k bits gives the w-bit results $\mathrm{UMult}_w(u, 2^k)$ and $\mathrm{TMult}_w(u, 2^k)$, whose low k bits are 0.]
Power-of-2 Multiply with Shift
• Operation
– u << k gives $u \cdot 2^k$
– Both signed and unsigned
• Examples
– u << 3 == u * 8
– (u << 5) - (u << 3) == u * 24
– Most machines shift and add much faster than they multiply
• Compilers generate this code automatically
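A sketch of the two examples on `uint32_t`. For signed `int`, left-shifting a negative value or shifting into overflow is undefined behavior in C, which is why this sketch uses an unsigned type; `times8` and `times24` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* u * 8 via a single shift; high bits that fall off wrap mod 2^32. */
static uint32_t times8(uint32_t u) { return u << 3; }

/* u * 24 = u * 32 - u * 8. The parentheses matter: '-' binds tighter
   than '<<' in C, so u << 5 - u << 3 would parse as (u << (5 - u)) << 3. */
static uint32_t times24(uint32_t u) { return (u << 5) - (u << 3); }
```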
Unsigned Power-of-2 Divide with Shift
• Quotient of unsigned by power of 2
– u >> k gives $\lfloor u / 2^k \rfloor$
– Uses logical shift
[Figure: w-bit operand u divided by $2^k$; the exact quotient $u / 2^k$ has k bits to the right of the binary point, and the shift discards them, leaving $\lfloor u / 2^k \rfloor$ with k leading zeros.]
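A sketch of unsigned division by $2^k$ for w = 32 (hypothetical name `udiv_pow2`); the assertion guards against shift counts of 32 or more, which are undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* floor(u / 2^k) via a logical right shift (zeros fill from the left). */
static uint32_t udiv_pow2(uint32_t u, int k) {
    assert(k >= 0 && k < 32);
    return u >> k;
}
```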
2's Comp Power-of-2 Divide with Shift P77
• Quotient of signed by power of 2
– u >> k gives $\lfloor u / 2^k \rfloor$
– Uses arithmetic shift
– Rounds in the wrong direction when u < 0 (toward $-\infty$ instead of toward 0)
[Figure: same layout as the unsigned case, but the vacated high bits are filled with copies of the sign bit; the result is RoundDown($u / 2^k$).]
Correct Power-of-2 Divide
• Quotient of negative number by power of 2
– Want $\lceil u / 2^k \rceil$ (round toward 0)
– Compute as $\lfloor (u + 2^k - 1) / 2^k \rfloor$
• In C: (u + (1<<k) - 1) >> k
• Biases the dividend toward 0
Quotient: the result of a division
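A sketch of the biased signed divide for w = 32 (hypothetical name `sdiv_pow2`). It assumes `>>` on a negative `int32_t` is an arithmetic shift; C leaves that implementation-defined, though GCC and Clang shift arithmetically. The bias is applied only when u < 0, so non-negative dividends use the plain shift.

```c
#include <assert.h>
#include <stdint.h>

/* u / 2^k rounded toward zero. */
static int32_t sdiv_pow2(int32_t u, int k) {
    assert(k >= 0 && k < 31);
    int32_t bias = (u < 0) ? ((int32_t)1 << k) - 1 : 0;  /* bias only negatives */
    return (u + bias) >> k;
}
```

For example, without the bias `-7 >> 1` gives -4; with it the result is -3, matching C's own `-7 / 2`.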
Correct Power-of-2 Divide
Case 1: No rounding
[Figure: dividend u, divisor $2^k$; the low k bits of u are all 0, so adding the bias $2^k - 1$ only fills those bits with 1s, which the shift then discards.]
Biasing has no effect
Correct Power-of-2 Divide
Case 2: Rounding
[Figure: dividend u, divisor $2^k$; at least one of the low k bits of u is 1, so adding the bias $2^k - 1$ carries into bit position k and the part kept by the shift is incremented by 1.]
Biasing adds 1 to final result
Floating Point
Topics
• Fractional binary numbers
• IEEE 754 standard
• Rounding modes
• FP operations
• Floating point in C
• Suggested reading: Chap 2.4
Encoding Rational Numbers P80
• Form $V = x \times 2^y$
• Very useful when $|V| \gg 0$ or $|V| \ll 1$
• An approximation to real arithmetic
• From the programmer's perspective
– Uninteresting
– Arcane and incomprehensible
Arcane: obscure, understood by few; Incomprehensible: impossible to understand
Encoding Rational Numbers
• Until the 1980s
– Many idiosyncratic formats: fast and easy to implement, but less accurate
• IEEE 754
– Designed by W. Kahan for Intel processors
– Based on a small and consistent set of principles: elegant and understandable, but hard to make go fast
Idiosyncratic: peculiar to one system; Elegant: simple and graceful
Fractional Binary Numbers
[Figure: bits $b_m b_{m-1} \cdots b_2 b_1 b_0 \,.\, b_{-1} b_{-2} b_{-3} \cdots b_{-n}$ with weights $2^m, 2^{m-1}, \ldots, 4, 2, 1$ to the left of the binary point and $1/2, 1/4, 1/8, \ldots, 2^{-n}$ to the right.]
Fractional Binary Numbers
• Bits to the right of the "binary point" represent fractional powers of 2
• Represents the rational number $b = \sum_{i=-n}^{m} 2^i \times b_i$ P81 (2.17)
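A small sketch that evaluates this sum for a bit string such as "101.11"; `binary_value` is a hypothetical helper, and it assumes the input contains only '0', '1', and at most one '.'.

```c
#include <assert.h>

/* Evaluate a binary string like "101.11" as sum of 2^i * b_i. */
static double binary_value(const char *s) {
    double value = 0.0;
    double weight = 0.5;          /* weight of the first bit after the point */
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') { after_point = 1; continue; }
        int bit = (*s == '1');
        if (!after_point) {
            value = 2.0 * value + bit;   /* shift left, append next integer bit */
        } else {
            value += bit * weight;       /* add b_{-i} * 2^{-i} */
            weight /= 2.0;
        }
    }
    return value;
}
```

For example, 101.11 in binary is 4 + 1 + 1/2 + 1/4 = 5.75.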
Fractional Numbers to Binary Bits
/* Assumes a double x with 0 <= x < 1 and a 32-bit unsigned int;
   bit 31 of result_bits receives b_-1, bit 30 receives b_-2, and so on. */
unsigned result_bits = 0, current_bit = 0x80000000u;
for (int i = 0; i < 32; i++) {
    x *= 2;                  /* move the next fraction bit left of the point */
    if (x >= 1) {
        result_bits |= current_bit;
        if (x == 1)          /* no fraction left: the conversion is exact */
            break;
        x -= 1;
    }
    current_bit >>= 1;       /* advance to the next bit position */
}
Fraction Binary Number Examples
Value   Binary fraction
0.2     0.00110011[0011]…
• Observations:
– Numbers of the form 0.11111…11 (binary) lie just below 1.0, written $1.0 - \epsilon$
– Binary fractions can only exactly represent numbers of the form $x / 2^k$
– Other values have repeating bit patterns
IEEE Floating-Point Representation P83
• Numeric form
– $V = (-1)^s \times M \times 2^E$
• Sign bit s determines whether the number is negative or positive
• Significand M is normally a fractional value in the range [1.0, 2.0)
• Exponent E weights the value by a power of two
IEEE Floating-Point Representation
• Encoding
– s is the sign bit
– exp field encodes E
– frac field encodes M
• Sizes
– Single precision (32 bits): 8 exp bits, 23 frac bits
– Double precision (64 bits): 11 exp bits, 52 frac bits
Layout: | s | exp | frac |
Normalized Values P84
• Condition
– exp ≠ 000…0 and exp ≠ 111…1
• Exponent coded as a biased value
– E = Exp - Bias
• Exp: unsigned value denoted by exp
• Bias: bias value
– Single precision: 127 (Exp: 1…254, E: -126…127)
– Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
– In general: $\mathrm{Bias} = 2^{m-1} - 1$, where m is the number of exponent bits
Normalized Values
• Significand coded with implied leading 1
– $M = 1.xxx\ldots x_2$
• xxx…x: bits of frac
• Minimum when frac = 000…0 (M = 1.0)
• Maximum when frac = 111…1 (M = $2.0 - \epsilon$)
• Get extra leading bit for "free"
Normalized Encoding Examples
• Value: 12345 (hex: 0x3039)
• Binary bits: 11000000111001
• Fraction representation: $1.1000000111001_2 \times 2^{13}$
• frac: 10000001110010000000000
• exp: 10001100 (140 = E + Bias = 13 + 127)
• Binary encoding
– 0100 0110 0100 0000 1110 0100 0000 0000
– 0x4640E400
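A sketch that recovers the three fields of this example, assuming `float` is 32-bit IEEE 754 single precision (true on mainstream platforms); `float_fields` is a hypothetical name, and `memcpy` reinterprets the bits without violating C's aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a float into its s, exp and frac fields. */
static void float_fields(float f, uint32_t *s, uint32_t *exp_field, uint32_t *frac_field) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes sizeof(float) == 4 */
    *s          = bits >> 31;
    *exp_field  = (bits >> 23) & 0xFFu;
    *frac_field = bits & 0x7FFFFFu;
}
```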
Denormalized Values P84
• Condition
– exp = 000…0
• Values
– Exponent value: E = 1 - Bias
– Significand value: $M = 0.xxx\ldots x_2$
• xxx…x: bits of frac
Denormalized Values
• Cases
– exp = 000…0, frac = 000…0
• Represents value 0
• Note that there are distinct values +0 and -0
– exp = 000…0, frac ≠ 000…0
• Numbers very close to 0.0
• Lose precision as they get smaller
• "Gradual underflow"
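A sketch that evaluates a finite single-precision bit pattern from its fields, covering both the normalized case (implied leading 1, E = exp - 127) and the denormalized case (no implied 1, E = 1 - 127). `float_value` is hypothetical; special values (exp = 111…1) are rejected by the assertion, and the power of two is applied by repeated doubling or halving, which is exact in `double` for this range.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a finite IEEE single-precision bit pattern. */
static double float_value(uint32_t bits) {
    uint32_t s = bits >> 31;
    uint32_t exp_field = (bits >> 23) & 0xFFu;
    uint32_t frac_field = bits & 0x7FFFFFu;
    assert(exp_field != 0xFFu);              /* inf/NaN not handled here */
    double m = frac_field / 8388608.0;       /* 0.frac = frac / 2^23 */
    int e;
    if (exp_field == 0) {
        e = 1 - 127;                         /* denormalized */
    } else {
        e = (int)exp_field - 127;            /* normalized */
        m += 1.0;                            /* implied leading 1 */
    }
    double v = m;
    for (; e > 0; e--) v *= 2.0;             /* multiply by 2^E */
    for (; e < 0; e++) v /= 2.0;
    return s ? -v : v;
}
```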
Special Values P85
• Condition
– exp = 111…1
Special Values
• exp = 111…1, frac = 000…0
– Represents value $\infty$ (infinity)
– Operation that overflows
– Both positive and negative
– E.g., 1.0/0.0 = -1.0/-0.0 = $+\infty$, 1.0/-0.0 = $-\infty$
Special Values
• exp = 111…1, frac ≠ 000…0
– Not-a-Number (NaN)
– Represents the case when no numeric value can be determined
– E.g., sqrt(-1), $\infty - \infty$
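A sketch that classifies a single-precision bit pattern by its exp and frac fields, following the cases above; `classify_bits` and the `CLS_*` constants are hypothetical names (chosen to avoid clashing with the `FP_*` macros in `<math.h>`).

```c
#include <assert.h>
#include <stdint.h>

enum fp_class { CLS_ZERO, CLS_DENORM, CLS_NORMAL, CLS_INF, CLS_NAN };

/* Classify a 32-bit IEEE single-precision pattern. */
static enum fp_class classify_bits(uint32_t bits) {
    uint32_t exp_field = (bits >> 23) & 0xFFu;
    uint32_t frac_field = bits & 0x7FFFFFu;
    if (exp_field == 0)     return frac_field == 0 ? CLS_ZERO : CLS_DENORM;
    if (exp_field == 0xFFu) return frac_field == 0 ? CLS_INF : CLS_NAN;
    return CLS_NORMAL;
}
```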
Summary of Real Number Encodings P85 Figure 2.22
[Figure: number line showing, from left to right, $-\infty$, -Normalized, -Denorm, -0, +0, +Denorm, +Normalized, $+\infty$, with NaN encodings beyond both ends.]
8-bit Floating-Point Representations
Layout: s (bit 7) | exp (bits 6-3) | frac (bits 2-0)
8-bit Floating-Point Representations
Exp  exp   E    2^E
0    0000  -6   1/64 (denorms)
1    0001  -6   1/64
2    0010  -5   1/32
3    0011  -4   1/16
4    0100  -3   1/8
5    0101  -2   1/4
6    0110  -1   1/2
7    0111   0   1
8    1000  +1   2
9    1001  +2   4
10   1010  +3   8
11   1011  +4   16
12   1100  +5   32
13   1101  +6   64
14   1110  +7   128
15   1111  n/a  (inf, NaN)
Dynamic Range (Denormalized numbers) P86 Figure 2.23
s  exp   frac  E   Value
0  0000  000   -6  0
0  0000  001   -6  1/8 × 1/64 = 1/512
0  0000  010   -6  2/8 × 1/64 = 2/512
…
0  0000  110   -6  6/8 × 1/64 = 6/512
0  0000  111   -6  7/8 × 1/64 = 7/512
Dynamic Range
s  exp   frac  E   Value
0  0001  000   -6  8/8 × 1/64 = 8/512
0  0001  001   -6  9/8 × 1/64 = 9/512
…
0  0110  110   -1  14/8 × 1/2 = 14/16
0  0110  111   -1  15/8 × 1/2 = 15/16
0  0111  000    0  8/8 × 1 = 1
0  0111  001    0  9/8 × 1 = 9/8
Dynamic Range (Normalized numbers)
s  exp   frac  E    Value
0  0111  010    0   10/8 × 1 = 10/8
…
0  1110  110    7   14/8 × 128 = 224
0  1110  111    7   15/8 × 128 = 240
0  1111  000   n/a  inf
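A sketch decoder for the 8-bit format in these tables (1 sign bit, 4 exp bits with Bias = 7, 3 frac bits); `fp8_value` is a hypothetical helper. It uses the `INFINITY` and `NAN` macros from `<math.h>` (no math library calls) for the exp = 1111 patterns.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Value of an 8-bit pattern: s | exp (4 bits, Bias 7) | frac (3 bits). */
static double fp8_value(uint8_t bits) {
    int s = bits >> 7;
    int exp_field = (bits >> 3) & 0xF;
    int frac_field = bits & 0x7;
    double v;
    if (exp_field == 0xF) {
        v = (frac_field == 0) ? INFINITY : NAN;               /* special values */
    } else {
        int e = (exp_field == 0) ? 1 - 7 : exp_field - 7;     /* E */
        int m8 = (exp_field == 0) ? frac_field : 8 + frac_field; /* M * 8 */
        v = m8 / 8.0;
        for (; e > 0; e--) v *= 2.0;
        for (; e < 0; e++) v /= 2.0;
    }
    return s ? -v : v;
}
```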
Distribution of Representable Values
• 6-bit IEEE-like format
– K = 3 exponent bits
– n = 2 fraction (frac) bits
– Bias is 3
• Notice how the distribution gets denser toward zero.
Distribution of Representable Values
[Figure: two number lines, one from -15 to 15 and a zoomed view from -1 to 1, marking denormalized values, normalized values, and infinity.]
Interesting Numbers P88 Figure 2.24
Special Properties of Encoding
• FP zero same as integer zero
– All bits = 0
• Can (almost) use unsigned integer comparison
– Must first compare sign bits
– Must consider -0 = 0
– NaNs problematic
• Will be greater than any other values
– Otherwise OK
• Denorm vs. normalized
• Normalized vs. infinity
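These caveats can be made concrete with a common bit trick that is not from the slides: flip all bits of negative values and set the sign bit of non-negative ones, so that plain unsigned comparison orders non-NaN floats correctly. `float_order_key` is a hypothetical name; the sketch assumes 32-bit IEEE single precision, and -0.0 sorts just below +0.0 instead of comparing equal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a non-NaN float to an unsigned key whose order matches float order. */
static uint32_t float_order_key(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* negatives: invert so larger magnitudes sort lower;
       non-negatives: set the sign bit so they sort above all negatives */
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}
```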
Round Mode P89
• Round down:
– rounded result is close to but no greater than the true result.
• Round up:
– rounded result is close to but no less than the true result.
Round Mode P90 Figure 2.25
Mode                1.40  1.60  1.50  2.50  -1.50
Round-to-even       1     2     2     2     -2
Round-toward-zero   1     1     1     2     -1
Round-down          1     1     1     2     -2
Round-up            2     2     2     3     -1
Round-to-Even
• Default rounding mode
– Hard to get any other kind without dropping into assembly
– All others are statistically biased
• Sum of a set of positive numbers will consistently be over- or under-estimated
Round-to-Even P89
• Applying to other decimal places
– When exactly halfway between two possible values
• Round so that the least significant digit is even
– E.g., round to nearest hundredth
1.2349999 → 1.23 (less than halfway)
1.2350001 → 1.24 (greater than halfway)
1.2350000 → 1.24 (halfway: round up)
1.2450000 → 1.24 (halfway: round down)
Rounding Binary Numbers P89
• "Even" when the least significant bit is 0
• Halfway when the bits to the right of the rounding position = 100…₂
• Examples: round to the nearest 1/4 (2 bits after the binary point)
Value    Binary     Rounded  Action  Rounded value
2 3/32   10.00011   10.00    Down    2
2 3/16   10.00110   10.01    Up      2 1/4
2 7/8    10.11100   11.00    Up      3
2 5/8    10.10100   10.10    Down    2 1/2
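A sketch of round-to-even on fixed-point bits; `round_half_even` is a hypothetical helper. The value is held as an integer count of $2^{-n}$ units and the low `drop` bits are removed. With the table's values in units of 1/32 and `drop` = 3, it rounds to the nearest 1/4 and reproduces each row.

```c
#include <assert.h>
#include <stdint.h>

/* Remove the low `drop` bits, rounding to nearest with ties to even. */
static uint32_t round_half_even(uint32_t bits, int drop) {
    assert(drop >= 1 && drop < 32);
    uint32_t kept = bits >> drop;
    uint32_t rest = bits & ((1u << drop) - 1u);  /* bits right of rounding position */
    uint32_t half = 1u << (drop - 1);            /* the pattern 100...0 */
    if (rest > half || (rest == half && (kept & 1u)))
        kept += 1u;                              /* round up */
    return kept;
}
```

For example, 2 7/8 is 92/32 (binary 10.11100); the dropped bits 100 are exactly half and the kept part 1011 is odd, so it rounds up to 1100, which is 12/4 = 3.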
Floating-Point Operations
• Conceptual View
– First compute exact result
– Make it fit into desired precision
• Possibly overflow if exponent too large
• Possibly round to fit into frac
Mathematical Properties of FP Add
• Compare to those of Abelian group
– Closed under addition? YES
• But may generate infinity or NaN
– Commutative? YES
– Associative? NO
• Overflow and inexactness of rounding
– 0 is additive identity? YES
– Every element has additive inverse? ALMOST
• Except for infinities & NaNs
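A sketch of the failure of associativity, assuming IEEE double precision: 3.14 is far smaller than the spacing between doubles near $10^{20}$, so it is lost when it is added to $-10^{20}$ first. `left_assoc` and `right_assoc` are hypothetical names.

```c
#include <assert.h>

/* The same three-term sum in two groupings. */
static double left_assoc(double a, double b, double c)  { return (a + b) + c; }
static double right_assoc(double a, double b, double c) { return a + (b + c); }
```

With a = 1e20, b = -1e20, c = 3.14, the left grouping returns 3.14 while the right grouping returns 0.0.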
Mathematical Properties of FP Add
• Monotonicity
– a ≥ b ⇒ a + c ≥ b + c? ALMOST
• Except for infinities & NaNs
Algebraic Properties of FP Mult
• Compare to commutative ring
– Closed under multiplication? YES
• But may generate infinity or NaN
– Multiplication commutative? YES
– Multiplication associative? NO (P92)
• Possibility of overflow, inexactness of rounding
– 1 is multiplicative identity? YES
– Multiplication distributes over addition? NO
• Possibility of overflow, inexactness of rounding
Algebraic Properties of FP Mult P90
• Monotonicity
– a ≥ b and c ≥ 0 ⇒ a × c ≥ b × c? ALMOST
• Except for infinities & NaNs
FP Multiplication
• Operands: $(-1)^{s_1} M_1 2^{E_1}$ and $(-1)^{s_2} M_2 2^{E_2}$
• Exact result: $(-1)^s M 2^E$
– Sign s: s1 ^ s2
– Significand M: M1 × M2
– Exponent E: E1 + E2
FP Multiplication
• Fixing
– If M ≥ 2, shift M right, increment E
– If E out of range, overflow
– Round M to fit frac precision
FP Addition
• Operands: $(-1)^{s_1} M_1 2^{E_1}$ and $(-1)^{s_2} M_2 2^{E_2}$
– Assume E1 > E2
• Exact result: $(-1)^s M 2^E$
– Sign s, significand M: result of signed align & add
– Exponent E: E1
FP Addition
• Fixing
– If M ≥ 2, shift M right, increment E
– If M < 1, shift M left k positions, decrement E by k
– Overflow if E out of range
– Round M to fit frac precision
FP Addition
[Figure: $(-1)^{s_2} M_2$ is shifted right by $E_1 - E_2$ positions to align it with $(-1)^{s_1} M_1$; the two are added to give $(-1)^s M$.]
Answers to Floating Point Puzzles
• int x = …;
• float f = …;
• double d = …;
• Assume neither d nor f is NaN or infinity
Floating Point in C
• x == (int)(float) x    No: 24-bit significand
• x == (int)(double) x    Yes: 53-bit significand
• f == (float)(double) f    Yes: increases precision
• d == (float) d    No: loses precision
• f == -(-f);    Yes: just changes the sign bit
• 2/3 == 2/3.0    No: 2/3 == 0
• d < 0.0 ⇒ ((d*2) < 0.0)    Yes!
• d > f ⇒ -f < -d    No
• d * d >= 0.0    Yes!
• (d+f)-d == f    No: not associative
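A sketch of the first two puzzles, assuming 32-bit `int` and IEEE `float`/`double`: $2^{24} + 1 = 16777217$ needs 25 significant bits, so it does not survive a round trip through `float` (it rounds to 16777216), but it does survive `double`. `via_float` and `via_double` are hypothetical names.

```c
#include <assert.h>

/* Round-trip an int through float and through double. */
static int via_float(int x)  { return (int)(float)x; }
static int via_double(int x) { return (int)(double)x; }
```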
Answers to Floating Point Puzzles
• C guarantees two levels
– float: single precision
– double: double precision
Answers to Floating Point Puzzles
• Conversions
– Casting between int, float, and double changes numeric values
– double or float to int
• Truncates the fractional part
• Like rounding toward zero
• Not defined when out of range
– Generally sets the result to TMin (the behavior of x86 processors)
– int to double
• Exact conversion, as long as int has ≤ 53-bit word size
– int to float
• Will round according to rounding mode