Integer Operations

Outline

• Arithmetic Operations
  – Overflow
  – Unsigned addition, multiplication
  – Signed addition, negation, multiplication
  – Using shift to perform power-of-2 multiply/divide

• Suggested reading
  – Chap 2.3

Negation: taking the additive inverse, −x.

Unsigned Addition

[Figure: operands u and v are w bits each. The true sum u + v needs w+1 bits. Discarding the carry leaves the w-bit result $UAdd_w(u, v)$.]

Unsigned Addition

• Standard Addition Function
  – Ignores carry output
• Implements Modular Arithmetic
  – $s = UAdd_w(u, v) = (u + v) \bmod 2^w$   (P67, Eq. 2.9)
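The modular behaviour can be observed directly in C, where unsigned arithmetic wraps around. The following is a minimal sketch for w = 32; the function names `uadd32` and `uadd32_overflows` are illustrative and do not come from the slides.

```c
#include <stdint.h>

/* w = 32: unsigned addition in C already computes (u + v) mod 2^32. */
uint32_t uadd32(uint32_t u, uint32_t v) {
    return u + v;                   /* the carry out of bit 31 is discarded */
}

/* The sum wrapped around exactly when it is smaller than an operand. */
int uadd32_overflows(uint32_t u, uint32_t v) {
    return (uint32_t)(u + v) < u;
}
```

For example, `uadd32(0xFFFFFFFFu, 1u)` wraps to 0, and `uadd32_overflows` reports 1 for the same operands.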

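Visualizing Unsigned Addition (P68, Figure 2.16)

• Wraps Around
  – If true sum ≥ $2^w$
  – At most once

[Figure: surface plot of $UAdd_4(u, v)$ for u and v from 0 to 15; the region where the true sum reaches 16 or more is marked "Overflow". A side diagram maps the true sum (0 to $2^{w+1}$) onto the modular sum (0 to $2^w$).]

Modulo: taking the remainder.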

Unsigned Addition Forms an Abelian Group (P68)

• Closed under addition
  – $0 \le UAdd_w(u, v) \le 2^w - 1$
• Commutative
  – $UAdd_w(u, v) = UAdd_w(v, u)$
• Associative
  – $UAdd_w(t, UAdd_w(u, v)) = UAdd_w(UAdd_w(t, u), v)$
• 0 is additive identity
  – $UAdd_w(u, 0) = u$
• Every element has additive inverse (P68, Eq. 2.10)
  – Let $UComp_w(u) = 2^w - u$ for $u > 0$, and $UComp_w(0) = 0$
  – $UAdd_w(u, UComp_w(u)) = 0$
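As a small illustration of the additive inverse, the sketch below computes $UComp_w(u)$ for w = 32 by unsigned subtraction from zero. The name `ucomp32` is an assumption made for this example.

```c
#include <stdint.h>

/* Additive inverse modulo 2^32: 2^32 - u for u > 0, and 0 for u == 0. */
uint32_t ucomp32(uint32_t u) {
    return (uint32_t)(0u - u);      /* unsigned subtraction wraps mod 2^32 */
}
```

Adding `u` and `ucomp32(u)` as `uint32_t` values gives 0 for every `u`.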

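Signed Addition

• Functionality
  – True sum requires w+1 bits
  – Drop off MSB
  – Treat remaining bits as 2's comp. integer

$$
TAdd_w(u, v) =
\begin{cases}
u + v + 2^w, & u + v < TMin_w \quad \text{(NegOver)} \\
u + v, & TMin_w \le u + v \le TMax_w \\
u + v - 2^w, & TMax_w < u + v \quad \text{(PosOver)}
\end{cases}
\qquad \text{(P70, Eq. 2.12)}
$$

PosOver: positive overflow. NegOver: negative overflow.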

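Signed Addition (P70, Figure 2.17)

[Figure: the true sum u + v ranges from $-2^w$ to $2^w$. Sums at or above $2^{w-1}$ wrap to negative TAdd results (PosOver); sums below $-2^{w-1}$ wrap to non-negative TAdd results (NegOver). In the (u, v) plane, NegOver occurs only where u < 0 and v < 0, and PosOver only where u > 0 and v > 0.]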

Visualizing 2's Comp. Addition (P72, Figure 2.19)

• Values
  – 4-bit two's comp.
  – Range from −8 to +7
• Wraps Around
  – If sum ≥ $2^{w-1}$
    • Becomes negative
  – If sum < $-2^{w-1}$
    • Becomes positive

[Figure: surface plot of $TAdd_4(u, v)$ for u and v from −8 to 7, with the PosOver and NegOver regions marked.]

Detecting TAdd Overflow (P71)

• Task
  – Given $s = TAdd_w(u, v)$
  – Determine whether s equals the true sum u + v
• Claim
  – Overflow iff either:
    • u, v < 0, s ≥ 0 (NegOver)
    • u, v ≥ 0, s < 0 (PosOver)
  – ovf = ((u < 0) == (v < 0)) && ((u < 0) != (s < 0));
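The overflow test above can be wrapped in a function. This is a sketch for w = 32, with the hypothetical name `tadd32_overflows`; it performs the addition on unsigned values because signed overflow is undefined in C, and it assumes the usual wrapping conversion back to `int32_t`.

```c
#include <stdint.h>

/* Returns 1 if u + v does not fit in 32-bit two's complement. */
int tadd32_overflows(int32_t u, int32_t v) {
    /* Add as unsigned so that the wraparound itself is well defined.
       Assumption: converting back to int32_t wraps (true on common compilers). */
    int32_t s = (int32_t)((uint32_t)u + (uint32_t)v);
    /* Overflow iff the operands share a sign and the sum has the other sign. */
    return ((u < 0) == (v < 0)) && ((u < 0) != (s < 0));
}
```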

Mathematical Properties of TAdd

• Two's Complement Under TAdd Forms a Group
  – Closed, commutative, associative, 0 is additive identity
  – Every element has additive inverse (P73, Eq. 2.13)
    • Let

$$
TComp_w(u) =
\begin{cases}
-u, & u \ne TMin_w \\
TMin_w, & u = TMin_w
\end{cases}
$$

    • $TAdd_w(u, TComp_w(u)) = 0$
• Isomorphic Algebra to UAdd
  – $TAdd_w(u, v) = U2T(UAdd_w(T2U(u), T2U(v)))$
    • Since both have identical bit patterns
  – $T2U(TAdd_w(u, v)) = UAdd_w(T2U(u), T2U(v))$

Isomorphic: having the same structure.

Negating with Complement & Increment (P73)

• In C
  – ~x + 1 == -x
• Complement (~x is the bitwise complement of x)
  – Observation: ~x + x == 1111…111 == -1

        x    1 0 0 1 0 1 1 1
     + ~x    0 1 1 0 1 0 0 0
     = -1    1 1 1 1 1 1 1 1

• Increment
  – ~x + 1 == ~x + [x + (-x)] + 1
  – == (~x + x) + (-x) + 1 == -1 + (-x + 1) == -x
  – So ~x + 1 == -x
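The identity can be checked with a short function. The sketch below uses the illustrative name `negate` and does the arithmetic on `uint32_t`, so that negating `INT32_MIN` wraps rather than overflowing a signed type; the wrapping conversion back to `int32_t` is an assumption that holds on common compilers.

```c
#include <stdint.h>

/* -x computed as ~x + 1, using unsigned arithmetic to avoid signed overflow. */
int32_t negate(int32_t x) {
    return (int32_t)(~(uint32_t)x + 1u);
}
```

Note that `negate(INT32_MIN)` returns `INT32_MIN`, matching $TComp_w(TMin_w) = TMin_w$.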

Multiplication (P75)

• Computing Exact Product of w-bit numbers x, y
  – Either signed or unsigned
• Ranges
  – Unsigned: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
    • Up to 2w bits
  – Two's complement min: $x \cdot y \ge (-2^{w-1})(2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
    • Up to 2w−1 bits
  – Two's complement max: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
    • Up to 2w bits, but only for $TMin_w^2$
• Maintaining Exact Results
  – Would need to keep expanding word size with each product computed
  – Done in software by "arbitrary precision" arithmetic packages

Power-of-2 Multiply with Shift

[Figure: multiplying the w-bit operand u by $2^k$ gives a true product $u \cdot 2^k$ of w+k bits. Discarding the top k bits leaves the w-bit results $UMult_w(u, 2^k)$ and $TMult_w(u, 2^k)$, whose low k bits are 0.]

• Operation
  – u << k gives $u \cdot 2^k$
  – Both signed and unsigned
• Examples
  – u << 3 == u * 8
  – (u << 5) - (u << 3) == u * 24 (the parentheses are required: in C, - binds tighter than <<)
  – Most machines shift and add much faster than multiply
    • Compiler will generate this code automatically
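The two examples can be written as small C functions. This is only a sketch; the names `times8` and `times24` are made up for illustration, and unsigned operands are used so that any overflow wraps modulo $2^{32}$.

```c
#include <stdint.h>

/* u * 8 as a single shift. */
uint32_t times8(uint32_t u) {
    return u << 3;
}

/* u * 24 = u * 32 - u * 8, as two shifts and a subtraction.
   Without the parentheses, u << 5 - u << 3 would parse as (u << (5 - u)) << 3. */
uint32_t times24(uint32_t u) {
    return (u << 5) - (u << 3);
}
```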

Unsigned Power-of-2 Divide with Shift

• Quotient of Unsigned by Power of 2
  – u >> k gives $\lfloor u / 2^k \rfloor$
  – Uses logical shift

[Figure: dividing u by $2^k$ moves the binary point k places to the left. The logical shift fills the top k bits with 0 and drops the k bits to the right of the binary point.]

2's Comp Power-of-2 Divide with Shift (P77)

• Quotient of Signed by Power of 2
  – u >> k gives $\lfloor u / 2^k \rfloor$
  – Uses arithmetic shift
  – Rounds wrong direction when u < 0

[Figure: same layout as the unsigned case, except that the arithmetic shift fills the top k bits with copies of the sign bit. The result is RoundDown($u / 2^k$).]

Correct Power-of-2 Divide

• Quotient of Negative Number by Power of 2
  – Want $\lceil u / 2^k \rceil$ (round toward 0)
  – Compute as $\lfloor (u + 2^k - 1) / 2^k \rfloor$
    • In C: (u + (1<<k) - 1) >> k
    • Biases dividend toward 0

Quotient: the result of a division.

Case 1: No rounding

[Figure: the dividend u has all zeros in its low k bits. Adding the bias $2^k - 1$ only sets those low k bits to 1, and the shift discards them. Biasing has no effect.]

Case 2: Rounding

[Figure: the dividend u has at least one 1 in its low k bits. Adding the bias $2^k - 1$ carries into bit k, so the quotient is incremented by 1. Biasing adds 1 to the final result.]
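Putting the two cases together, a division that rounds toward zero for both signs might look like the sketch below. The name `div_pow2` is hypothetical, and the code assumes that `>>` on a negative `int32_t` is an arithmetic shift, which C leaves implementation-defined.

```c
#include <stdint.h>

/* x / 2^k rounded toward zero, for 0 <= k <= 30.
   Assumes >> on a negative int32_t is an arithmetic shift. */
int32_t div_pow2(int32_t x, int k) {
    /* Bias only negative dividends; non-negative ones already round toward 0. */
    int32_t bias = (x < 0) ? (int32_t)((1 << k) - 1) : 0;
    return (x + bias) >> k;
}
```

For instance, `div_pow2(-7, 1)` gives −3, the same as the C expression `-7 / 2`, whereas a plain arithmetic shift of −7 by 1 would give −4.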

Floating Point

Topics

• Fractional Binary Numbers
• IEEE 754 Standard
• Rounding Mode
• FP Operations
• Floating Point in C
• Suggested Reading: Chap 2.4

Encoding Rational Numbers (P80)

• Form $V = x \times 2^y$
• Very useful when $|V| \gg 0$ or $|V| \ll 1$
• An approximation to real arithmetic
• From programmer's perspective
  – Uninteresting
  – Arcane and incomprehensible
• Until 1980s
  – Many idiosyncratic formats: fast, easy to implement, less accurate
• IEEE 754
  – Designed by W. Kahan for Intel processors
  – Based on a small and consistent set of principles: elegant, understandable, hard to make go fast

Arcane: mysterious. Incomprehensible: impossible to understand. Idiosyncratic: peculiar to one design. Elegant: clean and graceful.

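Fractional Binary Numbers

[Figure: bit string $b_m\, b_{m-1} \cdots b_2\, b_1\, b_0 \,.\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-n}$, with weights $2^m, 2^{m-1}, \ldots, 4, 2, 1$ to the left of the binary point and $1/2, 1/4, 1/8, \ldots, 2^{-n}$ to the right.]

• Bits to right of "binary point" represent fractional powers of 2
• Represents rational number (P81, Eq. 2.17):

$$\sum_{i=-n}^{m} b_i \times 2^i$$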

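Fractional Numbers to Binary Bits

    /* x is the fraction to convert, 0 <= x < 1 */
    unsigned result_bits = 0, current_bit = 0x80000000;
    int i;

    for (i = 0; i < 32; i++) {
        x *= 2;
        if (x >= 1) {
            result_bits |= current_bit;   /* this bit of the fraction is 1 */
            if (x == 1)
                break;                    /* nothing left to convert */
            x -= 1;
        }
        current_bit >>= 1;                /* move on to the next lower bit */
    }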
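To try the loop on concrete values, it can be wrapped in a function. The sketch below is one possible packaging; the name `fraction_bits` and the choice of `double` for x are assumptions, since the slide does not declare x.

```c
#include <stdint.h>

/* First 32 bits after the binary point of x, for 0 <= x < 1.
   Bit 31 of the result has weight 1/2, bit 30 has weight 1/4, and so on. */
uint32_t fraction_bits(double x) {
    uint32_t result_bits = 0, current_bit = 0x80000000u;
    int i;
    for (i = 0; i < 32; i++) {
        x *= 2;
        if (x >= 1) {
            result_bits |= current_bit;
            if (x == 1)
                break;
            x -= 1;
        }
        current_bit >>= 1;
    }
    return result_bits;
}
```

With this packaging, 0.75 yields `0xC0000000` (binary 0.11) and 0.2 yields `0x33333333`, the repeating pattern 0011.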

Fraction Binary Number Examples

    Value   Binary Fraction
    0.2     0.00110011[0011]…

• Observations:
  – Numbers of the form 0.11111…11 are just below 1.0, written 1.0 − ε
  – Binary fractions can exactly represent only numbers of the form $x / 2^k$
  – Others have repeating bit patterns

IEEE Floating-Point Representation (P83)

• Numeric form
  – $V = (-1)^s \times M \times 2^E$
  – Sign bit s determines whether number is negative or positive
  – Significand M normally a fractional value in range [1.0, 2.0)
  – Exponent E weights value by power of two
• Encoding

    | s | exp | frac |

  – s is sign bit
  – exp field encodes E
  – frac field encodes M
• Sizes
  – Single precision (32 bits): 8 exp bits, 23 frac bits
  – Double precision (64 bits): 11 exp bits, 52 frac bits

Normalized Values (P84)

• Condition
  – exp ≠ 000…0 and exp ≠ 111…1
• Exponent coded as biased value
  – E = Exp − Bias
    • Exp: unsigned value denoted by exp
    • Bias: bias value
  – Single precision: 127 (Exp: 1…254, E: −126…127)
  – Double precision: 1023 (Exp: 1…2046, E: −1022…1023)
  – In general: Bias = $2^{m-1} - 1$, where m is the number of exponent bits
• Significand coded with implied leading 1
  – M = $1.xxx…x_2$
    • xxx…x: bits of frac
  – Minimum when frac = 000…0 (M = 1.0)
  – Maximum when frac = 111…1 (M = 2.0 − ε)
  – Get extra leading bit for "free"

Normalized Encoding Examples

• Value: 12345 (Hex: 0x3039)
• Binary bits: 11000000111001
• Fraction representation: $1.1000000111001_2 \times 2^{13}$
• frac (the bits of M after the leading 1): 10000001110010000000000
• exp (E + Bias = 13 + 127 = 140): 10001100
• Binary encoding
  – 0100 0110 0100 0000 1110 0100 0000 0000
  – 0x4640E400
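The encoding can be confirmed on a machine whose `float` is IEEE 754 single precision (an assumption, though it holds on virtually all current platforms). The helper names below are illustrative; `memcpy` is used to read the bit pattern without converting the value.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float without changing it. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Field extraction for single precision: 1 sign, 8 exp, 23 frac bits. */
uint32_t sign_field(uint32_t bits) { return bits >> 31; }
uint32_t exp_field(uint32_t bits)  { return (bits >> 23) & 0xFFu; }
uint32_t frac_field(uint32_t bits) { return bits & 0x7FFFFFu; }
```

Under that assumption, `float_bits(12345.0f)` is `0x4640E400`, with an exp field of 140 and a frac field of `0x40E400`.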

Denormalized Values (P84)

• Condition
  – exp = 000…0
• Values
  – Exponent value: E = 1 − Bias
  – Significand value: M = $0.xxx…x_2$
    • xxx…x: bits of frac
• Cases
  – exp = 000…0, frac = 000…0
    • Represents value 0
    • Note the distinct values +0 and −0
  – exp = 000…0, frac ≠ 000…0
    • Numbers very close to 0.0
    • Lose precision as they get smaller
    • "Gradual underflow"

Special Values (P85)

• Condition
  – exp = 111…1
• exp = 111…1, frac = 000…0
  – Represents value ∞ (infinity)
  – Operation that overflows
  – Both positive and negative
  – E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
• exp = 111…1, frac ≠ 000…0
  – Not-a-Number (NaN)
  – Represents case when no numeric value can be determined
  – E.g., sqrt(−1), ∞ − ∞

Summary of Real Number Encodings (P85, Figure 2.22)

[Figure: number line, left to right: NaN, −∞, −Normalized, −Denorm, −0, +0, +Denorm, +Normalized, +∞, NaN.]

8-bit Floating-Point Representations

    bit:    7 | 6 … 3 | 2 … 0
            s |  exp  | frac

    Exp   exp    E     2^E
    0     0000   -6    1/64   (denorms)
    1     0001   -6    1/64
    2     0010   -5    1/32
    3     0011   -4    1/16
    4     0100   -3    1/8
    5     0101   -2    1/4
    6     0110   -1    1/2
    7     0111    0    1
    8     1000   +1    2
    9     1001   +2    4
    10    1010   +3    8
    11    1011   +4    16
    12    1100   +5    32
    13    1101   +6    64
    14    1110   +7    128
    15    1111   n/a   (inf, NaN)

Dynamic Range (P86, Figure 2.23)

    s  exp   frac   E     Value
    Denormalized numbers
    0  0000  000    -6    0
    0  0000  001    -6    1/8 * 1/64 = 1/512
    0  0000  010    -6    2/8 * 1/64 = 2/512
    …
    0  0000  110    -6    6/8 * 1/64 = 6/512
    0  0000  111    -6    7/8 * 1/64 = 7/512
    Normalized numbers
    0  0001  000    -6    8/8 * 1/64 = 8/512
    0  0001  001    -6    9/8 * 1/64 = 9/512
    …
    0  0110  110    -1    14/8 * 1/2 = 14/16
    0  0110  111    -1    15/8 * 1/2 = 15/16
    0  0111  000     0    8/8 * 1 = 1
    0  0111  001     0    9/8 * 1 = 9/8
    0  0111  010     0    10/8 * 1 = 10/8
    …
    0  1110  110     7    14/8 * 128 = 224
    0  1110  111     7    15/8 * 128 = 240
    Infinity
    0  1111  000    n/a   inf
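The rows of the dynamic-range table can be reproduced with a small decoder for this 8-bit format (1 sign bit, 4 exp bits with bias 7, 3 frac bits). This is a sketch under those assumptions; the name `decode8` is hypothetical, and `<math.h>` is included only for the `INFINITY` and `NAN` macros.

```c
#include <math.h>    /* INFINITY and NAN macros only */
#include <stdint.h>

/* Decode the 8-bit format: 1 sign bit, 4 exp bits (bias 7), 3 frac bits. */
double decode8(uint8_t b) {
    int s    = (b >> 7) & 1;
    int exp  = (b >> 3) & 0xF;
    int frac = b & 0x7;
    double m, v;
    int e, i;

    if (exp == 0xF) {                       /* special values */
        if (frac != 0) return NAN;
        return s ? -INFINITY : INFINITY;
    }
    if (exp == 0) {                         /* denormalized: no leading 1 */
        m = frac / 8.0;
        e = 1 - 7;
    } else {                                /* normalized: implied leading 1 */
        m = 1.0 + frac / 8.0;
        e = exp - 7;
    }
    v = m;
    for (i = 0; i < e; i++) v *= 2.0;       /* scale by 2^e when e > 0 */
    for (i = 0; i > e; i--) v /= 2.0;       /* scale by 2^e when e < 0 */
    return s ? -v : v;
}
```

For example, the pattern 0 0110 110 (`0x36`) decodes to 14/16 and 0 1110 111 (`0x77`) decodes to 240, the largest finite value.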

Distribution of Representable Values

• 6-bit IEEE-like format
  – k = 3 exponent bits
  – n = 2 significand bits
  – Bias is 3
• Notice how the distribution gets denser toward zero.

[Figure: representable values on a number line from −15 to 15, marked as denormalized, normalized, and infinity, followed by a close-up of the range −1 to 1.]

Interesting Numbers (P88, Figure 2.24)

Special Properties of Encoding

• FP Zero Same as Integer Zero
  – All bits = 0
• Can (Almost) Use Unsigned Integer Comparison
  – Must first compare sign bits
  – Must consider −0 = 0
  – NaNs problematic
    • Will be greater than any other values
  – Otherwise OK
    • Denorm vs. normalized
    • Normalized vs. infinity

Round Mode (P89)

• Round down:
  – rounded result is close to but no greater than true result.
• Round up:
  – rounded result is close to but no less than true result.

Round Mode (P90, Figure 2.25)

    Mode                 1.40   1.60   1.50   2.50   -1.50
    Round-to-even        1      2      2      2      -2
    Round-toward-zero    1      1      1      2      -1
    Round-down           1      1      1      2      -2
    Round-up             2      2      2      3      -1

Round-to-Even (P89)

• Default Rounding Mode
  – Hard to get any other kind without dropping into assembly
  – All others are statistically biased
    • Sum of set of positive numbers will consistently be over- or under-estimated
• Applying to Other Decimal Places
  – When exactly halfway between two possible values
    • Round so that least significant digit is even
  – E.g., round to nearest hundredth

        1.2349999 → 1.23 (less than half way)
        1.2350001 → 1.24 (greater than half way)
        1.2350000 → 1.24 (half way, round up)
        1.2450000 → 1.24 (half way, round down)

Rounding Binary Number (P89)

• "Even" when least significant bit is 0
• Half way when bits to right of rounding position = $100…_2$
• Examples: round to the nearest 1/4 (2 bits right of the binary point)

    Value    Binary     Rounded   Action   Rounded Value
    2 3/32   10.00011   10.00     Down     2
    2 3/16   10.0011    10.01     Up       2 1/4
    2 7/8    10.111     11.00     Up       3
    2 5/8    10.101     10.10     Down     2 1/2
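The binary rounding rule can be expressed with integer operations alone. The sketch below treats x as a fixed-point value and rounds away its low k bits with ties going to the even result; the name `round_half_even` is an assumption for this example.

```c
#include <stdint.h>

/* Round x / 2^k to the nearest integer, ties to even (1 <= k <= 31). */
uint32_t round_half_even(uint32_t x, unsigned k) {
    uint32_t q    = x >> k;                   /* bits that are kept */
    uint32_t rem  = x & ((1u << k) - 1u);     /* bits that are dropped */
    uint32_t half = 1u << (k - 1);            /* the halfway pattern 100...0 */
    if (rem > half || (rem == half && (q & 1u)))
        q += 1;                               /* above half, or tie with odd q */
    return q;
}
```

Writing the table values in 32nds and rounding with k = 3 gives results in quarters: 67 (2 3/32) becomes 8, 70 (2 3/16) becomes 9, 92 (2 7/8) becomes 12, and 84 (2 5/8) becomes 10, which are 2, 2 1/4, 3, and 2 1/2.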

Floating-Point Operations

• Conceptual View
  – First compute exact result
  – Make it fit into desired precision
    • Possibly overflow if exponent too large
    • Possibly round to fit into frac

Mathematical Properties of FP Add

• Compare to those of Abelian Group
  – Closed under addition? YES
    • But may generate infinity or NaN
  – Commutative? YES
  – Associative? NO
    • Overflow and inexactness of rounding
  – 0 is additive identity? YES
  – Every element has additive inverse? ALMOST
    • Except for infinities & NaNs
• Monotonicity
  – a ≥ b ⇒ a + c ≥ b + c? ALMOST
    • Except for infinities & NaNs
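The loss of associativity is easy to reproduce. In the sketch below, the hypothetical helpers `sum_left` and `sum_right` add the same three floats in different orders; `volatile` is used to force each intermediate result to be rounded to `float`, and IEEE 754 single precision with round-to-even is assumed.

```c
/* (a + b) + c, with each intermediate result rounded to float. */
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    volatile float r = t + c;
    return r;
}

/* a + (b + c), with each intermediate result rounded to float. */
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    volatile float r = a + t;
    return r;
}
```

With a = 3.14f, b = 1e10f, c = -1e10f, `sum_left` returns 0.0 because 3.14 is lost when rounded into 1e10, while `sum_right` returns 3.14f.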

Algebraic Properties of FP Mult (P90)

• Compare to Commutative Ring
  – Closed under multiplication? YES
    • But may generate infinity or NaN
  – Multiplication commutative? YES
  – Multiplication associative? NO (P92)
    • Possibility of overflow, inexactness of rounding
  – 1 is multiplicative identity? YES
  – Multiplication distributes over addition? NO
    • Possibility of overflow, inexactness of rounding
• Monotonicity
  – a ≥ b & c ≥ 0 ⇒ a * c ≥ b * c? ALMOST
    • Except for infinities & NaNs

FP Multiplication

• Operands
  – $(-1)^{s_1} M_1 2^{E_1}$
  – $(-1)^{s_2} M_2 2^{E_2}$
• Exact Result
  – $(-1)^{s} M 2^{E}$
  – Sign s: s1 ^ s2
  – Significand M: M1 * M2
  – Exponent E: E1 + E2
• Fixing
  – If M ≥ 2, shift M right, increment E
  – If E out of range, overflow
  – Round M to fit frac precision

FP Addition

• Operands
  – $(-1)^{s_1} M_1 2^{E_1}$
  – $(-1)^{s_2} M_2 2^{E_2}$
  – Assume E1 > E2
• Exact Result
  – $(-1)^{s} M 2^{E}$
  – Sign s, significand M: result of signed align & add
  – Exponent E: E1
• Fixing
  – If M ≥ 2, shift M right, increment E
  – If M < 1, shift M left k positions, decrement E by k
  – Overflow if E out of range
  – Round M to fit frac precision

[Figure: $(-1)^{s_2} M_2$ is shifted right by E1 − E2 positions to align its binary point with $(-1)^{s_1} M_1$; the two are then added to give $(-1)^{s} M$.]

Answers to Floating Point Puzzles

    int x = …;
    float f = …;
    double d = …;

• Assume neither d nor f is NaN or infinity

Floating Point in C

    Expression                  Holds?   Reason
    x == (int)(float) x         No       24-bit significand
    x == (int)(double) x        Yes      53-bit significand
    f == (float)(double) f      Yes      increases precision
    d == (float) d              No       loses precision
    f == -(-f)                  Yes      just changes sign bit
    2/3 == 2/3.0                No       2/3 == 0
    d < 0.0 ⇒ ((d*2) < 0.0)     Yes
    d > f ⇒ -f < -d             No
    d * d >= 0.0                Yes
    (d+f)-d == f                No       not associative
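The first two puzzles can be tried on concrete values. The sketch below assumes a 32-bit `int` and IEEE 754 `float` and `double`; the helper names are made up for illustration, and `volatile` forces the intermediate value to be stored at the stated precision.

```c
/* x == (int)(float)x ? Fails once x needs more than 24 significant bits. */
int survives_float(int x) {
    volatile float f = (float)x;    /* volatile: force rounding to float */
    return x == (int)f;
}

/* x == (int)(double)x ? Holds for every 32-bit int (53-bit significand). */
int survives_double(int x) {
    volatile double d = (double)x;
    return x == (int)d;
}
```

The value 16777217 ($2^{24} + 1$) is the smallest positive int that does not survive the round trip through `float`, although it does survive through `double`. Values close to `INT_MAX` are left out of the float check because the cast back to `int` would be out of range.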

• C Guarantees Two Levels
  – float: single precision
  – double: double precision
• Conversions
  – Casting between int, float, and double changes numeric values
  – double or float to int
    • Truncates fractional part
    • Like rounding toward zero
    • Not defined when out of range
      – Generally saturates to TMin or TMax
  – int to double
    • Exact conversion, as long as int has ≤ 53-bit word size
  – int to float
    • Will round according to rounding mode
