
CHAPTER 4

Round-Off and Truncation Errors

Numerical Accuracy

Truncation error: method dependent. Errors that result from using an approximation rather than an exact procedure.

$$f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$$

Round-off error: machine dependent. Errors that result from not being able to adequately represent the true value, that is, from using an approximate number to represent an exact number.

$$\pi \approx 3.1416, \qquad e \approx 2.71828$$

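Taylor Series Expansion

Construction of finite-difference formula

Numerical accuracy: discretization error

Base point x = a

$$f(x) = \sum_{m=0}^{\infty} c_m (x-a)^m$$

$$\begin{aligned}
f(x) &= c_0 + c_1 (x-a) + c_2 (x-a)^2 + c_3 (x-a)^3 + \cdots &&\Rightarrow\; c_0 = f(a) \\
f'(x) &= c_1 + 2 c_2 (x-a) + 3 c_3 (x-a)^2 + \cdots &&\Rightarrow\; c_1 = f'(a) \\
f''(x) &= 2 c_2 + 6 c_3 (x-a) + \cdots &&\Rightarrow\; c_2 = f''(a)/2! \\
f'''(x) &= 6 c_3 + \cdots &&\Rightarrow\; c_3 = f'''(a)/3! \\
&\;\;\vdots \\
f^{(m)}(x) &= m!\, c_m + (m+1)\, m \cdots 2\; c_{m+1} (x-a) + \cdots &&\Rightarrow\; c_m = f^{(m)}(a)/m!
\end{aligned}$$

$$f(x) = \sum_{m=0}^{\infty} \frac{f^{(m)}(a)}{m!} (x-a)^m$$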

Taylor Series Expansions

$$f(x_{i+1}) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$$

Taylor Series and Remainder

Taylor series (base point x = a)

$$f(x) = \sum_{m=0}^{\infty} \frac{f^{(m)}(a)}{m!} (x-a)^m = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + R_n$$

Remainder

$$R_n = \frac{f^{(n+1)}(\xi)}{(n+1)!} (x-a)^{n+1}, \qquad \xi \text{ between } a \text{ and } x$$

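Truncation Error

Taylor series expansion

$$f(x_{i+1}) = f(x_i) + h f'(x_i) + \frac{h^2}{2!} f''(x_i) + \frac{h^3}{3!} f'''(x_i) + \cdots$$

Example (higher-order terms truncated), with $x_i = 0$, $h = x$, $x_{i+1} = x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots$$

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots$$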
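To make the effect of truncation concrete, the sketch below sums the first few terms of the sine series and compares each partial sum with MATLAB's built-in `sin`. The evaluation point `x = 1` and the number of terms are arbitrary choices for illustration, not values from the slides.

```matlab
% Truncated Maclaurin series for sin(x): x - x^3/3! + x^5/5! - ...
x = 1.0;                      % evaluation point (assumed for illustration)
nterms = 5;                   % number of nonzero terms kept (assumed)
k = 0:nterms-1;
terms = (-1).^k .* x.^(2*k+1) ./ factorial(2*k+1);
partial = cumsum(terms);      % partial sums after 1, 2, ..., nterms terms
err = abs(sin(x) - partial);  % truncation error of each partial sum
disp([(1:nterms)' partial' err'])
```

Each extra term should shrink the truncation error, which is roughly the size of the first neglected term.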

Power Series: Polynomials

The function becomes more nonlinear as m increases

A MATLAB Script

Filename: fun_exp.m

function s = fun_exp()
% Evaluate exponential function exp(x)
% by Taylor series expansion
% f(x) = 1 + x + x^2/2! + x^3/3! + ... + x^n/n!
x = input('enter the value of x = ');
n = input('enter the order n = ');
term = 1; s = term;            % zeroth-order term and running sum
for i = 1:n
    term = term*x/i;           % x^i/i! built from the previous term
    s = s + term;
end

MATLAB For Loops

Filename: fun_exp2.m

function s = fun_exp2()
% Evaluate exponential function exp(x)
% by Taylor series expansion
% f(x) = 1 + x + x^2/2! + x^3/3! + ... + x^n/n!
x = input('enter the value of x = ');
n = input('enter the order n = ');
term = zeros(1, n+1); s = zeros(1, n+1);   % store every term and partial sum
term(1) = 1; s(1) = term(1);
for i = 1:n
    term(i+1) = term(i)*x/i;
    s(i+1) = s(i) + term(i+1);
end
% Display the results
disp('i term(i) sum(i)')
a = 1:n+1; disp([a' term' s'])

Truncation Error

x = 10, e^x = 22026.4658

 n        term          sum      n        term          sum
 0      1.0000       1.0000     21     19.5729   22011.0566
 1     10.0000      11.0000     22      8.8968   22019.9531
 2     50.0000      61.0000     23      3.8682   22023.8223
 3    166.6667     227.6667     24      1.6117   22025.4336
 4    416.6667     644.3334     25      0.6447   22026.0781
 5    833.3334    1477.6667     26      0.2480   22026.3262
 6   1388.8890    2866.5557     27      0.0918   22026.4180
 7   1984.1272    4850.6826     28      0.0328   22026.4512
 8   2480.1589    7330.8418     29      0.0113   22026.4629
 9   2755.7322   10086.5742     30      0.0038   22026.4668
10   2755.7322   12842.3066     31      0.0012   22026.4688
11   2505.2112   15347.5176     32      0.0004   22026.4688
12   2087.6760   17435.1934     33      0.0001   22026.4688
13   1605.9045   19041.0977     34      0.0000   22026.4688
14   1147.0746   20188.1719     35      0.0000   22026.4688
15    764.7164   20952.8887     36      0.0000   22026.4688
16    477.9478   21430.8359     37      0.0000   22026.4688
17    281.1458   21711.9824     38      0.0000   22026.4688
18    156.1921   21868.1738     39      0.0000   22026.4688
19     82.2064   21950.3809     40      0.0000   22026.4688
20     41.1032   21991.4844

Truncation Error

x = -10, e^x = 0.45399 x 10^-4

 n            term             sum      n            term             sum
 0       1.0000000       1.0000000     21     -19.5729427      -6.1761837
 1     -10.0000000      -9.0000000     22       8.8967924       2.7206087
 2      50.0000000      41.0000000     23      -3.8681707      -1.1475620
 3    -166.6666718    -125.6666718     24       1.6117378       0.4641758
 4     416.6666870     291.0000000     25      -0.6446951      -0.1805193
 5    -833.3333740    -542.3333740     26       0.2479596       0.0674404
 6    1388.8890381     846.5556641     27      -0.0918369      -0.0243965
 7   -1984.1271973   -1137.5715332     28       0.0327989       0.0084024
 8    2480.1589355    1342.5874023     29      -0.0113100      -0.0029076
 9   -2755.7321777   -1413.1447754     30       0.0037700       0.0008624
10    2755.7321777    1342.5874023     31      -0.0012161      -0.0003537
11   -2505.2111816   -1162.6237793     32       0.0003800       0.0000263
12    2087.6760254     925.0522461     33      -0.0001152      -0.0000889
13   -1605.9045410    -680.8522949     34       0.0000339      -0.0000550
14    1147.0745850     466.2222900     35      -0.0000097      -0.0000647
15    -764.7164307    -298.4941406     36       0.0000027      -0.0000620
16     477.9477539     179.4536133     37      -0.0000007      -0.0000627
17    -281.1457520    -101.6921387     38       0.0000002      -0.0000625
18     156.1920776      54.4999390     39       0.0000000      -0.0000626
19     -82.2063599     -27.7064209     40       0.0000000      -0.0000626
20      41.1031799      13.3967590

How to reduce error?

$$e^{-10} = \frac{1}{e^{10}} = \frac{1}{22026.4658}$$
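The two tables can be reproduced, at least approximately, with the sketch below. It assumes the tables were generated in single precision, which is why the variables are cast with `single`. It sums the series at x = 10 and x = -10, then compares the direct result for e^-10 with the reciprocal 1/e^10.

```matlab
% Sum the Taylor series of exp(x) in single precision, term by term.
n = 40;
xs = single([10 -10]);          % evaluate at x = 10 and x = -10
term = single([1 1]);           % zeroth-order terms
total = single([1 1]);          % running sums
for i = 1:n
    term = term .* xs / i;      % x^i/i! built from the previous term
    total = total + term;
end
exact = exp(-10);               % double-precision reference value
direct = double(total(2));      % series summed directly at x = -10
recip = 1/double(total(1));     % 1/e^10 from the series at x = +10
relerr_direct = abs(direct - exact)/exact;
relerr_recip = abs(recip - exact)/exact;
fprintf('direct: %g (rel. error %g)\n', direct, relerr_direct);
fprintf('1/e^10: %g (rel. error %g)\n', recip, relerr_recip);
```

For x = -10 the alternating terms reach magnitudes in the thousands, so their round-off swamps a true value near 4.5 x 10^-5. The series at x = +10 has no cancellation, so its reciprocal is far more accurate.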

Round-off Errors

Computers can represent numbers to a finite precision

Most important for real numbers: integer math can be exact, but limited

How do computers represent numbers?

Binary representation of the integers and real numbers in computer memory

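32 bits (23 digits, 8 signed exponent, 1 sign), $2^8 = 256$

$$\text{largest: } (0.111\cdots11)_2 \times 2^{127} \approx 1.7014 \times 10^{38}$$

$$\text{smallest: } (0.100\cdots00)_2 \times 2^{-128} \approx 0.14693 \times 10^{-38}$$

64 bits (52 digits, 11 signed exponent, 1 sign), $2^{11} = 2048$

$$\text{largest: } (0.111\cdots11)_2 \times 2^{1023}$$

$$\text{smallest: } (0.100\cdots00)_2 \times 2^{-1024}$$

MATLAB uses double precision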
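The range limits that MATLAB actually uses can be queried with the built-in functions `realmax` and `realmin`. The sketch below prints them for both precisions. MATLAB follows the IEEE 754 format, so the numbers differ somewhat from the simplified bit-layout model on the slide.

```matlab
% Largest and smallest positive normalized values MATLAB can store.
big_d = realmax;               % double precision, about 1.7977e308
small_d = realmin;             % double precision, about 2.2251e-308
big_s = realmax('single');     % single precision, about 3.4028e38
small_s = realmin('single');   % single precision, about 1.1755e-38
fprintf('double: %g to %g\n', small_d, big_d);
fprintf('single: %g to %g\n', small_s, big_s);
overflowed = 2*realmax;        % beyond the representable range: becomes Inf
disp(overflowed)
```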

Order of operation

Addition problem:

$$0.99 + 0.0044 + 0.0042 = 0.9986 \quad \text{(exact result)}$$

with 3-digit arithmetic:

$$(0.99 + 0.0044) + 0.0042 = 0.994 + 0.0042 = 0.998$$

$$0.99 + (0.0044 + 0.0042) = 0.99 + 0.0086 = 0.999$$

Round-off error
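The 3-digit addition example can be imitated in MATLAB by rounding after every operation. In the sketch below, `fl3` is a hypothetical helper (not a MATLAB built-in) that rounds a nonzero value to three significant digits.

```matlab
% fl3 rounds a nonzero value to 3 significant digits (hypothetical helper).
fl3 = @(v) round(v .* 10.^(2 - floor(log10(abs(v))))) ./ 10.^(2 - floor(log10(abs(v))));
a = 0.99; b = 0.0044; c = 0.0042;
left = fl3(fl3(a + b) + c);     % (a + b) + c, rounding after each addition
right = fl3(a + fl3(b + c));    % a + (b + c), small numbers added first
exact = a + b + c;
fprintf('(a+b)+c = %.3f, a+(b+c) = %.3f, exact = %.4f\n', left, right, exact);
```

Adding the two small numbers first keeps digits that would otherwise be rounded away one at a time against the large number.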

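Cancellation error

$$x^2 - bx + 1 = 0, \qquad r = \sqrt{b^2 - 4}$$

$$x_1 = \frac{b + r}{2}, \qquad x_2 = \frac{b - r}{2}$$

If b is large, r is close to b

Difference of two numbers very close to each other: potential for greater error!

Rationalize:

$$x_2 = \frac{b - r}{2} = \frac{b - r}{2} \cdot \frac{b + r}{b + r} = \frac{b^2 - r^2}{2(b + r)} = \frac{4}{2(b + r)} = \frac{2}{b + r}$$

Try b = 97: $x^2 - 97x + 1 = 0$ (r = 96.9794)

x2 (3 sig. figs.), corresponding to "cancellation, critical arithmetic":

exact: 0.01031

standard: 0.01050

rationalized: 0.01031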
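The same comparison can be sketched in MATLAB for the quadratic x^2 - bx + 1 = 0 with b = 97. Single precision is used here as a stand-in for the slide's limited-digit arithmetic (an assumption), and a double-precision evaluation of the rationalized form serves as the reference.

```matlab
% Smaller root of x^2 - b*x + 1 = 0 computed two ways in single precision.
b = single(97);
r = sqrt(b^2 - 4);
x2_std = (b - r)/2;             % standard formula: subtracts nearly equal numbers
x2_rat = 2/(b + r);             % rationalized formula: no subtraction
bd = 97;                        % double-precision reference
x2_ref = 2/(bd + sqrt(bd^2 - 4));
err_std = abs(double(x2_std) - x2_ref)/x2_ref;
err_rat = abs(double(x2_rat) - x2_ref)/x2_ref;
fprintf('standard:     %.8f (rel. error %g)\n', x2_std, err_std);
fprintf('rationalized: %.8f (rel. error %g)\n', x2_rat, err_rat);
```

The rounding error in r is tiny relative to r itself, but becomes large relative to the small difference b - r, which is why the standard formula loses digits.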

Significant Figures

48.9 mph? 48.95 mph?

Significant Digits

The places which can be used with confidence

32-bit machine: 7 significant digits

64-bit machine: 17 significant digits

Double precision: reduces round-off error, but increases CPU time

$$\pi = 3.14159265358979323846264\ldots$$

$$\sqrt{2} = 1.41421356237309\ldots$$

$$e = 2.71828182845904\ldots$$

3.25/1.96 = 1.65816326530612... (from MATLAB)

But in practice only report 1.65 (chopping) or 1.66 (rounding)! Why? Because we don't know what is beyond the second decimal place.

False Significant Figures

Rounding:

3.245/1.964 = 1.65224032586558...

3.254/1.955 = 1.66445012787724...

Chopping:

3.250/1.969 = 1.65058405281869...

3.259/1.960 = 1.66275510204082...
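The four quotients above are bounds on 3.25/1.96 when each operand is known only to three significant figures. The sketch below recomputes them, assuming a rounded value may lie up to 0.005 on either side of the reported one and a chopped value up to 0.009 above it.

```matlab
% Range of possible values of 3.25/1.96 given 3-significant-figure inputs.
q = 3.25/1.96;
lo_round = 3.245/1.964;  hi_round = 3.254/1.955;   % inputs were rounded
lo_chop = 3.250/1.969;   hi_chop = 3.259/1.960;    % inputs were chopped
fprintf('reported quotient : %.14f\n', q);
fprintf('rounding bounds   : %.5f to %.5f\n', lo_round, hi_round);
fprintf('chopping bounds   : %.5f to %.5f\n', lo_chop, hi_chop);
```

In both cases the bounds already disagree in the second decimal place, so the remaining digits MATLAB prints carry no information.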

Accuracy and precision

Accuracy - How closely a measured or computed value agrees with the true value

Precision - How closely individual measured or computed values agree with each other

Accuracy is getting all your shots near the target. Precision is getting them close together.

[Figure: target diagrams arranged along two axes, "More Accurate" and "More Precise"]

Numerical Errors

The difference between the true value and the approximation

True value = approximation + true error

$$E_t = \text{true value} - \text{approximation} = x^* - x$$

$$\text{Relative error} = \frac{\text{true error}}{\text{true value}} = \frac{x^* - x}{x^*}$$

or in percent

$$\varepsilon_t = \frac{x^* - x}{x^*} \times 100\%$$

Approximate Error

But the true value is not known

If we knew it, we wouldn't have a problem

Use approximate error

$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximation}} \times 100\% = \frac{\text{present approx.} - \text{previous approx.}}{\text{present approximation}} \times 100\%$$

$$\text{Relative error} = \frac{x_{new} - x_{old}}{x_{new}} \times 100\%$$
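The approximate error is typically used as a stopping test. The sketch below applies it to the exponential series: terms are added until the approximate percent relative error falls below a tolerance `es`. The values of `x` and `es` are assumptions chosen for illustration.

```matlab
% Add series terms for exp(x) until the approximate error is below es.
x = 0.5;                 % evaluation point (assumed)
es = 1e-6;               % stopping tolerance in percent (assumed)
term = 1; total = 1;     % zeroth-order term and running sum
i = 0; ea = 100;
while ea > es
    i = i + 1;
    old = total;
    term = term*x/i;                      % next term x^i/i!
    total = total + term;
    ea = abs((total - old)/total)*100;    % approximate percent relative error
end
et = abs((exp(x) - total)/exp(x))*100;    % true percent relative error
fprintf('terms added: %d, ea = %g%%, et = %g%%\n', i, ea, et);
```

Here the true value is available from `exp`, so `et` can be printed alongside `ea` to check that the stopping test is a reasonable proxy.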

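Number Systems

Base-10 (Decimal): 0,1,2,3,4,5,6,7,8,9

Base-8 (Octal): 0,1,2,3,4,5,6,7

Base-2 (Binary): 0,1 (off/on, close/open, negative/positive charge)

Other non-decimal systems: 1 lb = 16 oz, 1 ft = 12 in, ½", ¼", ...

Decimal System (base 10)

$$5{,}129 = 5 \times 10^3 + 1 \times 10^2 + 2 \times 10^1 + 9 \times 10^0$$

$$0.3125 = 3 \times 10^{-1} + 1 \times 10^{-2} + 2 \times 10^{-3} + 5 \times 10^{-4}$$

Binary System (base 2)

$$101101 = 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 45$$

$$0.1011 = 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4} = \frac{11}{16}$$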
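The two binary examples, (101101) in base 2 and (0.1011) in base 2, can be checked in MATLAB. The sketch uses the built-in `bin2dec` and `dec2bin` for the integer and an explicit sum of place values for the fraction.

```matlab
% Convert the slide's binary examples to decimal.
n_int = bin2dec('101101');             % integer part: place values 2^5 ... 2^0
bits = [1 0 1 1];                      % digits of the binary fraction 0.1011
n_frac = sum(bits .* 2.^(-(1:4)));     % place values 2^-1 ... 2^-4
back = dec2bin(45);                    % decimal back to a binary string
fprintf('101101 -> %d, 0.1011 -> %g, 45 -> %s\n', n_int, n_frac, back);
```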

Integer Representation

Signed magnitude method

Use the first bit of a word to indicate the sign: 0 negative (off), 1 positive (on)

Remaining bits are used to store a number

[Figure: word layout, "Sign" bit (+) followed by "Number" bits 1 0 1 0 0 1 0 1 1 0; off / on, close / open, negative / positive]

8-bit word

Sign | Number, with place values 2^6 2^5 2^4 2^3 2^2 2^1 2^0

$$\text{largest number: } (1111111)_{\text{base }2} = (127)_{\text{base }10}$$

$$\text{smallest number: } (0000000)_{\text{base }2} = (0)_{\text{base }10}$$

+/- 0000000 are the same, therefore we may use "-0" to represent "-128"

Total numbers = 2^8 = 256 (-128 to 127)

Integer Representation

16-bit word

Range: -32,768 to 32,767

$$1 \times 2^{14} + 1 \times 2^{13} + \cdots + 1 \times 2^{1} + 1 \times 2^{0} = 32{,}767$$

Overflow: > 32,767 (cannot represent 43,000 A&M students)

Underflow: < -32,768 (magnitude too large)

32-bit word

Range: -2,147,483,648 to 2,147,483,647

9 significant digits

Overflow: world population 6 billion

Underflow: budget deficit -$100 billion
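MATLAB's integer classes make these limits easy to inspect. The sketch below queries them with `intmax` and `intmin` and tries a few out-of-range operations. MATLAB's integer arithmetic saturates at the limit rather than wrapping around, and its `/` operator rounds to the nearest integer, so `idivide` is used to get truncating division.

```matlab
% Integer ranges and out-of-range behavior for MATLAB integer classes.
top16 = intmax('int16');               % 32767
bot16 = intmin('int16');               % -32768
sat_hi = top16 + int16(1);             % saturates at 32767
sum8 = int8(123) + int8(45);           % 168 does not fit in 8 bits: saturates at 127
prod8 = int8(-74) * int8(2);           % -148 does not fit: saturates at -128
q = idivide(int32(7), int32(2), 'floor');   % integer division: 7/2 -> 3
fprintf('int16 range: %d to %d\n', bot16, top16);
fprintf('123+45 -> %d, -74*2 -> %d, 7/2 -> %d\n', sum8, prod8, q);
```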

Integer Operations

Integer arithmetic can be exact as long as you don't

get remainders in division: 7/2 = 3 in integer math

or overflow the maximum integer

For an 8-bit computer max = 127 (or min = -128)

So 123 + 45 = overflow and -74 * 2 = underflow

Floating-Point Representation

Real numbers (also called floating-point numbers) are represented differently

For fractions or very large numbers

Store as: sign | signed exponent | mantissa

sign is 1 or 0 for negative or positive

exponent is the power (positive or negative) of the base

mantissa contains the significant digits

Floating-Point Representation

$$N = \pm\, 0.d_1 d_2 d_3 \cdots d_p \times B^e = m B^e$$

m: mantissa

B: base of the number system

e: "signed" exponent

Note: the mantissa is usually "normalized" if the leading digit is zero

Word layout: sign of number | signed exponent ($e_1 e_2 \cdots e_m$) | mantissa ($d_1 d_2 d_3 \cdots d_p$)

[Figure: 8-bit word layouts for integer representation and floating-point number representation]

Decimal Representation

8-bit word: sign | signed exponent | number

Place values: ± | ±, 10^1, 10^0 | 10^-1, 10^-2, 10^-3, 10^-4

1|095|1467 (base: B = 10)

mantissa: m = -(1*10^-1 + 4*10^-2 + 6*10^-3 + 7*10^-4) = -0.1467

signed exponent: e = +(9*10^1 + 5*10^0) = 95

$$(10951467)_{\text{base }10} = m B^e = -0.1467 \times 10^{95}$$

Floating-Point Representation

8-bit word (without normalization)

Place values: ± | ±, 2^1, 2^0 | 2^-1, 2^-2, 2^-3, 2^-4

0|111|0101 (base: B = 2)

mantissa: m = +(0*2^-1 + 1*2^-2 + 0*2^-3 + 1*2^-4) = 5/16

signed exponent: e = -(1*2^1 + 1*2^0) = -3

$$(01110101)_{\text{base }2} = m B^e = (5/16) \times 2^{-3} = 5/128$$

Normalization

Remove the leading zero by lowering the exponent (d1 = 1 for all numbers)

if m < 1/2, multiply by 2 to remove the leading 0

floating-point allows fractions and very large numbers to be represented, but takes up more memory and CPU time

$$1\ \text{in}^2 = (1/144)\ \text{ft}^2 = 0.006944\ \text{ft}^2 \quad \text{(less accurate)}$$

$$1\ \text{in}^2 = 0.694444 \times 10^{-2}\ \text{ft}^2 \quad \text{(normalization)}$$

$$\frac{1}{B} \le m < 1$$

base 10: $\frac{1}{10} \le m < 1$, i.e. $0.1 \le m < 1$

base 2: $\frac{1}{2} \le m < 1$

Binary Representation

8-bit word (with normalization)

Place values: ± | ±, 2^1, 2^0 | 2^-1, 2^-2, 2^-3, 2^-4

1|011|1001 (base: B = 2)

mantissa: m = -(1*2^-1 + 0*2^-2 + 0*2^-3 + 1*2^-4) = -9/16

signed exponent: e = +(1*2^1 + 1*2^0) = 3

$$(10111001)_{\text{base }2} = m B^e = -(9/16) \times 2^{3} = -9/2$$
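The two 8-bit examples, 0|111|0101 and 1|011|1001, can be decoded mechanically. The sketch below assumes the layout used on the slides: bit 1 is the sign of the number, bit 2 the sign of the exponent (1 means negative in both cases), bits 3-4 the exponent magnitude, and bits 5-8 the mantissa digits. It is a toy format for illustration, not how MATLAB stores numbers.

```matlab
% Decode the slides' toy 8-bit floating-point words.
words = {'01110101', '10111001'};
vals = zeros(1, numel(words));
for k = 1:numel(words)
    b = words{k} - '0';                   % characters '0'/'1' to numbers 0/1
    m = sum(b(5:8) .* 2.^(-(1:4)));       % mantissa from place values 2^-1..2^-4
    e = b(3)*2 + b(4);                    % exponent magnitude
    if b(2) == 1, e = -e; end             % sign bit of the exponent
    if b(1) == 1, m = -m; end             % sign bit of the number
    vals(k) = m * 2^e;
end
disp(vals)
```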

Single Precision

A real variable (number) is stored in four words, or 32 bits (64 bits for supercomputers)

bit (binary digit): 0 or 1

byte: 8 bits, 2^8 = 256 possible values

word: taken here as 1 byte = 8 bits, 2^8 = 256 possible values

32 bits: 23 for the digits, 8 for the signed exponent, 1 for the sign

$$\text{largest: } (.111\cdots11)_2 \times 2^{128} = 0.340280 \times 10^{39}$$

$$\text{smallest: } (.100\cdots00)_2 \times 2^{-127} = 0.29387 \times 10^{-38}$$

Double Precision

A real variable is stored in eight words, or 64 bits

16 words, 128 bits for supercomputers

64 bits: 52 for the digits, 11 for the signed exponent, 1 for the sign

signed exponent: 2^10 = 1024

$$\text{largest: } (.111\cdots11)_2 \times 2^{1024}$$

$$\text{smallest: } (.100\cdots00)_2 \times 2^{-1023}$$
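The 1 + 11 + 52 bit split can be seen in an actual MATLAB double by inspecting its bit pattern with `num2hex`. Note the assumptions: MATLAB follows IEEE 754, which stores the exponent with a bias of 1023 and keeps an implicit leading 1 in the mantissa, so the decoding differs in detail from the simplified model on the slide. The value -4.5 is an arbitrary example.

```matlab
% Split a double into its sign, exponent, and fraction fields (IEEE 754).
h = num2hex(-4.5);                 % 16 hex digits = 64 bits
top = hex2dec(h(1:3));             % first 12 bits: sign bit + 11 exponent bits
s = floor(top/2048);               % sign bit (1 means negative)
e = mod(top, 2048) - 1023;         % exponent with the bias removed
f = hex2dec(h(4:16))/2^52;         % 52 fraction bits as a value in [0, 1)
value = (-1)^s * (1 + f) * 2^e;    % reassemble the number
fprintf('%s -> sign %d, exponent %d, fraction %g, value %g\n', h, s, e, f, value);
```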

Round-off Errors

Floating point characteristics contribute to round-off error (limited bits for storage)

Limited range of quantities can be represented

A finite number of quantities can be represented

The interval between numbers increases as the numbers grow

Example - three significant digits

0.0100 0.0101 0.0102 …… 0.0999 (0.0001 increment)

0.100 0.101 0.102 ……. 0.999 (0.001 increment)

1.00 1.01 1.02 ……. 9.99 (0.01 increment)
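The same growth in spacing can be observed for MATLAB doubles with the built-in `eps(x)`, which returns the gap between `x` and the next larger representable number. The sample magnitudes below are arbitrary.

```matlab
% Spacing between adjacent double-precision numbers at several magnitudes.
gap_1 = eps(1);                    % 2^-52
gap_1k = eps(1024);                % 2^-42: larger numbers, larger gaps
gap_big = eps(1e16);               % the gap has grown to 2
absorbed = (1e16 + 1) == 1e16;     % adding 1 is lost entirely
fprintf('eps(1) = %g, eps(1024) = %g, eps(1e16) = %g\n', gap_1, gap_1k, gap_big);
fprintf('1e16 + 1 == 1e16 ? %d\n', absorbed);
```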

MATLAB

Finite number of real quantities (integers, real numbers or text) can be represented

For 8-bit, 2^8 = 256 quantities

For 16-bit, 2^16 = 65536 quantities

MATLAB uses double precision: 8 bytes = 64 bits, more than 10^19 (2^64) quantities
