Computational Methods in Applied Sciences I
University of Wyoming MA 5310Fall, 2008
Professor Craig C. Douglas
http://www.mgnet.org/~douglas/Classes/na-sc/notes/2008f.pdf
Course Description: First semester of a three-semester computational methods series. Review of iterative solutions of linear and nonlinear systems of equations, polynomial interpolation/approximation, numerical integration and differentiation, and basic ideas of Monte Carlo methods. Comparison of numerical techniques for programming time and space requirements, as well as convergence and stability.
Prerequisites: Math 3310 and COSC 1010. Identical to COSC 5310, CHE 5140, ME 5140, and CE 5140. (3 hours).
Textbook: George Em Karniadakis and Robert M. Kirby II, Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and Their Implementation, Cambridge University Press, 2003 (with a CD-ROM of software).
Outline
1. Errors
2. Parallel computing basics
3. Solution of linear systems of equations
   a. Matrix algebra review
   b. Gaussian elimination and factorization methods
   c. Iterative methods:
      i. Splitting/relaxation methods: Sx^(i+1) = Tx^(i) + b, x^(0) given
      ii. Krylov space methods
   d. Sparse matrix methods
4. Nonlinear equations
5. Interpolation and approximation
   a. Given {f(x0), f(x1), ..., f(xN+1)}, what is f(x) for x0 <= x <= xN+1, where xi < xi+1?
6. Numerical integration and differentiation
7. Monte Carlo methods
   a. When you do not know how to solve a problem any deterministic way, go for a random walk through your solution space. Good luck.
1. Errors
1. Initial errors
   a. Inaccurate representation of constants (pi, e, etc.)
   b. Inaccurate measurement of data
   c. Overly simplistic model
2. Truncation
   a. From approximate mathematical techniques, e.g.,

      e^x = 1 + x + x^2/2! + ... + x^n/n! + ...,

      so that, truncating at the k-th term,

      e = 1 + 1 + 1/2! + ... + 1/k! + E, where E is the truncation error.
3. Rounding
   a. From the finite number of digits stored in some base
   b. Chopping and symmetric rounding
Error types 1-2 are problem dependent, whereas error type 3 is machine dependent.

Floating Point Arithmetic
We can represent a real number x by

x = +-(0.a1 a2 ... am)_b x b^c,

where 0 <= ai < b and m, b, and the exponent range m <= c <= M are machine dependent, with common bases b of 2, 10, and 16.
IEEE 754 (circa 1985) floating point standard (all of ~6 pages):
Feature          Single precision   Double precision
Bits total       32                 64
Sign bits        1                  1
Mantissa bits    23                 52
Exponent bits    8                  11
Exponent range   [-44, 38]          [-323, 308]
Decimal digits   9                  16
Conversion between bases is simple for integers, but is really tricky for real numbers. For example, given r in base 10, its equivalent in base 16, (r)10 -> (r)16, is derived by computing the digits in

r = a0*16^0 + a1*16^1 + a2*16^2 + ... + a_-1*16^-1 + a_-2*16^-2 + ...

Consider r1 = 1/10. Multiplying the remaining fraction by 16 at each step,

16 r1 = 1.6 = a_-1 + a_-2/16 + a_-3/16^2 + ..., so a_-1 = 1,
16 r2 = 9.6 = a_-2 + a_-3/16 + a_-4/16^2 + ..., so a_-2 = 9,

and the remainder 0.6 now repeats forever. Hence, (.1)10 = (.19999...)16: a number with m digits in one base may not have a terminating representation in another base. It is not just irrationals that are a problem (e.g., (.1)10 has no terminating representation in base 2, either).
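The digit-by-digit conversion above can be sketched as code. This is a minimal illustration (the function name is mine, not from the notes); exact rational arithmetic is used so the repeating digits are visible.

```python
# Sketch of base-10 -> base-16 fraction conversion by repeated
# multiplication by 16; the integer part at each step is the next digit.
from fractions import Fraction

def hex_fraction_digits(r, ndigits):
    """Return the first ndigits base-16 digits of the fraction 0 <= r < 1."""
    digits = []
    for _ in range(ndigits):
        r *= 16
        d = int(r)      # integer part is the next hex digit
        digits.append(d)
        r -= d          # keep the fractional remainder
    return digits

# 1/10 in base 16 is 0.19999... (repeating)
print(hex_fraction_digits(Fraction(1, 10), 6))  # [1, 9, 9, 9, 9, 9]
```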
Consider r = .115 with b = 10 and m = 2. Then

r = .11 with chopping,
r = .12 with symmetric rounding (add half a unit in the last place, r + .5*b^(c-m), and then chop).
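The two rounding rules can be demonstrated with Python's decimal module standing in for m-digit base-10 arithmetic (the function names are mine):

```python
# Chopping vs. symmetric rounding to m significant decimal digits,
# using the decimal module's context rounding modes.
from decimal import Decimal, localcontext, ROUND_DOWN, ROUND_HALF_UP

def chop(r, m):
    """Keep m significant digits of r by chopping (round toward zero)."""
    with localcontext() as ctx:
        ctx.prec = m
        ctx.rounding = ROUND_DOWN
        return +Decimal(r)   # unary + applies the context's precision

def sym_round(r, m):
    """Symmetric rounding to m significant digits."""
    with localcontext() as ctx:
        ctx.prec = m
        ctx.rounding = ROUND_HALF_UP
        return +Decimal(r)

print(chop("0.115", 2))       # 0.11
print(sym_round("0.115", 2))  # 0.12
```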
Most computers chop instead of round off. IEEE compliant CPUs can do both and there may be a system call to switch, which is usually not user accessible.
Note: When the rounding changes, almost all nontrivial codes break.
Warning: On all common computers, none of the standard floating point arithmetic operators are associative. Thanks to round off, chains of operations that would give identical results in exact arithmetic can give different results when reordered. (What a deal!)
Let’s take a look, one operator at a time.
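Non-associativity of addition is easy to see in IEEE double precision; this is an illustrative example (the particular values are mine):

```python
# Floating point addition is commutative but not associative:
# the grouping determines which rounding errors occur.
a, b, c = 1.0, 1e-16, -1.0

left = (a + b) + c    # 1.0 + 1e-16 rounds back to 1.0, then -1.0 gives 0.0
right = a + (b + c)   # b + c is computed first, and its effect survives

print(left, right)    # 0.0 vs. roughly 1.1e-16
assert left != right
```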
Let fl(x) denote the stored floating point value of x and define e(x) = fl(x) − x in the arithmetic operations that follow in the remainder of this section.
Addition:
fl(x) + fl(y) = (x + e(x)) + (y + e(y)) = (x + y) + (e(x) + e(y)),

so the stored sum differs from the exact x + y in general; it is fun to construct an example.

In addition, the sum can overflow (rounds off to +-infinity) or underflow (rounds off to zero) even though the number in infinite precision is neither. Overflow is a major error, but underflow usually is not a big deal.
Warning: The people who defined IEEE arithmetic assumed that 0 is a signed number, thus violating a basic mathematical definition of the number system. Hence, on IEEE compliant CPUs, there is both +0 and -0 (but no signless 0), which are different numbers in floating point. This seriously disrupts comparisons with 0. The programming fix is to compare abs(expression) with 0, which is computationally ridiculous and inefficient.
Decimal shifting can lead to errors.
Example: Consider b = 10 and m = 4. Then given x1 = 0.5055x10^4 and x2 = ... = x11 = 0.4000x10^0, we have

x1 + x2 = 0.50554x10^4 ~= 0.5055x10^4 = x1.

Even worse,

(...((x1 + x2) + x3) + ...) + x11 = x1, but
(...((x11 + x10) + x9) + ...) + x1 = 0.5059x10^4.
Rule of thumb: Sort the numbers by positive, negative, and zero values based on their absolute values. Add them up in ascending order inside each category. Then combine the numbers.
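The same effect shows up in IEEE double precision; the sketch below mirrors the slide's example with a large x1 and ten small terms (values chosen by me for illustration):

```python
# One large number plus many small ones: adding the big number first
# rounds each small term away; accumulating the small terms first keeps them.
xs = [1e16] + [1.0] * 10

fwd = 0.0
for x in xs:             # descending order: each +1.0 is rounded away
    fwd += x

rev = 0.0
for x in reversed(xs):   # ascending order: the 1.0s accumulate to 10.0 first
    rev += x

print(fwd, rev)          # fwd loses all ten of the small terms
assert rev - fwd == 10.0
```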
Subtraction:
fl(x) − fl(y) = (x + e(x)) − (y + e(y)) = (x − y) + (e(x) − e(y)).
If x and y are close there is a loss of significant digits.
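A standard illustration of this loss of significance (the particular expression is my choice, not from the notes): for large x, sqrt(x^2 + 1) − x subtracts two nearly equal numbers, while the algebraically identical form 1/(sqrt(x^2 + 1) + x) avoids the subtraction.

```python
# Catastrophic cancellation vs. a rearrangement that avoids it.
import math

x = 1e8
naive = math.sqrt(x * x + 1.0) - x             # nearly equal numbers cancel
stable = 1.0 / (math.sqrt(x * x + 1.0) + x)    # same quantity, no subtraction

print(naive, stable)   # naive is 0.0; stable is close to the true 5e-9
```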
Multiplication:
fl(x)·fl(y) = (x + e(x))(y + e(y)) ~= x·y + x e(y) + y e(x).
Note that the e(x)e(y) term is not present above. Why?
Division:
fl(x)/fl(y) = (x + e(x))/(y + e(y)) = ((x + e(x))/y)·(1/(1 + e(y)/y)) ~= x/y + e(x)/y − x e(y)/y^2,

where we used 1/(1 + r) = 1 − r + r^2 − r^3 + ...

A y sufficiently close to 0 can be utterly and completely disastrous in terms of rounding error.
2. Parallel Computing Basics
Assume there are p processors numbered 0 to p−1 and labeled P0, ..., Pp−1. Communication between the processors uses one or more high speed, high bandwidth switches.
In the old days, various topologies were used, none of which scaled to more than a modest number of processors. The Internet model saved parallel computing.
Today parallel computers come in several flavors (hybrids, too):
- Small shared memory machines (SMPs)
- Small clusters of PCs
- Blade servers (in one or more racks)
- Forests of racks
- GRID or Cloud computing
Google operates the world’s largest Cloud/GRID system with an estimated 50 Petaflops total.
Data needs to be distributed sensibly among the p processors. Where the data needs to be can change, depending on the operation, and communication is usual. Algorithms that essentially never need to communicate are known as embarrassingly parallel. These algorithms scale wonderfully and are frequently used as examples of how well so and so’s parallel system scales. Most applications are not in this category, unfortunately.
To do parallel programming, you need only a few functions to get by:
- Initialize the environment and find out the processor number i.
- Finalize or end parallel processing on one or all processors.
- Send data to one, a set, or all processors.
- Receive data from one, a set, or all processors.
- Cooperative operations on all processors (e.g., sum of a distributed vector).
Everything else is a bonus. Almost all of MPI is designed for compiler writers and operating systems developers. Only a small subset is expected to be used by regular people.
3. Solution of Linear Systems of Equations
3a. Matrix Algebra Review
Let R = (rij) be mxn and S = (sij) be nxp. Then T = RS = (tij) is mxp with

tij = sum_{k=1}^{n} rik skj.

SR exists if and only if m = p, and SR != RS normally.

Q = (qij) = R + S = (rij + sij) exists if and only if R and S have the same dimensions (m = n and n = p).

Transpose: for R = (rij), R^T = (rji), which is nxm.
Inner product: for x,y n-vectors, (x,y) = xTy and (Ax,y) = (Ax)Ty.
Matrix-Matrix Multiplication (an aside)
for i = 1,M do
   for j = 1,M do
      for k = 1,M do
         A(i,j) = A(i,j) + B(i,k)*C(k,j)

or the blocked form

for i = 1,M, step by s, do
   for j = 1,M, step by s, do
      for k = 1,M, step by s, do
         for l = i, i+s−1 do
            for m = j, j+s−1 do
               for n = k, k+s−1 do
                  A(l,m) = A(l,m) + B(l,n)*C(n,m)

(A is assumed initialized to zero.)
If you pick the block size right, it runs 2X+ faster than the standard form.
Why does the blocked form work so much better? If you pick s correctly, the blocks fit in cache and only have to be moved into cache once, with each block reused many times. Arithmetic is no longer the limiting factor in run times for numerical algorithms; memory cache misses are.
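The two loop orderings can be checked against each other. This is a toy sketch (pure Python, so the cache effect itself is not visible; it only confirms that the blocked loops compute the same product, assuming s divides M):

```python
# Standard triple loop vs. the blocked form; both accumulate the same
# products in the same k order, so the results agree exactly.
def matmul(B, C, M):
    A = [[0.0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            for k in range(M):
                A[i][j] += B[i][k] * C[k][j]
    return A

def matmul_blocked(B, C, M, s):
    """Blocked form; assumes s divides M for simplicity."""
    A = [[0.0] * M for _ in range(M)]
    for i in range(0, M, s):
        for j in range(0, M, s):
            for k in range(0, M, s):
                for l in range(i, i + s):
                    for m in range(j, j + s):
                        for n in range(k, k + s):
                            A[l][m] += B[l][n] * C[n][m]
    return A

M, s = 8, 4
B = [[float(i + j) for j in range(M)] for i in range(M)]
C = [[float(i * j % 5) for j in range(M)] for i in range(M)]
assert matmul(B, C, M) == matmul_blocked(B, C, M, s)
```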
An even better way of multiplying matrices is a Strassen style algorithm (the Winograd variant is the fastest in practical usage).
Continuing basic definitions…
If x = (xi) is an n-vector (i.e., an nx1 matrix), then

diag(x) = [ x1             ]
          [     x2         ]
          [         ...    ]
          [             xn ]

(the diagonal matrix with the xi on the main diagonal and zeroes elsewhere).

Let ei be the n-vector with all zeroes except the ith component, which is 1. Then

I = [ e1, e2, ..., en ]

is the nxn identity matrix. Further, if A is nxn, then IA = AI = A.
The nxn matrix A is said to be nonsingular if there exists a unique x such that Ax = b for every b.

Tests for nonsingularity:
- Let 0n be the zero vector of length n. A is nonsingular if and only if 0n is the only solution of Ax = 0n.
- A is nonsingular if and only if det(A) != 0.

Lemma: There exists a unique A^-1 such that A^-1 A = A A^-1 = I if and only if A is nonsingular.
Proof (uniqueness): Suppose there is a C with C != A^-1 but CA = AC = I. Then C = IC = (A^-1 A)C = A^-1 (AC) = A^-1 I = A^-1, a contradiction.
Diagonal matrices:

D = [ a          ]
    [    b       ]
    [       c    ]
    [          d ]

Triangular matrices: upper

U = [ x x x x ]
    [   x x x ]
    [     x x ]
    [       x ]

strictly upper

U = [ 0 x x x ]
    [   0 x x ]
    [     0 x ]
    [       0 ]

lower

L = [ x       ]
    [ x x     ]
    [ x x x   ]
    [ x x x x ]

strictly lower

L = [ 0       ]
    [ x 0     ]
    [ x x 0   ]
    [ x x x 0 ]
3b. Gaussian elimination
Solve Ux = b, with U upper triangular, real, and nonsingular:

xn = bn/ann  and  xn−1 = (bn−1 − an−1,n xn)/an−1,n−1.

If we define sum_{j>n} aij xj = 0, then the formal algorithm is

xi = (aii)^-1 (bi − sum_{j=i+1}^{n} aij xj), i = n, n−1, ..., 1.

Solve Lx = b, with L lower triangular, real, and nonsingular, similarly.

Operation count: O(n^2) multiplies
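The back substitution formula above can be sketched as runnable code (the function name and the sample system are mine; the U used happens to be the U factor from the worked example later in this section):

```python
# Back substitution for Ux = b, with U upper triangular and nonsingular.
def back_substitute(U, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):   # i = n, n-1, ..., 1 in the notes' indexing
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

U = [[4.0, 6.0, 1.0],
     [0.0, -2.0, 1.0],
     [0.0, 0.0, 38.0]]
# b chosen so that the exact solution is x = (1, 1, 1)
x = back_substitute(U, [11.0, -1.0, 38.0])
print(x)   # [1.0, 1.0, 1.0]
```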
Tridiagonal Systems
Only three diagonals are nonzero around the main diagonal:

a11 x1 + a12 x2           = f1
a21 x1 + a22 x2 + a23 x3  = f2
   ...
an,n−1 xn−1 + ann xn      = fn

Eliminate xi from the (i+1)-st equation sequentially to get

x1 − p1 x2 = q1
x2 − p2 x3 = q2
   ...
xn = qn,

where
p1 = −a12/a11,  q1 = f1/a11,

pi = −ai,i+1/(ai,i−1 pi−1 + aii),  qi = (fi − ai,i−1 qi−1)/(ai,i−1 pi−1 + aii).
Operation count: 5n-4 multiplies
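The elimination recurrences plus the final back substitution xi = qi + pi xi+1 give the classic tridiagonal solve; a minimal sketch (no pivoting, so it assumes no denominator vanishes; names and the test system are mine):

```python
# Tridiagonal solve following the p_i, q_i recurrences above.
def solve_tridiagonal(sub, diag, sup, f):
    """sub[i] = a_{i,i-1} (sub[0] unused), diag[i] = a_{ii},
    sup[i] = a_{i,i+1} (sup[n-1] unused)."""
    n = len(diag)
    p = [0.0] * n
    q = [0.0] * n
    p[0] = -sup[0] / diag[0]
    q[0] = f[0] / diag[0]
    for i in range(1, n):
        denom = sub[i] * p[i - 1] + diag[i]
        p[i] = -sup[i] / denom if i < n - 1 else 0.0
        q[i] = (f[i] - sub[i] * q[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = q[n - 1]
    for i in range(n - 2, -1, -1):   # from x_i - p_i x_{i+1} = q_i
        x[i] = q[i] + p[i] * x[i + 1]
    return x

# 2x1 + x2 = 3, x1 + 2x2 + x3 = 4, x2 + 2x3 = 3  has solution (1, 1, 1)
print(solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                        [1.0, 1.0, 0.0], [3.0, 4.0, 3.0]))
```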
Parallel Tridiagonal Solver
We use the fact that we can factor an NxN tridiagonal A into LU form, where L and U are lower and upper bidiagonal:

A = LU =

[ a1 c1          ]   [ 1            ] [ d1 u1         ]
[ b2 a2 c2       ] = [ l2 1         ] [    d2 u2      ]
[    b3 a3 ...   ]   [    l3 1      ] [       d3 ...  ]
[        ... ... ]   [       ... ...] [           ... ]

A recurrence relation exists for the dj, lj, and uj, for j = 1, ..., n and k = 2, ..., n:

a1 = d1,
cj = uj,
ak = dk + lk uk−1,
bk = lk dk−1.
Substituting these equations into each other and simplifying yields the recurrence

dj = aj − lj uj−1 = aj − (bj/dj−1) uj−1 = (aj dj−1 − bj cj−1)/dj−1.

We can use this form of dj and bk's equation to get the lj's. The parallel algorithm is based on a fully recursive algorithm using 2x2 matrices defined by

R0 = [ a0  0 ]   and   Rj = [ aj  −bj cj−1 ]   for j = 1, ..., N.
     [ 1   0 ]              [ 1    0       ]
We use the Mobius transformation Tj = Rj·Rj−1 ... R0. Then

dj = [ (1 0) Tj (1 1)^T ] / [ (0 1) Tj (1 1)^T ],

i.e., dj is the ratio of the first to the second component of the vector Tj (1 1)^T.
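A small serial check of this 2x2 product formulation against the direct pivot recurrence dj = aj − (bj/dj−1) cj−1 (coefficient values are mine, chosen only for illustration):

```python
# Verify that the prefix products T_j = R_j ... R_0 reproduce the pivots d_j.
def mat2mul(X, Y):
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

a = [4.0, 4.0, 4.0, 4.0]   # diagonal a_j
b = [0.0, 1.0, 1.0, 1.0]   # subdiagonal b_j (b[0] unused)
c = [1.0, 1.0, 1.0, 0.0]   # superdiagonal c_j (c[-1] unused)

# Direct recurrence for the pivots.
d = [a[0]]
for j in range(1, 4):
    d.append(a[j] - (b[j] / d[j - 1]) * c[j - 1])

# Matrix form: R_0 = [[a_0, 0], [1, 0]], R_j = [[a_j, -b_j c_{j-1}], [1, 0]].
T = [[a[0], 0.0], [1.0, 0.0]]
for j in range(1, 4):
    R = [[a[j], -b[j] * c[j - 1]], [1.0, 0.0]]
    T = mat2mul(R, T)
    num = T[0][0] + T[0][1]   # (1 0) T_j (1 1)^T
    den = T[1][0] + T[1][1]   # (0 1) T_j (1 1)^T
    assert abs(num / den - d[j]) < 1e-12
```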
Example: p = 8 processors, N = 40, and all processors have all of A. We partition the rows of A so that each processor owns 5 of them: processor P0 has rows 0-4, processor P1 has rows 5-9, and so forth.

Parallel algorithm:
1. On each process Pj form the matrices Rk, where k corresponds to the row indices for which the process is responsible and ranges between k_min and k_max.
2. On each process Pj form the matrix Sj = Rk_max Rk_max−1 ... Rk_min.
3. Using the full-recursive-doubling communication pattern as given in Table 1, distribute and combine the Sj matrices as given in Table 2 (see following slides).
4. On each process Pj calculate the local unknown coefficients dk (k_min ≤ k ≤ k_max) using local Rk and matrices obtained from the full recursive doubling.
5. For processes P0 through Pp-2, send the local dk_max to the process one process id up (i.e., P0 sends to P1, P1 sends to P2, ..., Pp-2 sends to Pp-1).
6. On each process Pj calculate the local unknown coefficients lk (k_min ≤ k ≤ k_max) using the local dk values and the value obtained in the previous step.
7. Distribute the dj and lj values across all processes so that each process has all the dj and lj coefficients.
8. On each process Pj perform a local forward and backward substitution to obtain the solution.
The two tables that follow are specific to the example. You can generalize the tables easily.
Stage 1    Stage 2    Stage 3
P0->P1     P0->P2     P0->P4
P1->P2     P1->P3     P1->P5
P2->P3     P2->P4     P2->P6
P3->P4     P3->P5     P3->P7
P4->P5     P4->P6
P5->P6     P5->P7
P6->P7

Table 1: Full-recursive-doubling communication pattern. The number of stages equals log2 p, where p is the number of processes. In this case, p = 8 and there are only three stages of communication.
Process  Stage 0  Stage 1  Stage 2     Stage 3
P0       S0
P1       S1       S1S0
P2       S2       S2S1     S2S1S0
P3       S3       S3S2     S3S2S1S0
P4       S4       S4S3     S4S3S2S1    S4S3S2S1S0
P5       S5       S5S4     S5S4S3S2    S5S4S3S2S1S0
P6       S6       S6S5     S6S5S4S3    S6S5S4S3S2S1S0
P7       S7       S7S6     S7S6S5S4    S7S6S5S4S3S2S1S0

Table 2: Distribution and combination pattern of the Sj matrices for each stage. The interpretation of the table is as follows: given the communication pattern in Table 1, in stage one P0 sends S0 to P1, which P1 combines with its local S1 to form the product S1S0. Similarly in stage one, P1 sends S1 to P2, etc. In stage two, P0 sends S0 to P2, which P2 combines with its local product S2S1 to form S2S1S0. Similarly, P1 sends S1S0 to P3, which is then combined on P3 to form S3S2S1S0. In stage three, the final communications occur such that each process j stores locally the product Sj Sj−1 ... S0.
General Matrix A (nonsingular), solve Ax = f by Gaussian elimination
Produce A(k), f(k), k = 1, ..., n, where A(1) = A, f(1) = f, and for k = 2, 3, ..., n,

aij(k) = { aij(k−1),                                          i <= k−1
         { 0,                                                 i >= k, j <= k−1
         { aij(k−1) − (ai,k−1(k−1)/ak−1,k−1(k−1)) ak−1,j(k−1), i >= k, j >= k

fi(k) = { fi(k−1),                                            i <= k−1
        { fi(k−1) − (ai,k−1(k−1)/ak−1,k−1(k−1)) fk−1(k−1),     i >= k

The 2x2 block form of A(k) is
A(k) = [ U(k)  Â(k) ]
       [ 0     Ã(k) ],

where U(k) is upper triangular.

Theorem 3.1: Let A be such that Gaussian elimination yields nonzero diagonal elements akk(k), k = 1, 2, ..., n. Then A is nonsingular and

det A = a11(1) a22(2) ... ann(n).

Also, A(n) ≡ U is upper triangular and A has the factorization

LU = A,

where L = (mik) is lower triangular with elements
mik = { 0,             i < k
      { 1,             i = k
      { aik(k)/akk(k), i > k

The vector g ≡ f(n) satisfies g = L−1 f.

Proof: Note that once the factorization LU = A is proven, det(A) = det(L)det(U) = det(U) = a11(1) ... ann(n), so the determinant formula follows. Now we prove the factorization. Set LU = (cij). Then (since L and U are triangular and the formula for A(k) holds through k = n)

cij = sum_{k=1}^{n} mik akj(n) = sum_{k=1}^{min(i,j)} mik akj(k).
From the definitions of aij(k) and mik we get

mi,k−1 ak−1,j(k−1) = aij(k−1) − aij(k) for 2 <= k <= i, k <= j,

and recall that aij(1) = aij. Thus, if i <= j, then

cij = sum_{k=1}^{i−1} mik akj(k) + aij(i) = sum_{k=1}^{i−1} (aij(k) − aij(k+1)) + aij(i) = aij.

When i > j, aij(j+1) = 0 and the same telescoping argument again gives cij = aij.

Finally, we prove that g = L−1 f. Let h ≡ Lg. Then

hi = sum_{k=1}^{i} mik gk = sum_{k=1}^{i} mik fk(k).

From the definitions of fi(k), mik, and fi(1) = fi,
the same telescoping argument shows hi = fi, i.e., Lg = f. L nonsingular then gives g = L−1 f, which completes the proof. QED
Examples:
A = A(1) = [  4   6  1 ]          [ 4   6  1 ]          [ 4   6  1 ]
           [  8  10  3 ],  A(2) = [ 0  −2  1 ],  A(3) = [ 0  −2  1 ] = U,
           [ −12 48  2 ]          [ 0  66  5 ]          [ 0   0 38 ]

L = [  1    0  0 ]
    [  2    1  0 ]
    [ −3  −33  1 ]

and

A = A(1) = [  4    6   1 ]          [ 4  −2  1 ]
           [  8   10   3 ],  A(2) = [ 0  −2  1 ] = U,  L = [  1  0  0 ]
           [ −8  −12  −2 ]          [ 0   0  0 ]           [  2  1  0 ]
                                                           [ −2  0  1 ]

(the second A(2) is [4 6 1; 0 −2 1; 0 0 0]).
The correct way to solve Ax = f is to compute L and U first, then solve

Ly = f,  Ux = y.
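The two-stage solve can be sketched with the factors of the first worked example; f is chosen by me so that the exact solution is all ones (function names are illustrative):

```python
# Solve Ax = f via Ly = f (forward), then Ux = y (backward).
def forward_substitute(L, f):
    n = len(f)
    y = [0.0] * n
    for i in range(n):
        y[i] = f[i] - sum(L[i][j] * y[j] for j in range(i))   # l_ii = 1
    return y

def back_substitute(U, y):
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

L = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [-3.0, -33.0, 1.0]]
U = [[4.0, 6.0, 1.0], [0.0, -2.0, 1.0], [0.0, 0.0, 38.0]]

# f = A (1, 1, 1)^T for A = [[4, 6, 1], [8, 10, 3], [-12, 48, 2]]
f = [11.0, 21.0, 38.0]
x = back_substitute(U, forward_substitute(L, f))
print(x)   # [1.0, 1.0, 1.0]
```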
Generalized Gaussian elimination
1. The order of elimination is arbitrary.
2. Set A(1) = A and f(1) = f.
3. Select an arbitrary ai1,j1(1) != 0 as the first pivot element. We can eliminate xj1 from all but the i1-st equation. The multipliers are mk,j1 = ak,j1(1)/ai1,j1(1).
4. The reduced system is now A(2)x = f(2).
5. Select another pivot ai2,j2(2) != 0 and repeat the elimination.
6. If ars(2) = 0 for all r, s, then the remaining equations are degenerate and we halt.
Theorem 3.2: Let A have rank r. Then we can find a sequence of distinct row and column indices (i1,j1), (i2,j2), ..., (ir,jr) such that the corresponding pivot elements in A(1), A(2), ..., A(r) are nonzero and aij(r) = 0 if i != i1, i2, ..., ir. Define permutation matrices (whose columns are unit vectors)

P = [ e(i1), e(i2), ..., e(ir), ..., e(in) ]  and  Q = [ e(j1), e(j2), ..., e(jr), ..., e(jn) ],

where {ik} and {jk} are permutations of {1, 2, ..., n}. Then

By = g

(where B ≡ P^T A Q, y ≡ Q^T x, and g ≡ P^T f) is equivalent to Ax = f and can be reduced to triangular form by Gaussian elimination with the natural ordering.
Proof: Generalized Gaussian elimination alters A ≡ A(1) by forming linear combinations of the rows. Thus, whenever no nonzero pivot can be found, the remaining rows were linearly dependent on the preceding rows. Permutations P and Q rearrange the equations and unknowns such that bvv = aiv,jv, v = 1, 2, ..., n. By the first half of the theorem, the reduced B(r) is triangular, since all of rows r+1, ..., n vanish. QED
Operation Counts
To compute aij(k): (n−k+1)^2 + (n−k+1) multiplies (doing the quotients only once).
To compute fi(k): (n−k+1) multiplies.

Recall that sum_{k=1}^{n} k = n(n+1)/2 and sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6. Hence, there are n(n^2−1)/3 multiplies to triangularize A and n(n−1)/2 multiplies to modify f.

Using the Ly = f and Ux = y approach, computing xi requires (n−i) multiplies plus 1 divide. Hence, only n(n+1)/2 multiplies are required to solve each triangular system.

Lemma: n^3/3 + mn^2 − n/3 operations are required to solve m systems Ax(j) = f(j), j = 1, ..., m, by Gaussian elimination.
Note: To compute A-1 requires n3 operations. In general, n2 operations are required to compute A−1f(j). Thus, to solve m systems requires mn2 operations. Hence, n3+mn2 operations are necessary to solve m systems.
Thus, it is always more efficient to use Gaussian elimination instead of computing the inverse!
We can always compute A-1 by solving Axi=ei, i=1,2,…,n and then the xi’s are the columns of A-1.
Theorem 3.3: If A is nonsingular, then there exists a permutation matrix P such that PA = LU is possible, where P is only a permutation of the rows. In fact, P may be found such that |lkk| >= |lik| for i > k, k = 1, 2, ..., n−1.
Theorem 3.4: Suppose A is symmetric. If A = LU is possible, then the choice lkk = ukk implies lik = uki. Hence, U = L^T.
Variants of Gaussian elimination
LDU factorization: L and U are unit lower and upper triangular (ones on the diagonal) and D is diagonal.
Cholesky: A=AT, so factor A=LLT.
Fun example: A = [ 0 1 ]
                 [ 1 0 ]

is symmetric, but cannot be factored into LU form.

Definition: A is positive definite if x^T A x > 0 for all x with x^T x > 0.
Theorem 3.5 (Cholesky Method): Let A be symmetric, positive definite. Then A can be factored in the form A=LLT.
Operation counts: To find L and g = L^-1 f takes n^3/6 + n^2 − n/6 operations plus n square roots.

The solve with U = L^T takes (n^2 + n)/2 operations.

The total is n^3/6 + (3/2)n^2 + n/3 operations plus n square roots.
Parallel LU Decomposition
There are 6 convenient ways of writing the factorization step of the nn A in LU decomposition. The two most common are as follows:
kij loop: A by row (daxpy)

for k = 1, n−1
   for i = k+1, n
      lik = aik/akk
      for j = k+1, n
         aij = aij − lik akj
      endfor
   endfor
endfor

kji loop: A by column (daxpy)

for k = 1, n−1
   for p = k+1, n
      lpk = apk/akk
   endfor
   for j = k+1, n
      for i = k+1, n
         aij = aij − lik akj
      endfor
   endfor
endfor
It is frequently convenient to store A by rows in the computer.
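The kij ordering can be sketched as runnable code. Applied to the 3x3 matrix from the Gaussian elimination examples, it stores the multipliers below the diagonal and U on and above it (function name is mine; no pivoting, so all pivots are assumed nonzero):

```python
# In-place kij-ordered LU factorization sketch (no pivoting).
def lu_kij(A):
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]          # store multiplier l_ik below the diagonal
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A   # U on/above the diagonal, multipliers of L below

A = [[4.0, 6.0, 1.0], [8.0, 10.0, 3.0], [-12.0, 48.0, 2.0]]
lu_kij(A)
print(A)  # [[4, 6, 1], [2, -2, 1], [-3, -33, 38]] as floats
```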
Suppose there are n processors Pi, with one row of A stored on each Pi. Using the kji access method, the factorization algorithm is
for i = 1, n−1
   Send row i (which includes the pivot aii) to processors Pk, k = i+1, ..., n
   In parallel on each processor Pk, k = i+1, ..., n, do the daxpy update to row k
endfor
Note that in step i, after Pi sends aii to other processors that the first i processors are idle for the rest of the calculation. This is highly inefficient if this is the only thing the parallel computer is doing.
A column oriented version is very similar.
We can overlap communication with computing to hide some of the expenses of communication. This still does not address the processor dropout issue. We can do a lot better yet.
Improvements to aid parallel efficiency:
1. Store multiple rows (columns) on a processor. This assumes that there are p processors with p < n. While it is helpful to have mod(n,p) = 0, it is unnecessary (it just complicates the implementation slightly).
2. Store multiple blocks of rows (columns) on a processor.
3. Store either 1 or 2 using a cyclic scheme (e.g., store rows 1 and 3 on P1 and rows 2 and 4 on P2 when p = 2 and n = 4).
Improvement 3, while extremely nasty to program (it already has been programmed as part of ScaLAPACK, so you do not have to reinvent the wheel if you choose not to), leads to the best use of all of the processors. No processor drops out. Figuring out how to get the right part of A to the right processors is lots of fun, too.
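A cyclic distribution is a one-line map; this small sketch (the function name `owner` is made up for illustration) shows why no processor drops out early:

```python
def owner(row, p):
    """Processor owning a given row (0-based) under a cyclic distribution."""
    return row % p

# The p = 2, n = 4 example above (0-based here): P0 gets rows 0 and 2,
# P1 gets rows 1 and 3, so both processors still own active rows late in
# the factorization and neither drops out early.
assignment = [owner(r, 2) for r in range(4)]
```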
Now that we know how to factor A = LU in parallel, we need to know how to do back substitution in parallel. This is a classic divide and conquer algorithm leading to an operation count that cannot be realized on a known computer (why?).
42
We can write the lower triangular matrix L in block form as

    L = | L1   0  |
        | L2   L3 |,

where L1 and L3 are also lower triangular. If L is of order 2^k, some k > 0, then no special cases arise in continuing to factor the Li's. In fact, we can prove that

    L⁻¹ = |     L1⁻¹          0   |
          | −L3⁻¹·L2·L1⁻¹    L3⁻¹ |,

which is also known as a Schur complement.
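The block-inverse formula can be sanity-checked numerically; in this minimal sketch the blocks L1, L2, L3 are taken to be scalars (the order-2 case), so the formula reduces to plain arithmetic:

```python
# Scalar (order-2) check of the block inverse: with L1, L2, L3 numbers,
# the block -L3^-1 L2 L1^-1 is just -L2/(L3*L1).
L1, L2, L3 = 2.0, -1.0, 4.0
L = [[L1, 0.0], [L2, L3]]
Linv = [[1.0 / L1, 0.0],
        [-(1.0 / L3) * L2 * (1.0 / L1), 1.0 / L3]]

# L * Linv should be the 2x2 identity.
I = [[sum(L[i][k] * Linv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
```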
43
Norms
Definition: A vector norm ‖·‖: ℝⁿ → ℝ satisfies, for any x = (xi) ∈ ℝⁿ and any y ∈ ℝⁿ,
1. ‖x‖ ≥ 0, ∀x ∈ ℝⁿ, and ‖x‖ = 0 if and only if x1 = x2 = … = xn = 0.
2. ‖αx‖ = |α|·‖x‖, ∀α ∈ ℝ, ∀x ∈ ℝⁿ.
3. ‖x+y‖ ≤ ‖x‖ + ‖y‖, ∀x, y ∈ ℝⁿ.

In particular,

    ‖x‖₁ = Σᵢ₌₁ⁿ |xi|,
    ‖x‖p = ( Σᵢ₌₁ⁿ |xi|^p )^(1/p), p ≥ 1,
    ‖x‖∞ = max{ |x1|, |x2|, …, |xn| }.

Example: x = (−4, −2, √5)ᵀ: ‖x‖₁ = 6 + √5, ‖x‖₂ = 5, ‖x‖∞ = 4.
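The example above can be checked directly; this small Python sketch (the helper names are illustrative) computes the three norms:

```python
import math

def norm1(x):      # sum of absolute values
    return sum(abs(t) for t in x)

def norm2(x):      # Euclidean norm (p = 2)
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):   # largest absolute entry
    return max(abs(t) for t in x)

# The example vector from the notes: x = (-4, -2, sqrt(5)).
x = [-4.0, -2.0, math.sqrt(5.0)]
```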
44
Definition: A matrix norm ‖·‖: ℝⁿˣⁿ → ℝ satisfies, for any A = (aij) ∈ ℝⁿˣⁿ and any B ∈ ℝⁿˣⁿ,
1. ‖A‖ ≥ 0, and ‖A‖ = 0 if and only if aij = 0, ∀i, j.
2. ‖αA‖ = |α|·‖A‖.
3. ‖A+B‖ ≤ ‖A‖ + ‖B‖.
4. ‖AB‖ ≤ ‖A‖·‖B‖.

In particular,

    ‖A‖₁ = max_{1≤j≤n} Σᵢ₌₁ⁿ |aij|, which is the maximum absolute column sum.
    ‖A‖∞ = max_{1≤i≤n} Σⱼ₌₁ⁿ |aij|, which is the maximum absolute row sum.
    ‖A‖E = ( Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aij² )^(1/2), which is the Euclidean matrix norm.
    ‖A‖ = max_{‖u‖=1} ‖Au‖.
45
Examples:

1. A = |  1  −2   3 |
       |  9  −1   2 |,   ‖A‖₁ = 11, ‖A‖∞ = 12, ‖A‖E = 11.
       | −1  −2  −4 |

2. Let In ∈ ℝⁿˣⁿ. Then ‖In‖₁ = ‖In‖₂ = 1, but ‖In‖E = √n.

Condition number of a matrix

Definition: cond(A) = ‖A‖·‖A⁻¹‖.

Facts (compatible norms): ‖Ax‖₁ ≤ ‖A‖₁‖x‖₁, ‖Ax‖∞ ≤ ‖A‖∞‖x‖∞, ‖Ax‖₂ ≤ ‖A‖E‖x‖₂.
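A quick numerical check of example 1 (and of ‖In‖E = √n); the helper names are illustrative:

```python
import math

def mat_norm1(A):      # maximum absolute column sum
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def mat_norm_inf(A):   # maximum absolute row sum
    return max(sum(abs(t) for t in row) for row in A)

def mat_normE(A):      # Euclidean (Frobenius) matrix norm
    return math.sqrt(sum(t * t for row in A for t in row))

A = [[1.0, -2.0, 3.0],
     [9.0, -1.0, 2.0],
     [-1.0, -2.0, -4.0]]
I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
```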
46
Theorem 3.6: Suppose we have an approximate solution x̃ of Ax = b, where ‖b‖ > 0 and A ∈ ℝⁿˣⁿ is nonsingular. Then for any compatible matrix and vector norms,

    κ⁻¹·‖Ax̃ − b‖/‖b‖ ≤ ‖x̃ − x‖/‖x‖ ≤ κ·‖Ax̃ − b‖/‖b‖, where κ = cond(A).

Proof: (rhs) x̃ − x = A⁻¹r, where r = Ax̃ − b is the residual. Thus,

    ‖x̃ − x‖ ≤ ‖A⁻¹‖·‖r‖ = ‖A⁻¹‖·‖Ax̃ − b‖.

Since Ax = b, ‖A‖·‖x‖ ≥ ‖b‖ and ‖A‖/‖b‖ ≥ ‖x‖⁻¹. Thus,

    ‖x̃ − x‖/‖x‖ ≤ ‖A⁻¹‖·‖Ax̃ − b‖·‖A‖/‖b‖.

(lhs) Note that since ‖A‖ > 0,

    ‖Ax̃ − b‖ = ‖r‖ = ‖Ax̃ − Ax‖ ≤ ‖A‖·‖x̃ − x‖, or
    ‖x̃ − x‖ ≥ ‖Ax̃ − b‖/‖A‖.

Further,
47
    x = A⁻¹b ⇒ ‖x‖ ≤ ‖A⁻¹‖·‖b‖, or ‖x‖⁻¹ ≥ ( ‖A⁻¹‖·‖b‖ )⁻¹.

Combining the two inequalities gives us the lhs. QED
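Theorem 3.6 can be observed numerically; this sketch uses an illustrative 2×2 system and the ∞-norm, with the inverse entered by hand:

```python
# Numerical illustration of Theorem 3.6 in the infinity norm on an
# illustrative 2x2 system (Ainv is A's inverse, entered by hand; det(A) = 5).
A = [[2.0, 1.0], [1.0, 3.0]]
Ainv = [[0.6, -0.2], [-0.2, 0.4]]
x = [1.0, -1.0]                            # exact solution
b = [A[0][0] * x[0] + A[0][1] * x[1],
     A[1][0] * x[0] + A[1][1] * x[1]]      # b = Ax
xt = [1.01, -0.98]                         # an approximate solution

def ninf(v):  return max(abs(t) for t in v)
def mninf(M): return max(sum(abs(t) for t in row) for row in M)

r = [A[0][0] * xt[0] + A[0][1] * xt[1] - b[0],
     A[1][0] * xt[0] + A[1][1] * xt[1] - b[1]]   # residual A*xt - b
kappa = mninf(A) * mninf(Ainv)             # cond(A) in the infinity norm
rel_err = ninf([xt[0] - x[0], xt[1] - x[1]]) / ninf(x)
rel_res = ninf(r) / ninf(b)
```

Both inequalities of the theorem hold for these numbers, which the assertions below confirm.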
Theorem 3.7: Suppose x and x+δx satisfy Ax = f and (A+δA)(x+δx) = f+δf, where δA and δf are perturbations. Let A be nonsingular and δA be so small that

    ‖δA‖ < ‖A⁻¹‖⁻¹.

Then

    ‖δx‖/‖x‖ ≤ κ/(1 − κ‖δA‖/‖A‖) · ( ‖δf‖/‖f‖ + ‖δA‖/‖A‖ ).

Note: Theorem 3.7 implies that when κ is small, small relative changes in f and A cause small relative changes in x.
48
Iterative Improvement
1. Solve Ax = f to an approximation x̃ (all single precision).
2. Calculate r = Ax̃ − f using double the precision of the data.
3. Solve Ae = r to an approximation ẽ (single precision).
4. Set x̃′ = x̃ − ẽ (single precision x̃′) and repeat steps 2–4 with x̃ = x̃′.

Normally the solution method is a variant of Gaussian elimination.

Note that r = Ax̃ − f = A(x̃ − x) = Ae. Since we cannot solve Ax = f exactly, we probably cannot solve Ae = r exactly, either.

Fact: If the 1st x̃′ has q digits correct, then the 2nd x̃′ will have 2q digits correct (assuming that 2q is less than the number of digits representable on your computer) and the nth x̃′ will have nq digits correct (under a similar assumption as before).
Parallelization is straightforward: Use a parallel Gaussian elimination code and parallelize the residual calculation based on where the data resides.
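A minimal sketch of iterative improvement, with "single precision" simulated by rounding the inner solver's output to 6 significant digits while the residual is formed in full double precision (all names and the 2×2 system are illustrative):

```python
# Iterative improvement sketch: the inner solver is only accurate to about
# 6 significant digits (simulating single precision); the residual uses the
# full double precision available.
def chop6(v):
    return [float('%.5e' % t) for t in v]    # round to 6 significant digits

A = [[4.0, 1.0], [1.0, 3.0]]
Ainv = [[3.0/11.0, -1.0/11.0], [-1.0/11.0, 4.0/11.0]]   # exact inverse (det = 11)
f = [1.0, 1.0]
x_exact = [2.0/11.0, 3.0/11.0]

def low_precision_solve(rhs):
    y = [Ainv[0][0]*rhs[0] + Ainv[0][1]*rhs[1],
         Ainv[1][0]*rhs[0] + Ainv[1][1]*rhs[1]]
    return chop6(y)                          # solver good to ~6 digits only

xt = low_precision_solve(f)                  # step 1
for _ in range(3):                           # steps 2-4, repeated
    r = [A[0][0]*xt[0] + A[0][1]*xt[1] - f[0],   # step 2 in full precision
         A[1][0]*xt[0] + A[1][1]*xt[1] - f[1]]
    e = low_precision_solve(r)               # step 3: solve Ae = r
    xt = [xt[0] - e[0], xt[1] - e[1]]        # step 4: x' = x - e
err = max(abs(xt[i] - x_exact[i]) for i in range(2))
```

After a few sweeps the error is far below the 6-digit accuracy of the inner solver, illustrating the digit-doubling fact above.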
49
3c. Iterative Methods
3c (i) Splitting or Relaxation Methods
Let A = S − T, where S is nonsingular. Then Ax = b ⇔ Sx = Tx + b. The iterative procedure is defined by

    x0 given,
    Sxk+1 = Txk + b, k ≥ 0.

To be useful this requires that
1. xk+1 be easy to compute, and
2. xk → x.
50
Example: Let A = D − L − U, with D diagonal and L and U strictly lower and upper triangular, respectively. Then

a. S = D and T = L + U are both easy to compute, but many iterations are required in practice.
b. S = A and T = 0 is hard to compute, but requires only 1 iteration.

Let ek = x − xk. Then

    Sek+1 = Tek, or ek = (S⁻¹T)^k·e0,

which proves the following:

Theorem 3.8: The iterative procedure converges or diverges at the rate of ρ(S⁻¹T), the spectral radius of S⁻¹T.
51
Named relaxation (or splitting) methods:
1. S = D, T = L + U (Jacobi): requires 2 vectors for xk and xk+1, which is somewhat unnatural, but parallelizes trivially and scales well.
2. S = D − L, T = U (Gauss-Seidel, or Gauß-Seidel in German): requires only 1 vector for xk. The method was unknown to Gauss, but known to Seidel.
3. S = ω⁻¹D − L, T = ((1−ω)/ω)D + U:
   a. ω ∈ (1,2) (Successive Over Relaxation, or SOR)
   b. ω ∈ (0,1) (Successive Under Relaxation, or SUR)
   c. ω = 1 is just Gauss-Seidel

Example:

    A = |  2  −1 |,  SJ = | 2  0 |,  TJ = | 0  1 |,  and ρ(SJ⁻¹TJ) = 1/2,
        | −1   2 |        | 0  2 |        | 1  0 |

whereas

    SGS = |  2  0 |,  TGS = | 0  1 |,  and ρ(SGS⁻¹TGS) = 1/4,
          | −1  2 |         | 0  0 |

which implies that 1 Gauss-Seidel iteration equals 2 Jacobi iterations.
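The 2×2 example can be verified directly; the matrices below are copied from the example, and the Gauss-Seidel iteration matrix is triangular, so its spectral radius is read off the diagonal:

```python
# Matrices copied from the example above.
SJ_inv_TJ = [[0.0, 0.5], [0.5, 0.0]]       # D^-1 (L+U); eigenvalues +-1/2
S_GS = [[2.0, 0.0], [-1.0, 2.0]]
T_GS = [[0.0, 1.0], [0.0, 0.0]]
SGS_inv_TGS = [[0.0, 0.5], [0.0, 0.25]]    # upper triangular

# Confirm S_GS * (S_GS^-1 T_GS) reproduces T_GS.
prod = [[sum(S_GS[i][k] * SGS_inv_TGS[k][j] for k in range(2))
         for j in range(2)] for i in range(2)]

rho_jacobi = 0.5
rho_gs = max(abs(SGS_inv_TGS[0][0]), abs(SGS_inv_TGS[1][1]))  # diagonal of a triangular matrix
```

rho_jacobi² = rho_gs, which is exactly the "1 Gauss-Seidel equals 2 Jacobi" statement.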
52
Special Matrix Example

Let

    A = |  2 −1              |
        | −1  2 −1           |
        |    −1  2 −1        |
        |       ⋱  ⋱  ⋱      |
        |         −1  2 −1   |
        |            −1  2   |

be tridiagonal.

For this matrix, let μ = ρ(SJ⁻¹TJ) and λ = ρ(S_{SOR,ω}⁻¹ T_{SOR,ω}). The optimal ω is such that

    (λ + ω − 1)² = λω²μ²,

which is part of Young's thesis (1950), but was correctly proven by Varga later. We can show that

    ω = 2μ⁻²( 1 − √(1 − μ²) )

makes λ as small as possible.
53
Aside: If ω = 1, then λ² = λμ², or λ = μ². Hence, Gauss-Seidel is twice as fast as Jacobi (in either convergence or divergence).

If Afd ∈ ℝⁿˣⁿ, let h = 1/(n+1).

Facts:
    μ = cos(πh)                           Jacobi
    μ² = cos²(πh)                         Gauss-Seidel
    λ = (1 − sin(πh))/(1 + sin(πh))       SOR with optimal ω

Example: n = 21 and h = 1/22. Then μ ≈ 0.99, μ² ≈ 0.98, λ ≈ 0.75 ⇒ 30 Jacobi iterations equal 1 SOR iteration with the optimal ω.
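The numbers in this example can be recomputed from the facts above; this sketch also checks that the optimal-ω formula gives λ = ω − 1:

```python
import math

# Recompute the example's numbers from the model-problem formulas.
n = 21
h = 1.0 / (n + 1)
mu = math.cos(math.pi * h)                                           # Jacobi rate
lam = (1.0 - math.sin(math.pi * h)) / (1.0 + math.sin(math.pi * h))  # optimal SOR rate
omega = 2.0 / mu**2 * (1.0 - math.sqrt(1.0 - mu**2))                 # optimal omega

# Jacobi sweeps matching one optimal-SOR sweep in error reduction.
equiv = math.log(lam) / math.log(mu)
```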
There are many other splitting methods, including Alternating Direction Implicit (ADI) methods (1950's) and a cottage industry of splitting methods developed in the U.S.S.R. (1960's). There are some interesting parallelization methods based on ADI and properties of tridiagonal matrices that give ADI-like methods convergence properties similar to ADI's.

Parallelization of the Iterative Procedure
54
For Jacobi, parallelization is utterly trivial:
1. Split up the unknowns onto processors.
2. Each processor updates all of its unknowns.
3. Each processor sends its unknowns to processors that need the updated information.
4. Continue iterating until done.

Common fallacies:
• When an element of the solution vector xk has a small enough element-wise residual, stop updating the element. This leads to utterly wrong solutions since the residuals are affected by updates of neighbors after the element stops being updated.
• Keep computing and use the last known update from neighboring processors. This leads to chattering and no element-wise convergence.
Asynchronous algorithms exist, but eliminate the chattering through extra calculations.
55
Parallel Gauss-Seidel and SOR are much, much harder. In fact, by and large, they do not exist. Googling leads to an interesting set of papers that approximately parallelize Gauss-Seidel only for sets of matrices with very well known structures. Even then, the algorithms are extremely complex.
Parallel Block-Jacobi is commonly used instead as an approximation. The matrix A is divided up into a number of blocks. Each block is assigned to a processor. Inside of each block, Jacobi is performed some number of iterations. Data is exchanged between processors and the iteration continues.
See the book (absolutely shameless plug),
C. C. Douglas, G. Haase, and U. Langer, A Tutorial on Elliptic PDE Solvers and Their Parallelization, SIAM Books, Philadelphia, 2003.
for how to do parallelization of iterative methods for matrices that commonly occur when solving partial differential equations (what else would you ever want to solve anyway???).
56
3c (ii) Krylov Space Methods
Conjugate Gradients
Let A be symmetric, positive definite, i.e.,

    A = Aᵀ and (Ax,x) ≥ r‖x‖², where r > 0.

The conjugate gradient iteration method for the solution of Ax + b = 0 is defined as follows, with r = r(x) = Ax + b:

    x0 arbitrary (approximate solution)
    r0 = Ax0 + b (approximate residual)
    w0 = r0 (search direction)
57
For k = 0, 1, …

    xk+1 = xk + ak·wk,    ak = −(rk, wk)/(wk, Awk)
    rk+1 = rk + ak·Awk
    wk+1 = rk+1 + bk·wk,  bk = −(rk+1, Awk)/(wk, Awk)

Lemma CG1: If Q(x(t)) = ½(x(t), Ax(t)) + (b, x(t)) and x(t) = xk + t·wk, then ak is chosen to minimize Q(x(t)) as a function of t.

Proof:

    Q(x(t)) = ½(xk + t·wk, Axk + t·Awk) + (b, xk + t·wk)
            = ½{ (xk, Axk) + 2t(xk, Awk) + t²(wk, Awk) } + (b, xk) + t(b, wk)
58
    d/dt Q(x(t)) = (xk, Awk) + t(wk, Awk) + (b, wk)

    d/dt Q(ak) = (xk, Awk) + ak(wk, Awk) + (b, wk)
               = (xk, Awk) − (rk, wk) + (b, wk)
               = (Axk + b − rk, wk)
               = 0

since

    Axk + b − rk = A(xk−1 + ak−1·wk−1) + b − (rk−1 + ak−1·Awk−1)
                 = Axk−1 + b − rk−1
                 = ⋯
                 = Ax0 + b − r0
                 = 0
59
Lemma CG2: The parameter bk is chosen so that (wk+1, Awk) = 0.

Lemma CG3: For 0 ≤ q ≤ k,
1. (rk+1, wq) = 0
2. (rk+1, rq) = 0
3. (wk+1, Awq) = 0

Lemma CG4: ak = −(rk, rk)/(wk, Awk).

Lemma CG5: bk = (rk+1, rk+1)/(rk, rk).

Theorem 3.9 (CG): Let A ∈ ℝᴺˣᴺ be symmetric, positive definite. Then the CG iteration converges to the exact solution of Ax + b = 0 in not more than N iterations.
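A minimal CG sketch in the notes' convention (solve Ax + b = 0, residual r = Ax + b); the helper names and the test system are illustrative, and exact termination in N = 2 steps illustrates Theorem 3.9:

```python
def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg(A, b, iters):
    """Conjugate gradients for Ax + b = 0 with A symmetric positive definite."""
    n = len(b)
    x = [0.0] * n                            # x0 arbitrary
    r = [matvec(A, x)[i] + b[i] for i in range(n)]   # r0 = A x0 + b
    w = r[:]                                 # w0 = r0 (search direction)
    for _ in range(iters):
        Aw = matvec(A, w)
        a = -dot(r, w) / dot(w, Aw)
        x = [x[i] + a * w[i] for i in range(n)]
        r_new = [r[i] + a * Aw[i] for i in range(n)]
        beta = dot(r_new, r_new) / dot(r, r)         # Lemma CG5 form of bk
        w = [r_new[i] + beta * w[i] for i in range(n)]
        r = r_new
    return x

# SPD test problem: the exact solution of Ax + b = 0 is x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
x = cg(A, b, 2)                              # N = 2, so at most 2 iterations
```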
60
Preconditioning
We seek a matrix M (or a set of matrices) to use in solving M⁻¹Ax = M⁻¹b such that
• κ(M⁻¹A) ≪ κ(A),
• M is easy to use when solving Mx = b, and
• M and A have similar properties (e.g., symmetry and definiteness).

Reducing the condition number reduces the number of iterations necessary to achieve an adequate convergence factor.

Theorem 3.9: In finite arithmetic, the preconditioned conjugate gradient method converges at a rate based on the largest and smallest eigenvalues of M⁻¹A,

    ‖x − xk‖₂/‖x − x0‖₂ ≤ 2√κ₂(M⁻¹A) · ( (√κ₂(M⁻¹A) − 1)/(√κ₂(M⁻¹A) + 1) )^k,

where κ₂(M⁻¹A) = λmax/λmin.
61
What are some common preconditioners?
• Identity!!!
• Main diagonal (the easiest to implement in parallel and very hard to beat)
• Jacobi
• Gauss-Seidel
• Tchebyshev
• Incomplete LU, known as ILU (or modified ILU)

Most of these do not work straight out of the box since symmetry may be required. How do we symmetrize Jacobi or an SOR-like iteration?

Do two iterations: once in the order specified and once in the opposite order. So, if the order is natural, i.e., 1, …, N, then the opposite is N, …, 1.
There are a few papers that show how to do two way iterations for less than the cost of two matrix-vector multiplies (which is the effective cost of the solves).
62
Preconditioned conjugate gradients
    x0 arbitrary (approximate solution)
    r0 = Ax0 + b (approximate residual)
    Mr̃0 = r0
    w0 = r̃0 (search direction)

followed by, for k = 0, 1, … until (r̃k+1, rk+1) ≤ ε(r̃0, r0) and (rk+1, rk+1) ≤ ε(r0, r0) for a given ε:

    xk+1 = xk + ak·wk,    ak = −(r̃k, rk)/(wk, Awk)
    rk+1 = rk + ak·Awk
    Mr̃k+1 = rk+1
    wk+1 = r̃k+1 + bk·wk,  bk = (r̃k+1, rk+1)/(r̃k, rk)
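A hedged sketch of the iteration above with the simplest nontrivial choice M = diag(A) (the "main diagonal" preconditioner from the list on the previous page); names and the test system are illustrative:

```python
def pcg(A, b, iters):
    """Preconditioned CG for Ax + b = 0, with M = diag(A)."""
    n = len(b)

    def mv(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = [mv(x)[i] + b[i] for i in range(n)]       # r0 = A x0 + b
    rt = [r[i] / A[i][i] for i in range(n)]       # solve M rt = r, M = diag(A)
    w = rt[:]
    for _ in range(iters):
        Aw = mv(w)
        a = -dot(rt, r) / dot(w, Aw)
        x = [x[i] + a * w[i] for i in range(n)]
        r_new = [r[i] + a * Aw[i] for i in range(n)]
        rt_new = [r_new[i] / A[i][i] for i in range(n)]   # M rt_{k+1} = r_{k+1}
        beta = dot(rt_new, r_new) / dot(rt, r)
        w = [rt_new[i] + beta * w[i] for i in range(n)]
        r, rt = r_new, rt_new
    return x

# Same SPD test system as before; exact solution of Ax + b = 0 is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
x = pcg(A, b, 2)
```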
63
3d. Sparse Matrix Methods
We want to solve Ax = b, where A is large, sparse, and N×N. By sparse, we mean A is nearly all zeroes. Consider the tridiagonal matrix A = [−1, 2, −1]. If N = 10,000, then A is sparse, but if N = 4 it is not sparse. Typical sparse matrices are not banded (diagonal) matrices. The nonzero pattern may appear to be random at first glance.
There are a small number of common storage schemes so that (almost) no zeroes are stored for A, ideally storing only NZ(A) = number of nonzeroes in A:
• Diagonal (or band)
• Profile
• Row or column (and several variants)
• Any of the above for blocks
The schemes all work in parallel, too, for the local parts of A. Sparse matrices arise in a very large percentage of problems on large parallel computers.
64
Row storage scheme
3 vectors: IA, JA, and AM.

    Length   Description
    N+1      IA(j) = index in AM of 1st nonzero in row j
    NZ(A)    JA(j) = column of jth element in AM
    NZ(A)    AM(j) = aik, for some row i and k = JA(j)

Row j is stored in AM(IA(j) : IA(j+1)−1). The order within the row may be arbitrary, or ordered such that JA(j) < JA(j+1) within a row. Sometimes the diagonal entry for a row comes first, then the rest of the row is ordered.
The column storage scheme is defined similarly.
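The row scheme can be exercised in a few lines; this sketch uses 0-based indexing (unlike the 1-based description above) and an illustrative 3×3 matrix:

```python
# 0-based version of the IA, JA, AM row scheme for the matrix
#   [[10, 0, 2],
#    [ 0, 5, 0],
#    [ 3, 0, 7]]
IA = [0, 2, 3, 5]          # IA[j] = index in AM of the 1st nonzero of row j
JA = [0, 2, 1, 0, 2]       # column of each stored element
AM = [10.0, 2.0, 5.0, 3.0, 7.0]

def csr_matvec(IA, JA, AM, x):
    n = len(IA) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(IA[i], IA[i + 1]):   # row i lives in AM[IA[i]:IA[i+1]]
            y[i] += AM[k] * x[JA[k]]
    return y

y = csr_matvec(IA, JA, AM, [1.0, 1.0, 1.0])
```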
65
Modified row storage scheme
• 2 vectors: IJA and AM, each of length NZ(A)+1.
• Assume A = D + L + U, where D is diagonal and L and U are strictly lower and upper triangular, respectively.
• Let mi = NZ(row i of A).

Then

    IJA(1) = N+2
    IJA(i) = IJA(i−1) + mi−1, i = 2, 3, …, N+1
    IJA(j) = column index of jth element in AM
    AM(i) = aii, 1 ≤ i ≤ N
    AM(N+1) is arbitrary
    AM(j) = aik, IJA(i) ≤ j < IJA(i+1) and k = IJA(j)

The modified column storage scheme is defined similarly.
Very modified column storage scheme
66
• Assumes that A is either symmetric or nearly symmetric.
• Assume A = D + L + U, where D is diagonal and L and U are strictly lower and upper triangular, respectively.
• Let hi = NZ(column i of U) that will be stored. Let h = Σᵢ₌₁ᴺ hi.
• 2 vectors: IJA and AM, with both aij and aji stored if either is nonzero.

    IJA(1) = N+2
    IJA(i) = IJA(i−1) + hi−1, i = 2, 3, …, N+1
    IJA(j) = row index of jth element in AM
    AM(i) = aii, 1 ≤ i ≤ N
    AM(N+1) is arbitrary
    AM(j) = aki, IJA(i) ≤ j < IJA(i+1) and k = IJA(j)
    If A ≠ Aᵀ, then AM(j+h) = aik

AM contains first D, an arbitrary element, Uᵀ, and then possibly L.
67
Example:

    A = | a11   0   a13   0  |
        |  0   a22   0   a24 |
        | a31   0   a33   0  |
        |  0   a42   0   a44 |

Then

           D and column "pointers"      Uᵀ       Optional L
            1    2    3    4    5    6    7    8    9
    IJA     6    6    6    7    8    1    2
    AM     a11  a22  a33  a44       a13  a24  a31  a42
68
Compute Ax or Aᵀx

Procedure MULT( N, A, IJA, x, y )
    do i = 1:N
        y(i) = A(i)·x(i)
    enddo
    Lshift = 0 if A = Aᵀ, or IJA(N+1) − IJA(1) otherwise
    Ushift = 0 if y = Ax, or IJA(N+1) − IJA(1) if y = Aᵀx
    do i = 1:N
        do k = IJA(i):IJA(i+1)−1
            j = IJA(k)
            y(i) += A(k+Lshift)·x(j)   // So-so caching properties
            y(j) += A(k+Ushift)·x(i)   // Cache misses galore
        enddo
    enddo
end MULT
69
In the double loop, the first y update has so-so cache properties, but the second update is problematic. It is almost guaranteed to cause at least one cache miss. Storing small blocks of size p×q (instead of 1×1) is frequently helpful.
Note that when solving Ax=b by iterative methods like Gauss-Seidel or SOR, independent access to D, L, and U is required. These algorithms can be implemented fairly easily on a single processor core.
Sparse Gaussian elimination
We need to factor A = LDU. Without loss of generality, we assume that A is already reordered so that this is easily accomplished without pivoting. The solution is computed using

    Lw = b
    Dz = w
    Ux = z
70
There are 3 phases:

1. symbolic factorization (determine the nonzero structure of U and possibly L),
2. numeric factorization (compute LDU), and
3. forward/backward substitution (compute x).
Let G = (V, E) denote the ordered, undirected graph corresponding to the matrix A. V = {vi}ᵢ₌₁ᴺ is the vertex set, E = { eij = eji : |aij| + |aji| > 0 } is the edge set, and the vertex adjacency set is adjG(vi) = { k : eik ∈ E }.

Gaussian elimination corresponds to a sequence of elimination graphs Gi, 0 ≤ i < N. Let G0 = G. Define Gi from Gi−1, i > 0, by removing vi and all of its incident edges from Gi−1, and adding new edges as required to pairwise connect all vertices in adjGi−1(vi).
71
Let F denote the set of edges added during the elimination process. Let G′ = (V, E∪F). Gaussian elimination applied to G′ produces no new fillin edges.

Symbolic factorization computes E∪F. Define

    m(i) = min{ k > i : k ∈ adjG′(vi) }
         = min{ k > i : k ∈ adjGi−1(vi) },  1 ≤ i < N.

Theorem 3.10: eij ∈ E∪F, i < j, if and only if
1. eij ∈ E, or
2. there exists a sequence (k1, k2, …, kp) such that
   a. k1 = l, kp = j, with elj ∈ E,
   b. i = kq for some 2 ≤ q ≤ p−1, and
   c. kq = m(kq−1), 2 ≤ q ≤ p.
Computing the fillin
72
The cost in time will be O(N + |E∪F|). We need 3 vectors:
• M of length N
• LIST of length N
• JU of length N+1 (not technically necessary for fillin)

The fillin procedure has three major sections: initialization, computing row indices of U, and cleanup.

Procedure FILLIN( N, IJA, JU, M, LIST )
    // Initialization of vectors
    M(i) = 0, 1 ≤ i ≤ N
    LIST(i) = 0, 1 ≤ i ≤ N
    JU(1) = N+1
73
    do i = 1:N
        Length = 0
        LIST(i) = i
        // Compute row indices of U
        do j = IJA(i):IJA(i+1)−1
            k = IJA(j)
            while LIST(k) = 0
                LIST(k) = LIST(i)
                LIST(i) = k
                Length++
                if M(k) = 0, then M(k) = i
                k = M(k)
            endwhile
        enddo // j
        JU(i+1) = JU(i) + Length
74
        // Cleanup loop: we will modify this loop when computing either
        // Ly = b or Ux = z (computing Dz = y is a separate simple scaling loop)
        k = i
        do j = 1:Length+1
            ksave = k
            k = LIST(k)
            LIST(ksave) = 0
        enddo // j
    enddo // i
end FILLIN
Numerical factorization (A=LDU) is derived by embedding matrix operations involving U, L, and D into a FILLIN-like procedure.
The solution step replaces the Cleanup loop in FILLIN with
    k = i
    Sum = 0
75
    do j = JU(i):JU(i+1)−1
        ksave = k
        k = LIST(k)
        LIST(ksave) = 0
        Sum += U(j)·y(k)
    enddo // j
    y(i) = b(i) − Sum
    LIST(k) = 0
The i loop ends after this substitution.
Solving Ux = z follows the same pattern, but columns are processed in the reverse order. Adding Lshift and Ushift parameters allows the same code to handle both cases A = Aᵀ and A ≠ Aᵀ equally easily.
See R. E. Bank and R. K. Smith, General sparse elimination requires no permanent integer storage, SIAM J. Sci. Stat. Comp., 8 (1987), pp. 574-584, and the SMMP and Madpack2 packages in the Free Software section of my home web page.
76
4. Solution of Nonlinear Equations
Intermediate Value Theorem: A continuous function on a closed interval takes on all values between and including its local maximum and minimum.

(First) Mean Value Theorem: If f is continuous on [a,b] and is differentiable on (a,b), then there exists at least one m ∈ (a,b) such that f(b) − f(a) = f′(m)(b − a).
Taylor's Theorem: Let f be a function such that f⁽ⁿ⁺¹⁾ is continuous on (a,b). If x ≠ y ∈ (a,b), then

    f(x) = f(y) + Σᵢ₌₁ⁿ f⁽ⁱ⁾(y)(x − y)ⁱ/i! + Rn+1(y,x),

where there exists m between x and y such that

    Rn+1(y,x) = f⁽ⁿ⁺¹⁾(m)(x − y)ⁿ⁺¹/(n+1)!.
Given y=f(x), find all s such that f(s)=0.
77
Backtracking Schemes
Suppose ∃ a < b such that f(a)·f(b) < 0 and f is continuous on [a,b].

Bisection method: Let m = (a+b)/2. Then either
1. f(a)·f(m) < 0: replace b by m.
2. f(a)·f(m) > 0: replace a by m.
3. f(a)·f(m) = 0: stop since m is a root.
78
Features include
• Will always converge (usually quite slowly) to some root if one exists.
• We can obtain error estimates.
• 1 function evaluation per step.
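A minimal bisection sketch (names and the test function are illustrative):

```python
def bisect(f, a, b, steps):
    """Bisection for a root of f in [a, b], assuming f(a) f(b) < 0."""
    fa = f(a)
    for _ in range(steps):
        m = (a + b) / 2.0
        fm = f(m)
        if fm == 0.0:
            return m                 # m is a root
        if fa * fm < 0.0:
            b = m                    # root in [a, m]
        else:
            a, fa = m, fm            # root in [m, b]
    return (a + b) / 2.0

# Root of f(x) = x^2 - 2 in [1, 2] is sqrt(2); 50 halvings shrink the
# interval to about 1e-15.
root = bisect(lambda x: x * x - 2.0, 1.0, 2.0, 50)
```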
False position method: Derived from geometry.
79
First we determine the secant line from (a, f(a)) to (b, f(b)):

    (y − f(a))/(x − a) = (f(a) − f(b))/(a − b).

The secant line crosses the x-axis when x = x1, where

    x1 = ( a·f(b) − b·f(a) )/( f(b) − f(a) ).

Then a root lies in either [a, x1] or [x1, b] depending on the sign of f(a)·f(x1), as before.

Features include
• Usually converges faster than the Bisection method.

    xn+1 = xn − f(xn)(xn − xn−1)/( f(xn) − f(xn−1) ).
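The update formula above, iterated directly, is the secant method; this sketch (names illustrative) adds a guard against a zero denominator once the iterates have converged:

```python
def secant(f, x0, x1, steps):
    """Secant iteration x_{n+1} = x_n - f(x_n)(x_n - x_{n-1})/(f(x_n) - f(x_{n-1}))."""
    for _ in range(steps):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            return x1                # converged (or stagnated): avoid 0/0
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1

# Same test problem as for bisection: root of x^2 - 2 is sqrt(2).
root = secant(lambda x: x * x - 2.0, 1.0, 2.0, 10)
```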
80
Fixed point methods: Construct a function g(s) such that g(s) = s ⇔ f(s) = 0.

Example: g(s) = s − f(s).

Constructing a good fixed point method is easy. The motivation is to look at a function y = g(x) and see where g(x) intersects y = x.

Let I = [a,b] and assume that g is defined on I. Then g has either zero or possibly many fixed points on I.
81
Theorem 4.1: If g(I) ⊆ I and g is continuous, then g has at least one fixed point in I.

Proof: g(I) ⊆ I means that a ≤ g(a) ≤ b and a ≤ g(b) ≤ b. If either a = g(a) or b = g(b), then we are done. Assume that is not the case: hence, g(a) − a > 0 and g(b) − b < 0. For F(x) ≡ g(x) − x, F is continuous with F(a) > 0 and F(b) < 0. Thus, by the intermediate value theorem, there exists at least one s ∈ I such that 0 = F(s) = g(s) − s. QED

Why are Theorem 4.1's requirements reasonable?
• s ∈ I: s cannot equal g(s) if g(s) ∉ I.
• Continuity: if g is discontinuous, the graph of g may lie partly above and partly below y = x without an intersection.

Theorem 4.2: If g(I) ⊆ I and |g′(x)| ≤ L < 1, ∀x ∈ I, then ∃! s ∈ I such that g(s) = s.
82
Proof: Suppose s1 < s2 ∈ I are both fixed points of g. The mean value theorem gives an m∈(s1,s2) such that

|s2 − s1| = |g(s2) − g(s1)| = |g'(m)| |s2 − s1| ≤ L |s2 − s1| < |s2 − s1|,

which is a contradiction. QED
Note that the condition on g' implies that g must be continuous.
Algorithm: Let x0∈I be arbitrary and set x_{n+1} = g(x_n), n = 0, 1, …

Note that if after n steps g(x_n) = x_n, then x_m = x_n, ∀m ≥ n.
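The iteration above can be sketched in a few lines. This is an illustrative sketch, not from the notes; the function and parameter names (fixed_point, tol, max_iter) are our own.

```python
import math

# Fixed point iteration: x_{n+1} = g(x_n), stopped once successive
# iterates agree to within tol (defaults are illustrative).
def fixed_point(g, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# g(x) = cos x maps [0,1] into itself and |g'(x)| <= sin(1) < 1 there,
# so Theorem 4.2 guarantees a unique fixed point, s = 0.739085...
s = fixed_point(math.cos, 0.5)
```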
Theorem 4.3: Let g(I)⊆I and |g'(x)| ≤ L < 1, ∀x∈I. For any x0∈I, the sequence x_n = g(x_{n−1}), n = 1, 2, … converges to the fixed point s, and the nth error e_n = x_n − s satisfies

|e_n| ≤ (L^n/(1−L)) |x1 − x0|.
Note that Theorem 4.3 is a nonlocal convergence theorem because s is fixed, a known interval I is assumed, and convergence is for any x0∈I .
Proof: (convergence) Recall that s is unique. For any n, ∃m_n between x_{n−1} and s such that

|x_n − s| = |g(x_{n−1}) − g(s)| = |g'(m_n)| |x_{n−1} − s| ≤ L |x_{n−1} − s|.

Repeating this gives

|x_n − s| ≤ L^n |x0 − s|.

Since 0 ≤ L < 1,
lim_{n→∞} L^n = 0 ⇒ lim_{n→∞} x_n = s.

(error bound) Note that

|x0 − s| ≤ |x0 − x1| + |x1 − s| ≤ |x0 − x1| + L|x0 − s|,

∴ (1 − L)|x0 − s| ≤ |x1 − x0|.

Since |x_n − s| ≤ L^n |x0 − s|, |e_n| = |x_n − s| ≤ (L^n/(1−L)) |x1 − x0|. QED
Theorem 4.4: Let g'(x) be continuous on some open interval containing s, where g(s) = s. If |g'(s)| < 1, then ∃ε > 0 such that the fixed point iteration is convergent whenever |x0 − s| ≤ ε.
Note that Theorem 4.4 is a local convergence theorem since x0 must be sufficiently close to s.
Proof: Since g' is continuous in an open interval containing s and |g'(s)| < 1, for any constant K satisfying |g'(s)| < K < 1 there is an ε > 0 such that if x∈[s−ε, s+ε] ≡ Iε, then |g'(x)| ≤ K. By the mean value theorem, given any x∈Iε, ∃d between x and s such that

|g(x) − s| = |g(x) − g(s)| = |g'(d)| |x − s| ≤ Kε < ε,

and thus g(Iε) ⊆ Iε. Using Iε in Theorem 4.3 completes the proof. QED
Notes: There is no hint what ε is. If |g'(s)| > 1, then ∃Iε such that |g'(x)| > 1, ∀x∈Iε. So if x0∈Iε, x0 ≠ s, then

|g(x0) − g(s)| = |g'(m)| |x0 − s|, i.e., |x1 − s| > |x0 − s|.

Hence, only x0 = s gives convergence; all other starting points in Iε move away from s.
Error Analysis
Let e_k = x_k − s, let I be a closed interval, and suppose g satisfies a local theorem's requirements on I. The Taylor series of g about x = s gives

e_{n+1} = g(x_n) − g(s)
        = g'(s)e_n + (g''(s)/2)e_n² + ⋯ + (g^(k)(s)/k!)e_n^k + E_{k,n},

where E_{k,n} = (g^(k+1)(a_n)/(k+1)!) e_n^{k+1}.
If x0 ≠ s, g'(s) = g''(s) = ⋯ = g^(k−1)(s) = 0, and 0 ∉ g^(k)(I), then

lim_{n→∞} e_{n+1}/e_n^k = g^(k)(s)/k! ⇒ kth order convergence.
The important k’s are 1 and 2.
If we only have 1st order convergence, we can speed it up using quadratic interpolation: given {(x_i, f(x_i))}_{i=1}^3, fit a 2nd degree polynomial p to the data such that p(x_i) = f(x_i), i = 1, 2, 3, and use p to get the next guess. Let

Δx_n ≡ x_{n+1} − x_n, Δ²x_n ≡ x_{n+2} − 2x_{n+1} + x_n, and x_n' = x_n − (Δx_n)²/Δ²x_n.

If ε_n = x_n − x* satisfies ε_{n+1} = (B + β_n)ε_n, ε_n ≠ 0, |B| < 1, and lim_{n→∞} β_n = 0, then for n sufficiently large, x_n' is well defined and

lim_{n→∞} (x_n' − x*)/(x_n − x*) = 0,

where x* = lim_{n→∞} x_n (x* is hopefully s).
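A short sketch of this Δ² acceleration (Aitken's device), applied to the linearly convergent iteration x_{n+1} = cos(x_n); the helper names are our own, and the raw iterates are deliberately few so the speedup is visible.

```python
import math

# Aitken's delta-squared: x_n' = x_n - (dx_n)^2 / d2x_n.
def aitken(seq):
    """Accelerate a list of iterates; returns the transformed sequence."""
    out = []
    for n in range(len(seq) - 2):
        dx = seq[n + 1] - seq[n]                      # Delta x_n
        d2x = seq[n + 2] - 2.0 * seq[n + 1] + seq[n]  # Delta^2 x_n
        out.append(seq[n] - dx * dx / d2x)            # x_n'
    return out

# Generate a few fixed point iterates and accelerate them.
xs = [0.5]
for _ in range(6):
    xs.append(math.cos(xs[-1]))
acc = aitken(xs)
s = 0.7390851332151607  # the fixed point of cos
```

The accelerated sequence ends much closer to s than the raw one does.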
We can apply the fixed point method to the zeroes of f: choose g(x) = x − φ(x)f(x), where 0 < |φ(x)| < ∞. Note that f(x) and φ(x)f(x) have the same zeroes, which is also true for g(x) = x − F(f(x)), where F(y) ≠ 0 if y ≠ 0 and F(0) = 0.
Chord Method
Choose φ(x) ≡ m, m constant. So g'(x) = 1 − m f'(x). We want

|g'(x)| < 1 ⇒ 0 < m f'(x) < 2 in some |x − s| < r.

Thus, m must have the same sign as f'. Let x_{n+1} = x_n − m f(x_n). Solving for m,

m = (x_n − x_{n+1})/f(x_n).

Therefore, x_{n+1} is the x-intercept of the line through (x_n, f(x_n)) with slope 1/m.
Properties:
 1st order convergence
 Convergence if x_{n+1} can be found (always)
 Can obtain error estimates
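A minimal sketch of the chord method (names are our own). Here m is taken as 1/f'(x0) — an assumption for illustration — so the same slope is reused at every step, which is exactly what gives 1st order convergence.

```python
import math

# Chord method: x_{n+1} = x_n - m*f(x_n), with m fixed for all steps.
def chord(f, x0, m, tol=1e-10, max_iter=200):
    x = x0
    for _ in range(max_iter):
        x_new = x - m * f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

f = lambda x: math.cos(x) - x          # root s = 0.739085...
fprime = lambda x: -math.sin(x) - 1.0
root = chord(f, 0.5, 1.0 / fprime(0.5))  # m has the same sign as f'
```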
Newton’s Method
Choose φ(x) ≡ 1/f'(x). Let s be such that f(s) = 0. Then

g'(x) = 1 − ((f'(x))² − f(x)f''(x))/(f'(x))² = f(x)f''(x)/(f'(x))².

If f'' exists in I = {x: |x − s| ≤ r} and f'(x) ≠ 0 (x∈I), then g'(s) = 0 ⇒ 2nd order convergence. So,

x_{n+1} = x_n − f(x_n)/f'(x_n).
What if f'(s) = 0 and f'' exists? Then f(x) = (x−s)²h(x), where h(s) ≠ 0 and h'' exists. So,

g(x) = x − (x−s)²h(x)/((x−s)²h'(x) + 2(x−s)h(x))
     = x − (1/2)(x−s)/(1 + (x−s)h'(x)/(2h(x))).

Differentiating and evaluating at x = s (every remaining term carries a factor of (x−s)),

g'(s) = 1 − 1/2 = 1/2,

so Newton's method is only 1st order at a double root. Then

x_{n+1} = x_n − 2 f(x_n)/f'(x_n)

makes the method 2nd order again. (More generally, at a root of multiplicity p, g'(s) = 1 − 1/p, and multiplying the step f(x_n)/f'(x_n) by p restores 2nd order.)
Properties:
 2nd order convergence
 Evaluation of both f(x_n) and f'(x_n) each step
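A sketch of Newton's method (names and tolerances are our own), run on the same model problem cos x − x = 0. The quadratic convergence shows up as the number of correct digits roughly doubling each step.

```python
import math

# Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n).
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

root = newton(lambda x: math.cos(x) - x,
              lambda x: -math.sin(x) - 1.0,
              x0=0.5)
```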
If f'(x_n) is not known, it can be approximated using

f'(x_n) ≈ (f(x_n) − f(x_{n−1}))/(x_n − x_{n−1}).
Secant Method
x0 is given and x1 is given by false position. Thus,

φ(x_n) = (x_n − x_{n−1})/(f(x_n) − f(x_{n−1})) and x_{n+1} = g(x_n) = x_n − φ(x_n)f(x_n).
Properties:
 Must only evaluate f(x_n) once per step
 Convergence order ≈ 1.618 (so two steps give order ≈ 2.6)
N Equations
Let f(x) = [f_i(x)]_{i=1}^N = 0. Construct a fixed point function g(x) = [g_i(x)]_{i=1}^N from f(x). Replace

|g'(x)| ≤ L < 1, ∀x such that ‖x − s‖∞ ≤ r,

by

|∂g_i(x)/∂x_j| ≤ L/N, L < 1, for all i, j and ‖x − s‖∞ ≤ r.

Equivalent: for i = 1, 2, …, N, the mean value theorem gives

g_i(x) − g_i(y) = Σ_{j=1}^N (∂g_i(m^(i))/∂x_j)(x_j − y_j).

Thus,
|g_i(x) − g_i(y)| ≤ Σ_{j=1}^N |∂g_i(m^(i))/∂x_j| ⋅ ‖x − y‖∞
                 ≤ Σ_{j=1}^N (L/N) ‖x − y‖∞
                 = L ‖x − y‖∞.

Thus, ‖g(x) − g(y)‖∞ ≤ L‖x − y‖∞.
Newton’s Method
Define the Jacobian by J(x) = (∂f_i(x)/∂x_j). If J(x) is nonsingular for ‖x − s‖ < r, then we define
x_{n+1} = x_n − J⁻¹(x_n)f(x_n), or (better)

1. Solve J(x_n)c_n = f(x_n)
2. Set x_{n+1} = x_n − c_n

Quadratic convergence if
1. f''(x) exists for ‖x − s‖ < r
2. J(s) is nonsingular
3. x0 is sufficiently close to s
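The solve-then-update form can be sketched for a 2×2 system, with a hand-coded linear solve via Cramer's rule standing in for a real factorization (an illustration with our own names and test problem, not the notes' example).

```python
# Newton for a 2x2 system: solve J(x_n) c_n = f(x_n), then x_{n+1} = x_n - c_n.
def newton_2d(f, jac, x, y, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        (j11, j12), (j21, j22) = jac(x, y)
        det = j11 * j22 - j12 * j21       # J(x_n) must be nonsingular
        c1 = (f1 * j22 - j12 * f2) / det  # Cramer's rule for J c = f
        c2 = (j11 * f2 - j21 * f1) / det
        x, y = x - c1, y - c2
        if abs(c1) + abs(c2) < tol:
            return x, y
    return x, y

# f1 = x^2 + y^2 - 1, f2 = x - y: root (sqrt(2)/2, sqrt(2)/2).
f = lambda x, y: (x * x + y * y - 1.0, x - y)
jac = lambda x, y: ((2.0 * x, 2.0 * y), (1.0, -1.0))
root = newton_2d(f, jac, 1.0, 0.5)
```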
1D Example: f(x) = cos x − x, x∈[0,1]. To reduce |f(x)| below 0.1×10⁻⁶:

 Bisection: x0 = 0.5, 20 steps
 False position: x0 = 0.5, 7 steps
 Secant: x0 = 0, 6 steps
 Newton: x0 = 0, 5 steps
Zeroes of Polynomials
Let p(x) = a0x^n + a1x^{n−1} + ⋯ + a_{n−1}x + a_n, a0 ≠ 0.

Properties: Computing p(x) and p'(x) is easy. Finding zeroes when n ≤ 4 can be done by formulas, e.g., when n = 2,

x = (−a1 ± √(a1² − 4a0a2))/(2a0).

When n ≥ 5, there are no such formulas.
Theorem 4.5 (Fundamental Theorem of Algebra): Given p(x) with n ≥ 1, there exists at least one r (possibly complex) such that p(r) = 0.
We can uniquely factor p using Theorem 4.5:

p(x) = (x − r1)q1(x), q1 an (n−1)st degree polynomial
     = (x − r1)(x − r2)q2(x), q2 an (n−2)nd degree polynomial
     ⋮
     = a0 ∏_{i=1}^n (x − r_i).

We can prove by induction that there exist no more than n roots.

Suppose that r = a + ib is such that p(r) = 0 (the coefficients being real). Then p(r̄) = 0, where r̄ = a − ib.
Theorem 4.6 (Division Theorem): Let P(x) and Q(x) be polynomials of degree n and m, where 1 ≤ m ≤ n. Then there exist a unique S(x) of degree n−m and a unique polynomial R(x) of degree m−1 or less such that P(x) = Q(x)S(x) + R(x).
Evaluating Polynomials
How do we compute p'(α), p''(α), …, p^(m)(α)? We may need to make a change of variables t = x − α, which leads to

p(x) = b0(x−α)^n + b1(x−α)^{n−1} + ⋯ + b_{n−1}(x−α) + b_n.

Using Taylor's Theorem we know that

b_j = p^{(n−j)}(α)/(n−j)!, 0 ≤ j ≤ n.

We use nested multiplication,

p(x) = x(x(x(⋯x(a0x + a1) + a2) + ⋯ + a_{n−1}) + a_n,

where there are n−1 multiplies by x outside the inner a0x + a1 expression. The cost of evaluating p is
                        multiplies   adds
 nested multiplication       n        n
 direct evaluation         2n−1       n
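Nested multiplication (Horner's rule) can be sketched directly from the parenthesized form above; the function name is our own.

```python
# Horner's rule: n multiplies and n adds for a degree-n polynomial
# with coefficients a0..an, given highest degree first.
def horner(a, x):
    p = a[0]
    for coeff in a[1:]:
        p = p * x + coeff
    return p

# p(x) = 2x^3 - 6x^2 + 2x - 1 evaluated at x = 3 gives 5.
value = horner([2, -6, 2, -1], 3.0)
```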
Synthetic Division
To evaluate p(α), α ∈ ℝ:

b0 = a0
b_j = α b_{j−1} + a_j, 1 ≤ j ≤ n.

Then b_n = p(α) for the same cost as nested multiplication. We use this method to evaluate p^(m)(α), 0 ≤ m ≤ n. Write

p(x) = (x − α)q_{n−1}(x) + r0.
Note that q_{n−1}(x) has degree n−1 since a0 ≠ 0 in the definition of p(x). Further, its leading coefficient is also a0. Also, r0 = p(α) by the previous way of writing p(x). So, we can show that

q_{n−1}(x) = b0x^{n−1} + b1x^{n−2} + ⋯ + b_{n−1}.

Further, q_{n−1}(x) = (x−α)q_{n−2}(x) + r1, where r1 = q_{n−1}(α). Substituting,

p(x) = (x−α)²q_{n−2}(x) + (x−α)r1 + r0,

so

p'(α) = q_{n−1}(α) = r1.

We can continue this to get

p(x) = r_n(x−α)^n + ⋯ + r1(x−α) + r0, where r_m = p^(m)(α)/m!, 0 ≤ m ≤ n.
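Repeated synthetic division can be sketched as follows (function name ours): each pass divides by (x−α), records the remainder, and feeds the quotient into the next pass, producing r0, r1, …, r_n.

```python
# Repeated synthetic division about alpha: the successive remainders
# satisfy r_m = p^(m)(alpha)/m!.
def taylor_coeffs(a, alpha):
    """a = [a0, ..., an], highest degree first; returns [r0, r1, ..., rn]."""
    coeffs = list(a)
    remainders = []
    while coeffs:
        b = [coeffs[0]]
        for c in coeffs[1:]:
            b.append(alpha * b[-1] + c)  # synthetic division step
        remainders.append(b[-1])         # remainder r_m
        coeffs = b[:-1]                  # quotient becomes next dividend
    return remainders

# p(x) = x^2 - 22x + 120 about alpha = 11: p(x) = (x-11)^2 - 1,
# so the coefficients come out as r0 = -1, r1 = 0, r2 = 1.
r = taylor_coeffs([1, -22, 120], 11.0)
```

This is the same computation used in the localization example later (p(x) = (x−10)(x−12) shifted to t = x−11).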
Deflation
Find r1 for p(x). Then

p(x) = p1(x)(x − r1).

Now find r2 for p1(x). Then

p(x) = p2(x)(x − r1)(x − r2).

Continue for all r_i. A problem arises on a computer: whatever method we use to find the roots will usually not find the exact roots, but something close. So we really compute r̃1. By the Division Algorithm Theorem,

p(x) = p1(x)(x − r̃1) + p(r̃1), with p(r̃1) ≠ 0 usually.
Now we compute r̃2, which is probably wrong, too (and possibly quite wrong). A better r̂2 can be computed using r̃2 as the initial guess in our zero finding algorithm for p(x). This correction strategy should be used for all r̃_i, i ≥ 2.

Suppose |r1| < |r2| and |r1 − r̃1| = |r2 − r̃2| = e > 0. Then

|r1 − r̃1|/|r1| > |r2 − r̃2|/|r2|,

which implies that we should find the smaller roots first.
Descartes Rule of Signs
In order, write down the signs of the nonzero coefficients of p(x). Then count the number of sign changes and call it m.
Examples:
 p1(x) = x³ − 2x² + 4x + 3: signs + − + +, so m = 2
 p2(x) = x³ + 2x² − 4x − 3: signs + + − −, so m = 1
Rule: Let k be the number of positive real roots of p(x). Then k ≤ m and m − k is a nonnegative even integer.

Example: For p2(x) above, m = 1, so m − k = 0, i.e., k = 1, which implies that p2 has exactly one positive real root.
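Counting the sign changes is a one-liner worth writing down (a sketch with our own names), checked here against the two example polynomials.

```python
# Descartes' rule of signs: m = number of sign changes in the nonzero
# coefficients; the count k of positive real roots satisfies
# k <= m with m - k a nonnegative even integer.
def sign_changes(coeffs):
    signs = [1 if c > 0 else -1 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

m_pos = sign_changes([1, 2, -4, -3])   # p2(x)  = x^3 + 2x^2 - 4x - 3
m_neg = sign_changes([-1, 2, 4, -3])   # p2(-x) = -x^3 + 2x^2 + 4x - 3
```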
Fact: If p(r) = 0, then −r is a root of p(−x). Hence, we can obtain information about the number of negative real roots by looking at p(−x).

Example: p2(−x) = −x³ + 2x² + 4x − 3: m = 2, so m − k = 0 or 2, which implies that p2 has 0 or 2 negative real roots.
Localization of Polynomial Zeroes
Once again, let p(x) = a0x^n + a1x^{n−1} + ⋯ + a_{n−1}x + a_n, a0 ≠ 0.
Theorem 4.7: Given p(x), all of the zeroes of p(x) lie in ∪_{i=1}^n C_i, where

C_n = {z ∈ ℂ: |z| ≤ |a_n/a0|},
C_1 = {z ∈ ℂ: |z + a1/a0| ≤ 1}, and
C_k = {z ∈ ℂ: |z| ≤ 1 + |a_k/a0|}, 2 ≤ k < n.
Corollary 4.8: Given p(x) and r = 1 + max_{1≤i≤n} |a_i/a0|, every zero lies in C = {z ∈ ℂ: |z| ≤ r}.
Note that the circles C2, …, C_n are centered at 0. One big root makes at least one circle large. A change of variable (t = x − α) can help reduce the size of the largest circle.
Example: Let p(x) = (x−10)(x−12) = x² − 22x + 120. Then

C_{2,x} = {z ∈ ℂ: |z| ≤ 120}.

Let α = 11 and generate p^(m)(11), 1 ≤ m ≤ 2. We get p'(x) = 2x − 22, p'(11) = 0, and p''(11) = 2. So, p(x) = (x−11)² − 1 = t² − 1 for t = x − 11. Then

C_{2,t} = {z ∈ ℂ: |z| ≤ 1}.
Theorem 4.9: Given any α such that p'(α) ≠ 0, there exists at least one zero of p(x) in

C = {z ∈ ℂ: |z − α| ≤ n|p(α)/p'(α)|}.
Apply Theorem 4.9 to Newton's method: we already have p(x_m) and p'(x_m) calculated for {x_m}. If

p(x) = a0x^n + a1x^{n−1} + ⋯ + a_{n−1}x + a_n = a0(x − r1)⋯(x − r_n)

and a0 ≠ 0, a_n ≠ 0, then no r_i = 0, 1 ≤ i ≤ n. If |p(s)| ≤ e for some e, then

min_{1≤i≤n} |1 − s/r_i| ≤ (e/|a_n|)^{1/n},

which is an upper bound on the relative error of s with respect to some zero of p(x).
5. Interpolation and Approximation
Assume we want to approximate some function f (x) by a simpler function p(x) .
Example: a Taylor expansion.
Besides approximating f(x), p(x) may be used to approximate f^(m)(x), m ≥ 1, or ∫ f(x) dx.
Polynomial interpolation
p(x) = a0x^n + a1x^{n−1} + ⋯ + a_{n−1}x + a_n, a0 ≠ 0. Most of the theory relies on the Division Algorithm Theorem: p has at most n zeroes unless it is identically zero.
Lagrange interpolation
Given {(x_i, f_i)}_{i=1}^n, find p of degree n−1 such that p(x_i) = f_i, i = 1, …, n.

Note that if we can find polynomials d_i(x) of degree n−1 such that for i = 1, …, n

d_i(x_j) = 1 if i = j, 0 if i ≠ j,

then p(x) = Σ_{j=1}^n f_j d_j(x) is a polynomial of degree n−1 and

p(x_i) = Σ_{j=1}^n f_j d_j(x_i) = f_i.
There are many ways of writing the solution to the Lagrange interpolation problem.
The first one is

d_i(x) = ∏_{j=1, j≠i}^n (x − x_j)/(x_i − x_j).

d_i(x) has n−1 factors (x − x_j), so d_i(x) is a polynomial of degree n−1. Further, it satisfies the remaining requirements.
Examples:

n = 2: p(x) = f1 (x − x2)/(x1 − x2) + f2 (x − x1)/(x2 − x1)

n = 3: p(x) = f1 (x − x2)(x − x3)/((x1 − x2)(x1 − x3)) + ⋯

n ≥ 3: very painful to convert p(x) into the form Σ a_i x^i.
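Evaluating the Lagrange form directly from the basis polynomials d_i(x) is straightforward, if O(n²) per point (a sketch with our own names, fine for small n).

```python
# Lagrange interpolation via the basis polynomials d_i(x).
def lagrange_eval(xs, fs, x):
    total = 0.0
    for i in range(len(xs)):
        d = 1.0
        for j in range(len(xs)):
            if j != i:
                d *= (x - xs[j]) / (xs[i] - xs[j])  # builds d_i(x)
        total += fs[i] * d
    return total

# Data from f(x) = x^2 at three nodes: degree-2 data is reproduced
# exactly, so the interpolant at x = 2 must return 4.
xs, fs = [0.0, 1.0, 3.0], [0.0, 1.0, 9.0]
val = lagrange_eval(xs, fs, 2.0)
```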
The second solution is an algorithm: assume that p(x) has the Newton form,
p(x) = b0 + b1(x − x1) + b2(x − x1)(x − x2) + ⋯ + b_{n−1}(x − x1)⋯(x − x_{n−1}).

Note that

f1 = p(x1) = b0,
f2 = p(x2) = b0 + b1(x2 − x1), or b1 = (f2 − b0)/(x2 − x1),
⋮
b_{i−1} = (f_i − b0 − b1(x_i − x1) − ⋯ − b_{i−2}(x_i − x1)⋯(x_i − x_{i−2})) / ((x_i − x1)⋯(x_i − x_{i−1})).
No matter how the Lagrange interpolating polynomial is written, we have a theorem that describes its uniqueness.
Theorem 5.1: For fixed {(x_i, f_i)}_{i=1}^n, there exists a unique Lagrange interpolating polynomial.
Proof: Suppose p(x) and q(x) are distinct Lagrange interpolating polynomials. Each has degree n−1, and r(x) = p(x) − q(x) is also a polynomial of degree n−1. However, r(x_i) = p(x_i) − q(x_i) = 0, which implies that r has n zeroes. We know it can have at most n−1 zeroes or must be identically zero, so r ≡ 0. QED
Equally spaced x_i's can be disastrous, e.g.,

f(x) = 1/(1 + 25x²), x ∈ [−1,1].

It can be shown that

lim_{n→∞} max_{−1≤x≤1} |f(x) − p_n(x)| = ∞.
See picture (jpeg)…
We can write the Newton form in terms of divided differences.
1st divided difference:

f[x_i, x_{i+1}] = (f_{i+1} − f_i)/(x_{i+1} − x_i)

kth divided difference:

f[x_i, x_{i+1}, …, x_{i+k}] = (f[x_{i+1}, …, x_{i+k}] − f[x_i, x_{i+1}, …, x_{i+k−1}])/(x_{i+k} − x_i)

We can prove that the Newton form coefficients are b_i = f[x1, …, x_{i+1}].
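The divided difference recurrence and the Newton form together give a compact algorithm (a sketch; the in-place table update and names are our own). Evaluation of the Newton form is itself nested multiplication.

```python
# Divided differences computed in place; after the loop,
# b[i] = f[x_0, ..., x_i], the Newton form coefficients.
def newton_coeffs(xs, fs):
    b = list(fs)
    for k in range(1, len(xs)):
        for i in range(len(xs) - 1, k - 1, -1):  # sweep backwards so each
            b[i] = (b[i] - b[i - 1]) / (xs[i] - xs[i - k])  # entry is reused once
    return b

def newton_eval(xs, b, x):
    p = b[-1]
    for i in range(len(b) - 2, -1, -1):  # nested multiplication
        p = p * (x - xs[i]) + b[i]
    return p

xs, fs = [0.0, 1.0, 3.0], [0.0, 1.0, 9.0]  # f(x) = x^2 again
b = newton_coeffs(xs, fs)                  # p(x) = x + x(x-1) = x^2
val = newton_eval(xs, b, 2.0)
```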
We build a divided difference table in which the coefficients are found on the downward slanting diagonal:

x1       f(x1) = b0
                   f[x1,x2] = b1
x2       f(x2)                 f[x1,x2,x3] = b2
                   f[x2,x3]                  f[x1,x2,x3,x4] = b3
x3       f(x3)                 f[x2,x3,x4]       ⋱
⋮        ⋮                     ⋮                    f[x1,…,x_n] = b_{n−1}
                   f[x_{n−2},x_{n−1}]
x_{n−1}  f(x_{n−1})            f[x_{n−2},x_{n−1},x_n]
                   f[x_{n−1},x_n]
x_n      f(x_n)

This table contains a wealth of information about many interpolating polynomials for f(x). For example, the quadratic interpolant of f(x) at x2, x3, x4 is a table lookup starting at f(x2).
Hermite interpolation
This is a generalization of Lagrange interpolation. We assume that {x_i, f_i, f_i'}_{i=1}^n is available, where x1 < x2 < ⋯ < x_n. We seek a p(x) of degree 2n−1 such that for i = 1, …, n two conditions are met:

1. p(x_i) = f_i
2. p'(x_i) = f_i'

There are two solutions. The first solution is as follows:

P(x) = Σ_{j=1}^n f_j h_j(x) + Σ_{j=1}^n f_j' g_j(x),

where g_j(x_i) = 0 and h_j(x_i) = 1 if i = j, 0 if i ≠ j, for all 1 ≤ i, j ≤ n, satisfies condition 1.
Also,

P'(x) = Σ_{j=1}^n f_j h_j'(x) + Σ_{j=1}^n f_j' g_j'(x),

where h_j'(x_i) = 0 and g_j'(x_i) = 1 if i = j, 0 if i ≠ j, for all 1 ≤ i, j ≤ n, satisfies condition 2.

We must find polynomials g_j and h_j of degree 2n−1 satisfying these conditions. Let

H(x) = ∏_{j=1}^n (x − x_j) and ℓ_i(x) = (H(x))²/(x − x_i)².

Note that ℓ_i(x) and ℓ_i'(x) vanish at all of the nodes except x_i and that ℓ_i(x) is a polynomial of degree 2n−2. Put
h_i(x) = ℓ_i(x)(a_i(x − x_i) + b_i)

and determine a_i and b_i so that h_i(x_i) = 1 and h_i'(x_i) = 0: choose

a_i = −ℓ_i'(x_i)/(ℓ_i(x_i))² and b_i = 1/ℓ_i(x_i).

Similarly,

g_i(x) = ℓ_i(x)(x − x_i)/ℓ_i(x_i).
The second solution to the Hermite interpolation problem requires us to write

P(x) = b0 + b1(x − x1) + b2(x − x1)² + b3(x − x1)²(x − x2) + b4(x − x1)²(x − x2)² + ⋯ + b_{2n−1}(∏_{i=1}^{n−1}(x − x_i)²)(x − x_n).

Then

f1 = P(x1) = b0
f1' = P'(x1) = b1
f2 = P(x2) = b0 + b1(x2 − x1) + b2(x2 − x1)²

or

b2 = (f2 − b0 − b1(x2 − x1))/(x2 − x1)²

and so on…

Theorem 5.2: Given {x_i, f_i, f_i'}_{i=1}^n, the Hermite interpolant is unique.
Just as in the Lagrange interpolation case, equally spaced nodes can cause disastrous problems.
Hermite cubics
n = 2, so it is a cubic polynomial. Let h = x2 − x1. Then

b0 = f1, b1 = f1', b2 = h⁻²(f2 − f1 − h f1'), and b3 = h⁻³(h(f1' + f2') − 2(f2 − f1)).
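These four coefficients are simple enough to code up and sanity-check (a sketch with our own names). By construction the resulting cubic matches f and f' at both endpoints; away from the endpoints the error is O(h⁴).

```python
import math

# Hermite cubic on [x1, x2]: p(x) = b0 + b1 t + b2 t^2 + b3 t^2 (t - h),
# with t = x - x1 and h = x2 - x1.
def hermite_cubic(x1, x2, f1, f2, df1, df2):
    h = x2 - x1
    b0 = f1
    b1 = df1
    b2 = (f2 - f1 - h * df1) / h**2
    b3 = (h * (df1 + df2) - 2.0 * (f2 - f1)) / h**3
    def p(x):
        t = x - x1
        return b0 + b1 * t + b2 * t * t + b3 * t * t * (t - h)
    return p

# Data for sin(x) on [0, pi/2]; here h is large, so the midpoint
# error is visible but still small.
p = hermite_cubic(0.0, math.pi / 2, 0.0, 1.0, 1.0, 0.0)
err = abs(p(math.pi / 4) - math.sin(math.pi / 4))
```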
Hermite cubics are by far the most common form of Hermite interpolation that you are likely to see in practice.
Piecewise polynomial interpolation
Piecewise linears:

P(x) = f_i (x − x_{i+1})/(x_i − x_{i+1}) + f_{i+1} (x − x_i)/(x_{i+1} − x_i), x_i ≤ x ≤ x_{i+1}.
----- See picture (jpeg)… -----
Piecewise quadratics: Use Lagrange interpolation of degree 2 over [x1,x2,x3], [x3,x4,x5], … This extends to Lagrange interpolation of degree k−1 over groups of k nodal points.
Piecewise Hermite: Cubics is the best known case. For x_i ≤ x ≤ x_{i+1},

Q(x) = f_i + f_i'(x − x_i)
     + ((f_{i+1} − f_i − f_i'(x_{i+1} − x_i))/(x_{i+1} − x_i)²)(x − x_i)²
     + (((x_{i+1} − x_i)(f_i' + f_{i+1}') − 2(f_{i+1} − f_i))/(x_{i+1} − x_i)³)(x − x_i)²(x − x_{i+1}).
Facts: Q(x) and Q'(x) are continuous, but Q''(x) is not usually continuous.
Cubic spline: We want a piecewise cubic polynomial such that s, s', and s'' are continuous. We write

s_i'' = s''(x_i).

Note that s''(x) must be linear on [x_i, x_{i+1}]. So

s''(x) = s_i'' + ((x − x_i)/(x_{i+1} − x_i))(s_{i+1}'' − s_i'').

Then

s'(x) = s_i' + ∫_{x_i}^{x} s''(t)dt
      = s_i' + s_i''(x − x_i) + ((s_{i+1}'' − s_i'')/(x_{i+1} − x_i)) ⋅ (x − x_i)²/2

and
s(x) = s(x_i) + s_i'(x − x_i) + s_i''(x − x_i)²/2 + ((s_{i+1}'' − s_i'')/(x_{i+1} − x_i)) ⋅ (x − x_i)³/6.

We know s_i = f_i and s_{i+1} = f_{i+1}, so

s_i' = (x_{i+1} − x_i)⁻¹ (f_{i+1} − f_i − s_i''(x_{i+1} − x_i)²/2 − ((s_{i+1}'' − s_i'')/6)(x_{i+1} − x_i)²).

At this point, s(x) can be written down knowing x_i, f_i, s_i''. The s_i'' can be determined by using the continuity condition on s'. Suppose that x_{i+1} − x_i = h, ∀i. Then s'(x_i) = s_i', but for x_{i−1} ≤ x ≤ x_i,

s(x) = s(x_{i−1}) + s_{i−1}'(x − x_{i−1}) + s_{i−1}''(x − x_{i−1})²/2 + ((s_i'' − s_{i−1}'')/h) ⋅ (x − x_{i−1})³/6

and
s'(x_i) = s_{i−1}' + (h/2)(s_i'' + s_{i−1}'').

Equating both expressions for s'(x_i), we get

s_{i−1}'' + 4s_i'' + s_{i+1}'' = (6/h²)(f_{i−1} − 2f_i + f_{i+1}), i = 2, …, n−1.

Imposing s_1'' = s_n'' = 0 gives us n−2 equations in the n−2 remaining unknowns (plus the 2 known values), and the system has a unique solution.
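The tridiagonal system above (diagonals 1, 4, 1) can be solved in O(n) by forward elimination and back substitution. A sketch for equally spaced nodes with natural boundary conditions; the function name is ours.

```python
# Natural cubic spline second derivatives on equally spaced nodes:
# solve s''_{i-1} + 4 s''_i + s''_{i+1} = (6/h^2)(f_{i-1} - 2 f_i + f_{i+1})
# with s''_1 = s''_n = 0, via the tridiagonal (Thomas) algorithm.
def natural_spline_second_derivs(f, h):
    n = len(f)
    rhs = [6.0 / h**2 * (f[i - 1] - 2.0 * f[i] + f[i + 1])
           for i in range(1, n - 1)]
    m = len(rhs)
    diag = [4.0] * m
    for i in range(1, m):               # forward elimination (sub = sup = 1)
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    s2 = [0.0] * m
    for i in range(m - 1, -1, -1):      # back substitution
        s2[i] = (rhs[i] - (s2[i + 1] if i + 1 < m else 0.0)) / diag[i]
    return [0.0] + s2 + [0.0]           # prepend/append the natural BCs

# f(x) = x^2 sampled at x = 0, 1, 2, 3, 4 with h = 1.
s2 = natural_spline_second_derivs([0.0, 1.0, 4.0, 9.0, 16.0], 1.0)
```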
Error Analysis
Consider Lagrange interpolation with {(x_i, f_i)}_{i=1}^n, x1 < x2 < ⋯ < x_n, f_i = f(x_i). We want to know what

p(x) − f(x) = 0 if x ∈ {x_i}, and = ? otherwise.
We write

f(x) = p(x) + g(x)G(x), where g(x) = ∏_{i=1}^n (x − x_i) and G is to be determined.

Theorem 5.3: G(x) = f^(n)(m)/n!, where m depends on x.

Proof: Note that G is continuous at any x ∉ {x_i}_{i=1}^n. Using L'Hôpital's Rule,

G(x_i) = lim_{x→x_i} (f'(x) − p'(x))/g'(x) = (f'(x_i) − p'(x_i))/g'(x_i).

Since g'(x_i) ≠ 0, G(x) is continuous at any node x_i. Let x be fixed and consider
H(z) = f(z) − p(z) − g(z)G(x).

Note that H(x_i) = 0 since f(x_i) = p(x_i) and g(x_i) = 0. By the definition of f(x), H(x) = 0. Now suppose that x ∉ {x_i}_{i=1}^n. Then H(z) vanishes at n+1 distinct points. H'(z) must vanish at some point strictly between each adjacent pair of these points (by Rolle's Theorem), so H'(z) vanishes at n distinct points. Similarly, H''(z) vanishes at n−1 distinct points. We continue this until we have 1 point, m, depending on x, such that H^(n)(m) = 0. Since p(x) is a polynomial of degree n−1,

0 = H^(n)(m) = f^(n)(m) − g^(n)(m)G(x) = f^(n)(m) − n!G(x)

or

G(x) = f^(n)(m)/n!.

Now suppose that x = x_i, some i. Then H(z) only vanishes at n distinct points. But,
H'(z) = f'(z) − p'(z) − g'(z)G(x),

so H'(x_i) = 0 and H'(z) still vanishes at n distinct points. We use the same trick as before. QED
Consider Hermite interpolation with {(x_i, f_i, f_i')}_{i=1}^n, x1 < x2 < ⋯ < x_n, f_i = f(x_i), f_i' = f'(x_i), and p(x) the Hermite interpolant. Set

q(x) = (g(x))² and f(x) = p(x) + q(x)G(x).

Since q'(x_i) = 0 and q''(x_i) ≠ 0, G(x) is continuous and

G(x_i) = (f''(x_i) − p''(x_i))/q''(x_i).
Define

H(z) = f(z) − p(z) − q(z)G(x).

In this case,

H' vanishes at 2n distinct points,
H'' vanishes at 2n−1 distinct points,
⋮

Hence, G(x) = f^(2n)(m)/(2n)!.
Note that interpolation is a linear process. Let Pf be any interpolating function (e.g., Lagrange, Hermite, or spline) using a fixed set of nodes. Then for any functions f and g,
P(αf + βg) = αPf + βPg, any α, β.

Examples:

Lagrange: P(αf + βg) = Σ_{j=1}^n (αf_j + βg_j)d_j(x) = αPf + βPg.

Hermite: Similar to Lagrange.

Splines: Define the Kronecker delta, δ_{ij} = 1 if i = j, 0 if i ≠ j. Let ψ_i(x) be the unique spline function satisfying ψ_i(x_j) = δ_{ij}. If Pf = Σ_{i=1}^n f_i ψ_i(x), then Pf is the interpolatory spline for f(x). Linearity follows as before.
Let Pf be any linear interpolatory process that is exact for polynomials of degree m, i.e., if q(x) is a polynomial of degree ≤ m, then Pq = q. For a given function f(x), Taylor's Theorem says that

f(x) = f(x_1) + f'(x_1)(x − x_1) + ... + f^{(m)}(x_1)(x − x_1)^m / m! + ∫_{x_1}^{x} f^{(m+1)}(t)(x − t)^m / m! dt.
Define

K(x,t) = (x − t)^m / m! for x_1 ≤ t < x, and K(x,t) = 0 for x ≤ t ≤ x_n,

so that

f(x) = ∑_{j=0}^{m} f^{(j)}(x_1)(x − x_1)^j / j! + ∫_{x_1}^{x_n} K(x,t) f^{(m+1)}(t) dt ≡ F(x) + R(x).

Even More Error Analysis

Define C^k([a,b]) ≡ {f: [a,b] → R such that f^{(k)} is continuous on [a,b]}.
Theorem 5.4: If p(x) is a polynomial of degree n−1 that interpolates f ∈ C^n([a,b]) at {x_i}_{i=1}^n ⊆ [a,b], then

f(x) − p(x) = (f^{(n)}(ξ) / n!) W(x), where W(x) = ∏_{i=1}^n (x − x_i).

Tchebyshev Polynomials of the First Kind

Define

T_k(x) = cos(k cos^{−1}(x)), k = 0, 1, 2, ..., x ∈ [−1,1],

where T_0(x) ≡ 1 and T_1(x) ≡ x.

Choose x = cos(θ), 0 ≤ θ ≤ π. Then T_k(x) = T_k(cos(θ)) = cos(kθ) and we get a three term recurrence:
T_{k+1}(x) = cos((k+1)θ) = 2cos(θ)cos(kθ) − cos((k−1)θ) = 2x T_k(x) − T_{k−1}(x), k ≥ 1.

Hence,

T_2(x) = 2x^2 − 1
T_3(x) = 4x^3 − 3x
⋮

We can verify inductively that T_k(x) is a kth order polynomial. The leading coefficient of T_k(x) is 2^{k−1} and T_k(x_i) = 0, 0 ≤ i ≤ k−1, when

x_i = cos((2i+1)π / (2k)).
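The recurrence and the root formula are easy to sanity check numerically. The sketch below (Python; the helper name is ours) evaluates T_k by the three term recurrence and compares it with the defining cosine formula, then checks the zeros.

```python
import math

def cheb_T(k, x):
    """T_k(x) via the three term recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    if k == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(k - 1):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

# Agreement with the defining formula cos(k acos(x)) on [-1, 1] ...
for k in range(6):
    for x in [-1.0, -0.3, 0.0, 0.5, 1.0]:
        assert abs(cheb_T(k, x) - math.cos(k * math.acos(x))) < 1e-12

# ... and the zeros x_i = cos((2i+1) pi / (2k)), here for k = 5.
for i in range(5):
    assert abs(cheb_T(5, math.cos((2 * i + 1) * math.pi / 10))) < 1e-12
```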
Finally, for x ∈ [−1,1] we can show that ||T_k||_∞ ≤ 1. For y_i = cos(iπ/k), T_k(y_i) = cos(iπ) = (−1)^i, so, in fact, ||T_k||_∞ = 1. From Theorem 5.4, we can prove that ||W||_∞ is minimized when W(x) = 2^{−n} T_{n+1}(x).

Translating Intervals

Suppose the problem on [c,d] needs to be reformulated on [a,b].

Example: Tchebyshev only works on [a,b] = [−1,1]. We use a straight line transformation: t ∈ [c,d], x ∈ [a,b]. Hence,

x = mt + β, where m = (b−a)/(d−c) and β = (ad − bc)/(d−c).

Example: Tchebyshev with x ∈ [a,b], a < b arbitrary. Then
x = ((b−a)/2) t + (b+a)/2 or t = (2/(b−a)) (x − (b+a)/2) = 2(x−a)/(b−a) − 1.
The shifted Tchebyshev polynomials are defined by

T̃_k(x) = T_k(t) = T_k(2(x−a)/(b−a) − 1) = cos(k cos^{−1}(2(x−a)/(b−a) − 1)).

Since

T_k(t_i) = 0 for t_i = cos((2i+1)π / (2k)), 0 ≤ i ≤ k−1,

then

x_i = ((b−a)/2) t_i + (b+a)/2, 0 ≤ i ≤ k−1
are zeroes of T̃_k(x). Further,

T̃_0(x) = 1
T̃_1(x) = 2(x−a)/(b−a) − 1
⋮
T̃_{k+1}(x) = T_{k+1}(t) = 2t T_k(t) − T_{k−1}(t) = 2(2(x−a)/(b−a) − 1) T̃_k(x) − T̃_{k−1}(x), k ≥ 1.

We can prove that the leading coefficient of T̃_k(x) is 2^{k−1} (2/(b−a))^k, k ≥ 1. Further, we know that the minimizing W(x) = 2^{−n} (2/(b−a))^{−(n+1)} T̃_{n+1}(x) from Theorem 5.4.
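As an illustration (Python sketch; the function names are ours), the Tchebyshev nodes on an arbitrary [a,b] follow directly from mapping the zeros t_i:

```python
import math

def shifted_cheb_nodes(a, b, k):
    """x_i = ((b-a)/2) t_i + (b+a)/2 with t_i = cos((2i+1) pi / (2k)), 0 <= i < k."""
    return [((b - a) / 2) * math.cos((2 * i + 1) * math.pi / (2 * k)) + (b + a) / 2
            for i in range(k)]

def shifted_cheb_T(k, x, a, b):
    """Shifted Chebyshev polynomial via its cosine form on [a, b]."""
    return math.cos(k * math.acos(2 * (x - a) / (b - a) - 1))

# Every node lies in [a, b] and is a zero of the shifted polynomial.
for x in shifted_cheb_nodes(1.0, 4.0, 6):
    assert 1.0 <= x <= 4.0 and abs(shifted_cheb_T(6, x, 1.0, 4.0)) < 1e-12
```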
Tensor Product Interpolation

Given {x_i}_{i=0}^n and {y_j}_{j=0}^m, interpolate f(x,y) over {(x_i, y_j)}, giving us

p(x,y) = ∑_{i=0}^n ∑_{j=0}^m a_ij l_ij(x,y).

The bi-Lagrangian basis is defined by l_ij(x,y) = l_i(x) l_j(y), where l_i and l_j are the one dimensional Lagrange basis functions along the x and y axes, respectively, and a_ij = f(x_i, y_j).

The bi-Hermite and bi-spline can be defined similarly.
Orthogonal Polynomials and Least Squares Approximation

We approximate f(x) on [a,b] given {(x_i, f_i)}_{i=0}^m. Define

℘_n = {p(x) | p a polynomial of degree ≤ n}.

Problem A: Let {w_i}_{i=0}^m, w_i > 0 (weights), m > n. Find p*(x) ∈ ℘_n which minimizes

∑_{i=0}^m w_i [p*(x_i) − f_i]^2.

Problem B: Let w(x) ∈ C([a,b]) be positive on (a,b). Find p*(x) ∈ ℘_n which minimizes

∫_a^b w(x) [p*(x) − f(x)]^2 dx.

Properties of both: the solutions are unique and are "easy" to compute by a finite number of steps in closed formulas (which is not true of the more general minimax problem min_{p*} max_x |p*(x) − f(x)|).
Define

< f, g >_1 = ∑_{j=0}^m w_j f(x_j) g(x_j)
< f, g >_2 = ∫_a^b w(x) f(x) g(x) dx
||f|| = sqrt(< f, f >) (either inner product)

Note that ||f|| is a true norm for <·,·>_2, but is only a semi-norm for <·,·>_1 (a continuous f can vanish at every node x_j without vanishing identically).

Theorem 5.5 (Cauchy-Schwarz): Let f, g ∈ C([a,b]). Then

|< f, g >| ≤ ||f|| · ||g||.

Proof: If < g, g > = 0, then < f, g > = 0. If < g, g > ≠ 0, then

0 ≤ < f − αg, f − αg > = < f, f > − 2α< f, g > + α^2 < g, g >.
Use α = < f, g > / < g, g >. Then 0 ≤ < f, f > − < f, g >^2 / < g, g >. QED

Definitions: p and q are orthogonal if and only if < p, q > = 0. p and q are orthonormal if and only if < p, q > = 0 and ||p|| = ||q|| = 1.

Consider S = {1, x, x^2, ..., x^n}, the set of monomials. The elements are not orthogonal to each other under either < f, g >_1 or < f, g >_2. Yet any p ∈ ℘_n is a linear combination of the elements of S. We can transform S into a set of orthogonal polynomials using the

Gram-Schmidt Algorithm: Given S, let

q_0(x) ≡ 1, p_0(x) = q_0(x) / ||q_0||
q_k(x) = x^k − ∑_{j=0}^{k−1} < x^k, p_j > p_j(x), p_k(x) = q_k(x) / ||q_k||.

Then {q_k}_{k=0}^n is orthogonal and {p_k}_{k=0}^n is orthonormal.

Note that for <·,·>_2 with w(x) ≡ 1, ||1|| = (∫_a^b 1^2 dx)^{1/2} = sqrt(b−a).

Let p(x) = ∑_{k=0}^r a_k x^k, r ≤ n. Then

x^k = q_k(x) + ∑_{j=0}^{k−1} < x^k, p_j > p_j(x) = ||q_k|| p_k(x) + ∑_{j=0}^{k−1} < x^k, p_j > p_j(x).

Using this expression, we can write p(x) as

p(x) = ∑_{j=0}^r < p, p_j > p_j(x)

since, writing p(x) = ∑_{j=0}^r β_j p_j(x),

< p_k, p > = ∑_{j=0}^r β_j < p_j, p_k > = β_k < p_k, p_k > = β_k.
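A minimal sketch of the Gram-Schmidt construction under the discrete inner product < f, g >_1 (Python; the coefficient-list representation and all names are ours):

```python
def polyval(c, x):
    """Horner evaluation of c[0] + c[1] x + ... + c[-1] x^(len(c)-1)."""
    r = 0.0
    for ck in reversed(c):
        r = r * x + ck
    return r

def gram_schmidt(inner, n):
    """Orthonormalize the monomials 1, x, ..., x^n with respect to the given
    inner product on coefficient lists; returns the orthonormal p_k."""
    ps = []
    for k in range(n + 1):
        q = [0.0] * k + [1.0]                 # the monomial x^k
        for p in ps:                          # q_k = x^k - sum_j <x^k, p_j> p_j
            coef = inner(q, p)
            q = [qi - coef * (p[i] if i < len(p) else 0.0)
                 for i, qi in enumerate(q)]
        norm = inner(q, q) ** 0.5
        ps.append([qi / norm for qi in q])    # p_k = q_k / ||q_k||
    return ps

# Discrete inner product <f,g>_1 with unit weights on 11 equally spaced nodes.
nodes = [i / 10 for i in range(11)]
inner1 = lambda c1, c2: sum(polyval(c1, x) * polyval(c2, x) for x in nodes)

ps = gram_schmidt(inner1, 3)
for i in range(4):
    for j in range(4):
        assert abs(inner1(ps[i], ps[j]) - (1.0 if i == j else 0.0)) < 1e-9
```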
Best Least Squares Approximation

Theorem 5.6: Let <·,·> be either <·,·>_1 or <·,·>_2 and ||f|| = sqrt(< f, f >). If f ∈ C([a,b]), then the polynomial p*(x) ∈ ℘_n that minimizes ||f − p*|| over ℘_n is given by

p*(x) = ∑_{j=0}^n < f, p_j > p_j(x),
where {p_j(x)}_{j=0}^n is the orthonormal set of polynomials generated by Gram-Schmidt.

Proof: Let p(x) ∈ ℘_n. Then p(x) = ∑_{j=0}^n α_j p_j(x). Further,

0 ≤ ||f − p||^2
 = < f − ∑_{j=0}^n α_j p_j, f − ∑_{j=0}^n α_j p_j >
 = < f, f > − 2 ∑_{j=0}^n α_j < f, p_j > + ∑_{j=0}^n α_j^2
 = < f, f > − ∑_{j=0}^n < f, p_j >^2 + ∑_{j=0}^n (α_j − < f, p_j >)^2,

which is minimized when we choose α_j = < f, p_j >. QED
Note: The coefficients a_j = < f, p_j > are called the generalized Fourier coefficients.

Facts:

f(x) − p_n*(x) ⊥ ℘_n
||f||^2 ≥ ∑_{j=0}^n < f, p_j >^2

Efficient Computation of p_n*(x)
We can show that we have a three term recurrence:

q_k(x) = (x − a_k) q_{k−1}(x) + b_k q_{k−2}(x), k ≥ 2,

where

a_k = < x q_{k−1}, q_{k−1} > / < q_{k−1}, q_{k−1} > and b_k = −< q_{k−1}, x q_{k−2} > / < q_{k−2}, q_{k−2} >.

This gives us

p_n*(x) = (∑_{j=0}^{n−1} < f, p_j > p_j(x)) + < f, p_n > p_n(x) = p_{n−1}*(x) + < f, p_n > p_n(x).

So,
p_n*(x) = ∑_{j=0}^n < f, p_j > p_j(x)

is equivalent and may be less sensitive to roundoff error.

Also,

p_n*(x) = ∑_{j=0}^n < f, p_j > p_j(x) = ∑_{j=0}^n (< f, q_j > / < q_j, q_j >) q_j(x) = ∑_{j=0}^n c_j q_j(x).

If we precompute {a_j, b_j, c_j}, then evaluating p_n*(x) only costs 2n+1 multiplies.
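The evaluation can be sketched as follows (Python; the names, and the convention that a[0], b[0], b[1] are unused placeholders, are ours):

```python
def eval_orthopoly_series(x, a, b, c):
    """Evaluate p(x) = sum_j c[j] q_j(x) where q_0 = 1, q_1 = x - a[1], and
    q_k = (x - a[k]) q_{k-1} + b[k] q_{k-2} for k >= 2.  Only two multiplies
    per recurrence step; a[0], b[0], b[1] are unused placeholders."""
    q_prev, q = 1.0, x - a[1]
    total = c[0] * q_prev + c[1] * q
    for k in range(2, len(c)):
        q_prev, q = q, (x - a[k]) * q + b[k] * q_prev
        total += c[k] * q
    return total

# With all a_k = b_k = 0 the q_k collapse to monomials, so the series is an
# ordinary polynomial: 1 + 2x + 3x^2 + 4x^3 + 5x^4 at x = 2 is 129.
assert eval_orthopoly_series(2.0, [0.0] * 5, [0.0] * 5,
                             [1.0, 2.0, 3.0, 4.0, 5.0]) == 129.0
```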
6. Numerical Integration and Quadrature Rules

Assume f(x) is integrable over [a,b]. Define

I(f) = ∫_a^b f(x) w(x) dx, where w(x) ≥ 0 is a weight function.

Frequently, w(x) ≡ 1. A formula that approximates I(f) is called a numerical integration or quadrature rule. In practice, if g(x) approximates f(x) well enough, then I(g) ≈ I(f).

Interpolatory Quadrature

Let p_n(x) be the Lagrange interpolant of f(x) at {x_i}_{i=0}^n, i.e.,

p_n(x) = ∑_{j=0}^n f(x_j) l_j(x).

Define
Q_n(f) = I(p_n) = ∫_a^b p_n(x) w(x) dx
 = ∫_a^b (∑_{j=0}^n f(x_j) l_j(x)) w(x) dx
 = ∑_{j=0}^n f(x_j) ∫_a^b l_j(x) w(x) dx
 = ∑_{j=0}^n A_j f(x_j),

where the A_j are quadrature weights and the x_j are the quadrature nodes.

Note that if f(x) ∈ ℘_n, then f(x) = p_n(x) ⇒ Q_n(f) = I(f), i.e., the quadrature is exact. If Q_n(f) is exact for polynomials of degree ≤ m, then we say the quadrature rule has precision m. We will develop quadrature rules that have precision > n later (e.g., Gaussian quadrature).

Method of Undetermined Coefficients
If Q_n(f) has precision n, then it is exact for the monomials 1, x, x^2, ..., x^n. Suppose the nodes are no longer fixed. We start with n+1 equations

I(x^k) = ∫_a^b x^k w(x) dx = Q_n(x^k) = ∑_{j=0}^n A_j x_j^k, 0 ≤ k ≤ n,

for our 2n+2 unknowns A_j and x_j. Let k ∈ [0, 2n+1] ∩ Z so we have 2n+2 (nonlinear) equations in 2n+2 unknowns. If the system has a solution, then the rule has precision 2n+1. This is what Gaussian quadrature is based on (which we will get to later).

The Trapezoidal and Simpson's Rules are trivial examples.

Trapezoidal Rule

Let [a,b] = [−h,h], h > 0, w(x) ≡ 1. This rule is derived by direct integration. Take x_0 = −h and x_1 = h. Then

p_1(x) = (1/(2h)) [f(−h)(h−x) + f(h)(h+x)]
Q_1(f) = I(p_1) = ∫_{−h}^{h} p_1(x) dx
 = (1/(4h)) [f(h)(h+x)^2 |_{−h}^{h} − f(−h)(h−x)^2 |_{−h}^{h}]
 = h [f(−h) + f(h)].
Simpson’s Rule
This rule is derived using undetermined coefficients. Let [a,b] = [−h,h], h > 0, w(x) ≡ 1, x_0 = −h, x_1 = 0, and x_2 = h. We force

Q_2(f) = ∑_{j=0}^2 A_j f(x_j) = ∫_{−h}^{h} f(x) dx for f(x) = 1, x, x^2.

Then
I(1) = ∫_{−h}^{h} 1 dx = 2h = A_0 + A_1 + A_2
I(x) = ∫_{−h}^{h} x dx = 0 = −A_0 h + A_2 h
I(x^2) = ∫_{−h}^{h} x^2 dx = 2h^3/3 = A_0 h^2 + A_2 h^2

Solving this 3×3 system of linear equations gives us A_0 = A_2 = h/3 and A_1 = 4h/3.

Note that I(x^3) = 0 = Q_2(x^3), but I(x^4) ≠ Q_2(x^4), so Simpson's Rule has precision 3.
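The little 3×3 moment system can be solved exactly in rational arithmetic; the sketch below (Python, with h = 1; helper names are ours) reproduces the weights.

```python
from fractions import Fraction as F

def solve3(M, rhs):
    """Gauss-Jordan elimination on a 3x3 system (pivots are nonzero here)."""
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for i in range(3):
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        for j in range(3):
            if j != i:
                factor = A[j][i]
                A[j] = [vj - factor * vi for vj, vi in zip(A[j], A[i])]
    return [A[i][3] for i in range(3)]

# Moment equations for f = 1, x, x^2 on [-h, h] with nodes -h, 0, h.  Take
# h = 1 and work in exact rationals; the weights scale linearly with h.
h = F(1)
M = [[F(1), F(1), F(1)],     # A0 + A1 + A2     = 2h
     [-h,   F(0), h],        # -A0 h + A2 h     = 0
     [h*h,  F(0), h*h]]      # A0 h^2 + A2 h^2  = 2h^3/3
rhs = [2 * h, F(0), 2 * h**3 / 3]
A0, A1, A2 = solve3(M, rhs)
assert (A0, A1, A2) == (F(1, 3), F(4, 3), F(1, 3))
```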
What Does Increasing n Do for You?

Theorem 6.1: For any n ∈ N and f ∈ C([a,b]), let Q_n(f) = ∑_{j=0}^n A_j^{(n)} f(x_j) be an interpolatory quadrature derived by direct integration. Then there exists a constant K such that

∑_{j=0}^n |A_j^{(n)}| ≤ K, ∀n ⇔ lim_{n→∞} Q_n(f) = I(f), ∀f ∈ C([a,b]).

Justification for Positive Weights

We must have ∫_a^b w(x) dx > 0. Further,

0 < I(1) = ∫_a^b w(x) dx = Q_n(1) = ∑_{j=0}^n A_j^{(n)}.

If A_j^{(n)} ≥ 0, 0 ≤ j ≤ n, and we can choose a set of x_j's to get this, then Theorem 6.1 guarantees convergence. All positive weights are good because they reduce
roundoff errors, since we ought to have as many roundoffs on the high side as on the low side, thus canceling errors. Finally, we expect roundoff to be minimized when the A_j^{(n)}'s are (nearly) equal.
Translating Intervals

We will derive a formula on a specific interval, e.g., [−1,1], and then apply it to another interval [a,b]. Suppose that we have

Q_n(g) = ∑_{j=0}^n A_j g(t_j), which approximates ∫_{−1}^{1} g(t) dt,

and we want ∫_a^b f(x) dx. Set x = αt + β, α = (b−a)/2, and β = (b+a)/2. Then

∫_a^b f(x) dx = ∫_{−1}^{1} α f(αt + β) dt. Let g(t) = α f(αt + β). Then

Q_n(g) = ∑_{j=0}^n α A_j f(αt_j + β) approximates I(g).
So,

Q_n*(f) = ∑_{j=0}^n A_j* f(x_j) = ((b−a)/2) ∑_{j=0}^n A_j f(x_j), where

x_j = ((b−a)/2) t_j + (b+a)/2.
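The node/weight mapping is mechanical; a small Python sketch (names are ours), using Simpson's weights on [−1,1]:

```python
def translate_rule(nodes_t, weights_A, a, b):
    """Map a rule (t_j, A_j) approximating the integral over [-1, 1] to [a, b]:
    x_j = ((b-a)/2) t_j + (b+a)/2 and A_j* = ((b-a)/2) A_j."""
    alpha, beta = (b - a) / 2, (b + a) / 2
    return ([alpha * t + beta for t in nodes_t],
            [alpha * A for A in weights_A])

# Simpson's rule on [-1, 1] has nodes -1, 0, 1 and weights 1/3, 4/3, 1/3.
xs, As = translate_rule([-1.0, 0.0, 1.0], [1/3, 4/3, 1/3], 0.0, 2.0)
approx = sum(A * x ** 2 for A, x in zip(As, xs))
assert xs == [0.0, 1.0, 2.0]
assert abs(approx - 8/3) < 1e-12    # exact: integral of x^2 over [0, 2]
```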
Newton-Cotes Formulas

Assume that the x_i's are equally spaced in [a,b] and that we define a quadrature rule by Q_n(f) = ∑_j A_j f(x_j).
The closed Newton-Cotes formulas Q_n(f) = ∑_{j=0}^n A_j f(x_j) assume that h = (b−a)/n and x_i = a + ih, 0 ≤ i ≤ n. The open Newton-Cotes formulas Q_n(f) = ∑_{j=1}^{n+1} A_j* f(y_j) assume that h = (b−a)/(n+2) and y_i = a + ih, 1 ≤ i ≤ n+1.

Examples:

T(f) = ((b−a)/2) [f(a) + f(b)]   2 point closed Trapezoidal Rule

S(f) = ((b−a)/6) [f(a) + 4f((a+b)/2) + f(b)]   3 point closed Simpson's Rule

∫_{−1}^{1} f(x) dx ≈ (2/3) [2f(−1/2) − f(0) + 2f(1/2)]   3 point open

∫_{−1}^{1} f(x) dx ≈ (1/4) [f(−1) + 3f(−1/3) + 3f(1/3) + f(1)]   4 point closed
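One way to check the stated precisions numerically is to test each rule against the monomial moments (Python sketch; names are ours):

```python
def precision(rule, integral, max_m=10, tol=1e-10):
    """Highest m such that the rule is exact for 1, x, ..., x^m."""
    m = -1
    for k in range(max_m + 1):
        if abs(rule(lambda x: x ** k) - integral(k)) > tol:
            break
        m = k
    return m

# On [-1, 1] the exact moments are 2/(k+1) for even k and 0 for odd k.
moment = lambda k: 2 / (k + 1) if k % 2 == 0 else 0.0

open3 = lambda f: (2/3) * (2 * f(-0.5) - f(0.0) + 2 * f(0.5))        # 3 point open
closed4 = lambda f: (1/4) * (f(-1) + 3*f(-1/3) + 3*f(1/3) + f(1))    # 4 point closed
assert precision(open3, moment) == 3
assert precision(closed4, moment) == 3
```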
For n≥10 , the weights are always of mixed signs. Higher order formulas are not necessarily convergent. Lower order formulas are extremely useful.
Suppose we have p_3(x), the Hermite interpolant of f(x). We want I(p_3), which we can get by observing that

I(p_3) = S(p_3)
 = ((b−a)/2) [f(a) + f(b)] + ((b−a)^2/12) [f'(a) − f'(b)]
 = T(f) + ((b−a)^2/12) [f'(a) − f'(b)]
 ≡ CT(f).

This is known as the Trapezoidal Rule with Endpoint Correction (a real mouthful). It has precision 3.
Error Analysis

Assuming that f ∈ C^{n+1}([a,b]), the error in interpolation is given by

e_n(x) = f(x) − p_n(x) = (f^{(n+1)}(ξ_x)/(n+1)!) W(x), where W(x) = ∏_{i=0}^n (x − x_i).

The error in integration is

e_n = ∫_a^b e_n(x) w(x) dx = ∫_a^b (f^{(n+1)}(ξ_x)/(n+1)!) W(x) w(x) dx.

So,

|e_n| ≤ ||f^{(n+1)}||_∞ · (1/(n+1)!) ∫_a^b |W(x)| w(x) dx.
We can simplify the last equation by applying the Second Mean Value Theorem (which states that for g, h ∈ C((a,b)) such that g does not change sign,

∫_a^b g(x) h(x) dx = h(θ) ∫_a^b g(x) dx for some θ ∈ (a,b))

to the formula for e_n. Hence,

e_T = I(f) − T(f) = (f''(θ)/2!) ∫_a^b (x−a)(x−b) dx = −(f''(θ)/12) (b−a)^3

e_CT = (f^{(4)}(θ)/4!) ∫_a^b (x−a)^2 (x−b)^2 dx = (f^{(4)}(θ)/720) (b−a)^5

e_S = −(f^{(4)}(θ)/90) ((b−a)/2)^5

Composite Rules

What if we wanted a highly accurate rule on [a,b]? The best approach is to divide [a,b] into subintervals, use a low order quadrature rule on each subinterval, and add the results, since high order quadrature rules tend to have problems.
Let a = x_0 < x_1 < ... < x_n = b. Then

I(f) = ∫_a^b f(x) w(x) dx = ∑_{j=0}^{n−1} ∫_{x_j}^{x_{j+1}} f(x) w(x) dx.

Consider w(x) ≡ 1, x_{j+1} − x_j = h. Then for the Trapezoidal Rule,

T_n(f) = (h/2) ∑_{j=0}^{n−1} (f(x_{j+1}) + f(x_j)) = h ∑_{j=1}^{n−1} f(x_j) + (h/2) (f(x_0) + f(x_n))

e_n^T = −(h^3/12) ∑_{j=0}^{n−1} f''(θ_j)
Theorem 6.2: Let g ∈ C([a,b]) and {a_j}_{j=0}^{n−1} be constants of the same sign. If t_j ∈ [a,b], 0 ≤ j < n, then for some θ ∈ [a,b],

∑_{j=0}^{n−1} a_j g(t_j) = g(θ) ∑_{j=0}^{n−1} a_j.

Hence,

e_n^T = −f''(θ) ∑_{j=0}^{n−1} h^3/12 = −f''(θ) n h^3/12 = −(h^2 (b−a)/12) f''(θ).
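The O(h^2) behavior is easy to observe numerically (Python sketch; names are ours): halving h should cut the error by about 4.

```python
import math

def composite_trapezoid(f, a, b, n):
    """T_n(f) = h [f(x_1) + ... + f(x_{n-1})] + (h/2)[f(a) + f(b)], h = (b-a)/n."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + j * h) for j in range(1, n)))

exact = 1.0 - math.cos(1.0)                  # integral of sin over [0, 1]
e8 = abs(composite_trapezoid(math.sin, 0.0, 1.0, 8) - exact)
e16 = abs(composite_trapezoid(math.sin, 0.0, 1.0, 16) - exact)
assert 3.5 < e8 / e16 < 4.5                  # O(h^2): error ratio near 4
```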
Consider Simpson's Rule:

∫_{x_j}^{x_{j+1}} f(x) dx = (h/6) [f(x_j) + 4f((x_j + x_{j+1})/2) + f(x_{j+1})] − (f^{(4)}(θ_j)/90) (h/2)^5, x_j < θ_j < x_{j+1}.

So,
S_n(f) = (h/6) [f(x_0) + f(x_n) + 2 ∑_{j=1}^{n−1} f(x_j) + 4 ∑_{j=0}^{n−1} f((x_j + x_{j+1})/2)]

and

e_n^S = −∑_{j=0}^{n−1} (f^{(4)}(θ_j)/90) (h/2)^5 = −(f^{(4)}(θ)/90) n (h/2)^5 = −((b−a)/180) (h/2)^4 f^{(4)}(θ).
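Likewise for the composite Simpson's rule (Python sketch; names are ours), where halving h should cut the error by about 16:

```python
import math

def composite_simpson(f, a, b, n):
    """S_n(f) over n subintervals of width h, sampling each midpoint."""
    h = (b - a) / n
    x = [a + j * h for j in range(n + 1)]
    mids = sum(f((x[j] + x[j + 1]) / 2) for j in range(n))
    interior = sum(f(x[j]) for j in range(1, n))
    return (h / 6) * (f(a) + f(b) + 2 * interior + 4 * mids)

exact = 1.0 - math.cos(1.0)                  # integral of sin over [0, 1]
e4 = abs(composite_simpson(math.sin, 0.0, 1.0, 4) - exact)
e8 = abs(composite_simpson(math.sin, 0.0, 1.0, 8) - exact)
assert 14.0 < e4 / e8 < 18.0                 # O(h^4): error ratio near 16
```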
Corrected Trapezoidal Rule

CT_n(f) = T_n(f) + (h^2/12) [f'(a) − f'(b)] and

e_n^CT = (h^4 (b−a)/720) f^{(4)}(θ).

The number of function evaluations and order of error over n points is:

T_n(f): N evaluations, O(h^2)
S_n(f): 2N+1 evaluations, O(h^4)
CT_n(f): N evaluations + 2 derivatives (yikes), O(h^4)

We can show that

lim_{n→∞} T_n(f) = lim_{n→∞} S_n(f) = lim_{n→∞} CT_n(f) = I(f).
Adaptive Quadrature

Suppose we want I(f) to within an error tolerance of ε > 0, with an automatic procedure to accomplish this feat. Consider S_n(f).

Motivation: Suppose f(x) is badly behaved only over [α,β] ⊂ [a,b], where [α,β] is a small part of [a,b]. Then S_n(f) over [a,α] and [β,b] will be accurate for small n's, but S_n(f) over [a,b] may be a very poor approximation to I(f).
Doubling n will not necessarily increase accuracy over [a,α] and [β,b], where it was already acceptable, and we still may not get an acceptable approximation over [α,β]. Instead, we want to subdivide [α,β] and work hard just there while doing minimal work in [a,α] and [β,b]... and we do not want to have to know where [α,β] is in advance!

Adaptive quadrature packages accept [a,b], f, and ε and return EST, which supposedly satisfies

|∫_a^b f(x) dx − EST| ≤ ε.

An error sensing mechanism is used on intermediate steps to control the overall quadrature error. For instance, if [c,d] ⊆ [a,b] and H = d−c, then

∫_c^d f(x) dx − S(f) = −(f^{(4)}(θ)/90) (H/2)^5 and ∫_c^d f(x) dx − S_2(f) = −(f^{(4)}(γ)/90) · 2 · (H/4)^5,
where θ, γ ∈ [c,d]. The critical (and sometimes erroneous) assumption is that f^{(4)}(x) ≈ K, K constant over [c,d]. This is true when [c,d] is small in comparison to how rapidly f^{(4)}(x) varies in [c,d]. Set

I_cd(f) = ∫_c^d f(x) dx.

Then

I_cd(f) − S_2(f) ≈ (1/16) [I_cd(f) − S(f)],

which means that S_2(f) is 16 times more accurate than S(f) when f^{(4)}(x) is well behaved on [c,d]. So,

16 [I_cd − S_2] ≈ [I_cd − S] or 15 [I_cd − S_2] ≈ S_2 − S.
We know to compute both S(f) and S_2(f) over [c,d]. Many applications require that EST be very accurate, rather than inexpensive to compute. Hence, we can use a conservative error estimator of the form

|I_cd(f) − S_2(f)| ≤ (1/2) |S_2(f) − S(f)|.

Algorithm apparent: Compute S(f) and S_2(f) over [c,d].
1. If the error is acceptable, then add the estimate of I_cd(f) into EST.
2. Otherwise, divide [c,d] into two equal sized intervals and try again in both intervals. The expected error on both intervals is reduced by a factor of 32.

The real estimator must depend on the size of [c,d], however. A good choice is

(1/2) |S_2(f) − S(f)| ≤ ε (d−c)/(b−a).
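The algorithm apparent with the interval-proportional estimator can be sketched recursively (Python; all names are ours, and a production code would also enforce a minimum interval length):

```python
import math

def simpson(f, c, d):
    """Simpson's rule S(f) on [c, d]."""
    return (d - c) / 6 * (f(c) + 4 * f((c + d) / 2) + f(d))

def adaptive_simpson(f, a, b, eps, c=None, d=None):
    """Accept S2 on [c, d] when (1/2)|S2 - S| <= eps (d - c)/(b - a);
    otherwise bisect [c, d] and recurse on both halves."""
    if c is None:
        c, d = a, b
    mid = (c + d) / 2
    S = simpson(f, c, d)
    S2 = simpson(f, c, mid) + simpson(f, mid, d)
    if 0.5 * abs(S2 - S) <= eps * (d - c) / (b - a):
        return S2
    return (adaptive_simpson(f, a, b, eps, c, mid) +
            adaptive_simpson(f, a, b, eps, mid, d))

est = adaptive_simpson(math.sin, 0.0, math.pi, 1e-8)
assert abs(est - 2.0) < 1e-7    # integral of sin over [0, pi] is 2
```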
Theorem 6.3: This estimator will eventually produce an interval [c,d] that is acceptable.
Proof: Every time we halve the interval [c,d], the quadrature error decreases by a factor of 32. Set

err(c,d) = |∫_c^d f(x) dx − EST(c,d)|.

If

err(a,z) ≤ ε (z−a)/(b−a) and err(z,t) ≤ ε (t−z)/(b−a),

then

err(a,t) ≤ err(a,z) + err(z,t) ≤ ε (z−a)/(b−a) + ε (t−z)/(b−a) = ε (t−a)/(b−a).

Taking t = b ⇒ err(a,b) ≤ ε. QED

Theorem 6.4: The cost is only two extra function evaluations at each step.
Folk Theorem 6.5: Given any adaptive quadrature algorithm, there exist an infinite number of f(x)'s that will fool the Algorithm Apparent into failing. (Better algorithms work for the usual f(x)'s.)

Proof: Let a = r_1 < s_1 < t_1 < u_1 < v_1 = b be 5 equally spaced points used in computing S(f) and S_2(f). Test

|S_2(f) − S(f)| ≤ 2ε (v_1 − r_1)/(b−a).

If true, then use S_2(f) as an estimate of I_{r_1,v_1}(f).

If false, then retreat to [r_2,v_2], where r_1 = r_2 < s_2 < t_2 < u_2 < v_2 = t_1, equally spaced. Now only evaluations at s_2 and u_2 are necessary if we saved our previous function evaluations. We test |S_2(f) − S(f)| ≤ 2ε (v_2 − r_2)/(b−a). If the test succeeds, then we pass on to interval [t_2,v_2]; otherwise, we work on a new level 3. This process is not guaranteed to succeed. Hence, we need to add an extra condition that

v_i − r_i ≥ HMIN always.

If this fails, then we cannot produce EST. QED
Richardson Extrapolation

This method combines two or more estimates of something to get a better estimate. Suppose

a_0 is estimated by A(h),

where A(h) is computable for any h ≠ 0. Further, we assume that

lim_{h→0} A(h) = a_0.

Finally, we assume that

a_0 = A(h) + ∑_{i=k}^m a_i h^i + C_m(h) h^{m+1},
where the a_i's are independent of h and a_k ≠ 0. Take h_1 = h, h_2 = r h_1, with 0 < r < 1 (r = 1/2 is the most common value). We want to eliminate the h^k term using a combination of A(h) and A(rh) by noting that

a_0 = A(rh) + ∑_{i=k}^m a_i (rh)^i + C_m(rh) (rh)^{m+1}.

We have two definitions of a_0, so we can combine them as (−r^k) times the first definition plus the second, giving

a_0 − r^k a_0 = A(rh) − r^k A(h) + ∑_{i=k}^m a_i (r^i − r^k) h^i + [C_m(rh) r^{m+1} − C_m(h) r^k] h^{m+1}.
Set

b_i = a_i (r^i − r^k)/(1 − r^k)

C̃_m(h) = [C_m(rh) r^{m+1} − C_m(h) r^k]/(1 − r^k)

B(h) = [A(rh) − r^k A(h)]/(1 − r^k)

Then

a_0 = B(h) + ∑_{i=k+1}^m b_i h^i + C̃_m(h) h^{m+1}.

If b_{k+1} ≠ 0, then we can repeat this process to eliminate the h^{k+1} term. Define

A_{0,m} = A(r^m h), m = 0, 1, 2, ...
Then

A_{1,m} = (A_{0,m+1} − r^k A_{0,m})/(1 − r^k) and A_{i+1,m} = (A_{i,m+1} − r^{k+i} A_{i,m})/(1 − r^{k+i}), m = 0, 1, ...
Applications of Richardson Extrapolation
Differentiation is a primary application. Assume that

f'(a) = lim_{h→0} (f(a+h) − f(a))/h.

First, try, for small h, A(h) = (f(a+h) − f(a))/h. The Taylor expansion about x = a gives us

a_0 = A(h) + ∑_{i=1}^m (−f^{(i+1)}(a)/(i+1)!) h^i + (−f^{(m+2)}(θ_h)/(m+2)!) h^{m+1} ≡ A(h) + ∑_{i=1}^m a_i h^i + C_m(h) h^{m+1},

where the a_i's are independent of h and probably unknown.
Second, try A(h) = (f(a+h) − f(a−h))/(2h). We can prove that

A(h) = f'(a) + (f^{(3)}(a)/3!) h^2 + (f^{(5)}(a)/5!) h^4 + ...

We can modify the definition of A_{i+1,m} to use r^2, r^4, r^6, ... Then

B(h) = (A(rh) − r^2 A(h))/(1 − r^2) = f'(a) + b_4 h^4 + ...

The next extrapolation must be of the form (B(rh) − r^4 B(h))/(1 − r^4). So,

Ã_{i+1,m} = (Ã_{i,m+1} − r^{k+2i} Ã_{i,m})/(1 − r^{k+2i}).

Use this formula whenever

a_0 = A(h) + a_k h^k + a_{k+2} h^{k+2} + a_{k+4} h^{k+4} + ...
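For the central difference, whose error series contains only even powers of h, the extrapolation table can be sketched as follows (Python; names are ours):

```python
import math

def central_diff(f, a, h):
    """A(h) = (f(a+h) - f(a-h)) / (2h), error series in even powers of h."""
    return (f(a + h) - f(a - h)) / (2 * h)

def richardson(f, a, h, levels, r=0.5):
    """Eliminate the h^2, h^4, ... terms: at stage i combine neighboring
    column entries with weight w = r^(2i)."""
    A = [central_diff(f, a, (r ** m) * h) for m in range(levels + 1)]
    for i in range(1, levels + 1):
        w = r ** (2 * i)
        A = [(A[m + 1] - w * A[m]) / (1 - w) for m in range(len(A) - 1)]
    return A[0]

# d/dx exp(x) at x = 0 is exactly 1; extrapolation beats the raw difference.
raw_err = abs(central_diff(math.exp, 0.0, 0.1) - 1.0)
ext_err = abs(richardson(math.exp, 0.0, 0.1, 3) - 1.0)
assert ext_err < 1e-9 < raw_err
```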
Romberg Integration

Apply extrapolation to T_n(g) with h = b−a and r = 1/2 to approximate I(g) = ∫_a^b g(x) dx. Define

T_{0,m} = ((b−a)/2^m) [(1/2) g_0 + g_1 + ... + g_{s−1} + (1/2) g_s],

where g_i = g(x_i), x_i = a + i(b−a)/2^m, and s = 2^m. This choice of T_{0,m} eliminates half of the g(x) function evaluations when computing T_{0,m+1}, since every node at level m reappears at level m+1. The error only contains even powers of h. Hence,
T_{1,m} = (T_{0,m+1} − (1/4) T_{0,m})/(1 − 1/4), or in general

T_{i,m} = (T_{i−1,m+1} − (1/4)^i T_{i−1,m})/(1 − (1/4)^i) = T_{i−1,m+1} + (T_{i−1,m+1} − T_{i−1,m})/(4^i − 1).
Continue extrapolation as long as

R_{i,m} = (T_{i,m} − T_{i,m−1})/(T_{i,m+1} − T_{i,m}) ≈ 4^{i+1}.

Roundoff error is the typical culprit for stopping Richardson extrapolation.
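A compact sketch of the tableau (Python; names are ours; for clarity each T_{0,m} is recomputed directly rather than reusing the level-m evaluations):

```python
import math

def romberg(g, a, b, M):
    """T_{0,m}: trapezoid rule with 2^m panels; then
    T_{i,m} = T_{i-1,m+1} + (T_{i-1,m+1} - T_{i-1,m})/(4^i - 1)."""
    row = []
    for m in range(M + 1):
        s = 2 ** m
        h = (b - a) / s
        row.append(h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, s))))
    for i in range(1, M + 1):
        row = [row[m + 1] + (row[m + 1] - row[m]) / (4 ** i - 1)
               for m in range(len(row) - 1)]
    return row[0]

val = romberg(math.sin, 0.0, math.pi, 5)
assert abs(val - 2.0) < 1e-8    # integral of sin over [0, pi] is 2
```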
7. Automatic Differentiation (AD)
This is a technique for numerically evaluating the derivative of a function specified by a computer program. There have been two standard techniques in the past:
   Symbolic differentiation
   Numerical differentiation
Symbolic differentiation is slow, frequently produces many pages of expressions instead of a compact one, and has great difficulty converting a computer program into a single expression. Numerical differentiation uses finite differences, which are subject to roundoff error in the discretization and to cancellation effects. Higher order derivatives exacerbate the difficulties of both techniques.
“Automatic differentiation solves all of the mentioned problems.” Wikipedia
Throughout this section, we follow Wikipedia’s AD description and use its figures.
The primary tool of AD is the chain rule,
$$\frac{df}{dx} = \frac{dg}{dh}\cdot\frac{dh}{dx} \quad\text{for a function } f(x) = g(h(x)).$$
There are two ways to traverse the chain rule:
   From the inside out, computing $\frac{dh}{dx}$ first, known as forward accumulation.
   From the outside in, computing $\frac{dg}{dh}$ first, known as reverse (backward) accumulation.
Assume that any computer program that evaluates a function $\mathbf{y} = F(\mathbf{x})$ can be decomposed into a sequence of simpler, elementary operations, each of which is differentiated using a trivial table lookup procedure. Each elementary partial derivative is evaluated for a particular argument and combined using the chain rule to provide derivative information about $F$ (e.g., gradients, tangents, the Jacobian matrix, etc.) that is numerically exact to working precision. Problems with symbolic mathematics are avoided by using it only on a set of very basic expressions, not on complex ones.
Forward accumulation
First compute $\frac{dh}{dx}$, then $\frac{dg}{dh}$, in $\frac{dg(h(x))}{dx} = \frac{dg}{dh}\cdot\frac{dh}{dx}$.
Example: Find the derivative of $f(x_1,x_2) = x_1 x_2 + \sin(x_1)$. We have to seed the expression to distinguish between the derivatives with respect to $x_1$ and $x_2$.
Original code statements      Added AD statements
w1 = x1                       w1' = 1   (seed)
w2 = x2                       w2' = 0   (seed)
w3 = w1 w2                    w3' = w1' w2 + w2' w1 = 1·x2 + x1·0 = x2
w4 = sin(w1)                  w4' = cos(w1) w1' = cos(x1)·1
w5 = w3 + w4                  w5' = w3' + w4' = x2 + cos(x1)
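The table translates line for line into code. The following sketch is my own (the function name and signature are assumptions, not from the notes): each statement carries its value w and its derivative part w' forward together, and the seed pair selects which partial derivative emerges.

```cpp
#include <cassert>
#include <cmath>

// One forward-mode sweep through f(x1,x2) = x1*x2 + sin(x1), mirroring the
// table's statements w1..w5.  The seed (s1,s2) selects the partial:
// (1,0) gives df/dx1, (0,1) gives df/dx2.
double forward_sweep(double x1, double x2, double s1, double s2) {
    double w1 = x1,           dw1 = s1;               // seeds
    double w2 = x2,           dw2 = s2;
    double w3 = w1 * w2,      dw3 = dw1*w2 + dw2*w1;  // product rule
    double w4 = std::sin(w1), dw4 = std::cos(w1)*dw1; // chain rule
    double w5 = w3 + w4,      dw5 = dw3 + dw4;        // w5 = f, dw5 = directional derivative
    (void)w5;
    return dw5;
}
```

At $(x_1,x_2) = (0.5, 2)$, the seed $(1,0)$ yields $x_2 + \cos(x_1) = 2 + \cos(0.5)$ and the seed $(0,1)$ yields $x_1 = 0.5$, matching the table.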
Forward accumulation traverses the figure from bottom to top to accumulate the result.
In order to compute the gradient of $f$, we have to evaluate both $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$, which corresponds to using the seeds $(w_1', w_2') = (1, 0)$ and $(w_1', w_2') = (0, 1)$, respectively.
The computational complexity of forward accumulation is proportional to the complexity of the original code.
Reverse accumulation
First compute $\frac{dg}{dh}$, then $\frac{dh}{dx}$, in $\frac{dg(h(x))}{dx} = \frac{dg}{dh}\cdot\frac{dh}{dx}$.
Example: As before. We can produce a graph of the steps needed. Unlike forward accumulation, we only need one seed (at the output) and one walk through the graph (from top to bottom this time) to calculate the entire gradient, i.e., half the work of forward accumulation for this example.
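The walk can be sketched explicitly for this example (my own illustrative code, not the notes'; a real reverse-mode tool records the intermediates on a tape automatically): a forward pass stores the values, then a single backward pass with the output seed $\bar{w}_5 = 1$ produces both partial derivatives at once.

```cpp
#include <cassert>
#include <cmath>

// One reverse-mode sweep for f(x1,x2) = x1*x2 + sin(x1).
// Forward pass: values only.  Backward pass: adjoints wNb = df/dwN,
// seeded at the output, yielding the whole gradient in one sweep.
void reverse_sweep(double x1, double x2, double& df_dx1, double& df_dx2) {
    // forward pass
    double w1 = x1, w2 = x2;
    double w3 = w1 * w2;
    double w4 = std::sin(w1);
    double w5 = w3 + w4;  (void)w5;
    // backward pass
    double w5b = 1.0;                  // seed at the output
    double w3b = w5b;                  // from w5 = w3 + w4
    double w4b = w5b;
    double w1b = w4b * std::cos(w1);   // from w4 = sin(w1)
    w1b += w3b * w2;                   // from w3 = w1*w2
    double w2b = w3b * w1;
    df_dx1 = w1b;
    df_dx2 = w2b;
}
```

At $(0.5, 2)$ this returns the same gradient $(2 + \cos(0.5),\ 0.5)$ that forward accumulation needed two sweeps to assemble.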
Superiority condition of forward versus reverse accumulation
Forward accumulation is superior to reverse accumulation for functions $f:\mathbb{R}\to\mathbb{R}^m$, $m \gg 1$. Reverse accumulation is superior to forward accumulation for functions $f:\mathbb{R}^n\to\mathbb{R}$, $n \gg 1$.
Jacobian computation

The Jacobian $J$ of $f:\mathbb{R}^n\to\mathbb{R}^m$ is an $m\times n$ matrix. We can compute the Jacobian using either
   n sweeps of forward accumulation, where each sweep produces a column of $J$, or
   m sweeps of reverse accumulation, where each sweep produces a row of $J$.
Computing the Jacobian with a minimum number of arithmetic operations is known as optimal Jacobian accumulation and has been proven to be an NP-complete problem.
Dual numbers

We define a new arithmetic in which every $x\in\mathbb{R}$ is replaced by $x + x'\varepsilon$, where $x'\in\mathbb{R}$ and $\varepsilon$ is nothing but a symbol satisfying $\varepsilon^2 = 0$. For regular arithmetic, we can show that
$$(x+x'\varepsilon) + (y+y'\varepsilon) = x + y + (x'+y')\varepsilon,$$
$$(x+x'\varepsilon)(y+y'\varepsilon) = xy + xy'\varepsilon + yx'\varepsilon + x'y'\varepsilon^2 = xy + (xy' + x'y)\varepsilon,$$
and similarly for subtraction and division. Polynomials can be calculated using dual numbers:
$$P(x+x'\varepsilon) = p_0 + p_1(x+x'\varepsilon) + \cdots + p_n(x+x'\varepsilon)^n$$
$$= \left(p_0 + p_1 x + \cdots + p_n x^n\right) + \left(p_1 + 2p_2 x + \cdots + n p_n x^{n-1}\right)x'\varepsilon$$
$$= P(x) + P^{(1)}(x)\,x'\varepsilon,$$
where $P^{(1)}(x)$ represents the derivative of $P$ with respect to its argument and $x'$ is an arbitrarily chosen seed.
The dual number based arithmetic we use consists of ordered pairs $\langle x, x'\rangle$ with ordinary arithmetic on the first element and first order differentiation arithmetic on the second element. In general, for a function $f$ we have
$$f(\langle u,u'\rangle, \langle v,v'\rangle) = \langle\, f(u,v),\ f^{(1)}(u,v)\,u' + f^{(2)}(u,v)\,v' \,\rangle,$$
where $f^{(1)}$ and $f^{(2)}$ represent the derivatives of $f$ with respect to its first and second arguments, respectively. Some common expressions are the following:
$$\langle u,u'\rangle + \langle v,v'\rangle = \langle u+v,\ u'+v'\rangle \quad\text{and}\quad \langle u,u'\rangle - \langle v,v'\rangle = \langle u-v,\ u'-v'\rangle,$$
$$\langle u,u'\rangle \cdot \langle v,v'\rangle = \langle uv,\ u'v+uv'\rangle \quad\text{and}\quad \frac{\langle u,u'\rangle}{\langle v,v'\rangle} = \left\langle \frac{u}{v},\ \frac{u'v-uv'}{v^2}\right\rangle,\ v\neq 0,$$
$$\sin\langle u,u'\rangle = \langle \sin(u),\ u'\cos(u)\rangle \quad\text{and}\quad \cos\langle u,u'\rangle = \langle \cos(u),\ -u'\sin(u)\rangle,$$
$$e^{\langle u,u'\rangle} = \langle e^u,\ u'e^u\rangle \quad\text{and}\quad \log\langle u,u'\rangle = \left\langle \log(u),\ \frac{u'}{u}\right\rangle,\ u > 0,$$
$$\langle u,u'\rangle^k = \langle u^k,\ k u^{k-1} u'\rangle \quad\text{and}\quad \left|\langle u,u'\rangle\right| = \langle |u|,\ u'\,\mathrm{sign}(u)\rangle,\ u\neq 0,$$
$$c\in\mathbb{R} \ \Leftrightarrow\ \langle c, 0\rangle.$$

The derivative of $f:\mathbb{R}^n\to\mathbb{R}^m$ at a point $x\in\mathbb{R}^n$ in a direction $x'\in\mathbb{R}^n$ is given by
$$\left(\langle y_1,y_1'\rangle, \ldots, \langle y_m,y_m'\rangle\right) = f\left(\langle x_1,x_1'\rangle, \ldots, \langle x_n,x_n'\rangle\right)$$
using the just defined arithmetic. We can generalize this method to higher order derivatives, but the rules become quite complicated. Truncated Taylor series arithmetic is typically used instead, since the Taylor summands in a series are known coefficients and derivatives of the function in question.
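The pair rules map naturally onto operator overloading. The following is a minimal sketch of my own (the struct name `Dual` and the tiny operator set are assumptions; a real AD library overloads far more operations):

```cpp
#include <cassert>
#include <cmath>

// Dual number <v, d>: v is the value, d the derivative (seed-propagated) part.
struct Dual {
    double v, d;
};

// Pair rules for +, *, and sin, as in the list of common expressions.
Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }
Dual sin(Dual a)               { return {std::sin(a.v), a.d * std::cos(a.v)}; }

// The running example f(x1,x2) = x1*x2 + sin(x1), now evaluated on duals:
// seeding x1 with d = 1 (and x2 with d = 0) makes the result's d part df/dx1.
Dual f(Dual x1, Dual x2) { return x1 * x2 + sin(x1); }
```

Evaluating `f({0.5, 1}, {2, 0})` returns the function value $0.5\cdot 2 + \sin(0.5)$ in the first slot and $\partial f/\partial x_1 = 2 + \cos(0.5)$ in the second, with no symbolic manipulation and no finite-difference cancellation.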
Implementations
Google “automatic differentiation” and just search through the interesting sites.
Oldies, but goodies:
   ADIFOR (Fortran 77)
   ADIC (C, C++)
   OpenAD (Fortran 77/95, C, C++)
   MAD (Matlab)
Typically, the transformation process follows the pattern shown in the figure in the original notes: the AD tool takes the source code for F as input and emits augmented source code that also evaluates the desired derivatives.
9. Monte Carlo Methods