Determining the Roots of Non-Linear Equations

– Part II –

Prof. Dr. Florian Rupp

German University of Technology in Oman (GUtech)

Introduction to Numerical Methods for ENG & CS

(Mathematics IV)

Spring Term 2016

Exercise Session

Reviewing the highlights from last time

Page 123, exercise 1
Find where the graphs of y = 3x and y = exp(x) intersect by finding solutions of exp(x) − 3x = 0 correct to four decimal digits with the bisection method.

Page 123, exercise 1 (reformulated)
Find where the graphs of y = 3x and y = exp(x) intersect by finding solutions of exp(x) − 3x = 0 correct to four decimal digits with Newton’s method.

Computer exercise
Write a MATLAB program that solves exp(x) − 3x = 0 with Newton’s method and plot the resulting error over the number of iterations (convergence plot).

Today, we will focus on algorithms for root determination


Today’s topics:

■ Discussion of Newton’s method in 1D (including its quadratic speed of convergence)

■ Newton’s method in higher dimensions

■ The secant method (including its super-linear speed of convergence)

■ Comparison of the bisection method (w/o regula falsi), Newton’s method and the secant method.

Corresponding textbook chapters: 3.2 and 3.3

Discussion of Newton’s method in 1D

Another way to view Newton’s method (1/3)


■ Suppose again that $x_0$ is an initial approximation of a root of $f$, and let us ask:

What correction $h$ should be added to $x_0$ to obtain the root more precisely?

Obviously, we want $f(x_0 + h) = 0$.

■ If $f$ is sufficiently smooth, it has a Taylor expansion at $x_0$, and we can rewrite $f(x_0 + h) = 0$ as

$$f(x_0 + h) = f(x_0) + h f'(x_0) + \tfrac{1}{2} h^2 f''(x_0) + \dots = 0 .$$

Determining $h$ from this equation is of course not easy.

■ Therefore, we go for an approximation of the correction term $h$ and ignore all but the first two terms of the series expansion:

$$f(x_0) + h f'(x_0) = 0 .$$

(Recall that for an error analysis we now need $f \in C^2$.)



■ The $h$ that solves $f(x_0) + h f'(x_0) = 0$ is of course not the true correction with $f(x_0 + h) = 0$ that we are seeking, but it is an easily computed number:

$$h = -\frac{f(x_0)}{f'(x_0)} .$$

■ Our new approximation is then

$$x_1 = x_0 + h = x_0 - \frac{f(x_0)}{f'(x_0)} ,$$

and the process can be repeated.

■ In retrospect, we see that the Taylor expansion was not needed after all because we only used the first two terms. In the convergence analysis we give next time, it is assumed that $f''$ is continuous in a neighborhood of the root. This assumption enables us to estimate the errors in the process.



■ If Newton’s method is described in terms of a sequence $x_0, x_1, x_2, \dots$, then the following recursive or inductive definition applies:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} .$$

■ Naturally, the interesting question is whether

$$\lim_{n \to \infty} x_n = r ,$$

where $r$ is the desired root of $f$.
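The recursion above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `newton`, the stopping tolerance and the iteration cap are assumptions made here, and the test function exp(x) − 3x is borrowed from the exercise session.

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's iteration x_{n+1} = x_n - f(x_n) / f'(x_n).

    Returns the list of iterates [x0, x1, ...]. The tolerance and the
    iteration cap are illustrative choices, not part of the lecture.
    """
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("f'(x_n) vanished; Newton step undefined")
        x_new = x - f(x) / slope
        xs.append(x_new)
        if abs(x_new - x) <= tol:  # stop once the correction h is tiny
            break
    return xs

# Test function from the exercise session: f(x) = exp(x) - 3x
f = lambda x: math.exp(x) - 3.0 * x
df = lambda x: math.exp(x) - 3.0

iterates = newton(f, df, x0=0.0)
for n, x in enumerate(iterates):
    print(n, x, abs(f(x)))
```

Starting from x0 = 0, the first correction is h = −f(0)/f′(0) = 1/2, so x1 = 0.5; the later iterates approach the smaller of the two intersection points of y = 3x and y = exp(x).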

Introducing quadratic speed of convergence


Definition (Quadratic Speed of Convergence)

A sequence $\{x_n\}_{n \in \mathbb{N}}$ exhibits quadratic (speed of) convergence to a limit $x$ if there is a constant $C \ge 0$ such that

$$|x_{n+1} - x| \le C\,|x_n - x|^2 \quad (\text{for all } n \ge 1) .$$

Example

Suppose, for simplicity, that $C = 1$ and also that $x_n$ is an estimate of $x$ that differs from it by at most one unit in the $k$-th decimal place, i.e., $|x_n - x| \le 10^{-k}$.

Then, quadratic (speed of) convergence implies that $|x_{n+1} - x| \le 10^{-2k}$.

In other words, $x_{n+1}$ differs from $x$ by at most one unit in the $2k$-th decimal place. So $x_{n+1}$ has approximately twice as many correct digits as $x_n$. This is the doubling of the significant digits.
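The doubling of digits can be observed numerically. The following sketch is an assumed example that does not appear on the slides: it applies Newton’s method to f(x) = x² − 2, whose root √2 is known, and prints the ratio |e_{n+1}| / |e_n|², which should stay bounded if the convergence is quadratic.

```python
import math

r = math.sqrt(2.0)   # known root of f(x) = x^2 - 2 (assumed test problem)
x = 1.5              # starting value, chosen for illustration
errors = [abs(x - r)]

for _ in range(3):   # three steps; further steps hit machine precision
    x = x - (x * x - 2.0) / (2.0 * x)
    errors.append(abs(x - r))

# Ratio |e_{n+1}| / |e_n|^2 plays the role of the constant C
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(3)]

for n, e in enumerate(errors):
    print(n, e)
print("ratios:", ratios)
```

For this particular function the ratios settle near 1/(2√2) ≈ 0.354, while the error drops from roughly 1e-1 to roughly 1e-12 in three steps.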

Newton’s method has quadratic (speed of) convergence


Theorem (Quadratic (Speed of) Convergence)

Let $f$, $f'$ and $f''$ be continuous in a neighborhood of a root $r$ of $f$ and let $f'(r) \neq 0$. Then there is a positive $\delta$ with the following property:

If the initial point in Newton’s method satisfies $|r - x_0| \le \delta$, then all subsequent points $x_n$ satisfy the same inequality, converge to $r$, and do so quadratically, i.e.,

$$|x_{n+1} - r| \le c(\delta)\,|x_n - r|^2 ,$$

where the constant $c(\delta)$ is given as

$$c(\delta) := \frac{1}{2}\,\frac{\max_{|x - r| \le \delta} |f''(x)|}{\min_{|x - r| \le \delta} |f'(x)|} .$$

A word of caution


■ Although Newton’s method is truly a marvelous invention, its convergence depends upon hypotheses that are difficult to verify a priori.

■ As we have already seen, in applying Newton’s method consideration must be given to the proper choice of a starting point.

■ Usually, one must have some insight into the shape of the graph of the function. Sometimes a coarse graph is adequate, but in other cases, a step-by-step evaluation of the function at various points may be necessary to find a point near the root.

■ Often several steps of the bisection method are used to obtain a suitable starting point, so that Newton’s method converges more rapidly.

Some cases where Newton’s method fails


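[Figure: three sketches of failure cases. Panel "runaway" (iterates x0, x1, x2), panel "flat point" (starting point x0), panel "cycle" (x0 = x2 and x1 = x3).]

Here, roundoff errors may lead to a spiraling towards the root or away from it.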
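The cycle case can be reproduced with a small experiment. The polynomial x³ − 2x + 2 and the starting point x0 = 0 are assumptions chosen for illustration; they are not taken from the slides.

```python
def newton_steps(f, df, x0, n_steps):
    """Perform a fixed number of Newton steps and return all iterates."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

# Assumed example: f(x) = x^3 - 2x + 2 started at x0 = 0
f = lambda x: x**3 - 2.0 * x + 2.0
df = lambda x: 3.0 * x**2 - 2.0

cycle = newton_steps(f, df, 0.0, 6)
print(cycle)  # the iterates alternate between 0 and 1
```

Here f(0) = 2 and f′(0) = −2 give x1 = 1, while f(1) = 1 and f′(1) = 1 give x2 = 0, so x0 = x2 and x1 = x3 and the iteration never approaches the real root of this polynomial.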

The problem of multiple roots and its remedy


■ The theorem on the speed of convergence of Newton’s method discloses another troublesome hypothesis; namely, $f'(r) \neq 0$.

■ One can indeed show that at a multiple root (where at least $f(r) = f'(r) = 0$) the speed of convergence of Newton’s method becomes just linear, although the method itself remains applicable as long as $f'(x) \neq 0$ for $x \neq r$ in a suitable neighborhood of $r$.

■ Normally, we do not know in advance that the root $r$ we want to detect is a multiple root. If we know that the multiplicity of $r$ is $m$, however, Newton’s method can be accelerated by modifying the update equation to

$$x_{n+1} = x_n - m\,\frac{f(x_n)}{f'(x_n)} .$$
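The effect of the factor m can be checked on a double root. In the sketch below, the function name `newton_multiplicity`, the starting point x0 = 2 and the test function (x − 1)² with m = 2 are assumptions made for illustration.

```python
def newton_multiplicity(f, df, x0, m=1, n_steps=10):
    """Newton steps x_{n+1} = x_n - m f(x_n) / f'(x_n) for a root of multiplicity m."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        slope = df(x)
        if slope == 0.0:  # derivative vanished (here: the root was hit exactly)
            break
        xs.append(x - m * f(x) / slope)
    return xs

# Assumed test case: double root at x = 1
p = lambda x: (x - 1.0) ** 2
dp = lambda x: 2.0 * (x - 1.0)

plain = newton_multiplicity(p, dp, 2.0, m=1)  # regular Newton
accel = newton_multiplicity(p, dp, 2.0, m=2)  # modified update with m = 2

print([abs(x - 1.0) for x in plain])  # error is halved in every step
print(accel)                          # reaches the root after one step
```

For this quadratic the regular iteration only halves the error per step (linear convergence), whereas the modified update lands on the root immediately. For a general function the gain is less dramatic.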

Multiple roots and regions of uncertainty (1/2)


■ The next slide shows the graphs of the polynomials $p_1(x) = x^2 - 2x + 1 = (x - 1)^2$ and $p_2(x) = x^3 - 3x^2 + 3x - 1 = (x - 1)^3$.

■ $p_1$ has a root at 1 with multiplicity 2 and $p_2$ has a root at 1 with multiplicity 3.

■ Both graphs are rather flat at the roots, which slows down the convergence of the regular Newton method.

■ Also, the slide illustrates the graphs of these two non-linear functions together with their regions of uncertainty around the curves (due to computational inaccuracies). So the computed solution could be anywhere within the indicated intervals on the x-axis.

■ This is yet another illustration of the difficulty in obtaining precise roots of non-linear functions when these roots are multiple.

Multiple roots and regions of uncertainty (2/2)

[Figure: two panels showing the graphs y = p1 and y = p2 over the x-range 0 to 2, each with its region of uncertainty.]

Newton’s method in higher dimensions

The key idea of Newton’s method in higher dimensions (1/3)


Many biological, physical or engineering problems involve the solution of systems of $N$ non-linear equations in $N$ unknowns $x_i$, $i = 1, 2, \dots, N$, like

$$\begin{aligned}
f_1(x_1, x_2, \dots, x_N) &= 0 , \\
f_2(x_1, x_2, \dots, x_N) &= 0 , \\
&\;\;\vdots \\
f_N(x_1, x_2, \dots, x_N) &= 0 .
\end{aligned}$$

One approach is to linearize and solve, repeatedly. This is the same strategy used by Newton’s method in solving a single non-linear equation. Not surprisingly, a natural extension of Newton’s method for non-linear systems can be found.



Using matrix-vector notation and defining $F := (f_1, f_2, \dots, f_N)^T$ and $X := (x_1, x_2, \dots, x_N)^T$, we can rewrite the initial system of non-linear equations as $F(X) = 0$.

The natural extension of Newton’s method for non-linear systems is given by the update scheme

$$X_{n+1} = X_n - \left( DF(X_n) \right)^{-1} F(X_n) ,$$

where $DF(X_n)$ is the Jacobian matrix of $F$ at the point $X_n$:

$$DF(X_n) = \left. \begin{pmatrix}
\partial_1 f_1(x_1, \dots, x_N) & \partial_2 f_1(x_1, \dots, x_N) & \dots & \partial_N f_1(x_1, \dots, x_N) \\
\partial_1 f_2(x_1, \dots, x_N) & \partial_2 f_2(x_1, \dots, x_N) & \dots & \partial_N f_2(x_1, \dots, x_N) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 f_N(x_1, \dots, x_N) & \partial_2 f_N(x_1, \dots, x_N) & \dots & \partial_N f_N(x_1, \dots, x_N)
\end{pmatrix} \right|_{X = X_n}$$



Of course, in the update scheme

$$X_{n+1} = X_n - \left( DF(X_n) \right)^{-1} F(X_n)$$

we would not like to set up an inverse matrix explicitly.

To circumvent this, we solve in each step $n$ the Jacobian linear system

$$DF(X_n)\, H_n = F(X_n)$$

for an auxiliary vector $H_n$. The next iterate of Newton’s method then reads

$$X_{n+1} = X_n - H_n .$$

This is Newton’s method for non-linear systems.
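The solve-then-update scheme can be sketched in plain Python as follows. The helper names `solve_linear` and `newton_system`, the tolerance, and the 2×2 test system (x² + y² = 4, x = y) are assumptions for illustration; a library routine for linear systems would normally replace the hand-written Gaussian elimination.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("Jacobian is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def newton_system(F, DF, X0, tol=1e-12, max_iter=50):
    """Newton's method for F(X) = 0 without forming the inverse Jacobian."""
    X = list(X0)
    for _ in range(max_iter):
        H = solve_linear(DF(X), F(X))            # DF(X_n) H_n = F(X_n)
        X = [xi - hi for xi, hi in zip(X, H)]    # X_{n+1} = X_n - H_n
        if max(abs(h) for h in H) <= tol:
            break
    return X

# Assumed test system: x^2 + y^2 - 4 = 0 and x - y = 0
F = lambda X: [X[0] ** 2 + X[1] ** 2 - 4.0, X[0] - X[1]]
DF = lambda X: [[2.0 * X[0], 2.0 * X[1]], [1.0, -1.0]]

root = newton_system(F, DF, [1.0, 1.0])
print(root)  # both components approach sqrt(2)
```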

Illustration of Newton’s method for non-linear systems (1/3)


Example

Discuss Newton’s method for non-linear systems based on the following three-dimensional example:

$$\begin{aligned}
f_1(x_1, x_2, x_3) &= 0 , \\
f_2(x_1, x_2, x_3) &= 0 , \\
f_3(x_1, x_2, x_3) &= 0 .
\end{aligned}$$

Assuming that each of the $f_i$ ($i = 1, 2, 3$) is at least a $C^2$-function, a linear Taylor expansion in the three variables $x_1$, $x_2$ and $x_3$ gives

$$F(x_1 + h_1, x_2 + h_2, x_3 + h_3) = F(x_1, x_2, x_3) + DF(x_1, x_2, x_3)\, H + \dots ,$$

where $H := (h_1, h_2, h_3)^T$ is the vector of step sizes $h_1$, $h_2$ and $h_3$.



Example [cont.]

Suppose the vector $X^{(0)} := (x_1^{(0)}, x_2^{(0)}, x_3^{(0)})^T$ is an approximate solution of $F(X) = 0$, and let $H^{(0)} := (h_1^{(0)}, h_2^{(0)}, h_3^{(0)})^T$ be a correction to this initial guess, still to be computed, such that

$$X^{(1)} := X^{(0)} + H^{(0)} = (x_1^{(0)} + h_1^{(0)},\; x_2^{(0)} + h_2^{(0)},\; x_3^{(0)} + h_3^{(0)})^T$$

is a better approximate solution.

Discarding the higher order terms in the Taylor expansion (recall we need a $C^2$-function for this ansatz) we get

$$0 \approx F(X^{(0)} + H^{(0)}) \approx F(X^{(0)}) + DF(X^{(0)})\, H^{(0)} .$$

Next, we have to assume that the Jacobian matrix $DF(X^{(0)})$ is non-singular.



Example [cont.]

Assuming the Jacobian matrix $DF(X^{(0)})$ to be non-singular gives the correction

$$H^{(0)} = -\left( DF(X^{(0)}) \right)^{-1} F(X^{(0)})$$

and thus

$$X^{(1)} = X^{(0)} + H^{(0)} = X^{(0)} - \left( DF(X^{(0)}) \right)^{-1} F(X^{(0)})$$

as a better approximation of the solution of $F(X) = 0$.

Of course, we can obtain $H^{(0)}$ rather easily by solving the linear system

$$DF(X^{(0)})\, H^{(0)} = -F(X^{(0)}) .$$

Computing the root of a non-linear function (1/2)


Example

We want to use Newton’s method to determine a solution of the planar system

$$\begin{aligned}
f(x, y) &= x^2 + y^2 + 0.6y - 0.16 = 0 , \\
g(x, y) &= x^2 - y^2 + x - 1.6y - 0.14 = 0 ,
\end{aligned}$$

with the starting point $x_0 = 0.6$, $y_0 = 0.25$.

For the given starting point the Jacobian matrix reads

$$DF(x_0, y_0) = \left. \begin{pmatrix} 2x & 2y + 0.6 \\ 2x + 1 & -2y - 1.6 \end{pmatrix} \right|_{(x,y) = (x_0,y_0)} = \begin{pmatrix} 1.2 & 1.1 \\ 2.2 & -2.1 \end{pmatrix}$$

and $f(0.6, 0.25) = 0.4125$ as well as $g(0.6, 0.25) = 0.3575$.

Computing the root of a non-linear function (2/2)


Example [cont.]

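This leads to the linear system

$$\begin{pmatrix} 1.2 & 1.1 \\ 2.2 & -2.1 \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = \begin{pmatrix} 0.4125 \\ 0.3575 \end{pmatrix}$$

which has the solution $h_1 \approx 0.254960$ and $h_2 \approx 0.096862$. Since the right-hand side is $+F(x_0, y_0)$, the correction is subtracted, and the new approximation reads

$$x_1 = x_0 - h_1 = 0.345040 , \qquad y_1 = y_0 - h_2 = 0.153138 .$$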

Example [cont.]

k   x_k        y_k
0   0.6        0.25
1   0.345040   0.153138
2   0.277531   0.122463
3   0.271885   0.119664
4   0.271845   0.119643
5   0.271845   0.119643

This nicely illustrates quadratic convergence: the number of correct digits roughly doubles from one step to the next.
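The iteration for this planar example can be re-run with a few lines of Python. The sketch below solves each 2×2 Jacobian system with Cramer’s rule, which is an assumption made here for brevity; the function names are likewise illustrative.

```python
def f(x, y):
    return x**2 + y**2 + 0.6 * y - 0.16

def g(x, y):
    return x**2 - y**2 + x - 1.6 * y - 0.14

def newton_step(x, y):
    """One Newton step: solve DF * (h1, h2)^T = (f, g)^T, then subtract."""
    a, b = 2.0 * x, 2.0 * y + 0.6          # first row of the Jacobian
    c, d = 2.0 * x + 1.0, -2.0 * y - 1.6   # second row of the Jacobian
    det = a * d - b * c
    fv, gv = f(x, y), g(x, y)
    h1 = (fv * d - b * gv) / det           # Cramer's rule
    h2 = (a * gv - c * fv) / det
    return x - h1, y - h2

table = [(0.6, 0.25)]
for _ in range(5):
    table.append(newton_step(*table[-1]))

for k, (xk, yk) in enumerate(table):
    print(f"{k}  {xk:.6f}  {yk:.6f}")
```

Printing six decimals makes it easy to compare the computed iterates with the values quoted in the example.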

The Secant Method

The key idea of the secant method (1/3)


■ We now consider a general-purpose procedure that converges almost as fast as Newton’s method. This method mimics Newton’s method, but avoids the calculation of derivatives.

■ Recall that Newton’s iteration defines $x_{n+1}$ as

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} .$$

■ In the secant method, we replace $f'(x_n)$ by an approximation that is easy to compute:

$$f'(x) \approx \frac{f(x + h) - f(x)}{h} = \frac{f(x_{n-1}) - f(x_n)}{x_{n-1} - x_n} ,$$

where we take $x = x_n$ and $h = x_{n-1} - x_n$.

(The right-hand side is nothing other than the slope of the secant line through the points $(x_{n-1}, f(x_{n-1}))$ and $(x_n, f(x_n))$.)



[Figure: sketch of a function f with two points a and b marked on the x-axis, where f(a) < 0 and f(b) > 0.]

■ Plugging this approximation of the first derivative into Newton’s update formula defines the update step of the secant method:

$$x_{n+1} = x_n - f(x_n) \left( \frac{x_{n-1} - x_n}{f(x_{n-1}) - f(x_n)} \right) .$$

Some remarks on the secant method


■ The secant method can be used for non-linear systems, too.

■ Clearly, $x_{n+1}$ depends on two points $x_{n-1}$ and $x_n$. So to start the method, two points $x_0$ and $x_1$ are required.

■ Studying the update formula

$$x_{n+1} = x_n - f(x_n) \left( \frac{x_{n-1} - x_n}{f(x_{n-1}) - f(x_n)} \right)$$

we see that the quantity $f(x_{n-1}) - f(x_n)$ may eventually become zero.

If $f(x_{n-1})$ and $f(x_n)$ are of the same sign, additional significant digits are canceled in the subtraction.

So we could halt the iteration when $|f(x_{n-1}) - f(x_n)| \le \delta\,|f(x_n)|$ for some specified tolerance $\delta$, such as $\tfrac{1}{2} \cdot 10^{-6}$.
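A minimal sketch of the secant update together with the halting test on f(x_{n-1}) − f(x_n) might look as follows. The function name `secant`, the residual tolerance `tol` and the choice to raise an exception when the safeguard triggers are assumptions; the test function exp(x) − 3x is the one from the exercises.

```python
import math

def secant(f, x0, x1, tol=1e-12, delta=0.5e-6, max_iter=50):
    """Secant method with the safeguard |f(x_{n-1}) - f(x_n)| <= delta |f(x_n)|."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) <= tol:                 # residual small enough: accept x1
            return x1
        denom = f0 - f1
        if abs(denom) <= delta * abs(f1):  # cancellation safeguard
            raise ArithmeticError("f(x_{n-1}) - f(x_n) too small; halting")
        x2 = x1 - f1 * (x0 - x1) / denom   # secant update
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)                 # only one new evaluation of f per step
    return x1

root = secant(lambda x: math.exp(x) - 3.0 * x, 0.0, 1.0)
print(root)
```

Note that the function values are carried along from step to step, so each iteration costs a single new evaluation of f.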

Speed of convergence of the secant method


The advantages of the secant method are that (after the first step) only one function evaluation is required per step (in contrast to Newton’s method, which requires two, namely $f$ and $f'$) and that it is almost as rapidly convergent as Newton’s method.

It can be shown that the described secant method obeys an equation for the error $e_{n+1} = r - x_{n+1}$ of the form

$$e_{n+1} = -\frac{1}{2} \left( \frac{f''(\xi_n)}{f'(\zeta_n)} \right) e_n e_{n-1} \approx -\frac{1}{2} \left( \frac{f''(r)}{f'(r)} \right) e_n e_{n-1} ,$$

where $\xi_n$ and $\zeta_n$ are in the smallest interval that contains the desired root $r$, $x_n$ and $x_{n-1}$.

The rapidity of convergence of this method is, in general, between that of the bisection method and that of Newton’s method.
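The error relation $e_{n+1} \approx K e_n e_{n-1}$ already determines the order of convergence. The following heuristic derivation is a sketch under the assumption that the errors behave like $|e_{n+1}| \approx A\,|e_n|^{\alpha}$ for some constants $A > 0$ and $\alpha > 0$; the symbols $K$, $A$ and $\alpha$ are introduced here for illustration.

```latex
% Abbreviate K := -f''(r) / (2 f'(r)), so that |e_{n+1}| \approx |K| |e_n| |e_{n-1}|.
% Ansatz: |e_{n+1}| \approx A |e_n|^{\alpha}, hence |e_{n-1}| \approx (|e_n| / A)^{1/\alpha}.
\begin{aligned}
A\,|e_n|^{\alpha} &\approx |K|\,|e_n|\,A^{-1/\alpha}\,|e_n|^{1/\alpha}
  = |K|\,A^{-1/\alpha}\,|e_n|^{1 + 1/\alpha} , \\
\alpha &= 1 + \frac{1}{\alpha}
  \;\Longleftrightarrow\; \alpha^2 - \alpha - 1 = 0
  \;\Longrightarrow\; \alpha = \frac{1 + \sqrt{5}}{2} \approx 1.62 .
\end{aligned}
```

Since $1 < \alpha < 2$, this matches the statement that the secant method is faster than a linearly convergent method such as bisection, but slower than Newton’s method.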

Comparison of the Methods

Comparison of the root finding methods (1/3)


■ We discussed three primary methods for solving f(x) = 0: the bisection method, Newton’s method and the secant method.

■ The bisection method is reliable but slow.

■ Newton’s method is fast but often only near the root and requires f′. One must provide a starting point near the root and ensure that f is differentiable.

Newton’s method can be interpreted as the repetition of the two-step procedure linearize and solve. This strategy is applicable in many other numerical problems, and its importance cannot be overemphasized.

■ The secant method is nearly as fast as Newton’s method and does not require knowledge of the derivative f′, which may not be available or may be too expensive to compute. One must provide two points at which the signs differ (to avoid cancellation in the update formula) and the function need only be continuous.



■ The secant method is often faster at approximating the root of a non-linear function than the bisection method and the regula falsi. Unlike these two methods, it does not require the two current points to bracket the root, i.e., to lie on opposite sides of the root with a change of sign of f.

■ Moreover, the slope of the secant line can be quite small, and a step can move far from the current point. The secant method can thus fail to find a root of a non-linear function that has a small slope near the root because the secant line can induce a large jump.

■ For nice functions and initial guesses relatively close to the root, most of these methods require relatively few iterations before coming close to the root.

■ However, there are pathological examples that can cause trouble for any of these methods.



■ When selecting a method for solving a given non-linear problem, one must consider many issues, such as what is known about the behavior of the function, whether an interval [a, b] satisfying f(a)f(b) < 0 is available, whether the first derivative of the function is available, whether there is a good initial guess for the desired root, and so on.

■ In an effort to find the best algorithm for finding roots of a given non-linear function, various hybrid methods have been developed. Some of them combine the bisection method (used during the early iterations) with either the secant method or with Newton’s method.
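One simple way to combine the two ideas is sketched below: a fixed number of bisection steps narrows the bracket, and Newton’s method is then started from the midpoint. This is only one possible hybrid, not a specific published algorithm; the function name and the default of five bisection steps are assumptions.

```python
import math

def hybrid_bisect_newton(f, df, a, b, n_bisect=5, tol=1e-12, max_iter=50):
    """A few bisection steps on [a, b], followed by Newton's method."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0.0:
        raise ValueError("need f(a) f(b) < 0")
    for _ in range(n_bisect):              # early iterations: bisection
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0.0:
            return mid
        if fa * fm < 0.0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    x = 0.5 * (a + b)                      # starting point for Newton
    for _ in range(max_iter):              # later iterations: Newton
        step = f(x) / df(x)
        x -= step
        if abs(step) <= tol:
            break
    return x

f = lambda x: math.exp(x) - 3.0 * x
df = lambda x: math.exp(x) - 3.0
root = hybrid_bisect_newton(f, df, 0.0, 1.0)
print(root)
```

A more careful hybrid would also fall back to bisection whenever a Newton step leaves the current bracket; that safeguard is omitted here to keep the sketch short.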

Summary & Outlook

Major concepts covered today (1/3): Newton’s method in 1D


■ For finding a root of a continuously differentiable function $f$, Newton’s method is given by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \quad (n \ge 0) .$$

It requires a given initial value and two function evaluations (of $f$ and $f'$) at each step.

■ A rigorous error analysis shows that the errors $e_k = r - x_k$ in Newton’s method are related by

$$e_{n+1} = -\frac{1}{2} \left( \frac{f''(\xi_n)}{f'(x_n)} \right) e_n^2 ,$$

which leads, with some constant $c \ge 0$, to the inequality

$$|e_{n+1}| \le c \cdot |e_n|^2 .$$

This means that Newton’s method has quadratic convergence behavior for initial values sufficiently close to the root $r$.

Major concepts covered today (2/3): Newton’s method in 2D

■ For an $N \times N$ system of non-linear equations $F(X) = 0$, the update step in Newton’s method is written as

$$X^{(n+1)} = X^{(n)} - \left( DF(X^{(n)}) \right)^{-1} F(X^{(n)}) \quad (n \ge 0) ,$$

which involves the Jacobian matrix $DF(X^{(n)}) = \left( \partial_{x_j} F_i(X^{(n)}) \right)_{i,j = 1, \dots, N}$.

In practice, one solves the Jacobian linear system

$$DF(X^{(n)})\, H^{(n)} = -F(X^{(n)})$$

for $H^{(n)}$ and then finds the next iterate from the equation

$$X^{(n+1)} = X^{(n)} + H^{(n)} .$$

Major concepts covered today (3/3): secant method


■ The update step of the secant method for finding a zero $r$ of a function $f(x)$ is written as

$$x_{n+1} = x_n - f(x_n) \left( \frac{x_{n-1} - x_n}{f(x_{n-1}) - f(x_n)} \right) \quad (n \ge 1) ,$$

which requires two initial values $x_0$ and $x_1$. After the first step, only one new function evaluation per step is required.

■ A rigorous error analysis shows that after $n + 1$ steps of the secant method, the error iterates $e_k = r - x_k$ obey the equation

$$e_{n+1} = -\frac{1}{2} \left( \frac{f''(\xi_n)}{f'(\zeta_n)} \right) e_n e_{n-1} ,$$

which leads, with some constant $c \ge 0$, to the approximation

$$|e_{n+1}| \approx c \cdot |e_n|^{(1 + \sqrt{5})/2} \approx c \cdot |e_n|^{1.62} .$$

Therefore, the secant method has superlinear convergence behavior.

Preparation for the next lecture (1/2)


Please, prepare these short exercises for the next lecture:

1. Page 123, exercise 1 (reformulated)
Find where the graphs of y = 3x and y = exp(x) intersect by finding solutions of exp(x) − 3x = 0 correct to four decimal digits with the secant method.

2. Page 149, exercise 4
Application of the secant method for $f(x) = 2 - e^x$ with $x_0 = 0$ and $x_1 = 1$ leads to the following sequence of iterates

$$x_{n+1} = x_n + (2 - e^{x_n})(x_n - x_{n-1})(e^{x_n} - e^{x_{n-1}})^{-1} .$$

What is $\lim_{n \to \infty} x_n$?

Preparation for the next lecture (2/2)


Please, prepare these short exercises for the next lecture:

3. Page 151, computer exercise 12
Test numerically whether Olver’s method, given by the update formula

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} - \frac{1}{2}\,\frac{f''(x_n)}{f'(x_n)} \left( \frac{f(x_n)}{f'(x_n)} \right)^2 ,$$

is cubically convergent to a root of $f$. Try to establish that it is.
