Numerical solution of non-linear equations

October 14, 2020

Anne Kværnø (modified by André Massing and Markus Grasmair)
Date: May 27, 2020. Revision: Oct 11, 2020

1 Introduction

We know that a quadratic equation of the form

$$ax^2 + bx + c = 0$$

has the roots

$$r_\pm = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

More generally, for a given function f, a number r satisfying f(r) = 0 is called a root (or solution) of the equation

$$f(x) = 0.$$

In many applications, we encounter such equations for which we do not have a simple solution formula as for quadratic functions. In fact, an analytical solution formula might not even exist! The goal of this chapter is therefore to develop numerical techniques for solving nonlinear scalar equations (one equation, one unknown), such as

$$x^3 + x^2 - 3x = 3,$$

or systems of equations, such as

$$x e^y = 1, \qquad -x^2 + y = 1.$$

NB!

• We refer to section 3.1 in Preliminaries for some general comments on convergence.

• There are no examples of numerical calculations done by hand in this note. If you would like some, you can easily generate them yourself: take one of the computer exercises, do the calculations by hand, modify the code if needed so that you get the output you want, run the code, and compare with the results of your pencil and paper calculation.


2 Scalar equations

In this section we discuss the solution of scalar equations. The techniques we will use are known from previous courses. They are repeated here because the techniques used to develop and analyse these methods can, at least in principle, be extended to systems of equations. We will also emphasise the error analysis of the methods.

A scalar equation is given by

$$f(x) = 0, \qquad (1)$$

where f is a continuous function defined on some interval [a, b]. A solution r of the equation is called a zero or a root of f. A nonlinear equation may have one, more than one, or no roots.

Example 1: Given the function f(x) = x³ + x² − 3x − 3 on the interval [−2, 2]. Plot the function on the interval, and locate possible roots.

In [5]: %matplotlib inline

        import numpy as np
        from numpy import pi
        from numpy.linalg import solve, norm   # Solve linear systems and compute norms
        import matplotlib.pyplot as plt

        newparams = {'figure.figsize': (8.0, 4.0), 'axes.grid': True,
                     'lines.markersize': 8, 'lines.linewidth': 2,
                     'font.size': 14}
        plt.rcParams.update(newparams)

In [6]: def f(x):                      # Define the function
            return x**3 + x**2 - 3*x - 3

        # Plot the function on an interval
        x = np.linspace(-2, 2, 101)    # x-values (evaluation points)
        plt.plot(x, f(x))              # Plot the function
        plt.plot(x, 0*x, 'r')          # Plot the x-axis
        plt.xlabel('x')
        plt.grid(True)
        plt.ylabel('f(x)');


According to the plot, the function f has three real roots in the interval [−2, 2]. The function can be rewritten as

$$f(x) = x^3 + x^2 - 3x - 3 = (x + 1)(x^2 - 3).$$

Thus the roots of f are −1 and ±√3. We will use this example as a test problem later on.
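As a quick sanity check of the factorisation, the roots of this particular cubic can also be computed directly from its coefficient vector with NumPy's np.roots (a cross-check only; the methods developed below work for general f):

    # Roots of x^3 + x^2 - 3x - 3 from its coefficients [1, 1, -3, -3]
    print(np.roots([1.0, 1.0, -3.0, -3.0]))
    # contains 1.7320508..., -1.7320508... and -1.0 (in some order)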

2.1 Existence and uniqueness of solutions

The following theorem is well known:

Theorem 1: Existence and uniqueness of a solution.

• If f ∈ C[a, b] with f(a) and f(b) of opposite sign, then there exists at least one r ∈ (a, b) such that f(r) = 0.

• The solution is unique if f ∈ C¹[a, b] and f′(x) > 0 or f′(x) < 0 for all x ∈ (a, b).

The first condition guarantees that the graph of f crosses the x-axis at some point r; the second guarantees that the function is either strictly increasing or strictly decreasing.

We note that the condition that f(a) and f(b) are of opposite sign can be summarised as f(a) · f(b) < 0.

2.2 Bisection method

The first part of the theorem can be used to construct a first, quite intuitive algorithm for solving scalar equations: given an interval, check whether f has a root in it (by comparing the signs of the function values at the end points), divide the interval in two, check in which of the two halves the root lies, and continue until a root is located with sufficient accuracy.


Bisection method.

• Given the function f and the interval I₀ := [a, b] such that f(a) · f(b) < 0.

• Set a₀ = a, b₀ = b.

• For k = 0, 1, 2, 3, . . .:

$$c_k = \frac{a_k + b_k}{2}, \qquad I_{k+1} := [a_{k+1}, b_{k+1}] = \begin{cases} [a_k, c_k] & \text{if } f(a_k)\cdot f(c_k) \le 0, \\ [c_k, b_k] & \text{if } f(c_k)\cdot f(b_k) \le 0. \end{cases}$$

Use c_k as the approximation to the root. Since c_k is the midpoint of the interval [a_k, b_k], the error satisfies |c_k − r| ≤ (b_k − a_k)/2. The loop is terminated when (b_k − a_k)/2 is smaller than some user-specified tolerance. Since the length of the interval I_k satisfies b_k − a_k = 2^(−k)(b − a), we also have an a priori estimate of the error after k bisections,

$$|c_k - r| \le \frac{1}{2}(b_k - a_k) \le \frac{1}{2^{k+1}}(b - a).$$

Consequently, we can estimate how many bisections are needed to compute a root up to a given tolerance tol by simply requiring that

$$\frac{1}{2^{k+1}}(b - a) \le \mathrm{tol} \quad\Leftrightarrow\quad \frac{b - a}{2\,\mathrm{tol}} \le 2^k \quad\Leftrightarrow\quad \frac{\log\left(\frac{b-a}{2\,\mathrm{tol}}\right)}{\log 2} \le k.$$

We will of course also terminate the loop if f(c_k) is very close to 0.
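The last inequality is easy to evaluate in code. The following is a minimal sketch (using the interval and tolerance of Example 2 below) of how to compute the smallest number of bisections that guarantees the required accuracy:

    # Smallest k with (b-a)/2^(k+1) <= tol, from the a priori error estimate
    a, b, tol = 1.5, 2.0, 1.e-6        # interval and tolerance from Example 2
    k_min = int(np.ceil(np.log((b - a)/(2*tol))/np.log(2)))
    print(k_min)                       # prints 18, matching the run below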

2.2.1 Implementation

The algorithm is implemented in the function bisection().

In [7]: def bisection(f, a, b, tol=1.e-6, max_iter=100):
            # Solve the scalar equation f(x)=0 by bisection
            # The result of each iteration is printed
            # Input:
            #   f:        The function
            #   a, b:     The interval
            #   tol:      Tolerance
            #   max_iter: Maximum number of iterations
            # Output:
            #   The root and the number of iterations
            fa = f(a)
            fb = f(b)
            if fa*fb > 0:
                print('Error: f(a)*f(b)>0, there may be no root in the interval.')
                return 0, 0
            for k in range(max_iter):
                c = 0.5*(a + b)                          # The midpoint
                fc = f(c)
                print('k ={:3d}, a = {:.6f}, b = {:.6f}, c = {:10.6f}, f(c) = {:10.3e}'
                      .format(k, a, b, c, fc))
                if abs(fc) < 1.e-14 or (b - a) < 2*tol:  # The zero is found!
                    break
                elif fa*fc <= 0:
                    b = c                                # There is a root in [a, c]
                else:
                    a = c                                # There is a root in [c, b]
            return c, k

Example 2: Use the code above to find the root of

$$f(x) = x^3 + x^2 - 3x - 3$$

in the interval [1.5, 2]. Use 10⁻⁶ as the tolerance.

In [8]: # Example 2
        def f(x):                      # Define the function
            return x**3 + x**2 - 3*x - 3

        a, b = 1.5, 2                  # The interval
        c, nit = bisection(f, a, b, tol=1.e-6)   # Apply the bisection method

k = 0, a = 1.500000, b = 2.000000, c = 1.750000, f(c) = 1.719e-01
k = 1, a = 1.500000, b = 1.750000, c = 1.625000, f(c) = -9.434e-01
k = 2, a = 1.625000, b = 1.750000, c = 1.687500, f(c) = -4.094e-01
k = 3, a = 1.687500, b = 1.750000, c = 1.718750, f(c) = -1.248e-01
k = 4, a = 1.718750, b = 1.750000, c = 1.734375, f(c) = 2.203e-02
k = 5, a = 1.718750, b = 1.734375, c = 1.726562, f(c) = -5.176e-02
k = 6, a = 1.726562, b = 1.734375, c = 1.730469, f(c) = -1.496e-02
k = 7, a = 1.730469, b = 1.734375, c = 1.732422, f(c) = 3.513e-03
k = 8, a = 1.730469, b = 1.732422, c = 1.731445, f(c) = -5.728e-03
k = 9, a = 1.731445, b = 1.732422, c = 1.731934, f(c) = -1.109e-03
k = 10, a = 1.731934, b = 1.732422, c = 1.732178, f(c) = 1.201e-03
k = 11, a = 1.731934, b = 1.732178, c = 1.732056, f(c) = 4.596e-05
k = 12, a = 1.731934, b = 1.732056, c = 1.731995, f(c) = -5.317e-04
k = 13, a = 1.731995, b = 1.732056, c = 1.732025, f(c) = -2.429e-04
k = 14, a = 1.732025, b = 1.732056, c = 1.732040, f(c) = -9.845e-05
k = 15, a = 1.732040, b = 1.732056, c = 1.732048, f(c) = -2.624e-05
k = 16, a = 1.732048, b = 1.732056, c = 1.732052, f(c) = 9.860e-06
k = 17, a = 1.732048, b = 1.732052, c = 1.732050, f(c) = -8.192e-06
k = 18, a = 1.732050, b = 1.732052, c = 1.732051, f(c) = 8.340e-07

Check that the numerical result is within the error tolerance:

In [9]: r = np.sqrt(3)
        print('The error: |c_k-r|={:.2e}'.format(abs(r-c)))

The error: |c_k-r|=8.81e-08


2.3 Exercise 1: Bisection method

a) Compute the solution(s) of x² + sin(x) − 0.5 = 0 by the bisection method.

b) Given a root in the interval [1.5, 2], how many iterations are required to guarantee that the error is less than 10⁻⁴?

2.4 Fixed point iterations

The bisection method is very robust, but not particularly fast. In addition, the bisection method in this form can only be used for equations in a single variable, and it is difficult to construct similar methods for systems of equations in several variables.

We will therefore discuss a major class of iteration schemes, namely the so-called fixed point iterations. The idea is:

• Given a scalar equation f (x) = 0 with a root r.

• Rewrite the equation in fixed point form x = g(x) such that the root r of f is a fixed point of g, that is, r satisfies r = g(r).

The fixed point iterations are then given by:

Fixed point iterations.

• Given g and a starting value x0.

• For k = 0, 1, 2, 3, . . .:

$$x_{k+1} = g(x_k).$$

2.4.1 Implementation

The fixed point scheme is implemented in the function fixedpoint(). The iterations are terminated when either the error estimate |x_{k+1} − x_k| is less than a given tolerance tol, or the number of iterations reaches some maximal number max_iter.

In [10]: def fixedpoint(g, x0, tol=1.e-8, max_iter=30):
             # Solve x=g(x) by fixed point iterations
             # The output of each iteration is printed
             # Input:
             #   g:   The function g(x)
             #   x0:  Initial value
             #   tol: The tolerance
             # Output:
             #   The root and the number of iterations
             x = x0
             print('k ={:3d}, x = {:14.10f}'.format(0, x))
             for k in range(max_iter):
                 x_old = x            # Store the old value for error estimation
                 x = g(x)             # The iteration
                 err = abs(x - x_old) # Error estimate
                 print('k ={:3d}, x = {:14.10f}'.format(k+1, x))
                 if err < tol:        # The solution is accepted
                     break
             return x, k+1

Example 3: The equation

$$x^3 + x^2 - 3x - 3 = 0 \qquad\text{can be rewritten as}\qquad x = \frac{x^3 + x^2 - 3}{3}.$$

The fixed points are the intersections of the two graphs y = x and y = (x³ + x² − 3)/3, as demonstrated by the following script:

In [11]: # Example 3
         x = np.linspace(-2, 2, 101)
         plt.plot(x, (x**3 + x**2 - 3)/3, 'b', x, x, '--r')
         plt.axis([-2, 3, -2, 2])
         plt.xlabel('x')
         plt.grid(True)
         plt.legend(['g(x)', 'x']);

We observe that the fixed points of g are the same as the zeros of f. Apply fixed point iterations on g(x). Aim for the fixed point between 1 and 2, so choose x0 = 1.5. Do the iterations converge to the root r = √3?

In [12]: # Solve the equation from Example 3 by fixed point iterations

         # Define the function
         def g(x):
             return (x**3 + x**2 - 3)/3

         # Apply the fixed point scheme
         x, nit = fixedpoint(g, x0=1.5, tol=1.e-6, max_iter=30)

         print('\n\nResult:\nx = {}, number of iterations = {}'.format(x, nit))

k = 0, x = 1.5000000000
k = 1, x = 0.8750000000
k = 2, x = -0.5214843750
k = 3, x = -0.9566232041
k = 4, x = -0.9867682272
k = 5, x = -0.9957053567
k = 6, x = -0.9985807218
k = 7, x = -0.9995282492
k = 8, x = -0.9998428981
k = 9, x = -0.9999476491
k = 10, x = -0.9999825515
k = 11, x = -0.9999941841
k = 12, x = -0.9999980614
k = 13, x = -0.9999993538
k = 14, x = -0.9999997846

Result:
x = -0.9999997845980656, number of iterations = 14

Evidently the iterations do not converge to √3 ≈ 1.7321; they move away from it and converge to the root r = −1 instead. The theory below explains why.

2.5 Exercise 2: Fixed point method part I

Repeat the experiment with the following reformulations of f(x) = 0:

$$x = g_2(x) = \frac{-x^2 + 3x + 3}{x^2}, \qquad x = g_3(x) = \sqrt[3]{3 + 3x - x^2}, \qquad x = g_4(x) = \sqrt{\frac{3 + 3x - x^2}{x}}.$$

Use x0 = 1.5 in your experiments; you may well experiment with other values as well.

2.6 Theory

Let us first state some existence and uniqueness results. Apply Theorem 1 in this note to the equation f(x) = x − g(x) = 0. The following is then given (the details are left to the reader):

• If g ∈ C[a, b] and a < g(x) < b for all x ∈ [a, b], then g has at least one fixed point r ∈ (a, b).

• If in addition g ∈ C¹[a, b] and |g′(x)| < 1 for all x ∈ [a, b], then the fixed point is unique.


In the following, we will write the assumption a < g(x) < b for all x ∈ [a, b] as g([a, b]) ⊂ (a, b).

In this section we discuss the convergence properties of the fixed point iterations under the conditions for existence and uniqueness given above.

The error after k iterations is given by e_k = r − x_k. The iterations converge when e_k → 0 as k → ∞. Under which conditions is this the case?

This is the trick: for every k we have

$$x_{k+1} = g(x_k) \quad\text{(the iteration)}, \qquad r = g(r) \quad\text{(the fixed point)}.$$

Take the difference of the two, use the mean value theorem (Result 3 in section 5 in Preliminaries), and finally take the absolute value of each expression in the chain of equalities:

$$|e_{k+1}| = |r - x_{k+1}| = |g(r) - g(x_k)| = |g'(\xi_k)| \cdot |r - x_k| = |g'(\xi_k)| \cdot |e_k|. \qquad (2)$$

Here ξ_k is some unknown value between x_k (known) and r (unknown). We can now use the assumptions from the existence and uniqueness result. Note here that the condition g([a, b]) ⊂ (a, b) guarantees that if x0 ∈ [a, b], then x_k ∈ (a, b) for k = 1, 2, 3, . . ..

The condition |g′(x)| ≤ L < 1 guarantees convergence towards the unique fixed point r, since

$$|e_{k+1}| \le L\,|e_k| \quad\Rightarrow\quad |e_k| \le L^k\,|e_0| \to 0 \text{ as } k \to \infty,$$

because L^k → 0 as k → ∞ when L < 1. In addition, we have that

$$|x_{k+1} - x_k| = |g(x_k) - g(x_{k-1})| = |g'(\xi_k)| \cdot |x_k - x_{k-1}|$$

for some ξ_k between x_{k−1} and x_k, which shows that

$$|x_{k+1} - x_k| \le L\,|x_k - x_{k-1}|.$$

Repeating this argument k times yields the estimate

$$|x_{k+1} - x_k| \le L^k\,|x_1 - x_0|.$$

As a consequence, since r = lim_{k→∞} x_k, we obtain

$$|x_1 - r| = \Big|\sum_{k=0}^{\infty} (x_{k+1} - x_{k+2})\Big| \le \sum_{k=0}^{\infty} |x_{k+2} - x_{k+1}| \le \sum_{k=0}^{\infty} L^{k+1}\,|x_1 - x_0| = \frac{L}{1-L}\,|x_1 - x_0|.$$

A similar argument (now starting at x_k instead of x_1) yields

$$|x_{k+1} - r| \le \frac{L}{1-L}\,|x_{k+1} - x_k| \le \frac{L^{k+1}}{1-L}\,|x_1 - x_0|.$$

In summary we have:


The fixed point theorem. If there is an interval [a, b] such that g ∈ C¹[a, b], g([a, b]) ⊂ (a, b), and there exists a positive constant L < 1 such that |g′(x)| ≤ L for all x ∈ [a, b], then the following hold:

• g has a unique fixed point r in (a, b).

• The fixed point iteration x_{k+1} = g(x_k) converges towards r for all starting values x0 ∈ [a, b].

• The error e_{k+1} = r − x_{k+1} after iteration k + 1 satisfies the estimates

$$|e_{k+1}| \le L\,|e_k| \qquad \text{(error reduction rate)},$$

$$|e_{k+1}| \le \frac{L^{k+1}}{1-L}\,|x_1 - x_0| \qquad \text{(a priori error estimate)},$$

$$|e_{k+1}| \le \frac{L}{1-L}\,|x_{k+1} - x_k| \qquad \text{(a posteriori error estimate)}.$$

From the discussion above, we can draw some conclusions:

• The smaller the constant L, the faster the convergence.

• If |g′(r)| < 1, then there will always be an interval (r − δ, r + δ) around r for some δ > 0 on which all the conditions are satisfied, meaning that the iterations will always converge if x0 is sufficiently close to r.

• If |g′(r)| > 1, a similar argument shows that the fixed point iterates move away from r. In practice, this means that the fixed point iteration will in this case never converge towards r.

• If the constant L is known, the a priori error estimate can be used to estimate the number of steps needed to obtain an error smaller than some specified tolerance tol. To that end, we simply have to solve the inequality

$$\frac{L^{k+1}}{1-L}\,|x_1 - x_0| \le \mathrm{tol}$$

for k.

• The a posteriori estimate can be used to estimate the actual error based on the size of the last update |x_{k+1} − x_k|.

• If the constant L is not known, one can estimate it using the ideas found in preliminaries.ipynb, and then use this estimate in the a posteriori error estimate to estimate the actual error; see the sketch below.
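To illustrate the last three points, here is a minimal sketch that estimates L from the increments of the iterates of Example 3 and plugs it into the a posteriori estimate (the variable names and the comparison with the exact root are our own choices):

    # Estimate L from consecutive increments and use the a posteriori estimate
    def g(x):                          # g from Example 3
        return (x**3 + x**2 - 3)/3

    x = [-2/3]                         # starting value
    for k in range(10):                # a few fixed point iterations
        x.append(g(x[-1]))

    # |x_{k+1}-x_k| / |x_k-x_{k-1}| -> |g'(r)|, which serves as an estimate of L
    L = abs(x[-1] - x[-2])/abs(x[-2] - x[-3])
    print('estimated L       :', L)    # close to |g'(-1)| = 1/3
    print('a posteriori bound:', L/(1 - L)*abs(x[-1] - x[-2]))
    print('actual error      :', abs(x[-1] - (-1)))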

Example 3 revisited: Use the theory above to analyse the iteration scheme from Example 3, where

$$g(x) = \frac{x^3 + x^2 - 3}{3}, \qquad g'(x) = \frac{3x^2 + 2x}{3}.$$

It is clear that g is differentiable. We already know that g has three fixed points, r = ±√3 and r = −1. For the first two, we have g′(√3) = 3 + (2/3)√3 ≈ 4.15 and g′(−√3) = 3 − (2/3)√3 ≈ 1.85, so the fixed point iterations will never converge towards those roots. But g′(−1) = 1/3, so we get convergence towards this root, given sufficiently good starting values. The figure below demonstrates that the assumptions of the theorem are satisfied in some region around x = −1, for example [−1.3, −0.7].


In [13]: plt.rcParams['figure.figsize'] = 10, 5 # Resize the figure for a nicer plot

In [14]: # Demonstrate the assumptions of the fixed point theorem

         def g(x):                     # The function g(x)
             return (x**3 + x**2 - 3)/3

         def dg(x):                    # The derivative g'(x)
             return (3*x**2 + 2*x)/3

         a, b = -4/3, -2/3             # The interval [a, b] around r
         x = np.linspace(a, b, 101)    # x-values on [a, b]
         y_is_1 = np.ones(101)         # For plotting the limits ±1 for g'

         plt.rcParams['figure.figsize'] = 10, 5   # Resize the figure
         # Plot x and g(x) around r = -1
         plt.subplot(1, 2, 1)
         plt.plot(x, g(x), x, x, 'r--')
         plt.xlabel('x')
         plt.axis([a, b, a, b])
         plt.grid(True)
         plt.legend(['g(x)', 'x'])

         # Plot g'(x), and the limits -1 and +1
         plt.subplot(1, 2, 2)
         plt.plot(x, dg(x), x, y_is_1, 'r--', x, -y_is_1, 'r--');
         plt.xlabel('x')
         plt.grid(True)
         plt.title('dg/dx');


In [15]: # Run the fixed point iteration starting at x0 = -2/3
         x, nit = fixedpoint(g, x0=-2/3, tol=1.e-6, max_iter=30)

         print('\n\nResult:\nx = {}, number of iterations = {}'.format(x, nit))

k = 0, x = -0.6666666667
k = 1, x = -0.9506172840
k = 2, x = -0.9851247206
k = 3, x = -0.9951879923
k = 4, x = -0.9984113972
k = 5, x = -0.9994721469
k = 6, x = -0.9998242347
k = 7, x = -0.9999414321
k = 8, x = -0.9999804797
k = 9, x = -0.9999934935
k = 10, x = -0.9999978312
k = 11, x = -0.9999992771
k = 12, x = -0.9999997590

Result:
x = -0.9999997590221911, number of iterations = 12

The plot to the left demonstrates the assumption g([a, b]) ⊂ (a, b): the graph y = g(x) enters at the left boundary, exits at the right, and does not leave the region [a, b] anywhere in between. The plot to the right shows that |g′(x)| ≤ |g′(a)| = L < 1 is satisfied on the interval.

2.7 Exercise 3: Fixed point method part II

a) See how far the interval [a, b] can be stretched while convergence is still guaranteed.

b) Do a similar analysis of the three other iteration schemes suggested above, and confirm the results numerically. The schemes are:

$$x = g_2(x) = \frac{-x^2 + 3x + 3}{x^2}, \qquad x = g_3(x) = \sqrt[3]{3 + 3x - x^2}, \qquad x = g_4(x) = \sqrt{\frac{3 + 3x - x^2}{x}}.$$

Example 3 revisited: The fixed point theorem stated that |e_{k+1}| ≤ L|e_k| with L < 1, and thus the fixed point iteration is linearly convergent.

In [17]: def eocs(errs):
             # Computes the experimentally observed convergence rates for a
             # list of iterates
             # Input:
             #   errs: array of computed errors
             # Output:
             #   array of computed EOCs
             if len(errs) < 3:
                 print('Error: Need at least 3 iterates to compute EOCs')
             # (Recall list indexing and slicing in Python if you have trouble
             # understanding the next line)
             ps = np.log(errs[2:]/errs[1:-1])/np.log(errs[1:-1]/errs[:-2])
             # Print the rates
             for k in range(len(ps)):
                 print("k = {:3d}, eoc(k) = {:3.3f}".format(k+2, ps[k]))
             return ps

We will also modify the fixedpoint() function so that it returns not only the last iterate, but all iterates, so that we can compute the estimated order of convergence.

In [18]: def fixedpoint_v2(g, x0, tol=1.e-8, max_iter=30):
             # Solve x=g(x) by fixed point iterations
             # The output of each iteration is printed
             # Input:
             #   g:   The function g(x)
             #   x0:  Initial value
             #   tol: The tolerance
             # Output:
             #   Array of all iterates
             x = [x0]
             print('k ={:3d}, x = {:14.10f}'.format(0, x[-1]))
             for k in range(max_iter):
                 x.append(g(x[-1]))
                 err = abs(x[-1] - x[-2])   # Error estimate
                 print('k ={:3d}, x = {:14.10f}'.format(k+1, x[-1]))
                 if err < tol:              # The solution is accepted
                     break
             # Return as numpy array
             return np.array(x)

Now we can rerun Example 3 with g(x) = (x³ + x² − 3)/3 and compute the estimated order of convergence:

In [19]: # Compute the EOC for the fixed point method when solving Example 3

         # Define the function
         def g(x):
             return (x**3 + x**2 - 3)/3

         # Apply the fixed point scheme
         x = fixedpoint_v2(g, x0=-2/3, tol=1.e-6, max_iter=30)

         # Compute x_k - r
         r = -1            # exact solution
         errs = x - r      # error for each iterate

         # Find the order p and the error constant C
         print('\nThe order p and the error constant C')
         for k in range(len(errs)-2):
             p = np.log(np.abs(errs[k+2]/errs[k+1]))/np.log(np.abs(errs[k+1]/errs[k]))
             C = np.abs(errs[k+2])/np.abs(errs[k+1])**p
             print('k = {:2d}, p = {:4.2f}, C = {:6.4f}'.format(k, p, C))

k = 0, x = -0.6666666667
k = 1, x = -0.9506172840
k = 2, x = -0.9851247206
k = 3, x = -0.9951879923
k = 4, x = -0.9984113972
k = 5, x = -0.9994721469
k = 6, x = -0.9998242347
k = 7, x = -0.9999414321
k = 8, x = -0.9999804797
k = 9, x = -0.9999934935
k = 10, x = -0.9999978312
k = 11, x = -0.9999992771
k = 12, x = -0.9999997590

The order p and the error constant C
k = 0, p = 0.63, C = 0.0985
k = 1, p = 0.94, C = 0.2519
k = 2, p = 0.98, C = 0.2999
k = 3, p = 0.99, C = 0.3200
k = 4, p = 1.00, C = 0.3282
k = 5, p = 1.00, C = 0.3314
k = 6, p = 1.00, C = 0.3326
k = 7, p = 1.00, C = 0.3331
k = 8, p = 1.00, C = 0.3332
k = 9, p = 1.00, C = 0.3333
k = 10, p = 1.00, C = 0.3333

The observed order p ≈ 1 with error constant C ≈ 1/3 = |g′(−1)| confirms the linear convergence predicted by the theory.

2.8 Newton’s method

Based on the previous discussion, it is clear that fast convergence can be achieved if g′(r) is as small as possible, preferably g′(r) = 0. Can this be achieved for a general problem?

We have that x_k = r − e_k, where e_k is the error in iteration k. Do a Taylor expansion (Preliminaries, section 4) of g(x_k) around r:

$$e_{k+1} = r - x_{k+1} = g(r) - g(x_k) = g(r) - g(r - e_k) = g'(r)\,e_k - \frac{1}{2}\,g''(\xi_k)\,e_k^2.$$

If g′(r) = 0 and there is a constant M such that |g″(x)|/2 ≤ M, then

$$|e_{k+1}| \le M\,|e_k|^2,$$

and quadratic convergence has been achieved; see Preliminaries, section 3.1 on "Convergence of an iterative process".

Question: Given the equation f(x) = 0 with an unknown solution r. Can we find a g with r as a fixed point, satisfying g′(r) = 0?

Idea: Let

$$g(x) = x - h(x)\,f(x)$$

for some function h(x). Independently of the choice of h(x), r will be a fixed point of g. Choose h(x) such that g′(r) = 0, that is,

$$g'(x) = 1 - h'(x)\,f(x) - h(x)\,f'(x) \quad\Rightarrow\quad g'(r) = 1 - h(r)\,f'(r).$$

That is, we have to choose h(x) in such a way that h(r) = 1/f′(r). There are several choices of h with this property (see the sketch after the algorithm below for another one), but the most obvious is to choose h(x) = 1/f′(x). The result is:

Newton's method.

• Given f and f′, and a starting value x0.

• For k = 0, 1, 2, 3, . . .:

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.$$
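As an aside, other admissible choices of h give useful variants of the method. Freezing the derivative at the starting point, h(x) = 1/f′(x0), yields the so-called chord (or simplified Newton) method, which saves derivative evaluations but converges only linearly. A minimal sketch (our own illustration, not part of the original note):

    def chord(f, dfx0, x0, tol=1.e-8, max_iter=100):
        # Simplified Newton: reuse the fixed slope dfx0 = f'(x0) in every step
        x = x0
        for k in range(max_iter):
            fx = f(x)
            if abs(fx) < tol:          # accept the solution
                break
            x = x - fx/dfx0
        return x, k

    # For Example 4 below, chord(f, df(1.5), 1.5) converges to sqrt(3),
    # but needs considerably more iterations than Newton's method.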

2.8.1 Implementation

The method is implemented in the function newton(). The iterations are terminated when |f(x_k)| is less than a given tolerance.

In [20]: def newton(f, df, x0, tol=1.e-8, max_iter=30):
             # Solve f(x)=0 by Newton's method
             # The output of each iteration is printed
             # Input:
             #   f, df: The function f and its derivative f'
             #   x0:    Initial value
             #   tol:   The tolerance
             # Output:
             #   The root and the number of iterations
             x = x0
             print('k ={:3d}, x = {:18.15f}, f(x) = {:10.3e}'.format(0, x, f(x)))
             for k in range(max_iter):
                 fx = f(x)
                 if abs(fx) < tol:     # Accept the solution
                     break
                 x = x - fx/df(x)      # Newton iteration
                 print('k ={:3d}, x = {:18.15f}, f(x) = {:10.3e}'.format(k+1, x, f(x)))
             return x, k+1

Example 4: Solve f(x) = x³ + x² − 3x − 3 = 0 by Newton's method. Choose x0 = −4/3. The derivative of f is f′(x) = 3x² + 2x − 3.

In [28]: # Example 4
         def f(x):                     # The function f
             return x**3 + x**2 - 3*x - 3

         def df(x):                    # The derivative f'
             return 3*x**2 + 2*x - 3

         x0 = -4/3                     # Starting value
         x, nit = newton(f, df, x0, tol=1.e-14, max_iter=30)   # Apply Newton
         print('\n\nResult:\nx={}, number of iterations={}'.format(x, nit))

k = 0, x = -1.333333333333333, f(x) = 4.074e-01
k = 1, x = -0.111111111111110, f(x) = -2.656e+00
k = 2, x = -0.944875107665806, f(x) = -1.162e-01
k = 3, x = -0.997403215651661, f(x) = -5.207e-03
k = 4, x = -0.999993308904898, f(x) = -1.338e-05
k = 5, x = -0.999999999955230, f(x) = -8.954e-11
k = 6, x = -1.000000000000000, f(x) = 0.000e+00

Result:
x=-1.0, number of iterations=7

2.8.2 Error analysis

The method was constructed to give quadratic convergence, that is,

$$|e_{k+1}| \le M\,|e_k|^2,$$

where e_k = r − x_k is the error after k iterations. But under which conditions is this true, and can we say something about the size of the constant M?

By a Taylor expansion of f(r) around x_k, we get

$$0 = f(r) = f(x_k) + f'(x_k)(r - x_k) + \frac{1}{2} f''(\xi_k)(r - x_k)^2 \qquad \text{(Taylor series)},$$

$$0 = f(x_k) + f'(x_k)(x_{k+1} - x_k) \qquad \text{(Newton's method)},$$

where ξ_k is between r and x_k. Subtracting the two equations gives

$$f'(x_k)(r - x_{k+1}) + \frac{1}{2} f''(\xi_k)(r - x_k)^2 = 0 \quad\Rightarrow\quad e_{k+1} = -\frac{1}{2}\,\frac{f''(\xi_k)}{f'(x_k)}\,e_k^2.$$


So we obtain quadratic convergence if f is twice differentiable around r, f′(x_k) ≠ 0, and x0 is chosen sufficiently close to r. More precisely:

Theorem: Convergence of Newton iterations. Assume that the function f has a root r, and let I_δ = [r − δ, r + δ] for some δ > 0. Assume further that

• f ∈ C²(I_δ), and

• there is a positive constant M such that

$$\left| \frac{f''(y)}{f'(x)} \right| \le 2M \qquad \text{for all } x, y \in I_\delta.$$

In this case, Newton's iterations converge quadratically, with

$$|e_{k+1}| \le M\,|e_k|^2$$

for all starting values satisfying |x0 − r| < min{1/M, δ}.

Here the condition |x0 − r| ≤ 1/M guarantees that the error actually decreases in each step, since then |e_{k+1}| ≤ M|e_k|² < |e_k|.
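The quadratic rate can be checked numerically along the same lines as the EOC computation for the fixed point iterations: if |e_{k+1}| ≈ M|e_k|², the ratios |e_{k+1}|/|e_k|² should settle near a constant. A minimal sketch, reusing f and df from Example 4:

    # Check |e_{k+1}|/|e_k|^2 for the Newton iterates of Example 4
    x, r = -4/3, -1.0                  # starting value and exact root
    errs = [abs(x - r)]
    for k in range(6):
        x = x - f(x)/df(x)             # Newton step
        errs.append(abs(x - r))
    for k in range(len(errs) - 1):
        if errs[k] > 0 and errs[k+1] > 0:
            print('k = {:d}, ratio = {:.3f}'.format(k, errs[k+1]/errs[k]**2))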

2.9 Exercise 4: Newton’s method for a scalar equation

a) Repeat Example 4 using different starting values x0. Find the two other roots.

b) Verify quadratic convergence numerically. How to do so is explained in Preliminaries, section 3.1.

c) Solve the equation x(1 − cos(x)) = 0, both by the bisection method and by Newton's method with x0 = 1. Comment on the result.

3 Systems of nonlinear equations

In this section we will discuss how to solve systems of non-linear equations, given by

$$\begin{aligned} f_1(x_1, x_2, \ldots, x_n) &= 0, \\ f_2(x_1, x_2, \ldots, x_n) &= 0, \\ &\;\;\vdots \\ f_n(x_1, x_2, \ldots, x_n) &= 0, \end{aligned}$$

or in short by

$$\mathbf{f}(\mathbf{x}) = \mathbf{0},$$

where f : Rⁿ → Rⁿ.

Example 5: We are given the two equations

$$x_1^3 - x_2 + \frac{1}{4} = 0, \qquad x_1^2 + x_2^2 - 1 = 0.$$

This is illustrated in the following figure.


In [22]: plt.rcParams['figure.figsize'] = 6, 6 # Resize the figure for a nicer plot

In [23]: # Example 5 with graphs
         x = np.linspace(-1.0, 1.0, 101)
         plt.plot(x, x**3 + 0.25);
         t = np.linspace(0, 2*pi, 101)
         plt.plot(np.cos(t), np.sin(t));
         plt.axis('equal');
         plt.xlabel('x1')
         plt.ylabel('x2')
         plt.grid(True)
         plt.legend(['$x_1^3-x_2+1/4=0$', '$x_1^2+x_2^2-1=0$']);

The solutions of the two equations are the intersections of the two graphs. So there are two solutions, one in the first and one in the third quadrant. We will use this as a test example in the sequel.


3.1 Newton’s method for systems of equations

The idea of fixed point iterations can be extended to systems of equations, but it is in general hard to find convergent schemes. We will therefore concentrate on the extension of Newton's method to systems of equations. For the sake of illustration, we only discuss systems of two equations in two unknowns, written as

$$f(x, y) = 0, \qquad g(x, y) = 0,$$

to avoid getting completely lost in indices. Let r = [r_x, r_y]ᵀ be a solution of these equations, and let x̄ = [x̄, ȳ]ᵀ be a known approximation to r. We search for a better approximation. This can be done by replacing the nonlinear equation f(x) = 0 by its linear approximation, which can be found by a multidimensional Taylor expansion around x̄:

$$f(x, y) = f(\bar{x}, \bar{y}) + \frac{\partial f}{\partial x}(\bar{x}, \bar{y})(x - \bar{x}) + \frac{\partial f}{\partial y}(\bar{x}, \bar{y})(y - \bar{y}) + \ldots,$$

$$g(x, y) = g(\bar{x}, \bar{y}) + \frac{\partial g}{\partial x}(\bar{x}, \bar{y})(x - \bar{x}) + \frac{\partial g}{\partial y}(\bar{x}, \bar{y})(y - \bar{y}) + \ldots,$$

where the dots represent higher order terms, which are small if x ≈ x̄. By ignoring these terms we get a linear approximation to f(x), and rather than solving the original nonlinear system, we can solve its linear approximation:

$$f(\bar{x}, \bar{y}) + \frac{\partial f}{\partial x}(\bar{x}, \bar{y})(x - \bar{x}) + \frac{\partial f}{\partial y}(\bar{x}, \bar{y})(y - \bar{y}) = 0,$$

$$g(\bar{x}, \bar{y}) + \frac{\partial g}{\partial x}(\bar{x}, \bar{y})(x - \bar{x}) + \frac{\partial g}{\partial y}(\bar{x}, \bar{y})(y - \bar{y}) = 0,$$

or more compactly,

$$\mathbf{f}(\bar{\mathbf{x}}) + J(\bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}}) = \mathbf{0},$$

where the Jacobian J(x) is given by

$$J(\mathbf{x}) = \begin{pmatrix} \frac{\partial f}{\partial x}(x, y) & \frac{\partial f}{\partial y}(x, y) \\ \frac{\partial g}{\partial x}(x, y) & \frac{\partial g}{\partial y}(x, y) \end{pmatrix}.$$

It is to be hoped that the solution x of the linear system provides a better approximation to r than the initial guess x̄, so the process can be repeated, resulting in:

Newton's method for systems of equations.

• Given a function f(x), its Jacobian J(x), and a starting value x0.

• For k = 0, 1, 2, 3, . . .:

• Solve the linear system J(x_k)Δ_k = −f(x_k).

• Let x_{k+1} = x_k + Δ_k.


The strategy can be generalized to systems of n equations in n unknowns, in which case the Jacobian is given by

$$J(\mathbf{x}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(\mathbf{x}) & \frac{\partial f_1}{\partial x_2}(\mathbf{x}) & \cdots & \frac{\partial f_1}{\partial x_n}(\mathbf{x}) \\ \frac{\partial f_2}{\partial x_1}(\mathbf{x}) & \frac{\partial f_2}{\partial x_2}(\mathbf{x}) & \cdots & \frac{\partial f_2}{\partial x_n}(\mathbf{x}) \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1}(\mathbf{x}) & \frac{\partial f_n}{\partial x_2}(\mathbf{x}) & \cdots & \frac{\partial f_n}{\partial x_n}(\mathbf{x}) \end{pmatrix}.$$

3.1.1 Implementation

Newton's method for systems of equations is implemented in the function newton_system(). The numerical solution is accepted when all components of f(x_k) are smaller in absolute value than a tolerance, that is, when ‖f(x_k)‖_∞ < tol. See Preliminaries, section 1 for a description of norms.

In [24]: np.set_printoptions(precision=15)   # Output with high accuracy

In [25]: def newton_system(f, jac, x0, tol=1.e-10, max_iter=20):
             x = x0
             print('k ={:3d}, x = '.format(0), x)
             for k in range(max_iter):
                 fx = f(x)
                 if norm(fx, np.inf) < tol:   # The solution is accepted
                     break
                 Jx = jac(x)
                 try:
                     delta = solve(Jx, -fx)
                 except Exception:
                     print("Error in solving the Newton system")
                     k = -1                   # Return -1 as iteration number
                     break
                 x = x + delta
                 print('k ={:3d}, x = '.format(k+1), x)
             return x, k

Example 6: Solve the equations from Example 5 by Newton's method, that is, the system

$$x_1^3 - x_2 + \frac{1}{4} = 0, \qquad x_1^2 + x_2^2 - 1 = 0.$$

The vector valued function f and the Jacobian J are in this case

$$\mathbf{f}(\mathbf{x}) = \begin{pmatrix} x_1^3 - x_2 + \frac{1}{4} \\ x_1^2 + x_2^2 - 1 \end{pmatrix} \qquad\text{and}\qquad J(\mathbf{x}) = \begin{pmatrix} 3x_1^2 & -1 \\ 2x_1 & 2x_2 \end{pmatrix}.$$

We already know that the system has two solutions, one in the first and one in the third quadrant. To find the first one, choose x0 = [1, 1]ᵀ as starting value.

In [33]: # Example 6

         # The vector valued function. Notice the indexing.
         def f(x):
             y = np.array([x[0]**3 - x[1] + 0.25,
                           x[0]**2 + x[1]**2 - 1])
             return y

         # The Jacobian
         def jac(x):
             J = np.array([[3*x[0]**2, -1],
                           [2*x[0], 2*x[1]]])
             return J

         x0 = np.array([1.0, 1.0])     # Starting values
         max_iter = 50
         x, nit = newton_system(f, jac, x0, tol=1.e-12, max_iter=max_iter)   # Apply Newton's method

         print('\nTest: f(x)={}'.format(f(x)))
         if nit == max_iter - 1:
             print('Warning: Convergence has not been achieved')

k = 0, x = [ 1. 1.]
k = 1, x = [ 0.8125 0.6875]
k = 2, x = [ 0.750687815833801 0.663959854014599]
k = 3, x = [ 0.746302675769953 0.665623251157924]
k = 4, x = [ 0.746281278080405 0.665630719318386]
k = 5, x = [ 0.746281277575054 0.665630719499142]

Test: f(x)=[ 1.110223024625157e-16 0.000000000000000e+00]

3.2 Exercise 5: Newton’s method for a system

a) Search for the solution of Example 5 in the third quadrant by changing the initial values.

b) Apply Newton's method to the system

$$x e^y = 1, \qquad -x^2 + y = 1,$$

using x0 = y0 = −1.

3.2.1 Final remarks

A complete error and convergence analysis of Newton's method for systems is far from trivial and outside the scope of this course. But in summary: if f is sufficiently differentiable, there is a solution r of the system f(x) = 0, and the Jacobian J(r) is nonsingular, then the Newton iterations converge quadratically towards r for all x0 sufficiently close to r.

Finding solutions of nonlinear equations is difficult. Even if the Newton iterations in principle converge, it can be very hard to find sufficiently good starting values. Nonlinear equations can have no solutions or many. Even when there are solutions, there is no guarantee that the solution you find is the one you want.


If n is large, each iteration is computationally expensive, since the Jacobian is evaluated in each iteration and a linear system has to be solved. In practice, some modified and more efficient version of Newton's method is used, possibly together with more robust but slower algorithms for finding sufficiently good starting values.
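One common building block of such modified versions is to approximate the Jacobian by finite differences, so that only f itself is needed. A minimal sketch (the helper fd_jacobian and the step size h are our own illustrative choices):

    def fd_jacobian(f, x, h=1.e-7):
        # Approximate the Jacobian of f at x column by column using
        # forward differences: J[:, j] ~ (f(x + h*e_j) - f(x))/h
        n = len(x)
        fx = f(x)
        J = np.zeros((n, n))
        for j in range(n):
            xp = x.copy()
            xp[j] += h
            J[:, j] = (f(xp) - fx)/h
        return J

    # Possible usage with Example 6 (n extra f-evaluations per iteration,
    # and some loss of accuracy, so use a milder tolerance):
    # x, nit = newton_system(f, lambda x: fd_jacobian(f, x), x0, tol=1.e-8)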
