Lecture Topic: Systems of Linear Equations

Jakub Marecek and Sean McGarraghy (UCD), Numerical Analysis and Software, October 8, 2015

Introduction: systems of linear equations

We examine both iterative and direct methods for solving equations

Ax = b (1)

where x, b ∈ Rⁿ, A ∈ Mₙ(R) is an n × n matrix; x is an unknown vector (to be found) and b and A are known.

Solving systems of linear equations is still the most important problem in computational mathematics, because it is used as a sub-problem in solving other problems.

Algorithms that solve non-linear systems commonly use linear approximations, which give rise to systems of linear equations.

Algorithms that optimise over feasible sets given by linear and non-linear equalities and inequalities commonly solve systems related to first-order optimality conditions iteratively, which give rise to systems of linear equations.

Mathematics and Algorithms

In Quantitative Methods you have learned the mathematical ideas behind direct and iterative approaches to solving (1). You should understand:

Theorem

The following are equivalent for any n × n matrix A:

Ax = b has a unique solution for all b ∈ Rⁿ.

Ax = 0 implies x = 0.

A⁻¹ exists.

det(A) ≠ 0.

rank(A) = n.

We assume throughout this chapter that A has full rank.

Mathematics and Algorithms

Here, we build on these and analyse the related algorithms, focussing first on conditioning, second on the stability of direct methods, and third on the convergence and stability of iterative methods.

This still leaves much unexplained, including conjugate gradients (CG), generalised minimal residuals (GMRES), and preconditioning, i.e. methods for changing the condition.

Condition of a System of Linear Equations

Condition suggests how changes in A and b, the “instance” of the problem, affect the solution x, using any algorithm.

We will examine errors in A and b separately.

It turns out that in both cases the condition number of the matrix A plays a role.

An Example

Consider the system of linear equations

x1 + 0.99x2 = 1.99

0.99x1 + 0.98x2 = 1.97.

The true solution is x1 = 1 and x2 = 1 but x1 = 3.0000 and x2 = −1.0203 gives

x1 + 0.99x2 = 1.989903

0.99x1 + 0.98x2 = 1.970106.

Thus, a small change in the problem data, a change in the vector b from (1.99, 1.97)ᵀ to (1.989903, 1.970106)ᵀ, leads to a large change in the solution: this is our criterion for ill-conditioning.
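
This is easy to reproduce numerically. A minimal sketch in Python with NumPy (the tooling is our choice here, not part of the lecture):

import numpy as np

# The example system: a tiny change in b produces a huge change in x.
A = np.array([[1.00, 0.99],
              [0.99, 0.98]])
b = np.array([1.99, 1.97])
b_pert = np.array([1.989903, 1.970106])   # the perturbed right-hand side

x = np.linalg.solve(A, b)            # -> [1., 1.]
x_pert = np.linalg.solve(A, b_pert)  # -> roughly [3., -1.0203]

print(np.linalg.norm(b_pert - b) / np.linalg.norm(b))  # relative change in b: ~5e-5
print(np.linalg.norm(x_pert - x) / np.linalg.norm(x))  # relative change in x: ~2.0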


An Example

[Figure: the two lines of the system plotted in the (x1, x2)-plane over [−1, 3] × [−1, 3].]

[Figure: the same two lines zoomed in to [0.9999, 1.0001] on both axes, near the solution (1, 1).]

Perturbation of b

Let the right-hand side b be perturbed by δb. So we want to find the solution of

A(x + δx) = b + δb. (2.1)

‖·‖ denotes a vector or matrix norm, according to context. By (2.1), Aδx = δb, so

δx = A⁻¹δb ⇒ ‖δx‖ ≤ ‖A⁻¹‖ ‖δb‖ (a sharp bound). (2.2)

Since b = Ax, the properties of matrix norms again give

‖b‖ ≤ ‖A‖ ‖x‖. (2.3)

Hence, combining (2.2) and (2.3): each LHS ≤ RHS, so the product of the left-hand sides is at most the product of the right-hand sides:

‖δx‖ ‖b‖ ≤ ‖A‖ ‖A⁻¹‖ ‖x‖ ‖δb‖

and assuming b ≠ 0 we get

‖δx‖/‖x‖ ≤ ‖A‖ ‖A⁻¹‖ · ‖δb‖/‖b‖,

i.e., (rel. error in x) ≤ ‖A‖ ‖A⁻¹‖ · (rel. error in b).
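
As a quick numerical check of this bound (again a Python/NumPy sketch, our choice of tools), we can compare both sides for the example system in the 2-norm:

import numpy as np

A = np.array([[1.00, 0.99], [0.99, 0.98]])
b = np.array([1.99, 1.97])
db = np.array([-0.000097, 0.000106])  # the perturbation of b from the example

x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

rel_err_x = np.linalg.norm(dx) / np.linalg.norm(x)
bound = (np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
         * np.linalg.norm(db) / np.linalg.norm(b))
print(rel_err_x, bound)  # ~2.010 and ~2.012: the bound holds and is nearly attained here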

The Condition Number of a Matrix

Thus, the quantity ‖A‖ ‖A⁻¹‖ measures the relative change in solution for a given relative change in problem: it measures the relative condition of the system of linear equations problem.

Definition

Given a matrix norm ‖·‖, the condition number of matrix A is

cond_rel(A) = ‖A‖ ‖A⁻¹‖.

This depends on the norm used; but, since the underlying vector norms only differ by a fixed multiplicative constant for a given n (all norms on Rⁿ are equivalent), all measures of condition number are equally good.

We can interpret cond_rel(A) as:

the amount a relative error in b is magnified in the solution vector x; or

the distortion A produces when applied to the unit sphere; or

how “close” A (and indeed A⁻¹) is to being a singular matrix.
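
Library routines compute condition numbers for us; a minimal Python/NumPy sketch (our choice of tools), applied to the example matrix:

import numpy as np

A = np.array([[1.00, 0.99], [0.99, 0.98]])
for p in (1, 2, np.inf):
    print(p, np.linalg.cond(A, p=p))
# 1-norm and inf-norm: ||A|| ||A^-1|| = 1.99 * 19900 = 39601;
# 2-norm: about 3.9e4. All norms agree on the order of magnitude.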

The Spectral Condition Number of a Matrix

Definition

We also define the spectral condition number of A as

cond*_rel(A) := max_{λ∈σ(A)} |λ| / min_{λ∈σ(A)} |λ|.

Here σ(A), the spectrum of A, is the set of all eigenvalues of A.

If λ is an eigenvalue of A, its modulus or length |λ| is the factor by which a λ-eigenvector is expanded (if |λ| > 1) or contracted (if |λ| < 1). Thus

ρ(A) = max_{λ∈σ(A)} |λ|, the spectral radius, is the largest factor by which A multiplies an eigenvector, while min_{λ∈σ(A)} |λ| is the smallest factor by which A multiplies an eigenvector.

The ratio cond*_rel(A) is thus a measure of the distortion produced by A: how great is the difference in expansion/contraction of eigenvectors that A can cause.

An Example

In the example above, the related matrix

A = [ 1.00  0.99
      0.99  0.98 ].

has eigenvalues λ1 ≈ 1.98 and λ2 ≈ −0.00005, the roots of the characteristic equation

det(A − λI) = det [ 1.00−λ   0.99
                     0.99   0.98−λ ]
            = (1 − λ)(0.98 − λ) − 0.99²
            = λ² − 1.98λ + 0.98 − 0.9801 = λ² − 1.98λ − 0.0001.

Thus the spectral condition number is cond*_rel(A) = |1.98| / |−0.00005| = 39,600.

Hence, this matrix is very ill-conditioned.
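
The same numbers fall out of a short computation (Python/NumPy sketch, our choice of tools):

import numpy as np

A = np.array([[1.00, 0.99], [0.99, 0.98]])
lam = np.linalg.eigvals(A)
print(lam)                            # roughly [1.98005, -0.0000505]
print(max(abs(lam)) / min(abs(lam)))  # roughly 3.9e4
# The slide's 39,600 comes from the rounded eigenvalues 1.98 and 0.00005.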

Properties of the Condition Number

The condition number cond_rel(A) is bounded below by 1: this is seen by noting that ‖I‖ = 1 for any induced norm and

1 = ‖I‖ = ‖AA⁻¹‖ ≤ ‖A‖ ‖A⁻¹‖ = cond_rel(A).

Fact

Each norm-based condition number is also bounded below by the spectral condition number of A:

1 ≤ cond*_rel(A) ≤ cond_rel(A)

for any norm.

Thus the spectral condition number is the smallest measure of relative condition of the system of linear equations problem.

Perturbation of A

If A is perturbed by δA then we have

b = (A + δA)(x + δx)

= Ax + Aδx + δAx + δAδx

⇒ Aδx = −δA(x + δx)

⇒ δx = −A⁻¹δA(x + δx).

Taking norms and using the triangle inequality we have

‖δx‖ = ‖A⁻¹δA(x + δx)‖ ≤ ‖A⁻¹‖ ‖δA‖ (‖x‖ + ‖δx‖)

⇒ ‖δx‖ (1 − ‖A⁻¹‖ ‖δA‖) ≤ ‖A⁻¹‖ ‖δA‖ ‖x‖.

Thus

‖δx‖/‖x‖ ≤ ‖A⁻¹‖ ‖δA‖ / (1 − ‖A⁻¹‖ ‖δA‖) = [‖A‖ ‖A⁻¹‖ / (1 − ‖A⁻¹‖ ‖δA‖)] · ‖δA‖/‖A‖.

The Condition Number Again...

Since 1 = ‖I‖ = ‖AA⁻¹‖ ≤ ‖A‖ ‖A⁻¹‖, we have 1/‖A‖ ≤ ‖A⁻¹‖.

Thus, if ‖A⁻¹‖ ‖δA‖ ≪ 1 (and so ‖δA‖/‖A‖ ≪ 1), then 1 − ‖A⁻¹‖ ‖δA‖ ≈ 1 and we have

‖δx‖/‖x‖ ≤ cond_rel(A) · ‖δA‖/‖A‖.

Thus, for a small perturbation of A, we again have that the condition number measures the relative condition of the system of linear equations problem.

A similar result can be derived for the case where both A and b are perturbed.

“How Close to Singular?”

Theorem

If A is non-singular and

‖δA‖/‖A‖ < 1/cond_rel(A),

then A + δA is also non-singular.

This theorem tells us that the condition number measures the distance from A to the nearest singular matrix: it is a better measure than the determinant of “how close to singularity” a matrix is.
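
For the ill-conditioned example matrix we can exhibit a nearby singular matrix explicitly (a Python/NumPy sketch, our choice of tools): changing a_22 from 0.98 to 0.9801 makes the determinant zero, and the relative size of this perturbation is of order 1/cond_rel(A), consistent with the theorem.

import numpy as np

A = np.array([[1.00, 0.99], [0.99, 0.98]])
dA = np.array([[0.0, 0.0], [0.0, 0.0001]])  # perturb a_22 only

print(np.linalg.det(A + dA))                         # ~0 up to rounding: singular
print(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2))  # ~5.1e-5
print(1 / np.linalg.cond(A, p=2))                    # ~2.6e-5: the theorem's threshold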

Errors and Residuals

There are two common ways to measure the discrepancy between the true solution x and the computed solution x̂:

Error δx = x − x̂

Residual r = b − Ax̂

If A is invertible, and either δx or r is zero, then both must be zero. In many applications, we want to solve Ax = b so that r, the difference between the LHS and RHS, is small, i.e., so that ‖r‖ = ‖b − Ax̂‖ is small.

Intuitively, we can think of the residual as follows: if you have a computed solution x̂ to a system of linear equations and you know the exact solution x, then you know the error δx = x − x̂; but if you don't know the solution x beforehand, then the residual r = b − Ax̂ is a measure along a different axis of how close you are.
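
The ill-conditioned example above illustrates the distinction (a Python/NumPy sketch, our choice of tools): the candidate solution (3, −1.0203) has a tiny residual but a huge error.

import numpy as np

A = np.array([[1.00, 0.99], [0.99, 0.98]])
b = np.array([1.99, 1.97])
x = np.array([1.0, 1.0])          # exact solution
x_hat = np.array([3.0, -1.0203])  # computed "solution"

print(np.linalg.norm(x - x_hat))      # error:    ~2.84   (large)
print(np.linalg.norm(b - A @ x_hat))  # residual: ~1.4e-4 (small)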

Cond(A), r and Errors in x

Let x̂ be the computed solution to Ax = b. Then δx = x − x̂ and r = b − Ax̂, giving

Aδx = Ax − Ax̂ = b − Ax̂ = r ⇒ δx = A⁻¹r.

Thus ‖δx‖ ≤ ‖A⁻¹‖ ‖r‖ (property of matrix norms) [1].

Similarly ‖b‖ ≤ ‖A‖ ‖x‖, so 1/‖x‖ ≤ ‖A‖/‖b‖ [2].

It follows, combining [1] and [2], that

‖δx‖/‖x‖ ≤ ‖A‖ ‖A⁻¹‖ · ‖r‖/‖b‖,

i.e.,

‖δx‖/‖x‖ ≤ cond_rel(A) · ‖r‖/‖b‖.

The Effect of Ill-conditioned A

The conclusion is: if A is ill-conditioned then small ‖r‖ does not imply small ‖δx‖/‖x‖. (We'll see there is a similar conclusion for solutions of non-linear equations: if the problem is ill-conditioned, then a small “residual” |f(x_k)| does not mean that |x_k − x_{k−1}| is small.)

Solutions with small residuals can still have large errors in x if A is ill-conditioned.

Special Systems

Let us consider the following types of matrices:

Symmetric: A = Aᵀ.

Positive definite: xᵀAx > 0 for all x ≠ 0; for an eigenpair Ax = λx this gives xᵀAx = λxᵀx > 0, and hence all eigenvalues λ > 0.

Diagonally dominant (DD): each diagonal element is larger than or equal to the sum of the absolute values of the other elements in its row, i.e., |a_ii| ≥ Σ_{j≠i} |a_ij|.

Strictly DD: the same, except with strict inequality, i.e., |a_ii| > Σ_{j≠i} |a_ij|.

Upper triangular (with a_ii ≠ 0):

[ a_11  a_12  ⋯   a_1n
   0    a_22       ⋮
   ⋮          ⋱    ⋮
   0     ⋯    0   a_nn ]

Recall that if a matrix is in echelon form (e.g., upper triangular), the first non-zero entry in a row is called the pivot for that row: here a_kk is the pivot for the kth row.

An Overview of the Algorithms

Direct methods for solving Ax = b apply elementary matrix operations to A and b, giving a transformed problem A′x′ = b′ which is easily solved for x′. Within direct methods:

In Gauss-Jordan, multiples of a pivot row are subtracted from other rows, such that one obtains an upper triangular matrix first, and an identity matrix next. Gauss-Jordan works (with appropriate pivoting) on any matrix, but is stable only for diagonally dominant or positive-definite matrices.

Gauss-Jordan is also closely related to the LU and LUP decompositions, where U stands for an upper triangular matrix and L stands for a lower triangular matrix.

On symmetric positive definite matrices, one can also use other decomposition methods (e.g., Cholesky, QR), which are stable and faster.

An Overview of the Algorithms

Iterative methods successively improve an initial guess until it becomes satisfactory.

Iterative methods for systems of linear equations are best understood as means of solving an associated optimisation problem.

Consider the quadric f(x) := ½xᵀAx − bᵀx + c with A positive definite. Whenever the first-order optimality conditions of min_{x∈Rⁿ} f(x) are satisfied, i.e., ∇f(x) = Ax − b = 0, we have Ax = b.
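
To make the connection concrete, here is a minimal steepest-descent sketch for this optimisation problem (not from the slides; Python/NumPy is our choice of tools). Since ∇f(x) = Ax − b, the negative gradient is exactly the residual r = b − Ax, and for positive definite A the exact line search along r has a closed form:

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    # Minimise f(x) = 0.5 x^T A x - b^T x (the constant c does not matter)
    # for symmetric positive definite A; the minimiser solves Ax = b.
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                    # residual = minus the gradient of f
        if np.linalg.norm(r) < tol:      # stationary point reached: Ax = b
            break
        alpha = (r @ r) / (r @ (A @ r))  # exact minimiser of f along direction r
        x = x + alpha * r
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # a small symmetric positive definite example
b = np.array([1.0, 2.0])
print(steepest_descent(A, b, np.zeros(2)))  # agrees with np.linalg.solve(A, b)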

An Overview of the Algorithms

Within iterative methods:

The Jacobi method is guaranteed to converge if A is strictly diagonally dominant (a minimal sketch follows this list).

Gauss-Seidel is guaranteed to converge if A is either strictly diagonally dominant or symmetric positive definite.

Many other algorithms work on symmetric positive-definite matrices (CG) or even on general non-singular matrices (GMRES).

In a number of applications, iterative methods are preferred to direct methods, especially when the coefficient matrix A is sparse or structured.
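
A minimal Jacobi sketch (Python/NumPy, our choice of tools; not part of the slides): split A into its diagonal D and off-diagonal remainder R, and iterate x ← D⁻¹(b − Rx).

import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    D = np.diag(A)          # the diagonal entries of A
    R = A - np.diagflat(D)  # the off-diagonal part of A
    x = x0.astype(float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # strictly diagonally dominant
b = np.array([1.0, 2.0])
print(jacobi(A, b, np.zeros(2)))        # roughly [0.0909, 0.6364]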

Gauss-Jordan

Recall that this method uses a sequence of elementary matrix operations to transform the square system Ax = b into an upper triangular system Ux = b′, which is then solved using back substitution.

We use a superscript in parentheses to denote the stage: x_i^(k) denotes the value for x_i at the kth stage and A^(k) denotes the matrix A at this stage.

At stage k we have:

[ a_11^(1)  a_12^(1)  ⋯  a_1k^(1)  ⋯  a_1n^(1) | b_1^(1)
    0       a_22^(2)  ⋯  a_2k^(2)  ⋯  a_2n^(2) | b_2^(2)
    ⋮                 ⋱                        | ⋮
    0       ⋯         ⋯  a_kk^(k)  ⋯  a_kn^(k) | b_k^(k)
    ⋮                                          | ⋮
    0       ⋯         ⋯  a_nk^(k)  ⋯  a_nn^(k) | b_n^(k) ]  =  ( A^(k) | b^(k) )

What Gauss-Jordan does at Stage k

The elements a_{k+1,k}^(k), a_{k+2,k}^(k), ..., a_{n,k}^(k) are eliminated by subtracting the following multiples of row k from rows k+1, k+2, ..., n:

m_{k+1,k} := a_{k+1,k}^(k) / a_{kk}^(k),   m_{k+2,k} := a_{k+2,k}^(k) / a_{kk}^(k),   ...,   m_{n,k} := a_{n,k}^(k) / a_{kk}^(k).

We have in general, assuming that a_{kk}^(k) ≠ 0, the (i, k) multiplier

m_{ik} := a_{ik}^(k) / a_{kk}^(k),   i = k+1, ..., n

and, for all i, j = k+1, ..., n,

a_{ij}^(k+1) = a_{ij}^(k) − m_{ik} a_{kj}^(k),

b_i^(k+1) = b_i^(k) − m_{ik} b_k^(k).
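
Putting the stage-k updates together gives forward elimination followed by back substitution on the resulting upper triangular system. A minimal sketch (Python/NumPy, our choice of tools), without pivoting, so it assumes a_kk^(k) ≠ 0 at every stage:

import numpy as np

def gaussian_elimination(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):               # stage k of the elimination
        for i in range(k + 1, n):
            m_ik = A[i, k] / A[k, k]     # the multiplier m_ik
            A[i, k:] -= m_ik * A[k, k:]  # a_ij^(k+1) = a_ij^(k) - m_ik a_kj^(k)
            b[i] -= m_ik * b[k]          # b_i^(k+1)  = b_i^(k)  - m_ik b_k^(k)
    x = np.zeros(n)                      # back substitution on Ux = b'
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[1.00, 0.99], [0.99, 0.98]])
b = np.array([1.99, 1.97])
print(gaussian_elimination(A, b))        # roughly [1., 1.]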

A Picture of the Matrix at Stage k

Note that rows $1, \ldots, k$ will not change from stage $k+1$ onwards.

[Figure: Gauss-Jordan, changes at stage k. The pivot $a_{kk}$ is marked; the entries below it in column $k$ are reduced to zero; the trailing submatrix is the part of the matrix that changes.]

Gauss-Jordan

     1  def GaussJordan(A, b, pivoting=noPivot):
     2      (rows, cols) = A.shape
     3      for row in range(0, rows-1):
     4          pivot = pivoting(A, row)
     5          if abs(A[pivot, row]) < 1e-8: raise ValueError()
     6          if pivot != row:
     7              A[[row, pivot], :] = A[[pivot, row], :]
     8              b[[row, pivot]] = b[[pivot, row]]
     9          for i in range(row+1, rows):
    10              if abs(A[row, row]) < 1e-8: raise ValueError()
    11              factor = A[i, row] / A[row, row]
    12              A[i, row+1:rows] = A[i, row+1:rows] - factor*A[row, row+1:rows]
    13              b[i] = b[i] - factor*b[row]
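The pivoting argument is a callable returning the index of the pivot row for the current stage. The default noPivot is not shown on the slides; a minimal placeholder consistent with the call above (an assumption, to be defined before GaussJordan) would be:

    def noPivot(A, row):
        # trivial rule (assumed helper): always pivot on the diagonal entry
        return row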

Gauss-Jordan

The back substitution can be written as follows (dot is numpy.dot; this loop forms the tail of GaussJordan, overwriting b with the solution):

    for k in range(rows-1, -1, -1):
        b[k] = (b[k] - dot(A[k, k+1:rows], b[k+1:rows])) / A[k, k]
    return b
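Putting elimination and back substitution together, a quick check (the example values are hypothetical):

    import numpy as np

    A = np.array([[2., 1., 1.],
                  [4., 3., 3.],
                  [8., 7., 9.]])
    b = np.array([4., 10., 24.])
    x = GaussJordan(A.copy(), b.copy())   # pass copies: A and b are overwritten
    assert np.allclose(A @ x, b)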

Analysis of Gauss-Jordan

Note that Line 12 performs $O(n)$ "multiply–accumulate" operations, once for each of the rows below the pivot, at each of the $n-1$ stages. If we count each multiply–accumulate as one operation, the number $S(n)$ of operations performed is:

$$
S(n) = \sum_{k=1}^{n-1} \sum_{i=k+1}^{n} \sum_{j=k+1}^{n} 1
= \sum_{k=1}^{n-1} \sum_{i=k+1}^{n} (n-k)
= \sum_{k=1}^{n-1} (n-k)^2
= (n-1)^2 + (n-2)^2 + \cdots + 2^2 + 1^2
= n(n-1)(2n-1)/6 \approx n^3/3 \text{ for large } n.
$$

Hence Gauss-Jordan is a $\Theta(n^3)$ process. (The identity $\sum_{k=1}^{n-1} k^2 = \frac{1}{6}n(n-1)(2n-1)$ follows by induction.)
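A brute-force count confirms the closed form (a quick sketch):

    def op_count(n):
        # evaluate the triple sum directly
        return sum(1 for k in range(1, n)
                     for i in range(k + 1, n + 1)
                     for j in range(k + 1, n + 1))

    for n in (5, 10, 50):
        assert op_count(n) == n * (n - 1) * (2 * n - 1) // 6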


A Perspective on Gauss-Jordan

To put $\Theta(n^3)$ into perspective, consider a single computer which can sustain a performance of $10^{11}$ operations per second ("100 gigaFLOPS").

For a $10000 \times 10000$ matrix, you need $10^{12}$ operations, or 10 seconds.

For a $100000 \times 100000$ matrix, you need $10^{15}$ operations, or under 3 hours, if you can store the 80 GB in RAM.

For a $1000000 \times 1000000$ matrix, you need $10^{18}$ operations, or over 115 days, if you can store the 8 TB in RAM.

As you can test using your own laptop, this is a very optimistic estimate.
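These back-of-the-envelope figures are easy to reproduce (a sketch taking $n^3$ operations at face value and 8-byte floats for storage):

    rate = 1e11  # assumed sustained rate: 100 gigaFLOPS
    for n in (10_000, 100_000, 1_000_000):
        ops = float(n) ** 3
        print(f"n = {n:>9}: {ops:.0e} ops, {ops / rate:.1e} s, "
              f"{8 * n * n / 1e9:,.0f} GB in float64")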

Page 85: Lecture Topic: Systems of Linear Equations - IBM › researcher › files › ie... · 2015-10-08 · Introduction: systems of linear equations We examine both iterative and direct

A Perspective on Gauss-Jordan

To put Θ(n3) into perspective, consider a single computer, which can sustain theperformance of 1011 operations per second (“100 gigaFLOPS”).

For a 10000× 10000 matrix, you need 1012 operations, or 10 seconds.

For a 100000× 100000 matrix, you need 1015 operations, or under 3 hours, if youcan store the 80 GB in RAM.

For a 1000000× 1000000 matrix, you need 1018 operations, or over 115 days, ifyou can store the 8 TB in RAM.

As you can test using your own laptopn, this is a very optimistic estimate.

Jakub Marecek and Sean McGarraghy (UCD) Numerical Analysis and Software October 8, 2015 30 / 1

Page 86: Lecture Topic: Systems of Linear Equations - IBM › researcher › files › ie... · 2015-10-08 · Introduction: systems of linear equations We examine both iterative and direct

A Perspective on Gauss-Jordan

To put Θ(n3) into perspective, consider a single computer, which can sustain theperformance of 1011 operations per second (“100 gigaFLOPS”).

For a 10000× 10000 matrix, you need 1012 operations, or 10 seconds.

For a 100000× 100000 matrix, you need 1015 operations, or under 3 hours, if youcan store the 80 GB in RAM.

For a 1000000× 1000000 matrix, you need 1018 operations, or over 115 days, ifyou can store the 8 TB in RAM.

As you can test using your own laptopn, this is a very optimistic estimate.

Jakub Marecek and Sean McGarraghy (UCD) Numerical Analysis and Software October 8, 2015 30 / 1

Page 87: Lecture Topic: Systems of Linear Equations - IBM › researcher › files › ie... · 2015-10-08 · Introduction: systems of linear equations We examine both iterative and direct

A Perspective on Gauss-Jordan

To put Θ(n3) into perspective, consider a single computer, which can sustain theperformance of 1011 operations per second (“100 gigaFLOPS”).

For a 10000× 10000 matrix, you need 1012 operations, or 10 seconds.

For a 100000× 100000 matrix, you need 1015 operations, or under 3 hours, if youcan store the 80 GB in RAM.

For a 1000000× 1000000 matrix, you need 1018 operations, or over 115 days, ifyou can store the 8 TB in RAM.

As you can test using your own laptopn, this is a very optimistic estimate.

Jakub Marecek and Sean McGarraghy (UCD) Numerical Analysis and Software October 8, 2015 30 / 1

The Net Effect. . .

Gauss-Jordan transforms the original system $Ax = b$ to upper triangular form:

$$
Ux = \begin{pmatrix}
a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\
0 & a^{(2)}_{22} & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a^{(n)}_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \begin{pmatrix} b^{(1)}_{1} \\ b^{(2)}_{2} \\ \vdots \\ b^{(n)}_{n} \end{pmatrix}
$$

This system of equations can now be solved using back substitution.


Observations on Gauss-Jordan

Assumes $a^{(k)}_{kk} \neq 0$: but in fact, since $A$ is invertible, we could always swap row $k$ with a later row to get $a^{(k)}_{kk} \neq 0$ (see later).

$A$ and $b$ are overwritten.

The 0's beneath the pivot element are not calculated. They are ignored, as they are known to be zero. Thus the storage space for these zeros could be used for something else. . .

An extra matrix is not needed to store the $m_{ik}$'s: they can be stored in place of the zeros (see the sketch below).

The operations on $b$ can be done separately, once we have stored the $m_{ik}$'s.

Because of the last observation, we may now solve for any $b$ without going through the elimination calculations again.
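A minimal sketch of this storage scheme, assuming NumPy, a float matrix, and that no pivoting is needed (the name lu_inplace is illustrative):

    import numpy as np

    def lu_inplace(A):
        # Overwrite A with U on and above the diagonal, and with the
        # multipliers m_ik below it (where the zeros would have gone).
        n = A.shape[0]
        for k in range(n - 1):
            A[k+1:, k] /= A[k, k]                              # store m_ik in column k
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # usual update
        return A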


Gauss-Jordan with Varying b′

We solved $Ax = b$ using Gauss-Jordan, which required elementary row operations to be performed on both $A$ and $b$.

If we are required to solve the equation $Ax = b'$, then we would need to perform exactly the same operations, because these are determined by the elements of $A$ only, and $A$ is the same in both equations.

Hence, if we have stored the multipliers $m_{ik}$, we need to perform only the update of $b$ from the elimination loop of Gauss-Jordan, i.e.,

$$
b_i := b_i - m_{ik} b_k, \qquad k = 1, \ldots, n-1, \quad i = k+1, \ldots, n.
$$
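Using the multipliers stored by lu_inplace above, the update of a new right-hand side is then (a sketch under the same no-pivoting assumption):

    def apply_multipliers(LU, b):
        # LU: output of lu_inplace; b is left intact and a copy returned
        n = LU.shape[0]
        b = b.copy()
        for k in range(n - 1):
            b[k+1:] -= LU[k+1:, k] * b[k]   # b_i := b_i - m_ik * b_k
        return b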


The LU Decomposition of A

If at each stage $k$ of Gauss-Jordan we store $m_{ik}$ in those cells of $A$ that become zero, then the matrix $A$ after elimination would be as follows:

$$
\begin{pmatrix}
a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\
m_{21} & a^{(2)}_{22} & & \vdots \\
\vdots & & \ddots & \vdots \\
m_{n1} & m_{n2} & \cdots & a^{(n)}_{nn}
\end{pmatrix}
$$

LU Decomposition of A

We define the upper and unit lower triangular parts as

$$
U = (u_{ij}) = \begin{pmatrix}
a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\
0 & a^{(2)}_{22} & & \vdots \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a^{(n)}_{nn}
\end{pmatrix}, \qquad
L = (\ell_{ij}) = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
m_{21} & 1 & & \vdots \\
\vdots & & \ddots & \vdots \\
m_{n1} & m_{n2} & \cdots & 1
\end{pmatrix}.
$$

That is, for all $i, j \in \{1, \ldots, n\}$,

$$
u_{ij} = \begin{cases} a^{(i)}_{ij} & \text{if } i \le j \\ 0 & \text{otherwise} \end{cases}
\qquad
\ell_{ij} = \begin{cases} m_{ij} & \text{if } i > j \\ 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}
$$


An Unexpected Fact: A = LU

Theorem (LU Decomposition)

If $L = (\ell_{ij})$ and $U = (u_{ij})$ are the lower and upper triangular matrices generated by Gauss-Jordan, assuming $a^{(k)}_{kk} \neq 0$ at each stage, then

$$
A = (a_{ij}) = LU, \quad \text{that is,} \quad a_{ij} = \sum_{k=1}^{n} \ell_{ik} u_{kj},
$$

where

$$
u_{kj} = a^{(k)}_{kj} \text{ for } k \le j, \quad \text{in particular } u_{kk} = a^{(k)}_{kk},
$$

and

$$
\ell_{ik} = m_{ik} \text{ for } k < i, \qquad \ell_{kk} = 1,
$$

and this decomposition is unique.

For a proof, cf. (Watkins, 2004, pp. 51–53).
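To see the theorem numerically, one can rebuild $A$ from the factors stored by lu_inplace above (illustrative values for which no pivoting is needed):

    import numpy as np

    A = np.array([[2., 1., 1.],
                  [4., 3., 3.],
                  [8., 7., 9.]])
    F = lu_inplace(A.copy())
    L = np.tril(F, -1) + np.eye(3)   # multipliers below the diagonal, ones on it
    U = np.triu(F)
    assert np.allclose(L @ U, A)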


A Reinterpretation of Gauss-Jordan

We can now interpret Gauss-Jordan as a process which decomposes $A$ into $L$ and $U$, and hence we have

$$
Ax = LUx = L(Ux) = Ly = b.
$$

This represents two triangular systems of equations,

$$
Ly = b \quad \text{and} \quad Ux = y,
$$

whose solutions are:

$$
y = L^{-1}b, \qquad Ux = L^{-1}b, \qquad x = U^{-1}L^{-1}b.
$$
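In practice one factors once and then performs the two triangular solves per right-hand side, rather than forming inverses; e.g., with SciPy (illustrative values):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
    b = np.array([4., 10., 24.])
    lu_piv = lu_factor(A)     # factor A once
    x = lu_solve(lu_piv, b)   # forward substitution, then back substitution
    assert np.allclose(A @ x, b)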


Overall, we solve $Ly = b$ for $y$ first ("forward"), and solve $Ux = y$ for $x$ second ("backward"). The revised code is:

    from numpy import zeros_like
    from scipy.linalg import lu

    def LU(A, b):
        # permute_l=True folds the permutation into L; this code assumes
        # no row interchanges are needed, so that L is truly triangular
        L, U = lu(A, permute_l=True)

        # forward substitution: solve L y = b
        y = zeros_like(b)
        for m, bi in enumerate(b.flatten()):
            y[m] = bi
            if m:
                for n in range(m):
                    y[m] -= y[n] * L[m, n]
            y[m] /= L[m, m]

        # back substitution: solve U x = y
        x = zeros_like(b)
        for midx in range(b.size):
            m = b.size - 1 - midx
            x[m] = y[m]
            if midx:
                for nidx in range(midx):
                    n = b.size - 1 - nidx
                    x[m] -= x[n] * U[m, n]
            x[m] /= U[m, m]
        return x
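A quick check of the routine above on a small system that needs no row interchanges (hypothetical values):

    import numpy as np

    A = np.array([[4., 1.], [2., 3.]])
    b = np.array([6., 8.])
    x = LU(A, b)
    assert np.allclose(x, np.array([1., 2.]))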

The LDU Decomposition of A

Gauss-Jordan also provides the decomposition

$$
A = LDU',
$$

where $L$ and $U'$ are unit lower and unit upper triangular, and $D = \operatorname{diag}(u_{ii})$ is the diagonal matrix with $u_{11}, \ldots, u_{nn}$ as the diagonal entries.

To see this, decompose $A = LU$ and let $U' = D^{-1}U$.

Since $U$ is non-singular, $u_{ii} \neq 0$ for $i = 1, 2, \ldots, n$, and hence $D^{-1}$ exists.

It is easy to show that $U' := D^{-1}U$ is a unit upper triangular matrix.

Thus,

$$
A = LU = LDD^{-1}U = LDU'.
$$
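A small numerical check of this splitting (a sketch; SciPy's lu includes a permutation $P$, so the identity is verified with $P$ in place):

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[4., 1.], [2., 3.]])
    P, L, U = lu(A)
    D = np.diag(np.diag(U))
    Uprime = np.linalg.inv(D) @ U   # unit upper triangular
    assert np.allclose(P @ L @ D @ Uprime, A)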


The LDU Decomposition for Special Kinds of A

If $A$ is symmetric then

$$
A = LDU' = LDL^{t},
$$

where $L$ is unit lower triangular.

If $A$ is symmetric and positive definite (that is, $x^{t}Ax > 0$ for all $x \neq 0$), then each $u_{ii}$ is positive and

$$
A = LDL^{t} = L\sqrt{D}\sqrt{D}L^{t} = CC^{t},
$$

where $C = L\sqrt{D}$ and $\sqrt{D} = \operatorname{diag}(\sqrt{u_{ii}})$.

This is called the Cholesky Factorization of $A$.
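NumPy exposes this factorization directly; a quick check on an illustrative symmetric positive definite matrix:

    import numpy as np

    A = np.array([[4., 2.], [2., 3.]])   # symmetric positive definite
    C = np.linalg.cholesky(A)            # lower triangular, with A = C C^t
    assert np.allclose(C @ C.T, A)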


Pivoting in Gauss-Jordan

In Gauss-Jordan we assumed that $a^{(k)}_{kk} \neq 0$ at each stage of the process.

If $a^{(k)}_{kk} = 0$ then we can interchange rows of the matrix $A^{(k)}$ so that $a^{(k)}_{kk} \neq 0$.

In fact, we need only find a row $i > k$ for which $a^{(k)}_{ik} \neq 0$ and then interchange rows $i$ and $k$.

It can easily be shown that if $A$ is non-singular then such a row exists.

Hence, theoretically, zero pivots cause no difficulty.

However, there is a much more important reason for interchanging rows: if $a^{(k)}_{kk}$ is small (even if $a^{(k)}_{kk} \neq 0$), then division by $a^{(k)}_{kk}$ would cause problems because of roundoff.

We can see this in the next example.


More Roundoff Error

The problem with roundoff in Gauss-Jordan is that it propagates and is amplified from stage to stage, because there is no contraction of error.

Thus, roundoff error control is absolutely essential in Gauss-Jordan.

We indicate two approaches to this: Partial Pivoting and Complete Pivoting.

Note: It can be shown that the step $A^{(k)} \longrightarrow A^{(k+1)}$ in Gauss-Jordan may be viewed as multiplication by a matrix $M^{(k)}$, where $M^{(k)}$ is a product of elementary matrices (matrices associated to elementary row operations).

It can be shown that if all the multipliers have magnitude $< 1$, then the final result will be accurate, as in our second approach to the example.

This is the basic idea of partial pivoting.
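In the pivoting-callable shape used by the GaussJordan routine earlier, partial pivoting keeps every multiplier's magnitude at most 1 (a sketch; the name partialPivot is illustrative):

    import numpy as np

    def partialPivot(A, row):
        # pick the row (from `row` downwards) with the largest |entry| in
        # column `row`, so that every multiplier satisfies |m_ik| <= 1
        return row + int(np.argmax(np.abs(A[row:, row])))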


Scaled Partial Pivoting

Scaled partial pivoting is a variation of standard partial pivoting.

In scaled partial pivoting, at stage $k$, we choose as the pivot the entry in column $k$ which is of greatest absolute value relative to the entries in its row (as before, we only consider rows $k, \ldots, n$).

The scaled pivoting approach is useful when entries have large differences in absolute value, since this causes propagation of roundoff error.

We use it for systems of linear equations where the row entries vary greatly in magnitude, e.g.,

$$
\begin{pmatrix}
10 & 10^{5} & 10^{6} \\
1 & -1 & 3
\end{pmatrix}
$$

Here, it is worth transposing the two rows, since the current pivot, 10, is larger than 1 but is very small relative to the other entries $10^{5}$ and $10^{6}$ in the first row.

Without a row swap, roundoff errors will lead to loss of accuracy.
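A scaled variant of the pivoting callable might look like this (a sketch; for simplicity the row scales are recomputed over the active submatrix at each stage, and $A$ is assumed to have no zero rows):

    import numpy as np

    def scaledPartialPivot(A, row):
        # compare entries of column `row` relative to the largest |entry|
        # in each candidate row of the active submatrix
        scales = np.abs(A[row:, row:]).max(axis=1)
        return row + int(np.argmax(np.abs(A[row:, row]) / scales))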



Complete Pivoting

Complete pivoting (also called maximal pivoting) is a natural extension of partial pivoting whereby we find i∗ and j∗ such that |a_{i∗j∗}| = max_{k≤i,j≤n} |a_{ij}|.

This means that we interchange rows i∗ and k and columns j∗ and k.

The row interchange does not have any effect (theoretically) on the solution, but the column interchange interchanges the variable names (labels), i.e., x_{j∗} ↔ x_k.

These interchanges of columns must be recorded so that the correct variable is associated with the corresponding solution value at the end of the algorithm.

Complete pivoting is an O(n²) process at each stage k.

Thus it adds O(n³) steps to Gauss-Jordan, which is a substantial increase, although G.E. is still Θ(n³).

It is rarely used because it has been found in practice that partial pivoting is adequate to ensure numerical stability, except in isolated cases: in these cases, complete pivoting may be needed to attain acceptable accuracy.
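As an illustration (ours, not from the slides; the helper name complete_pivot and the matrix are made up), the search and the bookkeeping of column swaps might look as follows:

import numpy as np

def complete_pivot(A, k):
    # Index (i*, j*) of the entry of largest magnitude in A[k:, k:].
    sub = np.abs(A[k:, k:])
    i, j = np.unravel_index(np.argmax(sub), sub.shape)
    return k + i, k + j

A = np.array([[2.0, 1.0, 7.0],
              [4.0, 3.0, 1.0],
              [1.0, 9.0, 2.0]])
labels = np.arange(A.shape[1])             # which variable each column holds
i_star, j_star = complete_pivot(A, 0)      # -> (2, 1): the entry 9
A[[0, i_star]] = A[[i_star, 0]]            # row interchange
A[:, [0, j_star]] = A[:, [j_star, 0]]      # column interchange
labels[[0, j_star]] = labels[[j_star, 0]]  # record x_{j*} <-> x_k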




Direct Methods: Conclusions

In theory, the complexity can be decreased to that of matrix-matrix multiplication.

Complete pivoting is safe (proven), but so computationally expensive that it is rarely used.

Partial pivoting is safe with high probability, particularly if the scaled version is used (an experimental result).

In practice, the various decompositions (LU, LDU, LUP, Cholesky, etc.) are of particular importance, as they often allow for elegant solutions of non-trivial problems; see the sketch below.
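For instance, SciPy exposes an LU factorisation with partial pivoting whose factors can be reused across right-hand sides (a minimal illustration, ours, not from the slides):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

lu, piv = lu_factor(A)      # LU with partial pivoting (the LUP decomposition)
x = lu_solve((lu, piv), b)  # cheap solve; the factors can be reused for other b
assert np.allclose(A @ x, b)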



Iterative Methods for Solving Systems of Linear Equations

Iterative methods successively improve an initial guess until it becomes satisfactory. The iterative solution of Ax = b requires the equation to be re-arranged into fixed point form as follows:

x = T(x) := Cx + d.

Since subscripts are traditionally used to indicate components of a vector, we will use a superscript on the vector x to denote the iteration:

x^k is the kth “guess” or iteration of the solution vector x.

Then x^k_i denotes the value of the ith component x_i at the kth iteration.




A Revision

The convergence of the fixed-point iteration is usually restricted to diagonally dominant matrices, because:

T is a contraction mapping ⇐⇒ the spectral radius r(C) < 1, where r(C) is the largest absolute value of an eigenvalue of C.

A sufficient condition for this is that for some matrix norm ‖·‖ we have ‖C‖ < 1. This is the case for strictly diagonally dominant matrices.

Then Banach’s Fixed Point Theorem tells us that the sequence (x^k) defined by x^{k+1} := T(x^k) will converge to a unique limit x, the solution of Ax = b.
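A quick numerical check of this condition (ours, not from the slides), using the Jacobi iteration matrix C = −D⁻¹(A − D) introduced below, for a strictly diagonally dominant A:

import numpy as np

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])   # strictly diagonally dominant

D = np.diag(np.diag(A))
C = -np.linalg.inv(D) @ (A - D)   # Jacobi iteration matrix
r = max(abs(np.linalg.eigvals(C)))             # spectral radius
print(r < 1, np.linalg.norm(C, np.inf) < 1)    # True True: T contracts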



Order of Convergence

Assume the sequence (x^k) converges to the fixed point x and define e^{k+1} = x − x^{k+1}, the error at the (k+1)th iteration.

Then we have

x − x^{k+1} = x − (Cx^k + d)
            = Cx + d − (Cx^k + d)   since x is a fixed point
            = C(x − x^k)            by linearity of matrix multiplication.

Hence ‖e^{k+1}‖ ≤ ‖C‖ ‖e^k‖, i.e., linear order of convergence.

It is obvious that the smaller ‖C‖ is, the faster the iterations converge to a solution.




Transforming Ax = b to x = Cx + d

A can be split to rewrite Ax = b in fixed point form x = Cx + d in a number of ways, including the Jacobi and Gauss-Seidel splittings.

In both cases, because of the way C is derived from A, it turns out that if A is strictly diagonally dominant, then ‖C‖∞ < 1, and our sufficient condition for convergence of the sequence (x^k) holds true.




Jacobi Method

This splits A as follows:

Ax = (A − D + D)x = b,

where D is the diagonal matrix formed from the diagonal elements of A. This leads to

C = −D^{−1}(A − D) and d = D^{−1}b.

Each component of the new vector x^{k+1} can be calculated using A and b:

for i := 1 to n do
    x^{k+1}_i := (1/a_{ii}) ( b_i − Σ_{j=1}^{i−1} a_{ij} x^k_j − Σ_{j=i+1}^{n} a_{ij} x^k_j )

This iteration formula can be written in correction form as:

for i := 1 to n do
    x^{k+1}_i := x^k_i + (1/a_{ii}) ( b_i − Σ_{j=1}^{n} a_{ij} x^k_j ).




Jacobi Method

In terms of code:

import numpy as np

def Jacobi(A, b, tol=1e-10, limit=100):
    x = np.zeros_like(b, dtype=float)
    for iteration in range(limit):
        x_new = np.zeros_like(x)
        for i in range(A.shape[0]):
            s1 = np.dot(A[i, :i], x[:i])          # sum of a_ij x_j for j < i
            s2 = np.dot(A[i, i + 1:], x[i + 1:])  # sum of a_ij x_j for j > i
            x_new[i] = (b[i] - s1 - s2) / A[i, i]
        if np.allclose(x, x_new, atol=tol):
            break
        x = x_new
    return x
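A quick usage check (ours, not from the slides): on a strictly diagonally dominant system, the iterates should match NumPy's direct solver.

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])
b = np.array([6.0, 8.0, 4.0])

x = Jacobi(A, b)
print(np.allclose(x, np.linalg.solve(A, b), atol=1e-8))  # True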



Gauss-Seidel Method

This splits A as follows:

Ax = (L + D + U)x = b,

where L, U and D are the matrices formed from the sub-, super-, and diagonal elements of A, respectively. This leads to

C = −(D + L)^{−1}U and d = (D + L)^{−1}b.

Each component of the new vector x^{k+1} can be calculated using A and b:

for i := 1 to n do
    x^{k+1}_i := (1/a_{ii}) ( b_i − Σ_{j=1}^{i−1} a_{ij} x^{k+1}_j − Σ_{j=i+1}^{n} a_{ij} x^k_j ).




Gauss-Seidel Method

In terms of code:

import numpy as np

def GaussSeidel(A, b, tol=1e-10, limit=100):
    x = np.zeros_like(b, dtype=float)
    for iteration in range(limit):
        x_new = np.zeros_like(x)
        for i in range(A.shape[0]):
            s1 = np.dot(A[i, :i], x_new[:i])      # new values for j < i
            s2 = np.dot(A[i, i + 1:], x[i + 1:])  # old values for j > i
            x_new[i] = (b[i] - s1 - s2) / A[i, i]
        if np.allclose(x, x_new, atol=tol):       # absolute tolerance, as in Jacobi
            break
        x = x_new
    return x
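On the same diagonally dominant test system as before (our check, not from the slides), Gauss-Seidel agrees with the direct solver; it typically reaches the tolerance in fewer iterations than Jacobi because it consumes new components immediately.

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])
b = np.array([6.0, 8.0, 4.0])

print(np.allclose(GaussSeidel(A, b), np.linalg.solve(A, b), atol=1e-8))  # True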



Contrasting Jacobi and Gauss-Seidel

Gauss-Seidel uses a new component of x as soon as it becomes available, in contrast to the Jacobi method, which waits for all n new components before using any of them.

The correction form of the Gauss-Seidel iteration formula is

for i := 1 to n do
    x^{k+1}_i := x^k_i + (1/a_{ii}) ( b_i − Σ_{j=1}^{i−1} a_{ij} x^{k+1}_j − Σ_{j=i}^{n} a_{ij} x^k_j ).

In vector-matrix form this is

x^{k+1} = x^k + D^{−1}(b − Lx^{k+1} − (D + U)x^k) = x^k + D^{−1} r^{k,k+1},

where r^{k,k+1} is the ‘residual’ after the kth iteration.




Comparison of Iterative Methods

All have first order convergence, i.e., ‖e^{k+1}‖ ≤ ‖C‖ ‖e^k‖, where C depends on the method used.

The similarities between the methods can be seen most easily if we write them in matrix correction form:

x^{k+1} = x^k + D^{−1}(b − Ax^k) = x^k + D^{−1} r^k

(Jacobi: here r^k is the residual after the kth iteration);

x^{k+1} = x^k + D^{−1}(b − Lx^{k+1} − (D + U)x^k) = x^k + D^{−1} r^{k,k+1}

(Gauss-Seidel: here r^{k,k+1} is the ‘residual’ after the kth iteration).

Thus Jacobi and Gauss-Seidel use different approximations to the matrix A^{−1}.

In both cases, the rate of convergence slows down as the condition number increases.




Iterative Methods in the Real World

There are much more sophisticated iterative methods, including conjugate gradients (CG), generalised minimal residuals (GMRES), and numerous randomised methods.

More importantly, there are sophisticated means of preconditioning, i.e., lowering the condition number.

These fall outside of our scope, but we will provide the briefest of overviews of each.




Iterative Methods in the Real World: Randomisation

If one draws an i.i.d. random matrix S ∈ R^{m×q} at each iteration, one can apply a step where x^{k+1} is the best approximation of x^* in a random space passing through x^k:

x^{k+1} = arg min_{x∈R^n} ‖x − x^*‖²_B subject to x = x^k + B^{−1}A^T S y, y is free, (6.1)

where B is an n × n positive definite matrix used to define the B-inner product and the induced B-norm by

⟨x, y⟩_B := ⟨Bx, y⟩, ‖x‖_B := √⟨x, x⟩_B, (6.2)

where ⟨·, ·⟩ is the standard Euclidean inner product. As it turns out, one can prove very strong convergence results for such methods.



Iterative Methods in the Real World: Krylov Subspace

CG and GMRES can be explained as Krylov subspace methods, with iteration

x^{k+1} := arg min_{x∈R^n} ‖x − x^*‖²_B subject to x ∈ x^0 + K_{k+1}, (6.3)

where K_{k+1} ⊂ R^n is a (k + 1)-dimensional subspace and the constraint set x^0 + K_{k+1} is an affine space that contains x^0.

GMRES uses B = A^T A in the objective ‖x − x^*‖²_B, and

CG uses B = A.

Alternatively, one can think in terms of the Cayley–Hamilton theorem: for any invertible A there exists a polynomial q of degree n − 1 such that q(A) = A^{−1}. In each iteration, we increase the allowable degree by 1.
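In practice one rarely hand-codes these; SciPy ships reference implementations of both (a minimal illustration, ours, not from the slides):

import numpy as np
from scipy.sparse.linalg import cg, gmres

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite, so CG applies
b = np.array([1.0, 2.0])

x_cg, info_cg = cg(A, b)     # CG: for symmetric positive definite A
x_gm, info_gm = gmres(A, b)  # GMRES: for general invertible A
print(np.allclose(A @ x_cg, b), np.allclose(A @ x_gm, b))  # True True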




Iterative Methods in the Real World: Preconditioning

Many people solve P^{−1}(Ax − b) = 0 instead of Ax − b = 0, in the hope that P^{−1}A has a lower condition number than A:

x^{k+1} = x^k − γ_k P^{−1}(Ax^k − b). (6.4)

A non-singular preconditioner P is often problem-specific and applied in a matrix-free fashion, i.e., without ever instantiating P.

For example, the Jacobi preconditioner uses P = diag(A).

Many other preconditioners approximate A^{−1}.
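For instance, the Jacobi preconditioner can be passed to SciPy's CG in matrix-free form, as a LinearOperator that applies P^{−1} (a minimal sketch, ours, not from the slides; the poorly scaled matrix is a made-up example):

import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

A = np.array([[100.0, 1.0],
              [1.0, 0.1]])   # symmetric positive definite but poorly scaled
b = np.array([1.0, 1.0])

d = np.diag(A)
M = LinearOperator(A.shape, matvec=lambda v: v / d)  # applies P^{-1} = diag(A)^{-1}
x, info = cg(A, b, M=M)       # M plays the role of an approximation to A^{-1}
print(np.allclose(A @ x, b))  # True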

