functions of a matrix: theory and computationhigham/talks/talk06_funm.pdf · functions of a matrix:...
TRANSCRIPT
![Page 1: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/1.jpg)
Functions of a Matrix:
Theory and Computation
Nick HighamSchool of Mathematics
The University of Manchester
http://www.ma.man.ac.uk/~higham/
Landscapes in Mathematical Science,University of Bath, November 24 2006
![Page 2: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/2.jpg)
Outline
1 Definition of f (A)
2 Motivation and MATLAB
3 eA and its Frechét derivative
4 A1/2: Modified Newton Methods
MIMS Nick Higham Functions of a Matrix 2 / 45
![Page 3: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/3.jpg)
Function of a Matrix
f : Cn×n 7→ C
n×n for an underlying scalar function f .
These are not matrix functions:
trace(A), det(A).
The adjugate (or adjoint) matrix.
Transfer function f (t) = B(tI − A)−1C.
sin A = (sin aij).
These are matrix functions:
eA = I + A +A2
2!+ · · · .
log(I + A) = A− A2
2+
A3
3+ · · · , ρ(A) < 1.
A−1, A1/2.
MIMS Nick Higham Functions of a Matrix 3 / 45
![Page 4: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/4.jpg)
Multiplicity of Definitions
There have been proposed in the literature since 1880
eight distinct definitions of a matric function,
by Weyr, Sylvester and Buchheim,
Giorgi, Cartan, Fantappiè, Cipolla,
Schwerdtfeger and Richter.
— R. F. Rinehart,The Equivalence of Definitions of a Matric Function,
Amer. Math. Monthly (1955)
MIMS Nick Higham Functions of a Matrix 4 / 45
![Page 5: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/5.jpg)
Jordan Canonical Form
Z−1AZ = J = diag(J1, . . . , Jp), Jk︸︷︷︸mk×mk
=
λk 1
λk. . .. . . 1
λk
Definition
f (A) = Zf (J)Z−1 = Zdiag(f (Jk))Z−1,
f (Jk) =
f (λk) f ′(λk) . . .f (mk−1))(λk)
(mk − 1)!
f (λk). . .
.... . . f ′(λk)
f (λk)
.
MIMS Nick Higham Functions of a Matrix 5 / 45
![Page 6: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/6.jpg)
The Formula for f (Jk)
Write Jk = λk I + Ek ∈ Cmk×mk . For mk = 3 we have
Ek =
0 1 0
0 0 1
0 0 0
, E2
k =
0 0 1
0 0 0
0 0 0
, E3
k = 0.
Assume f has Taylor expansion
f (t) = f (λk) + f ′(λk)(t − λk) + · · ·+ f (j)(λk)(t − λk)j
j!+ · · · .
Then
f (Jk) = f (λk)I + f ′(λk)Ek + · · ·+ f (mk−1)(λk)Emk−1k
(mk − 1)!.
MIMS Nick Higham Functions of a Matrix 6 / 45
![Page 7: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/7.jpg)
Interpolation
Definition (Sylvester, 1883; Buchheim, 1886)
Distinct e’vals λ1, . . . , λs, ni = max size of Jordan blocks for
λi . Then f (A) = r(A), where r is unique Hermite
interpolating poly of degree <∑s
i=1 ni satisfying
r (j)(λi) = f (j)(λi), j = 0 : ni − 1, i = 1 : s.
MIMS Nick Higham Functions of a Matrix 7 / 45
![Page 8: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/8.jpg)
Interpolation
Definition (Sylvester, 1883; Buchheim, 1886)
Distinct e’vals λ1, . . . , λs, ni = max size of Jordan blocks for
λi . Then f (A) = r(A), where r is unique Hermite
interpolating poly of degree <∑s
i=1 ni satisfying
r (j)(λi) = f (j)(λi), j = 0 : ni − 1, i = 1 : s.
Example. Let f (t) = t1/2, A =
[2 2
1 3
], λ(A) = {1, 4}.
Taking +ve square roots,
r(t) = f (1)t − 4
1− 4+ f (4)
t − 1
4− 1=
1
3(t + 2).
⇒ A1/2 = r(A) =1
3(A + 2I) =
1
3
[4 2
1 5
].
MIMS Nick Higham Functions of a Matrix 7 / 45
![Page 9: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/9.jpg)
Cauchy Integral Theorem
Definition
f (A) =1
2πi
∫
Γ
f (z)(zI − A)−1 dz,
where f is analytic on and inside a closed contour Γ that
encloses λ(A).
MIMS Nick Higham Functions of a Matrix 8 / 45
![Page 10: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/10.jpg)
Equivalence of Definitions
Theorem
The three definitions are equivalent, modulo analyticity
assumption for Cauchy.
Interpolation: for basic properties.
JCF: for solving matrix equations (e.g., X 2 = A,
eX = A). For evaluation (normal A).
Cauchy: various uses.
MIMS Nick Higham Functions of a Matrix 9 / 45
![Page 11: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/11.jpg)
Equivalence of Definitions
Theorem
The three definitions are equivalent, modulo analyticity
assumption for Cauchy.
Interpolation: for basic properties.
JCF: for solving matrix equations (e.g., X 2 = A,
eX = A). For evaluation (normal A).
Cauchy: various uses.
For computation:
Use the definitions (with care).
Schur decomposition for general f .
Methods specific to particular f .
MIMS Nick Higham Functions of a Matrix 9 / 45
![Page 12: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/12.jpg)
Outline
1 Definition of f (A)
2 Motivation and MATLAB
3 eA and its Frechét derivative
4 A1/2: Modified Newton Methods
MIMS Nick Higham Functions of a Matrix 10 / 45
![Page 13: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/13.jpg)
Toolbox of Matrix Functions
Want to have techniques for evaluating interesting f at
matrix args as well as scalar args.
Example:
d2y
dt2+ Ay = 0, y(0) = y0, y ′(0) = y ′
0
has solution
y(t) = cos(√
At)y0 +(√
A)−1
sin(√
At)y ′0,
where√
A is any square root of A.
MATLAB has expm, logm, sqrtm, funm.
MIMS Nick Higham Functions of a Matrix 11 / 45
![Page 14: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/14.jpg)
Classic MATLAB
< M A T L A B >
Version of 01/10/84
HELP is available
<>
help
Type HELP followed by
INTRO (To get started)
NEWS (recent revisions)
ABS ANS ATAN BASE CHAR CHOL CHOP CLEA COND CONJ COS
DET DIAG DIAR DISP EDIT EIG ELSE END EPS EXEC EXIT
EXP EYE FILE FLOP FLPS FOR FUN HESS HILB IF IMAG
INV KRON LINE LOAD LOG LONG LU MACR MAGI NORM ONES
ORTH PINV PLOT POLY PRIN PROD QR RAND RANK RCON RAT
REAL RETU RREF ROOT ROUN SAVE SCHU SHOR SEMI SIN SIZE
SQRT STOP SUM SVD TRIL TRIU USER WHAT WHIL WHO WHY
< > ( ) = . , ; \ / ’ + - * :
MIMS Nick Higham Functions of a Matrix 12 / 45
![Page 15: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/15.jpg)
Classic MATLAB
<>
help fun
FUN For matrix arguments X , the functions SIN, COS, ATAN,
SQRT, LOG, EXP and X**p are computed using eigenvalues D
and eigenvectors V . If <V,D> = EIG(X) then f(X) =
V*f(D)/V . This method may give inaccurate results if V
is badly conditioned. Some idea of the accuracy can be
obtained by comparing X**1 with X .
For vector arguments, the function is applied to each
component.
The availability of [FUN] in early versions of MATLAB
quite possibly contributed to
the system’s technical and commercial success.
— Cleve Moler (2003)
MIMS Nick Higham Functions of a Matrix 13 / 45
![Page 16: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/16.jpg)
Outline
1 Definition of f (A)
2 Motivation and MATLAB
3 eA and its Frechét derivative
4 A1/2: Modified Newton Methods
MIMS Nick Higham Functions of a Matrix 14 / 45
![Page 17: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/17.jpg)
Matrix Exponential
Large literature.
Early exposition in Frazer, Duncan & Collar.
Elementary Matrices and Some Applications toDynamics and Differential Equations. CUP, 1938.
Moler & Van Loan.
Nineteen dubious ways to compute the exponentialof a matrix, twenty-five years later, SIAM Rev., 45
(2003).
Over 500 citations on ISI Citation Index.
MIMS Nick Higham Functions of a Matrix 15 / 45
![Page 18: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/18.jpg)
Scaling and Squaring Method
◮ B ← A/2s so ‖B‖∞ ≈ 1
◮ rm(B) = [m/m] Padé approximant to eB
◮ X = rm(B)2s ≈ eA
Used by expm in MATLAB.
Originates with Lawson (1967).
Ward (1977): algorithm, with rounding error analysis
and a posteriori error bound.
Moler & Van Loan (1978): give backward error
analysis covering truncation error in Padé
approximants, allowing choice of s and m.
MIMS Nick Higham Functions of a Matrix 16 / 45
![Page 19: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/19.jpg)
Padé Approximations rm to ex
rm(x) = pm(x)/qm(x) known explicitly:
pm(x) =m∑
j=0
(2m − j)!m!
(2m)! (m − j)!
x j
j!
and qm(x) = pm(−x). Error satisfies
ex−rm(x) = (−1)m (m!)2
(2m)!(2m + 1)!x2m+1+O(x2m+2).
MIMS Nick Higham Functions of a Matrix 17 / 45
![Page 20: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/20.jpg)
Analysis
Let
e−Arm(A) = I + G = eH
and assume ‖G‖ < 1. Then
‖H‖ = ‖ log(I + G)‖ ≤∞∑
j=1
‖G‖j/j = − log(1− ‖G‖).
Hence
rm(A) = eAeH = eA+H .
Rewrite as
rm(A/2s)2s
= eA+E ,
where E = 2sH satisfies
‖E‖ ≤ −2s log(1− ‖G‖).
MIMS Nick Higham Functions of a Matrix 18 / 45
![Page 21: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/21.jpg)
Result
Theorem
Let
e−2−sA rm(2−sA) = I + G,
where ‖G‖ < 1. Then the Padé approximant rm satisfies
rm(2−sA)2s
= eA+E ,
where‖E‖‖A‖ ≤
− log(1− ‖G‖)‖2−sA‖ .
◮ Need to bound ‖G‖ given m and ‖2−sA‖.◮ Then ‖E‖/‖A‖ bounded in terms of m, s and ‖A‖.◮ Now select “best” (s, m) pair for given ‖A‖.
MIMS Nick Higham Functions of a Matrix 19 / 45
![Page 22: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/22.jpg)
Bounding ‖G‖
ρ(x) := e−x rm(x)− 1 =∞∑
i=2m+1
cixi
converges absolutely for |x | < min{ |t | : qm(t) = 0 } =: νm.
Hence, with θ := ‖2−sA‖ < νm,
‖G‖ = ‖ρ(2−sA)‖ ≤∞∑
i=2m+1
|ci |θi =: f (θ). (∗)
Thus ‖E‖/‖A‖ ≤ − log(1− f (θ))/θ) .
◮ If only ‖A‖ known, (∗) is optimal bound on ‖G‖.◮ Moler & Van Loan (1978) bound less sharp;
Dieci & Papini (2000) bound a different error.
MIMS Nick Higham Functions of a Matrix 20 / 45
![Page 23: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/23.jpg)
Summary of Computations
Solve “bound = unit roundoff” by summing 150 terms of
series in 250 digit arithmetic.
Work out cost of evaluating rm(B) and of the squaring
phase.
Minimize cost subject to retaining numerical stability in
evaluation of rm(B).
Precision m θm
IEEE single 7 3.9
IEEE double 13 5.4
IEEE quad 17 3.3
MIMS Nick Higham Functions of a Matrix 21 / 45
![Page 24: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/24.jpg)
Scaling and Squaring Algorithm for eA
Algorithm 1 (H, 2005; MATLAB 7.2, Mathematica 5.1)
1 for m = [3 5 7 9]2 if ‖A‖1 ≤ θm
3 X = rm(A).4 quit
5 end
6 end
7 A← A/2s with s ≥ 0 minimal s.t. ‖A/2s‖1 ≤ θ13 = 5.48 A2 = A2, A4 = A2
2, A6 = A2A4
9 U = A[A6(b13A6 + b11A4 + b9A2) + b7A6 + b5A4 + b3A2 + b1I
]
10 V = A6(b12A6 + b10A4 + b8A2) + b6A6 + b4A4 + b2A2 + b0I
11 Solve (−U + V )r13 = U + V for r13.
12 X = r132s
by repeated squaring.
MIMS Nick Higham Functions of a Matrix 22 / 45
![Page 25: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/25.jpg)
Comparison with Existing Algorithms
Method m max ‖2−sA‖Alg 1 13 5.4
Ward (1977) 8 1.0 [θ8 = 1.5]
Old MATLAB expm 6 0.5 [θ6 = 0.54]
Sidje (1998) 6 0.5
◮ ‖A‖1 > 1: Alg 1 requires 1–2 fewer mat mults than
Ward, 2–3 fewer than expm.
◮ ‖A‖1 ∈ (2, 2.1):Alg 1 Ward expm Sidje
mults 5 7 8 10
MIMS Nick Higham Functions of a Matrix 23 / 45
![Page 26: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/26.jpg)
Squaring Phase
◮ The bound
‖A2 − fl(A2)‖ ≤ γn‖A‖2, γn =nu
1− nu.
shows the dangers in matrix squaring.
◮ Open question: are errors in squaring phase
consistent with conditioning of the problem?
◮ Our choice of parameters uses 1–5 fewer matrix
squarings than previous algorithms. Hence has
potential accuracy advantages.
MIMS Nick Higham Functions of a Matrix 24 / 45
![Page 27: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/27.jpg)
Numerical Experiment
◮ 70 test matrices, dimension 2–10.
◮ Evaluated 1-norm relative error ‖X − X‖1/‖X‖1.
◮ Notation:
◮ expm: Alg 1 (MATLAB 7.2).
◮ old_expm: MATLAB 7.1.
◮ funm: MATLAB 7.
◮ padm: Sidje.
◮ cond(A) = limǫ→0
max‖E‖2≤ǫ‖A‖2
‖eA+E − eA‖2
ǫ‖eA‖2
.
MIMS Nick Higham Functions of a Matrix 25 / 45
![Page 28: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/28.jpg)
Different S&S Codes and funm
0 10 20 30 40 50 60 70
10−18
10−16
10−14
10−12
10−10
10−8
10−6
old_expm
padm
funm
expm (Alg 1)
cond*u
MIMS Nick Higham Functions of a Matrix 26 / 45
![Page 29: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/29.jpg)
Performance Profiles
Dolan & Moré (2002) propose a new type of performance
profile.
For the given set of solvers and test problems, plot
x-axis: α
y -axis: probability that solver has error within factor αof smallest error over all solvers on the test set.
MIMS Nick Higham Functions of a Matrix 27 / 45
![Page 30: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/30.jpg)
Performance Profile
1 1.5 2 2.5 3 3.5 4 4.5 50
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
expm (Alg 1)
padm
funm
old_expm
α
p
MIMS Nick Higham Functions of a Matrix 28 / 45
![Page 31: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/31.jpg)
Frechét Derivative
Fréchet derivative of f : Cn×n → C
n×n at X ∈ Cn×n
A linear mapping L : Cn×n → C
n×n s.t. for all E ∈ Cn×n
f (X + E)− f (X )− L(X , E) = o(‖E‖).
Example For f (X ) = X 2 we have
f (X + E)− f (X ) = XE + EX + E2,
so L(X , E) = XE + EX .
MIMS Nick Higham Functions of a Matrix 29 / 45
![Page 32: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/32.jpg)
Frechét Derivative of eA
L(A, E) =
∫ 1
0
eA(1−s)EeAs ds.
‖L(A)‖ := max ‖L(A, E)‖/‖E‖.
Condition no. of the exponential: κexp(A) =‖L(A)‖‖A‖‖eA‖ .
‖L(A)‖ ≥ ‖L(A, I)‖ = ‖eA‖ ⇒ κexp(A) ≥ ‖A‖ .
Theorem
If A ∈ Cn×n is normal then in the 2-norm,
κexp(A) = ‖A‖2.
If A ∈ Rn×n is a nonnegative scalar multiple of a
stochastic matrix then in the∞-norm, κexp(A) = ‖A‖∞.
MIMS Nick Higham Functions of a Matrix 30 / 45
![Page 33: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/33.jpg)
Quadrature
Repeated trap rule:∫ 1
0f (t) dt ≈ 1
m(1
2f0 + f1 + f2 + · · ·+ fm−1 + 1
2fm), fi := f (i/m)
gives L(A, E) ≈ 1m
(12eAE +
∑m−1i=1 eA(1−i/m)EeAi/m + 1
2EeA
).
Requires eA/m, e2A/m, . . . , e(m−1)A/m, eA.
Lemma
Consider R1(A, E) =∑p
i=1 wi eA(1−ti )EeAti and denote its
m-times repeated form by Rm(A, E). If
Qs = R1(2−sA, 2−sE),
Qi−1 = e2−i AQi + Qi e2−i A, i = s : −1 : 1,
then Q0 = R2s(A, E) .
MIMS Nick Higham Functions of a Matrix 31 / 45
![Page 34: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/34.jpg)
Scaling and Squaring
Recall Lx2(A, E) = AE + EA.
Applying chain rule to eA = (eA/2)2 gives
Lexp(A, E) = Lx2
(eA/2, Lexp(A/2, E/2)
)
= eA/2Lexp(A/2, E/2) + Lexp(A/2, E/2)eA/2.
Recurrence for L0 = Lexp(A, E):
Ls = Lexp(2−sA, 2−sE),
Li−1 = e2−i ALi + Li e2−i A, i = s : −1 : 1.
MIMS Nick Higham Functions of a Matrix 32 / 45
![Page 35: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/35.jpg)
Quadrature Algorithm
Algorithm (Kenney & Laub, 1998; Mathias, 1993)
Approximates L = Lexp(A, E) via repeated Simpson rule;
order of magnitude estimate.
1 B = A/2s with s ≥ 0 minimal s.t. ‖A/2s‖1 ≤ 1/2
2 X = eB
3 X = eB/2
4 Qs = 2−s(XE + 4XEX + EX )/6
5 for i = s:−1: 1
6 if i < s, X = e2−i A, end
7 Qi−1 = XQi + QiX
8 end
9 L = Q0
MIMS Nick Higham Functions of a Matrix 33 / 45
![Page 36: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/36.jpg)
Kronecker Formula
L(A, E) =
∫ 1
0
eA(1−s)EeAs ds.
MIMS Nick Higham Functions of a Matrix 34 / 45
![Page 37: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/37.jpg)
Kronecker Formula
L(A, E) =
∫ 1
0
eA(1−s)EeAs ds.
A⊗ B = (aijB) ∈ Cn2×n2
.
A⊕ B = A⊗ In + Im ⊗ B.
eA⊕B = eA ⊗ eB.
vec(AXB) = (BT ⊗ A)vec(X ).
MIMS Nick Higham Functions of a Matrix 34 / 45
![Page 38: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/38.jpg)
Kronecker Formula
L(A, E) =
∫ 1
0
eA(1−s)EeAs ds.
A⊗ B = (aijB) ∈ Cn2×n2
.
A⊕ B = A⊗ In + Im ⊗ B.
eA⊕B = eA ⊗ eB.
vec(AXB) = (BT ⊗ A)vec(X ).
Theorem
vec(L(A, E)) = K (A)vec(E), where
K (A) = 12(eAT ⊕ eA)τ
(12[AT ⊕ (−A)]
)∈ C
n2×n2,
τ(x) = tanh(x)/x and 12‖AT ⊕ (−A)‖ < π/2 assumed.
MIMS Nick Higham Functions of a Matrix 34 / 45
![Page 39: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/39.jpg)
Approximating the Frechét Derivative
[AT ⊕ (−A)− θI
]vec(E) =
[AT ⊗ I − I ⊗ A− θI
]vec(E)
= vec(EA− AE − θI)
= vec(E(A− θ/2 · I)− (A + θ/2 · I)E).
Obtain Padé approximant
τ(x) = tanh(x)/x ≈ rm(x) =m∏
i=1
(x/βi − 1)−1(x/αi − 1)
by truncating
τ(x) = 1 +1
1 +x2/(1 · 3)
1 +x2/(3 · 5)
1 + · · · x2/((2k − 1) · (2k + 1))
1 + · · ·
.
MIMS Nick Higham Functions of a Matrix 35 / 45
![Page 40: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/40.jpg)
Algorithm (Fréchet derivative; Kenney & Laub, 1998)
Evaluates L = Lexp(A, E) using scaling and squaring and
[8/8] Padé approximant to τ .
1 B = A/2s with s ≥ 0 minimal s.t. ‖A/2s‖1 ≤ 1
2 G0 = 2−sE
3 for i = 1: 8
4 Solve (I + B/βi) Gi + Gi (I − B/βi) =(I + B/αi)Gi−1 + Gi−1(I − B/αi).
5 end
6 X = eB
7 Ls = (GmX + XGm)/2
8 for i = s:−1: 1
9 if i < s, X = e2−i A, end
10 Li−1 = XLi + LiX
11 end
12 L = L0
![Page 41: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/41.jpg)
Outline
1 Definition of f (A)
2 Motivation and MATLAB
3 eA and its Frechét derivative
4 A1/2: Modified Newton Methods
MIMS Nick Higham Functions of a Matrix 37 / 45
![Page 42: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/42.jpg)
Matrix Square Root
X is a square root of A ∈ Cn×n ⇐⇒ X 2 = A .
Number of square roots may be zero, finite or infinite.
Definition
For A with no eigenvalues on R− = {x ∈ R : x ≤ 0} the
principal square root A1/2 is unique square root X with
spectrum in open right half-plane.
MIMS Nick Higham Functions of a Matrix 38 / 45
![Page 43: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/43.jpg)
Newton’s Method for Square Root
Apply Newton to F (X ) = X 2 − A = 0: X0 given,
Solve XkEk + EkXk = A− X 2k
Xk+1 = Xk + Ek
}k = 0, 1, 2, . . .
Modified Newton iteration: freeze Fréchet derivative at X0:
Solve X0Ek + EkX0 = A− X 2k
Xk+1 = Xk + Ek
}k = 0, 1, 2, . . . ,
X0 diagonal⇒ cheap to solve for Ek .
MIMS Nick Higham Functions of a Matrix 39 / 45
![Page 44: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/44.jpg)
Visser Iteration
Set X0 = (2α)−1I in modified Newton:
Visser iteration (1937)
Xk+1 = Xk + α(A− X 2k ), X0 = (2α)−1I.
Stationary iteration.
Richardson iteration.
Linear convergence.
Choice of α?
MIMS Nick Higham Functions of a Matrix 40 / 45
![Page 45: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/45.jpg)
Visser History
Xk+1 = Xk + α(A− X 2k ), X0 = (2α)−1I.
Used with α = 1/2 by Visser (1937) to show positive
operator on Hilbert space has a positive square root.
Likewise in functional analysis texts, e.g. Riesz &
Sz.-Nagy (1956).
Enables proof of existence of A1/2 without using
spectral theorem.
Used computationally by Liebl (1965), Babuška,
Práger & Vitásek (1966), Späth (1966), Duke (1969),
Elsner (1970).
Elsner proves cgce for A ∈ Cn×n with real, positive
ei’vals if 0 < α ≤ ρ(A)−1/2.
MIMS Nick Higham Functions of a Matrix 41 / 45
![Page 46: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/46.jpg)
Visser Transformations
Let Xk = θYk , β = θα and A = θ−2A.
Yk+1 = Yk + β(A− Y 2k ), Y0 =
1
2βI.
Set β = 1/2:
Yk+1 = Yk +1
2(A− Y 2
k ), Y0 = I.
With A ≡ I − C and Yk = I − Pk :
Pk+1 =1
2(C + P2
k ), P0 = 0.
Qk = Pk/2:
Qk+1 = Q2k +
C
4, Q0 = 0.
MIMS Nick Higham Functions of a Matrix 42 / 45
![Page 47: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/47.jpg)
Visser Convergence
Xk+1 = Xk + α(A− X 2k ), X0 = (2α)−1I.
Theorem (H, 2006)
Let A ∈ Cn×n and α > 0. If Λ(I − 4α2A) lies in the cardioid
D = {2z − z2 : z ∈ C, |z| < 1 }
then A1/2 exists and Xk → A1/2 linearly.
−4 −3 −2 −1 0 1 2
−2
−1
0
1
2
MIMS Nick Higham Functions of a Matrix 43 / 45
![Page 48: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/48.jpg)
Example
A ∈ R16×16 spd with aii = i2, aij = 0.1, i 6= j .
Aim for rel residual < nu in IEEE DP arithmetic.
Pulay iteration D = diag(A): θ = 0.191, 9 iters.
Visser iteration α = 0.058 (hand optimized), 245 iters.
−4 −3 −2 −1 0 1 2
−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
MIMS Nick Higham Functions of a Matrix 44 / 45
![Page 49: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/49.jpg)
In Conclusion
Many applications of f (A), e.g. control theory,
computer graphics, theoretical physics.
Need better understanding of conditioning of
f (A).
Can we exploit structure, e.g. A ∈ matrix
automorphism group or Jordan or Lie
algebra?
Krylov methods needed for large, sparse A.
How to use Cauchy integral computationally
(H & Trefethen)?
MIMS Nick Higham Functions of a Matrix 45 / 45
![Page 50: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/50.jpg)
References I
D. A. Bini, N. J. Higham, and B. Meini.
Algorithms for the matrix pth root.
Numer. Algorithms, 39(4):349–378, 2005.
C.-H. Guo and N. J. Higham.
A Schur–Newton method for the matrix pth root and its
inverse.
SIAM J. Matrix Anal. Appl., 28(3):788–804, 2006.
N. J. Higham.
Functions of a Matrix: Theory and Computation.
Book in preparation.
MIMS Nick Higham Functions of a Matrix 44 / 45
![Page 51: Functions of a Matrix: Theory and Computationhigham/talks/talk06_funm.pdf · Functions of a Matrix: Theory and Computation Nick Higham School of Mathematics The University of Manchester](https://reader031.vdocuments.us/reader031/viewer/2022020303/5b14274c7f8b9a3e7c8b800d/html5/thumbnails/51.jpg)
References II
N. J. Higham.
The scaling and squaring method for the matrix
exponential revisited.
SIAM J. Matrix Anal. Appl., 26(4):1179–1193, 2005.
C. S. Kenney and A. J. Laub.
A Schur–Fréchet algorithm for computing the logarithm
and exponential of a matrix.
SIAM J. Matrix Anal. Appl., 19(3):640–663, 1998.
R. Mathias.
Evaluating the Frechet derivative of the matrix
exponential.
Numer. Math., 63(2):213–226, 1992.
MIMS Nick Higham Functions of a Matrix 45 / 45