
Functions of a Matrix:

Theory and Computation

Nick Higham

School of Mathematics

The University of Manchester


http://www.ma.man.ac.uk/~higham/

Landscapes in Mathematical Science, University of Bath, November 24, 2006




Outline

1 Definition of $f(A)$

2 Motivation and MATLAB

3 $e^A$ and its Fréchet derivative

4 $A^{1/2}$: Modified Newton Methods


Function of a Matrix

$f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ for an underlying scalar function $f$.

These are not matrix functions:

trace(A), det(A).

The adjugate (or adjoint) matrix.

Transfer function $f(t) = B(tI - A)^{-1}C$.

$\sin A = (\sin a_{ij})$.

These are matrix functions:

$e^A = I + A + \dfrac{A^2}{2!} + \cdots$.

$\log(I + A) = A - \dfrac{A^2}{2} + \dfrac{A^3}{3} - \cdots$, $\rho(A) < 1$.

$A^{-1}$, $A^{1/2}$.
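The difference between a matrix function and an elementwise function can be seen numerically. The sketch below is in Python with NumPy, which the talk itself does not use. `expm_taylor` is an illustrative helper, not a library routine, and truncating the power series is only reasonable for small $\|A\|$.

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Truncated series I + A + A^2/2! + ... (adequate only for small ||A||)."""
    A = np.asarray(A, dtype=float)
    X = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, terms):
        term = term @ A / j      # term now holds A^j / j!
        X = X + term
    return X

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])       # nilpotent: A @ A = 0, so e^A = I + A
matrix_exp = expm_taylor(A)      # the matrix function e^A
elementwise_exp = np.exp(A)      # exp of each entry, not a matrix function

print(matrix_exp)
print(elementwise_exp)
```

For this nilpotent $A$ the two results differ in the (1,2) and (2,1) entries.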



Multiplicity of Definitions

There have been proposed in the literature since 1880

eight distinct definitions of a matric function,

by Weyr, Sylvester and Buchheim,

Giorgi, Cartan, Fantappiè, Cipolla,

Schwerdtfeger and Richter.

— R. F. Rinehart, The Equivalence of Definitions of a Matric Function,

Amer. Math. Monthly (1955)



Jordan Canonical Form

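$$Z^{-1}AZ = J = \mathrm{diag}(J_1, \dots, J_p), \qquad J_k = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} \in \mathbb{C}^{m_k \times m_k}.$$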

Definition

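$$f(A) = Z f(J) Z^{-1} = Z\,\mathrm{diag}(f(J_k))\,Z^{-1},$$

$$f(J_k) = \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}.$$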



The Formula for f (Jk)

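Write $J_k = \lambda_k I + E_k \in \mathbb{C}^{m_k \times m_k}$. For $m_k = 3$ we have

$$E_k = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad E_k^2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad E_k^3 = 0.$$

Assume $f$ has Taylor expansion

$$f(t) = f(\lambda_k) + f'(\lambda_k)(t - \lambda_k) + \cdots + \frac{f^{(j)}(\lambda_k)(t - \lambda_k)^j}{j!} + \cdots.$$

Then

$$f(J_k) = f(\lambda_k)I + f'(\lambda_k)E_k + \cdots + \frac{f^{(m_k-1)}(\lambda_k)E_k^{m_k-1}}{(m_k-1)!}.$$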
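The finite sum above translates directly into code. The following is a minimal sketch in Python with NumPy. The function name `f_jordan_block` and its interface (the caller supplies the derivative values at $\lambda_k$) are assumptions made for illustration.

```python
import math
import numpy as np

def f_jordan_block(lam, m, derivs):
    """f(J) for the m-by-m Jordan block J = lam*I + E.

    derivs[j] must hold the j-th derivative of f at lam, j = 0..m-1.
    """
    E = np.diag(np.ones(m - 1), k=1)     # ones on the superdiagonal
    F = np.zeros((m, m))
    Ej = np.eye(m)                       # E^0
    for j in range(m):
        F = F + derivs[j] / math.factorial(j) * Ej
        Ej = Ej @ E                      # next power of E (zero after m-1)
    return F

lam, m = 0.5, 3
# f = exp: every derivative at lam equals exp(lam).
F = f_jordan_block(lam, m, [math.exp(lam)] * m)
# f(t) = t^2: derivatives are lam^2, 2*lam, 2.
G = f_jordan_block(lam, m, [lam**2, 2 * lam, 2.0])
print(F)
print(G)
```

For $f(t) = t^2$ the result can be compared with the explicit product $J_k J_k$.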



Interpolation

Definition (Sylvester, 1883; Buchheim, 1886)

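Let $A$ have distinct eigenvalues $\lambda_1, \dots, \lambda_s$ and let $n_i$ be the size of the largest Jordan block for $\lambda_i$. Then $f(A) = r(A)$, where $r$ is the unique Hermite interpolating polynomial of degree less than $\sum_{i=1}^{s} n_i$ satisfying

$$r^{(j)}(\lambda_i) = f^{(j)}(\lambda_i), \qquad j = 0\colon n_i - 1, \quad i = 1\colon s.$$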




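Example. Let $f(t) = t^{1/2}$, $A = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix}$, $\lambda(A) = \{1, 4\}$.

Taking positive square roots,

$$r(t) = f(1)\,\frac{t - 4}{1 - 4} + f(4)\,\frac{t - 1}{4 - 1} = \frac{1}{3}(t + 2).$$

$$\Rightarrow\quad A^{1/2} = r(A) = \frac{1}{3}(A + 2I) = \frac{1}{3}\begin{bmatrix} 4 & 2 \\ 1 & 5 \end{bmatrix}.$$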
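The example can be reproduced numerically. The sketch below, in Python with NumPy, covers only the case of distinct eigenvalues, where Hermite interpolation reduces to Lagrange interpolation. The function name `f_via_interpolation` is an assumption made for illustration.

```python
import numpy as np

def f_via_interpolation(A, f):
    """f(A) = r(A), r the Lagrange interpolant of f at the eigenvalues.

    Assumes distinct eigenvalues; repeated eigenvalues would need the
    Hermite conditions on derivatives as well.
    """
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    lam = np.linalg.eigvals(A)
    I = np.eye(n)
    F = np.zeros((n, n), dtype=complex)
    for i in range(n):
        term = f(lam[i]) * I
        for j in range(n):
            if j != i:
                # multiply by the factor (A - lam_j I) / (lam_i - lam_j)
                term = term @ (A - lam[j] * I) / (lam[i] - lam[j])
        F += term
    return F

A = np.array([[2.0, 2.0], [1.0, 3.0]])
X = f_via_interpolation(A, np.sqrt).real   # principal branch of the square root
print(3 * X)                               # compare with [[4, 2], [1, 5]]
```

Squaring `X` and comparing with `A` gives an independent check of the result.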



Cauchy Integral Theorem

Definition

$$f(A) = \frac{1}{2\pi i}\int_{\Gamma} f(z)(zI - A)^{-1}\,dz,$$

where $f$ is analytic on and inside a closed contour $\Gamma$ that encloses $\lambda(A)$.
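One way to experiment with this definition is to discretize the contour integral. The sketch below, in Python with NumPy, applies the trapezoidal rule on a circle. The circular contour, the number of nodes `N`, and the function name `f_cauchy` are all assumptions chosen for illustration, not a method taken from the talk.

```python
import numpy as np

def f_cauchy(A, f, center, radius, N=64):
    """Approximate (1/(2 pi i)) * integral of f(z) (zI - A)^{-1} dz over a circle.

    With z = center + w, w = radius * exp(i*theta), dz = i*w dtheta, the
    N-point trapezoidal rule gives (1/N) * sum of f(z_k) * w_k * (z_k I - A)^{-1}.
    The circle must enclose all eigenvalues of A.
    """
    A = np.asarray(A, dtype=complex)
    I = np.eye(A.shape[0])
    F = np.zeros_like(A)
    for k in range(N):
        w = radius * np.exp(2j * np.pi * k / N)
        z = center + w
        F += f(z) * w * np.linalg.inv(z * I - A)
    return F / N

A = np.array([[2.0, 2.0], [1.0, 3.0]])       # eigenvalues 1 and 4
F = f_cauchy(A, np.exp, center=2.5, radius=3.0)
print(F.real)
```

For a matrix with real entries and a function that is real on the real axis, the imaginary part of the result should be at rounding level.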



Equivalence of Definitions

Theorem

The three definitions are equivalent, modulo analyticity

assumption for Cauchy.

Interpolation: for basic properties.

JCF: for solving matrix equations (e.g., $X^2 = A$, $e^X = A$). For evaluation (normal $A$).

Cauchy: various uses.




For computation:

Use the definitions (with care).

Schur decomposition for general f .

Methods specific to particular f .










Toolbox of Matrix Functions

Want to have techniques for evaluating interesting f at

matrix args as well as scalar args.

Example:

$$\frac{d^2y}{dt^2} + Ay = 0, \qquad y(0) = y_0, \quad y'(0) = y'_0$$

has solution

$$y(t) = \cos(\sqrt{A}\,t)\,y_0 + (\sqrt{A})^{-1}\sin(\sqrt{A}\,t)\,y'_0,$$

where $\sqrt{A}$ is any square root of $A$.

MATLAB has expm, logm, sqrtm, funm.



Classic MATLAB

< M A T L A B >

Version of 01/10/84

HELP is available

<>

help

Type HELP followed by

INTRO (To get started)

NEWS (recent revisions)

ABS ANS ATAN BASE CHAR CHOL CHOP CLEA COND CONJ COS

DET DIAG DIAR DISP EDIT EIG ELSE END EPS EXEC EXIT

EXP EYE FILE FLOP FLPS FOR FUN HESS HILB IF IMAG

INV KRON LINE LOAD LOG LONG LU MACR MAGI NORM ONES

ORTH PINV PLOT POLY PRIN PROD QR RAND RANK RCON RAT

REAL RETU RREF ROOT ROUN SAVE SCHU SHOR SEMI SIN SIZE

SQRT STOP SUM SVD TRIL TRIU USER WHAT WHIL WHO WHY

< > ( ) = . , ; \ / ’ + - * :



Classic MATLAB

<>

help fun

FUN For matrix arguments X , the functions SIN, COS, ATAN,

SQRT, LOG, EXP and X**p are computed using eigenvalues D

and eigenvectors V . If <V,D> = EIG(X) then f(X) =

V*f(D)/V . This method may give inaccurate results if V

is badly conditioned. Some idea of the accuracy can be

obtained by comparing X**1 with X .

For vector arguments, the function is applied to each

component.
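The eigenvalue method described in this help text can be imitated in a few lines. The sketch below is in Python with NumPy rather than Classic MATLAB. `fun_eig` is a hypothetical name, and the two test matrices are chosen here only to contrast a well-conditioned and a badly conditioned $V$.

```python
import numpy as np

def fun_eig(X, f):
    """f(X) = V f(D) / V from an eigendecomposition X = V D V^{-1}."""
    d, V = np.linalg.eig(X)
    # (V * f(d)) scales column j of V by f(d_j); the transposed solve
    # applies the right division by V without forming inv(V).
    return np.linalg.solve(V.T, (V * f(d)).T).T

A = np.array([[2.0, 2.0], [1.0, 3.0]])           # well-conditioned V
B = np.array([[1.0, 1.0], [0.0, 1.0 + 1e-10]])   # nearly defective

for M in (A, B):
    _, V = np.linalg.eig(M)
    # Accuracy check suggested by the help text: compare X**1 with X.
    err = np.linalg.norm(fun_eig(M, lambda d: d**1) - M, 1)
    print(np.linalg.cond(V), err)
```

The printed pairs show the condition number of $V$ next to the error in recovering the matrix itself.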

The availability of [FUN] in early versions of MATLAB

quite possibly contributed to

the system’s technical and commercial success.

— Cleve Moler (2003)










Matrix Exponential

Large literature.

Early exposition in Frazer, Duncan & Collar.

Elementary Matrices and Some Applications to Dynamics and Differential Equations. CUP, 1938.

Moler & Van Loan.

Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003).

Over 500 citations on ISI Citation Index.



Scaling and Squaring Method

◮ $B \leftarrow A/2^s$ so $\|B\|_\infty \approx 1$

◮ $r_m(B)$ = $[m/m]$ Padé approximant to $e^B$

◮ $X = r_m(B)^{2^s} \approx e^A$

Used by expm in MATLAB.

Originates with Lawson (1967).

Ward (1977): algorithm, with rounding error analysis

and a posteriori error bound.

Moler & Van Loan (1978): give backward error

analysis covering truncation error in Padé

approximants, allowing choice of s and m.



Padé Approximations rm to ex

$r_m(x) = p_m(x)/q_m(x)$ known explicitly:

$$p_m(x) = \sum_{j=0}^{m} \frac{(2m - j)!\,m!}{(2m)!\,(m - j)!}\,\frac{x^j}{j!}$$

and $q_m(x) = p_m(-x)$. Error satisfies

$$e^x - r_m(x) = (-1)^m\,\frac{(m!)^2}{(2m)!\,(2m+1)!}\,x^{2m+1} + O(x^{2m+2}).$$
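The explicit coefficients make the approximant easy to tabulate for scalars. The following Python sketch evaluates $r_m(x)$ from the formula above and compares the actual error with the leading error term. The function names `pade_coeffs` and `r_m` and the sample point $x = 0.1$ are assumptions made for illustration.

```python
import math

def pade_coeffs(m):
    """Coefficients of p_m: (2m-j)! m! / ((2m)! (m-j)! j!), j = 0..m."""
    return [math.factorial(2 * m - j) * math.factorial(m)
            / (math.factorial(2 * m) * math.factorial(m - j) * math.factorial(j))
            for j in range(m + 1)]

def r_m(x, m):
    """[m/m] Pade approximant to exp(x), using q_m(x) = p_m(-x)."""
    c = pade_coeffs(m)
    p = sum(cj * x**j for j, cj in enumerate(c))
    q = sum(cj * (-x)**j for j, cj in enumerate(c))
    return p / q

m, x = 3, 0.1
err = math.exp(x) - r_m(x, m)
leading = ((-1)**m * math.factorial(m)**2
           / (math.factorial(2 * m) * math.factorial(2 * m + 1)) * x**(2 * m + 1))
print(err, leading)
```

The two printed numbers agree in sign and in order of magnitude; the remaining gap is the $O(x^{2m+2})$ term.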



Analysis

Let

$$e^{-A}r_m(A) = I + G = e^H$$

and assume $\|G\| < 1$. Then

$$\|H\| = \|\log(I + G)\| \le \sum_{j=1}^{\infty} \|G\|^j/j = -\log(1 - \|G\|).$$

Hence

$$r_m(A) = e^A e^H = e^{A+H}.$$

Rewrite as

$$r_m(A/2^s)^{2^s} = e^{A+E},$$

where $E = 2^s H$ satisfies

$$\|E\| \le -2^s \log(1 - \|G\|).$$



Result

Theorem

Let

$$e^{-2^{-s}A}\,r_m(2^{-s}A) = I + G,$$

where $\|G\| < 1$. Then the Padé approximant $r_m$ satisfies

$$r_m(2^{-s}A)^{2^s} = e^{A+E},$$

where

$$\frac{\|E\|}{\|A\|} \le \frac{-\log(1 - \|G\|)}{\|2^{-s}A\|}.$$

◮ Need to bound $\|G\|$ given $m$ and $\|2^{-s}A\|$.

◮ Then $\|E\|/\|A\|$ bounded in terms of $m$, $s$ and $\|A\|$.

◮ Now select “best” $(s, m)$ pair for given $\|A\|$.



Bounding ‖G‖

$$\rho(x) := e^{-x}r_m(x) - 1 = \sum_{i=2m+1}^{\infty} c_i x^i$$

converges absolutely for $|x| < \min\{\,|t| : q_m(t) = 0\,\} =: \nu_m$.

Hence, with $\theta := \|2^{-s}A\| < \nu_m$,

$$\|G\| = \|\rho(2^{-s}A)\| \le \sum_{i=2m+1}^{\infty} |c_i|\,\theta^i =: f(\theta). \qquad (*)$$

Thus $\|E\|/\|A\| \le -\log(1 - f(\theta))/\theta$.

◮ If only $\|A\|$ known, $(*)$ is optimal bound on $\|G\|$.

◮ Moler & Van Loan (1978) bound less sharp; Dieci & Papini (2000) bound a different error.



Summary of Computations

Solve “bound = unit roundoff” by summing 150 terms of

series in 250 digit arithmetic.

Work out cost of evaluating rm(B) and of the squaring

phase.

Minimize cost subject to retaining numerical stability in

evaluation of rm(B).

Precision      $m$    $\theta_m$
IEEE single    7      3.9
IEEE double    13     5.4
IEEE quad      17     3.3
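The first step, solving "bound = unit roundoff", can be imitated on a small scale. The Python sketch below replaces the 250-digit arithmetic with exact rational arithmetic (`fractions.Fraction`) for the coefficients $c_i$ and then uses bisection on $-\log(1 - f(\theta))/\theta = u$. The function names, the bisection bracket, and taking $u = 2^{-53}$ for IEEE double are assumptions of this sketch. It does not carry out the cost minimization that selects $m$.

```python
from fractions import Fraction
from math import factorial, log1p

def rho_coeffs(m, nterms=150):
    """Exact Taylor coefficients c_i of rho(x) = exp(-x) r_m(x) - 1."""
    p = [Fraction(factorial(2 * m - j) * factorial(m),
                  factorial(2 * m) * factorial(m - j) * factorial(j))
         for j in range(m + 1)]
    q = [pj * (-1)**j for j, pj in enumerate(p)]          # q_m(x) = p_m(-x)
    e = [Fraction((-1)**k, factorial(k)) for k in range(nterms)]  # exp(-x)
    # Series of exp(-x) * p_m(x) by convolution.
    num = [sum(p[j] * e[k - j] for j in range(min(k, m) + 1))
           for k in range(nterms)]
    # Divide by q_m(x) term by term (q[0] == 1).
    s = []
    for k in range(nterms):
        s.append(num[k] - sum(q[j] * s[k - j] for j in range(1, min(k, m) + 1)))
    s[0] -= 1
    return s

def theta(m, u, lo=1e-3, hi=20.0):
    """Largest theta in [lo, hi] with -log(1 - f(theta))/theta <= u."""
    c = [abs(float(ci)) for ci in rho_coeffs(m)]
    def bound(t):
        f = sum(ci * t**i for i, ci in enumerate(c))
        return float("inf") if f >= 1 else -log1p(-f) / t
    for _ in range(100):                                  # bisection
        mid = 0.5 * (lo + hi)
        if bound(mid) > u:
            hi = mid
        else:
            lo = mid
    return lo

theta_13 = theta(13, 2.0**-53)
print(theta_13)
```

Under these assumptions the value for $m = 13$ should round to the 5.4 shown in the table.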



Scaling and Squaring Algorithm for eA

Algorithm 1 (H, 2005; MATLAB 7.2, Mathematica 5.1)

1 for $m = [3\ 5\ 7\ 9]$
2     if $\|A\|_1 \le \theta_m$
3         $X = r_m(A)$.
4         quit
5     end
6 end
7 $A \leftarrow A/2^s$ with $s \ge 0$ minimal s.t. $\|A/2^s\|_1 \le \theta_{13} = 5.4$
8 $A_2 = A^2$, $A_4 = A_2^2$, $A_6 = A_2A_4$
9 $U = A\,[A_6(b_{13}A_6 + b_{11}A_4 + b_9A_2) + b_7A_6 + b_5A_4 + b_3A_2 + b_1I]$
10 $V = A_6(b_{12}A_6 + b_{10}A_4 + b_8A_2) + b_6A_6 + b_4A_4 + b_2A_2 + b_0I$
11 Solve $(-U + V)\,r_{13} = U + V$ for $r_{13}$.
12 $X = r_{13}^{2^s}$ by repeated squaring.
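Lines 7 to 12 of Algorithm 1 can be sketched as follows in Python with NumPy. This is a simplified illustration, not the MATLAB `expm` code: the low-degree branch for $m \in \{3, 5, 7, 9\}$ is omitted, the coefficients $b_j$ are taken from the explicit formula for $p_m$ with $b_0 = 1$, and the name `expm_ss13` is hypothetical.

```python
import math
import numpy as np

def expm_ss13(A, theta13=5.4):
    """Scaling and squaring with the [13/13] Pade approximant only."""
    A = np.asarray(A, dtype=float)
    I = np.eye(A.shape[0])
    m = 13
    b = [math.factorial(2 * m - j) * math.factorial(m)
         / (math.factorial(2 * m) * math.factorial(m - j) * math.factorial(j))
         for j in range(m + 1)]
    # Line 7: smallest s >= 0 with ||A / 2^s||_1 <= theta13.
    norm1 = np.linalg.norm(A, 1)
    s = max(0, math.ceil(math.log2(norm1 / theta13))) if norm1 > 0 else 0
    A = A / 2**s
    # Line 8: even powers.
    A2 = A @ A
    A4 = A2 @ A2
    A6 = A2 @ A4
    # Lines 9-10: odd part U and even part V of p_13(A).
    U = A @ (A6 @ (b[13] * A6 + b[11] * A4 + b[9] * A2)
             + b[7] * A6 + b[5] * A4 + b[3] * A2 + b[1] * I)
    V = (A6 @ (b[12] * A6 + b[10] * A4 + b[8] * A2)
         + b[6] * A6 + b[4] * A4 + b[2] * A2 + b[0] * I)
    # Line 11: q_13(A) r_13 = p_13(A), with q = V - U and p = V + U.
    X = np.linalg.solve(V - U, V + U)
    # Line 12: undo the scaling by repeated squaring.
    for _ in range(s):
        X = X @ X
    return X

A = np.array([[3.0, 6.0], [6.0, -3.0]])   # symmetric, ||A||_1 = 9, so s = 1
X = expm_ss13(A)
print(X)
```

A symmetric test matrix is used so that the result can be checked against an orthogonal eigendecomposition.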



Comparison with Existing Algorithms

Method             $m$    max $\|2^{-s}A\|$
Alg 1              13     5.4
Ward (1977)        8      1.0   [$\theta_8 = 1.5$]
Old MATLAB expm    6      0.5   [$\theta_6 = 0.54$]
Sidje (1998)       6      0.5

◮ $\|A\|_1 > 1$: Alg 1 requires 1–2 fewer mat mults than Ward, 2–3 fewer than expm.

◮ $\|A\|_1 \in (2, 2.1)$:

         Alg 1    Ward    expm    Sidje
mults    5        7       8       10



Squaring Phase

◮ The bound

$$\|A^2 - fl(A^2)\| \le \gamma_n\|A\|^2, \qquad \gamma_n = \frac{nu}{1 - nu}$$

shows the dangers in matrix squaring.

◮ Open question: are errors in squaring phase

consistent with conditioning of the problem?

◮ Our choice of parameters uses 1–5 fewer matrix

squarings than previous algorithms. Hence has

potential accuracy advantages.



Numerical Experiment

◮ 70 test matrices, dimension 2–10.

◮ Evaluated 1-norm relative error $\|X - \widehat{X}\|_1/\|X\|_1$, where $\widehat{X}$ is the computed $X$.

◮ Notation:

◮ expm: Alg 1 (MATLAB 7.2).

◮ old_expm: MATLAB 7.1.

◮ funm: MATLAB 7.

◮ padm: Sidje.

◮ $\mathrm{cond}(A) = \lim_{\epsilon\to 0}\ \max_{\|E\|_2 \le \epsilon\|A\|_2} \dfrac{\|e^{A+E} - e^A\|_2}{\epsilon\,\|e^A\|_2}$.



Different S&S Codes and funm

[Figure: relative errors on a log scale (about $10^{-18}$ to $10^{-6}$) for the 70 test matrices. Curves: old_expm, padm, funm, expm (Alg 1), and cond*u.]



Performance Profiles

Dolan & Moré (2002) propose a new type of performance

profile.

For the given set of solvers and test problems, plot

x-axis: $\alpha$

y-axis: probability that solver has error within factor $\alpha$ of smallest error over all solvers on the test set.
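The quantity plotted on the y-axis can be computed directly from a table of errors. The sketch below is in Python with NumPy. The error values are made up for illustration (three problems, two solvers) and the function name `performance_profile` is hypothetical.

```python
import numpy as np

def performance_profile(errors, alphas):
    """errors[i, s] = error of solver s on problem i.

    Returns p[s, k] = fraction of problems on which solver s is within a
    factor alphas[k] of the smallest error over all solvers.
    """
    errors = np.asarray(errors, dtype=float)
    ratios = errors / errors.min(axis=1, keepdims=True)
    return np.array([[np.mean(ratios[:, s] <= a) for a in alphas]
                     for s in range(errors.shape[1])])

# Hypothetical errors, in arbitrary units.
errors = [[1.0, 2.0],
          [4.0, 1.0],
          [1.0, 1.0]]
profile = performance_profile(errors, alphas=[1, 2, 5])
print(profile)
```

Each row of `profile` is one solver's curve sampled at the chosen values of $\alpha$.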



Performance Profile

[Figure: performance profile, $p$ against $\alpha$ for $\alpha$ from 1 to 5. Curves: expm (Alg 1), padm, funm, old_expm.]



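Fréchet Derivative

Fréchet derivative of $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ at $X \in \mathbb{C}^{n\times n}$

A linear mapping $L : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ s.t. for all $E \in \mathbb{C}^{n\times n}$

$$f(X + E) - f(X) - L(X, E) = o(\|E\|).$$

Example. For $f(X) = X^2$ we have

$$f(X + E) - f(X) = XE + EX + E^2,$$

so $L(X, E) = XE + EX$.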
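The $o(\|E\|)$ condition for this example can be observed numerically. In the Python/NumPy sketch below the matrices are random and the helper name `remainder` is an assumption. Since the remainder is exactly $t^2E^2$ for the perturbation $tE$, the printed ratio should shrink linearly with $t$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
E = rng.standard_normal((3, 3))
L = X @ E + E @ X                      # L(X, E) for f(X) = X^2

def remainder(t):
    """f(X + tE) - f(X) - L(X, tE); L is linear in E, so L(X, tE) = t L."""
    return (X + t * E) @ (X + t * E) - X @ X - t * L

for t in (1e-1, 1e-2, 1e-3):
    # Ratio ||remainder|| / ||tE|| tends to zero as t does.
    print(t, np.linalg.norm(remainder(t)) / np.linalg.norm(t * E))
```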



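Fréchet Derivative of $e^A$

$$L(A, E) = \int_0^1 e^{A(1-s)}\,E\,e^{As}\,ds.$$

$\|L(A)\| := \max \|L(A, E)\|/\|E\|$.

Condition no. of the exponential: $\kappa_{\exp}(A) = \dfrac{\|L(A)\|\,\|A\|}{\|e^A\|}$.

$\|L(A)\| \ge \|L(A, I)\| = \|e^A\| \ \Rightarrow\ \kappa_{\exp}(A) \ge \|A\|$.

Theorem

If $A \in \mathbb{C}^{n\times n}$ is normal then in the 2-norm, $\kappa_{\exp}(A) = \|A\|_2$.

If $A \in \mathbb{R}^{n\times n}$ is a nonnegative scalar multiple of a stochastic matrix then in the $\infty$-norm, $\kappa_{\exp}(A) = \|A\|_\infty$.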



Quadrature

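Repeated trapezium rule

$$\int_0^1 f(t)\,dt \approx \frac{1}{m}\Bigl(\tfrac12 f_0 + f_1 + f_2 + \cdots + f_{m-1} + \tfrac12 f_m\Bigr), \qquad f_i := f(i/m),$$

gives

$$L(A, E) \approx \frac{1}{m}\Bigl(\tfrac12 e^AE + \sum_{i=1}^{m-1} e^{A(1-i/m)}\,E\,e^{Ai/m} + \tfrac12 Ee^A\Bigr).$$

Requires $e^{A/m}, e^{2A/m}, \dots, e^{(m-1)A/m}, e^A$.

Lemma

Consider $R_1(A, E) = \sum_{i=1}^{p} w_i\,e^{A(1-t_i)}\,E\,e^{At_i}$ and denote its $m$-times repeated form by $R_m(A, E)$. If

$$Q_s = R_1(2^{-s}A, 2^{-s}E),$$

$$Q_{i-1} = e^{2^{-i}A}Q_i + Q_ie^{2^{-i}A}, \qquad i = s\colon -1\colon 1,$$

then $Q_0 = R_{2^s}(A, E)$.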



Scaling and Squaring

Recall $L_{x^2}(A, E) = AE + EA$.

Applying chain rule to $e^A = (e^{A/2})^2$ gives

$$L_{\exp}(A, E) = L_{x^2}\bigl(e^{A/2}, L_{\exp}(A/2, E/2)\bigr) = e^{A/2}L_{\exp}(A/2, E/2) + L_{\exp}(A/2, E/2)\,e^{A/2}.$$

Recurrence for $L_0 = L_{\exp}(A, E)$:

$$L_s = L_{\exp}(2^{-s}A, 2^{-s}E),$$

$$L_{i-1} = e^{2^{-i}A}L_i + L_ie^{2^{-i}A}, \qquad i = s\colon -1\colon 1.$$



Quadrature Algorithm

Algorithm (Kenney & Laub, 1998; Mathias, 1993)

Approximates L = Lexp(A, E) via repeated Simpson rule;

order of magnitude estimate.

1 $B = A/2^s$ with $s \ge 0$ minimal s.t. $\|A/2^s\|_1 \le 1/2$
2 $X = e^B$
3 $\widetilde{X} = e^{B/2}$
4 $Q_s = 2^{-s}(XE + 4\widetilde{X}E\widetilde{X} + EX)/6$
5 for $i = s\colon -1\colon 1$
6     if $i < s$, $X = e^{2^{-i}A}$, end
7     $Q_{i-1} = XQ_i + Q_iX$
8 end
9 $L = Q_0$
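The nine steps can be sketched in Python with NumPy as below. Two choices are assumptions of this sketch rather than part of the algorithm as stated: the exponentials of the scaled matrices are formed with a truncated Taylor series (`expm_taylor`, an illustrative helper), and in step 6 $e^{2^{-i}A}$ is obtained by squaring the previous $X$.

```python
import math
import numpy as np

def expm_taylor(A, terms=30):
    """Truncated power series for e^A (adequate for small ||A||)."""
    X = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, terms):
        term = term @ A / j
        X = X + term
    return X

def frechet_exp_quadrature(A, E):
    """Order-of-magnitude estimate of L_exp(A, E) via Simpson's rule."""
    A = np.asarray(A, dtype=float)
    E = np.asarray(E, dtype=float)
    norm1 = np.linalg.norm(A, 1)
    s = max(0, math.ceil(math.log2(norm1 / 0.5))) if norm1 > 0 else 0
    B = A / 2**s                       # step 1
    X = expm_taylor(B)                 # step 2: e^B
    Xh = expm_taylor(B / 2)            # step 3: e^{B/2}
    # Step 4: Simpson's rule for the integral, with E scaled by 2^{-s}.
    Q = 2.0**(-s) * (X @ E + 4 * Xh @ E @ Xh + E @ X) / 6
    for i in range(s, 0, -1):          # steps 5-8
        if i < s:
            X = X @ X                  # e^{2^{-i} A} from e^{2^{-(i+1)} A}
        Q = X @ Q + Q @ X
    return Q                           # step 9

A = np.array([[1.0, 2.0], [0.5, -1.0]])
E = np.array([[0.0, 1.0], [1.0, 0.0]])
L = frechet_exp_quadrature(A, E)
print(L)
```

One way to check the estimate is the identity that the (1,2) block of $\exp\left(\begin{bmatrix} A & E \\ 0 & A \end{bmatrix}\right)$ equals $L_{\exp}(A, E)$.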






Kronecker Formula

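$$L(A, E) = \int_0^1 e^{A(1-s)}\,E\,e^{As}\,ds.$$

$A \otimes B = (a_{ij}B) \in \mathbb{C}^{n^2\times n^2}$.

$A \oplus B = A \otimes I_n + I_m \otimes B$.

$e^{A\oplus B} = e^A \otimes e^B$.

$\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$.

Theorem

$\mathrm{vec}(L(A, E)) = K(A)\,\mathrm{vec}(E)$, where

$$K(A) = \tfrac12\bigl(e^{A^T} \oplus e^A\bigr)\,\tau\bigl(\tfrac12[A^T \oplus (-A)]\bigr) \in \mathbb{C}^{n^2\times n^2},$$

$\tau(x) = \tanh(x)/x$ and $\tfrac12\|A^T \oplus (-A)\| < \pi/2$ assumed.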
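The vec and Kronecker product identities used here are easy to check numerically. The sketch below, in Python with NumPy, verifies $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ on random matrices and, as a simpler analogue of the theorem, builds the Kronecker matrix of the Fréchet derivative of $X^2$. It does not evaluate $K(A)$ for the exponential, and the helper name `vec` is an assumption.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one long vector."""
    return X.flatten(order="F")

rng = np.random.default_rng(1)
n = 3
A, B, X, E = (rng.standard_normal((n, n)) for _ in range(4))
I = np.eye(n)

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)         # (B^T kron A) vec(X)

# For f(X) = X^2, L(X, E) = XE + EX, so vec(L(X, E)) = K vec(E) with
# K = I kron X + X^T kron I.
K = np.kron(I, X) + np.kron(X.T, I)
print(np.linalg.norm(lhs - rhs), np.linalg.norm(K @ vec(E) - vec(X @ E + E @ X)))
```

Both printed residuals should be at rounding level.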



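Approximating the Fréchet Derivative

$$\bigl[A^T \oplus (-A) - \theta I\bigr]\mathrm{vec}(E) = \bigl[A^T \otimes I - I \otimes A - \theta I\bigr]\mathrm{vec}(E)$$

$$= \mathrm{vec}(EA - AE - \theta E)$$

$$= \mathrm{vec}\bigl(E(A - \theta/2 \cdot I) - (A + \theta/2 \cdot I)E\bigr).$$

Obtain Padé approximant

$$\tau(x) = \tanh(x)/x \approx r_m(x) = \prod_{i=1}^{m} (x/\beta_i - 1)^{-1}(x/\alpha_i - 1)$$

by truncating

$$\tau(x) = \cfrac{1}{1 + \cfrac{x^2/(1\cdot 3)}{1 + \cfrac{x^2/(3\cdot 5)}{1 + \cdots + \cfrac{x^2/((2k-1)\cdot(2k+1))}{1 + \cdots}}}}.$$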



Algorithm (Fréchet derivative; Kenney & Laub, 1998)

Evaluates $L = L_{\exp}(A, E)$ using scaling and squaring and the $[8/8]$ Padé approximant to $\tau$.

1 $B = A/2^s$ with $s \ge 0$ minimal s.t. $\|A/2^s\|_1 \le 1$
2 $G_0 = 2^{-s}E$
3 for $i = 1\colon 8$
4     Solve $(I + B/\beta_i)\,G_i + G_i\,(I - B/\beta_i) = (I + B/\alpha_i)\,G_{i-1} + G_{i-1}\,(I - B/\alpha_i)$ for $G_i$.
5 end
6 $X = e^B$
7 $L_s = (G_mX + XG_m)/2$ (here $m = 8$)
8 for $i = s\colon -1\colon 1$
9     if $i < s$, $X = e^{2^{-i}A}$, end
10     $L_{i-1} = XL_i + L_iX$
11 end
12 $L = L_0$









Matrix Square Root

$X$ is a square root of $A \in \mathbb{C}^{n\times n}$ $\iff$ $X^2 = A$.

Number of square roots may be zero, finite or infinite.

Definition

For $A$ with no eigenvalues on $\mathbb{R}^- = \{x \in \mathbb{R} : x \le 0\}$ the principal square root $A^{1/2}$ is the unique square root $X$ with spectrum in the open right half-plane.



Newton’s Method for Square Root

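Apply Newton to $F(X) = X^2 - A = 0$: $X_0$ given,

$$\left.\begin{aligned} &\text{Solve } X_kE_k + E_kX_k = A - X_k^2 \\ &X_{k+1} = X_k + E_k \end{aligned}\right\}\quad k = 0, 1, 2, \dots$$

Modified Newton iteration: freeze Fréchet derivative at $X_0$:

$$\left.\begin{aligned} &\text{Solve } X_0E_k + E_kX_0 = A - X_k^2 \\ &X_{k+1} = X_k + E_k \end{aligned}\right\}\quad k = 0, 1, 2, \dots,$$

$X_0$ diagonal $\Rightarrow$ cheap to solve for $E_k$.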
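The full Newton iteration can be sketched as below in Python with NumPy. Solving the Sylvester equation through its $n^2 \times n^2$ Kronecker form is an assumption made for brevity and is only sensible for very small $n$. The function name `sqrtm_newton`, the starting matrix $X_0 = I$, and the fixed iteration count are likewise illustrative choices.

```python
import numpy as np

def sqrtm_newton(A, X0, iters=20):
    """Newton's method for X^2 = A, solving X E + E X = A - X^2 each step."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    X = np.array(X0, dtype=float)
    for _ in range(iters):
        R = A - X @ X
        # vec(X E + E X) = (I kron X + X^T kron I) vec(E), column-major vec.
        K = np.kron(I, X) + np.kron(X.T, I)
        e = np.linalg.solve(K, R.flatten(order="F"))
        X = X + e.reshape((n, n), order="F")
    return X

A = np.array([[2.0, 2.0], [1.0, 3.0]])
X = sqrtm_newton(A, np.eye(2))
print(3 * X)                            # compare with [[4, 2], [1, 5]]
```

The modified iteration would form `K` once from $X_0$ and reuse it in every step.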



Visser Iteration

Set $X_0 = (2\alpha)^{-1}I$ in modified Newton:

Visser iteration (1937)

$$X_{k+1} = X_k + \alpha(A - X_k^2), \qquad X_0 = (2\alpha)^{-1}I.$$

Stationary iteration.

Richardson iteration.

Linear convergence.

Choice of α?
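A minimal Python/NumPy sketch of the iteration follows. The test matrix, the value $\alpha = 0.25$ (hand-picked so that the iteration converges on this matrix), the fixed iteration count, and the name `visser` are assumptions made for illustration.

```python
import numpy as np

def visser(A, alpha, iters=200):
    """Visser iteration X <- X + alpha (A - X^2), started at X0 = I / (2 alpha)."""
    A = np.asarray(A, dtype=float)
    X = np.eye(A.shape[0]) / (2 * alpha)
    for _ in range(iters):
        X = X + alpha * (A - X @ X)
    return X

A = np.array([[2.0, 2.0], [1.0, 3.0]])   # eigenvalues 1 and 4
X = visser(A, alpha=0.25)
print(3 * X)                             # compare with [[4, 2], [1, 5]]
```

Only matrix multiplications are needed, which is the attraction of the iteration; the price is the linear convergence noted above.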



Visser History

$$X_{k+1} = X_k + \alpha(A - X_k^2), \qquad X_0 = (2\alpha)^{-1}I.$$

Used with $\alpha = 1/2$ by Visser (1937) to show positive operator on Hilbert space has a positive square root.

Likewise in functional analysis texts, e.g. Riesz & Sz.-Nagy (1956).

Enables proof of existence of $A^{1/2}$ without using spectral theorem.

Used computationally by Liebl (1965), Babuška, Práger & Vitásek (1966), Späth (1966), Duke (1969), Elsner (1970).

Elsner proves convergence for $A \in \mathbb{C}^{n\times n}$ with real, positive eigenvalues if $0 < \alpha \le \rho(A)^{-1/2}$.



Visser Transformations

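Let $X_k = \theta Y_k$, $\beta = \theta\alpha$ and $\widetilde{A} = \theta^{-2}A$.

$$Y_{k+1} = Y_k + \beta(\widetilde{A} - Y_k^2), \qquad Y_0 = \frac{1}{2\beta}I.$$

Set $\beta = 1/2$:

$$Y_{k+1} = Y_k + \frac12(\widetilde{A} - Y_k^2), \qquad Y_0 = I.$$

With $\widetilde{A} \equiv I - C$ and $Y_k = I - P_k$:

$$P_{k+1} = \frac12(C + P_k^2), \qquad P_0 = 0.$$

$Q_k = P_k/2$:

$$Q_{k+1} = Q_k^2 + \frac{C}{4}, \qquad Q_0 = 0.$$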



Visser Convergence

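$$X_{k+1} = X_k + \alpha(A - X_k^2), \qquad X_0 = (2\alpha)^{-1}I.$$

Theorem (H, 2006)

Let $A \in \mathbb{C}^{n\times n}$ and $\alpha > 0$. If $\Lambda(I - 4\alpha^2A)$ lies in the cardioid

$$D = \{\,2z - z^2 : z \in \mathbb{C},\ |z| < 1\,\}$$

then $A^{1/2}$ exists and $X_k \to A^{1/2}$ linearly.

[Figure: the cardioid $D$ in the complex plane.]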



Example

$A \in \mathbb{R}^{16\times 16}$ spd with $a_{ii} = i^2$, $a_{ij} = 0.1$, $i \ne j$.

Aim for rel residual < nu in IEEE DP arithmetic.

Pulay iteration D = diag(A): θ = 0.191, 9 iters.

Visser iteration α = 0.058 (hand optimized), 245 iters.




In Conclusion

Many applications of f (A), e.g. control theory,

computer graphics, theoretical physics.

Need better understanding of conditioning of

f (A).

Can we exploit structure, e.g. A ∈ matrix

automorphism group or Jordan or Lie

algebra?

Krylov methods needed for large, sparse A.

How to use Cauchy integral computationally

(H & Trefethen)?



References I

D. A. Bini, N. J. Higham, and B. Meini.

Algorithms for the matrix pth root.

Numer. Algorithms, 39(4):349–378, 2005.

C.-H. Guo and N. J. Higham.

A Schur–Newton method for the matrix pth root and its

inverse.

SIAM J. Matrix Anal. Appl., 28(3):788–804, 2006.

N. J. Higham.

Functions of a Matrix: Theory and Computation.

Book in preparation.



References II

N. J. Higham.

The scaling and squaring method for the matrix

exponential revisited.


C. S. Kenney and A. J. Laub.

A Schur–Fréchet algorithm for computing the logarithm

and exponential of a matrix.


R. Mathias.

Evaluating the Frechet derivative of the matrix

exponential.

Numer. Math., 63(2):213–226, 1992.


