
The matrix exponential

Erik Wahlén
[email protected]

October 3, 2014

1 Definition and basic properties

These notes serve as a complement to Chapter 7 in Ahmad and Ambrosetti. In Chapter 7, the general solution of the equation

(1) x′ = Ax

is computed using eigenvalues and eigenvectors. This method works when A has n distinct eigenvalues or, more generally, when there is a basis for C^n consisting of eigenvectors of A. In the latter case, we say that A is diagonalizable. In this case, the general solution is given by

x(t) = c_1 e^{λ_1 t} v_1 + · · · + c_n e^{λ_n t} v_n,

where λ_1, . . . , λ_n are the, not necessarily distinct, eigenvalues of A and v_1, . . . , v_n the corresponding eigenvectors. Example 7.4.5 in the book illustrates what can happen if A is not diagonalizable, i.e. if there is no basis of eigenvectors of A. In these notes we will discuss this question in a more systematic way.

Our starting point is to write the general solution of (1) as

x(t) = e^{tA} c,

where c is an arbitrary vector in C^n, in analogy with the scalar case n = 1. We then have to begin by defining e^{tA} and show that the above formula actually gives a solution of the equation. Note that it is simpler to work in the complex vector space C^n than R^n, since even if A is real it might have complex eigenvalues and eigenvectors. If A should happen to be real, we can restrict to real vectors c ∈ R^n to obtain real solutions.

Recall from Taylor's formula that

e^t = ∑_{k=0}^{n} t^k/k! + r_n(t),    where    r_n(t) = e^s t^{n+1}/(n+1)!,


for some s between 0 and t. Since lim_{n→∞} r_n(t) = 0, we have

e^t = ∑_{k=0}^{∞} t^k/k!,    t ∈ R.

This is an example of a power series. More generally, a power series is a series of the form

∑_{k=0}^{∞} a_k (x − x_0)^k,

where the coefficients a_k are complex numbers. The number

R := sup{ r ≥ 0 : {|a_k| r^k}_{k=0}^{∞} is bounded }

is called the radius of convergence of the series. The significance of the radius of convergence is that the power series converges when |x − x_0| < R, and that the series can be manipulated there as though it were a finite sum (e.g. differentiated termwise). To prove this we first show a preliminary lemma.

Lemma 1. Suppose that {f_n} is a sequence of continuously differentiable functions on an interval I. Assume that f_n′ → g uniformly on I and that {f_n(a)} converges for some a ∈ I. Then {f_n} converges pointwise on I to some function f. Moreover, f is continuously differentiable with f′(x) = g(x).

Proof. We have

f_n(x) = f_n(a) + ∫_a^x f_n′(s) ds    for all n.

Since f_n′(s) → g(s) uniformly on the interval between a and x, we can take the limit under the integral and obtain that f_n(x) converges to f(x) defined by

f(x) = f(a) + ∫_a^x g(s) ds,    x ∈ I,

where f(a) = lim_{n→∞} f_n(a). The fundamental theorem of calculus now shows that f is continuously differentiable with f′(x) = g(x).

Theorem 2. Assume that the power series

(2)    ∑_{k=0}^{∞} a_k (x − x_0)^k

has positive radius of convergence. The series converges uniformly and absolutely in the interval [x_0 − r, x_0 + r] whenever 0 < r < R and diverges when |x − x_0| > R. The limit is infinitely differentiable and the series can be differentiated termwise.

Proof. If |x − x_0| > R, the sequence a_k (x − x_0)^k is unbounded, so the series diverges. Consider an interval |x − x_0| ≤ r, where r ∈ (0, R). Choose r̄ with r < r̄ < R. Then

|a_k (x − x_0)^k| ≤ |a_k| r̄^k (r/r̄)^k ≤ C (r/r̄)^k

when |x − x_0| ≤ r, where C is a constant such that |a_k| r̄^k ≤ C for all k. Since r/r̄ < 1 we find that

∑_{k=0}^{∞} (r/r̄)^k < ∞.

Weierstrass' M-test then shows that (2) converges uniformly when |x − x_0| ≤ r to some function S(x). S is at least continuous since it is the uniform limit of a sequence of continuous functions. If S is differentiable and can be differentiated termwise, then the derivative is

∑_{k=0}^{∞} (k + 1) a_{k+1} (x − x_0)^k.

This is again a power series with radius of convergence R (prove this!). Hence it converges uniformly on [x_0 − r, x_0 + r], 0 < r < R. It follows from the previous lemma that S is continuously differentiable on (x_0 − R, x_0 + R) and that the power series can be differentiated termwise. An induction argument shows that S is infinitely many times differentiable.

The interval (x_0 − R, x_0 + R) is called the interval of convergence. The power series also converges for complex x with |x − x_0| < R (hence the name radius of convergence). All of the above properties still hold for complex x, but the derivative must now be understood as a complex derivative. This concept is studied in courses in complex analysis; we will not linger on it here.

Let us now return to matrices. The set of complex n × n matrices is denoted C^{n×n}.

Definition 3. If A ∈ C^{n×n}, we define

e^{tA} = ∑_{k=0}^{∞} t^k A^k / k!,    t ∈ R.

In particular,

e^A = ∑_{k=0}^{∞} A^k / k!.

For this to make sense, we have to show that the series converges for all t ∈ R. A matrix sequence {A_k}_{k=1}^{∞}, A_k ∈ C^{n×n}, is said to converge if it converges elementwise, i.e. [A_k]_{ij} converges as k → ∞ for all i and j, where the notation [A]_{ij} is used to denote the element on row i and column j of A (this will also be denoted a_{ij} in some places). As usual, a matrix series is said to converge if the corresponding (matrix) sequence of partial sums converges. Instead of checking if all the elements converge, it is sometimes useful to work with the matrix norm.

Definition 4. The matrix norm of A is defined by

‖A‖ = max_{x ∈ C^n, |x| = 1} |Ax|.


Equivalently, ‖A‖ is the smallest K ≥ 0 such that

|Ax| ≤ K|x|    for all x ∈ C^n,

since

|Ax| = |A(|x| · x/|x|)| = |A(x/|x|)| |x|    and    |x/|x|| = 1.

The following proposition follows from the definition (prove this!).

Proposition 5. The matrix norm satisfies

(1) ‖A‖ ≥ 0, with equality iff A = 0.

(2) ‖zA‖ = |z|‖A‖, z ∈ C.

(3) ‖A+B‖ ≤ ‖A‖+ ‖B‖.

A more interesting property is the following.

Proposition 6. ‖AB‖ ≤ ‖A‖‖B‖.

Proof. We have

|ABx| ≤ ‖A‖|Bx| ≤ ‖A‖‖B‖|x| = ‖A‖‖B‖ if |x| = 1.

Proposition 7.

(1) |a_{ij}| ≤ ‖A‖.

(2) ‖A‖ ≤ ( ∑_{i,j} |a_{ij}|^2 )^{1/2}.

Proof. (1) Recall that a_{ij} = [A]_{ij}. Let {e_1, . . . , e_n} be the standard basis. Then

A e_j = (a_{1j}, . . . , a_{nj})^T  ⇒  |a_{ij}| ≤ |A e_j| ≤ ‖A‖ |e_j| ≤ ‖A‖.

(2) We have that

Ax = (R_1 x, . . . , R_n x)^T,

where R_i = (a_{i1} · · · a_{in}) is row i of A. Hence, by the Cauchy–Schwarz inequality,

|Ax|^2 = ∑_{i=1}^{n} |R_i x|^2 ≤ ( ∑_{i=1}^{n} |R_i|^2 ) |x|^2 = ( ∑_{i,j} |a_{ij}|^2 ) |x|^2.

The following corollary is an immediate consequence of the above proposition.


Corollary 8. Let {A_k}_{k=1}^{∞} ⊂ C^{n×n} and A ∈ C^{n×n}. Then

lim_{k→∞} A_k = A  ⇔  lim_{k→∞} ‖A_k − A‖ = 0.

We can now show that our definition of the matrix exponential makes sense.

Proposition 9. The series ∑_{k=0}^{∞} t^k A^k / k! defining e^{tA} converges uniformly on compact intervals. Moreover, the function t ↦ e^{tA} is differentiable with derivative A e^{tA}.

Proof. Each matrix element of ∑_{k=0}^{∞} t^k A^k / k! is a power series in t with coefficients [A^k]_{ij} / k!. The radius of convergence is infinite, R = ∞, since

| [A^k]_{ij} / k! | r^k ≤ ‖A^k‖ r^k / k! ≤ ‖A‖^k r^k / k! → 0

as k → ∞ for all r ≥ 0. The series thus converges uniformly on any compact interval and pointwise on R. Differentiating termwise, we obtain

d/dt ∑_{k=0}^{∞} t^k A^k / k! = ∑_{k=1}^{∞} t^{k−1} A^k / (k−1)! = A ∑_{j=0}^{∞} t^j A^j / j! = A e^{tA},

where we substituted j = k − 1 in the second step.

Since e^{0·A} = I, we obtain the following result.

Theorem 10. The general solution of (1) is given by

x(t) = e^{tA} c,    c ∈ C^n,

and the unique solution of the IVP

x′ = Ax,    x(0) = x_0,

is given by x(t) = e^{tA} x_0.
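For readers who want to experiment numerically, here is a small Python sketch (assuming NumPy and SciPy; the matrix and initial data are arbitrary examples) comparing x(t) = e^{tA} x_0 from Theorem 10 with a direct numerical integration of x′ = Ax.

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])
    x0 = np.array([1.0, 0.0])
    t1 = 1.5

    x_formula = expm(t1 * A) @ x0                       # x(t1) = e^{t1 A} x0
    sol = solve_ivp(lambda t, x: A @ x, (0.0, t1), x0,  # integrate x' = Ax numerically
                    rtol=1e-10, atol=1e-12)

    print(np.allclose(x_formula, sol.y[:, -1], atol=1e-6))   # True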

In certain cases it is possible to compute the matrix exponential directly from the definition.

Example 11. Suppose that

A = ( λ_1  0  ···  0 ; 0  λ_2  ···  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  0  ···  λ_n )

is a diagonal matrix. Then

A^k = ( λ_1^k  0  ···  0 ; 0  λ_2^k  ···  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  0  ···  λ_n^k ),


so

e^{tA} = ( ∑_{k=0}^{∞} t^k λ_1^k/k!  0  ···  0 ; 0  ∑_{k=0}^{∞} t^k λ_2^k/k!  ···  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  0  ···  ∑_{k=0}^{∞} t^k λ_n^k/k! )

       = ( e^{tλ_1}  0  ···  0 ; 0  e^{tλ_2}  ···  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  0  ···  e^{tλ_n} ).
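A short numerical check of this example (Python sketch, assuming NumPy and SciPy; the diagonal entries are arbitrary): for a diagonal matrix, e^{tA} is the diagonal matrix of scalar exponentials e^{tλ_i}.

    import numpy as np
    from scipy.linalg import expm

    lam = np.array([1.0, -0.5, 2.0])   # arbitrary diagonal entries
    A = np.diag(lam)
    t = 0.7

    print(np.allclose(expm(t * A), np.diag(np.exp(t * lam))))   # True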

Example 12. Let’s compute etA where

A = ( 0  1 ; −1  0 ).

We find that

A^2 = ( −1  0 ; 0  −1 ),    A^3 = ( 0  −1 ; 1  0 ),    A^4 = ( 1  0 ; 0  1 )    and    A^{4j+r} = A^r.

Hence,

e^{tA} = ( ∑_{j=0}^{∞} (−1)^j t^{2j}/(2j)!   ∑_{j=0}^{∞} (−1)^j t^{2j+1}/(2j+1)! ; −∑_{j=0}^{∞} (−1)^j t^{2j+1}/(2j+1)!   ∑_{j=0}^{∞} (−1)^j t^{2j}/(2j)! )

       = ( cos t   sin t ; −sin t   cos t ).
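A numerical check of Example 12 (Python sketch, assuming NumPy and SciPy): e^{tA} should agree with the rotation-type matrix just derived.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])
    t = 0.3

    R = np.array([[np.cos(t), np.sin(t)],
                  [-np.sin(t), np.cos(t)]])
    print(np.allclose(expm(t * A), R))   # True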

Let's now look at the case of a general diagonalizable matrix A. As already mentioned, A is diagonalizable if there is a basis for C^n consisting of eigenvectors v_1, . . . , v_n of A. The corresponding eigenvalues λ_1, . . . , λ_n don't have to be distinct. In this case we know from Chapter 7 in Ahmad and Ambrosetti that the general solution of (1) is given by

x(t) = c_1 e^{λ_1 t} v_1 + · · · + c_n e^{λ_n t} v_n.

In matrix notation, we can write this as

x(t) = ( e^{λ_1 t} v_1  · · ·  e^{λ_n t} v_n ) (c_1, . . . , c_n)^T,

or

x(t) = ( v_1  · · ·  v_n ) ( e^{tλ_1}  ···  0 ; ⋮  ⋱  ⋮ ; 0  ···  e^{tλ_n} ) (c_1, . . . , c_n)^T.

Denote the matrix to the left by T. If we want to find the solution with x(0) = x_0, we have to solve the system of equations

T c = x_0  ⇔  c = T^{−1} x_0.


Thus the solution of the IVP is

x(t) = T e^{tD} T^{−1} x_0,

where

D = ( λ_1  ···  0 ; ⋮  ⋱  ⋮ ; 0  ···  λ_n ).
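This recipe translates directly into code. The following Python sketch (assuming NumPy and SciPy; the matrix is an arbitrary diagonalizable example) builds T from the eigenvectors, forms e^{tD}, and compares T e^{tD} T^{−1} x_0 with e^{tA} x_0 computed by scipy.linalg.expm.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])
    x0 = np.array([1.0, -1.0])
    t = 0.4

    lam, T = np.linalg.eig(A)                   # columns of T are eigenvectors of A
    etD = np.diag(np.exp(t * lam))              # e^{tD} with the eigenvalues on the diagonal
    x_diag = T @ etD @ np.linalg.solve(T, x0)   # T e^{tD} T^{-1} x0

    print(np.allclose(x_diag, expm(t * A) @ x0))   # True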

This can also be seen by computing the matrix exponential e^{tA} using the following proposition.

Proposition 13. e^{TBT^{−1}} = T e^{B} T^{−1}.

Proof. This follows by noting that

(TBT^{−1})^k = (TBT^{−1})(TBT^{−1}) · · · (TBT^{−1})
             = TB(T^{−1}T)B(T^{−1}T) · · · (T^{−1}T)BT^{−1}
             = TB^kT^{−1},

whence

∑_{k=0}^{∞} (TBT^{−1})^k / k! = ∑_{k=0}^{∞} TB^kT^{−1} / k! = T e^{B} T^{−1}.

Returning to the discussion above, if A is diagonalizable, then it can be written

(3)    A = TDT^{−1}

with T and D as above. D is the matrix for the linear operator x ↦ Ax in the basis v_1, . . . , v_n. The matrix for this linear operator in the standard basis is simply A. Equation (3) is the change of basis formula for matrices. For convenience we will also refer to D as the matrix for A in the basis v_1, . . . , v_n, although this is a bit sloppy. From (3) and Proposition 13 it follows that

e^{tA} = T e^{tD} T^{−1}.

The matrix for e^{tA} in the basis v_1, . . . , v_n is thus given by e^{tD}. The solution of the IVP is given by

e^{tA} x_0 = T e^{tD} T^{−1} x_0,

confirming our previous result.

We finally record some properties of the matrix exponential which will be useful

for matrices which can’t be diagonalized.

Lemma 14. AB = BA ⇒ e^A B = B e^A.

Proof. A^k B = A^{k−1} BA = · · · = B A^k. Thus

lim_{N→∞} ( ∑_{k=0}^{N} A^k / k! ) B = lim_{N→∞} B ( ∑_{k=0}^{N} A^k / k! ).


Proposition 15.

(1) (e^A)^{−1} = e^{−A}.

(2) e^A e^B = e^{A+B} = e^B e^A if AB = BA.

(3) e^{tA} e^{sA} = e^{(t+s)A}.

Proof. (1) We have that

d/dt ( e^{tA} e^{−tA} ) = A e^{tA} e^{−tA} + e^{tA} (−A e^{−tA}) = (A − A) e^{tA} e^{−tA} = 0,

where we have used the previous lemma to interchange the order of e^{tA} and A. Hence

e^{tA} e^{−tA} ≡ C (constant matrix).

Setting t = 0, we find that C = I (identity matrix). Setting t = 1, we find that e^A e^{−A} = I.

(2) We have that

d/dt ( e^{t(A+B)} e^{−tA} e^{−tB} ) = (A+B) e^{t(A+B)} e^{−tA} e^{−tB} − e^{t(A+B)} A e^{−tA} e^{−tB} − e^{t(A+B)} e^{−tA} B e^{−tB}
                                    = (A + B − (A + B)) e^{t(A+B)} e^{−tA} e^{−tB}
                                    = 0,

where we have used the previous lemma in the second line. As in (1) we obtain e^{t(A+B)} e^{−tA} e^{−tB} ≡ I; setting t = 1 and using (1) gives e^{A+B} = e^B e^A, and interchanging the roles of A and B gives e^{A+B} = e^A e^B.

(3) follows from (2) since tA and sA commute.

Let's use this to compute the matrix exponential of a matrix which can't be diagonalized.

Example 16. Let

D = ( 2  0 ; 0  2 ),    N = ( 0  1 ; 0  0 )

and

A = D + N = ( 2  1 ; 0  2 ).

The matrix A is not diagonalizable, since the only eigenvalue is 2 and Ax = 2x has the solution

x = z (1, 0)^T,    z ∈ C.

Since D is diagonal, we have that

e^{tD} = ( e^{2t}  0 ; 0  e^{2t} ).

Moreover, N^2 = 0 (confirm this!), so

e^{tN} = I + tN = ( 1  t ; 0  1 ).

Since D and N commute, we find that

e^{tA} = e^{tD + tN} = e^{tD} e^{tN} = ( e^{2t}  0 ; 0  e^{2t} ) ( 1  t ; 0  1 ) = ( e^{2t}  t e^{2t} ; 0  e^{2t} ).
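A numerical check of this example (Python sketch, assuming NumPy and SciPy): since D and N commute, e^{t(D+N)} factors as e^{tD} e^{tN}; for two matrices that do not commute the factorization fails in general (compare Exercise 5 and the remark below).

    import numpy as np
    from scipy.linalg import expm

    t = 0.9
    D = np.array([[2.0, 0.0], [0.0, 2.0]])
    N = np.array([[0.0, 1.0], [0.0, 0.0]])

    print(np.allclose(expm(t * (D + N)), expm(t * D) @ expm(t * N)))   # True: D and N commute

    # For non-commuting matrices the identity e^{P+Q} = e^P e^Q generally fails.
    P = np.array([[0.0, 1.0], [0.0, 0.0]])
    Q = np.array([[0.0, 0.0], [1.0, 0.0]])
    print(np.allclose(expm(P + Q), expm(P) @ expm(Q)))                 # False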

Exercise 5 below shows that the hypothesis that A and B commute in Proposition 15 (2) is essential.

2 Generalized eigenvectors

We know now how to compute the matrix exponential when A is diagonalizable. In the next section we will discuss how this can be done when A is not diagonalizable. In order to do that, we need to introduce some more advanced concepts from linear algebra. When A is diagonalizable there is a basis consisting of eigenvectors. The main idea when A is not diagonalizable is to replace this basis by a basis consisting of 'generalized eigenvectors'.

Definition 17. Let A ∈ C^{n×n}. A vector v ∈ C^n, v ≠ 0, is called a generalized eigenvector corresponding to the eigenvalue λ if

(A − λI)^m v = 0

for some integer m ≥ 1.

Note that according to this definition, an eigenvector also qualifies as a generalized eigenvector. We also remark that λ in the above definition has to be an eigenvalue, since if m ≥ 1 is the smallest positive integer such that (A − λI)^m v = 0, then w = (A − λI)^{m−1} v ≠ 0 and

(A − λI)w = (A − λI)(A − λI)^{m−1} v = (A − λI)^m v = 0,

so w is an eigenvector and λ is an eigenvalue.

The goal of this section is to prove the following theorem.

Theorem 18. Let A ∈ C^{n×n}. Then there is a basis for C^n consisting of generalized eigenvectors of A.

We will also discuss methods for constructing such a basis.

Before proving the main theorem, we discuss a number of preliminary results. Let V be a vector space and let V_1, . . . , V_k be subspaces. We say that V is the direct sum of V_1, . . . , V_k if each vector x ∈ V can be written in a unique way as

x = x1 + x2 + · · ·+ xk, where xi ∈ Vi, i = 1, . . . , k.

If this is the case we use the notation

V = V1 ⊕ V2 ⊕ · · · ⊕ Vk.

We say that a subspace W of V is invariant under A if

x ∈ W ⇒ Ax ∈ W.


Example 19. Suppose that A has n distinct eigenvalues λ_1, . . . , λ_n with corresponding eigenvectors v_1, . . . , v_n. It then follows that the vectors v_1, . . . , v_n are linearly independent (see Theorem 7.1.2 in Ahmad and Ambrosetti) and thus form a basis for C^n. Let

ker(A− λiI) = span{vi}, i = 1, . . . , n,

be the corresponding eigenspaces. Here

kerB = {x ∈ Cn : Bx = 0}

is the kernel (or null space) of an n× n matrix B. Then

Cn = ker(A− λ1I)⊕ ker(A− λ2I)⊕ · · · ⊕ ker(A− λnI)

by the definition of a basis. It is also clear that each eigenspace is invariant under A.

More generally, suppose that A is diagonalizable, i.e. that it has k distinct eigenvalues λ_1, . . . , λ_k and that the geometric multiplicity of each eigenvalue λ_i equals the algebraic multiplicity a_i. Let ker(A − λ_i I), i = 1, . . . , k, be the corresponding eigenspaces. We can then find a basis for each eigenspace consisting of a_i eigenvectors. The union of these bases consists of a_1 + · · · + a_k = n elements and is linearly independent, since eigenvectors belonging to different eigenvalues are linearly independent (this follows from an argument similar to Theorem 7.1.2 in Ahmad and Ambrosetti). We thus obtain a basis for C^n and it follows that

Cn = ker(A− λ1I)⊕ ker(A− λ2I)⊕ · · · ⊕ ker(A− λkI).

In this basis, A has the matrix

D = diag( λ_1 I_1 , . . . , λ_k I_k ),

where each I_i is an a_i × a_i unit matrix. In other words, D is a diagonal matrix with the eigenvalues on the diagonal, each repeated a_i times. This explains why A is called diagonalizable.

Let A ∈ C^{n×n} be a square matrix and p(λ) = a_0 λ^m + a_1 λ^{m−1} + · · · + a_{m−1} λ + a_m a polynomial. We define

p(A) = a_0 A^m + a_1 A^{m−1} + · · · + a_{m−1} A + a_m I.

Theorem 20 (Cayley–Hamilton). Let p_A(λ) = det(A − λI) be the characteristic polynomial of A. Then p_A(A) = 0.

Proof. Cramer's rule from linear algebra says that

(A − λI)^{−1} = 1/p_A(λ) · adj(A − λI),


where the adjugate matrix adj B is a matrix whose elements are the cofactors of B. More specifically, the element on row i and column j of the adjugate matrix for B is C_{ji}, where C_{ji} = (−1)^{j+i} det B_{ji}, in which B_{ji} is the (n − 1) × (n − 1) matrix obtained by eliminating row j and column i from B (see sections 7.1.2–7.1.3 of Ahmad and Ambrosetti). Note that each element of adj(A − λI) is a polynomial in λ of degree at most n − 1 (since at least one element of the diagonal is eliminated). Thus,

p_A(λ)(A − λI)^{−1} = λ^{n−1} B_{n−1} + · · · + λ B_1 + B_0,

for some constant n× n matrices B0, . . . , Bn−1. Multiplying with A− λI gives

p_A(λ) I = (A − λI) · p_A(λ)(A − λI)^{−1}
         = −λ^n B_{n−1} + λ^{n−1}(A B_{n−1} − B_{n−2}) + · · · + λ(A B_1 − B_0) + A B_0.

Thus,

−B_{n−1} = a_0 I,
A B_{n−1} − B_{n−2} = a_1 I,
⋮
A B_1 − B_0 = a_{n−1} I,
A B_0 = a_n I,

where

p_A(λ) = a_0 λ^n + a_1 λ^{n−1} + · · · + a_{n−1} λ + a_n.

Multiplying the rows by A^n, A^{n−1}, . . . , A, I and adding them, we get

p_A(A) = a_0 A^n + a_1 A^{n−1} + · · · + a_{n−1} A + a_n I
       = −A^n B_{n−1} + A^{n−1}(A B_{n−1} − B_{n−2}) + · · · + A(A B_1 − B_0) + A B_0
       = −A^n B_{n−1} + A^n B_{n−1} − A^{n−1} B_{n−2} + · · · + A^2 B_1 − A B_0 + A B_0
       = 0.
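A small numerical sanity check of the Cayley–Hamilton theorem (Python sketch, assuming NumPy; the matrix is an arbitrary example): evaluating the characteristic polynomial at A gives the zero matrix.

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [4.0, 0.0, 2.0]])
    n = A.shape[0]

    # np.poly gives the coefficients of det(lambda*I - A); multiply by (-1)^n
    # to get p_A(lambda) = det(A - lambda*I) as in the text.
    coeffs = (-1.0) ** n * np.poly(A)

    # Evaluate p_A(A) = a_0 A^n + a_1 A^{n-1} + ... + a_n I by Horner's scheme.
    pA = np.zeros((n, n))
    for c in coeffs:
        pA = pA @ A + c * np.eye(n)

    print(np.allclose(pA, np.zeros((n, n))))   # True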

Lemma 21. Suppose that p(λ) = p_1(λ) p_2(λ), where p_1 and p_2 are relatively prime. If p(A) = 0 we have that

C^n = ker p_1(A) ⊕ ker p_2(A)

and each subspace ker p_i(A) is invariant under A.

Proof. The invariance follows from p_i(A) A x = A p_i(A) x = 0 for x ∈ ker p_i(A). Since p_1 and p_2 are relatively prime, it follows by Euclid's algorithm that there exist polynomials q_1, q_2 such that

p1(λ)q1(λ) + p2(λ)q2(λ) = 1.

Thus

p_1(A) q_1(A) + p_2(A) q_2(A) = I.


Applying this identity to the vector x ∈ Cn, we obtain

x = p_1(A) q_1(A) x + p_2(A) q_2(A) x =: x_2 + x_1,

where

p_2(A) x_2 = p_2(A) p_1(A) q_1(A) x = p(A) q_1(A) x = 0,

so that x_2 ∈ ker p_2(A). Similarly x_1 ∈ ker p_1(A). Thus C^n = ker p_1(A) + ker p_2(A). On the other hand, if

x_1 + x_2 = x_1′ + x_2′,    x_i, x_i′ ∈ ker p_i(A), i = 1, 2,

we obtain that

y = x1 − x′1 = x′2 − x2 ∈ ker p1(A) ∩ ker p2(A),

so that

y = p1(A)q1(A)y + p2(A)q2(A)y = q1(A)p1(A)y + q2(A)p2(A)y = 0.

It follows that the representation x = x1 + x2 is unique and therefore

Cn = ker p1(A)⊕ ker p2(A).

Recall that the characteristic polynomial p_A(λ) = det(A − λI) can be factorized as

p_A(λ) = (−1)^n (λ − λ_1)^{a_1} · · · (λ − λ_k)^{a_k},

where λ_1, . . . , λ_k are the distinct eigenvalues of A and a_1, . . . , a_k the corresponding algebraic multiplicities. Applying Theorem 20 and Lemma 21 to p_A(λ) we obtain the following important result.

Theorem 22. We have that

C^n = ker(A − λ_1 I)^{a_1} ⊕ · · · ⊕ ker(A − λ_k I)^{a_k},

where each ker(A − λ_i I)^{a_i} is invariant under A.

Proof. We begin by noting that the polynomials (λ − λ_i)^{a_i}, i = 1, . . . , k, are relatively prime. Repeated application of Lemma 21 therefore shows that

C^n = ker(A − λ_1 I)^{a_1} ⊕ · · · ⊕ ker(A − λ_k I)^{a_k},

with each ker(A − λ_i I)^{a_i} invariant.

The space ker(A − λ_i I)^{a_i} is called the generalized eigenspace corresponding to λ_i. We leave it as an exercise to show that this is the space spanned by all generalized eigenvectors of A corresponding to λ_i.
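Numerically, a generalized eigenspace can be computed as the null space of (A − λ_i I)^{a_i}. A Python sketch (assuming NumPy and SciPy; the matrix, eigenvalues and multiplicities are those of Example 30 below, taken as known here):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0],
                  [-1.0, 0.0, -1.0]])

    # Eigenvalue 0 has algebraic multiplicity 2, eigenvalue 2 has multiplicity 1.
    for lam, a in [(0.0, 2), (2.0, 1)]:
        M = np.linalg.matrix_power(A - lam * np.eye(3), a)
        basis = null_space(M)           # columns span ker (A - lam I)^a
        print(lam, basis.shape[1])      # the dimension equals the algebraic multiplicity a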

We can now prove Theorem 18.


Proof of Theorem 18. If we select a basis {v_{i,1}, . . . , v_{i,n_i}} for each subspace ker(A − λ_i I)^{a_i}, then the union {v_{1,1}, . . . , v_{1,n_1}, v_{2,1}, . . . , v_{2,n_2}, . . . , v_{k,1}, . . . , v_{k,n_k}} will be a basis for C^n consisting of generalized eigenvectors. The fact that these vectors are linearly independent follows from Theorem 22. Indeed, suppose that

∑_{i=1}^{k} ( ∑_{j=1}^{n_i} α_{i,j} v_{i,j} ) = 0.

Then by the definition of a direct sum, we have ∑_{j=1}^{n_i} α_{i,j} v_{i,j} = 0 for each i. But then for each i, α_{i,j} = 0, j = 1, . . . , n_i, since {v_{i,1}, . . . , v_{i,n_i}} is linearly independent.

When A is diagonalizable it takes the form of a diagonal matrix in the eigenvector basis. We might therefore ask what a general matrix looks like in a basis of generalized eigenvectors. Since each generalized eigenspace is invariant under A, the matrix in the new basis will be block diagonal:

B = diag( B_1 , . . . , B_k ),

where each B_i is an n_i × n_i matrix, n_i = dim ker(A − λ_i I)^{a_i}. Moreover, B_i only has one eigenvalue λ_i. Indeed, B_i is the matrix for the restriction of A to ker(A − λ_i I)^{a_i} in the basis v_{i,1}, . . . , v_{i,n_i}, and if Av = λv for some non-zero v ∈ ker(A − λ_i I)^{a_i}, then

0 = (A − λ_i I)^{a_i} v = (λ − λ_i)^{a_i} v  ⇒  λ = λ_i.

It follows that the dimension of ker(A − λ_i I)^{a_i} equals the algebraic multiplicity of the eigenvalue λ_i, that is, n_i = a_i. This follows since

(−1)^n (λ − λ_1)^{a_1} · · · (λ − λ_k)^{a_k} = det(A − λI)
                                            = det(B − λI)
                                            = det(B_1 − λI_1) · · · det(B_k − λI_k)
                                            = (−1)^n (λ − λ_1)^{n_1} · · · (λ − λ_k)^{n_k},

where we have used the facts that the determinant of a matrix is independent of basis, and that the determinant of a block diagonal matrix is the product of the determinants of the blocks.

Set N_i = B_i − λ_i I_i, where I_i is the n_i × n_i unit matrix. Then N_i^{a_i} = 0 by the definition of the generalised eigenspaces. A linear operator N with the property that N^m = 0 for some m is called nilpotent.

We can summarize our findings as follows.

Theorem 23. Let A ∈ C^{n×n}. There exists a basis for C^n in which A has the block diagonal form

B = diag( B_1 , . . . , B_k ),

and B_i = λ_i I_i + N_i, where λ_1, . . . , λ_k are the distinct eigenvalues of A, I_i is the a_i × a_i unit matrix and N_i is nilpotent.


We remark that the matrix B in the above theorem is not unique. Apart from the order of the blocks B_i, the blocks themselves depend on the particular bases chosen for the generalized eigenspaces. There is a particularly useful way of choosing these bases which gives rise to the Jordan normal form. Although the Jordan normal form will not be required to compute the matrix exponential, we present it here for completeness and since it is mentioned in Ahmad and Ambrosetti.

Theorem 24. Let A ∈ C^{n×n}. There exists an invertible n × n matrix T such that

T−1AT = J,

where J is a block diagonal matrix,

J = diag( J_1 , . . . , J_m ),

and each block Ji is a square matrix of the form

J_i = λI + N = ( λ  1        0
                    ⋱  ⋱
                        ⋱  1
                 0          λ ),

where λ is an eigenvalue of A, I is a unit matrix and N has ones on the line directly above the diagonal and zeros everywhere else. In particular, N is nilpotent.

See http://en.wikipedia.org/wiki/Jordan_normal_form or any advanced textbook in linear algebra for more information. Note in particular that there is an alternative version of the Jordan normal form for real matrices with complex eigenvalues, which is briefly mentioned in Ahmad and Ambrosetti and discussed in more detail on the Wikipedia page.

3 Computing the matrix exponential

We will now use the results of the previous section in order to find an algorithm for computing e^{tA} when A is not diagonalizable. Note that our main interest is actually in solving the equation x′ = Ax, possibly with the initial condition x(0) = x_0. As we will see, this doesn't actually require computing e^{tA} explicitly.

As previously mentioned, when A is diagonalizable, the general solution of x′ = Ax is given by

(4)    x(t) = c_1 e^{λ_1 t} v_1 + · · · + c_n e^{λ_n t} v_n.

If we want to solve the IVP

(5) x′ = Ax, x(0) = x0,


we simply choose c1, . . . , cn so that

x(0) = c1v1 + · · ·+ cnvn = x0.

In other words, the numbers c_i are the coordinates for the vector x_0 in the basis v_1, . . . , v_n. Note that each term e^{λ_i t} v_i in the solution is actually e^{tA} v_i. Since the jth column of the matrix e^{tA} is given by e^{tA} e_j, where {e_1, . . . , e_n} is the standard basis, we can compute the matrix exponential by repeating the above steps with initial data x_0 = e_j for j = 1, . . . , n.

The same approach works when A is not diagonalizable, with the difference that the basis vectors v_i are now generalized eigenvectors instead of eigenvectors. Denote the basis vectors v_{i,j}, i = 1, . . . , k, j = 1, . . . , a_i, as in the proof of Theorem 18 (recall that the dimension of each generalized eigenspace is the algebraic multiplicity of the corresponding eigenvalue). Then the general solution is

x(t) = ∑_{i=1}^{k} ∑_{j=1}^{a_i} c_{i,j} e^{tA} v_{i,j}.

We thus need to compute

e^{tA} v_{i,j},

where vi,j is a generalized eigenvector corresponding to the eigenvalue λi. Since

(A − λ_i I)^{a_i} v_{i,j} = 0,

we find that

e^{tA} v_{i,j} = e^{tλ_i I + t(A − λ_i I)} v_{i,j}
              = e^{tλ_i I} e^{t(A − λ_i I)} v_{i,j}
              = e^{λ_i t} e^{t(A − λ_i I)} v_{i,j}
              = e^{λ_i t} ( I + t(A − λ_i I) + t^2/2 (A − λ_i I)^2 + · · · + t^{a_i−1}/(a_i − 1)! (A − λ_i I)^{a_i−1} ) v_{i,j},

where we have also used the fact that I and (A − λ_i I) commute and the definition of the matrix exponential. The general solution can therefore be written

(6)    x(t) = ∑_{i=1}^{k} ∑_{j=1}^{a_i} c_{i,j} e^{λ_i t} ( ∑_{ℓ=0}^{a_i−1} t^ℓ/ℓ! (A − λ_i I)^ℓ ) v_{i,j}.

In order to solve the IVP (5), we simply have to express x_0 in the basis v_{i,j} to find the coefficients c_{i,j}. Finally, to compute e^{tA} we repeat the above steps for each standard basis vector e_j.
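As a sketch of how the finite sum above can be used in code (Python, assuming NumPy and SciPy; the matrix and generalized eigenvector are those of Example 30 below, with the eigenvalue and multiplicity taken as known), the helper evaluates e^{tA} v via e^{λt} ∑_{ℓ<a} t^ℓ/ℓ! (A − λI)^ℓ v and compares it with scipy.linalg.expm.

    import numpy as np
    from scipy.linalg import expm

    def exp_tA_on_generalized(A, lam, a, v, t):
        # e^{tA} v for a generalized eigenvector v with (A - lam I)^a v = 0.
        B = A - lam * np.eye(A.shape[0])
        term = v.astype(float)        # holds (A - lam I)^l v, built up iteratively
        out = np.zeros_like(term)
        factorial = 1.0
        for l in range(a):
            if l > 0:
                term = B @ term
                factorial *= l
            out += t ** l / factorial * term
        return np.exp(lam * t) * out

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0],
                  [-1.0, 0.0, -1.0]])
    v = np.array([1.0, 0.0, 0.0])     # lies in ker A^2 but not in ker A
    t = 0.8

    print(np.allclose(exp_tA_on_generalized(A, 0.0, 2, v, t), expm(t * A) @ v))   # True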

Remark 25. Formula (6) shows that the general solution is a linear combination of exponential functions multiplied by polynomials. The polynomial factors can only appear for eigenvalues with (algebraic) multiplicity two or higher. This is precisely the same structure which we encountered for homogeneous higher order scalar linear differential equations with constant coefficients.


Remark 26. When A is diagonalizable we see from (4) that the general solution is just a linear combination of exponential functions, without polynomial factors. This seems to contradict the previous remark. A closer look reveals that in this case

(7)    (A − λ_i I)^2 v_{i,j} = 0

for each j. Indeed, we know that ker(A − λ_i I) ⊆ ker(A − λ_i I)^{a_i}. If A is diagonalizable then the geometric multiplicity dim ker(A − λ_i I) equals the algebraic multiplicity a_i = dim ker(A − λ_i I)^{a_i}, so the two subspaces coincide. This means that every generalized eigenvector is in fact an eigenvector, so that (7) holds. This implies that

(A − λ_i I)^ℓ v_{i,j} = 0,    ℓ ≥ 2,

in (6) even if a_i ≥ 2.

Remark 27. More generally, it might happen that (A − λ_i I)^ℓ vanishes on the generalized eigenspace ker(A − λ_i I)^{a_i} for some ℓ < a_i. One can show that there is a unique monic polynomial p_min(λ) of least degree such that p_min(A) = 0. p_min(λ) is called the minimal polynomial. It divides the characteristic polynomial p_A(λ) and can therefore be factorized as

p_min(λ) = (λ − λ_1)^{m_1} · · · (λ − λ_k)^{m_k},

with m_i ≤ a_i for each i. Repeating the proof of Theorem 22, we obtain that

C^n = ker(A − λ_1 I)^{m_1} ⊕ · · · ⊕ ker(A − λ_k I)^{m_k},

so that ker(A − λ_i I)^{a_i} = ker(A − λ_i I)^{m_i}, i.e. (A − λ_i I)^{m_i} vanishes on ker(A − λ_i I)^{a_i}. In fact, one can show that (A − λ_i I)^m vanishes on ker(A − λ_i I)^{a_i} if and only if m ≥ m_i. In the diagonalizable case we have

p_min(λ) = (λ − λ_1) · · · (λ − λ_k),

i.e. m_i = 1 for all i.

We now consider some examples.

Example 28. Let

A = ( 1  2 ; 2  1 ).

The characteristic polynomial is (λ − 1)^2 − 4 = (λ + 1)(λ − 3), so the distinct eigenvalues are λ_1 = −1 and λ_2 = 3. Since the eigenvalues are distinct, there is a basis consisting of eigenvectors. Solving the equations Ax = −x and Ax = 3x, we find the eigenvectors v_1 = (1, −1) and v_2 = (1, 1). The general solution of the system x′ = Ax is thus given by

x(t) = c_1 e^{−t} v_1 + c_2 e^{3t} v_2 = c_1 e^{−t} (1, −1)^T + c_2 e^{3t} (1, 1)^T.


In order to compute the matrix exponential, we find the solutions with x(0) = e1

and x(0) = e2, respectively. In the first case, we obtain the equations

c_1 (1, −1)^T + c_2 (1, 1)^T = (1, 0)^T  ⇔  c_1 + c_2 = 1, −c_1 + c_2 = 0  ⇔  c_1 = 1/2, c_2 = 1/2.

In the second case, we find that

c_1 (1, −1)^T + c_2 (1, 1)^T = (0, 1)^T  ⇔  c_1 = −1/2, c_2 = 1/2.

Hence,

e^{tA} e_1 = (1/2) e^{−t} (1, −1)^T + (1/2) e^{3t} (1, 1)^T = (1/2) ( e^{−t} + e^{3t}, −e^{−t} + e^{3t} )^T

and

e^{tA} e_2 = −(1/2) e^{−t} (1, −1)^T + (1/2) e^{3t} (1, 1)^T = (1/2) ( −e^{−t} + e^{3t}, e^{−t} + e^{3t} )^T.

Finally,

e^{tA} = (1/2) ( e^{−t} + e^{3t}   −e^{−t} + e^{3t} ; −e^{−t} + e^{3t}   e^{−t} + e^{3t} ).

Example 29. Let

A = ( −3  4 ; −1  1 ).

The characteristic polynomial is (λ + 1)^2, so λ_1 = −1 is the only eigenvalue. This means that any vector belongs to the generalized eigenspace ker(A + I)^2, so that

e^{tA} v = e^{−t} e^{t(A+I)} v
         = e^{−t} ( I + t(A + I) ) v
         = e^{−t} ( ( 1  0 ; 0  1 ) + t ( −2  4 ; −1  2 ) ) v
         = e^{−t} ( 1 − 2t   4t ; −t   1 + 2t ) v.

In particular

e^{tA} = e^{−t} ( 1 − 2t   4t ; −t   1 + 2t ).


We could come to the same conclusion by using the standard basis vectors e_1 and e_2 as our basis v_{1,1}, v_{1,2} in the solution formula (6). This would give the general solution

x(t) = c_1 e^{tA} e_1 + c_2 e^{tA} e_2 = c_1 e^{−t} (1 − 2t, −t)^T + c_2 e^{−t} (4t, 1 + 2t)^T.

There is one more possibility for 2 × 2 matrices. The matrix could be diagonalizable and still have a double eigenvalue. We leave it as an exercise to show that this can happen if and only if A is a scalar multiple of the identity matrix, i.e. A = λI for some number λ (which will be the only eigenvalue). In this case e^{tA} = e^{λt} I.

For 3× 3 matrices there are more possibilities.

Example 30. Let

A = ( 1  0  1 ; 0  2  0 ; −1  0  −1 ).

The characteristic polynomial of A is p_A(λ) = −λ^2(λ − 2). Thus, A has only the eigenvalues λ_1 = 0 and λ_2 = 2, with algebraic multiplicities a_1 = 2 and a_2 = 1, respectively. We find that

Ax = 0  ⇔  x = z(1, 0, −1),
Ax = 2x  ⇔  x = z(0, 1, 0),

z ∈ C. Thus v_1 = (1, 0, −1) and v_2 = (0, 1, 0) are eigenvectors corresponding to λ_1

and λ_2, respectively. We see that A is not diagonalizable.

The generalised eigenspace corresponding to λ_2 is simply the usual eigenspace

ker(A − 2I), but the one corresponding to λ_1 is ker A^2. Calculating

A^2 = ( 0  0  0 ; 0  4  0 ; 0  0  0 ),

we find e.g. the basis v_{1,1} = v_1 = (1, 0, −1), v_{1,2} = (1, 0, 0) for ker A^2, and we previously found the basis v_{2,1} = v_2 = (0, 1, 0) for ker(A − 2I).

We have

e^{tA} (0, 1, 0)^T = e^{2t} (0, 1, 0)^T

and

e^{tA} (1, 0, −1)^T = e^{0·t} (1, 0, −1)^T = (1, 0, −1)^T.


Finally,

e^{tA} (1, 0, 0)^T = (I + tA) (1, 0, 0)^T = ( 1 + t  0  t ; 0  1 + 2t  0 ; −t  0  1 − t ) (1, 0, 0)^T = (1 + t, 0, −t)^T.

We can thus already write the general solution as

(8)    x(t) = c_1 (1, 0, −1)^T + c_2 (1 + t, 0, −t)^T + c_3 e^{2t} (0, 1, 0)^T,

where we chose to use the simpler notation c_1, c_2, c_3 for the coefficients instead of c_{1,1}, c_{1,2}, c_{2,1} from eq. (6).

In order to compute e^{tA}, we need to compute e^{tA} e_i for the standard basis vectors. Note that we have already computed

e^{tA} e_1 = (1 + t, 0, −t)^T    and    e^{tA} e_2 = e^{2t} (0, 1, 0)^T,

so it remains to compute e^{tA} e_3. We thus need to solve the equation

(0, 0, 1)^T = c_1 (1, 0, −1)^T + c_2 (1, 0, 0)^T + c_3 (0, 1, 0)^T,

and a simple calculation gives c1 = −1, c2 = 1 and c3 = 0, so that

e^{tA} (0, 0, 1)^T = −(1, 0, −1)^T + (1 + t, 0, −t)^T = (t, 0, 1 − t)^T.

Thus,

e^{tA} = ( 1 + t  0  t ; 0  e^{2t}  0 ; −t  0  1 − t ).
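A quick numerical check of this matrix exponential (Python sketch, assuming NumPy and SciPy):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 2.0, 0.0],
                  [-1.0, 0.0, -1.0]])
    t = 0.6

    expected = np.array([[1 + t, 0.0, t],
                         [0.0, np.exp(2 * t), 0.0],
                         [-t, 0.0, 1 - t]])
    print(np.allclose(expm(t * A), expected))   # True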

Example 31. Suppose that in the previous example we wanted to solve the IVP

x′ = Ax,    x(0) = (0, 1, 1)^T.

Of course, once we have the formula for the matrix exponential we can find the solution by calculating

x(t) = e^{tA} x(0) = ( 1 + t  0  t ; 0  e^{2t}  0 ; −t  0  1 − t ) (0, 1, 1)^T = (t, e^{2t}, 1 − t)^T.


We could however also solve the problem by using the general solution (8) and finding c_1, c_2, c_3 to match the initial data. We thus need to solve the system

(0, 1, 1)^T = c_1 (1, 0, −1)^T + c_2 (1, 0, 0)^T + c_3 (0, 1, 0)^T,

giving c1 = −1, c2 = 1 and c3 = 1. Thus,

x(t) = −(1, 0, −1)^T + (1 + t, 0, −t)^T + e^{2t} (0, 1, 0)^T = (t, e^{2t}, 1 − t)^T,

which coincides with the result of our previous computation.

Example 32. Let

A = ( 3  1  −1 ; 0  2  0 ; 1  1  1 ).

The characteristic polynomial of A is p_A(λ) = −(λ − 2)^3. Thus, A has the only eigenvalue λ_1 = 2, with algebraic multiplicity 3. The corresponding generalized eigenspace is the whole of C^3. Just like in Example 29 we can therefore compute the matrix exponential directly through the formula

e^{tA} = e^{2t} e^{t(A − 2I)} = e^{2t} ( I + t(A − 2I) + t^2/2 (A − 2I)^2 ).

We find that

A − 2I = ( 1  1  −1 ; 0  0  0 ; 1  1  −1 )    and    (A − 2I)^2 = ( 0  0  0 ; 0  0  0 ; 0  0  0 ).

Thus the term t^2/2 (A − 2I)^2 vanishes and we obtain

e^{tA} = e^{2t} ( I + t(A − 2I) )
       = e^{2t} ( ( 1  0  0 ; 0  1  0 ; 0  0  1 ) + t ( 1  1  −1 ; 0  0  0 ; 1  1  −1 ) )
       = ( (1 + t) e^{2t}   t e^{2t}   −t e^{2t} ; 0   e^{2t}   0 ; t e^{2t}   t e^{2t}   (1 − t) e^{2t} ).

Example 33. Let

A = ( 2  0  0 ; 0  2  1 ; −1  0  2 ).


Again p_A(λ) = −(λ − 2)^3 and thus 2 is the only eigenvalue. This time

A − 2I = ( 0  0  0 ; 0  0  1 ; −1  0  0 )    and    (A − 2I)^2 = ( 0  0  0 ; −1  0  0 ; 0  0  0 ),

so that

e^{tA} = e^{2t} ( I + t(A − 2I) + t^2/2 (A − 2I)^2 )
       = e^{2t} ( ( 1  0  0 ; 0  1  0 ; 0  0  1 ) + t ( 0  0  0 ; 0  0  1 ; −1  0  0 ) + t^2/2 ( 0  0  0 ; −1  0  0 ; 0  0  0 ) )
       = ( e^{2t}   0   0 ; −(t^2/2) e^{2t}   e^{2t}   t e^{2t} ; −t e^{2t}   0   e^{2t} ).

The 4 × 4 case can be analyzed in a similar way. In general, the computations will get more involved the higher n is. Most computer algebra systems have routines for computing the matrix exponential. In Maple this can be done using the command MatrixExponential from the LinearAlgebra package.
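In Python, the analogous routines are scipy.linalg.expm for numerical matrices and SymPy's Matrix.exp for symbolic ones (a sketch, assuming SciPy and SymPy are installed; the matrix is the one from Example 16):

    import numpy as np
    from scipy.linalg import expm
    import sympy as sp

    A = np.array([[2.0, 1.0],
                  [0.0, 2.0]])
    print(expm(A))                          # numerical e^A

    t = sp.symbols('t')
    At = sp.Matrix([[2, 1], [0, 2]]) * t
    print(At.exp())                         # symbolic e^{tA}: [[exp(2t), t*exp(2t)], [0, exp(2t)]]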

We can also formulate the above algorithm using the block diagonal representation of A from Theorem 23. Let T be the matrix for the corresponding change of basis, i.e. T is the matrix whose columns are the coordinates for the basis vectors in which A takes the block diagonal form B. Then A = TBT^{−1} and

e^{tA} = T e^{tB} T^{−1},

where

e^{tB} = diag( e^{tB_1} , . . . , e^{tB_k} ),

and

e^{tB_i} = e^{t(λ_i I_i + N_i)} = e^{tλ_i I_i} e^{tN_i} = e^{λ_i t} ( I_i + t N_i + · · · + t^{a_i−1}/(a_i − 1)! N_i^{a_i−1} ),

since N_i^m = 0 for m ≥ a_i.

In the last two examples above, the matrices were already in block diagonal form, so that we could take T = I. Let us instead consider Example 30 from this perspective.

Example 34. For the matrix

A = ( 1  0  1 ; 0  2  0 ; −1  0  −1 )

in Example 30 we found that the generalized eigenspace ker A^2 had the basis

v_{1,1} = (1, 0, −1)^T,    v_{1,2} = (1, 0, 0)^T

and the eigenspace ker(A− 2I) the basis

v_{2,1} = (0, 1, 0)^T.

Thus, we take

T = ( 1  1  0 ; 0  0  1 ; −1  0  0 )

and find that

T^{−1} = ( 0  0  −1 ; 1  0  1 ; 0  1  0 ).

Since Av_{1,1} = 0, Av_{1,2} = v_{1,1} and Av_{2,1} = 2v_{2,1}, the corresponding block diagonal matrix is

B = ( 0  1  0 ; 0  0  0 ; 0  0  2 ) = diag( B_1 , B_2 ),

where

B_1 = ( 0  1 ; 0  0 ),    B_2 = 2.

We find that

e^{tB_1} = I + tB_1 = ( 1  t ; 0  1 ),    e^{tB_2} = e^{2t}.

Thus,

e^{tB} = ( 1  t  0 ; 0  1  0 ; 0  0  e^{2t} )

and

e^{tA} = T e^{tB} T^{−1} = ( 1  1  0 ; 0  0  1 ; −1  0  0 ) ( 1  t  0 ; 0  1  0 ; 0  0  e^{2t} ) ( 0  0  −1 ; 1  0  1 ; 0  1  0 ) = ( 1 + t  0  t ; 0  e^{2t}  0 ; −t  0  1 − t ),

in agreement with our previous calculation.


Exercises

1. Compute e^A by summing the power series when

a) A = ( 0  1 ; 1  0 )    b) A = ( 0  1  −2 ; 0  0  2 ; 0  0  0 ).

2. Compute e^{tA} by diagonalising the matrix, where A = ( 0  1 ; −1  0 ).

3. Show that ‖e^A‖ ≤ e^{‖A‖}.

4. a) Show that (e^A)^* = e^{A^*}.

b) Show that e^S is unitary if S is skew symmetric, that is, S^* = −S.

5. Show that the following identities (for all t ∈ R) imply AB = BA.

a) A e^{tB} = e^{tB} A,    b) e^{tA} e^{tB} = e^{t(A+B)}.

6. Let

A_1 = ( 0  1  1 ; 0  −1  −1 ; 0  −1  −1 ),    A_2 = ( 1  1  2 ; 1  −1  0 ; −1  −1  −2 ),    A_3 = ( 1  2  0 ; 3  1  3 ; 0  −2  1 ).

Calculate the generalized eigenspaces and determine a basis consisting of generalized eigenvectors in each case.

7. Calculate e^{tA_j} for the matrices A_j in the previous exercise.

8. Solve the initial-value problem

x′ = Ax, x(0) = x0,

where

A = ( 2  1  −1 ; 4  1  −4 ; 5  1  −4 )    and    x_0 = (1, 0, 0)^T.

9. The matrix

A = ( 18  3  2  −12 ; 0  2  0  0 ; −2  12  2  12 ; 4  6  3  −16 )

has the eigenvalues 1 and 2. Find the corresponding generalized eigenspaces and determine a basis consisting of generalized eigenvectors.


10. Consider the initial value problem

x_1′ = x_1 + 3x_2,
x_2′ = 3x_1 + x_2,
x(0) = x_0.

For which initial data x_0 does the solution converge to zero as t → ∞?

11. Can you find a general condition on the eigenvalues of A which guarantees that all solutions of the IVP

x′ = Ax,    x(0) = x_0,

converge to zero as t → ∞?

12. The matrices A_1 and A_2 in Exercise 6 have the same eigenvalues. If you've solved Exercise 7 correctly, you will notice that all solutions of the IVP corresponding to A_1 are bounded for t ≥ 0 while there are unbounded solutions of the IVP corresponding to A_2. Explain the difference and try to formulate a general principle.
