
Source: faculty.bucks.edu/erickson/math250/DifferentialEquations.pdf

Differential Equations

Joe Erickson


Contents

1 Basic Principles
  1.1 Functions of Several Variables
  1.2 Linear Differential Operators
  1.3 Ordinary Differential Equations
  1.4 Explicit and Implicit Solutions
  1.5 Direction Fields
  1.6 The Euler Approximation Method

2 First-Order Equations
  2.1 Introduction
  2.2 Separable Equations
  2.3 Linear Equations
  2.4 Exact Equations
  2.5 Integrating Factors
  2.6 Substitutions and Transformations

3 First-Order Applications
  3.1 Growth and Decay
  3.2 Compartmental Analysis

4 Higher-Order Equations
  4.1 Linear Independence of Functions
  4.2 The Theory of Linear Equations
  4.3 Abel’s Formula
  4.4 Homogeneous Equations with Constant Coefficients
  4.5 The Differential Operator Approach
  4.6 Method of Undetermined Coefficients
  4.7 Method of Variation of Parameters
  4.8 Cauchy-Euler Equations
  4.9 Nonlinear Equations

5 Higher-Order Applications
  5.1 Free Mechanical Vibrations
  5.2 Forced Mechanical Vibrations

6 The Laplace Transform
  6.1 Improper Integrals
  6.2 Piecewise Continuity and Exponential Order
  6.3 Definition of the Laplace Transform
  6.4 Laplace Transform Properties
  6.5 The Inverse Laplace Transform
  6.6 The Method of Laplace Transforms
  6.7 Piecewise-Defined Nonhomogeneities
  6.8 The Convolution Theorem
  6.9 Impulse Functions and the Dirac Delta

7 Series Solutions
  7.1 Taylor Polynomials
  7.2 Power Series
  7.3 Series Solutions Near an Ordinary Point

8 Systems of Equations
  8.1 Methods of Solving Systems of Linear ODEs
  8.2 The Theory of First-Order Linear Systems
  8.3 Homogeneous Linear Systems


1 Basic Principles

1.1 – Functions of Several Variables

The symbol R denotes the set of real numbers. We also define

R² = {(x1, x2) : x1, x2 ∈ R},

and in general

Rⁿ = {(x1, x2, . . . , xn) : xi ∈ R for all 1 ≤ i ≤ n}.

The Cartesian product of two sets X and Y is

X × Y = {(x, y) : x ∈ X and y ∈ Y }.

In particular if [a, b] and [c, d] are closed intervals in R, we have

[a, b] × [c, d] = {(x, y) : x ∈ [a, b] and y ∈ [c, d]} = {(x, y) : a ≤ x ≤ b and c ≤ y ≤ d},

which forms a closed rectangle in R² as shown in Figure 1. The product (a, b) × (c, d), called an open rectangle, includes all points in the interior of the rectangle. More precisely,

(a, b) × (c, d) = {(x, y) : a < x < b and c < y < d}.

In algebra, given a function f of a single real-valued variable x, we may construct an equation of the form f(x) = 0 and attempt to determine the solution set of the equation. Of course a

Figure 1. The Cartesian product [a, b] × [c, d] of two closed intervals.


solution would be some real number c which, when substituted for x in the equation, results in a true statement. That is, c would be such that f(c) = 0 is true, and thus c is an element of the solution set of f(x) = 0. However, this is not the only kind of equation that is possible in mathematics. In this course we will be studying a kind of equation whose solution set consists of functions rather than numbers.

Recall the common practice of letting the value of a function f at x be denoted by y, so that y = f(x). In this case we call x the independent variable and y the dependent variable. In deference to tradition we will often let y denote both a function and also the numerical output of the function, so that the symbol serves a dual role: that of function and number. This should not lead to ambiguity as long as care is taken to understand the context in which the symbol y appears. In the same vein the symbol y(x) will also have two possible meanings: it can be taken to be the number the function y returns as output when given the number x as input (which is the technically correct interpretation), and it can be taken to be the function y itself (but with an added emphasis that the function depends on x). We write

y : D ⊆ R → R

to indicate that y is a function that maps each real number x ∈ D to some number in R.

Commonly encountered in this text will be real-valued functions of two or more independent variables, otherwise known as multivariable functions. For instance we may have a function f that depends on two independent variables x and y. The symbol f(x, y) can represent either the real number that f returns as output when given the numbers x and y as inputs, or the function f itself. (As in the single-variable setting, context will make clear which interpretation is the correct one.) It is only natural to put the variables x and y together in a pair (x, y) and think of the domain D of f as consisting of points in R², in which case we write f : D ⊆ R² → R. For example we may have

f(x, y) = (x³y − y² ln(xy)) / √(x² + y²).

Extending this idea we have f : D ⊆ Rⁿ → R, a function of n variables x1, . . . , xn such that f(x1, . . . , xn) ∈ R for each (x1, . . . , xn) ∈ D. Unless otherwise specified we always take the domain of a multivariable function f, Dom(f), to be the set of all points (x1, . . . , xn) in Rⁿ for which f(x1, . . . , xn) is defined as a real number. That is,

Dom(f) = {(x1, . . . , xn) ∈ Rⁿ : f(x1, . . . , xn) ∈ R}.

An example of a function of four variables would be f : R⁴ → R given by

f(x1, x2, x3, x4) = (3 cos(x1x3) + 5 sin²(x2³)) / (|x1 + x4| + 2).

Returning to the case of a function f of two independent variables x and y, again conventionally denoted by f(x, y), it’s natural to wonder what it means to differentiate f. In fact f can be differentiated with respect to either variable x or y. The way this is done is to treat one of the variables as a constant and differentiate with respect to the other variable in the usual manner developed in first-semester calculus.


Definition 1.1. Suppose (x, y) ∈ R² is an interior point of Dom(f). The partial derivative of f with respect to x at (x, y) is

fx(x, y) = lim_{h→0} [f(x + h, y) − f(x, y)]/h,

and the partial derivative of f with respect to y at (x, y) is

fy(x, y) = lim_{h→0} [f(x, y + h) − f(x, y)]/h,

provided these limits exist.
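These limit definitions translate directly into numerical approximations: replace the limit with a small but finite step h. A minimal Python sketch (the function f, the test point, and the step size are illustrative choices, not from the text):

```python
# Approximate the partial derivatives of Definition 1.1 by using a small
# but finite step h in place of the limit h -> 0 (forward differences).

def partial_x(f, x, y, h=1e-6):
    """Difference-quotient approximation of f_x(x, y)."""
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    """Difference-quotient approximation of f_y(x, y)."""
    return (f(x, y + h) - f(x, y)) / h

# Illustrative function: f(x, y) = x**2 * y, with f_x = 2xy and f_y = x**2.
f = lambda x, y: x**2 * y
print(partial_x(f, 3.0, 2.0))  # close to 2 * 3 * 2 = 12
print(partial_y(f, 3.0, 2.0))  # close to 3**2 = 9
```

The forward difference has error proportional to h; it is a numerical stand-in for the limit, not a replacement for the calculus.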

The functions fx and fy are together referred to as the “first-order partial derivatives” of f, or simply the “first partials” of f. If f is a function of three variables x, y, and z, then there are three first partials of f: fx, fy, and fz, where

fx(x, y, z) = lim_{h→0} [f(x + h, y, z) − f(x, y, z)]/h

and so on. In general, for a function f of n variables x1, . . . , xn, we have

fxi(x1, . . . , xn) = lim_{h→0} [f(x1, . . . , xi + h, . . . , xn) − f(x1, . . . , xn)]/h

for each 1 ≤ i ≤ n.

Besides the symbol fx (called subscript notation), the partial derivative of f with respect to x may be denoted by ∂xf (operator notation) or ∂f/∂x (Leibniz notation). Correspondingly fx(x, y) can be denoted by

(∂f/∂x)(x, y)  or  ∂xf(x, y).

All notations extend naturally into higher-order partial derivatives. For instance,

fxx = (fx)x,  ∂xxf = ∂x(∂xf),  and  ∂²f/∂x² = ∂x(∂f/∂x)

are three ways to denote the partial derivative of fx with respect to x, otherwise known as the second partial derivative of f with respect to x. A mixed partial derivative is a sequence of partial derivatives with respect to at least two different variables, such as fxy or fyx. In the various notations we have

fxy = (fx)y,  ∂yxf = ∂y(∂xf),  and  ∂²f/∂y∂x = ∂y(∂f/∂x).

Note in particular that

fxy = (fx)y = ∂y(∂xf) = ∂yxf,

so subscript and operator notation denote the partial derivative of fx with respect to y by xy and yx, respectively. Another convenient notational device is to define ∂x²f = ∂xxf, ∂x³f = ∂xxxf, and so on, with

∂xⁿf  or  ∂ⁿf/∂xⁿ

used generally to denote the nth partial derivative of f with respect to x.

Partial differentiation, which is the process of determining partial derivatives, obeys the same rules that ordinary differentiation of single-variable functions follows.


Consider a function F(x, y). That is, F is a function whose value depends on the independent variables x and y. It may be that x and y each in turn depend on some other variable t, which we indicate by writing x = x(t) and y = y(t). Now define G(t) = F(x(t), y(t)), so G is a function whose value depends on the single independent variable t. The derivative of G with respect to t is given by the following Chain Rule:

G′(t) = (∂F/∂x)(x(t), y(t)) · (dx/dt)(t) + (∂F/∂y)(x(t), y(t)) · (dy/dt)(t).

Usually this is written as

dG/dt = (∂F/∂x)(dx/dt) + (∂F/∂y)(dy/dt),

which is more compact but suppresses some information. Now a word of warning: In practice analysts and textbook authors commonly do not introduce a new letter such as our G above to denote the function t ↦ F(x(t), y(t)); rather, the old letter F is recycled so as to write

F (t) = F (x(t), y(t)),

and so the Chain Rule is rendered as

dF/dt = (∂F/∂x)(dx/dt) + (∂F/∂y)(dy/dt).

Returning to the original function F(x, y), it will frequently be the case for us that x is independent and y is a function of x: y = y(x). Defining F(x) = F(x, y(x)), we have

dF/dx = (∂F/∂x)(dx/dx) + (∂F/∂y)(dy/dx) = ∂F/∂x + (∂F/∂y)(dy/dx). (1.1)
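The Chain Rule can be spot-checked numerically. A sketch with the illustrative choices F(x, y) = x²y and y(x) = sin x (not from the text), for which ∂F/∂x = 2xy and ∂F/∂y = x²:

```python
import math

# Spot-check of the Chain Rule: dF/dx = F_x + F_y * dy/dx should match a
# difference quotient of x -> F(x, y(x)) for F(x, y) = x**2 * y, y = sin(x).

def F(x, y):
    return x**2 * y

def chain_rule_value(x):
    y = math.sin(x)
    return 2 * x * y + x**2 * math.cos(x)   # F_x + F_y * y'

def numeric_derivative(x, h=1e-6):
    g = lambda t: F(t, math.sin(t))         # G(t) = F(x(t), y(t)) with x(t) = t
    return (g(x + h) - g(x - h)) / (2 * h)  # central difference quotient

print(chain_rule_value(1.0), numeric_derivative(1.0))  # nearly equal
```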

Example 1.2. Given f(x, y) = 3x²y⁷ − 2xy + 5y − 8x³, find fx and fy.

Solution. To find fx we treat y as a constant and consider x to be the only variable, enabling us to differentiate in the usual fashion:

fx(x, y) = ∂x(3x²y⁷ − 2xy + 5y − 8x³)
= ∂x(3x²y⁷) − ∂x(2xy) + ∂x(5y) − ∂x(8x³)
= 6xy⁷ − 2y − 24x².

Notice that ∂x(5y) = 0 since 5y is considered to be a constant.

To find fy we treat x as a constant and consider y to be the only variable:

fy(x, y) = ∂y(3x²y⁷ − 2xy + 5y − 8x³)
= ∂y(3x²y⁷) − ∂y(2xy) + ∂y(5y) − ∂y(8x³)
= 21x²y⁶ − 2x + 5,

where ∂y(8x³) = 0. ■
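The partials found in this example can be sanity-checked against central difference quotients; a small sketch (the test point (1, 1) is an arbitrary choice):

```python
# Numerical sanity check of Example 1.2 at the arbitrary point (1, 1), where
# f_x(1, 1) = 6 - 2 - 24 = -20 and f_y(1, 1) = 21 - 2 + 5 = 24.

def f(x, y):
    return 3 * x**2 * y**7 - 2 * x * y + 5 * y - 8 * x**3

def fx(x, y):
    return 6 * x * y**7 - 2 * y - 24 * x**2   # partial found in the example

def fy(x, y):
    return 21 * x**2 * y**6 - 2 * x + 5       # partial found in the example

h = 1e-6
assert abs((f(1 + h, 1) - f(1 - h, 1)) / (2 * h) - fx(1, 1)) < 1e-4
assert abs((f(1, 1 + h) - f(1, 1 - h)) / (2 * h) - fy(1, 1)) < 1e-4
```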


Example 1.3. Given

f(x, y, z) = (sin xy − ln yz)/(x² + y³ + z⁴),

find fx, fy, and fz.

Solution. To find fx we treat y and z as constants and consider x to be the only variable:

fx(x, y, z) = [(x² + y³ + z⁴)(y cos xy) − 2x(sin xy − ln yz)] / (x² + y³ + z⁴)².

To find fy we treat x and z as constants:

fy(x, y, z) = [(x² + y³ + z⁴)(x cos xy − (1/yz) · z) − 3y²(sin xy − ln yz)] / (x² + y³ + z⁴)²
= [(x² + y³ + z⁴)(xy cos xy − 1) − 3y³(sin xy − ln yz)] / [y(x² + y³ + z⁴)²].

Finally, to find fz we treat x and y as constants:

fz(x, y, z) = [(x² + y³ + z⁴)(−(1/yz) · y) − 4z³(sin xy − ln yz)] / (x² + y³ + z⁴)²
= [(x² + y³ + z⁴)(−1/z) + 4z³(ln yz − sin xy)] / (x² + y³ + z⁴)²
= [4z⁴(ln yz − sin xy) − (x² + y³ + z⁴)] / [z(x² + y³ + z⁴)²]. ■

We close this section by giving a few other theorems, without proof, that will be needed later on. The proof of the first theorem, however, may be found in §5.5 of [CAL], and it is only a minor variant of the Fundamental Theorem of Calculus.

Theorem 1.4. If ϕ is continuous on [a, b] and x0 ∈ [a, b], then the function Φ : [a, b] → R given by

Φ(x) = ∫_{x0}^{x} ϕ(t) dt

is differentiable on [a, b], with Φ′(x) = ϕ(x) for each x ∈ [a, b].
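Theorem 1.4 can be illustrated numerically: approximate Φ by a quadrature rule and check that a difference quotient of Φ recovers ϕ. A sketch with the illustrative choices ϕ(t) = cos t and x0 = 0:

```python
import math

# Theorem 1.4 in action: build Phi(x) as a trapezoid-rule approximation of
# the integral of phi from x0 to x, then check that a difference quotient of
# Phi recovers phi.  (phi = cos and x0 = 0 are illustrative choices.)

def Phi(x, phi=math.cos, x0=0.0, n=10_000):
    """Trapezoid-rule approximation of the integral of phi from x0 to x."""
    h = (x - x0) / n
    total = 0.5 * (phi(x0) + phi(x)) + sum(phi(x0 + k * h) for k in range(1, n))
    return total * h

# Phi'(1) should equal phi(1) = cos(1); compare with a central difference.
d = 1e-4
approx = (Phi(1.0 + d) - Phi(1.0 - d)) / (2 * d)
print(approx, math.cos(1.0))  # nearly equal
```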

The integral in Theorem 1.4, even when it exists, cannot always be evaluated in terms of an elementary function. The set of elementary functions encompasses all polynomial, rational, trigonometric, inverse trigonometric, logarithmic, and exponential functions, as well as functions that can be constructed from these via a finite sequence of arithmetic operations (addition, subtraction, multiplication, division) and root extractions. An example would be the rather innocuous-looking integral ∫_0^x e^{t²} dt for any x > 0.

Theorem 1.5 (Clairaut’s Theorem). Let U ⊆ R² be an open set and f(x, y) a real-valued function on U. If fxy and fyx are continuous on U, then fxy = fyx on U.
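Clairaut’s Theorem is easy to probe numerically with nested difference quotients; a sketch for an illustrative smooth function (the function and the test point are not from the text):

```python
import math

# Numerical probe of Clairaut's Theorem: nested central differences
# approximating f_xy and f_yx should agree for a smooth f.

def f(x, y):
    return math.exp(x * y) + x * y**3   # illustrative smooth function

def mixed(f, x, y, order, h=1e-4):
    """Approximate f_xy (order='xy') or f_yx (order='yx') at (x, y)."""
    if order == "xy":   # differentiate in x first, then in y
        g = lambda yy: (f(x + h, yy) - f(x - h, yy)) / (2 * h)
        return (g(y + h) - g(y - h)) / (2 * h)
    # otherwise differentiate in y first, then in x
    g = lambda xx: (f(xx, y + h) - f(xx, y - h)) / (2 * h)
    return (g(x + h) - g(x - h)) / (2 * h)

print(mixed(f, 0.5, 0.25, "xy"), mixed(f, 0.5, 0.25, "yx"))  # nearly equal
```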


Theorem 1.6 (Leibniz’s Integral Rule). If f and fy are continuous on [a, b] × [c, d] and F : [c, d] → R is defined by

F(y) = ∫_a^b f(x, y) dx,

then F′(y) exists on [c, d] and is given by

F′(y) = ∫_a^b fy(x, y) dx.

In the Leibniz notation the conclusion of Leibniz’s Integral Rule is written as

(d/dy) ∫_a^b f(x, y) dx = ∫_a^b (∂f/∂y)(x, y) dx.

If fx is given to be continuous instead of fy, a similar formula holds:

(d/dx) ∫_c^d f(x, y) dy = ∫_c^d (∂f/∂x)(x, y) dy.
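Leibniz’s Integral Rule can likewise be checked numerically by comparing the two sides; a sketch with the illustrative choice f(x, y) = e^{xy} on [a, b] = [0, 1], evaluated at y = 2:

```python
import math

# Numerical check of Leibniz's Integral Rule for f(x, y) = exp(x*y):
#   d/dy of ∫ f(x, y) dx over [0, 1]   should equal   ∫ f_y(x, y) dx.

def integrate(g, a=0.0, b=1.0, n=2_000):
    """Trapezoid rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n))
    return total * h

def F(y):
    return integrate(lambda x: math.exp(x * y))

d = 1e-5
lhs = (F(2.0 + d) - F(2.0 - d)) / (2 * d)          # derivative of the integral
rhs = integrate(lambda x: x * math.exp(2.0 * x))   # integral of f_y at y = 2
print(lhs, rhs)  # nearly equal
```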


1.2 – Linear Differential Operators

Recall from algebra the customary definitions for the scalar multiple of a function and the sum of two functions: if c is a constant and f, g are real- or complex-valued functions of a single variable x, then the new function cf is given by

(cf)(x) = cf(x) (1.2)

for any x in the domain of f, and the function f + g is given by

(f + g)(x) = f(x) + g(x) (1.3)

for any x in the domain of f and g.

An operator is a function L that takes as “input” a function f and returns as “output” another function L[f]. Here we will only be concerned with operators on real- or complex-valued functions of a single real variable. If L is an operator and c any constant, then we define cL to be the operator given by

(cL)[f] = cL[f] (1.4)

for any function f; and if L1 and L2 are two operators, we define L1 + L2 by

(L1 + L2)[f] = L1[f] + L2[f]. (1.5)

To be clear, cL[f] and L1[f] + L2[f] are functions given by

(cL[f])(x) = c · L[f](x)  and  (L1[f] + L2[f])(x) = L1[f](x) + L2[f](x),

in accord with the rules (1.2) and (1.3).

Let y be a function of x, which we typically indicate by writing y = y(x). For each integer n ≥ 1 we define

Dⁿ[y] = y⁽ⁿ⁾,

and also set D⁰y = y. Thus Dⁿ is the “nth derivative operator,” frequently denoted by dⁿ/dxⁿ in calculus. It is common to write Dⁿy instead of Dⁿ[y], so that

Dⁿy(x) = Dⁿ[y](x) = y⁽ⁿ⁾(x)

for any x for which y⁽ⁿ⁾(x) is defined.

Now, for coefficient functions a0(x), . . . , an(x), with an(x) not the zero function, we define the operator

Λ = ∑_{k=0}^{n} ak(x)Dᵏ = an(x)Dⁿ + an−1(x)Dⁿ⁻¹ + · · · + a1(x)D + a0(x), (1.6)

where we hasten to stress that a0(x) here denotes the operator a0(x)D⁰. By the natural extension of (1.5), readily proven by induction, we thus have

Λ[y] = ∑_{k=0}^{n} ak(x)Dᵏ[y] = an(x)y⁽ⁿ⁾ + an−1(x)y⁽ⁿ⁻¹⁾ + · · · + a1(x)y′ + a0(x)y.

Once again it is common to denote Λ[y] more simply by Λy.


Using (1.4) and (1.5), followed by the properties of differentiation established in calculus, we find that

Λ[f + g] = (∑_{k=0}^{n} ak(x)Dᵏ)[f + g]
= ∑_{k=0}^{n} (ak(x)Dᵏ)[f + g]          (by (1.5))
= ∑_{k=0}^{n} ak(x)Dᵏ[f + g]            (by (1.4))
= ∑_{k=0}^{n} ak(x)(Dᵏ[f] + Dᵏ[g])
= ∑_{k=0}^{n} ak(x)Dᵏ[f] + ∑_{k=0}^{n} ak(x)Dᵏ[g] = Λ[f] + Λ[g], (1.7)

for any suitably differentiable functions f and g, and similarly

Λ[cf ] = cΛ[f ] (1.8)

for any constant c (left as an exercise). Since

Λ[0] = Λ[0 + 0] = Λ[0] + Λ[0]

by (1.7), we see that any differential operator applied to the zero function yields again the zero function:

Λ[0] = 0. (1.9)

Any operator having the properties exhibited by (1.7) and (1.8) is said to be linear, and since a differential operator is any operator constructed using the derivative operator D, we naturally designate any operator of the form (1.6) a linear differential operator.¹ An example of a linear differential operator is

x⁶D⁵ − sin(x)D³ + (3x − 5)D + 4/x, (1.10)

so that for any function y for which y⁽⁵⁾ is defined on some open interval of real numbers we have

(x⁶D⁵ − sin(x)D³ + (3x − 5)D + 4/x)y = x⁶D⁵y − sin(x)D³y + (3x − 5)Dy + 4y/x
= x⁶y⁽⁵⁾ − sin(x)y′′′ + (3x − 5)y′ + 4y/x.

An example of a nonlinear differential operator would be something like ln D, given by

(ln D)[y] = ln(Dy) = ln y′,

for in general we have

(ln D)[f + g] = ln[(f + g)′] = ln(f′ + g′) ≠ ln f′ + ln g′ = (ln D)[f] + (ln D)[g].

¹In the interests of brevity we will often still refer to a linear differential operator as a “differential operator” or “operator” if linearity is clear from context.


Given differential operators Λ1 and Λ2, we define the product Λ1Λ2 by

(Λ1Λ2)[y] = Λ1[Λ2y]. (1.11)

In the notation of elementary algebra Λ1Λ2 would be written as Λ1 ◦ Λ2, and so it may appear that we are courting confusion with the definition (1.11). After all, in elementary algebra the convention is that (fg)(x) = f(x)g(x) while (f ◦ g)(x) = f(g(x)). However, in all future developments both theoretical and computational we will never have need of the “product” Λ1[y]Λ2[y], and so the definition (1.11) as it specifically applies to differential operators should never lead to ambiguity.

Remark. From definition (1.11) we see that

(DD)[y] = D[Dy] = D[y′] = y′′ = D²[y],

so that DD = D², and in general a product of n copies of D equals Dⁿ.

The expression at right in (1.6) is the standard form for a linear differential operator. The order of a linear differential operator is defined to equal the order of the highest-order derivative operator Dᵏ present in its standard form. Thus the operator Λ in (1.6) is an nth-order linear differential operator, and the operator (1.10) is 5th-order.

The following theorem establishes that the product of two linear differential operators having constant coefficients is not only commutative, but is also formally carried out in the same manner as taking a product of two polynomials. More is said about this matter in §4.5, when it is put to use to solve certain differential equations.

Theorem 1.7. If

Λ1 = ∑_{k=0}^{n} akDᵏ  and  Λ2 = ∑_{j=0}^{m} bjDʲ

for constants a0, . . . , an, b0, . . . , bm, then

Λ1Λ2 = ∑_{k=0}^{n} ∑_{j=0}^{m} akbjDᵏ⁺ʲ,

and moreover Λ1Λ2 = Λ2Λ1.

Proof. Noting that DᵏDʲ = Dᵏ⁺ʲ = Dʲ⁺ᵏ = DʲDᵏ in general, we have

(Λ1Λ2)[y] = Λ1[Λ2y] = Λ1[∑_{j=0}^{m} bjDʲy] = ∑_{j=0}^{m} bjΛ1[Dʲy]
= ∑_{j=0}^{m} bj(∑_{k=0}^{n} akDᵏ[Dʲy]) = ∑_{j=0}^{m} ∑_{k=0}^{n} akbjDᵏ⁺ʲy
= ∑_{k=0}^{n} ∑_{j=0}^{m} akbjDᵏ⁺ʲy = ∑_{k=0}^{n} ak(∑_{j=0}^{m} bjDʲ[Dᵏy])


= ∑_{k=0}^{n} akΛ2[Dᵏy] = Λ2[∑_{k=0}^{n} akDᵏy] = Λ2[Λ1y] = (Λ2Λ1)[y]

for any function y. ■
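Theorem 1.7 says that multiplying constant-coefficient operators works exactly like multiplying polynomials in D. Storing Λ = ∑ akDᵏ as the coefficient list [a0, a1, . . . , an], the product is the Cauchy product of the lists; a minimal sketch:

```python
# By Theorem 1.7, constant-coefficient operators sum(a_k D^k) multiply like
# polynomials in D.  Store an operator as its ascending coefficient list
# [a_0, a_1, ..., a_n]; the product is then the Cauchy product of the lists.

def operator_product(a, b):
    """Coefficient list of (sum a_k D^k)(sum b_j D^j)."""
    c = [0] * (len(a) + len(b) - 1)
    for k, ak in enumerate(a):
        for j, bj in enumerate(b):
            c[k + j] += ak * bj   # D^k D^j = D^(k+j)
    return c

# (D - 1)(D + 2) = D^2 + D - 2, i.e. coefficients [-2, 1, 1]:
print(operator_product([-1, 1], [2, 1]))  # [-2, 1, 1]
# Commutativity (Theorem 1.7): reversing the factors gives the same result.
print(operator_product([2, 1], [-1, 1]))  # [-2, 1, 1]
```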


1.3 – Ordinary Differential Equations

We begin the study of differential equations with some definitions. Recall that y⁽ⁿ⁾ represents the nth derivative of a function y.

Definition 1.8. Given an integer n ≥ 1 and a function F : D ⊆ Rⁿ⁺² → R, an nth-order ordinary differential equation in y is an equation of the form

F(x, y(x), y′(x), . . . , y⁽ⁿ⁾(x)) = 0, (1.12)

where y is any real-valued function of x that satisfies the equation.

Given a function G : D ⊆ Rⁿ⁺¹ → R, an explicit nth-order ordinary differential equation in y is an nth-order ordinary differential equation in y that can be written in the form

G(x, y(x), y′(x), . . . , y⁽ⁿ⁻¹⁾(x)) = y⁽ⁿ⁾(x). (1.13)

What it means for a function y to “satisfy” an ordinary differential equation (ODE) will be defined in precise terms in the next section. The ODE (1.12) is defined by a function F that takes n + 2 real numbers as inputs: x, y(x), y′(x), . . . , y⁽ⁿ⁾(x). The variable x is called the independent variable of the ODE, and though y, y′, . . . , y⁽ⁿ⁾ all depend on x (i.e. are functions of x), it’s customary to refer to only y as the dependent variable of the ODE. The ODE (1.13), in contrast, is defined by a function G that takes n + 1 real number inputs: x, y(x), y′(x), . . . , y⁽ⁿ⁻¹⁾(x). Once again x is the independent variable and y the dependent variable. Employing the common practice of letting the symbol for a function double as the symbol for the function’s numerical output, so that y = y(x), y′ = y′(x), and so on, we can rewrite equations (1.12) and (1.13) more simply as

F(x, y, y′, . . . , y⁽ⁿ⁾) = 0

and

G(x, y, y′, . . . , y⁽ⁿ⁻¹⁾) = y⁽ⁿ⁾.

If n = 2 then F is a function of x, y, y′, and y′′, so for example we could have

F(x, y, y′, y′′) = 4y′′ − 3x²y′ + √xy − 2x

and thereby obtain the 2nd-order ODE

4y′′ − 3x²y′ + √xy − 2x = 0. (1.14)

The true “variable of interest” in an ODE is y—which actually is a function—and not the independent variable x as one might expect. What hope do we have that there actually exists a function of x which, when substituted for y in equation (1.12), will in fact satisfy the equation? If such a function does exist, how do we find it? And if we find one such function, might there be others? Finding answers to these questions is precisely what the formal study of differential equations is all about.


1.4 – Explicit and Implicit Solutions

Unlike an algebraic equation, which has a solution set consisting of one or more numbers, the solution set of a differential equation is in general a family of functions. However for practical purposes we are only interested in functions ϕ of a variable x that satisfy a differential equation for all x in an open interval I ⊆ R. This is what motivates the following definition, which finally clarifies what it means for a function ϕ to “satisfy” an nth-order ODE in y.

Definition 1.9. A function ϕ is an explicit solution to F(x, y, y′, . . . , y⁽ⁿ⁾) = 0 if there exists an open interval I ⊆ Dom(ϕ) such that

F(x, ϕ(x), ϕ′(x), . . . , ϕ⁽ⁿ⁾(x)) = 0

for all x ∈ I.

Thus an explicit solution to an ODE F(x, y, y′, . . . , y⁽ⁿ⁾) = 0 is a function, not a number. Specifically we seek a function ϕ which, when substituted for y in the ODE, has the effect of satisfying the equation for all x in some open interval I, and not just at isolated values of x!

Example 1.10. Is the function ϕ(x) = x² − x⁻¹ an explicit solution to x²y′′ − 2y = 0? To find out, first substitute ϕ(x) for y in the left-hand side of the differential equation to obtain

x²ϕ′′(x) − 2ϕ(x) = 0. (1.15)

Now, since

ϕ′′(x) = 2 − 2x⁻³,

equation (1.15) is found to be equivalent to

x²(2 − 2x⁻³) − 2(x² − x⁻¹) = 0,

from which we obtain

2x² − 2x⁻¹ − 2x² + 2x⁻¹ = 0.

It is seen that 0 = 0 results for any x ∈ (−∞, 0) ∪ (0, ∞). That is, ϕ is an explicit solution to the ODE on the interval (0, ∞), and also on the interval (−∞, 0). ■
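The conclusion of this example can also be spot-checked numerically: the residual x²ϕ′′(x) − 2ϕ(x) should vanish, up to discretization error, at sample points away from x = 0. A sketch using a central second-difference quotient (the sample points are arbitrary choices):

```python
# Spot-check of Example 1.10: phi(x) = x**2 - 1/x should make the residual
# x**2 * phi'' - 2 * phi vanish at any sample point away from x = 0.

def phi(x):
    return x**2 - 1.0 / x

def second_derivative(g, x, h=1e-4):
    """Central second-difference approximation of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2

for x in (0.5, 1.0, 2.0, -1.5):
    residual = x**2 * second_derivative(phi, x) - 2 * phi(x)
    print(x, residual)  # each residual is near zero
```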

Example 1.11. Verify that ϕ(x) = c1eˣ + c2e⁻²ˣ is a solution to y′′ + y′ − 2y = 0 for any choice of constants c1 and c2.

Solution. Substitute ϕ(x) for y in the ODE to obtain ϕ′′(x) + ϕ′(x) − 2ϕ(x) = 0. Since

ϕ′(x) = c1eˣ − 2c2e⁻²ˣ

and

ϕ′′(x) = c1eˣ + 4c2e⁻²ˣ,

from ϕ′′(x) + ϕ′(x) − 2ϕ(x) = 0 we obtain

(c1eˣ + 4c2e⁻²ˣ) + (c1eˣ − 2c2e⁻²ˣ) − 2(c1eˣ + c2e⁻²ˣ) = 0.


Rearranging gives

(c1eˣ + c1eˣ − 2c1eˣ) + (4c2e⁻²ˣ − 2c2e⁻²ˣ − 2c2e⁻²ˣ) = 0,

which yields 0 = 0 for all x ∈ R.

Therefore ϕ(x) = c1eˣ + c2e⁻²ˣ is a solution to y′′ + y′ − 2y = 0 for all x ∈ (−∞, ∞). ■

Recall that a relation is any set of ordered pairs. Often a relation is defined by an algebraic equation featuring two variables x and y, with the set of ordered pairs being taken to be the associated solution set of the equation. For instance the equation x² + y² = 4 is the relation consisting of the ordered pairs corresponding to the points in R² that lie on a circle of radius 2 centered at the origin. If we let f(x, y) = x² + y² − 4, then the relation can be expressed simply as f(x, y) = 0. Now, it is a fact that this relation implicitly defines at least two functions of x which, it so happens, can be made explicit by solving x² + y² = 4 for y to obtain

y = ±√(4 − x²).

Therefore one function of x which f(x, y) = 0 defines is ϕ1(x) = √(4 − x²), and the other is ϕ2(x) = −√(4 − x²).

Now let

f(x, y) = x³ + y³ − 2xy

and consider the relation f(x, y) = 0. The ordered pairs belonging to the relation must satisfy the equation x³ + y³ = 2xy, and the question arises: does this relation implicitly define y as a function of x on a given interval I? Solving the equation for y is not so easy this time, but the following theorem oftentimes renders this unnecessary. In the theorem the symbol C′(U) represents the collection of all continuous functions U → R that have continuous first-order partial derivatives.

Theorem 1.12 (Implicit Function Theorem). Let U ⊆ R² be open, and suppose f ∈ C′(U) is such that f(x0, y0) = 0 at (x0, y0) ∈ U.

1. If fy(x0, y0) ≠ 0, then there exists an open interval I ⊆ R and a function ϕ ∈ C′(I) such that x0 ∈ I, ϕ(x0) = y0, and f(x, ϕ(x)) = 0 for all x ∈ I.

2. If fx(x0, y0) ≠ 0, then there exists an open interval I ⊆ R and a function ψ ∈ C′(I) such that y0 ∈ I, ψ(y0) = x0, and f(ψ(y), y) = 0 for all y ∈ I.

In part (1) it’s understood that I is sufficiently small so that (x, ϕ(x)) ∈ U for all x ∈ I, and similarly in part (2) I is such that (ψ(y), y) ∈ U for all y ∈ I. The theorem is so named because it provides a means for determining when an equation of the form f(x, y) = 0 implicitly defines either y as a function of x or x as a function of y.

Remark. In Theorem 1.12 it is not necessary to have f(x0, y0) = 0 specifically. If f(x0, y0) = c and fy(x0, y0) ≠ 0 for some c ≠ 0, we can define g(x, y) = f(x, y) − c so that g(x0, y0) = 0 and gy(x0, y0) = fy(x0, y0) ≠ 0. It follows that there exists an open interval I ⊆ R and a function ϕ ∈ C′(I) such that x0 ∈ I, ϕ(x0) = y0, and g(x, ϕ(x)) = 0 for all x ∈ I. From this we conclude that f(x, ϕ(x)) = c for all x ∈ I, which is to say the relation given by f(x, y) = c defines y as a function of x for all x in a sufficiently small neighborhood of x0.


Example 1.13. Consider the relation R defined by the equation x³ + y³ = 2xy. On what interval for x can we expect R to implicitly define y as a function of x? To help clarify matters let f be the function given by f(x, y) = x³ + y³ − 2xy, and note that R can now be expressed as f(x, y) = 0. Certainly f is of class C′ on R², with fy(x, y) = 3y² − 2x in particular. Now, since

f(1, 1) = 1³ + 1³ − 2(1)(1) = 0

and

fy(1, 1) = 3(1)² − 2(1) = 1 ≠ 0,

the Implicit Function Theorem implies there is an open interval I ⊆ R containing 1, and a function ϕ : I → R that is of class C′ on I, such that ϕ(1) = 1 and f(x, ϕ(x)) = 0 for all x ∈ I. That is, setting y = ϕ(x) will satisfy the equation f(x, y) = 0 for all x ∈ I, and therefore R implicitly defines y as a function of x in a neighborhood of x = 1. ■
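Although the Implicit Function Theorem only asserts that ϕ exists, values of ϕ near x = 1 can be computed pointwise by applying Newton’s method to y ↦ f(x, y); the starting guess and step count below are illustrative choices, not from the text:

```python
# Computing the function phi of Example 1.13 pointwise: for x near 1, solve
# f(x, y) = x**3 + y**3 - 2*x*y = 0 for y by Newton's method.

def phi(x, y0=1.0, steps=50):
    y = y0
    for _ in range(steps):
        f = x**3 + y**3 - 2 * x * y
        fy = 3 * y**2 - 2 * x      # the partial f_y computed in the example
        y -= f / fy                # Newton update
    return y

print(phi(1.0))   # 1.0, since f(1, 1) = 0
print(phi(1.05))  # value of the implicitly defined branch near x = 1
```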

We will be interested in determining whether a relation G(x, y) = 0 implicitly defines a function which satisfies an ordinary differential equation.

Definition 1.14. A relation G(x, y) = 0 is an implicit solution to F(x, y, y′, . . . , y⁽ⁿ⁾) = 0 if it implicitly defines at least one function ϕ(x) that is an explicit solution to the ODE.

Example 1.15. Show that the relation

x² = 1 + sin(x + y)

is an implicit solution to y′ = 2x sec(x + y) − 1.

Solution. If we let

G(x, y) = x² − sin(x + y) − 1,

then the relation can be expressed as G(x, y) = 0. It is necessary to show that this relation expresses y as a function of x on at least one interval I of values for x, which is to say there is a function ϕ such that y = ϕ(x) for all x ∈ I. Toward this end, first notice that G ∈ C′(R²) and

G(1, −1) = 1² − sin(1 − 1) − 1 = 1 − sin(0) − 1 = 0.

Now, from Gy(x, y) = −cos(x + y) we obtain Gy(1, −1) = −cos(0) = −1 ≠ 0, and thus by the Implicit Function Theorem there is some open interval I containing x = 1, and a function ϕ : I → R, such that G(x, ϕ(x)) = 0 for all x ∈ I. That is, the relation G(x, y) = 0 defines y = ϕ(x) for x ∈ I, which is to say

x² = 1 + sin(x + ϕ(x))

is satisfied for all x ∈ I, where we have substituted ϕ(x) for y in the equation x² = 1 + sin(x + y).

Since the functions x² and 1 + sin(x + ϕ(x)) are equal on I, it follows that their derivatives are also equal on I:

(x²)′ = [1 + sin(x + ϕ(x))]′  ⇒  2x = cos(x + ϕ(x)) · (x + ϕ(x))′
⇒  2x = cos(x + ϕ(x)) · (1 + ϕ′(x))

Page 18: Differential Equations - faculty.bucks.edufaculty.bucks.edu/erickson/math250/DifferentialEquations.pdf2 solution would be some real number cwhich, when substituted for xin the equation,

15

⇒ ϕ′(x) = 2x sec(x+ ϕ(x))− 1.

Hence ϕ is a function such that ϕ′(x) = 2x sec(x+ ϕ(x))− 1 for all x ∈ I, which shows that ϕis an explicit solution to y′ = 2x sec(x+ y)− 1 and therefore x2 = 1 + sin(x+ y) is an implicitsolution to the ODE. �

Definition 1.16. An initial value problem is an ODE

    F(x, y, y′, . . . , y^(n)) = 0

together with initial conditions

    y(x₀) = y₀,  y′(x₀) = y₁,  . . . ,  y^(n−1)(x₀) = yₙ₋₁,

where x₀, y₀, y₁, . . . , yₙ₋₁ are given constants.

A solution to an initial value problem is a solution ϕ : I → R to the ODE such that x₀ ∈ I and ϕ satisfies the initial conditions.

To be clear, to say ϕ satisfies the initial conditions means that

    ϕ(x₀) = y₀,  ϕ′(x₀) = y₁,  . . . ,  ϕ^(n−1)(x₀) = yₙ₋₁.

It's understood that the set I in Definition 1.16 is an open interval, as required by Definitions 1.9 and 1.14.

Example 1.17. Find a solution to the initial value problem

    y″ + y′ − 2y = 0,  y(1) = 1,  y′(1) = 0.

Solution. In Example 1.11 it was found that

    ϕ(x) = c₁e^x + c₂e^(−2x)

is a solution to the ODE y″ + y′ − 2y = 0. What remains to do, then, is to find values for c₁ and c₂ so that ϕ satisfies the initial conditions, if possible. That is, c₁ and c₂ must be determined so that ϕ(1) = 1 and ϕ′(1) = 0. From ϕ(1) = 1 we have

    c₁e + c₂e^(−2) = 1,  (1.16)

and from ϕ′(1) = 0 we have

    c₁e − 2c₂e^(−2) = 0.  (1.17)

Now, (1.16) and (1.17) imply that

    c₁e = 1 − c₂e^(−2)  and  c₁e = 2c₂e^(−2),

respectively, and thus 1 − c₂e^(−2) = 2c₂e^(−2). From this comes 3c₂e^(−2) = 1, and finally c₂ = e^2/3. Putting this result into (1.16) then gives

    c₁e + (e^2/3)e^(−2) = 1,

whence we obtain c₁ = 2/(3e). Therefore

    ϕ(x) = (2/(3e))e^x + (e^2/3)e^(−2x) = (2/3)e^(x−1) + (1/3)e^(−2x+2)

is a solution to the IVP. ∎
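As a sanity check, a solution like this can always be verified numerically. The sketch below (illustrative code, not part of the text; the names `phi`, `dphi`, `d2phi` are our own) writes out ϕ and its first two derivatives by hand and confirms the initial conditions and the ODE:

```python
import math

# The candidate solution phi(x) = (2/3)e^(x-1) + (1/3)e^(-2x+2)
# and its first two derivatives, computed by hand.
def phi(x):   return (2/3)*math.exp(x - 1) + (1/3)*math.exp(-2*x + 2)
def dphi(x):  return (2/3)*math.exp(x - 1) - (2/3)*math.exp(-2*x + 2)
def d2phi(x): return (2/3)*math.exp(x - 1) + (4/3)*math.exp(-2*x + 2)

# Initial conditions y(1) = 1 and y'(1) = 0.
assert abs(phi(1) - 1) < 1e-12
assert abs(dphi(1)) < 1e-12

# The ODE y'' + y' - 2y = 0 holds at several sample points.
for x in [-1.0, 0.0, 0.5, 2.0]:
    assert abs(d2phi(x) + dphi(x) - 2*phi(x)) < 1e-9
print("phi satisfies the IVP")
```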

Not every initial value problem has a solution, and even if there is a solution it is not always clear that there cannot be other solutions. The following theorem, however, is useful for determining when a first-order IVP has a unique solution.

Theorem 1.18 (Existence-Uniqueness Theorem). Given the initial value problem y′ = f(x, y), y(x₀) = y₀, if f and f_y are continuous on some open set U containing (x₀, y₀), then the IVP has a unique solution ϕ : I → R on some open interval I containing x₀.

So, given the IVP y′ = f(x, y), y(x₀) = y₀, it is a fact that if the functions f and f_y are both continuous on some open set U with (x₀, y₀) ∈ U, then there exists a unique solution to the IVP of the form ϕ : I → R, where I = (x₀ − δ, x₀ + δ) for some sufficiently small δ > 0. The proof of this so-called Existence-Uniqueness Theorem will be a long time in coming, but for the moment it will suffice to see it put into practice.

Example 1.19. Determine whether the initial value problem

    y′ + cos y = sin x,  y(π) = 0,

has a unique solution.

Solution. First we rewrite the ODE as

    y′ = sin x − cos y,

so that the function f in Theorem 1.18 is f(x, y) = sin x − cos y. Now, since f_y(x, y) = sin y, it is clear that both f and f_y are continuous everywhere on R^2. That is, we can take the set U in Theorem 1.18 to be R^2 itself. From the initial condition y(π) = 0 we have x₀ = π and y₀ = 0, so (x₀, y₀) = (π, 0) ∈ R^2. The hypotheses of Theorem 1.18 are all satisfied, and therefore it can be concluded that there does indeed exist a unique solution to the IVP. More specifically, the IVP has a unique solution of the form ϕ : I → R, where I is some open interval containing π. ∎


1.5 – Direction Fields

A first-order ordinary differential equation of the form y′ = f(x, y) specifies a value for y′ at each point (x, y) ∈ R^2 where f(x, y) is defined. The value is most naturally interpreted to be the slope of a solution curve y = ϕ(x) for the ODE that passes through the point (x, y). A direction field is a plot of line segments of identical length drawn at regularly spaced points in some rectangle R in the xy-plane, each line segment with midpoint at (x, y) having slope given by f(x, y).
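The slope data behind a direction field can be tabulated directly. The minimal sketch below (illustrative code, not part of the text) evaluates f(x, y) = x^2 − y, the right-hand side of the ODE in Example 1.20, on a small integer grid; drawing the actual line segments is left to a graphics library:

```python
# Tabulate the slope f(x, y) = x^2 - y at regularly spaced grid points.
def f(x, y):
    return x**2 - y

grid = [(x, y, f(x, y)) for x in range(-2, 3) for y in range(-2, 3)]

# A solution curve through (0, 0) must be horizontal there...
assert f(0, 0) == 0
# ...and one through (1, 0) must have slope 1 there.
assert f(1, 0) == 1
print(grid[:3])
```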

Example 1.20. Consider the ODE y′ = x^2 − y, a portion of whose direction field is given at left in Figure 2. The general solution to y′ = x^2 − y is the one-parameter family of functions

    ϕ(x) = x^2 − 2x + 2 + ce^(−x).  (1.18)

At (0, 0) we find that y′ = 0^2 − 0 = 0, indicating that a solution curve to the ODE which passes through (0, 0) must have a slope of 0 there. So in particular the solution to the initial value problem y′ = x^2 − y, y(0) = 0, which will be unique, must be a function ϕ : I → R whose graph is a curve that has a horizontal tangent line at (0, 0). See the red curve at right in Figure 2, which is the curve that contains the point (0, 0). From (1.18) and the initial condition y(0) = 0 we can solve for c and determine that the curve is given by

    ϕ(x) = x^2 − 2x + 2 − 2e^(−x).

At (1, 0) we obtain y′ = 1^2 − 0 = 1, so a solution curve to the ODE which passes through (1, 0) must have a slope of 1 there. In particular any solution to the initial value problem y′ = x^2 − y, y(1) = 0, which will be unique, must be a function whose graph is a curve that has a tangent line that makes a 45° angle with the positive x-axis. See the green curve at right in Figure 2, which is the curve that contains the point (1, 0). From (1.18) and the initial condition y(1) = 0 we can solve for c and determine that the curve is given by

    ϕ(x) = x^2 − 2x + 2 − e^(−x+1).

In §2.3 we will develop a technique to solve y′ = x^2 − y and obtain (1.18). ∎

Figure 2. The direction field for y′ = x^2 − y, along with some solution curves.

Figure 3. The direction field for p′ = 3p − 2p^2, along with solution curves given initial conditions p(0) = 3, p(0) = 0.5, and p(0) = 0.001.

Example 1.21. The logistic equation for the population p(t) (in thousands) of a certain species at time t (in years) is given to be p′ = 3p − 2p^2, which has the direction field shown at left in Figure 3. At right in the figure are graphs of the unique solutions that result when the initial conditions p(0) = 3, p(0) = 0.5, and p(0) = 0.001 are given. These graphs can be sketched using the direction field simply by starting at the points (0, 3), (0, 0.5), and (0, 0.001), and drawing curves that are approximately parallel to nearby direction markers.

Analyzing the sketched solution curves, it can be seen that if the initial population is 3000 (i.e. p(0) = 3), then the limiting population is

    lim_(t→∞) p(t) = 1.5,

or 1500; and if the initial population is 500 (i.e. p(0) = 0.5), then the limiting population is

    lim_(t→∞) p(t) = 1.5,

or 1500 once more.

Over time every population greater than zero will trend toward 1500, according to the model. In fact even if the initial condition is p(0) = 0.001 (i.e. a population of 1 at time t = 0), it can be seen at right in Figure 3 that the population will still grow to virtually 1500 by time t = 5 years! ∎
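The limiting behavior just described can be checked numerically. Below is a small sketch (illustrative code, not part of the text; `euler` is our own helper name) that integrates p′ = 3p − 2p^2 with fixed Euler steps, the method introduced in §1.6, from each of the three initial conditions; every trajectory approaches the equilibrium p = 3/2:

```python
def euler(p0, f, h=0.001, t_end=10.0):
    """Integrate p' = f(p) from p(0) = p0 with fixed Euler steps."""
    p = p0
    for _ in range(int(t_end / h)):
        p += h * f(p)
    return p

f = lambda p: 3*p - 2*p**2   # logistic right-hand side

for p0 in (3, 0.5, 0.001):
    p_final = euler(p0, f)
    # every positive initial population tends to the equilibrium 3/2
    assert abs(p_final - 1.5) < 1e-3, p_final
print("all trajectories approach p = 1.5")
```

Setting 3p − 2p^2 = 0 gives the equilibria p = 0 and p = 3/2 directly, which is why 1.5 appears as the limit regardless of the positive starting value.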


1.6 – The Euler Approximation Method

The Euler Approximation Method (or simply Euler's Method) is an iterative numerical algorithm for approximating the solution curve y = ϕ(x) for an initial value problem y′ = f(x, y), y(x₀) = y₀. It starts with a step size h and utilizes two recursive formulas,

    xₙ₊₁ = xₙ + h  (1.19)

and

    yₙ₊₁ = yₙ + hf(xₙ, yₙ),  (1.20)

where n = 0, 1, 2, 3, . . .. The general idea is to use a series of line segments, formed as linear interpolations between points and connected one to another, to obtain a polygonal path that starts at the initial point (x₀, y₀) given by the initial condition. The procedure will first be illustrated by example.

Example 1.22. Use the Euler Approximation Method with step size h = 0.2 to approximate the solution to the initial value problem y′ = 2x + y, y(0) = 0.

Solution. Here f(x, y) = 2x + y. Given the initial condition y(0) = 0, it's known that the solution to the IVP must generate a curve that contains the point (0, 0). Thus we have x₀ = 0 and y₀ = 0, and so setting n = 0 in equations (1.19) and (1.20) we obtain

    x₁ = x₀ + h = 0 + 0.2 = 0.2
    y₁ = y₀ + hf(x₀, y₀) = 0 + 0.2f(0, 0) = 0.2[2(0) + 0] = 0,

yielding the point (x₁, y₁) = (0.2, 0). Setting n = 1 in (1.19) and (1.20) gives

    x₂ = x₁ + h = 0.2 + 0.2 = 0.4
    y₂ = y₁ + hf(x₁, y₁) = 0 + 0.2f(0.2, 0) = 0.2[2(0.2) + 0] = 0.08,

yielding the point (x₂, y₂) = (0.4, 0.08).

Figure 4. The polygonal Euler approximation ℓ together with the actual solution curve ϕ.

Next, set n = 2 in (1.19) and (1.20) to get

    x₃ = x₂ + h = 0.4 + 0.2 = 0.6
    y₃ = y₂ + hf(x₂, y₂) = 0.08 + 0.2f(0.4, 0.08) = 0.08 + 0.2[2(0.4) + 0.08] = 0.256,

yielding the point (x₃, y₃) = (0.6, 0.256). Next, set n = 3 in (1.19) and (1.20) to get

    x₄ = x₃ + h = 0.6 + 0.2 = 0.8
    y₄ = y₃ + hf(x₃, y₃) = 0.256 + 0.2[2(0.6) + 0.256] = 0.5472,

yielding the point (x₄, y₄) = (0.8, 0.5472). Continuing in this fashion we also obtain the points

    (x₅, y₅) = (1.0, 0.9766)
    (x₆, y₆) = (1.2, 1.5720)
    (x₇, y₇) = (1.4, 2.3664)
    (x₈, y₈) = (1.6, 3.3996)
    (x₉, y₉) = (1.8, 4.7196)
    (x₁₀, y₁₀) = (2.0, 6.3835),

where y values have been rounded to four decimal places.

For n = 0, 1, . . . , 9, let ℓₙ be the line segment in R^2 that has (xₙ, yₙ) and (xₙ₊₁, yₙ₊₁) as its endpoints, so ℓ₀ is the line segment from (0, 0) to (0.2, 0), ℓ₁ is the line segment from (0.2, 0) to (0.4, 0.08), and so on. The union ℓ of all these line segments,

    ℓ = ℓ₀ ∪ ℓ₁ ∪ · · · ∪ ℓ₉,

forms a polygonal path in R^2 that serves as an approximation of the actual solution curve for the IVP. The techniques of the next chapter will enable us to determine that the actual solution curve is given by ϕ(x) = 2e^x − 2x − 2. A portion of the graphs of both ℓ and ϕ is shown in Figure 4, where it can be seen that the curve ℓ provides a passable approximation of ϕ near the initial point (0, 0). ∎
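The computations above are easily automated. The following sketch (illustrative code, not part of the text; `euler` is our own helper name) implements the recursion (1.19)–(1.20) and reproduces the table for y′ = 2x + y, y(0) = 0 with h = 0.2:

```python
def euler(f, x0, y0, h, n_steps):
    """Euler's method: x_{n+1} = x_n + h, y_{n+1} = y_n + h*f(x_n, y_n)."""
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(xs[-1] + h)
    return xs, ys

xs, ys = euler(lambda x, y: 2*x + y, x0=0.0, y0=0.0, h=0.2, n_steps=10)

assert abs(ys[2] - 0.08) < 1e-12     # (x2, y2) = (0.4, 0.08)
assert abs(ys[4] - 0.5472) < 1e-12   # (x4, y4) = (0.8, 0.5472)
assert abs(ys[10] - 6.3835) < 5e-5   # (x10, y10) = (2.0, 6.3835) after rounding
print(list(zip(xs, ys)))
```

Since the exact solution is ϕ(x) = 2e^x − 2x − 2, we have ϕ(2) = 2e^2 − 6 ≈ 8.778; comparing this with the Euler estimate 6.3835 at x = 2 shows how the error grows away from the initial point.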


2  First-Order Equations

2.1 – Introduction

Let D ⊆ R^3 be an open set. A first-order ordinary differential equation in y is an equation of the form

    F(x, y, y′) = 0  (2.1)

for some F : D → R, where y is any real-valued function of x for which the domain of y′ is nonempty. We say y is a solution to (2.1) on an interval I ⊆ R if

    F(x, y(x), y′(x)) = 0

holds for all x ∈ I. The equation (2.1) becomes an explicit first-order ordinary differential equation in y if it can be put in the form

    y′ = f(x, y),

meaning it is possible to isolate y′.

Throughout this chapter we will be developing techniques to solve many kinds of first-order differential equations, both linear and nonlinear, that arise in applications. Here we shall consider the simplest differential equation, which has the form

    y′ = f(x).  (2.2)

Such an equation is solved by direct integration to obtain

    y = ∫ f(x) dx + c,

where here the symbol ∫ f(x) dx may be taken to represent a particular antiderivative of the function f, and c ∈ R is an arbitrary constant.

Suppose F is an antiderivative of f on an interval I. The claim is that the general solution to (2.2) on I (i.e. the set of all solutions on I) is the set

    S = {F + c : c ∈ R}.


The verification is straightforward. If y ∈ S, then y = F + c on I for some c ∈ R, so that

    y′(x) = [F(x) + c]′ = F′(x) = f(x)

for all x ∈ I, and hence y is a solution to (2.2) on I. On the other hand, suppose y is a solution to (2.2) on I. Then y′(x) = f(x) for all x ∈ I, which shows that y is an antiderivative of f on I, and since F is another antiderivative of f on I, the functions y and F must differ by a constant; that is, there exists c ∈ R such that y − F = c on I (this is a known calculus result). Therefore y = F + c, and we conclude that y ∈ S.

Example 2.1. Find the general solution to

    (x^2 + 1)(y′)^3 = 8000x^9.

Solution. Solving for y′ gives

    y′ = 20x^3 / (x^2 + 1)^(1/3),

and so

    y = ∫ [20x^3 / (x^2 + 1)^(1/3)] dx = 10 ∫ [x^2 / (x^2 + 1)^(1/3)] · 2x dx.

Let u = x^2 + 1, so x^2 = u − 1 and we replace 2x dx with du to obtain

    y = 10 ∫ [(u − 1)/u^(1/3)] du = 10 ∫ (u^(2/3) − u^(−1/3)) du = 6u^(5/3) − 15u^(2/3) + c
      = 6(x^2 + 1)^(5/3) − 15(x^2 + 1)^(2/3) + c

for arbitrary c ∈ R. ∎
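As a quick check (illustrative code, not part of the text), one can verify numerically that the derivative of the antiderivative found above matches the isolated form of y′:

```python
def y(x, c=0.0):
    """Candidate general solution 6(x^2+1)^(5/3) - 15(x^2+1)^(2/3) + c."""
    u = x**2 + 1
    return 6*u**(5/3) - 15*u**(2/3) + c

def yprime(x):
    """The right-hand side after isolating y': 20x^3 / (x^2+1)^(1/3)."""
    return 20*x**3 / (x**2 + 1)**(1/3)

# Compare a central-difference derivative of y with yprime.
h = 1e-6
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    dydx = (y(x + h) - y(x - h)) / (2*h)
    assert abs(dydx - yprime(x)) < 1e-4, (x, dydx, yprime(x))
print("derivative check passed")
```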

Example 2.2. The velocity of an object at time t is given by v(t) = te^(−t). Find the position x of the object at time t, given that x(0) = 4.

Solution. It is known that, if the velocity function v of an object is continuous, then the position function x will be an antiderivative of v; that is, x′(t) = v(t), and so here

    x′(t) = te^(−t).

Hence

    x(t) = ∫ te^(−t) dt = −(t + 1)e^(−t) + c  (2.3)

for some c ∈ R. We are given that x(0) = 4, whereas (2.3) indicates that x(0) = −1 + c, and so c = 5. Therefore

    x(t) = −(t + 1)e^(−t) + 5

is the position at time t. ∎
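A numerical sanity check of this antiderivative and initial condition (illustrative code, not part of the text):

```python
import math

def v(t): return t * math.exp(-t)            # given velocity
def x(t): return -(t + 1) * math.exp(-t) + 5 # position found above

assert abs(x(0) - 4) < 1e-12       # initial condition x(0) = 4

h = 1e-6
for t in [0.5, 1.0, 3.0]:
    dxdt = (x(t + h) - x(t - h)) / (2*h)
    assert abs(dxdt - v(t)) < 1e-6  # x'(t) = v(t)
print("position function verified")
```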


2.2 – Separable Equations

Definition 2.3. An explicit 1st-order ODE y′ = f(x, y) is separable if there exist functions g(x) and p(y) such that f(x, y) = g(x)p(y) for all (x, y) ∈ Dom(f).

If we define h(y) = 1/p(y), then a separable equation can just as well be written as

    y′ = g(x)/h(y),

which turns out to be convenient as we go forward. Separable equations can be solved with relative ease by what's known as the Method of Separation of Variables. The theoretical underpinnings of the method are supplied by the following theorem.

Theorem 2.4. Let c be an arbitrary constant. If the functions g(x) and h(y) have antiderivatives G(x) and H(y), respectively, and H(y) = G(x) + c implicitly defines y as a function of x, then H(y) = G(x) + c is an implicit solution to the separable equation y′ = g(x)/h(y).

Proof. Suppose G and H are antiderivatives of g and h, respectively, and also assume that H(y) = G(x) + c implicitly defines a function ϕ(x) = y on some open interval I, so that

    H(ϕ(x)) = G(x) + c  (2.4)

is satisfied for all x ∈ I. Applying implicit differentiation to (2.4) gives

    (H ∘ ϕ)′(x) = G′(x),

whereupon the Chain Rule leads to

    H′(ϕ(x))ϕ′(x) = G′(x).

Next, since G′ = g and H′ = h, we obtain h(ϕ(x))ϕ′(x) = g(x) and hence

    ϕ′(x) = g(x)/h(ϕ(x))  (2.5)

for all x ∈ I. Observing that (2.5) is precisely the equation that results when ϕ(x) is substituted for y in the ODE y′ = g(x)/h(y), it follows that ϕ : I → R is an explicit solution to the ODE.

Therefore H(y) = G(x) + c is an implicit solution to the ODE, since it implicitly defines at least one function that satisfies it on some interval. ∎

The Separation of Variables Method is a formal procedure for obtaining the implicit solution H(y) = G(x) + c to y′ = g(x)/h(y). It flows as follows.

• Write the equation as

    dy/dx = g(x)/h(y).

• "Multiply" by dx to obtain h(y) dy = g(x) dx.

• Integrate both sides:

    ∫ h(y) dy = ∫ g(x) dx.

Letting H(y) = ∫ h(y) dy and G(x) = ∫ g(x) dx, and inserting an arbitrary constant c, immediately yields H(y) = G(x) + c as desired. It should be stressed that the "equation" in the middle step is not really a proper mathematical equation, since the symbols dx and dy by themselves have no meaning in this context. Some examples are in order.

Example 2.5. Solve the initial value problem

    y′ = (2x^3 − x + 5)/(4 − y),  y(0) = −2.

Solution. Here we have a first-order ordinary differential equation of the form y′ = f(x, y), and since f(x, y) = g(x)p(y) for

    g(x) = 2x^3 − x + 5  and  p(y) = 1/(4 − y),

it's seen that the ODE is separable. Letting h(y) = 1/p(y) = 4 − y, we can write the ODE as dy/dx = g(x)/h(y), from which we obtain h(y) dy = g(x) dx and finally

    ∫ (4 − y) dy = ∫ (2x^3 − x + 5) dx.

Integrating both sides gives

    4y − (1/2)y^2 + c₁ = (1/2)x^4 − (1/2)x^2 + 5x + c₂,

where c₁ and c₂ are arbitrary constants produced by each indefinite integral. Subtracting c₁ from both sides gives c₂ − c₁ on the right-hand side, which taken as a whole is still nothing more than an arbitrary constant, and so it is convenient to simply denote it by c to obtain

    4y − (1/2)y^2 = (1/2)x^4 − (1/2)x^2 + 5x + c.  (2.6)

The Implicit Function Theorem could be used to verify that (2.6) implicitly defines y as a function of x in at least one way, but we take this for granted. Hence (2.6) is an implicit solution to the ODE, and if we multiply both sides by 2 it takes the form

    8y − y^2 = x^4 − x^2 + 10x + c,  (2.7)

where 2c is written simply as c since, in either case, the term represents an arbitrary real number.

It is not necessary to solve (2.7) for y here. The initial condition y(0) = −2 can now be used to determine c: substituting 0 for x and −2 for y in (2.7), we obtain

    8(−2) − (−2)^2 = 0^4 − 0^2 + 10(0) + c,

and so c = −20. An implicit solution to the IVP is therefore

    8y − y^2 = x^4 − x^2 + 10x − 20.

Can an explicit solution be found? The answer is yes: if we rewrite the implicit solution in the form Ay^2 + By + C = 0,

    y^2 − 8y + (x^4 − x^2 + 10x − 20) = 0,


we can use the quadratic formula to get

    y = [−(−8) ± √((−8)^2 − 4(1)(x^4 − x^2 + 10x − 20))]/(2(1)) = 4 ± √(36 − 10x + x^2 − x^4).

Now, putting x = 0 into the right-hand side yields y = 10 from 4 + √· , and y = −2 from 4 − √· . Since only y = −2 satisfies the initial condition, we conclude that

    y(x) = 4 − √(36 − 10x + x^2 − x^4)

is the explicit solution to the IVP. ∎
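The explicit solution can be spot-checked numerically (illustrative code, not part of the text): it should pass through (0, −2), and its derivative should match (2x^3 − x + 5)/(4 − y):

```python
import math

def y(x):
    """Explicit solution 4 - sqrt(36 - 10x + x^2 - x^4)."""
    return 4 - math.sqrt(36 - 10*x + x**2 - x**4)

assert abs(y(0) - (-2)) < 1e-12   # initial condition y(0) = -2

h = 1e-6
for x in [-1.0, 0.0, 1.0]:
    dydx = (y(x + h) - y(x - h)) / (2*h)
    rhs = (2*x**3 - x + 5) / (4 - y(x))
    assert abs(dydx - rhs) < 1e-5, (x, dydx, rhs)
print("explicit solution verified")
```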

Example 2.6. Solve the initial value problem

    L di/dt + Ri = E,  i(0) = i₀,

where L ≠ 0, R ≠ 0, E, and i₀ are constants. Assume E ≠ Ri₀.

Solution. Here the independent and dependent variables are t and i, respectively. Rewriting the equation as

    di/dt = (E − Ri)/L  (2.8)

shows it to be separable, since the right-hand side is a product of the functions g(t) = 1 and p(i) = (E − Ri)/L. From this we find that

    ∫ L/(E − Ri) di = ∫ dt  ⇒  −(L/R) ln|E − Ri| = t + c
                            ⇒  |E − Ri| = e^(−(R/L)(t+c)) = Ce^(−Rt/L),

where C = e^(−Rc/L) for c arbitrary. With the initial condition i(0) = i₀ it follows that C = |E − Ri₀|, and hence

    |E − Ri| = |E − Ri₀|e^(−Rt/L).  (2.9)

It remains to resolve the ungainly absolute values, if possible. If E > Ri₀, then by the assumed continuity of the function i(t) it must be that E > Ri(t) for all t in some interval of real numbers I containing 0, and (2.9) implies

    E − Ri = (E − Ri₀)e^(−Rt/L)  (2.10)

for all t ∈ I. On the other hand, if E < Ri₀, then E < Ri(t) for all t in some interval I, and again (2.9) implies (2.10) for all t ∈ I. Therefore

    i(t) = [E − (E − Ri₀)e^(−Rt/L)]/R

for all t in some interval containing 0.

What would happen if E = Ri₀ were allowed? From (2.8) we would obtain i′(0) = 0, so it may be that i(t) has a local extremum at 0, but this is by no means assured.² How the analysis would proceed would at the very least depend on whether or not E = 0. If E = 0, then i₀ = 0 also (since R ≠ 0 by hypothesis), and i(t) ≡ 0 would be a solution to the IVP. In any case the separation of variables maneuver carried out above presupposes that E ≠ Ri(t) for all t in some interval containing 0, since E − Ri must be put in the denominator of a fraction. ∎

²Consider, for example, that f′(0) = 0 for f(x) = x^3.
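For concreteness, the closed-form current can be compared with a direct Euler integration of L di/dt + Ri = E. The sketch below is illustrative only, with made-up parameters L = 2, R = 4, E = 12, i₀ = 0 (so E ≠ Ri₀ as required):

```python
import math

L, R, E, i0 = 2.0, 4.0, 12.0, 0.0   # illustrative parameters, E != R*i0

def i_exact(t):
    """i(t) = [E - (E - R*i0) e^(-Rt/L)] / R from the worked solution."""
    return (E - (E - R*i0) * math.exp(-R*t/L)) / R

# Euler integration of di/dt = (E - R*i)/L from i(0) = i0 up to t = 1.
h, t, i = 1e-4, 0.0, i0
while t < 1.0:
    i += h * (E - R*i) / L
    t += h

assert abs(i_exact(0.0) - i0) < 1e-12
assert abs(i - i_exact(1.0)) < 1e-3
print(i_exact(1.0))   # the current approaches E/R = 3 as t grows
```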

Example 2.7. Solve the initial value problem

    y′ = 2x cos^2 y,  y(0) = π/3,

and give an interval of validity for the solution.

Solution. The differential equation is of the form y′ = g(x)p(y), with g(x) = 2x and p(y) = cos^2 y, and so the Method of Separation of Variables is applicable. From dy/dx = 2x cos^2 y we write

    (1/cos^2 y) dy = 2x dx,

which leads to the equation

    ∫ sec^2 y dy = ∫ 2x dx.

Integrating gives tan y = x^2 + c, which is an implicit solution to the ODE. Using the given initial condition, we substitute x = 0 and y = π/3 to obtain tan(π/3) = 0^2 + c, so that c = √3 and we find that

    tan y = x^2 + √3  (2.11)

is an implicit solution to the IVP.

To obtain an explicit solution, first recall from trigonometry that the domain of the tangent function is

    Dom(tan) = ⋃_{k∈Z} (−π/2 + kπ, π/2 + kπ),

and recall from calculus that the tangent function is continuous on its domain. Consider again the initial condition y(0) = π/3, which requires the solution curve to pass through the point (0, π/3) in R^2. The only interval (−π/2 + kπ, π/2 + kπ) in the domain of the tangent function that contains π/3 is the one for which k = 0, which is to say the interval (−π/2, π/2). Thus the implicit solution (2.11) to the IVP must generate a curve that is entirely contained within a narrow band in R^2 where −π/2 < y < π/2. It's known that tan y is one-to-one on (−π/2, π/2) and hence has an inverse tan⁻¹. Thus the explicit solution to the IVP is

    y(x) = tan⁻¹(x^2 + √3),

where it's understood that the range of the function y is (−π/2, π/2), the customary range of the function tan⁻¹. This is the largest interval of validity for the solution to the IVP. ∎


2.3 – Linear Equations

We determine here a general method for finding the general solution to a first-order linear differential equation,

    a₁(x)y′ + a₀(x)y = f(x),  (2.12)

on any interval I where the functions a₀, a₁, and f are continuous, and a₁ is nonvanishing on I (that is, a₁(x) ≠ 0 for all x ∈ I). We observe that if a₀ ≡ 0 on I then (2.12) becomes simply y′ = f(x)/a₁(x), which can be solved by direct integration.

Assuming a₁ is nonvanishing on I allows us to divide (2.12) by a₁(x) to obtain the standard form. Defining p(x) = a₀(x)/a₁(x) and q(x) = f(x)/a₁(x), we have

    y′ + p(x)y = q(x).  (2.13)

What we would like to do is find a function µ(x), called an integrating factor, for which

    µ(x)y′(x) + µ(x)p(x)y(x) = [µ(x)y(x)]′.  (2.14)

By the product rule, (2.14) becomes

    µ(x)y′(x) + µ(x)p(x)y(x) = µ′(x)y(x) + µ(x)y′(x),

which implies that

    µ(x)p(x)y(x) = µ′(x)y(x),

and hence µ must be such that µ′(x) = µ(x)p(x). Since p(x) is continuous on I, the Fundamental Theorem of Calculus implies that it has an antiderivative P(x). Define

    µ(x) = e^(P(x)),

and observe that

    µ′(x) = e^(P(x))P′(x) = µ(x)p(x),

precisely as desired. Indeed, if we let ∫ p(x) dx denote a particular antiderivative of p(x), then setting

    µ(x) = e^(∫p(x)dx)  (2.15)

will serve our purpose.

Now let us see what this choice for µ(x) does. Multiplying both sides of (2.13) by µ(x) as given by (2.15), we obtain

    µ(x)y′ + µ(x)p(x)y = µ(x)q(x),

which by (2.14) becomes

    [µ(x)y]′ = µ(x)q(x).

This is ϕ′(x) = µ(x)q(x) for ϕ(x) = µ(x)y, which is a directly integrable equation of the sort treated in §2.1. Integrating both sides with respect to x yields

    µ(x)y = ∫ µ(x)q(x) dx + c,

where c is an arbitrary constant. Finally, since µ is nonvanishing on I, we may divide by it so as to isolate y and obtain explicit solutions to (2.13) of the form

    y(x) = (1/µ(x)) (∫ µ(x)q(x) dx + c).

We have now partially proven the following.

Theorem 2.8. If p and q are continuous on an interval I, then y′ + p(x)y = q(x) has general solution given by

    y(x) = e^(−∫p(x)dx) (∫ q(x)e^(∫p(x)dx) dx + c)  (2.16)

for all x ∈ I.

Proof. What remains to show is that every solution to (2.13) on I has the form given by (2.16). This follows from an argument much the same as the derivation of (2.16) itself, however. If ϕ(x) is a solution to (2.13) on I, so that

    ϕ′(x) + p(x)ϕ(x) = q(x)

for all x ∈ I, then

    [ϕ(x)e^(∫p(x)dx)]′ = ϕ′(x)e^(∫p(x)dx) + p(x)ϕ(x)e^(∫p(x)dx) = q(x)e^(∫p(x)dx)

on I, and hence

    ϕ(x)e^(∫p(x)dx) = ∫ q(x)e^(∫p(x)dx) dx + c.

It is now clear that ϕ(x) has the form of the expression at right in (2.16). ∎

The symbol ∫ p(x) dx usually represents the family of all antiderivatives for p(x) on whatever interval I is being considered. In the present context, however, ∫ p(x) dx is taken to denote a particular antiderivative for p(x). Since one antiderivative of a function differs from another only by a constant term, it's convenient to let ∫ p(x) dx denote the antiderivative with constant term equal to 0. However, if P(x) is an antiderivative for p(x) on I, there is no harm in letting µ(x) be e^(P(x)+a) for any constant a ≠ 0. From (2.16) we obtain

    y(x) = e^(−P(x)−a) (∫ q(x)e^(P(x)+a) dx + c)
         = e^(−P(x))e^(−a)e^a ∫ q(x)e^(P(x)) dx + ce^(−a)e^(−P(x))
         = e^(−P(x)) ∫ q(x)e^(P(x)) dx + ce^(−P(x))
         = e^(−P(x)) (∫ q(x)e^(P(x)) dx + c),

where we replace ce^(−a) with c since the latter is arbitrary, and so we see that a vanishes from the final expression.

Example 2.9. Solve xy′ + 3y + 2x^2 = x^3 + 4x.


Solution. If we write the equation as

    xy′ + 3y = x^3 − 2x^2 + 4x,

it can be seen that it is indeed a first-order linear ODE. Assuming x to be in either (−∞, 0) or (0, ∞), we may divide the equation by it to obtain

    y′ + (3/x)y = x^2 − 2x + 4,  (2.17)

which is the standard form (2.13) with p(x) = 3/x and q(x) = x^2 − 2x + 4. An appropriate integrating factor is

    µ(x) = e^(∫(3/x)dx) = e^(3 ln|x|) = |x|^3,

so that µ(x) = −x^3 if x ∈ (−∞, 0), and µ(x) = x^3 if x ∈ (0, ∞). Since we have no initial condition or physical context that might inform us which interval x lies in, we must retain the absolute value bars for the moment. Using (2.16) we obtain a general solution:

    y(x) = |x|^(−3) [∫ (x^2 − 2x + 4)|x|^3 dx + c].

If x ∈ (0, ∞), then

    y(x) = x^(−3) [∫ (x^2 − 2x + 4)x^3 dx + c]
         = x^(−3) ∫ (x^5 − 2x^4 + 4x^3) dx + cx^(−3)
         = x^(−3) ((1/6)x^6 − (2/5)x^5 + x^4) + cx^(−3)
         = (1/6)x^3 − (2/5)x^2 + x + cx^(−3).

If x ∈ (−∞, 0), then

    y(x) = −x^(−3) [−∫ (x^2 − 2x + 4)x^3 dx + c]
         = x^(−3) ∫ (x^5 − 2x^4 + 4x^3) dx − cx^(−3)
         = (1/6)x^3 − (2/5)x^2 + x − cx^(−3)
         = (1/6)x^3 − (2/5)x^2 + x + cx^(−3),

where in the end we change the sign of the last term since c is arbitrary. Since the expression for y(x) is the same whether x < 0 or x > 0, we may simply state that the general solution to (2.17), and hence to the original ODE, is

    y(x) = (1/6)x^3 − (2/5)x^2 + x + cx^(−3)

on any interval I that does not include zero. ∎
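The general solution just obtained can be spot-checked against the original equation xy′ + 3y + 2x^2 = x^3 + 4x for several values of c and for x on both sides of zero (illustrative code, not part of the text):

```python
def y(x, c):
    """Candidate general solution (1/6)x^3 - (2/5)x^2 + x + c x^(-3)."""
    return x**3/6 - 2*x**2/5 + x + c/x**3

h = 1e-6
for c in (0.0, 1.0, -3.0):
    for x in (-2.0, -0.5, 1.0, 2.5):   # avoid x = 0
        dydx = (y(x + h, c) - y(x - h, c)) / (2*h)
        residual = x*dydx + 3*y(x, c) + 2*x**2 - (x**3 + 4*x)
        assert abs(residual) < 1e-4, (c, x, residual)
print("general solution satisfies x y' + 3y + 2x^2 = x^3 + 4x")
```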

In the example above we find that there is a way to write the general solution to the ODE on an arbitrary interval without using absolute values, which is usually desirable when it is possible. The next example shows that this is not always possible.

Example 2.10. Solve y′ = y/x + 2x + 1.


Solution. Writing the equation in standard form, we have

    y′ − (1/x)y = 2x + 1,

so p(x) = −1/x and q(x) = 2x + 1. Now, assuming x to be in either (−∞, 0) or (0, ∞), a suitable integrating factor is

    µ(x) = e^(∫(−1/x)dx) = e^(−ln|x|) = e^(ln|x|^(−1)) = |x|^(−1).

Using (2.16), for x ∈ (0, ∞) we have

    y(x) = |x| [∫ (2x + 1)|x|^(−1) dx + c] = x [∫ (2x + 1)x^(−1) dx + c]
         = x ∫ (2 + 1/x) dx + cx = x(2x + ln x) + cx = 2x^2 + x ln x + cx,

and for x ∈ (−∞, 0) we have

    y(x) = |x| [∫ (2x + 1)|x|^(−1) dx + c] = −x [∫ (2x + 1)(−x)^(−1) dx + c]
         = x ∫ (2 + 1/x) dx − cx = x[2x + ln(−x)] − cx = 2x^2 + x ln(−x) + cx.

(Note that since c is arbitrary we're free to change the sign of the last term.) Here the expressions for y(x) are not identical: the argument of the natural logarithm is either x or −x. In either case, however, the argument equals |x|, so the two expressions may be written uniformly with ln|x|, and therefore the general solution to the ODE is

    y(x) = 2x^2 + x ln|x| + cx

on any interval I that does not include zero. ∎
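Again the general solution can be checked numerically on both sides of zero (illustrative code, not part of the text):

```python
import math

def y(x, c):
    """Candidate general solution 2x^2 + x ln|x| + c x."""
    return 2*x**2 + x*math.log(abs(x)) + c*x

h = 1e-7
for c in (0.0, 2.0, -1.0):
    for x in (-3.0, -0.5, 0.5, 3.0):   # both sides of zero, x != 0
        dydx = (y(x + h, c) - y(x - h, c)) / (2*h)
        assert abs(dydx - (y(x, c)/x + 2*x + 1)) < 1e-4
print("general solution satisfies y' = y/x + 2x + 1")
```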


2.4 – Exact Equations

Consider the equation F(x, y) = c, where y = y(x) and c is a constant. We differentiate both sides of the equation with respect to x to obtain

    d/dx F(x, y(x)) = 0,

and then using the Chain Rule (1.1) it follows that

    ∂F/∂x + (∂F/∂y)(dy/dx) = 0.

Equivalently we may write

    F_x(x, y) + F_y(x, y)y′ = 0.  (2.18)

In general, if we have a differential equation of the form

    M(x, y) + N(x, y)y′ = 0,  (2.19)

and we can find some function F such that F_x = M and F_y = N, then we would obtain the form (2.18), and from there we could work back to the original form F(x, y) = c for arbitrary constant c. The equation F(x, y) = c would be a one-parameter family of implicit solutions to (2.19), as we prove later. The existence of a function F for which F_x(x, y) = M(x, y) and F_y(x, y) = N(x, y) hold for all (x, y) on some open set in R^2 is by no means guaranteed, but if there is such a function, the equation (2.19) is of a very special kind.

Definition 2.11. An equation of the form M(x, y) + N(x, y)y′ = 0 is exact on a set S if there exists a function F : S → R such that

    F_x(x, y) = M(x, y)  and  F_y(x, y) = N(x, y)

for all (x, y) ∈ S.

The function F in Definition 2.11 is called the potential function of M + Ny′ = 0. No time need ever be wasted seeking a potential function that does not exist, because fortunately there is an easy test to determine whether M + Ny′ = 0 is exact.

Theorem 2.12 (Test for Exactness). Suppose that M(x, y) and N(x, y) have continuous first partial derivatives on a closed rectangle R. Then M(x, y) + N(x, y)y′ = 0 is exact on R if and only if M_y(x, y) = N_x(x, y) for all (x, y) ∈ R.

Proof. Suppose that M(x, y) + N(x, y)y′ = 0 is exact on R = [x₁, x₂] × [y₁, y₂]. Then there is some function F : R → R such that F_x = M and F_y = N on R. The continuity of the first partial derivatives of M and N now implies the continuity of all second partial derivatives of F, and so F_xy = F_yx on R by Theorem 1.5 (Clairaut's Theorem). Now, F_xy = (F_x)_y = M_y and F_yx = (F_y)_x = N_x, and therefore M_y = N_x on R.

For the converse, suppose that M_y = N_x on R. Define the function

    F(x, y) = ∫_{x₁}^{x} M(t, y₁) dt + ∫_{y₁}^{y} N(x, t) dt.  (2.20)


For each (x, y) ∈ R the intervals [x₁, x] and [y₁, y] both lie in R, and since the functions t ↦ M(t, y₁) and t ↦ N(x, t) are continuous on [x₁, x] and [y₁, y], respectively, both integrals on the right-hand side of (2.20) exist in R. Thus F as defined by (2.20) has domain containing R.

It remains to show that F_x = M and F_y = N on R. Using Theorem 1.6 (Leibniz's Integral Rule), the Fundamental Theorem of Calculus, and the supposition that N_x = M_y, we have

    F_x(x, y) = ∂/∂x ∫_{x₁}^{x} M(t, y₁) dt + ∂/∂x ∫_{y₁}^{y} N(x, t) dt
              = M(x, y₁) + ∫_{y₁}^{y} (∂N/∂x)(x, t) dt
              = M(x, y₁) + ∫_{y₁}^{y} (∂M/∂y)(x, t) dt
              = M(x, y₁) + [M(x, t)]_{t=y₁}^{t=y}
              = M(x, y₁) + [M(x, y) − M(x, y₁)] = M(x, y)

for any (x, y) ∈ R. Also

    F_y(x, y) = ∂/∂y ∫_{x₁}^{x} M(t, y₁) dt + ∂/∂y ∫_{y₁}^{y} N(x, t) dt = N(x, y)

obtains immediately for any (x, y) ∈ R, since the first integral is constant with respect to y. ∎

The formula (2.20) could always be used to find F for a given exact equation (2.19), but in practice it tends to be more natural to ferret out F by means of a procedure that employs indefinite integrals. We give this procedure after the statement and proof of the following theorem.

Theorem 2.13. If the ODE

M(x, y) + N(x, y)y′ = 0

is exact on a closed rectangle R, so there is a function F such that Fx = M and Fy = N on R, then F(x, y) = c is a one-parameter family of implicit solutions to the ODE on R.

Proof. Let c ∈ R be arbitrary and set F(x, y) = c. We take the relation F(x, y) = c to implicitly define y as a function ϕ of x on some open interval I, so that

F (x, ϕ(x)) = c (2.21)

for all x ∈ I. Differentiating both sides of (2.21) with respect to x yields

Fx(x, ϕ(x)) + Fy(x, ϕ(x))ϕ′(x) = 0

for all x ∈ I. Recalling that Fx = M and Fy = N , we now obtain

M(x, ϕ(x)) +N(x, ϕ(x))ϕ′(x) = 0

for all x ∈ I.


Since ϕ : I → R satisfies M(x, y) + N(x, y)y′ = 0, we conclude that F(x, y) = c is an implicit solution to the ODE. □
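Theorem 2.13 can be illustrated numerically: along any solution of M(x, y) + N(x, y)y′ = 0, the quantity F(x, y(x)) should remain constant. The sketch below is not from the text; it uses an invented potential F(x, y) = x²y³ (so M = Fx = 2xy³ and N = Fy = 3x²y²), marches the resulting equation y′ = −M/N with a standard fourth-order Runge-Kutta step, and watches F for drift.

```python
# Illustrative exact equation: potential F(x, y) = x^2 y^3,
# so M = Fx = 2xy^3 and N = Fy = 3x^2 y^2 (an invented example).
def F(x, y): return x**2 * y**3
def M(x, y): return 2 * x * y**3
def N(x, y): return 3 * x**2 * y**2

def yprime(x, y):               # solve M + N y' = 0 for y'
    return -M(x, y) / N(x, y)

x, y, h = 1.0, 1.0, 0.01        # start on the level curve F = 1
drift = 0.0
for _ in range(100):            # fourth-order Runge-Kutta march to x = 2
    k1 = yprime(x, y)
    k2 = yprime(x + h/2, y + h*k1/2)
    k3 = yprime(x + h/2, y + h*k2/2)
    k4 = yprime(x + h, y + h*k3)
    y += h * (k1 + 2*k2 + 2*k3 + k4) / 6
    x += h
    drift = max(drift, abs(F(x, y) - 1.0))
```

Up to the integrator's small truncation error, F(x, y(x)) stays at its initial value, exactly as the theorem predicts.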

The procedure for solving an exact equation is as follows.

Procedure. Suppose M(x, y) + N(x, y)y′ = 0 is exact, so there exists F such that Fx = M, Fy = N.

1. Integrate Fx(x, y) = M(x, y) with respect to x to get F(x, y) = ∫ M(x, y) dx + g(y).

2. Differentiate F(x, y) = ∫ M(x, y) dx + g(y) with respect to y, substitute N for Fy, and solve for g′(y).

3. Integrate g′(y) to determine an expression for g(y) up to an arbitrary constant.

4. Put the expression for g(y) into F(x, y) = ∫ M(x, y) dx + g(y).

The general implicit solution is F(x, y) = c. □

Example 2.14. Solve

y′ = (3x² − e^x sin y) / (e^x cos y + y^(−2/3)/3).

Solution. Rewriting the equation as

(e^x sin y − 3x²) + (e^x cos y + (1/3)y^(−2/3))y′ = 0,

we have M(x, y) = e^x sin y − 3x² and N(x, y) = e^x cos y + y^(−2/3)/3, and since

My(x, y) = e^x cos y = Nx(x, y)

we conclude by Theorem 2.12 that the equation is exact. Thus it is appropriate to employ the procedure given above to find a function F that will determine implicit solutions to the ODE. From Fx = M we obtain, for any fixed y,

F(x, y) = ∫ M(x, y) dx + g(y) = ∫ (e^x sin y − 3x²) dx + g(y) = e^x sin y − x³ + g(y),    (2.22)

where g(y) is an arbitrary function of y. (Notice that the partial derivative of the right-hand side of (2.22) with respect to x is indeed M(x, y).)

Taking the partial derivative of F with respect to y gives

Fy(x, y) = ∂y[e^x sin y − x³ + g(y)] = e^x cos y + g′(y),

and since Fy = N we obtain

e^x cos y + (1/3)y^(−2/3) = e^x cos y + g′(y),

and thus

g′(y) = (1/3)y^(−2/3).


Now we integrate to get

g(y) = (1/3) ∫ y^(−2/3) dy = y^(1/3) + c1

for arbitrary constant c1. Substituting this result into (2.22) yields

F(x, y) = e^x sin y − x³ + y^(1/3) + c1.

Therefore implicit solutions to the ODE are given by

e^x sin y − x³ + y^(1/3) + c1 = c2,

but this can be written simply as

e^x sin y − x³ + y^(1/3) = c,

where c is the arbitrary constant that results when we consolidate the arbitrary constants c1 and c2. □

Example 2.15. Solve the initial value problem

1/x + 2xy² + (2x²y − cos y)y′ = 0,    y(1) = π.

Solution. Here we have

My(x, y) = ∂y[1/x + 2xy²] = 4xy

and

Nx(x, y) = ∂x[2x²y − cos y] = 4xy,

so My = Nx on at least some rectangle R in R², and it follows that the ODE is exact. Thus there exists a function F such that Fx = M and Fy = N. From the former we obtain

F(x, y) = ∫ M(x, y) dx + g(y) = ∫ (1/x + 2xy²) dx + g(y) = ln|x| + x²y² + g(y).    (2.23)

From this comes

Fy(x, y) = 2x²y + g′(y),

and since Fy = N this leads to

2x²y − cos y = 2x²y + g′(y).

Thus g′(y) = − cos y, which gives us g(y) = − sin y as a suitable antiderivative. Substituting this into (2.23) gives

F(x, y) = ln|x| + x²y² − sin y,

and therefore

ln|x| + x²y² − sin y = c    (2.24)

are the implicit solutions to the ODE.


Now we make use of the initial condition y(1) = π. Substituting 1 for x and π for y we find that

c = ln|1| + (1)²(π)² − sin(π) = π²,

and so

ln x + x²y² − sin y = π²

is the implicit solution to the IVP. Notice that, since x > 0 at the initial point (1, π), we can take |x| = x in the solution. □


2.5 – Integrating Factors

If an equation of the form

M(x, y) + N(x, y)y′ = 0,    (2.25)

is not exact on any open set U ⊆ R², we may be able to find some function µ(x, y) such that

µ(x, y)M(x, y) + µ(x, y)N(x, y)y′ = 0

is exact. That is, if we define

M̃(x, y) = µ(x, y)M(x, y)   and   Ñ(x, y) = µ(x, y)N(x, y),

then for the equation

M̃(x, y) + Ñ(x, y)y′ = 0    (2.26)

there exists some open set U and function F : U → R such that Fx = M̃ and Fy = Ñ on U. In this case the function µ(x, y) is called an integrating factor.

By Theorem 2.12, in order for (2.26) to be exact we must have M̃y(x, y) = Ñx(x, y) on some open set U, which can be expressed as

∂y[µ(x, y)M(x, y)] = ∂x[µ(x, y)N(x, y)].

By the product rule of differentiation we obtain

µ(x, y)My(x, y) + M(x, y)µy(x, y) = µ(x, y)Nx(x, y) + N(x, y)µx(x, y),

which is a partial differential equation that generally would be no trivial matter to solve for µ. However, if µ were a function of x alone, we could replace µ(x, y) with µ(x), µx(x, y) with µ′(x), and µy(x, y) with 0, to arrive at the marginally less hideous equation

µ(x)My(x, y) = µ(x)Nx(x, y) + N(x, y)µ′(x).

This we rewrite as

µ′/µ = (My(x, y) − Nx(x, y)) / N(x, y).    (2.27)

Now, if the right-hand side of this equation is independent of y, which is to say that it is a function of x alone, then the equation is in fact a first-order linear ODE in µ. Indeed we can then let

g(x) = (My(x, y) − Nx(x, y)) / N(x, y)

and h(µ) = µ, and write (2.27) as µ′ = g(x)h(µ) to see that the ODE is actually separable! By the Method of Separation of Variables we obtain

∫ 1/h(µ) dµ = ∫ g(x) dx

for all x on some open interval I where g is continuous, which implies that

∫ (1/µ) dµ = ∫ (My(x, y) − Nx(x, y))/N(x, y) dx,


and thus

ln|µ| = ∫ (My − Nx)/N dx + c

for an arbitrary constant c. Hence an integrating factor of the form µ(x) is given by

µ(x) = exp(∫ (My − Nx)/N dx).    (2.28)

A similar analysis based on the assumption that µ and (Nx − My)/M are functions of y alone (i.e. independent of x) leads to another formula:

µ(y) = exp(∫ (Nx − My)/M dy).    (2.29)

In the special case when we are given a first-order linear ODE in standard form,

y′ + P(x)y = Q(x),

we can write

[P(x)y − Q(x)] + y′ = 0

to obtain the form (2.25) with

M(x, y) = P(x)y − Q(x)   and   N(x, y) = 1,

which generally is not exact. However, multiplying by µ(x) = exp(∫ P(x) dx), where here ∫ P(x) dx represents any particular antiderivative of P(x), yields

e^{∫P(x)dx}[P(x)y − Q(x)] + e^{∫P(x)dx} y′ = 0,    (2.30)

where the bracketed product plays the role of M̃(x, y) and e^{∫P(x)dx} the role of Ñ(x, y). Since

M̃y(x, y) = P(x)e^{∫P(x)dx} = Ñx(x, y),

it follows from Theorem 2.12 that (2.30) is exact, and therefore µ(x) is an integrating factor as the term is defined in this section. Of course exp(∫ P(x) dx) was encountered before in Section 2.3, where it was also defined to be an integrating factor, and so we see that the integrating factor of that section is just a special instance of the more inclusive notion of an integrating factor under consideration here.

Example 2.16. Solve 2xy + (y² + 3x²)y′ = 0.

Solution. Here we have an ODE of the form M + Ny′ = 0 with

M(x, y) = 2xy   and   N(x, y) = y² + 3x².

Since

My(x, y) = 2x ≠ 6x = Nx(x, y),

the equation is not exact. We will attempt to make it exact by finding an integrating factor of some kind. In order to obtain an integrating factor that depends only on x we need (My − Nx)/N to be independent of y, but this is not the case due to a y² term that cannot be expunged:

(My(x, y) − Nx(x, y)) / N(x, y) = (2x − 6x)/(y² + 3x²) = −4x/(y² + 3x²).


To obtain an integrating factor that depends only on y we need (Nx − My)/M to be independent of x, and here we have more luck:

(Nx(x, y) − My(x, y)) / M(x, y) = (6x − 2x)/(2xy) = 2/y,

where no x is to be seen in the final expression. To determine µ(y) we use the formula (2.29):

µ(y) = exp(∫ (Nx(x, y) − My(x, y))/M(x, y) dy) = e^{∫ 2/y dy} = e^{2 ln|y|} = e^{ln y²} = y².

Multiplying the ODE by y² gives

2xy³ + (y⁴ + 3x²y²)y′ = 0,

which is of the form M̃ + Ñy′ = 0 with

M̃(x, y) = 2xy³   and   Ñ(x, y) = y⁴ + 3x²y².

Since M̃y(x, y) = 6xy² = Ñx(x, y) the equation is exact (exact on R², in fact), and so there exists some function F(x, y) such that Fx = M̃ and Fy = Ñ. From the latter equation³ we obtain

F(x, y) = ∫ Ñ(x, y) dy = ∫ (y⁴ + 3x²y²) dy = (1/5)y⁵ + x²y³ + h(x),

where h(x) is an arbitrary function of x. Differentiating with respect to x then yields

Fx(x, y) = 2xy³ + h′(x),

and since Fx = M̃ we have 2xy³ + h′(x) = 2xy³ and thus h′(x) = 0. We conclude that h(x) = c1 for an arbitrary constant c1, so that

F(x, y) = (1/5)y⁵ + x²y³ + c1.

The general implicit solution is therefore

(1/5)y⁵ + x²y³ + c1 = c2,

where c2 is arbitrary. Naturally we may combine c1 and c2 by letting c = 5(c2 − c1), and write

y⁵ + 5x²y³ = c

as the general solution. □
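To gain confidence in the answer (a numerical sketch, not part of the text's method), we can integrate y′ = −2xy/(y² + 3x²) with a Runge-Kutta step from a point on the curve y⁵ + 5x²y³ = 6 and check that the combination stays constant:

```python
def yprime(x, y):                 # from 2xy + (y^2 + 3x^2) y' = 0
    return -2 * x * y / (y**2 + 3 * x**2)

def G(x, y): return y**5 + 5 * x**2 * y**3   # should be conserved along solutions

x, y, h = 1.0, 1.0, 0.01          # (1, 1) lies on the curve G = 6
for _ in range(200):              # fourth-order Runge-Kutta march to x = 3
    k1 = yprime(x, y)
    k2 = yprime(x + h/2, y + h*k1/2)
    k3 = yprime(x + h/2, y + h*k2/2)
    k4 = yprime(x + h, y + h*k3)
    y += h * (k1 + 2*k2 + 2*k3 + k4) / 6
    x += h
drift = abs(G(x, y) - 6.0)
```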

It is within our capability to determine an integrating factor that depends on both x and y in certain instances, as the next example illustrates.

Example 2.17. Find an integrating factor of the form x^m y^n that will make the equation (2y² − 6xy) + (3xy − 4x²)y′ = 0 exact, then solve the equation.

³It does not, of course, matter which equation we start with.


Solution. Multiply the ODE by x^m y^n to obtain

(2x^m y^(n+2) − 6x^(m+1) y^(n+1)) + (3x^(m+1) y^(n+1) − 4x^(m+2) y^n)y′ = 0,    (2.31)

so that

M(x, y) = 2x^m y^(n+2) − 6x^(m+1) y^(n+1)   and   N(x, y) = 3x^(m+1) y^(n+1) − 4x^(m+2) y^n.

For the ODE to be exact we must choose m and n such that

My(x, y) = 2(n+2)x^m y^(n+1) − 6(n+1)x^(m+1) y^n

equals

Nx(x, y) = 3(m+1)x^m y^(n+1) − 4(m+2)x^(m+1) y^n,

which is to say that

(2n+4)x^m y^(n+1) + (−6n−6)x^(m+1) y^n = (3m+3)x^m y^(n+1) + (−4m−8)x^(m+1) y^n.

In order for this equation to be satisfied for all (x, y) in some open set U ⊆ R², the coefficients of like terms will have to be equal; that is, the coefficients of the x^m y^(n+1) terms will need to match, giving 2n + 4 = 3m + 3, and the coefficients of the x^(m+1) y^n terms will need to match, giving −6n − 6 = −4m − 8. Thus we have a system of equations

3m − 2n = 1
4m − 6n = −2.

The solution to this system is m = 1 and n = 1, giving us an integrating factor µ(x, y) = xy and turning (2.31) into the exact equation

(2xy³ − 6x²y²) + (3x²y² − 4x³y)y′ = 0.

We now set about finding the general solution to the ODE. There is a function F(x, y) such that

Fx(x, y) = 2xy³ − 6x²y²    (2.32)

and

Fy(x, y) = 3x²y² − 4x³y.    (2.33)

From (2.32) we obtain

F(x, y) = ∫ (2xy³ − 6x²y²) dx = x²y³ − 2x³y² + g(y).

Differentiating this equation with respect to y then gives

Fy(x, y) = 3x²y² − 4x³y + g′(y),

which together with (2.33) implies that

3x²y² − 4x³y + g′(y) = 3x²y² − 4x³y,

and hence g′(y) = 0. So g(y) = c1 for any constant c1, whence we get

F(x, y) = x²y³ − 2x³y² + c1.


The general implicit solution is F(x, y) = c2 for any constant c2. We may write this as

x²y³ − 2x³y² = c

by letting c = c2 − c1. □
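The arithmetic above is easy to replay in a few lines (a sketch written for this example): solve the 2×2 system by elimination and confirm by finite differences that multiplying by x^m y^n = xy really produces an exact equation.

```python
def M(x, y): return 2 * y**2 - 6 * x * y
def N(x, y): return 3 * x * y - 4 * x**2

# Eliminate: 3(3m - 2n) - (4m - 6n) = 5m = 3(1) - (-2) = 5, so m = 1, then n = 1.
m = (3 * 1 - (-2)) / 5
n = (3 * m - 1) / 2

def Mt(x, y): return x**m * y**n * M(x, y)   # scaled by the integrating factor
def Nt(x, y): return x**m * y**n * N(x, y)

h = 1e-5
gap = max(abs((Mt(x, y + h) - Mt(x, y - h)) / (2 * h)
              - (Nt(x + h, y) - Nt(x - h, y)) / (2 * h))
          for x, y in [(0.8, 1.5), (2.0, 0.6), (1.3, 2.2)])
```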


2.6 – Substitutions and Transformations

If an explicit first-order equation y′ = f(x, y) is not separable, linear, exact, or amenable to being made exact with an integrating factor, there is yet another technique that may be attempted in order to solve it. Just as a Riemann integral can sometimes be rendered more tractable by effecting a substitution of variables, so too might a substitution tame a differential equation.

Definition 2.18. A function f(x, y) is homogeneous of degree zero if f(tx, ty) = f(x, y) for all x, y, t ∈ R such that (x, y), (tx, ty) ∈ Dom f. An explicit first-order equation y′ = f(x, y) is homogeneous if f is homogeneous of degree zero.

Making the substitution y(x) = xu(x) in a homogeneous equation y′ = f(x, y) yields a new equation that is separable and thus may be solved (at least implicitly) in terms of u(x). In particular, suppose a function u is such that y(x) = xu(x) for all x in some open interval I. Then, since f is homogeneous of degree zero,

y′(x) = f(x, y(x)) ⇔ [xu(x)]′ = f(x, xu(x)) ⇔ xu′(x) + u(x) = f(1, u(x))    (2.34)

for all x ∈ I. That is,

x du/dx = f(1, u) − u,

which with separation of variables leads to

∫ 1/(f(1, u) − u) du = ∫ 1/x dx,    (2.35)

whereupon it is a matter of integration to obtain a solution to the rightmost equation in (2.34). Indeed, the chain of equivalencies in (2.34) makes it clear that y is a solution to y′ = f(x, y) on I if and only if u is a solution to xu′ + u = f(1, u) on I. Since the Separation of Variables Method always gives the general solution to a separable ODE, it follows that the general solution to xu′ + u = f(1, u) deriving from (2.35) will in turn yield the general solution to y′ = f(x, y) upon replacing u with y/x.
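Degree-zero homogeneity is easy to test numerically. The function below is an invented example, f(x, y) = (x² + y²)/(xy), for which f(tx, ty) = f(x, y) and f(1, u) = (1 + u²)/u:

```python
import random

def f(x, y):                     # illustrative function, homogeneous of degree zero
    return (x**2 + y**2) / (x * y)

random.seed(1)
gap = 0.0
for _ in range(100):             # spot-check f(tx, ty) = f(x, y)
    x = random.uniform(0.5, 3.0)
    y = random.uniform(0.5, 3.0)
    t = random.uniform(0.1, 5.0)
    gap = max(gap, abs(f(t * x, t * y) - f(x, y)))
```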

Example 2.19. Solve

xy′ = y(ln y − ln x + 1).    (2.36)

Solution. The equation is not separable, linear, or exact. However, dividing by x we obtain

y′ = y(ln y − ln x + 1)/x = (y/x) ln(y/x) + y/x,

where the right-hand side can be seen to be a function of y/x. The equation is therefore homogeneous, and so we let u = y/x and get

u + xu′ = u ln u + u,

and thus

xu′ = u ln u.


Separation of variables now gives

∫ 1/(u ln u) du = ∫ 1/x dx,

whereupon the substitution v = ln u for the integral on the left gives

∫ 1/v dv = ∫ 1/x dx.

Hence ln|v| = ln|x| + c0 for any constant c0, yielding

ln|ln(y/x)| = ln|x| + c0

since v = ln u = ln(y/x). A glance at the original equation (2.36) makes clear that x > 0 must be the case, and so

ln|ln(y/x)| = ln x + c0,   c0 ∈ R

is the general solution.

A far nicer form for the general solution is achievable. To start, we exponentiate both sides of the equation to find that

|ln(y/x)| = e^{ln x + c0} = e^{c0} x = cx

for arbitrary c = e^{c0} > 0. There are two possibilities: ln(y/x) > 0 (i.e. y > x) and ln(y/x) < 0 (i.e. y < x), with ln(y/x) = 0 discounted at this juncture since, again, we have x > 0 and c > 0. If y > x, then we have

|ln(y/x)| = cx ⇒ ln(y/x) = cx ⇒ y/x = e^{cx} ⇒ y = xe^{cx};

and if y < x, then

|ln(y/x)| = cx ⇒ −ln(y/x) = cx ⇒ y/x = e^{−cx} ⇒ y = xe^{−cx}.

Thus for any c > 0 we find that y = xe^{cx} and y = xe^{−cx} are both solutions to (2.36), and so we may say that the general solution to (2.36) contains the family of functions y = xe^{cx} for any c ≠ 0.

If we were to let c = 0 then y = xe^{cx} would become y = x, and we should check whether this also satisfies (2.36) for x > 0. This we do by directly substituting x for y in (2.36) to obtain

xx′ = x(ln x − ln x + 1) ⇒ x = x(0 + 1) ⇒ x = x.

That x = x is true for all x > 0 shows that y = x for x > 0 is indeed another solution. Therefore

y = xe^{cx},   c ∈ R

is the general solution. □

Example 2.20. Solve

y′ = ((x + 1)²y + y³) / (x + 1)³.    (2.37)


Solution. The equation may be written as

y′(x) = y(x)/(x + 1) + [y(x)/(x + 1)]³,

which has nearly the form of a homogeneous equation if we regard x + 1 as the independent variable. We therefore define t = x + 1 to obtain

y′(t − 1) = y(t − 1)/t + [y(t − 1)/t]³.

Now we let w(t) = y(t − 1), so that w′(t) = y′(t − 1), and then the ODE becomes

w′ = w/t + (w/t)³.

This equation is homogeneous, and so we make the substitution u = w/t, whence w′ = u + tu′ obtains and we have

u + tu′ = u + u³.

Separation of variables leads to

∫ 1/u³ du = ∫ 1/t dt,

giving

1/u² = c − 2 ln|t|

for c ∈ R, and thus

y²(t − 1) = w²(t) = t²/(c − 2 ln|t|).

The general solution is therefore

y²(x) = (x + 1)²/(c − 2 ln|x + 1|)

for c ∈ R. □

Given an equation y′ = f(x, y), suppose there can be found constants a and b, and a function G, such that

f(x, y) = G(ax + by)

for all (x, y) in some open set U ⊆ R². Then the ODE may be written as y′ = G(ax + by), and it becomes possible to transform it into a separable equation by making the substitution z = ax + by. Differentiating with respect to x, we have z′ = a + by′ and thus

y′ = (z′ − a)/b,

which enables us to write y′ = G(ax + by) as

(z′ − a)/b = G(z)

and finally z′ = a + bG(z). By the Method of Separation of Variables this leads to the equation

∫ 1/(a + bG(z)) dz = ∫ dx,


and we’re on our way toward glorious victory.

Example 2.21. Solve the initial value problem y′ = sin(x − y), y(0) = π/4.

Solution. Here f(x, y) = sin(x − y), and so if we define G(z) = sin(z) we can see that

f(x, y) = sin(x − y) = G(x − y)

for all (x, y) ∈ R². Thus, we may make the substitution z = x − y for y′ = sin(x − y) and expect a separable equation to result. From z = x − y we get z′ = 1 − y′, so that y′ = 1 − z′ and the ODE becomes 1 − z′ = sin(z). By the Method of Separation of Variables this leads to

∫ 1/(1 − sin(z)) dz = ∫ dx.    (2.38)

Now, since

1/(1 − sin(z)) = [1/(1 − sin(z))] · [(1 + sin(z))/(1 + sin(z))] = (1 + sin(z))/cos²(z) = sec²(z) + tan(z) sec(z),

equation (2.38) becomes

∫ sec²(z) dz + ∫ tan(z) sec(z) dz = x + c,

where c is an arbitrary constant. This immediately yields

tan(z) + sec(z) = x + c,

and thus

tan(x − y) + sec(x − y) = x + c

is the general solution to the ODE.

Now we make use of the initial condition. Setting x = 0 and y = π/4 in the general solution gives

tan(−π/4) + sec(−π/4) = c,

so that c = √2 − 1 and we obtain

tan(x − y) + sec(x − y) = x + √2 − 1

as the solution to the IVP. □
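As a check (numerical sketch), integrating y′ = sin(x − y) from (0, π/4) with a Runge-Kutta step and evaluating tan(x − y) + sec(x − y) − x should reproduce the constant √2 − 1:

```python
import math

def yprime(x, y): return math.sin(x - y)

x, y, h = 0.0, math.pi / 4, 0.01
for _ in range(100):                 # fourth-order Runge-Kutta march to x = 1
    k1 = yprime(x, y)
    k2 = yprime(x + h/2, y + h*k1/2)
    k3 = yprime(x + h/2, y + h*k2/2)
    k4 = yprime(x + h, y + h*k3)
    y += h * (k1 + 2*k2 + 2*k3 + k4) / 6
    x += h

z = x - y                            # stays in (-pi/2, pi/2) on this run
residual = abs(math.tan(z) + 1 / math.cos(z) - x - (math.sqrt(2) - 1))
```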

We now make a study of one last specialized kind of first-order differential equation that nonetheless arises often in applications. The method of solution again makes use of a specific substitution.

Definition 2.22. A Bernoulli equation is a first-order ODE of the form

y′ + P(x)y = Q(x)yⁿ    (2.39)

for some n ∈ R, where P(x) and Q(x) are continuous on an open interval I.


The substitution to make here is v = y^(1−n), which can also be expressed as yⁿ = y/v. Differentiating with respect to x gives

v′ = (1 − n)y^(−n)y′,

so that

y′ = (yⁿ/(1 − n))v′ = ((y/v)/(1 − n))v′ = yv′/((1 − n)v),

and (2.39) becomes

yv′/((1 − n)v) + P(x)y = Q(x)(y/v).

Dividing both sides of this equation by y, we get

v′/((1 − n)v) + P(x) = Q(x)(1/v),

and finally a linear equation in standard form is obtained with one more manipulation:

v′ + (1 − n)P(x)v = (1 − n)Q(x).    (2.40)

Indeed if we let

P̃(x) = (1 − n)P(x)   and   Q̃(x) = (1 − n)Q(x),

then (2.40) can be written as v′ + P̃(x)v = Q̃(x).

It should be noted that the zero function y ≡ 0 is a solution to any Bernoulli equation with n ≠ 0, and so in the examples that follow no mention of it will be made.
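The change of variables can be validated numerically (a sketch using an illustrative Bernoulli equation chosen here, y′ + y = y³, so P = Q = 1 and n = 3): solve the original equation and the transformed linear equation (2.40) side by side, and compare v with y^(1−n):

```python
def rk4_step(f, x, y, h):            # one fourth-order Runge-Kutta step
    k1 = f(x, y)
    k2 = f(x + h/2, y + h*k1/2)
    k3 = f(x + h/2, y + h*k2/2)
    k4 = f(x + h, y + h*k3)
    return y + h * (k1 + 2*k2 + 2*k3 + k4) / 6

n = 3                                # illustrative: y' + y = y^3 (P = Q = 1)
fy = lambda x, y: -y + y**n          # the Bernoulli equation solved for y'
fv = lambda x, v: (1 - n) * (1 - v)  # the linear equation (2.40) solved for v'

x, y, v, h = 0.0, 0.5, 0.5 ** (1 - n), 0.01   # v(0) = y(0)^{1-n} = 4
for _ in range(100):                 # march both equations to x = 1
    y = rk4_step(fy, x, y, h)
    v = rk4_step(fv, x, v, h)
    x += h
gap = abs(v - y ** (1 - n))
```

In this instance the linear equation has the exact solution v = 1 + 3e^{2x}, so both integrations should land near v(1) = 1 + 3e² ≈ 23.17.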

Example 2.23. Solve

y′ + y/(x − 2) − 5(x − 2)√y = 0.

Solution. First rewrite the equation as

y′ + (1/(x − 2))y = 5(x − 2)y^(1/2),

which can be seen to be a Bernoulli equation with n = 1/2,

P(x) = 1/(x − 2),   and   Q(x) = 5(x − 2).

Making the substitution v = y^(1−1/2) = y^(1/2), the equation becomes, by (2.40),

v′ + (1/(2(x − 2)))v = 5(x − 2)/2.

An integrating factor for this linear equation is given by

µ(x) = exp(∫ 1/(2(x − 2)) dx) = exp((1/2) ln|x − 2|) = √|x − 2|,


where we cannot wave away the absolute value bars since it's left open whether x lies in (−∞, 2) or (2, ∞). (Note that x ≠ 2 in any case, since x = 2 would entail division by 0 in the original ODE.) By equation (2.16) we obtain

v(x) = (1/√|x − 2|)[∫ (5/2)(x − 2)√|x − 2| dx + c].

Assuming that x > 2 yields

v(x) = (1/√(x − 2))[(5/2) ∫ (x − 2)√(x − 2) dx + c] = (1/√(x − 2))[(x − 2)^(5/2) + c],

and assuming that x < 2 yields

v(x) = (1/√(2 − x))[(5/2) ∫ (x − 2)√(2 − x) dx + c] = (1/√(2 − x))[(2 − x)^(5/2) + c].

Combining these results, we conclude that

√(y(x)) = (1/√|x − 2|)[|x − 2|^(5/2) + c] = (x − 2)² + c/√|x − 2|,

and therefore

y(x) = [(x − 2)² + c/√|x − 2|]²

is the general explicit solution. □
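As a final check (numerical sketch for the branch x > 2 with c = 1), substitute the explicit solution back into the original equation, estimating y′ by a central difference:

```python
def y(x, c=1.0):                     # the solution on the branch x > 2
    return ((x - 2) ** 2 + c / (x - 2) ** 0.5) ** 2

h = 1e-5
residual = 0.0
for x in (2.5, 3.0, 4.0):
    yp = (y(x + h) - y(x - h)) / (2 * h)        # numerical derivative
    residual = max(residual,
                   abs(yp + y(x) / (x - 2) - 5 * (x - 2) * y(x) ** 0.5))
```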


3  First-Order Applications

3.1 – Growth and Decay

Problems concerning the decay of one or more radioactive isotopes also lend themselves to compartmental analysis. In the simplest case there is a single radioactive substance that decays into some other element, so that a "compartment" is a sample of the substance. No quantity of the substance is entering the compartment, but gradually over time atoms of the substance are leaving (i.e. fissioning into other material). Empirical data make one thing clear: the rate of decay of a radioactive substance is directly proportional to the amount of the substance that is present. Thus, if x(t) is the amount of the substance (in grams, say), then x′(t) is the rate of change of the amount of the substance, with

x′(t) = kx(t)

for some constant of proportionality k < 0 that depends on the isotope under consideration. The half-life of a radioactive isotope, which is the time it takes for half of it to decay, can vary from nanoseconds to gigayears.

Example 3.1. Cobra Commander has 260 grams of kaboomium-320 (chemical symbol 320Ka) in the basement of his secret hideout. Upon returning from a carefree five-hour drive with Destro in the countryside in his spiffy new Nissan Cube, he finds that 192 grams remain. After how many hours will only 10 grams remain?

Solution. Let x(t) be the amount of 320Ka in grams at time t in hours, where x(0) = 260 and x(5) = 192. The rate at which 320Ka decays is proportional to the amount present, which is to say that x′(t) = kx(t) for some constant k. This is a separable ODE which becomes

∫ (1/x) dx = ∫ k dt,

and thus ln|x| = kt + c. Of course, x(t) is never negative, so we may write ln(x) = kt + c, whence we obtain x(t) = e^{kt+c} = x0e^{kt} (letting x0 = e^c). From the initial condition x(0) = 260 comes 260 = x0e^0, so that x0 = 260 and hence

x(t) = 260e^{kt}.    (3.1)


It remains to find k. Fortunately we have another bit of information available: x(5) = 192. Putting this into (3.1) gives 192 = 260e^{5k}, whence

k = 0.2 ln(192/260) ≈ −0.0606,

and we fully determine x(t) to be

x(t) = 260e^{−0.0606t}.

With this model in hand we can ascertain how many hours it will be before only 10 g of 320Ka remains for Cobra Commander to play with: set x(t) = 10 to get

260e^{−0.0606t} = 10,

and thus

t = (1/(−0.0606)) ln(1/26) ≈ 53.8.

That is, after about 53.8 hours only 10 g of kaboomium-320 will remain. We assume that Cobra Commander knows his business. And knowing is half the battle! □
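The arithmetic reproduces in a couple of lines (a sketch of the computation just carried out):

```python
import math

k = math.log(192 / 260) / 5       # from 192 = 260 e^{5k}; k is about -0.0606
t10 = math.log(10 / 260) / k      # solve 260 e^{kt} = 10 for t
```

The small difference between t10 ≈ 53.7 here and the 53.8 above comes from rounding k to −0.0606 before the final step in the worked solution.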

Example 3.2. A dead butler was found in a mansion where the temperature is kept at a constant 68°F. At the time it was discovered, the core temperature of the body was determined to be 83°F. One hour later a second measurement showed the core temperature to be 77°F. Assume the core temperature at the time of death was 98.6°F. How much time has elapsed between the time of death and the time the body was found?

Solution. Newton's Law of Cooling states that T′(t) = k[T(t) − M], where M = 68 is the temperature of the mansion. From this equation we obtain

∫ 1/(T − 68) dT = ∫ k dt  ⇒  T(t) = 68 + Ce^{kt}.

Letting t = 0 be the time of death and t = τ the time of discovery, we're given T(0) = 98.6, T(τ) = 83, and T(τ + 1) = 77. With T(0) = 98.6 we find that C = 30.6, so

T(t) = 68 + 30.6e^{kt}.

Now, with T(τ) = 83 and T(τ + 1) = 77 we obtain

83 = 68 + 30.6e^{kτ}  ⇒  e^{kτ} = 25/51    (3.2)

and

77 = 68 + 30.6e^{k(τ+1)} = 68 + 30.6e^{kτ}e^k,    (3.3)

respectively. Substituting (3.2) into (3.3) gives

77 = 68 + 30.6(25/51)e^k,

which solves to give k = ln 0.6. Putting this into (3.2) yields

e^{τ ln 0.6} = 25/51  ⇒  τ ≈ 1.40 hr.


Therefore about 1.40 hours elapsed between the time of the butler's death and the time the body was found. □
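The solution's arithmetic can be replayed directly (a sketch of the computation above):

```python
import math

k = math.log(0.6)                 # from 77 = 68 + 30.6 (25/51) e^k, since 30.6(25/51) = 15
tau = math.log(25 / 51) / k       # from e^{k tau} = 25/51
```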


3.2 – Compartmental Analysis

For a first example we shall analyze a one-compartment system, and then later consider a two-compartment system.

Example 3.3. A brine solution (a mixture of salt and water) flows at a constant rate of 4 L/min into a tank that initially contains 100 L of pure water. The solution inside the tank is kept well-stirred and flows out of the tank at a rate of 3 L/min. If the concentration of salt in the brine entering the tank is 0.2 kg/L, determine the mass of the salt in the tank after t minutes, and find when the concentration of salt in the tank will be 0.1 kg/L.

Solution. Let x(t) be the mass of salt, in kilograms, that is in the tank at time t. Since the tank is initially filled with fresh water we know that x(0) = 0. See Figure 5.

In order to determine x(t) we will use what we know about the rate at which x(t) changes over time, which is x′(t). An expression for x′(t) will be given as the rate salt enters the tank minus the rate salt leaves the tank (in kilograms per minute). To do this we first need to know the volume of the solution in the tank at time t. The volume is given to be 100 L initially, and since 4 L of liquid enters the tank and 3 L leaves with each minute, we can see that at time t the volume of solution in the tank must be 100 + t liters.

Now, the rate salt enters the tank is easily reckoned: 4 liters of brine is entering per minute, there's 0.2 kg of salt per liter, and so a total of 0.8 kg of salt is entering per minute. As for the rate salt leaves, at time t there's x(t) kg of salt in the tank, and we assume that it is uniformly dissolved throughout the 100 + t liters of solution to give a concentration of x(t)/(100 + t) kg of salt per liter. Since 3 liters of solution is leaving per minute, we conclude that 3x(t)/(100 + t) kg of salt is leaving per minute. The derivation of x′(t) is as follows:

x′(t) = (rate salt enters the tank) − (rate salt leaves the tank)
      = (0.2 kg / 1 L)(4 L / 1 min) − (x(t) kg / (100 + t) L)(3 L / 1 min)
      = 0.8 − 3x(t)/(100 + t).

Thus we have a linear first-order ODE:

x′ + (3/(t + 100))x = 0.8.

[Figure 5: a tank holding x(t) kg of salt, with x(0) = 0; inflow 4 L/min of brine at 0.2 kg/L, outflow 3 L/min.]


To solve this equation, we multiply by the integrating factor

µ(t) = exp(∫ 3/(t + 100) dt) = e^{3 ln|t+100|} = (t + 100)³

to obtain

(t + 100)³x′ + 3(t + 100)²x = 0.8(t + 100)³,

which becomes

[(t + 100)³x]′ = 0.8(t + 100)³

and thus

(t + 100)³x = ∫ 0.8(t + 100)³ dt = (0.8/4)(t + 100)⁴ + c.

From this we get a general explicit solution to the ODE,

x(t) = (1/5)(t + 100) + c/(t + 100)³.

To determine c we use the initial condition x(0) = 0:

0 = (1/5)(0 + 100) + c/(0 + 100)³  ⇒  c/100³ = −20  ⇒  c = −2 × 10⁷.

Thus, the mass of salt in the tank at time t is given by

x(t) = (1/5)(t + 100) − (2 × 10⁷)/(t + 100)³.

The concentration, C(t), of salt at time t is given by the mass x(t) at time t divided by the volume t + 100 at time t. That is, C(t) = x(t)/(t + 100), so that

C(t) = 0.2 − (2 × 10⁷)/(t + 100)⁴.

Now, to find when the concentration of salt is 0.1 kg/L we solve C(t) = 0.1, giving the equation

0.1 = 0.2 − (2 × 10⁷)/(t + 100)⁴.

From this we find that t = 100·2^(1/4) − 100 ≈ 18.9 minutes. □

The following example illustrates a physical system consisting of two compartments. The general approach is to analyze the first compartment, then use the information garnered to analyze the second compartment.

Example 3.4. Beginning at time t = 0, fresh water is pumped at a rate of 3 L/min into a well-stirred tank that is initially filled with 60 L of brine. The increasingly less concentrated salt solution flows at a rate of 3 L/min out a drain that feeds into a second tank initially filled with 60 L of pure water. The resultant mixture of water and salt in the second tank, which is also well-stirred, is pumped into the ocean at a rate of 3 L/min. Find the time when the concentration of salt in the second tank is greatest, and compare the maximal concentration in the second tank to the initial concentration in the first tank.


[Figure 6: Tank 1 holds 60 L of solution with x(t) kg salt, x(0) = x0, and receives 3 L/min of fresh water (0 kg/L); its 3 L/min outflow at concentration (1/60)x(t) kg/L feeds Tank 2, which holds 60 L of solution with y(t) kg salt, y(0) = 0, and drains at 3 L/min at concentration (1/60)y(t) kg/L.]

Solution. It is not actually necessary to know how much salt is initially in Tank 1 to determine when the concentration of salt in Tank 2 is greatest. Let x(t) be the number of kilograms of salt in Tank 1 at time t, and let y(t) be the number of kilograms of salt in Tank 2 at time t. We have x(0) = x0 for some constant x0, and also y(0) = 0 since the water in Tank 2 is initially pure. See Figure 6. Now, noting that the volume of solution in Tank 1 is a constant 60 L, we have

x′(t) = (rate salt enters Tank 1) − (rate salt leaves Tank 1)
      = 0 − (x(t) kg / 60 L)(3 L / 1 min) = −3x(t)/60,

which yields the equation x′ = −(1/20)x, also written as dx/dt = −x/20. This equation is separable, giving

∫ (20/x) dx = −∫ dt,

and hence

ln(x^20) = −t + c0

for arbitrary constant c0. Exponentiating both sides and letting c1 = e^{c0} be an arbitrary positive constant, we obtain

x^20 = e^{−t+c0} = c1e^{−t},

and then x(t) = c1^(1/20) e^{−t/20}; renaming the positive constant c1^(1/20) as c1, we have x(t) = c1e^{−t/20}. Using the initial condition x(0) = x0, we substitute t = 0 and x = x0 into the equation to get x0 = c1e^0 = c1, and thus

x(t) = x0e^{−t/20}.    (3.4)

Now we turn our attention to Tank 2. Since the volume of solution in Tank 2 is always 60 L, we have

$$y'(t) = (\text{rate salt enters Tank 2}) - (\text{rate salt leaves Tank 2}) = \left(\frac{x(t)\text{ kg}}{60\text{ L}}\right)\left(\frac{3\text{ L}}{1\text{ min}}\right) - \left(\frac{y(t)\text{ kg}}{60\text{ L}}\right)\left(\frac{3\text{ L}}{1\text{ min}}\right)$$


$$= \frac{x(t)}{20} - \frac{y(t)}{20} = \frac{x_0 e^{-t/20} - y(t)}{20},$$
where the last equality follows from (3.4). Hence we have the equation
$$y' + \frac{1}{20}y = \frac{x_0}{20}e^{-t/20}, \tag{3.5}$$

which is a first-order linear ODE and so can be solved by finding an appropriate integrating factor µ(t). We have

$$\mu(t) = \exp\left(\int \frac{1}{20}\,dt\right) = e^{t/20},$$
and so, multiplying (3.5) by $e^{t/20}$, we obtain
$$y'e^{t/20} + \frac{1}{20}ye^{t/20} = \frac{x_0}{20},$$
which can be written
$$\left(ye^{t/20}\right)' = \frac{x_0}{20},$$
and therefore
$$ye^{t/20} = \int \frac{x_0}{20}\,dt = \frac{x_0}{20}t + c.$$

Using the initial condition y(0) = 0, we substitute t = 0 and y = 0 into this equation to find that c = 0, and at last we have an expression for y(t):

$$y(t) = \frac{x_0}{20}te^{-t/20}.$$

The concentration of salt in Tank 2 at time t, C(t), is given by C(t) = y(t)/60; that is,

$$C(t) = \frac{x_0}{1200}te^{-t/20}.$$

To determine when the concentration is greatest, we must find t > 0 for which C(t) attains a global maximum value on (0,∞).4 This entails finding t for which C′(t) = 0; that is, we must solve

$$\frac{x_0}{1200}e^{-t/20} - \frac{x_0}{24{,}000}te^{-t/20} = 0.$$

Dividing through by the nonvanishing factor $(x_0/24{,}000)e^{-t/20}$, this equation immediately implies that $20 - t = 0$, and hence $t = 20$ minutes.

Finally, at time t = 20 minutes we find that the mass of salt in Tank 2 is

$$y(20) = \frac{x_0}{20}(20)e^{-20/20} = \frac{1}{e}x_0.$$

That is, Tank 2 is at most 1/e times as salty as Tank 1 was initially.

Observe that x(t) → 0 and y(t) → 0 as t → ∞, as is to be expected since fresh water is ultimately displacing brine in the system. ∎

4Note that, since the volume of solution in Tank 2 is a constant 60 L, we could just as well determine when y(t) attains a maximum.
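The computation above can be reproduced symbolically. The following sketch assumes the sympy library is available; the symbol x0 stands for the unknown initial mass of salt in Tank 1.

```python
import sympy as sp

t = sp.symbols("t", nonnegative=True)
x0 = sp.symbols("x0", positive=True)
x = sp.Function("x")
y = sp.Function("y")

# Tank 1: x' = -x/20 with x(0) = x0
x_sol = sp.dsolve(sp.Eq(x(t).diff(t), -x(t) / 20), x(t), ics={x(0): x0}).rhs
assert sp.simplify(x_sol - x0 * sp.exp(-t / 20)) == 0  # matches (3.4)

# Tank 2: y' = x(t)/20 - y/20 with y(0) = 0
y_sol = sp.dsolve(sp.Eq(y(t).diff(t), x_sol / 20 - y(t) / 20), y(t),
                  ics={y(0): 0}).rhs
assert sp.simplify(y_sol - x0 * t * sp.exp(-t / 20) / 20) == 0

# The concentration C(t) = y(t)/60 is maximized where C'(t) = 0
C = y_sol / 60
t_max = sp.solve(sp.Eq(C.diff(t), 0), t)
print(t_max)  # [20]
```

Substituting t = 20 into y_sol then returns x0·e⁻¹, in agreement with the example.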


4 Higher-Order Equations

4.1 – Linear Independence of Functions

In linear algebra the notion of linear independence arises frequently in the context of vector spaces. Letting 0 denote the zero element of a vector space V over R, we say the vectors v1, . . . , vn ∈ V are linearly independent if

$$\sum_{k=1}^{n} c_k \mathbf{v}_k = \mathbf{0}$$
implies that $c_k = 0$ for all $1 \le k \le n$.

The vectors in a vector space can be functions. If I ⊆ R is an interval, then we denote by VI

the vector space over R consisting of functions f : I → R. Given f ∈ VI and c ∈ R, we define scalar multiplication of c with f as yielding a new function cf ∈ VI given by
$$(cf)(x) = cf(x)$$
for all x ∈ I. If f, g ∈ VI, we define addition of f with g as yielding a new function f + g ∈ VI given by
$$(f + g)(x) = f(x) + g(x)$$

for all x ∈ I. These operations are consonant with conventions established in elementary algebra.

If an interval I is not specified at the outset of an analysis involving real-valued functions f1, . . . , fn of a single real variable, then we take
$$I = \bigcap_{i=1}^{n} \operatorname{Dom}(f_i) = \operatorname{Dom}(f_1) \cap \cdots \cap \operatorname{Dom}(f_n)$$

provided that this results in an interval, and we carry out the analysis in the vector space VI .A linear combination of f1, . . . , fn ∈ VI is an expression of the form

n∑k=1

ckfk


for some choice of constants c1, c2, . . . , cn ∈ R, which of course is itself a function in VI given by
$$\left(\sum_{k=1}^{n} c_k f_k\right)(x) = \sum_{k=1}^{n} c_k f_k(x)$$

for all x ∈ I. Setting F = {f1, . . . , fn}, we define
$$\operatorname{Span}(F) = \left\{\sum_{k=1}^{n} c_k f_k : c_1, \ldots, c_n \in \mathbb{R}\right\}$$
to be the set of all linear combinations of f1, . . . , fn.

As in the past, we say a function f is identically equal to another function g on I, written f ≡ g on I, if f(x) = g(x) for all x ∈ I. In particular if f(x) = 0 for all x ∈ I, then we write f ≡ 0 on I, with the symbol 0 on the right-hand side of the identity denoting the zero function.

Definition 4.1. If f1, . . . , fn are functions with common domain I ⊆ R, then f1, . . . , fn are linearly independent on I if
$$\sum_{k=1}^{n} c_k f_k \equiv 0$$
on I implies that ck = 0 for all 1 ≤ k ≤ n.

If functions f1, . . . , fn are linearly independent on I and we define F = {f1, . . . , fn}, then we say F is a linearly independent set on I. Functions that are not linearly independent on I are said to be linearly dependent on I. Thus, f1, . . . , fn are linearly dependent on I if there can be found constants c1, . . . , cn ∈ R, not all zero, such that (c1f1 + · · · + cnfn)(x) = 0 for all x ∈ I. Suppose for instance that c1f1 + · · · + cnfn ≡ 0 on I with cm ≠ 0. Then for all x ∈ I we have
$$c_1 f_1(x) + \cdots + c_m f_m(x) + \cdots + c_n f_n(x) = 0,$$
whence
$$f_m(x) = -\frac{c_1}{c_m}f_1(x) - \cdots - \frac{c_{m-1}}{c_m}f_{m-1}(x) - \frac{c_{m+1}}{c_m}f_{m+1}(x) - \cdots - \frac{c_n}{c_m}f_n(x),$$
or more compactly
$$f_m(x) = -\sum_{k \ne m} \frac{c_k}{c_m} f_k(x).$$

Thus we see that if f1, . . . , fn are linearly dependent on I, then at least one of the functions can be written as a linear combination of the others.

If one of the functions f1, . . . , fn happens to be the zero function, then it follows that f1, . . . , fn are linearly dependent on any interval I. To see this, suppose that fm ≡ 0 for some 1 ≤ m ≤ n, and let I ⊆ R be any interval. For each k ≠ m set ck = 0, and let cm = 1. For any x ∈ I we have fm(x) = 0, and thus
$$(c_1 f_1 + \cdots + c_n f_n)(x) = \sum_{k=1}^{n} c_k f_k(x) = c_m f_m(x) + \sum_{k \ne m} c_k f_k(x) = 1 \cdot 0 + \sum_{k \ne m} 0 \cdot f_k(x) = 0.$$


That is,
$$c_1 f_1 + \cdots + c_n f_n \equiv 0$$
on I, and since not all the constants c1, . . . , cn are zero we conclude that f1, . . . , fn are linearly dependent on I.

Example 4.2. Determine whether the functions $\varphi(x) = xe^{2x}$ and $\psi(x) = e^{2x}$ are linearly independent on (−∞,∞).

Solution. Suppose c1 and c2 are constants such that c1ϕ + c2ψ ≡ 0 on (−∞,∞), meaning
$$c_1 x e^{2x} + c_2 e^{2x} = 0$$
for all x ∈ R. Substituting x = 0 immediately gives c2 = 0. Substituting x = 1 then gives $c_1 e^2 = 0$, and hence c1 = 0 also. Thus c1ϕ + c2ψ ≡ 0 on (−∞,∞) necessarily implies that c1 = c2 = 0, and therefore ϕ and ψ are linearly independent on (−∞,∞). ∎

Example 4.3. Determine whether the functions $f(x) = x^2$, $g(x) = 6x^2 - 1$, and $h(x) = 2x^2 + 3$ are linearly independent on (−∞,∞).

Solution. Suppose c1, c2, c3 are constants such that c1f + c2g + c3h ≡ 0 on (−∞,∞). That is,
$$c_1 f(x) + c_2 g(x) + c_3 h(x) = 0 \tag{4.1}$$
for all x ∈ R, which is to say
$$c_1 x^2 + c_2(6x^2 - 1) + c_3(2x^2 + 3) = 0$$
for all x ∈ R. Rewrite this as
$$(c_1 + 6c_2 + 2c_3)x^2 + (-c_2 + 3c_3) = 0,$$
and note that if we let c2 = 3 and c3 = 1, then the constant term −c2 + 3c3 is eliminated and we have 6c2 + 2c3 = 20. Now all we need do is set c1 = −20 to also eliminate the $x^2$ term. That is, if we choose c1 = −20, c2 = 3, and c3 = 1, then (4.1) is satisfied for all x ∈ R. Therefore f, g, and h are linearly dependent on (−∞,∞). ∎
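The search for constants in Example 4.3 can be automated: treat $c_1 f + c_2 g + c_3 h$ as a polynomial in x and require every coefficient to vanish. A sketch assuming sympy:

```python
import sympy as sp

x, c1, c2, c3 = sp.symbols("x c1 c2 c3")
f, g, h = x**2, 6 * x**2 - 1, 2 * x**2 + 3

# c1*f + c2*g + c3*h is the zero function iff every polynomial coefficient vanishes
coeffs = sp.Poly(c1 * f + c2 * g + c3 * h, x).coeffs()
sols = sp.solve(coeffs, [c1, c2, c3], dict=True)
print(sols)  # a one-parameter family of nontrivial solutions: dependence

# Taking c3 = 1 as the free parameter recovers the coefficients found above
sol = sols[0]
print(sol[c1].subs(c3, 1), sol[c2].subs(c3, 1))  # -20 3
```

A solution set with a free parameter means nontrivial combinations exist, so the three functions are linearly dependent.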

Example 4.4. Show the functions $f(x) = x^3$ and $g(x) = |x|^3$ are linearly dependent on [0,∞) and (−∞, 0], but not on (−∞,∞).

Solution. For all x ∈ [0,∞) we have $|x|^3 = x^3$, so that g ≡ f on [0,∞), and hence f − g ≡ 0 on [0,∞). This shows that c1f + c2g ≡ 0 on [0,∞) is satisfied if c1 = 1 and c2 = −1, and therefore f and g are linearly dependent on [0,∞).

For all x ∈ (−∞, 0] we have $|x|^3 = -x^3$, so that g ≡ −f on (−∞, 0], and hence f + g ≡ 0 on (−∞, 0]. This shows that c1f + c2g ≡ 0 on (−∞, 0] is satisfied if c1 = 1 and c2 = 1, and therefore f and g are linearly dependent on (−∞, 0].

Now suppose c1 and c2 are such that c1f + c2g ≡ 0 on (−∞,∞). In particular this implies that
$$c_1 f(1) + c_2 g(1) = 0 \quad\text{and}\quad c_1 f(-1) + c_2 g(-1) = 0,$$


giving c1 + c2 = 0 and −c1 + c2 = 0. Adding these equations yields 2c2 = 0 and thus c2 = 0. Substituting this into c1 + c2 = 0 then gives c1 = 0, and thus c1 = c2 = 0 is the necessary conclusion. Therefore f and g are linearly independent on (−∞,∞). ∎

One result from linear algebra that we shall have frequent need of is the following theorem, a more comprehensive version of which is found in §5.3 of the Linear Algebra Notes along with a complete proof. Recall that C denotes the set of complex numbers, and R ⊆ C.

Theorem 4.5 (Invertible Matrix Theorem). Let A be an n × n matrix with entries in C. Then the following statements are equivalent.

1. A is invertible.
2. The row vectors of A are linearly independent.
3. The column vectors of A are linearly independent.
4. The system Ax = y has a unique solution for each y ∈ Cⁿ.
5. The system Ax = 0 has only the trivial solution.
6. det(A) ≠ 0.

We employ this theorem presently to devise an alternate means of determining whether a collection of functions satisfying certain differentiability requirements is linearly independent on an interval I. Toward that end the following special kind of determinant will help.

Definition 4.6. Let f1, . . . , fn be functions that have derivatives of all orders up to n − 1 on an interval I. The Wronskian of f1, . . . , fn is the function W[f1, . . . , fn] : I → R given by
$$W[f_1,\ldots,f_n](x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}$$
for all x ∈ I.

Remark. The interval I in Definition 4.6 is not required to be open, and thus (as ever) it is understood that any derivative being considered at an endpoint of I will be the appropriate one-sided derivative.

It will be convenient to have a symbol for the matrix whose entries correspond to the entries of the Wronskian determinant in Definition 4.6, and so we define
$$\mathbf{M}[f_1,\ldots,f_n](x) = \begin{pmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{pmatrix}.$$
Thus W[f1, . . . , fn](x) is the determinant of M[f1, . . . , fn](x):
$$W[f_1,\ldots,f_n](x) = \det\bigl(\mathbf{M}[f_1,\ldots,f_n](x)\bigr).$$

Another notational convenience: if F = {f1, . . . , fn}, then we shall sometimes denote W[f1, . . . , fn] and M[f1, . . . , fn] by W[F] and M[F].
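This determinant can be formed by machine. The sympy library (assumed available here) ships a `wronskian` helper; applying it to the functions of Example 4.2:

```python
import sympy as sp

x = sp.symbols("x")
phi = x * sp.exp(2 * x)   # the functions of Example 4.2
psi = sp.exp(2 * x)

# wronskian([f1, ..., fn], x) forms det M[f1, ..., fn](x)
W = sp.simplify(sp.wronskian([phi, psi], x))
print(W)  # -exp(4*x)
```

Since this Wronskian is never zero, the two functions are linearly independent on every interval, in agreement with Example 4.2.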


Theorem 4.7. Let F = {f1, . . . , fn} be a set of functions having derivatives of all orders up to n − 1 on an interval I. If F is a linearly dependent set on I, then W[F] ≡ 0 on I.

Proof. Suppose F is a linearly dependent set on I, so there exist c1, . . . , cn, not all zero, such that
$$\sum_{k=1}^{n} c_k f_k \equiv 0$$
on I. The derivative of the zero function is the zero function, and therefore we obtain a homogeneous system of linear equations
$$\begin{aligned} c_1 f_1(x) + \cdots + c_n f_n(x) &= 0 \\ c_1 f_1'(x) + \cdots + c_n f_n'(x) &= 0 \\ &\;\;\vdots \\ c_1 f_1^{(n-1)}(x) + \cdots + c_n f_n^{(n-1)}(x) &= 0 \end{aligned}$$

for each x ∈ I. Defining
$$\mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$
the system may be written as the matrix equation
$$\bigl(\mathbf{M}[F](x)\bigr)\mathbf{c} = \mathbf{0},$$
where c ≠ 0 since c1, . . . , cn are not all zero. Thus the matrix equation has a nontrivial solution, and by the Invertible Matrix Theorem it follows that W[F](x) = 0 for each x ∈ I. ∎

It is immediate from Theorem 4.7 that if W[f1, . . . , fn](x0) ≠ 0 for some x0 ∈ I, then {f1, . . . , fn} is a linearly independent set on I. The following corollary merely states the next logical conclusion.

Corollary 4.8. If W[f1, . . . , fn](x0) ≠ 0, then {f1, . . . , fn} is a linearly independent set on any interval I containing x0 on which f1, . . . , fn have derivatives of all orders up to n − 1.

Example 4.9. The converse of Theorem 4.7 is not true in general. That is, if W[f1, . . . , fn] ≡ 0 on I, it does not necessarily follow that {f1, . . . , fn} is a linearly dependent set on I. Consider, for example, the functions $f(x) = x^3$ and $g(x) = |x|^3$ of Example 4.4. For x > 0 we have $f(x) = g(x) = x^3$, and then
$$W[f,g](x) = \begin{vmatrix} f(x) & g(x) \\ f'(x) & g'(x) \end{vmatrix} = \begin{vmatrix} x^3 & x^3 \\ 3x^2 & 3x^2 \end{vmatrix} = 3x^5 - 3x^5 = 0.$$
For x < 0 we have
$$W[f,g](x) = \begin{vmatrix} x^3 & -x^3 \\ 3x^2 & -3x^2 \end{vmatrix} = -3x^5 - (-3x^5) = 0.$$


Now what of x = 0? We easily find that $f'(0) = 3(0)^2 = 0$, but for W[f, g](0) to be defined it is required that g′(0) be defined. By definition,
$$g'(0) = \lim_{h \to 0} \frac{g(h) - g(0)}{h} = \lim_{h \to 0} \frac{|h|^3}{h},$$

which we shall evaluate by evaluating the two one-sided limits. The right-hand derivative is
$$g'_+(0) = \lim_{h \to 0^+} \frac{|h|^3}{h} = \lim_{h \to 0^+} \frac{h^3}{h} = \lim_{h \to 0^+} h^2 = 0,$$

and the left-hand derivative is
$$g'_-(0) = \lim_{h \to 0^-} \frac{|h|^3}{h} = \lim_{h \to 0^-} \frac{-h^3}{h} = \lim_{h \to 0^-} (-h^2) = 0.$$

Now $g'_+(0) = g'_-(0) = 0$ implies g′(0) = 0. We compute
$$W[f,g](0) = \begin{vmatrix} f(0) & g(0) \\ f'(0) & g'(0) \end{vmatrix} = \begin{vmatrix} 0 & 0 \\ 0 & 0 \end{vmatrix} = 0,$$
and conclude that W[f, g] ≡ 0 on (−∞,∞). Yet in Example 4.4 it was found that f and g are linearly independent on (−∞,∞). ∎


4.2 – The Theory of Linear Equations

An nth-order linear differential equation is an equation of the form

$$a_n(t)y^{(n)}(t) + a_{n-1}(t)y^{(n-1)}(t) + \cdots + a_1(t)y'(t) + a_0(t)y(t) = f(t), \tag{4.2}$$

where an(t) is not the zero function. If f ≡ 0, then we say (4.2) is homogeneous; otherwise it is nonhomogeneous, and we call f(t) the nonhomogeneity.5 We call

$$a_n(t)y^{(n)}(t) + a_{n-1}(t)y^{(n-1)}(t) + \cdots + a_1(t)y'(t) + a_0(t)y(t) = 0 \tag{4.3}$$

the reduced equation for (4.2).

If I is an interval containing no zeros of an(t), then (4.2) has an equivalent standard form on I,

$$y^{(n)}(t) + \frac{a_{n-1}(t)}{a_n(t)}y^{(n-1)}(t) + \cdots + \frac{a_1(t)}{a_n(t)}y'(t) + \frac{a_0(t)}{a_n(t)}y(t) = \frac{f(t)}{a_n(t)}, \tag{4.4}$$

obtained by dividing by an(t). (By "equivalent" we mean here that (4.2) and (4.4) have the same solution set on I.) Defining

$$b_{n-1}(t) = \frac{a_{n-1}(t)}{a_n(t)}, \quad \ldots, \quad b_1(t) = \frac{a_1(t)}{a_n(t)}, \quad b_0(t) = \frac{a_0(t)}{a_n(t)}, \quad g(t) = \frac{f(t)}{a_n(t)},$$

we then may write (4.4) as
$$y^{(n)}(t) + b_{n-1}(t)y^{(n-1)}(t) + \cdots + b_1(t)y'(t) + b_0(t)y(t) = g(t),$$

as is common practice.

If a function ϕ : I → R satisfies
$$a_n(t)\varphi^{(n)}(t) + a_{n-1}(t)\varphi^{(n-1)}(t) + \cdots + a_1(t)\varphi'(t) + a_0(t)\varphi(t) = f(t)$$

for all t ∈ I, then ϕ(t) is a solution to (4.2) on I. The general solution to (4.2) on I is the set of all solutions to the ODE on I. There is a robust theory pertaining to solutions to linear differential equations, at least on intervals where all coefficient functions ak(t) and the nonhomogeneity f(t) are continuous. The following existence-uniqueness theorem provides the basis for this theory, though its proof is deferred to a later time.

Theorem 4.10 (Existence-Uniqueness). Suppose a0(t), . . . , an(t) and f(t) are continuous on an open interval I, with an(t) having no zeros in I. If t0 ∈ I, then for any ξ0, ξ1, . . . , ξn−1 ∈ R there exists a unique solution on I to the initial-value problem
$$\sum_{k=0}^{n} a_k(t)y^{(k)}(t) = f(t), \qquad y(t_0) = \xi_0,\; y'(t_0) = \xi_1,\; \ldots,\; y^{(n-1)}(t_0) = \xi_{n-1}.$$

Remark. Henceforth, whenever considering the linear ODE (4.2), we will always assume there exists some open interval in R on which all coefficient functions ak(t) as well as f(t) are continuous, thereby obviating the need to replicate much of the first sentence of Theorem 4.10 in the statement of every theoretical result.

5This use of the word “homogeneous” is in no way related to how it was used in §2.6.


Let Λn be the linear differential operator defined by
$$\Lambda_n[\varphi] = a_n\varphi^{(n)} + a_{n-1}\varphi^{(n-1)} + \cdots + a_1\varphi' + a_0\varphi, \tag{4.5}$$
for any function ϕ that has derivatives up to order n on some open interval I, so that
$$\Lambda_n[\varphi](t) = a_n(t)\varphi^{(n)}(t) + a_{n-1}(t)\varphi^{(n-1)}(t) + \cdots + a_1(t)\varphi'(t) + a_0(t)\varphi(t).$$

Then (4.2) may be written more compactly as
$$\Lambda_n[y](t) = f(t),$$
or even more compactly as Λn[y] = f. Crucially, Λn is a linear operator, which is to say that for any constants c1, . . . , cm ∈ R and functions ϕ1, . . . , ϕm for which $\varphi_1^{(n)}, \ldots, \varphi_m^{(n)}$ are defined on some open interval I ⊆ R,
$$\Lambda_n\left[\sum_{k=1}^{m} c_k\varphi_k\right] = \sum_{k=1}^{m} c_k\Lambda_n[\varphi_k]$$
on I. This has the following important consequence.

Proposition 4.11 (Homogeneous Superposition Principle). If y1, . . . , ym are solutions to Λn[y] = 0 on I, then
$$\sum_{k=1}^{m} c_k y_k$$
is also a solution to Λn[y] = 0 on I for any c1, . . . , cm ∈ R.

Proof. Suppose y1, . . . , ym are m solutions to the homogeneous equation Λn[y] = 0 on I, meaning Λn[yk](t) = 0 for all 1 ≤ k ≤ m and t ∈ I. Then for any choice of constants c1, . . . , cm ∈ R we have, by the linearity of Λn,
$$\Lambda_n[c_1y_1 + \cdots + c_my_m](t) = c_1\Lambda_n[y_1](t) + \cdots + c_m\Lambda_n[y_m](t) = c_1 \cdot 0 + \cdots + c_m \cdot 0 = 0$$
for all t ∈ I. Therefore c1y1 + · · · + cmym is also a solution to Λn[y] = 0 on I. ∎

Now suppose Y = {y1, . . . , yn} is a set of solutions to Λn[y] = 0 on an open interval I that contains no zeros of the function an. If S represents the set of all solutions to Λn[y] = 0 on I (i.e. the general solution on I), then Span(Y) ⊆ S by the Homogeneous Superposition Principle; that is, every linear combination of the functions y1, . . . , yn is a solution to Λn[y] = 0 on I.

The question is, does equality Span(Y) = S hold? That is, is every solution to Λn[y] = 0 on I expressible as a linear combination of the functions y1, . . . , yn? To determine the answer, suppose ϕ ∈ S, so that ϕ is a solution to Λn[y] = 0 on I. In order to conclude that ϕ ∈ Span(Y), we must find constants c1, . . . , cn ∈ R such that ϕ = c1y1 + · · · + cnyn on I. Toward that end, consider that if, for some fixed τ ∈ I, we could find constants c1, . . . , cn such that
$$\begin{aligned} c_1y_1(\tau) + \cdots + c_ny_n(\tau) &= \varphi(\tau) \\ c_1y_1'(\tau) + \cdots + c_ny_n'(\tau) &= \varphi'(\tau) \\ &\;\;\vdots \\ c_1y_1^{(n-1)}(\tau) + \cdots + c_ny_n^{(n-1)}(\tau) &= \varphi^{(n-1)}(\tau) \end{aligned} \tag{4.6}$$


is satisfied, then both c1y1 + · · · + cnyn and ϕ would be solutions to the IVP
$$\Lambda_n[y](t) = 0, \qquad y(\tau) = \varphi(\tau),\; y'(\tau) = \varphi'(\tau),\; \ldots,\; y^{(n-1)}(\tau) = \varphi^{(n-1)}(\tau)$$
on I, and since Theorem 4.10 states that any solution on I to the initial-value problem must be unique, we would conclude that ϕ = c1y1 + · · · + cnyn on I, and therefore ϕ ∈ Span(Y). In short, S = Span(Y) if there is a solution (c1, . . . , cn) to the system (4.6) for some particular τ ∈ I.

What we require is a clear criterion which, when satisfied, guarantees the system (4.6) has a solution. Defining
$$\mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \quad\text{and}\quad \mathbf{y} = \begin{pmatrix} \varphi(\tau) \\ \vdots \\ \varphi^{(n-1)}(\tau) \end{pmatrix},$$
and using notation introduced in the previous section, the system (4.6) may be expressed simply as
$$\bigl(\mathbf{M}[Y](\tau)\bigr)\mathbf{c} = \mathbf{y}. \tag{4.7}$$

Now the Invertible Matrix Theorem (Theorem 4.5) implies that equation (4.7) possesses a solution if W[Y](τ) ≠ 0. Thus, in order to conclude that ϕ ∈ Span(Y), it is sufficient that W[Y](τ) ≠ 0 for some particular τ ∈ I. However, Theorem 4.7 informs us that no such τ can exist in I if y1, . . . , yn are linearly dependent on I. Therefore, in order for Span(Y) to have any prospect of equalling the general solution S to Λn[y] = 0 on I, it is necessary that Y be a linearly independent set of functions on I. But is this sufficient? The answer turns out to be yes, but to establish this we first need a couple more results.

Theorem 4.12. Let Y = {y1, . . . , yn} be a set of functions that are solutions to Λn[y] = 0 on an open interval I having no zeros of an(t). Then Y is linearly independent on I if and only if W[Y](t) ≠ 0 for all t ∈ I.

Proof. Theorem 4.7 makes clear that if W[Y](t) ≠ 0 for even a single value t ∈ I, then Y is linearly independent on I. It only remains to show that if Y is linearly independent on I, then W[Y](t) never equals 0 on I. Suppose that W[Y](τ) = 0 for some τ ∈ I. Then the Invertible Matrix Theorem implies the matrix A = M[Y](τ) is such that Ac = 0 has a nontrivial solution. That is, there exists c ≠ 0 satisfying Ac = 0, or equivalently there exist c1, . . . , cn not all equal to 0 for which the system
$$\begin{aligned} c_1y_1(\tau) + \cdots + c_ny_n(\tau) &= 0 \\ c_1y_1'(\tau) + \cdots + c_ny_n'(\tau) &= 0 \\ &\;\;\vdots \\ c_1y_1^{(n-1)}(\tau) + \cdots + c_ny_n^{(n-1)}(\tau) &= 0 \end{aligned} \tag{4.8}$$

is satisfied. Define two functions on I:
$$\varphi_1 = \sum_{k=1}^{n} c_k y_k \quad\text{and}\quad \varphi_2 = 0.$$

Both ϕ1 and ϕ2 are solutions to Λn[y] = 0 on I. Also, from (4.8) we have
$$\varphi_1(\tau) = 0,\; \varphi_1'(\tau) = 0,\; \ldots,\; \varphi_1^{(n-1)}(\tau) = 0,$$


and it is clear that
$$\varphi_2(\tau) = 0,\; \varphi_2'(\tau) = 0,\; \ldots,\; \varphi_2^{(n-1)}(\tau) = 0.$$
Hence ϕ1 and ϕ2 are both solutions to the initial value problem
$$\sum_{k=0}^{n} a_k(t)y^{(k)}(t) = 0, \qquad y(\tau) = 0,\; y'(\tau) = 0,\; \ldots,\; y^{(n-1)}(\tau) = 0,$$
and since any solution to such an IVP is unique by Theorem 4.10, it follows that ϕ1 = ϕ2 on I. That is,
$$\sum_{k=1}^{n} c_k y_k \equiv 0$$
on I. Now, because the ck values are not all equal to 0, we conclude that the functions y1, . . . , yn are not linearly independent on I, which is to say the set Y is linearly dependent on I. This finishes the proof. ∎

Definition 4.13. A fundamental set to Λn[y] = 0 on an interval I is a set {y1, . . . , yn} of n linearly independent solutions to Λn[y] = 0 on I.

Thus any linearly independent set of n functions on I that also happen to be solutions to an nth-order equation Λn[y] = 0 on I is a fundamental set to the equation on I. The conditions for a fundamental set to exist are light, as the next theorem shows.

Theorem 4.14. There is a fundamental set to Λn[y] = 0 on any open interval I where all ak(t) are continuous and an(t) has no zeros.

Proof. Fix t0 ∈ I. For each integer 0 ≤ k ≤ n − 1, by Theorem 4.10, there exists a unique solution yk to the initial-value problem consisting of the differential equation Λn[y] = 0 together with the initial conditions given by
$$y^{(j)}(t_0) = \begin{cases} 0, & \text{if } 0 \le j \le n-1 \text{ and } j \ne k \\ 1, & \text{if } j = k. \end{cases}$$

Thus Λn[yk] = 0, and for each 0 ≤ j ≤ n − 1 we have6 $y_k^{(j)}(t_0) = \delta_{jk}$. Since the n functions in the set {y0, . . . , yn−1} are solutions to Λn[y] = 0 on I, it remains only to show that they are linearly independent on I. This we do using Definition 4.1. Suppose constants c0, . . . , cn−1 are such that
$$\sum_{k=0}^{n-1} c_k y_k \equiv 0$$
on I. Then
$$\sum_{k=0}^{n-1} c_k y_k^{(j)} \equiv 0$$

6Recall the Kronecker delta function: δij = 0 if i ≠ j, and δii = 1.


on I for all 0 ≤ j ≤ n − 1, and since t0 ∈ I it follows that
$$0 = \sum_{k=0}^{n-1} c_k y_k^{(j)}(t_0) = \sum_{k=0}^{n-1} c_k \delta_{jk} = c_j$$
for all 0 ≤ j ≤ n − 1. Hence {y0, . . . , yn−1} is a set of n linearly independent solutions to Λn[y] = 0 on I, and therefore constitutes a fundamental set to Λn[y] = 0 on I. ∎
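The construction in this proof can be carried out concretely. For the equation $y'' + y = 0$ with $t_0 = 0$ (an equation chosen here only for illustration), the $\delta_{jk}$ initial conditions produce cos t and sin t; a sketch assuming sympy:

```python
import sympy as sp

t = sp.symbols("t")
y = sp.Function("y")
ode = sp.Eq(y(t).diff(t, 2) + y(t), 0)
dy = y(t).diff(t)

# y0 satisfies y(0) = 1, y'(0) = 0;  y1 satisfies y(0) = 0, y'(0) = 1
y0 = sp.dsolve(ode, y(t), ics={y(0): 1, dy.subs(t, 0): 0}).rhs
y1 = sp.dsolve(ode, y(t), ics={y(0): 0, dy.subs(t, 0): 1}).rhs
print(y0, y1)  # cos(t) sin(t)

# Their Wronskian is nonvanishing, confirming a fundamental set (Theorem 4.12)
print(sp.simplify(sp.wronskian([y0, y1], t)))  # 1
```

The nonvanishing Wronskian is exactly the independence criterion of Theorem 4.12.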

Theorem 4.15. Suppose an(t) has no zeros on an interval I. If Y = {y1, . . . , yn} is a fundamental set on I to Λn[y] = 0, then
$$\operatorname{Span}(Y) = \left\{\sum_{k=1}^{n} c_k y_k : c_1, \ldots, c_n \in \mathbb{R}\right\}$$
is the general solution to Λn[y] = 0 on I.

Proof. Suppose Y = {y1, . . . , yn} is a fundamental set on I to Λn[y] = 0, and let S denote the general solution on I. The Homogeneous Superposition Principle makes clear that Span(Y) ⊆ S. Let ϕ ∈ S. As discussed earlier, we will obtain ϕ ∈ Span(Y) if there exists τ ∈ I such that the system (4.6) has a solution (c1, . . . , cn). We found that this will be the case if W[Y](τ) ≠ 0 for some τ ∈ I, and since Theorem 4.12 ensures that W[Y](t) ≠ 0 for all t ∈ I, such a τ does indeed exist. It follows that ϕ ∈ Span(Y), and therefore Span(Y) = S. ∎

Theorem 4.15 implies that if S is the general solution to Λn[y] = 0 on an interval I containing no zeros of an(t), then the vector space dimension of S is dim(S) = n. Let Y = {y1, . . . , ym} be any set of m distinct (but not necessarily linearly independent) solutions to Λn[y] = 0 on I. If m < n, then dim(Span Y) ≤ m < n = dim(S), and since Span(Y) is a subspace of S it follows7 that Span(Y) ≠ S. That is, Λn[y] = 0 has solutions that are not expressible in the form
$$c_1y_1 + c_2y_2 + \cdots + c_my_m$$
for some choice of real numbers c1, c2, . . . , cm if m < n. The general solution to an nth-order linear ODE therefore cannot be generated by a set containing fewer than n linearly independent solutions to the ODE. If m > n, on the other hand, then it is possible that Span(Y) = S, but in this case Y cannot ever be a linearly independent set.

Example 4.16. Consider the linear ODE
$$\sqrt{1+t^3}\,y'' - t^2y' + y = 0.$$
Here $a_2(t) = \sqrt{1+t^3}$ has domain [−1,∞), and in particular has no zeros on (−1,∞). According to Theorem 4.14 the ODE has a fundamental set {y1, y2} on (−1,∞), and according to Theorem 4.15 the general solution to the ODE on (−1,∞) is given by c1y1 + c2y2. ∎

7See §3.6 of [LIN].


In the statement of the next theorem we make use of a specialized notation. If V is a vector space, W is a subspace of V, and v ∈ V, then we define the set
$$v + W = \{v + w : w \in W\},$$
called a coset of W in V.

Theorem 4.17. Suppose all ak(t) and f(t) are continuous on an interval I, and an(t) has no zeros on I. If yp(t) is a particular solution to Λn[y] = f(t) on I, and Y = {y1, . . . , yn} is a fundamental set on I to the reduced equation Λn[y] = 0, then
$$y_p + \operatorname{Span}(Y) = \left\{y_p + \sum_{k=1}^{n} c_k y_k : c_1, \ldots, c_n \in \mathbb{R}\right\}$$
is the general solution to Λn[y] = f(t) on I.

Proof. Let the set S denote the general solution to Λn[y] = f(t) on I. Suppose yp ∈ S, so that Λn[yp](t) = f(t) for all t ∈ I. Also suppose Y = {y1, . . . , yn} is a fundamental set to Λn[y] = 0 on I, meaning Λn[yk] = 0 on I for each 1 ≤ k ≤ n. If ϕ ∈ yp + Span(Y), then
$$\varphi = y_p + \sum_{k=1}^{n} c_k y_k$$
for some c1, . . . , cn ∈ R, so that
$$\Lambda_n[\varphi] = \Lambda_n\left[y_p + \sum_{k=1}^{n} c_k y_k\right] = \Lambda_n[y_p] + \sum_{k=1}^{n} c_k\Lambda_n[y_k] = f + \sum_{k=1}^{n} c_k \cdot 0 = f,$$

and thus ϕ ∈ S. This shows that yp + Span(Y) ⊆ S.

Next suppose that ϕ ∈ S, so Λn[ϕ] = f on I. Now,
$$\Lambda_n[\varphi - y_p] = \Lambda_n[\varphi] - \Lambda_n[y_p] = f - f = 0$$
shows that ϕ − yp is a solution to the reduced equation Λn[y] = 0 on I. Since Y is a fundamental set to Λn[y] = 0 on I, Theorem 4.15 implies that ϕ − yp ∈ Span(Y); that is,
$$\varphi - y_p = \sum_{k=1}^{n} c_k y_k$$
for some c1, . . . , cn ∈ R, and hence
$$\varphi = y_p + \sum_{k=1}^{n} c_k y_k \in y_p + \operatorname{Span}(Y).$$
Therefore S ⊆ yp + Span(Y), and the proof is done. ∎

The Homogeneous Superposition Principle proven earlier is just a special case of the following result, known simply as the Superposition Principle.


Theorem 4.18 (Superposition Principle). For each 1 ≤ k ≤ m let yk be a solution to
$$a_n(t)y^{(n)}(t) + a_{n-1}(t)y^{(n-1)}(t) + \cdots + a_1(t)y'(t) + a_0(t)y(t) = f_k(t)$$
on an interval Ik. Provided $I = \bigcap_{k=1}^{m} I_k \ne \varnothing$, then for any constants b1, . . . , bm the function $\sum_{k=1}^{m} b_k y_k$ is a solution to
$$a_n(t)y^{(n)}(t) + a_{n-1}(t)y^{(n-1)}(t) + \cdots + a_1(t)y'(t) + a_0(t)y(t) = \sum_{k=1}^{m} b_k f_k(t) \tag{4.9}$$
on I.

Proof. Recalling (4.5), for each 1 ≤ k ≤ m we have Λn[yk](t) = fk(t) for all t ∈ Ik. Thus for t ∈ I we obtain, by the linearity of the operator Λn,
$$\Lambda_n\left[\sum_{k=1}^{m} b_k y_k\right](t) = \left(\sum_{k=1}^{m} b_k\Lambda_n[y_k]\right)(t) = \sum_{k=1}^{m} b_k\Lambda_n[y_k](t) = \sum_{k=1}^{m} b_k f_k(t),$$
and the proof is finished. ∎

Example 4.19. The differential equation $y'' + 2y' = 2t + 5$ has general solution
$$y = a_1 + a_2e^{-2t} + 2t + \frac{1}{2}t^2$$
for a1, a2 ∈ R, and so setting a1 = a2 = 0 gives $2t + t^2/2$ as a particular solution. Meanwhile $y'' + 2y' = -e^{-2t}$ has general solution
$$y = b_1 + b_2e^{-2t} + \frac{1}{2}te^{-2t},$$
so $te^{-2t}/2$ is a particular solution. By the Superposition Principle, then, the equation
$$y'' + 2y' = 2t + 5 - e^{-2t} \tag{4.10}$$
has particular solution
$$y_p = 2t + \frac{1}{2}t^2 + \frac{1}{2}te^{-2t}.$$
In fact
$$\left(a_1 + a_2e^{-2t} + 2t + \frac{1}{2}t^2\right) + \left(b_1 + b_2e^{-2t} + \frac{1}{2}te^{-2t}\right)$$
is a particular solution to (4.10) for any choice of constants a1, a2, b1, b2 ∈ R. Rewriting this as
$$(a_1 + b_1) + (a_2 + b_2)e^{-2t} + \left(2t + \frac{1}{2}t^2 + \frac{1}{2}te^{-2t}\right),$$
and letting c1 = a1 + b1 and c2 = a2 + b2, we obtain the two-parameter family of solutions
$$c_1 + c_2e^{-2t} + \left(2t + \frac{1}{2}t^2 + \frac{1}{2}te^{-2t}\right),$$
which is in fact the general solution to (4.10). To see this, we simply note that 1 and $e^{-2t}$ are linearly independent solutions to $y'' + 2y' = 0$, and then apply Theorem 4.17. ∎
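The particular solution assembled by superposition can be checked mechanically. A sketch assuming sympy, whose `checkodesol` returns `(True, 0)` when a candidate satisfies an ODE:

```python
import sympy as sp

t = sp.symbols("t")
y = sp.Function("y")

# Equation (4.10) and the particular solution y_p = 2t + t^2/2 + t e^{-2t}/2
ode = sp.Eq(y(t).diff(t, 2) + 2 * y(t).diff(t), 2 * t + 5 - sp.exp(-2 * t))
yp = 2 * t + t**2 / 2 + t * sp.exp(-2 * t) / 2

print(sp.checkodesol(ode, sp.Eq(y(t), yp)))  # (True, 0)
```

Equivalently, substituting yp directly into the left-hand side and simplifying reproduces the right-hand side of (4.10).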


4.3 – Abel’s Formula

Determinants, and hence the Wronskian function, can be quite tedious to evaluate, and for the Wronskian in particular it would seem that the determinant W[y1, . . . , yn](x) would need to be evaluated from scratch using cofactor expansion for each value of x we wish to consider. Remarkably, however, there is a relatively simple formula for the Wronskian, at least for homogeneous linear equations. It features a constant c that may be found by evaluating W[y1, . . . , yn](x0) just once using cofactor expansion, for some value x0 within an interval of validity I, whereafter the formula will give Wronskian values for all x in I.

Before giving the formula we establish a simple continuity property of the Wronskian function.

Proposition 4.20. If {y1, . . . , yn} is a fundamental set to
$$y^{(n)} + b_{n-1}(x)y^{(n-1)} + \cdots + b_1(x)y' + b_0(x)y = 0$$
on I, then W[y1, . . . , yn] is continuous on I.

Proof. Since $y_k^{(n)}$ exists on I for each k, the functions $y_k, y_k', \ldots, y_k^{(n-1)}$ must be differentiable on I and hence continuous there. Using the Leibniz formula for the determinant function (see §5.6 of [LIN]), we have
$$W[y_1,\ldots,y_n](x) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} y_{\sigma(i)}^{(i-1)}(x).$$
Thus the Wronskian is constructed as a finite sum of products of continuous functions, and therefore must be continuous on I. ∎

Theorem 4.21 (Abel's Formula). If {y1, . . . , yn} is a fundamental set to
$$y^{(n)} + b_{n-1}(x)y^{(n-1)} + \cdots + b_1(x)y' + b_0(x)y = 0 \tag{4.11}$$
on I, then there exists some constant k such that
$$W[y_1,\ldots,y_n](x) = ke^{-\int b_{n-1}(x)\,dx}$$
for all x ∈ I.

Proof. We shall here present the proof for the case when n = 2. The proof of the general case is given at the end of the section.

For n = 2 a fundamental set consists of two linearly independent functions y1 and y2. In particular these functions satisfy
$$y_k'' = -b_1(x)y_k' - b_0(x)y_k$$

for k = 1, 2 and x ∈ I. Defining W(x) = W[y1, y2](x) for the sake of brevity, for each x ∈ I we have
$$\begin{aligned} W'(x) &= \frac{d}{dx}\bigl[y_1(x)y_2'(x) - y_1'(x)y_2(x)\bigr] = (y_1y_2'' + y_1'y_2') - (y_1''y_2 + y_1'y_2') = y_1y_2'' - y_1''y_2 \\ &= y_1\bigl[-b_1(x)y_2' - b_0(x)y_2\bigr] - y_2\bigl[-b_1(x)y_1' - b_0(x)y_1\bigr] \end{aligned}$$


$$= -b_1(x)(y_1y_2' - y_1'y_2) = -b_1(x)W(x).$$

Thus
$$\frac{dW}{dx} = -b_1(x)W \tag{4.12}$$
on I, which is a separable equation that implies
$$\int \frac{1}{W}\,dW = -\int b_1(x)\,dx + C.$$
Solving in the usual way yields
$$|W| = ce^{-\int b_1(x)\,dx},$$

where c = eC . Now, since W(x) 6= 0 for all x ∈ I by Theorem 4.12, the Intermediate ValueTheorem and Proposition 4.20 imply that either W > 0 on I or W < 0 on I, resulting in either

W [y1, y2](x) = ce−´b1(x) dx

for all x ∈ I, or

W [y1, y2](x) = −ce−´b1(x) dx

for all x ∈ I. Thus, for all x ∈ I,

W [y1, y2](x) = ke−´b1(x)dx

for some constant k. �

It should be noted that the constant $k$ that replaces the $c$ or $-c$ at the end of the foregoing proof is not arbitrary. The integral at right in the final equation represents a particular antiderivative of $b_1(x)$, say $B_1(x)$, so that
$$W[y_1,y_2](x) = ke^{-B_1(x)};$$
then, for any particular $x_0 \in I$, we find that
$$k = \frac{W[y_1,y_2](x_0)}{e^{-B_1(x_0)}}.$$
More generally
$$k = \frac{W[y_1,\ldots,y_n](x_0)}{\exp[-B_{n-1}(x_0)]},$$
where $B_{n-1}(x)$ is a particular antiderivative of $b_{n-1}(x)$. We see that the value of $k$ will depend on the choice made for the antiderivative of $b_{n-1}(x)$, as well as the choice of fundamental set $\{y_1,\ldots,y_n\}$.

Example 4.22. In Example 4.16 we saw that
$$\sqrt{1+x^3}\,y'' - x^2y' + y = 0$$
must have a fundamental set $\{y_1,y_2\}$ on $(-1,\infty)$. Suppose in particular that
$$y_1(1) = 1,\quad y_1'(1) = 0,\quad y_2(1) = -1,\quad y_2'(1) = 1.$$
Find the Wronskian of $y_1, y_2$ on $(-1,\infty)$.


Solution. First we put the equation into standard form:
$$y'' - \frac{x^2}{\sqrt{1+x^3}}\,y' + \frac{1}{\sqrt{1+x^3}}\,y = 0.$$
We have
$$\int b_1(x)\,dx = -\int \frac{x^2}{\sqrt{1+x^3}}\,dx = -\frac{2}{3}\sqrt{1+x^3},$$
and so by Abel's Formula there exists some $c$ such that
$$W[y_1,y_2](x) = ce^{-\int b_1(x)\,dx} = ce^{\frac{2}{3}\sqrt{1+x^3}}$$
for all $x \in (-1,\infty)$. Since $1 \in (-1,\infty)$ and
$$W[y_1,y_2](1) = y_1(1)y_2'(1) - y_1'(1)y_2(1) = (1)(1) - (0)(-1) = 1,$$
it follows that
$$ce^{\frac{2}{3}\sqrt{1+1^3}} = 1,$$
and hence $c = e^{-2\sqrt{2}/3}$. Therefore
$$W[y_1,y_2](x) = \exp\left(\frac{2}{3}\sqrt{1+x^3} - \frac{2}{3}\sqrt{2}\right)$$
for all $x \in (-1,\infty)$. ■
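Abel's Formula lends itself to a numerical sanity check. The sketch below is a Python illustration, not part of the text's development; the hand-rolled classical RK4 integrator, the step count, and the test point $x = 2$ are all arbitrary choices. It integrates the ODE of Example 4.22 from the given initial data at $x = 1$ and compares the Wronskian at $x = 2$ against the closed-form expression just derived.

```python
import math

def f(x, y, yp):
    # Standard form of the example ODE: y'' = (x^2 y' - y) / sqrt(1 + x^3)
    s = math.sqrt(1 + x**3)
    return (x**2 * yp - y) / s

def rk4(x0, y0, yp0, x1, n):
    # Classical fourth-order Runge-Kutta on the first-order system (y, y')
    h = (x1 - x0) / n
    x, y, yp = x0, y0, yp0
    for _ in range(n):
        k1y, k1p = yp, f(x, y, yp)
        k2y, k2p = yp + h*k1p/2, f(x + h/2, y + h*k1y/2, yp + h*k1p/2)
        k3y, k3p = yp + h*k2p/2, f(x + h/2, y + h*k2y/2, yp + h*k2p/2)
        k4y, k4p = yp + h*k3p, f(x + h, y + h*k3y, yp + h*k3p)
        y += h*(k1y + 2*k2y + 2*k3y + k4y)/6
        yp += h*(k1p + 2*k2p + 2*k3p + k4p)/6
        x += h
    return y, yp

# y1 and y2 carry the initial data of Example 4.22 at x = 1
y1, y1p = rk4(1.0, 1.0, 0.0, 2.0, 2000)
y2, y2p = rk4(1.0, -1.0, 1.0, 2.0, 2000)

W_numeric = y1*y2p - y1p*y2
W_abel = math.exp((2/3)*math.sqrt(1 + 2.0**3) - (2/3)*math.sqrt(2))
```

With this many steps the integrated Wronskian and the Abel prediction agree to well within the integrator's error.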

Proposition 4.23. Suppose $y_1$ is a nontrivial solution to
$$y'' + b_1(x)y' + b_0(x)y = 0$$
on the interval $I$. If $y_1$ is nonvanishing on an interval $J \subseteq I$, then $y_2 : J \to \mathbb{R}$ given by
$$y_2(x) = y_1(x)\int \frac{e^{-\int b_1(x)\,dx}}{y_1^2(x)}\,dx \tag{4.13}$$
is such that $\{y_1,y_2\}$ forms a fundamental set to the ODE on $J$.

Proof. Suppose $y_1$ is nonvanishing on an interval $J \subseteq I$. By Theorem 4.14 there must exist some function $y$ such that $\{y_1,y\}$ is a fundamental set to the ODE on $J$. Recalling $W[y_1,y] = y_1y' - y_1'y$, Abel's Formula implies there exists a constant $k$ such that
$$y_1(x)y'(x) - y_1'(x)y(x) = ke^{-\int b_1(x)\,dx}$$
for all $x \in J$. Since $y_1(x) \ne 0$ for all $x \in J$, we may divide to obtain
$$y' - \frac{y_1'(x)}{y_1(x)}\,y = \frac{ke^{-\int b_1(x)\,dx}}{y_1(x)},$$
a standard-form first-order ODE that by Theorem 2.8 has solution
$$y(x) = e^{-\int p(x)\,dx}\left(\int q(x)e^{\int p(x)\,dx}\,dx + c\right)$$
for
$$p(x) = -\frac{y_1'(x)}{y_1(x)} \quad\text{and}\quad q(x) = \frac{ke^{-\int b_1(x)\,dx}}{y_1(x)},$$


and arbitrary $c \in \mathbb{R}$. Letting $u = y_1(x)$ gives
$$e^{\int p(x)\,dx} = \exp\left(-\int \frac{1}{u}\,du\right) = e^{-\ln|u|} = \frac{1}{|u|} = \frac{1}{|y_1(x)|},$$
and similarly
$$e^{-\int p(x)\,dx} = |y_1(x)|.$$
Setting $c = 0$, we obtain
$$y(x) = |y_1(x)|\int \frac{ke^{-\int b_1(x)\,dx}}{y_1(x)|y_1(x)|}\,dx.$$
Since $y_1$ is nonvanishing on $J$, either $y_1(x) > 0$ for all $x \in J$ (so that $|y_1(x)| = y_1(x)$), or $y_1(x) < 0$ for all $x \in J$ (so $|y_1(x)| = -y_1(x)$). In either case we obtain
$$y(x) = ky_1(x)\int \frac{e^{-\int b_1(x)\,dx}}{y_1^2(x)}\,dx \tag{4.14}$$
as a solution to the ODE on $J$. The ODE is homogeneous, so any constant multiple of the function (4.14) is also a solution. Multiplying by $1/k$ in particular yields precisely (4.13) as a solution on $J$.

It remains to show that $y_2$ as defined by (4.13) is such that $\{y_1,y_2\}$ is indeed a linearly independent set on $J$. But $\{y_1,y_2\}$ is linearly dependent only if $y_2$ is a constant multiple of $y_1$ on $J$, which is to say
$$\int \frac{e^{-\int b_1(x)\,dx}}{y_1^2(x)}\,dx \tag{4.15}$$
is a constant function on $J$. This can only be the case if the integrand in (4.15) is the zero function on $J$ (only the zero function can have a constant antiderivative), which clearly is impossible, and therefore $\{y_1,y_2\}$ is linearly independent. ■

Example 4.24. Find the general solution to
$$(x^2-x)y'' + 2y' - 6y = 0 \tag{4.16}$$
on $(1,\infty)$.

Solution. Since the coefficients of the ODE (4.16) are polynomials, there is a possibility (though no guarantee) that at least one solution has the form $x^m$. Letting $y = x^m$ in the ODE yields
$$(x^2-x)\cdot m(m-1)x^{m-2} + 2mx^{m-1} - 6x^m = 0,$$
which with some algebra becomes
$$(m-3)(m+2)x^m + m(3-m)x^{m-1} = 0.$$
This equation would be satisfied for all $x > 1$ if there exists some $m$ such that $(m-3)(m+2) = 0$ and $m(3-m) = 0$ both hold simultaneously. Only one value works, namely $m = 3$, and so we find that $y_1(x) = x^3$ is a solution to the ODE on $(1,\infty)$.

Since $x^2 - x \ne 0$ for all $x > 1$, we may divide (4.16) by $x^2 - x$ to obtain the new equation
$$y'' + \frac{2}{x^2-x}\,y' - \frac{6}{x^2-x}\,y = 0, \tag{4.17}$$


which likewise has solution $y_1(x) = x^3$ on $(1,\infty)$. Since the new equation (4.17) is in standard form we may apply Proposition 4.23 to find another solution $y_2$ for it that is valid on $(1,\infty)$, and this second solution $y_2$ will also be a solution to (4.16) on $(1,\infty)$. The formula in the proposition gives
$$y_2(x) = x^3\int \frac{e^{-\int 2/(x^2-x)\,dx}}{(x^3)^2}\,dx,$$
where
$$-\int \frac{2}{x^2-x}\,dx = -2\int\left(\frac{1}{x-1} - \frac{1}{x}\right)dx = -2\ln\left(\frac{x-1}{x}\right) = \ln\left(\frac{x}{x-1}\right)^2$$
for $x > 1$ (using partial fraction decomposition), and thus
$$\begin{aligned}
y_2(x) &= x^3\int \frac{e^{\ln[x^2/(x-1)^2]}}{x^6}\,dx = x^3\int \frac{1}{x^4(x-1)^2}\,dx \\
&= x^3\int\left(\frac{4}{x} + \frac{3}{x^2} + \frac{2}{x^3} + \frac{1}{x^4} - \frac{4}{x-1} + \frac{1}{(x-1)^2}\right)dx \\
&= x^3\left[4\ln x - \frac{3}{x} - \frac{1}{x^2} - \frac{1}{3x^3} - 4\ln(x-1) - \frac{1}{x-1}\right] \\
&= 4x^3\ln\left(\frac{x}{x-1}\right) - \frac{x^3}{x-1} - 3x^2 - x - \frac{1}{3}
\end{aligned}$$
is another solution to (4.17) on $(1,\infty)$ that is linearly independent from $y_1$.

By Proposition 4.23 and Theorem 4.17 the general solution to (4.17) on $(1,\infty)$ is
$$y(x) = c_1x^3 + c_2\left[4x^3\ln\left(\frac{x}{x-1}\right) - \frac{x^3}{x-1} - 3x^2 - x - \frac{1}{3}\right],$$
which also is the general solution to (4.16) on $(1,\infty)$. ■
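Because the closed form for $y_2$ is easy to mistype, a quick finite-difference check is worthwhile. The following is a Python illustration only; the sample point $x = 2$ and step size $h$ are arbitrary choices, and the residual is small only up to finite-difference error.

```python
import math

def y2(x):
    # The second solution found in Example 4.24 (valid for x > 1)
    return 4*x**3*math.log(x/(x - 1)) - x**3/(x - 1) - 3*x**2 - x - 1/3

x, h = 2.0, 1e-5
d1 = (y2(x + h) - y2(x - h)) / (2*h)              # central first derivative
d2 = (y2(x + h) - 2*y2(x) + y2(x - h)) / h**2     # central second derivative

# Residual of (x^2 - x) y'' + 2 y' - 6 y at the sample point
residual = (x**2 - x)*d2 + 2*d1 - 6*y2(x)
```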

The general proof of Abel's Formula uses the fact that a determinant with two identical rows equals zero, or equivalently
$$\sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\left(b_{k,\sigma(k)}b_{k,\sigma(\ell)}\prod_{i\in I_n\setminus\{k,\ell\}} b_{i,\sigma(i)}\right) = 0 \tag{4.18}$$
for $I_n = \{1,2,\ldots,n\}$, which is proven in §5.6 of [LIN].

General Proof of Abel's Formula. Let $n \ge 2$. Suppose $Y = \{y_1,\ldots,y_n\}$ is a fundamental set to (4.11), so that
$$y_i^{(n)} = -\sum_{k=1}^{n} b_{k-1}(x)y_i^{(k-1)} \tag{4.19}$$
for each $1 \le i \le n$. Using the Leibniz formula for the determinant, the Wronskian of $Y$ is
$$W[Y](x) = \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\prod_{i=1}^{n} y_{\sigma(i)}^{(i-1)}(x),$$


and so its derivative is
$$\begin{aligned}
W[Y]'(x) &= \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\sum_{k=1}^{n}\left(y_{\sigma(k)}^{(k)}\prod_{i\in I_n\setminus\{k\}} y_{\sigma(i)}^{(i-1)}\right) \\
&= \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\left[y_{\sigma(n)}^{(n)}\prod_{i=1}^{n-1} y_{\sigma(i)}^{(i-1)} + \sum_{k=1}^{n-1}\left(y_{\sigma(k)}^{(k)}\prod_{i\in I_n\setminus\{k\}} y_{\sigma(i)}^{(i-1)}\right)\right].
\end{aligned}$$
Now (4.19) implies that
$$\begin{aligned}
W[Y]'(x) &= \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\left[\left(-\sum_{k=1}^{n} b_{k-1}(x)y_{\sigma(n)}^{(k-1)}\right)\prod_{i=1}^{n-1} y_{\sigma(i)}^{(i-1)} + \sum_{k=1}^{n-1}\left(y_{\sigma(k)}^{(k)}\prod_{i\in I_n\setminus\{k\}} y_{\sigma(i)}^{(i-1)}\right)\right] \\
&= \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\left[-b_{n-1}(x)\prod_{i=1}^{n} y_{\sigma(i)}^{(i-1)} - \sum_{k=1}^{n-1} b_{k-1}(x)y_{\sigma(n)}^{(k-1)}\prod_{i=1}^{n-1} y_{\sigma(i)}^{(i-1)} + \sum_{k=1}^{n-1} y_{\sigma(k)}^{(k)}\prod_{i\in I_n\setminus\{k\}} y_{\sigma(i)}^{(i-1)}\right] \\
&= -b_{n-1}(x)W[Y](x) - \sum_{k=1}^{n-1} b_{k-1}(x)\sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\,y_{\sigma(n)}^{(k-1)}\prod_{i=1}^{n-1} y_{\sigma(i)}^{(i-1)} + \sum_{k=1}^{n-1}\sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\,y_{\sigma(k)}^{(k)}\prod_{i\in I_n\setminus\{k\}} y_{\sigma(i)}^{(i-1)}.
\end{aligned}$$
The last two sums over $\sigma \in S_n$ equal zero by (4.18). Therefore
$$W[Y]'(x) = -b_{n-1}(x)W[Y](x),$$
and the rest of the proof proceeds as in the $n = 2$ case starting with equation (4.12). ■


4.4 – Homogeneous Equations with Constant Coefficients

We treat here homogeneous linear differential equations of the form
$$a_ny^{(n)}(x) + a_{n-1}y^{(n-1)}(x) + \cdots + a_1y'(x) + a_0y(x) = 0, \tag{4.20}$$
where the real coefficients $a_0, a_1, \ldots, a_n$ are constants with $a_n \ne 0$. As we will see, the general solution to (4.20) can be fully determined from the solutions to the polynomial equation
$$a_nr^n + a_{n-1}r^{n-1} + \cdots + a_1r + a_0 = 0,$$
called the characteristic equation⁸ for (4.20).

It will be instructive (and considerably easier) to first examine the $n = 2$ case, with ODE

$$a_2y'' + a_1y' + a_0y = 0 \tag{4.21}$$
and characteristic equation
$$a_2r^2 + a_1r + a_0 = 0 \tag{4.22}$$
for $a_2 \ne 0$. Since $a_0, a_1, a_2 \in \mathbb{R}$, the characteristic equation's solution set may consist of two distinct real roots $\rho_1, \rho_2$, or a repeated real root $\rho$, or a complex conjugate pair $\alpha \pm i\beta$.

Suppose (4.22) has distinct real roots $\rho_1$ and $\rho_2$, so that
$$a_2\rho_k^2 + a_1\rho_k + a_0 = 0 \tag{4.23}$$
for $k = 1, 2$. Now, the equation (4.21) suggests that $y'$ and $y''$ are merely constant multiples of $y$, which is a feature of the exponential function $e^{rx}$ for any constant $r$. At the very least it seems worthwhile to conjecture that a solution to (4.21) has the form $y(x) = e^{rx}$ for some appropriate choice of constant $r$. Substituting $e^{rx}$ for $y$ on the left side of (4.21) gives

$$\begin{aligned}
a_2y'' + a_1y' + a_0y &= a_2(e^{rx})'' + a_1(e^{rx})' + a_0e^{rx} \\
&= a_2(r^2e^{rx}) + a_1(re^{rx}) + a_0e^{rx} \\
&= (a_2r^2 + a_1r + a_0)e^{rx},
\end{aligned} \tag{4.24}$$
and hence if we replace $r$ with $\rho_k$, by (4.23) we obtain
$$a_2y'' + a_1y' + a_0y = (a_2\rho_k^2 + a_1\rho_k + a_0)e^{\rho_kx} = 0\cdot e^{\rho_kx} = 0.$$
That is, the functions $y_1(x) = e^{\rho_1x}$ and $y_2(x) = e^{\rho_2x}$ are in fact solutions to (4.21) on $(-\infty,\infty)$ in the eventuality that $\rho_1$ and $\rho_2$ are distinct real roots to the characteristic equation (4.22). Since $\rho_1 \ne \rho_2$, the functions $e^{\rho_1x}$ and $e^{\rho_2x}$ are linearly independent on $(-\infty,\infty)$, and therefore $\{e^{\rho_1x}, e^{\rho_2x}\}$ is a fundamental set to (4.21). By Theorem 4.17 it follows that the general solution to (4.21) is
$$y(x) = c_1e^{\rho_1x} + c_2e^{\rho_2x}$$
for arbitrary $c_1, c_2 \in \mathbb{R}$.

8Some authors use the term auxiliary equation instead.


We next consider the case when (4.22) has a repeated real root $\rho$. Since $a_2 \ne 0$, we may simplify the analysis by dividing (4.21) by $a_2$ to obtain the standard form
$$y'' + b_1y' + b_0y = 0, \tag{4.25}$$
where $b_1 = a_1/a_2$ and $b_0 = a_0/a_2$. This ODE has the same general solution as (4.21), and since its characteristic equation
$$r^2 + b_1r + b_0 = 0$$
must have the same roots as (4.22), it likewise has $\rho$ as a repeated real root, and hence
$$r^2 + b_1r + b_0 = (r-\rho)^2.$$
This implies not only that $\rho^2 + b_1\rho + b_0 = 0$, but also $2\rho + b_1 = 0$ (left as an exercise), facts we will need shortly. In light of our findings when it was assumed that (4.22) had distinct real roots, we should not be surprised to discover that $e^{\rho x}$ is a solution to (4.25) on $(-\infty,\infty)$. But we know that any fundamental set to (4.25) must consist of two linearly independent functions. What candidates are there for the second function? In fact we find that $xe^{\rho x}$ is another solution on $(-\infty,\infty)$. To show this, we simply substitute $xe^{\rho x}$ for $y$ in (4.25) to obtain
$$\begin{aligned}
y'' + b_1y' + b_0y &= (xe^{\rho x})'' + b_1(xe^{\rho x})' + b_0xe^{\rho x} \\
&= (2\rho + \rho^2x)e^{\rho x} + b_1(1 + \rho x)e^{\rho x} + b_0xe^{\rho x} \\
&= [(2\rho + b_1) + (\rho^2 + b_1\rho + b_0)x]e^{\rho x} \\
&= 0\cdot e^{\rho x} = 0
\end{aligned}$$
for all $x \in (-\infty,\infty)$. Thus we see that $\{e^{\rho x}, xe^{\rho x}\}$ is a fundamental set to (4.25), and therefore also for (4.21).

Aside from merely guessing, one way to determine that $xe^{\rho x}$ is a solution to (4.25) in the case when the characteristic equation has a double root is to put the obvious solution $y_1(x) = e^{\rho x}$ into the formula supplied by Proposition 4.23 to obtain
$$y_2(x) = y_1(x)\int \frac{e^{-\int b_1\,dx}}{y_1^2(x)}\,dx = e^{\rho x}\int \frac{e^{-b_1x}}{(e^{\rho x})^2}\,dx = e^{\rho x}\int e^{-(2\rho+b_1)x}\,dx = e^{\rho x}\int dx = xe^{\rho x},$$
recalling that $2\rho + b_1 = 0$ in the double root case. The following has now been proven.

Theorem 4.25. Suppose $a_0, a_1, a_2 \in \mathbb{R}$ with $a_2 \ne 0$. If the characteristic equation for
$$a_2y'' + a_1y' + a_0y = 0$$
has distinct real roots $\rho_1$ and $\rho_2$, then the ODE has general solution
$$y(x) = c_1e^{\rho_1x} + c_2e^{\rho_2x};$$
and if the characteristic equation has double root $\rho$, then the ODE has general solution
$$y(x) = c_1e^{\rho x} + c_2xe^{\rho x}.$$

Example 4.26. Solve the initial value problem
$$y'' - 4y' + 3y = 0,\quad y(0) = 1,\quad y'(0) = \tfrac{1}{3}.$$


Solution. The characteristic equation associated with the ODE is $r^2 - 4r + 3 = 0$, which factors as $(r-3)(r-1) = 0$ and so yields two distinct real roots: $\rho_1 = 1$ and $\rho_2 = 3$. By Theorem 4.25 it follows that $y_1(x) = e^x$ and $y_2(x) = e^{3x}$ are particular solutions to the ODE, and so the general solution is
$$y(x) = c_1e^x + c_2e^{3x}.$$
Now, using the initial condition $y(0) = 1$ we obtain $1 = c_1e^0 + c_2e^{3(0)}$, or $c_1 + c_2 = 1$. Also, since $y'(x) = c_1e^x + 3c_2e^{3x}$, the initial condition $y'(0) = \frac{1}{3}$ gives $\frac{1}{3} = c_1e^0 + 3c_2e^{3(0)}$, or $c_1 + 3c_2 = \frac{1}{3}$. Thus arises the system of equations
$$\begin{cases} c_1 + c_2 = 1 \\ c_1 + 3c_2 = \frac{1}{3} \end{cases}$$
Solving this system yields $c_1 = \frac{4}{3}$ and $c_2 = -\frac{1}{3}$, and so
$$y(x) = \tfrac{4}{3}e^x - \tfrac{1}{3}e^{3x}$$
is the (unique) solution to the initial value problem. ■
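Direct substitution verifies the solution of Example 4.26. The Python sketch below is an illustration only; the sample point is an arbitrary choice.

```python
import math

def y(x):   return (4/3)*math.exp(x) - (1/3)*math.exp(3*x)
def dy(x):  return (4/3)*math.exp(x) - math.exp(3*x)
def d2y(x): return (4/3)*math.exp(x) - 3*math.exp(3*x)

# Both initial conditions hold, and the ODE residual vanishes at a sample point
ic_ok = abs(y(0.0) - 1) < 1e-12 and abs(dy(0.0) - 1/3) < 1e-12
residual = d2y(1.7) - 4*dy(1.7) + 3*y(1.7)   # y'' - 4y' + 3y
```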

Example 4.27. Solve the initial value problem
$$y'' - 6y' + 9y = 0,\quad y(0) = 2,\quad y'(0) = \tfrac{25}{3}.$$

Solution. The characteristic equation associated with the ODE is $r^2 - 6r + 9 = 0$. Factoring gives $(r-3)^2 = 0$, and so we obtain the repeated real root $\rho = 3$. Theorem 4.25 implies $y_1(x) = e^{3x}$ and $y_2(x) = xe^{3x}$ are particular solutions to the ODE, and so the general solution is
$$y(x) = c_1e^{3x} + c_2xe^{3x}.$$
Now, using the initial condition $y(0) = 2$ we obtain $2 = c_1e^0 + c_2(0)e^{3(0)}$, or $c_1 = 2$. Also, since
$$y'(x) = 3c_1e^{3x} + c_2e^{3x} + 3c_2xe^{3x},$$
the initial condition $y'(0) = \frac{25}{3}$ gives $\frac{25}{3} = 3c_1e^0 + c_2e^{3(0)} + 3c_2(0)e^{3(0)}$, or $3c_1 + c_2 = \frac{25}{3}$. We thus have the system of equations
$$\begin{cases} c_1 = 2 \\ 3c_1 + c_2 = \frac{25}{3} \end{cases}$$
Solving for $c_2$ results in $c_2 = \frac{7}{3}$, and so
$$y(x) = 2e^{3x} + \tfrac{7}{3}xe^{3x}$$
is the solution to the initial value problem. ■

Before considering the final case when the characteristic equation for (4.21) has complex conjugate roots, we establish some basic facts about the exponential function as defined on the complex plane
$$\mathbb{C} = \{x + iy : x, y \in \mathbb{R}\},$$


where $i$ is the number for which $i^2 = -1$. In calculus the natural logarithm function $\ln : (0,\infty) \to \mathbb{R}$ is usually defined by the formula
$$\ln(x) = \int_1^x \frac{1}{t}\,dt,$$
with the number $e$ being defined as the unique real number for which $\ln(e) = 1$. The natural logarithm function is one-to-one, and therefore has an inverse known as the exponential function $\exp : \mathbb{R} \to (0,\infty)$. By definition
$$e^x = \exp(x)$$
for all $x \in \mathbb{R}$, and thus $e^x = y$ if and only if $\ln(y) = x$ for all $x \in \mathbb{R}$ and $y > 0$.

The exponential function is extended to become a complex-valued function on $\mathbb{C}$ by the formula
$$\exp(x + iy) = e^x(\cos y + i\sin y),$$
with $e^z = \exp(z)$ for any $z \in \mathbb{C}$ by definition, and hence
$$e^{x+iy} = e^x\cos y + ie^x\sin y. \tag{4.26}$$

The first property of this extended exponential function is the following.

Proposition 4.28. For any $\alpha, \beta \in \mathbb{R}$,
$$e^{i\alpha + i\beta} = e^{i\alpha}e^{i\beta}.$$

Proof. Using established trigonometric identities, we have
$$\begin{aligned}
e^{i\alpha+i\beta} = e^{i(\alpha+\beta)} &= \cos(\alpha+\beta) + i\sin(\alpha+\beta) \\
&= (\cos\alpha\cos\beta - \sin\alpha\sin\beta) + i(\sin\alpha\cos\beta + \cos\alpha\sin\beta) \\
&= \cos\beta(\cos\alpha + i\sin\alpha) + \sin\beta(i\cos\alpha - \sin\alpha) \\
&= \cos\beta(\cos\alpha + i\sin\alpha) + i\sin\beta(\cos\alpha + i\sin\alpha) \\
&= (\cos\alpha + i\sin\alpha)(\cos\beta + i\sin\beta) = e^{i\alpha}e^{i\beta}
\end{aligned}$$
for any $\alpha, \beta \in \mathbb{R}$. ■
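Proposition 4.28 can be spot-checked with Python's `cmath` module, which implements the complex exponential. This is an illustration only; the random sample of angles and the tolerance are arbitrary choices.

```python
import cmath
import random

random.seed(0)
for _ in range(100):
    a = random.uniform(-10.0, 10.0)
    b = random.uniform(-10.0, 10.0)
    # e^{i(a+b)} should equal e^{ia} e^{ib} up to rounding error
    lhs = cmath.exp(1j*(a + b))
    rhs = cmath.exp(1j*a) * cmath.exp(1j*b)
    assert abs(lhs - rhs) < 1e-12
```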

Remark. Using Proposition 4.28 it can be further shown that $e^we^z = e^{w+z}$ for any $w, z \in \mathbb{C}$, but we will not need this fact here.

The formula (4.26) is the most "natural" extension of the exponential function in the sense that it is the only extension that results in a function that is differentiable on all of $\mathbb{C}$. In general the derivative of a complex-valued function $f$ of a single complex variable $z = x + iy$ at $z_0 \in \mathbb{C}$ is defined to be
$$f'(z_0) = \lim_{z\to z_0} \frac{f(z) - f(z_0)}{z - z_0},$$
provided the limit exists. This definition is of course "backward-compatible" with the definition of derivative given in calculus.

Any complex-valued function $f : S \subseteq \mathbb{R} \to \mathbb{C}$ of a single real variable $x$ can be cast in the form $f(x) = u(x) + iv(x)$, where $u(x)$ and $v(x)$ are real-valued functions on $S$. It is straightforward


to show that $f'(x) = u'(x) + iv'(x)$, and so $f$ is differentiable at $x$ if and only if $u$ and $v$ are differentiable at $x$ in the sense defined in calculus. With this property we find that, for any constants $\alpha, \beta \in \mathbb{R}$,
$$\begin{aligned}
\frac{d}{dx}\left[e^{(\alpha+i\beta)x}\right] &= \frac{d}{dx}\left(e^{\alpha x}\cos\beta x + ie^{\alpha x}\sin\beta x\right) \\
&= \frac{d}{dx}\left(e^{\alpha x}\cos\beta x\right) + i\frac{d}{dx}\left(e^{\alpha x}\sin\beta x\right) \\
&= (\alpha e^{\alpha x}\cos\beta x - \beta e^{\alpha x}\sin\beta x) + i(\alpha e^{\alpha x}\sin\beta x + \beta e^{\alpha x}\cos\beta x) \\
&= e^{\alpha x}(\alpha\cos\beta x + i\alpha\sin\beta x + i\beta\cos\beta x - \beta\sin\beta x) \\
&= e^{\alpha x}(\alpha + i\beta)(\cos\beta x + i\sin\beta x) \\
&= (\alpha + i\beta)\cdot e^{\alpha x}(\cos\beta x + i\sin\beta x) = (\alpha + i\beta)e^{(\alpha+i\beta)x},
\end{aligned}$$
and hence
$$\frac{d}{dx}\left(e^{zx}\right) = ze^{zx} \tag{4.27}$$

for any fixed $z \in \mathbb{C}$.

Now we suppose (4.22) has complex conjugate roots, which is to say roots $\rho_1 = \alpha + i\beta$ and $\rho_2 = \alpha - i\beta$ for $\alpha, \beta \in \mathbb{R}$ with $\beta \ne 0$. Using (4.27) we find that the steps in (4.24) hold when either of these roots is substituted for $r$, and hence both $e^{\rho_1x}$ and $e^{\rho_2x}$ are solutions to (4.21). Observing that the theory developed in §4.2 applies equally well to fundamental sets consisting of complex-valued functions of a single real variable, we conclude that the general solution to (4.21) on $(-\infty,\infty)$ may be presented as
$$y(x) = c_1e^{(\alpha+i\beta)x} + c_2e^{(\alpha-i\beta)x} = e^{\alpha x}\left(c_1e^{i\beta x} + c_2e^{-i\beta x}\right), \tag{4.28}$$
where the parameters $c_1$ and $c_2$ may be taken to be complex valued.

The functions $e^{\pm i\beta x}$ in (4.28) are not real-valued; however, since differential equations are often used to model physical systems it would seem desirable to find an expression for the general solution to (4.21) that features only real-valued functions. Finding such an expression is facilitated by the following result.

Proposition 4.29. Let $u(x)$ and $v(x)$ be real-valued functions on an interval $I \subseteq \mathbb{R}$. If $u(x) + iv(x)$ is a solution to $a_2y'' + a_1y' + a_0y = 0$ on $I$, then $u(x)$ and $v(x)$ are also solutions on $I$.

Proof. Suppose $\varphi(x) = u(x) + iv(x)$ is a solution to (4.21) on $I$, so that
$$a_2\varphi''(x) + a_1\varphi'(x) + a_0\varphi(x) = 0$$
for all $x \in I$. Then
$$a_2[u''(x) + iv''(x)] + a_1[u'(x) + iv'(x)] + a_0[u(x) + iv(x)] = 0,$$
and rearranging terms yields
$$[a_2u''(x) + a_1u'(x) + a_0u(x)] + i[a_2v''(x) + a_1v'(x) + a_0v(x)] = 0.$$


A complex number $\alpha + i\beta$ equals zero if and only if $\alpha = 0$ and $\beta = 0$, and so
$$a_2u''(x) + a_1u'(x) + a_0u(x) = 0 \quad\text{and}\quad a_2v''(x) + a_1v'(x) + a_0v(x) = 0,$$
which shows that $u(x)$ and $v(x)$ are both solutions to (4.21) on $I$. ■

Now, since
$$e^{\alpha x + i\beta x} = e^{\alpha x}\cos\beta x + ie^{\alpha x}\sin\beta x$$
is a solution to (4.21) on $(-\infty,\infty)$, Proposition 4.29 implies that
$$y_1(x) = e^{\alpha x}\cos\beta x \quad\text{and}\quad y_2(x) = e^{\alpha x}\sin\beta x$$
are likewise solutions to (4.21). We have proven the following.

Theorem 4.30. If the characteristic equation for $a_2y'' + a_1y' + a_0y = 0$ has complex conjugate roots $\alpha \pm i\beta$, then the ODE has general solution
$$y(x) = c_1e^{\alpha x}\cos\beta x + c_2e^{\alpha x}\sin\beta x.$$

The parameters $c_1$ and $c_2$ in Theorem 4.30 could still be taken to be complex-valued in a general mathematical setting, but in physical applications (such as when modeling a mass-spring system) only real values have any potential relevance.

Example 4.31. Find the general solution to $2y'' - 2y' + 13y = 0$.

Solution. The characteristic equation is $2r^2 - 2r + 13 = 0$, which solves to give
$$r = \frac{-(-2) \pm \sqrt{(-2)^2 - 4(2)(13)}}{2(2)} = \frac{2 \pm \sqrt{-100}}{4} = \frac{2 \pm 10i}{4} = \frac{1}{2} \pm \frac{5}{2}i.$$
That is, $r = \alpha \pm i\beta$ with $\alpha = \frac{1}{2}$ and $\beta = \frac{5}{2}$. By Theorem 4.30
$$y(x) = c_1e^{x/2}\cos(5x/2) + c_2e^{x/2}\sin(5x/2)$$
is the general solution to the ODE. ■

Example 4.32. Solve the initial value problem
$$y'' + 9y = 0,\quad y(0) = 1,\quad y'(0) = 1.$$

Solution. The characteristic equation is $r^2 + 9 = 0$, which solves to give $r = \pm\sqrt{-9} = \pm 3i$. That is, $r = \alpha \pm i\beta$ with $\alpha = 0$ and $\beta = 3$. By Theorem 4.30
$$y(x) = c_1\cos 3x + c_2\sin 3x$$
is the general solution to the ODE. From the initial condition $y(0) = 1$ we then obtain
$$1 = c_1\cos 0 + c_2\sin 0,$$
and thus $c_1 = 1$.

Next, given that $y'(x) = -3c_1\sin 3x + 3c_2\cos 3x$ and $y'(0) = 1$, we obtain
$$1 = -3c_1\sin 0 + 3c_2\cos 0,$$


and thus $c_2 = \frac{1}{3}$. Therefore we conclude that
$$y(x) = \cos 3x + \tfrac{1}{3}\sin 3x$$
is the solution to the IVP. ■
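Substituting back confirms the solution of Example 4.32. The Python sketch below is an illustration only; the sample point is an arbitrary choice.

```python
import math

def y(x):   return math.cos(3*x) + math.sin(3*x)/3
def dy(x):  return -3*math.sin(3*x) + math.cos(3*x)
def d2y(x): return -9*math.cos(3*x) - 3*math.sin(3*x)

# y'' + 9y should vanish identically; check at a sample point
residual = d2y(0.8) + 9*y(0.8)
```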

Theorems 4.25 and 4.30 taken together have a natural generalization to address $n$th-order equations of the form (4.20). The proof of the next theorem is best handled using differential operators and so is postponed to the end of the next section.

Theorem 4.33. If the characteristic equation for (4.20) has:

1. Real root $\rho$ of multiplicity $m$, then solutions to the ODE are
$$e^{\rho x},\ xe^{\rho x},\ \ldots,\ x^{m-1}e^{\rho x}.$$

2. Complex root $\alpha + i\beta$ of multiplicity $m$, then solutions to the ODE are
$$e^{\alpha x}\cos\beta x,\ xe^{\alpha x}\cos\beta x,\ \ldots,\ x^{m-1}e^{\alpha x}\cos\beta x$$
and
$$e^{\alpha x}\sin\beta x,\ xe^{\alpha x}\sin\beta x,\ \ldots,\ x^{m-1}e^{\alpha x}\sin\beta x.$$

In the second part of Theorem 4.33 recall that if $\alpha + i\beta$ is a complex root of the characteristic equation for (4.20), then $\alpha - i\beta$ must also be a root, since the polynomial in the characteristic equation has real coefficients.

Example 4.34. Solve the initial value problem
$$y''' - 2y'' - y' + 2y = 0,\quad y(0) = 2,\quad y'(0) = 3,\quad y''(0) = 5.$$

Solution. The characteristic equation is
$$r^3 - 2r^2 - r + 2 = 0,$$
and since
$$r^3 - 2r^2 - r + 2 = r^2(r-2) - (r-2) = (r-2)(r^2-1) = (r-2)(r-1)(r+1),$$
the roots are $2, -1, 1$. By Theorem 4.33 it follows that $e^{-x}$, $e^x$, and $e^{2x}$ are solutions to the ODE on $(-\infty,\infty)$, so that $\{e^{-x}, e^x, e^{2x}\}$ is a fundamental set and Theorem 4.17 implies that the general solution to the ODE is
$$y(x) = c_1e^{-x} + c_2e^x + c_3e^{2x}. \tag{4.29}$$
Next, from (4.29) we obtain
$$y'(x) = -c_1e^{-x} + c_2e^x + 2c_3e^{2x}$$
and
$$y''(x) = c_1e^{-x} + c_2e^x + 4c_3e^{2x},$$
and these equations, together with the given initial conditions, form the system of equations
$$\begin{cases} c_1 + c_2 + c_3 = 2 \\ -c_1 + c_2 + 2c_3 = 3 \\ c_1 + c_2 + 4c_3 = 5 \end{cases}$$


The solution to the system is $(c_1, c_2, c_3) = (0, 1, 1)$, and therefore from (4.29) we find that
$$y(x) = e^x + e^{2x}$$
is the solution to the initial value problem on $(-\infty,\infty)$. ■
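Again the result can be confirmed by substitution. In the Python sketch below (an illustration; the sample point is arbitrary), every derivative of $y(x) = e^x + e^{2x}$ has the simple closed form $e^x + 2^ke^{2x}$.

```python
import math

def d(k, x):
    # k-th derivative of y(x) = e^x + e^{2x}
    return math.exp(x) + 2**k * math.exp(2*x)

# Residual of y''' - 2y'' - y' + 2y at a sample point
residual = d(3, 1.3) - 2*d(2, 1.3) - d(1, 1.3) + 2*d(0, 1.3)

# Initial data (y(0), y'(0), y''(0)) = (2, 3, 5)
initial = (d(0, 0.0), d(1, 0.0), d(2, 0.0))
```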

Example 4.35. Find the general solution to $y^{(4)} + 13y'' + 36y = 0$.

Solution. The characteristic equation is
$$r^4 + 13r^2 + 36 = 0,$$
and since
$$r^4 + 13r^2 + 36 = (r^2+4)(r^2+9) = (r-2i)(r+2i)(r-3i)(r+3i),$$
the roots are $\pm 2i, \pm 3i$. We have $\alpha + i\beta = 2i$ if $\alpha = 0$ and $\beta = 2$, so by Theorem 4.33 it follows that $\cos 2x$ and $\sin 2x$ are solutions to the ODE on $(-\infty,\infty)$. Similarly $\alpha + i\beta = 3i$ if $\alpha = 0$ and $\beta = 3$, so $\cos 3x$ and $\sin 3x$ are also seen to be solutions. Hence
$$\{\cos 2x,\ \sin 2x,\ \cos 3x,\ \sin 3x\}$$
is a fundamental set to the ODE on $(-\infty,\infty)$, and by Theorem 4.17 it follows that
$$y(x) = c_1\cos 2x + c_2\sin 2x + c_3\cos 3x + c_4\sin 3x$$
is the general solution to the ODE on $(-\infty,\infty)$. ■

Example 4.36. Find the general solution to
$$y^{(9)} - 2y^{(8)} - 7y^{(7)} - 4y^{(6)} + 28y^{(5)} + 48y^{(4)} + 36y^{(3)} = 0.$$

Solution. The characteristic equation is
$$r^9 - 2r^8 - 7r^7 - 4r^6 + 28r^5 + 48r^4 + 36r^3 = 0,$$
or equivalently
$$r^3(r^6 - 2r^5 - 7r^4 - 4r^3 + 28r^2 + 48r + 36) = 0.$$
The Rational Roots Theorem of algebra implies that any roots the polynomial in parentheses might possess must be in the set
$$\{\pm 1, \pm 2, \pm 3, \pm 4, \pm 6, \pm 9, \pm 12, \pm 18, \pm 36\}.$$
Either trial-and-error or considering the graph of
$$f(x) = x^6 - 2x^5 - 7x^4 - 4x^3 + 28x^2 + 48x + 36$$
will reveal that $3$ is indeed a root. Carrying out synthetic division,
$$\begin{array}{c|rrrrrrr}
3 & 1 & -2 & -7 & -4 & 28 & 48 & 36 \\
  &   & 3 & 3 & -12 & -48 & -60 & -36 \\ \hline
  & 1 & 1 & -4 & -16 & -20 & -12 & 0
\end{array}$$


we find that
$$r^3(r^6 - 2r^5 - 7r^4 - 4r^3 + 28r^2 + 48r + 36) = r^3(r-3)(r^5 + r^4 - 4r^3 - 16r^2 - 20r - 12).$$
It so happens $3$ is also a root of the 5th-degree polynomial above, and thus
$$r^3(r-3)(r^5 + r^4 - 4r^3 - 16r^2 - 20r - 12) = r^3(r-3)^2(r^4 + 4r^3 + 8r^2 + 8r + 4).$$
If we assume
$$r^4 + 4r^3 + 8r^2 + 8r + 4 = (r^2 + pr + q)^2$$
and expand the right-hand side, we quickly discover $p = q = 2$, and therefore the characteristic equation factors as
$$r^3(r-3)^2(r^2 + 2r + 2)^2 = 0.$$
It is now clear that $0$ is a root of multiplicity 3, and $3$ is a root of multiplicity 2. By Theorem 4.33 solutions to the ODE are thus $1, x, x^2, e^{3x}, xe^{3x}$. Since
$$r^2 + 2r + 2 = 0 \;\Rightarrow\; r^2 + 2r + 1 = -1 \;\Rightarrow\; (r+1)^2 = -1,$$
we also have $-1 \pm i$ as double roots, and hence
$$e^{-x}\cos x,\ xe^{-x}\cos x,\ e^{-x}\sin x,\ xe^{-x}\sin x$$
are additional solutions to the ODE. The general solution to the ODE is therefore
$$y(x) = c_1 + c_2x + c_3x^2 + (c_4 + c_5x)e^{3x} + (c_6\cos x + c_7x\cos x + c_8\sin x + c_9x\sin x)e^{-x}$$
on $(-\infty,\infty)$. ■
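The factorization of the ninth-degree characteristic polynomial can be verified by multiplying the factors back out. The Python sketch below (an illustration; the helper name `polymul` is my own) represents each polynomial as a list of coefficients, lowest degree first, and multiplies by discrete convolution.

```python
def polymul(p, q):
    # Convolve two coefficient lists (lowest degree first)
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

p = [0, 0, 0, 1]                               # r^3
p = polymul(p, polymul([-3, 1], [-3, 1]))      # times (r - 3)^2
p = polymul(p, polymul([2, 2, 1], [2, 2, 1]))  # times (r^2 + 2r + 2)^2

# Coefficients of r^9 - 2r^8 - 7r^7 - 4r^6 + 28r^5 + 48r^4 + 36r^3, low -> high
target = [0, 0, 0, 36, 48, 28, -4, -7, -2, 1]
```

Since all coefficients are integers, the two lists can be compared exactly.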


4.5 – The Differential Operator Approach

The product of two polynomials
$$P_1 = \sum_{k=0}^{n} a_kx^k \quad\text{and}\quad P_2 = \sum_{j=0}^{m} b_jx^j$$
may be expressed as
$$P_1P_2 = \sum_{k=0}^{n}\sum_{j=0}^{m} a_kb_jx^{k+j} \tag{4.30}$$
if no like terms are combined. In similar fashion, given two differential operators
$$\Lambda_1 = \sum_{k=0}^{n} a_kD^k \quad\text{and}\quad \Lambda_2 = \sum_{j=0}^{m} b_jD^j,$$
by Theorem 1.7 the product $\Lambda_1\Lambda_2$ is the differential operator given by
$$\Lambda_1\Lambda_2 = \sum_{k=0}^{n}\sum_{j=0}^{m} a_kb_jD^{k+j}. \tag{4.31}$$
Thus we see that products of differential operators, as defined in §1.2, are formally computed in precisely the same way that products of polynomials are computed in elementary algebra: the expressions at right in (4.30) and (4.31) are identical except for the symbol $x$ in the former being replaced by the derivative operator $D$ in the latter.

A consequence of this is that the polynomial $\sum_{k=0}^{n} a_kx^k$ factors as
$$\sum_{k=0}^{n} a_kx^k = \prod_{j=1}^{p}(r_jx + s_j)^{n_j}$$
if and only if the differential operator $\sum_{k=0}^{n} a_kD^k$ can be written as
$$\sum_{k=0}^{n} a_kD^k = \prod_{j=1}^{p}(r_jD + s_j)^{n_j}.$$
The following example illustrates how this fact may be used to solve differential equations of the form (4.20).

Example 4.37. We consider the ODE
$$6y'' + 7y' - 5y = 0. \tag{4.32}$$
Since
$$6r^2 + 7r - 5 = (2r-1)(3r+5),$$
we have
$$6D^2 + 7D - 5 = (2D-1)(3D+5).$$
Thus for any twice-differentiable function $y(x)$ we find, citing the definition from §1.2 being used over each relevant equal sign, that
$$6y'' + 7y' - 5y = 6D^2y + 7Dy - 5y \overset{(1.5)}{=} (6D^2 + 7D - 5)y = \bigl((2D-1)(3D+5)\bigr)y \overset{(1.11)}{=} (2D-1)\bigl[(3D+5)y\bigr],$$
so that the equation (4.32) may be written as
$$(2D-1)[(3D+5)y] = 0. \tag{4.33}$$
However, by Theorem 1.7, the product operation of differential operators is commutative (i.e. $\Lambda_1\Lambda_2 = \Lambda_2\Lambda_1$), and so (4.32) may also be rendered as
$$(3D+5)[(2D-1)y] = 0. \tag{4.34}$$
Recalling the property (1.9) stating that $\Lambda[0] = 0$ for any differential operator $\Lambda$, we see that (4.33) would be satisfied if $(3D+5)y = 0$, while (4.34) would be satisfied if $(2D-1)y = 0$. Hence (4.32) is satisfied if $y$ is such that
$$3y' + 5y = 0 \quad\text{or}\quad 2y' - y = 0.$$
Both of these equations can be solved by the Separation of Variables Method,⁹ which will show that one solution for $3y' + 5y = 0$ is $y = e^{-5x/3}$, and one solution for $2y' - y = 0$ is $y = e^{x/2}$. It follows that $e^{-5x/3}$ and $e^{x/2}$ are solutions to (4.32), and therefore
$$y = c_1e^{-5x/3} + c_2e^{x/2}$$
is the general solution. ■

9They could also be solved using Theorem 4.33.
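As a check on the factorization, substituting $y = e^{rx}$ into $6y'' + 7y' - 5y$ gives $(6r^2 + 7r - 5)e^{rx}$, which vanishes exactly at the roots $r = 1/2$ and $r = -5/3$ of the factored characteristic polynomial. A Python sketch (an illustration; the sample point is arbitrary):

```python
import math

def residual(r, x):
    # Value of 6y'' + 7y' - 5y for y = e^{rx}, namely (6r^2 + 7r - 5) e^{rx}
    return (6*r*r + 7*r - 5) * math.exp(r*x)

# Both roots of (2r - 1)(3r + 5) annihilate the expression (up to rounding)
checks = [abs(residual(r, 1.0)) for r in (0.5, -5/3)]
```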


4.6 – Method of Undetermined Coefficients

We consider now $n$th-order nonhomogeneous linear differential equations with constant coefficients,
$$a_ny^{(n)} + \cdots + a_1y' + a_0y = f(x), \tag{4.35}$$
for which the nonhomogeneity $f(x)$ is specifically a polynomial, exponential, sine, or cosine function, or certain combinations of these. We will use the Method of Undetermined Coefficients, which is described in the following theorem.

Theorem 4.38 (Method of Undetermined Coefficients). Let $P_m(x)$ be a nonzero polynomial of degree $m$, and let $y_p(x)$ denote a particular solution to (4.35).

1. If $f(x) = P_m(x)e^{\alpha x}$, then
$$y_p(x) = x^se^{\alpha x}\sum_{k=0}^{m} A_kx^k,$$
where $s = 0$ if $\alpha$ is not a root of the characteristic equation; otherwise $s$ equals the multiplicity of $\alpha$ as a root of the characteristic equation.

2. If $f(x) = P_m(x)e^{\alpha x}\cos\beta x$ or $f(x) = P_m(x)e^{\alpha x}\sin\beta x$ for $\beta \ne 0$, then
$$y_p(x) = x^se^{\alpha x}\left(\cos\beta x\sum_{k=0}^{m} A_kx^k + \sin\beta x\sum_{k=0}^{m} B_kx^k\right),$$
where $s = 0$ if $\alpha + i\beta$ is not a root of the characteristic equation; otherwise $s$ equals the multiplicity of $\alpha + i\beta$ as a root of the characteristic equation.

In the first part of the theorem the coefficients that must be determined are $A_0, A_1, \ldots, A_m$, and in the second part they are $A_0, \ldots, A_m$ and $B_0, \ldots, B_m$. It is understood that $\alpha$ and $\beta$ are known constants, and if $\beta = 0$ then the second part of the theorem merely becomes the first part.

There are no arbitrary constants present in the statement of the theorem since the theorem is furnishing templates for a particular solution to (4.35), not a general solution. To construct a general solution to (4.35) is simply a matter of finding the general solution to the corresponding homogeneous equation (4.20), henceforth to be referred to as the reduced equation for (4.35), and then applying Theorem 4.17. The full proof of Theorem 4.38 will come later.

Example 4.39. Find a form for a particular solution $y_p(x)$ to
$$2y'' + 7y' - 15y = f(x)$$
for each expression for $f(x)$.

(a) $f(x) = 3xe^{8x}$

(b) $f(x) = 9x^3e^{-5x}$

(c) $f(x) = (7x^2 - x)e^{4x}$

(d) $f(x) = x\cos 6x$


Solution.
(a) The function $3xe^{8x}$ fits the form $P_m(x)e^{\alpha x}$ in part (1) of the Method of Undetermined Coefficients, with $P_m(x) = 3x$ being a first-degree polynomial so that $m = 1$, and $e^{\alpha x} = e^{8x}$ so that $\alpha = 8$. Hence by Theorem 4.38(1) we have
$$y_p(x) = x^se^{8x}\sum_{k=0}^{1} A_kx^k = x^se^{8x}(A_0 + A_1x).$$
To determine $s$, observe that the characteristic equation $2r^2 + 7r - 15 = 0$ has distinct real roots $-5$ and $3/2$, and so $\alpha = 8$ is not a root of the characteristic equation, and we conclude that $s = 0$ by Theorem 4.38(1). Therefore
$$y_p(x) = (A_1x + A_0)e^{8x}$$
is the desired form.

(b) The function $9x^3e^{-5x}$ fits the form $P_m(x)e^{\alpha x}$, with $P_m(x) = 9x^3$ being a third-degree polynomial so that $m = 3$, and $e^{\alpha x} = e^{-5x}$ so that $\alpha = -5$. Hence by Theorem 4.38(1) we have
$$y_p(x) = x^se^{-5x}\sum_{k=0}^{3} A_kx^k = x^se^{-5x}(A_0 + A_1x + A_2x^2 + A_3x^3).$$
The characteristic equation $2r^2 + 7r - 15 = 0$ has real roots $-5$ and $3/2$, so $\alpha = -5$ is a root of multiplicity 1, and we conclude that $s = 1$ by Theorem 4.38(1). Therefore
$$y_p(x) = x(A_3x^3 + A_2x^2 + A_1x + A_0)e^{-5x}$$
is the desired form.

(c) The function $(7x^2 - x)e^{4x}$ fits the form $P_m(x)e^{\alpha x}$, with $P_m(x) = 7x^2 - x$ being a second-degree polynomial so that $m = 2$, and $e^{\alpha x} = e^{4x}$ so that $\alpha = 4$. Hence by Theorem 4.38(1) we have
$$y_p(x) = x^se^{4x}\sum_{k=0}^{2} A_kx^k = x^se^{4x}(A_0 + A_1x + A_2x^2).$$
The characteristic equation has real roots $-5$ and $3/2$, and since $\alpha = 4$ is not a root we conclude that $s = 0$ by Theorem 4.38(1). Therefore
$$y_p(x) = (A_2x^2 + A_1x + A_0)e^{4x}$$
is the desired form.

(d) The function $x\cos 6x$ fits the form $P_m(x)e^{\alpha x}\cos\beta x$ in part (2) of the Method of Undetermined Coefficients, with $P_m(x) = x$ being a first-degree polynomial so that $m = 1$, $e^{\alpha x} = 1$ so that $\alpha = 0$, and $\cos\beta x = \cos 6x$ so that $\beta = 6$. Hence by Theorem 4.38(2) we have
$$y_p(x) = x^se^{0x}\cos 6x\sum_{k=0}^{1} A_kx^k + x^se^{0x}\sin 6x\sum_{k=0}^{1} B_kx^k = x^s(A_0 + A_1x)\cos 6x + x^s(B_0 + B_1x)\sin 6x.$$
The characteristic equation has roots $-5$ and $3/2$, and since $\alpha + i\beta = 6i$ is not a root we conclude that $s = 0$ by Theorem 4.38(2). Therefore
$$y_p(x) = (A_1x + A_0)\cos 6x + (B_1x + B_0)\sin 6x$$
is the desired form. ■

Example 4.40. Find a particular solution to $2y'' + 7y' - 15y = 3xe^{8x}$.

Solution. In Example 4.39(a) it was found that a particular solution would be of the form
$$y_p(x) = (A_1 x + A_0)e^{8x}.$$
From this we obtain
$$y_p'(x) = 8(A_1 x + A_0)e^{8x} + A_1 e^{8x}$$
and
$$y_p''(x) = 64(A_1 x + A_0)e^{8x} + 16A_1 e^{8x}.$$
Substituting these expressions into the ODE gives
$$3xe^{8x} = 2y_p'' + 7y_p' - 15y_p = 2\big[64(A_1 x + A_0)e^{8x} + 16A_1 e^{8x}\big] + 7\big[8(A_1 x + A_0)e^{8x} + A_1 e^{8x}\big] - 15\big[(A_1 x + A_0)e^{8x}\big] = (169A_1 x + 169A_0 + 39A_1)e^{8x},$$
and hence
$$169A_1 x + (169A_0 + 39A_1) = 3x. \tag{4.36}$$
The objective here is not to solve for $x$, but to find $A_1$ and $A_0$ such that (4.36) is satisfied for all $-\infty < x < \infty$. This requires the coefficients of $x$ on the two sides to be equal, so that $169A_1 = 3$; and the constant terms must also match, so that $169A_0 + 39A_1 = 0$. (There is no constant term on the right, which is to say it is zero.) This gives us a simple system of equations,
$$\begin{cases} 169A_1 = 3 \\ 169A_0 + 39A_1 = 0. \end{cases}$$
Solving this system yields $A_1 = 3/169$ and $A_0 = -9/2197$. Therefore
$$y_p(x) = \left(\frac{3}{169}x - \frac{9}{2197}\right)e^{8x}$$
is a particular solution to the ODE. ∎
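A quick machine check of this result is possible. The sketch below (assuming the third-party SymPy library is installed) substitutes $y_p$ into the left-hand side of the ODE and confirms that the residual vanishes:

```python
import sympy as sp

x = sp.symbols('x')

# Particular solution found in Example 4.40
yp = (sp.Rational(3, 169)*x - sp.Rational(9, 2197)) * sp.exp(8*x)

# Residual of 2y'' + 7y' - 15y = 3x e^{8x}; simplifies to 0
residual = 2*yp.diff(x, 2) + 7*yp.diff(x) - 15*yp - 3*x*sp.exp(8*x)
print(sp.simplify(residual))
```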

Example 4.41. Find a particular solution to $y'' - y = x\sin x$.

Solution. Here $x\sin x$ fits the form $P_m(x)e^{\alpha x}\sin\beta x$ in Theorem 4.38(2), with $P_m(x) = x$ being a first-degree polynomial so that $m = 1$, $e^{\alpha x} = 1$ so that $\alpha = 0$, and $\sin\beta x = \sin x$ so that $\beta = 1$. Hence, by the Method of Undetermined Coefficients a particular solution to the ODE will have the form
$$y_p(x) = x^s e^{0x}\cos x\sum_{k=0}^{1} A_k x^k + x^s e^{0x}\sin x\sum_{k=0}^{1} B_k x^k = x^s(A_1 x + A_0)\cos x + x^s(B_1 x + B_0)\sin x.$$
The characteristic equation is $r^2 - 1 = 0$, which has roots $r = \pm 1$. So, since $\alpha + i\beta = i$ is not a root of the characteristic equation, by Theorem 4.38(2) we conclude that $s = 0$, and hence $y_p(x)$ takes the form
$$y_p(x) = (A_1 x + A_0)\cos x + (B_1 x + B_0)\sin x.$$

From this we get
$$y_p'(x) = (A_1 + B_1 x + B_0)\cos x + (-A_1 x - A_0 + B_1)\sin x$$
and
$$y_p''(x) = (-A_1 x - A_0 + 2B_1)\cos x + (-2A_1 - B_1 x - B_0)\sin x.$$
Substituting these expressions into the ODE gives
$$x\sin x = y_p''(x) - y_p(x) = (-A_1 x - A_0 + 2B_1)\cos x + (-2A_1 - B_1 x - B_0)\sin x - \big[(A_1 x + A_0)\cos x + (B_1 x + B_0)\sin x\big] = (-2B_1)x\sin x + (-2A_1)x\cos x + (-2A_0 + 2B_1)\cos x + (-2A_1 - 2B_0)\sin x.$$
Equating the coefficients of the linearly independent functions $x\sin x$, $x\cos x$, $\cos x$, and $\sin x$ on each side, we obtain the system
$$\begin{cases} -2B_1 = 1 \\ -2A_1 = 0 \\ -2A_0 + 2B_1 = 0 \\ -2A_1 - 2B_0 = 0. \end{cases}$$
Solving this system gives $A_0 = -1/2$, $A_1 = 0$, $B_0 = 0$, and $B_1 = -1/2$. Therefore
$$y_p(x) = -\frac12\cos x - \frac12 x\sin x$$
is a particular solution to the ODE. ∎
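As with the previous example, this answer can be verified symbolically. A minimal sketch, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')

# Particular solution found in Example 4.41
yp = -sp.cos(x)/2 - x*sp.sin(x)/2

# Residual of y'' - y = x sin x; simplifies to 0
residual = yp.diff(x, 2) - yp - x*sp.sin(x)
print(sp.simplify(residual))
```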

Example 4.42. Determine the form of a particular solution to
$$y'' + 5y' + 6y = \sin x - \cos 2x. \tag{4.37}$$

Solution. First consider the equation
$$y'' + 5y' + 6y = \sin x.$$
Consulting Theorem 4.38, we have $f_1(x) = P_m(x)e^{\alpha x}\sin\beta x$ with $P_m(x) = 1$, $e^{\alpha x} = 1$, and $\sin\beta x = \sin x$, so that $m = 0$, $\alpha = 0$, and $\beta = 1$. Now, the characteristic equation
$$r^2 + 5r + 6 = 0$$
has roots $-3$ and $-2$, and since $\alpha + i\beta = i$ is not a root, Theorem 4.38(2) implies that $s = 0$ and the particular solution takes the form
$$y_p(x) = x^0 e^{0x}\cos x\sum_{k=0}^{0} A_k x^k + x^0 e^{0x}\sin x\sum_{k=0}^{0} B_k x^k = A_0\cos x + B_0\sin x.$$

Next we consider the equation
$$y'' + 5y' + 6y = -\cos 2x.$$
We have $f_2(x) = P_m(x)e^{\alpha x}\cos\beta x$ with $P_m(x) = -1$, $e^{\alpha x} = 1$, and $\cos\beta x = \cos 2x$, so that $m = 0$, $\alpha = 0$, and $\beta = 2$. Since $\alpha + i\beta = 2i$ is not a root of the characteristic equation, Theorem 4.38(2) implies that $s = 0$ and the particular solution takes the form
$$y_p(x) = x^0 e^{0x}\cos 2x\sum_{k=0}^{0} C_k x^k + x^0 e^{0x}\sin 2x\sum_{k=0}^{0} D_k x^k = C_0\cos 2x + D_0\sin 2x.$$
Now, by the Superposition Principle a particular solution to (4.37), where $\sin x - \cos 2x = f_1(x) + f_2(x)$, has the form
$$A_0\cos x + B_0\sin x + C_0\cos 2x + D_0\sin 2x,$$
which is a linear combination of four linearly independent functions. ∎

Example 4.43. Find a general solution to
$$y'' + 2y' + y = x^2 + 1 - e^x, \tag{4.38}$$
and then find the solution to the initial value problem
$$y'' + 2y' + y = x^2 + 1 - e^x, \quad y(0) = 0, \quad y'(0) = 2.$$

Solution. First we consider the equation
$$y'' + 2y' + y = x^2 + 1,$$
which has nonhomogeneity $f_1(x) = x^2 + 1$ of the form $P_m(x)e^{\alpha x}$ with $P_m(x) = x^2 + 1$ and $e^{\alpha x} = 1$, so that $m = 2$ and $\alpha = 0$. The associated characteristic equation
$$r^2 + 2r + 1 = 0$$
has the repeated real root $r = -1$, which makes clear that $\alpha = 0$ is not a root, and so by Theorem 4.38(1) we conclude that
$$y_{p_1}(x) = x^0 e^{0x}\sum_{k=0}^{2} A_k x^k = A_2 x^2 + A_1 x + A_0 \tag{4.39}$$
is the form of a particular solution. Substituting this, along with $y_{p_1}'(x) = 2A_2 x + A_1$ and $y_{p_1}''(x) = 2A_2$, into $y'' + 2y' + y = x^2 + 1$ gives
$$2A_2 + 2(2A_2 x + A_1) + (A_2 x^2 + A_1 x + A_0) = x^2 + 1,$$
which we may rewrite as
$$A_2 x^2 + (4A_2 + A_1)x + (2A_2 + 2A_1 + A_0) = x^2 + 0x + 1.$$
Equating coefficients gives rise to the system
$$\begin{cases} A_2 = 1 \\ A_1 + 4A_2 = 0 \\ A_0 + 2A_1 + 2A_2 = 1. \end{cases}$$
Solving this system yields $A_0 = 7$, $A_1 = -4$, and of course $A_2 = 1$. Putting these values into (4.39) delivers
$$y_{p_1}(x) = x^2 - 4x + 7.$$

Next we turn to the equation
$$y'' + 2y' + y = -e^x,$$
which has nonhomogeneity $f_2(x) = -e^x$ of the form $P_m(x)e^{\alpha x}$ with $P_m(x) = -1$ and $e^{\alpha x} = e^x$, so that $m = 0$ and $\alpha = 1$. Since $\alpha = 1$ is not a root of the characteristic equation, by Theorem 4.38(1) we conclude that
$$y_{p_2}(x) = x^0 e^x\sum_{k=0}^{0} A_k x^k = A_0 e^x \tag{4.40}$$
is the form of a particular solution. Substituting this into $y'' + 2y' + y = -e^x$ gives
$$4A_0 e^x = -e^x,$$
and so we must have $4A_0 = -1$, or $A_0 = -1/4$. Putting this result into (4.40) results in
$$y_{p_2}(x) = -\frac14 e^x.$$

The Superposition Principle now implies that
$$y_p(x) = y_{p_1}(x) + y_{p_2}(x) = x^2 - 4x + 7 - \frac14 e^x$$
is a particular solution to (4.38). Since the characteristic equation has the repeated real root $-1$, by Theorem 4.25 the reduced equation $y'' + 2y' + y = 0$ has general solution
$$y_h(x) = c_1 e^{-x} + c_2 x e^{-x},$$
and so by Theorem 4.17 the general solution to (4.38) is $y = y_p + y_h$, or
$$y(x) = x^2 - 4x + 7 - \frac14 e^x + c_1 e^{-x} + c_2 x e^{-x}. \tag{4.41}$$
Finally we turn to the matter of solving the initial value problem. From (4.41) and the initial condition $y(0) = 0$ we obtain $0 = 7 - 1/4 + c_1$, or $c_1 = -27/4$. Now, differentiating (4.41) gives
$$y'(x) = 2x - 4 - \frac14 e^x + \frac{27}{4}e^{-x} + c_2 e^{-x} - c_2 x e^{-x},$$
which together with the initial condition $y'(0) = 2$ implies that $c_2 = -1/2$. Therefore
$$y(x) = x^2 - 4x + 7 - \frac14 e^x - \frac{27}{4}e^{-x} - \frac12 x e^{-x}$$
is the solution to the IVP. ∎
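Both the ODE and the two initial conditions can be checked in one pass. A sketch, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')

# Solution to the IVP in Example 4.43
y = x**2 - 4*x + 7 - sp.exp(x)/4 - sp.Rational(27, 4)*sp.exp(-x) - x*sp.exp(-x)/2

# ODE residual and both initial conditions all come out to zero / the given values
assert sp.simplify(y.diff(x, 2) + 2*y.diff(x) + y - (x**2 + 1 - sp.exp(x))) == 0
assert y.subs(x, 0) == 0
assert y.diff(x).subs(x, 0) == 2
print("IVP solution verified")
```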

Example 4.44. Find a general solution to
$$y^{(4)} - 5y'' + 4y = 10\cos x - 20\sin x. \tag{4.42}$$

Solution. Start with
$$y^{(4)} - 5y'' + 4y = 10\cos x.$$
We have $f_1(x) = P_m(x)e^{\alpha x}\cos\beta x$ with $P_m(x) = 10$, $e^{\alpha x} = 1$, and $\cos\beta x = \cos x$, so that $m = 0$, $\alpha = 0$, and $\beta = 1$. The characteristic equation $r^4 - 5r^2 + 4 = 0$ factors first as $(r^2 - 4)(r^2 - 1) = 0$, and then as
$$(r - 2)(r + 2)(r - 1)(r + 1) = 0,$$
so that it can be seen to have roots $r = \pm 2, \pm 1$. Since $\alpha + i\beta = i$ is not a root of the characteristic equation, we set $s = 0$ in Theorem 4.38(2) to obtain
$$y_{p_1}(x) = x^0 e^{0x}\cos x\sum_{k=0}^{0} A_k x^k + x^0 e^{0x}\sin x\sum_{k=0}^{0} B_k x^k = A_0\cos x + B_0\sin x$$
as the form of a particular solution to the ODE. Next we consider
$$y^{(4)} - 5y'' + 4y = -20\sin x.$$

We have $f_2(x) = P_m(x)e^{\alpha x}\sin\beta x$ with $P_m(x) = -20$, $e^{\alpha x} = 1$, and $\sin\beta x = \sin x$, so that $m = 0$, $\alpha = 0$, and $\beta = 1$. Since $\alpha + i\beta = i$ is not a root of the characteristic equation $r^4 - 5r^2 + 4 = 0$, we set $s = 0$ in Theorem 4.38(2) to obtain
$$y_{p_2}(x) = x^0 e^{0x}\cos x\sum_{k=0}^{0} C_k x^k + x^0 e^{0x}\sin x\sum_{k=0}^{0} D_k x^k = C_0\cos x + D_0\sin x$$
as the form of a particular solution to the ODE. (Note: the symbols $A_0$ and $B_0$ are already in use in this problem, so $C_0$ and $D_0$ are used here instead.)

Now, by the Superposition Principle the form for a particular solution to (4.42) is given by $y_p = y_{p_1} + y_{p_2}$; that is,
$$y_p(x) = (A_0\cos x + B_0\sin x) + (C_0\cos x + D_0\sin x).$$
However, the four terms on the right-hand side of the equation do not represent four linearly independent functions. If we let $A = A_0 + C_0$ and $B = B_0 + D_0$, we may write simply
$$y_p(x) = A\cos x + B\sin x.$$

Substituting this into (4.42) gives
$$10\cos x - 20\sin x = y_p^{(4)}(x) - 5y_p''(x) + 4y_p(x) = (A\cos x + B\sin x) - 5(-A\cos x - B\sin x) + 4(A\cos x + B\sin x) = 10A\cos x + 10B\sin x.$$
Matching coefficients gives $10A = 10$ and $10B = -20$, so that $A = 1$ and $B = -2$, and the particular solution becomes
$$y_p(x) = \cos x - 2\sin x.$$

A general solution for the reduced equation $y^{(4)} - 5y'' + 4y = 0$ is
$$y_h(x) = c_1 e^{-2x} + c_2 e^{2x} + c_3 e^{-x} + c_4 e^x,$$
since the roots of the characteristic equation are $\pm 2, \pm 1$. Therefore by Theorem 4.17 we conclude that
$$y(x) = \cos x - 2\sin x + c_1 e^{-2x} + c_2 e^{2x} + c_3 e^{-x} + c_4 e^x$$
is the general solution to (4.42). ∎
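The particular solution can again be checked mechanically. A sketch, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')

# Particular solution found in Example 4.44
yp = sp.cos(x) - 2*sp.sin(x)

# Residual of y'''' - 5y'' + 4y = 10 cos x - 20 sin x; simplifies to 0
residual = yp.diff(x, 4) - 5*yp.diff(x, 2) + 4*yp - (10*sp.cos(x) - 20*sp.sin(x))
print(sp.simplify(residual))
```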


4.7 – Method of Variation of Parameters

We begin by restricting our focus to second-order linear equations. If $\{y_1, y_2\}$ is a fundamental set to
$$y'' + p_1(x)y' + p_0(x)y = 0 \tag{4.43}$$
on an interval $I$, where $p_0$ and $p_1$ are assumed to be continuous functions on $I$, then Theorem 4.15 implies that the general solution to (4.43) on $I$ is $c_1 y_1 + c_2 y_2$. Now, if we replace the zero function on the right-hand side of (4.43) by some function $q(x)$ that is continuous on $I$, we obtain the ODE
$$y'' + p_1(x)y' + p_0(x)y = q(x), \tag{4.44}$$
and we make a conjecture that there exist differentiable functions $u_1$ and $u_2$ such that a particular solution $y_p$ to (4.44) can be expressed in the form
$$y_p(x) = u_1(x)y_1(x) + u_2(x)y_2(x);$$
that is, a form in which the parameters $c_1$ and $c_2$ in the general solution $c_1 y_1 + c_2 y_2$ to (4.43) are allowed to vary as functions of $x$. This approach is called the Variation of Parameters Method.

If the functions $u_1$ and $u_2$ exist, then
$$y_p' = u_1 y_1' + u_1' y_1 + u_2 y_2' + u_2' y_2$$
and
$$y_p'' = u_1 y_1'' + 2u_1' y_1' + u_1'' y_1 + u_2 y_2'' + 2u_2' y_2' + u_2'' y_2,$$
which, when substituted into (4.44), yields
$$(u_1 y_1'' + 2u_1' y_1' + u_1'' y_1 + u_2 y_2'' + 2u_2' y_2' + u_2'' y_2) + p_1(u_1 y_1' + u_1' y_1 + u_2 y_2' + u_2' y_2) + p_0(u_1 y_1 + u_2 y_2) = q. \tag{4.45}$$
By hypothesis $y_1$ and $y_2$ are solutions to (4.43), so that
$$y_1'' + p_1(x)y_1' + p_0(x)y_1 = 0 \quad\text{and}\quad y_2'' + p_1(x)y_2' + p_0(x)y_2 = 0,$$
and hence (4.45) becomes
$$(u_1'' y_1 + u_2'' y_2 + 2u_1' y_1' + 2u_2' y_2') + p_1(u_1' y_1 + u_2' y_2) = q.$$
This we rewrite as
$$\big[(u_1' y_1 + u_2' y_2)' + (u_1' y_1' + u_2' y_2')\big] + p_1(u_1' y_1 + u_2' y_2) = q,$$
which can be seen to be satisfied on $I$ if the system
$$\begin{cases} u_1'(x)y_1(x) + u_2'(x)y_2(x) = 0 \\ u_1'(x)y_1'(x) + u_2'(x)y_2'(x) = q(x), \end{cases}$$
or equivalently
$$\begin{bmatrix} y_1(x) & y_2(x) \\ y_1'(x) & y_2'(x) \end{bmatrix}\begin{bmatrix} u_1'(x) \\ u_2'(x) \end{bmatrix} = \begin{bmatrix} 0 \\ q(x) \end{bmatrix}, \tag{4.46}$$

is satisfied for all $x \in I$. Since $\{y_1, y_2\}$ is a linearly independent set of functions on $I$, Theorem 4.12 implies that $W[y_1, y_2](x) \ne 0$ for each $x \in I$, and thus by the Invertible Matrix Theorem the leftmost matrix in (4.46) is invertible for each $x \in I$. In fact, by Cramer's Rule we have
$$u_1'(x) = \frac{1}{W[y_1, y_2](x)}\begin{vmatrix} 0 & y_2(x) \\ q(x) & y_2'(x) \end{vmatrix} \quad\text{and}\quad u_2'(x) = \frac{1}{W[y_1, y_2](x)}\begin{vmatrix} y_1(x) & 0 \\ y_1'(x) & q(x) \end{vmatrix} \tag{4.47}$$

for each $x \in I$. It would seem that our wager that a particular solution to (4.44) has the form $y_p = u_1 y_1 + u_2 y_2$ has paid off, but there is still the question of whether the functions $u_1'$ and $u_2'$ as given by (4.47) have antiderivatives on $I$. But they do: the function $W[y_1, y_2]$ is continuous on $I$ by Proposition 4.20, and since it is also nonvanishing on $I$, it follows that $u_1'$ and $u_2'$ in (4.47) are likewise continuous on $I$. By the Fundamental Theorem of Calculus this ensures that $u_1'$ and $u_2'$ each have an antiderivative on $I$. Indeed, by Theorem 1.4 we may let
$$u_1(x) = -\int_{x_0}^{x}\frac{y_2(t)q(t)}{W[y_1, y_2](t)}\,dt \quad\text{and}\quad u_2(x) = \int_{x_0}^{x}\frac{y_1(t)q(t)}{W[y_1, y_2](t)}\,dt \tag{4.48}$$

for any choice of $x_0 \in I$. It can be shown that if $u_1 y_1 + u_2 y_2$ is a particular solution to (4.44), then so too is
$$(u_1 + k_1)y_1 + (u_2 + k_2)y_2$$
for any constants $k_1, k_2 \in \mathbb{R}$. Thus, whatever choice is made for $x_0 \in I$ in (4.48), the values of $k_1$ and $k_2$ can be chosen so that the expressions for $u_1(x) + k_1$ and $u_2(x) + k_2$ have no constant term. Because of this it is common to informally write the integrals in (4.48) as
$$u_1(x) = -\int\frac{y_2(x)q(x)}{W[y_1, y_2](x)}\,dx \quad\text{and}\quad u_2(x) = \int\frac{y_1(x)q(x)}{W[y_1, y_2](x)}\,dx, \tag{4.49}$$
with the understanding that the integrals in (4.49) represent antiderivatives for $u_1'(x)$ and $u_2'(x)$ on $I$ with constant terms usually chosen to be 0.
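The formulas in (4.49) translate directly into a short symbolic routine. The sketch below assumes the SymPy library is available; the function name `variation_of_parameters` and the sample equation $y'' - y = e^x$ are illustrative choices, not taken from the text:

```python
import sympy as sp

def variation_of_parameters(y1, y2, q, x):
    """Particular solution u1*y1 + u2*y2 built from the formulas in (4.49),
    with the antiderivative constants taken to be zero."""
    W = y1*sp.diff(y2, x) - y2*sp.diff(y1, x)   # Wronskian W[y1, y2]
    u1 = sp.integrate(-y2*q/W, x)
    u2 = sp.integrate(y1*q/W, x)
    return sp.simplify(u1*y1 + u2*y2)

x = sp.symbols('x')
# Illustration: y'' - y = e^x, with fundamental set {e^x, e^{-x}}
yp = variation_of_parameters(sp.exp(x), sp.exp(-x), sp.exp(x), x)
print(yp)  # one valid particular solution
```

Different antiderivative constants would shift `yp` by a homogeneous solution, which is why any such output is acceptable.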

Example 4.45. Find a general solution to
$$y'' + 4y = \tan 2x$$
on the interval $(-\pi/4, \pi/4)$.

Solution. The characteristic equation $r^2 + 4 = 0$ has roots $\pm 2i$, so the general solution to the reduced equation $y'' + 4y = 0$ is
$$y_h(x) = c_1\cos 2x + c_2\sin 2x.$$
We see that $y_1(x) = \cos 2x$ and $y_2(x) = \sin 2x$ are two linearly independent solutions to $y'' + 4y = 0$. By the Method of Variation of Parameters a proposed particular solution to the ODE is of the form
$$y_p(x) = u_1(x)\cos 2x + u_2(x)\sin 2x,$$
where $u_1$ and $u_2$ are given by (4.49). Hence
$$u_1(x) = -\int\frac{\sin^2 2x}{2\cos 2x}\,dx = \int\frac{\cos^2 2x - 1}{2\cos 2x}\,dx = \frac12\int(\cos 2x - \sec 2x)\,dx = \frac{\sin 2x}{4} - \frac14\ln\big|\sec 2x + \tan 2x\big|,$$
and
$$u_2(x) = \frac12\int\sin 2x\,dx = -\frac{\cos 2x}{4}.$$

In the expressions for $u_1(x)$ and $u_2(x)$ we (quite arbitrarily) choose the arbitrary constant terms to be zero.

Finally we determine $y_p$:
$$y_p(x) = u_1(x)\cos 2x + u_2(x)\sin 2x = \left[\frac{\sin 2x}{4} - \frac14\ln\big|\sec 2x + \tan 2x\big|\right]\cos 2x - \frac{\cos 2x\sin 2x}{4} = -\frac14(\cos 2x)\ln\big|\sec 2x + \tan 2x\big|.$$
By Theorem 4.17 the general solution to the ODE is $y(x) = y_p(x) + y_h(x)$, or
$$y(x) = -\frac{\cos 2x}{4}\ln\big|\sec 2x + \tan 2x\big| + c_1\cos 2x + c_2\sin 2x$$
for all $-\pi/4 < x < \pi/4$. ∎
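A numerical spot-check of $y_p$ is straightforward. The sketch below assumes SymPy is available, and drops the absolute value since $\sec 2x + \tan 2x > 0$ on $(-\pi/4, \pi/4)$:

```python
import sympy as sp

x = sp.symbols('x')

# Particular solution found in Example 4.45
yp = -sp.cos(2*x)/4 * sp.log(sp.sec(2*x) + sp.tan(2*x))

# Residual of y'' + 4y = tan 2x, evaluated at a sample point in (-pi/4, pi/4)
residual = yp.diff(x, 2) + 4*yp - sp.tan(2*x)
print(abs(residual.subs(x, sp.Rational(1, 10)).evalf()))
```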

In the example above we could have found a general solution to the ODE on any open interval where $\tan 2x$ is continuous, such as $(\pi/4, 3\pi/4)$. The resulting expression for $y(x)$ would have been the same.

As the next theorem demonstrates, the Variation of Parameters Method can be extended to treat linear ordinary differential equations of any order. As in the second-order case the key is to have on hand a fundamental set to the reduced equation, which the method does not offer a means of finding.

Theorem 4.46 (Variation of Parameters). Suppose $Y = \{y_1, \dots, y_n\}$ is a fundamental set to
$$y^{(n)}(x) + p_{n-1}(x)y^{(n-1)}(x) + \cdots + p_1(x)y'(x) + p_0(x)y(x) = 0$$
on an interval $I$. If $q$ is continuous on $I$, then there exist functions $u_1, \dots, u_n$ which satisfy the system
$$\begin{cases} u_1' y_1 + \cdots + u_n' y_n = 0 \\ u_1' y_1' + \cdots + u_n' y_n' = 0 \\ \qquad\vdots \\ u_1' y_1^{(n-2)} + \cdots + u_n' y_n^{(n-2)} = 0 \\ u_1' y_1^{(n-1)} + \cdots + u_n' y_n^{(n-1)} = q \end{cases} \tag{4.50}$$
on $I$, and for any such functions the function $y = \sum_{k=1}^{n} u_k y_k$ is a solution to
$$y^{(n)}(x) + p_{n-1}(x)y^{(n-1)}(x) + \cdots + p_1(x)y'(x) + p_0(x)y(x) = q(x) \tag{4.51}$$
on $I$.


Proof. Suppose $q$ is continuous on $I$. Defining
$$\mathbf{u}' = \begin{bmatrix} u_1' \\ \vdots \\ u_n' \end{bmatrix} \quad\text{and}\quad \mathbf{q} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ q \end{bmatrix},$$
the system (4.50) takes the form of the matrix equation $M[Y]\mathbf{u}' = \mathbf{q}$. For each $x \in I$ the determinant of $M[Y](x)$ is $W[Y](x)$, and since $W[Y](x) \ne 0$ by Theorem 4.12, the system $M[Y](x)\mathbf{u}'(x) = \mathbf{q}(x)$ has a unique solution $\mathbf{u}'(x)$ by the Invertible Matrix Theorem. This implies there are functions $u_1', \dots, u_n' : I \to \mathbb{R}$ that satisfy (4.50) on $I$. Indeed, defining $\mathbf{y}_k = [y_k, y_k', \dots, y_k^{(n-1)}]^\top$ and then
$$W_k[Y] = \det(\mathbf{y}_1, \dots, \mathbf{y}_{k-1}, \mathbf{q}, \mathbf{y}_{k+1}, \dots, \mathbf{y}_n),$$
by Cramer's Rule we have
$$u_k' = \frac{W_k[Y]}{W[Y]}.$$
Proposition 4.20 implies $W[Y]$ is continuous on $I$, and the same argument given in the proof of the proposition shows that $W_k[Y]$ is also continuous on $I$. Then since $W[Y]$ is nonvanishing on $I$ it follows that $u_k'$ is continuous on $I$, and therefore has an antiderivative $u_k$ there. With the antiderivatives $u_1, \dots, u_n$ we now define $y = \sum_{k=1}^{n} u_k y_k$.

Using the first equation in (4.50), we obtain
$$y' = \sum_{k=1}^{n}(u_k y_k)' = \sum_{k=1}^{n}(u_k y_k' + u_k' y_k) = \sum_{k=1}^{n} u_k y_k'.$$
Taking the derivative of both sides of the first equation in (4.50) gives
$$\sum_{k=1}^{n}(u_k' y_k' + u_k'' y_k) = 0,$$
which together with the second equation in (4.50) gives
$$y'' = \sum_{k=1}^{n}(u_k y_k)'' = \sum_{k=1}^{n}(u_k y_k'' + 2u_k' y_k' + u_k'' y_k) = \sum_{k=1}^{n} u_k y_k''.$$
Continuing in this fashion, we find that
$$y^{(j)} = \sum_{k=1}^{n} u_k y_k^{(j)} \tag{4.52}$$
for each $0 \le j \le n-1$, and finally, using the last equation in (4.50),

$$y^{(n)} = \big[y^{(n-1)}\big]' = \left(\sum_{k=1}^{n} u_k y_k^{(n-1)}\right)' = \sum_{k=1}^{n} u_k y_k^{(n)} + \sum_{k=1}^{n} u_k' y_k^{(n-1)} = \sum_{k=1}^{n} u_k y_k^{(n)} + q. \tag{4.53}$$


We now substitute $y = \sum_{k=1}^{n} u_k y_k$ into the left-hand side of (4.51), and with (4.52) and (4.53) we find that
$$\begin{aligned}
y^{(n)} + p_{n-1}y^{(n-1)} + \cdots + p_0 y &= \left(\sum_{k=1}^{n} u_k y_k^{(n)} + q\right) + p_{n-1}\sum_{k=1}^{n} u_k y_k^{(n-1)} + \cdots + p_1\sum_{k=1}^{n} u_k y_k' + p_0\sum_{k=1}^{n} u_k y_k \\
&= q + \sum_{k=1}^{n}\Big[u_k y_k^{(n)} + p_{n-1}u_k y_k^{(n-1)} + \cdots + p_1 u_k y_k' + p_0 u_k y_k\Big] \\
&= q + \sum_{k=1}^{n}\Big[y_k^{(n)} + p_{n-1}y_k^{(n-1)} + \cdots + p_1 y_k' + p_0 y_k\Big]u_k \\
&= q + \sum_{k=1}^{n}(0)u_k = q.
\end{aligned}$$
This shows that $y = \sum_{k=1}^{n} u_k y_k$ is a solution to (4.51). ∎

The statement of Theorem 4.46, together with the revelations in its proof, immediately implies the following corollary concerning the general solution to a linear ODE. The last statement in the corollary follows from Theorem 4.17.

Corollary 4.47. Suppose $q : I \to \mathbb{R}$ is continuous. If $Y = \{y_1, \dots, y_n\}$ is a fundamental set to
$$y^{(n)} + p_{n-1}y^{(n-1)} + \cdots + p_1 y' + p_0 y = 0$$
on $I$, then a particular solution to
$$y^{(n)} + p_{n-1}y^{(n-1)} + \cdots + p_1 y' + p_0 y = q \tag{4.54}$$
on $I$ is
$$y_p(x) = \sum_{k=1}^{n} y_k(x)\int_{x_0}^{x}\frac{W_k[Y](t)}{W[Y](t)}\,dt$$
for any choice of constant $x_0 \in I$. Therefore
$$y(x) = y_p(x) + \sum_{k=1}^{n} c_k y_k(x)$$
is the general solution to (4.54) on $I$.

By the same reasoning that allowed us to pass from (4.48) to (4.49), we may write the expression for $y_p(x)$ in Corollary 4.47 as
$$y_p(x) = \sum_{k=1}^{n} y_k(x)\int\frac{W_k[Y](x)}{W[Y](x)}\,dx, \tag{4.55}$$
with the integral in (4.55) being interpreted as an antiderivative for $W_k[Y](x)/W[Y](x)$ on $I$ with constant term usually chosen to be 0. In the $n = 3$ case, in particular, we have
$$y_p(x) = y_1(x)\int\frac{W_1[Y](x)}{W[Y](x)}\,dx + y_2(x)\int\frac{W_2[Y](x)}{W[Y](x)}\,dx + y_3(x)\int\frac{W_3[Y](x)}{W[Y](x)}\,dx,$$
where
$$W_1[Y] = \begin{vmatrix} 0 & y_2 & y_3 \\ 0 & y_2' & y_3' \\ q & y_2'' & y_3'' \end{vmatrix}, \quad W_2[Y] = \begin{vmatrix} y_1 & 0 & y_3 \\ y_1' & 0 & y_3' \\ y_1'' & q & y_3'' \end{vmatrix}, \quad W_3[Y] = \begin{vmatrix} y_1 & y_2 & 0 \\ y_1' & y_2' & 0 \\ y_1'' & y_2'' & q \end{vmatrix}. \tag{4.56}$$

Example 4.48. Find the general solution to
$$y'' - 4y = \frac{e^{2x}}{x}. \tag{4.57}$$

Solution. Here $q(x) = e^{2x}/x$ is continuous on $(-\infty, 0)$ and $(0, \infty)$, with a discontinuity at 0, so let $I$ represent either one of these intervals. The characteristic equation $r^2 - 4 = 0$ has roots $r = \pm 2$, and so $Y = \{y_1, y_2\}$ with $y_1(x) = e^{2x}$ and $y_2(x) = e^{-2x}$ is a fundamental set to the reduced equation $y'' - 4y = 0$. We calculate
$$W[Y](x) = \begin{vmatrix} e^{2x} & e^{-2x} \\ 2e^{2x} & -2e^{-2x} \end{vmatrix} = -4,$$
$$W_1[Y](x) = \begin{vmatrix} 0 & e^{-2x} \\ e^{2x}/x & -2e^{-2x} \end{vmatrix} = -\frac{1}{x},$$
$$W_2[Y](x) = \begin{vmatrix} e^{2x} & 0 \\ 2e^{2x} & e^{2x}/x \end{vmatrix} = \frac{e^{4x}}{x}.$$

Referring to (4.55), a particular solution to (4.57) has the form
$$y_p(x) = y_1(x)\int\frac{W_1[Y](x)}{W[Y](x)}\,dx + y_2(x)\int\frac{W_2[Y](x)}{W[Y](x)}\,dx.$$
We easily find that
$$\int\frac{W_1[Y](x)}{W[Y](x)}\,dx = \int\frac{1}{4x}\,dx = \frac14\ln|x|,$$
where we conveniently choose the constant term to be 0; however,
$$\int\frac{W_2[Y](x)}{W[Y](x)}\,dx = -\frac14\int\frac{e^{4x}}{x}\,dx$$
is not an integral that can be resolved as an elementary function. To retain technical precision, therefore, we set
$$\int\frac{W_2[Y](x)}{W[Y](x)}\,dx = -\frac14\int_{x_0}^{x}\frac{e^{4t}}{t}\,dt$$
for any choice of $x_0 \in I$, and therefore by Corollary 4.47 the general solution to (4.57) is
$$y(x) = \frac{e^{2x}}{4}\ln x - \frac{e^{-2x}}{4}\int_{x_0}^{x}\frac{e^{4t}}{t}\,dt + c_1 e^{2x} + c_2 e^{-2x}$$
for any $x_0 > 0$ if $I = (0, \infty)$, or
$$y(x) = \frac{e^{2x}}{4}\ln(-x) - \frac{e^{-2x}}{4}\int_{x_0}^{x}\frac{e^{4t}}{t}\,dt + c_1 e^{2x} + c_2 e^{-2x}$$
for any $x_0 < 0$ if $I = (-\infty, 0)$. ∎

Example 4.49. Find the general solution to
$$x^3 y''' - 3xy' + 3y = x^4\cos x, \quad x > 0. \tag{4.58}$$

Solution. We start by finding a fundamental set to
$$x^3 y''' - 3xy' + 3y = 0 \tag{4.59}$$
on $(0, \infty)$. Since the coefficients of this equation are all polynomials, there is the possibility that there is a solution of the form $y = x^m$.¹⁰ Substituting $x^m$ for $y$ in (4.59) yields
$$x^3 \cdot m(m-1)(m-2)x^{m-3} - 3x \cdot mx^{m-1} + 3x^m = 0,$$
which simplifies to become
$$(m^3 - 3m^2 - m + 3)x^m = 0.$$
This is satisfied for $x > 0$ if and only if $m^3 - 3m^2 - m + 3 = 0$, and since
$$m^3 - 3m^2 - m + 3 = m^2(m - 3) - (m - 3) = (m^2 - 1)(m - 3),$$
we arrive at three possibilities, namely $m = -1, 1, 3$, showing that $\{x^{-1}, x, x^3\}$ is a fundamental set to (4.59).

To apply the Method of Variation of Parameters we need to work with an ODE that is in standard form. For $x > 0$ we find that (4.58) holds if and only if
$$y''' - \frac{3}{x^2}y' + \frac{3}{x^3}y = x\cos x, \tag{4.60}$$
so that the general solution to (4.58) equals the general solution to the standard-form equation (4.60), and moreover $Y = \{y_1, y_2, y_3\}$ with $y_1(x) = x^{-1}$, $y_2(x) = x$, $y_3(x) = x^3$ is a fundamental set to the reduced equation for (4.60). Letting
$$u_k(x) = \int\frac{W_k[Y](x)}{W[Y](x)}\,dx,$$
where each $W_k[Y]$ is given by (4.56) with $q(x) = x\cos x$, by Corollary 4.47 a particular solution to (4.60) has the form
$$y_p(x) = x^{-1}u_1(x) + xu_2(x) + x^3 u_3(x).$$
What remains is some calculation. We have
$$W[Y](x) = \begin{vmatrix} x^{-1} & x & x^3 \\ -x^{-2} & 1 & 3x^2 \\ 2x^{-3} & 0 & 6x \end{vmatrix} = 16,$$

¹⁰In fact (4.59) is a Cauchy-Euler equation, treated in the next section.

$$W_1[Y](x) = \begin{vmatrix} 0 & x & x^3 \\ 0 & 1 & 3x^2 \\ x\cos x & 0 & 6x \end{vmatrix} = 2x^4\cos x,$$
$$W_2[Y](x) = \begin{vmatrix} x^{-1} & 0 & x^3 \\ -x^{-2} & 0 & 3x^2 \\ 2x^{-3} & x\cos x & 6x \end{vmatrix} = -4x^2\cos x,$$
$$W_3[Y](x) = \begin{vmatrix} x^{-1} & x & 0 \\ -x^{-2} & 1 & 0 \\ 2x^{-3} & 0 & x\cos x \end{vmatrix} = 2\cos x,$$
so that
$$u_1(x) = \frac18\int x^4\cos x\,dx, \quad u_2(x) = -\frac14\int x^2\cos x\,dx, \quad u_3(x) = \frac18\int\cos x\,dx.$$

Clearly $u_3(x) = \frac18\sin x$. Applying integration by parts twice yields
$$\int x^2\cos x\,dx = x^2\sin x - \int 2x\sin x\,dx = x^2\sin x + 2x\cos x - 2\sin x,$$
which we use after more integration by parts to obtain
$$\int x^4\cos x\,dx = x^4\sin x - \left(-4x^3\cos x + 12\int x^2\cos x\,dx\right) = x^4\sin x + 4x^3\cos x - 12x^2\sin x - 24x\cos x + 24\sin x.$$
Now,
$$u_2(x) = -\frac14(x^2\sin x + 2x\cos x - 2\sin x)$$
and
$$u_1(x) = \frac18(x^4\sin x + 4x^3\cos x - 12x^2\sin x - 24x\cos x + 24\sin x),$$
and therefore (after some simplification)
$$y_p(x) = -x\sin x - 3\cos x + \frac{3\sin x}{x}.$$
The general solution to (4.60), and hence to (4.58), is
$$y(x) = -x\sin x - 3\cos x + \frac{3\sin x}{x} + \frac{c_1}{x} + c_2 x + c_3 x^3$$
for $x > 0$. ∎
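The simplification claimed for $y_p$ can be confirmed symbolically. A sketch, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')

# Particular solution found in Example 4.49
yp = -x*sp.sin(x) - 3*sp.cos(x) + 3*sp.sin(x)/x

# Residual of x^3 y''' - 3x y' + 3y = x^4 cos x; simplifies to 0
residual = x**3*yp.diff(x, 3) - 3*x*yp.diff(x) + 3*yp - x**4*sp.cos(x)
print(sp.simplify(residual))
```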


4.8 – Cauchy-Euler Equations

A Cauchy-Euler equation is a linear ODE of the form
$$a_n x^n\frac{d^n y}{dx^n} + a_{n-1}x^{n-1}\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1 x\frac{dy}{dx} + a_0 y = f(x)$$
for constants $a_0, \dots, a_n$.


4.9 – Nonlinear Equations

The focus in this section will be on second-order nonlinear ODEs. We start with the case in which the dependent variable $y$ is missing, which can be treated by making the substitution $u = y'$.

Example 4.50. Solve the initial-value problem
$$y'y'' = 4x, \quad y(1) = 5, \quad y'(1) = 2.$$

Solution. Here the dependent variable $y$ is missing, and so we effect a reduction of order by letting $u = y'$, so that $u' = y''$ and the ODE becomes
$$uu' = 4x.$$
This is a separable first-order differential equation, and since
$$uu' = 4x \;\Rightarrow\; u\frac{du}{dx} = 4x \;\Rightarrow\; \int u\,du = \int 4x\,dx \;\Rightarrow\; \frac12 u^2 = 2x^2 + c,$$
we obtain
$$(y')^2 = 4x^2 + c.$$
With the initial condition $y'(1) = 2$ we find that $2^2 = 4(1)^2 + c$, or $c = 0$, and hence $(y')^2 = 4x^2$. Taking the square root implies $y'(x) = \pm 2x$; however, if we take $y'(x) = -2x$, then $y'(1) = -2$ follows, which violates the initial condition $y'(1) = 2$. So we must take $y'(x) = 2x$, and then
$$y = \int 2x\,dx = x^2 + c.$$
With the initial condition $y(1) = 5$ we find that $c = 4$, and therefore
$$y = x^2 + 4$$
is the solution. ∎
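The reduction-of-order answer is easy to check against the original second-order IVP. A sketch, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')

# Solution to the IVP in Example 4.50
y = x**2 + 4

# y' y'' = 4x, together with both initial conditions
assert sp.simplify(y.diff(x)*y.diff(x, 2) - 4*x) == 0
assert y.subs(x, 1) == 5
assert y.diff(x).subs(x, 1) == 2
print("IVP solution verified")
```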

Example 4.51. Find solutions to the nonlinear equation
$$y'' = 2x(y')^2. \tag{4.61}$$

Solution. The dependent variable $y$ is missing, so let $u = y'$ to get the first-order ODE
$$\frac{du}{dx} = 2xu^2.$$
Separating variables gives
$$\int\frac{1}{u^2}\,du = \int 2x\,dx,$$
which yields $-1/u = x^2 + c_1$, and thus
$$y' = -\frac{1}{x^2 + c_1}. \tag{4.62}$$
For the parameter $c_1$ there are three possibilities: $c_1 > 0$, $c_1 = 0$, or $c_1 < 0$.


If $c_1 > 0$, we may replace $c_1$ by $c_1^2$ in (4.62), and write
$$y' = -\frac{1}{x^2 + c_1^2},$$
whereupon integration gives
$$y = -\int\frac{1}{x^2 + c_1^2}\,dx = -\frac{1}{c_1}\arctan\left(\frac{x}{c_1}\right) + c_2 \tag{4.63}$$
for $c_2 \in \mathbb{R}$. Thus for any $c_1 > 0$ and $c_2 \in \mathbb{R}$ we obtain a solution with interval of validity $(-\infty, \infty)$.

If $c_1 = 0$, then (4.62) becomes $y' = -x^{-2}$, and hence
$$y = \frac{1}{x} + c$$
for $c \in \mathbb{R}$. Thus for any $c \in \mathbb{R}$ we obtain a solution having either $(-\infty, 0)$ or $(0, \infty)$ as its interval of validity, depending on initial conditions.

If $c_1 < 0$, we may replace $c_1$ by $-c_1^2$ in (4.62), and write
$$y' = -\frac{1}{x^2 - c_1^2} = -\frac{1}{(x - c_1)(x + c_1)} = \frac{1}{2c_1}\left(\frac{1}{x + c_1} - \frac{1}{x - c_1}\right),$$
whereupon integration gives
$$y = \frac{1}{2c_1}\int\left(\frac{1}{x + c_1} - \frac{1}{x - c_1}\right)dx = \frac{1}{2c_1}\ln\left|\frac{x + c_1}{x - c_1}\right| + c_2 \tag{4.64}$$
for $c_2 \in \mathbb{R}$. Here we must have $x \ne \pm c_1$, and so for any $c_1 < 0$ and $c_2 \in \mathbb{R}$ we obtain a solution with interval of validity not equal to $(-\infty, \infty)$. What the interval of validity may be will depend on initial conditions. Possibilities are $(-\infty, c_1)$, $(c_1, -c_1)$, and $(-c_1, \infty)$.

Have we found all solutions to (4.61) in our analyses? In fact we have not. First, any constant function will satisfy (4.61), and none of the families of functions obtained above includes the constant functions. Also we find by direct substitution that the functions (4.63) will still satisfy (4.61) even if $c_1 < 0$, and the functions (4.64) will satisfy (4.61) even if $c_1 > 0$. And perhaps there are still other solutions! ∎

The previous example certainly illustrates that the general solutions to nonlinear differential equations are, in general, not nearly so nicely defined as those of linear equations!

The next case is the one wherein the independent variable $x$ is missing from a second-order nonlinear ODE, so that the ODE is given as $F(y, y', y'') = 0$ for some nonlinear expression $F(y, y', y'')$. In this case we again make the substitution $u = y'$, so that $u' = y''$. Since $x$ is absent, however, it becomes possible (and desirable) to designate $y$ to be the independent variable in the ODE. The problem is that the $u'$ in $u' = y''$ signifies differentiation of $u$ with respect to $x$: $u' = du/dx$. We need to recast $du/dx$ in a form involving differentiation with respect to $y$.

To do this, we note that setting $u = y'$ declares $u$ as a function of $x$, for the simple reason that $y'$ is a function of $x$. We also note that $y$ is a function of $x$, and while we could express this by writing $y = y(x)$, it will be convenient to instead write $y = f(x)$ so as not to overwork the symbol $y$ in what follows. We can assume that $f'$ is not identically zero, for otherwise we have $y' \equiv 0$ and $y'' \equiv 0$, and the ODE has only uninteresting constant functions as solutions. By the Implicit Function Theorem it follows that there are open intervals on which $f$ is one-to-one and hence has local inverses.

Denote a local inverse of $f$ on some particular open interval $I$ by $f^{-1}$. Then for all $x \in I$ and $y \in f(I)$ we have $y = f(x)$ if and only if $x = f^{-1}(y)$, and so we find $x$ to be a function of $y$ on the open interval $f(I)$. So, since $u$ is a function of $x$ and $x$ is a function of $y$, we find that $u$ may be regarded as a function of $y$. Specifically, if $y \in f(I)$ is made to vary, then $x$ will in turn vary since $x = f^{-1}(y)$ on $f(I)$, and then $u$ will also vary as a consequence. To speak of $du/dy$ therefore has meaning, and that is essential if we hope to cast $y$ in the role of the independent variable in our ODE once we make the substitution $u = y'$.

Observing that
$$u(x) = u(f^{-1}(y))$$
for all $x \in I$, where $f^{-1}(y) = x$, we see that $u$ is a function of $y$ in the sense that the function $u \circ f^{-1} : f(I) \to \mathbb{R}$ is a function of $y$. We now differentiate $u \circ f^{-1}$, making use of the Chain Rule followed by the Inverse Function Theorem, to get
$$(u \circ f^{-1})'(y) = u'(f^{-1}(y)) \cdot (f^{-1})'(y) = u'(x) \cdot \frac{1}{f'(x)},$$
whence
$$u'(x) = (u \circ f^{-1})'(y)f'(x) = (u \circ f^{-1})'(y)y'(x).$$
For the last equality we return to our convention of letting $y = y(x)$ instead of $y = f(x)$. Using Leibniz notation gives
$$\frac{d}{dx}[u(x)] = \frac{d}{dy}\big[u(f^{-1}(y))\big]\frac{d}{dx}[y(x)].$$
Next replace $f^{-1}(y)$ with $x$ to get
$$\frac{d}{dx}[u(x)] = \frac{d}{dy}[u(x)]\frac{d}{dx}[y(x)],$$
or more compactly
$$\frac{du}{dx} = \frac{du}{dy}\frac{dy}{dx}.$$

Thus, since $u = y' = dy/dx$, we find that
$$y'' = u' = \frac{du}{dx} = u\frac{du}{dy}.$$
What we see is that for the nonlinear second-order equation $F(y, y', y'') = 0$ that is missing $x$, when we make the substitution $u = y'$ and declare $y$ the independent variable, we must have
$$y'' = u\frac{du}{dy}. \tag{4.65}$$
Note that $y'$ and $y''$ always indicate differentiation with respect to $x$, so that $y' = dy/dx$ and $y'' = d^2y/dx^2$ in the Leibniz notation.¹¹

¹¹This seems a good time to emphasize anew that a single-variable function $f$, in general, only has one kind of derivative in this course: $f'$, where for any real number $a$ we have
$$f'(a) = \lim_{h\to 0}\frac{f(a+h) - f(a)}{h},$$


Example 4.52. Solve the initial-value problem
$$2y'y'' = 1, \quad y(0) = 2, \quad y'(0) = 1.$$

Solution. The independent variable $x$ is absent from the ODE, so we let $u = y'$, with $y''$ given by (4.65). We have:
$$2u \cdot u\frac{du}{dy} = 1 \;\Rightarrow\; \int 2u^2\,du = \int dy \;\Rightarrow\; \frac23 u^3 = y + c \;\Rightarrow\; (y')^3 = \frac32 y + c.$$
The last equation reverses the substitution $u = y'$, and restores $x$ as the independent variable. To find $c$ we need to employ both initial conditions simultaneously:
$$c = [y'(0)]^3 - \frac32 y(0) = (1)^3 - \frac32(2) = -2.$$
Now we obtain
$$(y')^3 = \frac32 y - 2 \;\Rightarrow\; \frac{dy}{dx} = \left(\frac{3y - 4}{2}\right)^{1/3} \;\Rightarrow\; \int\left(\frac{2}{3y - 4}\right)^{1/3}dy = \int dx,$$
and thus
$$\left(\frac{3y - 4}{2}\right)^{2/3} = x + c.$$
To find $c$ we must use the initial condition $y(0) = 2$ a second time to get $c = 1$. Finally
$$\left(\frac{3y - 4}{2}\right)^{2/3} = x + 1 \;\Rightarrow\; y = \frac{2(x+1)^{3/2} + 4}{3},$$
our solution. ∎
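Since $c$ was reused twice, a direct check of the final answer is worthwhile. A sketch, assuming SymPy is available:

```python
import sympy as sp

x = sp.symbols('x')

# Solution to the IVP in Example 4.52
y = (2*(x + 1)**sp.Rational(3, 2) + 4) / 3

# 2y' y'' = 1, together with both initial conditions
assert sp.simplify(2*y.diff(x)*y.diff(x, 2) - 1) == 0
assert y.subs(x, 0) == 2
assert y.diff(x).subs(x, 0) == 1
print("IVP solution verified")
```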

provided the limit exists. Thus something like $u'(f^{-1}(y))$ simply denotes the value of $u'$ at the number $f^{-1}(y)$:
$$u'(f^{-1}(y)) = \lim_{h\to 0}\frac{u(f^{-1}(y) + h) - u(f^{-1}(y))}{h}.$$
When we spoke above of "differentiating $u$ with respect to $y$" as opposed to with respect to $x$, it may have seemed as if $u$ had two different derivatives. This is not the case. As we saw, we were really talking about the derivatives of two different functions: namely, $u$ (with respect to $x$) and $u \circ f^{-1}$ (with respect to $y$).


5 Higher-Order Applications

5.1 – Free Mechanical Vibrations

Newton's Second Law of Motion states that the force acting on an object equals the mass of the object times the object's acceleration. In vector form we write this as

    F = ma,

or simply F = ma if motion is along a line, in which case working with scalar quantities is sufficient. If y(t) is the position of an object on a line at time t, then y′(t) gives the velocity along the line, and y′′(t) gives the acceleration. That is, a(t) = y′′(t), and so by Newton's Second Law we obtain the differential equation F = my′′.

If an object O is set upon a horizontal surface and attached to a spring that is anchored to a wall, we create a so-called mass-spring system. Setting O in motion, we can expect a few forces to act on it.

First there is the force exerted by the spring (the tension), which by Hooke's Law is proportional to the extent to which the unanchored end of the spring that is attached to O is displaced from the position where the tension is zero. The constant of proportionality is typically denoted by −k, where k > 0 is called the spring constant or stiffness. Let y(t) be the position of O at time t, with y = 0 at the point where the spring is tension-free (the equilibrium position), y > 0 where the spring is stretched, and y < 0 where the spring is compressed. Then Hooke's Law states that

    Fspring = −ky,

where the negative sign reflects that Fspring is directed in the positive direction when y < 0, and in the negative direction when y > 0.

Next, there is the force of friction exerted by the surface upon which O is moving. This force is generally considered to be proportional to the velocity y′ of O, with the constant of proportionality being denoted by −b, where b > 0 is called the damping constant. Thus we have

    Ffriction = −by′,

where the negative sign reflects that Ffriction is directed opposite the direction in which O is moving.


Any other forces acting on O are collectively called external forces and denoted by Fext. The net force on O is thus Ffriction + Fspring + Fext, and if m is the mass of O, it follows from Newton's Second Law that

    my′′ = Ffriction + Fspring + Fext = −by′ − ky + Fext,

which can be written as

    my′′ + by′ + ky = Fext,

a second-order linear differential equation with constant coefficients. If we consider no forces to be acting on O other than spring tension and friction, then we set Fext = 0 and obtain simply

    my′′ + by′ + ky = 0,    (5.1)

which is the model for a physical state known as free mechanical vibration, the kind of model that we'll be analyzing in this section.

Recall that a nonconstant function y(t) is periodic if there exists some constant p > 0 such that y(t) = y(t + p) for all t ∈ Dom(y), in which case p is called the period of the function. Assuming y(t) is a continuous periodic function with period p and t₀ ∈ Dom(y), let

    ymax = max{y(t) : t ∈ [t₀, t₀ + p)}  and  ymin = min{y(t) : t ∈ [t₀, t₀ + p)}.

We define the amplitude A of the function y(t) to be

    A = (ymax − ymin)/2,    (5.2)

and the natural frequency ν to be

    ν = 1/p.    (5.3)

A period value p is generally regarded as having the unit seconds/cycle, so that a natural frequency value ν has unit cycles/second.

A mass-spring system for which the damping constant b is zero, which is to say there is no friction acting on the object, is called an undamped system. In such a system the object O will exhibit an oscillatory motion modeled by a periodic position function y(t).

A mass-spring system for which the damping constant b is zero, which is to say there is nofriction acting on the object, is called an undamped system. In such a system the object Owill exhibit an oscillatory motion modeled by a periodic position function y(t).

Example 5.1. A 2 kg object O is attached to a spring with stiffness k = 50 N/m. Initially O is displaced 1/4 m to the left of the equilibrium point and given a velocity of 1 m/s to the left. Neglecting damping, find the equation of motion of O, the times O is at the equilibrium position, and also the period, natural frequency, and amplitude of its motion.

Solution. To find the equation of motion means to find an expression for y(t) which gives the position of O at time t. Referring to (5.1), we obtain an initial value problem

    2y′′ + 50y = 0,  y(0) = −1/4,  y′(0) = −1,

where y(0) = −1/4 indicates that O is initially 1/4 m to the left of the equilibrium point, and y′(0) = −1 indicates that O initially has a velocity of 1 m/s in the leftward direction.


The characteristic equation 2r² + 50 = 0 has roots α ± iβ = ±5i, so that α = 0 and β = 5, and by Theorem 4.30 the general solution to the ODE is

    y(t) = e^{αt}(c₁ cos βt + c₂ sin βt) = c₁ cos 5t + c₂ sin 5t.

From this and the initial condition y(0) = −1/4 we obtain −1/4 = c₁ cos 0 + c₂ sin 0, and hence c₁ = −1/4. From

    y′(t) = −5c₁ sin 5t + 5c₂ cos 5t

and the initial condition y′(0) = −1 we obtain −1 = −5c₁ sin 0 + 5c₂ cos 0, and hence c₂ = −1/5. Therefore the solution to the IVP is

    y(t) = −(1/4) cos 5t − (1/5) sin 5t,

which is therefore the equation of motion of O. See Figure 7, which shows that the motion of O is periodic, as is to be expected with an undamped system.

To find the times when O is in the equilibrium position we find the values t > 0 for which y(t) = 0; that is, we find positive solutions to

    −(1/4) cos 5t − (1/5) sin 5t = 0,

which with a little algebra can be written as

    tan 5t = −5/4.    (5.4)

Now, if t₀ is such that tan 5t₀ = −5/4, then we must also have

    tan(5t₀ − nπ) = −5/4

for all n ∈ ℤ, since the tangent function is periodic with period π. Thus, to find all solutions to (5.4), we find all t for which tan(5t − nπ) = −5/4. Isolating t gives

    t = (1/5)[tan⁻¹(−5/4) + nπ] = (1/5) tan⁻¹(−5/4) + nπ/5,  n ∈ ℤ.

Figure 7. An undamped system.


Each integer n yields a different t value, denoted by tₙ. We have

    t₀ = (1/5) tan⁻¹(−5/4) ≈ −0.1792,

which is not of interest to us here since we are looking for time values that are greater than 0. To get these, we need to set n equal to natural numbers:

    t₁ = t₀ + π/5 ≈ 0.449,  t₂ = t₀ + 2π/5 ≈ 1.077,  t₃ = t₀ + 3π/5 ≈ 1.706, . . .

In general O passes the equilibrium point at times

    tₙ = t₀ + nπ/5 ≈ −0.1792 + nπ/5 seconds,

where n ∈ ℕ.

To find the period p for y(t), we need only examine Figure 7 to see that, starting at time t₁,

every second time O passes the equilibrium point it will have completed a cycle, and so

    p = t₃ − t₁ = (t₀ + 3π/5) − (t₀ + π/5) = 2π/5 seconds/cycle.

Thus the natural frequency is

    ν = 5/(2π) cycles/second.

To find the amplitude it is necessary to find the maximum displacement from the equilibrium position that O attains. This entails finding the values of t for which y(t) attains a local extremum, which can be done by the usual calculus procedure: find y′(t), set it equal to 0, and solve the equation for t. In our case the equation is

    (5/4) sin 5t − cos 5t = 0,

which becomes tan 5t = 4/5, and so

    t = (1/5) tan⁻¹(4/5) ≈ 0.1349 s.

Putting this value for t into the function y(t) gives

    y(0.1349) = −(1/4) cos(0.6747) − (1/5) sin(0.6747) ≈ −0.320 m.

Thus, the farthest to the left of the equilibrium point that O goes is 0.320 m, and by the symmetry of the situation it should be clear that 0.320 m is also as far to the right of the equilibrium point that O can go.¹² Therefore the amplitude A is

    A = (ymax − ymin)/2 = (0.320 − (−0.320))/2 = 0.320 m,

which completes the analysis. ■

¹²To formally find other values of t for which y(t) attains an extreme value, remember that the tangent function is periodic with period π, and so in fact tan(5t − nπ) = 4/5 yields a different solution for each n ∈ ℤ.
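The amplitude found in Example 5.1 can be cross-checked numerically. The sketch below (an illustration, not part of the text; the grid resolution is an arbitrary choice) uses the fact that c₁ cos βt + c₂ sin βt has amplitude √(c₁² + c₂²), an identity developed in the exercises.

```python
import math

# Example 5.1: y(t) = -(1/4)cos(5t) - (1/5)sin(5t); its amplitude is
# sqrt(c1^2 + c2^2) = sqrt(1/16 + 1/25) ≈ 0.320
c1, c2 = -0.25, -0.2
A = math.hypot(c1, c2)

# brute-force check: the largest |y(t)| over one period 2*pi/5 matches A
ts = [k*0.0005 for k in range(0, 2600)]   # covers a bit more than one period
ymax = max(abs(c1*math.cos(5*t) + c2*math.sin(5*t)) for t in ts)
print(round(A, 3), round(ymax, 3))
```

Both numbers agree with the 0.320 m obtained by calculus above.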


It can be shown (see the exercises) that a function of the form

    g(t) = c₁ cos βt + c₂ sin βt

is periodic with period

    p = 2π/β.    (5.5)

For our purposes we will call a function y(t) quasiperiodic if there exists a monotonic function ϕ(t) and a periodic function g(t) such that y(t) = ϕ(t)g(t). If g(t) has period p, we say that y(t) has quasiperiod p, and natural quasifrequency 1/p. In what follows in this section and the next, the monotonic function ϕ(t) will always be some exponential function e^{αt}. Loosely speaking, a quasiperiodic function is a function that exhibits some cyclical (i.e. oscillatory) behavior, and while the pattern followed in any one cycle is similar to all other cycles, it is not identical.

Example 5.2. A 2 kg object O is attached to a spring with stiffness k = 50 N/m. Initially O is displaced 1/4 m to the left of the equilibrium point and given a velocity of 1 m/s to the left. Assuming the damping constant is 2 N·s/m, find the equation of motion of O, the times O is at the equilibrium position, and also the quasiperiod of its motion.

Solution. Referencing (5.1), the model for the system is

    2y′′ + 2y′ + 50y = 0,  y(0) = −1/4,  y′(0) = −1.

The characteristic equation 2r² + 2r + 50 = 0 has complex conjugate roots

    α ± iβ = [−2 ± √(2² − 4(2)(50))]/(2(2)) = (−1 ± √(−99))/2 ≈ −0.5 ± 4.975i,

so α = −1/2 and β ≈ 4.975, and by Theorem 4.30 the general solution to the ODE is

    y(t) = e^{αt}(c₁ cos βt + c₂ sin βt) = e^{−t/2}(c₁ cos 4.975t + c₂ sin 4.975t).

From this and the initial condition y(0) = −1/4 we obtain c₁ · 1 + c₂ · 0 = −1/4, so that c₁ = −0.25; and from

    y′(t) = (−0.5c₁ + 4.975c₂)e^{−t/2} cos 4.975t + (−4.975c₁ − 0.5c₂)e^{−t/2} sin 4.975t

Figure 8. An underdamped system.


and the initial condition y′(0) = −1 we obtain −0.5c₁ + 4.975c₂ = −1, which yields

    −0.5(−0.25) + 4.975c₂ = −1

and thus c₂ ≈ −0.226. The equation of motion for O is thus

    y(t) = (−0.25 cos 4.975t − 0.226 sin 4.975t)e^{−t/2},

the graph of which is given in Figure 8.

Observe that the function y(t) is quasiperiodic, since it is the product of the monotone decreasing function e^{−t/2} and the periodic function

    g(t) = −0.25 cos 4.975t − 0.226 sin 4.975t.

The system is called underdamped since, while the damping constant results in an oscillatory motion that lessens in amplitude over time, it is not enough to halt the oscillation at any time. By (5.5) the function g(t) has period p = 2π/4.975 ≈ 1.263 seconds/cycle, and therefore y(t) has quasiperiod 1.263 seconds/cycle. ■
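The roots and quasiperiod above follow directly from the quadratic formula, and a few lines of Python reproduce them (a sketch for illustration only):

```python
import math

# Example 5.2: characteristic equation 2r^2 + 2r + 50 = 0 (underdamped case)
m, b, k = 2.0, 2.0, 50.0
disc = b*b - 4*m*k                 # -396 < 0, so the roots are complex
alpha = -b/(2*m)                   # real part of the roots
beta = math.sqrt(-disc)/(2*m)      # imaginary part of the roots
quasiperiod = 2*math.pi/beta       # p = 2*pi/beta, from (5.5)
print(alpha, round(beta, 3), round(quasiperiod, 3))
```

A negative discriminant is precisely what distinguishes the underdamped case from the overdamped case of the next example.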

Example 5.3. A 2 kg object O is attached to a spring with stiffness k = 50 N/m. Initially O is displaced 1/4 m to the left of the equilibrium point and given a velocity of 1 m/s to the left. Assuming the damping constant is 52 N·s/m, find the equation of motion of O, and find the maximum displacement from the equilibrium position that O attains.

Solution. This is the same mass-spring system as in Example 5.2 in all respects except for a drastically increased damping constant. The IVP that models the system is

    2y′′ + 52y′ + 50y = 0,  y(0) = −1/4,  y′(0) = −1.

The characteristic equation

    2r² + 52r + 50 = 0

has roots r = −25, −1, and so by Theorem 4.25 the general solution to the ODE is

    y(t) = c₁e^{−t} + c₂e^{−25t}.

Thus

    y′(t) = −c₁e^{−t} − 25c₂e^{−25t},

and the initial conditions of the IVP yield the system

    c₁ + c₂ = −1/4
    −c₁ − 25c₂ = −1.

Solving the system yields c₁ = −29/96 and c₂ = 5/96, and so the equation of motion for O is

    y(t) = −(29/96)e^{−t} + (5/96)e^{−25t}.

Figure 9 shows the graph of y(t), which exhibits no oscillatory behavior whatsoever. In fact the object O, initially placed to the left of the equilibrium point and then given a push farther left, spends all eternity creeping rightward toward the equilibrium point without ever getting there even once. Certainly y(t) → 0 as t → ∞, however. A mass-spring system for which there exists some t₀ > 0 such that y(t) ≠ 0 for all t > t₀ is called either critically damped if the


Figure 9. An overdamped system.

characteristic equation has a single repeated negative real root, or overdamped if it has two distinct negative real roots.

The global minimum value that y(t) is seen to attain in Figure 9 is the maximum displacement of O from the equilibrium position. To find this value we find the time t > 0 for which y′(t) = 0; that is, we solve

    (29/96)e^{−t} − (125/96)e^{−25t} = 0,

which becomes 29e^{24t} = 125, and thus

    t = (1/24) ln(125/29) ≈ 0.061 s.

The maximum displacement of O from the position y = 0 is therefore

    y(0.061) = −(29/96)e^{−0.061} + (5/96)e^{−25(0.061)} ≈ −0.273 m,

only slightly farther to the left than the initial position. ■
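A quick numerical check of the extremum just found (illustrative only, not part of the text's method):

```python
import math

# Example 5.3 (overdamped): y(t) = -(29/96)e^{-t} + (5/96)e^{-25t}
def y(t):
    return -29/96*math.exp(-t) + 5/96*math.exp(-25*t)

# y'(t) = 0 exactly when 29 e^{24t} = 125
t_star = math.log(125/29)/24
y_star = y(t_star)
print(round(t_star, 3), round(y_star, 3))
```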


5.2 – Forced Mechanical Vibrations

As established in the previous section, if the object O in a mass-spring system has some other force acting on it aside from spring tension and surface friction, then the force is called external, denoted by Fext(t), and the model is

    my′′ + by′ + ky = Fext.    (5.6)

This is not a physics text, so there's scarcely any point to deriving a bevy of ridiculous formulas to be used as crutches when analyzing any equation of the form (5.6). The idea here will be to recall the techniques of Section 4.5 in order to solve (5.6) for different values of m, b, and k, different external forces Fext that vary over time t, and different initial conditions, and then analyze the solutions to draw conclusions about the physics of the associated systems.

Example 5.4. A 1 kg object O is attached to a spring with stiffness 6 N/m. There is a damping constant of 2 N·s/m, and also O is subjected to an external force that varies over time t according to the formula 60 cos 10t. If O is initially displaced 5 m to the right of the equilibrium point and is at rest, find the equation of motion for O.

Solution. The IVP that models this mass-spring system is

    y′′ + 2y′ + 6y = 60 cos 10t,  y(0) = 5,  y′(0) = 0.

The nonhomogeneity has the form Pₘ(t)e^{αt} cos βt with m = 0, α = 0, and β = 10, and so, since α + iβ = 10i is not a root of the characteristic equation

    r² + 2r + 6 = 0,

by the Method of Undetermined Coefficients a particular solution to the ODE will have the form

    yp(t) = A cos 10t + B sin 10t.

Substituting this into the ODE and performing some algebra, we obtain

    (−94A + 20B) cos 10t + (−94B − 20A) sin 10t = 60 cos 10t,

which yields the system of equations

    −94A + 20B = 60
    −20A − 94B = 0.

Solving the system gives A = −1410/2309 and B = 300/2309, and so

    yp(t) = −(1410/2309) cos 10t + (300/2309) sin 10t.

The characteristic equation r² + 2r + 6 = 0 has complex conjugate roots −1 ± i√5, and so the homogeneous equation y′′ + 2y′ + 6y = 0 has a general solution of the form

    yh(t) = e^{−t}(c₁ cos √5 t + c₂ sin √5 t).


By the Superposition Principle, then, the nonhomogeneous equation y′′ + 2y′ + 6y = 60 cos 10t has a general solution of the form

    y(t) = e^{−t}(c₁ cos √5 t + c₂ sin √5 t) − (1410/2309) cos 10t + (300/2309) sin 10t.

The initial condition y(0) = 5 implies that c₁ = 12955/2309, and this, together with the initial condition y′(0) = 0, will give c₂ = 1991√5/2309. The solution to the IVP is thus

    y(t) = e^{−t}[(12955/2309) cos √5 t + (1991√5/2309) sin √5 t]   (transient part of solution)
           + [−(1410/2309) cos 10t + (300/2309) sin 10t],   (steady-state solution)

which is the equation of motion for the object O. Note that the solution has two parts: a transient part that fades over time (since e^{−t} → 0 as t → ∞), and a periodic steady-state solution that for all practical purposes is "the" solution to the IVP for t very large.

To more compactly express the transient part of y(t), we set

    A = √(c₁² + c₂²) = 3√(9030/2309)

and find ϕ such that sin ϕ = c₁/A and cos ϕ = c₂/A. Thus

    tan ϕ = c₁/c₂ = 12955/(1991√5) = 2591√5/1991,

and since sin ϕ > 0 and cos ϕ > 0 imply that ϕ is in Quadrant I, we have

    ϕ = tan⁻¹(2591√5/1991) ≈ 1.240,

Figure 10. The general solution y(t) and the steady-state solution.


and so, by the trigonometric wiles adumbrated in §5.1,

    (12955/2309) cos √5 t + (1991√5/2309) sin √5 t = 3√(9030/2309) sin(√5 t + ϕ).

In a similar fashion we obtain

    −(1410/2309) cos 10t + (300/2309) sin 10t = (30/√2309) sin(10t + ψ),

where

    ψ = tan⁻¹(−1410/300) ≈ −1.361.

Therefore

    y(t) = 3√(9030/2309) e^{−t} sin(√5 t + ϕ) + (30/√2309) sin(10t + ψ),

or in the land of physicists and engineers,

    y(t) = 5.93e^{−t} sin(√5 t + 1.24) + 0.62 sin(10t − 1.36).

Refer to Figure 10 to see how the general solution converges to the steady-state solution as t increases. ■
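The conversion of the steady-state part to the form R sin(10t + ψ) can be checked numerically. The following sketch (illustrative only; the test point θ = 0.7 is an arbitrary choice) computes R and ψ and verifies the identity at that point.

```python
import math

# Steady-state part of Example 5.4: a*cos(10t) + b*sin(10t), rewritten as
# R*sin(10t + psi) with sin(psi) = a/R and cos(psi) = b/R
a, b = -1410/2309, 300/2309
R = math.hypot(a, b)       # amplitude, equals 30/sqrt(2309)
psi = math.atan2(a, b)     # angle with sin(psi) ∝ a and cos(psi) ∝ b

# verify a*cos(theta) + b*sin(theta) == R*sin(theta + psi) at one point
theta = 0.7
check = abs(a*math.cos(theta) + b*math.sin(theta) - R*math.sin(theta + psi))
print(round(R, 3), round(psi, 3), check < 1e-12)
```

Note that `atan2` picks the correct quadrant automatically, which is why no separate quadrant argument is needed here.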

Example 5.5. An 8 kg object O is attached to a spring hanging from the ceiling, thereby causing the spring to stretch 1.96 m upon coming to rest at equilibrium. At time t = 0, an external force F(t) = cos 2t N is applied to the system. The damping constant for the system is 3 N·s/m. Determine the steady-state solution for the system.

Solution. There are two external forces acting on O: F(t) and also gravity. Thus the total external force on O at time t is

    Fext = mg + F(t) = (8 kg)(9.8 m/s²) + cos 2t N = 78.4 + cos 2t N.

We need to determine the spring constant k. Upon attaching O to the spring, the spring stretches until its tension comes to equal the magnitude of the gravitational force acting on O. We have, by Hooke's Law, with the understanding that down is the positive direction,

    −(8 kg)(9.8 m/s²) = −mg = −Fgravity = Fspring = −ky = −k(1.96 m),

and so −1.96k = −78.4, which yields a spring constant of k = 40 N/m. The model for the mass-spring system, my′′ + by′ + ky = Fext, is thus

    8y′′ + 3y′ + 40y = 78.4 + cos 2t.    (5.7)

The form of a particular solution to

    8y′′ + 3y′ + 40y = 78.4

is yp₁(t) = A, where A is some constant. Now, yp₁′(t) = yp₁′′(t) = 0, and so substitution into 8y′′ + 3y′ + 40y = 78.4 gives 40A = 78.4 and finally A = 1.96. Therefore yp₁(t) = 1.96.

The form of a particular solution to

    8y′′ + 3y′ + 40y = cos 2t


is yp₂(t) = A cos 2t + B sin 2t. Substituting this for y in the ODE yields

    8(A cos 2t + B sin 2t)′′ + 3(A cos 2t + B sin 2t)′ + 40(A cos 2t + B sin 2t) = cos 2t,

which gives

    (8A + 6B) cos 2t + (−6A + 8B) sin 2t = cos 2t,

and so we must have

    8A + 6B = 1
    −6A + 8B = 0.

Solving this system of equations yields A = 2/25 and B = 3/50, and therefore

    yp₂(t) = (2/25) cos 2t + (3/50) sin 2t.

By the Superposition Principle a particular solution to (5.7) is yp = yp₁ + yp₂, or

    yp(t) = 1.96 + (2/25) cos 2t + (3/50) sin 2t,    (5.8)

which happens to be the steady-state solution for the mass-spring system. However, it is common practice to take the equilibrium position to be wherever the spring comes to rest once the mass is attached to it. We're given that the spring stretches 1.96 m, so we shift the point where y equals zero down by 1.96 m by subtracting 1.96 from the right-hand side of (5.8), resulting in

    yp(t) = (2/25) cos 2t + (3/50) sin 2t

as the steady-state solution. Alternatively we may write

    yp(t) = 0.1 sin(2t + ϕ),

where ϕ = arctan(4/3) ≈ 0.927. ■
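The 2×2 system above is small enough to solve by Cramer's rule; the sketch below (illustration only) reproduces A = 2/25, B = 3/50 and the steady-state amplitude 0.1.

```python
# Undetermined-coefficients system from Example 5.5:
#   8A + 6B = 1
#  -6A + 8B = 0
# solved by Cramer's rule; determinant of [[8, 6], [-6, 8]] is 100
det = 8*8 - 6*(-6)
A = (1*8 - 6*0)/det            # 2/25
B = (8*0 - (-6)*1)/det         # 3/50
amp = (A*A + B*B) ** 0.5       # amplitude of the steady-state term
print(A, B, amp)
```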


6 The Laplace Transform

6.1 – Improper Integrals

We shall call an interval I ⊆ ℝ compact if it is both closed and bounded. Thus [a, b] is a compact interval if and only if −∞ < a < b < ∞, and indeed we always assume −∞ < a < b < ∞ to be the case whenever the symbol [a, b] appears.

Given a function f : [a, b] → ℝ, the Riemann integral (also known as the definite integral or simply the integral) of f on [a, b] that is a staple of calculus courses may be denoted by

    ∫_a^b f(t) dt  or  ∫_a^b f,

with f referred to as the integrand. As in calculus we define

    ∫_a^a f = 0  and  ∫_b^a f = −∫_a^b f.

If the integral ∫_a^b f exists as a real number, then we say f is integrable on [a, b]. The set of all real-valued functions f(t) that are integrable on [a, b] we denote by R[a, b]; that is,

    R[a, b] = { f : [a, b] → ℝ | ∫_a^b f ∈ ℝ }.

An improper integral is any "integral" that is given as some kind of limit on a family of integrals. Of chief relevance to the present chapter is the improper integral defined as follows.

Definition 6.1. If f ∈ R[a, T] for all T ≥ a, then we define

    ∫_a^∞ f = lim_{T→∞} ∫_a^T f

and say that ∫_a^∞ f converges if ∫_a^∞ f ∈ ℝ; otherwise ∫_a^∞ f diverges.

To fully appreciate this definition it is well to recall that if c is a real number, then to write

    lim_{x→∞} f(x) = c


means by definition that, for any ε > 0, there exists some number x_ε such that |f(x) − c| < ε for all x > x_ε.

If an improper integral equals some real number c then it is customary to say the integral "converges to c." Extending our previous notation, we define

    R[a, ∞) = { f : [a, ∞) → ℝ | ∫_a^∞ f ∈ ℝ }.

If ∫_a^∞ f converges, then ∫_τ^∞ f likewise converges for any τ > a, and the additive property states that

    ∫_a^∞ f = ∫_a^τ f + ∫_τ^∞ f;    (6.1)

furthermore, if α ∈ ℝ and ∫_a^∞ g also converges, then

    ∫_a^∞ αf = α ∫_a^∞ f  and  ∫_a^∞ (f + g) = ∫_a^∞ f + ∫_a^∞ g.    (6.2)

These are the linearity properties, and in the parlance of linear algebra they establish that the transformation ∫_a^∞ : R[a, ∞) → ℝ is a linear mapping.

Remark. Throughout this chapter we shall have no need to consider integrals (improper or otherwise) for which the integrand is unbounded on some compact interval contained within the interval of integration. Indeed, most of the time integrands will be continuous on the interval of integration.

Example 6.2. Evaluate the improper integral

    ∫_1^∞ (ln t)/t² dt,

or state that it diverges.

Solution. It will be easier to first determine the indefinite integral

    ∫ (ln t)/t² dt.

Figure 11. The area under the curve y = ln(t)/t²; the area equals ∫_1^∞ ln(t)/t² dt = 1.


We start with a substitution: let x = ln t, so that dx = (1/t) dt and eˣ = e^{ln t} = t; now,

    ∫ (ln t)/t² dt = ∫ xe^{−x} dx.

Next, we employ integration by parts, letting u′ = e^{−x} and v = x, to obtain

    ∫ xe^{−x} dx = −xe^{−x} + ∫ e^{−x} dx = −xe^{−x} − e^{−x} + c

for arbitrary constant c. Hence,

    ∫ (ln t)/t² dt = −ln t · (1/t) − 1/t + c = −(ln t + 1)/t + c.

Now the Fundamental Theorem of Calculus implies that

    ∫_1^∞ (ln t)/t² dt = lim_{T→∞} ∫_1^T (ln t)/t² dt = lim_{T→∞} [−(ln t + 1)/t]₁ᵀ

      = lim_{T→∞} [−(ln T + 1)/T + 1] = lim_{T→∞} (T − ln T − 1)/T = lim_{T→∞} (1 − 1/T)/1 = 1,

where L'Hôpital's Rule is used for the penultimate equality. ■
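Using the antiderivative F(t) = −(ln t + 1)/t found above, one can watch F(T) − F(1) approach 1 numerically (an illustration only; the sample values of T are arbitrary choices):

```python
import math

# Antiderivative from Example 6.2: F(t) = -(ln t + 1)/t, so that
# ∫_1^T ln(t)/t^2 dt = F(T) - F(1) = 1 - (ln T + 1)/T, which tends to 1
def F(t):
    return -(math.log(t) + 1)/t

for T in (10.0, 1e3, 1e8):
    print(T, F(T) - F(1))
```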

Example 6.3. Evaluate the improper integral

    ∫_0^∞ cos t dt,

or state that it diverges.

Solution. We have

    ∫_0^∞ cos t dt = lim_{T→∞} ∫_0^T cos t dt = lim_{T→∞} sin T,

and since the limit at right does not exist in ℝ (indeed it cannot even be resolved as ±∞), we conclude that the integral diverges. ■

If f(t) ≥ 0 for all t ∈ [a, ∞), then the improper integral ∫_a^∞ f can be naturally interpreted as being the area under the curve y = f(t) for t ≥ a (provided the integral converges). Thus Example 6.2 shows that the area under the curve y = ln(t)/t² for t ≥ 1 is 1. See Figure 11.

Example 6.4. Evaluate

    ∫_0^∞ e^{−st} dt

for all values of s for which the integral converges.


Solution. If s = 0 we obtain

    ∫_0^∞ e^{−st} dt = ∫_0^∞ dt = lim_{T→∞} ∫_0^T dt = lim_{T→∞} [t]₀ᵀ = lim_{T→∞} T = ∞,

and so the integral diverges. Assuming s ≠ 0,

    ∫_0^∞ e^{−st} dt = lim_{T→∞} ∫_0^T e^{−st} dt = lim_{T→∞} [−e^{−st}/s]₀ᵀ = lim_{T→∞} (1 − e^{−sT})/s,

and since

    lim_{T→∞} e^{−sT} = 0 if s > 0, and = ∞ if s < 0,

we find that the integral diverges for any s < 0, while

    ∫_0^∞ e^{−st} dt = 1/s

for all s > 0. ■
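The formula ∫_0^∞ e^{−st} dt = 1/s can be checked numerically by truncating the integral at a large T and using a midpoint Riemann sum; the truncation point and step count below are arbitrary choices for this illustration.

```python
import math

# Midpoint Riemann sum for ∫_0^T e^{-st} dt; for s > 0 the neglected
# tail beyond T equals e^{-sT}/s, which is negligible for T = 40
def improper_exp_integral(s, T=40.0, n=200000):
    h = T/n
    return sum(math.exp(-s*(i + 0.5)*h) for i in range(n))*h

I1 = improper_exp_integral(1.0)
I2 = improper_exp_integral(2.0)
print(round(I1, 4), round(I2, 4))  # ≈ 1.0 and 0.5, matching 1/s
```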

We conclude this section with a couple more results concerning improper integrals, both of which will be needed later in the chapter.

Theorem 6.5 (Comparison Test for Integrals). If ∫_a^∞ g converges and |f(t)| ≤ g(t) for all t ≥ a, then ∫_a^∞ f converges.

Another conclusion that follows from the hypotheses of this theorem is that ∫_a^∞ |f| converges; for verily, if ∫_a^∞ |f| were divergent, and |f| ≤ g on [a, ∞), then the theorem would imply that ∫_a^∞ g diverges, a contradiction.

Example 6.6. Suppose the function f is bounded on [0, ∞); that is, there exists some α ∈ ℝ such that |f(t)| ≤ α for all t ∈ [0, ∞). Then |e^{−st}f(t)| ≤ αe^{−st} for all t ≥ 0, and since

    ∫_0^∞ αe^{−st} dt = α ∫_0^∞ e^{−st} dt = α/s

for all s > 0 by Example 6.4, the Comparison Test for Integrals implies that

    ∫_0^∞ e^{−st}f(t) dt

converges for all s > 0. ■
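As a concrete instance of this example (not from the text), take f(t) = cos t, which is bounded by α = 1; for s > 0 the integral converges, and for s = 1 it equals s/(s² + 1) = 1/2, a standard integral. The truncated Riemann sum below reproduces that value; the truncation point and step count are arbitrary choices.

```python
import math

# Approximate ∫_0^∞ e^{-st} cos(t) dt by a midpoint sum on [0, 40];
# the neglected tail is bounded by e^{-40}/s, which is negligible
def truncated(s, T=40.0, n=200000):
    h = T/n
    return sum(math.exp(-s*(i + 0.5)*h)*math.cos((i + 0.5)*h) for i in range(n))*h

val = truncated(1.0)
print(round(val, 4))  # ≈ 0.5
```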

Proposition 6.7. If ∫_a^∞ f converges, then

    lim_{x→∞} ∫_x^∞ f = 0.


Proof. Since ∫_a^∞ f converges, there exists some c ∈ ℝ such that

    ∫_a^∞ f = lim_{x→∞} ∫_a^x f = c.

Now, recalling the additive property (6.1), for all x > a we have

    ∫_x^∞ f = ∫_a^∞ f − ∫_a^x f = c − ∫_a^x f,

and therefore

    lim_{x→∞} ∫_x^∞ f = lim_{x→∞} (c − ∫_a^x f) = lim_{x→∞} c − lim_{x→∞} ∫_a^x f = c − c = 0. ■


6.2 – Piecewise Continuity and Exponential Order

Given an interval I ⊆ ℝ, a function f : I → ℝ, and a point c ∈ I, recall that f is defined to be continuous at c if lim_{t→c} f(t) = f(c). We say f is continuous on I if it is continuous at every point in I. Continuous functions are ideal tools for work in a vast array of applications, but other scenarios as mundane as the switching on of an electrical circuit are better modeled using a function having one or more discontinuities. Introducing the notation

    f(c⁺) = lim_{t→c⁺} f(t)  and  f(c⁻) = lim_{t→c⁻} f(t)

for one-sided limits, we now define a larger class of functions that includes the class of continuous functions.

Definition 6.8. A function f is piecewise continuous on [a, b] if it satisfies the following:

1. f is defined and continuous at all but a finite number of points in [a, b].
2. The limits f(t⁺) and f(t⁻) both exist in ℝ for each t ∈ (a, b), as do f(a⁺) and f(b⁻).

Given an interval I, f is piecewise continuous on I if it is piecewise continuous on [t₁, t₂] for every t₁, t₂ ∈ I with t₁ < t₂.

In particular we find that f is piecewise continuous on [a, ∞) if and only if it is piecewise continuous on [a, t] for every t > a. A careful reading of the definition should make clear that a function need not be defined at every point in an interval I in order to be piecewise continuous there.

Example 6.9. The function f given by

    f(t) = 4 if 0 < t < 1,  and  7 if 1 < t ≤ 2

is continuous at all points in [0, 2] except 0, 1, and 2. Indeed, for any t ∈ (0, 1) ∪ (1, 2) we have f(t⁺) = f(t⁻) = f(t) ∈ ℝ, and since f(0⁺) = 4, f(1⁻) = 4, f(1⁺) = 7, and f(2⁻) = 7, we see that all relevant one-sided limits of f exist in ℝ. Therefore f is piecewise continuous on [0, 2], even though f(1) is undefined and the domain of f is (0, 1) ∪ (1, 2]. ■

Example 6.10. The function f(t) = 1/t² is not piecewise continuous on [0, ∞), since f(0⁺) = ∞ makes clear that f is not piecewise continuous even on [0, 1] ⊆ [0, ∞). Indeed, because f(0⁻) = ∞ we find that f is not piecewise continuous on any interval containing 0.

Similarly, g(t) = sin(1/t) is not piecewise continuous on any interval containing 0, since neither g(0⁺) nor g(0⁻) exists in ℝ. ■

Most piecewise continuous functions that we consider that are not continuous on their domains feature one or more jump discontinuities. Perhaps the most important such function is the unit step function, shown in Figure 12.

Definition 6.11. The unit step function u(t) is defined by

    u(t) = 0 if t < 0,  and  1 if t ≥ 0.


Figure 12. The unit step function u.

Some authors have > where we have ≥, and so leave u(0) undefined; however, this results in simple integrals such as ∫_0^1 u(t) dt becoming technically improper, which we wish to avoid. Therefore we designate u(0) = 1, so that u has a jump discontinuity in its domain at t = 0, with u(0⁻) = 0 and u(0⁺) = 1. In any case u is clearly piecewise continuous on (−∞, ∞), and the jump discontinuity may be shifted to t = a with a simple modification:

    u(t − a) = 0 if t < a,  and  1 if t ≥ a,

shown in Figure 13. More generally, given any function f(t) : [a, ∞) → ℝ, we define

    f(t)u(t − a) = 0 if t < a,  and  f(t) if t ≥ a,

regardless of how f(t) may or may not be defined for t < a!

Example 6.12. Many piecewise continuous functions can be expressed in terms of the unit step function. For instance, for

    Π_{a,b}(t) = 0 if t < a,  1 if a ≤ t < b,  and  0 if t ≥ b,

called the rectangular window function associated with a and b, we find that

    Π_{a,b}(t) = u(t − a) − u(t − b).

Figure 13. The function u(t − a).


Another example is f : [0, ∞) → ℝ given by

    f(t) = sin t if t ∈ [0, π/2),  1 if t ∈ [π/2, 10),  and  e^{t−10} if t ∈ [10, ∞),

which is expressible as

    f(t) = sin t + (1 − sin t)u(t − π/2) + (e^{t−10} − 1)u(t − 10). ■

It is a fact that if f, g ∈ R[a, b] and f(t) = g(t) for all but finitely many t ∈ [a, b], then

    ∫_a^b f = ∫_a^b g.    (6.3)

Suppose that f : [a, b] → ℝ is continuous on (a, b) but piecewise continuous on [a, b]. This leaves open the possibility that f is discontinuous at either a or b, so either f(a⁺) ≠ f(a) or f(b⁻) ≠ f(b), but in any case we have f(a⁺) ∈ ℝ and f(b⁻) ∈ ℝ. Define g : [a, b] → ℝ by

    g(t) = f(a⁺) if t = a,  f(t) if a < t < b,  and  f(b⁻) if t = b,

which is the continuous extension of f : (a, b) → ℝ to [a, b]. Then (6.3) holds, and because g is continuous on [a, b], the Fundamental Theorem of Calculus informs us that g has an antiderivative G on [a, b], and therefore

    ∫_a^b f = ∫_a^b g = G(b) − G(a).

This is a common means of evaluating the integral of many functions that are continuous on a compact interval except at the endpoints.

Another fact is that if f is defined and piecewise continuous on [a, b], then f ∈ R[a, b]; and what is more, if t_1 < t_2 < ··· < t_n are the points in (a, b) where f is discontinuous, then, allowing for the possibility that f is also discontinuous at a or b, we find that

    ∫_a^b f = lim_{h→0+} [ ∫_{a+h}^{t_1−h} f + Σ_{k=1}^{n−1} ∫_{t_k+h}^{t_{k+1}−h} f + ∫_{t_n+h}^{b−h} f ].    (6.4)

There are no improper integrals lurking here, since f is assumed defined on the compact interval [a, b]. However, for all sufficiently small h > 0, each of the integrals at right in (6.4) is taken over an interval where f is continuous and so has an antiderivative. Letting F denote the antiderivative of f on each closed interval that forms the limits of an integral at right in (6.4), and noting that

    f(c+) = lim_{h→0+} f(c + h)  and  f(c−) = lim_{h→0+} f(c − h)

in general, the Fundamental Theorem of Calculus implies

    ∫_a^b f = lim_{h→0+} ( F(t)|_{a+h}^{t_1−h} + Σ_{k=1}^{n−1} F(t)|_{t_k+h}^{t_{k+1}−h} + F(t)|_{t_n+h}^{b−h} )
            = [F(t_1−) − F(a+)] + Σ_{k=1}^{n−1} [F(t_{k+1}−) − F(t_k+)] + [F(b−) − F(t_n+)].

As involved as this formulation may appear, it can be largely circumvented by using the technique illustrated in the previous paragraph and passing to the (unique) continuous extension of f on each of the intervals [a, t_1], [t_k, t_{k+1}], and [t_n, b]. If we denote these continuous extensions by g_0, g_k, and g_n, respectively, then we have

    ∫_a^b f = ∫_a^{t_1} g_0 + Σ_{k=1}^{n−1} ∫_{t_k}^{t_{k+1}} g_k + ∫_{t_n}^b g_n.

Each of the integrals at right in this simpler formulation may theoretically be evaluated using the Fundamental Theorem of Calculus.

Remark. After the next example, in the interests of easing the evaluation of integrals of piecewise continuous functions, passing to continuous extensions in the foregoing fashion will be done implicitly!

Example 6.13. We consider the function

    f(t) = t,    if t ∈ [0, 2)
           1/t,  if t ∈ [2, 4]
           √t,   if t ∈ (4, 9],

which is piecewise continuous on [0, 9]. We have

    ∫_0^9 f = ∫_0^2 f + ∫_2^4 f + ∫_4^9 f = ∫_0^2 t dt + ∫_2^4 (1/t) dt + ∫_4^9 √t dt
            = [t^2/2]_0^2 + [ln t]_2^4 + [(2/3)t^{3/2}]_4^9 = 44/3 + ln 2.    (6.5)

In the expression after the first equality in (6.5), specifically in the first and third integrals, we are implicitly passing to appropriate continuous extensions of f. For the first integral, for instance, we are passing from f : [0, 2] → R given by

    f(t) = t,    if t ∈ [0, 2)
           1/2,  if t = 2,

which has no antiderivative on [0, 2], to the continuous function g : [0, 2] → R given by g(t) = t for all t ∈ [0, 2], which has antiderivative t^2/2 on [0, 2]. This allows for the easy evaluation of ∫_0^2 f using the Fundamental Theorem of Calculus:

    ∫_0^2 f = ∫_0^2 g = ∫_0^2 t dt = [t^2/2]_0^2 = 2.


Again, the passing to continuous extensions in (6.5) is implicit, in the sense that no comment about the use of such extensions is made in the course of the calculations. As stated in the Remark above, this is how the integration of piecewise continuous functions will be handled from now on. □
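As a sanity check on the value 44/3 + ln 2 in (6.5), each piece can be integrated numerically. Here is a small sketch using composite Simpson's rule; the helper name `simpson` is ours:

```python
import math

def simpson(f, a, b, n=20000):
    # Composite Simpson's rule on [a, b]; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

# Integrate f piece by piece, exactly as in (6.5).
total = (simpson(lambda t: t, 0, 2)
         + simpson(lambda t: 1 / t, 2, 4)
         + simpson(math.sqrt, 4, 9))

assert abs(total - (44 / 3 + math.log(2))) < 1e-8
```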

Given a function f : [a,∞) → R, we have seen in the previous section that the integral ∫_a^∞ f does not always exist. Indeed, ∫_a^∞ f may diverge even if f is piecewise continuous or continuous on [a,∞). Under what conditions will the integral converge? It is not a trivial matter to ascertain the collection of all functions f : [a,∞) → R for which ∫_a^∞ f is convergent, but fortunately this is not necessary. For most applications it is enough that f have two properties: piecewise continuity and being of exponential order.

Definition 6.14. Given α ∈ R, a function f is of exponential order α if there exist constants C > 0 and τ ∈ R such that

    |f(t)| ≤ Ce^{αt}    (6.6)

for all t ≥ τ.

The choice of constants for which a function f satisfies (6.6) on [τ,∞) is not unique. Any workable C and τ values may be replaced by larger ones, for instance, and if we choose τ to be positive then any workable α value could be replaced by something larger. Usually the smallest α value is sought, if such exists. A function that is of exponential order α may simply be said to be of exponential order if the value of α is immaterial. Clearly, for a function to be of exponential order its domain must contain an interval [a,∞) for some a ∈ R.

Example 6.15. Perhaps not surprisingly, any exponential function is of exponential order. For f(t) = 2^t we have

    |f(t)| = 2^t = e^{t ln 2}

for all t ∈ R, which shows that (6.6) can be satisfied by choosing α = ln 2, τ = 0, and C = 1. □

Example 6.16. Let p(t) = t^n for some integer n ≥ 1. We start by observing that

    |p(t)| = |t|^n = t^n = e^{n ln t}    (6.7)

for all t ≥ 1. (It's worth further noting that the last equality fails if t ≤ 0.) Now, applying L'Hôpital's Rule,

    lim_{t→∞} (ln t)/t = lim_{t→∞} 1/t = 0,

which by the definition of limit implies there exists some τ > 1 such that

    (ln t)/t ≤ 1

for all t ≥ τ, and hence ln t ≤ t for t ≥ τ. Recalling (6.7), it follows that

    |p(t)| ≤ e^{nt}

for t ≥ τ. This shows that t^n is of exponential order, since we may substitute p for f in (6.6) and choose α = n and C = 1.

If n ≤ 0 we have |p(t)| = t^n ≤ 1 < e^t for t ≥ 1, which again shows t^n to be of exponential order. □

Proposition 6.17. Let c be a constant. Suppose f(t) is of exponential order α and g(t) is of exponential order β. Then cf, f + g, and fg are of exponential order α, max{α, β}, and α + β, respectively.

Proof. Let A, B > 0 and τ_1, τ_2 ∈ R be such that |f(t)| ≤ Ae^{αt} for t ≥ τ_1 and |g(t)| ≤ Be^{βt} for t ≥ τ_2. For t ≥ τ_1,

    |(cf)(t)| = |cf(t)| = |c||f(t)| ≤ |c|Ae^{αt},

and thus cf is also of exponential order α.

Now let μ = max{α, β}, τ = max{τ_1, τ_2}, and C = max{A, B}. For any t ≥ τ, by the Triangle Inequality,

    |(f + g)(t)| = |f(t) + g(t)| ≤ |f(t)| + |g(t)| ≤ Ae^{αt} + Be^{βt} ≤ Ce^{μt} + Ce^{μt} = 2Ce^{μt},

and therefore f + g is of exponential order μ.

Finally, for t ≥ τ,

    |(fg)(t)| = |f(t)||g(t)| ≤ (Ae^{αt})(Be^{βt}) = ABe^{(α+β)t},

and so fg is of exponential order α + β. □

Remark. The details of the proof of Proposition 6.17 make clear that if f and g are both of exponential order α in particular, then so too are cf and f + g, whereas fg will be of exponential order 2α.

Example 6.18. Given constants a_0, ..., a_n, any polynomial function

    p(t) = Σ_{k=0}^n a_k t^k

is of exponential order. To see this, note that each monomial t^k is of exponential order by Example 6.16, and thus each term a_k t^k is of exponential order by Proposition 6.17. Now another application of Proposition 6.17 makes clear that the sum of all the a_k t^k, and hence p(t) itself, is of exponential order. □

Example 6.19. For any integer n ≥ 1, since t^n/e^{εt} → 0 as t → ∞ for any ε > 0, we find that t^n is of exponential order ε for any ε > 0. However, t^n is clearly not of exponential order 0, as that would require there be constants C > 0 and τ ∈ R such that |t|^n ≤ C for all t ≥ τ. In light of Proposition 6.17, if f(t) is of exponential order α, then t^n f(t) is of exponential order α + ε for any ε > 0. □

The next proposition is sometimes a useful tool for determining whether or not a function is of exponential order.


Proposition 6.20. If f is of exponential order α, then

    lim_{t→∞} f(t)e^{−st} = 0

for all s > α.

Proof. Suppose f is of exponential order α, so there are constants C, τ > 0 such that |f(t)| ≤ Ce^{αt} for all t ≥ τ. Let s > α. Since

    0 ≤ |f(t)|e^{−st} ≤ Ce^{αt}e^{−st} = Ce^{(α−s)t}

for t ≥ τ, and Ce^{(α−s)t} → 0 as t → ∞, the Squeeze Theorem implies that |f(t)|e^{−st} → 0 as t → ∞. Therefore f(t)e^{−st} → 0 as t → ∞ as well. □

Example 6.21. We know that f(t) = e^t is of exponential order, and in Example 6.16 we found g(t) = t^2 is also of exponential order. Is the composition (f ∘ g)(t) = e^{t^2} of exponential order? For any s ∈ R we find that t − s → ∞ as t → ∞, and so

    lim_{t→∞} (f ∘ g)(t)e^{−st} = lim_{t→∞} e^{t^2}e^{−st} = lim_{t→∞} e^{t(t−s)} = ∞.

Since the limit is nonzero for all s ∈ R, Proposition 6.20 implies that e^{t^2} is not of exponential order. □


6.3 – Definition of the Laplace Transform

We now introduce the chief protagonist of the current chapter, the Laplace transform. Like differential operators, the Laplace transform is a function having domain and range consisting of functions (as opposed to scalars). After developing some properties of the transform in this section and the next, and working with the inverse of the transform in §6.5, we will start in §6.6 to employ the transform in order to solve initial-value problems.

Definition 6.22. Given a function f : [0,∞) → R, the Laplace transform of f is the function L[f] given by

    L[f](s) = ∫_0^∞ e^{−st} f(t) dt    (6.8)

for all s ∈ R for which the integral converges.

The definition presents L[f] as a function of s, and since this will virtually always be the case, it is customary to let the symbol L[f] stand in for L[f](s) in the interests of brevity. Moreover if, for example, f(t) = sin t, we may write L[sin t] instead of L[f], and in general there is no substantive difference between the symbols L[f] and L[f(t)]. Thus in certain contexts the symbols L[f], L[f](s), L[f(t)], and L[f(t)](s) may be used interchangeably.

As the definition implies, the domain of L[f] is the set of all s ∈ R such that the improper integral in (6.8) exists as a real number. Since

    ∫_0^∞ e^{−st} f(t) dt = lim_{T→∞} ∫_0^T e^{−st} f(t) dt,

we see that any question concerning the existence of L[f](s) boils down to a question about whether the associated limit of integrals exists.

A harder question to answer is this: what is the domain of the transformation L itself? To start, we could say that a function f is in the domain of L if there exists at least one value of s for which L[f](s) is defined as a real number. However, if L[f] is not defined on at least some open interval of real numbers then it cannot be of much use for our purposes. While requiring that L[f](s) be defined for all s in some open interval in order for f to be in the domain of L is a step in the right direction, it still leaves us with a hard question. Fortunately it is not truly necessary to determine the domain of L in its entirety: a practical subset will do.

The next theorem informs us that the set of functions of exponential order that are piecewise continuous on [0,∞) is a subset of Dom(L). The proof of the theorem uses a couple of facts from calculus: first, if f, g ∈ R[a, b], then fg ∈ R[a, b]; and second, if f ∈ R[a, b], then |f| ∈ R[a, b].

Theorem 6.23. If f is piecewise continuous on [0,∞) and of exponential order α, then L[f](s) exists for all s > α.

Proof. Suppose f is of exponential order α, so there exist τ, C > 0 such that

    |f(t)| ≤ Ce^{αt}

for all t ∈ [τ,∞). Also suppose f is piecewise continuous on [0,∞), so that f is piecewise continuous on [τ, t] and hence f ∈ R[τ, t] for all t ≥ τ. This implies |f| ∈ R[τ, t], and since e^{−st} is integrable on any compact interval, it follows that the product e^{−st}|f(t)| is integrable on any compact interval in [τ,∞).

For t ≥ τ we have

    0 ≤ e^{−st}|f(t)| ≤ Ce^{−st}e^{αt} = Ce^{(α−s)t}.

Let s > α. Then α − s < 0, so that e^{(α−s)t} → 0 as t → ∞. Now,

    ∫_τ^∞ Ce^{(α−s)t} dt = lim_{T→∞} ∫_τ^T Ce^{(α−s)t} dt = lim_{T→∞} (C/(α−s))[e^{(α−s)T} − e^{(α−s)τ}]
                         = (C/(α−s))[0 − e^{(α−s)τ}] = Ce^{(α−s)τ}/(s − α),

so

    ∫_τ^∞ Ce^{(α−s)t} dt

is convergent. Therefore

    ∫_τ^∞ e^{−st}|f(t)| dt

is convergent by Theorem 6.5 (the Comparison Test for Integrals), whence it follows that

    ∫_τ^∞ e^{−st} f(t) dt

is also convergent and hence exists in R.

Finally, f is piecewise continuous on [0, τ], so that f, and subsequently e^{−st}f(t), is integrable on [0, τ]. That is,

    ∫_0^τ e^{−st} f(t) dt

is defined in R, and then

    L[f](s) = ∫_0^∞ e^{−st} f(t) dt = lim_{T→∞} ∫_0^T e^{−st} f(t) dt
            = lim_{T→∞} [ ∫_0^τ e^{−st} f(t) dt + ∫_τ^T e^{−st} f(t) dt ]
            = ∫_0^τ e^{−st} f(t) dt + ∫_τ^∞ e^{−st} f(t) dt

shows that L[f](s) likewise is defined in R.

Therefore L[f](s) exists for all s > α, and the proof is done. □

Henceforth we shall let the symbol E denote the collection of piecewise continuous functions f : [0,∞) → R that are of exponential order, and take E to be the domain of the Laplace transform L. This is not to say there don't exist functions f : [0,∞) → R that don't belong to E and yet have a Laplace transform (for example t^{−1/2}), but the functions in E are suitable for treating a vast array of applications.

We now give some examples of finding explicit Laplace transforms using Definition 6.22, an oftentimes tedious process that nonetheless is necessary to construct a table of common transforms.


Example 6.24. Assuming a ≠ 0, the Laplace transform for e^{at} is relatively easy to obtain. For s > a,

    L[e^{at}](s) = ∫_0^∞ e^{−st}e^{at} dt = lim_{T→∞} ∫_0^T e^{(a−s)t} dt = lim_{T→∞} [ e^{(a−s)t}/(a−s) ]_0^T
                 = lim_{T→∞} [ e^{−(s−a)T}/(a−s) + 1/(s−a) ] = 1/(s − a).    (6.9)

The integral diverges if s ≤ a.

If a = 0 the function e^{at} becomes the constant function 1, but in fact the work in Example 6.4 shows that L[1](s) = 1/s for all s > 0, and so (6.9) is seen to hold even when a = 0. □

Example 6.25. Find L[t^2], the Laplace transform of the function t ↦ t^2.

Solution. To "find" L[t^2] means to find an expression for L[t^2](s). By definition we have

    L[t^2](s) = ∫_0^∞ e^{−st}t^2 dt = lim_{T→∞} ∫_0^T t^2 e^{−st} dt.    (6.10)

To evaluate the definite integral we employ integration by parts. Letting u(t) = t^2 and v′(t) = e^{−st}, we obtain u′(t) = 2t and v(t) = −e^{−st}/s, and so

    ∫_0^T t^2 e^{−st} dt = [ −(1/s)t^2 e^{−st} ]_0^T − ∫_0^T −(2/s)t e^{−st} dt
                         = −(T^2/s)e^{−sT} + (2/s) ∫_0^T t e^{−st} dt.    (6.11)

Now, the integral ∫_0^T t e^{−st} dt itself requires integration by parts. Letting u(t) = t and v′(t) = e^{−st}, we obtain u′(t) = 1 and v(t) = −e^{−st}/s, and so

    ∫_0^T t e^{−st} dt = [ −(1/s)t e^{−st} ]_0^T − ∫_0^T −(1/s)e^{−st} dt = −(T/s)e^{−sT} − (1/s^2)[ e^{−st} ]_0^T
                       = −(T/s)e^{−sT} − (1/s^2)(e^{−sT} − 1).    (6.12)

Putting (6.12) into (6.11) gives

    ∫_0^T t^2 e^{−st} dt = −(T^2/s)e^{−sT} − (2/s)[ (T/s)e^{−sT} + (1/s^2)(e^{−sT} − 1) ]
                         = 2/s^3 − (s^2 T^2 + 2sT + 2)/(s^3 e^{sT}),

and putting this result into (6.10) yields

    L[t^2](s) = lim_{T→∞} ( 2/s^3 − (s^2 T^2 + 2sT + 2)/(s^3 e^{sT}) ).

This limit does not exist if s ≤ 0; however if s > 0, then by two successive applications of L'Hôpital's Rule we obtain

    lim_{T→∞} (s^2 T^2 + 2sT + 2)/(s^3 e^{sT}) = lim_{T→∞} (2s^2 T + 2s)/(s^4 e^{sT}) = lim_{T→∞} 2s^2/(s^5 e^{sT}) = lim_{T→∞} 2/(s^3 e^{sT}) = 0,

and therefore

    L[t^2](s) = lim_{T→∞} 2/s^3 − lim_{T→∞} (s^2 T^2 + 2sT + 2)/(s^3 e^{sT}) = 2/s^3

for all s > 0. □
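The closed forms in Examples 6.24 and 6.25 are easy to corroborate numerically by truncating the improper integral at a large T, beyond which the integrand is negligible for the functions and s values used. A sketch; the helper name `laplace_num` is ours:

```python
import math

def laplace_num(f, s, T=60.0, n=60000):
    # Composite Simpson's rule approximation of the truncated transform
    # integral of e^{-s t} f(t) over [0, T]; n must be even.
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

# L[e^{at}](s) = 1/(s - a) for s > a, per (6.9).
a, s = 1.0, 2.5
assert abs(laplace_num(lambda t: math.exp(a * t), s) - 1 / (s - a)) < 1e-6

# L[t^2](s) = 2/s^3 for s > 0, per Example 6.25.
s = 1.5
assert abs(laplace_num(lambda t: t * t, s) - 2 / s ** 3) < 1e-6
```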


One striking feature of the Laplace transform is that, even when f ∈ E is necessarily a piecewise-defined function, the transform L[f](s) may be defined by a single expression in s.

Example 6.26. Find the Laplace transform of the function

    f(t) = 1 − t,   if 0 ≤ t ≤ 1
           0,       if 1 < t ≤ 3
           e^{2t},  if 3 < t < ∞.

Solution. Using the usual properties of the Riemann integral established in calculus and assuming s ≠ 2, we obtain

    L[f](s) = lim_{T→∞} ∫_0^T e^{−st} f(t) dt
            = lim_{T→∞} ( ∫_0^1 e^{−st}(1 − t) dt + ∫_1^3 e^{−st}·0 dt + ∫_3^T e^{−st}e^{2t} dt )
            = lim_{T→∞} ( ∫_0^1 e^{−st} dt − ∫_0^1 t e^{−st} dt + ∫_3^T e^{(2−s)t} dt )
            = lim_{T→∞} [ −(1/s)(e^{−s} − 1) − ( −(1/s)e^{−s} − (1/s^2)(e^{−s} − 1) ) + (1/(2−s))(e^{(2−s)T} − e^{3(2−s)}) ]
            = lim_{T→∞} [ (e^{−s} + s − 1)/s^2 + (1/(2−s))(e^{(2−s)T} − e^{3(2−s)}) ].    (6.13)

The limit does not exist if s < 2, since we would find that e^{(2−s)T} → ∞ as T → ∞. If s = 2 the limit again does not exist, since in this case

    ∫_3^T e^{−st}e^{2t} dt = ∫_3^T e^{−2t}e^{2t} dt = ∫_3^T dt = T − 3,

where T − 3 → ∞ as T → ∞. However if s > 2, then e^{(2−s)T} → 0 as T → ∞, and from (6.13) we obtain

    L[f](s) = (e^{−s} + s − 1)/s^2 + (1/(2−s))[0 − e^{3(2−s)}] = (e^{−s} + s − 1)/s^2 + e^{6−3s}/(s − 2),

which can be seen to be not piecewise-defined! □
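The single-expression answer can be compared against direct numerical integration of the piecewise definition. A sketch (helper name ours), integrating each piece separately so the jumps at t = 1 and t = 3 cause no trouble; the middle piece contributes 0, and truncating the last piece at T = 40 leaves only a negligible tail of e^{−(s−2)t}:

```python
import math

def simpson(f, a, b, n=20000):
    # Composite Simpson's rule on [a, b]; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

s = 3.0  # any s > 2 works
numeric = (simpson(lambda t: math.exp(-s * t) * (1 - t), 0, 1)
           + simpson(lambda t: math.exp((2 - s) * t), 3, 40))

closed_form = (math.exp(-s) + s - 1) / s ** 2 + math.exp(6 - 3 * s) / (s - 2)
assert abs(numeric - closed_form) < 1e-8
```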

Example 6.27. For b ≠ 0 find L[sin bt] and L[cos bt].

Solution. From a table of integrals found in nearly any calculus textbook we have

    ∫ e^{au} sin bu du = (e^{au}/(a^2 + b^2))(a sin bu − b cos bu) + C,

and so

    L[sin bt](s) = ∫_0^∞ e^{−st} sin bt dt = lim_{T→∞} ∫_0^T e^{−st} sin bt dt
                 = lim_{T→∞} [ (e^{−st}/(s^2 + b^2))(−s sin bt − b cos bt) ]_0^T
                 = lim_{T→∞} [ b/(s^2 + b^2) − (e^{−sT}/(s^2 + b^2))(s sin bT + b cos bT) ]
                 = b/(s^2 + b^2)

for s > 0. The integral diverges if s ≤ 0. In a similar fashion we may use

    ∫ e^{au} cos bu du = (e^{au}/(a^2 + b^2))(a cos bu + b sin bu) + C

to obtain

    L[cos bt](s) = s/(s^2 + b^2)

for s > 0. □

Table 1. Laplace transforms of common functions.

    f(t)                          L[f](s)                   Dom(L[f])
    sin bt                        b/(s^2 + b^2)             s > 0
    cos bt                        s/(s^2 + b^2)             s > 0
    t sin bt                      2bs/(s^2 + b^2)^2         s > 0
    t cos bt                      (s^2 − b^2)/(s^2 + b^2)^2 s > 0
    e^{at} sin bt                 b/((s − a)^2 + b^2)       s > a
    e^{at} cos bt                 (s − a)/((s − a)^2 + b^2) s > a
    t^n, n = 0, 1, 2, ...         n!/s^{n+1}                s > 0
    e^{at}t^n, n = 0, 1, 2, ...   n!/(s − a)^{n+1}          s > a
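The sine and cosine transforms just derived can be spot-checked numerically; a sketch (helper name ours):

```python
import math

def laplace_num(f, s, T=60.0, n=60000):
    # Composite Simpson's rule approximation of the truncated transform
    # integral of e^{-s t} f(t) over [0, T]; n must be even.
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

b, s = 3.0, 1.0
assert abs(laplace_num(lambda t: math.sin(b * t), s) - b / (s * s + b * b)) < 1e-6
assert abs(laplace_num(lambda t: math.cos(b * t), s) - s / (s * s + b * b)) < 1e-6
```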

Table 1 gives the Laplace transforms for many frequently encountered functions, with a more comprehensive table given at the end of the chapter. Additional items in the table will be verified in later examples. For the transform of t^2 using the table, let a = 0 and n = 2 in e^{at}t^n to obtain

    L[t^2] = 2!/(s − 0)^{2+1} = 2/s^3

for s > 0, as was found in Example 6.25. For the transform of 1, let a = 0 and n = 0 in e^{at}t^n to obtain

    L[1](s) = 0!/(s − 0)^{0+1} = 1/s

for s > 0, as was found in Example 6.4.

With the next proposition it becomes clear that the Laplace transform is in fact a linear transformation on the vector space E, provided we agree that the domain of L[f + g] should be restricted to the intersection of the domains of L[f] and L[g].

Proposition 6.28. Suppose f, g ∈ E are of exponential order α, and let c ∈ R. Then for all s > α,

1. L[f + g](s) = L[f](s) + L[g](s)
2. L[cf](s) = cL[f](s)

Proof.
Proof of (1). Let s > α. Then L[f](s) and L[g](s) are both defined by Theorem 6.23, which is to say that the limits

    lim_{T→∞} ∫_0^T e^{−st} f(t) dt  and  lim_{T→∞} ∫_0^T e^{−st} g(t) dt

both exist. This fact allows us to use a limit law to write

    L[f](s) + L[g](s) = lim_{T→∞} ∫_0^T e^{−st} f(t) dt + lim_{T→∞} ∫_0^T e^{−st} g(t) dt
                      = lim_{T→∞} [ ∫_0^T e^{−st} f(t) dt + ∫_0^T e^{−st} g(t) dt ],

and thus by an established property of integrals we obtain

    L[f](s) + L[g](s) = lim_{T→∞} ∫_0^T ( e^{−st} f(t) + e^{−st} g(t) ) dt = lim_{T→∞} ∫_0^T e^{−st}(f + g)(t) dt = L[f + g](s),

where of course (f + g)(t) = f(t) + g(t) by definition. Hence

    (L[f] + L[g])(s) = L[f](s) + L[g](s) = L[f + g](s)

for all s > α, which proves part (1).

Proof of (2). Left as an exercise. □

Since (L[f] + L[g])(s) = L[f](s) + L[g](s) by definition, we may write L[f + g] = L[f] + L[g], with the understanding that the equality holds on some interval of the form (α,∞). Similarly we may write L[cf] = cL[f].

The next example illustrates the usefulness of the Laplace transform's linearity properties, especially when employed in conjunction with Table 1.

Example 6.29. Find L[f] for f(t) = e^{−5t} sin πt − cosh 8t + 7.


Solution. Recall that the hyperbolic cosine function cosh is defined by cosh(x) = (1/2)(e^x + e^{−x}). By Proposition 6.28 we obtain

    L[f](s) = L[e^{−5t} sin πt − cosh 8t + 7](s)
            = L[e^{−5t} sin πt](s) − L[cosh 8t](s) + L[7](s)
            = L[e^{−5t} sin πt](s) − L[(1/2)e^{8t} + (1/2)e^{−8t}](s) + L[7](s)
            = L[e^{−5t} sin πt](s) − (1/2)L[e^{8t}](s) − (1/2)L[e^{−8t}](s) + 7L[1](s).    (6.14)

Now we use Table 1 to get

    L[e^{−5t} sin πt](s) = π/((s + 5)^2 + π^2)  for s > −5,
    L[e^{8t}](s) = 1/(s − 8)  for s > 8,
    L[e^{−8t}](s) = 1/(s + 8)  for s > −8,

and finally L[1](s) = 1/s for s > 0. Putting all these results into (6.14) gives

    L[f](s) = π/((s + 5)^2 + π^2) − (1/2)·1/(s − 8) − (1/2)·1/(s + 8) + 7·1/s
            = π/((s + 5)^2 + π^2) − s/(s^2 − 64) + 7/s

for s > 8, which is the intersection of the domains of the individual transforms. □
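The linearity computation above can be checked against a direct numerical evaluation of the transform integral; a sketch (helper name ours), at one sample point s > 8:

```python
import math

def laplace_num(f, s, T=60.0, n=60000):
    # Composite Simpson's rule approximation of the truncated transform
    # integral of e^{-s t} f(t) over [0, T]; n must be even.
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

def f(t):
    return math.exp(-5 * t) * math.sin(math.pi * t) - math.cosh(8 * t) + 7

s = 9.0  # any s > 8 works
closed_form = (math.pi / ((s + 5) ** 2 + math.pi ** 2)
               - s / (s * s - 64) + 7 / s)
assert abs(laplace_num(f, s) - closed_form) < 1e-5
```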


6.4 – Laplace Transform Properties

In this section we establish a variety of properties of the Laplace transform that will prove indispensable in upcoming developments.

Theorem 6.30 (First Shifting Theorem). If Dom(L[f]) = (α,∞), then

    L[e^{at}f(t)](s) = L[f(t)](s − a)

for all s > α + a.

Proof. Suppose Dom(L[f]) = (α,∞), and fix s > α + a. Then s − a ∈ (α,∞), so that L[f](s − a) is defined in R, and we have

    L[f(t)](s − a) = ∫_0^∞ e^{−(s−a)t} f(t) dt = ∫_0^∞ e^{−st}·e^{at} f(t) dt = L[e^{at}f(t)](s)

by Definition 6.22. □

Example 6.31. From Example 6.29 we found that, for f(t) = e^{−5t} sin πt − cosh 8t + 7,

    L[f](s) := F(s) = π/((s + 5)^2 + π^2) − s/(s^2 − 64) + 7/s

for s > 8. Therefore, by Theorem 6.30,

    L[e^{7t}f(t)](s) = L[e^{2t} sin πt − e^{7t} cosh 8t + 7e^{7t}](s) = F(s − 7)
                     = π/([(s − 7) + 5]^2 + π^2) − (s − 7)/((s − 7)^2 − 64) + 7/(s − 7)
                     = π/((s − 2)^2 + π^2) − (s − 7)/(s^2 − 14s − 15) + 7/(s − 7)

for all s > 8 + 7 = 15. □
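The First Shifting Theorem itself is cheap to verify numerically on a concrete pair; a sketch (helper name ours) using f(t) = sin t, for which L[sin t](s) = 1/(s^2 + 1):

```python
import math

def laplace_num(f, s, T=60.0, n=60000):
    # Composite Simpson's rule approximation of the truncated transform
    # integral of e^{-s t} f(t) over [0, T]; n must be even.
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

a, s = 1.0, 3.0
# L[e^{at} sin t](s) should equal L[sin t](s - a) = 1/((s - a)^2 + 1).
shifted = laplace_num(lambda t: math.exp(a * t) * math.sin(t), s)
assert abs(shifted - 1 / ((s - a) ** 2 + 1)) < 1e-6
assert abs(shifted - laplace_num(math.sin, s - a)) < 1e-6
```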

Example 6.32. In Example 6.27 it was found that

    L[sin bt](s) = b/(s^2 + b^2)

for s > 0, and thus, by Theorem 6.30,

    L[e^{at} sin bt](s) = L[sin bt](s − a) = b/((s − a)^2 + b^2)

for s > a. This verifies another item in Table 1, with the verification of L[e^{at} cos bt] being similar. □

The property of the Laplace transform given in the next theorem will be put to work time and again throughout the remainder of the chapter. It is this property that is the reason why the Laplace transform can be used to solve a wide variety of differential equations, and in Chapter 8 it is employed to solve certain systems of differential equations as well.


Theorem 6.33. For n ≥ 1, suppose f, f′, ..., f^(n) are of exponential order α. If f, f′, ..., f^(n−1) are continuous on [0,∞), and f^(n) is piecewise continuous on [0,∞), then

    L[f^(n)](s) = s^n L[f](s) − Σ_{k=1}^n s^{n−k} f^(k−1)(0)

for all s > α.

Proof. When n = 1 the theorem states that if f and f′ are of exponential order α, f is continuous on [0,∞), and f′ is piecewise continuous on [0,∞), then

    L[f′](s) = sL[f](s) − f(0)    (6.15)

for all s > α. We prove this to start. Fix s > α. Integration by parts gives

    L[f′](s) = lim_{T→∞} ∫_0^T f′(t)e^{−st} dt = lim_{T→∞} ( [ f(t)e^{−st} ]_0^T − ∫_0^T −sf(t)e^{−st} dt )
             = lim_{T→∞} ( f(T)e^{−sT} − f(0) + s ∫_0^T f(t)e^{−st} dt ) = −f(0) + sL[f](s),

where f(T)e^{−sT} → 0 as T → ∞ by Proposition 6.20. This shows that the statement of the theorem holds in the case when n = 1.

When n = 2 the theorem states that if f, f′, and f′′ are of exponential order α, f and f′ are continuous on [0,∞), and f′′ is piecewise continuous on [0,∞), then

    L[f′′](s) = s^2 L[f](s) − sf(0) − f′(0)    (6.16)

for all s > α. To show this, we simply apply the n = 1 case first to f′, and then to f:

    L[f′′](s) = L[(f′)′](s) = sL[f′](s) − f′(0) = s[ sL[f](s) − f(0) ] − f′(0) = s^2 L[f](s) − sf(0) − f′(0).

The full proof of the theorem is done by induction: suppose the theorem is true for some arbitrary n ≥ 1, and then show that it is true for n + 1. □
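The n = 1 case (6.15) can be corroborated numerically; a sketch (helper name ours) with f(t) = cos t, so that f′(t) = −sin t and f(0) = 1:

```python
import math

def laplace_num(f, s, T=60.0, n=60000):
    # Composite Simpson's rule approximation of the truncated transform
    # integral of e^{-s t} f(t) over [0, T]; n must be even.
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

s = 2.0
lhs = laplace_num(lambda t: -math.sin(t), s)   # L[f'](s)
rhs = s * laplace_num(math.cos, s) - 1.0       # s L[f](s) - f(0)
assert abs(lhs - rhs) < 1e-6
assert abs(lhs - (-1 / (s * s + 1))) < 1e-6    # exact: -1/(s^2 + 1)
```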

Example 6.34. Example 6.4 shows that

    L[t^0](s) = L[1](s) = 1/s = 0!/s^{0+1}

for all s > 0.13 For n ≥ 0 suppose that L[t^n](s) = n!/s^{n+1} for s > 0. Applying (6.15) to f(t) = t^{n+1}, it follows that

    (n + 1)L[t^n](s) = L[(n + 1)t^n](s) = L[(t^{n+1})′](s) = sL[t^{n+1}](s)

for s > 0, and hence

    L[t^{n+1}](s) = ((n + 1)/s)L[t^n](s) = ((n + 1)/s) · n!/s^{n+1} = (n + 1)!/s^{n+2}

for s > 0. Therefore

    L[t^n](s) = n!/s^{n+1}

holds for all n ≥ 0 and s > 0 by the Principle of Induction. This verifies the next-to-last entry in Table 1. □

13 For the function t ↦ t^0 we naturally define 0^0 = 1, so that the function is identically equal to 1. In the next section we will come to understand that, from the standpoint of the Laplace transformation, all functions that equal the constant function t ↦ 1 everywhere on [0,∞) except for a finite number of points are essentially the same.

To prove the next property of the Laplace transform requires a variant of Leibniz's Integral Rule (Theorem 1.6) that applies to an integral of the type ∫_a^∞. Such a rule, however, requires the concept of uniform convergence, which we now define.

Definition 6.35. Let S ⊆ R, and for some r_0 ∈ R define a function Φ_r : S → R for each real r > r_0. The family {Φ_r : r > r_0} converges uniformly to a function Φ on S if, for each ε > 0, there exists some r_ε > r_0 such that

    |Φ_r(x) − Φ(x)| < ε

for all r > r_ε and x ∈ S.

Given a function φ(s, t) with domain (σ,∞) × [τ,∞), suppose

    Φ(s) = ∫_τ^∞ φ(s, t) dt    (6.17)

converges for all s > σ, and define

    Φ_T(s) = ∫_τ^T φ(s, t) dt    (6.18)

for all s > σ and T > τ. Given [s_1, s_2] ⊆ (σ,∞), we say the integral (6.17) converges uniformly on [s_1, s_2] if the family of functions {Φ_T : T > τ} converges uniformly to Φ on [s_1, s_2].

The following proposition furnishes an oftentimes convenient test to determine whether an improper integral of the form (6.17) converges uniformly on a compact interval of s values.

Proposition 6.36 (Weierstrass M-Test). Suppose M : [τ,∞) → R is continuous and ∫_τ^∞ M(t) dt converges. If |φ(s, t)| ≤ M(t) for all (s, t) ∈ [s_1, s_2] × [τ,∞), then ∫_τ^∞ φ(s, t) dt converges uniformly on [s_1, s_2].

Proof. Suppose |φ(s, t)| ≤ M(t) for all (s, t) ∈ [s_1, s_2] × [τ,∞). Since ∫_τ^∞ M(t) dt converges and |φ(s, t)| ≤ M(t) for all s_1 ≤ s ≤ s_2 and t ≥ τ, Theorem 6.5 implies that ∫_τ^∞ |φ(s, t)| dt converges, and hence ∫_T^∞ |φ(s, t)| dt converges for all s_1 ≤ s ≤ s_2 and T ≥ τ. This fact also ensures that Φ(s) and Φ_T(s) as defined by (6.17) and (6.18) exist for all s_1 ≤ s ≤ s_2 and T ≥ τ.

Let ε > 0. By Proposition 6.7 there exists some T_ε > τ such that

    ∫_{T_ε}^∞ M(t) dt < ε,

and thus

    ∫_{T_ε}^∞ |φ(s, t)| dt ≤ ∫_{T_ε}^∞ M(t) dt < ε

for all s ∈ [s_1, s_2]. Fixing T > T_ε and s ∈ [s_1, s_2], we have

    |Φ_T(s) − Φ(s)| = | ∫_T^∞ φ(s, t) dt | ≤ ∫_T^∞ |φ(s, t)| dt ≤ ∫_{T_ε}^∞ |φ(s, t)| dt < ε,

which shows that the family {Φ_T : T > τ} converges uniformly to Φ on [s_1, s_2], and therefore ∫_τ^∞ φ(s, t) dt converges uniformly on [s_1, s_2]. □

Example 6.37. Suppose f : [0,∞) → [0,∞) is continuous and ∫_0^∞ f(t) dt converges. Fix 0 ≤ s_1 < s_2 < ∞. For any (s, t) ∈ [s_1, s_2] × [0,∞) we have st ≥ 0, so that e^{−st} ≤ 1, and since f(t) ≥ 0 for all t ≥ 0 it follows that

    |e^{−st}f(t)| = e^{−st}f(t) ≤ f(t),

and therefore

    L[f](s) = ∫_0^∞ e^{−st} f(t) dt

converges uniformly on [s_1, s_2] by the Weierstrass M-Test. That is, the Laplace transform of f converges uniformly on compact intervals in [0,∞). □

We now state without proof the aforementioned variant of Leibniz's Integral Rule that will be needed to prove Theorem 6.40.

Theorem 6.38. Suppose φ(s, t) and ∂φ/∂s are continuous on (σ,∞) × [τ,∞). If

    Φ(s) = ∫_τ^∞ φ(s, t) dt

converges for all s ∈ (σ,∞), and

    ∫_τ^∞ (∂φ/∂s)(s, t) dt

converges uniformly on all compact I ⊆ (σ,∞), then Φ′(s) exists for all s ∈ (σ,∞), with

    Φ′(s) = (d/ds) ∫_τ^∞ φ(s, t) dt = ∫_τ^∞ (∂φ/∂s)(s, t) dt.

Lemma 6.39. Suppose f is piecewise continuous on [0,∞) and of exponential order α. If n ≥ 0 is an integer, then L[t^n f(t)](s) exists for all s > α, and also converges uniformly on any compact I ⊆ (α,∞).

Proof. Fix an integer n ≥ 0. As Example 6.19 shows, the function t^n f(t) is of exponential order α + ε for any ε > 0, and thus Theorem 6.23 implies that

    L[t^n f(t)](s) = ∫_0^∞ e^{−st} t^n f(t) dt    (6.19)

exists for all s > α + ε. However, because ε > 0 is arbitrary, this observation leads to the conclusion that (6.19) exists for all s > α.

Now let φ(s, t) = e^{−st} t^n f(t), and let [s_1, s_2] ⊆ (α,∞). Since f is of exponential order, there exists C > 0 such that |f(t)| ≤ Ce^{αt} for all t ∈ [0,∞). Now, for (s, t) ∈ [s_1, s_2] × [0,∞),

    |φ(s, t)| = e^{−st} t^n |f(t)| ≤ e^{−st} t^n · Ce^{αt} = Ct^n e^{−(s−α)t} ≤ Ct^n e^{−(s_1−α)t} := M(t),

and since M : [0,∞) → R is a continuous function and

    ∫_0^∞ M(t) dt = CL[t^n](s_1 − α) = n!C/(s_1 − α)^{n+1}

by Example 6.34 (so the integral ∫_0^∞ M converges), the Weierstrass M-Test implies that

    L[t^n f(t)](s) = ∫_0^∞ φ(s, t) dt

converges uniformly on [s_1, s_2]. Therefore L[t^n f(t)](s) converges uniformly on any compact I ⊆ (α,∞). □

Finally we are ready to state and prove another property of the Laplace transform. The symbol L[f]^(n) in the next theorem, as ought to be expected, denotes the nth derivative of L[f]. That L[f]^(n)(s) exists for all s ∈ (α,∞) is implicitly part of the theorem's conclusion.

Theorem 6.40. Suppose f is piecewise continuous on [0,∞) and of exponential order α. If n ≥ 0 is an integer, then

    L[t^n f(t)](s) = (−1)^n L[f(t)]^(n)(s)    (6.20)

for all s > α.

Proof. Lemma 6.39 implies that L[t^k f(t)](s) exists for all s > α and k ≥ 0. Also, since

    (∂^k/∂s^k)[ e^{−st} f(t) ] = (−1)^k e^{−st} t^k f(t)

is continuous on (α,∞) × [0,∞) for any k ≥ 0,

    ∫_0^∞ (∂^k/∂s^k)[ e^{−st} f(t) ] dt = ∫_0^∞ (−1)^k e^{−st} t^k f(t) dt = (−1)^k L[t^k f(t)](s)

converges for all s > α by Lemma 6.39, and by the same lemma

    ∫_0^∞ (∂^{k+1}/∂s^{k+1})[ e^{−st} f(t) ] dt = ∫_0^∞ (−1)^{k+1} e^{−st} t^{k+1} f(t) dt = (−1)^{k+1} L[t^{k+1} f(t)](s)

converges uniformly on compact I ⊆ (α,∞). Hence Theorem 6.38 implies that

    (d/ds) ∫_0^∞ (∂^k/∂s^k)[ e^{−st} f(t) ] dt = ∫_0^∞ (∂^{k+1}/∂s^{k+1})[ e^{−st} f(t) ] dt

for all s > α and k ≥ 0. Thus, for any integer n ≥ 0, Theorem 6.38 may be employed n times in succession (once for each application of d/ds) to obtain

    L[f(t)]^(n)(s) = (d^n/ds^n) ∫_0^∞ e^{−st} f(t) dt = ∫_0^∞ (∂^n/∂s^n)[ e^{−st} f(t) ] dt
                   = ∫_0^∞ (−1)^n t^n e^{−st} f(t) dt = (−1)^n ∫_0^∞ e^{−st} t^n f(t) dt = (−1)^n L[t^n f(t)](s)

for all s > α, which yields (6.20) upon multiplying both sides by (−1)^n. □

Example 6.41. With Theorem 6.40 we can easily determine L[t sin bt] for b ≠ 0, thereby affirming another item in Table 1. Since

    L[sin bt](s) = b/(s^2 + b^2)

for s > 0, the theorem informs us that

    L[t sin bt](s) = −L[sin bt]′(s) = −(d/ds)( b/(s^2 + b^2) ) = 2bs/(s^2 + b^2)^2

for s > 0. The treatment of L[t cos bt] is similar. □
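This entry, too, can be spot-checked by truncated numerical integration; a sketch (helper name ours):

```python
import math

def laplace_num(f, s, T=80.0, n=80000):
    # Composite Simpson's rule approximation of the truncated transform
    # integral of e^{-s t} f(t) over [0, T]; n must be even.
    h = T / n
    total = f(0.0) + math.exp(-s * T) * f(T)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-s * t) * f(t)
    return total * h / 3

b, s = 2.0, 1.0
exact = 2 * b * s / (s * s + b * b) ** 2   # 2bs/(s^2 + b^2)^2
assert abs(laplace_num(lambda t: t * math.sin(b * t), s) - exact) < 1e-6
```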

Example 6.42. Recalling that L[e^{at}](s) = 1/(s − a) for s > a by Example 6.24, for fixed n ∈ ℕ we find by Theorem 6.40 that
\[
L[t^n e^{at}](s) = (-1)^n L[e^{at}]^{(n)}(s) = (-1)^n \frac{d^n}{ds^n}\left(\frac{1}{s-a}\right) = (-1)^n \cdot \frac{(-1)^n n!}{(s-a)^{n+1}} = \frac{n!}{(s-a)^{n+1}}
\]
for s > a. This verifies the last item in Table 1. ∎
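Both computations are easy to confirm with a computer algebra system. The following sketch uses Python's sympy library (a tooling choice of ours, not something these notes assume) to differentiate the transforms directly:

```python
import sympy as sp

s, t, b, a = sp.symbols('s t b a', positive=True)

# Example 6.41: L[sin bt](s) = b/(s^2 + b^2), so Theorem 6.40 with n = 1
# says L[t sin bt](s) = -d/ds [b/(s^2 + b^2)].
t_sin = -sp.diff(b/(s**2 + b**2), s)
assert sp.simplify(t_sin - 2*b*s/(s**2 + b**2)**2) == 0

# Example 6.42: L[e^{at}](s) = 1/(s - a); check that
# (-1)^n d^n/ds^n [1/(s - a)] = n!/(s - a)^{n+1} for several n.
for n in range(1, 6):
    lhs = (-1)**n * sp.diff(1/(s - a), s, n)
    assert sp.simplify(lhs - sp.factorial(n)/(s - a)**(n + 1)) == 0
```

Note in particular that the exponent n + 1 in the denominator comes out of the nth differentiation automatically.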


6.5 – The Inverse Laplace Transform

For this section we recall the following notation: if A and B are sets, then A \ B denotes the set of elements in A that are not in B. That is,
\[
A \setminus B = \{x \in A : x \notin B\}.
\]
Some texts may write A \ B as A − B.

Given a function ϕ : X → Y, we call the set X the domain of ϕ as usual, while the set Y we call the codomain. The Laplace transform L is of course a function, and we already decided in §6.3 that the domain of L should be restricted to the collection E of functions [0,∞) → ℝ that are piecewise continuous and of exponential order. But what of the codomain of L? We could declare the codomain to be
\[
L(E) = \{L[f] : f \in E\}, \tag{6.21}
\]
the image of E under L, in which case L is perforce an onto function. However, while L(E) is a well-defined collection of functions, its set-theoretic definition (6.21) is not particularly illuminating. It would be more satisfying if we possessed some tools to help determine whether a function F belongs to L(E). That is, given a function F, we would like some means of ascertaining whether there exists some f ∈ E such that L[f] = F.

Theorem 6.23 makes clear that for any f ∈ E the transform L[f](s) exists as a function defined on an interval of the form (α,∞) for some α ∈ [−∞,∞). This immediately gives us our first tool: if F is a function whose domain does not contain an interval of the form (α,∞), then we may confidently conclude that F ∉ L(E). At this juncture a natural question arises: does L(E) perhaps equal the collection of all functions defined on some interval of the form (α,∞)? The following theorem provides a second tool, which easily answers this question in the negative.

Theorem 6.43. For any f ∈ E,
\[
\lim_{s\to\infty} L[f](s) = 0.
\]

Proof. For f ∈ E there exist α ∈ ℝ, C > 0, and τ ≥ 0 such that |f(t)| ≤ Ce^{αt} for all t ≥ τ. Moreover, because f is bounded on [0, τ] on account of piecewise continuity, there exists M > 0 sufficiently large that |f(t)| ≤ Me^{αt} for all t ≥ 0. Now, for any s > α,
\[
|L[f](s)| = \left|\int_0^\infty e^{-st}f(t)\,dt\right| \le \int_0^\infty e^{-st}|f(t)|\,dt \le \int_0^\infty e^{-st}\cdot Me^{\alpha t}\,dt = M\int_0^\infty e^{-(s-\alpha)t}\,dt = \frac{M}{s-\alpha},
\]
and since M/(s − α) → 0 as s → ∞, it follows by the Squeeze Theorem that |L[f](s)| → 0 as s → ∞, and therefore L[f](s) → 0. ∎

Thus if lim_{s→∞} F(s) ≠ 0, then there cannot exist any f ∈ E such that L[f] = F, and therefore F ∉ L(E). Now we see that such simple functions as s, √s, and ln(s) are not members of L(E), despite all of them having domains that contain intervals of the form (α,∞). What this means, of course, is that we have thus far found no "nicer" way of describing the set L(E) aside from (6.21). If F(s) → 0 as s → ∞, Theorem 6.43 has nothing to say concerning whether F is a member of L(E). Nevertheless L : E → L(E) is onto, and we turn next to the question of whether L is one-to-one.

In fact L is not one-to-one on E. To see this we need only consider the functions f1(t) = 1 and
\[
f_2(t) = \begin{cases} 0, & \text{if } t = 0 \\ 1, & \text{if } t > 0 \end{cases}
\]
Clearly f1 ≠ f2, but both functions belong to E, both L[f1] and L[f2] have domain (0,∞), and L[f1](s) = L[f2](s) = 1/s for all s > 0. That is, L[f1] = L[f2] with f1 ≠ f2, and one-to-oneness fails. If we're interested in the transformation L having an inverse L⁻¹, we appear to have arrived at an impasse. Or have we? It has to be admitted that f1 and f2 are very nearly identical. In the parlance of measure theory they are equal "almost everywhere," a term we shall adapt to our own purposes here.

Given a function f ∈ E, let ∆f denote the set of values t ∈ [0,∞) where f is discontinuous. Definition 6.8 makes clear that ∆f ∩ [t1, t2] is a finite set for all 0 ≤ t1 < t2 < ∞, and so in particular ∆f is always a discrete set. If g ∈ E also, we say f = g almost everywhere (abbreviated a.e.) if f(t) = g(t) for all t ∈ [0,∞) \ (∆f ∪ ∆g); that is, f and g agree in value everywhere except possibly at their points of discontinuity.

Example 6.44. Referring to the functions f1 and f2 defined above, we have ∆f1 = ∅ (the empty set) and ∆f2 = {0}. Thus
\[
[0,\infty) \setminus (\Delta_{f_1} \cup \Delta_{f_2}) = [0,\infty) \setminus \{0\} = (0,\infty),
\]
and since f1(t) = 1 = f2(t) for all t ∈ (0,∞), we conclude that f1 = f2 a.e. ∎

Proposition 6.45. The "equal almost everywhere" relation is an equivalence relation.

Proof. Let f, g, h ∈ E. The reflexive property, which stipulates that f = f a.e., clearly holds, as does the symmetric property, which states that f = g a.e. implies g = f a.e. Left to verify is the transitive property.

Suppose f = g a.e. and g = h a.e. Fix t ∈ [0,∞) \ (∆f ∪ ∆h). If t ∉ ∆g, then f, g, and h are all continuous at t, so that f(t) = g(t) and g(t) = h(t), and hence f(t) = h(t). Suppose t ∈ ∆g. Then f and h are still continuous at t, so that lim_{τ→t} f(τ) = f(t) and lim_{τ→t} h(τ) = h(t). Since ∆f, ∆g, and ∆h are all discrete sets, there is an open neighborhood U of t such that f, g, and h are all continuous on U \ {t}, and thus f(τ) = g(τ) = h(τ) for all τ ∈ U \ {t}. Now,
\[
f(t) = \lim_{\tau\to t} f(\tau) = \lim_{\tau\to t} h(\tau) = h(t),
\]
and we conclude that f(t) = h(t) for all t ∈ [0,∞) \ (∆f ∪ ∆h). Therefore f = h a.e., which confirms transitivity. ∎

Since being equal almost everywhere is an equivalence relation, we may meaningfully define the equivalence class of all functions in E that are equal to f ∈ E almost everywhere:
\[
[f] = \{\varphi \in E : f = \varphi \text{ a.e.}\}.
\]
Left as an exercise is to show that if ϕ ∈ [f], so that f = ϕ almost everywhere, then [f] = [ϕ]. It is in this sense that functions that are equal almost everywhere are considered to be the same function: namely, their equivalence classes are the same.

We will show presently that L becomes a one-to-one function if we adopt the view that the symbol L[f] is shorthand for L([f]), and formally regard the domain of L to consist of equivalence classes of functions in E rather than the functions themselves. As a concrete illustration of how this approach works, recall that for the functions f1, f2 ∈ E defined above we found that L[f1] = L[f2]; but because the two functions are equal almost everywhere (they disagree in value only at 0) we have [f1] = [f2], and so one-to-oneness is not violated.

Let [E] denote the set of equivalence classes of functions in E. One additional theorem is required to make our argument that L is one-to-one on [E] complete. We omit the proof.[14]

Theorem 6.46 (Lerch's Theorem). If f, g ∈ E and there exists some σ ∈ ℝ such that L[f](s) = L[g](s) for all s > σ, then f = g a.e.

Because [ϕ] = [f] for any ϕ ∈ [f], there is concern about whether L[ϕ] = L[f] when ϕ ∈ [f] and ϕ ≠ f. We say L is well-defined on [E] if L[ϕ] = L[f] whenever [ϕ] = [f].

The proof of the following theorem depends on the general property that ∫_a^∞ f = ∫_a^∞ g if f(t) = g(t) for all t ≥ a except on a discrete set D ⊆ [a,∞). This is because D ∩ [a, T] is a finite set for any T ≥ a, so that ∫_a^T f = ∫_a^T g using the property cited in §6.2 that includes Equation (6.3), and therefore
\[
\int_a^\infty f = \lim_{T\to\infty}\int_a^T f = \lim_{T\to\infty}\int_a^T g = \int_a^\infty g.
\]

Theorem 6.47. The Laplace transform L is well-defined and one-to-one on [E].

Proof. Let f ∈ E, and suppose ϕ ∈ [f], so that ϕ = f on [0,∞) \ (∆ϕ ∪ ∆f). Since ϕ ∈ E also, there exists some α ∈ ℝ such that f and ϕ are both of exponential order α, and thus L[f](s) and L[ϕ](s) exist for all s > α by Theorem 6.23. Now, f = ϕ a.e. implies that f(t)e^{−st} = ϕ(t)e^{−st} for all t ≥ 0 with t ∉ ∆ϕ ∪ ∆f, and since ∆ϕ ∪ ∆f is a discrete set it follows that
\[
L[f](s) = \int_0^\infty f(t)e^{-st}\,dt = \int_0^\infty \varphi(t)e^{-st}\,dt = L[\varphi](s)
\]
for all s > α. Therefore L is well-defined on [E].

Next fix f, g ∈ E, and let α ∈ ℝ be sufficiently large that both f and g are of exponential order α. Suppose L[f] = L[g] on (α,∞). Then Lerch's Theorem implies that f = g a.e., and hence [f] = [g]. Therefore L is one-to-one on [E]. ∎

Since L : [E] → L([E]) is one-to-one and onto, there exists an inverse transformation L⁻¹ : L([E]) → [E], which is naturally designated the inverse Laplace transform. This inverse works in the usual way, with L[f] = F if and only if L⁻¹[F] = [f] for any [f] ∈ [E]. In practice, however, we don't deal in equivalence classes, and instead write L⁻¹[F] = f, where f ∈ E is the "nicest" member of its class (i.e. the member with the fewest discontinuities).

[14] A proof is given in Appendix II of An Introduction to Linear Analysis by Kreider, Kuller, Ostberg, and Perkins, Addison-Wesley, Reading, Mass., 1966.

It is known from linear algebra that the inverse of a linear transformation is also linear, and so the properties
\[
L^{-1}[F+G] = L^{-1}[F] + L^{-1}[G] \quad\text{and}\quad L^{-1}[cF] = cL^{-1}[F]
\]
both hold for any F, G ∈ L([E]) and constant c.

Example 6.48. We can use Table 1 to find a variety of inverse Laplace transforms. For instance, since L[e^{−3t}](s) = 1/(s + 3), we have
\[
L^{-1}\left[\frac{1}{s+3}\right](t) = e^{-3t}.
\]
With linearity properties, we find that
\[
\begin{aligned}
L^{-1}\left[\frac{s+4}{s^2+4}\right](t) &= L^{-1}\left[\frac{s}{s^2+4}\right](t) + L^{-1}\left[\frac{4}{s^2+4}\right](t) \\
&= L^{-1}\left[\frac{s}{s^2+2^2}\right](t) + 2L^{-1}\left[\frac{2}{s^2+2^2}\right](t) \\
&= \cos 2t + 2\sin 2t
\end{aligned}
\]
using Table 1. ∎
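sympy can recover such table entries directly via its `inverse_laplace_transform` (a library sketch on our part; the notes themselves rely only on Table 1):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

def inv(F):
    # sympy attaches a Heaviside(t) factor in general; taking t > 0
    # (as we do throughout) reduces that factor to 1.
    return sp.inverse_laplace_transform(F, s, t).subs(sp.Heaviside(t), 1)

assert sp.simplify(inv(1/(s + 3)) - sp.exp(-3*t)) == 0
assert sp.simplify(inv(s/(s**2 + 4)) - sp.cos(2*t)) == 0
assert sp.simplify(inv(2/(s**2 + 4)) - sp.sin(2*t)) == 0
```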

Table 1 can in fact be used to find the inverse Laplace transform of a wide variety of rational functions p(s)/q(s). Getting the most mileage out of it requires the partial fraction decomposition procedure introduced in calculus.

Theorem 6.49 (Partial Fraction Decomposition). Let p(s) and q(s) be polynomial functions such that deg(p) < deg(q), and suppose q(s) factors over ℝ as a product of polynomials of degree at most 2. Then one of the following holds.

1. q(s) has form
\[
q(s) = (a_1 s + b_1)(a_2 s + b_2)\cdots(a_n s + b_n)
\]
with a_i s + b_i ≠ a_j s + b_j whenever i ≠ j, so q(s) is a product of distinct linear factors. Then there are constants A_1, ..., A_n such that
\[
\frac{p(s)}{q(s)} = \frac{A_1}{a_1 s + b_1} + \frac{A_2}{a_2 s + b_2} + \cdots + \frac{A_n}{a_n s + b_n}. \tag{6.22}
\]

2. q(s) has form
\[
q(s) = (as+b)^n
\]
for some integer n ≥ 2, so q(s) is a product of repeated linear factors. Then there are constants B_1, ..., B_n such that
\[
\frac{p(s)}{q(s)} = \frac{B_1}{as+b} + \frac{B_2}{(as+b)^2} + \cdots + \frac{B_n}{(as+b)^n}. \tag{6.23}
\]

3. q(s) has form
\[
q(s) = (a_1 s^2 + b_1 s + c_1)\cdots(a_n s^2 + b_n s + c_n),
\]
with b_i^2 − 4a_i c_i < 0 for each i, and a_i s^2 + b_i s + c_i ≠ a_j s^2 + b_j s + c_j if i ≠ j, so q(s) is a product of distinct irreducible quadratic factors. Then there are constants C_1, ..., C_n and D_1, ..., D_n such that
\[
\frac{p(s)}{q(s)} = \frac{C_1 s + D_1}{a_1 s^2 + b_1 s + c_1} + \frac{C_2 s + D_2}{a_2 s^2 + b_2 s + c_2} + \cdots + \frac{C_n s + D_n}{a_n s^2 + b_n s + c_n}. \tag{6.24}
\]

4. q(s) has form
\[
q(s) = (as^2 + bs + c)^n
\]
with b^2 − 4ac < 0 and n ≥ 2, so q(s) is a product of repeated irreducible quadratic factors. Then there are constants C_1, ..., C_n and D_1, ..., D_n such that
\[
\frac{p(s)}{q(s)} = \frac{C_1 s + D_1}{as^2+bs+c} + \frac{C_2 s + D_2}{(as^2+bs+c)^2} + \cdots + \frac{C_n s + D_n}{(as^2+bs+c)^n}. \tag{6.25}
\]

5. More generally, if q(s) is a product of factors of the above kinds, then p(s)/q(s) decomposes as the sum of all the corresponding terms prescribed in Cases (1)–(4).

Example 6.50. Find L⁻¹[F], where
\[
F(s) = \frac{6s^2+5s-3}{s^3+2s^2-3s}.
\]

Solution. Factoring the denominator yields s(s + 3)(s − 1), a product of three distinct linear factors, and so Case (1) of Theorem 6.49 applies here:
\[
F(s) = \frac{6s^2+5s-3}{s(s+3)(s-1)} = \frac{A_1}{s} + \frac{A_2}{s+3} + \frac{A_3}{s-1}.
\]
Multiplying both sides by s(s + 3)(s − 1), we obtain
\[
\begin{aligned}
6s^2+5s-3 &= A_1(s+3)(s-1) + A_2 s(s-1) + A_3 s(s+3) \\
&= (A_1 s^2 + 2A_1 s - 3A_1) + (A_2 s^2 - A_2 s) + (A_3 s^2 + 3A_3 s) \\
&= (A_1+A_2+A_3)s^2 + (2A_1 - A_2 + 3A_3)s - 3A_1.
\end{aligned}
\]
Equating coefficients of s², coefficients of s, and constant terms, we obtain a system of equations,
\[
\begin{cases} A_1 + A_2 + A_3 = 6 \\ 2A_1 - A_2 + 3A_3 = 5 \\ 3A_1 = 3 \end{cases}
\]
From the third equation we obtain A1 = 1. Putting this into the first equation yields 1 + A2 + A3 = 6, and so A2 = 5 − A3. Now from the second equation we have
\[
2(1) - (5 - A_3) + 3A_3 = 5 \;\Rightarrow\; 4A_3 - 3 = 5 \;\Rightarrow\; A_3 = 2,
\]
and thus A2 = 5 − A3 = 3. We now have
\[
F(s) = \frac{1}{s} + \frac{3}{s+3} + \frac{2}{s-1},
\]
and so
\[
\begin{aligned}
L^{-1}[F](t) &= L^{-1}\left[\frac{1}{s} + \frac{3}{s+3} + \frac{2}{s-1}\right](t) \\
&= L^{-1}\left[\frac{1}{s}\right](t) + L^{-1}\left[\frac{3}{s+3}\right](t) + L^{-1}\left[\frac{2}{s-1}\right](t) \\
&= 1 + 3e^{-3t} + 2e^t,
\end{aligned}
\]
using the linearity properties of L⁻¹. ∎
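The hand computation can be double-checked with sympy's `apart`, which performs exactly this partial fraction decomposition (tooling of our choosing, not assumed by the notes):

```python
import sympy as sp

s = sp.symbols('s')

F = (6*s**2 + 5*s - 3) / (s**3 + 2*s**2 - 3*s)
decomp = sp.apart(F, s)

# Matches the decomposition found above: 1/s + 3/(s + 3) + 2/(s - 1).
assert sp.simplify(decomp - (1/s + 3/(s + 3) + 2/(s - 1))) == 0
```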

Example 6.51. Find L⁻¹[F], where
\[
F(s) = \frac{s^2}{(s+1)^3}.
\]

Solution. Here we have a repeated linear factor, and so in accordance with Case (2) of Theorem 6.49 we obtain
\[
\frac{s^2}{(s+1)^3} = \frac{B_1}{s+1} + \frac{B_2}{(s+1)^2} + \frac{B_3}{(s+1)^3}.
\]
Multiplying both sides by (s + 1)³ yields
\[
s^2 = B_1(s+1)^2 + B_2(s+1) + B_3,
\]
whence we obtain
\[
s^2 = B_1 s^2 + (2B_1 + B_2)s + (B_1 + B_2 + B_3).
\]
Equating coefficients of matching powers of s produces the system of equations
\[
\begin{cases} B_1 = 1 \\ 2B_1 + B_2 = 0 \\ B_1 + B_2 + B_3 = 0 \end{cases}
\]
Putting B1 = 1 from the first equation into the second equation gives 2 + B2 = 0, or B2 = −2. Now the third equation becomes 1 − 2 + B3 = 0, or B3 = 1. Hence
\[
\begin{aligned}
L^{-1}[F](t) &= L^{-1}\left[\frac{1}{s+1} - \frac{2}{(s+1)^2} + \frac{1}{(s+1)^3}\right](t) \\
&= L^{-1}\left[\frac{0!}{s+1}\right](t) - 2L^{-1}\left[\frac{1!}{(s+1)^2}\right](t) + \frac{1}{2}L^{-1}\left[\frac{2!}{(s+1)^3}\right](t) \\
&= e^{-t} - 2te^{-t} + \frac{1}{2}t^2 e^{-t},
\end{aligned}
\]
using the linearity properties of L⁻¹. ∎
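Again this is easily confirmed in sympy, this time letting `inverse_laplace_transform` carry out both the decomposition and the table lookup (a sketch, not part of the notes):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

# With t > 0 the Heaviside(t) factor sympy produces evaluates to 1.
y = sp.inverse_laplace_transform(s**2/(s + 1)**3, s, t)

expected = sp.exp(-t) - 2*t*sp.exp(-t) + sp.Rational(1, 2)*t**2*sp.exp(-t)
assert sp.simplify(y - expected) == 0
```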

Example 6.52. Find L⁻¹[F], where
\[
F(s) = \frac{5s^2+3s-2}{s^4+s^3-2s^2}.
\]

Solution. Factoring the denominator gives
\[
F(s) = \frac{5s^2+3s-2}{s^2(s+2)(s-1)},
\]
so s + 2 and s − 1 are distinct linear factors, and s is a repeated factor. According to (5) of Theorem 6.49 we have
\[
\frac{5s^2+3s-2}{s^2(s+2)(s-1)} = \frac{P_1(s)}{(s+2)(s-1)} + \frac{P_2(s)}{s^2} = \left(\frac{A_1}{s+2} + \frac{A_2}{s-1}\right) + \left(\frac{B_1}{s} + \frac{B_2}{s^2}\right),
\]
employing the prescribed decompositions for Cases (1) and (2) of Theorem 6.49. Multiplying the left and right sides of the equation by s²(s + 2)(s − 1) yields
\[
5s^2+3s-2 = A_1 s^2(s-1) + A_2 s^2(s+2) + B_1 s(s+2)(s-1) + B_2(s+2)(s-1),
\]
and thus
\[
5s^2+3s-2 = (A_1+A_2+B_1)s^3 + (-A_1+2A_2+B_1+B_2)s^2 + (-2B_1+B_2)s - 2B_2.
\]
Equating coefficients of matching powers of s produces the system of equations
\[
\begin{cases} A_1 + A_2 + B_1 = 0 \\ -A_1 + 2A_2 + B_1 + B_2 = 5 \\ -2B_1 + B_2 = 3 \\ -2B_2 = -2 \end{cases}
\]
The solution to the system is A1 = −1, A2 = 2, B1 = −1, B2 = 1. Hence,
\[
\begin{aligned}
L^{-1}[F](t) &= L^{-1}\left[-\frac{1}{s+2} + \frac{2}{s-1} - \frac{1}{s} + \frac{1}{s^2}\right](t) \\
&= -L^{-1}\left[\frac{1}{s+2}\right](t) + 2L^{-1}\left[\frac{1}{s-1}\right](t) - L^{-1}\left[\frac{1}{s}\right](t) + L^{-1}\left[\frac{1}{s^2}\right](t) \\
&= -e^{-2t} + 2e^t - 1 + t,
\end{aligned}
\]
using the linearity properties of L⁻¹. ∎
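The 4×4 linear system above can be set up and solved mechanically; a sympy sketch (our own tooling):

```python
import sympy as sp

s, A1, A2, B1, B2 = sp.symbols('s A1 A2 B1 B2')

# Coefficient matching for
# 5s^2 + 3s - 2 = A1 s^2(s-1) + A2 s^2(s+2) + B1 s(s+2)(s-1) + B2 (s+2)(s-1).
lhs = 5*s**2 + 3*s - 2
rhs = A1*s**2*(s - 1) + A2*s**2*(s + 2) + B1*s*(s + 2)*(s - 1) + B2*(s + 2)*(s - 1)
eqs = [sp.Eq(c, 0) for c in sp.Poly(sp.expand(lhs - rhs), s).all_coeffs()]

sol, = sp.linsolve(eqs, (A1, A2, B1, B2))
assert sol == (-1, 2, -1, 1)    # A1, A2, B1, B2 as found by hand
```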


6.6 – The Method of Laplace Transforms

According to Theorem 4.10, an initial-value problem of the form
\[
\sum_{k=0}^{n} a_k y^{(k)}(t) = f(t), \qquad y(0)=\xi_0,\; y'(0)=\xi_1,\;\ldots,\; y^{(n-1)}(0)=\xi_{n-1}, \tag{6.26}
\]
where each coefficient a_k is a constant and a_n ≠ 0, has a unique solution that is valid on an interval I provided the nonhomogeneity f(t) is continuous on I. The interval I is often taken to be open, but this is not absolutely necessary if we are willing to countenance one-sided derivatives at endpoints.

The Method of Laplace Transforms is, generally speaking, a procedure that uses the Laplace transform to determine a solution y(t) to an IVP such as (6.26) that is valid on [0,∞). Whether such a solution, once found, can subsequently be extended to a larger interval of validity such as (−∞,∞) depends on the particular properties of f(t). If an IVP has the form given in Theorem 4.10 for t0 ≠ 0, a change of variables can always be done to obtain the form (6.26), as in Example 6.54. Finally, as demonstrated in Example 6.56, it is possible to solve some initial-value problems in which the coefficients are not all constants.

More specifically, the Method of Laplace Transforms proceeds by taking the Laplace transform of both sides of the differential equation in an IVP, which entails using Theorem 6.33 to take the transform of y, y′, y″, and so on. Of course, the theorem assumes that y and its various derivatives are of exponential order, along with sundry continuity conditions. The method, therefore, is very much an exercise in letting the ends justify the means: assuming y and its relevant derivatives satisfy the hypotheses of Theorem 6.33, we proceed to solve for y using Theorem 6.33, and then check that the expression for y(t) thereby obtained satisfies those hypotheses.

A summary of the method is adequately presented by considering the case of a second-order differential equation. Letting n = 2 in (6.26), and supposing that y(t) is a solution, we have
\[
a_2 y''(t) + a_1 y'(t) + a_0 y(t) = f(t)
\]
for all t ∈ [0,∞). Hence
\[
L[a_2 y''(t) + a_1 y'(t) + a_0 y(t)](s) = L[f(t)](s)
\]
for all s on some s-domain (α,∞). The linearity properties of L given by Proposition 6.28 then yield
\[
a_2 L[y''(t)](s) + a_1 L[y'(t)](s) + a_0 L[y(t)](s) = L[f(t)](s).
\]
Assuming t0 = 0, so that the initial conditions are y(0) = ξ0 and y′(0) = ξ1, and letting Y(s) = L[y(t)](s) and F(s) = L[f(t)](s), by equations (6.15) and (6.16) we obtain
\[
a_2[s^2 Y(s) - sy(0) - y'(0)] + a_1[sY(s) - y(0)] + a_0 Y(s) = F(s),
\]
and thus
\[
Y(s) = \frac{F(s) + (a_2 s + a_1)y(0) + a_2 y'(0)}{a_2 s^2 + a_1 s + a_0}.
\]
Since y(t) = L⁻¹[Y(s)](t), the solution to the IVP is found as
\[
y(t) = L^{-1}\left[\frac{F(s) + (a_2 s + a_1)\xi_0 + a_2 \xi_1}{a_2 s^2 + a_1 s + a_0}\right](t).
\]
We now consider a variety of explicit examples.

Example 6.53. Solve the initial value problem
\[
y'' - 4y' + 5y = 4e^{3t}, \qquad y(0)=2,\quad y'(0)=7.
\]

Solution. Taking the Laplace transform of both sides of the ODE gives
\[
L[y''] - 4L[y'] + 5L[y] = L[4e^{3t}]. \tag{6.27}
\]
Letting Y(s) = L[y](s), we use equations (6.15) and (6.16), and Table 1, to obtain
\[
L[y'](s) = sL[y](s) - y(0) = sY - 2,
\]
\[
L[y''](s) = s^2 L[y](s) - sy(0) - y'(0) = s^2 Y - 2s - 7,
\]
and
\[
L[4e^{3t}](s) = \frac{4}{s-3}.
\]
Putting these results into (6.27) gives
\[
(s^2 Y - 2s - 7) - 4(sY - 2) + 5Y = \frac{4}{s-3},
\]
from which we get
\[
(s^2 - 4s + 5)Y = \frac{4}{s-3} + 2s - 1,
\]
and finally
\[
Y(s) = \frac{2s^2 - 7s + 7}{(s-3)(s^2-4s+5)}.
\]
The next step is to apply partial fraction decomposition: we must determine constants A, B, and C so that
\[
\frac{2s^2-7s+7}{(s-3)(s^2-4s+5)} = \frac{A}{s-3} + \frac{Bs+C}{s^2-4s+5}.
\]
Multiplying both sides by (s − 3)(s² − 4s + 5) yields
\[
A(s^2-4s+5) + (Bs+C)(s-3) = 2s^2-7s+7,
\]
which we can rearrange to obtain
\[
(A+B)s^2 + (-4A-3B+C)s + (5A-3C) = 2s^2-7s+7.
\]
Equating coefficients, we arrive at the system of equations
\[
\begin{cases} A + B = 2 \\ -4A - 3B + C = -7 \\ 5A - 3C = 7 \end{cases}
\]
The solution to the system is A = 2, B = 0, and C = 1. Thus,
\[
L[y](s) = Y(s) = \frac{2s^2-7s+7}{(s-3)(s^2-4s+5)} = \frac{2}{s-3} + \frac{1}{s^2-4s+5},
\]
and so, completing the square in the last denominator,
\[
y(t) = L^{-1}\left[\frac{2}{s-3} + \frac{1}{s^2-4s+5}\right](t) = 2L^{-1}\left[\frac{1}{s-3}\right](t) + L^{-1}\left[\frac{1}{(s-2)^2+1}\right](t).
\]
Using Table 1, then, we at last obtain
\[
y(t) = 2e^{3t} + e^{2t}\sin t
\]
as the solution to the IVP. ∎
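sympy's `dsolve` reproduces the same solution straight from the ODE and initial conditions (a cross-check using tooling of our own choosing, not a step of the method):

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

ode = sp.Eq(y(t).diff(t, 2) - 4*y(t).diff(t) + 5*y(t), 4*sp.exp(3*t))
sol = sp.dsolve(ode, y(t), ics={y(0): 2, y(t).diff(t).subs(t, 0): 7})

# Agrees with the answer found by the Method of Laplace Transforms.
assert sp.simplify(sol.rhs - (2*sp.exp(3*t) + sp.exp(2*t)*sp.sin(t))) == 0
```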

Example 6.54. Solve the initial value problem
\[
y'' + y = t, \qquad y(\pi)=0,\quad y'(\pi)=0.
\]

Solution. Equations (6.15) and (6.16) require initial conditions at t = 0, whereas here we have initial conditions at t = π. However, if we let w(t) = y(t + π), then w′(t) = y′(t + π), w″(t) = y″(t + π), and also
\[
w(0) = y(\pi) = 0 \quad\text{and}\quad w'(0) = y'(\pi) = 0.
\]
In the ODE
\[
y''(t) + y(t) = t
\]
we substitute t + π for t to obtain
\[
y''(t+\pi) + y(t+\pi) = t+\pi,
\]
and so arrive at the IVP
\[
w''(t) + w(t) = t+\pi, \qquad w(0)=0,\quad w'(0)=0.
\]
We solve this IVP by the Method of Laplace Transforms as usual: letting W(s) = L[w(t)](s), we have
\[
[s^2 W(s) - sw(0) - w'(0)] + W(s) = L[t](s) + \pi L[1](s),
\]
which implies that
\[
s^2 W(s) + W(s) = \frac{1}{s^2} + \frac{\pi}{s},
\]
and finally
\[
W(s) = \frac{1}{s^2(s^2+1)} + \frac{\pi}{s(s^2+1)} = \frac{1+\pi s}{s^2(s^2+1)}.
\]
We must find constants A, B, C, and D such that
\[
\frac{1+\pi s}{s^2(s^2+1)} = \frac{A}{s} + \frac{B}{s^2} + \frac{Cs+D}{s^2+1},
\]
or equivalently
\[
1 + \pi s = (A+C)s^3 + (B+D)s^2 + As + B.
\]
Clearly we must have A = π, B = 1, B + D = 0, and A + C = 0. The unique solution is (A, B, C, D) = (π, 1, −π, −1), and so
\[
W(s) = \frac{\pi}{s} + \frac{1}{s^2} - \frac{\pi s + 1}{s^2+1}.
\]
Taking the inverse Laplace transform of both sides yields
\[
w(t) = \pi L^{-1}\left[\frac{1}{s}\right](t) + L^{-1}\left[\frac{1}{s^2}\right](t) - \pi L^{-1}\left[\frac{s}{s^2+1}\right](t) - L^{-1}\left[\frac{1}{s^2+1}\right](t) = \pi + t - \pi\cos(t) - \sin(t),
\]
and hence
\[
y(t+\pi) = \pi + t - \pi\cos(t) - \sin(t).
\]
Substituting t − π for t leads to
\[
y(t) = \pi + (t-\pi) - \pi\cos(t-\pi) - \sin(t-\pi).
\]
We simplify to obtain
\[
y(t) = t + \pi\cos(t) + \sin(t)
\]
as the solution to the original IVP. ∎
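It is worth verifying directly that the answer satisfies the original problem, conditions at t = π included; a short sympy check (our tooling):

```python
import sympy as sp

t = sp.symbols('t')

y = t + sp.pi*sp.cos(t) + sp.sin(t)

assert sp.simplify(sp.diff(y, t, 2) + y - t) == 0   # y'' + y = t
assert y.subs(t, sp.pi) == 0                        # y(pi) = 0
assert sp.diff(y, t).subs(t, sp.pi) == 0            # y'(pi) = 0
```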

The Method of Laplace Transforms applies just as well to solving initial value problems of the form
\[
a_n y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_2 y'' + a_1 y' + a_0 y = f(t), \qquad y(0)=b_0,\;\ldots,\; y^{(n-1)}(0)=b_{n-1}
\]
for n > 2 (or even n = 1).

Example 6.55. Solve the initial value problem
\[
y''' - y'' + y' - y = 0, \qquad y(0)=1,\quad y'(0)=1,\quad y''(0)=3.
\]

Solution. Taking the Laplace transform of both sides of the ODE and using Theorem 6.33 yields
\[
\bigl(s^3 L[y](s) - s^2 y(0) - sy'(0) - y''(0)\bigr) - \bigl(s^2 L[y](s) - sy(0) - y'(0)\bigr) + \bigl(sL[y](s) - y(0)\bigr) - L[y](s) = L[0](s).
\]
Letting Y(s) = L[y(t)](s) and noting that L[0](s) = 0, we use the initial conditions to obtain
\[
[s^3 Y(s) - s^2 - s - 3] - [s^2 Y(s) - s - 1] + [sY(s) - 1] - Y(s) = 0,
\]
and thus
\[
Y(s) = \frac{s^2+3}{s^3-s^2+s-1} = \frac{s^2+3}{(s-1)(s^2+1)}.
\]
The partial fraction decomposition of the rational expression on the right-hand side has the form
\[
\frac{s^2+3}{(s-1)(s^2+1)} = \frac{A}{s-1} + \frac{Bs+C}{s^2+1},
\]
whence
\[
s^2+3 = A(s^2+1) + (Bs+C)(s-1) = (A+B)s^2 + (C-B)s + (A-C).
\]
This gives rise to the system
\[
\begin{cases} A + B = 1 \\ -B + C = 0 \\ A - C = 3 \end{cases}
\]
which has solution (A, B, C) = (2, −1, −1), and so
\[
Y(s) = \frac{2}{s-1} - \frac{s+1}{s^2+1} = \frac{2}{s-1} - \frac{s}{s^2+1} - \frac{1}{s^2+1}.
\]
Finally,
\[
y(t) = 2L^{-1}\left[\frac{1}{s-1}\right](t) - L^{-1}\left[\frac{s}{s^2+1}\right](t) - L^{-1}\left[\frac{1}{s^2+1}\right](t)
\]
leads to
\[
y(t) = 2e^t - \cos t - \sin t
\]
as the solution to the IVP. ∎
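As with the previous examples, the answer can be checked against the third-order IVP symbolically (a sympy sketch, our own tooling):

```python
import sympy as sp

t = sp.symbols('t')

y = 2*sp.exp(t) - sp.cos(t) - sp.sin(t)
d = lambda n: sp.diff(y, t, n)

assert sp.simplify(d(3) - d(2) + d(1) - y) == 0               # the ODE
assert (y.subs(t, 0), d(1).subs(t, 0), d(2).subs(t, 0)) == (1, 1, 3)
```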

The Laplace transform can actually be employed to solve initial value problems that cannot be solved using the Method of Undetermined Coefficients. In particular, there are initial value problems for which the ODE has either a piecewise-defined nonhomogeneity, as illustrated in the next section, or nonconstant coefficients, as illustrated in the next example.

Example 6.56. Solve the initial value problem
\[
ty'' - ty' + y = 2, \qquad y(0)=2,\quad y'(0)=-1.
\]

Solution. Taking the Laplace transform of both sides of the ODE gives
\[
L[ty''] - L[ty'] + L[y] = L[2]. \tag{6.28}
\]
Letting Y(s) = L[y](s), we use equations (6.15) and (6.16) to obtain
\[
L[y'](s) = sL[y](s) - y(0) = sY - 2 =: Z_1
\]
and
\[
L[y''](s) = s^2 L[y](s) - sy(0) - y'(0) = s^2 Y - 2s + 1 =: Z_2.
\]
Now, by Theorem 6.40,
\[
L[ty'](s) = (-1)^1 Z_1' = -(sY-2)' = -sY' - Y
\]
and
\[
L[ty''](s) = (-1)^1 Z_2' = -(s^2 Y - 2s + 1)' = -s^2 Y' - 2sY + 2.
\]
Putting these results into (6.28) yields
\[
(-s^2 Y' - 2sY + 2) - (-sY' - Y) + Y = \frac{2}{s}.
\]
With a little algebra the equation becomes
\[
(s - s^2)Y' + (2 - 2s)Y = \frac{2}{s} - 2,
\]
which is a linear first-order differential equation. Dividing by s − s² puts it into standard form:
\[
Y' + \frac{2}{s}Y = \frac{2}{s^2}.
\]
A suitable integrating factor is given by
\[
\mu(s) = e^{\int (2/s)\,ds} = s^2.
\]
Multiplying the equation by s² yields s²Y′ + 2sY = 2, whence (s²Y)′ = 2 and so
\[
s^2 Y = 2s + c
\]
for arbitrary constant c. We now have
\[
L[y](s) = Y(s) = \frac{2}{s} + \frac{c}{s^2},
\]
and so
\[
y(t) = L^{-1}[Y](t) = 2L^{-1}\left[\frac{1}{s}\right](t) + cL^{-1}\left[\frac{1}{s^2}\right](t) = 2 + ct.
\]
Thus y′(t) = c, and to determine c we must, oddly enough, make use of the initial condition y′(0) = −1 again to obtain c = −1. Therefore
\[
y(t) = 2 - t
\]
is the solution to the IVP. And there was much rejoicing throughout the kingdom. ∎
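The reason the constant c survived is visible symbolically: every function of the form y = 2 + ct satisfies the ODE, so the transform alone cannot pin c down. A sympy sketch (our tooling):

```python
import sympy as sp

t, c = sp.symbols('t c')

# t y'' - t y' + y = 2 holds for y = 2 + c t regardless of c ...
y = 2 + c*t
assert sp.simplify(t*sp.diff(y, t, 2) - t*sp.diff(y, t) + y - 2) == 0

# ... which is why y'(0) = -1 must be invoked a second time to force c = -1.
assert sp.diff(y, t) == c
```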


6.7 – Piecewise-Defined Nonhomogeneities

We now employ the Laplace transform method to solve initial-value problems of the form (6.26) having a nonhomogeneity f(t) that, while necessarily piecewise-defined, nonetheless admits expression in terms of the unit step function u(t). Such a nonhomogeneity may or may not be continuous on [0,∞). If f(t) is not continuous on [0,∞), however, we can no longer rely on Theorem 4.10 to guarantee that a solution to the IVP is unique. Nonetheless the Method of Laplace Transforms may still be employed in an attempt to find a solution.

Proposition 6.57. Let f : [0,∞) → ℝ be a function. If a ≥ 0 and L[f(t)](s) exists for s > α, then
\[
L[f(t-a)u(t-a)](s) = e^{-as}L[f(t)](s) \tag{6.29}
\]
for s > α.

Proof. Suppose that a ≥ 0 and L[f(t)](s) exists for all s > α. Thus, for any real-valued quantity c that is independent of t, we have
\[
\int_0^\infty e^{-st}\cdot cf(t)\,dt = c\int_0^\infty e^{-st}f(t)\,dt
\]
for any s > α, a result we shall make use of presently. By Definition 6.22, and making the substitution τ = t − a, we obtain
\[
\begin{aligned}
L[f(t-a)u(t-a)](s) &= \int_0^\infty e^{-st}f(t-a)u(t-a)\,dt = \lim_{b\to\infty}\int_0^b e^{-st}f(t-a)u(t-a)\,dt \\
&= \lim_{b\to\infty}\int_a^b e^{-st}f(t-a)\,dt = \lim_{b\to\infty}\int_0^{b-a} e^{-s(\tau+a)}f(\tau)\,d\tau \\
&= \lim_{b\to\infty}\int_0^{b-a} e^{-s\tau}\cdot e^{-sa}f(\tau)\,d\tau = \int_0^\infty e^{-s\tau}\cdot e^{-sa}f(\tau)\,d\tau \\
&= \int_0^\infty e^{-st}\cdot e^{-sa}f(t)\,dt = e^{-sa}\int_0^\infty e^{-st}f(t)\,dt \\
&= e^{-as}L[f(t)](s)
\end{aligned}
\]
for any s > α. ∎

In particular, if we suppose that a > 0 and f(t) ≡ 1, then from (6.29) we obtain
\[
L[u(t-a)](s) = e^{-as}L[1](s) = \frac{e^{-as}}{s}
\]
for all s > 0, which may also be obtained easily enough from Definition 6.22.

From the proposition above the following quite similar result obtains, which is needed often in applications.

Corollary 6.58. If a > 0 and g : [a,∞) → ℝ is a function for which L[g(t + a)](s) exists for s > α, then
\[
L[g(t)u(t-a)](s) = e^{-as}L[g(t+a)](s) \tag{6.30}
\]
for s > α.

Proof. Define f : [0,∞) → ℝ by f(t) = g(t + a), so that g(t) = f(t − a) for t ≥ a > 0. Now, for s > α,
\[
L[g(t)u(t-a)](s) = L[f(t-a)u(t-a)](s) = e^{-as}L[f(t)](s) = e^{-as}L[g(t+a)](s),
\]
where the second equality follows from Proposition 6.57. ∎

Example 6.59. Determine the Laplace transform of f(t) = 5t³u(t − 6).

Solution. Here we have g(t)u(t − a) with g(t) = 5t³ and a = 6. Thus
\[
g(t+a) = g(t+6) = 5(t+6)^3,
\]
and using (6.30) in the corollary gives
\[
L[5t^3 u(t-6)](s) = e^{-6s}L[5(t+6)^3](s) = 5e^{-6s}L[t^3 + 18t^2 + 108t + 216](s).
\]
Now we have
\[
\begin{aligned}
L[5t^3 u(t-6)](s) &= 5e^{-6s}\bigl(L[t^3](s) + 18L[t^2](s) + 108L[t](s) + 216L[1](s)\bigr) \\
&= 5e^{-6s}\left(\frac{3!}{s^4} + 18\cdot\frac{2!}{s^3} + 108\cdot\frac{1!}{s^2} + 216\cdot\frac{1}{s}\right) \\
&= \left(\frac{30}{s^4} + \frac{180}{s^3} + \frac{540}{s^2} + \frac{1080}{s}\right)e^{-6s},
\end{aligned}
\]
using linearity and Table 1. ∎
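The corollary's shortcut can be compared against Definition 6.22 directly, since here the defining integral is elementary; a sympy sketch (our tooling):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

# L[5 t^3 u(t-6)](s) computed straight from the definition ...
direct = sp.integrate(5*t**3*sp.exp(-s*t), (t, 6, sp.oo))

# ... and via Corollary 6.58, i.e. e^{-6s} L[5(t+6)^3](s).
via_shift = sp.exp(-6*s)*(30/s**4 + 180/s**3 + 540/s**2 + 1080/s)

assert sp.simplify(direct - via_shift) == 0
```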

Example 6.60. Determine the inverse Laplace transform of
\[
G(s) = \frac{e^{-2s}}{s^2+9}.
\]

Solution. Let f(t) be the function for which L[f(t)](s) = F(s) = 1/(s² + 9). Setting a = 2 in (6.29) gives
\[
L[f(t-2)u(t-2)](s) = e^{-2s}F(s) = G(s). \tag{6.31}
\]
From Table 1 we find that
\[
L\left[\frac{1}{3}\sin 3t\right](s) = \frac{1}{3}\cdot\frac{3}{s^2+9} = \frac{1}{s^2+9} = L[f(t)](s),
\]
so f(t) = (1/3) sin 3t, and from (6.31) we obtain
\[
L\left[\frac{1}{3}\sin(3t-6)\,u(t-2)\right](s) = G(s).
\]
Therefore
\[
L^{-1}\left[\frac{e^{-2s}}{s^2+9}\right](t) = \frac{1}{3}\sin(3t-6)\,u(t-2). \;∎
\]
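sympy's `inverse_laplace_transform` knows the shift rule (6.29) and returns the same step-function answer (a library sketch of ours):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

g = sp.inverse_laplace_transform(sp.exp(-2*s)/(s**2 + 9), s, t)

expected = sp.Rational(1, 3)*sp.sin(3*t - 6)*sp.Heaviside(t - 2)
assert sp.simplify(g - expected) == 0
```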


Example 6.61. Find the Laplace transform of the function f given by
\[
f(t) = \begin{cases} 3t^2, & \text{if } t < -2 \\ 0, & \text{if } -2 \le t < 1 \\ 2t, & \text{if } 1 \le t < 3 \\ t\sin t, & \text{if } t \ge 3 \end{cases}
\]

Solution. In terms of the unit step function we find that
\[
f(t) = 3t^2 - 3t^2 u(t+2) + 2tu(t-1) + (t\sin t - 2t)u(t-3).
\]
The Laplace transform involves only the values of f on [0,∞), and u(t + 2) = 1 for every t ≥ 0, so the first two terms cancel there and contribute nothing to L[f]. (Note that (6.30) cannot be applied to u(t + 2), since the corollary requires a > 0.) Thus
\[
L[f](s) = 2L[tu(t-1)](s) + L[t\sin t\cdot u(t-3)](s) - 2L[tu(t-3)](s),
\]
and now (6.30) yields
\[
L[f](s) = 2e^{-s}L[t+1](s) + e^{-3s}L[(t+3)\sin(t+3)](s) - 2e^{-3s}L[t+3](s).
\]
As for L[(t + 3) sin(t + 3)](s), the trigonometric identity sin(u + v) = sin u cos v + cos u sin v will prove useful, giving
\[
\begin{aligned}
L[(t+3)\sin(t+3)](s) &= L[(t+3)(\sin t\cos 3 + \cos t\sin 3)](s) \\
&= (\cos 3)L[t\sin t](s) + (\sin 3)L[t\cos t](s) + (3\cos 3)L[\sin t](s) + (3\sin 3)L[\cos t](s) \\
&= \frac{2s\cos 3}{(s^2+1)^2} + \frac{(s^2-1)\sin 3}{(s^2+1)^2} + \frac{3\cos 3}{s^2+1} + \frac{3s\sin 3}{s^2+1}.
\end{aligned}
\]
Gathering all our results and using Table 1, we have
\[
L[f](s) = 2e^{-s}\left(\frac{1}{s^2}+\frac{1}{s}\right) + e^{-3s}\left[\frac{2s\cos 3 + (s^2-1)\sin 3}{(s^2+1)^2} + \frac{3\cos 3 + 3s\sin 3}{s^2+1}\right] - 2e^{-3s}\left(\frac{1}{s^2}+\frac{3}{s}\right),
\]
certainly no trivial expression! ∎
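The least obvious ingredient above, L[(t + 3) sin(t + 3)](s), can be confirmed from the defining integral; a sympy sketch (our tooling):

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

# Expand sin(t+3) first so the integrand is a sum of standard pieces.
integrand = sp.expand(sp.expand_trig((t + 3)*sp.sin(t + 3))*sp.exp(-s*t))
direct = sp.integrate(integrand, (t, 0, sp.oo))

formula = (2*s*sp.cos(3) + (s**2 - 1)*sp.sin(3))/(s**2 + 1)**2 \
        + (3*sp.cos(3) + 3*s*sp.sin(3))/(s**2 + 1)

assert sp.simplify(direct - formula) == 0
```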

Example 6.62. Solve the initial value problem y″ − y = f(t), y(0) = 1, y′(0) = 2, where f is given by
\[
f(t) = \begin{cases} 1, & \text{if } 0 \le t < 3 \\ t, & \text{if } t \ge 3 \end{cases}
\]

Solution. We start by expressing f in terms of u:
\[
f(t) = 1 + (-1+t)u(t-3). \tag{6.32}
\]
Now we have y″ − y = 1 − u(t − 3) + tu(t − 3). Taking the Laplace transform of each side, linearity properties yield
\[
L[y''](s) - L[y](s) = L[1](s) - L[u(t-3)](s) + L[tu(t-3)](s).
\]
Now, letting Y(s) = L[y](s), by (6.16) and (6.30) we have
\[
s^2 Y(s) - sy(0) - y'(0) - Y(s) = \frac{1}{s} - \frac{e^{-3s}}{s} + e^{-3s}L[t+3](s).
\]
Using the given initial conditions then leads to
\[
s^2 Y(s) - s - 2 - Y(s) = \frac{1}{s} - \frac{e^{-3s}}{s} + e^{-3s}\left(\frac{1}{s^2}+\frac{3}{s}\right) = \frac{1}{s} + e^{-3s}\left(\frac{1}{s^2}+\frac{2}{s}\right).
\]

So the function Y can be seen to be given by
\[
Y(s) = \frac{1}{s^2-1}\left(s + 2 + \frac{1}{s} + \frac{2s+1}{s^2}e^{-3s}\right) = \frac{s+2}{s^2-1} + \frac{1}{s(s^2-1)} + \frac{2s+1}{s^2(s^2-1)}e^{-3s}.
\]

Partial fraction decomposition on the rightmost expression yields
\[
Y(s) = \left(\frac{3/2}{s-1} - \frac{1/2}{s+1}\right) + \left(-\frac{1}{s} + \frac{1/2}{s-1} + \frac{1/2}{s+1}\right) + \left(-\frac{2}{s} - \frac{1}{s^2} + \frac{3/2}{s-1} + \frac{1/2}{s+1}\right)e^{-3s},
\]
which a little algebra renders as
\[
Y(s) = \frac{2}{s-1} - \frac{1}{s} - \frac{2}{s}e^{-3s} - \frac{1}{s^2}e^{-3s} + \frac{3/2}{s-1}e^{-3s} + \frac{1/2}{s+1}e^{-3s}.
\]

[Figure 14: the graph of y(t), with t = 3 marked on the horizontal axis.]

Hence
\[
\begin{aligned}
y(t) &= 2L^{-1}\left[\frac{1}{s-1}\right](t) - L^{-1}\left[\frac{1}{s}\right](t) - 2L^{-1}\left[\frac{e^{-3s}}{s}\right](t) - L^{-1}\left[\frac{e^{-3s}}{s^2}\right](t) \\
&\qquad + \frac{3}{2}L^{-1}\left[\frac{e^{-3s}}{s-1}\right](t) + \frac{1}{2}L^{-1}\left[\frac{e^{-3s}}{s+1}\right](t).
\end{aligned}
\]
By Table 1 and (6.29), then,
\[
y(t) = 2e^t - 1 - 2u(t-3) - (t-3)u(t-3) + \frac{3}{2}e^{t-3}u(t-3) + \frac{1}{2}e^{3-t}u(t-3).
\]
Therefore
\[
y(t) = 2e^t - 1 + \left(1 - t + \frac{3}{2}e^{t-3} + \frac{1}{2}e^{3-t}\right)u(t-3)
\]
is the solution to the initial value problem. ∎
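A branch-by-branch check of this solution is straightforward in sympy (our tooling, using the two branch formulas that the step function produces for t < 3 and t ≥ 3):

```python
import sympy as sp

t = sp.symbols('t')

y1 = 2*sp.exp(t) - 1                                          # branch for 0 <= t < 3
y2 = (2*sp.exp(t) - t + sp.Rational(3, 2)*sp.exp(t - 3)
      + sp.Rational(1, 2)*sp.exp(3 - t))                      # branch for t >= 3

# Each branch satisfies its side of y'' - y = f(t).
assert sp.simplify(sp.diff(y1, t, 2) - y1 - 1) == 0
assert sp.simplify(sp.diff(y2, t, 2) - y2 - t) == 0

# Initial conditions, and C^1 matching of the branches at t = 3.
assert (y1.subs(t, 0), sp.diff(y1, t).subs(t, 0)) == (1, 2)
assert sp.simplify((y1 - y2).subs(t, 3)) == 0
assert sp.simplify(sp.diff(y1 - y2, t).subs(t, 3)) == 0
```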

The solution to the IVP in Example 6.62 is seen to be
\[
y(t) = \begin{cases} 2e^t - 1, & 0 \le t < 3 \\ 2e^t - t + \frac{3}{2}e^{t-3} + \frac{1}{2}e^{3-t}, & t \ge 3 \end{cases}
\]
Note that the graph of y(t), shown in Figure 14, does not exhibit any manifestly unusual properties at t = 3 or anywhere else! In fact the smooth appearance of the graph at t = 3 should lead us to wonder whether y(t) is differentiable there despite being piecewise-defined. We have

\[
y'_+(3) = \lim_{t\to 3^+} \frac{y(t)-y(3)}{t-3} = \lim_{t\to 3^+} \frac{\left(2e^t - t + \frac{3}{2}e^{t-3} + \frac{1}{2}e^{3-t}\right) - (2e^3 - 1)}{t-3},
\]

which has a 0/0 indeterminate form and so by L’Hopital’s Rule it follows that

\[
y'_+(3) \overset{\mathrm{LR}}{=} \lim_{t\to 3^+}\left(2e^t - 1 + \frac{3}{2}\,e^{t-3} - \frac{1}{2}\,e^{3-t}\right) = 2e^3.
\]

In similar fashion we obtain

\[
y'_-(3) = \lim_{t\to 3^-}\frac{y(t)-y(3)}{t-3} = \lim_{t\to 3^-}\frac{(2e^t-1)-(2e^3-1)}{t-3} \overset{\mathrm{LR}}{=} \lim_{t\to 3^-} 2e^t = 2e^3.
\]

We see that y(t) is differentiable, and thus continuous, at t = 3 with y′(3) = 2e³, and we have

\[
y'(t) = \begin{cases} 2e^t, & 0 \le t < 3 \\[4pt] 2e^t - 1 + \dfrac{3}{2}\,e^{t-3} - \dfrac{1}{2}\,e^{3-t}, & t \ge 3. \end{cases}
\]

(Since Dom(y) = [0,∞), at t = 0 there is strictly speaking a right-hand derivative y′+(0) only.) Moreover

\[
\lim_{t\to 3} y'(t) = 2e^3 = y'(3)
\]

shows that y′(t) is continuous at t = 3 as well. Now we investigate the second derivative y′′(t):

\[
y''_+(3) = \lim_{t\to 3^+}\frac{y'(t)-y'(3)}{t-3} = \lim_{t\to 3^+}\frac{\left(2e^t - 1 + \frac{3}{2}e^{t-3} - \frac{1}{2}e^{3-t}\right) - 2e^3}{t-3}
\]


\[
\overset{\mathrm{LR}}{=} \lim_{t\to 3^+}\left(2e^t + \frac{3}{2}\,e^{t-3} + \frac{1}{2}\,e^{3-t}\right) = 2e^3 + 2,
\]

and

\[
y''_-(3) = \lim_{t\to 3^-}\frac{y'(t)-y'(3)}{t-3} = \lim_{t\to 3^-}\frac{2e^t-2e^3}{t-3} \overset{\mathrm{LR}}{=} \lim_{t\to 3^-} 2e^t = 2e^3.
\]

Since y′′+(3) ≠ y′′−(3) we conclude that y′′(3) does not exist, and so y′(t) is not differentiable at t = 3. Indeed y′′(t) has a jump discontinuity of +2 in value at t = 3, precisely as f(t) on the right-hand side of the ODE does. We have

\[
y''(t) = \begin{cases} 2e^t, & 0 \le t < 3 \\[4pt] 2e^t + \dfrac{3}{2}\,e^{t-3} + \dfrac{1}{2}\,e^{3-t}, & t > 3. \end{cases}
\]

Because 3 ∉ Dom(y′′), must we accept that y(t) is not a solution to the IVP on [0,∞), but “only” on [0, 3) ∪ (3,∞)? There are a few options. One option is the route of the engineer or physicist: t = 3 is merely an instant in time, so we refrain from considering what is happening to the physical system modeled by the IVP during that instant. Another option is to let y′′+(3) stand in for the value of y′′(t) at t = 3, since the result does indeed satisfy the ODE:

\[
y''_+(3) - y(3) = f(3) \;\Rightarrow\; (2e^3+2) - (2e^3-1) = 3 \;\Rightarrow\; 3 = 3.
\]

A third option adopted by some textbooks (which is really the first option writ large) is to use a version of the unit step function u(t) that is not defined at t = 0, so that f(t) given by (6.32) is not defined at t = 3 and we are relieved at the outset of any expectation to come up with a solution to the IVP there.15

15 This third option we do not entertain for reasons mentioned at the beginning of this section.


6.8 – The Convolution Theorem

Given functions f, g ∈ E, and taking fg to be the usual product given by (fg)(t) = f(t)g(t), it is not generally true that L[fg] = L[f]L[g]; however, there is a kind of “product” of the functions f and g, denoted by f ∗ g, for which it is true that L[f ∗ g] = L[f]L[g] at least on some interval (α,∞). The product is known as convolution.

Definition 6.63. Let f, g ∈ E. The convolution of f and g is the function f ∗ g : [0,∞) → R given by

\[
(f*g)(t) = \int_0^t f(t-\tau)\,g(\tau)\,d\tau.
\]

Example 6.64. Letting f(t) = e^{at} and g(t) = e^{bt} for constants a and b, with a ≠ b, we have

\[
(f*g)(t) = \int_0^t f(t-\tau)g(\tau)\,d\tau = \int_0^t e^{a(t-\tau)}e^{b\tau}\,d\tau = e^{at}\int_0^t e^{(b-a)\tau}\,d\tau = e^{at}\left[\frac{e^{(b-a)\tau}}{b-a}\right]_0^t = \frac{e^{at}}{b-a}\left[e^{(b-a)t} - 1\right].
\]

Though something of an abuse of notation, it is common practice to write results such as the above as

\[
e^{at} * e^{bt} = \frac{e^{at}}{b-a}\left[e^{(b-a)t} - 1\right].
\]
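The closed form above is easy to sanity-check numerically. The sketch below (the helper name `convolve` is mine, not the text's) approximates the convolution integral with the midpoint rule and compares it against the formula for a = 2, b = 3, t = 1:

```python
import math

def convolve(f, g, t, n=2000):
    """Midpoint-rule approximation of (f*g)(t) = integral over [0, t] of f(t - tau) g(tau)."""
    h = t / n
    return h * sum(f(t - (i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

a, b, t = 2.0, 3.0, 1.0
numeric = convolve(lambda u: math.exp(a * u), lambda u: math.exp(b * u), t)
exact = math.exp(a * t) / (b - a) * (math.exp((b - a) * t) - 1)  # formula from Example 6.64
assert abs(numeric - exact) < 1e-6
```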

As contrived as this new kind of product may appear, the next theorem furnishes all the justification it needs to warrant serious consideration as a powerful analytical tool.

Theorem 6.65 (Convolution Theorem). If f, g are piecewise continuous on [0,∞) and of exponential order α, then

L[f ∗ g](s) = L[f ](s)L[g](s)

for all s > α.

[Figure 15: the region R in the tτ-plane.]


Proof. Suppose f, g are piecewise continuous on [0,∞) and of exponential order α. Let s > α. Recalling Fubini’s Theorem from calculus as it applies to double integrals, we have

\[
\mathcal{L}[f*g](s) = \int_0^\infty e^{-st}(f*g)(t)\,dt = \int_0^\infty e^{-st}\!\left(\int_0^t f(t-\tau)g(\tau)\,d\tau\right)dt = \int_0^\infty\!\!\int_0^t e^{-st}f(t-\tau)g(\tau)\,d\tau\,dt = \iint_R e^{-st}f(t-\tau)g(\tau)\,dA,
\]

where R is the region in the tτ -plane defined by

R = {(t, τ) : 0 ≤ t <∞ and 0 ≤ τ ≤ t},

illustrated in Figure 15. But we also have

R = {(t, τ) : 0 ≤ τ <∞ and τ ≤ t <∞},

and so applying Fubini’s Theorem again yields

\[
\mathcal{L}[f*g](s) = \iint_R e^{-st}f(t-\tau)g(\tau)\,dA = \int_0^\infty\!\!\int_\tau^\infty e^{-st}f(t-\tau)g(\tau)\,dt\,d\tau = \int_0^\infty g(\tau)\left(\int_\tau^\infty e^{-st}f(t-\tau)\,dt\right)d\tau.
\]

For the inside integral we now make the substitution u = t − τ, so that

\[
\mathcal{L}[f*g](s) = \int_0^\infty g(\tau)\left(\int_0^\infty e^{-s(u+\tau)}f(u)\,du\right)d\tau = \int_0^\infty e^{-s\tau}g(\tau)\left(\int_0^\infty e^{-su}f(u)\,du\right)d\tau = \left(\int_0^\infty e^{-su}f(u)\,du\right)\!\left(\int_0^\infty e^{-s\tau}g(\tau)\,d\tau\right) = \mathcal{L}[f](s)\,\mathcal{L}[g](s),
\]

where both L[f](s) and L[g](s) are known to exist by Theorem 6.23. ∎
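The theorem can be watched in action numerically (a sketch; the example functions and helper name are mine). Take f(t) = t and g(t) = e^t, so that (f∗g)(t) = e^t − t − 1 and L[f](s)L[g](s) = 1/(s²(s−1)); a truncated numerical Laplace transform of f∗g at s = 2 should be close to 1/4:

```python
import math

def laplace(F, s, T=60.0, n=60000):
    """Midpoint-rule approximation of the transform integral over [0, T]; the tail is negligible here."""
    h = T / n
    return h * sum(math.exp(-s * (i + 0.5) * h) * F((i + 0.5) * h) for i in range(n))

conv = lambda t: math.exp(t) - t - 1.0   # (f*g)(t) for f(t) = t, g(t) = e^t
s = 2.0
assert abs(laplace(conv, s) - 1.0 / (s ** 2 * (s - 1.0))) < 1e-4
```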

It is a fact that f ∗ g ∈ E whenever f, g ∈ E, a property of convolution which we shall not verify here. Other pleasant properties of the convolution operation are given in the next proposition, with the second and fourth properties, in particular, lending credence to the practice of referring to convolution as a product.

Proposition 6.66. If f, g, h are piecewise continuous on [0,∞) and of exponential order α, then the following hold on (α,∞).

1. f ∗ g = g ∗ f
2. f ∗ (g + h) = f ∗ g + f ∗ h
3. f ∗ (g ∗ h) = (f ∗ g) ∗ h
4. f ∗ 0 = 0

Proof. Proof of (1). By the Convolution Theorem, for any s > α,

L[f ∗ g](s) = L[f ](s)L[g](s) = L[g](s)L[f ](s) = L[g ∗ f ](s),

and therefore f ∗ g = g ∗ f on (α,∞) since the Laplace transform is one-to-one on E .


Proof of (2). For s > α, noting that g + h ∈ E is of exponential order α by Proposition 6.17, we employ Proposition 6.28 and the Convolution Theorem to obtain

\[
\mathcal{L}[f*(g+h)](s) = \mathcal{L}[f](s)\,\mathcal{L}[g+h](s) = \mathcal{L}[f](s)\big(\mathcal{L}[g](s) + \mathcal{L}[h](s)\big) = \mathcal{L}[f](s)\mathcal{L}[g](s) + \mathcal{L}[f](s)\mathcal{L}[h](s) = \mathcal{L}[f*g](s) + \mathcal{L}[f*h](s) = \big(\mathcal{L}[f*g] + \mathcal{L}[f*h]\big)(s),
\]

and therefore f ∗ (g + h) = f ∗ g + f ∗ h on (α,∞). ∎

Proofs of the other parts of Proposition 6.66 are left as exercises. All the properties could be proven using either Definition 6.63 or the Convolution Theorem.

If we set F (s) = L[f ](s) and G(s) = L[g](s), then

L−1[F (s)G(s)](t) = (f ∗ g)(t)

by the Convolution Theorem. We use this in the following example.

Example 6.67. Use the Convolution Theorem to find the inverse Laplace transform of

\[
H(s) = \frac{s}{(s^2+1)^2}.
\]

Solution. We have

\[
\mathcal{L}^{-1}[H(s)](t) = \mathcal{L}^{-1}\!\left[\frac{s}{(s^2+1)^2}\right]\!(t) = \mathcal{L}^{-1}\!\left[\frac{s}{s^2+1}\cdot\frac{1}{s^2+1}\right]\!(t) = \mathcal{L}^{-1}[F(s)G(s)](t),
\]

where

\[
F(s) = \frac{s}{s^2+1} \quad\text{and}\quad G(s) = \frac{1}{s^2+1}.
\]

Letting f(t) = cos t and g(t) = sin t, we readily see that F(s) = L[f](s) and G(s) = L[g](s), and therefore

L−1[H(s)](t) = L−1[F (s)G(s)](t) = (f ∗ g)(t) = (cos ∗ sin)(t).

That is,

\[
\mathcal{L}^{-1}[H(s)](t) = (\cos * \sin)(t) = \int_0^t \cos(t-\tau)\sin(\tau)\,d\tau,
\]

and so using the trigonometric identity

\[
\sin x \cos y = \frac{\sin(x+y) + \sin(x-y)}{2}
\]

we obtain

\[
\mathcal{L}^{-1}[H(s)](t) = \frac{1}{2}\int_0^t \left[\sin t + \sin(2\tau - t)\right]d\tau = \frac{1}{2}\left[\tau\sin t - \frac{1}{2}\cos(2\tau - t)\right]_0^t = \frac{1}{2}\left[\left(t\sin t - \frac{1}{2}\cos t\right) - \left(0 - \frac{1}{2}\cos(-t)\right)\right] = \frac{t\sin t}{2}.
\]
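A quick numerical cross-check of the result (the helper name is mine, not the text's): the convolution integral evaluated directly should match t sin(t)/2 at any t.

```python
import math

def cos_conv_sin(t, n=4000):
    # midpoint rule for the integral over [0, t] of cos(t - tau) sin(tau)
    h = t / n
    return h * sum(math.cos(t - (i + 0.5) * h) * math.sin((i + 0.5) * h) for i in range(n))

for t in (0.5, 2.0, 5.0):
    assert abs(cos_conv_sin(t) - t * math.sin(t) / 2.0) < 1e-6
```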


Example 6.68. Solve the integral equation

\[
y(t) + \int_0^t (t-\tau)^2\,y(\tau)\,d\tau = t^3 + 3.
\]

Solution. We have

\[
\int_0^t (t-\tau)^2\,y(\tau)\,d\tau = (f*y)(t)
\]

with f(t) = t², so the integral equation may be written as

\[
y(t) + (f*y)(t) = t^3 + 3.
\]

Taking the Laplace transform of both sides of the equation yields, by the Convolution Theorem,

\[
\mathcal{L}[y](s) + \mathcal{L}[f](s)\,\mathcal{L}[y](s) = \mathcal{L}[t^3](s) + \mathcal{L}[3](s),
\]

or

\[
Y(s) + Y(s)\,\mathcal{L}[t^2](s) = \mathcal{L}[t^3](s) + \mathcal{L}[3](s)
\]

if we let Y(s) = L[y](s). Using a table of Laplace transforms yields

\[
Y(s) + Y(s)\cdot\frac{2}{s^3} = \frac{6}{s^4} + \frac{3}{s} \;\;\Rightarrow\;\; Y(s)\left(\frac{2+s^3}{s^3}\right) = \frac{3(2+s^3)}{s^4} \;\;\Rightarrow\;\; Y(s) = \frac{3}{s},
\]

whence

\[
y(t) = \mathcal{L}^{-1}\!\left[\frac{3}{s}\right]\!(t) = 3
\]

obtains as the (unique) solution. ∎
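The solution y(t) = 3 is easy to verify by direct substitution, since the integral of (t − τ)²·3 over [0, t] is exactly t³. A numerical sketch of that check (names mine):

```python
def lhs(t, n=2000):
    """Left side y(t) + integral of (t - tau)^2 y(tau), with y identically 3, by the midpoint rule."""
    h = t / n
    integral = h * sum(3.0 * (t - (i + 0.5) * h) ** 2 for i in range(n))
    return 3.0 + integral

for t in (0.5, 1.0, 2.0):
    assert abs(lhs(t) - (t ** 3 + 3.0)) < 1e-5
```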


6.9 – Impulse Functions and the Dirac Delta

Many physical phenomena are modeled by a differential equation of the form

\[
a_n y^{(n)} + \cdots + a_2 y'' + a_1 y' + a_0 y = f(t),
\]

where the nonhomogeneity f is such that

\[
f(t) = \begin{cases} M, & t_0 - \varepsilon < t < t_0 + \varepsilon \\ 0, & \text{otherwise} \end{cases}
\]

for some large M > 0 and small ε > 0. Such a function is called an impulse function, which typically is constant in value on the short interval (t0 − ε, t0 + ε) where it is nonzero, although it is not a requirement. The total impulse of f, which could represent a force, voltage, or some other physical quantity that varies as a function of time t, is defined to be

\[
I(f) = \int_{-\infty}^{\infty} f(t)\,dt = \int_{t_0-\varepsilon}^{t_0+\varepsilon} f(t)\,dt.
\]

In particular, setting t0 = 0, we may have

\[
f(t) = d_\varepsilon(t) = \begin{cases} \dfrac{1}{2\varepsilon}, & |t| < \varepsilon \\[4pt] 0, & |t| \ge \varepsilon, \end{cases}
\]

in which case

\[
I(d_\varepsilon) = \int_{-\infty}^{\infty} d_\varepsilon(t)\,dt = \int_{-\varepsilon}^{\varepsilon} \frac{1}{2\varepsilon}\,dt = \frac{1}{2\varepsilon}\left[(\varepsilon) - (-\varepsilon)\right] = 1
\]

for any ε > 0. Observe that the smaller ε becomes (i.e. the shorter the time the impulse occurs), the larger 1/(2ε) becomes (i.e. the greater the magnitude of the impulse), with the net effect being a total impulse of 1. The function dε is called a unit impulse function. As ε tends to zero, we find that dε approaches a kind of idealized unit impulse function that occurs “instantaneously” at t = 0 and has “infinite” magnitude. We have

occurs “instantaneously” at t = 0 and has “infinite” magnitude. We have

limε→0+

dε(t) = 0 (6.33)

for all t 6= 0, and also

limε→0+

I(dε) = limε→0+

ˆ ∞−∞

dε(t) dt = limε→0+

ˆ ε

−εdε(t) dt = lim

ε→0+(1) = 1. (6.34)
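The computation I(dε) = 1 can be mirrored numerically: with a midpoint grid confined to (−ε, ε), the Riemann sum returns 1 for every ε, however small (an illustrative sketch; names mine):

```python
def d_eps(t, eps):
    """The unit impulse function d_eps."""
    return 1.0 / (2.0 * eps) if abs(t) < eps else 0.0

for eps in (1.0, 0.1, 1e-3, 1e-6):
    n = 1000
    h = 2.0 * eps / n
    total = h * sum(d_eps(-eps + (i + 0.5) * h, eps) for i in range(n))
    assert abs(total - 1.0) < 1e-9  # total impulse is 1 regardless of eps
```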

Equations (6.33) and (6.34) serve as motivation for the following definition.

Definition 6.69. The Dirac delta is the idealized unit impulse function δ given by δ(t) = 0 for all t ≠ 0, and with the formal property

\[
\int_{-\infty}^{\infty} \delta(t)\,dt = 1. \qquad (6.35)
\]

The Dirac delta is not a function in the conventional sense. No conventional function f can be zero everywhere except at one point, and yet manage to have a nonzero proper or improper Riemann integral of f for some choice of limits a and b. Rigorous justification of the Dirac delta is beyond the scope of this text. For our purposes the Dirac delta is a formal device that enables


us to conveniently—and accurately—model physical systems involving impulse functions. If t0 ≠ 0, then an immediate consequence of Definition 6.69 is that

\[
\delta(t-t_0) = 0, \quad t \ne t_0, \qquad (6.36)
\]

and

\[
\int_{-\infty}^{\infty} \delta(t-t_0)\,dt = 1.
\]

Since

\[
d_\varepsilon(t-t_0) = \begin{cases} \dfrac{1}{2\varepsilon}, & t_0 - \varepsilon < t < t_0 + \varepsilon \\[4pt] 0, & t \le t_0 - \varepsilon \text{ or } t \ge t_0 + \varepsilon, \end{cases}
\]

we see from (6.36) that

\[
\delta(t-t_0) = \lim_{\varepsilon\to 0^+} d_\varepsilon(t-t_0),
\]

which motivates yet another formal definition.

Definition 6.70. For t0 > 0 we define

\[
\mathcal{L}[\delta(t-t_0)](s) = \lim_{\varepsilon\to 0^+} \mathcal{L}[d_\varepsilon(t-t_0)](s).
\]

Theorem 6.71. If t0 > 0, then

\[
\mathcal{L}[\delta(t-t_0)](s) = e^{-st_0}.
\]

Proof. Let t0 > 0. Then there exists ε > 0 sufficiently small that t0 − ε > 0, and so

\[
\mathcal{L}[d_\varepsilon(t-t_0)](s) = \int_0^\infty e^{-st}\,d_\varepsilon(t-t_0)\,dt = \int_{t_0-\varepsilon}^{t_0+\varepsilon} \frac{e^{-st}}{2\varepsilon}\,dt = \frac{1}{2\varepsilon}\left[-\frac{1}{s}\,e^{-st}\right]_{t_0-\varepsilon}^{t_0+\varepsilon} = -\frac{1}{2\varepsilon s}\left[e^{-s(t_0+\varepsilon)} - e^{-s(t_0-\varepsilon)}\right].
\]

Now, by Definition 6.70,

\[
\mathcal{L}[\delta(t-t_0)](s) = \lim_{\varepsilon\to 0^+} \mathcal{L}[d_\varepsilon(t-t_0)](s) = \lim_{\varepsilon\to 0^+} \frac{e^{-st_0}\left(e^{s\varepsilon} - e^{-s\varepsilon}\right)}{2\varepsilon s},
\]

and since the limit at right has indeterminate form 0/0 we may apply L’Hopital’s Rule (differentiating with respect to ε) to obtain

\[
\mathcal{L}[\delta(t-t_0)](s) = \lim_{\varepsilon\to 0^+} \frac{e^{-st_0}}{2}\left(\frac{s e^{s\varepsilon} + s e^{-s\varepsilon}}{s}\right) = \lim_{\varepsilon\to 0^+} \frac{e^{-st_0}}{2}\left(e^{s\varepsilon} + e^{-s\varepsilon}\right) = \frac{e^{-st_0}}{2}\left(e^0 + e^0\right) = e^{-st_0},
\]

as was to be shown. ∎
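The limit in this proof is also visible numerically. Using the closed form for L[dε(t − t0)](s) obtained above, the value approaches e^{−st0} as ε shrinks (a sketch; the parameter values and helper name are mine):

```python
import math

def laplace_d_eps(s, t0, eps):
    """Closed form -(1/(2*eps*s)) * (e^{-s(t0+eps)} - e^{-s(t0-eps)}) from the proof of Theorem 6.71."""
    return -(math.exp(-s * (t0 + eps)) - math.exp(-s * (t0 - eps))) / (2.0 * eps * s)

s, t0 = 1.5, 2.0
target = math.exp(-s * t0)
errs = [abs(laplace_d_eps(s, t0, eps) - target) for eps in (0.1, 0.01, 0.001)]
assert errs[0] > errs[1] > errs[2]  # the error shrinks with eps
assert errs[2] < 1e-5
```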

We can extend the result of Theorem 6.71 to the case when t0 = 0 with a natural definition:

\[
\mathcal{L}[\delta(t)](s) := \lim_{t_0\to 0} e^{-st_0} = 1 \qquad (6.37)
\]

for all s ∈ [0,∞). Generalizing the spirit of Definition 6.70, we have the following.


Definition 6.72. If f is a continuous function, then

\[
\int_{-\infty}^{\infty} f(t)\,\delta(t-t_0)\,dt = \lim_{\varepsilon\to 0^+} \int_{-\infty}^{\infty} f(t)\,d_\varepsilon(t-t_0)\,dt
\]

for any t0 ∈ R.

Theorem 6.73. If f is continuous and t0 ∈ R, then

\[
\int_{-\infty}^{\infty} f(t)\,\delta(t-t_0)\,dt = f(t_0). \qquad (6.38)
\]

Proof. Suppose that f : R → R is continuous and t0 ∈ R. We have

\[
\lim_{\varepsilon\to 0^+} \int_{-\infty}^{\infty} f(t)\,d_\varepsilon(t-t_0)\,dt = \lim_{\varepsilon\to 0^+} \int_{t_0-\varepsilon}^{t_0+\varepsilon} f(t)\cdot\frac{1}{2\varepsilon}\,dt = \lim_{\varepsilon\to 0^+} \frac{1}{2\varepsilon}\int_{t_0-\varepsilon}^{t_0+\varepsilon} f(t)\,dt. \qquad (6.39)
\]

Now, by the Mean Value Theorem for Integrals16 there exists some t*ε ∈ (t0 − ε, t0 + ε), which depends on ε, such that

\[
f(t^*_\varepsilon) = \frac{1}{(t_0+\varepsilon) - (t_0-\varepsilon)}\int_{t_0-\varepsilon}^{t_0+\varepsilon} f(t)\,dt,
\]

and thus

\[
\int_{t_0-\varepsilon}^{t_0+\varepsilon} f(t)\,dt = 2\varepsilon f(t^*_\varepsilon).
\]

Returning to (6.39),

\[
\lim_{\varepsilon\to 0^+} \int_{-\infty}^{\infty} f(t)\,d_\varepsilon(t-t_0)\,dt = \lim_{\varepsilon\to 0^+}\left(\frac{1}{2\varepsilon}\cdot 2\varepsilon f(t^*_\varepsilon)\right) = \lim_{\varepsilon\to 0^+} f(t^*_\varepsilon).
\]

Let α > 0 be arbitrary. Since f is continuous at t0, there exists some β > 0 such that

\[
|t - t_0| < \beta \;\Rightarrow\; |f(t) - f(t_0)| < \alpha.
\]

Suppose that ε > 0 is such that ε < β. Then

\[
t^*_\varepsilon \in (t_0-\varepsilon,\, t_0+\varepsilon) \subseteq (t_0-\beta,\, t_0+\beta),
\]

which is to say |t*ε − t0| < β and so |f(t*ε) − f(t0)| < α.

This shows that

\[
\lim_{\varepsilon\to 0^+} f(t^*_\varepsilon) = f(t_0),
\]

and therefore

\[
\lim_{\varepsilon\to 0^+} \int_{-\infty}^{\infty} f(t)\,d_\varepsilon(t-t_0)\,dt = f(t_0).
\]

Now (6.38) follows by Definition 6.72. ∎
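Numerically, the integral of f(t)dε(t − t0) is just the average of f over (t0 − ε, t0 + ε), and the proof says this average tends to f(t0). A sketch with f = cos and t0 = 2 (choices and helper name mine):

```python
import math

def sift(f, t0, eps, n=4000):
    """Midpoint-rule value of the integral of f(t) d_eps(t - t0): the average of f over (t0-eps, t0+eps)."""
    h = 2.0 * eps / n
    return h * sum(f(t0 - eps + (i + 0.5) * h) for i in range(n)) / (2.0 * eps)

target = math.cos(2.0)
errs = [abs(sift(math.cos, 2.0, eps) - target) for eps in (0.5, 0.05, 0.005)]
assert errs[0] > errs[1] > errs[2]  # the averages close in on f(t0) as eps shrinks
assert errs[2] < 1e-4
```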

Example 6.74. Solve the initial value problem

y′′ + 4y = 2δ(t− π)− δ(t− 2π), y(0) = 0, y′(0) = 0.

16 See §6.1 of [CAL].


Solution. We take the Laplace transform of each side of the ODE, using linearity properties to obtain

L[y′′](s) + 4L[y](s) = 2L[δ(t− π)](s)− L[δ(t− 2π)](s).

Now, letting Y(s) = L[y](s) and using Theorem 6.71, we obtain

\[
[s^2 Y(s) - sy(0) - y'(0)] + 4Y(s) = 2e^{-\pi s} - e^{-2\pi s},
\]

whence

\[
Y(s) = \frac{2e^{-\pi s}}{s^2+4} - \frac{e^{-2\pi s}}{s^2+4},
\]

and so

\[
y(t) = \mathcal{L}^{-1}\!\left[\frac{2e^{-\pi s}}{s^2+4}\right]\!(t) - \mathcal{L}^{-1}\!\left[\frac{e^{-2\pi s}}{s^2+4}\right]\!(t). \qquad (6.40)
\]

If we define h(t) = sin(2t), then

\[
\mathcal{L}[h(t)](s) = \frac{2}{s^2+4},
\]

so by Proposition 6.57

\[
\mathcal{L}[h(t-\pi)u(t-\pi)](s) = e^{-\pi s}\,\mathcal{L}[h(t)](s) = \frac{2e^{-\pi s}}{s^2+4}
\]

and hence

\[
\mathcal{L}^{-1}\!\left[\frac{2e^{-\pi s}}{s^2+4}\right]\!(t) = h(t-\pi)u(t-\pi) = \sin(2t-2\pi)u(t-\pi) = \sin(2t)u(t-\pi).
\]

In similar fashion we obtain

\[
\mathcal{L}^{-1}\!\left[\frac{e^{-2\pi s}}{s^2+4}\right]\!(t) = \frac{1}{2}h(t-2\pi)u(t-2\pi) = \frac{1}{2}\sin(2t-4\pi)u(t-2\pi) = \frac{1}{2}\sin(2t)u(t-2\pi).
\]

Putting these results into (6.40) yields

\[
y(t) = \sin(2t)\left[u(t-\pi) - \frac{1}{2}u(t-2\pi)\right],
\]

[Figure 16: graph of y(t); t-axis marked at π, 2π, 3π.]


or equivalently

\[
y(t) = \begin{cases} 0, & 0 \le t < \pi \\ \sin(2t), & \pi \le t < 2\pi \\ \frac{1}{2}\sin(2t), & t \ge 2\pi. \end{cases}
\]

See Figure 16 for the graph of y(t). ∎
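The impulses leave a clear fingerprint in y′: away from t = π and t = 2π the solution satisfies the homogeneous equation y′′ + 4y = 0, while y′ jumps by 2 at t = π and by −1 at t = 2π, i.e. by the weights of the two delta terms. A numerical sketch of these checks (all names mine):

```python
import math

def y(t):
    u_pi = 1.0 if t >= math.pi else 0.0
    u_2pi = 1.0 if t >= 2.0 * math.pi else 0.0
    return math.sin(2.0 * t) * (u_pi - 0.5 * u_2pi)

def jump_in_yprime(t0, h=1e-6):
    left = (y(t0) - y(t0 - h)) / h
    right = (y(t0 + h) - y(t0)) / h
    return right - left

assert abs(jump_in_yprime(math.pi) - 2.0) < 1e-3        # weight of 2*delta(t - pi)
assert abs(jump_in_yprime(2.0 * math.pi) + 1.0) < 1e-3  # weight of -delta(t - 2*pi)

# between the impulses, y'' + 4y = 0 (central second difference at t = 4)
t, h = 4.0, 1e-3
ypp = (y(t + h) - 2.0 * y(t) + y(t - h)) / h ** 2
assert abs(ypp + 4.0 * y(t)) < 1e-4
```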

Example 6.75. Solve the initial value problem

y′′ + y′ + 2y = 5δ(t− 3), y(0) = 0, y′(0) = 1.

Solution. We take the Laplace transform of each side of the ODE, using linearity properties to obtain

L[y′′](s) + L[y′](s) + 2L[y](s) = 5L[δ(t− 3)](s).

Letting Y(s) = L[y](s) and using Theorem 6.71, we obtain

\[
[s^2 Y(s) - sy(0) - y'(0)] + [sY(s) - y(0)] + 2Y(s) = 5e^{-3s},
\]

whence

\[
[s^2 Y(s) - 1] + sY(s) + 2Y(s) = 5e^{-3s} \;\;\Rightarrow\;\; Y(s) = \frac{1 + 5e^{-3s}}{s^2+s+2}.
\]

Since s² + s + 2 is an irreducible quadratic, we cast it as a sum of squares:

\[
s^2 + s + 2 = \left[s^2 + s + \left(\tfrac{1}{2}\right)^2\right] + 2 - \left(\tfrac{1}{2}\right)^2 = \left(s + \tfrac{1}{2}\right)^2 + \left(\tfrac{\sqrt{7}}{2}\right)^2.
\]

From

\[
Y(s) = \frac{1}{(s+1/2)^2 + 7/4} + \frac{5e^{-3s}}{(s+1/2)^2 + 7/4}
\]

we obtain

\[
y(t) = \mathcal{L}^{-1}\!\left[\frac{1}{(s+1/2)^2 + 7/4}\right]\!(t) + 5\,\mathcal{L}^{-1}\!\left[\frac{e^{-3s}}{(s+1/2)^2 + 7/4}\right]\!(t). \qquad (6.41)
\]

Referring to Table 1, we find that

\[
\mathcal{L}^{-1}\!\left[\frac{1}{(s+1/2)^2 + 7/4}\right]\!(t) = \frac{2}{\sqrt{7}}\,\mathcal{L}^{-1}\!\left[\frac{\sqrt{7}/2}{(s+1/2)^2 + \left(\sqrt{7}/2\right)^2}\right]\!(t) = \frac{2}{\sqrt{7}}\,e^{-t/2}\sin\!\left(\frac{\sqrt{7}\,t}{2}\right).
\]

Now, if we let

\[
h(t) = \frac{2}{\sqrt{7}}\,e^{-t/2}\sin\!\left(\frac{\sqrt{7}\,t}{2}\right),
\]

then by Proposition 6.57

\[
\mathcal{L}[h(t-3)u(t-3)](s) = e^{-3s}\,\mathcal{L}[h(t)](s) = \frac{e^{-3s}}{(s+1/2)^2 + 7/4}
\]

and thus

\[
\mathcal{L}^{-1}\!\left[\frac{e^{-3s}}{(s+1/2)^2 + 7/4}\right]\!(t) = h(t-3)u(t-3) = \frac{2}{\sqrt{7}}\,e^{-(t-3)/2}\sin\!\left(\frac{\sqrt{7}\,(t-3)}{2}\right)u(t-3).
\]


[Figure 17: graph of y(t).]

Putting these results into (6.41) yields

\[
y(t) = \frac{2}{\sqrt{7}}\,e^{-t/2}\sin\!\left(\frac{\sqrt{7}\,t}{2}\right) + \frac{10}{\sqrt{7}}\,e^{-(t-3)/2}\sin\!\left(\frac{\sqrt{7}\,(t-3)}{2}\right)u(t-3),
\]

the graph of which is presented in Figure 17. ∎
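Two quick numerical checks of this solution (a sketch; helper names mine): the initial conditions y(0) = 0 and y′(0) = 1 hold, and y′ jumps by 5 at t = 3, matching the weight of 5δ(t − 3).

```python
import math

R7 = math.sqrt(7.0)

def y(t):
    base = (2.0 / R7) * math.exp(-t / 2.0) * math.sin(R7 * t / 2.0)
    if t >= 3.0:
        base += (10.0 / R7) * math.exp(-(t - 3.0) / 2.0) * math.sin(R7 * (t - 3.0) / 2.0)
    return base

h = 1e-6
assert abs(y(0.0)) < 1e-12                                  # y(0) = 0
assert abs((y(h) - y(-h)) / (2.0 * h) - 1.0) < 1e-4         # y'(0) = 1
left = (y(3.0) - y(3.0 - h)) / h
right = (y(3.0 + h) - y(3.0)) / h
assert abs((right - left) - 5.0) < 1e-3                     # y' jumps by 5 at t = 3
```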

Example 6.76. Solve the initial value problem

y′′ + y = δ(t− 2π) cos(t), y(0) = 0, y′(0) = 1.

Solution. We take the Laplace transform of each side of the ODE, using linearity properties and letting Y(s) = L[y(t)](s) to obtain

[s2Y (s)− sy(0)− y′(0)] + Y (s) = L[δ(t− 2π) cos(t)](s). (6.42)

Since δ(t − 2π) = 0 for all t < 0, we have

\[
\int_0^\infty e^{-st}\,\delta(t-2\pi)\cos(t)\,dt = \int_{-\infty}^{\infty} e^{-st}\,\delta(t-2\pi)\cos(t)\,dt,
\]

and so by Theorem 6.73

\[
\mathcal{L}[\delta(t-2\pi)\cos(t)](s) = \int_{-\infty}^{\infty} e^{-st}\,\delta(t-2\pi)\cos(t)\,dt = e^{-2\pi s}\cos(2\pi) = e^{-2\pi s}.
\]

Equation (6.42) now becomes

\[
[s^2 Y(s) - 1] + Y(s) = e^{-2\pi s},
\]

whence

\[
Y(s) = \frac{1}{s^2+1} + \frac{e^{-2\pi s}}{s^2+1},
\]

and finally

\[
y(t) = \mathcal{L}^{-1}\!\left[\frac{1}{s^2+1}\right]\!(t) + \mathcal{L}^{-1}\!\left[\frac{e^{-2\pi s}}{s^2+1}\right]\!(t).
\]

With Table 1 and Proposition 6.57 we obtain

y(t) = sin(t) + sin(t− 2π)u(t− 2π),


[Figure 18: graph of y(t); t-axis marked at π, 2π, 3π.]

or simply

\[
y(t) = \sin(t)\left[1 + u(t-2\pi)\right]
\]

as the solution to the IVP. ∎
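As in the previous examples, the delta term shows up as a jump in y′: since δ(t − 2π)cos(t) acts like cos(2π)δ(t − 2π) = δ(t − 2π), y′ should jump by 1 at t = 2π. A numerical sketch (names mine):

```python
import math

def y(t):
    return math.sin(t) * (2.0 if t >= 2.0 * math.pi else 1.0)

h, t0 = 1e-6, 2.0 * math.pi
left = (y(t0) - y(t0 - h)) / h
right = (y(t0 + h) - y(t0)) / h
assert abs(left - 1.0) < 1e-3 and abs(right - 2.0) < 1e-3
assert abs((right - left) - 1.0) < 1e-3  # jump equals the impulse weight cos(2*pi) = 1
```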


A Table of Laplace Transforms

    f(t)                          L[f](s)                                  Dom(L[f])

    t sin bt                      2bs/(s² + b²)²                           s > 0
    t cos bt                      (s² − b²)/(s² + b²)²                     s > 0
    e^{at} sin bt                 b/((s − a)² + b²)                        s > a
    e^{at} cos bt                 (s − a)/((s − a)² + b²)                  s > a
    e^{at} tⁿ, n = 0, 1, ...      n!/(s − a)^{n+1}                         s > a
    u(t − a), a ≥ 0               e^{−as}/s                                s > 0
    δ(t − a), a ≥ 0               e^{−as}                                  s > 0
    (f ∗ g)(t)                    L[f(t)](s) · L[g(t)](s)                  s > 0
    1/√t                          √(π/s)                                   s > 0
    √t                            (1/(2s))√(π/s)                           s > 0
    t^{n−1/2}, n = 1, 2, ...      1·3·5···(2n−1)·√π / (2ⁿ s^{n+1/2})       s > 0


7  Series Solutions

7.1 – Taylor Polynomials

Polynomial functions, as we have seen, are well behaved. They are continuous everywhere, and have continuous derivatives of all orders everywhere. It also turns out that, given any continuous function f that has continuous derivatives of all orders, a polynomial function P can be found that approximates f on any arbitrary interval I ⊆ Dom(f) to an arbitrary degree of accuracy. The precise way of going about this is to construct what is called a Taylor polynomial.

Definition 7.1. Let f be a function for which f′(x0), f′′(x0), ..., f⁽ⁿ⁾(x0) ∈ R. For n ∈ N the nth-order Taylor polynomial for f with center x0 is the polynomial function Pn given by

\[
P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x-x_0)^k = f(x_0) + f'(x_0)(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n,
\]

where we define

\[
\frac{f^{(0)}(x_0)}{0!}(x-x_0)^0 = f(x_0)
\]

for all x ∈ R.

More compactly we may write

\[
P_n(x) = \sum_{k=0}^{n} a_k(x-x_0)^k, \quad\text{where}\quad a_k = \frac{f^{(k)}(x_0)}{k!}
\]

is the kth coefficient of the polynomial Pn(x).
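In code, these coefficients translate directly into an evaluable polynomial. The sketch below (function names mine) builds Pn from a list of derivative values at x0 and evaluates it by Horner's rule in powers of (x − x0); it is tried on f(x) = eˣ, whose derivatives at 0 are all 1:

```python
import math
from math import factorial

def taylor_poly(derivs, x0):
    """derivs = [f(x0), f'(x0), ..., f^(n)(x0)]; returns Pn as a callable."""
    coeffs = [d / factorial(k) for k, d in enumerate(derivs)]  # a_k = f^(k)(x0)/k!
    def P(x):
        acc = 0.0
        for c in reversed(coeffs):   # Horner's rule in (x - x0)
            acc = acc * (x - x0) + c
        return acc
    return P

P5 = taylor_poly([1.0] * 6, 0.0)     # 5th-order Taylor polynomial of e^x centered at 0
assert abs(P5(0.5) - math.exp(0.5)) < 1e-4
```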


The remainder Rn associated with the nth-order Taylor polynomial for f is the function given by

\[
R_n(x) = f(x) - P_n(x)
\]

and can be seen to be the error incurred when using Pn to approximate f for any x ∈ Dom(f). The absolute error in approximating f(x) with Pn(x) is given by

\[
|R_n(x)| = |f(x) - P_n(x)|.
\]

The summation notation is too convenient to pass up when working with Taylor polynomials, but using it can give the indeterminate form 0⁰ occasion to arise now and again. Whenever this happens in the present context it is understood that 0⁰ = 1. Summation notation and expanded form will both be employed side-by-side in any situation when it seems timely to remind the reader of this convention.

Clearly Pn(x0) = f(x0), and since for any 1 ≤ k ≤ n the kth derivative of Pn is given by

\[
\begin{aligned}
P_n^{(k)}(x) ={} & f^{(k)}(x_0) + \frac{f^{(k+1)}(x_0)}{(k+1)!}(k+1)(k)(k-1)\cdots(2)(x-x_0)^1 \\
& + \frac{f^{(k+2)}(x_0)}{(k+2)!}(k+2)(k+1)(k)\cdots(3)(x-x_0)^2 + \cdots \\
& + \frac{f^{(n)}(x_0)}{n!}(n)(n-1)(n-2)\cdots(n-(k-1))(x-x_0)^{n-k} \\
={} & f^{(k)}(x_0) + \frac{f^{(k+1)}(x_0)}{1!}(x-x_0) + \frac{f^{(k+2)}(x_0)}{2!}(x-x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{(n-k)!}(x-x_0)^{n-k},
\end{aligned}
\]

it follows that

\[
P_n^{(k)}(x_0) = f^{(k)}(x_0). \qquad (7.1)
\]

We will need this fact in the proof of the following theorem.

Theorem 7.2 (Taylor’s Theorem). Suppose that f : [a, b] → R, f⁽ⁿ⁾ is continuous on [a, b] and differentiable on (a, b) for some n ∈ N, and x0 ∈ [a, b]. Given any x ∈ [a, b] with x ≠ x0, there exists some c between x and x0 such that

\[
f(x) = P_n(x) + \frac{f^{(n+1)}(c)}{(n+1)!}(x-x_0)^{n+1}. \qquad (7.2)
\]

Proof. Fix a ≤ x0 ≤ b, and let x ∈ [a, b] be such that x ≠ x0. Letting t be a variable, we have

\[
P_n(t) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(t-x_0)^k
\]

for a ≤ t ≤ b. If we let

\[
M = \frac{f(x) - P_n(x)}{(x-x_0)^{n+1}},
\]


which is a constant, then to show (7.2) is equivalent to showing f⁽ⁿ⁺¹⁾(c) = M(n+1)! for some c between x and x0.

Define a continuous function g : [a, b] → R by

\[
g(t) = f(t) - P_n(t) - M(t-x_0)^{n+1}.
\]

The continuity of f⁽ⁿ⁾ on [a, b] implies the differentiability of f⁽ⁿ⁻¹⁾ on [a, b], which in turn implies the differentiability of f⁽ⁿ⁻²⁾ on [a, b], and so on. Thus it is clear that f⁽ᵏ⁾ is differentiable on [a, b] for all 0 ≤ k ≤ n−1 (where f⁽⁰⁾ = f). From these observations and the given hypotheses we conclude that g⁽ⁿ⁾ is continuous on [a, b] and differentiable on (a, b), and g⁽ᵏ⁾ is differentiable (and hence continuous) on [a, b] for 0 ≤ k ≤ n−1. In particular g⁽ⁿ⁺¹⁾(t) exists for all a < t < b, and so is defined for all t between x and x0.

What we wish to determine is whether there exists some c between x and x0 for which g⁽ⁿ⁺¹⁾(c) = 0. First note that g(x0) = 0, and also

\[
g(x) = f(x) - P_n(x) - \frac{f(x)-P_n(x)}{(x-x_0)^{n+1}}\cdot(x-x_0)^{n+1} = f(x) - P_n(x) - [f(x) - P_n(x)] = 0.
\]

Moreover, from

\[
g^{(k)}(t) = f^{(k)}(t) - P_n^{(k)}(t) - M(n+1)(n)(n-1)\cdots(n-(k-2))(t-x_0)^{n+1-k} \qquad (7.3)
\]

and equation (7.1) we obtain

\[
g^{(k)}(x_0) = f^{(k)}(x_0) - P_n^{(k)}(x_0) = f^{(k)}(x_0) - f^{(k)}(x_0) = 0
\]

for all 1 ≤ k ≤ n.

Since g(x) = g(x0) = 0, g is continuous on the closed interval with endpoints x and x0, and g is differentiable on the open interval with endpoints x and x0, by the Mean Value Theorem there exists some c1 between x and x0 for which g′(c1) = 0. Next, since g′(c1) = g′(x0) = 0, g′ is continuous on the closed interval with endpoints c1 and x0, and g′ is differentiable on the open interval with endpoints c1 and x0, there exists some c2 between c1 and x0 for which g′′(c2) = 0. Repeating this argument leads to the conclusion that there exists some cn ∈ (a, b), not equal to x0, for which g⁽ⁿ⁾(cn) = 0. Then, since g⁽ⁿ⁾(x0) = 0 also, g⁽ⁿ⁾ is continuous on the closed interval I with endpoints cn and x0, and g⁽ⁿ⁾ is differentiable on Int(I), it must be that there is some c between cn and x0 for which g⁽ⁿ⁺¹⁾(c) = 0. Because each ck lies between x and x0 it is easy to see that c does also.

Now, from (7.3) and the observation that P_n^{(n+1)} = 0,

\[
g^{(n+1)}(t) = f^{(n+1)}(t) - P_n^{(n+1)}(t) - M(n+1)! = f^{(n+1)}(t) - M(n+1)!,
\]

and so g⁽ⁿ⁺¹⁾(c) = 0 implies that

\[
f^{(n+1)}(c) - M(n+1)! = 0.
\]

Therefore

\[
f^{(n+1)}(c) = M(n+1)! = \frac{f(x) - P_n(x)}{(x-x_0)^{n+1}}(n+1)!,
\]

which immediately yields (7.2) as desired. ∎


[Figure 19: graphs of f and the Taylor polynomials P1, P2, P3, P4, P5 near x = 0.]

Proposition 7.3. Suppose that f : [a, b] → R, f⁽ⁿ⁾ is continuous on [a, b] and differentiable on (a, b) for some n ∈ N, a ≤ x0, x ≤ b with x0 ≠ x, and I is the open interval with endpoints x and x0. If Pn is the nth-order Taylor polynomial for f with center x0 and there exists some M ∈ R such that |f⁽ⁿ⁺¹⁾(t)| ≤ M for all t ∈ I, then

\[
|R_n(x)| \le M\,\frac{|x-x_0|^{n+1}}{(n+1)!}.
\]

Proof. Recall that by definition Rn = f − Pn. By Taylor’s Theorem there exists some c between x and x0 such that

\[
R_n(x) = f(x) - P_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-x_0)^{n+1},
\]

and since c ∈ I we obtain

\[
|R_n(x)| = \left|\frac{f^{(n+1)}(c)}{(n+1)!}(x-x_0)^{n+1}\right| = \left|f^{(n+1)}(c)\right|\frac{|x-x_0|^{n+1}}{(n+1)!} \le M\,\frac{|x-x_0|^{n+1}}{(n+1)!},
\]

as was to be shown. ∎

Example 7.4. Find the nth-order Taylor polynomial for

\[
f(x) = \frac{1}{(1+x)^2}
\]

centered at 0 for n = 1, 2, 3, 4, 5.


Solution. First we obtain the needed derivatives for f , along with their values at 0.

\[
\begin{aligned}
f'(x) &= -2(1+x)^{-3} \quad\Rightarrow\quad f'(0) = -2 \\
f''(x) &= 6(1+x)^{-4} \quad\Rightarrow\quad f''(0) = 6 \\
f'''(x) &= -24(1+x)^{-5} \quad\Rightarrow\quad f'''(0) = -24 \\
f^{(4)}(x) &= 120(1+x)^{-6} \quad\Rightarrow\quad f^{(4)}(0) = 120 \\
f^{(5)}(x) &= -720(1+x)^{-7} \quad\Rightarrow\quad f^{(5)}(0) = -720
\end{aligned}
\]

Thus we have

\[
\begin{aligned}
P_1(x) &= f(0) + f'(0)x = 1 - 2x \\
P_2(x) &= f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 = 1 - 2x + 3x^2 \\
P_3(x) &= f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 = 1 - 2x + 3x^2 - 4x^3 \\
P_4(x) &= P_3(x) + \frac{f^{(4)}(0)}{4!}x^4 = 1 - 2x + 3x^2 - 4x^3 + 5x^4 \\
P_5(x) &= P_4(x) + \frac{f^{(5)}(0)}{5!}x^5 = 1 - 2x + 3x^2 - 4x^3 + 5x^4 - 6x^5
\end{aligned}
\]

Figure 19 shows the graphs of these Taylor polynomials. It can be seen that Pn provides a better approximation for f in the neighborhood of 0 as n increases. ∎

Example 7.5. Let f(x) = (1+x)⁻² as in the previous example. Find an upper bound on the absolute error that may be incurred by approximating f(0.1) using the 4th-order Taylor polynomial for f with center 0.

Solution. The absolute error in question is

|R4(0.1)| = |f(0.1)− P4(0.1)|.

From Example 7.4 we found that f⁽⁵⁾(x) = −720(1+x)⁻⁷, which certainly is continuous on [0, 0.1] and differentiable on (0, 0.1). Now, for any 0 < t < 0.1,

\[
\left|f^{(5)}(t)\right| = \frac{720}{|1+t|^7} \le \frac{720}{|1+0|^7} = 720,
\]

and so by Proposition 7.3 we obtain an upper bound on |R4(0.1)|:

\[
|R_4(0.1)| \le (720)\,\frac{|0.1 - 0|^5}{(4+1)!} = \frac{720(0.1)^5}{5!} = 6\times 10^{-5}.
\]

Thus, if we use P4(0.1) to estimate f(0.1), the absolute error will be no greater than 6 × 10⁻⁵. Of course, nothing here prevents us from actually calculating the absolute error in this case. Since

\[
f(0.1) = (1+0.1)^{-2} \approx 0.826446281
\]

and

\[
P_4(0.1) = 1 - 2(0.1) + 3(0.1)^2 - 4(0.1)^3 + 5(0.1)^4 = 0.8265,
\]


it can be seen that the absolute error is about 5.3719 × 10⁻⁵. This is indeed less than the upper bound 6 × 10⁻⁵. ∎
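The arithmetic of Examples 7.4 and 7.5 is easy to replay in code (a sketch; function names mine):

```python
def f(x):
    return (1.0 + x) ** -2

def P4(x):
    return 1.0 - 2.0 * x + 3.0 * x ** 2 - 4.0 * x ** 3 + 5.0 * x ** 4

err = abs(f(0.1) - P4(0.1))
bound = 720.0 * 0.1 ** 5 / 120.0        # M |x - x0|^5 / 5! = 6e-5, as in Proposition 7.3
assert err < bound                      # the actual error respects the bound
assert abs(err - 5.3719e-5) < 1e-8      # the value computed in the text
```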

Example 7.6. Find an upper bound on the absolute error in approximating f(x) = (1+x)⁻² on the interval [−0.3, 0.3] using the 4th-order Taylor polynomial for f with center 0.

Solution. The goal here is to find some number N such that |R4(x)| ≤ N for all x ∈ [−0.3, 0.3]. Example 7.4 gives an expression for f⁽⁵⁾, which is seen to be continuous on [−0.3, 0.3] and differentiable on (−0.3, 0.3), and so Proposition 7.3 can be used with x0 = 0 in order to determine a value for N.

Fix x ∈ [−0.3, 0.3], and let I be the open interval with endpoints 0 and x. For any t ∈ I we have

\[
\left|f^{(5)}(t)\right| = \frac{720}{|1+t|^7} \le \frac{720}{|1+(-0.3)|^7} \approx 8742.7 < 8742.8,
\]

since t ∈ I implies that −0.3 ≤ t ≤ 0.3, and so by Proposition 7.3 and the fact that |x| ≤ 0.3 we obtain

\[
|R_4(x)| \le (8742.8)\,\frac{|x|^5}{5!} \le (8742.8)\,\frac{0.3^5}{5!} \approx 0.1770 < 0.1771.
\]

Now, since x ∈ [−0.3, 0.3] is arbitrary, it follows that |R4(x)| ≤ 0.1771 for all −0.3 ≤ x ≤ 0.3. That is, 0.1771 serves as an upper bound on the absolute error in approximating f on [−0.3, 0.3] using P4.

In the case when x = −0.3 we have f(−0.3) = (1 − 0.3)⁻² ≈ 2.0408 and P4(−0.3) = 2.0185, and so the absolute error is 0.0223—well less than 0.1771. ∎

In the next example we see at last how Taylor polynomials may be used to obtain approximate solutions to initial value problems encountered in the theory of differential equations.

Example 7.7. Determine the first four nonzero terms in the Taylor polynomial approximationof the solution to the initial value problem

y′′ + 2y′ − y2 = t2, y(0) = 1, y′(0) = 1.

Solution. We have

\[
y''(t) = y^2(t) - 2y'(t) + t^2,
\]

and so using the initial conditions we obtain

\[
y''(0) = y^2(0) - 2y'(0) + 0^2 = 1^2 - 2(1) = -1.
\]

Next, y′′′ = 2yy′ − 2y′′ + 2t, so

\[
y'''(0) = 2(1)(1) - 2(-1) + 2(0) = 4.
\]

Finally, from y⁽⁴⁾ = 2yy′′ + 2(y′)² − 2y′′′ + 2 we have

\[
y^{(4)}(0) = 2(1)(-1) + 2(1)^2 - 2(4) + 2 = -6.
\]


The nth-order Taylor polynomial for y with center 0 is

\[
P_n(t) = \sum_{k=0}^{n} \frac{y^{(k)}(0)}{k!}t^k = y(0) + y'(0)t + \frac{y''(0)}{2!}t^2 + \cdots + \frac{y^{(n)}(0)}{n!}t^n,
\]

and so

\[
P_4(t) = y(0) + y'(0)t + \frac{y''(0)}{2!}t^2 + \frac{y'''(0)}{3!}t^3 + \frac{y^{(4)}(0)}{4!}t^4 = 1 + t - \frac{1}{2}t^2 + \frac{2}{3}t^3 - \frac{1}{4}t^4.
\]

That is,

\[
y(t) \approx 1 + t - \frac{1}{2}t^2 + \frac{2}{3}t^3 - \frac{1}{4}t^4
\]

for all t near 0. ∎
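The derivative bootstrapping used above is mechanical enough to script. The sketch below (names mine) recomputes y′′(0), y′′′(0), y⁽⁴⁾(0) from the same recurrences, using exact rational arithmetic, and assembles the coefficients of P4:

```python
from fractions import Fraction as F

y0, y1 = F(1), F(1)                           # y(0), y'(0)
y2 = y0 ** 2 - 2 * y1 + F(0) ** 2             # y'' = y^2 - 2y' + t^2 at t = 0
y3 = 2 * y0 * y1 - 2 * y2 + 2 * F(0)          # y''' = 2yy' - 2y'' + 2t
y4 = 2 * y0 * y2 + 2 * y1 ** 2 - 2 * y3 + 2   # y^(4) = 2yy'' + 2(y')^2 - 2y''' + 2

assert (y2, y3, y4) == (-1, 4, -6)
coeffs = [y0, y1, y2 / 2, y3 / 6, y4 / 24]    # Taylor coefficients of P4
assert coeffs == [1, 1, F(-1, 2), F(2, 3), F(-1, 4)]
```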


7.2 – Power Series

The basic definition of an infinite series, found in any calculus book, is assumed to be familiar to the reader and so is not included here. Rather, we begin with the definition of a special kind of infinite series known as a power series.

Definition 7.8. An infinite series of the form

\[
\sum_{k=0}^{\infty} c_k(x-x_0)^k \qquad (7.4)
\]

is a power series with center x0, and the ck values are the coefficients of the power series.

Just as an infinite series need not have its index k start at 1, the index of a power series as defined here does not need to start at 0. However having the initial value of k be 0 is the most common scenario.

Power series may be used to define functions. That is, we can define a function f by

\[
f(x) = \sum_{k=0}^{\infty} c_k(x-x_0)^k,
\]

with the understanding that the domain of f consists of the set of all x ∈ R for which the series converges. Define

\[
S = \left\{x \in \mathbb{R} : \sum_{k=0}^{\infty} c_k(x-x_0)^k \in \mathbb{R}\right\}
\]

to be the “set of convergence” for the series (7.4). As the next theorem makes clear, given any power series (7.4) the set S can only ever be {x0}, (−∞,∞), or an interval with endpoints x0 − R and x0 + R for some R > 0. Here R is called the radius of convergence of the power series. We define R = 0 if S = {x0}, and R = ∞ if S = (−∞,∞).

Theorem 7.9. A power series Σ ck(x − x0)ᵏ exhibits one of three behaviors:

1. The series converges absolutely for all x ∈ R, so that S = (−∞,∞) and R = ∞.
2. The series converges only for x = x0, so that S = {x0} and R = 0.
3. For some 0 < R < ∞ the series converges absolutely for all x ∈ (x0 − R, x0 + R) and diverges for all x ∈ (−∞, x0 − R) ∪ (x0 + R, ∞).

In part (3) of the theorem notice that nothing is said about whether the power series converges at x = x0 ± R, and that is because nothing can be said in general. If part (3) applies to a particular power series, then the set of convergence S of the series will be an interval of convergence that may be of the form

(x0 − R, x0 + R), [x0 − R, x0 + R), (x0 − R, x0 + R], or [x0 − R, x0 + R].

The endpoints x0 − R and x0 + R will have to be investigated individually to determine whether the series converges or diverges there.

Page 183: Differential Equations - faculty.bucks.edufaculty.bucks.edu/erickson/math250/DifferentialEquations.pdf2 solution would be some real number cwhich, when substituted for xin the equation,


To determine for what values of $x$ a power series converges, the best tools are the Ratio Test and Root Test, stated here without proof.^{17}

Theorem 7.10 (Root Test). Given the series $\sum a_k$, let $\rho = \lim_{k\to\infty} \sqrt[k]{|a_k|}$.

1. If $\rho \in [0,1)$, then $\sum |a_k|$ converges.
2. If $\rho \in (1,\infty]$, then $\sum a_k$ diverges.

Theorem 7.11 (Ratio Test). Given the series $\sum a_k$ for which $a_k = 0$ for at most a finite number of $k$ values, let $\rho = \lim_{k\to\infty} |a_{k+1}/a_k|$.

1. If $\rho \in [0,1)$, then $\sum |a_k|$ converges.
2. If $\rho \in (1,\infty]$, then $\sum a_k$ diverges.

If $\{b_k\}$ is a sequence such that $b_k > 0$ for all $k$, then a series of the form $\sum (-1)^k b_k$ or $\sum (-1)^{k+1} b_k$ is called alternating. Thus, the terms of an alternating series alternate between positive and negative values. An easy example is the series
$$\sum_{k=1}^{\infty} (-1)^{k+1}\frac{1}{k},$$
which is known as the alternating harmonic series.

To determine whether an alternating series converges or diverges, there is the following test, which will soon prove useful.

Theorem 7.12 (Alternating Series Test). If $\{b_k\}$ is such that $0 < b_{k+1} \le b_k$ for all $k$ and $\lim_{k\to\infty} b_k = 0$, then the series
$$\sum (-1)^{k+1} b_k$$
converges.

Example 7.13. Find the interval of convergence of the power series
$$\sum_{k=1}^{\infty} (-1)^{k-1}\frac{x^k}{k^3}, \qquad (7.5)$$
and state the radius of convergence.

Solution. Clearly the series converges when $x = 0$. Assuming $x \ne 0$, we can employ the Ratio Test with $a_k = (-1)^{k-1}x^k/k^3$:
$$\lim_{k\to\infty}\left|\frac{a_{k+1}}{a_k}\right| = \lim_{k\to\infty}\left|\frac{(-1)^k x^{k+1}}{(k+1)^3}\cdot\frac{k^3}{(-1)^{k-1}x^k}\right| = \lim_{k\to\infty}\left|\frac{-k^3 x}{(k+1)^3}\right| = \lim_{k\to\infty}\frac{k^3}{(k+1)^3}|x| = |x|,$$
since
$$\lim_{k\to\infty}\frac{k^3}{(k+1)^3} = 1.$$

^{17}The proofs for these tests, as well as for most of the other results in this section, can be found in Chapters 9 and 10 of my Calculus Notes.


Thus the series converges if $|x| < 1$, or equivalently $-1 < x < 1$. The Ratio Test is inconclusive when $x = -1$ or $x = 1$, so we analyze these endpoints separately.

When $x = -1$ the series becomes
$$\sum_{k=1}^{\infty}\frac{(-1)^{k-1}(-1)^k}{k^3} = \sum_{k=1}^{\infty}\frac{(-1)^{2k-1}}{k^3} = \sum_{k=1}^{\infty}\frac{-1}{k^3}.$$
Recall that $\sum 1/k^3$ is a convergent $p$-series, and thus $\sum 1/k^3 = s$ for some $s \in \mathbb{R}$. Now Proposition 9.12 implies that
$$\sum_{k=1}^{\infty}\frac{-1}{k^3} = -\sum_{k=1}^{\infty}\frac{1}{k^3} = -s,$$
which shows that $\sum -1/k^3$ also converges.

When $x = 1$ the series becomes
$$\sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k^3},$$
which is an alternating series $\sum (-1)^{k-1} b_k$ with $b_k = 1/k^3$. Since $\lim_{k\to\infty} b_k = 0$ and
$$b_{k+1} = \frac{1}{(k+1)^3} < \frac{1}{k^3} = b_k$$
for all $k$, by the Alternating Series Test this series converges.

We conclude that the series (7.5) converges on the interval $[-1,1]$, and the radius of convergence is $R = \frac{1}{2}|1 - (-1)| = 1$. ■
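The conclusion can be spot-checked numerically; this is a sketch, not part of the text, and the function name is ours. Inside $[-1,1]$ the partial sums of (7.5) settle toward a finite value, while just outside they blow up:

```python
def partial_sum(x, n=2000):
    """Partial sum sum_{k=1}^n (-1)^(k-1) x^k / k^3 of the series (7.5)."""
    return sum((-1) ** (k - 1) * x**k / k**3 for k in range(1, n + 1))

at_right = partial_sum(1.0)        # converges (alternating, b_k = 1/k^3)
at_left = partial_sum(-1.0)        # converges to -sum 1/k^3
outside = partial_sum(2.0, n=60)   # terms 2^k/k^3 grow without bound
```

Doubling $n$ barely changes the endpoint sums, while the value at $x = 2$ grows without bound.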

Example 7.14. Find the interval of convergence of the power series
$$\sum_{k=1}^{\infty}\frac{(-1)^k(x+2)^k}{k\cdot 2^k}, \qquad (7.6)$$
and state the radius of convergence.

Solution. Clearly the series converges when $x = -2$. Assuming $x \ne -2$, we can employ the Ratio Test with
$$a_k = \frac{(-1)^k(x+2)^k}{k\cdot 2^k}$$
to obtain
$$\lim_{k\to\infty}\left|\frac{a_{k+1}}{a_k}\right| = \lim_{k\to\infty}\left|\frac{(-1)^{k+1}(x+2)^{k+1}}{(k+1)\cdot 2^{k+1}}\cdot\frac{k\cdot 2^k}{(-1)^k(x+2)^k}\right| = \lim_{k\to\infty}\left|\frac{(-1)(x+2)}{2(k+1)}\cdot\frac{k}{1}\right| = \lim_{k\to\infty}\frac{k}{2k+2}|x+2| = \frac{1}{2}|x+2|.$$
Thus the series converges if $\frac{1}{2}|x+2| < 1$, which implies $|x+2| < 2$ and thus $-4 < x < 0$. The Ratio Test is inconclusive when $x = -4$ or $x = 0$, so we analyze these endpoints separately.


When $x = -4$ the series becomes
$$\sum_{k=1}^{\infty}\frac{(-1)^k(-2)^k}{k\cdot 2^k} = \sum_{k=1}^{\infty}\frac{2^k}{k\cdot 2^k} = \sum_{k=1}^{\infty}\frac{1}{k},$$
which is the harmonic series and therefore diverges.

When $x = 0$ the series becomes
$$\sum_{k=1}^{\infty}\frac{(-1)^k 2^k}{k\cdot 2^k} = \sum_{k=1}^{\infty}(-1)^k\frac{1}{k},$$
which is an alternating series $\sum (-1)^k b_k$ with $b_k = 1/k$. Since $\lim_{k\to\infty} b_k = 0$ and
$$b_{k+1} = \frac{1}{k+1} < \frac{1}{k} = b_k$$
for all $k$, by the Alternating Series Test this series converges.

Therefore the series (7.6) converges on the interval $(-4, 0]$, and the radius of convergence is $R = \frac{1}{2}|0 - (-4)| = 2$. ■
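The differing endpoint behavior is visible numerically; the sketch below is ours, not part of the text. At $x = 0$ the series is the alternating harmonic series with leading sign flipped, whose sum is $-\ln 2$, while at $x = -4$ the partial sums of the harmonic series keep growing:

```python
import math

def series_7_6(x, n=20000):
    """Partial sum of series (7.6), using ((x+2)/2)^k to avoid overflow."""
    r = (x + 2) / 2.0
    return sum((-1) ** k * r**k / k for k in range(1, n + 1))

at_zero = series_7_6(0.0)            # close to -ln 2
at_minus_four = series_7_6(-4.0)     # harmonic partial sum; grows like ln n
```

The alternating endpoint converges slowly (error roughly $1/(2n)$), which is typical when the Alternating Series Test is the only thing guaranteeing convergence.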

Example 7.15. Find the interval of convergence of the power series
$$\sum_{k=2}^{\infty}\frac{x^k}{(\ln k)^k}, \qquad (7.7)$$
and state the radius of convergence.

Solution. In this case it would be easier to use the Root Test with $a_k = x^k/(\ln k)^k$; so, for any $x \in \mathbb{R}$, we obtain
$$\lim_{k\to\infty}\sqrt[k]{|a_k|} = \lim_{k\to\infty}\sqrt[k]{\left|\frac{x^k}{(\ln k)^k}\right|} = \lim_{k\to\infty}\sqrt[k]{\frac{|x|^k}{|\ln k|^k}} = \lim_{k\to\infty}\frac{|x|}{\ln k} = 0,$$
since $\ln k \to \infty$ as $k \to \infty$. Thus, the series (7.7) converges for all real $x$, which implies that the interval of convergence is $(-\infty,\infty)$ and the radius of convergence is $R = \infty$. ■

Theorem 7.16. Suppose the series $\sum_{k=0}^{\infty} c_k(x - x_0)^k$ converges on an interval $I$, and define $f : I \to \mathbb{R}$ by
$$f(x) = \sum_{k=0}^{\infty} c_k(x - x_0)^k.$$

1. $f$ is continuous on $I$.
2. $f$ is differentiable on $\operatorname{Int}(I)$, with
$$f'(x) = \sum_{k=1}^{\infty} k c_k(x - x_0)^{k-1}$$
for all $x \in \operatorname{Int}(I)$.


3. $f$ is integrable on $\operatorname{Int}(I)$, with
$$\int f = \sum_{k=0}^{\infty}\frac{c_k}{k+1}(x - x_0)^{k+1} + c$$
for arbitrary constant $c$.

In Theorem 7.16(2) we could have written
$$f'(x) = \sum_{k=0}^{\infty} k c_k(x - x_0)^{k-1},$$
with the understanding that the term corresponding to $k = 0$, which is $0c_0(x - x_0)^{-1}$, is equal to 0 even if $x = x_0$ and the form $0^{-1}$ results! We will avoid this here, although we will continue to take $0^0$ to be 1 wherever it arises in the power series notation.

The first part of Theorem 7.16 states that, for any $x \in \operatorname{Int}(I)$,
$$\lim_{t\to x}\sum_{k=0}^{\infty} c_k(t - x_0)^k = \lim_{t\to x} f(t) = f(x) = \sum_{k=0}^{\infty} c_k(x - x_0)^k = \sum_{k=0}^{\infty}\lim_{t\to x} c_k(t - x_0)^k, \qquad (7.8)$$
which is to say the limit of a convergent series can be carried out "termwise" so long as the limit operates in the interior of the interval of convergence $I$ of the series. If $x$ is an endpoint of $I$ then the appropriate one-sided limit is executed in (7.8) instead. The second and third parts of the theorem state that a convergent power series can be differentiated and integrated "termwise" on the interior of $I$, meaning the differentiation or integration operator can be brought inside the series:
$$\frac{d}{dx}\sum_{k=0}^{\infty} c_k(x - x_0)^k = \sum_{k=0}^{\infty}\frac{d}{dx}\left[c_k(x - x_0)^k\right] = \sum_{k=0}^{\infty} k c_k(x - x_0)^{k-1} = \sum_{k=1}^{\infty} k c_k(x - x_0)^{k-1},$$
and
$$\int\left[\sum_{k=0}^{\infty} c_k(x - x_0)^k\right]dx = \sum_{k=0}^{\infty}\left[\int c_k(x - x_0)^k\,dx\right] + c = \sum_{k=0}^{\infty}\frac{c_k}{k+1}(x - x_0)^{k+1} + c.$$
Moreover the new series that results from differentiating or integrating the old series will be convergent on $\operatorname{Int}(I)$ as mentioned in the theorem, although nothing can be said in general about the behavior of the new series at the endpoints of $I$.
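Termwise differentiation can be seen concretely with the geometric series $\sum_{k=0}^\infty x^k = 1/(1-x)$ on $(-1,1)$: differentiating term by term gives $\sum_{k=1}^\infty k x^{k-1} = 1/(1-x)^2$ on the same open interval. A numerical sketch (names ours, not from the text):

```python
def geometric_derivative_partial(x, n=200):
    """Partial sum of the termwise derivative sum_{k=1}^n k x^(k-1)."""
    return sum(k * x ** (k - 1) for k in range(1, n + 1))

# On the interior of the interval of convergence this should match
# d/dx [1/(1-x)] = 1/(1-x)^2; at x = 0.5 the exact value is 4.
approx = geometric_derivative_partial(0.5)
exact = 1.0 / (1.0 - 0.5) ** 2
```

At the endpoints $x = \pm 1$ no such agreement is promised, consistent with the closing remark above.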

The following proposition is proven in more advanced texts using Theorem 7.16, among other things. It will be needed in later sections.

Proposition 7.17. If
$$\sum_{k=0}^{\infty} c_k(x - x_0)^k = 0$$
for all $x$ in some open interval, then $c_k = 0$ for all $k \ge 0$.


Definition 7.18. A function $f$ is said to be analytic at $x_0$ if there exist $R > 0$ and coefficients $c_k \in \mathbb{R}$ such that
$$f(x) = \sum_{k=0}^{\infty} c_k(x - x_0)^k \qquad (7.9)$$
for all $x \in (x_0 - R, x_0 + R)$.

If a function $f$ is such that (7.9) holds for all $x$ in an open interval $I$, then $f$ is said to be analytic on $I$.

An immediate consequence of Theorem 7.16 is that a function $f$ for which (7.9) holds for all $x \in (x_0 - R, x_0 + R)$ must have derivatives of all orders at $x_0$. Given a function $f$ whose derivatives can be explicitly determined at $x_0$ by the usual rules of differentiation, the customary way of going about finding a power series expression for $f$ (i.e. one that satisfies (7.9) on some open interval containing $x_0$) is to construct the Taylor series for $f$ with center at $x_0$.

Definition 7.19. Let $f$ be a function that has derivatives of all orders on an open interval $I$ containing $x_0$. Then the power series of the form
$$\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x - x_0)^k$$
is the Taylor series for $f$ centered at $x_0$. A Taylor series centered at 0 is called a Maclaurin series.

Recalling the definition for the $n$th-order Taylor polynomial for $f$ with center $x_0$ given in §7.1, it can be seen that
$$\sum_{k=0}^{\infty}\frac{f^{(k)}(x_0)}{k!}(x - x_0)^k = \lim_{n\to\infty} P_n(x).$$
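The Maclaurin series of $e^x$ illustrates the definition: since $f^{(k)}(x) = e^x$ for every $k$, the coefficients are $f^{(k)}(0)/k! = 1/k!$, and the partial sums (the Taylor polynomials $P_n$) converge to $e^x$ for every real $x$. A short numerical sketch (names ours):

```python
import math

def maclaurin_exp(x, n=20):
    """Partial sum of the Maclaurin series of e^x: sum_{k=0}^n x^k / k!."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# P_n(x) -> e^x as n grows; here the radius of convergence is infinite.
approx = maclaurin_exp(1.0)
```

Twenty terms already agree with `math.exp(1.0)` to roughly machine precision, since the factorial in the denominator dominates quickly.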


7.3 – Series Solutions Near an Ordinary Point

Our objective is to devise a method to express the general solution of an ordinary differential equation in terms of a power series that converges in some open interval $I$ containing a point $x_0$. To start, we will focus first on second-order linear differential equations of the form
$$a_2(x)y'' + a_1(x)y' + a_0(x)y = 0. \qquad (7.10)$$
Dividing (7.10) by $a_2(x)$ puts the equation in the standard form
$$y'' + p(x)y' + q(x)y = 0, \qquad (7.11)$$
where of course
$$p(x) = \frac{a_1(x)}{a_2(x)} \quad \text{and} \quad q(x) = \frac{a_0(x)}{a_2(x)}.$$

Definition 7.20. A point $x_0$ in the interior of $\operatorname{Dom}(p)\cap\operatorname{Dom}(q)$ is called an ordinary point for the ODE (7.11) if $p$ and $q$ are both analytic at $x_0$. Otherwise $x_0$ is called a singular point.

The theory that underlies our stated objective is best expressed in the following theorem.

Theorem 7.21. Let $p$ and $q$ in (7.11) be analytic at an ordinary point $x_0$, so that there exist $p_k, q_k \in \mathbb{R}$ and $R_1, R_2 > 0$ such that $p(x) = \sum p_k(x - x_0)^k$ on $(x_0 - R_1, x_0 + R_1)$, and $q(x) = \sum q_k(x - x_0)^k$ on $(x_0 - R_2, x_0 + R_2)$. If $R = \min\{R_1, R_2\}$, then the IVP
$$y'' + p(x)y' + q(x)y = 0, \qquad y(x_0) = b_0, \quad y'(x_0) = b_1 \qquad (7.12)$$
has a unique solution that is analytic on $(x_0 - R, x_0 + R)$.

According to the developments in Section 4.7, the general solution to an equation of the form (7.11) is a two-parameter family of functions of the form $d_1 y_1(x) + d_2 y_2(x)$, where $y_1$ and $y_2$ are two particular linearly independent solutions to (7.11), and $d_1$ and $d_2$ are arbitrary constants.

According to Theorem 7.21 the IVP (7.12) has a solution $y(x)$ expressible as a power series on $I = (x_0 - R, x_0 + R)$, which is to say there exist $c_k \in \mathbb{R}$ such that
$$y(x) = \sum_{k=0}^{\infty} c_k(x - x_0)^k = c_0 + c_1(x - x_0) + c_2(x - x_0)^2 + \cdots \qquad (7.13)$$
for all $x \in I$, and the initial conditions $y(x_0) = b_0$ and $y'(x_0) = b_1$ are satisfied. Indeed from (7.13) we find that $c_0 = y(x_0) = b_0$, and from
$$y'(x) = \sum_{k=1}^{\infty} k c_k(x - x_0)^{k-1} = c_1 + 2c_2(x - x_0) + 3c_3(x - x_0)^2 + \cdots$$
we find that $c_1 = y'(x_0) = b_1$.

Thus if the initial conditions in (7.12) were not given, then we would find that $c_0$ and $c_1$ would be left as arbitrary constants and so (7.13) would in fact be a two-parameter family of


solutions. Reconciling the developments of Section 4.7 with the consequences of Theorem 7.21, we conclude that the general solution to (7.11) is
$$y(x) = \sum_{k=0}^{\infty} c_k(x - x_0)^k = d_1 y_1(x) + d_2 y_2(x),$$
where again $c_0, c_1, d_1, d_2$ are arbitrary constants. What we have done is concoct two different ways of expressing the general solution to an ODE of the form (7.11). The following example should help illustrate how this is all put into practice.

Example 7.22. Find the general solution to
$$(x^2 - 4)y'' + 3xy' + y = 0$$
in the form of a power series about 0.

Solution. Since $x = 0$ is an ordinary point for the ODE, we expect to find a general solution of the form
$$y(x) = \sum_{k=0}^{\infty} c_k x^k, \qquad (7.14)$$
with the power series converging on some open interval $I$ containing 0. The task is to determine all coefficients $c_k$ that can be determined. By the discussion above we expect that $c_0$ and $c_1$ will be left as arbitrary, and all $c_k$ for $k \ge 2$ will not. Now, substituting (7.14),
$$y'(x) = \sum_{k=1}^{\infty} k c_k x^{k-1}, \quad \text{and} \quad y''(x) = \sum_{k=2}^{\infty} k(k-1)c_k x^{k-2}$$
into the ODE yields
$$(x^2 - 4)\sum_{k=2}^{\infty} k(k-1)c_k x^{k-2} + 3x\sum_{k=1}^{\infty} k c_k x^{k-1} + \sum_{k=0}^{\infty} c_k x^k = 0.$$
Assuming that $x \in I$, this equation becomes
$$\sum_{k=2}^{\infty} k(k-1)c_k x^k - 4\sum_{k=2}^{\infty} k(k-1)c_k x^{k-2} + 3\sum_{k=1}^{\infty} k c_k x^k + \sum_{k=0}^{\infty} c_k x^k = 0. \qquad (7.15)$$
Now we will shift indexes so that all the power series in (7.15) begin at $k = 0$. This can be done effortlessly for the first and third series, since it merely results in adding terms that equal 0. As for the second series, we need only replace $k$ with $k + 2$ to obtain
$$\sum_{k+2=2}^{\infty} (k+2)[(k+2)-1]c_{k+2} x^{(k+2)-2},$$
which becomes
$$\sum_{k=0}^{\infty} (k+1)(k+2)c_{k+2} x^k.$$


Putting this into (7.15) gives
$$\sum_{k=0}^{\infty} k(k-1)c_k x^k - 4\sum_{k=0}^{\infty} (k+1)(k+2)c_{k+2} x^k + 3\sum_{k=0}^{\infty} k c_k x^k + \sum_{k=0}^{\infty} c_k x^k = 0,$$
whence we obtain
$$\sum_{k=0}^{\infty}\left[k(k-1)c_k - 4(k+1)(k+2)c_{k+2} + 3k c_k + c_k\right]x^k = 0,$$
and finally
$$\sum_{k=0}^{\infty}\left[(k+1)^2 c_k - 4(k+1)(k+2)c_{k+2}\right]x^k = 0$$
for all $x \in I$. Therefore by Proposition 7.17 we have
$$(k+1)^2 c_k - 4(k+1)(k+2)c_{k+2} = 0$$
for all $k \ge 0$. Solving for $c_{k+2}$ gives
$$c_{k+2} = \frac{k+1}{4(k+2)}c_k,$$

a recurrence relation which will now be used to obtain an explicit expression for $c_k$. We have
$$c_2 = \frac{0+1}{4(0+2)}c_0 = \frac{1}{4\cdot 2}c_0,$$
$$c_3 = \frac{1+1}{4(1+2)}c_1 = \frac{2}{4(1\cdot 3)}c_1,$$
$$c_4 = \frac{2+1}{4(2+2)}c_2 = \frac{3}{4\cdot 4}\cdot\frac{1}{4\cdot 2}c_0 = \frac{1\cdot 3}{4^2(2\cdot 4)}c_0,$$
$$c_5 = \frac{3+1}{4(3+2)}c_3 = \frac{4}{4\cdot 5}\cdot\frac{2}{3\cdot 4}c_1 = \frac{2\cdot 4}{4^2(1\cdot 3\cdot 5)}c_1,$$
$$c_6 = \frac{4+1}{4(4+2)}c_4 = \frac{5}{4\cdot 6}\cdot\frac{3}{4^2(2\cdot 4)}c_0 = \frac{1\cdot 3\cdot 5}{4^3(2\cdot 4\cdot 6)}c_0,$$
$$c_7 = \frac{5+1}{4(5+2)}c_5 = \frac{6}{4\cdot 7}\cdot\frac{2\cdot 4}{4^2(3\cdot 5)}c_1 = \frac{2\cdot 4\cdot 6}{4^3(1\cdot 3\cdot 5\cdot 7)}c_1,$$
$$c_8 = \frac{6+1}{4(6+2)}c_6 = \frac{7}{4\cdot 8}\cdot\frac{3\cdot 5}{4^3(2\cdot 4\cdot 6)}c_0 = \frac{1\cdot 3\cdot 5\cdot 7}{4^4(2\cdot 4\cdot 6\cdot 8)}c_0.$$
Two patterns are emerging here: one for $c_k$ when $k$ is even, and another for $c_k$ when $k$ is odd. For $n$ any whole number we have
$$c_{2n} = \frac{3\cdot 5\cdot 7\cdots(2n-1)}{4^n[2\cdot 4\cdot 6\cdots(2n)]}c_0 = \frac{3\cdot 5\cdot 7\cdots(2n-1)}{4^n(2^n n!)}c_0 = \frac{(2n)!}{4^n(2^n n!)^2}c_0 = \frac{(2n)!}{4^{2n}(n!)^2}c_0$$
and
$$c_{2n+1} = \frac{2\cdot 4\cdot 6\cdots(2n)}{4^n[3\cdot 5\cdot 7\cdots(2n+1)]}c_1 = \frac{[2\cdot 4\cdot 6\cdots(2n)]^2}{4^n(2n+1)!}c_1 = \frac{(2^n n!)^2}{4^n(2n+1)!}c_1 = \frac{(n!)^2}{(2n+1)!}c_1.$$


These two results can be rigorously proven by a routine use of the principle of induction, but we shall refrain in the interests of brevity. What is evident, however, is that all $c_k$ are expressible in terms of either $c_0$ or $c_1$, which are arbitrary. So, as a general solution to the ODE, we have $y(x) = \sum_{k=0}^{\infty} c_k x^k$ with
$$c_k = \begin{cases} \dfrac{k!}{4^k[(k/2)!]^2}c_0, & \text{if } k \text{ is even}, \\[2ex] \dfrac{[((k-1)/2)!]^2}{k!}c_1, & \text{if } k \text{ is odd}. \end{cases}$$

There is another way to express the ODE's solution which is nicer. If we set $c_0 = 0$ and $c_1 = 1$, then $c_k = 0$ if $k$ is even and
$$c_k = \frac{[((k-1)/2)!]^2}{k!}$$
if $k$ is odd, and so only the odd-indexed terms in the series (7.14) are nonzero. Now, $k$ odd means here that $k = 2n+1$ for some $n = 0, 1, 2, \ldots$, and so (7.14) can be written as
$$y_1(x) = \sum_{n=0}^{\infty} c_{2n+1} x^{2n+1} = \sum_{n=0}^{\infty}\frac{(n!)^2}{(2n+1)!}x^{2n+1},$$
which is in fact a particular solution to the ODE.

If we set $c_0 = 1$ and $c_1 = 0$, then $c_k = 0$ if $k$ is odd and
$$c_k = \frac{k!}{4^k[(k/2)!]^2}$$
if $k$ is even, and so only the even-indexed terms in (7.14) are nonzero. Now, $k$ even implies that $k = 2n$ for $n = 0, 1, 2, \ldots$, and so (7.14) can be written as
$$y_2(x) = \sum_{n=0}^{\infty} c_{2n} x^{2n} = \sum_{n=0}^{\infty}\frac{(2n)!}{4^{2n}(n!)^2}x^{2n},$$
which is another particular solution to the ODE.

The particular solutions $y_1(x)$ and $y_2(x)$ are linearly independent, and so the general solution to the ODE can be expressed as
$$y(x) = d_1 y_1(x) + d_2 y_2(x) = d_1\sum_{n=0}^{\infty}\frac{(n!)^2}{(2n+1)!}x^{2n+1} + d_2\sum_{n=0}^{\infty}\frac{(2n)!}{4^{2n}(n!)^2}x^{2n},$$
where $d_1$ and $d_2$ are arbitrary constants. ■
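As a sanity check, the closed forms for $c_{2n}$ and $c_{2n+1}$ should reproduce the recurrence $c_{k+2} = \frac{k+1}{4(k+2)}c_k$. A short sketch comparing the two (our code, not part of the text), taking $c_0 = c_1 = 1$:

```python
import math

def coeff_closed(k):
    """Closed-form coefficient from Example 7.22 with c0 = c1 = 1."""
    if k % 2 == 0:
        n = k // 2
        return math.factorial(2 * n) / (4 ** (2 * n) * math.factorial(n) ** 2)
    n = (k - 1) // 2
    return math.factorial(n) ** 2 / math.factorial(2 * n + 1)

# Generate the same coefficients directly from the recurrence.
c = [1.0, 1.0]
for k in range(18):
    c.append((k + 1) / (4 * (k + 2)) * c[k])
```

Both lists agree term by term, which is exactly what the induction argument alluded to above would establish.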

Example 7.23. Find the general solution to
$$y'' - x^2 y' - xy = 0$$
in the form of a power series about $x_0 = 0$.

Solution. Since 0 is an ordinary point for the ODE we expect to find a general solution of the form
$$y(x) = \sum_{k=0}^{\infty} c_k x^k,$$


with the series converging on some open interval $I$ containing 0. Substituting this into the ODE yields
$$\sum_{k=2}^{\infty} k(k-1)c_k x^{k-2} - x^2\sum_{k=1}^{\infty} k c_k x^{k-1} - x\sum_{k=0}^{\infty} c_k x^k = 0,$$
and thus
$$\sum_{k=2}^{\infty} k(k-1)c_k x^{k-2} - \sum_{k=1}^{\infty} k c_k x^{k+1} - \sum_{k=0}^{\infty} c_k x^{k+1} = 0.$$
Reindexing so that all series feature $x^k$, we have
$$\sum_{k=0}^{\infty} (k+1)(k+2)c_{k+2} x^k - \sum_{k=2}^{\infty} (k-1)c_{k-1} x^k - \sum_{k=1}^{\infty} c_{k-1} x^k = 0.$$
Finally we contrive to have the index of each series start at 2 by removing the first two terms of the leftmost series and the first term of the rightmost series:
$$\left[2c_2 + 6c_3 x + \sum_{k=2}^{\infty} (k+1)(k+2)c_{k+2} x^k\right] - \sum_{k=2}^{\infty} (k-1)c_{k-1} x^k - \left[c_0 x + \sum_{k=2}^{\infty} c_{k-1} x^k\right] = 0.$$
Hence
$$2c_2 + (6c_3 - c_0)x + \sum_{k=2}^{\infty}\left[(k+1)(k+2)c_{k+2} - (k-1)c_{k-1} - c_{k-1}\right]x^k = 0,$$
which simplifies to become
$$2c_2 + (6c_3 - c_0)x + \sum_{k=2}^{\infty}\left[(k+1)(k+2)c_{k+2} - k c_{k-1}\right]x^k = 0.$$
Now, Proposition 7.17 implies that $2c_2 = 0$, $6c_3 - c_0 = 0$, and
$$(k+1)(k+2)c_{k+2} - k c_{k-1} = 0$$
for all $k \ge 2$. That is, $c_2 = 0$, $c_3 = c_0/6 = c_0/(2\cdot 3)$, and
$$c_{k+2} = \frac{k}{(k+1)(k+2)}c_{k-1}$$
for $k \ge 2$. The recursion relation enables us to express all $c_k$ exclusively in terms of $c_0$ and $c_1$:

$$c_4 = \frac{2}{3\cdot 4}c_1, \qquad c_5 = \frac{3}{4\cdot 5}c_2 = 0,$$
$$c_6 = \frac{4}{5\cdot 6}c_3 = \frac{4}{2\cdot 3\cdot 5\cdot 6}c_0, \qquad c_7 = \frac{5}{6\cdot 7}c_4 = \frac{2\cdot 5}{3\cdot 4\cdot 6\cdot 7}c_1,$$
$$c_8 = \frac{6}{7\cdot 8}c_5 = 0, \qquad c_9 = \frac{7}{8\cdot 9}c_6 = \frac{4\cdot 7}{2\cdot 3\cdot 5\cdot 6\cdot 8\cdot 9}c_0,$$
$$c_{10} = \frac{8}{9\cdot 10}c_7 = \frac{2\cdot 5\cdot 8}{3\cdot 4\cdot 6\cdot 7\cdot 9\cdot 10}c_1, \qquad c_{11} = \frac{9}{10\cdot 11}c_8 = 0,$$
$$c_{12} = \frac{10}{11\cdot 12}c_9 = \frac{4\cdot 7\cdot 10}{2\cdot 3\cdot 5\cdot 6\cdot 8\cdot 9\cdot 11\cdot 12}c_0.$$
So we have


$$y(x) = c_0 + c_1 x + \frac{c_0}{2\cdot 3}x^3 + \frac{2c_1}{3\cdot 4}x^4 + \frac{4c_0}{2\cdot 3\cdot 5\cdot 6}x^6 + \frac{2\cdot 5\,c_1}{3\cdot 4\cdot 6\cdot 7}x^7 + \frac{4\cdot 7\,c_0}{2\cdot 3\cdot 5\cdot 6\cdot 8\cdot 9}x^9 + \frac{2\cdot 5\cdot 8\,c_1}{3\cdot 4\cdot 6\cdot 7\cdot 9\cdot 10}x^{10} + \frac{4\cdot 7\cdot 10\,c_0}{2\cdot 3\cdot 5\cdot 6\cdot 8\cdot 9\cdot 11\cdot 12}x^{12} + \cdots$$

Setting $c_0 = 0$ and $c_1 = 1$ yields the particular solution
$$y_1(x) = x + \frac{2}{3\cdot 4}x^4 + \frac{2\cdot 5}{3\cdot 4\cdot 6\cdot 7}x^7 + \frac{2\cdot 5\cdot 8}{3\cdot 4\cdot 6\cdot 7\cdot 9\cdot 10}x^{10} + \cdots = x + \frac{2^2}{4!}x^4 + \frac{2^2\cdot 5^2}{7!}x^7 + \frac{2^2\cdot 5^2\cdot 8^2}{10!}x^{10} + \cdots = x + \sum_{k=1}^{\infty}\frac{2^2\cdot 5^2\cdots(3k-1)^2}{(3k+1)!}x^{3k+1},$$
and setting $c_0 = 1$ and $c_1 = 0$ yields the particular solution
$$y_2(x) = 1 + \frac{1}{2\cdot 3}x^3 + \frac{4}{2\cdot 3\cdot 5\cdot 6}x^6 + \frac{4\cdot 7}{2\cdot 3\cdot 5\cdot 6\cdot 8\cdot 9}x^9 + \frac{4\cdot 7\cdot 10}{2\cdot 3\cdot 5\cdot 6\cdot 8\cdot 9\cdot 11\cdot 12}x^{12} + \cdots = 1 + \frac{1}{3!}x^3 + \frac{4^2}{6!}x^6 + \frac{4^2\cdot 7^2}{9!}x^9 + \frac{4^2\cdot 7^2\cdot 10^2}{12!}x^{12} + \cdots = 1 + \sum_{k=1}^{\infty}\frac{4^2\cdot 7^2\cdots(3k-2)^2}{(3k)!}x^{3k}.$$
Since $y_1(x)$ and $y_2(x)$ are linearly independent, the general solution to the ODE may be expressed as
$$y(x) = d_1\left[x + \sum_{k=1}^{\infty}\frac{2^2\cdot 5^2\cdots(3k-1)^2}{(3k+1)!}x^{3k+1}\right] + d_2\left[1 + \sum_{k=1}^{\infty}\frac{4^2\cdot 7^2\cdots(3k-2)^2}{(3k)!}x^{3k}\right]$$
for all $x \in I$, where $d_1$ and $d_2$ are arbitrary constants. ■
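The recurrence $c_{k+2} = \frac{k}{(k+1)(k+2)}c_{k-1}$ can be iterated mechanically in exact arithmetic, which makes it easy to confirm the coefficients listed above. A sketch (our code and names, not from the text):

```python
from fractions import Fraction

def coeffs_7_23(c0, c1, n):
    """First n+1 coefficients of Example 7.23: c2 = 0, then
    c_{k+2} = k c_{k-1} / ((k+1)(k+2)) for k >= 1 (k = 1 reproduces c3 = c0/6)."""
    c = [Fraction(c0), Fraction(c1), Fraction(0)]
    for k in range(1, n - 1):
        c.append(Fraction(k, (k + 1) * (k + 2)) * c[k - 1])
    return c
```

For instance, `coeffs_7_23(1, 0, 12)` reproduces $c_3 = 1/6$, $c_6 = 1/45$, $c_9 = 7/3240$, matching $1/3!$, $4^2/6!$, and $4^2\cdot 7^2/9!$ in $y_2$.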

Example 7.24. Find the first four nonzero terms in a power series expansion about $x_0 = 2$ for a general solution to
$$x^2 y'' - y' + y = 0.$$

Solution. Since $x = 2$ is an ordinary point for the ODE, we expect to find a general solution of the form
$$y(x) = \sum_{k=0}^{\infty} c_k(x-2)^k, \qquad (7.16)$$
with the power series converging on some open interval $I$ containing 2. From (7.16) comes
$$y'(x) = \sum_{k=1}^{\infty} k c_k(x-2)^{k-1} \quad \text{and} \quad y''(x) = \sum_{k=2}^{\infty} k(k-1)c_k(x-2)^{k-2},$$


which when substituted into the ODE yields
$$x^2\sum_{k=2}^{\infty} k(k-1)c_k(x-2)^{k-2} - \sum_{k=1}^{\infty} k c_k(x-2)^{k-1} + \sum_{k=0}^{\infty} c_k(x-2)^k = 0. \qquad (7.17)$$
It will be expedient to express $x^2$ in terms of $x - 2$. Since $(x-2)^2 = x^2 - 4x + 4$ we have
$$x^2 = (x-2)^2 + 4x - 4 = (x-2)^2 + 4(x-2) + 4,$$
and so (7.17) becomes
$$\left[(x-2)^2 + 4(x-2) + 4\right]\sum_{k=2}^{\infty} k(k-1)c_k(x-2)^{k-2} - \sum_{k=1}^{\infty} k c_k(x-2)^{k-1} + \sum_{k=0}^{\infty} c_k(x-2)^k = 0,$$
and thus
$$\sum_{k=2}^{\infty} k(k-1)c_k(x-2)^k + 4\sum_{k=2}^{\infty} k(k-1)c_k(x-2)^{k-1} + 4\sum_{k=2}^{\infty} k(k-1)c_k(x-2)^{k-2} - \sum_{k=1}^{\infty} k c_k(x-2)^{k-1} + \sum_{k=0}^{\infty} c_k(x-2)^k = 0.$$
Adding zero terms and reindexing where needed, we obtain
$$\sum_{k=0}^{\infty} k(k-1)c_k(x-2)^k + 4\sum_{k=0}^{\infty} k(k+1)c_{k+1}(x-2)^k + 4\sum_{k=0}^{\infty} (k+1)(k+2)c_{k+2}(x-2)^k - \sum_{k=0}^{\infty} (k+1)c_{k+1}(x-2)^k + \sum_{k=0}^{\infty} c_k(x-2)^k = 0,$$
or equivalently
$$\sum_{k=0}^{\infty}\left[k(k-1)c_k + 4k(k+1)c_{k+1} + 4(k+1)(k+2)c_{k+2} - (k+1)c_{k+1} + c_k\right](x-2)^k = 0$$
for all $x \in I$. Therefore by Proposition 7.17 we have
$$k(k-1)c_k + 4k(k+1)c_{k+1} + 4(k+1)(k+2)c_{k+2} - (k+1)c_{k+1} + c_k = 0$$
for all $k \ge 0$, which rearranges to become
$$c_{k+2} = -\frac{(4k^2 + 3k - 1)c_{k+1} + (k^2 - k + 1)c_k}{4k^2 + 12k + 8}. \qquad (7.18)$$
Using the recursion relation (7.18), we obtain
$$c_2 = \frac{c_1 - c_0}{8}$$
and
$$c_3 = -\frac{6c_2 + c_1}{24} = -\frac{1}{4}\left(\frac{c_1 - c_0}{8}\right) - \frac{1}{24}c_1 = \frac{3c_0 - 7c_1}{96}.$$

Hence
$$y(x) = c_0 + c_1(x-2) + c_2(x-2)^2 + c_3(x-2)^3 + \cdots = c_0 + c_1(x-2) + \frac{c_1 - c_0}{8}(x-2)^2 + \frac{3c_0 - 7c_1}{96}(x-2)^3 + \cdots$$

is a power series expansion about 2 for a general solution to the ODE.

Alternatively, setting $c_0 = 1$ and $c_1 = 0$ yields the particular solution
$$y_1(x) = 1 - \frac{1}{8}(x-2)^2 + \frac{1}{32}(x-2)^3 + \cdots,$$
while setting $c_0 = 0$ and $c_1 = 1$ yields the particular solution
$$y_2(x) = (x-2) + \frac{1}{8}(x-2)^2 - \frac{7}{96}(x-2)^3 + \cdots;$$
and since $y_1(x)$ and $y_2(x)$ are linearly independent it follows that $y(x) = d_1 y_1(x) + d_2 y_2(x)$ is a general solution to the ODE. That is,
$$y(x) = d_1\left[1 - \frac{1}{8}(x-2)^2 + \frac{1}{32}(x-2)^3 + \cdots\right] + d_2\left[(x-2) + \frac{1}{8}(x-2)^2 - \frac{7}{96}(x-2)^3 + \cdots\right]$$
for arbitrary constants $d_1$ and $d_2$. ■
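Recursion (7.18) is easy to iterate in exact arithmetic, which confirms $c_2 = (c_1 - c_0)/8$ and $c_3 = (3c_0 - 7c_1)/96$ for both choices of $(c_0, c_1)$ used above. A sketch (our code, not from the text):

```python
from fractions import Fraction

def next_coeff(k, ck, ck1):
    """One step of recursion (7.18): c_{k+2} from c_k and c_{k+1} (Fractions)."""
    num = (4 * k * k + 3 * k - 1) * ck1 + (k * k - k + 1) * ck
    return -num / (4 * k * k + 12 * k + 8)

# With c0 = 1, c1 = 0 this gives c2 = -1/8 and c3 = 3/96 = 1/32, matching y1.
c2 = next_coeff(0, Fraction(1), Fraction(0))
c3 = next_coeff(1, Fraction(0), c2)
```

Running the same two steps with $c_0 = 0$, $c_1 = 1$ reproduces the coefficients $1/8$ and $-7/96$ of $y_2$.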

Example 7.25. Find the first four nonzero terms in a power series expansion about $t_0 = 0$ for a solution to the initial value problem
$$y'' + ty' + e^t y = 0, \qquad y(0) = 0, \quad y'(0) = -1.$$

Solution. Since $t = 0$ is an ordinary point for the ODE we expect to find a general solution of the form
$$y(t) = \sum_{k=0}^{\infty} c_k t^k,$$

with the power series converging on some open interval $I$ containing 0. Substituting this power series into the ODE, as well as the power series representation for $e^t$ centered at 0, yields
$$\sum_{k=2}^{\infty} k(k-1)c_k t^{k-2} + t\sum_{k=1}^{\infty} k c_k t^{k-1} + \left(\sum_{k=0}^{\infty}\frac{t^k}{k!}\right)\left(\sum_{k=0}^{\infty} c_k t^k\right) = 0. \qquad (7.19)$$
We take the Cauchy product of the series for $e^t$ and $y(t)$:
$$e^t y(t) = \left(\sum_{k=0}^{\infty}\frac{t^k}{k!}\right)\left(\sum_{k=0}^{\infty} c_k t^k\right) = \left(1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \cdots\right)\left(c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4 + \cdots\right)$$
$$= c_0 + \underbrace{(c_0 + c_1)}_{d_1}t + \underbrace{\left(\frac{c_0}{2} + c_1 + c_2\right)}_{d_2}t^2 + \underbrace{\left(\frac{c_0}{6} + \frac{c_1}{2} + c_2 + c_3\right)}_{d_3}t^3 + \cdots,$$
so
$$e^t y(t) = \sum_{k=0}^{\infty} d_k t^k,$$


where $d_0 = c_0$, and $d_1$, $d_2$, and $d_3$ are as defined above. Putting this in (7.19) and reindexing, we have
$$\sum_{k=0}^{\infty} (k+1)(k+2)c_{k+2} t^k + \sum_{k=0}^{\infty} k c_k t^k + \sum_{k=0}^{\infty} d_k t^k = 0,$$
or
$$\sum_{k=0}^{\infty}\left[(k+1)(k+2)c_{k+2} + k c_k + d_k\right]t^k = 0.$$
By Proposition 7.17,
$$(k+1)(k+2)c_{k+2} + k c_k + d_k = 0$$
for all $k \ge 0$, which solved for $c_{k+2}$ becomes
$$c_{k+2} = -\frac{k c_k + d_k}{(k+1)(k+2)}. \qquad (7.20)$$
The recursion relation (7.20) may now be used to determine $c_k$ for any $k \ge 2$, at least in terms of $c_0$ and $c_1$:
$$c_2 = -\frac{d_0}{2} = -\frac{c_0}{2},$$
$$c_3 = -\frac{c_1 + d_1}{6} = -\frac{c_1 + (c_0 + c_1)}{6} = -\frac{c_0 + 2c_1}{6},$$
$$c_4 = -\frac{2c_2 + d_2}{12} = -\frac{2(-c_0/2) + (c_0/2 + c_1 + c_2)}{12} = \frac{c_0 - (c_0/2 + c_1 - c_0/2)}{12} = \frac{c_0 - c_1}{12},$$
$$c_5 = -\frac{3c_3 + d_3}{20} = -\frac{3(-c_0/6 - c_1/3) + (c_0/6 + c_1/2 + c_2 + c_3)}{20} = \frac{6c_0 + 5c_1}{120}.$$
Thus we have
$$y(t) = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + c_4 t^4 + c_5 t^5 + \cdots = c_0 + c_1 t - \frac{c_0}{2}t^2 - \frac{c_0 + 2c_1}{6}t^3 + \frac{c_0 - c_1}{12}t^4 + \frac{6c_0 + 5c_1}{120}t^5 + \cdots \qquad (7.21)$$

From (7.21) and the initial condition $y(0) = 0$ we readily find that $c_0 = 0$. Putting this into (7.21) gives
$$y(t) = c_1 t - \frac{1}{3}c_1 t^3 - \frac{1}{12}c_1 t^4 + \frac{1}{24}c_1 t^5 + \cdots.$$
From
$$y'(t) = c_1 - c_1 t^2 - \frac{1}{3}c_1 t^3 + \frac{5}{24}c_1 t^4 + \cdots$$
and the initial condition $y'(0) = -1$ we obtain $c_1 = -1$, and so
$$y(t) = -t + \frac{1}{3}t^3 + \frac{1}{12}t^4 - \frac{1}{24}t^5 + \cdots$$
is the solution to the IVP on $I$. ■
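The Cauchy-product coefficients $d_k = \sum_{j=0}^{k} c_j/(k-j)!$ and recursion (7.20) can be generated together, reproducing the coefficients just found. A sketch in exact arithmetic (our code, not part of the text):

```python
from fractions import Fraction
from math import factorial

def ivp_coeffs(n):
    """Coefficients c_0..c_n of the series solution in Example 7.25 with
    c0 = 0, c1 = -1, using d_k = sum_j c_j/(k-j)! and recursion (7.20)."""
    c = [Fraction(0), Fraction(-1)]
    for k in range(n - 1):
        d_k = sum(c[j] / factorial(k - j) for j in range(k + 1))
        c.append(-(k * c[k] + d_k) / ((k + 1) * (k + 2)))
    return c

coeffs = ivp_coeffs(5)  # matches -t + t^3/3 + t^4/12 - t^5/24 above
```

Extending `n` produces further terms of the solution with no additional by-hand Cauchy products.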


8 Systems of Equations

8.1 – Methods of Solving Systems of Linear ODEs

In this section we will consider two techniques for solving certain kinds of systems of differential equations: the method of "systematic elimination" and the Laplace transform method. Systematic elimination, which we address first, makes use of linear differential operators.

Recall that a linear ODE with dependent variable $x$ and independent variable $t$ has the form $\Lambda[x] = f(t)$, where
$$\Lambda = \sum_{k=0}^{n} a_k(t)D^k$$
for $D^k = d^k/dt^k$ (see §1.2) is a linear differential operator. More specifically $\Lambda[x] = f(t)$ may be called a linear ODE in $x$, to stress that $x$ is the dependent variable (i.e. the unknown function).

A linear ODE in $x_1, \ldots, x_n$ is an ODE of the form
$$\sum_{j=1}^{n}\Lambda_j[x_j] = f(t), \qquad (8.1)$$
where each $\Lambda_j$ is a linear differential operator:
$$\Lambda_j = \sum_{k=0}^{o_j} a_{jk}(t)D^k.$$
Here we give the order of the operator $\Lambda_j$ (i.e. the order of the highest-order derivative operator $D^k$ in $\Lambda_j$) to be $o_j$, which is to say the orders of the operators $\Lambda_1, \ldots, \Lambda_n$ in (8.1) are not necessarily the same. The order of the equation (8.1) is then defined to equal $\max\{o_1, \ldots, o_n\}$, the order of the highest-order operator $\Lambda_j$. For example, the linear ODE
$$(D^4 - t^9 D^2 - 5D)x_1 + (D^7 + 3tD^3)x_2 + (-D^6 - 8D^5 - 1)x_3 = \ln t$$
has operators of order 4, 7, and 6, and therefore the equation itself is 7th-order.


We will frequently encounter linear ODEs in $x$ and $y$, with $t$ the independent variable. An example would be
$$\left[x''(t) - t^2 x'(t) + 4x(t)\right] + \left[2y'(t) - y(t)\right] = e^t.$$
This can be written in terms of linear differential operators as
$$(D^2 - t^2 D + 4)x + (2D - 1)y = e^t,$$
with $D = d/dt$ and $D^2 = d^2/dt^2$, which of course fits the form of (8.1).

A system of linear ODEs in $x_1, \ldots, x_n$ is a set of equations of the form (8.1). If there are $m$ equations in all, then the system has the form
$$\begin{aligned}
\Lambda_{11}[x_1] + \Lambda_{12}[x_2] + \cdots + \Lambda_{1n}[x_n] &= f_1(t) \\
\Lambda_{21}[x_1] + \Lambda_{22}[x_2] + \cdots + \Lambda_{2n}[x_n] &= f_2(t) \\
&\;\;\vdots \\
\Lambda_{m1}[x_1] + \Lambda_{m2}[x_2] + \cdots + \Lambda_{mn}[x_n] &= f_m(t)
\end{aligned} \qquad (8.2)$$
Some of the linear differential operators $\Lambda_{ij}$ in (8.2) are allowed to be the zero operator, so that not necessarily every equation in the system will possess all of the dependent variables $x_1, \ldots, x_n$. We will be most interested in the case when $m = n$, which is to say the number of equations in the system will equal the number of dependent variables present.

We will frequently employ vector notation to present a solution (or family of solutions) to a system. In the case of (8.2) we define the vector-valued function
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{where} \quad \mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} \qquad (8.3)$$
for any $t$ for which $x_1(t), \ldots, x_n(t)$ are defined. Now if we let
$$\Lambda = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} & \cdots & \Lambda_{1n} \\ \Lambda_{21} & \Lambda_{22} & \cdots & \Lambda_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \Lambda_{m1} & \Lambda_{m2} & \cdots & \Lambda_{mn} \end{bmatrix} \quad \text{and} \quad \mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{bmatrix},$$
with $\Lambda(t)$ and $\mathbf{f}(t)$ defined analogously, then the system (8.2) can be represented by the matrix equation $\Lambda\mathbf{x} = \mathbf{f}(t)$ or $\Lambda\mathbf{x} = \mathbf{f}$. We emphasize that the product $\Lambda\mathbf{x}$ is formally carried out in the same fashion as products of matrices as defined in linear algebra, though of course the entries of the matrix $\Lambda$ are operators, not scalars.

A solution to a system $\Lambda\mathbf{x} = \mathbf{f}$ on an interval $I$ is any vector-valued function $t \mapsto \mathbf{x}(t)$ for which $\Lambda\mathbf{x}(t) = \mathbf{f}(t)$ for all $t \in I$. Clearly it is required that $I$ be a subset of $\operatorname{Dom}(f_i)$ for all


$1 \le i \le m$ and $\operatorname{Dom}(x_j)$ for all $1 \le j \le n$. The general solution to a system is the set of all solutions. Absent any initial or boundary conditions, we can expect that a general solution will feature one or more arbitrary constants (called parameters as usual). The number of parameters present in a general solution hinges on the number of equations $m$ and dependent variables $n$ in the system, as well as the orders of the various operators $\Lambda_{ij}$ for $1 \le i \le m$ and $1 \le j \le n$.

The first example illustrates a system of the type (8.2) for which $m = n = 2$, which has the form
$$\begin{cases} \Lambda_{11}x + \Lambda_{12}y = f_1(t) \\ \Lambda_{21}x + \Lambda_{22}y = f_2(t) \end{cases}$$
for dependent variables $x, y$ and independent variable $t$. Henceforth if the dependent variables in a system are $x, y$, or $x, y, z$, then we will let
$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{or} \quad \mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix},$$
respectively. In either case the independent variable is usually $t$.

Example 8.1. Solve the system
$$\begin{cases} \dfrac{d^2x}{dt^2} + \dfrac{dy}{dt} = -5x \\[2ex] \dfrac{dx}{dt} + \dfrac{dy}{dt} = -x + 4y \end{cases} \qquad (8.4)$$
by systematic elimination.

Solution. In terms of the differential operator $D = d/dt$ the system becomes
$$\begin{cases} (D^2 + 5)x + Dy = 0 \\ (D + 1)x + (D - 4)y = 0 \end{cases} \qquad (8.5)$$
Applying the operator $D - 4$ to the first equation in the system, and $D$ to the second equation, and using the fact that products of differential operators are commutative, we next obtain
$$\begin{cases} (D - 4)(D^2 + 5)x + (D - 4)Dy = 0 \\ D(D + 1)x + (D - 4)Dy = 0 \end{cases}$$
Now we subtract the second equation from the first in order to eliminate $y$ from the system, giving
$$(D - 4)(D^2 + 5)x - D(D + 1)x = 0. \qquad (8.6)$$
As proved in §1.2 and stressed anew in §4.5, products of differential operators are formally the same as products of polynomials, so that
$$(D - 4)(D^2 + 5) = D^3 - 4D^2 + 5D - 20,$$
and the ODE (8.6) becomes
$$\frac{d^3x}{dt^3} - 5\frac{d^2x}{dt^2} + 4\frac{dx}{dt} - 20x = 0. \qquad (8.7)$$


The characteristic equation is
$$r^3 - 5r^2 + 4r - 20 = 0,$$
which has roots $5$, $-2i$, $2i$, and so (8.7) has general solution
$$x(t) = c_1 e^{5t} + c_2\cos 2t + c_3\sin 2t \qquad (8.8)$$

by Theorem 4.33.

Returning to the system (8.5), we next apply $D + 1$ to the first equation and $D^2 + 5$ to the second to obtain
$$\begin{cases} (D + 1)(D^2 + 5)x + (D + 1)Dy = 0 \\ (D + 1)(D^2 + 5)x + (D^2 + 5)(D - 4)y = 0 \end{cases}$$
Subtracting these equations eliminates $x$ and yields the ODE
$$(D + 1)Dy - (D^2 + 5)(D - 4)y = 0,$$
or equivalently
$$\frac{d^3y}{dt^3} - 5\frac{d^2y}{dt^2} + 4\frac{dy}{dt} - 20y = 0.$$
This equation has the same characteristic equation as (8.7), and so the general solution is
$$y(t) = c_4 e^{5t} + c_5\cos 2t + c_6\sin 2t. \qquad (8.9)$$

Between (8.8) and (8.9) there appear to be six independent parameters. However, the parameters $c_4$, $c_5$, $c_6$ in fact depend on $c_1$, $c_2$, $c_3$. To determine the precise nature of the dependence, we put the expressions for $x(t)$ and $y(t)$ given by (8.8) and (8.9) into, say, the equation $x' + y' = -x + 4y$ in the original system (8.4). This gives
$$(5c_1 e^{5t} - 2c_2\sin 2t + 2c_3\cos 2t) + (5c_4 e^{5t} - 2c_5\sin 2t + 2c_6\cos 2t)$$
$$= (-c_1 e^{5t} - c_2\cos 2t - c_3\sin 2t) + (4c_4 e^{5t} + 4c_5\cos 2t + 4c_6\sin 2t).$$
This simplifies as
$$(6c_1 + c_4)e^{5t} - (2c_2 - c_3 + 2c_5 + 4c_6)\sin 2t + (c_2 + 2c_3 - 4c_5 + 2c_6)\cos 2t = 0,$$
which can only be satisfied for all $t$ in some open interval of real numbers if
$$\begin{cases} 6c_1 + c_4 = 0 \\ 2c_2 - c_3 + 2c_5 + 4c_6 = 0 \\ c_2 + 2c_3 - 4c_5 + 2c_6 = 0 \end{cases}$$
From this we discover that $c_4 = -6c_1$, $c_5 = \frac{1}{2}c_3$, and $c_6 = -\frac{1}{2}c_2$. Therefore

[c1e

5t + c2 cos 2t+ c3 sin 2t−6c1e

5t − 12c2 sin 2t+ 1

2c3 cos 2t

]= c1

[e5t

−6e5t

]+ c2

[cos 2t−1

2sin 2t

]+ c3

[sin 2t12

cos 2t

]is a three-parameter family of solutions to the system (8.4). �
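The three-parameter family can be spot-checked by substituting $x(t)$, $y(t)$ and their derivatives back into both equations of (8.4) at a few values of $t$. A numerical sketch (our code and arbitrary parameter values, not from the text):

```python
import math

def residuals(t, c1=1.3, c2=-0.7, c3=2.1):
    """Residuals of x'' + y' + 5x and x' + y' + x - 4y for Example 8.1."""
    e, co, si = math.exp(5 * t), math.cos(2 * t), math.sin(2 * t)
    x = c1 * e + c2 * co + c3 * si
    y = -6 * c1 * e - 0.5 * c2 * si + 0.5 * c3 * co
    xp = 5 * c1 * e - 2 * c2 * si + 2 * c3 * co       # x'
    xpp = 25 * c1 * e - 4 * c2 * co - 4 * c3 * si     # x''
    yp = -30 * c1 * e - c2 * co - c3 * si             # y'
    return xpp + yp + 5 * x, xp + yp + x - 4 * y      # both should vanish
```

Both residuals are zero (up to rounding) for every $t$ and every choice of $c_1, c_2, c_3$, consistent with (8.4).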

The final set of solutions found for the system in Example 8.1 has three parameters, whichhappens to be the sum of the orders of the two differential equations in the system. This is notcoincidental: the number of parameters in the general solution to a system of linear ODEs willequal the sum of the orders of the individual ODEs, provided the system admits any solutions

Page 201: Differential Equations - faculty.bucks.edufaculty.bucks.edu/erickson/math250/DifferentialEquations.pdf2 solution would be some real number cwhich, when substituted for xin the equation,

198

at all. Thus the three-parameter family of solutions found for the system in Example 8.1 isindeed the general solution to the system. The theory developed in the next section will provethis for a large class of systems.

The elimination method can sometimes be facilitated by a substitution, and if a system consists of more than two equations this can result in a significant savings in labor. The next example illustrates this. Also, because the system in the next example consists of three first-order linear differential equations, we can expect to find a three-parameter family of solutions.

Example 8.2. Solve the system
\[ \begin{cases} \dfrac{dx}{dt} = -x + z \\[1ex] \dfrac{dy}{dt} = -y + z \\[1ex] \dfrac{dz}{dt} = -x + y \end{cases} \tag{8.10} \]
by systematic elimination.

Solution. We write the system anew using differential operator notation:
\[ \begin{cases} (D+1)x - z = 0 \\ (D+1)y - z = 0 \\ Dz + x - y = 0 \end{cases} \]

From the first equation we have z = (D+1)x, and so we substitute (D+1)x for z in the second and third equations to obtain
\[ (D+1)x - (D+1)y = 0 \quad\text{and}\quad (D^2 + D + 1)x - y = 0. \]

Applying D + 1 to the equation at right and subtracting it from the equation at left then leads to
\[ (D+1)x - (D+1)(D^2 + D + 1)x = 0, \]

or equivalently
\[ \frac{d^3x}{dt^3} + 2\frac{d^2x}{dt^2} + \frac{dx}{dt} = 0. \]

The characteristic equation for this ODE has roots 0,−1,−1, and so the general solution is

\[ x(t) = c_1 + c_2e^{-t} + c_3te^{-t}. \]

Substituting this expression for x into z = (D+1)x = x' + x, from the first equation of (8.10), immediately yields
\[ z(t) = c_1 + c_3e^{-t}, \]

which, when substituted into the third equation of (8.10) rewritten as y = x + dz/dt, results in
\[ y(t) = c_1 + (c_2 - c_3)e^{-t} + c_3te^{-t}. \]


Therefore
\[ \mathbf{x}(t) = \begin{bmatrix} c_1 + c_2e^{-t} + c_3te^{-t} \\ c_1 + (c_2 - c_3)e^{-t} + c_3te^{-t} \\ c_1 + c_3e^{-t} \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}e^{-t} + c_3\begin{bmatrix} t \\ t-1 \\ 1 \end{bmatrix}e^{-t} \]
is the general solution to the system. ∎
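The three-parameter family just obtained can be spot-checked numerically. The sketch below (my own, not part of the text) computes the derivatives of the closed-form expressions by hand and verifies that each equation of system (8.10) is satisfied at sampled points:

```python
import math

# Spot-check (not from the text): verify that the three-parameter family
# found in Example 8.2 satisfies system (8.10) for arbitrary c1, c2, c3.
def check_solution(c1, c2, c3, t):
    e = math.exp(-t)
    x = c1 + c2*e + c3*t*e
    y = c1 + (c2 - c3)*e + c3*t*e
    z = c1 + c3*e
    # Hand-computed derivatives of the closed-form expressions:
    xp = -c2*e + c3*(e - t*e)
    yp = -(c2 - c3)*e + c3*(e - t*e)
    zp = -c3*e
    # System (8.10):  x' = -x + z,   y' = -y + z,   z' = -x + y.
    assert abs(xp - (-x + z)) < 1e-9
    assert abs(yp - (-y + z)) < 1e-9
    assert abs(zp - (-x + y)) < 1e-9

for t in (-1.0, 0.0, 0.5, 2.0):
    check_solution(1.3, -0.7, 2.2, t)
```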

Example 8.3. Solve the system
\[ \begin{cases} \dfrac{dx}{dt} + \dfrac{d^2y}{dt^2} = e^{3t} \\[1ex] \dfrac{dx}{dt} + \dfrac{dy}{dt} = y - x + 4e^{3t} \end{cases} \tag{8.11} \]
by systematic elimination.

Solution. The system in terms of the differential operator D = d/dt is
\[ \begin{cases} Dx + D^2y = e^{3t} \\ (D+1)x + (D-1)y = 4e^{3t} \end{cases} \tag{8.12} \]

Applying D + 1 to the first equation and D to the second, and then subtracting the second from the first, we eliminate x to obtain
\[ (D+1)D^2y - (D-1)Dy = -8e^{3t}, \]
and hence
\[ y''' + y' = -8e^{3t}. \tag{8.13} \]

The characteristic equation for (8.13) is r^3 + r = 0, with roots 0 and ±i. The nonhomogeneity -8e^{3t} has the form P_m(t)e^{\alpha t} with P_m(t) = -8 (so m = 0) and e^{\alpha t} = e^{3t} (so \alpha = 3). By the Method of Undetermined Coefficients a particular solution to (8.13) has the form y_p(t) = At^se^{3t}, with s equalling the multiplicity of \alpha = 3 as a root of the characteristic equation. Thus s = 0 and we have y_p(t) = Ae^{3t}, which, when substituted for y in (8.13), leads to A = -\tfrac{4}{15}.

The general solution to (8.13) is therefore
\[ y(t) = c_1 + c_2\cos t + c_3\sin t - \tfrac{4}{15}e^{3t}. \tag{8.14} \]

Next, if we apply D - 1 to the first equation in (8.12) and D^2 to the second, subtraction eliminates y to give
\[ (D-1)Dx - (D+1)D^2x = -34e^{3t}, \]
and hence
\[ x''' + x' = 34e^{3t}. \]
The treatment of this ODE is the same as that for (8.13), yielding particular solution x_p(t) = \tfrac{17}{15}e^{3t}, and therefore
\[ x(t) = c_4 + c_5\cos t + c_6\sin t + \tfrac{17}{15}e^{3t} \tag{8.15} \]
is the general solution.


Now, if we put (8.14) and (8.15) into the first equation in (8.11), we find that
\[ (c_6 - c_2)\cos t - (c_5 + c_3)\sin t = 0, \]
implying c_6 = c_2 and c_5 = -c_3. Next, putting (8.14) and (8.15) into the second equation in (8.11) yields c_4 = c_1. Therefore

\[ \mathbf{x}(t) = \begin{bmatrix} c_1 + c_2\sin t - c_3\cos t + \tfrac{17}{15}e^{3t} \\ c_1 + c_2\cos t + c_3\sin t - \tfrac{4}{15}e^{3t} \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} \sin t \\ \cos t \end{bmatrix} + c_3\begin{bmatrix} -\cos t \\ \sin t \end{bmatrix} + \frac{e^{3t}}{15}\begin{bmatrix} 17 \\ -4 \end{bmatrix} \]
is the general solution to the system. ∎
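As with the earlier examples, the result can be checked numerically. The following sketch (my own, under the stated closed forms) substitutes x(t) and y(t), with hand-computed derivatives, into both equations of (8.11):

```python
import math

# Spot-check (mine, not the text's) of the Example 8.3 general solution
# against system (8.11), using hand-computed derivatives of x and y.
def residuals(c1, c2, c3, t):
    e3 = math.exp(3*t)
    x   = c1 + c2*math.sin(t) - c3*math.cos(t) + (17/15)*e3
    y   = c1 + c2*math.cos(t) + c3*math.sin(t) - (4/15)*e3
    xp  = c2*math.cos(t) + c3*math.sin(t) + (51/15)*e3
    yp  = -c2*math.sin(t) + c3*math.cos(t) - (12/15)*e3
    ypp = -c2*math.cos(t) - c3*math.sin(t) - (36/15)*e3
    r1 = xp + ypp - e3                 # first equation:  x' + y'' = e^{3t}
    r2 = xp + yp - (y - x + 4*e3)      # second equation: x' + y' = y - x + 4e^{3t}
    return r1, r2

for t in (-0.5, 0.0, 1.0):
    r1, r2 = residuals(0.4, -1.1, 2.0, t)
    assert abs(r1) < 1e-9 and abs(r2) < 1e-9
```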

Example 8.4. Using the Laplace transform method, solve the system
\[ \begin{cases} \dfrac{dx}{dt} + 3x + \dfrac{dy}{dt} = 1 \\[1ex] \dfrac{dx}{dt} - x + \dfrac{dy}{dt} - y = e^t \end{cases} \]
subject to x(0) = 0, y(0) = 0.

Solution. Taking the Laplace transform of each equation in the system, letting X = L[x] and Y = L[y], leads to
\[ \begin{cases} [sX - x(0)] + 3X + [sY - y(0)] = \dfrac{1}{s} \\[1ex] [sX - x(0)] - X + [sY - y(0)] - Y = \dfrac{1}{s-1} \end{cases} \]
or equivalently, since x(0) = 0 and y(0) = 0 (and after dividing the second equation through by s - 1),
\[ \begin{cases} (s+3)X + sY = \dfrac{1}{s} \\[1ex] X + Y = \dfrac{1}{(s-1)^2} \end{cases} \tag{8.16} \]

Multiplying the second equation in (8.16) by s and subtracting it from the first equation yields
\[ 3X = \frac{1}{s} - \frac{s}{(s-1)^2}, \]

and hence

\[ x(t) = \frac{1}{3}\mathcal{L}^{-1}\!\left[\frac{1}{s} - \frac{s}{(s-1)^2}\right] = \frac{1}{3}\mathcal{L}^{-1}\!\left[\frac{1}{s} - \frac{1}{s-1} - \frac{1}{(s-1)^2}\right] = \frac{1 - e^t - te^t}{3}. \]

Next, multiplying the second equation of (8.16) by s + 3 and subtracting from the first equation yields
\[ -3Y = \frac{1}{s} - \frac{s+3}{(s-1)^2}, \]

and hence
\[ y(t) = -\frac{1}{3}\mathcal{L}^{-1}\!\left[\frac{1}{s} - \frac{s+3}{(s-1)^2}\right] = -\frac{1 - e^t - 4te^t}{3}. \]


Therefore
\[ \mathbf{x}(t) = \frac{1}{3}\begin{bmatrix} 1 - e^t - te^t \\ -1 + e^t + 4te^t \end{bmatrix} \]
is the unique solution to the system subject to the given initial conditions. ∎
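A quick numerical check (my own sketch, not from the text) confirms that the pair x = (1 - e^t - te^t)/3, y = (-1 + e^t + 4te^t)/3 satisfies both differential equations and the initial conditions:

```python
import math

# Sanity check (mine) of the Example 8.4 solution:
#   x = (1 - e^t - t e^t)/3,   y = (-1 + e^t + 4 t e^t)/3.
def residuals(t):
    et = math.exp(t)
    x = (1 - et - t*et) / 3
    y = (-1 + et + 4*t*et) / 3
    xp = (-2*et - t*et) / 3        # hand-computed x'(t)
    yp = (5*et + 4*t*et) / 3       # hand-computed y'(t)
    r1 = xp + 3*x + yp - 1         # first equation:  x' + 3x + y' = 1
    r2 = xp - x + yp - y - et      # second equation: x' - x + y' - y = e^t
    return r1, r2

assert (1 - math.exp(0)) / 3 == 0.0   # x(0) = 0
assert (-1 + math.exp(0)) / 3 == 0.0  # y(0) = 0
for t in (-1.0, 0.0, 0.5, 2.0):
    r1, r2 = residuals(t)
    assert abs(r1) < 1e-9 and abs(r2) < 1e-9
```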


8.2 – The Theory of First-Order Linear Systems

A system of n first-order differential equations in n dependent variables x_1, . . . , x_n that is written as
\[ \begin{aligned} \frac{dx_1}{dt} &= F_1(t, x_1, \ldots, x_n) \\ \frac{dx_2}{dt} &= F_2(t, x_1, \ldots, x_n) \\ &\;\;\vdots \\ \frac{dx_n}{dt} &= F_n(t, x_1, \ldots, x_n) \end{aligned} \]

is said to be in normal form. The equations in this system are not necessarily linear; however, linearity obtains if for each 1 ≤ j ≤ n there exist functions a_{j1}(t), . . . , a_{jn}(t) and f_j(t) such that
\[ F_j(t, x_1, \ldots, x_n) = a_{j1}(t)x_1 + a_{j2}(t)x_2 + \cdots + a_{jn}(t)x_n + f_j(t), \]

in which case the system takes the form
\[ \begin{aligned} \frac{dx_1}{dt} &= a_{11}(t)x_1 + a_{12}(t)x_2 + \cdots + a_{1n}(t)x_n + f_1(t) \\ \frac{dx_2}{dt} &= a_{21}(t)x_1 + a_{22}(t)x_2 + \cdots + a_{2n}(t)x_n + f_2(t) \\ &\;\;\vdots \\ \frac{dx_n}{dt} &= a_{n1}(t)x_1 + a_{n2}(t)x_2 + \cdots + a_{nn}(t)x_n + f_n(t) \end{aligned} \tag{8.17} \]

It is systems of first-order equations of this type, called linear systems, that we will be considering throughout much of the remainder of this chapter. With x(t) defined as in (8.3), we further define

\[ \mathbf{x}'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \quad \mathbf{A}(t) = \begin{bmatrix} a_{11}(t) & a_{12}(t) & \cdots & a_{1n}(t) \\ a_{21}(t) & a_{22}(t) & \cdots & a_{2n}(t) \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}(t) & a_{n2}(t) & \cdots & a_{nn}(t) \end{bmatrix}, \quad \mathbf{f}(t) = \begin{bmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_n(t) \end{bmatrix}, \]
so that (8.17) can be written simply as x' = Ax + f. The linear system is homogeneous if f = 0, in which case it becomes x' = Ax; otherwise the system is nonhomogeneous. We say x is a solution to the system x' = Ax + f on an interval I if x'(t) = A(t)x(t) + f(t) for all t ∈ I.

The classic initial-value problem for a linear system in the normal form x' = Ax + f is the problem of finding some x(t) that both satisfies the system on an open interval I and also satisfies the initial condition x(t_0) = x_0, where t_0 ∈ I and x_0 = [ξ_1 · · · ξ_n]^⊤ for some constants ξ_1, . . . , ξ_n. The following existence-uniqueness theorem is analogous to Theorem 4.10. To say the vector-valued function f(t) is continuous on I means each of the component


functions f_1, . . . , f_n are continuous on I; similarly, A(t) is continuous on I if a_{ij}(t) is continuous on I for each 1 ≤ i, j ≤ n.

Theorem 8.5 (Existence-Uniqueness). Suppose A(t) and f(t) are continuous on an open interval I. If t_0 ∈ I, then for any x_0 ∈ R^n there exists a unique solution on I to the initial-value problem
\[ \mathbf{x}'(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{f}(t), \quad \mathbf{x}(t_0) = \mathbf{x}_0. \]

The theory pertaining to systems of the type (8.17) is based on this theorem, and as with all existence-uniqueness theorems the proof is deferred to a later chapter. Before developing the theory, however, the next example should help illustrate the importance of systems of the type (8.17). In particular, any problem involving a higher-order linear ODE can be converted into a problem involving a first-order linear system (and vice-versa). In the example we make use of the following notation: if v is a vector in R^n, so that v = [v_1 · · · v_n]^⊤, then [v]_k = v_k for each 1 ≤ k ≤ n; that is, [v]_k denotes the kth scalar component of v.

Example 8.6. For the linear equation y'' + ty' - 4y = sin t we may let x_1 = y and x_2 = y'. The ODE then becomes x_2' + tx_2 - 4x_1 = sin t, and since x_1' = x_2 also, we obtain the first-order linear system
\[ \begin{cases} x_1' = x_2 \\ x_2' = 4x_1 - tx_2 + \sin t \end{cases} \]

More generally, for any nth-order linear ODE in standard form,
\[ y^{(n)} + p_{n-1}(t)y^{(n-1)} + p_{n-2}(t)y^{(n-2)} + \cdots + p_1(t)y' + p_0(t)y = g(t), \tag{8.18} \]
we let x_1 = y, x_2 = y', . . . , x_n = y^{(n-1)}, so that the ODE becomes
\[ x_n' + p_{n-1}(t)x_n + p_{n-2}(t)x_{n-1} + \cdots + p_1(t)x_2 + p_0(t)x_1 = g(t). \]

Since x_k' = x_{k+1} for k = 1, 2, . . . , n-1, the first-order linear system
\[ \begin{aligned} x_1' &= x_2 \\ x_2' &= x_3 \\ &\;\;\vdots \\ x_{n-1}' &= x_n \\ x_n' &= -p_0(t)x_1 - p_1(t)x_2 - \cdots - p_{n-1}(t)x_n + g(t) \end{aligned} \tag{8.19} \]

arises. This system has the form x′ = Ax + f for

\[ \mathbf{A}(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -p_0(t) & -p_1(t) & -p_2(t) & \cdots & -p_{n-2}(t) & -p_{n-1}(t) \end{bmatrix}, \quad \mathbf{f}(t) = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ g(t) \end{bmatrix}, \]
and x(t) as defined by (8.3).
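The companion-matrix construction just described is mechanical enough to code directly. The sketch below (names and the helper `companion_system` are my own, not from the text) builds A(t) and f(t) from the coefficient functions p_0, . . . , p_{n-1} and the forcing term g:

```python
import math

# A sketch of the companion-matrix construction above: given coefficient
# functions [p0, ..., p_{n-1}] of an nth-order linear ODE in standard form
# and forcing term g, evaluate A(t) and f(t) for the system x' = Ax + f.
def companion_system(p, g, t):
    n = len(p)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0          # encodes x_k' = x_{k+1}, k = 1, ..., n-1
    for j in range(n):
        A[n - 1][j] = -p[j](t)     # last row: -p_0(t), ..., -p_{n-1}(t)
    f = [0.0] * n
    f[n - 1] = g(t)                # forcing enters only the last equation
    return A, f

# y'' + t y' - 4y = sin t, the opening ODE of Example 8.6, at t = 2:
A, f = companion_system([lambda t: -4.0, lambda t: t], math.sin, 2.0)
assert A == [[0.0, 1.0], [4.0, -2.0]]
assert f[0] == 0.0 and abs(f[1] - math.sin(2.0)) < 1e-12
```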


As we will see in the pages to come, if x_p(t) is a particular solution to (8.19), then there will exist vector-valued functions x_1(t), . . . , x_n(t) such that the system's general solution will be an n-parameter family of vector-valued functions of the form
\[ \mathbf{x}(t) = \mathbf{x}_p(t) + \sum_{k=1}^{n} c_k\mathbf{x}_k(t). \]
Since [x(t)]_1 = x_1(t) = y(t), it then follows that the general solution to (8.18) is
\[ y(t) = \left[\mathbf{x}_p(t) + \sum_{k=1}^{n} c_k\mathbf{x}_k(t)\right]_1 = [\mathbf{x}_p(t)]_1 + \sum_{k=1}^{n} c_k[\mathbf{x}_k(t)]_1. \]
If we let y_p(t) = [x_p(t)]_1 and y_k(t) = [x_k(t)]_1, we can more clearly perceive that we've arrived at the same conclusion as Theorem 4.17 concerning the general solution to a single nth-order linear ODE. Thus the general solution to (8.18) is a scalar component of the general solution to the system (8.19). ∎

For any first-order linear system x' = Ax + f we call x' = Ax the reduced system. For example, the reduced system for (8.19) above is obtained by replacing g(t) with 0 (i.e. the zero function) in the last equation. The general theory of first-order linear systems will specify that the functions x_1(t), . . . , x_n(t) mentioned in Example 8.6 are linearly independent solutions to the reduced system for (8.19). The next proposition indicates that all linear combinations of solutions to a homogeneous linear system x' = Ax are also solutions.

Proposition 8.7 (Homogeneous Superposition Principle). If x_1(t), . . . , x_ℓ(t) are solutions on I to the homogeneous first-order linear system x' = Ax, then so too is
\[ \sum_{k=1}^{\ell} c_k\mathbf{x}_k(t) \]
for any scalars c_1, . . . , c_ℓ.

Proof. Suppose x_1(t), . . . , x_ℓ(t) are solutions to x' = Ax on I, and let φ(t) = \sum_{k=1}^{\ell} c_k x_k(t) for some choice of scalars c_1, . . . , c_ℓ. Then x_k'(t) = A(t)x_k(t) for each 1 ≤ k ≤ ℓ and t ∈ I. Now, using matrix arithmetic and the fact that d/dt is a linear operator, we obtain
\[ \varphi'(t) = \frac{d}{dt}\left(\sum_{k=1}^{\ell} c_k\mathbf{x}_k(t)\right) = \sum_{k=1}^{\ell} c_k\mathbf{x}_k'(t) = \sum_{k=1}^{\ell} c_k[\mathbf{A}(t)\mathbf{x}_k(t)] = \sum_{k=1}^{\ell} \mathbf{A}(t)[c_k\mathbf{x}_k(t)] = \mathbf{A}(t)\sum_{k=1}^{\ell} c_k\mathbf{x}_k(t) = \mathbf{A}(t)\varphi(t) \]
for all t ∈ I. Thus φ' = Aφ on I, which is to say φ is a solution to x' = Ax on I. ∎

Given an interval I ⊆ R, the notion of vector-valued functions f_1, . . . , f_ℓ : I → R^n being linearly independent on I is entirely analogous to that of scalar-valued functions. Compare Definition 4.1 to the following definition.


Definition 8.8. Functions f_1, . . . , f_ℓ : I ⊆ R → R^n are linearly independent on I if
\[ \sum_{k=1}^{\ell} c_k\mathbf{f}_k \equiv \mathbf{0} \]
on I implies that c_k = 0 for all 1 ≤ k ≤ ℓ. Otherwise f_1, . . . , f_ℓ are linearly dependent on I.

In Definition 8.8 it is not technically necessary for I to be an interval in R, but other kinds of sets are of no use to us here. If we define the set S = {f_1, . . . , f_ℓ}, then it's customary to say S is a linearly independent (or dependent) set on I if the functions f_1, . . . , f_ℓ are linearly independent (or dependent) on I.

Next we need a definition for the Wronskian function that is suitable for developing a more elegant theory of first-order linear systems.

Definition 8.9. Let I be an interval, and for each 1 ≤ j ≤ n let x_j : I → R^n be given by
\[ \mathbf{x}_j(t) = \begin{bmatrix} x_{1j}(t) \\ \vdots \\ x_{nj}(t) \end{bmatrix}. \]
The Wronskian of x_1, . . . , x_n is the function W[x_1, . . . , x_n] : I → R given by
\[ W[\mathbf{x}_1, \ldots, \mathbf{x}_n](t) = \begin{vmatrix} x_{11}(t) & x_{12}(t) & \cdots & x_{1n}(t) \\ x_{21}(t) & x_{22}(t) & \cdots & x_{2n}(t) \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1}(t) & x_{n2}(t) & \cdots & x_{nn}(t) \end{vmatrix} \]
for all t ∈ I.

The value of the Wronskian of a set of vector-valued functions is sensitive to how the functions are ordered (usually using subscripts). A different ordering will rearrange the columns of the Wronskian determinant, potentially resulting in a change in sign. However, the absolute value of the Wronskian never changes, and for our purposes that is good enough.

If we define the set X = {x_1, . . . , x_n}, then W[x_1, . . . , x_n] may be written as W[X]. We could also define the n × n matrix X = [x_1 · · · x_n], whose jth column consists of the entries in x_j, and write W[x_1, . . . , x_n] as W[X]; then we see that
\[ W[X](t) = \det(\mathbf{X}(t)). \]

It may seem that the definition for the Wronskian of a set of vector-valued functions has little in common with that for scalar-valued functions in §4.1, since the latter involves derivatives of the functions. However, in the n = 1 case the Wronskian as presented in Definition 4.6 also lacks derivatives: for a single function f_1 we have W[f_1](t) = f_1(t). (The 1 × 1 determinant of a scalar c simply equals c itself.) Thus the Wronskian of the fundamental set to a first-order homogeneous linear equation y'(t) = a(t)y(t), which can consist of only one function y_1, involves no derivatives. With Definition 8.9 we will find that the fundamental set of a first-order homogeneous linear system x'(t) = A(t)x(t) likewise involves no derivatives.


Theorem 8.10. Suppose x_1, . . . , x_n : I → R^n are solutions on I to the first-order homogeneous linear system x' = Ax. Then the set X = {x_1, . . . , x_n} is linearly independent on I if and only if
\[ W[X](t) \neq 0 \]
for all t ∈ I.

Proof. Suppose there exists some t_0 ∈ I such that W[X](t_0) = 0. Defining the n × n matrix
\[ \mathbf{X}(t) = [\mathbf{x}_1(t) \;\cdots\; \mathbf{x}_n(t)], \]
it follows that det(X(t_0)) = 0, and the Invertible Matrix Theorem implies there is some nonzero
\[ \mathbf{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \]
such that X(t_0)c = 0. Since c ≠ 0, there exists some 1 ≤ ℓ ≤ n such that c_ℓ ≠ 0. Now,

\[ \mathbf{X}(t_0)\mathbf{c} = \sum_{k=1}^{n} c_k\begin{bmatrix} x_{1k}(t_0) \\ \vdots \\ x_{nk}(t_0) \end{bmatrix} = \sum_{k=1}^{n} c_k\mathbf{x}_k(t_0), \]
and so from X(t_0)c = 0 comes
\[ \mathbf{x}_\ell(t_0) = -\frac{1}{c_\ell}\sum_{k\neq\ell} c_k\mathbf{x}_k(t_0) := \mathbf{x}_0. \]
Let
\[ \varphi = -\frac{1}{c_\ell}\sum_{k\neq\ell} c_k\mathbf{x}_k. \]

Now, x_ℓ is a solution to x' = Ax on I, and by Proposition 8.7 so too is φ. Moreover we have x_ℓ(t_0) = x_0 and φ(t_0) = x_0, so in fact the functions x_ℓ(t) and φ(t) are both solutions on I to the initial-value problem x' = Ax, x(t_0) = x_0. However, Theorem 8.5 informs us that any solution to this IVP must be unique, so that x_ℓ = φ on I, and hence
\[ \mathbf{x}_\ell(t) + \sum_{k\neq\ell} \frac{c_k}{c_\ell}\mathbf{x}_k(t) = \mathbf{0} \]
for all t ∈ I. This shows that x_1, . . . , x_n are linearly dependent on I, and therefore if the set X is linearly independent on I, then W[X](t) ≠ 0 for all t ∈ I.

For the converse, suppose the set X is linearly dependent on I. Then there exist scalars c_1, . . . , c_n, not all zero, such that \sum_{k=1}^{n} c_k x_k ≡ 0 on I. Fix t_0 ∈ I. Then \sum_{k=1}^{n} c_k x_k(t_0) = 0, and since not all c_k are zero we see that the vectors x_1(t_0), . . . , x_n(t_0) are linearly dependent in R^n. By the Invertible Matrix Theorem it follows that
\[ W[X](t_0) = \det(\mathbf{x}_1(t_0), \ldots, \mathbf{x}_n(t_0)) = 0, \]
and so there exists some t ∈ I for which W[X](t) = 0. This shows that if W[X](t) ≠ 0 for all t ∈ I, then X must be linearly independent on I. ∎


The concept of the fundamental set to a first-order homogeneous linear system is analogous to that of a homogeneous linear ODE (see Definition 4.13).

Definition 8.11. Let A(t) be n × n. A fundamental set on I to a first-order homogeneous linear system x' = Ax is a set {x_1, . . . , x_n} of n linearly independent solutions to x' = Ax on I.

In the definition the requirement that the matrix A(t) be n × n, together with the equation x' = Ax, necessarily implies that x is a vector in R^n; that is, x(t) must have precisely the form given by (8.3). This is to say that a fundamental set to a system x' = Ax must have exactly the same number of elements as the number of dependent variables in the system.^18

Theorem 8.12. If A(t) is continuous on an interval I, then the system x' = Ax has a fundamental set on I.

Proof. Suppose A(t) is continuous on I, and fix t_0 ∈ I. Recalling the Kronecker delta function δ_{ij} from §4.2, for each 1 ≤ j ≤ n define e_j ∈ R^n to be the vector with ith scalar component [e_j]_i = δ_{ij}. By Theorem 8.5 there is a unique solution x_j(t) on I to the initial-value problem x'(t) = A(t)x(t), x(t_0) = e_j. It follows that X = {x_1, . . . , x_n} is a set of solutions to the system x' = Ax on I. Suppose scalars c_1, . . . , c_n are such that
\[ \sum_{j=1}^{n} c_j\mathbf{x}_j \equiv \mathbf{0} \]
on I. Then in particular, since x_j(t_0) = e_j, we find that
\[ \mathbf{0} = \sum_{j=1}^{n} c_j\mathbf{x}_j(t_0) = \sum_{j=1}^{n} c_j\mathbf{e}_j = \begin{bmatrix} c_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ c_2 \\ \vdots \\ 0 \end{bmatrix} + \cdots + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ c_n \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \]
and hence c_1 = c_2 = · · · = c_n = 0. This shows that the solutions x_1(t), . . . , x_n(t) to x' = Ax are linearly independent on I, and therefore X is a fundamental set to the system on I. ∎

Theorem 8.13. If X = {x_1, . . . , x_n} is a fundamental set on I to the first-order homogeneous linear system x' = Ax, then
\[ \operatorname{Span}(X) = \left\{\sum_{k=1}^{n} c_k\mathbf{x}_k : c_1, \ldots, c_n \in \mathbb{R}\right\} \]
is the general solution to the system on I.

^18 Care must be taken with the notation: the symbol x_k(t) denotes the kth (scalar) component of the vector x(t) as given by (8.3), while the bold-faced x_k(t) denotes the kth solution (vector) to a system x' = Ax. The kth component of x_k(t) we denote by x_{kk}(t).


Proof. Let S denote the general solution to the system x' = Ax on I. Since X ⊆ S by hypothesis, that Span(X) ⊆ S follows directly from Proposition 8.7. It remains to show that S ⊆ Span(X). Suppose φ ∈ S. Fix t_0 ∈ I, so that φ(t_0) ∈ R^n. Since the set X is linearly independent on I, Theorem 8.10 implies that
\[ \det(\mathbf{x}_1(t_0), \ldots, \mathbf{x}_n(t_0)) = W[X](t_0) \neq 0, \]
and hence the vectors x_1(t_0), . . . , x_n(t_0) are linearly independent in R^n by the Invertible Matrix Theorem. This means the set {x_1(t_0), . . . , x_n(t_0)} is a basis for R^n, and so there exist scalars c_1, . . . , c_n such that
\[ \sum_{k=1}^{n} c_k\mathbf{x}_k(t_0) = \varphi(t_0). \]
Letting x_0 = φ(t_0), both \sum c_k x_k(t) and φ(t) are then seen to be solutions on I to the initial-value problem x' = Ax, x(t_0) = x_0, and since the solution to such an IVP must be unique by Theorem 8.5, it follows that
\[ \varphi = \sum_{k=1}^{n} c_k\mathbf{x}_k \]
on I. Of course, \sum c_k x_k ∈ Span(X), so that φ ∈ Span(X) as well, and we conclude that S ⊆ Span(X). ∎

Example 8.14. The systematic elimination method of §8.1 can be used to show that the system
\[ \begin{cases} x_1' = x_1 + 3x_2 \\ x_2' = x_1 - x_2 \end{cases} \]
has
\[ \mathbf{x}(t) = c_1\begin{bmatrix} 3e^{2t} \\ e^{2t} \end{bmatrix} + c_2\begin{bmatrix} -e^{-2t} \\ e^{-2t} \end{bmatrix} \tag{8.20} \]
as a two-parameter family of solutions on (-∞, ∞). To show that this is in fact the general solution to the system, it suffices by Theorem 8.13 to demonstrate that
\[ X = \left\{\begin{bmatrix} 3e^{2t} \\ e^{2t} \end{bmatrix}, \begin{bmatrix} -e^{-2t} \\ e^{-2t} \end{bmatrix}\right\} \]
is a fundamental set to the system on (-∞, ∞); and this, in turn, requires showing two things: (1) Each function in X is a solution to the system on (-∞, ∞); and (2) the functions in X are linearly independent on (-∞, ∞).

To start, it’s often helpful to cast the system in matrix form. In this case we have[x′1x′2

]=

[1 31 −1

][x1x2

],

or simply x′ = Ax with

A =

[1 31 −1

].

By choosing c1 = 1, c2 = 0 in (8.20), and then c1 = 0, c2 = 1, it’s a quick matter to verify that

x1(t) =

[3e2t

e2t

]and x2(t) =

[−e−2te−2t

]


are solutions to the system on (-∞, ∞). We could also carry out this verification through direct substitution: replacing x with x_1 in x' = Ax, we find that
\[ \mathbf{x}_1'(t) = \begin{bmatrix} 6e^{2t} \\ 2e^{2t} \end{bmatrix} \quad\text{and}\quad \mathbf{A}\mathbf{x}_1(t) = \begin{bmatrix} 1 & 3 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3e^{2t} \\ e^{2t} \end{bmatrix} = \begin{bmatrix} 3e^{2t} + 3e^{2t} \\ 3e^{2t} - e^{2t} \end{bmatrix} = \begin{bmatrix} 6e^{2t} \\ 2e^{2t} \end{bmatrix} \]
for all t, thereby affirming that x_1 is a solution to the system. The verification of x_2 is similar, and so each function in X is indeed a solution to the system on (-∞, ∞).

Next, since
\[ W[\mathbf{x}_1, \mathbf{x}_2](t) = \begin{vmatrix} 3e^{2t} & -e^{-2t} \\ e^{2t} & e^{-2t} \end{vmatrix} = 4 \neq 0 \]
for all t, Theorem 8.10 implies that X is linearly independent on (-∞, ∞). ∎
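Both checks in Example 8.14 can be mirrored numerically. The sketch below (my own, assuming the closed forms above) evaluates the Wronskian determinant and the residual of x' = Ax at several points:

```python
import math

# Numerical companion (mine, not the text's) to Example 8.14: verify that
# W[x1, x2](t) = 4 for several t and that x1 satisfies x' = A x.
A = [[1.0, 3.0], [1.0, -1.0]]

def x1(t): return [3*math.exp(2*t), math.exp(2*t)]
def x2(t): return [-math.exp(-2*t), math.exp(-2*t)]

def wronskian(t):
    a, c = x1(t)   # first column of the Wronskian matrix
    b, d = x2(t)   # second column
    return a*d - b*c

for t in (-2.0, 0.0, 1.5):
    assert abs(wronskian(t) - 4.0) < 1e-9
    v = x1(t)
    Av = [A[0][0]*v[0] + A[0][1]*v[1],
          A[1][0]*v[0] + A[1][1]*v[1]]
    # x1'(t) = [6e^{2t}, 2e^{2t}] should equal A x1(t):
    assert abs(Av[0] - 6*math.exp(2*t)) < 1e-9
    assert abs(Av[1] - 2*math.exp(2*t)) < 1e-9
```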

Theorem 8.15. If x_p is a particular solution on I to the first-order nonhomogeneous linear system x' = Ax + f, and X = {x_1, . . . , x_n} is a fundamental set on I to the reduced system x' = Ax, then
\[ \mathbf{x}_p + \operatorname{Span}(X) = \left\{\mathbf{x}_p + \sum_{k=1}^{n} c_k\mathbf{x}_k : c_1, \ldots, c_n \in \mathbb{R}\right\} \]
is the general solution to x' = Ax + f on I.

Proof. Let S denote the general solution to x' = Ax + f on I. Suppose φ ∈ x_p + Span(X), so φ = x_p + \sum_{k=1}^{n} c_k x_k for some scalars c_1, . . . , c_n. Then x_p' = Ax_p + f on I, and also x_k' = Ax_k on I for each 1 ≤ k ≤ n. Now,
\[ \mathbf{A}\varphi + \mathbf{f} = \mathbf{A}\left(\mathbf{x}_p + \sum_{k=1}^{n} c_k\mathbf{x}_k\right) + \mathbf{f} = (\mathbf{A}\mathbf{x}_p + \mathbf{f}) + \sum_{k=1}^{n} c_k(\mathbf{A}\mathbf{x}_k) = \mathbf{x}_p' + \sum_{k=1}^{n} c_k\mathbf{x}_k' = \varphi' \]
on I, so that φ ∈ S, and we conclude that x_p + Span(X) ⊆ S.

Next suppose φ ∈ S, so that φ' = Aφ + f on I, and hence φ is a particular solution to x' = Ax + f on I. Since x_p is also a particular solution, we obtain
\[ (\varphi - \mathbf{x}_p)' = \varphi' - \mathbf{x}_p' = (\mathbf{A}\varphi + \mathbf{f}) - (\mathbf{A}\mathbf{x}_p + \mathbf{f}) = \mathbf{A}(\varphi - \mathbf{x}_p), \]
and thus φ - x_p is a solution to the reduced system x' = Ax. By Theorem 8.13 the general solution to x' = Ax on I is Span(X), so φ - x_p ∈ Span(X) and there exist scalars c_1, . . . , c_n such that
\[ \varphi - \mathbf{x}_p = \sum_{k=1}^{n} c_k\mathbf{x}_k \]
on I. Now we have
\[ \varphi = \mathbf{x}_p + \sum_{k=1}^{n} c_k\mathbf{x}_k \in \mathbf{x}_p + \operatorname{Span}(X) \]
on I, and therefore S ⊆ x_p + Span(X). ∎

The presentation of the core theory of first-order linear systems is now complete. Since Example 8.6 shows that any nth-order linear ODE can be rendered as a first-order linear system with general solution consisting of vector-valued functions whose first components are solutions to the ODE, the theory given here is a generalization of the theory of linear ODEs given in §4.2.


8.3 – Homogeneous Linear Systems

Given an n × n matrix A, we say that v ≠ 0 is an eigenvector of A if Av = λv for some scalar λ, in which case λ is called an eigenvalue of A, v is said to be an eigenvector of A "corresponding to" λ, and (λ, v) is an eigenpair of A. It's an easy matter to show that if (λ, v) is an eigenpair, then so too is (λ, cv) for any scalar c ≠ 0.

What’s called the characteristic polynomial of an n×n matrix A is a polynomial functionPA given by

PA(r) = det(A− rI),

where I denotes the n× n identity matrix. As is established in §6.2 of [LIN], a scalar λ is aneigenvalue of A if and only if PA(λ) = 0. In §6.3 of [LIN] it is shown that PA(r) is an nthdegree polynomial, so by the Fundamental Theorem of Algebra the characteristic equation

PA(r) = 0

will have n roots (counting multiplicities), and hence A must have n eigenvalues. With theexpression for A(t) given in Example 8.6 it can be shown that the notion of a characteristicequation presented here is a generalization of the concept as presented in §4.4.

Here A will always be a coefficient matrix (with real-valued constant entries) for a first-order linear homogeneous system x' = Ax. The eigenvalue method is a means of determining a fundamental set to such a system principally by finding the eigenvalues of the coefficient matrix. As regards the eigenvalues of A in this context, there are three cases we will consider: (1) All eigenvalues are distinct and real; (2) All eigenvalues are distinct but not all real; and (3) There is a repeated real eigenvalue. The general solution to x' = Ax is most straightforwardly determined when A has distinct real eigenvalues, so this case will be addressed first. The other cases introduce varying measures of complications and will be entertained later in this section.

In proving the following theorem we use the fact (see §6.1 of [LIN]) that if λ_1, . . . , λ_m are distinct eigenvalues of A, and v_k is an eigenvector corresponding to λ_k for each 1 ≤ k ≤ m, then {v_1, . . . , v_m} is a linearly independent set.

Theorem 8.16. If A is an n × n matrix with n distinct real eigenvalues λ_1, . . . , λ_n, and v_j is an eigenvector corresponding to λ_j for each 1 ≤ j ≤ n, then
\[ \operatorname{Span}(\mathbf{v}_1e^{\lambda_1 t}, \ldots, \mathbf{v}_ne^{\lambda_n t}) = \left\{\sum_{j=1}^{n} c_j\mathbf{v}_je^{\lambda_j t} : c_1, \ldots, c_n \in \mathbb{R}\right\} \]
is the general solution to the system x' = Ax on (-∞, ∞).

Proof. Suppose A has distinct real eigenvalues λ_1, . . . , λ_n, and (λ_j, v_j) are eigenpairs with
\[ \mathbf{v}_j = \begin{bmatrix} v_{1j} \\ \vdots \\ v_{nj} \end{bmatrix} \]


for each 1 ≤ j ≤ n. Let x_j(t) = v_je^{λ_jt}, so that
\[ \mathbf{x}_j(t) = \begin{bmatrix} v_{1j}e^{\lambda_j t} \\ \vdots \\ v_{nj}e^{\lambda_j t} \end{bmatrix}. \]
Then
\[ \mathbf{x}_j'(t) = \begin{bmatrix} (v_{1j}e^{\lambda_j t})' \\ \vdots \\ (v_{nj}e^{\lambda_j t})' \end{bmatrix} = \begin{bmatrix} v_{1j}\lambda_je^{\lambda_j t} \\ \vdots \\ v_{nj}\lambda_je^{\lambda_j t} \end{bmatrix} = \lambda_je^{\lambda_j t}\begin{bmatrix} v_{1j} \\ \vdots \\ v_{nj} \end{bmatrix} = \lambda_je^{\lambda_j t}\mathbf{v}_j = \lambda_j\mathbf{x}_j \]
and
\[ \mathbf{A}\mathbf{x}_j(t) = \mathbf{A}(\mathbf{v}_je^{\lambda_j t}) = e^{\lambda_j t}(\mathbf{A}\mathbf{v}_j) = e^{\lambda_j t}(\lambda_j\mathbf{v}_j) = \lambda_j\mathbf{x}_j \]
for all t, so that x_j' = Ax_j on (-∞, ∞), and X = {x_1, . . . , x_n} is a set of solutions to x' = Ax on (-∞, ∞).

Now, the eigenvectors v_1, . . . , v_n are linearly independent since they correspond to distinct eigenvalues, and so det(v_1, . . . , v_n) ≠ 0 by the Invertible Matrix Theorem. Then
\[ W[X](t) = \det(\mathbf{x}_1(t), \ldots, \mathbf{x}_n(t)) = \det(\mathbf{v}_1e^{\lambda_1 t}, \ldots, \mathbf{v}_ne^{\lambda_n t}) = \det(\mathbf{v}_1, \ldots, \mathbf{v}_n)\prod_{j=1}^{n} e^{\lambda_j t} \neq 0 \]
for all t, implying that X is linearly independent on (-∞, ∞) by Theorem 8.10, and hence X is a fundamental set to x' = Ax on (-∞, ∞). The conclusion of the theorem now follows from Theorem 8.13. ∎

Example 8.17. Solve the system
\[ \begin{cases} \dfrac{dx}{dt} = -x + 4y - 2z \\[1ex] \dfrac{dy}{dt} = -3x + 4y \\[1ex] \dfrac{dz}{dt} = -3x + y + 3z \end{cases} \tag{8.21} \]

Solution. The system has the form x' = Ax with constant coefficient matrix
\[ \mathbf{A} = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}. \]
Expanding the determinant along the second row, the characteristic polynomial for A is
\[ P_\mathbf{A}(\lambda) = \det(\mathbf{A} - \lambda\mathbf{I}) = \begin{vmatrix} -1-\lambda & 4 & -2 \\ -3 & 4-\lambda & 0 \\ -3 & 1 & 3-\lambda \end{vmatrix} \]
\[ = (-1)^{2+1}(-3)\begin{vmatrix} 4 & -2 \\ 1 & 3-\lambda \end{vmatrix} + (-1)^{2+2}(4-\lambda)\begin{vmatrix} -1-\lambda & -2 \\ -3 & 3-\lambda \end{vmatrix} = -\lambda^3 + 6\lambda^2 - 11\lambda + 6, \]


and so
\[ P_\mathbf{A}(\lambda) = 0 \;\Leftrightarrow\; \lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0. \]
By the Rational Zeros Theorem of algebra, the only rational numbers that may be zeros of P_A(λ) are ±1, ±2, ±3 and ±6. It's an easy matter to verify that 1 is in fact a zero, and so by the Factor Theorem of algebra λ - 1 must be a factor of P_A(λ). Now,
\[ \frac{\lambda^3 - 6\lambda^2 + 11\lambda - 6}{\lambda - 1} = \lambda^2 - 5\lambda + 6, \]
whence we obtain
\[ \lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0 \;\Leftrightarrow\; (\lambda-1)(\lambda^2 - 5\lambda + 6) = 0 \;\Leftrightarrow\; (\lambda-1)(\lambda-2)(\lambda-3) = 0, \]

and therefore P_A(λ) = 0 if and only if λ = 1, 2, 3. The eigenvalues of A are 1, 2, 3.

To find an eigenvector corresponding to the eigenvalue 1, we solve Ax = 1x. Now,
\[ \mathbf{A}\mathbf{x} = 1\mathbf{x} \;\Leftrightarrow\; \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \;\Leftrightarrow\; \begin{cases} -x + 4y - 2z = x \\ -3x + 4y = y \\ -3x + y + 3z = z \end{cases} \]
and hence
\[ \begin{cases} -x + 2y - z = 0 \\ -x + y = 0 \\ -3x + y + 2z = 0 \end{cases} \]

Apply Gaussian elimination on the augmented matrix, with r_i denoting the ith row:
\[ \begin{bmatrix} -1 & 2 & -1 & 0 \\ -1 & 1 & 0 & 0 \\ -3 & 1 & 2 & 0 \end{bmatrix} \xrightarrow[\,-3r_1+r_3\to r_3\,]{\,-r_1+r_2\to r_2\,} \begin{bmatrix} -1 & 2 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & -5 & 5 & 0 \end{bmatrix} \xrightarrow{\,-5r_2+r_3\to r_3\,} \begin{bmatrix} -1 & 2 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
So from the second row r_2 we have y = z, and from r_1 we have x = 2y - z = 2z - z = z. Replacing z with λ, so that x = y = z = λ, we find the solution set to be
\[ \left\{\begin{bmatrix} \lambda \\ \lambda \\ \lambda \end{bmatrix} : \lambda \in \mathbb{R}\right\} = \left\{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\lambda : \lambda \in \mathbb{R}\right\}, \]
and hence (1, [1, 1, 1]^⊤) is an eigenpair.

Next we find an eigenvector corresponding to 2; that is, we find some x ≠ 0 such that Ax = 2x. We have
\[ \mathbf{A}\mathbf{x} = 2\mathbf{x} \;\Leftrightarrow\; \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2x \\ 2y \\ 2z \end{bmatrix} \;\Leftrightarrow\; \begin{cases} -3x + 4y - 2z = 0 \\ -3x + 2y = 0 \\ -3x + y + z = 0 \end{cases} \]
Apply Gaussian elimination on the augmented matrix:
\[ \begin{bmatrix} -3 & 4 & -2 & 0 \\ -3 & 2 & 0 & 0 \\ -3 & 1 & 1 & 0 \end{bmatrix} \xrightarrow[\,-r_1+r_3\to r_3\,]{\,-r_1+r_2\to r_2\,} \begin{bmatrix} -3 & 4 & -2 & 0 \\ 0 & -2 & 2 & 0 \\ 0 & -3 & 3 & 0 \end{bmatrix} \xrightarrow{\,-\frac{3}{2}r_2+r_3\to r_3\,} \begin{bmatrix} -3 & 4 & -2 & 0 \\ 0 & -2 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \]


so y = z from r_2, and from r_1 we have
\[ x = \frac{4}{3}y - \frac{2}{3}z = \frac{4}{3}z - \frac{2}{3}z = \frac{2}{3}z. \]
If we replace z with 3λ we can express the solution set without fractions as
\[ \left\{\begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}\lambda : \lambda \in \mathbb{R}\right\}, \]
and hence (2, [2, 3, 3]^⊤) is an eigenpair.

and hence (2, [2, 3, 3]>) is an eigenpair.Finally we find an eigenvector corresponding to 3 by solving Ax = 3x.

Ax = 3x ⇔

−1 4 −2−3 4 0−3 1 3

xyz

=

3x3y3z

⇔−4x+ 4y − 2z = 0−3x+ y = 0−3x+ y = 0

.

The solution set is 1

34

λ : λ ∈ R

,and hence (3, [1, 3, 4]>) is an eigenpair.

By Theorem 8.16 we conclude that
\[ \mathbf{x} = c_1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}e^t + c_2\begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}e^{2t} + c_3\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}e^{3t} \]
is the general solution to (8.21) on (-∞, ∞). ∎