
Summary of Calculus

William G. Faris

December 6, 2004

Chapter 1

Functions

1.1 A catalog of functions

A function $f$ takes an input number in its domain and gives an output number in its range. If for each output number in the range there is only one corresponding number in the domain, then the function $f$ has an inverse function $f^{-1}$. That is, if $y = f(x)$, then $x$ is defined in terms of $y$ by $x = f^{-1}(y)$. The domain of the inverse function is the range of the original function, and the range of the inverse function is the domain of the original function.

Sometimes a function may not have an inverse function, but by restricting it to a smaller domain it will have an inverse function. In that case, the inverse function is determined by solving $y = f(x)$ for $x$, with $x$ in this restricted domain. For example, the squaring function $y = x^2$ is naturally defined on the domain of all real numbers $x$, and it does not have an inverse function. However, if we restrict the squaring function to the smaller domain of all $x \ge 0$, then there is an inverse function $x = y^{1/2}$, that is, taking the positive square root.

In mathematics it is customary to describe a function by what it does on input values. In a few cases there are explicit names for the functions. For instance, many calculators and computer languages have notations such as square and sqrt that describe the function itself. Thus, for example, the square function is the function that sends $x$ to $\mathrm{square}(x) = x^2$. Similarly, the square root function sqrt sends $x$ to $\mathrm{sqrt}(x) = \sqrt{x} = x^{1/2}$.

A function is often described by a graph, where ordinarily the horizontal axis represents the input, and the vertical axis represents the output. For the graph to describe a function, it must have the property that every vertical line intersects the graph at most once. For the graph to describe a function with an inverse, it must have the property that every horizontal line intersects the graph at most once. The graph of the inverse function is obtained by interchanging the roles of horizontal and vertical.

For a first example of a function, fix a number $p$ called the power. If $x$ is the input number, then $x^p$ is the output number. This power function may be defined on the domain consisting of all numbers $x > 0$, that is, on the interval $(0, +\infty)$. If $p > 0$ it may be defined on a larger domain consisting of all real numbers $x \ge 0$, that is, on the interval $[0, +\infty)$. In either case it sends the domain to itself. With these domains the function has an inverse function. Since $y = x^p$ implies $x = y^{1/p}$, this inverse function sends $y$ to $y^{1/p}$. Thus the inverse function of the $p$th power function is the $1/p$th power function.

In some circumstances the power function may be defined on a larger domain. Say that $p = n/k$, where $n$ and $k$ are integers, and $k$ is odd. Then the domain may be extended to include all numbers $x < 0$, that is, the interval $(-\infty, 0)$ is a subset of the domain. If $n$ is odd, then the function has the range equal to the domain, and the inverse function sends $y$ to $y^{1/p}$, where $1/p = k/n$. If $n$ is even, then the range is a subset of $[0, \infty)$. In this case the function with this larger domain has no inverse function.

For a second example, fix a number $a > 0$ called the base. To make the function interesting, take $a \neq 1$. If $x$ is the input number, then $a^x$ is the output number. This is the exponential function with base $a$. The domain consists of all real numbers, and the range consists of the interval $(0, +\infty)$. Since $y = a^x$ is equivalent to $x = \log_a(y)$, the inverse of the exponential function with base $a$ is the logarithm function with base $a$.

The most common choices of base are 2, $e$, and 10. The almost universal choice in calculus contexts is $a = e$. In this case, the exponential function is sometimes denoted exp, and the logarithm function is often denoted ln. Thus
\[ \exp^{-1} = \ln. \tag{1.1} \]
The reason for this choice is that $e$ is the only base for which the exponential function satisfies the inequality
\[ 1 + x \le e^x \tag{1.2} \]
for all $x$.

All the other exponential functions may be defined in terms of the one with base $e$. This is because $a^x = (e^{\ln(a)})^x = e^{\ln(a)x}$. All one has to know is the numerical value of the constant $\ln(a)$. Similarly, if $y = \log_a(x)$, then $x = a^y = e^{\ln(a)y}$, so $\ln(a)y = \ln(x)$, that is, $\log_a(x) = \ln(x)/\ln(a)$. Again the same constant is involved. Notice that the inequality for other bases $a > 1$ takes the form
\[ 1 + \ln(a)t \le a^t \tag{1.3} \]
for all $t$. The reason for the simplicity in the case of base $e$ is that $\ln(e) = 1$.
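The reduction to base $e$ can be checked numerically. The following Python sketch is only an illustration; the function names log_base, exp_base, and inequality_holds are made up for this example. It computes $\log_a(x)$ and $a^t$ through the constant $\ln(a)$ and spot-checks the inequality $1 + \ln(a)t \le a^t$ at a few sample values.

```python
import math

def log_base(x: float, a: float) -> float:
    """log_a(x) computed as ln(x) / ln(a); assumes x > 0, a > 0, a != 1."""
    return math.log(x) / math.log(a)

def exp_base(t: float, a: float) -> float:
    """a**t computed through base e as exp(ln(a) * t); assumes a > 0."""
    return math.exp(math.log(a) * t)

def inequality_holds(t: float, a: float) -> bool:
    """Spot check of 1 + ln(a) t <= a^t, with a tiny allowance for rounding."""
    return 1.0 + math.log(a) * t <= exp_base(t, a) + 1e-12

if __name__ == "__main__":
    print(log_base(8.0, 2.0))   # close to 3
    print(exp_base(3.0, 2.0))   # close to 8
    print(all(inequality_holds(t, 10.0) for t in (-2.0, -0.5, 0.0, 0.5, 2.0)))
```

A spot check at sample points is of course not a proof; it only illustrates that the single constant $\ln(a)$ is all that is needed.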

The next example is that of the trigonometric functions. These functions will always be defined with radians as inputs. (Degrees may be converted to radians by multiplying by $\pi/180$.) The functions sin and cos have domain consisting of all real numbers and range consisting of the interval $[-1, 1]$. With these natural domains they do not have inverse functions. However, if one restricts sin to $[-\pi/2, \pi/2]$ and cos to $[0, \pi]$, then the restricted functions have inverses, called arcsin and arccos. Thus
\[ \sin^{-1} = \arcsin \tag{1.4} \]
and
\[ \cos^{-1} = \arccos. \tag{1.5} \]

The tangent function tan is defined by dividing the output of the sine function by the output of the cosine function. Since the cosine is zero at odd multiples of $\pi/2$, the domain of the tangent function must exclude these points. The tangent function does not have an inverse unless it is restricted to a smaller interval, and the natural restriction is to $(-\pi/2, \pi/2)$. Then the inverse function is
\[ \tan^{-1} = \arctan. \tag{1.6} \]

A final example of a function is a constant function. This gives the constant output $c$ for every input.

1.2 Exponentials beat powers

An exponential function will always be larger than a power function for sufficiently large input values. In fact, from $1 + x \le e^x$ we see that for $x \ge 0$ we have $(1 + x)^n \le e^{nx}$ for each $n = 1, 2, 3, \ldots$. We may set $t = nx$ and get
\[ \left(1 + \frac{t}{n}\right)^n \le e^t \tag{1.7} \]
for $t \ge 0$. This says that the exponential function grows at least as fast as an $n$th degree polynomial. For example, when $n = 2$ we have $1 + t + \frac{1}{4}t^2 \le e^t$ for $t \ge 0$.

From $1 + x \le e^x$ we get $1 + \ln(y) \le y$ for all $y > 0$. Set $y = s^p$ for $p > 0$. This gives $1 + p\ln(s) \le s^p$, or
\[ \ln(s) \le \frac{1}{p}(s^p - 1) \tag{1.8} \]
for $s > 0$. This says that the logarithm function grows no faster than a $p$th power function. For example, when $p = 1/2$ we have $1/p = 2$, so $\ln(s) \le 2(\sqrt{s} - 1)$ for $s > 0$.
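The two bounds of this section are easy to test at sample points. The Python sketch below is an illustration only, with made-up helper names poly_bound and log_bound; it evaluates $(1 + t/n)^n$ against $e^t$ and $\ln(s)$ against $(s^p - 1)/p$.

```python
import math

def poly_bound(t: float, n: int) -> float:
    """Left side of the polynomial bound: (1 + t/n)**n, for t >= 0."""
    return (1.0 + t / n) ** n

def log_bound(s: float, p: float) -> float:
    """Right side of the logarithm bound: (s**p - 1) / p, for s > 0, p > 0."""
    return (s ** p - 1.0) / p

if __name__ == "__main__":
    for t in (0.0, 1.0, 10.0, 50.0):
        print(t, poly_bound(t, 2), math.exp(t))
    for s in (0.1, 1.0, 100.0, 1.0e6):
        print(s, math.log(s), log_bound(s, 0.5))
```

In both tables the gap widens rapidly as the input grows, which is the sense in which exponentials beat powers and powers beat logarithms.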

1.3 Combining functions

Functions may be combined by addition, subtraction, multiplication, and division. This is done by performing the corresponding operation on the output values. Thus the value of the sum or difference function $f \pm g$ at $x$ is
\[ (f \pm g)(x) = f(x) \pm g(x). \tag{1.9} \]
The value of the product function $f \cdot g$ at $x$ is
\[ (f \cdot g)(x) = f(x) \cdot g(x). \tag{1.10} \]


The value of the quotient function $\frac{f}{g}$ at $x$ is
\[ \left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)}. \tag{1.11} \]

Here the domain is restricted to those $x$ for which the denominator $g(x) \neq 0$.

Start with the constant functions and the first power function. If we repeatedly apply the sum, difference, and product operations, we obtain the polynomial functions. If in addition we apply the quotient operation, we obtain the rational functions.

Another method of combining functions is composition. The composition $f \circ g$ is defined by
\[ (f \circ g)(x) = f(g(x)). \tag{1.12} \]
Thus the output of $g$ becomes the input of $f$. In other words, the output is obtained by first applying $g$ and then applying $f$.

The order in which functions are composed is important. Thus, for example, the function $\sin \circ\, \mathrm{square}$ with input $x$ has output $\sin(x^2)$. On the other hand, the function $\mathrm{square} \circ \sin$ with input $x$ has output $(\sin(x))^2$. In general, the composition $f \circ g$ represents the action of $g$ followed by the action of $f$.
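Composition translates directly into code. In the following Python sketch the helper compose and the function square are names chosen for this illustration; only math.sin comes from the standard library. It builds both orders of composition and shows that they give different outputs.

```python
import math
from typing import Callable

Func = Callable[[float], float]

def compose(f: Func, g: Func) -> Func:
    """Return the composition x -> f(g(x)): apply g first, then f."""
    return lambda x: f(g(x))

def square(x: float) -> float:
    return x * x

sin_of_square = compose(math.sin, square)   # x -> sin(x^2)
square_of_sin = compose(square, math.sin)   # x -> (sin(x))^2

if __name__ == "__main__":
    x = 2.0
    print(sin_of_square(x), square_of_sin(x))
```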

The notion of inverse function is closely related to the notion of composition. If $f^{-1}$ is the inverse function to $f$, so that $y = f(x)$ is equivalent to $x = f^{-1}(y)$, then $f^{-1}(f(x)) = x$ and $f(f^{-1}(y)) = y$. Thus $\arcsin(\sin(x)) = x$ and $\sin(\arcsin(y)) = y$, where $-\pi/2 \le x \le \pi/2$ and $-1 \le y \le 1$.

It is extremely important to avoid confusion between the notation $g^{-1}(x)$ for inverse function and the notation $g(x)^{-1} = 1/g(x)$ for reciprocal function. These both play an important role in calculus, and the fact that the notation is essentially the same requires constant vigilance. Thus it is often useful in division problems to note that the reciprocal (multiplicative inverse)
\[ \frac{1}{g(x)} = g(x)^{-1} \tag{1.13} \]
is the composition where one first applies $g$ and then the $-1$ power function. This has nothing to do with the inverse function of $g$.

There are some rather simple compositions that are important in practice. The first is the shift. Say that $f$ is a given function. Then we can shift it up by $k$ if we define a new function with output $y$ satisfying $y - k = f(x)$, or $y = f(x) + k$. We can shift it right by $h$ if we define a new function with output $y$ satisfying $y = f(x - h)$. A nice example of the latter is when cos is changed to sin by shifting to the right by $\pi/2$, since $\cos(x - \pi/2) = \sin(x)$.

Another is the stretch. Say that $f$ is a given function. Then we can stretch it vertically by $c > 0$ if we define a new function with output $y$ satisfying $y/c = f(x)$, or $y = cf(x)$. We can stretch it horizontally by $a > 0$ if we define a new function with output $y$ satisfying $y = f(x/a)$.

Another one is the reflection. Say that $f$ is a given function. Then we can reflect it vertically if we define a new function with output $y$ satisfying $y = -f(x)$. We can reflect it horizontally if we define a new function with output $y$ satisfying $y = f(-x)$.

A function is even if it is unchanged by horizontal reflection, that is, if $f(x) = f(-x)$ for all $x$. The standard example is $f = \cos$. Another example is the $p$th power where $p = n/k$ with $k$ odd and $n$ even. A function is odd if horizontal reflection is the same as vertical reflection, that is, if $f(-x) = -f(x)$ for all $x$. The standard example is sin. Another example is the $p$th power where $p = n/k$ with $k$ odd and $n$ odd.
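Shifts, stretches, and reflections are all compositions, so they can be written as small function transformers. The Python sketch below is an illustration; all helper names (shift_right, stretch_horizontal, reflect_horizontal, reflect_vertical, agree) and the sample points are assumptions of this example. It spot-checks that cos shifted right by $\pi/2$ is sin, that cos is even, and that sin is odd.

```python
import math
from typing import Callable

Func = Callable[[float], float]

def shift_right(f: Func, h: float) -> Func:
    """x -> f(x - h)."""
    return lambda x: f(x - h)

def stretch_horizontal(f: Func, a: float) -> Func:
    """x -> f(x / a), for a > 0."""
    return lambda x: f(x / a)

def reflect_horizontal(f: Func) -> Func:
    """x -> f(-x)."""
    return lambda x: f(-x)

def reflect_vertical(f: Func) -> Func:
    """x -> -f(x)."""
    return lambda x: -f(x)

SAMPLES = [-2.0, -0.7, 0.0, 0.3, 1.5]

def agree(f: Func, g: Func, tol: float = 1e-12) -> bool:
    """True if f and g agree at each sample point (a spot check, not a proof)."""
    return all(abs(f(x) - g(x)) <= tol for x in SAMPLES)

if __name__ == "__main__":
    print(agree(shift_right(math.cos, math.pi / 2), math.sin))
    print(agree(reflect_horizontal(math.cos), math.cos))
    print(agree(reflect_horizontal(math.sin), reflect_vertical(math.sin)))
```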

1.4 Units and dimensions

In applications many quantities are measured in units. Thus length may be measured in meters, time in seconds, and mass in kilograms. In general, dimensions of length, time, and mass are indicated by $L$, $T$, and $M$. The specification of length, time, mass, and so on is called the dimensions of the quantity. The formula $V = s^3$ for the volume of a box of side $s$ is a good example. In this formula the input $s$ has dimension $L$ of length, while the output $V$ has dimensions $L^3$. Thus if $s$ is measured in meters, then $V$ is measured in cubic meters. Similarly, velocity has dimensions $L/T$ of length over time, acceleration has dimensions $L/T^2$ of length over time squared, and force has dimensions $F = ML/T^2$ of mass times length over time squared. Velocity could be measured in meters per second, acceleration in meters per second squared, and force in newtons. Other important physical quantities are energy with dimensions $FL$ and power with dimensions $FL/T$. The unit of energy is the joule, while the unit of power is the watt. (Thus a joule is a watt-second, while 3.6 megajoules is a kilowatt-hour.)

In applications we have relations like $u = Ae^{kt}$, where $t$ is time. This defines $u$ as a function of $t$. Here $u$ is some positive quantity that grows exponentially with time (if $k > 0$) or decays exponentially with time (if $k < 0$). The input to the exponential function must be independent of the units, that is, dimensionless. Therefore, if $t$ is measured in seconds, the continuous growth rate $k$ must be measured in inverse seconds, so that $kt$ is a dimensionless number. That is, the growth rate has dimension $1/T$ of inverse time.

One modification of this scheme is to have $u = Ae^{k(t - t_0)}$. In this case $t_0$ is a starting time. Again this defines $u$ as a function of $t$, but now $A$ is the value of $u$ at the starting time $t_0$. However, this reduces to the previous case if we write
\[ u = Ae^{k(t - t_0)} = A_1 e^{kt}, \tag{1.14} \]
where $A_1 = Ae^{-kt_0}$. Notice that the output of the exponential function is also dimensionless, so the dimension of $A_1$ is the same as the dimension of $A$.

In applications it is common to pick a base for the exponential function that is adapted to the particular situation. The idea is to choose a convenient unit of time $\Delta t > 0$. Thus $\Delta t$ could be a half hour or 30 minutes. Define
\[ a = e^{k\Delta t}. \tag{1.15} \]
Then
\[ a^{t/\Delta t} = e^{kt}. \tag{1.16} \]

The reason for the choice of base $a$ is that it is the growth factor for the chosen time unit $\Delta t$. Conversely, one can choose a growth factor such as $a = 2$; then the time unit $\Delta t$ is the doubling time. (These concepts have to be modified in an obvious way if $k$ is negative. Then the growth factor becomes a decay factor, and the doubling time becomes a half life.)

Often one writes $a = 1 + r$, where $r$ is the increase over 1. If $k\Delta t$ is small, then since $e^{k\Delta t} \approx 1 + k\Delta t$, it follows that $r \approx k\Delta t$. It is not surprising that people confuse these two quantities.
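The relation between the continuous growth rate $k$, the growth factor $a$, and the doubling time can be tried out with numbers. In this Python sketch the rate $k = 0.05$ per year is a hypothetical value, and the function names growth_factor and doubling_time are chosen for this illustration.

```python
import math

def growth_factor(k: float, dt: float) -> float:
    """a = exp(k * dt), the growth factor over one time unit dt."""
    return math.exp(k * dt)

def doubling_time(k: float) -> float:
    """The time unit dt for which the growth factor is 2; assumes k > 0."""
    return math.log(2.0) / k

if __name__ == "__main__":
    k = 0.05                              # hypothetical rate, per year
    dt = doubling_time(k)
    a = growth_factor(k, dt)
    t = 7.0
    print(dt, a)                          # doubling time and its growth factor
    print(a ** (t / dt), math.exp(k * t)) # the two sides of a^(t/dt) = e^(kt)
    r = growth_factor(k, 1.0) - 1.0
    print(r, k * 1.0)                     # r and k*dt are close but not equal
```

The last line shows why $r$ and $k\Delta t$ are easily confused: for small $k\Delta t$ they differ only in the second order term.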

A more interesting case where the same ideas apply is when we have a relation like $u = A\cos(\omega t)$, where $t$ is time. This defines $u$ as a function of $t$. The input to the cosine function is in radians and therefore must be independent of the units, that is, dimensionless. Therefore, if $t$ is measured in seconds, the angular frequency $\omega$ must be measured in radians per second, so that the phase $\omega t$ is a dimensionless number. That is, the angular frequency $\omega$ has dimension $1/T$ of inverse time.

One modification of this scheme is to have $u = A\cos(\omega(t - t_0))$. In this case $t_0$ is the time shift. Again this defines $u$ as a function of $t$, but now $A$ is the value of $u$ at the starting time $t_0$. However, this reduces to something more like the previous case if we write
\[ u = A_1\cos(\omega t) + B_1\sin(\omega t), \tag{1.17} \]
where $A_1 = A\cos(\omega t_0)$ and $B_1 = A\sin(\omega t_0)$. Notice that the outputs of the sine and cosine functions are also dimensionless, so the dimensions of $A_1$ and $B_1$ are the same as the dimension of $A$.

Sometimes we write $\phi = \omega t_0$ to get a dimensionless constant called the phase shift. In this notation $u = A\cos(\omega t - \phi)$.
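The expansion of the shifted cosine follows from the angle difference formula $\cos(\omega t - \omega t_0) = \cos(\omega t)\cos(\omega t_0) + \sin(\omega t)\sin(\omega t_0)$, so both coefficients carry the same amplitude $A$. The Python sketch below compares the two forms numerically; the function names and the sample values of $A$, $\omega$, and $t_0$ are assumptions made for this illustration.

```python
import math

def shifted_cosine(A: float, omega: float, t0: float, t: float) -> float:
    """u = A cos(omega (t - t0))."""
    return A * math.cos(omega * (t - t0))

def expanded_cosine(A: float, omega: float, t0: float, t: float) -> float:
    """u = A1 cos(omega t) + B1 sin(omega t), with
    A1 = A cos(omega t0) and B1 = A sin(omega t0)."""
    a1 = A * math.cos(omega * t0)
    b1 = A * math.sin(omega * t0)
    return a1 * math.cos(omega * t) + b1 * math.sin(omega * t)

if __name__ == "__main__":
    A, omega, t0 = 2.5, 3.0, 0.4          # hypothetical sample values
    for t in (0.0, 0.4, 1.0, 2.7):
        print(t, shifted_cosine(A, omega, t0, t), expanded_cosine(A, omega, t0, t))
```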

There are several variations on these notions. The frequency $\nu$ is related to the angular frequency by
\[ \omega = 2\pi\nu. \tag{1.18} \]
This is because radians are being accumulated faster than complete rotations: each complete rotation is $2\pi$ radians. Often the frequency is measured in hertz (another name for inverse seconds). The frequency $\nu$ is related to the period $T$ by
\[ \nu = \frac{1}{T}. \tag{1.19} \]
Therefore the angular frequency $\omega$ is related to the period by
\[ \omega = \frac{2\pi}{T}. \tag{1.20} \]

The same ideas apply to a wave with variation in some direction in space rather than in time. Thus we might have a relation $u = A\cos(kx)$, where $x$ is distance. This defines $u$ as a function of $x$. The input to the cosine function must be independent of the units, that is, dimensionless. Therefore, if $x$ is measured in meters, the wave number $k$ must be measured in inverse meters, so that $kx$ is a dimensionless number. That is, the wave number has dimension $1/L$ of inverse length. The wave number is related to the wavelength $\lambda$ by
\[ k = \frac{2\pi}{\lambda}. \tag{1.21} \]

1.5 Constant speed wave propagation

For light waves (and also for sound waves) there is a remarkable relation between these quantities. Let $c$ be the speed of these waves. One possible wave has the form
\[ \cos(kx - \omega t) = \cos(kx)\cos(\omega t) + \sin(kx)\sin(\omega t). \tag{1.22} \]
For simplicity assume that $\omega$ and $k$ are positive. (Otherwise use absolute values.) Then the speed $c$ satisfies
\[ \omega = ck. \tag{1.23} \]
This immediately gives
\[ \nu\lambda = c. \tag{1.24} \]

The frequency times the wavelength is the speed. Frequency is measured in hertz (inverse seconds), while wavelength is measured in meters. The speed $c$ is of course measured in meters per second.

The speed of light is about $c = 3 \cdot 10^8$ meters per second. Wavelengths of 3 km, 300 m, 30 m, 3 m, and 30 cm correspond to radio frequencies of 100 kilohertz, 1 megahertz, 10 megahertz, 100 megahertz, and 1 gigahertz. These are what are called the LF, MF, HF, VHF, and UHF radio bands. As the frequencies get higher and the wavelengths get shorter one gets into the radar and microwave range and eventually to the infrared at about $3 \cdot 10^{-6}$ meter wavelength. Then $3 \cdot 10^{-7}$ meters is already in the ultraviolet.

The speed of sound is about $c = 3.3 \cdot 10^2$ meters per second. Thus a frequency of 330 hertz (cycles per second) corresponds to a wave of length about one meter. This is about the size of a low string instrument. If you triple the frequency and divide the length by three, you get something more like a violin or a flute.
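The relation between frequency, wavelength, and speed makes these figures easy to reproduce. The Python sketch below uses the rounded speeds quoted in the text; the constant and function names are chosen for this illustration.

```python
SPEED_OF_LIGHT = 3.0e8    # meters per second, rounded as in the text
SPEED_OF_SOUND = 3.3e2    # meters per second, rounded as in the text

def frequency_from_wavelength(wavelength: float, speed: float) -> float:
    """Frequency in hertz from wavelength in meters: nu = c / lambda."""
    return speed / wavelength

def wavelength_from_frequency(freq: float, speed: float) -> float:
    """Wavelength in meters from frequency in hertz: lambda = c / nu."""
    return speed / freq

if __name__ == "__main__":
    # Radio wavelengths in meters: 3 km, 300 m, 30 m, 3 m, 30 cm.
    for wavelength in (3000.0, 300.0, 30.0, 3.0, 0.30):
        print(wavelength, frequency_from_wavelength(wavelength, SPEED_OF_LIGHT))
    # A 330 hertz sound wave and a wave of triple that frequency.
    print(wavelength_from_frequency(330.0, SPEED_OF_SOUND))
    print(wavelength_from_frequency(990.0, SPEED_OF_SOUND))
```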

1.6 Appendix: Metric units

Fundamental metric units for length, mass, and time are the meter, kilogram, and second, abbreviated by m, kg, and s. The units minute, hour, day are abbreviated min, hr, d. Sometimes radian is abbreviated by rad.

Other units for frequency, force, energy, and power are the hertz (inverse second), newton (kilogram meter per second squared), joule (newton meter), and watt (joule per second), abbreviated by Hz, N, J, W.


The multipliers $10^{-9}$, $10^{-6}$, $10^{-3}$ are called nano, micro, and milli, and are abbreviated as n, µ, and m. Sometimes $10^{-2}$ is called centi and abbreviated as c. The multipliers $10^3$, $10^6$, $10^9$ are called kilo, mega, and giga and are abbreviated by k, M, G. Thus, for example, a kilowatt-hour is 1 kW hr = 3.6 MJ.

The situation is quite confused in the non-metric world. The question is about the choice of the unit of mass. Some authors like to use the foot, pound-mass, poundal system. Then the unit of mass is the pound-mass, and the unit of force is the poundal. Others use the foot, slug, pound-force system. Then the unit of mass is the slug, and the unit of force is the pound-force.

A foot is 0.3048 meter. A pound-mass is 0.45359 kilogram. (A kilogram is thus about 2.2 pound-mass.)

In the foot, pound-mass, second system the unit of force is the poundal, which is a pound-mass foot per second squared. This is 0.45359 kilogram times 0.3048 meter per second squared, which works out to be 0.13825 newton.

The acceleration of gravity at the earth's surface is 9.80665 meter per second squared, or 32.1740 foot per second squared. A pound-force is a pound-mass times the acceleration of gravity, that is, 32.1740 poundal.

In the foot, slug, second system a slug is 32.1740 pound-mass, or 14.594 kilogram. The unit of force is the pound-force, which is 32.1740 poundal, or 4.4482 newton.
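These conversion factors all follow from three numbers: the foot in meters, the pound-mass in kilograms, and the acceleration of gravity. The Python sketch below recomputes the derived values; the constant names are chosen for this illustration.

```python
FOOT_IN_METERS = 0.3048
POUND_MASS_IN_KG = 0.45359
GRAVITY_METRIC = 9.80665      # meters per second squared

# One poundal is one pound-mass foot per second squared, in newtons.
POUNDAL_IN_NEWTONS = POUND_MASS_IN_KG * FOOT_IN_METERS

# Acceleration of gravity in feet per second squared.
GRAVITY_IN_FEET = GRAVITY_METRIC / FOOT_IN_METERS

# One pound-force is a pound-mass times the acceleration of gravity.
POUND_FORCE_IN_NEWTONS = POUND_MASS_IN_KG * GRAVITY_METRIC

# One slug is the mass that one pound-force accelerates at one foot per second squared.
SLUG_IN_KG = GRAVITY_IN_FEET * POUND_MASS_IN_KG

if __name__ == "__main__":
    print(POUNDAL_IN_NEWTONS)
    print(GRAVITY_IN_FEET)
    print(POUND_FORCE_IN_NEWTONS)
    print(SLUG_IN_KG)
```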

Chapter 2

The derivative

2.1 Limits

Say that $g$ is a function. We say that $\lim_{t \to c} g(t) = L$ provided that $g(t)$ can be made as close as one wishes to $L$ by taking $t$ sufficiently close to $c$. In other words, one can get an output of $g$ arbitrarily close to $L$ by taking the input close enough to $c$.

The official definition of $\lim_{t \to c} g(t) = L$ is that for every number $\epsilon > 0$ (no matter how small), there is a number $\delta > 0$ such that for every number $t \neq c$, if $|t - c| < \delta$, then $|g(t) - L| < \epsilon$.

2.2 A limit for the exponential function

We have seen that $1 + x \le e^x$ for all $x$. It follows that $1 - x \le e^{-x}$ for all $x$. If $x < 1$, then $1 - x > 0$, and so we may write this in the equivalent form $e^x \le 1/(1 - x)$. This proves the result
\[ 1 + h \le e^h \le \frac{1}{1 - h} \tag{2.1} \]
for all $h < 1$. As a consequence
\[ h \le e^h - 1 \le \frac{h}{1 - h} \tag{2.2} \]
for all $h < 1$. From this we see that
\[ 1 \le \frac{e^h - 1}{h} \le \frac{1}{1 - h} \tag{2.3} \]
for $0 < h < 1$ and
\[ \frac{1}{1 - h} \le \frac{e^h - 1}{h} \le 1 \tag{2.4} \]
for $h < 0$. In either case, if $h$ is close to 0, then $1/(1 - h)$ is close to 1, and hence by the inequality $(e^h - 1)/h$ is close to 1. This proves
\[ \lim_{h \to 0} \frac{e^h - 1}{h} = 1. \tag{2.5} \]

Notice that one could prove in the same way that
\[ \lim_{t \to 0} \frac{a^t - 1}{t} = \ln(a). \tag{2.6} \]

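The reason for the simplicity of the base $e$ case is that $\ln(e) = 1$.

This limit helps clarify the meaning of the continuous growth rate $k$. Recall that for each time interval $\Delta t$ there is a corresponding growth rate $r_{\Delta t}$ such that
\[ (1 + r_{\Delta t})^{t/\Delta t} = e^{kt}. \tag{2.7} \]
It follows that
\[ r_{\Delta t} = e^{k\Delta t} - 1 \tag{2.8} \]
and so
\[ \frac{r_{\Delta t}}{\Delta t} = k\,\frac{e^{k\Delta t} - 1}{k\Delta t}. \tag{2.9} \]
Let $\Delta t \to 0$. Then also $k\Delta t \to 0$. So
\[ \lim_{\Delta t \to 0} \frac{r_{\Delta t}}{\Delta t} = k. \tag{2.10} \]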

This says that the continuous growth rate is found by comparing the growth during a small time interval to the length of the time interval. So if the growth rate is an increase by 0.003 (or 0.3 percent) in 1/100 year, then the continuous growth rate $k = 100\ln(1.003) = 0.29955$ is close to $0.003/(1/100) = 0.30$ per year. This should be contrasted with the actual growth in a year. This is by a factor of $(1.003)^{100} = 1.349$. This of course is the same as $e^{0.29955} = 1.349$. So in a year the growth is an increase of 34.9 percent.
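The figures in this example can be recomputed directly. In the Python sketch below the function name continuous_rate is chosen for this illustration; it solves $1 + r = e^{k\Delta t}$ for $k$ and then shows the quotient $r_{\Delta t}/\Delta t$ approaching $k$ as $\Delta t$ shrinks.

```python
import math

def continuous_rate(r: float, dt: float) -> float:
    """Continuous growth rate k determined by 1 + r = exp(k * dt)."""
    return math.log(1.0 + r) / dt

if __name__ == "__main__":
    r, dt = 0.003, 1.0 / 100.0
    k = continuous_rate(r, dt)
    print(k, r / dt)                        # k compared with the naive quotient
    print((1.0 + r) ** 100, math.exp(k))    # growth factor over one year, two ways
    for step in (1.0, 0.1, 0.01, 0.001):
        r_step = math.exp(k * step) - 1.0
        print(step, r_step / step)          # tends to k as step tends to 0
```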

2.3 Limits for the sine and cosine functions

Look at the circle centered at the origin with radius one. Consider $h$ with $0 < h < \pi/2$. Then $(\cos(h), \sin(h))$ is the point on the circle corresponding to angle $h$ in radians. The distance along the circle from the point $(1, 0)$ to the point $(\cos(h), \sin(h))$ is $h$. From this it is clear that $\sin(h) \le h$ for $0 < h < \pi/2$.

Now consider the area of the sector inside the circle corresponding to angle 0 to angle $h$. This is $(1/2)h$. Compare this to the area of the triangle running from $(0, 0)$ to $(1, 0)$ to $(1, \tan(h))$. This area is $(1/2)\tan(h)$, so we have $(1/2)h \le (1/2)\tan(h)$.

From the above inequalities we see that
\[ \cos(h) \le \frac{\sin(h)}{h} \le 1 \tag{2.11} \]
for $0 < |h| < \pi/2$. This is true for both positive and negative values of $h$, since these are all even functions. The conclusion is the important limit result
\[ \lim_{h \to 0} \frac{\sin(h)}{h} = 1. \tag{2.12} \]

We can get a result for cosine from the result for sine and from $\cos^2(h) + \sin^2(h) = 1$. Write this in the form $\sin^2(h) = (1 + \cos(h))(1 - \cos(h))$. From this we get
\[ \frac{\sin^2(h)}{h^2} = (1 + \cos(h))\,\frac{1 - \cos(h)}{h^2}. \tag{2.13} \]
Consider the limit as $h \to 0$. The left hand side has limit 1, while the first factor on the right hand side has limit 2. This proves
\[ \lim_{h \to 0} \frac{\cos(h) - 1}{h^2} = -\frac{1}{2}. \tag{2.14} \]
As a corollary we get that
\[ \lim_{h \to 0} \frac{\cos(h) - 1}{h} = 0. \tag{2.15} \]
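These limits can be watched numerically. The Python sketch below is an illustration with made-up function names; it tabulates $\sin(h)/h$ and $(\cos(h) - 1)/h^2$ for shrinking $h$ and spot-checks the squeeze $\cos(h) \le \sin(h)/h \le 1$ on both sides of zero.

```python
import math

def sine_quotient(h: float) -> float:
    """sin(h) / h for h != 0."""
    return math.sin(h) / h

def cosine_quotient(h: float) -> float:
    """(cos(h) - 1) / h**2 for h != 0."""
    return (math.cos(h) - 1.0) / (h * h)

def squeezed(h: float) -> bool:
    """Spot check of cos(h) <= sin(h)/h <= 1 for 0 < |h| < pi/2."""
    return math.cos(h) <= sine_quotient(h) <= 1.0

if __name__ == "__main__":
    for h in (0.5, 0.1, 0.01, 0.001):
        print(h, sine_quotient(h), cosine_quotient(h))
    print(all(squeezed(h) for h in (-1.5, -0.3, 0.3, 1.5)))
```

Very small values of $h$ (say below $10^{-7}$) are avoided here, since the subtraction $\cos(h) - 1$ then loses most of its significant digits in floating point.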

2.4 Continuity

A function $f$ is continuous provided that for all $x$ in the domain we have
\[ \lim_{t \to x} f(t) = f(x). \tag{2.16} \]
In other words, the output of $f$ is arbitrarily close to $f(x)$ if the input is close enough to $x$.

Here is another way of saying the same thing. A function $f$ is continuous provided that for all $x$ in the domain we have
\[ \lim_{h \to 0} f(x + h) = f(x). \tag{2.17} \]

2.5 The derivative

A function $f$ is differentiable provided that for all $x$ in the domain we have
\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f'(x). \tag{2.18} \]
The value $f'(x)$ is the derivative of the function $f$ at the input value $x$. In other words, the difference quotient may be made arbitrarily close to the derivative by making the denominator sufficiently close to zero.

The classic example is $f(x) = x^2$. We can write the difference quotient as
\[ \frac{(x + h)^2 - x^2}{h} = 2x + h. \tag{2.19} \]
The left hand side is defined for $h \neq 0$. However, it is evident from the right hand side that
\[ \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h} = 2x. \tag{2.20} \]

So $f'(x) = 2x$.

Another simple example is $g(u) = 1/u$. We can write
\[ \frac{1/(u + h) - 1/u}{h} = -\frac{1}{u(u + h)}. \tag{2.21} \]
Thus
\[ \lim_{h \to 0} \frac{1/(u + h) - 1/u}{h} = -\frac{1}{u^2}. \tag{2.22} \]

So $g'(u) = -1/u^2$.

Say that $y = f(x)$. Then the derivative may also be written in the form
\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}, \tag{2.23} \]
where $\Delta x \neq 0$ is the change in $x$ between two points and $\Delta y = f(x + \Delta x) - f(x)$ is the corresponding change in $y$ between the same two points. This makes explicit the fact that the derivative is a limit of quotients of differences. A common notation is the Leibniz notation, in which the derivative is written
\[ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \tag{2.24} \]

Since the two points become closer and closer, the derivative $dy/dx$ depends only on one point.

The fact that there are two notations in calculus is one of its most confusing aspects. However, the relation between the two notations is quite precise. If
\[ y = f(x) \tag{2.25} \]
then
\[ \frac{dy}{dx} = f'(x). \tag{2.26} \]

Here $f$ and $f'$ are functions, with $f'$ the derivative of $f$, while $x$ and $y$ are the input and output variables. Thus the two equations above are actually equations involving the outputs of the functions. That is, $dy/dx$ is the output of the derivative function $f'$ at the input $x$. If we want to evaluate $y$ or $dy/dx$ at the point where the input variable $x$ is $a$, then this value is $y_{(x=a)} = f(a)$ or $dy/dx_{(x=a)} = f'(a)$.

The interpretation of $dy/dx$ is as the rate of change of $y$ with respect to change in $x$. Thus if $dy/dx = f'(x) > 0$, then $y$ is increasing with $x$, while if $dy/dx = f'(x) < 0$, then $y$ is decreasing with $x$. If the value of $dy/dx$ at a particular point $x = a$ is zero, that is, if $dy/dx_{(x=a)} = f'(a) = 0$, then $y$ is hesitating at this point as $x$ changes, undecided whether to increase or decrease.

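Return to the examples. Suppose that $y = x^2$. We can write the difference quotient as
\[ \frac{\Delta y}{\Delta x} = \frac{(x + \Delta x)^2 - x^2}{\Delta x} = \frac{2x\Delta x + (\Delta x)^2}{\Delta x} = 2x + \Delta x. \tag{2.27} \]
The left hand side is defined for $\Delta x \neq 0$. However, it is evident from the right hand side that
\[ \lim_{\Delta x \to 0} \frac{(x + \Delta x)^2 - x^2}{\Delta x} = 2x. \tag{2.28} \]
So $dy/dx = 2x$. Notice in connection with this example that $(\Delta x)^2$ must be distinguished from $\Delta x^2 = 2x\Delta x + (\Delta x)^2$.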

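Another simple example is $w = 1/u$. We can write
\[ \frac{\Delta w}{\Delta u} = \frac{1/(u + \Delta u) - 1/u}{\Delta u} = -\frac{1}{u(u + \Delta u)}. \tag{2.29} \]
The left hand side is defined for $\Delta u \neq 0$. However, it is evident from the right hand side that
\[ \lim_{\Delta u \to 0} \frac{1/(u + \Delta u) - 1/u}{\Delta u} = -\frac{1}{u^2}. \tag{2.30} \]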

So $dw/du = -1/u^2$.

Warning: The Leibniz notation should serve to remind us of the definition of derivative as a limit of quotients. Thus an expression like $dy/dx$ or $dw/du$ is not a quotient, but is a limit of quotients $\Delta y/\Delta x$ or $\Delta w/\Delta u$.

Suppose that $s = f(t)$, where $t$ is time and $s$ is the position at time $t$. Then the velocity is
\[ v = \frac{ds}{dt} = f'(t). \tag{2.31} \]
The classic example is the falling body, where
\[ s = \frac{1}{2}gt^2, \tag{2.32} \]
and $g$ is a constant. This describes the distance below the starting point when an object is dropped from this point at time zero. Then
\[ \frac{\Delta s}{\Delta t} = \frac{\frac{1}{2}g(t + \Delta t)^2 - \frac{1}{2}gt^2}{\Delta t} = gt + \frac{1}{2}g\Delta t. \tag{2.33} \]
In this case
\[ v = \frac{ds}{dt} = gt. \tag{2.34} \]
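The falling body makes a convenient numerical test of the difference quotient. In the Python sketch below the names difference_quotient and fall_distance are chosen for this illustration, and the value of $g$ is the standard acceleration of gravity in meters per second squared. The average velocity over a time step $\Delta t$ should equal $gt + \frac{1}{2}g\Delta t$ and approach $gt$ as the step shrinks.

```python
from typing import Callable

G = 9.80665   # meters per second squared

def difference_quotient(f: Callable[[float], float], x: float, h: float) -> float:
    """Forward difference quotient (f(x + h) - f(x)) / h for h != 0."""
    return (f(x + h) - f(x)) / h

def fall_distance(t: float) -> float:
    """Distance fallen after time t: s = (1/2) g t^2."""
    return 0.5 * G * t * t

if __name__ == "__main__":
    t = 2.0
    for dt in (1.0, 0.1, 0.01, 0.001):
        average_velocity = difference_quotient(fall_distance, t, dt)
        print(dt, average_velocity, G * t + 0.5 * G * dt)
    print(G * t)   # the limiting velocity v = g t
```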


2.6 Derivative of the exponential function

If $\exp(x) = e^x$, then $\exp'(x) = e^x$. The exponential function is its own derivative. This is seen by calculating
\[ \frac{e^{x+h} - e^x}{h} = \frac{e^h - 1}{h}\,e^x. \tag{2.35} \]
By using the limit we know for the exponential function, we get
\[ \exp'(x) = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = e^x = \exp(x). \tag{2.36} \]

2.7 Derivatives of the sine and cosine functions

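By the addition formula $\sin(x + h) = \sin(x)\cos(h) + \cos(x)\sin(h)$ we have
\[ \frac{\sin(x + h) - \sin(x)}{h} = \frac{\cos(h) - 1}{h}\sin(x) + \frac{\sin(h)}{h}\cos(x). \tag{2.37} \]
Take the limit as $h \to 0$. This gives
\[ \sin'(x) = \cos(x). \tag{2.38} \]
Similarly, from $\cos(x + h) = \cos(x)\cos(h) - \sin(x)\sin(h)$,
\[ \frac{\cos(x + h) - \cos(x)}{h} = \frac{\cos(h) - 1}{h}\cos(x) - \frac{\sin(h)}{h}\sin(x). \tag{2.39} \]
Take the limit as $h \to 0$. This gives
\[ \cos'(x) = -\sin(x). \tag{2.40} \]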

2.8 Differentiability implies continuity

If $f$ is differentiable at $x$, then $f$ is continuous at $x$. This is because when $f'(x)$ exists, then
\[ \lim_{h \to 0} [f(x + h) - f(x)] = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \cdot \lim_{h \to 0} h = f'(x) \cdot 0 = 0. \tag{2.41} \]
This implies that $\lim_{h \to 0} f(x + h) = f(x)$, which is continuity at $x$.

If $f$ is continuous at $x$, then $f$ may or may not be differentiable at $x$. Here is an interesting example. Let $\mathrm{sqrt}(x) = \sqrt{x}$ be the square root function, defined for $x \ge 0$. This function is continuous at zero, since if a number is small, its square root is also small. The difference quotient is
\[ \frac{\sqrt{x + h} - \sqrt{x}}{h} = \frac{1}{\sqrt{x + h} + \sqrt{x}}. \tag{2.42} \]

(This is a mildly tricky bit of algebra to invent, but to check it is simple. The underlying principle is that $a^2 - b^2 = (a - b)(a + b)$ implies $a - b = (a^2 - b^2)/(a + b)$. Apply this to the numerator on the left hand side.) Now take the limit as $h$ approaches zero. If $x > 0$ the answer is just
\[ \mathrm{sqrt}'(x) = \frac{1}{2\sqrt{x}}. \tag{2.43} \]
However, if $x = 0$, then the difference quotient is $1/\sqrt{h}$. This makes sense at least if $h > 0$. However, it does not approach a real number in the limit as $h \to 0$; in fact, it gets larger and larger. So sqrt does not have a derivative at zero. (However, one could say that it has a right hand derivative with value $+\infty$, but this would involve a more general notion of derivative.)

Another important example is $f(x) = |x|$. This absolute value function is continuous at zero, since if a number is close to zero, then its absolute value is also close to zero. However, the difference quotient is
\[ \frac{|x + h| - |x|}{h} = \frac{2x + h}{|x + h| + |x|}. \tag{2.44} \]
(Check the algebra; the same trick works as in the square root example.) When $x \neq 0$ this has a limit $f'(x) = x/|x|$. However, there is a problem when $x = 0$. The difference quotient in this case is $h/|h|$ (or $|h|/h$, which is the same thing). When $h > 0$ this is 1, while when $h < 0$ this is $-1$. So making $h$ close to zero does not force the difference quotient to be close to some fixed number. Thus the absolute value function is not differentiable at zero. (However, one could say that it has a right hand derivative at zero with value 1 and a left hand derivative at zero with value $-1$.)
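Both failures of differentiability at zero show up clearly in numbers. The Python sketch below is an illustration; the helper name difference_quotient is chosen for this example. For the absolute value the quotient at zero is stuck at $1$ or $-1$ depending on the sign of $h$, while for the square root it grows without bound as $h > 0$ shrinks.

```python
import math
from typing import Callable

def difference_quotient(f: Callable[[float], float], x: float, h: float) -> float:
    """Forward difference quotient (f(x + h) - f(x)) / h for h != 0."""
    return (f(x + h) - f(x)) / h

if __name__ == "__main__":
    for h in (0.1, 0.001, 0.00001):
        right = difference_quotient(abs, 0.0, h)    # h > 0
        left = difference_quotient(abs, 0.0, -h)    # h < 0
        root = difference_quotient(math.sqrt, 0.0, h)
        print(h, right, left, root)
    # Away from zero the square root quotient settles down to 1 / (2 sqrt(x)).
    print(difference_quotient(math.sqrt, 4.0, 1e-6), 1.0 / (2.0 * math.sqrt(4.0)))
```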

2.9 The second derivative

If $y = f(x)$ and $dy/dx = f'(x)$, then it may well be that there is a second derivative, usually written as
\[ \frac{d^2y}{dx^2} = f''(x). \tag{2.45} \]
At the end of this section is an argument that a better version of the Leibniz notation for second derivative would be
\[ \frac{d^2y}{(dx)^2} = f''(x). \tag{2.46} \]
However, it is common to write the second derivative as $d^2y/dx^2$ without the parentheses around the $dx$. This is in spite of the fact that $dx^2$ has another meaning in the context $dx^2/dx = 2x$.

The second derivative is the rate of change of the rate of change. If it is positive, then the original function is concave up, while if it is negative, then the original function is concave down.


Consider the example $w = g(u) = 1/u$ with $dw/du = g'(u) = -1/u^2$. The fact that this is negative means that the original function is decreasing for all inputs $u \neq 0$. We can write
\[ \frac{-1/(u + h)^2 + 1/u^2}{h} = \frac{2u + h}{u^2(u + h)^2}. \tag{2.47} \]
Thus
\[ \lim_{h \to 0} \frac{-1/(u + h)^2 + 1/u^2}{h} = 2\,\frac{1}{u^3}. \tag{2.48} \]

So $d^2w/du^2 = g''(u) = 2/u^3$. From this one can see that the original function is concave up for $u > 0$ and concave down for $u < 0$.

If $s = f(t)$ is the position of a moving particle at time $t$, then the velocity is
\[ v = \frac{ds}{dt} = f'(t). \tag{2.49} \]
The acceleration is
\[ a = \frac{dv}{dt} = \frac{d^2s}{dt^2} = f''(t). \tag{2.50} \]

The Leibniz notation for second derivative is perhaps puzzling. However, we have seen that if we define $\Delta y = f(x + \Delta x) - f(x)$, then
\[ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \tag{2.51} \]

This is called the forward difference definition of the derivative. A completely equivalent definition would result from defining instead $\Delta y = f(x) - f(x - \Delta x)$. Then the derivative could also be defined with this backward difference definition. (From the point of view of symmetry and numerical accuracy an even nicer definition would be to take the average of the forward and backward differences.)

The most elegant way to do the second difference is to first perform the forward difference and then the backward difference. This would give
\[ \Delta^2 y = \Delta\Delta y = \Delta[f(x + \Delta x) - f(x)] = [f(x + \Delta x) - f(x)] - [f(x) - f(x - \Delta x)], \tag{2.52} \]
where $\Delta y = \Delta f(x) = [f(x + \Delta x) - f(x)]$ is defined by the forward difference and $\Delta\Delta y = \Delta[f(x + \Delta x) - f(x)]$ is defined by the backward difference. (If we first did the backward difference and then the forward difference, then we would get the same result.) Fact: The second derivative may be expressed in an alternate form as a single limit
\[ \frac{d^2y}{(dx)^2} = \lim_{\Delta x \to 0} \frac{\Delta^2 y}{(\Delta x)^2}. \tag{2.53} \]
This makes the point that the notation $d^2y/(dx)^2$ might be more suggestive than the more conventional $d^2y/dx^2$.
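The single-limit formula for the second derivative is also a practical recipe. The Python sketch below is an illustration with made-up function names; it forms the second difference $[f(x + \Delta x) - f(x)] - [f(x) - f(x - \Delta x)]$, divides by $(\Delta x)^2$, and compares the result for $f(u) = 1/u$ with the exact value $2/u^3$.

```python
from typing import Callable

def second_difference(f: Callable[[float], float], x: float, dx: float) -> float:
    """Forward difference followed by backward difference:
    [f(x + dx) - f(x)] - [f(x) - f(x - dx)]."""
    return (f(x + dx) - f(x)) - (f(x) - f(x - dx))

def second_derivative_estimate(f: Callable[[float], float], x: float, dx: float) -> float:
    """Second difference divided by (dx)^2."""
    return second_difference(f, x, dx) / (dx * dx)

def reciprocal(u: float) -> float:
    return 1.0 / u

if __name__ == "__main__":
    u = 2.0
    for dx in (0.1, 0.01, 0.001):
        print(dx, second_derivative_estimate(reciprocal, u, dx), 2.0 / u ** 3)
```

Taking $\Delta x$ much smaller than about $10^{-4}$ is counterproductive in floating point, because the second difference then consists mostly of rounding error.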

Chapter 3

Differentiation rules

3.1 Summary of differentiation rules

Here are the differentiation rules in the Leibniz notation. This is the notation that is most useful in mathematical modeling, that is, in applications of mathematics to the real world.

The rule for differentiating a constant is
\[ \frac{dc}{dx} = 0. \tag{3.1} \]
The rule for differentiating a power function is
\[ \frac{dx^p}{dx} = px^{p-1}. \tag{3.2} \]
One may use $x$ or $u$ or whatever for the input variable. The rule for differentiating the exponential function is
\[ \frac{de^u}{du} = e^u. \tag{3.3} \]
The rule for differentiating the sine function is
\[ \frac{d\sin(\theta)}{d\theta} = \cos(\theta). \tag{3.4} \]
The rule for differentiating the cosine function is
\[ \frac{d\cos(\theta)}{d\theta} = -\sin(\theta). \tag{3.5} \]
Let $u = f(x)$ and $v = g(x)$. The sum (difference) rule is
\[ \frac{d(u \pm v)}{dx} = \frac{du}{dx} \pm \frac{dv}{dx}. \tag{3.6} \]


The product rule is
\[ \frac{d(uv)}{dx} = \frac{du}{dx}v + u\frac{dv}{dx}. \tag{3.7} \]
The quotient rule is
\[ \frac{d\frac{u}{v}}{dx} = \frac{\frac{du}{dx}v - u\frac{dv}{dx}}{v^2}. \tag{3.8} \]
Set $y = f(u)$ and $u = g(x)$. The chain rule is
\[ \frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}. \tag{3.9} \]
After differentiation, substitute $u = g(x)$ in $dy/du$.

Take $y = f(u)$ with $u = f^{-1}(y)$. The inverse function rule follows from the chain rule by computing $1 = dy/dy = (dy/du)(du/dy)$. The result is
\[ \frac{du}{dy} = \frac{1}{\frac{dy}{du}}. \tag{3.10} \]
After differentiation, substitute $u = f^{-1}(y)$ in $dy/du$.

3.2 Differentiating inverse functions

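Since $de^u/du = e^u$, the derivative of the logarithm function $u = \ln(y)$ is

\[
\frac{d\ln(y)}{dy} = \frac{1}{e^u} = \frac{1}{e^{\ln(y)}} = \frac{1}{y} \tag{3.11}
\]

for $y > 0$.

Since $d\sin(\theta)/d\theta = \cos(\theta)$, the derivative of the inverse sine function $\theta = \sin^{-1}(y)$ with $-\pi/2 < \theta < \pi/2$ is

\[
\frac{d\sin^{-1}(y)}{dy} = \frac{1}{\cos(\sin^{-1}(y))} = \frac{1}{\sqrt{1-y^2}} \tag{3.12}
\]

for $-1 < y < 1$.

Parenthetical note: Since $d\cos(\theta)/d\theta = -\sin(\theta)$, the derivative of the inverse cosine function $\theta = \cos^{-1}(y)$ with $0 < \theta < \pi$ is

\[
\frac{d\cos^{-1}(y)}{dy} = \frac{1}{-\sin(\cos^{-1}(y))} = -\frac{1}{\sqrt{1-y^2}} \tag{3.13}
\]

for $-1 < y < 1$. The fact that the answer is the same up to a sign change should be no surprise, since

\[
\cos^{-1}(y) = \frac{\pi}{2} - \sin^{-1}(y). \tag{3.14}
\]

Since $\tan(\theta) = \sin(\theta)/\cos(\theta)$, the quotient rule gives

\[
\frac{d\tan(\theta)}{d\theta} = \frac{\cos^2(\theta) + \sin^2(\theta)}{\cos^2(\theta)} = \frac{1}{\cos^2(\theta)}. \tag{3.15}
\]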


Therefore the derivative of the inverse tangent function $\theta = \tan^{-1}(y)$ with $-\pi/2 < \theta < \pi/2$ is

\[
\frac{d\tan^{-1}(y)}{dy} = \cos^2(\theta) = \cos^2(\tan^{-1}(y)) = \frac{1}{1+y^2}. \tag{3.16}
\]
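The inverse function rule (3.10) lends itself to a quick numerical check against the closed forms (3.11), (3.12), and (3.16). This is a small Python sketch under stated assumptions: the helper name `inverse_derivative` is hypothetical, and the sample point $y = 0.3$ is arbitrary.

```python
import math

def inverse_derivative(fprime, finv, y):
    """Inverse function rule (3.10): du/dy = 1 / (dy/du) at u = f^{-1}(y)."""
    return 1.0 / fprime(finv(y))

y = 0.3  # arbitrary sample point with 0 < y < 1

# Logarithm as the inverse of exp, compared with 1/y from (3.11).
print(inverse_derivative(math.exp, math.log, y), 1 / y)

# Inverse sine, compared with 1/sqrt(1 - y^2) from (3.12).
print(inverse_derivative(math.cos, math.asin, y), 1 / math.sqrt(1 - y * y))

# Inverse tangent, using d tan/d theta = 1/cos^2 from (3.15),
# compared with 1/(1 + y^2) from (3.16).
sec_squared = lambda t: 1 / math.cos(t) ** 2
print(inverse_derivative(sec_squared, math.atan, y), 1 / (1 + y * y))
```

Each printed pair should agree up to rounding error.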

3.3 Chain rule examples

The sine of the square is expressed by the composition $y = \sin(u)$ with $u = x^2$. Thus $dy/dx = (dy/du)(du/dx)$, or

\[
\frac{d\sin(x^2)}{dx} = \frac{d\sin(u)}{du}\,\frac{du}{dx} = \cos(u)\,\frac{du}{dx} = \cos(x^2)\,2x. \tag{3.17}
\]

The derivative of the square of a sine is given by $z = w^2$ with $w = \sin(\theta)$. Thus $dz/d\theta = (dz/dw)(dw/d\theta)$, or

\[
\frac{d\sin^2(\theta)}{d\theta} = \frac{dw^2}{dw}\,\frac{dw}{d\theta} = 2w\,\frac{dw}{d\theta} = 2\sin(\theta)\cos(\theta). \tag{3.18}
\]

The answer $2\sin(\theta)\cos(\theta) = \sin(2\theta)$ is periodic with period $\pi$. This suggests another way of doing the problem. Use the identity $\cos(2\theta) = \cos^2(\theta) - \sin^2(\theta) = 1 - 2\sin^2(\theta)$. With $\tau = 2\theta$ this gives

\[
\frac{d\sin^2(\theta)}{d\theta} = -\frac{1}{2}\frac{d\cos(2\theta)}{d\theta} = -\frac{1}{2}\frac{d\cos(\tau)}{d\tau}\,\frac{d\tau}{d\theta} = \frac{1}{2}\sin(\tau)\,\frac{d\tau}{d\theta} = \frac{1}{2}\sin(2\theta)\,2 = \sin(2\theta). \tag{3.19}
\]

Some uses of the chain rule are so simple that most people do not use an intermediate variable. For instance, instead of writing $\theta = \omega t$ and computing $d\sin(\omega t)/dt = (d\sin(\theta)/d\theta)(d\theta/dt)$ one just writes

\[
\frac{d\sin(\omega t)}{dt} = \omega\cos(\omega t). \tag{3.20}
\]

Similarly,

\[
\frac{de^{kt}}{dt} = k e^{kt}. \tag{3.21}
\]

Write $e^{kt} = a^{t/\Delta t}$, where $a$ is the growth factor for time step $\Delta t$. Notice that

\[
\frac{d a^{t/\Delta t}}{dt} = \frac{\ln(a)}{\Delta t}\, a^{t/\Delta t} \tag{3.22}
\]

is exactly the same thing, since $\ln(a) = k\Delta t$. Remember that sometimes people write $a = 1 + r$, where $r$ is the growth rate for time step $\Delta t$. Then $\ln(1+r) = k\Delta t$, and when $r$ is small this implies that $r \approx k\Delta t$. So the meaning of the continuous growth rate $k$ is $r/\Delta t$ (in the limit as $\Delta t \to 0$). That is, the continuous growth rate is the growth rate per time step, in the limit of very small time step. The growth rate $r$ is dimensionless, but the continuous growth rate $k$ has dimensions of inverse time.
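The relation between the per-step growth rate $r$ and the continuous growth rate $k$ can be illustrated numerically. The sketch below is only an illustration: the value $k = 0.05$ and the helper name `continuous_rate` are assumptions, not taken from the notes.

```python
import math

def continuous_rate(r, dt):
    """Solve ln(1 + r) = k * dt for k, given the per-step growth rate r."""
    return math.log1p(r) / dt

k = 0.05  # hypothetical continuous growth rate, in units of inverse time
for dt in (1.0, 0.1, 0.001):
    a = math.exp(k * dt)  # growth factor for one time step
    r = a - 1.0           # growth rate for one time step
    # r / dt approaches k as dt shrinks; continuous_rate recovers k exactly
    # (up to rounding) for every step size.
    print(dt, r / dt, continuous_rate(r, dt))
```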


3.4 Implicit functions

When we write $y = 1/x^2$, we have an explicit definition of $y$ as a function of $x$. However when we write $yx^2 = 1$ we have an implicit definition of $y$ as a function of $x$.

It is possible to differentiate such expressions using the ordinary rules of calculus. Thus we would get $(dy/dx)x^2 + y(2x) = 0$, which may be solved to get $dy/dx = -2y/x$. This is equivalent to the answer $dy/dx = -2/x^3$ that one would get by explicit differentiation.

The equation $yx^2 = 1$ also gives an implicit definition of $x$ as a function of $y$. The explicit definition would be $x = 1/\sqrt{y} = y^{-1/2}$. If we differentiate the implicit equation, we get $x^2 + y(2x)\,dx/dy = 0$. This gives $dx/dy = -(1/2)x/y = -(1/2)y^{-3/2}$. Notice that the $-2$ power and the $-1/2$ power are inverse functions of each other. This is the reason why $dy/dx$ and $dx/dy$ are reciprocals of each other.

As another example, consider the implicit equation $w = \sin(z)$. This is supposed to define $z$ as a function of $w$. We get $1 = \cos(z)(dz/dw)$, so $dz/dw = 1/\cos(z)$. Since $\cos^2(z) + \sin^2(z) = \cos^2(z) + w^2 = 1$, we get $\cos(z) = \pm\sqrt{1-w^2}$ and so $dz/dw = 1/(\pm\sqrt{1-w^2})$. This is the answer we got when we did the derivative of the inverse function $\arcsin = \sin^{-1}$ directly. (However one has to think about the $\pm$ sign.)

3.5 Proof of the product rule

As a sample of the kind of argument to prove these rules, let us prove the product rule. The idea is that if $u = f(x)$ and $v = g(x)$, then $\Delta u = f(x+\Delta x) - f(x)$ and $\Delta v = g(x+\Delta x) - g(x)$. Furthermore,

\[
(u + \Delta u)(v + \Delta v) = uv + \Delta u\, v + u\,\Delta v + \Delta u\,\Delta v. \tag{3.23}
\]

This algebraic identity says that changing the sides of a rectangle gives a change that has three terms. The last term is the small postage stamp term. Dividing the change in $uv$ by $\Delta x$ gives

\[
\frac{\Delta(uv)}{\Delta x} = \frac{(u+\Delta u)(v+\Delta v) - uv}{\Delta x} = \frac{\Delta u}{\Delta x}\,v + u\,\frac{\Delta v}{\Delta x} + \frac{\Delta u}{\Delta x}\,\frac{\Delta v}{\Delta x}\,\Delta x. \tag{3.24}
\]

In the limit the postage stamp term goes away, and one gets the product rule. This is the crucial simplification that makes calculus work.
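To see the postage stamp term shrink, one can evaluate the three terms of (3.24) for decreasing $\Delta x$. The following Python sketch is only illustrative; the helper name `product_difference_terms` and the choice $u = x^2$, $v = x^3$ at $x = 2$ are assumptions made for the example.

```python
def product_difference_terms(f, g, x, dx):
    """Split the difference quotient of f*g as in (3.24).

    Returns (main, stamp): main is the sum of the two product-rule terms,
    stamp is the postage stamp term (du/dx)(dv/dx) dx.
    """
    u, v = f(x), g(x)
    du = f(x + dx) - u
    dv = g(x + dx) - v
    main = (du / dx) * v + u * (dv / dx)
    stamp = (du / dx) * (dv / dx) * dx
    return main, stamp

f = lambda x: x**2
g = lambda x: x**3
x = 2.0
for dx in (1e-1, 1e-2, 1e-3):
    main, stamp = product_difference_terms(f, g, x, dx)
    full = (f(x + dx) * g(x + dx) - f(x) * g(x)) / dx
    # full equals main + stamp; stamp shrinks in proportion to dx,
    # and main approaches the derivative of x^5 at x = 2, which is 80.
    print(dx, full, main, stamp)
```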

Remark: The quotient rule follows from the product rule. Write $u/v = uv^{-1}$. Then by the power rule and the chain rule $dv^{-1}/dx = -v^{-2}\,dv/dx$. Hence by the product rule

\[
\frac{d(uv^{-1})}{dx} = \frac{du}{dx}\,v^{-1} + u\,\frac{dv^{-1}}{dx} = \frac{du}{dx}\,v^{-1} - uv^{-2}\,\frac{dv}{dx}. \tag{3.25}
\]

Some people find this a helpful way to remember the quotient rule.


3.6 Proof of the chain rule

The proof of the chain rule is much simpler than the proof of the product rule, since the chain rule is valid even before one takes the limit. If $y = f(u)$ and $u = g(x)$, then $\Delta y = f(u+\Delta u) - f(u)$ and $\Delta u = g(x+\Delta x) - g(x)$. Then

\[
\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta u}\,\frac{\Delta u}{\Delta x}. \tag{3.26}
\]

When you take the limit, you get the chain rule. (The only problem could be in the rare situation when $\Delta u = 0$ even when $\Delta x \neq 0$, but this can be fixed.)

3.7 Differentiating power functions

Let $p$ be an arbitrary real number. Say that $f(x) = x^p$. Then it is true in general that $f'(x) = px^{p-1}$.

To prove this, we consider the cases when $x > 0$, $x = 0$, and $x < 0$ separately.

When $x > 0$, then there are no restrictions on $p$. We can write $f(x) = e^{p\ln(x)}$. Hence by the chain rule

\[
f'(x) = e^{p\ln(x)}\,p\,\frac{1}{x} = x^p\,p\,\frac{1}{x} = px^{p-1}. \tag{3.27}
\]

When $x = 0$, we restrict to $p > 1$. In that case we can use the definition of the derivative to argue that the derivative is zero.

When $x < 0$, we can define $x^p$ when $p = n/k$ with $k$ odd. In that case, $x^p$ is an even function for $n$ even and an odd function for $n$ odd. If $f(x)$ is even, we have $f(x) = f(-x)$, so by the chain rule $f'(x) = -f'(-x)$, and so $f'(x)$ is odd. Similarly, if $f(x)$ is odd, then $f'(x)$ is even. These facts show that the power rule continues to work for $x < 0$. In fact, if $p = n/k$ with $n$ even (or odd), then $p - 1 = (n-k)/k$ has $n - k$ odd (or even).

3.8 Summary of differentiation rules in a different notation

Here we restate the rules in a different notation. This notation is used mainly in theoretical discussions.

We have $c' = 0$, $((\,\cdot\,)^p)' = p(\,\cdot\,)^{p-1}$, $\exp' = \exp$, $\sin' = \cos$ and $\cos' = -\sin$.

We have $(f \pm g)(x) = f(x) \pm g(x)$, so the sum and difference rules are

\[
(f \pm g)'(x) = f'(x) \pm g'(x). \tag{3.28}
\]

Also, $(fg)(x) = f(x)g(x)$, so the product rule is

\[
(fg)'(x) = f'(x)g(x) + f(x)g'(x). \tag{3.29}
\]

Again, $(f/g)(x) = f(x)/g(x)$, so the quotient rule is

\[
\left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}. \tag{3.30}
\]

With $(f \circ g)(x) = f(g(x))$ the chain rule is

\[
(f \circ g)'(x) = f'(g(x))\,g'(x). \tag{3.31}
\]

With $f(f^{-1}(y)) = y$ the inverse function rule is

\[
(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}. \tag{3.32}
\]

3.9 Summary of differentiation rules in mixed notation

Here are the rules in a hybrid notation that may make some people comfortable. The sum and difference rules are

\[
\frac{d(f(x) \pm g(x))}{dx} = f'(x) \pm g'(x). \tag{3.33}
\]

The product rule is

\[
\frac{d(f(x)g(x))}{dx} = f'(x)g(x) + f(x)g'(x). \tag{3.34}
\]

The quotient rule is

\[
\frac{d(f(x)/g(x))}{dx} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}. \tag{3.35}
\]

The chain rule is

\[
\frac{d\,f(g(x))}{dx} = f'(g(x))\,g'(x). \tag{3.36}
\]

The inverse function rule is

\[
\frac{df^{-1}(y)}{dy} = \frac{1}{f'(f^{-1}(y))}. \tag{3.37}
\]

3.10 The derivative as a linear approximation

If $f$ is differentiable at $x$, then it is always possible to write

\[
f(x+h) = f(x) + f'(x)h + E_x(h), \tag{3.38}
\]

where

\[
\lim_{h \to 0} \frac{E_x(h)}{h} = 0. \tag{3.39}
\]


There is no mystery about $E_x(h)$; it is just $E_x(h) = f(x+h) - f(x) - f'(x)h$. Think of it as an error term with the limit property expressed above. So $f(x+h)$ as a function of $h$ is approximated by the linear function $f(x) + f'(x)h$ with an error that is relatively small with respect to $h$.

In the Leibniz notation the same result may be expressed by taking $y = f(x)$ and $\Delta y = f(x+\Delta x) - f(x)$ and writing

\[
\Delta y = \frac{dy}{dx}\,\Delta x + E_x(\Delta x). \tag{3.40}
\]

This says that $\Delta y$ is proportional to $\Delta x$ up to an error that is relatively small with respect to $\Delta x$.

As an example, take $\sin(x+h) = \sin(x) + \cos(x)h + E_x(h)$. This says that a small change in the sine function at a particular point is given, up to the error term, by multiplying the change in the input by the value of the cosine function at the point. In the Leibniz notation this could take the form $\Delta\sin(x) = \cos(x)\Delta x + E_x(\Delta x)$.

There is another way of expressing the same ideas. If $f$ is differentiable at $a$, then

\[
f(z) = f(a) + f'(a)(z-a) + E_a(z-a), \tag{3.41}
\]

where

\[
\lim_{z \to a} \frac{E_a(z-a)}{z-a} = 0. \tag{3.42}
\]

This says that $f(z)$ as a function of $z$ is approximated by the linear function $f(a) + f'(a)(z-a)$ with an error that is relatively small with respect to $z - a$. That is, if one looks at $f(z)$ on a very small range of values of inputs near $a$, then it looks like a linear function.

As an example, take $\sin(z) = \sin(a) + \cos(a)(z-a) + E_a(z-a)$. This says that the sine function near a particular point $a$ is approximately a linear function with slope $\cos(a)$.

Contrast this with the absolute value function, which does not have a derivative at zero. If one looks at the graph of $|z|$ near the origin, the corner is always apparent, no matter how close one looks.
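The limit property (3.39) can be observed numerically for the sine example. This is a minimal Python sketch, with the helper name `linear_error` chosen here for illustration and $x = 1$ as an arbitrary base point.

```python
import math

def linear_error(f, fprime, x, h):
    """E_x(h) = f(x + h) - f(x) - f'(x) h, the error term in (3.38)."""
    return f(x + h) - f(x) - fprime(x) * h

x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    e = linear_error(math.sin, math.cos, x, h)
    # The ratio e / h should tend to zero as h shrinks.
    print(h, e, e / h)
```

For the sine function the ratio shrinks roughly in proportion to $h$, which is consistent with an error of order $h^2$.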

3.11 l’Hospital’s rule in a special case

There is a special case of l’Hospital’s rule that makes sense from the point of view of linear approximation. This says that if $f$ and $g$ are both differentiable at $a$, and if $f(a) = 0$ and $g(a) = 0$, and $g'(a) \neq 0$, then

\[
\lim_{z \to a} \frac{f(z)}{g(z)} = \frac{f'(a)}{g'(a)}. \tag{3.43}
\]

The reason this is true is that we can write

\[
f(z) = f'(a)(z-a) + E_a(f, z-a), \tag{3.44}
\]

and

\[
g(z) = g'(a)(z-a) + E_a(g, z-a). \tag{3.45}
\]

Near $a$ both $f$ and $g$ look like linear functions that vanish at $a$. Furthermore, the ratio of the two linear functions is the ratio of their slopes. So

\[
\lim_{z \to a} \frac{f(z)}{g(z)} = \lim_{z \to a} \frac{f'(a) + \frac{E_a(f, z-a)}{z-a}}{g'(a) + \frac{E_a(g, z-a)}{z-a}} = \frac{f'(a)}{g'(a)}. \tag{3.46}
\]

This used the fact that the limit of a quotient is the quotient of the limits.

An example of l’Hospital’s rule at work is

\[
\lim_{x \to \pi} \frac{e^x - e^\pi}{\sin(x)} = \frac{e^\pi}{\cos(\pi)} = -e^\pi. \tag{3.47}
\]
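The limit in (3.47) can be checked by evaluating the quotient at points approaching $\pi$. The sketch below is illustrative only; the function name `ratio` and the offsets used are assumptions.

```python
import math

def ratio(x):
    """The quotient (e^x - e^pi) / sin(x) from (3.47), defined for x != pi."""
    return (math.exp(x) - math.exp(math.pi)) / math.sin(x)

limit = -math.exp(math.pi)
for eps in (1e-2, 1e-4, 1e-6):
    # Values of the quotient slightly to the right of pi approach the limit.
    print(eps, ratio(math.pi + eps), limit)
```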

Chapter 4

Change

4.1 First derivative test

Let $f$ be a function. Then $f$ has a local minimum at $a$ if $f(x) \geq f(a)$ for all $x$ near $a$. Also, $f$ has a local maximum at $a$ if $f(x) \leq f(a)$ for all $x$ near $a$.

The first derivative test says that if $f$ is a differentiable function and $f$ has either a local minimum or a local maximum at $a$, then

\[
\left.\frac{dy}{dx}\right|_{x=a} = f'(a) = 0. \tag{4.1}
\]

4.2 Second derivative test

The second derivative test is about the situation where $f$ is a twice differentiable function with $f'(a) = 0$. Then if

\[
\left.\frac{d^2y}{dx^2}\right|_{x=a} = f''(a) > 0 \tag{4.2}
\]

then it follows that $f$ has a local minimum at $a$. If instead

\[
\left.\frac{d^2y}{dx^2}\right|_{x=a} = f''(a) < 0 \tag{4.3}
\]

then it follows that $f$ has a local maximum at $a$.

4.3 Global minimum and global maximum

A function $f$ has a global minimum on some domain at $a$ if $f(x) \geq f(a)$ for all $x$ in the domain. Also, $f$ has a global maximum at $a$ if $f(x) \leq f(a)$ for all $x$ in the domain.

A continuous function $f$ defined on a closed interval $[a, b]$ (an interval that includes the end points) always has a point where there is a global minimum and a point where it has a global maximum. If the function is differentiable in the open interval $(a, b)$, then at such a point either the derivative is zero, or it is an end point.

A continuous function defined on an infinite interval like $[a, +\infty)$ or on an open interval like $(a, b)$ does not necessarily have a global minimum or a global maximum.

4.4 Change of coordinates

Say that $y = f(u)$ and $u = g(x)$, so that $y = h(x) = f(g(x))$. Then we can think of $y$ as a function $f$ of $u$ or as a function $h$ of $x$.

Suppose $du/dx \neq 0$. Then the first derivative test for $y$ as a function of $u$ is the same as the first derivative test for $y$ as a function of $x$. This is because

\[
\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}. \tag{4.4}
\]

Therefore since $x = a$ and $u = g(a)$ are the same point, we have

\[
\left.\frac{dy}{dx}\right|_{x=a} = 0 \iff \left.\frac{dy}{du}\right|_{u=g(a)} = 0. \tag{4.5}
\]

Again suppose $du/dx \neq 0$. Then the second derivative test for $y$ as a function of $u$ is the same as the second derivative test for $y$ as a function of $x$. This is because

\[
\frac{d^2y}{dx^2} = \frac{d^2y}{du^2}\left(\frac{du}{dx}\right)^2 + \frac{dy}{du}\,\frac{d^2u}{dx^2}. \tag{4.6}
\]

Therefore since $x = a$ and $u = g(a)$ are the same point and at that point

\[
\left.\frac{dy}{dx}\right|_{x=a} = \left.\frac{dy}{du}\right|_{u=g(a)} = 0, \tag{4.7}
\]

we have

\[
\left.\frac{d^2y}{dx^2}\right|_{x=a} > 0 \iff \left.\frac{d^2y}{du^2}\right|_{u=g(a)} > 0 \tag{4.8}
\]

and

\[
\left.\frac{d^2y}{dx^2}\right|_{x=a} < 0 \iff \left.\frac{d^2y}{du^2}\right|_{u=g(a)} < 0. \tag{4.9}
\]

4.5 Optimization

Consider the example of minimizing the area

\[
A = 2(\pi r^2) + (2\pi r)h \tag{4.10}
\]

of a cylinder with fixed value of the volume

\[
V = (\pi r^2)h. \tag{4.11}
\]


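There are several ways to do the problem.

A straightforward method is to think of $A$ as a function of $r$. Then

\[
A = 2\pi r^2 + \frac{2V}{r}. \tag{4.12}
\]

The first derivative is

\[
\frac{dA}{dr} = 4\pi r - \frac{2V}{r^2}. \tag{4.13}
\]

The second derivative is

\[
\frac{d^2A}{dr^2} = 4\pi + \frac{4V}{r^3}. \tag{4.14}
\]

The first derivative vanishes at

\[
r^* = \left(\frac{V}{2\pi}\right)^{1/3}. \tag{4.15}
\]

At this point where

\[
\left.\frac{dA}{dr}\right|_{r=r^*} = 0 \tag{4.16}
\]

the second derivative is

\[
\left.\frac{d^2A}{dr^2}\right|_{r=r^*} = 12\pi > 0. \tag{4.17}
\]

This indicates a local minimum.

A slightly uglier method is to think of $A$ as a function of $h$. Then

\[
A = \frac{2V}{h} + 2(\pi V h)^{1/2}. \tag{4.18}
\]

The first derivative is

\[
\frac{dA}{dh} = -\frac{2V}{h^2} + (\pi V)^{1/2} h^{-1/2}. \tag{4.19}
\]

The second derivative is

\[
\frac{d^2A}{dh^2} = \frac{4V}{h^3} - \frac{1}{2}(\pi V)^{1/2} h^{-3/2}. \tag{4.20}
\]

The first derivative vanishes at

\[
h^* = \left(\frac{4V}{\pi}\right)^{1/3}. \tag{4.21}
\]

At this point where

\[
\left.\frac{dA}{dh}\right|_{h=h^*} = 0 \tag{4.22}
\]

the second derivative is

\[
\left.\frac{d^2A}{dh^2}\right|_{h=h^*} = \frac{3}{4}\pi > 0. \tag{4.23}
\]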


This indicates a local minimum.

There is however a much nicer way to do the problem. It does not matter whether we think of $A$ as a function of $r$ or of $h$. For definiteness, let us take $A$ as a function of $r$. But the idea is to differentiate implicitly. Thus

\[
\frac{dA}{dr} = 2\pi\left(2r + h + r\,\frac{dh}{dr}\right). \tag{4.24}
\]

Furthermore, since $V$ is constant, we have

\[
2rh + r^2\,\frac{dh}{dr} = 0. \tag{4.25}
\]

Thus

\[
\frac{dA}{dr} = 2\pi(2r - h). \tag{4.26}
\]

At the point where $dA/dr = 0$ we have

\[
h = 2r. \tag{4.27}
\]

This version gives much more geometric insight. The second derivative is

\[
\frac{d^2A}{dr^2} = 2\pi\left(2 - \frac{dh}{dr}\right) = 4\pi\left(1 + \frac{h}{r}\right). \tag{4.28}
\]

At the point where the first derivative vanishes this is $12\pi > 0$ as before. This indicates a local minimum. At this minimum the actual value of $A$ is $6\pi r^2$ where $r = r^*$.
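The cylinder computation can be sanity-checked numerically. The following Python sketch assumes a hypothetical volume $V = 1$ (any positive value would do) and uses an illustrative helper `area`; it compares the critical radius (4.15) with the relation $h = 2r$ and the minimum area $6\pi r^2$.

```python
import math

def area(r, V):
    """Surface area as a function of r for fixed volume V, equation (4.12)."""
    return 2 * math.pi * r**2 + 2 * V / r

V = 1.0  # hypothetical fixed volume
r_star = (V / (2 * math.pi)) ** (1 / 3)  # critical radius from (4.15)
h_star = V / (math.pi * r_star**2)       # height from V = pi r^2 h

print(r_star, h_star, h_star / r_star)            # the ratio should be 2
print(area(r_star, V), 6 * math.pi * r_star**2)   # the two values should agree

# Nearby radii give a larger area, consistent with a local minimum.
print(area(0.99 * r_star, V), area(1.01 * r_star, V))
```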

4.6 The mean value theorem

The mean value theorem says that if $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists a number $c$ in $(a, b)$ with

\[
f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{4.29}
\]

This is a very important theoretical result, since it shows that information about the derivative gives information about the function. In the Leibniz notation $y = f(x)$ this says that if we compute $\Delta y = f(x+\Delta x) - f(x)$, then there is a $c$ between $x$ and $x + \Delta x$ such that

\[
\left.\frac{dy}{dx}\right|_{x=c} = \frac{\Delta y}{\Delta x}. \tag{4.30}
\]

The most important consequences are the increasing function theorem and the constant function theorem. The increasing function theorem says that if $f'(x) > 0$ for all $x$ in $(a, b)$, then $f$ is increasing on $(a, b)$. The constant function theorem says that if $f'(x) = 0$ for all $x$ in $(a, b)$, then $f$ is constant on $[a, b]$.

One consequence of the constant function theorem is that if two functions have the same derivative on some interval, then they differ by a constant. That is, if $F'(x) = G'(x)$, then $F(x) = G(x) + C$. This constant is usually called the constant of integration.


4.7 l’Hospital’s rule

l’Hospital’s rule says that if $f$ and $g$ are both differentiable near $a$, and if $f(a) = 0$ and $g(a) = 0$, then

\[
\lim_{z \to a} \frac{f(z)}{g(z)} = \lim_{z \to a} \frac{f'(z)}{g'(z)}. \tag{4.31}
\]

Consider the function $f(z)g(x) - g(z)f(x)$ as a function of $x$ for fixed $z$. This function has value zero at $x = a$ and at $x = z$. By the mean value theorem, there is a number $c$ between $a$ and $z$ such that the derivative $f(z)g'(c) - g(z)f'(c) = 0$. Thus

\[
\frac{f(z)}{g(z)} = \frac{f'(c)}{g'(c)} \tag{4.32}
\]

with $c$ between $a$ and $z$. As $z$ gets close to $a$, then also $c$ gets close to $a$. So taking the limit gives l’Hospital’s rule.

4.8 The error in the linear approximation according to Cauchy

Consider a differentiable function $f(z)$ and the linear approximation $f(a) + f'(a)(z-a)$ near $a$. Fix a particular value of $z$ and look at this as a function of $a$. Then

\[
\frac{d}{da}[f(a) + f'(a)(z-a)] = f'(a) - f'(a) + f''(a)(z-a) = f''(a)(z-a). \tag{4.33}
\]

This formula says that as $a$ approaches $z$, the change in the predicted value depends on the second derivative. Furthermore, this change gets small as $a$ gets close to $z$. The reason is that the predicted value of $f$ at $z$ given by the linear approximation is almost exact as $a$ gets close to $z$.

Apply the mean value theorem to the interval from $a$ to $z$. This says that there is a $c$ between $a$ and $z$ such that

\[
\frac{[f(z) + f'(z)(z-z)] - [f(a) + f'(a)(z-a)]}{z-a} = f''(c)(z-c). \tag{4.34}
\]

This proves that

\[
f(z) = f(a) + f'(a)(z-a) + f''(c)(z-c)(z-a), \tag{4.35}
\]

where $c$ is between $a$ and $z$. This gives the Cauchy form of the error in the linear approximation.

As an example, take $\sin(z) = \sin(a) + \cos(a)(z-a) + E_a(z-a)$, where $E_a(z-a) = -\sin(c)(z-c)(z-a)$. It is always true that $|\sin(c)| \leq 1$. If $|z-a| \leq 1/100$, then certainly it is also the case that $|z-c| \leq 1/100$. So $|E_a(z-a)| \leq 1/10000$.


4.9 The error in the linear approximation according to Lagrange

There is another way of writing the error that is sometimes more convenient. The idea is to take $a$ as a function of a parameter $t$. The number $z$ remains a fixed constant. Let $g(t) = f(a) + f'(a)(z-a)$, where $a = h(t)$. Then by the chain rule $g'(t)$ is given by

\[
g'(t) = \frac{d}{dt}[f(a) + f'(a)(z-a)] = f''(a)(z-a)\,\frac{da}{dt} = -\frac{1}{2} f''(a)\,\frac{d(z-a)^2}{dt}. \tag{4.36}
\]

This suggests a choice of parameter. Take $t \leq 0$ such that $(z-a)^2 = -t$. With this choice the rate of change is $g'(t) = (1/2)f''(a)$, where $a = h(t) = z \pm \sqrt{-t}$. What makes this work is that when $t$ is near 0, the rate of change of $a$ with respect to $t$ is very large. This makes the predicted value of $f$ at $z$ given by the linear approximation continue to change at a rate given by the second derivative, even when $a$ is close to $z$.

The interval from $t$ to 0 parameterizes the interval from $a = h(t)$ to $z = h(0)$. The mean value theorem on the interval from $t$ to 0 gives

\[
\frac{g(0) - g(t)}{0 - t} = \frac{[f(z) + f'(z)(z-z)] - [f(a) + f'(a)(z-a)]}{(z-a)^2} = g'(c) = \frac{1}{2} f''(c^*), \tag{4.37}
\]

where $c$ is between $t$ and 0, and $c^* = h(c)$ is between $a$ and $z$. This proves that

\[
f(z) = f(a) + f'(a)(z-a) + \frac{1}{2} f''(c^*)(z-a)^2, \tag{4.38}
\]

where $c^*$ is between $a$ and $z$. This gives the Lagrange form of the error in the linear approximation.

This last formula is particularly important. It may also be stated in the form

\[
E_a(z-a) = \frac{1}{2} f''(c^*)(z-a)^2. \tag{4.39}
\]

This says that if one knows that $|f''(w)| \leq M$ on some interval near $a$, then for $z$ in that interval

\[
|E_a(z-a)| \leq \frac{1}{2} M (z-a)^2. \tag{4.40}
\]

This can give a quite useful idea of how small the error is.

As an example, take $\sin(z) = \sin(a) + \cos(a)(z-a) + E_a(z-a)$, where $E_a(z-a) = -(1/2)\sin(c^*)(z-a)^2$. It is always true that $|\sin(c^*)| \leq 1$. If $|z-a| \leq 1/100$, then $|E_a(z-a)| \leq 1/20000$.
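The Lagrange bound (4.40) for the sine example can be compared with the actual error. This is a rough Python sketch; the base point $a = 0.7$ and the helper names `linear_sin` and `lagrange_bound` are illustrative assumptions.

```python
import math

def linear_sin(a, z):
    """Linear approximation sin(a) + cos(a)(z - a) to sin(z) near a."""
    return math.sin(a) + math.cos(a) * (z - a)

def lagrange_bound(M, a, z):
    """Right-hand side of (4.40): (1/2) M (z - a)^2."""
    return 0.5 * M * (z - a) ** 2

a = 0.7  # hypothetical base point
for z in (a + 0.01, a - 0.01, a + 0.005):
    actual = abs(math.sin(z) - linear_sin(a, z))
    # With M = 1 (since |sin| <= 1) the actual error stays below the bound.
    print(z, actual, lagrange_bound(1.0, a, z))
```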

4.10 Newton’s second law of motion

The fundamental law of physics is

\[
F = ma. \tag{4.41}
\]


Here $F$ is force, $m$ is mass, and $a = dv/dt = d^2s/dt^2$ is acceleration.

In metric units $F$ is measured in newtons, $m$ in kilograms, and $a$ in meters per second squared.

In one English system $F$ is measured in pounds of force, $m$ in slugs, and $a$ in feet per second squared. A pound of force is about 4.4482 newtons. A slug is about 14.594 kilograms.

First consider the special case of constant gravitation. Distance is measured upward. Thus the force $F = -mg$ is downward. The solution is $a = -g$ and so $v = v_0 - gt$, where $v_0$ is the constant of integration, which is the initial velocity. Finally

\[
s = s_0 + v_0 t - \frac{1}{2} g t^2, \tag{4.42}
\]

where $s_0$ is the next constant of integration, the initial displacement.

Next consider the special case when the only force is a friction $F = -\alpha v$. This might describe a tiny particle in a fluid. Then the Newton equation is

\[
m\,\frac{dv}{dt} = -\alpha v. \tag{4.43}
\]

This can be written

\[
\frac{1}{v}\,\frac{dv}{dt} = -\frac{\alpha}{m}. \tag{4.44}
\]

Thus

\[
\ln(\pm v) = \ln(\pm v_0) - \frac{\alpha}{m}\,t. \tag{4.45}
\]

The constant of integration is the logarithm of the absolute value of the initial velocity. This violates the rule that the input to a logarithm function must be dimensionless, but fortunately the equation has the equivalent form

\[
\ln\left(\frac{v}{v_0}\right) = -\frac{\alpha}{m}\,t \tag{4.46}
\]

without this problem. It follows that the solution is

\[
v = v_0 e^{-\frac{\alpha}{m} t}. \tag{4.47}
\]

This says that the motion gets slower and slower. The solution for the displacement is

\[
s = s_0 + \frac{m}{\alpha}\,v_0\left(1 - e^{-\frac{\alpha}{m} t}\right). \tag{4.48}
\]

This implies that as $t \to \infty$ the displacement $s$ approaches a limiting value $s = s_0 + \frac{m}{\alpha} v_0$.

What if there is both gravitation and friction? Then the force is $F = -mg - \alpha v$. The Newton equation is

\[
m\,\frac{dv}{dt} = -mg - \alpha v. \tag{4.49}
\]

This can be written

\[
\frac{1}{\frac{mg}{\alpha} + v}\,\frac{dv}{dt} = -\frac{\alpha}{m}. \tag{4.50}
\]

Thus

\[
\ln\left(\pm\left(v + \frac{mg}{\alpha}\right)\right) = \ln\left(\pm\left(v_0 + \frac{mg}{\alpha}\right)\right) - \frac{\alpha}{m}\,t. \tag{4.51}
\]

The solution is

\[
v = \left(v_0 + \frac{m}{\alpha}\,g\right) e^{-\frac{\alpha}{m} t} - \frac{m}{\alpha}\,g. \tag{4.52}
\]

This says that the motion approaches a terminal velocity $v = -\frac{m}{\alpha} g$ where the acceleration is zero. The solution for the displacement is

\[
s = s_0 + \frac{m}{\alpha}\left(v_0 + \frac{m}{\alpha}\,g\right)\left(1 - e^{-\frac{\alpha}{m} t}\right) - \frac{m}{\alpha}\,g t. \tag{4.53}
\]

This implies that as $t \to \infty$ the displacement $s$ becomes approximately linear in $t$, just like free motion. However the mechanism is different; it is a balance between gravity pulling down and friction pushing up.

What happens in the above formula as $g \to 0$? It is easy to see that one recovers $s = s_0 + (m/\alpha)v_0(1 - e^{-(\alpha/m)t})$, which is the case with friction but with no gravity.

What happens in the above formula as $\alpha \to 0$? This is a harder calculation. Write the result as

\[
s = s_0 + \frac{m}{\alpha}\,v_0\left(1 - e^{-\frac{\alpha}{m} t}\right) + \frac{m^2}{\alpha^2}\,g\left(1 - \frac{\alpha}{m}\,t - e^{-\frac{\alpha}{m} t}\right). \tag{4.54}
\]

With the help of l’Hospital’s rule the limit as $\alpha \to 0$ is $s = s_0 + v_0 t - (1/2)gt^2$, the result for motion with gravity but without friction.
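The limit $\alpha \to 0$ can also be observed numerically by evaluating (4.53) for smaller and smaller friction coefficients. The Python sketch below uses hypothetical parameter values (unit mass, $g = 9.8$, $v_0 = 3$, $t = 2$) and illustrative function names.

```python
import math

def displacement(t, s0, v0, g, m, alpha):
    """Displacement with gravity and linear friction, equation (4.53)."""
    # -expm1(-x) computes 1 - exp(-x) accurately when x is small.
    decay = -math.expm1(-alpha * t / m)
    return s0 + (m / alpha) * (v0 + (m / alpha) * g) * decay - (m / alpha) * g * t

def free_fall(t, s0, v0, g):
    """Displacement with gravity only, equation (4.42)."""
    return s0 + v0 * t - 0.5 * g * t**2

t, s0, v0, g, m = 2.0, 0.0, 3.0, 9.8, 1.0  # hypothetical values
for alpha in (1.0, 0.1, 0.001):
    # As alpha shrinks, the friction solution approaches the free-fall value.
    print(alpha, displacement(t, s0, v0, g, m, alpha), free_fall(t, s0, v0, g))
```

For extremely small $\alpha$ the two large terms in (4.53) nearly cancel, so the formula loses accuracy in floating point even though the mathematical limit exists.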

Here is one final example, the frictionless harmonic oscillator. The force $F = -ks$ is proportional to the displacement $s$. The constant $k$ is the spring constant. Thus the farther the mass is from $s = 0$, the stronger it is pulled back in this direction.

Newton’s law of motion says $ma = -ks$, that is,

\[
m\,\frac{d^2s}{dt^2} = -ks. \tag{4.55}
\]

The second derivative is proportional to the function. This suggests trying a sine or cosine function. One possible choice is

\[
s = A\cos(\omega t - \phi). \tag{4.56}
\]

Here $A$ is the amplitude and $\phi$ is the phase. In order to determine the angular frequency $\omega$, substitute this back into the differential equation. The result is

\[
-m\omega^2 = -k. \tag{4.57}
\]

Thus the angular frequency is

\[
\omega = \sqrt{\frac{k}{m}}. \tag{4.58}
\]

The constants $A$ and $\phi$ depend on the initial conditions. In fact, the initial displacement is $s_0 = A\cos(\phi)$ and the initial velocity is $v_0 = A\omega\sin(\phi)$.

Chapter 5

The integral

5.1 Riemann sums

Say that $f$ is a function. Let $a$ and $b$ be two real numbers. Let $n$ be a positive integer. Set $h = (b-a)/n$. Write $x_i = a + ih$. Notice that $x_0 = a$ and $x_n = b$.

A left Riemann sum is a sum of the form

\[
L_a^b(f, n) = \sum_{i=0}^{n-1} f(x_i)\,h. \tag{5.1}
\]

It would be nice to be able to calculate such sums. Unfortunately, this can be done in an explicit way in only a few cases. The rest of the time we need to resort to the computer.

Here are a few cases when the sum can be evaluated. First, if $f(x) = 1$, the sum is

\[
\sum_{i=0}^{n-1} h = nh = b - a. \tag{5.2}
\]

Second, if $f(x) = x$, we can use the trick of telescoping sums. Notice that $(x+h)^2 - x^2 = 2xh + h^2$. Therefore

\[
\sum_{i=0}^{n-1} (2x_i + h)\,h = \sum_{i=0}^{n-1} \left((x_i + h)^2 - x_i^2\right) = b^2 - a^2. \tag{5.3}
\]

Therefore

\[
2\sum_{i=0}^{n-1} x_i h + h\sum_{i=0}^{n-1} h = b^2 - a^2. \tag{5.4}
\]

The conclusion is

\[
\sum_{i=0}^{n-1} x_i h = \frac{1}{2}(b^2 - a^2) - \frac{1}{2} h (b - a). \tag{5.5}
\]


Third, if $f(x) = x^2$, we can again use the trick of telescoping sums. Notice that $(x+h)^3 - x^3 = 3x^2h + 3xh^2 + h^3$. Therefore

\[
\sum_{i=0}^{n-1} (3x_i^2 + 3x_i h + h^2)\,h = \sum_{i=0}^{n-1} \left((x_i + h)^3 - x_i^3\right) = b^3 - a^3. \tag{5.6}
\]

Therefore

\[
3\sum_{i=0}^{n-1} x_i^2 h + 3h\sum_{i=0}^{n-1} x_i h + h^2\sum_{i=0}^{n-1} h = b^3 - a^3. \tag{5.7}
\]

The conclusion is

\[
\sum_{i=0}^{n-1} x_i^2 h = \frac{1}{3}(b^3 - a^3) - \frac{1}{2} h (b^2 - a^2) + \frac{1}{6} h^2 (b - a). \tag{5.8}
\]

For a final example, fix $r > 0$ with $r \neq 1$ and let $f(x) = r^x$. This is an exponential function. Use the trick of telescoping sums to evaluate the geometric series. Notice that $r^{x+h} - r^x = r^x(r^h - 1)$. Therefore

\[
\sum_{i=0}^{n-1} r^{x_i} h = \frac{h}{r^h - 1}\sum_{i=0}^{n-1} \left(r^{x_i + h} - r^{x_i}\right) = \frac{h}{r^h - 1}\,(r^b - r^a). \tag{5.9}
\]

Various well-known sums are obtained by setting $a = 0$ and $b = n$, so $h = 1$. For instance, the last example gives the classic formula

\[
\sum_{i=0}^{n-1} r^i = \frac{r^n - 1}{r - 1} \tag{5.10}
\]

for the partial sum of a geometric series.
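The closed forms (5.5), (5.8), and (5.9) can be compared with directly computed left Riemann sums. This is a small Python sketch; the function name `left_riemann_sum` and the values $a = 1$, $b = 3$, $n = 50$, $r = 2$ are assumptions chosen for illustration.

```python
def left_riemann_sum(f, a, b, n):
    """Left Riemann sum (5.1) with n equal steps of size h = (b - a)/n."""
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h

a, b, n = 1.0, 3.0, 50  # hypothetical interval and number of steps
h = (b - a) / n
r = 2.0                 # hypothetical base for the exponential example

closed_x = 0.5 * (b**2 - a**2) - 0.5 * h * (b - a)                             # (5.5)
closed_x2 = (b**3 - a**3) / 3 - 0.5 * h * (b**2 - a**2) + h**2 * (b - a) / 6   # (5.8)
closed_rx = h / (r**h - 1) * (r**b - r**a)                                     # (5.9)

print(left_riemann_sum(lambda x: x, a, b, n), closed_x)
print(left_riemann_sum(lambda x: x**2, a, b, n), closed_x2)
print(left_riemann_sum(lambda x: r**x, a, b, n), closed_rx)
```

Each pair should agree up to rounding error, for any choice of $n$.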

5.2 The definite integral

Say that $f$ is a function. Let $a$ and $b$ be two real numbers. Let $n$ be a positive integer. For each $n = 1, 2, 3, \ldots$ define

\[
\Delta t = \frac{b - a}{n}. \tag{5.11}
\]

So as $n$ gets large, the corresponding $\Delta t$ gets small. Also, for $i = 0, 1, 2, \ldots, n-1, n$ let

\[
t_i = a + i\,\Delta t. \tag{5.12}
\]

Thus $t_0 = a$ and $t_n = b$.

Let the left Riemann sum be

\[
L_a^b(f, n) = \sum_{i=0}^{n-1} f(t_i)\,\Delta t. \tag{5.13}
\]


Let the right Riemann sum be

\[
R_a^b(f, n) = \sum_{i=1}^{n} f(t_i)\,\Delta t. \tag{5.14}
\]

Say that $f$ is a continuous function. Let $a$ and $b$ be two numbers. The definite integral is defined by

\[
I_a^b(f) = \lim_{n \to \infty} L_a^b(f, n) = \lim_{n \to \infty} R_a^b(f, n). \tag{5.15}
\]

The definite integral is a number that depends on $a$, $b$, and $f$. It can be computed by either the left or right sum approximations. (From the point of view of symmetry and numerical accuracy an even nicer definition would be to take the average of the left and right sums.) The usual notation for the definite integral is

\[
I_a^b(f) = \int_a^b f(t)\,dt. \tag{5.16}
\]

Another variable may be used. For instance

\[
I_a^b(f) = \int_a^b f(x)\,dx \tag{5.17}
\]

defines the same number.

Both $L_a^b(f, n)$ and $R_a^b(f, n)$ turn out to be useful in the appropriate situations. For instance, if the function $f(t)$ is increasing on the interval from $a$ to $b$ with $a < b$, then

\[
L_a^b(f, n) \leq \int_a^b f(x)\,dx \leq R_a^b(f, n). \tag{5.18}
\]

On the other hand, if the function $f(t)$ is decreasing on the interval from $a$ to $b$ with $a < b$, then

\[
R_a^b(f, n) \leq \int_a^b f(x)\,dx \leq L_a^b(f, n). \tag{5.19}
\]

Example. As $n \to \infty$ the corresponding $\Delta t = (b-a)/n \to 0$. Use the left sum to calculate

\[
\sum_{i=0}^{n-1} t_i\,\Delta t = \frac{1}{2}(b^2 - a^2) - \frac{1}{2}\Delta t\,(b - a). \tag{5.20}
\]

Since $\Delta t \to 0$ as $n \to \infty$, the integral is then

\[
\int_a^b t\,dt = \lim_{n \to \infty} \sum_{i=0}^{n-1} t_i\,\Delta t = \frac{1}{2}(b^2 - a^2). \tag{5.21}
\]

These calculations are obvious geometrically, at least if $0 \leq a < b$. The integral is just the area $(1/2)b^2$ of the big triangle minus the area $(1/2)a^2$ of the small triangle. The sum is a little smaller, since it left out $n$ small triangles each of area $(1/2)(\Delta t)^2$. Their total area is $(1/2)\Delta t\; n\Delta t = (1/2)\Delta t\,(b-a)$.

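Example. Use the left sum to calculate

\[
\int_a^b t^2\,dt = \lim_{n \to \infty} \sum_{i=0}^{n-1} t_i^2\,\Delta t = \lim_{\Delta t \to 0}\left[\frac{1}{3}(b^3 - a^3) - \frac{1}{2}\Delta t\,(b^2 - a^2) + \frac{1}{6}(\Delta t)^2 (b - a)\right] = \frac{1}{3}(b^3 - a^3). \tag{5.22}
\]

Example. Fix $r > 0$ with $r \neq 1$. Use the left sum to calculate

\[
\int_a^b r^t\,dt = \lim_{n \to \infty} \sum_{i=0}^{n-1} r^{t_i}\,\Delta t = \lim_{n \to \infty} \frac{\Delta t}{r^{\Delta t} - 1}\,(r^b - r^a). \tag{5.23}
\]

By l’Hospital’s rule the limit is

\[
\lim_{h \to 0} \frac{h}{r^h - 1} = \lim_{h \to 0} \frac{1}{\ln(r)\,r^h} = \frac{1}{\ln(r)}. \tag{5.24}
\]

So the answer is

\[
\int_a^b r^t\,dt = \frac{1}{\ln(r)}\,(r^b - r^a). \tag{5.25}
\]

In particular, taking $r = e^k$ we get

\[
\int_a^b e^{kt}\,dt = \frac{1}{k}\,(e^{kb} - e^{ka}). \tag{5.26}
\]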
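The bracketing property (5.18) and the remark about averaging the left and right sums can both be seen in a short computation. The Python sketch below is an illustration under assumed values ($a = 0$, $b = 1$, $k = 1.5$); the names `left_sum` and `right_sum` are hypothetical.

```python
import math

def left_sum(f, a, b, n):
    """Left Riemann sum (5.13)."""
    dt = (b - a) / n
    return sum(f(a + i * dt) for i in range(n)) * dt

def right_sum(f, a, b, n):
    """Right Riemann sum (5.14)."""
    dt = (b - a) / n
    return sum(f(a + i * dt) for i in range(1, n + 1)) * dt

a, b, k = 0.0, 1.0, 1.5  # hypothetical interval and growth rate
f = lambda t: math.exp(k * t)  # an increasing function on [a, b]
exact = (math.exp(k * b) - math.exp(k * a)) / k  # closed form (5.26)

for n in (10, 100, 1000):
    L, R = left_sum(f, a, b, n), right_sum(f, a, b, n)
    # For an increasing function L <= exact <= R, and the average of L and R
    # is much closer to the exact value than either sum alone.
    print(n, L, exact, R, 0.5 * (L + R))
```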

5.3 First fundamental theorem of calculus

If $f$ is a given function, then its antiderivative is a function $F$ such that $F'(x) = f(x)$. The antiderivative is only determined up to an additive constant.

Suppose $f$ is continuous on $[a, b]$. Suppose it has an antiderivative $F$ with

\[
F'(x) = f(x). \tag{5.27}
\]

The first fundamental theorem of calculus says that

\[
\int_a^b f(t)\,dt = F(b) - F(a). \tag{5.28}
\]

If you can find an antiderivative, then this gives a practical way of computing definite integrals. It is the calculus form of the telescoping trick, but the computations are much easier.

Example: Since $F(x) = x^2/2$ has derivative $f(x) = x$, it follows that

\[
\int_a^b x\,dx = \frac{1}{2} b^2 - \frac{1}{2} a^2. \tag{5.29}
\]


Example: Since $F(x) = x^3/3$ has derivative $f(x) = x^2$, it follows that

\[
\int_a^b x^2\,dx = \frac{1}{3} b^3 - \frac{1}{3} a^3. \tag{5.30}
\]

Example: Since $F(x) = e^x$ has derivative $f(x) = e^x$, it follows that

\[
\int_a^b e^x\,dx = e^b - e^a. \tag{5.31}
\]

Example: There is a function $F(x)$ such that $F'(x) = \sin(x^2)$. However it does not have an expression in terms of elementary functions. So while it is true that

\[
\int_a^b \sin(x^2)\,dx = F(b) - F(a), \tag{5.32}
\]

this does not give a particularly convenient answer.

The first fundamental theorem of calculus is a close analog to the telescoping sum principle. Here is an example that makes this clear.

Let $\Delta x = (b-a)/n$ and set $x_i = a + i\,\Delta x$. The summation example uses the telescoping procedure based on $\Delta(x^2) = (x + \Delta x)^2 - x^2 = 2x\,\Delta x + (\Delta x)^2$. This gives

\[
\sum_{i=0}^{n-1} (2x_i + \Delta x)\,\Delta x = \sum_{i=0}^{n-1} \left(\Delta(x^2)\right)_i = b^2 - a^2. \tag{5.33}
\]

Since $n\,\Delta x = b - a$, it follows that

\[
\sum_{i=0}^{n-1} 2x_i\,\Delta x = (b^2 - a^2) - (b - a)\,\Delta x. \tag{5.34}
\]

Compare this with the calculus calculation. Since $dx^2/dx = 2x$, it follows that

\[
\int_a^b 2x\,dx = \int_a^b \frac{dx^2}{dx}\,dx = b^2 - a^2. \tag{5.35}
\]

5.4 Second fundamental theorem of calculus

Suppose $f$ is continuous on $[a, b]$. Define

\[
F(x) = \int_a^x f(t)\,dt. \tag{5.36}
\]

The second fundamental theorem of calculus says that

\[
F'(x) = f(x). \tag{5.37}
\]

This theorem says that the definite integral may always be used to define an antiderivative of $f$. This antiderivative may be very difficult to compute in terms of elementary functions.

Example: According to the second fundamental theorem of calculus the function $f(x) = \sin(x^2)$ has an antiderivative given by

\[
F(x) = \int_a^x \sin(t^2)\,dt. \tag{5.38}
\]

Even though this function does not have an expression in terms of elementary functions, it is true that $F'(x) = \sin(x^2)$.

The second fundamental theorem of calculus gives a way of defining new functions. For instance, define

\[
\mathrm{Si}(x) = \int_0^x \frac{\sin(t)}{t}\,dt. \tag{5.39}
\]

This sine-integral function Si is used in optics.

As another example, define

\[
N(z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{x^2}{2}}\,dx. \tag{5.40}
\]

This is called the standard normal cumulative distribution function in probability. This function has values between 0 and 1, and indeed its values represent probabilities. Its derivative is the function

\[
n(z) = N'(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}. \tag{5.41}
\]

This is called the standard normal density function. In popular language it is the bell-shaped curve.
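A function defined by an integral, such as $N(z)$ in (5.40), can be tabulated by numerical integration. The following Python sketch approximates the integral by the average of the left and right sums and compares it with the standard identity $N(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, using `math.erf` from the Python standard library. The names `density` and `normal_cdf` and the step count are assumptions made for the example.

```python
import math

def density(x):
    """Standard normal density n(x) from (5.41)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(z, n=2000):
    """Approximate N(z) from (5.40).

    The integral from 0 to z is replaced by the average of the left and
    right Riemann sums with n steps.
    """
    dx = z / n
    left = sum(density(i * dx) for i in range(n)) * dx
    right = sum(density(i * dx) for i in range(1, n + 1)) * dx
    return 0.5 + 0.5 * (left + right)

for z in (-1.0, 0.0, 1.0, 2.0):
    reference = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    print(z, normal_cdf(z), reference)

# Second fundamental theorem: a difference quotient of N approximates n.
z, dz = 1.0, 1e-4
print((normal_cdf(z + dz) - normal_cdf(z - dz)) / (2 * dz), density(z))
```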

5.5 Notation for integrals

The value of a definite integral depends on the function and the end points, but not on the variable. So in principle one could write

\[
I_a^b f = \int_a^b f(x)\,dx = \int_a^b f(t)\,dt. \tag{5.42}
\]

For instance, one could write

\[
I_0^b \sin = \int_0^b \sin(x)\,dx = \int_0^b \sin(t)\,dt = 1 - \cos(b). \tag{5.43}
\]

The notation on the left without variables is very unusual.

There is nothing mysterious about the fact that the variable of integration can be changed in this way. The situation is exactly the same for a sum

\[
\sum_{n=0}^{p} r^n = \sum_{k=0}^{p} r^k = \frac{1 - r^{p+1}}{1 - r}. \tag{5.44}
\]

Chapter 6

Integration rules

6.1 Summary of integration rules

Here are the integration rules in the Leibniz notation. These are all rules for finding antiderivatives.

The rule for integrating a power function with power $p \neq -1$ is

\[
\int x^p\,dx = \frac{x^{p+1}}{p+1} + C. \tag{6.1}
\]

The rule for integrating the $-1$ power function for $x \neq 0$ is

\[
\int \frac{1}{x}\,dx = \ln(|x|) + C. \tag{6.2}
\]

(Often $x > 0$; then the absolute value is not needed.) The rule for integrating the exponential function is

\[
\int e^x\,dx = e^x + C. \tag{6.3}
\]

The rule for integrating the sine function is

\[
\int \sin(x)\,dx = -\cos(x) + C. \tag{6.4}
\]

The rule for integrating the cosine function is

\[
\int \cos(x)\,dx = \sin(x) + C. \tag{6.5}
\]

Let $u = f(x)$ and $v = g(x)$. The rule for integrating a constant multiple is

\[
\int cu\,dx = c\int u\,dx. \tag{6.6}
\]


The sum (difference) rule is∫

(u± v) dx =∫u dx±

∫v dx. (6.7)

The integration by parts rule is∫udv

dxdx = uv −

∫du

dxv dx. (6.8)

This rule applies to a product. However it just converts one integral into anotherintegral. The hope is that the second one is easier. The integration by partsrule comes from the product rule for differentiation.

Suppose z = f(g(x)) and set z = f(u) and u = g(x). The substitution ruleis ∫

f(u)du

dxdx =

∫f(u) du. (6.9)

After integration, substitute u = g(x) on the right hand side. This rule onlyapplies to a product of a rather special form.

The substitution rule comes from the chain rule applied to y = F (u) =F (g(x), where F ′(u) = f(u). It can also be written

∫f(g(x))g′(x) dx =

∫f(u) du, (6.10)

where u = g(x). That is, to perform the integral, one must express u in termsof x in such a way that the integrand has this special form.

Sometimes the substitution rule works well in conjunction with inverse functions. Then instead of expressing $w$ in terms of $x$, the original variable $x$ is expressed in terms of $w$, and so

$$\int f(x)\,dx = \int f(x)\,\frac{dx}{dw}\,\frac{dw}{dx}\,dx = \int f(x)\,\frac{dx}{dw}\,dw. \tag{6.11}$$

If the inverse function is $x = h(w)$ this is

$$\int f(x)\,dx = \int f(h(w))\,h'(w)\,dw, \tag{6.12}$$

that is, it is the substitution rule run backward. All of these variants are handled automatically when one uses differential forms, as we shall see.

6.2 Definite integrals by substitution

The substitution rule for definite integrals is

$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du. \tag{6.13}$$
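The change of limits in (6.13) can be checked numerically. The following is a sketch under assumed choices: f(u) = sqrt(u), g(x) = 1 + x^2, and the interval from 0 to 2 are hypothetical examples, and `simpson` is a helper written for this illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Hypothetical example: f(u) = sqrt(u), g(x) = 1 + x^2, so g'(x) = 2x.
f = math.sqrt
g = lambda x: 1 + x**2
g_prime = lambda x: 2 * x
a, b = 0.0, 2.0

lhs = simpson(lambda x: f(g(x)) * g_prime(x), a, b)   # integral in x from a to b
rhs = simpson(f, g(a), g(b))                          # integral in u from g(a) to g(b)
exact = (2 / 3) * (5**1.5 - 1)                        # from the antiderivative (2/3) u^(3/2)
print(lhs, rhs, exact)
```

The limits for u are g(0) = 1 and g(2) = 5, not the original 0 and 2.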

6.3 Differential forms

A function is an object $f$ with a numerical input and a numerical output. However, applications of calculus also deal with variable quantities that are related in several different ways. For instance, one might have $w = f(u)$ and $u = g(x)$. Then $x$ and $u$ and $w$ are each a variable quantity. In this example they are related by the functions $g$ and $f$.

In a problem there may be a variable quantity $t$ such that all other variable quantities of interest may be expressed as differentiable functions of $t$, with continuous derivatives. Such a quantity is called a coordinate. If $t$ is a coordinate, then other variable quantities such as $w$ and $u$ may be expressed in terms of $t$. Thus $dw/dt$ and $du/dt$ are defined.

A coordinate need not be unique. If $s$ and $t$ are two coordinates, then $s = h(t)$ and $t = h^{-1}(s)$, where $h^{-1}$ is the inverse function of $h$. Thus $ds/dt$ and $dt/ds$ are defined, and

$$\frac{ds}{dt}\,\frac{dt}{ds} = 1. \tag{6.14}$$

In particular $ds/dt \neq 0$ and $dt/ds \neq 0$.

A differential form is an assignment of a variable quantity to each coordinate system, in such a way that the variable quantity associated with coordinate $t$ is related to the variable quantity associated to coordinate $s$ by multiplication by $ds/dt$.

Let $u$ be a variable quantity, so $f(u)$ is also a variable quantity. An example of a differential form is $f(u)\,du$. For each coordinate there is a corresponding variable quantity. With the $t$ coordinate the variable quantity is $f(u)\,du/dt$, and with the $s$ coordinate the variable quantity is $f(u)\,du/ds$. These variable quantities are related by

$$f(u)\,\frac{du}{dt} = f(u)\,\frac{du}{ds}\,\frac{ds}{dt}. \tag{6.15}$$

That is, one is obtained from the other by multiplication with $ds/dt$ or $dt/ds$.

Note: In advanced mathematics there are various technical names for differential forms. Sometimes they are called differential 1-forms, sometimes they are called covariant vector fields. In this part of elementary calculus we are looking at these objects only in the one dimensional case, when every variable quantity may be expressed in terms of a single coordinate.

Each variable quantity $w$ has a differential which is a differential form. Thus if $w = F(u)$ and $u = g(x)$, then the differential of $w$ is $dw = F'(u)\,du = F'(g(x))\,g'(x)\,dx$.

It does not make sense to say a differential form has a particular non-zero numerical value, but it does make sense to say that it is zero at a certain point.

Example. Suppose that $t$ is a coordinate for the quantities of interest, so we can always take derivatives with respect to $t$. Then it is natural to say that $dt \neq 0$. If $w = u^2 = \sin^2(t)$, then $w$ is not a coordinate, and in fact, $dw$ can vanish at certain points. Thus $dw = 2u\,du = 2\sin(t)\cos(t)\,dt = \sin(2t)\,dt$ vanishes where $u = 0$ or $du = 0$, that is, where $t$ is a multiple of $\pi/2$. These are critical points of $w$.

6.4 Second differentials

Say that $t$ is a coordinate. Since all variable quantities of interest may be expressed as differentiable functions of $t$, we may consider $dt \neq 0$. Consider another coordinate $s$. Differentiate a variable quantity $w$ with respect to these coordinates. The derivatives are related by

$$\frac{dw}{dt} = \frac{dw}{ds}\,\frac{ds}{dt}. \tag{6.16}$$

Differentiate again with respect to $t$ and use the chain rule. This gives

$$\frac{d^2w}{dt^2} = \frac{d^2w}{ds^2}\left(\frac{ds}{dt}\right)^2 + \frac{dw}{ds}\,\frac{d^2s}{dt^2}. \tag{6.17}$$

In general we may have $d^2w/dt^2$ positive or negative or zero quite independent of whether $d^2w/ds^2$ is positive or negative or zero.
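Formula (6.17) can be checked against a finite-difference second derivative. This is only a sketch: the coordinate change s = t^3 + t (which has ds/dt > 0 everywhere), the quantity w = sin(s), and the sample point t = 0.7 are all assumed for illustration.

```python
import math

def s_of_t(t):
    """Hypothetical coordinate change; ds/dt = 3t^2 + 1 is never zero."""
    return t**3 + t

def w_of_s(s):
    """Hypothetical variable quantity expressed in the s coordinate."""
    return math.sin(s)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

t = 0.7
s = s_of_t(t)

# Left side of (6.17): differentiate w twice with respect to t directly.
lhs = second_derivative(lambda tt: w_of_s(s_of_t(tt)), t)

# Right side of (6.17): assemble from derivatives in the s coordinate.
ds_dt, d2s_dt2 = 3 * t**2 + 1, 6 * t
dw_ds, d2w_ds2 = math.cos(s), -math.sin(s)
rhs = d2w_ds2 * ds_dt**2 + dw_ds * d2s_dt2
print(lhs, rhs)
```

The second term, involving dw/ds, is what spoils the simple transformation law; it drops out exactly where dw/ds = 0.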

However, at a point where $dw/ds = 0$ we have

$$\frac{d^2w}{dt^2} = \frac{d^2w}{ds^2}\left(\frac{ds}{dt}\right)^2, \tag{6.18}$$

where

$$\left(\frac{ds}{dt}\right)^2 > 0. \tag{6.19}$$

It does not make sense to say that the second differential $d^2w$ has a particular numerical value, but at a particular point where $dw = 0$ it does make sense to say that $d^2w > 0$ or $d^2w < 0$ or $d^2w = 0$ at that point. This is because the numerical value of the second differential $d^2w$ with respect to coordinate $t$ is related to the numerical value of the second differential $d^2w$ with respect to coordinate $s$ by multiplication by $(ds/dt)^2 > 0$.

To summarize, consider a variable quantity $w$. Then at a point where $dw = 0$ (first derivative test) the second differential $d^2w$ is defined, and $d^2w$ can be either positive, negative, or zero (second derivative test).

Note: The second differential is defined only at a particular point where the first differential is equal to zero. In advanced mathematics this second differential is called the Hessian. Here we are only looking at the Hessian in the very special case when every variable quantity may be expressed in terms of a single coordinate.

Example. Suppose $w = u^2 = \sin^2(t)$. Then $dw = \sin(2t)\,dt = 0$ where $t$ is a multiple of $\pi/2$. At such a point where $dw = 0$ we have $d^2w = 2\cos(2t)\,(dt)^2$. This is greater than zero when $t$ is an even multiple of $\pi/2$ and is less than zero when $t$ is an odd multiple of $\pi/2$. In the first case these are points that give local minima of $w$, and in the second case they are points that give local maxima of $w$.
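The classification in this example can be tabulated with a short sketch. The function name `classify` is hypothetical; the formulas for the first and second derivatives are the ones given in the example.

```python
import math

def w(t):
    """The variable quantity w = sin^2(t)."""
    return math.sin(t) ** 2

def classify(k):
    """Classify the critical point t = k*pi/2 by the sign of d2w/dt2 = 2 cos(2t)."""
    t = k * math.pi / 2
    first = math.sin(2 * t)        # dw/dt; vanishes up to rounding error
    second = 2 * math.cos(2 * t)   # d2w/dt2 at the critical point
    kind = "local minimum" if second > 0 else "local maximum"
    return first, second, kind

for k in range(4):
    print(k, classify(k))
```

Even values of k give minima (where w = 0) and odd values give maxima (where w = 1).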

Example: Consider the example of minimizing the area

$$A = 2(\pi r^2) + (2\pi r)h \tag{6.20}$$

of a cylinder with fixed value of the volume

$$V = (\pi r^2)h. \tag{6.21}$$

Both $r$ and $h$ are coordinates.

Calculate

$$dA = 2\pi\bigl((2r + h)\,dr + r\,dh\bigr). \tag{6.22}$$

Since $V$ is constant, we have

$$2rh\,dr + r^2\,dh = 0. \tag{6.23}$$

Thus

$$dA = 2\pi(2r - h)\,dr = \pi r\left(1 - \frac{2r}{h}\right)dh. \tag{6.24}$$

At the point where $dA = 0$ we have

$$h = 2r. \tag{6.25}$$

The second differential at this point is

$$d^2A = 12\pi\,(dr)^2 = \frac{3}{4}\pi\,(dh)^2 > 0. \tag{6.26}$$

This indicates a local minimum.
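The cylinder example can be confirmed numerically. In this sketch the volume V = 1 is an arbitrary illustrative value, h is eliminated using (6.21), and the critical radius comes from setting dA/dr = 4πr - 2V/r^2 to zero; the variable names are assumptions of the sketch.

```python
import math

V = 1.0  # fixed volume; the value is an arbitrary illustrative choice

def area(r):
    """Surface area (6.20) with h eliminated via V = pi r^2 h."""
    h = V / (math.pi * r**2)
    return 2 * math.pi * r**2 + 2 * math.pi * r * h

# Critical point: root of dA/dr = 4 pi r - 2 V / r^2.
r_star = (V / (2 * math.pi)) ** (1 / 3)
h_star = V / (math.pi * r_star**2)

# Central finite difference for the second derivative of A with respect to r.
eps = 1e-4
d2A_dr2 = (area(r_star + eps) - 2 * area(r_star) + area(r_star - eps)) / eps**2

print(h_star / r_star, d2A_dr2 / math.pi)   # approximately 2 and 12
```

The ratio h/r = 2 matches (6.25), and the second derivative 12π matches the coefficient of (dr)^2 in (6.26).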

6.5 Differential forms and indefinite integrals

Differential forms occur naturally as integrands. For example, if $u = g(x)$, then $du = g'(x)\,dx$. This gives the substitution rule

$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du. \tag{6.27}$$

Integration of a differential recovers the original variable quantity. Thus

$$\int dw = w + C. \tag{6.28}$$

Example: Since $u = x^2$ implies $du = 2x\,dx$ and $d\cos(u) = -\sin(u)\,du$, we have

$$\int \sin(x^2)\,2x\,dx = \int \sin(u)\,du = -\int d\cos(u) = -\cos(u) + C = -\cos(x^2) + C. \tag{6.29}$$


6.6 Differential forms and definite integrals

To use differential forms in definite integrals the limits of integration should be thought of as points rather than as numbers. A point may be specified by the value of a coordinate. If $x$ is a coordinate, then $x = a$ and $x = b$ are points. Then

$$\int_{x=a}^{x=b} f(g(x))\,g'(x)\,dx = \int_{u=p}^{u=q} f(u)\,du, \tag{6.30}$$

where $p = g(a)$ and $q = g(b)$. The conditions $u = p$ and $u = q$ may not uniquely specify points defined by $x$ values, but if we are only concerned with variable quantities that depend on $u$, then they determine the value of the integral.

Such a notation allows hybrid expressions in integrals like

$$\int_{x=a}^{x=b} f(g(x))\,g'(x)\,dx = \int_{x=a}^{x=b} f(u)\,du = \int_{x=a}^{x=b} dw = w|_{x=b} - w|_{x=a}, \tag{6.31}$$

where $dw = f(u)\,du$.

Example: Suppose $x$ is a coordinate. Then

$$\int_{x=0}^{x=b} \sin(x^2)\,2x\,dx = \int_{x=0}^{x=b} \sin(u)\,du = -\int_{x=0}^{x=b} d\cos(u) = 1 - \cos(b^2). \tag{6.32}$$
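The remark that u = p may not pick out a unique x value can be made concrete. In this sketch, with u = x^2, both x = -1 and x = 1 give u = 1, yet the integral of sin(u) du up to x = 2 is the same from either starting point. The limits and the helper `simpson` are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    """sin(u) du written in the x coordinate, with u = x^2 and du = 2x dx."""
    return math.sin(x**2) * 2 * x

from_minus_one = simpson(integrand, -1.0, 2.0)   # starts at x = -1, where u = 1
from_plus_one = simpson(integrand, 1.0, 2.0)     # starts at x = +1, where also u = 1
in_u = simpson(math.sin, 1.0, 4.0)               # sin(u) du from u = 1 to u = 4
exact = math.cos(1.0) - math.cos(4.0)
print(from_minus_one, from_plus_one, in_u, exact)
```

All four numbers agree, since the integrand depends on x only through u.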

6.7 Rules for differentials

The rule for differentiating a constant is

$$dc = 0. \tag{6.33}$$

The rule for differentiating a power function is

$$d x^p = p x^{p-1}\,dx. \tag{6.34}$$

The rule for differentiating the exponential function is

$$d e^u = e^u\,du. \tag{6.35}$$

The rule for differentiating the sine function is

$$d\sin(\theta) = \cos(\theta)\,d\theta. \tag{6.36}$$

The rule for differentiating the cosine function is

$$d\cos(\theta) = -\sin(\theta)\,d\theta. \tag{6.37}$$

The sum (difference) rule is

$$d(u \pm v) = du \pm dv. \tag{6.38}$$

The product rule is

$$d(uv) = du\,v + u\,dv. \tag{6.39}$$

The quotient rule is

$$d\,\frac{u}{v} = \frac{du\,v - u\,dv}{v^2}. \tag{6.40}$$

The chain rule is

$$d f(u) = f'(u)\,du. \tag{6.41}$$
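In a coordinate t these rules become statements about derivatives with respect to t, which can be spot-checked by finite differences. The choices u = sin(t), v = e^t, the point t = 0.4, and the helper `derivative` are assumptions made only for this sketch.

```python
import math

def derivative(f, t, h=1e-6):
    """Central finite-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# Illustrative choice: u = sin(t), v = e^t, with their known derivatives.
u, du = math.sin, math.cos
v, dv = math.exp, math.exp
t = 0.4

# Product rule (6.39), divided through by dt.
product_numeric = derivative(lambda x: u(x) * v(x), t)
product_rule = du(t) * v(t) + u(t) * dv(t)

# Quotient rule (6.40), divided through by dt.
quotient_numeric = derivative(lambda x: u(x) / v(x), t)
quotient_rule = (du(t) * v(t) - u(t) * dv(t)) / v(t) ** 2

print(product_numeric, product_rule)
print(quotient_numeric, quotient_rule)
```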

6.8 Rules for integration

The fundamental theorem of calculus says that

$$\int dw = w + C. \tag{6.42}$$

The rule for integrating a constant multiple is

$$\int c\,u\,dx = c \int u\,dx. \tag{6.43}$$

The sum (difference) rule is

$$\int (u \pm v)\,dx = \int u\,dx \pm \int v\,dx. \tag{6.44}$$

The integration by parts rule is

$$\int u\,dv = uv - \int v\,du. \tag{6.45}$$

Suppose $u = g(x)$. Since $du = g'(x)\,dx$, the substitution rule is

$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du. \tag{6.46}$$

6.9 Integration examples

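With $u = x^4 + 5$ we have $du = 4x^3\,dx$. So

$$\int x^3 \cos(x^4 + 5)\,dx = \frac{1}{4}\int \cos(u)\,du = \frac{1}{4}\int d\sin(u) = \frac{1}{4}\sin(u) + C = \frac{1}{4}\sin(x^4 + 5) + C. \tag{6.47}$$

Since $w = 1 + \sqrt{x}$ is inverted by $x = (w - 1)^2$, we have $dx = 2(w - 1)\,dw$, and so

$$\int \sqrt{1 + \sqrt{x}}\,dx = \int \sqrt{w}\,2(w - 1)\,dw = 2\cdot\frac{2}{5}w^{5/2} - 2\cdot\frac{2}{3}w^{3/2} + C. \tag{6.48}$$

To finish, substitute $w = 1 + \sqrt{x}$ to express the result in terms of $x$.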
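The antiderivative in (6.48) can be checked by comparing a numerical definite integral with the difference of its values at the endpoints. The interval from 1 to 4 is an arbitrary illustrative choice (it avoids x = 0, where the square root is not smooth), and `simpson` and `antiderivative` are hypothetical helper names.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return math.sqrt(1 + math.sqrt(x))

def antiderivative(x):
    """Right side of (6.48) with w = 1 + sqrt(x) substituted back and C = 0."""
    w = 1 + math.sqrt(x)
    return (4 / 5) * w**2.5 - (4 / 3) * w**1.5

a, b = 1.0, 4.0
numeric = simpson(integrand, a, b)
from_formula = antiderivative(b) - antiderivative(a)
print(numeric, from_formula)
```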


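The preceding problem could also be done mindlessly by $w = 1 + \sqrt{x}$ and $dw = \frac{1}{2\sqrt{x}}\,dx$. Then

$$\int \sqrt{1 + \sqrt{x}}\,dx = \int \sqrt{w}\,2\sqrt{x}\,dw = 2\int \sqrt{w}\,(w - 1)\,dw \tag{6.49}$$

is completed as before.

One learns from experience that quadratic expressions are simplified by trigonometric substitution. Thus $x^2 + 9$ is simplified by the substitution $x = 3\tan(\theta)$. This gives

$$\int \frac{1}{x^2 + 9}\,dx = \frac{1}{9}\int \frac{1}{\tan^2(\theta) + 1}\,\frac{3}{\cos^2(\theta)}\,d\theta = \frac{1}{3}\int d\theta = \frac{1}{3}\theta + C = \frac{1}{3}\arctan\left(\frac{x}{3}\right) + C. \tag{6.50}$$

Say $x > 0$. Then the exponential substitution $x = e^u$ gives

$$\int \frac{1}{x}\,dx = \int du = u + C = \ln(x) + C. \tag{6.51}$$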
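The results (6.50) and (6.51) can be sanity-checked as definite integrals. In this sketch the limits (0 to 3 for the first, 1 to e for the second) are chosen so that the exact values are π/12 and 1; the limits and the helper `simpson` are assumptions of the illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# (6.50): integral of 1/(x^2 + 9) from 0 to 3 is (1/3)(arctan(1) - arctan(0)).
trig_numeric = simpson(lambda x: 1 / (x**2 + 9), 0.0, 3.0)
trig_formula = (math.atan(3.0 / 3) - math.atan(0.0)) / 3

# (6.51): integral of 1/x from 1 to e is ln(e) - ln(1).
log_numeric = simpson(lambda x: 1 / x, 1.0, math.e)
log_formula = math.log(math.e) - math.log(1.0)

print(trig_numeric, trig_formula)
print(log_numeric, log_formula)
```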

Let me conclude by thanking Ali Vafaei for help in getting errors out of these notes. Those that remain are mine.
