partial differentiation seminar
TRANSCRIPT
Partial Differentiation
Quick Introduction:
Differentiation is the mathematical act of taking the derivative of a function. Partial
differentiation is the mathematical act of taking the derivative of a function that depends on more
than one variable with respect to one or more of those variables. If a differential equation
contains partial derivatives with respect to more than one independent variable then it is called
a partial differential equation (PDE). Partial differentiation is a pivotal technique used in
chemistry, physics, and engineering. Partial differential equations are in general difficult to
solve, but their importance in applications justifies the effort. Many partial differential equations
describe situations in which a property depends not only on time but also on position.
One such situation is heat flow through a thin wire, which involves position along the wire as
well as time.
Partial differentiation involves a process by which the derivatives of a function containing
multiple independent variables are found by considering all but the variable of interest as fixed
during differentiation. Partial differentiation corresponds to the same thing as ordinary
differentiation in that it represents an infinitesimal change in the function with respect to a given
parameter. The difference is that partial differentiation is performed on an equation with more
than one independent variable whereas ordinary differentiation is performed on an equation with
only one independent variable. The partial derivative is usually denoted ∂f/∂x or fx. Both
denote the partial derivative with respect to x of a function f that depends on more independent
variables than just x. If a partial derivative is of second order or greater and is taken with
respect to two or more different variables, it is called a mixed partial derivative. For example, if f
is a function of x, y, and z (f(x, y, z)) and the partial derivative with respect to x is taken, followed
by the partial derivative with respect to y, the result is a mixed partial derivative, denoted fxy
or ∂²f/∂y∂x. For functions whose second partial derivatives exist and are continuous, the
mixed partial derivatives are equal no matter in which order the differentiation is performed:
fxy = fyx.
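The symmetry fxy = fyx can be checked numerically. The sketch below (a Python illustration; the sample function and step size are arbitrary choices, not from the text) approximates both mixed partials by nested central differences:

```python
import math

def f(x, y):
    # a sample smooth function of two variables (an arbitrary choice for illustration)
    return x**3 * y**2 + math.sin(x * y)

def d2f_xy(x, y, h=1e-4):
    # differentiate in x first, then in y, by nested central differences
    dfdx = lambda xx, yy: (f(xx + h, yy) - f(xx - h, yy)) / (2 * h)
    return (dfdx(x, y + h) - dfdx(x, y - h)) / (2 * h)

def d2f_yx(x, y, h=1e-4):
    # differentiate in y first, then in x
    dfdy = lambda xx, yy: (f(xx, yy + h) - f(xx, yy - h)) / (2 * h)
    return (dfdy(x + h, y) - dfdy(x - h, y)) / (2 * h)

fxy = d2f_xy(1.0, 2.0)
fyx = d2f_yx(1.0, 2.0)
# analytic value of both: 6*x**2*y + cos(x*y) - x*y*sin(x*y)
```

Both approximations agree with each other, and with the analytic mixed partial, to within finite-difference error, illustrating the equality for this smooth f.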
In general, partial differential equations are more difficult to solve analytically than ordinary
differential equations, because the multiple variables involved make them more complex.
Several methods have been developed over the
years specifically to solve partial differential equations. Some of these methods include
the Bäcklund transformation, the method of characteristics, Green's functions, the
Lagrange multiplier method, integral transforms, Lax pairs, and separation of variables. The
Lagrange multiplier method was probably the first of these methods formally devised for such
problems. Joseph-Louis Lagrange, an Italian-born mathematician, published this
method involving multipliers to investigate the motion of a particle in space that is constrained
to move on a surface defined by an equation involving three independent variables. The method
was published in Lagrange's book Mécanique analytique in 1788 and is currently used to
maximize or minimize a function that is subject to a constraint. It can be employed in a variety of
situations, such as minimizing the fuel required for a spacecraft to reach its desired trajectory, or
maximizing the productivity of a commercial enterprise limited by the availability of
financial, natural, and personnel resources. In addition to these analytical methods, numerical
methods can be applied to solve partial differential equations.
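As an illustration of the Lagrange multiplier method, consider maximizing the area of a rectangle with a fixed perimeter. The sketch below is hypothetical (the objective, constraint, and numbers are chosen for illustration, not taken from the text): it solves the stationarity condition ∇f = λ∇g by hand and confirms the answer with a brute-force search along the constraint:

```python
# Maximize the area f(x, y) = x*y subject to the perimeter constraint
# g(x, y) = 2x + 2y - 20 = 0 (illustrative numbers).
# Stationarity grad f = lambda * grad g gives y = 2*lam and x = 2*lam, hence x = y;
# substituting into the constraint yields x = y = 5.
x_opt = y_opt = 5.0
area_opt = x_opt * y_opt

# brute-force check along the constraint (y = 10 - x):
best = max(x * (10 - x) for x in (i / 1000 for i in range(10001)))
```

The grid search never beats the stationary point, consistent with the multiplier condition identifying the constrained maximum.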
Although partial differential equations are in general difficult to solve, second-order partial
differential equations can often be treated analytically. They can be classified as
elliptic, hyperbolic, or parabolic on the basis of a particular coefficient matrix or the discriminant
of that matrix. Each class behaves quite differently from the others. Second-order
partial differential equations in the elliptic class, such as Laplace's equation and Poisson's
equation, produce stationary, energy-minimizing solutions. Those classified as hyperbolic, such
as the wave equation, yield a propagating disturbance. The last of the classes, the parabolic
equations, such as the heat conduction equation and other diffusion equations, produce a
smooth, spreading flow of an initial disturbance.
Graphical explanation of partial differentiation:
Suppose that ƒ is a function of more than one variable. For instance, z = f(x, y) = x² + xy + y².
A graph of z = x² + xy + y². For the partial derivative at (1, 1, 3) that leaves y constant, the
corresponding tangent line is parallel to the xz-plane.
A slice of the graph above at y = 1.
It is difficult to describe the derivative of such a function, as there are an infinite number of
tangent lines to every point on this surface. Partial differentiation is the act of choosing one of
these lines and finding its slope. Usually, the lines of most interest are those that are parallel to
the xz-plane, and those that are parallel to the yz-plane.
A good way to find these parallel lines is to treat the other variable as a constant. For example, to
find the tangent line of the above function at (1, 1, 3) that is parallel to the xz-plane, we treat the
y variable as constant. The graph and this plane are shown on the right. On the left, we see the
way the function looks on the plane y = 1. By finding the derivative of the equation while
assuming that y is a constant, we discover that the slope of ƒ in the plane y = 1 is:
∂z/∂x = 2x + y.
So at (1, 1, 3), by substitution, the slope is 3. Therefore
∂z/∂x = 3
at the point (1, 1, 3), which is read as "the partial derivative of z with respect to x at (1, 1, 3) is 3."
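This slope can be confirmed numerically. A minimal Python sketch, using a central finite difference with an arbitrarily chosen step h:

```python
def f(x, y):
    return x**2 + x * y + y**2  # the surface z = x^2 + xy + y^2 from the text

h = 1e-6
# slope of the trace y = 1 at x = 1: difference in x only, y held fixed
slope = (f(1 + h, 1) - f(1 - h, 1)) / (2 * h)
```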
Definition:
Basic Definition:
The function f can be reinterpreted as a family of functions of one variable indexed by the other
variables:
f(x, y) = fx(y) = x² + xy + y²
In other words, every value of x defines a function, denoted fx, which is a function of one real
number. That is,
x ↦ fx,  fx(y) = x² + xy + y²
Once a value of x is chosen, say a, then f(x, y) determines a function fa which sends y to a² + ay +
y²:
fa(y) = a² + ay + y²
In this expression, a is a constant, not a variable, so fa is a function of only one real variable, that
being y. Consequently, the definition of the derivative for a function of one variable applies:
f′a(y) = a + 2y
The above procedure can be performed for any choice of a. Assembling the derivatives together
into a function gives a function which describes the variation of f in the y direction:
(∂f/∂y)(x, y) = x + 2y
This is the partial derivative of f with respect to y. Here ∂ is a rounded d called the partial
derivative symbol. To distinguish it from the letter d, ∂ is sometimes pronounced "del" or
"partial" instead of "dee".
In general, the partial derivative of a function f(x1, ..., xn) in the direction xi at the point (a1, ..., an)
is defined to be:
∂f/∂xi(a1, ..., an) = lim h→0 [f(a1, ..., ai + h, ..., an) − f(a1, ..., an)] / h.
In the above difference quotient, all the variables except xi are held fixed. That choice of fixed
values determines a function of one variable,
fa(xi) = f(a1, ..., xi, ..., an),
and by definition,
dfa/dxi(ai) = ∂f/∂xi(a1, ..., an).
In other words, the different choices of a index a family of one-variable functions just as in the
example above. This expression also shows that the computation of partial derivatives reduces to
the computation of one-variable derivatives.
An important example of a function of several variables is the case of a scalar-valued function
f(x1, ..., xn) on a domain in Euclidean space Rn (e.g., on R2 or R3). In this case f has a partial
derivative ∂f/∂xj with respect to each variable xj. At the point a, these partial derivatives define
the vector
∇f(a) = (∂f/∂x1(a), ..., ∂f/∂xn(a)).
This vector is called the gradient of f at a. If f is differentiable at every point in some domain,
then the gradient is a vector-valued function ∇f which takes the point a to the vector ∇f(a).
Consequently, the gradient produces a vector field.
A common abuse of notation is to define the del operator (∇) as follows in three-dimensional
Euclidean space R3 with unit vectors î, ĵ, k̂:
∇ = î ∂/∂x + ĵ ∂/∂y + k̂ ∂/∂z
Or, more generally, for n-dimensional Euclidean space Rn with coordinates (x1, x2, x3, ..., xn) and
unit vectors (ê1, ê2, ..., ên):
∇ = ê1 ∂/∂x1 + ê2 ∂/∂x2 + ... + ên ∂/∂xn
Formal definition:
Like ordinary derivatives, the partial derivative is defined as a limit. Let U be an open subset of
Rn and f : U → R a function. The partial derivative of f at the point a = (a1, ..., an) ∈ U with
respect to the i-th variable xi is defined as
∂f/∂xi(a) = lim h→0 [f(a1, ..., ai + h, ..., an) − f(a1, ..., an)] / h.
Even if all partial derivatives ∂f/∂xi(a) exist at a given point a, the function need not be
continuous there. However, if all partial derivatives exist in a neighborhood of a and are
continuous there, then f is totally differentiable in that neighborhood and the total derivative is
continuous. In this case, it is said that f is a C1 function. This can be used to generalize to vector-
valued functions (f : U → Rm) by carefully using a componentwise argument.
The partial derivative can be seen as another function defined on U and can again be partially
differentiated. If all mixed second-order partial derivatives are continuous at a point (or on a set),
f is termed a C2 function at that point (or on that set); in this case, the partial derivatives can be
exchanged by Clairaut's theorem:
∂²f/∂xi∂xj = ∂²f/∂xj∂xi.
Examples:
The volume of a cone depends on height and radius
The volume V of a cone depends on the cone's height h and its radius r according to the formula
V(r, h) = πr²h/3.
The partial derivative of V with respect to r is
∂V/∂r = 2πrh/3,
which represents the rate with which a cone's volume changes if its radius is varied and its height
is kept constant. The partial derivative with respect to h is
∂V/∂h = πr²/3,
which represents the rate with which the volume changes if its height is varied and its radius is
kept constant.
By contrast, the total derivatives of V with respect to r and h are respectively
dV/dr = 2πrh/3 + (πr²/3)(dh/dr)
and
dV/dh = πr²/3 + (2πrh/3)(dr/dh).
The difference between the total and partial derivative is the elimination of indirect dependencies
between variables in the latter.
If (for some arbitrary reason) the cone's proportions have to stay the same, with the height and
radius in a fixed ratio k,
h = kr, so V = πkr³/3.
This gives the total derivative with respect to r:
dV/dr = 2πkr²/3 + πkr²/3 = πkr².
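These formulas are easy to check numerically. The sketch below (with illustrative values r = 2 and k = 3) approximates the partial derivatives by holding one argument fixed, and the total derivative by moving both arguments together along the constraint h = kr:

```python
import math

def V(r, h):
    return math.pi * r**2 * h / 3  # volume of a cone

r, k = 2.0, 3.0   # illustrative values; the height is tied to the radius by h = k*r
h = k * r
eps = 1e-6

# partial derivatives: vary one argument, hold the other fixed
dV_dr = (V(r + eps, h) - V(r - eps, h)) / (2 * eps)
dV_dh = (V(r, h + eps) - V(r, h - eps)) / (2 * eps)

# total derivative along the constraint h = k*r: both arguments vary together
dV_dr_total = (V(r + eps, k * (r + eps)) - V(r - eps, k * (r - eps))) / (2 * eps)
```

The total derivative equals the partial in r plus the partial in h times dh/dr = k, as the chain rule predicts.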
Equations involving an unknown function's partial derivatives are called partial differential
equations and are common in physics, engineering, and other sciences and applied disciplines.
Notation:
For the following examples, let f be a function in x, y, and z.
First-order partial derivatives: fx = ∂f/∂x, fy = ∂f/∂y, fz = ∂f/∂z.
Second-order partial derivatives: fxx = ∂²f/∂x², fyy = ∂²f/∂y², fzz = ∂²f/∂z².
Second-order mixed derivatives: fxy = ∂²f/∂y∂x, fxz = ∂²f/∂z∂x, fyz = ∂²f/∂z∂y.
Higher-order partial and mixed derivatives: for example, fxyy = ∂³f/∂y²∂x.
When dealing with functions of multiple variables, some of these variables may be related to
each other, and it may be necessary to specify explicitly which variables are being held constant.
In fields such as statistical mechanics, the partial derivative of f with respect to x, holding y and z
constant, is often expressed as
(∂f/∂x)y,z,
where the subscripts indicate the variables held constant.
Antiderivative analogue:
There is a concept for partial derivatives that is analogous to antiderivatives for regular
derivatives. Given a partial derivative, it allows for the partial recovery of the original function.
Consider the example of ∂z/∂x = 2x + y. The "partial" integral can be taken with respect to x
(treating y as constant, in a manner similar to partial differentiation):
z = ∫ (2x + y) dx = x² + xy + g(y).
Here, the "constant" of integration is no longer a constant, but instead a function of all the
variables of the original function except x. The reason for this is that all the other variables are
treated as constant when taking the partial derivative, so any function which does not involve x
will disappear when taking the partial derivative, and we have to account for this when we take
the antiderivative. The most general way to represent this is to have the "constant" represent an
unknown function of all the other variables.
Thus the set of functions x² + xy + g(y), where g is any one-argument function, represents the
entire set of functions in variables x, y that could have produced the x-partial derivative 2x + y.
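This can be verified numerically: every member of the family has the same x-partial derivative, regardless of g. A small Python sketch (the sample choices of g and the evaluation point are arbitrary):

```python
def ddx(F, x, y, h=1e-6):
    # numerical partial derivative with respect to x (y held fixed)
    return (F(x + h, y) - F(x - h, y)) / (2 * h)

# three members of the family x^2 + xy + g(y), with different choices of g
candidates = [
    lambda x, y: x**2 + x * y,           # g(y) = 0
    lambda x, y: x**2 + x * y + y**3,    # g(y) = y^3
    lambda x, y: x**2 + x * y + 7.0,     # g(y) = 7
]
slopes = [ddx(F, 1.5, 2.0) for F in candidates]   # all should equal 2x + y = 5
```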
If all the partial derivatives of a function are known (for example, with the gradient), then the
antiderivatives can be matched via the above process to reconstruct the original function up to a
constant.
A relatively simple partial differential equation is
∂u/∂x (x, y) = 0.
This relation implies that the function u(x, y) is independent of x. Hence the general solution of
this equation is
u(x, y) = f(y),
where f is an arbitrary function of y. The analogous ordinary differential equation is
du/dx (x) = 0,
which has the solution
u(x) = c,
where c is any constant value (independent of x). These two examples illustrate that general
solutions of ordinary differential equations involve arbitrary constants, but solutions of partial
differential equations involve arbitrary functions. A solution of a partial differential equation is
generally not unique; additional conditions must generally be specified on the boundary of the
region where the solution is defined. For instance, in the simple example above, the function f(y)
can be determined if u is specified on the line x = 0.
Existence and uniqueness:
Although the issue of the existence and uniqueness of solutions of ordinary differential equations
has a very satisfactory answer with the Picard–Lindelöf theorem, that is far from the case for
partial differential equations. There is a general theorem (the Cauchy–Kowalevski theorem) that
states that the Cauchy problem for any partial differential equation that is analytic in the
unknown function and its derivatives has a unique analytic solution. Although this result might
appear to settle the existence and uniqueness of solutions, there are examples of linear partial
differential equations whose coefficients have derivatives of all orders (which are nevertheless
not analytic) but which have no solutions at all: see Lewy (1957). Even if the solution of a partial
differential equation exists and is unique, it may nevertheless have undesirable properties. The
mathematical study of these questions is usually in the more powerful context of weak solutions.
An example of pathological behavior is the sequence of Cauchy problems (depending upon n)
for the Laplace equation
∂²u/∂x² + ∂²u/∂y² = 0,
with initial conditions
u(x, 0) = 0,  ∂u/∂y(x, 0) = sin(nx)/n,
where n is an integer. The derivative of u with respect to y approaches 0 uniformly in x as n
increases, but the solution is
u(x, y) = sinh(ny) sin(nx)/n².
As n increases, this solution approaches infinity whenever nx is not an integer multiple of π, for
any non-zero value of y.
The Cauchy problem for the Laplace equation is called ill-posed or not well posed, since the
solution does not depend continuously upon the data of the problem. Such ill-posed problems are
not usually satisfactory for physical applications.
Notation:
In PDEs, it is common to denote partial derivatives using subscripts. That is:
ux = ∂u/∂x,  uxx = ∂²u/∂x²,  uxy = ∂²u/∂y∂x.
Especially in (mathematical) physics, one often prefers the use of del (which in cartesian
coordinates is written ∇ = (∂/∂x, ∂/∂y, ∂/∂z)) for spatial derivatives and a dot (u̇) for time
derivatives. For example, the wave equation (described below) can be written as
ü = c²∇²u (physics notation),
or
utt = c²Δu (math notation), where Δ is the Laplace operator. This often leads to
misunderstandings regarding the Δ (delta) operator.
Examples:
The equation for conduction of heat in one dimension for a homogeneous body has the form
∂u/∂t = α ∂²u/∂x²,
where u(t,x) is temperature, and α is a positive constant that describes the rate of diffusion. The
Cauchy problem for this equation consists in specifying u(0,x) = f(x), where f(x) is an arbitrary
function.
General solutions of the heat equation can be found by the method of separation of variables.
Some examples appear in the heat equation article. They are examples of Fourier series for
periodic f and Fourier transforms for non-periodic f. Using the Fourier transform, a general
solution of the heat equation has the form
u(t, x) = ∫ F(ξ) e^(iξx − αξ²t) dξ,
where F is an arbitrary function. In order to satisfy the initial condition, F is given by the Fourier
transform of f, that is
F(ξ) = (1/2π) ∫ f(x) e^(−iξx) dx.
If f represents a very small but intense source of heat, then the preceding integral can be
approximated by the delta distribution, multiplied by the strength of the source. For a source
whose strength is normalized to 1, the result is
F(ξ) = 1/2π,
and the resulting solution of the heat equation is
u(t, x) = (1/2π) ∫ e^(iξx − αξ²t) dξ.
This is a Gaussian integral. It may be evaluated to obtain
u(t, x) = (1/√(4παt)) e^(−x²/(4αt)).
This result corresponds to a normal probability density for x with mean 0 and variance 2αt. The
heat equation and similar diffusion equations are useful tools to study random phenomena.
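The point-source solution and its probabilistic interpretation can be checked numerically. The sketch below (with illustrative values of α and t) verifies that the kernel has total mass 1 and variance 2αt, and that it satisfies the heat equation at a sample point:

```python
import math

alpha, t = 0.5, 1.0  # illustrative diffusion constant and observation time

def u(t_, x_):
    # fundamental (point-source) solution of u_t = alpha * u_xx
    return math.exp(-x_**2 / (4 * alpha * t_)) / math.sqrt(4 * math.pi * alpha * t_)

# total heat and variance via a Riemann sum over a wide interval
dx = 0.001
xs = [i * dx for i in range(-15000, 15001)]
mass = sum(u(t, x) for x in xs) * dx              # should be 1 (normalized source)
variance = sum(x * x * u(t, x) for x in xs) * dx  # should be 2*alpha*t

# check the PDE u_t = alpha * u_xx at one sample point by finite differences
e = 1e-4
t0, x0 = 1.0, 0.7
u_t = (u(t0 + e, x0) - u(t0 - e, x0)) / (2 * e)
u_xx = (u(t0, x0 + e) - 2 * u(t0, x0) + u(t0, x0 - e)) / e**2
```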
The wave equation for an unknown function u(t, x) is
∂²u/∂t² = c² ∂²u/∂x².
Here u might describe the displacement of a stretched string from equilibrium, or the difference
in air pressure in a tube, or the magnitude of an electromagnetic field in a tube, and c is a number
that corresponds to the velocity of the wave. The Cauchy problem for this equation consists in
prescribing the initial displacement and velocity of a string or other medium:
u(0, x) = f(x),  ∂u/∂t(0, x) = g(x),
where f and g are arbitrary given functions. The solution of this problem is given by d'Alembert's
formula:
u(t, x) = ½[f(x + ct) + f(x − ct)] + (1/2c) ∫ from x − ct to x + ct of g(s) ds.
This formula implies that the solution at (t,x) depends only upon the data on the segment of the
initial line that is cut out by the characteristic curves
x − ct = constant,  x + ct = constant,
that are drawn backwards from that point. These curves correspond to signals that propagate with
velocity c forward and backward. Conversely, the influence of the data at any given point on the
initial line propagates with the finite velocity c: there is no effect outside a triangle through that
point whose sides are characteristic curves. This behavior is very different from the solution for
the heat equation, where the effect of a point source appears (with small amplitude)
instantaneously at every point in space. The solution given above is also valid if t is negative,
and the explicit formula shows that the solution depends smoothly upon the data: both the
forward and backward Cauchy problems for the wave equation are well-posed.
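d'Alembert's formula is easy to evaluate directly. In the sketch below (the wave speed and initial data are illustrative choices), taking f(x) = sin x and g ≡ 0 gives the standing wave sin(x)·cos(ct), which the code confirms:

```python
import math

c = 2.0          # wave speed (illustrative)
f = math.sin     # initial displacement u(0, x) = sin x
# initial velocity g = 0, so the integral term in d'Alembert's formula vanishes

def u(t, x):
    # d'Alembert's formula with g = 0: average of f carried left and right at speed c
    return 0.5 * (f(x + c * t) + f(x - c * t))

# the same solution in product form: sin(x) * cos(c*t)
t0, x0 = 0.3, 1.1
val = u(t0, x0)
exact = math.sin(x0) * math.cos(c * t0)
```

Note that u(t, x) uses f only at the two points x ± ct, the ends of the domain-of-dependence segment described above.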
Spherical waves are waves whose amplitude depends only upon the radial distance r from a
central point source. For such waves, the three-dimensional wave equation takes the form
∂²u/∂t² = c² (∂²u/∂r² + (2/r) ∂u/∂r).
This is equivalent to
∂²(ru)/∂t² = c² ∂²(ru)/∂r²,
and hence the quantity ru satisfies the one-dimensional wave equation. Therefore a general
solution for spherical waves has the form
u(t, r) = (1/r) [F(r − ct) + G(r + ct)],
where F and G are completely arbitrary functions. Radiation from an antenna corresponds to the
case where G is identically zero. Thus the wave form transmitted from an antenna has no
distortion in time: the only distorting factor is 1/r. This feature of undistorted propagation of
waves is not present if there are two spatial dimensions.
Laplace equation in two dimensions:
The Laplace equation for an unknown function φ of two variables has the form
∂²φ/∂x² + ∂²φ/∂y² = 0.
Solutions of Laplace's equation are called harmonic functions.
Connection with holomorphic functions:
Solutions of the Laplace equation in two dimensions are intimately connected with analytic
functions of a complex variable (a.k.a. holomorphic functions): the real and imaginary parts of
any analytic function are conjugate harmonic functions: they both satisfy the Laplace equation,
and their gradients are orthogonal. If f = u + iv, then the Cauchy–Riemann equations state that
∂u/∂x = ∂v/∂y,  ∂u/∂y = −∂v/∂x,
and it follows that
∂²u/∂x² + ∂²u/∂y² = 0,  ∂²v/∂x² + ∂²v/∂y² = 0.
Conversely, given any harmonic function in two dimensions, it is the real part of an analytic
function, at least locally. Details are given in Laplace equation.
A typical boundary value problem:
A typical problem for Laplace's equation is to find a solution that satisfies arbitrary values on the
boundary of a domain. For example, we may seek a harmonic function that takes on the values
u(θ) on a circle of radius one. The solution was given by Poisson:
φ(r, θ) = (1/2π) ∫ from 0 to 2π of [(1 − r²) / (1 + r² − 2r cos(θ − θ′))] u(θ′) dθ′.
Petrovsky (1967, p. 248) shows how this formula can be obtained by summing a Fourier series
for φ. If r<1, the derivatives of φ may be computed by differentiating under the integral sign, and
one can verify that φ is analytic, even if u is continuous but not necessarily differentiable. This
behavior is typical for solutions of elliptic partial differential equations: the solutions may be
much smoother than the boundary data. This is in contrast to solutions of the wave equation,
and more general hyperbolic partial differential equations, which typically have no more
derivatives than the data.
Euler–Tricomi equation:
The Euler–Tricomi equation, used in the investigation of transonic flow, is
uxx = x uyy.
Advection equation:
The advection equation describes the transport of a conserved scalar ψ in a velocity field u. It is:
∂ψ/∂t + ∇·(ψu) = 0.
If the velocity field is solenoidal (that is, ∇·u = 0), then the equation may be simplified to
∂ψ/∂t + u·∇ψ = 0.
In the one-dimensional case where u is not constant and is equal to ψ, the equation is referred to
as Burgers' equation.
Ginzburg–Landau equation:
The Ginzburg–Landau equation is used in modelling superconductivity. It is
αψ + β|ψ|²ψ + (1/(2m))(−iℏ∇ − 2eA)²ψ = 0,
where α and β are constants and i is the imaginary unit.
The Dym equation:
The Dym equation is named for Harry Dym and occurs in the study of solitons. It is
∂u/∂t = u³ ∂³u/∂x³.
Initial-boundary value problems:
If the string is stretched between two points where x=0 and x=L and u denotes the amplitude of
the displacement of the string, then u satisfies the one-dimensional wave equation in the region
where 0 < x < L and t is unlimited. Since the string is tied down at the ends, u must also satisfy the
boundary conditions
u(t, 0) = 0,  u(t, L) = 0,
as well as the initial conditions
u(0, x) = f(x),  ∂u/∂t(0, x) = g(x).
The method of separation of variables for the wave equation
∂²u/∂t² = c² ∂²u/∂x²
leads to solutions of the form
u(t, x) = T(t) X(x),
where
T″ + c²k²T = 0,  X″ + k²X = 0,
where the constant k must be determined. The boundary conditions then imply that X is a
multiple of sin kx, and k must have the form
k = nπ/L,
where n is an integer. Each term in the sum corresponds to a mode of vibration of the string. The
mode with n=1 is called the fundamental mode, and the frequencies of the other modes are all
multiples of this frequency. They form the overtone series of the string, and they are the basis for
musical acoustics. The initial conditions may then be satisfied by representing f and g as infinite
sums of these modes. Wind instruments typically correspond to vibrations of an air column with
one end open and one end closed. The corresponding boundary conditions are
X(0) = 0,  X′(L) = 0.
The method of separation of variables can also be applied in this case, and it leads to a series of
odd overtones.
The general problem of this type is solved in Sturm–Liouville theory.
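The mode structure described above can be tabulated directly. The sketch below (string length and wave speed are illustrative values) lists the first few frequencies for a string fixed at both ends and for an open-closed air column, confirming the integer and odd-integer overtone series:

```python
L, c = 1.0, 340.0   # illustrative string/column length and wave speed

# string fixed at both ends: k = n*pi/L, so frequency f_n = n*c/(2L)
string_freqs = [n * c / (2 * L) for n in range(1, 5)]
string_ratios = [fr / string_freqs[0] for fr in string_freqs]        # 1, 2, 3, 4

# air column with one end open and one end closed: only odd overtones
open_closed_freqs = [(2 * n - 1) * c / (4 * L) for n in range(1, 5)]
oc_ratios = [fr / open_closed_freqs[0] for fr in open_closed_freqs]  # 1, 3, 5, 7
```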
If a membrane is stretched over a curve C that forms the boundary of a domain D in the plane, its
vibrations are governed by the wave equation
∂²u/∂t² = c² (∂²u/∂x² + ∂²u/∂y²)
if t > 0 and (x, y) is in D. The boundary condition is u(t, x, y) = 0 if (x, y) is on C. The method of
separation of variables leads to the form
u(t, x, y) = T(t) v(x, y),
which in turn must satisfy
∂²v/∂x² + ∂²v/∂y² + k²v = 0.
The latter equation is called the Helmholtz equation. The constant k must be determined in order
to allow a non-trivial v to satisfy the boundary condition on C. Such values of k² are called the
eigenvalues of the Laplacian in D, and the associated solutions are the eigenfunctions of the
Laplacian in D. The Sturm–Liouville theory may be extended to this elliptic eigenvalue problem
(Jost, 2002).
Schrödinger equation:
The Schrödinger equation
iℏ ∂ψ/∂t = −(ℏ²/2m) ∇²ψ + Vψ
is a PDE at the heart of non-relativistic quantum mechanics. In the WKB approximation it
reduces to the Hamilton–Jacobi equation.
Except for the Dym equation and the Ginzburg–Landau equation, the above equations are linear
in the sense that they can be written in the form Au = f for a given linear operator A and a given
function f. Other important non-linear equations include the Navier–Stokes equations describing
the flow of fluids, and Einstein's field equations of general relativity.
Some linear, second-order partial differential equations can be classified as parabolic, hyperbolic
or elliptic. Others such as the Euler–Tricomi equation have different types in different regions.
The classification provides a guide to appropriate initial and boundary conditions, and to
smoothness of the solutions.
Equations of second order:
Assuming uxy = uyx, the general second-order PDE in two independent variables has the form
A uxx + 2B uxy + C uyy + (lower-order terms) = 0,
where the coefficients A, B, C, etc. may depend upon x and y. This form is analogous to the
equation for a conic section:
Ax² + 2Bxy + Cy² + ··· = 0.
Just as one classifies conic sections into parabolic, hyperbolic, and elliptic based on the
discriminant B² − 4AC, the same can be done for a second-order PDE at a given point. However,
the discriminant in a PDE is given by B² − AC, owing to the convention of writing the xy term
as 2B rather than B.
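This pointwise classification is mechanical. A minimal Python sketch of the discriminant test for the form A·uxx + 2B·uxy + C·uyy + … = 0, applied to the standard examples:

```python
def classify(A, B, C):
    # Classify A*u_xx + 2B*u_xy + C*u_yy + (lower-order terms) = 0 at a point.
    disc = B * B - A * C
    if disc < 0:
        return "elliptic"
    if disc == 0:
        return "parabolic"
    return "hyperbolic"

laplace = classify(1, 0, 1)        # u_xx + u_yy = 0
heat = classify(1, 0, 0)           # u_xx - u_t = 0 (no second derivative in t)
wave = classify(1, 0, -1)          # u_xx - u_tt = 0 (with c = 1)
tricomi_left = classify(1, 0, 2)   # Euler-Tricomi u_xx - x*u_yy = 0 at x = -2
```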
1. Elliptic (B² − AC < 0): solutions of elliptic PDEs are as smooth as the coefficients allow, within the
interior of the region where the equation and solutions are defined. For example, solutions of
Laplace's equation are analytic within the domain where they are defined, but solutions may
assume boundary values that are not smooth. The motion of a fluid at subsonic speeds can be
approximated with elliptic PDEs, and the Euler–Tricomi equation is elliptic where x < 0.
2. Parabolic (B² − AC = 0): equations that are parabolic at every point can be transformed into a form
analogous to the heat equation by a change of independent variables. Solutions smooth out as the
transformed time variable increases. The Euler–Tricomi equation has parabolic type on the line
where x = 0.
3. Hyperbolic (B² − AC > 0): hyperbolic equations retain any discontinuities of functions or derivatives in
the initial data. An example is the wave equation. The motion of a fluid at supersonic speeds can
be approximated with hyperbolic PDEs, and the Euler–Tricomi equation is hyperbolic where x > 0.
If there are n independent variables x1, x2, ..., xn, a general linear partial differential equation of
second order has the form
Lu = Σi Σj aij ∂²u/∂xi∂xj + (lower-order terms) = 0.
The classification depends upon the signature of the eigenvalues of the coefficient matrix aij.
1. Elliptic: The eigenvalues are all positive or all negative.
2. Parabolic: The eigenvalues are all positive or all negative, save one which is zero.
3. Hyperbolic: There is only one negative eigenvalue and all the rest are positive, or there is
only one positive eigenvalue and all the rest are negative.
4. Ultrahyperbolic: There is more than one positive eigenvalue and more than one negative
eigenvalue, and there are no zero eigenvalues. There is only limited theory for
ultrahyperbolic equations (Courant and Hilbert, 1962).
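The signature test can be written down directly. A minimal sketch that counts eigenvalue signs (the eigenvalues are supplied by hand here; for a general coefficient matrix they would first have to be computed):

```python
def classify_by_signature(eigenvalues):
    # Classify a second-order linear PDE by the signs of the eigenvalues
    # of its (symmetric) coefficient matrix.
    pos = sum(1 for e in eigenvalues if e > 0)
    neg = sum(1 for e in eigenvalues if e < 0)
    zero = len(eigenvalues) - pos - neg
    if zero == 0 and (pos == 0 or neg == 0):
        return "elliptic"
    if zero == 1 and (pos == 0 or neg == 0):
        return "parabolic"
    if zero == 0 and min(pos, neg) == 1:
        return "hyperbolic"
    if zero == 0 and pos > 1 and neg > 1:
        return "ultrahyperbolic"
    return "unclassified"

# 3D Laplace: diag(1, 1, 1); heat: diag(1, 1, 0); wave: diag(1, 1, -1)
labels = [classify_by_signature(e) for e in
          ([1, 1, 1], [1, 1, 0], [1, 1, -1], [1, 1, -1, -1])]
```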
Systems of first-order equations and characteristic surfaces:
The classification of partial differential equations can be extended to systems of first-order
equations, where the unknown u is now a vector with m components, and the coefficient matrices
Aν are m by m matrices for ν = 1, 2, ..., n. The partial differential equation takes the form
Lu = Σν Aν ∂u/∂xν + B = 0,
where the coefficient matrices Aν and the vector B may depend upon x and u. If a hypersurface S
is given in the implicit form
φ(x1, x2, ..., xn) = 0,
where φ has a non-zero gradient, then S is a characteristic surface for the operator L at a given
point if the characteristic form vanishes:
Q(∂φ/∂x1, ..., ∂φ/∂xn) = det[ Σν Aν ∂φ/∂xν ] = 0.
The geometric interpretation of this condition is as follows: if data for u are prescribed on the
surface S, then it may be possible to determine the normal derivative of u on S from the
differential equation. If the data on S and the differential equation determine the normal
derivative of u on S, then S is non-characteristic. If the data on S and the differential equation do
not determine the normal derivative of u on S, then the surface is characteristic, and the
differential equation restricts the data on S: the differential equation is internal to S.
1. A first-order system Lu=0 is elliptic if no surface is characteristic for L: the values of u on
S and the differential equation always determine the normal derivative of u on S.
2. A first-order system is hyperbolic at a point if there is a space-like surface S with normal
ξ at that point. This means that, given any non-trivial vector η orthogonal to ξ, and a
scalar multiplier λ, the equation
Q(λξ + η) = 0
has m real roots λ1, λ2, ..., λm. The system is strictly hyperbolic if these roots are always distinct.
The geometrical interpretation of this condition is as follows: the characteristic form Q(ζ)=0
defines a cone (the normal cone) with homogeneous coordinates ζ. In the hyperbolic case, this
cone has m sheets, and the axis ζ = λ ξ runs inside these sheets: it does not intersect any of them.
But when displaced from the origin by η, this axis intersects every sheet. In the elliptic case, the
normal cone has no real sheets.
Equations of mixed type:
If a PDE has coefficients which are not constant, it is possible that it will not belong to any of
these categories but rather be of mixed type. A simple but important example is the Euler–
Tricomi equation
uxx = x uyy,
which is called elliptic-hyperbolic because it is elliptic in the region x < 0, hyperbolic in the
region x > 0, and degenerate parabolic on the line x = 0.
Separation of variables:
In the method of separation of variables, one reduces a PDE to a PDE in fewer variables, which
is an ODE if in one variable – these are in turn easier to solve.
This is possible for simple PDEs, which are called separable partial differential equations, and
the domain is generally a rectangle (a product of intervals). Separable PDEs correspond to
diagonal matrices – thinking of "the value for fixed x" as a coordinate, each coordinate can be
understood separately.
This generalizes to the method of characteristics, and is also used in integral transforms.
Method of characteristics:
In special cases, one can find characteristic curves on which the equation reduces to an ODE –
changing coordinates in the domain to straighten these curves allows separation of variables, and
is called the method of characteristics.
More generally, one may find characteristic surfaces.
Integral transform:
An integral transform may transform the PDE to a simpler one, in particular a separable PDE.
This corresponds to diagonalizing an operator.
An important example of this is Fourier analysis, which diagonalizes the heat equation using the
eigenbasis of sinusoidal waves.
If the domain is finite or periodic, an infinite sum of solutions such as a Fourier series is
appropriate, but an integral of solutions such as a Fourier integral is generally required for
infinite domains. The solution for a point source for the heat equation given above is an example
for use of a Fourier integral.
Change of variables:
Often a PDE can be reduced to a simpler form with a known solution by a suitable change of
variables. For example, the Black–Scholes PDE
∂V/∂t + ½σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0
is reducible to the heat equation
∂u/∂τ = ∂²u/∂x²
by a change of variables (for complete details see Solution of the Black–Scholes Equation).
Fundamental solution:
Inhomogeneous equations can often be solved (for constant-coefficient PDEs, always) by
finding the fundamental solution (the solution for a point source), then taking the convolution
with the boundary conditions to get the solution.
This is analogous in signal processing to understanding a filter by its impulse response.
Superposition principle:
Because any superposition of solutions of a linear PDE is again a solution, the particular
solutions may then be combined to obtain more general solutions.
Methods for non-linear equations:
There are no generally applicable methods to solve non-linear PDEs. Still, existence and
uniqueness results (such as the Cauchy–Kowalevski theorem) are often possible, as are proofs of
important qualitative and quantitative properties of solutions (getting these results is a major part
of analysis). Computational methods, such as the split-step method, exist for specific nonlinear
equations like the nonlinear Schrödinger equation.
Nevertheless, some techniques can be used for several types of equations. The h-principle is the
most powerful method to solve underdetermined equations. The Riquier–Janet theory is an
effective method for obtaining information about many analytic overdetermined systems.
The method of characteristics (similarity transformation method) can be used in some very
special cases to solve partial differential equations.
In some cases, a PDE can be solved via perturbation analysis in which the solution is considered
to be a correction to an equation with a known solution. Alternatives are numerical analysis
techniques from simple finite difference schemes to the more mature multigrid and finite element
methods. Many interesting problems in science and engineering are solved in this way using
computers, sometimes high performance supercomputers.
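The simplest of the numerical techniques just mentioned, an explicit finite-difference scheme, can be sketched in a few lines of Python. The rod length, the spike initial condition, and the step ratio r are illustrative choices; r ≤ 1/2 is the usual stability condition for this particular scheme.

```python
def heat_step(u, r):
    # One explicit finite-difference update for u_t = u_xx on a rod with
    # fixed zero-temperature ends; r = dt/dx**2 must be <= 0.5 for stability.
    return ([u[0]] +
            [u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)] +
            [u[-1]])

u = [0.0] * 10 + [1.0] + [0.0] * 10   # an initial spike of heat mid-rod
for _ in range(200):
    u = heat_step(u, r=0.25)
```

After many steps the spike spreads out and decays, as heat leaks through the cold ends; multigrid and finite element methods are far more sophisticated, but rest on the same discretisation idea.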
Suppose you want to forecast the weather this weekend in Los Angeles. You construct a formula
for the temperature as a function of several environmental variables, each of which is not entirely
predictable. Now you would like to see how your weather forecast would change as one
particular environmental factor changes, holding all the other factors constant. To do this
investigation, you would use the concept of a partial derivative.
Let the temperature T depend on variables x and y: T = f(x, y). The rate of change of f with respect
to x (holding y constant) is called the partial derivative of f with respect to x and is denoted by
fx(x, y). Similarly, the rate of change of f with respect to y is called the partial derivative of f
with respect to y and is denoted by fy(x, y).
We define
fx(x, y) = lim(h→0) [f(x+h, y) − f(x, y)] / h
fy(x, y) = lim(h→0) [f(x, y+h) − f(x, y)] / h
Do you see the similarity between these and the limit
definition of the derivative of a function of one variable?
Example:
Let f(x, y) = xy². Then
fx(x, y) = lim(h→0) [(x+h)y² − xy²] / h = lim(h→0) hy² / h = y²
fy(x, y) = lim(h→0) [x(y+h)² − xy²] / h = lim(h→0) (2xyh + xh²) / h = lim(h→0) (2xy + xh) = 2xy
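These limit definitions can be checked numerically: replacing the limit by a small but finite h gives a difference quotient that approximates the partial derivative. Here is a short sketch in Python, using the example f(x, y) = xy² from above; the helper names and the sample point (3, 2) are our illustrative choices.

```python
def partial_x(f, x, y, h=1e-6):
    # Forward-difference version of the limit definition of f_x.
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    # Forward-difference version of the limit definition of f_y.
    return (f(x, y + h) - f(x, y)) / h

f = lambda x, y: x * y ** 2   # the example from the text
# At (3, 2) the exact values are f_x = y**2 = 4 and f_y = 2*x*y = 12.
```

The difference quotients agree with the exact answers to within roughly h, as the limit definition suggests.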
In practice, we use our knowledge of single-variable calculus to compute partial derivatives. To
calculate fx(x, y), you view y as a constant and differentiate f(x, y) with respect to x:
fx(x, y) = y², as expected, since (d/dx)[x] = 1.
Similarly, fy(x, y) = 2xy, since (d/dy)[y²] = 2y.
Partial Differentiation:
The usual rules for differentiation apply when dealing with several variables, but we now require
to treat the variables one at a time, keeping the others constant. It is for this reason that a new
symbol for differentiation is introduced. Consider the function
f (x, y) =
We can consider y fixed, and so treat it as a constant, to get a partial derivative
=
where we have differentiated with respect to x as usual. Or we can treat x as a constant, and
differentiate with respect to y, to get
= = .
Although a partial derivative is itself a function of several variables, we often want to evaluate it
at some fixed point, such as (x0, y0). We thus often write the partial derivative as ∂f/∂x (x0, y0).
There are a number of different notations in use to try to help understanding in different
situations. All of the following mean the same thing:
∂f/∂x (x0, y0), f1(x0, y0), fx(x0, y0) and D1f(x0, y0).
Note also that there is a simple definition of the derivative in terms of a Newton quotient:
∂f/∂x (x0, y0) = lim(h→0) [f(x0 + h, y0) − f(x0, y0)] / h,
provided of course that the limit exists.
Example 8.5 Let z = sin(x/y). Compute x ∂z/∂x + y ∂z/∂y.
Solution. Treating first y and then x as constants, we have
∂z/∂x = (1/y) cos(x/y) and ∂z/∂y = −(x/y²) cos(x/y),
thus
x ∂z/∂x + y ∂z/∂y = (x/y) cos(x/y) − (x/y) cos(x/y) = 0.
Note: this is an equation satisfied by the function we started with, which involves both
the function and its partial derivatives. We shall meet a number of examples of such partial
differential equations later.
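A quick numerical sanity check of Example 8.5 is easy to run; the central-difference helpers and the sample point (1.3, 0.7) are our illustrative choices.

```python
import math

def z(x, y):
    return math.sin(x / y)

def num_px(g, x, y, h=1e-6):
    # Central-difference estimate of the partial derivative in x.
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def num_py(g, x, y, h=1e-6):
    # Central-difference estimate of the partial derivative in y.
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 1.3, 0.7
residual = x0 * num_px(z, x0, y0) + y0 * num_py(z, x0, y0)
```

The residual x·z_x + y·z_y comes out numerically zero, and the x-derivative matches (1/y)cos(x/y), as computed in the example.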
Example 8.6 Let z = log(x/y). Show that x ∂z/∂x + y ∂z/∂y = 0.
The fact that the last two functions satisfy the same differential equation is not a coincidence.
With our next result, we can see that for any suitably differentiable function f, the function z(x, y)
= f(x/y) satisfies this partial differential equation.
Example 8.7 Let z = f(x/y), where f is suitably differentiable. Show that x ∂z/∂x + y ∂z/∂y = 0.
Because the definitions are really just versions of the 1-variable result, these examples are quite
typical; most of the usual rules for differentiation apply in the obvious way to partial derivatives
exactly as you would expect. But there are variants. Here is how we differentiate compositions.
Example 8.8 Assume that f and all its partial derivatives fx and fy are continuous, and that x =
x(t) and y = y(t) are themselves differentiable functions of t. Let
F(t) = f (x(t), y(t)).
Then F is differentiable and
dF/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).
Proof. Write x = x(t), x0 = x(t0), etc. Then we calculate the Newton quotient for F:
F(t) − F(t0) = f(x, y) − f(x0, y0)
= f(x, y) − f(x0, y) + f(x0, y) − f(x0, y0)
= fx(ξ, y)(x − x0) + fy(x0, η)(y − y0).
Here we have used the Mean Value Theorem (5.18) to write
f(x, y) − f(x0, y) = fx(ξ, y)(x − x0)
for some point ξ between x and x0, and have argued similarly for the other part. Note that ξ,
pronounced ``xi'', is the Greek letter ``x''; in the same way η, pronounced ``eta'', is the Greek
letter ``y''. Thus
[F(t) − F(t0)] / (t − t0) = fx(ξ, y) (x − x0)/(t − t0) + fy(x0, η) (y − y0)/(t − t0).
Now let t → t0, and note that in this case x → x0 and y → y0; and since ξ and η are trapped
between x and x0, and y and y0 respectively, then also ξ → x0 and η → y0. The result then
follows from the continuity of the partial derivatives.
Example 8.9 Let f(x, y) = xy, and let x = cos t, y = sin t. Compute dF/dt when t = π/2.
Solution. From the chain rule,
dF/dt = fx (dx/dt) + fy (dy/dt) = −y(t) sin t + x(t) cos t = −sin²(π/2) + 0 = −1.
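The chain-rule answer can be verified against a direct difference quotient of F itself; the function names and the extra check at t = 0.8 are our illustrative additions.

```python
import math

def F(t):
    # F(t) = f(x(t), y(t)) with f(x, y) = x*y, x = cos t, y = sin t.
    return math.cos(t) * math.sin(t)

def dF_chain(t):
    # Chain rule: dF/dt = f_x*(dx/dt) + f_y*(dy/dt) = -y*sin(t) + x*cos(t).
    x, y = math.cos(t), math.sin(t)
    return -y * math.sin(t) + x * math.cos(t)

def dF_numeric(t, h=1e-6):
    # Central difference quotient of F, for comparison.
    return (F(t + h) - F(t - h)) / (2 * h)
```

Both routes give the same derivative, and at t = π/2 the chain-rule value is −1, as in the example.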
The chain rule easily extends to the several variable case; only the notation is complicated. We
state a typical example
Example 8.10 Let x = x(u, v), y = y(u, v) and z = z(u, v), and let f be a function defined on a
subset U of R³, and suppose that all the partial derivatives of f are continuous. Write
F(u, v) = f(x(u, v), y(u, v), z(u, v)).
Then
∂F/∂u = (∂f/∂x)(∂x/∂u) + (∂f/∂y)(∂y/∂u) + (∂f/∂z)(∂z/∂u), and
∂F/∂v = (∂f/∂x)(∂x/∂v) + (∂f/∂y)(∂y/∂v) + (∂f/∂z)(∂z/∂v).
The introduction of the domain of f above, simply to show it was a function of three variables is
clumsy. We often do it more quickly by saying
Let f (x, y, z) have continuous partial derivatives
This has the advantage that you are reminded of the names of the variables on which f acts,
although strictly speaking, these names are not bound to the corresponding places. This is an
example where we adopt the notation which is very common in engineering maths. But note the
confusion if you ever want to talk about the value f (y, z, x), perhaps to define a new function g(x,
y, z).
Example 8.11 Assume that f(u, v, w) has continuous partial derivatives, and that
u = x − y, v = y − z, w = z − x.
Let
F(x, y, z) = f(u(x, y, z), v(x, y, z), w(x, y, z)).
Show that
∂F/∂x + ∂F/∂y + ∂F/∂z = 0.
Solution. We apply the chain rule, noting first that from the change of variable formulae we have
∂u/∂x = 1, ∂u/∂y = −1, ∂u/∂z = 0,
∂v/∂x = 0, ∂v/∂y = 1, ∂v/∂z = −1,
∂w/∂x = −1, ∂w/∂y = 0, ∂w/∂z = 1.
Then
∂F/∂x = fu·1 + fv·0 + fw·(−1) = fu − fw,
∂F/∂y = fu·(−1) + fv·1 + fw·0 = −fu + fv,
∂F/∂z = fu·0 + fv·(−1) + fw·1 = −fv + fw,
and adding the three lines gives ∂F/∂x + ∂F/∂y + ∂F/∂z = 0.
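The identity in Example 8.11 holds for any suitably differentiable f, so it can be spot-checked numerically with an arbitrary smooth test function; our choice of f and of the sample point is purely illustrative.

```python
import math

def f(u, v, w):
    # Any smooth function works here; this particular one is just for illustration.
    return math.sin(u) + v * v + math.exp(w)

def F(x, y, z):
    # The composed function with u = x - y, v = y - z, w = z - x.
    return f(x - y, y - z, z - x)

def num_partial(g, p, i, h=1e-6):
    # Central-difference partial derivative of g at point p in coordinate i.
    a = list(p); a[i] += h
    b = list(p); b[i] -= h
    return (g(*a) - g(*b)) / (2 * h)

p = (0.3, 1.1, -0.4)
total = sum(num_partial(F, p, i) for i in range(3))
```

The three partial derivatives cancel as the chain-rule computation predicts, and F_x individually matches f_u − f_w.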
Partial Derivatives -- Pictorial Representation:
Executive summary:
Partial derivatives have many important uses in math and science. We shall see that a partial derivative is
not much more or less than a particular sort of directional derivative. The only trick is to have a reliable
way of specifying directions ... so most of this note is concerned with formalizing the idea of direction.
This results in a nice graphical representation of what “partial derivative” means. Perhaps even more
importantly, the diagrams go hand-in-hand with a nice-looking exact formula in terms of wedge products.
Partial derivatives are particularly confusing in non-Cartesian coordinate systems, such as are commonly
encountered in thermodynamics.
Some of it is because of the intrinsic complexity of having so many different variables to worry
about.
Some of it is because there is a lot of sloppy notation in the literature, notation that makes it
difficult (if not impossible) to figure out what the symbols are supposed to mean. You can get
away with a certain amount of sloppiness in Cartesian coordinates, but that often leads to bad
habits. Lots of things that are true in Cartesian coordinates are not true in general.
We will illustrate the discussion with a sample problem. Let there be five variables, namely {R, G, B, V,
and Y}, four of which are explicitly shown in figure 1. Let there be D degrees of freedom, so that these
five variables exist in an abstract D-dimensional space. We assume that the number of variables is larger
than D, so that not all of the variables are independent. The relationships among the variables are implicit
in figure 1. We will use the information in the figure to find the partial derivatives, and then the partial
derivatives can be integrated to find the explicit relationships.
For simplicity, you can focus on the D=3 case, but the generalization to more degrees of
freedom, or fewer, is straightforward, and will be pointed out from time to time. (Remember that
D has nothing to do with the dimensionality of the physical space we live in.)
Figure 1: Contours of Constant R, G, B, and V
In accordance with an unhelpful thermodynamic tradition, you may be tempted to identify D of our variables as “the” independent variables and the rest of them as “the” dependent variables. Well, do so if you must, but please bear in mind that the choice of independent variables is not carved in stone. My choice may differ from your choice, and for that matter I reserve the right to change my mind in the course of the calculation. As discussed at length in section 2 and section 5, we can get along just fine treating all variables on an equal footing.
Also note that having only five variables is a simplification. In a real thermodynamics problem, you could easily have ten or more, for instance energy, entropy, temperature, pressure, volume, number of particles, chemical potential, enthalpy, free energy, free enthalpy, et cetera. There could easily be more, especially if you have D>3 degrees of freedom.
The point is, variables are variables. At each point in the abstract D-dimensional space, each of our variables takes on a definite value. In the jargon of thermodynamics, one might say that each of our variables is a “thermodynamic potential” or equivalently a “function of state”. Trying to classify some of them as “dependent” or “independent” requires us to know more than we need to know ... it requires us to make decisions that didn’t need to be made.
We will be using vectors, but we will not assume that we have a dot product. (This situation is typical in thermodynamics.) As a result, there is no unique R direction, G direction, B direction, V direction, or Y direction. The best we can do is to identify contours of constant R, G, B, V, and/or Y. In figure 1,
The plane of the paper is a contour of constant Y. In three dimensions we have shells of constant R. The red curves in the figure are where
the shells of constant R intersect the paper. So motion along a red line keeps both R and Y constant.
Similarly, we have shells of constant G (dashed green), constant B (blue), and constant V (violet).
It is worth re-emphasizing: There is no such thing as an R axis in figure 1. If there were an R axis, it would be perpendicular to the red curves (the contours of constant R), but in this space we have no notion of perpendicular. We have no notion of angle at all (other than zero angle), because we have no dot product. The best we can do is to identify the contours of constant R. Any step -- any step -- that does not follow such a contour brings an increase or decrease in R. There are innumerably many ways of taking such a step, and we have no basis for choosing which direction should represent “the” unique direction of increasing R.
You can’t specify a direction in terms of any one variable, because almost every variable is changing in almost every direction. If we had a dot product, we could identify the steepest direction, but we don’t so we can’t.
The best way to specify a direction is as the direction in which a certain D-1 variables are not changing. This works always, with or without a dot product.
In our sample problem, D=3. The black arrow in figure 1 represents a direction, namely the direction of constant R and constant Y.
In general, it is unacceptable to think of ∂G/∂B as being the derivative of G with respect to B “other things being equal” -- because in general other things cannot be equal. Which other things do you mean, anyway? Constant R? Constant V? Constant Y?
Sometimes, the context might indicate which other things are to be held constant -- but if you rely on this, it’s just a matter of time before you get into trouble. If there is any doubt, rely on the idea of directional derivative, and specify the direction. So a proper partial derivative can be written as:
(∂G/∂B)|R,Y    (1)
which is pronounced “the partial derivative of G with respect to B at constant R and Y”.
The black arrow in figure 1 depicts the physical meaning of equation 1. To evaluate the derivative, take an infinitesimal step in the direction of constant R and Y, as shown by the arrow. The length of the step doesn’t matter, so long as it is small enough; the length will drop out of the calculation.
Next, count the number of G-contours crossed by the vector. This will be the numerator. In our sample problem, the numerator is 2. Also count the number of B contours crossed by the vector. This will be the denominator. In our sample problem, the denominator is 1. Then divide numerator by denominator. We conclude that
(∂G/∂B)|R,Y = 2    (2)
We take this as the definition of partial derivative: take a step in the given direction. This gives
us two points in space, the tip-point and the tail-point. There will be a difference in the
“numerator” variable (tip minus tail) and a difference in the “denominator” variable (tip minus
tail). Divide the differences, and take the limit as stepsize goes to zero. That’s it.
By the way, when most people see diagrams such as figure 1, they are tempted to call it the
“geometric interpretation” of a partial derivative. Technically, though, that’s not the right name,
because if you don’t have a metric, you don’t have a geometry. Properly speaking, what we are
doing here is differential topology, not geometry. Of course, if you wish to conjure up a metric,
you can re-interpret everything in terms of geometry ... but the discussion here is quite a bit more
general than that. In particular, all the relationships discussed here are invariant if you stretch the
paper on which the diagrams are drawn. That’s the hallmark of topology, setting it apart from
geometry.
Notation:
Let z = f(x, y).
The partial derivative fx(x, y) can also be written as
∂f/∂x (x, y) or ∂z/∂x.
Similarly, fy(x, y) can also be written as
∂f/∂y (x, y) or ∂z/∂y.
The partial derivative fx(x, y) evaluated at the point (x0, y0) can be expressed in several
ways:
fx(x0, y0), ∂f/∂x |(x0, y0), or ∂f/∂x (x0, y0).
There are analogous expressions for fy(x0, y0).
Geometrical Meaning:
Suppose the graph of z = f(x, y) is the surface shown. Consider the partial derivative of f with
respect to x at a point (x0, y0).
Holding y constant and varying x, we trace out a curve that is the intersection of the surface with
the vertical plane y = y0.
The partial derivative fx(x0, y0) measures the change in z per unit increase in x along this curve.
That is, fx(x0, y0) is just the slope of the curve at (x0, y0). The geometrical interpretation of
fy(x0, y0) is analogous.
Higher Derivatives:
Note that a partial derivative is itself a function of two variables, and so further partial
derivatives can be calculated. We write
fxx = ∂²f/∂x², fxy = ∂²f/∂y∂x, fyx = ∂²f/∂x∂y, fyy = ∂²f/∂y².
This notation generalises to more than two variables, and to more than two derivatives, in the way
you would expect. There is a complication that does not occur when dealing with functions of a
single variable: there are four derivatives of second order, as follows:
∂²f/∂x², ∂²f/∂y∂x, ∂²f/∂x∂y and ∂²f/∂y².
Fortunately, when f satisfies mild restrictions, the order in which the differentiation is done doesn't
matter.
Example 8.12 Assume that all second order derivatives of f exist and are continuous. Then
the mixed second order partial derivatives of f are equal, i.e.
∂²f/∂x∂y = ∂²f/∂y∂x.
Example 8.13 Suppose that f(x, y) is written in terms of u and v, where x = u + log v and y =
u − log v. Show that, with the usual convention,
∂²f/∂u² = ∂²f/∂x² + 2 ∂²f/∂x∂y + ∂²f/∂y²
and
v² ∂²f/∂v² = −∂f/∂x + ∂f/∂y + ∂²f/∂x² − 2 ∂²f/∂x∂y + ∂²f/∂y².
You may assume that all second order derivatives of f exist and are continuous.
Solution. Using the chain rule, we have
∂f/∂u = (∂f/∂x)(∂x/∂u) + (∂f/∂y)(∂y/∂u) = ∂f/∂x + ∂f/∂y
and
∂f/∂v = (∂f/∂x)(∂x/∂v) + (∂f/∂y)(∂y/∂v) = (1/v)(∂f/∂x − ∂f/∂y).
Thus, using both these and their operator form, we have
∂²f/∂u² = (∂/∂x + ∂/∂y)(∂f/∂x + ∂f/∂y) = ∂²f/∂x² + 2 ∂²f/∂x∂y + ∂²f/∂y²,
while differentiating with respect to v, we have
∂²f/∂v² = −(1/v²)(∂f/∂x − ∂f/∂y) + (1/v)·(1/v)(∂/∂x − ∂/∂y)(∂f/∂x − ∂f/∂y)
= (1/v²)(−∂f/∂x + ∂f/∂y + ∂²f/∂x² − 2 ∂²f/∂x∂y + ∂²f/∂y²),
so that
v² ∂²f/∂v² = −∂f/∂x + ∂f/∂y + ∂²f/∂x² − 2 ∂²f/∂x∂y + ∂²f/∂y².
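Since the identity of Example 8.13 holds for any suitable f, it too can be spot-checked numerically. The test function f(x, y) = sin(x)·e^(y/2), the point (u0, v0), and the step size are our illustrative choices.

```python
import math

def f(x, y):
    # A hypothetical smooth test function; the identity should hold for any such f.
    return math.sin(x) * math.exp(0.5 * y)

def F(u, v):
    # f expressed in the (u, v) variables: x = u + log v, y = u - log v.
    return f(u + math.log(v), u - math.log(v))

def second(g, a, b, wrt, h=1e-4):
    # Central second difference in the chosen variable.
    if wrt == 'u':
        return (g(a + h, b) - 2 * g(a, b) + g(a - h, b)) / (h * h)
    return (g(a, b + h) - 2 * g(a, b) + g(a, b - h)) / (h * h)

u0, v0 = 0.4, 1.5
x0, y0 = u0 + math.log(v0), u0 - math.log(v0)

# Analytic partials of this particular f at (x0, y0).
fx  = math.cos(x0) * math.exp(0.5 * y0)
fy  = 0.5 * math.sin(x0) * math.exp(0.5 * y0)
fxx = -math.sin(x0) * math.exp(0.5 * y0)
fxy = 0.5 * math.cos(x0) * math.exp(0.5 * y0)
fyy = 0.25 * math.sin(x0) * math.exp(0.5 * y0)

lhs_u = second(F, u0, v0, 'u')            # numerical  ∂²f/∂u²
lhs_v = v0 * v0 * second(F, u0, v0, 'v')  # numerical  v² ∂²f/∂v²
```

Both sides of each identity agree to within the finite-difference error.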
Solving equations by Substitution:
One of the main interests in partial differentiation is because it enables us to write down how we
expect the natural world to behave. We move away from 1-variable results as soon as we have
properties which depend on e.g. at least one space variable, together with time. We illustrate with
just one example, designed to whet the appetite for the whole subject of mathematical physics.
Assume the displacement of a length of string at time t from its rest position is described by the
function f (x, t). This is illustrated in Fig 8.4. The laws of physics describe how the string
behaves when released in this position, and allowed to move under the elastic forces of the
string; the function f satisfies the wave equation
∂²f/∂t² = c² ∂²f/∂x².
Figure 8.5: A string displaced from the equilibrium position
Example 8.14 Solve the equation
∂²F/∂u∂v = 0.
Solution. Such a function is easy to integrate, because the two variables appear independently.
Since we are given that (∂/∂u)(∂F/∂v) = 0, we have ∂F/∂v = g1(v), where g1 is an arbitrary
(differentiable) function. Thus we can integrate with respect to v to get
F(u, v) = ∫ g1(v) dv + h(u) = g(v) + h(u),
where h is also an arbitrary (differentiable) function.
Example 8.15 Rewrite the wave equation using co-ordinates u = x − ct and v = x + ct.
Solution. Write f(x, t) = F(u, v); in principle we could confuse F with f, telling them apart
only by the names of their arguments. In practice we use different symbols to help the learning
process; but note that in a practical case, all the F's that appear below would normally be written
as f's. By the chain rule,
∂f/∂x = (∂F/∂u)·1 + (∂F/∂v)·1 and ∂f/∂t = (∂F/∂u)·(−c) + (∂F/∂v)·c.
Differentiating again, and using the operator form of the chain rule as well,
∂²f/∂t² = (−c ∂/∂u + c ∂/∂v)(−c ∂F/∂u + c ∂F/∂v)
= c² ∂²F/∂u² − c² ∂²F/∂u∂v − c² ∂²F/∂v∂u + c² ∂²F/∂v²
= c² (∂²F/∂u² + ∂²F/∂v²) − 2c² ∂²F/∂u∂v,
and similarly
∂²f/∂x² = ∂²F/∂u² + ∂²F/∂v² + 2 ∂²F/∂u∂v.
Substituting in the wave equation, we thus get
4c² ∂²F/∂u∂v = 0,
an equation which we have already solved. Thus solutions to the wave equation are of the form
g(u) + h(v) = g(x − ct) + h(x + ct) for any (suitably differentiable) functions g and h. For
example we may have sin(x − ct). Note that this is not just any function of x and t; it is constant
along each line on which x − ct is constant.
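That every function of the form g(x − ct) + h(x + ct) solves the wave equation can be spot-checked numerically; the particular profile f, the wave speed c, and the sample point below are our illustrative choices.

```python
import math

c = 2.0

def f(x, t):
    # A right-moving profile plus a left-moving profile: g(x - ct) + h(x + ct).
    return math.sin(x - c * t) + math.exp(-(x + c * t) ** 2)

def f_tt(x, t, h=1e-4):
    # Central second difference in t.
    return (f(x, t + h) - 2.0 * f(x, t) + f(x, t - h)) / (h * h)

def f_xx(x, t, h=1e-4):
    # Central second difference in x.
    return (f(x + h, t) - 2.0 * f(x, t) + f(x - h, t)) / (h * h)

residual = f_tt(0.7, 0.3) - c * c * f_xx(0.7, 0.3)
```

The residual f_tt − c²·f_xx vanishes to within the finite-difference error, confirming the general form of the solution.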
The partial derivative of a multivariable function f is simply its derivative with respect to only one
variable, keeping all other variables constant (provided they are not functions of the variable in
question). The formal definition is
∂f/∂ai = lim(h→0) [f(a + h·ei) − f(a)] / h,
where ei is the standard basis vector of the i-th variable. Since this only affects the i-th variable,
one can differentiate the function using common rules and tables, treating all other variables (which
are not functions of ai) as constants. For example, if f(x) = x² + 2xy + y² + y³z, then
∂f/∂x = 2x + 2y.    (1)
Note that in equation (1) we treated y as a constant, since we were differentiating with respect
to x, and used the rule (d/dx)(cx) = c. The partial derivative of a vector-valued function f(x) with
respect to variable ai is the vector Di f = ∂f/∂ai.
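The basis-vector form of the definition translates directly into code: step along e_i and form the difference quotient. A sketch in Python, using the example function above (the helper name, the point a, and the step size are our choices):

```python
def partial_i(f, a, i, h=1e-6):
    # Step along the i-th standard basis vector e_i, as in the formal definition.
    stepped = list(a)
    stepped[i] += h
    return (f(stepped) - f(a)) / h

# The example from the text: f = x^2 + 2xy + y^2 + y^3*z,
# with (x, y, z) stored as a[0], a[1], a[2].
f = lambda a: a[0] ** 2 + 2 * a[0] * a[1] + a[1] ** 2 + a[1] ** 3 * a[2]
a = [1.0, 2.0, 3.0]
# At this point: f_x = 2x + 2y = 6, f_y = 2x + 2y + 3y²z = 42, f_z = y³ = 8.
```

Each quotient reproduces the analytic partial derivative to within roughly h.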
Multiple Partials:
Multiple partial derivatives can be treated just like multiple derivatives. There is an additional
degree of freedom though, as you can compound derivatives with respect to different variables.
For example, using the above function,
D12 f is another way of writing ∂²f/∂x1∂x2. If f(x) is continuous in a neighborhood of x, and
Dij f and Dji f are continuous in an open set V, it can be shown (see Clairaut's theorem) that
Dij f(x) = Dji f(x) in V, where i and j index the ith and jth variables. In fact, as long as an equal
number of partials is taken with respect to each variable, changing the order of differentiation
will produce the same results under the above condition.
Another form of notation is f(a, b, c, ...)(x), where a is the number of partial derivatives taken
with respect to the first variable, b the number with respect to the second variable, and so on.
Bound as we humans are to three spatial dimensions, multi-variable functions can be very
difficult to get a good feel for. (Try picturing a function in the 17th dimension and see how far
you get!) We can at least make three-dimensional models of two-variable functions, but even
then it is a stretch for our intuition. What is needed is a way to cheat and look at multi-variable
functions as if they were one-variable functions.
We can do this by using partial functions. A partial function is a one-variable function obtained
from a function of several variables by assigning constant values to all but one of the
independent variables. What we are doing is taking two-dimensional "slices" of the surface
represented by the equation.
For example: z = x² − y² can be modeled in three-dimensional space, but personally I find it
difficult to sketch! In the section on critical points a picture of a plot of this function can be
found as an example of a saddle point. But by alternately setting x = 1 (red), x = 0.5 (white), and
x = 0.25 (green), we can take slices of z = x² − y² (each one a plane parallel to the z-y plane) and
see different partial functions. We can get a further idea of the behavior of the function by
considering that the same curves are obtained for x = −1, −0.5 and −0.25.
Food For Thought:
How do partial functions compare to level curves and level surfaces? If the function f is a
continuous function, does the level set or surface have to be continuous? What about partial
functions?
All of this helps us to get to our main topic, that is, partial differentiation. We know how to take
the derivative of a single-variable function. What about the derivative of a multi-variable
function? What does that even mean? Partial Derivatives are the beginning of an answer to that
question.
A partial derivative is the rate of change of a multi-variable function when we allow only one
of the variables to change. Specifically, we differentiate with respect to only one variable,
regarding all others as constants (now we see the relation to partial functions!). This essentially
means that if you know how to take a derivative, you know how to take a partial derivative.
A partial derivative of a function f with respect to a variable x, say z = f(x, y1, y2, ..., yn) (where
the yi's are other independent variables) is commonly denoted in the following ways:
∂z/∂x (referred to as ``partial z, partial x'')
∂f/∂x (referred to as ``partial f, partial x'')
Note that this is not the usual derivative ``d''. The funny ``d'' symbol in the notation is called
``roundback d'', ``curly d'' or ``del d'' (to distinguish it from ``delta d''; the symbol is actually a
``lowercase Greek `delta' '').
The next set of notations for partial derivatives is much more compact, and is especially used when
you are writing down something that uses lots of partial derivatives, especially if they are all
different kinds:
zx (referred to as ``partial z, partial x'')
fx (referred to as ``partial f, partial x'')
Dxf (referred to as ``partial f, partial x'')
Any of the above is equivalent to the limit
lim(h→0) [f(x + h, y1, ..., yn) − f(x, y1, ..., yn)] / h.
To get an intuitive grasp of partial derivatives, suppose you were an ant crawling over some
rugged terrain (a two-variable function) where the x-axis is north-south with positive x to the
north, the y-axis is east-west and the z-axis is up-down. You stop at a point P=(x0, y0, z0) on a hill
and wonder what sort of slope you will encounter if you walk in a straight line north. Since your
longitude won't be changing as you go north, the y in the function is constant. The slope to the
north is the value of fx(x0, y0).
The actual calculation of partial derivatives for most functions is very easy! Treat every
independent variable except the one we are interested in as if it were a constant and apply the
familiar rules!
Example:
Let's find fx and fy of the function z = f = x² − 3x²y + y³. To find fx, we treat y as a constant and
differentiate. So fx = 2x − 6xy. By treating x as a constant, we find fy = −3x² + 3y².
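A short numerical check of this example, with central-difference helpers; the helper names and the sample point (1.5, −0.5) are our illustrative choices.

```python
def num_dx(g, x, y, h=1e-6):
    # Central-difference estimate of g_x.
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def num_dy(g, x, y, h=1e-6):
    # Central-difference estimate of g_y.
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

f  = lambda x, y: x ** 2 - 3 * x ** 2 * y + y ** 3
fx = lambda x, y: 2 * x - 6 * x * y          # y treated as a constant
fy = lambda x, y: -3 * x ** 2 + 3 * y ** 2   # x treated as a constant
```

The difference quotients agree with the hand-computed fx and fy.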
Second Partial Derivatives:
Observe carefully that the expression fxy implies that the function f is differentiated first with
respect to x and then with respect to y, which is a natural inference since fxy is really (fx)y.
For the same reasons, in the case of the expression ∂²f/∂x∂y,
it is implied that we differentiate first with respect to y and then with respect to x.
Below are examples of mixed second partial derivatives:
Example:
Let's find fxy and fyx of f = e^(xy) + y·sin x.
fx = y·e^(xy) + y·cos x
fxy = (1 + xy)·e^(xy) + cos x
fy = x·e^(xy) + sin x
fyx = (1 + xy)·e^(xy) + cos x
In this example fxy=fyx. Is this true in general? Most of the time and in most examples that you
will probably ever see, yes. More precisely, if
both fxy and fyx exist for all points near (x0,y0)
and are continuous at (x0,y0),
then fxy=fyx.
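The equality of the mixed partials here can be confirmed numerically with a standard four-point difference quotient; the sample point and step size are our choices.

```python
import math

f = lambda x, y: math.exp(x * y) + y * math.sin(x)

def mixed(g, x, y, h=1e-4):
    # Four-point central-difference estimate of the mixed partial g_xy at (x, y).
    return (g(x + h, y + h) - g(x + h, y - h)
            - g(x - h, y + h) + g(x - h, y - h)) / (4 * h * h)

# Hand-computed mixed partial from the example above.
fxy = lambda x, y: (1 + x * y) * math.exp(x * y) + math.cos(x)
```

The numerical mixed partial matches the hand-computed (1 + xy)·e^(xy) + cos x, and by symmetry of the four-point formula it estimates fxy and fyx identically.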
Partial Derivatives of higher order are defined in the obvious way. And as long as suitable
continuity exists, it is immaterial in what order a sequence of partial differentiation is carried out.
Electrical circuits: Changes in current:-
In an electrical circuit with electromotive force (EMF) of E volts and resistance R ohms, the current, I, is
I=E/R amperes.
Question: (a) At the instant when E=120 and R=15, what is the rate of change of current with respect to voltage? (b) What is the rate of change of current with respect to resistance?
Answer:
The rate of change of current with respect to voltage is
the partial derivative of I with respect to voltage, holding resistance constant:
∂I/∂E = 1/R.
When E=120 and R=15, we have
∂I/∂E = 1/15 ≈ 0.0667.
Our verbal conclusion becomes: If the resistance is fixed at 15 ohms, the current is increasing with respect to voltage at the rate of 0.0667 amperes per volt when the EMF is 120 volts.
Part (b): What is the rate of change of current with respect to resistance?
Using similar observations to part (a) we conclude:
the partial derivative of I with respect to resistance, holding voltage constant:
∂I/∂R = −E/R².
When E=120 and R=15, we have
∂I/∂R = −120/15² ≈ −0.5333.
Our verbal conclusion becomes: If the EMF is fixed at 120 volts, the current is decreasing with respect to resistance at the rate of 0.5333 amperes per ohm when the resistance is 15 ohms.
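The two rates of change in this circuit example translate directly into two one-line functions; the function names are our illustrative choices.

```python
def dI_dE(E, R):
    # With I = E/R and R held constant: dI/dE = 1/R (amperes per volt).
    return 1.0 / R

def dI_dR(E, R):
    # With E held constant: dI/dR = -E/R**2 (amperes per ohm).
    return -E / R ** 2
```

At E = 120 volts and R = 15 ohms these give approximately 0.0667 A/V and −0.5333 A/Ω, matching the conclusions above.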