
Chapter III

Orthogonality

III.1. Projections

    Prerequisites and Learning Goals

    After completing this section, you should be able to

• Write down the definition of an orthogonal projection matrix and determine when a matrix is an orthogonal projection.

• Identify the range of a projection matrix as the subspace onto which it projects and use properties of the projection matrix to derive the orthogonality relation between the range and nullspace of the matrix.

• Given the projection matrix onto a subspace S, compute the projection matrix onto S⊥; describe the relations between the nullspace and range of the two matrices.

• Use orthogonal projection matrices to decompose a vector into components parallel to and perpendicular to a given subspace.

• Explain how the problem of finding the projection of a vector b onto a certain subspace spanned by the columns of a matrix A translates into finding a vector x that minimizes the length ‖Ax − b‖; show that x always exists and satisfies the least squares equation; discuss how the results of the minimization problem can vary depending on the type of norm used; discuss the sensitivity of a least squares fit to outliers.

• Compute the orthogonal projection matrix whose range is the span of a given collection of vectors.

• Perform least squares calculations to find polynomial fits to a given set of points, or in other applications where overdetermined systems arise. You should be able to perform all necessary computations and plot your results using MATLAB/Octave.

• Interpret the output of the MATLAB/Octave \ command when applied to systems that have no solutions.

III.1.1. Warm up: projections onto lines and planes in R^3

Let a be a vector in three dimensional space R^3 and let L = span(a) = {sa : s ∈ R} be the line through a. The line L can also be identified as the range R(a), where a is considered to be a 3 × 1 matrix.

The projection of a vector x onto L is defined to be the vector in L that is closest to x. In the diagram below, p is the projection of x onto L.

[Diagram: the line L through a, the vector x, and its projection p onto L.]

If sa is a point on the line L then the distance from sa to x is ‖sa − x‖. To compute the projection we must find the s that minimizes ‖sa − x‖. This is the same as minimizing the square ‖sa − x‖². Now
\[ \|sa - x\|^2 = (sa - x)\cdot(sa - x) = s^2\|a\|^2 - 2s\,a\cdot x + \|x\|^2. \]

To minimize this quantity, we can use elementary calculus: differentiate with respect to s, set the derivative equal to zero and solve for s. This yields
\[ s = \frac{a\cdot x}{\|a\|^2} = \frac{1}{\|a\|^2}a^Tx, \]
and therefore the projection is given by
\[ p = \frac{1}{\|a\|^2}(a^Tx)\,a. \]
It is useful to rewrite this formula. The product (aᵀx)a of the scalar (aᵀx) times the vector a can also be written as a matrix product aaᵀx, if we consider x and a to be 3 × 1 matrices. Thus
\[ p = \frac{1}{\|a\|^2}aa^Tx. \]
This formula says that the projection of x onto the line through a can be obtained by multiplying x by the 3 × 3 matrix P given by
\[ P = \frac{1}{\|a\|^2}aa^T. \]

    We now make some observations about the matrix P .

To begin, we observe that the matrix P satisfies the equation P² = P. To see why this must be true, notice that P²x = P(Px) is the vector in L closest to Px. But Px is already in L so the closest vector to it in L is Px itself. Thus P²x = Px, and since this is true for every x it must be true that P² = P. We can also verify the equation P² = P directly by the calculation
\[ P^2 = \frac{1}{\|a\|^4}(aa^T)(aa^T) = \frac{1}{\|a\|^4}a(a^Ta)a^T = \frac{\|a\|^2}{\|a\|^4}aa^T = \frac{1}{\|a\|^2}aa^T = P \]
(here we used that matrix multiplication is associative and aᵀa = ‖a‖²).

Another fact about P is that it is equal to its transpose, that is, Pᵀ = P. This can also be verified directly by the calculation
\[ P^T = \frac{1}{\|a\|^2}(aa^T)^T = \frac{1}{\|a\|^2}(a^T)^Ta^T = \frac{1}{\|a\|^2}aa^T = P \]
(here we use that (AB)ᵀ = BᵀAᵀ and (Aᵀ)ᵀ = A).

Clearly, the range of P is R(P) = L. The equation Pᵀ = P lets us determine the nullspace too. Using one of the orthogonality relations for the four fundamental subspaces of P we find that
\[ N(P) = R(P^T)^\perp = R(P)^\perp = L^\perp. \]

Example: Compute the matrix P that projects onto the line L through a = [1, 2, −1]ᵀ. Verify that P² = P and Pᵀ = P. What vector in L is closest to x = [1, 1, 1]ᵀ?

Let's use MATLAB/Octave to do this calculation.

>x = [1 1 1]';

>a = [1 2 -1]';

>P = (a'*a)^(-1)*a*a'

    P =

    0.16667 0.33333 -0.16667

    0.33333 0.66667 -0.33333

    -0.16667 -0.33333 0.16667

    >P*P

ans =

0.16667 0.33333 -0.16667

0.33333 0.66667 -0.33333

-0.16667 -0.33333 0.16667

This verifies the equation P² = P. The fact that Pᵀ = P can be seen by inspection. The vector in L closest to x is given by

    >P*x

    ans =

    0.33333

    0.66667

    -0.33333

Now we consider the plane L⊥ orthogonal to L. Given a vector x, how can we find the projection of x onto L⊥, that is, the vector q in L⊥ closest to x? Looking at the picture,

[Diagram: the vector x, its projection p onto the line through a, and the component q = x − p lying in the plane orthogonal to a.]

we can guess that q = x − p, where p = Px is the projection of x onto L. This would say that q = x − Px = (I − P)x, where I denotes the identity matrix. In other words, Q = I − P is the matrix that projects onto L⊥. We will see below that this guess is correct.

Example: Compute the vector q in the plane orthogonal to a = [1, 2, −1]ᵀ that is closest to x = [1, 1, 1]ᵀ.

As in the previous example, let's use MATLAB/Octave. Assume that a, x and P have been defined in the previous example. The 3 × 3 identity matrix is computed using the command eye(3). If we compute

>Q=eye(3)-P

    Q =

    0.83333 -0.33333 0.16667

    -0.33333 0.33333 0.33333

    0.16667 0.33333 0.83333

    then the vector we are seeking is

    > Q*x

    ans =

    0.66667

    0.33333

    1.33333

    III.1.2. Orthogonal projection matrices

    A matrix P is called an orthogonal projection matrix if

• P² = P

• Pᵀ = P.

The matrix (1/‖a‖²)aaᵀ defined in the last section is an example of an orthogonal projection matrix. This matrix projects onto its range, which is one dimensional and equal to the span of a. We will see below that every orthogonal projection matrix projects onto its range, but the range can have any dimension.

So, let P be an orthogonal projection matrix, and let Q = I − P. Then

1. Q is also an orthogonal projection matrix.

2. P + Q = I and PQ = QP = 0. (A consequence of this is that any vector in R(P) is orthogonal to any vector in R(Q) since (Px)·(Qy) = x·(PᵀQy) = x·(PQy) = 0.)

3. P projects onto its range R(P). (In other words, Px is the closest vector in R(P) to x.)

4. Q projects onto N(P) = R(P)⊥.

Let's verify these statements in order:

1. This follows from Q² = (I − P)(I − P) = I − 2P + P² = I − 2P + P = I − P = Q and (I − P)ᵀ = Iᵀ − Pᵀ = I − P.

2. The identity P + Q = I follows immediately from the definition of Q. The second identity follows from PQ = P(I − P) = P − P² = P − P = 0. The identity QP = 0 has a similar proof.

3. We want to find the closest vector in R(P) (i.e., of the form Py for some y) to x. To do this we must find the vector y that minimizes ‖Py − x‖². We have
\[ \begin{aligned} \|Py - x\|^2 &= \|P(y-x) - Qx\|^2 &&\text{(using } x = Px + Qx\text{)}\\ &= (P(y-x) - Qx)\cdot(P(y-x) - Qx)\\ &= \|P(y-x)\|^2 + \|Qx\|^2 &&\text{(the cross terms vanish by 2.)} \end{aligned} \]
This is obviously minimized when y = x. Thus Px is the closest vector in R(P) to x.

4. Since Q is an orthogonal projection (by 1.) we know it projects onto R(Q) (by 3.). Since we know that R(P)⊥ = N(P) (from the basic subspace relation N(P) = R(Pᵀ)⊥ and the fact that Pᵀ = P), it remains to show that R(Q) = N(P). First, note that x ∈ R(Q) ⇔ Qx = x. (The implication ⇐ is obvious, while the implication ⇒ can be seen as follows. Suppose x ∈ R(Q). This means x = Qy for some y. Then Qx = Q²y = Qy = x.) Now we can complete the argument: x ∈ R(Q) ⇔ Qx = x ⇔ (I − P)x = x ⇔ Px = 0 ⇔ x ∈ N(P).
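These identities are easy to check numerically. As a quick sanity check (not part of the original argument), here is a sketch using the matrices P and Q = I − P from the example of the previous section, assuming they are still defined in the MATLAB/Octave workspace:

>norm(P*P - P)         % P is a projection: should be (close to) zero
>norm(P' - P)          % P is symmetric: should be (close to) zero
>norm(P*Q)             % PQ = 0: should be (close to) zero
>norm(P + Q - eye(3))  % P + Q = I: should be (close to) zero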

This section has been rather theoretical. We have shown that an orthogonal projection matrix projects onto its range. But suppose that a subspace is presented as span(a₁, ..., a_k) (or equivalently R([a₁| ··· |a_k])) for a given collection of vectors a₁, ..., a_k. How can we compute the projection matrix P whose range is this given subspace, so that P projects onto it? We will answer this question in the next section.

    III.1.3. Least squares and the projection onto R(A)

We now consider linear equations
\[ Ax = b \]
that do not have a solution. This is the same as saying that b ∉ R(A). What vector x is closest to being a solution?

[Diagram: the subspace R(A) of possible values of Ax, the vector b lying outside it, and the error vector Ax − b.]

We want to determine x so that Ax is as close as possible to b. In other words, we want to minimize ‖Ax − b‖. This will happen when Ax is the projection of b onto R(A), that is, Ax = Pb, where P is the projection matrix. In this case Qb = (I − P)b is orthogonal to R(A). But (I − P)b = b − Ax. Therefore (and this is also clear from the picture), we see that Ax − b is orthogonal to R(A). But the vectors orthogonal to R(A) are exactly the vectors in N(Aᵀ). Thus the vector we are looking for will satisfy Aᵀ(Ax − b) = 0, or the equation
\[ A^TAx = A^Tb \]
This is the least squares equation, and a solution to this equation is called a least squares solution.

(Aside: We can also use Calculus to derive the least squares equation. We want to minimize ‖Ax − b‖². Computing the gradient and setting it to zero results in the same equations.)

It turns out that the least squares equation always has a solution. Another way of saying this is R(Aᵀ) = R(AᵀA). Instead of checking this, we can verify that the orthogonal complements N(A) and N(AᵀA) are the same. But this is something we showed before, when we considered the incidence matrix D for a graph.

If x solves the least squares equation, the vector Ax is the projection of b onto the range R(A), since Ax is the closest vector to b in the range of A. In the case where AᵀA is invertible (this happens when N(A) = N(AᵀA) = {0}), we can obtain a formula for the projection. Starting with the least squares equation we multiply by (AᵀA)⁻¹ to obtain

\[ x = (A^TA)^{-1}A^Tb \]
so that
\[ Ax = A(A^TA)^{-1}A^Tb. \]
Thus the projection matrix is given by
\[ P = A(A^TA)^{-1}A^T \]
Notice that the formula for the projection onto a line through a is a special case of this, since then AᵀA = ‖a‖².
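Here is a small MATLAB/Octave sketch of this formula. The matrix A below is made up for illustration; its columns are assumed to be linearly independent so that AᵀA is invertible:

>A = [1 0; 1 1; 0 1];      % two columns spanning a plane in R^3
>P = A*((A'*A)\A');        % projection onto R(A)
>norm(P*P - P)             % should be (close to) zero
>norm(P' - P)              % should be (close to) zero
>b = [1; 2; 3];
>P*b                       % the closest vector to b in R(A)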

It is worthwhile pointing out that if we say that the solution of the least squares equation gives the "best" approximation to a solution, what we really mean is that it minimizes the distance, or equivalently, its square
\[ \|Ax - b\|^2 = \sum_i \big((Ax)_i - b_i\big)^2. \]
There are other ways of measuring how far Ax is from b, for example the so-called L1 norm
\[ \|Ax - b\|_1 = \sum_i |(Ax)_i - b_i|. \]
Minimizing the L1 norm will result in a different "best" solution that may be preferable under some circumstances. However, it is much more difficult to find!

III.1.4. Polynomial fit

Suppose we have some data points (x₁, y₁), (x₂, y₂), ..., (x_n, y_n) and we want to fit a polynomial
\[ p(x) = a_1x^{m-1} + a_2x^{m-2} + \cdots + a_{m-1}x + a_m \]
through them. This is like the Lagrange interpolation problem we considered before, except that now we assume that n > m. This means that in general there will be no such polynomial. However, we can look for the least squares solution.

To begin, let's write down the equations that express the desired equalities p(x_i) = y_i for i = 1, ..., n. These can be written in matrix form

\[ \begin{bmatrix} x_1^{m-1} & x_1^{m-2} & \cdots & x_1 & 1 \\ x_2^{m-1} & x_2^{m-2} & \cdots & x_2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_n^{m-1} & x_n^{m-2} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]
or Aa = y, where A is a submatrix of the Vandermonde matrix. To find the least squares approximation we solve AᵀAa = Aᵀy. In a homework problem, you are asked to do this using MATLAB/Octave.
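For concreteness, here is a hedged sketch of such a fit in MATLAB/Octave (the data points and the choice m = 3, i.e. a quadratic, are invented for illustration):

>x = [0; 1; 2; 3; 4];               % made-up data
>y = [1.2; 0.8; 2.3; 5.1; 9.8];
>A = [x.^2, x, ones(size(x))];      % m = 3 columns
>a = (A'*A)\(A'*y);                 % solve the least squares equation
>xx = linspace(min(x), max(x), 100);
>plot(x, y, 'o', xx, polyval(a, xx))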

In the case where the polynomial has degree one this is a straight line fit, and the equations we want to solve are
\[ \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]
These equations will not have a solution (unless the points really do happen to lie on the same line). To find the least squares solution, we compute
\[ \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} = \begin{bmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{bmatrix} \]
and
\[ \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \sum x_iy_i \\ \sum y_i \end{bmatrix} \]
This results in the familiar equations
\[ \begin{bmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum x_iy_i \\ \sum y_i \end{bmatrix} \]
which are easily solved.
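These 2 × 2 normal equations can be formed and solved directly; a minimal sketch with invented data:

>x = [1; 2; 3; 4];  y = [2.1; 2.9; 4.2; 4.8];   % made-up points
>M = [sum(x.^2), sum(x); sum(x), length(x)];
>r = [sum(x.*y); sum(y)];
>a = M\r                 % a(1) is the slope, a(2) the intercept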

    III.1.5. Football rankings

We can try to use least squares to rank football teams. To start with, suppose we have three teams. We pretend each team has a value v₁, v₂ and v₃ such that when two teams play, the difference in scores is the difference in values. So, if the season's games had the following results

1 vs. 2   30  40
1 vs. 2   20  40
2 vs. 3   10   0
3 vs. 1    5   0
3 vs. 2    5   5

then the v_i's would satisfy the equations
\[ \begin{aligned} v_2 - v_1 &= 10 \\ v_2 - v_1 &= 20 \\ v_2 - v_3 &= 10 \\ v_3 - v_1 &= 5 \\ v_2 - v_3 &= 0 \end{aligned} \]
Of course, there is no solution to these equations. Nevertheless we can find the least squares solution. The matrix form of the equations is

    Dv = b

    with

\[ D = \begin{bmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}, \qquad b = \begin{bmatrix} 10 \\ 20 \\ 10 \\ 5 \\ 0 \end{bmatrix} \]
The least squares equation is
\[ D^TDv = D^Tb \]
or
\[ \begin{bmatrix} 3 & -2 & -1 \\ -2 & 4 & -2 \\ -1 & -2 & 3 \end{bmatrix} v = \begin{bmatrix} -35 \\ 40 \\ -5 \end{bmatrix} \]
Before going on, notice that D is an incidence matrix. What is the graph? (Answer: the nodes are the teams and they are joined by an edge with the arrow pointing from the losing team to the winning team. This graph may have more than one edge joining two nodes, if two teams play more than once. This is sometimes called a multi-graph.) We saw that in this situation N(D) is not trivial, but contains vectors whose entries are all the same. The situation is the same as for resistances: it is only differences in the v_i's that have a meaning.

    We can solve this equation in MATLAB/Octave. The straightforward way is to compute

    >L = [3 -2 -1;-2 4 -2;-1 -2 3];

    >b = [-35; 40; -5];

    >rref([L b])

    ans =

    1.00000 0.00000 -1.00000 -7.50000

    0.00000 1.00000 -1.00000 6.25000

    0.00000 0.00000 0.00000 0.00000

As expected, the solution is not unique. The general solution, depending on the parameter s, is
\[ v = s\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} -7.5 \\ 6.25 \\ 0 \end{bmatrix} \]
We can choose s so that the v_i for one of the teams is zero. This is like grounding a node in a circuit. So, by choosing s = 7.5, s = −6.25 and s = 0 we obtain the solutions
\[ \begin{bmatrix} 0 \\ 13.75 \\ 7.5 \end{bmatrix}, \qquad \begin{bmatrix} -13.75 \\ 0 \\ -6.25 \end{bmatrix} \qquad\text{or}\qquad \begin{bmatrix} -7.5 \\ 6.25 \\ 0 \end{bmatrix}. \]
Actually, it is easier to compute a solution with one of the v_i's equal to zero directly. If v = [0, v₂, v₃]ᵀ, then the vector [v₂, v₃]ᵀ satisfies the equation L₂[v₂, v₃]ᵀ = b₂, where the matrix L₂ is the bottom right 2 × 2 block of L and b₂ contains the last two entries of b.

    >L2 = L(2:3,2:3);

    >b2 = b(2:3);

    >L2\b2

    ans =

    13.7500

    7.5000

We can try this on real data. The football scores for the 2007 CFL season can be found at http://www.cfl.ca/index.php?module=sked&func=view&year=2007. The differences in scores for the first 20 games are in cfl.m. The order of the teams is BC, Calgary, Edmonton, Hamilton, Montreal, Saskatchewan, Toronto, Winnipeg. Repeating the computation above for this data we find the ranking to be (running the file cfl.m)

    v =

    0.00000

    -12.85980

    -17.71983

    -22.01884

    -11.37097

    -1.21812

    0.87588

    -20.36966

Not very impressive, if you consider that the second-lowest ranked team (Winnipeg) ended up in the Grey Cup game!

III.2. Complex vector spaces and inner product

    Prerequisites and Learning Goals

    From your work in previous courses, you should be able to

    • Perform arithmetic with complex numbers.

• Write down the definition of and compute the complex conjugate, modulus and argument of a complex number.

After completing this section, you should be able to

• Define and perform basic matrix calculations with complex vectors and complex matrices.

• Define and compute the complex inner product and the norm of complex vectors, and state basic properties of the complex inner product.

• Define and compute the matrix adjoint for a complex matrix; explain its relation to the complex inner product; compare its properties to the properties of the transpose of a real matrix.

• Define an orthonormal basis for C^n and determine whether a set of complex vectors is an orthonormal basis; determine the coefficients in the expansion of a complex vector in an orthonormal basis.

• Write down the definition of a unitary matrix and list its properties; recognize when a matrix is unitary.

• Define and compute the inner product and the norm for complex- (or real-) valued functions that are defined on a given interval; define what it means for two functions to be orthonormal, and verify it in specific examples.

• Define the complex exponential function, compute its value at given points and perform basic computations (addition, differentiation, integration) involving complex exponential functions.

• Explain what the elements of the vector space L^2([a, b]) are, for an interval [a, b].

• Use complex numbers in MATLAB/Octave computations, specifically real(z), imag(z), conj(z), abs(z), exp(z) and A' for complex matrices.

III.2.1. Why use complex numbers?

So far the numbers (or scalars) we have been using have been real numbers. Now we will start using complex numbers as well. Here are two reasons why.

1. Solving polynomial equations (finding roots, factoring): If we use complex numbers, then every polynomial
\[ p(z) = a_1z^{n-1} + a_2z^{n-2} + \cdots + a_{n-1}z + a_n \]
with a₁ ≠ 0 (so that p(z) has degree n − 1) can be completely factored as
\[ p(z) = a_1(z - r_1)(z - r_2)\cdots(z - r_{n-1}). \]
The numbers r₁, ..., r_{n−1} are called the roots of p(z) and are the values of z for which p(z) = 0. Thus the equation p(z) = 0 always has solutions. There might not be n − 1 distinct solutions, though, since it may happen that a given root r occurs more than once. If r occurs k times in the list, then we say r has multiplicity k. An important point is that the roots of a polynomial may be complex even when the coefficients a₁, ..., a_n are real. For example, z² + 1 = (z + i)(z − i).

2. Complex exponential: The complex exponential function e^{iθ} is more convenient to use than cos(θ) and sin(θ) because it is easier to multiply, differentiate and integrate exponentials than trig functions.

Solving polynomial equations will be important when studying eigenvalues, while the complex exponential appears in Fourier series and the discrete Fourier transform.

    III.2.2. Review of complex numbers

Complex numbers can be thought of as points on the (x, y) plane. The point [x, y]ᵀ, thought of as a complex number, is written x + iy (or x + jy if you are an electrical engineer).

If z = x + iy then x is called the real part of z and is denoted Re(z), while y is called the imaginary part of z and is denoted Im(z).

Complex numbers are added just like vectors in two dimensions. If z = x + iy and w = s + it, then
\[ z + w = (x + iy) + (s + it) = (x + s) + i(y + t) \]
The rule for multiplying two complex numbers is
\[ zw = (x + iy)(s + it) = (xs - yt) + i(xt + ys) \]
Notice that i is a square root of −1 since
\[ i^2 = (0 + i)(0 + i) = (0 - 1) + i(0 + 0) = -1 \]

This fact is all you need to remember to recover the rule for multiplying two complex numbers. If you multiply the expressions for two complex numbers formally, and then substitute −1 for i², you will get the right answer. For example, to multiply 1 + 2i and 2 + 3i, compute
\[ (1 + 2i)(2 + 3i) = 2 + 3i + 4i + 6i^2 = 2 - 6 + i(3 + 4) = -4 + 7i \]
Complex addition and multiplication obey the usual rules of algebra:
\[ \begin{aligned} z_1 + z_2 &= z_2 + z_1 & z_1z_2 &= z_2z_1 \\ z_1 + (z_2 + z_3) &= (z_1 + z_2) + z_3 & z_1(z_2z_3) &= (z_1z_2)z_3 \\ 0 + z_1 &= z_1 & 1\cdot z_1 &= z_1 \\ z_1(z_2 + z_3) &= z_1z_2 + z_1z_3 \end{aligned} \]
The negative of any complex number z = x + iy is defined by −z = −x + (−y)i, and obeys z + (−z) = 0.

The modulus of a complex number, denoted |z|, is the length of the corresponding vector in two dimensions. If z = x + iy, then
\[ |z| = |x + iy| = \sqrt{x^2 + y^2} \]
An important property is
\[ |zw| = |z||w| \]
The complex conjugate of a complex number z, denoted z̄, is the reflection of z across the x axis. Thus
\[ \overline{x + iy} = x - iy. \]
The complex conjugate obeys
\[ \overline{z + w} = \bar z + \bar w, \qquad \overline{zw} = \bar z\,\bar w \]
This means that the complex conjugate of an algebraic expression can be obtained by changing all the i's to −i's, either before or after performing arithmetic operations. The complex conjugate also obeys
\[ z\bar z = |z|^2. \]
This last equality is useful for simplifying fractions of complex numbers by turning the denominator into a real number, since
\[ \frac{z}{w} = \frac{z\bar w}{|w|^2} \]
For example, to simplify (1 + i)/(1 − i) we can write
\[ \frac{1+i}{1-i} = \frac{(1+i)^2}{(1-i)(1+i)} = \frac{1 - 1 + 2i}{2} = i \]
A complex number z is real (i.e. the y part in x + iy is zero) whenever z̄ = z. We also have the following formulas for the real and imaginary parts of z: if z = x + iy then Re(z) = x = (z + z̄)/2 and Im(z) = y = (z − z̄)/(2i).

We define the exponential, e^{it}, of a purely imaginary number it to be the number
\[ e^{it} = \cos(t) + i\sin(t) \]
lying on the unit circle in the complex plane.

The complex exponential satisfies the familiar rule e^{i(s+t)} = e^{is}e^{it} since, by the addition formulas for sine and cosine,
\[ \begin{aligned} e^{i(s+t)} &= \cos(s+t) + i\sin(s+t) \\ &= \cos(s)\cos(t) - \sin(s)\sin(t) + i\big(\sin(s)\cos(t) + \cos(s)\sin(t)\big) \\ &= \big(\cos(s) + i\sin(s)\big)\big(\cos(t) + i\sin(t)\big) \\ &= e^{is}e^{it} \end{aligned} \]
Any complex number can be written in polar form
\[ z = re^{i\theta} \]
where r and θ are the polar co-ordinates of z. This means r = |z| and θ is the angle that the line joining z to 0 makes with the real axis. The angle θ is called the argument of z, denoted arg(z). Since e^{i(θ+2πk)} = e^{iθ} for k ∈ Z, the argument is only defined up to an integer multiple of 2π. In other words, there are infinitely many choices for arg(z). We can always choose a value of the argument with −π < θ ≤ π; this choice is called the principal value of the argument.

The polar form lets us understand the geometry of the multiplication of complex numbers. If z₁ = r₁e^{iθ₁} and z₂ = r₂e^{iθ₂} then
\[ z_1z_2 = r_1r_2e^{i(\theta_1 + \theta_2)} \]
This shows that when we multiply two complex numbers, their arguments are added.
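In MATLAB/Octave the modulus is abs(z) and the principal value of the argument is angle(z) (a standard function, though it is not used elsewhere in these notes). A small check of the polar form and of the rule that arguments add, up to a multiple of 2π:

>z1 = 1 + 2i;  z2 = -3 + 1i;
>r1 = abs(z1);  theta1 = angle(z1);
>r1*exp(1i*theta1) - z1                    % should be (close to) zero
>angle(z1*z2) - (angle(z1) + angle(z2))    % zero or a multiple of 2*pi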

The exponential of a number that has both a real and imaginary part is defined in the natural way:
\[ e^{a+ib} = e^ae^{ib} = e^a(\cos(b) + i\sin(b)) \]
The derivative of a complex exponential is given by the formula
\[ \frac{d}{dt}e^{(a+ib)t} = (a+ib)e^{(a+ib)t} \]
while the anti-derivative, for (a + ib) ≠ 0, is
\[ \int e^{(a+ib)t}\,dt = \frac{1}{a+ib}e^{(a+ib)t} + C \]
If (a + ib) = 0 then e^{(a+ib)t} = e⁰ = 1, so in this case
\[ \int e^{(a+ib)t}\,dt = \int dt = t + C \]

III.2.3. Complex vector spaces and inner product

The basic example of a complex vector space is the space C^n of n-tuples of complex numbers. Vector addition and scalar multiplication are defined as before:
\[ \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} z_1 + w_1 \\ z_2 + w_2 \\ \vdots \\ z_n + w_n \end{bmatrix}, \qquad s\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} sz_1 \\ sz_2 \\ \vdots \\ sz_n \end{bmatrix}, \]
where now the z_i, w_i and s are complex numbers.

For complex matrices (or vectors) we define the complex conjugate matrix (or vector) by conjugating each entry. Thus, if A = [a_{i,j}], then
\[ \bar A = [\bar a_{i,j}]. \]
The product rule for complex conjugation extends to matrices and we have
\[ \overline{AB} = \bar A\,\bar B \]
The inner product of two complex vectors w = [w₁, w₂, ..., w_n]ᵀ and z = [z₁, z₂, ..., z_n]ᵀ is defined by
\[ \langle w, z\rangle = \bar w^Tz = \sum_{i=1}^n \bar w_iz_i \]
When the entries of w and z are all real, then this is just the usual dot product. (In these notes we will reserve the notation w·z for the case when w and z are real.) When the vectors are complex it is important to remember the complex conjugate in this definition. Notice that for complex vectors the order of w and z in the inner product matters: ⟨z, w⟩ is the complex conjugate of ⟨w, z⟩.

With this definition for the inner product the norm of z is always positive since
\[ \langle z, z\rangle = \|z\|^2 = \sum_{i=1}^n |z_i|^2 \]

For complex matrices and vectors we have to modify the rule for bringing a matrix to the other side of an inner product:
\[ \langle w, Az\rangle = \bar w^TAz = (A^T\bar w)^Tz = \overline{\big(\bar A^Tw\big)}^Tz = \langle \bar A^Tw, z\rangle \]
This leads to the definition of the adjoint of a matrix
\[ A^* = \bar A^T. \]
(In physics you will also see the notation A†.) With this notation
\[ \langle w, Az\rangle = \langle A^*w, z\rangle. \]

MATLAB/Octave deals seamlessly with complex matrices and vectors. Complex numbers can be entered like this

>z= 1 + 2i

z = 1 + 2i

There is a slight danger here in that if i has been defined to be something else (e.g. i = 16) then z=i would set z to be 16. In this case, if you do want z to be equal to the number 0 + i, you could use z=1i to get the desired result, or use the alternative syntax

>z= complex(0,1)

z = 0 + 1i

The functions real(z), imag(z), conj(z), abs(z) compute the real part, imaginary part, conjugate and modulus of z.

The function exp(z) computes the complex exponential if z is complex.

If a matrix A has complex entries then A' is not the transpose, but the adjoint (conjugate transpose).

    >z = [1; 1i]

z =

    1 + 0i

    0 + 1i

z'

    ans =

    1 - 0i 0 - 1i

    Thus the square of the norm of a complex vector is given by

>z'*z

    ans = 2

    This gives the same answer as

    >norm(z)^2

    ans = 2.0000

(Warning: the function dot in Octave does not compute the correct inner product for complex vectors (it doesn't take the complex conjugate). This has been fixed in the latest versions, so you should check. In MATLAB dot works correctly for complex vectors.)
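As a quick numerical check of the adjoint relation ⟨w, Az⟩ = ⟨A*w, z⟩ (a sketch using random complex data; A' plays the role of A*):

>A = randn(3) + 1i*randn(3);
>w = randn(3,1) + 1i*randn(3,1);
>z = randn(3,1) + 1i*randn(3,1);
>w'*(A*z) - (A'*w)'*z      % should be (close to) zero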

III.3. Orthonormal bases, Orthogonal Matrices and Unitary Matrices

Prerequisites and Learning Goals

After completing this section, you should be able to

• Write down the definition of an orthonormal basis, and determine when a given set of vectors is an orthonormal basis.

• Compute the coefficients in the expansion of a vector in an orthonormal basis.

• Compute the norm of a vector from its coefficients in its expansion in an orthonormal basis.

• Write down the definition of an orthogonal (unitary) matrix; recognize when a matrix is orthogonal (unitary); describe the action of an orthogonal (unitary) matrix on vectors; describe the properties of the rows and columns of an orthogonal (unitary) matrix.

    III.3.1. Orthonormal bases

A basis q₁, q₂, ... is called orthonormal if

1. ‖q_i‖ = 1 for every i (normal)

2. ⟨q_i, q_j⟩ = 0 for i ≠ j (ortho).

The standard basis given by
\[ e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \end{bmatrix}, \quad \cdots \]
is an orthonormal basis for R^n. For example, e₁ and e₂ form an orthonormal basis for R². Another orthonormal basis for R² is
\[ q_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad q_2 = \frac{1}{\sqrt 2}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \]
The vectors in a basis for R^n can also be considered to be vectors in C^n. Any orthonormal basis for R^n is also an orthonormal basis for C^n if we are using complex scalars (homework problem). Thus the two examples above are also orthonormal bases for C^n and C² respectively. On the other hand, the basis
\[ q_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ i \end{bmatrix}, \qquad q_2 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ -i \end{bmatrix} \]
is an orthonormal basis for C² but not for R².

If you expand a vector in an orthonormal basis, it's very easy to find the coefficients in the expansion. Suppose
\[ v = c_1q_1 + c_2q_2 + \cdots + c_nq_n \]
for some orthonormal basis q₁, q₂, ..., q_n. Then, if we take the inner product of both sides with q_k, we get
\[ \begin{aligned} \langle q_k, v\rangle &= c_1\langle q_k, q_1\rangle + c_2\langle q_k, q_2\rangle + \cdots + c_k\langle q_k, q_k\rangle + \cdots + c_n\langle q_k, q_n\rangle \\ &= 0 + 0 + \cdots + c_k + \cdots + 0 \\ &= c_k \end{aligned} \]

This gives a convenient formula for each c_k. For example, in the expansion
\[ \begin{bmatrix} 1 \\ 2 \end{bmatrix} = c_1\,\frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2\,\frac{1}{\sqrt 2}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \]
we have
\[ c_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\cdot\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{3}{\sqrt 2}, \qquad c_2 = \frac{1}{\sqrt 2}\begin{bmatrix} -1 \\ 1 \end{bmatrix}\cdot\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{1}{\sqrt 2}. \]
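The same computation can be checked in MATLAB/Octave (a small sketch using the basis and vector from this example):

>q1 = [1; 1]/sqrt(2);  q2 = [-1; 1]/sqrt(2);
>v = [1; 2];
>c1 = q1'*v                % 3/sqrt(2) = 2.1213...
>c2 = q2'*v                % 1/sqrt(2) = 0.7071...
>norm(c1*q1 + c2*q2 - v)   % should be (close to) zero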

Notice also that the norm of v is easily expressed in terms of the coefficients c_i. We have
\[ \|v\|^2 = \langle v, v\rangle = \langle c_1q_1 + \cdots + c_nq_n,\ c_1q_1 + \cdots + c_nq_n\rangle = |c_1|^2 + |c_2|^2 + \cdots + |c_n|^2 \]
Another way of saying this is that the vector c = [c₁, c₂, ..., c_n]ᵀ of coefficients has the same norm as v.

III.3.2. Orthogonal matrices and Unitary matrices

If we put the vectors of an orthonormal basis into the columns of a matrix, the resulting matrix is called orthogonal (if the vectors are real) or unitary (if the vectors are complex). If q₁, q₂, ..., q_n is an orthonormal basis then the expansion
\[ v = c_1q_1 + c_2q_2 + \cdots + c_nq_n \]
can be expressed as a matrix equation v = Qc, where c = [c₁, c₂, ..., c_n]ᵀ and Q is the orthogonal (or unitary) matrix
\[ Q = \Big[\, q_1 \,\Big|\, q_2 \,\Big|\, \cdots \,\Big|\, q_n \,\Big] \]

The fact that the columns of Q are orthonormal means that Q*Q = I (equivalently Q* = Q⁻¹). When the entries of Q are real, so that Q is orthogonal, then Q* = Qᵀ. So for orthogonal matrices QᵀQ = I (equivalently Qᵀ = Q⁻¹).

To see this, we compute
\[ Q^*Q = \begin{bmatrix} \bar q_1^T \\ \bar q_2^T \\ \vdots \\ \bar q_n^T \end{bmatrix}\Big[\, q_1 \,\Big|\, q_2 \,\Big|\, \cdots \,\Big|\, q_n \,\Big] = \begin{bmatrix} \langle q_1, q_1\rangle & \langle q_1, q_2\rangle & \cdots & \langle q_1, q_n\rangle \\ \langle q_2, q_1\rangle & \langle q_2, q_2\rangle & \cdots & \langle q_2, q_n\rangle \\ \vdots & \vdots & & \vdots \\ \langle q_n, q_1\rangle & \langle q_n, q_2\rangle & \cdots & \langle q_n, q_n\rangle \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. \]
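For example, with the orthonormal basis q₁, q₂ of R² used earlier, this is easy to check numerically (a small sketch):

>q1 = [1; 1]/sqrt(2);  q2 = [-1; 1]/sqrt(2);
>Q = [q1 q2];
>Q'*Q                    % should be the 2x2 identity
>v = randn(2,1);
>norm(Q*v) - norm(v)     % should be (close to) zero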

Another way of recognizing unitary and orthogonal matrices is by their action on vectors. Suppose Q is unitary. We already observed in the previous section that if v = Qc then ‖v‖ = ‖c‖. We can also see this directly from the calculation
\[ \|Qv\|^2 = \langle Qv, Qv\rangle = \langle v, Q^*Qv\rangle = \langle v, v\rangle = \|v\|^2 \]
This implies that ‖Qv‖ = ‖v‖. In other words, unitary matrices don't change the lengths of vectors.

The converse is also true. If a matrix Q doesn't change the lengths of vectors then it must be unitary (or orthogonal, if the entries are real). We can show this using the following identity, called the polarization identity, that expresses the inner product of two vectors in terms of norms:
\[ \langle v, w\rangle = \frac14\Big(\|v + w\|^2 - \|v - w\|^2 + i\|v - iw\|^2 - i\|v + iw\|^2\Big) \]
(You are asked to prove this in a homework problem.) Now suppose that Q doesn't change the length of vectors, that is, ‖Qv‖ = ‖v‖ for every v. Then, using the polarization identity, we find
\[ \begin{aligned} \langle Qv, Qw\rangle &= \frac14\Big(\|Qv + Qw\|^2 - \|Qv - Qw\|^2 + i\|Qv - iQw\|^2 - i\|Qv + iQw\|^2\Big) \\ &= \frac14\Big(\|Q(v + w)\|^2 - \|Q(v - w)\|^2 + i\|Q(v - iw)\|^2 - i\|Q(v + iw)\|^2\Big) \\ &= \frac14\Big(\|v + w\|^2 - \|v - w\|^2 + i\|v - iw\|^2 - i\|v + iw\|^2\Big) \\ &= \langle v, w\rangle \end{aligned} \]

Thus ⟨v, Q*Qw⟩ = ⟨Qv, Qw⟩ = ⟨v, w⟩ for all vectors v and w. In particular, if v is the standard basis vector e_i and w = e_j, then ⟨e_i, Q*Qe_j⟩ is the i, jth entry of the matrix Q*Q while ⟨e_i, e_j⟩ is the i, jth entry of the identity matrix I. Since these two quantities are equal for every i and j we may conclude that Q*Q = I. Therefore Q is unitary.

Recall that for square matrices a left inverse is automatically also a right inverse. So if Q*Q = I then QQ* = I too. This means that Q* is a unitary matrix whenever Q is. This proves the (non-obvious) fact that if the columns of a square matrix form an orthonormal basis, then so do the (complex conjugated) rows!

III.4. Fourier series

    Prerequisites and Learning Goals

    After completing this section, you should be able to

• Show that the functions e_n(x) = e^{2πinx/L} for n = 0, ±1, ±2, ..., a < x < b and L = b − a form an orthonormal (scaled by √L) set in L^2([a, b]).

• Use the fact that the functions e_n(x) form an infinite orthonormal basis to expand an L^2 function in a Fourier series; explain how this leads to a formula for the coefficients of the series, and compute the coefficients (in real and complex form).

• State and derive Parseval's formula and use it to sum certain infinite series.

• Use MATLAB/Octave to compute and plot the partial sums of Fourier series.

• Explain what an amplitude-frequency plot is and generate it for a given function using MATLAB/Octave; describe the physical interpretation of the plot when the function is a sound wave.

    III.4.1. Vector spaces of complex-valued functions

Let [a, b] be an interval on the real line. Recall that we introduced the vector space of real valued functions defined for x ∈ [a, b]. The vector sum f + g of two functions f and g was defined to be the function you get by adding the values, that is, (f + g)(x) = f(x) + g(x), and the scalar multiple sf was defined similarly by (sf)(x) = sf(x).

In exactly the same way, we can introduce a vector space of complex valued functions. The independent variable x is still real, taking values in [a, b]. But now the values f(x) of the functions may be complex. Examples of complex valued functions are f(x) = x + ix² or f(x) = e^{ix} = cos(x) + i sin(x).

Now we introduce the inner product of two complex valued functions on [a, b]. In analogy with the inner product for complex vectors we define
\[ \langle f, g\rangle = \int_a^b \overline{f(x)}\,g(x)\,dx \]
and the associated norm defined by
\[ \|f\|^2 = \langle f, f\rangle = \int_a^b |f(x)|^2\,dx \]

    For real valued functions we can ignore the complex conjugate.

Example: the inner product of f(x) = 1 + ix and g(x) = x² over the interval [0, 1] is
\[ \langle 1 + ix,\ x^2\rangle = \int_0^1 \overline{(1 + ix)}\cdot x^2\,dx = \int_0^1 (1 - ix)\cdot x^2\,dx = \int_0^1 x^2 - ix^3\,dx = \frac13 - i\frac14 \]

It will often happen that a function, like f(x) = x, is defined for all real values of x. In this case we can consider inner products and norms for any interval [a, b], including semi-infinite and infinite intervals, where a may be −∞ or b may be +∞. Of course the values of the inner product and norm depend on the choice of interval.

There are technical complications when dealing with spaces of functions. In this course we will deal with aspects of the subject where these complications don't play an important role. However, it is good to be aware that they exist, so we will mention a few.

One complication is that the integral defining the inner product may not exist. For example, for the interval (−∞, ∞) = R the norm of f(x) = x is infinite since
\[ \int_{-\infty}^{\infty} |x|^2\,dx = \infty \]
Even if the interval is finite, like [0, 1], the function might have a spike. For example, if f(x) = 1/x then
\[ \int_0^1 \frac{1}{|x|^2}\,dx = \infty \]
too. To overcome this complication we agree to restrict our attention to square integrable functions. For any interval [a, b], these are the functions f(x) for which |f(x)|² is integrable. They form a vector space that is usually denoted L^2([a, b]). It is an example of a Hilbert space and is important in Quantum Mechanics. The L in this notation indicates that the integrals should be defined as Lebesgue integrals rather than as the Riemann integrals usually taught in elementary calculus courses. This plays a role when discussing convergence theorems. But for any functions that come up in this course, the Lebesgue integral and the Riemann integral will be the same.

The question of convergence is another complication that arises in infinite dimensional vector spaces of functions. When discussing infinite orthonormal bases, infinite linear combinations of vectors (functions) will appear. There are several possible meanings for an equation like
\[ \sum_{i=0}^{\infty} c_i\phi_i(x) = \phi(x), \]
since we are talking about convergence of an infinite series of functions. The most obvious interpretation is that for every fixed value of x the infinite sum of numbers on the left hand side equals the number on the right.

Here is another interpretation: the difference of φ and the partial sums Σ_{i=0}^{N} c_iφ_i tends to zero when measured in the L^2 norm, that is
\[ \lim_{N\to\infty}\Big\|\sum_{i=0}^{N} c_i\phi_i - \phi\Big\| = 0 \]

With this definition, it might happen that there are individual values of x where the first equation doesn't hold. This is the meaning that we will give to the equation.

III.4.2. An infinite orthonormal basis for L^2([a, b])

Let [a, b] be an interval of length L = b − a. For every integer n, define the function
\[ e_n(x) = e^{2\pi inx/L}. \]
Then the infinite collection of functions
\[ \{\ldots, e_{-2}, e_{-1}, e_0, e_1, e_2, \ldots\} \]
forms an orthonormal basis for the space L^2([a, b]), except that each function e_n has norm √L instead of 1. (Since this is the usual normalization, we will stick with it. To get a true orthonormal basis, we must divide each function by √L.)

Let's verify that these functions form an orthonormal set (scaled by √L). To compute the norm we calculate
\[ \|e_n\|^2 = \langle e_n, e_n\rangle = \int_a^b \overline{e^{2\pi inx/L}}\,e^{2\pi inx/L}\,dx = \int_a^b e^{-2\pi inx/L}e^{2\pi inx/L}\,dx = \int_a^b 1\,dx = L \]

This shows that ‖e_n‖ = √L for every n. Next we check that if n ≠ m then e_n and e_m are orthogonal:
\[ \begin{aligned} \langle e_n, e_m\rangle &= \int_a^b e^{-2\pi inx/L}e^{2\pi imx/L}\,dx \\ &= \int_a^b e^{2\pi i(m-n)x/L}\,dx \\ &= \frac{L}{2\pi i(m-n)}\,e^{2\pi i(m-n)x/L}\Big|_{x=a}^{b} \\ &= \frac{L}{2\pi i(m-n)}\Big(e^{2\pi i(m-n)b/L} - e^{2\pi i(m-n)a/L}\Big) \\ &= 0 \end{aligned} \]

Here we used that e^{2πi(m−n)b/L} = e^{2πi(m−n)(b−a+a)/L} = e^{2πi(m−n)}e^{2πi(m−n)a/L} = e^{2πi(m−n)a/L}. This shows that the functions {..., e₋₂, e₋₁, e₀, e₁, e₂, ...} form an orthonormal set (scaled by √L).
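The orthogonality can also be checked numerically; here is a sketch using quadgk, assuming your MATLAB/Octave version's quadgk accepts complex-valued integrands (the interval and the indices 2, 3, 5 are arbitrary choices):

>a = 0;  b = 2;  L = b - a;
>e = @(n,x) exp(2i*pi*n*x/L);
>quadgk(@(x) conj(e(2,x)).*e(5,x), a, b)   % should be (close to) zero
>quadgk(@(x) conj(e(3,x)).*e(3,x), a, b)   % should be (close to) L = 2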

To show these functions form a basis we have to verify that they span the space L^2([a, b]). In other words, we must show that any function f ∈ L^2([a, b]) can be written as an infinite linear combination
\[ f(x) = \sum_{n=-\infty}^{\infty} c_ne_n(x) = \sum_{n=-\infty}^{\infty} c_ne^{2\pi inx/L}. \]
This is a bit tricky, since it involves infinite series of functions. For a finite dimensional space, to show that an orthogonal set forms a basis, it suffices to count that there are the same number of elements in an orthogonal set as there are dimensions in the space. For an infinite dimensional space this is no longer true. For example, the set of e_n's with n even is also an infinite orthonormal set, but it doesn't span all of L^2([a, b]).

In this course, we will simply accept that it is true that {..., e₋₂, e₋₁, e₀, e₁, e₂, ...} span L^2([a, b]). Once we accept this fact, it is very easy to compute the coefficients in a Fourier expansion. The procedure is the same as in finite dimensions. Starting with
\[ f(x) = \sum_{n=-\infty}^{\infty} c_ne_n(x) \]
we simply take the inner product of both sides with e_m. The only term in the infinite sum that survives is the one with n = m. Thus
\[ \langle e_m, f\rangle = \sum_{n=-\infty}^{\infty} c_n\langle e_m, e_n\rangle = c_mL \]
and we obtain the formula
\[ c_m = \frac{1}{L}\int_a^b e^{-2\pi imx/L}f(x)\,dx \]

    III.4.3. Real form of the Fourier series

Fourier series are often written in terms of sines and cosines as
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n\cos(2\pi nx/L) + b_n\sin(2\pi nx/L)\big) \]
To obtain this form, recall that
\[ e^{\pm 2\pi inx/L} = \cos(2\pi nx/L) \pm i\sin(2\pi nx/L) \]

Using this formula we find
\[ \begin{aligned} \sum_{n=-\infty}^{\infty} c_ne^{2\pi inx/L} &= c_0 + \sum_{n=1}^{\infty} c_ne^{2\pi inx/L} + \sum_{n=1}^{\infty} c_{-n}e^{-2\pi inx/L} \\ &= c_0 + \sum_{n=1}^{\infty} c_n\big(\cos(2\pi nx/L) + i\sin(2\pi nx/L)\big) + \sum_{n=1}^{\infty} c_{-n}\big(\cos(2\pi nx/L) - i\sin(2\pi nx/L)\big) \\ &= c_0 + \sum_{n=1}^{\infty}\Big((c_n + c_{-n})\cos(2\pi nx/L) + i(c_n - c_{-n})\sin(2\pi nx/L)\Big) \end{aligned} \]

Thus the real form of the Fourier series holds with
\[ \begin{aligned} a_0 &= 2c_0 \\ a_n &= c_n + c_{-n} \quad\text{for } n > 0 \\ b_n &= ic_n - ic_{-n} \quad\text{for } n > 0. \end{aligned} \]
Equivalently,
\[ c_0 = \frac{a_0}{2}, \qquad c_n = \frac{a_n}{2} + \frac{b_n}{2i} \ \text{ for } n > 0, \qquad c_n = \frac{a_{-n}}{2} - \frac{b_{-n}}{2i} \ \text{ for } n < 0. \]
The coefficients a_n and b_n in the real form of the Fourier series can also be obtained directly. The set of functions
\[ \{1/2,\ \cos(2\pi x/L),\ \cos(4\pi x/L),\ \cos(6\pi x/L),\ \ldots,\ \sin(2\pi x/L),\ \sin(4\pi x/L),\ \sin(6\pi x/L),\ \ldots\} \]
also forms an orthogonal basis where each vector has norm √(L/2). This leads to the formulas
\[ a_n = \frac{2}{L}\int_a^b \cos(2\pi nx/L)\,f(x)\,dx \]
for n = 0, 1, 2, ... and
\[ b_n = \frac{2}{L}\int_a^b \sin(2\pi nx/L)\,f(x)\,dx \]
for n = 1, 2, .... The desire to have the formula for a_n work out for n = 0 is the reason for dividing by 2 in the constant term a₀/2 in the real form of the Fourier series.

One advantage of the real form of the Fourier series is that if f(x) is a real valued function, then the coefficients a_n and b_n are real too, and the Fourier series doesn't involve any complex numbers. However, it is often easier to calculate the coefficients c_n because exponentials are easier to integrate than sines and cosines.

    III.4.4. An example

    Let’s compute the Fourier coefficients for the square wave function. In this example L = 1.

\[ f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1/2 \\ -1 & \text{if } 1/2 < x \le 1 \end{cases} \]
If n = 0 then e^{−i2πnx} = e⁰ = 1, so c₀ is simply the integral of f:
\[ c_0 = \int_0^1 f(x)\,dx = \int_0^{1/2} 1\,dx - \int_{1/2}^1 1\,dx = 0 \]

Otherwise, we have
\[ \begin{aligned} c_n &= \int_0^1 e^{-i2\pi nx}f(x)\,dx \\ &= \int_0^{1/2} e^{-i2\pi nx}\,dx - \int_{1/2}^1 e^{-i2\pi nx}\,dx \\ &= \frac{e^{-i2\pi nx}}{-i2\pi n}\Big|_{x=0}^{x=1/2} - \frac{e^{-i2\pi nx}}{-i2\pi n}\Big|_{x=1/2}^{x=1} \\ &= \frac{2 - 2e^{i\pi n}}{2\pi in} \\ &= \begin{cases} 0 & \text{if } n \text{ is even} \\ 2/(i\pi n) & \text{if } n \text{ is odd} \end{cases} \end{aligned} \]
Thus we conclude that
\[ f(x) = \sum_{\substack{n=-\infty \\ n\text{ odd}}}^{\infty} \frac{2}{i\pi n}\,e^{i2\pi nx} \]

To see how well this series is approximating f(x) we go back to the real form of the series. Using a_n = c_n + c_{−n} and b_n = ic_n − ic_{−n} we find that a_n = 0 for all n, b_n = 0 for n even and b_n = 4/(πn) for n odd. Thus
\[ f(x) = \sum_{\substack{n=1 \\ n\text{ odd}}}^{\infty} \frac{4}{\pi n}\sin(2\pi nx) = \sum_{n=0}^{\infty} \frac{4}{\pi(2n+1)}\sin(2\pi(2n+1)x) \]

We can use MATLAB/Octave to see how well this series is converging. The file ftdemo1.m contains a function that takes an integer N as an argument and plots the sum of the first 2N + 1 terms in the Fourier series above. Here is a listing:

function ftdemo1(N)

    X=linspace(0,1,1000);

    F=zeros(1,1000);

    for n=[0:N]

    F = F + 4*sin(2*pi*(2*n+1)*X)/(pi*(2*n+1));

    end

    plot(X,F)

    end

    Here are the outputs for N = 0, 1, 2, 10, 50:

[Figure: the resulting plots of the partial sums for N = 0, 1, 2, 10 and 50.]

III.4.5. Parseval's formula

If v₁, v₂, ..., v_n is an orthonormal basis in a finite dimensional vector space and the vector v has the expansion
\[ v = c_1v_1 + \cdots + c_nv_n = \sum_{i=1}^n c_iv_i \]
then, taking the inner product of v with itself, and using the fact that the basis is orthonormal, we obtain
\[ \langle v, v\rangle = \sum_{i=1}^n\sum_{j=1}^n \bar c_ic_j\langle v_i, v_j\rangle = \sum_{i=1}^n |c_i|^2 \]

The same formula is true in Hilbert space. If
\[ f(x) = \sum_{n=-\infty}^{\infty} c_ne_n(x) \]
then
\[ \int_0^1 |f(x)|^2\,dx = \langle f, f\rangle = \sum_{n=-\infty}^{\infty} |c_n|^2 \]

In the example above, we have ⟨f, f⟩ = ∫₀¹ 1 dx = 1, so we obtain
\[ 1 = \sum_{\substack{n=-\infty \\ n\text{ odd}}}^{\infty} \frac{4}{\pi^2n^2} = 2\sum_{\substack{n>0 \\ n\text{ odd}}} \frac{4}{\pi^2n^2} = \frac{8}{\pi^2}\sum_{n=0}^{\infty}\frac{1}{(2n+1)^2} \]
or
\[ \sum_{n=0}^{\infty}\frac{1}{(2n+1)^2} = \frac{\pi^2}{8} \]
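A quick numerical check of this sum (a partial sum in MATLAB/Octave):

>n = 0:100000;
>sum(1./(2*n+1).^2)    % approximately 1.2337
>pi^2/8                % 1.2337...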

    III.4.6. Interpretation of Fourier series

What is the meaning of a Fourier series in a practical example? Consider the sound made by a musical instrument in a time interval [0, T]. This sound can be represented by a function y(t) for t ∈ [0, T], where y(t) is the air pressure at a point in space, for example, at your eardrum.

A complex exponential e^{2πiωt} = cos(2πωt) + i sin(2πωt) can be thought of as a pure oscillation with frequency ω. It is a periodic function whose values are repeated when t increases by ω⁻¹. If t has units of time (seconds) then ω has units of Hertz (cycles per second). In other words, in one second the function e^{2πiωt} cycles through its values ω times.

The Fourier basis functions can be written as e^{2πiω_nt} with ω_n = n/T. Thus Fourier's theorem states that for t ∈ [0, T]
\[ y(t) = \sum_{n=-\infty}^{\infty} c_ne^{2\pi i\omega_nt}. \]

In other words, the audio signal y(t) can be synthesized as a superposition of pure oscillations with frequencies ω_n = n/T. The coefficients c_n describe how much of the frequency ω_n is present in the signal. More precisely, writing the complex number c_n as c_n = |c_n|e^{2πiτ_n}, we have c_ne^{2πiω_nt} = |c_n|e^{2πi(ω_nt + τ_n)}. Thus |c_n| represents the amplitude of the oscillation with frequency ω_n while τ_n represents a phase shift.

A frequency-amplitude plot for y(t) is a plot of the points (ω_n, |c_n|). It should be thought of as a graph of the amplitude as a function of frequency and gives a visual representation of how much of each frequency is present in the signal.

If y(t) is defined for all values of t we can use any interval that we want and expand the restriction of y(t) to this interval. Notice that the frequencies ω_n = n/T in the expansion will be different for different values of T.

Example: Let's illustrate this with the function y(t) = e^{2πit} and intervals [0, T]. This function is itself a pure oscillation with frequency ω = 1. So at first glance one would expect that there will be only one term in the Fourier expansion. This will turn out to be correct if the number 1 is one of the available frequencies, that is, if there is some value of n for which ω_n = n/T = 1. (This happens if T is an integer.) Otherwise, it is still possible to reconstruct y(t), but more frequencies will be required. In this case we would expect that |c_n| should be large for ω_n close to 1. Let's do the calculation. Fix T. Let's first consider the case when T is an integer. Then

\[ c_n = \frac{1}{T}\int_0^T e^{-2\pi int/T}e^{2\pi it}\,dt = \frac{1}{T}\int_0^T e^{2\pi i(1 - n/T)t}\,dt = \begin{cases} 1 & n = T \\ \dfrac{1}{2T\pi i(1 - n/T)}\big(e^{2\pi i(T-n)} - e^0\big) = 0 & n \ne T, \end{cases} \]
as expected. Now let's look at what happens when T is not an integer. Then
\[ c_n = \frac{1}{T}\int_0^T e^{-2\pi int/T}e^{2\pi it}\,dt = \frac{1}{2\pi i(T - n)}\big(e^{2\pi i(T-n)} - 1\big) \]
A calculation (that we leave as an exercise) results in
\[ |c_n| = \frac{\sqrt{2 - 2\cos\big(2\pi T(1 - \omega_n)\big)}}{2\pi T\,|1 - \omega_n|} \]

We can use MATLAB/Octave to do an amplitude-frequency plot. Here are the commands for T = 10.5 and T = 100.5:

N=[-200:200];

    T=10.5;

    omega=N/T;

    absc=sqrt(2-2*cos(2*pi*T*(1-omega)))./(2*pi*T*abs(1-omega));

    plot(omega,absc)

    T=100.5;

    omega=N/T;

    absc=sqrt(2-2*cos(2*pi*T*(1-omega)))./(2*pi*T*abs(1-omega));

    hold on;

    plot(omega,absc, ’r’)

    Here is the result

[Figure: the amplitude-frequency plots of |c_n| against ω_n for T = 10.5 and T = 100.5, both peaked near ω = 1.]

    As expected, the values of |cn| are largest when ωn is close to 1.

Let us return to the sound made by a musical instrument, represented by a function y(t) for t ∈ [0, T]. The frequency content of the sound is captured by the infinite Fourier series and can be displayed using a frequency-amplitude plot. In practical situations, though, we cannot measure y(t) for infinitely many t values, but must sample this function for a discrete set of t values. How can we perform a frequency analysis with this finite sample? To do this, we will use the discrete Fourier transform, which is the subject of the next section.

III.5. The Discrete Fourier Transform

    Prerequisites and Learning Goals

    After completing this section, you should be able to

• Explain why the vectors in C^n obtained by sampling the exponential Fourier basis functions e_n(t) form an orthogonal basis for C^n (the discrete Fourier basis).

• Use the discrete Fourier basis to expand a vector in C^n, obtaining the discrete Fourier transform of the vector; recognize the matrix that implements the discrete Fourier transform as a unitary matrix.

• Use the Fast Fourier transform (fft) algorithm to compute the discrete Fourier transform, and explain why the Fast Fourier transform algorithm is a faster method. You should be able to perform Fourier transform computations by executing and interpreting the output of the MATLAB/Octave fft command.

• Explain the relation between the coefficients in the Fourier series of a function f defined on [0, L] and the coefficients in the discrete Fourier transform of the corresponding sampled values of f, and discuss its limitations.

• Construct a frequency-amplitude plot for a sampled signal using MATLAB/Octave; give a physical interpretation of the resulting plot; explain the relation between this plot and the infinite frequency-amplitude plot.

    III.5.1. Definition

In the previous section we saw that the functions e_k(x) = e^{2πikx} for k ∈ Z form an infinite orthonormal basis for the Hilbert space of functions L^2([0, 1]). Now we will introduce a discrete, finite dimensional version of this basis.

To motivate the definition of this basis, imagine taking a function defined on the interval [0, 1] and sampling it at the N points 0, 1/N, 2/N, ..., j/N, ..., (N − 1)/N. If we do this to the basis functions e_k(x) we end up with vectors e_k given by
\[ e_k = \begin{bmatrix} e^{2\pi i\cdot 0\cdot k/N} \\ e^{2\pi ik/N} \\ e^{2\pi i2k/N} \\ \vdots \\ e^{2\pi i(N-1)k/N} \end{bmatrix} = \begin{bmatrix} 1 \\ \omega_N^k \\ \omega_N^{2k} \\ \vdots \\ \omega_N^{(N-1)k} \end{bmatrix} \]
where
\[ \omega_N = e^{2\pi i/N} \]

The complex number ω_N lies on the unit circle, that is, |ω_N| = 1. Moreover ω_N is a primitive Nth root of unity. This means that ω_N^N = 1 and ω_N^j ≠ 1 unless j is a multiple of N.

Because ω_N^{k+N} = ω_N^k ω_N^N = ω_N^k we see that e_{k+N} = e_k. Thus, although the vectors e_k are defined for every integer k, they start repeating themselves after N steps. Thus there are only N distinct vectors, e₀, e₁, ..., e_{N−1}.

These vectors e_k for k = 0, ..., N − 1 form an orthogonal basis for C^N. To see this we use the formula for the sum of a geometric series:
\[ \sum_{j=0}^{N-1} r^j = \begin{cases} N & r = 1 \\ \dfrac{1 - r^N}{1 - r} & r \ne 1 \end{cases} \]

Using this formula, we compute
\[ \langle e_k, e_l\rangle = \sum_{j=0}^{N-1} \bar\omega_N^{kj}\omega_N^{lj} = \sum_{j=0}^{N-1} \omega_N^{(l-k)j} = \begin{cases} N & l = k \\ \dfrac{1 - \omega_N^{(l-k)N}}{1 - \omega_N^{l-k}} = 0 & l \ne k \end{cases} \]
Now we can expand any vector f ∈ C^N in this basis. Actually, to make our discrete Fourier transform agree with MATLAB/Octave we divide each basis vector by N. Then we obtain
\[ f = \frac{1}{N}\sum_{j=0}^{N-1} c_je_j \]
where
\[ c_k = \langle e_k, f\rangle = \sum_{j=0}^{N-1} e^{-2\pi ikj/N}f_j \]

The map that sends the vector f to the vector of coefficients c = [c₀, ..., c_{N−1}]ᵀ is the discrete Fourier transform. We can write this in matrix form as
\[ c = Ff, \qquad f = F^{-1}c \]
where the matrix F⁻¹ has the vectors e_k as its columns. Since these vectors are an orthogonal basis, the inverse is, up to a factor of N, the conjugate transpose. Explicitly
\[ F = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \bar\omega_N & \bar\omega_N^2 & \cdots & \bar\omega_N^{N-1} \\ 1 & \bar\omega_N^2 & \bar\omega_N^4 & \cdots & \bar\omega_N^{2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \bar\omega_N^{N-1} & \bar\omega_N^{2(N-1)} & \cdots & \bar\omega_N^{(N-1)(N-1)} \end{bmatrix} \]
and
\[ F^{-1} = \frac{1}{N}\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_N & \omega_N^2 & \cdots & \omega_N^{N-1} \\ 1 & \omega_N^2 & \omega_N^4 & \cdots & \omega_N^{2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega_N^{N-1} & \omega_N^{2(N-1)} & \cdots & \omega_N^{(N-1)(N-1)} \end{bmatrix} \]

The matrix F̃ = N^{−1/2}F is a unitary matrix (F̃⁻¹ = F̃*). Recall that unitary matrices preserve the length of complex vectors. This implies that the lengths of the vectors f = [f₀, f₁, ..., f_{N−1}] and c = [c₀, c₁, ..., c_{N−1}] are related by
\[ \|c\|^2 = N\|f\|^2 \]
or
\[ \sum_{k=0}^{N-1} |c_k|^2 = N\sum_{k=0}^{N-1} |f_k|^2 \]
This is the discrete version of Parseval's formula.
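These conventions agree with the built-in fft command, so we can check them numerically (a small sketch; the vector f is random):

>N = 8;
>f = randn(N,1) + 1i*randn(N,1);
>k = (0:N-1)';  j = 0:N-1;
>F = exp(-2i*pi*k*j/N);            % the matrix F above
>norm(F*f - fft(f))                % should be (close to) zero
>norm(fft(f))^2 - N*norm(f)^2      % discrete Parseval: should be (close to) zero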

    III.5.2. The Fast Fourier transform

Multiplying an N × N matrix with a vector of length N normally requires N² multiplications, since each entry of the product requires N, and there are N entries. It turns out that the discrete Fourier transform, that is, multiplication by the matrix F, can be carried out using only N log₂(N) multiplications (at least if N is a power of 2). The algorithm that achieves this is called the Fast Fourier Transform, or FFT. This represents a tremendous saving in time: calculations that would require weeks of computer time can be carried out in seconds.

The basic idea of the FFT is to break the sum defining the Fourier coefficients c_k into a sum of the even terms and a sum of the odd terms. Each of these turns out to be (up to a factor we can compute) a discrete Fourier transform of half the length. This idea is then applied recursively. Starting with N = 2^n and halving the size of the Fourier transform at each step, it takes n = log₂(N) steps to arrive at Fourier transforms of length 1. This is where the log₂(N) comes in.

To simplify the notation, we will ignore the factor of 1/N in the definition of the discrete Fourier transform (so one should divide by N at the end of the calculation). We now also assume that
\[ N = 2^n \]
so that we can divide N by 2 repeatedly. The basic formula, splitting the sum for c_k into a sum over odd and even j's, is

\[ \begin{aligned} c_k &= \sum_{j=0}^{N-1} e^{-i2\pi kj/N}f_j \\ &= \sum_{\substack{j=0 \\ j\text{ even}}}^{N-1} e^{-i2\pi kj/N}f_j + \sum_{\substack{j=0 \\ j\text{ odd}}}^{N-1} e^{-i2\pi kj/N}f_j \\ &= \sum_{j=0}^{N/2-1} e^{-i2\pi k2j/N}f_{2j} + \sum_{j=0}^{N/2-1} e^{-i2\pi k(2j+1)/N}f_{2j+1} \\ &= \sum_{j=0}^{N/2-1} e^{-i2\pi kj/(N/2)}f_{2j} + e^{-i2\pi k/N}\sum_{j=0}^{N/2-1} e^{-i2\pi kj/(N/2)}f_{2j+1} \end{aligned} \]

    Notice that the two sums on the right are discrete Fourier transforms of length N/2.

To continue, it is useful to write the integers j in base 2. Let's assume that N = 2³ = 8. Once you understand this case, the general case N = 2^n will be easy. Recall that

    0 = 000 (base 2)

    1 = 001 (base 2)

    2 = 010 (base 2)

    3 = 011 (base 2)

    4 = 100 (base 2)

    5 = 101 (base 2)

    6 = 110 (base 2)

    7 = 111 (base 2)

The even j's are the ones whose binary expansions have the form ∗∗0, while the odd j's have binary expansions of the form ∗∗1.

For any pattern of bits like ∗∗0, I will use the notation F^{∗∗0} to denote the discrete Fourier transform where the input data is given by all the f_j's whose j's have binary expansions fitting the pattern. Here are some examples. To start, F^{∗∗∗}_k = c_k is the original discrete Fourier transform, since every j fits the pattern ∗∗∗. In this example k ranges over 0, ..., 7; after that the values start repeating.

Only even j's fit the pattern ∗∗0, so F^{∗∗0} is the discrete Fourier transform of the even j's, given by
\[ F^{**0}_k = \sum_{j=0}^{N/2-1} e^{-i2\pi kj/(N/2)}f_{2j}. \]

Here k runs from 0 to 3 before the values start repeating. Similarly, F^{∗00} is a transform of length N/4 = 2 given by
\[ F^{*00}_k = \sum_{j=0}^{N/4-1} e^{-i2\pi kj/(N/4)}f_{4j}. \]
In this case k = 0, 1 and then the values repeat. Finally, the only j matching the pattern 010 is j = 2, so F^{010} is a transform of length one, given by
\[ F^{010}_k = \sum_{j=0}^{N/8-1} e^{-i2\pi kj/(N/8)}f_2 = \sum_{j=0}^{0} e^0f_2 = f_2 \]
With this notation, the basic even–odd formula can be written
\[ F^{***}_k = F^{**0}_k + \bar\omega_N^kF^{**1}_k. \]
Recall that ω_N = e^{i2π/N}, so ω̄_N = e^{−i2π/N}.

Let's look at this equation when k = 0. We will represent the formula by the following diagram.

[Diagram: F^{∗∗0}_0 and F^{∗∗1}_0 on the left are combined, with the factor ω̄_N^0 attached to the F^{∗∗1}_0 branch, to produce F^{∗∗∗}_0 on the right.]

This diagram means that F^{∗∗∗}_0 is obtained by adding F^{∗∗0}_0 to ω̄_N^0 F^{∗∗1}_0. (Of course ω̄_N^0 = 1, so we could omit it.) Now let's add the diagrams for k = 1, 2, 3.

[Diagram: four such butterflies, combining F^{∗∗0}_k and F^{∗∗1}_k with the factors ω̄_N^k for k = 0, 1, 2, 3 to produce F^{∗∗∗}_0, F^{∗∗∗}_1, F^{∗∗∗}_2, F^{∗∗∗}_3.]

Now when we get to k = 4, we recall that F^{∗∗0} and F^{∗∗1} are discrete transforms of length N/2 = 4. Therefore, by periodicity, F^{∗∗0}_4 = F^{∗∗0}_0, F^{∗∗0}_5 = F^{∗∗0}_1, and so on. So in the formula F^{∗∗∗}_4 = F^{∗∗0}_4 + ω̄_N^4 F^{∗∗1}_4 we may replace F^{∗∗0}_4 and F^{∗∗1}_4 with F^{∗∗0}_0 and F^{∗∗1}_0 respectively. Making such replacements, we complete the first part of the diagram as follows.

[Diagram: the full last stage for N = 8. The values F^{∗∗0}_0, ..., F^{∗∗0}_3 and F^{∗∗1}_0, ..., F^{∗∗1}_3 on the left are combined with the factors ω̄_N^k, k = 0, ..., 7, to produce F^{∗∗∗}_0 through F^{∗∗∗}_7 on the right.]

To move to the next level we analyze the discrete Fourier transforms on the left of this diagram in the same way. This time we use the basic formula for the transform of length N/2, namely
\[ F^{**0}_k = F^{*00}_k + \bar\omega_{N/2}^kF^{*10}_k \qquad\text{and}\qquad F^{**1}_k = F^{*01}_k + \bar\omega_{N/2}^kF^{*11}_k. \]
The resulting diagram shows how to go from the length two transforms to the final transform on the right.

[Diagram: the last two stages for N = 8. The length-two transforms F^{∗00}, F^{∗10}, F^{∗01}, F^{∗11} are combined with the factors ω̄_{N/2}^k to form F^{∗∗0} and F^{∗∗1}, which are then combined as above to form F^{∗∗∗}_0, ..., F^{∗∗∗}_7.]

    131

Now we go down one more level. Each transform of length two can be constructed from transforms of length one, i.e., from the original data in some order. We complete the diagram as follows. Here we have inserted the value N = 8.

[Diagram: the complete N = 8 FFT diagram. The inputs on the left are the samples in bit-reversed order, f_0, f_4, f_2, f_6, f_1, f_5, f_3, f_7. They are combined pairwise with powers of ω̄_2, then ω̄_4, then ω̄_8, and the outputs on the right are c_0, . . . , c_7.]

Notice that the f_j's on the left of the diagram are in bit-reversed order. In other words, if we reverse the order of the bits in the binary expansion of the j's, the resulting numbers are ordered from 0 (000) to 7 (111).
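If you want to compute the bit-reversed ordering in MATLAB/Octave rather than by hand, one simple (if not the most efficient) way is to manipulate the binary expansions as strings. The variable names here are our own.

N = 8;  n = log2(N);
j = 0:N-1;
bits = dec2bin(j, n);             % rows are the binary expansions '000', '001', ...
jrev = bin2dec(fliplr(bits));     % reverse the bits in each row and convert back
disp(jrev')                       % for N = 8: 0 4 2 6 1 5 3 7

The samples in bit-reversed order are then f(jrev+1) (the +1 because MATLAB/Octave indices start at 1).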

Now we can describe the algorithm for the fast Fourier transform. Starting with the original data [f_0, . . . , f_7] we arrange the values in bit-reversed order. Then we combine them pairwise, as indicated by the left side of the diagram, to form the transforms of length 2. To do this we need to compute ω̄_2 = e^{−iπ} = −1. Next we combine the transforms of length 2 according to the middle part of the diagram to form the transforms of length 4. Here we use that ω̄_4 = e^{−iπ/2} = −i. Finally we combine the transforms of length 4 to obtain the transform of length 8. Here we need ω̄_8 = e^{−iπ/4} = 2^{−1/2} − i2^{−1/2}.

The algorithm for values of N other than 8 is entirely analogous. For N = 2 or 4 we stop at the first or second stage. For larger values of N = 2^n we simply add more stages. How many multiplications do we need to do? Well, there are N = 2^n multiplications per stage of the algorithm (one for each circle on the diagram), and there are n = log_2(N) stages. So the number of multiplications is 2^n n = N log_2(N).
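Here is a minimal recursive implementation of the even–odd idea in MATLAB/Octave (saved as myfft.m, say; the function name and the assumption that the input is a column vector whose length is a power of 2 are ours). It is meant only to illustrate the algorithm, not to compete with the built-in fft.

function c = myfft(f)
  % Radix-2 FFT by even-odd splitting; f is a column vector with length(f) = 2^n.
  N = length(f);
  if N == 1
    c = f;                          % a transform of length one is just the data
  else
    ceven = myfft(f(1:2:N));        % transform of the even-indexed samples f_0, f_2, ...
    codd  = myfft(f(2:2:N));        % transform of the odd-indexed samples f_1, f_3, ...
    w = exp(-2i*pi*(0:N/2-1).'/N);  % the factors (omega-bar_N)^k for k = 0, ..., N/2-1
    c = [ceven + w.*codd;           % k = 0, ..., N/2-1
         ceven - w.*codd];          % k = N/2, ..., N-1, using periodicity and e^{-i*pi} = -1
  end
end

For example, myfft([1;2;3;4]) reproduces the N = 4 computation done by hand below.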

As an example let us compute the discrete Fourier transform with N = 4 of the data [f_0, f_1, f_2, f_3] = [1, 2, 3, 4]. First we compute the bit-reversed order of 0 = (00), 1 = (01), 2 = (10), 3 = (11) to be (00) = 0, (10) = 2, (01) = 1, (11) = 3. We then do the rest of the computation right on the diagram as follows.

[Diagram: the N = 4 computation carried out on the butterfly diagram. The inputs in bit-reversed order are f_0 = 1, f_2 = 3, f_1 = 2, f_3 = 4. The first stage produces 1+3 = 4, 1−3 = −2, 2+4 = 6, 2−4 = −2; the second stage, using the twiddle factors 1 and −i, produces c_0 = 4+6 = 10, c_1 = −2+2i, c_2 = 4−6 = −2, c_3 = −2−2i.]
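The same arithmetic can be written out directly in MATLAB/Octave as a check (the variable names are ours):

f = [1 2 3 4];
% first stage: combine the bit-reversed pairs (f_0, f_2) and (f_1, f_3) with factors 1 and -1
s = [f(1)+f(3), f(1)-f(3), f(2)+f(4), f(2)-f(4)]              % gives [4 -2 6 -2]
% second stage: combine the two length-2 transforms with twiddle factors 1 and -i
c = [s(1)+s(3), s(2)+(-1i)*s(4), s(1)-s(3), s(2)-(-1i)*s(4)]  % gives [10, -2+2i, -2, -2-2i]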

The MATLAB/Octave command for computing the fast Fourier transform is fft. Let's verify the computation above.

    > fft([1 2 3 4])

    ans =

    10 + 0i -2 + 2i -2 + 0i -2 - 2i

    The inverse fft is computed using ifft.
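As a quick check of the normalization convention (the factor 1/N sits in the inverse transform, as in the formula for y_j given in the next subsection):

ifft(fft([1 2 3 4]))      % returns [1 2 3 4], up to rounding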

    III.5.3. A frequency-amplitude plot for a sampled audio signal

Recall that a frequency-amplitude plot for the function y(t) defined on the interval [0, T] is a plot of the points (ω_n, |c_n|), where ω_n = n/T and c_n are the numbers appearing in the Fourier series

y(t) = ∑_{n=−∞}^{∞} c_n e^{2πiω_n t} = ∑_{n=−∞}^{∞} c_n e^{2πint/T}.

If y(t) represents the sound of a musical instrument, then the frequency-amplitude plot gives a visual representation of the strengths of the various frequencies present in the sound.

Of course, for an actual instrument there is no formula for y(t), and the best we can do is to sample this function at a discrete set of points. Let t_j = jT/N for j = 0, . . . , N − 1 be N equally spaced points, and let y_j = y(t_j) be the sampled values of y(t). Put the results in a vector y = [y_0, y_1, . . . , y_{N−1}]^T. How can we make an approximate frequency-amplitude plot with this information?

The key is to realize that the coefficients in the discrete Fourier transform of y can be used to approximate the Fourier series coefficients c_n. To see this, do a Riemann sum approximation of the integral in the formula for c_n. Using the equally spaced points t_j with ∆t_j = T/N we obtain

c_n = (1/T) ∫_0^T e^{−2πint/T} y(t) dt
    ≈ (1/T) ∑_{j=0}^{N−1} e^{−2πint_j/T} y(t_j) ∆t_j
    = (T/(TN)) ∑_{j=0}^{N−1} e^{−2πinj/N} y_j
    = (1/N) c̃_n,

where c̃_n is the nth coefficient in the discrete Fourier transform of y.

The frequency corresponding to c_n is n/T. So, for an approximate frequency-amplitude plot, we can plot the points (n/T, |c̃_n|/N). Typically we are not given T but rather the vector y of samples, from which we may determine the number N of samples and the sampling frequency F_s = N/T. Then the points to be plotted can also be written as (nF_s/N, |c̃_n|/N).
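To see the approximation c_n ≈ c̃_n/N in action, try a signal whose Fourier coefficients are known exactly, for example y(t) = cos(2πt/T), which has c_1 = c_{−1} = 1/2 and all other c_n = 0. (The choices T = 1 and N = 64 below are arbitrary.)

T = 1;  N = 64;
t = (0:N-1)*T/N;                  % the sample points t_j = jT/N
y = cos(2*pi*t/T);
tildec = fft(y);
abs(tildec(1:5))/N                % approximately [0  0.5  0  0  0]
abs(tildec(N))/N                  % approximately 0.5: this entry approximates |c_(-1)|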

It is important to realize that the approximation c_n ≈ c̃_n/N is only good for small n. The reason is that the Riemann sum will do a worse job of approximating the integral when the integrand is oscillating rapidly, that is, when n is large. So we should only plot a restricted range of n. In fact, it never makes sense to plot more than N/2 points. To see why, recall the formula

c̃_n = ∑_{j=0}^{N−1} e^{−2πinj/N} y_j.

Notice that, although in the discrete Fourier transform n ranges from 0 to N − 1, the formula for c̃_n makes sense for any integer n. With this extended definition of c̃_n, (1) c̃_{n+N} = c̃_n, and (2) for y real valued, c̃_{−n} is the complex conjugate of c̃_n. Relation (2) implies that |c̃_{−n}| = |c̃_n|, so that the plot of |c̃_n| is symmetric about n = 0. But there is also a symmetry about n = N/2, since using (2) and then (1) we find |c̃_{N/2+k}| = |c̃_{−N/2−k}| = |c̃_{N/2−k}|.
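Combining (1) and (2) gives c̃_{N−n} = conj(c̃_n) for real data, which is easy to confirm numerically (a small sketch; randn just produces arbitrary real samples):

y = randn(1, 8);                                  % arbitrary real samples, N = 8
tildec = fft(y);
max(abs(tildec(2:end) - conj(tildec(end:-1:2))))  % zero, up to rounding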

    Here is a typical plot of |c̃n| for N = 8 illustrating the two lines of symmetry.

[Plot: the values |c̃_n| for N = 8, plotted against n from −4 to 9, showing the symmetry about n = 0 and about n = N/2 = 4.]

The coefficients c_n for the Fourier series obey the symmetry (2) but not (1), so if we were to add these to the plot (using the symbol ◦) the result might look like this:

[Plot: the same values |c̃_n| together with the Fourier series coefficients |c_n|, plotted as ◦, for n from −4 to 9.]

So we see that |c̃_7| should be thought of as an approximation for |c_{−1}| rather than for |c_7|.

To further compare the meanings of the coefficients c_n and c̃_n it is instructive to consider the formulas (both exact) for the Fourier series and the discrete Fourier transform for y_j = y(t_j):

y_j = (1/N) ∑_{n=0}^{N−1} c̃_n e^{2πinj/N}

y(t_j) = ∑_{n=−∞}^{∞} c_n e^{2πint_j/T} = ∑_{n=−∞}^{∞} c_n e^{2πinj/N}.

The coefficients c_n and c̃_n/N are close for n close to 0, but then their values must diverge so that the infinite sum and the finite sum above both give the same answer.

Now let's try to make a frequency-amplitude plot using MATLAB/Octave for a sampled flute contained in the audio file F6.baroque.au available at

    http://www.phys.unsw.edu.au/music/flute/baroque/sounds/F6.baroque.au.

This file contains a sampled baroque flute playing the note F6, which has a frequency of 1396.91 Hz. The sampling rate is F_s = 22050 samples/second.

Audio processing is one area where MATLAB and Octave are different. The Octave code to load the file F6.baroque.au is

    y=loadaudio(’F6.baroque’,’au’,8);

    while the MATLAB code is

    y=auread(’F6.baroque.au’);

After this step the sampled values are loaded in the vector y. Now we compute the FFT of y and store the resulting values c̃_n in a vector tildec. Then we compute a vector omega containing the frequencies and make a plot of these frequencies against |c̃_n|/N. We plot the first Nmax = N/4 values.


tildec = fft(y);

    N=length(y);

    Fs=22050;

    omega=[0:N-1]*Fs/N;

    Nmax=floor(N/4);

    plot(omega(1:Nmax), abs(tildec(1:Nmax)/N));

    Here is the result.

[Plot: the approximate frequency-amplitude plot for the flute sample, |c̃_n|/N versus frequency in Hz; the horizontal axis runs from 0 to about 5500 Hz and the vertical axis from 0 to 12.]

Notice the large spike at ω ≈ 1396 corresponding to the note F6. Smaller spikes appear at the overtone series, but evidently these are quite small for a flute.
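The location of the spike can also be read off programmatically, using the vectors tildec, omega and Nmax computed above (this assumes, as the plot suggests, that the fundamental is the largest peak in the plotted range):

[m, idx] = max(abs(tildec(1:Nmax)));   % height and index of the largest spike
omega(idx)                             % close to 1396.91 Hz, the frequency of F6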
