Why are complex numbers needed in quantum mechanics?
Some answers for the introductory level
Ricardo Karam∗
Department of Science Education, University of Copenhagen, Denmark
Abstract
Complex numbers are broadly used in physics, normally as a calculation tool that makes things
easier thanks to Euler's formula. In the end, either only the real component is assigned physical
meaning, or the two parts (real and imaginary) are treated separately as real quantities. However,
the situation seems to be different in quantum mechanics, since the imaginary unit i appears
explicitly in its fundamental equations. From a learning perspective, this can create challenges
for newcomers. In this article, four conceptually different justifications for the use/need of
complex numbers in quantum mechanics are presented and some of their pedagogical implications are discussed.
I. INTRODUCTION
Complex numbers were invented (or discovered?) in 16th-century Italy as a calculation
tool to solve cubic equations. In the beginning, not much attention was paid to their
meaning, since the imaginary terms would cancel out during the calculations and only (real)
roots were considered. Around 250 years later, complex numbers were given a geometrical
interpretation and, since then, they have become quite a useful tool for physics.1 One reason
is that complex numbers represent direction algebraically (2D vectors) and many of their
operations have a direct geometrical meaning (e.g., the product rule: multiply the norms
and add the angles).
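This rotate-and-scale rule is easy to check numerically. A minimal Python sketch (the particular norms and angles are arbitrary illustration values, not from the article):

```python
import cmath

# Build two complex numbers from their polar data (norm, angle),
# multiply them, and compare with the geometric rule:
# norms multiply, angles (arguments) add.
z1 = cmath.rect(2.0, cmath.pi / 6)   # norm 2, angle 30 degrees
z2 = cmath.rect(3.0, cmath.pi / 3)   # norm 3, angle 60 degrees
product = z1 * z2

print(abs(product))           # ~6.0   (= 2 * 3)
print(cmath.phase(product))   # ~pi/2  (= 30 deg + 60 deg)
```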
It is particularly helpful to use complex numbers to model periodic phenomena, especially
when operating with phase differences. Mathematically, one can treat a physical quantity as
complex but assign physical meaning only to its real part. Another possibility is to treat
the real and imaginary parts of a complex number as two related (real) physical quantities.
In both cases, the structure of complex numbers makes calculations easier,
but no physical meaning is actually attached to the complex variables.
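As a concrete illustration of this "meaning only in the real part" bookkeeping, one can superpose two oscillations with a phase difference using complex exponentials and recover the purely real trigonometric result (the phase and time values below are arbitrary):

```python
import cmath, math

# Superpose two equal-amplitude oscillations differing by a phase phi.
# The complex exponential does the bookkeeping; only the real part
# is treated as the physical signal.
phi = math.pi / 3
t = 0.7
z = cmath.exp(1j * t) + cmath.exp(1j * (t + phi))

# The same physical signal from purely real trigonometry:
real_sum = math.cos(t) + math.cos(t + phi)
print(abs(z.real - real_sum))   # ~0: identical results

# Bonus of the complex route: the combined amplitude is just |z|,
# which equals 2*cos(phi/2) regardless of t.
print(abs(z))
```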
Quantum mechanics seems to use complex numbers in a more fundamental way. It suffices
to look at some of the most basic equations, both in the matrix ([p, x] = −iℏ) and wave
(iℏ ∂ψ/∂t = Hψ) formulations, to wonder about the presence of the imaginary unit. What is
essentially different in quantum mechanics is that it deals with complex quantities (e.g., wave
functions and quantum state vectors) of a special kind, which cannot be split into real
and imaginary parts that can be treated independently. Furthermore, physical meaning
is not attached directly to the complex quantities themselves, but to some other operation
that produces real numbers (e.g., the square modulus of the wave function or of the inner
product between state vectors).
This complex nature of quantum mechanical quantities puzzled some of the very founders
of the theory. Schrodinger, for instance, was bothered by the fact that his wave function
was complex and tried for quite some time to find physical interpretations for its (real)
component.2,3 It was probably not until Dirac formulated his bra-ket notation that it became
clearer that the complex quantities of quantum mechanics were of a different kind than the
ones commonly used in classical physics.4
It is therefore very reasonable to expect that introductory level students will face considerable
difficulties with the peculiar use of complex numbers in quantum mechanics. However,
it is not uncommon to find textbooks that postulate, from the get-go, that wave functions
and quantum state vectors are complex valued, without providing any sort of explanation for
why this is the case. It is not only natural, but even desirable, that students inquire about
the reasons why certain mathematical structures are useful to describe physical properties.
Thus, for newcomers to the field, one can say that it is perfectly legitimate to ask: “Why
do we need complex numbers in quantum mechanics?”.
There is certainly no unique straight answer to this question5–8; constructing plausible
arguments to address it is a matter of pedagogical creativity. The aim of this paper is to
present synthetic (and slightly modified) versions of four different justifications found in the
literature9 and discuss some of their pedagogical implications. The first two justifications
(IIA and IIB) align best with a position-first approach, whereas the last two (IIC and
IID) align with a spin-first approach. The main selection criteria that guided this choice were
the plausibility of the arguments and their applicability for the very first lessons of an
introductory quantum mechanics course.
II. WHY ARE COMPLEX NUMBERS NEEDED IN QUANTUM MECHANICS?
A. No information on position when momentum is known exactly
This argument is found in R. Shankar's textbook10 for the introductory level and goes
as follows. Consider one particle in one dimension and assume de Broglie's matter-wave
relation (λ = 2πℏ/p), which is previously justified by empirical evidence from the double-slit
experiment with electrons. Since there appears to be a wave associated with the electron, it
is reasonable to assume that there is a wave function ψ(x) describing it (ignoring time dependence).
Postulate that the absolute square of this function, |ψ(x)|², gives the probability
density, i.e., the likelihood of finding the particle between the positions x and x + dx.
The last piece needed for the argument is inspired by Heisenberg’s uncertainty principle.
It is presented qualitatively in Shankar’s textbook by thought experiments like the gamma-
ray microscope. One way to state the principle is to say that “it is impossible to prepare a
particle in a state in which its momentum and position (along one axis) are exactly known.”
Now, suppose the electron is in a state of definite momentum. What would the wave
function for this state look like? According to de Broglie's relation, there is a wavelength
associated with the electron. Therefore, it seems plausible to assume a classical/real oscillating
wave function of the form

ψp(x) = A cos(2πx/λ) = A cos(px/ℏ),   (1)

where ψp(x) denotes a state of definite momentum. ψp(x), as well as |ψp(x)|², are plotted
in Fig. 1.
FIG. 1. Graphs of ψp(x) = A cos(px/ℏ) and |ψp(x)|². The graph of |ψp(x)|² (probability distribution)
is inconsistent with the assertion that, once momentum is exactly determined, one should have no
information about the electron's position.
The graph of |ψp(x)|² (probability distribution) illustrates an apparent contradiction.11 If
you know the momentum of the electron exactly, you should have no information about its
position. However, the graph of |ψp(x)|² implies that there are regions where the electron is
more likely to be found than others. If we are to have no information about position,
Shankar argues, the probability distribution should be "flat", meaning that the electron is
equally likely to be found anywhere. Thus, one cannot accept the wave function as described in (1).
Is there a function that has a wavelength and yet a constant absolute value? The
complex exponential has this property. Therefore, if, instead of (1), the wave function looks
like

ψp(x) = A e^{ipx/ℏ},   (2)

then its absolute square has the constant value |ψp(x)|² = A e^{ipx/ℏ} · A* e^{−ipx/ℏ} = |A|² (see Fig. 2).
In sum, the wave function needs to be complex so that no information about the position is
obtained for a state of definite momentum.
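The contrast between (1) and (2) can be made concrete with a short numerical check. This is only a sketch; ℏ, p and A are set to arbitrary illustration values:

```python
import cmath, math

hbar, p, A = 1.0, 2.0, 1.0          # arbitrary illustration values
xs = [0.1 * n for n in range(50)]   # sample positions

# Real candidate (1): the probability density oscillates with x.
real_density = [abs(A * math.cos(p * x / hbar)) ** 2 for x in xs]
# Complex candidate (2): the probability density is constant.
cplx_density = [abs(A * cmath.exp(1j * p * x / hbar)) ** 2 for x in xs]

# Spread (max - min) of each density over the sampled positions:
print(max(real_density) - min(real_density))   # ~1: position information
print(max(cplx_density) - min(cplx_density))   # ~0: perfectly "flat"
```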
FIG. 2. Graphs of Re[ψp(x)] and |ψp(x)|² for ψp(x) = A e^{ipx/ℏ}. Now the contradiction is eliminated,
since there is no information about the electron's position for a state of definite momentum.
One advantage of this argument is that it has the structure of a reductio ad absurdum
proof, i.e., it starts by assuming a real wave function and reaches a contradiction, which
can be psychologically more appealing to students. It also nicely stresses a formal difference
between a cosine and a complex exponential: both are periodic functions, but the latter has
a constant absolute value and is therefore more suitable for describing the intended physical
properties. On the other hand, the argument relies on providing plausible reasons for why
one cannot have full information about position and momentum simultaneously, and it
postulates the square modulus of ψ(x) as the probability density, which can leave students
wondering where this comes from.
B. Because i appears explicitly in the Schrodinger equation
The second justification stems from the textbook “Quantum Theory” written by David
Bohm12 and is based on a structural difference between the classical wave equation and
Schrodinger's time-dependent equation. For simplicity, let us first consider the wave
equation in one dimension

∂²y/∂t² = c² ∂²y/∂x².   (3)
Can we allow the solutions to this equation to be complex valued? In other words, what
are the consequences of writing the solution as y(x, t) = A(x, t) + iB(x, t)? Substituting
this complex function into (3) and equating the real and imaginary parts separately, we get

∂²A/∂t² = c² ∂²A/∂x²  and  ∂²B/∂t² = c² ∂²B/∂x²,   (4)
which means that the functions A and B can be chosen freely, as long as they both satisfy
the wave equation. Because of its nice mathematical properties that make calculations easier,
we often use the complex exponential e^{i(kx−ωt)} to represent oscillating travelling waves, but
attach physical meaning only to its real part. As we can see from Euler's formula, both
A(x, t) = cos(kx − ωt) and B(x, t) = sin(kx − ωt) are solutions to the wave equation, but,
given A(x, t), B(x, t) could just as well be any other function satisfying the same equation.
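The independence of the real and imaginary parts can also be verified numerically. The following sketch (arbitrary parameter values, finite differences in place of exact derivatives) confirms that the cosine and sine travelling waves each satisfy (3) on their own:

```python
import math

# Finite-difference check that A(x,t) = cos(kx - wt) and B(x,t) = sin(kx - wt)
# each satisfy the wave equation y_tt = c^2 y_xx on their own (with w = c*k).
c, k = 2.0, 1.5            # arbitrary illustration values
w = c * k                  # dispersion relation of the wave equation
x0, t0, h = 0.3, 0.8, 1e-4 # evaluation point and step size

def d2(f, u):
    """Central second difference of f at u."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / h**2

residuals = []
for wave in (math.cos, math.sin):
    y_tt = d2(lambda t: wave(k * x0 - w * t), t0)
    y_xx = d2(lambda x: wave(k * x - w * t0), x0)
    residuals.append(abs(y_tt - c**2 * y_xx))

print(residuals)   # both entries ~0: each part solves (3) independently
```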
Now let us try to do the same with the Schrodinger equation. Consider, for simplicity,
the equation for a free particle in one dimension13

iℏ ∂ψ/∂t = −(ℏ²/2m) ∂²ψ/∂x².   (5)
Substituting ψ(x, t) = A(x, t) + iB(x, t) into (5) and equating the real and imaginary parts
separately, we obtain the following conditions

∂A/∂t = −(ℏ/2m) ∂²B/∂x²  and  ∂B/∂t = (ℏ/2m) ∂²A/∂x².   (6)
Now we no longer have the freedom to choose the functions A(x, t) and B(x, t) arbitrarily,
since they are coupled by the relations (6). This implies that we need two functions to
describe the physical situation and cannot assign meaning to just one of them, as was
the case with classical waves. This will generally happen when the imaginary unit appears
explicitly in the differential equation, as in Schrodinger's. Another way to justify the need
for two (real) functions is to see how they both appear in the definition of the probability density
P = ψ*ψ = A² + B².
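A quick numerical check of the coupling (6) can be done with the free-particle plane wave ψ = cos(kx − ωt) + i sin(kx − ωt) and the dispersion relation ω = ℏk²/2m; the units ℏ = m = 1 and the parameter values are arbitrary choices for illustration:

```python
import math

# Check the relations (6) for psi = cos(kx - wt) + i sin(kx - wt),
# taking hbar = m = 1 and the free-particle dispersion w = hbar*k^2/(2m).
hbar = m = 1.0
k = 1.3
w = hbar * k**2 / (2 * m)
x0, t0, h = 0.4, 0.9, 1e-4   # evaluation point and step size

A = lambda x, t: math.cos(k * x - w * t)   # real part of psi
B = lambda x, t: math.sin(k * x - w * t)   # imaginary part of psi

def d_t(f):    # first time derivative at (x0, t0), central difference
    return (f(x0, t0 + h) - f(x0, t0 - h)) / (2 * h)

def d_xx(f):   # second space derivative at (x0, t0), central difference
    return (f(x0 + h, t0) - 2 * f(x0, t0) + f(x0 - h, t0)) / h**2

err1 = abs(d_t(A) + (hbar / (2 * m)) * d_xx(B))   # first relation in (6)
err2 = abs(d_t(B) - (hbar / (2 * m)) * d_xx(A))   # second relation in (6)
print(err1, err2)   # both ~0: A and B are genuinely coupled this way
```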
But then another question arises immediately: Why is there an i in the time-dependent
Schrodinger equation? One answer can be found in Schrodinger’s original derivation of his
time-dependent equation, which consists in eliminating the energy parameter E from his
time-independent equation14

∂²ψ/∂x² + (2m/ℏ²)(E − V)ψ = 0.   (7)
The derivation involves assuming the Planck-Einstein relation ω = E/ℏ and using separation
of variables to write ψ(x, t) = Ψ(x) · e^{iEt/ℏ}. In order to eliminate the energy
parameter, we differentiate ψ(x, t) with respect to time,

∂ψ/∂t = i (E/ℏ) ψ,   (8)
and substitute into Eq. (7) to obtain

∂²ψ/∂x² − i (2m/ℏ) ∂ψ/∂t − (2m/ℏ²) V ψ = 0,   (9)
which, for a free particle, is exactly Eq. (5). Thus, one can say that the imaginary unit i
appeared explicitly in the equation because a time-periodic function was represented by a
complex exponential, which allowed the energy parameter to be isolated after only the
first time derivative.16
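One can check numerically that the separated solution indeed satisfies (9). The sketch below uses the free-particle case V = 0 with Ψ(x) = e^{ikx} and E = ℏ²k²/(2m); the units ℏ = m = 1 and the parameter values are arbitrary illustration choices:

```python
import cmath

# Check Eq. (9) with V = 0 for the separated form used in the derivation,
# psi(x, t) = exp(ikx) * exp(iEt/hbar), with E = hbar^2 k^2 / (2m).
hbar = m = 1.0
k = 1.1
E = hbar**2 * k**2 / (2 * m)

def psi(x, t):
    return cmath.exp(1j * k * x) * cmath.exp(1j * E * t / hbar)

x0, t0, h = 0.2, 0.5, 1e-4
# Central finite differences for the derivatives entering Eq. (9):
psi_t = (psi(x0, t0 + h) - psi(x0, t0 - h)) / (2 * h)
psi_xx = (psi(x0 + h, t0) - 2 * psi(x0, t0) + psi(x0 - h, t0)) / h**2

residual = psi_xx - 1j * (2 * m / hbar) * psi_t   # Eq. (9) with V = 0
print(abs(residual))   # ~0: the separated solution satisfies (9)
```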
One nice thing about this justification is that it emphasizes formal differences between the
classical wave equation and Schrodinger’s equation, and is therefore closer to Schrodinger’s
original reasoning. For some people this can actually be a drawback, since it puts too much
emphasis on the wave formalism, which can hinder a more authentic understanding of modern
quantum mechanics. Another interesting aspect is that it highlights the relation between
a complex function and two coupled real functions, implying that in quantum mechanics
one needs two real functions to describe the dynamics of a system. As in the previous case,
here we are also led to wonder about the implications of certain formal differences between
a cosine and a complex exponential. One negative aspect of this justification is that if
one truly wants to explore the consequences of a real wave function which is the solution
of a fourth order differential equation (see note 16), then the explanation can become too
demanding/distracting and not worth the trade-off from a pedagogical point of view.
C. Sx, Sy and Sz in sequential Stern-Gerlach experiments
This justification appears in some textbooks that adopt the so-called spin-first approach
(e.g., Refs. 18–20) and therefore differs fundamentally from the two previous ones. Sequential
Stern-Gerlach (SG) experiments are presented to motivate the need for a departure from
classical mechanics and to build the new formalism of quantum states. The situation illustrated
in Fig. 3 is used to highlight some peculiarities of quantum mechanical systems, since
it is rather counterintuitive that a component of Sz in the negative (↓) direction "reappears"
after the beam emerges from the third apparatus. The reason is that one cannot determine
Sz and Sx simultaneously. In other words, they are incompatible observables, and the selection
of the component of Sx in the positive (↑) direction by the second apparatus destroys
any previous information about Sz.18
[Fig. 3 reproduces a page of Ref. 19 (p. 9), whose own caption reads: "FIGURE 1.6 Experiment 4 measures the spin component three times in succession and uses (a and b) one or (c) two beams from the second analyzer."]
FIG. 3. Sequential SG experiments showing the counterintuitive result that particles with a negative
Sz "reappear" after the beam emerges from the third apparatus, although they were previously
"blocked" by the first apparatus. Figure in Ref. 19, p. 9.
Within this context, the notion of a ket, a quantum state vector, is introduced to express
quantum states mathematically. By convention, the states with definite values of Sz (which
we call Sz states) are denoted by |+〉 and |−〉, and are taken as basis vectors for other states
with definite values of other spin observables. Thus, for the situation presented in Fig. 3,
one can express the kets with definite values of Sx (|+〉x and |−〉x, according to the notation
used in Ref. 19) as linear combinations of the Sz basis vectors, i.e., |+〉x = a|+〉 + b|−〉 and |−〉x = c|+〉 + d|−〉.
The situation that leads to the need for complex numbers is described as follows. In Fig.
3, the beam is traveling in the y-direction. But the situation is symmetric, so the exact
same result is to be expected if the beam travels in the x-direction and the SGX apparatus
is replaced by an SGY. However, the Sy states must be distinguishable from the Sx states;
e.g., |+〉y cannot have the same coefficients a and b as |+〉x. So how is that possible?
In his initial discussion of sequential SG experiments, Sakurai makes use of a formal
analogy with light polarization to find a solution to this problem. He compares the decomposition
of the Sx states with the mathematical expression for light linearly polarized in the
xy-plane at 45° to the x-direction. In order to represent the Sy states in a different
way, Sakurai argues, one can refer to the mathematical expression for circularly polarized
light, which is clearly a distinct physical situation. Because the perpendicular components of
circularly polarized light are out of phase by 90°, complex numbers appear in order to represent this
phase difference. Although the analogy is meaningful from a formal perspective, it seems
confusing if one tries to make some kind of physical association between the two situations.
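Sakurai's analogy can be sketched with Jones vectors, a standard optics representation; the states below are the usual textbook forms, not taken from the article:

```python
import math, cmath

# Jones vectors (x- and y-components of the electric field):
# 45-degree linear polarization has equal, in-phase components;
# circular polarization has equal components with y lagging x by
# 90 degrees, i.e., a factor of i = exp(i*pi/2).
linear_45 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
circular = [1 / math.sqrt(2), 1j / math.sqrt(2)]

# Both states carry half the intensity in each component...
print([abs(c) ** 2 for c in linear_45])   # ~[0.5, 0.5]
print([abs(c) ** 2 for c in circular])    # ~[0.5, 0.5]

# ...so only the complex relative phase distinguishes them:
phase_diff = cmath.phase(circular[1] / circular[0])
print(phase_diff)   # ~pi/2
```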
McIntyre19 and Townsend20 do not refer to the analogy with polarization, but directly
introduce the formalism of quantum state vectors and the probabilistic interpretation of the
square modulus of the inner product between two state vectors. This enables them to treat
more formally the issue of distinguishing the Sx from the Sy states and to motivate the need
for complex numbers. These states are first represented in the Sz basis as follows:
|+〉x = a|+〉 + b|−〉,  |−〉x = c|+〉 + d|−〉  and  |+〉y = e|+〉 + f|−〉,  |−〉y = g|+〉 + h|−〉.   (10)
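With the standard textbook coefficient choices |±〉x = (|+〉 ± |−〉)/√2 and |±〉y = (|+〉 ± i|−〉)/√2 (a convention; Eq. (10) itself leaves the coefficients undetermined), one can verify that all the sequential-SG probabilities come out 1/2, and see why real coefficients alone cannot do the job:

```python
import math

# Components in the S_z basis, using the standard (conventional) choices:
# real coefficients suffice for the S_x states, but the S_y states need i.
s2 = math.sqrt(2)
plus, minus = (1, 0), (0, 1)                        # |+>, |->
x_plus, x_minus = (1/s2, 1/s2), (1/s2, -1/s2)
y_plus, y_minus = (1/s2, 1j/s2), (1/s2, -1j/s2)

def prob(bra, ket):
    """|<bra|ket>|^2 with components given in the S_z basis."""
    inner = sum(b.conjugate() * c for b, c in zip(bra, ket))
    return abs(inner) ** 2

# S_z input into an S_x or S_y analyzer: both outputs equally likely.
for state in (x_plus, x_minus, y_plus, y_minus):
    assert abs(prob(state, plus) - 0.5) < 1e-12
    assert abs(prob(state, minus) - 0.5) < 1e-12

# S_x input into an S_y analyzer must also give 50/50. Real coefficients
# of modulus 1/sqrt(2) would make |+>_y coincide with an S_x state (up to
# sign), forcing these probabilities to 0 or 1 instead.
for bra in (x_plus, x_minus):
    for ket in (y_plus, y_minus):
        assert abs(prob(bra, ket) - 0.5) < 1e-12

print("all sequential SG probabilities equal 1/2")
```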
[Fig. 4, panels (a)-(c): pages from McIntyre's textbook (Ref. 19) reproduced as figure panels, showing the Sx and Sy probability calculations, the matrix notation for spin-1/2 kets, and the corresponding measurement histograms.]
FIG. 4. Different configurations of sequential SG experiments are treated formally in order to motivate the need for complex numbers in the description of quantum state vectors. $|{}_x\langle +|+\rangle|^2 = \frac{1}{2}$ represents the probability that the output state is $|+\rangle_x$ when the input state is $|+\rangle$. Figures adapted from Ref. 19.
In order to determine the coefficients, which are in principle assumed to be complex, the probabilities for the situations described in Fig. 4 are used. From the first line of Fig. 4(a), we have

$$\left|{}_x\langle +|+\rangle\right|^2 = \left|\left(a^*\langle +| + b^*\langle -|\right)|+\rangle\right|^2 = |a|^2 = \frac{1}{2}. \tag{11}$$
Performing similar calculations for the other three lines of Fig. 4(a), we obtain the following relations between the coefficients of the Sx states:

$$|a|^2 = |b|^2 = |c|^2 = |d|^2 = \frac{1}{2}. \tag{12}$$
Thus, we know the absolute square of each coefficient, but their phases are undetermined. Since the absolute phase is not physically measurable, one coefficient of each vector of the Sx states can be chosen to be real without loss of generality,24 so that we can take $a = c = \frac{1}{\sqrt{2}}$. Both state vectors ($|+\rangle_x$ and $|-\rangle_x$) are already normalized, but not necessarily orthogonal. Imposing the latter condition, we obtain

$${}_x\langle -|+\rangle_x = \left[\frac{1}{\sqrt{2}}\,\langle +| + d^*\langle -|\right]\left[\frac{1}{\sqrt{2}}\,|+\rangle + b\,|-\rangle\right] = \frac{1}{2} + d^*b = 0 \;\Rightarrow\; d^*b = -\frac{1}{2}. \tag{13}$$
Since our goal is to try to use only real numbers, we are free to choose $b = \frac{1}{\sqrt{2}}$ and $d = -\frac{1}{\sqrt{2}}$, which enables us to represent the Sx states (in matrix notation)25 as

$$|+\rangle_x \doteq \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad |-\rangle_x \doteq \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \tag{14}$$

showing that we managed to represent the Sx states in the Sz basis without using complex numbers. So far, so good.
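As a sanity check, the real representation (14) can be verified numerically. The short Python sketch below is our own illustration, not part of the original treatment (the helper names `inner` and `prob` are hypothetical); it confirms that the Sx states of Eq. (14) are normalized, mutually orthogonal, and reproduce the 50/50 probabilities of Fig. 4(a):

```python
from math import sqrt

# Kets in the Sz basis as 2-component tuples: |+> = (1, 0), |-> = (0, 1).
def inner(bra, ket):
    """Inner product <bra|ket>, conjugating the bra components."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

def prob(bra, ket):
    """Measurement probability |<bra|ket>|^2."""
    return abs(inner(bra, ket)) ** 2

plus_z, minus_z = (1, 0), (0, 1)
plus_x  = (1 / sqrt(2),  1 / sqrt(2))   # Eq. (14)
minus_x = (1 / sqrt(2), -1 / sqrt(2))

assert abs(prob(plus_x, plus_x) - 1) < 1e-12      # normalized
assert abs(inner(minus_x, plus_x)) < 1e-12        # orthogonal, cf. Eq. (13)
assert abs(prob(plus_x, plus_z) - 0.5) < 1e-12    # 50/50 results of Fig. 4(a)
assert abs(prob(minus_x, minus_z) - 0.5) < 1e-12
```

Every coefficient used here is real, in line with the point of the text.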
Now let us try to do the same for the Sy states. From Fig. 4(b) we arrive at relations similar to (12), i.e., $|e|^2 = |f|^2 = |g|^2 = |h|^2 = \frac{1}{2}$, and make the same initial choice $e = g = \frac{1}{\sqrt{2}}$. Orthogonality of the Sy states gives us, just as in (13), $f^*h = -\frac{1}{2}$. In order to avoid complex numbers and to distinguish the Sy from the Sx states, we are tempted to choose $f = -\frac{1}{\sqrt{2}}$ and $h = \frac{1}{\sqrt{2}}$.
However, this would be in contradiction with the situation represented by Fig. 4(c). Taking, for instance, the first line,

$$\left|{}_y\langle +|+\rangle_x\right|^2 = \left|\left(\frac{1}{\sqrt{2}}\,\langle +| + f^*\langle -|\right)\frac{1}{\sqrt{2}}\left(|+\rangle + |-\rangle\right)\right|^2 = \left|\frac{1}{2} + \frac{f^*}{\sqrt{2}}\right|^2 = \frac{1}{2}, \tag{15}$$
we can see that $f = -\frac{1}{\sqrt{2}}$ cannot be a solution. Since $f$ cannot be $\frac{1}{\sqrt{2}}$ either, if the Sy states are to be distinguished from the Sx states, this already suffices to show that we will not be able to represent the Sy states with real numbers only. In fact, by looking carefully at (15) we can already predict that $f$ needs to be purely imaginary. To show this more formally, we write $f$ as a general complex number with absolute square equal to $\frac{1}{2}$, i.e., $f = \frac{e^{i\alpha}}{\sqrt{2}}$, and substitute it in (15):
$$\left(\frac{1}{2} + \frac{e^{-i\alpha}}{2}\right)\left(\frac{1}{2} + \frac{e^{i\alpha}}{2}\right) = \frac{1}{2}$$
$$\frac{1}{4} + \frac{1}{4}\left(e^{-i\alpha} + e^{i\alpha}\right) + \frac{1}{4} = \frac{1}{2}$$
$$\cos\alpha = 0 \;\Rightarrow\; \alpha = \pm\frac{\pi}{2}. \tag{16}$$
Choosing $\alpha = \frac{\pi}{2}$ gives $f = \frac{i}{\sqrt{2}}$ and, since $f^*h = -\frac{1}{2}$, $h = -\frac{i}{\sqrt{2}}$. These coefficients also satisfy all the relations of Fig. 4(c). Thus, we can represent the Sy states in the Sz basis as

$$|+\rangle_y \doteq \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \quad\text{and}\quad |-\rangle_y \doteq \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}. \tag{17}$$
After arriving at this result, Townsend (Ref. 20, p. 17) concludes:
The appearance of i’s [in (17)] is one of the key ingredients of a description
of nature by quantum mechanics. Whereas in classical physics we often use
complex numbers as an aid to do calculations, there they are not essential.
The straightforward Stern-Gerlach experiments we have outlined [...] demand
complex numbers for their explanation.
This example shows that although it is possible to write the Sx states in the Sz basis without complex numbers, they are unavoidable for describing the Sy states in the same basis. More generally, one can say that complex numbers are indispensable for describing (two-state) systems with more than two incompatible observables.
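This impossibility can also be checked numerically. In the Python sketch below (our own illustration; the helper name `inner` is hypothetical), the complex choice $f = \frac{i}{\sqrt{2}}$ of Eq. (17) satisfies constraint (15), while both real candidates $f = \pm\frac{1}{\sqrt{2}}$ violate it:

```python
from math import sqrt

def inner(bra, ket):
    """Inner product <bra|ket> for 2-component kets in the Sz basis."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

plus_x  = (1 / sqrt(2), 1 / sqrt(2))     # Eq. (14)
plus_y  = (1 / sqrt(2),  1j / sqrt(2))   # Eq. (17)
minus_y = (1 / sqrt(2), -1j / sqrt(2))

# The complex coefficients satisfy Eq. (15): |y<+|+>_x|^2 = 1/2 ...
assert abs(abs(inner(plus_y, plus_x)) ** 2 - 0.5) < 1e-12
# ... and keep the Sy states orthogonal, as required.
assert abs(inner(minus_y, plus_y)) < 1e-12

# Either real choice f = +-1/sqrt(2) gives probability 1 or 0 instead of 1/2:
for f in (1 / sqrt(2), -1 / sqrt(2)):
    trial_plus_y = (1 / sqrt(2), f)
    assert abs(abs(inner(trial_plus_y, plus_x)) ** 2 - 0.5) > 0.4
```

The loop makes concrete the contradiction derived from Eq. (15): no real phase choice can simultaneously satisfy orthogonality and the 50/50 statistics of Fig. 4(c).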
This type of reasoning is analogous to other, more mathematical arguments in which complex numbers appear out of the need for some kind of generalization or expansion of our notion of number, e.g., in connection with the fundamental theorem of algebra. It is also somehow related to the first geometrical interpretation of complex numbers, where they appear as a necessity to fully represent direction in the plane.21 One aspect that can be criticized in this approach is that the coefficients of quantum state vectors are assumed to be complex numbers from the beginning, and no justification for that is usually offered.
D. To enable continuous transitions
The fourth justification presented here is based on the last of Lucien Hardy's "five reason-
able axioms" from which quantum theory can be deduced22 and has been presented in an
insightful paper written by Artur Ekert.23 It starts with the possibility of building quantum
mechanics entirely from classical probability in order to see where this leads. The initial
assumption is that a quantum state is described by a column vector in which each term rep-
resents the probability of the system to be measured in a given configuration. For instance,
in a two-state system like the spin 1/2 discussed in the previous section, such a vector could
look like
p⃗0 = (0.3, 0.7)ᵀ,    (18)
meaning that there is a 30% probability of measuring the system with spin up and 70% with spin down.
In such a formalism, transitions are represented by stochastic matrices, whose elements
are non-negative real numbers and whose entries in each column add up to 1. One
example of a transition (T1) applied to p⃗0 is
T1 p⃗0 = ( 0.2  0.4 )( 0.3 ) = ( 0.34 )
         ( 0.8  0.6 )( 0.7 )   ( 0.66 ),    (19)
meaning that after this transition there is 34% probability for spin up and 66% for spin
down. The elements of the stochastic matrices represent the probabilities for the system to
either remain (main diagonal) or swap states (antidiagonal). Note that after the transition
the probabilities still add up to 1.
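The computation in (19) is easy to reproduce; a minimal sketch assuming NumPy:

```python
import numpy as np

p0 = np.array([0.3, 0.7])      # classical probability vector of Eq. (18)
T1 = np.array([[0.2, 0.4],
               [0.8, 0.6]])    # stochastic: non-negative entries, columns sum to 1
assert np.allclose(T1.sum(axis=0), 1.0)

p1 = T1 @ p0                   # the transition of Eq. (19)
assert np.allclose(p1, [0.34, 0.66])
assert np.isclose(p1.sum(), 1.0)  # total probability is preserved
```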
When one thinks about time evolution, a physically desirable requirement is that these
transitions are continuous. In other words, it should be possible to view any transition as a
sequence of independent transitions over shorter periods of time. Among other things, this
means that one should be able to extract square or cubic roots (in fact any nth root) of any
transition and the result should be another stochastic matrix. Let us take, for instance, one
square root of T1
(T1)^(1/2) = (1/3) ( 1 + (2/√5)i   1 − (1/√5)i )
                   ( 2 − (2/√5)i   2 + (1/√5)i ),

which enables us to write T1 as

T1 = (1/3) ( 1 + (2/√5)i   1 − (1/√5)i ) · (1/3) ( 1 + (2/√5)i   1 − (1/√5)i ).    (20)
           ( 2 − (2/√5)i   2 + (1/√5)i )         ( 2 − (2/√5)i   2 + (1/√5)i )
One can easily see that our initial goal was not achieved, i.e., it was not possible to write
the stochastic matrix T1 as the product of two (equal) stochastic matrices. The elements of
(T1)1/2 are complex numbers with non-zero imaginary parts.26
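One can reproduce this square root with a short computation. The sketch below (assuming NumPy) diagonalizes T1 and takes square roots of its eigenvalues; since det T1 < 0, one eigenvalue (−0.2) is negative and its square root forces complex entries:

```python
import numpy as np

T1 = np.array([[0.2, 0.4],
               [0.8, 0.6]])

# Eigendecomposition T1 = V diag(w) V^{-1}; the eigenvalues are 1 and -0.2.
w, V = np.linalg.eig(T1)

# A square root is V diag(sqrt(w)) V^{-1}; sqrt(-0.2) requires complex arithmetic.
S = V @ np.diag(np.sqrt(w.astype(complex))) @ np.linalg.inv(V)

assert np.allclose(S @ S, T1)        # S really squares back to T1
assert np.abs(S.imag).max() > 1e-6   # and its entries are genuinely complex

# With the principal branch of the square root, S matches the matrix quoted above:
quoted = np.array([[1 + 2j / np.sqrt(5), 1 - 1j / np.sqrt(5)],
                   [2 - 2j / np.sqrt(5), 2 + 1j / np.sqrt(5)]]) / 3
assert np.allclose(S, quoted)
```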
This example is enough to show that if one requires continuity of transitions, real numbers
are not sufficient. The attempt to describe quantum mechanical systems with classical prob-
ability failed and complex numbers appeared inevitably. As in the first justification, this one
also has a reductio ad absurdum structure. Another advantage is that a single counterexample
is sufficient to show the need for a more general set of numbers. This justification can also
lead to arguments related to the need for a unitary time evolution. A possible drawback is
that this explanation can be too mathematical and wouldn’t make much sense for newcomers
who aren’t used to a probabilistic description of the physical world. Another critical remark
can be that although it shows that real numbers are not sufficient, it does not fully prove
that complex numbers are enough for a complete quantum mechanical formalism.
III. CONCLUSIONS
Complex numbers seem to be fundamental for the description of the world proposed by
quantum mechanics. In principle, this can be a source of puzzlement: Why do we need such
abstract entities to describe real things?
One way to address this bewilderment is to stress that what we can measure is essentially
real, so complex numbers are not directly related to observable quantities. A more
philosophical argument is to say that real numbers are no less abstract than complex ones;
the actual question is why mathematics is so effective for the description of the physical world.28
Regardless of philosophical discussions, a newcomer to quantum mechanics will encounter
complex variables, and it is legitimate for her/him to ask why we need them. A brief
search on the internet also shows numerous forums discussing the topic. As mentioned in
the introduction, some of the very founders of the theory were puzzled by this issue. So the
question posed in the title is clearly perceived as relevant and it is worthwhile to dedicate
some didactic effort to come up with plausible justifications.
When comparing the answers presented here, it is quite interesting to see how they differ
from one another. Four apparently different physical principles/properties are used to justify
the need for complex numbers, namely (II.A): the impossibility of having information on
position when momentum is exactly known; (II.B): the fact that i appears explicitly in the
Schrödinger equation; (II.C): the descriptions of Sx, Sy and Sz in sequential Stern-Gerlach
experiments; and (II.D): the demand for continuous transitions. For courses adopting a
position-first framework, A and B should be more appropriate, whereas C and D are more
suitable for those that choose a spin-first approach. This diversity of explanations reflects a historical
(and actual) characteristic trait of quantum mechanics, which is the possibility of multiple
theoretical approaches.
It is reasonable to assume that people with different backgrounds will have different
preferences. But perhaps the question “Which is THE best justification?” is not the most
appropriate. One learns something from each justification and a deep understanding of
the matter is probably related to the establishment of different connections. For a physics
instructor, one can conjecture that “the more the better”, as it seems plausible to assume
that one important teaching competence is to possess a broad repertoire of explanations.
ACKNOWLEDGMENTS
I would like to thank the many colleagues and friends who commented on previous
versions of this paper and/or with whom I had the privilege to discuss this intriguing
topic. Among them are Elina Palmgren, Ismo Koponen, Tommi Kokkonen, Giacomo Zuc-
carini, Magnus Bøe, Carlos Eduardo Aguiar, Nelson Studart, Debora Coimbra, Helge Kragh,
Christian Joas, Thiago Hartz, Ramon Lopez, Paul van Kampen, Mossy Kelly, Nuria Munoz
Gargante, Robert Lambourne, John Stack and R. Shankar. I am also indebted to the re-
viewers for their valuable comments and suggestions.
1 Salomon Bochner, “The significance of some basic mathematical conceptions for physics,” Isis
54(2), 179-205 (1963).
2 This struggle is well documented in Schrödinger's letters and papers. Here we give two examples.
In a letter to Lorentz on June 6, 1926, Schrödinger wrote: "What is unpleasant here, and
indeed directly to be objected to, is the use of complex numbers. ψ is surely fundamentally a
real function". The last paragraph of his fourth communication to the Annalen der Physik also
illustrates this feeling of puzzlement and frustration: “Meantime, there is no doubt a certain
crudeness in the use of a complex wave function.”
3 C. N. Yang, "Square root of minus one, complex phases and Erwin Schrödinger," in Schrödinger:
Centenary Celebration of a Polymath, edited by C.W. Kilmister (Cambridge University Press,
Cambridge, 1987), pp. 53-64.
4 In fact, Dirac is quite explicit about that in the fourth edition of his celebrated book The Prin-
ciples of Quantum Mechanics (p. 20, our emphasis): “Our bra and ket vectors are complex
quantities, since they can be multiplied by complex numbers and are then of the same nature
as before, but they are complex quantities of a special kind which cannot be split up into real
and pure imaginary parts. The usual method of getting the real part of a complex quantity, by
taking half the sum of the quantity itself and its conjugate, cannot be applied since a bra and
a ket vector are of different natures and cannot be added.”
5 It is worth mentioning that we are scratching the surface of a much deeper question that has been
addressed by many serious researchers on mathematical foundations of quantum mechanics (e.g.
Ref. 6). Among them, the work of Stueckelberg (Ref. 7) stands out as a plausible formulation
of quantum mechanics in real Hilbert space (see Ref. 8 for a pedagogical introduction). The
interested reader will also find numerous discussions about the topic in internet forums, such as
https://physics.stackexchange.com/questions/32422/qm-without-complex-numbers.
6 Felix M. Lev, “Why is quantum physics based on complex numbers?” Finite Fields and Their
Applications, 12, 336 (2006).
7 Ernst C. G. Stueckelberg, “Quantum Theory in Real Hilbert Space.” Helv. Phys. Acta 33, 727
(1960).
8 Jan Myrheim, “Quantum Mechanics on a Real Hilbert Space.” arXiv:quant-ph/9905037.
9 A broad literature review shows very few attempts to explain the need for complex numbers
in quantum mechanics, either in introductory-level textbooks or in physics education journals. An
exception is the paper "Why i?" (W. E. Baylis et al., Am. J. Phys. 60 (9), 788 (1992)), which
actually deals with the question of why complex numbers are useful for physics in general. This
paper is based on the geometric algebra by David Hestenes, who proposes a new mathematical
formalism for physics (Am. J. Phys. 71 (2), 104 (2003)). Because of its geometrical interpretation,
the reasons for “Why i?” in quantum mechanics enabled by this formalism are very convincing
and it is definitely worthwhile pursuing this project. However, here we are concerned with
physics instructors who use more standard approaches.
10 Ramamurti Shankar, Fundamentals of Physics II: Electromagnetism, Optics, and Quantum
Mechanics (Yale University Press, New Haven, 2016).
11 If one takes the precise definition of uncertainty in quantum theory, the cosine function does
not violate the uncertainty principle, since its uncertainty is infinite (plane wave). Nevertheless,
the general argument that this function does provide information about position still holds. I
am indebted to two reviewers for pointing this out.
12 David Bohm, Quantum Theory (Prentice-Hall, New York, 1951).
13 The main argument would remain valid in the general case.
14 From a pedagogical perspective it would be important to show where this equation comes
from. Schrödinger derived it from Hamilton's optical-mechanical analogy (see Ref. 15). For our
purpose here, it is worth stressing that one does not need to use the complex exponential in
this derivation, i.e., the same expression is obtained if one assumes ψ(x, t) = Ψ(x)·cos(Et/ℏ).
15 Erwin Schrödinger, Four Lectures on Wave Mechanics, delivered at the Royal Institution, London,
on 5th, 7th, 12th, and 14th March, 1928 (Blackie & Son Limited, London and Glasgow, 1928).
16 If we had represented ψ as a purely real periodic function, e.g. ψ(x, t) = Ψ(x)·cos(Et/ℏ), then we
would need to differentiate it twice with respect to time to isolate the energy parameter
(∂²ψ/∂t² = −(E²/ℏ²)ψ). In order to substitute this in Eq. (7), we would also need to square the
whole equation and would end up with a rather complicated fourth-order equation that looks like
(∂²/∂x² − (2m/ℏ²)V)²ψ + (4m²/ℏ²)∂²ψ/∂t² = 0. Curiously, this fourth-order equation was the one
Schrödinger initially claimed to be "the uniform and general wave equation for the scalar field ψ",
since apparently it would be possible to consider ψ as a real function. One can hypothesize that
Schrödinger did this in order to avoid an explicit i in his fundamental equation. Dealing with this
matter is beyond the scope of this article, but the interested reader can find deeper discussions
about the implications of a real wave function in the continuation of Bohm's argument12 and in Ref. 17.
17 Robert L. W. Chen, "Derivation of the real form of Schrödinger's equation for a nonconservative
system and the unique relation between Re(ψ) and Im(ψ)," J. Math. Phys. 30 (1), 1988.
18 Jun John Sakurai, Advanced Quantum Mechanics (Addison-Wesley, Boston, 1967).
19 D. H. McIntyre, C. A. Manogue, J. Tate, Quantum Mechanics: A Paradigms Approach (Pearson
Addison-Wesley, Boston, 2012).
20 John S. Townsend, A Modern Approach to Quantum Mechanics, Second Edition (University
Science Books, Mill Valley, CA, 2013).
21 Paul J. Nahin, An Imaginary Tale: The Story of √−1 (Princeton University Press, Princeton, 1998).
22 Lucien Hardy, “Quantum Physics From Five Reasonable Axioms,” arXiv:quant-ph/0101012v4.
23 Artur Ekert, "Complex and unpredictable Cardano," Int. J. Theor. Phys. 47, 2101 (2008).
24 We will assume that a and c are real because we want to see if we can get away without complex
numbers. A more general treatment of the situation is presented in Ref. 20.
25 The symbol .= is used to indicate "is represented by". One cannot set the ket equal to the
column vector, because the former is an abstract vector and the latter is its representation in a
given basis.
26 There is a great deal of gymnastics to calculate the square root of a square matrix. For this
particular case, the fact that the determinant of T1 is negative already guarantees that the
components of its square roots are complex numbers with non-zero imaginary parts (see Ref.
27).
27 Bernard W. Levinger, "The Square Root of a 2 × 2 Matrix," Mathematics Magazine 53 (4),
222-224 (1980).
28 See, for instance, Wigner's "Unreasonable Effectiveness" paper (Commun. Pure Appl. Math. 13, 1-14, 1960)
and the debates related to it.