CS 371 – Discrete Fourier Transformation: Applications
Andreas Stöckel
July 23, 2018
1 Introduction
The Discrete Fourier Transformation (DFT) is one of the workhorses of modern technology. We constantly
interact with systems based on the discrete Fourier and related transformations, such as the Discrete Cosine
Transformation (DCT). This is because these frequency space transformations have (among others) three
interesting properties. First, they can be used to efficiently decorrelate natural signals. Second, by exploiting
the fact that the forward and inverse Fourier transformations can be computed in O(n log n) instead of O(n2),
they allow us to implement certain algorithms in a computationally more efficient way. Third, certain math
problems are easier to solve in the frequency domain. Today, we will have a closer look at the decorrelation
property and data compression in particular.
Table 1: Some applications of the Discrete Fourier Transformation and related transformations

Decorrelation/spectral analysis:
• Telecommunication
  – Demodulation of OFDM signals
  – Broadcasts (DVB)
  – Mobile phone networks (4G)
  – Internet access (DSL, DOCSIS, Wi-Fi)
• Data compression
  – Video (HEVC, AV1)
  – Audio (MP3, Opus)*
  – Images (JPEG)*
• Scientific/medical instruments
  – MRI
  – Spectrometers
• Feature extraction
  – Speech analysis (MFCCs)
  – Music analysis and fingerprinting

Efficient computation:
• Polynomial multiplication
• Convolution
  – Filtering signals
  – Image processing
• Quantum Fourier Transformation
  – Cryptography, Shor's algorithm

Mathematical problems:
• Solving (discretised forms of) certain partial differential equations
2 Data compression
Digital representations of audio signals, images, and videos tend to be fairly large, making it expensive to
transmit them over global communication networks or to store them on digital media in their raw form.
The idea of lossy data compression is to not transmit or store information that is (a) redundant or (b)
irrelevant to the human receiver (discarding irrelevant information is the lossy step).
Next, we take a glimpse at how this compression works in the JPEG image codec and the MP3 audio codec, both
developed at the end of the 1980s/beginning of the 1990s. The encoding (i.e., compression) process can be
summarised as follows:
(I) Split the signal into temporal (audio) or spatial (images) blocks
Working on images or audio signals of unbounded size as a whole is computationally and numerically
intractable; instead, we work on small “chunks” of data.
(II) Decorrelate the data by computing the DCT of each block
(III) Quantise the DCT coefficients, set small coefficients to zero (← information is lost here)
Observation: this works well for natural images (photos).
(IV) Entropy (Huffman) encoding of the data
Allocate fewer bits for common symbols/bit sequences. For example, if there are many consecutive
“zeros” in the signal, we need fewer bits to encode these.
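The four encoding steps can be sketched in a few lines of Python. This is a schematic illustration, not a real codec: it uses the unnormalised DCT-II defined in Section 2.1, the block size and quantisation step are arbitrary choices, and the entropy-coding stage is only hinted at by counting the zeros that a Huffman coder would compress cheaply.

```python
import numpy as np

def dct2_block(x):
    """Unnormalised DCT-II of a 1-D block (the formula from Section 2.1)."""
    N = len(x)
    n = np.arange(N)
    return np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

# (I) Split a slowly varying "audio" signal into blocks of 8 samples.
t = np.arange(256)
signal = np.cos(2 * np.pi * t / 256)          # one slow oscillation
blocks = signal.reshape(-1, 8)

# (II) Decorrelate each block by computing its DCT.
coeffs = np.array([dct2_block(b) for b in blocks])

# (III) Quantise: round to a step size; small coefficients become zero.
step = 2.0
quantised = np.round(coeffs / step) * step

# (IV) Entropy coding would now spend very few bits on the many zeros.
zero_fraction = np.mean(quantised == 0.0)
```

For this smooth signal, almost every coefficient quantises to zero even though essentially none of the raw coefficients is exactly zero; this is the redundancy the Huffman stage exploits.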
2.1 Discrete Cosine Transform (DCT)
In contrast to the DFT, the DCT operates on real numbers. Correspondingly, for many real-valued natural
signals (sound waves, images), computing the DCT is faster in practice, although the computational complexity
is in O(n log n) as well. There are several types of DCT transformations with slightly different properties.
Here, as an example, we will focus on the so-called DCT-II transformation, which is used in most practical
applications.
Given a real-valued, discrete signal x ∈ R^N, the DCT-II transformed signal x̂ ∈ R^N is defined via an invertible
linear transformation T_DCT:

\[
\hat{x}_k = \sum_{n=0}^{N-1} 2 x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] \,, \qquad \hat{x} = T_{\mathrm{DCT}}\, x \,.
\]
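As a quick sanity check of this definition, the sum can be evaluated directly (a naive O(N^2) sketch; production code would use an FFT-based routine). Feeding in one of the sampled cosines that make up the DCT basis yields a single non-zero coefficient, since the basis functions are orthogonal:

```python
import numpy as np

def dct2_naive(x):
    """Direct O(N^2) evaluation of the DCT-II formula above."""
    N = len(x)
    n = np.arange(N)
    return np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

N, k0 = 8, 3
n = np.arange(N)
x = np.cos(np.pi / N * (n + 0.5) * k0)   # the k0-th DCT basis function
X = dct2_naive(x)
# X[k0] equals N; all other coefficients vanish (up to rounding error).
```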
Compare this to the DFT:

\[
\hat{x}_k = \sum_{n=0}^{N-1} x_n \exp\left(\frac{-2\pi i}{N} k n\right)
          = \sum_{n=0}^{N-1} x_n \left(\cos\left[\frac{2\pi k n}{N}\right] - i \sin\left[\frac{2\pi k n}{N}\right]\right).
\]
The DCT and the DFT are closely related. We can compute the DCT-II using the DFT by quadrupling and
mirroring the input signal. Let x' ∈ R^{4N} be our new input signal, such that

\[
x' = [0, x_0, 0, x_1, 0, \ldots, x_{N-1}, 0, x_{N-1}, \ldots, 0, x_1, 0, x_0] \,,
\qquad
x'_k = \begin{cases}
0 & \text{if } k \text{ is even} \,, \\
x_{(k-1)/2} & \text{if } k \text{ is odd and } k < 2N \,, \\
x_{2N-(k+1)/2} & \text{if } k \text{ is odd and } k > 2N \,.
\end{cases}
\]
Proof. Idea: first ignore the zero entries in x'_n by substituting the index variable n, then exploit the symmetry of
x'_n, and finally apply the identities cos(−x) = cos(x) and sin(−x) = −sin(x).

\begin{align*}
\sum_{n=0}^{4N-1} x'_n \left[\cos\left(\frac{2\pi k n}{4N}\right) - i \sin\left(\frac{2\pi k n}{4N}\right)\right]
&= \sum_{n=0}^{2N-1} x'_{2n+1} \left(\cos\left[\frac{2\pi k (2n+1)}{4N}\right] - i \sin\left[\frac{2\pi k (2n+1)}{4N}\right]\right) \\
&= \sum_{n=0}^{2N-1} x'_{2n+1} \left(\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]\right) \\
&= \sum_{n=0}^{N-1} x_n \left(\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]
 + \cos\left[\frac{\pi}{N}\left(2N - n - \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(2N - n - \frac{1}{2}\right) k\right]\right) \\
&= \sum_{n=0}^{N-1} x_n \left(\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]
 + \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] + i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]\right) \\
&= \sum_{n=0}^{N-1} 2 x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] \,. \qquad \blacksquare
\end{align*}
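This identity is easy to verify numerically: the DFT of the zero-padded, mirrored signal x' reproduces the DCT-II coefficients of x, with a vanishing imaginary part. A short sketch using NumPy's FFT (which follows the same sign convention as the DFT formula above):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N)

# Build x' in R^{4N}: zeros at even indices, x at odd indices below 2N,
# and x mirrored at odd indices above 2N.
xp = np.zeros(4 * N)
xp[1:2 * N:2] = x            # x'_{2n+1} = x_n for n = 0, ..., N-1
xp[2 * N + 1::2] = x[::-1]   # mirrored half of the signal

dft = np.fft.fft(xp)         # F_k = sum_n x'_n exp(-2*pi*i*k*n / 4N)

# Direct DCT-II for comparison.
n = np.arange(N)
dct = np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                for k in range(N)])
```

The first N DFT coefficients of x' match the DCT-II of x and are (numerically) real, exactly as the derivation predicts.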
As shown in Figure 1, the DCT coefficients neither directly correspond to the amplitudes of the frequencies present in
the signal, nor to the phases of the individual frequencies (which the DFT stores in the angle of each complex
coefficient). However, the DCT is still useful for decorrelating signals.
2.2 Decorrelation
Natural signals are highly self-correlated, i.e., the information contained in them is redundant. Given a sample
value xk we can predict the value of neighbouring samples xk′ with high confidence. To store, transmit, or
analyse them efficiently, we need to decorrelate the information.
As an example of signal correlation, consider the image in Figure 2, which demonstrates that the usual
RGB representation of digital images exhibits a high correlation between the colour channels. We can decorrelate the
channels by applying a simple linear transformation, in this case to the so-called YUV colour space.
Another example is shown in Figure 4 (which is similar to Figure 1). Neighbouring samples of the sine or
cosine wave are highly similar. Linearly transforming the signal with a DCT or DFT maps most of these samples
to (near-)zero coefficients.
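This effect can be quantified with a short sketch (using the DCT-II as defined in Section 2.1; the threshold of 1e-6 for "non-zero" is an arbitrary cut-off): every sample of a sampled cosine is far from zero, yet only a single DCT coefficient is.

```python
import numpy as np

def dct2_naive(x):
    """Direct evaluation of the DCT-II formula from Section 2.1."""
    N = len(x)
    n = np.arange(N)
    return np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

N, k0 = 64, 8
n = np.arange(N)
x = np.cos(np.pi / N * (n + 0.5) * k0)     # a sampled cosine (four periods)
X = dct2_naive(x)

# All 64 samples carry "information", but only one DCT coefficient does.
significant_samples = int(np.sum(np.abs(x) > 1e-6))
significant_coeffs = int(np.sum(np.abs(X) > 1e-6))
```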
Decorrelation in the JPEG codec JPEG first transforms images to YUV colour space (Figure 2) and
then handles the three colour channels (intensity (Y ), cyan vs. yellow (U), green vs. red (V )) as individual
images.1
Since it would be intractable to apply the DCT to images of arbitrary resolution, the image is first subdivided
into blocks of 8× 8 pixels, where each pixel value corresponds to the intensity of the corresponding colour
channel at that location of the image. These blocks are stored independently of each other. We will now
look at one of these 8× 8 blocks, which can be represented as a vector x ∈ [0, 1]64. When we normally think
¹Note that JPEG has the option to subsample colour information, i.e., to store the colour planes at a lower resolution. This
works because the human eye is more sensitive to changes in light intensity than in colour, so, in general, people won’t notice.
[Figure 1: three panels: the signal (cos and sin traces) over time (ms); the DCT amplitude over frequency (Hz); and the DFT power spectrum over frequency (Hz).]
Figure 1: Comparison between DCT and DFT. Note that for the cosine signal (orange), the DCT corresponds to
a single coefficient with a non-zero amplitude. For the phase-shifted cosine signal (blue, sine wave), the DCT has
multiple non-zero coefficients. At the same time, the power spectrum of the DFT (absolute value of the coefficients) is
the same for both signals.
[Figure 2: original image = red component + green component + blue component]
Figure 2: An image split into its individual red, green, and blue components. Note how the intensity of the individual
channels is similar—the information is highly correlated.
[Figure 3: original image = Y component + U component + V component]
Figure 3: The same image as in Figure 2; however, the individual pixel values (r, g, b) are transformed linearly to
(y, u, v)T = T · (r, g, b)T , where T is a linear transformation matrix. The intensity information is now solely in the first
channel, while colour information is stored in the second and third channel.
[Figure 4: two panels: amplitude over time (s), and amplitude over frequency (Hz).]
Figure 4: The left image shows a sampled (discrete) sine wave. Again, note how the values are correlated, i.e., neighbouring
samples have similar values. Applying the FFT (right image; power spectrum) decorrelates the samples.
about pixel data, this vector is represented in terms of the standard basis, that is

\[
x = I x = \sum_{k=1}^{N} \langle e_k, x \rangle \, e_k \,, \quad \text{where } e_k \text{ are the basis vectors of the standard orthonormal basis.}
\]
Figure 5 (a) visualises what the basis vectors look like in terms of 8× 8 pixel blocks. Intuitively, each entry
in x corresponds to a single element/pixel in an 8× 8 block. Since image data is often locally correlated, this
is not very data-efficient.
This changes when we represent x in terms of the two-dimensional DCT basis. That is, we compute
x̂ = T_DCT x. To reconstruct our block x in pixel space, we apply the inverse DCT. Since the DCT basis is
orthogonal (given a suitable normalisation of the basis vectors), it holds that T_DCT^{-1} = T_DCT^T. We obtain

\[
x = T_{\mathrm{DCT}}^T \cdot T_{\mathrm{DCT}} \cdot x = T_{\mathrm{DCT}}^T \cdot \hat{x} \,.
\]

Correspondingly, we can say that x̂ represents our image block in terms of the basis T_DCT^T. As we can see in
the basis vector visualisation in Figure 5 (b), each entry in x̂ influences many pixels in the corresponding x. If
we apply this transformation to natural images, many of the vector entries representing high-frequency content
will be close to zero. Setting as many coefficients to zero as possible allows them to be represented with few
bits in the entropy coding stage (Huffman compression). In photos, we can safely discard these coefficients
(set them to zero) without this being noticeable. However, compressing images containing diagrams or text
using JPEG reveals the block structure and high-frequency artefacts stemming from this quantization step
(see Figure 6).
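The orthogonality claim and the effect of discarding small coefficients can be sketched for a single hypothetical 8 × 8 block. This is an illustration only: it uses the orthonormal scaling of the DCT-II (so that the inverse really is the transpose), a synthetic smooth gradient as the "natural" block, and a flat threshold of 0.05 in place of JPEG's frequency-dependent quantisation tables.

```python
import numpy as np

# Orthonormal DCT-II matrix for N = 8 (rows rescaled so that T is orthogonal).
N = 8
n = np.arange(N)
T = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n, n + 0.5))
T[0] /= np.sqrt(2.0)

# A smooth, "natural" 8x8 block: a diagonal intensity gradient in [0, 1].
block = np.add.outer(np.linspace(0, 1, N), np.linspace(0, 1, N)) / 2.0

# Separable 2-D DCT: transform the rows and the columns.
C = T @ block @ T.T

# Discard small (mostly high-frequency) coefficients, then reconstruct
# with the inverse transform T^T.
C_q = np.where(np.abs(C) > 0.05, C, 0.0)
reconstructed = T.T @ C_q @ T

zeros = int(np.sum(C_q == 0.0))                          # bits saved here
max_error = float(np.max(np.abs(reconstructed - block)))  # barely visible
```

Most of the 64 coefficients are zeroed, yet the reconstruction error per pixel stays small; for a block containing sharp edges or text, far more large coefficients would survive, which is exactly why JPEG struggles with diagrams.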
2.3 Why is the DCT a good choice for the compression of natural signals?
We know that we can compute the DCT and the inverse DCT quickly, in O(n log n), compared to a naïve
matrix–vector multiplication. This is an important property for most practical applications. However, for
data compression, the question remains whether the DCT is actually a good choice for decorrelating natural
data compared to other techniques.
Principal Component Analysis The mathematically optimal method to linearly decorrelate signals is to
project them onto their principal components. Principal components are computed using so-called Principal
Component Analysis (PCA). Without going into details, PCA is essentially an eigenvector decomposition of
the covariance matrix

\[
X^T X = V \Lambda V^T \,, \quad \text{where } E(X) = 0 \text{ and } \Lambda \text{ is a diagonal matrix of eigenvalues} \,.
\]

The constraint E(X) = 0 means that each data dimension in X must be centred around zero. The new
orthogonal basis corresponds to the eigenvectors V = (v_1, . . . , v_N).
Experiment To see what the basis obtained from a PCA looks like, we randomly sample 8 × 8 pixel blocks
(here, 4096 of them) from a natural image and compute the PCA basis. The results are depicted in
Figure 7. As we can see, most of the basis vectors obtained via PCA look similar to those of the 2D DCT
(Figure 5). Correspondingly, the DCT basis is an excellent choice: it intrinsically corresponds to an
"optimal" decorrelation of natural images, while at the same time the transformation can be computed more
efficiently than a general matrix–vector multiplication (projection).
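A one-dimensional analogue of this experiment can be sketched as follows. The assumptions are loud ones: a synthetic, smoothly correlated random signal (a first-order autoregressive process) stands in for the natural image, the patches are of length 8 rather than 8 × 8, and NumPy's eigendecomposition replaces a full PCA routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth 1-D "natural" signal: a first-order autoregressive process,
# so neighbouring samples are strongly correlated.
length = 4096 + 8
noise = rng.standard_normal(length)
signal = np.zeros(length)
for t in range(1, length):
    signal[t] = 0.95 * signal[t - 1] + noise[t]

# Sample 4096 length-8 patches and centre each dimension (E(X) = 0).
X = np.array([signal[t:t + 8] for t in range(4096)])
X -= X.mean(axis=0)

# PCA: eigendecomposition of the covariance matrix X^T X.
cov = X.T @ X / len(X)
eigvals, V = np.linalg.eigh(cov)         # eigh returns ascending order
eigvals, V = eigvals[::-1], V[:, ::-1]   # sort descending by eigenvalue

# The leading basis vector is low-frequency: all entries share one sign,
# much like the first DCT basis function.
leading = V[:, 0]
```

The eigenvectors form an orthonormal basis, and the leading one resembles the lowest-frequency DCT basis vector, mirroring the observation in Figure 7.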
Figure 5: Standard versus DCT basis. (a) Visualisation of the standard basis for an 8 × 8 block. Each entry in x
corresponds to a single pixel. (b) Visualisation of the DCT basis. Each entry in x influences many pixels in the target
block.
Figure 6: Image compressed with various quality settings (high to low quality from left to right). The quality roughly
corresponds to the number of DCT coefficients that are set to zero. Close inspection of the images reveals the block
structure and high-frequency artefacts.
[Figure 7: (a) input image; (b) blocks (subset); (c) PCA basis vectors.]
Figure 7: Computing a new basis for image decorrelation using Eigenvector decomposition/PCA. (a) Natural source
image. (b) Selection of some 8 × 8 blocks sampled from the input image. (c) Corresponding basis vectors computed
using Eigenvector decomposition, sorted by magnitude of the corresponding eigenvalue (left to right, top to bottom).