CS 371 – Discrete Fourier Transformation: Applications
Andreas Stöckel
July 23, 2018
1 Introduction
The Discrete Fourier Transformation (DFT) is one of the workhorses of modern technology. We constantly
interact with systems based on the discrete Fourier and related transformations, such as the Discrete Cosine
Transformation (DCT). This is because these frequency space transformations have (among others) three
interesting properties. First, they can be used to efficiently decorrelate natural signals. Second, by exploiting
the fact that the forward and inverse Fourier transformations can be computed in O(n log n) instead of O(n2),
they allow us to implement certain algorithms in a computationally more efficient way. Third, certain math
problems are easier to solve in the frequency domain. Today, we will have a closer look at the decorrelation
property and data compression in particular.
Table 1: Some applications of the Discrete Fourier Transformation and related transformations

Decorrelation/spectral analysis:
• Telecommunication
  – Demodulation of OFDM signals
  – Broadcasts (DVB)
  – Mobile phone networks (4G)
  – Internet access (DSL, DOCSIS, Wi-Fi)
• Data compression
  – Video (HEVC, AV1)
  – Audio (MP3, Opus)*
  – Images (JPEG)*
• Scientific/medical instruments
  – MRI
  – Spectrometers
• Feature extraction
  – Speech analysis (MFCCs)
  – Music analysis and fingerprinting

Efficient computation:
• Polynomial multiplication
• Convolution
  – Filtering signals
  – Image processing
• Quantum Fourier Transformation
  – Cryptography, Shor's algorithm

Mathematical problems:
• Solving (discretised forms of) certain partial differential equations
2 Data compression
Digital representations of audio signals, images, and videos tend to be fairly large, making it expensive to
transmit them over global communication networks or to store them on digital media in their raw form.
The idea of lossy data compression is to not transmit or store information that is (a) redundant or (b)
irrelevant to the human receiver (discarding irrelevant information is the lossy step).
Next, we take a glimpse at how this compression works in the JPEG image codec and the MP3 audio codec, both
developed at the end of the 1980s/beginning of the 1990s. The encoding (i.e., compression) process can be
summarised as follows:
(I) Split the signal into temporal (audio) or spatial (images) blocks
Working on images or audio signals of unbounded size as a whole is computationally and numerically
intractable; instead, we work on small “chunks” of data.
(II) Decorrelate the data by computing the DCT of each block
(III) Quantise the DCT coefficients, set small coefficients to zero (← information is lost here)
Observation: this works well for natural images (photos).
(IV) Entropy (Huffman) encoding of the data
Allocate fewer bits for common symbols/bit sequences. For example, if there are many consecutive
“zeros” in the signal, we need fewer bits to encode these.
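The four encoding steps can be sketched in a few lines of Python. This is a schematic illustration, not a real codec: it uses the unnormalised DCT-II defined in Section 2.1, the block size and quantisation step are arbitrary choices, and the entropy-coding stage is only hinted at by counting the zeros that a Huffman coder would compress cheaply.

```python
import numpy as np

def dct2_block(x):
    """Unnormalised DCT-II of a 1-D block (the formula from Section 2.1)."""
    N = len(x)
    n = np.arange(N)
    return np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

# (I) Split a slowly varying "audio" signal into blocks of 8 samples.
t = np.arange(256)
signal = np.cos(2 * np.pi * t / 256)          # one slow oscillation
blocks = signal.reshape(-1, 8)

# (II) Decorrelate each block by computing its DCT.
coeffs = np.array([dct2_block(b) for b in blocks])

# (III) Quantise: round to a step size; small coefficients become zero.
step = 2.0
quantised = np.round(coeffs / step) * step

# (IV) Entropy coding would now spend very few bits on the many zeros.
zero_fraction = np.mean(quantised == 0.0)
```

For this smooth signal, almost every coefficient quantises to zero even though essentially none of the raw coefficients is exactly zero; this is the redundancy the Huffman stage exploits.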
2.1 Discrete Cosine Transform (DCT)
In contrast to the DFT, the DCT operates on real numbers. Correspondingly, for many real-valued natural
signals (sound waves, images), computing the DCT is faster in practice, although the computational complexity
is in O(n log n) as well. There are several types of DCT transformations with slightly different properties.
Here, as an example, we will focus on the so-called DCT-II transformation, which is used in most practical
applications.
Given a real-valued, discrete signal x ∈ R^N, the DCT-II transformed signal x̂ ∈ R^N is defined via an invertible
linear transformation T_DCT:

\[
\hat{x}_k = \sum_{n=0}^{N-1} 2 x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] \,, \qquad \hat{x} = T_{\mathrm{DCT}}\, x \,.
\]
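As a quick sanity check of this definition, the sum can be evaluated directly (a naive O(N^2) sketch; production code would use an FFT-based routine). Feeding in one of the sampled cosines that make up the DCT basis yields a single non-zero coefficient, since the basis functions are orthogonal:

```python
import numpy as np

def dct2_naive(x):
    """Direct O(N^2) evaluation of the DCT-II formula above."""
    N = len(x)
    n = np.arange(N)
    return np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

N, k0 = 8, 3
n = np.arange(N)
x = np.cos(np.pi / N * (n + 0.5) * k0)   # the k0-th DCT basis function
X = dct2_naive(x)
# X[k0] equals N; all other coefficients vanish (up to rounding error).
```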
Compare this to the DFT:

\[
\hat{x}_k = \sum_{n=0}^{N-1} x_n \exp\left(\frac{-2\pi i}{N} k n\right)
          = \sum_{n=0}^{N-1} x_n \left(\cos\left[\frac{2\pi k n}{N}\right] - i \sin\left[\frac{2\pi k n}{N}\right]\right).
\]
The DCT and the DFT are closely related. We can compute the DCT-II using the DFT by quadrupling and
mirroring the input signal. Let x' ∈ R^{4N} be our new input signal, such that

\[
x' = [0, x_0, 0, x_1, 0, \ldots, x_{N-1}, 0, x_{N-1}, \ldots, 0, x_1, 0, x_0] \,,
\qquad
x'_k = \begin{cases}
0 & \text{if } k \text{ is even} \,, \\
x_{(k-1)/2} & \text{if } k \text{ is odd and } k < 2N \,, \\
x_{2N-(k+1)/2} & \text{if } k \text{ is odd and } k > 2N \,.
\end{cases}
\]
Proof. Idea: first ignore the zero entries in x'_n by substituting the index variable n, then exploit the symmetry of
x'_n, and finally apply the identities cos(−x) = cos(x) and sin(−x) = −sin(x).

\begin{align*}
\sum_{n=0}^{4N-1} x'_n \left[\cos\left(\frac{2\pi k n}{4N}\right) - i \sin\left(\frac{2\pi k n}{4N}\right)\right]
&= \sum_{n=0}^{2N-1} x'_{2n+1} \left(\cos\left[\frac{2\pi k (2n+1)}{4N}\right] - i \sin\left[\frac{2\pi k (2n+1)}{4N}\right]\right) \\
&= \sum_{n=0}^{2N-1} x'_{2n+1} \left(\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]\right) \\
&= \sum_{n=0}^{N-1} x_n \left(\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]
 + \cos\left[\frac{\pi}{N}\left(2N - n - \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(2N - n - \frac{1}{2}\right) k\right]\right) \\
&= \sum_{n=0}^{N-1} x_n \left(\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] - i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]
 + \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] + i \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right]\right) \\
&= \sum_{n=0}^{N-1} 2 x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right) k\right] \,. \qquad \blacksquare
\end{align*}
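This identity is easy to verify numerically: the DFT of the zero-padded, mirrored signal x' reproduces the DCT-II coefficients of x, with a vanishing imaginary part. A short sketch using NumPy's FFT (which follows the same sign convention as the DFT formula above):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N)

# Build x' in R^{4N}: zeros at even indices, x at odd indices below 2N,
# and x mirrored at odd indices above 2N.
xp = np.zeros(4 * N)
xp[1:2 * N:2] = x            # x'_{2n+1} = x_n for n = 0, ..., N-1
xp[2 * N + 1::2] = x[::-1]   # mirrored half of the signal

dft = np.fft.fft(xp)         # F_k = sum_n x'_n exp(-2*pi*i*k*n / 4N)

# Direct DCT-II for comparison.
n = np.arange(N)
dct = np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                for k in range(N)])
```

The first N DFT coefficients of x' match the DCT-II of x and are (numerically) real, exactly as the derivation predicts.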
As shown in Figure 1, the DCT coefficients neither directly correspond to the amplitudes of the frequencies present in
the signal, nor to the phases of the individual frequencies (which the DFT stores in the angle of each complex
coefficient). However, the DCT is still useful for decorrelating signals.
2.2 Decorrelation
Natural signals are highly self-correlated, i.e., the information contained in them is redundant. Given a sample
value xk we can predict the value of neighbouring samples xk′ with high confidence. To store, transmit, or
analyse them efficiently, we need to decorrelate the information.
As an example of signal correlation, consider the image in Figure 2, which demonstrates that the usual
RGB representation of digital images exhibits a high correlation between the colour channels. We can decorrelate the
channels by applying a simple linear transformation, in this case to the so-called YUV colour space.
Another example is shown in Figure 4 (which is similar to Figure 1). Neighbouring samples of the sine or
cosine wave are highly similar. Linearly transforming the signal with a DCT or DFT maps most of these samples
to (near-)zero coefficients.
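This effect can be quantified with a short sketch (using the DCT-II as defined in Section 2.1; the threshold of 1e-6 for "non-zero" is an arbitrary cut-off): every sample of a sampled cosine is far from zero, yet only a single DCT coefficient is.

```python
import numpy as np

def dct2_naive(x):
    """Direct evaluation of the DCT-II formula from Section 2.1."""
    N = len(x)
    n = np.arange(N)
    return np.array([2.0 * np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                     for k in range(N)])

N, k0 = 64, 8
n = np.arange(N)
x = np.cos(np.pi / N * (n + 0.5) * k0)     # a sampled cosine (four periods)
X = dct2_naive(x)

# All 64 samples carry "information", but only one DCT coefficient does.
significant_samples = int(np.sum(np.abs(x) > 1e-6))
significant_coeffs = int(np.sum(np.abs(X) > 1e-6))
```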
Decorrelation in the JPEG codec JPEG first transforms images to YUV colour space (Figure 2) and
then handles the three colour channels (intensity (Y ), cyan vs. yellow (U), green vs. red (V )) as individual
images.1
Since it would be intractable to apply the DCT to images of arbitrary resolution, the image is first subdivided
into blocks of 8× 8 pixels, where each pixel value corresponds to the intensity of the corresponding colour
channel at that location of the image. These blocks are stored independently of each other. We will now
look at one of these 8× 8 blocks, which can be represented as a vector x ∈ [0, 1]64. When we normally think
¹Note that JPEG has the option to subsample colour information, i.e., to store the colour planes at a lower resolution. This
works because the human eye is more sensitive to changes in light intensity than in colour, so, in general, people won’t notice.
[Figure 1: three panels: the signal (cos and sin traces) over time (ms); the DCT amplitude over frequency (Hz); and the DFT power spectrum over frequency (Hz).]
Figure 1: Comparison between DCT and DFT. Note that for the cosine signal (orange), the DCT corresponds to
a single coefficient with a non-zero amplitude. For the phase-shifted cosine signal (blue, sine wave), the DCT has
multiple non-zero coefficients. At the same time, the power spectrum of the DFT (absolute value of the coefficients) is
the same for both signals.
[Figure 2: original image = red component + green component + blue component]
Figure 2: An image split into its individual red, green, and blue components. Note how the intensity of the individual
channels is similar—the information is highly correlated.
[Figure 3: original image = Y component + U component + V component]
Figure 3: The same image as in Figure 2; however, the individual pixel values (r, g, b) are transformed linearly to
(y, u, v)T = T · (r, g, b)T , where T is a linear transformation matrix. The intensity information is now solely in the first
channel, while colour information is stored in the second and third channel.
[Figure 4: two panels: amplitude over time (s), and amplitude over frequency (Hz).]
Figure 4: The left image shows a sampled (discrete) sine wave. Again, note how the values are correlated, i.e., neighbouring
samples have similar values. Applying the FFT (right image; power spectrum) decorrelates the samples.
about pixel data, this vector is represented in terms of the standard basis, that is

\[
x = I x = \sum_{k=1}^{N} \langle e_k, x \rangle \, e_k \,, \quad \text{where } e_k \text{ are the basis vectors of the standard orthonormal basis.}
\]
Figure 5 (a) visualises what the basis vectors look like in terms of 8× 8 pixel blocks. Intuitively, each entry
in x corresponds to a single element/pixel in an 8× 8 block. Since image data is often locally correlated, this
is not very data-efficient.
This changes when we represent x in terms of the two-dimensional DCT basis. That is, we compute
x̂ = T_DCT x. To reconstruct our block x in pixel space, we apply the inverse DCT. Since the DCT basis is
orthogonal (given a suitable normalisation of the basis vectors), it holds that T_DCT^{-1} = T_DCT^T. We obtain

\[
x = T_{\mathrm{DCT}}^T \cdot T_{\mathrm{DCT}} \cdot x = T_{\mathrm{DCT}}^T \cdot \hat{x} \,.
\]

Correspondingly, we can say that x̂ represents our image block in terms of the basis T_DCT^T. As we can see in
the basis vector visualisation in Figure 5 (b), each entry in x̂ influences many pixels in the corresponding x. If
we apply this transformation to natural images, many of the vector entries representing high-frequency content
will be close to zero. Setting as many coefficients to zero as possible allows them to be represented with few
bits in the entropy coding stage (Huffman compression). In photos, we can safely discard these coefficients
(set them to zero) without this being noticeable. However, compressing images containing diagrams or text
using JPEG reveals the block structure and high-frequency artefacts stemming from this quantization step
(see Figure 6).
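The orthogonality claim and the effect of discarding small coefficients can be sketched for a single hypothetical 8 × 8 block. This is an illustration only: it uses the orthonormal scaling of the DCT-II (so that the inverse really is the transpose), a synthetic smooth gradient as the "natural" block, and a flat threshold of 0.05 in place of JPEG's frequency-dependent quantisation tables.

```python
import numpy as np

# Orthonormal DCT-II matrix for N = 8 (rows rescaled so that T is orthogonal).
N = 8
n = np.arange(N)
T = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n, n + 0.5))
T[0] /= np.sqrt(2.0)

# A smooth, "natural" 8x8 block: a diagonal intensity gradient in [0, 1].
block = np.add.outer(np.linspace(0, 1, N), np.linspace(0, 1, N)) / 2.0

# Separable 2-D DCT: transform the rows and the columns.
C = T @ block @ T.T

# Discard small (mostly high-frequency) coefficients, then reconstruct
# with the inverse transform T^T.
C_q = np.where(np.abs(C) > 0.05, C, 0.0)
reconstructed = T.T @ C_q @ T

zeros = int(np.sum(C_q == 0.0))                          # bits saved here
max_error = float(np.max(np.abs(reconstructed - block)))  # barely visible
```

Most of the 64 coefficients are zeroed, yet the reconstruction error per pixel stays small; for a block containing sharp edges or text, far more large coefficients would survive, which is exactly why JPEG struggles with diagrams.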
2.3 Why is the DCT a good choice for the compression of natural signals?
We know that we can compute the DCT and the inverse DCT quickly, in O(n log n), compared to a naïve
matrix–vector multiplication. This is an important property for most practical applications. However, for
data compression, the question remains whether the DCT is actually a good choice for decorrelating natural
data compared to other techniques.
Principal Component Analysis The mathematically optimal method to linearly decorrelate signals is to
project them onto their principal components. Principal components are computed using so-called Principal
Component Analysis (PCA). Without going into details, PCA is essentially an eigenvector decomposition of
the covariance matrix

\[
X^T X = V \Lambda V^T \,, \quad \text{where } E(X) = 0 \text{ and } \Lambda \text{ is a diagonal matrix of eigenvalues} \,.
\]

The constraint E(X) = 0 means that each data dimension in X must be centred around zero. The new
orthogonal basis corresponds to the eigenvectors V = (v_1, . . . , v_N).
Experiment To see what the basis obtained from a PCA looks like, we randomly sample 8 × 8 pixel blocks
(here, 4096 of them) from a natural image and compute the PCA basis. The results are depicted in
Figure 7. As we can see, most of the basis vectors obtained via PCA look similar to those of the 2D DCT
(Figure 5). Correspondingly, the DCT basis is an excellent choice: it intrinsically corresponds to an
"optimal" decorrelation of natural images, while at the same time the transformation can be computed more
efficiently than a general matrix–vector multiplication (projection).
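A one-dimensional analogue of this experiment can be sketched as follows. The assumptions are loud ones: a synthetic, smoothly correlated random signal (a first-order autoregressive process) stands in for the natural image, the patches are of length 8 rather than 8 × 8, and NumPy's eigendecomposition replaces a full PCA routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth 1-D "natural" signal: a first-order autoregressive process,
# so neighbouring samples are strongly correlated.
length = 4096 + 8
noise = rng.standard_normal(length)
signal = np.zeros(length)
for t in range(1, length):
    signal[t] = 0.95 * signal[t - 1] + noise[t]

# Sample 4096 length-8 patches and centre each dimension (E(X) = 0).
X = np.array([signal[t:t + 8] for t in range(4096)])
X -= X.mean(axis=0)

# PCA: eigendecomposition of the covariance matrix X^T X.
cov = X.T @ X / len(X)
eigvals, V = np.linalg.eigh(cov)         # eigh returns ascending order
eigvals, V = eigvals[::-1], V[:, ::-1]   # sort descending by eigenvalue

# The leading basis vector is low-frequency: all entries share one sign,
# much like the first DCT basis function.
leading = V[:, 0]
```

The eigenvectors form an orthonormal basis, and the leading one resembles the lowest-frequency DCT basis vector, mirroring the observation in Figure 7.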
Figure 5: Standard versus DCT basis. (a) Visualisation of the standard basis for an 8 × 8 block. Each entry in x
corresponds to a single pixel. (b) Visualisation of the DCT basis. Each entry in x influences many pixels in the target
block.
Figure 6: Image compressed with various quality settings (high to low quality from left to right). The quality roughly
corresponds to the number of DCT coefficients that are set to zero. Close inspection of the images reveals the block
structure and high-frequency artefacts.
[Figure 7: (a) input image; (b) blocks (subset); (c) PCA basis vectors.]
Figure 7: Computing a new basis for image decorrelation using Eigenvector decomposition/PCA. (a) Natural source
image. (b) Selection of some 8 × 8 blocks sampled from the input image. (c) Corresponding basis vectors computed
using Eigenvector decomposition, sorted by magnitude of the corresponding eigenvalue (left to right, top to bottom).