Design And Implementation of a Fast Fourier Transform Unit on a Xilinx FPGA

3rd Year Project Report

completed in the

School of Computer Science at The University of Manchester

whilst studying

MEng in Computer Systems Engineering

Word Count: 11280 (WORDS IN TEXT ONLY, COUNTED USING TEXCOUNT)

Author: Matthew AKERMAN

Supervisor: Vasilis PAVLIDIS

April 29, 2014

Abstract

The Fast Fourier Transform (FFT) is important in many domains, including signal processing and image processing. It is used by audio codecs such as MP3 to help achieve high quality audio at relatively low bit rates, and allows complex filters to be applied to images very quickly. This report explains how Fourier transforms work and what it is they do. For digital applications a Discrete Fourier Transform (DFT) is used, but this is a costly operation. This report details a few ways to produce the result of the DFT whilst using far fewer operations.

The report also details the work done over the duration of the project. The initial plan was to design, implement, verify and benchmark an FFT processor on a Xilinx FPGA. However, this did not work out as expected. Several approaches were explored to produce a working FFT processor, though ultimately these approaches did not lead to a working processor. Once producing an FFT processor became an infeasible goal, it was decided that a validation and benchmarking suite for FFT algorithms should be produced in the remaining time along with some implementations of FFT algorithms. Some screenshots and results obtained by using the application and algorithms are presented at the end with a summary of the implementation details for the FFT software tools.


Acknowledgements

First and foremost, I would like to thank my project supervisor Vasilis Pavlidis. His advice has helped me throughout the duration of this project, and he always made time to answer queries quickly. His patience, mentoring and humour have been an invaluable resource in helping me to complete this project.

My thanks also go to my parents and supportive family members who have been a rock for the duration of my studies so far, encouraging me through stressful times and good times in equal measure, and providing me with home comforts and a quiet place to write this report.

I would like to thank my partner Alexandra for supporting me through the year, providing laughter and for giving me valuable feedback on many aspects of the project including - but not limited to - the way I delivered my seminar, the way the final deliverable looked and the way the final report was structured.

Thank you to Jim Garside for taking time out of his busy schedule at a moment’s notice to help me to understand more about the operation of Fast Fourier Transforms, and for providing me with one of the best written resources on the FFT that I have managed to find.

Finally, I would like to thank my friends both here in Manchester and back home for the emotional support they have given to me throughout the rigours of this degree.


Contents

1 Introduction
    1.1 Representing a Wave
        1.1.1 Time Domain
        1.1.2 Frequency Domain
    1.2 Digital Representations
2 Context
    2.1 Astronomy
    2.2 Geology
    2.3 Communications
    2.4 Audio Codecs
        2.4.1 Overview of MP3 Codec
        2.4.2 Psychoacoustic Modelling
        2.4.3 Human Hearing Threshold
        2.4.4 Frequency Masking
3 Fourier Transforms
    3.1 Circular Motions
        3.1.1 Representing Circular Motions
        3.1.2 Combining Circular Motions
    3.2 Continuous Fourier Transform
    3.3 Discrete Fourier Transform
        3.3.1 Complexity
4 FFT Algorithms
    4.1 Cooley-Tukey Algorithm
        4.1.1 Radix 2
        4.1.2 Radix 4
        4.1.3 Split Radix Algorithms
    4.2 Goertzel Algorithm
5 Initial Project Outline
    5.1 Dedicated FFT Hardware
    5.2 Existing Products
    5.3 Attempting To Design A FFT Processor
        5.3.1 Designing For RTL Implementation
        5.3.2 Converting C to RTL
    5.4 Alternative Ideas
        5.4.1 Xilinx Core Generator
        5.4.2 Software Tools
6 Final Implementation Details
    6.1 Purpose of Application
        6.1.1 Requirements
        6.1.2 Verification Tool
        6.1.3 Benchmarking Tool
    6.2 Algorithms
        6.2.1 DFT
        6.2.2 Recursive Radix-2
        6.2.3 Iterative Radix-2
    6.3 Modelling
        6.3.1 FFT
        6.3.2 User Interface
        6.3.3 Running Tests
        6.3.4 Validating Results
        6.3.5 Representing Results
    6.4 Testing
7 Results
8 Conclusions
References

List of Figures

1 1KHz signal in the time domain[1]
2 1KHz signal in the frequency domain[1]
3 An example of how a wave is sampled to produce a digital representation[2]
4 An example of how Fourier transforms can be used to isolate background noise[3]
5 Bit rate of CD quality audio and size of a 3 minute CD audio file
6 File sizes for 3 minutes of MP3 audio at different bit rates
7 Comparison of the sizes of MP3 and CD audio file with comparable quality
8 The non-linear hearing threshold for humans[4]
9 Frequency masking in human hearing[4]
10 Diagrams to explain how circular motion can be represented[5]
11 Adding circular motions together to form a new signal[5]
12 Comparison of the effects of using rectangular and Hanning windowing[6]. The top signal in both images represents the original signal, the middle line represents the windowing function and the third line represents the signal after the windowing function has been applied
13 Frequency bin ranges with a transform length of N[7]
14 Graph illustrating the Hermitian Symmetries in a real valued signal[8]
15 Radix-2 butterfly diagram for an 8 point FFT[9]
16 Radix-4 butterfly diagram[10]
17 Frequency map for a telephone number pad. Each number generates a unique tone combining 2 frequencies[11]
18 Components on the logic board of a typical MP3 player[12]
19 Test suite structure
20 Space delimited parameters, which are stored in a file and used to generate input
21 Using the Core Generator to generate FFT processors
22 Three software classes used to manage and model FFTs
23 The ProgramPanel class and three examples of concrete subclasses
24 The Observer pattern: ValidationObserver subscribes to updates from RunValidation, and whenever RunValidation posts an update via the notifyObservers() method this is passed on to RunValidation.
25 A view of the main classes involved with validating a FFT
26 An overview of the application running, showing the validation harness code pane
27 A pie chart displays the number of test passes/failures
28 Graphs obtained using an input with a small transform length
29 Running times of algorithms


List of Algorithms

1 DFT[13]
2 Recursive Radix-2[14]
3 Iterative Radix-2[15]


1 Introduction

Fourier transforms are important to modern society. They are widely used in a diverse range of fields including audio codecs, image processing, identifying geological events and even making sense of satellite imaging data.

Fourier transforms are used to convert between different representations of a signal. Specifically, a Fourier transform converts a time/space domain representation of a wave to a frequency domain representation, whilst an Inverse Fourier transform performs the opposite transformation. This section of the report will explain some basic terminology associated with Fourier transforms before Chapter 2 details exactly how Fourier transforms are used in some of the fields mentioned above.

Chapter 3 reveals some of the relevant inner workings of Fourier transforms, focussing in particular on a version of the Fourier transform which is designed to work with discrete values: the Discrete Fourier Transform (DFT). Chapter 4 then moves on to detail numerous Fast Fourier Transform (FFT) algorithms which are less computationally expensive than the DFT but are still able to compute the same results.

Initially the project goal was to design, implement, verify and benchmark a Fast Fourier Transform hardware unit. Chapter 5 details the work done whilst attempting to achieve this. Unfortunately the project proved to be too difficult and an alternative project based on Fourier transforms was started late on.

The alternative project was to design and implement software tools for verifying and benchmarking FFT algorithms, and to code several FFT algorithms for use with the software. Design and implementation details of the application and algorithms are given in Chapter 6, whilst some results obtained using the algorithms with the software are presented in Chapter 7.

1.1 Representing a Wave

When representing a wave, it is convenient to do so graphically. Two domains used when measuring a wave are time and frequency. The independent variable is therefore either time (measured, for example, in milliseconds) or frequency (measured, for example, in KHz). The dependent variable in waveform graphs almost always represents the displacement of a wave from its mean value, otherwise known as the amplitude of a wave. Amplitude is usually measured in decibels (air pressure) for analogue sound and voltage levels for digital sound which could, for example, have been captured by a microphone. The units of the amplitude differ depending on what is being measured[1].

1.1.1 Time Domain

A wave can be represented by measuring amplitude with respect to time. In this representation, the amplitude is continuously plotted as time progresses, as illustrated in Figure 1.

Figure 1: 1KHz signal in the time domain[1]


1.1.2 Frequency Domain

A complex wave can be seen to be made up of many simpler waves summed together. The theory supporting this concept is explored in more detail in Section 3.1, but to understand the context of the FFT it is sufficient to know what the Fourier transform does rather than how it does it. In the frequency domain representation, the amount which various frequencies contribute to the overall signal is measured. This is called a frequency spectrum, and an example is shown in Figure 2.

Figure 2: 1KHz signal in the frequency domain[1]

1.2 Digital Representations

So far, it has been assumed that signals are analogue. However, digital devices must use discrete representations. Consequently it is necessary to convert information to a digital form¹. The basic premise is that an analogue signal’s amplitude is measured at regular intervals in time. Each measurement is called a sample. For each sample, the amplitude is recorded and rounded to the nearest value which can be represented using a given number of bits[16].

Figure 3: An example of how a wave is sampled to produce a digital representation[2]

There are various issues to consider when sampling a wave. For example, Nyquist’s theorem states that the sampling frequency should be at least twice as high as the maximum frequency contained within the signal to prevent aliasing²[17]. Choosing an appropriate number of bits and an appropriate way to use them is important. Using 16 bits provides 2^16 = 65536 unique values for measuring amplitude when these are evenly spaced (linear quantisation), whilst more sophisticated quantisation schemes allow for more precision with the same number of bits.

¹ This process is usually handled by an analogue-to-digital converter (ADC), whilst the reverse is handled by a digital-to-analogue converter (DAC)

² Aliasing occurs when a signal is not sampled frequently enough, and so the new representation corresponds to a completely different signal
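The sampling, quantisation and aliasing ideas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sample_cosine` and `quantise`, the 4KHz sampling frequency and the full-scale range of -1 to 1 are assumptions made for the example and do not come from the report.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Measure a cosine wave's amplitude at regular intervals in time."""
    return [amplitude * math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantise(value, bits, full_scale=1.0):
    """Round a sample to the nearest of 2**bits evenly spaced levels
    covering the range -full_scale .. +full_scale (linear quantisation)."""
    levels = 2 ** bits
    step = 2 * full_scale / (levels - 1)
    index = round((value + full_scale) / step)
    index = max(0, min(levels - 1, index))  # clip out-of-range input
    return index * step - full_scale

# Aliasing: with a 4 KHz sampling frequency only frequencies up to 2 KHz
# can be represented. A 3 KHz cosine then produces exactly the same
# samples as a 1 KHz cosine, so the two cannot be told apart.
low = sample_cosine(1000, 4000, 8)
alias = sample_cosine(3000, 4000, 8)
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(low, alias))

# Quantising the sampled wave with 3 bits leaves at most 8 distinct values.
coarse = [quantise(s, bits=3) for s in low]
```

Increasing `bits` shrinks the spacing between levels, which is why the choice of bit depth directly affects how closely the digital representation follows the analogue signal.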

These issues affect the usefulness of data produced when applying a Fourier transform, but not the operation of the Fourier transform itself. Excellent resources for more detailed information on this subject are [7] and [18].

2 Context

Both the time domain and frequency domain are useful. It is convenient to use the time domain when converting information between analogue and digital forms. However, the frequency domain representation can provide useful information about a wave. It is often necessary to convert between domains to get the best of both representations, and Fourier transforms are often used to perform this conversion.

Fourier transforms are rarely used in isolation. Rather, they are one of many Digital Signal Processing (DSP) techniques used to manipulate signals in some useful way. In this section, a selection of uses for Fourier transforms in the fields of geology, astronomy and communications are shown before the use of Fourier transforms in audio codecs is explored in more detail.

2.1 Astronomy

Fourier transforms are commonly used in astronomy to identify unusual patterns contained within the large amount of satellite imaging data collected every day. It is impractical to analyse all of this data manually because only a very small part of the data will prove to be useful. Furthermore, the signals usually contain background noise. Fourier transforms allow astronomers to isolate background noise so that interesting information stands out. For a given frequency spectrum, frequencies with large amplitudes can often indicate that useful information is present, allowing astronomers to focus on it. A simple example is given in Figure 4.

Figure 4: An example of how Fourier transforms can be used to isolate background noise[3]


2.2 Geology

Fourier transforms have a number of uses in the field of Geology. One source[19] states that one of the initial uses of Fourier transforms in this field was to determine whether measurements corresponded to a natural seismic event or a nuclear test explosion. As these produce different frequency spectra, it is possible to determine what type of event has occurred by using a Fourier transform and comparing the frequency spectrum of an event against the spectra of known nuclear and geological events. Many further uses are detailed by De Paor[20].

2.3 Communications

Communications standards including mobile WiMAX (IEEE 802.16e) and various wireless LAN standards (IEEE 802.11a, g, n and ac) use orthogonal frequency-division multiplexing (OFDM) to achieve their performance goals[21, 22]. OFDM is a specialised form of frequency-division multiplexing (FDM)³ which ensures that subchannels are independent and do not interfere with each other[23]. Fourier transforms are used to modulate and demodulate signals in networks which use OFDM.

Another use of Fourier transforms in communications concerns dialling tones. When dialling a telephone number, each key on the keypad corresponds to a particular tone. By applying a Fourier transform and finding the key whose frequency spectrum is most similar to the received frequency spectrum, the receiver is able to determine which keys have been pressed[24].
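A minimal Python sketch of this idea is given below. It assumes the standard dual-tone keypad layout, in which each key combines one row frequency and one column frequency, an 8KHz sampling frequency and a 0.1 second window. As a simplification of comparing whole spectra, it only measures the strength of the seven candidate frequencies and picks the strongest row and column. All function names are hypothetical.

```python
import cmath
import math

ROW_FREQS_HZ = [697, 770, 852, 941]   # one frequency per keypad row
COL_FREQS_HZ = [1209, 1336, 1477]     # one frequency per keypad column
KEYPAD = ["123", "456", "789", "*0#"]

def key_frequencies(key):
    """Return the (row, column) frequency pair assigned to a key."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return ROW_FREQS_HZ[row_index], COL_FREQS_HZ[col_index]
    raise ValueError(f"unknown key: {key!r}")

def make_tone(key, sample_rate_hz=8000, n_samples=800):
    """Generate the samples a receiver would see when a key is pressed."""
    f_row, f_col = key_frequencies(key)
    return [math.sin(2 * math.pi * f_row * n / sample_rate_hz)
            + math.sin(2 * math.pi * f_col * n / sample_rate_hz)
            for n in range(n_samples)]

def strength(samples, freq_hz, sample_rate_hz):
    """Magnitude of the signal's Fourier sum evaluated at one frequency."""
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
                for n, x in enumerate(samples))
    return abs(total)

def detect_key(samples, sample_rate_hz=8000):
    """Pick the key whose row and column frequencies are strongest."""
    row = max(range(len(ROW_FREQS_HZ)),
              key=lambda r: strength(samples, ROW_FREQS_HZ[r], sample_rate_hz))
    col = max(range(len(COL_FREQS_HZ)),
              key=lambda c: strength(samples, COL_FREQS_HZ[c], sample_rate_hz))
    return KEYPAD[row][col]

detected = detect_key(make_tone("5"))
```

Evaluating only a handful of frequencies rather than the whole spectrum is the motivation behind the Goertzel algorithm listed in the contents as Section 4.2.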

2.4 Audio Codecs

Audio codecs play an important role in the digital world by drastically reducing the bit rate of audio files. Humans can hear frequencies as high as 20KHz, so the sampling frequency of audio should be at least 40KHz according to Nyquist’s theorem[17]. As shown in Figure 5, CD quality audio uses two channels, a sampling rate of 44.1KHz and 16 bits per sample. This results in a bit rate of 1411.2Kbits/s, so a 3 minute CD audio file would be 31.8MB in size.

CD BIT RATE     = 44100/s × 16 bits × 2 = 1411200 bits/s
                = 1411.2 Kbits/s

3 MIN FILE SIZE = 1411.2 Kbits/s × 180 s = 254016 Kbits
                ≈ 31.8 MB

Figure 5: Bit rate of CD quality audio and size of a 3 minute CD audio file

This is impractical for several reasons. For example, a lower bit rate is desirable when streaming audio over a network where bandwidth is a finite resource. Small file sizes are desirable when building a music library so that audio files do not occupy too much space. This is of particular importance for portable devices such as mobile phones where storage is limited.

Audio codecs aim to reduce the bit rate of audio files whilst preserving quality. One of the most commonly used audio codecs is MP3 and it uses Fourier transforms to help it achieve a reduction in bit rate.

³ FDM: A technique used to split the total bandwidth of a communications channel across multiple subchannels


2.4.1 Overview of MP3 Codec

MP3 is a lossy codec: that is, part of the bit rate reduction which MP3 achieves is due to information being discarded. This does not necessarily have a noticeable impact on the quality of the audio as far as human hearing is concerned, but the original wave gets fundamentally altered. There are several bit rates available, ranging from 96Kbits/s to 320Kbits/s. The amount of information which is discarded varies depending on the bit rate and, consequently, higher bit rates are able to more closely represent the original signal. Using several common MP3 bit rates, file sizes are given in Figure 6 for a 3 minute audio file to illustrate the varying amounts of compression which can be achieved.

3 MIN FILE SIZE @ 96KBITS/S  = 96 Kbits/s × 180 s  = 17280 Kbits ≈ 2.2 MB

3 MIN FILE SIZE @ 128KBITS/S = 128 Kbits/s × 180 s = 23040 Kbits ≈ 2.9 MB

3 MIN FILE SIZE @ 160KBITS/S = 160 Kbits/s × 180 s = 28800 Kbits ≈ 3.6 MB

3 MIN FILE SIZE @ 320KBITS/S = 320 Kbits/s × 180 s = 57600 Kbits ≈ 7.2 MB

Figure 6: File sizes for 3 minutes of MP3 audio at different bit rates

A bit rate of 128Kbits/s is roughly equivalent to radio quality audio, whilst 160Kbits/s is widely believed to be equivalent to CD quality audio[25]. As a result, a three minute MP3 version of a recording is almost 9 times smaller than the CD audio file with comparable quality, as shown in Figure 7.

MP3 (160KBIT/S)   = 3.6 MB
CD (1411.2KBIT/S) = 31.8 MB

31.8 MB / 3.6 MB = 8.83

Figure 7: Comparison of the sizes of MP3 and CD audio file with comparable quality
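The arithmetic behind Figures 5 to 7 can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch assumes the same conventions as the figures: 1 Kbit = 1000 bits and 1 MB = 8000 Kbits.

```python
def bit_rate_kbits(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed bit rate in Kbits/s (1 Kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def file_size_mb(bit_rate_kbits_per_s, duration_s):
    """File size in MB for a given bit rate and duration (8 bits per byte)."""
    return bit_rate_kbits_per_s * duration_s / 8 / 1000

DURATION_S = 180  # the 3 minute recording used in the figures

cd_rate = bit_rate_kbits(44100, 16, 2)        # CD audio: 2 channels, 16 bits
cd_size = file_size_mb(cd_rate, DURATION_S)

# MP3 file sizes for the bit rates listed in Figure 6
mp3_sizes = {rate: file_size_mb(rate, DURATION_S) for rate in (96, 128, 160, 320)}

# How many times smaller the 160 Kbits/s MP3 is than the CD audio file
ratio = cd_size / mp3_sizes[160]
```

Working from the unrounded sizes gives a ratio of roughly 8.8; the 8.83 in Figure 7 comes from dividing the rounded values 31.8 and 3.6.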


2.4.2 Psychoacoustic Modelling

Whilst some compression is achieved using lossless techniques such as run-length encoding⁴, a larger amount of space is saved by selectively discarding information about the signal. By using various properties of human hearing, psychoacoustic modelling techniques discard information which humans are least sensitive to in order to preserve the perceived quality of the audio when heard by a human.

Audio is often stored using a time domain representation to allow for easy conversion between analogue and digital forms so that it can be played back, whilst psychoacoustic modelling techniques typically require a frequency domain representation. The MP3 codec uses a Fourier transform before applying psychoacoustic modelling techniques and an inverse transform after the techniques have been applied. Two important psychoacoustic modelling techniques are detailed below.

2.4.3 Human Hearing Threshold

An average human can hear frequencies between 20Hz and 20KHz. However, the way in which humans are able to interpret sound is not linear: humans are more sensitive to some frequencies than others, and a graph illustrating this threshold is shown in Figure 8.

Figure 8: The non-linear hearing threshold for humans[4]

Discarding information which falls beneath this threshold allows for a reduction in file sizes without affecting the perceivable quality of the audio at all. In this case, it is clear that a frequency domain representation is required in order to apply the model.
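As a rough illustration of this thresholding idea, the Python sketch below drops every frequency component whose level lies beneath the threshold for its band. The function name and all of the numeric values are hypothetical placeholders chosen for the example; they are not read from Figure 8 and do not reflect how the MP3 codec actually stores its data.

```python
def discard_inaudible(levels_db, thresholds_db):
    """Keep a frequency component only if its level reaches the hearing
    threshold for its band; discarded components are marked with None."""
    if len(levels_db) != len(thresholds_db):
        raise ValueError("one threshold is needed per frequency band")
    return [level if level >= threshold else None
            for level, threshold in zip(levels_db, thresholds_db)]

# Made-up levels (dB) for four frequency bands and made-up thresholds.
levels = [35.0, 5.0, 12.0, 60.0]
thresholds = [20.0, 10.0, 8.0, 15.0]

kept = discard_inaudible(levels, thresholds)
discarded_count = kept.count(None)
```

Because each threshold applies to a frequency band rather than to a moment in time, the comparison only makes sense once the signal has been transformed into the frequency domain.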

⁴ Run-length encoding - Sequences of identical data values are stored as a data value and a count, which saves space when long sequences of identical data values are common (typically zeros)


2.4.4 Frequency Masking

Humans are not equally sensitive to all frequencies. Rather, humans are sensitive to groups of frequencies known as Bark bands. Importantly, groups of frequencies can interfere with adjacent frequencies. For example, Figure 9 shows that a 4KHz signal with amplitude 60dB is capable of masking out a 6KHz signal with amplitude 30dB and a 12KHz signal with amplitude just under 20dB such that the latter two signals cannot be heard at all.

Figure 9: Frequency masking in human hearing[4]

Frequency masking can be used to select which information to discard. By discarding information about frequencies which would be masked out by others it is possible to reduce file sizes significantly. Once again, this model clearly requires a frequency domain representation and a Fourier transform.

So far, it has been assumed that only information which humans would not hear is discarded. This is largely true of the 320Kbit/s MP3 codec, but more information is thrown away when lower bit rates are used. Psychoacoustic models can be used to discard information which humans would be least sensitive to, such as information about frequencies with amplitudes just above the hearing threshold.


3 Fourier Transforms

Fourier transforms exploit the fact that any signal can be formed by combining simple repeating cycles together. A Fourier transform takes a time domain representation of a wave as input, breaks it down into separate repeating cycles and calculates the amount of each repeating cycle that is present. This representation is referred to as a frequency spectrum.

In this section, circular motion and continuous Fourier transforms are explained. However, digital devices must work with discrete data. Consequently, the Discrete Fourier Transform (DFT) is explained as a starting point for calculating Fourier transforms on digital devices.

3.1 Circular Motions

Circular motions, or repeating cycles, are simple signals. These simple signals maintain a constant amplitude, phase angle and frequency, and any signal can be formed by combining circular motions.

3.1.1 Representing Circular Motions

As depicted in Figure 10a, a repeating cycle can be represented using a circular path with the radius representing the amplitude of a signal, the starting point of the circle representing the phase angle of the signal and the number of times per second which the circle is drawn representing the frequency[5].

At any point in time the state of a repeating cycle can be represented by describing the cycle’s position on a circle’s circumference. This is done using complex numbers⁵, as depicted in Figure 10b. The position is calculated using basic trigonometry, as visualised by drawing a right angled triangle between the centre of the circle and the position on the circumference.

(a) Illustration of a circular path (b) Representing a position on the path

Figure 10: Diagrams to explain how circular motion can be represented[5]

The Euler formula (shown in Equation 1) is a conventional way to write a circular motion position concisely. A proof is beyond the scope of the report, but a good resource is available at [27]. When x = π, a useful special case of the Euler formula is obtained. As shown in Equation 2, the cosine term equates to -1 and the sine term equates to 0. More generally, the sine term equates to 0 whenever x is an integer multiple of π, so that $e^{in\pi} = (-1)^n$ for any integer n.

$$e^{ix} = \cos(x) + i\sin(x) \tag{1}$$

$$e^{i\pi} = -1 \tag{2}$$
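Equations 1 and 2 can be checked numerically, and the same formula gives the position of a repeating cycle on its circular path at any time. The sketch below uses Python's standard `cmath` module; the helper name `circle_position` and the example values are assumptions made for illustration.

```python
import cmath
import math

def circle_position(amplitude, frequency_hz, phase_rad, t_seconds):
    """Position of a repeating cycle on its circular path at time t,
    written as a complex number using the Euler formula."""
    angle = 2 * math.pi * frequency_hz * t_seconds + phase_rad
    return amplitude * cmath.exp(1j * angle)

# Equation 1: e^(ix) has real part cos(x) and imaginary part sin(x).
x = 0.75
lhs = cmath.exp(1j * x)
rhs = complex(math.cos(x), math.sin(x))

# Equation 2: e^(i*pi) equals -1, up to floating point rounding.
special_case = cmath.exp(1j * math.pi)

# A 1 KHz cycle with amplitude 1 and zero phase is a quarter of the way
# round its circle after 0.25 ms, so its position is approximately i.
quarter_turn = circle_position(1.0, 1000.0, 0.0, 0.00025)
```

The real part of the returned position is the cosine component and the imaginary part is the sine component, matching the right angled triangle described above.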

⁵ A complex number has a real component and an imaginary component and is commonly written in the form z = x + yi, where x and y are real numbers and i² = −1. [26] is a good resource for more information


3.1.2 Combining Circular Motions

By combining circular motions, any signal can be formed. Figure 11a shows a 1KHz signal with amplitude 1 and a 2KHz signal with amplitude 2. The signal shown in Figure 11b is clearly the result of adding these two waves together.

(a) Two repeating cycles individually (b) Two repeated cycles added together

Figure 11: Adding circular motions together to form a new signal[5]

This process can be used to build any signal. A Fourier transform reverses this process: instead of building a complex signal from simple circular motions, it takes a complex signal and attempts to break it down into the simple circular motions from which it is formed.

3.2 Continuous Fourier Transform

A proof of the mathematics behind Fourier transform formulae is beyond the scope of the report. However, the continuous Fourier transform forms the basis of discrete versions of the algorithm. The convention adopted for this report uses k to denote frequency, x to denote time, F(k) to denote the amplitude of frequency k and f(x) to denote the amplitude at time x. Equations 3 and 4 show the equations for the forward Fourier transform and inverse Fourier transform respectively[5]. As the maths supporting both transforms is quite similar, only forward Fourier transforms are analysed in the rest of this report.

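$$\overbrace{F(k)}^{\text{Amplitude of frequency } k} = \overbrace{\int_{-\infty}^{\infty} f(x)\, e^{-2\pi i k x}\, dx}^{\text{Sum of contributions from all samples}} \tag{3}$$

$$\overbrace{f(x)}^{\text{Amplitude at time } x} = \overbrace{\int_{-\infty}^{\infty} F(k)\, e^{2\pi i k x}\, dk}^{\text{Sum of contributions from all frequencies}} \tag{4}$$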

3.3 Discrete Fourier Transform

The DFT calculates the result of the continuous Fourier transform in a manner which can be implemented by digital devices. Both the time between samples and the amplitude of a given sample are discrete values with the DFT. The DFT takes N complex time domain samples x0, x1, x2, ..., xN−1 as input and produces N frequency bins X0, X1, X2, ..., XN−1 as output. The DFT operates on a window of time domain samples, producing a frequency spectrum for the signal over a particular window of time. So far in the report, signals have largely been treated as periodic⁶. In reality, many signals change over time.

Speech and music signals are good examples of this. Whilst these signals do change over the course of a second, they do not change significantly over very short periods of time. By choosing an appropriate sampling frequency νs and transform length N it is possible to provide input to the DFT which changes very little within the window.
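To make the input and output of the DFT concrete, the following Python sketch applies a direct DFT to one window of samples. It assumes the standard definition X_k = Σ x_n e^(−2πikn/N), summed over n = 0 ... N−1, which uses the same sign convention as Equation 3. The function name `dft`, the 8KHz sampling frequency and the transform length of 8 are choices made purely for the example.

```python
import cmath
import math

def dft(samples):
    """Direct DFT: every output bin sums a contribution from every
    input sample, so N bins cost N * N complex multiplications."""
    n_total = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / n_total)
                for n, x in enumerate(samples))
            for k in range(n_total)]

SAMPLE_RATE_HZ = 8000
N = 8

# One window of the signal from Figure 11: a 1 KHz component with
# amplitude 1 added to a 2 KHz component with amplitude 2. Within this
# window the components complete exactly 1 and 2 cycles respectively.
signal = [1.0 * math.cos(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ)
          + 2.0 * math.cos(2 * math.pi * 2000 * n / SAMPLE_RATE_HZ)
          for n in range(N)]

spectrum = dft(signal)
magnitudes = [abs(value) for value in spectrum]
```

In this example a component that completes k whole cycles within the window appears in bin k with magnitude N/2 times its amplitude, so bins 1 and 2 hold 4 and 8. The same values are mirrored in bins 7 and 6 because the input is real valued, and the remaining bins are zero up to rounding error.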

⁶ A periodic signal is one that repeats at regular intervals of time, such as a sinusoid


Rectangular windows treat all input samples equally but more complicated windowing schemes can be used to scale samples near the start and end of a window. By scaling samples near the edges of a window the effects of a signal changing over time, otherwise known as leakage, can be mitigated. More detailed explanations of windowing functions and their effects on Fourier transforms are given by references [28, 29], whilst diagrams to illustrate the difference between two windowing functions can be found in Figure 12. Note that in Figure 12a the signal is unaltered during the period of a rectangular window and values outside this window are zeroed, whilst in Figure 12b the signal is scaled towards the start and end of the Hanning window.

(a) Rectangular window (b) Hanning window

Figure 12: Comparison of the effects of using rectangular and Hanning windowing[6]. The top signal in both images represents the original signal, the middle line represents the windowing function and the third line represents the signal after the windowing function has been applied
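The two windows compared in Figure 12 can be sketched in a few lines of Python. This is only an illustration: the function names and the test signal are assumptions made for the example, and the Hanning coefficients use the common definition $w[n] = 0.5\,(1 - \cos(2\pi n / (N - 1)))$.

```python
import math

def rectangular_window(length):
    """Weight every sample in the window equally."""
    return [1.0] * length

def hann_window(length):
    """Taper towards zero at both edges of the window."""
    if length == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
            for n in range(length)]

def apply_window(samples, window):
    """Scale each sample by the matching window coefficient."""
    return [s * w for s, w in zip(samples, window)]

# Hypothetical test signal: a sinusoid which does not complete a whole
# number of cycles inside the 16 sample window.
signal = [math.sin(2.0 * math.pi * 3.3 * n / 16) for n in range(16)]
unchanged = apply_window(signal, rectangular_window(len(signal)))
tapered = apply_window(signal, hann_window(len(signal)))
```

With the rectangular window the samples pass through unaltered, whereas the Hanning window forces the first and last samples towards zero.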

Each frequency bin corresponds to a range of frequencies. The width of this range is referred to as the resolution (or bin width) $\nu_r$. The relationship between the sampling frequency and transform length is shown in equation 5[30]. Larger transform lengths result in a smaller bin width for each frequency bin, and this can be useful in some cases. For example, frequencies separated by a small value may be of particular interest when applying a Fourier transform. With a large bin width they may fall into the same frequency bin, preventing any useful information about the individual frequencies from being extracted. A smaller bin width may allow these frequencies to be distinguished individually.

$$\nu_r = \frac{\nu_s}{N} = \frac{2\,\nu_{\text{Nyquist}}}{N} \tag{5}$$

Using smaller bin widths can be a good idea, but if the sampling frequency remains constant then the better resolution comes at the expense of time. Firstly, a signal may change rapidly over time and a large transform length may introduce excessive leakage as the signal changes, despite offering what appears to be a superior resolution. Furthermore, having a larger transform length means that the Fourier transform cannot be performed until all of the input samples are ready, which can result in a longer delay between transforms. The transform length and sampling frequency are important factors when using Fourier transforms and need to be chosen carefully according to the signals they will be applied to.


$X_0$ contains frequencies in the range $0 \le \nu < \nu_r$, whilst $X_1$ contains frequencies in the range $\nu_r \le \nu < 2\nu_r$. This pattern repeats for the rest of the frequency bins, as shown in Figure 13. As the maximum frequency which can be represented is the Nyquist frequency, the output samples of interest are $X_0, X_1 \ldots X_{N/2}$. Values in higher frequency bins are not strictly necessary for most purposes because they are above the maximum frequency which can be represented. Some Fourier transforms choose to filter these results out or simply not calculate them at all for this reason.

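$X_0:\ 0 \le \nu < \nu_r$

$X_1:\ \nu_r \le \nu < 2\nu_r$

$X_2:\ 2\nu_r \le \nu < 3\nu_r$

$\ldots$

$X_{\frac{N}{2}-1}:\ \left(\frac{N}{2}-1\right)\nu_r \le \nu < \nu_{\text{Nyquist}}$

$\ldots$

$X_{N-1}:\ (N-1)\,\nu_r \le \nu < \nu_s$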

Figure 13: Frequency bin ranges with a transform length of N[7]
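The trade-off between resolution and window duration can be made concrete with a short calculation. The sketch below assumes the bin width is the sampling frequency divided by the transform length, so that $N/2$ bins span the range from 0 to the Nyquist frequency as in Figure 13. The 44100 Hz sampling frequency and the function names are hypothetical values chosen for illustration.

```python
def bin_width_hz(sampling_hz, transform_length):
    """Width of each frequency bin, assuming N bins span 0 to the sampling frequency."""
    return sampling_hz / transform_length

def window_duration_s(sampling_hz, transform_length):
    """Time needed to collect one full window of samples."""
    return transform_length / sampling_hz

def bin_range_hz(k, sampling_hz, transform_length):
    """Lower (inclusive) and upper (exclusive) frequency covered by bin k."""
    width = bin_width_hz(sampling_hz, transform_length)
    return (k * width, (k + 1) * width)

# Hypothetical example: audio sampled at 44100 Hz.
for length in (256, 1024, 4096):
    print(length,
          round(bin_width_hz(44100, length), 2), "Hz per bin,",
          round(window_duration_s(44100, length) * 1000, 2), "ms per window")
```

Each quadrupling of the transform length makes the bins four times narrower, but the window of samples takes four times as long to collect.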

The forward DFT can be written as presented in Equation 6[31]. This equation is relatively simple to implement in digital devices.

$$X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i k n}{N}}, \qquad 0 \le k \le N-1 \tag{6}$$
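Equation 6 translates almost directly into code. The following is a minimal sketch in Python rather than the implementation used in this project, and the function name `dft` is chosen only for illustration.

```python
import cmath

def dft(samples):
    """Direct evaluation of equation 6: one sum over all samples per frequency bin."""
    n_total = len(samples)
    spectrum = []
    for k in range(n_total):            # one pass per frequency bin X[k]
        total = 0j
        for n in range(n_total):        # contribution of every sample x[n]
            total += samples[n] * cmath.exp(-2j * cmath.pi * k * n / n_total)
        spectrum.append(total)
    return spectrum

# A constant signal only contains the zero frequency component.
print(dft([1, 1, 1, 1]))
```

The two nested loops over $N$ are the source of the cost discussed in the Complexity section.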

The DFT computes the relative strength of repeating cycles within the data. In many cases the input to a DFT will be entirely real, with an imaginary component of zero. If every $x_n$ is real valued, then the relation displayed in Equation 7⁷ holds true because of Hermitian symmetry[32].

$$X[N-k] = \overline{X[k]} \tag{7}$$

Consequently, $X[0]$ contains a real value when the input data ($x_0 \ldots x_{N-1}$) is also real. Furthermore, two peaks occur for all other values in a frequency spectrum, resulting in symmetry between elements $X[1], X[2] \ldots X[N-1]$. This is clearly illustrated in Figure 14 where 50 samples are provided as input to the DFT. The blue line represents imaginary coefficients, the red line represents real coefficients and the green line is the complex modulus⁸.

Figure 14: Graph illustrating the Hermitian Symmetries in a real valued signal[8]
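The symmetry can also be checked numerically. The sketch below uses a direct DFT written for this example and an arbitrary real valued input (both are assumptions for illustration) and measures how far each $X[N-k]$ is from the complex conjugate of $X[k]$.

```python
import cmath

def dft(samples):
    """Direct DFT, used here only to produce a spectrum to inspect."""
    n_total = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / n_total)
                for n in range(n_total))
            for k in range(n_total)]

# Arbitrary real valued input chosen for the example.
real_input = [0.5, 2.0, -1.0, 3.0, 0.0, -2.5, 1.5, 4.0]
spectrum = dft(real_input)
n_total = len(real_input)

# Largest difference between X[N-k] and the conjugate of X[k].
max_error = max(abs(spectrum[n_total - k] - spectrum[k].conjugate())
                for k in range(1, n_total))
print(max_error)
```

The reported error should be at the level of floating point rounding, and the imaginary part of `spectrum[0]` should be zero to the same precision.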

⁷ Every value of $X$ is a complex number. $\overline{z}$ denotes the complex conjugate of $z$ (where $z = x + yi$ and $\overline{z} = x - yi$).

⁸ The complex modulus of $z = x + yi$ is $|z| = \sqrt{x^2 + y^2}$. A complex number and its conjugate have the same modulus.


3.3.1 Complexity

Whilst it is relatively simple to write code for, the DFT is computationally expensive. For each $X[k]$, a sum of $N$ terms is computed. As there are $N$ frequency bins to calculate, the complexity⁹ of the DFT is $O(N^2)$. This means that the algorithm scales poorly as $N$ grows. The computational complexity makes it a poor choice for real time applications and for mobile devices, where battery power is finite.

4 FFT Algorithms

Because the DFT is computationally expensive, faster methods of calculating the same result have been discovered. The term FFT does not refer to one specific equation or algorithm. Instead, it refers to a family of algorithms which are able to efficiently compute the result of the DFT. Some of these algorithms compute exactly the same solution in all cases, such as the Cooley-Tukey algorithm. Some algorithms estimate the result, and others work in special cases. Two approaches are detailed in this report: the Cooley-Tukey algorithm and the Goertzel algorithm.

4.1 Cooley-Tukey Algorithm

The Cooley-Tukey algorithm, detailed in a report first published in 1965[33], computes the same result as a DFT but is far more computationally efficient. It exploits the Danielson-Lanczos lemma[34], which recursively splits one DFT into smaller DFTs. The Cooley-Tukey algorithm takes this further and allows for a number of different ways to solve one DFT by splitting it into smaller parts. Two common examples are Radix-2 and Radix-4, as detailed in this section.

4.1.1 Radix 2

The Radix-2 algorithm is a commonly used version of the general Cooley-Tukey algorithm. It recursively splits a DFT calculation of size $N$ into two DFTs of size $\frac{N}{2}$ (it follows that the transform length must be a power of two in order for this to work). Radix based algorithms exploit symmetry found in the DFT, and different bases work effectively for different sizes of input. The Radix-2 approach requires the input size to be a power of 2, whilst Radix-4 requires the input size to be a power of 4. One part of the symmetry exploited by Radix-2 is shown in equations 8 to 12[31].

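$$
\begin{aligned}
X[N+k] &= \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i n (N+k)}{N}} && (8) \\
&= \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i k n}{N}} \cdot e^{-\frac{2\pi i n N}{N}} && (9) \\
&= \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i k n}{N}} \cdot e^{-2\pi i n}, \qquad e^{-2\pi i n} = 1 && (10) \\
&= \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i k n}{N}} && (11) \\
&= X[k] && (12)
\end{aligned}
$$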

⁹ Complexity refers to the rate at which the difficulty of solving a problem increases as the input increases. Constant coefficients do not matter: for example, two functions which require $N^2$ and $3N^2$ operations respectively are in the same complexity class. Radix based FFT algorithms all generally belong to the same complexity class, $O(N \log N)$.


By extension, the following symmetry property holds:

$$X[k + aN] = X[k], \qquad a \in \mathbb{N} \tag{13}$$

Recall the DFT equation originally written in equation 6:

$$X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i k n}{N}}$$

This equation can be rewritten as the sum of even and odd components. After being rearranged in this way the complexity remains $O(N^2)$, because both the even and odd components require $\frac{N}{2} \times N$ computations. Equations 14 to 17 show how it can be rewritten, resulting in a DFT of the even indexes $E_k$ and a DFT of the odd indexes $O_k$.

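$$
\begin{aligned}
X[k] &= \overbrace{\sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{2\pi i k (2m)}{N}}}^{\text{Even indexes}} + \overbrace{\sum_{m=0}^{\frac{N}{2}-1} x[2m+1] \cdot e^{-\frac{2\pi i k (2m+1)}{N}}}^{\text{Odd indexes}} && (14) \\
&= \sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{4\pi i k m}{N}} + \sum_{m=0}^{\frac{N}{2}-1} x[2m+1] \cdot e^{-\frac{4\pi i k m}{N}} \cdot e^{-\frac{2\pi i k}{N}} && (15) \\
&= \sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{2\pi i k m}{N/2}} + e^{-\frac{2\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{2}-1} x[2m+1] \cdot e^{-\frac{2\pi i k m}{N/2}} && (16) \\
&= E_k + e^{-\frac{2\pi i k}{N}} \cdot O_k && (17)
\end{aligned}
$$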

The term which each odd DFT is multiplied by is referred to as the twiddle factor. An interesting property of the twiddle factor is that some symmetry is also present when $0 \le k < \frac{N}{2}$, as shown in equations 18 to 21.

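$$
\begin{aligned}
e^{-\frac{2\pi i \left(k + \frac{N}{2}\right)}{N}} &= e^{-\frac{2\pi i k}{N}} \cdot e^{-\frac{2\pi i N}{2N}} && (18) \\
&= e^{-\frac{2\pi i k}{N}} \cdot e^{-\pi i} && (19) \\
&= e^{-\frac{2\pi i k}{N}} \cdot e^{-\pi i}, \qquad e^{-\pi i} = -1 && (20) \\
&= -e^{-\frac{2\pi i k}{N}} && (21)
\end{aligned}
$$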

$E_k$ and $O_k$ are themselves DFTs of length $\frac{N}{2}$, so by equation 13 they repeat with period $\frac{N}{2}$. For $0 \le k < \frac{N}{2}$ this gives equations 22 and 23.

$$E_{k+\frac{N}{2}} = E_k \tag{22}$$

$$O_{k+\frac{N}{2}} = O_k \tag{23}$$


To exploit these symmetries the DFT can be rewritten as two separate equations, 24 and 25, which are true for $0 \le k < \frac{N}{2}$.

$$X[k] = E_k + e^{-\frac{2\pi i k}{N}} \cdot O_k \tag{24}$$

$$X\!\left[k + \tfrac{N}{2}\right] = E_k - e^{-\frac{2\pi i k}{N}} \cdot O_k \tag{25}$$

The classic Radix-2 algorithm recursively expresses an $N$ point DFT as the sum of two $\frac{N}{2}$ sized DFTs[9], which is a simple operation¹⁰. By reusing values which have already been calculated the number of operations required can be reduced drastically. This process is often referred to as a butterfly, and an illustration of a simple butterfly diagram is given in Figure 15. In this diagram, an 8 point DFT is split into two 4 point DFTs, and the results are combined at the end. The number of complex multiplications required at each stage is of the order of $N$, and the number of stages is $\log_2 N$. This gives a complexity of $O(N \log_2 N)$, representing a significant improvement on the complexity of the DFT.

Figure 15: Radix-2 butterfly diagram for an 8 point FFT[9]
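The recursion described by equations 24 and 25 can be sketched as follows. This is an illustrative recursive version written for clarity rather than speed, and the function name `fft_radix2` is an assumption made for the example.

```python
import cmath

def fft_radix2(samples):
    """Recursive Radix-2 FFT; the transform length must be a power of two."""
    n_total = len(samples)
    if n_total == 1:
        return list(samples)                 # base case: a 1 point DFT is the sample itself
    if n_total == 0 or n_total % 2:
        raise ValueError("transform length must be a power of two")
    even = fft_radix2(samples[0::2])         # E_k: DFT of the even indexes
    odd = fft_radix2(samples[1::2])          # O_k: DFT of the odd indexes
    half = n_total // 2
    spectrum = [0j] * n_total
    for k in range(half):
        twiddled = cmath.exp(-2j * cmath.pi * k / n_total) * odd[k]
        spectrum[k] = even[k] + twiddled          # equation 24
        spectrum[k + half] = even[k] - twiddled   # equation 25
    return spectrum

print(fft_radix2([1, 0, 0, 0, 0, 0, 0, 0]))
```

Each product of a twiddle factor and $O_k$ is computed once and used for two output bins, which is the reuse that the butterfly diagram depicts.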

The way that the $\frac{N}{2}$-point DFTs are computed is not important. They could resolve the DFT directly at this point, they could continue to split the DFT recursively into two $\frac{N}{2}$ DFTs until a base case is reached, or they could use entirely different algorithms. In some cases combining different algorithms to compute a Fourier transform can be useful. A split-radix algorithm using Radix-2 and Radix-4 is detailed later on, and this is widely believed to be one of the fastest FFTs[35].

¹⁰ A Radix-M algorithm, with input size $N$ (where $N$ is a power of $M$), follows this pattern: it recursively splits the DFT into $M$ DFTs of size $\frac{N}{M}$. As $M$ increases, the design complexity increases drastically whilst the speed increase gained by using a higher radix decreases rapidly.


4.1.2 Radix 4

To illustrate how other Radix algorithms can be derived from the generic DFT algorithm, a brief explanation of the Radix-4 case is presented here. This algorithm works well when the size of the input is a power of four and offers up to a 20% speed increase over the Radix-2 case for such inputs[36]. Increasing the radix above four starts to yield diminishing returns in speed at a large complexity cost: after examining the Radix-4 case it should be clear that it is more complex than the Radix-2 case.

This algorithm splits an $N$ point DFT into four $\frac{N}{4}$ DFTs by using indexes $4m$, $4m+1$, $4m+2$ and $4m+3$ instead of $2m$ and $2m+1$, as shown in equations 26 to 28.

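$$
\begin{aligned}
X[k] &= \sum_{m=0}^{\frac{N}{4}-1} x[4m] \cdot e^{-\frac{2\pi i k (4m)}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+1] \cdot e^{-\frac{2\pi i k (4m+1)}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+2] \cdot e^{-\frac{2\pi i k (4m+2)}{N}} \\
&\quad + \sum_{m=0}^{\frac{N}{4}-1} x[4m+3] \cdot e^{-\frac{2\pi i k (4m+3)}{N}} && (26) \\
&= \sum_{m=0}^{\frac{N}{4}-1} x[4m] \cdot e^{-\frac{8\pi i k m}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+1] \cdot e^{-\frac{8\pi i k m}{N}} \cdot e^{-\frac{2\pi i k}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+2] \cdot e^{-\frac{8\pi i k m}{N}} \cdot e^{-\frac{4\pi i k}{N}} \\
&\quad + \sum_{m=0}^{\frac{N}{4}-1} x[4m+3] \cdot e^{-\frac{8\pi i k m}{N}} \cdot e^{-\frac{6\pi i k}{N}} && (27) \\
&= \sum_{m=0}^{\frac{N}{4}-1} x[4m] \cdot e^{-\frac{2\pi i k m}{N/4}} + e^{-\frac{2\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{4}-1} x[4m+1] \cdot e^{-\frac{2\pi i k m}{N/4}} + e^{-\frac{4\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{4}-1} x[4m+2] \cdot e^{-\frac{2\pi i k m}{N/4}} \\
&\quad + e^{-\frac{6\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{4}-1} x[4m+3] \cdot e^{-\frac{2\pi i k m}{N/4}} && (28)
\end{aligned}
$$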

Figure 16 shows how the butterfly may look for a Radix-4 algorithm. Whilst this is considerably more complex than a Radix-2 butterfly it offers up to 20% more performance. Consequently the speed boost is often worth the added design complexity.

Figure 16: Radix-4 butterfly diagram[10]
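A recursive sketch of the Radix-4 decomposition in equation 28 is given below. It is an illustration only, with `fft_radix4` as a hypothetical name. It relies on one extra step not written out above: each of the four $\frac{N}{4}$ point DFTs repeats with period $\frac{N}{4}$, so the output bin $k + q\frac{N}{4}$ reuses the same four products, with the product for offset $r$ picking up an additional factor of $(-i)^{rq}$.

```python
import cmath

def fft_radix4(samples):
    """Recursive Radix-4 FFT; the transform length must be a power of four."""
    n_total = len(samples)
    if n_total == 1:
        return list(samples)
    if n_total == 0 or n_total % 4:
        raise ValueError("transform length must be a power of four")
    quarter = n_total // 4
    # Four N/4 point DFTs over the indexes 4m, 4m+1, 4m+2 and 4m+3.
    parts = [fft_radix4(samples[r::4]) for r in range(4)]
    spectrum = [0j] * n_total
    for k in range(quarter):
        # The four products which appear in equation 28 for this k.
        terms = [cmath.exp(-2j * cmath.pi * r * k / n_total) * parts[r][k]
                 for r in range(4)]
        for q in range(4):
            # Output bin k + q*N/4: term r gains the factor (-i)^(r*q).
            spectrum[k + q * quarter] = sum(terms[r] * (-1j) ** (r * q)
                                            for r in range(4))
    return spectrum

print(fft_radix4([1, 0, 0, 0]))
```

The additional factors are only ever $1$, $-1$, $i$ or $-i$, which in hardware amount to sign changes and swaps of the real and imaginary parts rather than full multiplications.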


4.1.3 Split Radix Algorithms

Split radix algorithms can provide an even more efficient implementation, with one of the lowest recorded operation counts of any Fourier transform[35]. The classic split-radix algorithm splits a DFT of length $N$ into a DFT of the even indexes and two DFTs of the odd indexes: that is, the even indexes are calculated using a Radix-2 approach whilst the odd indexes are calculated using a Radix-4 approach. Equations 29 to 31 show how the split radix algorithm is formed.

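$$
\begin{aligned}
X[k] &= \sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{2\pi i k (2m)}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+1] \cdot e^{-\frac{2\pi i k (4m+1)}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+3] \cdot e^{-\frac{2\pi i k (4m+3)}{N}} && (29) \\
&= \sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{4\pi i k m}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+1] \cdot e^{-\frac{8\pi i k m}{N}} \cdot e^{-\frac{2\pi i k}{N}} + \sum_{m=0}^{\frac{N}{4}-1} x[4m+3] \cdot e^{-\frac{8\pi i k m}{N}} \cdot e^{-\frac{6\pi i k}{N}} && (30) \\
&= \sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{2\pi i k m}{N/2}} + e^{-\frac{2\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{4}-1} x[4m+1] \cdot e^{-\frac{2\pi i k m}{N/4}} + e^{-\frac{6\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{4}-1} x[4m+3] \cdot e^{-\frac{2\pi i k m}{N/4}} && (31)
\end{aligned}
$$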

Some further improvements have been made to this approach, and both the original split radix algorithm and the improved version are detailed by Johnson and Frigo[35].
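The classic split-radix decomposition of equation 31 can be sketched recursively as below. This is an illustration of the basic form only, not the improved version from [35], and the name `fft_split_radix` is hypothetical. The four output bins computed per loop iteration follow from the periodicity of the three smaller DFTs, in the same way as equations 24 and 25 follow from equation 17.

```python
import cmath

def fft_split_radix(samples):
    """Recursive split-radix FFT; the transform length must be a power of two."""
    n_total = len(samples)
    if n_total == 1:
        return list(samples)
    if n_total == 2:
        return [samples[0] + samples[1], samples[0] - samples[1]]
    if n_total == 0 or n_total % 4:
        raise ValueError("transform length must be a power of two")
    quarter = n_total // 4
    even = fft_split_radix(samples[0::2])    # N/2 point DFT of x[2m]
    odd1 = fft_split_radix(samples[1::4])    # N/4 point DFT of x[4m+1]
    odd3 = fft_split_radix(samples[3::4])    # N/4 point DFT of x[4m+3]
    spectrum = [0j] * n_total
    for k in range(quarter):
        t1 = cmath.exp(-2j * cmath.pi * k / n_total) * odd1[k]
        t3 = cmath.exp(-6j * cmath.pi * k / n_total) * odd3[k]
        total = t1 + t3
        rotated = -1j * (t1 - t3)
        spectrum[k] = even[k] + total
        spectrum[k + 2 * quarter] = even[k] - total
        spectrum[k + quarter] = even[k + quarter] + rotated
        spectrum[k + 3 * quarter] = even[k + quarter] - rotated
    return spectrum

print(fft_split_radix([1, 0, 0, 0, 0, 0, 0, 0]))
```

Only two twiddle multiplications are needed for every four output bins, which is where the reduced operation count comes from.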

4.2 Goertzel Algorithm

Whereas radix algorithms compute the same result as the DFT with a smaller order of complexity, the Goertzel algorithm is optimised to calculate the amplitudes of only a few frequencies which are known in advance. One of the most common uses of the Goertzel algorithm is for detecting button presses on a telephone keypad. It is used to detect keys when dialling a number and is also used by services such as telephone banking[11].

The Goertzel algorithm relies on a small number of frequencies being used. Figure 17 shows the grid-like layout of a telephone keypad along with some frequencies. Each row and column of the grid corresponds to a frequency. The frequencies are selected such that they do not interfere with each other. When a button is pressed, the frequencies of the row and column are combined to form a unique tone. As each key pressed corresponds to a different tone the button presses can be matched by a receiver.

| ↓ Frequencies → | 1209 Hz | 1336 Hz | 1477 Hz |
|---|---|---|---|
| 697 Hz | 1 | 2 | 3 |
| 770 Hz | 4 | 5 | 6 |
| 852 Hz | 7 | 8 | 9 |
| 941 Hz | * | 0 | # |

Figure 17: Frequency map for a telephone number pad. Each number generates a unique tone combining 2 frequencies[11]

If used as a general purpose solution to the DFT the Goertzel algorithm is in the complexity class $O(N^2)$[36]. However, the Goertzel algorithm is faster than a general purpose FFT algorithm when $M < \log_2 N$, where $M$ is the number of frequencies to be matched and $N$ is the transform length[37]¹¹. The details of the Goertzel algorithm are beyond the scope of this report but its brief inclusion is intended to show an interesting alternative to solving the entire DFT. The sources cited in this section provide varying amounts of deeper insight into the inner workings of the Goertzel algorithm.
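Although the details are out of scope, a brief sketch may help to show why the approach is cheap for a handful of frequencies. The code below uses the commonly published second order Goertzel recurrence to estimate the power at one target frequency, then picks the strongest row and column frequency from Figure 17. The 8000 Hz sampling frequency, the 400 sample window and all function names are assumptions made for this example.

```python
import math

ROW_HZ = (697, 770, 852, 941)
COLUMN_HZ = (1209, 1336, 1477)
KEYS = ("123", "456", "789", "*0#")

def goertzel_power(samples, target_hz, sampling_hz):
    """Estimate the signal power at a single frequency in one pass over the samples."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sampling_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in samples:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_key(samples, sampling_hz):
    """Choose the row and column whose frequencies carry the most power."""
    row = max(range(4), key=lambda r: goertzel_power(samples, ROW_HZ[r], sampling_hz))
    col = max(range(3), key=lambda c: goertzel_power(samples, COLUMN_HZ[c], sampling_hz))
    return KEYS[row][col]

def make_tone(key, sampling_hz, length):
    """Synthesise the two-frequency tone for a key (used here only to test detection)."""
    for r, row in enumerate(KEYS):
        if key in row:
            low, high = ROW_HZ[r], COLUMN_HZ[row.index(key)]
            break
    else:
        raise ValueError("unknown key")
    return [math.sin(2.0 * math.pi * low * n / sampling_hz)
            + math.sin(2.0 * math.pi * high * n / sampling_hz)
            for n in range(length)]

print(detect_key(make_tone("5", 8000, 400), 8000))
```

Only seven frequencies are evaluated, each with one multiplication per sample, instead of computing every bin of the transform.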

¹¹ $\log_2 N$ is the number of stages in an FFT computation.


5 Initial Project Outline

Initially, the aim of the project was to create a Fast Fourier Transform hardware unit and to synthesise it on a Field Programmable Gate Array (FPGA)¹². Whilst this ultimately proved to be too difficult, a lot of work was carried out on this and a brief summary of this work, including research, is detailed in this section.

5.1 Dedicated FFT Hardware

FFT algorithms can be implemented using a programming language and compiled to run on a general purpose application processor. For some purposes this is sufficient and desirable, such as in large software applications. One obvious benefit is that several FFT algorithms could be compiled and the most appropriate one selected at run time after analysing properties of the data, such as the transform length. An FFT may also be required to run on hardware which doesn’t have a dedicated FFT processor, in which case using a software FFT may be the only option. Furthermore, swapping algorithms at run time would not be possible in dedicated FFT hardware.

On the other hand, dedicated FFT hardware is desirable when there are real time constraints to be met[18]. A processor designed specifically to compute FFTs would be able to guarantee that results are available within a specific time period, whereas an FFT algorithm running on a general purpose Central Processing Unit (CPU) may have to compete with other software processes for resources. A well designed FFT processor is also likely to be more power efficient when computing FFTs than a general application processor, which is particularly useful in mobile devices[18].

Dedicated FFT hardware is usually included as part of a DSP processor[16]. As the FFT is a commonly used DSP algorithm an FFT processor is often a critical part of a DSP processor, making the design of dedicated FFT hardware particularly important. DSP processors are an important component in mobile phones[12]. The main benefit of using a DSP processor is that it is able to guarantee performance within certain parameters: for example, the latency and throughput guarantees of a dedicated DSP chip make it more suited to performing common and time sensitive operations than the CPU in mobile phones. The FFT is used in various wireless communications standards, whilst other DSP techniques are used to provide error detection and correction. Many of these are critical and time sensitive tasks for a mobile phone.

A portable MP3 player is another common example of a consumer device which typically includes a specialised audio DSP processor, as shown in Figure 18. One of the main purposes of the DSP processor in an MP3 player is to perform FFTs and interpret the data stored in an MP3 file[38]. Many other uses of DSP processors in fields including security, avionics and image processing are detailed by Texas Instruments, a leading manufacturer of DSP processors[39].

Figure 18: Components on the logic board of a typical MP3 player[12]

¹² An FPGA is a configurable integrated circuit.


5.2 Existing Products

In this field there are many existing products. DSP processors are made to suit many different power, size, latency and throughput constraints so that a suitable unit is available for almost any use[40]. Off-the-shelf solutions are available from manufacturers such as Texas Instruments[39], though many large and small vendors sell competing products.

Another option is to design components to meet specific demands. This process can be done from scratch, but modern tools allow components to be designed rapidly to meet needs flexibly. Hardware is almost always designed with the use of Electronic Design Automation (EDA) software tools[41] provided by vendors such as Cadence, Synopsys and Xilinx. Intellectual property (IP) can be licensed and used in these EDA tools to simplify the design process. Xilinx ISE is an EDA tool which provides a Core Generator, allowing custom FFT processors to be designed very quickly using licensed IP.

5.3 Attempting To Design A FFT Processor

The initial plan was to design, implement, verify and benchmark an FFT processor. Specifying the processor turned out to be more challenging than anticipated and proved to be an insurmountable roadblock. The learning curve was too steep due to a lack of relevant prior knowledge, and understanding how the FFT algorithms could be implemented in hardware took too long for design work to begin at a reasonable stage in the project. Nonetheless, the thought processes and various attempts to produce a working solution throughout this phase of the project are detailed in this section.

5.3.1 Designing For RTL Implementation

Once hardware has been specified it is possible to implement the design in a number of ways. One of the most common ways used to create complex modern devices is to use a Hardware Description Language such as Verilog or VHDL to describe the operations at an abstracted level, sparing hardware engineers the low level details of how operations such as additions and multiplications are actually implemented[16]. This level of design is usually referred to as Register Transfer Level (RTL) design. Once RTL code has been verified it can be synthesised into a configuration which will run on an FPGA. This was the way that the solution was initially going to be designed for this project.

Research showed that Radix-N algorithms, such as Radix-2, are common choices for FFT processors[7, 16, 18, 35, 42]. The symmetries which the algorithms exploit make them ideal candidates for hardware solutions, and fixed radix or split radix algorithms are the basis of many commercial FFT processors. However, understanding the intricate details of how the algorithms were made into hardware was very difficult.

It was decided before Christmas that a simple 8-point radix-2 FFT would be designed and implemented based on the simplest FFT processor architecture that had been found during research[42]. However, this also proved to be more difficult than anticipated.

One of the main reasons that the initial project idea proved to be so difficult is that resources about FFT processors are almost exclusively written for an audience familiar with the inner workings of FFTs. The resources often assume an electronic engineering background with a large volume of DSP expertise as well as significant experience of designing hardware. Research papers focus on complex, highly optimised architectures and books did not provide enough detail on the basics.

Furthermore, programmers creating software implementations of an FFT algorithm can take advantage of useful high level features such as floating point number support and inbuilt trigonometric functions which aren’t available to hardware designers, resulting in an excruciating level of detail for everything concerning the FFT. Also, FFTs typically operate on floating point data which is represented in hardware using a mantissa and exponent form, but a lack of experience with floating point numbers and a lack of inbuilt support in EDA tools made it difficult to determine how floating point numbers could be implemented.

5.3.2 Converting C to RTL

After it became clear that the first approach would not yield results within an appropriate timeframe it was decided that another approach should be tried. Research showed that tools exist to convert C code to Verilog, and so it was decided that an effort should be made to produce a working FFT processor in this way.

At this stage, the idea was to use an adapted version of code published in Numerical Recipes in C[34] and convert it to Verilog. This would allow verification and benchmarking work to be performed on the generated code to ensure its correctness and measure its performance. To this end, a verification environment was created to ensure that the algorithm coded in C behaved correctly before being converted to Verilog.

A diagram of the test suite is shown in Figure 19. A Python script randomly generated a user defined number of parameter files. From these parameters, complex time domain samples could be created and provided as input to the C and Python FFT functions. One parameter file was used per test as input to the FFTs. Creation of the input signal was handled by a Python script which calculated samples and stored them in a temporary file which could be read in by the C testbench. The Python script calculated the FFT for these samples and stored the results in a results file with the same format as the input samples, making it easy for both files of complex numbers to be parsed. The C testbench then read in the input samples and the Python results before performing the FFT and comparing the results of the trusted Python FFT to its own results, outputting ‘Pass’ or ‘Fail’ accordingly.

[Diagram: the Parameter Generator generates Structured Parameter Files, which are input to the Python Script. The Python Script creates a Tmp File of Complex Time Samples and produces the Reference Implementation Results. Both are input to the C Testbench, which outputs Pass/Fail.]

Figure 19: Test suite structure

Constant verification is good practice in hardware design. As the design matures it will progress through several layers of abstraction, eventually culminating in logic gates being laid out. A mistake at a high level of abstraction usually propagates through design phases. Returning to fix bugs in higher levels of abstraction requires a large amount of extra design effort to be expended and can be costly in terms of time and money, if applicable. It is prudent to invest time in finding as many errors as possible at every stage of the design process.

As a result, verification is necessary to be able to say with a large amount of confidence that code performs as expected. The FFT function in a Python library called NumPy was chosen as the reference FFT. The algorithm coded in C was expected to produce the same results as the reference implementation for every possible input, and this was the criterion for validation.

The file format for parameters is shown in Figure 20. To reduce the storage overheads when running a large number of tests, it was decided that the best approach would be to store the parameters as one line of text in a file instead of storing up to 2048 complex samples for each test. The results of each FFT are maintained for comparison purposes.

TestNumber StartTime EndTime NumberOfSamples Random1 Random2 Random3 Random4

Figure 20: Space delimited parameters, which are stored in a file and used to generate input

The test number was stored so that parameters can easily be associated with the test that they correspond to. The start and end times were randomly generated floating point numbers and the number of samples was a randomly generated integer. The number of samples, or transform length, was always a power of 2 as the FFT code implements an in-place Radix-2 algorithm. These values range from 8 to 2048, which is representative of the range of transform lengths usually used with FFTs. Random numbers were generated to randomise the signal generated on each input, such as controlling the combination of sine and cosine signals which formed the signal.

Comparing the output of each FFT was done by comparing the complex magnitudes of each frequency bin. As floating point numbers were passed between programming languages and text files it was decided to allow a margin of error which could be specified by a user. A value of 0.0000005 was used because this margin of error was deemed insignificant, though good cases could be made for slightly higher or lower margins of error.
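The comparison step can be sketched as follows. The project's testbench was written in C, so this Python version is only an illustration of the logic, and the function names are hypothetical. The default margin is the value quoted above.

```python
def magnitudes_match(candidate, reference, margin=0.0000005):
    """Compare two spectra bin by bin using the complex magnitude of each bin."""
    if len(candidate) != len(reference):
        return False
    return all(abs(abs(c) - abs(r)) <= margin
               for c, r in zip(candidate, reference))

def verdict(candidate, reference, margin=0.0000005):
    """Report the outcome in the same form as the testbench output."""
    return "Pass" if magnitudes_match(candidate, reference, margin) else "Fail"

# Hypothetical spectra: the second differs from the first by a rounding-sized error.
reference = [4 + 0j, 1 - 1j, 0j, 1 + 1j]
candidate = [4.0000001 + 0j, 1 - 1j, 0j, 1 + 1j]
print(verdict(candidate, reference))
```

One consequence of comparing magnitudes alone is that an error which only affects the phase of a bin would not be detected, so comparing the real and imaginary parts separately would be a stricter check.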

Bash provides a convenient way to deal with system operations, such as running files written in different languages and finding all of the tests to run. Consequently, the verification flow was handled by a Bash script. The Bash script first calls for a user defined number of test parameters to be generated before running the reference FFT and the C FFT with every set of test parameters. At the end it outputs "Pass" or "Fail" to reflect whether all tests produced the expected result.

When it came to converting the C code to RTL, however, it was discovered that the conversion tool was unable to perform the conversion using floating point numbers, and this plan was abandoned.

5.4 Alternative Ideas

After the attempt to produce an FFT processor by converting C code into RTL had failed, it was established that producing an FFT processor was an unrealistic goal. At this late stage of the project two alternative proposals were put forward, and these are detailed in this section.

5.4.1 Xilinx Core Generator

The first proposal was to use functionality built into the Xilinx ISE EDA tool to generate an FFT processor and proceed with other parts of the project. Xilinx Core Generator allows a customised FFT processor to be produced by choosing from a number of parameters. This functionality is useful if an FFT processor needs to meet specific latency, throughput and power constraints within a specific area on chip, and the core generator can save considerable amounts of time for design engineers. As seen in Figure 21, ISE’s core generator offers lots of implementation options which allow FFT processors to be generated very quickly.

Figure 21: Using the Core Generator to generate FFT processors

5.4.2 Software Tools

It was quickly decided that while progressing with a generated FFT processor would fit in with the initial goal of the project, it would not offer much complexity or give much scope for the work to be completed. The alternative proposal was to use the remaining time to build a software application capable of verifying and benchmarking FFT algorithms. This proposal would be able to build upon the code produced in the previous stage of the project. Whilst less complex than the initial plan, it was agreed that this was a sensible proposal which could be completed in the remaining time. It was decided that this option was the best one to move forward with, despite representing a larger deviation from the original plan.

6 Final Implementation Details

This section describes the implementation of the FFT verification and benchmarking application that was ultimately produced. An overview of the application is given first before some of the important software classes are detailed.

6.1 Purpose of Application

The application allows a user to verify that an FFT algorithm written in C behaves as expected and compare its performance to other algorithms. A large number of inputs for an FFT can be generated from the application to ensure that algorithms function correctly. Errors are flagged up and the software displays the results graphically to aid with debugging issues.

It was decided that the application should be coded in Java because of relevant experience and because several parts of the application are easily modelled in an object oriented manner.

6.1.1 Requirements

Some functional and non-functional requirements were identified before coding took place. These requirements were defined taking into account the short period of development time available, and represent a realistic set of requirements given the circumstances.

Functional

• The software must be able to generate random inputs for Fourier transforms and provide these as inputs to algorithms to test their correctness and speed.

• The software must be able to correctly validate whether a Fourier Transform algorithm produces the expected results.

• The software must be able to benchmark Fourier Transform algorithms by recording the time taken to perform transforms.

• The software must be able to display useful information about the inputs and outputs of a Fourier transform, the number of tests which pass and fail verification, and the time taken by an algorithm to perform a full transform.

Non Functional

• The software should have an intuitive, simple and aesthetically pleasing user interface.

• The software should be well modelled so that changes and additions can be made easily.

• The software should be able to respond quickly to user input.

• The software should be easily portable to different platforms.

These requirements were taken into consideration when designing the application, with many of the goals being successfully met.


6.1.2 Verification Tool

The verification in the application builds upon what was done when preparing to convert code from C to Verilog. Some of the functionality has been ported to Java whilst other functionality remains in a Bash script. The application compares the complex moduli obtained using each algorithm with the complex moduli obtained using the reference FFT, but Bash scripts are used to generate input and run tests.

The application compares the results obtained using each algorithm against the results obtained using the FFT function built into the Python NumPy library. As detailed previously in the report, this is trusted to produce the correct result and so other implementations are expected to produce the same values in order to be considered correct.

Given more time the verification process could have been refined so that more of the Bash script was ported to Java and support for testing an algorithm implemented in any language was added. However, these features did not make it into the application due to time constraints.

6.1.3 Benchmarking Tool

The benchmarking is performed according to the time it takes to perform an FFT 1000 times. Repeating the FFT operation enough times should ensure that background tasks do not have a noticeable impact on timings and that the negligible time for reading samples from a file can be ignored. Results are grouped by transform length, and the average time taken to execute the algorithm using each transform length is displayed on a graph. By using enough test cases for each transform size the results should give a reliable average and allow a user to spot trends.

This could prove useful for a software developer who is comparing FFT algorithms to see which one is likely to be fastest. It is important to note that issues which the programmer has limited control over, such as processor caching and process scheduling by an operating system, can have a significant impact on the speed of software FFTs. Consequently, the benchmarks obtained using this application should be considered specific to the combination of hardware and software on which they were run.

Times are recorded for each test using the inbuilt Unix time command line utility. The time for each test is stored in a file with an appropriate name and standard formatting to allow for easy, consistent postprocessing. This approach allowed for rapid development, but had time allowed, time measurements would instead have been recorded and reported by the testbenches when they run an algorithm, as testbenches would be able to report more precise timings.
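To illustrate what in-testbench timing could look like, the following is a minimal sketch rather than the project's code. It is written in Java for illustration (the project's testbenches are in C), and the names Benchmark and averageNanos are hypothetical. It repeats a transform a fixed number of times and reports the mean time per run.

```java
import java.util.function.Consumer;

// Hypothetical helper, not part of the original application.
final class Benchmark {
    // Runs the transform `repeats` times, each on a fresh copy of the input,
    // and returns the mean wall-clock time of one run in nanoseconds.
    static double averageNanos(Consumer<double[]> transform, double[] samples, int repeats) {
        long total = 0;
        for (int r = 0; r < repeats; r++) {
            double[] copy = samples.clone();   // stops in-place transforms corrupting later runs
            long start = System.nanoTime();
            transform.accept(copy);
            total += System.nanoTime() - start;
        }
        return (double) total / repeats;
    }
}
```

Only the transform itself sits between the two clock reads, so file input and process start-up are excluded from the measurement. On a JVM the first iterations would typically be slower while the code is being compiled, so a few untimed warm-up runs would be sensible before measuring.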


6.2 Algorithms

Several algorithms were implemented for use with the application. These algorithms are written in C, but with a little extra time the application could easily be amended to run algorithms in any language. The language isn’t particularly important, but for benchmarking FFTs it was decided that C exposes the details of the algorithms and allows more control over operations. For example, an in-place Radix-2 algorithm was tested against a recursive Radix-2 algorithm, and C was an ideal candidate because it allows for fine grained control over memory allocation. In contrast, this level of control would be much more difficult to achieve using Java to implement the algorithms.

6.2.1 DFT

The DFT is an important reference point for implementing FFT algorithms. The DFT is computationally expensive, which is the reason why FFT algorithms are necessary. Recall the DFT equation from equation 6.

$$X[k] = \sum_{n=0}^{N-1} x[n] \cdot e^{-\frac{2\pi i k n}{N}}$$

This naive algorithm can be implemented using relatively simple pseudocode. Two nested loops of length N are used, resulting in a complexity of $O(N^2)$. Recall from equation 1 that $e^{ix} = \cos(x) + i\sin(x)$. This helps to explain how the sum is calculated: for every frequency bin the result is computed as the sum of all samples, each multiplied by a complex exponential which is evaluated using a cosine and a sine.

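Algorithm 1: DFT [13]
Data: N complex samples, ‘in’
Result: N complex frequency bins, ‘output’
for i ← 0 to N−1 do
    sum.real ← 0; sum.imag ← 0;
    for k ← 0 to N−1 do
        angle ← 2πki/N;
        sum.real += in[k].real × cos(angle) + in[k].imag × sin(angle);
        sum.imag += in[k].imag × cos(angle) − in[k].real × sin(angle);
    end
    output[i] ← sum;
end
return output;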

This is expected to be the poorest performing algorithm.
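Algorithm 1 can be sketched as follows. This is an illustrative Java version, not the C implementation used in the project; the class name NaiveDft and the representation of complex data as separate real and imaginary arrays are assumptions made for the example.

```java
// Illustrative Java sketch of Algorithm 1; the project's own version was written in C.
final class NaiveDft {
    // re/im hold the N input samples; returns {outRe, outIm} holding the N frequency bins.
    static double[][] transform(double[] re, double[] im) {
        int n = re.length;
        double[] outRe = new double[n];
        double[] outIm = new double[n];
        for (int k = 0; k < n; k++) {            // one pass per frequency bin
            double sumRe = 0.0, sumIm = 0.0;
            for (int t = 0; t < n; t++) {        // sum over every sample
                double angle = 2.0 * Math.PI * k * t / n;
                double c = Math.cos(angle), s = Math.sin(angle);
                // (re + i*im) * (cos(angle) - i*sin(angle))
                sumRe += re[t] * c + im[t] * s;
                sumIm += im[t] * c - re[t] * s;
            }
            outRe[k] = sumRe;
            outIm[k] = sumIm;
        }
        return new double[][] { outRe, outIm };
    }
}
```

For the real input 1, 2, 3, 4 the transform gives the bins 10, −2+2i, −2 and −2−2i, which shows the Hermitian symmetry expected for purely real input.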


6.2.2 Recursive Radix-2

The recursive Radix-2 algorithm is the most straightforward implementation of the equation originally given in equation 16:

$$X[k] = \sum_{m=0}^{\frac{N}{2}-1} x[2m] \cdot e^{-\frac{2\pi i k m}{N/2}} + e^{-\frac{2\pi i k}{N}} \cdot \sum_{m=0}^{\frac{N}{2}-1} x[2m+1] \cdot e^{-\frac{2\pi i k m}{N/2}}$$

A Radix-2 algorithm recursively splits a DFT of size N into two DFTs of size N/2. This description of Radix-2 fits perfectly with Algorithm 2, which solves the DFT with $O(N \log N)$ complexity. The algorithm uses the notion of a complex number expressed in terms of its real and imaginary components.

Algorithm 2: Recursive Radix-2 [14]
Data: N complex samples, ‘in’
Result: N complex frequency bins, ‘output’
if N = 2 then
    output[0].real = in[0].real + in[1].real;
    output[0].imag = in[0].imag + in[1].imag;
    output[1].real = in[0].real − in[1].real;
    output[1].imag = in[0].imag − in[1].imag;
    return output;
end
xeven, xodd ← ∅;
for i ← 0 to N−1 do
    if i is even then
        xeven[i/2] ← in[i];
    else
        xodd[(i−1)/2] ← in[i];
    end
end
Xg = recursive-radix2(xeven, N/2);
Xh = recursive-radix2(xodd, N/2);
for i ← 0 to N−1 do
    j = i % (N/2);
    output[i].real = Xg[j].real + Xh[j].real × cos(2πi/N) + Xh[j].imag × sin(2πi/N);
    output[i].imag = Xg[j].imag + Xh[j].imag × cos(2πi/N) − Xh[j].real × sin(2πi/N);
end
return output;
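A compact Java sketch of Algorithm 2 is given below for illustration; it is not the project's C code. It assumes that N is a power of two, and for brevity it uses a base case of a single sample instead of the size 2 base case in the pseudocode. The class name RecursiveRadix2 is hypothetical.

```java
// Illustrative Java sketch of Algorithm 2, assuming N is a power of two.
final class RecursiveRadix2 {
    static double[][] transform(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) {                              // a single sample is its own DFT
            return new double[][] { re.clone(), im.clone() };
        }
        int half = n / 2;
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int i = 0; i < half; i++) {           // split into even and odd samples (copies)
            evenRe[i] = re[2 * i];
            evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1];
            oddIm[i] = im[2 * i + 1];
        }
        double[][] g = transform(evenRe, evenIm);
        double[][] h = transform(oddRe, oddIm);
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < n; k++) {
            int j = k % half;
            double angle = 2.0 * Math.PI * k / n;
            double c = Math.cos(angle), s = Math.sin(angle);
            // output[k] = g[j] + (cos(angle) - i*sin(angle)) * h[j]
            outRe[k] = g[0][j] + h[0][j] * c + h[1][j] * s;
            outIm[k] = g[1][j] + h[1][j] * c - h[0][j] * s;
        }
        return new double[][] { outRe, outIm };
    }
}
```

Every level of recursion allocates new arrays for the even and odd halves, which is the copying overhead that an in-place implementation avoids.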


6.2.3 Iterative Radix-2

Whilst the recursive Radix-2 algorithm is convenient for a programmer to comprehend and far more computationally efficient than the DFT, it is not very efficient with memory as it constantly copies values until it reaches the base case. On the other hand, iterative implementations of the Radix-2 algorithm operate on data in-place, so no copies of the input are made and the space complexity is O(N): as the input size increases, the amount of memory required scales linearly.

This version of the algorithm is rather verbose to describe, so a simplified version is presented in Algorithm 3 where intricate details are omitted. In this algorithm, the variable s is used to keep track of which stage of the butterfly the FFT is currently at, with $\log_2 N$ being the total number of stages. Essentially, this algorithm expresses a Radix-2 algorithm as the process of building up from size 2 DFTs, whereas the recursive algorithm started with a size N DFT and continually split it into two N/2 sized DFTs until the DFTs are of size 2.

Like all radix based algorithms, its time complexity is $O(N \log_2 N)$.

Algorithm 3: Iterative Radix-2 [15]
Data: N complex samples, ‘in’
Result: N complex frequency bins, ‘output’
for s ← 1 to log2 N do
    for k ← 0 to N−1, k += 2^s do
        combine the two 2^(s−1)-element DFTs in[k … k+2^(s−1)−1] and in[k+2^(s−1) … k+2^s−1] into one 2^s-element DFT in[k … k+2^s−1];
    end
end
return output;
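The details omitted from Algorithm 3 can be filled in along the following lines. This is a sketch of a standard in-place decimation-in-time Radix-2 FFT written in Java for illustration, not the project's C implementation. It assumes N is a power of two and includes the bit-reversal reordering which such in-place versions normally perform first; the class name IterativeRadix2 is hypothetical.

```java
// Illustrative in-place Radix-2 sketch, assuming N is a power of two.
final class IterativeRadix2 {
    // Overwrites re/im with the frequency bins; no copies of the input are made.
    static void transform(double[] re, double[] im) {
        int n = re.length;
        // Bit-reversal permutation: reorders the samples so butterflies can work in place.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Each stage combines pairs of (len/2)-element DFTs into len-element DFTs.
        for (int len = 2; len <= n; len <<= 1) {
            int half = len / 2;
            for (int k = 0; k < n; k += len) {
                for (int m = 0; m < half; m++) {
                    double angle = 2.0 * Math.PI * m / len;
                    double c = Math.cos(angle), s = Math.sin(angle);
                    int a = k + m, b = k + m + half;
                    // twiddle * in[b], with twiddle = cos(angle) - i*sin(angle)
                    double tRe = re[b] * c + im[b] * s;
                    double tIm = im[b] * c - re[b] * s;
                    re[b] = re[a] - tRe;
                    im[b] = im[a] - tIm;
                    re[a] += tRe;
                    im[a] += tIm;
                }
            }
        }
    }
}
```

The outer loop over len corresponds to the stage variable s in Algorithm 3, with len equal to 2^s. The twiddle factors are recomputed for every butterfly here to keep the sketch short; precomputing them is a common optimisation.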


6.3 Modelling

The application was designed to be user friendly with a clean codebase. The clean codebase was achieved by making sensible use of GRASP principles and common object oriented design patterns. A key goal was to create highly cohesive¹³ software classes with low coupling¹⁴ between them, and extending the application to perform new tasks during the last week of the project was made much simpler as a result of this. Larman [43] provides a brilliantly written resource which gives more details about the software design patterns and principles mentioned during this section.

6.3.1 FFT

The FFT was modelled using a number of classes. In this application there are several attributes which clearly belong to an FFT object:

• The algorithm name.

• The code for the FFT algorithm.

• The code used as a testbench.

• The average time taken by the FFT algorithm for each transform length.

• Whether the algorithm passes all tests.

Figure 22 shows three main software classes used to model information about FFTs and some important operations. It was decided that the algorithm code and testbench should be stored in a separate class called FFTCode. This class deals with reading in code from the relevant files and handling exceptions because this detail isn’t relevant to the FFT class. As there are several transform lengths, information about the time taken for each datapoint is stored using a HashMap variable, with an integer transform length as the key and the average time taken to run the algorithm on this transform length as the value. The time taken is stored in a Time object which is able to provide the information stored in numerous formats.

Figure 22: Three software classes used to manage and model FFTs

In order to keep the FFT class highly cohesive, a separate group of classes was created to validate the results of the FFT. These are covered later in this section; an FFT object only needs to know whether the algorithm passes or fails the tests, not how the tests are performed.

¹³ High Cohesion means that each software class represents a single well defined entity.
¹⁴ Low Coupling means that software classes have few dependencies on one another and that these dependencies are kept simple.


To allow the number of algorithms to scale, a Factory was used to create and manage FFT objects. The FFTFactory class uses the Singleton design pattern to ensure that only one instance can be used, providing a single place for other parts of the code to obtain an FFT object. It uses a HashMap to store the FFTs, with the algorithm names as the key and the FFT object as the value.

This is a convenient way to store and retrieve FFT objects and allows the application to scale well when more FFT algorithms are being tested. Depending on the number of tests performed, loading the information about an FFT algorithm can be a reasonably time consuming process because it can require a lot of file parsing. Using a factory means that the information about an FFT is only loaded once, when the application starts, instead of being loaded repeatedly while the software is in use, which avoids redundant work and reduces delays when using the application. Furthermore, the FFTFactory object also stores the FFT algorithm which is currently in use, which allows many objects to gain access to the current FFT object with minimal effort.
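The combination of Singleton and Factory described above might look roughly like the sketch below. The method names getInstance, getFFT, setCurrent and getCurrent are assumptions, the FFT class is reduced to a hypothetical stand-in which only holds a name, and for brevity the sketch creates each FFT on first request, whereas the application loads them at start-up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the application's FFT class; only the name is modelled here.
final class FFT {
    private final String name;
    FFT(String name) { this.name = name; }
    String getName() { return name; }
}

final class FFTFactory {
    private static final FFTFactory INSTANCE = new FFTFactory();  // the single instance
    private final Map<String, FFT> ffts = new HashMap<>();        // algorithm name -> FFT
    private FFT current;                                          // algorithm currently in use

    private FFTFactory() { }                                      // prevents outside construction

    static FFTFactory getInstance() { return INSTANCE; }

    // Creates the FFT the first time a name is requested and reuses it afterwards.
    FFT getFFT(String name) {
        return ffts.computeIfAbsent(name, FFT::new);
    }

    void setCurrent(String name) { current = getFFT(name); }

    FFT getCurrent() { return current; }
}
```

Any class can then call FFTFactory.getInstance().getCurrent() without holding a reference to the model, which is what keeps the coupling confined to one place.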

6.3.2 User Interface

The User Interface (UI) was designed by creating various panels to deal with single, well defined parts of the user interface. Creating the UI in a modular fashion made it easy to rearrange components when new panels were added and to change the appearance of parts of the system entirely. The UI uses a tabbed interface where different panels are accessible by clicking on a tab near the top of the window, as shown in the Results section.

A user can select to view the results of any of the algorithms. When changing the algorithm which is being viewed, many different panels need to update their content. This could be a logistical nightmare if not handled sensibly. Instead of handling these changes in a monolithic fashion, making changes to each class in turn whenever a different algorithm is selected, it was decided that a reset() method should be used.

This was solved using polymorphism. An abstract ProgramPanel class was created, extending the inbuilt JPanel class by defining an abstract reset() method. Every panel used in the application is a subclass of ProgramPanel, and consequently has to implement a concrete reset() method. As the panels are arranged in a hierarchical manner with MainPanel at the top level, calling reset() on MainPanel is all that needs to be done to refresh or reset the interface. MainPanel can call reset() on all panels which it contains. This allows a reset operation to propagate through all the panels in the UI with just one call from a controller class, achieving Model-View Separation¹⁵.

Figure 23: The ProgramPanel class and three examples of concrete subclasses
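The propagation of reset() through the panel hierarchy can be sketched as follows. In the application ProgramPanel extends JPanel; that is left out here so the example runs without a display. ResultPanel and addPanel are hypothetical names introduced for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// In the application this class extends Swing's JPanel; that is omitted in this sketch.
abstract class ProgramPanel {
    abstract void reset();
}

// Hypothetical leaf panel which counts how often it has been refreshed.
class ResultPanel extends ProgramPanel {
    int resets = 0;

    @Override
    void reset() { resets++; }
}

// Top-level panel: resetting it resets every panel it contains.
class MainPanel extends ProgramPanel {
    private final List<ProgramPanel> children = new ArrayList<>();

    void addPanel(ProgramPanel panel) { children.add(panel); }

    @Override
    void reset() {
        for (ProgramPanel child : children) {
            child.reset();   // dynamic dispatch selects each panel's own reset()
        }
    }
}
```

A controller only ever calls reset() on the top-level panel; nested containers forward the call, so adding a new panel requires no changes to the controller.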

¹⁵ Model-View Separation is desirable because the model shouldn’t be coupled to classes which display information, and view classes shouldn’t be concerned about how the information is generated. Separating these with a layer of controllers allows undesirable coupling to be controlled in one area of the application.


If necessary, each panel is able to consult the FFTFactory to obtain the FFT object which is currently in use. This keeps coupling between layers to a minimum and confines it to a single place. The FFTFactory is well placed to deal with such requests according to the Information Expert principle.

In addition, there are several combo boxes in the UI. The creation logic for these combo boxes can be complex, relying on traversing the file system to find the available algorithms and test cases. Instead of putting this logic in the classes which use the combo boxes, a ComboBoxFactory object is used to manage creation and caching logic to achieve separation of concerns. This results in a single, simple point of coupling along with highly cohesive objects which do not mix file handling functions with display related functions. Furthermore, there is a single point of external coupling between the application and the external file system, which makes it easy to adapt to changes instead of having to manually update several classes which each create combo boxes.

6.3.3 Running Tests

Instead of converting the entire test framework to Java, some of the old validation harness was used to allow development to progress rapidly. After the shell script was amended to find all of the FFT algorithms in the file hierarchy and to run each FFT algorithm with all input data, the shell script was fit to be used as part of the application. However, this posed some further problems: how could the shell script be executed from a Java environment, and how could it provide useful feedback to the Java software?

This was achieved by using the Observer software design pattern. The Observer pattern uses two types of objects: observable and observer. This provides a neat solution for outputting text from the shell script. Two objects, RunValidation and ValidationObserver, were created to extend the inbuilt Observable and Observer objects respectively. RunValidation spawns a process to run the shell script. Whenever the shell script writes any output, RunValidation notifies all observers of this change. ValidationObserver is subscribed to the changes and is able to update the UI with the new content. This implementation turns a complex problem into a relatively simple one and makes the system extensible: if another class needed to respond to updates from the shell script then this would not require changes to the existing code.

[Class diagram]

• Observable (abstract): +subscribers: List, +run(): void, +notifyObservers(): void

• RunValidation (extends Observable): +run(): void, +notifyObservers(): void

• Observer (abstract): +update(): void

• ValidationObserver (extends Observer): +update(): void

• Observable publishes to Observer; Observer subscribes to Observable.

Figure 24: The Observer pattern: ValidationObserver subscribes to updates from RunValidation, and whenever RunValidation posts an update via the notifyObservers() method this is passed on to ValidationObserver.
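A minimal sketch of this arrangement is shown below. It uses the inbuilt java.util.Observable and java.util.Observer types which the text refers to (deprecated since Java 9 but still available). The publish method, the log field and the way output is read line by line are assumptions made for the illustration rather than the application's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.Observable;
import java.util.Observer;

@SuppressWarnings("deprecation")
class RunValidation extends Observable {
    // Spawns the given command (for example the validation script) and publishes its output.
    void run(String... command) {
        try {
            Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                publish(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Notifies every subscribed observer once per line of output.
    void publish(BufferedReader output) {
        try {
            String line;
            while ((line = output.readLine()) != null) {
                setChanged();            // without this, notifyObservers() does nothing
                notifyObservers(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

@SuppressWarnings("deprecation")
class ValidationObserver implements Observer {
    final StringBuilder log = new StringBuilder();   // stands in for a UI text area

    @Override
    public void update(Observable source, Object line) {
        log.append(line).append('\n');
    }
}
```

An observer is attached with addObserver() before the script is started, for example runner.addObserver(new ValidationObserver()) followed by runner.run("bash", "validate.sh"), where the script name is hypothetical. Further observers can be added in the same way without touching RunValidation.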

Whilst running tests the UI is locked so that a user cannot click on buttons to disrupt the process or click on tabs to trigger unexpected results. The UI is locked by disabling the MainPanel, providing a simple and elegant solution without relying on a large number of classes. Upon completing the tests, the whole UI is automatically updated by calling the reset method on the top level panel and then the top level panel is reenabled.


6.3.4 Validating Results

The constructor for an FFT object checks whether or not the corresponding algorithm worked correctly. In order to verify this, a number of classes were created. The result files contain pairs of complex numbers and so it makes sense to model complex numbers as an object. The main classes involved with validating an FFT algorithm are shown in Figure 25.

In this application, the complex modulus of a complex pair is evaluated and stored as a double in a ComplexNumber class. One result file consists of many such complex numbers, and a list of complex numbers can be stored using the ComplexNumberList class. Both classes override the default Java equals() method, with ComplexNumber comparing two complex numbers and ComplexNumberList comparing two lists of complex numbers. This provides an elegant way of checking equality. When a ComplexNumberList checks if it is equal to another, it iterates over every complex number and calls the equals() method on the ComplexNumber objects in question.

In the equals() method of the ComplexNumber a small value is set as the acceptable deviation from the reference FFT implementation to compensate for slight differences in numerical representation. As numbers are passed between programming languages through a text file it is understandable that the results may differ slightly.

[Class diagram]

• ComplexNumberList: -complexNumbers: ArrayList<ComplexNumber>, +equals(obj:Object): Boolean

• ComplexNumber: -value: Double, +equals(): Boolean

• ValidateFFT: -cResults: ArrayList<ComplexNumberList>, -pythonResults: ArrayList<ComplexNumberList>, +getPasses(): int, +getFails(): int, +checkResults(): Boolean

• One ComplexNumberList contains 1..* ComplexNumber objects; one ValidateFFT is associated with 1..* ComplexNumberList objects.

Figure 25: A view of the main classes involved with validating an FFT
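The tolerance-based comparison can be sketched as follows. The tolerance of 1e-6, the constructor taking real and imaginary parts and the add method are assumptions made for the example; the report does not state the actual values or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Stores the modulus of one complex result, as described in the text.
final class ComplexNumber {
    static final double TOLERANCE = 1e-6;   // assumed value; the report does not state one
    private final double value;

    ComplexNumber(double real, double imag) {
        this.value = Math.hypot(real, imag);   // |real + i*imag|
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ComplexNumber)) return false;
        return Math.abs(value - ((ComplexNumber) obj).value) <= TOLERANCE;
    }

    // A tolerance-based equals() cannot be paired with a meaningful hash,
    // so a constant is returned to keep equal objects hash-equal.
    @Override
    public int hashCode() { return 0; }
}

final class ComplexNumberList {
    private final List<ComplexNumber> complexNumbers = new ArrayList<>();

    void add(ComplexNumber number) { complexNumbers.add(number); }

    // Two lists are equal when they have the same length and every pair of entries matches.
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ComplexNumberList)) return false;
        List<ComplexNumber> other = ((ComplexNumberList) obj).complexNumbers;
        if (other.size() != complexNumbers.size()) return false;
        for (int i = 0; i < complexNumbers.size(); i++) {
            if (!complexNumbers.get(i).equals(other.get(i))) return false;
        }
        return true;
    }

    @Override
    public int hashCode() { return 0; }
}
```

Two points are worth noting about this design. An equals() method with a tolerance is not transitive, so such objects are best kept out of hash-based collections. Also, because only the modulus is stored, two results which differ only in phase would compare as equal.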

6.3.5 Representing Results

The input and output of an FFT can be conveniently displayed in a graph to provide useful feedback to the programmer. Java lacks inbuilt support for this so an open source external library called JFreeChart was used instead. This library was chosen because it offers lots of functionality for generating graphs, because it is freely available and because it has a large and active development community.

The library contains a ChartFactory class to create graphs, which is a useful point of entry. However, several UI classes need to display different kinds of graphs. Whilst a large amount of graph creation work is done by the ChartFactory class in the external library, this class needs to be provided with the information which it is to create a graph about. This information comes from the model, and so it is not good practice for the UI classes to deal with creating graphs. Instead UI classes are able to access a Facade class, ChartFacade, and request a particular graph. The Facade class then gathers information from the model, requests a graph using this information and returns the result.


7. RESULTS LIST OF ALGORITHMS

6.4 Testing

Due to the short time available for developing this application, rigorous JUnit testing was not completed. Whilst this is not ideal, the way in which the design was approached helps to alleviate risks in a few ways:

• By using common software design patterns when designing the code the risk of mistakes was reduced slightly compared to a fully customised solution. Design patterns provide commonly accepted solutions for modelling different situations, reducing the risk of poorly organised software.

• Methods and classes were kept small and cohesive. Small methods and classes are easy to read and a mistake in a small method is more easily spotted and corrected than a mistake hidden in a series of large monolithic methods.

• When the application was being designed, some of the functionality was being ported from a script which was already validated to work properly. The Java outputs were quickly sanity checked against the outputs of the original script, which sufficed for the purposes of this application.

• The algorithms were being verified by the application. The algorithms were tested against a trusted implementation and were seen to produce the correct results.

If more time had been available then the application would have been developed after a test-driven development environment was created. Unfortunately this was infeasible with the amount of time available after switching projects. After weighing up the benefits and drawbacks of this approach it was decided that the priority was to get the application up and running before focussing on testing because some parts of the system were designed for the sole purpose of performing validation.

7 Results

In the limited development time available for the alternative project a lot was achieved, and the results are detailed in this section. Firstly, Figure 26 shows a view of the application’s tabbed user interface. In this case, the application is displaying a pane with the validation harness loaded for the DFT algorithm. The interface is clean, simple and intuitive.

Figure 26: An overview of the application running, showing the validation harness code pane


When it comes to validation it is useful to be able to see what proportion of tests are passing and failing. In the application this information is presented in a pie chart as in Figure 27, with red representing test passes and blue representing test failures. In this case all of the algorithms pass all test cases, which is expected because the algorithms were debugged before the project hand in date.

Figure 27: A pie chart displays the number of test passes/failures

Information corresponding to particular tests can be found in the Individual Test Results section of the application. A graph of an input signal in the time domain is displayed in the Input tab (Figure 28a), whilst the results of the FFT operations can be seen in the Result tab (Figure 28b). Each frequency bin has two coloured bars above it, plotted side by side, corresponding to the complex magnitudes obtained using the reference Python FFT (red) and the selected FFT algorithm (blue) respectively. This screen also displays whether an individual test passes or fails. Seeing the two values next to each other allows a designer to see exactly what is going wrong and may help lead to errors being fixed.

(a) A simple time domain wave used as input (b) The transform of this information

Figure 28: Graphs obtained using an input with a small transform length

The graphs certainly helped with diagnosing issues with the recursive FFT algorithm. Input functions are purely real and, as can be seen clearly in Figure 28, there is Hermitian symmetry in the results. When the recursive Radix-2 algorithm failed early tests it was initially thought that there was a serious flaw with the implementation, but analysing the graphs revealed that some correct values were being calculated before being assigned to incorrect frequency bins. This led to the error being corrected quickly.

Most importantly, the running times of the algorithms were analysed. Plans to implement a Radix-4 algorithm and a split-radix algorithm were too ambitious because significant changes to the verification flow were needed to run these algorithms: Radix-4, for instance, only works properly with a transform length which is a power of 4. These changes could not be completed in time. However, the running times shown in Figure 29 provide a useful insight into how quick each of the three tested implementations of Fourier transform algorithms is.

Figure 29: Running times of algorithms

The graph uses a logarithmic axis to represent time. The DFT was slowest by a clear margin as expected, demonstrating clearly why the Fast Fourier Transform algorithms are required. The recursive Radix-2 algorithm came in a clear second place, whilst the iterative Radix-2 algorithm came out in a clear first place. This was the expected result, but it was startling to see the difference between the iterative Radix-2 algorithm and its recursive counterpart.
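As a rough indication of the gap to expect between the DFT and the Radix-2 algorithms, the dominant terms of the two complexities can be compared for a transform length of N = 2048. Constant factors and memory effects are ignored, so this is only an order of magnitude estimate:

$$N^2 = 2048^2 = 4{,}194{,}304, \qquad N \log_2 N = 2048 \times 11 = 22{,}528, \qquad \frac{N^2}{N \log_2 N} = \frac{2048}{11} \approx 186$$

Both Radix-2 variants share the same $N \log_2 N$ term, so a difference between them has to come from constant factors such as memory allocation and copying.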

This is likely to be partly due to some optimisations made in the iterative algorithm which were not mirrored in the recursive algorithm, but more importantly it is likely to be due to the space complexity of the iterative solution. As the iterative algorithm runs in-place its memory requirements are O(N), whereas the recursive algorithm uses more memory when it creates copies of the input array at each stage of recursion. When executing on a general purpose processor, several other factors come into play as well as straightforward algorithmic complexity. Chief among these is the size of processor caches.

Indeed, processor cache sizes would most likely explain the non-uniform rate of growth for each function when used with small transform lengths. The DFT sees a profound jump in execution time when moving from a transform length of 8 to a transform length of 16. For transform lengths between 16 and 2048 the growth rate remains reasonably constant. A similarly strange trend can be seen with the time taken to execute the recursive Radix-2 algorithm as the transform length grows. One plausible explanation is that the change in the rate of growth of the function is related to cache size. Filling up one level of the processor’s cache is likely to correspond to a jump in the time taken to execute the transform, and larger transforms fill the cache quickly. This would also explain why the iterative Radix-2 algorithm is so good to begin with: it operates in place so does not fill up the limited cache memory as quickly as the other algorithms.

It would be interesting to carry out further investigation into the effect of caches - if any - on the results which were obtained, given more time. In the absence of scientific proof, cache sizes seem to be a reasonable and rational explanation for the few strange results which were obtained.


8. CONCLUSIONS LIST OF ALGORITHMS

8 Conclusions

Conclusions for this project are mixed. The final product was very different to the initial project goals and lacked the same degree of complexity. By the same token, the initial project proved to be far too difficult and delivering a final product seemed implausible at several stages of the project.

Whilst reading up on the subject, it quickly became apparent that resources on this topic are not written in a simple and easily understood manner. Assimilating a large amount of unfamiliar material was a difficult challenge but it has proven to be rewarding in the end. Parts of the report were designed to demonstrate that a good understanding of the material has eventually been gained despite the initial project goal not being met. In some ways this is a good thing because the project represented an opportunity to learn about a subject area which is not covered in detail within the Computer Science syllabus.

Different approaches were used to attempt to create a working FFT processor but regrettably these attempts all failed. It was good to try different approaches to solve a difficult problem instead of giving up instantly. However, on reflection, it is readily apparent that a more prudent approach would have been to evaluate the situation a little less optimistically early on. This project has shown that sometimes effort and determination are not enough to overcome adversity. This experience will undoubtedly be useful when going forward because next time a similar situation occurs the response to the situation will be a little more realistic, whether it involves seeking even more help or realising the need for change before a situation becomes difficult to recover from.

To end on a positive note, the project demonstrated the highs and lows of tackling tough problems, which is good life experience: coming to understand how the Fourier transforms worked after interpreting unfamiliar information written in wildly varying mathematical styles by many different authors was a high point, whilst failing to get near to delivering the original project goal was a low point. In the end, an application was produced for verifying and benchmarking FFT algorithms written in C and so the project finished on a medium point. The project was not the success that was hoped for, but the application allowed the theory about Fourier transforms to be put into practice along with some software engineering skills. The project was not a success but it was not a complete disaster either.


REFERENCES REFERENCES

References

[1] MineLab. Terminology. [Online], 2014. URL http://www.minelab.com/consumer/knowledge-base/terminology. Last visited 06/04/2014.

[2] Anders Gjendemsj. Illustrations [of the Sampling Theorem]. [Online], 2013. URL http://cnx.org/content/m11443/latest/?collection=col10631/latest. Last visited 9/10/2013.

[3] NASA. Timing Analysis. [Online], 2011. URL http://imagine.gsfc.nasa.gov/docs/science/how_l2/timing.html. Last visited 16/04/2014.

[4] Auckland University. Psychoacoustics. [Online], 2013. URL http://www.cs.auckland.ac.nz/compsci708s1c/lectures/jpeg_mpeg/mpeg_audio.html. Last visited 9/10/2013.

[5] Kalid Azad. An Interactive Guide To The Fourier Transform. [Online], 2012. URL http://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/. Last visited 22/04/2014.

[6] Robert Mannell. Acoustic Analysis of Sound: Analog and Digital. [Online], 2008. URL http://clas.mq.edu.au/speech/acoustics/frequency/analog_digital.html. Last visited 27/04/2014.

[7] Douglas B. Williams. The Digital Signal Processing Handbook. The Electrical Engineering Handbook Series. CRC Press LLC, 1998.

[8] Eric W. Weisstein. Discrete Fourier Transform. [Online], 2014. URL http://mathworld.wolfram.com/DiscreteFourierTransform.html. Last visited 25/04/2014.

[9] Douglas L. Jones. Decimation-in-time (DIT) Radix-2 FFT. [Online], 2006. URL http://cnx.org/content/m12016/latest/. Last visited 25/04/2014.

[10] Douglas L. Jones. Radix-4 FFT Algorithms. [Online], 2006. URL http://cnx.org/content/m12027/latest/. Last visited 25/04/2014.

[11] Naim Dahnoun. Goertzel Algorithm. [Online], 2004. URL www.ti.com/ww/cn/uprogram/share/ppt/c6000/Chapter17.ppt. Last visited 28/04/2014.

[12] Marshall Brain. Inside A Cell Phone. [Online], 2000. URL http://electronics.howstuffworks.com/inside-cell-phone.htm. Last visited 26/04/2014.

[13] Nayuki Minase. How to Implement the Discrete Fourier Transform. [Online], 2014. URL http://nayuki.eigenstate.org/page/how-to-implement-the-discrete-fourier-transform. Last visited 29/04/2014.

[14] Manish Kasat. Radix 2 floating point FFT Implementation. [Online], 2009. URL http://smsoftdev-solutions.blogspot.co.uk/2009/10/radix-2-floating-point-fft.html. Last visited 29/04/2014.

[15] EuroInformatica. Efficient FFT Implementations. [Online], Unknown. URL http://tinyurl.com/k79dalz. Last visited 29/04/2014.

[16] Vijay K. Madisetti. VLSI Digital Signal Processors: An Introduction To Rapid Prototyping And Design Synthesis. Butterworth-Heinemann, 1995.

[17] Eric W. Weisstein. Nyquist Frequency. [Online], 2013. URL http://mathworld.wolfram.com/NyquistFrequency.html. Last visited 2/10/2013.

[18] Patrick Gaydecki. Foundations of Digital Signal Processing: theory, algorithms and hardware design, volume 15 of Circuits, Devices and Systems. The Institution of Electrical Engineers, London, United Kingdom, 2004.


[19] James R. Graham. Seismic Applications for the FFT. [Online], 2014. URL http://astro.berkeley.edu/~jrg/ngst/fft/seismic.html. Last visited 15/04/2014.

[20] D.G De Paor. Structural Geology and Personal Computers. Elsevier, 1996.

[21] Thomas Schwengler. Wireless & Cellular Communications - Class Notes for TLEN-5510. [Online], 2013. URL http://morse.colorado.edu/~tlen5510/text/classwebch9.html. Last visited 17/04/2014.

[22] Keithley Instruments. An Introduction to Orthogonal Frequency Division Multiplex Technology. [Online], 2004. URL http://www.ieee.li/pdf/viewgraphs/introduction_to_orthogonal_frequency_division_multiplex.pdf. Last visited 29/04/2014.

[23] Cory Johnson. What is Orthogonal Frequency Division Multiplexing (OFDM)? [Online], 2014. URL http://www.techopedia.com/definition/5078/orthogonal-frequency-division-multiplexing-ofdm. Last visited 17/04/2014.

[24] Brian L. Evans. Performance Evaluation and Real-Time Implementation of Subspace, Adaptive, and DFT Algorithms for Multi-tone Detection. [Online], 1994. URL http://ptolemy.eecs.berkeley.edu/papers/96/dtmf_ict/www/node3.html. Last visited 17/04/2014.

[25] Marshall Brain. How MP3 Files Work. [Online], 2000. URL http://computer.howstuffworks.com/mp32.htm. Last visited 19/04/2014.

[26] Eric W. Weisstein. Complex Number. [Online], 2014. URL http://mathworld.wolfram.com/ComplexNumber.html. Last visited 24/04/2014.

[27] Eric W. Weisstein. Euler Formula. [Online], 2014. URL http://mathworld.wolfram.com/EulerFormula.html. Last visited 24/04/2014.

[28] Julie E. Greenberg and Natalie T. Smith. Overview of Spectral Analysis. [Online], 2008. URL http://web.mit.edu/6.555/www/tutorial/SAtext.html. Last visited 27/04/2014.

[29] LDS. Understanding FFT Windows. [Online], 2003. URL http://www.physik.uni-wuerzburg.de/~praktiku/Anleitung/Fremde/ANO14.pdf. Last visited 27/04/2014.

[30] Renato Romero. FFT For Beginners. [Online], 2000. URL http://www.vlf.it/fft_beginners/fft_beginners.html. Last visited 27/04/2014.

[31] Jake Vanderplas. Understanding the FFT Algorithm. [Online], 2013. URL http://jakevdp.github.io/blog/2013/08/28/understanding-the-fft/. Last visited 25/04/2014.

[32] Lance Williams. Fourier Transform Symmetrics (Lecture). [Online], 2011. URL http://www.cs.unm.edu/~williams/cs530/symmetry.pdf. Last visited 27/04/2014.

[33] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex Fourier series. Technical report, Math. Comput., 1965.

[34] William H. Press and Saul A. Teukolsky and William T. Vetterling and Brian P. Flannery. Numerical Recipes in C - The Art of Scientific Computing. Cambridge University Press, 2 edition, 1988.

[35] Stephen G. Johnson and Matteo Frigo. A modified split-radix FFT with fewer arithmetic operations. Technical report, IEEE Trans. Signal Processing 55 (1), 111–119, 2007. URL http://www.fftw.org/newsplit.pdf.

[36] Bevan Baas. FFT Diagrams & Algorithms (Lecture). [Online], 2012. URL http://www.ece.ucdavis.edu/~bbaas/281/slides/Handout.fft2.pdf. Last visited 27/04/2014.

[37] Richard Lyons. Single tone detection with the Goertzel algorithm. [Online], 2012. URL http://www.embedded.com/design/real-world-applications/4401754/Single-tone-detection-with-the-Goertzel-algorithm. Last visited 28/04/2014.


[38] Kevin Bonsor, Jeff Tyson, and Craig Freudenrich. MP3 Technology. [Online], 2007. URL http://electronics.howstuffworks.com/mp3-player2.htm. Last visited 26/04/2014.

[39] TI. Applications for Digital Signal Processors. [Online], 2014. URL http://www.ti.com/lsds/ti/dsp/applications.page. Last visited 25/04/2014.

[40] BDTI. Choosing a DSP Processor. [Online], 2000. URL http://www.bdti.com/MyBDTI/pubs/choose_2000.pdf. Last visited 28/04/2014.

[41] Neil Johnson. Agile Hardware Development: Nonsense or Necessity? [Online], 2011. URL http://www.eetimes.com/document.asp?doc_id=1279137. Last visited 29/04/2014.

[42] Bevan Baas. FFT Processor Example (Lecture). [Online], 2012. URL http://www.ece.ucdavis.edu/~bbaas/281/slides/Handout.fft4.Spiffee.pdf. Last visited 28/04/2014.

[43] Craig Larman. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development, 3/e. Pearson Education, 2012.
