
Mse: Hardware Algorithms

Parallelization

Marcel Jacomet, Josef Goette

Bern University of Applied Sciences, BFH-TI HuCE-microLab, Biel/Bienne

[email protected]

October 11, 2017

Contents

1 Introduction

2 Parallelization

3 Unfolding

4 Hardware Rules

5 OCT Example
5.1 OCT Introduction

6 Parallelization at OCT Example
6.1 Data-Path Unfolding
6.2 FiFo Unfolding
6.3 DFT Unfolding

References


Hardware Algorithms

© Marcel Jacomet, 2012 - 2016

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the author, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, or computer software is forbidden.


1 Introduction


Textbooks

• VLSI Digital Signal Processing Systems: Design and Implementation, Keshab K. Parhi, John Wiley & Sons, ISBN 0-471-24186-5, 1999, USD 135

• OCT texts discussing the lab example can be found on the web

2 Parallelization


Parallelization Principles 1

• parallelization of degree p speeds up hardware algorithms by up to a factor of p

• parallelization of hardware can basically be done in two ways:

– p identical hardware paths processing time-delayed data streams in parallel

– p interlinked hardware paths processing a stream of data vectors, each holding p data sets, in parallel

• the first approach is a straightforward implementation using p times the hardware of the non-parallelized design

• the second approach is more challenging: it uses p times the operators of the non-parallelized hardware, but only the same number of storage elements


Parallelization Principles: Parallel Streams


Parallelization Principles: Parallel Sets

[Figure: data sampling channels 1 to 5 together form one data sample (5-set vector); interlinked parallel processing of samples (vectors)]


Dataflow Graph Representation

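y[n] = a · x[n] + b · x[n−1] + c · x[n−2]

• block diagram of 3-tap FIR filter

[Figure: block diagram with two 1/z unit delays producing x[n-1] and x[n-2] from x[n], coefficient multipliers a, b, c, and output y[n]]

• data-flow diagram of 3-tap FIR filter

[Figure: data-flow graph from x[n] to y[n] with multipliers a, b, c and edge delays D and 2D]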
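The difference equation of the 3-tap FIR filter can be sketched in software as follows. This is a minimal illustration, not part of the original slides; the function name `fir3`, the zero initial state and the coefficient values are assumptions chosen for the example.

```python
def fir3(x, a, b, c):
    """3-tap FIR: y[n] = a*x[n] + b*x[n-1] + c*x[n-2], zero initial state (assumed)."""
    d1 = 0.0  # register holding x[n-1]
    d2 = 0.0  # register holding x[n-2]
    y = []
    for sample in x:
        y.append(a * sample + b * d1 + c * d2)
        d2, d1 = d1, sample  # clock edge: shift the delay line
    return y


# The impulse response of an FIR filter is its coefficient sequence.
impulse_response = fir3([1.0, 0.0, 0.0, 0.0], a=0.5, b=0.25, c=0.125)
```

The two variables `d1` and `d2` play the role of the two unit delays in the block diagram.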


Dataflow Graph: Pipelining

• pipelining is done by introducing additional delay elements (registers)

• pipelining delay elements can only be placed in feed-forward paths

[Figure: FIR data-flow graph before pipelining (edge delays D and 2D) and after pipelining (edge delays D and 3D plus an additional D)]


Dataflow Graph: Pipelining for Speedup

• pipelining to increase clock frequency

• retiming theory (Bellman-Ford or Floyd-Warshall algorithms)

• FIR example with multipliers at 2u and adders at 1u: the critical path drops from 4u to 2u, so the clock frequency is 1/(2u) instead of 1/(4u)

[Figure: three versions of the FIR data-flow graph with multiplier delays (2u) and adder delays (1u): the original graph and two pipelined versions with additional delay elements D]


3 Unfolding


Unfolding 1

• unfolding or loop unrolling

• example

y[n] = a · y[n− 9] + x[n]

for i ← 1 to ∞ do
    y[i] ← a · y[i − 9] + x[i]

• substituting n = 2k and n = 2k + 1 gives two equations

• together, the 2 equations describe the same algorithm

y[2k] = a · y[2k − 9] + x[2k]

y[2k + 1] = a · y[2k − 8] + x[2k + 1]
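That the two unfolded equations describe the same algorithm can be checked numerically. The sketch below is an illustration only: the function names, the zero initial conditions and the test signal are assumptions, not part of the slides.

```python
def original(x, a, delay=9):
    """y[n] = a*y[n-9] + x[n], one sample per iteration, zero initial state."""
    y = []
    for n in range(len(x)):
        past = y[n - delay] if n >= delay else 0.0
        y.append(a * past + x[n])
    return y


def unfolded_by_2(x, a):
    """Compute y[2k] and y[2k+1] in the same iteration (len(x) must be even)."""
    y = [0.0] * len(x)
    for k in range(len(x) // 2):
        even, odd = 2 * k, 2 * k + 1
        y[even] = a * (y[even - 9] if even >= 9 else 0.0) + x[even]  # uses y[2k-9]
        y[odd] = a * (y[odd - 9] if odd >= 9 else 0.0) + x[odd]      # uses y[2k-8]
    return y


x = [float(n) for n in range(1, 41)]
same = original(x, 0.5) == unfolded_by_2(x, 0.5)
```

Both statements inside one loop iteration depend only on values from earlier iterations, which is what allows them to run in parallel in hardware.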


Unfolding 2

• parallelization degree: J-slow

• J-slow means that for an input x[kJ + m] the output after a delay is x[(k − 1)J + m]

• thus we get:

y[2k] = a · y[2(k − 5) + 1] + x[2k]

y[2k + 1] = a · y[2(k − 4) + 0] + x[2k + 1]


Unfolding 3

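• data flow graph of example

• algorithm of example (2-slow)

[Figure: data flow graph of the example (input x[n], multiplier a, feedback delay 9D, output y[n]) and its 2-slow version (inputs x[2k] and x[2k+1], two multipliers a, delays 5D and 4D, outputs y[2k] and y[2k+1])]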


Unfolding Design Procedure

• for each node U in the original DFG, draw the J nodes U_0, U_1, ..., U_{J−1}

• for each edge U → V with w delays in the original DFG, draw the J edges U_i → V_{(i+w) mod J} with ⌊(i+w)/J⌋ delays for i = 0, 1, 2, ..., J − 1

[Figure: original DFG with nodes U, V, T and edge delays D, 6D, 5D, and its unfolded version with nodes U0 to U2, V0 to V2, T0 to T2 and edge delays D, D and five times 2D]
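The edge rule of the design procedure can be sketched as a short function. The edge representation as `(u, v, w)` tuples and the node names `"add"` and `"mul"` are hypothetical choices for this illustration.

```python
def unfold_edges(edges, J):
    """Apply the rule U_i -> V_((i+w) mod J) with floor((i+w)/J) delays.

    edges: list of (u, v, w) with w = number of delays on the edge u -> v.
    Returns a list of ((u, i), (v, j), delays) tuples.
    """
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


# Feedback edge of y[n] = a*y[n-9] + x[n]: adder output -> multiplier, 9 delays.
loop = unfold_edges([("add", "mul", 9)], J=2)
```

For J = 2 this yields one edge with 4 delays and one with 5 delays, matching the 4D and 5D of the 2-slow example; the delays of the J copies always sum to the original w.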


4 Hardware Rules

Signal Processing Hardware Rules: ”No Control Path”

• a 1/z register stores a new input sample at every clock cycle

• an if clause asks for controllable registers (with enable)

• let's build it in Simulink: hardware rule

[Figure: Simulink building blocks: the Unit Delay (1/z) corresponds to a register (D, clk, Q); the Enabled Unit Delay (inputs u and E, output y) corresponds to an enabled register (D, clk, Q, ena); the enabled register is modeled with a Unit Delay and a Switch (~=0) driven by ena]
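The behavior of an enabled register built from a unit delay and a switch can be sketched cycle by cycle. This is a behavioral illustration under assumed conventions (initial state `q0 = 0`, the output shows the value stored before the current clock edge); it is not the Simulink model from the slides.

```python
def enabled_register(d_stream, ena_stream, q0=0):
    """Unit delay whose input is selected by a switch on ena (assumed model)."""
    q = q0
    outputs = []
    for d, ena in zip(d_stream, ena_stream):
        outputs.append(q)         # Q shows the value stored before this clock edge
        q = d if ena != 0 else q  # switch: load D when ena != 0, else recirculate Q
    return outputs


trace = enabled_register([1, 2, 3, 4], [1, 0, 1, 0])
```

The register itself is clocked in every cycle; the if clause has moved into the data path as a multiplexer in front of the delay.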


5 OCT Example

5.1 Introduction to OCT


Optical coherence tomography (OCT) is an optical signal acquisition and processing method. It captures micrometer-resolution, three-dimensional images from within optical scattering media (e.g., biological tissue). Optical coherence tomography is an interferometric technique, typically employing near-infrared light. The use of relatively long wavelength light allows it to penetrate into the scattering medium. Reflection is caused by refraction index changes at tissue boundaries, and scattering is a diffraction process at micro-structures in the tissue. OCT signals only contain information about the depth of scattering or reflecting structures and cannot differentiate between these two fundamental processes. A relatively recent implementation of optical coherence tomography, frequency-domain optical coherence tomography, provides advantages in signal-to-noise ratio, permitting faster signal acquisition. Optical coherence tomography systems are employed in diverse applications, including art conservation and diagnostic medicine, notably ophthalmology, where it can be used to obtain detailed images from within the retina. Advantages compared to other techniques are the achieved tissue penetration (1 to 3 mm) combined with the relatively high axial resolution (0.5 to 15 µm) at a very high measuring frequency (several 100 kS/s).

Introduction to OCT: Features

• OCT is an optical signal acquisition and processing method

• micrometer resolution in 3-D images

• optical scattering/reflecting media: biological tissues

• interferometric technique with near infrared laser


• reflection is caused by refraction index changes at tissue boundaries

• recent OCT technology is frequency-domain OCT, which provides improved SNR and high-speed signal acquisition


Introduction to OCT: Applications

• applications in medicine: ophthalmology, ...

• depth penetration of 1 to 3 mm (A-scan)

• speeds of 100 k depth scans per second at 2048 pixels each, i.e. ≥ 200 MS/s

• OCT image of pig eye at HuCE-optoLab (left), OCT setup with Gecko platform at HuCE-microLab (right)


The optical setup for frequency-domain OCT typically consists of an interferometer with a low-coherence, broad-bandwidth light source (white light) or a narrow-band sweeping light source. Light is split into and recombined from reference and sample arm, respectively.

Introduction to OCT: Principle

• low coherence source (LCS)

• beam splitter (BS)

• reference (REF) and sample arm (SMP)

• diffraction grating (DG) and full-field camera (CAM) as spectrometer (source: wiki)


The measured input samples received by the digital signal processing units are equidistant in wavelength (x-axis is the wavelength, y-axis is the measured OCT light intensity). A first step in the OCT processing is to remap the measured light intensity so that it is equidistant in wave number instead of wavelength. This pre-processing step is needed for the succeeding DFT transformation. Use simple linear interpolation to calculate the remapped sample intensity.

Introduction to OCT: Signals

• top: captured Fourier-domain OCT signals of A-scan

• middle: signals after filtering and remapping

• bottom: final A-scan image after inverse FFT

[Figure: three plots of intensity (a.u.): versus wavelength [nm] (top), versus wave number [1/µm] (middle), versus depth z [µm] (bottom)]


Signal Processing in OCT: Remapping 1

• OCT input signals are captured in the λ (wavelength) domain

• they have to be transformed into the k (wave number) domain

• this process is called remapping

[Figure: comparison of k (linear) and k = 2π/λ(n), both axes from 7.25 to 7.75]


Signal Processing in OCT: Remapping 2

• λ (wave length) from 810 nm to 870 nm

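• L_n: λ values sampled equidistantly in wavelength

• L_m: λ values sampled equidistantly in wave number

[Figure: input signal on the grid L_{n-1}, L_n, L_{n+1}, L_{n+2} (equidistant in λ, spacing L_step) and remapped signal on the grid L_{m-1}, L_m, L_{m+1} (equidistant in k); out(m) is interpolated between valA and valB]

• relation is k = 2π/λ with

$$L_{step} = \frac{\lambda_{max} - \lambda_{min}}{N}, \qquad L_n = \lambda_{min} + n \cdot L_{step}$$

$$k_{step} = \frac{\frac{2\pi}{\lambda_{min}} - \frac{2\pi}{\lambda_{max}}}{N}, \qquad L_m = \frac{2\pi}{k_{max} - m \cdot k_{step}}$$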
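The two sampling grids can be computed directly from these relations. The sketch below assumes k_max = 2π/λ_min (which follows from k = 2π/λ) and uses N = 1024 steps as an example value; the function name `remap_grids` is hypothetical.

```python
import math


def remap_grids(lam_min, lam_max, n_steps):
    """Return (L_n, L_m): wavelengths equidistant in lambda and equidistant in k."""
    l_step = (lam_max - lam_min) / n_steps
    k_max = 2 * math.pi / lam_min  # largest wave number belongs to smallest wavelength
    k_min = 2 * math.pi / lam_max
    k_step = (k_max - k_min) / n_steps
    l_n = [lam_min + n * l_step for n in range(n_steps + 1)]
    l_m = [2 * math.pi / (k_max - m * k_step) for m in range(n_steps + 1)]
    return l_n, l_m


l_n, l_m = remap_grids(810.0, 870.0, 1024)
```

Both grids cover 810 nm to 870 nm, but the L_m points lie closer together than L_step near 810 nm and farther apart near 870 nm, which is why some input intervals receive several output samples and others none.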


Signal Processing in OCT: Remapping 3

• signal processing with look-up table

– no division with iteration

– no error due to continuous summing

[Figure: input signal on the grid L_n (equidistant in λ, spacing L_step) and remapped signal on the grid L_m (equidistant in k); out(m) is interpolated between valA and valB]

$$out_m = valA + \frac{valB - valA}{L_{step}} \cdot (L_m - L_n)$$

$$out_m = valA + (valB - valA) \cdot LUT_k(addr)$$
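Comparing the two formulas suggests that the look-up table holds the precomputed weight (L_m − L_n)/L_step for each output sample. The sketch below illustrates this reading; the function names and the numeric values are assumptions made for the example.

```python
def interp_direct(val_a, val_b, l_m, l_n, l_step):
    """Linear interpolation with a run-time division by l_step."""
    return val_a + (val_b - val_a) / l_step * (l_m - l_n)


def make_lut_k(l_m_points, l_n_points, l_step):
    """Offline step: one interpolation weight per output sample."""
    return [(l_m - l_n) / l_step for l_m, l_n in zip(l_m_points, l_n_points)]


def interp_lut(val_a, val_b, weight):
    """Run-time step: one subtraction, one multiplication, one addition."""
    return val_a + (val_b - val_a) * weight


lut_k = make_lut_k([810.125], [810.0], l_step=0.5)
direct = interp_direct(2.0, 4.0, 810.125, 810.0, 0.5)
via_lut = interp_lut(2.0, 4.0, lut_k[0])
```

Because the weights are computed once from the exact grid positions, the hardware needs no divider and does not accumulate rounding errors from repeatedly summing step sizes.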


Signal Processing in OCT: Control Path

• signal processing: data path and control path

– for clause would be perfect

– if clause in code asks for control path

– control can also be done by look-up tables

[Figure: input signal (equidistant sampling in wavelength) on the grid L_n and remapped signal (equidistant sampling in wave number) on the grid L_m; input intervals are marked 1x, 2x and 0x]


Signal Processing in OCT: Datapath and Control Path

i ← 1, j ← 1, m ← 1, adr ← 1
while m ≤ 1024 do
    varA ← inp[i]
    varB ← inp[i + 1]
    if lutCtr(adr − 1) ≠ 2 then
        outm(j) ← varA + (varB − varA) ∗ lutK(adr)
    if lutCtr(adr) = 0 {increment input and output sample index} then
        m ← m + 1
        i ← i + 1
    else if lutCtr(adr) = 3 {keep, do not load new input sample} then
        m ← m + 1
    else if lutCtr(adr) = 2 {skip, do not generate output sample} then
        i ← i + 1
    adr ← adr + 1
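The idea of steering the data path with a control look-up table can be sketched as follows. This is a simplified variant, not a transcription of the pseudocode above: the code values 0, 2 and 3 and their meaning are taken from the comments in the pseudocode, while the function names, the LUT construction and the boundary handling are assumptions. It expects `l_out` sorted in ascending order and lying within the range of `l_in`.

```python
ADVANCE, SKIP, KEEP = 0, 2, 3  # code values borrowed from the lutCtr comments


def build_luts(l_in, l_out):
    """Offline step: one control code and one weight per LUT address."""
    step = l_in[1] - l_in[0]
    last = len(l_in) - 2  # index of the last input interval
    lut_ctr, lut_k = [], []
    i = m = 0
    while m < len(l_out):
        if l_out[m] >= l_in[i + 1] and i < last:
            lut_ctr.append(SKIP)  # no output falls into this input interval
            lut_k.append(0.0)
            i += 1
            continue
        lut_k.append((l_out[m] - l_in[i]) / step)
        m += 1
        stay = m < len(l_out) and (l_out[m] < l_in[i + 1] or i == last)
        lut_ctr.append(KEEP if stay else ADVANCE)
        if not stay:
            i += 1
    return lut_ctr, lut_k


def remap(inp, lut_ctr, lut_k):
    """Run-time step: a plain for loop over the LUT address."""
    out, i = [], 0
    for ctr, k in zip(lut_ctr, lut_k):
        if ctr != SKIP:
            out.append(inp[i] + (inp[i + 1] - inp[i]) * k)
        if ctr != KEEP:
            i += 1
    return out


l_in = [0.0, 1.0, 2.0, 3.0, 4.0]
l_out = [0.0, 0.4, 0.8, 1.5, 3.2, 4.0]
inp = [2.0 * l + 1.0 for l in l_in]  # linear test signal
lut_ctr, lut_k = build_luts(l_in, l_out)
out = remap(inp, lut_ctr, lut_k)
```

All data-dependent decisions are made once while building the tables; at run time the loop only reads one control code and one weight per address, which corresponds to the remark that control can also be done by look-up tables.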


Signal Processing in OCT: Simulink

outm = valA+ (valB− valA) · LUTk(addr)


Signal Processing in OCT: ”No Control Path”

outm = valA+ (valB− valA) · LUTk(addr)


Signal Processing in OCT: Simplifications in Control Path

outm = valA+ (valB− valA) · LUTk(addr)


6 Parallelization at OCT Example

6.1 Data-Path Unfolding


Unfolding: OCT Example 1

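• OCT data flow graph for interpolation

• exercise: design a 4-slow unfolding

• simulate it with Matlab/Simulink

[Figure: OCT interpolation data flow graph: input in, two Mux blocks with wr control, subtractor, multiplier, adder, output out, delay elements D, and look-up tables lutK and lutCTR]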


Unfolding: How to Model the FiFo?

• OCT data flow graph for interpolation

• exercise: 4-slow unfolding including control path

• what about the FiFos?

[Figure: OCT interpolation data flow graph including the control path: Mux blocks with wr control, look-up tables LUT ctr and LUT k, comparisons "not 3" and "not 2", delay elements D, 2D and 3D, and two FiFos (push, pop) whose unfolding is marked "??"]

6.2 FiFo Unfolding


FiFo Model

• DFG model of a FiFo

• the FiFo has to be decomposed down to delay elements and combinational logic

[Figure: FiFo (in, out, push, pop) modeled with a dual-port RAM (in, out, adrW, adrR), Mux blocks with wr control, delay elements D and constants 1]


Unfolding the FiFo Model

• DFG model of a 2-slow unfolding of FiFos

• impossible to recompose FiFos from the unfolded graph

• shall we start to re-implement all IP cores?

[Figure: 2-slow unfolded FiFo model with two dual-port RAMs (in, out, adrW, adrR), Mux blocks with wr control, push and pop signals, delay elements D and constants 1]

6.3 DFT Unfolding


DFT (DTFS): Discrete Fourier Transform

• natural parallelization by FFT algorithms

• N-point DFT

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad k = 0, 1, 2, \ldots, N-1$$

where $W_N \mathrel{\hat{=}}$ N-th root of unity

$$W_N = \sqrt[N]{1} = e^{-j(2\pi/N)}$$

• inverse transform

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \qquad n = 0, 1, 2, \ldots, N-1$$

We need a note on the factor 1/N .
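The transform pair can be written down directly from the two sums. The following sketch uses only the Python standard library; the function names `dft` and `idft` are chosen for this illustration, and the direct evaluation needs on the order of N² multiplications.

```python
import cmath


def dft(x):
    """X[k] = sum_n x[n] * W_N^(k*n) with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]


def idft(X):
    """x[n] = (1/N) * sum_k X[k] * W_N^(-k*n)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]


x = [1.0, 2.0, 0.0, -1.0]
x_back = idft(dft(x))
```

With this convention the factor 1/N appears only in the inverse transform, so that `idft(dft(x))` returns the original samples up to rounding.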


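DFT: Matrix Form

• denote the vector of input samples by

$$\mathbf{x} = \big( x[0], x[1], x[2], \ldots, x[N-1] \big)^T$$

• denote the vector of spectral samples by

$$\mathbf{X} = \big( X[0], X[1], X[2], \ldots, X[N-1] \big)^T$$

• then the DFT can be written as

$$\mathbf{X} = \mathrm{DFT}(\mathbf{x}) = \mathbf{F}\mathbf{x}$$

$$\text{with } \mathbf{F} \mathrel{\hat{=}} \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\ 1 & W_N^{2} & W_N^{2 \cdot 2} & \cdots & W_N^{2 \cdot (N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W_N^{N-1} & W_N^{(N-1) \cdot 2} & \cdots & W_N^{(N-1) \cdot (N-1)} \end{pmatrix}$$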

Superscript T denotes transpose.


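DFT: Low-Order Fourier Matrix Examples

• for N = 2: $W_N = W_2 = \sqrt[2]{1} = e^{-j2\pi/2} = e^{-j\pi} = -1$

$$\mathbf{F}_2 \mathrel{\hat{=}} \begin{pmatrix} 1 & 1 \\ 1 & W_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

• for N = 4: $W_N = W_4 = \sqrt[4]{1} = e^{-j2\pi/4} = e^{-j\pi/2} = -j$

$$\mathbf{F}_4 \mathrel{\hat{=}} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & W_4 & W_4^{2} & W_4^{3} \\ 1 & W_4^{2} & W_4^{2 \cdot 2} & W_4^{2 \cdot 3} \\ 1 & W_4^{3} & W_4^{3 \cdot 2} & W_4^{3 \cdot 3} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix}$$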

Superscript T denotes transpose.
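The low-order Fourier matrices can be generated from the general entry W_N^(r·c). The sketch below is an illustration with hypothetical function names; it builds the matrix as nested Python lists and applies it to a vector.

```python
import cmath


def fourier_matrix(N):
    """Entry (r, c) is W_N^(r*c) with W_N = exp(-j*2*pi/N)."""
    W = cmath.exp(-2j * cmath.pi / N)
    return [[W ** (r * c) for c in range(N)] for r in range(N)]


def matvec(F, x):
    """Matrix-vector product X = F x."""
    return [sum(f * v for f, v in zip(row, x)) for row in F]


F2 = fourier_matrix(2)
F4 = fourier_matrix(4)
X = matvec(F4, [1.0, 0.0, 0.0, 0.0])  # unit impulse: every spectral sample is 1
```

Up to floating-point rounding, `F4` contains only the values 1, −1, j and −j, so a 4-point DFT needs no true multiplications.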


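DFT: Matrix Factorization → FFT

• for example N = 1024:

$$\mathbf{F}_{1024} \mathrel{\hat{=}} \begin{pmatrix} \mathbf{I}_{512} & \mathbf{D}_{512} \\ \mathbf{I}_{512} & -\mathbf{D}_{512} \end{pmatrix} \cdot \begin{pmatrix} \mathbf{F}_{512} & \mathbf{O} \\ \mathbf{O} & \mathbf{F}_{512} \end{pmatrix} \cdot \begin{pmatrix} \text{even} \\ \text{odd} \end{pmatrix}$$

where $\mathbf{I}_{512} \mathrel{\hat{=}}$ identity matrix

$$\mathbf{D}_{512} \mathrel{\hat{=}} \mathrm{diag}\{ 1, W_{1024}, W_{1024}^{2}, \ldots, W_{1024}^{511} \}$$

$\mathbf{F}_{512} \mathrel{\hat{=}}$ 512-point Fourier matrix

permutation at end separates even and odd part:

$$(\downarrow)\,\mathbf{x} = \big( x[0], x[2], \ldots \big)$$

$$(\downarrow)(z)\,\mathbf{x} = \big( x[1], x[3], \ldots \big)$$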
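Applying the same factorization recursively to the half-size Fourier matrices leads to a radix-2 FFT. The following sketch assumes that the length of `x` is a power of two; the function name `fft` and the recursive formulation are choices made for this illustration and say nothing about how the hardware is structured.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time FFT following F_N = [[I, D], [I, -D]] * diag(F_N/2, F_N/2) * P."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])  # F_{N/2} applied to x[0], x[2], ...
    odd = fft(x[1::2])   # F_{N/2} applied to x[1], x[3], ...
    W = cmath.exp(-2j * cmath.pi / N)
    twiddled = [W ** k * odd[k] for k in range(N // 2)]  # D_{N/2} times the odd part
    upper = [even[k] + twiddled[k] for k in range(N // 2)]  # block row (I,  D)
    lower = [even[k] - twiddled[k] for k in range(N // 2)]  # block row (I, -D)
    return upper + lower


spectrum = fft([1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0])
```

The even and odd half-size transforms are independent of each other, which is the natural parallelism mentioned at the start of this section.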

References