8/7/2019 1 FFT PROCESSOR_modified
1/19
ACCESSICLAB
Graduate Institute of Electronics Engineering, NTU
FFT VLSI Implementation
VLSI Signal Processing
1. Shousheng He and Mats Torkelson, "A new approach to pipeline FFT processor," IEEE Proc. of IPPS, pp. 766-770, 1996.
2. E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, pp. 300-305, March 1995.
-
FFT Review
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1$$
with
$$W_N = e^{-j(2\pi/N)}$$
[Figure: 8-point radix-2 FFT signal-flow graph with twelve butterflies and twiddle factors W_N^0 to W_N^3; one side is labeled G[0], G[4], G[2], G[6], G[1], G[5], G[3], G[7] (bit-reversed order), the other X[0] to X[7] (natural order).]
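The review above can be checked numerically. The following is a minimal sketch, not the slide author's code: `dft` evaluates the defining sum directly, and `fft_dit` is a textbook radix-2 decimation-in-time FFT that reads its input in bit-reversed order. All function names are illustrative assumptions, and the input length is assumed to be a power of two.

```python
import cmath

def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) * W_N^(n*k), W_N = exp(-j*2*pi/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_dit(x):
    """Iterative radix-2 DIT FFT (assumes len(x) is a power of two)."""
    n_pts = len(x)
    bits = n_pts.bit_length() - 1
    # Load the input in bit-reversed order; the output comes out in natural order.
    data = [x[bit_reverse(i, bits)] for i in range(n_pts)]
    span = 1
    while span < n_pts:
        for start in range(0, n_pts, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # twiddle factor
                a = data[start + j]
                b = data[start + j + span] * w
                data[start + j] = a + b          # butterfly sum
                data[start + j + span] = a - b   # butterfly difference
        span *= 2
    return data
```

For N = 8, `bit_reverse` yields the index order 0, 4, 2, 6, 1, 5, 3, 7, which matches the G[·] labels in the flow graph.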
-
Implementation: Two Extreme Methods
Slow ----------------- Speed ----------------- Fast
Small ------------------Area------------------- Large
[Figure: 8-point FFT signal-flow graph with twiddle factors W_N^0 to W_N^3; nodes G[0], G[4], G[2], G[6], G[1], G[5], G[3], G[7] and X[0] to X[7].]
Reuse Single Butterfly ------------ Method ------------ Fully Spread
Complicated ------------ Control --------------- Simple
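The two extremes can be put in rough numbers. This is a sketch under simplifying assumptions: a radix-2 FFT, one butterfly operation per unit per pass, and no accounting for memory access or control overhead. The function name and dictionary keys are hypothetical.

```python
def butterfly_tradeoff(n_points):
    """Butterfly counts for the two extremes (n_points must be a power of two)."""
    stages = n_points.bit_length() - 1   # log2(N) butterfly stages
    per_stage = n_points // 2            # radix-2 butterflies in each stage
    total = stages * per_stage           # butterflies in the whole flow graph
    return {
        # Every butterfly of the flow graph gets its own hardware unit.
        "fully_spread": {"butterfly_units": total, "passes_per_unit": 1},
        # One hardware unit is time-multiplexed over all butterflies.
        "single_butterfly": {"butterfly_units": 1, "passes_per_unit": total},
    }
```

For N = 8 this gives 12 butterflies, the number drawn in the fully spread flow graph.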
-
Design Consideration
System requirements, e.g., speed, area, power
To trade off between these two cases, we need
More Processing Elements (PEs)
Better Processing Element utilization rate
Better control scheme
-
FFT Processor: Block Diagram
[Block diagram with blocks COEF ROM, Processing Element (Butterfly), FFT RAM, INPUT BUFFER, and CONTROL; signals DATA IN, DATA OUT, and CONTROL SIGNAL.]
-
Some Current Themes
[Figure: four cascaded BF2 stages with feedback delays of 8, 4, 2, and 1 samples; a "j" label precedes the last stage.]
Radix-2 Single-path Delay Feedback. ( N = 16 )
Radix-2 Multi-path Delay Commutator. ( N = 16 )
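The delay-feedback idea can be illustrated with a small stream-level model. This is a sketch under stated assumptions, not the circuit from the slides: it assumes a decimation-in-frequency ordering, applies the twiddle factor to the differences as they leave the feedback FIFO, and flushes each stage separately instead of modeling the pipeline latency. `sdf_stage` and `r2sdf_fft` are hypothetical names.

```python
import cmath
from collections import deque

def sdf_stage(stream, delay):
    """One radix-2 single-path delay-feedback stage with a `delay`-word FIFO."""
    fifo = deque([0j] * delay)
    span = 2 * delay
    out = []
    # Append `delay` zeros so the last differences are flushed out of the FIFO.
    for t, x in enumerate(list(stream) + [0j] * delay):
        phase = t % span
        if phase < delay:
            # First half of a block: store the input, emit the stored
            # differences of the previous block multiplied by the twiddle.
            twiddle = cmath.exp(-2j * cmath.pi * phase / span)
            y = fifo.popleft() * twiddle
            fifo.append(x)
        else:
            # Second half: butterfly between the delayed and the current sample.
            a = fifo.popleft()
            y = a + x            # the sum leaves immediately
            fifo.append(a - x)   # the difference is fed back into the FIFO
        out.append(y)
    return out[delay:]           # drop the initial fill latency

def r2sdf_fft(x):
    """Chain stages with delays N/2, ..., 2, 1 (len(x) a power of two, >= 2)."""
    n_pts = len(x)
    stream = list(x)
    delay = n_pts // 2
    while delay >= 1:
        stream = sdf_stage(stream, delay)
        delay //= 2
    # The pipeline emits X(k) in bit-reversed order; restore natural order.
    bits = n_pts.bit_length() - 1
    return [stream[int(format(k, "0{}b".format(bits))[::-1], 2)] for k in range(n_pts)]
```

For N = 16 the loop instantiates FIFO lengths 8, 4, 2, 1, the same delays shown in the Radix-2 Single-path Delay Feedback figure.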
-
Some Current Themes (cont.)
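Radix-4 Single-path Delay Feedback. ( N = 256 )
[Figure: four cascaded BF4 stages; the extracted stage labels read 8, 4, 2, 1, with a "j" label preceding the last stage.]
Radix-4 Single-path Delay Commutator. ( N = 256 )
[Figure: four BF4 stages with delay-commutator blocks DC6x64, DC6x16, DC6x4, DC6x1.]
Radix-4 Multi-path Delay Commutator. ( N = 256 )
[Figure: four BF4 stages with C4 commutators; delay-line lengths 192, 128, 64; 16, 32, 48; 48, 32, 16; 4, 8, 12; 12, 8, 4; 1, 2, 3; 3, 2, 1.]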
-
Distinctive Merits of the Above
Delay-feedback architectures are more efficient than delay-commutator architectures in terms of memory utilization.
Radix-4 has higher multiplier utilization; however, radix-2 has simpler butterflies (BFs), which are better utilized.
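The memory claim can be checked against the delay lengths read off the architecture figures. The sketch below simply adds them up. It assumes that a label such as DC6x64 denotes six delay lines of 64 words each, which is an interpretation and not something the slides state; the dictionary keys are hypothetical names.

```python
# Delay-line lengths as they appear in the figures (in data words).
DELAYS = {
    "R2SDF (N=16)": [8, 4, 2, 1],
    "R2^2 SDF (N=256)": [128, 64, 32, 16, 8, 4, 2, 1],
    # Assumption: DC6xL means 6 delay lines of L words.
    "R4SDC (N=256)": [6 * length for length in (64, 16, 4, 1)],
    "R4MDC (N=256)": [192, 128, 64, 16, 32, 48, 48, 32, 16,
                      4, 8, 12, 12, 8, 4, 1, 2, 3, 3, 2, 1],
}

def memory_words(delays):
    """Total number of delay-memory words for each architecture."""
    return {name: sum(lengths) for name, lengths in delays.items()}

totals = memory_words(DELAYS)
```

Under these assumptions the delay-feedback pipelines need N - 1 words (15 for N = 16, 255 for N = 256), while the two radix-4 delay-commutator pipelines for N = 256 need 510 and 636 words.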
-
Comparison
Control Scheme
Simple ----------------------------------- Complex
Processing Ability / Unit
Low ----------------------------------- High
Radix / Speed
Low ----------------------------------- High
To combine the advantages:
Further decompose the high-radix PE
-
Decompose Method (1)
Simply reuse the repeated micro unit
Reuse 4 times
A radix-4 PE
-
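Decompose Method (2)
From the algorithm level
Applying a 3-dimensional index map:
$n = \langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \rangle_N$
$k = \langle k_1 + 2 k_2 + 4 k_3 \rangle_N$
where $n_1, n_2 \in \{0, 1\}$; $n_3 \in \{0, \ldots, N/4 - 1\}$
Summation over $n_1$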
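The equation for this step did not survive extraction. The following is a sketch of the summation over n_1, written out from the index map n = (N/2) n_1 + (N/4) n_2 + n_3, k = k_1 + 2 k_2 + 4 k_3 as used in reference 1. The butterfly symbol B is borrowed from that paper and is an assumption here, since it does not appear in the extracted slide text.

```latex
X(k_1 + 2k_2 + 4k_3)
  = \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1}
    x\!\left(\tfrac{N}{2}n_1 + \tfrac{N}{4}n_2 + n_3\right)
    W_N^{\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
  = \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1}
    B_{N/2}^{k_1}\!\left(\tfrac{N}{4}n_2 + n_3\right)
    W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}

B_{N/2}^{k_1}(m) = x(m) + (-1)^{k_1}\, x\!\left(m + \tfrac{N}{2}\right)
```

The inner sum collapses to a plain radix-2 butterfly because W_N^{(N/2) n_1 (k_1 + 2k_2 + 4k_3)} = (-1)^{n_1 k_1}.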
-
Decompose Method (2) cont.
Summation over n2
Multiplication by -j needs only real-imaginary swapping and sign inversion
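The equations for this step are also missing from the extracted text. As a sketch following reference 1, with the index map n = (N/2) n_1 + (N/4) n_2 + n_3 and k = k_1 + 2 k_2 + 4 k_3, the remaining twiddle factor can be split as shown below. The symbol H is taken from that paper and is an assumption here.

```latex
W_N^{\left(\frac{N}{4}n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
  = W_N^{N n_2 k_3}\, W_N^{\frac{N}{4} n_2 (k_1 + 2k_2)}\,
    W_N^{n_3 (k_1 + 2k_2)}\, W_N^{4 n_3 k_3}
  = (-j)^{n_2 (k_1 + 2k_2)}\, W_N^{n_3 (k_1 + 2k_2)}\, W_{N/4}^{n_3 k_3}

X(k_1 + 2k_2 + 4k_3)
  = \sum_{n_3=0}^{N/4-1}
    \left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3}

H(k_1, k_2, n_3)
  = \left[ x(n_3) + (-1)^{k_1} x\!\left(n_3 + \tfrac{N}{2}\right) \right]
  + (-j)^{(k_1 + 2k_2)}
    \left[ x\!\left(n_3 + \tfrac{N}{4}\right)
         + (-1)^{k_1} x\!\left(n_3 + \tfrac{3N}{4}\right) \right]
```

Inside H the only multiplier left is a power of -j, so the full complex multiplication by W_N^{n_3 (k_1 + 2k_2)} is needed only once per pair of butterfly stages.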
-
Graphical Explanation (N=16)
Trivial multiplication
-
Graphical Explanation (cont.)
The equations are equivalent to the operations below
[Figure: a BF4 block with a control input, and the equivalent BF2 I and BF2 II blocks with a control input.]
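The equivalence can be verified for a single 4-point block. In this minimal sketch `bf4` is a plain 4-point DFT and `bf2i_bf2ii` cascades two radix-2 butterflies with one trivial rotation by -j in between. The function names and the argument order are illustrative assumptions, not taken from the slides.

```python
def mul_neg_j(v):
    """Multiply by -j using only a real/imaginary swap and one sign inversion."""
    return complex(v.imag, -v.real)

def bf4(x0, x1, x2, x3):
    """Radix-4 butterfly, i.e. a 4-point DFT; returns X(0), X(1), X(2), X(3)."""
    return [
        x0 + x1 + x2 + x3,
        x0 - 1j * x1 - x2 + 1j * x3,
        x0 - x1 + x2 - x3,
        x0 + 1j * x1 - x2 - 1j * x3,
    ]

def bf2i_bf2ii(x0, x1, x2, x3):
    """The same result from two cascaded radix-2 butterflies."""
    # BF2 I: combine samples that are N/2 = 2 apart.
    a0, a1 = x0 + x2, x1 + x3
    b0, b1 = x0 - x2, x1 - x3
    # Trivial rotation on the lower difference branch.
    b1 = mul_neg_j(b1)
    # BF2 II: combine samples that are N/4 = 1 apart.
    return [a0 + a1, b0 + b1, a0 - a1, b0 - b1]
```

Both functions return the outputs in natural order, so they can be compared element by element.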
-
Circuit of BF2I
[Figure: BF2I circuit with inputs Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2) and outputs Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2); two operating phases labeled "First N/2 cycles" and "Second N/2 cycles".]
-
Circuit of BF2II
[Figure: BF2II circuit with inputs Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2) and outputs Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2).]
Swap Re & Im and sign inversion
-
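Radix-2^2 Single-path Delay Feedback
[Figure: x(n) -> BF2i (128) -> BF2ii (64) -> BF2i (32) -> BF2ii (16) -> BF2i (8) -> BF2ii (4) -> BF2i (2) -> BF2ii (1) -> X(k), with twiddle multipliers W1(n), W2(n), W3(n) between stages and a clk counter with bit labels 0 to 7.]
FFT architecture using the above technique, for N = 256
Compare with the original architecture, for N = 256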
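A block-level functional model of the radix-2^2 decomposition is sketched below. It models the arithmetic only (a BF2i/BF2ii pair with a trivial -j rotation, followed by one full twiddle multiplication) and not the feedback FIFOs or the clock counter. It assumes the input length is a power of two, and all function names are hypothetical.

```python
import cmath

def mul_neg_j(v):
    """Multiply by -j: swap real and imaginary parts, invert one sign."""
    return complex(v.imag, -v.real)

def radix22_fft(x):
    """Recursive radix-2^2 DIF FFT; returns X(k) in natural order."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    half, quarter = n // 2, n // 4
    # BF2i: butterflies between samples N/2 apart.
    a = [x[i] + x[i + half] for i in range(half)]
    b = [x[i] - x[i + half] for i in range(half)]
    # Trivial rotation by -j on the upper half of the difference branch.
    b = b[:quarter] + [mul_neg_j(v) for v in b[quarter:]]
    # BF2ii: butterflies between samples N/4 apart; list index m = k1 + 2*k2.
    branches = [
        [a[i] + a[i + quarter] for i in range(quarter)],
        [b[i] + b[i + quarter] for i in range(quarter)],
        [a[i] - a[i + quarter] for i in range(quarter)],
        [b[i] - b[i + quarter] for i in range(quarter)],
    ]
    out = [0j] * n
    for m, branch in enumerate(branches):
        # The only full complex multiplication: W_N^(n3 * (k1 + 2*k2)).
        rotated = [v * cmath.exp(-2j * cmath.pi * i * m / n)
                   for i, v in enumerate(branch)]
        out[m::4] = radix22_fft(rotated)   # outputs X(m), X(m+4), X(m+8), ...
    return out

def complex_multiplier_count(n_points):
    """Full twiddle multipliers for n_points = 4**s: one per stage pair but the last."""
    stage_pairs = (n_points.bit_length() - 1) // 2
    return stage_pairs - 1
```

For N = 256 the recursion passes through four BF2i/BF2ii pairs, and `complex_multiplier_count(256)` returns 3, matching the three multipliers W1(n), W2(n), W3(n) in the figure. The twiddles at the last level are all equal to 1.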
-
Structural advantage
Radix-2^2 has the same complexity as radix-4, but still retains the radix-2 BF structure
Only every second stage has a non-trivial multiplication
Control is simple:
synchronization controller
address counter for W
-
Conclusions
1. FFT applications: radar signal processing, fast convolution, spectrum estimation, OFDM-based modulation/demodulation.
2. Efficient VLSI architectures (parallel processing) are required for real-time processing.
3. However, most systems still employ DSP processors (e.g., TI C3x/C5x) for computations (fast algorithms like DIT and DIF FFT).
4. VLIW (Very Long Instruction Word)-based processors (TI C6x) need new programming skills to utilize the two parallel MAC units.