Low Power QDI Asynchronous FFT
Benjamin Z. Tang, Frank Lane
Qualcomm Research
May 10, 2016

Motivation
• Extremely long battery life is crucial for M2M communication
• Must push innovation in low-power communication
• FFT is a common IP block that is computationally and memory intensive

FFT
• DFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2\pi}{N} k n}, \quad k = 0, 1, \ldots, N-1$
• FFT: O(N log(N))
• Implementation options:
  • Algorithm level: radix, DIT/DIF, number representation
  • Circuit level: twiddle multiplication, CORDIC rotation, multi-rate clocking, etc.
• Reference sync version:
  • Radix-$2^3$
  • Decimation in frequency (DIF)
  • 16-bit real, 16-bit imaginary
  • Twiddle multiplication: (a+bj)*(c+dj) = (ac-bd) + (ad+bc)j
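
The DFT definition, the decimation-in-frequency idea and the four-multiply twiddle product from this slide can be checked against each other with a short sketch. It uses a plain recursive radix-2 DIF FFT in floating point, not the radix-2^3, 16-bit fixed-point reference design; the function names `dft`, `fft_dif` and `twiddle_mul` are illustrative and do not come from the slides.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (len(x) must be a power of two)."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Butterfly: sums feed the even-indexed outputs, twiddled differences the odd ones.
    top = [x[n] + x[n + half] for n in range(half)]
    bot = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
           for n in range(half)]
    out = [0j] * N
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bot)
    return out

def twiddle_mul(a, b, c, d):
    """(a+bj)*(c+dj) = (ac-bd) + (ad+bc)j, returned as (real, imag)."""
    return (a * c - b * d, a * d + b * c)
```

For a 128-point input, `fft_dif` and `dft` should agree to within floating-point rounding; the fixed-point word lengths of the actual design would add quantization error that this sketch does not model.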

FFT Architecture
[Figure: Radix-$2^3$ FFT block diagram. Labels: IN, Memory, Twiddle, Stage 1 Memory, Stage 2 Memory, Stage 3 Memory; constant multipliers -j, W8 and -jW8; sizes Ni/2, Ni/4, Ni/8.]
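
The constant multipliers labelled in the architecture figure (-j, W8 and -jW8) are cheap compared with a general twiddle multiplication. A minimal sketch, assuming the usual definition W8 = exp(-j·π/4) = (1 - j)/√2 and illustrative function names, shows why: -j needs only a swap and a negation, and W8 needs two additions and one shared scale factor.

```python
import math

def mul_neg_j(a, b):
    """(a + bj) * (-j) = b - aj: a swap and one negation, no multiplier."""
    return (b, -a)

def mul_w8(a, b):
    """(a + bj) * W8, with W8 = exp(-j*pi/4) = (1 - j)/sqrt(2)."""
    s = math.sqrt(0.5)
    return ((a + b) * s, (b - a) * s)

def mul_neg_j_w8(a, b):
    """(a + bj) * (-j * W8): compose the two rotations above."""
    return mul_neg_j(*mul_w8(a, b))
```

How the actual design implements these constants in hardware is not stated on the slide; the sketch only illustrates the arithmetic.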

Async FFT
• QDI (quasi-delay-insensitive)
• Micro-architecture optimization highlights:
  • Token-ring memory controls
  • CORDIC twiddle multiplication

Token-Ring Memory Intro
Example:
• 1st stage of 8-pt FFT
• 1-D ring
[Figure: 1-D token ring of four control cells (each with Tin, Tout, C) over stage-memory locations 0 to 3. Signals: stage mem input, Out, Out delayed (to butterfly). Access labels: W, WR, R.]
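
One way to read the 1-D ring in this example is as a delay line: a single token circulates over the four memory cells, and whichever cell holds the token is read and then overwritten. The sketch below is a behavioural model under that assumption, not the QDI circuit itself; the class name `TokenRingMemory` and its interface are hypothetical.

```python
class TokenRingMemory:
    """Stage memory whose cells are selected by a circulating one-hot token."""

    def __init__(self, depth):
        self.cells = [None] * depth                  # None marks a cell not yet written
        self.token = [True] + [False] * (depth - 1)  # exactly one cell holds the token

    def step(self, value):
        """Read then write the token-holding cell, then pass the token on."""
        i = self.token.index(True)
        delayed = self.cells[i]                      # value stored `depth` steps ago, or None
        self.cells[i] = value
        self.token[i] = False
        self.token[(i + 1) % len(self.token)] = True
        return delayed

ring = TokenRingMemory(4)
outputs = [ring.step(v) for v in range(8)]
# The first four steps only write (W); after that every step reads and writes (WR).
```

With a depth of 4, sample n comes back out alongside sample n+4, which is the pairing the first butterfly stage of an 8-point DIF FFT needs. No address counter or decoder appears in the model: the position of the token is the address.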

Token-Ring Controls
16-point FFT, 1st stage example
• 2 passes, need 2-D ring
• 1st pass: store 8, read 8, once
• 2nd pass: store 1, read 1, 8 times
• Token ring keeps track of pattern
[Figure: token-ring control cells (each with Tin, Tout, C) over stage-memory locations 0 to 7. Signals: stage mem input, Out, Out delayed (to butterfly). Access labels: W, WR, R.]

Token-Ring Controls
128-point FFT, 1st stage example
• 3 passes, need 3-D ring, with 8 groups of 8
• 1st pass: store 64, read 64, once
• 2nd pass: store 8, read 8, 8 times
• 3rd pass: store 1, read 1, 64 times
[Figure: token-ring control cells 0, 1, ..., 7, 8, 9, ..., 15, ... arranged as Group 0, Group 1, Group 2, ..., Group 7. Port labels: Tin, Tin0, Tin1, Tout, Tout0, Tout1, C.]
Token rings provide controls for memory read/write, automatic back-pressure, counting and addressing
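
The "8 groups of 8" arrangement can be pictured as nested rings, where an inner token selects a cell within a group and an outer token selects the group, so the pair of token positions acts as both counter and address. The sketch below models only that counting and addressing behaviour under this assumed reading; the pass scheduling and back-pressure of the real design are not modelled, and the names `Ring` and `addresses` are hypothetical.

```python
class Ring:
    """One-hot token ring; advance() reports True when the token wraps around."""

    def __init__(self, size):
        self.size = size
        self.pos = 0                  # index of the cell holding the token

    def advance(self):
        self.pos = (self.pos + 1) % self.size
        return self.pos == 0          # True once per full lap

def addresses(n_groups=8, group_size=8, laps=1):
    """Yield (group, cell) pairs selected by an outer and an inner ring."""
    outer, inner = Ring(n_groups), Ring(group_size)
    for _ in range(n_groups * group_size * laps):
        yield (outer.pos, inner.pos)
        if inner.advance():           # inner lap complete: hand the outer token on
            outer.advance()

seq = list(addresses())
```

Flattening each pair as `8 * group + cell` visits the 64 locations in order, and both tokens return to their starting cells after one full lap, so the pattern repeats without any explicit reset.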

CORDIC Rotation Intro
• Twiddle ($e^{-j 2\pi k n / N}$) multiplication = rotation by angle $-2\pi k n / N$
• Pipelined vs iterative
• Performance, area, clock cycles

$$
\begin{aligned}
x_{i+1} &= x_i - y_i \times d_i \times 2^{-i} \\
y_{i+1} &= y_i + x_i \times d_i \times 2^{-i} \\
z_{i+1} &= z_i - d_i \times \arctan\left(2^{-i}\right) \\
d_i &= \begin{cases} -1, & z_i < 0 \\ 1, & z_i \ge 0 \end{cases} \\
x_0 &= x_{in}, \quad y_0 = y_{in}, \quad z_0 = \text{rotation angle}
\end{aligned}
$$

Chose iterative architecture
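
The recurrence on this slide can be exercised directly in floating point. The sketch below is a plain software model of the iterative form; the final gain compensation is standard CORDIC practice rather than something shown on the slide, and the name `cordic_rotate` and the default of 16 iterations are illustrative choices.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians using the shift-and-add recurrence."""
    z = angle
    for i in range(iterations):
        d = -1.0 if z < 0 else 1.0
        # Tuple assignment uses the old x and y on the right-hand side.
        x, y = x - y * d * 2.0 ** -i, y + x * d * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    # Each micro-rotation stretches the vector by sqrt(1 + 2^(-2i)); undo that gain.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    return x / gain, y / gain
```

For example, `cordic_rotate(1.0, 0.0, -math.pi / 4)` should land close to (cos(-π/4), sin(-π/4)). The recurrence only converges for angles within roughly ±99.9 degrees, so a twiddle angle outside that range would first need a quadrant adjustment, which is not covered here.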

Asynchronous CORDIC Engine
• 6 iterations
• Bypass CORDIC for 0 degrees
  • About 37% of the time in 128-point FFT
• Increased performance, reduced power
[Figure: iterative CORDIC datapath. Labels: inputs X0, Y0, Z0; outputs Xout, Yout; input multiplexers; arithmetic right shifts (>>>i); ×(-1) blocks; atan; msb; Dinv; Xs, Ys; count; scale; widths 18, 16 and 7; control sequences C=*[1,0,0,0,0,0,0] and Cend=*[0,0,0,0,0,0,1].]
The engine cycles through its iterations at its own speed, without being constrained by the system clock
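
An integer model makes the two engine features on this slide concrete: a fixed count of six shift-and-add iterations, and a bypass that returns the input untouched when the rotation angle is zero. The sketch is an assumption-laden illustration, not the engine's actual arithmetic: the fixed-point formats (`FRAC`, `ANGLE_FRAC`), the single multiply used for gain compensation and the name `cordic_fixed` are all hypothetical.

```python
import math

ITER = 6          # iteration count from the slide
FRAC = 14         # assumed fractional bits for x and y
ANGLE_FRAC = 12   # assumed fractional bits for the angle z, in radians

# Arctangent table and inverse gain, quantized to the assumed formats.
ATAN = [round(math.atan(2.0 ** -i) * (1 << ANGLE_FRAC)) for i in range(ITER)]
INV_GAIN = round((1 << FRAC) /
                 math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(ITER)))

def cordic_fixed(x, y, z):
    """Rotate integer (x, y) by integer angle z; skip the loop when z == 0."""
    if z == 0:
        return x, y                      # bypass: zero-degree twiddle
    for i in range(ITER):
        # Python's >> on negative ints is an arithmetic shift, like >>>i in the figure.
        if z < 0:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
        else:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
    # Compensate the accumulated CORDIC gain with one multiply and shift.
    return (x * INV_GAIN) >> FRAC, (y * INV_GAIN) >> FRAC
```

With only six iterations the residual angle error can be as large as arctan(2^-5), a little under 2 degrees, so the result is an approximation whose accuracy depends on the angle. The bypass path costs no iterations at all, which is where the performance and power benefit for zero-degree twiddles would come from.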

Results - FFT Plot
• CHP → production rules → transistor-level (SPICE) netlist
• Sinusoid input, negligible difference compared to result from Matlab's native FFT function

Results – Spice Waveforms
[Figure: SPICE waveforms. Panels: Inputs, 1st stage, 2nd stage, 3rd stage, Outputs.]

Results - Power

| Subsystems | Energy (nJ) |
|---|---|
| Memories, controls, butterflies, others | 3.1 |
| CORDIC | 2.8 |
| Total | 5.9 |

• 65 nm technology
• Vdd = 1 V
• 10 MHz data rate
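
The energy figures can be turned into a rough average power and an energy per sample. The slide does not say what interval the energy covers, so the sketch below assumes each figure is the energy of one complete 128-point transform; under a different interpretation the derived numbers would change. Variable names are illustrative.

```python
ENERGY_NJ = {
    "memories, controls, butterflies, others": 3.1,
    "CORDIC": 2.8,
}
N_POINTS = 128        # FFT length of the design described in this deck
DATA_RATE_HZ = 10e6   # samples per second, from the slide

total_nj = sum(ENERGY_NJ.values())
shares = {name: e / total_nj for name, e in ENERGY_NJ.items()}

# Assumption: the energy figure covers one complete 128-point transform.
transform_time_s = N_POINTS / DATA_RATE_HZ          # 12.8 microseconds
avg_power_w = total_nj * 1e-9 / transform_time_s
energy_per_sample_pj = total_nj * 1e3 / N_POINTS

print(f"total energy      : {total_nj:.1f} nJ")
print(f"CORDIC share      : {shares['CORDIC']:.1%}")
print(f"average power     : {avg_power_w * 1e3:.3f} mW")
print(f"energy per sample : {energy_per_sample_pj:.1f} pJ")
```

Under that assumption the CORDIC engine accounts for a little under half of the total.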

Results – Comparison

| | This Work | Sync (Chip)* | [1] | [2] |
|---|---|---|---|---|
| Tech | 65 nm | 65 nm | 65 nm | 0.35 µm |
| Voltage | 1.0 V | 1.0 V | 0.3 V | 1.1 V |
| N-point | 128 | 128 | 128 | 128 |
| Data rate | 10 MHz | 10 MHz | - | 16 kHz |
| Energy | 5.9 nJ | 205 nJ | 31 nJ | 120 nJ |

* Normalized to same data rate and FFT length
[1] K.-S. Chong, J. Chang, I. Ebong, Y. Yilmaz, and P. Mazumder, “Comparison of FFT/IFFT Designs Utilizing Different Low Power Techniques,” in Electronic System Design (ISED), 2012 International Symposium.
[2] K.-S. Chong, B.-H. Gwee, and J. S. Chang, “Energy-Efficient Synchronous-Logic and Asynchronous-Logic FFT/IFFT Processors,” IEEE Journal of Solid-State Circuits, 2007.
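
The energy row of the comparison is easier to read as ratios against this work. The sketch below only divides the published figures; it makes no attempt to correct for the different technologies, supply voltages and data rates in the other columns, so the ratios are indicative rather than like-for-like.

```python
ENERGY_NJ = {"This Work": 5.9, "Sync (Chip)": 205.0, "[1]": 31.0, "[2]": 120.0}

baseline = ENERGY_NJ["This Work"]
ratios = {name: e / baseline for name, e in ENERGY_NJ.items()}

for name, r in ratios.items():
    print(f"{name:12s} {ENERGY_NJ[name]:6.1f} nJ  {r:5.1f}x")
```

Only the Sync (Chip) column shares the 65 nm, 1.0 V, 10 MHz operating point with this work, so its ratio is the most direct of the three.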

Summary
• Low power clockless FFT design
  • Same design concepts can be extended to high performance systems
  • Can lower supply voltage further for near-threshold computing
• Simple, fast token-ring memory controls
• Small, fast CORDIC engine

Acknowledgment
• Rajit Manohar for async CAD tools
Thank you
For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2016 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.