An Energy-Efficient Reconfigurable Multiprocessor IC for DSP Applications
[Figure: Processor Architecture. Two rings of four processors (P1A to P4A and P1B to P4B) connected through shared register files (R12A/B, R23A/B, R34A/B, R41A/B). Each processor contains a register file, operation scheduling, an ALU (output ALUR), and program memory, with a control input and an output; results are written back to the register files.]
• Multiple programmable VLIW processors arranged in a ring topology
– Balances functionality between ASICs and general-purpose digital signal processors
– Distributed memories, with direct inter-processor communication through register files
– Flexible choice of computing resources
• Energy-efficient DSP is achieved by exploiting the IC's multi-level reconfigurable architecture
– Efficient mapping of algorithms onto the multiprocessor
– Inside each processor, computation modules (e.g., multipliers) can be turned off by instructions to improve energy efficiency
– A scalable datapath provides a means of trading off performance against power efficiency
– Memory localization through distributed memories also contributes to power savings
• Variable word-length 20-tap FIR filter and 8-point FFT (16-, 24- and 48-bit)
– In 16- and 24-bit modes, ring A is in use and ring B is in "sleep mode"; in 48-bit mode, both rings are active, with ring A handling the 24 MSBs and ring B the 24 LSBs
– Booth multipliers are used in 16-bit mode; serial multipliers are employed in 24- and 48-bit modes
– In the FFT, multipliers are active only for multiplications by W8^1 and W8^3 (W8^0 = 1 and W8^2 = -j need no multiplier)
• Reconfigurable Viterbi decoder (K = 6 to 9, r = 1/2 and 1/3)
– Efficient add-compare-select (ACS) implementation and path metric memory localization
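In 48-bit mode the word is split across the two rings, with ring A holding the 24 MSBs and ring B the 24 LSBs. The poster does not say how the halves are combined; the Python sketch below assumes, for addition, that ring B's carry-out is forwarded to ring A, which is the minimum needed for a correct 48-bit sum. The names `add24` and `add48_split` are hypothetical.

```python
MASK24 = (1 << 24) - 1


def add24(a: int, b: int, cin: int) -> tuple[int, int]:
    """One 24-bit adder slice: returns (24-bit sum, carry-out)."""
    total = a + b + cin
    return total & MASK24, total >> 24


def add48_split(x: int, y: int) -> int:
    """48-bit addition built from two 24-bit slices (hypothetical model of the ring split).

    Assumption: ring B adds the 24 LSBs and forwards its carry to ring A,
    which adds the 24 MSBs. The result wraps modulo 2**48.
    """
    lo, carry = add24(x & MASK24, y & MASK24, 0)                 # ring B: low halves
    hi, _ = add24((x >> 24) & MASK24, (y >> 24) & MASK24, carry)  # ring A: high halves + carry
    return (hi << 24) | lo
```

For example, `add48_split(0xFFFFFF, 1)` carries out of the low slice and yields `0x1000000`.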
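The 16-bit mode uses Booth multipliers, while the 24- and 48-bit modes use serial multipliers. The Python sketch below contrasts the two as behavioural models: radix-4 (modified) Booth recoding, which produces one partial product per two multiplier bits, and a bit-serial shift-and-add loop that consumes one multiplier bit per cycle. Whether the chip uses radix-2 or radix-4 Booth recoding is not stated; radix-4 is an assumption, and both function names are hypothetical.

```python
def booth_radix4(multiplicand: int, multiplier: int, bits: int = 16) -> int:
    """Signed multiply via radix-4 Booth recoding of a `bits`-bit two's-complement multiplier."""
    assert bits % 2 == 0 and -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1))
    # Two's-complement pattern of the multiplier with an implicit b[-1] = 0 appended.
    m = (multiplier & ((1 << bits) - 1)) << 1
    # Group (b[2i+1], b[2i], b[2i-1]) -> Booth digit in {-2, -1, 0, +1, +2}
    digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    product = 0
    for i in range(bits // 2):              # one partial product per 2 multiplier bits
        group = (m >> (2 * i)) & 0b111
        product += (digit[group] * multiplicand) << (2 * i)
    return product


def serial_multiply(a: int, b: int, bits: int) -> tuple[int, int]:
    """Bit-serial shift-and-add multiply of unsigned `bits`-bit operands.

    Returns (product, cycles); one multiplier bit is examined per cycle.
    """
    assert 0 <= a < (1 << bits) and 0 <= b < (1 << bits)
    product, cycles = 0, 0
    for i in range(bits):
        if (b >> i) & 1:
            product += a << i               # add the shifted multiplicand
        cycles += 1
    return product, cycles
```

For instance, `booth_radix4(-3, 7)` returns `-21` from 8 partial products, whereas `serial_multiply(3, 7, 24)` returns `(21, 24)`, trading latency for a much smaller multiplier.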
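The FFT bullet states that multipliers are active only for W8^1 and W8^3. In an 8-point radix-2 FFT the twiddle factors are W8^k = exp(-j2πk/8) for k = 0 to 3, and W8^0 = 1 and W8^2 = -j reduce to a pass-through and a real/imaginary swap with a sign change. The Python sketch below is a textbook decimation-in-time FFT, not necessarily the chip's dataflow, that records which twiddles actually reach a multiplier; the function names are hypothetical.

```python
import cmath


def twiddle_multiply(x: complex, k: int, used: list[int]) -> complex:
    """Return x * W8^k, where W8^k = exp(-2j*pi*k/8) and k is in 0..3."""
    if k == 0:                        # W8^0 = 1: pass through, no multiplier
        return x
    if k == 2:                        # W8^2 = -j: (a + jb)(-j) = b - ja, a swap plus negation
        return complex(x.imag, -x.real)
    used.append(k)                    # W8^1 and W8^3 need a genuine complex multiplication
    return x * cmath.exp(-2j * cmath.pi * k / 8)


def fft8(x: list[complex]) -> tuple[list[complex], list[int]]:
    """Radix-2 decimation-in-time 8-point FFT.

    Returns the spectrum and the list of twiddle exponents that reached a multiplier.
    """
    assert len(x) == 8
    a = [complex(x[r]) for r in (0, 4, 2, 6, 1, 5, 3, 7)]   # bit-reversed input order
    used: list[int] = []
    size = 2
    while size <= 8:
        half, stride = size // 2, 8 // size   # W_size^j equals W8^(j * stride)
        for start in range(0, 8, size):
            for j in range(half):
                t = twiddle_multiply(a[start + j + half], j * stride, used)
                u = a[start + j]
                a[start + j], a[start + j + half] = u + t, u - t   # butterfly
        size *= 2
    return a, used
```

Whatever the input, `fft8` reports `used == [1, 3]`: only the last stage's two non-trivial twiddles need a multiplier, so the multiplier can stay off for every other butterfly.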
[Figure: ALU datapath for Viterbi add-compare-select. The 16-bit operands {PM0, PM1} and {6'h00, BM0, 6'h00, BM1} feed two 8-bit adders; a compare stage and a MUX produce the 8-bit survivor metric. Mode<1:0>, sub, Cin_ext, and a Viterbi enable select between this ACS operation and the general addition output.]
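The ALU datapath figure shows the 16-bit adder reused for the Viterbi add-compare-select step: {PM0, PM1} is added to {6'h00, BM0, 6'h00, BM1}, the two byte lanes are compared, and a MUX selects the survivor metric. The Python sketch below models this under our own assumptions: the carry between the two 8-bit adders is cut in Viterbi mode, BM0 and BM1 are 2-bit values (as the 6-bit zero padding inside a 16-bit word suggests), and the smaller accumulated metric survives. `split_add16` and `acs` are hypothetical names.

```python
MASK8 = 0xFF


def split_add16(a: int, b: int, viterbi_mode: bool) -> int:
    """16-bit addition built from two 8-bit adders.

    In Viterbi mode the carry from the low adder into the high adder is cut,
    so the two bytes act as independent 8-bit lanes.
    """
    lo = (a & MASK8) + (b & MASK8)
    carry = 0 if viterbi_mode else lo >> 8
    hi = ((a >> 8) & MASK8) + ((b >> 8) & MASK8) + carry
    return ((hi & MASK8) << 8) | (lo & MASK8)


def acs(pm0: int, pm1: int, bm0: int, bm1: int) -> tuple[int, int]:
    """Add-compare-select on packed operands; returns (survivor_metric, decision_bit).

    Assumes path metrics are normalized so each lane sum stays below 256.
    """
    assert 0 <= pm0 < 256 and 0 <= pm1 < 256 and 0 <= bm0 < 4 and 0 <= bm1 < 4
    packed_pm = (pm0 << 8) | pm1              # {PM0, PM1}
    packed_bm = (bm0 << 8) | bm1              # {6'h00, BM0, 6'h00, BM1}
    sums = split_add16(packed_pm, packed_bm, viterbi_mode=True)
    cand0, cand1 = sums >> 8, sums & MASK8    # one candidate per lane
    if cand1 < cand0:                         # compare, then MUX-select the smaller
        return cand1, 1
    return cand0, 0
```

For example, `acs(10, 12, 3, 0)` compares 13 against 12 and returns `(12, 1)`; with `viterbi_mode=False` the same adders perform an ordinary 16-bit addition, so `split_add16(0x00FF, 0x0001, False)` gives `0x0100`.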
Energy-Efficient DSP Applications on the Multiprocessor IC
[Figure: Results. (a) FIR & FFT: power consumption (mW) vs. VDD (V); (b) FIR & FFT: maximum throughput (MHz) vs. VDD (V), both for 16-, 24- and 48-bit FIR and FFT; (c) Viterbi decoder: power consumption (mW) vs. VDD (V) for K = 6 to 9 at 10 Mbps; (d) Viterbi decoder: maximum decode rate (Mbps) vs. VDD (V) for K = 6 to 9.]
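Panels (c) and (d) sweep the constraint length from K = 6 to 9. A rate-1/n convolutional code with constraint length K has 2^(K-1) trellis states, so each decoded bit costs 32 ACS operations at K = 6 and 256 at K = 9, and the path-metric storage grows by the same factor. The hard-decision Viterbi decoder below, in Python, takes K and the generator polynomials as parameters, covering both r = 1/2 (two polynomials) and r = 1/3 (three). It is a generic software reference, not the chip's implementation; the function names are hypothetical, and the polynomials in the test (octal 171/133 for K = 7) are a widely used standard pair chosen only for illustration.

```python
def _branch_output(reg: int, polys: list[int]) -> list[int]:
    """Encoder output bits for a K-bit register (newest input bit in the MSB)."""
    return [bin(reg & g).count("1") & 1 for g in polys]


def conv_encode(bits: list[int], polys: list[int], K: int) -> list[int]:
    """Rate-1/len(polys) convolutional encoder starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state        # K-bit shift register
        out.extend(_branch_output(reg, polys))
        state = reg >> 1                    # keep the K-1 most recent bits
    return out


def viterbi_decode(received: list[int], polys: list[int], K: int) -> list[int]:
    """Hard-decision Viterbi decoder (Hamming branch metrics, full traceback)."""
    n, num_states = len(polys), 1 << (K - 1)
    INF = float("inf")
    metrics = [0] + [INF] * (num_states - 1)   # encoder starts in state 0
    history = []                               # predecessor of each state, per step
    for t in range(0, len(received), n):
        r = received[t:t + n]
        new_metrics = [INF] * num_states
        pred = [0] * num_states
        for state in range(num_states):
            if metrics[state] == INF:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                bm = sum(e != x for e, x in zip(_branch_output(reg, polys), r))
                nxt, m = reg >> 1, metrics[state] + bm
                if m < new_metrics[nxt]:        # add-compare-select
                    new_metrics[nxt], pred[nxt] = m, state
        metrics = new_metrics
        history.append(pred)
    # Trace back from the state with the smallest accumulated metric.
    state = min(range(num_states), key=metrics.__getitem__)
    decoded = []
    for pred in reversed(history):
        decoded.append((state >> (K - 2)) & 1)  # newest input bit is the state's MSB
        state = pred[state]
    return decoded[::-1]
```

To decode a message, append K-1 zero flush bits before encoding so the trellis ends in state 0, then keep the first `len(msg)` decoded bits; the inner loop runs once per state, which is where the 2^(K-1) cost shows up.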
Results and Conclusion
• Conclusion: the multiprocessor IC achieves performance close to that of ASIC solutions while offering a degree of flexibility otherwise available only in general-purpose digital signal processors