![Page 1: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/1.jpg)
Hoplite-DSP Harnessing the Xilinx DSP48
Multiplexers to efficiently support NoCs on FPGAs
Chethan Kumar H B and Nachiket Kapre [email protected]
![Page 2: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/2.jpg)
Hoplite — FPL 2015 paper
• Jan Gray co-author
• Specs— 60 LUTs+100 FFs— 2.9ns clock
• Smallest FPGA router available + RTL code
2
![Page 3: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/3.jpg)
Router LUTs FFs ClockPenn 1.7K 541 4.5nsCMU 1.5K 635 9.6ns
Hoplite — FPL 2015 60 100 2.9ns
32b payload + Virtex-6 240T
3
![Page 4: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/4.jpg)
Router LUTs FFs ClockPenn 1.7K 541 4.5nsCMU 1.5K 635 9.6ns
Hoplite — FPL 2015 60 100 2.9ns
32b payload + Virtex-6 240T
25x
3
![Page 5: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/5.jpg)
Router LUTs FFs ClockPenn 1.7K 541 4.5nsCMU 1.5K 635 9.6ns
Hoplite — FPL 2015 60 100 2.9ns
32b payload + Virtex-6 240T
25x 5x
3
![Page 6: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/6.jpg)
Router LUTs FFs ClockPenn 1.7K 541 4.5nsCMU 1.5K 635 9.6ns
Hoplite — FPL 2015 60 100 2.9ns
32b payload + Virtex-6 240T
25x 5x 1.5x
3
![Page 7: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/7.jpg)
Router LUTs FFs ClockHoplite
FPL 2015 70 140 2.7ns
Hoplite-DSP FPL 2016 13 17 2.8ns
47b payload + Virtex-7 485T
4
![Page 8: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/8.jpg)
Router LUTs FFs ClockHoplite
FPL 2015 70 140 2.7ns
Hoplite-DSP FPL 2016 13 17 2.8ns
47b payload + Virtex-7 485T
5
![Page 9: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/9.jpg)
Router LUTs FFs ClockHoplite
FPL 2015 70 140 2.7ns
Hoplite-DSP FPL 2016 13 17 2.8ns
47b payload + Virtex-7 485T
5x
5
![Page 10: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/10.jpg)
Router LUTs FFs ClockHoplite
FPL 2015 70 140 2.7ns
Hoplite-DSP FPL 2016 13 17 2.8ns
47b payload + Virtex-7 485T
5x 8x
5
![Page 11: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/11.jpg)
Router LUTs FFs ClockHoplite
FPL 2015 70 140 2.7ns
Hoplite-DSP FPL 2016 13 17 2.8ns
47b payload + Virtex-7 485T
5x 8x ~
5
![Page 12: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/12.jpg)
Router LUTs FFs ClockHoplite
FPL 2015 70 140 2.7ns
Hoplite-DSP FPL 2016 13 17 2.8ns
47b payload + Virtex-7 485T
5x 8x ~+ 1 DSP48
6
![Page 13: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/13.jpg)
7
![Page 14: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/14.jpg)
Motivation
• Close the gap vs. embedded NoCs — do we really want clean-slate hard NoCs?
• Return resources to FPGA application — reduce NoC overheads
• Find clever ways to reuse existing FPGA elements
8
![Page 15: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/15.jpg)
Outline
• Adapting the Hoplite arch. to the DSP48
• Scaling to 2D layouts — using DSP carry chains
• Performance and Resource evaluation
9
![Page 16: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/16.jpg)
Outline
• Adapting the Hoplite arch. to the DSP48
• Scaling to 2D layouts — using DSP carry chains
• Performance and Resource evaluation
10
![Page 17: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/17.jpg)
Overview of Hoplite switch organization
• NoC organised as a unidirectional torus
• Each switch has 2 inputs, 2 outputs into the network + PE connection
• Uses deflection routing — no buffering, no allocation, etc
from: Jan Gray11
![Page 18: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/18.jpg)
Hoplite Internals
5LUT5
LUT
5LUT5
LUT5
LUT
6LUT
WPE E
S/PE
DOR Logicsel0 sel1,2
N
12
![Page 19: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/19.jpg)
Hoplite summary• Bulk of the footprint from 5-LUT, 6-LUT blocks
— implement packet multiplexers
• DOR logic handful of LUTs — only reads address fields, valid signals
• Inter-Hoplite router links pipelined — registers
• Idea: move (1) multiplexers + (2) registers into Xilinx DSP48 block
5LUT5
LUT
5LUT5
LUT5
LUT
6LUT
WPE E
S/PE
DOR Logicsel0 sel1,2
N
13
![Page 20: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/20.jpg)
Xilinx DSP48 block
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
14
![Page 21: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/21.jpg)
Xilinx DSP48 block
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
15
![Page 22: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/22.jpg)
Programmable elements
• Xilinx DSP block very versatile!
• Typical use case: signal processing, streaming computations => mainly arithmetic
• INMODE — 27b multiplexer between A and D OPMODE — 48b multiplexers between A:B, C
• Exploit cascade links PCIN/PCOUT!
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
16
![Page 23: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/23.jpg)
Input + Multiplexer Mapping
5LUT5
LUT
5LUT5
LUT5
LUT
6LUT
WPE E
S/PE
DOR Logicsel0 sel1,2
N
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
17
![Page 24: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/24.jpg)
Input + Multiplexer Mapping
5LUT5
LUT
5LUT5
LUT5
LUT
6LUT
WPE E
S/PE
DOR Logicsel0 sel1,2
N
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
WEST
PE
N
S/PE
EAST
18
![Page 25: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/25.jpg)
Input + Multiplexer Mapping
5LUT5
LUT
5LUT5
LUT5
LUT
6LUT
WPE E
S/PE
DOR Logicsel0 sel1,2
N
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
WEST
PE
N
S/PE
EAST
19
![Page 26: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/26.jpg)
Input + Multiplexer Mapping
5LUT5
LUT
5LUT5
LUT5
LUT
6LUT
WPE E
S/PE
DOR Logicsel0 sel1,2
N
A
D
B
C
30/
27/
18/
48/
P
48/
27/
48/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
WEST
PE
N
S/PE
EAST
20
![Page 27: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/27.jpg)
Multi-cycling
• Problem: Hoplite has two outputs (three in fact, with S/PE output port shared)
• Solution: must multi-pump the DSP block — runs at 2x the frequency of the PEs
• First sub-cycle — resolve EAST output
• Second sub-cycle — resolve SOUTH/PE output
21
![Page 28: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/28.jpg)
First cycle
A
D
B
C
30/
27/
18/
48/
27/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
PE Input
West Input
East Output
48/
P48/
CE
22
![Page 29: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/29.jpg)
Second cycle
A
D
B
C
30/
27/
18/
48/
27/
PCIN48/
ALU
X
Z
Y
PCOUTOPMODE ALUMODEINMODE
PE Input
West Input
South/PE Output
48/
P48/
North Input
CE
23
![Page 30: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/30.jpg)
Outline
• Adapting the Hoplite arch. to the DSP48
• Scaling to 2D layouts — using DSP carry chains
• Performance and Resource evaluation
24
![Page 31: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/31.jpg)
DSP48 columnar layout
DSP48E
DSP48E
PCOUT
PCIN
A:B
C
P DSP48E
UserLogic
A:B
DSP48E
PCOUT
PCIN
DSP48E P
dedicatedcascade routes
programmable FPGA interconnectDSP
Column
DORLogic
25
![Page 32: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/32.jpg)
Layout considerations• FPGA DSPs organised into vertical columns
~100s of DSPs in a column~10s of columns
• Restrictions:1. Cascade links only extend within column 2. Horizontal links must use general interconnect
• Key question: Adjusting NoC size vs. DSP count— use passthrough DSPs
26
![Page 33: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/33.jpg)
Embedded layout
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplitefabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
27
![Page 34: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/34.jpg)
Comparing Xilinx Virtex6 and Virtex7 Layouts
8x8 NoC (ML605 board)
16x16 NoC (VC707 board)
28
![Page 35: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/35.jpg)
Outline
• Adapting the Hoplite arch. to the DSP48
• Scaling to 2D layouts — using DSP carry chains
• Performance and Resource evaluation
29
![Page 36: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/36.jpg)
LUTs vs DSPs
30
• Simple tradeoff— substantially fewer LUTs vs. DSP48s— Importantly, FFs absorbed into DSP48
• Power and effective B/W for random traffic mostly identical
![Page 37: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/37.jpg)
LUTs vs DSPs
31
• Simple tradeoff— substantially fewer LUTs vs. DSP48s— Importantly, FFs absorbed into DSP48
• Power and effective B/W for random traffic mostly identical
![Page 38: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/38.jpg)
Commentary on hard NoCs• Area:
— Hard router = 12.45 LABs— 1 Altera DSP block = 11.9 LABs Stratix-III— Hoplite-DSP marginally smaller
• Speed:— Hard router ~996 MHz— Hoplite-DSP ~650 MHz (multi-pumped)— Hoplite-DSP limits freq advantage to 3x.
• Power— Hard router ~1.58 W— Hoplite-DSP model ~1.1W 15% activity— Hoplite-DSP uses ~50% less power
32
Abdelfattah + Betz [TRETS2014](extrapolated results for 48b-wide 1VC)
![Page 39: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/39.jpg)
Wish-list for DSP48s Gen2• Configurable Cascades
— 48b switched bidirectional routing instead of just cascades (approach hard NoC wiring)— option to skip DSP blocks (segment lengths)
• DOR routing— pattern detection logic with multiple masks (similar to Altera DSP units)
• SIMD Multiplexing — fracturing 48b-wide lanes into multiple lanes
33
![Page 40: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/40.jpg)
Conclusions
• Hoplite muxes mapped to DSP48 blocks — use the dynamic OPMODE feature
• Reduce cost by 5x LUTs, 8x FFs per router
• Exploit cascade links to absorb NoC wiring
• Significantly close the gap with hard NoCs
34
![Page 41: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/41.jpg)
Embedded layout• Three kinds of DSPs
• “Route DSPs” — Small fraction of DSPs for switching
• “Pass-through DSPs” — glorified “pipelined wires” — multi-pumping 50% back to user
• “Corner-turn DSPs”— connect cascades to fabric
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
fabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
fabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
fabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
fabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
fabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
35
![Page 42: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/42.jpg)
Physical FPGA layout
Hoplite
Hoplite
DSP48E
cascade
fabric
Hoplite
DSP48E
DSP48E
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
Hoplite
DSP48E
Hoplite
DSP48E
DSP48E
Hoplite
fabric
Top-Turn DSPs PCIN to P
Bottom-Turn DSPs A:B to PCOUT
DSP48EDSP48E DSP48E DSP48E
Pass-thru DSPsPCOUT to PCIN
Pass-thru DSPsPCOUT to PCIN
Router DSPs
Router DSPs
Router DSPs
2x2 NoC (ML605 board)
Corner-Turn
Pass-Thru
Hoplite
36
![Page 43: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/43.jpg)
![Page 44: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/44.jpg)
Efficiency
38
![Page 45: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/45.jpg)
Efficiency
39
![Page 46: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/46.jpg)
Efficiency
40
![Page 47: hoplite-dsp - FPL 2016 · Hoplite-DSP Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs Chethan Kumar H B and Nachiket Kapre nachiket@ieee.org. Hoplite](https://reader035.vdocuments.us/reader035/viewer/2022062602/5ee1f8dfad6a402d666ca630/html5/thumbnails/47.jpg)
Efficiency
41
DSP48s less-efficient than LUT-based Hoplite!