Best of Both Worlds: A Bus-Enhanced Network on-Chip
(BENoC)
Ran Manevich, Isask’har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny
Technion – Israel Institute of Technology
May, 2009
Network on-Chip: the Good News
Interconnect for SoCs, CMPs and FPGAs
  Multi-hop, packet-based communication
  Efficient resource sharing
Scalable performance and efficiency in
  Power
  Area
  Design productivity
System Bus
Network on-Chip: the Bad News
Increased and hard-to-predict latency due to multi-hop and sharing
  Time-critical signals
Broadcast? Multicast?
  No easy solutions
  Slow (10s of cycles)
I wish I had a bus at hand…
Solution: Bus-Enhanced NoC (BENoC)
Bus re-introduced as a NoC “add-on”
Use NoC for data
  Optimized for high bandwidth
Use bus for short meta-data
  Low bandwidth, low latency
  Broadcast, multicast
Overhead should be justified!
[Figure: NoC of routers (R) and modules, enhanced with a bus]
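The division of labor on this slide can be read as a simple dispatch rule. The following is a minimal sketch only: the threshold `BUS_MAX_WORDS`, the `Message` fields, and the exact rule are hypothetical and are not taken from the slides, which only say that the bus carries short, low-latency, broadcast or multicast meta-data.

```python
from dataclasses import dataclass

# Hypothetical threshold: the slides only say the bus carries "short" meta-data.
BUS_MAX_WORDS = 4


@dataclass
class Message:
    words: int                   # payload length in words
    destinations: int            # number of receivers (1 = unicast)
    time_critical: bool = False  # latency-sensitive signal?


def choose_interconnect(msg: Message) -> str:
    """Return "bus" for short messages that are time critical or have
    several destinations; return "noc" for everything else."""
    is_short = msg.words <= BUS_MAX_WORDS
    if is_short and (msg.time_critical or msg.destinations > 1):
        return "bus"
    return "noc"
```

Under these assumptions a two-word multicast goes over the bus, while a 64-word unicast data transfer stays on the NoC.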
Related Work
In-band support of time-critical communication; and in-band multicast/broadcast
  Complex router implementation
  Suffer from multi-hop latency
Existing Bus-NoC hybrids
  Form a topological hierarchy
  Typically bus used for local communication
[Figure: networks of routers (R) and modules]
BENoC Services
Fast unicast and multicast signaling
  CMP cache example
Anycast
  Find resources that fulfill certain conditions
  E.g., “Looking for an idle DSP”; or “Where are the 5 closest multipliers?”
Convergecast
  Efficient collection of feedback back to the initiator
  Barrier synchronization, …
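The anycast and convergecast services can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the MetaBus protocol: the `Module` record, the "lowest identifier answers" tie-break, and the use of a logical AND for the barrier are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Module:
    ident: int
    kind: str   # e.g. "dsp", "cpu" (hypothetical labels)
    idle: bool


def anycast(modules: List[Module],
            predicate: Callable[[Module], bool]) -> Optional[int]:
    """Broadcast a query to all modules; one matching module answers.
    Assumed tie-break: the matching module with the lowest id wins."""
    responders = [m.ident for m in modules if predicate(m)]
    return min(responders) if responders else None


def convergecast_barrier(arrived: Dict[int, bool]) -> bool:
    """Combine per-module feedback into one result for the initiator.
    For a barrier, the combined result is the logical AND of all flags."""
    return all(arrived.values())
```

For example, `anycast(modules, lambda m: m.kind == "dsp" and m.idle)` models the “Looking for an idle DSP” query from the slide.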
Additional BENoC Applications
NoC control
  Router configuration
    E.g., routing table configuration
  Adapt NoC routing for load balancing
  Fault discovery and recovery
System control
  Power management
  Resource load balancing
Debug
Outline
  Introduction
  MetaBus architecture
  MetaBus latency and energy analysis
  CMP cache use case
Conventional System Buses
Figure is copied from “Amba Specifications Rev 2.0” - http://www.arm.com/products/solutions/AMBA_Spec.html
Bandwidth optimized
Poor scalability
Not suitable for tasks in BENoC
MetaBus Design Requirements
Low area, low power
Low bandwidth
Low latency
Simple
Versatile
Scalable
Multicast and broadcast support
Acknowledgement
[Figure: NoC of routers (R) and modules with the added “MetaBus”]
MetaBus Architecture
Many possible implementations
Example: tree topology with distributed arbitration
[Figure: Modules #1 to #9 attached to a tree of Bus Stations with a Root Bus Station]
Data Path
[Figure: MetaBus tree of Modules #1 to #9, Bus Stations and Root; legend: data to root, data to receivers]
Example: Broadcast of Two Words
[Figure: MetaBus tree of Modules #1 to #9, Bus Stations and Root. The address word propagates to the root; data word 1 and data word 2 propagate to the modules.]
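The two-word broadcast can be traced with a small model of the tree. This is a sketch under stated assumptions: the tree shape in `PARENT` (three bus stations under one root) is hypothetical and need not match the figure, and the model follows the slide labels literally, counting the address word on its way up to the root and each data word on its way down to the modules.

```python
from typing import Dict, List, Tuple

# Hypothetical tree (child -> parent); names and shape are illustrative only.
PARENT: Dict[str, str] = {
    "M1": "BS1", "M2": "BS1", "M3": "BS1",
    "M4": "BS2", "M5": "BS2", "M6": "BS2",
    "M7": "BS3", "M8": "BS3", "M9": "BS3",
    "BS1": "Root", "BS2": "Root", "BS3": "Root",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes visited when walking from `node` up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def broadcast(sender: str, data_words: int) -> List[Tuple[str, str, int]]:
    """Return (word, direction, links driven) for each step of a broadcast."""
    up_links = len(path_to_root(sender)) - 1   # sender -> root
    down_links = len(PARENT)                   # every tree link, once per word
    steps = [("address", "up", up_links)]
    for i in range(1, data_words + 1):
        steps.append((f"data{i}", "down", down_links))
    return steps
```

With this tree, `broadcast("M3", 2)` yields one upward step over 2 links followed by two downward steps over all 12 links.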
Distributed Arbitration Mechanism
[Figure: Modules #1 to #3, Bus Stations and Root; legend: Bus Request, Bus Grant]
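A request/grant exchange over the tree can be sketched recursively: requests travel up through the bus stations, and a single grant travels back down to one requester. The slide does not state the arbitration policy, so the fixed "first child wins" priority below, like the tree in `TREE`, is an assumption made only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical tree (station -> children); modules are the leaves.
TREE: Dict[str, List[str]] = {
    "Root": ["BS1", "BS2"],
    "BS1": ["M1", "M2"],
    "BS2": ["M3"],
}


def arbitrate(node: str, requesting: Set[str]) -> Optional[str]:
    """Return the module granted the bus within the subtree rooted at `node`,
    or None if no module in that subtree is requesting."""
    children = TREE.get(node)
    if children is None:                 # leaf: a module
        return node if node in requesting else None
    for child in children:               # assumed fixed priority: first child wins
        winner = arbitrate(child, requesting)
        if winner is not None:
            return winner
    return None
```

Each station decides using only its own children's requests, which is what makes the scheme distributed in this model.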
Masking Saves Power
Unicast from Module#3 to Module#5
[Figure: MetaBus tree of Modules #1 to #9, Bus Stations 1 to 5 and Root, with masks Mask1 to Mask5 and their bit patterns. The address word propagates to the root; data word 1 propagates to the modules.]
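The power saving from masking can be illustrated by counting how many downward links are driven. This is a sketch under stated assumptions: the tree in `TREE` is hypothetical (it does not reproduce the five bus stations of the figure), and each station is assumed to hold one mask bit per child that is set only when a destination lies below that child.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical tree (station -> children); modules are the leaves.
TREE: Dict[str, List[str]] = {
    "Root": ["BS1", "BS2", "BS3"],
    "BS1": ["M1", "M2", "M3"],
    "BS2": ["M4", "M5", "M6"],
    "BS3": ["M7", "M8", "M9"],
}


def leaves(node: str) -> Set[str]:
    """Return the set of modules in the subtree rooted at `node`."""
    children = TREE.get(node)
    if children is None:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves(child)
    return found


def down_masks(destinations: Iterable[str]) -> Dict[str, List[int]]:
    """One mask bit per child of each station: 1 = forward data downward."""
    dests = set(destinations)
    return {station: [1 if leaves(child) & dests else 0 for child in children]
            for station, children in TREE.items()}


def driven_links(masks: Dict[str, List[int]]) -> int:
    """Number of downward links that actually toggle."""
    return sum(sum(bits) for bits in masks.values())
```

In this model a unicast to `M5` drives 2 downward links, whereas an unmasked broadcast drives all 12.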
(Binary) Bus Station
MetaBus Floorplan – An Example
Balanced binary MetaBus with 64 modules
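The size of such a tree follows from basic counting. The sketch below assumes, as an illustration, that every bus station has exactly two children and that all modules are leaves at the same depth; the function name is hypothetical.

```python
from typing import Tuple


def binary_metabus_size(modules: int) -> Tuple[int, int]:
    """Return (bus stations including the root, tree depth in levels)
    for a balanced binary tree whose leaves are the modules."""
    if modules < 2 or modules & (modules - 1):
        raise ValueError("modules must be a power of two, at least 2")
    stations = modules - 1             # internal nodes of a full binary tree
    depth = modules.bit_length() - 1   # log2(modules) for a power of two
    return stations, depth
```

For 64 modules this gives 63 bus stations and a depth of 6, so a word crosses 6 levels on its way up to the root and 6 on its way back down.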
Outline
  Introduction
  MetaBus architecture
  MetaBus latency and energy analysis
  CMP cache use case
Analysis Highlights 1/4
NoC broadcast and unicast energy per transaction:
[Equations: E_NoC,broadcast and E_NoC,unicast, expressed in terms of V, N_flits, n, K, L_W, C_NL and C_ND]
Analysis Highlights 2/4
MetaBus broadcast and unicast energy per transaction:
[Equations: E_MetaBus,broadcast and E_MetaBus,unicast, expressed in terms of V, N_flits, n, B_D, B_R, C_BL, C_BD,up and C_BD,down]
Analysis Highlights 3/4
NoC unicast and broadcast latency:
[Equations: T_NoC,unicast and T_NoC,broadcast, expressed in terms of n, N_CiR, T_Nclk and N_flits]
Analysis Highlights 4/4
MetaBus unicast and broadcast latency:
[Equation: T_MetaBus, expressed in terms of N_flits, B_D, B_R and the R and C terms with subscripts BL, BD,up and BD,down, with coefficients 1.5, 0.7 and 0.4]
Results - Energy Consumption
Energy consumption for broadcast and unicast transactions of 3 data words
[Figure: Bus and NoC unicast and broadcast energy per transaction. X axis: number of modules (0 to 40); Y axis: energy per transaction [nJ] (0 to 3.5). Curves: MetaBus Broadcast, Network Broadcast, MetaBus Unicast, Network Unicast.]
Parameters: 10×10 mm chip, 64 modules mesh, 1 GHz NoC clock, speed-optimized bus, 0.18 µm
Results - Latencies
Latencies of broadcast and unicast transactions of 3 data words in a system with a frequency and a speed optimized MetaBus.
[Figure 9: Bus and NoC broadcast latencies. X axis: number of modules (0 to 40); Y axis: latency [ns] (0 to 120). Curves: MetaBus, Network Broadcast, Network Unicast.]
Parameters: 10×10 mm chip, 64 modules mesh, 1 GHz NoC clock, speed-optimized bus, 0.18 µm
Outline
  Introduction
  MetaBus architecture
  MetaBus latency and energy analysis
  CMP cache use case
Dynamic Non-Uniform Cache Access
Split large cache into independent smaller banks
  Non-uniform cache access time (NUCA)
Cache lines are moved to shorten access time
  Dynamic NUCA
Before fetching a line into its L1$, a CPU needs to find the L2 cache storing the line
[Figure: CMP (Chip Multi Processor): CPUs with L1$ caches and an array of L2$ banks]
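The line-search step is where a broadcast medium helps. The sketch below contrasts two ways of locating a line and counts the queries each one sends. It is illustrative only: the `Banks` mapping, the ascending probe order, and the function names are hypothetical, and the slides do not specify how a NoC-only system would perform the search.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical state: bank id -> cache-line addresses currently held there.
Banks = Dict[int, Set[int]]


def find_by_unicast_probes(banks: Banks, line: int) -> Tuple[Optional[int], int]:
    """Probe banks one at a time (assumed order: ascending bank id).
    Return (bank holding the line or None, number of probes sent)."""
    probes = 0
    for bank_id in sorted(banks):
        probes += 1
        if line in banks[bank_id]:
            return bank_id, probes
    return None, probes


def find_by_bus_broadcast(banks: Banks, line: int) -> Tuple[Optional[int], int]:
    """A single broadcast query reaches every bank; the holder, if any, answers.
    Return (bank holding the line or None, number of queries sent)."""
    holders = [bank_id for bank_id in sorted(banks) if line in banks[bank_id]]
    return (holders[0] if holders else None), 1
```

In this model the broadcast search always costs one query, while the number of probes in the unicast search grows with the number of banks examined.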
Simulation Setup
16 processors, 64 L2 cache banks
PARSEC and SPLASH-2 benchmarks
Vanilla wormhole NoC
Simulation accounts for bus latency, arbitration time, etc.
Simulation Results
Performance improvement in BENoC compared to a NoC-based CMP
(a) average read transaction latency; (b) application speed
Summary
Current NoCs are largely distributed
  Borrowing concepts from off-chip networks
On-chip environment provides an opportunity
  Enhancing the network with a bus gives the best of both worlds
Advanced services are easily supported
  Anycast, management and control
Cost effective
  Power and performance
  Analysis and simulation
Thank you!
Questions?
[Figure: Bus-Enhanced NoC connecting modules]
QNoC Research Group