TRANSCRIPT
1 NoCArc’09 Ring Router Microarchitecture
Router Microarchitecture and Scalability of Ring Topology in
On-Chip Networks
John Kim, Hanjoon Kim
Department of Computer Science, KAIST
2 NoCArc’09 Ring Router Microarchitecture
Topology
• Topology efficiently exploits the available packaging technology to meet the requirements at a minimum cost
[Figure labels: zero-load latency, saturation throughput]
3 NoCArc’09
[Scott et al. ISCA06]
On-chip networks are different
Off-Chip Networks On-Chip Networks
[src: Intel Developers Forum]
4 NoCArc’09
Topologies for On-Chip Networks
• Crossbar is often sufficient – if it can be done efficiently
• 2D mesh topology commonly assumed
• Many different topologies recently proposed
  – CMESH [ICS’06]
  – Flattened butterfly [Micro’07]
  – Express Cubes [HPCA’09]
  – Hierarchical Network [HPCA’09]
  – …
• Recent multicore architectures have used the ring topology
  – Cell processor, Intel processors, …
5 NoCArc’09
Why Ring Topology?
• Routing
  – route clockwise or counterclockwise
  – route until destination reached
• Low-radix router
  – each “router” only requires 3 ports (local port, left & right port)
• Flow control
  – Arbitration can be simplified
  – 3 ports, but at most two requests
• Can be implemented without “routers”
  – Bufferless router
  – Simple topology
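The routing rule on this slide can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function name `ring_route`, the convention that clockwise means increasing node index, and the tie-break toward clockwise are all assumptions.

```python
from typing import Tuple


def ring_route(src: int, dst: int, n: int) -> Tuple[str, int]:
    """Pick the shorter direction around an n-node ring.

    Returns (direction, hop_count). "cw" is assumed to mean
    increasing node index; this naming is hypothetical.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("node index out of range")
    cw_hops = (dst - src) % n   # hops when travelling clockwise
    ccw_hops = (src - dst) % n  # hops when travelling counterclockwise
    if cw_hops == 0:
        return ("local", 0)     # already at the destination
    # A tie (exactly half-way around) goes clockwise: an arbitrary choice.
    if cw_hops <= ccw_hops:
        return ("cw", cw_hops)
    return ("ccw", ccw_hops)
```

Once the direction is chosen at the source, every intermediate node only has to decide "eject here" or "keep going", which is why no routing table is needed.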
6 NoCArc’09 Ring Router Microarchitecture
Today’s Talk
• Background in On-Chip Networks and Topology
• Router Microarchitecture for Ring Topology
• Scalability of Ring Topology
• Summary
7 NoCArc’09
Bufferless router in ring topology
• Simplified arbitration
  – Priority to packets already in flight
  – Guaranteed (deterministic) latency to destination
• No buffers needed
  – No misrouting [Bufferless router ISCA’09]
  – No packet dropping [SCARAB Micro’09]
• Only two-input muxes
• No routing deadlock
8 NoCArc’09
Conventional Router Microarchitecture
9 NoCArc’09
Bufferless Ring Topology Router Microarchitecture
10 NoCArc’09
No buffers needed
11 NoCArc’09
Bufferless router in ring topology
• Simplified arbitration
  – Priority to packets already in flight
  – Guaranteed (deterministic) latency to destination
• No buffers needed
  – No misrouting [Bufferless router ISCA’09]
  – No packet dropping [SCARAB Micro’09]
• Only two-input muxes
• No routing deadlock
• However…
  – Requires reserving the path to destination
  – Can reduce performance/throughput
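A minimal simulation sketch of the "priority to packets already in flight" rule, for one direction of the ring only. It is a simplified model, not the paper's design: the names `bufferless_node_step` and `simulate` are hypothetical, each link is assumed to take one cycle, and ejection is assumed to always succeed, so the path reservation mentioned on the slide is not modelled.

```python
def bufferless_node_step(node_id, incoming, local):
    """One cycle at one node. Returns (outgoing, ejected, injected)."""
    ejected = None
    if incoming is not None and incoming["dst"] == node_id:
        ejected, incoming = incoming, None  # packet leaves the ring here
    if incoming is not None:
        # Two-input mux: the in-flight packet always wins, local waits.
        return incoming, ejected, False
    if local is not None:
        return local, ejected, True         # empty slot: inject local packet
    return None, ejected, False


def simulate(n, sends, cycles):
    """sends: list of (src, dst). Returns {packet id: cycles spent in the ring}."""
    links = [None] * n                      # links[i]: packet arriving at node i
    queues = [[] for _ in range(n)]         # per-node injection queues
    for pid, (src, dst) in enumerate(sends):
        queues[src].append({"id": pid, "dst": dst, "injected_at": None})
    latency = {}
    for cycle in range(cycles):
        nxt = [None] * n
        for node in range(n):
            local = queues[node][0] if queues[node] else None
            out, ejected, injected = bufferless_node_step(node, links[node], local)
            if ejected is not None:
                latency[ejected["id"]] = cycle - ejected["injected_at"]
            if injected:
                queues[node].pop(0)["injected_at"] = cycle
            nxt[(node + 1) % n] = out       # forward to the next node
        links = nxt
    return latency
```

Because an injected packet is never stalled or deflected in this model, its in-ring latency equals its hop count; the cost shows up as waiting time at the source, which is one way to read the "can reduce performance/throughput" bullet.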
12 NoCArc’09
Lightweight Router Microarchitecture
• Add buffer entries (2 per input port)
• Credit-based flow control for backpressure
• Maintain same prioritized arbitration for packets in flight
• Arbitration needed when ejecting packets
[Figure: bufferless vs. lightweight router microarchitecture]
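Credit-based backpressure with a two-entry input buffer can be sketched as follows. The class name `CreditedLink` and its methods are hypothetical, and credits are assumed to return instantly (a real link has a credit round-trip delay).

```python
from collections import deque


class CreditedLink:
    """Upstream credit counter plus the downstream input buffer it tracks."""

    def __init__(self, depth=2):
        self.depth = depth      # buffer entries per input port (2 on the slide)
        self.credits = depth    # one credit per free downstream entry
        self.buffer = deque()

    def can_send(self):
        return self.credits > 0

    def send(self, flit):
        if not self.can_send():
            # No credit left: the upstream node must stall (backpressure).
            raise RuntimeError("no credit available")
        self.credits -= 1
        self.buffer.append(flit)

    def drain(self):
        """Downstream forwards one flit and returns a credit upstream."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```

The invariant is that credits plus buffered flits always equals the buffer depth, so the sender can never overflow the downstream buffer.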
13 NoCArc’09
Lightweight Router Microarchitecture
• No predetermined routing
  – Bufferless: a packet is injected into the network only in the appropriate slot
  – Lightweight: a packet can be injected at any time
• Deadlock
  – Packets in the bufferless router are guaranteed to make progress
  – Routing deadlock is still avoided without additional virtual channels (see paper for details)
14 NoCArc’09
Evaluation
• Cycle-accurate simulator used to compare ring router microarchitectures
• Simulator parameters include
  – N = 16
  – single-flit packets (1 flit = 512 bits)
  – synthetic traffic patterns
• Orion 2.0 used to model area / power (results in paper)
• The following microarchitectures are compared:
  – baseline (3 cycle)
  – bufferless (1 cycle)
  – lightweight (1 cycle)
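To get a feel for what the 3-cycle versus 1-cycle router delay means at N = 16, a rough zero-load estimate can be computed. This is a back-of-the-envelope sketch, not the simulator used in the talk: it assumes a bidirectional ring with minimal routing, uniform random traffic to other nodes, single-flit packets, and a link delay of zero unless `link_cycles` is set. The function names are hypothetical.

```python
def avg_ring_hops(n):
    """Average minimal hop count on a bidirectional n-node ring,
    for uniform random traffic to any other node."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


def zero_load_latency(n, router_cycles, link_cycles=0):
    """Average hops times per-hop delay; ignores injection and ejection."""
    return avg_ring_hops(n) * (router_cycles + link_cycles)


if __name__ == "__main__":
    for name, cycles in [("baseline", 3), ("bufferless", 1), ("lightweight", 1)]:
        print(name, round(zero_load_latency(16, cycles), 2))
```

Under these assumptions the average distance at N = 16 is 64/15 hops, so the per-hop router delay scales the zero-load latency directly.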
15 NoCArc’09
Performance Comparison
[Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8). Panels: uniform random, bit complement]
16 NoCArc’09
Impact of Prioritized Arbitration
[Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight]
17 NoCArc’09 Ring Router Microarchitecture
Today’s Talk
• Background in On-Chip Networks and Topology
• Router Microarchitecture for Ring Topology
• Scalability of Ring Topology
• Summary
18 NoCArc’09
How Scalable is the Ring Topology?
• Assumption: same bisection bandwidth when comparing ring and 2D mesh
  – The bandwidth per channel is higher for the ring than for the 2D mesh
  – Trade-off of hop count vs. serialization latency
  – Per-hop latency can be higher with the 2D mesh
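The equal-bisection assumption and the hop-count trade-off can be illustrated numerically. This sketch makes several assumptions that are not stated on the slide: the mesh is a square k x k grid, a bisection cut crosses 2 ring links versus k mesh links (one channel per direction on each), and the 128-bit mesh channel width is a hypothetical figure. All function names are hypothetical.

```python
import math


def ring_channel_width(mesh_width_bits, n):
    """Ring channel width giving the same bisection bandwidth as a k x k mesh."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    mesh_bisection_channels = 2 * k  # k links cut, one channel per direction
    ring_bisection_channels = 4      # 2 links cut, one channel per direction
    return mesh_width_bits * mesh_bisection_channels // ring_bisection_channels


def avg_ring_hops(n):
    """Average minimal hop count on a bidirectional n-node ring."""
    return sum(min(d, n - d) for d in range(1, n)) / (n - 1)


def avg_mesh_hops(k):
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total = sum(abs(ax - bx) + abs(ay - by)
                for ax, ay in nodes for bx, by in nodes)
    return total / (len(nodes) * (len(nodes) - 1))


if __name__ == "__main__":
    for n in (16, 36, 64):
        k = math.isqrt(n)
        print(n, ring_channel_width(128, n),
              round(avg_ring_hops(n), 2), round(avg_mesh_hops(k), 2))
```

With these assumptions a 64-node ring gets channels four times wider than the mesh, so a packet that needs four flits on the mesh fits in one flit on the ring, while the ring's average hop count is roughly three times the mesh's.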
19 NoCArc’09
Synthetic Workload
[Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and max outstanding requests r = 2, 4, 8, 16]
20 NoCArc’09
Bandwidth Fragmentation
• 2D mesh:
  – short packets (req) = 1 flit
  – long packets (reply) = 4 flits
• ring:
  – short packets (req) = 1 flit
  – long packets (reply) = 1 flit
• Wide channels result in high bandwidth for the ring
• However, for short packets, the ring utilizes only ¼ of the channel bandwidth
• Ring topology is inefficient for short packets
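The ¼ figure follows from simple arithmetic, sketched below. The packet sizes (128-bit request, 512-bit reply) and channel widths (128-bit mesh, 512-bit ring) are assumptions chosen to match the flit counts on this slide; the function names are hypothetical.

```python
import math


def channel_utilization(packet_bits, channel_bits):
    """Fraction of the occupied channel bits that carry the packet."""
    flits = math.ceil(packet_bits / channel_bits)
    return packet_bits / (flits * channel_bits)


def mixed_utilization(packet_sizes, channel_bits):
    """Utilization over a mix of packets, each padded to whole flits."""
    used = sum(packet_sizes)
    occupied = sum(math.ceil(p / channel_bits) * channel_bits
                   for p in packet_sizes)
    return used / occupied


if __name__ == "__main__":
    REQ, REPLY = 128, 512  # assumed packet sizes in bits
    print("ring, request:", channel_utilization(REQ, 512))
    print("mesh, request:", channel_utilization(REQ, 128))
    print("ring, 1:1 mix:", mixed_utilization([REQ, REPLY], 512))
```

A short request occupies a full wide flit on the ring but fills only a quarter of it, whereas the narrower mesh channel is fully used by both packet types.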
21 NoCArc’09
Bandwidth Fragmentation
[Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and max outstanding requests r = 2, 4, 8, 16. Panels: bimodal packets, single-flit packets]
22 NoCArc’09
Limitations of this study
• “Packaging” of on-chip network topology = 2D layout of the topology
• Layout of the topology can impact performance
  – 2D mesh: only requires communicating with neighbors
  – Ring: long links can be needed as the network scales
• Hierarchical rings not investigated.
• Router complexity (for mesh) not properly modeled.
23 NoCArc’09 Ring Router Microarchitecture
Summary
• On-chip networks present different constraints compared to off-chip networks – can exploit different router microarchitectures.
• The ring topology is simple, and a bufferless router microarchitecture can be implemented for it.
• Lightweight router microarchitecture proposed to increase performance with minimal additional complexity.
• The ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.
• Can we scale this router microarchitecture to 2D mesh topology?
24 NoCArc’09
Low-Cost Router Microarchitecture (Micro’09)
25 NoCArc’09 Ring Router Microarchitecture
Thank you
Questions?