![Page 1: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/1.jpg)
1
Indirect Adaptive Routing on Large Scale Interconnection Networks
Nan Jiang, William J. Dally
Computer Systems Laboratory
Stanford University
John Kim
Korea Advanced Institute of Science and Technology
![Page 2: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/2.jpg)
2
Overview
• Indirect adaptive routing (IAR)
– Allows adaptive routing decisions to be based on local and remote congestion information
• Main contributions
– Three new IAR algorithms for large-scale networks
– Steady-state and transient performance evaluations
– Impact of network configurations
– Cost of implementation
![Page 3: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/3.jpg)
3
Presentation Outline
• Background
– The dragonfly network
– Adaptive routing
• Indirect adaptive routing algorithms
• Performance results
• Implementation considerations
![Page 4: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/4.jpg)
4
The Dragonfly Network
• High-radix network
– High-radix routers
– Small network diameter
• Each router
– Three types of channels
– Directly connected to a few other groups
• Each group
– Organized by a local network
– Large number of global channels (GC)
• Large network with a global diameter of one
[Figure: groups (Group 0, Group 1, Group 2, …) connected by the global network; inside a group, routers (Router 0, Router 1, Router 2, …) connected by the local network; Router 0 shown with p0, p1, …]
![Page 5: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/5.jpg)
5
Routing on the Dragonfly
• Minimal routing (MIN)
1. Source local network
2. Global network
3. Destination local network
• Some adversarial traffic congests the global channels
– Each group i sends all packets to group i+1
• Oblivious solution: Valiant’s algorithm (VAL)
– Poor performance on benign traffic
[Figure: dragonfly diagram (Group 0, Group 1, Group 2, …; Router 0, Router 1, Router 2; p0, p1, …) with the congested channel marked]
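The effect of the worst-case pattern on MIN and VAL can be sketched at the group level. The following is a minimal illustration, assuming a global diameter of one (every pair of groups shares a direct global channel) and a wrap-around from the last group to group 0; the function names and the group-level abstraction are hypothetical and not taken from the slides.

```python
import random
from collections import Counter

def min_route(src_group, dst_group):
    """Group-level minimal route: at most one global hop."""
    if src_group == dst_group:
        return [src_group]
    return [src_group, dst_group]

def val_route(src_group, dst_group, num_groups, rng):
    """Valiant route: detour through a random intermediate group."""
    candidates = [g for g in range(num_groups) if g not in (src_group, dst_group)]
    return [src_group, rng.choice(candidates), dst_group]

def global_channel_load(routes):
    """Count how many routes cross each directed group-to-group channel."""
    load = Counter()
    for route in routes:
        for a, b in zip(route, route[1:]):
            load[(a, b)] += 1
    return load

def worst_case_load(num_groups, packets_per_group, use_valiant, seed=0):
    """Load per global channel when group i sends everything to group i+1."""
    rng = random.Random(seed)
    routes = []
    for g in range(num_groups):
        dst = (g + 1) % num_groups  # wrap-around is an assumption
        for _ in range(packets_per_group):
            if use_valiant:
                routes.append(val_route(g, dst, num_groups, rng))
            else:
                routes.append(min_route(g, dst))
    return global_channel_load(routes)
```

Under MIN, each group's traffic lands on a single global channel, while VAL spreads it over many channels at the cost of a second global hop for every packet.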
![Page 6: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/6.jpg)
6
Adaptive Routing
• Choose between the MIN path and a VAL path at the packet source [Singh'05]
– Decision metric: path delay
– Delay: product of path distance and path queue depth
• Measuring path queue length is unrealistic
• Use local queue lengths to approximate the path queue length
– Requires stiff backpressure
[Figure: source router with queues q0 to q3 choosing between the MIN GC and a VAL GC, with congestion marked]
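The decision metric above can be written as a small comparison. This is a minimal sketch, assuming the hop counts and queue depths are already known at the source; the function names and the tie-break in favor of MIN are illustrative assumptions.

```python
def path_delay(hop_count, queue_depth):
    """Estimated delay: path distance times queue depth."""
    return hop_count * queue_depth

def choose_path(min_hops, min_queue, val_hops, val_queue):
    """Return 'MIN' or 'VAL'; ties go to MIN (an assumption)."""
    if path_delay(min_hops, min_queue) <= path_delay(val_hops, val_queue):
        return "MIN"
    return "VAL"
```

For example, with equal queue depths the shorter MIN path wins, and MIN is abandoned only when its queue is deep enough to outweigh the longer VAL distance.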
![Page 7: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/7.jpg)
7
Adaptive Routing: Worst Case Traffic
[Plot: packet latency (simulation cycles, 100 to 450) vs. throughput (flit injection rate, 0 to 0.5); curves: Valiant’s, Minimal, Adaptive]
![Page 8: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/8.jpg)
8
Indirect Adaptive Routing
• Improve routing decisions through remote congestion information
• Previous method:
– Credit round trip [Kim et al., ISCA’08]
• Three new methods:
– Reservation
– Piggyback
– Progressive
![Page 9: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/9.jpg)
9
Credit Round Trip (CRT)
• Delay the return of local credits to the congested router
• Creates the illusion of stiffer backpressure
• Drawbacks
– Remote congestion is still inferred through local queues
– Information is not up to date
[Figure: source router choosing between the MIN GC and a VAL GC; credits returned normally on one path and delayed on the congested path]
[Kim et al., ISCA’08]
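Why delayed credits look like stiffer backpressure can be shown with a toy credit count. This is only an illustrative sketch, not the credit-delay rule from [Kim et al., ISCA’08]: it assumes every flit is drained after a fixed `drain_latency` and that the credit then takes an extra `credit_delay` to return; all names are hypothetical.

```python
def apparent_occupancy(send_times, drain_latency, credit_delay, now):
    """Flits the upstream router still counts as buffered downstream at time `now`."""
    outstanding = 0
    for sent in send_times:
        credit_arrival = sent + drain_latency + credit_delay
        # A flit counts as buffered from the time it is sent until its credit returns.
        if sent <= now < credit_arrival:
            outstanding += 1
    return outstanding
```

With the same traffic, a larger `credit_delay` makes the local queue appear fuller, which is what steers the local adaptive decision away from the congested path.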
![Page 10: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/10.jpg)
10
Reservation (RES)
• Each global channel tracks the number of incoming MIN packets
• Each injected packet creates a reservation flit
• The routing decision is based on the reservation outcome
• Drawbacks
– Reservation flit flooding
– Reservation delay
[Figure: source router sends a RES flit toward the MIN GC; the congested GC returns "RES failed", so the packet takes the VAL GC]
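The reservation bookkeeping can be sketched as a counter per global channel. This is a minimal illustration, assuming a fixed acceptance threshold and a release when a reserved packet leaves the channel; the class, method names, and threshold policy are hypothetical, not the authors' implementation.

```python
class GlobalChannelReservation:
    """Tracks how many MIN packets have reserved this global channel."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed cap on outstanding reservations
        self.reserved = 0

    def try_reserve(self):
        """Handle a reservation flit: accept while under the threshold."""
        if self.reserved < self.threshold:
            self.reserved += 1
            return True
        return False

    def packet_departed(self):
        """Release one reservation once a reserved packet has used the channel."""
        if self.reserved > 0:
            self.reserved -= 1

def route_with_reservation(channel):
    """Route MIN if the reservation succeeds, otherwise fall back to VAL."""
    return "MIN" if channel.try_reserve() else "VAL"
```

The packet waits at the injection port until the reservation outcome returns, which is the reservation delay listed under the drawbacks.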
![Page 11: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/11.jpg)
11
Piggyback (PB)
• Local congestion broadcast
– Piggybacking on each packet
– Send on idle channels
• Congestion data compression
• Drawbacks
– Consumes extra bandwidth
– Congestion information is not up to date (broadcast delay)
[Figure: source router receives "GC busy" and "GC free" states for the MIN GC and a VAL GC, with congestion marked]
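The broadcast and lookup can be sketched as a table of busy bits. This is a minimal illustration, assuming congestion is compressed to one bit per global channel by a fixed queue-depth threshold and that unknown channels are treated as free; the names, the threshold rule, and the decision rule are assumptions rather than details from the slides.

```python
def compress_congestion(queue_depths, threshold):
    """Reduce each global channel's queue depth to a single busy bit."""
    return [depth > threshold for depth in queue_depths]

class CongestionTable:
    """Per-router lookup table of busy bits for global channels in the group."""

    def __init__(self):
        self.busy = {}

    def receive_broadcast(self, router_id, busy_bits):
        """Store the bits another router piggybacked on a packet."""
        for channel, bit in enumerate(busy_bits):
            self.busy[(router_id, channel)] = bit

    def is_busy(self, router_id, channel):
        # Channels with no information yet are assumed free.
        return self.busy.get((router_id, channel), False)

    def choose(self, min_gc, val_gc):
        """Take VAL only when the MIN channel is busy and the VAL channel is free."""
        if self.is_busy(*min_gc) and not self.is_busy(*val_gc):
            return "VAL"
        return "MIN"
```

Because the table is only as fresh as the last broadcast, a decision can be based on stale bits, which is the broadcast-delay drawback noted above.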
![Page 12: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/12.jpg)
12
Progressive (PAR)
• MIN routing decisions at the source are not final
• VAL decisions are final
• Switch to VAL when encountering congestion
• Drawbacks
– Needs an additional virtual channel to avoid deadlock
– Adds extra hops
[Figure: source router choosing between the MIN GC and a VAL GC, with congestion marked on the MIN path]
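The one-way nature of the decision can be sketched as follows. This is a minimal illustration, assuming the packet sees one congestion flag at each router it visits while still routed minimally; the function name and the boolean input are hypothetical.

```python
def progressive_route(min_path_congested):
    """Return the final decision and the hop index where the packet switched.

    min_path_congested holds one flag per router visited while routed MIN.
    A MIN decision may be revised at any of these hops; VAL is never revised.
    """
    for hop, congested in enumerate(min_path_congested):
        if congested:
            return "VAL", hop
    return "MIN", None
```

A switch at a later hop means the packet has already travelled part of the MIN path, which is where the extra hops come from.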
![Page 13: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/13.jpg)
13
Experimental Setup
• Fully connected local and global networks
– 33 groups
– 1,056 nodes
• 10-cycle local channel latency
• 100-cycle global channel latency
• 10-flit packets
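A few figures follow directly from this setup and can be checked with simple arithmetic. The sketch below uses only the numbers on the slide; the helper names are hypothetical, the hop sequences passed to `channel_latency` are assumed examples, and router pipeline and serialization delays are ignored.

```python
LOCAL_LATENCY = 10    # cycles, from the slide
GLOBAL_LATENCY = 100  # cycles, from the slide

def nodes_per_group(total_nodes, num_groups):
    """Nodes in each group, assuming they divide evenly."""
    per_group, remainder = divmod(total_nodes, num_groups)
    if remainder:
        raise ValueError("nodes do not divide evenly across groups")
    return per_group

def min_global_channels_per_group(num_groups):
    """A fully connected global network needs a channel to every other group."""
    return num_groups - 1

def channel_latency(hops):
    """Sum channel latency over a hop string such as 'LGL' (L = local, G = global)."""
    per_hop = {"L": LOCAL_LATENCY, "G": GLOBAL_LATENCY}
    return sum(per_hop[h] for h in hops)

print(nodes_per_group(1056, 33))          # nodes in each group
print(min_global_channels_per_group(33))  # global channels each group needs
print(channel_latency("LGL"))             # local, global, local
```

Since the global channel is ten times slower than a local one, each additional global hop dominates the channel latency of a path.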
![Page 14: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/14.jpg)
14
Steady State Traffic: Uniform Random
[Plot: packet latency (simulation cycles, 100 to 300) vs. throughput (flit injection rate, 0 to 0.9); curves: Piggyback, Credit Round Trip, Progressive, Reservation, Minimal]
![Page 15: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/15.jpg)
15
Steady State Traffic: Worst Case
[Plot: packet latency (simulation cycles, 100 to 450) vs. throughput (flit injection rate, 0 to 0.5); curves: Piggyback, Credit Round Trip, Progressive, Reservation, Valiant’s]
![Page 16: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/16.jpg)
16
Transient Traffic: Uniform Random to Worst Case
[Plot 1: Average Packet Latency per Cycle - UR to WC; packet latency (100 to 500) vs. cycles after transition (0 to 100); curves: Progressive, Piggyback]
[Plot 2: % Packets Routing Non-minimally per Cycle - UR to WC; % of packets routing non-minimally (0 to 100) vs. cycles after transition (0 to 100); curves: Progressive, Piggyback]
![Page 17: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/17.jpg)
17
Network Configuration Considerations
• Packet size
– RES requires long packets to amortize the reservation flit cost
– Routing decisions are made on a per-packet basis
• Channel latency
– Affects information delay (CRT, PB)
– Affects packet delay (PAR, RES)
• Network size
– Affects information bandwidth overhead (RES, PB)
• Global diameter greater than one
– Need to exchange congestion information on the global network
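The packet-size point for RES is a simple amortization. The sketch below assumes one single-flit reservation per packet, consistent with each injected packet creating a reservation flit; the function name and the default of one reservation flit are illustrative assumptions.

```python
def reservation_overhead(packet_flits, reservation_flits=1):
    """Reservation flits sent per data flit for a packet of the given length."""
    if packet_flits <= 0:
        raise ValueError("packet_flits must be positive")
    return reservation_flits / packet_flits

for size in (1, 2, 4, 8):
    print(size, reservation_overhead(size))
```

Under this assumption a 1-flit packet doubles the flit count, while longer packets spread the same reservation cost over more data flits.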
![Page 18: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/18.jpg)
18
Cost Considerations
• Credit round trip
– Credit delay tracker for every local channel
• Reservation
– Reservation counter for every global channel
– Additional buffering at the injection port to store packets waiting for reservation
• Piggyback
– Global channel lookup table for every router
– Increase in packet size
• Progressive
– Extra virtual channel for deadlock avoidance
![Page 19: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/19.jpg)
19
Conclusion
• Three new indirect adaptive routing algorithms for large-scale networks
• Performance and design evaluation of the algorithms
• Best algorithm?
– Piggyback performed the best under steady-state traffic
– Progressive responded fastest to transient changes
– Network configuration affects the performance of some algorithms
– Cost of implementation
![Page 20: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/20.jpg)
20
Thank You!
• Questions?
![Page 21: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/21.jpg)
21
Adaptive Routing: Uniform Traffic
[Plot: packet latency (simulation cycles, 100 to 300) vs. throughput (flit injection rate, 0 to 0.9); curves: VAL, MIN, Adaptive]
![Page 22: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/22.jpg)
22
Transient Traffic: Worst Case to Uniform Random
[Plot 1: Average Packet Latency per Cycle - WC to UR; packet latency (150 to 350) vs. cycles after transition (0 to 100); curves: PAR, PB]
[Plot 2: % Packets Routing Non-minimally per Cycle - WC to UR; % of packets routing non-minimally (0 to 100) vs. cycles after transition (0 to 100); curves: PAR, PB]
![Page 23: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/23.jpg)
23
Transient Traffic: Worst Case 1 to Worst Case 10
[Plot 1: Average Packet Latency per Cycle - WC1 to WC10; packet latency (200 to 500) vs. cycles after transition (0 to 100); curves: PAR, PB]
[Plot 2: % Packets Routing Non-minimally per Cycle - WC1 to WC10; % of packets routing non-minimally (0 to 100) vs. cycles after transition (0 to 100); curves: PAR, PB]
![Page 24: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/24.jpg)
24
1000 Random Permutation Traffic
[Histograms, one panel each for PB, CRT, PAR, RES, and VAL: % of 1K permutations (0 to 30) vs. packet latency (axis ticks at 200 and 300)]
![Page 25: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/25.jpg)
25
Effect of Packet size on RES: Worst Case Traffic
[Plot: latency (simulation cycles, 0 to 550) vs. throughput (flit injection rate, 0 to 0.5); curves: 1 flit, 2 flits, 4 flits, 8 flits]
![Page 26: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/26.jpg)
26
Large local network: Uniform Random
[Plot: packet latency (simulation cycles, 0 to 400) vs. throughput (flit injection rate, 0 to 0.9); curves: PB, CRT, MIN, PAR, RES]
![Page 27: Indirect Adaptive Routing on Large Scale Interconnection Networks](https://reader035.vdocuments.us/reader035/viewer/2022070500/56816867550346895ddecda3/html5/thumbnails/27.jpg)
27
Large local network: Worst Case
[Plot: packet latency (simulation cycles, 0 to 600) vs. throughput (flit injection rate, 0 to 0.5); curves: PB, CRT, PAR, RES, VAL]