RapidIO FT Research Update: Dynamic Routing
David Bueno
February 10, 2006
HCS Research Laboratory, Dept. of Electrical and Computer Engineering, University of Florida





Overview
- RapidIO switches traditionally handle routing using routing tables with destID-output port pairs
  - Packet route is NOT specified at source; instead, it is determined by switches as packet travels through network
  - Routing tables are generally static, with one port for each destID
- Want to explore capabilities of dynamic routing in RapidIO switches for purposes of performance (load balancing) and fault tolerance
  - Many of our FT network designs provide the option of “over-provisioning” the backplane by providing an extra switch
  - Initial experiments with high-bandwidth corner turns found it best to leave extra switch inactive, as no benefits were gained by using it in active mode with a statically-routed application
- Early GMTI experiments tested dynamic round-robin routing with GMTI corner turns and found a statically-routed version performed better
  - Lesson learned here is that if an application CAN be effectively statically routed and network provides enough bandwidth, use static routing


Background
- Several relevant papers on this topic uncovered in previous literature searches
  - [1] and [2] of most interest since they deal with expanding an existing protocol (InfiniBand) to support dynamic routing
  - Unlike IBA, RIO spec does not forbid dynamic routing, but leaves implementation up to developer
- One important issue in implementing dynamic routing in a RIO system is in-order delivery
  - For traffic flows requiring in-order delivery, [2] suggests assigning multiple destIDs to each node that may be the recipient of an in-order flow
  - All switch routing tables would then provide only a single output port for the in-order destID
  - Example: Assign destIDs 5 and 6 to physical processing element P
    - Assume ID 5 is used for dynamic traffic, ID 6 for in-order traffic
    - Sample routing table entries for a RIO switch could then look like:
      - destID: 5  Port: 1, 2, 3
      - destID: 6  Port: 3
    - Packets for destID 5 can leave through ports 1, 2, or 3, but packets for destID 6 must leave through port 3
    - All packets for destID 5 and 6 end up at the same destination, processing element P
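The destID 5 / destID 6 example can be sketched in a few lines of Python. This is only an illustration of the idea, not the simulation model used in these experiments: the names `ROUTING_TABLE` and `select_port` are hypothetical, and round-robin is assumed as the policy for choosing among multiple ports.

```python
import itertools

# Hypothetical routing table for one switch: destID -> allowed output ports.
# Both destIDs belong to the same processing element P.
ROUTING_TABLE = {
    5: [1, 2, 3],  # dynamic traffic to P: any of three ports
    6: [3],        # in-order traffic to P: exactly one port
}

# One round-robin cursor per destID (the selection policy is an assumption).
_cursors = {dest_id: itertools.cycle(ports)
            for dest_id, ports in ROUTING_TABLE.items()}


def select_port(dest_id):
    """Return the output port for the next packet addressed to dest_id."""
    return next(_cursors[dest_id])


dynamic_ports = [select_port(5) for _ in range(6)]
in_order_ports = [select_port(6) for _ in range(6)]
print(dynamic_ports)   # [1, 2, 3, 1, 2, 3]
print(in_order_ports)  # [3, 3, 3, 3, 3, 3]
```

Because destID 6 maps to a single port, every packet of an in-order flow follows the same path and cannot be reordered by the switch's port choice.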


Model Improvements
- Models already supported dynamic routing, assumed similar to Honeywell RIOS “aggregate” capabilities
  - Round-robin selection of output ports from a list similar to previous example
- Expanded simulation models to allow selection of output port based on port with smallest number of packets outstanding to be sent and accepted
- Expanded models to allow random selection of output port
- Selection of port takes place prior to decision to accept or reject a packet based on buffer space, priority, etc.
- Created additional 32-node benchmarks to test usefulness of dynamic routing for traffic that cannot be statically scheduled
  - Random reads: Each processing element issues 1000 read requests for 256 B to random destinations. Request N+1 is not issued until request N is filled. Generally ~32 packets are in flight in the network at any one time.
  - Random sends 256: Each processing element issues 1000 message passing packets (256 B) to random destinations. There is a large delay after each packet is sent so that each iteration is not subject to contention prior to starting (i.e., everyone sends their packet, then waits a while, then everyone sends again at the same time, and this happens a total of 1000 times).
  - Random sends 4096: Each processing element issues 1000 full RapidIO messages (4096 B) to random destinations. There is a large delay after each message is sent so that each iteration is not subject to contention prior to starting.
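The three port selection strategies described above (round robin, shortest buffer, random) can be sketched as follows. This is a minimal illustration and not the simulation model itself: the class `OutputPortSelector`, its method names, and the `outstanding` dictionary are all hypothetical, and ties in the shortest-buffer case are assumed to go to the port listed first.

```python
import random


class OutputPortSelector:
    """Picks one output port from the candidate list for a destID."""

    def __init__(self, ports, seed=0):
        self.ports = list(ports)
        self.next_index = 0             # round-robin cursor
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def round_robin(self):
        """Cycle through the candidate ports in list order."""
        port = self.ports[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.ports)
        return port

    def shortest_buffer(self, outstanding):
        """Pick the port with the fewest packets outstanding.

        outstanding maps port -> packets waiting to be sent and accepted.
        Ties go to the port listed first (an assumption).
        """
        return min(self.ports, key=lambda port: outstanding[port])

    def random_port(self):
        """Pick any candidate port with equal probability."""
        return self.rng.choice(self.ports)


selector = OutputPortSelector([1, 2, 3])
rr_picks = [selector.round_robin() for _ in range(4)]
sb_pick = selector.shortest_buffer({1: 4, 2: 0, 3: 2})
tie_pick = selector.shortest_buffer({1: 1, 2: 1, 3: 5})
rand_pick = selector.random_port()
print(rr_picks, sb_pick, tie_pick)  # [1, 2, 3, 1] 2 1
```

In all three cases the port is chosen first; whether the packet is then accepted or rejected (buffer space, priority, etc.) is a separate decision that this sketch does not model.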


Experiments Overview (1)
- All experiments use the Fault-Tolerant Clos (FTC) network architecture
  - Results generally hold for any of our FT architectures with 5-switch core stage if routing is configured identically
- Dynamic routing only possible in FIRST stage if a shortest-hop path is to be taken to destination
  - First-stage switch may choose between any active core switch (up to 5 active switches), assuming packet is destined for a node NOT connected to the same first-stage switch
- Most paths traverse three switches to get from one node to another
  - Some paths only require one switch, when both source and dest are connected to same switch
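The path structure above can be sketched as a small function that enumerates shortest-hop paths. This is an illustration under stated assumptions: the slides do not say how many nodes attach to each first-stage switch, so `NODES_PER_FIRST_STAGE = 4` is a hypothetical value, as are the function names and switch labels.

```python
NODES_PER_FIRST_STAGE = 4  # hypothetical; not stated in the slides


def first_stage_switch(node):
    """Index of the first-stage switch that a node attaches to."""
    return node // NODES_PER_FIRST_STAGE


def shortest_paths(src, dst, active_core):
    """List the shortest-hop switch sequences from src to dst.

    active_core is the list of core switch indices currently active.
    """
    src_sw = first_stage_switch(src)
    dst_sw = first_stage_switch(dst)
    if src_sw == dst_sw:
        # Source and destination share a switch: one hop, no routing choice.
        return [[f"first{src_sw}"]]
    # Otherwise the only choice is which active core switch to cross.
    return [[f"first{src_sw}", f"core{core}", f"first{dst_sw}"]
            for core in active_core]


local = shortest_paths(0, 3, active_core=[0, 1, 2, 3, 4])
remote = shortest_paths(0, 9, active_core=[0, 1, 2, 3])
print(local)        # [['first0']]
print(len(remote))  # 4
```

The number of shortest paths between nodes on different first-stage switches equals the number of active core switches, which is why the core size (5, 4, or 3) bounds how much a dynamic policy can spread traffic.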


Experiments Overview (2)
- For all experiments, 5-switch core assumes all 5 switches are active
- 4-switch core may represent either of two cases:
  - 4 active switches with a 5th switch unpowered as a spare
  - 4 active switches, when the 5th switch has previously failed
- 3-switch core should be interpreted similarly
- Note that based on number of nodes and network bandwidth, 5 switches is over-provisioned, 4 switches is “correct” provisioning, and 3 switches is under-provisioned


Random Reads
- Round robin strategy performed very well across all cases, only slightly outperformed by shortest buffer in the 3-switch core case
  - Round robin appears to provide best pure load balance for nondeterministic traffic
  - 3-switch case suffers from under-provisioning and contention, where “intelligent” port selection may provide slight benefit
- With random traffic, even statically-routed systems benefited from 5-switch core, although benefit was larger for round robin

[Figure: Random Read Requests (256 B). Completion time (ns) for 5-switch, 4-switch, and 3-switch cores. Series: Round Robin, Shortest Buffer, Random Buffer, Static.]


Random Sends (256 B)
- Benchmark has lower contention than previous one because traffic is synchronized by delays between message sends
  - With this benchmark, we examine average packet latencies rather than completion time
- Lower contention in this case lends benchmark well to static routing
  - Round robin and static methods behave very similarly in this case due to synchronization and provide very similar results
- With under-provisioned network, dynamic routing becomes of greater benefit
  - Shortest buffer and round robin both outperform static routing in 3-switch core case
- Random output port selection performed very poorly in all cases

[Figure: Random Message Passing Sends (256 B). Average packet latency (ns) for 5-switch, 4-switch, and 3-switch cores. Series: Round Robin, Shortest Buffer, Random Buffer, Static.]


Random Sends (4096 B)
- Benchmark has highest contention due to large message sizes to random destinations
- With this benchmark, we examine average packet latencies rather than completion time
  - Packet latency is counted from time message is generated to time individual packet reaches destination
  - Benchmark assumes message segments may be received out of order at destination and reassembled
- Most cases performed best under static routing due to large messages and high contention
  - Similar phenomenon to earlier corner turn experiments
  - No clever routing is going to help when two large messages “collide” in network
- Exception again is under-provisioned 3-switch core case
  - Shortest-buffer, round robin, and even random more efficient than static routing
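The latency metric and the out-of-order reassembly assumption can be sketched as below. This is an illustration only: it assumes each message is split into 256 B segments, the function names are hypothetical, and the arrival times in the example are made-up values, not measurements from these experiments.

```python
PACKET_BYTES = 256    # assumed payload per segment
MESSAGE_BYTES = 4096  # full message size used by this benchmark


def segment(message):
    """Split a message into (segment index, payload) pairs."""
    return [(offset // PACKET_BYTES, message[offset:offset + PACKET_BYTES])
            for offset in range(0, len(message), PACKET_BYTES)]


def reassemble(received):
    """Rebuild the message from segments that arrived in any order."""
    ordered = sorted(received, key=lambda seg: seg[0])
    return b"".join(payload for _, payload in ordered)


def average_packet_latency(generated_at, arrival_times):
    """Mean of (packet arrival time - message generation time)."""
    return sum(t - generated_at for t in arrival_times) / len(arrival_times)


message = bytes(range(256)) * (MESSAGE_BYTES // 256)
segments = segment(message)
rebuilt = reassemble(segments[::-1])  # deliver segments in reverse order
latency = average_packet_latency(1000, [1400, 1600, 1800])  # made-up ns values
print(len(segments), rebuilt == message, latency)  # 16 True 600.0
```

Because every packet's latency is measured from the moment the whole message is generated, later segments of a large message accumulate queuing delay behind earlier ones, which is consistent with the much higher latencies in this benchmark than in the 256 B case.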

[Figure: Random Message Passing Sends (4096 B). Average packet latency (ns) for 5-switch, 4-switch, and 3-switch cores. Series: Round Robin, Shortest Buffer, Random Buffer, Static.]


Conclusions
- Optimal routing strategy extremely dependent on algorithm and communication patterns
  - Dynamic routing not very useful when high traffic amounts (such as corner turns) can be adequately balanced statically
    - Previous experiments have shown it can do more harm than good
  - These experiments show dynamic routing most useful in cases of “moderate” network contention or under-provisioned network
- In general, round robin appears to be most flexible dynamic routing strategy for Clos-based RIO networks
  - Results may vary widely for other network configurations, but Clos networks are the focus here due to their FT properties and high performance
- Shortest-buffer routing performed better in some cases but likely not worth the cost of extra logic required to make decisions based on buffer status
  - Effectiveness is limited in a Clos network because choice can only be made at first-stage switch
    - Even if buffer at first-stage switch is empty, the packet could be headed to a highly congested second-stage switch!
    - Do NOT want to concern switches with the status of OTHER switches in the network
  - May be more useful in some applications specifically tailored towards this routing strategy
    - But similar queue “bypass” could be handled just using the RapidIO priority mechanism already present in protocol
- Random routing generally not useful compared to other alternatives
- Extra core switch helpful in ALL cases when traffic is random, even without dynamic routing
  - Dynamic routing enhances usefulness of active 5th core switch


References
[1] J. M. Montanana, J. Flich, A. Robles, P. Lopez, and J. Duato, "A Transition-Based Fault-Tolerant Routing Methodology For Infiniband Networks," in Proceedings of the 18th International Parallel and Distributed Processing Symposium, Santa Fe, New Mexico, April 2004.
[2] J. C. Martinez, J. Flich, A. Robles, P. Lopez, and J. Duato, "Supporting Adaptive Routing in Infiniband Networks," in Proceedings of the Eleventh Euromicro Conference on Parallel, Distributed, and Network-Based Processing, pp. 165-172, February 2003.