
Germán Rodríguez, Cyriel Minkenberg, Ramon Beivide, Ronald P. Luijten, Jesus Labarta, Mateo Valero

Oblivious Routing Schemes in Extended Generalized Fat Tree Networks

New Orleans, 2009

HPI-DC'09 (in conjunction with CLUSTER'09)


Summary

● We describe well-known regular modulo-based routing algorithms for k-ary n-trees.

● We extend and analyze these algorithms for a broader class of networks: XGFTs, including cost-effective variants of k-ary n-trees.

● We present combinatorial results showing that the two main variants of modulo-based algorithms perform equally well for a random distribution of traffic.

● We identify two intrinsic flaws of oblivious modulo-based algorithms and propose a variant that improves on both.


Outline

● XGFT topologies:
  ● k-ary n-trees and more cost-effective variants
● Routing (State of the Art)
  ● Random
  ● Modulo-radix variants: Source-mod-k and Destination-mod-k
● Experimental environment
● Analysis of Modulo-radix algorithms
● Proposal: random NCA up/down
● Evaluation
● Results
● Conclusion


Extended Generalized Fat Trees I

● XGFT ( h ; m1, … , mh; w1, … , wh )

● Superclass of Multi-Trees

● k-ary n-trees [Petrini97]

● Slimmed trees [Navaridas07]

● h = height

● number of levels-1

● levels are numbered 0 through h

● level 0 : compute nodes

● levels 1 … h : switch nodes

● mi = # children per node at level i, 0 < i ≤ h

● wi = # parents per node at level i-1, 0 < i ≤ h

● number of level 0 nodes = $\prod_{i=1}^{h} m_i$

● number of level h nodes = $\prod_{i=1}^{h} w_i$

[Figure: XGFT ( 3 ; 3, 2, 2 ; 2, 2, 3 ) drawn with a tuple label on every node of each level. Also shown: a 4-ary 2-tree and XGFT(3;4,4,4;1,4,1), a slimmed tree.]

The Nearest Common Ancestors (NCA), Least Common Ancestors (LCA) or “roots” of a pair (s,d) of nodes are the set of inner nodes at the lowermost level that are ancestors of both s and d.
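As a rough illustration of the NCA definition, the level at which two leaves first share an ancestor can be found by comparing their positions subtree by subtree. This is a minimal sketch: the function name `nca_level` and the assumption that leaves are numbered consecutively, so that every subtree covers a contiguous range of leaf numbers, are introduced here for illustration and are not taken from the slides.

```python
def nca_level(s, d, m):
    """Return the lowest level at which leaves s and d share an ancestor.

    m = [m1, ..., mh] holds the children-per-node parameters of the XGFT.
    Assumption: leaves are numbered so that each subtree covers a
    contiguous range of leaf numbers.
    """
    if s == d:
        return 0
    span = 1  # number of leaves below one node of the current level
    for level, children in enumerate(m, start=1):
        span *= children
        if s // span == d // span:
            return level
    raise ValueError("s and d are not leaves of the same XGFT")


# Leaves of an XGFT with m = (3, 2, 2), i.e. 12 leaves.
m = [3, 2, 2]
print(nca_level(0, 2, m))   # same level-1 switch
print(nca_level(0, 3, m))   # meet at level 2
print(nca_level(0, 11, m))  # meet only at the top level
```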


Extended Generalized Fat Trees II

[Figure: XGFT ( 3 ; 3, 2, 2 ; 2, 2, 3 ) with a tuple label on every node of each level and numeric annotations on some links. Also shown: a 4-ary 2-tree and XGFT(3;4,4,4;1,4,1), a slimmed tree.]

● Number of nodes at level i, 0 < i < h:

$N_i = \prod_{j=i+1}^{h} m_j \cdot \prod_{j=1}^{i} w_j$

● Each node can be labeled with an h-tuple:

< Wi, ... , W1, Mh, ... , Mi+1 >, 0 ≤ Mi < mi, 0 ≤ Wi < wi

which, in combination with the level number i, uniquely determines a node in the whole network

(first W’s, then M’s)

● Equivalent variations of the labeling scheme have been proposed [Lin04,Gomez07]
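The node count per level can be checked numerically. The sketch below assumes the count at level i is the product of the m_j for j > i times the product of the w_j for j ≤ i, which matches the 12, 8, 8, 12 nodes drawn for XGFT(3; 3,2,2; 2,2,3); the function name `nodes_per_level` is hypothetical.

```python
from math import prod


def nodes_per_level(m, w):
    """Node count at each level 0..h of XGFT(h; m1..mh; w1..wh)."""
    h = len(m)
    if len(w) != h:
        raise ValueError("m and w must both have h entries")
    # Level i: product of m_j for j > i, times product of w_j for j <= i.
    # An empty product is 1, so levels 0 and h need no special case.
    return [prod(m[i:]) * prod(w[:i]) for i in range(h + 1)]


print(nodes_per_level([3, 2, 2], [2, 2, 3]))  # [12, 8, 8, 12]
print(nodes_per_level([16, 16], [1, 10]))     # [256, 16, 10]
```

The second call corresponds to the slimmed two-level family XGFT(2;16,16;1,w2) with w2 = 10 middle switches.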


XGFTs and Contention

● XGFTs provide multiple paths for every pair of nodes:

● Proportional to the “number of parents” (wi) parameters up to the nearest (least) common ancestors of source s and destination d:

$\mathrm{Paths}(s,d) = \prod_{i=1}^{\mathrm{NCAlevel}(s,d)} w_i$

● Increasing the number of parents increases the cost.

● k-ary n-trees provide full bisection bandwidth and set a well-known trade-off between cost and performance.

● Slimmed trees (with wi ≤ k) become more important with the increasing number of nodes.

● Our analysis and proposal work better for slimmed trees than previous algorithms.
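A small worked example of the path count: assuming the number of paths between two leaves is the product of the wi up to the level of their NCAs, slimming the top level directly reduces path diversity. The helper name `path_count` is an assumption made for this sketch.

```python
from math import prod


def path_count(w, nca_level):
    """Number of up/down paths between two leaves whose NCAs sit at nca_level.

    w = [w1, ..., wh] holds the parents-per-node parameters of the XGFT.
    """
    if not 0 <= nca_level <= len(w):
        raise ValueError("nca_level must lie between 0 and h")
    return prod(w[:nca_level])


print(path_count([1, 16], 2))    # full XGFT(2;16,16;1,16): 16 paths
print(path_count([1, 10], 2))    # slimmed to 10 middle switches: 10 paths
print(path_count([2, 2, 3], 3))  # XGFT(3;3,2,2;2,2,3), top-level NCAs: 12 paths
```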


Related Work: Routing schemes

● Main oblivious routing schemes for Fat Trees
  ● Random [Valiant81][Greenberg85] selection of upward paths
  ● Either Source [Leiserson92][Ohrin95][Kariniemi06] modulo assignment of upward links
  ● or Destination [Lin04][Gomez07][Johnson08] modulo assignment of upward links

● Pattern-aware (used in this work)
  ● Colored Heuristic [Rodriguez09]
  ● We use it as a baseline for comparison


Random Routing I

● The assignment of links to reach an NCA is totally random
● Idea: a random distribution should spread the probability of contention evenly
● At each step, choose a random parent until an NCA is reached
● Then follow the unique deterministic path down

[Figure: example tree with labels S, Node 1 and Node 10.]
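The random scheme can be sketched in a few lines: one parent port is drawn uniformly at every level on the way up, and the way down needs no choice. The function name `random_up_ports` and the explicit `rng` argument are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import random


def random_up_ports(w, nca_level, rng=None):
    """Pick one parent port per level on the way up to an NCA.

    w = [w1, ..., wh]; going from level l to level l + 1 offers w[l] parents.
    """
    rng = rng or random.Random()
    return [rng.randrange(w[level]) for level in range(nca_level)]


# Hypothetical usage: a route whose NCAs sit at level 3 of XGFT(3;3,2,2;2,2,3).
ports = random_up_ports([2, 2, 3], 3, random.Random(42))
print(ports)  # one port index per upward hop
```

Passing a seeded `random.Random` makes a run repeatable, which is convenient when several seeds are to be compared.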


Regular Routings (s mod k, d mod k)

● “Self-routing” approach
● At each step, choose the parent by applying a modulo operation (mod k) to a node label
● Difference between the variants: the label of either the source or the destination is used, and only on the way up the tree

[Figure: two copies of a radix-3 tree, one per variant, with routes from Node <0,0,0> and Node 10 = <1,0,1> towards Dest 26 = <2,2,2>.]

source mod k: <0,0,0> mod 3 = (port) 0 at both steps; <1,0,1> mod 3 = (port) 1 at one step and (port) 0 at the other

destination mod k: <2,2,2> mod 3 = (port) 2 at both steps
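The port selection of the two modulo variants can be sketched as digit extraction. The sketch assumes that upward step i uses digit i of the base-k label, least significant digit first; the slide shows ports 1 and 0 for <1,0,1>, but the extracted text does not preserve their order, so that ordering is an assumption. The function names are hypothetical.

```python
def modulo_up_ports(label, k, steps):
    """Ports used on the way up: digit i of `label` in base k at step i."""
    return [(label // k**i) % k for i in range(steps)]


def up_ports(src, dst, k, steps, variant):
    """Select the label that drives the upward path for each variant."""
    if variant == "s-mod-k":
        return modulo_up_ports(src, k, steps)
    if variant == "d-mod-k":
        return modulo_up_ports(dst, k, steps)
    raise ValueError("variant must be 's-mod-k' or 'd-mod-k'")


# Node 10 = <1,0,1> and Dest 26 = <2,2,2> in radix 3, two upward steps.
print(up_ports(10, 26, 3, 2, "s-mod-k"))  # [1, 0]
print(up_ports(10, 26, 3, 2, "d-mod-k"))  # [2, 2]
```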


Combinatorial Analysis of Modulo-based algorithms:

• An interesting question arises: is either of the two variants (source or destination) of the modulo-based algorithms intrinsically better?

• Number of permutations routed

● By s-mod-k, by d-mod-k

● The same; why?

● Idea: for every permutation P there exists Inverse(P) such that, if P has c conflicts with s-mod-k, Inverse(P) has c conflicts with d-mod-k (details in the paper)

• Number of general patterns (not permutations) routed

● By s-mod-k, by d-mod-k

● The same; why?

● Idea: decompose the pattern into all possible permutations

● Compute the maximum c over all these permutations for s-mod-k

● Invert the decomposed permutations and apply the previous result: the union of the inverted permutations has the same maximum c for d-mod-k

● Look for more details in the paper
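The inverse-permutation idea can be checked empirically on a small case. The sketch below assumes a two-level tree with k leaf switches of k leaves each, takes the largest number of flows sharing one link as the contention measure c, and picks the root with (label mod k); these modelling choices and all names are assumptions for illustration, not the paper's proof.

```python
import random
from collections import Counter


def max_link_load(perm, k, variant):
    """Largest number of flows on one link; perm[s] is the destination of s."""
    up, down = Counter(), Counter()
    for s, d in enumerate(perm):
        if s // k == d // k:
            continue  # same leaf switch: the flow never goes up
        root = s % k if variant == "s-mod-k" else d % k
        up[(s // k, root)] += 1    # source switch -> root
        down[(root, d // k)] += 1  # root -> destination switch
    return max(list(up.values()) + list(down.values()), default=0)


def inverse(perm):
    """Return the inverse permutation."""
    inv = [0] * len(perm)
    for s, d in enumerate(perm):
        inv[d] = s
    return inv


def same_contention(k, trials, seed):
    """Compare P under s-mod-k with Inverse(P) under d-mod-k on random P."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = list(range(k * k))
        rng.shuffle(p)
        if max_link_load(p, k, "s-mod-k") != max_link_load(inverse(p), k, "d-mod-k"):
            return False
    return True


print(same_contention(k=4, trials=200, seed=0))  # True
```

In this model the up-links loaded by P under s-mod-k are exactly the down-links loaded by Inverse(P) under d-mod-k, and vice versa, which is why the maxima coincide.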


Experimental Setup

● Collection of application traces and pattern extraction
● Co-simulation approach [Minkenberg09]:
  ● Dimemas replays the MPI activity of the trace of an application
  ● Venus simulates the transmission of the messages with a detailed model of the network

[Figure: co-simulation tool chain. Labels: Execution of an Application; Applications/MPI; traces; Dimemas Simulator; Config File: Links, Bandwidth, #buses, latency, Eager/rendez-vous, etc.; ServerMod; ClientMod; Venus Simulator; Config File: Adapter, Switch parameters, BW, Link delay; Traffic Generator; topology; mapping; routes; map2ned; routereader; Myrinet’s map files; Myrinet’s route files; statistics; Visualization, Analysis, Validation (Paraver); Detailed level of simulation.]


Applications

● WRF
  ● 256 processors
  ● Each process keeps 2 sends outstanding, to the destinations 16 nodes away in either direction (+/- 16), except the first and the last 16 processes

● CG
  ● 128 processors


Results: WRF, progressive tree slimming

● Removing a single switch degrades the performance by a factor of 2
● Removing 7 more middle switches has no impact for 3 of the routing schemes
● Regular modulo routings work very well (as good as the baseline), while Random does not.

[Figure: WRF, progressive tree slimming. Y axis: slowdown. X axis: ticks from 16 down to 1. Series: Random, S mod k, D mod k, Colored, Full-Crossbar.]


Modulo-based Algorithms look good

● A word about contention:

● Two main types: endpoint contention, and network fabric contention

● Endpoint contention arises because a node is performing multiple outstanding sends or receives and has fewer adapters than it needs.

● Network fabric contention arises because there are not enough network resources or the routing algorithm is not using them adequately.

● Modulo-based routing algorithms work by using node labels to go up the tree, concentrating the endpoint contention of each node on a specific NCA

● S-mod-k uses the source label – endpoint contention at the source is concentrated

● D-mod-k uses the destination label – endpoint contention at the destination is concentrated

However, modulo-based algorithms do not always work well...


Results: CG

● Oblivious routings cannot achieve the best performance
● It’s a pathological case for modulo-based oblivious algorithms
● Random routing does not achieve good performance
● The oblivious strategies do not match the baseline

[Figure: CG 128 processors, progressive tree slimming. Y axis: slowdown (0 to 4). X axis: ticks from 16 down to 1. Series: Random, S mod k, D mod k, Colored, Full-Crossbar.]


Results: CG Communication Pattern

● Colored

● All phases take the same time

● Destination Mod K

● Non-local phase takes 8 times longer?


Results: CG Communication Pattern congruent with the modulo algorithm

● Why do oblivious algorithms work badly with CG?
● Only one phase in CG is non-local in our experiment:

● Each source sends to:

● destination = (source/2) * 16 + (source mod 2), with integer division

● Modulo-based routing algorithms in radix-16 networks

● OutputPort(destination) = ((source/2) * 16 + (source mod 2)) mod 16, which is either 0 or 1

● They map the 16 outgoing communications to either port 0 or port 1
● 8 to each: 8 contending communications per port

● 14 unused ports in the switch…
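The arithmetic behind this pathological case is easy to reproduce. The sketch takes the destination formula from the slide, assumes integer division for source/2, and assumes that the 16 sources attached to one switch are numbered 0 to 15; the function names are hypothetical.

```python
from collections import Counter

RADIX = 16


def cg_destination(source):
    """Destination of the non-local CG phase, as given on the slide."""
    return (source // 2) * 16 + (source % 2)


def d_mod_k_port(destination, radix=RADIX):
    """Output port chosen by destination-mod-k in a radix-16 switch."""
    return destination % radix


# The 16 sources of one switch (assumed to be numbered 0..15).
ports = Counter(d_mod_k_port(cg_destination(s)) for s in range(RADIX))
print(sorted(ports.items()))      # [(0, 8), (1, 8)]
print(RADIX - len(ports))         # 14 ports stay unused
```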


Proposal: Random NCA up/down

Oblivious algorithms: what do d-mod-k and s-mod-k do? They make certain “roots” responsible for routing a collection of sources or destinations.

The distribution of roots is even for a k-ary n-tree, but not for slimmed trees.

They try to concentrate endpoint contention either in the path up to the root (source mod k) or down from the root (destination mod k).

We can relabel the nodes and apply modulo-based algorithms to the new source or destination labels, which defines two families of algorithms:

Random NCA up (using source labels) and Random NCA down (using destination labels)

Idea: each root is responsible for concentrating the endpoint contention of a number of leaf nodes.

An even distribution of leaf nodes to roots should lead to good performance.
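One plausible reading of "relabel the nodes, then apply the modulo rule" is sketched below. It assumes the relabeling is a seeded random permutation of the leaf numbers and that the root is (new label mod number of roots); the paper's actual construction may differ, and all names are hypothetical.

```python
import random
from collections import Counter


def random_relabel(num_leaves, seed):
    """Return labels such that labels[node] is the new label of node."""
    labels = list(range(num_leaves))
    random.Random(seed).shuffle(labels)
    return labels


def r_nca_root(node, labels, num_roots):
    """Root responsible for `node`.

    Pass the source for Random NCA up, the destination for Random NCA down.
    """
    return labels[node] % num_roots


# 256 leaves over 10 roots, as in a slimmed XGFT(2;16,16;1,10).
labels = random_relabel(256, seed=7)
per_root = Counter(r_nca_root(n, labels, 10) for n in range(256))
print(max(per_root.values()) - min(per_root.values()))  # 1
```

Because the labels are a permutation of 0..255, the number of leaves per root differs by at most one, which is the even distribution the proposal aims for.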


A word on the results plots

● In each of the graphs there is a data point for:
  ● Source-mod-k (triangle up, centered)
  ● Destination-mod-k (triangle down, centered)

● And three boxes with (minimum, 1st quartile, median, 3rd quartile and maximum) for:
  ● Random
  ● Random NCA up
  ● Random NCA down

● Note that the results of the random algorithms are based on 20 to 60 experiments with different seeds. When the variance in performance is too small to be noticeable, the whole “box” collapses into a single horizontal line.

[Figure: CG.D, progressive tree-slimming. Y axis: slowdown (0 to 4). X axis: value of w2 (#middle switches) for XGFT(2;16,16;1,w2), from 16 down to 2. Legend: s-mod-k (centered), d-mod-k (centered), colored (centered), r-NCA-u (1st box), r-NCA-d (2nd box), Random (3rd box).]


Results: WRF

Random-NCA-up and Random-NCA-down are almost as good as S-mod-k and D-mod-k

[Figure: WRF, progressive tree-slimming. Y axis: slowdown (0 to 16). X axis: value of w2 (#middle switches) for XGFT(2;16,16;1,w2), from 16 down to 1. Legend: s-mod-k (centered), d-mod-k (centered), colored (centered), r-NCA-u (1st box), r-NCA-d (2nd box), Random (3rd box).]


Results: CG

[Figure: CG.D, progressive tree-slimming. Y axis: slowdown (0 to 4). X axis: value of w2 (#middle switches) for XGFT(2;16,16;1,w2), from 16 down to 2. Legend: s-mod-k (centered), d-mod-k (centered), colored (centered), r-NCA-u (1st box), r-NCA-d (2nd box), Random (3rd box).]

Random-NCA-up and Random-NCA-down are mid-way between the modulo-based algorithms (S-mod-k and D-mod-k) and the baseline.


Routes per NCA

● Distribution of routes per NCA for several routing schemes
  ● X axis is the NCA number
● Left: non-slimmed topology
  ● Small variance of routes per NCA per routing and across ports
● Right: slimmed topology
  ● Source and destination modulo-based algorithms show a huge difference in routes assigned per NCA
  ● Random and the proposed family of random assignment of NCAs exhibit less variance across NCAs
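The imbalance of the modulo schemes on a slimmed tree can be reproduced by counting routes per root. The sketch assumes XGFT(2;16,16;1,w2), counts only pairs on different leaf switches, and assumes the root is chosen as (label mod 16) mod w2; that last rule is an assumption about how the modulo result is folded onto fewer roots, and the function name is hypothetical.

```python
def routes_per_nca(num_roots, radix=16, variant="s-mod-k"):
    """Count non-local (s, d) routes per root of XGFT(2; radix, radix; 1, num_roots)."""
    leaves = radix * radix
    counts = [0] * num_roots
    for s in range(leaves):
        for d in range(leaves):
            if s // radix == d // radix:
                continue  # same leaf switch (or s == d): no root is used
            label = s if variant == "s-mod-k" else d
            counts[(label % radix) % num_roots] += 1
    return counts


print(routes_per_nca(16))  # 3840 routes on each of the 16 roots
print(routes_per_nca(10))  # roots 0-5 carry 7680 routes, roots 6-9 carry 3840
```

With 10 roots, six of them serve two residues of the radix-16 label and four serve only one, so the load differs by a factor of two.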

[Figure, left: Distribution of Routes per NCA. Y axis: number of routes assigned (3500 to 8000). X axis: NCA number (16 NCAs). Legend: s-mod-k (1st, data point), d-mod-k (2nd, data point), Random (3rd, box), r-NCA-u (4th, box), r-NCA-d (5th, box).]

[Figure, right: Distribution of Routes per NCA. Y axis: number of routes assigned (3500 to 8000). X axis: NCA number (10 NCAs). Legend: s-mod-k (1st, data point), d-mod-k (2nd, data point), Random (3rd, box), r-NCA-u (4th, box), r-NCA-d (5th, box).]


Conclusions

● Conclusions

● There are no fundamental differences in performance for typical communication patterns between source and destination modulo-based algorithms

● Modulo-based algorithms present an intrinsic flaw for slimmed trees

● Non-balanced distribution of routes per NCA can lead to increased network contention

● A hybrid approach (randomly selecting NCAs that become “endpoint-contention” concentrators) helps and could be used as a better oblivious approach for both non-slimmed and slimmed networks.


THANKS

HPIDC’09


Q & A


Routing in XGFTs

● Selecting a link upwards further limits the choice of links at the upper levels.

● In pink: the switches that can be visited after selecting the first leftmost parent of level 1 and the second leftmost link up of level

[Figure: two trees, each with Node 1, Level 1 and Level 2 marked.]


XGFTs I

● Superclass of Fat Tree topologies:

● XGFT( h ; m1, ... , mh ; w1, ... , wh )

● h is the height of the tree.

● mi is the number of children per node at level i.

● wi is the number of parents per node at level i-1.

[Figure: XGFT(1;4,1), XGFT(1;4,2), XGFT(1;4,3) and XGFT(1;4,4), with the captions "4-ary tree" and "4-ary 1-tree".]


Random Routing I

● The assignment of links to reach an NCA is totally random
● Idea: a random distribution should spread the probability of contention evenly

[Figure: tree with labels S, S, S, Nodes 1 - 9 and Node 10.]

● Drawback I: suboptimal link assignment given a pattern

[Figure: tree with labels S, S, S, Nodes 1 - 9 and Node 10.]


Random Routing II

● Drawback II
  ● Even a single conflict halves performance

[Figure: two link assignments over nodes 1 to 9, captioned "9 Links, 2 conflicts for 3 pairs of nodes" and "6 Links, No conflicts".]


Coupled effects

[Diagram labels: Topology, Routing, Communication Pattern, Mapping, Contention, Performance, Results.]