Traffic Engineering with Equal-Cost-MultiPath:
An Algorithmic Perspective
Marco Chiesa
joint work with
Guy Kindler, Michael Schapira
Israel Networking Day 2014 April 24th
Infocom 2014
traffic engineering (TE)
network operators' goal:
● provide best possible service
● minimize costs
how?
● fully exploit network resources
→ route flows of traffic along the “best” paths
traditional TE tools
ECMP (Equal-Cost-MultiPath)
● the most widely deployed TE mechanism
● load-balancing tool
● very simple mechanism
contributions
ECMP and arbitrary topologies → no reasonable approximation is possible :(
ECMP and datacenter topologies:
● hypercubes vs folded Clos networks :)
● large flows in folded Clos networks :)
traditional TE tools
ECMP (Equal-Cost-MultiPath)
● operators set link weights
● traffic is routed along shortest paths
[Figure: example network with source s, target t, and link weights]
traditional TE tools
when multiple shortest paths are available:
● per-packet level equal split
● per-flow level hash-based split
→ equal split for many small flows
[Figure: example network with link weights; a flow f_st = 4 from s to t is split equally among the shortest paths]
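The equal-split rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes positive integer link weights, directed links, and a per-hop split (each node divides its incoming traffic equally among the next hops that lie on a shortest path to t). The function name `ecmp_link_loads` and the small example network are hypothetical.

```python
import heapq
from collections import defaultdict


def ecmp_link_loads(weights, s, t, demand):
    """Return {(u, v): load} for one s->t demand under per-hop ECMP splitting.

    weights: dict mapping directed links (u, v) to positive integer weights.
    Assumption: traffic at each node is split equally among shortest-path next hops.
    """
    rev = defaultdict(list)  # incoming links, used to run Dijkstra from t
    out = defaultdict(list)  # outgoing links, used to find next hops
    for (u, v), w in weights.items():
        rev[v].append((u, w))
        out[u].append((v, w))

    # Dijkstra from t over reversed links: dist[u] is the shortest distance u -> t.
    dist = {t: 0}
    heap = [(0, t)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, w in rev[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    if s not in dist:
        return {}  # t is unreachable from s

    # Visit nodes by decreasing distance to t (a topological order of the
    # shortest-path DAG) and push each node's traffic to its next hops.
    inflow = defaultdict(float)
    inflow[s] = demand
    loads = defaultdict(float)
    for u in sorted(dist, key=dist.get, reverse=True):
        if u == t or inflow[u] == 0:
            continue
        next_hops = [v for v, w in out[u] if v in dist and dist[u] == w + dist[v]]
        share = inflow[u] / len(next_hops)
        for v in next_hops:
            loads[(u, v)] += share
            inflow[v] += share
    return dict(loads)


# Hypothetical example: two equal-cost 2-hop paths and one costlier direct link.
example_weights = {("s", "a"): 1, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 1, ("s", "t"): 3}
example_loads = ecmp_link_loads(example_weights, "s", "t", 4)
```

In this example the direct link s→t has weight 3, so it is not on a shortest path and carries no traffic, while the demand of 4 is divided into 2 units on each of the two-hop paths.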
traditional TE tools
OSPF + ECMP (Equal-Cost-MultiPath)
● operators set link weights
how? heuristic approaches:
● local search [Fortz, Thorup, 2000] [Sundaresan et al, 2010]
● memetic algorithms [Buriol et al, 2002]
● genetic algorithms [Ericsson et al, 2002]
● branch-and-cut for mixed ILP [Palmar et al, 2006]
wanted: an algorithm with provable guarantees
TE model: multi-commodity flow
input:
● capacitated graph G=(V,E)
● demand matrix D={d_ij}
constraints:
● flows cannot exceed link capacities
● flows are equally split among all shortest paths
optimization functions:
● maximize total throughput (flow)
● minimize congestion
● minimize sum of link costs
known results: inapproximability for max-flow
Theorem [Fortz, Thorup, 2000]. Given an instance I=(G,D), it is NP-hard to distinguish whether:
OPT(I) = 1 or OPT(I) = 2/3
→ no efficient algorithm can provably route at least a fraction 2/3 + ε of OPT(I), for any ε > 0
our first results: inapproximability for max-flow
Theorem [Fortz, Thorup, 2000]. Given an instance I=(G,D), with a single entry in D, it is NP-hard to distinguish whether:
OPT(I) = 1 or OPT(I) = 2/3
our first results: any constant inapproximability for max-flow
Theorem [Fortz, Thorup, 2000]. Given an instance I=(G,D), with a single entry in D, it is NP-hard to distinguish whether:
OPT(I) = 1 or OPT(I) = q,
for any constant q > 0
→ no efficient algorithm can provably route at least a fraction q of OPT(I)
key tool: amplification operator X
operator X: instance I → instance I_new
such that
OPT(I_new) = OPT(I)^2
OPT(X(X(I))) = OPT(I)^4
...
amplifying the gap
OPT(I) = 1 or OPT(I) = 2/3: it is NP-hard to distinguish between 1 and ~0.67
OPT(X(I)) = 1 or OPT(X(I)) = 4/9: it is NP-hard to distinguish between 1 and ~0.44
OPT(X^2(I)) = 1 or OPT(X^2(I)) = 16/81: it is NP-hard to distinguish between 1 and ~0.20
…
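The arithmetic behind the amplification can be checked directly: each application of X squares the optimum, so an instance with OPT = 1 stays at 1 while the 2/3 case shrinks doubly exponentially. The sketch below only reproduces that arithmetic with exact fractions; the function names are hypothetical, and it does not construct any instance.

```python
from fractions import Fraction


def amplified_gaps(base_gap, rounds):
    """Return [gap, gap^2, gap^4, ...] after 0..rounds applications of X."""
    gap = Fraction(base_gap)
    gaps = [gap]
    for _ in range(rounds):
        gap = gap * gap  # one application of X squares the optimum
        gaps.append(gap)
    return gaps


def rounds_needed(base_gap, q):
    """Smallest number of applications of X that pushes the gap to q or below."""
    gap = Fraction(base_gap)
    rounds = 0
    while gap > q:
        gap = gap * gap
        rounds += 1
    return rounds


gaps = amplified_gaps(Fraction(2, 3), 2)               # 2/3, 4/9, 16/81
k = rounds_needed(Fraction(2, 3), Fraction(1, 10))     # rounds to get below 1/10
```

Since the gap after k rounds is (2/3)^(2^k), a constant number of rounds suffices for any fixed constant q > 0, which is what the "any constant" statement relies on.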
amplification gap technique: graph G
● source s
● target t
● capacitated edges
[Figure: graph G with source s, target t, and five capacitated edges labeled c1, ..., c5]
maxflow(s → t) = OPT
amplification gap technique: recursive replacement
[Figure: each edge of G with capacity ci is replaced by a copy of G, with source s' and target t', whose capacities are scaled by ci. For example, the copy that replaces the edge of capacity c2 has edge capacities c2·c1, c2·c2, c2·c3, c2·c4, c2·c5. After the replacement, the five edges of G behave like edges of capacity c1·OPT, c2·OPT, c3·OPT, c4·OPT, c5·OPT]
maxflow(s' → t') = c2·OPT
amplification gap technique: graph X(G)
[Figure: graph X(G) between s and t, with effective edge capacities c1·OPT, c2·OPT, c3·OPT, c4·OPT, c5·OPT]
maxflow(s → t, X(G)) = OPT · OPT = OPT^2
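The recursive replacement can be tried on a toy graph. The sketch below is an illustration under stated assumptions, not the construction from the paper: it uses ordinary max-flow (as written on the slides) rather than the ECMP-constrained optimum, a hypothetical five-edge directed acyclic graph with made-up integer capacities, and a plain Edmonds-Karp routine. The helper names `max_flow` and `amplify` are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(edges, s, t):
    """Edmonds-Karp max-flow; edges is a list of directed (u, v, capacity)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, c in edges:
        residual[u][v] += c
        residual[v][u] += 0  # make sure the reverse arc exists
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = None
        v = t
        while parent[v] is not None:
            cap = residual[parent[v]][v]
            bottleneck = cap if bottleneck is None else min(bottleneck, cap)
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def amplify(edges, s, t):
    """Replace every edge (u, v, c) by a copy of the graph scaled by c.

    The copy's source is glued to u and its target to v; inner nodes get
    fresh names (i, x) so that different copies do not share nodes.
    """
    new_edges = []
    for i, (u, v, c) in enumerate(edges):
        def rename(x):
            if x == s:
                return u
            if x == t:
                return v
            return (i, x)
        for a, b, c_inner in edges:
            new_edges.append((rename(a), rename(b), c * c_inner))
    return new_edges


# Hypothetical graph G with five capacitated edges.
G = [("s", "a", 2), ("s", "b", 1), ("a", "b", 1), ("a", "t", 1), ("b", "t", 2)]
opt = max_flow(G, "s", "t")
opt_amplified = max_flow(amplify(G, "s", "t"), "s", "t")
```

On this toy graph the copy replacing an edge of capacity c can carry c·OPT, so the amplified graph behaves like G with every capacity multiplied by OPT, and its max-flow is OPT squared.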
datacenter topologies and ECMP
topology constraints:
● d-dimensional hypercubes (e.g., BCube-like) → NP-hard
● l-layer folded Clos networks (e.g., VL2-like) → easy
● random regular graphs (e.g., Jellyfish) → future work
datacenter topologies and ECMP
topology constraints:
● d-dimensional hypercubes (e.g., BCube)
[Figure: hypercube network with d=4]
computing the best weight assignment is computationally intractable
datacenter topologies and ECMP
topology constraints:
● l-layer folded Clos networks (e.g., VL2)
[Figure: folded Clos networks with l=2, l=3, and l=4 layers, with servers]
setting all link weights to 1 is optimal
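To see what unit weights do in the smallest case, consider a 2-layer folded Clos network. The sketch below assumes a hypothetical leaf-spine layout in which every leaf switch connects to every spine switch; with all weights equal to 1, every spine lies on a shortest leaf-to-leaf path, so ECMP divides each demand equally across the spines. The function name and the demand values are made up for illustration, and the code does not prove optimality.

```python
from collections import defaultdict
from fractions import Fraction


def leaf_spine_loads(num_spines, demands):
    """Per-link loads in a full leaf-spine network with unit link weights.

    demands: {(src_leaf, dst_leaf): integer volume}.
    Assumption: every leaf is connected to every spine, so each demand is
    split into num_spines equal shares, one per spine.
    """
    loads = defaultdict(Fraction)
    for (src, dst), volume in demands.items():
        if src == dst:
            continue  # traffic that stays under one leaf never reaches a spine
        share = Fraction(volume, num_spines)
        for spine in range(num_spines):
            loads[(("leaf", src), ("spine", spine))] += share
            loads[(("spine", spine), ("leaf", dst))] += share
    return dict(loads)


# Hypothetical demands: leaf 0 -> leaf 1 (8 units) and leaf 2 -> leaf 1 (4 units).
clos_loads = leaf_spine_loads(4, {(0, 1): 8, (2, 1): 4})
```

Every downlink into leaf 1 ends up with the same load (2 + 1 = 3 units), which is the perfectly balanced outcome that equal splitting yields when flows are small and numerous.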
optimality proof sketch
topology constraints:
● l-layer folded Clos networks (e.g., VL2)
[Figure: proof sketch illustrated on folded Clos networks with l=2 and l=4 layers, with servers]
ECMP and large flows
per-flow level hash-based split
→ when there are a few large flows, traffic may not be properly load-balanced
[Figure: folded Clos network with l=4 layers and servers, with large flows sharing links]
severe performance degradation [Al Fares et al, 2010]: 30% of the bandwidth in a datacenter is wasted
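The imbalance comes from hashing whole flows onto paths: a flow's size plays no role in where it lands. The sketch below illustrates this with a stand-in hash (SHA-256 of the flow identifier); real switches use their own hash functions, and the flow tuples and function names here are hypothetical.

```python
import hashlib


def hash_path(flow_id, num_paths):
    """Map a flow identifier to a path index, independently of the flow's size."""
    digest = hashlib.sha256(repr(flow_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_paths


def path_loads(flows, num_paths):
    """flows: {flow_id: size}. Return the total load placed on each path."""
    loads = [0] * num_paths
    for flow_id, size in flows.items():
        loads[hash_path(flow_id, num_paths)] += size
    return loads


# Hypothetical flows: (src, dst, src_port, dst_port, protocol) -> size.
large_flows = {
    ("10.0.0.1", "10.0.1.1", 5000, 80, "tcp"): 10,
    ("10.0.0.2", "10.0.1.2", 5001, 80, "tcp"): 10,
    ("10.0.0.3", "10.0.1.3", 5002, 80, "tcp"): 10,
}
loads = path_loads(large_flows, 4)
```

With many small flows the per-path totals even out, but with a handful of large flows two of them can hash to the same path while other paths stay idle, and nothing in the hash corrects for that.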
ECMP, Clos networks, and large flows
proposed solution [Al Fares et al, 2010]:
● route small flows (mice) using ECMP
● route large flows (elephants) using a greedy algorithm
our results:
● 1/2-inapproximability
● greedy is a 1/5-approximation algorithm
● 1/4-approximation if all flows have equal size
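The slides do not spell out the greedy rule, so the following is only one plausible reading, labeled as an assumption: process the large flows in decreasing size and place each on the spine whose uplink and downlink are currently least loaded. It uses a hypothetical 2-layer leaf-spine network rather than the 3-layer network of the stated results, and it is not claimed to be the algorithm that the approximation bounds refer to.

```python
from collections import defaultdict


def greedy_elephants(flows, num_spines):
    """Assign each large flow to a spine, largest flow first.

    flows: list of (src_leaf, dst_leaf, size).
    Assumed rule: pick the spine minimizing the larger of the current
    uplink load (src_leaf -> spine) and downlink load (spine -> dst_leaf).
    Returns (assignment, max_link_load), where assignment maps flow index to spine.
    """
    up = defaultdict(int)    # load on link (src_leaf, spine)
    down = defaultdict(int)  # load on link (spine, dst_leaf)
    assignment = {}
    order = sorted(range(len(flows)), key=lambda i: -flows[i][2])
    for idx in order:
        src, dst, size = flows[idx]
        best = min(range(num_spines),
                   key=lambda sp: max(up[(src, sp)], down[(sp, dst)]))
        up[(src, best)] += size
        down[(best, dst)] += size
        assignment[idx] = best
    max_load = max(list(up.values()) + list(down.values()), default=0)
    return assignment, max_load


# Hypothetical elephants between the same pair of leaves.
assignment, max_load = greedy_elephants([(0, 1, 6), (0, 1, 5), (0, 1, 4)], 2)
```

Here the flow of size 6 takes spine 0, the flow of size 5 takes spine 1, and the flow of size 4 joins spine 1 because it is the less loaded of the two at that point.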
conclusions
● in general, no efficient algorithm exists to assign the best, or even reasonably good, link weights
● datacenter topologies:
● hypercubes → hard to find the best weights
● folded Clos networks → set all weights to 1
– greedy algorithm for routing large flows is a 1/5-approximation in a 3-layer folded Clos network (VL2-like)
thank you