Page 1

Ethernet Data Center Routing Challenges and 802.1aq/SPB new work

PETER ASHWOOD-SMITH

peterashwoodsmith@huawei.com

Page 2

802.1aq’s 16 ECT algorithms can give a perfect spread over 16 uplinks when going 2 hops. However:

A) The 2nd layer switch priorities need to be tweaked to guarantee that all 16 are used.
B) At least 16 subnets (C-VLANs/S-VLANs) are needed, to assign one per 802.1aq B-VID.

[Figure: A) marks where the bridge priorities are tweaked; B) marks the subnets S1 … S16.]
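A rough Python sketch of why tweaking guarantees that all 16 uplinks are used. It makes several assumptions: the 16 one-byte ECT masks are quoted from memory of the standard set, the bridge identifier is modelled as a 16-bit priority followed by a 48-bit MAC, and on a 2-hop B -> A -> B path the tie-break reduces to picking the A bridge with the lowest masked identifier. The MAC addresses and the priority values `i << 12` are hypothetical.

```python
import random

# Assumed standard ECT masks (one byte each, repeated across the bridge ID).
ECT_MASKS = [0x00, 0xFF, 0x88, 0x77, 0x44, 0x33, 0xCC, 0xBB,
             0x22, 0x11, 0x66, 0x55, 0xAA, 0x99, 0xDD, 0xEE]

def bridge_id(priority: int, mac: int) -> int:
    """64-bit bridge identifier: 16-bit priority followed by 48-bit MAC."""
    return (priority << 48) | mac

def winner(mask: int, bridge_ids) -> int:
    """Index of the A bridge chosen by the ECT algorithm using `mask`."""
    mask64 = int.from_bytes(bytes([mask]) * 8, "big")
    return min(range(len(bridge_ids)), key=lambda i: bridge_ids[i] ^ mask64)

def uplinks_used(bridge_ids) -> set:
    """Set of A bridges used by at least one of the 16 ECT algorithms."""
    return {winner(mask, bridge_ids) for mask in ECT_MASKS}

rng = random.Random(1)
macs = [rng.getrandbits(48) for _ in range(16)]  # hypothetical MACs

# Default: every A bridge has the same priority, so the MACs decide.
default = [bridge_id(0x8000, mac) for mac in macs]
# Tweaked: each A bridge gets a distinct top priority nibble (0..15).
tweaked = [bridge_id(i << 12, mac) for i, mac in enumerate(macs)]

print(len(uplinks_used(default)), "uplinks used with default priorities")
print(len(uplinks_used(tweaked)), "uplinks used with tweaked priorities")
```

In this model the tweak works because the upper nibbles of the 16 masks are all different, so each mask makes a different A bridge the lowest.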

Page 3

Can we eliminate ‘tweaking’*?

• David Allan et al. have a presentation on this so I won’t spend much time on it.

• In general, a network with N equal-cost paths from ‘some source’ to ‘some destination’ requires #ECT about 25-40% greater than N (to statistically capture them all).

• Therefore, when #ECT == N, some ‘tweaking’ is usually required (for a DC it is trivial to do, however).

• Dave et al. suggest non-independence between ECT algorithms as a way to address this (maximize diversity) …

*Tweaking = adjusting Bridge Priorities up/down from defaults.

Page 4

“Example” 802.1aq switching cluster (assume 100GE NNI links/groups)

• 48-switch non-blocking 2-layer L2 fabric
• 16 at “upper” layer A1..A16
• 32 at “lower” layer B1..B32
• 16 uplinks per Bn, and 160 UNI links per Bn
• 32 downlinks per An
• (16 x 100GE per Bn) x 32 = 512 x 100GE = 51.2T
• 160 x 10GE server links (UNI) per Bn
• (32 x 160)/2 = 2560 servers @ 2 x 10GE per server
• uFIB = 16 x 48 B-MAC = 768 entries
• mFIB = 16 subnets x 48 sources = 768 entries
• 1536 FIB entries per node

[Figure: upper layer A1 … A16 over lower layer B1 … B32, with servers S1,1 … S1,160 through S32,1 … S32,160. Labels: 16 x 32 x 100GE = 51.2T using 48 x 2T switches; 5120 x 10GE server links; 16 x 100GE up and 160 x 10GE down per Bn; 32 x 100GE per An.]

Good numbers: “16” and “2” levels.
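The arithmetic on this slide can be re-derived with a few lines of Python. The variable names are only illustrative; all quantities come from the bullets above.

```python
UPPER = 16             # A1..A16
LOWER = 32             # B1..B32
UNI_PER_LOWER = 160    # 10GE server links per Bn
NNI_GBPS = 100         # each NNI link/group
UNI_GBPS = 10          # each server link
LINKS_PER_SERVER = 2   # servers are dual-homed at 2 x 10GE
B_VIDS = 16            # one per ECT algorithm
SUBNETS = 16           # one per B-VID

switches = UPPER + LOWER                      # 48
nni_links = UPPER * LOWER                     # 512 x 100GE
fabric_tbps = nni_links * NNI_GBPS / 1000     # 51.2T
uni_links = LOWER * UNI_PER_LOWER             # 5120 x 10GE
servers = uni_links // LINKS_PER_SERVER       # 2560
ufib = B_VIDS * switches                      # 768 unicast entries
mfib = SUBNETS * switches                     # 768 multicast entries
fib_per_node = ufib + mfib                    # 1536

# Non-blocking check at each Bn: uplink capacity equals server-facing capacity.
uplink_gbps = UPPER * NNI_GBPS                # 16 x 100GE = 1600G
downlink_gbps = UNI_PER_LOWER * UNI_GBPS      # 160 x 10GE = 1600G

print(switches, nni_links, fabric_tbps, uni_links, servers, fib_per_node)
```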

Page 5

For a given ECT-ALGk, Aj is a member of every SPF-TREE(B*,ECT-ALGk)

When properly tuned, no two ECT-ALGorithms will use the same Aj as a fork point.

[Figure: the tree for ECT-ALG#12 from source node (1), reaching subnets S1 … S16.]

Page 6

Subnet Ni maps to I-SIDj, and then to a unique A(j mod 16).

So load spreading allows each Ai to transit a complete subnet.

Problem #1: traffic cannot be spread further, such that Ai and Aj (i != j) each handle a subset of the flows in I-SIDj.

[Figure: the two-layer fabric (A1 … A16 over B1 … B32), with I-SIDi and I-SIDj each transiting a single A node.]
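A minimal sketch of the j mod 16 mapping, assuming A nodes are indexed 0..15 and using hypothetical I-SID values. It shows that subnets spread evenly over the A nodes, while every flow inside one I-SID lands on the same A node (Problem #1).

```python
from collections import Counter

NUM_A = 16

def transit_node(isid: int) -> int:
    """Index (0..15) of the A node that carries all traffic of this I-SID."""
    return isid % NUM_A

# Hypothetical I-SIDs, one per subnet.
isids = range(1000, 1064)
subnets_per_node = Counter(transit_node(j) for j in isids)
print(sorted(subnets_per_node.items()))

# Every flow of a given I-SID uses the same A node, however many flows there are.
flows_of_isid_1000 = [(1000, flow_id) for flow_id in range(5)]
nodes_used = {transit_node(isid) for isid, _ in flows_of_isid_1000}
print(nodes_used)
```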

Page 7

This is an issue when Aj fails.

Recovery will move the entire subnet’s traffic to another Ai node.

A preferable solution is to spread the affected load over the remaining A* nodes.

[Figure: the two-layer fabric (A1 … A16 over B1 … B32), with I-SIDi and I-SIDj traffic.]

Page 8

Possible solution: head-end hashing (unicast only)

Allow unicast I-SIDi and I-SIDj traffic to be hashed, at the granularity of smaller flows, onto different B-VIDs (ECT-ALGorithms).

This breaks the symmetry and congruence rules but allows edge balancing at a smaller granularity. No changes to multicast. Requires learning <C-DA, B-DA>, independent of B-VID.

[Figure: the two-layer fabric (A1 … A16 over B1 … B32), showing unicast and multicast (Mcast) paths for I-SIDi and I-SIDj.]
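A hedged sketch of head-end hashing. The hash inputs follow the R2 requirement later in this deck (seed, C.SA, C.DA, C-VID, optional IP fields). The use of CRC32, the B-VID numbers and the function name `head_end_bvid` are assumptions for illustration only.

```python
import zlib

# Hypothetical B-VIDs, one per ECT-ALGorithm.
B_VIDS = list(range(4001, 4017))

def head_end_bvid(seed, c_sa, c_da, c_vid, ip=None):
    """Pick a B-VID for a unicast flow at the head end (UNI side).

    `ip` is an optional (IP.SA, IP.DA, IP.PROTO) tuple.
    """
    key = f"{seed}|{c_sa}|{c_da}|{c_vid}|{ip or ''}".encode()
    return B_VIDS[zlib.crc32(key) % len(B_VIDS)]

# Flows of one subnet (same C-VID) now spread over several B-VIDs.
flows = [("00:00:00:00:00:01", f"00:00:00:00:01:{i:02x}", 10) for i in range(256)]
chosen = {head_end_bvid(7, sa, da, vid) for sa, da, vid in flows}
print(len(chosen), "distinct B-VIDs used by 256 flows of one subnet")
```

Because forward and reverse flows hash independently, the two directions may use different B-VIDs, which is the loss of symmetry and congruence noted above.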

Page 9

Interconnection of fabrics creates more than 16 paths (exponential growth)

The number of paths can grow exponentially with increasing levels.
A constant number of ECT paths will always be << the number of paths available in many networks.
Growing 802.1aq ECT to, say, 32 or even 100 ECMP causes larger unicast FIBs.

[Figure: two of the two-layer fabrics (each A1 … A16 over B1 … B32) interconnected through C1 and C2. Path counts are labelled O(16), O(16x2) and O(16x2x16).]
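The three path counts in the figure multiply out as follows. This is a small sketch that assumes the number of equal-cost paths is simply the product of the fan-out choices at each stage.

```python
from math import prod

def equal_cost_paths(fanouts):
    """Number of equal-cost paths when each stage offers `fanout` choices."""
    return prod(fanouts)

within_fabric = equal_cost_paths([16])          # B -> A -> B
up_to_core = equal_cost_paths([16, 2])          # B -> A -> C
across_fabrics = equal_cost_paths([16, 2, 16])  # B -> A -> C -> A -> B

ECT_ALGORITHMS = 16
print(within_fabric, up_to_core, across_fabrics)
print("paths per ECT algorithm across fabrics:", across_fabrics // ECT_ALGORITHMS)
```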

Page 10

Horizontal growth: not too bad, but more ECT-ALGORITHMS are needed.

Horizontal growth by 1 just increases the number of ECT by 1.
Not too big a problem, but we would need to define new ECT (via Opaque).

[Figure: the two-layer fabric (A1 … A16 over B1 … B32) grown horizontally with A17, B33 and B34.]

Page 11

General Issue

#paths ~= O(degree^diameter)

So head-end ECT in the worst case requires an exponential number of B-VIDs.

[Figure: network between S and D, O(degree) wide and O(diameter) long; the head end chooses a path from N x B-VID.]

Page 12

A feasible solution …

Re-assign traffic to a path at each hop.

Tandem “ECMP”, just like IP.

Need to keep O(degree) next hops.
Only need one B-VID, which removes O(diameter) from the state cost.

Flip side is that you have no control; you just hope for a fine-scale statistical distribution.

[Figure: path from S to D over a single B-VID; each hop chooses a path from N x next hop.]
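A minimal sketch of tandem (per-hop) hashing on a toy topology. The topology, the B-VID number, the use of the node name as the per-node seed, and CRC32 as the hash are all assumptions for illustration. Each node stores only its own list of equal-cost next hops, and a single B-VID is used end to end.

```python
import zlib

B_VID = 4001  # hypothetical single B-VID shared by all paths

# Hypothetical equal-cost next hops toward D, as stored per node.
NEXT_HOPS = {
    "S": ["M1", "M2", "M3", "M4"],
    "M1": ["N1", "N2"], "M2": ["N1", "N2"],
    "M3": ["N1", "N2"], "M4": ["N1", "N2"],
    "N1": ["D"], "N2": ["D"],
}

def pick_next_hop(node, flow):
    """Hash the flow at this node (node name acts as the seed)."""
    key = f"{node}|{B_VID}|{flow}".encode()
    choices = NEXT_HOPS[node]
    return choices[zlib.crc32(key) % len(choices)]

def route(flow, src="S", dst="D"):
    """Follow the per-hop choices from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(pick_next_hop(path[-1], flow))
    return path

paths = {tuple(route(("c-sa", f"c-da-{i}", 10))) for i in range(200)}
print(len(paths), "distinct paths used out of 8 possible")
```

No node chooses the end-to-end path; the spread over the 8 possible paths is purely statistical, which is the "no control" trade-off named above.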

Page 13

What about loops in this mode?

The 802.1aq Ingress Check is very strong in the case of a single next hop, and hence a single possible ingress for an SA.

The 802.1aq Ingress Check is weakened in the case of multiple next hops, and hence multiple possible ingresses for an SA.

However, the 802.1aq Agreement Protocol functions correctly in the context of multiple possible next hops for the same B-VID (refer to Mick’s proof).

But …

Page 14

Agreement Protocol Concerns

Is it too complex? It is clearly non-trivial; we need implementation/emulation experience.

Is it overly Draconian? For example, the bounds on movement are what is required for a mathematical proof by induction. However, there are probably many cases where further movement would not loop. What is the degree of ‘overkill’?

Is it marketable? This is unfortunately a legitimate concern!

802.1aq can be deployed without AP until we introduce hash-based forwarding, at which point we require a symmetric AP and/or an on-data-path loop detection/drop mechanism.

Believe that an on-data-path loop detection mechanism is required for hash-based ECMP until we have more experience with AP.

Recommend we standardize a TTL TAG, either stand-alone or as a new form of I-TAG.
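A sketch of how an on-data-path TTL could bound the damage of a forwarding loop. The `Frame` fields and the drop rule are hypothetical; no tag format is defined in this deck.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Frame:
    b_da: str
    b_vid: int
    ttl: int  # hypothetical field, carried in a TTL-TAG or a new I-TAG

def forward(frame: Frame) -> Optional[Frame]:
    """Decrement the TTL at one hop; return None (drop) once it is exhausted."""
    if frame.ttl <= 1:
        return None
    return replace(frame, ttl=frame.ttl - 1)

def hops_before_drop(frame: Frame) -> int:
    """Number of times a frame stuck in a loop is forwarded before being dropped."""
    hops = 0
    current = forward(frame)
    while current is not None:
        hops += 1
        current = forward(current)
    return hops

looping = Frame(b_da="00:00:00:00:0b:01", b_vid=4001, ttl=8)
print(hops_before_drop(looping))
```

A frame caught in a transient loop is forwarded a bounded number of times and then discarded, independent of whether the Agreement Protocol is running.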

Page 15

View of New Work Requirements

R1) New ECT-ALGorithms with improved spreading properties.

R2) Allow optional head-end hash assignment of 802.1aq SPBM UNI known unicast traffic to one of multiple next hop interfaces/B-VIDs. Very similar to Link Aggregation.
Minimally HASH (seed, C.SA, C.DA, C-VID, [ IP.SA, IP.DA, IP.PROTO ])

R3) Allow optional tandem hash assignment of 802.1aq SPBM B-VID NNI unicast traffic to one of multiple next hop interfaces. Essentially a new SPBM ECT-ALG with its own B-VID (i.e. new ECT-ALGorithms, all usable at the same time).
Minimally HASH (seed, B-VID, C.SA, C.DA, C-VID, [ IP.SA, IP.DA, IP.PROTO ])

R4) Minor OA&M changes in support of R2 and R3, because symmetry/congruence is broken.

R5) More experience with AP (emulations, simulations, etc.), plus the addition of TTL to a new I-TAG or a TTL-TAG.