resilient routing reconfiguration ye wang, hao wang*, ajay mahimkar +, richard alimi, yin zhang +,...

RResilient esilient RRouting outing RReconfigurationeconfiguration

Ye Wang, Hao Wang*, Ajay Mahimkar+, Richard Alimi, Yin Zhang+, Lili Qiu+, Yang Richard Yang

* Google

+ University of Texas at AustinYale University

ACM SIGCOMM 2010, New Delhi, IndiaACM SIGCOMM 2010, New Delhi, IndiaSeptember 2, 2010

Motivation Failures are common in operational IP networks

Accidents, attacks, hardware failure, misconfig, maintenance Multiple unexpected failures may overlap

e.g., concurrent fiber cuts in Sprint (2006) Planned maintenance affects multiple network elements

May overlap with unexpected failures (e.g. due to inaccurate SRLG)

Increasingly stringent requirement on reliability VoIP, video conferencing, gaming, mission-critical apps, etc. SLA has teeth violation directly affects ISP revenue

Need resiliency: network should recover quickly & smoothly from one or multiple overlapping failures

Challenge: Topology Uncertainty Number of failure scenarios quickly explodes

500-link network, 3-link failures: > 20,000,000!

Difficult to optimize routing to avoid congestion under all possible failure scenarios Brute-force failure enumeration is clearly infeasible Existing methods handle only 100s of topologies

Difficult to install fast routing Preconfigure 20,000,000 backup routes on routers?

Focus exclusively on reachability e.g., FRR, FCP (Failure Carrying Packets), Path Splicing May suffer from congestion and unpredictable performance

congestion mostly caused by rerouting under failures [Iyer et al.] multiple network element failures have domino effect on FRR

rerouting, resulting in network instability [N. So & H. Huang]

Only consider a small subset of failures e.g., single-link failures [D. Applegate et al.] Insufficient for demanding SLAs

Online routing re-optimization after failures Too slow cannot support fast rerouting

Existing Approaches & Limitations

4

R3: Resilient Routing Reconfiguration

A novel link-based routing protection scheme

requiring no enumeration of failure scenarios provably congestion-free for all up-to-F link failures efficient w.r.t. router processing/storage overhead flexible in supporting diverse practical requirements

Goal: congestion-free rerouting under up-to-F link failures Input: topology G(V,E), link capacity ce, traffic demand d

dab: traffic demand from router a to router b

Output: base routing r, protection routing p rab(e): fraction of dab carried by link e

pl(e): (link-based) fast rerouting for link l

Problem Formulation

6

5

1010

10

10

capacity dab=6

a b

d

c

rab(ab)=1

pab(ac)=1 pab(cb)=1

From Topology Uncertainty to Traffic Uncertainty

Instead of optimizing for original traffic demand on all possible topologies under failures

R3 optimizes protection routing for a set of traffic demands on the original topology

Rerouting virtual demand set captures the effect of failures on amount of rerouted traffic

Protection routing on original topology can be easily reconfigured for use after failure occurs

7

Failure scenario (f) rerouted traffic (x)

Rerouted traffic under all possible up-to-F-link failure scenarios (independ of r):XF = { x | 0 ≤ xl ≤ cl, Σ(xl/cl) ≤ F } (convex combination)

Rerouting Virtual Demand Set

8

4/5

0/10 0/10

4/10

2/10

rerouted traffic xl after link l fails = base load on l given r(r is congestion-free xl ≤ cl)

load/capacity

a b

c

d

Failure scenario

Rerouted traffic

Upper bound of rerouted traffic

ac fails xac = 4 xac ≤ 5 (cac)

ab fails xab = 2 xab ≤ 10 (cab)

R3 Overview Offline precomputation

Plan (r,p) together for original demand d plus rerouting virtual demand x on original topology G(V,E) to minimize congestion

p may “use” links that will later fail

Online reconfiguration Convert and use p for fast rerouting after failures

9

Compute (r,p) to minimize MLU (Max Link Utilization) for original demand d + rerouting demand x ∈ XF

r carries d, p carries x∈XF

min(r,p) MLU

s.t. [1] r is a routing, p is a routing; [2] ∀x∈XF, ∀e:

[ ∑a,b∈Vdabrab(e) + ∑l∈Exlpl(e) ] / ce ≤ MLU

Challenge: [2] has infinite number of constraints Solution: apply LP duality a polynomial # of constraints

Offline Precomputation

10

Original traffic Rerouting traffic

Step 1: Fast rerouting after ac fails: Precomputed p for ac:

pac(ac)=1/3, pac(ab)=1/3, pac(ad)=1/3 ac fails fast reroute using pac \ pac(ac) equivalent to a rescaled pac:

ξac(ac)=0, ξac(ab)=1/2, ξac(ad)=1/2 link l fails ξl(e)=pl(e)/(1-pl(l))

Online Reconfiguration

11

1/3

1/3 1/3

2/31/3

pac(e)

a b

c

d

0

1/2

1/2 1/2

1ξac(e)

Online Reconfiguration (Cont.) Step 2: Reconfigure p after failure of ac

Current p for ab: pab(ac)=1/2, pab(ad)=1/2

ac fails 1/2 need to be “detoured” using ξac pab(ac)=0, pab(ad)=3/4, pab(ab)=1/4

link l fails ∀l’, pl’(e) = pl’(e)+pl’(l) ξl(e)

12

1/2

1/2 1/2

1/2

Apply detour ξac on every protection routing (for otherlinks) that is using ac

pab(e)

a b

c

d

0

3/4 3/4

0

01/4

R3 Guarantees Sufficient condition for congestion-free

if (∃ r,p) s.t. MLU ≤ 1 under d+XF

no congestion under any failure involving up to F links

Necessary condition under single link failure if there exists a protection routing guarantees no congestion

under any single-link failure scenario (∃ r,p) s.t. MLU ≤ 1 under d+X1

Adding superset of rerouted traffic to original demand is not so wasteful

Open problem: is R3 optimal for >1 link failures?

R3 online reconfiguration is order independent of multiple failures

13

R3 Extensions Fixed base routing

r can be given (e.g., as an outcome of OSPF) Trade-off between no-failure and failure protection

Add penalty envelope β(≥1) to bound no-failure performance

Trade-off between MLU and end-to-end delay Add envelope γ(≥1) to bound end-to-end path delay

Prioritized traffic protection Associate different protection levels to traffic with different priorities

Realistic failure scenarios Shared Risk Link Group, Maintenance Link Group

Traffic variations Optimize (r,p) for d ∈ D + x ∈ XF

14

Evaluation Methodology Network Topology

Real: Abilene, US-ISP (PoP-level) Rocketfuel: Level-3, SBC, and UUNet Synthetic: GT-ITM

Traffic Demand Real: Abilene, US-ISP Synthetic: gravity model

Failure Scenarios Abilene: all failure scenarios with up to 3 physical links US-ISP: maintenance events (6 months) +

all 1-link, 2-link failures + ~1100 sampled 3-link failures

Enumeration only needed for evaluation, not for R3

15

Evaluation Methodology (cont.) R3 vs other rerouting schemes

OSPF+R3: add R3 rerouting to OSPF MPLS-ff+R3: “ideal” R3 (flow-based base routing) OSPF+opt : benchmark, optimal rerouting (based on OSPF) OSPF+CSPF-detour: commonly used OSPF+recon: ideal OSPF reconvergence FCP: Failure Carrying Packets [Lakshminarayanan et al.] PathSplice: Path Splicing (k=10,a=0,b=3) [Motivala et al.]

Performance metrics MLU (Maximum Link Utilization) measures congestion

lowerbetter performance ratio = MLU of the algorithm / MLU of optimal routing

for the changed topology (corresponding to the failure scenario) always ≥1, closer to 1better

16

US-ISP Single Failure

17

R3 achieves near-optimal performance (R3);R3 out-performs other schemes significantly.

US-ISP Multiple Failures

18

R3 consistently out-performs other schemes by at least 50%

all two-link failure scenarios sampled three-link failure scenarios

US-ISP No Failure: Penalty Envelope

19

R3 near optimal under no failures with 10% penalty envelope

Experiments using Implementation R3 Linux software router implementation

Based on Linux kernel 2.6.25 + Linux MPLS Implement flow-based fast rerouting and efficient R3 online

reconfiguration by extending Linux MPLS

Abilene topology emulated on Emulab 3 physical link failures (6 directed link failures)

20

R3 vs OSPF-recon: Link Utilization

21

R3 out-performs OSPF+recon by a factor of ~3

Precomputation Complexity

Profile computation time (in seconds) Computer: 2.33 GHz CPU, 4 GB memory

22

Offline precomputation time < 36 minutes(operational topologies, < 17 minutes)

Computation time stable with higher #failures.

Router Storage Overhead Estimate maximum router storage overhead for our implementation

ILM (MPLS incoming label mapping) and NHLFE (MPLS next-hop label forwarding entry) are required to implement R3 protection routing

23

Modest router storage overhead:FIB < 300 KB, RIB < 20 MB,#ILM < 460, #NHLFE < 2,500

Conclusions R3: RResilient RRouting RReconfiguration

Provably congestion-free guarantee under multiple failures Key idea: convert topology uncertainty to traffic uncertainty offline precomputation + online reconfiguration

Flexible extensions to support practical requirements

Trace-driven simulation R3 near optimal (>50% better than existing approaches)

Linux implementation Feasibility and efficiency of R3 in real networks

24

Thank you!

25

26

Backup Slides

26

27

US-ISP Single Failure: Zoom In

R3 achieves near-optimal performance (R3 vs opt)

For each hour (in one day), compare the worst case failure scenario for each algorithm

27

Level-3 Multiple Failures Left: all two-failure scenarios; Right: sampled three-failure scenarios

28

R3 out-performs other schemes by >50%

SBC Multiple Failures Left: all two-failure scenarios Right: sampled three-failure scenarios

29

“Ideal” R3 outperforms OSPF+R3 in some cases

Robustness on Base Routing OSPFInvCap: link weight is inverse proportional to bandwidth OSPF: optimized link weights Left: single failure; Right: two failures

30

A better base routing can lead to better routing protection

31

Flow RTT (Denver-Los Angeles)

R3 implementation achieves smooth routing protection

31

32

R3: OD pair throughput

Traffic demand is carried by R3 under multiple link failure scenarios

32

33

R3: Link Utilization

Bottleneck link load is controlled under 0.37 using R3

33

Offline Precomputation Solution

[C2] contains infinite number of constraints due to x

Consider maximum “extra” load on link e caused by x

Σl∈Exlpl(e) ≤ UB, if there exists multipliers πe(l) and λe (LP duality):

Convert [C2] to polynomial number of constraints:

34

πe(l)λe

& constraints on πe(l) and λe

35

Traffic Priority

Service priority is a practical requirement of routing protection

Traffic Priority ExampleTPRT (real-time IP) traffic

should be congestion-free under up-to-3 link failures (to achieve 99.999% reliability SLA)

Private Transport (TPP) trafficshould survive up-to-2 link failures

General IP trafficshould only survive single link failures

35

36

R3 with Traffic Priority

Attribute protection level to each class of trafficTPRT protection level 3TPP protection level 2 IP protection level 1

Traffic with protection level greater than equal to i should survive under failure scenarios covered by protection level i

36

37

R3 with Traffic Priority AlgorithmPrecomputation with Traffic Priority

Consider protection for each protection level i

Guarantee each class of traffic has no congestion under the failure scenarios covered by its protection level

37

38

R3 with Traffic Priority Simulation Routing Protection Simulation

Basic R3 vs R3 with Traffic Priority

Methodology US-ISP: a large tier-1 operational network topology

hourly PoP-level TMs for a tier-1 ISP (1 week in 2007) extract IPFR and PNT traffic from traffic traces, subtract IPFR

and PNT from total traffic and treat the remaining as general IP Protection levels:

TPRT: up-to-4 link failures TPP: up-to-2 link failures IP: single link failures

Failure scenarios: enumerated all single link failures 100 worst cases of 2 link failures 100 worst cases of 4 link failures

38

Traffic Protection Priority

IP: up-to-1 failure protection;TPP: up-to-2 failure protection;TPRT: up-to-4 failure protection

Left: single failure;Righttop: worst two failures;Rightbottom: worst four failures

39R3 respects different traffic protection

priorities

Linux Implementation MPLS-ff software router

Based on Linux kernel 2.6.25 + Linux MPLS Implement flow-based routing for efficient R3 online reconfiguration

Extend MPLS FWD (forward) data structure to enable per hop traffic splitting for each flow

Failure detection and response Detection using Ethernet monitoring in kernel

In operational networks, detection can be conservative (for SLRG, MLG)

Notification using ICMP 42 flooding Requires reachability under failures

Rerouting traffic by MPLS-ff label stacking Online reconfiguration of traffic splitting ratios (locally at each

router)

40

41

MPLS-ff R3 uses flow-based traffic splitting (for each OD

pair) e.g. pnewywash(chichous)=0.2 means link chichous will

carry 20% traffic originally carried by newywash when newywash fails

Current MPLS routers only support path-based traffic splitting Traffic load on equal-cost LSPs is proportional to requested

bandwidth by each LSP Juniper J-, M-, T- series and Cisco 7200, 7500, 12000 series

e.g. multiple paths from newy to wash can be setup to protect link newywash

4141

42

MPLS-ff Convert flow-based to path-based routing

e.g., using flow decomposition algorithm-- [Zheng et al]

Expensive to implement R3 online reconfiguration Need to recompute and signal LSPs after each failure

Extend MPLS to support flow-based routing MPLS-ff Enable next-hop traffic splitting ratios for each flow

flow: traffic originally carried by a protected link

4242

43

MPLS-ff

MPLS FWD data structure Extended to support multiple Next Hop Label

Forwarding Entry (NHLFEs) One NHLFE specifying one neighbor

One Next-hop Splitting Ratio for each NHLFE

Label 200 FWD:NHLFE R2: 50%NHLFE R3: 50%NHLFE R4: 0% R1

R2

R4

R3

43

44

MPLS-ff

Implement Next-hop Splitting Ratio router i, next-hop j, protected link (a,b):

Packet hashing Same hash value for packets in the same TCP flow Independent hash values on different routers for

any particular TCP flow

Hash of (packet flow fields + router id)

Hash = f(src, dst,srcPort, dstPort)

FWD to j if 40<Hash<64

all packets from i: 40<Hash<64!

(skewed hash value distribution)i j

44

45

Failure Response Failure Detection and

Notification detection: Ethernet monitoring

in Linux kernel

notification: ICMP 42 flood

Failure Response MPLS-ff label stacking

Protection Routing Update R3 online reconfiguration

requires a local copy of p on each router

45

R3 Design: Routing Model Network topology as graph G = (V,E)

V: set of routers E: set of directed network links, link e=(i,j) with capacity ce

Traffic matrix (TM) TM d is a set of demands: d = { dab | a,b ∈ V } dab: traffic demand from a to b

Flow-based routing representation r = { rab(e) | a,b ∈ V, e ∈ E } rab(e): the fraction of traffic from a to b (dab) that is carried by

ink ee.g. rab(e)=0.25 means link e carry 25% traffic from a to b

46

Intuition behind R3 Plan rerouting on the single original topology

Avoid enumeration of topologies (failure scenarios) Compute (r,p) to gurantee congestion-free

for d + x∈XF on G

Puzzles Add rerouted traffic before it appears

XF = { x | 0 ≤ xl ≤ cl, Σ(xl/cl) ≤ F } superset of rerouted traffic under failures Use network resource that will disappear

Protection routing p uses links that will fail

47

By doing two counter-intuitive things, R3 achieves:• Congestion-free under multiple link failures w/o

enumeration of failure scenarios• Optimal for single link failure• Fast rerouting

Not real traffic demand!

Topology with links that will fail!

Step 1: Fast rerouting after ac fails: Precomputed p for link ac:

pac(ac)=0.4, pac(ab)=0.3, pac(ad)=0.3 ac fails 0.4 need to be carried by ab and ad

pac: ac 0, ab 0.5, ad 0.5 a locally rescales pac and activates fast rerouting

Efficiently implemented by extending MPLS label stacking

2/102+2/10

Online Reconfiguration

48

0/10 0/10

2/10

0+2/10 0+2/10

2+4/104/54-4/5

Fast reroutingload/capacity

a b

c

d

0.3

0.3 0.3

0.60.4

pac(e)

a b

c

d

0

0.5

0.5 0.5

1

resilient routing reconfiguration ye wang, hao wang*, ajay mahimkar +, richard alimi, yin zhang +,...

Documents

p ac ab

p ab ab

ac p ab ac

p ac p ac ac equivalent

ac p ab e ab c d

link e p

p ac ad

p ac e ab c d