Reducing Transient Disconnectivity using Anomaly-Cognizant Forwarding
Andrey Ermolinskiy, Scott Shenker
University of California – Berkeley and ICSI


What’s the problem?

One of the central goals of the Internet is continuous end-to-end connectivity.

BGP convergence is a major cause of connectivity disruption.
  Routers operate upon potentially inconsistent local views.
  Temporary inconsistencies give rise to anomalies, such as loops and black holes, that disrupt end-to-end packet delivery.

Example: transient routing loop with BGP

[Figure: AS-level topology with domains A, B, C, D, E, F and G. Route tables toward A shown next to four of the domains: (1. BA, 2. CBA), (1. BA, 2. DBA), (1. CBA, 2. DBA), (1. ECBA, 2. GA). Event shown: withdraw BA.]

Routing loop between C and D incurs temporary loss of connectivity between {B, C, D, E, F} and A.
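The loop in this example can be sketched in a few lines. This is a minimal illustration, not taken from the slides: it assumes that after the withdrawal C has fallen back to its route via D, D has fallen back to its route via C, and E still forwards via C. The names `next_hop` and `forward_conventional` are hypothetical.

```python
# Hypothetical per-domain next hops toward A during convergence.
# Assumption: C now uses "DBA", D now uses "CBA", and E still uses "CBA".
next_hop = {"E": "C", "C": "D", "D": "C"}


def forward_conventional(src, dst, ttl=32):
    """Follow next hops until the packet arrives or its TTL runs out."""
    path = [src]
    node = src
    while node != dst and ttl > 0:
        node = next_hop.get(node)
        if node is None:
            return path, "dropped: no route"
        path.append(node)
        ttl -= 1
    status = "delivered" if node == dst else "dropped: TTL expired"
    return path, status


# A small TTL keeps the printed path short.
path, status = forward_conventional("E", "A", ttl=6)
print(path, status)
```

With conventional forwarding nothing in the packet records that it has already visited C and D, so it bounces between them until the TTL expires.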

Related Work

Shrinking the convergence time window through BGP protocol extensions
  Ghost flushing
  Consistency assertions

Protecting end-to-end packet delivery from adverse effects of convergence
  R-BGP: forward packets on pre-computed failover paths; propagate root cause information to prevent loops.
  Consensus Routing: enforce a globally consistent view via distributed snapshots and strategically delay adoption of incoming BGP updates.

Anomaly-Cognizant Forwarding

Anomaly-Cognizant Forwarding (ACF)

Approach
  Accept routing anomalies as an unavoidable fact.
  Protect end-to-end packet delivery by detecting and recovering from anomalies on the forwarding path.

Main hypothesis
  Several simple and lightweight extensions to conventional IP forwarding enable us to sustain packet delivery during periods of BGP instability
    without the use of pre-computed backup paths
    without modifying the core routing protocol or altering its timing dynamics

Domain S has anomalous forwarding state for destination D if S’s outgoing packets destined for D arrive back at S as a result of a routing loop.

Main idea of ACF:
  Detect occurrences of anomalous state.
  Avoid forwarding packets via domains that are known to have anomalous state.

[Figure: packets sent by S toward D return to S; label: anomalous forwarding state.]

ACF Overview

Each packet carries a list of prior AS-level hops (pathTrace).
Each packet carries a blackList of domains with anomalous state.

[Figure: packet header containing the pathTrace and blackList fields.]

ACF Overview

Forward(packet p) {
    if (localASNum in p.pathTrace)
        Move loop elements from p.pathTrace to p.blackList
    nextHop = lookupNextHop(p.destAddr)
    if (nextHop in p.blackList)
        Invoke the control plane, look for alternate non-blacklisted routes in the RIB
        (nextHop becomes the alternate route's next hop, or NONE if there is none)
    if (nextHop != NONE) {
        Append localASNum to p.pathTrace
        SendPacket(p, nextHop)
    } else
        Initiate recovery-mode forwarding for p
}
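The pseudocode above can be turned into a small runnable sketch. This is one possible reading, not the authors' implementation: the `Packet` class, the `rib` dictionary (destination mapped to candidate next hops in preference order) and the choice to blacklist every domain from the first occurrence of the local AS onward are assumptions made for illustration.

```python
from dataclasses import dataclass, field

NONE = None  # sentinel for "no usable next hop"


@dataclass
class Packet:
    dest: str
    path_trace: list = field(default_factory=list)  # prior AS-level hops
    black_list: set = field(default_factory=set)    # domains with anomalous state


def forward(local_as, packet, rib):
    """One ACF forwarding decision at domain `local_as`.

    `rib` is a hypothetical stand-in for the forwarding table plus RIB.
    Returns the chosen next hop, or NONE if recovery mode is needed.
    """
    if local_as in packet.path_trace:
        # Loop detected: move the loop elements to the blacklist.
        start = packet.path_trace.index(local_as)
        packet.black_list.update(packet.path_trace[start:])
        del packet.path_trace[start:]

    candidates = rib.get(packet.dest, [])
    next_hop = candidates[0] if candidates else NONE
    if next_hop in packet.black_list:
        # "Control plane": look for an alternate non-blacklisted route.
        next_hop = next((h for h in candidates if h not in packet.black_list), NONE)

    if next_hop is not NONE:
        packet.path_trace.append(local_as)
    return next_hop


# Packet p has looped C -> D -> C; C's only remaining route to A goes via D (assumption).
p = Packet(dest="A", path_trace=["C", "D"])
result = forward("C", p, {"A": ["D"]})
print(result, p.path_trace, sorted(p.black_list))  # None [] ['C', 'D']
```

A return value of `NONE` corresponds to the "Initiate recovery-mode forwarding" branch; sending the packet is left to the caller.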

ACF Recovery-mode forwarding

If a router is unable to forward a packet because it does not have a valid non-blacklisted route, it initiates recovery forwarding.
  It chooses a recovery destination R from a static and well-known set of highly-connected Tier-1 domains.
  It detours the packet through R.

Intuition: R, or some router along the path to R, may know a working alternate route to the original destination.

[Figure: normal-mode and recovery-mode forwarding paths; labels: nextHop=NONE, recovery destinations R1 and R2.]
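A minimal sketch of the switch into recovery mode, under stated assumptions. The slides do not say how R is picked, whether blacklisted recovery destinations are skipped, or how the header changes when R is reached, so the `choose` callback, the blacklist filter, the `orig_dst` field handling and `at_recovery_destination` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical static, well-known set of recovery destinations.
RECOVERY_DESTINATIONS = ["R1", "R2"]


@dataclass
class Packet:
    dst: str
    orig_dst: str = ""
    path_trace: list = field(default_factory=list)
    black_list: set = field(default_factory=set)


def start_recovery(packet, choose=lambda candidates: candidates[0]):
    """Redirect `packet` toward a recovery destination, remembering the original one."""
    # Assumption: recovery destinations already on the blacklist are skipped.
    candidates = [r for r in RECOVERY_DESTINATIONS if r not in packet.black_list]
    if not candidates:
        return False           # nothing left to try
    packet.orig_dst = packet.dst
    packet.dst = choose(candidates)
    packet.path_trace.clear()  # assumption: the trace restarts for the detour
    return True


def at_recovery_destination(local_as, packet):
    """Assumed behaviour: on reaching R, resume normal-mode forwarding to the original destination."""
    if packet.orig_dst and packet.dst == local_as:
        packet.dst, packet.orig_dst = packet.orig_dst, ""
        packet.path_trace.clear()


p = Packet(dst="A", black_list={"C", "D"})
started = start_recovery(p)
print(started, p.dst, p.orig_dst)  # True R1 A
```

Because the detour is forwarded with the same pathTrace and blackList checks, the same table lookup serves both modes; only the destination field differs.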

Anomaly-Cognizant Forwarding: example

[Figure: animation over the topology and route tables of the transient-loop example (domains A through G). Packet p is forwarded toward A; the table below lists the successive states of p.Header shown in the frames.]

Frame  pathTrace  blackList  dst  origDst
1      [C]        {}         A    (empty)
2      [C D]      {}         A    (empty)
3      [C D]      {D}        A    (empty)
4      []         {C D}      F    A
5      [C]        {C D}      F    A
6      [C]        {C D E}    F    A
7      [C E]      {C D E}    F    A
8      []         {C D E}    F    A
9      [F]        {C D E}    F    A
10     [F G]      {C D E}    F    A

Annotation shown from frame 3 onward: C initiates recovery forwarding through domain F.
Annotation shown from frame 8 onward: F resumes normal-mode forwarding.

ACF: Observations

ACF does not use pre-computed failover paths.
  It discovers alternate routes dynamically using state in the packet header.
  The two forwarding modes make use of the same forwarding table.

Paths to recovery destinations are not assumed to be stable and anomaly-free.
  We protect recovery-mode forwarding using the same mechanism (pathTrace and blackList).

ACF: Preliminary Evaluation

Evaluation metrics
  Effectiveness in eliminating transient disconnectivity
  Efficiency of alternate paths
  Packet header overhead

ACF: Preliminary Evaluation

Simulation methodology
  CAIDA AS-level topology (27969 nodes) annotated with inferred inter-AS relationships
  12937 multihomed edge domains, 29426 adjacent provider links

Provider link failure experiment
  For each multihomed domain D and each provider link L: fail L and simulate packet delivery from every other domain to D during convergence.

[Figure: destination domain D and source domains S1, S2, S3, S4.]

Recovery destinations = 10 highly-connected Tier-1 ISPs
Packet TTL = 32 hops
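The structure of the failure experiment can be sketched as follows. This is a toy outline under stated assumptions, not the authors' simulator: the `providers` dictionary is made-up data rather than the CAIDA topology, and `delivers` is a hypothetical callback standing in for the per-packet simulation during convergence.

```python
def failure_cases(providers):
    """Yield (domain, provider) pairs: one case per provider link of a multihomed domain."""
    for domain, provs in providers.items():
        if len(provs) > 1:  # multihomed: more than one provider
            for provider in provs:
                yield domain, provider


def disconnected_fraction(sources, dest, failed_link, delivers):
    """Fraction of sources whose packets to `dest` are lost while `failed_link` is down."""
    lost = sum(1 for s in sources if not delivers(s, dest, failed_link))
    return lost / len(sources)


# Toy input, invented for illustration only.
providers = {"D": ["P1", "P2"], "X": ["P1"]}
cases = list(failure_cases(providers))
print(cases)  # [('D', 'P1'), ('D', 'P2')]
```

Running `disconnected_fraction` once per case, with conventional forwarding and with ACF, would give the kind of per-failure disconnection percentages that the evaluation reports.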

ACF: Preliminary Evaluation

Transient disconnection after a link failure

BGP with conventional forwarding
  51% of failure cases produce unwarranted disconnection.
  Widespread disconnection (>50% of ASes) occurs in 17% of cases.

BGP with ACF
  No disconnection in 92% of failure cases.
  <1% of ASes see disconnection in 98% of failure cases.

ACF: Preliminary Evaluation

Transient path efficiency

Causes of path dilation in ACF
  Transient loops
  Detouring via a recovery destination

[Figure legend: F = failure cases that produce transient disconnection with conventional forwarding.]

In 65% of failure cases that produce disconnectivity, ACF recovers packets using ≤ 2 extra hops.
9% of cases require 7 hops or more.

ACF: Preliminary Evaluation

Packet header overhead

% of ASes disconnected   0%   0.09%   0.9%   9%   90%
pathTrace length         11   16      16     20   13
blackList length         4    11      9      11   16

Maximum number of pathTrace and blackList entries in a representative sample of failure cases.

Worst-case pathTrace: 20 entries, 40 bytes of overhead assuming 16-bit AS numbers.
Worst-case blackList: 16 entries, 10 bytes of overhead for a Bloom filter with 1% error rate.
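The overhead arithmetic can be checked with a short calculation. The pathTrace figure follows directly from the stated entry count and AS-number width. For the blackList, the sketch uses the textbook sizing formula for an optimally configured Bloom filter, m = -n ln(p) / (ln 2)^2, which is an assumption about how such a filter would be sized; it gives roughly 154 bits (about 20 bytes) for 16 entries at a 1% error rate, so the 10-byte figure above presumably rests on parameters that the transcript does not state. Function names are hypothetical.

```python
import math


def path_trace_bytes(entries, as_number_bits=16):
    """Bytes needed to store `entries` AS numbers back to back."""
    return entries * as_number_bits // 8


def bloom_filter_bits(entries, error_rate):
    """Textbook size, in bits, of an optimally configured Bloom filter."""
    return math.ceil(-entries * math.log(error_rate) / math.log(2) ** 2)


trace_bytes = path_trace_bytes(20)
filter_bits = bloom_filter_bits(16, 0.01)
print(trace_bytes)                          # 40
print(filter_bits, math.ceil(filter_bits / 8))  # 154 20
```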

Challenges / Concerns

Feasibility of deployment
  ACF adds fields to the packet header and modifies core IP forwarding logic.

Packet processing overhead
  The control plane is invoked only during periods of instability.
  Common case: check pathTrace and blackList.
  Both operations admit efficient implementation in hardware and parallelization.

ACF and routing policies

Thank you. Questions?