
Routing Protocol Convergence and Availability

© 2009 Cisco Systems, Inc. All rights reserved. LACNOG2010 Cisco Public

Alvaro Retana ([email protected])
Principal Engineer
Core IP Technology Architecture

High Availability Overview

Availability Definitions

• The probability that a service (or network, etc.) is operational, and functional as needed, at any point in time

• Availability = (MTBF - MTTR)/MTBF
  Useful definition for theoretical and practical

• MTBF is mean time between failures
  What, when, why and how does it fail?

• MTTR is mean time to repair
  How long does it take to fix?
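As a worked example of the definition above, the following Python sketch computes availability from MTBF and MTTR with the slide's formula and converts an availability figure into expected downtime per year. The function names and the sample MTBF and MTTR values are illustrative assumptions, not figures from the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # the 24x365 basis used in the slides


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = (MTBF - MTTR) / MTBF, as defined above."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return (mtbf_hours - mttr_hours) / mtbf_hours


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


# Hypothetical device: one failure every 10,000 hours, 4 hours to repair.
a = availability(10_000, 4)
print(f"availability = {a:.4%}")
print(f"downtime     = {downtime_minutes_per_year(a):.0f} minutes per year")
```

Because MTTR appears in the numerator, shortening repair (or reconvergence) time raises availability even when the failure rate stays the same.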

What Is High Availability?

Availability   DPM      Downtime Per Year (24x365)
99.000%        10000    3 Days 15 Hours 36 Minutes
99.500%        5000     1 Day 19 Hours 48 Minutes
99.900%        1000     8 Hours 46 Minutes
99.950%        500      4 Hours 23 Minutes
99.990%        100      53 Minutes
99.999%        10       5 Minutes
99.9999%       1        30 Seconds

DPM = Defects per Million (Hours of Running Time)

The slide brackets part of this table with the label "High Availability".

Causes of Unscheduled Downtime

Cause                                  % of Respondents
Network Operations Failures            87%
Physical Link Failures                 87%
Network Hardware Failures              79%
Network Software Failures              67%
Customer Premises Equipment Failure    67%
Physical Environment Failures          62%
Congestion/Overload                    44%
Unknown                                37%
Acts of Nature                         37%
Malicious Damage                       25%

Source: Sage Research, IP Service Provider Downtime Study: Analysis of Downtime Causes, Costs and Containment Strategies, August 17, 2001, Prepared for Cisco SPLOB

Network Convergence

• Network convergence is the time needed for traffic to be rerouted to the alternative or more optimal path after the network event

• Network convergence requires all affected routers to process the event and update the appropriate data structures used for forwarding

• Network convergence is the time required to:
  Detect event has occurred
  Propagate the event
  Process the event
  Update related forwarding structures
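The four components above add up to the total convergence time, so it can help to treat them as a budget. The sketch below is only an illustration: the class name and every millisecond value are hypothetical and were not measured or quoted in the presentation.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """The four convergence components, each in milliseconds."""
    detect_ms: float
    propagate_ms: float
    process_ms: float
    update_ms: float

    def total_ms(self) -> float:
        return (self.detect_ms + self.propagate_ms
                + self.process_ms + self.update_ms)

    def dominant_stage(self) -> str:
        """Name of the stage that contributes the most time."""
        stages = {
            "detect": self.detect_ms,
            "propagate": self.propagate_ms,
            "process": self.process_ms,
            "update": self.update_ms,
        }
        return max(stages, key=stages.get)


# Hypothetical numbers, chosen only to show the arithmetic.
budget = ConvergenceBudget(detect_ms=150, propagate_ms=20,
                           process_ms=60, update_ms=250)
print(budget.total_ms(), "ms total; largest stage:", budget.dominant_stage())
```

Looking at the largest term first shows where tuning effort is likely to pay off.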

Network Convergence (2)

• Network Design and Operational Considerations
  Processes for fault, configuration, performance and security
  No Single Points of Failure (except at edge) / Failure Domain Size
  Excellent consistency (HW, SW, config, design)
  Redundancy, Hierarchy, Summarization, Modularity

• Detection
  Physical Failure (light!)
  Fast Hellos
  Bidirectional Forwarding Detection (BFD)

• Hiding
  Interface Dampening (for flapping links)
  Graceful Restart

• Propagation and Processing
  Link State Exponential Back off
  Prefix Prioritization
  BGP Prefix Independent Convergence (PIC)
  IP Fast ReRoute
  IGP/BGP Interaction

Graceful Restart



NSF/SSO

• Standby Route Processor (RP) takes control of router after a hardware or software fault on the Active RP

• SSO allows standby RP to take immediate control and maintain connectivity protocols

• NSF continues to forward packets until route convergence is complete

[Figure: Active RP and Standby RP sharing State Information, with the two RPs connected to the Line Cards]

NSF/SSO Design Goals

• Provide a scalable solution
  Architecture must scale with workloads and features and meet network requirements

• Minimize state that must be synchronized
  Minimize impact of HA on service

• Detect and react to failures quickly
  Continuously monitor Active components
  Continuously verify operation of Standby components

Graceful Restart

• When the BGP peering session is brought up, the graceful restart capability is negotiated. If both peers state they are capable of GR, it’s enabled on the peering session.

• When A restarts, it opens a new TCP session to B, using the same router ID.

• B interprets this as a restart, and closes the old TCP session.

[Figure: BGP peers A and B, each with Control and Data planes; messages exchanged: GR capability, New TCP Session; B's action: Restart; close old session]

Graceful Restart

• B transmits updates containing its BGP table (its local RIB out).

• A goes into read only mode, and does not run the bestpath calculations until B has finished sending updates.

• When B has finished sending updates, it sends an end of RIB marker, which is an update with an empty withdrawn NLRI TLV.

[Figure: BGP peers A and B, each with Control and Data planes; B sends Updates followed by the End of RIB Marker while A is in read only mode]

Graceful Restart

• When A receives the end of RIB marker, it runs bestpath, and installs the best routes in the routing table.

• After the local routing table is updated, BGP notifies CEF.

• CEF then updates the forwarding tables, and removes all information marked as stale.

[Figure: BGP peers A and B, each with Control and Data planes]
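The restart sequence described on the Graceful Restart slides can be sketched as a toy state machine for router A. This is a simplified illustration, not Cisco's implementation: the class and method names are hypothetical, bestpath is reduced to "accept what B sent", and the forwarding table is a plain dictionary standing in for CEF.

```python
from dataclasses import dataclass, field


@dataclass
class RestartingRouter:
    """Toy model of router A after its control plane restarts."""
    # Forwarding table preserved across the restart: prefix -> [next_hop, stale]
    fib: dict = field(default_factory=dict)
    # Routes received from B while in read only mode: prefix -> next_hop
    received: dict = field(default_factory=dict)
    read_only: bool = True

    def restart(self):
        """Keep forwarding on the old entries, but mark them all stale."""
        for entry in self.fib.values():
            entry[1] = True
        self.received.clear()
        self.read_only = True

    def on_update(self, prefix, next_hop):
        """In read only mode updates are stored; bestpath is not run yet."""
        self.received[prefix] = next_hop

    def on_end_of_rib(self):
        """End of RIB: run bestpath, install routes, purge stale entries."""
        self.read_only = False
        for prefix, next_hop in self.received.items():  # stand-in for bestpath
            self.fib[prefix] = [next_hop, False]
        self.fib = {p: e for p, e in self.fib.items() if not e[1]}


a = RestartingRouter(fib={"10.0.0.0/8": ["B", False],
                          "192.0.2.0/24": ["B", False]})
a.restart()                      # both entries still forward, now stale
a.on_update("10.0.0.0/8", "B")   # B re-advertises only one prefix
a.on_end_of_rib()                # the stale 192.0.2.0/24 entry is removed
print(a.fib)
```

The point of the stale flag is that forwarding continues on old state during the restart, and only entries that B does not refresh are removed afterwards.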

Graceful Restart References

• RFC 4724: Graceful Restart Mechanism for BGP
• RFC 5306: Restart Signaling for IS-IS
• RFC 4811: OSPF Out-of-Band Link State Database (LSDB) Resynchronization
• RFC 5613: OSPF Link-Local Signaling
• RFC 4812: OSPF Restart Signaling
• RFC 3623: Graceful OSPF Restart

Fast Convergence



OSPF Architectural Constants

• Initial LSA Generation Delay = 500 ms
• Recurring LSA Origination Delay = 5 s
• LSA Arrival Throttling = 1 s
• LSA Flooding Pacing = 33 ms
• LSA Retransmission = 66 ms
• SPF Execution Delay = 500 ms
• SPF Holdtime = 5 s

Event Propagation: OSPF Exponential Backoff

• Fast LSA Generation after Initial Event
• Repeated events increase regeneration delay
• Configuration:
  timers throttle lsa all <lsa-start> <lsa-hold> <lsa-max>
• Similar Configuration for Event Processing (SPF Runs)
  timers throttle spf <spf-start> <spf-hold> <spf-max>

Event Propagation: OSPF Exponential Backoff (example)

timers throttle lsa all 10 500 5000

[Figure: three timelines in ms comparing Events Causing LSA Generation, LSA Generation, and LSA Generation with the Back-off Algorithm. The previous LSA generation was at t0, with (t1 - t0) > 5000 ms. Axis labels show the first generation at t1+10 and subsequent intervals of 500, 1000, 2000, 4000 and 5000 ms.]
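The back-off behaviour can be sketched as a small function that lists the successive wait times while triggering events keep arriving. This is a simplified model under stated assumptions: the function name is hypothetical, and the reset to the start value after a quiet period (the (t1 - t0) > 5000 ms condition in the figure) is not modelled.

```python
def throttle_waits(start_ms: int, hold_ms: int, max_ms: int, count: int) -> list[int]:
    """Successive wait times under continuous events.

    The first event waits start_ms; the next waits hold_ms; each further
    wait doubles the previous one until it is capped at max_ms.
    """
    waits = []
    next_wait = start_ms
    hold = hold_ms
    for _ in range(count):
        waits.append(next_wait)
        next_wait = min(hold, max_ms)
        hold = min(hold * 2, max_ms)
    return waits


# Values from the example: timers throttle lsa all 10 500 5000
print(throttle_waits(10, 500, 5000, 7))
```

With the example values, the sequence starts at 10 ms, then doubles from 500 ms until it reaches the 5000 ms cap, which matches the intervals labelled in the figure.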

Link State Prefix Priority

• Prefix Prioritization
  4 priorities: Critical, High, Medium, Low
  /32 IPv4 and /128 IPv6 prefixes are classified by default in Medium Priority
  Rest is classified by default in Low Priority

• Prefix Prioritization is THE key behavior; for example
  CRITICAL: IPTV SSM sources
  HIGH: Most Important PEs
  MEDIUM: All other PEs
  LOW: All other prefixes
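A minimal sketch of the classification above: host prefixes default to Medium, everything else to Low, and an operator-supplied table promotes selected prefixes to Critical or High. The function names, the `overrides` table and the sample addresses are hypothetical; how priorities are actually configured on a router is not shown in the slides.

```python
import ipaddress

PRIORITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def default_priority(prefix: str) -> str:
    """/32 IPv4 and /128 IPv6 prefixes are MEDIUM by default, the rest LOW."""
    net = ipaddress.ip_network(prefix)
    return "MEDIUM" if net.prefixlen == net.max_prefixlen else "LOW"


def classify(prefix: str, overrides: dict) -> str:
    """Explicit policy wins; otherwise fall back to the default rule."""
    return overrides.get(prefix, default_priority(prefix))


def update_order(prefixes: list, overrides: dict) -> list:
    """Order in which prefixes would be handled, most important first."""
    return sorted(prefixes,
                  key=lambda p: PRIORITY_ORDER.index(classify(p, overrides)))


# Hypothetical policy: an IPTV source is CRITICAL, one PE loopback is HIGH.
overrides = {"198.51.100.10/32": "CRITICAL", "192.0.2.1/32": "HIGH"}
prefixes = ["10.1.0.0/16", "2001:db8::1/128", "192.0.2.1/32", "198.51.100.10/32"]
for p in update_order(prefixes, overrides):
    print(classify(p, overrides), p)
```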

BGP PIC Edge: PE-CE link failure (fast repair)

1. link PE2-CE2 fails
   If BGP PIC Edge implemented, then traffic goes PE1, PE2, PE3, CE2

[Figure: VPN 1 site A and VPN 1 site B (x.x.x.x/y) with CE1, CE2, PE1, PE2 and PE3 (RD 1:1, RD 2:1, RD 3:1) and route reflectors RR1 to RR4]

BGP PIC Edge: PE-CE link failure (re-optimization)

1. link PE2-CE2 fails
   If BGP PIC Edge implemented, then traffic goes PE1, PE2, PE3, CE2
2. Fast External Fallover scans BGP table, calculating new bestpaths
3. PE2 withdraws paths
4. RR2 and RR4 propagate withdraws
5. RR1 and RR3 propagate withdraws
6. PE1 deletes path via PE2, now going via PE3

[Figure: VPN 1 site A with CE1, CE2, PE1, PE2 and PE3 (RD 1:1, RD 2:1, RD 3:1) and route reflectors RR1 to RR4]

BGP PIC Edge: PE node failure (fast repair)

1. node PE2 fails
2. The IGP does propagate the BGP NH failure
3. PE1 withdraws paths
   If BGP PIC Edge implemented, then traffic goes PE1, PE3, CE2

[Figure: VPN 1 site A with CE1, CE2, PE1, PE2 and PE3 (RD 1:1, RD 2:1, RD 3:1) and route reflectors RR1 to RR4]

BGP PIC Edge sample

[Figure: plot with a logarithmic y axis in msec (1 to 1000000) against Prefix on the x axis (0 to 500000), with series 250k PIC, 250k no PIC, 500k PIC and 500k no PIC]
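One way to see why a prefix independent repair scales differently from per-prefix reconvergence is to count forwarding-table writes. The sketch below assumes a hierarchical forwarding table in which many prefixes point at one shared path-list object that already holds a backup egress PE. That data-structure model, and all class and function names, are assumptions for illustration and are not taken from the slides.

```python
class PathList:
    """Shared by every prefix that resolves via the same pair of egress PEs."""

    def __init__(self, primary: str, backup: str):
        self.primary = primary
        self.backup = backup
        self.active = primary

    def fail_primary(self) -> None:
        self.active = self.backup


def repair_with_shared_path_list(path_list: PathList) -> int:
    """Switch every dependent prefix at once; returns the number of writes."""
    path_list.fail_primary()
    return 1


def repair_flat_table(flat_fib: dict, failed: str, backup: str) -> int:
    """Rewrite each affected prefix individually; returns the number of writes."""
    writes = 0
    for prefix, next_hop in flat_fib.items():
        if next_hop == failed:
            flat_fib[prefix] = backup
            writes += 1
    return writes


prefixes = [f"prefix-{i}" for i in range(1000)]

shared = PathList(primary="PE2", backup="PE3")
hier_fib = {p: shared for p in prefixes}      # every prefix points at one object
flat_fib = {p: "PE2" for p in prefixes}       # every prefix stores its own next hop

print("shared path list writes:", repair_with_shared_path_list(shared))
print("flat table writes:      ", repair_flat_table(flat_fib, "PE2", "PE3"))
```

In the shared model the repair cost does not depend on the number of prefixes, while in the flat model it grows with the table size.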

IP Fast ReRoute



Objective

• Provide fast re-route in pure IP networks and MPLS/LDP networks without deploying RSVP-TE.

• To restore productive forwarding to all reachable addresses within 50ms.

• Control the transition of the network from repair to normal forwarding without further packet loss or micro-looping.

The Four Stages of IPFRR

1. Pre-computation of repair paths
2. Detection of failure
3. Invocation of appropriate repair
4. Controlled re-convergence of network

Basic Repair

• Uses ECMP and Loop Free Alternates (LFA) where available

• LFAs easily computed in OSPF and IS-IS

• Analogous to feasible successors in EIGRP

• Properties:
  In general topologies around 80% of failures allow all destinations to be repaired
  For the remaining 20%, only a subset of destinations can be repaired
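A minimal sketch of how a loop free alternate can be found from link-state data, using the basic loop-free condition from RFC 5286: a neighbor N of S is an alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D). The two sample topologies and all function names are illustrative assumptions; they are not the topologies drawn in the slides.

```python
import heapq


def shortest_distances(graph: dict, source: str) -> dict:
    """Dijkstra over a graph given as {node: {neighbor: cost}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # outdated heap entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def loop_free_alternates(graph: dict, s: str, primary_next_hop: str,
                         destination: str) -> list:
    """Neighbors of s, other than the primary next hop, that will not
    send traffic for the destination back through s."""
    dist_s = shortest_distances(graph, s)
    alternates = []
    for n in graph[s]:
        if n == primary_next_hop:
            continue
        dist_n = shortest_distances(graph, n)
        if dist_n[destination] < dist_n[s] + dist_s[destination]:
            alternates.append(n)
    return sorted(alternates)


# Triangle with equal costs: N reaches D directly, so N is loop free.
triangle = {"S": {"N": 1, "D": 1}, "N": {"S": 1, "D": 1}, "D": {"S": 1, "N": 1}}
print(loop_free_alternates(triangle, "S", "D", "D"))

# Four-node ring with equal costs: N's path to E may go back through S.
ring = {"S": {"E": 1, "N": 1}, "E": {"S": 1, "X": 1},
        "X": {"E": 1, "N": 1}, "N": {"X": 1, "S": 1}}
print(loop_free_alternates(ring, "S", "E", "E"))
```

The inequality is strict, so a neighbor whose best path ties with the path back through S is rejected. That is why the equal-cost ring yields no alternate while the triangle does.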

Triangle topology - ECMP

[Figure: network diagram; labels S, N, A, B, P, O]

Square topology - LFA

[Figure: network diagram; labels S, N, A, B, P]

More complex topology - no LFA available

[Figure: network diagram; labels S, N, M, A, B, P]

Complex topology

[Figure: network diagram; labels S, N, M, A, B, P; annotation: Final Solution in Process]

Designing for Fast Convergence

• Designing for FC is more than tuning a few timers

• Designers need to look at all network layers
  Layer 1 and Layer 2 for failure detection properties and physical topology (shared-risk link groups)
  Layer 3 protocol behaviour, interactions between different protocols
  Layer 4-7 for application requirements and behaviour

• The base must be a solid network design!

• Balance must be achieved between engineering complexity and gain.
