Routing Protocol Convergence and Availability
© 2009 Cisco Systems, Inc. All rights reserved. LACNOG2010 Cisco Public
Alvaro Retana ([email protected])
Principal Engineer, Core IP Technology Architecture
High Availability Overview
Availability Definitions
• The probability that a service (or network, etc.) is operational, and functional as needed, at any point in time
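• Availability = (MTBF - MTTR)/MTBF
  – Useful definition for both theoretical and practical purposes
  – MTBF here is measured from one failure to the next, so it includes the repair time; if MTBF counts only up-time, the equivalent form is MTBF/(MTBF + MTTR)
• MTBF is the mean time between failures
  – What fails, when, why, and how?
• MTTR is the mean time to repair
  – How long does it take to fix?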
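A minimal Python sketch of the availability formula. It assumes MTBF is measured failure-to-failure (so it already contains the repair time), which is what makes (MTBF - MTTR)/MTBF the fraction of time in service; the function names and the sample MTBF/MTTR figures are illustrative, not taken from the slides.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time in service: (MTBF - MTTR) / MTBF.

    Assumes MTBF is measured failure-to-failure, so it already includes
    the MTTR spent repairing each failure.
    """
    if mttr_hours < 0 or mtbf_hours <= mttr_hours:
        raise ValueError("expected 0 <= MTTR < MTBF")
    return (mtbf_hours - mttr_hours) / mtbf_hours


def availability_from_uptime(mean_uptime_hours: float, mttr_hours: float) -> float:
    """Same quantity when the mean up-time excludes repair: U / (U + MTTR)."""
    return mean_uptime_hours / (mean_uptime_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical element: fails every 10,000 hours and takes 1 hour to fix.
    print(f"{availability(10_000, 1):.4%}")
```

Both functions describe the same element: 10,000 hours failure-to-failure with 1 hour of repair is 9,999 hours of up-time plus 1 hour of repair.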
What Is High Availability?
Availability   Downtime per Year (24x365)     DPM
99.000%        3 days 15 hours 36 minutes   10000
99.500%        1 day 19 hours 48 minutes     5000
99.900%        8 hours 46 minutes            1000
99.950%        4 hours 23 minutes             500
99.990%        53 minutes                     100
99.999%        5 minutes                       10
99.9999%       30 seconds                       1
DPM = Defects per Million (Hours of Running Time)
“High Availability”
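The downtime and DPM columns follow directly from the availability column. A minimal sketch, assuming the table's 24x365 (8,760-hour) year and reading DPM as unavailable hours per million hours of running time, which reproduces the DPM values above; the helper names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # the 24x365 basis used in the table


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


def defects_per_million_hours(availability: float) -> float:
    """DPM, read here as unavailable hours per million hours of running time."""
    return (1.0 - availability) * 1_000_000


if __name__ == "__main__":
    for a in (0.99, 0.995, 0.999, 0.9995, 0.9999, 0.99999, 0.999999):
        minutes = downtime_minutes_per_year(a)
        print(f"{a:.4%}  {minutes / 60:8.2f} h/year  DPM {defects_per_million_hours(a):6.0f}")
```

For example, 99.99% leaves 0.0001 × 525,600 minutes ≈ 52.6 minutes per year, which the table rounds to 53 minutes.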
Causes of Unscheduled Downtime
(% of respondents)
  Network Operations Failures            87%
  Physical Link Failures                 87%
  Network Hardware Failures              79%
  Network Software Failures              67%
  Customer Premises Equipment Failure    67%
  Physical Environment Failures          62%
  Congestion/Overload                    44%
  Unknown                                37%
  Acts of Nature                         37%
  Malicious Damage                       25%
Source: Sage Research, IP Service Provider Downtime Study: Analysis of Downtime Causes, Costs and Containment Strategies, August 17, 2001, Prepared for Cisco SPLOB
Network Convergence
• Network convergence is the time needed for traffic to be rerouted to an alternative or more optimal path after a network event
• Network convergence requires all affected routers to process the event and update the appropriate data structures used for forwarding
• Network convergence is the time required to:
  – Detect that the event has occurred
  – Propagate the event
  – Process the event
  – Update the related forwarding structures
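The four stages add up, so the end-to-end convergence time is only as good as its slowest stage. A minimal Python sketch of that budget; the class, field names and millisecond values are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Per-stage times (ms) for one failure, following the four stages."""
    detect_ms: float     # detect that the event has occurred
    propagate_ms: float  # propagate the event to the other routers
    process_ms: float    # process the event (e.g., run SPF or bestpath)
    update_ms: float     # update the related forwarding structures

    def stages(self) -> dict:
        return {
            "detect": self.detect_ms,
            "propagate": self.propagate_ms,
            "process": self.process_ms,
            "update": self.update_ms,
        }

    def total_ms(self) -> float:
        return sum(self.stages().values())

    def dominant_stage(self) -> str:
        stages = self.stages()
        return max(stages, key=stages.get)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    budget = ConvergenceBudget(detect_ms=150, propagate_ms=30, process_ms=50, update_ms=200)
    print(budget.total_ms(), budget.dominant_stage())
```

Tuning a stage that is not the dominant one changes the total very little, which is why the techniques below are grouped by stage.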
Network Convergence (2)
• Network Design and Operational Considerations
  – Processes for fault, configuration, performance and security
  – No Single Points of Failure (except at edge) / Failure Domain Size
  – Excellent consistency (HW, SW, config, design)
  – Redundancy, Hierarchy, Summarization, Modularity
• Detection
  – Physical Failure (light!)
  – Fast Hellos
  – Bidirectional Forwarding Detection (BFD)
• Hiding
  – Interface Dampening (for flapping links)
  – Graceful Restart
• Propagation and Processing
  – Link State Exponential Backoff
  – Prefix Prioritization
  – BGP Prefix Independent Convergence (PIC)
  – IP Fast ReRoute
  – IGP/BGP Interaction
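Detection is usually the first large term in the convergence budget. As a hedged illustration, RFC 5880 (section 6.8.4) defines the asynchronous-mode BFD detection time as the remote system's Detect Mult multiplied by the agreed interval, i.e. the larger of the local Required Min RX Interval and the remote Desired Min TX Interval. The sketch below only encodes that arithmetic; the 50 ms / 3 values in the usage line are example settings, not recommendations from the slides.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: float,
                          remote_desired_min_tx_ms: float) -> float:
    """Asynchronous-mode BFD detection time, following RFC 5880 section 6.8.4.

    The remote system transmits at the greater of what we require to receive
    and what it wants to send; the session is declared down after
    Detect Mult such intervals pass without a packet.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


if __name__ == "__main__":
    # Example settings: 50 ms intervals on both sides, multiplier 3.
    print(bfd_detection_time_ms(3, 50, 50))
```

With those example settings a failure is detected within 150 ms, compared with the tens of seconds of a default OSPF dead interval.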
Graceful Restart
NSF/SSO
• The Standby Route Processor (RP) takes control of the router after a hardware or software fault on the Active RP
• SSO (Stateful Switchover) allows the Standby RP to take immediate control and maintain connectivity protocols
• NSF (Nonstop Forwarding) continues to forward packets until route convergence is complete
[Figure: Active RP and Standby RP synchronizing state information, with the line cards below]
NSF/SSO Design Goals
• Provide a scalable solution
  – Architecture must scale with workloads and features and meet network requirements
• Minimize state that must be synchronized
  – Minimize impact of HA on service
• Detect and react to failures quickly
  – Continuously monitor Active components
  – Continuously verify operation of Standby components
Graceful Restart
• When the BGP peering session is brought up, the graceful restart (GR) capability is negotiated. If both peers state that they are GR capable, GR is enabled on the peering session.
• When A restarts, it opens a new TCP session to B, using the same router ID.
• B interprets this as a restart and closes the old TCP session.
[Figure: BGP speakers A and B, each with control and data planes; A sends the GR capability over a new TCP session, and B treats it as a restart and closes the old session]
Graceful Restart
• B transmits updates containing its BGP table (its local RIB-Out).
• A goes into read-only mode and does not run the bestpath calculation until B has finished sending updates.
• When B has finished sending updates, it sends an End-of-RIB marker. For IPv4 unicast this is an UPDATE message with no withdrawn routes, no path attributes and no NLRI (the minimum-length UPDATE); other address families use an UPDATE carrying only an empty MP_UNREACH_NLRI attribute.
[Figure: B sends updates followed by the End-of-RIB marker while A stays in read-only mode]
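For IPv4 unicast, RFC 4724 defines the End-of-RIB marker as the minimum-length BGP UPDATE, using the RFC 4271 header (16-octet all-ones marker, 2-octet length, 1-octet type, UPDATE = 2). A minimal Python sketch that builds and recognizes that 23-octet message; it covers only IPv4 unicast (the MP_UNREACH_NLRI form used by other address families is not handled), and the helper names are illustrative.

```python
import struct

BGP_MARKER = b"\xff" * 16   # RFC 4271 header marker: 16 octets of all ones
BGP_UPDATE = 2              # message type code for UPDATE
HEADER_LEN = 19             # marker (16) + length (2) + type (1)


def ipv4_end_of_rib() -> bytes:
    """Minimum-length UPDATE used as the IPv4 unicast End-of-RIB marker."""
    # Withdrawn Routes Length = 0, Total Path Attribute Length = 0, no NLRI.
    body = struct.pack("!HH", 0, 0)
    length = HEADER_LEN + len(body)  # 23 octets in total
    return BGP_MARKER + struct.pack("!HB", length, BGP_UPDATE) + body


def is_ipv4_end_of_rib(msg: bytes) -> bool:
    """True if msg is an UPDATE with no withdrawn routes, attributes or NLRI."""
    if len(msg) != HEADER_LEN + 4 or msg[:16] != BGP_MARKER:
        return False
    length, msg_type = struct.unpack("!HB", msg[16:HEADER_LEN])
    withdrawn_len, attr_len = struct.unpack("!HH", msg[HEADER_LEN:])
    return (msg_type == BGP_UPDATE and length == len(msg)
            and withdrawn_len == 0 and attr_len == 0)


if __name__ == "__main__":
    print(ipv4_end_of_rib().hex())
```

A restarting speaker in read-only mode would keep buffering updates until a check like `is_ipv4_end_of_rib` succeeds for each peer.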
Graceful Restart
• When A receives the End-of-RIB marker, it runs bestpath and installs the best routes in the routing table.
• After the local routing table is updated, BGP notifies CEF.
• CEF then updates the forwarding tables and removes all information marked as stale.
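The restart sequence across the Graceful Restart slides can be modeled as a toy state machine: A keeps forwarding on its preserved entries (marked stale), buffers updates in read-only mode, and only after the End-of-RIB marker runs bestpath, refreshes its table and flushes whatever is still stale. This is a hypothetical sketch, not CEF: the routing and forwarding tables are collapsed into one dict, and `min()` stands in for real BGP bestpath selection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RestartingSpeaker:
    """Toy model of BGP speaker A during graceful restart (hypothetical)."""
    fib: Dict[str, str]  # prefix -> next hop, preserved in the data plane
    stale: Set[str] = field(default_factory=set)
    rib_in: Dict[str, List[str]] = field(default_factory=dict)  # prefix -> candidate next hops
    read_only: bool = False

    def restart(self) -> None:
        # Keep forwarding with the old entries, but mark all of them stale.
        self.stale = set(self.fib)
        self.rib_in = {}
        self.read_only = True

    def receive_update(self, prefix: str, next_hop: str) -> None:
        # Read-only mode: buffer the path, do not run bestpath yet.
        self.rib_in.setdefault(prefix, []).append(next_hop)

    def receive_end_of_rib(self) -> None:
        self.read_only = False
        for prefix, paths in self.rib_in.items():
            self.fib[prefix] = min(paths)  # placeholder for real bestpath selection
            self.stale.discard(prefix)
        for prefix in self.stale:
            self.fib.pop(prefix, None)     # entries the peer did not refresh are removed
        self.stale.clear()
```

Forwarding never stops in this model: until the End-of-RIB arrives, lookups still hit the pre-restart entries.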
Graceful Restart References
• RFC 4724: Graceful Restart Mechanism for BGP
• RFC 5306: Restart Signaling for IS-IS
• RFC 4811: OSPF Out-of-Band Link State Database (LSDB) Resynchronization
• RFC 5613: OSPF Link-Local Signaling
• RFC 4812: OSPF Restart Signaling
• RFC 3623: Graceful OSPF Restart
Fast Convergence
OSPF Architectural Constants
• Initial LSA Generation Delay = 500 ms
• Recurring LSA Origination Delay = 5 s
• LSA Arrival Throttling = 1 s
• LSA Flooding Pacing = 33 ms
• LSA Retransmission = 66 ms
• SPF Execution Delay = 500 ms
• SPF Holdtime = 5 s
Event Propagation: OSPF Exponential Backoff
• Fast LSA generation after the initial event
• Repeated events increase the regeneration delay
• Configuration:
  timers throttle lsa all <lsa-start> <lsa-hold> <lsa-max>
• Similar configuration for event processing (SPF runs):
  timers throttle spf <spf-start> <spf-hold> <spf-max>
Example: timers throttle lsa all 10 500 5000
[Figure: events causing LSA generation vs. LSA generation, time [ms]. The previous LSA was generated at t0 and (t1 – t0) > 5000 ms, so the event at t1 triggers an LSA at t1+10; later events are throttled by the back-off algorithm with waits of 500, 1000, 2000, 4000 and then 5000 ms]
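The back-off in the example can be reproduced with a small model: the first event after a quiet period waits <lsa-start>, the next waits <lsa-hold>, and each further back-to-back event doubles the wait up to <lsa-max>. This is a simplified sketch that assumes events keep arriving faster than the current wait and ignores the reset after a quiet period longer than <lsa-max>; the function name is hypothetical, and the same shape applies to `timers throttle spf`.

```python
from typing import List


def throttle_waits(start_ms: int, hold_ms: int, max_ms: int, events: int) -> List[int]:
    """Wait applied before each of `events` back-to-back generations (simplified)."""
    waits: List[int] = []
    for i in range(events):
        if i == 0:
            wait = start_ms                    # first event after a quiet period
        elif i == 1:
            wait = min(hold_ms, max_ms)        # second event: initial hold time
        else:
            wait = min(waits[-1] * 2, max_ms)  # each further event doubles the hold, capped
        waits.append(wait)
    return waits


if __name__ == "__main__":
    # timers throttle lsa all 10 500 5000
    print(throttle_waits(10, 500, 5000, 7))
```

The small start value gives a fast reaction to an isolated failure, while the doubling hold protects the network from a flapping link.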
Link State Prefix Priority
• Prefix Prioritization
  – 4 priorities: Critical, High, Medium, Low
  – /32 IPv4 and /128 IPv6 prefixes are classified by default as Medium priority
  – All other prefixes are classified by default as Low priority
• Prefix Prioritization is THE key behavior; for example:
  – CRITICAL: IPTV SSM sources
  – HIGH: Most important PEs
  – MEDIUM: All other PEs
  – LOW: All other prefixes
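A minimal sketch of the default classification and of how priorities order the routing-table updates after an SPF run. The `Priority` enum, the override map (standing in for operator configuration such as marking an IPTV SSM source Critical) and the documentation-range sample addresses are hypothetical; real implementations configure this through their own CLI.

```python
import ipaddress
from enum import IntEnum
from typing import Dict, Iterable, List, Optional


class Priority(IntEnum):
    CRITICAL = 0  # lower value = updated first
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def default_priority(prefix: str) -> Priority:
    """Default: host routes (/32 IPv4, /128 IPv6) are Medium, everything else Low."""
    net = ipaddress.ip_network(prefix)
    return Priority.MEDIUM if net.prefixlen == net.max_prefixlen else Priority.LOW


def processing_order(prefixes: Iterable[str],
                     overrides: Optional[Dict[str, Priority]] = None) -> List[str]:
    """Order prefixes for update: Critical first, Low last (stable within a class)."""
    overrides = overrides or {}
    return sorted(prefixes, key=lambda p: overrides.get(p, default_priority(p)))


if __name__ == "__main__":
    prefixes = ["10.0.0.0/8", "192.0.2.1/32", "2001:db8::1/128", "198.51.100.7/32"]
    # Hypothetical: 198.51.100.7 is an IPTV SSM source marked Critical.
    print(processing_order(prefixes, {"198.51.100.7/32": Priority.CRITICAL}))
```

Defaulting host routes to Medium means PE loopbacks (the BGP next hops) are repaired before the bulk of the routing table.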
BGP PIC Edge: PE-CE link failure (fast repair)
1. Link PE2-CE2 fails. If BGP PIC Edge is implemented, traffic goes PE1 → PE2 → PE3 → CE2.
[Figure: CE1 (VPN 1 site A), PE1, PE2, PE3, CE2 (VPN 1 site B, prefix x.x.x.x/y), route reflectors RR1 to RR4, route distinguishers RD 1:1, RD 2:1, RD 3:1]
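The point of BGP PIC is that the repair does not depend on how many prefixes are affected. A hypothetical sketch of that idea: prefixes share a path-list object that holds a primary and a precomputed backup next hop, so invalidating one next hop touches each shared path list once instead of every prefix. The class names are illustrative, not a vendor data structure; the PE2/PE3 labels follow the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PathList:
    """Shared path list: primary next hop plus a precomputed backup."""
    primary: str
    backup: Optional[str] = None
    primary_up: bool = True

    def active_next_hop(self) -> Optional[str]:
        return self.primary if self.primary_up else self.backup


class PicFib:
    """Hypothetical FIB where many prefixes point at one shared PathList."""

    def __init__(self) -> None:
        self.prefixes: Dict[str, PathList] = {}
        self.path_lists: Dict[Tuple[str, Optional[str]], PathList] = {}

    def install(self, prefix: str, primary: str, backup: Optional[str]) -> None:
        key = (primary, backup)
        if key not in self.path_lists:
            self.path_lists[key] = PathList(primary, backup)
        self.prefixes[prefix] = self.path_lists[key]  # share, do not copy

    def next_hop_down(self, next_hop: str) -> int:
        """Fail a next hop; returns the number of path lists touched (not prefixes)."""
        touched = 0
        for path_list in self.path_lists.values():
            if path_list.primary == next_hop and path_list.primary_up:
                path_list.primary_up = False
                touched += 1
        return touched

    def lookup(self, prefix: str) -> Optional[str]:
        return self.prefixes[prefix].active_next_hop()
```

Whether 10,000 or 500,000 prefixes share the (PE2, PE3) path list, the failure of PE2 costs one update, after which every lookup resolves to PE3.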
BGP PIC Edge: PE-CE link failure (re-optimization)
1. Link PE2-CE2 fails. If BGP PIC Edge is implemented, traffic goes PE1 → PE2 → PE3 → CE2.
2. Fast External Fallover scans the BGP table, calculating new bestpaths.
3. PE2 withdraws its paths.
4. RR2 and RR4 propagate the withdrawals.
5. RR1 and RR3 propagate the withdrawals.
6. PE1 deletes the path via PE2 and now forwards via PE3.
[Figure: same PE/CE/route-reflector topology, VPN 1 sites A and B, prefix x.x.x.x/y, RD 1:1, RD 2:1, RD 3:1]
BGP PIC Edge: PE node failure (fast repair)
1. PE2 fails.
2. The IGP propagates the BGP next-hop failure.
3. PE1 withdraws the affected paths. If BGP PIC Edge is implemented, traffic goes PE1 → PE3 → CE2.
[Figure: same PE/CE/route-reflector topology, VPN 1 sites A and B, prefix x.x.x.x/y, RD 1:1, RD 2:1, RD 3:1]
BGP PIC Edge sample
[Figure: convergence time in msec (log scale, 1 to 1,000,000) versus prefix (0 to 500,000), for 250k and 500k prefixes, each with and without PIC]
IP Fast ReRoute
Objective
• Provide fast reroute in pure IP networks and MPLS/LDP networks without deploying RSVP-TE
• Restore productive forwarding to all reachable addresses within 50 ms
• Control the transition of the network from repair to normal forwarding without further packet loss or micro-looping
The Four Stages of IPFRR
1. Pre-computation of repair paths
2. Detection of failure
3. Invocation of appropriate repair
4. Controlled re-convergence of the network
Basic Repair
• Uses ECMP and Loop-Free Alternates (LFAs) where available
• LFAs are easily computed in OSPF and IS-IS
• Analogous to feasible successors in EIGRP
• Properties:
  – In general topologies, around 80% of failures allow all destinations to be repaired
  – For the remaining 20%, only a subset of destinations can be repaired
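A hedged sketch of basic LFA selection for one destination, using the loop-free condition from RFC 5286: a neighbor N of S is a loop-free alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D). Neighbors already on a shortest path are the primary (or ECMP) next hops. The helper names and the sample topologies in the test are hypothetical, and the sketch assumes a connected graph with symmetric link costs.

```python
import heapq


def build_graph(links):
    """Undirected graph {router: {neighbor: cost}} from (a, b, cost) tuples."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_distances(graph, src):
    """Dijkstra: shortest-path cost from src to every reachable router."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def primary_and_lfas(graph, s, d):
    """Return (primary/ECMP next hops, loop-free alternates) from s towards d."""
    dist_s = shortest_distances(graph, s)
    primary, lfas = [], []
    for n, cost in sorted(graph[s].items()):
        dist_n = shortest_distances(graph, n)
        if cost + dist_n[d] == dist_s[d]:
            primary.append(n)   # on a shortest path: primary or ECMP next hop
        elif dist_n[d] < dist_n[s] + dist_s[d]:
            lfas.append(n)      # RFC 5286 loop-free condition holds
    return primary, lfas


if __name__ == "__main__":
    square = build_graph([("S", "E", 1), ("E", "D", 1), ("S", "N", 1), ("N", "D", 2)])
    print(primary_and_lfas(square, "S", "D"))
```

In the hypothetical square, N reaches D without coming back through S, so it is an LFA; if N's only other path to D were longer than going back through S, the condition would fail and no LFA would exist.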
Triangle topology - ECMP
[Figure: routers S, N, A, B, P, O]
Square topology - LFA
[Figure: routers S, N, A, B, P]
More complex topology – no LFA available
[Figure: routers S, N, M, A, B, P]
Complex topology
[Figure: routers S, N, M, A, B, P]
Final solution in progress
Designing for Fast Convergence
• Designing for FC is more than tuning a few timers
• Designers need to look at all network layers
  – Layer 1 and Layer 2 for failure detection properties and physical topology (shared-risk link groups)
  – Layer 3 protocol behaviour, interactions between different protocols
  – Layers 4-7 for application requirements and behaviour
• The base must be a solid network design!
• Balance must be achieved between engineering complexity and gain.