
Page 1: Routing Resiliency Latest Enhancements Clarence …d2zmdbbm9feqrf.cloudfront.net/2013/eur/pdf/BRKIPM-2000.pdf• Debugging router/network-wide convergence for an event is ... T24:

Cisco Confidential © 2010 Cisco and/or its affiliates. All rights reserved. 1

Routing Resiliency Latest Enhancements

Clarence Filsfils – [email protected]


• RLFA

• RCMD


Remote LFA


• IGP pre-computes a backup path per primary path to an IGP destination

• FIB pre-installs the backup path in dataplane

• Upon local failure, all the backup paths of the impacted destinations are enabled in a prefix-independent manner (<50msec LoC)

– Hierarchical HW FIB organization

– Similar to BGP-PIC FRR behavior

[Figure: example topology with nodes S, F, C, E, D1 and D2]

C is an LFA for D1 if CD1 < CS + SD1, where XY denotes the shortest-path distance from X to Y
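The LFA condition above can be checked mechanically. The sketch below is illustrative only: the figure's links and metrics are not preserved in this transcript, so the topology and costs in `graph` are assumptions, and `dijkstra` and `is_lfa` are hypothetical helper names.

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path distances from src over a dict-of-dicts graph."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def is_lfa(graph, s, neighbor, dest):
    """True if neighbor is loop-free for dest: N->D < N->S + S->D."""
    dn = dijkstra(graph, neighbor)
    ds = dijkstra(graph, s)
    return dn[dest] < dn[s] + ds[dest]

# Assumed topology and metrics (not taken from the slide).
graph = {
    "S": {"F": 10, "C": 10, "E": 10},
    "F": {"S": 10, "D1": 10},
    "C": {"S": 10, "D1": 15},
    "E": {"S": 10},
    "D1": {"F": 10, "C": 15},
}

c_protects_d1 = is_lfa(graph, "S", "C", "D1")  # C reaches D1 without S
e_protects_d1 = is_lfa(graph, "S", "E", "D1")  # E can only reach D1 via S
```

In this assumed topology C qualifies as an LFA for D1, while E does not because its only path to D1 goes back through S.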


• Simple

• Sub-50msec

• Link, Node and SRLG Protection

• Deployment friendly

– no protocol change, no interop testing, incremental deployment

• Good Scaling

• No degradation on IGP convergence for primary paths


• Topology dependent

– availability of a backup path depends on topology


• 11 real Core Topologies

– average coverage: 94% of destinations

– 5 topologies higher than 98% coverage

• Real Aggregation

– simple design rules help ensure 100% link/node protection coverage for most frequent real aggregation topologies

– RFC6571

– Sweet Spot

>A simple solution is essential for access/aggregation, as it represents 90% of the network size and hence of its complexity


• Is there a way to also support the ring and “biased square”?

[Figure: a biased square (a < c) and a ring]


• Absolutely keep Per-prefix LFA benefits

– simplicity

– incremental deployment

• Increase coverage for real topologies

– primarily for ring and biased-square access topologies

– potentially for core topology

– “98/99%” is seen as good-enough

– 100% coverage is “icing on the cake”


• No LFA protection in the ring

– if E4 sends a C1-destined packet to E3, E3 sends it back to E4


• Any node which meets the P and Q properties

– P: the set of nodes reachable from E4 without traversing E4E5

– Q: the set of nodes which can reach E5 without traversing E4E5

• Best PQ node

– the closest to E4: E1

• Establish a directed LDP session with the selected PQ node
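The P and Q definitions above can be sketched with two shortest-path computations. This is a minimal illustration, not the router implementation: it assumes a seven-node ring C1-E5-E4-E3-E2-E1-C2-C1 with unit, symmetric metrics (the figure is not preserved, so the ring order is an assumption), and `pq_nodes` is a hypothetical helper name.

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path distances from src over a dict-of-dicts graph."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def pq_nodes(graph, s, f):
    """P, Q and PQ sets for protecting link s-f (symmetric metrics assumed)."""
    ds = dijkstra(graph, s)
    df = dijkstra(graph, f)  # with symmetric metrics, D(n, f) == D(f, n)
    cost = graph[s][f]
    others = [n for n in graph if n not in (s, f)]
    # P: no shortest path from s to n crosses the s-f link.
    p = {n for n in others if ds[n] < cost + df[n]}
    # Q: no shortest path from n to f crosses the s-f link.
    q = {n for n in others if df[n] < ds[n] + cost}
    # PQ candidates, closest to s first.
    return p, q, sorted(p & q, key=lambda n: ds[n])

# Assumed ring order and unit metrics.
ring = ["C1", "E5", "E4", "E3", "E2", "E1", "C2"]
graph = {n: {} for n in ring}
for a, b in zip(ring, ring[1:] + ring[:1]):
    graph[a][b] = 1
    graph[b][a] = 1

p, q, pq = pq_nodes(graph, "E4", "E5")
```

Under these assumptions the only PQ node for link E4E5 is E1, which matches the node selected on the slide.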

[Figure: ring of nodes C1, E5, E4, E3, E2, E1 and C2 spanning the access region and the backbone; E1 is the selected PQ node]


• E4’s LIB

– E5’s label for FEC C2 = 20

– E3’s label for FEC E1 = 99

– E1’s label for FEC C2 = 21

• E4’s FIB for destination C2

– Primary: out-label = 20, oif = E5

– Backup: out-label = 21, oif = [push 99, oif = E3]

RLFA is LFA from a remote node (E1)
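The way E4 derives its backup rewrite from its LIB can be sketched as a small lookup. The label values are those quoted on the slide; the `lib` dictionary layout and the `fib_entry` helper are hypothetical, chosen only to show how the two labels stack.

```python
# E4's LIB, keyed by (advertising LDP peer, FEC). Values are from the slide.
lib = {
    ("E5", "C2"): 20,  # E5's label for FEC C2 (primary next hop)
    ("E3", "E1"): 99,  # E3's label for FEC E1 (tunnel towards the PQ node)
    ("E1", "C2"): 21,  # E1's label for FEC C2 (learned over the targeted session)
}

def fib_entry(lib, dest, primary_nh, backup_nh, pq_node):
    """Build primary and Remote LFA backup rewrites; labels listed top first."""
    return {
        "primary": {"labels": [lib[(primary_nh, dest)]], "oif": primary_nh},
        "backup": {
            # Outer label reaches the PQ node; inner label is the PQ node's
            # own label for the destination.
            "labels": [lib[(backup_nh, pq_node)], lib[(pq_node, dest)]],
            "oif": backup_nh,
        },
    }

entry = fib_entry(lib, dest="C2", primary_nh="E5", backup_nh="E3", pq_node="E1")
```

The backup stack is `[99, 21]`: label 99 carries the packet to E1, where label 21 is E1's own label for C2.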

[Figure: the same ring (C1, E5, E4, E3, E2, E1, C2) annotated with labels 20, 21 and 99]

With Node and SRLG protection!


• PQ’s coverage extension is significant for some SPs


• Odd ring: 2 additional LDP sessions per node

• Even ring: 1 additional LDP session per node


• Small number of automatically signaled LDP sessions per node


• Simple rule: any link in a square should have a metric less than the sum of the 3 other link metrics

E1 can send a C2-destined packet to E2 whatever the E1E2 metric, but E2 forwards it to C2 only if E2C2 < E2E1C1C2. Likewise, C2 sends a C1-bound packet to C1 only if C2C1 < C2E2E1C1. Applying this to every link in the square shows that each link metric should be less than the sum of the other 3 link metrics.
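The square rule is easy to audit for a given set of metrics. The sketch below is an illustration with assumed metric values; `square_rule_violations` is a hypothetical helper name.

```python
def square_rule_violations(metrics):
    """Return the links whose metric is not less than the sum of the other three."""
    total = sum(metrics.values())
    return [link for link, m in metrics.items() if m >= total - m]

# Assumed metrics for the four links of a square E1, E2, C1, C2.
compliant = {"E1E2": 10, "E1C1": 10, "E2C2": 10, "C1C2": 20}
broken = {"E1E2": 10, "E1C1": 10, "E2C2": 10, "C1C2": 30}

ok_result = square_rule_violations(compliant)
bad_result = square_rule_violations(broken)
```

With a C1C2 metric of 30, that link equals the sum of the other three (10 + 10 + 10), so the strict inequality fails and the link is reported.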


• E1 has no LFA for C1

– E2 routes back

• E1 has no RemoteLFA for C1

– P and Q intersection is null


• When the P and Q spaces do not intersect, set up an RSVP LSP to the closest Q node

– RSVP allows an explicit path and hence a path that avoids the primary link

• Very few RSVP LSPs

• Automated

• 100% guarantee

• Node protection


• Seamless integration with Per-Prefix LFA

– Packets take their shortest paths from the PQ node

– Destinations use per-prefix LFA onto physical oif when available (i.e. per-prefix LFA), and per-prefix LFA onto LDP LSP (i.e. Remote LFA) otherwise

• Simple

– Automated computation, negligible CPU, low TLDP requirement

• Incremental Deployment

– New code only at the protecting node

• Meet the real coverage requirements

– backbone and access/aggregation


• Seamless and straightforward extension to per-prefix LFA

• WG document

• 3 implementations across 2 vendors


• RLFA is a seamless extension to Per-Prefix LFA

• It preserves its benefits

– simplicity, incremental deployment, scalability

• And drastically increases coverage for real topologies

– rings and biased-square

– backbone topologies


Route Convergence Monitoring & Diagnostics


[Figure: router convergence pipeline. Labels: OSPF, ISIS, BGP, LDP, RIB, LSD, BCDL, FIB and H/W FIB; Route Processor (RP), RP/LC and Line Cards (LC); stages: Detection, Flooding, Update]


Was end-to-end connectivity restored within a second?

What is the network availability for the last N days?

How do network design changes affect convergence?

What are the route-change flooding propagation delays seen in the network?

Are timers and other tuning parameters working optimally?

How are different routers or segments of network handling failures?

How can we get these answers in production networks?

Route Convergence Monitoring Diagnostics


• A tool that collects and reports data related to Routing Convergence

• Provides “in-router” view of convergence events – data exported via XML can be correlated & leveraged by an “offline” tool

• Lightweight and always-on – non convergence impacting

• Persistent – archived for use after hours/days

• Covers SP core IGP/LDP network in first phase

• Runs in two modes

– Monitoring: detecting events & measuring convergence

– Diagnostics: additional (debug) info collection for “abnormal” events

• Debugging router/network-wide convergence for an event is complex

• Affects ISPs’ ability to commit to SLAs


• Tracing

– since day 1, IOS-XR processes have had a lightweight capability to log traces of what they do. It is very performant (unlike debugging)

• Route Flow Marker

– each routing event results in a Route Flow (RF). An RF is made of four ordered sub-flows representing the four classes of service (critical, high, medium and low). A single RF is characterized by at most 8 markers (a start and an end marker per sub-flow)

• RCMD in a nutshell

– Add these 8 new markers

– Name them to allow unambiguous correlation with IGP data

– Trace them down to LC FIB update


[Figure: RCMD marker timeline for an IGP-1 trigger; T1 marks the IGP-1 SPF with ID k. Only the Critical and Low markers are shown.]

Component    Start-Crit  End-Crit  Start-Low  End-Low
IGP-1-k      T2          T3        T8         T9
RIB-1-k      T10         T11       T16        T17
LDP-1-k      T18         T19       T24        T25
FIB-IP-1-k   T26         T27       T32        T33
FIB-M-1-k    T34         T35       T36        T37

…


• Tracing: synchronous with the route event

• Collection: asynchronous and periodic: every x minutes, the RCMD process retrieves the RCMD traces from each linecard

• Filing: asynchronous and periodic: every x minutes, the RCMD process files the RCMD data in a structured database (local disk or remote disk)

• UI:

– a few show commands to provide basic reports

– XML access to complete data for customized reports


• Poll RCMD on a daily/weekly basis across all routers

• For each LSP that was flooded

– determine the origination time

– for each remote router, determine the flooding time

– flooding = Time until the remote router got the LSP

– case flooding >= 100msec: orange flag

– case flooding >= 200msec: red flag

– compute average and percentiles…
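The flooding check above can be sketched offline as follows. The 100 msec and 200 msec thresholds are the ones on the slide; the timestamps, router names, the nearest-rank percentile method and the function names are assumptions made for illustration, and real input would come from polled RCMD data.

```python
import math

ORANGE_MS = 100  # threshold from the slide
RED_MS = 200     # threshold from the slide

def flooding_flag(delay_ms):
    """Classify one flooding delay against the slide's thresholds."""
    if delay_ms >= RED_MS:
        return "red"
    if delay_ms >= ORANGE_MS:
        return "orange"
    return "ok"

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def flooding_report(origination_ms, arrival_ms_by_router):
    """Per-router flooding delay and flag, plus average and 95th percentile."""
    delays = {r: t - origination_ms for r, t in arrival_ms_by_router.items()}
    flags = {r: flooding_flag(d) for r, d in delays.items()}
    values = list(delays.values())
    stats = {"avg": sum(values) / len(values), "p95": percentile(values, 95)}
    return delays, flags, stats

# Hypothetical timestamps in msec for one flooded LSP.
delays, flags, stats = flooding_report(1000, {"R1": 1040, "R2": 1130, "R3": 1250})
```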


• Poll RCMD on a daily/weekly basis across all routers

• For each router, for each IGP event

– determine the duration until the last important prefix was updated across all linecards (called “update”). This is readily available from RCMD data.

– determine the number of important prefixes that were updated across all linecards (called “A”). This is readily available from RCMD data.

– case A > 1k: “scale is larger than expected”

– case A <=1k & update >= 250msec: orange flag

– case A <=1k & update >= 500msec: red flag

– compute average and percentiles…
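The per-event update check can be written as a single classification function. The thresholds (1k prefixes, 250 msec, 500 msec) are from the slide; the function name and the sample values are hypothetical.

```python
MAX_PREFIXES = 1000  # "A" above this means the scale assumption no longer holds
ORANGE_MS = 250
RED_MS = 500

def update_flag(num_prefixes, update_ms):
    """Classify one IGP event on one router using the slide's cases."""
    if num_prefixes > MAX_PREFIXES:
        return "scale is larger than expected"
    if update_ms >= RED_MS:
        return "red"
    if update_ms >= ORANGE_MS:
        return "orange"
    return "ok"

# Hypothetical (A, update) samples.
results = [
    update_flag(300, 120),
    update_flag(800, 260),
    update_flag(1000, 500),
    update_flag(1500, 90),
]
```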


• If

– detection is known to be <10msec

– flooding for any IGP event was verified by RCMD to be < 200msec

– update for any IGP event and any router was verified by RCMD to be < 500msec

• Then

– for any IGP event, for any router, loss of connectivity < 1 sec (worst case 10 + 200 + 500 = 710 msec)

• This does not require any complex offline processing


• Cariden Mate can leverage RCMD data to build effective availability analysis or detailed post-mortem analysis of SLA deviations


• Monitoring intra/inter-level & external routes on a per-SPF basis

– tracking on a prefix-priority basis

– maximum of 4 sets of routes tracked per SPF, one for each priority

• Covers all types of SPF

– Full

– Incremental

– Partial Route Calculation

– Nexthop Change Calculation

• Reporting done on a per-SPF-event basis

– aggregate convergence time also reported on a prefix-priority basis

– convergence time for a priority is the time at which the last route (intra/inter/ext) of that priority is provisioned

• Provides timer values applied for the SPF along with activity statistics and trigger reasons & times

* OSPF is also supported


• ISIS convergence events (i.e. SPF runs) and time taken to provision route+label changes across all LCs

• SPF computations statistics, trigger reasons, wait times

• LSPs that were processed and the timestamps of when their change was detected

• Route prioritization aware – reporting done on aggregate route priority set and not per prefix

• Leaf network deletes detected during the SPF (throttled)

• Statistics – route counts, LSP change counts

* OSPF is also supported


[Figure: test topology with two edge routers and two core routers; node labels C1, C2, ABR and A1]

------------------------------------------------------------------------------------------------

Event Num | LSP ID |Seq Num |SPF Run | Lvl | Time | Trigger

------------------------------------------------------------------------------------------------

13 0020.0203.2002.00-00 50 0 L2 Feb 16 14:39:55.802 aj if

LSP Regeneration Report from C1 (attached to failure)

Run: 80 Topology: 0 Level: L2 Type: Full

Trigger: Feb 16 14:39:55.173 Trigger: ll lp

Wait: 100 Start: 102 Duration: 1

Trigger LSP: 0020.0203.2002.00-00 Seq: 50 Change-type: Modify Time: Feb 16 14:39:55.173

<snip>

LSP Processed:

Id: 0020.0203.2002.00-00 Seq: 50 Change-type: Modify Recv-Time: Feb 16 14:39:55.173

Id: 0000.0000.0030.00-00 Seq: 2337 Change-type: Modify Recv-Time: Feb 16 14:39:55.239

SPF report snippet from C2 (remote router)


Legend:

SPF - ISIS process level SPF counter.

Trigger Time - Absolute time when the SPF was triggered (in mmm dd hh:mm:ss.msec).

Dur - Duration of the ISIS SPF computation (msecs).

Type - Type of SPF run.

Critical/High/Medium/Low - Priority based on the configured prefix prioritization policy.

For each priority - Total prefixes affected and time taken for their programming (msecs) for IP and MPLS.

^ no route change # threshold exceeded ~ incomplete data * collection pending

Reporting SPF Events for ISIS Instance : 1

-----------------------------------------------------------------------------------------------------------------
SPF | Trigger Time        | Dur | Type | LSPs | Critical      | High          | Medium        | Low
-----------------------------------------------------------------------------------------------------------------
 71   Feb 16 14:04:14.864   9     PRCL   0      0 / - / -       3 / 116 / 125   3 / 117 / 126   64 / 122 / 127
 73   Feb 16 14:05:43.768   1     FULL   2      3 / 107 / 116   3 / 107 / 118   3 / 107 / 119   6 / 108 / 121
 74   Feb 16 14:24:30.108   0     PRCL   1      0 / - / -       0 / - / -       0 / - / -       1 / 107 / 112
 75   Feb 16 14:24:34.978   1     FULL   2      3 / 107 / 118   3 / 107 / 120   3 / 108 / 122   5 / 108 / 125
 76   Feb 16 14:28:50.800   1     FULL   2      3 / 107 / 116   3 / 107 / 118   3 / 107 / 119   6 / 108 / 121
 77   Feb 16 14:37:36.491   0     PRCL   1      0 / - / -       0 / - / -       0 / - / -       1 / 106 / 112
^78   Feb 16 14:37:44.627   0     FULL   1      0 / - / -       0 / - / -       0 / - / -       0 / - / -
 79   Feb 16 14:37:45.075   1     FULL   1      3 / 107 / 117   3 / 108 / 119   3 / 108 / 121   5 / 108 / 125
 80   Feb 16 14:39:55.173   1     FULL   2      3 / 107 / 117   3 / 107 / 118   3 / 108 / 119   6 / 108 / 121

SPF summary report snippet from C2 (remote router)


• Provides high level snapshot of SPF events – their impact on routes and convergence times

• Also identifies events where the threshold has been exceeded.


Run: 80 Topology: 0 Level: L2 Type: Full

Trigger: Feb 16 14:39:55.173 Trigger: ll lp

Wait: 100 Start: 102 Duration: 1

Trigger LSP: 0020.0203.2002.00-00 Seq: 50 Change-type: Modify Time: Feb 16 14:39:55.173

Node Stats: Added: 0 Deleted: 0 Modified: 2

Reachable: 40 Unreachable: 0 Touched: 40

Timeline Summary:

Priority: Critical

Route Count: Added: 0 Deleted: 0 Modified: 3

IP Route Program Time: Min: 107(0/0/CPU0) Max: 107(0/0/CPU0)

MPLS Label Program Time: Min: 117(0/0/CPU0) Max: 117(0/0/CPU0)

Priority: High

Route Count: Added: 0 Deleted: 0 Modified: 3

IP Route Program Time: Min: 107(0/0/CPU0) Max: 107(0/0/CPU0)

MPLS Label Program Time: Min: 118(0/0/CPU0) Max: 118(0/0/CPU0)

Priority: Medium

Route Count: Added: 0 Deleted: 0 Modified: 3

IP Route Program Time: Min: 108(0/0/CPU0) Max: 108(0/0/CPU0)

MPLS Label Program Time: Min: 119(0/0/CPU0) Max: 119(0/0/CPU0)

Priority: Low

Route Count: Added: 0 Deleted: 1 Modified: 5

IP Route Program Time: Min: 108(0/0/CPU0) Max: 108(0/0/CPU0)

MPLS Label Program Time: Min: 121(0/0/CPU0) Max: 121(0/0/CPU0)

<snip>

SPF convergence report snippet from C2 (remote router)


• Provides details on router-wide route update time, trigger details, statistics, fastest/slowest LCs, etc.


• Characterization of convergence times for critical/high priority prefixes

– What percentage converged in under a second?

– Impact on customer SLA

• Type 1-2 LSA (or LSP) flooding/propagation delays seen in the network

• Data to monitor & analyze impact of network changes on convergence

> How did different routers react during a period of churn?

• Diagnostics mode automatically triggered and additional debug data collected when update times exceed specified threshold

• Detailed convergence data collected and archived for post-mortem analysis of critical and high impact failures

• Analysis on effectiveness of SPF & LSA/LSP timers

• Integration with intelligent offline tool that is topology aware to gather data for end-to-end network convergence


4.2.0

(FCS 12/2011)

• OSPF/ISIS SPF event monitoring

• IGP system-wide prefix prioritization

• Detailed convergence timeline, LSP/LSA triggers, statistics reporting

• IPv4 support

• Default VRF only support

• OSPF/ISIS interface event logging

• LDP neighbor/session event logging

• Diagnostic mode via EEM script

• RCMD Framework; CLI/XML, archiving, MC 8+1 scale

• CRS, XR12K, ASR9K support

4.3.0

(FCS 12/2012)

• OSPF/ISIS individual prefix monitoring

• OSPF summary & external prefix add/delete monitoring

• OSPF Type 3/5/7 LSA tracking for add/delete

• Detailed convergence timeline, path operations and statistics for prefix events

• Diagnostics mode for prefix events

• LFA coverage reporting via SPF reports

• Diagnostics mode for LFA coverage

Future

• …


• Challenges exist TODAY in Monitoring and Analyzing Routing Convergence in production networks

• RCMD is a tool that reports data related to Routing Convergence

• Provides “in-router” detailed view of convergence events

• Lightweight and always-on – non convergence impacting

• Persistent – archived for use after hours/days

• Runs in Monitoring & Diagnostics modes

• Leverage RCMD reports for network-wide convergence analysis

• Impact to service SLAs

• Impact of network design changes & growth


Thank you.


Annex LFA


            Intra Link Prot   Intra Node Prot   Inter Link Prot   Inter Node Prot   uLoop          Link vs Prefix
Triangle    Y                 Y                 Y                 C: if e<c         N              =
Full-Mesh   Y                 Y                 Y                 A: Y, C: if e<c   N              =
Square      Y except (1)      Y                 Y                 A: Y, C: if e<c   N except (2)   = except (3)

(1): C1 has no LFA for dest=A1, (2): traffic to A1 when C1A1 fails, (3): C1A1 (in this direction only) has no per-link LFA

[Figure: Triangle, Full-Mesh and Square (c<a) topologies]


• Based on ~10 SP backbone topologies

– Link LFA: 70% of the links are protected

– Prefix LFA: 94% of the prefixes across all links are protected

• Some SPs selected LFA FRR for the backbone

– implies a tight process to plan the topology

– needs tools such as Cariden Mate

– 5 topologies are well above 95% protection

– Per-Prefix LFA is likely selected for its better coverage

As explained previously, ~50% of the topologies are LFA-friendly either by nature or design


• E4’s LIB

– E5’s label for FEC C2 = 20

– E5’s label for FEC C3 = 19

– E3’s label for FEC E1 = 99

– E1’s label for FEC C2 = 21

– E1’s label for FEC C3 = 23

• E4’s FIB for destination C2

– Primary: out-label = 20, oif = E5

– Backup: out-label = 21, oif = [push 99, oif = E3]

• E4’s FIB for destination C3

– Primary: out-label = 19, oif = E5

– Backup: out-label = 23, oif = [push 99, oif = E3]

[Figure: ring (C1, E5, E4, E3, E2, E1, C2) with an additional destination C3; label annotations C2:20 and C3:19, C2:21 and C3:23, and 99]

One single PQ node is computed per link

The backup rewrite is per-prefix

Same benefit as per-prefix LFA: from the PQ node, the rerouted packets take the shortest path to their destinations


• E4 computes a single PQ node for the link E4E5

– P space: does not require any new computation

>Nodes whose shortest path avoids E4E5

>Nodes whose shortest path is via E4E5 but which have an LFA

– Q space: one Dijkstra per link

>Dijkstra rooted at E5 with the reverse metrics and with the E5E4 branch pruned

• Extremely low CPU requirement
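The Q-space computation described above can be sketched as one reverse Dijkstra that also remembers whether a node's shortest path to E5 crosses the protected link. This is an illustration under assumptions: the unit-metric ring order is assumed because the figure is not preserved, link metrics are assumed strictly positive, and `q_space` is a hypothetical helper rather than the router's algorithm.

```python
import heapq

def q_space(graph, s, f):
    """Nodes whose shortest paths to f never use the s->f link."""
    # Reverse the metrics so that a search rooted at f measures "distance to f".
    rev = {n: {} for n in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev[v][u] = w
    dist = {f: 0}
    via_link = {f: False}  # True if some shortest path to f uses s->f
    heap = [(0, f)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in rev[u].items():
            nd = d + w
            through = via_link[u] or (u == f and v == s)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                via_link[v] = through
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                # Equal-cost path: stay conservative and keep the node pruned.
                via_link[v] = via_link[v] or through
    # Pruning the s-f branch leaves only nodes not marked via_link.
    return {n for n in dist if n != f and not via_link[n]}

# Assumed ring order and unit metrics.
ring = ["C1", "E5", "E4", "E3", "E2", "E1", "C2"]
graph = {n: {} for n in ring}
for a, b in zip(ring, ring[1:] + ring[:1]):
    graph[a][b] = 1
    graph[b][a] = 1

q = q_space(graph, "E4", "E5")
```

In this assumed ring, E4, E3 and E2 sit on the pruned branch, so the Q space for link E4E5 is C1, C2 and E1.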


• Exactly like per-prefix LFA but as if the PQ node was directly connected

• Hence, all per-prefix LFA benefits are preserved

– each packet uses its shortest-path from the PQ node

– Excellent capacity planning

– “De Facto” Node protection

>(100% node protection for ring and biased square)


• Upon E1C1 failure, E1 has no per-prefix LFA to C1

– E2 routes C1 via E1

• With RemoteLFA, upon E1C1 failure, E1 forwards the packets destined to C1 towards the PQ node (C2) from where the packet reaches C1

[Figure: biased square of E1 and E2 (access region) and C1 and C2 (backbone); link metrics 10, 10, 10 and 20]

With Node and SRLG protection!


• Upon computation of a new PQ node Z, the local router R establishes a targeted LDP session with PQ node Z

– If Z ceases to be a PQ node, R waits 3 minutes before clearing the session

• Z must be configured to accept targeted LDP sessions

– mpls ldp discovery targeted-hello accept [ from <peer-acl> ]

• Same security model as PWE, VPLS and LDP over Full-Mesh TE deployments