Upload: trinhliem

Post on 17-Mar-2018

Network Diagnosis: Prevent, Prepare, Repair BRKARC-2002


© 2013 Cisco and/or its affiliates. All rights reserved. BRKARC-2002 Cisco Public

The Classic “Whodunit” Is Back— With a Modern Twist!

Features Updated Suspects and Problems!

Deck of Red Herring Cards!

Who?

Where?

With What?


6500 / Data Center / High CPU

ASR1K / Edge / Dropping Packets

3945 / Remote Site / Stuck in rommon

Agenda

Preparation and Methodology

Prevent, Prepare, Repair

Know Your Crime Scene and Your Suspects

Overview of Troubleshooting Methodology

Troubleshooting and Repairing

Troubleshooting Packet Flow

Utilization Issues

Prevent, Prepare, Repair

Prevent, Prepare, Repair

– What is the “normal” state of your network?

– How do you “expect” it to fail?

Know Your Crime Scene and Your Suspects

– Architectures in your network and their failure modes.

Overview of Troubleshooting Methodology

– Detection 101

Preparation and Methodology


Never underestimate the role of good baselining.

Understand what is “normal” for your network.

– If traffic to the web peaks every Friday at 5PM, then be ready for it.

– If backups to the data center happen every morning at 2AM, then you might have problems when the CEO decides to do a late-night Telepresence with Sydney.

Document the things you expect to go wrong and create plans to deal with them.

– Nothing can replace a well-thought-out plan when the s*** hits the fan.

– It isn’t practical to prevent all potential failures, but it is practical to plan for them!

Preparing for the Crime
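One way to make “normal” concrete is to keep a per-hour baseline and compare new samples against it. The sketch below is only an illustration: it assumes utilization samples have already been collected as (hour-of-week, percent) pairs, and the function names, the three-sigma rule and the 5-point floor are hypothetical choices, not something from the session.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_baseline(samples):
    """Group (hour_of_week, utilization_pct) samples and return
    {hour_of_week: (mean, population standard deviation)}."""
    by_hour = defaultdict(list)
    for hour, pct in samples:
        by_hour[hour].append(pct)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}

def is_abnormal(baseline, hour, pct, sigmas=3.0, floor=5.0):
    """True if pct deviates from the baseline mean for that hour by more
    than max(sigmas * stddev, floor). Hours with no history count as abnormal."""
    if hour not in baseline:
        return True
    avg, dev = baseline[hour]
    return abs(pct - avg) > max(sigmas * dev, floor)

# Hypothetical history. Hours are counted from Monday 00:00, so
# Friday 17:00 is 4 * 24 + 17 = 113 and Monday 02:00 is 2.
history = [(113, 80), (113, 84), (113, 82), (2, 10), (2, 12), (2, 11)]
baseline = build_baseline(history)

print(is_abnormal(baseline, 113, 83))  # False: busy, but normal for Friday 5PM
print(is_abnormal(baseline, 2, 83))    # True: the same load at 2AM deserves a look
```

The same 83% reading is fine in one time slot and suspicious in another, which is the point of baselining by time rather than using a single fixed threshold.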


Automatically logs commands and changes to configurations including the user ID of who made the change.

3945# configure terminal

3945(config)# archive

3945(config-archive)# log config

3945(config-archive-log-config)# logging enable

3945(config-archive-log-config)# logging size 200

3945(config-archive-log-config)# hidekeys

3945(config-archive-log-config)# notify syslog

3945(config-archive-log-config)# end

3945# show archive log config 1 2

idx sess user@line Logged command

1 1 user1@console logging enable

2 1 user1@console logging size 200

IOS Change Notification and Logging
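Once the change log exists, it helps to get it into a form that can be searched or compared. The following is a rough sketch that parses text shaped like the show archive log config output on this slide; the column layout is taken from the slide only, real output can differ by IOS release, and the function name parse_archive_log is hypothetical.

```python
import re

# idx, session, user@line, then the logged command. Some releases print a
# "|" separator before the command, so it is accepted but not required.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)@(\S+)\s+(?:\|\s*)?(.*\S)\s*$")

def parse_archive_log(text):
    """Return one dict per logged command; header and blank lines are skipped."""
    records = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            idx, sess, user, line_name, command = m.groups()
            records.append({
                "idx": int(idx),
                "sess": int(sess),
                "user": user,
                "line": line_name,
                "command": command,
            })
    return records

sample = """\
idx sess user@line Logged command
1 1 user1@console logging enable
2 1 user1@console logging size 200
"""

for record in parse_archive_log(sample):
    print(record["user"], "on", record["line"], "ran:", record["command"])
```

With records in this shape, answering “did someone change something?” becomes a filter on user and command rather than a scroll through the terminal.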


Prevent, Prepare, Repair

Anatomy of a Recovery

Failure Action Recovery


Prevent!

Avoiding Failure…..

Failure Action Recovery

Prevent, Prepare, Repair

Prepare!

Reducing Action/Reaction/Troubleshooting…..

Failure Action Recovery

Prevent, Prepare, Repair

Repair

Failure

Getting to Recovery…..

Failure Action Recovery

Prevent, Prepare, Repair

Albert Einstein

“Insanity: doing the same thing over and over again and expecting different results.”

Learning from Experience…

Failure Action Recovery

How can we Prevent this in the future?

If we can’t prevent… How can we better Prepare?

If we are as prepared as we can be… How could we Repair this faster next time?

Improve!

Prevent, Prepare, Repair, Improve

Your Work Isn’t over when the Network Is Back!

Formalize the Post-Mortem Procedure

Understand if this failure was planned for.

– If it was, were the documented procedures adequate?

– If it wasn’t, then why not?

Every failure should improve the procedures.

Eventually every failure is planned for.

Filling Out the Report


How Can Our Network Be Improved?

Single Points of Failure?

Bottlenecks?

Convergence: Timers Where We Want Them?

Bandwidth of Links: Are They Being Utilized Efficiently? Is There Enough Bandwidth to Get the Job Done?

Quality of Service?

CPU Percentage: What Is Using the Most Time? Is This Okay?

Prevent, Prepare, Repair, Improve

Failure Action Recovery

Prevent, Prepare, Repair, Improve

Network Engineers

Management/Policy

Single Points of Failure?

Bottlenecks? Faster Troubleshooting?

Quality of Networking Documents?

How Can Our Network Be Improved?

Bandwidth of IT Staff: Are They Being Utilized Efficiently?

Enough Bandwidth to Get the Job Done?

“CPU” Percentage of IT Staff: What Is Using the Most Time?

Is this Okay?

Prevent

Prepare

Repair

Improve

Network

Packet

Prevent, Prepare, Repair, Improve

Know Your Crime Scene and Your Suspects

Prevent, Prepare, Repair

Know Your Crime Scene and Your Suspects

Overview of Troubleshooting Methodology

Preparation and Methodology


Know Your Crime Scene and Your Suspects

What?

Where?

Why?


Soap Box: Network Diagrams

Up to date

Easy to read

Have multiple diagrams—physical and logical

Use colors

Use keys

Identify

– Trunk vs. non-trunk

– Port channels

– Primary vs. Backup Paths

Prevent

Prepare

Repair


Physical

Data Link

Network

Transport

Session

Presentation

Application

Router

Cards

Interfaces

Routing

Traffic

Packet

Know Your Crime Scene and Your Suspects


“Routers operate in two different planes of existence.”

“Control plane, in which the router learns the outgoing interface that is most appropriate for forwarding specific packets to specific destinations,

Forwarding plane, which is responsible for the actual process of sending a packet received on a logical interface to an outbound logical interface.”

Control plane processing leads to the construction of what is variously called a routing table or routing information base (RIB).

Processes Routing Protocols: established, maintained, information exchanged, best path selection

Control Plane

Forwarding Plane

Know Your Crime Scene and Your Suspects

Source: http://en.wikipedia.org/wiki/Router
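The split between the two planes can be sketched in a few lines. This is a deliberately simplified model, not how any particular router is implemented: the control plane picks one winner per prefix (here by lowest administrative distance, then metric) and the forwarding plane does a longest-prefix match against the result. The function names, routes and interface names are all hypothetical.

```python
import ipaddress

def build_fib(candidate_routes):
    """Control plane: for each prefix keep the candidate with the lowest
    (administrative distance, metric). Returns {network: interface}."""
    best = {}
    for prefix, distance, metric, interface in candidate_routes:
        net = ipaddress.ip_network(prefix)
        key = (distance, metric)
        if net not in best or key < best[net][0]:
            best[net] = (key, interface)
    return {net: iface for net, (_, iface) in best.items()}

def forward(fib, destination):
    """Forwarding plane: longest-prefix match on the destination address."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None  # no route: the packet would be dropped
    return fib[max(matches, key=lambda net: net.prefixlen)]

routes = [
    ("10.0.0.0/8", 110, 20, "Gi0/1"),    # hypothetical OSPF route
    ("10.0.0.0/8", 90, 3072, "Gi0/2"),   # hypothetical EIGRP route, same prefix
    ("10.1.1.0/24", 110, 10, "Gi0/3"),   # more specific prefix
]
fib = build_fib(routes)

print(forward(fib, "10.1.1.7"))   # Gi0/3 (longest match wins)
print(forward(fib, "10.9.9.9"))   # Gi0/2 (lower administrative distance won the /8)
print(forward(fib, "192.0.2.1"))  # None (no matching route)
```

When troubleshooting, the useful question is which half is wrong: did the control plane choose an unexpected route, or is the forwarding plane not acting on the route that was chosen?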


Different Devices Fail In Different Ways!

Understand where you have basic “single-brain” devices and where you have unique “dual-brain” devices.

Dual-brain devices often have unique characteristics and caveats.

– Example: The ASR has a limited amount of TCAM memory typically used for fast ACL processing. Stay within that memory and life is good. Step outside and things change.

Understand Common Failure Modes of Your Suspects.

Overview of Troubleshooting Methodology

Prevent, Prepare, Repair

Know Your Crime Scene and Your Suspects

Overview of Troubleshooting Methodology

Preparation and Methodology


1. Percentage of all network outages caused by natural disasters: 11%

2. Percentage of network downtime caused by natural disasters: 62%

3. Percentage caused by human error: 49%

4. Increase in telecom repair costs, 1994-2002: 133%

5. 99.5% network reliability rate, in minutes of downtime per month: 216

6. 99.99% network reliability rate, in minutes of downtime per month: 4.5

7. 99.999% network reliability rate, in seconds of downtime per month: 29.5

8. Ratio of how fast a bad reputation spreads to how quickly a good reputation spreads: 24:1

– Sources: 1,2,3, IEEE Computer; 4, U.S. Census Bureau; 5,6,7, Telephony Online

Reporting the Crime
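The reliability figures above are simple arithmetic on the length of a month. A minimal sketch, assuming a 30-day month (the function name and that assumption are mine, not the cited source's):

```python
def downtime_minutes(availability_pct, days=30):
    """Allowed downtime in minutes for a given availability over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

for pct in (99.5, 99.99, 99.999):
    minutes = downtime_minutes(pct)
    print(f"{pct}% -> {minutes:.2f} min ({minutes * 60:.1f} s) per 30-day month")
```

With a 30-day month this works out to about 216 minutes for 99.5%, roughly 4.3 minutes for 99.99% and roughly 26 seconds for 99.999%. The figures quoted on the slide come from its cited source, which may assume a different month length or rounding.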


You Should Know when an Outage Occurs Before Your Users Do!

If you wait for the report of an outage, it’s already too late.

Make routine use of network monitoring tools:

– IP SLA

– SNMP Traps

– Network Management Tools (both free and paid)

– Application Performance Tools (both free and paid)

Reporting the Crime

Troubleshooting Methodology

What is the problem? Gathering the Facts

Where is the problem? Looking for the Clues

– 1st Pass: Looking for the Quick Confession or Obvious Clues

– 2nd Pass: Scavenger Hunt for Clues

– 3rd Pass: Prepping for Deeper Troubleshooting

– The Stakeout


What is the problem? Gathering the Facts

What is happening?

– Understanding the problem can be the biggest step.

Connectivity loss?

– Packet loss?

– Latency?

– Network Management System shows issue

What is normal? What has changed?

– Did someone change something?

– Did the network itself change something?

Troubleshooting Methodology

Pings, Traceroutes, Show IP Route

show ip route

ping

extended ping

traceroute

extended traceroute

[Figure: example topology with routers R1 to R5]

1st Pass: Looking for the Quick Confession or Obvious Clues

Troubleshooting Methodology

Sherlock Holmes

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

Gil Grissom CSI (Las Vegas)

“Let the evidence guide you.”

Status of Routing Peer Relationships

IP Interface Status

Interface Issues

Interface Summary Information

Device Specific Issues

Application Issues

IP SLA & Application Performance

2nd Pass: Scavenger Hunt for Clues

Troubleshooting Methodology

Status of Peer Relationships: Quick Pass to Look for Suspicious Activity

R1#show tcp brief

TCB Local Address Foreign Address (state)

495A1974 10.1.1.1.60263 10.1.1.2.179 ESTAB

4933398C 10.1.1.1.646 10.1.1.2.24193 ESTAB

4958E448 10.1.1.1.32644 10.1.1.4.639 ESTAB

4958DBC8 10.1.1.1.711 10.1.1.4.13304 ESTAB

Well-known ports in this output: 179 is BGP, 646 is LDP, 639 is MSDP, 711 is TDP.

Is this normal for your network? How would you know?
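To answer “how would you know?”, it helps to turn the show tcp brief output into a list of peers and protocols that can be compared with what you expect. The sketch below assumes the four-column layout shown on this slide with numeric addresses (IOS can print hostnames instead); the function names and the port table are illustrative.

```python
WELL_KNOWN = {179: "BGP", 639: "MSDP", 646: "LDP", 711: "TDP"}

def split_endpoint(endpoint):
    """Split an IOS-style 'a.b.c.d.port' string into (address, port)."""
    address, _, port = endpoint.rpartition(".")
    return address, int(port)

def label_sessions(text):
    """Return (peer address, protocol, state) for each TCP session row."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "TCB":
            continue  # header, blank line, or a layout this sketch does not handle
        _, local, foreign, state = fields
        _, local_port = split_endpoint(local)
        peer, foreign_port = split_endpoint(foreign)
        proto = WELL_KNOWN.get(local_port) or WELL_KNOWN.get(foreign_port) or "unknown"
        sessions.append((peer, proto, state))
    return sessions

sample = """\
TCB Local Address Foreign Address (state)
495A1974 10.1.1.1.60263 10.1.1.2.179 ESTAB
4933398C 10.1.1.1.646 10.1.1.2.24193 ESTAB
4958E448 10.1.1.1.32644 10.1.1.4.639 ESTAB
4958DBC8 10.1.1.1.711 10.1.1.4.13304 ESTAB
"""

for peer, proto, state in label_sessions(sample):
    print(f"{proto:5} with {peer}: {state}")
```

Comparing this list against a saved copy from a known-good day shows at a glance which peer session is missing.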


Status of Peer Relationships

R1#show tcp

Stand-alone TCP connection to host 10.1.1.12

.

.

SRTT: 146 ms, RTTO: 1283 ms, RTV: 1137 ms, KRTT: 0 ms

minRTT: 0 ms, maxRTT: 300 ms, ACK hold: 200 ms

Flags: higher precedence, nagle, path mtu capable

Datagrams (max data segment is 1460 bytes):

Rcvd: 1454 (out of order: 0) ……

Sent: 1454 (retransmit: 0) ……


Status of Routing Peer Relationships

InQ: Number of messages queued to be processed from the neighbor

OutQ: Number of messages queued to be sent to the neighbor

R1#show ip bgp summary

BGP router identifier 10.1.1.1, local AS number 100

BGP table version is 211, main routing table version 211

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd

3.8.4.5 4 100 8010 8009 211 0 0 5d13h 101

BGP
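The InQ and OutQ check can be automated in the same quick-pass spirit. This is a sketch under the assumption that neighbor rows have the ten columns shown above; the function name check_bgp_summary and the second sample with made-up neighbors are hypothetical.

```python
def check_bgp_summary(text):
    """Return (neighbor, problem) tuples for rows that look suspicious."""
    problems = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["Neighbor"]:
            in_table = True  # rows after this header are neighbor entries
            continue
        if not in_table or len(fields) < 10:
            continue
        neighbor, in_q, out_q, state = fields[0], fields[6], fields[7], fields[9]
        if not state.isdigit():
            # An established session shows a prefix count, not a state name.
            problems.append((neighbor, f"session not established ({state})"))
        elif int(in_q) or int(out_q):
            problems.append((neighbor, f"queued messages InQ={in_q} OutQ={out_q}"))
    return problems

healthy = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
3.8.4.5 4 100 8010 8009 211 0 0 5d13h 101
"""

# Hypothetical rows for illustration only.
unhealthy = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.0.2.1 4 200 0 0 0 0 0 never Active
192.0.2.2 4 300 900 950 211 0 12 1d02h 55
"""

print(check_bgp_summary(healthy))    # []
print(check_bgp_summary(unhealthy))
```

A non-zero queue in a single snapshot is not proof of a problem; it is a reason to take a second snapshot and see whether the queue drains.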


Status of Routing Peer Relationships

Q Cnt (queue count): Number of EIGRP packets (update, query, and reply) that the software is waiting to send

R1#show ip eigrp neighbors

IP-EIGRP neighbors for process 100

H Address Interface Hold Uptime SRTT RTO Q Seq

(sec) (ms) Cnt Num

1 10.2.2.2 Gi0/2 14 6d07h 8 200 0 4

0 10.1.1.2 Gi0/1 14 6d07h 1 200 0 6

EIGRP – Quick Pass to Look for Smoking Gun

Status of Routing Peer Relationships

R1#show ip ospf neighbor

Neighbor ID Pri State Dead Time Address Interface

10.4.4.2 1 FULL/BDR 00:00:37 10.2.2.2 GigabitEthernet0/2

10.3.3.2 1 FULL/BDR 00:00:33 10.1.1.2 GigabitEthernet0/1

R1#show ip ospf neighbor detail | include up for

Neighbor is up for 6d07h

Neighbor is up for 6d07h

OSPF – Quick Pass to Look for Smoking Gun


Interface IP Status

Interfaces with IP addresses assigned — are they up/up?

R1#show ip int brief

Interface IP-Address OK? Method Status Protocol

GigabitEthernet4/1 10.1.1.2 YES manual up up

GigabitEthernet4/2 10.3.3.2 YES manual up up

GigabitEthernet4/3 12.1.1.1 YES manual down down

GigabitEthernet4/4 unassigned YES unset down down

GigabitEthernet4/5 unassigned YES unset administratively down down

Only “no shut” interfaces you plan on using

Use interface descriptions, then review them with show interfaces description
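The “are they up/up?” question is easy to script. The sketch below assumes the column layout of the show ip int brief output on this slide; the function name suspicious_interfaces is hypothetical.

```python
def suspicious_interfaces(text):
    """Return interfaces that have an IP address but are not up/up."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "Interface":
            continue  # header or unrelated line
        name, address = fields[0], fields[1]
        protocol = fields[-1]
        status = " ".join(fields[4:-1])  # may be "administratively down"
        if address != "unassigned" and (status, protocol) != ("up", "up"):
            flagged.append((name, address, status, protocol))
    return flagged

sample = """\
Interface IP-Address OK? Method Status Protocol
GigabitEthernet4/1 10.1.1.2 YES manual up up
GigabitEthernet4/2 10.3.3.2 YES manual up up
GigabitEthernet4/3 12.1.1.1 YES manual down down
GigabitEthernet4/4 unassigned YES unset down down
GigabitEthernet4/5 unassigned YES unset administratively down down
"""

for name, address, status, protocol in suspicious_interfaces(sample):
    print(f"{name} ({address}) is {status}/{protocol}")
```

Interfaces without an address are ignored here on purpose; whether an unaddressed interface should be up is a separate question that the descriptions help answer.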


Interface Errors

show interface | include drops

show interface | include errors

7206_VXR#sh int | include drops

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 12

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

7206_VXR#sh int | include errors

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

4 output errors, 0 collisions, 1 interface resets

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

802 output errors, 2705 collisions, 1 interface resets

0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort

0 output errors, 0 collisions, 0 interface resets


3750G#sh int summary

*: interface is up

IHQ: pkts in input hold queue IQD: pkts dropped from input queue

OHQ: pkts in output hold queue OQD: pkts dropped from output queue

RXBS: rx rate (bits/sec) RXPS: rx rate (pkts/sec)

TXBS: tx rate (bits/sec) TXPS: tx rate (pkts/sec)

TRTL: throttle count

Interface IHQ IQD OHQ OQD RXBS RXPS TXBS TXPS TRTL

-------------------------------------------------------------------------

* GigabitEthernet1/0/1 0 0 0 0 0 0 0 0 0

* GigabitEthernet1/0/2 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/3 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/4 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/5 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/6 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/7 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/8 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/9 0 0 0 0 0 0 0 0 0

GigabitEthernet1/0/10 0 0 0 0 0 0 0 0 0

* GigabitEthernet1/0/11 0 0 0 0 0 0 5000 5 0

* GigabitEthernet1/0/12 0 0 0 0 4000 5 0 0 0

Interface Summary

show interface summary


Troubleshooting Methodology

load-interval 30 configured on the interfaces of interest

clear counters

Ensure logging is enabled

clear log

3rd Pass: Prepping for Deeper Troubleshooting (Still Looking for a Suspect)


Troubleshooting Methodology

Show proc cpu

Show ip traffic (clear ip traffic)

Show log (clear log)

Clear counters

Other Crime/Suspect relevant commands

Potential debugs

Show Platform and other platform specific steps

“The Stakeout” – Very Very Suspicious Router


Troubleshooting Methodology

“The Stakeout” – Very Very Suspicious Router

C6K#sh ip traffic

IP statistics:

Rcvd: 98406446 total, 1289848 local destination

0 format errors, 0 checksum errors, 45999 bad hop count

0 unknown protocol, 48761 not a gateway

0 security failures, 0 bad options, 7576 with options

Opts: 0 end, 0 nop, 0 basic security, 0 loose source route

0 timestamp, 0 extended security, 0 record route

0 stream ID, 0 strict source route, 7576 alert, 0 cipso, 0 ump

0 other

Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble

0 fragmented, 0 couldn't fragment

Bcast: 960090 received, 0 sent

Mcast: 292915 received, 406578 sent

Sent: 418580 generated, 3763770399 forwarded

Drop: 11 encapsulation failed, 0 unresolved, 0 no adjacency

251 no route, 0 unicast RPF, 0 forced drop

0 options denied, 0 source IP address zero

.

.


Troubleshooting Methodology

“The Stakeout” – Very Very Suspicious Router

.

.

.

ICMP statistics:

Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 0 unreachable

4 echo, 0 echo reply, 0 mask requests, 0 mask replies, 0 quench

0 parameter, 0 timestamp, 0 info request, 0 other

0 irdp solicitations, 0 irdp advertisements

0 time exceeded, 0 timestamp replies, 0 info replies

Sent: 0 redirects, 254 unreachable, 0 echo, 4 echo reply

0 mask requests, 0 mask replies, 0 quench, 0 timestamp

0 info reply, 0 time exceeded, 0 parameter problem

0 irdp solicitations, 0 irdp advertisements.

.

.

.

.
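During a stakeout, absolute counter values matter less than which ones are moving. The sketch below parses text shaped like the show ip traffic output above into counters and reports what changed between two snapshots. The parsing rules (comma-separated “number label” items, a one-word section name before a colon) are assumptions based on this slide, the function names are hypothetical, and the two snapshots are made-up excerpts.

```python
def parse_counters(text):
    """Map (protocol, section, label) -> value for each '<number> <label>' item."""
    counters = {}
    protocol = section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("statistics:"):
            protocol, section = line.split()[0], ""
            continue
        head, sep, rest = line.partition(":")
        if sep and head.isalpha():  # e.g. "Rcvd:", "Drop:"
            section, line = head, rest
        for item in line.split(","):
            value, _, label = item.strip().partition(" ")
            if value.isdigit() and label:
                counters[(protocol, section, label)] = int(value)
    return counters

def delta(before, after):
    """Counters whose value changed between two snapshots, with the difference."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}

# Hypothetical snapshots taken a few minutes apart.
first = """\
IP statistics:
  Rcvd:  100 total, 10 local destination
         0 format errors, 5 bad hop count
  Drop:  11 encapsulation failed, 0 unresolved, 0 no adjacency
         251 no route, 0 unicast RPF, 0 forced drop
"""
second = """\
IP statistics:
  Rcvd:  400 total, 10 local destination
         0 format errors, 45 bad hop count
  Drop:  11 encapsulation failed, 0 unresolved, 0 no adjacency
         300 no route, 0 unicast RPF, 0 forced drop
"""

changes = delta(parse_counters(first), parse_counters(second))
for key, diff in sorted(changes.items()):
    print(key, "+", diff)
```

A counter such as bad hop count or no route that keeps climbing between snapshots is a much stronger clue than a large value that has not moved since the last clear.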

Troubleshooting and Repairing

Router

Cards

Interfaces

Routing

Traffic

…. Questioning Your Suspect

Troubleshooting Packet Flow

Troubleshooting Packet Flow

Utilization Issues

Troubleshooting and Repairing


Troubleshooting Packet Flow

Knowing Your Suspect Router’s M.O.

Understand Normal Packet Flow

Cisco Express Forwarding (CEF) Load Sharing

EtherChannel Load Balancing


Knowing Your Suspect Router’s M.O.*

*Modus operandi (often used in the abbreviated forms M.O. or simply Method) is a Latin phrase, approximately translated as "mode of operation".

Source: http://en.wikipedia.org/wiki/Modus_operandi


“Classic Routers” (aka “Single Brain” Routers)

Knowing Your Suspect Router’s M.O.

With “classic” routers there is one brain that is involved with all of the forwarding of packets


Routing Protocols

Processes

Handles Control Plane Traffic

Manages System

RIB

Knowing Your Suspect Router’s M.O.

Examples:

Intelligent Line Cards

Separate Forwarding Processor

– ASR ESPs

VPN accelerators

– AIM VPN

Digital Signal Processors

Multiple Software Images

Multiple Licenses

Multiple Forwarding Paths

“Distributed Services Routers”


We need to know how to translate “Yo, dude, what did you do with my packet?” into a show command so that we can get a meaningful answer from our suspect router

Example:

– Separate Forwarding Processor

ASR ESPs

– VPN accelerators

AIM VPN

Knowing Your Suspect Router’s M.O.

Asking the Right Questions

Just one more question… What did you do with the packet?

Route Processor:

ASR-1000-RP2

Forwarding Processor:

ASR-1000-ESP20

Knowing Your Suspect Router’s M.O.

RP (Route Processor)

– Handles control plane traffic

– Manages system

ESP

– Handles forwarding plane traffic

SPA Carrier Card

– Houses the SPAs

– Not like a 7600 SIP

SPAs

– Provide interface connectivity

ASR-1004


Services Ready Engine

Single or Dual Brain?

Single IOS brain for forwarding

DSP

– Handles voice conferencing and transcoding

Etherswitch Module

– L2/L3 integrated Catalyst switch on a module

Services Ready Engine

– Server-on-a-Blade for Applications

3945 ISR G2 Etherswitch Module

DSP Module:

PVDM3


IOS Order of Operations

1. RITE
2. EPC
3. QoS Drop
4. VRF Classify
5. Packet Debug
6. Netflow
7. LISP
8. BGP Policy Map
9. QoS Classify
10. Fragment Assembly
11. LI
12. IPS
13. Firewall
14. ACL
15. SBC
16. FPM
17. IPSec Decrypt
18. QoS Marking
19. Policing
20. QoS post-crypto Classify
21. WAAS
22. EZVPN
23. Accounting
24. NAT Outside
25. Policy Routing
26. WCCP
27. VRF Select
28. BOOTP/DHCP Reply

Input Feature Processing in 15.1(3)T


IOS Order of Operations

1. QoS Classification
2. NAT Inside
3. NHRP
4. WCCP
5. NAT Outside
6. BGP Policy Map
7. IPSec Classify
8. CTS
9. QoS Classification
10. Firewall
11. IPS
12. QoS Drop
13. ACL
14. FPM
15. WAAS
16. QoS Marking
17. Accounting
18. RSVP
19. Policing
20. Netflow
21. IPSec Encrypt
22. Packet Debug
23. Packet Capture
24. HW Checks

Output Feature Processing in 15.1(3)T


Cisco Express Forwarding Load Sharing

Traffic will always flow along one link since there is only one source address and one destination address

Per packet load sharing will make the traffic flow along both links, at the risk of out of order packets

Some platforms support per-packet and per-flow load balancing while others do not. Know your suspects!


Cisco Express Forwarding Load Sharing

Potential to still have the two sources using the same link

Even if the traffic is split across the links, links will still likely show unequal utilization

Telnet

FTP


Cisco Express Forwarding Load Sharing

Over time, the traffic may statistically work out to be equal across multiple links

But it probably won’t

Ratios as high as 70% to 30% can be normal

Per packet load sharing can resolve this, but at the risk of out-of-order packets
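The difference between the two load-sharing modes can be shown with a toy model. The hash below (CRC32 of the address pair) is only a stand-in; it is not the algorithm any Cisco platform actually uses, and the link names and addresses are hypothetical.

```python
import zlib
from collections import Counter
from itertools import cycle

LINKS = ["link-A", "link-B"]

def per_destination_link(src, dst):
    """Per-flow style choice: hash the address pair, so a given pair
    always maps to the same link."""
    return LINKS[zlib.crc32(f"{src}->{dst}".encode()) % len(LINKS)]

def per_packet_links(packet_count):
    """Per-packet choice: round-robin across links, regardless of addresses."""
    rr = cycle(LINKS)
    return [next(rr) for _ in range(packet_count)]

# One source talking to one destination, 1000 packets.
flow_choices = Counter(
    per_destination_link("10.1.1.10", "10.2.2.20") for _ in range(1000)
)
packet_choices = Counter(per_packet_links(1000))

print(flow_choices)    # all 1000 packets land on a single link
print(packet_choices)  # 500 packets on each link
```

The per-flow result is why a single large transfer never uses both links, and the per-packet result is why packets of one flow can arrive out of order when the two links have different delay.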


EtherChannel Load Balancing

What type of device is it?

What instructions will it apply to determine what to do?


EtherChannel Load Balancing

SUP_720#sh etherchannel load-balance

EtherChannel Load-Balancing Configuration:

src-dst-ip

mpls label-ip

EtherChannel Load-Balancing Addresses Used Per-Protocol:

Non-IP: Source XOR Destination MAC address

IPv4: Source XOR Destination IP address

IPv6: Source XOR Destination IP address

MPLS: Label or IP

6500/7600 Sup720

EtherChannel Load Balancing

3750#sh etherchannel load-balance

EtherChannel Load-Balancing Configuration:

src-mac

EtherChannel Load-Balancing Addresses Used Per-Protocol:

Non-IP: Source MAC address

IPv4: Source MAC address

IPv6: Source MAC address

3750

What is the

Source MAC?

Packet

L3 forwarding means every routed packet leaves with the 3750’s own MAC as its source MAC, so src-mac hashing keeps them all on one link!
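A toy model shows why the hash input matters. The two functions below are illustrative only: real EtherChannel hashing is platform-specific, and the MAC address, IP addresses and function names here are hypothetical.

```python
import ipaddress

def member_by_src_dst_ip(src_ip, dst_ip, links):
    """XOR the source and destination addresses, then reduce the result
    to a member-link index."""
    x = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return x % links

def member_by_src_mac(src_mac, links):
    """Use only the source MAC address to choose a member link."""
    return int(src_mac.replace(":", "").replace(".", ""), 16) % links

ROUTER_MAC = "00:1a:2b:3c:4d:5e"  # hypothetical MAC of the routing switch
flows = [
    ("10.1.1.10", "10.2.2.20"),
    ("10.1.1.11", "10.2.2.20"),
    ("10.1.1.12", "10.2.2.21"),
]

# Routed packets all carry the same source MAC, whatever their IP addresses.
mac_members = {member_by_src_mac(ROUTER_MAC, 2) for _ in flows}
ip_members = {member_by_src_dst_ip(s, d, 2) for s, d in flows}

print(mac_members)  # a single member link
print(ip_members)   # both member links in use
```

With src-mac hashing the three flows collapse onto one member because the hash input never changes; hashing on the IP pair spreads them.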

Utilization Issues

Troubleshooting Packet Flow

Utilization Issues

Troubleshooting and Repairing


Utilization Issues

Utilization noun

The act of utilizing, or the state of being utilized

The condition of being put to use *

Interfaces

Main CPU

Memory

* http://www.answers.com/topic/utilization


Utilization Issues

7600#show platform hardware capacity

Power Resources

System power: 5772W, 0W (0%) inline, 1559W (27%) total allocated

Flash/NVRAM Resources

Usage: Module Device Bytes: Total Used %Used

3 dfc#3-bootdisk: 1014251520 966656 1%

6 SP disk0: 256462848 194871296 76%

6 SP sup-bootdisk: 512024576 416456704 81%

CPU Resources

CPU utilization: Module 5 seconds 1 minute 5 minutes

3 2% / 0% 5% 5%

6 RP 0% / 0% 0% 0%

6 SP 7% / 0% 5% 4%

Processor memory: Module Bytes: Total Used %Used

3 1712688084 406650040 24%

6 RP 908176176 127643508 14%

6 SP 861979340 212723360 25%

I/O memory: Module Bytes: Total Used %Used

3 134217728 31395560 23%

6 RP 67108864 13204508 20%

6 SP 67108864 13205416 20%


Utilization Issues …(show platform hardware capacity (continued))…

EOBC Resources

Module Packets/sec Total packets Dropped packets

3 Rx: 10 2111619 0

Tx: 4 1179400 0

6 RP Rx: 6 3326992 0

Tx: 5 3256757 0

6 SP Rx: 3 3413610 0

Tx: 9 5910012 3

VLAN Resources

VLANs: 4094 total, 5 VTP, 0 extended, 19 internal, 4070 free

L2 Forwarding Resources

MAC Table usage: Module Collisions Total Used %Used

3 0 98304 9 1%

6 0 65536 9 1%

VPN CAM usage: Total Used %Used

512 0 0%


Utilization Issues

High Processor Utilization

Utilization Issues Internal to the Router

Memory Utilization

Utilizing Post Mortems


High Processor Utilization

Signs of high processor utilization

– Router acting “slow” at the console

– Router inaccessible through telnet

– Packet loss

– Router adding a large amount of delay to packets traversing it

“Processor” can have lots of different meanings depending on platform

show platform commands on most boxes will give an insight into the different processing components on distributed platforms

At what point do we decide that we’ve outgrown this router?

– This is an easier decision to make well ahead of time than under the gun.


Processor Utilization

7200VXR#sh process cpu sorted

CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc

58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL

235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO

39.22 + 0.08 + 0.08 + …

Very, Very Common Statement

“I add up the 5sec column but it doesn’t add up to 61%”
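The numbers do add up once the header is read correctly: in 61%/22%, the first value is total CPU and the second is time spent in interrupt context, so the per-process 5Sec column should sum to roughly the difference. A small sketch (the function name split_cpu is hypothetical):

```python
import re

HEADER = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def split_cpu(header_line):
    """Split the header of 'show processes cpu' into total, interrupt and
    process-level percentages for the five-second interval."""
    m = HEADER.search(header_line)
    if not m:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        "process": total - interrupt,  # what the 5Sec column should sum to
        "one_minute": one_min,
        "five_minutes": five_min,
    }

line = "CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%"
cpu = split_cpu(line)
print(cpu["process"])  # 39, close to 39.22 + 0.08 + 0.08 + ...
```

So the process column here accounts for about 39%, and the remaining 22% is interrupt-level work that never shows up against any process.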


High Processor Utilization

First, determine what is “high”

Is the router spending most of its time

– Running processes

– Interrupt context

What other processing is going on in the system?

What is normal for this router/location/time/day/month?

7200VXR#sh proc cpu sorted

CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc

58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL

235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO


High Processor Utilization

If the total processing is “high”, but the interrupt level is “low”, then take several snapshots of show processes cpu

7200VXR#sh proc cpu sorted

CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc

58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL

235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO

221 4 54658 0 0.08% 0.01% 0.00% 0 PPP Events

42 300 2511 119 0.08% 0.04% 0.05% 0 Net Background

46 112 1802 62 0.08% 0.07% 0.08% 0 Per-Second Jobs

Processes High, Interrupt Low


High Processor Utilization

For instance, if IP Input is high, this indicates there are a large number of packets which are being switched at the process level; try to figure out why

7206_VXR#show processes cpu

CPU utilization for five seconds: 83%/21%; one minute: 79%; five minutes: 84%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

35 4520 68993 65 48.00% 30.00% 52.00% 0 IP Input

....

Processes High, Interrupt Low


High Processor Utilization

Here, the processor is high on the right side of the slash, which means the processor is spending most of its time in interrupt context

7206_VXR#show processes cpu

CPU utilization for five seconds: 80%/75%; one minute: 86%; five minutes: 92%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

2 68 227 299 0.00% 0.00% 0.00% 0 Exec

3 368920 138425 2665 0.08% 0.02% 0.00% 0 Check heaps

4 4 1 4000 0.00% 0.00% 0.00% 0 Chunk Manager

Interrupt High


High Processor Utilization

Check each interface for traffic levels

If the traffic levels are high, but look normal (a baseline is needed to know this), then the router could just be running at a normal level

If the traffic levels look too high compared to the baseline, then you should look for changes in the traffic flow

High Interrupt


High Forwarding Plane Utilization

Each multi-brain device will have different measurements for FP health.

Take a look at the show platform commands for a clue.

Look for punts going to the control plane. Traffic causing punts can kill your router in a hurry.

Know Your Multi-Brain Devices!


6506#show fabric utilization

slot channel speed Ingress % Egress %

1 0 8G 22 23

2 0 8G 4 9

3 0 20G 0 1

3 1 20G 11 12

4 0 20G 0 1

4 1 20G 10 13

6 0 20G 0 1

6506#

Utilization Issues Internal to the Router

Memory Utilization

Question: In your suspect router, in how many places do you have memory?

• FP Buffers & I/O Memory

• CP Memory

• Flash

• Modules

• DSP

• TCAM


Rules to Live By

Memory… Memory… Memory!

Buy it! Buy enough to have two images simultaneously in flash

Don’t delete the previous version of IOS®, even if you are booting off of the new one

Buy plenty of DRAM for future scale and features

– IOS behaves differently with different amounts of memory.

Buy the expensive memory

If you can afford it, buy maximum memory.

Feel free to complain to the nearest Cisco representative about the cost of memory.

Prevent

Prepare

Repair


Utilizing Post Mortems

Failure Action Recovery Improve

Prevent

Prepare

Repair

Every failure should improve the procedures


Formalize the Post-Mortem

Create a formal procedure to review problems and results.

Procedures and baselines need to be living documents.

Conduct the post-mortem as soon after the event as possible.

No egos allowed! Don’t get caught playing the blame-game.

Every action requires an after-action report and analysis.

Analysis should feed back to design, procedures and planning.

It’s never cost effective to design for every possible failure, but that doesn’t mean that procedures can’t be prepared.

Don’t Forget the Most Important Part!

Prevent

Prepare

Repair

Questions?
