network diagnosis: prevent, prepare,...
TRANSCRIPT
© 2013 Cisco and/or its affiliates. All rights reserved. BRKARC-2002 Cisco Public
The Classic “Whodunit” Is Back, with a Modern Twist!
Features Updated Suspects and Problems!
Deck of Red Herring Cards!
Who?
Where?
With What?
Suspects (Who? Where? With What?):
6500, Data Center: High CPU
ASR1K, Edge: Dropping Packets
3945, Remote Site: Stuck in rommon
Agenda
Preparation and Methodology
Prevent, Prepare, Repair
Know Your Crime Scene and Your Suspects
Overview of Troubleshooting Methodology
Troubleshooting and Repairing
Troubleshooting Packet Flow
Utilization Issues
Preparation and Methodology
Prevent, Prepare, Repair
– What is the “normal” state of your network?
– How do you “expect” it to fail?
Know Your Crime Scene and Your Suspects
– Architectures in your network and their failure modes.
Overview of Troubleshooting Methodology
– Detection 101
Preparing for the Crime
Never underestimate the role of good baselining.
Understand what is “normal” for your network.
– If web traffic peaks every Friday at 5 PM, be ready for it.
– If backups to the data center run every morning at 2 AM, expect problems when the CEO decides to hold a late-night TelePresence call with Sydney.
Document the things you expect to go wrong and create plans to deal with them.
– Nothing can replace a well-thought-out plan when the s*** hits the fan.
– It isn’t practical to prevent all potential failures, but it is practical to plan for them!
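A baseline can start as simply as the average and spread of past measurements for the same time slot. The Python sketch below illustrates the idea. It assumes hypothetical utilization samples (in percent) grouped by a weekly time slot. The slot labels, the sample values, and the 3-standard-deviation threshold are illustrative choices, not part of any Cisco tool.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Map each time slot to (mean, standard deviation) of its past samples."""
    # stdev() needs at least two samples, so thinner slots are skipped.
    return {slot: (mean(s), stdev(s)) for slot, s in history.items() if len(s) >= 2}

def is_abnormal(baseline, slot, value, sigmas=3.0):
    """True if `value` lies more than `sigmas` standard deviations from the slot mean."""
    if slot not in baseline:
        return False  # no baseline for this slot yet, so we cannot judge
    avg, sd = baseline[slot]
    return abs(value - avg) > sigmas * sd

# Hypothetical history: Friday 5 PM is always busy, Tuesday 3 AM is quiet.
history = {
    "Fri 17:00": [78.0, 82.0, 80.0, 79.0, 81.0],
    "Tue 03:00": [5.0, 6.0, 4.0, 5.0, 5.0],
}
baseline = build_baseline(history)
print(is_abnormal(baseline, "Fri 17:00", 80.0))  # busy, but normal for this slot
print(is_abnormal(baseline, "Tue 03:00", 60.0))  # quiet slot suddenly busy
```

The same 80% reading that is routine on a Friday afternoon would be a red flag at 3 AM. The comparison therefore has to be per time slot, not against one global average.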
IOS Change Notification and Logging
The configuration archive logger automatically records commands and configuration changes, including the user ID of whoever made each change. hidekeys keeps passwords out of the log, and notify syslog also sends each logged command to syslog.
3945# configure terminal
3945(config)# archive
3945(config-archive)# log config
3945(config-archive-log-config)# logging enable
3945(config-archive-log-config)# logging size 200
3945(config-archive-log-config)# hidekeys
3945(config-archive-log-config)# notify syslog
3945(config-archive-log-config)# end
3945# show archive log config 1 2
idx sess user@line Logged command
1 1 user1@console logging enable
2 1 user1@console logging size 200
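Because the archive log is plain text, it is easy to post-process, for example to list who changed what shortly before an outage. The following is a minimal Python sketch. It assumes the column layout shown above; real output may add header lines or wrap long commands. `parse_archive_log` and `ArchiveEntry` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ArchiveEntry:
    idx: int
    session: int
    user: str
    line: str
    command: str

def parse_archive_log(text):
    """Parse 'show archive log config' rows of the form: idx sess user@line command."""
    entries = []
    for row in text.splitlines():
        fields = row.split(maxsplit=3)
        # Skip the header and anything that does not start with two integers.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        user, _, line = fields[2].partition("@")
        entries.append(ArchiveEntry(int(fields[0]), int(fields[1]), user, line, fields[3]))
    return entries

sample = """idx sess user@line Logged command
1 1 user1@console logging enable
2 1 user1@console logging size 200"""

for e in parse_archive_log(sample):
    print(f"{e.user} on {e.line}: {e.command}")
```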
Prevent, Prepare, Repair
Anatomy of a Recovery
(Timeline: Failure → Action → Recovery)
Prevent, Prepare, Repair
Prevent!
Avoiding failure...
(Timeline: Failure → Action → Recovery)
Prevent, Prepare, Repair
Prepare!
Reducing action, reaction, and troubleshooting time...
(Timeline: Failure → Action → Recovery)
Prevent, Prepare, Repair
Repair!
Getting to recovery...
(Timeline: Failure → Action → Recovery)
Albert Einstein
“Insanity: doing the same thing over and over again and expecting different results.”
Prevent, Prepare, Repair, Improve
Improve!
Learning from experience...
(Timeline: Failure → Action → Recovery → Improve)
How can we prevent this in the future?
If we can’t prevent it, how can we better prepare?
If we are as prepared as we can be, how could we repair this faster next time?
Filling Out the Report
Your Work Isn’t Over When the Network Is Back!
Formalize the post-mortem procedure.
Understand whether this failure was planned for.
– If it was, were the documented procedures adequate?
– If it wasn’t, then why not?
Every failure should improve the procedures.
Eventually, every failure is planned for.
Prevent, Prepare, Repair, Improve
How Can Our Network Be Improved?
Single Points of Failure?
Bottlenecks?
Convergence: Are Timers Where We Want Them?
Bandwidth of Links: Are They Being Utilized Efficiently? Is There Enough Bandwidth to Get the Job Done?
Quality of Service?
CPU Percentage: What Is Using the Most Time? Is This Okay?
Prevent, Prepare, Repair, Improve
(Timeline: Failure → Action → Recovery)
Prevent, Prepare, Repair, Improve
Network Engineers and Management/Policy: How Can Our Network Be Improved?
Single Points of Failure?
Bottlenecks? Faster Troubleshooting?
Quality of Networking Documents?
Bandwidth of IT Staff: Are They Being Utilized Efficiently? Is There Enough Bandwidth to Get the Job Done?
“CPU” Percentage of IT Staff: What Is Using the Most Time? Is This Okay?
(Cycle: Prevent → Prepare → Repair → Improve)
Preparation and Methodology
Prevent, Prepare, Repair
Know Your Crime Scene and Your Suspects
Overview of Troubleshooting Methodology
Know Your Crime Scene and Your Suspects
What? Where? Why?
Soap Box: Network Diagrams
Keep them up to date
Make them easy to read
Have multiple diagrams: physical and logical
Use colors
Use keys (legends)
Identify:
– Trunk vs. non-trunk links
– Port channels
– Primary vs. backup paths
(Prevent, Prepare, Repair)
Know Your Crime Scene and Your Suspects
OSI layers: Physical, Data Link, Network, Transport, Session, Presentation, Application
Where to question the router: Router, Cards, Interfaces, Routing, Traffic
Know Your Crime Scene and Your Suspects
“Routers operate in two different planes of existence:
Control plane, in which the router learns the outgoing interface that is most appropriate for forwarding specific packets to specific destinations;
Forwarding plane, which is responsible for the actual process of sending a packet received on a logical interface to an outbound logical interface.”
Source: http://en.wikipedia.org/wiki/Router
Control plane (CPU): processes and routing protocols run here. Peer relationships are established and maintained, information is exchanged, and best paths are selected. Control plane processing leads to the construction of what is variously called a routing table or routing information base (RIB).
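The division of labor can be sketched in a few lines. The control plane hands over a table of best routes. The forwarding plane then picks, for each packet, the entry with the longest matching prefix. The following is a minimal Python sketch using the standard `ipaddress` module. The routes and interface names are hypothetical, and real routers use CEF tables and specialized hardware rather than a linear scan.

```python
import ipaddress

# "Control plane" result: best paths already selected (hypothetical routes).
RIB = {
    "0.0.0.0/0": "GigabitEthernet0/0",
    "10.0.0.0/8": "GigabitEthernet0/1",
    "10.1.1.0/24": "GigabitEthernet0/2",
}

# Parse the prefixes once, as a forwarding table would be built ahead of time.
FIB = [(ipaddress.ip_network(prefix), intf) for prefix, intf in RIB.items()]

def lookup(dest):
    """Forwarding plane: return the egress interface of the longest matching prefix."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, intf) for net, intf in FIB if addr in net]
    if not matches:
        return None  # no route: the packet would be dropped
    return max(matches)[1]

print(lookup("10.1.1.7"))   # the /24 wins over the /8 and the default
print(lookup("10.9.9.9"))   # only the /8 and the default match
print(lookup("192.0.2.1"))  # falls through to the default route
```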
Understand Common Failure Modes of Your Suspects
Different Devices Fail in Different Ways!
Understand where you have basic “single-brain” devices and where you have “dual-brain” devices.
Dual-brain devices often have unique characteristics and caveats.
– Example: The ASR has a limited amount of TCAM memory, typically used for fast ACL processing. Stay within that memory and life is good; step outside it and behavior and performance can change.
Preparation and Methodology
Prevent, Prepare, Repair
Know Your Crime Scene and Your Suspects
Overview of Troubleshooting Methodology
Reporting the Crime
1. Percentage of all network outages caused by natural disasters: 11%
2. Percentage of network downtime caused by natural disasters: 62%
3. Percentage caused by human error: 49%
4. Increase in telecom repair costs, 1994-2002: 133%
5. 99.5% network reliability rate, in minutes of downtime per month: 216
6. 99.99% network reliability rate, in minutes of downtime per month: 4.5
7. 99.999% network reliability rate, in seconds of downtime per month: 29.5
8. Ratio of how fast a bad reputation spreads to how quickly a good reputation spreads: 24:1
– Sources: 1-3, IEEE Computer; 4, U.S. Census Bureau; 5-7, Telephony Online
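Items 5 to 7 come from one formula: allowed downtime = (1 − availability) × length of the period. The following is a minimal Python sketch of that arithmetic. It assumes a 30-day month. The source does not state its month length, so its rounded figures for 99.99% and 99.999% do not match these results exactly.

```python
def downtime_per_month(availability_pct, days_in_month=30):
    """Return allowed downtime in minutes for a given availability percentage."""
    minutes_in_month = days_in_month * 24 * 60
    return (1 - availability_pct / 100) * minutes_in_month

for pct in (99.5, 99.99, 99.999):
    minutes = downtime_per_month(pct)
    print(f"{pct}%: {minutes:.2f} min ({minutes * 60:.1f} s) per month")
```

Each extra "nine" cuts the budget by a factor of ten. At 99.999%, a single reload can consume the whole month.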
Reporting the Crime
You Should Know When an Outage Occurs Before Your Users Do!
If you wait for the report of an outage, it’s already too late.
Make routine use of network monitoring tools:
– IP SLA
– SNMP traps
– Network management tools (both free and paid)
– Application performance tools (both free and paid)
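Whatever tool does the polling, a common design choice is to raise an alarm only after several consecutive failed probes, so a single lost probe does not page anyone, and to clear it on the first success. The following is a minimal Python sketch of that logic. The probe results are simulated, and the class name and the threshold of 3 are hypothetical choices. IP SLA and NMS products have their own reaction and threshold settings.

```python
class ReachabilityMonitor:
    """Declare a target down after `threshold` consecutive probe failures."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.down = False

    def record(self, probe_ok):
        """Feed one probe result; return "DOWN" or "UP" when the state changes, else None."""
        if probe_ok:
            self.failures = 0
            if self.down:
                self.down = False
                return "UP"
            return None
        self.failures += 1
        if not self.down and self.failures >= self.threshold:
            self.down = True
            return "DOWN"
        return None

# Simulated probe results: one blip, then a real outage, then recovery.
monitor = ReachabilityMonitor(threshold=3)
results = [True, False, True, False, False, False, False, True]
events = [(i, e) for i, ok in enumerate(results) if (e := monitor.record(ok))]
print(events)
```

The single failure at probe 1 raises nothing. The outage is reported at the third consecutive failure and cleared on the first success.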
Troubleshooting Methodology
What is the problem? Gathering the Facts
Where is the problem? Looking for the Clues
– 1st Pass: Looking for the Quick Confession or Obvious Clues
– 2nd Pass: Scavenger Hunt for Clues
– 3rd Pass: Prepping for Deeper Troubleshooting
– The Stakeout
Troubleshooting Methodology
What is the problem? Gathering the Facts
What is happening?
– Understanding the problem can be the biggest step.
– Connectivity loss?
– Packet loss?
– Latency?
– Does the network management system show an issue?
What is normal? What has changed?
– Did someone change something?
– Did the network itself change something?
Troubleshooting Methodology
1st Pass: Looking for the Quick Confession or Obvious Clues
Pings, Traceroutes, Show IP Route
– show ip route
– Ping
– Extended ping
– Traceroute
– Extended traceroute
(Diagram: test path across routers R1 through R5)
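When a traceroute stops answering, the last responding hop is the first place to look. The following is a minimal Python sketch that finds that hop in IOS-style traceroute text. The sample output and addresses are hypothetical. The parser handles only the simple "hop address times" layout shown, not DNS names, MPLS labels, or multiple responders per hop.

```python
import re

HOP = re.compile(r"^\s*(\d+)\s+(\S+)")

def last_responding_hop(output):
    """Return (hop number, address) of the last hop that answered, or None."""
    last = None
    for line in output.splitlines():
        m = HOP.match(line)
        if m and m.group(2) != "*":
            last = (int(m.group(1)), m.group(2))
    return last

sample = """Type escape sequence to abort.
Tracing the route to 10.5.5.5

  1 10.1.1.2 4 msec 0 msec 4 msec
  2 10.2.2.3 4 msec 4 msec 0 msec
  3  *  *  *
  4  *  *  *"""

print(last_responding_hop(sample))  # (2, '10.2.2.3')
```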
Sherlock Holmes
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
Troubleshooting Methodology
2nd Pass: Scavenger Hunt for Clues
– Status of Routing Peer Relationships
– IP Interface Status
– Interface Issues
– Interface Summary Information
– Device-Specific Issues
– Application Issues
– IP SLA & Application Performance
(Diagram: routers R1 through R5)
Status of Peer Relationships: Quick Pass to Look for Suspicious Activity
R1#show tcp brief
TCB       Local Address      Foreign Address     (state)
495A1974  10.1.1.1.60263     10.1.1.2.179        ESTAB
4933398C  10.1.1.1.646       10.1.1.2.24193      ESTAB
4958E448  10.1.1.1.32644     10.1.1.4.639        ESTAB
4958DBC8  10.1.1.1.711       10.1.1.4.13304      ESTAB
Each endpoint is shown as address.port, and the well-known ports identify the sessions: 179 is BGP (with 10.1.1.2), 646 is LDP (with 10.1.1.2), 639 is MSDP (with 10.1.1.4), and 711 is TDP (with 10.1.1.4).
Is this normal for your network? How would you know?
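Automating the port-to-protocol mapping makes unexpected sessions stand out. The following is a minimal Python sketch. It assumes the four-column layout above. The port table covers only the protocols labeled on this slide, and anything else is reported as unknown.

```python
WELL_KNOWN = {179: "BGP", 639: "MSDP", 646: "LDP", 711: "TDP"}

def _port(endpoint):
    """IOS prints address.port; return the port as an int, or None (e.g. for '*.*')."""
    port = endpoint.rsplit(".", 1)[-1]
    return int(port) if port.isdigit() else None

def classify_sessions(output):
    """Return (protocol, peer, state) for each 'show tcp brief' row."""
    sessions = []
    for row in output.splitlines():
        fields = row.split()
        if len(fields) != 4 or fields[0] == "TCB":
            continue
        _tcb, local, foreign, state = fields
        # The well-known port may be on either end, depending on who opened the session.
        proto = WELL_KNOWN.get(_port(local)) or WELL_KNOWN.get(_port(foreign)) or "unknown"
        sessions.append((proto, foreign.rsplit(".", 1)[0], state))
    return sessions

sample = """TCB Local Address Foreign Address (state)
495A1974 10.1.1.1.60263 10.1.1.2.179 ESTAB
4933398C 10.1.1.1.646 10.1.1.2.24193 ESTAB
4958E448 10.1.1.1.32644 10.1.1.4.639 ESTAB
4958DBC8 10.1.1.1.711 10.1.1.4.13304 ESTAB"""

for proto, peer, state in classify_sessions(sample):
    print(f"{proto:8} {peer:12} {state}")
```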
Status of Peer Relationships
R1#show tcp
Stand-alone TCP connection to host 10.1.1.12
...
SRTT: 146 ms, RTTO: 1283 ms, RTV: 1137 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: higher precedence, nagle, path mtu capable
Datagrams (max data segment is 1460 bytes):
Rcvd: 1454 (out of order: 0) ...
Sent: 1454 (retransmit: 0) ...
Retransmissions or out-of-order segments on a routing protocol session point to loss or reordering on the path to the peer.
Status of Routing Peer Relationships
R1#show ip bgp summary
BGP router identifier 10.1.1.1, local AS number 100
BGP table version is 211, main routing table version 211
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
3.8.4.5 4 100 8010 8009 211 0 0 5d13h 101
InQ: number of messages queued to be processed from the neighbor
OutQ: number of messages queued to be sent to the neighbor
Up/Down: how long the session has been in its current state. A number in State/PfxRcd means the session is Established; the number is the count of prefixes received from that neighbor.
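The same quick pass can be scripted across many routers. The following is a minimal Python sketch that flags two conditions: neighbors whose State/PfxRcd column is not a number (not Established), and neighbors with messages queued. It assumes the column layout shown above. The second and third neighbors in the sample are hypothetical additions.

```python
def bgp_neighbor_issues(output):
    """Flag neighbors that are not Established or that have messages queued."""
    issues = []
    in_table = False
    for row in output.splitlines():
        fields = row.split()
        if fields[:1] == ["Neighbor"]:
            in_table = True  # neighbor rows follow the column header
            continue
        if not in_table or len(fields) < 10:
            continue
        neighbor, in_q, out_q, state = fields[0], int(fields[6]), int(fields[7]), fields[9]
        # A number in State/PfxRcd means Established; text (Idle, Active, ...) means not.
        if not state.isdigit():
            issues.append((neighbor, f"state {state}"))
        if in_q or out_q:
            issues.append((neighbor, f"InQ {in_q} / OutQ {out_q}"))
    return issues

sample = """BGP router identifier 10.1.1.1, local AS number 100
BGP table version is 211, main routing table version 211
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
3.8.4.5 4 100 8010 8009 211 0 0 5d13h 101
3.8.4.6 4 100 0 0 1 0 0 never Active
3.8.4.7 4 100 9120 9300 211 0 57 01:02:03 98"""

print(bgp_neighbor_issues(sample))
```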
EIGRP – Quick Pass to Look for a Smoking Gun
Status of Routing Peer Relationships
R1#show ip eigrp neighbors
IP-EIGRP neighbors for process 100
H  Address    Interface  Hold   Uptime  SRTT  RTO  Q    Seq
                         (sec)          (ms)       Cnt  Num
1  10.2.2.2   Gi0/2      14     6d07h   8     200  0    4
0  10.1.1.2   Gi0/1      14     6d07h   1     200  0    6
Q Cnt (queue count): number of EIGRP packets (update, query, and reply) that the software is waiting to send. It should normally be 0.
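A scripted version of this check can flag neighbors with a non-zero Q Cnt or a recent uptime. The following is a minimal Python sketch. It assumes the nine-column layout above and the usual IOS convention that uptimes under a day print as hh:mm:ss. The third neighbor in the sample is hypothetical.

```python
def eigrp_suspects(output):
    """Return neighbors with packets waiting in Q Cnt or an uptime under one day."""
    suspects = []
    for row in output.splitlines():
        fields = row.split()
        # Neighbor rows start with the handle number (H) and have nine columns.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        address, intf, uptime, q_cnt = fields[1], fields[2], fields[4], int(fields[7])
        if q_cnt > 0:
            suspects.append((address, intf, f"Q Cnt {q_cnt}"))
        if ":" in uptime:  # hh:mm:ss means the adjacency is less than a day old
            suspects.append((address, intf, f"uptime {uptime}"))
    return suspects

sample = """IP-EIGRP neighbors for process 100
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 10.2.2.2 Gi0/2 14 6d07h 8 200 0 4
0 10.1.1.2 Gi0/1 14 6d07h 1 200 0 6
2 10.3.3.2 Gi0/3 12 00:04:10 1500 5000 7 31"""

print(eigrp_suspects(sample))
```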
OSPF – Quick Pass to Look for a Smoking Gun
Status of Routing Peer Relationships
R1#show ip ospf neighbor
Neighbor ID  Pri  State     Dead Time  Address   Interface
10.4.4.2     1    FULL/BDR  00:00:37   10.2.2.2  GigabitEthernet0/2
10.3.3.2     1    FULL/BDR  00:00:33   10.1.1.2  GigabitEthernet0/1
R1#show ip ospf neighbor detail | include up for
Neighbor is up for 6d07h
Neighbor is up for 6d07h
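For OSPF, the quick check is whether every neighbor reached FULL. Keep in mind that 2WAY/DROTHER is normal between two non-DR routers on a broadcast segment. The following is a minimal Python sketch. It assumes the column layout above. The last three sample rows are hypothetical, including a point-to-point neighbor whose state prints as "FULL/  -".

```python
import re

ROW = re.compile(
    r"^(?P<nbr>\d+\.\d+\.\d+\.\d+)\s+(?P<pri>\d+)\s+(?P<state>\S+(?:\s+-)?)\s+"
    r"(?P<dead>\S+)\s+(?P<addr>\S+)\s+(?P<intf>\S+)\s*$"
)

def ospf_suspects(output):
    """Neighbors that are not FULL, ignoring 2WAY/DROTHER (normal between DROTHERs)."""
    suspects = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, prompt, or blank line
        state = m["state"]
        if not state.startswith("FULL/") and state != "2WAY/DROTHER":
            suspects.append((m["nbr"], state, m["intf"]))
    return suspects

sample = """Neighbor ID Pri State Dead Time Address Interface
10.4.4.2 1 FULL/BDR 00:00:37 10.2.2.2 GigabitEthernet0/2
10.3.3.2 1 FULL/BDR 00:00:33 10.1.1.2 GigabitEthernet0/1
10.6.6.2 1 EXSTART/DR 00:00:39 10.6.6.2 GigabitEthernet0/3
10.7.7.2 1 2WAY/DROTHER 00:00:31 10.7.7.2 GigabitEthernet0/4
10.5.5.2 0 FULL/  -  00:00:35 10.5.5.2 Serial0/0"""

print(ospf_suspects(sample))
```

A neighbor stuck in EXSTART or EXCHANGE often points to an MTU mismatch, which is a common next question to ask.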
Interface IP Status
Interfaces with IP addresses assigned: are they up/up?
R1#show ip int brief
Interface IP-Address OK? Method Status Protocol
GigabitEthernet4/1 10.1.1.2 YES manual up up
GigabitEthernet4/2 10.3.3.2 YES manual up up
GigabitEthernet4/3 12.1.1.1 YES manual down down
GigabitEthernet4/4 unassigned YES unset down down
GigabitEthernet4/5 unassigned YES unset administratively down down
GigabitEthernet4/3 has an address assigned but is down/down, which makes it worth a closer look.
Only “no shut” the interfaces you plan to use.
Use interface descriptions, then check them with show interfaces description.
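A scripted version of this check looks for interfaces that have an address but are not up/up. The following is a minimal Python sketch. It assumes the six-column layout above and recognizes only the up, down, and administratively down status values.

```python
import re

ROW = re.compile(
    r"^(?P<intf>\S+)\s+(?P<ip>\S+)\s+(?:YES|NO)\s+\S+\s+"
    r"(?P<status>administratively down|up|down)\s+(?P<proto>up|down)\s*$"
)

def addressed_but_not_up(output):
    """Interfaces that have an IP address but are not up/up."""
    found = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m and m["ip"] != "unassigned" and (m["status"], m["proto"]) != ("up", "up"):
            found.append((m["intf"], m["ip"], f'{m["status"]}/{m["proto"]}'))
    return found

sample = """Interface IP-Address OK? Method Status Protocol
GigabitEthernet4/1 10.1.1.2 YES manual up up
GigabitEthernet4/2 10.3.3.2 YES manual up up
GigabitEthernet4/3 12.1.1.1 YES manual down down
GigabitEthernet4/4 unassigned YES unset down down
GigabitEthernet4/5 unassigned YES unset administratively down down"""

print(addressed_but_not_up(sample))
```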
Interface Errors
show interface | include drops
show interface | include errors
7206_VXR#sh int | include drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 12
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
7206_VXR#sh int | include errors
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
4 output errors, 0 collisions, 1 interface resets
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
802 output errors, 2705 collisions, 1 interface resets
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
These filters hide which interface each line belongs to. Including the interface header line in the filter keeps the names, for example: show interface | include line protocol|drops
Collisions, like the 2705 above, occur only on half-duplex links, so they often point to a duplex mismatch.
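These counters are cumulative since the last clear counters (or reload), so a single reading says little. What matters is whether they are still climbing. The following is a minimal Python sketch that compares two snapshots of one interface's output. The regular expressions cover only the counters shown above, and the second snapshot is hypothetical.

```python
import re

COUNTERS = {
    "output_drops": re.compile(r"Total output drops: (\d+)"),
    "input_errors": re.compile(r"(\d+) input errors"),
    "output_errors": re.compile(r"(\d+) output errors"),
    "collisions": re.compile(r"(\d+) collisions"),
}

def read_counters(show_interface_text):
    """Extract the cumulative counters present in one interface's output."""
    values = {}
    for name, pattern in COUNTERS.items():
        m = pattern.search(show_interface_text)
        if m:
            values[name] = int(m.group(1))
    return values

def counter_deltas(before, after):
    """Return only the counters that changed between the two snapshots."""
    b, a = read_counters(before), read_counters(after)
    return {k: a[k] - b[k] for k in a if k in b and a[k] != b[k]}

before = """Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 12
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
802 output errors, 2705 collisions, 1 interface resets"""
after = """Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 12
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
845 output errors, 2790 collisions, 1 interface resets"""

print(counter_deltas(before, after))
```

Here the output drops are old news. The output errors and collisions are still growing, and that is where to look.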
3750G#sh int summary
*: interface is up
IHQ: pkts in input hold queue IQD: pkts dropped from input queue
OHQ: pkts in output hold queue OQD: pkts dropped from output queue
RXBS: rx rate (bits/sec) RXPS: rx rate (pkts/sec)
TXBS: tx rate (bits/sec) TXPS: tx rate (pkts/sec)
TRTL: throttle count
Interface IHQ IQD OHQ OQD RXBS RXPS TXBS TXPS TRTL
-------------------------------------------------------------------------
* GigabitEthernet1/0/1 0 0 0 0 0 0 0 0 0
* GigabitEthernet1/0/2 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/3 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/4 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/5 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/6 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/7 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/8 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/9 0 0 0 0 0 0 0 0 0
GigabitEthernet1/0/10 0 0 0 0 0 0 0 0 0
* GigabitEthernet1/0/11 0 0 0 0 0 0 5000 5 0
* GigabitEthernet1/0/12 0 0 0 0 4000 5 0 0 0
Interface Summary
show interface summary
Troubleshooting Methodology
3rd Pass: Prepping for Deeper Troubleshooting (Still Looking for a Suspect)
– Configure load-interval 30 on the interfaces of interest (the default averaging period is 5 minutes)
– clear counters
– Ensure logging is enabled
– clear log
Troubleshooting Methodology
“The Stakeout” – Very Very Suspicious Router
– show proc cpu
– show ip traffic (clear ip traffic)
– show log (clear log)
– clear counters
– Other crime/suspect-relevant commands
– Potential debugs
– show platform and other platform-specific steps
Troubleshooting Methodology
“The Stakeout” – Very Very Suspicious Router
C6K#sh ip traffic
IP statistics:
Rcvd: 98406446 total, 1289848 local destination
0 format errors, 0 checksum errors, 45999 bad hop count
0 unknown protocol, 48761 not a gateway
0 security failures, 0 bad options, 7576 with options
Opts: 0 end, 0 nop, 0 basic security, 0 loose source route
0 timestamp, 0 extended security, 0 record route
0 stream ID, 0 strict source route, 7576 alert, 0 cipso, 0 ump
0 other
Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
0 fragmented, 0 couldn't fragment
Bcast: 960090 received, 0 sent
Mcast: 292915 received, 406578 sent
Sent: 418580 generated, 3763770399 forwarded
Drop: 11 encapsulation failed, 0 unresolved, 0 no adjacency
251 no route, 0 unicast RPF, 0 forced drop
0 options denied, 0 source IP address zero
.
.
Troubleshooting Methodology
“The Stakeout” – Very Very Suspicious Router
.
.
.
ICMP statistics:
Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 0 unreachable
4 echo, 0 echo reply, 0 mask requests, 0 mask replies, 0 quench
0 parameter, 0 timestamp, 0 info request, 0 other
0 irdp solicitations, 0 irdp advertisements
0 time exceeded, 0 timestamp replies, 0 info replies
Sent: 0 redirects, 254 unreachable, 0 echo, 4 echo reply
0 mask requests, 0 mask replies, 0 quench, 0 timestamp
0 info reply, 0 time exceeded, 0 parameter problem
0 irdp solicitations, 0 irdp advertisements.
.
.
.
.
Questioning Your Suspect
Router, Cards, Interfaces, Routing, Traffic
Troubleshooting and Repairing
Troubleshooting Packet Flow
Utilization Issues
Troubleshooting Packet Flow
Knowing Your Suspect Router’s M.O.
Understand Normal Packet Flow
Cisco Express Forwarding (CEF) Load Sharing
EtherChannel Load Balancing
Knowing Your Suspect Router’s M.O.*
*Modus operandi (often abbreviated M.O. or simply “method”) is a Latin phrase, approximately translated as “mode of operation”.
Source: http://en.wikipedia.org/wiki/Modus_operandi
Knowing Your Suspect Router’s M.O.
“Classic Routers” (aka “Single-Brain” Routers)
With “classic” routers, there is one brain that is involved in forwarding every packet.
Knowing Your Suspect Router’s M.O.
“Distributed Services Routers”
(Diagram: the main route processor runs routing protocols and processes, builds the RIB, handles control plane traffic, and manages the system, while other components also handle packets.)
Examples:
– Intelligent line cards
– Separate forwarding processors (ASR ESPs)
– VPN accelerators (AIM VPN)
– Digital signal processors
Expect:
– Multiple software images
– Multiple licenses
– Multiple forwarding paths
Knowing Your Suspect Router’s M.O.
We need to know how to translate “Yo, dude, what did you do with my packet?” into a show command so that we can get a meaningful answer from our suspect router.
Examples:
– Separate forwarding processor: ASR ESPs
– VPN accelerators: AIM VPN
Asking the Right Questions
“Just one more question...
What did you do with the packet?”
Knowing Your Suspect Router’s M.O.
ASR-1004 (Route Processor: ASR-1000-RP2; Forwarding Processor: ASR-1000-ESP20)
RP (Route Processor)
– Handles control plane traffic
– Manages system
ESP (Embedded Services Processor)
– Handles forwarding plane traffic
SPA Carrier Card
– Houses the SPAs
– Not like a 7600 SIP
SPAs (Shared Port Adapters)
– Provide interface connectivity
3945 ISR G2: Single or Dual Brain?
Single IOS brain for forwarding
DSP (module: PVDM3)
– Handles voice conferencing and transcoding
EtherSwitch Module
– L2/L3 integrated Catalyst switch on a module
Services Ready Engine
– Server-on-a-blade for applications
IOS Order of Operations
Input Feature Processing in 15.1(3)T
1. RITE
2. EPC
3. QoS Drop
4. VRF Classify
5. Packet Debug
6. Netflow
7. LISP
8. BGP Policy Map
9. QoS Classify
10. Fragment Assembly
11. LI
12. IPS
13. Firewall
14. ACL
15. SBC
16. FPM
17. IPSec Decrypt
18. QoS Marking
19. Policing
20. QoS post-crypto Classify
21. WAAS
22. EZVPN
23. Accounting
24. NAT Outside
25. Policy Routing
26. WCCP
27. VRF Select
28. BOOTP/DHCP Reply
IOS Order of Operations
Output Feature Processing in 15.1(3)T
1. QoS Classification
2. NAT Inside
3. NHRP
4. WCCP
5. NAT Outside
6. BGP Policy Map
7. IPSec Classify
8. CTS
9. QoS Classification
10. Firewall
11. IPS
12. QoS Drop
13. ACL
14. FPM
15. WAAS
16. QoS Marking
17. Accounting
18. RSVP
19. Policing
20. Netflow
21. IPSec Encrypt
22. Packet Debug
23. Packet Capture
24. HW Checks
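These lists answer the question "which feature sees the packet first?" For example, on input the ACL (14) runs before NAT Outside (24). An inbound ACL on a NAT outside interface is therefore checked against the untranslated addresses. The following is a minimal Python sketch encoding the two 15.1(3)T lists from these slides. Other releases and platforms differ, and `runs_before` is a hypothetical helper.

```python
INPUT_ORDER = [
    "RITE", "EPC", "QoS Drop", "VRF Classify", "Packet Debug", "Netflow", "LISP",
    "BGP Policy Map", "QoS Classify", "Fragment Assembly", "LI", "IPS", "Firewall",
    "ACL", "SBC", "FPM", "IPSec Decrypt", "QoS Marking", "Policing",
    "QoS post-crypto Classify", "WAAS", "EZVPN", "Accounting", "NAT Outside",
    "Policy Routing", "WCCP", "VRF Select", "BOOTP/DHCP Reply",
]
OUTPUT_ORDER = [
    "QoS Classification", "NAT Inside", "NHRP", "WCCP", "NAT Outside",
    "BGP Policy Map", "IPSec Classify", "CTS", "QoS Classification", "Firewall",
    "IPS", "QoS Drop", "ACL", "FPM", "WAAS", "QoS Marking", "Accounting", "RSVP",
    "Policing", "Netflow", "IPSec Encrypt", "Packet Debug", "Packet Capture",
    "HW Checks",
]

def runs_before(order, first, second):
    """True if `first` is applied before `second` (first occurrence in the list)."""
    return order.index(first) < order.index(second)

print(runs_before(INPUT_ORDER, "ACL", "NAT Outside"))      # ACL sees pre-NAT addresses
print(runs_before(INPUT_ORDER, "Policy Routing", "ACL"))   # PBR only sees permitted traffic
print(runs_before(OUTPUT_ORDER, "ACL", "IPSec Encrypt"))   # output ACL sees cleartext
```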
Cisco Express Forwarding Load Sharing
With the default per-destination (per-flow) load sharing, traffic between one source address and one destination address always flows along one link, because every packet produces the same hash.
Per-packet load sharing makes the traffic flow along both links, at the risk of out-of-order packets.
Some platforms support both per-packet and per-flow load balancing, while others do not. Know your suspects!
Cisco Express Forwarding Load Sharing
Even with two sources, both flows can still hash to the same link.
Even if the traffic is split across the links, the links will still likely show unequal utilization: a Telnet session and an FTP transfer carry very different amounts of traffic.
Cisco Express Forwarding Load Sharing
Over time, the traffic may statistically work out to be equal across multiple links.
But it probably won’t.
Ratios as high as 70% to 30% can be normal.
Per-packet load sharing can resolve this, but at the risk of out-of-order packets.
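The effect of per-flow versus per-packet sharing can be simulated. The following is a minimal Python sketch. It assumes two equal-cost links and a hypothetical traffic mix of one small Telnet flow and one large FTP flow. The SHA-256 hash is a stand-in chosen for illustration, not CEF's actual per-destination algorithm. The per-packet mode does not model reordering.

```python
import hashlib
from collections import Counter
from itertools import cycle

LINKS = ["Link A", "Link B"]

def per_flow_link(src, dst):
    """Pick a link from a hash of (src, dst): every packet of a flow takes the same link."""
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    return LINKS[digest[0] % len(LINKS)]

def simulate(flows, per_packet=False):
    """flows: list of (src, dst, packet_count). Returns packets carried per link."""
    load = Counter()
    rr = cycle(LINKS)  # round-robin pointer for per-packet mode
    for src, dst, packets in flows:
        for _ in range(packets):
            link = next(rr) if per_packet else per_flow_link(src, dst)
            load[link] += 1
    return load

# Hypothetical traffic mix: a small Telnet session and a large FTP transfer.
flows = [("10.1.1.10", "10.9.9.9", 20), ("10.1.1.20", "10.9.9.9", 980)]
print("per-flow:  ", dict(simulate(flows)))
print("per-packet:", dict(simulate(flows, per_packet=True)))
```

Per-flow sharing can at best split the load 20/980 and may put both flows on one link. Per-packet sharing evens it out at 500/500, but packets of the same flow now take different paths.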
EtherChannel Load Balancing
What type of device is it?
What instructions will it apply to decide which member link a packet takes?
EtherChannel Load Balancing
6500/7600 Sup720:
SUP_720#sh etherchannel load-balance
EtherChannel Load-Balancing Configuration:
src-dst-ip
mpls label-ip
EtherChannel Load-Balancing Addresses Used Per-Protocol:
Non-IP: Source XOR Destination MAC address
IPv4: Source XOR Destination IP address
IPv6: Source XOR Destination IP address
MPLS: Label or IP
EtherChannel Load Balancing
3750:
3750#sh etherchannel load-balance
EtherChannel Load-Balancing Configuration:
src-mac
EtherChannel Load-Balancing Addresses Used Per-Protocol:
Non-IP: Source MAC address
IPv4: Source MAC address
IPv6: Source MAC address
What is the source MAC?
With L3 forwarding, every routed packet leaves with the 3750’s own MAC address as its source, so src-mac load balancing sends all of the routed traffic down a single link!
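The 3750 problem can be shown with a toy hash. The following is a minimal Python sketch. It assumes a simplified hash: XOR of the last byte of each address, modulo the number of links. Real switches hash selected address bits in hardware, and the exact algorithm varies by platform. The router MAC and IP addresses are hypothetical.

```python
def last_octet(addr):
    """Low-order byte of a MAC ('aa:bb:...') or IPv4 ('a.b.c.d') address."""
    if ":" in addr:
        return int(addr.split(":")[-1], 16)
    return int(addr.split(".")[-1])

def member_link(num_links, src, dst=None):
    """Simplified hash: src byte alone (src-* modes) or src XOR dst (src-dst-* modes)."""
    value = last_octet(src) if dst is None else last_octet(src) ^ last_octet(dst)
    return value % num_links

ROUTER_MAC = "00:1a:2b:3c:4d:5e"  # hypothetical 3750 MAC, the source of every routed frame
SRC_IP = "10.1.1.10"
DESTS = ["10.2.2.1", "10.2.2.2", "10.2.2.3", "10.2.2.4"]

# src-mac ignores the destination, so every routed frame hashes the same way.
src_mac_links = {member_link(2, ROUTER_MAC) for _dest in DESTS}
src_dst_ip_links = {member_link(2, SRC_IP, dest) for dest in DESTS}
print("src-mac uses links:", sorted(src_mac_links))
print("src-dst-ip uses links:", sorted(src_dst_ip_links))
```

With src-mac, one member link carries everything. Switching the hash to include IP addresses (for example src-dst-ip, where the platform supports it) lets different destinations spread across both links.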
Troubleshooting and Repairing
Troubleshooting Packet Flow
Utilization Issues
Utilization Issues
Utilization (noun): the act of utilizing, or the state of being utilized; the condition of being put to use.*
Where to look: interfaces, main CPU, memory
* http://www.answers.com/topic/utilization
Utilization Issues
7600#show platform hardware capacity
Power Resources
…
System power: 5772W, 0W (0%) inline, 1559W (27%) total allocated
…
Flash/NVRAM Resources
Usage: Module Device Bytes: Total Used %Used
3 dfc#3-bootdisk: 1014251520 966656 1%
6 SP disk0: 256462848 194871296 76%
6 SP sup-bootdisk: 512024576 416456704 81%
…
CPU Resources
CPU utilization: Module 5 seconds 1 minute 5 minutes
3 2% / 0% 5% 5%
6 RP 0% / 0% 0% 0%
6 SP 7% / 0% 5% 4%
Processor memory: Module Bytes: Total Used %Used
3 1712688084 406650040 24%
6 RP 908176176 127643508 14%
6 SP 861979340 212723360 25%
I/O memory: Module Bytes: Total Used %Used
3 134217728 31395560 23%
6 RP 67108864 13204508 20%
6 SP 67108864 13205416 20%
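The %Used columns lend themselves to a simple threshold check. The following is a minimal Python sketch that scans the Flash/NVRAM rows shown above and reports devices at or above a threshold. The 75% default is a hypothetical choice, and the regular expression matches only the row layout on this slide.

```python
import re

# Flash/NVRAM rows look like: "<module> [RP|SP] <device>: <total> <used> <pct>%"
FLASH_ROW = re.compile(
    r"^\s*(?P<module>\d+(?:\s+[A-Z]{2})?)\s+(?P<device>\S+):\s+"
    r"(?P<total>\d+)\s+(?P<used>\d+)\s+(?P<pct>\d+)%\s*$"
)

def full_flash_devices(output, threshold=75):
    """Return (module, device, %used) for flash devices at or above `threshold` percent."""
    full = []
    for line in output.splitlines():
        m = FLASH_ROW.match(line)
        if m and int(m["pct"]) >= threshold:
            full.append((m["module"], m["device"], int(m["pct"])))
    return full

sample = """Usage: Module Device Bytes: Total Used %Used
3 dfc#3-bootdisk: 1014251520 966656 1%
6 SP disk0: 256462848 194871296 76%
6 SP sup-bootdisk: 512024576 416456704 81%
Processor memory: Module Bytes: Total Used %Used
3 1712688084 406650040 24%
6 RP 908176176 127643508 14%"""

print(full_flash_devices(sample))
```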
Utilization Issues
...(show platform hardware capacity, continued)...
EOBC Resources
Module Packets/sec Total packets Dropped packets
3 Rx: 10 2111619 0
Tx: 4 1179400 0
6 RP Rx: 6 3326992 0
Tx: 5 3256757 0
6 SP Rx: 3 3413610 0
Tx: 9 5910012 3
VLAN Resources
VLANs: 4094 total, 5 VTP, 0 extended, 19 internal, 4070 free
L2 Forwarding Resources
MAC Table usage: Module Collisions Total Used %Used
3 0 98304 9 1%
6 0 65536 9 1%
VPN CAM usage: Total Used %Used
512 0 0%
Utilization Issues
High Processor Utilization
Utilization Issues Internal to the Router
Memory Utilization
Utilizing Post Mortems
High Processor Utilization
Signs of high processor utilization:
– Router acting “slow” at the console
– Router inaccessible through Telnet
– Packet loss
– Router adding a large amount of delay to packets traversing it
“Processor” can have many different meanings depending on the platform.
On distributed platforms, show platform commands on most boxes give insight into the different processing components.
At what point do we decide that we’ve outgrown this router?
– This is an easier decision to make well ahead of time than under the gun.
Processor Utilization
7200VXR#sh process cpu sorted
CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc
58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL
235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO
39.22 + 0.08 + 0.08 + ... ≠ 61
Very, Very Common Statement
“I add up the 5Sec column but it doesn’t add up to 61%”
The answer is in the first line. 61% is the total, and 22% of it was spent at interrupt level, which is not charged to any process. The process-level share is 61% − 22% = 39%, which is roughly what the 5Sec column adds up to.
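Total minus interrupt gives the process-level share, and this relationship can be checked mechanically against the per-process column. The following is a minimal Python sketch. It assumes the show processes cpu sorted layout above. Only the three rows on the slide are included, and each per-process value is rounded, so the two numbers match only approximately.

```python
import re

HEADER = re.compile(r"five seconds: (\d+)%/(\d+)%")
ROW = re.compile(r"^\s*\d+\s+\d+\s+\d+\s+\d+\s+([\d.]+)%")

def cpu_breakdown(output):
    """Split the 5-second CPU figure into interrupt and process-level time."""
    total, interrupt = map(int, HEADER.search(output).groups())
    listed = sum(float(m.group(1)) for line in output.splitlines() if (m := ROW.match(line)))
    return {
        "total": total,
        "interrupt": interrupt,
        "process_level": total - interrupt,
        "sum_of_5sec_column": round(listed, 2),
    }

sample = """CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc
58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL
235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO"""

print(cpu_breakdown(sample))
```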
High Processor Utilization
First, determine what is “high”
Is the router spending most of its time
– running processes, or
– in interrupt context?
What other processing is going on in the system?
What is normal for this router/location/time/day/month?
7200VXR#sh proc cpu sorted
CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc
58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL
235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO
High Processor Utilization
If the total processing is “high” but the interrupt level is “low”, take several snapshots of show processes cpu sorted to see which processes consistently use the most CPU.
7200VXR#sh proc cpu sorted
CPU utilization for five seconds: 61%/22%; one minute: 37%; five minutes: 36%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
185 330388 3657407 90 39.22% 23.46% 24.42% 0 encrypt proc
58 8 348403 0 0.08% 0.07% 0.08% 0 T3E3 EC IPC POLL
235 4 837 4 0.08% 0.00% 0.00% 0 IP-EIGRP: HELLO
221 4 54658 0 0.08% 0.01% 0.00% 0 PPP Events
42 300 2511 119 0.08% 0.04% 0.05% 0 Net Background
46 112 1802 62 0.08% 0.07% 0.08% 0 Per-Second Jobs
Processes High, Interrupt Low
High Processor Utilization
For instance, if IP Input is high, a large number of packets are being switched at the process level instead of being CEF-switched. Try to figure out why.
7206_VXR#show processes cpu
CPU utilization for five seconds: 83%/21%; one minute: 79%; five minutes: 84%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
35 4520 68993 65 48.00% 30.00% 52.00% 0 IP Input
....
Processes High, Interrupt Low
High Processor Utilization
Here, the value to the right of the slash (the interrupt share) is high, which means the processor is spending most of its time in interrupt context.
7206_VXR#show processes cpu
CPU utilization for five seconds: 80%/75%; one minute: 86%; five minutes: 92%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
2 68 227 299 0.00% 0.00% 0.00% 0 Exec
3 368920 138425 2665 0.08% 0.02% 0.00% 0 Check heaps
4 4 1 4000 0.00% 0.00% 0.00% 0 Chunk Manager
Interrupt High
High Processor Utilization
Check each interface for traffic levels
If the traffic levels are high, but look normal (a baseline is needed to know this), then the router could just be running at a normal level
If the traffic levels look too high compared to the baseline, then you should look for changes in the traffic flow
High Interrupt
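The decision flow from the last few slides can be summarized as a small function. The following is a minimal Python sketch. The thresholds are hypothetical placeholders: 60% for "high" and an interrupt share of half the total. Replace them with values from your own baseline.

```python
def triage_cpu(total, interrupt, high=60, interrupt_share=0.5):
    """Suggest where to look next, following the process-vs-interrupt split above."""
    if total < high:
        return "compare with baseline: may be normal for this router"
    if interrupt >= total * interrupt_share:
        return "interrupt high: check interface traffic levels against the baseline"
    return "processes high: take several 'show processes cpu sorted' snapshots"

for total, interrupt in [(83, 21), (80, 75), (36, 5)]:
    print(f"{total}%/{interrupt}% -> {triage_cpu(total, interrupt)}")
```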
High Forwarding Plane Utilization
Each multi-brain device has different measurements for forwarding plane (FP) health.
Take a look at the show platform commands for a clue.
Look for punts: packets the forwarding plane hands up to the control plane. Traffic causing punts can kill your router in a hurry.
Know Your Multi-Brain Devices!
Utilization Issues Internal to the Router
6506#show fabric utilization
slot channel speed Ingress % Egress %
1 0 8G 22 23
2 0 8G 4 9
3 0 20G 0 1
3 1 20G 11 12
4 0 20G 0 1
4 1 20G 10 13
6 0 20G 0 1
6506#
Memory Utilization
Question: In your suspect router, in how many places do you have memory?
• Forwarding plane (FP) buffers and I/O memory
• Control plane (CP) memory
• Flash
• Modules
• DSPs
• TCAM
Rules to Live By
Memory... Memory... Memory!
Buy it! Buy enough to hold two images in flash simultaneously.
Don’t delete the previous version of IOS®, even if you are booting off the new one.
Buy plenty of DRAM for future scale and features.
– IOS behaves differently with different amounts of memory.
Buy the expensive memory.
If you can afford it, buy maximum memory.
Feel free to complain to the nearest Cisco representative about the cost of memory.
(Prevent, Prepare, Repair)
Utilizing Post-Mortems
(Timeline: Failure → Action → Recovery → Improve, feeding back into Prevent, Prepare, Repair)
Every failure should improve the procedures.
Don’t Forget the Most Important Part!
Formalize the Post-Mortem
Create a formal procedure to review problems and results.
Procedures and baselines need to be living documents.
Conduct the post-mortem as soon after the event as possible.
No egos allowed! Don’t get caught playing the blame game.
Every action requires an after-action report and analysis.
Analysis should feed back into design, procedures, and planning.
It’s never cost-effective to design for every possible failure, but that doesn’t mean procedures can’t be prepared.
(Prevent, Prepare, Repair)
Final Thoughts
Get hands-on experience with the Walk-in Labs located in World of Solutions, booth 1042
Come see demos of many key solutions and products in the main Cisco booth 2924
Visit www.ciscoLive365.com after the event for updated PDFs, on-demand session videos, networking, and more!
Follow Cisco Live! using social media:
– Facebook: https://www.facebook.com/ciscoliveus
– Twitter: https://twitter.com/#!/CiscoLive
– LinkedIn Group: http://linkd.in/CiscoLI